DEEP LEARNING / ARTIFICIAL INTELLIGENCE FOR ... · DEEP LEARNING / ARTIFICIAL INTELLIGENCE FOR ACCELERATED DATA ANALYTICS 5th December, 2017 Dr. Ettikan Kandasamy Karuppiah, Director

DEEP LEARNING / ARTIFICIAL INTELLIGENCE FOR ACCELERATED DATA ANALYTICS

5th December, 2017

Dr. Ettikan Kandasamy Karuppiah,Director of Developers Ecosystem,Nvidia, South East Asia Region

2

“Find where I parked my car”

AI IS EVERYWHERE, TOUCHING OUR LIVES

“Find the bag I just saw in this magazine”

“What movie should I watch next?”

Bringing grandmother closer to family by bridging

language barrier

Predicting sick baby’s vitals like heart rate, blood pressure, survival rate

Enabling the blind to “see” their surrounding, read

emotions on faces

Increasing public safety with smart video surveillance at

airports & malls

Providing intelligent services in hotels, banks and stores

Separating weeds as it harvests, reduces chemical

usage by 90%

3

EVERY INDUSTRY HAS AWOKEN TO AI

2014 2016

1,549

19,439

Higher Ed

Telecom

Healthcare

Finance

Automotive

Others

Government

Developer Tools

Organizations engaged with NVIDIA on Deep Learning

4

“mobile computing, inexpensive sensors collecting terabytes of data, rise of

machine learning that can use that data will fundamentally change the way the

global economy is organized.”

— Fortune, CEO – The Revolution is Coming March 8, 2016

5

THE 4TH WAVEDigital Lifestyle Services (DLS) - Trillion Dollar Wave

3G

4G

2G

5G

Digital Lifestyle

Services• Video & Content• Cloud Services• Cyber Security• Financial Services• IoT – Smart Cities• IoT – Connected

Vehicles• AR / VR• Mobile Gaming• Many more…

Reference : http://www.chetansharma.com/

6

Internet Startups

7

Visualization of the activations of theconvolutions over a car damage image

Visualization of the layers of a neural network and the back propagation process

Neural Network Demo

https://www.galacticar.ai/ -> Galaxy.ai : Artificial Intelligence Driven Sedan Damage Estimator

8

Frontend

9

Backend

NVIDIA METROPOLIS —EDGE TO CLOUD AI CITY PLATFORM

Camera

CLOUDTraining and Inference

EDGE AND ON-PREMISESInference

DGX Appliance

Video recorder Server

JETSON TESLA/QUADRO

JETPACK, TENSOR RT, DEEPSTREAM

AI IS THE SOLUTION TO SELF-DRIVINGPerception Reasoning

HD Map Mapping

Driving

AI Computing

Unsupervised Learning on Unstructured Data

13

3000

AI MOMENTUM

TODAY BY 2020

AI startups

85% $ 47B 20%

of all customer service interactions will

be powered by AI bots

spend on AI technologies

of companies will dedicate workers to monitor and guide neural networks

14

POTENTIAL AREAS FOR DISRUPTIONTop 8 Technical Domains

• Many Network Element are 20 year old design. Some elements can be accelerated with GPU

• E.g. Deep Packet InspectionNetwork Function Acceleration

• Virtualization removes networking interfaces, x86 is sluggish vs ASICS / FPGA

• GPU could be very promising for certain . Session Border Controllers Virtual Network Function

Acceleration

• AR / VR media servers,

• UHD Transcoding, etcAcceleration of graphical /

Video intense workload

• Connected Cars, Industrial IoT, Smart Cities

• Video Analytics High Speed AI Inference

• Network analytics, Subscriber Analytics

• Realtime Campaign Management , Self-Optimizing Network (SON)AI based Automation

• Virtualization and microservices

• Mobile Edge Computing with 4G+ an 5GEdge Cloud Computing

• RF planning and Optimization

• Acceleration of Massive MIMO 5G

• AI based network securityCyber Security for Networks

15

Telco Usecase : Fraud Detection

16

Telco Usecase : Customer Churn Prediction

17

Telco Usecase : Malware Thread Detection

Deep Learning Neural Nets Are Effective Against AI MalwareFEBRUARY 5, 2016 BY JAMIE PHAN 0 COMMENTS

Baidu, Google, and Facebook are deeply invested in deep learning via

neural networks, or networks of hardware and software that

approximate the web of neurons in the human brain.

https://www.bizety.com/2016/02/05/deep-learning-neural-nets-are-effective-against-ai-malware/

…attacks (Advanced Persistent Threats). According to the

Verizon 2015 Data Breach report, a total of over 170

million malware events were reported, with 70-90% of the

malware samples given were unique to a threat

organization. Thus, finding the malware signatures alone

are becoming more ineffective…..

Deep Instinct, an Israeli startup, claims that they are the first to employ offer a deep learning- based AV

solution using ANNs. They use data fragmentation by breaking down objects into their smallest parts, and

applying deep learning to identify unknown suspicious behavior and detect, predict, and prevent known

and unknown threats.

https://www.bizety.com/author/jamiep-2/

https://www.bizety.com/2016/02/05/deep-learning-neural-nets-are-effective-against-ai-malware/#disqus_thread

http://www.greycastlesecurity.com/resources/documents/2015_Verizon_Business_Data_Breach_Investigations_Report.pdf

18

1980 1990 2000 2010 2020

GPU-Computing perf

1.5X per year

1000X

by

2025

RISE OF GPU COMPUTING

Original data up to the year 2010 collected and plotted by M. Horowitz, F. Labonte,

O. Shacham, K. Olukotun, L. Hammond, and C. Batten New plot and data collected

for 2010-2015 by K. Rupp

102

103

104

105

106

107

Single-threaded perf

1.5X per year

1.1X per year

APPLICATIONS

SYSTEMS

ALGORITHMS

CUDA

ARCHITECTURE

19

Deep Learning-Inferencing:TESLA V100 DELIVERS NEEDED RESPONSIVENESS WITH UP TO 99X MORE THROUGHPUT

0

1,000

2,000

3,000

4,000

5,000

CPU Server Tesla V100

ResNet-50

4,647 i/s@ 7mslatency

47 i/s@ 21msLatency

0

600

1,200

1,800


VGG-16

1,658i/s@ 7ms

Latency

23 i/s@ 43msLatency

0

500

1,000

1,500

2,000

2,500

3,000

3,500


GoogleNet

3,270 i/s@ 7ms

Latency

136 i/s@ 7ms

LatencyThro

ughput

(Im

ages/

Sec)

Thro

ughput

(Im

ages/

Sec)

Thro

ughput

(Im

ages/

Sec)

CPU: Xeon E5-2690 V4

GPU Servers: Dual Xeon E5-2690 [email protected] with 16GB PCIe GPUs configs as shownUbuntu 14.04.5, CUDA 9.0.103, cuDNN 7.0.1.13; NCCL 2.0.4, TensorRT pre-release, data set: ImageNet,GPU Optimal batch size used to achieve 7ms latency; CPU batch size reduced to 1 if latency exceeds 7ms

20

REVOLUTIONARY AI PERFORMANCE3X Faster DL Training Performance

Over 80x DL Training Performance in 3 Years

1x K80cuDNN2

4x M40cuDNN3

8x P100cuDNN6

8x V100cuDNN7

0x

20x

40x

60x

80x

100x

Q1

15

Q3

15

Q2

17

Q2

16

Googlenet Training Performance(Speedup Vs K80)

Speedup v

s K80

85% Scale-Out EfficiencyScales to 64 GPUs with Microsoft

Cognitive Toolkit

0 5 10 15

64X V100

8X V100

8X P100

Multi-Node Training with NCCL2.0(ResNet-50)

ResNet50 Training for 90 Epochs with 1.28M images dataset | Using Caffe2 | V100 performance measured on pre-production hardware.

1 Hour

7.4 Hours

18 Hours

3X Reduction in Time to Train Over P100

0 10 20

1XV100

1XP100

2XCPU

LSTM Training(Neural Machine Translation)

Neural Machine Translation Training for 13 Epochs |German ->English, WMT15 subset | CPU = 2x Xeon E5 2699 V4 | V100 performance

measured on pre-production hardware.

15 Days

18 Hours

6 Hours

21

VOLTA DELIVERS 3X MORE INFERENCE THROUGHPUT

Low Latency performance with V100 and TensorRT

Fuse LayersCompact

Optimize Precision(FP32, FP16, INT8)

3x more throughput at 7ms latency with V100 (ResNet-50)

TensorRT CompiledReal-timeNetwork

Trained Neural

Network

0

1,000

2,000

3,000

4,000

5,000

CPU Tesla P100(TensorFlow)

Tesla P100(TensorRT)

Tesla V100(TensorRT)

Thro

ughput

@ 7

ms

(Im

ages/

Sec)

33ms

CPU Server: 2X Xeon E5-2660 V4; GPU: w/P100, w/V100 (@150W) | V100 performance measured on pre-production hardware.

3X

10ms

7ms

7ms

22

DATACENTER IN A RACK1 Rack of Tesla P100 Delivers Performance of 6,000 CPUs

QUANTUM PHYSICSMILC

WEATHERCOSMO

DEEP LEARNINGCAFFE/ALEXNET

12 NODES in RACK8 P100s per Node

MOLECULAR DYNAMICSAMBER

# of Racks1 108642 . . .40 84

638 CPUs / 186 kW

650 CPUs / 190 kW

6,000 CPUs / 1.8 MW

2,900 CPUs / 850 kW

96 P100s / 38 kW

36 NODES in RACKDUAL CPUs Per Node

. . .

23

DEEP LEARNING FRAMEWORKS

VISION SPEECH BEHAVIOR

Object Detection Voice Recognition Language TranslationRecommendation

EnginesSentiment Analysis

DEEP LEARNING

cuDNN

MATH LIBRARIES

cuBLAS cuSPARSE

MULTI-GPU

NCCL

cuFFT

Mocha.jl

Image Classification

NVIDIA DEEP LEARNING SDKHigh Performance GPU-Acceleration for Deep Learning

24

GPU-ACCELERATION HAS NO LIMITSMapD

BlazeGraph

Kinetica

Leading In-Memory DB> 50x Slower

NoSQL DB’s> 100x Slower

Aggregate of queries - Time (s)Less is better!

SQream

1403

1843

700

GPUs 700X-800X faster

than graphs in all cases

700M Edges Single Node

Xeon 2650 vs 2 K80

1.98B Edges 16 EC2

r3.xlarge vs 16 K40s

1.98B Edges 16 EC2

r3.4xlarge vs 16 K40s2

1.98B Edges Spark CPU

Baseline

1

Speed-up over baseline spark CPU configuration

Speed-u

p (

hig

her

is f

ast

er)

25

GPU DATABASES ARE EVEN FASTER1.1 Billion Taxi Ride Benchmarks

21 30

1560

80 99

1250

150269

2250

372

696

2970

0

500

1000

1500

2000

2500

3000

3500

4000

4500

5000

MapD DGX-1 MapD 4 x P100 Redshift 6-node Spark 11-node

Query 1 Query 2 Query 3 Query 4

Tim

e in M

illise

conds

Source: MapD Benchmarks on DGX from internal NVIDIA testing following guidelines of Mark Litwintschik’s blogs: Redshift, 6-node ds2.8xlarge cluster & Spark 2.1, 11 x m3.xlarge cluster w/ HDFS @marklit82

10190 8134 19624 85942

http://tech.marksblogg.com/billion-nyc-taxi-rides-redshift-large-cluster.html

http://tech.marksblogg.com/billion-nyc-taxi-rides-spark-2-1-0-emr.html

26

TESLA PLATFORMLeading Data Center Platform for HPC and AI

TESLA GPU & SYSTEMS

NVIDIA SDK

INDUSTRY TOOLS

APPLICATIONS & SERVICES

C/C++

ECOSYSTEM TOOLS & LIBRARIES

HPC

+400 More Applications

cuBLAS

cuDNN TensorRT

DeepStreamSDK

FRAMEWORKS MODELS

Cognitive Services

AI TRAINING & INFERENCE

Machine Learning Services

ResNetGoogleNetAlexNet

DeepSpeechInceptionBigLSTM

DEEP LEARNING SDK

NCCL

COMPUTEWORKS

NVLINK SYSTEM OEM CLOUDTESLA GPU

27

Instant productivity — plug-and-play, supports every AI framework

Performance optimized across the entire stack

Docker/Container Development and Deployment model

Always up-to-date via the cloud

Mixed framework environments — containerized

Direct access to NVIDIA experts

DGX STACKFully integrated Deep Learning platform

NVIDIA GPU CLOUDGPU-ACCELERATED CLOUD PLATFORMOPTIMIZED FOR DEEP LEARNING

Containerized in NVDocker

Optimization Across the Full Stack

Always Up-to-Date

Fully Tested and Maintained by NVIDIA

Now Available with Volta GPUs on AWS

Sign up: www.nvidia.com/gpu-cloud

29

Thank You

Documents

DEEP LEARNING / ARTIFICIAL INTELLIGENCE FOR ... · DEEP LEARNING / ARTIFICIAL INTELLIGENCE FOR ACCELERATED DATA ANALYTICS 5th December, 2017 Dr. Ettikan Kandasamy Karuppiah, Director