53
1 © 2017 The MathWorks, Inc. FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING VISION AND DEEP LEARNING ALGORITHMS USING MATLAB Avinash Nehemiah Joss Knight Girish Venkataramani

FROM DESKTOP TO CLOUD TO EMBEDDED GPUS - GTC On …on-demand.gputechconf.com/gtc/...desktop...matlab.pdf · FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING

  • Upload
    others

  • View
    31

  • Download
    0

Embed Size (px)

Citation preview

Page 1: FROM DESKTOP TO CLOUD TO EMBEDDED GPUS - GTC On …on-demand.gputechconf.com/gtc/...desktop...matlab.pdf · FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING

1© 2017 The MathWorks, Inc.

FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING VISION AND DEEP LEARNING ALGORITHMS USING MATLAB

Avinash Nehemiah Joss Knight Girish Venkataramani

Page 2: FROM DESKTOP TO CLOUD TO EMBEDDED GPUS - GTC On …on-demand.gputechconf.com/gtc/...desktop...matlab.pdf · FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING

2

Talk Outline

Design Deep Learning & Vision

Algorithms

High Performance Embedded

Implementation

Highlights • Manage large image sets• Automate image labeling• Easy access to models• Pre-built training

frameworks

Highlights Automate compilation of

MATLAB to CUDA 14x speedup over Caffe

& 4x speedup over TensorFlow

Accelerate and Scale Training

Highlights • Acceleration with GPU’s• Scale to clusters

Page 3: FROM DESKTOP TO CLOUD TO EMBEDDED GPUS - GTC On …on-demand.gputechconf.com/gtc/...desktop...matlab.pdf · FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING

3

Let’s Use Object Detection as an Example

TRUCK

SUVCAR

In our example we’ll use deep learning for object detection.

Page 4: FROM DESKTOP TO CLOUD TO EMBEDDED GPUS - GTC On …on-demand.gputechconf.com/gtc/...desktop...matlab.pdf · FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING

4

Two Approaches for Deep Learning

2. Fine-tune a pre-trained model (transfer learning)

1. Train a Deep Neural Network from Scratch

Page 5: FROM DESKTOP TO CLOUD TO EMBEDDED GPUS - GTC On …on-demand.gputechconf.com/gtc/...desktop...matlab.pdf · FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING

5

Transfer Learning Workflow

Transfer LearningImages

New Classifier

Learn New Weights

Modify Network Structure

Load Reference NetworkLabels

Training DataLabels: Car, Truck, Large Truck, SUV, Van

Alexnet, VGG-16, VGG-19, GoogLeNet

Page 6: FROM DESKTOP TO CLOUD TO EMBEDDED GPUS - GTC On …on-demand.gputechconf.com/gtc/...desktop...matlab.pdf · FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING

6

Manage Large Sets of Images

Transfer LearningImages

New Classifier

Learn New Weights

Modify Network Structure

Load Reference NetworkLabels

Handle Large Sets of Images

Easily manage large sets of images- Single line of code to access images- Operates on disk, database, big-data file system

imageData = imageDataStore(‘vehicles’)

Easily manage large sets of images- Single line of code to access images- Operates on disk, database, big-data file system

Organize Images in Folders( ~ 10,000 images , 5 folders)

Page 7: FROM DESKTOP TO CLOUD TO EMBEDDED GPUS - GTC On …on-demand.gputechconf.com/gtc/...desktop...matlab.pdf · FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING

7

Automate Ground Truth Labeling

Transfer LearningImages

New Classifier

Learn New Weights

Modify Network Structure

Load Reference NetworkLabels

Ground Truth Labeling

Page 8: FROM DESKTOP TO CLOUD TO EMBEDDED GPUS - GTC On …on-demand.gputechconf.com/gtc/...desktop...matlab.pdf · FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING

8

Automate Ground Truth Labeling

Page 9: FROM DESKTOP TO CLOUD TO EMBEDDED GPUS - GTC On …on-demand.gputechconf.com/gtc/...desktop...matlab.pdf · FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING

9

Access Reference Models in MATLAB

Transfer LearningImages

New Classifier

Learn New Weights

Modify Network Structure

Load Reference NetworkLabels

Easily Load Reference NetworksAccess Models with 1-line of MATLAB CodeNet1 = alexnetNet2 = vgg16Net3 = vgg19

Page 10: FROM DESKTOP TO CLOUD TO EMBEDDED GPUS - GTC On …on-demand.gputechconf.com/gtc/...desktop...matlab.pdf · FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING

10

Access Reference Models in MATLAB

Easily manage large sets of images- Single line of code to access images- Operates on disk, database, big-data file system

1. Reference Models

2. Model Importer

3. Tutorials

Page 11: FROM DESKTOP TO CLOUD TO EMBEDDED GPUS - GTC On …on-demand.gputechconf.com/gtc/...desktop...matlab.pdf · FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING

11

Modify Network Structure

Transfer LearningImages

New Classifier

Learn New Weights

Modify Network Structure

Load Reference NetworkLabels

Simple MATLAB API to modify layers:layers(23) = fullyConnectedLayer(5, 'Name','fc8');layers(25) = classificationLayer('Name',‘VehicleClassifier')

Page 12: FROM DESKTOP TO CLOUD TO EMBEDDED GPUS - GTC On …on-demand.gputechconf.com/gtc/...desktop...matlab.pdf · FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING

12

Training Object Detectors

Transfer LearningImages

New Classifier

Learn New Weights

Modify Network Structure

Load Reference NetworkLabels

Train Any Network trainNetwork(datastore, layers, options)

Frameworks for Computer Vision • Deep Learning: R-CNN, Fast R-CNN, Faster R-CNN• Machine Learning: ACF, Cascade Object Detectors

Page 13: FROM DESKTOP TO CLOUD TO EMBEDDED GPUS - GTC On …on-demand.gputechconf.com/gtc/...desktop...matlab.pdf · FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING

13

Visualizing and Debugging Intermediate Results

Filters…

Activations

Deep Dream

Training Accuracy Visualization Deep Dream

Layer Activations Feature Visualization

• Many options for visualizations and debugging• Examples to get started

Page 14: FROM DESKTOP TO CLOUD TO EMBEDDED GPUS - GTC On …on-demand.gputechconf.com/gtc/...desktop...matlab.pdf · FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING

14

Real World Systems Use More Than Deep Learning

Deep learning vehicle detector performance degraded with environmental effects ( fog etc. )

Fog Removal

Challenge: Deep learning frameworks do not include “classical” computer vision

Solution: Convert MATLAB code with deep learning and computer vision to embedded implementation

Page 15: FROM DESKTOP TO CLOUD TO EMBEDDED GPUS - GTC On …on-demand.gputechconf.com/gtc/...desktop...matlab.pdf · FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING

15

Talk Outline

Design Deep Learning & Vision

Algorithms

High Performance Embedded

ImplementationAccelerate and Scale

Training

Can you solve “real” problems for production systems with MATLAB ? Doesn’t it take hours or days to train ?

Page 16: FROM DESKTOP TO CLOUD TO EMBEDDED GPUS - GTC On …on-demand.gputechconf.com/gtc/...desktop...matlab.pdf · FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING

16

Problems of acceleration and scale

How can I make my code run faster? How can I scale up to bigger problems?

Will I have to learn new tools? Will I have to learn new concepts?

Page 17: FROM DESKTOP TO CLOUD TO EMBEDDED GPUS - GTC On …on-demand.gputechconf.com/gtc/...desktop...matlab.pdf · FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING

17

MATLAB and Parallel Computing

Accelerate your code on the GPU

– DEMO: Preprocess your image dataset

Scale to multi-GPU and clusters

– DEMO: Parallel training with an Amazon P2 cluster

Page 18: FROM DESKTOP TO CLOUD TO EMBEDDED GPUS - GTC On …on-demand.gputechconf.com/gtc/...desktop...matlab.pdf · FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING

18

Transfer Learning with MATLAB

Transfer LearningImages

New Classifier

Learn New Weights

Modify Network Structure

Load Reference NetworkLabels

Images

Learn New Weights

?

Page 19: FROM DESKTOP TO CLOUD TO EMBEDDED GPUS - GTC On …on-demand.gputechconf.com/gtc/...desktop...matlab.pdf · FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING

19

Page 20: FROM DESKTOP TO CLOUD TO EMBEDDED GPUS - GTC On …on-demand.gputechconf.com/gtc/...desktop...matlab.pdf · FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING

20

Page 21: FROM DESKTOP TO CLOUD TO EMBEDDED GPUS - GTC On …on-demand.gputechconf.com/gtc/...desktop...matlab.pdf · FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING

21

Page 22: FROM DESKTOP TO CLOUD TO EMBEDDED GPUS - GTC On …on-demand.gputechconf.com/gtc/...desktop...matlab.pdf · FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING

22

Page 23: FROM DESKTOP TO CLOUD TO EMBEDDED GPUS - GTC On …on-demand.gputechconf.com/gtc/...desktop...matlab.pdf · FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING

23

Page 24: FROM DESKTOP TO CLOUD TO EMBEDDED GPUS - GTC On …on-demand.gputechconf.com/gtc/...desktop...matlab.pdf · FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING

24

Page 25: FROM DESKTOP TO CLOUD TO EMBEDDED GPUS - GTC On …on-demand.gputechconf.com/gtc/...desktop...matlab.pdf · FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING

25

Page 26: FROM DESKTOP TO CLOUD TO EMBEDDED GPUS - GTC On …on-demand.gputechconf.com/gtc/...desktop...matlab.pdf · FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING

26

Page 27: FROM DESKTOP TO CLOUD TO EMBEDDED GPUS - GTC On …on-demand.gputechconf.com/gtc/...desktop...matlab.pdf · FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING

29

Statistics and Machine LearningDistributions, hypothesis testing, k-

means clustering, nearest neighbour

Neural Networks

Deep Learning, Neural Network training and simulation

Image Processing and Computer Vision

Feature detection, transformations, filtering, object analysis

Signal Processing and Communications

FFT filtering, cross correlation, BER simulations

Parallel Computing

Over 300 core MATLAB functions optimized for GPU

• Elementary math• Linear algebra• FFTs and IFFTs• Convolution and filtering• Fitting and interpolation• Reductions and sorting• Sparse matrix support• double, single and integer

support

Built-in function support

Page 28: FROM DESKTOP TO CLOUD TO EMBEDDED GPUS - GTC On …on-demand.gputechconf.com/gtc/...desktop...matlab.pdf · FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING

30

Programming with GPUs

GPU-optimized functions

Simple programming constructs

– gpuArray, gather

Writing kernels in the MATLAB language

– arrayfun

Interface with your own CUDA C and C++ code

– CUDAKernel, mexcuda

Ease

of U

se

Greater C

ontrol

Ease

of U

se

Greater C

ontrol

Prototyping Test Framework

Page 29: FROM DESKTOP TO CLOUD TO EMBEDDED GPUS - GTC On …on-demand.gputechconf.com/gtc/...desktop...matlab.pdf · FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING

31

MATLAB and Parallel Computing

Accelerate your code on the GPU

– DEMO: Preprocess your image dataset

Scale to multi-GPU and clusters

– DEMO: Parallel training with an Amazon P2 cluster

Page 30: FROM DESKTOP TO CLOUD TO EMBEDDED GPUS - GTC On …on-demand.gputechconf.com/gtc/...desktop...matlab.pdf · FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING

32

Deep learning on CPU, GPU, multi-GPU and clusters

More GPUs

Page 31: FROM DESKTOP TO CLOUD TO EMBEDDED GPUS - GTC On …on-demand.gputechconf.com/gtc/...desktop...matlab.pdf · FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING

33

Deep learning on CPU, GPU, multi-GPU and clusters

More GPUs

Mor

e C

PUs

Page 32: FROM DESKTOP TO CLOUD TO EMBEDDED GPUS - GTC On …on-demand.gputechconf.com/gtc/...desktop...matlab.pdf · FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING

34

Deep learning on CPU, GPU, multi-GPU and clusters

More GPUs

Mor

e C

PUs

Page 33: FROM DESKTOP TO CLOUD TO EMBEDDED GPUS - GTC On …on-demand.gputechconf.com/gtc/...desktop...matlab.pdf · FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING

36

MATLAB and Parallel Computing

Accelerate your code on the GPU

– DEMO: Preprocess your image dataset

Scale to multi-GPU and clusters

– DEMO: Parallel training with an Amazon P2 cluster

Page 34: FROM DESKTOP TO CLOUD TO EMBEDDED GPUS - GTC On …on-demand.gputechconf.com/gtc/...desktop...matlab.pdf · FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING

37

Transfer learning in 11 lines of MATLAB code

Page 35: FROM DESKTOP TO CLOUD TO EMBEDDED GPUS - GTC On …on-demand.gputechconf.com/gtc/...desktop...matlab.pdf · FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING

38

Page 36: FROM DESKTOP TO CLOUD TO EMBEDDED GPUS - GTC On …on-demand.gputechconf.com/gtc/...desktop...matlab.pdf · FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING

39

Performance

30-40% reduction in training time each time you double your GPUs

Communicating with the Cloud costs nothing extra

Page 37: FROM DESKTOP TO CLOUD TO EMBEDDED GPUS - GTC On …on-demand.gputechconf.com/gtc/...desktop...matlab.pdf · FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING

40

Talk Outline

Design Deep Learning & Vision

Algorithms

High Performance Embedded

ImplementationAccelerate and Scale

Training

Can you create high performance implementation from MATLAB code ?

Page 38: FROM DESKTOP TO CLOUD TO EMBEDDED GPUS - GTC On …on-demand.gputechconf.com/gtc/...desktop...matlab.pdf · FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING

41

Alexnet inference using MATLAB solution is~14x faster than pyCaffe and 60% faster than C++ Caffe~ 4x faster and ~3x less memory-use than TensorFlow

Presenting the MATLAB to CUDA parallelizing compiler

Why?

MATLAB

GPU Parallelization

Kernel creation

Memory allocation

Data transfer minimization

Page 39: FROM DESKTOP TO CLOUD TO EMBEDDED GPUS - GTC On …on-demand.gputechconf.com/gtc/...desktop...matlab.pdf · FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING

42

Sample Generated CUDA Code

MATLAB source code Auto-generated CUDA code

GPU Parallelization

Kernel creation

Memory allocation

Data transfer minimization

Page 40: FROM DESKTOP TO CLOUD TO EMBEDDED GPUS - GTC On …on-demand.gputechconf.com/gtc/...desktop...matlab.pdf · FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING

43

MATLAB to CUDA compiler flow

Control-flow graphIntermediate representation

(CFG – IR)

….

….

Identify loop-nests that will become CUDA kernels

….

Convert loop to CUDA kernelThread/blocks inferred from loop dims

Perform Use-def analysis.cudaMalloc GPU vars, insert memcpy

Infer data locality. Map to shared memory. Synthesize shared memory access

CUDA kernel optimizations

(×) cuBlas calls

(\) cuSolver calls

fft cuFFT calls

nnet cuDNN calls

Front – end

Traditional compiler optimizations

MATLAB Library function mapping

Parallel loop creation

CUDA kernel creation

cudaMemcpy minimization

Shared memory synthesis

CUDA code emission

Library function mapping

Parallel loop creation

CUDA kernel creation

cudaMemcpy minimization

Shared memory synthesis

Page 41: FROM DESKTOP TO CLOUD TO EMBEDDED GPUS - GTC On …on-demand.gputechconf.com/gtc/...desktop...matlab.pdf · FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING

44

MATLAB to CUDA compiler: It’s all about big parallel loops!

Control-flow graphIntermediate representation

(CFG – IR)

….

….

CUDA kernel optimizations

Front – end

Traditional compiler optimizations

MATLAB Library function mapping

Parallel loop creation

CUDA kernel creation

cudaMemcpy minimization

Shared memory synthesis

CUDA code emission

Scalarization

Loop perfectization

Loop interchange

Loop fusion

Scalar replacement

Loopoptimizations

Scalarization

Loop fusion

Scalar replacement

Page 42: FROM DESKTOP TO CLOUD TO EMBEDDED GPUS - GTC On …on-demand.gputechconf.com/gtc/...desktop...matlab.pdf · FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING

45

MATLAB to CUDA compiler: It’s all about big parallel loops!

Control-flow graphIntermediate representation

(CFG – IR)

….

….

CUDA kernel optimizations

Front – end

Traditional compiler optimizations

MATLAB Library function mapping

Parallel loop creation

CUDA kernel creation

cudaMemcpy minimization

Shared memory synthesis

CUDA code emission

Scalarization

Loop perfectization

Loop interchange

Loop fusion

Scalar replacement

Loopoptimizations

2 kernels (size N), 20*N bytes

1 kernel (size N), 16*N bytes

Page 43: FROM DESKTOP TO CLOUD TO EMBEDDED GPUS - GTC On …on-demand.gputechconf.com/gtc/...desktop...matlab.pdf · FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING

46

cudaMemcpy minimizationA(:) = ….C(:) = ….

for i = 1:N….gB = kernel1(gA);gA = kernel2(gB);if (some_condition)

gC = kernel3(gA, gB);end….

end

…. = C;

cudaMemcpy*definitely* needed

cudaMemcpy*not* needed

cudaMemcpy*may be* needed

Observations• Equivalent to Partial redundancy elimination (PRE) • Dynamic strategy – track memory location with a

status flag per variable• Use-Def to determine where to insert memcpy

A(:) = …A_isDirtyOnCpu = true;…for i = 1:N

if (A_isDirtyOnCpu)cudaMemcpy(gA, A);A_isDirtyOnCpu = false;

endgB = kernel1(gA);gA = kernel2(gB);if (somecondition)

gC = kernel3(gA, gB);C_isDirtyOnGpu = true;

end…

end…if (C_isDirtyOnGpu)

cudaMemcpy(C, gC);C_isDirtyOnGpu = false;

end… = C;

Assume gA, gB and gC are mapped to GPU memory

Generated (pseudo) code

Page 44: FROM DESKTOP TO CLOUD TO EMBEDDED GPUS - GTC On …on-demand.gputechconf.com/gtc/...desktop...matlab.pdf · FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING

47

Dotprod

Input image Conv. kernel Output image

rows

cols

kw

kh

Shared memory synthesis for stencil operations

For stencil operations, the MATLAB to CUDA compiler automatically• Infers GPU shared memory• Automates the collaborative loading in to shared memory block• Automatically translates access from global variable to shared-mem variable

Page 45: FROM DESKTOP TO CLOUD TO EMBEDDED GPUS - GTC On …on-demand.gputechconf.com/gtc/...desktop...matlab.pdf · FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING

48

Example: Compiling fog-rectification algorithm

Page 46: FROM DESKTOP TO CLOUD TO EMBEDDED GPUS - GTC On …on-demand.gputechconf.com/gtc/...desktop...matlab.pdf · FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING

49

MATLAB to CUDA Compilation in Computer Vision Applications

Distance transform

Fog removal

SURF feature extraction

Ray tracing

Stereo disparity

Page 47: FROM DESKTOP TO CLOUD TO EMBEDDED GPUS - GTC On …on-demand.gputechconf.com/gtc/...desktop...matlab.pdf · FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING

50

Deep learning prediction performance: Alexnet

Page 48: FROM DESKTOP TO CLOUD TO EMBEDDED GPUS - GTC On …on-demand.gputechconf.com/gtc/...desktop...matlab.pdf · FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING

51

Deep learning prediction performance: Alexnet

0

1

2

3

4

5

6

7

8

9CPU resident memory GPU peak memory (nvidia-smi)

Mem

ory

usag

e (G

B)

Batch Size1 16 32 64

CPU Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50 GHzGPU Tesla K40c

Py-C

affe

MAT

LAB

to C

UD

A co

mpi

ler

Tens

orFl

ow

MAT

LAB

on

CPU

+GPU

C++

-Caf

fe

Page 49: FROM DESKTOP TO CLOUD TO EMBEDDED GPUS - GTC On …on-demand.gputechconf.com/gtc/...desktop...matlab.pdf · FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING

52

Deep learning prediction performance: Alexnet

0

50

100

150

200

250

1 16 32 64 128

Fram

e ra

te (F

ps)

Batch Size

C++-Caffe

MATLAB to CUDA compiler

Jetson (Tegra) TX1

Page 50: FROM DESKTOP TO CLOUD TO EMBEDDED GPUS - GTC On …on-demand.gputechconf.com/gtc/...desktop...matlab.pdf · FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING

53

Deep Learning Prediction Performance: VGG-16

0

10

20

30

40

50

60

70

80

90

1 16 32 64

CPU Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHzGPU Tesla K40c

MATLAB to CUDA compiler

C++ Caffe

MATLAB running on CPU+GPU

TensorFlow

py Caffe

Fram

e ra

te (F

ps)

Batch Size

Page 51: FROM DESKTOP TO CLOUD TO EMBEDDED GPUS - GTC On …on-demand.gputechconf.com/gtc/...desktop...matlab.pdf · FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING

54

Create CNNs with MATLAB, Deploy with MATLAB to CUDA compiler

Alexnet Vehicle Detection

People detection Lane detection

~30 Fps(Tegra X1)

~66 Fps(Tegra X1)

~20 Fps(K40c)

~130 Fps(K40c)

Page 52: FROM DESKTOP TO CLOUD TO EMBEDDED GPUS - GTC On …on-demand.gputechconf.com/gtc/...desktop...matlab.pdf · FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING

55

Conclusions

Design Deep Learning & Vision

AlgorithmAccelerate and Scale

Training

Deep learning design is easy in MATLAB

Parallel Computing Toolbox

7x faster than pyCaffe2x faster than TensorFlow

MATLAB to CUDA compiler14x faster than pyCaffe

4x faster than TensorFlow1.6x faster than C++ Caffe

High Performance Embedded

Implementation

Page 53: FROM DESKTOP TO CLOUD TO EMBEDDED GPUS - GTC On …on-demand.gputechconf.com/gtc/...desktop...matlab.pdf · FROM DESKTOP TO CLOUD TO EMBEDDED GPUS DESIGNING, TRAINING, AND COMPILING

56

What Next?

www.mathworks.com/matlab-cuda-beta

MATLAB to CUDA compiler:Sign up for our beta program

Try Deep Learning with MATLAB

Visit our booth, we love to chat: Booth # 804