
"Efficient Convolutional Neural Network Inference on Mobile GPUs," a Presentation from Imagination Technologies


Page 1: Efficient Convolutional Neural Network Inference on Mobile GPUs

Paul Brasnett, Imagination Technologies

May 3, 2016

Copyright © 2016 Imagination Technologies

Page 2: Overview

• About Imagination Technologies
• PowerVR GPUs
• Case study: Implementing Convolutions
• Performance Analysis
• Conclusions
• Resources

Page 3: About Imagination Technologies

• Imagination Technologies is a leading IP supplier for multimedia, processors and communications
• More than 8bn units containing Imagination IP shipped

[Diagram: an SoC fabric connecting PowerVR Graphics & GPU Compute Processors, PowerVR Video Processors, PowerVR Vision Processors, Ensigma Communications Processors and MIPS Processors]

Page 4: What is a Mobile GPU?

• A mobile GPU is optimised for high performance at low power

Page 5: What is a Mobile GPU?

• A mobile GPU is optimised for high performance at low power
• Target markets include mobile devices, automotive, consumer multimedia, wearables, Internet of Things and augmented reality

Page 6: Why Mobile GPUs for Vision Processing?

• CPUs can deliver high peak/burst performance
  • But generate large amounts of heat
• PowerVR mobile GPUs provide
  • Lowest-power FP16 & integer pipelines
  • Local memory for highly efficient data access for compute operations
  • Power-saving features such as gating of non-compute parts of the GPU for efficient compute operation

Page 7: Why Mobile GPUs for Vision Processing?

Performance relative to CPU (CPU = 100%):

  Benchmark                  CPU    PowerVR Series6
  Provence (raytracing)      100%    265%
  Particle Simulation – 32k  100%    407%
  Particle Simulation – 4k   100%    517%
  Julia Set                  100%    963%
  Ambient Occlusion          100%   1126%
  Denoise                    100%    482%
  Gaussian Blur              100%    383%

Page 8: Moving the CNN Workload to the GPU

[Block diagram: the CPU (CPU0, CPU1, few threads, large cache) enqueues a compute kernel through the host interface to the PowerVR GPU (graphics and compute). A coarse grain scheduler feeds multiprocessors (Unified Shading Clusters), each with residency slots, a common store, a compute store, a scheduler and a texture processing unit, backed by an L2 / system level cache and a system memory interface; CPU and GPU share unified system memory.]

Page 9: Evolution of Mobile GPU

• PowerVR Series 6 GPU → PowerVR Series 7 GPU → PowerVR Series 8 GPU

Page 10: Evolution of Mobile GPU

• New APIs: OpenCL 1.2, OpenCV, OpenVX, Vulkan, OpenCL 2.0

Page 11: GPU Dominates Compute in Modern SoCs

• The mobile GPU is increasingly dominating compute performance in SoCs

[Illustrative diagram only, to show relative CPU/GPU size]

Page 12: Why CNNs?

• State-of-the-art performance
• Rapid development cycles
• Range of vision tasks
  • Classification
  • Localisation
  • Other applications…

[Image: camera localisation example from PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization, Kendall, A., Grimes, M., Cipolla, R., ICCV 2015]

Page 13: What is a CNN?

• CNN architecture building blocks: Convolution, Activation, Normalization, Pooling, Fully Connected, Soft Max
• CNN example network: Image → Convolution → Activation → Pooling → Normalization → … → Convolution → Activation → Pooling → Fully Connected → Soft Max

Page 14: CNN Object Classification

• Training — Offline
  • Architecture + data → CNN library (compute + time) → model coefficients

Page 15: CNN Object Classification

• Training — Offline
  • Architecture + data → CNN library (compute + time) → model coefficients
• Inference — Online
  • Uses the trained architecture and model coefficients

Page 16: CNN Object Classification

• Training — Offline
  • Architecture + data → CNN library (compute + time) → model coefficients
• Inference — Online
  • Image + architecture + model coefficients → CNN library (compute, on the mobile GPU) → classification

Page 17: Where is the Cost in CNN Inference?

[Pie chart: FLOPs by layer type for AlexNet (Convolution, Normalisation, Pooling, Fully Connected); convolution accounts for the large majority of the FLOPs]

Page 18: Matrix Multiply — Naïve

• Create as many work-items as there are elements in the output matrix
• Each work-item reads its row of A and its column of B and produces one dot product
• Requires a large number of accesses to global memory

[Diagram: A × B = C]
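
A minimal sketch of such a naïve kernel, assuming row-major layouts and explicit dimension arguments (illustrative only, not the implementation used in the deck):

    // One work-item per output element: C[row][col] = dot(row of A, column of B).
    // Every multiply-accumulate reads both operands straight from global memory.
    __kernel void matmul_naive(__global const float *A,  // M x K, row-major
                               __global const float *B,  // K x N, row-major
                               __global float       *C,  // M x N, row-major
                               int M, int N, int K)
    {
        const int row = get_global_id(0);
        const int col = get_global_id(1);
        if (row >= M || col >= N) return;

        float acc = 0.0f;
        for (int k = 0; k < K; ++k)
            acc += A[row * K + k] * B[k * N + col];  // 2K global reads per work-item
        C[row * N + col] = acc;
    }

Launched over a 2D NDRange of at least M x N work-items, this performs one global read per operand per multiply, which is exactly the memory traffic the tiling approach on the following slides attacks.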

Page 19: OpenCL Memory Model

• The OpenCL memory model closely maps to GPU architecture
  • Private memory — per work-item
  • Local memory — shared within a work-group
  • Global memory / constant memory — visible to all work-groups
  • Host memory — typically shared between CPU and GPU on a mobile SoC
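
As a hedged illustration of how these spaces appear in OpenCL C, the toy kernel below touches each one; it is not taken from the presentation:

    __constant float scale = 2.0f;              // constant memory: read-only, all work-groups

    __kernel void memory_spaces(__global const float *in,  // global memory: all work-groups
                                __global float *out,
                                __local float *tile)       // local memory: one work-group
    {
        const int gid = get_global_id(0);
        const int lid = get_local_id(0);

        float x = in[gid];                      // private memory: this work-item only
        tile[lid] = x;                          // stage into the work-group's local memory
        barrier(CLK_LOCAL_MEM_FENCE);           // make the tile visible to the whole group

        out[gid] = tile[lid] * scale;
    }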

Page 20: Matrix Multiply — Tiling Approach

• Work-items load A data into private memory

[Diagram: A × B = C]

Tiling approach based on Volkov and Demmel, "Benchmarking GPUs to Tune Dense Linear Algebra", SC 2008

Page 21: Matrix Multiply — Tiling Approach

• Work-items load A data into private memory
• Work-groups load B data into local memory
• Each work-item reads from local memory and produces a dot product
• Significantly reduces global memory accesses

[Diagram: A × B = C]

Tiling approach based on Volkov and Demmel, "Benchmarking GPUs to Tune Dense Linear Algebra", SC 2008
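
A sketch of this tiling scheme, under stated assumptions (row-major layouts; M, N and K multiples of the tile width WG; a 2D NDRange of global size (M, N/WG) with local size (WG, 1)); it illustrates the idea rather than reproducing the deck's code:

    // Each work-group computes a WG x WG tile of C. Work-item lid owns one row of
    // the tile: it keeps WG partial sums in private memory while the current B
    // sub-block is staged once into local memory and shared by the whole group.
    #define WG 16

    __kernel __attribute__((reqd_work_group_size(WG, 1, 1)))
    void matmul_tiled(__global const float *A,   // M x K, row-major
                      __global const float *B,   // K x N, row-major
                      __global float       *C,   // M x N, row-major
                      int M, int N, int K)
    {
        const int lid     = get_local_id(0);
        const int row     = get_group_id(0) * WG + lid;  // row of C owned by this item
        const int colBase = get_group_id(1) * WG;        // first column of the C tile

        __local float Btile[WG][WG];   // B sub-block shared within the work-group
        float acc[WG];                 // private accumulators, one per output column
        for (int j = 0; j < WG; ++j) acc[j] = 0.0f;

        for (int k0 = 0; k0 < K; k0 += WG) {
            // Co-operative load: work-item lid copies row (k0 + lid) of the B block.
            for (int j = 0; j < WG; ++j)
                Btile[lid][j] = B[(k0 + lid) * N + colBase + j];
            barrier(CLK_LOCAL_MEM_FENCE);

            // Stream this work-item's slice of A through private memory and
            // accumulate against the shared B block: one global read of A per
            // WG multiply-accumulates, instead of one per multiply.
            for (int k = 0; k < WG; ++k) {
                const float a = A[row * K + k0 + k];
                for (int j = 0; j < WG; ++j)
                    acc[j] += a * Btile[k][j];
            }
            barrier(CLK_LOCAL_MEM_FENCE);
        }

        for (int j = 0; j < WG; ++j)
            C[row * N + colBase + j] = acc[j];
    }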

Page 22: Matrix Multiply — OpenCL Tips

• Choose the work-group size to fit the GPU; 32 work-items is typically a good choice for PowerVR GPUs
• Read multiple items (e.g. 4 or 8) into private memory at a time to optimise memory transfers
• Consider the half data type in place of float; most PowerVR platforms provide up to 2x the flops (see the sketch after this list)
• Define the work-group size at compile time:
  __attribute__((reqd_work_group_size(SIZE, 1, 1)))
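
A hypothetical fragment pulling these tips together: a compile-time work-group size, 4-wide vector loads, and half storage with float accumulation. It assumes the cl_khr_fp16 extension and a K that is a multiple of 4; the kernel is an illustration, not PowerVR reference code:

    #pragma OPENCL EXTENSION cl_khr_fp16 : enable

    // Dot product of two length-K rows stored as half.
    __kernel __attribute__((reqd_work_group_size(32, 1, 1)))  // fixed at compile time
    void row_dot_half(__global const half *A, __global const half *B,
                      __global half *out, int K)
    {
        const int row = get_global_id(0);
        float acc = 0.0f;                           // accumulate in float for accuracy
        for (int k = 0; k < K; k += 4) {
            // Read 4 items at a time to make better use of the memory bus.
            half4 a = vload4(0, A + row * K + k);
            half4 b = vload4(0, B + row * K + k);
            acc += dot(convert_float4(a), convert_float4(b));
        }
        out[row] = (half)acc;                       // half storage halves the bandwidth
    }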

Page 23: Matrix Multiply — Tiling Approach

[Chart: time (s), log scale from 0.1 to 1000, against matrix size for the naïve and tiled matrix multiply implementations]

Page 24: CNN Classification: AlexNet & GoogLeNet

                                   AlexNet   GoogLeNet
  Model coefficients (millions)    60         5.5
  Operations (billions)             1.3       3.1
  Top-5 error rate (%)             18.2      10.07

• Model coefficients drive bandwidth; operations drive compute

Page 25: Performance Analysis — CNN Inference

• Time consumed by layer type

[Pie charts: share of inference time across Convolutions, Pooling, Normalisation and Fully Connected layers. GoogLeNet, reference time*: 1.36; AlexNet, reference time*: 1.00]

Page 26: Performance Analysis — GPU v CPU*

[Bar chart: relative FPS performance on AlexNet, higher is better (scale 0 to 14): GPU, PowerVR 2-cluster GPU (480MHz) v CPU, ARM A15 (1.6GHz)]

* CPU results based on Caffe (with ATLAS)

Page 27: Efficiency Analysis — GPU v CPU

[Bar chart: relative efficiency on AlexNet, higher is better (scale 0 to 3.5): GPU, PowerVR 2-cluster GPU (480MHz) v CPU, ARM A15 (1.6GHz)]

Page 28: Conclusions

• Mobile GPUs are widely available in a range of SoCs across numerous markets today
• Compared to mobile CPUs, PowerVR mobile GPUs offer
  • up to 3x higher efficiency and
  • up to 12x higher performance for CNN deployment
• Newer CNN architectures with smaller fully connected layers help to make more efficient use of compute resources
• PowerVR GPUs scale to allow for higher levels of performance and lower power for current and future generations of vision-enabled products
• COME & SEE THE DEMO DURING THE NEXT BREAK

Page 29: Resources

• PowerVR GPU Compute
  • https://imgtec.com/tools/powervr-gpu-compute/
• Guide to writing OpenCL
  • http://blog.imgtec.com/powervr/a-quick-guide-to-writing-opencl-kernels-for-rogue
• PowerVR Imaging Framework
  • http://blog.imgtec.com/powervr/powervr-imaging-framework-sdk
• PowerVR CNN Demo
  • See our stand
• OpenCL Tutorial
  • https://handsonopencl.github.io/