
MobileNet - PR044


Page 1: MobileNet - PR044

MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

29th October, 2017
PR12 Paper Review

Jinwon Lee, Samsung Electronics

Page 2: MobileNet - PR044

MobileNets

Page 3: MobileNet - PR044

Key Requirements for Commercial Computer Vision Usage

• Data centers (clouds)
  - Rarely safety-critical
  - Low power is nice to have
  - Real-time is preferable

• Gadgets – smartphones, self-driving cars, drones, etc.
  - Usually safety-critical (except smartphones)
  - Low power is a must-have
  - Real-time is required

Slide Credit : Small Deep Neural Networks - Their Advantages, and Their Design by Forrest Iandola

Page 4: MobileNet - PR044

What’s the “Right” Neural Network for Use in a Gadget?

• Desirable properties
  - Sufficiently high accuracy
  - Low computational complexity
  - Low energy usage
  - Small model size

Slide Credit : Small Deep Neural Networks - Their Advantages, and Their Design by Forrest Iandola

Page 5: MobileNet - PR044

Why Small Deep Neural Networks?

• Small DNNs train faster on distributed hardware

• Small DNNs are more deployable on embedded processors

• Small DNNs are easily updatable over-the-air (OTA)

Slide Credit : Small Deep Neural Networks - Their Advantages, and Their Design by Forrest Iandola

Page 6: MobileNet - PR044

Techniques for Small Deep Neural Networks

• Remove Fully-Connected Layers

• Kernel reduction (3x3 → 1x1)

• Channel Reduction

• Evenly Spaced Downsampling

• Depthwise Separable Convolutions

• Shuffle Operations

• Distillation & Compression

Slide Credit : Small Deep Neural Networks - Their Advantages, and Their Design by Forrest Iandola

Page 7: MobileNet - PR044

Key Idea : Depthwise Separable Convolution!

Page 8: MobileNet - PR044

Recap – Convolution Operation

[Figure: worked example of a standard convolution. Three 5x5 input channels are convolved with two 3x3x3 filters (one 3x3 kernel per input channel, summed across channels), producing two 3x3 output feature maps.]

Input channels: 3, output channels: 2, # of filters: 2
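A minimal NumPy sketch of this operation (stride 1, valid padding; shapes chosen to match the figure above) makes the key point explicit: every filter sums over all input channels.

```python
import numpy as np

# Standard convolution, valid padding, stride 1.
# x: (H, W, M) input, w: (K, K, M, N) filters -> (H-K+1, W-K+1, N) output
def conv2d(x, w):
    H, W, M = x.shape
    K, _, _, N = w.shape
    out = np.zeros((H - K + 1, W - K + 1, N))
    for n in range(N):                  # each filter yields one output channel
        for i in range(H - K + 1):
            for j in range(W - K + 1):
                # sum over the KxK window AND all M input channels
                out[i, j, n] = np.sum(x[i:i+K, j:j+K, :] * w[:, :, :, n])
    return out

x = np.random.rand(5, 5, 3)     # 5x5 input, 3 channels (as in the figure)
w = np.random.rand(3, 3, 3, 2)  # two 3x3x3 filters
print(conv2d(x, w).shape)       # (3, 3, 2)
```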

Page 9: MobileNet - PR044

Recap – VGG, Inception-v3

• VGG – uses only 3x3 convolutions
  - A stack of 3x3 conv layers has the same effective receptive field as a single 5x5 or 7x7 conv layer
  - Deeper means more non-linearities
  - Fewer parameters: 2 x (3 x 3 x C) vs (5 x 5 x C) (worked out below)
  - Regularization effect

• Inception-v3 – factorization of filters
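Worked out for an assumed channel count C = 64 (per-filter weights only, no biases):

```python
C = 64                           # assumed number of input channels
stacked_3x3 = 2 * (3 * 3 * C)    # two stacked 3x3 layers: 1152 weights
single_5x5 = 5 * 5 * C           # one 5x5 layer: 1600 weights
print(stacked_3x3, single_5x5)   # fewer parameters, same receptive field
```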

Page 10: MobileNet - PR044

Why should we always consider all channels?

Page 11: MobileNet - PR044

Standard Convolution

Depthwise Convolution

Figures from http://machinethink.net/blog/googles-mobile-net-architecture-on-iphone/

Page 12: MobileNet - PR044

Depthwise Separable Convolution

• Depthwise Convolution + Pointwise Convolution (1x1 convolution)

Depthwise convolution Pointwise convolution

Figures from http://machinethink.net/blog/googles-mobile-net-architecture-on-iphone/
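A minimal Keras sketch of this block (layer sizes in the usage example are illustrative assumptions, not from the slides; the paper places BatchNorm and ReLU after both the depthwise and the pointwise convolution):

```python
import tensorflow as tf

def depthwise_separable(x, out_ch, stride=1):
    # Depthwise: a single 3x3 filter per input channel, no cross-channel mixing
    x = tf.keras.layers.DepthwiseConv2D(3, strides=stride, padding="same",
                                        use_bias=False)(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.ReLU()(x)
    # Pointwise: 1x1 convolution that mixes channels and sets the output width
    x = tf.keras.layers.Conv2D(out_ch, 1, use_bias=False)(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.ReLU()(x)
    return x

# Hypothetical shapes for illustration: 32 input channels -> 64 output channels
inputs = tf.keras.Input(shape=(112, 112, 32))
outputs = depthwise_separable(inputs, out_ch=64)
print(outputs.shape)  # (None, 112, 112, 64)
```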

Page 13: MobileNet - PR044

Standard Convolution vs Depthwise Separable Convolution

Page 14: MobileNet - PR044

Standard Convolution vs Depthwise Separable Convolution

• Standard convolutions have a computational cost of
  DK x DK x M x N x DF x DF

• Depthwise separable convolutions cost
  DK x DK x M x DF x DF + M x N x DF x DF

• Reduction in computation: 1/N + 1/DK²

With 3x3 depthwise separable convolutions, this works out to roughly 8 to 9 times fewer computations.

DK : width/height of the filters
DF : width/height of the feature maps
M : number of input channels
N : number of output channels (number of filters)
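Plugging the two cost formulas into code for an assumed typical layer (3x3 kernels, 14x14 feature maps, 512 input and 512 output channels) reproduces the 8–9x figure:

```python
# Mult-add counts from the formulas above, for an assumed layer
DK, DF, M, N = 3, 14, 512, 512

standard  = DK * DK * M * N * DF * DF            # standard convolution
separable = DK * DK * M * DF * DF + M * N * DF * DF  # depthwise + pointwise

print(standard / separable)   # ~8.8x fewer mult-adds
print(1 / (1/N + 1/DK**2))    # same ratio via the 1/N + 1/DK^2 reduction
```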

Page 15: MobileNet - PR044

Depthwise Separable Convolutions

Page 16: MobileNet - PR044

Model Structure

Page 17: MobileNet - PR044

Width Multiplier & Resolution Multiplier

• Width Multiplier – Thinner Models
  For a given layer and width multiplier α, the number of input channels M becomes αM and the number of output channels N becomes αN, where α ∈ (0, 1] with typical settings of 1, 0.75, 0.5 and 0.25.

• Resolution Multiplier – Reduced Representation
  The second hyper-parameter for reducing the computational cost of a network is the resolution multiplier ρ, with 0 < ρ ≤ 1. It is typically set implicitly so that the input resolution of the network is 224, 192, 160 or 128 (ρ = 1, 0.857, 0.714, 0.571).

• Computational cost:
  DK x DK x αM x ρDF x ρDF + αM x αN x ρDF x ρDF
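A quick numeric check of this cost formula, for the same assumed layer as before and the typical settings α = 0.75, ρ = 0.714 (input resolution 160):

```python
# Cost with width multiplier alpha and resolution multiplier rho,
# relative to the baseline (alpha = rho = 1); layer sizes are assumptions
DK, DF, M, N = 3, 14, 512, 512
alpha, rho = 0.75, 0.714

cost = DK*DK * (alpha*M) * (rho*DF)**2 + (alpha*M) * (alpha*N) * (rho*DF)**2
baseline = DK*DK * M * DF*DF + M * N * DF*DF

print(cost / baseline)  # ~0.29, roughly alpha^2 * rho^2
```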

Page 18: MobileNet - PR044

Width Multiplier & Resolution Multiplier

Page 19: MobileNet - PR044

Experiments – Model Choices

Page 20: MobileNet - PR044

Model Shrinking Hyperparameters

Page 21: MobileNet - PR044

Model Shrinking Hyperparameters

Page 22: MobileNet - PR044

Results

Page 23: MobileNet - PR044

Results

PlaNet: 52M parameters, 5.74B mult-adds
MobileNet: 13M parameters, 0.58M mult-adds

Page 24: MobileNet - PR044

Results

Page 25: MobileNet - PR044

Results

Page 26: MobileNet - PR044

TensorFlow Implementation

https://github.com/Zehaos/MobileNet/blob/master/nets/mobilenet.py