Applying Ternary Quantization to Already Constrained Networks

Dibakar Gope, Jesse Beu, Ganesh Dasika, Urmish Thakker, Matthew Mattina

Arm ML Research Lab

tinyML meetup, Nov 2019 · Title: Microsoft PowerPoint - tinyML meetup Nov 2019 - Read-Only · Author: jesbeu01 · Created Date: 11/22/2019



© 2019 Arm Limited

Executive Summary

• ML algorithms are increasingly deployed in IoT devices

Challenge:
• Highly constrained memory budget
• Aggressive model compression is required to
  - Target severely constrained devices
  - Deploy more applications on them

Our solutions:
• Hybrid neural-tree network to improve over an already optimized IoT network
  - SysML 2019, https://arxiv.org/abs/1903.01531
• Per-layer hybrid filter banks to improve over the already optimized MobileNets
  - Under review, arXiv: https://arxiv.org/abs/1911.01028

[Figure: BBC Micro:Bit (Arm Cortex M0, 16KB RAM) and Apollo Lite (Arm Cortex M4, 32KB SRAM)]


Hybrid Neural-Tree Network Summary

• "DS-CNN": a highly optimized network for keyword spotting
• How do we optimize it further at iso-accuracy? Fit it into 32KB?
• Ternarize weight values using Strassen's algorithm: memory footprint reduced by 30%
• Selectively use decision trees to reduce compute: #Operations reduced by 12%
• < 0.3% loss in accuracy for these savings

[Figure: two plots comparing DS-CNN and HybridNet. Left: Accuracy (%) vs Memory Footprint (KB). Right: Accuracy (%) vs #Operations (M).]


Known Solutions

• Architectural optimizations, e.g., MobileNet

• Pruning

• Quantization

• Low rank factorizations of weights and activations


Our Strategy

• Optimize depthwise separable (DS) convolutional layers
  - Model complexity dominated by 1 x 1 filters
  - DS-CNN has 4 DS layers
  - Optimize DS-CNN ↔ optimize 1 x 1 filters
• Other techniques to reduce complexity
  - Binary/ternary quantization: loses accuracy
  - Model pruning: loses accuracy
• Our approach
  - Exploit StrassenNets, which approximates matrix multiplication using Strassen's algorithm
  - > 99% reduction in MULs for 3 x 3 filters, mostly ternary weights, preserves accuracy
  - Apply StrassenNets to 1 x 1 filters

[Figure: a 3 x 3 filter and a 1 x 1 filter, with the labels "Very compact" and "Over-parametrized"]


Standard Matrix Multiplication (8 MULs)

MULs for a 2 x 2 by 2 x 2 product: 2 x 2 x 2 = 8 MULs
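Written out for the 2 x 2 case, each of the four outputs is a sum of two products, which is where the 8 MULs come from. The entry names a..d and e..h follow the Strassen slides later in the deck; placing them row-major in the two operands is an assumption made here for illustration.

```latex
\begin{pmatrix} a & b \\ c & d \end{pmatrix}
\begin{pmatrix} e & f \\ g & h \end{pmatrix}
=
\begin{pmatrix} ae + bg & af + bh \\ ce + dg & cf + dh \end{pmatrix}
```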


Strassen's Matrix Multiplication (7 MULs)

• Turns Sum-of-Products into Sum-of-Product-of-Sums
  - But using fewer products overall
• Sum networks can be represented via ternary weights
• 7 intermediate terms


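Strassen's Matrix Multiplication (7 MULs)

[Figure: sum-product-sum network. Sums of the entries a, b, c, d of one operand (A) and e, f, g, h of the other (B), labeled activations and "weights", are multiplied point-wise into 7 terms P, which are then summed into the exact output C00, C01, C10, C11.]

The 7 terms (point-wise multiply):
p1 = (a)(f-h)
p2 = (a+b)(h)
p3 = (c+d)(e)
p4 = (d)(g-e)
p5 = (a+d)(e+h)
p6 = (b-d)(g+h)
p7 = (a-c)(e+f)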
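The sum-product-sum structure can be written as three ternary matrices around one point-wise multiply. The sketch below is illustrative only: the names `W_A`, `W_B`, `W_C` and `strassen_2x2` are not from the slides, and it assumes row-major operands A = [[a, b], [c, d]] and B = [[e, f], [g, h]]. The output combinations in `W_C` are the standard Strassen ones for these seven terms.

```python
import numpy as np

# Ternary sum matrices; each row builds one factor of p1..p7.
# Assumed layout: vec(A) = [a, b, c, d], vec(B) = [e, f, g, h].
W_A = np.array([
    [1, 0, 0, 0],    # a
    [1, 1, 0, 0],    # a + b
    [0, 0, 1, 1],    # c + d
    [0, 0, 0, 1],    # d
    [1, 0, 0, 1],    # a + d
    [0, 1, 0, -1],   # b - d
    [1, 0, -1, 0],   # a - c
])
W_B = np.array([
    [0, 1, 0, -1],   # f - h
    [0, 0, 0, 1],    # h
    [1, 0, 0, 0],    # e
    [-1, 0, 1, 0],   # g - e
    [1, 0, 0, 1],    # e + h
    [0, 0, 1, 1],    # g + h
    [1, 1, 0, 0],    # e + f
])
# Each row combines p1..p7 into one output entry.
W_C = np.array([
    [0, -1, 0, 1, 1, 1, 0],    # C00 = -p2 + p4 + p5 + p6
    [1, 1, 0, 0, 0, 0, 0],     # C01 = p1 + p2
    [0, 0, 1, 1, 0, 0, 0],     # C10 = p3 + p4
    [1, 0, -1, 0, 1, 0, -1],   # C11 = p1 - p3 + p5 - p7
])


def strassen_2x2(A, B):
    """Multiply two 2x2 matrices with 7 point-wise products."""
    p = (W_A @ A.reshape(4)) * (W_B @ B.reshape(4))  # the only 7 MULs
    return (W_C @ p).reshape(2, 2)
```

All three matrices contain only -1, 0, and 1, so everything except the 7 point-wise products reduces to additions and subtractions.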


StrassenNets (< 7 MULs)

[Figure: two sum-product-sum networks side by side. The exact network uses the 7 terms p1..p7 listed for Strassen's algorithm and gives the exact output C00, C01, C10, C11. The StrassenNets network uses far fewer terms (<< 7, here p1..p3, with unspecified sums "(…)") and gives an approximate output.]


Convert DS-CNN to Strassenified DS-CNN

• #Intermediate terms ↔ width of hidden layer
• Different width of hidden layer ↔ different accuracy vs. costs

[Figure: DS-CNN (Input, Conv1, DS-Conv1 to DS-Conv4, pooling layer, output layer) with its convolutional layers replaced by Strassen convolutions. In each Strassen convolution the intermediate terms between the input and weight sums and the output form a hidden layer, shown in a narrow and a wide variant.]
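To see why hidden-layer width trades MULs against ADDs, a rough per-position operation count for a 1 x 1 convolution may help. This is a sketch under stated assumptions, not the accounting used in the paper: the function names are hypothetical, the ternary matrices are treated as dense (one ADD per entry, an upper bound), and the weight-side sums are assumed to be folded into a precomputed length-r vector so that only r point-wise MULs remain.

```python
def conv1x1_ops(c_in, c_out):
    """Standard 1x1 convolution at one output position: one MAC
    (one MUL plus one ADD) per input/output channel pair."""
    macs = c_in * c_out
    return {"muls": macs, "adds": macs}


def strassen_conv1x1_ops(c_in, c_out, r):
    """Strassenified 1x1 convolution with hidden width r.

    Assumptions: dense ternary matrices of shape (r, c_in) and
    (c_out, r), each entry costing one ADD; weight-side sums
    precomputed, leaving r point-wise MULs.
    """
    return {"muls": r, "adds": r * c_in + c_out * r}
```

With 64 input and 64 output channels (illustrative sizes only), a hidden width of 64 cuts MULs from 4096 to 64 but raises the ADD bound from 4096 to 8192, while a width of 32 brings the bound back to 4096.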


DS-CNN vs Strassenified DS-CNN

[Figure: two plots comparing DS-CNN and Strassenified DS-CNN. Left: Accuracy (%) vs #Operations (M). Right: Accuracy (%) vs Model Size (KB).]

• 98% reduction in MULs, modest savings in model size
• > 50% increase in ADDs for 1 x 1 convolutions to achieve iso-accuracy
• Over-parameterized 3 x 3 convolutions do not have this problem
• See our paper for details


StrassenNets Pitfalls and Potential Solutions

• How do we address the increase in ADDs?
• Explore tree-based learning

Cons
• Bad feature extractor
• Poor prediction accuracy for complex applications
• > 10% drop in accuracy for keyword spotting

Pros
• Accuracy on par with neural models
• Computationally efficient compared to neural networks
  - e.g., Bonsai decision trees [ICML 2017]


Advantages & Limitations of StrassenNets & Decision Trees

[Figure: the DS-CNN pipeline (Input, Conv1, DS-Conv1 to DS-Conv4, pooling layer, output layer) next to a decision tree with Yes/No branches on the input x. Plot "DS-CNN vs. Strassenified DS-CNN vs. Decision Tree": Operations (M) vs Model Size (KB) for the three models, with a "Better" direction marker. Annotation on the decision tree: even this large tree cannot preserve accuracy.]


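Hybrid Neural-Tree Network for KWS

• 3 convolution layers (Conv1, DS-Conv1, DS-Conv2) instead of 5 layers in baseline DS-CNN
• Rich features $\mathbf{D}$ fed to decision tree
• Every tree node $k$ holds $\mathbf{W}_k, \mathbf{V}_k$; each internal node also holds $\boldsymbol{\theta}_k$ and branches (Yes/No) on $\boldsymbol{\theta}_k^\top \mathbf{D} > 0$
• The output sums three terms, one per node on the path that $\mathbf{D}$ takes through the depth-2 tree:

$$\mathbf{y} = \sum_{k \in \mathrm{path}(\mathbf{D})} \mathbf{W}_k \mathbf{D} \circ \tanh(\mathbf{V}_k \mathbf{D})$$

[Figure: baseline DS-CNN (Input x, Conv1, DS-Conv1 to DS-Conv4, Pooling, Output) next to the HybridNet (Input, Conv1, DS-Conv1, DS-Conv2, then a depth-2 tree). Both are Strassenified: StrassenNets (baseline DS-CNN w/ 5 Convs.) vs StrassenNets (HybridNet w/ 3 Convs. + Depth 2 tree), giving a reduction in computational cost.]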
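The tree part of the hybrid network can be sketched as a Bonsai-style predictor over the convolutional features D. This is a hedged illustration, not the authors' implementation: the function name, the heap-style node numbering, the array shapes, the element-wise product between the two factors, and the convention that "Yes" goes to the left child are all assumptions made for the example.

```python
import numpy as np


def hybrid_tree_predict(D, W, V, theta):
    """Depth-2 tree prediction on a feature vector D.

    D:     (d,) features from the last convolutional layer.
    W, V:  (7, n_classes, d) per-node matrices; node 0 is the root,
           nodes 1-2 are internal, nodes 3-6 are leaves (assumed).
    theta: (3, d) branching vectors for nodes 0-2.
    """
    node = 0
    y = np.zeros(W.shape[1])
    for _ in range(3):  # root, one internal node, one leaf
        # Every node on the path contributes W_k D * tanh(V_k D).
        y += (W[node] @ D) * np.tanh(V[node] @ D)
        if node < 3:
            # Assumed convention: theta_k . D > 0 ("Yes") goes left.
            go_left = theta[node] @ D > 0
            node = 2 * node + (1 if go_left else 2)
    return y
```

Only three of the seven nodes are evaluated per input, which is how a tree can cost less than the convolutional layers it replaces.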


DS-CNN vs Strassenified Hybrid Neural-Tree Network

[Figure: two plots comparing DS-CNN and Strassenified-HybridNet (8b Activations). Left: Accuracy (%) vs #Operations (M). Right: Accuracy (%) vs Overall Memory Footprint (KB).]

• Strassenified HybridNet compared to DS-CNN:
  - 11.1% reduction in Ops, 30.6% improvement in memory footprint
  - 0.27% loss in accuracy
  - 98.9% reduction in MULs, 12.2% reduction in ADDs (MACs = ADDs = MULs)
• Ops are dominated by ADDs, which are more area- and energy-efficient than MULs (3-30x [Horowitz 2014])


Ternary MobileNets via Per-layer Hybrid Filter Banks

arXiv: https://arxiv.org/abs/1911.01028


Apply Ternary Quantization to MobileNets

• MobileNets V1: 13 (1 x 1) convolutional layers

[Figure: MobileNets V1 architecture (photo credit: Google AI blog on MobileNets V1), with a 3 x 3 filter and a 1 x 1 filter and the labels "Very compact" and "Over-parametrized"]


Performance of MobileNet with Existing Ternary Quantization

[Figure: Accuracy (%) vs #MACs/ADDs Operations (M) for Baseline, TWN, and StrassenNets, with a "Better" direction marker.]

• TWN (NeurIPS 2016): drops accuracy by 9.6%
• StrassenNets (ICML 2018): increases ADDs by > 300%
• Increase in ADDs using StrassenNets ↔ wide hidden layers for closely approximating each 1 x 1 filter of MobileNets
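For context, threshold-based ternarization in the style of TWN can be sketched as follows. The 0.7 factor is the threshold approximation reported in the TWN paper; the function name and the sample weights used to exercise it are illustrative assumptions, and this is not the training procedure behind the numbers on the slide.

```python
import numpy as np


def ternarize_twn(w):
    """Approximate w by alpha * t, with t in {-1, 0, +1}.

    Threshold: delta = 0.7 * mean(|w|) (TWN approximation).
    Scale:     alpha = mean of |w| over entries above the threshold.
    """
    delta = 0.7 * np.mean(np.abs(w))
    mask = np.abs(w) > delta
    alpha = float(np.abs(w[mask]).mean()) if mask.any() else 0.0
    t = (np.sign(w) * mask).astype(np.int8)
    return alpha, t
```

Every filter is forced onto the same three levels with one shared scale, which fits with the accuracy drop reported for a network that is already compact.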


Different filters respond differently to quantization

• Use of wide hidden layers might be required for a few 1 x 1 filters
  - But NOT for all
• Filters capture different features ↔ respond differently to quantization


Different filters respond differently to quantization

Vertical lines detector:
-1 2 -1
-1 2 -1
-1 2 -1
L2-loss: 2 hidden units: 0.02, 4 hidden units: 0.0

Sharpen filter:
0 -1 0
-1 5 -1
0 -1 0
L2-loss: 2 hidden units: 0.09, 4 hidden units: 0.09, 8 hidden units: 0.01

Filters exist with similar value structure:
-1 2 -1 -1 2 -1 -1 2 -1   Vertical lines detector (flattened)
-1 -1 -1 2 2 2 -1 -1 -1   Horizontal lines detector (flattened)
Share common values at 5 places: corners and center

How do we exploit that?
1) Bank filters with similar value structure together
2) Use fewer hidden units to approximate
3) See our paper for details
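The "5 places" observation is easy to check numerically. The snippet below is a small illustration; the helper name `shared_positions` is hypothetical, and the horizontal detector is taken to be the transpose of the vertical one, which matches the flattened values above.

```python
import numpy as np

vertical = np.array([[-1, 2, -1],
                     [-1, 2, -1],
                     [-1, 2, -1]])
horizontal = vertical.T  # rows become [-1 -1 -1], [2 2 2], [-1 -1 -1]


def shared_positions(f1, f2):
    """Flattened (row-major) indices where two filters hold equal values."""
    return np.flatnonzero(f1.ravel() == f2.ravel())


shared = shared_positions(vertical, horizontal)
```

Here `shared` holds the flattened indices 0, 2, 4, 6, and 8, which are the four corners and the center of the 3 x 3 grid.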


MobileNet via per-layer hybrid filter banks

A layer with hybrid filter bank, fed by the previous depthwise convolutional layer:
• Precision-critical filters
  - 1) Traditional convolution
  - 2) Full-precision weights
• Quantization-tolerant filters
  - 1) StrassenNets
  - 2) Use narrow hidden layers
  - 3) Restrict increase in ADDs
• Channel concatenation of the two sets of outputs
• See our paper for details

[Figure: MobileNet (Input, Conv1, DS-Conv1 to DS-Conv4, FC layer, Output layer) with one of its convolutional layers expanded into a hybrid filter bank; matrix labels in the diagram: 𝐖, 𝐖, 𝑾 𝑣𝑒𝑐(𝑨).]
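One way to picture such a layer is as two 1 x 1 convolution paths whose outputs are concatenated along the channel axis. The sketch below works on a single spatial position and is only an illustration under assumptions: the function and parameter names are hypothetical, `a_tilde` stands for precomputed weight-side sums, and how filters are assigned to each path is not covered here (see the paper).

```python
import numpy as np


def hybrid_filter_bank_1x1(x, W_fp, W_b, a_tilde, W_c):
    """Hybrid 1x1 layer at one spatial position.

    x:       (c_in,) output of the previous depthwise layer.
    W_fp:    (n_fp, c_in) full-precision, precision-critical filters.
    W_b:     (r, c_in) ternary input sums (narrow hidden width r).
    a_tilde: (r,) full-precision vector, one MUL per hidden unit.
    W_c:     (n_q, r) ternary output sums for the
             quantization-tolerant filters.
    """
    y_fp = W_fp @ x                      # traditional convolution
    y_q = W_c @ ((W_b @ x) * a_tilde)    # Strassen-style path
    return np.concatenate([y_fp, y_q])   # channel concatenation
```

Keeping r small on the Strassen-style path is what limits the growth in ADDs, while the full-precision path is reserved for the filters that do not tolerate quantization.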


Performance of MobileNet with Hybrid Filter Banks

[Figure: two plots for Baseline, TWN, StrassenNets, and Hybrid Filter Banks, each with a "Better" direction marker. Left: Accuracy (%) vs Model Size (KB). Right: Accuracy (%) vs Energy/Inference (Normalized).]

• Hybrid Filter Banks:
  - 47% reduction in MULs, only 48% increase in ADDs (compared to > 300%)
  - 28% reduction in energy/inference, 51% reduction in model size
  - Comparable accuracy
  - No degradation in inference throughput


Conclusion

• Easier to compress over-parameterized networks
• Significantly more difficult to compress already highly optimized networks
• Hybrid Neural-Tree Network for highly optimized keyword spotting network
  - 30% reduction in memory footprint, 12% reduction in #Ops, comparable accuracy
  - SysML 2019: https://arxiv.org/abs/1903.01531
• Hybrid filter banks for highly optimized MobileNets
  - 50% reduction in model size, 28% improvement in energy, same throughput, comparable accuracy
  - Under review: https://arxiv.org/abs/1911.01028


Thank You