The Embedded Learning Library
The Embedded Learning Library (ELL)
Cross-compiler for AI pipelines, specialized for resource-constrained target platforms
https://github.com/Microsoft/ELL
AI pipeline → ELL → target machine code
• 3 years at Microsoft Research
• compiler toolchain, tutorials, model gallery
• focus: ARM CPUs, embedded GPUs; vision on ARM Cortex-A53; keyword spotting on ARM Cortex-M4F
Architecture
[Diagram components: importers, ELL trainers, dataset, pretrained model, target profiles, computation graph optimizer, ELL platform abstraction layer, LLVM emitter, OpenCL emitter, LLVM, OpenCL, BLAS, target]
AI compiler vs. AI runtime
why AI compiler?
• model-specific optimization
• target-specific optimization
• small executable
why AI runtime?
• portability
• seamless migration from cloud to edge
best of both worlds: just-in-time AI compiler
compression techniques:
• efficient architectures
• pruning
• low precision math and quantization
• low rank matrix approximation
Evaluation
small loss in accuracy, large gain in cost
Architecture search
model Pareto frontier
[Figure series, January 2018 through April 2018: ILSVRC 2012 top-1 accuracy (axis 30 to 70) vs. ms/image on RPi3 @ 700 MHz (axis 0 to 1000)]
Lossless acceleration
• variety of convolution kernels
• scheduling
• engineering
[Figure series, January 2019 through March 2019, lossless acceleration: ILSVRC 2012 top-1 accuracy (axis 30 to 70) vs. ms/image on RPi3 @ 700 MHz (axis 0 to 1000)]
Lossy acceleration
• mix and match compression techniques
• engineering/ML co-design
• during training vs. post-processing
Quantization semantics
binary (unsigned): bit 0 → 0, bit 1 → 1
binary (signed): bit 0 → -1, bit 1 → 1
ternary: bits 00 → 0, 01 → 1, 10 → n/a, 11 → -1
linear (unsigned): k bits → [0 ... 2^k - 1]
linear (signed): k bits → [-(2^(k-1) - 1) ... 2^(k-1) - 1]
exponential
lookup/clustered: k bits → lookup
iterative sum: k bits → a ± b ± c ± ... ± n
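A minimal sketch of how the binary, ternary, and unsigned linear mappings above could be decoded. The function names and the error handling for the unused ternary code `10` are assumptions made for illustration; they are not taken from ELL.

```python
# Hypothetical decoders for some of the encodings listed above (not ELL's API).

# Ternary codes from the table: 00 -> 0, 01 -> 1, 11 -> -1; 10 is unused (n/a).
TERNARY_CODES = {0b00: 0, 0b01: 1, 0b11: -1}


def decode_binary_unsigned(bit):
    """One bit mapped to {0, 1}."""
    return bit


def decode_binary_signed(bit):
    """One bit mapped to {-1, 1}: 0 -> -1, 1 -> 1."""
    return 2 * bit - 1


def decode_ternary(code):
    """Two bits mapped to {-1, 0, 1}; the unused code 0b10 is rejected."""
    if code not in TERNARY_CODES:
        raise ValueError(f"unused ternary code: {code:02b}")
    return TERNARY_CODES[code]


def decode_linear_unsigned(code, k):
    """k bits mapped to the integer range [0, 2^k - 1]."""
    if not 0 <= code < (1 << k):
        raise ValueError("code does not fit in k bits")
    return code


print([decode_ternary(c) for c in (0b00, 0b01, 0b11)])
```

In the ternary code the high bit acts as a sign flag and the low bit as a magnitude flag, which is the split used in the quantization example.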
Quantization representation
bit packed:
b3 b2 b1 b0 a3 a2 a1 a0
d3 d2 d1 d0 c3 c2 c1 c0
bit planes:
d0 c0 b0 a0
d1 c1 b1 a1
d2 c2 b2 a2
d3 c3 b3 a3
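The two layouts can be sketched in a few lines of Python. The helper names `pack_bits` and `to_bit_planes` are hypothetical, and placing element a in the lowest bit positions is an assumption read off the diagram, not a documented ELL convention.

```python
def pack_bits(values, bits):
    """Bit-packed layout: each element occupies `bits` consecutive bits.

    Element 0 lands in the lowest bits, matching "b3 b2 b1 b0 a3 a2 a1 a0".
    """
    mask = (1 << bits) - 1
    word = 0
    for i, v in enumerate(values):
        word |= (v & mask) << (i * bits)
    return word


def to_bit_planes(values, bits):
    """Bit-plane layout: plane j holds bit j of every element.

    Element i occupies bit i of each plane, matching "d0 c0 b0 a0".
    """
    planes = []
    for j in range(bits):
        plane = 0
        for i, v in enumerate(values):
            plane |= ((v >> j) & 1) << i
        planes.append(plane)
    return planes


a, b = 0b0101, 0b0011
print(f"{pack_bits([a, b], 4):08b}")
print([f"{p:02b}" for p in to_bit_planes([a, b], 4)])
```

With bit planes, a single AND followed by a population count processes one bit of every element in the word at once, which is what the quantization example relies on.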
Quantization example
ternary weights, 3-bit unsigned linear activations (bitplane)
activations: 5 1 7 6 3 4 2 5
weights:     1 -1 0 -1 -1 -1 1 0
dot = 5*1 + 1*(-1) + 7*0 + 6*(-1) + 3*(-1) + 4*(-1) + 2*1 + 5*0 = -7
Bit-plane encoding of the same vectors:
activations, bit 0: 1 1 1 0 1 0 0 1
activations, bit 1: 0 0 1 1 1 0 1 0
activations, bit 2: 1 0 1 1 0 1 0 1
weights, sign:      0 1 0 1 1 1 0 0
weights, magnitude: 1 1 0 1 1 1 1 0
Activation bit plane 0 (weight 1):
absSum: o = a & m
absSum += popcount(o)
negSum: o = o & s (same as a & s, because a set sign bit implies a set magnitude bit)
negSum += popcount(o)
o = 11101001 & 11011110 = 11001000
absSum += popcount(o) = 3
o = 11001000 & 01011100 = 01001000
negSum += popcount(o) = 2
Activation bit plane 1 (weight 2):
absSum: o = a & m
absSum += popcount(o) << 1
negSum: o = o & s
negSum += popcount(o) << 1
o = 00111010 & 11011110 = 00011010
absSum += popcount(o) << 1 = 3 + 2 * 3 = 9
o = 00011010 & 01011100 = 00011000
negSum += popcount(o) << 1 = 2 + 2 * 2 = 6
Activation bit plane 2 (weight 4):
absSum: o = a & m
absSum += popcount(o) << 2
negSum: o = o & s
negSum += popcount(o) << 2
total = absSum - 2 * negSum
o = 10110101 & 11011110 = 10010100
absSum += popcount(o) << 2 = 9 + 4 * 3 = 21
o = 10010100 & 01011100 = 00010100
negSum += popcount(o) << 2 = 6 + 4 * 2 = 14
total = 21 - 2 * 14 = -7
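The three bit-plane steps above can be reproduced with a short Python sketch. The function names are hypothetical and the code is illustrative only: it packs element 0 into the leftmost bit so that the words match the bit strings shown in the example, and it uses `bin(x).count("1")` as a portable popcount.

```python
def plane(bits):
    """Pack a list of 0/1 flags into one integer, first element leftmost."""
    word = 0
    for bit in bits:
        word = (word << 1) | bit
    return word


def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def ternary_bitplane_dot(activations, weights, activation_bits):
    """Dot product of unsigned activations with ternary weights in {-1, 0, 1}."""
    m = plane([1 if w != 0 else 0 for w in weights])  # magnitude plane
    s = plane([1 if w < 0 else 0 for w in weights])   # sign plane
    abs_sum = 0  # sum of activations where the weight is non-zero
    neg_sum = 0  # sum of activations where the weight is -1
    for j in range(activation_bits):
        a = plane([(x >> j) & 1 for x in activations])  # activation bit plane j
        o = a & m
        abs_sum += popcount(o) << j
        o &= s
        neg_sum += popcount(o) << j
    # Negative terms were counted once as positive, so subtract them twice.
    return abs_sum - 2 * neg_sum, abs_sum, neg_sum


activations = [5, 1, 7, 6, 3, 4, 2, 5]
weights = [1, -1, 0, -1, -1, -1, 1, 0]
print(ternary_bitplane_dot(activations, weights, 3))  # (-7, 21, 14)
```

The result agrees with the direct dot product of -7, and the intermediate sums of 21 and 14 match the totals reached after the third bit plane.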
instruction_count = 8 instructions * 3 bits = 24 instructions
vector size = 8
instructions per element = 24 / 8 = 3
if word is 128-bit (NEON):
instruction_count = 8 instructions * 3 bits + 0.3 reduce ops = 24.3 instructions
vector size = 128
instructions per element = 24.3 / 128 = 0.19 (5x faster than float)
Quantization performance
[Figure: speedup on ARM1176, quantized vs. full precision (axis 0 to 25), for 1-bit, 2-bit, 3-bit, and 8-bit quantization]
Quantized weight accuracy
[Figure: accuracy vs. original model (axis 0 to 1) against proportion of zeros in ternary weights (axis 0 to 0.7); annotations: model with binary weights, models with ternarized weights]
Quantized activation accuracy
[Figure: accuracy vs. real activations (axis 0 to 1) against quantized activation bit count (1 to 8); series: ternary weights, binary weights]
Current focus areas
• post-training lossy compression (pruning and quantization)
• engineering/ML training co-design
• infrastructure:
  - beating BLAS on embedded platforms
  - extending platform abstraction layer to embedded GPUs
  - global optimizer
Questions?
• https://microsoft.github.io/ELL/
• Code: https://github.com/Microsoft/ELL
• Model Gallery: https://microsoft.github.io/ELL/gallery/
Not every model is a winner