Extending the RISC-V ISA for Optimized Support of CNNs in a Multi-Core context RISC-V Summit Dec 3-6 2018 Santa Clara Eric Flamand, CTO & CoFounder of Greenwaves Technologies


Extending the RISC-V ISA for Optimized Support of CNNs in a

Multi-Core context

RISC-V Summit Dec 3-6 2018 Santa Clara

Eric Flamand, CTO & CoFounder of Greenwaves Technologies

Who are we?

• French-based startup created in 2015
• First product, GAP8, launched in Feb 2018

12/4/2018 RISC-V Summit Dec 2018 2

Our Market Vision

The IoT pipe (NB-IoT, LTE-M, Sigfox, LoRa, etc.): B/day to kB/day, battery operated sensors

Market demand: rich sensor data
• 8-bit, 160x120 @ 10 fps = 4.6 Mbit/s
• 24-bit @ 50 kHz = 1.2 Mbit/s
• Linear PCM = 1.4 Mbit/s

Applications
• Keyword spotting, beam forming, speech pre-processing
• Vibration analysis, fault detection
• Face detection, presence detection, counting, emotion detection

Our Market Vision (continued)

Processing between the rich sensor data (Mbit/s) and the IoT pipe (B/day to kB/day): CNN, SVM, Bayesian, Boosting, Cepstral analysis

Market demand + Low operation cost + Low deployment cost + Low installation cost = Massive deployment of intelligent rich data sensors

Issue: this requires far more MIPS than an MCU can deliver, yet it must stay within an MCU power envelope.

GAP8: An IoT Application Processor

[Block diagram. FC clock & voltage domain: Fabric Controller (L1, ROM, I$, Debug), L2 Memory, PMU, RTC, Micro DMA serving LVDS, Serial I/Q, UART, SPI, I2C, I2S, CPI, HyperBus, GPIO / PWM. Cluster clock & voltage domain: Core 0 to Core 7, Logarithmic Interconnect, Shared L1 Memory, Shared Instruction Cache, Cluster DMA, HW Sync, HWCE, Debug.]

Two independent clock and voltage domains, from 0-133 MHz at 1.0 V up to 0-250 MHz at 1.2 V

MCU function
• Extended RISC-V core
• Extensive I/O set
• Micro DMA
• Embedded DC/DC converters
• Secured execution / e-fuses

Computation engine function
• 8 extended RISC-V cores
• Fully programmable
• Efficient parallelization
• Shared instruction cache
• Multi channel DMA
• HW synchronization
• HW convolution engine (3 * 3x3)

An integrated, hierarchical architecture
• Deep sleep: 1 µA
• Retentive: 1 µA + x * 8 µA
• Pre-analysis: around 1 mW
• Inference: a few tens of mW

TSMC 55LP, 1.0 V to 1.2 V, max frequency 133 MHz to 250 MHz

GAP8: The open source heritage

RISC-V
• Best in class Instruction Set Architecture (ISA)
• UC Berkeley originated
• GWT is a member of the RISC-V Foundation

PULP
• Open source computing platform created by ETHZ and UniBo
• Permissive license (Solderpad)
• Multiple tape outs
• GWT contributes to PULP

GreenWaves
• Innovating on RISC-V and PULP
• Proprietary balanced system solution (SoC) based on PULP open source elements plus GWT proprietary elements, on both the HW and the SW/tools side


Extending the ISA – Impact on CNN centric applications

ISA Extension(s)

Given our 4-stage, in-order pipeline, how can we increase ILP with a moderate gate count increase for a given application family?

Group 1: Loop Kernels
• Zero overhead HW loop
• Post modified load/store, Reg/Reg load/store

Group 2: DSP/Linear Algebra
• Mac/Msu with optional normalization and rounding
• Add/sub/mult with optional normalization and rounding
• Clip, Min, Max, Abs

Group 3: Bit manipulation
• Insert/extract/set/clear/findfirst/findlast/countleadingbits/rotation
• PopulationCount

Group 4: Vectorial/SIMD, 4 Bytes or 2 Half Words
• Add/sub/avg/min/max/abs/shift/logical
• Shuffle/insert/extract/pack
• Dot product/sum of dot products

Group 5: Complex Numbers, Trellis
• Product/Conjugate/Rotation
• Max path/path selection
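To make the Group 2 operations concrete, here is a minimal portable C sketch of what "add with normalization and rounding", "mac" and "clip" compute. The function names and the exact rounding and clipping conventions are assumptions chosen for illustration; they are not taken from the GAP8 instruction encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Add with normalization and rounding: (a + b + 2^(n-1)) >> n.
   Assumes an arithmetic right shift for negative values, as on common compilers. */
static int32_t add_norm_round(int32_t a, int32_t b, unsigned n) {
    assert(n < 31);
    int32_t round = n ? (int32_t)1 << (n - 1) : 0;
    return (a + b + round) >> n;
}

/* Multiply-accumulate: acc + a * b. */
static int32_t mac(int32_t acc, int32_t a, int32_t b) {
    return acc + a * b;
}

/* Clip x to the signed range of `bits` bits: [-2^(bits-1), 2^(bits-1) - 1]. */
static int32_t clip(int32_t x, unsigned bits) {
    assert(bits >= 1 && bits <= 31);
    int32_t hi = ((int32_t)1 << (bits - 1)) - 1;
    int32_t lo = -hi - 1;
    return x < lo ? lo : (x > hi ? hi : x);
}
```

On a base RISC-V core each helper expands to several instructions (add, shift, compare, branch); the point of the extension is that each becomes a single instruction.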

PULP

Greenwaves


Performance and Power Measurement


Impact on CNN

• Selected layers
  • Convolution, fixed point. Will use 5x5
  • Convolution, binary. Will use 5x5
  • Max pooling. Will use 2x2 with stride 2
  • Average pooling. Will use 2x2 with stride 2
  • Linear

• Vectorization impact
  • Qx.y with x+y <= 15: vector of 2 signed short int
  • Qx.y with x+y <= 7: vector of 4 signed bytes

• Compared configurations
  • Pure RISC-V
  • GAP8 without vector (Groups 1, 2 and 3)
  • GAP8 with vector (Groups 1, 2, 3 and 4)
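As an illustration of the vectorization rule above, the following sketch quantizes real values to a signed 8-bit fixed-point format and packs four of them into one 32-bit word, which is the shape a 4-byte SIMD instruction operates on. The helper names, the round-to-nearest policy and the byte order inside the word are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a real value to signed fixed point with `frac` fractional bits,
   rounding to nearest and saturating to the int8_t range. */
static int8_t to_q7(float v, unsigned frac) {
    assert(frac <= 7);
    float s = v * (float)(1 << frac);
    long r = (long)(s >= 0.0f ? s + 0.5f : s - 0.5f);
    if (r > 127) r = 127;
    if (r < -128) r = -128;
    return (int8_t)r;
}

/* Pack 4 signed bytes into one 32-bit word, element 0 in the low byte. */
static uint32_t pack4(int8_t e0, int8_t e1, int8_t e2, int8_t e3) {
    return (uint32_t)(uint8_t)e0 | ((uint32_t)(uint8_t)e1 << 8) |
           ((uint32_t)(uint8_t)e2 << 16) | ((uint32_t)(uint8_t)e3 << 24);
}
```

A format that needs more than 7 value bits no longer fits a byte lane and falls back to the 2 x 16-bit vector form, halving the number of elements processed per instruction.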

Convolution

for (int c=0; c<(W-4); c++) {
    for (int l=0; l<(H-4); l++) {
        int R = Out[l*W+c]<<Norm;
        for (int kl=0; kl<5; kl++)
            for (int kc=0; kc<5; kc++)
                R += Filter[kl*5+kc]*In[(l+kl)*W + c+kc];
        Out[l*W+c] = R>>Norm;
    }
}

RiscV (inner loop, x5)

.L28:
    lb t5,0(a6)
    lb t4,0(a1)
    lb t3,2(a1)
    lb s10,2(a6)
    lb t1,4(a1)
    lb s8,4(a6)
    lb a7,6(a1)
    lb s7,6(a6)
    mul t4,t4,t5
    lb s6,8(a6)
    lb t5,8(a1)
    add a1,a1,10
    add a6,a6,t6
    mul t3,t3,s10
    add t4,t4,s9
    mul t1,t1,s8
    add t3,t3,t4
    mul a7,a7,s7
    add t1,t1,t3
    mul t5,t5,s6
    add a7,a7,t1
    add s9,t5,a7
    bne t0,a1,.L28

Gap8 NoVect (inner loop, x5)

    lp.setup x1,t3,(.L242)
    lb t5,0(t1)
    lb t4,0(a7)
    lb s10,1(t1)
    lb s9,1(a7)
    p.macs a6,t5,t4
    lb s8,2(t1)
    lb s7,2(a7)
    lb s6,3(t1)
    lb t6,3(a7)
    lb t5,4(t1)
    lb t4,4(a7)
    add t1,t1,5
    add a7,a7,a1
    p.macs a6,s10,s9
    p.macs a6,s8,s7
    p.macs a6,s6,t6
.L242:
    p.macs a6,t5,t4

Gap8 Vect

    lp.setup x1,s2,(.L68)
    lb a4,0(s3)
    p.lw a7,a1(s4!)
    p.lb t1,a1(s5!)
    sll a4,a4,a5
    pv.sdotsp.b a4,t3,t6
    pv.sdotsp.b a4,a6,t0
    pv.sdotsp.b a4,a0,t2
    pv.sdotsp.b a4,a2,s0
    pv.sdotsp.b a4,a7,s1
    pv.sdotsp.b a4,a3,t4
    pv.sdotsp.b a4,t1,t5
    sra a4,a4,a5
    p.sb a4,s7(s3!)
    mv t3,a6
    pv.shuffle2.b a3,t1,s6
    mv a6,a0
    mv a0,a2
.L68:
    mv a2,a7

Binary Convolution

    lp.setup x1,t3,(.L286)
    lhu a5,0(a5)
    p.bclr a2,a7,28,3
    or a2,a2,160
    p.extractur a5,a5,a2
    p.insert a6,a5,5,24
    srl a5,a6,1
    xor a5,a5,t4
    xor a2,t4,a6
    not a5,a5
    not a2,a2
    and a5,a5,t0
    lhu s3,0(t1)
    and a2,a2,t0
    p.cnt a5,a5
    p.cnt a2,a2
    sll a5,a5,8
    or a5,a5,a2
    pv.add.b a5,s3,a5
    add a7,a7,a3
    p.sh a5,t6(t1!)
    srl a6,a6,6
.L286:
    srl a5,a7,3

The PopCount used on plain RISC-V is not a naive implementation, but it still costs approx. 15 cycles.
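The xor / not / and / p.cnt sequence above is the classic XNOR-plus-popcount formulation of a binary convolution. The sketch below shows the idea in portable C for one 5x5 window, together with a branch-free popcount of the kind a core without a population count instruction has to fall back on. The bit encoding (bit 1 for +1, bit 0 for -1) and the function names are assumptions for illustration, not the layout used by the GAP8 kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Branch-free (SWAR) population count: about a dozen ALU operations
   on a core that has no dedicated popcount instruction. */
static uint32_t popcount32(uint32_t x) {
    x = x - ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;
    return (x * 0x01010101u) >> 24;
}

/* Binary dot product over a 5x5 window held in the low 25 bits.
   Values are in {-1, +1}, encoded as bits {0, 1} (assumed encoding). */
static int xnor_dot25(uint32_t in_bits, uint32_t w_bits) {
    const uint32_t mask = (1u << 25) - 1u;
    uint32_t agree = ~(in_bits ^ w_bits) & mask;   /* XNOR, keep 25 bits */
    int matches = (int)popcount32(agree);
    return 2 * matches - 25;                       /* matches - mismatches */
}
```

With a single-cycle population count, the whole window reduces to a handful of instructions, which is consistent with the Xnor rows of the results showing the gain coming from the bit manipulation group rather than from vectorization.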

Convolution

Cycles per output        RV 1C     Gap8 1C   Gap8 1C/Vect   Gap8 8C/Vect
Short Convolution 5x5    135.1     100.2     40.2           5.3
Short Xnor Conv 5x5      29.9      11.4      11.4           1.5
Byte Convolution 5x5     135.1     98.2      19.1           2.5
Byte Xnor Conv 5x5       29.9      11.4      11.4           1.5

pJ per output            RV 1C     Gap8 1C   Gap8 1C/Vect   Gap8 8C/Vect
Short Convolution 5x5    9674.2    7677.4    3008.9         1384.7
Short Xnor Conv 5x5      2379.8    864.4     864.4          374.3
Byte Convolution 5x5     9674.2    7524.2    1430.2         653.8
Byte Xnor Conv 5x5       2379.8    864.9     864.9          373.8
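The gains implied by these tables can be derived with two ratios. The helpers below are a small illustrative sketch (the names are hypothetical): one computes a gain against a baseline, the other the parallel efficiency of an N-core run relative to the single-core run.

```c
#include <assert.h>

/* Gain of an optimized figure over a baseline (cycles or pJ per output). */
static double gain(double baseline, double optimized) {
    assert(optimized > 0.0);
    return baseline / optimized;
}

/* Parallel efficiency: 1.0 means perfect scaling over n cores. */
static double parallel_efficiency(double cycles_1core, double cycles_ncores, int n) {
    assert(cycles_ncores > 0.0 && n > 0);
    return cycles_1core / (cycles_ncores * (double)n);
}
```

For the byte convolution row, `gain(135.1, 19.1)` is roughly 7 for the single-core ISA extensions alone, `parallel_efficiency(19.1, 2.5, 8)` is about 0.95, and `gain(9674.2, 653.8)` gives an energy gain of nearly 15 for the 8-core vectorized configuration.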

Pooling

Max Pooling

for (int c=0; c<Wo; c++)
    for (int l=0; l<Ho; l++)
        Out[l*Wo+c] = Max(Max(In[2*l*W+2*c], In[2*l*W + 2*c+1]),
                          Max(In[(2*l+1)*W+2*c], In[(2*l+1)*W + 2*c+1]));

RiscV

.L14:
    lbu a5,1(a2)
    lbu t4,0(a2)
    add a7,a7,1
    sll a6,a5,24
    sll t3,t4,24
    sra a6,a6,24
    sra t3,t3,24
    bge a6,t3,.L11
    mv a5,t4
    mv a6,t3
.L11:
    lbu t3,1(a4)
    add a2,a2,t5
    sll t4,t3,24
    sra t4,t4,24
    bge a6,t4,.L12
    mv a5,t3
.L12:
    lbu a6,0(a4)
    sll t4,a5,24
    sra t4,t4,24
    sll t3,a6,24
    sra t3,t3,24
    bge t4,t3,.L13
    mv a5,a6
.L13:
    sb a5,0(t1)

Gap8 No Vect

    lp.setup x1,a6,(.L216)
    p.lb a5,s2(a7!)
    p.lb t0,s2(t4!)
    p.lb a4,s2(t5!)
    p.lb t6,s2(t3!)
    p.max a5,a5,t0
    p.max a4,a4,t6
    p.max a5,a5,a4
.L216:
    p.sb a5,s0(t1!)

Gap8 Vect

    lp.setup x1,a3,(.L29)
    p.lh a4,t6(a0!)
    p.lh a5,t6(a2!)
    pv.max.b a5,a5,a4
    pv.extract.b a4,a5,0
    pv.extract.b a5,a5,1
    p.max a5,a4,a5
.L29:
    p.sb a5,t5(a1!)

Average Pooling

for (int c=0; c<Wo; c++)
    for (int l=0; l<Ho; l++)
        Out[l*Wo+c] = (In[2*l*W+2*c] + In[2*l*W + 2*c+1] +
                       In[(2*l+1)*W+2*c] + In[(2*l+1)*W + 2*c+1])>>2;

RiscV

.L21:
    lb a5,0(a4)
    lb t5,1(a4)
    lb t4,0(a2)
    lb t3,1(a2)
    add a5,a5,t5
    add a5,a5,t4
    add a5,a5,t3
    sra a5,a5,2
    sb a5,0(a7)
    add a6,a6,1
    add a4,a4,t1
    add a2,a2,t1
    add a7,a7,t0
    bne t6,a6,.L21

Gap8 No Vect

    lp.setup x1,a4,(.L227)
    p.lb a5,s2(a6!)
    p.lb t0,s2(t3!)
    p.lb t6,s2(t4!)
    p.lb t5,s2(t1!)
    add a5,a5,t0
    add a5,a5,t6
    p.addN a5,a5,t5,2
.L227:
    p.sb a5,s0(a7!)

Gap8 Vect

    lp.setup x1,a4,(.L39)
    p.lh a5,t0(a6!)
    p.lh t3,t0(t1!)
    pv.shuffle2.b a5,t3,t4
    pv.dotsp.sci.b a5,a5,1
    sra a5,a5,2
.L39:
    p.sb a5,t6(a7!)

Pooling

Cycles per output        RV 1C     Gap8 1C   Gap8 1C/Vect   Gap8 8C/Vect
Short 2x2/2 Max Pool     32.2      8.3       8.2            1.2
Short 2x2/2 Avg Pool     16.2      8.3       6.2            1.1
Byte 2x2/2 Max Pool      32.2      8.3       8.2            0.9
Byte 2x2/2 Avg Pool      16.2      8.3       7.2            1.1

pJ per output            RV 1C     Gap8 1C   Gap8 1C/Vect   Gap8 8C/Vect
Short 2x2/2 Max Pool     2268.7    598.1     597.2          293.0
Short 2x2/2 Avg Pool     1179.9    650.3     456.2          253.1
Byte 2x2/2 Max Pool      2268.7    597.5     596.8          215.0
Byte 2x2/2 Avg Pool      1179.4    650.4     531.3          254.1

Linear

for (int i=0; i<H; i++) {
    int R = Out[i]<<Norm;
    for (int j=0; j<(W>>2); j++) {
        R += In[4*j]*Filter[W*i+4*j];
        R += In[4*j+1]*Filter[W*i+4*j+1];
        R += In[4*j+2]*Filter[W*i+4*j+2];
        R += In[4*j+3]*Filter[W*i+4*j+3];
    }
    for (int j=4*(W>>2); j<W; j++) R += In[j]*Filter[W*i+j];
    Out[i] = R>>Norm;
}

RiscV

.L41:
    lb t3,0(a2)
    lb s5,0(a6)
    lb t1,1(a2)
    lb s4,1(a6)
    p.mul t3,t3,s5
    lb a7,2(a2)
    lb s3,2(a6)
    lb s1,3(a2)
    lb s2,3(a6)
    add a2,a2,4
    add a6,a6,4
    p.mul t1,t1,s4
    add t3,t3,t4
    p.mul a7,a7,s3
    add t1,t1,t3
    p.mul t3,s1,s2
    add a7,a7,t1
    add t4,t3,a7
    bne t5,a2,.L41

Gap8 NoVect

    lp.setup x1,t3,(.L259)
    lb t5,0(a7)
    lb t4,0(t1)
    lb s4,1(a7)
    lb s3,1(t1)
    p.macs a6,t5,t4
    lb s2,2(a7)
    lb s1,2(t1)
    lb t5,3(a7)
    lb t4,3(t1)
    add a7,a7,4
    add t1,t1,4
    p.macs a6,s4,s3
    p.macs a6,s2,s1
.L259:
    p.macs a6,t5,t4

Gap8 Vect

    lp.setup x1,t5,(.L99)
    p.lw t4,8(t6!)
    p.lw t3,8(t0!)
    p.lw t1,8(a2!)
    p.lw a7,8(t2!)
    pv.sdotsp.b a6,t4,t3
.L99:
    pv.sdotsp.b a6,t1,a7
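The vectorized listing replaces four byte loads and four multiply-accumulates per operand pair with one word load per operand and one "sum of dot products" instruction. The following is a portable C model of that idea, a sketch rather than the GAP8 implementation: `sdot4` is a hypothetical stand-in for a 4 x 8-bit signed dot product with accumulation, and the narrowing casts assume the usual two's complement behaviour.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of a sum of dot products on 4 signed bytes packed in 32-bit words. */
static int32_t sdot4(int32_t acc, uint32_t a, uint32_t b) {
    for (int k = 0; k < 4; k++) {
        int8_t ak = (int8_t)(a >> (8 * k));
        int8_t bk = (int8_t)(b >> (8 * k));
        acc += (int32_t)ak * (int32_t)bk;
    }
    return acc;
}

/* One output of a linear layer on byte data: 4 elements per step,
   followed by a scalar loop for the remaining 0 to 3 elements. */
static int32_t dot_bytes(const int8_t *in, const int8_t *filter, int w) {
    assert(w >= 0);
    int32_t r = 0;
    int j = 0;
    for (; j + 4 <= w; j += 4) {
        uint32_t a, b;
        memcpy(&a, in + j, 4);       /* one word load instead of four byte loads */
        memcpy(&b, filter + j, 4);
        r = sdot4(r, a, b);
    }
    for (; j < w; j++) r += in[j] * filter[j];
    return r;
}
```

The structure mirrors the C reference for the layer: a main loop over groups of four elements and a short tail loop, with normalization left to the caller.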

Linear

Cycles per sum of product   RV 1C    Gap8 1C   Gap8 1C/Vect   Gap8 8C/Vect
Short Linear                5.3      3.0       1.5            0.3
Byte Linear                 5.3      3.0       0.8            0.2

pJ per sum of product       RV 1C    Gap8 1C   Gap8 1C/Vect   Gap8 8C/Vect
Short Linear                382.5    241.3     122.7          64.2
Byte Linear                 382.3    241.0     62.3           34.6

Summary - Performance

Average speed gain from the ISA extensions: 3.6

Convolution: 80% of CNN workload

Summary - Energy Efficiency

Average energy gain from the ISA extensions: 3.4

Convolution: 80% of CNN workload


Memory Management

Handling large network with minimal energy overhead

Memory Management

[Diagram: Shared L1 (cores 1 to 8) <-> DMA <-> L2 <-> uDMA <-> External L3 (RAM/Flash), with a timeline of pipelined "L3 to L2", "L2 to L1" and "Exec" stages]

• GAP8 is not equipped with data caches
  • Silicon area
  • More important: energy efficiency, mostly due to hit ratio
• We can turn this weakness into an (energy) benefit if we can automate data transfers
• In practice a vast majority of traffic is predictable

Automatic data tiling and pipelined memory transfers, interleaved with parallel calls to the compute kernel, are handled by our "Autotiler" tool.
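The pattern of pipelined transfers interleaved with compute is essentially double buffering: while the kernel works on one tile in L1, the next tile is already being fetched from L2. The sketch below shows the control structure in portable C. It is an illustration under stated assumptions, not Autotiler output: `dma_copy` is a hypothetical stand-in implemented with a synchronous `memcpy` (a real DMA would run in the background and be waited on before use), and `TILE` is an arbitrary tile size.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TILE 64  /* hypothetical tile size in bytes that fits the L1 budget */

/* Stand-in for an asynchronous DMA transfer from L2 to L1. */
static void dma_copy(int8_t *dst, const int8_t *src, size_t n) {
    memcpy(dst, src, n);
}

/* Example compute kernel applied to one tile: sum of its elements. */
static int32_t kernel_sum(const int8_t *tile, size_t n) {
    int32_t s = 0;
    for (size_t i = 0; i < n; i++) s += tile[i];
    return s;
}

/* Process `total` bytes from L2 through two ping-pong L1 buffers. */
static int32_t process_tiled(const int8_t *l2_data, size_t total) {
    static int8_t l1[2][TILE];
    int32_t acc = 0;
    size_t offset = 0;
    size_t cur_len = total < TILE ? total : TILE;
    int cur = 0;

    if (total == 0) return 0;
    dma_copy(l1[cur], l2_data, cur_len);              /* prologue: first tile */
    while (cur_len > 0) {
        size_t next_off = offset + cur_len;
        size_t next_len = total - next_off;
        if (next_len > TILE) next_len = TILE;
        if (next_len > 0)                              /* prefetch next tile */
            dma_copy(l1[cur ^ 1], l2_data + next_off, next_len);
        acc += kernel_sum(l1[cur], cur_len);           /* compute current tile */
        offset = next_off;
        cur_len = next_len;
        cur ^= 1;                                      /* swap buffers */
    }
    return acc;
}
```

Because the access pattern is known ahead of time, the transfer for tile i+1 can always be issued before the compute for tile i starts, which is what lets explicit transfers replace a data cache.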

AutoTiler

Basic Kernels (usually seen as libraries)

How to handle a parametric tile
• Vectorization + parallelization
• No assumption on where the actual data are located

User Kernels (can be grouped and organized as generators)

Passing actual data to basic kernels and having data circulate between them
• A multi-dimensional iteration space (2D, 3D, 4D, 5D, ...) and a traversal order
• Each argument is a subspace of the iteration space and has actual dimensions, a location (L2, external) and properties. Its order may differ from that of the iteration space
• Given a memory budget, the auto tiler "tiles" each argument and generates a fully pipelined implementation interleaving processing and data transfers
• Basic Kernels are inserted at defined locations in the iteration space (prologue, body, epilogue, ...)
• Generated tiles are passed to Basic Kernels
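To give a feel for what "given a memory budget, tile each argument" involves, here is a deliberately simplified sketch for one KxK convolution tiled along rows. It assumes double-buffered input and output tiles and a single feature map; the function name and the cost model are hypothetical and far simpler than a real constraints solver.

```c
#include <assert.h>

/* Largest number of output rows per tile such that double-buffered input
   and output tiles fit in `budget` bytes. Returns 0 if nothing fits. */
static int max_tile_rows(int w, int h, int k, int elem_size, int budget) {
    assert(w >= k && h >= k && k > 0 && elem_size > 0);
    int wo = w - k + 1;                 /* output width */
    int ho = h - k + 1;                 /* output height */
    int best = 0;
    for (int rows = 1; rows <= ho; rows++) {
        /* `rows` output rows need rows + k - 1 input rows */
        long in_bytes = (long)(rows + k - 1) * w * elem_size;
        long out_bytes = (long)rows * wo * elem_size;
        if (2 * (in_bytes + out_bytes) <= budget) best = rows;
        else break;
    }
    return best;
}
```

For a 28x28 input, a 5x5 filter and 2-byte elements, a 4096-byte budget allows 17 output rows per tile under this model, so the 24 output rows would be produced in two tiles.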

AutoTiler

[Flow diagram]
• Inputs: Basic Kernels, User Kernels, groups of User Kernels, Generators (C programs calling the Autotiler's Model API, C libraries)
• Autotiler Library (constraints solver, C code generator)
• Compile & run on PC
• Output: C code for the target, handling data transfers and Basic Kernels dispatch on the cluster's cores

#include "AutoTilerLib.h"
#include "CNN_Generator.h"

void Mnist()
{
    CNN_TiledConvNxNReLUPool2x2_SW_fp("Conv5x5RLMP_0", 5, 1, 32, 28, 28, 1);
    CNN_TiledConvNxNReLUPool2x2_SW_fp("Conv5x5RLMP_1", 5, 32, 64, 12, 12, 1);
    CNN_TiledLinearLayer ("LinearLayerRL_2", 64, 4, 4, 10, 1, 0, 0);
}
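The dimensions in the three generator calls chain together if the numeric arguments are read as filter size, input features, output features, width and height. That reading is an assumption made here from the values themselves, not from the generator's documentation. Under it, a 5x5 convolution without padding followed by 2x2 pooling with stride 2 maps 28 to 12 and 12 to 4, which matches the second and third calls. A minimal check:

```c
#include <assert.h>

/* Output size along one dimension of a KxK convolution without padding,
   followed by 2x2 pooling with stride 2. */
static int conv_pool_out(int in, int k) {
    assert(in >= k && k > 0);
    return (in - k + 1) / 2;
}

/* Number of inputs seen by a linear layer fed with `feat` maps of w x h. */
static int linear_inputs(int feat, int w, int h) {
    return feat * w * h;
}
```

Here `conv_pool_out(28, 5)` gives 12, `conv_pool_out(12, 5)` gives 4, and `linear_inputs(64, 4, 4)` gives 1024 inputs for the final layer with 10 outputs.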


On real life networks

Key Word Spotting

Processing of 1 second of voice data at 1.0 V:

MFCC (FC)
• 170 ms at 3.3 mW: 560 µW average

CNN (cluster)
• SW version: 155 ms at 11.8 mW: 1.8 mW average
• HWCE version: 58 ms at 8.8 mW: 509 µW average

Total: 1.07 mW with HWCE, 2.36 mW in SW

[Power trace annotations: CNN on HWCE, avg power 8.79 mW, duration 58 ms; MFCC on FC, avg power 3.3 mW, duration 170 ms]

Google CNN:
• Conv 8x20, MaxPool 2x2/2, 1 InFeat, 32 OutFeat, W: 95, H: 40
• Conv 4x10, ReLU, InFeat 32, OutFeat 32
• Linear: 10 Outs
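The per-stage averages follow from duty cycling: a stage that draws P while active for t out of every period T contributes P * t / T on average. The helper below is a small sketch of that arithmetic (the function name is hypothetical); it ignores idle and leakage power, which the slide figures may or may not include.

```c
#include <assert.h>

/* Average power in microwatts of a stage that is active for `active_ms`
   out of every `period_ms`, drawing `active_mw` while active. */
static double avg_power_uw(double active_mw, double active_ms, double period_ms) {
    assert(period_ms > 0.0);
    return active_mw * 1000.0 * active_ms / period_ms;
}
```

With the measured values, `avg_power_uw(8.79, 58, 1000)` is about 510 µW for the CNN on the HWCE and `avg_power_uw(3.3, 170, 1000)` is about 561 µW for the MFCC, which sum to the 1.07 mW total quoted for the HWCE version.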

CNN Based Text Recognition

• Trainable parameters: 421 263
• Neurons: 1 511 904
• 33 ms per image

DRONET: RESNET based Autonomous Drone

• Developed by UZH and ETH-Z
• Autonomously follows a road and avoids collisions
• Up to 18 frames per second at maximum frequency
• At 1.0 V, FC at 50 MHz, cluster at 100 MHz: 6.5 fps at 40 mW

Conclusion

• Well-selected extensions can make a real difference at a very limited silicon area overhead.

• On CNNs we measure a factor of approx. 3.5 for both speed and energy efficiency on a single core.

• Parallelism brings another boost factor (24.4) on performance thanks to close to optimal scaling. The root cause is the architecture.

• More interestingly, parallelism contributes very significantly to the energy per operation improvement, with a factor of 2 on top of the ISA extension contribution, for a total of x7.4 vs. a single RISC-V core. Here too, the root cause is the architecture.

• These gains are further amplified by the ability to optimally manage memory transfers across the memory hierarchy.

• This enables the support of mid-complexity CNNs within an MCU-class power budget.


Thank You!