
Neuromorphic Computing based Processors · 2015-02-22 · 2/12/2015 3 5 Brain‐like Neuromorphic Circuits •Slow progress in neuromorphic hardware implementation −Lack of efficient


2/12/2015


Hao Jiang

A collaborative research project among San Francisco State University, EI-Lab at the University of Pittsburgh, HP Labs, and AFRL

Neuromorphic Computing based Processors


Outline

• Why Neuromorphic Computing?

• Challenges and New Opportunity

• Spiking Neuromorphic Design

• A Framework of Heterogeneous Computing Systems

• Conclusion


Why Neuromorphic Computing?

• The von Neumann architecture faces severe challenges
– The von Neumann bottleneck
– Inefficiency in cognitive computations
• The human brain is highly efficient
– 100 TFLOPS vs. 20 W
– Highly connected: 50 B neurons and 10¹⁴ synapses
– Very light: 3 lbs

[Figure: von Neumann machine (computation & control, tape) vs. the human head]

Neuromorphic Design by Leveraging Memristor Technology


Brain – The Most Efficient Computing Machine

• Brain: 15–30 B neurons, 35 W
– Extremely complex: 4 km/mm³
• Neocortex: 6 layers; signals travel within and between layers
– Gray matter and white matter
• Neuron: processes signals from other neurons
• Synapse: memory; weights signals

[Figure: neural network]


Brain-like Neuromorphic Circuits

• Slow progress in neuromorphic hardware implementation
– Lack of efficient synapse designs
– Poor support for massive connectivity
• Highly parallel, ultra power efficient, flexible, extremely robust
• Real-world input, human-friendly output, data-friendly


Outline

• Why Neuromorphic Computing?

• Challenges and New Opportunity

• Spiking Neuromorphic Design

• A Framework of Heterogeneous Computing Systems

• Conclusion


Challenges in the Traditional Approach

Developing and implementing neural network models on large-scale computer clusters or supercomputers.

• Performance challenge: 100M MIPS
• Energy challenge: 10–20 megawatts, comparable to the consumption of 11,000 U.S. households


Traditional Analog Approach

$Y = f(WX)$, where X is the input vector, W the weight matrix, and Y the output vector.

• Weight
– Implementation: floating gates, capacitors, etc.
– Difficulties: volatile data, low precision, control signals
– Scaling: O(N²) weight carriers
• Compute
– Implementation: op-amps, analog voltage multipliers and differentiators
– Difficulties: voltage offset, noise generation, voltage saturation
– Scaling: O(N²) voltage multipliers
• The difficulties are intrinsic and hard to overcome.
• The approach is successful in small-scale systems, but design complexity, power, and area grow very fast.


Latest Progress

• Numenta HTM
• Micron Automata
• IBM TrueNorth


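Memristor – Rebirth of the Analog Approach

Memristor
• Natural weight carriers:
– Non-volatility, high density
– Analog resistance states
– Two-terminal programming

Memristor Crossbar
• Natural weight summation
• MIMO operation: avoids sneak paths
• Cost ~O(N), not O(N²)

$M = \alpha R_L + (1 - \alpha) R_H$

$I = \frac{V_{M_1}}{M_1} + \frac{V_{M_2}}{M_2} + \dots + \frac{V_{M_n}}{M_n}$

where R_L and R_H are the low and high resistance states, 0 ≤ α ≤ 1 is the device state, and V_{M_k} is the voltage across memristor M_k.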
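The two formulas above can be exercised directly. The C sketch below evaluates the linear state model, inverts it to find the state needed for a target resistance (a hypothetical helper for programming a weight), and sums a column current with Ohm's law per device and Kirchhoff's current law on the shared wire. The function names and the 300 Ω / 700 Ω values used in the example are illustrative assumptions, not device parameters from the slides.

```c
#include <stddef.h>

/* Linear state model: M = R_L * alpha + R_H * (1 - alpha),
 * where alpha in [0, 1] is the device state (1 = fully low-resistance). */
double memristor_resistance(double alpha, double r_low, double r_high) {
    return r_low * alpha + r_high * (1.0 - alpha);
}

/* Inverse of the model: the state alpha needed to reach a target
 * resistance m (hypothetical helper). Assumes r_high != r_low. */
double memristor_state(double m, double r_low, double r_high) {
    return (r_high - m) / (r_high - r_low);
}

/* Column current I = V_M1/M1 + ... + V_Mn/Mn: Ohm's law per device,
 * summed on the shared output wire by Kirchhoff's current law. */
double column_current(const double *v, const double *m, size_t n) {
    double total = 0.0;
    for (size_t k = 0; k < n; ++k)
        total += v[k] / m[k];
    return total;
}
```

For example, memristor_resistance(0.5, 300, 700) gives 500 Ω, and memristor_state(500, 300, 700) recovers α = 0.5.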


Memristor – Rebirth of Neuromorphic Circuits

• Memristor ↔ Synapse
– Two-terminal, high density
– Non-volatility
– Analog/multi-level states
• Crossbar ↔ Network
– Natural matrix function
– A MIMO system
– Natural fit for memristors

[Figure: TaN1+x memristor device (HP Lab, 2012)]
[Figure: memristor crossbar network with word lines WL_i, bit lines BL_j, cells m_i,j, input V_I, output V_O, and neurons N_1 … N_n (EI Lab, DAC’12; EI Lab, APL’13)]
[Figure: EI Lab & HP Lab TiN-TaOx device: resistance (Ω) and programming voltage (V) vs. pulse number (0–70), with pulse amplitude growing linearly]


Outline

• Why Neuromorphic Computing?

• Challenges and New Opportunity

• Spiking Neuromorphic Design

• A Framework of Heterogeneous Computing Systems

• Conclusion


Spiking‐based Neuromorphic Computing

• Why spiking?

– Inspired by human brains

– Minimized transition electrical charge

– Reduced data communication distance

– High parallelism in processing

• Approaches in hardware system

– Analog and digital circuit blocks, capacitors

– Crossbar arrays based on SRAM, PCM, or memristor cells

– In this work: memristor-based crossbar arrays as synapses


Spiking Neuromorphic System

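[Figure: spike-encoded inputs (010, 101) → integrate and fire → counters (100)]

• Spiking-based computation system
– Closer to biological systems
– Power efficiency
– High reliability
• Matrix computation transformation
– Mathematical matrix computation: $I_j = \sum_{i=1}^{m} g_{ij} V_i$
– Memristor-based crossbar array for matrix computation: inputs V_1 … V_m drive the rows, conductances g_11 … g_mn sit at the cross-points, and output currents I_1 … I_n are collected on the columns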
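The matrix computation above can be sketched in a few lines of C. This is an idealized model that assumes a row-major conductance array with one row per input voltage and one column per output current; wire resistance, sneak paths, and device nonidealities are ignored, and the function name is hypothetical.

```c
#include <stddef.h>

/* Ideal memristor crossbar: m rows driven by input voltages v[0..m-1],
 * n columns collecting output currents, and conductance g[i * n + j]
 * at cross-point (i, j). Each column current is I_j = sum_i g_ij * V_i. */
void crossbar_currents(const double *g, const double *v, double *i_out,
                       size_t m, size_t n) {
    for (size_t j = 0; j < n; ++j) {
        double acc = 0.0;
        for (size_t i = 0; i < m; ++i)
            acc += g[i * n + j] * v[i];   /* current through cell (i, j) */
        i_out[j] = acc;                   /* summed on column j */
    }
}
```

In the physical array all columns settle in parallel; the nested loop only serializes that behavior for simulation.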


Computation Methodology

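Traditional integrate-and-fire model, with rate coding

[Figure: integrate-and-fire neuron (membrane capacitor C_m, membrane voltage V_m, threshold V_th, output V_out) driven by rate-coded input spikes (t_s, T); in the example, input i carries m_i = 2 spikes]

Ideal: $n_j \propto \sum_{i=1}^{N} g_{ij} m_i$, where m_i is the number of input spikes on row i and n_j is the number of output spikes of neuron j.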

Working mechanism:

• V_m < V_th: C_m integrates.
• V_m ≥ V_th: a spike is fired out, then C_m is reset to 0.
• Input spike present: weighted current flows into the integrator.
• No input spike: no current flows to or from the integrator.
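The working mechanism can be checked with a small time-stepped simulation. In this C sketch, C_m = 50 fF and V_th = 0.5 V are the values given later in the slides, while the input spike amplitude, spike duration, time step, and the assumption that input spikes occupy aligned back-to-back slots are hypothetical choices. The struct and function names are also illustrative.

```c
#include <stddef.h>

/* Neuron parameters. c_m and v_th follow the slides; v_in, t_spike,
 * and dt are assumptions for this sketch. */
typedef struct {
    double c_m;     /* integration capacitance (F) */
    double v_th;    /* firing threshold (V) */
    double v_in;    /* amplitude of an input spike (V) */
    double t_spike; /* duration of one input spike (s) */
    double dt;      /* simulation time step (s) */
} IfParams;

/* Time-stepped integrate-and-fire for column j. g[i] is the conductance
 * of cell (i, j) and m[i] the number of input spikes on row i. Row i is
 * assumed active in its first m[i] aligned slots of length t_spike.
 * Returns the output spike count n_j. */
int if_column(const double *g, const int *m, size_t rows, const IfParams *p) {
    int slots = 0;
    for (size_t i = 0; i < rows; ++i)
        if (m[i] > slots) slots = m[i];
    long steps = (long)(p->t_spike / p->dt + 0.5);
    double v_m = 0.0;
    int fired = 0;
    for (int s = 0; s < slots; ++s) {
        double i_in = 0.0;                 /* weighted current from active rows */
        for (size_t i = 0; i < rows; ++i)
            if (s < m[i]) i_in += g[i] * p->v_in;
        for (long k = 0; k < steps; ++k) {
            v_m += i_in * p->dt / p->c_m;  /* V_m < V_th: C_m integrates */
            if (v_m >= p->v_th) {          /* V_m >= V_th: fire, reset to 0 */
                ++fired;
                v_m = 0.0;
            }
        }
    }
    return fired;
}

/* Ideal rate-coded count: n_j = V_in * t_spike * sum_i g_ij m_i / (C_m V_th). */
double if_ideal(const double *g, const int *m, size_t rows, const IfParams *p) {
    double sum = 0.0;
    for (size_t i = 0; i < rows; ++i)
        sum += g[i] * m[i];
    return p->v_in * p->t_spike * sum / (p->c_m * p->v_th);
}
```

With g = {1e-4, 2e-4} S, m = {2, 1}, V_in = 1 V, t_spike = 5 ns, and dt = 1 ps (all assumed), the ideal count is 80 and the simulation lands slightly below it, because each reset discards the small overshoot above V_th.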


Memristor crossbar array structure

[Figure: I–V characteristics of the memristor and the selector; current (A) on a log scale from 10⁻¹⁴ to 10², voltage from −3 to 3 V]

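• 1S1M (one-selector, one-memristor) cell
– Alleviates the impact of sneak-path leakage
– A thin-film selector is placed after each memristor
– Minimal unit cell area of 4F²
• Selector property and operation
– Spike occurs: selector ON, with R_s_on << R_M
– No spike: selector OFF, with R_s_off >> R_M
– Effective cell conductance (selector conductance g_s in series with memristor conductance g_ij): $\tilde{g}_{ij} = \frac{g_{ij} \cdot g_s}{g_{ij} + g_s} \approx g_{ij}$ when the selector is ON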
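The series-conductance expression above explains why the selector barely disturbs an active cell yet isolates an idle one. This C sketch evaluates it; the example conductances (a 10 kΩ memristor, a 10 Ω ON selector, a 10 GΩ OFF selector) are illustrative assumptions, not measured values from the slides.

```c
/* Effective conductance of a selector (g_s) in series with a memristor (g_m):
 * g_eff = g_m * g_s / (g_m + g_s).
 * Selector ON  (R_s_on  << R_M, so g_s >> g_m): g_eff approaches g_m.
 * Selector OFF (R_s_off >> R_M, so g_s << g_m): g_eff approaches g_s (nearly 0). */
double series_conductance(double g_m, double g_s) {
    return g_m * g_s / (g_m + g_s);
}

/* Relative deviation of the cell conductance from the ideal memristor value. */
double selector_error(double g_m, double g_s) {
    return (g_m - series_conductance(g_m, g_s)) / g_m;
}
```

With g_m = 1e-4 S and an ON selector of 0.1 S, the cell deviates from the memristor alone by about 0.1%; with an OFF selector of 1e-10 S, the cell conducts less than 1e-10 S.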


High-speed Integrate-and-Fire Circuit (H-IFC)

• V_REF = 33% of V_in


Integrate and Fire Circuit

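[Figure: integrate-and-fire circuit schematic (input V_in, memristor R_mem, input current I_in, capacitor C_m with voltage V_Cm, comparator with threshold V_TH, output V_OUT) and V_OUT/V_Cm waveforms]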

• Structure and property
– Integration capacitor, reset transistor, and comparator with positive feedback
– High speed
– V_th is much smaller than V_dd
• Power and area
– ~100 μW
– 28 μm × 12 μm (180 nm technology)


System Verification

[Figure: spiking inputs applied to the rows of an n × n memristor crossbar (g_11 … g_nn); each column current I_1 … I_n drives an integrate-and-fire (I&F) circuit with capacitor C_m]


Output Pulse Number

[Figure: output pulse number vs. sensing capacitance, pulse duration, input voltage, comparator triggering voltage, memristor conductance, and selected row]


Theoretical Computation vs. Simulation Results

[Figure: output spike number (0–100) vs. $\sum_{i=1}^{N} m_i g_{ij}$ (0–3.5 × 10⁻⁴); ideal linear curve and simulation result]

Parameters: V_th = 0.5 V, C_m = 50 fF, V_dd = 1.8 V

• Real behavior: nonlinear
• The nonlinearity depends on the sum of weighted signals

Reasons:
• Reset time
• IFC delay

Optimization:
• Larger C_m
• Higher-speed IFC

The circuit can be used in neural networks.
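One way to see why reset time and IFC delay bend the curve below the ideal line, and why a larger C_m or a faster IFC helps, is a first-order dead-time model. This is an assumption, not the authors' analysis: after every output spike the circuit is taken to be blind for t_d seconds, and the input current is held constant over the window. The function names and t_d values are hypothetical.

```c
/* Hypothetical dead-time model of the IFC nonlinearity: after each output
 * spike the circuit ignores its input for t_d seconds (reset + comparator
 * delay). A constant input current i_in over a window t_win is assumed. */

/* Ideal count: n = I * T / (C_m * V_th). */
double spikes_ideal(double i_in, double t_win, double c_m, double v_th) {
    return i_in * t_win / (c_m * v_th);
}

/* With dead time, n = I * (T - n * t_d) / (C_m * V_th); solving for n gives
 * n = I * T / (C_m * V_th + I * t_d), which saturates as I grows. */
double spikes_with_dead_time(double i_in, double t_win, double c_m,
                             double v_th, double t_d) {
    return i_in * t_win / (c_m * v_th + i_in * t_d);
}

/* Fraction of the ideal count lost: I * t_d / (C_m * V_th + I * t_d). */
double count_shortfall(double i_in, double t_win, double c_m,
                       double v_th, double t_d) {
    return 1.0 - spikes_with_dead_time(i_in, t_win, c_m, v_th, t_d)
               / spikes_ideal(i_in, t_win, c_m, v_th);
}
```

With I = 0.1 mA, T = 100 ns, C_m = 50 fF, V_th = 0.5 V, and an assumed t_d = 0.1 ns, the ideal count is 400 while the model gives about 286. Doubling C_m or halving t_d each reduces the fractional loss from about 29% to about 17%, in line with the optimizations listed above.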


Adaptability in Neural Networks

[Figure: three input patterns (a)–(c) on word lines WL1, WL2, …, WL31, WL32 and the output V_out, over 0–250 ns]

Output spike number (n_j): (a) 19, (b) 20, (c) 19

Good adaptability in neural networks


Outline

• Why Neuromorphic Computing?

• Challenges and New Opportunity

• Spiking Neuromorphic Design

• A Framework of Heterogeneous Computing Systems

• Conclusion


Our Approach

• A framework of heterogeneous computing systems enhanced with neuromorphic computing accelerators (NCAs).

• Purpose: to combine the flexibility of conventional architectures in logic and scientific computation with the efficiency of neuromorphic architectures for ANN applications.


Frontend: Prepare Data & Instructions

• setp reg (configuration): place the routing information stored at register reg into the central router
• movd #(reg) (I/O): load the data from memory to the NCA
• launch (configuration): notify the central router to start transmitting
• deq reg (I/O): dequeue the head data of the Out-queue and write it to register reg

bool RecallBSB(float *vec, float *wm)
{
    /* simulate the synapse network: wx = wm * vec */
    for (i = 0; i < BsbSize; ++i)
        for (j = 0; j < BsbSize; ++j)
            wx[i] += wm[i*BsbSize + j] * vec[j];
    ……
    /* activation function */
    for (i = 0; i < BsbSize; ++i)
        wx[i] = ALPHA*wx[i] + LAMDA*vec[i];
    ……
    /* check convergence */
    for (i = 0; i < BsbSize; ++i)
        if (fabsf(vec[i]) != 1.0f) return false;
    return true;
}

bool RecallBSB(float *vec)
{
    /* inputs to NCA */
    Send(NCA.id, vec);
    /* outputs from NCA */
    return Receive(NCA.id);
}

; send each input from a register to the input
; buffer associated with a specific NCA
LW   R1, $(vec)
MOVD NCA.id, R1
……
; launch the NCA
SET  NCA.id, #VAL
LAUNCH
; put the output from the output buffer of the
; NCA into a register (here there is only one output)
DEQ  R1, NCA.id
RET

[Figure: toolflow with source-to-source translation, training, and NCA-aware compilation]
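For reference, a self-contained version of the software recall path is sketched below in C. It is not the authors' code: it assumes the standard Brain-State-in-a-Box update x(t+1) = S(α·W·x(t) + λ·x(t)), with S clamping each element to [−1, 1], and the vector size, gains, iteration cap, and the outer-product storage helper are all hypothetical choices for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define BSB_SIZE 4
#define BSB_ALPHA 0.5f   /* feedback gain (assumed value) */
#define BSB_LAMBDA 1.0f  /* self-connection gain (assumed value) */
#define BSB_MAX_ITERS 50

/* Saturating activation: clamp to [-1, 1] (the BSB "box"). */
float bsb_clamp(float v) {
    if (v > 1.0f) return 1.0f;
    if (v < -1.0f) return -1.0f;
    return v;
}

/* One recall iteration: vec <- S(ALPHA * W vec + LAMBDA * vec).
 * Returns true once every element sits on a box corner (+1 or -1). */
bool bsb_step(float vec[BSB_SIZE], const float wm[BSB_SIZE * BSB_SIZE]) {
    float wx[BSB_SIZE];
    for (size_t i = 0; i < BSB_SIZE; ++i) {      /* synapse network: wx = W vec */
        wx[i] = 0.0f;
        for (size_t j = 0; j < BSB_SIZE; ++j)
            wx[i] += wm[i * BSB_SIZE + j] * vec[j];
    }
    for (size_t i = 0; i < BSB_SIZE; ++i)        /* activation function */
        vec[i] = bsb_clamp(BSB_ALPHA * wx[i] + BSB_LAMBDA * vec[i]);
    for (size_t i = 0; i < BSB_SIZE; ++i)        /* convergence check */
        if (vec[i] != 1.0f && vec[i] != -1.0f) return false;
    return true;
}

/* Iterates until convergence; returns the iteration count, or -1. */
int bsb_recall(float vec[BSB_SIZE], const float wm[BSB_SIZE * BSB_SIZE]) {
    for (int it = 1; it <= BSB_MAX_ITERS; ++it)
        if (bsb_step(vec, wm)) return it;
    return -1;
}

/* Hypothetical storage rule: outer product W = p p^T of one bipolar pattern. */
void bsb_store(const float p[BSB_SIZE], float wm[BSB_SIZE * BSB_SIZE]) {
    for (size_t i = 0; i < BSB_SIZE; ++i)
        for (size_t j = 0; j < BSB_SIZE; ++j)
            wm[i * BSB_SIZE + j] = p[i] * p[j];
}
```

Storing p = (1, −1, 1, −1) and recalling from the noisy probe (0.5, −0.2, 0.1, −0.4) converges to p in two iterations under these assumed gains. The W·vec loop is the part an NCA would take over in the accelerated RecallBSB.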


Backend: System Design

[Figure: conventional processing side with a general-purpose processor (pipeline stages fetch, decode, issue, execute, memory, writeback; RF, NCA $, and NCA blocks), SRAM, and I/O, connected through bridges and ADCs (digital output, e.g. 00101101) to four neuromorphic computing accelerators, each with an arbiter, an NCA, I/O and configuration logic, and buffers]

• Tightly coupled design
• Invoked by special instructions


NCA Architecture

• A hierarchical structure of MBC (memristor-based crossbar) arrays
• Mixed-signal NoC
– Analog computation
– Digital control and routing signals
– MBC arrays connected in a metamorphous centralized mesh (MCMesh) manner

[Figure: hierarchy of MBC arrays with a SUM AMP, four group routers, and a central router]
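The slide does not say how a weight matrix larger than one MBC array is split across the hierarchy. A common approach, assumed here, is to tile the matrix into S × S blocks, let each block compute a partial column sum, and add the partial sums that belong to the same output. The C sketch below treats that accumulation as the role of the SUM AMP, which is an interpretation of the figure labels rather than a documented design; the function names are hypothetical.

```c
#include <stddef.h>

/* Tiled matrix-vector product: w is n_in x n_out (row-major, one row per
 * input), mapped onto tile x tile crossbar arrays. Each tile produces a
 * partial column sum over its block of inputs; partial sums for the same
 * output are then accumulated. */
void tiled_mvm(const double *w, const double *x, double *y,
               size_t n_in, size_t n_out, size_t tile) {
    for (size_t j = 0; j < n_out; ++j) y[j] = 0.0;
    for (size_t r0 = 0; r0 < n_in; r0 += tile)          /* input block */
        for (size_t c0 = 0; c0 < n_out; c0 += tile) {   /* output block */
            size_t r1 = (r0 + tile < n_in) ? r0 + tile : n_in;
            size_t c1 = (c0 + tile < n_out) ? c0 + tile : n_out;
            for (size_t j = c0; j < c1; ++j) {
                double partial = 0.0;                   /* one tile's column current */
                for (size_t i = r0; i < r1; ++i)
                    partial += w[i * n_out + j] * x[i];
                y[j] += partial;                        /* accumulate across tiles */
            }
        }
}

/* Number of tile x tile arrays needed for an n_in x n_out matrix. */
size_t tiles_needed(size_t n_in, size_t n_out, size_t tile) {
    return ((n_in + tile - 1) / tile) * ((n_out + tile - 1) / tile);
}
```

For instance, a 784 × 100 layer on hypothetical 64 × 64 arrays needs tiles_needed(784, 100, 64) = 13 × 2 = 26 arrays.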


System Level Evaluation

• Two implementations representing trade-offs between computation performance and accuracy
• 7 classification benchmarks
• Classification rate is used as the reliability metric
• Network models: multi-layer perceptron (MLP) and auto-associative memory (AAM)

Benchmarks:
• cancer: breast cancer diagnosis
• connect-4: connect-4 game
• gene: nucleotide sequence detection
• lymphography: lymph diagnosis
• MNIST: digit recognition
• mushroom: poisonous mushroom discrimination
• thyroid: thyroid diagnosis


Experimental Setup

[Table: the design parameters of NCA components]

[Table: the benchmark implementation details]


Impact of Deficient Hardware

• Programming precision, limited by device resolution
• Device variations and signal fluctuations
• AAM is more robust than MLP
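Limited programming precision can be modeled by snapping each trained weight to the nearest state the device can hold. The C sketch below assumes evenly spaced levels over a fixed weight range; the slides do not give the actual resolution or level spacing, so the uniform model and the function names are hypothetical, and device variation is not modeled.

```c
/* Hypothetical model of limited programming precision: a target weight in
 * [w_min, w_max] is snapped to the nearest of `levels` evenly spaced states. */
double quantize_weight(double w, double w_min, double w_max, int levels) {
    if (levels < 2 || w <= w_min) return w_min;
    if (w >= w_max) return w_max;
    double step = (w_max - w_min) / (double)(levels - 1);
    long k = (long)((w - w_min) / step + 0.5);   /* round to nearest level */
    return w_min + step * (double)k;
}

/* Worst-case programming error for a given resolution: half a step. */
double max_quantization_error(double w_min, double w_max, int levels) {
    return (w_max - w_min) / (2.0 * (double)(levels - 1));
}
```

With five levels on [0, 1], a target of 0.3 is programmed as 0.25, and no in-range weight is off by more than 0.125.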


MBC Size Exploration

• A larger array size is preferable from a performance perspective.
• However, as the array size increases, the classification rate degrades because of aggravated variations.


Comparison with Other Designs

• Baseline: CPU as the general-purpose processor
• D-NPU: a popular digital neuromorphic accelerator (MICRO’12)
• MBCs+D-Net: MBC arrays with a digital NoC, used to evaluate the efficiency of the mixed-signal NoC
• NCA: our design, with MBC arrays and a mixed-signal NoC

[Figure: D-NPU structure with a sigmoid unit, multiply-add, accumulator register, input/weight/output buffers, and an array of PEs]


Comparison with Other Designs

• D-NPU is limited by its computational bandwidth.
• MBCs+D-Net is limited by costly AD/DA conversions.

NCA improvement:
• Speedup: 177.67 (MLP), 27.20 (AAM)
• Energy saving: 184.71 (MLP), 25.18 (AAM)


Outline

• Why Neuromorphic Computing?

• Challenges and New Opportunity

• Spiking Neuromorphic Design

• A Framework of Heterogeneous Computing Systems

• Conclusion


Conclusion & Perspective

• The invention of new devices inspires the study of next-generation neuromorphic computing systems.

• A spiking neuromorphic computing system leveraging the memristor crossbar array is demonstrated.

• We propose a heterogeneous system that combines the flexibility of conventional architectures with the efficiency of neuromorphic architectures.

• In future research, we plan to extend the investigation to larger-scale ANN applications.

• Techniques to enhance run-time robustness in the training and testing procedures will be studied.


Thank You and Questions?