More than Moore with Neuromorphic Computing Architectures · More than Moore with Neuromorphic Computing Architectures Colloquium GSI Darmstadt December 19, 2017 Karlheinz Meier Ruprecht-Karls-Universität

More than Moore with Neuromorphic Computing Architectures

Colloquium GSI Darmstadt, December 19, 2017

Karlheinz Meier

Ruprecht-Karls-Universität Heidelberg

[email protected] @brainscales

On computable numbers, with an application to the Entscheidungsproblem published 1937 in Proceedings of the London Mathematical Society

“We may compare a man in the process of computing a real number with a machine which is only capable of a finite number of conditions”

The Turing Machine

Modelled after a human executing a set of computational instructions:
1. Limited size of internal storage → finite number of states
2. Infinite size of external storage, but only a fraction can be used at a given time t
1. + 2. uniquely determine the state of the machine at time t+1

Each Turing machine uniquely represents an algorithm
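The two ingredients above (a finite set of internal states and an unbounded tape of which one cell is visible at a time) can be sketched in a few lines. This is only an illustrative sketch: the function name `run_turing_machine`, the rule-table layout and the binary-increment example are assumptions made here, not material from the talk.

```python
from collections import defaultdict

def run_turing_machine(rules, tape, state="q0", halt="halt", blank="_", max_steps=10_000):
    """Run a one-tape machine.

    rules maps (state, symbol) -> (new_state, new_symbol, move), move in {-1, +1}.
    The pair (internal state, symbol under the head) fully determines the next step.
    """
    cells = defaultdict(lambda: blank, enumerate(tape))  # unbounded external storage
    head = 0
    for _ in range(max_steps):
        if state == halt:
            break
        state, cells[head], move = rules[(state, cells[head])]
        head += move
    lo, hi = min(cells), max(cells)
    return state, "".join(cells[i] for i in range(lo, hi + 1)).strip(blank)

# Hypothetical example program: add 1 to a binary number written on the tape.
INCREMENT = {
    ("q0", "0"): ("q0", "0", +1),      # walk right to the end of the number
    ("q0", "1"): ("q0", "1", +1),
    ("q0", "_"): ("carry", "_", -1),
    ("carry", "1"): ("carry", "0", -1),  # 1 + carry -> 0, keep carrying
    ("carry", "0"): ("halt", "1", -1),   # 0 + carry -> 1, done
    ("carry", "_"): ("halt", "1", -1),   # overflow into a new leading digit
}

print(run_turing_machine(INCREMENT, "1011"))  # ('halt', '1100')
```

The rule table is the whole "program": a different table gives a different algorithm on the same machinery.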

Realizing the Turing Machine: The von Neumann Architecture

–  Data and instructions stored in memory
–  Content of memory addressable by location
–  Instructions executed sequentially unless order is explicitly modified
–  Memory and Computation physically separated

[Figure: Memory and Computation blocks exchanging Data and Instructions under Control. © IBM]

Realizing the von Neumann Architecture
Claude Shannon: From algebra to logic circuits

[Figures: © H. Eisenreich, TU Dresden; E. coli; © XILINX Inc.]

Realizing Shannon's Gates: Physics takes over - Digital is really Analog

Limits of Technology Scaling

[Figure: Performance versus Year (axis ticks 2020, 2025, 2030). Curves: Transistors, Thread Performance, Clock Frequency, Power (watts), # Cores. Technology nodes marked: 180 nm, 65 nm, 28 nm, 10 nm. Figure courtesy of Kunle Olukotun, Lance Hammond, Herb Sutter, and Burton Smith; K. Boahen, 2017]

No more progress from smaller transistors

New ARCHITECTURES suddenly interesting!
- Non-von Neumann
- Non-Turing

The Brain – Extreme Matter

- 10^11 nodes (neurons)
- 10^15 connections (synapses)
- Dynamic long-range and short-range interactions
- Open system driven by external world
- Time scales from milliseconds to years
- Stochastic on the microscopic level

Assets of brain computing
- Energy efficiency
- Compactness
- Fault tolerance
- Speed
- Configuration and learning replace programming
- Scalability

Larry Smarr, Calit

Conventional computing is moving away from the brain

Why brain inspired computing?
- future computing based on biological information processing
- understanding biological information processing

Two fundamentally different modeling approaches:
•  NUMERICAL MODEL (Turing): represents model parameters as binary numbers
•  PHYSICAL MODEL (not Turing): represents model parameters as physical quantities → voltage, current, charge (like the biological brain)

can be combined to form a hybrid system

need model system to test ideas

Hermann v. Helmholtz (1821-1894), Julius Bernstein (1839-1917), Santiago Ramón y Cajal (1852-1934)

Individual cells in the brain are spatially separated constituents

“interaction over a distance” and “spatial and temporal integration”

[Figure scale bar: 5 µm]

Basics of neural communication

[Figure labels: neurons, synapses, membrane voltage, neuron threshold voltage, action potential (“spike”), output spike]

- Threshold for non-linear response leading to all-or-none law (spikes)
- Neurons integrate over space and time - characteristic time constants
- Temporal correlation is important
- Mixed-signal system: action potential ↔ membrane voltage

Ahrens et al., Nature Methods 10, 413–420 (2013)

W. W. Norton

Some Electrical Quantities of a real Neuron Membrane
- Membrane capacitance C
- Ion channel conductance g_i = 1/R_i
- Membrane voltage U
- Ion channel current I_i
- Moving charges Q

U, I and g are functions of time in an operating network! Current theories and modelling are treating these quantities only (few exceptions)

$$\vec{j}_{\mathrm{drift}} = \sigma \cdot \vec{E} = -\sigma \cdot \mathrm{grad}(\Phi)$$

$$\vec{j}_{\mathrm{diffusion}} = -D \cdot \mathrm{grad}(n)$$

Hodgkin-Huxley 1952: Describing the non-linearity

Markram, 2015

PhD, Mihai Petrovici, 2015

Leaky-integrate-and-fire (LIF)

Diesmann, Proceedings of the 4th Biosupercomputing Symposium, Tokyo, 2012

K-Computer, RIKEN Lab, 12.6 MW
Processor-to-Neural-Cell Ratio 1:20,000
Simulation speed 1,520:1 compared to biological real-time

N. Gogtay et al., PNAS 101 (21): 8174-8179, 2004

15 years of wiring up the adolescent brain during development

Slow dynamics

Time Scales

                      Nature + Real-time   Simulation
Causality Detection   10^-4 s              0.1 s
Synaptic Plasticity   1 s                  1000 s
Learning              Day                  1000 Days
Development           Year                 1000 Years
Evolution             > Millennia          > 1000 Millennia

Span up to Development: 12 orders of magnitude; including Evolution: > 15 orders of magnitude

Energy Scales
Computational primitive: energy used for a synaptic transmission
10-14 orders of magnitude difference for “the same thing”

[Figure 5: energy-efficiency axis with marks at 10^0 J (1 Joule), 10^-4 J (0.1 milliJoule), 10^-8 J (10 nanoJoule), 10^-10 J (0.1 nanoJoule) and 10^-14 J (10 femtoJoule); one entry marked “?”. From: HBP project report]

How much does a Neural Computation cost?

FROM TOP TO BOTTOM (Human Brain)
- 20 W total power, equally shared
- 100 billion neurons firing at 1 Hz: 10^-10 Joule per action potential
- 10^15 synapses transmitting at 1 Hz: 10^-14 Joule per synaptic transmission

FROM BOTTOM TO TOP
- Approx. 10^9 ATP molecules to be hydrolyzed for action potential
- Approx. 10^5 ATP molecules to be hydrolyzed for synaptic transmission
  (D. Attwell and S. B. Laughlin)
- Obtain 10^-19 Joule (approx. 1 eV) per ATP molecule
  (Bray, Dennis. Cell Movements. New York: Garland, 1992)
- 10^-10 Joule (100,000 fJ = 0.1 nJ) per action potential
- 10^-14 Joule (10 fJ) per synaptic transmission
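The two estimates can be checked with a few lines of arithmetic. The sketch below assumes that "equally shared" means 10 W for action potentials and 10 W for synaptic transmission; that reading, and all variable names, are assumptions made for illustration.

```python
# Order-of-magnitude check of the top-down and bottom-up energy estimates.
TOTAL_POWER_W = 20.0
N_NEURONS = 1e11      # 100 billion neurons
N_SYNAPSES = 1e15
RATE_HZ = 1.0         # firing / transmission rate
E_ATP_J = 1e-19       # energy per hydrolyzed ATP molecule (approx. 1 eV)

def energy_per_event(power_w, n_units, rate_hz):
    """Top-down: power divided by the number of events per second."""
    return power_w / (n_units * rate_hz)

def energy_from_atp(n_atp, e_atp_j=E_ATP_J):
    """Bottom-up: number of ATP molecules times energy per molecule."""
    return n_atp * e_atp_j

# Assumption: the 20 W are split evenly between spikes and synapses.
power_each = TOTAL_POWER_W / 2

spike_top_down = energy_per_event(power_each, N_NEURONS, RATE_HZ)
synapse_top_down = energy_per_event(power_each, N_SYNAPSES, RATE_HZ)
spike_bottom_up = energy_from_atp(1e9)
synapse_bottom_up = energy_from_atp(1e5)

print(f"action potential:      {spike_top_down:.1e} J vs {spike_bottom_up:.1e} J")
print(f"synaptic transmission: {synapse_top_down:.1e} J vs {synapse_bottom_up:.1e} J")
```

Under these assumptions both routes land on the same orders of magnitude, 10^-10 J per action potential and 10^-14 J per synaptic transmission.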

Metal (100 nm times 100 nm)
Insulator (oxide), < 20 atom layers
Semiconductor
2 Volts

“Switching” of a MOS transistor: approximately 0.5 fJ (½ C U^2)
Synaptic transmission: approximately 10 fJ, i.e. 20 CMOS transistors

Electronics vs. Biology on the device level - Not a big difference!

It's not the devices, it's the ARCHITECTURE and the COMPUTATIONAL MODEL
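As a rough cross-check of the device-level comparison, the switching energy ½ C U^2 can be inverted for the capacitance implied by the quoted figures. The worked equation below only rearranges the numbers given here (0.5 fJ, 2 V, 10 fJ); the symbol C_eff for the implied capacitance is introduced for illustration.

```latex
E_{\text{switch}} = \tfrac{1}{2}\, C_{\text{eff}}\, U^{2}
\;\Rightarrow\;
C_{\text{eff}} = \frac{2\, E_{\text{switch}}}{U^{2}}
= \frac{2 \times 0.5\ \text{fJ}}{(2\ \text{V})^{2}} = 0.25\ \text{fF},
\qquad
\frac{E_{\text{synapse}}}{E_{\text{switch}}}
= \frac{10\ \text{fJ}}{0.5\ \text{fJ}} = 20 .
```

The ratio of 20 is the "20 CMOS transistors" equivalent of one synaptic transmission.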

The Brain in a Computer? Computers like Brains?

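[Figure: layered network with label “CAT”. Input layer, inner layers, output layer; layer activations X^(l=1), X^(l=2), X^(l=3), X^(l=N) and weights W^(l=1), W^(l=2), W^(l=N)]

Artificial Neuronal Networks ignore time evolution…

Here: local, no recurrency, feed-forward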

Pairs of neurons connected by weights
Neuron performs integration (summing)

$$u = \sum_{i=1}^{I} w^{(i)} x^{(i)}, \qquad y = f(u)$$
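The summing neuron and its feed-forward stacking can be written out directly. This is a minimal sketch: the function names and the choice of `tanh` as the default non-linearity f are assumptions for illustration, since no specific f is fixed above.

```python
import math

def neuron_output(inputs, weights, activation=math.tanh):
    """y = f(u) with u = sum_i w_i * x_i (weighted integration of the inputs)."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    u = sum(w * x for w, x in zip(weights, inputs))
    return activation(u)

def feed_forward(x, layers, activation=math.tanh):
    """Propagate x through a list of layers; each layer is a list of weight rows.

    There is no notion of time and no recurrency: every layer is evaluated once.
    """
    for weight_rows in layers:
        x = [neuron_output(x, row, activation) for row in weight_rows]
    return x

# Usage: two inputs, one inner layer with two neurons, one output neuron.
example_layers = [
    [[0.5, -0.5], [1.0, 1.0]],  # inner layer
    [[1.0, -1.0]],              # output layer
]
print(feed_forward([1.0, 0.0], example_layers))
```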

Learning Example: Supervised

[Figure: layered network (input layer, inner layers, output layer; activations X^(l), weights W^(l)) with label “CAT”. Labelled input data enter the network; the deviation between desired output and actual output is shown]

Jumpstart: Strategic Network, Supervised Learning
- Predict human moves
- Database of existing matches: 160,000 matches, 30 million positions

Policy Network, Reinforcement Learning
- Network self-matches: 128,000,000 matches

Value Network
- Combination of first 2 steps
- 30 million self-matches

One year learning time, 0.5 MW
Energy: 183 MWh
Excessive training samples

Learning is slow and expensive
Application is fast

https://upload.wikimedia.org/wikipedia/commons/0/0a/Temporal_summation.JPG

What is time (spiking...) good for?
- Sparse information coding by time correlations
- Short-term spike-based synaptic plasticity (STP)
- Spike-timing-dependent plasticity (STDP)
- Temporal noise (stochasticity) based computing

- Energy efficiency
- Computational advantages
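To make the role of spike timing concrete, here is a sketch of a pair-based STDP rule with exponential windows, a commonly used textbook form. It is not taken from the talk: the function names, amplitudes and time constants are hypothetical values chosen only for illustration.

```python
import math

def stdp_weight_change(delta_t, a_plus=0.01, a_minus=0.012,
                       tau_plus=0.020, tau_minus=0.020):
    """Weight change for one spike pair, delta_t = t_post - t_pre in seconds.

    Pre before post (delta_t > 0) potentiates, post before pre depresses.
    All parameter values are illustrative assumptions.
    """
    if delta_t > 0:
        return a_plus * math.exp(-delta_t / tau_plus)
    if delta_t < 0:
        return -a_minus * math.exp(delta_t / tau_minus)
    return 0.0

def apply_stdp(weight, pre_times, post_times, w_min=0.0, w_max=1.0):
    """Sum the contributions of all spike pairs and clip the weight to its range."""
    for t_pre in pre_times:
        for t_post in post_times:
            weight += stdp_weight_change(t_post - t_pre)
    return min(w_max, max(w_min, weight))

# Usage: a presynaptic spike 5 ms before a postsynaptic spike strengthens the synapse.
print(apply_stdp(0.5, pre_times=[0.000], post_times=[0.005]))
```

The update depends only on the relative timing of two spikes, which is why such rules map naturally onto local circuitry at each synapse.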

Brain-inspired or brain-derived or neuromorphic computing

Digital
•  Discrete values of physical variables
•  Computation by Boolean algebra
•  One wire, one bit of information
•  Signal restored after gate

Analog
•  Continuous values of physical variables
•  Computation by component physics
•  One wire, many bits of information
•  Signal not restored after stage

Nature / mixed-signal
•  Local analogue computation
•  Binary communication by spikes
•  Signal restoration

Large-scale Neuromorphic Computing – compare

Commodity microprocessors   SpiNNaker, HBP     Soft-binary-code
Custom fully digital        TrueNorth, IBM     Hard-binary-code
Custom mixed-signal         BrainScaleS, HBP   Physical model

Anything in common?
+ Massively parallel (close to perfect weak scaling)
+ Asynchronous communication
+ Configurability
- Limited flexibility and complexity in neural models

COMPLEMENTARITY OF APPROACHES ESSENTIAL!


HBP Neuromorphic Computing Concepts

PHYSICAL MODEL SYSTEM
Local analog computing with 4 million neurons and 1 billion synapses – binary, asynchronous communication – x10,000 accelerated emulation
Location: Heidelberg (Germany)

MANY-CORE NUMERICAL MODEL SYSTEM
0.5–1 million ARM processors – address-based, small-packet, asynchronous communication – real-time simulation
Location: Manchester (UK)


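Physical Model System
Continuous Time Integrating Neural Cell Membrane (+ non-linearity)

$$C_m \frac{dV}{dt} = -g_{leak} \left( V - E_{leak} \right)$$

[Circuit: membrane capacitance C_m in parallel with leak resistance R = 1/g_leak towards E_leak; membrane voltage V(t)]

              g_leak [S]   C_m [F]
Biology (*)   10^-8        10^-10
VLSI          10^-6        10^-13

(*) Brette/Gerstner, J. Neurophysiology, 2005

“Time” is imposed by internal physics, not by external control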
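The statement that time is set by the internal physics can be made concrete through the membrane time constant tau = C_m / g_leak of the leaky integrator above. The sketch below plugs in the order-of-magnitude table entries; the dictionary layout and names are illustrative assumptions. Because the entries are only powers of ten, the resulting ratio is indicative and need not match the acceleration factor of 10,000 quoted for the full system.

```python
def membrane_time_constant(c_m_farad, g_leak_siemens):
    """tau = C_m / g_leak for the leaky integrator C_m dV/dt = -g_leak (V - E_leak)."""
    return c_m_farad / g_leak_siemens

# Order-of-magnitude values from the table (Biology vs. VLSI).
TABLE = {
    "biology": {"g_leak": 1e-8, "c_m": 1e-10},
    "vlsi": {"g_leak": 1e-6, "c_m": 1e-13},
}

tau = {name: membrane_time_constant(v["c_m"], v["g_leak"]) for name, v in TABLE.items()}
speedup = tau["biology"] / tau["vlsi"]

print(f"tau biology: {tau['biology']:.0e} s")   # about 10 ms
print(f"tau VLSI:    {tau['vlsi']:.0e} s")      # about 100 ns
print(f"ratio:       {speedup:.0e}")
```

Smaller capacitance and larger conductance shorten the time constant, so the same differential equation simply runs faster on the silicon substrate.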

$$c_m \frac{dV}{dt} = -g_{leak} \left( V - E_l \right) - \sum_k p_k(t)\, g_k \left( V - E_x \right) - \sum_l p_l(t)\, g_l \left( V - E_i \right)$$

p_{k,l}(t): exponential onset and decay (post-synaptic potential shape)
g_{k,l}: 0 to g_max (“weights”)

Effective membrane time constant c_m / g_total is time-dependent
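A numerical caricature of this membrane equation helps to see how synaptic conductances pull the voltage towards the reversal potential until the threshold non-linearity fires. The sketch below uses forward-Euler integration with excitatory input only; the threshold-and-reset mechanism, the instantaneous onset of p(t), and every parameter value are simplifying assumptions for illustration, not the hardware implementation.

```python
import math

def simulate_lif(exc_spike_times, t_end=0.2, dt=1e-4,
                 c_m=1e-10, g_leak=1e-8, e_l=-0.065,
                 e_x=0.0, g_exc=5e-9, tau_syn=5e-3,
                 v_thresh=-0.050, v_reset=-0.065):
    """Euler integration of c_m dV/dt = -g_leak (V - E_l) - p(t) g_exc (V - E_x).

    p(t) jumps by 1 at each input spike and decays exponentially (assumed shape).
    Returns the list of output spike times in seconds.
    """
    v = e_l
    p = 0.0
    pending = sorted(exc_spike_times)
    idx = 0
    out_spikes = []
    for step in range(int(round(t_end / dt))):
        t = step * dt
        while idx < len(pending) and pending[idx] <= t:
            p += 1.0                      # input spike opens the synaptic conductance
            idx += 1
        dv_dt = (-g_leak * (v - e_l) - p * g_exc * (v - e_x)) / c_m
        v += dt * dv_dt
        p *= math.exp(-dt / tau_syn)      # synaptic conductance decays
        if v >= v_thresh:                 # all-or-none non-linearity
            out_spikes.append(t)
            v = v_reset
    return out_spikes

# Usage: a 1 kHz input burst between 10 ms and 100 ms drives the neuron to fire.
burst = [0.010 + 0.001 * i for i in range(90)]
print(len(simulate_lif(burst)), "output spikes")
```

While the input is active, the total conductance g_leak + p(t) g_exc rises, so the effective time constant c_m / g_total shrinks, which is the time dependence noted above.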

A New VLSI Model of Neural Microcircuits Including Spike Time Dependent Plasticity, Johannes Schemmel, Karlheinz Meier, Eilif Muller, Proceedings of the 2004 International Joint Conference on Neural Networks (IJCNN'04), IEEE Press, pp. 1711-1716, 2004

Implementation example with synaptic inputs and neuron non-linearity
Mixed-signal: analog cores, binary communication

- Physical model, local analogue computing, binary continuous-time communication
- Wafer-scale integration of 200,000 neurons and 50,000,000 synapses on a single 20 cm wafer
- Short-term and long-term plasticity, 10,000 times faster than real-time

[Figure: workflow from model to hardware. Labels: Theory; Biological Databases; Model Building (Cells, Network, Plasticity); Formal Network Description; Technology Mapping; Technology Routing; Simulation and Verification; Hardware Platform Constraints; Neuromorphic Substrate]

Configuration space: 40 MB for a full wafer

Challenge and Opportunity: Variability

Pyloric rhythm of the crustacean stomatogastric ganglion
- 20,000,000 model networks created with 17 random cell parameters, fixed connectivity (Neuron)
- 400,000 networks found with “identical (degenerate)” timing behaviour in measured biological range
- Sensitivity of single parameters within “degenerate” solutions

Variability has to be at the right place...

Marder, Taylor, Nature Neuroscience 14, no. 2, 2011

Hardware-In-the-Loop

Millions of parameters
- network topology
- neuron sizes and parameters
- synaptic strengths

What for?
- Calibration
- Learning
- Environment
- Data

Separated?

[Figure: closed loop between a Conventional Computer (data or simulated environment, learning mechanisms) and the Neuromorphic Machine. Labels: action of machine on data or environment; state of data or environment; reward (penalty)]

Sebastian Schmitt et al., accepted IJCNN 2017
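The idea of closing the loop between a conventional computer and an imperfect analog substrate can be illustrated with a toy example. Everything below is a hypothetical stand-in: the "hardware" is a single logistic unit whose synapses carry fixed gain errors hidden from the learning code, and the host applies a simple delta rule to the measured output. It is a sketch of the principle, not the training procedure used on the real machine.

```python
import math
import random

# Hypothetical stand-in for the analog substrate: each synapse has a fixed,
# unknown gain error that the training code never reads directly.
HIDDEN_GAINS = [0.6, 1.4, 0.9]

def hardware_forward(weights, inputs):
    """Emulate the 'machine': distorted weighted sum followed by a logistic output."""
    u = sum(g * w * x for g, w, x in zip(HIDDEN_GAINS, weights, inputs))
    return 1.0 / (1.0 + math.exp(-u))

def train_in_the_loop(samples, epochs=2000, lr=0.5, seed=0):
    """Host-side learning: measure the hardware output, then update the weights."""
    rng = random.Random(seed)
    weights = [0.0, 0.0, 0.0]
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)
        for inputs, target in data:
            measured = hardware_forward(weights, inputs)  # run on the "hardware"
            error = target - measured                     # evaluated on the host
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
    return weights

# Logical OR; the third input component is a constant bias input.
OR_SAMPLES = [((0, 0, 1), 0), ((0, 1, 1), 1), ((1, 0, 1), 1), ((1, 1, 1), 1)]

trained = train_in_the_loop(OR_SAMPLES)
for inputs, target in OR_SAMPLES:
    print(inputs[:2], target, round(hardware_forward(trained, inputs), 3))
```

Because the error is computed from what the substrate actually does, the learned weights absorb the gain errors without a separate calibration step, which is the point of keeping the hardware inside the training loop.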

Physical model emulation
MNIST classification on a physical model machine
Feed-forward, rate-based, 4-layer spiking network

[Figure: performance before and after hardware in-the-loop learning]

https://arxiv.org/abs/1703.01909

MNIST classification on a physical model machine

[Figure: neuronal firing activity after hardware in-the-loop learning; panels: input, 2x hidden, label]

https://arxiv.org/abs/1703.01909

3-Layer Spiking Neuron Network derived from Insect Olfactory System
Example for insect brain derived circuit

LI: Receptor Neurons
LII: Decorrelation through lateral inhibition (Glomeruli)
LIII: Association (Soft WTA through strong inhibitory populations)

Supervised Learning: Synaptic Projections from Layer 2 to Layer 3

Application in generic multivariate data classification

[Figure: neuronal firing activity before and after learning]

Schmuker, M. et al., "A neuromorphic network for generic multivariate data classification." Proceedings of the National Academy of Sciences (2014): 201303053.

T. Pfeil, A.-C. Scherzer, J. Schemmel and K. Meier, Neuromorphic Learning towards Nano Second Precision, Proceedings of the 2013 International Joint Conference on Neural Networks (IJCNN). Dallas, TX, USA: IEEE Press, 2013, pp. 869-873.

On-chip: Spike-Timing-Dependent Plasticity

Boltzmann Machines

- Networks of symmetrically connected stochastic nodes k
- State of nodes described by vector of binary random variables z_k ∈ {0, 1}
- Probability for state vector converges to a target Boltzmann distribution
- Energy function

WHAT FOR?
Learn internal stochastic model of input space – Generate or discriminate

[Figure: stochastically modulated firing probability vs. deterministic firing]

M. A. Petrovici, J. Bill, I. Bytschok, J. Schemmel, and K. M.: Stochastic inference with spiking neurons in the high-conductance state. Physical Review E 94, 2016

Stochasticity from
- External noise source
- Internal noise source
- Ongoing network activity

Petrovici, Mihai A., et al. "Stochastic inference with deterministic spiking neurons." arXiv preprint arXiv:1311.3211

Buesing, Lars, et al. "Neural dynamics as sampling: a model for stochastic computation in recurrent networks of spiking neurons." PLoS Comput. Biol. 7.11 (2011): e1002211.

Spiking LIF Neurons as a 2-state (binary) system

Neural computability condition
A network of spiking neurons draws samples from the joint distribution p if the membrane potentials u_k of the neurons follow the neural computability condition.
Corresponds to a logistic activation function of the neuron
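The link between a logistic activation function and sampling from a Boltzmann distribution can be checked on a tiny abstract network. The sketch below is not a spiking simulation: it uses plain Gibbs sampling on three binary units, assumes the usual energy E(z) = -sum_{i<j} W_ij z_i z_j - sum_i b_i z_i, and takes the log-odds of unit k as u_k = b_k + sum_j W_kj z_j. The weights, biases and function names are illustrative assumptions.

```python
import itertools
import math
import random

def energy(z, weights, biases):
    """E(z) = -sum_{i<j} W_ij z_i z_j - sum_i b_i z_i for symmetric weights."""
    n = len(z)
    pair = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(i + 1, n))
    return -pair - sum(b * zi for b, zi in zip(biases, z))

def exact_distribution(weights, biases):
    """Target Boltzmann distribution p(z) proportional to exp(-E(z))."""
    states = list(itertools.product((0, 1), repeat=len(biases)))
    unnorm = [math.exp(-energy(z, weights, biases)) for z in states]
    total = sum(unnorm)
    return {z: u / total for z, u in zip(states, unnorm)}

def gibbs_sample(weights, biases, n_sweeps, rng):
    """Update each unit with probability sigma(u_k), the logistic activation."""
    n = len(biases)
    z = [0] * n
    counts = {}
    for _ in range(n_sweeps):
        for k in range(n):
            u_k = biases[k] + sum(weights[k][j] * z[j] for j in range(n) if j != k)
            z[k] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-u_k)) else 0
        counts[tuple(z)] = counts.get(tuple(z), 0) + 1
    return {s: c / n_sweeps for s, c in counts.items()}

# Illustrative symmetric weights and biases (assumed values).
W = [[0.0, 1.0, -1.0], [1.0, 0.0, 0.5], [-1.0, 0.5, 0.0]]
B = [-0.5, 0.0, 0.5]

exact = exact_distribution(W, B)
sampled = gibbs_sample(W, B, n_sweeps=20000, rng=random.Random(1))
for state in sorted(exact):
    print(state, round(exact[state], 3), round(sampled.get(state, 0.0), 3))
```

With enough sweeps the sampled state frequencies approach the exact Boltzmann probabilities, which is the sense in which a network of logistic units "draws samples" from p.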

PhD, Mihai Petrovici; BA, Luziwei Leng, 2015

Learning specific input distributions by adjusting LOCAL interactions
- Clamp visible units to value of particular pattern – reach thermal equilibrium
- Increment interaction between any 2 nodes that are both on
- Run network freely and sample from stored probability distribution
- Infer from clamped input

Free running: “Dreaming”, Generative

Inferring: Input incompatible with 0, Discriminative

Energy Scales
Energy used for a synaptic transmission

[Figure 5: energy-efficiency axis with marks at 10^0 J (1 Joule), 10^-4 J (0.1 milliJoule), 10^-8 J (10 nanoJoule), 10^-10 J (0.1 nanoJoule) and 10^-14 J (10 femtoJoule). From: HBP project report]

Filling the Gap
- Typically 10,000,000 times more energy efficient than state-of-the-art HPC (comparable model)
- 10,000 times less efficient than biology

Time Scales

                      Nature + Real-time   Simulation         Accelerated Model
Causality Detection   10^-4 s              0.1 s              10^-8 s
Synaptic Plasticity   1 s                  1000 s             10^-4 s
Learning              Day                  1000 Days          10 s
Development           Year                 1000 Years         3000 s
Evolution             > Millennia          > 1000 Millennia   > Months

Span up to Development: 12 orders of magnitude; including Evolution: > 15 orders of magnitude
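The three columns are related by two constant factors, which a short calculation makes explicit. The sketch assumes a slowdown of 1,000 for conventional simulation (read off from the ratio 0.1 s to 10^-4 s in the table) and the acceleration of 10,000 quoted for the physical model; the variable names and the 365-day year are illustrative choices.

```python
SIMULATION_SLOWDOWN = 1_000   # simulation column: 1000x slower than nature
ACCELERATION = 10_000         # accelerated model: 10,000x faster than nature

DAY_S = 86_400.0
NATURE_SECONDS = {
    "causality detection": 1e-4,
    "synaptic plasticity": 1.0,
    "learning": DAY_S,
    "development": 365 * DAY_S,
}

simulated = {name: t * SIMULATION_SLOWDOWN for name, t in NATURE_SECONDS.items()}
accelerated = {name: t / ACCELERATION for name, t in NATURE_SECONDS.items()}

for name, t in NATURE_SECONDS.items():
    print(f"{name:20s} nature {t:10.3g} s | simulation {simulated[name]:10.3g} s"
          f" | accelerated {accelerated[name]:10.3g} s")
```

One day of learning shrinks to about 8.6 s and one year of development to about 3,150 s, matching the rounded table entries of 10 s and 3000 s.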

[Figure: block diagram with four Processing Elements, Router, SerDes links, MCU, Memory Interface and Shared Memory]

Today: Working prototypes
2020: Operational systems

Goal: learning cognitive machines

SpiNNaker-2
- 4-core Quad Processing Element
- 25 GIPS/W on a single die
- Floating point precision
- True random numbers

BrainScaleS-2
- Flexible local learning
- On-the-fly network reconfiguration
- Structured neurons
- Dendritic computation

Next generation of NM computing in the HBP

Final Thoughts
- After 10 years of development the BrainScaleS large-scale physical hardware system is being commissioned and delivers first results
- Fully non-Turing, physical model computing can solve established machine learning tasks
- 2nd generation physical model systems start to offer very advanced accelerated local learning capabilities and exploitation of dendritic computation

Goal: Build a continuously learning cognitive machine
