Download ppt - Spike timing dependent plasticity - STDP Markram et. al. 1997 +10 ms -10 ms Pre before Post: LTP Post before Pre: LTD

Spike timing dependent plasticity - STDP

Markram et. al. 1997

+10 ms

-10 ms

Pre before Post: LTP

Post before Pre: LTD

Pre follows Post:Long-term Depression

Pre

tPre

Post

tPost

Synaptic

change %

Spike Timing Dependent Plasticity: Temporal Hebbian Learning

Weight-change curve (Bi&Poo, 2001)

Pre

tPre

Post

tPost

Pre precedes Post:Long-term Potentiation

Aca

usal

Causal

(possibly)

Machine Learning Classical Conditioning Synaptic Plasticity

Dynamic Prog.(Bellman Eq.)

REINFORCEMENT LEARNING UN-SUPERVISED LEARNINGexample based correlation based

d-Rule

Monte CarloControl

Q-Learning

TD( )often =0

ll

TD(1) TD(0)

Rescorla/Wagner

Neur.TD-Models(“Critic”)

Neur.TD-formalism

DifferentialHebb-Rule

(”fast”)

STDP-Modelsbiophysical & network

EVALUATIVE FEEDBACK (Rewards)

NON-EVALUATIVE FEEDBACK (Correlations)

SARSA

Correlationbased Control

(non-evaluative)

ISO-Learning

ISO-Modelof STDP

Actor/Critictechnical & Basal Gangl.

Eligibility Traces

Hebb-Rule


(”slow”)

supervised L.

Anticipatory Control of Actions and Prediction of Values Correlation of Signals

=

=

=

Neuronal Reward Systems(Basal Ganglia)

Biophys. of Syn. PlasticityDopamine Glutamate

STDP

LTP(LTD=anti)

ISO-Control

Overview over different methods

You are here !

I. Pawlow

History of the Concept of TemporallyAsymmetrical Learning: Classical Conditioning

I. Pawlow


Correlating two stimuli which are shifted with respect to each other in time.

Pavlov’s Dog: “Bell comes earlier than Food”

This requires to remember the stimuli in the system.

Eligibility Trace: A synapse remains “eligible” for modification for some time after it was active (Hull 1938, then a still abstract concept).

0 = 1

1

Unconditioned Stimulus (Food)

Conditioned Stimulus (Bell)

Response

X

+Stimulus Trace E

The first stimulus needs to be “remembered” in the system

Classical Conditioning: Eligibility Traces

I. Pawlow


Eligibility Traces

Note: There are vastly different time-scales for (Pavlov’s) behavioural experiments:

Typically up to 4 seconds

as compared to STDP at neurons:

Typically 40-60 milliseconds (max.)

Machine Learning Classical Conditioning Synaptic Plasticity

Dynamic Prog.(Bellman Eq.)

REINFORCEMENT LEARNING UN-SUPERVISED LEARNINGexample based correlation based

d-Rule

Monte CarloControl

Q-Learning

TD( )often =0

ll

TD(1) TD(0)

Rescorla/Wagner

Neur.TD-Models(“Critic”)

Neur.TD-formalism


(”fast”)

STDP-Modelsbiophysical & network

EVALUATIVE FEEDBACK (Rewards)

NON-EVALUATIVE FEEDBACK (Correlations)

SARSA

Correlationbased Control

(non-evaluative)

ISO-Learning

ISO-Modelof STDP

Actor/Critictechnical & Basal Gangl.

Eligibility Traces

Hebb-Rule


(”slow”)

supervised L.

Anticipatory Control of Actions and Prediction of Values Correlation of Signals

=

=

=

Neuronal Reward Systems(Basal Ganglia)

Biophys. of Syn. PlasticityDopamine Glutamate

STDP

LTP(LTD=anti)

ISO-Control

Overview over different methods

Mathematical formulation of learning rules is

similar but time-scales are much different.

Early: “Bell”

Late: “Food”

x

)( )( )( tytutdt

dii

Differential Hebb Learning Rule

Xi

X0

Simpler Notationx = Inputu = Traced Input

V

V’(t)

ui

u0

Defining the TraceIn general there are many ways to do this, but usually one chooses a trace that looks biologically realistic and allows for some analytical calculations, too.

EPSP-like functions:-function:

Double exp.:

This one is most easy to handle analytically and, thus, often used.

DampenedSine wave:

Shows an oscillation.

h(t) =n

0 t<0hk(t) tõ 0

h(t) = teà atk

h(t) = b1sin(bt) eà at

k

h(t) = î1(eà at à eà bt)

k

Defining the Traced Input u

Convolution used to define the traced input,

Correlation used to calculate weight growth (see below).

)()()()()()()( xfxgxgxfduuxgufxh

u

)()()()()()()( xgxfxfxgduxugufxh

w

Defining the Traced Input u

)()()()()()()( xfxgxgxfduuxgufxh

u

Specifically (we are dealing with causal functions!):

u(t) = s0

1x(ü)h(t à ü)dü

If x is a spike train(using the d-function):

Then:

For example:

u(t) =P

j=0

Mh(t à tj)

u(t) = h(t à T)

u(t) = h(t)x(t) = î (0)

x(t) = î (T)

x(t) =P

j=0

Mî (tj)

Differential Hebb Rules – The Basic Rule

General:

Two inputs only. Thuswe get for the output:

v = w0u0+ w1u1

w0=1=const.

One weight unchanging:

Same h for all inputs.

ISO rule

Isotropic Sequence Order Lng.

(as we can also allow w0 to change!)

dtdw1 = ö u1 v0

1

0

X

v

v’

ISO-Learning

h

h

x

x0

1

The basic rule: ISO-Learning

Differential Hebb Rules – More rules (but why?)

ISO3dt

dw1 = ö u1 v0 R0k

Three factor learningk

The denotes that we are onlyusing positive contributions

1

X

v

v’

ISO3-Learning

h

h

hr

x

x

r >0R’

0

1

0

ISO3 - Learning

ICO

Input correlation Learning

(as we take the derivative of the

unchanging input u0)

dtdw1 = ö u1 u0

0

1

X

v

ICO-Learning

h

h

x

x

u ’

0

0

1

0

ICO - Learning

Stability Analysis

x

X1

X0

Vu1

u0

ACCC

dtdw1 = öu1v0

4 w1(t) = s0

1

dtdw1(t) dt

4 w1(t) = 4 wAC1 (t) + 4 wCC

1 (t)

X0X1

Inputs

Stability Analysis

4 w1(t) = 4 wAC1 (t) + 4 wCC

1 (t)Desired contribution

Undesired contribution

Some problems with these differential equations:

4 w1(t) = s0

1

dtdw1(t) dt

1) As we are

integrating to ∞ strictly we need to assume that there is no second pulse pair coming in “ever”.

2) Furthermore we should assume that w1’→0 (hence small) or we get second order influences, too.

Stability Analysis (ISO)Under these assumptions we can calculate wAC and wCC to find out whether the rules are stable or not.

In general we assume two inputs:

x1(t) = î (t) and x0(t) = î (t à T)

dtdw1 = öu1v0and get for ISO:

4 wCC1 = w0sh(t)h0(t à T)dt = w02û

1a+baà bh(t)

4 wAC1 = w1

àesh(t)h0(tà T)dt à 1

á= w1

àe2

1h2(1 ) à 1á

= 0

ISO is (only) asymptotically stable for t→∞

X0X1

InputsT

Magn. of one step

Stability Analysis for pulse pair inputs (ISO)

=0.001

=0.002

time [step]0 2000 6000 10000

0

0.02

0.04

0.06

0.08

0.10

0.12

w1

Setting x0=0

The remaining upward drift is only due to the AC term influence (Instable !)

Single pairingrelaxation behavior

This shows that early arrival of a new pulse pair might easily fall into a not fully relaxed system. (Instable !)

t

w

Notice the ACcontribution

Compare to STDP

w

T-200 -100 0 100 200

-0.02

-0.01

0

0.01

0.02

ISO: Weight change curve

Learning Window (weight change curve)

The weight change curve plots w in dependence on the pulse pairing distance T in steps, where we define T>0 if the x1 signal arrives before x0 and T<0 else.

ISO ruledt

dw1 = ö u1 v0

1

0

X

v

v’

ISO-Learning

h

h

x

x0

1


ICO

Input correlation Learning

(as we take the derivative of the

unchanging input u0)

dtdw1 = ö u1 u0

0

1

X

v

ICO-Learning

h

h

x

x

u ’

0

0

1

0

ICO - Learning

Stability Analysis: Compare ISO with ICO

Notice the difference

Stability Analysis: ICO

4 wAC1 ñ 0

4 wCC1 = w0sh(t)h0(t à T)dt = w02û

1a+baà bh(t)Same as for ISO!

Fully stable !

No more drift!

=0.001

=0.002

time [step]

1

0 2000 6000 100000

0.02

0.04

0.06

0.08

0.10

0.12x =00

w

T-200 -100 0 100 200

-0.02

-0.01

0

0.01

0.02

ICO: Weight change curve(same as for ISO)

Single pulse pair(no more AC term in ICO).

IS O

ICO t

w

ISO ruledt

dw1 = ö u1 v0

1

0

X

v

v’

ISO-Learning

h

h

x

x0

1


ICOdt

dw1 = ö u1 u00

1

X

v

ICO-Learning

h

h

x

x

u ’

0

0

1

0

ICO - Learning

Stability Analysis: More comparisons

Conjoint learning-control-signal (same for all inputs !)

Single input as designated learning-control-signal.

Makes ICO a heterosynaptic rule of questionable biological realism.

Stability Analysis: More comparisonsThis difference is especially visible when wanting to symmetrize the rules (both weights can change!).

X

X

v

v’h

h

x

x0

1i

ISO-Sym One control signal !

T=18

1

0

-3.5-3

-2.5-2

-1.5-1

-0.5 0

0.5 1

1.5 2

0 2 4 6 8 10time [steps]

x106

X

X

v

h

h

x

x0

1 1

0

d/dt

ICO-SymTwocontrolsignals !

time [steps]

-0.2

-0.1

0

0.1

0.2

0 2000 4000 6000 8000 10000

T=15

time [steps]

-0.2

-0.1

0

0.1

0.2

0 2000 4000 6000 8000 10000

T=15

ICO-sym is truly symmetrical, but needs two control signals.

ISO-sym behaves in a difficult and unstable oscillatory way.

X0X1

InputsT

Synapse w1 grows because x1 is before x0.

The Effects of Symmetry

Synapse w0 shrinks because x0 is after x1.

ISO3: uses – like ISO – a single learning-control-signal

ISO3dt

dw1 = ö u1 v0 R0k

1

X

v

v’

ISO3-Learning

h

h

hr

x

x

r >0R’

0

1

0

ISO3 - LearningIdea: The system should learn ONLY at that moment in time when there was a “relevant” event r !

We use a shorter trace for r, as it should remain rather restricted in time.

Same filter function h but parameters ar and br.

We also define Tr as the

interval between x1 and r. Many times Tr=T, hence r occurs together with x0.

Stability Analysis: ISO3

4 wAC1 = w1

Rh(t)h0(t)h0

r(t à Tr)dt

4 wCC1 = w0

Rh(t)h0(t à T)h0

r(t à Tr)dt

Observations:1) Cannot be solved anymore!2) AC term is generally NOT equal to zero.3) Not even asymptotic convergence can be generally assured.

So what have we gained ?

One can show that for Tr=T the AC term vanishes if v has its maximum at T.

u1'

Stability Analysis: ISO3, graphical proof

x0x1

u0

r

Maximum at T

T

AC

CCContributions of AC and CCgraphically depicted

u1

v0(t) = u01(t); t < T

as x0 has not yet happened

limt! Tà

v0(t) = 0

If we restrict learning to the moment when x0 occurs then we do not have any AC contribution.

!! A questionable assumption: argmax(u1) = T !!

Stability Analysis: ISO3

=7.25 x 10-5

=10-4

time [step]0 2000 6000 10000

0x = 01

0

0.02

0.04

0.06

0.08

0.10

0.12 ISO

ISO3

Single pulse pair(ISO3 is stable and relaxesinstantaneously).IS O

ISO3 t

w

∆

-150 0T

150

0

0.001

0.002

Weight change curve(no more STDP!)

No more upwards drift for ISO3

A General Problem: T is usually unknown and variable

Introducing a filter bank: (example ISO)

0

1

11

N

1

N

X

X

xu

u

xu

00

11

1

v

v’

1

1

1

N

0

h

h

h

N1 uu 11

Spreading out the earlier input over time!

Remember: “A questionable assumption: argmax(u1) = T”

Stability Analysis: ISO3 with a filter bank

With a filter bank we get for the output: v = w0u0+P

j=0

Nwj

1uj1

ö14 wk

1 =R

w0uk1u

00R

0 +

Single weights develop now as:

Ruk

1

P

j=1

Nwj

1(uj1)

0R0

dtdw1 = ö u1 v0 R0

kOriginal Rule was:

CC ACWith delta-function inputs at t=0 and t=T we get:

ö14 wk

1 = w0u0(0)uk1(T) +

ð P

jwj

1uj1(T)

ñ0uk

1(T)

It is possible to prove that this term becomes zero as a consequence of the learning!

| {z }