BCS547 Neural Decoding

Page 1

BCS547

Neural Decoding

Page 2

Population Code

[Figure, left panel "Tuning Curves": activity vs. direction (deg). Right panel "Pattern of activity (r)": activity vs. preferred direction (deg), with the unknown stimulus marked "s?".]

Page 3

Nature of the problem

In response to a stimulus with unknown orientation s, you observe a pattern of activity r. What can you say about s given r?

Bayesian approach: recover p(s|r) (the posterior distribution)

Estimation theory: come up with a single-value estimate $\hat{s}$ from r

Page 4

Maximum Likelihood

[Figure, left panel "Tuning Curves": activity vs. direction (deg). Right panel "Pattern of activity (r)": activity vs. preferred direction (deg).]

Page 5

Maximum Likelihood

[Figure: activity vs. preferred direction (deg), with a curve labeled "Template".]

Page 6

Maximum Likelihood

[Figure: activity vs. preferred direction (deg), with the curve labeled "Template" and the estimate $\hat{s}_{ML}$ marked.]

Page 7

Maximum Likelihood

[Figure: activity vs. preferred direction (deg), with the estimate $\hat{s}_{ML}$ marked.]

Page 8

Maximum Likelihood

The maximum likelihood estimate is the value of s maximizing the likelihood p(r|s). Therefore, we seek $\hat{s}_{ML}$ such that:

$$\hat{s}_{ML} = \arg\max_s P(\mathbf{r}|s)$$

where $P(\mathbf{r}|s)$ is the noise distribution.

Page 9

Activity distribution

[Figure: distributions of the activity of neuron i given the stimulus, $P(r_i|s=-60)$ and $P(r_i|s=0)$.]

Page 10

Maximum Likelihood

The maximum likelihood estimate is the value of s maximizing the likelihood p(r|s). Therefore, we seek $\hat{s}_{ML}$ such that:

$$\hat{s}_{ML} = \arg\max_s P(\mathbf{r}|s)$$

where $P(\mathbf{r}|s)$ is the noise distribution.

$\hat{s}_{ML}$ is unbiased and efficient.

Page 11

Estimation Theory

[Diagram: s → Encoder (nervous system) → activity vector r (activity vs. preferred orientation) → Decoder → $\hat{s}$.]

Page 12

[Diagram: the stimulus s is encoded on 200 trials. On trial k the encoder (nervous system) produces an activity pattern $r_k$ (activity vs. preferred retinal location) and the decoder returns an estimate $\hat{s}_k$, shown for Trial 1, Trial 2 and Trial 200.]

Page 13

Estimation Theory

If $E[\hat{s}|s] = s$, the estimate is said to be unbiased.

If $\sigma^2_{\hat{s}|s}$ is as small as possible, the estimate is said to be efficient.

[Diagram: s → Encoder (nervous system) → activity vector r (activity vs. preferred orientation) → Decoder → $\hat{s}$.]

Page 14

Estimation theory

• A common measure of decoding performance is the mean square error between the estimate and the true value

$$\mathrm{MSE} = E\left[(\hat{s} - s)^2 \,|\, s\right]$$

• This error can be decomposed as:

$$\mathrm{MSE} = \sigma^2_{\hat{s}|s} + \left(E[\hat{s}|s] - s\right)^2 = \sigma^2_{\hat{s}|s} + \mathrm{bias}^2$$
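The decomposition of the mean square error into variance plus squared bias can be checked numerically. The sketch below is only an illustration: the simulated estimates (a Gaussian spread around a deliberately offset mean) and the helper name `mse_decomposition` are assumptions, not part of the lecture.

```python
import random

def mse_decomposition(estimates, s_true):
    """Return (MSE, variance, bias) of a list of estimates of s_true."""
    n = len(estimates)
    mean_est = sum(estimates) / n
    bias = mean_est - s_true                       # E[s_hat|s] - s
    variance = sum((e - mean_est) ** 2 for e in estimates) / n
    mse = sum((e - s_true) ** 2 for e in estimates) / n
    return mse, variance, bias

# Hypothetical decoder output: 200 trials, offset by 1.5 deg, spread of 2 deg.
rng = random.Random(0)
s_true = 10.0
estimates = [s_true + 1.5 + rng.gauss(0.0, 2.0) for _ in range(200)]

mse, variance, bias = mse_decomposition(estimates, s_true)
print(mse, variance + bias ** 2)  # the two numbers agree up to rounding
```

With the variance normalized by the number of trials, the identity holds exactly for any set of estimates, not just on average.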

Page 15

Efficient Estimators

The smallest achievable variance for an unbiased estimator is known as the Cramér-Rao bound, $\sigma^2_{CR}$.

An efficient estimator is such that

$$\sigma^2_{\hat{s}|s} = \sigma^2_{CR}$$

In general:

$$\sigma^2_{\hat{s}|s} \geq \sigma^2_{CR}$$

Page 16

Fisher Information

Fisher information is defined as:

$$I(s) = \frac{1}{\sigma^2_{CR}}$$

and it is equal to:

$$I(s) = E\left[-\frac{\partial^2 \ln P(\mathbf{r}|s)}{\partial s^2}\right]$$

where p(r|s) is the distribution of the neuronal noise.

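Page 17

Fisher Information

$$I(s) = E\left[-\frac{\partial^2 \ln P(\mathbf{r}|s)}{\partial s^2}\right]$$

For independent Poisson noise, with $r_i = k_i$ spikes in neuron i:

$$P(\mathbf{r}|s) = \prod_{i=1}^{n} P(r_i = k_i|s) = \prod_{i=1}^{n} \frac{f_i(s)^{k_i} e^{-f_i(s)}}{k_i!}$$

$$\ln P(\mathbf{r}|s) = \sum_{i=1}^{n} \left[k_i \ln f_i(s) - f_i(s) - \ln k_i!\right]$$

$$\frac{\partial \ln P(\mathbf{r}|s)}{\partial s} = \sum_{i=1}^{n} \left[k_i \frac{f_i'(s)}{f_i(s)} - f_i'(s)\right]$$

$$\frac{\partial^2 \ln P(\mathbf{r}|s)}{\partial s^2} = \sum_{i=1}^{n} \left[k_i \frac{f_i''(s) f_i(s) - f_i'(s)^2}{f_i(s)^2} - f_i''(s)\right]$$

Taking the expectation with $E[k_i] = f_i(s)$:

$$E\left[-\frac{\partial^2 \ln P(\mathbf{r}|s)}{\partial s^2}\right] = \sum_{i=1}^{n} \left[-f_i(s) \frac{f_i''(s) f_i(s) - f_i'(s)^2}{f_i(s)^2} + f_i''(s)\right]$$

$$I(s) = \sum_{i=1}^{n} \frac{f_i'(s)^2}{f_i(s)}$$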

Page 18

Fisher Information

• For one neuron with Poisson noise

$$I_i(s) = \frac{f_i'(s)^2}{f_i(s)}$$

• For n independent neurons:

$$I(s) = \sum_{i=1}^{n} \frac{f_i'(s)^2}{f_i(s)}$$

The more neurons, the better (more terms in the sum)! Small variance is good (for Poisson noise the variance is the denominator $f_i(s)$)!

Large slope is good (the numerator is the squared slope $f_i'(s)^2$)!
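As a rough illustration of the Poisson formula $I(s) = \sum_i f_i'(s)^2 / f_i(s)$, the sketch below evaluates it for a population of Gaussian tuning curves. The tuning curve shape, gain, width and the spacing of preferred directions are all assumed values chosen for the example, not parameters given in the lecture.

```python
import math

def tuning(s, pref, gain=80.0, width=20.0):
    """Hypothetical Gaussian tuning curve f_i(s) with zero baseline."""
    return gain * math.exp(-0.5 * ((s - pref) / width) ** 2)

def tuning_slope(s, pref, gain=80.0, width=20.0):
    """Derivative f_i'(s) of the Gaussian tuning curve."""
    return -(s - pref) / width ** 2 * tuning(s, pref, gain, width)

def fisher_information(s, prefs):
    """I(s) = sum_i f_i'(s)^2 / f_i(s) for independent Poisson noise."""
    return sum(tuning_slope(s, p) ** 2 / tuning(s, p) for p in prefs)

# 41 neurons with preferred directions from -100 to 100 deg (assumed layout).
prefs = [-100.0 + 5.0 * i for i in range(41)]
info = fisher_information(0.0, prefs)
cr_std = 1.0 / math.sqrt(info)   # Cramer-Rao bound expressed as a std
print(info, cr_std)
```

Two properties from the slide are visible here: a neuron whose preferred direction equals s contributes nothing (its slope is zero), and duplicating the population doubles the information.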

Page 19

Fisher Information and Tuning Curves

• Fisher information is maximum where the slope is maximum

• This is consistent with adaptation experiments

Page 20

Fisher Information

• In 1D, Fisher information decreases as the width of the tuning curves increases

• In 2D, Fisher information does not depend on the width of the tuning curve

• In 3D and above, Fisher information increases as the width of the tuning curves increases

• WARNING: this is true for independent gaussian noise.

Page 21

Ideal observer

The discrimination threshold of an ideal observer, $\delta s$, is proportional to the standard deviation given by the Cramér-Rao bound:

$$\delta s \propto \sigma_{CR}$$

In other words, an efficient estimator is an ideal observer.

Page 22

• An ideal observer is an observer that can recover all the Fisher information in the activity (easy link between Fisher information and behavioral performance)

• If all distributions are Gaussian, Fisher information is the same as Shannon information.

Page 23

Estimation theory

Other examples of decoders

[Diagram: s → Encoder (nervous system) → activity vector r (activity vs. preferred orientation) → Decoder → $\hat{s}$.]

Page 24

Voting Methods

Optimal Linear Estimator

$$\hat{s} = \sum_i w_i r_i$$

Page 25

Linear Estimators

[Figure: scatter plot of Y against X.]

$$X = \{x_1, \ldots, x_n\}, \quad Y = \{y_1, \ldots, y_n\}$$

$$y^* = a x + b$$

$$E = \frac{1}{2} \sum_{i=1}^{n} (y_i^* - y_i)^2 = \frac{1}{2} \sum_{i=1}^{n} (a x_i + b - y_i)^2$$

$$\frac{\partial E}{\partial b} = \sum_{i=1}^{n} (a x_i + b - y_i) = 0$$

$$b = \frac{1}{n} \sum_{i=1}^{n} (y_i - a x_i) = \bar{y} - a \bar{x}$$

$$y^* - \bar{y} = a (x - \bar{x})$$

For zero-mean X and Y this reduces to:

$$y^* = a x$$

Page 26

Linear Estimators

$$y^* = a x$$

$$E = \frac{1}{2} \sum_{i=1}^{n} (y_i^* - y_i)^2 = \frac{1}{2} \sum_{i=1}^{n} (a x_i - y_i)^2$$

$$\frac{\partial E}{\partial a} = \sum_{i=1}^{n} x_i (a x_i - y_i) = 0$$

$$a = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2} = \frac{C_{xy}}{\sigma_x^2}$$

Page 27

Linear Estimators

$$X = [\mathbf{x}_1 \ldots \mathbf{x}_n] \; (m \times n), \quad Y = [\mathbf{y}_1 \ldots \mathbf{y}_n] \; (p \times n)$$

$$\mathbf{y}^* = W^T \mathbf{x}, \quad W \; (m \times p)$$

$$E = \frac{1}{2} \sum_{i=1}^{n} \|\mathbf{y}_i^* - \mathbf{y}_i\|^2$$

$$W = C_{XX}^{-1} C_{XY}, \quad C_{XX} = X X^T \; (m \times m), \quad C_{XY} = X Y^T \; (m \times p)$$

In the scalar case:

$$a = \frac{C_{xy}}{\sigma_x^2}$$

X and Y must be zero mean

Trust cells that have small variances and large covariances
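The scalar result $a = C_{xy}/\sigma_x^2$, together with the intercept $b = \bar{y} - a\bar{x}$ that handles data that are not zero mean, can be sketched in a few lines. The function name `fit_line` and the toy data are assumptions made for illustration only.

```python
def fit_line(xs, ys):
    """Least-squares fit of y* = a x + b, with a = C_xy / var_x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Covariance of x and y, and variance of x, after removing the means.
    c_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / n
    var_x = sum((x - x_mean) ** 2 for x in xs) / n
    a = c_xy / var_x
    b = y_mean - a * x_mean
    return a, b

# Toy data lying exactly on y = 2x + 1 (hypothetical values).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
a, b = fit_line(xs, ys)
print(a, b)
```

Subtracting the means first is what makes the zero-mean formula applicable; the multivariate version replaces the division by $\sigma_x^2$ with multiplication by $C_{XX}^{-1}$.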

Page 28

Voting Methods

Optimal Linear Estimator

$$\hat{s} = \sum_i w_i r_i = W^T \mathbf{r}, \quad W = C_{rr}^{-1} C_{rs}$$

Page 29

Voting Methods

Optimal Linear Estimator

$$\hat{s} = \sum_i w_i r_i = W^T \mathbf{r}, \quad W = C_{rr}^{-1} C_{rs}$$

Center of Mass

$$\hat{s} = \frac{\sum_i r_i s_i}{\sum_j r_j} = \sum_i \frac{r_i}{\sum_j r_j} s_i$$

Linear in $r_i / \sum_j r_j$

Weights set to $s_i$

Page 30

Center of Mass/Population Vector

• The center of mass is optimal (unbiased and efficient) iff the tuning curves are Gaussian with a zero baseline and uniformly distributed, and the noise follows a Poisson distribution

• In general, the center of mass has a large bias and a large variance
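The effect of a nonzero baseline on the center of mass can be seen in a small noise-free sketch. The Gaussian tuning curves, their gain and width, the baseline of 10 and the layout of preferred values are all assumptions picked for the example.

```python
import math

def tuning(s, pref, gain=80.0, width=20.0, baseline=0.0):
    """Hypothetical Gaussian tuning curve with an optional baseline."""
    return baseline + gain * math.exp(-0.5 * ((s - pref) / width) ** 2)

def center_of_mass(rates, prefs):
    """s_hat = sum_i r_i s_i / sum_j r_j."""
    return sum(r * p for r, p in zip(rates, prefs)) / sum(rates)

prefs = [-100.0 + 5.0 * i for i in range(41)]   # preferred values s_i
s_true = 10.0

# Noise-free mean responses, without and with a baseline.
clean = [tuning(s_true, p) for p in prefs]
with_baseline = [tuning(s_true, p, baseline=10.0) for p in prefs]

s_clean = center_of_mass(clean, prefs)
s_biased = center_of_mass(with_baseline, prefs)
print(s_clean, s_biased)
```

With a zero baseline the estimate sits essentially on the true value, while the baseline pulls it toward the middle of the range of preferred values, which is one way the bias mentioned above arises.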

Page 31

Voting Methods

Optimal Linear Estimator

$$\hat{s} = \sum_i w_i r_i = W^T \mathbf{r}, \quad W = C_{rr}^{-1} C_{rs}$$

Center of Mass

$$\hat{s} = \frac{\sum_i r_i s_i}{\sum_i r_i}$$

Population Vector

$$\hat{\mathbf{P}} = \sum_i r_i \mathbf{P}_i$$

$$\hat{s} = \mathrm{angle}(\hat{\mathbf{P}})$$

Linear in $r_i$

Weights set to $\mathbf{P}_i$

Nonlinear step
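The two stages of the population vector (a linear sum of preferred-direction vectors weighted by activity, followed by the nonlinear angle step) can be sketched as follows. The cosine tuning, the eight evenly spaced preferred directions and the function name `population_vector` are assumptions chosen so that the estimator is exact in this noise-free case.

```python
import math

def population_vector(rates, pref_dirs_deg):
    """Sum r_i * P_i over unit vectors P_i, then take the angle (deg)."""
    x = sum(r * math.cos(math.radians(p)) for r, p in zip(rates, pref_dirs_deg))
    y = sum(r * math.sin(math.radians(p)) for r, p in zip(rates, pref_dirs_deg))
    return math.degrees(math.atan2(y, x))   # nonlinear step

# Eight neurons with uniformly spaced preferred directions (assumed).
pref_dirs = [45.0 * i for i in range(8)]
s_true = 30.0

# Noise-free cosine tuning: r_i = 20 + 10 cos(s - s_i).
rates = [20.0 + 10.0 * math.cos(math.radians(s_true - p)) for p in pref_dirs]

s_hat = population_vector(rates, pref_dirs)
print(s_hat)
```

The baseline term cancels because the preferred-direction vectors sum to zero when they are uniformly distributed; with other tuning shapes or uneven sampling that cancellation no longer holds.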

Page 32

Population Vector

[Figure: the vectors $r_i \mathbf{P}_i$ sum to the population vector $\hat{\mathbf{P}}$, whose angle gives the estimate $\hat{s}$.]

Page 33

Population Vector

$$\hat{\mathbf{P}} = \sum_{i=1}^{m} r_i \mathbf{P}_i = \begin{pmatrix} p_{11} & \cdots & p_{1m} \\ p_{21} & \cdots & p_{2m} \end{pmatrix} \begin{pmatrix} r_1 \\ \vdots \\ r_m \end{pmatrix} = W^T \mathbf{r}$$

$$W \stackrel{?}{=} C_{rr}^{-1} C_{rP}$$

Typically, the population vector is not the optimal linear estimator.

Page 34

Population Vector

• The population vector is optimal iff the tuning curves are cosine and uniformly distributed, and the noise follows a normal distribution with fixed variance

• In most cases, the population vector is biased and has a large variance

• The variance of the population vector estimate does not reflect Fisher information

Page 35

Population Vector

[Figure: comparison of the population vector with the CR bound.]

The population vector should NEVER be used to estimate information content! The indirect method is prone to severe problems…

Page 36

Population Vector

[Figure: the population vector estimate $\hat{s}_{PV}$.]

Page 37

Maximum Likelihood

[Figure: activity vs. preferred direction (deg), with the estimate $\hat{s}_{ML}$ marked.]

Page 38

Maximum Likelihood

If the noise is Gaussian and independent

$$P(\mathbf{r}|s) \propto \exp\left(-\sum_i \frac{(r_i - f_i(s))^2}{2\sigma^2}\right)$$

Therefore

$$\log P(\mathbf{r}|s) = -\sum_i \frac{(r_i - f_i(s))^2}{2\sigma^2} + \mathrm{const}$$

and the estimate is given by:

$$\hat{s} = \arg\min_s \sum_i \frac{(r_i - f_i(s))^2}{2\sigma^2}$$

Distance measure: Template matching

Page 39

Gradient descent for ML

• To maximize the likelihood with respect to s, one can minimize the negative log likelihood $L(s) = -\log P(\mathbf{r}|s)$ using a gradient descent technique in which s is updated with a small step size $\epsilon$ according to:

$$s_{t+1} = s_t + \Delta s_t$$

$$\Delta s_t = -\epsilon \frac{\partial L}{\partial s}$$
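A minimal sketch of this update for independent Gaussian noise, where the quantity being minimized is the template-matching distance $L(s) = \sum_i (r_i - f_i(s))^2 / 2\sigma^2$. The Gaussian tuning curves, the step size of 0.01, the starting point and the noise-free activity pattern are all assumptions made for the example.

```python
import math

def tuning(s, pref, gain=80.0, width=20.0):
    """Hypothetical Gaussian tuning curve f_i(s)."""
    return gain * math.exp(-0.5 * ((s - pref) / width) ** 2)

def tuning_slope(s, pref, gain=80.0, width=20.0):
    """Derivative f_i'(s)."""
    return -(s - pref) / width ** 2 * tuning(s, pref, gain, width)

def distance_gradient(s, rates, prefs, sigma=1.0):
    """dL/ds for L(s) = sum_i (r_i - f_i(s))^2 / (2 sigma^2)."""
    return -sum((r - tuning(s, p)) * tuning_slope(s, p)
                for r, p in zip(rates, prefs)) / sigma ** 2

def gradient_descent(rates, prefs, s_init=0.0, step=0.01, n_steps=200):
    """Repeat s <- s - step * dL/ds for a fixed number of iterations."""
    s = s_init
    for _ in range(n_steps):
        s -= step * distance_gradient(s, rates, prefs)
    return s

prefs = [-100.0 + 5.0 * i for i in range(41)]
rates = [tuning(10.0, p) for p in prefs]   # noise-free pattern for s = 10
s_hat = gradient_descent(rates, prefs)
print(s_hat)
```

The step size has to be small relative to the curvature of L at its minimum, otherwise the iteration overshoots; the value used here was chosen for these particular tuning parameters.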

Page 40

Gaussian noise with variance proportional to the mean

If the noise is Gaussian with variance proportional to the mean, the distance being minimized changes to:

$$\hat{s} = \arg\min_s \sum_i \frac{(r_i - f_i(s))^2}{2 f_i(s)}$$

Data points with small variance are weighted more heavily

Page 41

Poisson noise

If the noise is Poisson then

$$p(r_i|s) = \frac{f_i(s)^{r_i} e^{-f_i(s)}}{r_i!}$$

And:

$$p(\mathbf{r}|s) = \prod_i p(r_i|s) = \prod_i \frac{e^{-f_i(s)} f_i(s)^{r_i}}{r_i!}$$
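Taking the log of the product gives $\ln p(\mathbf{r}|s) = \sum_i [r_i \ln f_i(s) - f_i(s) - \ln r_i!]$, which can be maximized over a grid of candidate stimuli. The sketch below is an illustration under assumed conditions: Gaussian tuning curves, a fixed random seed, a simple Poisson sampler and a 0.5 deg grid are all choices made for the example.

```python
import math
import random

def tuning(s, pref, gain=80.0, width=20.0):
    """Hypothetical Gaussian tuning curve f_i(s): mean spike count."""
    return gain * math.exp(-0.5 * ((s - pref) / width) ** 2)

def sample_poisson(mean, rng):
    """Draw one Poisson count using Knuth's multiplication method."""
    limit = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def log_likelihood(s, counts, prefs):
    """ln p(r|s) = sum_i [ r_i ln f_i(s) - f_i(s) - ln r_i! ]."""
    total = 0.0
    for r, pref in zip(counts, prefs):
        f = tuning(s, pref)
        total += r * math.log(f) - f - math.lgamma(r + 1)
    return total

rng = random.Random(0)
prefs = [-100.0 + 5.0 * i for i in range(41)]
s_true = 10.0
counts = [sample_poisson(tuning(s_true, p), rng) for p in prefs]

# Grid search: keep the candidate s with the highest log likelihood.
candidates = [-90.0 + 0.5 * k for k in range(361)]
s_ml = max(candidates, key=lambda s: log_likelihood(s, counts, prefs))
print(s_ml)
```

The $\ln r_i!$ term does not depend on s, so it could be dropped without changing the estimate; it is kept here only so that the function returns the full log likelihood.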

Page 42

ML and template matching

Maximum likelihood is a template matching procedure, BUT the metric used is not always the Euclidean distance; it depends on the noise distribution.

Page 43

Bayesian approach

We want to recover p(s|r). Using Bayes' theorem, we have:

$$p(s|\mathbf{r}) = \frac{p(\mathbf{r}|s)\, p(s)}{p(\mathbf{r})}$$

p(r|s): likelihood of s

p(s|r): posterior distribution over s

p(r): prior distribution over r

p(s): prior distribution over s

Page 44

Bayesian approach

What is the likelihood of s, p(r|s)? It is the distribution of the noise… It is the same distribution we used for maximum likelihood.

Page 45

Bayesian approach

• The prior p(s) corresponds to any knowledge we may have about s before we get to see any activity.

• Ex: prior for smooth and slow motions

Page 46

Bayesian approach

Once we have p(s|r), we can proceed in two different ways. We can keep this distribution for Bayesian inferences (as we would do in a Bayesian network) or we can make a decision about s. For instance, we can estimate s as being the value that maximizes p(s|r). This is known as the maximum a posteriori estimate (MAP). For a flat prior, ML and MAP are equivalent.
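The relation between ML and MAP can be illustrated on a grid of candidate stimuli: the log posterior is the log likelihood plus the log prior, up to a constant. In the sketch below, the Poisson likelihood with Gaussian tuning curves, the noise-free activity pattern and the narrow Gaussian prior centered on 0 are all assumptions made for the example.

```python
import math

def tuning(s, pref, gain=80.0, width=20.0):
    """Hypothetical Gaussian tuning curve f_i(s)."""
    return gain * math.exp(-0.5 * ((s - pref) / width) ** 2)

def log_likelihood(s, rates, prefs):
    """Poisson log likelihood, dropping terms that do not depend on s."""
    return sum(r * math.log(tuning(s, p)) - tuning(s, p)
               for r, p in zip(rates, prefs))

def flat_log_prior(s):
    return 0.0

def gaussian_log_prior(s, mean=0.0, std=1.0):
    return -0.5 * ((s - mean) / std) ** 2

def map_estimate(grid, log_lik, log_prior):
    """Return the MAP estimate and the normalized posterior on the grid."""
    log_post = [ll + log_prior(s) for ll, s in zip(log_lik, grid)]
    peak = max(log_post)
    unnormalized = [math.exp(lp - peak) for lp in log_post]
    z = sum(unnormalized)
    posterior = [u / z for u in unnormalized]
    return grid[log_post.index(peak)], posterior

prefs = [-100.0 + 5.0 * i for i in range(41)]
rates = [tuning(10.0, p) for p in prefs]        # noise-free pattern, s = 10
grid = [-90.0 + 0.5 * k for k in range(361)]
log_lik = [log_likelihood(s, rates, prefs) for s in grid]

s_ml = grid[log_lik.index(max(log_lik))]
s_map_flat, _ = map_estimate(grid, log_lik, flat_log_prior)
s_map, posterior = map_estimate(grid, log_lik, gaussian_log_prior)
print(s_ml, s_map_flat, s_map)
```

With the flat prior the MAP estimate coincides with the ML estimate, whereas the Gaussian prior pulls the estimate from the likelihood peak toward the prior mean.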

Page 47

Bayesian approach

Limitations: the Bayesian approach and ML require a lot of data (estimating p(r|s) requires at least n + n(n+1)/2 parameters for a multivariate Gaussian: n means plus the n(n+1)/2 distinct entries of the covariance matrix)…

Alternatives:

1- Naïve Bayes: assume independence and hope for the best

2- Use a clever method for fitting p(r|s).

3- Estimate p(s|r) directly using a nonlinear estimate.

4- Hope the brain uses likelihood functions that have only N free parameters, e.g., the exponential family with linear sufficient statistics

Page 48

Bayesian approach: logistic regression

Example: Decoding finger movements in M1. On each trial, we observe 100 cells and we want to know which one of the 5 fingers is being moved.

[Diagram: 100 input units (cells 1, 2, 3, …, 100) carrying the activity r project to 5 category units (fingers 1 to 5); the output of unit 5 is P(F5|r). Inset: plot of g(x), ranging from 0 to 1.]

$$P(F_i|\mathbf{r}) = g(\mathbf{W}_i^T \mathbf{r})$$

Page 49

Bayesian approach: logistic regression

Example: 5N free parameters instead of O(N^2)

[Diagram: 100 input units (cells 1, 2, 3, …, 100) carrying the activity r project to 5 category units (fingers 1 to 5); the output of unit 5 is P(F5|r).]

$$P(F_i|\mathbf{r}) = g(\mathbf{W}_i^T \mathbf{r})$$

Page 50

Bayesian approach: multinomial distributions

Example: Decoding finger movements in M1. Each finger can take 3 mutually exclusive states: no movement, flexion, extension.

[Diagram: the activity of the N M1 neurons is mapped through the weights W to one softmax group for each of Digit 1, Digit 2, Digit 3, Digit 4, Digit 5 and Wrist; each group outputs the probability of no movement, the probability of flexion and the probability of extension.]
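For one digit, the softmax stage turns three weighted sums of the neural activity into three probabilities that sum to one. The sketch below uses made-up weights and firing rates for four neurons; in practice the weights W would be fitted to data, which is not shown here.

```python
import math

def softmax(scores):
    """Convert real-valued scores into probabilities that sum to 1."""
    top = max(scores)                       # subtract max for stability
    exps = [math.exp(x - top) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]

def state_probabilities(weights, rates):
    """One row of weights per state; returns the softmax of W r."""
    scores = [sum(w * r for w, r in zip(row, rates)) for row in weights]
    return softmax(scores)

# Hypothetical weights for the states: no movement, flexion, extension.
weights = [
    [0.1, 0.0, 0.0, 0.0],
    [0.0, 0.2, 0.0, 0.1],
    [0.0, 0.0, 0.1, 0.0],
]
rates = [5.0, 20.0, 10.0, 5.0]              # activity of four neurons

probs = state_probabilities(weights, rates)
states = ["no movement", "flexion", "extension"]
decoded = states[probs.index(max(probs))]
print(probs, decoded)
```

Because the three states are mutually exclusive, a single softmax per digit is the natural output; independent logistic units would not force the probabilities to sum to one.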

Page 51

Decoding time varying signals

[Figure: the stimulus s(t) and the spike train $\rho(t)$ as functions of time.]

Page 52

Decoding time varying signals

$$\hat{s}(t - \tau_0) = (k * \rho)(t) = \int k(\tau)\, \rho(t - \tau)\, d\tau$$

Note the time shift…

Page 53

Decoding time varying signals

$$\hat{s}(t - \tau_0) = \int k(\tau)\, \rho(t - \tau)\, d\tau = \int k(\tau) \sum_{i=1}^{n} \delta(t - \tau - t_i)\, d\tau = \sum_{i=1}^{n} k(t - t_i)$$

Discrete sum of templates centered on spikes
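The last form of the estimate, a sum of copies of the kernel centered on the spike times, is straightforward to evaluate. In this sketch the Gaussian kernel shape, its 50 ms width and the spike times are hypothetical; the lecture derives the optimal kernel separately.

```python
import math

def kernel(t, width=0.05):
    """Hypothetical decoding kernel k(t): a Gaussian bump, width in seconds."""
    return math.exp(-0.5 * (t / width) ** 2)

def decode(t, spike_times):
    """Estimate at time t: sum of kernels centered on the spike times t_i."""
    return sum(kernel(t - t_i) for t_i in spike_times)

spike_times = [0.10, 0.12, 0.30]            # made-up spike times (s)

# Evaluate the reconstruction on a 10 ms grid from 0 to 0.4 s.
times = [0.01 * k for k in range(41)]
reconstruction = [decode(t, spike_times) for t in times]
print(max(reconstruction))
```

Each spike contributes the same template, so the estimate is large where spikes cluster and falls back to zero between them.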

Page 54

Decoding time varying signals

[Figure: the stimulus s(t) and the spike train $\rho(t)$ as functions of time.]

Page 55

Decoding time varying signals

• Finding the optimal kernel (similar to OLE)

$$\hat{s}(t) = (k * \rho)(t)$$

$$\tilde{\hat{s}}(\omega) = \tilde{k}(\omega)\, \tilde{\rho}(\omega)$$

$$\tilde{k}(\omega) = \frac{\tilde{Q}_{rs}(\omega)}{\tilde{Q}_{\rho\rho}(\omega)}$$

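Page 56

$$s_{est}(t - \tau_0) = \int_0^\infty d\tau\, K(\tau) \left(\rho(t - \tau) - \langle r \rangle\right)$$

$$E = \frac{1}{T} \int_0^T dt \left(s_{est}(t - \tau_0) - s(t - \tau_0)\right)^2 = \frac{1}{T} \int_0^T dt \left(\int_0^\infty d\tau \left(\rho(t - \tau) - \langle r \rangle\right) K(\tau) - s(t - \tau_0)\right)^2$$

Minimizing E with respect to K gives:

$$\int_0^\infty d\tau'\, Q_{\rho\rho}(\tau - \tau')\, K(\tau') = Q_{rs}(\tau - \tau_0)$$

Autocorrelation function of the spike train

$$Q_{\rho\rho}(\tau - \tau') = \frac{1}{T} \int_0^T dt \left(\rho(t - \tau) - \langle r \rangle\right) \left(\rho(t - \tau') - \langle r \rangle\right)$$

Correlation of the firing rate and stimulus

$$Q_{rs}(\tau - \tau_0) = \frac{1}{T} \int_0^T dt \left(\rho(t - \tau) - \langle r \rangle\right) s(t - \tau_0)$$

In the Fourier domain:

$$K(\tau) = \frac{1}{2\pi} \int d\omega\, \tilde{K}(\omega) \exp(-i \omega \tau), \quad \tilde{K}(\omega) = \frac{\tilde{Q}_{rs}(\omega) \exp(i \omega \tau_0)}{\tilde{Q}_{\rho\rho}(\omega)}$$

Appendix A chapter 2

If $Q_{\rho\rho}(\tau) = \langle r \rangle \delta(\tau)$, then

$$K(\tau) = \frac{1}{\langle r \rangle} Q_{rs}(\tau - \tau_0) = C(\tau_0 - \tau) = \frac{1}{\langle n \rangle} \left\langle \sum_{i=1}^{n} s(t_i + \tau - \tau_0) \right\rangle$$

If the spike train is uncorrelated, the optimal kernel is the spike triggered average of the stimulus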

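Page 57

$$Q_{rs}(\tau - \tau_0) = \frac{1}{T} \int_0^T dt \left(\rho(t - \tau) - \langle r \rangle\right) s(t - \tau_0)$$

$$= \frac{1}{T} \int_0^T dt \left(\sum_{i=1}^{N} \delta(t - \tau - t_i) - \langle r \rangle\right) s(t - \tau_0)$$

$$= \frac{1}{T} \sum_{i=1}^{N} \int_0^T dt\, \delta(t - \tau - t_i)\, s(t - \tau_0) - \frac{1}{T} \int_0^T dt\, \langle r \rangle\, s(t - \tau_0)$$

$$= \frac{1}{T} \sum_{i=1}^{N} s(t_i + \tau - \tau_0)$$

The last step holds when the stimulus has zero time average, so that the second term vanishes.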
