
Page 1: Maximum Likelihood and GLMs

Maximum Likelihood and GLMs

Jonathan Pillow

Mathematical Tools for Neuroscience (NEU 314), Fall 2021

lecture 20

Page 2: Maximum Likelihood and GLMs

quiz

1) Compute the conditional P(x | y = 1)
2) Compute the mean E(y)
3) Compute P(x)P(y), the independent approximation to P(x,y)
4) Compute the entropy of P(x)
5) Write down a formula for mutual information, I(x,y)

Joint distribution P(x,y):

          x = 1   x = 2
  y = 1    0.25    0.5
  y = 2    0.25    0

BONUS: Compute the mutual information between x and y.

(Feel free to use a calculator for this one, or you can use the fact that the entropy of the distribution [1/3, 2/3] is approximately 0.9.)
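A quick numerical check of the quiz (a sketch assuming base-2 logarithms, so entropies are in bits, with the usual 0·log 0 = 0 convention):

```python
import numpy as np

# Joint distribution P(x,y) from the quiz table: rows = y in {1,2}, cols = x in {1,2}
P = np.array([[0.25, 0.5],
              [0.25, 0.0]])

Py = P.sum(axis=1)              # marginal P(y) = [0.75, 0.25]
Px = P.sum(axis=0)              # marginal P(x) = [0.5, 0.5]

Px_given_y1 = P[0] / Py[0]      # 1) conditional P(x | y=1) = [1/3, 2/3]
Ey = np.array([1, 2]) @ Py      # 2) mean E(y) = 1.25
P_indep = np.outer(Py, Px)      # 3) independent approximation P(x)P(y)

def entropy(p):                 # entropy in bits, skipping zero entries
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

Hx = entropy(Px)                # 4) H(x) = 1 bit

# 5) / BONUS: I(x,y) = sum_{x,y} P(x,y) log2[ P(x,y) / (P(x)P(y)) ]
nz = P > 0
I = (P[nz] * np.log2(P[nz] / P_indep[nz])).sum()   # ~0.31 bits
```

The bonus also matches the hint: I(x,y) = H(x) − H(x|y) = 1 − 0.75·0.918 ≈ 0.31 bits.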

Page 3: Maximum Likelihood and GLMs

Estimation

model: s → r, mapping the parameter s (the "stimulus") to the measured dataset r = (r₁, r₂, ..., r_N) (the "population response")

[Figure: the measured population response — spike count vs. neuron #]

Maximum likelihood estimator θ̂_ML = the value of the parameter at which the likelihood is maximal.

Maximum a posteriori (MAP) estimator θ̂_MAP = the value of the parameter at which the posterior p(θ | m) is maximal.
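Stated as equations (a restatement of the two definitions above, writing m for the measured data and θ for the parameter):

```latex
\hat{\theta}_{\mathrm{ML}}  = \arg\max_{\theta}\, p(m \mid \theta), \qquad
\hat{\theta}_{\mathrm{MAP}} = \arg\max_{\theta}\, p(\theta \mid m)
                            = \arg\max_{\theta}\, p(m \mid \theta)\, p(\theta)
```

The last equality uses Bayes' rule, since p(m) does not depend on θ.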

Page 4: Maximum Likelihood and GLMs

Simple Example: Gaussian noise & prior

encoding model: m = θ + n

1. Likelihood: additive Gaussian noise, n ~ N(0, σ²) (zero-mean Gaussian), so p(m | θ) = N(θ, σ²)

2. Prior: Gaussian with mean μ₀ and variance σ₀², p(θ) = N(μ₀, σ₀²)

⟹ Posterior: p(θ | m) ∝ p(m | θ) p(θ), which is again Gaussian.
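For completeness, multiplying the two Gaussians and completing the square (a standard conjugate-Gaussian calculation, using the μ₀, σ₀², σ² defined above) gives the posterior in closed form:

```latex
p(\theta \mid m) \;\propto\;
\exp\!\left(-\frac{(m-\theta)^2}{2\sigma^2}\right)
\exp\!\left(-\frac{(\theta-\mu_0)^2}{2\sigma_0^2}\right)
\;\Longrightarrow\;
\theta \mid m \;\sim\; \mathcal{N}\!\left(
\frac{\sigma_0^2\, m + \sigma^2 \mu_0}{\sigma_0^2 + \sigma^2},\;
\frac{\sigma_0^2\, \sigma^2}{\sigma_0^2 + \sigma^2}
\right)
```

The posterior mean is a precision-weighted average of the measurement m and the prior mean μ₀ — exactly the bias toward the prior illustrated on the following slides.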

Page 5: Maximum Likelihood and GLMs

Observation model

[Figure: joint distribution of parameter θ (horizontal axis, −8 to 8) and measurement m (vertical axis, −8 to 8)]

Page 6: Maximum Likelihood and GLMs

Observation model

[Figure: joint distribution of θ and m, as on the previous slide (build)]

Page 7: Maximum Likelihood and GLMs

Observation model

[Figure: joint distribution of θ and m, as on the previous slide (build)]

Page 8: Maximum Likelihood and GLMs

Likelihood: considering p(m | θ) as a function of θ

[Figure: joint distribution of θ and m, with the observed m held fixed]

Page 9: Maximum Likelihood and GLMs

Likelihood: considering p(m | θ) as a function of θ

[Figure: joint distribution of θ and m, with horizontal slices re-plotted as functions of θ]

Page 10: Maximum Likelihood and GLMs

Prior

[Figure: prior distribution p(θ) over θ, axis −8 to 8]

Page 11: Maximum Likelihood and GLMs

Computing the posterior

posterior ∝ likelihood × prior

[Figure: likelihood, prior, and posterior curves over θ, for a measurement m]

Page 12: Maximum Likelihood and GLMs

Making a Bayesian Estimate:

posterior ∝ likelihood × prior

[Figure: likelihood (centered on the measurement m), prior, and posterior; the estimate m* is shifted from m toward the prior — the bias]

Page 13: Maximum Likelihood and GLMs

High Measurement Noise: large bias

posterior ∝ likelihood × prior

[Figure: broad likelihood ⇒ posterior pulled far toward the prior (larger bias)]

Page 14: Maximum Likelihood and GLMs

Low Measurement Noise: small bias

posterior ∝ likelihood × prior

[Figure: narrow likelihood ⇒ posterior stays near the measurement (small bias)]

Page 15: Maximum Likelihood and GLMs

Bayesian Estimation:

• Likelihood and prior combine to form posterior

• MAP estimate is always biased towards the prior (compared to the ML estimate)

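A minimal numerical sketch of the "large bias / small bias" picture (all numbers here are invented for illustration): a zero-mean Gaussian prior, a Gaussian likelihood centered on the measurement, and the MAP estimate read off a grid for high vs. low measurement noise.

```python
import numpy as np

theta = np.linspace(-8, 8, 2001)        # grid over the parameter
m, prior_sd = 4.0, 2.0                  # measurement and prior width (illustrative)

prior = np.exp(-theta**2 / (2 * prior_sd**2))

for noise_sd in (3.0, 0.5):             # high vs. low measurement noise
    likelihood = np.exp(-(m - theta)**2 / (2 * noise_sd**2))
    posterior = likelihood * prior      # unnormalized; fine for finding the peak
    theta_map = theta[np.argmax(posterior)]
    print(f"noise sd {noise_sd}: MAP = {theta_map:.2f}, bias = {m - theta_map:.2f}")
```

With high noise the MAP estimate lands near 1.2 (large bias toward the prior at 0); with low noise it stays near 3.8 (small bias).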

Page 16: Maximum Likelihood and GLMs

Application #1: Biases in Motion Perception

Which grating moves faster?

[Figure: two drifting gratings presented on either side of a fixation cross]

Page 17: Maximum Likelihood and GLMs

Application #1: Biases in Motion Perception

Which grating moves faster?

[Figure: a second pair of drifting gratings around a fixation cross]

Page 18: Maximum Likelihood and GLMs

Explanation from Weiss, Simoncelli & Adelson (2002):

• In the limit of a zero-contrast grating, the likelihood becomes infinitely broad ⇒ the percept goes to zero motion.

[Figure: likelihood, prior, and posterior over velocity for a clean vs. a noisy measurement]

Noisier measurements make the likelihood broader ⇒ the posterior shifts farther toward 0 (the prior favors no motion).

• Claim: this explains why people actually speed up when driving in fog!

Page 19: Maximum Likelihood and GLMs

Maximum Likelihood Estimation: 2 worked examples for spike-count encoding models

Page 20: Maximum Likelihood and GLMs

Example 1: linear Poisson neuron

encoding model: spike count y ~ Poiss(λ), with spike rate λ = θx, for stimulus x and parameter θ

Important distributions:

[Figure: a Gaussian density (continuous, over −3 to 3) and a Poisson distribution P(y) over counts 0–10, with λ = mean]

Others that may come up: Bernoulli, binomial, multinomial, exponential, gamma.
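A small simulation sketch of this encoding model (the value of θ and the contrast range are invented for illustration; the x and y generated here are reused in the sketches below):

```python
import numpy as np

rng = np.random.default_rng(0)

theta_true = 1.5                      # illustrative gain parameter
x = rng.uniform(0, 40, size=200)      # stimulus contrasts
rate = theta_true * x                 # linear spike rate: lambda = theta * x
y = rng.poisson(rate)                 # spike counts: y ~ Poiss(theta * x)
```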

Page 21: Maximum Likelihood and GLMs

conditional distribution p(y | x)

[Figure: spike count y (0–60) vs. contrast x (0–40), with one slice showing p(y | x)]

Page 22: Maximum Likelihood and GLMs

conditional distribution p(y | x)

[Figure: same data, with another slice showing p(y | x) (build)]

Page 23: Maximum Likelihood and GLMs

conditional distribution p(y | x)

[Figure: same data, with a further slice showing p(y | x) (build)]

Page 24: Maximum Likelihood and GLMs

Maximum Likelihood Estimation:

• given observed data Y, find the θ that maximizes the likelihood

p(Y | X, θ) = ∏ᵢ p(yᵢ | xᵢ, θ)

where Y = all spike counts, X = all stimuli, θ = parameters, and each factor p(yᵢ | xᵢ, θ) is a single-trial probability.

Q: what assumption are we making about the responses?
A: conditional independence across trials!

Page 25: Maximum Likelihood and GLMs

Maximum Likelihood Estimation:

• given observed data Y, find the θ that maximizes p(Y | X, θ) = ∏ᵢ p(yᵢ | xᵢ, θ)
(Y = all spike counts, X = all stimuli, θ = parameters; each factor is a single-trial probability)

Q: what assumption are we making about the responses?
A: conditional independence across trials!

Q: when do we call p(Y | X, θ) a likelihood?
A: when considering it as a function of θ!

Page 26: Maximum Likelihood and GLMs

Maximum Likelihood Estimation:

• given observed data, find the θ that maximizes p(y | x, θ)
• could in theory do this by turning a knob on θ

[Figure: spike count vs. contrast data, with the model's fit at the current knob setting of θ]

Page 27: Maximum Likelihood and GLMs

Maximum Likelihood Estimation:

• given observed data, find the θ that maximizes p(y | x, θ)
• could in theory do this by turning a knob on θ

[Figure: same data, with the fit at another knob setting of θ (build)]

Page 28: Maximum Likelihood and GLMs

Maximum Likelihood Estimation:

• given observed data, find the θ that maximizes p(y | x, θ)
• could in theory do this by turning a knob on θ

[Figure: same data, with the fit at a third knob setting of θ (build)]
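"Turning the knob" is just evaluating the log-likelihood at a grid of θ values and keeping the best one. A sketch, reusing the simulated x and y from the Example 1 snippet above:

```python
import numpy as np
from scipy.stats import poisson

# x, y: stimuli and spike counts simulated in the Example 1 snippet
thetas = np.linspace(0.1, 2.0, 100)                   # knob settings to try
loglik = [poisson.logpmf(y, t * x).sum() for t in thetas]
theta_hat = thetas[np.argmax(loglik)]                 # best knob setting
```

For the simulated data this lands close to the true θ = 1.5 and, up to grid resolution, agrees with the closed-form answer derived on the next slides.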

Page 29: Maximum Likelihood and GLMs

Likelihood function: p(Y | X, θ), considered as a function of θ.

Because data are independent:

L(θ) = ∏ᵢ p(yᵢ | xᵢ, θ)

[Figure: the likelihood plotted over θ from 0 to 2]

Page 30: Maximum Likelihood and GLMs

Likelihood function: p(Y | X, θ), considered as a function of θ.

Because data are independent:

log L(θ) = log ∏ᵢ p(yᵢ | xᵢ, θ) = Σᵢ log p(yᵢ | xᵢ, θ)

[Figure: the likelihood and the log-likelihood plotted over θ from 0 to 2]

Page 31: Maximum Likelihood and GLMs

Do it: solve for θ̂_ML.

[Figure: the log-likelihood plotted over θ from 0 to 2]

Page 32: Maximum Likelihood and GLMs

• Closed-form solution: θ̂_ML = (Σᵢ yᵢ) / (Σᵢ xᵢ)

(let's notice: this is kind of a weird result!)

[Figure: the log-likelihood plotted over θ from 0 to 2, peaking at θ̂_ML]
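The calculus behind that closed form (the log yᵢ! term drops out because it does not depend on θ):

```latex
\log L(\theta) = \sum_i \Big( y_i \log(\theta x_i) \;-\; \theta x_i \;-\; \log y_i! \Big),
\qquad
\frac{d}{d\theta} \log L(\theta) = \frac{\sum_i y_i}{\theta} - \sum_i x_i = 0
\;\;\Longrightarrow\;\;
\hat{\theta}_{\mathrm{ML}} = \frac{\sum_i y_i}{\sum_i x_i}
```

Part of what makes this "weird": the answer is the ratio of the totals (all spikes over all stimulus), not the average of the per-trial ratios yᵢ/xᵢ.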

Page 33: Maximum Likelihood and GLMs

Example 2: linear Gaussian neuron

encoding model: spike count y = θx + n, with spike rate θx and Gaussian noise n ~ N(0, σ²); x = stimulus, θ = parameter

Page 34: Maximum Likelihood and GLMs

encoding distribution: all slices p(y | x) have the same width

[Figure: spike count y (0–60) vs. contrast x (0–40), with Gaussian conditionals of constant width]

Page 35: Maximum Likelihood and GLMs

Log-likelihood: log L(θ) = −(1/(2σ²)) Σᵢ (yᵢ − θxᵢ)² + const

Do it: differentiate, set to zero, and solve for θ.
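One way to finish that exercise (differentiating the Gaussian log-likelihood above):

```latex
\frac{d}{d\theta} \log L(\theta)
= \frac{1}{\sigma^2} \sum_i x_i \big( y_i - \theta x_i \big) = 0
\;\;\Longrightarrow\;\;
\hat{\theta}_{\mathrm{ML}} = \frac{\sum_i x_i y_i}{\sum_i x_i^2}
```

This is the least-squares slope of a regression through the origin: with Gaussian noise, maximum likelihood and least squares coincide.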