
Bayesian inversion of deterministic dynamic causal models



Kay H. Brodersen¹,² & Ajita Gupta²


¹ Department of Economics, University of Zurich, Switzerland
² Department of Computer Science, ETH Zurich, Switzerland


Overview

1. Problem setting: model; likelihood; prior; posterior.

2. Variational Laplace: factorization of the posterior; why the free energy; variational energies; gradient ascent; adaptive step size; example.

3. Sampling: transformation method; rejection method; Gibbs sampling; MCMC; Metropolis-Hastings; example.

4. Model comparison: model evidence; Bayes factors; free energy; prior arithmetic mean; posterior harmonic mean; Savage-Dickey; example.

With material from Will Penny, Klaas Enno Stephan, Chris Bishop, and Justin Chumbley.


1. Problem setting


Model = likelihood + prior


Question 1: what do the data tell us about the model parameters?

compute the posterior:

$$p(\theta \mid y, m) = \frac{p(y \mid \theta, m)\, p(\theta \mid m)}{p(y \mid m)}$$

Question 2: which model is best?

compute the model evidence:

$$p(m \mid y) \propto p(y \mid m)\, p(m), \qquad p(y \mid m) = \int p(y \mid \theta, m)\, p(\theta \mid m)\, d\theta$$

Bayesian inference is conceptually straightforward


2. Variational Laplace


Variational Laplace in a nutshell

Mean-field approximation:

$$p(\theta \mid y) \approx q(\theta) = q_1(\theta_1)\, q_2(\theta_2)$$

Negative free-energy approximation to the log model evidence:

$$\ln p(y \mid m) = F + KL\big[q(\theta)\,\big\|\,p(\theta \mid y, m)\big]$$

$$F = \big\langle \ln p(y \mid \theta, m) \big\rangle_q - KL\big[q(\theta)\,\big\|\,p(\theta \mid m)\big]$$

Maximising the negative free energy with respect to $q$ minimises the divergence. This is achieved by maximising the variational energies $I(\theta_i)$:

$$q_1(\theta_1) \propto \exp\big(I(\theta_1)\big) = \exp\big(\big\langle \ln p(y, \theta_1, \theta_2) \big\rangle_{q_2}\big)$$

$$q_2(\theta_2) \propto \exp\big(I(\theta_2)\big) = \exp\big(\big\langle \ln p(y, \theta_1, \theta_2) \big\rangle_{q_1}\big)$$

The sufficient statistics of the approximate posteriors are updated iteratively by gradient ascent.

K.E. Stephan


Variational Laplace

Assumptions

mean-field approximation: the posterior factorises into approximate marginals, $q(\theta) = \prod_i q_i(\theta_i)$

Laplace approximation: each approximate marginal $q_i(\theta_i)$ is Gaussian

W. Penny


Variational Laplace

Inversion strategy

Recall the relationship between the log model evidence and the negative free energy $F$:

$$\ln p(y \mid m) = \underbrace{E_q\big[\ln p(y \mid \theta)\big] - KL\big[q(\theta)\,\big\|\,p(\theta \mid m)\big]}_{=:\,F} + \underbrace{KL\big[q(\theta)\,\big\|\,p(\theta \mid y, m)\big]}_{\geq\, 0}$$

Maximizing $F$ implies two things:

(i) we obtain a good approximation to $\ln p(y \mid m)$;

(ii) the KL divergence between $q(\theta)$ and $p(\theta \mid y, m)$ becomes minimal.

Practically, we can maximize $F$ by iteratively maximizing the variational energies, as in expectation-maximization (EM).

W. Penny


Variational Laplace

Implementation: gradient-ascent scheme (Newton’s method)

Compute the gradient vector:

$$j_\theta(i) = \frac{\partial I(\theta)}{\partial \theta(i)}$$

Compute the curvature matrix:

$$H_\theta(i,j) = \frac{\partial^2 I(\theta)}{\partial \theta(i)\, \partial \theta(j)}$$

Newton's method for finding a root (1D):

$$x_{\mathrm{new}} = x_{\mathrm{old}} - \frac{f(x_{\mathrm{old}})}{f'(x_{\mathrm{old}})}$$


Variational Laplace

Implementation: gradient-ascent scheme (Newton's method)

Compute the Newton update (change):

$$\Delta\theta = -H_\theta^{-1}\, j_\theta$$

New estimate:

$$\theta_{\mathrm{new}} = \theta_{\mathrm{old}} + \Delta\theta$$

Big curvature → small step; small curvature → big step.

Newton's method for finding a root (1D):

$$x_{\mathrm{new}} = x_{\mathrm{old}} - \frac{f(x_{\mathrm{old}})}{f'(x_{\mathrm{old}})}$$

W. Penny


Newton’s method – demonstration

Newton’s method is very efficient. However, its solution is sensitive to the starting point, as the linked demonstration shows.

http://demonstrations.wolfram.com/LearningNewtonsMethod/


Variational Laplace

Nonlinear regression (example)

Data: [figure: observed time series $y(t)$ plotted against $t$]

Model (likelihood): [equation shown on slide]

Ground truth (the known parameter values that were used to generate the data): [values shown on slide]

W. Penny


Variational Laplace

Nonlinear regression (example)

We begin by defining our prior. [figure: prior density, with the ground-truth value marked]

W. Penny


Variational Laplace

Nonlinear regression (example)

[figures: posterior density; VL optimization path over 4 iterations]

Starting point: (2.9, 1.65)
VL estimate: (3.4, 2.1)
True value: (3.4012, 2.0794)

W. Penny


3. Sampling


Sampling

Deterministic approximations

+ computationally efficient
+ efficient representation
+ learning rules may give additional insight
− application initially involves hard work
− systematic error

Stochastic approximations

+ asymptotically exact
+ easily applicable general-purpose algorithms
− computationally expensive
− storage intensive


Strategy 1 – Transformation method

We can obtain samples from some distribution $p(z)$ by first sampling from the uniform distribution and then transforming these samples (see the sketch below).

[figure: a uniform sample $u = 0.9$ under $p(u)$ maps to $z = 1.15$ under $p(z)$]

transformation: $z^{(\tau)} = F^{-1}\big(u^{(\tau)}\big)$

+ yields high-quality samples
+ easy to implement
+ computationally efficient
− obtaining the inverse cdf can be difficult


Strategy 2 – Rejection method

When the transformation method cannot be applied, we can resort to a more general method called rejection sampling. Here, we draw random numbers from a simpler proposal distribution 𝑞(𝑧) and keep only some of these samples.

The proposal distribution 𝑞(𝑧), scaled by a factor 𝑘, represents an envelope of 𝑝(𝑧).

+ yields high-quality samples
+ easy to implement
+ can be computationally efficient
− computationally inefficient if the proposal is a poor approximation

Bishop (2007) PRML


Strategy 3 – Gibbs sampling

Often the joint distribution of several random variables is unavailable, whereas the full-conditional distributions are available. In this case, we can cycle over the full conditionals to obtain samples from the joint distribution (see the sketch below).

+ easy to implement
− samples are serially correlated
− the full conditionals may not be available

Bishop (2007) PRML


Strategy 4 – Markov Chain Monte Carlo (MCMC)

Idea: we can sample from a large class of distributions and overcome the problems that previous methods face in high dimensions using a framework called Markov Chain Monte Carlo.

A candidate $z^*$ drawn from the proposal is accepted with probability

$$A\big(z^*, z^{(\tau)}\big) = \min\!\left(1,\; \frac{\tilde p(z^*)}{\tilde p\big(z^{(\tau)}\big)}\right)$$

Increase in density, $\tilde p(z^*) \geq \tilde p(z^{(\tau)})$: the candidate is always accepted.

Decrease in density, $\tilde p(z^*) < \tilde p(z^{(\tau)})$: the candidate is accepted with probability $\tilde p(z^*)\,/\,\tilde p(z^{(\tau)})$.

M. Dümcke


MCMC demonstration: finding a good proposal density

When the proposal distribution is too narrow, we might miss a mode.

When it is too wide, we obtain long constant stretches without an acceptance.

http://demonstrations.wolfram.com/MarkovChainMonteCarloSimulationUsingTheMetropolisAlgorithm/


MCMC for DCM

MH creates a series of random points $\theta^{(1)}, \theta^{(2)}, \ldots$ whose distribution converges to the target distribution of interest. For us, this is the posterior density $p(\theta \mid y)$.

We could use the following proposal distribution:

$$q\big(\theta^{(\tau)} \mid \theta^{(\tau-1)}\big) = \mathcal{N}\big(0,\ \sigma C_\theta\big)$$

where $C_\theta$ is the prior covariance and the scaling factor $\sigma$ is adapted such that the acceptance rate lies between 20% and 40%.

W. Penny


MCMC for DCM

[figure]

W. Penny

MCMC – example

[figures over three slides; the final slide compares Variational-Laplace and Metropolis-Hastings posterior estimates]

W. Penny


4. Model comparison


Model evidence

W. Penny


Prior arithmetic mean

W. Penny


Posterior harmonic mean

W. Penny


Savage-Dickey ratio

In many situations we wish to compare models that are nested. For example:

$m_F$: full model with parameters $\theta = (\theta_1, \theta_2)$

$m_R$: reduced model with $\theta = (\theta_1, 0)$

In this case, we can use the Savage-Dickey ratio to obtain a Bayes factor without having to compute the two model evidences:

$$B_{RF} = \frac{p(\theta_2 = 0 \mid y, m_F)}{p(\theta_2 = 0 \mid m_F)}$$

Example from the slide's figure: $B_{RF} = 0.9$, i.e. slight evidence in favour of the full model.

W. Penny


Comparison of methods

[figure: posterior estimates, MCMC vs. VL]

Chumbley et al. (2007) NeuroImage


Comparison of methods

[figure: estimated minus true log BF, by method]

Chumbley et al. (2007) NeuroImage