
J. Daunizeau

Institute of Empirical Research in Economics, Zurich, Switzerland

Brain and Spine Institute, Paris, France

Bayesian inference

Overview of the talk

1 Probabilistic modelling and representation of uncertainty

1.1 Bayesian paradigm

1.2 Hierarchical models

1.3 Frequentist versus Bayesian inference

2 Numerical Bayesian inference methods

2.1 Sampling methods

2.2 Variational methods (ReML, EM, VB)

3 SPM applications

3.1 aMRI segmentation

3.2 Decoding of brain images

3.3 Model-based fMRI analysis (with spatial priors)

3.4 Dynamic causal modelling

1 Probabilistic modelling and representation of uncertainty

Bayesian paradigm - probability theory: basics

Degrees of plausibility desiderata:
- should be represented using real numbers (D1)
- should conform with intuition (D2)
- should be consistent (D3)

• normalization: ∫ p(x) dx = 1
• marginalization: p(x) = ∫ p(x, y) dy
• conditioning (Bayes rule): p(x, y) = p(x|y) p(y) = p(y|x) p(x)

Bayesian paradigm - deriving the likelihood function

- Model of data with unknown parameters: y = f(θ); e.g., GLM: f(θ) = Xθ
- But data are noisy: y = f(θ) + ε
- Assume noise/residuals are 'small' (Gaussian): p(ε) ∝ exp(−ε²/2σ²), so that P(|ε| > 2σ) ≈ 0.05

→ Distribution of data, given fixed parameters: p(y|θ) ∝ exp(−(y − f(θ))²/2σ²)

Bayesian paradigm - likelihood, priors and the model evidence

Likelihood: p(y|θ, m)
Prior: p(θ|m)
Bayes rule: p(θ|y, m) = p(y|θ, m) p(θ|m) / p(y|m)

(the likelihood and the prior together specify the generative model m)
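For the linear-Gaussian (GLM) case this update has a closed form. A minimal Python sketch of Bayes rule in action (all names and values are illustrative, not SPM code):

```python
import numpy as np

# Generative model: y = X @ theta + noise, noise ~ N(0, sigma2 * I)
# Prior: theta ~ N(0, tau2 * I). The posterior is Gaussian in closed form.
rng = np.random.default_rng(0)
n, p = 50, 2
X = rng.standard_normal((n, p))          # design matrix (GLM)
theta_true = np.array([1.0, -0.5])
sigma2, tau2 = 0.25, 1.0                 # noise and prior variances
y = X @ theta_true + rng.normal(0.0, np.sqrt(sigma2), n)

# Bayes rule for the linear-Gaussian case: posterior covariance and mean
S = np.linalg.inv(X.T @ X / sigma2 + np.eye(p) / tau2)
m = S @ (X.T @ y / sigma2)
print("posterior mean:", m)              # shrunk towards the prior mean 0
print("posterior std :", np.sqrt(np.diag(S)))
```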

Bayesian paradigm - forward and inverse problems

forward problem (likelihood): p(y|θ, m)

inverse problem (posterior distribution): p(θ|y, m)

Bayesian paradigm - model comparison

Principle of parsimony: "plurality should not be assumed without necessity"

"Occam's razor": a complex model spreads its model evidence p(y|m) thinly over the space of all data sets, while a simpler model concentrates it on the data sets it can actually explain.

[Figure: model fits y = f(x) of increasing complexity, and model evidence p(y|m) plotted over the space of all data sets]

Model evidence: p(y|m) = ∫ p(y|θ, m) p(θ|m) dθ
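A numerical sketch of this Occam effect for linear-Gaussian models, where the evidence integral is available in closed form (the setup and variances are illustrative assumptions):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Log-evidence of a linear-Gaussian model: with theta ~ N(0, tau2*I) and
# y = X @ theta + noise, the marginal is y ~ N(0, sigma2*I + tau2*X@X.T).
def log_evidence(y, X, sigma2=0.25, tau2=1.0):
    n = len(y)
    cov = sigma2 * np.eye(n) + tau2 * X @ X.T
    return multivariate_normal(mean=np.zeros(n), cov=cov).logpdf(y)

rng = np.random.default_rng(1)
n = 40
x = rng.standard_normal(n)
y = 1.0 * x + rng.normal(0.0, 0.5, n)            # data from the simple model

X_simple = x[:, None]                            # 1-parameter model
X_complex = np.vander(x, 6, increasing=True)     # 6-parameter polynomial
print("ln p(y|m_simple) :", log_evidence(y, X_simple))
print("ln p(y|m_complex):", log_evidence(y, X_complex))  # penalized for spread
```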

Hierarchical models - principle

[Figure: a hierarchy of hidden causes; arrows indicate causality between levels]

Hierarchical models - directed acyclic graphs (DAGs)

Hierarchical models - univariate linear hierarchical model

[Figure: prior densities and posterior densities at each level of the hierarchy; a sketch of the model follows]
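As a hedged sketch (the original figure's exact notation is not recoverable), the standard two-level linear hierarchical model behind this slide can be written as:

```latex
% two-level linear hierarchical model (illustrative notation)
y = X^{(1)} \theta^{(1)} + \varepsilon^{(1)}, \qquad \varepsilon^{(1)} \sim N\!\left(0, \Sigma^{(1)}\right) \\
\theta^{(1)} = X^{(2)} \theta^{(2)} + \varepsilon^{(2)}, \qquad \varepsilon^{(2)} \sim N\!\left(0, \Sigma^{(2)}\right)
```

The second level acts as an empirical prior on the first-level parameters, which is what makes the prior and posterior densities at each level well defined.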

Frequentist versus Bayesian inference - a (quick) note on hypothesis testing

classical SPM:
• define the null, e.g.: H0: θ = 0
• estimate parameters (obtain the test statistic t ≡ t(Y))
• apply the decision rule, i.e.: if P(t > t*|H0) ≤ α under the null distribution p(t|H0), then reject H0

Bayesian PPM:
• define the null, e.g.: H0: θ > 0
• invert the model (obtain the posterior pdf p(θ|y))
• apply the decision rule, i.e.: if P(H0|y) ≥ α, then accept H0
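A minimal sketch of the PPM-style decision at a single voxel, assuming a Gaussian posterior whose moments are purely illustrative:

```python
import numpy as np
from scipy.stats import norm

# Bayesian PPM-style decision at one voxel: given a Gaussian posterior
# p(theta|y) = N(mu, s2), test the null H0: theta > 0 (as on the slide).
mu, s2 = -0.8, 0.16              # illustrative posterior moments
p_h0 = 1.0 - norm.cdf(0.0, loc=mu, scale=np.sqrt(s2))   # P(theta > 0 | y)
alpha = 0.05
print(f"P(H0|y) = {p_h0:.3f}")
print("accept H0" if p_h0 >= alpha else "reject H0")
```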

Frequentist versus Bayesian inference - what about bilateral tests?

• define the null and the alternative hypothesis in terms of priors, e.g.:
  H0: p(θ|H0) = 1 if θ = 0, 0 otherwise
  H1: p(θ|H1) = N(0, Σ)

• apply the decision rule, i.e.: if P(H0|y) ≤ P(H1|y), then reject H0

[Figure: the model evidences p(Y|H0) and p(Y|H1) plotted over the space of all datasets]

• Savage-Dickey ratios (nested models, i.i.d. priors):

  p(y|H0) = p(y|θ = 0, H1), so the Bayes factor is B01 = p(y|H0) / p(y|H1) = p(θ = 0|y, H1) / p(θ = 0|H1)
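The Savage-Dickey ratio is just the posterior density over the prior density at the null value. A self-contained Python sketch for a conjugate Gaussian model (the function name, effect sizes and variances are illustrative assumptions):

```python
import numpy as np
from scipy.stats import norm

# Savage-Dickey ratio for a nested point null H0: theta = 0 inside
# H1: theta ~ N(0, tau2), with likelihood y_i ~ N(theta, sigma2).
# B01 = p(theta=0 | y, H1) / p(theta=0 | H1).
def bayes_factor_01(y, sigma2=1.0, tau2=1.0):
    n = len(y)
    post_var = 1.0 / (n / sigma2 + 1.0 / tau2)   # conjugate Gaussian update
    post_mean = post_var * np.sum(y) / sigma2
    prior_at_0 = norm.pdf(0.0, loc=0.0, scale=np.sqrt(tau2))
    post_at_0 = norm.pdf(0.0, loc=post_mean, scale=np.sqrt(post_var))
    return post_at_0 / prior_at_0

rng = np.random.default_rng(2)
print("B01 (null true) :", bayes_factor_01(rng.normal(0.0, 1.0, 50)))  # > 1
print("B01 (null false):", bayes_factor_01(rng.normal(0.7, 1.0, 50)))  # < 1
```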

2 Numerical Bayesian inference methods

Sampling methods - MCMC example: Gibbs sampling
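The slide's animation cannot be reproduced here; a minimal Gibbs sampler for a correlated bivariate Gaussian illustrates the idea (the target distribution and values are illustrative):

```python
import numpy as np

# Gibbs sampling for a standard bivariate Gaussian with correlation rho:
# alternately draw each coordinate from its exact full conditional.
rng = np.random.default_rng(3)
rho, n_samples = 0.8, 5000
x1, x2 = 0.0, 0.0
samples = np.empty((n_samples, 2))
for t in range(n_samples):
    # full conditionals: x1|x2 ~ N(rho*x2, 1-rho^2), and symmetrically
    x1 = rng.normal(rho * x2, np.sqrt(1.0 - rho**2))
    x2 = rng.normal(rho * x1, np.sqrt(1.0 - rho**2))
    samples[t] = (x1, x2)
print("empirical correlation:", np.corrcoef(samples.T)[0, 1])  # close to 0.8
```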

Variational methods - VB / EM / ReML

→ VB: maximize the free energy F(q) w.r.t. the "variational" posterior q(θ), under some (e.g., mean field, Laplace) approximation

[Figure: joint posterior p(θ1, θ2|y, m), its exact marginals p(θ1 or 2|y, m), and the approximate marginals q(θ1 or 2)]
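For reference, the free energy that VB maximizes obeys the standard decomposition (a textbook identity, not SPM-specific notation):

```latex
F(q) = \left\langle \ln p(y, \theta \mid m) \right\rangle_q + H[q]
     = \ln p(y \mid m) - D_{\mathrm{KL}}\!\left[ q(\theta) \,\middle\|\, p(\theta \mid y, m) \right]
\le \ln p(y \mid m)
```

Maximizing F(q) therefore both tightens a lower bound on the log-evidence (useful for model comparison) and drives q(θ) towards the true posterior.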

3 SPM applications

[Figure: the SPM pipeline - realignment, smoothing, normalisation (against a template), general linear model, Gaussian field theory, statistical inference at p < 0.05; Bayesian counterparts - segmentation and normalisation, posterior probability maps (PPMs), dynamic causal modelling, multivariate decoding]

aMRI segmentation - mixture of Gaussians (MoG) model

[Figure: graphical model for segmentation into grey matter, white matter and CSF. The i-th voxel value yi depends on the i-th voxel label ci; the classes are parameterized by the class means µ1, ..., µk, the class variances σ1², ..., σk², and the class frequencies λ]
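A one-dimensional analogue of this tissue-class model can be fit with EM; the sketch below uses illustrative class parameters and is not SPM's segmentation code:

```python
import numpy as np
from scipy.stats import norm

# EM for a 1-D mixture of Gaussians: lam = class frequencies,
# mu / sig = class means and standard deviations.
def em_mog(y, k=3, n_iter=100):
    rng = np.random.default_rng(4)
    mu = rng.choice(y, k)
    sig = np.full(k, y.std())
    lam = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities p(c_i = j | y_i)
        r = lam * norm.pdf(y[:, None], mu, sig)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update class frequencies, means and variances
        nk = r.sum(axis=0)
        lam = nk / len(y)
        mu = (r * y[:, None]).sum(axis=0) / nk
        sig = np.sqrt((r * (y[:, None] - mu) ** 2).sum(axis=0) / nk)
    return lam, mu, sig

y = np.concatenate([np.random.default_rng(5).normal(m, 0.5, 300) for m in (0, 3, 6)])
print(em_mog(y))   # recovers three classes near 0, 3 and 6
```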

Decoding of brain images - recognizing brain states from fMRI

[Figure: experimental paradigm - fixation cross (+), pace cue (>>), response]

log-evidence of X-Y sparse mappings: effect of lateralization
log-evidence of X-Y bilateral mappings: effect of spatial deployment

fMRI time series analysis - spatial priors and model comparison

[Figure: hierarchical model for fMRI time series - GLM coefficients with a prior variance on the GLM coefficients, a prior variance on the data noise, and AR coefficients (correlated noise); competing short-term memory and long-term memory design matrices (X)]

PPM: regions best explained by the short-term memory model
PPM: regions best explained by the long-term memory model

Dynamic Causal Modelling - network structure identification

[Figure: four candidate models m1-m4 over regions V1, V5 and PPC, with stimulus and attention inputs; marginal likelihoods ln p(y|m) of the four models; estimated effective synaptic strengths for the best model (m4)]

DCMs and DAGs - a note on causality

[Figure: a 3-node network unrolled over time (t − Δt, t, t + Δt, with Δt → 0); fixed couplings θ21, θ32, θ13 and an input-dependent modulation of the θ13 connection by the input u at time t]

State equation: ẋ = f(x, u, θ)
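A toy Euler integration of such a state equation makes the causal (dynamic) structure concrete; the coupling matrix, modulation and input below are invented for illustration and are not a fitted DCM:

```python
import numpy as np

# Toy linear 'DCM-like' network dx/dt = f(x, u, theta) = (A + u*B) @ x,
# with A the fixed coupling and B an input-dependent modulation.
A = np.array([[-1.0, 0.0, 0.3],    # theta_13-style coupling: node 3 -> 1
              [0.5, -1.0, 0.0],    # theta_21: node 1 drives node 2
              [0.0, 0.4, -1.0]])   # theta_32: node 2 drives node 3
B = np.zeros((3, 3)); B[0, 2] = 0.5   # input u modulates the 3 -> 1 connection
dt, T = 0.01, 10.0
x = np.zeros(3); x[0] = 1.0           # initial perturbation of node 1
for step in range(int(T / dt)):
    u = 1.0 if 2.0 <= step * dt < 4.0 else 0.0   # boxcar input
    x = x + dt * ((A + u * B) @ x)               # Euler step x(t+dt)
print("state at t = 10:", x)
```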

Dynamic Causal Modelling - model comparison for group studies

[Figure: differences in log-model evidences ln p(y|m1) − ln p(y|m2) across subjects]

fixed effect: assume all subjects correspond to the same model

random effect: assume different subjects might correspond to different models

Thank you for your attention.

A note on statistical significance - lessons from the Neyman-Pearson lemma

• Neyman-Pearson lemma: the likelihood ratio (or Bayes factor) test

  Λ(y) = p(y|H1) / p(y|H0) ≥ u

  is the most powerful test of size α = P(Λ ≥ u|H0) to test the null.

• What is the threshold u above which the Bayes factor test yields a type I error rate of 5%?

[Figure: ROC analysis - type I error rate versus 1 − type II error rate; MVB (Bayes factor): u = 1.09, power = 56%; CCA (F-statistic): F = 2.20, power = 20%]
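The slide's question can be answered by Monte Carlo: simulate Λ under the null and read off its 95th percentile. The toy hypotheses and sample sizes below are illustrative assumptions, not the MVB/CCA setup of the figure:

```python
import numpy as np
from scipy.stats import norm

# Simulate Lambda = p(y|H1)/p(y|H0) under H0 and take its 95th percentile
# as the threshold u. Toy setup: y_i ~ N(0,1) under H0, y_i ~ N(0.5,1) under H1.
rng = np.random.default_rng(6)
n, n_sims = 20, 20000
y0 = rng.normal(0.0, 1.0, (n_sims, n))          # datasets simulated under H0
log_lr = norm.logpdf(y0, 0.5, 1.0).sum(1) - norm.logpdf(y0, 0.0, 1.0).sum(1)
u = np.quantile(np.exp(log_lr), 0.95)
print(f"threshold u for a 5% type I error rate: {u:.2f}")

# power: probability of exceeding u when H1 is true
y1 = rng.normal(0.5, 1.0, (n_sims, n))
log_lr1 = norm.logpdf(y1, 0.5, 1.0).sum(1) - norm.logpdf(y1, 0.0, 1.0).sum(1)
print("power:", np.mean(np.exp(log_lr1) > u))
```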
