84
Importance sampling methods for Bayesian discrimination between embedded models Importance sampling methods for Bayesian discrimination between embedded models Christian P. Robert Universit´ e Paris Dauphine & CREST-INSEE http://www.ceremade.dauphine.fr/ ~ xian 45th Scientific Meeting of the Italian Statistical Society Padova, 16 giugno 2010 Joint work with J.-M. Marin

45th SIS Meeting, Padova, Italy

Embed Size (px)

DESCRIPTION

These are my abridged slides for the Italian annual stat conference in Padova/Padua.

Citation preview

Page 1: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling methods for Bayesiandiscrimination between embedded models

Christian P. Robert

Universite Paris Dauphine & CREST-INSEEhttp://www.ceremade.dauphine.fr/~xian

45th Scientific Meeting of the Italian Statistical SocietyPadova, 16 giugno 2010

Joint work with J.-M. Marin

Page 2: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Outline

1 Bayesian model choice

2 Importance sampling model comparison solutions compared

Page 3: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Bayesian model choice

Model choice

Model choice as model comparison

Choice between models

Several models available for the same observation

Mi : x ∼ fi(x|θi), i ∈ I

Subsitute hypotheses with models

Page 4: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Bayesian model choice

Model choice

Model choice as model comparison

Choice between models

Several models available for the same observation

Mi : x ∼ fi(x|θi), i ∈ I

Subsitute hypotheses with models

Page 5: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Bayesian model choice

Model choice

Bayesian model choice

Probabilise the entire model/parameter space

allocate probabilities pi to all models Mi

define priors πi(θi) for each parameter space Θi

compute

π(Mi|x) =pi

∫Θi

fi(x|θi)πi(θi)dθi∑j

pj

∫Θj

fj(x|θj)πj(θj)dθj

take largest π(Mi|x) to determine “best” model,

Page 6: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Bayesian model choice

Model choice

Bayesian model choice

Probabilise the entire model/parameter space

allocate probabilities pi to all models Mi

define priors πi(θi) for each parameter space Θi

compute

π(Mi|x) =pi

∫Θi

fi(x|θi)πi(θi)dθi∑j

pj

∫Θj

fj(x|θj)πj(θj)dθj

take largest π(Mi|x) to determine “best” model,

Page 7: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Bayesian model choice

Model choice

Bayesian model choice

Probabilise the entire model/parameter space

allocate probabilities pi to all models Mi

define priors πi(θi) for each parameter space Θi

compute

π(Mi|x) =pi

∫Θi

fi(x|θi)πi(θi)dθi∑j

pj

∫Θj

fj(x|θj)πj(θj)dθj

take largest π(Mi|x) to determine “best” model,

Page 8: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Bayesian model choice

Model choice

Bayesian model choice

Probabilise the entire model/parameter space

allocate probabilities pi to all models Mi

define priors πi(θi) for each parameter space Θi

compute

π(Mi|x) =pi

∫Θi

fi(x|θi)πi(θi)dθi∑j

pj

∫Θj

fj(x|θj)πj(θj)dθj

take largest π(Mi|x) to determine “best” model,

Page 9: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Bayesian model choice

Bayes factor

Bayes factor

Definition (Bayes factors)

For comparing model M0 with θ ∈ Θ0 vs. M1 with θ ∈ Θ1, underpriors π0(θ) and π1(θ), central quantity

B01 =π(Θ0|x)π(Θ1|x)

/π(Θ0)π(Θ1)

=

∫Θ0

f0(x|θ0)π0(θ0)dθ0∫Θ1

f1(x|θ)π1(θ1)dθ1

[Jeffreys, 1939]

Page 10: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Bayesian model choice

Evidence

Evidence

Problems using a similar quantity, the evidence

Zk =∫

Θk

πk(θk)Lk(θk) dθk,

aka the marginal likelihood.[Jeffreys, 1939]

Page 11: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

A comparison of importance sampling solutions

1 Bayesian model choice

2 Importance sampling model comparison solutions comparedRegular importanceBridge samplingMixtures to bridgeHarmonic meansChib’s solutionThe Savage–Dickey ratio

[Marin & Robert, 2010]

Page 12: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

Regular importance

Bayes factor approximation

When approximating the Bayes factor

B01 =

∫Θ0

f0(x|θ0)π0(θ0)dθ0∫Θ1

f1(x|θ1)π1(θ1)dθ1

use of importance functions $0 and $1 and

B01 =n−1

0

∑n0i=1 f0(x|θi0)π0(θi0)/$0(θi0)

n−11

∑n1i=1 f1(x|θi1)π1(θi1)/$1(θi1)

Page 13: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

Regular importance

Diabetes in Pima Indian women

Example (R benchmark)

“A population of women who were at least 21 years old, of PimaIndian heritage and living near Phoenix (AZ), was tested fordiabetes according to WHO criteria. The data were collected bythe US National Institute of Diabetes and Digestive and KidneyDiseases.200 Pima Indian women with observed variables

plasma glucose concentration in oral glucose tolerance test

diastolic blood pressure

diabetes pedigree function

presence/absence of diabetes

Page 14: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

Regular importance

Probit modelling on Pima Indian women

Probability of diabetes function of above variables

P(y = 1|x) = Φ(x1β1 + x2β2 + x3β3) ,

Test of H0 : β3 = 0 for 200 observations of Pima.tr based on ag-prior modelling:

β ∼ N3(0, n(XTX)−1

)

Page 15: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

Regular importance

Probit modelling on Pima Indian women

Probability of diabetes function of above variables

P(y = 1|x) = Φ(x1β1 + x2β2 + x3β3) ,

Test of H0 : β3 = 0 for 200 observations of Pima.tr based on ag-prior modelling:

β ∼ N3(0, n(XTX)−1

)

Page 16: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

Regular importance

Importance sampling for the Pima Indian dataset

Use of the importance function inspired from the MLE estimatedistribution

β ∼ N (β, Σ)

R Importance sampling codemodel1=summary(glm(y~-1+X1,family=binomial(link=probit)))

is1=rmvnorm(Niter,mean=model1$coeff[,1],sigma=2*model1$cov.unscaled)

is2=rmvnorm(Niter,mean=model2$coeff[,1],sigma=2*model2$cov.unscaled)

bfis=mean(exp(probitlpost(is1,y,X1)-dmvlnorm(is1,mean=model1$coeff[,1],

sigma=2*model1$cov.unscaled))) / mean(exp(probitlpost(is2,y,X2)-

dmvlnorm(is2,mean=model2$coeff[,1],sigma=2*model2$cov.unscaled)))

Page 17: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

Regular importance

Importance sampling for the Pima Indian dataset

Use of the importance function inspired from the MLE estimatedistribution

β ∼ N (β, Σ)

R Importance sampling codemodel1=summary(glm(y~-1+X1,family=binomial(link=probit)))

is1=rmvnorm(Niter,mean=model1$coeff[,1],sigma=2*model1$cov.unscaled)

is2=rmvnorm(Niter,mean=model2$coeff[,1],sigma=2*model2$cov.unscaled)

bfis=mean(exp(probitlpost(is1,y,X1)-dmvlnorm(is1,mean=model1$coeff[,1],

sigma=2*model1$cov.unscaled))) / mean(exp(probitlpost(is2,y,X2)-

dmvlnorm(is2,mean=model2$coeff[,1],sigma=2*model2$cov.unscaled)))

Page 18: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

Regular importance

Diabetes in Pima Indian womenComparison of the variation of the Bayes factor approximationsbased on 100 replicas for 20, 000 simulations from the prior andthe above MLE importance sampler

●●

Monte Carlo Importance sampling

23

45

Page 19: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

Bridge sampling

Bridge sampling

Special case:If

π1(θ1|x) ∝ π1(θ1|x)π2(θ2|x) ∝ π2(θ2|x)

live on the same space (Θ1 = Θ2), then

B12 ≈1n

n∑i=1

π1(θi|x)π2(θi|x)

θi ∼ π2(θ|x)

[Gelman & Meng, 1998; Chen, Shao & Ibrahim, 2000]

Page 20: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

Bridge sampling

(Further) bridge sampling

General identity:

B12 =

∫π2(θ|x)α(θ)π1(θ|x)dθ∫π1(θ|x)α(θ)π2(θ|x)dθ

∀ α(·)

1n1

n1∑i=1

π2(θ1i|x)α(θ1i)

1n2

n2∑i=1

π1(θ2i|x)α(θ2i)θji ∼ πj(θ|x)

Page 21: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

Bridge sampling

Optimal bridge sampling

The optimal choice of auxiliary function is

α? =n1 + n2

n1π1(θ|x) + n2π2(θ|x)

leading to

B12 ≈

1n1

n1∑i=1

π2(θ1i|x)n1π1(θ1i|x) + n2π2(θ1i|x)

1n2

n2∑i=1

π1(θ2i|x)n1π1(θ2i|x) + n2π2(θ2i|x)

Back later!

Drawback: Dependence on the unknown normalising constantssolved iteratively

Page 22: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

Bridge sampling

Optimal bridge sampling

The optimal choice of auxiliary function is

α? =n1 + n2

n1π1(θ|x) + n2π2(θ|x)

leading to

B12 ≈

1n1

n1∑i=1

π2(θ1i|x)n1π1(θ1i|x) + n2π2(θ1i|x)

1n2

n2∑i=1

π1(θ2i|x)n1π1(θ2i|x) + n2π2(θ2i|x)

Back later!

Drawback: Dependence on the unknown normalising constantssolved iteratively

Page 23: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

Bridge sampling

Extension to varying dimensions

When dim(Θ1) 6= dim(Θ2), e.g. θ2 = (θ1, ψ), introduction of apseudo-posterior density, ω(ψ|θ1, x), augmenting π1(θ1|x) intojoint distribution

π1(θ1|x)× ω(ψ|θ1, x)

on Θ2 so that

B12 =

∫π1(θ1|x)α(θ1, ψ)π2(θ1, ψ|x)dθ1ω(ψ|θ1, x) dψ∫π2(θ1, ψ|x)α(θ1, ψ)π1(θ1|x)dθ1 ω(ψ|θ1, x) dψ

= Eπ2

[π1(θ1)ω(ψ|θ1)π2(θ1, ψ)

]=

Eϕ [π1(θ1)ω(ψ|θ1)/ϕ(θ1, ψ)]Eϕ [π2(θ1, ψ)/ϕ(θ1, ψ)]

for any conditional density ω(ψ|θ1) and any joint density ϕ.

Page 24: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

Bridge sampling

Extension to varying dimensions

When dim(Θ1) 6= dim(Θ2), e.g. θ2 = (θ1, ψ), introduction of apseudo-posterior density, ω(ψ|θ1, x), augmenting π1(θ1|x) intojoint distribution

π1(θ1|x)× ω(ψ|θ1, x)

on Θ2 so that

B12 =

∫π1(θ1|x)α(θ1, ψ)π2(θ1, ψ|x)dθ1ω(ψ|θ1, x) dψ∫π2(θ1, ψ|x)α(θ1, ψ)π1(θ1|x)dθ1 ω(ψ|θ1, x) dψ

= Eπ2

[π1(θ1)ω(ψ|θ1)π2(θ1, ψ)

]=

Eϕ [π1(θ1)ω(ψ|θ1)/ϕ(θ1, ψ)]Eϕ [π2(θ1, ψ)/ϕ(θ1, ψ)]

for any conditional density ω(ψ|θ1) and any joint density ϕ.

Page 25: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

Bridge sampling

Illustration for the Pima Indian dataset

Use of the MLE induced conditional of β3 given (β1, β2) as apseudo-posterior and mixture of both MLE approximations on β3

in bridge sampling estimate

R bridge sampling codecova=model2$cov.unscaled

expecta=model2$coeff[,1]

covw=cova[3,3]-t(cova[1:2,3])%*%ginv(cova[1:2,1:2])%*%cova[1:2,3]

probit1=hmprobit(Niter,y,X1)

probit2=hmprobit(Niter,y,X2)

pseudo=rnorm(Niter,meanw(probit1),sqrt(covw))

probit1p=cbind(probit1,pseudo)

bfbs=mean(exp(probitlpost(probit2[,1:2],y,X1)+dnorm(probit2[,3],meanw(probit2[,1:2]),

sqrt(covw),log=T))/ (dmvnorm(probit2,expecta,cova)+dnorm(probit2[,3],expecta[3],

cova[3,3])))/ mean(exp(probitlpost(probit1p,y,X2))/(dmvnorm(probit1p,expecta,cova)+

dnorm(pseudo,expecta[3],cova[3,3])))

Page 26: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

Bridge sampling

Illustration for the Pima Indian dataset

Use of the MLE induced conditional of β3 given (β1, β2) as apseudo-posterior and mixture of both MLE approximations on β3

in bridge sampling estimate

R bridge sampling codecova=model2$cov.unscaled

expecta=model2$coeff[,1]

covw=cova[3,3]-t(cova[1:2,3])%*%ginv(cova[1:2,1:2])%*%cova[1:2,3]

probit1=hmprobit(Niter,y,X1)

probit2=hmprobit(Niter,y,X2)

pseudo=rnorm(Niter,meanw(probit1),sqrt(covw))

probit1p=cbind(probit1,pseudo)

bfbs=mean(exp(probitlpost(probit2[,1:2],y,X1)+dnorm(probit2[,3],meanw(probit2[,1:2]),

sqrt(covw),log=T))/ (dmvnorm(probit2,expecta,cova)+dnorm(probit2[,3],expecta[3],

cova[3,3])))/ mean(exp(probitlpost(probit1p,y,X2))/(dmvnorm(probit1p,expecta,cova)+

dnorm(pseudo,expecta[3],cova[3,3])))

Page 27: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

Bridge sampling

Diabetes in Pima Indian women (cont’d)Comparison of the variation of the Bayes factor approximationsbased on 100× 20, 000 simulations from the prior (MC), the abovebridge sampler and the above importance sampler

●●

MC Bridge IS

23

45

Page 28: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

Mixtures to bridge

Approximating Zk using a mixture representation

Bridge sampling redux

Design a specific mixture for simulation [importance sampling]purposes, with density

ϕk(θk) ∝ ω1πk(θk)Lk(θk) + ϕ(θk) ,

where ϕ(·) is arbitrary (but normalised)Note: ω1 is not a probability weight

[Chopin & Robert, 2010]

Page 29: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

Mixtures to bridge

Approximating Zk using a mixture representation

Bridge sampling redux

Design a specific mixture for simulation [importance sampling]purposes, with density

ϕk(θk) ∝ ω1πk(θk)Lk(θk) + ϕ(θk) ,

where ϕ(·) is arbitrary (but normalised)Note: ω1 is not a probability weight

[Chopin & Robert, 2010]

Page 30: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

Mixtures to bridge

Approximating Z using a mixture representation (cont’d)

Corresponding MCMC (=Gibbs) sampler

At iteration t

1 Take δ(t) = 1 with probability

ω1πk(θ(t−1)k )Lk(θ

(t−1)k )

/(ω1πk(θ

(t−1)k )Lk(θ

(t−1)k ) + ϕ(θ(t−1)

k ))

and δ(t) = 2 otherwise;

2 If δ(t) = 1, generate θ(t)k ∼ MCMC(θ(t−1)

k , θk) whereMCMC(θk, θ′k) denotes an arbitrary MCMC kernel associatedwith the posterior πk(θk|x) ∝ πk(θk)Lk(θk);

3 If δ(t) = 2, generate θ(t)k ∼ ϕ(θk) independently

Page 31: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

Mixtures to bridge

Approximating Z using a mixture representation (cont’d)

Corresponding MCMC (=Gibbs) sampler

At iteration t

1 Take δ(t) = 1 with probability

ω1πk(θ(t−1)k )Lk(θ

(t−1)k )

/(ω1πk(θ

(t−1)k )Lk(θ

(t−1)k ) + ϕ(θ(t−1)

k ))

and δ(t) = 2 otherwise;

2 If δ(t) = 1, generate θ(t)k ∼ MCMC(θ(t−1)

k , θk) whereMCMC(θk, θ′k) denotes an arbitrary MCMC kernel associatedwith the posterior πk(θk|x) ∝ πk(θk)Lk(θk);

3 If δ(t) = 2, generate θ(t)k ∼ ϕ(θk) independently

Page 32: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

Mixtures to bridge

Approximating Z using a mixture representation (cont’d)

Corresponding MCMC (=Gibbs) sampler

At iteration t

1 Take δ(t) = 1 with probability

ω1πk(θ(t−1)k )Lk(θ

(t−1)k )

/(ω1πk(θ

(t−1)k )Lk(θ

(t−1)k ) + ϕ(θ(t−1)

k ))

and δ(t) = 2 otherwise;

2 If δ(t) = 1, generate θ(t)k ∼ MCMC(θ(t−1)

k , θk) whereMCMC(θk, θ′k) denotes an arbitrary MCMC kernel associatedwith the posterior πk(θk|x) ∝ πk(θk)Lk(θk);

3 If δ(t) = 2, generate θ(t)k ∼ ϕ(θk) independently

Page 33: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

Mixtures to bridge

Evidence approximation by mixtures

Rao-Blackwellised estimate

ξ =1T

T∑t=1

ω1πk(θ(t)k )Lk(θ

(t)k )/ω1πk(θ

(t)k )Lk(θ

(t)k ) + ϕ(θ(t)

k ) ,

converges to ω1Zk/{ω1Zk + 1}Deduce Z3k from ω1Z3k/{ω1Z3k + 1} = ξ ie

Z3k =

∑Tt=1 ω1πk(θ

(t)k )Lk(θ

(t)k )/ω1π(θ(t)

k )Lk(θ(t)k ) + ϕ(θ(t)

k )

∑Tt=1 ϕ(θ(t)

k )/ω1πk(θ

(t)k )Lk(θ

(t)k ) + ϕ(θ(t)

k )

[Bridge sampler]

Page 34: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

Mixtures to bridge

Evidence approximation by mixtures

Rao-Blackwellised estimate

ξ =1T

T∑t=1

ω1πk(θ(t)k )Lk(θ

(t)k )/ω1πk(θ

(t)k )Lk(θ

(t)k ) + ϕ(θ(t)

k ) ,

converges to ω1Zk/{ω1Zk + 1}Deduce Z3k from ω1Z3k/{ω1Z3k + 1} = ξ ie

Z3k =

∑Tt=1 ω1πk(θ

(t)k )Lk(θ

(t)k )/ω1π(θ(t)

k )Lk(θ(t)k ) + ϕ(θ(t)

k )

∑Tt=1 ϕ(θ(t)

k )/ω1πk(θ

(t)k )Lk(θ

(t)k ) + ϕ(θ(t)

k )

[Bridge sampler]

Page 35: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

Harmonic means

The original harmonic mean estimator

When θki ∼ πk(θ|x),

1T

T∑t=1

1L(θkt|x)

is an unbiased estimator of 1/mk(x)[Newton & Raftery, 1994]

Highly dangerous: Most often leads to an infinite variance!!!

Page 36: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

Harmonic means

The original harmonic mean estimator

When θki ∼ πk(θ|x),

1T

T∑t=1

1L(θkt|x)

is an unbiased estimator of 1/mk(x)[Newton & Raftery, 1994]

Highly dangerous: Most often leads to an infinite variance!!!

Page 37: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

Harmonic means

Approximating Zk from a posterior sample

Use of the [harmonic mean] identity

Eπk

[ϕ(θk)

πk(θk)Lk(θk)

∣∣∣∣x] =∫

ϕ(θk)πk(θk)Lk(θk)

πk(θk)Lk(θk)Zk

dθk =1Zk

no matter what the proposal ϕ(·) is.[Gelfand & Dey, 1994; Bartolucci et al., 2006]

Direct exploitation of the MCMC output

Page 38: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

Harmonic means

Approximating Zk from a posterior sample

Use of the [harmonic mean] identity

Eπk

[ϕ(θk)

πk(θk)Lk(θk)

∣∣∣∣x] =∫

ϕ(θk)πk(θk)Lk(θk)

πk(θk)Lk(θk)Zk

dθk =1Zk

no matter what the proposal ϕ(·) is.[Gelfand & Dey, 1994; Bartolucci et al., 2006]

Direct exploitation of the MCMC output

Page 39: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

Harmonic means

Comparison with regular importance sampling

Harmonic mean: Constraint opposed to usual importance samplingconstraints: ϕ(θ) must have lighter (rather than fatter) tails thanπk(θk)Lk(θk) for the approximation

Z1k = 1

/1T

T∑t=1

ϕ(θ(t)k )

πk(θ(t)k )Lk(θ

(t)k )

to have a finite variance.E.g., use finite support kernels for ϕ

Page 40: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

Harmonic means

Comparison with regular importance sampling

Harmonic mean: Constraint opposed to usual importance samplingconstraints: ϕ(θ) must have lighter (rather than fatter) tails thanπk(θk)Lk(θk) for the approximation

Z1k = 1

/1T

T∑t=1

ϕ(θ(t)k )

πk(θ(t)k )Lk(θ

(t)k )

to have a finite variance.E.g., use finite support kernels for ϕ

Page 41: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

Harmonic means

Comparison with regular importance sampling (cont’d)

Compare Z1k with a standard importance sampling approximation

Z2k =1T

T∑t=1

πk(θ(t)k )Lk(θ

(t)k )

ϕ(θ(t)k )

where the θ(t)k ’s are generated from the density ϕ(·) (with fatter

tails like t’s)

Page 42: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

Harmonic means

HPD indicator as ϕUse the convex hull of MCMC simulations corresponding to the10% HPD region (easily derived!) and ϕ as indicator:

ϕ(θ) =10T

∑t∈HPD

Id(θ,θ(t))≤ε

[Robert & Wraith, 2009]

Page 43: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

Harmonic means

Diabetes in Pima Indian women (cont’d)Comparison of the variation of the Bayes factor approximationsbased on 100 replicas for 20, 000 simulations for a simulation fromthe above harmonic mean sampler and importance samplers

●●

Harmonic mean Importance sampling

3.102

3.104

3.106

3.108

3.110

3.112

3.114

3.116

Page 44: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

Chib’s solution

Chib’s representation

Direct application of Bayes’ theorem: given x ∼ fk(x|θk) andθk ∼ πk(θk),

Zk = mk(x) =fk(x|θk)πk(θk)

πk(θk|x)

Use of an approximation to the posterior

Zk = mk(x) =fk(x|θ∗k)πk(θ∗k)

πk(θ∗k|x).

Page 45: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

Chib’s solution

Chib’s representation

Direct application of Bayes’ theorem: given x ∼ fk(x|θk) andθk ∼ πk(θk),

Zk = mk(x) =fk(x|θk)πk(θk)

πk(θk|x)

Use of an approximation to the posterior

Zk = mk(x) =fk(x|θ∗k)πk(θ∗k)

πk(θ∗k|x).

Page 46: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

Chib’s solution

Case of latent variables

For missing variable z as in mixture models, natural Rao-Blackwellestimate

πk(θ∗k|x) =1T

T∑t=1

πk(θ∗k|x, z(t)k ) ,

where the z(t)k ’s are Gibbs sampled latent variables

Skip difficulties...

Page 47: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

Chib’s solution

Label switching

A mixture model [special case of missing variable model] isinvariant under permutations of the indices of the components.E.g., mixtures

0.3N (0, 1) + 0.7N (2.3, 1)

and0.7N (2.3, 1) + 0.3N (0, 1)

are exactly the same!c© The component parameters θi are not identifiablemarginally since they are exchangeable

Page 48: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

Chib’s solution

Label switching

A mixture model [special case of missing variable model] isinvariant under permutations of the indices of the components.E.g., mixtures

0.3N (0, 1) + 0.7N (2.3, 1)

and0.7N (2.3, 1) + 0.3N (0, 1)

are exactly the same!c© The component parameters θi are not identifiablemarginally since they are exchangeable

Page 49: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

Chib’s solution

Connected difficulties

1 Number of modes of the likelihood of order O(k!):c© Maximization and even [MCMC] exploration of the

posterior surface harder

2 Under exchangeable priors on (θ,p) [prior invariant underpermutation of the indices], all posterior marginals areidentical:c© Posterior expectation of θ1 equal to posterior expectation

of θ2

Page 50: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

Chib’s solution

Connected difficulties

1 Number of modes of the likelihood of order O(k!):c© Maximization and even [MCMC] exploration of the

posterior surface harder

2 Under exchangeable priors on (θ,p) [prior invariant underpermutation of the indices], all posterior marginals areidentical:c© Posterior expectation of θ1 equal to posterior expectation

of θ2

Page 51: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

Chib’s solution

License

Since Gibbs output does not produce exchangeability, the Gibbssampler has not explored the whole parameter space: it lacksenergy to switch simultaneously enough component allocations atonce

0 100 200 300 400 500

−10

12

3

n

µ i

−1 0 1 2 3

0.20.3

0.40.5

µi

p i

0 100 200 300 400 500

0.20.3

0.40.5

n

p i

0.2 0.3 0.4 0.5

0.40.6

0.81.0

pi

σ i

0 100 200 300 400 500

0.40.6

0.81.0

n

σ i

0.4 0.6 0.8 1.0

−10

12

3

σi

µ i

Page 52: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

Chib’s solution

Label switching paradox

We should observe the exchangeability of the components [labelswitching] to conclude about convergence of the Gibbs sampler.If we observe it, then we do not know how to estimate theparameters.If we do not, then we are uncertain about the convergence!!!

Page 53: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

Chib’s solution

Label switching paradox

We should observe the exchangeability of the components [labelswitching] to conclude about convergence of the Gibbs sampler.If we observe it, then we do not know how to estimate theparameters.If we do not, then we are uncertain about the convergence!!!

Page 54: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

Chib’s solution

Label switching paradox

We should observe the exchangeability of the components [labelswitching] to conclude about convergence of the Gibbs sampler.If we observe it, then we do not know how to estimate theparameters.If we do not, then we are uncertain about the convergence!!!

Page 55: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

Chib’s solution

Compensation for label switching

For mixture models, z(t)k usually fails to visit all configurations in a

balanced way, despite the symmetry predicted by the theory

πk(θk|x) = πk(σ(θk)|x) =1k!

∑σ∈S

πk(σ(θk)|x)

for all σ’s in Sk, set of all permutations of {1, . . . , k}.Consequences on numerical approximation, biased by an order k!Recover the theoretical symmetry by using

πk(θ∗k|x) =1T k!

∑σ∈Sk

T∑t=1

πk(σ(θ∗k)|x, z(t)k ) .

[Berkhof, Mechelen, & Gelman, 2003]

Page 56: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

Chib’s solution

Compensation for label switching

For mixture models, z(t)k usually fails to visit all configurations in a

balanced way, despite the symmetry predicted by the theory

πk(θk|x) = πk(σ(θk)|x) =1k!

∑σ∈S

πk(σ(θk)|x)

for all σ’s in Sk, set of all permutations of {1, . . . , k}.Consequences on numerical approximation, biased by an order k!Recover the theoretical symmetry by using

πk(θ∗k|x) =1T k!

∑σ∈Sk

T∑t=1

πk(σ(θ∗k)|x, z(t)k ) .

[Berkhof, Mechelen, & Gelman, 2003]

Page 57: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

Chib’s solution

Galaxy dataset

n = 82 galaxies as a mixture of k normal distributions with bothmean and variance unknown.

[Roeder, 1992]Average density

data

Rel

ativ

e F

requ

ency

−2 −1 0 1 2 3

0.0

0.2

0.4

0.6

0.8

Page 58: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

Chib’s solution

Galaxy dataset (k)Using only the original estimate, with θ∗k as the MAP estimator,

log(mk(x)) = −105.1396

for k = 3 (based on 103 simulations), while introducing thepermutations leads to

log(mk(x)) = −103.3479

Note that−105.1396 + log(3!) = −103.3479

k 2 3 4 5 6 7 8

mk(x) -115.68 -103.35 -102.66 -101.93 -102.88 -105.48 -108.44

Estimations of the marginal likelihoods by the symmetrised Chib’sapproximation (based on 105 Gibbs iterations and, for k > 5, 100permutations selected at random in Sk).

[Lee, Marin, Mengersen & Robert, 2008]

Page 59: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

Chib’s solution

Galaxy dataset (k)Using only the original estimate, with θ∗k as the MAP estimator,

log(mk(x)) = −105.1396

for k = 3 (based on 103 simulations), while introducing thepermutations leads to

log(mk(x)) = −103.3479

Note that−105.1396 + log(3!) = −103.3479

k 2 3 4 5 6 7 8

mk(x) -115.68 -103.35 -102.66 -101.93 -102.88 -105.48 -108.44

Estimations of the marginal likelihoods by the symmetrised Chib’sapproximation (based on 105 Gibbs iterations and, for k > 5, 100permutations selected at random in Sk).

[Lee, Marin, Mengersen & Robert, 2008]

Page 60: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

Chib’s solution

Galaxy dataset (k)Using only the original estimate, with θ∗k as the MAP estimator,

log(mk(x)) = −105.1396

for k = 3 (based on 103 simulations), while introducing thepermutations leads to

log(mk(x)) = −103.3479

Note that−105.1396 + log(3!) = −103.3479

k 2 3 4 5 6 7 8

mk(x) -115.68 -103.35 -102.66 -101.93 -102.88 -105.48 -108.44

Estimations of the marginal likelihoods by the symmetrised Chib’sapproximation (based on 105 Gibbs iterations and, for k > 5, 100permutations selected at random in Sk).

[Lee, Marin, Mengersen & Robert, 2008]

Page 61: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

Chib’s solution

Case of the probit model

For the completion by z,

π(θ|x) =1T

∑t

π(θ|x, z(t))

is a simple average of normal densities

R Bridge sampling codegibbs1=gibbsprobit(Niter,y,X1)

gibbs2=gibbsprobit(Niter,y,X2)

bfchi=mean(exp(dmvlnorm(t(t(gibbs2$mu)-model2$coeff[,1]),mean=rep(0,3),

sigma=gibbs2$Sigma2)-probitlpost(model2$coeff[,1],y,X2)))/

mean(exp(dmvlnorm(t(t(gibbs1$mu)-model1$coeff[,1]),mean=rep(0,2),

sigma=gibbs1$Sigma2)-probitlpost(model1$coeff[,1],y,X1)))

Page 62: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

Chib’s solution

Case of the probit model

For the completion by z,

π(θ|x) =1T

∑t

π(θ|x, z(t))

is a simple average of normal densities

R Bridge sampling codegibbs1=gibbsprobit(Niter,y,X1)

gibbs2=gibbsprobit(Niter,y,X2)

bfchi=mean(exp(dmvlnorm(t(t(gibbs2$mu)-model2$coeff[,1]),mean=rep(0,3),

sigma=gibbs2$Sigma2)-probitlpost(model2$coeff[,1],y,X2)))/

mean(exp(dmvlnorm(t(t(gibbs1$mu)-model1$coeff[,1]),mean=rep(0,2),

sigma=gibbs1$Sigma2)-probitlpost(model1$coeff[,1],y,X1)))

Page 63: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

Chib’s solution

Diabetes in Pima Indian women (cont’d)Comparison of the variation of the Bayes factor approximationsbased on 100 replicas for 20, 000 simulations for a simulation fromthe above Chib’s and importance samplers

●●

Chib's method importance sampling

0.024

00.0

245

0.025

00.0

255

Page 64: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

The Savage–Dickey ratio

The Savage–Dickey ratio

Special representation of the Bayes factor used for simulation

Original version (Dickey, AoMS, 1971)

Page 65: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

The Savage–Dickey ratio

Savage’s density ratio theorem

Given a test H0 : θ = θ0 in a model f(x|θ, ψ) with a nuisanceparameter ψ, under priors π0(ψ) and π1(θ, ψ) such that

π1(ψ|θ0) = π0(ψ)

then

B01 =π1(θ0|x)π1(θ0)

,

with the obvious notations

π1(θ) =∫π1(θ, ψ)dψ , π1(θ|x) =

∫π1(θ, ψ|x)dψ ,

[Dickey, 1971; Verdinelli & Wasserman, 1995]

Page 66: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

The Savage–Dickey ratio

Measure-theoretic difficulty

Representation depends on the choice of versions of conditionaldensities:

B01 =∫π0(ψ)f(x|θ0, ψ) dψ∫

π1(θ, ψ)f(x|θ, ψ) dψdθ[by definition]

=∫π1(ψ|θ0)f(x|θ0, ψ) dψ π1(θ0)∫π1(θ, ψ)f(x|θ, ψ) dψdθ π1(θ0)

[specific version of π1(ψ|θ0)

and arbitrary version of π1(θ0)]

=∫π1(θ0, ψ)f(x|θ0, ψ) dψ

m1(x)π1(θ0)[specific version of π1(θ0, ψ)]

=π1(θ0|x)π1(θ0)

[version dependent]

Page 67: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

The Savage–Dickey ratio

Measure-theoretic difficulty

Representation depends on the choice of versions of conditionaldensities:

B01 =∫π0(ψ)f(x|θ0, ψ) dψ∫

π1(θ, ψ)f(x|θ, ψ) dψdθ[by definition]

=∫π1(ψ|θ0)f(x|θ0, ψ) dψ π1(θ0)∫π1(θ, ψ)f(x|θ, ψ) dψdθ π1(θ0)

[specific version of π1(ψ|θ0)

and arbitrary version of π1(θ0)]

=∫π1(θ0, ψ)f(x|θ0, ψ) dψ

m1(x)π1(θ0)[specific version of π1(θ0, ψ)]

=π1(θ0|x)π1(θ0)

[version dependent]

Page 68: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

The Savage–Dickey ratio

Choice of density version

c© Dickey’s (1971) condition is not a condition:

Ifπ1(θ0|x)π1(θ0)

=∫π0(ψ)f(x|θ0, ψ) dψ

m1(x)

is chosen as a version, then Savage–Dickey’s representation holds

Page 69: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

The Savage–Dickey ratio

Choice of density version

c© Dickey’s (1971) condition is not a condition:

Ifπ1(θ0|x)π1(θ0)

=∫π0(ψ)f(x|θ0, ψ) dψ

m1(x)

is chosen as a version, then Savage–Dickey’s representation holds

Page 70: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

The Savage–Dickey ratio

Savage–Dickey paradox

Verdinelli-Wasserman extension:

B01 =π1(θ0|x)π1(θ0)

Eπ1(ψ|x,θ0,x)

[π0(ψ)π1(ψ|θ0)

]similarly depends on choices of versions...

...but Monte Carlo implementation relies on specific versions of alldensities without making mention of it

[Chen, Shao & Ibrahim, 2000]

Page 71: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

The Savage–Dickey ratio

Savage–Dickey paradox

Verdinelli-Wasserman extension:

B01 =π1(θ0|x)π1(θ0)

Eπ1(ψ|x,θ0,x)

[π0(ψ)π1(ψ|θ0)

]similarly depends on choices of versions...

...but Monte Carlo implementation relies on specific versions of alldensities without making mention of it

[Chen, Shao & Ibrahim, 2000]

Page 72: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

The Savage–Dickey ratio

Computational implementation

Starting from the (new) prior

π1(θ, ψ) = π1(θ)π0(ψ)

define the associated posterior

π1(θ, ψ|x) = π0(ψ)π1(θ)f(x|θ, ψ)/m1(x)

and imposeπ1(θ0|x)π0(θ0)

=∫π0(ψ)f(x|θ0, ψ) dψ

m1(x)

to hold.Then

B01 =π1(θ0|x)π1(θ0)

m1(x)m1(x)

Page 73: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

The Savage–Dickey ratio

Computational implementation

Starting from the (new) prior

π1(θ, ψ) = π1(θ)π0(ψ)

define the associated posterior

π1(θ, ψ|x) = π0(ψ)π1(θ)f(x|θ, ψ)/m1(x)

and imposeπ1(θ0|x)π0(θ0)

=∫π0(ψ)f(x|θ0, ψ) dψ

m1(x)

to hold.Then

B01 =π1(θ0|x)π1(θ0)

m1(x)m1(x)

Page 74: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

The Savage–Dickey ratio

First ratio

If (θ(1), ψ(1)), . . . , (θ(T ), ψ(T )) ∼ π(θ, ψ|x), then

1T

∑t

π1(θ0|x, ψ(t))

converges to π1(θ0|x) (if the right version is used in θ0).

π1(θ0|x, ψ) =π1(θ0)f(x|θ0, ψ)∫π1(θ)f(x|θ, ψ) dθ

Page 75: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

The Savage–Dickey ratio

Rao–Blackwellisation with latent variablesWhen π1(θ0|x, ψ) unavailable, replace with

1T

T∑t=1

π1(θ0|x, z(t), ψ(t))

via data completion by latent variable z such that

f(x|θ, ψ) =∫f(x, z|θ, ψ) dz

and that π1(θ, ψ, z|x) ∝ π0(ψ)π1(θ)f(x, z|θ, ψ) available in closedform, including the normalising constant, based on version

π1(θ0|x, z, ψ)π1(θ0)

=f(x, z|θ0, ψ)∫

f(x, z|θ, ψ)π1(θ) dθ.

Page 76: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

The Savage–Dickey ratio

Rao–Blackwellisation with latent variablesWhen π1(θ0|x, ψ) unavailable, replace with

1T

T∑t=1

π1(θ0|x, z(t), ψ(t))

via data completion by latent variable z such that

f(x|θ, ψ) =∫f(x, z|θ, ψ) dz

and that π1(θ, ψ, z|x) ∝ π0(ψ)π1(θ)f(x, z|θ, ψ) available in closedform, including the normalising constant, based on version

π1(θ0|x, z, ψ)π1(θ0)

=f(x, z|θ0, ψ)∫

f(x, z|θ, ψ)π1(θ) dθ.

Page 77: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

The Savage–Dickey ratio

Bridge revival (1)

Since m1(x)/m1(x) is unknown, apparent failure!Use of the bridge identity

Eπ1(θ,ψ|x)

[π1(θ, ψ)f(x|θ, ψ)

π0(ψ)π1(θ)f(x|θ, ψ)

]= Eπ1(θ,ψ|x)

[π1(ψ|θ)π0(ψ)

]=m1(x)m1(x)

to (biasedly) estimate m1(x)/m1(x) by

T/ T∑

t=1

π1(ψ(t)|θ(t))π0(ψ(t))

based on the same sample from π1.

Page 78: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

The Savage–Dickey ratio

Bridge revival (1)

Since m1(x)/m1(x) is unknown, apparent failure!Use of the bridge identity

Eπ1(θ,ψ|x)

[π1(θ, ψ)f(x|θ, ψ)

π0(ψ)π1(θ)f(x|θ, ψ)

]= Eπ1(θ,ψ|x)

[π1(ψ|θ)π0(ψ)

]=m1(x)m1(x)

to (biasedly) estimate m1(x)/m1(x) by

T/ T∑

t=1

π1(ψ(t)|θ(t))π0(ψ(t))

based on the same sample from π1.

Page 79: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

The Savage–Dickey ratio

Bridge revival (2)Alternative identity

Eπ1(θ,ψ|x)

[π0(ψ)π1(θ)f(x|θ, ψ)π1(θ, ψ)f(x|θ, ψ)

]= Eπ1(θ,ψ|x)

[π0(ψ)π1(ψ|θ)

]=m1(x)m1(x)

suggests using a second sample (θ(1), ψ(1), z(1)), . . . ,(θ(T ), ψ(T ), z(T )) ∼ π1(θ, ψ|x) and the ratio estimate

1T

T∑t=1

π0(ψ(t))/π1(ψ(t)|θ(t))

Resulting unbiased estimate:

B01 =1T

∑t π1(θ0|x, z(t), ψ(t))

π1(θ0)1T

T∑t=1

π0(ψ(t))π1(ψ(t)|θ(t))

Page 80: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

The Savage–Dickey ratio

Bridge revival (2)Alternative identity

Eπ1(θ,ψ|x)

[π0(ψ)π1(θ)f(x|θ, ψ)π1(θ, ψ)f(x|θ, ψ)

]= Eπ1(θ,ψ|x)

[π0(ψ)π1(ψ|θ)

]=m1(x)m1(x)

suggests using a second sample (θ(1), ψ(1), z(1)), . . . ,(θ(T ), ψ(T ), z(T )) ∼ π1(θ, ψ|x) and the ratio estimate

1T

T∑t=1

π0(ψ(t))/π1(ψ(t)|θ(t))

Resulting unbiased estimate:

B01 =1T

∑t π1(θ0|x, z(t), ψ(t))

π1(θ0)1T

T∑t=1

π0(ψ(t))π1(ψ(t)|θ(t))

Page 81: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

The Savage–Dickey ratio

Difference with Verdinelli–Wasserman representation

The above leads to the representation

B01 =π1(θ0|x)π1(θ0)

Eπ1(θ,ψ|x)

[π0(ψ)π1(ψ|θ)

]which shows how our approach differs from Verdinelli andWasserman’s

B01 =π1(θ0|x)π1(θ0)

Eπ1(ψ|x,θ0,x)

[π0(ψ)π1(ψ|θ0)

]

Page 82: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

The Savage–Dickey ratio

Difference with Verdinelli–Wasserman approximation

In terms of implementation,

B01MR

(x) =1T

∑t π1(θ0|x, z(t), ψ(t))

π1(θ0)1T

T∑t=1

π0(ψ(t))π1(ψ(t)|θ(t))

formaly resembles

B01VW

(x) =1T

T∑t=1

π1(θ0|x, z(t), ψ(t))π1(θ0)

1T

T∑t=1

π0(ψ(t))π1(ψ(t)|θ0)

.

But the simulated sequences differ: first average involvessimulations from π1(θ, ψ, z|x) and from π1(θ, ψ, z|x), while secondaverage relies on simulations from π1(θ, ψ, z|x) and fromπ1(ψ, z|x, θ0),

Page 83: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

The Savage–Dickey ratio

Difference with Verdinelli–Wasserman approximation

In terms of implementation,

B01MR

(x) =1T

∑t π1(θ0|x, z(t), ψ(t))

π1(θ0)1T

T∑t=1

π0(ψ(t))π1(ψ(t)|θ(t))

formaly resembles

B01VW

(x) =1T

T∑t=1

π1(θ0|x, z(t), ψ(t))π1(θ0)

1T

T∑t=1

π0(ψ(t))π1(ψ(t)|θ0)

.

But the simulated sequences differ: first average involvessimulations from π1(θ, ψ, z|x) and from π1(θ, ψ, z|x), while secondaverage relies on simulations from π1(θ, ψ, z|x) and fromπ1(ψ, z|x, θ0),

Page 84: 45th SIS Meeting, Padova, Italy

Importance sampling methods for Bayesian discrimination between embedded models

Importance sampling model comparison solutions compared

The Savage–Dickey ratio

Diabetes in Pima Indian women (cont’d)Comparison of the variation of the Bayes factor approximationsbased on 100 replicas for 20, 000 simulations for a simulation fromthe above importance, Chib’s, Savage–Dickey’s and bridgesamplers

IS Chib Savage−Dickey Bridge

2.83.0

3.23.4