Diagnostics in MCMC
Hoff Chapter 6
October 13, 2010
Convergence to Posterior Distribution

Theory tells us that if we run the Gibbs sampler long enough, the samples we obtain will be samples from the joint posterior distribution (the target or stationary distribution). This does not depend on the starting point (the chain forgets the past).

- How long do we need to run the Markov chain to adequately explore the posterior distribution?
- Mixing of the chain plays a critical role in how fast we can obtain good results. How can we tell if the chain is mixing well (or poorly)?
Three Component Mixture Model

Posterior for θ:

θ | Y ~ 0.45 N(−3, 1/3) + 0.10 N(0, 1/3) + 0.45 N(3, 1/3)

How can we draw samples from the posterior?

Introduce a mixture component indicator δ, an unobserved latent variable which simplifies sampling:

- δ = 1: θ | δ, Y ~ N(−3, 1/3) and P(δ = 1 | Y) = 0.45
- δ = 2: θ | δ, Y ~ N(0, 1/3) and P(δ = 2 | Y) = 0.10
- δ = 3: θ | δ, Y ~ N(3, 1/3) and P(δ = 3 | Y) = 0.45

Monte Carlo sampling: draw δ; given δ, draw θ.
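The two-step draw above (indicator first, then the normal component) can be sketched as follows. This is a minimal illustration, not code from the slides; the weights and means are the ones given above.

```python
import random

# Mixture from the slide: theta | Y ~ 0.45 N(-3,1/3) + 0.10 N(0,1/3) + 0.45 N(3,1/3)
SD = (1.0 / 3.0) ** 0.5  # each component has variance 1/3

def draw_theta(rng):
    # Step 1: draw the component indicator delta from its marginal P(delta | Y)
    u = rng.random()
    if u < 0.45:
        mean = -3.0  # delta = 1
    elif u < 0.55:
        mean = 0.0   # delta = 2
    else:
        mean = 3.0   # delta = 3
    # Step 2: given delta, draw theta from the matching normal component
    return rng.gauss(mean, SD)

rng = random.Random(42)
samples = [draw_theta(rng) for _ in range(1000)]
```

Because each draw is independent, these samples cover all three components in proportion to their weights.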
MC Density

Figure: histogram of θ from 1000 Monte Carlo draws, with the posterior (mixture) density as a solid line.
MC Variation

If we want to find the posterior mean of g(θ), the Monte Carlo estimate based on M MC samples is

ḡ_MC = (1/M) Σ_m g(θ^(m)) → E[g(θ) | Y]

with variance

Var[ḡ_MC] = E[(ḡ_MC − E[ḡ_MC])²] = Var[g(θ) | Y] / M

leading to the Monte Carlo standard error √Var[ḡ_MC].

We expect the posterior mean of g(θ) to be in the interval ḡ_MC ± 2√Var[ḡ_MC] for roughly 95% of repeated MC samples.

To increase accuracy, increase M.
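The estimate and its standard error can be computed directly from the samples. A minimal sketch for g(θ) = θ, using independent N(0, 1) draws as stand-in posterior samples (an assumption for illustration only):

```python
import math
import random

def mc_estimate(samples):
    """Monte Carlo estimate of E[g(theta) | Y] for g(theta) = theta,
    with its Monte Carlo standard error sqrt(s^2 / M)."""
    M = len(samples)
    g_bar = sum(samples) / M
    s2 = sum((x - g_bar) ** 2 for x in samples) / (M - 1)
    return g_bar, math.sqrt(s2 / M)

# Stand-in "posterior" samples: independent N(0, 1) draws (true mean 0)
rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
g_bar, se = mc_estimate(samples)
lo, hi = g_bar - 2 * se, g_bar + 2 * se  # approximate 95% interval
```

With M = 10,000 and unit variance, the standard error is about 0.01; quadrupling M halves it.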
Markov Chain Monte Carlo
I Full conditional for | ,YI | = 1, Y N(3, 1/3)I | = 2, Y N(0, 1/3)I | = 3, Y N(3, 1/3)
I Full conditional for | ,Y (use Bayes Theorem):
P( = d | ,Y ) =p( = d | Y ) dnorm(,md , s
2d)
3d=1 p( = d | Y ) dnorm(,md , s
2d)
where dnorm is the normal density.
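A Gibbs sampler alternates between these two full conditionals. A minimal sketch (illustrative, not the slides' code), with the weights and means from the mixture above:

```python
import math
import random

WEIGHTS = [0.45, 0.10, 0.45]   # P(delta = d | Y) from the slides
MEANS = [-3.0, 0.0, 3.0]       # component means m_d
SD = math.sqrt(1.0 / 3.0)      # common component sd (variance 1/3)

def dnorm(x, m, sd):
    return math.exp(-0.5 * ((x - m) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def gibbs(n_iter, delta0, rng):
    delta = delta0
    draws = []
    for _ in range(n_iter):
        # theta | delta, Y ~ N(m_delta, 1/3)
        theta = rng.gauss(MEANS[delta], SD)
        # delta | theta, Y: probabilities proportional to w_d * dnorm(theta, m_d, sd)
        w = [WEIGHTS[d] * dnorm(theta, MEANS[d], SD) for d in range(3)]
        u = rng.random() * sum(w)
        acc = 0.0
        for d in range(3):
            acc += w[d]
            if u <= acc:
                delta = d
                break
        draws.append(theta)
    return draws

rng = random.Random(1)
draws = gibbs(1000, delta0=0, rng=rng)  # start in the delta = 1 component
```

Unlike the direct Monte Carlo draws, consecutive Gibbs draws are dependent: the chain tends to stay in the component it is currently in.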
MCMC Density

Figure: histogram of 1000 draws using MCMC starting at θ = −6 and δ = 1, with the posterior (mixture) density as a solid line.
MC Trace Plots

Figure: traceplots of θ against iteration index (0 to 1000) for the Monte Carlo samples and for the Markov chain Monte Carlo samples.
Stationarity

Partition the parameter space as follows:

- A0 = (−∞, −5)
- A1 = (−5, −1)
- A2 = (−1, 1)
- A3 = (1, 5)
- A4 = (5, ∞)

Under the posterior, we should have P(A3) = P(A1) > P(A2) > P(A0) = P(A4). We need our MCMC sample size S to be big enough that we can

1. Move out of A2 (or other areas of low probability) into regions of high posterior probability (convergence)
2. Move between A1 and A3 and other high probability regions (mixing)
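One simple empirical check of this ordering is to count how often the sampler visits each region. A sketch (the region boundaries are from the slide; the sample values below are made up for illustration):

```python
import bisect

# Region boundaries: A0=(-inf,-5), A1=(-5,-1), A2=(-1,1), A3=(1,5), A4=(5,inf)
CUTS = [-5.0, -1.0, 1.0, 5.0]

def region_fractions(samples):
    """Fraction of samples falling in each of A0..A4."""
    counts = [0] * 5
    for x in samples:
        counts[bisect.bisect(CUTS, x)] += 1
    n = len(samples)
    return [c / n for c in counts]

# Samples concentrated near +/-3 should mostly fall in A1 and A3
fracs = region_fractions([-3.1, -2.9, 0.2, 3.0, 3.2])
```

For a well-mixed run of this model, the fractions should approach (0, 0.45, 0.10, 0.45, 0).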
Stickiness

The traceplots show:

- MC samples can move from one region to another in one step (perfect mixing)
- MCMC quickly moves away from the starting value −6
- MCMC has more difficulty moving from A2 into higher probability regions
- MCMC has difficulty moving between the different components and tends to get stuck in one component for a while (stickiness)
Consequences for Variance of MCMC Estimates

Var_MCMC[ḡ] = Var_MC[ḡ] + (1/S²) Σ_{s≠t} E[(g(θ^(s)) − E[g(θ) | Y])(g(θ^(t)) − E[g(θ) | Y])]

- The second term depends on the autocorrelation of samples within the Markov chain
- The autocorrelation at lag t is the correlation between g(θ^(s)) and g(θ^(s+t)) (elements that are t time steps apart)
- It is often positive, so the MCMC variance is larger than the MC variance
- High autocorrelation is an indicator of poor mixing, so we need a larger sample size to obtain a comparable variance
- Effective sample size: S_eff is the value such that Var_MCMC[ḡ] = Var[g(θ) | Y] / S_eff
ACF Plots

Figure: autocorrelation functions (lags 0 to 30) for the Monte Carlo sample and for the MCMC sample.
CODA

The CODA package provides many popular diagnostics for assessing convergence of MCMC output from WinBUGS (and other programs):

> library(coda)
> effectiveSize(theta.MCMC)
    var1     var2
3.444007 4.390776

The precision of the MCMC estimate of the posterior mean based on 1000 samples is as good as taking 4 independent samples!

Running for 1 million iterations still gives an effective sample size of only 1927.
Useful Diagnostics/Functions

- Effective sample size: effectiveSize()
- Autocorrelation: autocorr.plot()
- Cross-variable correlations: crosscorr.plot()
- Geweke: geweke.diag()
- Gelman-Rubin: gelman.diag()
- Heidelberg & Welch: heidel.diag()
- Raftery-Lewis: raftery.diag()
Geweke

Geweke (1992) proposed a convergence diagnostic for Markov chains based on a test for equality of the means of the first and last parts of a Markov chain (by default the first 10% and the last 50%).

If the samples are drawn from the stationary distribution of the chain, the two means are equal and Geweke's statistic has an asymptotically standard normal distribution.
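The idea can be sketched as a two-sample z-score. Note this is a simplification: coda's geweke.diag estimates the two variances from spectral densities to account for autocorrelation, whereas the naive sample variances below do not.

```python
import math
import random

def geweke_z(chain, first=0.10, last=0.50):
    """Geweke-style z-score comparing the mean of the first 10% of a
    chain with the mean of the last 50%. Simplified sketch: uses naive
    sample variances, which ignore autocorrelation."""
    n = len(chain)
    a = chain[: int(first * n)]
    b = chain[int((1.0 - last) * n):]
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))

rng = random.Random(3)
stationary = [rng.gauss(0.0, 1.0) for _ in range(1000)]
trending = [0.01 * i + rng.gauss(0.0, 1.0) for i in range(1000)]
z_ok = geweke_z(stationary)   # roughly standard normal
z_bad = geweke_z(trending)    # large in magnitude: chain still drifting
```

A |z| well beyond 2 suggests the early and late parts of the chain have different means, i.e. the chain has not reached stationarity.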
Gelman-Rubin

Gelman and Rubin (1992) propose a general approach to monitoring convergence of MCMC output in which m > 1 parallel chains are run.

- Use different starting values that are overdispersed relative to the posterior distribution.
- Convergence is diagnosed when the chains have "forgotten" their initial values, and the output from all chains is indistinguishable.
- The diagnostic is applied to a single variable from the chain. It is based on a comparison of within-chain and between-chain variances (similar to a classical analysis of variance).
- Assumes that the target is normal (transformations may help).
- Values of R̂ near 1 suggest convergence.
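The within/between-chain comparison can be sketched as follows. This is a simplified version of the statistic (it omits the degrees-of-freedom correction that gelman.diag applies):

```python
import math
import random

def gelman_rubin(chains):
    """Potential scale reduction factor R-hat for m parallel chains of
    equal length n (simplified: no degrees-of-freedom correction)."""
    m = len(chains)
    n = len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    # Between-chain variance B and mean within-chain variance W
    B = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    W = sum(sum((x - mu) ** 2 for x in c) / (n - 1)
            for c, mu in zip(chains, means)) / m
    var_hat = (n - 1) / n * W + B / n  # pooled estimate of posterior variance
    return math.sqrt(var_hat / W)

rng = random.Random(5)
same = [[rng.gauss(0.0, 1.0) for _ in range(500)] for _ in range(4)]
shifted = same[:3] + [[x + 5.0 for x in same[3]]]  # one chain stuck elsewhere
r_same = gelman_rubin(same)        # near 1: chains agree
r_shifted = gelman_rubin(shifted)  # well above 1: chains disagree
```

When one chain explores a different region than the others, the between-chain variance inflates R̂ well above 1.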
Heidelberg-Welch

The convergence test uses the Cramér-von Mises statistic to test the null hypothesis that the sampled values come from a stationary distribution.

- The test is applied successively, first to the whole chain, then after discarding the first 10%, 20%, ... of the chain, until either the null hypothesis is accepted or 50% of the chain has been discarded.
- The latter outcome constitutes failure of the stationarity test and indicates that a longer MCMC run is needed.
- If the stationarity test is passed, the number of iterations to keep and the number to discard (burn-in) are reported.
Raftery-Lewis

Calculates the number of iterations required to estimate the quantile q to within an accuracy of ±r with probability p.

- Separate calculations are performed for each variable within each chain. If the number of iterations in the data is too small, an error message is printed indicating the minimum length of pilot run.
- The minimum length is the required sample size for a chain with no correlation between consecutive samples. An estimate I (the dependence factor) of the extent to which autocorrelation inflates the required sample size is also provided.
- Values of I larger than 5 indicate strong autocorrelation, which may be due to a poor choice of starting value, high posterior correlations, or stickiness of the MCMC algorithm.
- The number of burn-in iterations to be discarded at the beginning of the chain is also calculated.
Summary

- Diagnostics cannot guarantee that a chain has converged
- They can indicate that it has not converged

Solutions?

- Run longer and thin the output
- Reparametrize the model
- Block correlated variables together
- Integrate out variables
- Add auxiliary variables (the slice sampler, for example)
- Use Rao-Blackwellization in estimation