STA 216, Generalized Linear Models, Lecture 6
September 13, 2007

Outline
- Introduction to Bayes Inference for GLMs
  - Description of Posterior
  - Asymptotic Approximations
- Introduction to MCMC Algorithms
  - Gibbs sampling & Metropolis-Hastings
  - Convergence & Mixing
  - Inference from MCMC samples
  - Illustration
Bayesian Inference via the Posterior Distribution

- Recall that Bayesian inference is based on the posterior distribution

    π(θ | y) = π(θ) L(y | θ) / ∫ π(θ) L(y | θ) dθ = π(θ) L(y | θ) / L(y),

  where
  - π(θ) = prior distribution for the parameter θ
  - L(y | θ) = likelihood of the data y given θ
  - L(y) = marginal likelihood of the data, obtained by integrating the likelihood over the prior
- Good news: the numerator of this expression is available
- Bad news: the denominator is typically not available (it may involve a high-dimensional integral)
Conjugate Priors

- For conjugate priors, the posterior distribution of θ is available analytically
- Example: L(y | θ) = ∏_{i=1}^n N(y_i; x_i′β, τ⁻¹) (normal linear regression)
- The conjugate prior is normal-gamma:

    π(β, τ) = N_p(β; β₀, τ⁻¹Σ₀) G(τ; a, b),

  where N_p(·) denotes the p-variate normal density and G(·) the gamma density
- For this prior, the posterior is also normal-gamma (a conjugate update; see the sketch below)
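This conjugate update can be coded directly. A minimal Python sketch (an illustration, not the lecture's SAS implementation), assuming the prior β | τ ~ N(β₀, τ⁻¹Σ₀), τ ~ Gamma(a, b) with the rate parameterization and errors y ~ N(Xβ, τ⁻¹I):

```python
import numpy as np

def normal_gamma_posterior(X, y, beta0, Sigma0, a, b):
    """Conjugate update for the normal linear model y ~ N(X beta, tau^{-1} I)
    under beta | tau ~ N(beta0, tau^{-1} Sigma0), tau ~ Gamma(a, b) (shape/rate).
    Returns the posterior hyperparameters of the normal-gamma posterior."""
    n = len(y)
    Sigma0_inv = np.linalg.inv(Sigma0)
    Sigma_n = np.linalg.inv(Sigma0_inv + X.T @ X)      # posterior scale matrix for beta
    beta_n = Sigma_n @ (Sigma0_inv @ beta0 + X.T @ y)  # posterior mean of beta
    a_n = a + n / 2.0
    b_n = b + 0.5 * (y @ y + beta0 @ Sigma0_inv @ beta0
                     - beta_n @ np.linalg.inv(Sigma_n) @ beta_n)
    return beta_n, Sigma_n, a_n, b_n
```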
Non-Conjugate Priors

- Conjugate priors are not available for generalized linear models (GLMs) other than the normal linear model
- One can potentially rely on an asymptotic normal approximation to the posterior
- As n → ∞, the posterior distribution is approximately normal, centered on the MLE
Asymptotic Approximation with Informative Priors

- Suppose we have a N(β₀, Σ₀) prior for β
- The asymptotic normal approximation to the posterior is

    π(β | y, X) ∝ exp{−(1/2)(β − β₀)′Σ₀⁻¹(β − β₀)} × exp{−(1/2)(β − β̂)′ I(β̂) (β − β̂)}
                ∝ N(β; β̃, Σ̃_β),

  where β̂ is the MLE and I(β̂) is the information matrix evaluated at β̂
- Approximate posterior mean & variance (see the sketch below):

    β̃ = Σ̃_β (Σ₀⁻¹β₀ + I(β̂)β̂),    Σ̃_β = (Σ₀⁻¹ + I(β̂))⁻¹
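A minimal numerical sketch of this precision-weighted combination (illustrative Python; it assumes β̂ and I(β̂) have already been obtained, e.g., from a standard GLM fit):

```python
import numpy as np

def normal_approx_posterior(beta_hat, info_hat, beta0, Sigma0):
    """Combine a N(beta0, Sigma0) prior with the asymptotic
    N(beta_hat, I(beta_hat)^{-1}) likelihood approximation.
    Returns the approximate posterior mean and covariance."""
    prior_prec = np.linalg.inv(Sigma0)
    Sigma_tilde = np.linalg.inv(prior_prec + info_hat)                 # (Sigma0^{-1} + I)^{-1}
    beta_tilde = Sigma_tilde @ (prior_prec @ beta0 + info_hat @ beta_hat)
    return beta_tilde, Sigma_tilde
```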
Comments on Asymptotic Approximation

- Even for moderate sample sizes, the asymptotic approximation may be inaccurate
- In logistic regression with rare outcomes or rare binary exposures, the posterior can be highly skewed
- It is appealing to avoid any reliance on large-sample assumptions and to base inferences on the exact posterior
MCMC - Basic Idea

- Markov chain Monte Carlo (MCMC) provides an approach for generating samples from the posterior distribution
- Note that this does not give us an approximation to π(θ | y) directly
- However, from these samples we can obtain summaries of the posterior distribution for θ
- Summaries of the exact posterior distribution of g(θ), for any functional g(·), can also be obtained
How does MCMC work?

- Let θ^t = (θ^t_1, . . . , θ^t_p) denote the value of the p × 1 vector of parameters at iteration t
- θ^0 = initial value used to start the chain (results shouldn't be sensitive to this choice)
- MCMC generates θ^t from a distribution that depends on the data & potentially on θ^{t−1}, but not on θ^1, . . . , θ^{t−2}
- Under some conditions on the sampling distribution, this results in a Markov chain with stationary distribution π(θ | y)
Different flavors of MCMC

- The most commonly used MCMC algorithms are:
  - Metropolis sampling (Metropolis et al., 1953)
  - Metropolis-Hastings (MH) (Hastings, 1970)
  - Gibbs sampling (Geman & Geman, 1984; Gelfand & Smith, 1990)
- Easy overview of Gibbs: Casella & George (1992, The American Statistician, 46, 167-174)
- Easy overview of MH: Chib & Greenberg (1995, The American Statistician)
Gibbs Sampling

- Start with an initial value θ^0 = (θ^0_1, . . . , θ^0_p)
- For iterations t = 1, . . . , T:
  1. Sample θ^t_1 from the conditional posterior distribution
       π(θ_1 | θ_2 = θ^{t−1}_2, . . . , θ_p = θ^{t−1}_p, y)
  2. Sample θ^t_2 from the conditional posterior distribution
       π(θ_2 | θ_1 = θ^t_1, θ_3 = θ^{t−1}_3, . . . , θ_p = θ^{t−1}_p, y)
  3. Similarly, sample θ^t_3, . . . , θ^t_p from their conditional posterior distributions given the current values of the other parameters (a toy sketch of one full scan appears below)
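As an illustration of the scan above, here is a minimal Gibbs sampler (a Python toy example, not part of the lecture) for a bivariate normal target with unit variances and correlation ρ, where both full conditionals are available in closed form:

```python
import numpy as np

rng = np.random.default_rng(0)

def gibbs_bivariate_normal(rho, T=5000, theta0=(0.0, 0.0)):
    """Gibbs sampler using the exact full conditionals
    theta1 | theta2 ~ N(rho*theta2, 1 - rho^2) and
    theta2 | theta1 ~ N(rho*theta1, 1 - rho^2)."""
    theta1, theta2 = theta0
    draws = np.empty((T, 2))
    sd = np.sqrt(1.0 - rho ** 2)
    for t in range(T):
        theta1 = rng.normal(rho * theta2, sd)   # step 1: update theta1 given current theta2
        theta2 = rng.normal(rho * theta1, sd)   # step 2: update theta2 given new theta1
        draws[t] = theta1, theta2
    return draws

samples = gibbs_bivariate_normal(rho=0.9)
print(samples[1000:].mean(axis=0), np.corrcoef(samples[1000:].T)[0, 1])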
Gibbs Sampling (continued)

- Under mild regularity conditions, the samples converge to the stationary distribution π(θ | y)
- At the start of sampling, the samples are not from the posterior distribution π(θ | y)
- It is necessary to discard the initial samples as a burn-in to allow for convergence
- In simple models such as GLMs, convergence typically occurs quickly & a burn-in of 100 iterations should be sufficient (to be conservative, SAS uses 2,000 as the default)
Example - DDE & Preterm Birth

- Scientific interest: association between DDE exposure & preterm birth, adjusting for possible confounding variables
- Data from the US Collaborative Perinatal Project (CPP): n = 2380 children, of whom 361 were born preterm
- Analysis: Bayesian analysis using a probit model
Probit Model

- y_i = 1 if preterm birth and y_i = 0 if full-term birth

    Pr(y_i = 1 | x_i, β) = Φ(x_i′β),

- x_i = (1, dde_i, x_{i3}, . . . , x_{i7})′
- x_{i3}, . . . , x_{i7} = possible confounders (black race, etc.)
- β₁ = intercept
- β₂ = dde slope
Prior, Likelihood & Posterior

- Prior: π(β) = N(β₀, Σ_β)
- Likelihood:

    π(y | β, X) = ∏_{i=1}^n Φ(x_i′β)^{y_i} {1 − Φ(x_i′β)}^{1−y_i}

- Posterior: π(β | y, X) ∝ π(β) π(y | β, X)
- No closed form is available for the normalizing constant, so posterior computation relies on MCMC (see the data-augmentation sketch below)
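One standard way to run a Gibbs sampler for this probit model is the data-augmentation scheme of Albert & Chib (1993), which introduces latent normal variables so that every full conditional is available in closed form. A minimal Python sketch (illustrative only; the lecture's results were produced in SAS, and names such as probit_gibbs are mine):

```python
import numpy as np
from scipy.stats import truncnorm

def probit_gibbs(X, y, beta0, Sigma0, T=2000, seed=1):
    """Data-augmentation Gibbs sampler for Bayesian probit regression
    (Albert & Chib, 1993) with a N(beta0, Sigma0) prior on beta.
    Latent z_i ~ N(x_i' beta, 1), with y_i = 1(z_i > 0)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    prior_prec = np.linalg.inv(Sigma0)
    B = np.linalg.inv(prior_prec + X.T @ X)          # conditional covariance of beta given z
    chol_B = np.linalg.cholesky(B)
    beta = np.zeros(p)                               # starting value beta^0 = 0
    draws = np.empty((T, p))
    for t in range(T):
        # 1. sample latent variables from truncated normals given beta
        mu = X @ beta
        lower = np.where(y == 1, -mu, -np.inf)       # z > 0 when y = 1
        upper = np.where(y == 1, np.inf, -mu)        # z <= 0 when y = 0
        z = mu + truncnorm.rvs(lower, upper, size=n, random_state=rng)
        # 2. sample beta from its multivariate normal full conditional given z
        mean = B @ (prior_prec @ beta0 + X.T @ z)
        beta = mean + chol_B @ rng.standard_normal(p)
        draws[t] = beta
    return draws
```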
Maximum Likelihood Results

Parameter    MLE        SE        Z stat    p-value
β1          -1.08068    0.04355   -24.816   < 2e-16
β2           0.17536    0.02909     6.028   1.67e-09
β3          -0.12817    0.03528    -3.633   0.000280
β4           0.11097    0.03366     3.297   0.000978
β5          -0.01705    0.03405    -0.501   0.616659
β6          -0.08216    0.03576    -2.298   0.021571
β7           0.05462    0.06473     0.844   0.398721

β2 = dde slope (highly significant increasing trend)
Bayesian Analysis - Prior Elicitation

- Ideally, read the literature on preterm birth → β₀ = best guess of β
- This should be possible, in particular, for the confounder coefficients
- Σ₀ expresses uncertainty: place high prior probability on a plausible range
- Much better than flat priors, which can yield implausible estimates!
- As a default, shrinkage-type prior we use N(0, 4 × I_{7×7})
Gibbs Sampling

- We choose β^0 = 0 as the starting value
- The MLE or the asymptotic approximation to the posterior mean may provide a better default choice
- Results should not depend on the starting values, though for poor starting values you may need a longer burn-in
- For typical GLMs, such as probit models, convergence is rapid
- For illustration, we collected 1,000 iterations
Example - probit binary regression model [figure not included in extracted text]
Posterior Summaries

Parameter   Mean     Median   SD     95% credible interval
β1         -1.08     -1.08    0.04   (-1.16, -1.01)
β2          0.17      0.17    0.03   (0.12, 0.23)
β3         -0.13     -0.13    0.04   (-0.2, -0.05)
β4          0.11      0.11    0.03   (0.05, 0.18)
β5         -0.02     -0.02    0.03   (-0.08, 0.05)
β6         -0.08     -0.08    0.04   (-0.15, -0.02)
β7          0.05      0.06    0.06   (-0.07, 0.18)
Estimated Posterior Density [figure not included in extracted text]
Inferences on Functionals

- Often, it is not the regression parameter itself which is of primary interest
- One may instead want to estimate functionals, such as the mean response at different values of a predictor
- By applying the functional to every iteration of the MCMC algorithm after burn-in, one obtains samples from the marginal posterior distribution of the unknown of interest (see the sketch after the next figure)
Estimated Dose Response Function [figure not included in extracted text]
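A minimal sketch of how such a posterior dose-response summary could be computed from the probit MCMC draws (illustrative Python; it assumes the first two columns of the draws are the intercept and dde slope, and holds the confounders at zero, i.e., at their reference values):

```python
import numpy as np
from scipy.stats import norm

def dose_response_summary(beta_draws, dde_grid):
    """Posterior mean curve and pointwise 95% credible band for the functional
    g(beta; dde) = Phi(beta1 + beta2 * dde), one curve per post-burn-in draw."""
    lin_pred = beta_draws[:, [0]] + beta_draws[:, [1]] * dde_grid[None, :]
    probs = norm.cdf(lin_pred)                                 # (num_draws, grid) array
    mean_curve = probs.mean(axis=0)
    lower, upper = np.percentile(probs, [2.5, 97.5], axis=0)   # pointwise credible band
    return mean_curve, lower, upper
```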
Metropolis-Hastings Sampling

- Gibbs sampling requires sampling from the conditional posterior distributions
- Metropolis-Hastings (MH) is an alternative that avoids this restriction
- Again, start with an initial value θ^0 and sequentially update the parameters θ_1, . . . , θ_p
Metropolis-Hastings (continued)

- To draw θ^t_j:
  1. Sample a candidate θ̃^t_j ∼ q_j(· | θ^{t−1}_j)
  2. Set θ^t_j = θ̃^t_j with probability

       min{ 1, [ π(θ̃^t_j) L(y | θ_j = θ̃^t_j, −) q_j(θ^{t−1}_j | θ̃^t_j) ] / [ π(θ^{t−1}_j) L(y | θ_j = θ^{t−1}_j, −) q_j(θ̃^t_j | θ^{t−1}_j) ] },

     where L(y | θ_j = θ̃^t_j, −) is the likelihood given θ_j = θ̃^t_j and the current values of the other parameters
  3. Otherwise set θ^t_j = θ^{t−1}_j (a random-walk sketch appears below)
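A minimal random-walk Metropolis sketch for a single parameter (illustrative Python; function and variable names are mine). With a symmetric normal proposal the q terms cancel, so only the unnormalized log-posterior is needed:

```python
import numpy as np

def random_walk_metropolis(log_post, theta0, kappa, T=10000, seed=2):
    """Random-walk Metropolis: propose cand ~ N(theta_prev, kappa^2) and accept
    with probability min(1, post(cand) / post(prev))."""
    rng = np.random.default_rng(seed)
    draws = np.empty(T)
    theta, lp = theta0, log_post(theta0)
    for t in range(T):
        cand = rng.normal(theta, kappa)
        lp_cand = log_post(cand)
        if np.log(rng.uniform()) < lp_cand - lp:   # accept/reject on the log scale
            theta, lp = cand, lp_cand
        draws[t] = theta
    return draws

# toy usage: target is the unnormalized log-density of a N(3, 1) distribution
samples = random_walk_metropolis(lambda th: -0.5 * (th - 3.0) ** 2, theta0=0.0, kappa=1.0)
print(samples[2000:].mean())
```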
Comments on Metropolis-Hastings

- Performance is sensitive to the proposal distributions q_j(· | θ^{t−1}_j)
- The most common proposal is N(θ^{t−1}_j, κ), which is centered on the previous value
- This results in a Metropolis random walk
- The algorithm is inefficient if κ is chosen too small or too large
Adaptive Rejection Sampling (ARS)

- ARS (Gilks & Wild, 1992) is an approach for implementing Gibbs sampling for log-concave conditional distributions
- It uses sequentially refined envelopes around the target density, leading to some additional computational expense
- Log-concavity holds for most GLMs and typical priors
- When log-concavity is violated, adaptive rejection Metropolis sampling (ARMS) (Gilks et al., 1995) can be used
SAS Implementation

- BGENMOD, BLIFEREG & BPHREG all rely on ARS (when possible) or ARMS
- Hence, SAS uses Gibbs sampling for posterior computation
- It is important to diagnose convergence & mixing whenever using MCMC!
Some Terminology

- Convergence: initial drift of the samples towards the stationary distribution
- Burn-in: samples at the start of the chain that are discarded to allow convergence
- Slow mixing: tendency for high autocorrelation in the samples
- Thinning: practice of keeping only every kth iteration to reduce autocorrelation
- Trace plot: plot of the sampled values of a parameter vs iteration number
Example - trace plot with poor mixing [figure not included in extracted text]
Poor mixing Gibbs sampler

- Exhibits "snaking" behavior in the trace plot, with cyclic local trends in the mean
- Poor mixing in the Gibbs sampler is caused by high posterior correlation among the parameters
- This decreases efficiency: many more samples need to be collected to keep the Monte Carlo error in posterior summaries low
- For a very poorly mixing chain, millions of iterations may even be needed
- Routinely examine trace plots!
Example - trace plot with good mixing [figure not included in extracted text]
Convergence diagnostics

- Diagnostics are available to help decide on the number of burn-in & collected samples
- Note: there are no definitive tests of convergence & you should check convergence for all parameters
- With experience, visual inspection of trace plots is perhaps the most useful approach
- There are also a number of useful automated tests
Convergence diagnostics in SAS

- Gelman-Rubin: uses parallel chains with dispersed initial values to test convergence
- Geweke: applies a test of stationarity to a single chain
- Heidelberger-Welch (stationarity): alternative to Geweke
- Heidelberger-Welch (halfwidth): is the number of samples adequate for estimating the posterior mean?
- Raftery-Lewis: number of samples needed for the desired accuracy in estimating percentiles
- Autocorrelation: high values indicate slow mixing
- Effective sample size: a low value relative to the actual number of samples indicates slow mixing (see the sketch below)
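Simplified versions of two of these diagnostics can be computed directly from the draws. The following Python sketch (illustrative, not the SAS implementation) shows a basic Gelman-Rubin potential scale reduction factor and a crude effective sample size based on the sample autocorrelations:

```python
import numpy as np

def gelman_rubin(chains):
    """R-hat for one parameter, given an (m, n) array of m parallel chains
    of length n (post burn-in)."""
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()      # average within-chain variance
    B = n * chain_means.var(ddof=1)            # between-chain variance
    var_hat = (n - 1) / n * W + B / n
    return np.sqrt(var_hat / W)

def effective_sample_size(x):
    """Crude ESS = T / (1 + 2 * sum of lag-k autocorrelations), truncating the
    sum at the first non-positive autocorrelation."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    xc = x - x.mean()
    acov = np.correlate(xc, xc, mode="full")[T - 1:] / T   # lags 0..T-1
    rho = acov / acov[0]
    s = 0.0
    for k in range(1, T):
        if rho[k] <= 0:
            break
        s += rho[k]
    return T / (1.0 + 2.0 * s)
```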
Practical advice on convergence diagnosis

- The Gelman-Rubin approach is quite appealing in its use of multiple chains
- Geweke & Heidelberger-Welch sometimes reject even when the trace plots look good
- These tests can be overly sensitive to minor departures from stationarity that do not impact inferences
- Sometimes this can be solved by running more iterations
- Otherwise, you may want to try multiple chains
- For the models considered in SAS, the chains tend to be very well behaved when the MLE exists or the priors are informative
How to summarize results from the MCMC chain?

- Posterior mean: θ̂ = (1/(T − B)) Σ_{t=B+1}^T θ^t, with B = number of burn-in samples and T = total number of samples
- The posterior mean is the most commonly used point estimate and provides an alternative to the MLE (note: the posterior mode is difficult to estimate accurately from MCMC output)
- The posterior median (50th percentile of {θ^t}_{t=B+1}^T) provides an alternative point estimate
- The posterior standard deviation is calculated as the square root of

    v̂ar(θ_j | y) = (1/(T − B − 1)) Σ_{t=B+1}^T (θ^t_j − θ̂_j)²

- As n increases, we obtain π(θ_j | y) ≈ N(θ_j; θ̂_j, v̂ar(θ_j | y)) (a sketch of these summaries appears below)
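A minimal Python sketch of these summaries for a single parameter's chain (names are mine):

```python
import numpy as np

def chain_summaries(theta, burn_in):
    """Posterior mean, median and standard deviation from the post-burn-in draws."""
    kept = np.asarray(theta)[burn_in:]
    return {"mean": kept.mean(),
            "median": np.median(kept),
            "sd": kept.std(ddof=1)}   # ddof=1 matches the 1/(T - B - 1) divisor
```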
Interval estimates

- As a Bayesian alternative to the confidence interval, one can use a credible interval
- The 100(1 − α)% equal-tail credible interval ranges from the α/2 to the 1 − α/2 percentile of {θ^t}_{t=B+1}^T
- A highest posterior density (HPD) interval can also be calculated: the smallest interval containing the parameter with 100(1 − α)% posterior probability (see the sketch below)
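Both intervals are easy to approximate from the post-burn-in draws. An illustrative Python sketch (the HPD version is the common "shortest interval over sorted draws" approximation):

```python
import numpy as np

def equal_tail_interval(theta, alpha=0.05):
    """Equal-tail 100(1 - alpha)% credible interval from posterior draws."""
    return np.percentile(theta, [100 * alpha / 2, 100 * (1 - alpha / 2)])

def hpd_interval(theta, alpha=0.05):
    """Empirical HPD interval: shortest interval containing a fraction
    (1 - alpha) of the sorted posterior draws."""
    sorted_draws = np.sort(np.asarray(theta))
    n = len(sorted_draws)
    n_keep = int(np.ceil((1 - alpha) * n))
    widths = sorted_draws[n_keep - 1:] - sorted_draws[: n - n_keep + 1]
    start = int(np.argmin(widths))
    return sorted_draws[start], sorted_draws[start + n_keep - 1]
```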
Posterior probabilities
I Often interest focuses on the weight of evidence ofH1 : θj > 0
STA 216, Generalized Linear Models, Lecture 6
OutlineIntroduction to Bayes Inference for GLMs
Introduction to MCMC Algorithms
Gibbs sampling & Metropolis-HastingsConvergence & MixingInference from MCMC samplesIllustration
Posterior probabilities
I Often interest focuses on the weight of evidence ofH1 : θj > 0
I One can use the estimated posterior probability:
P̂r(θj > 0 |data) =1
T − B
T∑
t=B+1
1(θtj > 0),
with 1(θtj > 0) = 1 if θt
j > 0 and 0 otherwise.
STA 216, Generalized Linear Models, Lecture 6
OutlineIntroduction to Bayes Inference for GLMs
Introduction to MCMC Algorithms
Gibbs sampling & Metropolis-HastingsConvergence & MixingInference from MCMC samplesIllustration
Posterior probabilities

I Often interest focuses on the weight of evidence for H1 : θ_j > 0
I One can use the estimated posterior probability:

    P̂r(θ_j > 0 | data) = (1/(T − B)) ∑_{t=B+1}^{T} 1(θ_j^t > 0),

  with 1(θ_j^t > 0) = 1 if θ_j^t > 0 and 0 otherwise
I A high value (e.g., greater than 0.95) suggests strong evidence in favor of H1
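A one-line Python sketch of this estimate, again assuming `kept` holds the post-burn-in draws of θ_j:

    import numpy as np

    def prob_positive(kept):
        """Estimated Pr(theta_j > 0 | data): fraction of post-burn-in draws above zero."""
        return np.mean(np.asarray(kept) > 0)

    # e.g., prob_positive(theta_j_draws) > 0.95 would suggest strong evidence for H1: theta_j > 0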
Marginal posterior density estimation

I Summary statistics such as the mean, median, standard deviation, etc. provide an incomplete picture
I Since we have many samples from the posterior, we can accurately estimate the exact posterior density
I This can be done using a kernel-smoothed density estimation procedure applied to the samples {θ_j^t}_{t=B+1}^{T}
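For example, a Gaussian kernel density estimate of the marginal posterior can be computed with scipy (a sketch; `kept` is again a hypothetical array of post-burn-in draws for θ_j):

    import numpy as np
    from scipy.stats import gaussian_kde

    def marginal_density(kept, n_grid=200):
        """Kernel-smoothed estimate of the marginal posterior density of one parameter."""
        kept = np.asarray(kept)
        kde = gaussian_kde(kept)                          # bandwidth chosen by Scott's rule by default
        grid = np.linspace(kept.min(), kept.max(), n_grid)
        return grid, kde(grid)                            # evaluate the density on a grid for plotting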
Illustration - linear regression

I Lewis & Taylor (1967) - study of weight (y_i) in 237 students
I The model is as follows:

    y_i = β_0 + β_1 x_{1i} + β_2 x_{2i} + β_3 x_{3i} + ε_i,   i = 1, . . . , 237,

  where
  x_{1i} = height in feet − 5 feet
  x_{2i} = age in years − 16
  x_{3i} = 1 for males, 0 for females
I Implemented in SAS Proc BGENMOD - 2,000 burn-in & 10,000 collected samples
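For comparison, a minimal Python sketch of a Gibbs sampler for this conjugate linear model (an illustrative stand-in, not the SAS Proc BGENMOD implementation; it assumes independent N(0, 100²) priors on the β's and a Gamma(0.01, 0.01) prior on the precision τ, which are hypothetical choices):

    import numpy as np

    def gibbs_linear_regression(X, y, n_iter=12000, prior_var=100.0**2, a0=0.01, b0=0.01, seed=0):
        """Gibbs sampler for y = X beta + eps, eps ~ N(0, 1/tau), with conjugate priors."""
        rng = np.random.default_rng(seed)
        n, p = X.shape
        beta, tau = np.zeros(p), 1.0
        prior_prec = np.eye(p) / prior_var                 # prior precision for beta (prior mean 0)
        beta_draws, tau_draws = np.empty((n_iter, p)), np.empty(n_iter)
        XtX, Xty = X.T @ X, X.T @ y
        for t in range(n_iter):
            # full conditional for beta: multivariate normal
            post_cov = np.linalg.inv(prior_prec + tau * XtX)
            post_mean = post_cov @ (tau * Xty)
            beta = rng.multivariate_normal(post_mean, post_cov)
            # full conditional for tau: gamma (numpy parameterizes by shape and scale = 1/rate)
            resid = y - X @ beta
            tau = rng.gamma(a0 + n / 2.0, 1.0 / (b0 + 0.5 * resid @ resid))
            beta_draws[t], tau_draws[t] = beta, tau
        return beta_draws, tau_draws

    # e.g., with the design matrix [1, height - 5 ft, age - 16, male] one would discard the
    # first 2,000 draws as burn-in and summarize the remaining 10,000 as described above.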
Output and diagnostics - intercept (β0) [figure]
Output and diagnostics - height (β1) [figure]
Output and diagnostics - age (β2) [figure]
Output and diagnostics - male (β3) [figure]
Mixing - Autocorrelation in MCMC samples

Parameter    Lag 1     Lag 5     Lag 10    Lag 50
Intercept    0.5489    0.0114   -0.0107    0.0009
height       0.5166   -0.0124    0.0112    0.0042
age          0.4634   -0.0068   -0.0038    0.0032
male         0.5613    0.0294   -0.0170    0.0017
Precision   -0.0039   -0.0088   -0.0042    0.0018

Conclusion: Very good mixing
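The lag-k autocorrelations above can be reproduced from the raw draws with a short Python sketch (assuming `kept` is one parameter's post-burn-in chain):

    import numpy as np

    def autocorr(kept, lag):
        """Sample autocorrelation of an MCMC chain at a given lag."""
        centered = np.asarray(kept) - np.mean(kept)
        num = np.sum(centered[lag:] * centered[:-lag]) if lag > 0 else np.sum(centered * centered)
        return num / np.sum(centered * centered)

    # e.g., [autocorr(kept, k) for k in (1, 5, 10, 50)] gives one row of the table above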
Tests of convergence

            Gelman-Rubin            Geweke
Parameter   Estimate   97.5%       z         Pr > |z|
Intercept   1.0000     1.0002      0.5871    0.5572
height      1.0004     1.0013      1.7153    0.0863
age         1.0003     1.0012     -1.3831    0.1666
male        1.0001     1.0005     -1.2658    0.2056
Precision   1.0003     1.0010      2.4947    0.0126

Gelman-Rubin: values ≈ 1 suggest convergence
Geweke: convergence suggested except for precision
Heidelberger-Welsh: all parameters passed
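A simplified Geweke-style check can be sketched in Python by comparing the means of the early and late portions of the chain (this naive version uses plain sample variances rather than the spectral-density variance estimates behind the SAS output, so the z-values will differ somewhat):

    import numpy as np

    def geweke_z(kept, first=0.1, last=0.5):
        """Z-score comparing the mean of the first 10% of draws to the mean of the last 50%."""
        kept = np.asarray(kept)
        a = kept[: int(first * len(kept))]
        b = kept[int((1 - last) * len(kept)):]
        return (a.mean() - b.mean()) / np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))

    # |geweke_z(kept)| much larger than about 2 would suggest the chain has not yet converged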
Output and diagnostics - precision (τ) [figure]
Number of samples sufficient?

I Raftery-Lewis: 3746 samples needed for ±0.005 accuracy in estimating the 0.025 quantile (so 10,000 is a sufficient number)
I Heidelberger-Welsh: 10,000 samples sufficient for accurate mean estimation - except for the male coefficient
I Effective sample size: ranged from 3033.5 to 3740.2 for the regression coefficients
I That is, the 10,000 Gibbs samples contain as much information as 3033.5-3740.2 independent draws (a sketch of the calculation follows)
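Effective sample size can be approximated from the autocorrelations with a short Python sketch (using a simple truncation at the first non-positive autocorrelation; software such as SAS or coda uses more refined spectral estimates, so the values will not match exactly):

    import numpy as np

    def effective_sample_size(kept, max_lag=200):
        """ESS = T / (1 + 2 * sum of positive-lag autocorrelations), truncated at the first non-positive one."""
        centered = np.asarray(kept) - np.mean(kept)
        denom = np.sum(centered * centered)
        acf_sum = 0.0
        for lag in range(1, min(max_lag, len(centered) - 1)):
            rho = np.sum(centered[lag:] * centered[:-lag]) / denom
            if rho <= 0:                                  # simple truncation rule
                break
            acf_sum += rho
        return len(centered) / (1.0 + 2.0 * acf_sum)

    # for these chains this gives roughly 3,000-3,700 effective draws out of 10,000 kept samples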
Posterior summaries

10,000 Samples
Parameter   Mean      SD        95% CI               95% HPD
Intercept   96.155    1.138     [93.906, 98.352]     [93.866, 98.294]
height      3.103     0.272     [2.576, 3.642]       [2.550, 3.611]
age         2.390     0.566     [1.272, 3.492]       [1.282, 3.498]
male        -0.280    1.601     [-3.3601, 2.948]     [-3.344, 2.961]
precision   0.0071    0.00066   [0.0058, 0.0084]     [0.0058, 0.0084]

50,000 Samples
Parameter   Mean      SD        95% CI               95% HPD
Intercept   96.207    1.145     [93.968, 98.457]     [93.997, 98.482]
height      3.107     0.267     [2.581, 3.627]       [2.574, 3.619]
age         2.375     0.562     [1.265, 3.467]       [1.268, 3.470]
male        -0.353    1.605     [-3.495, 2.825]      [-3.451, 2.863]
precision   0.0071    0.00065   [0.0059, 0.0084]     [0.0058, 0.0084]
Convergence Diagnostics (50,000 samples)

I Gelman-Rubin 97.5% bound: maximum of 1.0001
I Geweke p-values: minimum of 0.4716
I Heidelberger-Welsh: passed for all parameters
I Conclusion: for the longer chain, no evidence of lack of convergence
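The Gelman-Rubin statistic compares between-chain and within-chain variability; a minimal Python sketch for several parallel chains (a simplified version of the quantity reported above, assuming `chains` is a list of equal-length post-burn-in arrays for one parameter):

    import numpy as np

    def gelman_rubin(chains):
        """Potential scale reduction factor R-hat from m parallel chains of length n."""
        draws = np.asarray(chains)                 # shape (m, n)
        m, n = draws.shape
        chain_means = draws.mean(axis=1)
        B = n * chain_means.var(ddof=1)            # between-chain variance
        W = draws.var(axis=1, ddof=1).mean()       # within-chain variance
        var_hat = (n - 1) / n * W + B / n          # pooled posterior variance estimate
        return np.sqrt(var_hat / W)

    # values close to 1 (e.g., the 1.0001 bound above) indicate the chains have mixed well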
Discussion

I Overall picture suggests convergence, good mixing and a sufficient number of collected samples
I Don't take rejection of one convergence test too seriously if the trace plot looks good
I Rejection motivates collection of additional samples to make sure inferences do not change