
STA 216, Generalized Linear Models, Lecture 6
September 13, 2007

Outline

Introduction to Bayes Inference for GLMs
- Description of Posterior
- Asymptotic Approximations

Introduction to MCMC Algorithms
- Gibbs sampling & Metropolis-Hastings
- Convergence & Mixing
- Inference from MCMC samples
- Illustration


Bayesian Inference via the Posterior Distribution

- Recall that Bayesian inference is based on the posterior distribution

  π(θ | y) = π(θ) L(y | θ) / ∫ π(θ) L(y | θ) dθ = π(θ) L(y | θ) / L(y),

- π(θ) = prior distribution for the parameter θ
- L(y | θ) = likelihood of the data y given θ
- L(y) = marginal likelihood, integrating the likelihood over the prior
- Good news: we have the numerator of this expression
- Bad news: the denominator is typically not available (it may involve a high-dimensional integral)
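To make the numerator/denominator point concrete, here is a minimal Python sketch (not part of the original slides) for a one-parameter toy model: with a scalar θ the marginal likelihood L(y) can be computed by brute-force quadrature on a grid, which is exactly the step that becomes infeasible once θ is a high-dimensional vector. The binary data below are made up purely for illustration.

import numpy as np
from scipy.stats import norm

y = np.array([1, 0, 0, 1, 1, 0, 0, 0, 1, 0])     # toy binary data (hypothetical)
theta = np.linspace(-4, 4, 2001)                 # grid over the scalar parameter

prior = norm.pdf(theta, loc=0, scale=1)          # pi(theta)
p = norm.cdf(theta)                              # success probability Phi(theta)
loglik = y.sum() * np.log(p) + (len(y) - y.sum()) * np.log1p(-p)
unnorm = prior * np.exp(loglik)                  # numerator: pi(theta) * L(y | theta)

marginal = np.trapz(unnorm, theta)               # L(y): feasible only in low dimensions
posterior = unnorm / marginal                    # pi(theta | y) on the grid
print("posterior mean:", np.trapz(theta * posterior, theta))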


Conjugate Priors

- For conjugate priors, the posterior distribution of θ is available analytically
- Example: L(y | θ) = ∏_{i=1}^n N(yi; xi'β, τ^{-1}) (normal linear regression)
- The conjugate prior is normal-gamma:

  π(β, τ) = N_p(β; β0, τ^{-1}Σ0) G(τ; a, b),

  where N_p(·) denotes the p-variate normal & G(·) denotes the gamma
- For this prior, the posterior is also normal-gamma
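For reference, a small sketch of the resulting normal-gamma update for the normal linear model (these are the standard conjugate formulas, not derived on the slides; the gamma is parameterized by a rate, and all inputs are placeholders):

import numpy as np

def normal_gamma_posterior(y, X, b0, Sigma0, a, b):
    # Prior: beta | tau ~ N_p(b0, Sigma0 / tau), tau ~ Gamma(a, b) with rate b.
    n = len(y)
    Sigma0_inv = np.linalg.inv(Sigma0)
    Sigma_n = np.linalg.inv(Sigma0_inv + X.T @ X)        # posterior scale of beta (times 1/tau)
    b_n = Sigma_n @ (Sigma0_inv @ b0 + X.T @ y)          # posterior mean of beta
    a_post = a + n / 2.0                                 # updated gamma shape for tau
    b_post = b + 0.5 * (y @ y + b0 @ Sigma0_inv @ b0
                        - b_n @ np.linalg.inv(Sigma_n) @ b_n)   # updated gamma rate for tau
    return b_n, Sigma_n, a_post, b_post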


Non-Conjugate Priors

- Conjugate priors are not available for generalized linear models (GLMs) other than the normal linear model
- One can potentially rely on an asymptotic normal approximation
- As n → ∞, the posterior distribution is normal, centered on the MLE


Asymptotic Approximation with Informative Priors

- Suppose we have a N(β0, Σ0) prior for β.
- The asymptotic normal approximation to the posterior is

  π(β | y, X) ∝ exp{ -(1/2)(β - β0)'Σ0^{-1}(β - β0) } × exp{ -(1/2)(β - β̂)'I(β̂)(β - β̂) } ∝ N(β; β̃, Σ̃β),

  where β̂ is the MLE and I(β̂) is the information matrix evaluated at the MLE

- Approximate posterior mean & variance:

  β̃ = Σ̃β ( Σ0^{-1}β0 + I(β̂)β̂ ),    Σ̃β = ( Σ0^{-1} + I(β̂) )^{-1}
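A minimal numpy sketch of this combination of prior information and MLE-based information (illustration only; b0, Sigma0, beta_hat, and info below are placeholder values, not the CPP analysis):

import numpy as np

b0 = np.zeros(2)                          # prior mean (hypothetical)
Sigma0 = 4.0 * np.eye(2)                  # prior covariance (hypothetical)
beta_hat = np.array([-1.1, 0.18])         # MLE (placeholder values)
info = np.array([[530.0, 40.0],           # I(beta_hat), information at the MLE (placeholder)
                 [40.0, 1180.0]])

Sigma0_inv = np.linalg.inv(Sigma0)
Sigma_tilde = np.linalg.inv(Sigma0_inv + info)                    # approximate posterior covariance
beta_tilde = Sigma_tilde @ (Sigma0_inv @ b0 + info @ beta_hat)    # approximate posterior mean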


Comments on Asymptotic Approximation

- Even for moderate sample sizes, the asymptotic approximation may be inaccurate
- In logistic regression with rare outcomes or rare binary exposures, the posterior can be highly skewed
- Appealing to avoid any reliance on large-sample assumptions and base inferences on the exact posterior


MCMC - Basic Idea

- Markov chain Monte Carlo (MCMC) provides an approach for generating samples from the posterior distribution
- Note that this does not give us an approximation to π(θ | y) directly
- However, from these samples we can obtain summaries of the posterior distribution for θ
- Summaries of exact posterior distributions of g(θ), for any functional g(·), can also be obtained.


How does MCMC work?

- Let θ^t = (θ1^t, ..., θp^t) denote the value of the p × 1 vector of parameters at iteration t
- θ^0 = initial value used to start the chain (results shouldn't be sensitive to this choice)
- MCMC generates θ^t from a distribution that depends on the data & potentially on θ^{t-1}, but not on θ^1, ..., θ^{t-2}
- This results in a Markov chain with stationary distribution π(θ | y), under some conditions on the sampling distribution


Different flavors of MCMC

- The most commonly used MCMC algorithms are:
  - Metropolis sampling (Metropolis et al., 1953)
  - Metropolis-Hastings (MH) (Hastings, 1970)
  - Gibbs sampling (Geman & Geman, 1984; Gelfand & Smith, 1990)
- Easy overview of Gibbs: Casella & George (1992, The American Statistician, 46, 167-174)
- Easy overview of MH: Chib & Greenberg (1995, The American Statistician)


Gibbs Sampling

- Start with an initial value θ^0 = (θ1^0, ..., θp^0)
- For iterations t = 1, ..., T:
  1. Sample θ1^t from the conditional posterior distribution π(θ1 | θ2 = θ2^{t-1}, ..., θp = θp^{t-1}, y)
  2. Sample θ2^t from the conditional posterior distribution π(θ2 | θ1 = θ1^t, θ3 = θ3^{t-1}, ..., θp = θp^{t-1}, y)
  3. Similarly, sample θ3^t, ..., θp^t from their conditional posterior distributions given the current values of the other parameters (and y).
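A generic sketch of this loop (not from the slides), using a bivariate normal target with correlation rho, for which both full conditionals are available in closed form:

import numpy as np

rng = np.random.default_rng(1)
rho, T = 0.8, 5000
theta = np.zeros(2)                      # theta^0: starting value
draws = np.empty((T, 2))

for t in range(T):
    # full conditional of theta1 given theta2 (standard bivariate-normal result)
    theta[0] = rng.normal(rho * theta[1], np.sqrt(1 - rho**2))
    # full conditional of theta2 given the *current* theta1
    theta[1] = rng.normal(rho * theta[0], np.sqrt(1 - rho**2))
    draws[t] = theta

burn_in = 100
post = draws[burn_in:]                   # discard burn-in, summarize the rest
print(post.mean(axis=0), np.corrcoef(post.T)[0, 1])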


Gibbs Sampling (continued)

- Under mild regularity conditions, the samples converge to the stationary distribution π(θ | y)
- At the start of the sampling, the samples are not from the posterior distribution π(θ | y)
- It is necessary to discard the initial samples as a burn-in to allow convergence
- In simple models such as GLMs, convergence typically occurs quickly & a burn-in of 100 iterations should be sufficient (to be conservative, SAS uses 2,000 as the default)


Example - DDE & Preterm Birth

- Scientific interest: association between DDE exposure & preterm birth, adjusting for possible confounding variables
- Data from the US Collaborative Perinatal Project (CPP): n = 2380 children, of which 361 were born preterm
- Analysis: Bayesian analysis using a probit model


Probit Model

yi = 1 if preterm birth and yi = 0 if full-term birth

  Pr(yi = 1 | xi, β) = Φ(xi'β),

- xi = (1, ddei, xi3, ..., xi7)'
- xi3, ..., xi7 = possible confounders (black race, etc.)
- β1 = intercept
- β2 = slope for DDE


Prior, Likelihood & Posterior

- Prior: π(β) = N(β0, Σβ)
- Likelihood:

  π(y | β, X) = ∏_{i=1}^n Φ(xi'β)^{yi} {1 − Φ(xi'β)}^{1−yi}

- Posterior: π(β | y, X) ∝ π(β) π(y | β, X)
- No closed form is available for the normalizing constant
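The slides proceed by Gibbs sampling for this model; one standard way to obtain closed-form full conditionals, not spelled out here, is data augmentation with latent normal variables (Albert and Chib, 1993). A rough sketch under that assumption, with y, X, b0, Sigma_b standing in for the CPP data and prior settings:

import numpy as np
from scipy.stats import truncnorm

def probit_gibbs(y, X, b0, Sigma_b, T=1000, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    Sigma_b_inv = np.linalg.inv(Sigma_b)
    B = np.linalg.inv(Sigma_b_inv + X.T @ X)     # covariance of beta given the latent z
    beta = np.zeros(p)                           # starting value beta^0 = 0
    draws = np.empty((T, p))
    for t in range(T):
        mu = X @ beta
        # latent z_i ~ N(x_i'beta, 1), truncated to (0, inf) if y_i = 1 and (-inf, 0] if y_i = 0;
        # scipy's truncnorm takes bounds standardized as (bound - loc) / scale
        a_std = np.where(y == 1, -mu, -np.inf)
        b_std = np.where(y == 1, np.inf, -mu)
        z = truncnorm.rvs(a_std, b_std, loc=mu, scale=1.0, random_state=rng)
        # beta | z ~ N(m, B) with m = B (Sigma_b^{-1} b0 + X'z)
        m = B @ (Sigma_b_inv @ b0 + X.T @ z)
        beta = rng.multivariate_normal(m, B)
        draws[t] = beta
    return draws

With the default shrinkage prior and run length used on the slides, this sketch would be called roughly as probit_gibbs(y, X, np.zeros(7), 4 * np.eye(7), T=1000), followed by discarding a burn-in.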


Maximum Likelihood Results

  Parameter      MLE        SE       Z stat    p-value
  β1          -1.08068    0.04355   -24.816    < 2e-16
  β2           0.17536    0.02909     6.028    1.67e-09
  β3          -0.12817    0.03528    -3.633    0.000280
  β4           0.11097    0.03366     3.297    0.000978
  β5          -0.01705    0.03405    -0.501    0.616659
  β6          -0.08216    0.03576    -2.298    0.021571
  β7           0.05462    0.06473     0.844    0.398721

β2 = dde slope (highly significant increasing trend)


Bayesian Analysis - Prior Elicitation

- Ideally, read the literature on preterm birth → β0 = best guess of β
- This should be possible (in particular) for the confounding coefficients
- Σ0 expresses uncertainty - place high probability in a plausible range
- Much better than flat priors, which can yield implausible estimates!
- As a default shrinkage-type prior, we use N(0, 4 × I7×7)


Gibbs Sampling

- We choose β^0 = 0 as the starting value
- MLEs or the asymptotic approximation to the posterior mean may provide a better default choice
- Results should not depend on the starting values, though for poor starting values you may need a longer burn-in
- For typical GLMs, such as probit models, convergence is rapid
- For illustration, we collected 1,000 iterations


Example - probit binary regression model

[figure]


Posterior Summaries

  Parameter   Mean    Median   SD     95% credible interval
  β1         -1.08    -1.08    0.04   (-1.16, -1.01)
  β2          0.17     0.17    0.03   ( 0.12,  0.23)
  β3         -0.13    -0.13    0.04   (-0.20, -0.05)
  β4          0.11     0.11    0.03   ( 0.05,  0.18)
  β5         -0.02    -0.02    0.03   (-0.08,  0.05)
  β6         -0.08    -0.08    0.04   (-0.15, -0.02)
  β7          0.05     0.06    0.06   (-0.07,  0.18)
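These summaries come directly from the retained draws; a minimal sketch (not from the slides) of how such a table is computed, where draws is a (number of samples × p) array such as the output of the earlier Gibbs sketch after dropping the burn-in:

import numpy as np

def posterior_summary(draws):
    mean = draws.mean(axis=0)
    median = np.median(draws, axis=0)
    sd = draws.std(axis=0, ddof=1)
    lo, hi = np.percentile(draws, [2.5, 97.5], axis=0)   # equal-tailed 95% credible interval
    return mean, median, sd, lo, hi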


Estimated Posterior Density

[figure]


Inferences on Functionals

- Often, it is not the regression parameter which is of primary interest.
- One may want to estimate functionals, such as the mean at different values of a predictor.
- By applying the function to every iteration of the MCMC algorithm after burn-in, one can obtain samples from the marginal posterior density of the unknown of interest.
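For the DDE example, the functional of interest is the dose-response curve Pr(y = 1 | dde) = Φ(x'β). A sketch (illustration only; the grid and reference covariate values are placeholders) of computing pointwise posterior summaries of that curve from the draws:

import numpy as np
from scipy.stats import norm

def dose_response(draws, dde_grid, other_covs):
    # build a design row (1, dde, other covariates at reference values) for each grid point
    Xg = np.column_stack([np.ones_like(dde_grid),
                          dde_grid,
                          np.tile(other_covs, (len(dde_grid), 1))])
    probs = norm.cdf(draws @ Xg.T)            # one dose-response curve per posterior draw
    mean_curve = probs.mean(axis=0)
    lower, upper = np.percentile(probs, [2.5, 97.5], axis=0)
    return mean_curve, lower, upper           # pointwise posterior mean & 95% credible band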


Estimated Dose Response Function

[figure]


Metropolis-Hastings Sampling

- Gibbs sampling requires sampling from the conditional posterior distributions
- Metropolis-Hastings is an alternative that avoids this restriction
- Again, start with an initial value θ^0 and sequentially update the parameters θ1, ..., θp


Metropolis-Hastings (continued)

- To draw θj^t:
  1. Sample a candidate θ̃j^t ~ qj(· | θj^{t-1})
  2. Let θj^t = θ̃j^t with probability

     min{ 1, [ π(θ̃j^t) L(y | θj = θ̃j^t, −) qj(θj^{t-1} | θ̃j^t) ] / [ π(θj^{t-1}) L(y | θj = θj^{t-1}, −) qj(θ̃j^t | θj^{t-1}) ] },

     where L(y | θj = θ̃j^t, −) is the likelihood given θj = θ̃j^t and the current values of the other parameters
  3. Otherwise let θj^t = θj^{t-1}.
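A sketch (not from the slides) of a single such update with a normal random-walk proposal; log_post stands in for log π(θj) + log L(y | θj, −) with the other parameters held at their current values:

import numpy as np

def mh_update(theta_prev, log_post, kappa, rng):
    proposal = rng.normal(theta_prev, kappa)             # candidate from q(. | theta_prev)
    # symmetric proposal, so the q terms cancel in the acceptance ratio
    log_accept = log_post(proposal) - log_post(theta_prev)
    if np.log(rng.uniform()) < log_accept:
        return proposal, True                            # accept the candidate
    return theta_prev, False                             # otherwise keep the previous value

Because this proposal is symmetric, the qj terms cancel in the ratio, which gives the Metropolis random walk discussed on the next slide; the scale κ has to be tuned, since values that are too small or too large both mix poorly.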


Comments on Metropolis-Hastings

- Performance is sensitive to the proposal distributions qj(· | θj^{t-1})
- The most common proposal is N(θj^{t-1}, κ), which is centered on the previous value
- This results in a Metropolis random walk
- Inefficient if κ is too small or too large


Adaptive Rejection Sampling (ARS)

- ARS (Gilks & Wild, 1992): an approach for implementing Gibbs sampling for log-concave conditional distributions
- Uses sequentially defined envelopes around the target density, leading to some additional computational expense
- Log concavity holds for most GLMs and typical priors
- When it is violated, adaptive rejection Metropolis sampling (ARMS) (Gilks et al., 1995) is used.


SAS Implementation

- BGENMOD, BLIFEREG, BPHREG all rely on ARS (when possible) or ARMS
- Hence, SAS uses Gibbs sampling for posterior computation
- Important to diagnose convergence & mixing whenever using MCMC!!


Some Terminology

I Convergence: initial drift in the samples towards a stationary distribution

I Burn-in: samples at the start of the chain that are discarded to allow convergence

I Slow mixing: tendency for high autocorrelation in the samples

I Thinning: practice of collecting every kth iteration to reduce autocorrelation

I Trace plot: plot of sampled values of a parameter vs iteration # (these operations are sketched in the code below)
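
A minimal sketch of how burn-in, thinning, and a trace plot are handled in practice; the chain here is a simulated AR(1) series standing in for real MCMC output, and the burn-in length B and thinning interval k are arbitrary choices.

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)

T = 12000
chain = np.empty(T)
chain[0] = 5.0                       # deliberately poor starting value
for t in range(1, T):                # AR(1) series standing in for sampler output
    chain[t] = 0.9 * chain[t - 1] + rng.normal(scale=0.5)

B = 2000                             # burn-in: discard the initial drift
k = 5                                # thinning interval
kept = chain[B:]                     # draws used for inference
thinned = kept[::k]                  # every kth draw, lower autocorrelation

plt.plot(chain)                      # trace plot: sampled value vs iteration #
plt.axvline(B, linestyle="--")       # end of burn-in
plt.xlabel("iteration"); plt.ylabel("sampled value")
plt.show()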


Example - trace plot with poor mixing


Poor mixing Gibbs sampler

I Exhibits “snaking” behavior in the trace plot with cyclic local trends in the mean

I Poor mixing in the Gibbs sampler is caused by high posterior correlation in the parameters

I Decreases efficiency & many more samples need to be collected to maintain low Monte Carlo error in posterior summaries

I For a very poorly mixing chain, may even need millions of iterations

I Routinely examine trace plots! (a toy slow-mixing Gibbs sampler is sketched below)
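
The toy Gibbs sampler below makes this concrete: it targets a bivariate normal with correlation 0.99 (an illustrative example, not a model from the lecture), so each full conditional can move only slightly given the other coordinate, and the lag-1 autocorrelation of the draws is close to 1.

import numpy as np

rng = np.random.default_rng(2)
rho = 0.99                           # high posterior correlation -> slow mixing
T = 5000
theta = np.zeros((T, 2))

for t in range(1, T):
    # Full conditionals of a standard bivariate normal with correlation rho:
    # theta1 | theta2 ~ N(rho * theta2, 1 - rho^2), and symmetrically for theta2
    theta[t, 0] = rng.normal(rho * theta[t - 1, 1], np.sqrt(1 - rho ** 2))
    theta[t, 1] = rng.normal(rho * theta[t, 0], np.sqrt(1 - rho ** 2))

x = theta[:, 0]
lag1 = np.corrcoef(x[:-1], x[1:])[0, 1]
print(f"lag-1 autocorrelation: {lag1:.3f}")   # close to 1 => snaking trace plot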


Example - trace plot with good mixing


Convergence diagnostics

I Diagnostics available to help decide on the number of burn-in & collected samples

I Note: there are no definitive tests of convergence & you should check convergence of all parameters

I With experience, visual inspection of trace plots is perhaps the most useful approach

I There are a number of useful automated tests


Convergence diagnostics in SAS

I Gelman-Rubin: uses parallel chains with dispersed initial values to test convergence (a rough version is sketched after this list)

I Geweke: applies a test of stationarity to a single chain

I Heidelberger-Welch (stationarity): alternative to Geweke

I Heidelberger-Welch (halfwidth): # samples adequate for estimation of the posterior mean?

I Raftery-Lewis: # samples needed for desired accuracy in estimating percentiles

I Autocorrelation: high values indicate slow mixing

I Effective sample size: a low value relative to the actual # indicates slow mixing
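
A hand-rolled version of the Gelman-Rubin potential scale reduction factor, just to show what the diagnostic computes; chains is assumed to be an (m, n) array holding m parallel chains (post burn-in) for a single parameter, and SAS's implementation may differ in detail.

import numpy as np

def gelman_rubin(chains):
    # chains: (m, n) array of m parallel chains of length n for one parameter
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()    # within-chain variance
    B = n * chain_means.var(ddof=1)          # between-chain variance
    var_hat = (n - 1) / n * W + B / n        # pooled variance estimate
    return np.sqrt(var_hat / W)              # values near 1 suggest convergence

rng = np.random.default_rng(3)
chains = rng.normal(size=(3, 4000))          # three chains from the same target
print(gelman_rubin(chains))                  # should be close to 1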


Practical advice on convergence diagnosis

I The Gelman-Rubin approach is quite appealing in its use of multiple chains

I Geweke & Heidelberger-Welch sometimes reject even when the trace plots look good

I They can be overly sensitive to minor departures from stationarity that do not impact inferences

I Sometimes this can be solved with more iterations

I Otherwise, you may want to try multiple chains

I For the models considered in SAS, chains tend to be very well behaved when the MLE exists or priors are informative


How to summarize results from the MCMC chain?

I Posterior mean: θ̂ = (1/(T − B)) Σ_{t=B+1}^{T} θ^(t), with B = # burn-in samples, T = total # samples (see the sketch after this list)

I Posterior mean is the most commonly used point estimate and provides an alternative to the MLE (note - the posterior mode is difficult to estimate accurately from MCMC)

I Posterior median (50th percentile of {θ^(t)}_{t=B+1}^{T}) provides an alternative point estimate

I Posterior standard deviation calculated as the square root of

  v̂ar(θ_j | y) = (1/(T − B − 1)) Σ_{t=B+1}^{T} (θ_j^(t) − θ̂_j)²

I As n increases, we obtain π(θ_j | y) ≈ N(θ_j; θ̂_j, v̂ar(θ_j | y))
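
Computing these summaries from saved draws is straightforward; a sketch assuming theta holds all T draws for one parameter (simulated here as a stand-in, since the real chain isn't reproduced):

import numpy as np

rng = np.random.default_rng(4)
theta = rng.normal(2.39, 0.57, size=12000)   # stand-in for T saved draws of one parameter
B = 2000                                     # burn-in draws to discard
kept = theta[B:]

post_mean   = kept.mean()                    # posterior mean
post_median = np.median(kept)                # posterior median
post_sd     = kept.std(ddof=1)               # posterior standard deviation
print(post_mean, post_median, post_sd)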


Interval estimates

I As a Bayesian alternative to the confidence interval, one can use a credible interval

I The 100(1 − α)% credible interval ranges from the α/2 to the 1 − α/2 percentiles of {θ^(t)}_{t=B+1}^{T}

I A highest posterior density (HPD) interval can also be calculated - the smallest interval containing the true parameter with 100(1 − α)% posterior probability (both intervals are sketched in code below)
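
Both intervals can be read directly off the retained draws; the sketch below computes an equal-tail 95% interval and a crude HPD interval as the shortest window containing 95% of the sorted draws (the draws themselves are simulated stand-ins).

import numpy as np

rng = np.random.default_rng(5)
kept = rng.gamma(shape=4.0, scale=0.5, size=10000)   # stand-in posterior draws
alpha = 0.05

# Equal-tail credible interval: alpha/2 and 1 - alpha/2 percentiles
ci = np.quantile(kept, [alpha / 2, 1 - alpha / 2])

# HPD interval: shortest window containing a fraction (1 - alpha) of the sorted draws
sorted_draws = np.sort(kept)
n_in = int(np.floor((1 - alpha) * len(sorted_draws)))
widths = sorted_draws[n_in:] - sorted_draws[:-n_in]
j = int(np.argmin(widths))
hpd = (sorted_draws[j], sorted_draws[j + n_in])

print("95% CI :", ci)
print("95% HPD:", hpd)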


Posterior probabilities

I Often interest focuses on the weight of evidence for H1 : θ_j > 0

I One can use the estimated posterior probability

  P̂r(θ_j > 0 | data) = (1/(T − B)) Σ_{t=B+1}^{T} 1(θ_j^(t) > 0),

  with 1(θ_j^(t) > 0) = 1 if θ_j^(t) > 0 and 0 otherwise (see the one-line sketch below)

I A high value (e.g., greater than 0.95) suggests strong evidence in favor of H1
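
With MCMC output this is just the fraction of retained draws that are positive; a minimal sketch with stand-in draws:

import numpy as np

kept = np.random.default_rng(6).normal(0.8, 0.5, size=10000)  # stand-in post burn-in draws of theta_j
prob_positive = (kept > 0).mean()   # Monte Carlo estimate of Pr(theta_j > 0 | data)
print(prob_positive)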


Marginal posterior density estimation

I Summary statistics such as the mean, median, standard deviation, etc. provide an incomplete picture

I Since we have many samples from the posterior, we can accurately estimate the exact posterior density

I This can be done using a kernel-smoothed density estimation procedure applied to the samples {θ_j^(t)}_{t=B+1}^{T} (e.g., as sketched below)
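
For instance, using SciPy's Gaussian kernel density estimator (bandwidth left at its default; the draws are simulated stand-ins):

import numpy as np
from scipy.stats import gaussian_kde
import matplotlib.pyplot as plt

kept = np.random.default_rng(7).normal(3.1, 0.27, size=10000)  # stand-in posterior draws
kde = gaussian_kde(kept)                          # kernel-smoothed density estimate
grid = np.linspace(kept.min(), kept.max(), 400)
plt.plot(grid, kde(grid))
plt.xlabel("theta_j"); plt.ylabel("estimated posterior density")
plt.show()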


Illustration - linear regression

I Lewis & Taylor (1967) - study of weight (yi) in 237 students

I The model is as follows:

  yi = β0 + β1 x1i + β2 x2i + β3 x3i + εi,   i = 1, . . . , 237,

  x1i = height in feet − 5 feet
  x2i = age in years − 16
  x3i = 1 for males, 0 for females

I Implemented in SAS Proc BGENMOD - 2,000 burn-in & 10,000 collected samples (a generic Gibbs sampler for this type of model is sketched below)
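
To make the Gibbs-sampling machinery concrete, here is a minimal sampler for a normal linear model of this form with vague conjugate priors (β ~ N(0, 10⁶ I), τ ~ Gamma(0.01, 0.01)); it runs on simulated stand-in data and is not the prior specification, algorithm, or dataset used by Proc BGENMOD.

import numpy as np

rng = np.random.default_rng(8)

# Simulated stand-in data (the Lewis & Taylor measurements are not reproduced here)
n, p = 237, 4
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n),
                     rng.integers(0, 2, size=n)])
beta_true = np.array([96.0, 3.1, 2.4, -0.3])
y = X @ beta_true + rng.normal(scale=12.0, size=n)

# Vague conjugate priors: beta ~ N(0, (1/prior_prec) I), tau ~ Gamma(a0, b0) (rate form)
prior_prec, a0, b0 = 1e-6, 0.01, 0.01

T, B = 12000, 2000
beta, tau = np.zeros(p), 1.0
draws = np.zeros((T, p + 1))
XtX, Xty = X.T @ X, X.T @ y

for t in range(T):
    # beta | tau, y ~ N(m, V), with V = (prior_prec*I + tau*X'X)^(-1), m = V (tau*X'y)
    V = np.linalg.inv(prior_prec * np.eye(p) + tau * XtX)
    m = V @ (tau * Xty)
    beta = rng.multivariate_normal(m, V)
    # tau | beta, y ~ Gamma(a0 + n/2, rate = b0 + 0.5*||y - X beta||^2)
    resid = y - X @ beta
    tau = rng.gamma(a0 + n / 2, 1.0 / (b0 + 0.5 * resid @ resid))   # numpy takes a scale
    draws[t] = np.append(beta, tau)

kept = draws[B:]
print("posterior means:", kept.mean(axis=0))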


Output and diagnostics - intercept (β0)


Output and diagnostics - height (β1)


Output and diagnostics - age (β2)


Output and diagnostics - male (β3)


Mixing - Autocorrelation in MCMC samples

Parameter    Lag 1     Lag 5     Lag 10    Lag 50
Intercept    0.5489    0.0114   -0.0107    0.0009
height       0.5166   -0.0124    0.0112    0.0042
age          0.4634   -0.0068   -0.0038    0.0032
male         0.5613    0.0294   -0.0170    0.0017
Precision   -0.0039   -0.0088   -0.0042    0.0018

Conclusion: Very good mixing (a small helper for computing such lag autocorrelations is sketched below)
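
Each entry is just the sample correlation between the chain and a lagged copy of itself; a small helper (independent of SAS's exact estimator), with an AR(1) series as a stand-in chain:

import numpy as np

def lag_autocorr(x, k):
    # Sample autocorrelation at lag k for a 1-D array of MCMC draws
    x = np.asarray(x, dtype=float)
    return np.corrcoef(x[:-k], x[k:])[0, 1]

rng = np.random.default_rng(9)
chain = np.empty(5000)
chain[0] = 0.0
for t in range(1, 5000):                     # AR(1) series as a stand-in chain
    chain[t] = 0.5 * chain[t - 1] + rng.normal()

for k in (1, 5, 10, 50):
    print(k, round(lag_autocorr(chain, k), 4))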


Tests of convergence

             Gelman-Rubin             Geweke
Parameter    Estimate    97.5%        z          Pr > |z|
Intercept    1.0000      1.0002       0.5871     0.5572
height       1.0004      1.0013       1.7153     0.0863
age          1.0003      1.0012      -1.3831     0.1666
male         1.0001      1.0005      -1.2658     0.2056
Precision    1.0003      1.0010       2.4947     0.0126

Gelman-Rubin: values ≈ 1 suggest convergence
Geweke: convergence suggested except for precision
Heidelberger-Welch: all parameters passed


Output and diagnostics - precision (τ)


Number of samples sufficient?

I Raftery-Lewis: 3746 samples needed for +/- 0.005 accuracy in estimating the 0.025 quantile (so 10,000 is a sufficient number)

I Heidelberger-Welch: 10,000 samples sufficient for accurate mean estimation - except for the male coefficient

I Effective sample size: ranged between 3033.5 and 3740.2 for the regression coefficients

I 10,000 Gibbs samples contain as much information as 3033.5-3740.2 independent draws (a rough effective sample size calculation is sketched below)
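
A rough version of the effective sample size calculation, ESS = T / (1 + 2 Σ_k ρ_k), truncating the sum at the first non-positive autocorrelation; packaged implementations use more careful truncation rules.

import numpy as np

def effective_sample_size(x, max_lag=1000):
    # Crude ESS: T / (1 + 2 * sum of autocorrelations up to the first non-positive lag)
    x = np.asarray(x, dtype=float)
    T = len(x)
    acf_sum = 0.0
    for k in range(1, min(max_lag, T - 1)):
        rho = np.corrcoef(x[:-k], x[k:])[0, 1]
        if rho <= 0:                         # truncate once autocorrelation dies out
            break
        acf_sum += rho
    return T / (1 + 2 * acf_sum)

rng = np.random.default_rng(11)
chain = np.empty(10000)
chain[0] = 0.0
for t in range(1, 10000):                    # AR(1) chain with lag-1 correlation about 0.55
    chain[t] = 0.55 * chain[t - 1] + rng.normal()
print(effective_sample_size(chain))          # noticeably smaller than 10,000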


Posterior summaries

10,000 Samples

Parameter    Mean      SD         95% CI                95% HPD
Intercept    96.155    1.138      [93.906, 98.352]      [93.866, 98.294]
height        3.103    0.272      [2.576, 3.642]        [2.550, 3.611]
age           2.390    0.566      [1.272, 3.492]        [1.282, 3.498]
male         -0.280    1.601      [-3.3601, 2.948]      [-3.344, 2.961]
precision     0.0071   0.00066    [0.0058, 0.0084]      [0.0058, 0.0084]

50,000 Samples

Parameter    Mean      SD         95% CI                95% HPD
Intercept    96.207    1.145      [93.968, 98.457]      [93.997, 98.482]
height        3.107    0.267      [2.581, 3.627]        [2.574, 3.619]
age           2.375    0.562      [1.265, 3.467]        [1.268, 3.470]
male         -0.353    1.605      [-3.495, 2.825]       [-3.451, 2.863]
precision     0.0071   0.00065    [0.0059, 0.0084]      [0.0058, 0.0084]


Convergence Diagnostics (50,000 samples)

I Gelman-Rubin: maximum 97.5% bound of 1.0001

I Geweke: minimum p-value of 0.4716

I Heidelberger-Welch: passed for all parameters

I Conclusion: for the longer chain, no evidence of lack of convergence


Discussion

I Overall picture suggests convergence, good mixing, and a sufficient number of collected samples

I Don’t take rejection of one convergence test too seriously if the trace plot looks good

I Rejection motivates collecting additional samples to make sure inferences do not change
