
Chapter 4

More than one parameter

Aims:

◃ Moving towards practical applications

◃ Illustrating that computations become quickly involved

◃ Illustrating that frequentist results can be obtained with Bayesian procedures

◃ Illustrating a multivariate (independent) sampling algorithm

Bayesian Biostatistics - Piracicaba 2014 196


4.1 Introduction

• Most statistical models involve more than one parameter to estimate

• Examples:

◃ Normal distribution: mean µ and variance σ2

◃ Linear regression: regression coefficients β0, β1, . . . , βd and residual variance σ2

◃ Logistic regression: regression coefficients β0, β1, . . . , βd

◃ Multinomial distribution: class probabilities θ1, θ2, . . . , θd with θ1 + · · · + θd = 1

• This requires a prior for all parameters together: it expresses our beliefs about the model parameters

• Aim: derive posterior for all parameters and their summary measures


• It turns out that in most cases analytical solutions for the posterior are no longer possible

• In Chapter 6, we will see that Markov chain Monte Carlo (MCMC) methods are needed for this

• Here, we look at a simple multivariate sampling approach: Method of Composition


4.2 Joint versus marginal posterior inference

• Bayes theorem:

p(θ | y) = L(θ | y) p(θ) / ∫ L(θ | y) p(θ) dθ

◦ Hence, the same expression as before but now θ = (θ1, θ2, . . . , θd)T

◦ Now, the prior p(θ) is multivariate. But often a prior is given for each parameter separately

◦ The posterior p(θ | y) is also multivariate. But we usually look only at the (marginal) posteriors p(θj | y) (j = 1, . . . , d)

• We also need, for each parameter: posterior mean, median (and sometimes mode), and credible intervals


• Illustration on the normal distribution with µ and σ2 unknown

• Application: determining 95% normal range of alp (continuation of Example III.6)

• We look at three cases (priors):

◃ No prior knowledge is available

◃ Previous study is available

◃ Expert knowledge is available

• But first, a brief theoretical introduction


4.3 The normal distribution with µ and σ2 unknown

Acknowledging that µ and σ2 are unknown

• Sample y1, . . . , yn of independent observations from N(µ, σ2)

• Joint likelihood of (µ, σ2) given y:

L(µ, σ2 | y) = (2πσ2)^(−n/2) exp[−(1/(2σ2)) ∑i (yi − µ)2]

• The posterior is again the product of the likelihood with the prior, divided by the denominator, which involves an integral

• In this case analytical calculations are possible in 2 of the 3 cases


4.3.1 No prior knowledge on µ and σ2 is available

• Noninformative joint prior p(µ, σ2) ∝ σ−2 (µ and σ2 a priori independent)

• Posterior distribution:

p(µ, σ2 | y) ∝ (1/σ^(n+2)) exp{−(1/(2σ2)) [(n − 1)s2 + n(ȳ − µ)2]}

[Figure: contour plot of the joint posterior p(µ, σ2 | y), with µ on the horizontal axis (6.8–7.4) and σ2 on the vertical axis (1.2–2.4)]


Justification of the prior distribution

• Most often prior information on several parameters arrives to us for each of the parameters separately and independently ⇒ p(µ, σ2) = p(µ) × p(σ2)

• And we have no prior information on µ nor on σ2 ⇒ choice of prior distributions:

[Figure: the flat prior density for µ (left) and the flat prior density for log(σ) (right)]

• The chosen priors are called flat priors


• Motivation:

◦ If one is totally ignorant of a location parameter, then it could take any value on the real line with equal prior probability.

◦ If totally ignorant about the scale of a parameter, then it is as likely to lie in the interval 1–10 as in the interval 10–100. This implies a flat prior on the log scale.

• The flat prior p(log(σ)) = c is equivalent to the prior p(σ2) ∝ σ−2


Marginal posterior distributions

Marginal posterior distributions are needed in practice

◃ p(µ | y)

◃ p(σ2 | y)

• Calculation of a marginal posterior distribution involves integration:

p(µ | y) = ∫ p(µ, σ2 | y) dσ2 = ∫ p(µ | σ2, y) p(σ2 | y) dσ2

• The marginal posterior is a weighted sum of conditional posteriors, with weights = the uncertainty on the other parameter(s)


Conditional & marginal posterior distributions for the normal case

• Conditional posterior for µ: p(µ | σ2, y) = N(ȳ, σ2/n)

• Marginal posterior for µ: p(µ | y) = tn−1(ȳ, s2/n)

⇒ (µ − ȳ)/(s/√n) ∼ tn−1 (µ is the random variable)

• Marginal posterior for σ2: p(σ2 | y) ≡ Inv-χ2(n − 1, s2) (scaled inverse chi-squared distribution)

⇒ (n − 1)s2/σ2 ∼ χ2(n − 1) (σ2 is the random variable)

= special case of IG(α, β) with α = (n − 1)/2, β = (n − 1)s2/2
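The correspondence between the scaled inverse chi-squared and the inverse-gamma distribution can be checked by simulation. A minimal sketch (illustrative ν and s2, not tied to the alp data), using the parameterization IG(ν/2, ν s2/2) for Inv-χ2(ν, s2) and scipy's invgamma:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
nu, s2 = 9, 1.5   # illustrative degrees of freedom and scale

# Scaled-Inv-chi2(nu, s2) draws via sigma2 = nu * s2 / X with X ~ chi2(nu)
draws = nu * s2 / rng.chisquare(nu, size=100_000)

# Matching inverse-gamma: IG(alpha = nu/2, beta = nu*s2/2)
ig = stats.invgamma(a=nu / 2, scale=nu * s2 / 2)

# Both have mean nu*s2/(nu - 2)
print(draws.mean(), ig.mean())
```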


Some t-densities

[Figure: t-densities for several degrees of freedom, with heavier tails for smaller degrees of freedom]


Some inverse-gamma densities

[Figure: inverse-gamma densities IG(α, β) for several choices of α and β]


Joint posterior distribution

• Joint posterior = product of the conditional with the marginal posterior:

p(µ, σ2 | y) = p(µ | σ2, y) p(σ2 | y) = N(ȳ, σ2/n) × Inv-χ2(n − 1, s2)

• Normal-scaled-inverse chi-squared distribution = N-Inv-χ2(ȳ, n, (n − 1), s2)

[Figure: contour plot of the joint N-Inv-χ2 posterior, with µ on the horizontal axis (6.8–7.4) and σ2 on the vertical axis (1.2–2.4)]

⇒ A posteriori µ and σ2 are dependent


Posterior summary measures and PPD

For µ:

◃ Posterior mean = mode = median = ȳ

◃ Posterior variance = s2(n − 1)/(n(n − 3))

◃ 95% equal tail credible and HPD interval:

[ȳ − t(0.025; n − 1) s/√n, ȳ + t(0.025; n − 1) s/√n]

For σ2:

◦ Posterior mean, mode, median, variance and 95% equal tail CI are all analytically available

◦ The 95% HPD interval is computed iteratively

PPD:

◦ tn−1[ȳ, s2(1 + 1/n)]-distribution


Implications of previous results

Frequentist versus Bayesian inference:

◃ Numerical results are the same

◃ Inference is based on different principles


Example IV.1: SAP study – Noninformative prior

◃ Example III.6: normal range for alp is too narrow

◃ Joint posterior distribution = N-Inv-χ2 (NI prior + likelihood, see before)

◃ Marginal posterior distributions (red curves) for y = 100/√alp

[Figure: marginal posterior of µ (left) and of σ2 (right); red curves = marginal posteriors under the noninformative prior]


Normal range for alp:

• PPD for a future y: the t249(7.11, 1.37)-distribution

• 95% normal range for alp = [104.1, 513.2], slightly wider than before


4.3.2 An historical study is available

• Posterior of historical data can be used as prior to the likelihood of current data

• Prior = N-Inv-χ2(µ0, κ0, ν0, σ02)-distribution (from the historical data)

• Posterior = N-Inv-χ2(µ̄, κ̄, ν̄, σ̄2)-distribution (combining the data with the N-Inv-χ2 prior)

◃ N-Inv-χ2 is conjugate prior

◃ Again shrinkage of posterior mean towards prior mean

◃ Posterior variance = weighted average of the prior variance, the sample variance and the distance between the prior and sample mean

⇒ posterior variance is not necessarily smaller than prior variance!

• Similar results for posterior measures and PPD as in first case


Example IV.2: SAP study – Conjugate prior

• Prior based on retrospective study (Topal et al., 2003) of 65 ‘healthy’ subjects:

◦ Mean (SD) for y = 100/√alp = 5.25 (1.66)

◦ Conjugate prior = N-Inv-χ2(5.25, 65, 64, 2.76)

◦ Note: mean (SD) of the prospective data: 7.11 (1.4), quite different

◦ Posterior = N-Inv-χ2(6.72, 315, 314, 2.61)

◦ Posterior mean lies between the prior mean & the sample mean, but:

◦ Posterior precision = prior + sample precision

◦ Posterior variance < prior variance and > sample variance

◦ Posterior variance with the informative prior > variance with the NI prior

◦ Prior information did not lower the posterior uncertainty; reason: conflict of the likelihood with the prior


Marginal posteriors:

[Figure: marginal posterior of µ (left) and of σ2 (right)]

Red curves = marginal posteriors from informative prior (historical data)


Histograms of the retro- and prospective data:

[Figure: histograms of y = 100/√alp for the prospective data (likelihood, top) and the retrospective data (informative prior, bottom)]


4.3.3 Expert knowledge is available

• Expert knowledge available on each parameter separately

⇒ Joint prior N(µ0, σ02) × Inv-χ2(ν0, τ02) is not conjugate

• The posterior cannot be derived analytically, but numerical/sampling techniques are available


What now?

Computational problem:

◃ ‘Simplest problem’ in classical statistics is already complicated

◃ Ad hoc solution is still possible, but not satisfactory

◃ There is the need for another approach


4.4 Multivariate distributions

Distributions with a multivariate response:

◃ Multivariate normal distribution: generalization of normal distribution

◃ Multivariate Student’s t-distribution: generalization of location-scale t-distribution

◃ Multinomial distribution: generalization of binomial distribution

Multivariate prior distributions:

◃ N-Inv-χ2-distribution: prior for N(µ, σ2)

◃ Dirichlet distribution: generalization of beta distribution

◃ (Inverse-)Wishart distribution: generalization of the (inverse-)gamma prior, for covariance matrices (see mixed model chapter)


Example IV.3: Young adult study – Smoking and alcohol drinking

• Study examining life style among young adults

                       Smoking
Alcohol              No     Yes
No-Mild             180      41
Moderate-Heavy      216      64
Total               396     105

• Of interest: the association between smoking and alcohol consumption


Likelihood part:

2×2 contingency table = multinomial model Mult(n,θ)

• θ = {θ11, θ12, θ21, θ22 = 1 − θ11 − θ12 − θ21}, with ∑i,j θij = 1

• y = {y11, y12, y21, y22} and n = ∑i,j yij

Mult(n, θ) = [n! / (y11! y12! y21! y22!)] θ11^y11 θ12^y12 θ21^y21 θ22^y22


Dirichlet prior:

Conjugate prior to multinomial distribution = Dirichlet prior Dir(α)

θ ∼ (1/B(α)) ∏i,j θij^(αij − 1)

◦ α = {α11, α12, α21, α22}

◦ B(α) = ∏i,j Γ(αij) / Γ(∑i,j αij)

⇒ Posterior distribution = Dir(α + y)

• Note:

◦ Dirichlet distribution = extension of beta distribution to higher dimensions

◦ Marginal distributions of a Dirichlet distribution = beta distribution


Measuring association:

• Association between smoking and alcohol consumption:

ψ = (θ11 θ22)/(θ12 θ21)

• Needed: p(ψ | y), but difficult to derive

• Alternatively replace analytical calculations by sampling procedure


Analysis of contingency table:

• Prior distribution: Dir(1, 1, 1, 1)

• Posterior distribution: Dir(180 + 1, 41 + 1, 216 + 1, 64 + 1)

• Sample of 10,000 generated values for the θ parameters

• 95% equal tail CI for ψ: [0.839, 2.014]

• Practically equal to the classically obtained estimate
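The sampling procedure above takes only a few lines. A sketch with numpy (the seed is arbitrary, so the interval only approximates the one on the slide):

```python
import numpy as np

rng = np.random.default_rng(2014)

# Dir(1,1,1,1) prior + counts (y11, y12, y21, y22) -> Dir(alpha + y) posterior
y = np.array([180, 41, 216, 64])
theta = rng.dirichlet(y + 1, size=10_000)      # 10,000 posterior draws of theta

# Cross-ratio psi = (theta11 * theta22) / (theta12 * theta21) per draw
psi = theta[:, 0] * theta[:, 3] / (theta[:, 1] * theta[:, 2])

print(np.percentile(psi, [2.5, 97.5]))         # close to [0.839, 2.014]
```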


Posterior distributions:

[Figure: sampled posterior distributions of θ11, θ12, θ21 and ψ]


4.5 Frequentist properties of Bayesian inference

• Not of prime interest for a Bayesian to know the sampling properties of estimators

• However, it is important that the Bayesian approach most often gives the right answer

• What is known?

◃ Theory: the posterior is approximately normal for large samples (Bayesian CLT)

◃ Simulations: the Bayesian approach may offer alternative interval estimators with better coverage than classical frequentist approaches


4.6 The Method of Composition

A method to yield a random sample from a multivariate distribution

• Stagewise approach

• Based on factorization of joint distribution into a marginal & several conditionals

p(θ1, . . . , θd | y) = p(θd | y) p(θd−1 | θd, y) · · · p(θ1 | θ2, . . . , θd, y)

• Sampling approach:

◃ Sample θd from p(θd | y)

◃ Sample θd−1 from p(θd−1 | θd, y)

◃ . . .

◃ Sample θ1 from p(θ1 | θ2, . . . , θd, y)


Sampling from posterior when y ∼ N(µ, σ2), both parameters unknown

• First sample σ2, then, given the sampled value σ̃2, sample µ from p(µ | σ̃2, y)

• Output for case 1 (no prior knowledge on µ and σ2) on the next page
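For case 1 the two sampling steps are easy to write down. A sketch with illustrative data summaries (not the actual alp data):

```python
import numpy as np

rng = np.random.default_rng(0)

def method_of_composition(ybar, s2, n, size):
    """Draws from p(mu, sigma2 | y) under the NI prior p(mu, sigma2) ∝ 1/sigma2."""
    # Step 1: sigma2 | y ~ scaled-Inv-chi2(n-1, s2), i.e. (n-1)*s2 / chi2(n-1)
    sigma2 = (n - 1) * s2 / rng.chisquare(df=n - 1, size=size)
    # Step 2: mu | sigma2, y ~ N(ybar, sigma2 / n)
    mu = rng.normal(ybar, np.sqrt(sigma2 / n))
    return mu, sigma2

mu, sigma2 = method_of_composition(ybar=7.11, s2=1.37, n=250, size=10_000)
print(mu.mean(), sigma2.mean())   # both close to their analytical posterior means
```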


Sampled posterior distributions:

[Figure: sampled posterior distributions: (a) σ2, (b) µ, (c) joint (µ, σ2), (d) PPD of a future observation ỹ]


4.7 Bayesian linear regression models

• Example of a classical multiple linear regression analysis

• Non-informative Bayesian multiple linear regression analysis:

◃ Noninformative prior for all parameters + classical linear regression likelihood

◃ Analytical results are available + method of composition can be applied


4.7.1 The frequentist approach to linear regression

Classical regression model: y =Xβ + ε

. y = n × 1 vector of independent responses

. X = n × (d + 1) design matrix

. β = (d + 1) × 1 vector of regression parameters

. ε = n × 1 vector of random errors ∼ N(0, σ2 I)

Likelihood:

L(β, σ2 | y, X) = (2πσ2)^(−n/2) exp[−(1/(2σ2)) (y − Xβ)T(y − Xβ)]

. MLE = LSE of β: β̂ = (XTX)−1XTy

. Residual sum of squares: S = (y − Xβ̂)T(y − Xβ̂)

. Mean residual sum of squares: s2 = S/(n − d − 1)


Example IV.7: Osteoporosis study: a frequentist linear regression analysis

◃ Cross-sectional study (Boonen et al., 1996)

◃ 245 healthy elderly women in a geriatric hospital

◃ Aim: Find determinants for osteoporosis

◃ Average age women = 75 yrs with a range of 70-90 yrs

◃ Marker for osteoporosis = tbbmc (in kg) measured for 234 women

◃ Simple linear regression model: regressing tbbmc on bmi

◃ Classical frequentist regression analysis:

◦ β̂0 = 0.813 (0.12)

◦ β̂1 = 0.0404 (0.0043)

◦ s2 = 0.29, with n − d − 1 = 232

◦ corr(β̂0, β̂1) = −0.99


Scatterplot + fitted regression line:

[Figure: scatterplot of tbbmc (kg) versus bmi (kg/m2) with the fitted regression line]


4.7.2 A noninformative Bayesian linear regression model

Bayesian linear regression model = prior information on regression parameters & residual variance + normal regression likelihood

• Noninformative prior for (β, σ2): p(β, σ2) ∝ σ−2

• Notation: omit design matrix X

• Posterior distributions:

p(β, σ2 | y) = N(d+1)[β | β̂, σ2(XTX)−1] × Inv-χ2(σ2 | n − d − 1, s2)

p(β | σ2, y) = N(d+1)[β | β̂, σ2(XTX)−1]

p(σ2 | y) = Inv-χ2(σ2 | n − d − 1, s2)

p(β | y) = Tn−d−1[β | β̂, s2(XTX)−1]

Page 41: Chapter 4 More than one parameter - USP€¦ · Chapter 4 More than one parameter Aims: Moving towards practical applications Illustrating that computations become quickly involved

4.7.3 Posterior summary measures for the linear regression model

• Posterior summary measures of

(a) regression parameters β

(b) parameter of residual variability σ2

• Univariate posterior summary measures

◃ The marginal posterior mean (mode, median) of βj = MLE (LSE) β̂j

◃ 95% HPD interval for βj

◃ Marginal posterior mode and mean of σ2

◃ 95% HPD-interval for σ2


Multivariate posterior summary measures

Multivariate posterior summary measures for β

• Posterior mean (mode) of β = β (MLE=LSE)

• 100(1-α)%-HPD region

• Contour probability for H0 : β = β0


Posterior predictive distribution

• PPD of a future y at covariate value x: a t-distribution

• How to sample?

◃ Directly from t-distribution

◃ Method of Composition


4.7.4 Sampling from the posterior distribution

• Most posteriors can be sampled via standard sampling algorithms

• What about p(β | y) = multivariate t-distribution? How to sample from this distribution? (R function rmvt in package mvtnorm)

• Easy with Method of Composition: Sample in two steps

◃ Sample from p(σ2 | y): scaled inverse chi-squared distribution ⇒ σ2

◃ Sample from p(β | σ2,y) = multivariate normal distribution


Example IV.8: Osteoporosis study – Sampling with Method of Composition

• Sample σ2 from p(σ2 | y) = Inv− χ2(σ2 | n− d− 1, s2)

• Sample β from p(β | σ̃2, y) = N(d+1)[β | β̂, σ̃2(XTX)−1]

• Sampled mean regression vector = (0.816, 0.0403)

• 95% equal tail CIs = β0: [0.594, 1.040] & β1: [0.0317, 0.0486]

• Contour probability for H0 : β = 0 is < 0.001

• Marginal posterior of (β0, β1) has a ridge (r(β0, β1) = −0.99)
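The two sampling steps of this example can be sketched end to end. The data below are a simulated stand-in for the osteoporosis data (the real tbbmc/bmi values are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated stand-in for the osteoporosis data: tbbmc regressed on bmi
n = 234
bmi = rng.uniform(20, 40, n)
tbbmc = 0.81 + 0.04 * bmi + rng.normal(0, 0.3, n)

X = np.column_stack([np.ones(n), bmi])
d = X.shape[1] - 1
beta_hat = np.linalg.solve(X.T @ X, X.T @ tbbmc)     # LSE = MLE
resid = tbbmc - X @ beta_hat
s2 = resid @ resid / (n - d - 1)
XtX_inv = np.linalg.inv(X.T @ X)

# Method of Composition: sigma2 from p(sigma2 | y), then beta | sigma2, y
size = 5_000
sigma2 = (n - d - 1) * s2 / rng.chisquare(n - d - 1, size=size)
beta = np.array([rng.multivariate_normal(beta_hat, v * XtX_inv) for v in sigma2])

print(beta.mean(axis=0))                              # close to beta_hat
print(np.percentile(beta[:, 1], [2.5, 97.5]))         # equal tail CI for the slope
```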


PPD:

• Distribution of a future observation at bmi=30

• Sample a future observation ỹ from N(µ30, σ30^2):

◃ µ30 = βT(1, 30)T

◃ σ30^2 = σ2[1 + (1, 30)(XTX)−1(1, 30)T]

• Sampled mean and standard deviation = 2.033 and 0.282


Posterior distributions:

[Figure: sampled posterior distributions: (a) β0, (b) β1, (c) joint (β0, β1), (d) PPD of ỹ at bmi = 30]


4.8 Bayesian generalized linear models

Generalized Linear Model (GLIM): extension of the linear regression model to a wide class of regression models

• Examples:

◦ Normal linear regression model: normal distribution for a continuous response, with σ2 assumed known

◦ Poisson regression model: Poisson distribution for a count response, with log(mean) = linear function of covariates

◦ Logistic regression model: Bernoulli distribution for a binary response, with logit of the probability = linear function of covariates


4.8.1 More complex regression models

• Considered multiparameter models are limited

◃ Weibull distribution for alp?

◃ Censored/truncated data?

◃ Cox regression?

• Postpone to MCMC techniques


Take home messages

• Any practical application involves more than one parameter, hence Bayesian inference is immediately multivariate, even with univariate data.

• A multivariate prior is needed and a multivariate posterior is obtained, but the marginal posterior is the basis for practical inference

• Nuisance parameters:

◃ Bayesian inference: average out nuisance parameter

◃ Classical inference: profile out (maximize out) the nuisance parameter

• Multivariate independent sampling can be done, if marginals can be computed

• Frequentist properties of Bayesian estimators (with NI priors) often good


Chapter 5

Choosing the prior distribution

Aims:

◃ Review the different principles that lead to a prior distribution

◃ Critically review the impact of the subjectivity of prior information


5.1 Introduction

Incorporating prior knowledge

◃ Unique feature for Bayesian approach

◃ But might introduce subjectivity

◃ Useful in clinical trials to reduce sample size

In this chapter we review different kinds of priors:

◃ Conjugate

◃ Noninformative

◃ Informative


5.2 The sequential use of Bayes theorem

• Posterior of the kth experiment = prior for the (k + 1)th experiment (sequential surgeries)

• In this way, the Bayesian approach can mimic our human learning process

• Meaning of ‘prior’ in prior distribution:

◦ Prior: prior knowledge should be specified independent of the collected data

◦ In RCTs: fix the prior distribution in advance
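A toy illustration of the sequential updating, with a beta prior for a binomial success probability and made-up batches of data:

```python
# Sequential use of Bayes theorem: the posterior of batch k is the prior for batch k+1.
# Beta-binomial illustration with made-up batches of (successes, trials).
a, b = 1.0, 1.0                        # Beta(1, 1) initial prior
batches = [(7, 10), (12, 20), (3, 5)]  # three hypothetical experiments

for s, n in batches:
    a, b = a + s, b + n - s            # conjugate Beta update after each batch

# Sequential updating gives the same posterior as one update on the pooled data
total_s = sum(s for s, _ in batches)
total_n = sum(n for _, n in batches)
assert (a, b) == (1 + total_s, 1 + total_n - total_s)
print(a, b)   # 23.0 14.0
```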


5.3 Conjugate prior distributions

In this section:

• Conjugate priors for univariate & multivariate data distributions

• Conditional conjugate and semi-conjugate distributions

• Hyperpriors


5.3.1 Conjugate priors for univariate data distributions

• In previous chapters, examples were given whereby combination of prior withlikelihood, gives posterior of the same type as the prior.

• This property is called conjugacy.

• For an important class of distributions (those that belong to the exponential family) there is a recipe to produce the conjugate prior


Table conjugate priors for univariate discrete data distributions

Exponential family member Parameter Conjugate prior

UNIVARIATE CASE

Discrete distributions

Bernoulli Bern(θ) θ Beta(α0, β0)

Binomial Bin(n,θ) θ Beta(α0, β0)

Negative Binomial NB(k,θ) θ Beta(α0, β0)

Poisson Poisson(λ) λ Gamma(α0, β0)


Table conjugate priors for univariate continuous data distributions

Exponential family member Parameter Conjugate prior

UNIVARIATE CASE

Continuous distributions

Normal-variance fixed N(µ, σ2), σ2 fixed   µ      N(µ0, σ0²)

Normal-mean fixed N(µ, σ2), µ fixed        σ2     IG(α0, β0) or Inv-χ²(ν0, τ0²)

Normal∗ N(µ, σ2)                           µ, σ2  NIG(µ0, κ0, a0, b0) or N-Inv-χ²(µ0, κ0, ν0, τ0²)

Exponential Exp(λ)                         λ      Gamma(α0, β0)


Recipe to choose conjugate priors

p(y | θ) ∈ exponential family:

p(y | θ) = b(y) exp[c(θ)ᵀ t(y) + d(θ)]

◦ d(θ), b(y) = scalar functions, c(θ) = (c1(θ), . . . , cd(θ))ᵀ

◦ t(y) = d-dimensional sufficient statistic for θ (canonical parameter)

◦ Examples: binomial distribution, Poisson distribution, normal distribution, etc.

For a random sample y = {y1, . . . , yn} of i.i.d. elements:

p(y | θ) = b(y) exp[c(θ)ᵀ t(y) + n d(θ)]

◦ b(y) = ∏i b(yi) & t(y) = ∑i t(yi)


Recipe to choose conjugate priors

For the exponential family, the class of prior distributions ℑ closed under sampling:

p(θ | α, β) = k(α, β) exp[c(θ)ᵀ α + β d(θ)]

◦ α = (α1, . . . , αd)ᵀ and β hyperparameters

◦ Normalizing constant: k(α, β) = 1 / ∫ exp[c(θ)ᵀ α + β d(θ)] dθ

Proof of closure:

p(θ | y) ∝ p(y | θ) p(θ) = exp[c(θ)ᵀ t(y) + n d(θ)] exp[c(θ)ᵀ α + β d(θ)] = exp[c(θ)ᵀ α∗ + β∗ d(θ)],

with α∗ = α + t(y), β∗ = β + n
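A small numerical check of the closure rule, using the Poisson model, for which t(y) = ∑ yi and the natural conjugate is a Gamma(α, β) prior (shape α, rate β). The counts and hyperparameters are hypothetical:

```python
# Closure rule alpha* = alpha + t(y), beta* = beta + n for the Poisson model:
# this is exactly the usual Gamma-Poisson conjugate update.
y = [2, 0, 3, 1, 4]                 # hypothetical counts
alpha0, beta0 = 2.0, 1.0            # hypothetical hyperparameters

t_y, n = sum(y), len(y)
alpha_star, beta_star = alpha0 + t_y, beta0 + n   # closure rule

post_mean = alpha_star / beta_star  # Gamma posterior mean of lambda
print(alpha_star, beta_star, post_mean)
```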


Recipe to choose conjugate priors

• Above rule gives the natural conjugate family

• Enlarge the class of priors ℑ by adding extra parameters: conjugate family of priors, again closed under sampling (O'Hagan & Forster, 2004)

• The conjugate prior has the same functional form as the likelihood, obtained by replacing the data (t(y) and n) by parameters (α and β)

• A conjugate prior is model-dependent, in fact likelihood-dependent


Practical advantages when using conjugate priors

A (natural) conjugate prior distribution for the exponential family is convenient from several viewpoints:

• mathematical

• numerical

• interpretational (convenience prior):

◃ The likelihood of historical data can be easily turned into a conjugate prior.

The natural conjugate distribution = equivalent to a fictitious experiment

◃ For a natural conjugate prior, the posterior mean = weighted combination of the prior mean and sample estimate


Example V.2: Dietary study – Normal versus t-prior

• Example II.2: IBBENS-2 normal likelihood was combined with a N(328, 100) (conjugate) prior distribution

• Replace the normal prior by a t30(328, 100)-prior

⇒ posterior practically unchanged, but 3 elegant features of normal prior are lost:

◃ Posterior cannot be determined analytically

◃ Posterior is not of the same class as the prior

◃ Posterior summary measures are not obvious functions of the prior and the sample summary measures
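The loss of the analytic form can be illustrated with a grid approximation. The likelihood summary below is hypothetical (not the actual IBBENS-2 data), chosen only to show that the normal-prior and t-prior posteriors are practically identical:

```python
import math

# Grid posterior for a normal mean: once with a N(328, 100) prior
# (variance 100) and once with a t30(328, 100) prior (scale 10).
ybar, sigma, n = 318.0, 120.0, 560   # hypothetical data summary

def norm_pdf(x, m, v):
    return math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2 * math.pi * v)

def t_pdf(x, df, m, scale):
    z = (x - m) / scale
    c = math.gamma((df + 1) / 2) / (math.gamma(df / 2) * math.sqrt(df * math.pi) * scale)
    return c * (1 + z * z / df) ** (-(df + 1) / 2)

grid = [250 + i * 0.01 for i in range(10001)]   # mu from 250 to 350

def posterior_mean(prior):
    w = [norm_pdf(ybar, mu, sigma ** 2 / n) * prior(mu) for mu in grid]
    s = sum(w)
    return sum(mu * wi for mu, wi in zip(grid, w)) / s

m_normal = posterior_mean(lambda mu: norm_pdf(mu, 328.0, 100.0))
m_t = posterior_mean(lambda mu: t_pdf(mu, 30, 328.0, 10.0))
print(round(m_normal, 2), round(m_t, 2))
```

The two posterior means differ by far less than one posterior standard deviation, but the t-prior version has no closed-form summary measures.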


5.3.2 Conjugate prior for normal distribution – mean and varianceunknown

N(µ, σ2) with µ and σ2 unknown ∈ two-parameter exponential family

• Conjugate = product of a normal prior with inverse gamma prior

• Notation: NIG(µ0, κ0, a0, b0)
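A sketch of the standard NIG(µ0, κ0, a0, b0) update formulas for a normal sample with both µ and σ2 unknown; the data and hyperparameter values are hypothetical:

```python
# Standard conjugate NIG update: posterior is NIG(mu_n, kappa_n, a_n, b_n).
y = [4.1, 5.2, 3.8, 4.9, 5.0]           # hypothetical sample
mu0, kappa0, a0, b0 = 4.0, 1.0, 2.0, 2.0  # hypothetical hyperparameters

n = len(y)
ybar = sum(y) / n
ss = sum((yi - ybar) ** 2 for yi in y)   # within-sample sum of squares

kappa_n = kappa0 + n
mu_n = (kappa0 * mu0 + n * ybar) / kappa_n           # shrunken mean
a_n = a0 + n / 2
b_n = b0 + 0.5 * ss + kappa0 * n * (ybar - mu0) ** 2 / (2 * kappa_n)

print(mu_n, kappa_n, a_n, b_n)
```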


Mean known and variance unknown

• For σ2 unknown and µ known :

Natural conjugate is inverse gamma (IG)

Equivalently: scaled inverse-χ2 distribution (Inv-χ2)


5.3.3 Multivariate data distributions

Priors for two popular multivariate models:

• Multinomial model

• Multivariate normal model


Table conjugate priors for multivariate data distributions

Exponential family member Parameter Conjugate prior

MULTIVARIATE CASE

Discrete distributions

Multinomial Mult(n,θ) θ Dirichlet(α0)

Continuous distributions

Normal-covariance fixed N(µ, Σ)-Σ fixed µ N(µ0, Σ0)

Normal-mean fixed N(µ, Σ)-µ fixed Σ IW(Λ0, ν0)

Normal∗ N(µ, Σ) µ, Σ NIW(µ0, κ0, ν0, Λ0)


Multinomial model

Mult(n,θ): p(y | θ) = (n! / (y1! y2! · · · yk!)) ∏j θj^yj ∈ exponential family

Natural conjugate: Dirichlet(α0) distribution

p(θ | α0) = (Γ(∑j α0j) / ∏j Γ(α0j)) ∏j θj^(α0j − 1)

Properties:

◃ Posterior distribution = Dirichlet(α0 + y)

◃ Beta distribution = special case of a Dirichlet distribution with k = 2

◃ Marginal distributions of the Dirichlet distribution = beta distributions

◃ Dirichlet(1, 1, . . . , 1) = extension of the classical uniform prior Beta(1,1)
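These properties can be sketched in a few lines (the counts and hyperparameters are hypothetical):

```python
# Multinomial counts y with a Dirichlet(alpha0) prior:
# the posterior is Dirichlet(alpha0 + y), and each marginal is a Beta.
alpha0 = [1, 1, 1]          # Dirichlet(1,1,1): uniform over the simplex
y = [20, 30, 50]            # hypothetical multinomial counts

alpha_post = [a + c for a, c in zip(alpha0, y)]
total = sum(alpha_post)

post_means = [a / total for a in alpha_post]   # E(theta_j | y)
# Marginal of theta_1 is Beta(alpha_post[0], total - alpha_post[0])
marg1 = (alpha_post[0], total - alpha_post[0])
print(alpha_post, [round(m, 3) for m in post_means], marg1)
```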


Multivariate normal model

The p-dimensional multivariate normal distribution:

p(y1, . . . , yn | µ, Σ) = (2π)^(−np/2) |Σ|^(−n/2) exp[−½ ∑i (yi − µ)ᵀ Σ⁻¹ (yi − µ)]

Conjugates:

◃ Σ known and µ unknown: N(µ0,Σ0) for µ

◃ Σ unknown and µ known: inverse Wishart distribution IW(Λ0, ν0) for Σ

◃ Σ unknown and µ unknown:

Normal-inverse Wishart distribution NIW(µ0, κ0, ν0,Λ0) for µ and Σ


5.3.4 Conditional conjugate and semi-conjugate priors

Example θ = (µ, σ2) for y ∼ N(µ, σ2)

• Conditional conjugate for µ: N(µ0, σ0²)

• Conditional conjugate for σ2: IG(α, β)

• Semi-conjugate prior = product of conditional conjugates

• Often conjugate priors cannot be used in WinBUGS, but semi-conjugates are popular


5.3.5 Hyperpriors

Conjugate priors can be too restrictive to represent prior knowledge

⇒ Give the parameters of the conjugate prior also a prior

Example:

• Prior: θ ∼ Beta(1, 1)

• Instead: θ ∼ Beta(α, β) and α ∼ Gamma(1, 3), β ∼ Gamma(2, 4)

◃ α, β = hyperparameters

◃ Gamma(1, 3) × Gamma(2, 4) = hyperprior/hierarchical prior

• Aim: more flexibility in prior distribution (and useful for Gibbs sampling)
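The induced (marginal) prior of θ under such a hyperprior has no closed form, but it is trivial to sample, as in this sketch of the slide's Beta-Gamma example:

```python
import random

random.seed(1)

# Hierarchical prior: theta ~ Beta(alpha, beta) with alpha ~ Gamma(1, 3)
# and beta ~ Gamma(2, 4) (shape, rate).  Sampling the hyperparameters
# first and then theta gives draws from the induced marginal prior of theta.
def rgamma(shape, rate):
    # random.gammavariate takes (shape, scale), so scale = 1/rate
    return random.gammavariate(shape, 1.0 / rate)

draws = []
for _ in range(20000):
    a = rgamma(1.0, 3.0)
    b = rgamma(2.0, 4.0)
    draws.append(random.betavariate(a, b))

mean = sum(draws) / len(draws)
print(round(mean, 3))
```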


5.4 Noninformative prior distributions


5.4.1 Introduction

Sometimes/often researchers cannot or do not wish to make use of prior knowledge ⇒ the prior should reflect this absence of knowledge

• A prior that expresses no knowledge was (initially) called a noninformative (NI) prior

• Central question: What prior reflects absence of knowledge?

◃ Flat prior?

◃ Huge amount of research to find best NI prior

◃ Other terms for NI: non-subjective, objective, default, reference, weak, diffuse, flat, conventional, minimally informative, etc.

• Challenge: make sure that posterior is a proper distribution!


5.4.2 Expressing ignorance

• Equal prior probabilities = principle of insufficient reason, principle of indifference, Bayes-Laplace postulate

• Unfortunately . . . a flat prior cannot express ignorance


Ignorance at different scales:

[Figure: four panels, showing a flat prior on σ plotted on the σ scale and on the σ2 scale, and a flat prior on σ2 plotted on the σ2 scale and on the σ scale; a prior that is flat on one scale is not flat on the other]

Ignorance on σ-scale is different from ignorance on σ2-scale
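A quick Monte Carlo illustration of this point: a prior that is uniform on the σ scale piles up near zero on the σ2 scale, by the change-of-variables rule p(σ2) = p(σ)/(2σ):

```python
import random

random.seed(42)

# sigma ~ Uniform(0, 5) implies a non-uniform prior on sigma^2:
# p(sigma^2) = 1/(10 * sigma), which decreases in sigma^2.
sig = [random.uniform(0.0, 5.0) for _ in range(100000)]
sig2 = [s * s for s in sig]

# Compare how many sigma^2 draws fall in two equal-length intervals.
low = sum(1 for v in sig2 if 0.0 <= v < 5.0)
high = sum(1 for v in sig2 if 20.0 <= v < 25.0)
print(low, high)   # the low interval holds far more mass than the high one
```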


Ignorance cannot be expressed mathematically


5.4.3 General principles to choose noninformative priors

A lot of research has been spent on the specification of NI priors; the most popular are Jeffreys priors:

• Result of a Bayesian analysis depends on choice of scale for flat prior:

p(θ) ∝ c or p(h(θ)) ≡ p(ψ) ∝ c

• To preserve conclusions when changing scale: Jeffreys suggested a rule to construct priors based on the invariance principle/rule (conclusions do not change when changing scale)

• Jeffreys rule suggests a way to choose a scale to take the flat prior on

• Jeffreys rule also exists for more than one parameter (Jeffreys multi-parameter rule)


Examples of Jeffreys priors

• Binomial model: p(θ) ∝ θ^(−1/2) (1 − θ)^(−1/2) ⇔ flat prior on ψ(θ) = arcsin √θ

• Poisson model: p(λ) ∝ λ^(−1/2) ⇔ flat prior on ψ(λ) = √λ

• Normal model with σ fixed: p(µ) ∝ c

• Normal model with µ fixed: p(σ2) ∝ σ^(−2) ⇔ flat prior on log(σ)

• Normal model with µ and σ2 unknown: p(µ, σ2) ∝ σ^(−2), which reproduces some classical frequentist results!
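Recall that the Jeffreys prior is p(θ) ∝ √I(θ), with I(θ) the Fisher information. For the binomial model, this sketch computes I(θ) directly from its definition as E[−d² log p(y | θ)/dθ²] and checks it against the closed form n/(θ(1 − θ)):

```python
import math

def fisher_info_binomial(n, theta):
    # -d2/dtheta2 log p(y|theta) = y/theta^2 + (n - y)/(1 - theta)^2,
    # averaged over the binomial distribution of y.
    total = 0.0
    for y in range(n + 1):
        pmf = math.comb(n, y) * theta ** y * (1 - theta) ** (n - y)
        total += pmf * (y / theta ** 2 + (n - y) / (1 - theta) ** 2)
    return total

n, theta = 10, 0.3
print(fisher_info_binomial(n, theta), n / (theta * (1 - theta)))
```

The square root of this information, θ^(−1/2)(1 − θ)^(−1/2) up to a constant, is the Beta(1/2, 1/2) Jeffreys prior on the slide.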


5.4.4 Improper prior distributions

• Many NI priors are improper (= AUC is infinite)

• Improper prior is technically no problem when posterior is proper

• Example: normal likelihood (µ unknown + σ2 known) + flat prior on µ:

p(µ | y) = p(y | µ) p(µ) / ∫ p(y | µ) p(µ) dµ = p(y | µ) c / ∫ p(y | µ) c dµ = (1 / (√2π σ/√n)) exp[−(n/2) ((µ − ȳ)/σ)²]

• Complex models: difficult to know when an improper prior yields a proper posterior (e.g. the variance of the level-2 observations in a Gaussian hierarchical model)

• Interpretation of improper priors?
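A numerical check that the flat improper prior indeed yields the proper normal posterior above, using a hypothetical data summary and a grid approximation:

```python
import math

# Flat (improper) prior on mu, N(mu, sigma^2) likelihood with sigma known:
# the posterior should be the proper N(ybar, sigma^2/n) density.
ybar, sigma, n = 2.0, 1.5, 25   # hypothetical summary statistics

grid = [ybar - 2 + i * 0.001 for i in range(4001)]   # mu in [0, 4]
unnorm = [math.exp(-0.5 * n * (mu - ybar) ** 2 / sigma ** 2) for mu in grid]

area = sum(unnorm) * 0.001      # Riemann approximation of the integral
post_mean = sum(mu * u for mu, u in zip(grid, unnorm)) * 0.001 / area

# Compare with the analytic normalizing constant sqrt(2*pi) * sigma/sqrt(n)
analytic = math.sqrt(2 * math.pi) * sigma / math.sqrt(n)
print(round(area, 4), round(analytic, 4), round(post_mean, 4))
```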


5.4.5 Weak/vague priors

• For practical purposes it is sufficient that the prior is locally uniform, also called vague or weak

• Locally uniform: prior ≈ constant on interval outside which likelihood ≈ zero

• Examples for N(µ, σ2) likelihood:

◦ µ: N(0, σ0²) prior with σ0 large

◦ σ2: IG(ε, ε) prior with ε small ≈ Jeffreys prior


Locally uniform prior

[Figure: a locally uniform prior, the likelihood, and the posterior plotted against µ; the prior is approximately constant over the region where the likelihood is appreciable]


Vague priors in software:

• WinBUGS allows only (proper) vague priors (Jeffreys priors are not allowed)

◦ mu ∼ dnorm(0.0,1.0E-6): normal prior with precision 10⁻⁶, i.e. variance 10⁶

◦ tau2 ∼ dgamma(0.001,0.001): gamma prior on the precision, i.e. an inverse gamma prior for the variance with shape = rate = 10⁻³

• SAS allows improper priors (allows Jeffreys priors)
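Note that WinBUGS parameterizes the normal by its precision τ = 1/variance, which is a common source of confusion; a tiny sketch of the conversion behind dnorm(0.0, 1.0E-6):

```python
# WinBUGS dnorm(mean, precision): precision tau = 1/variance.
# dnorm(0.0, 1.0E-6) is therefore a N(0, 10^6) prior (sd = 1000).
def precision_to_variance(tau):
    return 1.0 / tau

var = precision_to_variance(1.0e-6)
sd = var ** 0.5
print(var, sd)
```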


Density of log(σ) for σ2 (= 1/τ 2) ∼ IG(ε, ε)

[Figure: density of log(σ) induced by IG(ε, ε) priors on σ2, for two values of ε; the induced prior is far from flat on the log(σ) scale]


Density of log(σ) for σ2 ∼ IG(ε, ε)

[Figure (continued): density of log(σ) for smaller values of ε]


5.5 Informative prior distributions


5.5.1 Introduction

• In basically all research some prior knowledge is available

• In this section:

◃ Formalize the use of historical data as prior information using the power prior

◃ Review the use of clinical priors, which are prior distributions based on either historical data or on expert knowledge

◃ Priors that are based on formal rules expressing prior skepticism and optimism

• The set of priors representing prior knowledge = subjective or informative priors

• But first, two success stories of how the Bayesian approach helped to find:

◃ a crashed plane

◃ a lost fisherman on the Atlantic Ocean


Locating a lost plane

◃ Statisticians helped locate an Air France plane in 2011, which had been missing for two years, using Bayesian methods

◃ June 2009: Air France flight 447 went missing flying from Rio de Janeiro in Brazil to Paris, France

◃ Debris from the Airbus A330 was found floating on the surface of the Atlantic five days later

◃ After a number of days, the debris would have moved with the ocean current, hence finding the black box is not easy

◃ Existing software (used by the US Coast Guard) did not help

◃ Senior analyst at Metron, Colleen Keller, relied on Bayesian methods to locate the black box in 2011


[Photos: members of the Brazilian frigate Constituicao recovering debris in June 2009; debris from the Air France crash laid out for investigation in 2009; a 2009 infrared satellite image showing weather conditions off the Brazilian coast and the plane search area]


Finding a lost fisherman on the Atlantic Ocean

New York Times (30 September 2014)

◃ ”. . . if not for statisticians, a Long Island fisherman might have died in the Atlantic Ocean after falling off his boat early one morning last summer

◃ The man owes his life to a once obscure field known as Bayesian statistics - a set of mathematical rules for using new data to continuously update beliefs or existing knowledge

◃ It is proving especially useful in approaching complex problems, including searches like the one the Coast Guard used in 2013 to find the missing fisherman, John Aldridge

◃ But the current debate is about how scientists turn data into knowledge, evidence and predictions. Concern has been growing in recent years that some fields are not doing a very good job at this sort of inference. In 2012, for example, a team at the biotech company Amgen announced that they’d analyzed 53 cancer studies


and found it could not replicate 47 of them

◃ The Coast Guard has been using Bayesian analysis since the 1970s. The approach lends itself well to problems like searches, which involve a single incident and many different kinds of relevant data, said Lawrence Stone, a statistician for Metron, a scientific consulting firm in Reston, Va., that works with the Coast Guard

The Coast Guard, guided by the statistical method of Thomas Bayes, was able to find the missing fisherman John Aldridge.


5.5.2 Data-based prior distributions

• In previous chapters:

◦ Combined historical data with current data assuming identical conditions

◦ Discounted importance of prior data by increasing variance

• Generalized by power prior (Ibrahim and Chen):

◦ Likelihood of historical data: L(θ | y0) based on y0 = {y01, . . . , y0n0}

◦ Prior of historical data: p0(θ | c0)

◦ Power prior distribution:

p(θ | y0, a0) ∝ L(θ | y0)^a0 p0(θ | c0)

with 0 ≤ a0 ≤ 1 (a0 = 0: no accounting of the historical data; a0 = 1: fully accounting)
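For a binomial parameter with a Beta initial prior, the power prior simply discounts the historical counts by a0; a sketch with hypothetical numbers:

```python
# Power prior for a binomial parameter: raising the historical likelihood
# theta^y0 (1-theta)^(n0-y0) to the power a0 and multiplying an initial
# Beta(alpha0, beta0) prior gives a Beta posterior with discounted
# historical counts.
alpha0, beta0 = 1.0, 1.0        # initial prior p0 (hypothetical)
y0, n0 = 30, 100                # hypothetical historical data
y, n = 25, 80                   # hypothetical current data

def power_prior_posterior(a0):
    a = alpha0 + a0 * y0 + y
    b = beta0 + a0 * (n0 - y0) + (n - y)
    return a, b                 # Beta(a, b) posterior

full = power_prior_posterior(1.0)    # fully accounting for history
none = power_prior_posterior(0.0)    # ignoring history
half = power_prior_posterior(0.5)    # partial discounting
print(full, none, half)
```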


5.5.3 Elicitation of prior knowledge

• Elicitation of prior knowledge: turn (qualitative) information from ‘experts’ into probabilistic language

• Challenges:

◃ Most experts have no statistical background

◃ What to ask to construct prior distribution:

◦ Prior mode, median, mean and prior 95% CI?

◦ Description of the prior: quartiles, mean, SD?

◃ Some probability statements are easier to elicit than others


Example V.5: Stroke study – Prior for 1st interim analysis from experts

Prior knowledge on θ (incidence of SICH), elicitation based on:

◦ Most likely value for θ and prior equal-tail 95% CI

◦ Prior belief pk on each of the K intervals Ik ≡ [θk−1, θk) covering [0,1]

[Figure: elicited prior density for θ on the interval [0, 0.4]]


Elicitation of prior knowledge – some remarks

• Community and consensus prior: obtained from a community of experts

• Difficulty in eliciting prior information on more than 1 parameter jointly

• Lack of Bayesian papers based on genuine prior information


Identifiability issues

• An overspecified model is non-identifiable

• An unidentified parameter that is given an NI prior also has an NI posterior

• The Bayesian approach can make parameters estimable, so that the model becomes identifiable

• In the next example, not all parameters can be estimated without extra (prior) information


Example V.6: Cysticercosis study – Estimate prevalence without gold standard

Experiment:

◃ 868 pigs tested in Zambia with Ag-ELISA diagnostic test

◃ 496 pigs showed a positive test

◃ Aim: estimate the prevalence π of cysticercosis in Zambia among pigs

If estimates of the sensitivity α and specificity β are available, then:

π = (p+ + β − 1) / (α + β − 1)

◦ p+ = n+/n = proportion of subjects with a positive test

◦ α and β = estimated sensitivity and specificity
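Plugging the observed counts into this correction formula, with hypothetical point estimates for α and β (the real values vary geographically, which is exactly why the example needs priors on them):

```python
# Prevalence corrected for an imperfect diagnostic test:
# pi = (p_plus + beta - 1) / (alpha + beta - 1)
n_pos, n = 496, 868             # observed counts from the slide
alpha, beta = 0.90, 0.85        # hypothetical sensitivity and specificity

p_plus = n_pos / n
pi_hat = (p_plus + beta - 1) / (alpha + beta - 1)
print(round(p_plus, 3), round(pi_hat, 3))
```

Note that the corrected prevalence differs from the raw positive fraction; with these hypothetical α and β the correction shifts it downward only slightly.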


Data:

Table of results:

Test    True +      True −          Observed
+       πα          (1 − π)(1 − β)  n+ = 496
−       π(1 − α)    (1 − π)β        n− = 372
Total   π           (1 − π)         n = 868

◃ Only collapsed table is available

◃ Since α and β vary geographically, expert knowledge is needed


Prior and posterior:

• Prior distributions on π (p(π)), α (p(α)) and β (p(β)) are needed

• Posterior distribution:

p(π, α, β | n+, n−) ∝ C(n, n+) [πα + (1 − π)(1 − β)]^(n+) [π(1 − α) + (1 − π)β]^(n−) p(π) p(α) p(β)

• WinBUGS was used


Posterior of π:

(a) Uniform priors for π, α and β (no prior information)

(b) Beta(21,12) prior for α and Beta(32,4) prior for β (historical data)

[Figure: posterior density p(π | y) for cases (a) and (b); kernel density estimates based on N = 10000 sampled values, bandwidths 0.04473 and 0.01542 respectively]


5.5.4 Archetypal prior distributions

• Use of prior information in Phase III RCTs is problematic, except for medical device trials (FDA guidance document)

⇒ Pleas for objective priors in RCTs

• There is a role of subjective priors for interim analyses:

◃ Skeptical prior

◃ Enthusiastic prior


Example V.7: Skeptical priors in a phase III RCT

Tan et al. (2003):

◃ Phase III RCT for treating patients with hepatocellular carcinoma

◃ Standard treatment: surgical resection

◃ Experimental treatment: surgery + adjuvant radioactive iodine (adjuvant therapy)

◃ Planning: recruit 120 patients

Frequentist interim analyses for efficacy were planned:

◃ First interim analysis (30 patients): experimental treatment better (P = 0.01 < 0.029 = P-value of the stopping rule)

◃ But, scientific community was skeptical about adjuvant therapy

⇒ New multicentric trial (300 patients) was set up


Prior to the start of the subsequent trial:

◃ Pretrial opinions of the 14 clinical investigators were elicited

◃ The prior distribution of each investigator was constructed by eliciting the prior belief on the treatment effect (adjuvant versus standard) on a grid of intervals

◃ Average of all priors = community prior

◃ Average of the priors of the 5 most skeptical investigators = skeptical prior

To exemplify the use of the skeptical prior:

◃ Combine skeptical prior with interim analysis results of previous trial

⇒ 1-sided contour probability (in 1st interim analysis) = 0.49

⇒ The first trial would not have been stopped for efficacy


Questionnaire:


Prior of investigators:


Skeptical priors:


A formal skeptical/enthusiastic prior

Formal subjective priors (Spiegelhalter et al., 1994) in normal case:

• Useful in the context of monitoring clinical trials in a Bayesian manner

• θ = true effect of treatment (A versus B)

• Skeptical normal prior: choose mean and variance of p(θ) to reflect skepticism

• Enthusiastic normal prior: choose mean and variance of p(θ) to reflect enthusiasm

• See figure next page & book
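One common calibration of a skeptical prior in this spirit centres it at "no effect" (mean 0) and chooses the standard deviation so that only 5% prior probability lies beyond the alternative θa; the θa value below is hypothetical:

```python
import math

# Skeptical normal prior: mean 0 and sd chosen so that
# P(theta > theta_a) = 0.05, i.e. sd = theta_a / z_0.95.
theta_a = 0.5          # hypothetical alternative treatment effect
z95 = 1.6449           # 95th percentile of the standard normal

sd = theta_a / z95

def normal_tail(x, m, s):
    # P(X > x) for X ~ N(m, s^2), via the complementary error function
    return 0.5 * math.erfc((x - m) / (s * math.sqrt(2)))

print(round(sd, 4), round(normal_tail(theta_a, 0.0, sd), 4))
```

The enthusiastic prior is constructed symmetrically: centred at θa with 5% probability below 0.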


Example V.8+9

[Figure: skeptical and enthusiastic normal priors for θ, each placing 5% tail probability beyond the alternative θa]


5.6 Prior distributions for regression models


5.6.1 Normal linear regression

Normal linear regression model:

yi = xiᵀβ + εi, (i = 1, . . . , n)

y = Xβ + ε


Priors

• Non-informative priors:

◃ Popular NI prior: p(β, σ2) ∝ σ^(−2) (Jeffreys multi-parameter rule)

◃ WinBUGS: product of independent N(0, σ0²) priors (σ0² large) + IG(ε, ε) (ε small)

• Conjugate priors:

◃ Conjugate NIG prior = N(β0, σ²Σ0) × IG(a0, b0) (or Inv-χ²(ν0, τ0²))

• Historical/expert priors:

◃ Prior knowledge on regression coefficients must be given jointly

◃ Elicitation process via distributions at covariate values

◃ Most popular: express prior based on historical data


5.6.2 Generalized linear models

• In practice choice of NI priors much the same as with linear models

• But, a too large prior variance may not be best for sampling, e.g. in a logistic regression model

• In SAS: Jeffreys (improper) prior can be chosen

• Conjugate priors are based on fictive historical data

◃ Data augmentation priors & conditional mean priors

◃ Not implemented in classical software, but fictive data can be explicitly added and then standard software can be used


5.7 Modeling priors

Modeling prior: adapt characteristics of the statistical model

• Multicollinearity: appropriate prior avoids inflation of β

• Numerical (separation) problems: appropriate prior avoids inflation of β

• Constraints on parameters: constraint can be put in prior

• Variable selection: prior can direct the variable search


Multicollinearity

Multicollinearity: |X^T X| ≈ 0 ⇒ regression coefficients and standard errors inflated

Ridge regression:

◃ Minimize: (y* − Xβ)^T (y* − Xβ) + λβ^T β with λ ≥ 0 & y* = y − ȳ1n

◃ Estimate: βR(λ) = (X^T X + λI)^−1 X^T y*

= Posterior mode of a Bayesian normal linear regression analysis with:

◃ Normal ridge prior N(0, τ²I) for β

◃ τ² = σ²/λ with σ and λ fixed

• Can be easily extended to BGLIM
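The equivalence between the ridge estimate and the Bayesian posterior mode is easy to verify numerically. A minimal sketch with simulated data (the dataset and all variable names are illustrative, not from the course material): under the prior β ~ N(0, (σ²/λ)I) with σ² and λ fixed, the posterior mode (= mean) of β reproduces the ridge estimate.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, lam = 50, 3, 2.0
X = rng.normal(size=(n, d))
beta_true = np.array([1.0, -0.5, 0.25])
y = X @ beta_true + rng.normal(scale=0.5, size=n)
ystar = y - y.mean()  # centered response, as on the slide

# Ridge estimate: (X'X + lambda I)^{-1} X' y*
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ ystar)

# Posterior mode under beta ~ N(0, (sigma^2/lambda) I); sigma^2 cancels,
# so any fixed value gives the same mode
sigma2 = 0.25
post_prec = X.T @ X / sigma2 + (lam / sigma2) * np.eye(d)
post_mode = np.linalg.solve(post_prec, X.T @ ystar / sigma2)

print(np.allclose(beta_ridge, post_mode))  # True
```

The shrinkage parameter λ thus has a direct prior interpretation: a larger λ corresponds to a tighter normal prior around zero.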


Numerical (separation) problems

Separation problems in binary regression models: complete separation and quasi-complete separation

Solution: Take weakly informative prior on regression coefficients

[Figure: 0/1 responses in the (x1, x2)-plane illustrating quasi-complete separation, with annotations "N(0,100)" and "Cauchy (Gelman)"]
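A small sketch of why a weakly informative prior helps: with completely separated toy data (hypothetical, not the slide's dataset) the maximum likelihood estimate of the slope does not exist, but the posterior mode under a N(0, 100) prior on each coefficient stays finite.

```python
import numpy as np
from scipy.optimize import minimize

# Toy data with complete separation: x < 5 gives y = 0, x > 5 gives y = 1
x = np.array([1., 2., 3., 4., 6., 7., 8., 9.])
y = np.array([0., 0., 0., 0., 1., 1., 1., 1.])

def neg_log_post(beta, prior_var):
    eta = beta[0] + beta[1] * x
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))   # Bernoulli-logit log-likelihood
    logprior = -0.5 * np.sum(beta ** 2) / prior_var      # independent N(0, prior_var) priors
    return -(loglik + logprior)

# The N(0, 100) prior keeps the MAP estimate finite, whereas the
# unpenalized likelihood would push the slope to infinity
map_fit = minimize(neg_log_post, x0=np.zeros(2), args=(100.0,))
print(map_fit.x)
```

Shrinking `prior_var` further (e.g. to mimic Gelman's weakly informative Cauchy proposal) pulls the slope estimate closer to zero.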


Constraints on parameters

Signal-Tandmobiel® study:

• θk = probability of CE among Flemish children in school year k (k = 1, . . . , 6)

• Constraint on parameters: θ1 ≤ θ2 ≤ · · · ≤ θ6

• Solutions:

◃ Prior on θ = (θ1, . . . , θ6)T that maps all θs that violate the constraint to zero

◃ Neglect the values that are not allowed in the posterior (useful when sampling)
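The second solution can be sketched with hypothetical counts (not the actual Signal-Tandmobiel data): sample each θk from its unconstrained posterior and simply discard draws that violate the ordering θ1 ≤ · · · ≤ θ6.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical CE counts per school year (illustrative, not the study data)
n_children = np.array([100, 100, 100, 100, 100, 100])
n_ce       = np.array([10, 15, 18, 25, 30, 38])

# Unconstrained posteriors with Beta(1, 1) priors: theta_k ~ Beta(y_k + 1, n_k - y_k + 1)
draws = rng.beta(n_ce + 1, n_children - n_ce + 1, size=(20000, 6))

# Keep only the draws satisfying theta_1 <= ... <= theta_6
ok = np.all(np.diff(draws, axis=1) >= 0, axis=1)
constrained = draws[ok]
print(ok.mean(), constrained.mean(axis=0))
```

The retained fraction `ok.mean()` shows how strongly the constraint bites; if it is very small, this rejection approach becomes inefficient and building the constraint into the sampler is preferable.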


Other modeling priors

• LASSO prior (see Bayesian variable selection)

• . . .


5.8 Other regression models

• A great variety of models

• Not considered here: conditional logistic regression model, Cox proportional hazards model, generalized linear mixed effects models

• . . .


Take home messages

• Often prior is dominated by the likelihood (data)

• Prior in RCTs: prior to the trial

• Conjugate priors: convenient mathematically, computationally and from an interpretational viewpoint

• Conditional conjugate priors: heavily used in Gibbs sampling

• Hyperpriors: extend the range of conjugate priors, also important in Gibbs sampling


• Noninformative priors:

◃ do not exist, strictly speaking

◃ in practice vague priors (e.g. locally uniform) are ok

◃ important class of NI priors: Jeffreys priors

◃ be careful with improper priors, they might imply improper posterior

• Informative priors:

◃ can be based on historical data & expert knowledge (but only useful when reflecting the viewpoint of a community of experts)

◃ are useful in clinical trials to reduce sample size


Chapter 6

Markov chain Monte Carlo sampling

Aims:

◃ Introduce the sampling approach(es) that revolutionized Bayesian approach


6.1 Introduction

◃ Solving the posterior distribution analytically is often not feasible due to the difficulty in determining the integration constant

◃ Computing the integral using numerical integration methods is a practical alternative only if few parameters are involved

⇒ New computational approach is needed

◃ Sampling is the way to go!

◃ With Markov chain Monte Carlo (MCMC) methods:

1. Gibbs sampler

2. Metropolis-(Hastings) algorithm

MCMC approaches have revolutionized Bayesian methods!


Intermezzo: Joint, marginal and conditional probability

Two (discrete) random variables X and Y

• Joint probability of X and Y: probability that X=x and Y=y happen together

• Marginal probability of X: probability that X=x happens

• Marginal probability of Y: probability that Y=y happens

• Conditional probability of X given Y=y: probability that X=x happens if Y=y

• Conditional probability of Y given X=x: probability that Y=y happens if X=x


Intermezzo: Joint, marginal and conditional probability

IBBENS study: 563 (556) bank employees in 8 subsidiaries of a Belgian bank participated in a dietary study

[Figure: scatter plot of WEIGHT versus LENGTH]




Intermezzo: Joint, marginal and conditional probability

IBBENS study: frequency table

Length

Weight −150 150− 160 160− 170 170− 180 180− 190 190− 200 200− Total

−50 2 12 4 0 0 0 0 18

50− 60 1 25 50 14 0 0 0 90

60− 70 0 12 54 52 13 1 0 132

70− 80 0 5 42 72 34 0 0 153

80− 90 0 0 12 58 32 2 1 105

90− 100 0 0 0 20 18 3 0 41

100− 110 0 0 1 2 7 1 0 11

110− 120 0 0 0 2 2 1 0 5

120− 0 0 0 0 1 0 0 1

Total 3 54 163 220 107 8 1 556


Intermezzo: Joint, marginal and conditional probability

IBBENS study: joint probability

Length

Weight −150 150− 160 160− 170 170− 180 180− 190 190− 200 200− total

−50 2/556 12/556 4/556 0/556 0/556 0/556 0/556 18/556

50− 60 1/556 25/556 50/556 14/556 0/556 0/556 0/556 90/556

60− 70 0/556 12/556 54/556 52/556 13/556 1/556 0/556 132/556

70− 80 0/556 5/556 42/556 72/556 34/556 0/556 0/556 153/556

80− 90 0/556 0/556 12/556 58/556 32/556 2/556 1/556 105/556

90− 100 0/556 0/556 0/556 20/556 18/556 3/556 0/556 41/556

100− 110 0/556 0/556 1/556 2/556 7/556 1/556 0/556 11/556

110− 120 0/556 0/556 0/556 2/556 2/556 1/556 0/556 5/556

120− 0/556 0/556 0/556 0/556 1/556 0/556 0/556 1/556

Total 3/556 54/556 163/556 220/556 107/556 8/556 1/556 1


Intermezzo: Joint, marginal and conditional probability

IBBENS study: marginal probabilities

Length

Weight −150 150− 160 160− 170 170− 180 180− 190 190− 200 200− total

−50 2/556 12/556 4/556 0/556 0/556 0/556 0/556 18/556

50− 60 1/556 25/556 50/556 14/556 0/556 0/556 0/556 90/556

60− 70 0/556 12/556 54/556 52/556 13/556 1/556 0/556 132/556

70− 80 0/556 5/556 42/556 72/556 34/556 0/556 0/556 153/556

80− 90 0/556 0/556 12/556 58/556 32/556 2/556 1/556 105/556

90− 100 0/556 0/556 0/556 20/556 18/556 3/556 0/556 41/556

100− 110 0/556 0/556 1/556 2/556 7/556 1/556 0/556 11/556

110− 120 0/556 0/556 0/556 2/556 2/556 1/556 0/556 5/556

120− 0/556 0/556 0/556 0/556 1/556 0/556 0/556 1/556

Total 3/556 54/556 163/556 220/556 107/556 8/556 1/556 1


Intermezzo: Joint, marginal and conditional probability

IBBENS study: conditional probabilities

Conditional probability of Weight given Length = 150−160 (the 150−160 column of the joint table, divided by its column total):

−50: 12/54, 50−60: 25/54, 60−70: 12/54, 70−80: 5/54, 80−90: 0/54, 90−100: 0/54, 100−110: 0/54, 110−120: 0/54, 120−: 0/54 (Total: 54/54)

Conditional probability of Length given Weight = 50−60 (the 50−60 row of the joint table, divided by its row total):

−150: 1/90, 150−160: 25/90, 160−170: 50/90, 170−180: 14/90, 180−190: 0/90, 190−200: 0/90, 200−: 0/90 (Total: 90/90)


Intermezzo: Joint, marginal and conditional density

Two (continuous) random variables X and Y

• Joint density of X and Y: density f(x, y)

• Marginal density of X: density f(x)

• Marginal density of Y: density f(y)

• Conditional density of X given Y=y: density f(x|y)

• Conditional density of Y given X=x: density f(y|x)


Intermezzo: Joint, marginal and conditional density

IBBENS study: joint density


Intermezzo: Joint, marginal and conditional density

IBBENS study: marginal densities


Intermezzo: Joint, marginal and conditional density

IBBENS study: conditional densities

[Figure: conditional density of LENGTH given WEIGHT, and conditional density of WEIGHT given LENGTH]


6.2 The Gibbs sampler

• Gibbs Sampler: introduced by Geman and Geman (1984) in the context ofimage-processing for the estimation of the parameters of the Gibbs distribution

• Gelfand and Smith (1990) introduced Gibbs sampling to tackle complex estimation problems in a Bayesian manner


6.2.1 The bivariate Gibbs sampler

Method of Composition:

• p(θ1, θ2 | y) is completely determined by:

◃ marginal p(θ2 | y)

◃ conditional p(θ1 | θ2,y)

• Split-up yields a simple way to sample from joint distribution


Gibbs sampling:

• p(θ1, θ2 | y) is completely determined by:

◃ conditional p(θ2 | θ1,y)

◃ conditional p(θ1 | θ2,y)

• Property yields another simple way to sample from joint distribution:

◃ Take starting values θ1^0 and θ2^0 (only θ2^0 is needed)

◃ Given θ1^k and θ2^k at iteration k, generate the (k + 1)-th value according to the iterative scheme:

1. Sample θ1^(k+1) from p(θ1 | θ2^k, y)

2. Sample θ2^(k+1) from p(θ2 | θ1^(k+1), y)


Result of Gibbs sampling:

• Chain of vectors: θ^k = (θ1^k, θ2^k)^T, k = 1, 2, . . .

◦ Consists of dependent elements

◦ Markov property: p(θ^(k+1) | θ^k, θ^(k−1), . . . , y) = p(θ^(k+1) | θ^k, y)

• Chain depends on starting value ⇒ initial portion (burn-in part) must be discarded

• Under mild conditions: sample from the posterior distribution = target distribution

⇒ From k0 on: summary measures calculated from the chain consistently estimate the true posterior measures

Gibbs sampler is called a Markov chain Monte Carlo method


Example VI.1: SAP study – Gibbs sampling the posterior with NI priors

• Example IV.5: sampling from the posterior distribution of the normal likelihood based on 250 alp measurements of ‘healthy’ patients with NI prior for both parameters

• Now using Gibbs sampler based on y = 100/√alp

• Determine two conditional distributions:

1. p(µ | σ², y) = N(ȳ, σ²/n)

2. p(σ² | µ, y) = Inv-χ²(n, s²µ) with s²µ = (1/n) Σi=1..n (yi − µ)²

• Iterative procedure: at iteration (k + 1)

1. Sample µ^(k+1) from N(ȳ, (σ²)^k/n)

2. Sample (σ²)^(k+1) from Inv-χ²(n, s²µ^(k+1))
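The two-step scheme above is straightforward to implement. A sketch with simulated data standing in for the 250 transformed alp values (the generating mean and standard deviation below are illustrative); a draw from Inv-χ²(n, s²) is obtained as n·s² divided by a χ²(n) draw.

```python
import numpy as np

rng = np.random.default_rng(42)
y = rng.normal(loc=7.1, scale=1.4, size=250)  # stand-in for the 250 transformed alp values
n, ybar = len(y), y.mean()

n_iter, burn_in = 1500, 500
mu, sig2 = np.empty(n_iter), np.empty(n_iter)
sig2_cur = 1.0  # starting value; only sigma^2 is needed to start

for k in range(n_iter):
    mu_cur = rng.normal(ybar, np.sqrt(sig2_cur / n))   # mu | sigma^2, y ~ N(ybar, sigma^2/n)
    s2_mu = np.mean((y - mu_cur) ** 2)
    sig2_cur = n * s2_mu / rng.chisquare(n)            # sigma^2 | mu, y ~ scaled Inv-chi^2(n, s2_mu)
    mu[k], sig2[k] = mu_cur, sig2_cur

print(mu[burn_in:].mean(), sig2[burn_in:].mean())
```

After discarding the burn-in part, the chain means approximate the posterior means of µ and σ², which here lie close to the sample mean and variance.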


Gibbs sampling:

[Figure: four panels in the (µ, σ²)-plane illustrating the alternating Gibbs steps]

◦ Sampling from conditional density of µ given σ²

◦ Sampling from conditional density of σ² given µ


Gibbs sampling path and sample from joint posterior:

[Figure: (a) Gibbs sampling path and (b) sample from the joint posterior in the (µ, σ²)-plane]

◦ Zigzag pattern in the (µ, σ²)-plane

◦ 1 complete step = 2 substeps (blue = genuine element)

◦ Burn-in = 500, total chain = 1,500


Posterior distributions:

[Figure: marginal posterior distributions of (a) µ and (b) σ²]

Solid line = true posterior distribution


Example VI.2: Sampling from a discrete × continuous distribution

• Joint distribution: f(x, y) ∝ (n choose x) y^(x+α−1) (1 − y)^(n−x+β−1)

◦ x a discrete random variable taking values in {0, 1, . . . , n}

◦ y a random variable on the unit interval

◦ α, β > 0 parameters

• Question: f (x)?
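The full conditionals can be read off the joint distribution: x | y ~ Bin(n, y) and y | x ~ Beta(x + α, n − x + β), so a Gibbs sampler produces draws of x whose marginal should match the (beta-binomial) f(x). A sketch with illustrative values for n, α and β:

```python
import numpy as np
from scipy.stats import betabinom

rng = np.random.default_rng(7)
n, alpha, beta = 30, 2.0, 4.0

n_iter, burn_in = 6000, 500
x, y = np.empty(n_iter, dtype=int), 0.5  # one starting value suffices
for k in range(n_iter):
    xk = rng.binomial(n, y)                    # x | y ~ Bin(n, y)
    y = rng.beta(xk + alpha, n - xk + beta)    # y | x ~ Beta(x + alpha, n - x + beta)
    x[k] = xk

# The marginal f(x) is beta-binomial; sampled frequencies should match it
freq = np.bincount(x[burn_in:], minlength=n + 1) / (n_iter - burn_in)
exact = betabinom.pmf(np.arange(n + 1), n, alpha, beta)
print(np.abs(freq - exact).max())  # small
```

This illustrates a useful by-product of Gibbs sampling: the marginal of each component comes for free from the joint draws, without ever integrating y out analytically.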


Marginal distribution:

[Figure: sampled marginal distribution of x]

◦ Solid line = true marginal distribution

◦ Burn-in = 500, total chain = 1,500


Example VI.3: SAP study – Gibbs sampling the posterior with I priors

• Example VI.1: now with independent informative priors (semi-conjugate prior)

◦ µ ∼ N(µ0, σ0²)

◦ σ² ∼ Inv-χ²(ν0, τ0²)

• Posterior:

p(µ, σ² | y) ∝ (1/σ0) e^(−(µ − µ0)²/2σ0²) × (σ²)^−(ν0/2 + 1) e^(−ν0τ0²/2σ²) × (1/σ^n) ∏i=1..n e^(−(yi − µ)²/2σ²)

∝ ∏i=1..n e^(−(yi − µ)²/2σ²) · e^(−(µ − µ0)²/2σ0²) · (σ²)^−((n + ν0)/2 + 1) · e^(−ν0τ0²/2σ²)


Conditional distributions:

• Determine the two conditional distributions:

1. p(µ | σ², y) ∝ ∏i=1..n e^(−(yi − µ)²/2σ²) · e^(−(µ − µ0)²/2σ0²) = N(µ̄, σ̄²), with σ̄² = 1/(n/σ² + 1/σ0²) and µ̄ = σ̄² (nȳ/σ² + µ0/σ0²)

2. p(σ² | µ, y) = Inv-χ²(ν0 + n, [Σi=1..n (yi − µ)² + ν0τ0²] / (ν0 + n))

• Iterative procedure: at iteration (k + 1)

1. Sample µ^(k+1) from N(µ̄^k, (σ̄²)^k), with µ̄^k and (σ̄²)^k evaluated at (σ²)^k

2. Sample (σ²)^(k+1) from Inv-χ²(ν0 + n, [Σi=1..n (yi − µ^(k+1))² + ν0τ0²] / (ν0 + n))


Trace plots:

[Figure: trace plots of (a) µ and (b) σ²]


6.2.2 The general Gibbs sampler

Starting position θ^0 = (θ1^0, . . . , θd^0)^T

Multivariate version of the Gibbs sampler:

Iteration (k + 1):

1. Sample θ1^(k+1) from p(θ1 | θ2^k, . . . , θ(d−1)^k, θd^k, y)

2. Sample θ2^(k+1) from p(θ2 | θ1^(k+1), θ3^k, . . . , θd^k, y)

...

d. Sample θd^(k+1) from p(θd | θ1^(k+1), . . . , θ(d−1)^(k+1), y)


• Full conditional distributions: p(θj | θ1, . . . , θ(j−1), θ(j+1), . . . , θd, y)

• Also called: full conditionals

• Under mild regularity conditions:

θk,θ(k+1), . . . ultimately are observations from the posterior distribution

With the help of advanced sampling algorithms (AR, ARS, ARMS, etc.) sampling the full conditionals is done based on the prior × likelihood


Example VI.4: British coal mining disasters data

◃ British coal mining disasters data set: # severe accidents in British coal mines from 1851 to 1962

◃ Decrease in frequency of disasters from year 40 (+ 1850) onwards?

[Figure: number of disasters per year versus 1850 + year]


Statistical model:

• Likelihood: Poisson process with a change point at k

◃ yi ∼ Poisson(θ) for i = 1, . . . , k

◃ yi ∼ Poisson(λ) for i = k + 1, . . . , n (n=112)

• Priors

◃ θ: Gamma(a1, b1), (a1 constant, b1 parameter)

◃ λ: Gamma(a2, b2), (a2 constant, b2 parameter)

◃ k: p(k) = 1/n

◃ b1: Gamma(c1, d1), (c1, d1 constants)

◃ b2: Gamma(c2, d2), (c2, d2 constants)


Full conditionals:

p(θ | y, λ, b1, b2, k) = Gamma(a1 + Σi=1..k yi, k + b1)

p(λ | y, θ, b1, b2, k) = Gamma(a2 + Σi=k+1..n yi, n − k + b2)

p(b1 | y, θ, λ, b2, k) = Gamma(a1 + c1, θ + d1)

p(b2 | y, θ, λ, b1, k) = Gamma(a2 + c2, λ + d2)

p(k | y, θ, λ, b1, b2) = π(y | k, θ, λ) / Σj=1..n π(y | j, θ, λ)

with π(y | k, θ, λ) = exp[k(λ − θ)] (θ/λ)^(Σi=1..k yi)

◦ a1 = a2 = 0.5, c1 = c2 = 0, d1 = d2 = 1
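A sketch of this Gibbs sampler on synthetic change-point data (simulated Poisson counts, not the actual disaster series; the two rates and the true change point below are illustrative). Each iteration cycles through the five full conditionals above; the discrete conditional for k is sampled from its normalized probabilities, computed on the log scale for stability.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k_true = 112, 40
y = np.concatenate([rng.poisson(3.0, k_true), rng.poisson(1.0, n - k_true)])

a1 = a2 = 0.5
c1 = c2 = 0.0
d1 = d2 = 1.0
S = np.concatenate([[0], np.cumsum(y)])  # S[k] = sum of the first k observations

n_iter, burn_in = 4000, 500
theta, lam, k = 2.0, 1.0, n // 2
k_draws = np.empty(n_iter, dtype=int)
for it in range(n_iter):
    b1 = rng.gamma(a1 + c1, 1.0 / (theta + d1))             # b1 | ...
    b2 = rng.gamma(a2 + c2, 1.0 / (lam + d2))               # b2 | ...
    theta = rng.gamma(a1 + S[k], 1.0 / (k + b1))            # theta | ...
    lam = rng.gamma(a2 + (S[n] - S[k]), 1.0 / (n - k + b2)) # lambda | ...
    # k | ... : discrete distribution over 1..n via log pi(y | k, theta, lambda)
    ks = np.arange(1, n + 1)
    logp = ks * (lam - theta) + S[ks] * (np.log(theta) - np.log(lam))
    p = np.exp(logp - logp.max())
    k = rng.choice(ks, p=p / p.sum())
    k_draws[it] = k

print(int(np.median(k_draws[burn_in:])))  # near the true change point
```

Note numpy's gamma generator is parameterized by shape and scale, so the rate parameters of the full conditionals enter as reciprocals.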


Posterior distributions:

[Figure: marginal posterior distributions of θ, λ and k]

◦ Posterior mode of k: 1891

◦ Posterior mean of θ/λ = 3.42 with 95% CI = [2.48, 4.59]


Note:

• In most published analyses of this data set b1 and b2 are given inverse gamma priors. The full conditionals are then also inverse gamma

• The results are almost the same ⇒ our analysis is a sensitivity analysis of the analyses seen in the literature

• Despite the classical full conditionals, the WinBUGS/OpenBUGS samplers for θ and λ are not standard gamma samplers but rather slice samplers. See Exercise 8.10.


Example VI.5: Osteoporosis study – Using the Gibbs sampler

Bayesian linear regression model with NI priors:

◃ Regression model: tbbmci = β0 + β1bmii + εi (i = 1, . . . , n = 234)

◃ Priors: p(β0, β1, σ2) ∝ σ−2

◃ Notation: y = (tbbmc1, . . . , tbbmc234)^T, x = (bmi1, . . . , bmi234)^T

Full conditionals:

p(σ² | β0, β1, y) = Inv-χ²(n, s²β)

p(β0 | σ², β1, y) = N(rβ1, σ²/n)

p(β1 | σ², β0, y) = N(rβ0, σ²/x^T x)

with s²β = (1/n) Σ (yi − β0 − β1 xi)², rβ1 = (1/n) Σ (yi − β1 xi), rβ0 = Σ (yi − β0) xi / x^T x


Comparison with Method of Composition:

Parameter Method of Composition

2.5% 25% 50% 75% 97.5% Mean SD

β0 0.57 0.74 0.81 0.89 1.05 0.81 0.12

β1 0.032 0.038 0.040 0.043 0.049 0.040 0.004

σ2 0.069 0.078 0.083 0.088 0.100 0.083 0.008

Gibbs sampler

2.5% 25% 50% 75% 97.5% Mean SD

β0 0.67 0.77 0.84 0.91 1.10 0.77 0.11

β1 0.030 0.036 0.040 0.042 0.046 0.039 0.0041

σ2 0.069 0.077 0.083 0.088 0.099 0.083 0.0077

◦ Method of Composition = 1,000 independently sampled values

◦ Gibbs sampler: burn-in = 500, total chain = 1,500


Index plot from Method of Composition:

[Figure: index plots of (a) β1 and (b) σ² from the Method of Composition]


Trace plot from Gibbs sampler:

[Figure: trace plots of (a) β1 and (b) σ² from the Gibbs sampler]


Trace versus index plot:

Comparison of index plot with trace plot shows:

• σ2: index plot and trace plot similar ⇒ (almost) independent sampling

• β1: trace plot shows slow mixing ⇒ quite dependent sampling

⇒ Method of Composition and Gibbs sampling: similar posterior measures of σ2

⇒ Method of Composition and Gibbs sampling: less similar posterior measures of β1


Autocorrelation:

◃ Autocorrelation of lag 1: correlation of β1^k with β1^(k−1) (k = 1, . . .)

◃ Autocorrelation of lag 2: correlation of β1^k with β1^(k−2) (k = 1, . . .)

. . .

◃ Autocorrelation of lag m: correlation of β1^k with β1^(k−m) (k = 1, . . .)

High autocorrelation:

⇒ burn-in part is larger ⇒ takes longer to forget initial positions

⇒ remaining part needs to be longer to obtain stable posterior measures
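Lag-m autocorrelation is straightforward to compute from a stored chain. A sketch on a synthetic AR(1) "chain" whose dependence mimics a slowly mixing sampler (the AR coefficient 0.9 is illustrative):

```python
import numpy as np

def autocorr(chain, lag):
    """Lag-`lag` autocorrelation of a 1-D chain."""
    x = chain - chain.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

rng = np.random.default_rng(5)
rho, n = 0.9, 20000
eps = rng.normal(size=n)
x = np.empty(n)
x[0] = eps[0]
for t in range(1, n):
    x[t] = rho * x[t - 1] + eps[t]  # strongly dependent successive values

print(autocorr(x, 1), autocorr(x, 10))  # roughly 0.9 and 0.9**10 ≈ 0.35
```

For an AR(1) chain the lag-m autocorrelation decays as ρ^m; the slower this decay, the longer the burn-in and the longer the remaining chain must be for stable posterior summaries.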


6.2.3 Remarks∗

• Full conditionals determine joint distribution

• Generate joint distribution from full conditionals

• Transition kernel


6.2.4 Review of Gibbs sampling approaches

Sampling the full conditionals is done via different algorithms depending on:

◃ Shape of full conditional (classical versus general purpose algorithm)

◃ Preference of software developer:

◦ SAS® procedures GENMOD, LIFEREG and PHREG: ARMS algorithm

◦ WinBUGS: variety of samplers

Several versions of the basic Gibbs sampler:

◃ Deterministic- or systematic-scan Gibbs sampler: the d dimensions are visited in a fixed order

◃ Block Gibbs sampler: the d dimensions are split up into m blocks of parameters and the Gibbs sampler is applied to the blocks


Review of Gibbs sampling approaches – The block Gibbs sampler

Block Gibbs sampler:

• Normal linear regression

◃ p(σ2 | β0, β1,y)

◃ p(β0, β1 | σ2,y)

• May speed up convergence considerably, at the expense of more computational time per iteration

• WinBUGS: blocking option on

• SAS® procedure MCMC: allows the user to specify the blocks


6.3 The Metropolis(-Hastings) algorithm

Metropolis-Hastings (MH) algorithm = general Markov chain Monte Carlo technique to sample from the posterior distribution that does not require full conditionals

• Special case: Metropolis algorithm proposed by Metropolis in 1953

• General case: Metropolis-Hastings algorithm proposed by Hastings in 1970

• Became popular only after introduction of Gelfand & Smith’s paper (1990)

• Further generalization: Reversible Jump MCMC algorithm by Green (1995)


6.3.1 The Metropolis algorithm

Sketch of algorithm:

• New positions are proposed by a proposal density q

• Proposed positions will be:

◃ Accepted:

◦ Proposed location has higher posterior probability: with probability 1

◦ Otherwise: with probability proportional to ratio of posterior probabilities

◃ Rejected:

◦ Otherwise

• Algorithm satisfies again Markov property ⇒ MCMC algorithm

• Similarity with AR algorithm


Metropolis algorithm:

Chain is at θ^k ⇒ the Metropolis algorithm samples the value θ^(k+1) as follows:

1. Sample a candidate θ̃ from the symmetric proposal density q(θ̃ | θ^k)

2. The next value θ^(k+1) will be equal to:

• θ̃ with probability α(θ^k, θ̃) (accept proposal),

• θ^k otherwise (reject proposal),

with

α(θ^k, θ̃) = min( r = p(θ̃ | y) / p(θ^k | y), 1 )

Function α(θ^k, θ̃) = probability of a move
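A sketch of the Metropolis algorithm for a single parameter: the posterior of µ for normal data with known σ and a flat prior, so only the unnormalized prior × likelihood is needed (data and tuning constants below are illustrative). The acceptance test is done on the log scale.

```python
import numpy as np

rng = np.random.default_rng(11)
y = rng.normal(7.1, 1.4, size=100)
sigma = 1.4  # treated as known for this illustration

def log_post(mu):
    # log(prior x likelihood) up to a constant; flat prior on mu
    return -0.5 * np.sum((y - mu) ** 2) / sigma ** 2

n_iter, burn_in, step = 5000, 500, 0.3
mu = np.empty(n_iter)
cur, cur_lp = 6.0, log_post(6.0)
accepted = 0
for k in range(n_iter):
    prop = rng.normal(cur, step)                  # symmetric proposal q
    prop_lp = log_post(prop)
    if np.log(rng.uniform()) < prop_lp - cur_lp:  # accept with probability min(r, 1)
        cur, cur_lp = prop, prop_lp
        accepted += 1
    mu[k] = cur                                    # on rejection the chain stays put

print(accepted / n_iter, mu[burn_in:].mean())
```

With a flat prior the exact posterior is N(ȳ, σ²/n), so the chain mean after burn-in can be checked against the sample mean; the acceptance rate is controlled by the proposal scale `step`.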


The MH algorithm only requires the product of the prior and the likelihood to sample from the posterior


Example VI.7: SAP study – Metropolis algorithm for NI prior case

Settings as in Example VI.1, now apply Metropolis algorithm:

◃ Proposal density: N(θ^k, Σ) with θ^k = (µ^k, (σ²)^k)^T and Σ = diag(0.03, 0.03)

[Figure: Metropolis sampling in the (µ, σ²)-plane, panels (a) and (b)]

◦ Jumps to any location in the (µ, σ2)-plane

◦ Burn-in = 500, total chain = 1,500


MH-sampling:

[Figure: three successive snapshots of MH sampling in the (µ, σ²)-plane]


Marginal posterior distributions:

[Figure: marginal posterior distributions of (a) µ and (b) σ²]

◦ Acceptance rate = 40%

◦ Burn-in = 500, total chain = 1,500


Trace plots:

[Figure: trace plots of (a) µ and (b) σ² for iterations 500–1,500]

◦ Accepted moves shown in blue, rejected moves in red


Second choice of proposal density:

◃ Proposal density: N(θ^k, Σ) with θ^k = (µ^k, (σ²)^k)^T and Σ = diag(0.001, 0.001)

[Figure: (a) sampled positions in the (µ, σ²)-plane; (b) marginal posterior of σ²]

◦ Acceptance rate = 84%

◦ Poor approximation of true distribution


Accepted + rejected positions:

[Figure: accepted (blue) and rejected (red) positions in the (µ, σ²)-plane for proposal variances 0.03, 0.001 and 0.1]


Problem:

What should be the acceptance rate for a good Metropolis algorithm?

From theoretical work + simulations:

• Optimal acceptance rate: ≈ 45% for d = 1 and ≈ 24% for d > 1
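The effect of the proposal scale on the acceptance rate is easy to verify empirically. A sketch with an assumed standard-normal target: a tiny scale accepts almost everything (but barely moves), a very large scale rejects most proposals, and an intermediate scale lands near the recommended range.

```python
import numpy as np

def acc_rate(scale, n_iter=20000, seed=0):
    """Acceptance rate of random-walk Metropolis on a N(0, 1) target."""
    rng = np.random.default_rng(seed)
    theta, accepted = 0.0, 0
    for _ in range(n_iter):
        proposal = theta + scale * rng.standard_normal()
        # log acceptance ratio for the standard-normal target
        if np.log(rng.uniform()) < 0.5 * (theta ** 2 - proposal ** 2):
            theta, accepted = proposal, accepted + 1
    return accepted / n_iter

for scale in (0.1, 2.4, 10.0):
    print(scale, acc_rate(scale))
```

A high acceptance rate alone is therefore no guarantee of good mixing, as the 84% case on the previous slide illustrates.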


6.3.2 The Metropolis-Hastings algorithm

Metropolis-Hastings algorithm:

Chain is at θ^k ⇒ Metropolis-Hastings algorithm samples value θ^(k+1) as follows:

1. Sample a candidate θ̃ from the (asymmetric) proposal density q(θ̃ | θ), with θ = θ^k

2. The next value θ^(k+1) will be equal to:

• θ̃ with probability α(θ^k, θ̃) (accept proposal),

• θ^k otherwise (reject proposal),

with

α(θ^k, θ̃) = min( r = [p(θ̃ | y) q(θ^k | θ̃)] / [p(θ^k | y) q(θ̃ | θ^k)], 1 )


• Reversibility condition: probability of a move from θ to θ̃ = probability of a move from θ̃ to θ

• Reversible chain: chain satisfying reversibility condition

• Example of an asymmetric proposal density: q(θ̃ | θ^k) ≡ q(θ̃) (Independent MH algorithm)

• WinBUGS makes use of a univariate MH algorithm to sample from some non-standard full conditionals


Example VI.8: Sampling a t-distribution using Independent MH algorithm

Target distribution: t₃(3, 2²)-distribution

(a) Independent MH algorithm with proposal density N(3, 4²)

(b) Independent MH algorithm with proposal density N(3, 2²)

[Figure: histograms of the sampled values against the t₃(3, 2²) target density, panels (a) and (b)]
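Case (a) of this example can be sketched as follows. Assumptions: both densities are coded only up to their normalizing constants, and because the proposal q does not depend on the current state, the MH ratio needs the correction factor q(θ^k)/q(θ̃).

```python
import numpy as np

def log_t(theta, mu=3.0, s=2.0, df=3):
    """Log density (up to a constant) of the t_df(mu, s^2) target."""
    z = (theta - mu) / s
    return -(df + 1) / 2 * np.log1p(z * z / df)

def log_q(theta, mu=3.0, s=4.0):
    """Log density (up to a constant) of the fixed N(3, 4^2) proposal."""
    return -0.5 * ((theta - mu) / s) ** 2

rng = np.random.default_rng(7)
theta, chain = 3.0, []
for _ in range(20000):
    proposal = rng.normal(3.0, 4.0)   # independent of the current state
    # MH ratio with proposal correction: p(prop) q(theta) / [p(theta) q(prop)]
    log_r = log_t(proposal) - log_t(theta) + log_q(theta) - log_q(proposal)
    if np.log(rng.uniform()) < log_r:
        theta = proposal
    chain.append(theta)
chain = np.array(chain[2000:])
print(np.median(chain))               # ≈ 3, the target's center
```

The N(3, 4²) proposal is overdispersed relative to the target, which is why case (a) works well; the narrower N(3, 2²) proposal of case (b) undercovers the heavy t-tails.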


6.3.3 Remarks*

• The Gibbs sampler is a special case of the Metropolis-Hastings algorithm, but the Gibbs sampler is still treated differently

• The transition kernel of the MH-algorithm

• The reversibility condition

• Difference with AR algorithm


6.5 Choice of the sampler

Choice of the sampler depends on a variety of considerations


Example VI.9: Caries study – MCMC approaches for logistic regression

Subset of n = 500 children of the Signal-Tandmobiel® study at the 1st examination:

◃ Research questions:

◦ Do girls have a different risk of developing caries experience (CE) than boys (gender) in the first year of primary school?

◦ Is there an east-west gradient (x-coordinate) in CE?

◃ Bayesian model: logistic regression + N(0, 100²) priors for regression coefficients

◃ No standard full conditionals

◃ Three algorithms:

◦ Self-written R program: evaluate full conditionals on a grid + ICDF-method

◦ WinBUGS program: multivariate MH algorithm (blocking mode on)

◦ SAS® procedure MCMC: Random-Walk MH algorithm
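The random-walk MH approach for this model can be sketched in a few lines. This is a hedged illustration, not the course's SAS or R code: the caries data are not available, so a synthetic data set with an assumed x-coordinate range is used; the N(0, 100²) priors follow the slide, while the per-coefficient proposal scales are ad-hoc tuning choices.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
# Synthetic stand-in for the caries data: intercept, gender (0/1), x-coordinate
X = np.column_stack([np.ones(n), rng.integers(0, 2, n), rng.uniform(0, 260, n)])
beta_true = np.array([-0.6, 0.0, 0.005])
y = rng.uniform(size=n) < 1 / (1 + np.exp(-X @ beta_true))

def log_post(beta):
    """Bernoulli log-likelihood plus independent N(0, 100^2) priors."""
    eta = X @ beta
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))
    return loglik - np.sum(beta ** 2) / (2 * 100.0 ** 2)

beta = np.zeros(3)
prop_sd = np.array([0.1, 0.1, 0.001])   # ad-hoc per-coefficient proposal sds
chain = []
for _ in range(5000):
    proposal = beta + prop_sd * rng.standard_normal(3)
    if np.log(rng.uniform()) < log_post(proposal) - log_post(beta):
        beta = proposal
    chain.append(beta)
chain = np.array(chain[1000:])
print(chain.mean(axis=0))               # posterior means of the coefficients
```

No conjugate full conditionals exist here, which is exactly why all three software approaches resort to grid evaluation or MH updates.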


Program    Parameter    Mode       Mean       SD        Median     MCSE
MLE        Intercept    -0.5900               0.2800
           gender       -0.0379               0.1810
           x-coord       0.0052               0.0017
R          Intercept               -0.5880    0.2840    -0.5860    0.0104
           gender                  -0.0516    0.1850    -0.0578    0.0071
           x-coord                  0.0052    0.0017     0.0052    6.621E-5
WinBUGS    Intercept               -0.5800    0.2810    -0.5730    0.0094
           gender                  -0.0379    0.1770    -0.0324    0.0060
           x-coord                  0.0052    0.0018     0.0053    5.901E-5
SAS®       Intercept               -0.6530    0.2600    -0.6450    0.0317
           gender                  -0.0319    0.1950    -0.0443    0.0208
           x-coord                  0.0055    0.0016     0.0055    0.00016


Conclusions:

• Posterior means/medians of the three samplers are close (to the MLE)

• The precision with which the posterior mean was determined (high precision = low MCSE) differs considerably

• The clinical conclusion was the same

⇒ Samplers may have quite a different efficiency


Take home messages

• The two MCMC approaches allow fitting basically any proposed model

• There is no free lunch: computation time can be MUCH longer than with likelihood approaches

• The choice between Gibbs sampling and the Metropolis-Hastings approachdepends on computational and practical considerations
