Hierarchical Modeling with Longitudinal (Panel) Data Blocking Steps in Mixed Models
Gibbs Sampling in Hierarchical Models
Econ 690
Purdue University
March 19, 2012
Justin L. Tobias Hierarchical Models
In many models in economics and statistics, it is natural to introduce some kind of structure relating the parameters of the model.
For example, one might wish to express some degree of “similarity” across parameters of a model by assuming that they are drawn from a common population distribution.
The parameters of the population distribution are also of interest.
In Bayesian terms, such specifications can be accommodated by the appropriate choice of priors on the model parameters, while in frequentist parlance, these are termed “random coefficient” models.
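Consider the following most basic version of a longitudinal (panel) data model:
$$y_{it} = \alpha_i + \varepsilon_{it}, \quad \varepsilon_{it} \overset{iid}{\sim} N(0, \sigma^2_\varepsilon), \qquad \alpha_i \overset{iid}{\sim} N(\alpha, \sigma^2_\alpha).$$
In the above, $y_{it}$ refers to the outcome for individual (or, more generally, group) $i$ at time $t$, and $\alpha_i$ is a person-specific (or group-specific) random effect.
We assume $i = 1, 2, \ldots, N$ and $t = 1, 2, \ldots, T$ (i.e., a balanced panel).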
For this model, we will do the following:
(a) Comment on how the presence of the random effects $\alpha_i$ accounts for correlation patterns within individuals over time.
(b) Derive the conditional posterior distribution $p(\alpha_i | \alpha, \sigma^2_\varepsilon, \sigma^2_\alpha, y)$.
(c) Obtain the mean of the conditional posterior distribution in (b). Comment on its relationship to a shrinkage estimator. (These are estimators that are typically written as some sort of weighted average of a “data” term and a prior term.) How does the mean change as $T$ and $\sigma^2_\varepsilon/\sigma^2_\alpha$ change?
(a) Conditional on the random effects $\{\alpha_i\}_{i=1}^N$, the $y_{it}$ are independent.
However, marginalized over the random effects, outcomes are correlated within individuals over time.
To see this, note that we can write our model equivalently as:
$$y_{it} = \alpha + u_i + \varepsilon_{it},$$
where we have rewritten our “random effect” specification as
$$\alpha_i = \alpha + u_i, \quad u_i \overset{iid}{\sim} N(0, \sigma^2_\alpha).$$
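Thus, for $t \neq s$,
$$\begin{aligned} \mathrm{Cov}(y_{it}, y_{is} | \alpha, \sigma^2_\varepsilon, \sigma^2_\alpha) &= \mathrm{Cov}(u_i + \varepsilon_{it}, u_i + \varepsilon_{is}) \\ &= \mathrm{Cov}(u_i, u_i) \\ &= \mathrm{Var}(u_i) \\ &= \sigma^2_\alpha, \end{aligned}$$
so that outcomes are correlated over time within individuals.
However, the model does not permit any degree of correlation between the outcomes of different individuals.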
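The covariance result can be checked numerically. The following is a minimal simulation sketch, not part of the original notes: the function names `simulate_panel` and `sample_cov`, the parameter values, and the sample size are all illustrative assumptions.

```python
import random

def simulate_panel(n_units, n_periods, alpha, sigma2_alpha, sigma2_eps, seed=0):
    """Simulate y_it = alpha + u_i + eps_it for a balanced panel."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_units):
        u_i = rng.gauss(0.0, sigma2_alpha ** 0.5)  # shared across t for unit i
        panel.append([alpha + u_i + rng.gauss(0.0, sigma2_eps ** 0.5)
                      for _ in range(n_periods)])
    return panel

def sample_cov(xs, ys):
    """Unbiased sample covariance of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

panel = simulate_panel(50_000, 2, alpha=1.0, sigma2_alpha=2.0, sigma2_eps=1.0)
y1 = [row[0] for row in panel]
y2 = [row[1] for row in panel]

# Same unit, different periods: should be close to sigma2_alpha = 2.
within_cov = sample_cov(y1, y2)
# Different units (unit i at t=1 paired with unit i+1 at t=2): should be close to 0.
between_cov = sample_cov(y1[:-1], y2[1:])
print(within_cov, between_cov)
```

With a large number of units, the within-unit sample covariance should settle near $\sigma^2_\alpha$ while the between-unit covariance should settle near zero.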
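(b) From previous results relating to the derivation of conditional posterior distributions for regression parameters in a linear model, we can obtain:
$$\alpha_i | \alpha, \sigma^2_\varepsilon, \sigma^2_\alpha, y \sim N(Dd, D),$$
where
$$D = \left(\iota_T'\iota_T/\sigma^2_\varepsilon + 1/\sigma^2_\alpha\right)^{-1} = \left(T/\sigma^2_\varepsilon + 1/\sigma^2_\alpha\right)^{-1}$$
and
$$d = \iota_T'y_i/\sigma^2_\varepsilon + \alpha/\sigma^2_\alpha,$$
with $\iota_T$ denoting a $T \times 1$ vector of ones and $y_i = [y_{i1} \; y_{i2} \cdots y_{iT}]'$.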
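The intermediate step behind this result can be sketched by collecting the terms in $\alpha_i$ from the likelihood of $y_{i1}, \ldots, y_{iT}$ and from the second-stage prior $\alpha_i \sim N(\alpha, \sigma^2_\alpha)$. The display below is a standard completing-the-square argument added for illustration, with $D = (T/\sigma^2_\varepsilon + 1/\sigma^2_\alpha)^{-1}$ and $d = \sum_t y_{it}/\sigma^2_\varepsilon + \alpha/\sigma^2_\alpha$:

```latex
\begin{aligned}
p(\alpha_i \mid \alpha, \sigma^2_\varepsilon, \sigma^2_\alpha, y)
 &\propto \exp\left(-\frac{1}{2\sigma^2_\varepsilon}\sum_{t=1}^{T}(y_{it}-\alpha_i)^2\right)
          \exp\left(-\frac{1}{2\sigma^2_\alpha}(\alpha_i-\alpha)^2\right) \\
 &\propto \exp\left(-\frac{1}{2}\left[\left(\frac{T}{\sigma^2_\varepsilon}+\frac{1}{\sigma^2_\alpha}\right)\alpha_i^2
          - 2\alpha_i\left(\frac{\sum_{t=1}^{T} y_{it}}{\sigma^2_\varepsilon}+\frac{\alpha}{\sigma^2_\alpha}\right)\right]\right) \\
 &\propto \exp\left(-\frac{1}{2D}\,(\alpha_i - Dd)^2\right).
\end{aligned}
```

The last line is the kernel of a normal density with mean $Dd$ and variance $D$.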
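(c) The mean of this conditional posterior distribution is easily obtained from our solution in (b):
$$E(\alpha_i | \alpha, \sigma^2_\varepsilon, \sigma^2_\alpha, y) = Dd = \frac{T\bar{y}_i + (\sigma^2_\varepsilon/\sigma^2_\alpha)\alpha}{T + (\sigma^2_\varepsilon/\sigma^2_\alpha)}, \qquad \bar{y}_i \equiv \frac{1}{T}\sum_{t=1}^{T} y_{it}.$$
Let
$$w = w(T, [\sigma^2_\varepsilon/\sigma^2_\alpha]) \equiv \frac{T}{T + (\sigma^2_\varepsilon/\sigma^2_\alpha)}.$$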
We can then write
$$E(\alpha_i | \alpha, \sigma^2_\varepsilon, \sigma^2_\alpha, y) = w\bar{y}_i + (1 - w)\alpha.$$
This is in the form of a shrinkage estimator, where the conditional posterior mean of $\alpha_i$ is a weighted average of the averaged outcomes for individual $i$, $\bar{y}_i$, and the common mean for all individuals, $\alpha$.
As $T \to \infty$, the weight $w$ places all mass on $\bar{y}_i$.
On the other hand, if $\sigma^2_\varepsilon$ is large relative to $\sigma^2_\alpha$ (and $T$ is small or moderate), the common mean $\alpha$ will get substantial weight.
The “fixed effect” formulation of this model is often criticized for overfitting.
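The weighted-average form can be verified numerically. The sketch below is an illustration rather than part of the original notes; the helper names and the numbers for one hypothetical individual are assumptions. It computes the posterior mean both as $Dd$ and as $w\bar{y}_i + (1-w)\alpha$, and shows how $w$ moves with $T$.

```python
def shrinkage_weight(T, sigma2_eps, sigma2_alpha):
    """Weight w = T / (T + sigma2_eps / sigma2_alpha) on the unit's own mean."""
    return T / (T + sigma2_eps / sigma2_alpha)

def conditional_posterior_mean(y_i, alpha, sigma2_eps, sigma2_alpha):
    """Mean D*d of alpha_i's conditional posterior, computed from D and d directly."""
    T = len(y_i)
    D = 1.0 / (T / sigma2_eps + 1.0 / sigma2_alpha)
    d = sum(y_i) / sigma2_eps + alpha / sigma2_alpha
    return D * d

y_i = [4.0, 6.0, 5.0, 7.0]          # hypothetical outcomes for one unit (mean 5.5)
alpha, sigma2_eps, sigma2_alpha = 2.0, 8.0, 2.0

w = shrinkage_weight(len(y_i), sigma2_eps, sigma2_alpha)
y_bar = sum(y_i) / len(y_i)
shrunk = w * y_bar + (1 - w) * alpha
direct = conditional_posterior_mean(y_i, alpha, sigma2_eps, sigma2_alpha)

# The weight approaches 1 as T grows with the variance ratio held fixed.
weights = [shrinkage_weight(T, sigma2_eps, sigma2_alpha) for T in (1, 10, 100, 1000)]
print(w, shrunk, direct, weights)
```

Here the variance ratio is 4 and $T = 4$, so the individual mean and the common mean receive equal weight.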
Posterior Simulation in a Panel Model
We illustrate the use of the Gibbs sampler in such models with the celebrated “rat growth dataset” of Gelfand et al. (1990).
Thirty different rats are weighed at five different points in time.
We denote the weight of rat $i$ at measurement $j$ as $y_{ij}$ and let $x_{ij}$ denote the age of the $i$th rat at the $j$th measurement.
Since each of the rats was weighed at exactly the same number of days since birth, we have
$$x_{i1} = 8, \; x_{i2} = 15, \; x_{i3} = 22, \; x_{i4} = 29, \; x_{i5} = 36 \quad \forall i.$$
The rat growth data set is provided below:
Rat Growth Data from Gelfand et al. (1990): rat weight measurements
  i  y_i1 y_i2 y_i3 y_i4 y_i5  |   i  y_i1 y_i2 y_i3 y_i4 y_i5
  1   151  199  246  283  320  |  16   160  207  248  288  324
  2   145  199  249  293  354  |  17   142  187  234  280  316
  3   147  214  263  312  328  |  18   156  203  243  283  317
  4   155  200  237  272  297  |  19   157  212  259  307  336
  5   135  188  230  280  323  |  20   152  203  246  286  321
  6   159  210  252  298  331  |  21   154  205  253  298  334
  7   141  189  231  275  305  |  22   139  190  225  267  302
  8   159  201  248  297  338  |  23   146  191  229  272  302
  9   177  236  285  340  376  |  24   157  211  250  285  323
 10   134  182  220  260  296  |  25   132  185  237  286  331
 11   160  208  261  313  352  |  26   160  207  257  303  345
 12   143  188  220  273  314  |  27   169  216  261  295  333
 13   154  200  244  289  325  |  28   157  205  248  289  316
 14   171  221  270  326  358  |  29   137  180  219  258  291
 15   163  216  242  281  312  |  30   153  200  244  286  324
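In our model, we want to permit unit-specific variation in birth weights and growth rates.
This leads us to specify the following model:
$$y_{ij} = \alpha_i + \beta_i x_{ij} + \varepsilon_{ij}, \quad \varepsilon_{ij} \overset{iid}{\sim} N(0, \sigma^2),$$
so that each rat possesses its own intercept $\alpha_i$ and growth rate $\beta_i$.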
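A useful benchmark for the hierarchical model is to fit a separate least-squares line to each rat, with no pooling across rats. The sketch below does this for the first three rats in the data table; the helper name `ols_line` is an illustrative assumption, not something from the original notes.

```python
AGES = [8, 15, 22, 29, 36]           # days since birth, identical for every rat
WEIGHTS = {                          # first three rats from the data table
    1: [151, 199, 246, 283, 320],
    2: [145, 199, 249, 293, 354],
    3: [147, 214, 263, 312, 328],
}

def ols_line(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope

ols = {rat: ols_line(AGES, w) for rat, w in WEIGHTS.items()}
for rat, (a_hat, b_hat) in ols.items():
    print(f"rat {rat}: intercept {a_hat:.2f}, growth rate {b_hat:.3f}")
```

Each rat's line uses only its own five observations, which is what the hierarchical prior later relaxes by shrinking the estimates toward population values.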
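We also assume that the rats share some degree of “similarity” in their weight at birth and rates of growth.
Thus, we assume that the intercept and slope parameters are drawn from a common Normal population:
$$\theta_i \equiv \begin{bmatrix} \alpha_i \\ \beta_i \end{bmatrix} \overset{iid}{\sim} N(\theta_0, \Sigma), \qquad \theta_0 \equiv \begin{bmatrix} \alpha_0 \\ \beta_0 \end{bmatrix}.$$
We interpret $\alpha_0 = \theta_0(1)$ as the population average weight at birth and $\beta_0 = \theta_0(2)$ as the population average growth rate.
The diagonal elements of $\Sigma$ quantify the variation around these population means. (What would we expect the sign of $\Sigma$'s off-diagonal element to be?)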
We complete our Bayesian analysis by specifying the following priors:
$$\sigma^2 | a, b \sim IG(a, b)$$
$$\theta_0 | \eta, C \sim N(\eta, C)$$
$$\Sigma^{-1} | \rho, R \sim W([\rho R]^{-1}, \rho),$$
with $W$ denoting the Wishart distribution.
We now seek to describe how the Gibbs sampler can be employed to fit this hierarchical model.
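One practical detail when coding these priors is the parameterization of $IG(a, b)$. The inverse-gamma conditionals given for the mixed model later in these notes have second arguments of the form $[b^{-1} + \cdots]^{-1}$, which is consistent with the convention $p(\sigma^2) \propto (\sigma^2)^{-(a+1)}\exp(-1/[b\sigma^2])$, that is, $1/\sigma^2 \sim \text{Gamma}$ with shape $a$ and scale $b$. Under that assumption (inferred here, not stated explicitly in the source), a draw can be sketched as follows; `draw_inverse_gamma` is a hypothetical helper name, and the values $a = 3$, $b = 1/40$ are the ones used later for the rat data.

```python
import random

def draw_inverse_gamma(a, b, rng):
    """Draw sigma2 ~ IG(a, b), assuming 1/sigma2 ~ Gamma(shape=a, scale=b)."""
    precision = rng.gammavariate(a, b)   # gammavariate(alpha, beta): beta is the scale
    return 1.0 / precision

rng = random.Random(0)
a, b = 3.0, 1.0 / 40.0
draws = [draw_inverse_gamma(a, b, rng) for _ in range(100_000)]

# Under this convention E(1/sigma2) = a*b and E(sigma2) = 1 / (b*(a - 1)) for a > 1.
mean_precision = sum(1.0 / s for s in draws) / len(draws)
prior_mean_sigma2 = 1.0 / (b * (a - 1.0))
print(mean_precision, prior_mean_sigma2)
```

Under this reading, the prior with $a = 3$ and $b = 1/40$ centers $\sigma^2$ at 20.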
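Given the assumed conditional independence across observations, the joint posterior distribution for all the parameters of this model can be written as:
$$p(\Gamma | y) \propto \left[\prod_{i=1}^{30} p(y_i | X_i, \theta_i, \sigma^2)\, p(\theta_i | \theta_0, \Sigma^{-1})\right] p(\theta_0 | \eta, C)\, p(\sigma^2 | a, b)\, p(\Sigma^{-1} | \rho, R),$$
where $\Gamma \equiv [\{\theta_i\}, \theta_0, \Sigma^{-1}, \sigma^2]$ denotes all the parameters of the model. We have stacked the observations over time for each individual rat so that
$$y_i = \begin{bmatrix} y_{i1} \\ y_{i2} \\ \vdots \\ y_{i5} \end{bmatrix} \quad \text{and} \quad X_i = \begin{bmatrix} 1 & x_{i1} \\ 1 & x_{i2} \\ \vdots & \vdots \\ 1 & x_{i5} \end{bmatrix}.$$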
Fitting this model via the Gibbs sampler requires the derivation of four posterior conditional distributions:
1. $p(\theta_i | \Gamma_{-\theta_i}, y)$.
2. $p(\theta_0 | \Gamma_{-\theta_0}, y)$.
3. $p(\sigma^2 | \Gamma_{-\sigma^2}, y)$.
4. $p(\Sigma^{-1} | \Gamma_{-\Sigma^{-1}}, y)$.
We will derive each of these densities.
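As for the complete posterior conditional for $\theta_i$, we note:
$$p(\theta_i | \Gamma_{-\theta_i}, y) \propto p(y_i | X_i, \theta_i, \sigma^2)\, p(\theta_i | \theta_0, \Sigma^{-1}).$$
This fits directly into our standard linear regression result, applying Lindley and Smith (1972):
$$\theta_i | \Gamma_{-\theta_i}, y \sim N(D_{\theta_i} d_{\theta_i}, D_{\theta_i}),$$
where
$$D_{\theta_i} = \left(X_i'X_i/\sigma^2 + \Sigma^{-1}\right)^{-1}, \qquad d_{\theta_i} = X_i'y_i/\sigma^2 + \Sigma^{-1}\theta_0.$$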
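As for the posterior conditional for $\theta_0$, we first obtain
$$p(\theta_0 | \Gamma_{-\theta_0}, y) \propto \left[\prod_{i=1}^{30} p(\theta_i | \theta_0, \Sigma^{-1})\right] p(\theta_0 | \eta, C).$$
Since the second stage of our model specifies $p(\theta_i | \theta_0, \Sigma^{-1})$ as iid, we can write
$$\begin{bmatrix} \theta_1 \\ \theta_2 \\ \vdots \\ \theta_{30} \end{bmatrix} = \begin{bmatrix} I_2 \\ I_2 \\ \vdots \\ I_2 \end{bmatrix} \theta_0 + \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_{30} \end{bmatrix},$$
where $u_i \overset{iid}{\sim} N(0, \Sigma)$.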
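Equivalently, we can write:
$$\theta = \tilde{I}\theta_0 + u,$$
with $\theta = [\theta_1' \; \theta_2' \cdots \theta_{30}']'$, $\tilde{I} = [I_2 \; I_2 \cdots I_2]'$, $u = [u_1' \; u_2' \cdots u_{30}']'$ and $E(uu') = I_{30} \otimes \Sigma$.
In this form, we can again apply our well-known result to obtain:
$$\theta_0 | \Gamma_{-\theta_0}, y \sim N(D_{\theta_0} d_{\theta_0}, D_{\theta_0}),$$
where
$$D_{\theta_0} = \left(\tilde{I}'(I_{30} \otimes \Sigma^{-1})\tilde{I} + C^{-1}\right)^{-1} = \left(30\Sigma^{-1} + C^{-1}\right)^{-1},$$
$$d_{\theta_0} = \tilde{I}'(I_{30} \otimes \Sigma^{-1})\theta + C^{-1}\eta = 30\Sigma^{-1}\bar{\theta} + C^{-1}\eta,$$
where $\bar{\theta} = (1/30)\sum_{i=1}^{30} \theta_i$.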
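As for the posterior conditional for $\sigma^2$, we obtain
$$p(\sigma^2 | \Gamma_{-\sigma^2}, y) \propto p(\sigma^2 | a, b) \prod_{i=1}^{30} p(y_i | X_i, \theta_i, \sigma^2) \propto (\sigma^2)^{-(N/2 + a + 1)} \exp\left(-\frac{1}{\sigma^2}\left[b^{-1} + \frac{1}{2}\sum_{i=1}^{30}(y_i - X_i\theta_i)'(y_i - X_i\theta_i)\right]\right).$$
Thus,
$$\sigma^2 | \Gamma_{-\sigma^2}, y \sim IG\left(\frac{N}{2} + a, \left[b^{-1} + \frac{1}{2}\sum_{i=1}^{30}(y_i - X_i\theta_i)'(y_i - X_i\theta_i)\right]^{-1}\right),$$
where $N = 5(30) = 150$.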
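Finally, for the posterior conditional for $\Sigma^{-1}$, we obtain
$$p(\Sigma^{-1} | \Gamma_{-\Sigma^{-1}}, y) \propto p(\Sigma^{-1} | \rho, R) \prod_{i=1}^{30} p(\theta_i | \theta_0, \Sigma^{-1}) \propto |\Sigma^{-1}|^{(30 + \rho - 3)/2} \exp\left(-\frac{1}{2}\mathrm{tr}\left\{\left[\sum_{i=1}^{30}(\theta_i - \theta_0)(\theta_i - \theta_0)' + \rho R\right]\Sigma^{-1}\right\}\right).$$
Therefore,
$$\Sigma^{-1} | \Gamma_{-\Sigma^{-1}}, y \sim W\left(\left[\sum_{i=1}^{30}(\theta_i - \theta_0)(\theta_i - \theta_0)' + \rho R\right]^{-1}, 30 + \rho\right).$$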
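Cycling through the four conditionals gives the Gibbs sampler. The following is a minimal sketch rather than the code used for the results in these notes. It rests on several assumptions: $IG(a, b)$ is read as $1/\sigma^2 \sim \text{Gamma}(\text{shape } a, \text{scale } b)$; the Wishart draw uses the sum-of-outer-products construction, which requires integer degrees of freedom; the hyperparameter values are the ones reported for the rat application; and the starting values, chain length, and all helper names are illustrative. It is written with the standard library only, so the $2 \times 2$ algebra is spelled out by hand.

```python
import random

AGES = [8, 15, 22, 29, 36]          # days since birth, same for every rat
FLAT = [151,199,246,283,320, 145,199,249,293,354, 147,214,263,312,328,
        155,200,237,272,297, 135,188,230,280,323, 159,210,252,298,331,
        141,189,231,275,305, 159,201,248,297,338, 177,236,285,340,376,
        134,182,220,260,296, 160,208,261,313,352, 143,188,220,273,314,
        154,200,244,289,325, 171,221,270,326,358, 163,216,242,281,312,
        160,207,248,288,324, 142,187,234,280,316, 156,203,243,283,317,
        157,212,259,307,336, 152,203,246,286,321, 154,205,253,298,334,
        139,190,225,267,302, 146,191,229,272,302, 157,211,250,285,323,
        132,185,237,286,331, 160,207,257,303,345, 169,216,261,295,333,
        157,205,248,289,316, 137,180,219,258,291, 153,200,244,286,324]
Y = [FLAT[5 * i:5 * i + 5] for i in range(30)]   # Y[i] = five weights of rat i+1

def inv2(m):
    """Inverse of a 2x2 matrix stored as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mat_vec(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

def add(m1, m2):
    return [[m1[r][c] + m2[r][c] for c in range(2)] for r in range(2)]

def scale(m, k):
    return [[k * m[r][c] for c in range(2)] for r in range(2)]

def outer(v):
    return [[v[0] * v[0], v[0] * v[1]], [v[1] * v[0], v[1] * v[1]]]

def draw_mvn2(mean, cov, rng):
    """Bivariate normal draw using the Cholesky factor of cov."""
    l11 = cov[0][0] ** 0.5
    l21 = cov[1][0] / l11
    l22 = max(cov[1][1] - l21 ** 2, 0.0) ** 0.5
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return [mean[0] + l11 * z1, mean[1] + l21 * z1 + l22 * z2]

def gibbs_rats(n_iter=2000, burn=500, seed=0):
    rng = random.Random(seed)
    eta, C_inv = [100.0, 15.0], [[1 / 40**2, 0.0], [0.0, 1 / 10**2]]
    rho, R = 5, [[10.0**2, 0.0], [0.0, 0.5**2]]
    a, b = 3.0, 1.0 / 40.0
    n, T = len(Y), len(AGES)
    XtX = [[T, sum(AGES)], [sum(AGES), sum(x * x for x in AGES)]]
    Xty = [[sum(y), sum(x * v for x, v in zip(AGES, y))] for y in Y]
    theta0, sigma2, Sigma_inv = eta[:], 50.0, inv2(R)   # arbitrary starting values
    keep = []
    for it in range(n_iter):
        # 1. theta_i: N(D d, D), D = (X'X/sigma2 + Sigma^-1)^-1, d = X'y_i/sigma2 + Sigma^-1 theta0
        D = inv2(add(scale(XtX, 1.0 / sigma2), Sigma_inv))
        prior_part = mat_vec(Sigma_inv, theta0)
        thetas = []
        for i in range(n):
            d = [Xty[i][0] / sigma2 + prior_part[0], Xty[i][1] / sigma2 + prior_part[1]]
            thetas.append(draw_mvn2(mat_vec(D, d), D, rng))
        # 2. theta0: N(D0 d0, D0), D0 = (n Sigma^-1 + C^-1)^-1, d0 = n Sigma^-1 theta_bar + C^-1 eta
        theta_bar = [sum(t[0] for t in thetas) / n, sum(t[1] for t in thetas) / n]
        D0 = inv2(add(scale(Sigma_inv, n), C_inv))
        d0 = [u + v for u, v in zip(mat_vec(scale(Sigma_inv, n), theta_bar),
                                    mat_vec(C_inv, eta))]
        theta0 = draw_mvn2(mat_vec(D0, d0), D0, rng)
        # 3. sigma2: IG(nT/2 + a, [1/b + SSE/2]^-1), drawn as 1 / Gamma(shape, scale)
        sse = sum((y - th[0] - th[1] * x) ** 2
                  for th, ys in zip(thetas, Y) for x, y in zip(AGES, ys))
        sigma2 = 1.0 / rng.gammavariate(n * T / 2 + a, 1.0 / (1.0 / b + 0.5 * sse))
        # 4. Sigma^-1: W([S + rho R]^-1, n + rho), S = sum_i (theta_i - theta0)(theta_i - theta0)'
        S = scale(R, rho)
        for th in thetas:
            S = add(S, outer([th[0] - theta0[0], th[1] - theta0[1]]))
        W_scale = inv2(S)
        Sigma_inv = [[0.0, 0.0], [0.0, 0.0]]
        for _ in range(n + rho):   # integer degrees of freedom: sum of outer products
            Sigma_inv = add(Sigma_inv, outer(draw_mvn2([0.0, 0.0], W_scale, rng)))
        if it >= burn:
            keep.append((theta0[0], theta0[1], sigma2))
    return keep

draws = gibbs_rats()
post_mean_alpha0 = sum(d[0] for d in draws) / len(draws)
post_mean_beta0 = sum(d[1] for d in draws) / len(draws)
print(post_mean_alpha0, post_mean_beta0)
```

The posterior means of $\alpha_0$ and $\beta_0$ from such a run can be compared with the values in the table of posterior quantities reported for this application; a shorter chain than the 10,000 iterations used in the notes is assumed here purely to keep the example quick.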
We fit this model using priors of the forms:
$$\eta = \begin{bmatrix} 100 \\ 15 \end{bmatrix}, \quad C = \begin{bmatrix} 40^2 & 0 \\ 0 & 10^2 \end{bmatrix}, \quad \rho = 5, \quad R = \begin{bmatrix} 10^2 & 0 \\ 0 & .5^2 \end{bmatrix}, \quad a = 3, \quad b = 1/40.$$
The sampler was run for 10,000 iterations, and the first 500 were discarded as the burn-in.
In the figure and table that follow, we provide some suggestive evidence of rapid convergence, and also that the chain tends to mix reasonably well.
[Figure: trace plots of the first 30 iterations of the Gibbs sampler for $\alpha_0$, $\beta_0$, $\sigma^2_\alpha$ and $\alpha_{20}$; the horizontal axes show the iteration number.]
Table 12.3: Autocorrelations in Parameter Chains at Various Lag Orders
Parameter                Lag 1    Lag 5    Lag 10
$\alpha_0$                .24     .010      .007
$\beta_0$                 .22    -.004     -.009
$\sigma^2_\alpha$         .36     .020     -.010
$\rho_{\alpha,\beta}$     .37     .036      .003
$\alpha_{15}$             .18     .025      .018
Table 12.2: Posterior Quantities for a Selection of Parameters
Parameter                Post. Mean   Post. Std.   10th Percentile   90th Percentile
$\alpha_0$                 106.6         2.34          103.7             109.5
$\beta_0$                   6.18         .106           6.05              6.31
$\sigma^2_\alpha$          124.5        42.41          77.03            179.52
$\sigma^2_\beta$            .275         .088           .179              .389
$\rho_{\alpha,\beta}$      -.120         .211          -.390              .161
$\alpha_{10}$              93.76         5.24          87.09            100.59
$\alpha_{25}$              86.91         5.81          79.51             94.45
$\beta_{10}$                5.70         .217           5.43              5.98
$\beta_{25}$                6.75         .243           6.44              7.05
[Figure: hierarchical-model estimates versus OLS estimates of the rat-specific intercepts (vertical axis from 80 to 130) and growth rates (vertical axis from 5 to 7.5).]
A second application follows Krueger (1998) and Krueger and Whitmore (2001) and applies our model to analyze data from Project STAR (Student/Teacher Achievement Ratio).
Project STAR was an experiment in Tennessee that randomly assigned students to one of three types of classes: small class, regular-size class, and regular-size class with a teacher's aide (regular/aide class).
The dependent variable is the average of a reading percentile score and a math percentile score of a Project STAR student.
There are two explanatory variables: a dummy variable indicating whether a student is assigned to a small class and another indicating assignment to a regular/aide class. The default category, therefore, is assignment to a regular class.
The Project STAR data we use contain 79 participating schools with a total of 5,726 students who entered the project during kindergarten.
We focus on the achievement measure taken at the end of the kindergarten year and consider heterogeneity of treatment impacts across schools.
Therefore, in this application of the model in (47), $i$ denotes the school and $j$ (or $t$) no longer represents a time index but, instead, denotes the student within a school.
Table 4: Posterior means, standard deviations and probabilities of being positive of the parameters
Parameter                                                        E(β|D)    Std(β|D)   Pr(β > 0|D)
$\beta_0$ (intercept)                                            51        1.82       1
$\beta_1$ (small class)                                          5.48      1.44       1
$\beta_2$ (regular/aide class)                                   0.311     1.26       0.596
$\sqrt{\sigma^2}$                                                22.9      0.221      1
$\sqrt{\Sigma_\beta(1,1)}$                                       15.2      1.32       1
$\sqrt{\Sigma_\beta(2,2)}$                                       10.6      1.24       1
$\sqrt{\Sigma_\beta(3,3)}$                                       8.93      1.14       1
$\Sigma_\beta(1,2)/\sqrt{\Sigma_\beta(1,1)\Sigma_\beta(2,2)}$    -0.454    0.111      0.000125
$\Sigma_\beta(1,3)/\sqrt{\Sigma_\beta(1,1)\Sigma_\beta(3,3)}$    -0.483    0.111      0.000125
$\Sigma_\beta(2,3)/\sqrt{\Sigma_\beta(2,2)\Sigma_\beta(3,3)}$    0.548     0.118      1
Evidence of a small-class-size effect; no strong evidence of an aide effect.
Evidence of heterogeneity of impacts across schools.
Strong correlation among school-level parameters. What is an interpretation of this result?
Gibbs sampling in mixed models
Let us now turn to a restricted (and perhaps more widely used) version of our previous model.
Note that, in the specification just presented, both the intercept and slope (or, more generally, set of slopes) were permitted to vary across the units.
In a restricted version of this model, perhaps the “default” panel data specification in economics, we may permit the intercept to vary across individuals, but restrict the other regression coefficients to be constant across individuals.
Such a model is termed a mixed model.
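Formally, we consider a specification of the form:
$$y_{it} = \alpha_i + x_{it}\beta + \varepsilon_{it}, \quad \varepsilon_{it} \overset{iid}{\sim} N(0, \sigma^2_\varepsilon)$$
$$\alpha_i \overset{iid}{\sim} N(\alpha, \sigma^2_\alpha).$$
For this model we employ independent priors of the form:
$$\beta \sim N(\beta_0, V_\beta)$$
$$\alpha \sim N(\alpha_0, V_\alpha)$$
$$\sigma^2_\varepsilon \sim IG(e_1, e_2)$$
$$\sigma^2_\alpha \sim IG(a_1, a_2).$$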
We seek to do the following:
(a) Derive the complete posterior conditionals $p(\alpha_i | \beta, \alpha, \sigma^2_\varepsilon, \sigma^2_\alpha, y)$ and $p(\beta | \{\alpha_i\}, \alpha, \sigma^2_\varepsilon, \sigma^2_\alpha, y)$.
(b) Describe how one could use a blocking or grouping step [e.g., Chib and Carlin (1999)] to obtain draws directly from the joint posterior conditional $p(\{\alpha_i\}, \beta | \alpha, \sigma^2_\varepsilon, \sigma^2_\alpha, y)$.
(c) Describe how the Gibbs sampler can be used to fit the model, given your result in (b). Would you expect any improvements in this blocked algorithm relative to the standard Gibbs algorithm in (a)?
(d) How does your answer in (c) change for the case of an unbalanced panel where $T = T_i$?
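As for (a), the complete posterior conditionals can be obtained in a straightforward manner. Specifically, we obtain
$$\alpha_i | \beta, \alpha, \sigma^2_\varepsilon, \sigma^2_\alpha, y \sim N(Dd, D),$$
where
$$D = \left(T/\sigma^2_\varepsilon + 1/\sigma^2_\alpha\right)^{-1}, \qquad d = \sum_{t=1}^{T}(y_{it} - x_{it}\beta)/\sigma^2_\varepsilon + \alpha/\sigma^2_\alpha.$$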
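The complete posterior conditional for $\beta$ follows similarly:
$$\beta | \{\alpha_i\}, \alpha, \sigma^2_\varepsilon, \sigma^2_\alpha, y \sim N(Hh, H),$$
where
$$H = \left(X'X/\sigma^2_\varepsilon + V_\beta^{-1}\right)^{-1}, \qquad h = X'(y - \boldsymbol{\alpha})/\sigma^2_\varepsilon + V_\beta^{-1}\beta_0,$$
with $\boldsymbol{\alpha} = [(\iota_T\alpha_1)' \; (\iota_T\alpha_2)' \cdots (\iota_T\alpha_N)']'$ denoting the stacked vector of random effects.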
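(b) Instead of the strategy described in (a), we seek to draw the random effects $\{\alpha_i\}$ and “fixed effects” $\beta$ in a single block.
This strategy of grouping together correlated parameters will generally facilitate the mixing of the chain and thereby reduce numerical standard errors associated with the Gibbs sampling estimates.
We break this joint posterior conditional into the following two pieces:
$$p(\{\alpha_i\}, \beta | \alpha, \sigma^2_\varepsilon, \sigma^2_\alpha, y) = p(\{\alpha_i\} | \beta, \alpha, \sigma^2_\varepsilon, \sigma^2_\alpha, y)\, p(\beta | \alpha, \sigma^2_\varepsilon, \sigma^2_\alpha, y).$$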
The assumptions of our model imply that the random effects $\{\alpha_i\}$ are conditionally independent, so that
$$p(\{\alpha_i\}, \beta | \alpha, \sigma^2_\varepsilon, \sigma^2_\alpha, y) = \left[\prod_{i=1}^{N} p(\alpha_i | \beta, \alpha, \sigma^2_\varepsilon, \sigma^2_\alpha, y)\right] p(\beta | \alpha, \sigma^2_\varepsilon, \sigma^2_\alpha, y).$$
This suggests that one can draw from this joint posterior conditional via the method of composition by first drawing from $p(\beta | \alpha, \sigma^2_\varepsilon, \sigma^2_\alpha, y)$ and then drawing each $\alpha_i$ independently from its complete posterior conditional distribution.
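We now derive the conditional posterior distribution for $\beta$, marginalized over the random effects. Note that our model can be rewritten as follows:
$$y_{it} = \alpha + x_{it}\beta + u_i + \varepsilon_{it},$$
where $u_i \overset{iid}{\sim} N(0, \sigma^2_\alpha)$.
Let $v_{it} = u_i + \varepsilon_{it}$. If we stack this equation over $t$ within $i$ we obtain:
$$y_i = \iota_T\alpha + x_i\beta + v_i,$$
where
$$y_i = [y_{i1} \; y_{i2} \cdots y_{iT}]', \quad x_i = [x_{i1}' \; x_{i2}' \cdots x_{iT}']', \quad v_i = [v_{i1} \; v_{i2} \cdots v_{iT}]',$$
and $E(v_iv_i') \equiv \Sigma = \sigma^2_\varepsilon I_T + \sigma^2_\alpha\iota_T\iota_T'$.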
Stacking again over $i$ we obtain:
$$y = \iota_{NT}\alpha + X\beta + v,$$
where $E(vv') = I_N \otimes \Sigma$.
In this form, we can now appeal to our standard results for the regression model to obtain
$$\beta | \alpha, \sigma^2_\varepsilon, \sigma^2_\alpha, y \sim N(Gg, G),$$
where
$$G = \left(X'(I_N \otimes \Sigma^{-1})X + V_\beta^{-1}\right)^{-1} = \left(\sum_{i=1}^{N} x_i'\Sigma^{-1}x_i + V_\beta^{-1}\right)^{-1}$$
and
$$g = X'(I_N \otimes \Sigma^{-1})(y - \iota_{NT}\alpha) + V_\beta^{-1}\beta_0 = \sum_{i=1}^{N} x_i'\Sigma^{-1}(y_i - \iota_T\alpha) + V_\beta^{-1}\beta_0.$$
Thus, to sample from the desired joint conditional, one first samples $\beta$ from the distribution given above and then samples the random effects independently from their complete conditional posterior distributions.
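In the random-intercept model the error covariance of $v_i$ is $\Sigma = \sigma^2_\varepsilon I_T + \sigma^2_\alpha\iota_T\iota_T'$ (the balanced-panel case of the $\Sigma_i$ defined for unbalanced panels later in these notes). Because this is a scaled identity plus a rank-one term, its inverse is available in closed form via the Sherman-Morrison formula. The worked step below is an addition to the notes, sketching how the sums in $G$ and $g$ can be evaluated without numerically inverting a $T \times T$ matrix; $\bar{x}_i \equiv (1/T)\iota_T'x_i$ and $\bar{y}_i \equiv (1/T)\iota_T'y_i$ denote time averages.

```latex
\begin{aligned}
\Sigma^{-1} &= \frac{1}{\sigma^2_\varepsilon}\left[I_T - \frac{\sigma^2_\alpha}{\sigma^2_\varepsilon + T\sigma^2_\alpha}\,\iota_T\iota_T'\right], \\
x_i'\Sigma^{-1}x_i &= \frac{1}{\sigma^2_\varepsilon}\left[x_i'x_i - \frac{T^2\sigma^2_\alpha}{\sigma^2_\varepsilon + T\sigma^2_\alpha}\,\bar{x}_i'\bar{x}_i\right], \\
x_i'\Sigma^{-1}(y_i - \iota_T\alpha) &= \frac{1}{\sigma^2_\varepsilon}\left[x_i'(y_i - \iota_T\alpha) - \frac{T^2\sigma^2_\alpha}{\sigma^2_\varepsilon + T\sigma^2_\alpha}\,\bar{x}_i'(\bar{y}_i - \alpha)\right].
\end{aligned}
```

The first line can be checked by multiplying it by $\Sigma$ and using $\iota_T'\iota_T = T$.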
Finally, it is also worth noting that $\beta$ and $\alpha$ could be drawn together in the first step of this process.
That is, one could draw from the joint posterior conditional
$$p(\{\alpha_i\}, \alpha, \beta | \sigma^2_\varepsilon, \sigma^2_\alpha, y) = \left[\prod_{i=1}^{N} p(\alpha_i | \beta, \alpha, \sigma^2_\alpha, \sigma^2_\varepsilon, y)\right] p(\beta, \alpha | \sigma^2_\varepsilon, \sigma^2_\alpha, y)$$
in a similar way as described above.
We would expect the mixing of such chains to improve relative to the standard (or unblocked) Gibbs sampler. In the limiting case, where we can sample all parameters in a single block, we are back to iid sampling!
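(c) Given the result in (b), we now need to obtain the remaining complete conditionals. These are given as follows:
$$\alpha | \{\alpha_i\}, \beta, \sigma^2_\varepsilon, \sigma^2_\alpha, y \sim N(Rr, R),$$
where
$$R = \left(N/\sigma^2_\alpha + V_\alpha^{-1}\right)^{-1}, \qquad r = \sum_{i=1}^{N}\alpha_i/\sigma^2_\alpha + V_\alpha^{-1}\alpha_0.$$
$$\sigma^2_\alpha | \{\alpha_i\}, \alpha, \beta, \sigma^2_\varepsilon, y \sim IG\left(\frac{N}{2} + a_1, \left[a_2^{-1} + \frac{1}{2}\sum_{i=1}^{N}(\alpha_i - \alpha)^2\right]^{-1}\right).$$
$$\sigma^2_\varepsilon | \{\alpha_i\}, \alpha, \beta, \sigma^2_\alpha, y \sim IG\left(\frac{NT}{2} + e_1, \left[e_2^{-1} + \frac{1}{2}\sum_{i=1}^{N}\sum_{t=1}^{T}(y_{it} - \alpha_i - x_{it}\beta)^2\right]^{-1}\right).$$
Conditional on the random effects, the residuals entering the last expression are $y_{it} - \alpha_i - x_{it}\beta$.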
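The blocked algorithm can be sketched in a few lines for the special case of a single regressor, so that $\beta$ is a scalar. This is an illustrative sketch under several assumptions, not the authors' code: the data are simulated, the prior hyperparameters and starting values are arbitrary choices, $IG(\cdot, b)$ is read as $1/\sigma^2 \sim \text{Gamma}$ with scale $b$, and $x_i'\Sigma^{-1}z$ is evaluated with the closed-form inverse of $\Sigma = \sigma^2_\varepsilon I_T + \sigma^2_\alpha\iota_T\iota_T'$ (Sherman-Morrison). Function names are hypothetical.

```python
import random

def simulate(N, T, alpha, beta, s2_alpha, s2_eps, rng):
    """Simulate y_it = alpha_i + x_it*beta + eps_it with alpha_i ~ N(alpha, s2_alpha)."""
    x = [[rng.gauss(0.0, 1.0) for _ in range(T)] for _ in range(N)]
    y = []
    for i in range(N):
        a_i = rng.gauss(alpha, s2_alpha ** 0.5)
        y.append([a_i + beta * x[i][t] + rng.gauss(0.0, s2_eps ** 0.5) for t in range(T)])
    return x, y

def draw_ig(shape, b, rng):
    """IG(shape, b) under the assumed convention: 1/draw ~ Gamma(shape, scale=b)."""
    return 1.0 / rng.gammavariate(shape, b)

def blocked_gibbs(x, y, n_iter=1500, burn=500, seed=1,
                  beta0=0.0, V_beta=100.0, alpha0=0.0, V_alpha=100.0,
                  e1=3.0, e2=1.0, a1=3.0, a2=1.0):
    rng = random.Random(seed)
    N, T = len(y), len(y[0])
    alpha, s2_eps, s2_alpha = 0.0, 1.0, 1.0      # arbitrary starting values
    keep = []
    for it in range(n_iter):
        # Block, step 1: beta | alpha, variances, y (random effects integrated out).
        c = s2_alpha / (s2_eps + T * s2_alpha)
        G_inv, g = 1.0 / V_beta, beta0 / V_beta
        for xi, yi in zip(x, y):
            z = [v - alpha for v in yi]
            sx, sz = sum(xi), sum(z)
            G_inv += (sum(u * u for u in xi) - c * sx * sx) / s2_eps
            g += (sum(u * v for u, v in zip(xi, z)) - c * sx * sz) / s2_eps
        beta = rng.gauss(g / G_inv, (1.0 / G_inv) ** 0.5)
        # Block, step 2: alpha_i | beta, alpha, variances, y, independently over i.
        D = 1.0 / (T / s2_eps + 1.0 / s2_alpha)
        a_i = []
        for xi, yi in zip(x, y):
            d = sum(v - u * beta for u, v in zip(xi, yi)) / s2_eps + alpha / s2_alpha
            a_i.append(rng.gauss(D * d, D ** 0.5))
        # alpha | {alpha_i}, sigma2_alpha
        R = 1.0 / (N / s2_alpha + 1.0 / V_alpha)
        alpha = rng.gauss(R * (sum(a_i) / s2_alpha + alpha0 / V_alpha), R ** 0.5)
        # Variance parameters from their inverse-gamma complete conditionals.
        ss_alpha = sum((a - alpha) ** 2 for a in a_i)
        s2_alpha = draw_ig(N / 2 + a1, 1.0 / (1.0 / a2 + 0.5 * ss_alpha), rng)
        sse = sum((v - a - u * beta) ** 2
                  for a, xi, yi in zip(a_i, x, y) for u, v in zip(xi, yi))
        s2_eps = draw_ig(N * T / 2 + e1, 1.0 / (1.0 / e2 + 0.5 * sse), rng)
        if it >= burn:
            keep.append((beta, alpha, s2_alpha, s2_eps))
    return keep

rng = random.Random(0)
x, y = simulate(N=100, T=5, alpha=1.0, beta=2.0, s2_alpha=1.0, s2_eps=0.25, rng=rng)
draws = blocked_gibbs(x, y)
beta_mean = sum(d[0] for d in draws) / len(draws)
alpha_mean = sum(d[1] for d in draws) / len(draws)
print(beta_mean, alpha_mean)
```

Replacing step 1 with a draw from the complete conditional $N(Hh, H)$ given in (a) would give the unblocked sampler, which makes it straightforward to compare the autocorrelation of the two chains on the same data.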
(d) Only slight changes are required in the case of unbalanced panels.
Let $NT$ continue to denote the total number of observations, $NT \equiv \sum_{i=1}^{N} T_i$.
In addition, let $\Sigma_i \equiv \sigma^2_\varepsilon I_{T_i} + \sigma^2_\alpha\iota_{T_i}\iota_{T_i}'$.
Replacing $\Sigma$ with $\Sigma_i$ and $T$ with $T_i$, as appropriate in the above formulae, is all that is required.
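Written out, the substitutions give, for example, the following versions of the blocked-step quantities. This is a direct transcription of the balanced-panel formulas with $T_i$ and $\Sigma_i$ in place of $T$ and $\Sigma$, offered as a sketch rather than quoted from the source; the subscripted names $D_i$ and $d_i$ are introduced here only to signal that these quantities now vary with $i$.

```latex
\begin{aligned}
G &= \left(\sum_{i=1}^{N} x_i'\Sigma_i^{-1}x_i + V_\beta^{-1}\right)^{-1},
&\quad
g &= \sum_{i=1}^{N} x_i'\Sigma_i^{-1}(y_i - \iota_{T_i}\alpha) + V_\beta^{-1}\beta_0, \\
D_i &= \left(T_i/\sigma^2_\varepsilon + 1/\sigma^2_\alpha\right)^{-1},
&\quad
d_i &= \sum_{t=1}^{T_i}(y_{it} - x_{it}\beta)/\sigma^2_\varepsilon + \alpha/\sigma^2_\alpha.
\end{aligned}
```

The first argument of the conditional for $\sigma^2_\varepsilon$ keeps the form $NT/2 + e_1$, with $NT$ now equal to $\sum_{i=1}^{N} T_i$.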
Further Reading
Gelfand, A.E., S.E. Hills, A. Racine-Poon and A.F.M. Smith (1990). Illustration of Bayesian inference in normal data models using Gibbs sampling. JASA 85, 972-985.
Chib, S. and B. Carlin (1999). On MCMC sampling in hierarchical longitudinal models. Statistics and Computing 9, 17-26.