Multilevel Modeling using Stata
Andrew Hicks, CCPR Statistics and Methods Core

Workshop based on the book: Multilevel and Longitudinal Modeling Using Stata (Second Edition) by Sophia Rabe-Hesketh and Anders Skrondal


[Figure: Mini Wright measurements (vertical axis, 200 to 700) plotted by Subject ID (horizontal axis, subjects 1 to 17), with separate markers for Occasion 1 and Occasion 2]

Within-Subject Dependence

Within-Subject Dependence: We can predict the occasion 2 measurement if we know the subject’s occasion 1 measurement.

Between-Subject Heterogeneity: Large differences between subjects (compare subjects 9 and 15).

Within-subject dependence is due to between-subject heterogeneity


Standard Regression Model

$y_{ij} = \beta + \xi_{ij}$

$y_{ij}$: measurement of subject $j$ on occasion $i$

$\beta$: population mean

$\xi_{ij}$: residuals (error terms), independent over subjects and occasions

This model clearly ignores information about within-subject dependence.

Variance Component Model

$y_{ij} = \beta + \xi_{ij}$, with the total residual split into two components, $\xi_{ij} = \zeta_j + \epsilon_{ij}$:

$y_{ij} = \beta + \zeta_j + \epsilon_{ij}$

$\zeta_j$: random intercept, the deviation of subject $j$’s mean from the overall mean

$\epsilon_{ij}$: within-subject residual, the deviation of observation $i$ from subject $j$’s mean
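As a rough sketch of how this model might be fit, the commands below assume the data are in long format (one row per measurement) with the measurement stored in `wm` and a subject identifier named `id`. The name `id` is an assumption, since the slides do not show the dataset. `mixed` requires Stata 13 or later; older releases use `xtmixed` with the same syntax.

```stata
* Assumed: long-format data in memory, wm = Mini Wright measurement,
* id = subject identifier (hypothetical variable name)

* Declare subjects as the clusters
xtset id

* Maximum likelihood fit; output reports sigma_u = sqrt(psi) and sigma_e = sqrt(theta)
xtreg wm, mle

* Same model with mixed; output reports var(_cons) = psi and var(Residual) = theta
mixed wm || id:, mle
```

Both commands estimate the same three parameters: the overall mean $\beta$ (shown as `_cons`), the between-subject variance $\psi$, and the within-subject variance $\theta$.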


[Figure: for subject $j$, the overall mean $\beta$, the random intercept $\zeta_j$, the subject mean $\beta + \zeta_j$, and the within-subject residuals $\epsilon_{1j}$ and $\epsilon_{2j}$]

For $y_{ij} = \beta + \zeta_j + \epsilon_{ij}$ with $\zeta_j \sim N(0, \psi)$ and $\epsilon_{ij} \sim N(0, \theta)$:

$\mathrm{Var}(y_{ij}) = \mathrm{Var}(\beta) + \mathrm{Var}(\zeta_j) + \mathrm{Var}(\epsilon_{ij}) = 0 + \psi + \theta$

$\mathrm{Var}(y_{ij}) = \psi + \theta$
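The same bookkeeping gives the covariance between two measurements on the same subject. The derivation below is a sketch that assumes, as is standard for this model though not stated on the slide, that $\zeta_j$ and $\epsilon_{ij}$ are independent of each other and that the $\epsilon_{ij}$ are independent across occasions.

```latex
\begin{align*}
\mathrm{Cov}(y_{ij}, y_{i'j})
  &= \mathrm{Cov}(\beta + \zeta_j + \epsilon_{ij},\; \beta + \zeta_j + \epsilon_{i'j}) \\
  &= \mathrm{Var}(\zeta_j) + \mathrm{Cov}(\zeta_j, \epsilon_{i'j})
     + \mathrm{Cov}(\epsilon_{ij}, \zeta_j) + \mathrm{Cov}(\epsilon_{ij}, \epsilon_{i'j}) \\
  &= \psi + 0 + 0 + 0 = \psi, \qquad i \neq i'
\end{align*}
```

The shared random intercept is the only source of covariance, which is the formal version of "within-subject dependence is due to between-subject heterogeneity". Dividing by $\mathrm{Var}(y_{ij}) = \psi + \theta$ gives a within-subject correlation of $\psi / (\psi + \theta)$.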

Proportion of total variance due to subject differences:

$\rho = \frac{\mathrm{Var}(\zeta_j)}{\mathrm{Var}(y_{ij})} = \frac{\psi}{\psi + \theta}$

Intraclass correlation (within-cluster correlation):

$\rho = \mathrm{Cor}(y_{ij}, y_{i'j})$
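One way to obtain the estimated intraclass correlation in Stata is sketched below. It assumes long-format data with the measurement in `wm` and a subject identifier named `id` (an assumed name). `estat icc` is available after `mixed` (Stata 13 or later).

```stata
* Assumed: long-format data in memory, wm = measurement,
* id = subject identifier (hypothetical variable name)

* Fit the variance component model, then request the intraclass correlation
quietly mixed wm || id:, mle
estat icc

* The same ratio computed by hand from the xtreg results
xtset id
quietly xtreg wm, mle
display e(sigma_u)^2 / (e(sigma_u)^2 + e(sigma_e)^2)

* xtreg also stores the ratio directly
display e(rho)
```

The hand computation is only there to show that the reported value is $\hat\psi / (\hat\psi + \hat\theta)$; in practice `estat icc` is more convenient because it also reports a confidence interval.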


Random or Fixed Effect?

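Since every subject has a different effect, we can think of the subjects as the categories of a categorical explanatory variable. Since the effect of each subject is treated as random, we have been using a random effect model:

$y_{ij} = \beta + \zeta_j + \epsilon_{ij}$, $\zeta_j \sim N(0, \psi)$

What if we want each effect in our model to belong to a specific subject? Then we would use a fixed effect model:

$y_{ij} = \beta + \alpha_j + \epsilon_{ij}$, where the $\alpha_j$ are unknown fixed subject-specific parameters, with a constraint such as $\sum_j \alpha_j = 0$ so that $\beta$ is identified

. xtreg wm, fe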
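To make the idea of "subjects as a categorical explanatory variable" concrete, here is a minimal sketch. It assumes long-format data with a subject identifier named `id` (an assumed name; the slides do not show it) and uses factor-variable notation, which requires Stata 11 or later.

```stata
* Assumed: long-format data in memory, wm = measurement,
* id = subject identifier (hypothetical variable name)

* xtreg needs to know which variable defines the clusters
xtset id

* Fixed-effects estimator from the slide
xtreg wm, fe

* Same idea spelled out: one indicator variable per subject
regress wm i.id
```

The two commands treat the subject effects as fixed parameters but parameterize the constant differently: `regress` uses the first subject as the reference category, so its `_cons` is that subject's mean rather than an overall mean.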

Random effect model:

Use if the interest concerns the population of clusters.

“generalize the potential effect”, e.g. the nurse giving the drug

Fixed effect model:

Use if we are interested in the “effect” of the specific clusters in a particular dataset.

“replicable in life”, e.g. the actual drug

Random Intercept Model with Covariates

Without covariates:

$y_{ij} = \beta + \xi_{ij} = \beta + \zeta_j + \epsilon_{ij}$

With covariates:

$y_{ij} = \beta_1 + \beta_2 x_{2ij} + \dots + \beta_p x_{pij} + \xi_{ij}$

$y_{ij} = \beta_1 + \beta_2 x_{2ij} + \dots + \beta_p x_{pij} + \zeta_j + \epsilon_{ij}$

$\zeta_j$: a random parameter that is not estimated along with the fixed parameters $\beta_1, \dots, \beta_p$, but whose variance $\psi$ is estimated together with the variance $\theta$ of $\epsilon_{ij}$
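A random intercept model with covariates could be fit along the following lines. The variable names `y`, `x2`, `x3`, and `cluster` are placeholders for illustration, not names from the workshop data. `mixed` requires Stata 13 or later (`xtmixed` in earlier releases).

```stata
* Hypothetical names: y = response, x2 x3 = covariates,
* cluster = identifier of the level-2 unit

* Declare the clusters
xtset cluster

* Random intercept model by maximum likelihood
xtreg y x2 x3, mle

* Same model with mixed; the part after || names the clustering variable
mixed y x2 x3 || cluster:, mle
```

In both outputs the coefficient table holds the fixed parameters $\beta_1, \dots, \beta_p$, while the random part reports only the estimated variances (or standard deviations) of $\zeta_j$ and $\epsilon_{ij}$, not the individual $\zeta_j$.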


Ecological Fallacy

Occurs when between-cluster relationships differ substantially from within-cluster relationships.

• Can be caused by cluster-level confounding

For example, mothers who smoke during pregnancy may also adopt other behaviors such as drinking and poor nutritional intake, or have lower socioeconomic status and be less educated. These variables adversely affect birthweight and have not been adequately controlled for. In these cases the covariate is correlated with the error term (endogeneity).

• Because of this, the between-effect may be an overestimate of the true effect.

• In contrast, for within-effects each mother serves as her own control, so within-mother estimates may be closer to the true causal effect.
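The between and within effects can be estimated separately and compared. The sketch below assumes a dataset of children nested within mothers with hypothetical variable names `birwt` (birthweight), `smoke` (indicator for smoking during pregnancy), and `momid` (mother identifier); the slides do not give the actual names.

```stata
* Hypothetical names: birwt = birthweight, smoke = mother smoked (0/1),
* momid = mother identifier

* Declare mothers as the clusters
xtset momid

* Between estimator: regression of mother means on mother means
xtreg birwt smoke, be

* Within estimator: uses only deviations from each mother's own mean
xtreg birwt smoke, fe
```

A large gap between the two `smoke` coefficients would be the pattern the slide describes: the between estimate absorbs mother-level confounders, while the within estimate compares pregnancies of the same mother.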

How to test for endogeneity?

Use the Hausman test to compare two alternative estimators of the regression coefficients $\beta$: the fixed-effects estimator and the random-effects estimator.
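In Stata, the comparison is typically run by storing both sets of estimates and passing them to `hausman`. The sketch below reuses the hypothetical variable names `birwt`, `smoke`, and `momid`; the stored-estimate names `fe_est` and `re_est` are arbitrary.

```stata
* Hypothetical names: birwt = birthweight, smoke = mother smoked (0/1),
* momid = mother identifier
xtset momid

* Fixed-effects (within) estimator: consistent even if smoke is
* correlated with the random intercept
quietly xtreg birwt smoke, fe
estimates store fe_est

* Random-effects estimator: efficient if there is no such correlation
quietly xtreg birwt smoke, re
estimates store re_est

* Hausman test: the consistent estimator goes first, the efficient one second
hausman fe_est re_est
```

A small p-value indicates that the two estimators differ by more than sampling error would explain, which is evidence of cluster-level endogeneity.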

Random-coefficient model

We’ve already considered random intercept models, where the intercept is allowed to vary over clusters after controlling for covariates.

What if we would also like the coefficients (or slopes) to vary across clusters?

Models that involve both random intercepts and random slopes are called Random Coefficient Models.

Random Intercept Model:

$y_{ij} = \beta_1 + \beta_2 x_{ij} + \zeta_j + \epsilon_{ij}$

Random Coefficient Model:

$y_{ij} = \beta_1 + \beta_2 x_{ij} + \zeta_{1j} + \zeta_{2j} x_{ij} + \epsilon_{ij}$

$y_{ij} = (\beta_1 + \zeta_{1j}) + (\beta_2 + \zeta_{2j}) x_{ij} + \epsilon_{ij}$

$\beta_1 + \zeta_{1j}$: cluster-specific random intercept

$\beta_2 + \zeta_{2j}$: cluster-specific random slope
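Both models can be fit with `mixed` and compared. This is a sketch with placeholder variable names `y`, `x`, and `cluster`, assuming Stata 13 or later (`xtmixed` in earlier releases).

```stata
* Hypothetical names: y = response, x = covariate,
* cluster = identifier of the level-2 unit

* Random intercept model
mixed y x || cluster:, mle
estimates store ri

* Random coefficient model: listing x after the colon adds a random slope;
* covariance(unstructured) lets intercept and slope be correlated
mixed y x || cluster: x, covariance(unstructured) mle
estimates store rc

* Likelihood-ratio test of the random slope
lrtest rc ri
```

Without `covariance(unstructured)`, `mixed` assumes the random intercept and slope are uncorrelated, which is rarely what is wanted. The likelihood-ratio test is conservative here because the null hypothesis puts a variance on the boundary of the parameter space.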