Introduction

You can get the datasets and SAS commands for this workshop on the web page:

http://www.umich.edu/~kwelch/procmixworkshop.html

The lab examples come from the book, “Linear Mixed Models: A Practical Guide Using Statistical Software”, by Brady T. West, Kathleen B. Welch, and Andrzej T. Gałecki, Chapman & Hall/CRC, 2006 (referred to as WWG). The model and figure numbers in the labs correspond to those in WWG.

Go to http://www-personal.umich.edu/~bwest/almmussp.html

to get more detailed analysis for each example in the book, along with the appropriate syntax in SAS (using SAS 9.1.3), SPSS, Stata, R, and HLM software programs for each analysis presented in the book.

In this workshop we will be using four actual datasets plus the hypothetical banana data.

1. rat_pup.dat This data set was provided by Jose Pinheiro and Doug Bates in their book, “Mixed-Effects Models in S and S-Plus” (2000). In the analysis of this data set we compare the birth weights of 322 rat pups in 27 litters (litter size varies from 2 to 18 pups) born to mothers treated with High, Low, and Control doses of a chemical.

2. classroom.csv This data set is from the Study of Instructional Improvement (SII; Hill, Rowan, and Ball, 2004) and includes data from 1190 first-grade students in 312 classrooms in 107 schools.

3. ratbrain.dat This data set was reported by Douglas et al. (2004) and shows measurements of the effects of two different chemicals on three different regions of the brain in five adult male rats.

4. autism.csv This data set was derived from a study of 156 children with Autism Spectrum Disorder (Oti, Anderson, Lord, 2006). Measurements were made at five basic ages for each child: 2, 3, 5, 9, and 13 years. Not all children were measured at all time points.

Lab Example 2

Two-Level Clustered Data

The Rat Pup Data

(Chapter 3 in WWG)

This is a two-level clustered data set, in which the clusters are litters, and individual pups are the units of analysis. This analysis is displayed in more detail than the analyses for later examples.

In this data set, 30 pregnant rats were treated with one of three doses of a drug (High, Low, or Control), and the birth weights of the rat pups that were born were measured. There were originally 10 litters for each of the drug doses. However, 3 litters in the high dose did not survive, resulting in 27 litters. There were 322 rat pups in the final study.

The table below shows a portion of the Rat Pup data. We have Level 1 covariates (e.g. Sex) that are specific for each rat pup and Level 2 covariates (e.g. Treatment and Litsize) that are constant for all rat pups within a given litter. The response variable, Weight, varies for each rat pup.

Portion of the Rat Pup Data Set

LITTER  TREATMENT  LITSIZE  PUP_ID  WEIGHT  SEX
1       Control    12       1       6.60    Male
1       Control    12       2       7.40    Male
1       Control    12       3       7.15    Male
1       Control    12       4       7.24    Male
1       Control    12       5       7.10    Male
1       Control    12       6       6.04    Male
1       Control    12       7       6.98    Male
1       Control    12       8       7.05    Male
1       Control    12       9       6.95    Female
11      Low        16       132     5.65    Male
11      Low        16       133     5.78    Male
21      High       14       258     5.09    Male
21      High       14       259     5.57    Male

Data Exploration

We first examine the data using descriptive statistics and take a graphical look at the data using Boxplots.

data ratpup;

infile "rat_pup.dat" firstobs=2 dlm="09"X;

input pup_id weight sex $ litter litsize treatment $;

if treatment = "High" then treat = 1;

if treatment = "Low" then treat = 2;

if treatment = "Control" then treat = 3;

run;

proc format;

value trtfmt 1 = "High"

2 = "Low"

3 = "Control";

run;

In the data step we set up the new variable, Treat, so that the highest value (=3) is coded for Control, so that Control will be the “reference level” for Treat.
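A quick check of the recoding and of the design counts can be useful before moving on. The sketch below is not part of the original workshop code; it assumes the ratpup data set and the trtfmt format created above, and the column names n_litters and n_pups are illustrative.

```sas
/* Cross-check the numeric recode against the original character variable */
proc freq data=ratpup;
  tables treatment*treat / list missing;
  format treat trtfmt.;
run;

/* Count litters and pups within each treatment (illustrative column names) */
proc sql;
  select treatment,
         count(distinct litter) as n_litters,
         count(*) as n_pups
  from ratpup
  group by treatment;
quit;
```

If the data were read correctly, the counts should add up to the 27 litters and 322 pups described for this study.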

title "Summary statistics for weight by treatment and sex";

proc means data=ratpup maxdec=2;

class treat sex;

var weight;

format treat trtfmt.;

run;

Analysis Variable : weight

Treat     Sex      N Obs     N    Mean   Std Dev   Minimum   Maximum
High      Female      32    32    5.85      0.60      4.48      7.68
          Male        33    33    5.92      0.69      5.01      7.70
Low       Female      65    65    5.84      0.45      4.75      7.73
          Male        61    61    6.03      0.38      5.25      7.13
Control   Female      54    54    6.12      0.69      3.68      7.57
          Male        77    77    6.47      0.75      4.57      8.33

There is a tendency for the mean Weight to be lower for the High and Low treatment groups, compared to the Control. We also see that the mean for males is somewhat higher than for females within each level of Treat.

We now look at boxplots of Weight for each combination of treatment and sex.
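The boxplots themselves are not reproduced here. One way to draw a comparable display, assuming SAS 9.2 or later (where proc sgpanel is available), is sketched below. This is an illustration, not the code used to produce the original figure.

```sas
title "Boxplots of weight by treatment and sex";
proc sgpanel data=ratpup;
  panelby sex / columns=2;          /* one panel per sex */
  vbox weight / category=treat;     /* one box per treatment level */
  format treat trtfmt.;
run;
```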

We now graphically examine the relationship between Litter Size and Weight within each Treatment. We first create a new variable called RANKLIT that ranks the litters by size.

/*Create Ranklit*/

proc sort data=ratpup;

by litsize litter;

run;

data ratpup2;

set ratpup;

by litsize litter;

if first.litter then ranklit+1;

label ranklit="New Litter ID (Ordered by Size)";

run;
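With RANKLIT available, boxplots of weight for each litter, ordered by litter size, can be drawn within each treatment and sex. The following is a minimal sketch assuming SAS 9.2 or later; it is not the code used for the original figure.

```sas
title "Weight by litter (ordered by size), within treatment and sex";
proc sgpanel data=ratpup2;
  panelby treat sex / layout=lattice;   /* treat defines columns, sex defines rows */
  vbox weight / category=ranklit;       /* one box per litter, smallest litter first */
  format treat trtfmt.;
run;
```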

There appears to be a relationship between litter size and birthweight, so that larger litters tend to have smaller pups, within each treatment and sex, except for Low Males.

Models

We consider several competing models for the Rat Pup Data. We use a top-down model fitting strategy, where we first set up a “loaded” model that includes all the candidate fixed effects, then we decide on an appropriate covariance structure for the random effects, and finally we reduce the fixed effects portion of the model to arrive at our final model.

The table below shows a summary of each of the models we fit. You can use the model numbers to keep track of what we're doing as we move through the examples.

Term / Variable                     General Notation               Model
                                                                   3.1    3.2A                           3.2B                      3.3

Fixed Effects
  Intercept                         β0                             ✓      ✓                              ✓                         ✓
  TREAT1 (High vs. Control)         β1                             ✓      ✓                              ✓                         ✓
  TREAT2 (Low vs. Control)          β2                             ✓      ✓                              ✓                         ✓
  SEX1 (Female vs. Male)            β3                             ✓      ✓                              ✓                         ✓
  LITSIZE                           β4                             ✓      ✓                              ✓                         ✓
  TREAT1 × SEX1                     β5                             ✓      ✓                              ✓
  TREAT2 × SEX1                     β6                             ✓      ✓                              ✓

Random Effects
  Litter (j): Intercept             u_j                            ✓      ✓                              ✓                         ✓

Residuals
  Rat Pup (pup i in litter j)       ε_ij                           ✓      ✓                              ✓                         ✓

Covariance Parameters (θD) for D matrix
  Litter level:                     σ²_litter                      ✓      ✓                              ✓                         ✓
  variance of intercepts

Covariance Parameters (θR) for Ri matrix
  Rat pup level:                    σ²_high, σ²_low, σ²_control    σ²     σ²_high, σ²_low, σ²_control    σ²_high/low, σ²_control   σ²_high/low, σ²_control
  variances of residuals

Model Specification

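Model 3.1:

$$
\begin{aligned}
\text{WEIGHT}_{ij} = {} & \beta_0 + \beta_1 \text{TREAT1}_j + \beta_2 \text{TREAT2}_j + \beta_3 \text{SEX1}_{ij} \\
& + \beta_4 \text{LITSIZE}_j + \beta_5 \text{TREAT1}_j \times \text{SEX1}_{ij} + \beta_6 \text{TREAT2}_j \times \text{SEX1}_{ij} \\
& + u_j + \varepsilon_{ij}
\end{aligned}
$$

The terms involving β0 through β6 form the fixed part of the model; u_j + ε_ij forms the random part.

This model includes the fixed effects of Treatment and Sex and their interaction. It also includes the fixed effect of LITSIZE. There is a random intercept (u_j) for each litter and a random error (ε_ij) for each rat pup.

$$
u_j \sim N(0, \sigma^2_{litter}), \qquad \varepsilon_{ij} \sim N(0, \sigma^2).
$$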

Hypothesis Tests

A summary of the hypotheses tested in the analysis of the Rat Pup data is shown in the following table. Note that different test types and sometimes different estimation methods are considered for the different hypothesis tests. We use Likelihood Ratio tests (LRT) with REML estimation to help us decide on the appropriate covariance structure; we use an F-test with REML estimation to test the Fixed Effects of the Treatment by Sex interaction; and we use an LRT test with ML estimation to decide whether to keep the Fixed Effects of Treatment in the model.

Hypothesis Specification                                                             Hypothesis Test
                                                                                                       Models Compared
Label  Null (H0)                              Alternative (HA)                       Test              Nested Model (H0)  Reference Model (HA)  Estimation Method  Asymptotic / Approximate Dist. of Test Statistic under H0

3.1    Drop u_j                               Retain u_j                             LRT               Model 3.1A         Model 3.1             REML               0.5χ²_0 + 0.5χ²_1
       (σ²_litter = 0)                        (σ²_litter > 0)

3.2    Homogeneous residual variance          Residual variances are not all equal   LRT               Model 3.1          Model 3.2A            REML               χ²_2
       (σ²_high = σ²_low = σ²_control = σ²)

3.3    Grouped heterogeneous residual         σ²_high ≠ σ²_low                       LRT               Model 3.2B         Model 3.2A            REML               χ²_1
       variance (σ²_high = σ²_low)

3.4    Homogeneous residual variance          Grouped heterogeneous residual         LRT               Model 3.1          Model 3.2B            REML               χ²_1
       (σ²_high/low = σ²_control = σ²)        variance (σ²_high/low ≠ σ²_control)

3.5    Drop TREATMENT × SEX effects           β5 ≠ 0, or β6 ≠ 0                      Type III F-test   N/A                Model 3.2B            REML               F(2,194)
       (β5 = β6 = 0)

3.6    Drop TREATMENT effects                 β1 ≠ 0, or β2 ≠ 0                      LRT               Model 3.3A         Model 3.3             ML                 χ²_2
       (β1 = β2 = 0)                                                                 Type III F-test   N/A                Model 3.3             REML               F(2,24.3)

Step 1: Fit a model with a “loaded” mean structure (Model 3.1).

The SAS commands used to fit Model 3.1 to the Rat Pup data using Proc Mixed are shown below.

title "Model 3.1";

proc mixed data=ratpup order=internal covtest ;

class treat sex litter ;

model weight = treat sex litsize treat*sex /

solution ddfm = sat ;

random intercept / subject=litter ;

format treat trtfmt. ;

run ;

Annotated output from running these commands is shown below:

Model 3.1

The Mixed Procedure

Model Information

Data Set WORK.RATPUP

Dependent Variable weight

Covariance Structure Variance Components

Subject Effect litter

Estimation Method REML

Residual Variance Method Profile

Fixed Effects SE Method Model-Based

Degrees of Freedom Method Satterthwaite

Class Level Information

Class Levels Values

treat 3 High Low Control

sex 2 Female Male

litter 27 1 2 3 4 5 6 7 8 9 10 11 12 13

14 15 16 17 18 19 20 21 22 23

24 25 26 27

Dimensions

Covariance Parameters 2

Columns in X 13

Columns in Z Per Subject 1

Subjects 27

Max Obs Per Subject 18

Number of Observations

Number of Observations Read 322

Number of Observations Used 322

Number of Observations Not Used 0

Iteration History

Iteration Evaluations -2 Res Log Like Criterion

0 1 490.50994069

1 3 401.17987994 0.00076832

2 1 401.10574867 0.00001581

3 1 401.10432307 0.00000001

Convergence criteria met.

Covariance Parameter Estimates

Standard Z

Cov Parm Subject Estimate Error Value Pr Z

Intercept litter 0.09649 0.03279 2.94 0.0016

Residual 0.1635 0.01351 12.10 <.0001

Fit Statistics

-2 Res Log Likelihood 401.1

AIC (smaller is better) 405.1

AICC (smaller is better) 405.1

BIC (smaller is better) 407.7

Solution for Fixed Effects

Standard

Effect sex treat Estimate Error DF t Value Pr > |t|

Intercept 8.3233 0.2733 32.9 30.45 <.0001

treat High -0.9061 0.1915 30.8 -4.73 <.0001

treat Low -0.4670 0.1582 28.2 -2.95 0.0063

treat Control 0 . . . .

sex Female -0.4117 0.07315 295 -5.63 <.0001

sex Male 0 . . . .

litsize -0.1284 0.01875 31.8 -6.85 <.0001

treat*sex Female High 0.1070 0.1318 304 0.81 0.4173

treat*sex Male High 0 . . . .

treat*sex Female Low 0.08387 0.1057 299 0.79 0.4281

treat*sex Male Low 0 . . . .

treat*sex Female Control 0 . . . .

treat*sex Male Control 0 . . . .

Type 3 Tests of Fixed Effects

Num Den

Effect DF DF F Value Pr > F

treat 2 24.3 11.49 0.0003

sex 1 303 46.99 <.0001

litsize 1 31.8 46.87 <.0001

treat*sex 2 302 0.47 0.6282
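The covariance parameter estimates above imply a within-litter (intraclass) correlation for Model 3.1. This quantity is not printed in the output; the calculation below is a by-hand illustration using the rounded estimates shown.

$$
\widehat{\text{Var}}(\text{WEIGHT}_{ij}) = \hat\sigma^2_{litter} + \hat\sigma^2 = 0.09649 + 0.1635 = 0.25999
$$

$$
\widehat{\text{ICC}} = \frac{\hat\sigma^2_{litter}}{\hat\sigma^2_{litter} + \hat\sigma^2} = \frac{0.09649}{0.25999} \approx 0.371
$$

After adjusting for the fixed effects, roughly 37% of the remaining variance in birth weight is attributable to differences between litters.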

Step 2: Select a structure for the random effects

We want to decide if we should include the random intercept for each litter in our model, so we need to test Hypothesis 3.1 shown below:

H0: σ²_litter = 0 (drop the random litter effects)

HA: σ²_litter > 0 (keep the random litter effects)

We now fit a nested model, excluding the random intercept for each litter (the Random statement is excluded from our SAS code).

title "Model 3.1A";

proc mixed data=ratpup order=internal covtest ;

class treat sex litter ;

model weight = treat sex treat*sex litsize /

solution ddfm = sat ;

*random intercept / subject=litter;

format treat trtfmt. ;

run;

The fit statistics that result after fitting Model 3.1A are shown below:

Fit Statistics

-2 Res Log Likelihood 490.5

AIC (smaller is better) 492.5

AICC (smaller is better) 492.5

BIC (smaller is better) 496.3

We can now compare the two models to decide whether to keep the random litter effects in the model (i.e., whether σ²_litter > 0). We use a likelihood ratio test, subtracting the -2 Res Log Likelihood for Model 3.1 from that for Model 3.1A.

Because we are testing a null hypothesis that is on the boundary of the parameter space, we must use a mixture of chi-square distributions to get the appropriate p-value for this test. We use 0.5χ²_0 + 0.5χ²_1.

However, we can ignore the χ²_0 portion, and we simply end up with a p-value that is one-half the p-value for χ²_1, as shown below.

data test1;

lrtstat = 490.5 - 401.1;

df = 1;

pvalue = 0.5*(1 - probchi(lrtstat,df));

format pvalue 10.8;

run;

title "P-value for variance of random litter effects";

proc print data=test1;

run;

The results of this test are displayed in the output window:

Obs lrtstat df pvalue

1 89.4 1 0.00000000

Because this result is significant, we will keep the random intercept for each litter in this model and for all subsequent models.
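The halving of the p-value can be written out explicitly. The χ² distribution with 0 df puts all of its probability at zero, so it cannot exceed a positive test statistic. The expression below is a worked restatement of the calculation above, not additional SAS output.

$$
p = 0.5\,P(\chi^2_0 > 89.4) + 0.5\,P(\chi^2_1 > 89.4) = 0.5 \times 0 + 0.5\,P(\chi^2_1 > 89.4)
$$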

We could stop our analysis at this point and be satisfied with the results, or we could decide to drop the treatment by sex interaction term from the model. However, we will first think about whether we really have equal variances for all of the residuals for all of the treatments.

Step 3: Select a covariance structure for the residuals

One of the very nice features of SAS Proc Mixed is that it allows you to have unequal residual variances for different groups of subjects. We will investigate this option in this part of the lab exercise.

Based on our examination of the boxplots of Weight for different litters by levels of Treatment and Sex, we noted that the variance in the Control litters tended to be larger than that observed in the High and Low litters.

We can build this heterogeneity of variance into the SAS code, using the Repeated statement (which is used to specify the residual covariance structure), as shown in the syntax for Model 3.2A below. Notice that we are only using the repeated statement because we wish to modify the residual variance from being constant for all treatments, to being different for different treatments.

title "Model 3.2A";

proc mixed data=ratpup order=internal;

class treat litter sex;

model weight = treat sex treat*sex litsize /

solution ddfm=sat ;

random int / subject=litter;

repeated / group=treat;

format treat trtfmt.;

run;

The residual variance estimates for each treatment based on this model are shown below. Notice that the residual variance for Control (0.2650) is more than twice as large as those for the High and Low treatments, which are similar to each other (0.1084 and 0.0846, respectively).

Covariance Parameter Estimates

Cov Parm Subject Group Estimate

Intercept litter 0.09827

Residual treat High 0.1084

Residual treat Low 0.08459

Residual treat Control 0.2650

The fit statistics for this model are shown below:

Fit Statistics for Model 3.2A

-2 Res Log Likelihood 359.9

AIC (smaller is better) 367.9

AICC (smaller is better) 368.0

BIC (smaller is better) 373.1

We can now proceed to test the following hypothesis:

H0: σ²_high = σ²_low = σ²_control = σ² (the residual variances are all equal)

HA: At least one pair of residual variances is not equal

Because we are no longer on the boundary of a parameter space, the REML-based likelihood ratio test that we calculate has 2 df, which correspond to the two additional covariance parameters in Model 3.2A vs. Model 3.1.

title "P-value for Hypothesis 3.2";

data _null_;

lrtstat = 401.1-359.9;

df=2;

pvalue = 1 - probchi(lrtstat,df);

format pvalue 10.8;

put lrtstat= df= pvalue=;

run;

The results of this hypothesis test are displayed in the SAS log. The result is significant, so we decide that Model 3.2A is preferable to Model 3.1:

lrtstat=41.2 df=2 pvalue=0.000000

We now further refine the residual covariance structure by deciding whether we need separate residual variances for each level of treatment, or whether we can pool the residual variance for the High and Low treatment groups. To do this, we fit Model 3.2B, but first we create a new variable, called TRTGRP, combining the High and Low treatment groups.

title "RATPUP3 dataset";

data ratpup3;

set ratpup2;

if treatment in ("High", "Low") then TRTGRP = 1;

if treatment = "Control" then TRTGRP = 2;

run;

proc format;

value tgrpfmt 1="High/Low"

2="Control";

run;

title "Model 3.2B";

proc mixed data=ratpup3 order=internal;

class treat litter sex trtgrp;

model weight = treat sex treat*sex litsize /

solution ddfm=sat;

random int / subject=litter;

repeated / group=trtgrp;

format treat trtfmt. trtgrp tgrpfmt.;

run;

The covariance parameter estimates for Model 3.2B are shown below:

Covariance Parameter Estimates

Cov Parm Subject Group Estimate

Intercept litter 0.09895

Residual TRTGRP High/Low 0.09242

Residual TRTGRP Control 0.2650

The fit statistics for this model are:

Fit Statistics for Model 3.2B

-2 Res Log Likelihood 361.1

AIC (smaller is better) 367.1

AICC (smaller is better) 367.2

BIC (smaller is better) 371.0

We can now test the hypotheses shown below:

H0: σ²_high = σ²_low

HA: σ²_high ≠ σ²_low

We use a REML-based likelihood ratio test. The SAS code for this hypothesis test is shown below:

title "P-value for Hypothesis 3.3";

data _null_;

lrtstat = 361.1-359.9;

df = 1;

pvalue = 1-probchi(lrtstat, df);

format pvalue 10.8;

put lrtstat = df = pvalue =;

run;

The results of this test are not significant (p=0.27), so we choose the simpler model, Model 3.2B, as the preferred model at this stage, and we pool the residual variance for the High and Low treatment groups.

lrtstat=1.2 df=1 pvalue=0.27332168

Hypothesis 3.4 involves comparing Model 3.2B to Model 3.1.

H0: σ²_high/low = σ²_control = σ²

HA: σ²_high/low ≠ σ²_control

We do not show syntax for Hypothesis 3.4 here. Instead, we simply work with Model 3.2B with unequal variances for the treated and control litters as our preferred model.
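For readers who want to carry out the test of Hypothesis 3.4 anyway, a sketch that follows the same pattern as the earlier tests is given below. It uses the -2 Res Log Likelihood values already reported for Model 3.1 (401.1) and Model 3.2B (361.1), with 1 df for the one additional residual variance in Model 3.2B.

```sas
title "P-value for Hypothesis 3.4";
data _null_;
  lrtstat = 401.1 - 361.1;           /* Model 3.1 (nested) minus Model 3.2B (reference) */
  df = 1;
  pvalue = 1 - probchi(lrtstat, df);
  format pvalue 10.8;
  put lrtstat= df= pvalue=;          /* written to the SAS log */
run;
```

The test statistic is 40.0, far larger than the 1.2 obtained for Hypothesis 3.3, which is consistent with keeping separate residual variances for the treated and control litters.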

Step 4: Reduce the model by removing nonsignificant fixed effects.

We now test Hypothesis 3.5 to decide whether we can remove the Fixed Effects associated with the Treatment by Sex interaction. For this test we use an approximate F-test, with the Satterthwaite degrees of freedom, from our current model, Model 3.2B.

The Type 3 F-test results for Fixed Effects for Model 3.2B are shown below. We decide to drop the Treatment by Sex interaction (p=.7289) from the model.

Model 3.2B

Type 3 Tests of Fixed Effects

Num Den

Effect DF DF F Value Pr > F

treat 2 24.4 11.18 0.0004

sex 1 296 59.17 <.0001

treat*sex 2 194 0.32 0.7289

litsize 1 31.2 49.33 <.0001

We can now test Hypothesis 3.6. We choose to do this using a likelihood ratio test. Because the two models that we wish to compare differ only in the Fixed Effects that they include, we use Maximum Likelihood (method=ml) to fit both models, and then compare them.

title "Model 3.3 using ML";

proc mixed data=ratpup3 order=internal method=ml;

class treat litter sex trtgrp;

model weight = treat sex litsize /solution ddfm=sat ;

random int / subject=litter;

repeated / group=trtgrp;

format treat trtfmt.;

run ;

title "Model 3.3A using ML";

proc mixed data=ratpup3 order=internal method=ml;

class treat litter sex trtgrp;

model weight = sex litsize /solution ddfm=sat ;

random int / subject=litter;

repeated / group=trtgrp;

format treat trtfmt.;

run ;

The fit statistics for Model 3.3 and Model 3.3A, both fit using ML estimation are shown below:

Model 3.3 using ML

Fit Statistics

-2 Log Likelihood 337.8

AIC (smaller is better) 353.8

AICC (smaller is better) 354.2

BIC (smaller is better) 364.1

Model 3.3A using ML

Fit Statistics

-2 Log Likelihood 356.4

AIC (smaller is better) 368.4

AICC (smaller is better) 368.6

BIC (smaller is better) 376.1

The SAS syntax below can be used to carry out the LRT for the effect of Treatment. We use a chi-square distribution with 2 df, corresponding to the two fixed effects associated with treatment in Model 3.3, compared to Model 3.3A.

title "P-value for Hypothesis 3.6";

data _null_;

lrtstat = 356.5-337.8;

df=2;

pvalue = 1 - probchi(lrtstat,df);

format pvalue 10.8;

put lrtstat= df= pvalue=;

run;

The result of this test is significant, as shown in the SAS log, and reproduced below, so we conclude that Treatment does indeed have an effect on birth weight of rat pups, after controlling for Litter size and Sex.

lrtstat=18.7 df=2 pvalue=0.00008697

We now refit Model 3.3 using REML estimation to get unbiased estimates of the covariance parameters. We also add a number of options to the SAS syntax. We request that estimates of V, the implied marginal variance-covariance and correlation matrices for the third litter (chosen because it is a small litter) be displayed in the output. We also estimate post-hoc tests for all differences among the treatment means, using the Tukey-Kramer adjustment for multiple comparisons. Finally, we include model diagnostics, which are a regular part of SAS 9.2.

ods rtf file="ratpup_diagnostics.rtf";

ods graphics on ;

title "Model 3.3 using REML, Model diagnostics";

proc mixed data=ratpup3 order=internal boxplot covtest ;

class treat litter sex trtgrp;

model weight = treat sex litsize / solution ddfm=sat

residual

outpred = pdat1 outpredm = pdat2

influence(iter=5 effect=litter est) ;

id pup_id litter trtgrp treat sex litsize;

random int / subject=litter solution v=3 vcorr=3 ;

repeated / group=trtgrp ;

lsmeans treat / adjust=tukey ;

format treat trtfmt. trtgrp tgrpfmt.;

run ;

ods graphics off;

ods rtf close;

The ods rtf file= statement tells SAS to save the output from this procedure to a file on the local hard drive. We have specified rtf as the file type, but other types, including html, are available. The ods rtf close statement closes the output file destination.

The ods graphics on statement produces the experimental statistical graphics available for proc mixed in SAS Release 9.1.3. The ods graphics off statement is submitted at the end of the code.

The output below is generated after fitting this model.

Model 3.3 using REML, Model diagnostics

The Mixed Procedure

Model Information

Data Set WORK.RATPUP3

Dependent Variable weight

Covariance Structure Variance Components

Subject Effect litter

Group Effect TRTGRP

Estimation Method REML

Residual Variance Method None

Fixed Effects SE Method Model-Based

Degrees of Freedom Method Satterthwaite

Class Level Information

Class Levels Values

treat 3 High Low Control

litter 27 1 2 3 4 5 6 7 8 9 10 11 12 13

14 15 16 17 18 19 20 21 22 23

24 25 26 27

sex 2 Female Male

TRTGRP 2 High/Low Control

Dimensions

Covariance Parameters 3

Columns in X 7

Columns in Z Per Subject 1

Subjects 27

Max Obs Per Subject 18

Number of Observations

Number of Observations Read 322

Number of Observations Used 322

Number of Observations Not Used 0

Iteration History

Iteration Evaluations -2 Res Log Like Criterion

0 1 487.48200292

1 3 356.30395333 0.00022075

2 1 356.27846330 0.00000022

3 1 356.27843834 0.00000000

Convergence criteria met.

Estimated V Matrix for litter 3

Row Col1 Col2 Col3 Col4

1 0.3636 0.09900 0.09900 0.09900

2 0.09900 0.3636 0.09900 0.09900

3 0.09900 0.09900 0.3636 0.09900

4 0.09900 0.09900 0.09900 0.3636

Estimated V Correlation Matrix for litter 3

Row Col1 Col2 Col3 Col4

1 1.0000 0.2722 0.2722 0.2722

2 0.2722 1.0000 0.2722 0.2722

3 0.2722 0.2722 1.0000 0.2722

4 0.2722 0.2722 0.2722 1.0000

Covariance Parameter Estimates

Standard Z

Cov Parm Subject Group Estimate Error Value Pr Z

Intercept litter 0.09900 0.03288 3.01 0.0013

Residual TRTGRP High/Low 0.09178 0.009855 9.31 <.0001

Residual TRTGRP Control 0.2646 0.03395 7.79 <.0001

Fit Statistics

-2 Res Log Likelihood 356.3

AIC (smaller is better) 362.3

AICC (smaller is better) 362.4

BIC (smaller is better) 366.2

Solution for Fixed Effects

Standard

Effect sex treat Estimate Error DF t Value Pr > |t|

Intercept 8.3276 0.2741 33.3 30.39 <.0001

treat High -0.8623 0.1829 25.7 -4.71 <.0001

treat Low -0.4337 0.1523 24.3 -2.85 0.0088

treat Control 0 . . . .

sex Female -0.3434 0.04204 256 -8.17 <.0001

sex Male 0 . . . .

litsize -0.1307 0.01855 31.1 -7.04 <.0001

Solution for Random Effects

Std Err

Effect litter Estimate Pred DF t Value Pr > |t|

Intercept 1 0.1636 0.1633 64.1 1.00 0.3204

Intercept 2 -0.06635 0.1566 62.3 -0.42 0.6733

Intercept 3 -0.1668 0.2346 34.5 -0.71 0.4821

Intercept 4 -0.05175 0.1566 62.3 -0.33 0.7422

Intercept 5 0.3224 0.1591 64.3 2.03 0.0469

Intercept 6 -0.05511 0.1843 54.1 -0.30 0.7661

Intercept 7 0.3756 0.1657 42.4 2.27 0.0286

Intercept 8 -0.02604 0.1607 47.7 -0.16 0.8720

Intercept 9 -0.5708 0.1606 47.6 -3.55 0.0009

Intercept 10 0.07527 0.1592 64.3 0.47 0.6378

Intercept 11 0.04793 0.1345 33.5 0.36 0.7238

Intercept 12 0.01423 0.2357 29.2 0.06 0.9523

Intercept 13 -0.3857 0.1296 44.6 -2.98 0.0047

Intercept 14 0.02224 0.1294 36.4 0.17 0.8646

Intercept 15 0.03691 0.1268 42.3 0.29 0.7724

Intercept 16 0.07164 0.1270 42.6 0.56 0.5758

Intercept 17 -0.4146 0.1268 39.4 -3.27 0.0022

Intercept 18 0.4574 0.1299 36.8 3.52 0.0012

Intercept 19 -0.1987 0.1412 44.9 -1.41 0.1663

Intercept 20 0.3486 0.1345 33.5 2.59 0.0140

Intercept 21 -0.2996 0.1623 29.8 -1.85 0.0750

Intercept 22 -0.5306 0.1474 37.6 -3.60 0.0009

Intercept 23 0.2997 0.2025 38 1.48 0.1471

Intercept 24 0.1937 0.1509 33.6 1.28 0.2079

Intercept 25 0.2443 0.1537 40.9 1.59 0.1197

Intercept 26 0.2499 0.1497 39.7 1.67 0.1029

Intercept 27 -0.1574 0.1496 39.6 -1.05 0.2989

Type 3 Tests of Fixed Effects

Num Den

Effect DF DF F Value Pr > F

treat 2 24.3 11.39 0.0003

sex 1 256 66.72 <.0001

litsize 1 31.1 49.62 <.0001

Least Squares Means

Standard

Effect treat Estimate Error DF t Value Pr > |t|

treat High 5.5518 0.1448 23.3 38.34 <.0001

treat Low 5.9804 0.1046 21.4 57.19 <.0001

treat Control 6.4141 0.1105 27 58.07 <.0001

Differences of Least Squares Means

Standard

Effect treat _treat Estimate Error DF t Value Pr > |t| Adjustment Adj P

treat High Low -0.4286 0.1755 23.4 -2.44 0.0225 Tukey-Kramer 0.0558

treat High Control -0.8623 0.1829 25.7 -4.71 <.0001 Tukey-Kramer 0.0002

treat Low Control -0.4337 0.1523 24.3 -2.85 0.0088 Tukey-Kramer 0.0231
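The entries of the estimated V matrix for litter 3 can be reproduced by hand from the covariance parameter estimates. The diagonal value shows that litter 3 uses the Control residual variance. The High/Low line below is an analogous by-hand calculation that does not appear in the SAS output.

$$
\begin{aligned}
\text{Control litter:}\quad & \widehat{\text{Var}} = 0.09900 + 0.2646 = 0.3636, & \widehat{\text{Corr}} = 0.09900 / 0.3636 \approx 0.272 \\
\text{High/Low litter:}\quad & \widehat{\text{Var}} = 0.09900 + 0.09178 = 0.19078, & \widehat{\text{Corr}} = 0.09900 / 0.19078 \approx 0.519
\end{aligned}
$$

The implied correlation between pups in the same litter is therefore higher in the treated litters, because their residual variance is smaller.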

Conditional Residuals

The conditional residuals are the differences between the observed values and the litter-specific predicted values for birth weight, based on the estimated fixed effects and the EBLUPs (Estimated Best Linear Unbiased Predictors) of the random effects. They can be obtained in SAS using the following syntax. We present the conditional raw residuals separately for the High/Low and the control groups, because we found that they had different residual variances.

proc sort data=pdat1;

by trtgrp;

run;

/*Distribution of conditional raw residuals by treatment group*/

proc univariate data=pdat1 normal;

format trtgrp tgrpfmt.;

by trtgrp;

var resid;

histogram;

qqplot / normal(mu=est sigma=est);

run;

[Figures: histograms of the conditional raw residuals (x-axis: Residual, y-axis: Percent), one panel per treatment group, titled "Model 3.3 using REML, Model diagnostics"]

[Figures: normal Q-Q plots of the conditional raw residuals (x-axis: Normal Quantiles, y-axis: Residual), one panel per treatment group, titled "Model 3.3 using REML, Model diagnostics"]

We can check a plot of the conditional residuals vs. the predicted values using the syntax below:

proc sort data=pdat1;

by trtgrp;

run;

proc sgplot data=pdat1;

format trtgrp tgrpfmt.;

by trtgrp;

scatter y=resid x=pred ;

run;

quit;

Conditional Studentized Residuals

The conditional studentized residual for an observation is the difference between the observed value and the predicted value, based on both the fixed and random effects in the model, divided by its estimated standard deviation. The standard deviation of a residual depends on both the residual variance for the treatment group (High/Low or Control), and the leverage of the observation. The conditional studentized residuals are more appropriate to examine model assumptions and to detect outliers and potentially influential points than are the raw residuals, because the raw residuals may not come from populations with equal variance, as was the case in this analysis.

Furthermore, if an observation has a large conditional residual, we cannot know if it is large because it comes from a population with a larger variance, or if it is just an unusual observation (Schabenberger, 2004).

The graph below shows the conditional studentized residuals for each litter. Notice that these residuals are much more similar in their degree of variability than the original data were.
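The graph is not reproduced here. A comparable display could be drawn from the pdat1 data set written by the outpred= option, as sketched below. The variable name StudentResid is an assumption about what the residual option adds to that data set; the proc contents step is included so the name can be confirmed before plotting.

```sas
/* List the variables in the output data set to confirm the residual names */
proc contents data=pdat1 varnum;
run;

title "Conditional studentized residuals by litter";
proc sgplot data=pdat1;
  vbox studentresid / category=litter;   /* StudentResid: assumed variable name */
  refline 0 / axis=y;                    /* reference line at zero */
run;
```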

Influence Diagnostics

In this section, we examine influence diagnostics generated by Proc Mixed for the rat pups and their litters, obtained using SAS 9.2.

Overall and Fixed Effects Influence Diagnostics

A panel of Fixed Effects Influence Diagnostics from SAS ODS graphics output is shown below. The graphs included in this panel show the effect of removing each litter on selected model summary statistics. The graphs shown below were obtained by using the following syntax:

model weight = treat sex litsize / solution ddfm=sat

influence(iter=5 effect=litter est) ;

Influence on Covariance Parameters

The ODS graphics output from SAS also produces influence diagnostics for the covariance parameter estimates. These can be useful in identifying litters that may have a large effect on the estimate of the between-litter variance (σ²_litter), or on the residual variance estimates for the two treatment groups (High/Low and Control).

Based on these diagnostic plots, it looks like litter 6 (which received the Control dose) was influential in this analysis. Try refitting Model 3.3 without litter 6, and see what effect it has.
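One way to carry out this refit is to add a where statement to the Model 3.3 syntax, as in the sketch below. It assumes the ratpup3 data set and the formats defined earlier, and it drops the diagnostic options to keep the output short.

```sas
title "Model 3.3 using REML, litter 6 excluded";
proc mixed data=ratpup3 order=internal covtest;
  where litter ne 6;                      /* exclude the potentially influential litter */
  class treat litter sex trtgrp;
  model weight = treat sex litsize / solution ddfm=sat;
  random int / subject=litter;
  repeated / group=trtgrp;
  lsmeans treat / adjust=tukey;
  format treat trtfmt. trtgrp tgrpfmt.;
run;
```

Comparing the fixed effect estimates and covariance parameter estimates with those from the full data set shows how much the conclusions depend on this one litter.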

A note on denominator degrees of freedom (df) methods for F-tests in SAS:

The containment method is the default denominator degrees of freedom method for Proc Mixed when a random statement is used, and no denominator degrees of freedom method is specified. The containment method can be explicitly requested by using the ddfm=contain option. The containment method attempts to choose the correct degrees of freedom for fixed effects in the model, based on the syntax used to define the random effects. For example, the df for the variable TREAT would be the df for the random effect that contains the word “treat” in it. The syntax

random int / subject=litter(treat);

would cause SAS to use the degrees of freedom for litter(treat) as the denominator degrees of freedom for the F-test of the fixed effects of TREAT. If no random effect syntactically includes the word “treat”, the residual degrees of freedom would be used.

The Satterthwaite approximation (ddfm=sat) is intended to produce a more accurate F-test approximation, and hence a more accurate p-value for the F-test. The SAS documentation warns that the small-sample properties of the Satterthwaite approximation have not been thoroughly investigated for all types of models implemented in Proc Mixed.

The Kenward-Roger method (ddfm=kr) is an attempt to make a further adjustment to the F-statistics calculated by SAS, to take into account the fact that the REML estimates of the covariance parameters are in fact estimates, and not known quantities. This method inflates the estimated variance-covariance matrix of the fixed effects, and then computes Satterthwaite-type degrees of freedom based on the adjusted matrix.

The between-within method (ddfm=bw) is the default when a repeated statement is used without a random statement, and may also be specified for analyses that include a random statement. This method divides the residual degrees of freedom into a between-subjects portion and a within-subjects portion. If levels of a covariate change within a subject (a time-dependent covariate), then the degrees of freedom for effects associated with that covariate are the within-subjects df. If levels of a covariate are constant within subjects (a time-independent covariate), then the between-subjects df are used for the F-tests.

The residual method (ddfm=resid) assigns n − rank(X) as the denominator degrees of freedom for all fixed effects in the model. The rank of X is the number of linearly independent columns in the X matrix for a given model. This is the same as the degrees of freedom used in OLS regression (i.e., n − p, where n is the total number of observations in the data set and p is the number of fixed effect parameters being estimated).
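To see how much the choice of method matters in practice, the same model can be refit with different ddfm= settings and the Den DF column of the Type 3 tests compared. The sketch below uses the Kenward-Roger method; the data set name ratpup is an assumption, and the model is the simple random-intercept version of the rat pup model rather than any particular numbered model from WWG.

```sas
title "Rat pup model with Kenward-Roger df (sketch)";

proc mixed data = ratpup;             /* ratpup: assumed data set name */
  class treat sex litter;
  /* replace kr with contain, sat, bw, or resid to compare methods */
  model weight = treat sex litsize / solution ddfm=kr;
  random int / subject = litter;
run;
```

Only the denominator df, and hence the p-values, are expected to change across most of these settings; with ddfm=kr the standard errors and F-statistics may change slightly as well.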

Lab Example 3

Three-Level Clustered Data

The Classroom Data

(Chapter 4 in WWG)

This is a three-level clustered data set, in which the units of analysis are students (level 1), who are nested within classrooms (clusters that form level 2), and the classrooms are in turn nested in schools (clusters of clusters that form level 3).

We first import the classroom data from the .csv file where it is stored:

PROC IMPORT OUT= WORK.classroom

DATAFILE= "classroom.csv"

DBMS=CSV REPLACE;

GETNAMES=YES;

DATAROW=2;

GUESSINGROWS=20;

RUN;

The data set has information on all three levels within the same file. Each row contains the School ID, Classroom ID, and Child ID, along with information about the child at level 1 (SES, MINORITY, MATHKIND, MATHGAIN), information about the teacher at level 2 (YEARSTEA) and information about the school at level 3 (HOUSEPOV).

Obs  schoolid  classid  childid    ses  minority  mathkind  mathgain  yearstea  housepov
  1         1      160        1   0.46         1       448        32         1     0.082
  2         1      160        2  -0.27         1       460       109         1     0.082
  3         1      160        3  -0.03         1       511        56         1     0.082
  4         1      217        4  -0.38         1       449        83         2     0.082
  5         1      217        5  -0.03         1       425        53         2     0.082
  6         1      217        6   0.76         1       450        65         2     0.082
  7         1      217        7  -0.03         1       452        51         2     0.082
  8         1      217        8   0.20         1       443        66         2     0.082

We get descriptive statistics for the student-level characteristics (level 1 characteristics) of the 1190 students in the dataset:

title "Level 1 Descriptive Statistics";

proc means data = classroom;

var sex minority mathkind mathgain ses;

run;

Level 1 Descriptive Statistics

Variable N Mean Std Dev Minimum Maximum

----------------------------------------------------------------------

sex 1190 0.5058824 0.5001756 0 1.0000000

minority 1190 0.6773109 0.4677015 0 1.0000000

mathkind 1190 466.6588235 41.4585375 290.0000000 629.0000000

mathgain 1190 57.5663866 34.6138163 -110.0000000 253.0000000

ses 1190 -0.0129832 0.7415381 -1.6100000 3.2100000

----------------------------------------------------------------------

Next, we create a classroom-level (level 2) data set and get descriptive statistics for the 312 classrooms in that data set.

proc sort data = classroom;

by classid;

run;

data level2;

set classroom;

by classid;

if first.classid then output;

run;

title "Level 2 Descriptive Statistics";

proc means data = level2;

var yearstea;

run;

The only numeric variable we will use here is YEARSTEA, the years of teaching experience of the classroom teacher.

Level 2 Descriptive Statistics

Analysis Variable : yearstea

N Mean Std Dev Minimum Maximum

-------------------------------------------------------------------

312 12.2761538 9.6490462 0 40.0000000

-------------------------------------------------------------------

Next, we generate a school-level (level 3) data set with the 107 schools in it, and get descriptive statistics at the school level. The only numeric variable we will use in this data set is HOUSEPOV, the proportion of houses in poverty in the neighborhood surrounding the school. HOUSEPOV has a lot of variability and ranges from 1.2% to 56%.
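The syntax for this step is not shown in the handout. A minimal sketch that mirrors the level 2 step is given here; the data set name level3 is an assumption.

```sas
proc sort data = classroom;
  by schoolid;
run;

data level3;                          /* level3: assumed data set name */
  set classroom;
  by schoolid;
  if first.schoolid then output;      /* keep one row per school */
run;

title "Level 3 Descriptive Statistics";

proc means data = level3;
  var housepov;
run;
```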

Level 3 Descriptive Statistics

The MEANS Procedure

Analysis Variable : housepov

N Mean Std Dev Minimum Maximum

-------------------------------------------------------------------

107 0.1941495 0.1396932 0.0120000 0.5640000

-------------------------------------------------------------------

Models

We now fit models for the classroom data set. We use the original data set (CLASSROOM), with all values combined into one data set. Only observations that are complete for the dependent variable and all predictors will be included in each model.

We use a step-up approach for building the models for this 3-level data set, to mirror the way models are usually fit using the HLM (Bryk and Raudenbush) approach.

The first model we fit is an "empty" or null model, with only the intercept, and no predictors in it. We use this model to derive estimates of the residual (level 1) variance, the variance between classrooms (level 2 variance) and the variance between schools (level 3 variance). We then add level 1 predictors, level 2 predictors, and level 3 predictors to the model.

We also examine the EBLUPs for each classroom and school, by adding the solution (s) option after the slash in the random statement.

Before fitting the first model, we sort the data set by School ID and Classroom ID. Although this step is not necessary, it makes it easier to read the output produced for the EBLUPs by SAS.

proc sort data = classroom;

by schoolid classid;

run;

Model 4.1 below includes the noclprint option in the proc statement, which suppresses printing of the levels of all the schools and classrooms, to save space in the output. We also use v=4 and vcorr=4 to print the estimated V and V correlation matrices for the 4th school. This is one of the smaller schools, with only 6 students, which makes the printout easier to read.

title "Model 4.1 Empty Model: Get Eblups for Classrooms";

proc mixed data = classroom noclprint covtest;

class classid schoolid;

model mathgain = / solution;

random intercept / subject = schoolid v=4 vcorr=4 s;

random intercept / subject = classid(schoolid);

run;

We show partial output from this model below:

Model 4.1 Empty Model: Get Eblups for Classrooms

The Mixed Procedure

Model Information

Data Set WORK.CLASSROOM

Dependent Variable mathgain

Covariance Structure Variance Components

Subject Effects schoolid,

classid(schoolid)

Estimation Method REML

Residual Variance Method Profile

Fixed Effects SE Method Model-Based

Degrees of Freedom Method Containment

Dimensions

Covariance Parameters 3

Columns in X 1

Columns in Z Per Subject 10

Subjects 107

Max Obs Per Subject 31

Number of Observations

Number of Observations Read 1190

Number of Observations Used 1190

Number of Observations Not Used 0

Iteration History

Iteration Evaluations -2 Res Log Like Criterion

0 1 11809.55097802

1 2 11768.77565175 0.00000220

2 1 11768.76493957 0.00000000

Convergence criteria met.

Estimated V Matrix for schoolid 4

Row Col1 Col2 Col3 Col4 Col5 Col6

1 1204.91 176.63 176.63 176.63 77.4419 77.4419

2 176.63 1204.91 176.63 176.63 77.4419 77.4419

3 176.63 176.63 1204.91 176.63 77.4419 77.4419

4 176.63 176.63 176.63 1204.91 77.4419 77.4419

5 77.4419 77.4419 77.4419 77.4419 1204.91 176.63

6 77.4419 77.4419 77.4419 77.4419 176.63 1204.91

Estimated V Correlation Matrix for schoolid 4

Row Col1 Col2 Col3 Col4 Col5 Col6

1 1.0000 0.1466 0.1466 0.1466 0.06427 0.06427

2 0.1466 1.0000 0.1466 0.1466 0.06427 0.06427

3 0.1466 0.1466 1.0000 0.1466 0.06427 0.06427

4 0.1466 0.1466 0.1466 1.0000 0.06427 0.06427

5 0.06427 0.06427 0.06427 0.06427 1.0000 0.1466

6 0.06427 0.06427 0.06427 0.06427 0.1466 1.0000

Covariance Parameter Estimates

Standard Z

Cov Parm Subject Estimate Error Value Pr > Z

Intercept schoolid 77.4419 32.6072 2.37 0.0088

Intercept classid(schoolid) 99.1853 41.8043 2.37 0.0088

Residual 1028.28 49.0441 20.97 <.0001

Fit Statistics

-2 Res Log Likelihood 11768.8

AIC (smaller is better) 11774.8

AICC (smaller is better) 11774.8

BIC (smaller is better) 11782.8

Solution for Fixed Effects

Standard

Effect Estimate Error DF t Value Pr > |t|

Intercept 57.4271 1.4431 106 39.80 <.0001
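The entries of the estimated V and V correlation matrices follow directly from the three covariance parameter estimates. As a rough check, the data step below recomputes them by hand; the data set and variable names are purely illustrative.

```sas
data icc_4_1;                         /* illustrative names throughout */
  var_school = 77.4419;               /* Intercept, schoolid */
  var_class  = 99.1853;               /* Intercept, classid(schoolid) */
  var_resid  = 1028.28;               /* Residual */
  var_total  = var_school + var_class + var_resid;
  /* correlation for two students in the same school, different classrooms */
  icc_school = var_school / var_total;
  /* correlation for two students in the same classroom */
  icc_class  = (var_school + var_class) / var_total;
run;

proc print data = icc_4_1 noobs;
  format icc_school icc_class 6.4;
run;
```

The total, 1204.91, is the diagonal of the V matrix. The same-classroom covariance, 77.4419 + 99.1853 = 176.63, and the different-classroom covariance, 77.4419, are its off-diagonal entries. The two ratios should agree with the values 0.06427 and 0.1466 in the V correlation matrix.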

The Solution for Random Effects output shown below is only for schoolid = 1 and 2. SAS has left space for 10 random effects within each school (the school intercept plus up to nine classroom intercepts), which matches the 10 columns in Z per subject reported in the Dimensions table. Slots that do not correspond to an actual classroom are simply left with values of zero in the printout. The p-values for the individual random effects are not really of interest to us, except to point out unusual schools or classrooms.

Solution for Random Effects

                                            Std Err
Effect      classid   schoolid   Estimate      Pred     DF   t Value   Pr > |t|
Intercept                    1     0.9411    7.1657    878      0.13     0.8955
Intercept       160          1     1.6380    8.9188    878      0.18     0.8543
Intercept       217          1    -0.4326    8.1149    878     -0.05     0.9575
Intercept                               0    9.9592    878      0.00     1.0000
Intercept                               0    9.9592    878      0.00     1.0000
Intercept                               0    9.9592    878      0.00     1.0000
Intercept                               0    9.9592    878      0.00     1.0000
Intercept                               0    9.9592    878      0.00     1.0000
Intercept                               0    9.9592    878      0.00     1.0000
Intercept                               0    9.9592    878      0.00     1.0000
Intercept                    2     2.5471    7.0625    878      0.36     0.7184
Intercept       197          2    -1.6939    9.1905    878     -0.18     0.8538
Intercept       211          2     2.5129    8.6881    878      0.29     0.7725
Intercept       307          2     2.4433    8.6881    878      0.28     0.7786
Intercept                               0    9.9592    878      0.00     1.0000
Intercept                               0    9.9592    878      0.00     1.0000
Intercept                               0    9.9592    878      0.00     1.0000
Intercept                               0    9.9592    878      0.00     1.0000
Intercept                               0    9.9592    878      0.00     1.0000
Intercept                               0    9.9592    878      0.00     1.0000

If we want more compact information about the school and classroom Eblups, we can use the alternative SAS syntax for the random effects:

title "Model 4.1: Alternative Syntax";

ods exclude solutionr;

ods output solutionr=eblupdat;

proc mixed data = classroom noclprint covtest;

class classid schoolid;

model mathgain = / s;

random schoolid / solution;

random intercept / subject = classid(schoolid) solution;

run;

This causes SAS to think of this data set as having only one subject, but it also causes SAS to print out the random intercepts for schools first, followed by the random intercepts for classrooms, making it easier to read.

Solution for Random Effects

                                            Std Err
Effect      classid   schoolid   Estimate      Pred     DF   t Value   Pr > |t|
schoolid                     1     0.9411    7.1657    878      0.13     0.8955
schoolid                     2     2.5471    7.0625    878      0.36     0.7184
schoolid                     3    13.6730    6.6390    878      2.06     0.0397
. . .
Intercept       160          1     1.6380    8.9188    878      0.18     0.8543
Intercept       217          1    -0.4326    8.1149    878     -0.05     0.9575
Intercept       197          2    -1.6939    9.1905    878     -0.18     0.8538
Intercept       211          2     2.5129    8.6881    878      0.29     0.7725
Intercept       307          2     2.4433    8.6881    878      0.28     0.7786
Intercept        11          3     3.8699    8.6607    878      0.45     0.6551
Intercept       137          3     5.5630    9.1818    878      0.61     0.5448
Intercept       145          3     8.1016    8.4622    878      0.96     0.3386
Intercept       228          3   -0.02247    8.8971    878     -0.00     0.9980

We now wish to fit a model (Model 4.2) with fixed effects for the level 1 covariates, plus the random effects for school and classroom. We want to compare this model with the model containing the intercept only (Model 4.1), using a likelihood ratio test. Because the two models differ in their fixed effects, we need to fit both models using ML estimation.

title "Model 4.1: ML Estimation";

proc mixed data = classroom noclprint method=ml;

class classid schoolid;

model mathgain = / s;

random schoolid ;

random intercept / subject = classid(schoolid) ;

run;

title "Model 4.2: ML Estimation include level 1 variables";

proc mixed data = classroom noclprint method = ml;

class classid schoolid;

model mathgain = mathkind sex minority ses / solution;

random intercept / subject = schoolid;

random intercept / subject = classid(schoolid);

run;

Model 4.1: ML Estimation

Fit Statistics

-2 Log Likelihood 11771.3

AIC (smaller is better) 11779.3

AICC (smaller is better) 11779.4

BIC (smaller is better) 11790.0

Model 4.2: ML Estimation include level 1 variables

Fit Statistics

-2 Log Likelihood 11391.0

AIC (smaller is better) 11407.0

AICC (smaller is better) 11407.1

BIC (smaller is better) 11428.3

We use a likelihood ratio Chi-square test, with 4 df, because we have 4 covariates in Model 4.2, vs. none in Model 4.1. We decide to keep Model 4.2, because our LR chi-square test is significant (p<0.001).
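The test statistic is the difference between the two -2 log likelihood values reported above. The data step below is a small sketch of the calculation; the data set and variable names are illustrative.

```sas
data lrt_4_1_vs_4_2;                  /* illustrative names throughout */
  m2ll_null = 11771.3;                /* -2 Log Likelihood, Model 4.1 (ML) */
  m2ll_full = 11391.0;                /* -2 Log Likelihood, Model 4.2 (ML) */
  chisq     = m2ll_null - m2ll_full;  /* likelihood ratio chi-square */
  df        = 4;                      /* four added fixed effects */
  p_value   = 1 - probchi(chisq, df); /* upper-tail chi-square probability */
run;

proc print data = lrt_4_1_vs_4_2 noobs;
  format p_value pvalue6.4;
run;
```

The chi-square statistic is 11771.3 - 11391.0 = 380.3 on 4 df, which is far beyond any conventional critical value.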

We now add level 2 (classroom-level) predictors (YEARSTEA, MATHPREP, MATHKNOW) to the model, and use t-tests to check the significance of each of these continuous predictors. We do not use Method=ML for this model, so the default REML estimation is used.

model mathgain = mathkind sex minority ses yearstea mathprep mathknow / solution;
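Written out in full, the call might look like the sketch below. It reuses the random statements from Model 4.2 and relies on the default REML estimation; the title text is an assumption.

```sas
title "Model 4.2 plus level 2 predictors (sketch)";

proc mixed data = classroom noclprint covtest;
  class classid schoolid;
  model mathgain = mathkind sex minority ses
                   yearstea mathprep mathknow / solution;
  random intercept / subject = schoolid;
  random intercept / subject = classid(schoolid);
run;
```

Because only complete cases are used, it is worth checking the Number of Observations table for this fit. The smaller denominator df in the output for this model (792, compared with 874 for Model 4.2) suggest that some observations are dropped because of missing values on the added predictors.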

None of the t-tests for these variables is significant at alpha=0.05, so we don't include any of these level 2 predictors in our model.

Solution for Fixed Effects

Standard

Effect Estimate Error DF t Value Pr > |t|

Intercept 282.02 11.7016 103 24.10 <.0001

mathkind -0.4750 0.02275 792 -20.88 <.0001

sex -1.3395 1.7186 792 -0.78 0.4360

minority -7.8680 2.4181 792 -3.25 0.0012

ses 5.4194 1.2760 792 4.25 <.0001

yearstea 0.03975 0.1170 792 0.34 0.7343

mathprep 1.0948 1.1482 792 0.95 0.3406

mathknow 1.9149 1.1468 792 1.67 0.0954

We now include the level 3 (school level) predictor, HOUSEPOV, and check the t-test for this variable.

model mathgain = mathkind sex minority ses housepov / solution;

Because the t-test for this variable is not significant (p=.25), we don't include it. Our final model is model 4.2, containing level 1 predictors and the random effects for school and classroom.

Solution for Fixed Effects

Standard

Effect Estimate Error DF t Value Pr > |t|

Intercept 285.06 11.0208 106 25.87 <.0001

mathkind -0.4709 0.02228 873 -21.13 <.0001

sex -1.2345 1.6574 873 -0.74 0.4566

minority -7.7557 2.3850 873 -3.25 0.0012

ses 5.2398 1.2450 873 4.21 <.0001

housepov -11.4398 9.9373 873 -1.15 0.2500

We now refit Model 4.2, but this time we include two new ODS (Output Delivery System) statements. The first statement (ods exclude) prevents the Eblups from being printed in the output window. The second statement (ods output) saves them in a data set called Eblups_4_2. We can then use this data set to check the distribution of the Eblups.

ods exclude SolutionR;

ods output SolutionR=Eblups_4_2;

title "Model 4.2 (Final)";

proc mixed data = classroom noclprint covtest;

class classid schoolid;

model mathgain = mathkind sex minority ses /

solution outpred = pdat1;

random schoolid /solution;

random classid(schoolid) / solution;

run;

Model 4.2 (Final)

The Mixed Procedure

Model Information

Data Set WORK.CLASSROOM

Dependent Variable mathgain

Covariance Structure Variance Components

Estimation Method REML

Residual Variance Method Profile

Fixed Effects SE Method Model-Based

Degrees of Freedom Method Containment

Dimensions

Covariance Parameters 3

Columns in X 5

Columns in Z 419

Subjects 1

Max Obs Per Subject 1190

Number of Observations

Number of Observations Read 1190

Number of Observations Used 1190

Number of Observations Not Used 0

Iteration History

Iteration Evaluations -2 Res Log Like Criterion

0 1 11448.84660718

1 2 11385.80958266 0.00000140

2 1 11385.80306098 0.00000000

Convergence criteria met.

Covariance Parameter Estimates

Standard Z

Cov Parm Estimate Error Value Pr > Z

schoolid 75.2154 25.9224 2.90 0.0019

classid(schoolid) 83.2427 29.3680 2.83 0.0023

Residual 734.59 34.7008 21.17 <.0001

Fit Statistics

-2 Res Log Likelihood 11385.8

AIC (smaller is better) 11391.8

AICC (smaller is better) 11391.8

BIC (smaller is better) 11399.8

Solution for Fixed Effects

Standard

Effect Estimate Error DF t Value Pr > |t|

Intercept 282.79 10.8533 106 26.06 <.0001

mathkind -0.4698 0.02227 874 -21.10 <.0001

sex -1.2511 1.6577 874 -0.75 0.4506

minority -8.2620 2.3401 874 -3.53 0.0004

ses 5.3464 1.2411 874 4.31 <.0001

Type 3 Tests of Fixed Effects

Num Den

Effect DF DF F Value Pr > F

mathkind 1 874 445.19 <.0001

sex 1 874 0.57 0.4506

minority 1 874 12.46 0.0004

ses 1 874 18.56 <.0001

We now look at the distribution of the Eblups for the classrooms, using the SAS commands below. Notice that we select only classroom Eblups, by using a Where statement.

title "Figure 4.5";

proc univariate data=eblups_4_2 noprint;

where effect="classid(schoolid)";

var estimate;

histogram;

qqplot / normal (mu=est sigma=est);

format estimate 5.2;

run;

We look at the distribution of the Eblups for the schools by using the SAS commands below. Notice that we select only school Eblups, by again using a Where statement.

title "Figure 4.6";

proc univariate data=eblups_4_2 noprint;

where effect="schoolid";

var estimate;

histogram;

qqplot / normal (mu=est sigma=est);

format estimate 5.2;

run;

Finally, we can look at the distribution of the conditional residuals. These residuals are not in the Eblups data set, but are in Pdat1, the data set that we output as part of the model statement:

model mathgain = mathkind sex minority ses /

solution outpred = pdat1;

title "Figure 4.7";

proc univariate data=pdat1 noprint;

var resid;

histogram;

qqplot / normal (mu=est sigma=est);

format resid 5.;

run;

From these plots, it looks like we have some outliers, that is, children who did either very well or very poorly, and whose math gains our model did not predict well.

The scatterplot of residuals vs. predicted values appears to be reasonable, with no indication of unequal variance in the graph below.
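The syntax for this scatterplot is not shown in the handout. One possible sketch uses the Pred and Resid variables that the outpred= option writes to Pdat1. It assumes SAS 9.2 or later, where Proc Sgplot is available; the title text is also an assumption.

```sas
title "Conditional residuals vs. predicted values (sketch)";

proc sgplot data = pdat1;
  scatter x = pred y = resid;         /* Pred and Resid come from outpred= */
  refline 0 / axis = y;               /* horizontal reference line at zero */
run;
```

In earlier releases, a similar plot can be requested with Proc Gplot, using plot resid*pred / vref=0.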

Lab Example 4

Repeated Measures Data

The Rat Brain Data

(Chapter 5 in WWG)

The Rat Brain data set was collected for five animals, where nucleotide activation was measured in five brain regions using two treatments. Here we show the Rat Brain data for three of the brain regions in its original form.

data ratbrain_wide;

infile "ratbrain.dat" firstobs=2 dlm="09"X;

input animal $ Carb_BST Carb_LS Carb_VDB

Basal_BST Basal_LS Basal_VDB;

proc print data=ratbrain_wide;

run;

The printout of the Rat Brain data set in wide form shows that we have 6 measurements for each rat, based on the two treatments (Basal and Carbachol), and three brain regions (BST, LS, and VDB). All six measurements for the same rat are on the same row of data.

Printout of Rat Brain Wide Data Set

Basal_ Basal_

Obs animal Carb_BST Carb_LS Carb_VDB BST Basal_LS VDB

1 R111097 371.71 302.02 449.70 366.19 199.31 187.11

2 R111397 492.58 355.74 459.58 375.58 204.85 179.38

3 R100797 664.72 587.10 726.96 458.16 245.04 237.42

4 R100997 515.29 437.56 604.29 479.81 261.19 195.51

5 R110597 589.25 493.93 621.07 462.79 278.33 262.05

Analysis comparing regions for Basal (control) treatment using Proc GLM

We begin by carrying out a traditional repeated measures ANOVA analysis of the ratbrain data set in Wide Form, using Proc GLM. Initially we will only be looking at the effect of brain region for the Basal treatment. It is important to remember three major points about this analytic method:

1. If there are any missing data points for any brain region for any subject, the entire subject will be dropped from the analysis.

2. The only type of covariance structure allowed by Proc GLM for the univariate analysis of repeated measures data is Compound Symmetric. Adjustments (including Greenhouse-Geisser and Huynh-Feldt) are available to compensate for lack of compound symmetry in the variance-covariance matrix.

3. Time-varying covariates cannot be included in the analysis using Proc GLM.

The Proc GLM syntax sets up the analysis as a multivariate dependent variable problem. We let SAS know in the repeated statement that there is one repeated measures factor, Region, and that it has three levels.

title "Proc GLM Repeated Measures ANOVA";

title2 "Compare Region for Basal Only";

proc glm data=ratbrain_wide;

model Basal_BST Basal_LS Basal_VDB = / nouni;

repeated region 3 / summary;

run; quit;

Partial output from Proc GLM is shown below. There are two parts of the analysis, first a multivariate (MANOVA) analysis, and then a univariate ANOVA. The MANOVA results are similar to what you would get using SAS Proc Mixed with type=un in the repeated statement, and the univariate results are like what you would get using Proc Mixed with type=cs (compound symmetry).

Proc GLM Repeated Measures ANOVA

Compare Region for Basal Only

The GLM Procedure

Number of Observations Read 5

Number of Observations Used 5

Repeated Measures Level Information

Dependent Variable Basal_BST Basal_LS Basal_VDB

Level of region 1 2 3

MANOVA Test Criteria and Exact F Statistics

for the Hypothesis of no region Effect

H = Type III SSCP Matrix for region

E = Error SSCP Matrix

S=1 M=0 N=0.5

Statistic Value F Value Num DF Den DF Pr > F

Wilks' Lambda 0.01068796 138.84 2 3 0.0011

Pillai's Trace 0.98931204 138.84 2 3 0.0011

Hotelling-Lawley Trace 92.56319571 138.84 2 3 0.0011

Roy's Greatest Root 92.56319571 138.84 2 3 0.0011

The GLM Procedure

Repeated Measures Analysis of Variance

Univariate Tests of Hypotheses for Within Subject Effects

Source DF Type III SS Mean Square F Value Pr > F

region 2 139642.4535 69821.2267 150.17 <.0001

Error(region) 8 3719.4783 464.9348

Adj Pr > F

Source G - G H - F

region <.0001 <.0001

Error(region)

Greenhouse-Geisser Epsilon 0.6129

Huynh-Feldt Epsilon 0.7441

Rearranging Data from Wide to Long

The appropriate data structure for an analysis using SAS Proc Mixed is the vertical or “long” structure. The table below shows an example of the appropriate data structure for a longitudinal or repeated measures data set.

Subject   Time Point   Response
1         1            14
1         2            17
1         3            21
2         1            16
2         2            16
2         3            19

Note that there are three records per subject, corresponding to the three time points of data collection, and that the response variable has values which vary over time. There is also an explicit variable indicating the time points of the repeated measures.

There may be differing numbers of observations for each subject due to missing data in a repeated measures study, or perhaps due to loss to follow-up in a longitudinal study. Missing data is not a problem in a LMM setting, as long as the data can be considered to be MAR, i.e., Missing at Random.

If you have a data set that has a “horizontal” structure, in which the repeated measurements on the same subject are contained in different variables across the same row, you will need to restructure the data. This can be done using a data step, as shown in the example below:

The SAS commands below rearrange the Rat Brain data set so it is in long form appropriate for analysis using SAS Proc Mixed.

data ratbrain(keep=animal treat region activate);

set ratbrain_wide;

array origvar (2,3) Basal_BST Basal_LS Basal_VDB

Carb_BST Carb_LS Carb_VDB;

do treat = 1 to 2;

do region = 1 to 3;

activate = origvar(treat,region);

output;

end;

end;

run;

The printout of the Rat Brain data set in the long form shows that we now have 6 observations for each rat, one for each combination of the two treatments (Basal and Carbachol), and three brain regions (BST, LS, and VDB). The dependent variable, ACTIVATE, is now present once on each row of data.

Long Data Set

Obs animal treat region activate

1 R111097 1 1 366.19

2 R111097 1 2 199.31

3 R111097 1 3 187.11

4 R111097 2 1 371.71

5 R111097 2 2 302.02

6 R111097 2 3 449.70

7 R111397 1 1 375.58

8 R111397 1 2 204.85

9 R111397 1 3 179.38

10 R111397 2 1 492.58

11 R111397 2 2 355.74

12 R111397 2 3 459.58

13 R100797 1 1 458.16

14 R100797 1 2 245.04

15 R100797 1 3 237.42

16 R100797 2 1 664.72

17 R100797 2 2 587.10

18 R100797 2 3 726.96

19 R100997 1 1 479.81

20 R100997 1 2 261.19

21 R100997 1 3 195.51

22 R100997 2 1 515.29

23 R100997 2 2 437.56

24 R100997 2 3 604.29

25 R110597 1 1 462.79

26 R110597 1 2 278.33

27 R110597 1 3 262.05

28 R110597 2 1 589.25

29 R110597 2 2 493.93

30 R110597 2 3 621.07
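The numeric codes in the long data set follow the order of the variables in the array statement, so treat = 1 is Basal and treat = 2 is Carbachol, and region = 1, 2, 3 correspond to BST, LS, and VDB. To make printouts easier to read, value labels can be attached with user-defined formats, as in the sketch below; the format names treatfmt and regionfmt are assumptions.

```sas
proc format;
  value treatfmt  1 = "Basal"
                  2 = "Carbachol";
  value regionfmt 1 = "BST"
                  2 = "LS"
                  3 = "VDB";
run;

proc print data = ratbrain;
  format treat treatfmt. region regionfmt.;   /* labels for display only */
run;
```

The format statement in Proc Print changes only how the values are displayed; the stored values of treat and region remain 1, 2, and 3, so a statement such as where treat = 1 still works.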

Analysis of the Rat Brain Data Using Proc Mixed

We will first examine different ways to choose a covariance structure for a simple repeated measures design. For the rat brain data, nucleotide activation was measured in three brain regions, with two treatments (Carbachol and Basal).

Analysis comparing regions for Basal (control) treatment using Proc Mixed

We begin the analysis using Proc Mixed by comparing a set of models, using a marginal model approach (with no random effects) and different covariance structures for R. In the first model, we use an unstructured (type=un) covariance structure for R. Note that because we are fitting a marginal model, there is no random statement.

title "Repeated Measures Anova Using Proc Mixed";

title2 "Compare Region for Basal Only";

proc mixed data=ratbrain covtest;

where treat = 1;

class region animal;

model activate=region;

repeated /subject=animal type=un r rcorr;

run;

Repeated Measures Anova Using Proc Mixed

Compare Region for Basal Only

The Mixed Procedure

Model Information

Data Set WORK.RATBRAIN

Dependent Variable activate

Covariance Structure Unstructured

Subject Effect animal

Estimation Method REML

Residual Variance Method None

Fixed Effects SE Method Model-Based

Degrees of Freedom Method Between-Within

Class Level Information

Class Levels Values

region 3 1 2 3

animal 5 R100797 R100997 R110597

R111097 R111397

Dimensions

Covariance Parameters 6

Columns in X 4

Columns in Z 0

Subjects 5

Max Obs Per Subject 3

Number of Observations

Number of Observations Read 15

Number of Observations Used 15

Number of Observations Not Used 0

Iteration History

Iteration Evaluations -2 Res Log Like Criterion

0 1 128.65412382

1 1 114.63447657 0.00000000

Convergence criteria met.

Estimated R Matrix for animal R100797

Row Col1 Col2 Col3

1 2842.82 1736.67 1225.30

2 1736.67 1202.34 964.95

3 1225.30 964.95 1276.56

Estimated R Correlation Matrix

for animal R100797

Row Col1 Col2 Col3

1 1.0000 0.9394 0.6432

2 0.9394 1.0000 0.7789

3 0.6432 0.7789 1.0000

Covariance Parameter Estimates

Standard Z

Cov Parm Subject Estimate Error Value Pr Z

UN(1,1) animal 2842.82 2010.18 1.41 0.0786

UN(2,1) animal 1736.67 1268.27 1.37 0.1709

UN(2,2) animal 1202.34 850.18 1.41 0.0786

UN(3,1) animal 1225.30 1132.52 1.08 0.2793

UN(3,2) animal 964.95 785.17 1.23 0.2191

UN(3,3) animal 1276.56 902.66 1.41 0.0786

Fit Statistics

-2 Res Log Likelihood 114.6

AIC (smaller is better) 126.6

AICC (smaller is better) 143.4

BIC (smaller is better) 124.3

Null Model Likelihood Ratio Test

DF Chi-Square Pr > ChiSq

5 14.02 0.0155

Solution for Fixed Effects

Standard

Effect region Estimate Error DF t Value Pr > |t|

Intercept 212.29 15.9785 4 13.29 0.0002

region 1 216.21 18.2690 4 11.83 0.0003

region 2 25.4500 10.4786 4 2.43 0.0721

region 3 0 . . . .

Type 3 Tests of Fixed Effects

Num Den

Effect DF DF F Value Pr > F

region 2 4 185.13 0.0001

We now change the model, using Type=CS for the structure of the R matrix.

repeated region / subject = animal type=cs r rcorr;

Partial output for this model is shown below. The AIC is slightly smaller than for the unstructured model (125.6 vs. 126.6), and the BIC is essentially unchanged (124.8 vs. 124.3).

Estimated R Correlation Matrix

for animal R100797

Row Col1 Col2 Col3

1 1.0000 0.7379 0.7379

2 0.7379 1.0000 0.7379

3 0.7379 0.7379 1.0000

Covariance Parameter Estimates

Standard Z

Cov Parm Subject Estimate Error Value Pr Z

CS animal 1308.97 1038.07 1.26 0.2073

Residual 464.93 232.47 2.00 0.0228

Fit Statistics

-2 Res Log Likelihood 121.6

AIC (smaller is better) 125.6

AICC (smaller is better) 126.9

BIC (smaller is better) 124.8
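Because the unstructured and compound symmetry models have the same fixed effects, were both fit with REML, and the compound symmetry structure is a special case of the unstructured one, a REML-based likelihood ratio test can supplement the information criteria. The sketch below uses the -2 Res Log Likelihood values from the two fits; the data set and variable names are illustrative. With only five animals, the chi-square approximation should be treated as rough.

```sas
data lrt_un_vs_cs;                    /* illustrative names throughout */
  m2rll_cs = 121.6;                   /* -2 Res Log Likelihood, type=cs (2 parameters) */
  m2rll_un = 114.6;                   /* -2 Res Log Likelihood, type=un (6 parameters) */
  chisq    = m2rll_cs - m2rll_un;     /* likelihood ratio chi-square */
  df       = 6 - 2;                   /* difference in covariance parameters */
  p_value  = 1 - probchi(chisq, df);  /* upper-tail chi-square probability */
run;

proc print data = lrt_un_vs_cs noobs;
  format p_value 6.4;
run;
```

The statistic is 7.0 on 4 df, giving a p-value of about 0.14, so this test gives no strong evidence that the unstructured matrix fits better than compound symmetry.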

We could also consider other structures, such as Compound symmetry heterogeneous (type=CSH), which is not shown here.
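For readers who want to try it, a sketch of the heterogeneous compound symmetry fit follows. It is the same call as before with only the type= option changed; the title2 text is an assumption. With so few animals, the fit may be unstable, so the iteration history should be checked.

```sas
title "Repeated Measures Anova Using Proc Mixed";
title2 "Compare Region for Basal Only: type=csh (sketch)";

proc mixed data=ratbrain covtest;
  where treat = 1;                    /* Basal treatment only */
  class region animal;
  model activate = region;
  /* separate variance for each region, one common correlation */
  repeated region / subject=animal type=csh r rcorr;
run;
```

This structure has four covariance parameters (three variances and one correlation), so it sits between type=cs (two parameters) and type=un (six parameters).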

Two Repeated Factors (Treat and Region) Analysis Using Proc GLM

Fitting models with more than one repeated measures factor is not as simple as when there is only a single repeated measures factor.

We begin this portion of the analysis by fitting a traditional repeated measures ANOVA