Random Effects Part 2
STAT:5201 Week 10 - Lecture 1b

Random Effects Part 2 - University of Iowa, homepage.stat.uiowa.edu/.../4-3_random_effects_pt2.pdf


Random Effects

Example (Single Random Effects Factor - gene expression)

In genetics, an inbred line is homozygous at every location along the chromosome.

In the lab, we commonly have inbred plants and also inbred mice.

The benefit to scientists is that we can create genetically identical offspring.


Gene expression example:

10 different inbred lines

3 organisms from each line (genetically identical)

Response ≡ gene expression (measured at each of 12,488 genes).


In random effects models (i.e., when all effects except the overall mean are random), we are investigating variance components: which variance components are relatively large or small?

Example (Single Random Effects Factor)

At each gene, I can consider the following model (specific to a gene):
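The model itself did not survive extraction. A reconstruction consistent with the notation used later in this lecture (line effects $L_i$ with variance $\sigma^2_L$, error variance $\sigma^2$, 10 lines with 3 organisms each) would be the one-way random effects model below; treat it as a sketch rather than the slide's exact statement.

$$
\begin{aligned}
Y_{ij} &= \mu + L_i + \varepsilon_{ij}, \qquad i = 1, \dots, 10, \quad j = 1, 2, 3,\\
L_i &\overset{iid}{\sim} N(0, \sigma^2_L), \qquad \varepsilon_{ij} \overset{iid}{\sim} N(0, \sigma^2), \qquad \text{all } L_i \text{ and } \varepsilon_{ij} \text{ independent.}
\end{aligned}
$$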


If $\sigma^2_L / (\sigma^2 + \sigma^2_L) \approx 1$, then the variability in gene expression is mainly due to genetics and is said to be ‘reproducible’. This is a measure of ‘heritability’.
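Under the one-way model $Y_{ij} = \mu + L_i + \varepsilon_{ij}$ (assuming, as usual, that the $L_i$ and $\varepsilon_{ij}$ are all independent), a short derivation shows why this ratio is the intraclass correlation $\rho_I$: two organisms from the same line share the same $L_i$.

$$
\begin{aligned}
\operatorname{Var}(Y_{ij}) &= \operatorname{Var}(L_i) + \operatorname{Var}(\varepsilon_{ij}) = \sigma^2_L + \sigma^2,\\
\operatorname{Cov}(Y_{ij}, Y_{ij'}) &= \operatorname{Var}(L_i) = \sigma^2_L \qquad (j \neq j'),\\
\operatorname{Corr}(Y_{ij}, Y_{ij'}) &= \frac{\sigma^2_L}{\sigma^2 + \sigma^2_L} = \rho_I.
\end{aligned}
$$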


Here, we will use the estimated intraclass correlation $\hat{\rho}_I$ for each of the 12,488 genes to pick out those with ‘highly heritable expression’.

So, we will use $\hat{\rho}_I = \hat{\sigma}^2_L / (\hat{\sigma}^2 + \hat{\sigma}^2_L) > \text{cut-off}$ as a filter.
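A minimal sketch of the filter, assuming the per-gene variance component estimates are available as a dictionary mapping a gene ID to $(\hat{\sigma}^2_L, \hat{\sigma}^2)$. The function names are hypothetical, and the cut-off is left as a parameter.

```python
from typing import Dict, List, Tuple


def intraclass_corr(sigma2_line: float, sigma2_error: float) -> float:
    """rho_I = sigma2_L / (sigma2 + sigma2_L): share of variance due to line."""
    total = sigma2_line + sigma2_error
    if total <= 0:
        raise ValueError("total variance must be positive")
    return sigma2_line / total


def heritable_genes(estimates: Dict[str, Tuple[float, float]],
                    cutoff: float) -> List[str]:
    """Genes whose estimated rho_I exceeds the cut-off, sorted by gene ID.

    `estimates` maps gene ID -> (sigma2_line_hat, sigma2_error_hat).
    """
    return sorted(
        gene
        for gene, (s2_line, s2_err) in estimates.items()
        if intraclass_corr(s2_line, s2_err) > cutoff
    )
```

The strict `>` follows the filter as written here; switching to `>=` gives the rule used for the 0.80 count later in the lecture.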

The first 7 observations are shown here (the response is on the log scale).


Because I’m going to do the same analysis at each gene, I will make sure my data is sorted by gene first.
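SAS BY-group processing expects the input sorted by the BY variable, which is why the data are sorted by gene first. A rough Python analogue (an illustration only, not the lecture's SAS code) is `itertools.groupby`, which likewise groups only consecutive records, so unsorted input would split one gene into several groups. The `(gene, line, expression)` record layout and the function names are hypothetical.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, List, Tuple

Record = Tuple[str, int, float]  # hypothetical layout: (gene, line, log expression)


def split_by_gene(records: List[Record]) -> Dict[str, List[Record]]:
    """Sort by gene first so that each gene forms exactly one consecutive block."""
    ordered = sorted(records, key=itemgetter(0))
    return {gene: list(block) for gene, block in groupby(ordered, key=itemgetter(0))}


def count_runs_unsorted(records: List[Record]) -> int:
    """Number of groups groupby finds WITHOUT sorting (consecutive runs only)."""
    return sum(1 for _ in groupby(records, key=itemgetter(0)))
```

With records ordered `g2, g1, g2`, the unsorted pass finds three runs, while `split_by_gene` returns the two genes expected.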


When working with random effects, we will usually use the newer procedure PROC MIXED, though occasionally we will have a reason to use PROC GLM when modeling random effects.

Here, besides using PROC MIXED, I will do something else new, which is to save some of the PROC MIXED output to a new SAS data set called ‘covparm’ by utilizing an Output Delivery System (ODS) statement.

Example (Single Random Effects Factor - gene expression)

After running the procedure, my ODS statement will create the ‘covparm’ data set, which holds the estimates of $\sigma^2_L$ and $\sigma^2$ for each gene.

These are the variance components in the random effects model (a separate such model is fit for each gene here). I will use these estimates to estimate the intraclass correlation $\rho_I$ for each gene.
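A minimal sketch of turning the ‘covparm’ estimates into one $\hat{\rho}_I$ per gene, assuming the data set has been exported as long-format rows `(gene, CovParm, Estimate)` in which `CovParm` is the random effect's name (assumed here to be `line`) or `Residual`. Those column and level names are assumptions about the export, and `rho_by_gene` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Row = Tuple[str, str, float]  # assumed layout: (gene, CovParm, Estimate)


def rho_by_gene(rows: Iterable[Row], line_name: str = "line",
                residual_name: str = "Residual") -> Dict[str, float]:
    """Pivot long-format covariance-parameter rows into one rho_I per gene."""
    comps: Dict[str, Dict[str, float]] = defaultdict(dict)
    for gene, covparm, estimate in rows:
        comps[gene][covparm] = estimate          # collect both estimates per gene
    result: Dict[str, float] = {}
    for gene, c in comps.items():
        s2_line, s2_err = c[line_name], c[residual_name]
        result[gene] = s2_line / (s2_line + s2_err)
    return result
```

Pivoting by gene first means each ratio is formed from two estimates of the same gene, regardless of row order.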


The PROC MIXED statement with the ODS output statement is below. (We will see more about how to use ODS in the next lecture.)


Because we are running this procedure by gene, we receive output after every model fit. Below we see some output for the last gene.


The covariance parameter estimates $\hat{\sigma}^2_L$ and $\hat{\sigma}^2$ are also output for gene 12,488:

Next, let’s look at the ‘covparm’ data set I created using ODS...


Now, I have an estimate of $\sigma^2_L$ and $\sigma^2$ at each gene.


One thing for which we can appropriately use PROC GLM in random effects models is to get the expected mean squares (EMS) of a random effects factor. BUT PROC GLM CAN DO THE WRONG THING WITH RANDOM EFFECTS, so if we have random effects, in general, we will use PROC MIXED.

Example (Single Random Effects Factor - gene expression)

Here, we use PROC GLM to get the EMS formulas for the effects in the model at a single gene (they are the same for all genes).


And the relevant PROC GLM output:

And we see the EMS for Line as $E(MS_{Line}) = \sigma^2 + 3\sigma^2_L$.

Also, $\hat{\sigma}^2_L = \dfrac{MS_{Line} - \hat{\sigma}^2}{3} = \dfrac{100.68 - 9.393}{3} = 30.43$
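The same arithmetic can be checked in code. A minimal sketch, assuming a balanced one-way layout with $n$ observations per line: it computes $MS_{Line}$ and $MS_E$ from raw data and solves $E(MS_E) = \sigma^2$ and $E(MS_{Line}) = \sigma^2 + n\sigma^2_L$ for the ANOVA (method-of-moments) estimates. The function names are hypothetical.

```python
from typing import List, Tuple


def line_variance_from_ms(ms_line: float, sigma2_hat: float, n: int) -> float:
    """Solve E(MS_Line) = sigma^2 + n * sigma2_L for sigma2_L."""
    return (ms_line - sigma2_hat) / n


def one_way_components(groups: List[List[float]]) -> Tuple[float, float]:
    """ANOVA estimates (sigma2_L_hat, sigma2_hat) for a balanced one-way model.

    `groups` holds one list of responses per line, all of the same length n.
    """
    a, n = len(groups), len(groups[0])
    if a < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need a balanced design with a >= 2 lines and n >= 2")
    means = [sum(g) / n for g in groups]
    grand = sum(means) / a                     # equals the grand mean when balanced
    ms_line = n * sum((m - grand) ** 2 for m in means) / (a - 1)
    ms_error = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / (a * (n - 1))
    sigma2_hat = ms_error                      # from E(MS_E) = sigma^2
    return line_variance_from_ms(ms_line, sigma2_hat, n), sigma2_hat
```

For example, `line_variance_from_ms(100.68, 9.393, 3)` reproduces the 30.43 above after rounding. This estimator of $\sigma^2_L$ can come out negative when $MS_{Line} < MS_E$.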


It turns out that PROC MIXED gives REML estimates of the variance parameters by default:


The histogram of the 12,488 estimated intraclass correlations:

[Figure: histogram of the 12,488 $\hat{\rho}_I$ estimates; x-axis: $\hat{\rho}_I$ from 0.0 to 1.0, y-axis: frequency]

There were about 4500 genes with highly heritable expression, i.e., with estimated $\hat{\rho}_I \geq 0.80$.
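A sketch of the tallies behind such a histogram, assuming the $\hat{\rho}_I$ values are already in a list. The 10 equal-width bins are an assumption (the plot's bin width is not recoverable), and `count_at_least` mirrors the $\hat{\rho}_I \geq 0.80$ rule used for the count above; both function names are hypothetical.

```python
from typing import List


def bin_counts(rhos: List[float], nbins: int = 10) -> List[int]:
    """Counts in equal-width bins on [0, 1]; a value of exactly 1.0 goes in the last bin."""
    counts = [0] * nbins
    for r in rhos:
        if not 0.0 <= r <= 1.0:
            raise ValueError(f"rho estimate outside [0, 1]: {r}")
        counts[min(int(r * nbins), nbins - 1)] += 1
    return counts


def count_at_least(rhos: List[float], cutoff: float = 0.80) -> int:
    """Number of genes with estimated rho_I >= cutoff."""
    return sum(1 for r in rhos if r >= cutoff)
```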


Even though we work with inbred lines, the phenotypes (observed qualities, like plant height, corn yield, weight, etc.) will not all be exactly the same.

Genetics don’t explain all variation: below we see the cloned cat CC and Rainbow, the cat she was cloned from:


Mixed Effects Models: Random Effects and Fixed Effects

When we have both random and fixed effects in a model, we call it a Mixed Effects Model (or Mixed Model). Thus, the procedure PROC MIXED will be used heavily in the second part of the course.

In this class we will work with linear models, so we now move to Linear Mixed Models.

Though Oehlert doesn’t include it, a common way to describe a mixed model in matrix notation is the following:

$$Y = \underbrace{X\beta}_{\text{fixed effects part}} + \underbrace{Zu}_{\text{random effects part}} + e$$


$X$ is an $N \times p$ model matrix of known constants

$\beta$ is a $p \times 1$ vector of ‘fixed’ unknown parameters

$Z$ is an $N \times q$ model matrix of known constants

$u$ is a $q \times 1$ random vector (of random effects)

$e$ is an $N \times 1$ vector of random errors

This is the notation used in SAS documentation.


Matrix Notation for Random Effects Model Example

Consider the gene expression example we saw earlier, where we estimated $\rho_I$ at each gene. The model was...

Let’s write it in $Y = X\beta + Zu + e$ notation for a single gene...


For each gene, $N = 10 \times 3 = 30$, $X_{N \times 1} = \mathbf{1}_{N \times 1}$, $\beta = \mu$, $p = 1$

$Z_{N \times 10}$ (below), $u_{10 \times 1}$ (below), $q = 10$

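$$
Z_{N \times 10} = \begin{bmatrix}
1 & 0 & 0 & \cdots & 0 \\
1 & 0 & 0 & \cdots & 0 \\
1 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & \cdots & 0 \\
0 & 1 & 0 & \cdots & 0 \\
0 & 1 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1 \\
0 & 0 & 0 & \cdots & 1 \\
0 & 0 & 0 & \cdots & 1
\end{bmatrix}
\qquad
u_{10 \times 1} = \begin{bmatrix} L_1 \\ L_2 \\ L_3 \\ L_4 \\ L_5 \\ L_6 \\ L_7 \\ L_8 \\ L_9 \\ L_{10} \end{bmatrix}
$$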
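To make the structure of $Z$ concrete, here is a sketch that builds it as a 0/1 indicator matrix, assuming the rows are ordered line by line (three organisms per line), matching the order of $u = (L_1, \dots, L_{10})'$. Multiplying $Zu$ then simply copies each line's effect $L_i$ onto its three rows. Plain lists keep it dependency-free; the function names are hypothetical.

```python
from typing import List


def build_z(n_lines: int = 10, n_per_line: int = 3) -> List[List[int]]:
    """N x q indicator matrix: the row for organism j of line i has a 1 in column i."""
    return [
        [1 if col == line else 0 for col in range(n_lines)]
        for line in range(n_lines)      # rows grouped line by line
        for _ in range(n_per_line)      # one row per organism in that line
    ]


def z_times_u(z_matrix: List[List[int]], u: List[float]) -> List[float]:
    """Matrix-vector product Zu: each row picks out its own line's effect."""
    return [sum(z * u_j for z, u_j in zip(row, u)) for row in z_matrix]
```

For two lines with effects `[5.0, 7.0]`, `z_times_u(build_z(2, 3), [5.0, 7.0])` gives three copies of 5.0 followed by three copies of 7.0.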


And we saw this output from PROC MIXED before:


Two-Way Random Effects Model (all random effects)

$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$

for $i = 1, \dots, a$, $j = 1, \dots, b$, $k = 1, \dots, n$

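with all random effects independent of each other:

$\alpha_i \overset{iid}{\sim} N(0, \sigma^2_\alpha)$

$\beta_j \overset{iid}{\sim} N(0, \sigma^2_\beta)$

$(\alpha\beta)_{ij} \overset{iid}{\sim} N(0, \sigma^2_{\alpha\beta})$

$\varepsilon_{ijk} \overset{iid}{\sim} N(0, \sigma^2)$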

$\alpha_i \equiv$ random effect of Factor A level $i$

$\beta_j \equiv$ random effect of Factor B level $j$

$(\alpha\beta)_{ij} \equiv$ random interaction effect for Factor A level $i$ and Factor B level $j$


Again, all the effects (except $\mu$) are random effects.

Marginal variance: $\operatorname{Var}(Y_{ijk}) = \sigma^2_\alpha + \sigma^2_\beta + \sigma^2_{\alpha\beta} + \sigma^2$

So there are 4 variance components in this model.

Population hypothesis tests of interest:

$H_0: \sigma^2_\alpha = 0$

$H_0: \sigma^2_\beta = 0$

$H_0: \sigma^2_{\alpha\beta} = 0$


Construct F-tests by examining expected mean squares (EMS)

“The denominator mean square for F-tests in random effects models WILL NOT always be MSE!”

- Oehlert


$E(MS_A) = \sigma^2 + n\sigma^2_{\alpha\beta} + nb\sigma^2_\alpha$

$E(MS_{AB}) = \sigma^2 + n\sigma^2_{\alpha\beta}$

$E(MS_E) = \sigma^2$

$H_0: \sigma^2_\alpha = 0$: $F_o = \dfrac{MS_A}{MS_{AB}}$, and when $H_0$ is true, $F_o \sim F_{a-1,\,(a-1)(b-1)}$

The denominator is NOT $MS_E$, and the denominator df coincide with those of the AB term.

When $H_0$ is true, $E(MS_A) = E(MS_{AB})$.

We need a numerator and denominator with the same expected value when $H_0$ is true (and a larger numerator expected value when $H_A$ is true).


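$H_0: \sigma^2_\alpha = 0 \Rightarrow F_o = MS_A / MS_{AB}$

We say “$MS_{AB}$ is the error term for testing the main effect of A.”

$H_0: \sigma^2_\beta = 0 \Rightarrow F_o = MS_B / MS_{AB}$

$H_0: \sigma^2_{\alpha\beta} = 0 \Rightarrow F_o = MS_{AB} / MS_E$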

As we’ll see later, sometimes there is no MS in the ANOVA table that gives us the appropriate error term for a test of interest (as we had so nicely in this two-factor random effects model).

In those cases, we’ll use a linear combination of MS’s to create appropriate F-tests, and these lead to ‘approximate F-tests’.
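The test construction for this model can be written as a small helper. A sketch assuming a balanced $a \times b$ design with $n$ replicates per cell; the function name is hypothetical, and the mean squares would come from your own ANOVA table.

```python
from typing import Dict, Tuple


def two_way_f_tests(ms_a: float, ms_b: float, ms_ab: float, ms_e: float,
                    a: int, b: int, n: int) -> Dict[str, Tuple[float, int, int]]:
    """(F statistic, numerator df, denominator df) for each random-effects test."""
    df_a, df_b = a - 1, b - 1
    df_ab = df_a * df_b
    df_e = a * b * (n - 1)
    return {
        "A": (ms_a / ms_ab, df_a, df_ab),    # error term is MS_AB, not MS_E
        "B": (ms_b / ms_ab, df_b, df_ab),    # error term is MS_AB, not MS_E
        "AB": (ms_ab / ms_e, df_ab, df_e),   # error term is MS_E
    }
```

If SciPy is available, `scipy.stats.f.sf(F, dfn, dfd)` gives the corresponding upper-tail p-value; it is left out here to keep the block dependency-free.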


Example (Operator and Part random effects)

Investigation of sources of variability in a manufacturing system is a common use of random effects models in industry.

Here, an instrument is used to measure a critical dimension on a part by a particular operator.

Twenty parts have been randomly selected from the production process, and three randomly selected operators measure each part twice with this instrument.

The order of measurements is completely randomized, so this is a two-factor factorial where both factors (part and operator) are random, with replication ($n = 2$).
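To make ‘completely randomized’ concrete: all $20 \times 3 \times 2 = 120$ (part, operator, replicate) measurements go into a single list whose order is shuffled. This is an illustrative sketch; the function name and seed handling are assumptions, not the study's actual run sheet.

```python
import random
from typing import List, Optional, Tuple


def randomized_run_order(n_parts: int = 20, n_operators: int = 3, n_reps: int = 2,
                         seed: Optional[int] = None) -> List[Tuple[int, int, int]]:
    """All (part, operator, replicate) measurements in a completely random order."""
    runs = [(p, o, r)
            for p in range(1, n_parts + 1)
            for o in range(1, n_operators + 1)
            for r in range(1, n_reps + 1)]
    rng = random.Random(seed)   # a fixed seed makes the run sheet reproducible
    rng.shuffle(runs)           # one shuffle over ALL runs, not within part or operator
    return runs
```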


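Two-way model: $Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$

with all effects (except $\mu$) random as defined previously and independent of each other, with variance parameters $\sigma^2_\alpha$, $\sigma^2_\beta$, $\sigma^2_{\alpha\beta}$, $\sigma^2$.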


First, look at a general plot of the data:


Visually, the largest effect seems to be the part effect.


Fit the random effects model using PROC MIXED (in PROC MIXED, only fixed effects go on the model statement; all random effects go in the random statement):


More PROC MIXED output:

Next, we will use PROC GLM to get the EMS for the effects (in PROC GLM, all effects go on the model statement, and the random effects are then also specified in the random statement).


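The EMS formulas are in the PROC GLM output:

These coincide with our formulas using $a = 3$, $b = 20$, $n = 2$ (so Factor A is operator and Factor B is part):

$E(MS_A) = \sigma^2 + n\sigma^2_{\alpha\beta} + nb\sigma^2_\alpha = \sigma^2 + 2\sigma^2_{\alpha\beta} + 40\sigma^2_\alpha$

$E(MS_B) = \sigma^2 + n\sigma^2_{\alpha\beta} + na\sigma^2_\beta = \sigma^2 + 2\sigma^2_{\alpha\beta} + 6\sigma^2_\beta$

$E(MS_{AB}) = \sigma^2 + n\sigma^2_{\alpha\beta} = \sigma^2 + 2\sigma^2_{\alpha\beta}$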
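Solving these EMS equations from the bottom up gives the ANOVA (method-of-moments) estimates of the four variance components. A sketch for a balanced design; the function name is hypothetical, and the mean squares passed in would be the ones from the PROC GLM output.

```python
from typing import Dict


def two_way_components(ms_a: float, ms_b: float, ms_ab: float, ms_e: float,
                       a: int, b: int, n: int) -> Dict[str, float]:
    """ANOVA estimates for the balanced two-way random effects model.

    Solves, from the bottom up:
      E(MS_E)  = sigma^2
      E(MS_AB) = sigma^2 + n sigma2_ab
      E(MS_A)  = sigma^2 + n sigma2_ab + n b sigma2_a
      E(MS_B)  = sigma^2 + n sigma2_ab + n a sigma2_b
    """
    return {
        "sigma2": ms_e,
        "sigma2_ab": (ms_ab - ms_e) / n,
        "sigma2_a": (ms_a - ms_ab) / (n * b),
        "sigma2_b": (ms_b - ms_ab) / (n * a),
    }
```

For this example's design, call it with `a=3, b=20, n=2`, so the divisors are $n = 2$, $nb = 40$, and $na = 6$, matching the coefficients above.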