Random Effects Part 2
STAT:5201
Week 10 - Lecture 1b
Random Effects
Example (Single Random Effects Factor - gene expression)
In genetics, an inbred line is homozygous at every location along the chromosome.
In the lab, we commonly have inbred plants and also inbred mice.
The benefit to scientists is that we can create genetically identical offspring.
Gene expression example:
10 different inbred lines
3 organisms from each line (genetically identical)
Response ≡ gene expression (measured at each of 12,488 genes).
In random effects models (i.e., when we have all random effects) we are investigating variance components. Which variance components are relatively large or small?
Example (Single Random Effects Factor)
At each gene, I can consider the following model (specific to a gene):
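For reference, a sketch of that model in the notation used throughout this example (line effect $L_i$, variance components $\sigma^2_L$ and $\sigma^2$). The index ranges are taken from the design (10 lines, 3 organisms per line); the exact form shown in lecture is an assumption:

```latex
Y_{ij} = \mu + L_i + \varepsilon_{ij},
\qquad i = 1, \ldots, 10, \quad j = 1, 2, 3,
\qquad L_i \overset{iid}{\sim} N(0, \sigma^2_L),
\quad \varepsilon_{ij} \overset{iid}{\sim} N(0, \sigma^2),
\quad L_i \text{ independent of } \varepsilon_{ij}.
```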
If $\frac{\sigma^2_L}{\sigma^2 + \sigma^2_L} \approx 1$, then the variability in gene expression is mainly due to genetics and is said to be ‘reproducible’. This is a measure of ‘heritability’.
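A short derivation may help show why this ratio is a correlation. It assumes the one-way random effects model $Y_{ij} = \mu + L_i + \varepsilon_{ij}$ with independent $L_i \sim N(0, \sigma^2_L)$ and $\varepsilon_{ij} \sim N(0, \sigma^2)$, and compares two different organisms $j \neq j'$ from the same line $i$:

```latex
\mathrm{Var}(Y_{ij}) = \sigma^2_L + \sigma^2, \qquad
\mathrm{Cov}(Y_{ij}, Y_{ij'}) = \mathrm{Var}(L_i) = \sigma^2_L \quad (j \neq j'),
\qquad\Longrightarrow\qquad
\rho_I = \mathrm{Corr}(Y_{ij}, Y_{ij'}) = \frac{\sigma^2_L}{\sigma^2 + \sigma^2_L}.
```

So $\rho_I$ near 1 means organisms from the same line are nearly identical in expression relative to the spread across lines.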
Here, we will use the estimated intraclass correlation $\rho_I$ for each of the 12,488 genes to pick out those with ‘highly heritable expression’.
So, we will use $\frac{\sigma^2_L}{\sigma^2 + \sigma^2_L} >$ cut-off as a filter.
First 7 observations shown here (response is on log-scale).
Because I’m going to do the same analysis at each gene, I will make sure my data is sorted by gene first.
When working with random effects, we will usually use the newer procedure PROC MIXED, though occasionally we will have a reason to use PROC GLM when modeling random effects.
Here, besides using PROC MIXED, I will do something else new, which is to save some of the PROC MIXED output to a new SAS data set called ‘covparm’ by utilizing an Output Delivery System (ODS) statement.
After running the procedure, my ODS statement will create the ‘covparm’ data set, which holds the estimates for $\sigma^2_L$ and $\sigma^2$ for each gene.
These are the variance components used in the random effects model (there’s a separate such model fit for each gene here). I’m going to use these estimates to estimate the intraclass correlation $\rho_I$ for each gene.
The PROC MIXED statement with the ODS output statement is below. (We will see more about how to use ODS in the next lecture.)
Because we are running this procedure by gene, we receive output after every model fit. Below we see some output for the last gene.
The covariance parameter estimates $\hat\sigma^2_L$ and $\hat\sigma^2$ are also output for gene 12,488:
Next, let’s look at the ‘covparm’ data set I created using ODS...
Now, I have an estimate for $\sigma^2_L$ and $\sigma^2$ at each gene.
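With one pair of variance component estimates per gene, the filtering step reduces to a ratio and a comparison. The following is a minimal sketch in Python (not the SAS used in lecture). The record layout, the gene labels, and the second and third rows are made up for illustration; only the pair 30.43 and 9.39 comes from the lecture output, and the 0.80 cut-off is the one quoted for the histogram.

```python
def intraclass_correlation(var_line, var_error):
    """Estimated rho_I = var_line / (var_line + var_error)."""
    total = var_line + var_error
    if total == 0:
        return 0.0  # no variability at all; treat as not heritable
    return var_line / total


# Hypothetical stand-in for the 'covparm' data set: one record per gene
# holding the estimated line variance and residual variance.
covparm = [
    {"gene": "g1", "line": 30.43, "residual": 9.39},  # values from the lecture output
    {"gene": "g2", "line": 0.50, "residual": 4.50},   # made-up values
    {"gene": "g3", "line": 9.00, "residual": 1.00},   # made-up values
]

CUTOFF = 0.80

icc_by_gene = {
    row["gene"]: intraclass_correlation(row["line"], row["residual"])
    for row in covparm
}

# Keep the genes whose estimated intraclass correlation meets the cut-off.
heritable_genes = [gene for gene, icc in icc_by_gene.items() if icc >= CUTOFF]
```

Here only the hypothetical gene `g3` passes the filter; the gene with estimates 30.43 and 9.39 has an estimated $\rho_I$ of about 0.76 and falls just short.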
One thing for which we can appropriately use PROC GLM in random effects models is to get the expected mean squares (EMS) of a random effects factor. BUT PROC GLM CAN DO THE WRONG THING WITH RANDOM EFFECTS, so if we have random effects, in general, we will use PROC MIXED.
Here, we use PROC GLM to get the formula for the EMS for the effects in the model at a single gene (it is the same for all genes).
And the relevant PROC GLM output:
And we see the EMS for Line as $E(MS_{Line}) = \sigma^2 + 3\sigma^2_L$.
Also, $\hat\sigma^2_L = \frac{MS_{Line} - \hat\sigma^2}{3} = \frac{100.68 - 9.39}{3} = 30.43$.
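The moment calculation above can be sketched in Python (not the SAS used in lecture) for any balanced one-way layout. The function name and the toy data are hypothetical; the last line reproduces the lecture's arithmetic from the printed mean squares. Note that this estimator can come out negative when the between-line mean square is smaller than the error mean square.

```python
def one_way_variance_components(groups):
    """ANOVA (method-of-moments) estimates for a balanced one-way random effects layout.

    groups: list of equal-length lists, one list per level of the random factor.
    Returns (ms_factor, ms_error, var_factor_hat, var_error_hat).
    """
    a = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("balanced data required: all groups must have the same size")

    grand_mean = sum(sum(g) for g in groups) / (a * n)
    group_means = [sum(g) / n for g in groups]

    ss_factor = n * sum((m - grand_mean) ** 2 for m in group_means)
    ss_error = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)

    ms_factor = ss_factor / (a - 1)
    ms_error = ss_error / (a * (n - 1))

    # E(MS_factor) = sigma^2 + n * sigma^2_factor, so solve for sigma^2_factor.
    var_factor_hat = (ms_factor - ms_error) / n
    return ms_factor, ms_error, var_factor_hat, ms_error


# Toy data (made up): 3 levels, 3 observations each.
toy = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
ms_factor, ms_error, var_factor_hat, var_error_hat = one_way_variance_components(toy)

# The lecture's numbers: MS_Line = 100.68, MS_E = 9.39, n = 3 organisms per line.
lecture_estimate = (100.68 - 9.39) / 3
```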
It turns out that PROC MIXED will give you REML estimates for the variance parameters by default:
The histogram of the 12,488 estimated intraclass correlations:
[Figure: histogram of the 12,488 $\rho_I$ estimates; x-axis: estimated $\rho_I$ from 0.0 to 1.0; y-axis: frequency from 0 to 1500.]
There were about 4500 genes with highly heritable expression, that is, with estimated $\rho_I \geq 0.80$.
Even though we work with inbred lines, the phenotypes (observed qualities, like plant height, corn yield, weight, etc.) will not all be exactly the same.
Genetics don’t explain all variation... below we see a cloned cat CC from mother Rainbow:
Mixed Effects Models: Random Effects and Fixed Effects
When we have both random and fixed effects in a model, we call it a Mixed Effects Model (or Mixed Model). Thus, the procedure PROC MIXED will be used heavily in the second part of the course.
In this class we will work with linear models, so we now move to Linear Mixed Models.
Though Oehlert doesn’t include it, a common way to describe a mixed model in matrix notation is the following:
$$Y = \underbrace{X\beta}_{\text{fixed effects part}} + \underbrace{Zu}_{\text{random effects part}} + e$$
$X$ is an $N \times p$ model matrix of known constants
$\beta$ is a $p \times 1$ vector of ‘fixed’ unknown parameters
$Z$ is an $N \times q$ model matrix of known constants
$u$ is a $q \times 1$ random vector (of random effects)
$e$ is an $N \times 1$ vector of random errors
This is the notation used in SAS documentation.
Matrix Notation for Random Effects Model Example
Consider the example on gene expression we saw earlier where we estimated $\rho_I$ at each gene. The model was...
Let’s show Y = Xβ + Zu + e notation for a single gene...
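For each gene, $N = 10 \times 3 = 30$, $X_{N \times 1} = \mathbf{1}_{N \times 1}$, $\beta = \mu$, $p = 1$, and $q = 10$, with $Z_{N \times 10}$ and $u_{10 \times 1}$ shown below.
$$
Z_{N \times 10} =
\begin{bmatrix}
1 & 0 & 0 & \cdots & 0 \\
1 & 0 & 0 & \cdots & 0 \\
1 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & \cdots & 0 \\
0 & 1 & 0 & \cdots & 0 \\
0 & 1 & 0 & \cdots & 0 \\
\vdots & & & & \vdots \\
0 & 0 & 0 & \cdots & 1 \\
0 & 0 & 0 & \cdots & 1 \\
0 & 0 & 0 & \cdots & 1
\end{bmatrix},
\qquad
u_{10 \times 1} =
\begin{bmatrix}
L_1 \\ L_2 \\ L_3 \\ L_4 \\ L_5 \\ L_6 \\ L_7 \\ L_8 \\ L_9 \\ L_{10}
\end{bmatrix}
$$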
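To make the structure of $X$ and $Z$ concrete, here is a minimal Python sketch (not the SAS used in lecture) that builds both matrices as nested lists for a balanced single-factor design. The function name is hypothetical, and the construction assumes the rows are sorted by line, with the 3 organisms of each line stored consecutively.

```python
def one_way_design(n_levels, n_reps):
    """Build X (N x 1 column of ones) and Z (N x n_levels indicator matrix).

    Assumes observations are ordered by level: the first n_reps rows belong to
    level 1, the next n_reps rows to level 2, and so on.
    """
    n_obs = n_levels * n_reps
    X = [[1] for _ in range(n_obs)]
    Z = [
        [1 if row // n_reps == col else 0 for col in range(n_levels)]
        for row in range(n_obs)
    ]
    return X, Z


# Gene expression example: 10 inbred lines, 3 organisms per line.
X, Z = one_way_design(n_levels=10, n_reps=3)

# Each row of Z has a single 1 marking that organism's line, and each
# column sums to 3, the number of organisms per line.
column_sums = [sum(row[col] for row in Z) for col in range(10)]
```

Multiplying $Z$ by $u$ then simply picks out the line effect $L_i$ for each observation, which is why $Y = X\beta + Zu + e$ reproduces the scalar model row by row.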
And we saw this output from PROC MIXED before:
Model for Two-Way Random Effects Model (all random effects)
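$$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$$
for $i = 1, \ldots, a$, $j = 1, \ldots, b$, $k = 1, 2, \ldots, n$,
with random effects all independent of each other:
$\alpha_i \overset{iid}{\sim} N(0, \sigma^2_\alpha)$
$\beta_j \overset{iid}{\sim} N(0, \sigma^2_\beta)$
$(\alpha\beta)_{ij} \overset{iid}{\sim} N(0, \sigma^2_{\alpha\beta})$
$\varepsilon_{ijk} \overset{iid}{\sim} N(0, \sigma^2)$
$\alpha_i \equiv$ random effect of Factor A level $i$
$\beta_j \equiv$ random effect of Factor B level $j$
$(\alpha\beta)_{ij} \equiv$ random interaction effect for Factor A level $i$ and Factor B level $j$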
Again, with all the effects (except µ) as random effects.
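Marginal variance: $\mathrm{Var}(Y_{ijk}) = \sigma^2_\alpha + \sigma^2_\beta + \sigma^2_{\alpha\beta} + \sigma^2$,
as there are 4 variance components in this model.
Population hypothesis tests of interest:
$H_0: \sigma^2_\alpha = 0$
$H_0: \sigma^2_\beta = 0$
$H_0: \sigma^2_{\alpha\beta} = 0$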
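As a worked step behind the marginal variance, assume the two-way model $Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$ with mutually independent, zero-mean random effects. The variance then adds across the four components, and two distinct observations are correlated only through the random effects they share:

```latex
\begin{aligned}
\mathrm{Var}(Y_{ijk}) &= \mathrm{Var}(\alpha_i) + \mathrm{Var}(\beta_j)
   + \mathrm{Var}\big((\alpha\beta)_{ij}\big) + \mathrm{Var}(\varepsilon_{ijk})
   = \sigma^2_\alpha + \sigma^2_\beta + \sigma^2_{\alpha\beta} + \sigma^2, \\
\mathrm{Cov}(Y_{ijk}, Y_{ijk'}) &= \sigma^2_\alpha + \sigma^2_\beta + \sigma^2_{\alpha\beta}
   && (k \neq k'), \\
\mathrm{Cov}(Y_{ijk}, Y_{ij'k'}) &= \sigma^2_\alpha && (j \neq j'), \\
\mathrm{Cov}(Y_{ijk}, Y_{i'jk'}) &= \sigma^2_\beta && (i \neq i'), \\
\mathrm{Cov}(Y_{ijk}, Y_{i'j'k'}) &= 0 && (i \neq i',\ j \neq j').
\end{aligned}
```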
Construct F-tests by examining expected mean squares (EMS)
“The denominator mean square for F-tests in random effects models WILL NOT always be MSE!”
- Oehlert
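$E(MS_A) = \sigma^2 + n\sigma^2_{\alpha\beta} + nb\sigma^2_\alpha$
$E(MS_{AB}) = \sigma^2 + n\sigma^2_{\alpha\beta}$
$E(MS_E) = \sigma^2$
$H_0: \sigma^2_\alpha = 0$, with $F_o = \frac{MS_A}{MS_{AB}}$,
and when $H_0$ is true, $F_o \sim F_{(a-1),\,(a-1)(b-1)}$.
Note that the denominator is NOT $MS_E$, and the denominator df coincides with the AB term.
When $H_0$ is true, $E(MS_A) = E(MS_{AB})$.
We need a numerator and denominator with the same expected value when $H_0$ is true (and a numerator with the larger expected value when $H_A$ is true).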
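Writing out the ratio of expected mean squares shows why $MS_{AB}$, and not $MS_E$, is the right denominator for the test of $\sigma^2_\alpha$. This is a worked step using only the EMS formulas for the two-way random effects model:

```latex
\frac{E(MS_A)}{E(MS_{AB})}
  = \frac{\sigma^2 + n\sigma^2_{\alpha\beta} + nb\,\sigma^2_\alpha}{\sigma^2 + n\sigma^2_{\alpha\beta}}
  = 1 + \frac{nb\,\sigma^2_\alpha}{\sigma^2 + n\sigma^2_{\alpha\beta}},
\qquad
\frac{E(MS_A)}{E(MS_E)}
  = 1 + \frac{n\sigma^2_{\alpha\beta} + nb\,\sigma^2_\alpha}{\sigma^2}.
```

The first ratio equals 1 exactly when $\sigma^2_\alpha = 0$. The second exceeds 1 whenever $\sigma^2_{\alpha\beta} > 0$, even if $\sigma^2_\alpha = 0$, so dividing by $MS_E$ would confuse an interaction with a main effect of A.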
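$H_0: \sigma^2_\alpha = 0 \Rightarrow F_o = \frac{MS_A}{MS_{AB}}$
We say “MSAB is the error term for testing the main effect of A.”
$H_0: \sigma^2_\beta = 0 \Rightarrow F_o = \frac{MS_B}{MS_{AB}}$
$H_0: \sigma^2_{\alpha\beta} = 0 \Rightarrow F_o = \frac{MS_{AB}}{MS_E}$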
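The three tests above can be collected into one small helper. This is a Python sketch (not the SAS used in lecture) under the balanced two-way random effects model; the function name and the mean squares passed to it are hypothetical. It returns only the F statistic and its degrees of freedom, leaving the p-value lookup to whatever F-distribution routine is available.

```python
def two_way_random_f_tests(ms_a, ms_b, ms_ab, ms_e, a, b, n):
    """F statistics for a balanced two-way model with all effects random.

    Returns {effect: (F, numerator_df, denominator_df)}.
    """
    df_a = a - 1
    df_b = b - 1
    df_ab = (a - 1) * (b - 1)
    df_e = a * b * (n - 1)
    return {
        # Main effects are tested against the interaction mean square.
        "A": (ms_a / ms_ab, df_a, df_ab),
        "B": (ms_b / ms_ab, df_b, df_ab),
        # Only the interaction is tested against the error mean square.
        "AB": (ms_ab / ms_e, df_ab, df_e),
    }


# Made-up mean squares for a design with a = 4, b = 5, n = 3.
tests = two_way_random_f_tests(
    ms_a=12.0, ms_b=60.0, ms_ab=3.0, ms_e=1.5, a=4, b=5, n=3
)
```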
We’ll see later, sometimes there is no MS in the ANOVA table that gives us the appropriate error term for a test of interest (like we had so nicely in this 2-factor random effects model).
In those cases, we’ll use a linear combination of MS’s to create appropriate F-tests, and these lead to ‘approximate F-tests’.
Example (Operator and Part random effects)
Investigation of sources of variability in a manufacturing system is a common use of random effects models in industry.
Here, an instrument is used to measure a critical dimension on a part by a particular operator.
Twenty parts have been randomly selected from the production process, and three randomly selected operators measure each part twice with this instrument.
The order of measurements is completely randomized, so this is a two-factor factorial where both factors (part and operator) are random, with replication (n=2).
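The layout just described (20 parts, 3 operators, 2 measurements per part-operator pair, run in a completely randomized order) can be sketched in Python (not the SAS used in lecture). The labels and the seed are arbitrary choices for illustration, not part of the original study.

```python
import itertools
import random
from collections import Counter

N_PARTS, N_OPERATORS, N_REPS = 20, 3, 2

# Every (part, operator, replicate) combination is one measurement.
runs = list(
    itertools.product(
        range(1, N_PARTS + 1),
        range(1, N_OPERATORS + 1),
        range(1, N_REPS + 1),
    )
)

# Completely randomized run order; the seed is arbitrary and only makes
# the shuffle reproducible.
rng = random.Random(5201)
rng.shuffle(runs)

# Sanity check on the design: each part-operator cell is measured twice.
cell_counts = Counter((part, operator) for part, operator, _ in runs)
```

This gives 120 measurements over 60 part-operator cells, matching a two-factor factorial with n = 2 replicates per cell.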
Two-way model: $Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$
With all random effects (except $\mu$) as defined previously, independent of each other, with parameters $\sigma^2_\alpha$, $\sigma^2_\beta$, $\sigma^2_{\alpha\beta}$, $\sigma^2$.
Get the general plot:
Visually, the largest effect seems to be the part effect.
Fit the random effects model using PROC MIXED. (In PROC MIXED, only fixed effects go on the MODEL statement; all random effects go in the RANDOM statement.)
More PROC MIXED output:
And we will next use PROC GLM to get the EMS for the effects. (In PROC GLM, all effects go on the MODEL statement; the random effects are then specified in the RANDOM statement.)
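The formulas for the EMS are in the PROC GLM output:
These coincide with our formulas using $a = 3$, $b = 20$, $n = 2$:
$E(MS_A) = \sigma^2 + n\sigma^2_{\alpha\beta} + nb\sigma^2_\alpha$
$E(MS_B) = \sigma^2 + n\sigma^2_{\alpha\beta} + na\sigma^2_\beta$
$E(MS_{AB}) = \sigma^2 + n\sigma^2_{\alpha\beta}$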
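Setting each observed mean square equal to its expected value and solving gives moment estimates of the four variance components. The Python sketch below (not the SAS used in lecture) does this for the balanced two-way random effects model with $a = 3$, $b = 20$, $n = 2$. The function name and the mean squares are made up for illustration and are not the values from the lecture output; such estimates can also be negative, which this sketch does not adjust.

```python
def two_way_random_variance_components(ms_a, ms_b, ms_ab, ms_e, a, b, n):
    """Method-of-moments estimates from the EMS of a balanced two-way random model.

    E(MS_A)  = sigma^2 + n*sigma^2_ab + n*b*sigma^2_a
    E(MS_B)  = sigma^2 + n*sigma^2_ab + n*a*sigma^2_b
    E(MS_AB) = sigma^2 + n*sigma^2_ab
    E(MS_E)  = sigma^2
    """
    return {
        "error": ms_e,
        "AB": (ms_ab - ms_e) / n,
        "A": (ms_a - ms_ab) / (n * b),
        "B": (ms_b - ms_ab) / (n * a),
    }


# Hypothetical mean squares; a = 3, b = 20, n = 2 as in the example.
estimates = two_way_random_variance_components(
    ms_a=43.0, ms_b=63.0, ms_ab=3.0, ms_e=1.0, a=3, b=20, n=2
)
```

With these made-up inputs the divisors are $nb = 40$ for factor A and $na = 6$ for factor B, so the estimates are 1.0 for A, 10.0 for B, 1.0 for AB, and 1.0 for error.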