Upload
vancong
View
218
Download
0
Embed Size (px)
Citation preview
Statistics 262: IntermediateBiostatistics
Random Effects, GLMs
Jonathan Taylor & Kristin Cobb
Statistics 262: Intermediate Biostatistics – p.1/??
Overview of today’s class
Analysis of Variance: Random effects
Generalized Linear Models
Statistics 262: Intermediate Biostatistics – p.2/??
Example: sodium content in beer
How much sodium is there in North Americanbeer? How much does this vary by brand?
Observations: for 6 brands of beer, werecorded the sodium content of 8 12 ouncebottles.
Questions of interest: what is the “grandmean” sodium content? How much variabilityis there from brand to brand?
“Individuals” in this case are brands, repeatedmeasures are the 8 bottles.
Statistics 262: Intermediate Biostatistics – p.3/??
One-way random effects model
Yij ∼ µ· + αi + εij, 1 ≤ i ≤ r, 1 ≤ j ≤ n
εij ∼ N(0, σ2), 1 ≤ i ≤ r, 1 ≤ j ≤ n
αi ∼ N(0, σ2µ), 1 ≤ i ≤ r.
Statistics 262: Intermediate Biostatistics – p.4/??
One-way random (equal sample sizes)
Source SS df E(SS)
Treatments SSTR =Pr
i=1n
`
Y i· − Y··
´2r − 1 σ2 + nσ2
µ
Error SSE =Pr
i=1
Pnj=1
(Yij − Y i·)2 (n − 1)r σ2
ANOVA table is still useful to setup tests: thesame F statistics for fixed or random will workhere.
Statistics 262: Intermediate Biostatistics – p.5/??
Inference for µ·
We know that E(Y ··) = µ·, and can show that
Var(Y ··) =nσ2
µ + σ2
rn.
Therefore,Y ·· − µ·
SSTR(r−1)rn
∼ tr−1
Statistics 262: Intermediate Biostatistics – p.6/??
Inference for µ·
Why r − 1 degrees of freedom? Imagine wecould record an infinite number ofobservations for each individual, so thatY i· → µi.
To learn anything about µ· we still only have robservations (µ1, . . . , µr).
Sampling more within an individual cannotnarrow the CI for µ·.
Statistics 262: Intermediate Biostatistics – p.7/??
Inference for σ2µ
σ2µ+σ2
This quantity describes the relativecontribution of the random effects variance tothe total variance of one observation.
We use the fact that
F =σ2 + nσ2
µ
σ2×
SSTRr−1SSE
(n−1)r
∼ Fr−1,(n−1)r.
Manipulate the inequalities:
P (Fr−1,(n−1)r,α/2 ≤ F ≤ Fr−1,(n−1)r,1−α/2) = 1−α.
Statistics 262: Intermediate Biostatistics – p.8/??
Estimating σ2µ
From the ANOVA table
σ2µ =
E(SSTR/(r − 1)) − E(SSE/((n − 1)r))
n.
Natural estimate:
S2µ =
SSTR/(r − 1) − SSE/((n − 1)r)
n
Problem: this estimate can be negative! Oneof the difficulties in random effects model.
Statistics 262: Intermediate Biostatistics – p.9/??
SAS code
PROC GLM DATA=DATADIR.beer;
CLASS BRAND BOTTLE;
MODEL SODIUM = BRAND;
MEANS BRAND / LSD CLDIFF;
RUN;
Statistics 262: Intermediate Biostatistics – p.10/??
Two-way random effects model
Yijk ∼ µ·· + αi + βj + (αβ)ij + εij, 1 ≤ i ≤ r, 1 ≤j ≤ m, 1 ≤ k ≤ n
εijk ∼ N(0, σ2), 1 ≤ i ≤ r, 1 ≤ j ≤ m, 1 ≤ k ≤n
αi ∼ N(0, σ2α), 1 ≤ i ≤ r.
βj ∼ N(0, σ2β), 1 ≤ j ≤ m.
(αβ)ij ∼ N(0, σ2αβ), 1 ≤ j ≤ m, 1 ≤ i ≤ r.
Statistics 262: Intermediate Biostatistics – p.11/??
ANOVA tables: Two-way (random)
SS df E(MS)
SSA r − 1 σ2 + nmσ2α + nσ2
αβ
SSB m − 1 σ2 + nrσ2
β+ nσ2
αβ
SSAB (m − 1)(r − 1) σ2 + nσ2
αβ
SSE =Pr
i=1
Pmj=1
Pnk=1
(Yijk − Y ij·)2 (n − 1)mr σ2
To test H0 : σ2α = 0 use SSA and SSAB.
To test H0 : σ2αβ = 0 use SSAB and SSE.
Statistics 262: Intermediate Biostatistics – p.12/??
Mixed effects model
In some studies, some factors can be thoughtof as fixed, others random.
For instance, we might have a study of theeffect of a standard part of the brewingprocess on sodium levels in the beerexample.
Then, we might think of a model in which wehave a fixed effect for “brewing technique”and a random effect for beer.
Statistics 262: Intermediate Biostatistics – p.13/??
Two-way mixed effects model
Yijk ∼ µ·· + αi + βj + (αβ)ij + εij, 1 ≤ i ≤ r, 1 ≤j ≤ m, 1 ≤ k ≤ n
εijk ∼ N(0, σ2), 1 ≤ i ≤ r, 1 ≤ j ≤ m, 1 ≤ k ≤n
αi ∼ N(0, σ2α), 1 ≤ i ≤ r.
βj, 1 ≤ j ≤ m are constants.
(αβ)ij ∼ N(0, σ2αβ), 1 ≤ j ≤ m, 1 ≤ i ≤ r.
Statistics 262: Intermediate Biostatistics – p.14/??
Constraints
∑mj=1 βj = 0
∑ri=1(αβ)ij = 0, 1 ≤ i ≤ r.
Statistics 262: Intermediate Biostatistics – p.15/??
ANOVA tables: Two-way (mixed)
SS df E(MS)
SSA r − 1 σ2 + nmσ2α
SSB m − 1 σ2 + nr
Pmj=1
β2
i
m−1+ nσ2
αβ
SSAB (m − 1)(r − 1) σ2 + nσ2
αβ
SSE =Pr
i=1
Pmj=1
Pnk=1
(Yijk − Y ij·)2 (n − 1)ab σ2
To test H0 : σ2α = 0 use SSA and SSE.
To test H0 : β1 = · · · = βm = 0 use SSB andSSAB.
To test H0 : σ2αβ use SSAB and SSE.
Statistics 262: Intermediate Biostatistics – p.16/??
Example: pearls
Imitation pearl “study”.
Two factors: how many coats of lacquer, andbatch.
Natural to use batch as random, coats asfixed, but we’ll do both.
Statistics 262: Intermediate Biostatistics – p.17/??
SAS code: two-way
PROC MIXED;CLASS BATCH COATS;MODEL VALUE= ;RANDOM COATS BATCH COATS*BATCH;
RUN;
Statistics 262: Intermediate Biostatistics – p.18/??
SAS code: mixed
PROC MIXED;CLASS COATS BATCH;MODEL VALUE=COATS;RANDOM BATCH COATS*BATCH;
RUN;
Statistics 262: Intermediate Biostatistics – p.19/??
Nested vs. crossed
In previous examples the factors are suchthat we observe every pair of possible values.
Sometimes, particularly in random effectsmodels, this is not possible.
For example, consider a model with “city” as afactor, and “country” as another.
Statistics 262: Intermediate Biostatistics – p.20/??
Nested random effect models
Suppose we want to study the effectivenessof a recycling program at reducing householdwaste.
Outcome: monthly per capita householdwaste.
Design: for all G8 countries, recyclingprogram is implemented in 10 cities atrandom, with 10 other cities chosen atrandom. Outcome is measured each monthfor a year.
Statistics 262: Intermediate Biostatistics – p.21/??
What can be estimated?
Interactions between city and country can’t beestimated, nor between city and “program”.
We say that “city” is nested within “country”and “program” because the levels of city can’tappear in other countries, and each city eitherimplements the program or not.
It is possible to estimate interactions betweentime, country and program.
City can be either a random or fixed effect –here we treat it as random.
Statistics 262: Intermediate Biostatistics – p.22/??
Nested model: SAS “code”
PROC MIXED;CLASS CITY COUNTRY PROGRAM MONTH;MODEL WASTE = COUNTRY PROGRAM MONTHCOUNTRY*PROGRAM;RANDOM CITY(COUNTRY PROGRAM);
LSMEANS COUNTRY*MONTH;
Statistics 262: Intermediate Biostatistics – p.23/??
Generalized linear models
All models we have seen so far deal withcontinuous outcome variables with norestriction on their expectations, and all haveassumed that mean and variance areunrelated.
Many outcomes of interest do not satisfy this.
Examples: binary outcomes, Poisson countoutcomes.
Statistics 262: Intermediate Biostatistics – p.24/??
Binary outcomes
If our outcome Yi is a 0-1 random variable,then 0 ≤ E(Yi) ≤ 1.
Also,
E(Yi) = πi, V ar(Yi) = πi(1 − πi).
A convenient way to model the dependenceof Yi on covariates Xi1, . . . , Xik is through thelogit transform.
Statistics 262: Intermediate Biostatistics – p.25/??
Logit transform
Logit transform:
logit(π) = log
(π
1 − π
)∈ (−∞, +∞)
Inverse:
logit−1(x) =ex
1 + ex∈ (0, 1).
Statistics 262: Intermediate Biostatistics – p.26/??
Binary regression
Logistic regression model:
logit(E(Yi)) = logit(πi) = β0 +k∑
j=1
βjXij .
Probit regression model:
Φ−1(E(Yi)) = β0+k∑
j=1
βjXij, Φ(x) =
∫ x
−∞
e−x2/2
√2π
dx.
In each case, V ar(Yi) = πi(1 − πi) but theregression model is different.
Statistics 262: Intermediate Biostatistics – p.27/??
Link & variance fns.
If
ηi = g(E(Yi)) = g(µi) = β0 +k∑
j=1
βjXij
then g is the link function for the model.
IfV ar(Yi) = φ · V (E(Yi)) = φ · V (µi)
for some function V , then V is the variancefunction for the data.
logit(E(Yi)) = logit(πi) = β0 +k∑
j=1
βjXij .
Statistics 262: Intermediate Biostatistics – p.28/??
Binary (again)
For a logistic model,
g(µ) = logit(µ), V (µ) = µ(1 − µ).
For a probit model,
g(µ) = Φ−1(µ), V (µ) = µ(1 − µ).
Statistics 262: Intermediate Biostatistics – p.29/??
Other common examples
Multiple linear regression:g(µ) = µ, V ar(µ) = 1
Linear regression with variance tied to meang(µ) = µ, V ar(µ) = µ2.
Poisson log-linear models:g(µ) = log(µ), V ar(µ) = µ.
Statistics 262: Intermediate Biostatistics – p.30/??
Fitting a GLM
When fitting GLMs, instead of SSE, thelog-likelihood is minimized,
DEV (β0, . . . , βk)n∑
i=1
2 log L(Yi, Xij) − 2 log L(β0, . . . , βk|Yi, Xij).
The first term above is the likelihood in thecompletely unrestricted model. For example,in Gaussian setting it is zero (can set eachµi = Yi). In Poisson setting it is
2
(n∑
i=1
Yi log(Yi) − Yi − log(Yi!)
).
Statistics 262: Intermediate Biostatistics – p.31/??
Quasi-likelihood
For some choices of link and variancefunctions, there is no deviance.
In these situations, quasi-likelihood is what isminimized instead
X2(β0, . . . , βk) =n∑
i=1
(Yi − µi(β0, . . . , βk)√
V (µi)
)2
.
Pearson’s X2 also gives an estimate of φ ifnot specified as in binary or Poisson models
φ̂ =1
n − k − 1X2(β̂0, . . . , β̂k).
Statistics 262: Intermediate Biostatistics – p.32/??
Inference in GLMs
Large sample theory tells us that
β̂ ∼ N(β, φ̂(XTW−1X)−1
)
with weights Wi = V (µ̂i).
For testing individual β’s everything isbasically the same as in multiple regression,just have different estimates ofSE(
∑kj=0 ajβ̂j).
Likewise, confidence intervals just havedifferent estimates of SE. Use normal insteadof t. Statistics 262: Intermediate Biostatistics – p.33/??
Partial deviance tests
As in multiple regression to test that somesubset H0 : βi1 = · · · = βil = 0 we fit a full anda reduced model.
Large sample theory again tells us that
DEVχ2 =DEV (R) − DEV (F )
φ̂∼ χ2
dfR−dfF.
If deviance is unavailable, Pearson’s X2 issubstituted.
Reject H0 if DEVχ2 is larger than χ2dfR−dfF ,1−α.
Statistics 262: Intermediate Biostatistics – p.34/??
Wald χ2 tests
Test can also be done using a Wald χ2 whichdoes not fit a full and reduced model.
Wald χ2 to test Cβ = 0:
WALDχ2 = Cβ̂(C(XTW−1X)−1CT )−1(Cβ̂T ).
Reject H0 : Cβ = 0 if WALDχ2 is larger thanχ2
#rowsC,1−α.
Statistics 262: Intermediate Biostatistics – p.35/??
Logistic example
A local health clinic sent fliers to its clients toencourage everyone, but especially olderpersons at high risk of complications, to get aflu shot in time for protection against anexpected flu epidemic.
In a pilot follow-up study, 50 clients wererandomly selected and asked whether theyactually received a flu shot.
In addition, data were collected on their ageand their health awareness.
Statistics 262: Intermediate Biostatistics – p.36/??
SAS code: logistic
PROC GENMOD;MODEL SHOT = AGE HEALTH AGE*HEALTH /DIST=BINOMIAL LINK=LOGIT TYPE3WALD CL WALDCI;RUN;
Statistics 262: Intermediate Biostatistics – p.37/??
SAS code: probit
PROC GENMOD;MODEL SHOT = AGE HEALTH AGE*HEALTH /DIST=BINOMIAL LINK=PROBIT TYPE3WALD CL WALDCI;RUN;
Statistics 262: Intermediate Biostatistics – p.38/??
Poisson example
A researcher in geriatrics designed aprospective study to investigate the effects oftwo interventions on the frequency of falls.
One hundred subjects were randomlyassigned to one of the two interventions:education only and education plus exercise.
Three variables: gender, a balance index, anda strength index.
Statistics 262: Intermediate Biostatistics – p.39/??
SAS code
PROC GENMOD;MODEL FALLS=INTERVENTION GENDER BALANCESTRENGTH /DIST=POISSON LINK=LOG TYPE3 WALDCI;CONTRAST ’GENDER&STRENGTH’ GENDER 1,STRENGTH 1;
RUN;
Statistics 262: Intermediate Biostatistics – p.40/??
Overdispersion in count data
Although Poisson is popular, it is not alwaysappropriate.
Often, observations are clustered more thanPoisson which increases the variance.
This phenomenon is called “overdispersion.”
Statistics 262: Intermediate Biostatistics – p.41/??
Simple approximate fix
We can estimate φ from either deviance orPearson X2.
This says that V ar(Yi) = φE(Yi) instead ofPoisson relationship where φ = 1.
Other solutions: use different distribution,maybe negative binomial.
Statistics 262: Intermediate Biostatistics – p.42/??
SAS code
PROC GENMOD;MODEL FALLS=INTERVENTION GENDER BALANCESTRENGTH /DIST=POISSON LINK=LOG TYPE3 WALDCI;CONTRAST ’GENDER&STRENGTH’ GENDER 1,STRENGTH 1;
RUN;
Statistics 262: Intermediate Biostatistics – p.43/??
Diagnostics
Diagnostic tests are similar to multipleregression, only use different residuals.
Deviance residuals: contribution of i-thobservation to deviance.
rD,i2 log L(Yi, Xij) − 2 log L(β0, . . . , βk|Yi, Xij)
Pearson residuals:
rP,i =Yi − µ̂i
φ̂V (µ̂i).
Statistics 262: Intermediate Biostatistics – p.44/??