Statistics 262: Intermediate Biostatisticsstatweb.stanford.edu/~jtaylo/courses/stats262/spring.2004/notes/... · SAS code PROC GLM DATA=DATADIR.beer; ... Statistics 262: Intermediate

Statistics 262: Intermediate Biostatistics

Random Effects, GLMs

Jonathan Taylor & Kristin Cobb


Overview of today’s class

Analysis of Variance: Random effects

Generalized Linear Models


Example: sodium content in beer

How much sodium is there in North American beer? How much does this vary by brand?

Observations: for 6 brands of beer, we recorded the sodium content of eight 12-ounce bottles.

Questions of interest: what is the “grand mean” sodium content? How much variability is there from brand to brand?

“Individuals” in this case are brands; the repeated measures are the 8 bottles.


One-way random effects model

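$$Y_{ij} = \mu_\cdot + \alpha_i + \varepsilon_{ij}, \quad 1 \le i \le r,\ 1 \le j \le n$$

$$\varepsilon_{ij} \sim N(0, \sigma^2), \quad 1 \le i \le r,\ 1 \le j \le n$$

$$\alpha_i \sim N(0, \sigma_\mu^2), \quad 1 \le i \le r.$$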


One-way random (equal sample sizes)

Source       SS                                                                      df          E(MS)

Treatments   $SSTR = \sum_{i=1}^{r} n(\bar Y_{i\cdot} - \bar Y_{\cdot\cdot})^2$        $r-1$       $\sigma^2 + n\sigma_\mu^2$

Error        $SSE = \sum_{i=1}^{r}\sum_{j=1}^{n} (Y_{ij} - \bar Y_{i\cdot})^2$         $(n-1)r$    $\sigma^2$

The ANOVA table is still useful for setting up tests: the same F statistics work here for fixed or random effects.


Inference for µ·

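We know that $E(\bar Y_{\cdot\cdot}) = \mu_\cdot$, and can show that

$$\mathrm{Var}(\bar Y_{\cdot\cdot}) = \frac{n\sigma_\mu^2 + \sigma^2}{rn}.$$

Therefore,

$$\frac{\bar Y_{\cdot\cdot} - \mu_\cdot}{\sqrt{SSTR / ((r-1)rn)}} \sim t_{r-1}.$$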
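The variance formula follows in two steps from the model assumptions (independence of the $\alpha_i$ and $\varepsilon_{ij}$). A short sketch of the derivation, using only the symbols defined above:

```latex
\bar Y_{\cdot\cdot} = \mu_\cdot + \bar\alpha_\cdot + \bar\varepsilon_{\cdot\cdot},
\qquad
\mathrm{Var}(\bar Y_{\cdot\cdot})
  = \mathrm{Var}(\bar\alpha_\cdot) + \mathrm{Var}(\bar\varepsilon_{\cdot\cdot})
  = \frac{\sigma_\mu^2}{r} + \frac{\sigma^2}{rn}
  = \frac{n\sigma_\mu^2 + \sigma^2}{rn}.
```

Since $E(SSTR/(r-1)) = \sigma^2 + n\sigma_\mu^2$, dividing the treatment mean square by $rn$ gives an unbiased estimate of this variance, which is why $SSTR/((r-1)rn)$ appears under the square root.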


Inference for µ·

Why $r - 1$ degrees of freedom? Imagine we could record an infinite number of observations for each individual, so that $\bar Y_{i\cdot} \to \mu_i$.

To learn anything about $\mu_\cdot$ we still only have $r$ observations $(\mu_1, \dots, \mu_r)$.

Sampling more within an individual cannot narrow the CI for $\mu_\cdot$ beyond this limit.


Inference for $\sigma_\mu^2 / (\sigma_\mu^2 + \sigma^2)$

This quantity describes the relative contribution of the random effects variance to the total variance of one observation.

We use the fact that

$$F = \frac{\sigma^2}{\sigma^2 + n\sigma_\mu^2} \times \frac{SSTR/(r-1)}{SSE/((n-1)r)} \sim F_{r-1,(n-1)r}.$$

Manipulate the inequalities:

$$P\left(F_{r-1,(n-1)r,\alpha/2} \le F \le F_{r-1,(n-1)r,1-\alpha/2}\right) = 1 - \alpha.$$
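One way to carry out the manipulation is sketched below. The shorthand $F^* = \frac{SSTR/(r-1)}{SSE/((n-1)r)}$, $\theta = \sigma_\mu^2/\sigma^2$, and the abbreviations $F_{\alpha/2}$, $F_{1-\alpha/2}$ for the two $F_{r-1,(n-1)r}$ quantiles are introduced here for convenience and are not part of the original notation. The pivot is $F^*/(1+n\theta) \sim F_{r-1,(n-1)r}$.

```latex
F_{\alpha/2} \le \frac{F^*}{1 + n\theta} \le F_{1-\alpha/2}
\iff
L := \frac{1}{n}\left(\frac{F^*}{F_{1-\alpha/2}} - 1\right)
\le \theta \le
\frac{1}{n}\left(\frac{F^*}{F_{\alpha/2}} - 1\right) =: U,
\qquad
\frac{\sigma_\mu^2}{\sigma_\mu^2 + \sigma^2} = \frac{\theta}{1+\theta}
\in \left[\frac{L}{1+L},\ \frac{U}{1+U}\right].
```

The last step uses the fact that $\theta/(1+\theta)$ is increasing in $\theta$. A negative lower limit $L$ is usually truncated at zero.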


Estimating $\sigma_\mu^2$

From the ANOVA table

$$\sigma_\mu^2 = \frac{E(SSTR/(r-1)) - E(SSE/((n-1)r))}{n}.$$

Natural estimate:

$$S_\mu^2 = \frac{SSTR/(r-1) - SSE/((n-1)r)}{n}.$$

Problem: this estimate can be negative! This is one of the difficulties in random effects models.
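The estimates above are simple to compute by hand. The following is a minimal Python sketch, assuming equal sample sizes; the function name `one_way_random` and the toy data are made up for illustration and are not the beer data.

```python
# Minimal sketch: one-way random effects ANOVA with equal group sizes.
# The data are invented for illustration only.

def one_way_random(groups):
    """groups: list of r lists, each holding n observations."""
    r = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal sample sizes")
    group_means = [sum(g) / n for g in groups]
    grand_mean = sum(group_means) / r
    # Treatment and error sums of squares from the ANOVA table.
    sstr = n * sum((m - grand_mean) ** 2 for m in group_means)
    sse = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    mstr = sstr / (r - 1)
    mse = sse / ((n - 1) * r)
    return {
        "grand_mean": grand_mean,
        "MSTR": mstr,
        "MSE": mse,
        # Natural estimate of the random effects variance; may be negative.
        "s2_mu": (mstr - mse) / n,
        # Standard error of the grand mean (used with r - 1 degrees of freedom).
        "se_grand_mean": (mstr / (r * n)) ** 0.5,
    }

toy = [[10, 12, 11], [14, 15, 16], [9, 8, 10]]
result = one_way_random(toy)
print(result)
```

If the group means are closer together than the within-group noise would suggest, `MSTR < MSE` and `s2_mu` comes out negative, which is the difficulty noted above.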


SAS code

PROC GLM DATA=DATADIR.beer;
  CLASS BRAND BOTTLE;
  MODEL SODIUM = BRAND;        /* one-way ANOVA table for brand */
  MEANS BRAND / LSD CLDIFF;    /* pairwise brand differences with confidence limits */
RUN;


Two-way random effects model

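$$Y_{ijk} = \mu_{\cdot\cdot} + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}, \quad 1 \le i \le r,\ 1 \le j \le m,\ 1 \le k \le n$$

$$\varepsilon_{ijk} \sim N(0, \sigma^2), \quad 1 \le i \le r,\ 1 \le j \le m,\ 1 \le k \le n$$

$$\alpha_i \sim N(0, \sigma_\alpha^2), \quad 1 \le i \le r.$$

$$\beta_j \sim N(0, \sigma_\beta^2), \quad 1 \le j \le m.$$

$$(\alpha\beta)_{ij} \sim N(0, \sigma_{\alpha\beta}^2), \quad 1 \le j \le m,\ 1 \le i \le r.$$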


ANOVA tables: Two-way (random)

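SS                                                                                 df              E(MS)

SSA                                                                                $r-1$           $\sigma^2 + nm\sigma_\alpha^2 + n\sigma_{\alpha\beta}^2$

SSB                                                                                $m-1$           $\sigma^2 + nr\sigma_\beta^2 + n\sigma_{\alpha\beta}^2$

SSAB                                                                               $(m-1)(r-1)$    $\sigma^2 + n\sigma_{\alpha\beta}^2$

$SSE = \sum_{i=1}^{r}\sum_{j=1}^{m}\sum_{k=1}^{n} (Y_{ijk} - \bar Y_{ij\cdot})^2$   $(n-1)mr$       $\sigma^2$

To test $H_0 : \sigma_\alpha^2 = 0$ use SSA and SSAB.

To test $H_0 : \sigma_{\alpha\beta}^2 = 0$ use SSAB and SSE.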
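The choice of denominators can be made concrete with a small computation. This is a minimal Python sketch for a balanced two-way layout; the function name `two_way_mean_squares` and the toy data are hypothetical and chosen only so the arithmetic is easy to follow.

```python
# Minimal sketch: mean squares for a balanced two-way layout.
# y[i][j] holds the n replicates for level i of factor A and level j of factor B.

def two_way_mean_squares(y):
    r, m, n = len(y), len(y[0]), len(y[0][0])
    cell = [[sum(y[i][j]) / n for j in range(m)] for i in range(r)]
    a_mean = [sum(cell[i]) / m for i in range(r)]
    b_mean = [sum(cell[i][j] for i in range(r)) / r for j in range(m)]
    grand = sum(a_mean) / r
    ssa = n * m * sum((a - grand) ** 2 for a in a_mean)
    ssb = n * r * sum((b - grand) ** 2 for b in b_mean)
    ssab = n * sum(
        (cell[i][j] - a_mean[i] - b_mean[j] + grand) ** 2
        for i in range(r) for j in range(m)
    )
    sse = sum(
        (v - cell[i][j]) ** 2
        for i in range(r) for j in range(m) for v in y[i][j]
    )
    return {
        "MSA": ssa / (r - 1),
        "MSB": ssb / (m - 1),
        "MSAB": ssab / ((r - 1) * (m - 1)),
        "MSE": sse / ((n - 1) * r * m),
    }

toy = [[[1, 3], [5, 7]], [[2, 4], [10, 12]]]
ms = two_way_mean_squares(toy)
# Random effects model: A is tested against the interaction,
# and the interaction against error.
f_a = ms["MSA"] / ms["MSAB"]
f_ab = ms["MSAB"] / ms["MSE"]
print(ms, f_a, f_ab)
```

The ratio `f_a` is compared with an $F_{r-1,(m-1)(r-1)}$ distribution and `f_ab` with an $F_{(m-1)(r-1),(n-1)mr}$ distribution, matching the E(MS) column: each numerator and denominator differ only in the variance component being tested.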


Mixed effects model

In some studies, some factors can be thought of as fixed, others as random.

For instance, we might have a study of the effect of a standard part of the brewing process on sodium levels in the beer example.

Then, we might think of a model in which we have a fixed effect for “brewing technique” and a random effect for beer.


Two-way mixed effects model

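$$Y_{ijk} = \mu_{\cdot\cdot} + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}, \quad 1 \le i \le r,\ 1 \le j \le m,\ 1 \le k \le n$$

$$\varepsilon_{ijk} \sim N(0, \sigma^2), \quad 1 \le i \le r,\ 1 \le j \le m,\ 1 \le k \le n$$

$$\alpha_i \sim N(0, \sigma_\alpha^2), \quad 1 \le i \le r.$$

$\beta_j$, $1 \le j \le m$, are constants.

$$(\alpha\beta)_{ij} \sim N(0, \sigma_{\alpha\beta}^2), \quad 1 \le j \le m,\ 1 \le i \le r.$$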


Constraints

$$\sum_{j=1}^{m} \beta_j = 0$$

$$\sum_{j=1}^{m} (\alpha\beta)_{ij} = 0, \quad 1 \le i \le r.$$


ANOVA tables: Two-way (mixed)

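SS                                                                                 df              E(MS)

SSA                                                                                $r-1$           $\sigma^2 + nm\sigma_\alpha^2$

SSB                                                                                $m-1$           $\sigma^2 + nr\frac{\sum_{j=1}^{m}\beta_j^2}{m-1} + n\sigma_{\alpha\beta}^2$

SSAB                                                                               $(m-1)(r-1)$    $\sigma^2 + n\sigma_{\alpha\beta}^2$

$SSE = \sum_{i=1}^{r}\sum_{j=1}^{m}\sum_{k=1}^{n} (Y_{ijk} - \bar Y_{ij\cdot})^2$   $(n-1)mr$       $\sigma^2$

To test $H_0 : \sigma_\alpha^2 = 0$ use SSA and SSE.

To test $H_0 : \beta_1 = \dots = \beta_m = 0$ use SSB and SSAB.

To test $H_0 : \sigma_{\alpha\beta}^2 = 0$ use SSAB and SSE.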


Example: pearls

Imitation pearl “study”.

Two factors: how many coats of lacquer, and batch.

Natural to use batch as random, coats as fixed, but we’ll do both.


SAS code: two-way

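PROC MIXED;
  CLASS BATCH COATS;
  MODEL VALUE = ;                    /* no fixed effects other than the intercept */
  RANDOM COATS BATCH COATS*BATCH;    /* both factors and their interaction are random */
RUN;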


SAS code: mixed

PROC MIXED;
  CLASS COATS BATCH;
  MODEL VALUE = COATS;               /* coats is the fixed effect */
  RANDOM BATCH COATS*BATCH;          /* batch and the interaction are random */
RUN;


Nested vs. crossed

In previous examples the factors are such that we observe every pair of possible values.

Sometimes, particularly in random effects models, this is not possible.

For example, consider a model with “city” as a factor, and “country” as another.


Nested random effect models

Suppose we want to study the effectiveness of a recycling program at reducing household waste.

Outcome: monthly per capita household waste.

Design: in each G8 country, the recycling program is implemented in 10 cities chosen at random, and 10 other cities are chosen at random for comparison. The outcome is measured each month for a year.


What can be estimated?

Interactions between city and country can’t be estimated, nor between city and “program”.

We say that “city” is nested within “country” and “program” because the levels of city can’t appear in other countries, and each city either implements the program or not.

It is possible to estimate interactions between time, country and program.

City can be either a random or fixed effect; here we treat it as random.


Nested model: SAS “code”

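PROC MIXED;
  CLASS CITY COUNTRY PROGRAM MONTH;
  MODEL WASTE = COUNTRY PROGRAM MONTH COUNTRY*PROGRAM COUNTRY*MONTH;
  RANDOM CITY(COUNTRY*PROGRAM);    /* city is nested within country and program */
  LSMEANS COUNTRY*MONTH;           /* LSMEANS effects must appear in the MODEL statement */
RUN;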


Generalized linear models

All models we have seen so far deal with continuous outcome variables with no restriction on their expectations, and all have assumed that mean and variance are unrelated.

Many outcomes of interest do not satisfy this.

Examples: binary outcomes, Poisson count outcomes.


Binary outcomes

If our outcome $Y_i$ is a 0-1 random variable, then $0 \le E(Y_i) \le 1$.

Also,

$$E(Y_i) = \pi_i, \quad \mathrm{Var}(Y_i) = \pi_i(1 - \pi_i).$$

A convenient way to model the dependence of $Y_i$ on covariates $X_{i1}, \dots, X_{ik}$ is through the logit transform.


Logit transform

Logit transform:

$$\mathrm{logit}(\pi) = \log\left(\frac{\pi}{1 - \pi}\right) \in (-\infty, +\infty)$$

Inverse:

$$\mathrm{logit}^{-1}(x) = \frac{e^x}{1 + e^x} \in (0, 1).$$
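As an illustration, the transform and its inverse can be written in a few lines of Python. This is a sketch; the function names `logit` and `inv_logit` are chosen here, and the branch on the sign of `x` is only a numerical-stability detail that avoids overflow in `exp`.

```python
import math

def logit(p):
    """Map a probability in (0, 1) to the whole real line."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))

def inv_logit(x):
    """Map any real number back to (0, 1); same as e^x / (1 + e^x)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

for p in (0.1, 0.5, 0.9):
    print(p, logit(p), inv_logit(logit(p)))
```

Round-tripping a probability through `logit` and `inv_logit` returns the original value up to floating-point error, and `logit(0.5)` is exactly 0.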


Binary regression

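Logistic regression model:

$$\mathrm{logit}(E(Y_i)) = \mathrm{logit}(\pi_i) = \beta_0 + \sum_{j=1}^{k} \beta_j X_{ij}.$$

Probit regression model:

$$\Phi^{-1}(E(Y_i)) = \beta_0 + \sum_{j=1}^{k} \beta_j X_{ij}, \qquad \Phi(x) = \int_{-\infty}^{x} \frac{e^{-z^2/2}}{\sqrt{2\pi}}\, dz.$$

In each case, $\mathrm{Var}(Y_i) = \pi_i(1 - \pi_i)$ but the regression model is different.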
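To see how the two models differ, one can evaluate the mean implied by each link at the same value of the linear predictor. A small Python sketch follows; the function names are made up for this example, and $\Phi$ is computed from the standard library's `math.erf`.

```python
import math

def std_normal_cdf(x):
    """Phi(x), the standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def logistic_mean(eta):
    """pi = logit^{-1}(eta) for the logistic model."""
    return 1.0 / (1.0 + math.exp(-eta))

def probit_mean(eta):
    """pi = Phi(eta) for the probit model."""
    return std_normal_cdf(eta)

def bernoulli_variance(pi):
    """The variance pi(1 - pi) is the same under both models."""
    return pi * (1.0 - pi)

for eta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    pl, pp = logistic_mean(eta), probit_mean(eta)
    print(eta, round(pl, 4), round(pp, 4),
          round(bernoulli_variance(pl), 4), round(bernoulli_variance(pp), 4))
```

Both links give $\pi = 0.5$ at $\eta = 0$, but the probit curve approaches 0 and 1 faster, so the same coefficients mean different things in the two models.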


Link & variance fns.

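If

$$\eta_i = g(E(Y_i)) = g(\mu_i) = \beta_0 + \sum_{j=1}^{k} \beta_j X_{ij}$$

then $g$ is the link function for the model.

If

$$\mathrm{Var}(Y_i) = \phi \cdot V(E(Y_i)) = \phi \cdot V(\mu_i)$$

for some function $V$, then $V$ is the variance function for the data.

Example (logistic regression, $g = \mathrm{logit}$):

$$\mathrm{logit}(E(Y_i)) = \mathrm{logit}(\pi_i) = \beta_0 + \sum_{j=1}^{k} \beta_j X_{ij}.$$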


Binary (again)

For a logistic model,

$$g(\mu) = \mathrm{logit}(\mu), \quad V(\mu) = \mu(1 - \mu).$$

For a probit model,

$$g(\mu) = \Phi^{-1}(\mu), \quad V(\mu) = \mu(1 - \mu).$$


Other common examples

Multiple linear regression: $g(\mu) = \mu$, $V(\mu) = 1$.

Linear regression with variance tied to mean: $g(\mu) = \mu$, $V(\mu) = \mu^2$.

Poisson log-linear models: $g(\mu) = \log(\mu)$, $V(\mu) = \mu$.


Fitting a GLM

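When fitting GLMs, instead of SSE, the deviance is minimized (equivalently, the log-likelihood is maximized):

$$DEV(\beta_0, \dots, \beta_k) = \sum_{i=1}^{n} \left[ 2\log L(Y_i, X_{ij}) - 2\log L(\beta_0, \dots, \beta_k \mid Y_i, X_{ij}) \right].$$

The first term above is twice the log-likelihood in the completely unrestricted model. For example, in the Gaussian setting it is zero up to an additive constant (we can set each $\mu_i = Y_i$). In the Poisson setting it is

$$2\sum_{i=1}^{n} \left( Y_i\log(Y_i) - Y_i - \log(Y_i!) \right).$$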
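For the Poisson case the deviance can be computed directly from the two log-likelihoods. The Python sketch below assumes the fitted means `mu` are already available (they are invented numbers here, not output from a real fit) and uses the convention $0 \cdot \log 0 = 0$.

```python
import math

def poisson_loglik(y, mu):
    """Sum of Y_i log(mu_i) - mu_i - log(Y_i!), with 0 * log(0) taken as 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        log_term = yi * math.log(mi) if yi > 0 else 0.0
        total += log_term - mi - math.lgamma(yi + 1)
    return total

def poisson_deviance(y, mu):
    """Twice the saturated log-likelihood minus twice the model log-likelihood."""
    saturated = poisson_loglik(y, y)   # unrestricted model: mu_i = Y_i
    return 2.0 * saturated - 2.0 * poisson_loglik(y, mu)

counts = [0, 3, 1, 6, 2]
fitted = [1.0, 2.0, 2.0, 3.0, 2.0]    # hypothetical fitted means
print(poisson_deviance(counts, fitted))
```

The deviance is zero when the fitted means reproduce the data exactly and grows as the fit gets worse, which is the sense in which it plays the role of SSE.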


Quasi-likelihood

For some choices of link and variance functions, there is no deviance.

In these situations, quasi-likelihood is what is minimized instead:

$$X^2(\beta_0, \dots, \beta_k) = \sum_{i=1}^{n} \left( \frac{Y_i - \mu_i(\beta_0, \dots, \beta_k)}{\sqrt{V(\mu_i)}} \right)^2.$$

Pearson’s $X^2$ also gives an estimate of $\phi$ when $\phi$ is not specified in advance (in binary or Poisson models it is fixed at 1):

$$\hat\phi = \frac{1}{n - k - 1} X^2(\hat\beta_0, \dots, \hat\beta_k).$$
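A short Python sketch of these two formulas is given below. The fitted means are hypothetical numbers chosen for easy arithmetic, and the variance function is passed in as an argument so the same code covers $V(\mu) = \mu$, $V(\mu) = \mu(1-\mu)$, and so on.

```python
def pearson_chi2(y, mu, variance_fn):
    """Pearson's X^2: sum of squared residuals scaled by V(mu_i)."""
    return sum((yi - mi) ** 2 / variance_fn(mi) for yi, mi in zip(y, mu))

def dispersion_estimate(y, mu, variance_fn, k):
    """phi-hat = X^2 / (n - k - 1), where k is the number of covariates."""
    n = len(y)
    return pearson_chi2(y, mu, variance_fn) / (n - k - 1)

counts = [0, 3, 1, 6, 2]
fitted = [1.0, 2.0, 2.0, 3.0, 2.0]     # hypothetical fitted means
poisson_variance = lambda mu: mu        # V(mu) = mu

x2 = pearson_chi2(counts, fitted, poisson_variance)
phi_hat = dispersion_estimate(counts, fitted, poisson_variance, k=1)
print(x2, phi_hat)
```

Here $X^2 = 1 + 0.5 + 0.5 + 3 + 0 = 5$ and $\hat\phi = 5/3$; a value well above 1 in a Poisson model would suggest more variability than the Poisson assumption allows.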


Inference in GLMs

Large sample theory tells us that, approximately,

$$\hat\beta \sim N\left(\beta,\ \hat\phi\,(X^T W^{-1} X)^{-1}\right)$$

with $W$ diagonal, $W_i = V(\hat\mu_i)\, g'(\hat\mu_i)^2$ (which reduces to $V(\hat\mu_i)$ for the identity link).

For testing individual $\beta$’s everything is basically the same as in multiple regression; we just have different estimates of $SE\left(\sum_{j=0}^{k} a_j \hat\beta_j\right)$.

Likewise, confidence intervals just have different estimates of SE. Use normal instead of $t$.
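To make the standard errors concrete, here is a minimal Python sketch that fits a logistic regression with one covariate by Newton-Raphson and reads the standard errors off the inverse Fisher information. For the logit link with $\phi = 1$ the information is $\sum_i \hat\pi_i(1-\hat\pi_i)\, x_i x_i^T$. The data and function names are invented for illustration, and the sketch assumes the data are not perfectly separated.

```python
import math

def score_and_information(b0, b1, x, y):
    """Score vector and Fisher information for logit(pi_i) = b0 + b1 * x_i."""
    s0 = s1 = i00 = i01 = i11 = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        w = p * (1.0 - p)              # V(mu) for a binary outcome
        s0 += yi - p
        s1 += xi * (yi - p)
        i00 += w
        i01 += w * xi
        i11 += w * xi * xi
    return (s0, s1), (i00, i01, i11)

def fit_logistic(x, y, iterations=25):
    """Newton-Raphson updates: beta <- beta + information^{-1} * score."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        (s0, s1), (i00, i01, i11) = score_and_information(b0, b1, x, y)
        det = i00 * i11 - i01 * i01
        b0 += (i11 * s0 - i01 * s1) / det
        b1 += (i00 * s1 - i01 * s0) / det
    # Standard errors: square roots of the diagonal of the inverse information.
    _, (i00, i01, i11) = score_and_information(b0, b1, x, y)
    det = i00 * i11 - i01 * i01
    return (b0, b1), (math.sqrt(i11 / det), math.sqrt(i00 / det))

x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0, 0, 1, 0, 1, 1, 0, 1]
(b0, b1), (se0, se1) = fit_logistic(x, y)
z1 = b1 / se1                          # Wald z statistic for the slope
print(b0, b1, se0, se1, z1)
```

The statistic `z1` is compared with a standard normal distribution rather than a $t$ distribution, as noted above.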

Partial deviance tests

As in multiple regression, to test that some subset $H_0 : \beta_{i_1} = \dots = \beta_{i_l} = 0$ we fit a full and a reduced model.

Large sample theory again tells us that

$$DEV_{\chi^2} = \frac{DEV(R) - DEV(F)}{\hat\phi} \sim \chi^2_{df_R - df_F}.$$

If deviance is unavailable, Pearson’s $X^2$ is substituted.

Reject $H_0$ if $DEV_{\chi^2}$ is larger than $\chi^2_{df_R - df_F,\, 1-\alpha}$.


Wald χ2 tests

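The test can also be done using a Wald $\chi^2$, which does not require fitting both a full and a reduced model.

Wald $\chi^2$ to test $C\beta = 0$:

$$WALD_{\chi^2} = (C\hat\beta)^T \left( \hat\phi\, C (X^T W^{-1} X)^{-1} C^T \right)^{-1} (C\hat\beta).$$

Reject $H_0 : C\beta = 0$ if $WALD_{\chi^2}$ is larger than $\chi^2_{\#\mathrm{rows}(C),\, 1-\alpha}$.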


Logistic example

A local health clinic sent fliers to its clients to encourage everyone, but especially older persons at high risk of complications, to get a flu shot in time for protection against an expected flu epidemic.

In a pilot follow-up study, 50 clients were randomly selected and asked whether they actually received a flu shot.

In addition, data were collected on their age and their health awareness.


SAS code: logistic

PROC GENMOD;
  MODEL SHOT = AGE HEALTH AGE*HEALTH /
        DIST=BINOMIAL LINK=LOGIT TYPE3 WALD CL WALDCI;
RUN;


SAS code: probit

PROC GENMOD;
  MODEL SHOT = AGE HEALTH AGE*HEALTH /
        DIST=BINOMIAL LINK=PROBIT TYPE3 WALD CL WALDCI;
RUN;


Poisson example

A researcher in geriatrics designed a prospective study to investigate the effects of two interventions on the frequency of falls.

One hundred subjects were randomly assigned to one of the two interventions: education only and education plus exercise.

Three additional variables: gender, a balance index, and a strength index.


SAS code

PROC GENMOD;
  MODEL FALLS = INTERVENTION GENDER BALANCE STRENGTH /
        DIST=POISSON LINK=LOG TYPE3 WALDCI;
  CONTRAST 'GENDER&STRENGTH' GENDER 1, STRENGTH 1;
RUN;


Overdispersion in count data

Although Poisson is popular, it is not alwaysappropriate.

Often, observations are more clustered than a Poisson model allows, which increases the variance.

This phenomenon is called “overdispersion.”


Simple approximate fix

We can estimate $\phi$ from either the deviance or Pearson’s $X^2$.

This says that $\mathrm{Var}(Y_i) = \phi E(Y_i)$ instead of the Poisson relationship, where $\phi = 1$.

Other solutions: use a different distribution, maybe negative binomial.


SAS code

PROC GENMOD;
  MODEL FALLS = INTERVENTION GENDER BALANCE STRENGTH /
        DIST=POISSON LINK=LOG TYPE3 WALDCI
        SCALE=PEARSON;    /* estimate the dispersion from Pearson's X2 */
  CONTRAST 'GENDER&STRENGTH' GENDER 1, STRENGTH 1;
RUN;


Diagnostics

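Diagnostics are similar to those in multiple regression, but use different residuals.

Deviance residuals: the squared residual is the contribution of the $i$-th observation to the deviance, and the residual takes the sign of $Y_i - \hat\mu_i$:

$$r_{D,i}^2 = 2\log L(Y_i, X_{ij}) - 2\log L(\hat\beta_0, \dots, \hat\beta_k \mid Y_i, X_{ij}).$$

Pearson residuals:

$$r_{P,i} = \frac{Y_i - \hat\mu_i}{\sqrt{\hat\phi V(\hat\mu_i)}}.$$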
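For a Poisson model both residuals have simple closed forms. The Python sketch below assumes $V(\mu) = \mu$ and uses invented values; the function names are hypothetical.

```python
import math

def pearson_residual(y, mu, phi=1.0):
    """(Y - mu) / sqrt(phi * V(mu)) with the Poisson variance V(mu) = mu."""
    return (y - mu) / math.sqrt(phi * mu)

def deviance_residual(y, mu):
    """Signed square root of one observation's contribution to the Poisson deviance."""
    log_term = y * math.log(y / mu) if y > 0 else 0.0
    contribution = 2.0 * (log_term - (y - mu))
    return math.copysign(math.sqrt(max(contribution, 0.0)), y - mu)

counts = [0, 3, 1, 6, 2]
fitted = [1.0, 2.0, 2.0, 3.0, 2.0]     # hypothetical fitted means
for yi, mi in zip(counts, fitted):
    print(yi, mi, round(pearson_residual(yi, mi), 4), round(deviance_residual(yi, mi), 4))
```

Squaring and summing the deviance residuals recovers the deviance, and squaring and summing the Pearson residuals (with $\hat\phi = 1$) recovers Pearson's $X^2$.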

