Igert glmm

Definitions Estimation Inference Challenges & open questions References

Generalized linear mixed model discussion

Ben Bolker

McMaster University, Mathematics & Statistics and Biology

25 April 2014

Acknowledgments

lme4: Doug Bates, Martin Mächler, Steve Walker

Data: Josh Banta, Adrian Stier, Sea McKeon, David Julian, Jada-Simone White

NSERC (Discovery)

SHARCnet

Outline

1 Examples and definitions

2 Estimation: Overview, Methods

3 Inference

4 Challenges & open questions

(Generalized) linear mixed models

(G)LMMs: a statistical modeling framework incorporating:

Linear combinations of categorical and continuous predictors, and interactions

Response distributions in the exponential family (binomial, Poisson, and extensions)

Any smooth, monotonic link function (e.g. logistic, exponential models)

Flexible combinations of blocking factors (clustering; random effects)

Coral protection by symbionts (McKeon et al., 2012)

[Figure: number of blocks (y-axis, 0 to 10) by number of predation events (x-axis, 0 to 2), in panels by symbiont treatment: none, shrimp, crabs, both.]
Environmental stress: Glycera cell survival (D. Julian unpubl.)

[Figure: grid of panels by osmolarity (Osm = 12.8, 22.4, 32, 41.6, 51.2) and oxygen (Normoxia, Anoxia); within each panel, H2S (0, 0.03, 0.1, 0.32) against copper (0, 33.3, 66.6, 133.3), with a key running from 0.0 to 1.0.]

Arabidopsis response to fertilization & clipping (Banta et al., 2010)

[Figure: log(1 + fruit set) (y-axis, 0 to 5) for unclipped vs. clipped plants; panel: nutrient (1, 8), color: genotype.]

Coral demography (J.-S. White unpubl.)

[Figure: mortality probability (y-axis, 0 to 1) vs. previous size (cm; x-axis, 0 to 50), in panels Before and Experimental; Treatment: Present, Removed.]

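Technical definition

$$\underbrace{Y_i}_{\text{response}} \sim \overbrace{\mathrm{Distr}}^{\text{conditional distribution}}\Bigl(\underbrace{g^{-1}(\eta_i)}_{\text{inverse link function}},\ \underbrace{\phi}_{\text{scale parameter}}\Bigr)$$

$$\underbrace{\eta}_{\text{linear predictor}} = \underbrace{X\beta}_{\text{fixed effects}} + \underbrace{Zb}_{\text{random effects}}$$

$$\underbrace{b}_{\text{conditional modes}} \sim \mathrm{MVN}\bigl(0,\ \underbrace{\Sigma(\theta)}_{\text{variance-covariance matrix}}\bigr)$$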
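One way to read the definition is as a recipe for simulating data. The sketch below is an illustration only, assuming a Poisson response with a log link, a single grouping factor with scalar random intercepts (so Σ(θ) reduces to σ²), and one continuous predictor; the function names and parameter values are hypothetical and do not come from the talk.

```python
import math
import random


def poisson_draw(rng, mu):
    """Draw one Poisson(mu) variate (Knuth's multiplication method; fine for small mu)."""
    limit = math.exp(-mu)
    k = 0
    prod = rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_poisson_glmm(n_groups, n_per_group, beta0, beta1, sigma, seed=1):
    """Simulate (group, x, y) rows from a random-intercept Poisson GLMM."""
    rng = random.Random(seed)
    rows = []
    for group in range(n_groups):
        b = rng.gauss(0.0, sigma)        # b ~ N(0, sigma^2): one random effect per group
        for _ in range(n_per_group):
            x = rng.random()             # hypothetical continuous predictor on [0, 1)
            eta = beta0 + beta1 * x + b  # linear predictor: eta = X beta + Z b
            mu = math.exp(eta)           # inverse link g^{-1} for the log link
            y = poisson_draw(rng, mu)    # Y ~ Poisson(mu); no scale parameter needed
            rows.append((group, x, y))
    return rows


rows = simulate_poisson_glmm(n_groups=5, n_per_group=4, beta0=0.5, beta1=1.0, sigma=0.7)
for row in rows[:3]:
    print(row)
```

Observations in the same group share one draw of b, which is what induces the within-block correlation.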

What are random effects?

a way to account for among-individual, within-block correlation

a compromise between complete pooling ($\sigma^2_{\text{among}} = 0$) and fixed effects ($\sigma^2_{\text{among}} \to \infty$)

levels selected at random from a larger population

a way to do shrinkage estimation/share information among levels

a way to estimate variability among levels

a way to allow predictions on unmeasured levels
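The "compromise" between complete pooling and fixed effects can be made concrete in the simplest case. The sketch below assumes a normal (linear) random-intercept model with known variances, where each group's estimate is a weighted average of its own mean and the grand mean; the function name and the numbers are hypothetical, and a real GLMM fit estimates the variances rather than taking them as given.

```python
def shrunk_means(group_means, group_sizes, grand_mean, var_among, var_within):
    """Partial-pooling estimates of group means for a normal random-intercept model."""
    estimates = []
    for ybar, n in zip(group_means, group_sizes):
        # Weight on the group's own mean: 0 = complete pooling, 1 = separate fixed effects.
        weight = var_among / (var_among + var_within / n)
        estimates.append(grand_mean + weight * (ybar - grand_mean))
    return estimates


means, sizes = [2.0, -1.0], [2, 50]
pooled = shrunk_means(means, sizes, 0.0, var_among=0.0, var_within=4.0)
partial = shrunk_means(means, sizes, 0.0, var_among=1.0, var_within=4.0)
fixed = shrunk_means(means, sizes, 0.0, var_among=1e12, var_within=4.0)
print(pooled, partial, fixed)
```

With an intermediate among-group variance, the small group (n = 2) is pulled much further toward the grand mean than the large one (n = 50).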

Maximum likelihood estimation

$$\underbrace{L(Y_i \mid \theta, \beta)}_{\text{likelihood}} = \int \cdots \int \underbrace{L(Y_i \mid \beta, b)}_{\text{data} \mid \text{random effects}} \times \underbrace{L(b \mid \Sigma(\theta))}_{\text{random effects}} \, db$$

Best fit is a compromise between two components (consistency of data with fixed effects and conditional modes; consistency of random effects with RE distribution)
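As a worked special case (an illustration, not taken from the slides): with a single grouping factor and independent scalar random intercepts $b_j \sim N(0, \sigma^2)$, so that $\theta$ reduces to $\sigma$, the multi-dimensional integral factors into one one-dimensional integral per group. The conditional density $f$, the indices $i$ and $j$, and the number of groups $J$ are notation introduced here; the scale parameter $\phi$ is omitted for brevity.

```latex
L(Y \mid \sigma, \beta)
  = \prod_{j=1}^{J} \int_{-\infty}^{\infty}
    \left[ \prod_{i \in \text{group } j}
      f\bigl(y_{ij} \mid g^{-1}(x_{ij}^{\top}\beta + b_j)\bigr) \right]
    \frac{1}{\sqrt{2\pi\sigma^{2}}}
    \exp\!\left(-\frac{b_j^{2}}{2\sigma^{2}}\right) db_j
```

The bracketed term is the "data | random effects" component and the normal density is the "random effects" component, so each group contributes its own compromise between the two.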

Integrated (marginal) likelihood

[Figure: scaled probability (y-axis, 0 to 1) vs. conditional mode value u (x-axis, −10 to 10); curves: L(b | σ²), L(x | b, β), and their product L_prod.]

Shrinkage: Arabidopsis conditional modes

[Figure: mean (log) fruit set (y-axis) by genotype (x-axis, 0 to 25).]

Estimation methods

deterministic: various approximate integrals (Breslow, 2004) ...

stochastic (Monte Carlo): frequentist and Bayesian (Booth and Hobert, 1999; Ponciano et al., 2009; Sung, 2007)

Deterministic approaches

PQL: fast and biased, especially for binary/low-count data (MASS:glmmPQL)

Laplace: intermediate (lme4:glmer, glmmML, glmmADMB, R2ADMB (AD Model Builder))

Gauss-Hermite quadrature: slow but accurate (lme4:glmer, glmmML, repeated)

INLA: Bayesian, very flexible (INLA)

General trade-off between flexibility (ADMB/glmmADMB) and efficiency (lme4)
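To give a feel for the speed/accuracy trade-off, here is a minimal sketch of the Laplace approximation for one group's contribution to the marginal likelihood, assuming a binomial response with a logit link and a scalar random intercept. A fine trapezoid grid stands in for Gauss-Hermite quadrature as the "slow but accurate" reference; this is a plainly different technique, used only because it needs no quadrature nodes. All function names and parameter values are hypothetical, and none of this is how the packages above implement their methods.

```python
import math


def log_integrand(b, y, n, beta0, sigma):
    """log[ Binomial(y | n, logistic(beta0 + b)) * Normal(b | 0, sigma^2) ]."""
    eta = beta0 + b
    log_binom = math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
    log1pexp = max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))  # stable log(1 + e^eta)
    log_lik = log_binom + y * eta - n * log1pexp
    log_dens = -0.5 * math.log(2.0 * math.pi * sigma ** 2) - b * b / (2.0 * sigma ** 2)
    return log_lik + log_dens


def laplace_integral(y, n, beta0, sigma, iters=50):
    """Laplace approximation: Gaussian integral matched at the conditional mode."""
    b = 0.0
    for _ in range(iters):  # Newton steps toward the mode of the log integrand
        p = 1.0 / (1.0 + math.exp(-(beta0 + b)))
        grad = y - n * p - b / sigma ** 2
        hess = -n * p * (1.0 - p) - 1.0 / sigma ** 2
        b -= grad / hess
    p = 1.0 / (1.0 + math.exp(-(beta0 + b)))
    hess = -n * p * (1.0 - p) - 1.0 / sigma ** 2
    return math.exp(log_integrand(b, y, n, beta0, sigma)) * math.sqrt(2.0 * math.pi / -hess)


def grid_integral(y, n, beta0, sigma, half_width=10.0, steps=4001):
    """Reference value from a fine trapezoid rule over +/- half_width * sigma."""
    lo = -half_width * sigma
    h = 2.0 * half_width * sigma / (steps - 1)
    total = 0.0
    for k in range(steps):
        weight = 0.5 if k in (0, steps - 1) else 1.0
        total += weight * math.exp(log_integrand(lo + k * h, y, n, beta0, sigma))
    return total * h


lap = laplace_integral(9, 10, beta0=0.0, sigma=1.0)
ref = grid_integral(9, 10, beta0=0.0, sigma=1.0)
print(f"Laplace: {lap:.5f}  grid reference: {ref:.5f}")
```

The Laplace version needs a handful of Newton steps, while the reference evaluates the integrand thousands of times; the gap between them grows as the integrand becomes less Gaussian, for example with binary data.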

Stochastic approaches

Mostly Bayesian (Bayesian computation handles high-dimensional integration)

various flavours: Gibbs sampling, MCMC, MCEM, etc.

generally slower but more flexible

simplifies many inferential problems

must specify priors, assess convergence/error

specialized: glmmAK, MCMCglmm (Hadfield, 2010), bernor

general: glmmBUGS, R2WinBUGS, BRugs (WinBUGS/OpenBUGS), R2jags, rjags (JAGS), glmer2stan, Stan

Estimation: example (McKeon et al., 2012)

[Figure: estimated log-odds of predation (x-axis, −6 to 2) for the contrasts Symbiont, Crab vs. Shrimp, and Added symbiont, compared across methods: GLM (fixed), GLM (pooled), PQL, Laplace, AGQ.]

Wald tests

Wald tests (e.g. typical results of summary)

based on information matrix

assume quadratic log-likelihood surface

exact for regular linear models; only asymptotically OK for GLM(M)s

computationally cheap

approximation is sometimes awful (Hauck-Donner effect)
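The Hauck-Donner effect shows up even in a one-parameter toy problem. The sketch below assumes a plain binomial model with no random effects, testing H0: p = 0.5 on the logit scale; the function name and the example counts are hypothetical. It compares the Wald z statistic with the likelihood-ratio statistic as the observed proportion moves away from the null.

```python
import math


def wald_and_lrt(successes, trials, p0=0.5):
    """Wald z and likelihood-ratio statistic for H0: p = p0 (requires 0 < successes < trials)."""
    p_hat = successes / trials

    def logit(p):
        return math.log(p / (1.0 - p))

    def loglik(p):
        return successes * math.log(p) + (trials - successes) * math.log(1.0 - p)

    # Standard error of logit(p_hat) from the information matrix (quadratic approximation).
    se = 1.0 / math.sqrt(trials * p_hat * (1.0 - p_hat))
    wald_z = (logit(p_hat) - logit(p0)) / se
    lrt = 2.0 * (loglik(p_hat) - loglik(p0))
    return wald_z, lrt


for k in (70, 90, 99):
    z, lrt = wald_and_lrt(k, 100)
    print(f"{k}/100 successes: Wald z = {z:.2f}, LRT statistic = {lrt:.1f}")
```

Going from 90 to 99 successes out of 100, the likelihood-ratio statistic keeps growing while the Wald z shrinks, because the standard error on the logit scale blows up faster than the estimate does.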

2D profiles for coral predation

[Figure: scatter plot matrix of pairwise profiles for .sig01, (Intercept), tttcrabs, tttshrimp, and tttboth.]

Likelihood ratio tests

better, but still have to deal with two finite-size problems:

"denominator degrees of freedom" (when estimating scale)

numerator is only asymptotically χ² anyway (Bartlett corrections)

Kenward-Roger correction? (Stroup, 2014)

Profile confidence intervals: moderately expensive/fragile

Parametric bootstrapping

fit null model to data

simulate "data" from null model

fit null and working model, compute likelihood difference

repeat to estimate null distribution

should be OK but ??? not well tested (assumes estimated parameters are "sufficiently" good)
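The steps above can be sketched in a few lines. To stay self-contained, the sketch swaps the GLMM for a much simpler stand-in: Poisson counts in two groups, where the null model has one common rate, the working model has one rate per group, and both have closed-form fits. The function names, data, and number of simulations are hypothetical; with a real mixed model, each "fit" would be a call to the fitting software.

```python
import math
import random


def poisson_draw(rng, mu):
    """Draw one Poisson(mu) variate (Knuth's multiplication method; fine for small mu)."""
    limit = math.exp(-mu)
    k = 0
    prod = rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def poisson_loglik(ys, mu):
    """Poisson log-likelihood up to the constant -sum(log y!), which cancels in differences."""
    if mu == 0.0:
        return 0.0  # the fitted mean is zero only when every count is zero
    return sum(y * math.log(mu) - mu for y in ys)


def lr_stat(g1, g2):
    """Fit null (common rate) and working (per-group rates) models; return (2 * diff, null rate)."""
    pooled = g1 + g2
    mu0 = sum(pooled) / len(pooled)
    ll_null = poisson_loglik(pooled, mu0)
    ll_work = poisson_loglik(g1, sum(g1) / len(g1)) + poisson_loglik(g2, sum(g2) / len(g2))
    return 2.0 * (ll_work - ll_null), mu0


def parametric_bootstrap_p(g1, g2, n_sim=500, seed=1):
    rng = random.Random(seed)
    observed, mu0 = lr_stat(g1, g2)                 # step 1: fit null model to data
    exceed = 0
    for _ in range(n_sim):                          # step 4: repeat
        s1 = [poisson_draw(rng, mu0) for _ in g1]   # step 2: simulate "data" from null
        s2 = [poisson_draw(rng, mu0) for _ in g2]
        simulated, _ = lr_stat(s1, s2)              # step 3: refit both, likelihood difference
        if simulated >= observed:
            exceed += 1
    return (exceed + 1) / (n_sim + 1)


p_value = parametric_bootstrap_p([0, 1, 0, 2, 1, 0], [5, 7, 6, 8, 5, 9])
print(f"bootstrap p value: {p_value:.4f}")
```

The p value can never be smaller than 1 / (n_sim + 1), so the number of simulations sets the resolution of the test.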

Parametric bootstrap results

[Figure: inferred p value vs. true p value (both 0.02 to 0.08), in panels Osm, Cu, H2S, Anoxia.]

Bayesian approaches

If we have a good sample from the posterior distribution (Markov chains have converged etc. etc.) we get most of the inferences we want for free by summarizing the marginal posteriors

post hoc Bayesian can work, but mode at zero causes problems

On beyond R

Julia: MixedModels package

SAS: PROC MIXED, NLMIXED

AS-REML

Stata (GLLAMM, xtmelogit)

AD Model Builder

HLM, MLWiN

Challenges

Small/medium data: inference, singular fits (blme, MCMCglmm)

Big data: speed!

Worst case: large n, small N (e.g. telemetry/genomics)

Model diagnosis

Confidence intervals accounting for uncertainty in variances

See also: http://rpubs.com/bbolker/glmmchapter, https://groups.nceas.ucsb.edu/non-linear-modeling/projects

What about space?

Sometimes blocks are spatial partitions (sites, zip codes, states)

Off-the-shelf methods for "true" spatial GLMMs? (Dormann et al., 2007)

Correlation of residuals or conditional modes?

INLA; GeoBUGS; ADMB

lme4: hacking Z: pedigrees, moving average, CAR models?

lme4: flexLambda branch

methods also apply to temporal, phylogenetic correlations (Ives and Helmus, 2011)

Next steps

Complex random effects: regularization, model selection, penalized methods (lasso/fence)

Flexible correlation and variance structures

Flexible/nonparametric random effects distributions

hybrid & improved MCMC methods

Reliable assessment of out-of-sample performance

References

Banta, J.A., Stevens, M.H.H., and Pigliucci, M., 2010. Oikos, 119(2):359–369. ISSN 1600-0706. doi:10.1111/j.1600-0706.2009.17726.x.

Booth, J.G. and Hobert, J.P., 1999. Journal of the Royal Statistical Society. Series B, 61(1):265–285. doi:10.1111/1467-9868.00176.

Breslow, N.E., 2004. In D.Y. Lin and P.J. Heagerty, editors, Proceedings of the second Seattle symposium in biostatistics: Analysis of correlated data, pages 1–22. Springer. ISBN 0387208623.

Dormann, C.F., McPherson, J.M., et al., 2007. Ecography, 30(5):609–628. doi:10.1111/j.2007.0906-7590.05171.x.

Hadfield, J.D., 2010. Journal of Statistical Software, 33(2):1–22. ISSN 1548-7660.

Ives, A.R. and Helmus, M.R., 2011. Ecological Monographs, 81(3):511–525. ISSN 0012-9615. doi:10.1890/10-1264.1.

McKeon, C.S., Stier, A., et al., 2012. Oecologia, 169(4):1095–1103. ISSN 0029-8549. doi:10.1007/s00442-012-2275-2.

Ponciano, J.M., Taper, M.L., et al., 2009. Ecology, 90(2):356–362. ISSN 0012-9658.

Stroup, W.W., 2014. Agronomy Journal, 106:1–17. doi:10.2134/agronj2013.0342.

Sung, Y.J., 2007. The Annals of Statistics, 35(3):990–1011. ISSN 0090-5364. doi:10.1214/009053606000001389.