
IGERT glmm talk


Definitions Estimation Inference Challenges & open questions References

Generalized linear mixed model discussion

Ben Bolker

McMaster University, Mathematics & Statistics and Biology

25 April 2014


Acknowledgments

lme4: Doug Bates, Martin Mächler, Steve Walker

Data: Josh Banta, Adrian Stier, Sea McKeon, David Julian, Jada-Simone White

NSERC (Discovery)

SHARCnet


Outline

1 Examples and definitions

2 Estimation: Overview, Methods

3 Inference

4 Challenges & open questions


(Generalized) linear mixed models

(G)LMMs: a statistical modeling framework incorporating:

Linear combinations of categorical and continuous predictors, and interactions

Response distributions in the exponential family (binomial, Poisson, and extensions)

Any smooth, monotonic link function (e.g. logistic, exponential models)

Flexible combinations of blocking factors (clustering; random effects)


Coral protection by symbionts (McKeon et al., 2012)

[Figure: number of blocks (y-axis, 0 to 10) by number of predation events (0, 1, 2), for each symbiont treatment: none, shrimp, crabs, both]

Environmental stress: Glycera cell survival (D. Julian unpubl.)

[Figure: panels by osmolarity (Osm = 12.8, 22.4, 32, 41.6, 51.2) and oxygen condition (Normoxia, Anoxia); x-axis: H2S (0, 0.03, 0.1, 0.32); y-axis: copper (0, 33.3, 66.6, 133.3); legend scale from 0.0 to 1.0]

Arabidopsis response to fertilization & clipping (Banta et al., 2010)

[Figure: log(1 + fruit set) (y-axis, 0 to 5) for unclipped vs. clipped plants; panel: nutrient (1 and 8); color: genotype]

Coral demography (J.-S. White unpubl.)

[Figure: mortality probability (y-axis, 0.00 to 1.00) against previous size in cm (x-axis, 0 to 50); panels: Before, Experimental; treatment: Present, Removed]

Technical de�nition

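$$
\underbrace{Y_i}_{\text{response}} \sim \overbrace{\text{Distr}}^{\text{conditional distribution}}\Big(\underbrace{g^{-1}(\eta_i)}_{\text{inverse link function}},\ \underbrace{\phi}_{\text{scale parameter}}\Big)
$$

$$
\underbrace{\eta}_{\text{linear predictor}} = \underbrace{X\beta}_{\text{fixed effects}} + \underbrace{Zb}_{\text{random effects}}
$$

$$
\underbrace{b}_{\text{conditional modes}} \sim \text{MVN}\big(0,\ \underbrace{\Sigma(\theta)}_{\text{variance-covariance matrix}}\big)
$$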
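To make the definition concrete, here is a minimal simulation sketch of one special case: a Bernoulli response, a logit link, one continuous predictor, and a single random intercept per block (so Z is a block-indicator matrix and Σ(θ) = σ²I). None of this comes from the talk; the function name `simulate_binary_glmm`, the predictor distribution, and the parameter values are assumptions chosen for illustration.

```python
import math
import random

def inv_logit(eta):
    """Inverse link g^{-1}: maps the linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def simulate_binary_glmm(n_blocks, n_per_block, beta0, beta1, sigma, seed=1):
    """Simulate (block, x, y) rows from a logit-link random-intercept model.

    Hypothetical helper: eta = beta0 + beta1 * x + b[block],
    with b ~ Normal(0, sigma^2) and y ~ Bernoulli(inv_logit(eta)).
    """
    rng = random.Random(seed)
    rows = []
    for block in range(n_blocks):
        b = rng.gauss(0.0, sigma)             # random effect for this block
        for _ in range(n_per_block):
            x = rng.uniform(-1.0, 1.0)        # continuous predictor (assumed range)
            eta = beta0 + beta1 * x + b       # linear predictor: X beta + Z b
            p = inv_logit(eta)                # conditional mean of the response
            y = 1 if rng.random() < p else 0  # draw from the conditional distribution
            rows.append((block, x, y))
    return rows

data = simulate_binary_glmm(n_blocks=10, n_per_block=5,
                            beta0=-0.5, beta1=1.0, sigma=1.5)
print(len(data), sum(y for _, _, y in data))
```

Observations within a block share the same draw of b, which is what induces the within-block correlation that the random effect is meant to capture.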


What are random effects?

a way to account for among-individual, within-block correlation

a compromise between complete pooling (σ²_among = 0) and fixed effects (σ²_among → ∞)

levels selected at random from a larger population

a way to do shrinkage estimation/share information among levels

a way to estimate variability among levels

a way to allow predictions on unmeasured levels
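The "compromise" and "shrinkage" views can be sketched numerically in the simplest setting: a Gaussian response with a random intercept, where the variances and the grand mean are treated as known. In that case each block's estimate is its own mean pulled toward the grand mean by the weight σ²_among / (σ²_among + σ²_within / n). The function names and the example numbers below are assumptions made for illustration, not taken from the talk.

```python
def shrinkage_weight(var_among, var_within, n):
    """Weight on a block's own mean (0 = complete pooling, 1 = no pooling)."""
    if var_among == 0:
        return 0.0
    return var_among / (var_among + var_within / n)

def partial_pool(block_means, block_sizes, grand_mean, var_among, var_within):
    """Shrink each block mean toward the grand mean."""
    return [
        grand_mean + shrinkage_weight(var_among, var_within, n) * (m - grand_mean)
        for m, n in zip(block_means, block_sizes)
    ]

means, sizes = [2.0, 5.0, 9.0], [2, 10, 50]
for var_among in (0.0, 1.0, 1e9):
    # 0.0 reproduces complete pooling; a huge value approaches fixed effects
    print(var_among, partial_pool(means, sizes, 5.0, var_among, var_within=4.0))
```

Small blocks are shrunk more than large ones, which is the sense in which levels "share information".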


Maximum likelihood estimation

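$$
\underbrace{L(Y_i \mid \theta, \beta)}_{\text{likelihood}} = \int \cdots \int \underbrace{L(Y_i \mid \beta, b)}_{\text{data} \mid \text{random effects}} \times \underbrace{L(b \mid \Sigma(\theta))}_{\text{random effects}} \, db
$$

Best fit is a compromise between two components (consistency of data with fixed effects and conditional modes; consistency of random effects with RE distribution)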

Integrated (marginal) likelihood

[Figure: scaled probability (y-axis, 0.0 to 1.0) against conditional mode value u (x-axis, −10 to 10); curves: L(b | σ²), L(x | b, β), and their product L_prod]

Shrinkage: Arabidopsis conditional modes

[Figure: mean (log) fruit set (y-axis, −15 to 3) against genotype (x-axis, 0 to 25); annotated with a row of per-genotype numbers: 3 2 10 8 10 43 9 9 4 6 4 2 6 10 5 7 9 4 9 11 2 5 5]

Estimation methods

deterministic: various approximate integrals (Breslow, 2004) . . .

stochastic (Monte Carlo): frequentist and Bayesian (Booth and Hobert, 1999; Ponciano et al., 2009; Sung, 2007)

Deterministic approaches

PQL: fast and biased, especially for binary/low-count data (MASS:glmmPQL)

Laplace: intermediate (lme4:glmer, glmmML, glmmADMB, R2ADMB (AD Model Builder))

Gauss-Hermite quadrature: slow but accurate (lme4:glmer, glmmML, repeated)

INLA: Bayesian, very flexible (INLA)

General trade-off between flexibility (ADMB/glmmADMB) and efficiency (lme4)
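As a rough illustration of what the Laplace and Gauss-Hermite approaches compute, the sketch below evaluates the marginal likelihood of a single block with k successes in n binomial trials, a logit link, and one normal random intercept. It compares a fine-grid trapezoid integral (used as a reference), a Laplace approximation, and a 5-point Gauss-Hermite rule. This is not how lme4 or any of the listed packages is implemented: the quadrature here is the plain, non-adaptive variant centred at zero, and all function names and parameter values are assumptions.

```python
import math

def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def integrand(b, k, n, beta0, sigma):
    """L(data | beta, b) * L(b | sigma): binomial likelihood times normal density."""
    p = inv_logit(beta0 + b)
    lik = math.comb(n, k) * p**k * (1.0 - p)**(n - k)
    dens = math.exp(-0.5 * (b / sigma)**2) / (sigma * math.sqrt(2.0 * math.pi))
    return lik * dens

def brute_force(k, n, beta0, sigma, half_width=8.0, steps=4000):
    """Reference value: trapezoid rule on a fine grid over +/- half_width * sigma."""
    lo, hi = -half_width * sigma, half_width * sigma
    h = (hi - lo) / steps
    total = 0.5 * (integrand(lo, k, n, beta0, sigma) + integrand(hi, k, n, beta0, sigma))
    total += sum(integrand(lo + i * h, k, n, beta0, sigma) for i in range(1, steps))
    return total * h

def laplace(k, n, beta0, sigma):
    """Laplace approximation: Gaussian fit at the mode of the log-integrand."""
    b = 0.0
    for _ in range(50):  # Newton iterations to find the conditional mode
        p = inv_logit(beta0 + b)
        grad = k - n * p - b / sigma**2
        hess = -n * p * (1.0 - p) - 1.0 / sigma**2
        b -= grad / hess
    p = inv_logit(beta0 + b)
    curvature = n * p * (1.0 - p) + 1.0 / sigma**2
    return integrand(b, k, n, beta0, sigma) * math.sqrt(2.0 * math.pi / curvature)

# 5-point Gauss-Hermite nodes and weights (weight function exp(-x^2))
GH_NODES = [-2.020182870456086, -0.958572464613819, 0.0,
            0.958572464613819, 2.020182870456086]
GH_WEIGHTS = [0.019953242059046, 0.393619323152241, 0.945308720482942,
              0.393619323152241, 0.019953242059046]

def gauss_hermite(k, n, beta0, sigma):
    """Non-adaptive Gauss-Hermite quadrature centred at b = 0."""
    total = 0.0
    for x, w in zip(GH_NODES, GH_WEIGHTS):
        b = math.sqrt(2.0) * sigma * x  # change of variables for the normal density
        p = inv_logit(beta0 + b)
        total += w * math.comb(n, k) * p**k * (1.0 - p)**(n - k)
    return total / math.sqrt(math.pi)

for method in (brute_force, laplace, gauss_hermite):
    print(method.__name__, method(3, 5, 0.0, 1.0))
```

A full GLMM fit would multiply such integrals over blocks and maximize over β and σ; this sketch only evaluates one block at fixed parameter values.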

Stochastic approaches

Mostly Bayesians (Bayesian computation handles high-dimensional integration)

various flavours: Gibbs sampling, MCMC, MCEM, etc.

generally slower but more flexible

simplifies many inferential problems

must specify priors, assess convergence/error

specialized: glmmAK, MCMCglmm (Hadfield, 2010), bernor

general: glmmBUGS, R2WinBUGS, BRugs (WinBUGS/OpenBUGS), R2jags, rjags (JAGS), glmer2stan, Stan

Estimation: example (McKeon et al., 2012)

[Figure: estimated log-odds of predation (x-axis, −6 to 2) for the effects Symbiont, Crab vs. Shrimp, and Added symbiont, compared across methods: GLM (fixed), GLM (pooled), PQL, Laplace, AGQ]


Wald tests

Wald tests (e.g. typical results of summary)

based on information matrix; assume quadratic log-likelihood surface

exact for regular linear models; only asymptotically OK for GLM(M)s

computationally cheap

approximation is sometimes awful (Hauck-Donner effect)
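The Hauck-Donner effect can be seen in a very small example. The sketch below uses a plain binomial sample (an ordinary GLM with only an intercept, not a mixed model) and tests the null hypothesis that the log-odds is zero. As the observed proportion moves further from 0.5, the likelihood ratio statistic keeps growing, but the Wald statistic eventually shrinks because the standard error grows faster than the estimate. Function names and the sample size are assumptions for illustration.

```python
import math

def wald_z(k, n):
    """Wald z-statistic for H0: log-odds = 0, given k successes in n trials (0 < k < n)."""
    p_hat = k / n
    estimate = math.log(p_hat / (1.0 - p_hat))            # MLE of the log-odds
    std_err = 1.0 / math.sqrt(n * p_hat * (1.0 - p_hat))  # from the information matrix
    return estimate / std_err

def lr_statistic(k, n):
    """Likelihood ratio statistic for the same null hypothesis (p = 0.5)."""
    p_hat = k / n
    return 2.0 * (k * math.log(p_hat / 0.5)
                  + (n - k) * math.log((1.0 - p_hat) / 0.5))

for k in (60, 75, 90, 95, 99):
    print(k, round(wald_z(k, 100), 2), round(lr_statistic(k, 100), 2))
```

With n = 100, the Wald statistic is smaller at k = 99 than at k = 90 even though the evidence against the null is stronger.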

2D profiles for coral predation

[Figure: scatter plot matrix of pairwise likelihood profiles for .sig01, (Intercept), tttcrabs, tttshrimp, and tttboth]

Likelihood ratio tests

better, but still have to deal with two finite-size problems:

"denominator degrees of freedom" (when estimating scale)

numerator is only asymptotically χ² anyway (Bartlett corrections)

Kenward-Roger correction? (Stroup, 2014)

Profile confidence intervals: moderately expensive/fragile

Parametric bootstrapping

fit null model to data

simulate "data" from null model

fit null and working model, compute likelihood difference

repeat to estimate null distribution

should be OK but ??? not well tested (assumes estimated parameters are "sufficiently" good)
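The steps above can be sketched end to end with a deliberately simple stand-in for a GLMM: Poisson counts in two groups, where the null model has one common mean and the working model has a separate mean per group. Both models have closed-form fits, so the example stays short; with a real mixed model each "fit" would be a call to a fitting routine. The function names, the Poisson sampler, and the data are all assumptions for illustration.

```python
import math
import random

def poisson_draw(rng, mu):
    """Draw one Poisson(mu) variate (Knuth's multiplication method; fine for small mu)."""
    limit, k, prod = math.exp(-mu), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def poisson_loglik(ys, mu):
    """Poisson log-likelihood, omitting the -log(y!) term that cancels in ratios."""
    return sum((y * math.log(mu) if y > 0 else 0.0) - mu for y in ys)

def lr_stat(group_a, group_b):
    """2 * (log-likelihood of separate-means model - log-likelihood of common-mean model)."""
    pooled = group_a + group_b
    ll_null = poisson_loglik(pooled, sum(pooled) / len(pooled))
    ll_full = (poisson_loglik(group_a, sum(group_a) / len(group_a))
               + poisson_loglik(group_b, sum(group_b) / len(group_b)))
    return 2.0 * (ll_full - ll_null)

def parametric_bootstrap_p(group_a, group_b, n_sim=500, seed=1):
    rng = random.Random(seed)
    observed = lr_stat(group_a, group_b)
    # Step 1: fit the null model (common mean) to the data
    mu0 = (sum(group_a) + sum(group_b)) / (len(group_a) + len(group_b))
    exceed = 0
    for _ in range(n_sim):
        # Step 2: simulate "data" from the fitted null model
        sim_a = [poisson_draw(rng, mu0) for _ in group_a]
        sim_b = [poisson_draw(rng, mu0) for _ in group_b]
        # Step 3: refit both models and compare with the observed statistic
        if lr_stat(sim_a, sim_b) >= observed:
            exceed += 1
    # Step 4: the simulated statistics estimate the null distribution
    return (exceed + 1) / (n_sim + 1)

a = [1, 0, 2, 1, 1, 0, 2, 1]
b = [6, 8, 5, 7, 9, 6, 7, 8]
print(lr_stat(a, b), parametric_bootstrap_p(a, b, n_sim=200))
```

The p-value uses the common (exceed + 1) / (n_sim + 1) convention so that it is never exactly zero.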

Parametric bootstrap results

[Figure: inferred p value against true p value (both axes 0.02 to 0.08); panels: Osm, Cu, H2S, Anoxia]

Bayesian approaches

If we have a good sample from the posterior distribution (Markov chains have converged etc. etc.) we get most of the inferences we want for free by summarizing the marginal posteriors

post hoc Bayesian can work, but mode at zero causes problems
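A minimal sketch of "summarizing the marginal posterior": a random-walk Metropolis sampler for a single log-odds parameter (binomial data, normal prior), followed by a median and an equal-tailed 95% interval read directly off the draws. This is a one-parameter toy model rather than a GLMM, it does no convergence checking, and the prior, step size, burn-in fraction, and function names are all assumptions.

```python
import math
import random

def log_posterior(theta, k, n, prior_sd):
    """Log posterior (up to a constant) of the log-odds theta."""
    # Binomial log-likelihood in terms of theta: k*theta - n*log(1 + exp(theta))
    log_lik = k * theta - n * math.log1p(math.exp(theta))
    log_prior = -0.5 * (theta / prior_sd) ** 2
    return log_lik + log_prior

def metropolis(k, n, prior_sd=10.0, n_iter=20000, step=1.0, seed=1):
    """Random-walk Metropolis sampler; returns draws after discarding burn-in."""
    rng = random.Random(seed)
    theta = 0.0
    current = log_posterior(theta, k, n, prior_sd)
    draws = []
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, k, n, prior_sd)
        # Accept with probability min(1, posterior ratio)
        if math.log(1.0 - rng.random()) < candidate - current:
            theta, current = proposal, candidate
        draws.append(theta)
    return draws[n_iter // 4:]  # drop the first quarter as burn-in

def summarize(draws, level=0.95):
    """Return (lower, median, upper) of an equal-tailed interval from the draws."""
    s = sorted(draws)
    lower = s[int((1.0 - level) / 2.0 * len(s))]
    upper = s[int((1.0 + level) / 2.0 * len(s)) - 1]
    return lower, s[len(s) // 2], upper

print(summarize(metropolis(k=7, n=10)))
```

The interval comes straight from the sorted draws, with no appeal to asymptotic normality; the price is the need to choose a prior and to check that the chain has actually converged.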


On beyond R

Julia: MixedModels package

SAS: PROC MIXED, NLMIXED

AS-REML

Stata (GLLAMM, xtmelogit)

AD Model Builder

HLM, MLWiN


Challenges

Small/medium data: inference, singular fits (blme, MCMCglmm)

Big data: speed!

Worst case: large n, small N (e.g. telemetry/genomics)

Model diagnosis

Confidence intervals accounting for uncertainty in variances

See also: http://rpubs.com/bbolker/glmmchapter, https://groups.nceas.ucsb.edu/non-linear-modeling/projects

What about space?

Sometimes blocks are spatial partitions (sites, zip codes, states)

Off-the-shelf methods for "true" spatial GLMMs? (Dormann et al., 2007)

Correlation of residuals or conditional modes?

INLA; GeoBUGS; ADMB

lme4: hacking Z: pedigrees, moving average, CAR models?

lme4: flexLambda branch

methods also apply to temporal, phylogenetic correlations (Ives and Helmus, 2011)

Next steps

Complex random effects: regularization, model selection, penalized methods (lasso/fence)

Flexible correlation and variance structures

Flexible/nonparametric random effects distributions

hybrid & improved MCMC methods

Reliable assessment of out-of-sample performance

Banta, J.A., Stevens, M.H.H., and Pigliucci, M., 2010. Oikos, 119(2):359–369. ISSN 1600-0706. doi:10.1111/j.1600-0706.2009.17726.x.

Booth, J.G. and Hobert, J.P., 1999. Journal of the Royal Statistical Society. Series B, 61(1):265–285. doi:10.1111/1467-9868.00176.

Breslow, N.E., 2004. In D.Y. Lin and P.J. Heagerty, editors, Proceedings of the second Seattle symposium in biostatistics: Analysis of correlated data, pages 1–22. Springer. ISBN 0387208623.

Dormann, C.F., McPherson, J.M., et al., 2007. Ecography, 30(5):609–628. doi:10.1111/j.2007.0906-7590.05171.x.

Hadfield, J.D., 2010. Journal of Statistical Software, 33(2):1–22. ISSN 1548-7660.

Ives, A.R. and Helmus, M.R., 2011. Ecological Monographs, 81(3):511–525. ISSN 0012-9615. doi:10.1890/10-1264.1.

McKeon, C.S., Stier, A., et al., 2012. Oecologia, 169(4):1095–1103. ISSN 0029-8549. doi:10.1007/s00442-012-2275-2.

Ponciano, J.M., Taper, M.L., et al., 2009. Ecology, 90(2):356–362. ISSN 0012-9658.

Stroup, W.W., 2014. Agronomy Journal, 106:1–17. doi:10.2134/agronj2013.0342.

Sung, Y.J., 2007. The Annals of Statistics, 35(3):990–1011. ISSN 0090-5364. doi:10.1214/009053606000001389.
