IGERT glmm talk
Definitions Estimation Inference Challenges & open questions References
Generalized linear mixed model discussion
Ben Bolker
McMaster University, Mathematics & Statistics and Biology
25 April 2014
Acknowledgments
lme4: Doug Bates, Martin Mächler, Steve Walker
Data: Josh Banta, Adrian Stier, Sea McKeon, David Julian, Jada-Simone White
NSERC (Discovery)
SHARCnet
Outline
1 Examples and definitions
2 Estimation (Overview; Methods)
3 Inference
4 Challenges & open questions
(Generalized) linear mixed models
(G)LMMs: a statistical modeling framework incorporating:
Linear combinations of categorical and continuous predictors, and interactions
Response distributions in the exponential family (binomial, Poisson, and extensions)
Any smooth, monotonic link function (e.g. logistic, exponential models)
Flexible combinations of blocking factors (clustering; random effects)
Coral protection by symbionts (McKeon et al., 2012)
[Figure: number of blocks (0 to 10) against number of predation events (0, 1, 2), by symbiont treatment: none, shrimp, crabs, both.]
Environmental stress: Glycera cell survival (D. Julian unpubl.)
[Figure: grid of panels for Osm = 12.8, 22.4, 32, 41.6, 51.2 under Normoxia and Anoxia; axes H2S (0, 0.03, 0.1, 0.32) and Copper (0, 33.3, 66.6, 133.3); scale from 0.0 to 1.0.]
Arabidopsis response to fertilization & clipping (Banta et al., 2010)
[Figure: Log(1 + fruit set) (0 to 5) for unclipped vs. clipped plants; panel: nutrient (1, 8); color: genotype.]
Coral demography (J.-S. White unpubl.)
[Figure: mortality probability (0.00 to 1.00) against previous size (cm, 0 to 50); panels: Before, Experimental; treatment: Present, Removed.]
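Technical definition
\[
\underbrace{Y_i}_{\text{response}} \sim \overbrace{\mathrm{Distr}}^{\text{conditional distribution}}\big(\underbrace{g^{-1}(\eta_i)}_{\text{inverse link function}},\ \underbrace{\phi}_{\text{scale parameter}}\big)
\]
\[
\underbrace{\eta}_{\text{linear predictor}} = \underbrace{X\beta}_{\text{fixed effects}} + \underbrace{Zb}_{\text{random effects}}
\]
\[
\underbrace{b}_{\text{conditional modes}} \sim \mathrm{MVN}\big(0,\ \underbrace{\Sigma(\theta)}_{\text{variance-covariance matrix}}\big)
\]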
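To make the three-part definition concrete, the sketch below simulates data from one particular GLMM: a Bernoulli response, a logit link, and a single random intercept per block. The function name `simulate_glmm`, the binary predictor, and every parameter value are illustrative assumptions, not taken from the talk.

```python
import math
import random


def simulate_glmm(n_blocks=10, n_per_block=20, beta=(-1.0, 2.0),
                  sigma_b=1.0, seed=1):
    """Simulate (block, x, y) rows from a logistic random-intercept model.

    Hypothetical illustration: eta = beta0 + beta1 * x + b[block],
    b[block] ~ Normal(0, sigma_b^2), y ~ Bernoulli(inverse_logit(eta)).
    """
    rng = random.Random(seed)
    rows = []
    for block in range(n_blocks):
        b = rng.gauss(0.0, sigma_b)            # random effect (the Zb term)
        for _ in range(n_per_block):
            x = rng.choice([0, 1])             # binary predictor
            eta = beta[0] + beta[1] * x + b    # linear predictor: X beta + Z b
            mu = 1.0 / (1.0 + math.exp(-eta))  # inverse link g^{-1}(eta)
            y = 1 if rng.random() < mu else 0  # draw from the conditional distribution
            rows.append((block, x, y))
    return rows


rows = simulate_glmm()
print(len(rows), "observations in", len({r[0] for r in rows}), "blocks")
```

Here Σ(θ) reduces to a single variance, `sigma_b ** 2`; richer random-effects structures would need a full covariance matrix.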
What are random effects?
a way to account for among-individual, within-block correlation
a compromise between complete pooling (σ²_among = 0) and fixed effects (σ²_among → ∞)
levels selected at random from a larger population
a way to do shrinkage estimation/share information among levels
a way to estimate variability among levels
a way to allow predictions on unmeasured levels
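The "compromise" and "shrinkage" views above can be illustrated with the textbook partial-pooling formula for a Gaussian one-way layout. This is a minimal sketch that assumes the variances and grand mean are known; `shrunk_mean` and all numbers are hypothetical.

```python
def shrunk_mean(group_mean, n, grand_mean, var_among, var_within):
    """Partial-pooling estimate of one group's mean (variances assumed known).

    The weight on the group's own mean grows with the among-group variance
    and with the group's sample size n.
    """
    weight = var_among / (var_among + var_within / n)
    return grand_mean + weight * (group_mean - grand_mean)


# var_among = 0 gives complete pooling; a very large var_among approaches
# the fixed-effects (unpooled) estimate.
for var_among in (0.0, 1.0, 1e9):
    est = shrunk_mean(10.0, n=4, grand_mean=0.0,
                      var_among=var_among, var_within=4.0)
    print(var_among, est)
```

With these made-up numbers the estimate moves from the grand mean (0) through 5 to very nearly the raw group mean (10) as the among-group variance grows.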
Maximum likelihood estimation
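\[
\underbrace{L(Y_i \mid \theta, \beta)}_{\text{likelihood}} = \int \cdots \int \underbrace{L(Y_i \mid \beta, b)}_{\text{data} \mid \text{random effects}} \times \underbrace{L(b \mid \Sigma(\theta))}_{\text{random effects}} \, db
\]
Best fit is a compromise between two components (consistency of data with fixed effects and conditional modes; consistency of random effects with RE distribution)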
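For a single cluster with one scalar random intercept, the integral above is one-dimensional and can be evaluated by brute force. The sketch below assumes a binomial response with a logit link; the function names and the trapezoid grid are illustrative choices, not the method used by any particular package.

```python
import math


def binom_lik(y, n, eta):
    """L(y | beta, b): binomial likelihood with logit link, eta = beta0 + b."""
    p = 1.0 / (1.0 + math.exp(-eta))
    return math.comb(n, y) * p ** y * (1.0 - p) ** (n - y)


def normal_pdf(b, sigma):
    """L(b | sigma): density of the random effect."""
    return math.exp(-0.5 * (b / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def marginal_lik(y, n, beta0, sigma, half_width=8.0, steps=2000):
    """Trapezoid-rule integral of L(y | beta0, b) * L(b | sigma) over b."""
    lo = -half_width * sigma
    h = 2 * half_width * sigma / steps
    total = 0.0
    for i in range(steps + 1):
        b = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * binom_lik(y, n, beta0 + b) * normal_pdf(b, sigma)
    return total * h


print(marginal_lik(y=3, n=10, beta0=0.0, sigma=1.0))
```

With many clusters and correlated random effects the integral becomes high-dimensional, which is why the approximate methods discussed next are needed.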
Integrated (marginal) likelihood
[Figure: scaled probability (0.0 to 1.0) against conditional mode value u (−10 to 10); curves: L(b | σ²), L(x | b, β), and L_prod.]
Shrinkage: Arabidopsis conditional modes
[Figure: mean(log) fruit set against genotype (0 to 25); y-axis ticks −15, −3, 0, 3; numeric labels along the plot: 3 2 10 8 10 43 9 9 4 6 4 2 6 10 5 7 9 4 9 11 2 5 5.]
Estimation methods
deterministic: various approximate integrals (Breslow, 2004) ...
stochastic (Monte Carlo): frequentist and Bayesian (Booth and Hobert, 1999; Ponciano et al., 2009; Sung, 2007)
Deterministic approaches
PQL: fast and biased, especially for binary/low-count data (MASS:glmmPQL)
Laplace: intermediate (lme4:glmer, glmmML, glmmADMB, R2ADMB (AD Model Builder))
Gauss-Hermite quadrature: slow but accurate (lme4:glmer, glmmML, repeated)
INLA: Bayesian, very flexible (INLA)
General trade-off between flexibility (ADMB/glmmADMB) and efficiency (lme4)
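As a rough illustration of the Laplace idea, the sketch below approximates the one-cluster marginal likelihood of a binomial random-intercept model by a Gaussian fitted at the mode of the integrand, and compares it with a fine-grid integral. All names are hypothetical, and this is not how lme4 or the other packages listed implement it.

```python
import math


def log_integrand(b, y, n, beta0, sigma):
    """log[ L(y | beta0, b) * Normal(b | 0, sigma^2) ] for one cluster."""
    eta = beta0 + b
    log_binom = (math.log(math.comb(n, y)) + y * eta
                 - n * math.log1p(math.exp(eta)))
    log_normal = (-0.5 * (b / sigma) ** 2
                  - math.log(sigma * math.sqrt(2 * math.pi)))
    return log_binom + log_normal


def laplace_marginal(y, n, beta0, sigma, iters=50):
    """Laplace approximation to the integral of exp(log_integrand) over b."""
    b = 0.0
    for _ in range(iters):                        # Newton steps towards the mode
        p = 1.0 / (1.0 + math.exp(-(beta0 + b)))
        grad = y - n * p - b / sigma ** 2
        hess = -n * p * (1.0 - p) - 1.0 / sigma ** 2   # always negative
        b -= grad / hess
    p = 1.0 / (1.0 + math.exp(-(beta0 + b)))
    curvature = n * p * (1.0 - p) + 1.0 / sigma ** 2   # minus the second derivative
    peak = math.exp(log_integrand(b, y, n, beta0, sigma))
    return peak * math.sqrt(2 * math.pi / curvature)


def grid_marginal(y, n, beta0, sigma, half_width=8.0, steps=4000):
    """Brute-force trapezoid integral, used here only as a reference value."""
    lo = -half_width * sigma
    h = 2 * half_width * sigma / steps
    total = 0.0
    for i in range(steps + 1):
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp(log_integrand(lo + i * h, y, n, beta0, sigma))
    return total * h


print(laplace_marginal(3, 10, 0.0, 1.0), grid_marginal(3, 10, 0.0, 1.0))
```

The two values should agree closely but not exactly; the gap is the kind of approximation error that quadrature with more points is meant to reduce.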
Stochastic approaches
Mostly Bayesian (Bayesian computation handles high-dimensional integration)
various flavours: Gibbs sampling, MCMC, MCEM, etc.
generally slower but more flexible
simplifies many inferential problems
must specify priors, assess convergence/error
specialized: glmmAK, MCMCglmm (Hadfield, 2010), bernor
general: glmmBUGS, R2WinBUGS, BRugs (WinBUGS/OpenBUGS), R2jags, rjags (JAGS), glmer2stan, Stan
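The simplest stochastic approach is plain Monte Carlo integration: draw the random effect from its distribution and average the conditional likelihood. The sketch below does this for one binomial cluster and also reports a Monte Carlo standard error. It is a toy under stated assumptions, much simpler than Gibbs sampling or MCEM, and the function names are hypothetical.

```python
import math
import random


def binom_lik(y, n, eta):
    """Binomial likelihood with logit link."""
    p = 1.0 / (1.0 + math.exp(-eta))
    return math.comb(n, y) * p ** y * (1.0 - p) ** (n - y)


def mc_marginal(y, n, beta0, sigma, n_draws=20000, seed=1):
    """Monte Carlo estimate of E_b[ L(y | beta0, b) ], b ~ Normal(0, sigma^2).

    Returns the estimate and its Monte Carlo standard error.
    """
    rng = random.Random(seed)
    values = [binom_lik(y, n, beta0 + rng.gauss(0.0, sigma))
              for _ in range(n_draws)]
    mean = sum(values) / n_draws
    var = sum((v - mean) ** 2 for v in values) / (n_draws - 1)
    return mean, math.sqrt(var / n_draws)


estimate, mc_se = mc_marginal(3, 10, 0.0, 1.0)
print(estimate, "+/-", mc_se)
```

The standard error shrinks only as one over the square root of the number of draws, which is one reason stochastic methods tend to be slower than deterministic approximations.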
Estimation: example (McKeon et al., 2012)
[Figure: log-odds of predation (−6 to 2) for Symbiont, Crab vs. Shrimp, and Added symbiont; estimates compared across GLM (fixed), GLM (pooled), PQL, Laplace, and AGQ.]
Wald tests
Wald tests (e.g. typical results of summary)
based on information matrix
assume quadratic log-likelihood surface
exact for regular linear models; only asymptotically OK for GLM(M)s
computationally cheap
approximation is sometimes awful (Hauck-Donner effect)
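The Hauck-Donner effect can be seen even in a one-parameter binomial model, with no random effects at all. The sketch below tests H0: logit(p) = 0 from y successes in n = 25 trials and prints the Wald z statistic next to the likelihood ratio statistic. The function names and the data values are made up for illustration.

```python
import math


def wald_z(y, n):
    """Wald z statistic for H0: logit(p) = 0 (requires 0 < y < n)."""
    beta_hat = math.log(y / (n - y))            # MLE of the log-odds
    se = math.sqrt(1.0 / y + 1.0 / (n - y))     # from the information matrix
    return beta_hat / se


def lrt_stat(y, n):
    """Likelihood ratio statistic for the same null, p = 0.5 (0 < y < n)."""
    p_hat = y / n
    loglik_hat = y * math.log(p_hat) + (n - y) * math.log(1.0 - p_hat)
    loglik_null = n * math.log(0.5)
    return 2.0 * (loglik_hat - loglik_null)


for y in (20, 22, 23, 24):
    print(y, round(wald_z(y, 25), 2), round(lrt_stat(y, 25), 2))
```

As y approaches n the evidence against the null keeps growing and the likelihood ratio statistic rises with it, but the Wald statistic turns around and falls, because the standard error inflates faster than the estimate.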
2D profiles for coral predation
[Figure: scatter plot matrix of profiles for .sig01, (Intercept), tttcrabs, tttshrimp, tttboth.]
Likelihood ratio tests
better, but still have to deal with two finite-size problems:
"denominator degrees of freedom" (when estimating scale)
numerator is only asymptotically χ² anyway (Bartlett corrections)
Kenward-Roger correction? (Stroup, 2014)
Profile confidence intervals: moderately expensive/fragile
Parametric bootstrapping
fit null model to data
simulate "data" from null model
fit null and working model, compute likelihood difference
repeat to estimate null distribution
should be OK but ??? not well tested (assumes estimated parameters are "sufficiently" good)
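The steps above can be sketched on a deliberately simple stand-in problem: two groups of Poisson counts, where the null model has one common mean and the working model has one mean per group. There are no random effects here, and every name and data value is hypothetical; for a mixed model each "fit" would be a full GLMM fit.

```python
import math
import random


def rpois(rng, lam):
    """Poisson draw by Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def max_loglik(ys):
    """Maximized Poisson log-likelihood, dropping the log(y!) constants."""
    s, n = sum(ys), len(ys)
    return s * math.log(s / n) - s if s > 0 else 0.0


def lr_stat(g1, g2):
    """2 * (log-likelihood of separate means - log-likelihood of common mean)."""
    return 2.0 * (max_loglik(g1) + max_loglik(g2) - max_loglik(g1 + g2))


def parametric_bootstrap_p(g1, g2, n_sim=2000, seed=1):
    rng = random.Random(seed)
    observed = lr_stat(g1, g2)
    lam0 = sum(g1 + g2) / len(g1 + g2)          # fit null model to data
    exceed = 0
    for _ in range(n_sim):                      # repeat to build null distribution
        s1 = [rpois(rng, lam0) for _ in g1]     # simulate "data" from null model
        s2 = [rpois(rng, lam0) for _ in g2]
        if lr_stat(s1, s2) >= observed:         # refit both models, compare
            exceed += 1
    return (exceed + 1) / (n_sim + 1)


print(parametric_bootstrap_p([0, 1, 0, 1, 0, 0, 1, 0], [5, 7, 6, 8, 5, 6, 7, 9]))
```

The simulated null distribution depends on the fitted null parameter `lam0`, which is the slide's caveat: the p value is only as good as that estimate.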
Parametric bootstrap results
[Figure: inferred p value against true p value (0.02 to 0.08); panels: Osm, Cu, H2S, Anoxia.]
Bayesian approaches
If we have a good sample from the posterior distribution (Markov chains have converged etc. etc.) we get most of the inferences we want for free by summarizing the marginal posteriors
post hoc Bayesian can work, but mode at zero causes problems
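"Summarizing the marginal posteriors" amounts to computing simple statistics of the draws for each parameter. A minimal sketch, using made-up normal draws as a stand-in for real MCMC output (the function name `summarize_posterior` is hypothetical):

```python
import random
import statistics


def summarize_posterior(draws):
    """Posterior mean and 95% equal-tailed credible interval from draws."""
    # n=40 gives cut points at 2.5%, 5%, ..., 97.5%
    cuts = statistics.quantiles(draws, n=40)
    return statistics.fmean(draws), (cuts[0], cuts[-1])


# Stand-in "posterior sample"; in practice these would be converged MCMC draws.
rng = random.Random(1)
fake_draws = [rng.gauss(0.0, 1.0) for _ in range(10000)]
print(summarize_posterior(fake_draws))
```

An equal-tailed interval is one common choice; for a variance parameter whose posterior piles up near zero, a different summary may be more appropriate.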
On beyond R
Julia: MixedModels package
SAS: PROC MIXED, NLMIXED
AS-REML
Stata (GLLAMM, xtmelogit)
AD Model Builder
HLM, MLWiN
Challenges
Small/medium data: inference, singular fits (blme, MCMCglmm)
Big data: speed!
Worst case: large n, small N (e.g. telemetry/genomics)
Model diagnosis
Confidence intervals accounting for uncertainty in variances
See also: http://rpubs.com/bbolker/glmmchapter, https://groups.nceas.ucsb.edu/non-linear-modeling/projects
What about space?
Sometimes blocks are spatial partitions (sites, zip codes, states)
Off-the-shelf methods for "true" spatial GLMMs? (Dormann et al., 2007)
Correlation of residuals or conditional modes?
INLA; GeoBUGS; ADMB
lme4: hacking Z: pedigrees, moving average, CAR models?
lme4: flexLambda branch
methods also apply to temporal, phylogenetic correlations (Ives and Helmus, 2011)
Next steps
Complex random effects: regularization, model selection, penalized methods (lasso/fence)
Flexible correlation and variance structures
Flexible/nonparametric random effects distributions
hybrid & improved MCMC methods
Reliable assessment of out-of-sample performance
References
Banta, J.A., Stevens, M.H.H., and Pigliucci, M., 2010. Oikos, 119(2):359-369. ISSN 1600-0706. doi:10.1111/j.1600-0706.2009.17726.x.
Booth, J.G. and Hobert, J.P., 1999. Journal of the Royal Statistical Society. Series B, 61(1):265-285. doi:10.1111/1467-9868.00176.
Breslow, N.E., 2004. In D.Y. Lin and P.J. Heagerty, editors, Proceedings of the second Seattle symposium in biostatistics: Analysis of correlated data, pages 1-22. Springer. ISBN 0387208623.
Dormann, C.F., McPherson, J.M., et al., 2007. Ecography, 30(5):609-628. doi:10.1111/j.2007.0906-7590.05171.x.
Hadfield, J.D., 2010. Journal of Statistical Software, 33(2):1-22. ISSN 1548-7660.
Ives, A.R. and Helmus, M.R., 2011. Ecological Monographs, 81(3):511-525. ISSN 0012-9615. doi:10.1890/10-1264.1.
McKeon, C.S., Stier, A., et al., 2012. Oecologia, 169(4):1095-1103. ISSN 0029-8549. doi:10.1007/s00442-012-2275-2.
Ponciano, J.M., Taper, M.L., et al., 2009. Ecology, 90(2):356-362. ISSN 0012-9658.
Stroup, W.W., 2014. Agronomy Journal, 106:1-17. doi:10.2134/agronj2013.0342.
Sung, Y.J., 2007. The Annals of Statistics, 35(3):990-1011. ISSN 0090-5364. doi:10.1214/009053606000001389.