LM GLM MLM
11/19/2019 Interpretable Linear Modeling 2
Overview
Linear Modeling
Generalized Linear Modeling
Multilevel Modeling
• Linear models (LM) are statistical prediction models with high interpretability
• They are best combined with a small number of interpretable features
• They allow you to quantify the size and direction of each feature's effect
• They allow you to isolate the unique effect of each feature

• LM incorporates regression, ANOVA, t-tests, and F-tests
• LM allows for multiple X (predictor or independent) variables
• LM allows for multiple Y (explained or dependent) variables

• LM assumes the Y variables are normally distributed (if violated → GLM)
• LM assumes the data points are independent (if violated → MLM)
What is linear modeling?
• Linear regression is often presented with the following notation:

yᵢ = α + β₁x₁ᵢ + ⋯ + βₚxₚᵢ + εᵢ

• y is a vector of observations on the explained variable
• x is a vector of observations on the predictor variable
• α is the intercept
• β are the slopes
• ε is a vector of normally distributed and independent residuals
Simple linear regression
• Instead, we will use the following (equivalent) notation
• This notation will make it easier to generalize LM

yᵢ ~ Normal(μᵢ, σ)

μᵢ = α + β₁x₁ᵢ + ⋯ + βₚxₚᵢ

• This can be read as: yᵢ is normally distributed around μᵢ with SD σ
• μᵢ relates yᵢ to xᵢ and defines the "best fit" prediction line
• σ captures the residuals or errors when yᵢ ≠ μᵢ
Likelihood
Linear model
• The goal is to find the parameter values that minimize the residuals
• In linear modeling, this means estimating α, β, and σ

• This can be accomplished in several different ways
• Maximum likelihood (i.e., minimize the residual sum of squares)
• Bayesian approximation (e.g., maximum a posteriori or MCMC)

• I will provide code for use in R's lme4 and in Python's statsmodels
• These both use maximum likelihood estimation by default
• I also highly recommend MCMC (especially for MLM)
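The estimation step can be sketched directly in Python with numpy: for LM, maximum likelihood amounts to minimizing the residual sum of squares, i.e., ordinary least squares. The data below are hypothetical simulated values, not the slides' dataset.

```python
import numpy as np

# Hypothetical simulated data: y = 2 + 0.5*x + Normal(0, 1) noise
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=200)

# Design matrix with an intercept column, then a least-squares fit
X = np.column_stack([np.ones_like(x), x])
coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
alpha_hat, beta_hat = coef

# Maximum-likelihood estimate of the residual SD
resid = y - X @ coef
sigma_hat = np.sqrt(np.mean(resid ** 2))
```

With 200 observations, the recovered α, β, and σ land close to the generating values (2.0, 0.5, and 1.0).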
Estimating the regression coefficients
Simulated Data of 150 YouTube Book Review Videos
Simulated LM example dataset
yᵢ ~ Normal(μᵢ, σ)

μᵢ = α
review_score ~ 1
Example of LM with intercept-only
Variable          Estimate   95% CI
α (Intercept)     6.51       [6.20, 6.83]
σ (Residual SD)   1.94

Deviance = 563.16, Adjusted R² = 0.000
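As a quick sanity check (with made-up numbers, not the slides' data): in an intercept-only model, the ML estimate of α is just the sample mean of the outcome, and σ is the spread of the outcome around that mean.

```python
import numpy as np

# Illustrative stand-in for a review_score column (NOT the slides' actual data)
review_score = np.array([5.0, 6.0, 7.0, 8.0, 6.5, 6.5])

# With no predictors, the ML estimate of the intercept is the sample mean
alpha_hat = review_score.mean()

# ...and the residual SD is the SD of the scores around that mean
sigma_hat = review_score.std(ddof=1)
```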
yᵢ ~ Normal(μᵢ, σ)

μᵢ = α + βxᵢ
review_score ~ 1 + is_critic
Example of LM with binary predictor
Variable          Estimate   95% CI
α (Intercept)     6.85       [6.46, 7.24]
β (is_critic)     −0.87      [−1.50, −0.24]
σ (Residual SD)   1.90

Deviance = 536.15, Adjusted R² = 0.042
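A worked reading of the estimates above: with a 0/1 predictor, α is the predicted mean when is_critic = 0, and α + β is the predicted mean when is_critic = 1.

```python
# The slides' estimates: non-critic mean = alpha, critic mean = alpha + beta
alpha = 6.85   # intercept (predicted score for is_critic = 0)
beta = -0.87   # change in predicted score for critics

noncritic_mean = alpha
critic_mean = alpha + beta   # critics score about 5.98 on average
```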
yᵢ ~ Normal(μᵢ, σ)

μᵢ = α + βxᵢ
review_score ~ 1 + smile_rate
Example of LM with continuous predictor
Variable          Estimate   95% CI
α (Intercept)     5.40       [4.90, 5.90]
β (smile_rate)    1.15       [0.72, 1.57]
σ (Residual SD)   1.78

Deviance = 471.43, Adjusted R² = 0.157
yᵢ ~ Normal(μᵢ, σ)

μᵢ = α + β₁x₁ᵢ + β₂x₂ᵢ
review_score ~ 1 + smile_rate + is_critic
Example of LM with two predictors
Variable          Estimate   95% CI
α (Intercept)     5.71       [5.13, 6.29]
β₁ (smile_rate)   1.07       [0.65, 1.49]
β₂ (is_critic)    −0.61      [−1.20, −0.01]
σ (Residual SD)   1.77

Deviance = 458.68, Adjusted R² = 0.174
yᵢ ~ Normal(μᵢ, σ)

μᵢ = α + β₁x₁ᵢ + β₂x₂ᵢ + β₃x₁ᵢx₂ᵢ
review_score ~ 1 + smile_rate * is_critic
Example of LM with interaction effect
Variable          Estimate   95% CI
α (Intercept)     5.96       [5.31, 6.61]
β₁ (smile_rate)   0.83       [0.33, 1.34]
β₂ (is_critic)    −1.31      [−2.32, −0.30]
β₃ (Interaction)  0.79       [−0.13, 1.70]
σ (Residual SD)   1.76

Deviance = 449.88, Adjusted R² = 0.185
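A worked reading of the interaction estimates above: the slope of smile_rate now depends on is_critic. A minimal sketch using the table's coefficients:

```python
# Prediction from the interaction model's estimates:
# mu = alpha + b1*smile_rate + b2*is_critic + b3*smile_rate*is_critic
alpha, b1, b2, b3 = 5.96, 0.83, -1.31, 0.79

def predict(smile_rate, is_critic):
    return alpha + b1 * smile_rate + b2 * is_critic + b3 * smile_rate * is_critic

# The slope of smile_rate is b1 for non-critics, but b1 + b3 for critics
noncritic_slope = predict(1, 0) - predict(0, 0)   # 0.83
critic_slope = predict(1, 1) - predict(0, 1)      # 0.83 + 0.79 = 1.62
```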
• Generalized linear modeling (GLM) is an extension of LM
• It allows LM to accommodate non-normally distributed Y variables
• Examples include variables that are: binary, discrete, and bounded

• This is accomplished using GLM families and link functions
• Families model the likelihood function using a specific distribution
• Link functions connect the linear model to the mean of the likelihood
• Link functions also ensure that the mean is the "right" kind of number
What is generalized linear modeling?
yᵢ ~ Family(μᵢ, …)

link(μᵢ) = α + β₁x₁ᵢ + ⋯ + βₚxₚᵢ
Common GLM families and link functions
Family     Uses                                  Supports              Link Function
Normal     Linear data                           ℝ: (−∞, ∞)            Identity: μ
Gamma      Non-negative data (reaction times)    ℝ: (0, ∞)             Inverse or Power: μ⁻¹
Poisson    Count data                            ℤ: 0, 1, 2, …         Log: log(μ)
Binomial   Binary or categorical data            ℤ: {0, 1} or [0, K)   Logit: log(μ / (1 − μ))
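The logit and log links in the table can be sketched directly: each maps a constrained mean (a probability or a positive rate) onto the whole real line, and its inverse maps the linear model's output back onto the "right" kind of number.

```python
import math

# Logit link: maps a probability in (0, 1) onto the real line
def logit(p):
    return math.log(p / (1.0 - p))

# Inverse logit: maps any real number back into (0, 1)
def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))

# Inverse of the log link: maps any real number onto (0, inf)
def inv_log(z):
    return math.exp(z)

p = inv_logit(logit(0.25))   # round-trips back to 0.25
```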
Simulated GLM example dataset
Simulated Data of 150 Romantic Couples' Interactions
yᵢ ~ Normal(μᵢ, σ)

μᵢ = α + βxᵢ
divorced ~ 1 + satisfaction
Example of LM for binary data
Variable           Estimate   95% CI
α (Intercept)      0.98       [0.89, 1.07]
β (satisfaction)   −0.22      [−0.25, −0.19]
σ (Residual SD)    0.304

Deviance = 13.70, Adjusted R² = 0.617
yᵢ ~ Binomial(1, pᵢ)

logit(pᵢ) = α + βxᵢ

divorced ~ 1 + satisfaction
family = binomial(link = "logit")
Example of GLM for binary data
Variable           Estimate   95% CI
α (Intercept)      3.52       [2.45, 4.87]
β (satisfaction)   −1.78      [−2.40, −1.30]

Deviance = 82.04, Pseudo R² = 0.664
Note: Results are in transformed (i.e., logit) units
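Back-transforming the logit-scale estimates above makes them interpretable: exp(β) is the odds ratio per one-unit increase in satisfaction, and the inverse logit turns any fitted value into a probability.

```python
import math

# The slides' estimates on the logit scale
alpha, beta = 3.52, -1.78

# Odds ratio per one-unit increase in satisfaction (well below 1: odds fall)
odds_ratio = math.exp(beta)

# Predicted divorce probability at satisfaction = 0 and satisfaction = 2
p0 = 1.0 / (1.0 + math.exp(-(alpha + beta * 0)))   # very high
p2 = 1.0 / (1.0 + math.exp(-(alpha + beta * 2)))   # near 0.5
```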
yᵢ ~ Normal(μᵢ, σ)

μᵢ = α + βxᵢ
interruptions ~ 1 + satisfaction
Example of LM for count data
Variable           Estimate   95% CI
α (Intercept)      17.11      [15.76, 18.45]
β (satisfaction)   −3.95      [−4.38, −3.52]
σ (Residual SD)    4.61

Deviance = 3143.62, Adjusted R² = 0.688
yᵢ ~ Poisson(λᵢ)

log(λᵢ) = α + βxᵢ

interruptions ~ 1 + satisfaction
family = poisson(link = "log")
Example of GLM for count data
Variable           Estimate   95% CI
α (Intercept)      3.16       [3.08, 3.24]
β (satisfaction)   −0.77      [−0.83, −0.72]

Deviance = 325.24, Pseudo R² = 0.895
Note: Results are in transformed (i.e., log) units
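Back-transforming the log-scale estimates above: exp(α) is the expected count at satisfaction = 0, and exp(β) is the multiplicative rate ratio per one-unit increase in satisfaction.

```python
import math

# The slides' estimates on the log scale
alpha, beta = 3.16, -0.77

# Expected interruptions at satisfaction = 0 (about 23.6)
baseline_rate = math.exp(alpha)

# Each one-unit increase in satisfaction multiplies the rate by about 0.46
rate_ratio = math.exp(beta)
```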
• MLM allows (G)LM to handle data points that are dependent/clustered
• Dependence is problematic because data within a cluster are likely to be similar
• Effects (e.g., intercepts and slopes) may also differ between clusters
What is multilevel modeling?
[Diagram: clustered data — Person 1, Person 2, and Person 3, each with three observations (Obs. 1–3), and each observation containing an X and a Y value]
• One approach would be to estimate effects across all clusters (complete pooling)
• But this would require all clusters to be nearly identical, which they may not be

• Another would be to estimate effects for each cluster separately (no pooling)
• But this would ignore similarities between clusters and may overfit the data

• The MLM approach tries to have the best of both approaches (partial pooling)
• MLM tries to learn the similarities and the differences between clusters
• It estimates effects for each individual cluster separately
• But these effects are assumed to be drawn from a population of clusters
• MLM explicitly models this population and the cluster-specific variation
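Partial pooling can be illustrated with a toy precision-weighted average. This sketch assumes the residual SD and across-cluster SD are known, and all the numbers are made up: each cluster's estimate is pulled toward the grand mean, and clusters with little data are pulled harder.

```python
# Toy partial-pooling (shrinkage) illustration with assumed, known SDs
sigma = 1.0            # residual SD within clusters
sigma_cluster = 0.5    # SD of the population of cluster means
grand_mean = 5.0       # overall (pooled) mean

def partial_pool(cluster_mean, n):
    """Shrink a cluster's sample mean toward the grand mean.

    Weights are precisions: clusters with more data (larger n) keep more
    of their own mean; small clusters are shrunk harder toward the pool.
    """
    w_data = n / sigma ** 2              # precision of the cluster's own data
    w_pop = 1.0 / sigma_cluster ** 2     # precision of the population distribution
    return (w_data * cluster_mean + w_pop * grand_mean) / (w_data + w_pop)

small = partial_pool(7.0, n=2)    # shrinks a lot toward 5.0
large = partial_pool(7.0, n=50)   # stays much closer to 7.0
```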
• MLM comes with several important benefits over single-level models
• Improved estimates for repeat sampling
• Improved estimates for imbalanced sampling
• Estimates of variation across clusters
• Avoid averaging and retain variation
• Better accuracy and inference!

• Many have argued that "multilevel regression deserves to be the default approach"
yᵢ ~ Normal(μᵢ, σ)

μᵢ = α_CLUSTER[i]

α_CLUSTER ~ Normal(α, σ_CLUSTER)

• α captures the overall intercept (i.e., the mean of all clusters' intercepts)
• α_CLUSTER[j] captures each cluster's own intercept, which deviates from the overall intercept
• σ_CLUSTER captures how much variability there is across clusters' intercepts
• We now have an intercept for every cluster and a "population" of intercepts
• The model learns about each cluster's intercept from the population of intercepts

• This pooling within parameters is very helpful for imbalanced sampling
Varying intercepts
Actual MLM example dataset
Real Data of 751 Smiles from 136 Subjects (Girard, Shandar, et al. 2019)
Complete pooling (underfitting)
yᵢ ~ Normal(μᵢ, σ)

μᵢ = α
smile_int ~ 1
Variable          Estimate   95% CI
α (Intercept)     2.14       [2.10, 2.19]
σ (Residual SD)   0.66

Deviance = 1495.80
No pooling (overfitting)
yᵢ ~ Normal(μᵢ, σ)

μᵢ = α_PERSON[i]
smile_int ~ 1 + factor(subject)
Variable                Estimate   95% CI
α[1] (Intercept, P1)    2.14       [1.93, 2.89]
α[2] (Intercept, P2)    2.17       [1.49, 2.85]
…                       …          …
σ (Residual SD)         0.60

Deviance = 1210.10
Partial pooling (ideal fit)
yᵢ ~ Normal(μᵢ, σ)

μᵢ = α_PERSON[i]

α_PERSON ~ Normal(α, σ_PERSON)
smile_int ~ 1 + (1 | subject)
Variable             Estimate   95% CI
α (Intercept)        2.15       [2.09, 2.21]
σ_α (Intercept SD)   0.27
σ (Residual SD)      0.60

Deviance = 1458.81
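A sketch of fitting this random-intercept model with Python's statsmodels (mentioned earlier). The data here are simulated to merely mimic the structure of the smile dataset; the subject count, true intercept, and SDs below are assumptions, not the real data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated clustered data (NOT the real smile dataset): 40 subjects,
# 5 observations each, with each subject drawing their own intercept
# from a "population" of intercepts
rng = np.random.default_rng(1)
n_subj, n_obs = 40, 5
subject = np.repeat(np.arange(n_subj), n_obs)
subj_intercept = rng.normal(2.15, 0.27, size=n_subj)
smile_int = subj_intercept[subject] + rng.normal(0, 0.60, size=n_subj * n_obs)
df = pd.DataFrame({"smile_int": smile_int, "subject": subject})

# Random-intercept model; the lme4 equivalent is smile_int ~ 1 + (1 | subject)
fit = smf.mixedlm("smile_int ~ 1", df, groups=df["subject"]).fit()
grand_mean = fit.params["Intercept"]
```

The fitted grand mean recovers the simulated overall intercept, and the model also reports the across-subject (intercept) variance.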
yᵢ ~ Normal(μᵢ, σ)

μᵢ = α_CLUSTER[i] + β_CLUSTER[i] xᵢ

[α_CLUSTER, β_CLUSTER] ~ MVNormal([α, β], Σ)

Σ = [σ_α  0 ] R [σ_α  0 ]
    [0   σ_β]   [0   σ_β]

• We can pool and learn from the covariance between varying intercepts and slopes
• We do this with a 2D normal distribution with means α and β and covariance matrix Σ
• We build Σ through matrix multiplication of the SDs and the correlation matrix R
• This is more complex but results in even better pooling: within and across parameters
Varying intercepts and slopes
MLM with varying intercepts and slopes
yᵢ ~ Normal(μᵢ, σ)

μᵢ = α_PERSON[i] + β_PERSON[i] xᵢ

[α_PERSON, β_PERSON] ~ MVNormal([α, β], Σ)

Σ = [σ_α  0 ] R [σ_α  0 ]
    [0   σ_β]   [0   σ_β]

smile_int ~ 1 + amused_sr + (1 + amused_sr | subject)
Variable                   Estimate   95% CI
α (Intercept)              1.98       [1.91, 2.06]
β (amused_sr)              0.12       [0.09, 0.15]
σ_α (Intercept SD)         0.24       [0.17, 0.32]
σ_β (Slope SD)             0.04       [0.00, 0.08]
ρ (Intercept-Slope corr)   0.30       [−0.69, 0.96]
σ (Residual SD)            0.57       [0.54, 0.60]

Deviance = 1458.81