Testtw.rpi.edu/media/latest/DataAnalytics2018_group4_modul… · PPT file · Web viewPeter Fox . Data Analytics – ITWS-4600/ITWS-6600/MATP-4450. Group 4 Module 14, April 16, 2018


Peter Fox Data Analytics – ITWS-4600/ITWS-6600/MATP-4450

Group 4 Module 14, April 16, 2018

Some review, then Hierarchical Linear Models, Optimizing, Iterating ctd.

Some review(s) first
• ctrees
  – group2/lab1_ctree2.R
• Kknn for iris
• SVM for iris
• Factor Analysis v. Principal Components
• Remember open lab this Thursday
• Assignment 7 due Friday 20th

Swiss ctree…

Kknn lab - iris

Svm lab - iris

randomForest (iris)

Factor Analysis – 2 (Athletics)

Factor Analysis – 3 (Athletics)

Principal component (Athletics)

Iterating: structure to regression
• Hierarchical … simpler form of mixed model?

Remember: Random effects
In the initial exploration, class is treated as nested within school ("class is 'under' school"). Random effects are specified inside parentheses and can be repeated measures, interaction terms, or nested terms.

library(lme4)  # provides lmer()
lmm.1 <- lmer(extro ~ open + social + class + (1|school/class), data = lmm.data)
summary(lmm.1)

summary(lmm.1)
Linear mixed model fit by REML ['lmerMod']
Formula: extro ~ open + social + class + (1 | school/class)
   Data: lmm.data

REML criterion at convergence: 3521.5

Scaled residuals:
     Min       1Q   Median       3Q      Max
-10.0144  -0.3373   0.0164   0.3378  10.5788

Random effects:
 Groups       Name        Variance Std.Dev.
 class:school (Intercept)  2.8822  1.6977
 school       (Intercept) 95.1725  9.7556
 Residual                  0.9691  0.9844
Number of obs: 1200, groups:  class:school, 24; school, 6

Fixed effects:
             Estimate Std. Error t value
(Intercept) 5.712e+01  4.052e+00  14.098
open        6.053e-03  4.965e-03   1.219
social      5.085e-04  1.853e-03   0.274
classb      2.047e+00  9.835e-01   2.082
classc      3.698e+00  9.835e-01   3.760
classd      5.656e+00  9.835e-01   5.751

Correlation of Fixed Effects:
       (Intr) open   social classb classc
open   -0.049
social -0.046 -0.006
classb -0.121 -0.002  0.005
classc -0.121 -0.001  0.000  0.500
classd -0.121  0.000  0.002  0.500  0.500

Now: Intra Class Correlation
# First, run the 'null' model (which includes just the intercept and the random effect for the highest level of the nesting variables; in this example 'school').
lmm.null <- lmer(extro ~ 1 + (1|school), data = lmm.data)
summary(lmm.null)

summary(lmm.null)
Linear mixed model fit by REML ['lmerMod']
Formula: extro ~ 1 + (1 | school)
   Data: lmm.data

REML criterion at convergence: 5806.1

Scaled residuals:
    Min      1Q  Median      3Q     Max
-5.9773 -0.5315  0.0059  0.5298  6.2109

Random effects:
 Groups   Name        Variance Std.Dev.
 school   (Intercept) 95.87    9.791
 Residual              7.14    2.672
Number of obs: 1200, groups:  school, 6

Fixed effects:
            Estimate Std. Error t value
(Intercept)   60.267      3.998   15.07

Intra Class Correlation (ICC)
# Notice the variance component estimates for the random effect. If we add these together, then divide the 'school' variance estimate by that total, we get the ICC.
95.8720 + 7.1399       # total variance = 103.0119
95.8720 / 103.0119     # ICC

# This indicates that 93.06886% of the variance in 'extro' can be "explained" by school group membership (verified below using Bliese's multilevel package).
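The ICC arithmetic can be written as a small base-R sketch. The two variance components are copied from the null-model summary; the variable names (`var.school`, `var.residual`, `icc`) are illustrative and not part of the original lab.

```r
# Variance components copied from summary(lmm.null).
var.school   <- 95.8720  # between-school (random intercept) variance
var.residual <- 7.1399   # within-school (residual) variance

# ICC = between-group variance / total variance.
icc <- var.school / (var.school + var.residual)
round(icc, 4)
```

With the fitted model available, the same components could probably be read from `as.data.frame(VarCorr(lmm.null))$vcov` instead of being typed in by hand.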

# ICC1 and ICC2 as described by Bliese.

library(multilevel)
aov.1 <- aov(extro ~ school, lmm.data)
summary(aov.1)
              Df Sum Sq Mean Sq F value Pr(>F)
school         5  95908   19182    2687 <2e-16 ***
Residuals   1194   8525       7
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

ICC1 / ICC2 (Bliese)
# Below (ICC1) indicates that 93.07% of the variance in 'extro' can be "explained" by school group membership.
> ICC1(aov.1)
[1] 0.930689

# The ICC2 value (below) of .9996 indicates that school groups can be very reliably differentiated in terms of 'extro' scores.
> ICC2(aov.1)
[1] 0.9996278
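Both values can be approximately reproduced by hand from the ANOVA table using the usual mean-square formulas for ICC(1) and ICC(2). The sketch below assumes a balanced design (1200 participants spread evenly over 6 schools, so k = 200), which the deck does not state explicitly. The variable names are illustrative.

```r
# Values taken from summary(aov.1); the residual mean square is recomputed
# from Sum Sq / Df because the printed table rounds it to 7.
ms.between <- 19182          # Mean Sq for school
ms.within  <- 8525 / 1194    # Sum Sq / Df for Residuals
k          <- 1200 / 6       # participants per school (balanced design assumed)

# ICC(1): share of variance attributable to group membership.
icc1 <- (ms.between - ms.within) / (ms.between + (k - 1) * ms.within)
# ICC(2): reliability of the group means.
icc2 <- (ms.between - ms.within) / ms.between
round(c(ICC1 = icc1, ICC2 = icc2), 4)
```

Small differences from the package output are expected because the inputs here are the rounded values printed in the table.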

Simulating the Posterior Distribution of Predicted Values

# Use the 'sim' function from the 'arm' package. Note: n = 100 is the default for 'sim'.
library(arm)

sim.100 <- sim(lmm.2, n = 100)
# Show the structure of objects in the 'sim' object.
str(sim.100)
<not displayed>


# Fixed effect parameter estimates resulting from the 'sim' function applied to the fitted model (lmm.2).
fe.sim <- fixef(sim.100)
fe.sim
      (Intercept)         open         agree        social     classb    classc   classd
 [1,]    55.24643 0.0113879890 -7.370662e-03  4.115703e-03 1.99092257 2.9418821 3.162604
 [2,]    56.69630 0.0051451242 -1.373704e-02 -1.799054e-03 1.73041539 3.8671053 6.160748
 [3,]    63.18570 0.0003935109  2.607783e-03  1.435752e-03 1.80586410 3.2203590 5.802364
 [4,]    56.00007 0.0042571840 -6.076147e-03 -5.324692e-03 2.71728164 5.6066533 6.852651
 [5,]    59.94718 0.0026340937 -2.584516e-03  3.295548e-07 1.45650055 3.3174045 5.871667
 [6,]    65.26589 0.0100470520 -1.324052e-02 -3.480780e-04 1.79030239 3.3253023 4.050358
 [7,]    56.80116 0.0082074105 -8.175804e-03  1.182413e-03 2.35693946 3.0119753 5.937348
 [8,]    61.32350 0.0047934705 -1.484498e-02 -2.710392e-03 2.11558934 4.2048688 6.552194
 [9,]    53.87001 0.0054213155 -7.160089e-03  8.668833e-04 1.86080451 2.8613245 4.761669
[10,]    57.47641 0.0055136083 -6.293459e-03 -5.253847e-05 3.17600677 6.4525022 6.438270


# Random effect parameter estimates resulting from the 'sim' function applied to the fitted model (lmm.2).
re.sim <- ranef(sim.100)
re.sim[[1]] # For "class:school" random effect.
re.sim[[2]] # For "school" random effect.

re.sim[[1]] # For "class:school" random effect.
, , (Intercept)

            a:I         a:II       a:III        a:IV        a:V        a:VI         b:I         b:II
[1,] -1.8138575  1.009722294 0.502308352 0.574242632 1.62249792  0.34486828  0.41734749 -0.516721008
[2,] -4.5023927  0.325461572 1.105711427 0.555938715 1.49927806 -1.05082790  0.72720272  1.065476210
[3,] -2.9011592  1.699112086 1.924096930 1.588047483 0.08551292 -1.71333314  0.47475579  0.095562455
[4,] -4.7454517 -1.024665550 0.449287566 1.066899463 1.56470696 -1.34450134 -0.47980863  0.964331898
[5,] -4.6413961  0.092845610 0.878011579 0.328065852 0.94227622 -2.48685750  0.13250051  0.336973705

Much more!

re.sim[[2]] # For "school" random effect.

, , (Intercept)

              I          II        III         IV            V         VI
[1,] -10.889610  11.9319979  6.4468727 7.52046579  9.407021912 14.8484638
[2,] -11.811196 -10.1548630 -2.3812528 4.24907315  6.038850618 15.1022442
[3,] -17.642004  -6.5881409  2.6734584 5.09687885  7.313420709  7.6798984
[4,] -12.201235  -6.5415744 -6.2550322 4.62112286 13.050521302 14.7147714
[5,] -16.604904 -10.9215257 -3.2698478 2.47299902  2.276550540 11.8441601

Get predicted values

# To get predicted values from the posterior distribution, use the 'fitted' function.

yhat.lmm.2 <- fitted(sim.100, lmm.2)
head(yhat.lmm.2)
< see output >
tail(yhat.lmm.2)
< see output >

# The above object (yhat.lmm.2) is a matrix of 1200 participants (rows) by 100 simulations (columns).
# In this matrix, each row represents a participant and each column represents a simulated predicted value for the outcome variable of our lmm.2 model.
# Therefore, the yhat.lmm.2 object can be used to create credible intervals for each participant (i.e. individual level).
# Note: as written, the call below pools every value in the matrix; use yhat.lmm.2[1, ] to restrict it to the first participant (i.e. row 1).
# Note: probs = .985 is kept from the original; a symmetric 95% interval would use .975.
> quantile(yhat.lmm.2, probs = c(.025, .985))
    2.5%    98.5%
39.93096 81.29584

# We can also create a data frame with the quantiles for every participant.
quant.mat <- data.frame(matrix(rep(NA, 1200*2), ncol = 2))
names(quant.mat) <- c("2.5%", "98.5%")
quant.mat[,1] <- apply(yhat.lmm.2, 1, quantile, probs = .025)
quant.mat[,2] <- apply(yhat.lmm.2, 1, quantile, probs = .985)
head(quant.mat, 25)

Head of data frame
      2.5%    98.5%
1 47.99122 80.07736
2 66.11761 72.79333
3 76.65614 83.60897
4 46.50965 79.56451
5 48.01904 80.07742
6 47.20663 54.45487
7 49.31807 75.21708
8 48.06083 80.11512
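The per-participant interval step can be tried without fitting any model. The sketch below uses a small made-up matrix (`yhat`, 5 participants by 100 simulations, drawn from a normal distribution) as a stand-in for `yhat.lmm.2`. It uses .975 as the upper probability for a symmetric 95% interval, where the deck uses .985.

```r
set.seed(1)
n.participants <- 5     # hypothetical; the deck has 1200
n.sims         <- 100
# Stand-in for yhat.lmm.2: one row per participant, one column per simulation.
yhat <- matrix(rnorm(n.participants * n.sims, mean = 60, sd = 5),
               nrow = n.participants, ncol = n.sims)

# Interval for the first participant only (row 1).
quantile(yhat[1, ], probs = c(.025, .975))

# Intervals for every participant: apply() returns one column per row of yhat,
# so transpose to get one row per participant.
quant.mat <- as.data.frame(t(apply(yhat, 1, quantile, probs = c(.025, .975))))
names(quant.mat) <- c("2.5%", "97.5%")
quant.mat
```

Passing both probabilities in one `apply()` call avoids pre-allocating the data frame and filling it column by column.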

In R - lcmm
• Estimation of latent class mixed-effect models for different types of outcomes (continuous Gaussian, continuous non-Gaussian or ordinal)
• This function fits mixed models and latent class mixed models for different types of outcomes.
  – continuous longitudinal outcomes (Gaussian or non-Gaussian) as well as bounded quantitative, discrete and ordinal longitudinal outcomes.

What does it do?
• The different types of outcomes are taken into account using parameterized nonlinear link functions between the observed outcome and the underlying latent process of interest
• At the latent process level, the model estimates a standard linear mixed model or a latent class mixed model when heterogeneity in the population is investigated (in the same way as in function hlme -> next), but it should be noted that the program also works when no random effect is included!

What does it do?
• Parameters of the nonlinear link function and of the latent process mixed model are estimated simultaneously using a maximum likelihood method.

lcmm(fixed, mixture, random, subject, classmb, ng = 1, idiag = FALSE, nwg = FALSE, link = "linear", intnodes = NULL, epsY = 0.5, data, B, convB = 1e-04, convL = 1e-04, convG = 1e-04, maxiter=100, nsim=100, prior, range=NULL, na.action=1)
### that's a lot of parameters

Turning to lcmm
# Beta link function
m11<-lcmm(Ydep2~Time+I(Time^2),random=~Time,subject='ID',ng=1,data=data_Jointlcmm,link="beta")
summary(m11)
plot.linkfunction(m11,bty="l")
# I-splines with 3 equidistant nodes
m12<-lcmm(Ydep2~Time+I(Time^2),random=~Time,subject='ID',ng=1,data=data_Jointlcmm,link="3-equi-splines")
summary(m12)
# I-splines with 5 nodes at quantiles
m13<-lcmm(Ydep2~Time+I(Time^2),random=~Time,subject='ID',ng=1,data=data_Jointlcmm,link="5-quant-splines")
summary(m13)
# I-splines with 5 nodes, and interior nodes entered manually
m14<-lcmm(Ydep2~Time+I(Time^2),random=~Time,subject='ID',ng=1,data=data_Jointlcmm,link="5-manual-splines",intnodes=c(10,20,25))
summary(m14)
plot.linkfunction(m14,bty="l")

Turning to lcmm
# Thresholds
# Especially for the threshold link function, we recommend estimating models
# with increasing complexity and using estimates of previous ones to specify
# plausible initial values (recall that estimation of models with a threshold
# link function involves a computationally demanding numerical integration,
# here of size 3)
m15<-lcmm(Ydep2~Time+I(Time^2),random=~Time,subject='ID',ng=1,data=data_Jointlcmm,link="thresholds",maxiter=100,
  B=c(-0.8379, -0.1103, 0.3832, 0.3788, 0.4524, -7.3180, 0.5917, 0.7364, 0.6530, 0.4038,
      0.4290, 0.6099, 0.6014, 0.5354, 0.5029, 0.5463, 0.5310, 0.5352, 0.6498, 0.6653,
      0.5851, 0.6525, 0.6701, 0.6670, 0.6767, 0.7394, 0.7426, 0.7153, 0.7702, 0.6421))
summary(m15)
plot.linkfunction(m15,bty="l")

Turning to lcmm
#### Plot of estimated different link functions:
#### (applicable for models that only differ in the "link function" used.
#### Otherwise, the latent process scale is different and a rescaling is necessary)
transfo <- data.frame(marker=m10$estimlink[,1],linear=m10$estimlink[,2],beta=m11$estimlink[,2],spl_3e=m12$estimlink[,2],spl_5q=m13$estimlink[,2],spl_5m=m14$estimlink[,2])
dev.new()
plot(transfo[,1]~transfo[,2],xlim=c(-10,5),col=1,type='l',xlab="latent process",ylab="marker",bty="l")
par(new=TRUE)
plot(transfo[,1]~transfo[,3],xlim=c(-10,5),col=2,type='l',xlab="",ylab="",bty="l")
par(new=TRUE)
plot(transfo[,1]~transfo[,4],xlim=c(-10,5),col=3,type='l',xlab="",ylab="",bty="l")
par(new=TRUE)
plot(transfo[,1]~transfo[,5],xlim=c(-10,5),col=4,type='l',xlab="",ylab="",bty="l")
par(new=TRUE)
plot(m15$estimlink[,1]~m15$estimlink[,2],xlim=c(-10,5),col=5,type='l',xlab="",ylab="",bty="l")
legend(x="bottomright",legend=c(colnames(transfo[,2:5]),"thresholds"),col=1:5,lty=1,inset=.02,bty="n")

Turning to lcmm
#### Estimation of 2-latent class mixed models with different assumed link
#### functions with individual and class specific linear trend
#### for illustration, only default initial values were used but other
#### sets of initial values should also be tried to ensure convergence
#### towards the global maximum
# Linear link function
m20<-lcmm(Ydep2~Time,random=~Time,subject='ID',mixture=~Time,ng=2,idiag=TRUE,data=data_Jointlcmm,link="linear")
summary(m20)
postprob(m20)
# Beta link function
m21<-lcmm(Ydep2~Time,random=~Time,subject='ID',mixture=~Time,ng=2,idiag=TRUE,data=data_Jointlcmm,link="beta")
summary(m21)
postprob(m21)
# I-splines link function (and 5 nodes at quantiles)
m22<-lcmm(Ydep2~Time,random=~Time,subject='ID',mixture=~Time,ng=2,idiag=TRUE,data=data_Jointlcmm,link="5-quant-splines")
summary(m22)
postprob(m22)

data <- data_Jointlcmm[data_Jointlcmm$ID==193,]
plot.predict(m22,var.time="Time",newdata=data,bty="l")

Turning to multlcmm
library(lcmm)
data(data_Jointlcmm)
# Latent process mixed model for two curvilinear outcomes. Link functions are approximated by I-splines: the first one has 4 nodes (i.e. 2 internal nodes 8, 12), the second one has 3 nodes (i.e. 1 internal node 25)
m1 <- multlcmm(Ydep1+Ydep2~1+Time*X2+contrast(X2),random=~1+Time,
  subject="ID",randomY=TRUE,link=c("4-manual-splines","3-manual-splines"),
  intnodes=c(8,12,25),data=data_Jointlcmm)

Be patient, multlcmm is running ...
The program took 56.14 seconds

Quicker lcmm
# to reduce the computation time, the same model is estimated using
# a vector of initial values
m1 <- multlcmm(Ydep1+Ydep2~1+Time*X2+contrast(X2),random=~1+Time,
  subject="ID",randomY=TRUE,link=c("4-manual-splines","3-manual-splines"),
  intnodes=c(8,12,25),data=data_Jointlcmm,
  B=c(-1.071, -0.192, 0.106, -0.005, -0.193, 1.012, 0.870, 0.881, 0.000, 0.000,
      -7.520, 1.401, 1.607, 1.908, 1.431, 1.082, -7.528, 1.135, 1.454, 2.328, 1.052))

Be patient, multlcmm is running ...
The program took 7.78 seconds

summary(m1)
General latent class mixed model
     fitted by maximum likelihood method

multlcmm(fixed = Ydep1 + Ydep2 ~ 1 + Time * X2 + contrast(X2),
    random = ~1 + Time, subject = "ID", randomY = TRUE,
    link = c("4-manual-splines", "3-manual-splines"),
    intnodes = c(8, 12, 25), data = data_Jointlcmm)

Statistical Model:
     Dataset: data_Jointlcmm
     Number of subjects: 300
     Number of observations: 3356
     Number of latent classes: 1
     Number of parameters: 21
     Link functions: Quadratic I-splines with nodes 0 8 12 17.581 for Ydep1
                     Quadratic I-splines with nodes 0 25 30 for Ydep2

summary(m1)
Iteration process:
     Convergence criteria satisfied
     Number of iterations: 4
     Convergence criteria: parameters= 5.2e-11
                         : likelihood= 2.1e-08
                         : second derivatives= 1.2e-09

Goodness-of-fit statistics:
     maximum log-likelihood: -6977.48
     AIC: 13996.95
     BIC: 14074.73

summary(m1)
Maximum Likelihood Estimates:

Fixed effects in the longitudinal model:
                              coef      Se     Wald p-value
intercept (not estimated)  0.00000
Time                      -1.07056 0.12293 -8.70900 0.00000
X2                        -0.19225 0.16697 -1.15100 0.24957
Time:X2                    0.10627 0.18634  0.57000 0.56847
Contrasts on X2 (p=0.88696)
Ydep1                     -0.00483 0.03399 -0.14215 0.88696
Ydep2*                     0.00483 0.03399  0.14215 0.88696
 *coefficient not estimated but obtained from the others as minus the sum of them

Variance-covariance matrix of the random-effects:
(the variance of the first random effect is not estimated)
          intercept    Time
intercept   1.00000
Time       -0.19338 1.01251

summary(m1) – last bit!
                                       Ydep1   Ydep2
Residual standard error:             0.86955 0.88053
Standard error of the random effect: 0.00000 0.00000

Parameters of the link functions:
                     coef      Se    Wald p-value
Ydep1-I-splines1 -7.51985 0.64412 -11.675   0e+00
Ydep1-I-splines2  1.40067 0.18058   7.756   0e+00
Ydep1-I-splines3  1.60739 0.10324  15.569   0e+00
Ydep1-I-splines4  1.90822 0.07873  24.238   0e+00
Ydep1-I-splines5  1.43117 0.09075  15.770   0e+00
Ydep1-I-splines6  1.08205 0.21198   5.105   0e+00
Ydep2-I-splines1 -7.52861 0.67080 -11.223   0e+00
Ydep2-I-splines2  1.13505 0.25553   4.442   1e-05
Ydep2-I-splines3  1.45345 0.14629   9.935   0e+00
Ydep2-I-splines4  2.32793 0.08636  26.956   0e+00
Ydep2-I-splines5  1.05187 0.05908  17.803   0e+00

plot(m1,which="linkfunction")
# variation percentages explained by linear mixed regression

> VarExpl(m1,data.frame(Time=0))
             class1
%Var-Ydep1 56.94364
%Var-Ydep2 56.32753

summary(m2)
< … >
# posterior classification
postprob(m2)

Posterior classification:
  class1 class2
N 143.00 157.00
%  47.67  52.33

Posterior classification table:
     --> mean of posterior probabilities in each class
        prob1  prob2
class1 1.0000 0.0000
class2 0.0589 0.9411

Posterior probalities above a threshold (%):
         class1 class2
prob>0.7    100  98.09
prob>0.8    100  96.18
prob>0.9    100  85.99

# longitudinal predictions in the outcomes scales for a given profile of covariates

newdata <- data.frame(Time=seq(0,5,length=100),X1=rep(0,100),X2=rep(0,100),X3=rep(0,100))
predGH <- predictY(m2,newdata,var.time="Time",methInteg=0,nsim=20)
head(predGH)

Etc.

In lcmm - hlme
• Fits a latent class linear mixed model (LCLMM), also known as growth mixture model or heterogeneous linear mixed model.
• LCLMM consists in assuming that the population is divided into a finite number of latent classes; each latent class is characterized by a specific mean trajectory which is described by a class-specific linear mixed model.
• Both the latent class membership and the trajectory can be explained according to covariates.
• This model is limited to a Gaussian outcome.
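One common way to write the two parts described above is sketched here. The notation is an assumption chosen for illustration, not taken from the slides: $c_i$ is the latent class of subject $i$, $G$ the number of classes, $X_{ci}$ the class-membership covariates, and $X_{ij}$, $Z_{ij}$ the fixed- and random-effect covariates at occasion $j$. Class membership follows a multinomial logistic model and, given the class, the outcome follows a linear mixed model with class-specific fixed effects.

```latex
% Class-membership model (multinomial logistic; one class is the reference)
P(c_i = g \mid X_{ci}) =
  \frac{\exp(\xi_{0g} + X_{ci}^{\top}\xi_{1g})}
       {\sum_{l=1}^{G} \exp(\xi_{0l} + X_{ci}^{\top}\xi_{1l})}

% Class-specific linear mixed model for the Gaussian outcome
Y_{ij} \mid (c_i = g) = X_{ij}^{\top}\beta_g + Z_{ij}^{\top}u_i + \varepsilon_{ij},
\qquad u_i \mid (c_i = g) \sim N(0, B_g),
\qquad \varepsilon_{ij} \sim N(0, \sigma^2)
```

In terms of the `hlme` arguments, `classmb` would correspond to $X_{ci}$, `mixture` to the covariates whose effects differ by class, and `random` to $Z_{ij}$; this mapping is a reading of the argument names rather than something the slides spell out.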

In R
hlme(fixed, mixture, random, subject, classmb, ng = 1, idiag = FALSE, nwg = FALSE, cor=NULL, data, B, convB=0.0001, convL=0.0001, convG=0.0001, prior, maxiter=500, subset=NULL, na.action=1)

Example
data(data_hlme)
m1<-hlme(Y~Time*X1, random=~Time, subject='ID', ng=1, idiag=TRUE, data=data_hlme)
summary(m1)

Summary hlme
Heterogenous linear mixed model
     fitted by maximum likelihood method

hlme(fixed = Y ~ Time * X1, random = ~Time, subject = "ID", ng = 1,
    idiag = TRUE, data = data_hlme)

Statistical Model:
     Dataset: data_hlme
     Number of subjects: 100
     Number of observations: 326
     Number of latent classes: 1
     Number of parameters: 7

Summary hlme
Iteration process:
     Convergence criteria satisfied
     Number of iterations: 9
     Convergence criteria: parameters= 1.2e-07
                         : likelihood= 1.6e-05
                         : second derivatives= 6.2e-13

Goodness-of-fit statistics:
     maximum log-likelihood: -804.98
     AIC: 1623.95
     BIC: 1642.19

Maximum Likelihood Estimates:

Fixed effects in the longitudinal model:
              coef      Se   Wald p-value
intercept 25.86515 0.79448 32.556 0.00000
Time      -0.33282 0.17547 -1.897 0.05787
X1         1.69698 1.03466  1.640 0.10098
Time:X1   -0.39364 0.22848 -1.723 0.08491

Summary hlme

Variance-covariance matrix of the random-effects:
          intercept     Time
intercept  24.63032
Time        0.00000 1.168762

                              coef         se
Residual standard error: 0.9501876 0.05765784

plot(m1)


Example
m2<-hlme(Y~Time*X1, mixture=~Time, random=~Time, classmb=~X2+X3, subject='ID', ng=2, data=data_hlme,
  B=c(0.11, -0.74, -0.07, 20.71, 29.39, -1, 0.13, 2.45, -0.29, 4.5, 0.36, 0.79, 0.97))
m2
Heterogenous linear mixed model
     fitted by maximum likelihood method

hlme(fixed = Y ~ Time * X1, mixture = ~Time, random = ~Time,
    subject = "ID", classmb = ~X2 + X3, ng = 2, data = data_hlme)

Statistical Model:
     Dataset: data_hlme
     Number of subjects: 100
     Number of observations: 326
     Number of latent classes: 2
     Number of parameters: 13

Iteration process:
     Convergence criteria satisfied
     Number of iterations: 2
     Convergence criteria: parameters= 1.3e-07
                         : likelihood= 4.4e-07
                         : second derivatives= 2.5e-12

Goodness-of-fit statistics:
     maximum log-likelihood: -773.82
     AIC: 1573.64
     BIC: 1607.51
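The goodness-of-fit values of the two hlme fits (one class: log-likelihood -804.98 with 7 parameters; two classes: -773.82 with 13 parameters) can be checked by hand. The sketch below assumes the standard definitions AIC = -2 logL + 2p and BIC = -2 logL + p log(N). Taking N as the number of subjects (100) rather than the number of observations (326) is an inference from the printed numbers, not something the slides state.

```r
# Values copied from the hlme summaries in this deck.
fits <- data.frame(
  model  = c("m1 (ng = 1)", "m2 (ng = 2)"),
  loglik = c(-804.98, -773.82),   # maximum log-likelihood
  npar   = c(7, 13)               # number of parameters
)
n.subjects <- 100                 # "Number of subjects" in both summaries

fits$AIC <- -2 * fits$loglik + 2 * fits$npar
fits$BIC <- -2 * fits$loglik + log(n.subjects) * fits$npar
fits
```

Both criteria are lower for the two-class model, which is the usual argument for preferring it. Differences of about 0.01 from the printed values come from the rounded log-likelihoods.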

Example
summary(m2)
postprob(m2)

Posterior classification:
  class1 class2
N     46     54
%     46     54

Posterior classification table:
     --> mean of posterior probabilities in each class
        prob1  prob2
class1 0.9588 0.0412
class2 0.0325 0.9675

Posterior probalities above a threshold (%):
         class1 class2
prob>0.7  93.48 100.00
prob>0.8  93.48  92.59
prob>0.9  86.96  83.33
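The three blocks of this output can be read as simple summaries of each subject's posterior class probabilities. The base-R sketch below shows one way such a table could be derived. The probabilities are made up for six hypothetical subjects, and it is an assumption that `postprob` computes its table in exactly this way.

```r
# Hypothetical posterior class-membership probabilities for six subjects
# (made-up numbers, not taken from data_hlme).
pp <- data.frame(
  prob1 = c(0.98, 0.91, 0.65, 0.10, 0.03, 0.25),
  prob2 = c(0.02, 0.09, 0.35, 0.90, 0.97, 0.75)
)

# Assign each subject to the class with the highest posterior probability.
pp$class <- max.col(as.matrix(pp[, c("prob1", "prob2")]), ties.method = "first")

# Class sizes (the "N" row of the output).
class.sizes <- table(pp$class)

# Mean posterior probabilities within each assigned class
# (the "posterior classification table").
class.means <- aggregate(cbind(prob1, prob2) ~ class, data = pp, FUN = mean)

# Percentage of subjects in each class whose probability of their own class
# exceeds a threshold (here 0.7).
own <- ifelse(pp$class == 1, pp$prob1, pp$prob2)
pct.above <- tapply(own > 0.7, pp$class, function(x) 100 * mean(x))

class.sizes
class.means
pct.above
```

Diagonal entries of the classification table close to 1 and high percentages above the thresholds indicate that subjects are assigned to their classes with little ambiguity.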


Example
### same model as m2 but initial values specified
m3<-hlme(Y~Time*X1, mixture=~Time, random=~Time, classmb=~X2+X3, subject='ID', ng=2, data=data_hlme,
  B=c(0, 0, 0, 30, 25, 0, -1, 0, 0, 5, 0, 1, 1))
m3

Predicting…
summary(m3)

Etc.

## plot of predicted trajectories using some newdata
newdata<-data.frame(Time=seq(0,5,length=100), X1=rep(0,100), X2=rep(0,100), X3=rep(0,100))
plot.predict(m3,newdata,"Time","right",bty="l")

plot m3


Beyond PCA!
• Kernel PCA
• ICA
  – PCA is not particularly helpful for finding independent clusters
  – ICA idea:
    – Assume non-Gaussian data
    – Find multiple sets of components
    – Minimize correlation between components
  – Blind source separation example:
    – Given: Audio recording with 2 overlapping voices
    – Goal: Separate voices into separate tracks

Beyond PCA!
Probabilistic PCA

Bayesian source separation

Continuous Latent Variables

Reading, etc.
• http://data-informed.com/focus-predictive-analytics/
• Final week – your project presentations ~ Monday, Thursday in two sections (Carnegie 113 and Lally 102) – we cannot run over the class time to complete these – plan accordingly, arrive on time (instructions and initial schedule sent in LMS) – and attendance is essential (no excuses)
• 5 MINUTES – you do not need more
