
Peter Fox
Data Analytics – ITWS-4600/ITWS-6600/MATP-4450
Group 4 Module 14, April 16, 2018
Some review, then Hierarchical Linear Models, Optimizing, Iterating ctd.


Some review(s) first
• ctrees
– group2/lab1_ctree2.R
• Kknn for iris
• SVM for iris
• Factor Analysis v. Principal Components
• Remember open lab this Thursday
• Assignment 7 due Friday 20th


Swiss ctree…


Kknn lab - iris


Svm lab - iris


randomForest (iris)


Factor Analysis – 2 (Athletics)


Factor Analysis – 3 (Athletics)


Principal component (Athletics)


Iterating: structure to regression
• Hierarchical … simpler form of mixed model?


Remember: Random effects
In the initial exploration, class is treated as nested within school (“class is 'under' school”). Random effects are specified inside parentheses in the model formula and can be repeated measures, interaction terms, or nested terms.

library(lme4)  # provides lmer()
lmm.1 <- lmer(extro ~ open + social + class + (1|school/class), data = lmm.data)
summary(lmm.1)
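The nesting term (1|school/class) asks for one random intercept per school and one per class-within-school combination. The sketch below shows, in base R only, how those grouping levels arise. It assumes 6 schools (I to VI) and 4 classes (a to d), as the level labels in the later output suggest; the data frame `toy` and its `student` column are hypothetical and only mimic the design.

```r
# Hypothetical toy design: 6 schools x 4 classes, 50 students per cell (1200 rows).
toy <- expand.grid(student = 1:50,
                   class   = letters[1:4],
                   school  = as.character(as.roman(1:6)))

# The nested grouping factor implied by (1 | school/class): class within school.
toy$class_in_school <- interaction(toy$class, toy$school, sep = ":", drop = TRUE)

n_school <- nlevels(factor(toy$school))
n_nested <- nlevels(toy$class_in_school)
c(schools = n_school, class_in_school = n_nested)
```

Under these assumptions the counts are 6 and 24, which is the same pair of group counts that the model summary reports for school and class:school.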


summary(lmm.1)
Linear mixed model fit by REML ['lmerMod']
Formula: extro ~ open + social + class + (1 | school/class)
   Data: lmm.data

REML criterion at convergence: 3521.5

Scaled residuals:
     Min       1Q   Median       3Q      Max
-10.0144  -0.3373   0.0164   0.3378  10.5788

Random effects:
 Groups       Name        Variance Std.Dev.
 class:school (Intercept)  2.8822  1.6977
 school       (Intercept) 95.1725  9.7556
 Residual                  0.9691  0.9844
Number of obs: 1200, groups:  class:school, 24; school, 6


Fixed effects:
             Estimate Std. Error t value
(Intercept) 5.712e+01  4.052e+00  14.098
open        6.053e-03  4.965e-03   1.219
social      5.085e-04  1.853e-03   0.274
classb      2.047e+00  9.835e-01   2.082
classc      3.698e+00  9.835e-01   3.760
classd      5.656e+00  9.835e-01   5.751

Correlation of Fixed Effects:
       (Intr) open   social classb classc
open   -0.049
social -0.046 -0.006
classb -0.121 -0.002  0.005
classc -0.121 -0.001  0.000  0.500
classd -0.121  0.000  0.002  0.500  0.500


Now: Intra Class Correlation
# First, run the 'null' model, which includes just the intercept and the random effect for the highest level of the nesting variables (in this example, 'school').
lmm.null <- lmer(extro ~ 1 + (1|school), data = lmm.data)
summary(lmm.null)


summary(lmm.null)
Linear mixed model fit by REML ['lmerMod']
Formula: extro ~ 1 + (1 | school)
   Data: lmm.data

REML criterion at convergence: 5806.1

Scaled residuals:
    Min      1Q  Median      3Q     Max
-5.9773 -0.5315  0.0059  0.5298  6.2109

Random effects:
 Groups   Name        Variance Std.Dev.
 school   (Intercept) 95.87    9.791
 Residual              7.14    2.672
Number of obs: 1200, groups:  school, 6

Fixed effects:
            Estimate Std. Error t value
(Intercept)   60.267      3.998   15.07


Intra Class Correlation (ICC)
# Notice the variance component estimates for the random effect. If we add these together, then divide the 'school' variance estimate by that total, we get the ICC.
95.8720 + 7.1399
95.8720 / 103.0119

# This indicates that 93.06886% of the variance in 'extro' can be "explained" by school group membership (verified below using Bliese's multilevel package).
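The same arithmetic as a small base-R sketch, using the two variance components copied from the null-model output. The variable names are illustrative only.

```r
# Variance components copied from the null-model summary on the slide.
var_school   <- 95.8720   # between-school (random intercept) variance
var_residual <- 7.1399    # within-school (residual) variance

# ICC = between-group variance / total variance.
icc <- var_school / (var_school + var_residual)
round(100 * icc, 2)
```

The ICC is the school variance as a share of the total, so it grows as schools differ more from each other relative to the spread within a school.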


# ICC1 and ICC2 as described by Bliese.

library(multilevel)
aov.1 <- aov(extro ~ school, lmm.data)
summary(aov.1)
              Df Sum Sq Mean Sq F value Pr(>F)
school         5  95908   19182    2687 <2e-16 ***
Residuals   1194   8525       7
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


ICC1 / ICC2 (Bliese)
# Below (ICC1) indicates that 93.07% of the variance in 'extro' can be "explained" by school group membership.
> ICC1(aov.1)
[1] 0.930689

# The ICC2 value (below) of .9996 indicates that school groups can be very reliably differentiated in terms of 'extro' scores.
> ICC2(aov.1)
[1] 0.9996278
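Both values can be reproduced by hand from the ANOVA table with the usual one-way ANOVA formulas for ICC(1) and ICC(2). This is a sketch only: it assumes a balanced design with k = 1200 / 6 = 200 students per school, and that these are the formulas behind the multilevel package's ICC1 and ICC2, which the slides do not state.

```r
# Values copied from the ANOVA table on the slide.
ss_between <- 95908; df_between <- 5
ss_within  <- 8525;  df_within  <- 1194
k <- 1200 / 6   # assumed (balanced) group size: 1200 observations in 6 schools

ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within

# ICC(1): share of variance attributable to group membership.
icc1 <- (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
# ICC(2): reliability of the group means.
icc2 <- (ms_between - ms_within) / ms_between
round(c(ICC1 = icc1, ICC2 = icc2), 4)
```

With the table values above this gives about 0.9307 and 0.9996, in line with the package output.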


Simulating the Posterior Distribution of Predicted Values.

# Load the 'arm' package and use the 'sim' function. Note: n = 100 is the default for 'sim'.
library(arm)

sim.100 <- sim(lmm.2, n = 100)
# Show the structure of objects in the 'sim' object.
str(sim.100)
<not displayed>


Simulating the Posterior Distribution of Predicted Values.

# Fixed effect parameter estimates resulting from the 'sim' function applied to the fitted model (lmm.2).
fe.sim <- fixef(sim.100)
fe.sim
      (Intercept)         open         agree        social     classb    classc   classd
 [1,]    55.24643 0.0113879890 -7.370662e-03  4.115703e-03 1.99092257 2.9418821 3.162604
 [2,]    56.69630 0.0051451242 -1.373704e-02 -1.799054e-03 1.73041539 3.8671053 6.160748
 [3,]    63.18570 0.0003935109  2.607783e-03  1.435752e-03 1.80586410 3.2203590 5.802364
 [4,]    56.00007 0.0042571840 -6.076147e-03 -5.324692e-03 2.71728164 5.6066533 6.852651
 [5,]    59.94718 0.0026340937 -2.584516e-03  3.295548e-07 1.45650055 3.3174045 5.871667
 [6,]    65.26589 0.0100470520 -1.324052e-02 -3.480780e-04 1.79030239 3.3253023 4.050358
 [7,]    56.80116 0.0082074105 -8.175804e-03  1.182413e-03 2.35693946 3.0119753 5.937348
 [8,]    61.32350 0.0047934705 -1.484498e-02 -2.710392e-03 2.11558934 4.2048688 6.552194
 [9,]    53.87001 0.0054213155 -7.160089e-03  8.668833e-04 1.86080451 2.8613245 4.761669
[10,]    57.47641 0.0055136083 -6.293459e-03 -5.253847e-05 3.17600677 6.4525022 6.438270
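Simulated draws like these are usually summarised column by column. The sketch below does that in base R for two of the coefficients, using only the ten draws printed on the slide. A real summary would use all 100 draws, so the numbers here are illustrative and the object names are hypothetical.

```r
# First ten simulated draws of two fixed effects, copied from the slide output.
intercept <- c(55.24643, 56.69630, 63.18570, 56.00007, 59.94718,
               65.26589, 56.80116, 61.32350, 53.87001, 57.47641)
classd    <- c(3.162604, 6.160748, 5.802364, 6.852651, 5.871667,
               4.050358, 5.937348, 6.552194, 4.761669, 6.438270)
draws <- cbind(intercept, classd)

# Mean of the draws and a rough 95% interval for each coefficient.
post_mean <- colMeans(draws)
post_int  <- apply(draws, 2, quantile, probs = c(0.025, 0.975))
post_mean
post_int
```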


Simulating the Posterior Distribution of Predicted Values.

# Random effect parameter estimates resulting from the 'sim' function applied to the fitted model (lmm.2).
re.sim <- ranef(sim.100)
re.sim[[1]] # For "class:school" random effect.
re.sim[[2]] # For "school" random effect.


re.sim[[1]] # For "class:school" random effect.
, , (Intercept)

            a:I         a:II       a:III        a:IV        a:V        a:VI         b:I         b:II
[1,] -1.8138575  1.009722294 0.502308352 0.574242632 1.62249792  0.34486828  0.41734749 -0.516721008
[2,] -4.5023927  0.325461572 1.105711427 0.555938715 1.49927806 -1.05082790  0.72720272  1.065476210
[3,] -2.9011592  1.699112086 1.924096930 1.588047483 0.08551292 -1.71333314  0.47475579  0.095562455
[4,] -4.7454517 -1.024665550 0.449287566 1.066899463 1.56470696 -1.34450134 -0.47980863  0.964331898
[5,] -4.6413961  0.092845610 0.878011579 0.328065852 0.94227622 -2.48685750  0.13250051  0.336973705
Much more!


re.sim[[2]] # For "school" random effect.

, , (Intercept)

              I          II        III         IV            V         VI
[1,] -10.889610  11.9319979  6.4468727 7.52046579  9.407021912 14.8484638
[2,] -11.811196 -10.1548630 -2.3812528 4.24907315  6.038850618 15.1022442
[3,] -17.642004  -6.5881409  2.6734584 5.09687885  7.313420709  7.6798984
[4,] -12.201235  -6.5415744 -6.2550322 4.62112286 13.050521302 14.7147714
[5,] -16.604904 -10.9215257 -3.2698478 2.47299902  2.276550540 11.8441601


Get predicted values

# To get predicted values from the posterior distribution, use the 'fitted' function.

yhat.lmm.2 <- fitted(sim.100, lmm.2)
head(yhat.lmm.2)
< see output >
tail(yhat.lmm.2)
< see output >


# The above object (yhat.lmm.2) is a matrix of 1200 participants (rows) by 100 simulations (columns).
# In this matrix, each row represents a participant and each column represents a simulated predicted value for the outcome variable of our lmm.2 model.
# Therefore, the yhat.lmm.2 object can be used to create credible intervals for each participant (i.e. individual level).
# Note: the call below pools all participants and simulations; use yhat.lmm.2[1, ] to restrict it to the first participant (i.e. row 1).
> quantile(yhat.lmm.2, probs = c(.025, .985))
    2.5%    98.5%
39.93096 81.29584


# We can also create a data frame with the quantiles for every participant.
quant.mat <- data.frame(matrix(rep(NA, 1200*2), ncol = 2))
names(quant.mat) <- c("2.5%", "98.5%")
quant.mat[,1] <- apply(yhat.lmm.2, 1, quantile, probs = .025)
quant.mat[,2] <- apply(yhat.lmm.2, 1, quantile, probs = .985)
head(quant.mat, 25)


Head of data frame
      2.5%    98.5%
1 47.99122 80.07736
2 66.11761 72.79333
3 76.65614 83.60897
4 46.50965 79.56451
5 48.01904 80.07742
6 47.20663 54.45487
7 49.31807 75.21708
8 48.06083 80.11512
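The per-participant intervals can also be built in one step. The sketch below is self-contained, so it uses a random matrix `yhat` as a hypothetical stand-in for yhat.lmm.2 (1200 rows, 100 columns). It uses the symmetric cut-offs .025 and .975, whereas the slides use .025 and .985.

```r
set.seed(1)
# Hypothetical stand-in for yhat.lmm.2: 1200 participants (rows) x 100 simulations (columns).
yhat <- matrix(rnorm(1200 * 100, mean = 60, sd = 5), nrow = 1200, ncol = 100)

# apply() over rows returns a 2 x 1200 matrix, so transpose to one row per participant.
ci <- t(apply(yhat, 1, quantile, probs = c(0.025, 0.975)))
colnames(ci) <- c("2.5%", "97.5%")
head(ci)
```

Asking quantile() for both probabilities at once avoids pre-allocating the data frame and filling it column by column.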


In R - lcmm
• Estimation of latent class mixed-effect models for different types of outcomes (continuous Gaussian, continuous non-Gaussian or ordinal)
• This function fits mixed models and latent class mixed models for different types of outcomes.
– continuous longitudinal outcomes (Gaussian or non-Gaussian) as well as bounded quantitative, discrete and ordinal longitudinal outcomes.


What does it do?
• The different types of outcomes are taken into account using parameterized nonlinear link functions between the observed outcome and the underlying latent process of interest.
• At the latent process level, the model estimates a standard linear mixed model, or a latent class mixed model when heterogeneity in the population is investigated (in the same way as in function hlme -> next). Note that the program also works when no random effect is included.


What does it do?
• Parameters of the nonlinear link function and of the latent process mixed model are estimated simultaneously using a maximum likelihood method.

lcmm(fixed, mixture, random, subject, classmb, ng = 1, idiag = FALSE, nwg = FALSE, link = "linear", intnodes = NULL, epsY = 0.5, data, B, convB = 1e-04, convL = 1e-04, convG = 1e-04, maxiter = 100, nsim = 100, prior, range = NULL, na.action = 1)
### that's a lot of parameters


Turning to lcmm
# Beta link function
m11 <- lcmm(Ydep2~Time+I(Time^2), random=~Time, subject='ID', ng=1, data=data_Jointlcmm, link="beta")
summary(m11)
plot.linkfunction(m11, bty="l")
# I-splines with 3 equidistant nodes
m12 <- lcmm(Ydep2~Time+I(Time^2), random=~Time, subject='ID', ng=1, data=data_Jointlcmm, link="3-equi-splines")
summary(m12)
# I-splines with 5 nodes at quantiles
m13 <- lcmm(Ydep2~Time+I(Time^2), random=~Time, subject='ID', ng=1, data=data_Jointlcmm, link="5-quant-splines")
summary(m13)
# I-splines with 5 nodes, and interior nodes entered manually
m14 <- lcmm(Ydep2~Time+I(Time^2), random=~Time, subject='ID', ng=1, data=data_Jointlcmm, link="5-manual-splines", intnodes=c(10,20,25))
summary(m14)
plot.linkfunction(m14, bty="l")


Turning to lcmm
# Thresholds
# Especially for the threshold link function, we recommend estimating models
# with increasing complexity and using estimates of previous ones to specify
# plausible initial values (estimation of models with a threshold
# link function involves a computationally demanding numerical integration,
# here of size 3)
m15 <- lcmm(Ydep2~Time+I(Time^2), random=~Time, subject='ID', ng=1, data=data_Jointlcmm, link="thresholds", maxiter=100,
  B=c(-0.8379, -0.1103, 0.3832, 0.3788, 0.4524, -7.3180, 0.5917, 0.7364, 0.6530, 0.4038,
      0.4290, 0.6099, 0.6014, 0.5354, 0.5029, 0.5463, 0.5310, 0.5352, 0.6498, 0.6653,
      0.5851, 0.6525, 0.6701, 0.6670, 0.6767, 0.7394, 0.7426, 0.7153, 0.7702, 0.6421))
summary(m15)
plot.linkfunction(m15, bty="l")


Turning to lcmm
#### Plot of estimated different link functions:
#### (applicable for models that only differ in the "link function" used.
#### Otherwise, the latent process scale is different and a rescaling is necessary)
transfo <- data.frame(marker=m10$estimlink[,1], linear=m10$estimlink[,2], beta=m11$estimlink[,2], spl_3e=m12$estimlink[,2], spl_5q=m13$estimlink[,2], spl_5m=m14$estimlink[,2])
dev.new()
plot(transfo[,1]~transfo[,2], xlim=c(-10,5), col=1, type='l', xlab="latent process", ylab="marker", bty="l")
par(new=TRUE)
plot(transfo[,1]~transfo[,3], xlim=c(-10,5), col=2, type='l', xlab="", ylab="", bty="l")
par(new=TRUE)
plot(transfo[,1]~transfo[,4], xlim=c(-10,5), col=3, type='l', xlab="", ylab="", bty="l")
par(new=TRUE)
plot(transfo[,1]~transfo[,5], xlim=c(-10,5), col=4, type='l', xlab="", ylab="", bty="l")
par(new=TRUE)
plot(m15$estimlink[,1]~m15$estimlink[,2], xlim=c(-10,5), col=5, type='l', xlab="", ylab="", bty="l")
legend(x="bottomright", legend=c(colnames(transfo[,2:5]),"thresholds"), col=1:5, lty=1, inset=.02, bty="n")


Turning to lcmm
#### Estimation of 2-latent class mixed models with different assumed link
#### functions with individual and class specific linear trend
#### for illustration, only default initial values were used but other
#### sets of initial values should also be tried to ensure convergence
#### towards the global maximum
# Linear link function
m20 <- lcmm(Ydep2~Time, random=~Time, subject='ID', mixture=~Time, ng=2, idiag=TRUE, data=data_Jointlcmm, link="linear")
summary(m20)
postprob(m20)
# Beta link function
m21 <- lcmm(Ydep2~Time, random=~Time, subject='ID', mixture=~Time, ng=2, idiag=TRUE, data=data_Jointlcmm, link="beta")
summary(m21)
postprob(m21)
# I-splines link function (and 5 nodes at quantiles)
m22 <- lcmm(Ydep2~Time, random=~Time, subject='ID', mixture=~Time, ng=2, idiag=TRUE, data=data_Jointlcmm, link="5-quant-splines")
summary(m22)
postprob(m22)

data <- data_Jointlcmm[data_Jointlcmm$ID==193,]
plot.predict(m22, var.time="Time", newdata=data, bty="l")


Turning to multlcmm
library(lcmm)
data(data_Jointlcmm)
# Latent process mixed model for two curvilinear outcomes. Link functions are approximated by I-splines: the first one has 4 nodes (i.e. 2 internal nodes, 8 and 12), the second one has 3 nodes (i.e. 1 internal node, 25)
m1 <- multlcmm(Ydep1+Ydep2~1+Time*X2+contrast(X2), random=~1+Time, subject="ID", randomY=TRUE, link=c("4-manual-splines","3-manual-splines"), intnodes=c(8,12,25), data=data_Jointlcmm)

Be patient, multlcmm is running ...
The program took 56.14 seconds


Quicker lcmm
# to reduce the computation time, the same model is estimated using
# a vector of initial values
m1 <- multlcmm(Ydep1+Ydep2~1+Time*X2+contrast(X2), random=~1+Time, subject="ID", randomY=TRUE, link=c("4-manual-splines","3-manual-splines"), intnodes=c(8,12,25), data=data_Jointlcmm,
  B=c(-1.071, -0.192, 0.106, -0.005, -0.193, 1.012, 0.870, 0.881, 0.000, 0.000, -7.520,
      1.401, 1.607, 1.908, 1.431, 1.082, -7.528, 1.135, 1.454, 2.328, 1.052))

Be patient, multlcmm is running ...
The program took 7.78 seconds


summary(m1)
General latent class mixed model
     fitted by maximum likelihood method

multlcmm(fixed = Ydep1 + Ydep2 ~ 1 + Time * X2 + contrast(X2), random = ~1 + Time, subject = "ID", randomY = TRUE, link = c("4-manual-splines", "3-manual-splines"), intnodes = c(8, 12, 25), data = data_Jointlcmm)

Statistical Model:
     Dataset: data_Jointlcmm
     Number of subjects: 300
     Number of observations: 3356
     Number of latent classes: 1
     Number of parameters: 21
     Link functions: Quadratic I-splines with nodes 0 8 12 17.581 for Ydep1
                     Quadratic I-splines with nodes 0 25 30 for Ydep2


summary(m1)
Iteration process:
     Convergence criteria satisfied
     Number of iterations: 4
     Convergence criteria: parameters= 5.2e-11
                         : likelihood= 2.1e-08
                         : second derivatives= 1.2e-09

Goodness-of-fit statistics:
     maximum log-likelihood: -6977.48
     AIC: 13996.95
     BIC: 14074.73


summary(m1)
Maximum Likelihood Estimates:

Fixed effects in the longitudinal model:
                              coef      Se     Wald p-value
intercept (not estimated)  0.00000
Time                      -1.07056 0.12293 -8.70900 0.00000
X2                        -0.19225 0.16697 -1.15100 0.24957
Time:X2                    0.10627 0.18634  0.57000 0.56847
Contrasts on X2 (p=0.88696)
Ydep1                     -0.00483 0.03399 -0.14215 0.88696
Ydep2*                     0.00483 0.03399  0.14215 0.88696
 *coefficient not estimated but obtained from the others as minus the sum of them

Variance-covariance matrix of the random-effects:
(the variance of the first random effect is not estimated)
          intercept    Time
intercept   1.00000
Time       -0.19338 1.01251


summary(m1) – last bit!
                                       Ydep1   Ydep2
Residual standard error:             0.86955 0.88053
Standard error of the random effect: 0.00000 0.00000

Parameters of the link functions:
                     coef      Se    Wald p-value
Ydep1-I-splines1 -7.51985 0.64412 -11.675   0e+00
Ydep1-I-splines2  1.40067 0.18058   7.756   0e+00
Ydep1-I-splines3  1.60739 0.10324  15.569   0e+00
Ydep1-I-splines4  1.90822 0.07873  24.238   0e+00
Ydep1-I-splines5  1.43117 0.09075  15.770   0e+00
Ydep1-I-splines6  1.08205 0.21198   5.105   0e+00
Ydep2-I-splines1 -7.52861 0.67080 -11.223   0e+00
Ydep2-I-splines2  1.13505 0.25553   4.442   1e-05
Ydep2-I-splines3  1.45345 0.14629   9.935   0e+00
Ydep2-I-splines4  2.32793 0.08636  26.956   0e+00
Ydep2-I-splines5  1.05187 0.05908  17.803   0e+00


plot(m1,which="linkfunction")
# variation percentages explained by linear mixed regression

> VarExpl(m1,data.frame(Time=0))
             class1
%Var-Ydep1 56.94364
%Var-Ydep2 56.32753
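These percentages are consistent with a simple variance ratio at Time = 0: the random-intercept variance divided by the sum of that variance, the outcome-specific random-effect variance, and the squared residual standard error. The sketch below checks that arithmetic with values copied from the summary output. It is an inference from the printed numbers, not a description of how VarExpl is implemented, and the variable names are hypothetical.

```r
# Values copied from the summary(m1) output on the slides.
var_random_intercept <- 1.00000                    # variance of the first random effect (fixed to 1)
resid_se     <- c(Ydep1 = 0.86955, Ydep2 = 0.88053)  # residual standard errors
random_y_sd  <- c(Ydep1 = 0.00000, Ydep2 = 0.00000)  # outcome-specific random effect SDs

# Assumed decomposition at Time = 0: latent-process variance over total variance.
pct_explained <- 100 * var_random_intercept /
  (var_random_intercept + random_y_sd^2 + resid_se^2)
round(pct_explained, 3)
```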


summary(m2)
< … >
# posterior classification
postprob(m2)

Posterior classification:
  class1 class2
N 143.00 157.00
%  47.67  52.33

Posterior classification table:
     --> mean of posterior probabilities in each class
        prob1  prob2
class1 1.0000 0.0000
class2 0.0589 0.9411

Posterior probalities above a threshold (%):
         class1 class2
prob>0.7    100  98.09
prob>0.8    100  96.18
prob>0.9    100  85.99


# longitudinal predictions in the outcomes scales for a given profile of covariates

newdata <- data.frame(Time=seq(0,5,length=100), X1=rep(0,100), X2=rep(0,100), X3=rep(0,100))
predGH <- predictY(m2, newdata, var.time="Time", methInteg=0, nsim=20)
head(predGH)

Etc.


In lcmm - hlme
• Fits a latent class linear mixed model (LCLMM), also known as a growth mixture model or heterogeneous linear mixed model.
• An LCLMM assumes that the population is divided into a finite number of latent classes; each latent class is characterized by a specific mean trajectory, which is described by a class-specific linear mixed model.
• Both the latent class membership and the trajectory can be explained by covariates.
• This model is limited to a Gaussian outcome.
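To make the latent-class idea concrete, the sketch below simulates data of the kind such a model targets: each subject belongs to one of two unobserved classes, each class has its own mean trajectory, and subjects vary around it through a random intercept. All numbers (class proportions, intercepts, slopes, variances) are hypothetical and chosen only for illustration. It uses base R and does not call hlme.

```r
set.seed(42)
n_subj <- 100
times  <- 0:4

# Latent class membership (unobserved in practice); hypothetical class proportions.
latent_class <- sample(1:2, n_subj, replace = TRUE, prob = c(0.45, 0.55))

# Hypothetical class-specific mean trajectories: intercept and slope per class.
beta0 <- c(20, 30)
beta1 <- c(-1, 0.5)

# Subject-level random intercepts plus Gaussian measurement error.
u0  <- rnorm(n_subj, sd = 2)
sim <- expand.grid(Time = times, ID = seq_len(n_subj))
sim$class <- latent_class[sim$ID]
sim$Y <- beta0[sim$class] + u0[sim$ID] + beta1[sim$class] * sim$Time +
  rnorm(nrow(sim), sd = 1)

# Mean observed trajectory per latent class and time point.
aggregate(Y ~ class + Time, data = sim, FUN = mean)
```

A latent class mixed model works in the opposite direction: given only ID, Time and Y, it estimates the class-specific trajectories and each subject's probability of belonging to each class.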


In R
hlme(fixed, mixture, random, subject, classmb, ng = 1, idiag = FALSE, nwg = FALSE, cor = NULL, data, B, convB = 0.0001, convL = 0.0001, convG = 0.0001, prior, maxiter = 500, subset = NULL, na.action = 1)


Example
data(data_hlme)
m1 <- hlme(Y~Time*X1, random=~Time, subject='ID', ng=1, idiag=TRUE, data=data_hlme)
summary(m1)


Summary hlme
Heterogenous linear mixed model
     fitted by maximum likelihood method

hlme(fixed = Y ~ Time * X1, random = ~Time, subject = "ID", ng = 1, idiag = TRUE, data = data_hlme)

Statistical Model:
     Dataset: data_hlme
     Number of subjects: 100
     Number of observations: 326
     Number of latent classes: 1
     Number of parameters: 7


Summary hlme
Iteration process:
     Convergence criteria satisfied
     Number of iterations: 9
     Convergence criteria: parameters= 1.2e-07
                         : likelihood= 1.6e-05
                         : second derivatives= 6.2e-13

Goodness-of-fit statistics:
     maximum log-likelihood: -804.98
     AIC: 1623.95
     BIC: 1642.19

Maximum Likelihood Estimates:

Fixed effects in the longitudinal model:
              coef      Se   Wald p-value
intercept 25.86515 0.79448 32.556 0.00000
Time      -0.33282 0.17547 -1.897 0.05787
X1         1.69698 1.03466  1.640 0.10098
Time:X1   -0.39364 0.22848 -1.723 0.08491
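The Wald and p-value columns follow from the coefficient and its standard error. The sketch below reproduces the Time row in base R, assuming a two-sided test against the standard normal distribution; that assumption matches the printed values.

```r
# Coefficient and standard error for Time, copied from the table above.
coef_time <- -0.33282
se_time   <-  0.17547

wald    <- coef_time / se_time      # Wald statistic
p_value <- 2 * pnorm(-abs(wald))    # two-sided normal p-value
round(c(Wald = wald, p = p_value), 5)
```

This gives roughly -1.897 and 0.058, so the Time effect is borderline at the conventional 5% level.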


Summary hlme

Variance-covariance matrix of the random-effects:
          intercept     Time
intercept  24.63032
Time        0.00000 1.168762

                              coef         se
Residual standard error: 0.9501876 0.05765784


plot(m1)


Example
m2 <- hlme(Y~Time*X1, mixture=~Time, random=~Time, classmb=~X2+X3, subject='ID', ng=2, data=data_hlme, B=c(0.11, -0.74, -0.07, 20.71, 29.39, -1, 0.13, 2.45, -0.29, 4.5, 0.36, 0.79, 0.97))
m2
Heterogenous linear mixed model
     fitted by maximum likelihood method

hlme(fixed = Y ~ Time * X1, mixture = ~Time, random = ~Time, subject = "ID", classmb = ~X2 + X3, ng = 2, data = data_hlme)

Statistical Model:
     Dataset: data_hlme
     Number of subjects: 100
     Number of observations: 326
     Number of latent classes: 2
     Number of parameters: 13

Iteration process:
     Convergence criteria satisfied
     Number of iterations: 2
     Convergence criteria: parameters= 1.3e-07
                         : likelihood= 4.4e-07
                         : second derivatives= 2.5e-12

Goodness-of-fit statistics:
     maximum log-likelihood: -773.82
     AIC: 1573.64
     BIC: 1607.51
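The goodness-of-fit lines can be reproduced from the log-likelihood and the parameter count, which makes the one-class versus two-class comparison explicit. The sketch below assumes the BIC penalty uses the log of the number of subjects (100) rather than the number of observations; that assumption reproduces the reported values to rounding. The helper `info_criteria` is hypothetical and not part of lcmm.

```r
# Hypothetical helper: AIC and BIC from a log-likelihood and a parameter count.
info_criteria <- function(loglik, n_par, n_subjects) {
  c(AIC = -2 * loglik + 2 * n_par,
    BIC = -2 * loglik + n_par * log(n_subjects))
}

# Log-likelihoods and parameter counts copied from the hlme summaries.
m1_ic <- info_criteria(loglik = -804.98, n_par = 7,  n_subjects = 100)  # one class
m2_ic <- info_criteria(loglik = -773.82, n_par = 13, n_subjects = 100)  # two classes
round(rbind(m1 = m1_ic, m2 = m2_ic), 2)
```

Both criteria are lower for the two-class model, so on these measures it is preferred despite its six extra parameters.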


Example
summary(m2)
postprob(m2)

Posterior classification:
  class1 class2
N     46     54
%     46     54

Posterior classification table:
     --> mean of posterior probabilities in each class
        prob1  prob2
class1 0.9588 0.0412
class2 0.0325 0.9675

Posterior probalities above a threshold (%):
         class1 class2
prob>0.7  93.48 100.00
prob>0.8  93.48  92.59
prob>0.9  86.96  83.33
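The three tables in this output can be read as simple summaries of a subjects-by-classes matrix of posterior probabilities. The sketch below rebuilds the same kind of summaries in base R for five made-up subjects. The matrix `pp` is hypothetical, and this illustrates the idea rather than the postprob implementation.

```r
# Hypothetical posterior class-membership probabilities for five subjects.
pp <- matrix(c(0.95, 0.05,
               0.20, 0.80,
               0.65, 0.35,
               0.02, 0.98,
               0.85, 0.15),
             ncol = 2, byrow = TRUE,
             dimnames = list(NULL, c("prob1", "prob2")))

# Assign each subject to the class with the highest posterior probability.
assigned <- max.col(pp, ties.method = "first")

# Mean posterior probabilities within each assigned class.
class_table <- rowsum(pp, assigned) / as.vector(table(assigned))

# Percentage of subjects in each class whose maximum probability exceeds 0.7.
above_07 <- tapply(apply(pp, 1, max) > 0.7, assigned, mean) * 100
class_table
above_07
```

Diagonal entries of the classification table close to 1 indicate well-separated classes.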


Example
### same model as m2 but initial values specified
m3 <- hlme(Y~Time*X1, mixture=~Time, random=~Time, classmb=~X2+X3, subject='ID', ng=2, data=data_hlme, B=c(0, 0, 0, 30, 25, 0, -1, 0, 0, 5, 0, 1, 1))
m3


Predicting…
summary(m3)

Etc.

## plot of predicted trajectories using some newdata
newdata <- data.frame(Time=seq(0,5,length=100), X1=rep(0,100), X2=rep(0,100), X3=rep(0,100))
plot.predict(m3, newdata, "Time", "right", bty="l")


plot m3


Beyond PCA!
• Kernel PCA
• ICA
– PCA is not particularly helpful for finding independent clusters
– ICA idea:
– Assume non-Gaussian data
– Find multiple sets of components
– Minimize statistical dependence (not just correlation) between components
– Blind source separation example:
– Given: Audio recording w/ 2 overlapping voices
– Goal: Separate voices into separate tracks
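A small base-R illustration of the starting point for ICA: two independent, non-Gaussian sources are blended by a mixing matrix, and PCA is applied to the mixtures. PCA removes the correlation between the recorded signals, but uncorrelated scores are not necessarily the original sources, which is the gap ICA aims to close. Base R has no ICA routine, so this sketch stops at the PCA step. The sources and the mixing matrix `A` are hypothetical.

```r
set.seed(7)
n <- 2000
# Two independent, non-Gaussian sources (uniform and a sign-flipped exponential).
s <- cbind(runif(n, -1, 1), rexp(n) * sample(c(-1, 1), n, replace = TRUE))

# Hypothetical mixing matrix: each "microphone" records a blend of both sources.
A <- matrix(c(1.0, 0.6,
              0.4, 1.0), nrow = 2, byrow = TRUE)
x <- s %*% t(A)

cor_mixed <- cor(x)[1, 2]                      # recorded signals are correlated
pc        <- prcomp(x, center = TRUE, scale. = FALSE)
cor_pcs   <- cor(pc$x)[1, 2]                   # PCA scores are uncorrelated (up to rounding)
c(mixed = cor_mixed, pcs = cor_pcs)
```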


Beyond PCA!
Probabilistic PCA

Bayesian source separation

Continuous Latent Variables


Reading, etc.
• http://data-informed.com/focus-predictive-analytics/
• Final week – your project presentations ~ Monday, Thursday in two sections (Carnegie 113 and Lally 102) – we cannot run over the class time to complete these – plan accordingly, arrive on time (instructions and initial schedule sent in LMS) – and attendance is essential (no excuses)
• 5 MINUTES – you do not need more