Discriminant Analysis
Albert Satorra
Multivariate Analysis UPF, Tardor del 2015
Albert Satorra ( Multivariate Analysis UPF, Tardor del 2015 ) AD/E-GRAU Fall 2015 1 / 27
Table of contents
1 Separation among groups
2 Example of Grape Brandies: 4 variables, 3 groups
3 Manova
4 Factorial Discriminant Analysis
5 Example of discriminant analysis
Separation among groups
Figure : Single variable: group differences
Figure : Two or more variable: group differences
Figure : Principal directions for discrimination
Example of Grape Brandies: 4 variables, 3 groups
Figure : Example
Data from Cooper and Weeks (1983)
Cooper & Weeks (1983), Table 12.8: Amounts of Flavour Compounds in Grape Brandies
Source: Extract from Schreier P. and Reiner L., Characterisation of grape brandies, Journal of the Science of Food and Agriculture, 30, 1979
Group = 1 German grape brandies;
Group = 2 French cognacs;
Group = 3 French grape brandies
A = ethyl butanoate, B = ethyl octanoate, C = ethyl 2-furanoate, D = ethyl myristate
A B C D Group
1692 4968 29 139 1
3244 6710 31 85 1
2551 6895 41 121 1
2363 7164 28 100 1
1762 6734 14 58 1
1376 5241 16 80 1
739 3087 20 61 1
1323 4418 3 60 ? ### <------ group of origin unknown
1002 13270 77 210 2
1038 11245 83 154 2
623 12338 93 122 2
903 11987 112 146 2
1068 11583 87 103 2
810 11691 85 92 2
1994 7569 55 133 2
604 13614 119 131 ? ### <------ group of origin unknown
1828 9769 26 60 3
822 9283 13 139 3
962 6368 18 88 3
1708 10896 25 71 3
1247 8040 21 76 3
1450 6760 10 121 3
1085 8110 19 77 3
1300 8461 19 90 ? ### <------ group of origin unknown
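# Option 1: paste the 24 rows above into the console
# data = matrix(scan(), 24, 5, byrow = T)
# Option 2: read them from a file (path on the author's machine)
data = read.table("G:/Albert/A_A_A_Web/AnalisiMultivariant/A_Datasets/manova.dat", header=T)
da = data[-c(8,16,24), ]   # drop the three brandies of unknown group
colnames(da) = c('A','B','C','D','Gr')
da = as.data.frame(da)
attach(da)
ng = aggregate(Gr,list(Gr),length)   # group sizes
range = min(Gr):max(Gr)              # group labels
G = length(range)                    # number of groups
n = sum(ng[,2])                      # total sample size
p = 4                                # number of variables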
Manova
Group differences: Anova, Manova
meang =aggregate(da[,1:p],list(Gr),mean)
## group means
# Group.1 A B C D
#1 1 1961.000 5828.429 25.57143 92.00000
#2 2 1062.571 11383.286 84.57143 137.14286
#3 3 1300.286 8460.857 18.85714 90.28571
omean = apply(da[,1:p],2,mean)
## overall mean
# A B C D
# 1441.2857 8557.5238 43.0000 106.4762
cmeang = meang[,1+ (1:p)] - matrix(1,G,1)%*%matrix(omean,1,p)
cmeang = as.matrix(sqrt(ng[,2])*cmeang)
### Sum of Squares Between
SSB = t(cmeang)%*%cmeang
# A B C D
# A 3033859.1 -17324133.3 -149782.00 -117981.714
# B -17324133.3 108095649.2 1171583.00 894100.762
# C -149782.0 1171583.0 18303.71 13426.286
# D -117981.7 894100.8 13426.29 9884.952
### Sum of Squares Within
SSW =matrix(0,p,p)
for (i in range){ S = (ng[i,2]-1)*cov(da[ Gr == i, 1:p ]) ; SSW = SSW + S }
# A B C D
#A 6117113.14 3627147 3503.00000 19991.85714
#B 3627147.14 48547772 191295.00000 94045.00000
#C 3503.00 191295 2512.28571 -94.28571
#D 19991.86 94045 -94.28571 19536.28571
### Sum of Squares Total
SST = (n-1)*cov(da[,1:p])
> SST
A B C D
A 9150972.29 -13696986.1 -146279 -97989.86
B -13696986.14 156643421.2 1362878 988145.76
C -146279.00 1362878.0 20816 13332.00
D -97989.86 988145.8 13332 29421.24
Manova, Wilks’ Lambda
### note that SSW + SSB reproduces SST:
SSW + SSB
A B C D
A 9150972.29 -13696986.1 -146279 -97989.86
B -13696986.14 156643421.2 1362878 988145.76
C -146279.00 1362878.0 20816 13332.00
D -97989.86 988145.8 13332 29421.24
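The three matrices computed in the R code can be written compactly as follows. This is a standard restatement, not taken from the slides; the symbols $\bar{x}_g$ (mean vector of group $g$), $\bar{x}$ (overall mean vector), $n_g$ (size of group $g$) and $S_g$ (sample covariance matrix of group $g$) are notation introduced here for illustration.

```latex
\begin{aligned}
SSB &= \sum_{g=1}^{G} n_g\,(\bar{x}_g - \bar{x})(\bar{x}_g - \bar{x})^{\top},\\
SSW &= \sum_{g=1}^{G} (n_g - 1)\,S_g
     = \sum_{g=1}^{G}\sum_{i \in g} (x_i - \bar{x}_g)(x_i - \bar{x}_g)^{\top},\\
SST &= \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^{\top} = SSW + SSB .
\end{aligned}
```

The identity in the last line holds because the cross terms $\sum_{i \in g}(x_i - \bar{x}_g)(\bar{x}_g - \bar{x})^{\top}$ vanish within each group.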
Difference among groups, Wilks’ Lambda:
LW = det(SSW) / det(SST)
η² = 1 − LW
Fisher's eta squared is
eta2 = 0.9585027
and the p-value of the associated F test is
1-pf(F,m1,m2) = 1.866419e-08
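The slides do not show how F, m1 and m2 were obtained. A minimal sketch, assuming the usual F transform of Wilks' Lambda for three groups (two hypothesis degrees of freedom), is given below. The names `Fstat` and `k` are introduced here for illustration; `eta2`, `n`, `G` and `p` take the values reported on the slides. Under this assumption the p-value should land close to the one quoted above.

```r
# Values reported on the slides
n <- 21; G <- 3; p <- 4
eta2 <- 0.9585027
LW <- 1 - eta2                        # Wilks' Lambda

# Assumed F transform for G = 3 groups (not shown in the source)
k     <- n - G - p + 1                # error df minus p, plus 1
Fstat <- (1 - sqrt(LW)) / sqrt(LW) * k / p
m1    <- 2 * p                        # numerator degrees of freedom
m2    <- 2 * k                        # denominator degrees of freedom

# Upper-tail probability of the F distribution
pval <- pf(Fstat, m1, m2, lower.tail = FALSE)
print(c(F = Fstat, m1 = m1, m2 = m2, p.value = pval))
```

Using `lower.tail = FALSE` rather than `1 - pf(...)` avoids loss of precision when the p-value is very small.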
Figure : Manova and discriminant analysis
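Anova and Manova
# Gr enters here as a numeric variable, hence Df = 1 in the output below;
# factor(Gr) would compare the three groups (Df = 2)
ma=manova(cbind(V1,V2,V3,V4) ~ Gr );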
ANOVA:
summary.aov(ma)
Response V1 :
Df Sum Sq Mean Sq F value Pr(>F)
Gr 1 1527902 1527902 3.8082 0.0659 .
Residuals 19 7623070 401214
Response V2 :
Df Sum Sq Mean Sq F value Pr(>F)
Gr 1 24253881 24253881 3.4808 0.0776 .
Residuals 19 132389541 6967871
Response V3 :
Df Sum Sq Mean Sq F value Pr(>F)
Gr 1 157.8 157.79 0.1451 0.7075
Residuals 19 20658.2 1087.27
Response V4 :
Df Sum Sq Mean Sq F value Pr(>F)
Gr 1 10.3 10.29 0.0066 0.9359
Residuals 19 29411.0 1547.94
MANOVA:
summary(ma) ;
Df Pillai approx F num Df den Df Pr(>F)
Gr 1 0.61892 6.4964 4 16 0.002647 **
Residuals 19
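In the output above the group variable has a single degree of freedom, which indicates that it was treated as numeric. A minimal sketch of the same analysis with the group declared as a factor is given below, assuming the 21 brandies of known origin from the data slide. The data frame name `brandy` and the use of `test = "Wilks"` are choices made here for illustration, not the author's code. With three groups the hypothesis has 2 degrees of freedom, and the Wilks statistic can be compared with 1 − eta2 from the earlier slide.

```r
# The 21 brandies of known origin (Cooper & Weeks, 1983, Table 12.8)
brandy <- read.table(header = TRUE, text = "
A B C D Gr
1692 4968 29 139 1
3244 6710 31 85 1
2551 6895 41 121 1
2363 7164 28 100 1
1762 6734 14 58 1
1376 5241 16 80 1
739 3087 20 61 1
1002 13270 77 210 2
1038 11245 83 154 2
623 12338 93 122 2
903 11987 112 146 2
1068 11583 87 103 2
810 11691 85 92 2
1994 7569 55 133 2
1828 9769 26 60 3
822 9283 13 139 3
962 6368 18 88 3
1708 10896 25 71 3
1247 8040 21 76 3
1450 6760 10 121 3
1085 8110 19 77 3
")
brandy$Gr <- factor(brandy$Gr)        # group as a 3-level factor

fit <- manova(cbind(A, B, C, D) ~ Gr, data = brandy)

# Multivariate test based on Wilks' Lambda
wilks <- summary(fit, test = "Wilks")
print(wilks)

# Univariate ANOVA for each response
print(summary.aov(fit))
```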
Factorial Discriminant Analysis
Figure : Factorial Discriminant Analysis
Canonical Discriminant Analysis
Figure :
Discriminant functions
pg=rep(1/3,3)
Disp = SSW/(n-G) # dispersion matrix
Disp
A B C D
A 339839.6190 201508.175 194.611111 1110.658730
B 201508.1746 2697098.444 10627.500000 5224.722222
C 194.6111 10627.500 139.571429 -5.238095
D 1110.6587 5224.722 -5.238095 1085.349206
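CFUN = rbind()   # start empty; one row per group is appended below
for (i in 1:G)
{ mg = as.matrix(meang[i,2:(1+p)])   # mean vector of group i (1 x p)
B1 = mg%*%solve(Disp)   # linear coefficients
a1 = -.5*mg%*%solve(Disp)%*%t(mg) + log(pg[i])   # constant term, including the log prior
BA = cbind(B1,a1)
CFUN = rbind(CFUN , BA)
}
CFUN = t(CFUN )   # columns = groups; rows 1..p = coefficients, row p+1 = constant
CFUN ### classification functions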
gdesconegut = data[c(8,16,24),1:p]
gdesconegut
A B C D
8 1323 4418 3 60
16 604 13614 119 131
24 1300 8461 19 90
Classification
as.matrix(gdesconegut[1,])%*%CFUN[-5,] + CFUN[5,]
1 2 3
8 2.868149 -21.35584 2.407006
classified into group 1
as.matrix(gdesconegut[2,])%*%CFUN[-5,] + CFUN[5,]
1 2 3
16 26.06156 57.45375 22.68623
classified into group 2
as.matrix(gdesconegut[3,])%*%CFUN[-5,] + CFUN[5,]
1 2 3
24 11.72611 -2.064508 15.96609
classified into group 3
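The scores above come from one linear classification function per group. Read off the R code, and written in notation introduced here for illustration ($\bar{x}_g$ for the mean vector of group $g$, $S_W = SSW/(n-G)$ for the matrix called Disp, $\pi_g$ for the prior probability pg), the function is:

```latex
d_g(x) \;=\; \bar{x}_g^{\top} S_W^{-1}\, x \;-\; \tfrac{1}{2}\,\bar{x}_g^{\top} S_W^{-1}\,\bar{x}_g \;+\; \log \pi_g ,
\qquad
\hat{g}(x) \;=\; \arg\max_{g}\; d_g(x).
```

The first term corresponds to the rows of CFUN holding the coefficients, the remaining two terms to its last row, and each observation is assigned to the group with the largest score.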
########## lda function from library(MASS)
Linear Discriminant Analysis using R
lda(Gr ~ A + B + C + D)
## lda(Gr ~ A + B + C + D, prior = c(1,1,1)/3, subset = train)
Prior probabilities of groups:
1 2 3
0.3333333 0.3333333 0.3333333
Group means:
A B C D
1 1961.000 5828.429 25.57143 92.00000
2 1062.571 11383.286 84.57143 137.14286
3 1300.286 8460.857 18.85714 90.28571
### factorial discriminant analysis
Coefficients of linear discriminants:
LD1 LD2
A 3.088050e-04 -0.0010990240
B 6.440719e-05 0.0006804682
C -8.528876e-02 -0.0517612426
D -8.552957e-03 -0.0015027848
Proportion of trace:
LD1 LD2
0.8364 0.1636
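The fitted lda object can also classify the three brandies of unknown origin through predict(). The sketch below is a self-contained illustration, not the author's code: the data frame names `brandy`, `train` and `unknown` are introduced here, and reading "?" as a missing group label is an assumption about how the file is coded. With equal priors, one would expect the predicted classes to agree with the classification functions shown earlier.

```r
library(MASS)   # provides lda()

# All 24 brandies; "?" marks the three of unknown origin
brandy <- read.table(header = TRUE, na.strings = "?", text = "
A B C D Gr
1692 4968 29 139 1
3244 6710 31 85 1
2551 6895 41 121 1
2363 7164 28 100 1
1762 6734 14 58 1
1376 5241 16 80 1
739 3087 20 61 1
1323 4418 3 60 ?
1002 13270 77 210 2
1038 11245 83 154 2
623 12338 93 122 2
903 11987 112 146 2
1068 11583 87 103 2
810 11691 85 92 2
1994 7569 55 133 2
604 13614 119 131 ?
1828 9769 26 60 3
822 9283 13 139 3
962 6368 18 88 3
1708 10896 25 71 3
1247 8040 21 76 3
1450 6760 10 121 3
1085 8110 19 77 3
1300 8461 19 90 ?
")

train   <- brandy[!is.na(brandy$Gr), ]    # 21 labelled brandies
unknown <- brandy[is.na(brandy$Gr), ]     # rows 8, 16 and 24
train$Gr <- factor(train$Gr)

# Linear discriminant analysis with equal prior probabilities
fit <- lda(Gr ~ A + B + C + D, data = train, prior = c(1, 1, 1) / 3)

# Predicted group and posterior probabilities for the unknown brandies
pred <- predict(fit, newdata = unknown)
print(pred$class)
print(round(pred$posterior, 3))
```

The posterior probabilities give a sense of how clear-cut each assignment is, which the raw classification scores do not show directly.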
Example of discriminant analysis
Example
Source: idre UCLA, discriminant analysis
A large international air carrier has collected data on employees in three different job classifications: 1) customer service personnel, 2) mechanics and 3) dispatchers. The director of Human Resources wants to know if these three job classifications appeal to different personality types. Each employee is administered a battery of psychological tests which include measures of interest in outdoor activity, sociability and conservativeness.
ANOVA
library(foreign)   # provides read.dta
data = read.dta("http://www.ats.ucla.edu/stat/stata/dae/discrim.dta"); attach(data)
ma= manova(cbind(outdoor,social,conservative) ~ job);
summary.aov(ma);
Response outdoor :
Df Sum Sq Mean Sq F value Pr(>F)
job 2 1609.8 804.90 47.516 < 2.2e-16 ***
Residuals 241 4082.5 16.94
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Response social :
Df Sum Sq Mean Sq F value Pr(>F)
job 2 2889.1 1444.56 79.01 < 2.2e-16 ***
Residuals 241 4406.3 18.28
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Response conservative :
Df Sum Sq Mean Sq F value Pr(>F)
job 2 691.76 345.88 31.066 9.921e-13 ***
Residuals 241 2683.26 11.13
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
MANOVA
summary(ma)
Df Pillai approx F num Df den Df Pr(>F)
job 2 0.76207 49.248 6 480 < 2.2e-16 ***
Residuals 241
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Factorial Discriminant Analysis
library(MASS)   # provides lda
soutdoor = scale(outdoor)   # standardize the three predictors
ssocial = scale(social)
sconservative = scale(conservative)
ld = lda(job ~ soutdoor + ssocial + sconservative)
LD = cbind(soutdoor, ssocial, sconservative)%*%ld$scaling   # discriminant scores
mi = min(LD); mx = max(LD)   # common axis limits (named mx so the manova object ma is not overwritten)
plot(LD, type = 'n', xlim=c(mi,mx), ylim=c(mi,mx))
text(LD[job=="customer service",], 'serv', cex=0.6, col=2)
text(LD[job=="mechanic",], 'mech', cex=0.6, col=3)
text(LD[job=="dispatch",], 'disp', cex=0.6, col=4)
abline(h=0, lty=3, lwd=0.8)
abline(v=0, lty=3, lwd=0.8)
## dev.copy2pdf(file="/AlbertNou/A_A_A_Web/AnalisiMultivariant/curs2006/discrim1.pdf")
Factorial Discriminant Analysis: plot of training set
Figure : scores of the training set on LD1 (horizontal axis) and LD2 (vertical axis), both axes running from −4 to 4; each employee is plotted with the label serv, mech or disp (caption on the slide: p.d.f. of the Normal)
Factorial Discriminant Analysis
COR=cor(LD, cbind(soutdoor, ssocial, sconservative) )
colnames(COR)= c("outdoor", "social", "conservative")
b=COR[2,]/COR[1,]
for (i in 1:length(b)) {abline(c(0,b[i]), col=1, lty = 3, lwd=2) }
expan =2
for (i in 1:length(b)) {
text(expan*COR[1,i], expan*COR[2,i], colnames(COR)[i], col=1, cex=1.8)
arrows(0,0,expan*COR[1,i], expan*COR[2,i], length=.3, col=1)}
#legend(-4,3, names(var), lty=1: 6, col = 2:6, cex=0.4)
## dev.copy2pdf(file="/AlbertNou/A_A_A_Web/AnalisiMultivariant/curs2006/discrim2.pdf")
Factorial Discriminant Analysis: plot of training set
Figure : the same plot of LD1 (horizontal axis) against LD2 (vertical axis), both axes running from −4 to 4, with the employees labelled serv, mech or disp and with arrows added for the variables outdoor, social and conservative (caption on the slide: p.d.f. of the Normal)