8/14/2019 SPSS Discriminant Function Analysis.pdf
SPSS Discriminant Function Analysis
By Hui Bian
Office for Faculty Excellence
Discriminant Function Analysis
What is discriminant function analysis?
It builds a predictive model for group membership.
The model is composed of a discriminant function based on linear combinations of predictor variables.
Those predictor variables provide the best discrimination between groups.
Purpose of Discriminant analysis
to maximally separate the groups.
to determine the most parsimonious way to separate groups
to discard variables that are only weakly related to group distinctions
Summary: we are interested in the relationship between a group of independent variables and one categorical variable. We would like to know how many dimensions we would need to express this relationship. Using this relationship, we can predict a classification based on the independent variables or assess how well the independent variables separate the categories in the classification.
It is similar to regression analysis
A discriminant score can be calculated based on the weighted combination of the independent variables:
$D_i = a + b_1 x_1 + b_2 x_2 + \cdots + b_n x_n$
$D_i$ is the predicted score (discriminant score) for case $i$, and $a$ is the constant.
$x$ is a predictor and $b$ is a discriminant coefficient.
A maximum likelihood technique is used to assign each case to a group based on a specified cut-off score.
If the group sizes are equal, the cut-off is the mean of the group mean scores.
If the group sizes are not equal, the cut-off is calculated from weighted means.
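The scoring and cut-off rule described above can be sketched in a few lines of Python. This is only an illustration: the intercept, coefficients, case values, and centroids are hypothetical, and only the equal-group-size cut-off (the midpoint of the two group mean scores) is shown, because the slides do not spell out the weighting used for unequal groups.

```python
def discriminant_score(intercept, coefs, x):
    """Return D = a + b_1*x_1 + ... + b_n*x_n for one case."""
    return intercept + sum(b * xi for b, xi in zip(coefs, x))


def equal_size_cutoff(mean_score_a, mean_score_b):
    """Cut-off for two equally sized groups: midpoint of the group mean scores."""
    return (mean_score_a + mean_score_b) / 2.0


# Hypothetical model and case (not taken from the SPSS example).
a = 0.5
b = [1.0, -2.0]
case = [3.0, 1.0]

score = discriminant_score(a, b, case)
# Hypothetical group mean scores: group A at -1.0, group B at 1.0.
cutoff = equal_size_cutoff(-1.0, 1.0)
# Group B has the higher mean score, so scores above the cut-off go to B.
group = "B" if score > cutoff else "A"
print(score, cutoff, group)
```

In practice SPSS performs this assignment itself; the sketch only makes the arithmetic visible.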
Grouping variables
Categorical variables
Can have more than two values
The codes for the grouping variables must be integers
Independent variables
Continuous
Nominal variables must be recoded to dummy variables
Discriminant function
A latent variable formed as a linear combination of independent variables
One discriminant function for 2-group discriminant analysis
For higher-order discriminant analysis, the number of discriminant functions is equal to g-1 (g is the number of categories of the dependent/grouping variable), or to the number of predictors if that is smaller.
The first function maximizes the differences between the values of the dependent variable.
The second function maximizes the differences between the values of the dependent variable while controlling for the first function.
And so on.
The first function will be the most powerful differentiating dimension.
The second and later functions may also represent additional significant dimensions of differentiation.
Assumptions (from SPSS 19.0 help)
Cases should be independent.
Predictor variables should have a multivariate normal distribution, and within-group variance-covariance matrices should be equal across groups.
Group membership is assumed to be mutually exclusive.
The procedure is most effective when group membership is a truly categorical variable; if group membership is based on values of a continuous variable (for example, high IQ versus low IQ), consider using linear regression to take advantage of the richer information that is offered by the continuous variable itself.
Assumptions (similar to those for linear regression)
Linearity, normality, absence of multicollinearity, equal variances
Predictor variables should have a multivariate normal distribution.
Discriminant analysis is fairly robust to violations of most of these assumptions, but it is highly sensitive to outliers.
Model specification
Test of significance
For two groups, the null hypothesis is that the means of the two groups on the discriminant function (the centroids) are equal.
Centroids are the mean discriminant scores for each group.
Wilks' lambda is used to test for significant differences between groups.
Wilks' lambda is between 0 and 1. It tells us the proportion of variance in the dependent variable that is not explained by the discriminant function.
Wilks' lambda is also used to test for significant differences between the groups on the individual predictor variables.
It tells which variables contribute a significant amount of prediction to help separate the groups.
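For a single predictor, Wilks' lambda reduces to the ratio of the within-groups sum of squares to the total sum of squares, which is why values near 0 indicate strong separation and values near 1 indicate none. A minimal sketch of that univariate case, using made-up data (the function name and the values are illustrative assumptions, not SPSS output):

```python
def wilks_lambda_univariate(groups):
    """Univariate Wilks' lambda: within-groups SS divided by total SS."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)

    ss_within = 0.0
    for g in groups:
        group_mean = sum(g) / len(g)
        ss_within += sum((v - group_mean) ** 2 for v in g)

    return ss_within / ss_total


# Hypothetical predictor values for two groups.
low = [1.0, 2.0, 3.0]
high = [7.0, 8.0, 9.0]

lam_separated = wilks_lambda_univariate([low, high])  # well-separated groups
lam_identical = wilks_lambda_univariate([low, low])   # identical groups
print(lam_separated, lam_identical)
```

With well-separated groups almost all of the variation lies between groups, so lambda is small; with identical groups all of the variation is within groups, so lambda is 1.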
Two groups, using an example from the SPSS manual
Example: the purpose of this example is to identify characteristics that are indicative of people who are likely to default on loans, and use those characteristics to identify good and bad credit risks.
The sample includes a total of 850 cases (old and new/future customers). The first 700 cases are customers who were previously given loans.
Use the first 700 customers to create a discriminant analysis model, setting the remaining 150 customers aside to validate the analysis.
Then use the model to classify the 150 prospective customers as good or bad credit risks.
Grouping variable: Default
Predictors: employ, address, debtinc, and creddebt
Obtain the discriminant function analysis:
Analyze > Classify > Discriminant
Click Classify to get this window
Click Save to get this window
SPSS Output: descriptive statistics
SPSS output: ANOVA table
In the ANOVA table, the smaller the Wilks' lambda, the more important the independent variable is to the discriminant function. Wilks' lambda is significant by the F test for all independent variables.
SPSS Output (correlation matrix)
The within-groups correlation matrix shows the correlations between the predictors.
SPSS output: test of homogeneity of covariance matrices
The larger the log determinant in the table, the more that group's covariance matrix differs. The "Rank" column indicates the number of independent variables in this case. Since discriminant analysis assumes homogeneity of covariance matrices between groups, we would like to see the determinants be relatively equal.
SPSS output: test of homogeneity of covariance matrices (Box's M)
1. Box's M test tests the assumption of homogeneity of covariance matrices. This test is very sensitive to departures from multivariate normality.
2. Discriminant function analysis is robust even when the homogeneity of variances assumption is not met, provided the data do not contain important outliers.
3. For our data, we conclude that the groups do differ in their covariance matrices, violating an assumption of DA.
4. When n is large, small deviations from homogeneity will be found significant, which is why Box's M must be interpreted in conjunction with inspection of the log determinants.
SPSS output: summary of canonical discriminant functions (eigenvalues)
1. The larger the eigenvalue, the more of the variance in the dependent variable is explained by that function.
2. Because the dependent variable has two categories, there is only one discriminant function.
3. The canonical correlation is the measure of association between the discriminant function and the dependent variable.
4. The square of the canonical correlation coefficient is the proportion of variance explained in the dependent variable.
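The eigenvalue, the canonical correlation, and Wilks' lambda are tied together by standard identities: for a function with eigenvalue λ, the canonical correlation is sqrt(λ / (1 + λ)), and Wilks' lambda for a set of functions is the product of 1 / (1 + λ) over those functions. The sketch below checks these identities on a hypothetical eigenvalue; the value 0.25 is illustrative and is not the eigenvalue from the SPSS output.

```python
import math


def canonical_correlation(eigenvalue):
    """Canonical correlation of one discriminant function from its eigenvalue."""
    return math.sqrt(eigenvalue / (1.0 + eigenvalue))


def wilks_lambda_from_eigenvalues(eigenvalues):
    """Wilks' lambda for a set of functions: product of 1 / (1 + eigenvalue)."""
    lam = 1.0
    for ev in eigenvalues:
        lam *= 1.0 / (1.0 + ev)
    return lam


eigenvalue = 0.25  # hypothetical value for a single discriminant function
r_c = canonical_correlation(eigenvalue)
wilks = wilks_lambda_from_eigenvalues([eigenvalue])

# With one function, explained (r_c squared) and unexplained (lambda) sum to 1.
print(r_c ** 2, wilks)
```

This is why, in the two-group case, the squared canonical correlation and Wilks' lambda carry the same information.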
SPSS output: summary of canonical discriminant functions
When there are two groups, the canonical correlation is the most useful measure in the table, and it is equivalent to Pearson's correlation between the discriminant scores and the groups.
Wilks' lambda is a measure of how well each function separates cases into groups. Smaller values of Wilks' lambda indicate greater discriminatory ability of the function.
The associated chi-square statistic tests the hypothesis that the means of the functions listed are equal across groups. The small significance value indicates that the discriminant function does better than chance at separating the groups.
SPSS output: standardized canonical discriminant function coefficients
The standardized discriminant function coefficients in the table serve the same purpose as beta weights in multiple regression (partial coefficients): they indicate the relative importance of the independent variables in predicting the dependent variable. They allow you to compare variables measured on different scales. Coefficients with large absolute values correspond to variables with greater discriminating ability.
SPSS output: structure matrix
1. The structure matrix table shows the correlations of each variable with each discriminant function.
2. There is only one discriminant function in this study.
3. The correlations then serve like factor loadings in factor analysis -- that is, by identifying the largest absolute correlations associated with each discriminant function, the researcher gains insight into how to name each function.
SPSS output: structure matrix (continued)
1. The discriminant function is a latent variable that is created as a linear combination of independent variables.
2. Discriminating variables are independent variables.
3. The table shows the Pearson correlations between predictors and standardized canonical discriminant functions.
4. Variables with loadings below .30 may be removed from the model.
SPSS output: unstandardized canonical discriminant function coefficients
This table contains the unstandardized discriminant function coefficients. These would be used like unstandardized b (regression) coefficients in multiple regression -- that is, they are used to construct the actual prediction equation, which can be used to classify new cases.
Discriminant function: our model should look like this:
$D_i = 0.058 - 0.12\,\text{employ} - 0.037\,\text{address} + 0.075\,\text{debtinc} + 0.312\,\text{creddebt}$
SPSS output: functions at group centroids
Centroids are the mean discriminant scores for each group. This table is used to establish the cutting point for classifying cases. If the two groups are of equal size, the best cutting point is halfway between the values of the functions at group centroids (that is, the average). If the groups are unequal, the optimal cutting point is the weighted average of the two values. The computer does the classification automatically, so these values are for informational purposes.
The centroids are calculated based on the function:
$D_i = 0.058 - 0.12\,\text{employ} - 0.037\,\text{address} + 0.075\,\text{debtinc} + 0.312\,\text{creddebt}$
A centroid is the discriminant score obtained for a group when that group's variable means (rather than the individual values for each case) are entered into the function.
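As a rough illustration of how a centroid is obtained, the sketch below plugs group means into the unstandardized function. Two things are assumptions here: the negative signs on employ and address (the operators between those terms were lost in the slide text and are taken to be minus signs), and the group means, which are hypothetical placeholders rather than the values in the SPSS descriptive statistics table.

```python
# Unstandardized coefficients from the slide; the minus signs are an assumption.
INTERCEPT = 0.058
COEFS = {"employ": -0.12, "address": -0.037, "debtinc": 0.075, "creddebt": 0.312}


def discriminant_score(values):
    """Evaluate the discriminant function for a dict of predictor values."""
    return INTERCEPT + sum(COEFS[name] * values[name] for name in COEFS)


# Hypothetical group means (placeholders, not the actual SPSS output).
means_no_default = {"employ": 9.5, "address": 9.0, "debtinc": 8.7, "creddebt": 1.2}
means_default = {"employ": 5.2, "address": 6.4, "debtinc": 14.7, "creddebt": 2.4}

centroid_no_default = discriminant_score(means_no_default)
centroid_default = discriminant_score(means_default)
print(centroid_no_default, centroid_default)
```

Under these assumptions the defaulting group has the higher centroid, consistent with higher debt ratios and shorter employment pushing the score upward.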
SPSS Output: Classification Statistics
Prior probabilities are used in classification. Here the observed group sizes in the sample are used to determine the prior probabilities of membership in the groups formed by the dependent variable, which is advisable when the group sizes differ. If each group is the same size, you could instead specify equal prior probabilities for all groups.
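The two ways of setting priors amount to simple proportions. A small sketch, with hypothetical group counts (the function names and counts are illustrative assumptions):

```python
def priors_from_group_sizes(counts):
    """Prior probability of each group = its share of the sample."""
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def equal_priors(groups):
    """Equal prior probability for every group."""
    return {group: 1.0 / len(groups) for group in groups}


# Hypothetical group counts (not the counts from the SPSS example).
counts = {"no default": 300, "default": 100}

size_based = priors_from_group_sizes(counts)
equal = equal_priors(list(counts))
print(size_based, equal)
```

When the groups are the same size, the two choices give identical priors.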
SPSS Output: Classification Statistics
Two sets (one for each dependent group) of unstandardized linear discriminant coefficients are calculated, which can be used to classify cases. This is the classical method of classification, though now little used.
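The classical method works by evaluating one linear classification function per group and assigning the case to the group whose function gives the highest score. The sketch below shows that rule; the constants and coefficients are hypothetical and are not the values in the SPSS classification function coefficients table.

```python
def classification_scores(functions, case):
    """Evaluate each group's linear classification function for one case."""
    scores = {}
    for group, f in functions.items():
        scores[group] = f["constant"] + sum(
            coef * case[name] for name, coef in f["coefs"].items()
        )
    return scores


# Hypothetical classification functions, one per group.
functions = {
    "no default": {"constant": -2.0, "coefs": {"debtinc": 0.10, "employ": 0.20}},
    "default": {"constant": -3.0, "coefs": {"debtinc": 0.25, "employ": 0.05}},
}
case = {"debtinc": 20.0, "employ": 2.0}  # hypothetical customer

scores = classification_scores(functions, case)
predicted = max(scores, key=scores.get)  # group with the highest score
print(scores, predicted)
```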
SPSS Output: Classification Statistics
This table is used to assess how well the discriminant function works, and whether it works equally well for each group of the dependent variable. Here it correctly classifies more than 75% of the cases, making about the same proportion of mistakes for both categories. Overall, 76.0% of the cases are correctly classified.
SPSS Output: separate-group plots
If the two distributions overlap too much, the function does not discriminate well between the groups (a poor discriminant function).
Run DA
We get this classification table
Sensitivity = 45.4%; specificity = 94.2%
The table from the previous slide shows how accurately the customers were classified into these groups.
Sensitivity: a highly sensitive test produces few false negative results (Type II errors).
Specificity: a highly specific test produces few false positive results (Type I errors).
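Sensitivity and specificity follow directly from the four cells of a two-group classification table. The counts in this sketch are hypothetical (the slide reports only the resulting percentages), and "positive" is taken to mean the group of interest, for example customers who default.

```python
def sensitivity(true_pos, false_neg):
    """Share of actual positives that were classified as positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of actual negatives that were classified as negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical classification counts (not the counts behind the slide).
tp, fn = 45, 55   # actual positives: correctly / incorrectly classified
tn, fp = 90, 10   # actual negatives: correctly / incorrectly classified

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
accuracy = (tp + tn) / (tp + fn + tn + fp)
print(sens, spec, accuracy)
```

A model can have high overall accuracy while still missing many positives, which is why the two rates are reported separately.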
Discriminant methods
Enter: all independent variables are entered into the equation at once.
Stepwise: independent variables that are not significant are removed.
Discriminant Function Analysis (more than two groups)
Example from the SPSS manual.
A telecommunications provider has segmented its customer base by service usage patterns, categorizing the customers into four groups. If demographic data can be used to predict group membership, you can customize offers for individual prospective customers.
Variables in the analysis
Dependent variable: custcat (four categories)
Independent variables: demographics
Obtain the discriminant function analysis:
Analyze > Classify > Discriminant
When you have a lot of predictors, the stepwise method can be useful by automatically selecting the "best" variables to use in the model.
Click Method
Method
Wilks' lambda. A variable selection method for stepwise discriminant analysis that chooses variables for entry into the equation on the basis of how much they lower Wilks' lambda. At each step, the variable that minimizes the overall Wilks' lambda is entered.
Unexplained variance. At each step, the variable that minimizes the sum of the unexplained variation between groups is entered.
Mahalanobis distance. A measure of how much a case's values on the independent variables differ from the average of all cases. A large Mahalanobis distance identifies a case as having extreme values on one or more of the independent variables.
Method (continued)
Smallest F ratio. A method of variable selection in stepwise analysis based on maximizing an F ratio computed from the Mahalanobis distance between groups.
Rao's V. A measure of the differences between group means. Also called the Lawley-Hotelling trace. At each step, the variable that maximizes the increase in Rao's V is entered. After selecting this option, enter the minimum value a variable must have to enter the analysis.
Use F value. A variable is entered into the model if its F value is greater than the Entry value and is removed if the F value is less than the Removal value. Entry must be greater than Removal, and both values must be positive. To enter more variables into the model, lower the Entry value. To remove more variables from the model, increase the Removal value.
Use probability of F. A variable is entered into the model if the significance level of its F value is less than the Entry value and is removed if the significance level is greater than the Removal value. Entry must be less than Removal, and both values must be positive. To enter more variables into the model, increase the Entry value. To remove more variables from the model, lower the Removal value.
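The "Use F value" rule can be written as a small decision function. This is a sketch of the rule as described, not SPSS's implementation: the function name is hypothetical, 3.84 is the entry default quoted in these slides, and the removal value of 2.71 is an assumption made for the example.

```python
def stepwise_decision(f_value, in_model, f_entry=3.84, f_removal=2.71):
    """Decide what happens to one variable under the 'Use F value' criteria.

    f_removal=2.71 is an assumed value for illustration.
    """
    if f_removal <= 0 or f_entry <= f_removal:
        raise ValueError("Entry must exceed Removal, and both must be positive")
    if not in_model:
        # Candidate variable: enters only if its F to Enter exceeds Entry.
        return "enter" if f_value > f_entry else "stay out"
    # Variable already in the model: removed if its F to Remove drops below Removal.
    return "remove" if f_value < f_removal else "keep"


print(stepwise_decision(5.2, in_model=False))  # candidate with a large F
print(stepwise_decision(1.9, in_model=True))   # weak variable already entered
print(stepwise_decision(3.0, in_model=True))   # between Removal and Entry
```

Keeping Entry above Removal leaves a band in which a variable already in the model is retained but a new candidate would not be admitted, which prevents a variable from cycling in and out.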
SPSS output
The stepwise method starts with a model that doesn't include any of the predictors (step 0).
At each step, the predictor with the largest F to Enter value that exceeds the entry criterion (by default, 3.84) is added to the model.
SPSS output: variables in the analysis
Tolerance is the proportion of a variable's variance not accounted for by other independent variables in the equation. A variable with very low tolerance contributes little information to a model and can cause computational problems. In other words, tolerance is a measure of multicollinearity.
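Tolerance is 1 minus the R-squared obtained when a predictor is regressed on the other predictors. With only two predictors, that R-squared is just their squared Pearson correlation, which makes for a compact illustration. The data below are made up, and the correlation is computed over all cases pooled together for simplicity, which may differ from the within-groups calculation SPSS uses.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def tolerance_two_predictors(x1, x2):
    """Tolerance of either predictor when the model has exactly two predictors."""
    return 1.0 - pearson_r(x1, x2) ** 2


# Hypothetical predictor values.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]

tol = tolerance_two_predictors(x1, x2)
print(tol)
```

A tolerance near 0 means the predictor is almost a linear function of the others; a tolerance near 1 means it carries mostly independent information.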
SPSS output: variables in the analysis
F to Remove values are useful for describing what happens if a variable is removed from the current model (given that the other variables remain).
SPSS output: Summary of Canonical Discriminant Functions
Nearly all of the variance explained by the model is due to the first two discriminant functions. We can ignore the third function. For each set of functions, this tests the hypothesis that the means of the functions listed are equal across groups. The test of function 3 has a p value of .34, so this function contributes little to the model.
SPSS output
The structure matrix table shows the correlations of each variable with each discriminant function. The correlations serve like factor loadings in factor analysis -- that is, by identifying the largest absolute correlations associated with each discriminant function, the researcher gains insight into how to name each function.
Three discriminant functions
Function 1: -2.84 + .88ed + .01em + .18re
Function 2: -1.86 + .12ed + .10em + .17re
Function 3: -.82 - .22ed - .01em + .66re
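To see how the three functions are used together, the sketch below scores one customer on each of them. The coefficients are copied from the slide; the predictor keys ed, em, and re are kept as abbreviated there, and the customer's values are hypothetical.

```python
# (constant, coefficients) for each function, as listed on the slide.
FUNCTIONS = [
    (-2.84, {"ed": 0.88, "em": 0.01, "re": 0.18}),
    (-1.86, {"ed": 0.12, "em": 0.10, "re": 0.17}),
    (-0.82, {"ed": -0.22, "em": -0.01, "re": 0.66}),
]


def discriminant_scores(case):
    """Return the case's score on each of the three discriminant functions."""
    return [
        constant + sum(coef * case[name] for name, coef in coefs.items())
        for constant, coefs in FUNCTIONS
    ]


customer = {"ed": 3.0, "em": 10.0, "re": 2.0}  # hypothetical values
scores = discriminant_scores(customer)
print([round(s, 2) for s in scores])
```

Each customer thus becomes a point in a three-dimensional discriminant space, and the first two coordinates are the ones plotted in a combined-group plot.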
SPSS Output: combined-group plot
The closer the group centroids are to one another, the more classification errors there are likely to be.
SPSS output: classification table
The model excels at identifying Total service customers. However, it does an exceptionally poor job of classifying E-service customers. You may need to find another predictor in order to separate these customers.
We have created a discriminant model that classifies customers into one of four predefined "service usage" groups, based on demographic information from each customer. Using the structure matrix, we identified which variables are most useful for segmenting the customer base. Lastly, the classification results show that the model does poorly at classifying E-service customers. If identifying E-service customers is not the concern, the model may be accurate enough for this purpose.