8/13/2019 Multi Variate Part 2
Multivariate Data Analysis Using
Statgraphics Centurion: Part 2
Dr. Neil W. Polhemus
Statpoint Technologies, Inc.
Multivariate Statistical Methods
The simultaneous observation and analysis of more than one response variable.
*Primary Uses
1. Data reduction or structural simplification
2. Sorting and grouping
3. Investigation of the dependence among variables
4. Prediction
5. Hypothesis construction and testing
*Johnson and Wichern, Applied Multivariate Statistical Analysis
Data set: U.S. Demographics
Variables
Education: percent completing college
Income per Capita: in 2011 dollars
Urban Percentage: percent living in urban areas
Population Density: people per square mile
Non-English Language: percent that don't speak English
Median Age: in years
Unemployment Rate: for 2012
Percent Female: of general population
Crime Rate: violent crimes per 100,000
Manufacturing: % of GDP
Infant Mortality: 2006
Poverty Rate: percentage below poverty level
Election 2012: results of 2012 presidential election (Obama or Romney)
Urban Percentage
Poverty Rate
Star Glyphs
Data Input
Cluster Analysis
Method for assigning objects to groups so that objects in the same group are more similar to each other than they are to objects in other groups.
Some questions:
1. How many natural groups are there?
2. Which objects belong to each group?
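Both questions can be explored with agglomerative clustering, the approach behind the dendrograms in this section. The following is a minimal sketch, not the Statgraphics implementation: the function name `agglomerate`, the Euclidean metric, and the data points are assumptions chosen for illustration. The merge distances it returns are the quantities an agglomeration distance plot would display, and a sharp jump between successive merges hints at a natural number of groups.

```python
import math

def agglomerate(points, n_clusters, linkage="nearest"):
    """Merge the two closest clusters until n_clusters remain.

    linkage="nearest" measures cluster distance by the closest pair of
    members (nearest neighbor); "furthest" uses the most distant pair.
    Returns (clusters, merge_distances); clusters hold row indices.
    """
    pick = min if linkage == "nearest" else max
    clusters = [[i] for i in range(len(points))]
    merge_distances = []
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = pick(math.dist(points[i], points[j])
                         for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merge_distances.append(d)
        clusters[a] = clusters[a] + clusters[b]  # absorb cluster b into a
        del clusters[b]
    return clusters, merge_distances

# Hypothetical standardized data: two tight pairs and one isolated object.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.0, 5.2), (10.0, 0.0)]
clusters, distances = agglomerate(points, n_clusters=3, linkage="nearest")
print(clusters)
print(distances)
```

Switching `linkage` to `"furthest"` changes only how the distance between two clusters is measured, which is the difference between the two dendrogram variants.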
Data Input
Dendrogram - Nearest Neighbor
Dendrogram - Furthest Neighbor
Agglomeration Distance Plot
Dendrogram - 4 Clusters
Save Cluster Numbers
Map of Cluster Numbers
Method of k-Means
1. Starts with a set of typical objects (seeds) for each cluster.
2. Each object is assigned to the group with the nearest seed.
3. Centroids of groups are computed.
4. Objects are assigned to different groups if nearer to a different centroid.
5. Steps 3 and 4 are repeated until no change.
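The five steps can be sketched in a few lines. This is an illustrative version under stated assumptions, not the Statgraphics routine: the function name `k_means`, the Euclidean distance, the 0-based seed rows, and the data are all hypothetical.

```python
import math

def k_means(points, seed_rows, max_iter=100):
    """k-means starting from user-chosen seed rows (0-based indices)."""
    centroids = [points[r] for r in seed_rows]          # step 1: seeds
    labels = None
    for _ in range(max_iter):
        # Steps 2 and 4: assign each object to the nearest seed or centroid.
        new_labels = [min(range(len(centroids)),
                          key=lambda k: math.dist(p, centroids[k]))
                      for p in points]
        if new_labels == labels:                        # step 5: no change
            break
        labels = new_labels
        # Step 3: recompute each centroid as the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster empties out
                centroids[k] = tuple(sum(c) / len(members)
                                     for c in zip(*members))
    return labels, centroids

# Hypothetical data: two well-separated groups of three objects.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
          (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels, centroids = k_means(points, seed_rows=[0, 3])
print(labels)
print(centroids)
```

Because the result depends on the starting seeds, choosing seeds that are typical of the expected groups matters; poorly chosen seeds can converge to a different partition.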
Method of k-Means
Seeds
Row 5 = California
Row 32 = New York
Row 34 = North Dakota
Row 43 = Texas
Map for k-Means
Classification
The classification problem deals with the assignment of objects to known groups. Goals include:
1. Understanding what differentiates the groups.
2. Predicting group membership for unclassified objects.
Methods
Discriminant Analysis (on the Centurion Relate menu) based on Fisher's linear discriminant functions.
Neural Network Classifier (on the Centurion Relate menu) based on a Bayesian classifier with priors and costs.
Uniwin (companion program to Statgraphics) includes methods for dealing with qualitative factors.
(Linear) Discriminant Analysis
Constructs linear functions of the standardized classification variables:
D_j = d_{j1} Z_1 + d_{j2} Z_2 + \cdots + d_{jp} Z_p
These functions are derived so as to maximize the separation of the groups. If there are g groups, there are g-1 discriminant functions.
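For two groups there is a single discriminant function, and its coefficients can be obtained by solving W d = (mean of group B) - (mean of group A), where W is the pooled within-group covariance matrix. The sketch below assumes two standardized variables so the 2x2 system can be solved by hand; the helper names and the data are hypothetical, and the scaling of the coefficients may differ from what Statgraphics reports.

```python
def mean(rows):
    """Column means of a list of equal-length rows."""
    n = len(rows)
    return [sum(r[k] for r in rows) / n for k in range(len(rows[0]))]

def pooled_cov_2d(groups):
    """Pooled within-group covariance matrix for two variables."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    dof = 0
    for rows in groups:
        m = mean(rows)
        for r in rows:
            dev = [r[0] - m[0], r[1] - m[1]]
            for a in range(2):
                for b in range(2):
                    s[a][b] += dev[a] * dev[b]
        dof += len(rows) - 1
    return [[s[a][b] / dof for b in range(2)] for a in range(2)]

def solve_2x2(m, v):
    """Solve m @ d = v for a 2x2 matrix m using the explicit inverse."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(m[1][1] * v[0] - m[0][1] * v[1]) / det,
            (-m[1][0] * v[0] + m[0][0] * v[1]) / det]

def fisher_direction(group_a, group_b):
    """Coefficients (d_1, d_2) of the single two-group discriminant."""
    w = pooled_cov_2d([group_a, group_b])
    ma, mb = mean(group_a), mean(group_b)
    return solve_2x2(w, [mb[0] - ma[0], mb[1] - ma[1]])

# Hypothetical standardized values (Z_1, Z_2) for two known groups.
group_a = [(-1.0, -0.5), (-0.8, 0.3), (-1.2, 0.2)]
group_b = [(1.0, 0.1), (0.9, -0.4), (1.1, 0.3)]

d = fisher_direction(group_a, group_b)
scores_a = [d[0] * z1 + d[1] * z2 for z1, z2 in group_a]
scores_b = [d[0] * z1 + d[1] * z2 for z1, z2 in group_b]
print(d, scores_a, scores_b)
```

Plotting the scores by group gives the kind of display a discriminant function plot shows: the further apart the two sets of scores, the better the separation.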
Example
Data Input
Analysis Options
All Variables
Discriminant Function Plot
Backward selection
Classification Functions
These functions create a score for each group:
C_j = c_{j1} X_1 + c_{j2} X_2 + \cdots + c_{jp} X_p + c_{j0}
Objects are assigned to whatever group has the largest value of (C_j * prior_j), where prior_j is the prior probability of belonging to group j.
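Scoring an object amounts to evaluating one linear function per group and picking the winner. In the sketch below every coefficient, group label, and data value is hypothetical. How the prior enters is also an assumption: the code adds ln(prior_j) to the score, the usual form under a normal model with a common covariance matrix, which reduces to picking the largest C_j when the priors are equal. Statgraphics may combine the two differently.

```python
import math

def classify(x, coefficients, priors):
    """Return (best_group, scores) for one object x.

    coefficients[group] = (c_j0, [c_j1, ..., c_jp]); values are hypothetical.
    """
    scores = {}
    for group, (constant, weights) in coefficients.items():
        c_j = constant + sum(w * xi for w, xi in zip(weights, x))
        scores[group] = c_j + math.log(priors[group])  # assumed prior term
    return max(scores, key=scores.get), scores

# Hypothetical classification functions for two groups and two variables.
coefficients = {
    "A": (-30.0, [0.50, 0.80]),
    "B": (-28.0, [0.40, 1.00]),
}
priors = {"A": 0.5, "B": 0.5}

group, scores = classify([80.0, 12.0], coefficients, priors)
print(group, scores)
```

With equal priors the log term shifts both scores by the same amount, so only the linear scores decide the assignment; unequal priors tilt borderline objects toward the more common group.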
Classification Functions
Classification Summary
Classification Table
Validation Set
May leave some of the objects out of the training set and use them to validate the results.
Correctly classified 8 out of 10 omitted states.
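The hold-out idea is independent of the classifier used. The sketch below shows the mechanics with a deliberately simple nearest-centroid rule standing in for discriminant analysis; the rule, the function names, and the data are assumptions for illustration, not the analysis that produced the result above.

```python
import math

def nearest_centroid_fit(rows, labels):
    """Compute one centroid (mean vector) per group from the training rows."""
    groups = {}
    for row, lab in zip(rows, labels):
        groups.setdefault(lab, []).append(row)
    return {lab: [sum(col) / len(members) for col in zip(*members)]
            for lab, members in groups.items()}

def predict(centroids, row):
    """Assign a row to the group whose centroid is closest."""
    return min(centroids, key=lambda lab: math.dist(row, centroids[lab]))

# Hypothetical data with known group labels.
rows = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3), (1.1, 1.1),
        (3.0, 3.0), (3.2, 2.9), (2.8, 3.1), (3.1, 3.3)]
labels = ["A", "A", "A", "A", "B", "B", "B", "B"]
holdout = {3, 7}  # row indices left out of the training set

train = [i for i in range(len(rows)) if i not in holdout]
centroids = nearest_centroid_fit([rows[i] for i in train],
                                 [labels[i] for i in train])

# Score only the omitted objects, which the fit never saw.
hits = sum(predict(centroids, rows[i]) == labels[i] for i in holdout)
accuracy = hits / len(holdout)
print(f"{hits} of {len(holdout)} omitted objects classified correctly")
```

Accuracy on the omitted objects is usually a more honest estimate than accuracy on the training set, because the fitted functions were never tuned to those objects.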
Neural Network Classifier
Implements a nonparametric method for classifying objects based on the product of 3 quantities:
1. The estimated density function in the neighborhood of the object (given a specified value of s).
2. The prior probabilities of belonging to each group.
3. The costs of misclassifying cases that belong to a given group.
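The product of the three quantities can be sketched with a Gaussian kernel density estimate. This is an assumption-laden illustration rather than the Statgraphics algorithm: the Gaussian kernel, the function names, the group labels, and the data are hypothetical, and s is treated as the kernel width.

```python
import math

def kernel_density(x, members, s):
    """Average Gaussian kernel of width s centered on each training object.

    The normalizing constant is omitted: it is identical for every group
    when s and the number of variables are shared, so it cannot change
    which group wins.
    """
    total = sum(math.exp(-math.dist(x, m) ** 2 / (2 * s ** 2))
                for m in members)
    return total / len(members)

def classify(x, training, priors, costs, s=0.5):
    """Score each group as density * prior * cost and return the largest."""
    scores = {g: kernel_density(x, members, s) * priors[g] * costs[g]
              for g, members in training.items()}
    return max(scores, key=scores.get), scores

# Hypothetical standardized training data for two groups.
training = {
    "A": [(1.0, -0.5), (0.8, -0.2), (1.2, 0.1)],
    "B": [(-1.0, 0.5), (-0.7, 0.3), (-1.1, 0.0)],
}
equal = {"A": 1.0, "B": 1.0}
priors = {"A": 0.5, "B": 0.5}

print(classify((0.9, -0.1), training, priors, equal)[0])
print(classify((0.0, 0.0), training, priors, equal)[0])
# Doubling the cost of misclassifying group A flips the borderline case.
print(classify((0.0, 0.0), training, priors, {"A": 2.0, "B": 1.0})[0])
```

A small s makes the density follow individual training objects closely, while a large s smooths it toward a single broad bump per group, so the choice of s affects borderline assignments.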
Bivariate Density (s = 0.5)
[Figure: estimated bivariate density surfaces for the Romney and Obama groups. Axes: Urban Percentage and Poverty rate; vertical axis: density (X 0.001).]
Schematic Diagram
Data Input
Analysis Options
Classification Summary
Reduced Set of Variables
Some Improvement
Classification Plot
Classification Plot
UNIWIN Plus from Sigma Plus
Software package written by our longtime colleague Christian Charles at Sigma Plus in France.
Contains additional features for multivariate analysis.
Reads Statgraphics data files.
Uniwin Cluster and Classification
Handle qualitative variables.
New Variables
Uniwin Cluster Analysis
Dendrogram
Uniwin Clusters
Uniwin Classification
Classification Results
Wrongly classified only Nevada (only cattle state to vote for Obama).
More Information
Statgraphics Centurion: www.statgraphics.com
Uniwin Plus: www.statgraphics.fr or www.sigmaplus.fr
Or send e-mail to [email protected]
Join the Statgraphics Community on:
Follow us on
https://twitter.com/STATGRAPHICS
http://www.linkedin.com/groups?gid=1796550
https://www.facebook.com/pages/STATGRAPHICS/79706768276