Upload
others
View
6
Download
0
Embed Size (px)
Citation preview
Visualizing Multivariate Uncertainty:
Some Graphical Methods for Multivariable Spatial Data
Suicides
InfantsCrime_prop
Instruction
Donations Crime_pers
Median Ranks by Region
Michael FriendlyYork University
http://www.math.yorku.ca/SCS/friendly.html
National Academy of Sciences
Washington, DC, Mar, 2005
Visualizing Multivariate Uncertainty outline
Outline
Multivariate uncertainty and “moral statistics”A. M. Guerry’s Moral Statistics of FranceGuerry’s data and analyses
Multivariate analyses: Data-centric displays
Bivariate plots and data ellipsesBiplotsCanonical discriminant plotsHE plots for multivariate linear models
Multivariate mapping: Map-centric displays
Star mapsReduced-rank color maps
National Academies of Sciences, March 2005 1 c© Michael Friendly
Visualizing Multivariate Uncertainty guerry1
Multivariate Uncertainty and “Moral Statistics” ∼ 1800
It is a capital mistake to theorize before one has data. Sherlock Homes inScandal in Bohemia
What to do about crime?Liberal view: increase education, literacyConservative view: build more prisons
What to do about poverty?Liberal view: increase social assistanceConservative view: build more poor-houses
But:Little actual data – all armchair theorizingNo ways to understand or visualize relationships between variables• Statistical graphics just invented (Playfair)— line graph, bar chart, pie chart• All 1D or 1.5D (time series)
National Academies of Sciences, March 2005 2 c© Michael Friendly
Visualizing Multivariate Uncertainty background
The rise of “moral statistics” and modern social science
Political arithmetic: William Petty (and others)
1654— first attempt at scientific survey (on Irish estates)1687— idea that wealth and strength of a state depended on its subjects(number and characteristics)
Demography: Johann Peter Sussmilch (1741)—
importance of measuring and analyzing population distributionsidea that ethical and state policies could encourage growth and wealth(increase birth rate, decrease death rate)• discourage alcohol, gambling, prostitution & priestly celibacy• encourage state support for medical care, distribution of land, lower taxes
Statistik: Numbers of the state (1800–1820), Germany and France
collect data on imports, exports, transportation, ...
Guerry & QueteletQuetelet: Concepts of “average man” and “social physics”Guerry: First real social data analysis (Guerry, 1833)
National Academies of Sciences, March 2005 3 c© Michael Friendly
Visualizing Multivariate Uncertainty data
Guerry’s data
Compte general de l’administration de la justice criminelle en France
The first national compilation of official justice data (1825)• detailed data on all charges and disposition• collected quarterly in all 86 departments.Other sources: Bureau de Longitudes (illegitimate births); Parent-Duchatelet(prostitutes in Paris); Compte du ministere du guerre (military desertions); ...
Moral variables: Scaled so ’more’ is ’better’
Crime pers Population per Crime against personsCrime prop Population per Crime against propertyDonations Donations to the poorInfants Population per illegitimate birthLiteracy Percent who can read & writeSuicides Population per suicide
Tried to define these to ensure comparability and representativeness• Crime: Use number of accused rather than convicted• Literacy: Reported levels of education unreliable; use data from military draft
examinations (% of young men able to read and write)
Other variables: Ranks by department: wealth, commerce, ...
National Academies of Sciences, March 2005 4 c© Michael Friendly
Visualizing Multivariate Uncertainty questions
Guerry’s Questions
Should crime and other moral variables be considered as structural, lawful
characteristics of society, or simply as indicants of individual behavior?
Statistical regularity as the key to social science (“social physics”) social
equivalent of “law of large numbers”)
Guerry showed that rates of crime had nearly invariant distributions over time
(1825–1830) when classified by region, sex of accused, type of crime, etc. “We
would be forced to recognze that the facts of moral order, like those of physical
order, obey invariant laws...” (p.14)
Relations between crime and other moral variables
Do crimes against persons and crimes against property show the same or
different trends?
How does crime relate to education and literacy?
• Some “armchair” arguments had suggested increasing literacy to decrease
crime: “The definitive result shows that 67 out of 100 prisoners can neither
read nor write. What stronger proof could there be that ignorance is the
mother of all vices” (A. Taillander, 1828)
Does crime vary coherently over regions of France (C, N, S, E, W)?
National Academies of Sciences, March 2005 5 c© Michael Friendly
Visualizing Multivariate Uncertainty maps
Guerry’s maps
National Academies of Sciences, March 2005 6 c© Michael Friendly
Visualizing Multivariate Uncertainty maps
Guerry’s maps
Population per Crime against persons
Rank 1 - 9 10 - 19 20 - 2829 - 38 39 - 47 48 - 5758 - 66 67 - 76 77 - 86
Population per Crime against property
Rank 1 - 9 10 - 19 20 - 2829 - 38 39 - 47 48 - 5758 - 66 67 - 76 77 - 86
Per cent who can Read and Write
Rank 1 - 8 10 - 17 20 - 2629 - 35 38 - 47 48 - 5658 - 66 68 - 76 77 - 86
Population per Illegitimate birth
Rank 1 - 9 10 - 19 20 - 2829 - 38 39 - 47 48 - 5758 - 66 67 - 76 77 - 86
Donations to the poor
Rank 1 - 9 10 - 19 20 - 2829 - 38 39 - 47 48 - 5758 - 66 67 - 76 77 - 86
Population per Suicide
Rank 1 - 9 10 - 19 20 - 2829 - 38 39 - 47 48 - 5758 - 66 67 - 76 77 - 86
77
66
70
15
33 8
85
3
49
28
7
20
34
38
6343
54
26
81
76
86
53
9
19
24
5279
17
42
42
61
14
55
82
47
44
65
35
51
71
29
48
36
258
6
78
80 27
6774
6862
60
10
64
69
72
73
59
32
3111
4
12
5
45
57
75
84
22
39
5613
40
83
1623
18
21
50 25
30
46
37
1
84
19
48
40
5273
60
70
6
74
32
17
7
79
8318
76
82
64
37
86
72
22
46
12
434
50
39
75
41
78
30
44
36
56
51
25
24
81
85
67
16
6361
23
57
42 15
6866
3365
49
9
53
26
31
54
5
80
5871
45
14
13
8
47
77
55
1
2
21 3
35
38
2759
69
20
43 11
28
62
29
10
42
64
3
58
8026
78
10
73
39
35
44
65
35
4045
3
1
74
7
17
10
85
50
64
68 6
47
35
44
47
56
22
8
26
31
85
29
26
31
15
20
50
2035
26
17
52 77
8312
7986
5
71
14
56
68
56
62
12
6066
35
76
82
56
73
38
32
82
52
6870
48
53
1422
17
42
29 22
3
76
60
62
81
26
43
67
6885
37
65
50
57
64
6
5
33
5371
69
52
35
83
24
63
18
40
36
2575
76
34
31
9
62
84
56
41
14
59
32
19
78
79
21
7
5847
73
51
22 11
4649
1545
80
20
54
4
48
61
8
66
1613
30
23
27
2
12
60
10
1
3
3938
74
17
7770
28
44
86 82
55
29
42
72
45
62
69
11
5717
56
24
25
10
18
7
85
33
7776
66
82
9
67
70
41
21
13
74
3984
16
6
14
44
2
60
71
59
32
15
75
50
22
12
61
42
4738
4
37
46 28
30 5
2734
81
65
68
55
49
64
51
52
1954
72
80
54
3
73
26
20
35
58
4829
83
43
2340
8
1
79 63
78
31
36
86
56
12
82
18
2569
43
84
8
74
83
5
49
81
4226
32
66
24
78
79
58
64
37
16
2040
28
71
72
31
48
65
39
21
59
55
57
19
77
86
44
10
6862
9
51
70 6
3345
2315
54
41
47
17
3
53
22
80
7385
61
29
34
27
63
36
46
2
7
4 1
38
13
7667
14
30
75 35
52
50
11
60
National Academies of Sciences, March 2005 7 c© Michael Friendly
Visualizing Multivariate Uncertainty maps
Guerry’s analyses
Relate variables by comparing maps and ranked lists (1st|| coordinate plot)
Conclusion: no clear relation between crime and literacy
Literacy Ranked lists Crimes against persons
Similar analyses for other variables (suicide, illegitimate births, ...)
National Academies of Sciences, March 2005 8 c© Michael Friendly
Visualizing Multivariate Uncertainty multivar0
Graphical methods for multivariate data
Bivariate displays: Bivariate displays can be enhanced to show statisticalrelations more clearly and effectively
Scatterplots with data (concentration) ellipses and smoothed (loess) curvesScatterplot matricesCorrgrams and visual thinning
Reduced-rank displays: Multivariate visualization techniques can show thestatistical data in simple ways, using dimension reduction techniques.
Biplots - show variables and observations in space accounting for greatestvarianceCanonical discriminant plots - show variables and observations in spaceaccounting for greatest between-group variation
HE plots: Visualization for Multivariate Linear Models
National Academies of Sciences, March 2005 9 c© Michael Friendly
Visualizing Multivariate Uncertainty bivar
Bivariate plots: Points and visual summaries
Indre
Haute-Marne
Meuse
Doubs
Cote-d’Or
Jura
AriegeHaut-Rhin
Corsica
Creuse
Ardennes
Pop
. per
Crim
e ag
ains
t per
sons
0
10000
20000
30000
40000
Percent Read & Write
10 20 30 40 50 60 70 80
Scatterplot with linear regression line
National Academies of Sciences, March 2005 10 c© Michael Friendly
Visualizing Multivariate Uncertainty dataellipse0
The Data Ellipse: Galton’s Discovery
Pearson (1920): “... one of the most noteworthy scientific discoveries arising from pure
analysis of observations.”
National Academies of Sciences, March 2005 11 c© Michael Friendly
Visualizing Multivariate Uncertainty dataellipse0
The Data Ellipse: Details
Visual summary for bivariate marginal relations
Shows: means, standard deviations, correlation, regression line(s)
Defined: set of points whose squared Mahalanobis distance ≤ c2,
D2(y) ≡ (y − y)T S−1 (y − y) ≤ c2
S = sample variance-covariance matrix
Radius: when y is approx. bivariate normal, D2(y) has a large-sample χ22
distribution with 2 degrees of freedom.
• c2 = χ22(0.40) ≈ 1: 1 std. dev univariate ellipse– 1D shadows: y ± 1s
• c2 = χ22(0.68) = 2.28: 1 std. dev bivariate ellipse
• Small samples: c2 ≈ 2F2,n−2(1 − α)Construction: Transform the unit circle, U = (sinθ, cos θ),
Ec = y + cS1/2US1/2 = any “square root” of S (e.g., Cholesky)
Robust version: Use robust covariance estimate (MCD, MVE)
Nonparametric version: Use kernel density estimation
National Academies of Sciences, March 2005 12 c© Michael Friendly
Visualizing Multivariate Uncertainty bivar
Bivariate plots: Data ellipse and smoothing
Indre
Haute-Marne
Meuse
Doubs
Cote-d’Or
Jura
AriegeHaut-Rhin
Corsica
Creuse
Ardennes
Pop
. per
Crim
e ag
ains
t per
sons
0
10000
20000
30000
40000
Percent Read & Write
10 20 30 40 50 60 70 80
Scatterplot with 68% data ellipse and smoothed (loess) curve
National Academies of Sciences, March 2005 13 c© Michael Friendly
Visualizing Multivariate Uncertainty bivar
Bivariate plots: Region differences
C
E
N
S
W
Indre
Haute-Marne
Meuse
Doubs
Cote-d’Or
Jura
AriegeHaut-Rhin
Corsica
Creuse
Ardennes
Pop
. per
Crim
e ag
ains
t per
sons
0
10000
20000
30000
40000
Percent Read & Write
10 20 30 40 50 60 70 80
National Academies of Sciences, March 2005 14 c© Michael Friendly
Visualizing Multivariate Uncertainty bivar
Bivariate plots: Scatterplot matrices
Crime_pers
2199
37014
Crime_prop
1368
20235
Literacy
12
74
Infants
2660
62486
National Academies of Sciences, March 2005 15 c© Michael Friendly
Visualizing Multivariate Uncertainty corrgram
Corrgrams— Correlation matrix displays
How to show a correlation matrix for different purposes? (Friendly, 2002)
Render a correlation to depict sign and magnitude (tasks: lookup, comparison,detection)
Correlation value (x 100)-100 -85 -70 -55 -40 -25 -10 5 20 35 50 65 80 95 Number
Circle
Ellipse
Bars
Shaded
Task-specific renderings:
Task Lookup Comparison Detection
Rendering Number Circle Shading
National Academies of Sciences, March 2005 16 c© Michael Friendly
Visualizing Multivariate Uncertainty corrgram
Corrgrams— Rendering
Baseball data: (lower) Patterns vs. (upper) comparison
Baseball data: PC2/1 order
Years
logSal
Homer
Putouts
RBI
Walks
Runs
Hits
Atbat
Errors
Assists
Assis
tsErr
ors
Atb
at Hits Runs
Walk
s RB
I
Puto
utsHom
er
logS
al Years
National Academies of Sciences, March 2005 17 c© Michael Friendly
Visualizing Multivariate Uncertainty corrgram
Corrgrams— Variable ordering
Baseball data: (a) alpha vs. (b) correlation ordering (Friendly and Kwan, 2003)
(a) Alpha order
Assists
Atbat
Errors
Hits
Homer
logSal
Putouts
RBI
Runs
Walks
Years
Years
Walk
s Runs
RB
I
Puto
utslogS
al Hom
er
Hits
Err
ors
Atb
at
Assis
ts
(b) PC2/1 order
Years
logSal
Homer
Putouts
RBI
Walks
Runs
Hits
Atbat
Errors
Assists
Assis
tsErr
ors
Atb
at Hits Runs
Walk
s RB
I
Puto
utsHom
er
logS
al Years
See: http://www.math.yorku.ca/SCS/sasmac/corrgram.html
National Academies of Sciences, March 2005 18 c© Michael Friendly
Visualizing Multivariate Uncertainty corrgram
Corrgrams— Variable ordering
Reorder variables to show similarities: PC1 or angles (PC2/PC1)
logSal
Years
Homer
Runs
Hits
RBI
Atbat
WalksPutouts
Assists Errors
Dim
en
sio
n 2
(1
7.4
%)
-1.0
-0.5
0.0
0.5
1.0
1.5
Dimension 1 (46.3%)-1.0 -0.5 0.0 0.5 1.0 1.5
National Academies of Sciences, March 2005 19 c© Michael Friendly
Visualizing Multivariate Uncertainty corrgram
Corrgrams— Guerry data
Guerry Variables
Crime_pers
Desertion
Wealth
Infanticide
Literacy
Instruction
Clergy
Suicides
Lottery
Crime_prop
Infants
Infa
nts
Crim
e_pr
op Lo
ttery
Sui
cide
s C
lerg
y
Inst
ruct
ionLi
tera
cy Infa
ntic
ideW
ealth
Des
ertio
n
Crim
e_pe
rs
National Academies of Sciences, March 2005 20 c© Michael Friendly
Visualizing Multivariate Uncertainty corrgram
Guerry data— Variable ordering
Suicides
Wealth
Lottery
Crime_propInfants
Instruction
Clergy
Crime_pers
Desertion
Infanticide
Literacy
Dim
ensi
on 2
(15
.8%
)
-1.2
-0.8
-0.4
0.0
0.4
0.8
1.2
Dimension 1 (35.5%)
-1.6 -1.2 -0.8 -0.4 0.0 0.4 0.8 1.2
National Academies of Sciences, March 2005 21 c© Michael Friendly
Visualizing Multivariate Uncertainty corrgram
Visual thinning: Minimal summaries for large data sets
Guerry data: schematic scatterplot matrix: 68% data ellipse + loess smooth
rime_pers
2199
37014
Desertion
1
86
Wealth
1
86
Infanticide
1
86
Literacy
12
74
Instruction
1
86
Clergy
1
86
Suicides
3460
163241
Lottery
1
86
Crime_prop
1368
20235
Infants
2660
62486
National Academies of Sciences, March 2005 22 c© Michael Friendly
Visualizing Multivariate Uncertainty multivar1
Multivariate analyses: Reduced rank displays
Multivariate visualization techniques can show the statistical data in simple ways,using dimension reduction techniques.
Biplots - show variables and departments in space accounting for greatestvarianceCanonical discriminant plots - show variables and departments in spaceaccounting for greatest between-region variation
Can try to show geographic location by color coding or other visual attributes.
Color code by regionShow data ellipse to summarize regions
→ Data-centric displays: The multivariate data is shown directly; geographicrelations indirectly
National Academies of Sciences, March 2005 23 c© Michael Friendly
Visualizing Multivariate Uncertainty multivar1
Biplots
Biplots represent both variables (attributes) and observations (departments) in thesame plot— a low-rank (2D) approximation to a data matrix (Gabriel, 1971)
Y � = Y − Y·· ≈ ABT =d∑
k=1
akbTk
Variables are usually represented by vectors from origin (mean)Observations are usually represented by pointsCan show clusters of observations by data ellipses
Properties:
Angles between vectors show correlations (r ≈ cos(θ))Length of variable vectors ∼ % variance accounted foryij ≈ aT
i bj : projection of observation on variable vectorDimensions are uncorrelated overall (but not necessarily within group)
National Academies of Sciences, March 2005 24 c© Michael Friendly
Visualizing Multivariate Uncertainty multivar1
Biplots: Guerry data
National Academies of Sciences, March 2005 25 c© Michael Friendly
Visualizing Multivariate Uncertainty biplot-bb
Biplots: Baseball data
C
1B
OF
1B
IF
IF
IF
IF
DH
IF
COF
IF
UT
IF
IF
C
OFOF
C
1BOFOF
C
IF
OF
OF
IF
C
IF
C1B
UT C
IF
OF OF
C
OF
IF
OFOF
OF
IF
IF
OF
IF
IF
IF
DH
OF
IF
UT
IF
1BOF
IF
OF
OF
IF
IF 1B
OF
OFOF
OF
C
IF
C
OF
IF
OFIF
OF
OFOF 1B
IFIF
OF
OF
IF
1B
C
1B
IF
IF
OF
IF
IF
OF
IF OF
OF
IF
1BOF
OFOF
IF
IF
DH
IF
1B
OF
OF
UT
OF
OF
OF
1B OF
C
DH
IF
IF
DH
IF OFOF
IF
OF
OF
IFIF
C
OF
UT C
IF
OFC
C
IF
OF
OF
OF
OFOF
1B1BOF OF
OF
OF
IF
DH
OF
IF
1B
OF
OFOF
OF
OFC
DH
DH
IF
1B
IF
OFOF
IF
DH
OFC
OF C
OF
IF
CIF CCOF
OF
OFOF OF
IF
OF
IF
C
OF
IF
OFIF
1B
OF
1B
IF
UTOF
CIF
OF
CC
OF
DH
DH
IF
DH
OF
IF
IFIF
IF
OF
OF
IF
IF
IF
OF
DH
IF
IF
OF
1B
C
1BIF
IF
IF
1B
IF
IF
IF
IF
IF
UT
OF
IF
IFUT
OF
OF
IFIF
C
OFC
UT
C
IF
IF
OFUT
IFIFOF
1B
IF
IFIF
1B 1B
OF
IF
IF1B
OF
logSal
Years
Homer
Runs
Hits
RBI
Atbat
WalksPutouts
Assists Errors
Dim
ensi
on 2
(17
.4%
)
-1.0
-0.5
0.0
0.5
1.0
1.5
Dimension 1 (46.3%)
-1.0 -0.5 0.0 0.5 1.0 1.5
Biplot of baseball data, players labeled by position
National Academies of Sciences, March 2005 26 c© Michael Friendly
Visualizing Multivariate Uncertainty biplot-bb
C
1B
OF
1B
IF
IF
IF
IF
DH
IF
COF
IF
UT
IF
IF
C
OFOF
C
1BOFOF
C
IF
OF
OF
IF
C
IF
C1B
UT C
IF
OF OF
C
OF
IF
OFOF
OF
IF
IF
OF
IF
IF
IF
DH
OF
IF
UT
IF
1BOF
IF
OF
OF
IF
IF 1B
OF
OFOF
OF
C
IF
C
OF
IF
OFIF
OF
OFOF 1B
IFIF
OF
OF
IF
1B
C
1B
IF
IF
OF
IF
IF
OF
IF OF
OF
IF
1BOF
OFOF
IF
IF
DH
IF
1B
OF
OF
UT
OF
OF
OF
1B OF
C
DH
IF
IF
DH
IF OFOF
IF
OF
OF
IFIF
C
OF
UT C
IF
OFC
C
IF
OF
OF
OF
OFOF
1B1BOF OF
OF
OF
IF
DH
OF
IF
1B
OF
OFOF
OF
OFC
DH
DH
IF
1B
IF
OFOF
IF
DH
OFC
OF C
OF
IF
CIF CCOF
OF
OFOF OF
IF
OF
IF
C
OF
IF
OFIF
1B
OF
1B
IF
UTOF
CIF
OF
CC
OF
DH
DH
IF
DH
OF
IF
IFIF
IF
OF
OF
IF
IF
IF
OF
DH
IF
IF
OF
1B
C
1BIF
IF
IF
1B
IF
IF
IF
IF
IF
UT
OF
IF
IFUT
OF
OF
IFIF
C
OFC
UT
C
IF
IF
OFUT
IFIFOF
1B
IF
IFIF
1B 1B
OF
IF
IF1B
OF
logSal
Years
Homer
Runs
Hits
RBI
Atbat
WalksPutouts
Assists Errors
1BC
DH
IF
OFUT
Dim
ensi
on 2
(17
.4%
)
-1.0
-0.5
0.0
0.5
1.0
1.5
Dimension 1 (46.3%)
-1.0 -0.5 0.0 0.5 1.0 1.5
Biplot of baseball data, with data ellipses by position
National Academies of Sciences, March 2005 27 c© Michael Friendly
Visualizing Multivariate Uncertainty biplot-bb
logSal
Years
Homer
Runs
Hits
RBI
Atbat
WalksPutouts
Assists Errors
1BC
DH
IF
OFUT
Dim
ensi
on 2
(17
.4%
)
-1.0
-0.5
0.0
0.5
1.0
1.5
Dimension 1 (46.3%)
-1.0 -0.5 0.0 0.5 1.0 1.5
Biplot of baseball data, with data ellipses by position (thinned)
National Academies of Sciences, March 2005 28 c© Michael Friendly
Visualizing Multivariate Uncertainty multivar1
Canonical discriminant plots
Project the variables into a low-rank (2D) space that maximally discrimates amongregions (Friendly, 1991)
Visual summary of a MANOVACanonical dimensions are linear combinations of the variables with maximumunivariate F -statistics.Vectors from the origin (grand mean) for the observed variables show thecorrelations with the canonical dimensions
Properties:
Canonical variates are uncorrelatedCircles of radius
√χ2
2(1 − α)/ni give confidence regions for group means.Variable vectors show how variables discriminate among groupsLengths of variable vectors ∼ contribution to discrimination
National Academies of Sciences, March 2005 29 c© Michael Friendly
Visualizing Multivariate Uncertainty multivar1
Canonical discriminant plots: Guerry data, by Region
National Academies of Sciences, March 2005 30 c© Michael Friendly
Visualizing Multivariate Uncertainty cdaplots-bb
CDA plots: Baseball data, by player position
logsalyears
homerruns
hits
rbi
atbat
walks
putoutsassists
errors
Position 1B C DHIF OF UT
Can
onic
al D
imen
sion
2 (
27.1
%)
-6
-4
-2
0
2
4
6
Canonical Dimension 1 (72.9%)
-6 -4 -2 0 2 4 6 8
National Academies of Sciences, March 2005 31 c© Michael Friendly
Visualizing Multivariate Uncertainty heplots
HE plots: Visualization for Multivariate Linear Models
How are p responses, Y = (y1, y2, . . . ,yp) related to q predictors,X = (x1, x2, . . . ,xq)? (Friendly, 2004a)
MANOVA: X ∼ discrete factorsMMRA: X ∼ quantitative predictorsMANCOVA, response surface models,
⎫⎪⎬⎪⎭
All the same MLM:
Y(n×p)
= X(n×q)
B(q×p)
+ E(n×p)
Analogs of univariate tests:
Explained variation: MSH �−→ (p × p) covariance matrix, HResidual variation: MSE �−→ (p × p) covariance matrix, ETest statistics: F �−→ |H − λE| = 0 �→ λ1, λ2, . . . λs
How big is H relative to E ?
Latent roots λ1, λ2, . . . λs measure the “size” of H relative to E ins = min(p, dfh) orthogonal directions.Test statistics: Wilks’ Λ, Pillai trace, Hotelling-Lawley trace, Roy’s maximumroot combine these into a single number
National Academies of Sciences, March 2005 32 c© Michael Friendly
Visualizing Multivariate Uncertainty heplots
HE plots: Visualization for Multivariate Linear Models
HE plot: for two response variables, (y1, y2), plot a H ellipse and E ellipse
HE plot matrices: For all p responses, plot an HE scatterplot matrix
→ Shows: size, dimensionality, and effect-correlation of H relative to E .
Scatter around group meansrepresented by each ellipse
(a)
1
2
3
4
5
6
7
8
H matrix
E matrix
Deviations of group means fromgrand mean (outer) and pooledwithin-group (inner) ellipses.
How big is H relative to E?
(b)
Individual group scatterY2
0
10
20
30
40
Y1
0 10 20 30 40
Between and Within ScatterY2
0
10
20
30
40
Y1
0 10 20 30 40
Essential ideas behind multivariate tests: (a) Data ellipses; (b) H and E ellipses
National Academies of Sciences, March 2005 33 c© Michael Friendly
Visualizing Multivariate Uncertainty heplots
Simple example: Iris data
Setosa
Versicolor
Virginica
Setosa
Versicolor
Virginica
Sep
al le
ngth
in m
m.
40
50
60
70
80
Petal length in mm.
10 20 30 40 50 60 70
Matrix Hypothesis Error
Sep
al le
ngth
in m
m.
40
50
60
70
80
Petal length in mm.
10 20 30 40 50 60 70
(a) Data ellipses and (b) H and E ellipses
H ellipse: Shows 2D covariation of predicted values (means)
E ellipse: Shows 2D covariation of residuals
points: show group means on both variables
National Academies of Sciences, March 2005 34 c© Michael Friendly
Visualizing Multivariate Uncertainty he-bb
Baseball data: Variation by position
How do relations among variables vary with player’s position?
Fit MANOVA model,
(logSal Years Homer Runs Hits RBI Atbat Walks PutoutsAssists Errors) = PositionHE plots for selected pairs: (Years, logSal), (Putouts, Assists)
1B
C
DH
IF OF
UT
Model: logSal Years ... Errors = Position
Matrix Hypothesis Error
log
Sal
ary
2.0
2.2
2.4
2.6
2.8
3.0
3.2
Years in the Major Leagues
0 5 10 15 20
C
DH
IF
OF
UT
Model: logSal Years ... Errors = Position
Matrix Hypothesis Error
Ass
ists
-100
0
100
200
300
Put Outs
0 100 200 300 400 500 600
National Academies of Sciences, March 2005 35 c© Michael Friendly
Visualizing Multivariate Uncertainty he-bb
1B
C
DH
IF OF
UT
Model: logSal Years ... Errors = Position
Matrix Hypothesis Error
log
Sal
ary
2.0
2.2
2.4
2.6
2.8
3.0
3.2
Years in the Major Leagues
0 5 10 15 20
National Academies of Sciences, March 2005 36 c© Michael Friendly
Visualizing Multivariate Uncertainty he-bb
C
DH
IF
OF
UT
Model: logSal Years ... Errors = Position
Matrix Hypothesis Error
Ass
ists
-100
0
100
200
300
Put Outs
0 100 200 300 400 500 600
National Academies of Sciences, March 2005 37 c© Michael Friendly
Visualizing Multivariate Uncertainty he-guerry
Guerry data: Predicting crime
How do rates of crime vary with other variables?
Fit MANCOVA model,
(Crime_pers Crime_prop) = Region + Wealth + Suicides+ Literacy + Donations + Infants
HE plots: Overall, plus for Region and covariate effects
.
Correze
Seine
Ain
Haute-Loire
Creuse
C
E
N
S W
Region
Pop
. per
Crim
e ag
ains
t pro
pert
y
0
5000
10000
15000
20000
Pop. per Crime against persons0 10000 20000 30000 40000
Region effect
Pop
. per
Crim
e ag
ains
t pro
pert
y
0
5000
10000
15000
20000
Pop. per Crime against persons0 10000 20000 30000 40000
.
Correze
Seine
Ain
Haute-Loire
Creuse
.
Wealth
.
Suicides
.
LIteracy
.
Donation
.
InfantsP
op. p
er C
rime
agai
nst p
rope
rty
0
5000
10000
15000
20000
Pop. per Crime against persons0 10000 20000 30000 40000
Covariates: Wealth Suicides Literacy Donations Infants
Pop
. per
Crim
e ag
ains
t pro
pert
y
0
5000
10000
15000
20000
Pop. per Crime against persons0 10000 20000 30000 40000
Pop
. per
Crim
e ag
ains
t pro
pert
y
0
5000
10000
15000
20000
Pop. per Crime against persons0 10000 20000 30000 40000
Pop
. per
Crim
e ag
ains
t pro
pert
y
0
5000
10000
15000
20000
Pop. per Crime against persons0 10000 20000 30000 40000
Pop
. per
Crim
e ag
ains
t pro
pert
y
0
5000
10000
15000
20000
Pop. per Crime against persons0 10000 20000 30000 40000
Pop
. per
Crim
e ag
ains
t pro
pert
y
0
5000
10000
15000
20000
Pop. per Crime against persons0 10000 20000 30000 40000
National Academies of Sciences, March 2005 38 c© Michael Friendly
Visualizing Multivariate Uncertainty he-guerry
.
Correze
Seine
Ain
Haute-Loire
Creuse
C
E
N
S W
Region
Pop
. per
Crim
e ag
ains
t pro
pert
y
0
5000
10000
15000
20000
Pop. per Crime against persons0 10000 20000 30000 40000
Region effect
Pop
. per
Crim
e ag
ains
t pro
pert
y
0
5000
10000
15000
20000
Pop. per Crime against persons0 10000 20000 30000 40000
Overall: Predicted crimes against persons and property are negatively correlatedLarger variation in crimes against propertyRegion variation greater in crimes against persons
National Academies of Sciences, March 2005 39 c© Michael Friendly
Visualizing Multivariate Uncertainty he-guerry
.
Correze
Seine
Ain
Haute-Loire
Creuse
.
Wealth
.
Suicides
.
LIteracy
.
Donation
.
Infants
Pop
. per
Crim
e ag
ains
t pro
pert
y
0
5000
10000
15000
20000
Pop. per Crime against persons0 10000 20000 30000 40000
Covariates: Wealth Suicides Literacy Donations Infants
Pop
. per
Crim
e ag
ains
t pro
pert
y
0
5000
10000
15000
20000
Pop. per Crime against persons0 10000 20000 30000 40000
Pop
. per
Crim
e ag
ains
t pro
pert
y
0
5000
10000
15000
20000
Pop. per Crime against persons0 10000 20000 30000 40000
Pop
. per
Crim
e ag
ains
t pro
pert
y
0
5000
10000
15000
20000
Pop. per Crime against persons0 10000 20000 30000 40000
Pop
. per
Crim
e ag
ains
t pro
pert
y
0
5000
10000
15000
20000
Pop. per Crime against persons0 10000 20000 30000 40000
Pop
. per
Crim
e ag
ains
t pro
pert
y
0
5000
10000
15000
20000
Pop. per Crime against persons0 10000 20000 30000 40000
Each quantitative variable (covariate) plots as a 1D ellipse (vector)Orientation: relation of xi to y1, y2
Length: strength of relation
National Academies of Sciences, March 2005 40 c© Michael Friendly
Visualizing Multivariate Uncertainty multimap
Multivariate mapping: Map-centric displays
How to generalize choropleth maps to many variables?
Star maps: Show multivariate data on the map using star icons, variable ∼length of ray
Reduced-rank RGB displays: Factor analysis → (F1, F2, F3) factor scores�→ (R, G, B) shading
PREFMAP (x, y) maps: Fit data variables to (Long, Lat) map coordinates.Display variables as vectors in map coordinates.
National Academies of Sciences, March 2005 41 c© Michael Friendly
Visualizing Multivariate Uncertainty multimap
Star maps
Suicides
InfantsCrime_prop
Instruction
Donations Crime_pers
Departments coloredby Region
Star map of Guerry Variables (Ranks)
National Academies of Sciences, March 2005 42 c© Michael Friendly
Visualizing Multivariate Uncertainty multimap
Star maps: Medians by region
Suicides
InfantsCrime_prop
Instruction
Donations Crime_pers
Median Ranks by Region
National Academies of Sciences, March 2005 43 c© Michael Friendly
Visualizing Multivariate Uncertainty multimap
Star maps: Multivariate boxplots by region
Suicides
InfantsCrime_prop
Instruction
Donations Crime_pers
stars for Q1, Median, Q3How to show unusualdepts?
National Academies of Sciences, March 2005 44 c© Michael Friendly
Visualizing Multivariate Uncertainty gfactor
Reduced-rank color-coded displays
Use dimension-reduction technique (PCA, Factor analysis, ...) to produce scoresfor observations (departments) on 3 dimensions (F1, F2, F3)
Factor1 Factor2 Factor3Variable Civil society Moral values Crime
Pop per Crime against persons 0 97Pop per Crime against property 0 75 0 39
Percent Read & Write -0 72Pop per illegitimate birth 0 62 0 42
Donations to the poor 0 89Pop per suicide 0 80
Scale (F1, F2, F3) → [0,1]
Color mapping function, e.g., C(F1, F2, F3) �→ rgb(Fi, Fj , Fk)
National Academies of Sciences, March 2005 45 c© Michael Friendly
Visualizing Multivariate Uncertainty gfactor
Reduced-rank color-coded displays
F3 F2
F1
RGB 3-factor map: R=f1, G=f2, B=f3Variables: Crime_pers Crime_prop Literacy Infants Donations Suicides
1
2
3
4
5 7
8
9
10
11
12
13
14
15
16 17
18
19
21
22
23
24
25
26
27
28 29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47 48
49
50 51
52 53
54 55
56
57
58
59
60
61
62
63
64 65
66
67
68
69
70
71
72
75
76
77 78
79
80
81
82
83
84
85 86
87
88
89
200
National Academies of Sciences, March 2005 46 c© Michael Friendly
Visualizing Multivariate Uncertainty summary
Summary and future directions
Guerry’s challenge: Understanding uncertainty in multivariate, spatial data
How visualize and understand relations among many variables?How to relate these to geographic information?
Understanding multivariate variationVisual summaries (data ellipses, smoothings) can show statistical relationsmore clearly and effectivelyReduced-rank visualization methods can show simpler, approximate views,based on several criteria.Multivariate statistical models need their own visualization methods, justbeginning – HE plots as an example.
Understanding multivariate, spatial variationMultivariate visualizations applied to spatial data can be revealing, but still needworkStatistical methods for spatial data need to be extended to the multivariatesetting.
National Academies of Sciences, March 2005 47 c© Michael Friendly
Visualizing Multivariate Uncertainty
ReferencesFriendly, M. (1991). SAS System for Statistical Graphics. Cary, NC: SAS Institute, 1st edn.
Friendly, M. (2002). Corrgrams: Exploratory displays for correlation matrices. The AmericanStatistician, 56(4), 316–324.
Friendly, M. (2004a). Graphical methods for the multivariate general linear model. Journal ofComputational and Graphical Statistics. (submitted 2/17/04; resubmitted: 10/02/04).
Friendly, M. (2004b). Milestones in the history of data visualization: A case study in statisticalhistoriography. In W. Gaul and C. Weihs, eds., Studies in Classification, Data Analysis, andKnowledge Organization. New York: Springer. (In press).
Friendly, M. and Kwan, E. (2003). Effect ordering for data displays. Computational Statistics andData Analysis, 43(4), 509–539.
Gabriel, K. R. (1971). The biplot graphic display of matrices with application to principalcomponents analysis. Biometrics, 58(3), 453–467.
Guerry, A.-M. (1833). Essai sur la statistique morale de la France. Paris. English translation:Hugh P. Whitt and Victor W. Reinking, Lewiston, N.Y. : Edwin Mellen Press, c2002.
Pearson, K. (1920). Notes on the history of correlation. Biometrika, 13(1), 25–45.
National Academies of Sciences, March 2005 48 c© Michael Friendly