MS1b Statistical Machine Learning and Data Mining

Yee Whye Teh
Department of Statistics, Oxford
http://www.stats.ox.ac.uk/~teh/smldm.html
Course Information
- Course webpage: http://www.stats.ox.ac.uk/~teh/smldm.html
- Lecturer: Yee Whye Teh
- TA for Part C: Thibaut Lienant
- TA for MSc: Balaji Lakshminarayanan and Maria Lomeli
- Please subscribe to the Google Group:
  https://groups.google.com/forum/?hl=en-GB#!forum/smldm
- Sign up for the course using the sign-up sheets.
Course Structure

Lectures
- 1400-1500 Mondays in Math Institute L4.
- 1000-1100 Wednesdays in Math Institute L3.

Part C:
- 6 problem sheets.
- Classes: 1600-1700 Tuesdays (Weeks 3-8) in 1 SPR Seminar Room.
- Due Fridays the week before classes at noon in 1 SPR.

MSc:
- 4 problem sheets.
- Classes: Tuesdays (Weeks 3, 5, 7, 9) in 2 SPR Seminar Room.
- Group A: 1400-1500, Group B: 1500-1600.
- Due Fridays the week before classes at noon in 1 SPR.
- Practical: Weeks 5 and 7 (assessed) in 1 SPR Computing Lab.
- Group A: 1400-1600, Group B: 1600-1800.
Course Aims
1. Have the ability to use the relevant R packages to analyse data, interpret results, and evaluate methods.
2. Have the ability to identify and use appropriate methods and models for given data and task.
3. Understand the statistical theory framing machine learning and data mining.
4. Be able to construct appropriate models and derive learning algorithms for given data and task.
What is Machine Learning?
[Diagram: an agent receives sensory data from the world and asks: What's out there? How does the world work? What's going to happen? What should I do?]
What is Machine Learning?
[Diagram: data is turned into information, structure, predictions, decisions and actions.]
http://gureckislab.org
What is Machine Learning?
[Diagram: Machine Learning at the intersection of statistics, computer science, cognitive science, psychology, mathematics, engineering, operations research, physics, biology/genetics, and business/finance.]
What is the Difference?
Traditional Problems in Applied Statistics
A well formulated question that we would like to answer. Expensive to gather data and/or expensive to do computation. Create specially designed experiments to collect high quality data.

Current Situation: Information Revolution
- Improvements in computers and data storage devices.
- Powerful data capturing devices.
- Lots of data with potentially valuable information available.
What is the Difference?
Data characteristics
- Size
- Dimensionality
- Complexity
- Messy
- Secondary sources

Focus on generalization performance
- Prediction on new data
- Action in new circumstances
- Complex models needed for good generalization.

Computational considerations
- Large scale and complex systems
Applications of Machine Learning
- Pattern Recognition
  - Sorting Cheques
  - Reading License Plates
  - Sorting Envelopes
  - Eye / Face / Fingerprint Recognition
Applications of Machine Learning
- Business applications
  - Help companies intelligently find information
  - Credit scoring
  - Predict which products people are going to buy
  - Recommender systems
  - Autonomous trading
- Scientific applications
  - Predict cancer occurrence/type and health of patients / personalized health
  - Make sense of complex physical, biological, ecological, sociological models
Further Readings, News and Applications
Links are clickable in the pdf. More recent news is posted on the course webpage.
- Leo Breiman: Statistical Modeling: The Two Cultures
- NY Times: R
- NY Times: Career in Statistics
- NY Times: Data Mining in Walmart
- NY Times: Big Data's Impact In the World
- Economist: Data, Data Everywhere
- McKinsey: Big data: The Next Frontier for Competition
- NY Times: Scientists See Promise in Deep-Learning Programs
- New Yorker: Is "Deep Learning" a Revolution in Artificial Intelligence?
Types of Machine Learning
Unsupervised Learning
Uncover structure hidden in 'unlabelled' data.
- Given a network of social interactions, find communities.
- Given shopping habits for people using loyalty cards, find groups of 'similar' shoppers.
- Given expression measurements of 1000s of genes for 1000s of patients, find groups of functionally similar genes.

Goal: Hypothesis generation, visualization.
Types of Machine Learning
Supervised Learning
A database of examples along with "labels" (task-specific).
- Given a network of social interactions along with their browsing habits, predict what news users might find interesting.
- Given expression measurements of 1000s of genes for 1000s of patients along with an indicator of absence or presence of a specific cancer, predict if the cancer is present for a new patient.
- Given expression measurements of 1000s of genes for 1000s of patients along with survival length, predict survival time.

Goal: Prediction on new examples.
Types of Machine Learning
Semi-supervised Learning
A database of examples, only a small subset of which are labelled.

Multi-task Learning
A database of examples, each of which has multiple labels corresponding to different prediction tasks.

Reinforcement Learning
An agent acting in an environment, given rewards for performing appropriate actions, learns to maximize its reward.
OxWaSP
Oxford-Warwick Centre for Doctoral Training in Statistics
- The programme aims to produce Europe's future research leaders in statistical methodology and computational statistics for modern applications.
- 10 fully-funded (UK, EU) students a year (1 international).
- Website for prospective students.
- Deadline: January 24, 2014
Exploratory Data Analysis
Notation
- Data consists of p measurements (variables/attributes) on n examples (observations/cases).
- X is an n × p matrix with X_ij := the j-th measurement for the i-th example:

  $$X = \begin{pmatrix}
  x_{11} & x_{12} & \cdots & x_{1j} & \cdots & x_{1p} \\
  x_{21} & x_{22} & \cdots & x_{2j} & \cdots & x_{2p} \\
  \vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\
  x_{i1} & x_{i2} & \cdots & x_{ij} & \cdots & x_{ip} \\
  \vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\
  x_{n1} & x_{n2} & \cdots & x_{nj} & \cdots & x_{np}
  \end{pmatrix}$$

- Denote the i-th data item by x_i ∈ R^p. (This is the transpose of the i-th row of X.)
- Assume x_1, ..., x_n are independently and identically distributed samples of a random vector X over R^p.
Crabs Data (n = 200, p = 5)
Campbell (1974) studied rock crabs of the genus Leptograpsus. One species, L. variegatus, had been split into two new species, previously grouped by colour, orange and blue. Preserved specimens lose their colour, so it was hoped that morphological differences would enable museum material to be classified.

Data are available on 50 specimens of each sex of each species, collected on sight at Fremantle, Western Australia. Each specimen has measurements on:
- the width of the frontal lobe FL,
- the rear width RW,
- the length along the carapace midline CL,
- the maximum width CW of the carapace, and
- the body depth BD in mm,
in addition to colour (species) and sex.
Crabs Data I
## load package MASS containing the data
library(MASS)
## look at data
crabs

## assign predictor and class variables
Crabs <- crabs[, 4:8]
Crabs.class <- factor(paste(crabs[,1], crabs[,2], sep=""))

## various plots
boxplot(Crabs)
hist(Crabs$FL, col='red', breaks=20, xlab='Frontal Lobe Size (mm)')
hist(Crabs$RW, col='red', breaks=20, xlab='Rear Width (mm)')
hist(Crabs$CL, col='red', breaks=20, xlab='Carapace Length (mm)')
hist(Crabs$CW, col='red', breaks=20, xlab='Carapace Width (mm)')
hist(Crabs$BD, col='red', breaks=20, xlab='Body Depth (mm)')
plot(Crabs, col=unclass(Crabs.class))
parcoord(Crabs)
Crabs data

   sp sex index   FL   RW   CL   CW   BD
1   B   M     1  8.1  6.7 16.1 19.0  7.0
2   B   M     2  8.8  7.7 18.1 20.8  7.4
3   B   M     3  9.2  7.8 19.0 22.4  7.7
4   B   M     4  9.6  7.9 20.1 23.1  8.2
5   B   M     5  9.8  8.0 20.3 23.0  8.2
6   B   M     6 10.8  9.0 23.0 26.5  9.8
7   B   M     7 11.1  9.9 23.8 27.1  9.8
8   B   M     8 11.6  9.1 24.5 28.4 10.4
9   B   M     9 11.8  9.6 24.2 27.8  9.7
10  B   M    10 11.8 10.5 25.2 29.3 10.3
11  B   M    11 12.2 10.8 27.3 31.6 10.9
12  B   M    12 12.3 11.0 26.8 31.5 11.4
13  B   M    13 12.6 10.0 27.7 31.7 11.4
14  B   M    14 12.8 10.2 27.2 31.8 10.9
15  B   M    15 12.8 10.9 27.4 31.5 11.0
16  B   M    16 12.9 11.0 26.8 30.9 11.4
17  B   M    17 13.1 10.6 28.2 32.3 11.0
18  B   M    18 13.1 10.9 28.3 32.4 11.2
19  B   M    19 13.3 11.1 27.8 32.3 11.3
20  B   M    20 13.9 11.1 29.2 33.3 12.1
Univariate Boxplots
[Boxplots of FL, RW, CL, CW and BD (mm).]
Univariate Histograms

[Histograms of Frontal Lobe Size (mm), Rear Width (mm), Carapace Length (mm), Carapace Width (mm) and Body Depth (mm), with frequency on the vertical axis.]
Simple Pairwise Scatterplots
[Pairwise scatterplots of FL, RW, CL, CW and BD.]
Parallel Coordinate Plots
[Parallel coordinate plot of FL, RW, CL, CW and BD.]
Visualization and Dimensionality Reduction
These summary plots are helpful, but do not really help very much if the dimensionality of the data is high (a few dozen or thousands).

Visualizing higher-dimensional problems:
- We are constrained to view data in 2 or 3 dimensions.
- Look for 'interesting' projections of X into lower dimensions.
- Hope that for large p, considering only k ≪ p dimensions is just as informative.

Dimensionality reduction
- For each data item x_i ∈ R^p, find a lower dimensional representation z_i ∈ R^k with k ≪ p.
- Preserve as much as possible the interesting statistical properties/relationships of the data items.
Principal Components Analysis (PCA)
- PCA considers interesting directions to be those with greatest variance.
- A linear dimensionality reduction technique: it finds an orthogonal basis v_1, v_2, ..., v_p for the data space such that:
  - The first principal component (PC) v_1 is the direction of greatest variance of the data.
  - The second PC v_2 is the direction orthogonal to v_1 of greatest variance, etc.
  - The subspace spanned by the first k PCs represents the 'best' k-dimensional representation of the data.
- The k-dimensional representation of x_i is

  $$z_i = V^\top x_i = \begin{pmatrix} v_1^\top x_i \\ \vdots \\ v_k^\top x_i \end{pmatrix},$$

  where V = [v_1, ..., v_k] ∈ R^{p×k}.
- For simplicity, we will assume from now on that our dataset is centred, i.e. we subtract the average x̄ from each x_i.
Principal Components Analysis (PCA)
- Our data set is an iid sample of a random vector X = [X_1 ... X_p]^T.
- For the 1st PC, we seek a derived variable of the form

  $$Z_1 = v_{11}X_1 + v_{12}X_2 + \cdots + v_{1p}X_p = v_1^\top X,$$

  where v_1 = [v_11, ..., v_1p]^T ∈ R^p is chosen to maximise Var(Z_1). To get a well defined problem, we fix v_1^T v_1 = 1.
- The 2nd PC is chosen to be orthogonal to the 1st and is computed in a similar way. It will have the largest variance in the remaining p − 1 dimensions, etc.
Principal Components Analysis (PCA)
[Scatterplots of a two-dimensional dataset in the original (X1, X2) coordinates and in the rotated (PC_1, PC_2) coordinates.]
Deriving the First Principal Component
- Maximise, subject to v_1^T v_1 = 1:

  $$\operatorname{Var}(Z_1) = \operatorname{Var}(v_1^\top X) = v_1^\top \operatorname{Cov}(X)\, v_1 \approx v_1^\top S v_1,$$

  where S ∈ R^{p×p} is the sample covariance matrix, i.e.

  $$S = \frac{1}{n-1}\sum_{i=1}^n (x_i - \bar{x})(x_i - \bar{x})^\top = \frac{1}{n-1}\sum_{i=1}^n x_i x_i^\top = \frac{1}{n-1} X^\top X.$$

- Rewriting this as a constrained maximisation problem,

  $$L(v_1, \lambda_1) = v_1^\top S v_1 - \lambda_1 \left(v_1^\top v_1 - 1\right).$$

- The corresponding vector of partial derivatives yields

  $$\frac{\partial L(v_1, \lambda_1)}{\partial v_1} = 2 S v_1 - 2 \lambda_1 v_1.$$

- Setting this to zero reveals the eigenvector equation, i.e. v_1 must be an eigenvector of S and λ_1 the corresponding eigenvalue.
- Since v_1^T S v_1 = λ_1 v_1^T v_1 = λ_1, the 1st PC must be the eigenvector associated with the largest eigenvalue of S.
Deriving Subsequent Principal Components
- Proceed as before, but include the additional constraint that the 2nd PC must be orthogonal to the 1st PC:

  $$L(v_2, \lambda_2, \mu) = v_2^\top S v_2 - \lambda_2 \left(v_2^\top v_2 - 1\right) - \mu \left(v_1^\top v_2\right).$$

- Solving this shows that v_2 must be the eigenvector of S associated with the 2nd largest eigenvalue, and so on.
- The eigenvalue decomposition of S is given by

  $$S = V \Lambda V^\top,$$

  where Λ is a diagonal matrix with eigenvalues λ_1 ≥ λ_2 ≥ ... ≥ λ_p ≥ 0, and V is a p × p orthogonal matrix whose columns are the p eigenvectors of S, i.e. the principal components v_1, ..., v_p.
Properties of the Principal Components
- PCs are uncorrelated:

  $$\operatorname{Cov}(X^\top v_i, X^\top v_j) \approx v_i^\top S v_j = 0 \quad \text{for } i \neq j.$$

- The total sample variance is given by

  $$\sum_{i=1}^p S_{ii} = \lambda_1 + \cdots + \lambda_p,$$

  so the proportion of total variance explained by the k-th PC is

  $$\frac{\lambda_k}{\lambda_1 + \lambda_2 + \cdots + \lambda_p}, \quad k = 1, 2, \ldots, p.$$

- S is a real symmetric matrix, so eigenvectors (principal components) are orthogonal.
- Derived variables Z_1, ..., Z_p have variances λ_1, ..., λ_p.
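These properties are easy to verify numerically. Below is a minimal sketch (not from the slides) checking them on the crabs data: the variances of the derived variables match the eigenvalues of S, and the derived variables are uncorrelated.

library(MASS)
X <- scale(as.matrix(crabs[, 4:8]), center = TRUE, scale = FALSE)  # centred data
S <- cov(X)                              # sample covariance matrix
e <- eigen(S)                            # EVD of S
Z <- X %*% e$vectors                     # derived variables Z_1, ..., Z_p
round(apply(Z, 2, var) - e$values, 10)   # variances equal eigenvalues
round(cov(Z), 10)                        # off-diagonals ~ 0: PCs uncorrelated
e$values / sum(e$values)                 # proportion of variance explained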
R code
This is what we have had before:
library(MASS)
Crabs <- crabs[, 4:8]
Crabs.class <- factor(paste(crabs[,1], crabs[,2], sep=""))
plot(Crabs, col=unclass(Crabs.class))

Now perform PCA with the function princomp. (Alternatively, solve for the PCs yourself using eigen or svd.)

Crabs.pca <- princomp(Crabs, cor=FALSE)
plot(Crabs.pca)
pairs(predict(Crabs.pca), col=unclass(Crabs.class))
Original Crabs Data
[Pairwise scatterplots of the original crabs variables FL, RW, CL, CW and BD, coloured by class.]
PCA of Crabs Data
[Pairwise scatterplots of the principal component scores Comp.1 to Comp.5, coloured by class.]
PC 2 vs PC 3
[Scatterplot of Comp.3 against Comp.2.]
PCA on Face Images
http://vismod.media.mit.edu/vismod/demos/facerec/basic.html
PCA on European Genetic Variation
http://www.nature.com/nature/journal/v456/n7218/full/nature07331.html
Comments on the use of PCA
- PCA is commonly used to project data X onto the first k PCs, giving the k-dimensional view of the data that best preserves the first two moments.
- Although PCs are uncorrelated, scatterplots sometimes reveal structures in the data other than linear correlation.
- PCA is commonly used for lossy compression of high dimensional data.
- The emphasis on variance is where the weaknesses of PCA stem from:
  - The PCs depend heavily on the units of measurement. Where the data matrix contains measurements of vastly differing orders of magnitude, the PCs will be greatly biased in the direction of the larger measurements. It is therefore recommended to calculate PCs from Corr(X) instead of Cov(X).
  - Robustness to outliers is also an issue. Variance is affected by outliers, and therefore so are PCs.
Eigenvalue Decomposition (EVD)
Eigenvalue decomposition plays a significant role in PCA. PCs are eigenvectors of S = (1/(n−1)) X^T X, and PCA properties are derived from those of eigenvectors and eigenvalues.
- For any p × p symmetric matrix S, there exist p eigenvectors v_1, ..., v_p that are pairwise orthogonal, and p associated eigenvalues λ_1, ..., λ_p, which satisfy the eigenvalue equation S v_i = λ_i v_i for all i.
- S can be written as S = V Λ V^T, where:
  - V = [v_1, ..., v_p] is a p × p orthogonal matrix;
  - Λ = diag{λ_1, ..., λ_p};
  - if S is a real-valued matrix, then the eigenvalues are real-valued as well, λ_i ∈ R for all i.
- To compute the PCA of a dataset X, we can:
  - First estimate the covariance matrix using the sample covariance S.
  - Compute the EVD of S using the R command eigen.
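As a sketch of this recipe (assuming the crabs data from MASS), eigen-based PCA reproduces the loadings found by princomp, up to arbitrary sign flips of the eigenvectors:

library(MASS)
Crabs <- crabs[, 4:8]
S <- cov(Crabs)                            # estimate covariance by the sample covariance
evd <- eigen(S)                            # EVD: S = V Lambda V'
evd$values                                 # eigenvalues, in decreasing order
evd$vectors[, 1]                           # first principal component
loadings(princomp(Crabs, cor=FALSE))[, 1]  # same direction (possibly opposite sign)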
Singular Value Decomposition (SVD)
Though the EVD does not always exist, the singular value decomposition is another matrix factorization technique that always exists, even for non-square matrices.
- X can be written as X = U D V^T, where:
  - U is an n × n matrix with orthogonal columns;
  - D is an n × p matrix with decreasing non-negative elements on the diagonal (the singular values) and zero off-diagonal elements;
  - V is a p × p matrix with orthogonal columns.
- The SVD can be computed using very fast and numerically stable algorithms. The relevant R command is svd.
Some Properties of the SVD
- Let X = U D V^T be the SVD of the n × p data matrix X.
- Note that

  $$(n-1)S = X^\top X = (UDV^\top)^\top(UDV^\top) = V D^\top U^\top U D V^\top = V D^\top D V^\top,$$

  using the orthogonality (U^T U = I_n) of U.
- The eigenvalues of S are thus the diagonal entries of (1/(n−1)) D^2, and the columns of the orthogonal matrix V are the eigenvectors of S.
- We also have

  $$X X^\top = (UDV^\top)(UDV^\top)^\top = U D V^\top V D^\top U^\top = U D D^\top U^\top,$$

  using the orthogonality (V^T V = I_p) of V.
- The SVD also gives the optimal low-rank approximations of X:

  $$\min_{\tilde{X}} \|\tilde{X} - X\|^2 \quad \text{s.t. } \tilde{X} \text{ has maximum rank } r < n, p.$$

  This problem can be solved by keeping only the r largest singular values of X, zeroing out the smaller singular values in the SVD.
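A minimal sketch of this low-rank approximation (again on the centred crabs data, an illustrative assumption): the squared error of the best rank-r approximation equals the sum of the discarded squared singular values.

library(MASS)
X <- scale(as.matrix(crabs[, 4:8]), center = TRUE, scale = FALSE)
sv <- svd(X)                                       # X = U D V'
r <- 2                                             # keep the r largest singular values
Xr <- sv$u[, 1:r] %*% diag(sv$d[1:r]) %*% t(sv$v[, 1:r])
sum((X - Xr)^2)                                    # squared reconstruction error
sum(sv$d[-(1:r)]^2)                                # equals sum of discarded d_k^2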
Biplots
- PCA plots show the data items (rows of X) in the PC space.
- Biplots allow us to visualize the original variables (columns of X) in the same plot.
- As for PCA, we would like the geometry of the plot to preserve as much of the covariance structure as possible.
Biplots
Recall that X = [X_1, ..., X_p]^T and X = U D V^T is the SVD of the data matrix.
- The PC projection of x_i is:

  $$z_i = V^\top x_i = D U_i^\top = [D_{11}U_{i1}, \ldots, D_{kk}U_{ik}]^\top.$$

- The j-th unit vector e_j ∈ R^p points in the direction of X_j. Its PC projection is V_j^T = V^T e_j, the j-th row of V.
- The projection of the variable indicates the weighting each PC gives to the original variables.
- Dot products between the projections give entries of the data matrix:

  $$x_{ij} = \sum_{k=1}^p U_{ik} D_{kk} V_{jk} = \langle D U_i^\top, V_j^\top \rangle.$$

- The distance of projected points from projected variables gives the original location.
- These relationships can be plotted in 2D by focussing on the first two PCs.
Biplots
[Biplot of a simulated two-dimensional dataset, with points and variable axes X1, X2.]
Biplots
- There are other projections we can consider for biplots:

  $$x_{ij} = \sum_{k=1}^p U_{ik} D_{kk} V_{jk} = \langle D U_i^\top, V_j^\top \rangle = \langle D^{1-\alpha} U_i^\top, D^{\alpha} V_j^\top \rangle,$$

  where 0 ≤ α ≤ 1. The α = 1 case has some nice properties.
- The covariance of the projected points is:

  $$\frac{1}{n-1}\sum_{i=1}^n U_i^\top U_i = \frac{1}{n-1} I.$$

  Projected points are uncorrelated and the dimensions are equi-variance.
- The covariance between X_j and X_ℓ is:

  $$\operatorname{Cov}(X_j, X_\ell) = \frac{1}{n-1} \langle D V_j^\top, D V_\ell^\top \rangle,$$

  so the angle between the projected variables gives the correlation.
- When using k < p PCs, the quality depends on the proportion of variance explained by the PCs.
Biplots
[Biplots of the simulated dataset (100 numbered points) against Comp.1 and Comp.2, with variable axes X1 and X2, for the two scalings below.]

pc <- princomp(x)
biplot(pc, scale=0)
biplot(pc, scale=1)
Iris Data
50 samples from each of 3 species of iris: Iris setosa, versicolor and virginica.

For each flower, the lengths and widths of both sepals and petals were measured.

Collected by E. Anderson (1935) and analysed by R.A. Fisher (1936).

Again using the functions princomp and biplot:

iris1 <- iris
iris1 <- iris1[,-5]
biplot(princomp(iris1, cor=T))
Iris Data
[Biplot of the iris data: the 150 flowers plotted against Comp.1 and Comp.2, with variable axes Sepal.Length, Sepal.Width, Petal.Length and Petal.Width.]
US Arrests Data
This data set contains statistics, in arrests per 100,000 residents, for assault, murder and rape in each of the 50 US states in 1973. Also given is the percent of the population living in urban areas.

pairs(USArrests)
usarrests.pca <- princomp(USArrests, cor=T)
plot(usarrests.pca)
pairs(predict(usarrests.pca))
biplot(usarrests.pca)
US Arrests Data Pairs Plot
[Pairwise scatterplots of Murder, Assault, UrbanPop and Rape.]
US Arrests Data Biplot
[Biplot of the US arrests data: the 50 states plotted against Comp.1 and Comp.2, with variable axes Murder, Assault, UrbanPop and Rape.]
Multidimensional Scaling
Suppose there are n points X in R^p, but we are only given the n × n matrix D of inter-point distances.

Can we reconstruct X?
Multidimensional Scaling
Rigid transformations (translations, rotations and reflections) do not change inter-point distances, so we cannot recover X exactly. However, X can be recovered up to these transformations!
- Let d_ij = ‖x_i − x_j‖_2 be the distance between points x_i and x_j. Then

  $$d_{ij}^2 = \|x_i - x_j\|_2^2 = (x_i - x_j)^\top (x_i - x_j) = x_i^\top x_i + x_j^\top x_j - 2 x_i^\top x_j.$$

- Let B = X X^T be the n × n matrix of dot products, b_ij = x_i^T x_j. The above shows that D can be computed from B.
- Some algebraic exercise shows that B can be recovered from D if we assume ∑_{i=1}^n x_i = 0.
Multidimensional Scaling
- If we knew X, then the SVD gives X = U D V^T. As X has rank k = min(n, p), we have at most k singular values in D, and we can assume U ∈ R^{n×k}, D ∈ R^{k×p} and V ∈ R^{p×p}.
- The eigendecomposition of B is then:

  $$B = X X^\top = U D D^\top U^\top = U \Lambda U^\top.$$

- This eigendecomposition can be obtained from B without knowledge of X!
- Let x̃_i^T = U_i Λ^{1/2} be the i-th row of U Λ^{1/2}. Pad x̃_i with 0s so that it has length p. Then

  $$\tilde{x}_i^\top \tilde{x}_j = U_i \Lambda U_j^\top = b_{ij} = x_i^\top x_j,$$

  and we have found a set of vectors with dot products given by B.
- The vectors x̃_i differ from x_i only via the orthogonal matrix V, so they are equivalent up to rotations and reflections.
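The missing step, recovering B from D, is done by double centring. A minimal sketch of the whole classical MDS construction (assuming Euclidean distances on the centred crabs data, purely for illustration):

library(MASS)
X <- scale(as.matrix(crabs[, 4:8]), center = TRUE, scale = FALSE)
D2 <- as.matrix(dist(X))^2               # squared inter-point distances
n <- nrow(D2)
J <- diag(n) - matrix(1/n, n, n)         # centring matrix
B <- -0.5 * J %*% D2 %*% J               # recovers B = X X'
e <- eigen(B)
Xtilde <- e$vectors[, 1:2] %*% diag(sqrt(e$values[1:2]))  # 2-d configuration
head(Xtilde)                             # agrees with cmdscale up to sign
head(cmdscale(dist(X), k = 2))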
US City Flight Distances
We present a table of flying mileages between 10 American cities, distances calculated from our 2-dimensional world. Using D as the starting point, metric MDS finds a configuration with the same distance matrix.

     ATLA CHIG DENV HOUS   LA MIAM   NY   SF SEAT   DC
ATLA    0  587 1212  701 1936  604  748 2139 2182  543
CHIG  587    0  920  940 1745 1188  713 1858 1737  597
DENV 1212  920    0  879  831 1726 1631  949 1021 1494
HOUS  701  940  879    0 1374  968 1420 1645 1891 1220
LA   1936 1745  831 1374    0 2339 2451  347  959 2300
MIAM  604 1188 1726  968 2339    0 1092 2594 2734  923
NY    748  713 1631 1420 2451 1092    0 2571 2408  205
SF   2139 1858  949 1645  347 2594 2571    0  678 2442
SEAT 2182 1737 1021 1891  959 2734 2408  678    0 2329
DC    543  597 1494 1220 2300  923  205 2442 2329    0
US City Flight Distances
library(MASS)
us <- read.csv("http://www.stats.ox.ac.uk/~teh/teaching/smldm/data/uscities.csv")
## use classical MDS to find lower dimensional views of the data
## recover X in 2 dimensions
us.classical <- cmdscale(d=us, k=2)

plot(us.classical)
text(us.classical, labels=names(us))
US City Flight Distances
[2-dimensional MDS configuration of ATLANTA, CHICAGO, DENVER, HOUSTON, LA, MIAMI, NY, SF, SEATTLE and DC, which recovers the map of the US up to rotation and reflection.]
Lower-dimensional Reconstructions
In the classical MDS derivation, we used all eigenvalues in the eigendecomposition of B to reconstruct

$$\tilde{x}_i = U_i \Lambda^{\frac{1}{2}}.$$

We can use only the largest k < min(n, p) eigenvalues and eigenvectors in the reconstruction, giving the 'best' k-dimensional view of the data.

This is analogous to PCA, where only the largest eigenvalues of X^T X are used, and the smallest ones are effectively suppressed.

Indeed, PCA and classical MDS are duals and yield effectively the same result.
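A quick numerical check of this duality (a sketch on the crabs data): the classical MDS coordinates computed from Euclidean distances correlate perfectly, up to sign, with the PC scores.

library(MASS)
Crabs <- crabs[, 4:8]
mds2 <- cmdscale(dist(Crabs), k = 2)     # classical MDS in 2 dimensions
pca2 <- predict(princomp(Crabs))[, 1:2]  # first two PC scores
cor(mds2[, 1], pca2[, 1])                # +1 or -1
cor(mds2[, 2], pca2[, 2])                # +1 or -1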
Crabs Data

library(MASS)
Crabs <- crabs[,4:8]
Crabs.class <- factor(paste(crabs[,1], crabs[,2], sep=""))
crabsmds <- cmdscale(d=dist(Crabs), k=2)
plot(crabsmds, pch=20, cex=2, col=unclass(Crabs.class))
[Scatterplot of the 2-dimensional MDS configuration (MDS 1 vs MDS 2), coloured by species/sex class.]
Crabs Data
Compare with the previous PCA analysis. The classical MDS solution corresponds to the first 2 PCs.
[Pairwise scatterplots of the PCA scores Comp.1 to Comp.5, as before.]
Example: Language data
Presence or absence of 2867 homologous traits in 87 Indo-European languages.

> X[1:15,1:16]
                V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16
Irish_A          0  0  0  0  1  0  0  0  0   0   0   0   0   0   0   0
Irish_B          0  0  0  0  1  0  0  0  0   0   0   0   0   0   0   0
Welsh_N          0  0  0  1  0  0  0  0  0   0   0   0   0   0   0   0
Welsh_C          0  0  0  1  0  0  0  0  0   0   0   0   0   0   0   0
Breton_List      0  0  0  0  1  0  0  0  0   0   0   0   0   0   0   0
Breton_SE        0  0  0  0  1  0  0  0  0   0   0   0   0   0   0   0
Breton_ST        0  0  0  0  1  0  0  0  0   0   0   0   0   0   0   0
Romanian_List    0  1  0  0  0  0  0  0  0   0   0   0   0   0   0   0
Vlach            0  1  0  0  0  0  0  0  0   0   0   0   0   0   0   0
Italian          0  1  0  0  0  0  0  0  0   0   0   0   0   0   0   0
Ladin            0  1  0  0  0  0  0  0  0   0   0   0   0   0   0   0
Provencal        0  1  0  0  0  0  0  0  0   0   0   0   0   0   0   0
French           0  1  0  0  0  0  0  0  0   0   0   0   0   0   0   0
Walloon          0  1  0  0  0  0  0  0  0   0   0   0   0   0   0   0
French_Creole_C  0  1  0  0  0  0  0  0  0   0   0   0   0   0   0   0
Example: Language data
Using MDS with non-metric scaling.

Sammon mapping puts more weight than classical MDS (i.e. cmdscale, which minimises ∑(d_ij² − d̃_ij²)²) on reproducing the separation of points which are close, forcing them apart. Projection by MDS (Jaccard dissimilarity, Sammon mapping) with cluster discovery by k-means:

[Sammon configuration of the 87 Indo-European languages, labelled by name and coloured by k-means cluster.]

There is an obvious east-to-west (top-left to bottom-right) separation of the languages in the MDS, and the clusters in the MDS grouping agree with the clusters discovered by agglomerative clustering and k-means. The two clustering methods group the languages slightly differently, with k-means splitting the Germanic languages.
## (alternative/MDS) make a field to display the clusters
## use MDS - sammon does this nicely
di.sam <- sammon(D,magic=0.20000002,niter=1000,tol=1e-8)
eqscplot(di.sam$points,pch=km$cluster,col=km$cluster)
text(di.sam$points,labels=row.names(X),pos=4,col=km$cluster)
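The objects D and km above come from earlier steps of the practical. A sketch of how they might be built (Jaccard-type binary dissimilarities and k-means clustering; both lines are illustrative assumptions, not the practical's reference code):

D <- dist(X, method = "binary")          # binary (Jaccard) dissimilarity between languages
km <- kmeans(X, centers = 4)             # cluster discovery by k-means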
Varieties of MDS
Generally, MDS is a class of dimensionality reduction techniques which represent data points x_1, ..., x_n ∈ R^p in a lower-dimensional space z_1, ..., z_n ∈ R^k while trying to preserve the inter-point (dis)similarities.
- It requires only the matrix D of pairwise dissimilarities

  $$d_{ij} = d(x_i, x_j).$$

  For example, we can use the Euclidean distance d_ij = ‖x_i − x_j‖_2. Other dissimilarities are possible. Conversely, it can use a matrix of similarities.
- MDS finds representations z_1, ..., z_n ∈ R^k such that

  $$d(x_i, x_j) \approx \tilde{d}_{ij} = \tilde{d}(z_i, z_j),$$

  where d̃ represents dissimilarity in the reduced k-dimensional space, and differences in dissimilarities are measured by a stress function S(d_ij, d̃_ij).
Varieties of MDS
Choices of (dis)similarities and stress functions lead to different objective functions and different algorithms.
- Classical - preserves similarities instead:

  $$S(Z) = \sum_{i \neq j} \left(s_{ij} - \langle z_i - \bar{z}, z_j - \bar{z} \rangle\right)^2$$

- Metric Shepard-Kruskal:

  $$S(Z) = \sum_{i \neq j} \left(d_{ij} - \|z_i - z_j\|_2\right)^2$$

- Sammon - preserves shorter distances more:

  $$S(Z) = \sum_{i \neq j} \frac{\left(d_{ij} - \|z_i - z_j\|_2\right)^2}{d_{ij}}$$

- Non-metric Shepard-Kruskal - ignores actual distance values, using only ranks:

  $$S(Z) = \min_{g\ \text{increasing}} \sum_{i \neq j} \left(g(d_{ij}) - \|z_i - z_j\|_2\right)^2$$
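In R, the metric and non-metric Shepard-Kruskal variants are available as sammon and isoMDS in MASS. A minimal sketch of non-metric scaling on the crabs data (an illustrative choice of dataset, not the language data above):

library(MASS)
d <- dist(crabs[, 4:8])                  # Euclidean dissimilarities (must be nonzero)
fit <- isoMDS(d, k = 2)                  # minimises stress over monotone g
fit$stress                               # final stress (in percent)
plot(fit$points, col = unclass(factor(paste(crabs[,1], crabs[,2]))))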
Nonlinear Dimensionality Reduction
Two aims of different varieties of MDS:
- To visualize the (dis)similarities among items in a dataset, where these (dis)similarities may not have Euclidean geometric interpretations.
- To perform nonlinear dimensionality reduction.

Many high-dimensional datasets exhibit low-dimensional structure ("live on a low-dimensional manifold").
Isomap
Isomap is a non-linear dimensionality reduction technique based on classical MDS. It differs from other MDS variants in its estimate of the distances d_ij; a code sketch follows the three steps below.
1. Calculate distances d_ij for i, j = 1, ..., n between all data points, using the Euclidean distance.
2. Form a graph G with the n samples as nodes, and edges between the respective K nearest neighbours.
3. Replace the distances d_ij by the shortest-path distances d_ij^G on the graph, and perform classical MDS using these distances.
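A rough sketch of these three steps (assuming the igraph package for shortest paths; this is an illustration, not a reference implementation, and it assumes the K-NN graph is connected):

library(igraph)
isomap_sketch <- function(X, K = 7, k = 2) {
  D <- as.matrix(dist(X))                  # step 1: Euclidean distances
  n <- nrow(D)
  A <- matrix(0, n, n)
  for (i in 1:n) {                         # step 2: K-nearest-neighbour graph
    nn <- order(D[i, ])[2:(K + 1)]         # skip self at position 1
    A[i, nn] <- D[i, nn]
  }
  g <- graph_from_adjacency_matrix(pmax(A, t(A)), mode = "undirected", weighted = TRUE)
  DG <- distances(g)                       # step 3a: shortest-path distances d_ij^G
  cmdscale(DG, k = k)                      # step 3b: classical MDS on graph distances
}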
[Figures from Tenenbaum et al. (2000): the "Swiss roll" data set, illustrating how Isomap exploits geodesic rather than Euclidean distances for nonlinear dimensionality reduction, and residual-variance curves showing that Isomap recovers the true intrinsic dimensionality where PCA and MDS tend to overestimate it.]

Examples from Tenenbaum et al. (2000).
Handwritten Characters
[Figure from Tenenbaum et al. (2000): Isomap applied to 1000 handwritten "2"s from the MNIST database; the two leading embedding coordinates articulate the major features of the "2" - the bottom loop and the top arch.]
Faces
[Figure from Tenenbaum et al. (2000): Isomap applied to 698 face images (64 × 64 pixels, varying in pose and lighting); the embedding coordinates correspond to left-right pose, up-down pose and lighting direction.]
Other Nonlinear Dimensionality Reduction Techniques
- Locally Linear Embedding.
- Laplacian Eigenmaps.
- Maximum Variance Unfolding.
Neural Electroencephalography (EEG)
http://newton.umsl.edu/tsytsarev_files/Lecture02.htm
Neural Spike Waveforms
Pairs Plot of Principal Components
[Pairwise scatterplots of the leading principal components of the spike waveform data.]
Clustering
- Many datasets consist of multiple heterogeneous subsets. Cluster analysis is a range of methods that reveal this heterogeneity by discovering clusters of similar points.
- Model-based clustering:
  - Each cluster is described using a probability model.
- Model-free clustering:
  - Defined by similarity among points within clusters (and dissimilarity among points between clusters).
- Partition-based clustering methods:
  - Allocate points into K clusters.
  - The number of clusters is usually fixed beforehand, or investigated for various values of K as part of the analysis.
- Hierarchy-based clustering methods:
  - Allocate points into clusters, and clusters into super-clusters, forming a hierarchy.
  - Typically the hierarchy forms a binary tree (a dendrogram) where each cluster has two "children".
Hierarchical Clustering
- Hierarchically structured data can be found everywhere (e.g. measurements of different species, and of different individuals within species); hierarchical methods attempt to understand the data by looking for clusters.
- There are two general strategies for generating hierarchical clusters. Both proceed by seeking to minimize some measure of dissimilarity:
  - Agglomerative / Bottom-Up / Merging
  - Divisive / Top-Down / Splitting

Hierarchical clusters are generated where, at each level, clusters are created by merging clusters at lower levels. This process can easily be viewed as a dendrogram (tree).
Measuring Dissimilarity
To find hierarchical clusters, we need some way to measure the dissimilarity between clusters.
- Given two points x_i and x_j, it is straightforward to measure their dissimilarity, say d(x_i, x_j) = ‖x_i − x_j‖_2.
- It is unclear, however, how to extend this to measure dissimilarity between clusters, D(C_i, C_j) for clusters C_i and C_j.

There are many such proposals, though no consensus as to which is best. The three standard linkages are below, followed by a short code sketch.
(a) Single Linkage

  $$D(C_i, C_j) = \min_{x, y} \,(d(x, y) \mid x \in C_i,\ y \in C_j)$$

(b) Complete Linkage

  $$D(C_i, C_j) = \max_{x, y} \,(d(x, y) \mid x \in C_i,\ y \in C_j)$$

(c) Average Linkage

  $$D(C_i, C_j) = \operatorname{avg}_{x, y} \,(d(x, y) \mid x \in C_i,\ y \in C_j)$$
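A minimal sketch of the three linkages using hclust from base R (the slides later use agnes from the cluster package, which offers the same methods):

library(MASS)
d <- dist(crabs[, 4:8])                  # pairwise Euclidean dissimilarities
plot(hclust(d, method = "single"))       # single linkage
plot(hclust(d, method = "complete"))     # complete linkage
plot(hclust(d, method = "average"))      # average linkage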
Measuring Dissimilarity
[Diagram illustrating single, complete and average linkage between two clusters.]
Hierarchical Clustering on Artificial Dataset
[Scatterplot of the artificial dataset xclara (3000 points, variables V1 and V2), with points labelled by row number; three well-separated clusters are visible.]
Hierarchical Clustering on Artificial Dataset
# start afresh
library(cluster)
dat = xclara  # 3000 x 2

# plot the data
plot(dat, type="n")
text(dat, labels=row.names(dat))

plot(agnes(dat, method="single"))
plot(agnes(dat, method="complete"))
plot(agnes(dat, method="average"))
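The agglomerative coefficient printed on each of the following dendrograms can also be read off directly from the agnes object; a small sketch, assuming dat as defined above (values closer to 1 indicate more pronounced clustering structure):

ac <- sapply(c("single", "complete", "average"),
             function(m) agnes(dat, method = m)$ac)
print(ac)  # agglomerative coefficient for each linkage method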
78
Hierarchical Clustering on Artificial Dataset
[Figure: Dendrogram of agnes(x = dat, method = "single"). Agglomerative Coefficient = 0.93. Leaves: dat observations; vertical axis: Height.]
79
Hierarchical Clustering on Artificial Dataset
[Figure: Dendrogram of agnes(x = dat, method = "complete"). Agglomerative Coefficient = 0.99. Leaves: dat observations; vertical axis: Height.]
80
Hierarchical Clustering on Artificial Dataset
[Figure: Dendrogram of agnes(x = dat, method = "average"). Agglomerative Coefficient = 0.99. Leaves: dat observations; vertical axis: Height.]
81
Using Dendrograms

I Different ways of measuring dissimilarity result in different trees.
I Dendrograms are useful for getting a feel for the structure of high-dimensional data, though they don't represent distances between observations well.
I Dendrograms show hierarchical clusters with respect to increasing values of dissimilarity between clusters. Cutting a dendrogram horizontally at a particular height partitions the data into disjoint clusters, represented by the vertical lines it intersects (see the sketch below). Cutting horizontally effectively reveals the state of the clustering algorithm when the dissimilarity between clusters is no more than the value cut at.
I Despite the simplicity of this idea and the above drawbacks, hierarchical clustering methods provide users with interpretable dendrograms that allow clusters in high-dimensional data to be better understood.
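A minimal sketch of cutting a dendrogram into clusters in R, assuming dat from the earlier slides; the cut height 20 is an illustrative value, not one taken from the figures:

library(cluster)
hc <- agnes(dat, method = "average")
cl <- cutree(as.hclust(hc), h = 20)  # cut horizontally at height 20
table(cl)                            # sizes of the resulting clusters
plot(dat, col = cl)                  # colour the observations by cluster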
82
Hierarchical Clustering on Indo-European Languages
[Figure: dendrogram over Indo-European language varieties, with leaves such as Albanian_G, Hindi, Greek_K, Armenian_Mod, HITTITE, TOCHARIAN_A, Irish_A, Welsh_N, Latvian, Russian, Czech, Icelandic_ST, English_ST, German_ST, Dutch_List, French, Spanish, Italian, Ladin; dissimilarity axis from 0.0 to 1.0.]
83
K-means

Partition-based methods seek to divide data points into a pre-assigned number of clusters $C_1, \dots, C_K$ where for all $k, k' \in \{1, \dots, K\}$,

$$C_k \subset \{1, \dots, n\}, \qquad C_k \cap C_{k'} = \emptyset \quad \forall k \neq k', \qquad \bigcup_{k=1}^{K} C_k = \{1, \dots, n\}.$$

For each cluster, represent it using a prototype or cluster centre $\mu_k$. We can measure the quality of a cluster with its within-cluster deviance

$$W(C_k, \mu_k) = \sum_{i \in C_k} \|x_i - \mu_k\|_2^2.$$

The overall quality of the clustering is given by the total within-cluster deviance:

$$W = \sum_{k=1}^{K} W(C_k, \mu_k).$$

The overall objective is to choose both the cluster centres and the allocation of points to minimize the objective function.
84
K-means

$$W = \sum_{k=1}^{K} \sum_{i \in C_k} \|x_i - \mu_k\|_2^2 = \sum_{i=1}^{n} \|x_i - \mu_{c_i}\|_2^2$$

where $c_i = k$ if and only if $i \in C_k$.

I Given the partition $\{C_k\}$, we can find the optimal prototypes easily by differentiating $W$ with respect to $\mu_k$:

$$\frac{\partial W}{\partial \mu_k} = -2 \sum_{i \in C_k} (x_i - \mu_k) = 0 \quad\Rightarrow\quad \mu_k = \frac{1}{|C_k|} \sum_{i \in C_k} x_i$$

I Given the prototypes, we can easily find the optimal partition by assigning each data point to the closest cluster prototype:

$$c_i = \operatorname{argmin}_k \|x_i - \mu_k\|_2^2$$

But joint minimization over both is computationally difficult.
85
K-means

The K-means algorithm is a well-known method that locally optimizes the objective function $W$ by iterative, alternating minimization (a code sketch follows the steps):

1. Randomly fix $K$ cluster centres $\mu_1, \dots, \mu_K$.
2. For each $i = 1, \dots, n$, assign each $x_i$ to the cluster with the nearest centre:
$$c_i := \operatorname{argmin}_k \|x_i - \mu_k\|_2^2$$
3. Set $C_k := \{i : c_i = k\}$ for each $k$.
4. Move the cluster centres $\mu_1, \dots, \mu_K$ to the averages of the new clusters:
$$\mu_k := \frac{1}{|C_k|} \sum_{i \in C_k} x_i$$
5. Repeat steps 2 to 4 until there are no more changes.
6. Return the partition $\{C_1, \dots, C_K\}$ and means $\mu_1, \dots, \mu_K$.
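A minimal sketch of these steps in R, for illustration only; simple_kmeans is a made-up name, and in practice one would use the built-in kmeans() (as on the following slides):

simple_kmeans <- function(X, K, max.iter = 100) {
  X <- as.matrix(X)
  n <- nrow(X)
  mu <- X[sample(n, K), , drop = FALSE]        # step 1: random initial centres
  c_old <- rep(0, n)
  for (it in 1:max.iter) {
    # step 2: squared distance of every point to every centre, then assign
    d2 <- sapply(1:K, function(k)
      rowSums((X - matrix(mu[k, ], n, ncol(X), byrow = TRUE))^2))
    c_new <- apply(d2, 1, which.min)
    if (all(c_new == c_old)) break             # step 5: stop when nothing changes
    # steps 3-4: recompute each non-empty cluster's centre as its average
    for (k in 1:K)
      if (any(c_new == k)) mu[k, ] <- colMeans(X[c_new == k, , drop = FALSE])
    c_old <- c_new
  }
  list(cluster = c_new, centers = mu)          # step 6
}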
86
K-means
Some notes about the K-means algorithm.
I The algorithm stops in a finite number of iterations. Between steps 2 and 3, W either stays constant or decreases; this implies that we never revisit the same partition. As there are only finitely many partitions, the number of iterations cannot exceed the number of partitions.
I The K-means algorithm need not converge to the global optimum. K-means is a heuristic search algorithm, so it can get stuck at suboptimal configurations; the result depends on the starting configuration. Typically one performs a number of runs from different starting configurations and picks the best clustering (see the sketch below).
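In R, such restarts are built into kmeans() through the nstart argument; a one-line sketch, where X stands for an arbitrary data matrix:

cl <- kmeans(X, centers = 3, nstart = 20)  # 20 random restarts, best run kept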
87
K-means on Crabs
Looking at the Crabs data again.
library(MASS)
library(lattice)
data(crabs)

splom(~log(crabs[,4:8]),
      col=as.numeric(crabs[,1]),
      pch=as.numeric(crabs[,2]),
      main="circle/triangle is gender, black/red is species")
88
K-means on Crabs

[Figure: scatter plot matrix (splom) of log(crabs[,4:8]), variables FL, RW, CL, CW, BD; circle/triangle is gender, black/red is species.]
89
K-means on Crabs
Apply K-means with 2 clusters and plot results.
cl <- kmeans(log(crabs[,4:8]), 2, nstart=1, iter.max=10)

splom(~log(crabs[,4:8]),
      col=cl$cluster+2,
      main="blue/green is cluster finds big/small")
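A quick check of what the two clusters pick up, a sketch assuming cl from above; kmeans() also reports the total within-cluster deviance W as tot.withinss:

table(cl$cluster, crabs$sp)   # clusters vs species
table(cl$cluster, crabs$sex)  # clusters vs gender
cl$tot.withinss               # total within-cluster deviance W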
90
K-means on Crabs

[Figure: scatter plot matrix of log(crabs[,4:8]) coloured by the two K-means clusters; blue/green is cluster, and K-means finds big/small crabs.]
91
K-means on Crabs
‘Whiten’ or ‘sphere’1 the data using PCA.
pcp <- princomp(log(crabs[,4:8]))
spc <- pcp$scores %*% diag(1/pcp$sdev)
splom(~spc[,1:3],
      col=as.numeric(crabs[,1]),
      pch=as.numeric(crabs[,2]),
      main="circle/triangle is gender, black/red is species")
And apply K-means again.
cl <- kmeans(spc, 2, nstart=1, iter.max=20)
splom(~spc[,1:3],
      col=cl$cluster+2,
      main="blue/green is cluster")
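A small sanity check, a sketch assuming spc from above: after sphering, the sample covariance of the scores should be approximately the identity matrix.

round(cov(spc), 2)  # approximately the 5x5 identity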
1 Apply a linear transformation so that the covariance matrix is the identity.

92
K-means on Crabs
circle/triangle is gender, black/red is species
[Scatterplot matrix of the first three sphered components V1, V2, V3]
blue/green is cluster
[Scatterplot matrix of V1, V2, V3, coloured by the 2-means clustering]
Discovers the gender difference... The result depends crucially on sphering the data first.
93
K-means on Crabs
Using 4 cluster centers.
circle/triangle is gender, black/red is species
[Scatterplot matrix of V1, V2, V3, labelled by gender and species]
colors are clusters
[Scatterplot matrix of V1, V2, V3, coloured by the 4-means clustering]
94
K-means on Spike Waveforms
library(MASS)
library(lattice)
spikespca <- read.table("spikes.txt")
cl <- kmeans(spikespca, 6, nstart=20)
splom(spikespca, col=cl$cluster)
95
K-means on Spike Waveforms
[Scatterplot matrix of the five spike waveform principal components V1-V5, coloured by the 6-means clustering]
96
Stochastic Optimization
I Each iteration of K-means requires a pass through the whole dataset. In extremely large datasets, this can be computationally prohibitive.
I Stochastic optimization: update cluster means after assigning each data point to the closest cluster.
I Repeat for t = 1, 2, . . . until satisfactory convergence:
1. Pick data item $x_i$ either randomly or in order.
2. Assign $x_i$ to the cluster with the nearest centre, $c_i := \arg\min_k \|x_i - \mu_k\|_2^2$.
3. Update the cluster centre: $\mu_{c_i} := \mu_{c_i} + \alpha_t (x_i - \mu_{c_i})$, where $\alpha_t > 0$ are step sizes.
I The algorithm stochastically minimizes the objective function (see the sketch below). Convergence requires slowly decreasing step sizes:
$\sum_{t=1}^\infty \alpha_t = \infty \qquad \sum_{t=1}^\infty \alpha_t^2 < \infty$
97
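A minimal R sketch of this online update; the function name, the 1/t step-size schedule and the random initialisation are illustrative choices, not prescribed by the slides. X is assumed to be a numeric data matrix.

onlineKmeans <- function(X, K, npasses = 5) {
  n <- nrow(X)
  mu <- X[sample(n, K), , drop = FALSE]     # initialise centres at K random data points
  step <- 0
  for (pass in 1:npasses) {
    for (i in sample(n)) {                  # pick data items in random order
      step <- step + 1
      alpha <- 1 / step                     # satisfies sum(alpha) = Inf, sum(alpha^2) < Inf
      d <- colSums((t(mu) - X[i, ])^2)      # squared distances to all K centres
      k <- which.min(d)                     # assign x_i to its nearest centre
      mu[k, ] <- mu[k, ] + alpha * (X[i, ] - mu[k, ])   # move that centre towards x_i
    }
  }
  mu
}
# e.g. mu <- onlineKmeans(spc, 2)   # on the sphered crabs data from earlier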
Vector Quantization
I A related algorithm developed in the signal processing literature for lossy data compression.
I If $K \ll n$, we can store the codebook of codewords $\mu_1, \ldots, \mu_K$, and each vector $x_i$ is encoded using $c_i$, which only requires $\lceil \log K \rceil$ bits.
I As with K-means, K must be specified. Increasing K improves the quality of the compressed image but worsens the data compression rate, so there is a clear tradeoff.
I Some audio and video codecs use this method.
I Stochastic optimization algorithm for K-means was originally developed for VQ.
98
VQ Image Compression
3×3 block VQ: view each block of 3×3 pixels as a single observation (a code sketch follows below).
99
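As a rough illustration (not the implementation behind the figures that follow), such a block codebook can be built with kmeans; the function names and the default codebook size are made up, and img is assumed to be a greyscale matrix whose dimensions are multiples of 3.

vqCompress <- function(img, K = 64) {
  # cut the image into 3x3 blocks, one row of `blocks` per block
  rows <- seq(1, nrow(img) - 2, by = 3)
  cols <- seq(1, ncol(img) - 2, by = 3)
  blocks <- do.call(rbind, lapply(rows, function(r)
    t(sapply(cols, function(co) as.vector(img[r:(r+2), co:(co+2)])))))
  cl <- kmeans(blocks, K, nstart = 5)
  # store only the codebook (K nine-dimensional codewords) plus one code per block
  list(codebook = cl$centers, codes = cl$cluster,
       nr = length(rows), nc = length(cols))
}

vqDecompress <- function(vq) {
  rec <- matrix(0, vq$nr * 3, vq$nc * 3)
  b <- 1
  for (i in seq_len(vq$nr)) for (j in seq_len(vq$nc)) {
    rec[(3*i-2):(3*i), (3*j-2):(3*j)] <- matrix(vq$codebook[vq$codes[b], ], 3, 3)
    b <- b + 1
  }
  rec
}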
VQ Image Compression
Original image (24 bits/pixel, uncompressed size 1,402 kB)
100
VQ Image Compression
Codebook length 1024 (1.11 bits/pixel, total size 88kB)
101
VQ Image Compression
Codebook length 128 (0.78 bits/pixel, total size 50kB)
102
VQ Image Compression
Codebook length 16 (0.44 bits/pixel, total size 27kB)
103
K-means Additional Comments
I Sensitivity to distance measure. Euclidean distance can be greatly affected by measurement unit and by strong correlations. Can use the Mahalanobis distance,
$\|x - y\|_M = \sqrt{(x - y)^\top M^{-1} (x - y)}$
where $M$ is a positive semi-definite matrix, e.g. the sample covariance (see the sketch below).
I Other partition based methods. There are many other partition based methods that employ related ideas. For example, K-medoids differs from K-means in requiring cluster centres $\mu_i$ to be an observation $x_i$², K-medians uses the median in each dimension and K-modes the mode.
I Determination of K. The K-means objective will always improve with a larger number of clusters K. Determination of K requires an additional regularization criterion. E.g., in DP-means³, use
$W = \sum_{k=1}^K \sum_{i \in C_k} \|x_i - \mu_k\|_2^2 + \lambda K$
² See also Affinity propagation.
³ DP-means paper.
104
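For instance, base R's mahalanobis() computes the squared distances above; a small sketch on the crabs measurements from earlier, where the choice of reference point and of M is illustrative:

library(MASS)                          # crabs data
X  <- log(crabs[, 4:8])
M  <- cov(X)                           # sample covariance as the metric M
d2 <- mahalanobis(X, colMeans(X), M)   # (x - xbar)' M^{-1} (x - xbar)
head(sqrt(d2))                         # Mahalanobis distances of the first few crabs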
Probabilistic Methods
I Algorithmic approach:
[Diagram: Data → Algorithm → Analysis/Interpretation]
I Probabilistic modelling approach:
[Diagram: Unobserved process → Generative Model → Data → Analysis/Interpretation]
105
Mixture Models
I Mixture models suppose that our dataset was created by sampling iid from K distinct populations (called mixture components).
I Typical samples in population k can be modelled using a distribution $F(\phi_k)$ with density $f(x|\phi_k)$. For a concrete example, consider a Gaussian with unknown mean $\phi_k$ and known symmetric covariance $\sigma^2 I$,
$f(x|\phi_k) = |2\pi\sigma^2|^{-\frac{p}{2}} \exp\left( -\frac{1}{2\sigma^2}\|x - \phi_k\|_2^2 \right).$
I Generative process: for i = 1, 2, . . . , n:
I First determine which population item i came from (independently):
$Z_i \sim \mathrm{Discrete}(\pi_1, \ldots, \pi_K)$, i.e. $P(Z_i = k) = \pi_k$,
where the mixing proportions are $\pi_k \ge 0$ for each k and $\sum_{k=1}^K \pi_k = 1$.
I If $Z_i = k$, then $X_i = (X_{i1}, \ldots, X_{ip})^\top$ is sampled (independently) from the corresponding population distribution:
$X_i \mid Z_i = k \sim F(\phi_k)$
I We observe that $X_i = x_i$ for each i, and would like to learn about the unknown parameters of the process (a sampler sketch follows below).
106
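This generative process translates directly into a sampler. A minimal R sketch for the spherical Gaussian case, with all parameter values invented for illustration:

set.seed(1)
n <- 300; sigma <- 1
pi_k <- c(0.5, 0.3, 0.2)                       # mixing proportions, sum to 1
phi  <- rbind(c(-4, 0), c(4, 0), c(0, 5))      # one component mean phi_k per row
z <- sample(1:3, n, replace = TRUE, prob = pi_k)        # Z_i ~ Discrete(pi_1,...,pi_K)
x <- phi[z, ] + matrix(rnorm(2 * n, sd = sigma), n, 2)  # X_i | Z_i = k ~ N(phi_k, sigma^2 I)
plot(x, col = z, xlab = "X1", ylab = "X2")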
Mixture Models
I Unknowns to learn given data are
I Parameters: $\pi_1, \ldots, \pi_K, \phi_1, \ldots, \phi_K$, as well as
I Latent variables: $z_1, \ldots, z_n$.
I The joint probability over all cluster indicator variables $\{Z_i\}$ is:
$p_Z((z_i)_{i=1}^n) = \prod_{i=1}^n \pi_{z_i} = \prod_{i=1}^n \prod_{k=1}^K \pi_k^{\mathbf{1}(z_i = k)}$
I The joint density at observations $X_i = x_i$ given $Z_i = z_i$ is:
$p_X((x_i)_{i=1}^n \mid (Z_i = z_i)_{i=1}^n) = \prod_{i=1}^n \prod_{k=1}^K f(x_i|\phi_k)^{\mathbf{1}(z_i = k)}$
I So the joint probability/density⁴ is:
$p_{X,Z}((x_i, z_i)_{i=1}^n) = \prod_{i=1}^n \prod_{k=1}^K \left( \pi_k f(x_i|\phi_k) \right)^{\mathbf{1}(z_i = k)}$
⁴ In this course we will treat probabilities and densities equivalently for notational simplicity. In general, the quantity is a density with respect to the product base measure, where the base measure is the counting measure for discrete variables and Lebesgue for continuous variables.
107
Mixture Models - Posterior Distribution
I Suppose we know the parameters $(\pi_k, \phi_k)_{k=1}^K$.
I $Z_i$ is a random variable, so the posterior distribution given the data set tells us what we know about it:
$Q_{ik} := p(Z_i = k \mid x_i) = \frac{p(Z_i = k, x_i)}{p(x_i)} = \frac{\pi_k f(x_i|\phi_k)}{\sum_{j=1}^K \pi_j f(x_i|\phi_j)}$
where the marginal probability is:
$p(x_i) = \sum_{j=1}^K \pi_j f(x_i|\phi_j)$
I The posterior probability $Q_{ik}$ of $Z_i = k$ is called the responsibility of mixture component k for data point $x_i$.
I The posterior distribution softly partitions the dataset among the K components (computed in the sketch below).
108
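Continuing the sampler sketch above (reusing x, pi_k, phi and sigma), the responsibilities follow directly from the definition; respQ is a hypothetical helper, not from the slides:

respQ <- function(X, pi_k, phi, sigma) {
  # Q[i, k] = pi_k f(x_i|phi_k) / sum_j pi_j f(x_i|phi_j)
  Q <- sapply(seq_along(pi_k), function(k)
    pi_k[k] * apply(X, 1, function(xi) prod(dnorm(xi, phi[k, ], sigma))))
  Q / rowSums(Q)                       # normalise each row
}
Q <- respQ(x, pi_k, phi, sigma)        # each row sums to 1: a soft partition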
Mixture Models - Maximum Likelihood
I How can we learn about the parameters $\theta = (\pi_k, \phi_k)_{k=1}^K$ from data?
I Standard statistical methodology asks for the maximum likelihood estimator (MLE).
I The log likelihood is the log marginal probability of the data:
$\ell((\pi_k, \phi_k)_{k=1}^K) := \log p((x_i)_{i=1}^n \mid (\pi_k, \phi_k)_{k=1}^K) = \sum_{i=1}^n \log \sum_{j=1}^K \pi_j f(x_i|\phi_j)$
$\nabla_{\phi_k} \ell((\pi_k, \phi_k)_{k=1}^K) = \sum_{i=1}^n \frac{\pi_k f(x_i|\phi_k)}{\sum_{j=1}^K \pi_j f(x_i|\phi_j)} \nabla_{\phi_k} \log f(x_i|\phi_k) = \sum_{i=1}^n Q_{ik} \nabla_{\phi_k} \log f(x_i|\phi_k)$
I A difficult equation to solve, as $Q_{ik}$ depends implicitly on $\phi_k$...
109
Mixture Models - Maximum Likelihood
$\sum_{i=1}^n Q_{ik} \nabla_{\phi_k} \log f(x_i|\phi_k) = 0$
I What if we ignore the dependence of $Q_{ik}$ on the parameters?
I Taking the mixture of Gaussians with covariance $\sigma^2 I$ as example,
$\sum_{i=1}^n Q_{ik} \nabla_{\phi_k}\left( -\frac{p}{2}\log|2\pi\sigma^2| - \frac{1}{2\sigma^2}\|x_i - \phi_k\|_2^2 \right) = \frac{1}{\sigma^2}\sum_{i=1}^n Q_{ik}(x_i - \phi_k) = \frac{1}{\sigma^2}\left( \left(\sum_{i=1}^n Q_{ik} x_i\right) - \phi_k\left(\sum_{i=1}^n Q_{ik}\right) \right) = 0$
$\phi_k^{\mathrm{MLE}?} = \frac{\sum_{i=1}^n Q_{ik} x_i}{\sum_{i=1}^n Q_{ik}}$
110
Mixture Models - Maximum Likelihood
I The estimate is a weighted average of data points, where the estimated mean of cluster k uses its responsibilities to data points as weights.
$\phi_k^{\mathrm{MLE}?} = \frac{\sum_{i=1}^n Q_{ik} x_i}{\sum_{i=1}^n Q_{ik}}$
I Makes sense: suppose we knew that data point $x_i$ came from population $z_i$. Then $Q_{iz_i} = 1$ and $Q_{ik} = 0$ for $k \neq z_i$, and:
$\phi_k^{\mathrm{MLE}} = \frac{\sum_{i: z_i = k} x_i}{\sum_{i: z_i = k} 1}$
I Our best guess of the originating population is given by $Q_{ik}$.
111
Mixture Models - Maximum Likelihood
I For the mixing proportions, we can similarly derive an estimator.
I Include a Lagrange multiplier $\lambda$ to enforce the constraint $\sum_k \pi_k = 1$:
$\nabla_{\log \pi_k} \left( \ell((\pi_k, \phi_k)_{k=1}^K) - \lambda \left( \sum_{k=1}^K \pi_k - 1 \right) \right) = \sum_{i=1}^n \frac{\pi_k f(x_i|\phi_k)}{\sum_{j=1}^K \pi_j f(x_i|\phi_j)} - \lambda \pi_k = \sum_{i=1}^n Q_{ik} - \lambda \pi_k$
$\pi_k^{\mathrm{MLE}?} = \frac{\sum_{i=1}^n Q_{ik}}{n}$
I Again makes sense: the estimate is simply (our best guess of) the proportion of data points coming from population k.
112
Mixture Models - The EM Algorithm
I Putting all the derivations together, we get an iterative algorithm for learning about the unknowns in the mixture model (an R sketch of the two steps follows below).
I Start with some initial parameters $(\pi_k^{(0)}, \phi_k^{(0)})_{k=1}^K$.
I Iterate for t = 1, 2, . . .:
I Expectation Step:
$Q_{ik}^{(t)} := \frac{\pi_k^{(t-1)} f(x_i|\phi_k^{(t-1)})}{\sum_{j=1}^K \pi_j^{(t-1)} f(x_i|\phi_j^{(t-1)})}$
I Maximization Step:
$\pi_k^{(t)} = \frac{\sum_{i=1}^n Q_{ik}^{(t)}}{n} \qquad \phi_k^{(t)} = \frac{\sum_{i=1}^n Q_{ik}^{(t)} x_i}{\sum_{i=1}^n Q_{ik}^{(t)}}$
I Will the algorithm converge?
I What does it converge to?
113
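Both steps are a few lines of R. A self-contained sketch of this EM loop for the spherical Gaussian mixture, with sigma treated as known as in the running example; the random initialisation and the fixed iteration count (no convergence check) are simplifications:

emSpherical <- function(X, K, sigma = 1, niter = 50) {
  n <- nrow(X); p <- ncol(X)
  phi <- X[sample(n, K), , drop = FALSE]   # initial means: K random data points
  pi_k <- rep(1 / K, K)
  for (iter in 1:niter) {
    # E-step: Q[i, k] proportional to pi_k * f(x_i | phi_k); the common factor
    # |2*pi*sigma^2|^(-p/2) cancels when each row is normalised
    Q <- sapply(1:K, function(k) {
      d2 <- rowSums((X - matrix(phi[k, ], n, p, byrow = TRUE))^2)
      pi_k[k] * exp(-d2 / (2 * sigma^2))
    })
    Q <- Q / rowSums(Q)
    # M-step: responsibility-weighted proportions and means
    pi_k <- colSums(Q) / n
    phi  <- (t(Q) %*% X) / colSums(Q)
  }
  list(pi = pi_k, phi = phi, Q = Q)
}
# e.g. fit <- emSpherical(as.matrix(x), K = 3)   # on the simulated data above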
Likelihood Surface for a Simple Example
[Figure (after Murphy, Fig. 11.6): (a) the sampled data; (b) likelihood surface over (µ1, µ2)]
(left) n = 200 data points from a mixture of two 1D Gaussians with $\pi_1 = \pi_2 = 0.5$, $\sigma = 5$ and $\mu_1 = 10$, $\mu_2 = -10$.
(right) Log likelihood surface $\ell(\mu_1, \mu_2)$, all the other parameters being assumed known.
114
Example: Mixture of 3 Gaussians
An example with 3 clusters.
[Scatter plot of the data: X2 against X1]
115
Example: Mixture of 3 Gaussians
After 1st E and M step.
[Plot: data[,2] against data[,1], Iteration 1]
116
Example: Mixture of 3 Gaussians
After 2nd E and M step.
[Plot: data[,2] against data[,1], Iteration 2]
117
Example: Mixture of 3 Gaussians
After 3rd E and M step.
[Plot: data[,2] against data[,1], Iteration 3]
118
Example: Mixture of 3 Gaussians
After 4th E and M step.
[Plot: data[,2] against data[,1], Iteration 4]
119
Example: Mixture of 3 Gaussians
After 5th E and M step.
[Plot: data[,2] against data[,1], Iteration 5]
120
The EM Algorithm
I In a maximum likelihood framework, the objective function is the log likelihood,
$\ell(\theta) = \sum_{i=1}^n \log \sum_{j=1}^K \pi_j f(x_i|\phi_j)$
Direct maximization is not feasible.
I Consider another objective function $F(\theta, q)$ such that:
$F(\theta, q) \le \ell(\theta)$ for all $\theta, q$,
$\max_q F(\theta, q) = \ell(\theta)$
$F(\theta, q)$ is a lower bound on the log likelihood.
I We can construct an alternating maximization algorithm as follows. For t = 1, 2 . . . until convergence:
$q^{(t)} := \arg\max_q F(\theta^{(t-1)}, q)$
$\theta^{(t)} := \arg\max_\theta F(\theta, q^{(t)})$
121
EM Algorithm
I The lower bound we use is called the variational free energy.
I q is a probability mass function for some distribution over $(Z_i)$ and
$F(\theta, q) = \mathbb{E}_q[\log p((x_i, z_i)_{i=1}^n) - \log q((z_i)_{i=1}^n)]$
$= \mathbb{E}_q\left[ \left( \sum_{i=1}^n \sum_{k=1}^K \mathbf{1}(z_i = k)\left( \log \pi_k + \log f(x_i|\phi_k) \right) \right) - \log q(z) \right]$
$= \sum_z q(z) \left[ \left( \sum_{i=1}^n \sum_{k=1}^K \mathbf{1}(z_i = k)\left( \log \pi_k + \log f(x_i|\phi_k) \right) \right) - \log q(z) \right]$
using $z := (z_i)_{i=1}^n$ to shorten notation.
122
EM Algorithm - Solving for q
I Introducing a Lagrange multiplier to enforce $\sum_z q(z) = 1$, and setting derivatives to 0,
$\nabla_{q(z)} F(\theta, q) = \sum_{i=1}^n \sum_{k=1}^K \mathbf{1}(z_i = k)\left( \log \pi_k + \log f(x_i|\phi_k) \right) - \log q(z) - 1 - \lambda$
$= \sum_{i=1}^n \left( \log \pi_{z_i} + \log f(x_i|\phi_{z_i}) \right) - \log q(z) - 1 - \lambda = 0$
$q^*(z) = \frac{\prod_{i=1}^n \pi_{z_i} f(x_i|\phi_{z_i})}{\sum_{z'} \prod_{i=1}^n \pi_{z'_i} f(x_i|\phi_{z'_i})} = \prod_{i=1}^n \frac{\pi_{z_i} f(x_i|\phi_{z_i})}{\sum_k \pi_k f(x_i|\phi_k)} = \prod_{i=1}^n p(z_i|x_i, \theta)$
I The optimal $q^*$ is simply the posterior distribution.
I Plugging the optimal $q^*$ into the variational free energy,
$F(\theta, q^*) = \sum_{i=1}^n \log \sum_{k=1}^K \pi_k f(x_i|\phi_k) = \ell(\theta)$
123
EM Algorithm - Solving for θ
I Setting the derivative with respect to $\phi_k$ to 0,
$\nabla_{\phi_k} F(\theta, q) = \sum_z q(z) \sum_{i=1}^n \mathbf{1}(z_i = k) \nabla_{\phi_k} \log f(x_i|\phi_k) = \sum_{i=1}^n q(z_i = k) \nabla_{\phi_k} \log f(x_i|\phi_k) = 0$
I This equation can be solved quite easily. E.g., for a mixture of Gaussians,
$\phi_k^* = \frac{\sum_{i=1}^n q(z_i = k) x_i}{\sum_{i=1}^n q(z_i = k)}$
I If it cannot be solved exactly, we can use a gradient ascent algorithm:
$\phi_k^* = \phi_k + \alpha \sum_{i=1}^n q(z_i = k) \nabla_{\phi_k} \log f(x_i|\phi_k)$
I This leads to the generalized EM algorithm. Further extension using stochastic optimization methods leads to the stochastic EM algorithm.
I Similar derivation for the optimal $\pi_k$ as before.
124
EM Algorithm
I Start with some initial parameters $(\pi_k^{(0)}, \phi_k^{(0)})_{k=1}^K$.
I Iterate for t = 1, 2, . . .:
I Expectation Step:
$q^{(t)}(z_i = k) := \frac{\pi_k^{(t-1)} f(x_i|\phi_k^{(t-1)})}{\sum_{j=1}^K \pi_j^{(t-1)} f(x_i|\phi_j^{(t-1)})} = \mathbb{E}_{p(z_i|x_i, \theta^{(t-1)})}[\mathbf{1}(z_i = k)]$
I Maximization Step:
$\pi_k^{(t)} = \frac{\sum_{i=1}^n q^{(t)}(z_i = k)}{n} \qquad \phi_k^{(t)} = \frac{\sum_{i=1}^n q^{(t)}(z_i = k) x_i}{\sum_{i=1}^n q^{(t)}(z_i = k)}$
I Each step increases the log likelihood:
$\ell(\theta^{(t-1)}) = F(\theta^{(t-1)}, q^{(t)}) \le F(\theta^{(t)}, q^{(t)}) \le F(\theta^{(t)}, q^{(t+1)}) = \ell(\theta^{(t)}).$
I An additional assumption, that $\nabla_\theta^2 F(\theta^{(t)}, q^{(t)})$ are negative definite with eigenvalues $< -\epsilon < 0$, implies that $\theta^{(t)} \to \theta^*$ where $\theta^*$ is a local MLE.
125
Notes on Probabilistic Approach and EM Algorithm
Some good things:
I Guaranteed convergence to locally optimal parameters.
I Formal reasoning about uncertainties, using both Bayes' Theorem and maximum likelihood theory.
I Rich language of probability theory to express a wide range of generative models, and straightforward derivation of algorithms for ML estimation.
Some bad things:
I Can get stuck in local minima, so multiple restarts are recommended.
I Slower and more expensive than K-means.
I Choice of K still problematic, but a rich array of methods for model selection comes to the rescue.
126
Flexible Gaussian Mixture Models
I Allowing each cluster to have its own mean and covariance structure gives greater flexibility in the model. Possible choices include:
I Different covariances
I Identical covariances
I Different, but diagonal covariances
I Identical and spherical covariances
127
Probabilistic PCA
I A probabilistic model related to PCA has the following generative model: for i = 1, 2, . . . , n:
I Let $k < n, p$ be given.
I Let $Y_i$ be a k-dimensional normally distributed random variable with 0 mean and identity covariance:
$Y_i \sim \mathcal{N}(0, I_k)$
I We model the distribution of the ith data point given $Y_i$ as a p-dimensional normal:
$X_i \sim \mathcal{N}(\mu + L Y_i, \sigma^2 I)$
where the parameters are a vector $\mu \in \mathbb{R}^p$, a matrix $L \in \mathbb{R}^{p \times k}$ and $\sigma^2 > 0$.
I The EM algorithm can be used for ML estimation, but PCA can more directly give an MLE (note this is not unique; see the sketch below).
I Let $\lambda_1 \ge \cdots \ge \lambda_p$ be the eigenvalues of the sample covariance and let $V \in \mathbb{R}^{p \times k}$ have columns given by the eigenvectors of the top k eigenvalues. Let $R \in \mathbb{R}^{k \times k}$ be orthogonal. Then an MLE is:
$\mu^{\mathrm{MLE}} = \bar{x} \qquad (\sigma^2)^{\mathrm{MLE}} = \frac{1}{p-k} \sum_{j=k+1}^p \lambda_j$
$L^{\mathrm{MLE}} = V \,\mathrm{diag}\left( (\lambda_1 - (\sigma^2)^{\mathrm{MLE}})^{\frac{1}{2}}, \ldots, (\lambda_k - (\sigma^2)^{\mathrm{MLE}})^{\frac{1}{2}} \right) R$
128
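These closed-form estimates can be computed directly from an eigendecomposition. A minimal R sketch, taking the orthogonal matrix R to be the identity; ppcaMLE is a hypothetical name:

ppcaMLE <- function(X, k) {
  p  <- ncol(X)
  mu <- colMeans(X)                          # mu_MLE = xbar
  e  <- eigen(cov(X), symmetric = TRUE)      # eigenvalues in decreasing order
  lambda <- e$values
  sigma2 <- mean(lambda[(k + 1):p])          # average of the discarded eigenvalues
  L <- e$vectors[, 1:k, drop = FALSE] %*%
       diag(sqrt(lambda[1:k] - sigma2), k)   # rotation R taken as the identity
  list(mu = mu, sigma2 = sigma2, L = L)
}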
Mixture of Probabilistic PCAs
I We have learnt two types of unsupervised learning techniques:
I Dimensionality reduction, e.g. PCA, MDS, Isomap.
I Clustering, e.g. K-means, linkage and mixture models.
I Probabilistic models allow us to construct more complex models from simpler pieces.
I Mixture of probabilistic PCAs allows both clustering and dimensionality reduction at the same time:
$Z_i \sim \mathrm{Discrete}(\pi_1, \ldots, \pi_K)$
$Y_i \sim \mathcal{N}(0, I_d)$
$X_i \mid Z_i = k, Y_i = y_i \sim \mathcal{N}(\mu_k + L_k y_i, \sigma^2 I_p)$
I Allows flexible modelling of covariance structure without using too many parameters.
Ghahramani and Hinton 1996
129
Mixture of Probabilistic PCAs
I PCA can reconstruct x given a low dimensional embedding z, but is linear.
I Isomap is non-linear, but cannot reconstruct x given any z.
[Figure 1 from Tenenbaum et al., Science 2000: Isomap embeddings of (A) face images and (B) handwritten "2"s from the MNIST database]
I We can learn a probabilistic mapping between the k-dimensional Isomap embedding space and the p-dimensional data space.
I Demo: [Using LLE instead of Isomap, and Mixture of factor analysers instead of Mixture of PPCAs.]
Teh and Roweis 2002
130
Further Readings—Unsupervised Learning
I Hastie et al, Chapter 14.
I James et al, Chapter 10.
I Venables and Ripley, Chapter 11.
I Tukey, John W. (1980). We need both exploratory and confirmatory. The American Statistician 34 (1): 23-25.
131