Machine Learning for Data Science (CS4786) Lecture 3
Clustering
Course Webpage: http://www.cs.cornell.edu/Courses/cs4786/2017fa/
CLUSTERING CRITERION

Minimize total within-cluster dissimilarity:
$$M_1 = \sum_{j=1}^{K} \sum_{x_s, x_t \in C_j} \mathrm{dissimilarity}(x_t, x_s)$$

Maximize between-cluster dissimilarity:
$$M_2 = \sum_{x_s, x_t :\, c(x_s) \neq c(x_t)} \mathrm{dissimilarity}(x_t, x_s)$$

Maximize smallest between-cluster dissimilarity:
$$M_3 = \min_{x_s, x_t :\, c(x_s) \neq c(x_t)} \mathrm{dissimilarity}(x_t, x_s)$$

Minimize largest within-cluster dissimilarity:
$$M_4 = \max_{j \in [K]} \max_{x_s, x_t \in C_j} \mathrm{dissimilarity}(x_t, x_s)$$
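As a concrete illustration, the four criteria can be computed directly for a toy two-cluster assignment. This is a small sketch of my own (squared Euclidean distance assumed as the dissimilarity; the data and labels are made up for illustration):

```python
import numpy as np

def dissimilarity(a, b):
    # squared Euclidean distance as the dissimilarity measure
    return float(np.sum((a - b) ** 2))

def criteria(X, labels):
    """Compute M1-M4 for points X (n x d) under cluster assignment `labels`."""
    n = len(X)
    within = [dissimilarity(X[s], X[t])
              for s in range(n) for t in range(n)
              if s != t and labels[s] == labels[t]]
    between = [dissimilarity(X[s], X[t])
               for s in range(n) for t in range(n)
               if labels[s] != labels[t]]
    M1 = sum(within)   # total within-cluster dissimilarity (minimize)
    M2 = sum(between)  # total between-cluster dissimilarity (maximize)
    M3 = min(between)  # smallest between-cluster dissimilarity (maximize)
    M4 = max(within)   # largest within-cluster dissimilarity (minimize)
    return M1, M2, M3, M4

X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
labels = [0, 0, 1, 1]
M1, M2, M3, M4 = criteria(X, labels)
```

Since every ordered pair of points contributes to exactly one of M1 or M2, their sum is fixed by the data, which is one way to see why minimizing M1 is equivalent to maximizing M2.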
CLUSTERING CRITERION
minimizing $M_1$ ≡ maximizing $M_2$
minimizing $M_5$ ≡ minimizing $M_6$

Let's build an algorithm.
SINGLE LINK CLUSTERING

Initialize n clusters, with each point $x_t$ in its own cluster.
Until there are only K clusters, do:
1. Find the closest two clusters and merge them into one cluster.
2. Update the between-cluster distances (the proximity matrix):
$$\mathrm{dissimilarity}(C_i, C_j) = \min_{t \in C_i,\, s \in C_j} \mathrm{dissimilarity}(x_t, x_s)$$
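The steps above can be sketched in code. This is a deliberately naive O(n³) illustration rather than an efficient implementation (real implementations update the proximity matrix incrementally); squared Euclidean distance is assumed as the dissimilarity, and the toy data is my own:

```python
import numpy as np

def single_link(X, K):
    """Agglomerative single-link clustering of points X (n x d) into K clusters."""
    clusters = [[t] for t in range(len(X))]  # each point starts in its own cluster

    def dist(Ci, Cj):
        # single-link proximity: smallest pairwise dissimilarity between clusters
        return min(float(np.sum((X[t] - X[s]) ** 2)) for t in Ci for s in Cj)

    while len(clusters) > K:
        # find the closest pair of clusters ...
        i, j = min(((a, b) for a in range(len(clusters))
                    for b in range(a + 1, len(clusters))),
                   key=lambda p: dist(clusters[p[0]], clusters[p[1]]))
        # ... and merge them into one cluster
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters

X = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.3, 5.0], [9.0, 0.0]])
clusters = single_link(X, K=3)
```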
Demo
SINGLE LINK OBJECTIVE

Objective for single-link: maximize the smallest between-cluster distance
$$M_3 = \min_{x_s, x_t :\, c(x_s) \neq c(x_t)} \|x_s - x_t\|_2^2$$
Single link clustering is optimal for the above objective!
SINGLE LINK OBJECTIVE

Objective for single-link: maximize the smallest between-cluster distance
$$M_3 = \min_{x_s, x_t :\, c(x_s) \neq c(x_t)} \|x_s - x_t\|_2^2$$
Single link clustering is optimal for the above objective!

Proof: Say c is the solution produced by single-link clustering, and say $c' \neq c$ is any other clustering into K clusters. Then there exist t, s such that $c'(x_t) \neq c'(x_s)$ but $c(x_t) = c(x_s)$.

Key observation: the distances of points merged by single-link (the edges on the tree) are all no larger than the smallest between-cluster distance of c:
$$\min_{s,t :\, c(x_s) \neq c(x_t)} \mathrm{dissimilarity}(x_s, x_t) \;\geq\; \max_{(x_s, x_t)\ \text{merged}} \mathrm{dissimilarity}(x_s, x_t)$$

Since $c(x_t) = c(x_s)$, the tree path from $x_t$ to $x_s$ consists of merged edges, and the $c'$ boundary must cut one of them. The length of that edge is a between-cluster distance for $c'$, so the smallest between-cluster distance of $c'$ is at most a merge distance, which by the key observation is at most the smallest between-cluster distance of c. Hence $M_3(c') \leq M_3(c)$.

[Figure: points $x_t$ and $x_s$ joined by merged tree edges of lengths a and b, with the $c'$ boundary cutting the path between them.]
CLUSTERING CRITERION

Minimize within-cluster average dissimilarity:
$$M_6 = \sum_{j=1}^{K} \sum_{s \in C_j} \mathrm{dissimilarity}(x_s, C_j)
= \sum_{j=1}^{K} \sum_{s \in C_j} \Big[ \frac{1}{|C_j|} \sum_{t \in C_j,\, t \neq s} \mathrm{dissimilarity}(x_s, x_t) \Big]
= \sum_{j=1}^{K} \frac{1}{|C_j|} \sum_{s \in C_j} \sum_{t \in C_j,\, t \neq s} \|x_s - x_t\|_2^2$$

Minimize within-cluster variance, where $r_j = \frac{1}{n_j} \sum_{x \in C_j} x$:
$$M_5 = \sum_{j=1}^{K} \sum_{t \in C_j} \|x_t - r_j\|_2^2$$
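The equivalence of minimizing $M_5$ and $M_6$ claimed earlier can be checked numerically: for squared Euclidean distance, a standard identity gives $M_6 = 2 M_5$, so the two criteria share the same minimizers. A small sketch of my own (random toy data assumed):

```python
import numpy as np

def M5(X, clusters):
    # within-cluster variance: squared distances to cluster centroids
    total = 0.0
    for C in clusters:
        r = X[C].mean(axis=0)  # centroid r_j of cluster C_j
        total += float(np.sum((X[C] - r) ** 2))
    return total

def M6(X, clusters):
    # within-cluster average dissimilarity (squared Euclidean)
    total = 0.0
    for C in clusters:
        for s in C:
            total += sum(float(np.sum((X[s] - X[t]) ** 2))
                         for t in C if t != s) / len(C)
    return total

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
clusters = [[0, 1, 2, 3], [4, 5, 6, 7, 8, 9]]
m5, m6 = M5(X, clusters), M6(X, clusters)
```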
CLUSTERING CRITERION

Minimize within-cluster variance, where $r_j = \frac{1}{n_j} \sum_{x \in C_j} x$:
$$M_5 = \sum_{j=1}^{K} \sum_{x_t \in C_j} \|x_t - r_j\|_2^2$$

Minimize within-cluster weighted scatter:
$$M_6 = \sum_{j=1}^{K} \frac{1}{|C_j|} \sum_{x_s \in C_j} \sum_{x_t \in C_j,\, x_t \neq x_s} \|x_s - x_t\|_2^2
= \sum_{j=1}^{K} \sum_{x_s \in C_j} \mathrm{dissimilarity}(x_s, C_j)$$
Demo
K-MEANS CLUSTERING

For all $j \in [K]$, initialize cluster centroids $r_j^0$ randomly and set $m = 1$.
Repeat until convergence (or until patience runs out):
1. For each $t \in \{1, \ldots, n\}$, set the cluster identity of the point:
$$c^m(x_t) = \operatorname{argmin}_{j \in [K]} \|x_t - r_j^{m-1}\|$$
2. For each $j \in [K]$, set the new representative:
$$r_j^m = \frac{1}{|C_j^m|} \sum_{x_t \in C_j^m} x_t$$
3. $m \leftarrow m + 1$
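The loop above translates almost line by line into NumPy. This is a minimal sketch, not a production implementation: initialization from K random data points, convergence detection via unchanged assignments, and the crude empty-cluster handling are my assumptions, not part of the slides:

```python
import numpy as np

def kmeans(X, K, seed=0, max_iter=100):
    """Lloyd's K-means: alternate assignment and centroid steps until convergence."""
    rng = np.random.default_rng(seed)
    # initialize centroids r_j^0 as K randomly chosen data points
    r = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    c = np.full(len(X), -1)
    for _ in range(max_iter):
        # step 1: assign each point to its nearest centroid
        d = np.linalg.norm(X[:, None, :] - r[None, :, :], axis=2)
        c_new = d.argmin(axis=1)
        if np.array_equal(c_new, c):  # converged: assignments stopped changing
            break
        c = c_new
        # step 2: recompute each centroid as the mean of its cluster
        for j in range(K):
            if np.any(c == j):        # keep the old centroid if a cluster empties
                r[j] = X[c == j].mean(axis=0)
    return c, r

X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
c, r = kmeans(X, K=2)
```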
K-MEANS OBJECTIVE

$$\sum_{j=1}^{K} \sum_{t \in C_j} \Big\| x_t - \frac{1}{|C_j|} \sum_{s \in C_j} x_s \Big\|^2 = \min_{r_1, \ldots, r_K} \sum_{j=1}^{K} \sum_{t \in C_j} \|x_t - r_j\|^2$$

where $r_j = \frac{1}{|C_j|} \sum_{x_t \in C_j} x_t$.
K-MEANS CONVERGENCE

The K-means algorithm converges to a local minimum of the objective
$$O(c;\, r_1, \ldots, r_K) = \sum_{j=1}^{K} \sum_{t :\, c(x_t) = j} \|x_t - r_j\|_2^2$$

Proof: The clustering assignment step improves the objective:
$$O\big(c^{m-1};\, r_1^{m-1}, \ldots, r_K^{m-1}\big) \geq O\big(c^m;\, r_1^{m-1}, \ldots, r_K^{m-1}\big)$$
(by definition of $c^m(x_t)$). Computing the centroids improves the objective:
$$O\big(c^m;\, r_1^{m-1}, \ldots, r_K^{m-1}\big) \geq O\big(c^m;\, r_1^m, \ldots, r_K^m\big)$$
(by the fact about the centroid).
Minimize the above objective over both c and $r_1, \ldots, r_K$:
$$M_5 = \min_{r_1, \ldots, r_K} O(c;\, r_1, \ldots, r_K)$$
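The two inequalities can be observed empirically by logging the objective after every assignment step and every centroid step: the resulting sequence is non-increasing. A small self-contained sketch (random toy data and random-point initialization are my assumptions):

```python
import numpy as np

def objective(X, c, r):
    # O(c; r_1,...,r_K) = sum_j sum_{t: c(x_t)=j} ||x_t - r_j||^2
    return float(sum(np.sum((X[t] - r[c[t]]) ** 2) for t in range(len(X))))

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2))
K = 3
r = X[rng.choice(len(X), size=K, replace=False)].copy()
history = []
for _ in range(10):
    # assignment step: never increases O (each point moves to its closest centroid)
    d = np.linalg.norm(X[:, None, :] - r[None, :, :], axis=2)
    c = d.argmin(axis=1)
    history.append(objective(X, c, r))
    # centroid step: never increases O (the centroid minimizes within-cluster SSE)
    for j in range(K):
        if np.any(c == j):
            r[j] = X[c == j].mean(axis=0)
    history.append(objective(X, c, r))
```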
Fact: Centroid is Minimizer

Let $\mu_j = \frac{1}{|C_j|} \sum_{t \in C_j} x_t$. For all $r_j$:
$$\sum_{t \in C_j} \Big\| x_t - \frac{1}{|C_j|} \sum_{s \in C_j} x_s \Big\|^2 \leq \sum_{t \in C_j} \|x_t - r_j\|^2$$

Proof:
$$\sum_{t \in C_j} \|x_t - r_j\|^2 = \sum_{t \in C_j} \|x_t - \mu_j + \mu_j - r_j\|^2$$
$$= \sum_{t \in C_j} \|x_t - \mu_j\|^2 + \sum_{t \in C_j} \|\mu_j - r_j\|^2 + 2 \sum_{t \in C_j} (x_t - \mu_j)^\top (\mu_j - r_j)$$
$$= \sum_{t \in C_j} \|x_t - \mu_j\|^2 + \sum_{t \in C_j} \|\mu_j - r_j\|^2 + 2 \Big( \sum_{t \in C_j} x_t - |C_j|\, \mu_j \Big)^\top (\mu_j - r_j)$$
$$= \sum_{t \in C_j} \|x_t - \mu_j\|^2 + \sum_{t \in C_j} \|\mu_j - r_j\|^2 \geq \sum_{t \in C_j} \|x_t - \mu_j\|^2$$
where the cross term vanishes because $\sum_{t \in C_j} x_t = |C_j|\, \mu_j$.
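The fact can be sanity-checked numerically: the sum of squared distances from any fixed candidate $r_j$ never beats the centroid. A quick sketch (random toy cluster and random candidates are my assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(20, 4))           # one cluster C_j of 20 points in R^4
mu = X.mean(axis=0)                    # centroid mu_j
sse_mu = float(np.sum((X - mu) ** 2))  # sum_t ||x_t - mu_j||^2
# try many random candidates r_j: each should incur at least the centroid's SSE,
# with gap exactly |C_j| * ||mu_j - r_j||^2 per the proof above
worst_gap = min(float(np.sum((X - r) ** 2)) - sse_mu
                for r in rng.normal(size=(100, 4)))
```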