Machine Learning for Data Science (CS4786), Lecture 3: Clustering

Course webpage: http://www.cs.cornell.edu/Courses/cs4786/2017fa/


Page 2

CLUSTERING CRITERION

Minimize total within-cluster dissimilarity

$$M_1 = \sum_{j=1}^{K} \sum_{s,t \in C_j} \mathrm{dissimilarity}(x_t, x_s)$$

Maximize between-cluster dissimilarity

$$M_2 = \sum_{x_s, x_t :\, c(x_s) \neq c(x_t)} \mathrm{dissimilarity}(x_t, x_s)$$

Maximize smallest between-cluster dissimilarity

$$M_3 = \min_{x_s, x_t :\, c(x_s) \neq c(x_t)} \mathrm{dissimilarity}(x_t, x_s)$$

Minimize largest within-cluster dissimilarity

$$M_4 = \max_{j \in [K]} \max_{s,t \in C_j} \mathrm{dissimilarity}(x_t, x_s)$$

Page 3

CLUSTERING CRITERION

minimizing M1 ≡ maximizing M2

minimizing M5 ≡ minimizing M6
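The first equivalence can be checked numerically. Every ordered pair of points lies either inside one cluster or across two clusters, so M1 + M2 equals the total pairwise dissimilarity, which does not depend on the clustering. The sketch below assumes squared Euclidean distance as the dissimilarity; the helper names `sq_dist`, `m1`, and `m2` and the sample points are illustrative, not from the lecture.

```python
from itertools import product

def sq_dist(a, b):
    """Squared Euclidean distance (an assumed choice of dissimilarity)."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def m1(points, labels):
    """Total within-cluster dissimilarity over ordered pairs (s, t)."""
    n = len(points)
    return sum(sq_dist(points[s], points[t])
               for s, t in product(range(n), repeat=2)
               if labels[s] == labels[t])

def m2(points, labels):
    """Total between-cluster dissimilarity over ordered pairs (s, t)."""
    n = len(points)
    return sum(sq_dist(points[s], points[t])
               for s, t in product(range(n), repeat=2)
               if labels[s] != labels[t])

points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (6.0, 5.0), (9.0, 0.0)]
total = sum(sq_dist(p, q) for p, q in product(points, repeat=2))

# Two different clusterings of the same points: M1 + M2 is unchanged,
# so lowering M1 is the same as raising M2.
for labels in ([0, 0, 1, 1, 2], [0, 1, 1, 2, 2]):
    assert abs(m1(points, labels) + m2(points, labels) - total) < 1e-9
```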

Page 4

Let's Build an Algorithm

CLUSTERING CRITERION

(The four criteria M1 to M4 are repeated from Page 2.)

Page 5

SINGLE LINK CLUSTERING

Initialize n clusters, with each point x_t in its own cluster

Until there are only K clusters, do

1 Find the closest two clusters and merge them into one cluster

2 Update between-cluster distances (called the proximity matrix)

$$\mathrm{dissimilarity}(C_i, C_j) = \min_{t \in C_i,\, s \in C_j} \mathrm{dissimilarity}(x_t, x_s)$$
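The steps above can be sketched in a few lines of Python. This is a minimal, unoptimized illustration that assumes squared Euclidean distance as the dissimilarity and recomputes cluster distances each round instead of maintaining a proximity matrix; `single_link` and `sq_dist` are hypothetical names.

```python
def sq_dist(a, b):
    """Squared Euclidean distance (an assumed choice of dissimilarity)."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def single_link(points, k, dissimilarity=sq_dist):
    """Merge the two closest clusters until only k clusters remain."""
    clusters = [[t] for t in range(len(points))]  # each point starts alone

    def cluster_dist(ci, cj):
        # Single-link rule: the closest pair of points across the two clusters.
        return min(dissimilarity(points[t], points[s]) for t in ci for s in cj)

    while len(clusters) > k:
        pairs = [(a, b) for a in range(len(clusters))
                 for b in range(a + 1, len(clusters))]
        # Recomputed every round for clarity; caching these values is what
        # the proximity-matrix update in step 2 is for.
        i, j = min(pairs,
                   key=lambda ab: cluster_dist(clusters[ab[0]], clusters[ab[1]]))
        merged = clusters.pop(j)      # j > i, so index i stays valid
        clusters[i].extend(merged)

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for t in members:
            labels[t] = label
    return labels

points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
labels = single_link(points, k=2)
```

On this toy input the three points on the left chain together through nearest neighbours and the two points on the right form the second cluster.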

Pages 6-22: Demo

Page 23

SINGLE LINK OBJECTIVE

Objective for single-link (criterion M3 from Page 2, maximize the smallest between-cluster dissimilarity, with squared Euclidean distance as the dissimilarity):

$$M_3 = \min_{x_s, x_t :\, c(x_s) \neq c(x_t)} \|x_s - x_t\|_2^2$$

Single link clustering is optimal for the above objective!

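Page 24

SINGLE LINK OBJECTIVE

Objective for single-link:

$$M_3 = \min_{x_s, x_t :\, c(x_s) \neq c(x_t)} \|x_s - x_t\|_2^2$$

Single link clustering is optimal for the above objective!

Proof:

Say $c$ is the solution produced by single-link clustering.

Key observation:

$$\min_{t,s :\, c(x_t) \neq c(x_s)} \mathrm{dissimilarity}(x_t, x_s) \;>\; \max_{(t,s)\ \text{merged}} \mathrm{dissimilarity}(x_t, x_s)$$

where the right-hand side ranges over the distances of points merged by the algorithm (the edges on the tree), all of which satisfy $c(x_t) = c(x_s)$. (With tied distances the inequality holds with $\geq$.)

Say $c' \neq c$ is another clustering into $K$ clusters. Then $\exists\, t, s$ s.t. $c'(x_t) \neq c'(x_s)$ but $c(x_t) = c(x_s)$.

[Figure: $x_t$ and $x_s$ are joined by a path of tree edges; one edge $(a, b)$ on this path crosses the $c'$ boundary.]

Since $c'(a) \neq c'(b)$ and $(a, b)$ is a merged edge, the smallest between-cluster dissimilarity under $c'$ is at most $\mathrm{dissimilarity}(a, b)$, which by the key observation is smaller than the value of $M_3$ under $c$.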
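The optimality claim can be sanity-checked by brute force on a tiny input. The sketch below enumerates every assignment of six points to exactly K = 3 non-empty clusters and compares the best achievable smallest between-cluster distance with the value obtained by single-link. The points, the helper names `m3` and `single_link`, and the use of squared Euclidean distance are assumptions made for illustration.

```python
from itertools import product

def sq_dist(a, b):
    """Squared Euclidean distance (an assumed choice of dissimilarity)."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def m3(points, labels):
    """Smallest dissimilarity between two points in different clusters."""
    n = len(points)
    return min(sq_dist(points[s], points[t])
               for s in range(n) for t in range(s + 1, n)
               if labels[s] != labels[t])

def single_link(points, k):
    """Plain single-link clustering: merge closest clusters until k remain."""
    clusters = [[t] for t in range(len(points))]
    while len(clusters) > k:
        pairs = [(a, b) for a in range(len(clusters))
                 for b in range(a + 1, len(clusters))]
        i, j = min(pairs, key=lambda ab: min(
            sq_dist(points[t], points[s])
            for t in clusters[ab[0]] for s in clusters[ab[1]]))
        merged = clusters.pop(j)
        clusters[i].extend(merged)
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for t in members:
            labels[t] = label
    return labels

points = [(0.0, 0.0), (1.0, 0.5), (2.5, 0.0), (6.0, 1.0), (7.0, 3.0), (12.0, 0.0)]
k = 3

# Exhaustive search over all labelings that use exactly k clusters.
best_brute_force = max(m3(points, labels)
                       for labels in product(range(k), repeat=len(points))
                       if len(set(labels)) == k)
single_link_value = m3(points, single_link(points, k))
```

Exhaustive search is only feasible for a handful of points, which is why it serves here as a check and not as an algorithm.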

Page 25

CLUSTERING CRITERION

Minimize within-cluster average dissimilarity

$$M_6 = \sum_{j=1}^{K} \sum_{s \in C_j} \mathrm{dissimilarity}(x_s, C_j) = \sum_{j=1}^{K} \sum_{s \in C_j} \left( \frac{1}{|C_j|} \sum_{t \in C_j,\, t \neq s} \mathrm{dissimilarity}(x_s, x_t) \right) = \sum_{j=1}^{K} \frac{1}{|C_j|} \sum_{s \in C_j} \left( \sum_{t \in C_j,\, t \neq s} \|x_s - x_t\|_2^2 \right)$$

Minimize within-cluster variance, with $r_j = \frac{1}{n_j} \sum_{x \in C_j} x$:

$$M_5 = \sum_{j=1}^{K} \sum_{t \in C_j} \|x_t - r_j\|_2^2$$

Page 26

CLUSTERING CRITERION

minimizing M1 ≡ maximizing M2

minimizing M5 ≡ minimizing M6
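One way to see the second equivalence: for squared Euclidean dissimilarity, the sum of all pairwise squared distances inside a cluster equals 2|C_j| times the sum of squared distances to the centroid, so M6 = 2 · M5 for every clustering. The sketch below checks this identity numerically; the clusters and the helper names `centroid`, `m5`, and `m6` are made up for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def centroid(cluster):
    """Coordinate-wise mean of the points in one cluster."""
    n = len(cluster)
    return tuple(sum(x[d] for x in cluster) / n for d in range(len(cluster[0])))

def m5(clusters):
    """Within-cluster variance: squared distance of each point to its centroid."""
    return sum(sq_dist(x, centroid(c)) for c in clusters for x in c)

def m6(clusters):
    """Within-cluster average dissimilarity over ordered pairs with s != t."""
    return sum(sq_dist(xs, xt) / len(c)
               for c in clusters
               for i, xs in enumerate(c)
               for j, xt in enumerate(c) if i != j)

clusters = [[(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)], [(5.0, 5.0), (7.0, 9.0)]]

# The two criteria differ only by the constant factor 2.
assert abs(m6(clusters) - 2 * m5(clusters)) < 1e-9
```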

Page 27

Let's Build an Algorithm

CLUSTERING CRITERION

Minimize within-cluster variance:

$$M_5 = \sum_{j=1}^{K} \sum_{x_t \in C_j} \|x_t - r_j\|_2^2 \quad \text{where } r_j = \frac{1}{|C_j|} \sum_{x_t \in C_j} x_t \ \text{ and } n_j = |C_j|$$

Minimize within-cluster weighted scatter:

$$M_6 = \sum_{j=1}^{K} \frac{1}{|C_j|} \sum_{x_s \in C_j} \left( \sum_{x_t \in C_j,\, x_t \neq x_s} \|x_s - x_t\|_2^2 \right) = \sum_{j=1}^{K} \sum_{x_s \in C_j} \mathrm{dissimilarity}(x_s, C_j)$$

with $\mathrm{dissimilarity}(x_s, C_j) = \frac{1}{|C_j|} \sum_{x_t \in C_j,\, x_t \neq x_s} \|x_s - x_t\|_2^2$.

Pages 28-41: Demo

Page 42

K-MEANS CLUSTERING

For all $j \in [K]$, initialize cluster centroids $r_j^0$ randomly and set $m = 1$

Repeat until convergence (or until patience runs out)

1 For each $t \in \{1, \ldots, n\}$, set cluster identity of the point

$$c^m(x_t) = \operatorname{argmin}_{j \in [K]} \|x_t - r_j^{m-1}\|$$

2 For each $j \in [K]$, set new representative as

$$r_j^m = \frac{1}{|C_j^m|} \sum_{x_t \in C_j^m} x_t$$

3 $m \leftarrow m + 1$
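A minimal pure-Python sketch of the loop above follows. The slide only says the centroids are initialized "randomly"; picking K distinct data points, keeping the old centroid when a cluster becomes empty, and the `max_iters` cap standing in for "patience" are all assumptions of this sketch, as are the function and variable names.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeans(points, k, max_iters=100, seed=0):
    rng = random.Random(seed)
    # Assumption: random initialization picks k distinct data points.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iters):                 # "until patience runs out"
        # Step 1: assign every point to its nearest current centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(x, centroids[j]))
                      for x in points]
        if new_labels == labels:               # converged: no point moved
            break
        labels = new_labels
        # Step 2: move each centroid to the mean of its cluster.
        for j in range(k):
            members = [x for x, c in zip(points, labels) if c == j]
            if members:                        # assumption: empty cluster keeps old centroid
                centroids[j] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return labels, centroids

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(points, k=2)
```

Minimizing the squared distance in step 1 gives the same assignment as minimizing the distance itself, so the sketch avoids the square root.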

Page 43

K-MEANS CONVERGENCE

K-means objective: minimize the following over $c$ and $r_1, \ldots, r_K$

$$O(c; r_1, \ldots, r_K) = \sum_{j=1}^{K} \sum_{t :\, c(x_t) = j} \|x_t - r_j\|_2^2$$

For a fixed clustering $c$ with clusters $C_1, \ldots, C_K$, minimizing over the representatives recovers $M_5$:

$$M_5 = \sum_{j=1}^{K} \sum_{t \in C_j} \left\| x_t - \frac{1}{|C_j|} \sum_{s \in C_j} x_s \right\|_2^2 = \min_{r_1, \ldots, r_K} \sum_{j=1}^{K} \sum_{t \in C_j} \|x_t - r_j\|_2^2 = \min_{r_1, \ldots, r_K} O(c; r_1, \ldots, r_K)$$

K-means algorithm converges to a local minimum of the objective.

Proof:

Clustering assignment improves objective:

$$O(c^{m-1}; r_1^{m-1}, \ldots, r_K^{m-1}) \geq O(c^m; r_1^{m-1}, \ldots, r_K^{m-1})$$

(By definition of $c^m(x_t)$)

Computing centroids improves objective:

$$O(c^m; r_1^{m-1}, \ldots, r_K^{m-1}) \geq O(c^m; r_1^m, \ldots, r_K^m)$$

(By the fact about centroid)

Page 44

Fact: Centroid is Minimizer

$$\forall r_j, \quad \sum_{t \in C_j} \left\| x_t - \frac{1}{|C_j|} \sum_{s \in C_j} x_s \right\|^2 \leq \sum_{t \in C_j} \|x_t - r_j\|^2$$

Page 45

Proof

Let $\mu_j = \frac{1}{|C_j|} \sum_{t \in C_j} x_t$. Then

$$\begin{aligned}
\sum_{t \in C_j} \|x_t - r_j\|^2
&= \sum_{t \in C_j} \|x_t - \mu_j + \mu_j - r_j\|^2 \\
&= \sum_{t \in C_j} \|x_t - \mu_j\|^2 + \sum_{t \in C_j} \|\mu_j - r_j\|^2 + 2 \sum_{t \in C_j} (x_t - \mu_j)^\top (\mu_j - r_j) \\
&= \sum_{t \in C_j} \|x_t - \mu_j\|^2 + \sum_{t \in C_j} \|\mu_j - r_j\|^2 + 2 \left( \sum_{t \in C_j} x_t - |C_j| \mu_j \right)^{\top} (\mu_j - r_j) \\
&= \sum_{t \in C_j} \|x_t - \mu_j\|^2 + \sum_{t \in C_j} \|\mu_j - r_j\|^2 \\
&\geq \sum_{t \in C_j} \|x_t - \mu_j\|^2
\end{aligned}$$
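The decomposition in the proof can be checked numerically: because the cross term vanishes, the cost of any representative r equals the cost of the centroid plus |C_j| times the squared distance from the centroid to r. The cluster, the candidate representatives, and the helper names `centroid` and `cost` below are illustrative assumptions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def centroid(cluster):
    """Coordinate-wise mean of the points in one cluster."""
    n = len(cluster)
    return tuple(sum(x[d] for x in cluster) / n for d in range(len(cluster[0])))

def cost(cluster, r):
    """Sum of squared distances from every point in the cluster to r."""
    return sum(sq_dist(x, r) for x in cluster)

cluster = [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)]
mu = centroid(cluster)

for r in [(0.0, 0.0), (1.0, 2.0), (-3.0, 4.5)]:
    lhs = cost(cluster, r)
    # Last two lines of the proof: centroid cost plus |C_j| * ||mu - r||^2.
    rhs = cost(cluster, mu) + len(cluster) * sq_dist(mu, r)
    assert abs(lhs - rhs) < 1e-9
    assert lhs >= cost(cluster, mu)   # the centroid is never worse
```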

Page 46

K-MEANS CONVERGENCE

K-means algorithm converges to a local minimum of the objective

$$O(c; r_1, \ldots, r_K) = \sum_{j=1}^{K} \sum_{t :\, c(x_t) = j} \|x_t - r_j\|_2^2$$

Proof:

Clustering assignment improves objective:

$$O(c^{m-1}; r_1^{m-1}, \ldots, r_K^{m-1}) \geq O(c^m; r_1^{m-1}, \ldots, r_K^{m-1})$$

(By definition of $c^m(x_t)$)

Computing centroids improves objective:

$$O(c^m; r_1^{m-1}, \ldots, r_K^{m-1}) \geq O(c^m; r_1^m, \ldots, r_K^m)$$

(By the fact about centroid)
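The two inequalities say that the objective never increases, whichever step is taken. Since there are only finitely many clusterings, the algorithm must stop after finitely many iterations. The sketch below records the objective after every assignment step and every centroid step and checks that the sequence is non-increasing. The deterministic initialization from the first two data points and all helper names are assumptions made for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def objective(points, labels, centroids):
    """O(c; r_1, ..., r_K): squared distance of each point to its representative."""
    return sum(sq_dist(x, centroids[c]) for x, c in zip(points, labels))

def assign(points, centroids):
    """Assignment step: index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(x, centroids[j]))
            for x in points]

def update(points, labels, centroids):
    """Centroid step: mean of each cluster (an empty cluster keeps its centroid)."""
    new = list(centroids)
    for j in range(len(centroids)):
        members = [x for x, c in zip(points, labels) if c == j]
        if members:
            new[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return new

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids = [points[0], points[1]]     # hypothetical, deliberately poor start
labels = assign(points, centroids)
history = [objective(points, labels, centroids)]

for _ in range(10):
    centroids = update(points, labels, centroids)
    history.append(objective(points, labels, centroids))   # after centroid step
    labels = assign(points, centroids)
    history.append(objective(points, labels, centroids))   # after assignment step

# Every step leaves the objective unchanged or lowers it.
assert all(a >= b - 1e-9 for a, b in zip(history, history[1:]))
```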