Spectral clustering


Pattern Recognition and Machine Learning Summer School 2014

• Hierarchical methods
– Agglomerative clustering
– Divisive clustering
• Iterative methods
– k-means clustering
– EM algorithm
– Mean-shift algorithm
• Spectral clustering
– Normalized cut
– Ratio cut

Graph‐cut

Clustering based on the spectrum of the graph, i.e., the multiset of the eigenvalues of its Laplacian matrix

Treats clustering as a graph partitioning problem without making specific assumptions on the form of the clusters.

Clusters points using eigenvectors of matrices derived from the data.

Maps the data to a low-dimensional space in which they are well separated and easy to cluster.

$L = D - W$, where $D$ is the degree matrix ($D_{ii} = \sum_j w_{ij}$) and $W$ is the (weighted) adjacency matrix

$w_{ij}$: affinity or similarity between nodes $i$ and $j$

• Affinity matrix

• Laplacian matrix

• Similarity measures

– Cosine measure

– Bhattacharyya coefficient

• Distance measures

– Euclidean distance

– Manhattan distance

– Maximum distance …
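A minimal sketch of building these matrices with NumPy, assuming a Gaussian kernel on the Euclidean distance as the affinity (one common choice; the bandwidth `sigma`, the helper names, and the toy points are hypothetical). Any of the similarity measures listed above could be substituted, as long as the resulting weights are symmetric and non-negative.

```python
import numpy as np


def gaussian_affinity(X, sigma=1.0):
    """Affinity matrix W with w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)) (assumed kernel)."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)  # squared Euclidean distances
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W


def laplacian(W):
    """Unnormalized graph Laplacian L = D - W, where D_ii = sum_j w_ij."""
    D = np.diag(W.sum(axis=1))  # degree matrix
    return D - W


X = np.array([[0.0, 0.0], [0.1, 0.0], [3.0, 3.0], [3.1, 3.0]])
W = gaussian_affinity(X, sigma=1.0)
L = laplacian(W)
print(np.allclose(L.sum(axis=1), 0.0))  # True: every row of D - W sums to zero
```

The resulting L is symmetric and positive semidefinite, which the eigenvector-based cut criteria rely on.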

Find a label vector x. Finding the optimal discrete x is an NP-hard problem, so the discrete problem is converted to the continuous domain.

Average association

Entries of x for points in the dominant cluster are non-zero

The label vector x is thresholded into 0 and 1

But it favors small, isolated clusters
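A sketch of the relaxed average-association step, assuming the criterion is maximizing $x^\top W x / x^\top x$ over real-valued x (for a 0/1 indicator this is $\mathrm{assoc}(A, A)/|A|$). The maximizer is the eigenvector of W with the largest eigenvalue, and thresholding its magnitudes gives the 0/1 labels. The Gaussian affinity, the threshold fraction `frac`, and the toy data are hypothetical choices.

```python
import numpy as np


def gaussian_affinity(X, sigma=1.0):
    """Affinity from a Gaussian kernel on Euclidean distance (assumed choice)."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W


def average_association_labels(W, frac=0.2):
    """Relaxed average association: top eigenvector of W, thresholded to 0/1.

    Entries are clearly non-zero only on the dominant cluster; `frac` is a
    hypothetical threshold relative to the largest entry.
    """
    _, eigvecs = np.linalg.eigh(W)   # eigenvalues in ascending order
    x = np.abs(eigvecs[:, -1])       # eigenvector of the largest eigenvalue, sign removed
    return (x > frac * x.max()).astype(int)


rng = np.random.default_rng(0)
big = rng.normal(0.0, 0.3, size=(20, 2))                  # dominant cluster
small = rng.normal(0.0, 0.3, size=(5, 2)) + [5.0, 5.0]    # small, far-away cluster
X = np.vstack([big, small])
labels = average_association_labels(gaussian_affinity(X))
print(labels)  # expected: 1 for the first 20 points, 0 for the last 5
```

Only the dominant cluster is recovered in one pass; the remaining points would have to be handled by repeating the procedure.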

$\mathrm{cut}(G_1, G_2) = \sum_{i \in G_1,\, j \in G_2} w_{ij}$: the sum of the weights of the cut edges

Find the eigenvector with the second smallest eigenvalue

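For the 0/1 label vector $x$ with $x_i = 1$ for $i \in G_1$:

$x^\top (D - W) x = x^\top D x - x^\top W x = \mathrm{assoc}(G_1, G) - \mathrm{assoc}(G_1, G_1) = \mathrm{cut}(G_1, G_2)$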
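A quick numerical check, on a small random symmetric weight matrix (hypothetical data), that for a 0/1 indicator x of $G_1$ the quadratic form $x^\top (D - W) x$ equals both $\mathrm{assoc}(G_1, G) - \mathrm{assoc}(G_1, G_1)$ and $\mathrm{cut}(G_1, G_2)$, where assoc sums the weights from the nodes of its first argument to the nodes of its second.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
W = rng.random((n, n))
W = (W + W.T) / 2.0          # symmetric, non-negative edge weights
np.fill_diagonal(W, 0.0)
D = np.diag(W.sum(axis=1))   # degree matrix

x = np.array([1, 1, 1, 0, 0, 0], dtype=float)  # indicator of G1 = {0, 1, 2}
in_g1 = x == 1

assoc_g1_g = W[in_g1, :].sum()                 # assoc(G1, G): G1 to all nodes
assoc_g1_g1 = W[np.ix_(in_g1, in_g1)].sum()    # assoc(G1, G1): G1 to G1
cut_g1_g2 = W[np.ix_(in_g1, ~in_g1)].sum()     # cut(G1, G2): G1 to its complement

quad = x @ (D - W) @ x
print(np.isclose(quad, assoc_g1_g - assoc_g1_g1), np.isclose(quad, cut_g1_g2))  # True True
```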

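$(D - W)\mathbf{1} = 0 \cdot \mathbf{1}$

The smallest eigenvalue of $D - W$ is 0, with the constant eigenvector $\mathbf{1}$; it puts every point on the same side, so the second smallest eigenvector is used instead.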
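A sketch of the minimum-cut recipe: build $L = D - W$, confirm that $\mathbf{1}$ is in its null space, then split the points by the sign of the eigenvector with the second smallest eigenvalue. Splitting at 0, the Gaussian affinity, and the two-blob toy data are assumptions for illustration.

```python
import numpy as np


def gaussian_affinity(X, sigma=1.0):
    """Affinity from a Gaussian kernel on Euclidean distance (assumed choice)."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W


def fiedler_bipartition(W):
    """Two-way split by the sign of the second smallest eigenvector of L = D - W."""
    L = np.diag(W.sum(axis=1)) - W
    _, eigvecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]          # skip the constant eigenvector in column 0
    return (fiedler > 0).astype(int)


rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 0.3, (10, 2)),
               rng.normal(0.0, 0.3, (10, 2)) + [3.0, 0.0]])
W = gaussian_affinity(X)
L = np.diag(W.sum(axis=1)) - W
print(np.allclose(L @ np.ones(len(W)), 0.0))  # True: (D - W) 1 = 0 * 1
labels = fiedler_bipartition(W)
print(labels)
```

The sign of an eigenvector is arbitrary, so which blob gets label 1 can vary; only the grouping is meaningful.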

y: binary vector representing the cluster association

Favors partitions into segments of equal size

Use the eigenvector of the second smallest eigenvalue

Based on the edge weights

Exact minimization is NP-complete

Find z in $D^{-1/2}(D - W)D^{-1/2} z = \lambda z$, where $z = D^{1/2} y$
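A sketch of the standard normalized-cut relaxation, assumed here to be the one this slide refers to: relaxing y to real values turns the problem into $(D - W) y = \lambda D y$, and substituting $z = D^{1/2} y$ gives the symmetric problem $D^{-1/2}(D - W)D^{-1/2} z = \lambda z$. The sketch takes z for the second smallest eigenvalue, maps it back to y, and splits at 0 (one simple choice; the splitting point can also be searched). The data and affinity are hypothetical.

```python
import numpy as np


def gaussian_affinity(X, sigma=1.0):
    """Affinity from a Gaussian kernel on Euclidean distance (assumed choice)."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W


def normalized_cut_bipartition(W):
    """Two-way split from the relaxed normalized cut, thresholding y = D^{-1/2} z at 0."""
    d = W.sum(axis=1)                  # node degrees (all positive for a Gaussian kernel)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # D^{-1/2} (D - W) D^{-1/2} = I - D^{-1/2} W D^{-1/2}
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    z = eigvecs[:, 1]                   # second smallest eigenvalue
    y = d_inv_sqrt * z                  # back to the relaxed indicator y
    return (y > 0).astype(int)


rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0.0, 0.3, (15, 2)),
               rng.normal(0.0, 0.3, (8, 2)) + [3.0, 0.0]])
labels = normalized_cut_bipartition(gaussian_affinity(X))
print(labels)
```

Because the cut is normalized by each side's total edge weight rather than by a raw count, the two groups here can differ in size (15 and 8 points).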

Pros
– Generic framework; can be used with many different features
Cons
– High storage requirement and time complexity
– Bias toward partitioning into equal segments
– Needs the number of clusters as a parameter

Incremental partitioning
– Partition using only one eigenvector at a time
– Apply the procedure recursively
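A sketch of incremental partitioning: each step splits one cluster by the sign of the second smallest eigenvector of its own subgraph Laplacian, repeated until k clusters exist. Always splitting the currently largest cluster is an assumed policy, and the three-blob data are hypothetical.

```python
import numpy as np


def gaussian_affinity(X, sigma=1.0):
    """Affinity from a Gaussian kernel on Euclidean distance (assumed choice)."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W


def fiedler_split(W):
    """Boolean split of a (sub)graph by the sign of the second smallest eigenvector of D - W."""
    L = np.diag(W.sum(axis=1)) - W
    _, eigvecs = np.linalg.eigh(L)
    return eigvecs[:, 1] > 0


def recursive_partition(W, k):
    """Use one eigenvector per split, applied recursively until k clusters exist."""
    labels = np.zeros(len(W), dtype=int)
    for new_label in range(1, k):
        largest = np.bincount(labels).argmax()     # assumed policy: split the largest cluster
        idx = np.flatnonzero(labels == largest)    # nodes of that cluster
        side = fiedler_split(W[np.ix_(idx, idx)])  # bipartition of its subgraph
        labels[idx[side]] = new_label
    return labels


rng = np.random.default_rng(4)
centers = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 5.0]])
X = np.vstack([rng.normal(0.0, 0.3, (10, 2)) + c for c in centers])
labels = recursive_partition(gaussian_affinity(X), k=3)
print(labels)
```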

Batch partitioning
– Use k eigenvectors
– Directly compute a k-way partitioning, for example, by k-means clustering
– Usually performs better
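A sketch of batch partitioning: take the k eigenvectors of $L = D - W$ with the smallest eigenvalues, treat each row as a point in a k-dimensional embedding, and run k-means on those rows. The unnormalized Laplacian, the small hand-written k-means (Lloyd iterations with deterministic farthest-point initialization), and the three-blob data are assumptions; a library k-means could be used instead.

```python
import numpy as np


def gaussian_affinity(X, sigma=1.0):
    """Affinity from a Gaussian kernel on Euclidean distance (assumed choice)."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W


def kmeans(X, k, n_iter=100):
    """Plain Lloyd's k-means with farthest-point initialization (hypothetical helper)."""
    centers = [X[0]]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(-1).min(axis=1)
        centers.append(X[d2.argmax()])             # farthest point from the chosen centers
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(axis=1)
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                                for j in range(k)])  # keep the old center if a cluster empties
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def batch_spectral(W, k):
    """k-way partition from the k eigenvectors of D - W with the smallest eigenvalues."""
    L = np.diag(W.sum(axis=1)) - W
    _, eigvecs = np.linalg.eigh(L)
    embedding = eigvecs[:, :k]  # one k-dimensional row per data point
    return kmeans(embedding, k)


rng = np.random.default_rng(4)
centers = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 5.0]])
X = np.vstack([rng.normal(0.0, 0.3, (10, 2)) + c for c in centers])
labels = batch_spectral(gaussian_affinity(X), k=3)
print(labels)
```

All k clusters come out of a single eigendecomposition, instead of one split at a time.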

Find a low-dimensional embedding by eigendecomposition

The embedding separates the data while projecting it into the low-dimensional space

This allows non-convex data to be clustered effectively
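A sketch of the non-convex case: two concentric rings (hypothetical data, radii 1 and 3). k-means on the raw coordinates can only draw a straight boundary between two centers, which cannot separate concentric rings, but in the embedding given by the two smallest eigenvectors of $L = D - W$ each ring collapses to essentially a single point. The Gaussian bandwidth `sigma=0.3` and the hand-written k-means helper are assumptions.

```python
import numpy as np


def gaussian_affinity(X, sigma):
    """Affinity from a Gaussian kernel on Euclidean distance (assumed choice)."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W


def kmeans(X, k, n_iter=100):
    """Plain Lloyd's k-means with farthest-point initialization (hypothetical helper)."""
    centers = [X[0]]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(-1).min(axis=1)
        centers.append(X[d2.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(axis=1)
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def spectral_embedding(W, k):
    """Rows of the k eigenvectors of L = D - W with the smallest eigenvalues."""
    L = np.diag(W.sum(axis=1)) - W
    _, eigvecs = np.linalg.eigh(L)
    return eigvecs[:, :k]


t = np.linspace(0.0, 2.0 * np.pi, 100, endpoint=False)
inner = np.column_stack([np.cos(t), np.sin(t)])        # ring of radius 1
outer = 3.0 * np.column_stack([np.cos(t), np.sin(t)])  # ring of radius 3
X = np.vstack([inner, outer])

embedding = spectral_embedding(gaussian_affinity(X, sigma=0.3), k=2)
labels = kmeans(embedding, k=2)
print(labels)
```

The bandwidth matters: it has to be small relative to the gap between the rings, or the two rings stop looking like separate groups in the affinity graph.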

Thank you!
