Spectral Clustering
Veronika Strnadová-Neeley
Seminar on Top Algorithms in Computational Science
Main Reference: Ulrike von Luxburg's A Tutorial on Spectral Clustering
The Spectral Clustering Algorithm
Uses the eigenvalues and eigenvectors of the graph Laplacian matrix in order to find clusters (or “partitions”) of the graph

[Figure: example graph on vertices 1-5 with unit-weight edges (1,4), (1,5), (2,3), (3,4), (4,5)]

𝐿 =
[  2   0   0  −1  −1
   0   1  −1   0   0
   0  −1   2  −1   0
  −1   0  −1   3  −1
  −1   0   0  −1   2 ]

From 𝐿 we compute the eigenvalues 𝜆1, 𝜆2, …, 𝜆𝑛 and eigenvectors 𝑒1, 𝑒2, …, 𝑒𝑛.
A little history…
• 1973: Algebraic Connectivity of Graphs. M. Fiedler
• 1986: Eigenvalues and Expanders. N. Alon
• 1989: Conductance and Convergence of Markov Chains: A Combinatorial Treatment of Expanders. M. Mihail
• 1990: Partitioning Sparse Matrices with Eigenvectors of Graphs. A. Pothen et al.
• 2000: Normalized Cuts and Image Segmentation. J. Shi and J. Malik
• 2001: A Random Walks View of Spectral Segmentation. M. Meila
• 2010: Power Iteration Clustering. F. Lin and W. Cohen
[Figure: two example bipartitions 𝐴1, 𝐴2 of a graph, one with cut(𝐴1, 𝐴2) = 2 and one with cut(𝐴1, 𝐴2) = 7]
RatioCut(𝐴1, 𝐴2, …, 𝐴𝑘) = Σ𝑖=1…𝑘 cut(𝐴𝑖, 𝐴̄𝑖)/|𝐴𝑖|

[Figure: example bipartition 𝐴1, 𝐴2 with RatioCut(𝐴1, 𝐴2) = 0.226]
Definition: Unnormalized Graph Laplacian Matrix
𝐿 = 𝐷 − 𝑊
where 𝐷 is the diagonal degree matrix and 𝑊𝑖𝑗 ≥ 0 is the weight of edge (𝑖, 𝑗), 𝑖 ≠ 𝑗

Example: for the unweighted path graph 1-2-3-4-5 (all edge weights 1),

𝐷 =
[ 1  0  0  0  0
  0  2  0  0  0
  0  0  2  0  0
  0  0  0  2  0
  0  0  0  0  1 ]

𝑊 =
[ 0  1  0  0  0
  1  0  1  0  0
  0  1  0  1  0
  0  0  1  0  1
  0  0  0  1  0 ]
𝐿 = 𝐷 − 𝑊 =
[  1  −1   0   0   0
  −1   2  −1   0   0
   0  −1   2  −1   0
   0   0  −1   2  −1
   0   0   0  −1   1 ]
For the weighted path graph 1-2-3-4-5 with edge weights 𝑤12 = 3, 𝑤23 = 1, 𝑤34 = 4, 𝑤45 = 5:

𝐿 =
[  3  −3   0   0   0
  −3   4  −1   0   0
   0  −1   5  −4   0
   0   0  −4   9  −5
   0   0   0  −5   5 ]

Note: In a weighted graph, 𝐷𝑖𝑖 = Σ𝑗=1…𝑛 𝑤𝑖𝑗
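As a quick check, the definition can be reproduced in a few lines of numpy (a sketch added here, not from the original slides; 0-based indices stand in for vertices 1-5):

```python
import numpy as np

# Weighted path graph 1-2-3-4-5 with edge weights 3, 1, 4, 5 (as on the slide).
W = np.zeros((5, 5))
for (i, j), w in {(0, 1): 3, (1, 2): 1, (2, 3): 4, (3, 4): 5}.items():
    W[i, j] = W[j, i] = w  # weights are symmetric

D = np.diag(W.sum(axis=1))  # D_ii = sum_j w_ij (weighted degree)
L = D - W                   # unnormalized graph Laplacian

print(L.astype(int))
# Every row of L sums to 0, and L matches the weighted matrix shown above.
```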
For the complete graph on 5 vertices:

𝐿 =
[  4  −1  −1  −1  −1
  −1   4  −1  −1  −1
  −1  −1   4  −1  −1
  −1  −1  −1   4  −1
  −1  −1  −1  −1   4 ]
For a bipartite graph on vertex sets {1, 2, 3, 4} and {5, 6, 7, 8}:

𝐿 =
[  2   0   0   0  −1  −1   0   0
   0   2   0   0  −1  −1   0   0
   0   0   2   0   0  −1  −1   0
   0   0   0   4  −1  −1  −1  −1
  −1  −1   0  −1   3   0   0   0
  −1  −1  −1  −1   0   4   0   0
   0   0  −1  −1   0   0   2   0
   0   0   0  −1   0   0   0   1 ]
Properties of the Graph Laplacian

1. 𝒇𝑻𝑳𝒇 = ½ 𝜮𝒊,𝒋=𝟏…𝒏 𝒘𝒊𝒋(𝒇𝒊 − 𝒇𝒋)² for all 𝒇 ∈ ℝ𝒏

Proof:
𝑓𝑇𝐿𝑓 = 𝑓𝑇(𝐷 − 𝑊)𝑓 = 𝑓𝑇𝐷𝑓 − 𝑓𝑇𝑊𝑓
= Σ𝑖=1…𝑛 𝑑𝑖𝑓𝑖² − Σ𝑖,𝑗=1…𝑛 𝑤𝑖𝑗𝑓𝑖𝑓𝑗
= ½ ( Σ𝑖=1…𝑛 𝑑𝑖𝑓𝑖² − 2 Σ𝑖,𝑗=1…𝑛 𝑤𝑖𝑗𝑓𝑖𝑓𝑗 + Σ𝑗=1…𝑛 𝑑𝑗𝑓𝑗² )
= ½ Σ𝑖,𝑗=1…𝑛 𝑤𝑖𝑗(𝑓𝑖 − 𝑓𝑗)²

Note: this property relates the differences between the values 𝑓 assigns to adjacent vertices to the quadratic form of 𝐿.

Example: for the 5-vertex graph above (unit-weight edges (1,4), (1,5), (2,3), (3,4), (4,5)),

𝒇 = (𝟏, 𝟏, 𝟏, 𝟏, 𝟏)𝑻 ⇒ 𝒇𝑻𝑳𝒇 = 𝟎
𝒇 = (𝟒, 𝟏, 𝟏, 𝟒, 𝟒)𝑻 ⇒ 𝒇𝑻𝑳𝒇 = 𝟗 (only edge (3,4) joins vertices with different values: 𝑤34(𝑓3 − 𝑓4)² = 9)
𝒇 = (𝟏, 𝟏, 𝟏, 𝟒, 𝟒)𝑻 ⇒ 𝒇𝑻𝑳𝒇 = 𝟐𝟕 (edges (1,4), (1,5), and (3,4) each contribute (1 − 4)² = 9)
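Property 1 can be verified numerically; the numpy sketch below (an addition, not from the slides) rebuilds the example graph's Laplacian and checks 𝒇𝑻𝑳𝒇 against ½ Σ 𝑤𝑖𝑗(𝑓𝑖 − 𝑓𝑗)² for the three example vectors:

```python
import numpy as np

# Example graph from the slides: vertices 1..5 (0-based here), unit-weight
# edges (1,4), (1,5), (2,3), (3,4), (4,5).
edges = [(0, 3), (0, 4), (1, 2), (2, 3), (3, 4)]
W = np.zeros((5, 5))
for i, j in edges:
    W[i, j] = W[j, i] = 1.0
L = np.diag(W.sum(axis=1)) - W

def quad(f):
    """1/2 * sum_ij w_ij (f_i - f_j)^2, the right-hand side of property 1."""
    diff = f[:, None] - f[None, :]
    return 0.5 * np.sum(W * diff**2)

for f in (np.array([1., 1, 1, 1, 1]),
          np.array([4., 1, 1, 4, 4]),
          np.array([1., 1, 1, 4, 4])):
    print(f @ L @ f, quad(f))  # 0, 9, 27 respectively, as on the slides
```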
2. 𝑳 is symmetric and positive semi-definite
Recall the definition: 𝐿 = 𝐷 − 𝑊
𝐷 diagonal, 𝑊 symmetric → 𝐿 symmetric
Recall property 1:
𝑓𝑇𝐿𝑓 = ½ Σ𝑖,𝑗=1…𝑛 𝑤𝑖𝑗(𝑓𝑖 − 𝑓𝑗)² ≥ 0 ∀𝑓 ∈ ℝ𝑛
3. The smallest eigenvalue of 𝑳 is 0, with corresponding eigenvector being the constant vector 𝕝

𝐿 = 𝐷 − 𝑊 has diagonal entries 𝐿𝑖𝑖 = 𝑑𝑖 − 𝑤𝑖𝑖 = Σ𝑗=1…𝑛 𝑤𝑖𝑗 − 𝑤𝑖𝑖 and off-diagonal entries 𝐿𝑖𝑗 = −𝑤𝑖𝑗, so

(𝐿𝕝)𝑖 = (Σ𝑗=1…𝑛 𝑤𝑖𝑗 − 𝑤𝑖𝑖) − Σ𝑗≠𝑖 𝑤𝑖𝑗 = 0 for every 𝑖, i.e., 𝐿𝕝 = 𝟎

0 is an eigenvalue and its corresponding eigenvector is the constant vector 𝕝
Since each eigenvalue is ≥ 0 (property 2), this is the smallest eigenvalue
4. 𝑳 has non-negative, real-valued eigenvalues 𝟎 = 𝝀𝟏 ≤ 𝝀𝟐 ≤ ⋯ ≤ 𝝀𝒏
By property 2, 𝐿 is symmetric → 𝐿 has 𝑛 real eigenvalues
Again by property 2, 𝐿 is positive semi-definite → 𝟎 ≤ 𝝀𝟏 ≤ 𝝀𝟐 ≤ ⋯ ≤ 𝝀𝒏
By property 3, 𝝀𝟏 = 𝟎
Therefore, 𝟎 = 𝝀𝟏 ≤ 𝝀𝟐 ≤ ⋯ ≤ 𝝀𝒏
Properties of Graph Laplacians: Summary
𝐿 = 𝐷 − 𝑊
1. 𝑓𝑇𝐿𝑓 = ½ Σ𝑖,𝑗=1…𝑛 𝑤𝑖𝑗(𝑓𝑖 − 𝑓𝑗)² for all 𝑓 ∈ ℝ𝑛
2. 𝐿 is symmetric and positive semi-definite
3. The smallest eigenvalue of 𝐿 is 0, and the corresponding eigenvector is the constant vector 𝕝
4. 𝐿 has 𝑛 non-negative, real eigenvalues 0 = 𝜆1 ≤ 𝜆2 ≤ … ≤ 𝜆𝑛
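All four properties can be confirmed on the running example with numpy (a sketch added for illustration; `eigvalsh` returns real eigenvalues in ascending order for symmetric matrices):

```python
import numpy as np

# Laplacian of the 5-vertex example graph (edges (1,4),(1,5),(2,3),(3,4),(4,5)).
W = np.zeros((5, 5))
for i, j in [(0, 3), (0, 4), (1, 2), (2, 3), (3, 4)]:
    W[i, j] = W[j, i] = 1.0
L = np.diag(W.sum(axis=1)) - W

assert np.allclose(L, L.T)             # property 2: symmetric
lam = np.linalg.eigvalsh(L)            # real eigenvalues, ascending (property 4)
assert lam[0] > -1e-10                 # property 2: positive semi-definite
assert abs(lam[0]) < 1e-10             # properties 3/4: smallest eigenvalue is 0
assert np.allclose(L @ np.ones(5), 0)  # property 3: L·1 = 0
print(np.round(lam, 3))
```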
Proposition: The number of connected components of a graph is equal to the multiplicity 𝑘 of the eigenvalue 0 of the graph Laplacian matrix

Suppose 𝑘 = 1.
Suppose further that 𝒇 is an eigenvector with eigenvalue 0, i.e., 𝐿𝒇 = 𝟎
→ 0 = 𝒇𝑇𝐿𝒇 = ½ Σ𝑖,𝑗=1…𝑛 𝑤𝑖𝑗(𝒇𝑖 − 𝒇𝑗)²
Since all edge weights are positive → 𝒇𝑖 = 𝒇𝑗 for all connected vertices 𝑖, 𝑗
Thus 𝒇𝑣1 = 𝒇𝑣2 = ⋯ = 𝒇𝑣𝑚 for any path over vertices 𝑣1, 𝑣2, …, 𝑣𝑚
In a connected graph, all vertices are connected by some path
Thus 𝒇1 = 𝒇2 = ⋯ = 𝒇𝑛, and (up to scaling) 𝕝 is the only eigenvector with eigenvalue 0
Now suppose that 𝑘 > 1.
WLOG, order the vertices according to the components they belong to.

Example: a graph with two components, {1, 4, 5} (edges (1,4), (1,5), (4,5)) and {2, 3} (edge (2,3)):

𝐿 =
[  2   0   0  −1  −1
   0   1  −1   0   0
   0  −1   1   0   0
  −1   0   0   2  −1
  −1   0   0  −1   2 ]

After reordering the vertices as 1, 4, 5, 3, 2:

𝐿 =
[  2  −1  −1   0   0
  −1   2  −1   0   0
  −1  −1   2   0   0
   0   0   0   1  −1
   0   0   0  −1   1 ]

→ 𝐿 has a block diagonal form
𝐿 =
[ 𝐿1  0   0
  0   ⋱   0
  0   0   𝐿𝑘 ]

The spectrum of 𝐿 is the union of the spectra of 𝐿1, …, 𝐿𝑘
Since each 𝐿𝑖 represents a connected component, the eigenvectors of 𝐿 corresponding to 𝜆1 = 0 are constant on the block 𝐿𝑖 and 0 elsewhere:

[ 𝐿1  0   0        [ 𝕝        [ 0
  0   ⋱   0    ×     ⋮    =     ⋮
  0   0   𝐿𝑘 ]       0 ]        0 ]

Thus, 𝐿 has as many 0 eigenvalues as there are connected components
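The proposition is easy to check numerically; the sketch below (an addition, assuming numpy) counts the near-zero eigenvalues of the disconnected example's Laplacian:

```python
import numpy as np

# Disconnected example from the slide: components {1,4,5} (a triangle)
# and {2,3} (a single edge); vertex numbers are 0-based here.
W = np.zeros((5, 5))
for i, j in [(0, 3), (0, 4), (3, 4), (1, 2)]:
    W[i, j] = W[j, i] = 1.0
L = np.diag(W.sum(axis=1)) - W

lam = np.linalg.eigvalsh(L)
n_components = int(np.sum(lam < 1e-8))
print(n_components)  # 2 zero eigenvalues -> 2 connected components
```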
Spectral Clustering Algorithm
Given: a graph with 𝑛 vertices and edge weights 𝑊𝑖𝑗, and the number of desired clusters 𝑘
1. Construct the (normalized) graph Laplacian of 𝐺(𝑉, 𝐸): 𝐿 = 𝐷 − 𝑊
2. Find the 𝑘 eigenvectors corresponding to the 𝑘 smallest eigenvalues of 𝐿
3. Let 𝑈 be the 𝑛 × 𝑘 matrix of these eigenvectors
4. Use 𝑘-means to find 𝑘 clusters 𝐶′, letting 𝑥𝑖′ be the rows of 𝑈
5. Assign data point 𝑥𝑖 to the 𝑗th cluster if 𝑥𝑖′ was assigned to cluster 𝑗
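The five steps can be sketched end to end in numpy (a toy illustration added here, not the authors' code; the tiny deterministic k-means with farthest-point initialization is a stand-in for any k-means routine):

```python
import numpy as np

def kmeans(X, k, iters=50):
    """Minimal Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min(np.linalg.norm(X[:, None] - np.array(centers)[None], axis=2), axis=1)
        centers.append(X[d.argmax()])          # pick the point farthest from chosen centers
    centers = np.array(centers)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels

def spectral_clustering(W, k):
    """Steps 1-5 of the algorithm, using the unnormalized Laplacian."""
    L = np.diag(W.sum(axis=1)) - W             # step 1
    lam, vecs = np.linalg.eigh(L)              # eigenvalues in ascending order
    U = vecs[:, :k]                            # steps 2-3: n x k eigenvector matrix
    return kmeans(U, k)                        # steps 4-5: cluster the rows of U

# Two triangles joined by a single edge: the natural k = 2 clustering.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0
labels = spectral_clustering(W, 2)
print(labels)  # vertices 0-2 land in one cluster, 3-5 in the other
```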
Graph Cut Point of View
The goal of spectral clustering is to minimize the sum of the weights of edges between clusters
Define: cut(𝐴1, …, 𝐴𝑘) = ½ Σ𝑖=1…𝑘 𝑊(𝐴𝑖, 𝐴̄𝑖), where 𝑊(𝐴𝑖, 𝐴̄𝑖) = Σ𝑗∈𝐴𝑖, 𝑙∈𝐴̄𝑖 𝑤𝑗𝑙
Ex: 𝒌 = 𝟐
[Figure: a bipartition into 𝐴1 and 𝐴̄1 with cut = 2]
To create balanced clusters, minimize the RatioCut instead:
RatioCut(𝐴1, …, 𝐴𝑘) = ½ Σ𝑖=1…𝑘 𝑊(𝐴𝑖, 𝐴̄𝑖)/|𝐴𝑖| = Σ𝑖=1…𝑘 cut(𝐴𝑖, 𝐴̄𝑖)/|𝐴𝑖|
[Figure: a balanced bipartition with RatioCut = 0.18803 and an unbalanced one with RatioCut = 0.5500]
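Both quantities can be computed directly from 𝑊; the sketch below (an addition, using the convention RatioCut = Σ cut(𝐴𝑖, 𝐴̄𝑖)/|𝐴𝑖| with cut(𝐴, 𝐴̄) = 𝑊(𝐴, 𝐴̄)) evaluates them on the 5-vertex example graph:

```python
import numpy as np

def cut_value(W, A):
    """W(A, A-bar): total weight of edges leaving the vertex set A."""
    mask = np.zeros(len(W), dtype=bool)
    mask[np.asarray(A)] = True
    return W[mask][:, ~mask].sum()

def ratiocut(W, parts):
    """RatioCut(A_1, ..., A_k) = sum_i cut(A_i, A_i-bar) / |A_i|."""
    return sum(cut_value(W, A) / len(A) for A in parts)

# 5-vertex example graph, unit weights, edges (1,4),(1,5),(2,3),(3,4),(4,5).
W = np.zeros((5, 5))
for i, j in [(0, 3), (0, 4), (1, 2), (2, 3), (3, 4)]:
    W[i, j] = W[j, i] = 1.0

print(cut_value(W, [0, 3, 4]))           # 1.0: only edge (3,4) (1-based) crosses
print(ratiocut(W, [[0, 3, 4], [1, 2]]))  # 1/3 + 1/2 ≈ 0.833
```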
…but minimizing RatioCut is NP-hard!
min𝐴1,𝐴2,…,𝐴𝑘 ½ Σ𝑖=1…𝑘 𝑊(𝐴𝑖, 𝐴̄𝑖)/|𝐴𝑖|
When the going gets tough… we relax the constraints on the problem
Minimizing RatioCut for 𝑘 = 2
Only two clusters, 𝐴 and 𝐴̄ (the complement of 𝐴)
Want to minimize:
min𝐴,𝐴̄ ½ ( 𝑊(𝐴, 𝐴̄)/|𝐴| + 𝑊(𝐴̄, 𝐴)/|𝐴̄| )
Trick: find a vector 𝑓 so that minimizing RatioCut(𝐴, 𝐴̄) is the same as minimizing 𝑓𝑇𝐿𝑓 subject to certain constraints.
Let:
𝑓𝑖 = √(|𝐴̄|/|𝐴|) if 𝑣𝑖 ∈ 𝐴, and 𝑓𝑖 = −√(|𝐴|/|𝐴̄|) if 𝑣𝑖 ∈ 𝐴̄
Then we can show that 𝑓𝑇𝐿𝑓 = |𝑉| · RatioCut(𝐴, 𝐴̄)
Thus minimizing RatioCut(𝐴, 𝐴̄) is the same as minimizing 𝑓𝑇𝐿𝑓 with constraints:
Σ𝑖=1…𝑛 𝑓𝑖 = 0 ⇒ 𝑓 ⊥ 𝕝
and ‖𝑓‖₂² = 𝑛, the number of vertices
and 𝑓𝑖 taking on the discrete values defined above
Now relax the constraints:
Minimize 𝑓𝑇𝐿𝑓 subject to: 𝑓 ⊥ 𝕝, ‖𝑓‖₂² = 𝑛
The Rayleigh-Ritz theorem tells us that the solution to this optimization problem is given by 𝑒2, the eigenvector corresponding to 𝜆2
So we can approximate the RatioCut minimizer by using the second eigenvector (corresponding to the second-smallest eigenvalue) of 𝑳! (This translates to solving for eigenvectors in the spectral clustering algorithm.)
How do we transform the approximate solution back to the discrete setting? Recall that we eliminated the constraint which assigned vertices to partitions:
𝑓𝑖 = √(|𝐴̄|/|𝐴|) if 𝑣𝑖 ∈ 𝐴, and 𝑓𝑖 = −√(|𝐴|/|𝐴̄|) if 𝑣𝑖 ∈ 𝐴̄
We can choose a threshold 𝑡, assigning all 𝑖 for which 𝑓𝑖 > 𝑡 to one partition 𝐴 and all 𝑖 for which 𝑓𝑖 ≤ 𝑡 to the other partition 𝐴̄
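The 𝑘 = 2 procedure, second eigenvector plus a threshold at 𝑡 = 0, can be sketched as follows (an added numpy illustration; the six-vertex "two triangles" graph is made up for the demonstration):

```python
import numpy as np

# Two triangles joined by one edge; the eigenvector of the second-smallest
# eigenvalue (the Fiedler vector) should separate them at threshold t = 0.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0
L = np.diag(W.sum(axis=1)) - W

lam, vecs = np.linalg.eigh(L)
f = vecs[:, 1]                  # Fiedler vector e_2
A = np.where(f > 0)[0]          # threshold at t = 0
print(sorted(A.tolist()))       # one of the two triangles
```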
Approximating RatioCut for 𝑘 > 2
We can use similar arguments to show that minimizing RatioCut for 𝑘 > 2 corresponds to finding the first 𝑘 eigenvectors of 𝐿
Examples:

Path Graph (1-2-3-4-5):
𝐿 =
[  1  −1   0   0   0
  −1   2  −1   0   0
   0  −1   2  −1   0
   0   0  −1   2  −1
   0   0   0  −1   1 ]

Complete Graph (on vertices 1-5):
𝐿 =
[  4  −1  −1  −1  −1
  −1   4  −1  −1  −1
  −1  −1   4  −1  −1
  −1  −1  −1   4  −1
  −1  −1  −1  −1   4 ]

Example graph with edges (1,4), (1,5), (2,3), (3,4), (4,5):
𝐿 =
[  2   0   0  −1  −1
   0   1  −1   0   0
   0  −1   2  −1   0
  −1   0  −1   3  −1
  −1   0   0  −1   2 ]

Bipartite Graph (vertex sets {1, 2, 3, 4} and {5, 6, 7, 8}):
𝐿 =
[  2   0   0   0  −1  −1   0   0
   0   2   0   0  −1  −1   0   0
   0   0   2   0   0  −1  −1   0
   0   0   0   4  −1  −1  −1  −1
  −1  −1   0  −1   3   0   0   0
  −1  −1  −1  −1   0   4   0   0
   0   0  −1  −1   0   0   2   0
   0   0   0  −1   0   0   0   1 ]

Cycle Graph (1-2-3-4-5-1):
𝐿 =
[  2  −1   0   0  −1
  −1   2  −1   0   0
   0  −1   2  −1   0
   0   0  −1   2  −1
  −1   0   0  −1   2 ]
Real-World Examples
Issues in Spectral Clustering
• similarity function
  • a reasonable choice is 𝑠𝑖𝑗 = exp(−‖𝑥𝑖 − 𝑥𝑗‖²/(2𝜎²)) when the data points live in a Euclidean space, but it always depends on the domain
• type of similarity graph
  • 𝜀-neighborhood: difficulties arise when data is on “different scales” – which 𝜀 should we choose?
  • 𝒌-nearest neighbors: points in low-density regions can be grouped with points in high-density regions
  • mutual 𝑘-nearest neighbors: tends to connect points within regions of constant density, but not points across regions whose densities differ
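The Gaussian similarity and a mutual 𝑘-nearest-neighbor graph might be constructed as follows (a sketch added here; the six 2-D points are made up to form two well-separated groups):

```python
import numpy as np

def gaussian_similarity(X, sigma=1.0):
    """s_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)) for points in Euclidean space."""
    sq = np.sum((X[:, None, :] - X[None, :, :])**2, axis=2)
    S = np.exp(-sq / (2 * sigma**2))
    np.fill_diagonal(S, 0.0)  # no self-loops
    return S

def mutual_knn_graph(S, k):
    """Keep s_ij only when i and j are each among the other's k most similar points."""
    order = np.argsort(-S, axis=1)[:, :k]     # indices of the k most similar points
    knn = np.zeros_like(S, dtype=bool)
    rows = np.repeat(np.arange(len(S)), k)
    knn[rows, order.ravel()] = True
    return S * (knn & knn.T)                  # mutual k-NN: symmetric by construction

X = np.array([[0., 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
W = mutual_knn_graph(gaussian_similarity(X, sigma=1.0), k=2)
print(np.round(W, 3))  # nonzero only within each group of three points
```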
More issues
• fully connected graph
  • usually used with the Gaussian similarity function
  • need to pick 𝜎 wisely
• finding the eigenvectors
  • if we use the 𝑘-nearest neighbor graph or the 𝜀-neighborhood graph, the Laplacian matrix will be sparse, and we have efficient methods to compute the first 𝑘 eigenvectors of sparse matrices
  • however, the speed of convergence of popular methods depends on the size of the eigengap
• choosing the number of clusters
  • “A variety of more or less successful methods have been devised for this problem”
Large-Scale Spectral Clustering
Main challenge: finding the first 𝑘 eigenvectors can be prohibitively expensive
2010: Power Iteration Clustering. Key idea: use power iteration to find only ONE vector that gives partitions similar to those found by looking at the first 𝑘 eigenvectors
Power Iteration Clustering
A large-scale extension of spectral clustering
Uses power iteration on 𝐼 − 𝐷⁻¹𝐿 = 𝐷⁻¹𝑊 until near-convergence to a linear combination of the eigenvectors corresponding to the 𝑘 smallest eigenvalues
Applicable to large-scale text-document classification; works well for small values of 𝑘
If 𝐹𝐹𝑇 = 𝑊, then (𝐷⁻¹𝑊)𝑣 = 𝐷⁻¹𝐹(𝐹𝑇𝑣), so each matrix-vector product can be computed without forming 𝑊 explicitly
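The core idea, that the power iterate of 𝐷⁻¹𝑊 becomes locally constant on well-separated clusters, shows up cleanly on a toy disconnected graph (a sketch added here, not the PIC authors' code; a deterministic starting vector keeps the outcome predictable):

```python
import numpy as np

# Power iteration on D^{-1} W for two disconnected triangles: the iterate
# becomes constant within each component, so clustering its entries in 1-D
# recovers the components.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
P = W / W.sum(axis=1, keepdims=True)   # D^{-1} W = I - D^{-1} L

v = np.arange(1, 7, dtype=float)       # deterministic starting vector
for _ in range(60):
    v = P @ v
    v /= np.abs(v).max()               # keep the iterate normalized

print(np.round(v, 6))  # constant within each triangle, different across them
```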
Conclusion
Spectral clustering has been a successful heuristic algorithm for partitioning the vertices of a graph
It relates linear-algebraic theory to a discrete optimization problem
It has been extended to many application domains and to the large-scale setting
More Details
For an arbitrary 𝑘, we look for a solution of:
min𝐴1…𝐴𝑘 Tr(𝐻𝑇𝐿𝐻) = min𝐴1…𝐴𝑘 RatioCut(𝐴1, …, 𝐴𝑘)
with the constraints:
𝐻𝑇𝐻 = 𝐼, ℎ𝑖𝑗 = 1/√|𝐴𝑗| if 𝑣𝑖 ∈ 𝐴𝑗, and ℎ𝑖𝑗 = 0 otherwise
We relax the problem, then come back to the unrelaxed version. The relaxed problem becomes:
min𝐻∈ℝ𝑛×𝑘 Tr(𝐻𝑇𝐿𝐻) where 𝐻𝑇𝐻 = 𝐼
The Rayleigh-Ritz theorem tells us that the solution to this problem is the 𝐻 which contains the first 𝑘 eigenvectors of 𝐿 as columns
And we get back to discrete partitions of the graph by using 𝑘-means clustering on the rows of 𝐻 = 𝑈
Note: In general it is known that efficient algorithms to approximate balanced graph cuts up to a constant factor do not exist
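The identity Tr(𝐻𝑇𝐿𝐻) = RatioCut(𝐴1, …, 𝐴𝑘) can be checked on one concrete partition (an added numpy sketch, using the 5-vertex example graph and the partition {1,4,5}, {2,3} in 1-based labels):

```python
import numpy as np

# Build L for the example graph, edges (1,4),(1,5),(2,3),(3,4),(4,5), 0-based.
W = np.zeros((5, 5))
for i, j in [(0, 3), (0, 4), (1, 2), (2, 3), (3, 4)]:
    W[i, j] = W[j, i] = 1.0
L = np.diag(W.sum(axis=1)) - W

parts = [[0, 3, 4], [1, 2]]             # A_1, A_2
H = np.zeros((5, 2))
for j, A in enumerate(parts):
    H[A, j] = 1.0 / np.sqrt(len(A))     # h_ij = 1/sqrt(|A_j|) if v_i in A_j

assert np.allclose(H.T @ H, np.eye(2))  # the constraint H^T H = I holds
trace = np.trace(H.T @ L @ H)
print(trace)  # 1/3 + 1/2: one crossing edge, weighted by the part sizes
```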
Normalized Graph Laplacians
• 𝐿𝑠𝑦𝑚 = 𝐷^(−1/2) 𝐿 𝐷^(−1/2)
• 𝐿𝑟𝑤 = 𝐷⁻¹𝐿
• Their eigenvalues and eigenvectors are closely related to each other and to those of the unnormalized graph Laplacian 𝐿
Normalized Laplacians - Properties
• For an undirected graph with non-negative weights, the multiplicity of the eigenvalue 0 of both 𝐿𝑠𝑦𝑚 and 𝐿𝑟𝑤 is the number of connected components in the graph
• The eigenspace of 0 for 𝐿𝑟𝑤 is spanned by the indicator vectors 𝕝𝐴𝑖 of those components
• The eigenspace of 0 for 𝐿𝑠𝑦𝑚 is spanned by 𝐷^(1/2)𝕝𝐴𝑖
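These eigenspace claims can be verified on the disconnected example (an added numpy sketch; components {1,4,5} and {2,3} in 1-based labels, with all degrees positive so 𝐷 is invertible):

```python
import numpy as np

# Disconnected example: triangle {0,3,4} and edge {1,2} in 0-based labels.
W = np.zeros((5, 5))
for i, j in [(0, 3), (0, 4), (3, 4), (1, 2)]:
    W[i, j] = W[j, i] = 1.0
d = W.sum(axis=1)
L = np.diag(d) - W
L_sym = np.diag(d**-0.5) @ L @ np.diag(d**-0.5)
L_rw = np.diag(1 / d) @ L

for A in ([0, 3, 4], [1, 2]):
    ind = np.zeros(5)
    ind[A] = 1.0                                       # indicator vector 1_{A_i}
    assert np.allclose(L_rw @ ind, 0)                  # 1_{A_i} in null space of L_rw
    assert np.allclose(L_sym @ (np.sqrt(d) * ind), 0)  # D^{1/2} 1_{A_i} for L_sym
print("checks passed")
```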