
Page 1:

Power Iteration Clustering

Speaker: Xiaofei Di, 2010.10.11

Page 2:

Outline

• Authors
• Abstract
• Background
• Power Iteration Clustering (PIC)
• Conclusion

Page 3:

Authors

• Frank Lin
  PhD Student, Language Technologies Institute, School of Computer Science, Carnegie Mellon University
  http://www.cs.cmu.edu/~frank/

• William W. Cohen
  Associate Research Professor, Machine Learning Department, Carnegie Mellon University
  http://www.cs.cmu.edu/~wcohen/

Page 4:

Abstract

• We present a simple and scalable graph clustering method called power iteration clustering (PIC).

• PIC finds a very low-dimensional embedding of a dataset using truncated power iteration on a normalized pair-wise similarity matrix of the data. This embedding turns out to be an effective cluster indicator, consistently outperforming widely used spectral methods such as NCut on real datasets.

• PIC is very fast on large datasets, running over 1000 times faster than an NCut implementation based on the state-of-the-art IRAM eigenvector computation technique.

Page 5:

摘要 (Abstract, translated)

• This paper presents a simple and scalable graph clustering method: power iteration clustering (PIC).

• PIC uses truncated power iteration on a normalized pairwise similarity matrix of the data to find a very low-dimensional embedding of the dataset. This embedding happens to be a very effective cluster indicator, so it consistently outperforms widely used spectral clustering methods such as NCut on real datasets.

• On large datasets, PIC is very fast: more than 1000 times faster than an NCut implementation based on the best available eigenvector computation techniques.

Page 6:

Background 1 ---- Spectral Clustering

dataset: X = {x_1, x_2, ..., x_n}
similarity function: s(x_i, x_j) ≥ 0
affinity matrix: W, where W_ij = s(x_i, x_j)
degree matrix: D, diagonal with d_ii = Σ_j W_ij
normalized affinity matrix: NA = D^{-1} W
unnormalized graph Laplacian matrix: L = D - W
normalized symmetric Laplacian matrix: L = I - D^{-1/2} W D^{-1/2}
normalized random-walk Laplacian matrix: L = I - D^{-1} W
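A minimal NumPy sketch of these definitions, assuming a precomputed affinity matrix W (the function and variable names below are illustrative, not from the paper):

import numpy as np

def graph_matrices(W):
    """Build the standard spectral-clustering matrices from an affinity matrix W."""
    d = W.sum(axis=1)                         # degrees: d_ii = sum_j W_ij
    D_inv = np.diag(1.0 / d)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    I = np.eye(W.shape[0])

    NA = D_inv @ W                            # normalized affinity matrix NA = D^{-1} W
    L = np.diag(d) - W                        # unnormalized graph Laplacian L = D - W
    L_sym = I - D_inv_sqrt @ W @ D_inv_sqrt   # normalized symmetric Laplacian
    L_rw = I - D_inv @ W                      # normalized random-walk Laplacian
    return NA, L, L_sym, L_rw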

Page 7:

Background 2 ---- Power Iteration Method

• An eigenvalue algorithm
  – Input: an initial vector b_0 and the matrix A
  – Iteration: b_{k+1} = A b_k / ||A b_k||

• Advantage: does not compute a matrix decomposition

• Disadvantages: finds only the largest eigenvalue and converges slowly

• Convergence. Under the assumptions that:
  – A has an eigenvalue that is strictly greater in magnitude than its other eigenvalues, and
  – the starting vector b_0 has a nonzero component in the direction of an eigenvector associated with the dominant eigenvalue,
  a subsequence of (b_k) converges to an eigenvector associated with the dominant eigenvalue.
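A minimal sketch of power iteration in NumPy (names and the stopping tolerance are illustrative):

import numpy as np

def power_iteration(A, num_iters=1000, tol=1e-10, seed=0):
    """Approximate the dominant eigenvector of A by repeated multiplication."""
    rng = np.random.default_rng(seed)
    b = rng.standard_normal(A.shape[0])   # a random b_0 almost surely has a nonzero
    b /= np.linalg.norm(b)                # component along the dominant eigenvector
    for _ in range(num_iters):
        b_next = A @ b
        b_next /= np.linalg.norm(b_next)  # b_{k+1} = A b_k / ||A b_k||
        if np.linalg.norm(b_next - b) < tol:
            return b_next
        b = b_next
    return b

The slow convergence mentioned above comes from the ratio of the second-largest to the largest eigenvalue magnitude: the closer that ratio is to 1, the more iterations are needed.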

Page 8:

Power Iteration Clustering (PIC)

Unfortunately, since the sum of each row of NA is 1, the largest eigenvector of NA (the smallest of L) is a constant vector with eigenvalue 1.

Fortunately, the intermediate vectors during the convergence process are interesting.

Example: x_i ∈ R^2 with s(x_i, x_j) = exp(-||x_i - x_j||^2 / (2σ^2))

Conclusion: PI first converges locally within a cluster.
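A tiny NumPy demonstration of this local convergence on two synthetic Gaussian blobs (all data and parameter choices here are illustrative):

import numpy as np

# Two well-separated 2-D blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])

# Gaussian similarity: s(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 sigma^2))
sigma = 1.0
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-sq_dists / (2 * sigma ** 2))

NA = W / W.sum(axis=1, keepdims=True)    # NA = D^{-1} W (row-normalize)

v = rng.random(X.shape[0])
for t in range(10):                      # a few truncated PI steps
    v = NA @ v
    v /= np.abs(v).sum()

# Within-blob spread shrinks much faster than the between-blob gap.
print(v[:50].std(), v[50:].std(), abs(v[:50].mean() - v[50:].mean()))

After a few iterations the entries of v are nearly constant within each blob but still differ across blobs, which is the cluster-indicator behavior the slide describes.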

Page 9:

PI's Convergence

Let W = NA (the normalized affinity matrix).

W has eigenvectors e_1, ..., e_n with eigenvalues λ_1, ..., λ_n;
λ_1 = 1 and e_1 is constant;
λ_2, ..., λ_k are larger than the remaining eigenvalues.

Spectral representation of a: [e_2(a), ..., e_k(a)]
Spectral distance between a and b: spec(a, b) = sqrt( Σ_{j=2}^{k} [e_j(a) - e_j(b)]^2 )

Assume W has an (α, β)-eigengap between the k-th and (k+1)-th eigenvectors, and that every eigenvector of W is bounded.

Let v^t be the t-th iteration vector of PI. The (v^0, t)-distance between a and b is |v^t(a) - v^t(b)|.

Page 10:

v^t(a) - v^t(b) = Σ_{j=2}^{n} [e_j(a) - e_j(b)] c_j λ_j^t

The terms with 2 ≤ j ≤ k make up signal^t(a, b); the terms with j > k make up noise^t(a, b).
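A short derivation sketch of this decomposition, expanding v^0 in the eigenbasis of W (normalization factors are omitted since they rescale every entry equally):

% Write v^0 in the eigenbasis of W:
v^0 = c_1 e_1 + c_2 e_2 + \dots + c_n e_n
% Each PI step multiplies by W, and W e_j = \lambda_j e_j, so
v^t = W^t v^0 = \sum_{j=1}^{n} c_j \lambda_j^t e_j .
% e_1 is constant (with \lambda_1 = 1), so its contribution cancels in differences:
v^t(a) - v^t(b)
  = \sum_{j=2}^{k} \big[e_j(a) - e_j(b)\big] c_j \lambda_j^t
  + \sum_{j=k+1}^{n} \big[e_j(a) - e_j(b)\big] c_j \lambda_j^t
  = \mathrm{signal}^t(a,b) + \mathrm{noise}^t(a,b)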

Page 11:

signal^t is an approximation of spec, but:

a) it is compressed to a small radius R^t
b) it has components distorted by c_j and λ_j^t
c) its terms are additively combined (rather than Euclidean)

a) The size of the radius is of no importance in clustering, because most clustering methods are based on relative distance, not absolute distance.

b) The importance of the dimension associated with the j-th eigenvector is downweighted by (a power of) its eigenvalue, which often improves performance for spectral methods.

c) For many natural problems, W is approximately block-stochastic, and hence the first k eigenvectors are approximately piecewise constant over the k clusters.

It is easy to see that when spec(a, b) is small, signal^t must also be small. However, when a and b are in different clusters, since the terms are signed and additively combined, it is possible that they may "cancel out" and make a and b seem to be in the same cluster. Fortunately, this seems to be uncommon in practice when the cluster number k is not too large.

So for a large enough eigengap α and a small enough t, signal^t is very likely a good cluster indicator.

Page 12:

Early stopping for PI

velocity at t: δ^t = v^t - v^{t-1}
acceleration at t: ε^t = δ^t - δ^{t-1}
if ||ε^t||_∞ ≤ ε̂, then stop PI

While the clusters are "locally converging", the rate of convergence changes rapidly; whereas during the final global convergence, the convergence rate appears more stable.

1. ε̂ = (1×10^{-5})/n, where n is the number of data instances
2. v^0(i) = V(A_i)/V(A), where V(A_i) = Σ_j A_ij and V(A) = Σ_{i,j} A_ij
3. V = [v_1, ..., v_k] for a k-dimensional embedding (one dimension is usually good enough)
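A sketch of the full PIC loop with this stopping rule (a minimal reading of the slide, not the authors' reference implementation; the final k-means step uses scikit-learn for brevity):

import numpy as np
from sklearn.cluster import KMeans

def pic(W, k, max_iters=1000):
    """Power iteration clustering on an affinity matrix W, with early stopping."""
    n = W.shape[0]
    NA = W / W.sum(axis=1, keepdims=True)      # NA = D^{-1} W
    v = W.sum(axis=1) / W.sum()                # v^0(i) = V(A_i) / V(A)
    eps_hat = 1e-5 / n                         # stopping threshold
    delta_prev = np.zeros(n)
    for t in range(max_iters):
        v_next = NA @ v
        v_next /= np.abs(v_next).sum()         # keep v^t on a fixed scale
        delta = v_next - v                     # velocity delta^t = v^t - v^{t-1}
        accel = delta - delta_prev             # acceleration eps^t = delta^t - delta^{t-1}
        v, delta_prev = v_next, delta
        if np.abs(accel).max() <= eps_hat:     # stop when ||eps^t||_inf <= eps_hat
            break
    # Cluster the one-dimensional embedding v with k-means.
    return KMeans(n_clusters=k, n_init=10).fit_predict(v.reshape(-1, 1))

The per-step L1 normalization is one reasonable choice for keeping v^t on a fixed scale so that velocity and acceleration are comparable across iterations.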

Page 13:

Experiments (1/3)

• Purity: cluster purity
• NMI: normalized mutual information
• RI: Rand index

The Rand index (or Rand measure) is a measure of the similarity between two data clusterings. Given a set S of n elements and two partitions of S to compare, X and Y, we define the following:

a = the number of pairs of elements in S that are in the same set in X and in the same set in Y
b = the number of pairs that are in different sets in X and in different sets in Y
c = the number of pairs that are in the same set in X and in different sets in Y
d = the number of pairs that are in different sets in X and in the same set in Y

Then: RI = (a + b) / (a + b + c + d)
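A small sketch of the Rand index over label vectors (function name and example labels are illustrative):

from itertools import combinations

def rand_index(labels_x, labels_y):
    """Rand index between two clusterings given as label vectors."""
    a = b = c = d = 0
    for i, j in combinations(range(len(labels_x)), 2):
        same_x = labels_x[i] == labels_x[j]
        same_y = labels_y[i] == labels_y[j]
        if same_x and same_y:
            a += 1          # same set in both partitions
        elif not same_x and not same_y:
            b += 1          # different sets in both partitions
        elif same_x:
            c += 1          # same set in X, different sets in Y
        else:
            d += 1          # different sets in X, same set in Y
    return (a + b) / (a + b + c + d)

print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))   # 1.0: the partitions agree exactly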

Page 14:

Experiments (2/3)

Experimental comparisons on accuracy of PIC

Experimental comparisons on eigenvalue weighting

Page 15:

Experiments (3/3)

Experimental comparisons on scalability

NCutE uses the slower, classic eigenvalue decomposition method to find all eigenvectors.
NCutI uses the fast Implicitly Restarted Arnoldi Method (IRAM) for the top k eigenvectors.

Synthetic dataset

Page 16:

Conclusion

• Novel
• Simple
• Efficient

Page 17:

Page 18:

Appendix ---- NCut

Page 19:

Appendix ---- NJW