Impact of Regularization on Spectral Clustering Antony Joseph* and Bin Yu# * Walmart Research Lab in San Francisco (formerly UCB and LBNL) # Departments of Statistics and EECS UC Berkeley Workshop on Spectral Algorithms, Simons Inst, Oct. , 2014 Monday, November 3, 2014


Impact of Regularization on Spectral Clustering

Antony Joseph* and Bin Yu#

* Walmart Research Lab in San Francisco (formerly UCB and LBNL)

# Departments of Statistics and EECS, UC Berkeley

Workshop on Spectral Algorithms, Simons Inst, Oct. , 2014

Monday, November 3, 2014

Overview

• Spectral clustering in graphs

• Berkeley Drosophila Genome Project (BDGP) (the fruit fly project)

Collaborators:

• Siqi Wu, UC Berkeley

• Erwin Frise, Lawrence Berkeley Lab

• Ann Hammonds, Lawrence Berkeley Lab

• Sue Celniker, Lawrence Berkeley Lab

A Graph

Context                  Nodes
Social network           people
The fruit fly project    pixels/points in the embryo template
...

The fruit fly project (Berkeley Drosophila Genome Project)

Drosophila (fruit fly)

Widely studied:

• genetic mechanism similar to humans

• easy to maintain in the lab

• short life cycle

• ...

Image dataset from the fruit fly project

[Figure: stained embryo images; labeled example: tailless]

• Over 100,000 stained embryo images (over 7000 genes)

Goals: Contribute to the understanding of ...

• the interaction between different genes

• the genes required for development of various organs.

‘Fate’ map in early embryos

Laser ablation experiments in embryos in early stages of development

Lohs-Schardin et al. (’70), Hartenstein et al. (’85)

[Figure: fate map with labeled regions]

• hind gut

• anterior mid-gut

• dorsal epidermis

• ventral neurogenic region

• procephalic neurogenic region

• pharynx

• mesoderm

• esophagus

Do genes explain the ‘fate’ map?

.... early stage gene expression images

Discovery of fate map & communities on graphs

• Nodes: pixels/points in the embryo

• Edge if a lot of genes are co-expressed at the two nodes

• Communities on this graph: fate map ??

Edge between node i and node j

[Equations lost in extraction: two quantities, one per node, are written out and compared through an inequality; the surviving label on the threshold is "90-th percentile"]
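The exact edge rule did not survive on the slide, so the following is only a minimal sketch of one way a percentile-thresholded co-expression graph could be built. It assumes (these are assumptions, not the authors' stated method) a binary matrix `expr` with one row per node and one column per gene, a score equal to the number of genes expressed at both nodes, and an edge whenever that score exceeds the 90th percentile of all pairwise scores. The function name `coexpression_graph` and the synthetic data are hypothetical.

```python
import numpy as np

def coexpression_graph(expr, pct=90.0):
    """Symmetric 0/1 adjacency matrix from a binary expression matrix.

    expr[i, g] = 1 if gene g is expressed at node (pixel) i, else 0.
    Assumed rule: edge (i, j) if the number of genes expressed at both
    nodes exceeds the pct-th percentile of all pairwise counts.
    """
    expr = np.asarray(expr, dtype=float)
    score = expr @ expr.T                  # score[i, j] = co-expressed gene count
    n = score.shape[0]
    iu = np.triu_indices(n, k=1)           # each unordered pair once
    cutoff = np.percentile(score[iu], pct)
    A = (score > cutoff).astype(int)
    np.fill_diagonal(A, 0)                 # no self-loops
    return A

# Synthetic stand-in for the image data: 50 nodes, 200 genes.
rng = np.random.default_rng(0)
expr = (rng.random((50, 200)) < 0.1).astype(int)
A_demo = coexpression_graph(expr)
```

Thresholding at a percentile, rather than at a fixed count, keeps the edge density roughly constant regardless of how many genes are in the dataset.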

Comparing unregularized vs. regularized SC

Take K = 8

[Figure: two clustered embryo maps, one per value of τ (both values lost in extraction), with labeled regions]

• dorsal epidermis

• mesoderm

• hind gut

• ventral neurogenic region

• procephalic neurogenic region

• anterior mid-gut

• pharynx

• esophagus

Communities

Nodes                        Communities
people                       like-minded people
pixels/points in embryo      area of future organs
...

Finding communities

Notion of (two) communities

Methods

• Spectral clustering (Fiedler (’73), Donath & Hoffman (’73), ...)

• Modularity (Newman & Girvan (’03)), Latent space methods (Hoff et al. (’02)), Profile-likelihood (Bickel & Chen (’09)), Pseudo-likelihood (Amini et al. (’13))

Spectral Clustering

Notation

Number of nodes: $n$

Adjacency matrix (symmetric, binary): $A \in \mathbb{R}^{n \times n}$,

$A_{ij} = A_{ji} = 1$ if there is an edge between $(i, j)$, and $0$ otherwise.

Each row/column of $A$ is associated with a node.

Degree matrix (diagonal): $D \in \mathbb{R}^{n \times n}$,

$D_{ii} = \sum_j A_{ij}$

Spectral Clustering

Spectral clustering deals with the eigenvectors of the matrix:

$L = D^{-1/2} A D^{-1/2}$ (Normalized symmetric Laplacian matrix)

Other matrices used ...

• $D - A$ (Unnormalized Laplacian)

• $A$ (Adjacency matrix)

• $D^{-1} A$ (Normalized random walk Laplacian)
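As a small illustration of the matrices listed above, the following sketch builds them with NumPy for a toy graph. The helper name `graph_matrices` and the 3-node path graph are illustrative assumptions; the formulas are the ones on the slide.

```python
import numpy as np

def graph_matrices(A):
    """Return (D^{-1/2} A D^{-1/2}, D^{-1} A, D - A) for an adjacency matrix A."""
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)                       # degrees D_ii = sum_j A_ij
    if np.any(d <= 0):
        raise ValueError("every node needs positive degree")
    inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = inv_sqrt[:, None] * A * inv_sqrt[None, :]   # normalized symmetric
    L_rw = A / d[:, None]                               # normalized random walk
    L_unnorm = np.diag(d) - A                           # unnormalized
    return L_sym, L_rw, L_unnorm

# Path graph on 3 nodes: 0 - 1 - 2
A_path = np.array([[0, 1, 0],
                   [1, 0, 1],
                   [0, 1, 0]])
L_sym, L_rw, L_unnorm = graph_matrices(A_path)
```

Note that a node of degree zero makes $D^{-1/2}$ undefined, which is why the sketch rejects such inputs.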

Illustration of SC

[Figure: adjacency matrix $A$ and Laplacian $L$ of an example graph; scatter plot of the second eigenvector against the first eigenvector, with the clusters marked]

SC for finding K clusters (Shi and Malik (’00), Ng et al. (’02))

• Form the $n \times K$ matrix $V$ of the $K$ leading eigenvectors of $L$

• Cluster the rows of $V$ into $K$ groups
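The two steps of spectral clustering for K clusters can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the talk: "leading" is taken to mean largest in magnitude, the rows are clustered with a small hand-written k-means (Lloyd's algorithm with a deterministic farthest-point start), and the row normalization used in some variants is omitted. All function names are hypothetical.

```python
import numpy as np

def top_eigenvectors(L, K):
    """n x K matrix of eigenvectors of symmetric L, K largest |eigenvalues|."""
    vals, vecs = np.linalg.eigh(L)
    order = np.argsort(-np.abs(vals))[:K]
    return vecs[:, order]

def kmeans(X, K, n_iter=100):
    """Plain Lloyd's algorithm with farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, K):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                        else centers[k] for k in range(K)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def spectral_clustering(A, K):
    A = np.asarray(A, dtype=float)
    s = 1.0 / np.sqrt(A.sum(axis=1))
    L = s[:, None] * A * s[None, :]         # D^{-1/2} A D^{-1/2}
    return kmeans(top_eigenvectors(L, K), K)

# Two triangles {0,1,2} and {3,4,5} joined by the single edge 2-3.
A_bar = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A_bar[i, j] = A_bar[j, i] = 1
labels_bar = spectral_clustering(A_bar, 2)
```

On this toy graph the second eigenvector changes sign between the two triangles, so the two groups found are the triangles themselves.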

Popularity of spectral clustering

• Computational advantage:

  - requires an eigenvector decomposition, which is very fast

• Theoretical backing:

  - relaxation of various cut-based measures (Hagen & Kahng (’92), Shi & Malik (’00), Ng et al. (’02))

  - Stochastic Block Model and its extensions (McSherry (’01), Rohe et al. (’11), Chaudhuri et al. (’12), Sussman (’12), Fishkind (’11))

Performance of spectral clustering improves greatly through regularization

Regularization proposed by Amini, Chen, Bickel and Levina (AoS, 2013):

• Instead of $A$, use $A_\tau = A + \frac{\tau}{n} \mathbf{1}\mathbf{1}^\top$, $\tau > 0$.

• $L_\tau$: Laplacian of $A_\tau$

• $V_\tau$: $n \times K$ matrix of the $K$ leading eigenvectors of $L_\tau$

Alternative forms of regularization proposed and analyzed in Chaudhuri et al. (2012), Qin & Rohe (’13)
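A minimal sketch of the regularized Laplacian, assuming the normalized symmetric form $D_\tau^{-1/2} A_\tau D_\tau^{-1/2}$ given in the recap slide of this talk. The function name `regularized_laplacian` and the toy graph are illustrative. The example graph has an isolated node, for which the unregularized Laplacian is undefined (zero degree), while any $\tau > 0$ makes every regularized degree positive.

```python
import numpy as np

def regularized_laplacian(A, tau):
    """D_tau^{-1/2} A_tau D_tau^{-1/2} with A_tau = A + (tau/n) 1 1^T."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    A_tau = A + tau / n                 # adds tau/n to every entry
    d_tau = A_tau.sum(axis=1)           # regularized degrees: degree + tau
    s = 1.0 / np.sqrt(d_tau)
    return s[:, None] * A_tau * s[None, :]

# One edge (0-1) plus an isolated node 2.
A_iso = np.array([[0, 1, 0],
                  [1, 0, 0],
                  [0, 0, 0]])
L_tau_demo = regularized_laplacian(A_iso, tau=1.0)
```

Because every entry of $A_\tau$ is positive, the largest eigenvalue of the result is 1, with eigenvector proportional to the square roots of the regularized degrees.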

Stochastic Block Model

Stochastic Block Model (SBM) (Holland et al. (’83))

• Graph on $n$ nodes

• Edge between nodes $(i, j)$ present with probability $P_{ij}$

SBM with two blocks, with within-block probabilities $p_1, p_2$ and between-block probability $q$:

$P = \begin{pmatrix} p_1 \mathbf{1}\mathbf{1}^\top & q \mathbf{1}\mathbf{1}^\top \\ q \mathbf{1}\mathbf{1}^\top & p_2 \mathbf{1}\mathbf{1}^\top \end{pmatrix}$ ($n \times n$)

[Figure: a sample from a two-block SBM; block probabilities shown: .4, .3, .2, .2]
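Sampling from a two-block SBM can be sketched as below. The function name `sbm_two_blocks`, the block sizes, and the assignment of the slide's numbers to $p_1 = .4$, $p_2 = .3$, $q = .2$ are assumptions made for illustration; edges are drawn independently for each pair, as in the standard definition of the model.

```python
import numpy as np

def sbm_two_blocks(n1, n2, p1, p2, q, rng):
    """Sample a symmetric 0/1 adjacency matrix from a two-block SBM.

    Returns (A, P, labels); P is the n x n matrix of edge probabilities.
    """
    n = n1 + n2
    labels = np.array([0] * n1 + [1] * n2)
    B = np.array([[p1, q],
                  [q, p2]])
    P = B[labels][:, labels]                        # P[i, j] = B[label_i, label_j]
    upper = np.triu(rng.random((n, n)) < P, k=1)    # one draw per pair i < j
    A = (upper | upper.T).astype(int)               # symmetrize, no self-loops
    return A, P, labels

rng = np.random.default_rng(0)
A_sbm, P_sbm, z = sbm_two_blocks(100, 100, p1=0.4, p2=0.3, q=0.2, rng=rng)
```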

Analysis of regularization for the SBM (Focus on K = 2)

Comparing unregularized vs. regularized SC

[Figure: two panels, one per value of τ (values lost in extraction), each plotting the second sample eigenvector against the first sample eigenvector; block probabilities shown: .003, .04, .0025, .0025]

k-means success: 100%

k-means success: 87%

Recap: Regularized spectral clustering

$A_\tau = A + \frac{\tau}{n} \mathbf{1}\mathbf{1}^\top$, $\tau > 0$.

$L_\tau = D_\tau^{-1/2} A_\tau D_\tau^{-1/2}$

$V_\tau$: leading eigenvectors of $L_\tau$

Population level quantities

• $P$: population counterpart of the adjacency matrix $A$

• $P_\tau = P + \frac{\tau}{n} \mathbf{1}\mathbf{1}^\top$: population counterpart of $A_\tau$

• $L^{\mathrm{pop}}_\tau$: Laplacian of $P_\tau$, the population counterpart of $L_\tau$

• Recall: $V_\tau$ is $n \times 2$; $V^{\mathrm{pop}}_\tau$ is its population counterpart

• [Two quantities indexed by $(1, \tau)$ and $(2, \tau)$; symbols lost in extraction]

[Figure, shown over several builds: second sample eigenvector plotted against first sample eigenvector, for a given τ]

[Equation partly lost in extraction: a τ-indexed quantity defined as a maximum of norms of differences of τ-indexed terms, divided by the norm of the difference of two τ-indexed terms]

[Equation partly lost in extraction: the τ-indexed quantity is related to $L_\tau$, $L^{\mathrm{pop}}_\tau$ and the norm of the difference of the two terms indexed by $(1, \tau)$ and $(2, \tau)$]

Implication of matrix perturbation theory (Davis-Kahan): the τ-indexed quantity is bounded, up to a factor lost in extraction, by

$\|L_\tau - L^{\mathrm{pop}}_\tau\| \,/\, \mu_{2,\tau}$

($\mu_{2,\tau}$ depends on $\tau$; the rest of this remark is lost in extraction.)

Implication of concentration of Laplacian (Oliveira (’10)): with high probability, $\|L_\tau - L^{\mathrm{pop}}_\tau\|$ is bounded by a minimum of terms involving $\tau$ and $\log n$ [exact terms lost in extraction].

Improvements using extension of techniques in Balakrishnan et al. (’11).
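The two ingredients of the perturbation bound, the deviation $\|L_\tau - L^{\mathrm{pop}}_\tau\|$ and the eigenvalue $\mu_{2,\tau}$, can be computed numerically for a simulated two-block SBM. The sketch below makes several assumptions that are not confirmed by the slides: the norm is the spectral norm, $\mu_{2,\tau}$ is the second largest eigenvalue magnitude of $L^{\mathrm{pop}}_\tau$, and $P$ keeps its diagonal entries. Function names and parameter values are hypothetical.

```python
import numpy as np

def reg_laplacian(M, tau):
    """D_tau^{-1/2} M_tau D_tau^{-1/2} with M_tau = M + (tau/n) 1 1^T."""
    M = np.asarray(M, dtype=float)
    M_tau = M + tau / M.shape[0]
    s = 1.0 / np.sqrt(M_tau.sum(axis=1))
    return s[:, None] * M_tau * s[None, :]

def perturbation_ratio(A, P, tau):
    """Return (||L_tau - L_pop_tau||, mu_2_tau, their ratio)."""
    L_tau = reg_laplacian(A, tau)
    L_pop = reg_laplacian(P, tau)
    noise = np.linalg.norm(L_tau - L_pop, ord=2)            # spectral norm
    mags = np.sort(np.abs(np.linalg.eigvalsh(L_pop)))[::-1]
    mu2 = mags[1]                                           # 2nd largest magnitude
    return noise, mu2, noise / mu2

# Balanced two-block SBM: n = 200, p1 = p2 = 0.05, q = 0.01.
rng = np.random.default_rng(0)
z = np.repeat([0, 1], 100)
B = np.array([[0.05, 0.01], [0.01, 0.05]])
P = B[z][:, z]
upper = np.triu(rng.random((200, 200)) < P, k=1)
A = (upper | upper.T).astype(float)

taus = [0.5, 2.0, 8.0, 32.0]        # tau > 0 keeps all degrees positive
rows = [(tau, *perturbation_ratio(A, P, tau)) for tau in taus]
```

For this balanced model $\mu_{2,\tau} = 4 / (6 + \tau)$, so the eigenvalue in the denominator shrinks as $\tau$ grows; whether the ratio improves then depends on how quickly the deviation in the numerator shrinks, which is the trade-off the bound expresses.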

Let $d_n :=$ [definition lost in extraction].

Set $\tau = d_n$.

Result (SBM with two blocks): [condition partly lost in extraction; surviving fragments: $d_n$, $\sqrt{n}$, $\log n$, $\mu_{2,0}$]

Summary: [lost in extraction]

Choice of regularization parameter

Recall: trade-offs dictated by

$\|L_\tau - L^{\mathrm{pop}}_\tau\| \,/\, \mu_{2,\tau}$

Data-driven counterpart, as a function of $\tau$:

$\|L_\tau - \hat{L}^{\mathrm{pop}}_\tau\| \,/\, \hat{\mu}_{2,\tau}$

Estimates based on estimated SBM (or degree-corrected SBM)

Obtaining $\hat{L}^{\mathrm{pop}}_\tau$ and $\hat{\mu}_{2,\tau}$ for a given $\tau$

• Cluster estimates $C_1, C_2$ for the given $\tau$

• Estimate $p_1, p_2, q$ from $C_1$ and $C_2$, e.g. $\hat{p}_1$ from $C_1$

• $\hat{P}$: the two-block matrix $P$ with $\hat{p}_1, \hat{p}_2, \hat{q}$ plugged in

• From $\hat{P}$, form $\hat{L}^{\mathrm{pop}}_\tau$; $\hat{\mu}_{2,\tau}$ is computed from $\hat{L}^{\mathrm{pop}}_\tau$
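One way the steps above could be turned into a data-driven choice of $\tau$ is sketched below for K = 2. Everything beyond the listed steps is an assumption: the block probabilities are estimated by edge densities, the criterion is the plug-in ratio $\|L_\tau - \hat{L}^{\mathrm{pop}}_\tau\| / \hat{\mu}_{2,\tau}$ with the spectral norm, and $\tau$ is picked by minimizing that ratio over a small grid. All names are hypothetical, and the degree-corrected variant is not covered.

```python
import numpy as np

def reg_laplacian(M, tau):
    M = np.asarray(M, dtype=float)
    M_tau = M + tau / M.shape[0]              # M + (tau/n) 1 1^T
    s = 1.0 / np.sqrt(M_tau.sum(axis=1))
    return s[:, None] * M_tau * s[None, :]

def two_means(X, n_iter=50):
    # Deterministic 2-means: start from row 0 and the row farthest from it.
    centers = np.array([X[0], X[np.argmax(((X - X[0]) ** 2).sum(axis=1))]])
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
        if labels.min() == labels.max():      # one cluster emptied: stop
            break
        centers = np.array([X[labels == k].mean(axis=0) for k in (0, 1)])
    return labels

def density(A, rows, cols, same):
    sub = A[np.ix_(rows, cols)]
    if same:                                  # within a block: skip the diagonal
        m = len(rows)
        return sub.sum() / max(m * (m - 1), 1)
    return sub.mean()

def criterion(A, tau):
    L_tau = reg_laplacian(A, tau)
    vals, vecs = np.linalg.eigh(L_tau)
    labels = two_means(vecs[:, np.argsort(-np.abs(vals))[:2]])
    c1, c2 = np.flatnonzero(labels == 0), np.flatnonzero(labels == 1)
    if len(c1) == 0 or len(c2) == 0:
        return np.inf
    q_hat = density(A, c1, c2, False)
    B_hat = np.array([[density(A, c1, c1, True), q_hat],
                      [q_hat, density(A, c2, c2, True)]])
    L_hat = reg_laplacian(B_hat[labels][:, labels], tau)   # from P_hat
    mu2_hat = np.sort(np.abs(np.linalg.eigvalsh(L_hat)))[::-1][1]
    if mu2_hat <= 1e-12:
        return np.inf
    return np.linalg.norm(L_tau - L_hat, ord=2) / mu2_hat

# Simulated two-block SBM (hypothetical parameters).
rng = np.random.default_rng(1)
z = np.repeat([0, 1], 100)
B_true = np.array([[0.10, 0.02], [0.02, 0.08]])
P_true = B_true[z][:, z]
upper = np.triu(rng.random((200, 200)) < P_true, k=1)
A = (upper | upper.T).astype(float)

taus = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
scores = [criterion(A, t) for t in taus]
best_tau = taus[int(np.argmin(scores))]
```

The criterion only uses the observed graph, so it can be evaluated on real data where $P$ is unknown.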

Example

[Figure: two panels, τ = 0 and τ = 18, each plotting the second sample eigenvector against the first sample eigenvector; block probabilities shown: .003, .01, .0025, .0025]

k-means success: 94%

k-means success: 75%

Political blog data

Page 94: Impact of Regularization on Spectral Clustering · Impact of Regularization on Spectral Clustering Antony Joseph* and Bin Yu# * Walmart Research Lab in San Francisco (formerly UCB

/46

source : Adamic & Glance (’05)

=

36Monday, November 3, 2014

Page 95: Impact of Regularization on Spectral Clustering · Impact of Regularization on Spectral Clustering Antony Joseph* and Bin Yu# * Walmart Research Lab in San Francisco (formerly UCB

/46

Histogram of degrees

[Figure: histogram; horizontal axis from 0 to 400, vertical axis from 0 to 800]

source : Adamic & Glance (’05)


Political blogs data set

Unregularized Spectral Clustering

[Figure: two panels plotting the first eigenvector and the second eigenvector against nodes]


second eigenvector discriminates these from the remaining


third eigenvector


Regularized SC for political blogs dataset

13% of nodes misclassified with regularized SC, compared to 48% with unregularized SC

[Figure: two panels plotting the first eigenvector and the second eigenvector against nodes]

τ = .


Comparing unregularized vs. regularized Spectral Clustering (SC)

Take K = 8

[Figure: two panels, each labeled with a value of τ]

• dorsal epidermis
• mesoderm
• hind gut
• ventral neurogenic region
• procephalic neurogenic region
• anterior mid-gut
• pharynx
• esophagus


Summary

• The theoretical upper bound under the SBM shows a “bias-variance”-like trade-off as the amount of regularization in SC increases.

• The theoretical analysis motivates a practically useful scheme (using the SBM or degree-corrected SBM) for selecting the regularization parameter in RSC.

Promising results in fruitfly image segmentation

Paper at (2014 rev): http://arxiv.org/pdf/1312.1733.pdf


Ongoing/future directions

Spectral Clustering (with Antony Joseph)

• Fast algorithm for computing the data-driven choice of regularization parameter
• Role of regularization in other scenarios, such as hierarchical clusters
• Regularization parameter choice for continuous data

The BDGP project (with Antony Joseph, Siqi Wu, Ann Hammonds, Sue Celniker, Erwin Frise)

• Analysis of gene interactions in different regions of early stage embryos
• Extension of analysis to later stage embryos
