The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

Flexible and Robust Co-Regularized Multi-Domain Graph Clustering

Wei Cheng¹, Xiang Zhang², Zhishan Guo¹, Yubao Wu², Patric F. Sullivan¹, Wei Wang³

¹University of North Carolina at Chapel Hill, ²Case Western Reserve University, ³University of California, Los Angeles

Speaker: Wei Cheng

The 19th ACM Conference on Knowledge Discovery and Data Mining (SIGKDD’13)

Outline

• Introduction

• Motivation

• Co-regularized multi-domain graph clustering
  • Single-domain graph clustering
  • Cross-domain co-regularization
    • Residual sum of squares (RSS) loss
    • Clustering disagreement (CD) loss

• Re-evaluating cross-domain relationships

• Experimental Study

• Conclusion

Graph and Graph Clustering

• Graphs are ubiquitous
  • Social networks
  • Biological interaction networks
  • Literature citation networks, etc.

• Graph clustering
  • Decompose a network into sub-networks based on some topological properties
  • Usually we look for dense sub-networks

E.g., detecting protein functional modules in a PPI network

(Figure from Nataša Pržulj, Introduction to Bioinformatics, 2011.)

E.g., community detection

Collaboration network between scientists (figure from Santo Fortunato, Community detection in graphs)

Multi-view Graph Clustering

• Graphs collected from multiple sources/domains

• Multi-view graph clustering
  • Refine clustering
  • Resolve ambiguity

Motivation

• Multi-view graph clustering assumes:
  • Exact one-to-one mapping
  • Complete mapping
  • Graphs of the same size

• More common cases:
  • Many-to-many mapping
  • Partial mapping that must be tolerated
  • Graphs of different sizes
  • Mappings associated with weights (confidence)

Motivation

• Objective: design an algorithm that offers
  • Flexibility: suitable for the common cases, i.e., many-to-many, weighted, partial mappings
  • Robustness: noisy graphs have little influence on the others

Problem Formulation

$A^{(1)}, A^{(2)}, A^{(3)}$: affinity matrices of the graphs in domains $D_1, D_2, D_3$

$S^{(i,j)}_{a,b}$ denotes the weight between the a-th instance in $D_j$ and the b-th instance in $D_i$.

Goal: partition each $A^{(\pi)}$ into $k_\pi$ clusters while respecting the co-regularization constraints implicitly encoded in the cross-domain relationships $S$.

Co-regularized multi-domain graph clustering (CGC)

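• Single-domain clustering: symmetric non-negative matrix factorization (symmetric NMF), minimizing

$$L^{(\pi)} = \| A^{(\pi)} - H^{(\pi)} (H^{(\pi)})^T \|_F^2 \quad \text{s.t. } H^{(\pi)} \ge 0$$

Here, $H^{(\pi)} = [h^{(\pi)}_{1*}, \ldots, h^{(\pi)}_{a*}, \ldots, h^{(\pi)}_{n_\pi *}]^T \in \mathbb{R}^{n_\pi \times k_\pi}$, where each row $h^{(\pi)}_{a*}$ represents the cluster assignment of the a-th instance in domain $D_\pi$.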
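A minimal NumPy sketch of single-domain symmetric NMF follows. It assumes the damped multiplicative update commonly used for this objective; the slides do not state which solver CGC uses. The names `symnmf`, `symnmf_loss`, `n_iter`, `beta`, and the toy graph are illustrative, and a hard cluster label is read off as the largest entry in each row of $H$.

```python
import numpy as np


def symnmf(A, k, n_iter=1000, beta=0.5, eps=1e-12, seed=0):
    """Approximate a symmetric non-negative A by H @ H.T with H >= 0.

    Assumed solver (damped multiplicative update, not necessarily CGC's):
        H <- H * ((1 - beta) + beta * (A @ H) / (H @ H.T @ H))
    Every factor is non-negative, so H stays non-negative.
    """
    rng = np.random.default_rng(seed)
    H = rng.random((A.shape[0], k))
    for _ in range(n_iter):
        numer = A @ H
        denom = H @ (H.T @ H) + eps  # eps guards against division by zero
        H *= (1.0 - beta) + beta * numer / denom
    return H


def symnmf_loss(A, H):
    """L = ||A - H H^T||_F^2."""
    return float(np.sum((A - H @ H.T) ** 2))


# Toy graph: two disjoint 3-node cliques (self-loops included).
A = np.kron(np.eye(2), np.ones((3, 3)))
H = symnmf(A, k=2)
labels = H.argmax(axis=1)  # hard cluster = largest entry in each row
print(labels, round(symnmf_loss(A, H), 4))
```

On this toy graph the two cliques should receive different labels, since any shared column would create cross-clique similarity that the loss penalizes.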

Co-regularized multi-domain graph clustering (CGC)

• Cross-domain co-regularization
  • Residual sum of squares (RSS) loss: used when the number of clusters is the same for different domains.
  • Clustering disagreement (CD) loss: used whether the numbers of clusters are the same or different.

Co-regularized multi-domain graph clustering (CGC)

• Residual sum of squares (RSS) loss: directly compare the $H^{(\pi)}$ inferred in different domains. To penalize inconsistency between the cross-domain cluster partitions for the l-th cluster in $D_i$, the loss for the b-th instance $x^{(j)}_b$ in $D_j$ is

$$J^{(i,j)}_{b,l} = \left( E^{(i,j)}(x^{(j)}_b, l) - h^{(j)}_{b,l} \right)^2$$

where

$$E^{(i,j)}(x^{(j)}_b, l) = \frac{1}{|N^{(i,j)}(x^{(j)}_b)|} \sum_{a \in N^{(i,j)}(x^{(j)}_b)} S^{(i,j)}_{b,a} \, h^{(i)}_{a,l}$$

Here $N^{(i,j)}(x^{(j)}_b)$ denotes the set of indices of instances in $D_i$ that are mapped to $x^{(j)}_b$, and $|N^{(i,j)}(x^{(j)}_b)|$ is its cardinality.

The RSS loss is

$$J^{(i,j)}_{RSS} = \sum_{l=1}^{k} \sum_{b=1}^{n_j} J^{(i,j)}_{b,l} = \| S^{(i,j)} H^{(i)} - H^{(j)} \|_F^2$$

where the last equality holds when each row b of $S^{(i,j)}$ has been divided by $|N^{(i,j)}(x^{(j)}_b)|$.
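The RSS loss can be sketched as follows, assuming $S^{(i,j)}$ is stored as a dense $n_j \times n_i$ array and that instances of $D_j$ with no mapping are simply skipped. Skipping is one way to tolerate partial mappings; the slides do not say how such rows are treated. The function name `rss_loss` and the toy matrices are hypothetical.

```python
import numpy as np


def rss_loss(S, H_i, H_j):
    """RSS co-regularization loss between domains D_i and D_j.

    S   : (n_j, n_i) non-negative weights; S[b, a] > 0 links instance b of D_j
          to instance a of D_i. Many-to-many and partial mappings are allowed.
    H_i : (n_i, k) cluster assignments of D_i.
    H_j : (n_j, k) cluster assignments of D_j (same k as H_i).
    """
    S = np.asarray(S, dtype=float)
    n_mapped = np.count_nonzero(S, axis=1)  # |N(x_b)| for every b in D_j
    mapped = n_mapped > 0                   # assumption: unmapped b are skipped
    # E[b, l] = (1 / |N(x_b)|) * sum_{a in N(x_b)} S[b, a] * H_i[a, l]
    E = (S[mapped] @ H_i) / n_mapped[mapped, None]
    return float(np.sum((E - H_j[mapped]) ** 2))


H_i = np.array([[1.0, 0.0], [0.0, 1.0]])
S = np.array([[1.0, 1.0],    # b = 0 is mapped to both instances of D_i
              [0.0, 0.0]])   # b = 1 has no mapping (partial mapping)
print(rss_loss(S, H_i, np.array([[0.5, 0.5], [1.0, 0.0]])))  # 0.0 (consistent)
print(rss_loss(S, H_i, np.array([[1.0, 0.0], [0.0, 0.0]])))  # 0.5 (inconsistent)
```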

Co-regularized multi-domain graph clustering (CGC)

• Clustering disagreement (CD) loss: indirectly measure the inconsistency of cross-domain cluster partitions.
  • Intuition: instances A and B are mapped to instance 2, and C is mapped to instance 4. If the similarity between the cluster assignments of 2 and 4 is small, then the similarity between the cluster assignments of A and C, and that between B and C, should also be small.
  • The CD loss is

$$J^{(i,j)}_{CD} = \| S^{(i,j)} H^{(i)} (S^{(i,j)} H^{(i)})^T - H^{(j)} (H^{(j)})^T \|_F^2$$

where the similarity between cluster assignments is measured with a linear kernel (inner product).
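A short sketch of the CD loss, assuming dense NumPy arrays. Because only $n_j \times n_j$ similarity matrices are compared, $H^{(i)}$ and $H^{(j)}$ may have different numbers of columns. The name `cd_loss` and the toy inputs are illustrative.

```python
import numpy as np


def cd_loss(S, H_i, H_j):
    """CD loss ||(S H_i)(S H_i)^T - H_j H_j^T||_F^2.

    S is (n_j, n_i), H_i is (n_i, k_i), H_j is (n_j, k_j). Only n_j x n_j
    linear-kernel similarity matrices are compared, so k_i may differ from k_j.
    """
    P = S @ H_i              # D_i assignments propagated onto D_j's instances
    K_proj = P @ P.T         # inner-product similarity of propagated assignments
    K_j = H_j @ H_j.T        # inner-product similarity within D_j
    return float(np.sum((K_proj - K_j) ** 2))


S = np.eye(2)
H_i = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])  # k_i = 3
same = np.array([[1.0, 0.0], [0.0, 1.0]])           # k_j = 2, agrees with D_i
merged = np.array([[1.0, 0.0], [1.0, 0.0]])         # puts both in one cluster
print(cd_loss(S, H_i, same), cd_loss(S, H_i, merged))  # 0.0 2.0
```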

Co-regularized multi-domain graph clustering (CGC)

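• Objective function (joint matrix optimization):

$$\min_{H^{(\pi)} \ge 0,\ 1 \le \pi \le d} \; \mathcal{O} = \sum_{i=1}^{d} L^{(i)} + \sum_{(i,j) \in I} \lambda^{(i,j)} J^{(i,j)}$$

where d is the number of domains, I is the set of domain pairs with cross-domain relationships, $\lambda^{(i,j)}$ weights the co-regularization between $D_i$ and $D_j$, and $J^{(i,j)}$ is either the RSS or the CD loss.

The objective can be solved with an alternating scheme: optimize it with respect to one $H^{(\pi)}$ while fixing the others.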
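To illustrate the alternating scheme, here is a sketch for two domains with the RSS loss and a single weight `lam`. It assumes damped multiplicative updates derived by splitting each gradient into its positive and negative parts; this is a common heuristic for non-negative objectives and not necessarily the update rule CGC uses. The function `cgc_rss_two_domains` and its parameters are hypothetical.

```python
import numpy as np


def cgc_rss_two_domains(A1, A2, S, k, lam=1.0, n_iter=500, eps=1e-12, seed=0):
    """Alternately update H1 and H2 to decrease
        O = ||A1 - H1 H1^T||_F^2 + ||A2 - H2 H2^T||_F^2 + lam * ||S H1 - H2||_F^2
    subject to H1, H2 >= 0, where S has shape (n2, n1).
    """
    rng = np.random.default_rng(seed)
    H1 = rng.random((A1.shape[0], k))
    H2 = rng.random((A2.shape[0], k))

    def objective():
        return float(np.sum((A1 - H1 @ H1.T) ** 2)
                     + np.sum((A2 - H2 @ H2.T) ** 2)
                     + lam * np.sum((S @ H1 - H2) ** 2))

    history = [objective()]
    for _ in range(n_iter):
        # Step 1: fix H2, update H1. Gradient w.r.t. H1 is
        # (4 H1 H1^T H1 + 2 lam S^T S H1) - (4 A1 H1 + 2 lam S^T H2).
        num = 4 * A1 @ H1 + 2 * lam * S.T @ H2
        den = 4 * H1 @ (H1.T @ H1) + 2 * lam * S.T @ (S @ H1) + eps
        H1 = H1 * (0.5 + 0.5 * num / den)
        # Step 2: fix H1, update H2. Gradient w.r.t. H2 is
        # (4 H2 H2^T H2 + 2 lam H2) - (4 A2 H2 + 2 lam S H1).
        num = 4 * A2 @ H2 + 2 * lam * S @ H1
        den = 4 * H2 @ (H2.T @ H2) + 2 * lam * H2 + eps
        H2 = H2 * (0.5 + 0.5 * num / den)
        history.append(objective())
    return H1, H2, history


A = np.kron(np.eye(2), np.ones((3, 3)))  # two 3-node cliques per domain
S = np.eye(6)                             # one-to-one mapping for the demo
H1, H2, history = cgc_rss_two_domains(A, A, S, k=2)
print(history[0], history[-1])
```

Each update multiplies by a ratio of non-negative quantities, so $H^{(1)}$ and $H^{(2)}$ stay non-negative without an explicit projection.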

Re-Evaluating Cross-Domain Relationship

• Cross-domain instance relationships based on prior knowledge may contain noise.

• It is therefore crucial to let users evaluate whether the provided relationships violate any single-domain clustering structure.

Re-Evaluating Cross-Domain Relationship

• We only need to slightly modify the co-regularization loss functions by multiplying the cross-domain relationships element-wise by a confidence matrix $W^{(i,j)}$:

$$J^{(i,j)}_W = \| (W^{(i,j)} \odot S^{(i,j)}) H^{(i)} - H^{(j)} \|_F^2$$

• Optimize:

$$\min_{W \ge 0,\ H^{(\pi)} \ge 0,\ 1 \le \pi \le d} \; \mathcal{O} = \sum_{i=1}^{d} L^{(i)} + \sum_{(i,j) \in I} \lambda^{(i,j)} J^{(i,j)}_W$$

• Sort the values of $W^{(i,j)}$ and report the smallest elements to users.
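A sketch of the weighted loss and of the reporting step, assuming $W^{(i,j)}$ has already been learned and has the same shape as $S^{(i,j)}$. How $W$ is optimized is not shown; the confidence values below are made up for illustration, and `weighted_rss_loss` and `weakest_relationships` are hypothetical helpers.

```python
import numpy as np


def weighted_rss_loss(W, S, H_i, H_j):
    """J_W = ||(W * S) H_i - H_j||_F^2, where * is the element-wise product."""
    return float(np.sum(((W * S) @ H_i - H_j) ** 2))


def weakest_relationships(W, S, top=5):
    """Return (b, a, confidence) for the `top` existing relationships
    (S[b, a] > 0) with the smallest learned confidence W[b, a]."""
    rows, cols = np.nonzero(S)            # only pairs that were actually mapped
    conf = W[rows, cols]
    order = np.argsort(conf, kind="stable")[:top]
    return [(int(rows[t]), int(cols[t]), float(conf[t])) for t in order]


S = np.array([[1.0, 1.0], [0.0, 1.0]])
W = np.array([[0.9, 0.1], [0.5, 0.3]])    # hypothetical learned confidences
print(weakest_relationships(W, S, top=2))  # [(0, 1, 0.1), (1, 1, 0.3)]
```

Entries of $W$ where $S$ is zero (here `W[1, 0]`) are ignored, because they do not correspond to any user-provided relationship.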

Experimental Study

• Data sets:
  • UCI (Iris, Wine, Ionosphere, WDBC)
    • Two cross-domain relationships are constructed: Iris-Wine and Ionosphere-WDBC (positive/negative instances are only mapped to positive/negative instances in the other domain)
  • Newsgroup data (6 groups from 20 Newsgroups)
    • 3 comp: comp.os.ms-windows.misc, comp.sys.ibm.pc.hardware, comp.sys.mac.hardware
    • 3 rec: rec.motorcycles, rec.sport.baseball, rec.sport.hockey
  • Protein-protein interaction (PPI) networks (from BioGrid), gene co-expression networks (from Gene Expression Omnibus), and a genetic interaction network (from TEAM)
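A label-consistent cross-domain mapping of the kind described for the UCI pairs can be sketched as below. The number of links per instance (`n_links`) and the random sampling are assumptions for illustration; the slides only state that instances are mapped to instances with the same label in the other domain. `label_consistent_mapping` is a hypothetical helper.

```python
import numpy as np


def label_consistent_mapping(labels_i, labels_j, n_links=3, seed=0):
    """Build a binary S of shape (n_j, n_i): each instance b of D_j is linked
    to up to `n_links` randomly chosen instances of D_i with the same label.
    Instances whose label does not occur in D_i stay unmapped (partial mapping).
    """
    rng = np.random.default_rng(seed)
    labels_i = np.asarray(labels_i)
    S = np.zeros((len(labels_j), len(labels_i)))
    for b, y in enumerate(labels_j):
        candidates = np.flatnonzero(labels_i == y)  # same-label instances in D_i
        if candidates.size == 0:
            continue
        size = min(n_links, candidates.size)
        S[b, rng.choice(candidates, size=size, replace=False)] = 1.0
    return S


S = label_consistent_mapping([0, 0, 1, 1, 1], [1, 0, 2])
print(S)
```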

Experimental Study

• Effectiveness (UCI data sets)

Experimental Study

• Robustness Evaluation (UCI)

Experimental Study

• Re-Evaluating Cross-Domain Relationship (UCI)

Experimental Study

• Binary vs. Weighted Relationship

Experimental Study

• Binary vs. Weighted Relationship

Experimental Study

• Protein Module Detection by Integrating Multi-Domain Heterogeneous Data

• 5412 genes
• 490032 genetic markers across 4890 samples (1952 disease and 2938 healthy)
• The network is constructed from the 1 million top-ranked genetic marker pairs, with their test statistics as the edge weights
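Building a weighted network from top-ranked marker pairs can be sketched as follows, assuming the pairwise test statistics are already computed and that larger statistics rank higher. `top_pair_network` is a hypothetical helper; in the study `top` would be 1 million.

```python
import heapq
from collections import defaultdict


def top_pair_network(pair_scores, top):
    """Keep the `top` highest-scoring pairs as weighted undirected edges.

    pair_scores: iterable of (u, v, statistic), e.g. marker pairs and their
    test statistics. Returns {node: {neighbor: weight}}.
    """
    best = heapq.nlargest(top, pair_scores, key=lambda t: t[2])
    adj = defaultdict(dict)
    for u, v, stat in best:
        adj[u][v] = stat  # the test statistic becomes the edge weight
        adj[v][u] = stat
    return dict(adj)


scores = [("m1", "m2", 5.0), ("m2", "m3", 1.0), ("m1", "m3", 3.0)]
print(top_pair_network(scores, top=2))
```

`heapq.nlargest` keeps only `top` items in memory at a time, which matters when ranking a very large number of candidate pairs.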

Experimental Study

Protein Module Detection:

• Evaluation: standard Gene Set Enrichment Analysis (GSEA)
  • We identify the most significantly enriched Gene Ontology categories
  • Significance (p-value) is determined by Fisher’s exact test
  • Raw p-values are further calibrated to correct for the multiple testing problem
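The enrichment evaluation can be sketched with the hypergeometric upper tail, which is the one-sided Fisher's exact test for over-representation. The slides do not name the multiple-testing correction, so Bonferroni is used here only as an example; `fisher_enrichment_p` and `bonferroni` are hypothetical helpers.

```python
from math import comb


def fisher_enrichment_p(module, category, n_genes):
    """One-sided Fisher's exact test for over-representation of a GO
    category in a module, i.e. the hypergeometric upper tail P(X >= x).

    module, category: collections of gene IDs; n_genes: size of the gene
    universe (all genes that could appear in a module).
    """
    module, category = set(module), set(category)
    n, K, N = len(module), len(category), n_genes
    x = len(module & category)  # observed overlap
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(x, min(n, K) + 1))
    return tail / comb(N, n)


def bonferroni(p_values):
    """Bonferroni correction (assumed; one simple multiple-testing calibration)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


p = fisher_enrichment_p({"g1", "g2"}, {"g1", "g2"}, n_genes=4)
print(p)  # 1/6: both module genes fall in the category
print(bonferroni([0.01, 0.2, 0.5]))
```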

Experimental Study

• Protein Module Detection: comparison of CGC and single-domain graph clustering (k = 100)

Experimental Study

• Protein Module Detection:

Conclusion

• In this paper:
  • We propose a flexible co-regularized method, CGC, to tackle many-to-many, weighted, partial mappings in multi-domain graph clustering.
  • CGC uses cross-domain relationships as co-regularizing penalties to guide the search for a consensus clustering structure.
  • CGC is robust even when the cross-domain relationships based on prior knowledge are noisy.

Thank You!

Questions?

Experimental Study

• Performance Evaluation