62
On the Separability of Structural Classes of Communities Bruno Abrahao Sucheta Soundarajan John Hopcroft Robert Kleinberg Cornell University 1 Text

On the Separability of Structural Classes of Communities

Embed Size (px)

Citation preview

Page 1: On the Separability of Structural Classes of Communities

On the Separability of Structural Classes of Communities

Bruno AbrahaoSucheta SoundarajanJohn HopcroftRobert Kleinberg Cornell University

1

Text

Page 2: On the Separability of Structural Classes of Communities

The idea of community structure as distinctive relationships

[Newman-Girvan, 2004]

2

Page 3: On the Separability of Structural Classes of Communities

Which community is real?

3

Page 4: On the Separability of Structural Classes of Communities

Which community is real?

3

Page 5: On the Separability of Structural Classes of Communities

Which community is real?

“Real” Community Random WalkMetis

Infomap Newman-Modularity Louvain3

Page 6: On the Separability of Structural Classes of Communities

How do their structures differ?

“Real” Community Random WalkMetis

Infomap Newman-Modularity Louvain3

Page 7: On the Separability of Structural Classes of Communities

The definition of community structure

• Community structure is not well defined– different people have different notions of community structure

• Traditional strategy1. start with an expectation of what a community should look like

• e.g., a set of nodes that interact more within the set than with the outside

2. define an optimization problem3. design heuristic4. the solution gives the desired communities

4

Page 8: On the Separability of Structural Classes of Communities

Two research questions that we address here

• A multitude of different of algorithms– different objective functions– different heuristicsHow dissimilar are their outputs?

• Communities may differ from the proposed mathematical constructs

– e.g., preponderance of links to the outside, as in this figure, contrary to widely accepted notions

Which algorithms extract communities that most closely resemble the structure of real communities?

5

Page 9: On the Separability of Structural Classes of Communities

Two research questions that we address here

• A multitude of different of algorithms– different objective functions– different heuristicsHow dissimilar are their outputs?

• Communities may differ from the proposed mathematical constructs

– e.g., preponderance of links to the outside, as in this figure, contrary to widely accepted notions

Which algorithms extract communities that most closely resemble the structure of real communities?

5

Page 10: On the Separability of Structural Classes of Communities

Obstacles to answering the preceding questions

• We don't know what properties communities possess

• We can't characterize communities in the absence of negative examples– Look at real communities and determine their structure– do other sets that are not communities have these properties?

– every other connected set could be a negative example - intractable

– sets that are not annotated could also be communities

• We don't know what metrics we should use– modularity, conductance, clustering coefficient...

6

Page 11: On the Separability of Structural Classes of Communities

Our plan to address the preceding questions

• Propose a methodology to analyze community structure by using different notions of communities as references

– key idea: analyze community structure without requiring negative examples of communities

• Scalable and comprehensive, simultaneously considering– multiple notions of communities– diverse domains of application – a broad spectrum of community metrics

• Assess the structural dissimilarities between– the output of different community detection algorithms– the output of algorithms and real communities

7

Page 12: On the Separability of Structural Classes of Communities

Building structural classes by extracting examples

Algorithm Network

8

Page 13: On the Separability of Structural Classes of Communities

Building structural classes by extracting examples

Algorithm Network

Apply

8

Page 14: On the Separability of Structural Classes of Communities

Building structural classes by extracting examples

Algorithm Network Extract community examples

Apply

8

Page 15: On the Separability of Structural Classes of Communities

Building structural classes by extracting examples

Algorithm 2

Algorithm 4

Algorithm k

Algorithm 1

Algorithm 3

9

Page 16: On the Separability of Structural Classes of Communities

Building structural classes by extracting examples

Algorithm 2

Algorithm 4

Algorithm k

Algorithm 1

Algorithm 3

Class 1

Class 2

Class 3

Class 4

Class k

9

Page 17: On the Separability of Structural Classes of Communities

Building a feature space by characterizing examples

Labeled Example

10

Page 18: On the Separability of Structural Classes of Communities

Building a feature space by characterizing examples

Labeled Example

Feature Vector

10

Page 19: On the Separability of Structural Classes of Communities

Building a feature space by characterizing examples

11

Page 20: On the Separability of Structural Classes of Communities

Building a feature space by characterizing examples

Feature Space11

Page 21: On the Separability of Structural Classes of Communities

Measure inter-class separability from feature space

Feature Space

12

Page 22: On the Separability of Structural Classes of Communities

Measure inter-class separability from feature space

Feature Space

Class Separability Measure

12

Page 23: On the Separability of Structural Classes of Communities

Measure inter-class separability from feature space

Feature SpaceSeparability = Distinct structures

Class Separability Measure

Are the classes separable?

12

Page 24: On the Separability of Structural Classes of Communities

We test our methods on large-scale network datasets

• Social

• Commercial

• Biological

+ Rice University

Facebook+Rice with permission of Mislove et al.. Other datasets publicly available.

13

Page 25: On the Separability of Structural Classes of Communities

We furnish our framework with10 community detection algorithms

• BFS (Random connected subgraphs)• Random-Walk-based (with and without restart)• (α,β)-communities• InfoMap• Markov Clustering• Metis• Louvain• Newman-Clauset-Moore• Link Communities

14

Page 26: On the Separability of Structural Classes of Communities

Annotated communities identify exemplar communities

+ Rice University

Metadata included in the datasets

15

Page 27: On the Separability of Structural Classes of Communities

Annotated communities identify exemplar communities

+ Rice University

Metadata included in the datasets

15

Page 28: On the Separability of Structural Classes of Communities

Annotated communities identify exemplar communities

+ Rice University

Metadata included in the datasets

15

Page 29: On the Separability of Structural Classes of Communities

Annotated communities identify exemplar communities

+ Rice University

Metadata included in the datasets

15

Page 30: On the Separability of Structural Classes of Communities

Annotated communities identify exemplar communities

+ Rice University

Metadata included in the datasets

15

Page 31: On the Separability of Structural Classes of Communities

Annotated communities identify exemplar communities

+ Rice University

Metadata included in the datasets

15

Page 32: On the Separability of Structural Classes of Communities

To what extent are the classes separable?

16

Page 33: On the Separability of Structural Classes of Communities

First separability measure: Scatter Matrices

• Traditional methods for measuring class separability give a single score, e.g., scatter matrices

Reference: the same data with shuffled labels

• This is a global measure. We need more fine-grained separability information of each class!

Network

17

Page 34: On the Separability of Structural Classes of Communities

Idea: use the performance of probabilistic multi-class classifiers

Probabilistic k-way classifier (SVM, k-NN)

Algorithm 1

Algorithm 2

Annotated communities

Train

18

Page 35: On the Separability of Structural Classes of Communities

Idea: use the performance of probabilistic multi-class classifiers

Probabilistic k-way classifier (SVM, k-NN)

Algorithm 1

Algorithm 2

Annotated communities

Train

18

Page 36: On the Separability of Structural Classes of Communities

Probabilistic k-way classifier(SVM, k-NN)

Classify(cross-validation)

19

Idea: use the performance of probabilistic multi-class classifiers

Page 37: On the Separability of Structural Classes of Communities

Probabilistic k-way classifier(SVM, k-NN)

Classify(cross-validation)

19

Idea: use the performance of probabilistic multi-class classifiers

Page 38: On the Separability of Structural Classes of Communities

Probabilistic k-way classifier(SVM, k-NN)

Classify(cross-validation)

Pr(Algorithm 1) = 0.05Pr(Algorithm 2) = 0.08...Pr(Annotated) = 0.48

19

Idea: use the performance of probabilistic multi-class classifiers

Page 39: On the Separability of Structural Classes of Communities

Cross-validation performance indicates class separability

Probabilistic-SVM cross-validation outcome with 11 structural classes. Data: DBLP network.

Ann.

Metis

MCL

Newm.

Louv.

LC

IM

AB

RW15

RW0

BFS

0.0 0.2 0.4 0.6 0.8 1.0

BFS

RW0

RW15

AB

IM

LC

Louv.

Newm.

MCL

Metis

Ann

Stru

ctur

al Cl

ass

20

Page 40: On the Separability of Structural Classes of Communities

Matching annotated communities

Which algorithms extract communities that most closely resemble the structure of annotated communities?

21

Page 41: On the Separability of Structural Classes of Communities

Repeat the preceding experiment, leaving out the class of annotated communities

Probabilistic k-way classifier

Algorithm 1

Algorithm 2

Algorithm N

Learn

22

Page 42: On the Separability of Structural Classes of Communities

Repeat the preceding experiment, leaving out the class of annotated communities

Probabilistic k-way classifier

Algorithm 1

Algorithm 2

Algorithm N

Learn

22

Page 43: On the Separability of Structural Classes of Communities

Probabilistic k-way classifier

Classify

23

Introduce the class of annotated communities in the test set

Page 44: On the Separability of Structural Classes of Communities

Probabilistic k-way classifier

Classify

23

Introduce the class of annotated communities in the test set

Page 45: On the Separability of Structural Classes of Communities

Probabilistic k-way classifier

Classify

Pr(Algorithm 1) = 0.02Pr(Algorithm 2) = 0.19...Pr(Algorithm k) = 0.12

23

Introduce the class of annotated communities in the test set

Page 46: On the Separability of Structural Classes of Communities

Classification reveals that annotated resemble unstructured methods

LJ2

LJ1

DBLP

Amazon

Fly

HS

SC

Ugrad

grad

0.0 0.2 0.4 0.6 0.8 1.0

BFS

RW0

RW15

AB

IM

LC

Louv.

Newm.

MCL

Metis

0.0 0.2 0.4 0.6 0.8 1.0

BFS

RW0

RW15

AB

IM

LC

Louv.

Newm.

MCL

Metis

Probabilistic-SVM classification of annotated communities into 11 structural classes structural class for 9 different networks.

Netw

ork

24

Page 47: On the Separability of Structural Classes of Communities

Improving the quality of the space

What classes should we consider?

25

Page 48: On the Separability of Structural Classes of Communities

The classifier confuses the two types of RW communities

Probabilistic-SVM cross-validation outcome with 11 structural classes. Data: DBLP network.

Ann.

Metis

MCL

Newm.

Louv.

LC

IM

AB

RW15

RW0

BFS

0.0 0.2 0.4 0.6 0.8 1.0

BFS

RW0

RW15

AB

IM

LC

Louv.

Newm.

MCL

Metis

Ann

Stru

ctur

al Cl

ass

26

Page 49: On the Separability of Structural Classes of Communities

Fisher’s discriminant ratio

27

A Separability Framework for Analyzing Community Structure, Bruno Abrahao, Sucheta Soundarajan, John Hopcroft, Robert Kleinberg, To appear in ACM Transactions on Knowledge Discovery from Data (TKDD), 2013

Page 50: On the Separability of Structural Classes of Communities

Fisher’s discriminant ratio

27

A Separability Framework for Analyzing Community Structure, Bruno Abrahao, Sucheta Soundarajan, John Hopcroft, Robert Kleinberg, To appear in ACM Transactions on Knowledge Discovery from Data (TKDD), 2013

Page 51: On the Separability of Structural Classes of Communities

Cross-validation performance indicates class separability

Probabilistic-SVM cross-validation outcome with 11 structural classes. Data: DBLP network.

Ann.

Metis

MCL

Newm.

Louv.

LC

IM

AB

RW15

RW0

BFS

0.0 0.2 0.4 0.6 0.8 1.0

BFS

RW0

RW15

AB

IM

LC

Louv.

Newm.

MCL

Metis

Ann

Stru

ctur

al Cl

ass

28

Page 52: On the Separability of Structural Classes of Communities

Cross-validation performance indicates class separability

29

Stru

ctur

al C

lass

Page 53: On the Separability of Structural Classes of Communities

Classification reveals that annotated resemble unstructured methods

LJ2

LJ1

DBLP

Amazon

Fly

HS

SC

Ugrad

grad

0.0 0.2 0.4 0.6 0.8 1.0

BFS

RW0

RW15

AB

IM

LC

Louv.

Newm.

MCL

Metis

0.0 0.2 0.4 0.6 0.8 1.0

BFS

RW0

RW15

AB

IM

LC

Louv.

Newm.

MCL

Metis

Probabilistic-SVM classification of annotated communities into 11 structural classes structural class for 9 different networks.

Netw

ork

30

Page 54: On the Separability of Structural Classes of Communities

Classification reveals that annotated resemble unstructured methods

31

Net

wor

k

Page 55: On the Separability of Structural Classes of Communities

Can we reveal latent similarities among community detection algorithms?

Our framework enables one to cluster algorithms that behave similarly

32

Page 56: On the Separability of Structural Classes of Communities

Step 1: identifying the most important features

7 features out of 36 retain the discriminative power of the full set

33

Page 57: On the Separability of Structural Classes of Communities

Grouping algorithms by their tendencieswith respect to most discriminative features

34

High

Medium

Low

Page 58: On the Separability of Structural Classes of Communities

Grouping algorithms by their tendencieswith respect to most discriminative features

34

High

Medium

Low

Page 59: On the Separability of Structural Classes of Communities

Conclusion of methodology

• We present a methodology to address the complexity of analyzing community structure, which simultaneously considers

– large number of algorithms

– multiple domains of application

– a broad spectrum of metrics to characterize community structure

• A scalable framework that enables

– researchers to compare and understand biases of new and existing community detection algorithms

– practitioners to decide on the most suitable algorithm for particular purpose and network

35

Page 60: On the Separability of Structural Classes of Communities

Conclusion of experimental analysis

• Our experimental analysis, which include 10 community detection algorithms and 9 different networks analyzed with 36 properties reveals

– High variability among the output of community detection methods– Annotated communities have a distinct structure from what we

expect• their structure is closer to the output of baseline procedures than to that of popular

algorithms

– A small set of features explain the biases produced by different algorithms

– We can organize the tapestry of available community detection algorithms by grouping them with respect to similarities in behavior

36

Page 61: On the Separability of Structural Classes of Communities

Final remarks on future directions

• Traditional methods are unsupervised– they find a particular type of community– little sensitivity to different purposes, structures of interest and domains of

application

• Our approach suggests a supervised approach to community detection– user specifies what they intended to find through examples (real or synthetic)– algorithm learns from those examples and retrieves similar structures in the

network

37

Page 62: On the Separability of Structural Classes of Communities

On the Separability of Structural Classes of Communities

Bruno AbrahaoSucheta SoundarajanJohn HopcroftRobert Kleinberg Cornell University

Thank you!

38