Segmenting Sequences of Node-labeled Graphs Sorour E. Amiri, Liangzhe Chen, B. Aditya Prakash Department of Computer Science Virginia Tech ICDM, DaMNet, Barcelona, Spain, December 12, 2016

Page 1: Segmenting Sequences of Node-labeled Graphs

Segmenting Sequences of Node-labeled Graphs

Sorour E. Amiri, Liangzhe Chen, B. Aditya Prakash

Department of Computer Science, Virginia Tech

ICDM, DaMNet, Barcelona, Spain, December 12, 2016

Page 2: Segmenting Sequences of Node-labeled Graphs

Outline
• Motivation
• Background
• Our Proposed Method: SnapNETS
• Experiments
• Conclusion

Amiri, Chen, Prakash 2

Page 3: Segmenting Sequences of Node-labeled Graphs

Network Sequences

• Epidemiology: diseases spread over contact networks (e.g., flu)
• Social media: information spreads over friendship networks (e.g., memes)

Page 4: Segmenting Sequences of Node-labeled Graphs

Making sense of network sequences

Flu: when do the infection patterns change?

[Figure: snapshots showing Star, Bridge, and Near-Clique infection patterns]

Reason:
• Virus mutation
• Vaccination
• …

Page 5: Segmenting Sequences of Node-labeled Graphs

Making sense of network sequences

Meme: when do the infection patterns change?

[Figure: snapshots showing Star and Clique infection patterns]

Reason:
• Event
• …

Page 6: Segmenting Sequences of Node-labeled Graphs

Problem 1: Network sequence segmentation

Given a sequence of networks with labeled nodes, find the best segmentation, i.e., one whose segments capture different distributions of node labels.

[Figure: segments showing Star, Bridge, and Near-Clique patterns]

Page 7: Segmenting Sequences of Node-labeled Graphs

Outline
• Motivation
• Background
• Our Proposed Method: SnapNETS
• Experiments
• Conclusion

Page 8: Segmenting Sequences of Node-labeled Graphs

Alternative 1: Feature Extraction & Time-series

Step 1: Feature extraction
• F1: #cliques (of active subgraph)
• F2: #ladders (of inactive subgraph)
• F3: #ladders (of active subgraph)

Each feature is computed per snapshot, giving one time series per feature (the slide shows the vectors 0 0 0 … 2, 1 1 0 … 0, and 0 0 0 … 1).

Step 2: Time-series segmentation

[Figure: feature time series F1, F2, F3 over snapshots G1 to G4]

[Henderson et al. 2010] [Likas, Vlassis, and Verbeek 2003] [Li et al. 2009]

Page 9: Segmenting Sequences of Node-labeled Graphs

Alternative 1: Feature Extraction & Time-series

Drawbacks:
• Laborious feature engineering
• "Local" change detection:
  o One aggregation time period
  o Threshold

[Figure: feature time series F1, F2, F3 over snapshots G1 to G4]

Page 10: Segmenting Sequences of Node-labeled Graphs

Alternative 2: Plain-graph-based analysis

Step 1: Extract active subgraphs
Step 2: Dynamic graph segmentation

[Shah et al. 2015] [Sun et al. 2007] [Lin et al. 2009] [Qu et al. 2014]

Page 11: Segmenting Sequences of Node-labeled Graphs

Alternative 2: Plain-graph-based analysis

Drawback: inactive nodes are important for detecting different patterns, but extracting only the active subgraph discards them.

[Figure: entire graph vs. active subgraph]

Page 12: Segmenting Sequences of Node-labeled Graphs

Desirable Properties

P1. Parameter-free:
• No threshold, no fixed granularity

P2. Comprehensive:
• Use the entire graph

Page 13: Segmenting Sequences of Node-labeled Graphs

Outline
• Motivation
• Background
• Our Proposed Method: SnapNETS
  o Overview
  o Goal 1: Summarizing Act-snapshots
  o Goal 2: Constructing the segmentation graph
  o Goal 3: Finding the best segmentation
• Experiments
• Conclusion

Page 14: Segmenting Sequences of Node-labeled Graphs

Overview of SnapNETS

Goal 1. Summarize each graph:
• Keep structural and label-dependent properties

Goal 2. Construct the segmentation graph:
• Define nodes and edges
• Define edge weights
  o Extract the features of summarized graphs

Goal 3. Find the best segmentation:
• Define the best segmentation (path)
• Compute the best segmentation

Page 15: Segmenting Sequences of Node-labeled Graphs

Technical Challenges

Using the entire graph snapshots:
• Summarize each graph while satisfying P2

Finding the number of segments:
• Compute the segmentation while satisfying P1

Reminder: P1. Parameter-free; P2. Comprehensive

Page 16: Segmenting Sequences of Node-labeled Graphs

Outline
• Motivation
• Background
• Our Proposed Method: SnapNETS
  o Overview
  o Goal 1: Summarizing Act-snapshots
  o Goal 2: Constructing the segmentation graph
  o Goal 3: Finding the best segmentation
• Experiments
• Conclusion

Page 17: Segmenting Sequences of Node-labeled Graphs

Goal 1: Summarizing graph snapshots

We want to preserve:
• Structural properties
• Node labels

Role of the eigenvalue:
• Same leading eigenvalue (λ) of the adjacency matrix → same diffusive properties
• The leading eigenvalue determines the epidemic threshold [Prakash et al. 2012]
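As a rough illustration of why the leading eigenvalue is a useful structural fingerprint, the sketch below computes it for two small undirected graphs. The helper name `leading_eigenvalue` and the toy graphs are assumptions made for this example; they are not taken from the SnapNETS code.

```python
import numpy as np

def leading_eigenvalue(adj):
    """Largest eigenvalue of a symmetric (undirected) adjacency matrix."""
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order
    return float(np.linalg.eigvalsh(adj)[-1])

# Toy graph 1 (assumed): a star with one hub and four leaves
star = np.zeros((5, 5))
star[0, 1:] = 1
star[1:, 0] = 1

# Toy graph 2 (assumed): a clique on four nodes
clique = np.ones((4, 4)) - np.eye(4)

print("star:  ", leading_eigenvalue(star))
print("clique:", leading_eigenvalue(clique))
```

The star and the clique have different leading eigenvalues (2 and 3 respectively), so a summary that keeps this value also keeps apart graphs with different diffusive behavior.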

Page 18: Segmenting Sequences of Node-labeled Graphs

Our Approach

We want to get a smaller graph with similar eigenvalues:
• Successively merge nodes

Page 19: Segmenting Sequences of Node-labeled Graphs

Problem 2: Graph summarization

Given: a graph with labeled nodes and a compression ratio.
Find: a coarsened graph such that its leading eigenvalue stays close to that of the original graph and nodes with different labels are not merged.

Page 20: Segmenting Sequences of Node-labeled Graphs

Problem 2: Graph summarization

Given: a graph with labeled nodes and a compression ratio.
Find: a coarsened graph such that:

CoarseNet algorithm [Purohit et al. 2014]:
• Matrix perturbation approach
• Successively merge nodes
• Keep the leading eigenvalue

Our tweak:
• Do not merge nodes with different labels
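The label-aware merging idea can be sketched with a brute-force greedy loop. This is not CoarseNet itself: CoarseNet scores merges with a matrix perturbation approximation, whereas this sketch simply recomputes the leading eigenvalue for every candidate merge. The merge rule (union of neighbors), the parameter name `keep_fraction` (one possible reading of "compression ratio"), and all function names are assumptions for illustration.

```python
import numpy as np

def leading_eigenvalue(adj):
    return float(np.linalg.eigvalsh(adj)[-1])

def merge_pair(adj, labels, u, v):
    """Merge node v into node u (assumed rule: union of neighbours, no self-loop)."""
    merged = adj.copy()
    row = np.maximum(adj[u, :], adj[v, :])
    merged[u, :] = row
    merged[:, u] = row
    merged[u, u] = 0
    keep = [i for i in range(len(labels)) if i != v]
    return merged[np.ix_(keep, keep)], [labels[i] for i in keep]

def coarsen(adj, labels, keep_fraction):
    """Greedily merge adjacent same-label nodes, changing the leading eigenvalue least."""
    target = max(1, int(round(keep_fraction * len(labels))))
    while len(labels) > target:
        base = leading_eigenvalue(adj)
        best = None
        for u in range(len(labels)):
            for v in range(u + 1, len(labels)):
                # The tweak from the slide: never merge nodes with different labels
                if adj[u, v] == 0 or labels[u] != labels[v]:
                    continue
                cand_adj, cand_labels = merge_pair(adj, labels, u, v)
                change = abs(leading_eigenvalue(cand_adj) - base)
                if best is None or change < best[0]:
                    best = (change, cand_adj, cand_labels)
        if best is None:  # no same-label edge left to contract
            break
        _, adj, labels = best
    return adj, labels

# Toy input (assumed): a 6-node path, first half active (1), second half inactive (0)
path = np.zeros((6, 6))
for i in range(5):
    path[i, i + 1] = path[i + 1, i] = 1
small_adj, small_labels = coarsen(path, [1, 1, 1, 0, 0, 0], keep_fraction=0.5)
```

Because merges never cross labels, the summary of the toy path still contains both active and inactive nodes.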

Page 21: Segmenting Sequences of Node-labeled Graphs

Outline
• Motivation
• Background
• Our Proposed Method: SnapNETS
  o Overview
  o Goal 1: Summarizing Act-snapshots
  o Goal 2: Constructing the segmentation graph
  o Goal 3: Finding the best segmentation
• Experiments
• Conclusion

Page 22: Segmenting Sequences of Node-labeled Graphs

Goal 2: Segmentation graph

Nodes: one node for each candidate segment, plus a source ('s') and a target ('t')

Edges: a directed edge between the nodes of adjacent segments
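A minimal sketch of such a segmentation graph, assuming a candidate segment is a contiguous run of snapshots written as a pair (i, j) covering snapshots i to j-1, and that two segments are "adjacent" when one ends where the next begins. The function names and this encoding are assumptions for illustration; edge weights are left out here.

```python
from itertools import combinations

def build_segmentation_graph(num_snapshots):
    """Return (nodes, edges) for snapshots 0 .. num_snapshots-1.

    A node (i, j) stands for the candidate segment covering snapshots i .. j-1.
    """
    nodes = list(combinations(range(num_snapshots + 1), 2))
    # 's' connects to every segment that starts at the first snapshot
    edges = {"s": [seg for seg in nodes if seg[0] == 0], "t": []}
    for i, j in nodes:
        if j == num_snapshots:
            edges[(i, j)] = ["t"]  # segment ends at the last snapshot
        else:
            edges[(i, j)] = [(j, k) for k in range(j + 1, num_snapshots + 1)]
    return nodes, edges

def enumerate_segmentations(edges, node="s"):
    """Yield every s-to-t path as a list of segments ('s' and 't' omitted)."""
    if node == "t":
        yield []
        return
    for nxt in edges[node]:
        for rest in enumerate_segmentations(edges, nxt):
            yield ([] if nxt == "t" else [nxt]) + rest

nodes, edges = build_segmentation_graph(4)
segmentations = list(enumerate_segmentations(edges))
print(len(nodes), "segment nodes,", len(segmentations), "segmentations")
```

With four snapshots there are 10 candidate segments and 8 paths from 's' to 't', one for each way of cutting the sequence into contiguous segments.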

Page 23: Segmenting Sequences of Node-labeled Graphs

Edge Weights

How can we measure the distance between two segments?

Page 24: Segmenting Sequences of Node-labeled Graphs

Our Approach

Step 1: Extract features from summary graphs:
• Easier and more efficient than on the original graphs
• No complex features

Page 25: Segmenting Sequences of Node-labeled Graphs

Edge Weights

Step 2: Distance of adjacent segments

Page 26: Segmenting Sequences of Node-labeled Graphs

Outline
• Motivation
• Background
• Our Proposed Method: SnapNETS
  o Overview
  o Goal 1: Summarizing Act-snapshots
  o Goal 2: Constructing the segmentation graph
  o Goal 3: Finding the best segmentation
• Experiments
• Conclusion

Page 27: Segmenting Sequences of Node-labeled Graphs

Goal 3: Finding the best segmentation

Observation:
• For each segmentation there is a path from 's' to 't'
• For each path from 's' to 't' there is a segmentation

Therefore:
• The best-segmentation problem becomes a path optimization problem

Page 28: Segmenting Sequences of Node-labeled Graphs

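Possible approach: longest path?

[Figure: two paths from 's' to 't']
• Path with many edges of weight 0.01 each: Sum = 3
• Path with three edges of weight 0.9 each: Sum = 2.7

Over-segmentation problem: the longest path prefers the path with many low-weight edges.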

Page 29: Segmenting Sequences of Node-labeled Graphs

Problem 3: Finding the best segmentation

Given a segmentation graph, find the average longest path (ALP) from 's' to 't'.

Our idea: average longest path

Advantages:
• Parameter-free
• Naturally balances the weight of the path against the number of segments
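To see how averaging counters over-segmentation, take the two paths from the longest-path slide. The edge count of 300 for the low-weight path is inferred from its weight (0.01) and its sum (3); the slide itself does not state it.

```python
# Path weights from the longest-path example; 300 edges is inferred, not stated
many_small = [0.01] * 300
few_large = [0.9] * 3

def total(path):
    return sum(path)

def average(path):
    return sum(path) / len(path)

# Plain longest path prefers the over-segmented path ...
print(total(many_small) > total(few_large))
# ... while the average weight per edge prefers the three strong cuts
print(average(few_large) > average(many_small))
```

Both comparisons print True: the total rewards sheer path length, while the average rewards paths whose every cut separates clearly different segments.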

Page 30: Segmenting Sequences of Node-labeled Graphs

Solving ALP

• Finding the ALP in general graphs is NP-hard.
• The segmentation graph is a DAG → ALP can be solved in polynomial time
• Negative cycle detection [Waggoner et al. 2013]
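One way to see that ALP is polynomial on a DAG is a dynamic program over (node, number of edges). Note that this is a different technique from the negative-cycle-detection approach cited on the slide; it is only a sketch, and the function name, the edge-list format, and the toy graph are assumptions.

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def average_longest_path(edges, source="s", target="t"):
    """Return (best average edge weight, path) from source to target in a DAG.

    edges maps each node to a list of (successor, weight) pairs.
    """
    preds = defaultdict(set)
    for u, outs in edges.items():
        preds[u]  # register u even if it has no predecessors
        for v, _ in outs:
            preds[v].add(u)
    order = TopologicalSorter(preds).static_order()

    # best[v][k] = (max total weight of a source-to-v path with k edges, predecessor)
    best = defaultdict(dict)
    best[source][0] = (0.0, None)
    for u in order:
        for v, w in edges.get(u, []):
            for k, (weight, _) in list(best[u].items()):
                if k + 1 not in best[v] or weight + w > best[v][k + 1][0]:
                    best[v][k + 1] = (weight + w, u)

    if not best[target]:
        raise ValueError("target is not reachable from source")
    # Choose the edge count whose best path has the highest average weight
    k = max(best[target], key=lambda n: best[target][n][0] / n)
    average = best[target][k][0] / k
    path, node = [], target
    while node is not None:
        path.append(node)
        node, k = best[node][k][1], k - 1
    return average, path[::-1]

# Toy DAG (assumed): a long low-weight path and a short high-weight path
toy = {
    "s": [("a1", 0.01), ("b1", 0.9)],
    "a1": [("a2", 0.01)], "a2": [("a3", 0.01)], "a3": [("t", 0.01)],
    "b1": [("b2", 0.9)], "b2": [("t", 0.9)],
    "t": [],
}
avg, best_path = average_longest_path(toy)
print(avg, best_path)
```

The table has at most one entry per node and edge count, so the running time is bounded by the number of edges times the longest path length.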

Page 31: Segmenting Sequences of Node-labeled Graphs

Complete algorithm

Time complexity:

Page 32: Segmenting Sequences of Node-labeled Graphs

Outline
• Motivation
• Background
• Our Proposed Method: SnapNETS
  o Overview
  o Goal 1: Summarizing Act-snapshots
  o Goal 2: Constructing the segmentation graph
  o Goal 3: Finding the best segmentation
• Experiments
• Conclusion

Page 33: Segmenting Sequences of Node-labeled Graphs

Experiments: datasets

Different domains with a range of sizes:
• BA-degree: random Barabási-Albert graph
• Higgs: tweets dataset (with the follower-followee network)
• Memetracker: who-copies-from-whom blog and website network
• DBLP: co-authorship network related to the 'network' topic

Page 34: Segmenting Sequences of Node-labeled Graphs

Experiments: baselines

DYNAMMO [Li et al. 2009]:
• Feature extraction & time series
• Change-point detection (reconstruction errors)
• # segments = # segments of SnapNETS

VOG [Koutra et al. 2014]:
• Get the active subgraph
• 10 most important sub-structures
• Cut when the set of sub-structures changes significantly
  o Threshold = the one that gives the best result

SN-LP:
• Longest path instead of ALP

Page 35: Segmenting Sequences of Node-labeled Graphs

Experiments: Quantitative analysis

• SnapNETS outperforms the baselines
• Clear patterns in summary graphs
• We found the ground-truth segmentation

[Figure: AS-Oregon]

Page 36: Segmenting Sequences of Node-labeled Graphs

Case studies: Memetracker

Televised vice-presidential debates

• Summary graphs are close to the case when all nodes have the same label (f5)
• Random nodes are active (f8)
• Summary graphs are substantially sparser (f2)
• Many active nodes were merged into important nodes such as CNN and BBC to form hubs (f6)

Page 37: Segmenting Sequences of Node-labeled Graphs

Case studies: AS-Oregon

New community → new segment

Page 38: Segmenting Sequences of Node-labeled Graphs

Outline
• Motivation
• Background
• Our Proposed Method: SnapNETS
  o Overview
  o Goal 1: Summarizing Act-snapshots
  o Goal 2: Constructing the segmentation graph
  o Goal 3: Finding the best segmentation
• Experiments
• Conclusion

Page 39: Segmenting Sequences of Node-labeled Graphs

Conclusion: SnapNETS

Properties:
• P1. Parameter-free
• P2. Comprehensive

Patterns: the 'placement' and 'connection' of active/inactive nodes:
• Structural (e.g., community/role/centrality)
• Rate changes

Global method: SnapNETS is a 'global' method and not simply a change-point detection method.

Page 40: Segmenting Sequences of Node-labeled Graphs

Future Work

• Faster ALP: linear?
• Handle dynamic graphs with varying nodes and edges
• More node labels and real-valued features
• Work with partially observed graphs

Page 41: Segmenting Sequences of Node-labeled Graphs

Any questions?

Code at: https://github.com/SorourAmiri/SnapNETS

Sorour E. Amiri, Liangzhe Chen, B. Aditya Prakash

SnapNETS:
• Goal 1: Graph summarization
  o Successively merge nodes
  o Keep leading eigenvalue
  o Keep same set of labels
• Goal 2: Segmentation graph
  o Nodes
  o Edges
  o Edge weights
• Goal 3: Finding the best segmentation
  o ALP
• Result