GraphScope: Parameter-Free Mining of Large Time-Evolving Graphs
Jimeng Sun CMU
Spiros Papadimitriou IBM
Philip S. Yu IBM
Christos Faloutsos CMU
Motivation of GraphScope
Time-evolving graphs:
Network traffic graphs
Email networks
Customer-product relationships
Call detail records in telecom networks
Financial transaction data
Key questions:
1. How to monitor community structures?
2. How to detect the change points?
1. Community discovery
[Figure: a graph and its adjacency matrix (axes 5 to 25), rearranged into blocks. Labels: Products (Books, BMWs); CEOs, Researchers. Block counts: 289/300, 48/50, 5/200, 2/75. Percentages: 97%, 96%, 3%, 3%, 54%.]
Simultaneously group, e.g., customers and products; or sources and destinations in traffic graphs; or senders and recipients in communication graphs; etc.
[Figure: a customers-by-products matrix rearranged into customer groups (rows) and product groups (columns).]
2. Change detection
Find change points in group structure
[Figure: customers-by-products matrices along a time axis, with the holiday season marked.]
Given graphs G1, G2, …, Gt, where each Gi is n-by-m:
1. partition them into time segments G(1), G(2), …
2. for each segment, identify the groups
Problem definition
Goals: 1. Scalable, 2. Parameter-free, 3. Incremental
[Figure: a time axis split into segments G(1), G(2).]
Outline
Motivation
GraphScope
  Community discovery
  Change detection
Experiments
Community detection
Clustering problem
Compression problem
[Figure: graph snapshots at t = 0, t = 1, t = 2.]
Cost objective within a time segment
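A segment of duration d with k = 3 row groups (sizes $n_1, n_2, n_3$) and ℓ = 3 column groups (sizes $m_1, m_2, m_3$) has 3 × 3 blocks; $p_{i,j}$ is the density of ones (edges) in block (i,j).
Code cost: block (1,2) takes $d\,n_1 m_2\,H(p_{1,2})$ bits, so the total is
$$\sum_{i,j} d\,n_i m_j\,H(p_{i,j}) \ \text{bits}$$
Description cost: includes the terms $\sum_{i,j} \log(d\,n_i m_j)$ and $\log^* d$, where d is the segment duration.
Total cost = code cost + description cost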
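The code cost above can be checked on a toy segment. The following is a minimal sketch, not the authors' implementation: the function names and the list-of-lists graph representation are assumptions made for illustration, and only the code cost term $\sum_{i,j} d\,n_i m_j\,H(p_{i,j})$ is computed.

```python
import math

def binary_entropy(p):
    """H(p) in bits for a block whose density of ones is p; H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def segment_code_cost(graphs, row_groups, col_groups):
    """Sum over blocks (i, j) of d * n_i * m_j * H(p_ij), in bits.

    graphs: list of d binary n-by-m matrices (hypothetical representation).
    row_groups / col_groups: group label of each row / column.
    """
    d = len(graphs)  # segment duration
    cells = {}       # (i, j) -> n_i * m_j
    ones = {}        # (i, j) -> ones in the block, summed over the segment
    for r, i in enumerate(row_groups):
        for c, j in enumerate(col_groups):
            cells[(i, j)] = cells.get((i, j), 0) + 1
            ones[(i, j)] = ones.get((i, j), 0) + sum(g[r][c] for g in graphs)
    return sum(d * size * binary_entropy(ones[b] / (d * size))
               for b, size in cells.items())

# Toy segment: the same block-diagonal graph at two timestamps.
G = [[1, 1, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 1, 1],
     [0, 0, 1, 1]]
segment = [G, G]
grouped = segment_code_cost(segment, [0, 0, 1, 1], [0, 0, 1, 1])
single = segment_code_cost(segment, [0, 0, 0, 0], [0, 0, 0, 0])
print(grouped, single)
```

With the two homogeneous groups every block has density 0 or 1, so the code cost is 0 bits; with a single group the density is 0.5 and all 2 × 16 cells cost one bit each, 32 bits in total.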
Cost objective within a time segment
code cost (blocks) + description cost (blocks' model)
One row group, one col group: high code cost, low description cost
n row groups, m col groups: low code cost, high description cost
Cost objective within a time segment
code cost (blocks) + description cost (blocks' model)
k = 3 row groups, ℓ = 3 col groups: low code cost, low description cost
Search for the optimum grouping
The problem is NP-hard even for one timestamp and column permutation only (reduction from the TSP problem [Johnson+ 03]).
Heuristics:
Search: Split, Merge, Shuffle
Initialization: Resume, Restart
Outline
Motivation
GraphScope
  Community discovery
  Change detection
Experiments
Change point detection
Option 1: Append to current segment
Option 2: Start new segment (change point; split in time)
In both cases, we do row and column shuffles, splits and/or merges.
Choose the most parsimonious option.
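The append-versus-split decision can be sketched as a comparison of two encoding costs. This is an illustrative sketch under stated assumptions, not the authors' algorithm: the grouping is held fixed for both options (the slides re-run shuffles, splits and merges in each case), the partition description cost of n log2 k + m log2 ℓ bits is a hypothetical stand-in, the edge-count term uses log2(d n_i m_j + 1), and log* omits its usual additive constant. All function names are made up for the example.

```python
import math

def entropy(p):
    """Binary entropy in bits; 0 for a pure block."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def log_star(x):
    """log2(x) + log2(log2(x)) + ..., keeping only the positive terms."""
    total = 0.0
    x = math.log2(x)
    while x > 0:
        total += x
        x = math.log2(x)
    return total

def segment_cost(graphs, row_groups, col_groups):
    """Code cost plus a simplified (assumed) description cost, in bits."""
    d = len(graphs)
    cells, ones = {}, {}
    for r, i in enumerate(row_groups):
        for c, j in enumerate(col_groups):
            cells[(i, j)] = cells.get((i, j), 0) + 1
            ones[(i, j)] = ones.get((i, j), 0) + sum(g[r][c] for g in graphs)
    code = sum(d * s * entropy(ones[b] / (d * s)) for b, s in cells.items())
    description = log_star(d)  # segment duration
    description += sum(math.log2(d * s + 1) for s in cells.values())  # edge counts
    # Assumed partition description: one group label per row and per column.
    description += len(row_groups) * math.log2(len(set(row_groups)))
    description += len(col_groups) * math.log2(len(set(col_groups)))
    return code + description

def is_change_point(segment, new_graph, row_groups, col_groups):
    """True if starting a new segment is cheaper than appending."""
    append = segment_cost(segment + [new_graph], row_groups, col_groups)
    split = (segment_cost(segment, row_groups, col_groups)
             + segment_cost([new_graph], row_groups, col_groups))
    return split < append

G = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
flipped = [[1 - v for v in row] for row in G]  # the groups swap partners
groups = [0, 0, 1, 1]
print(is_change_point([G, G, G], G, groups, groups))
print(is_change_point([G, G, G], flipped, groups, groups))
```

Appending an unchanged graph is cheaper because a new segment pays its description cost again; the flipped graph makes every block of the merged segment impure, so the new segment wins and a change point is reported.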
Outline
Motivation
GraphScope
  Single timestamp
  Multiple timestamps
Experiments
Objectives:
Effectiveness on community discovery and change detection
Compression benefit
Scalable, incremental computation
Evolving communities: NETWORK
29K hosts (nodes)
12K edges (on average)
1,220 hours
~14.6M edges total
[Figure: community structure along a time axis.]
Community change points: ENRON
34K email addresses
12K emails (on average)
165 weeks
~2M emails total
Key change points correspond to key events.
Compression gain
GraphScope gives 10%-150% compression gain.
Graph stream clustering: scalability on NETWORK
29K hosts (nodes)
12K edges per hour (on average)
1,220 hours (timestamps)
~14.6M edges total
< 2 sec per snapshot on average
Related work
Co-clustering: [Dhillon+ KDD03], [Chakrabarti+ KDD04]
Graph partitioning: [Karypis+ 99]
Time-evolving graphs: [Chakrabarti+ KDD06], [Chi+ KDD07], [Asur+ KDD07]
Summary
Organize into few, homogeneous communities
Find changes in community structure
Scalable, parameter-free, incremental
Graph stream clustering
[Figure: graph snapshots at t = 0, t = 1, t = 2.]
Graph clustering – [Chakrabarti+ KDD’04]
[Figure: two groupings of the same matrix into row groups and column groups, shown side by side ("versus"). Why is this better?]
Good clustering:
1. Similar nodes are grouped together
2. As few groups as necessary
Good clustering implies a few, homogeneous blocks, which implies good compression.
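Cost objective
An n × m adjacency matrix with k = 3 row groups (sizes $n_1, n_2, n_3$) and ℓ = 3 column groups (sizes $m_1, m_2, m_3$) has 3 × 3 blocks; $p_{i,j}$ is the density of ones (edges) in block (i,j).
Code cost: block (1,2) takes $n_1 m_2\,H(p_{1,2})$ bits, so the total is
$$\sum_{i,j} n_i m_j\,H(p_{i,j}) \ \text{bits}$$
This assumes that the group partitionings, sizes and densities are given.
Description cost: the row-partition description (summed over row groups i), the col-partition description (summed over column groups j), labeled "block size entropy", plus $\sum_{i,j} \log(n_i m_j)$ to transmit the number of edges $e_{i,j}$ in each block.
Total cost = code cost + description cost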
Graph clustering: scalability
[Figure: time vs. size. x-axis: number of edges; y-axis: time (sec); series: splits, shuffles.]
Time is linear in the number of edges, so the method is scalable.
Cost objective
code cost (blocks) + description cost (blocks' model)
One row group, one col group: high code cost, low description cost
n row groups, m col groups: low code cost, high description cost
Cost objective
code cost (blocks) + description cost (blocks' model)
k = 3 row groups, ℓ = 3 col groups: low code cost, low description cost
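The trade-off between the two cost terms can be reproduced on a small block-diagonal matrix. This is a sketch under assumptions: the code cost follows the $\sum_{i,j} n_i m_j H(p_{i,j})$ term of the slides, while the description cost is a simplified stand-in (n log2 k + m log2 ℓ bits for group labels, plus log2(n_i m_j + 1) bits per block for its edge count). The function name `costs` is hypothetical.

```python
import math

def entropy(p):
    """Binary entropy in bits; 0 for a pure block."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def costs(matrix, row_groups, col_groups):
    """Return (code cost, description cost) in bits for one grouping."""
    cells, ones = {}, {}
    for r, i in enumerate(row_groups):
        for c, j in enumerate(col_groups):
            cells[(i, j)] = cells.get((i, j), 0) + 1
            ones[(i, j)] = ones.get((i, j), 0) + matrix[r][c]
    code = sum(s * entropy(ones[b] / s) for b, s in cells.items())
    # Assumed description cost: group labels plus one edge count per block.
    description = len(row_groups) * math.log2(len(set(row_groups)))
    description += len(col_groups) * math.log2(len(set(col_groups)))
    description += sum(math.log2(s + 1) for s in cells.values())
    return code, description

# 12-by-12 matrix with three dense 4-by-4 blocks on the diagonal.
n = 12
M = [[1 if r // 4 == c // 4 else 0 for c in range(n)] for r in range(n)]

one_group = costs(M, [0] * n, [0] * n)                    # k = l = 1
all_groups = costs(M, list(range(n)), list(range(n)))     # k = l = n
three_groups = costs(M, [r // 4 for r in range(n)],
                     [c // 4 for c in range(n)])          # k = l = 3

for name, (code, desc) in [("one group", one_group),
                           ("n groups", all_groups),
                           ("three groups", three_groups)]:
    print(f"{name}: code={code:.1f} description={desc:.1f} total={code + desc:.1f}")
```

Under these assumptions the single group pays a high code cost, the n-by-n grouping pays a high description cost, and the three-group solution has the smallest total.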
Search for optimum
[Figure: cost vs. number of groups. Axes: k, ℓ, and bit cost. Marked points: one row group and one col group; n row groups and m col groups; k = 3 row groups and ℓ = 3 col groups.]
Search for optimum: summary
Starting from k = 1, ℓ = 1, alternate split and shuffle steps:
k=1, ℓ=1 → k=1, ℓ=2 → k=2, ℓ=2 → k=2, ℓ=3 → k=3, ℓ=3 → k=3, ℓ=4 → k=4, ℓ=4 → k=4, ℓ=5 → k=5, ℓ=5
Split: increase k or ℓ
Shuffle: rearrange rows and cols
Merge: decrease k or ℓ
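One shuffle pass over the rows can be sketched as follows. This is an illustrative assumption about how "rearrange rows" might be carried out, not the authors' code: each row is moved to the row group whose current block densities encode it in the fewest bits, with the column groups held fixed. Group labels are assumed to be 0..k-1 with no empty group, and `eps` is a hypothetical clamp that avoids log(0).

```python
import math

def shuffle_rows(matrix, row_groups, col_groups, eps=1e-9):
    """Reassign every row to the row group that encodes it most cheaply."""
    k = max(row_groups) + 1
    l = max(col_groups) + 1
    n_i = [row_groups.count(i) for i in range(k)]  # row group sizes
    m_j = [col_groups.count(j) for j in range(l)]  # column group sizes

    # Count the ones in each block under the current grouping.
    ones = [[0] * l for _ in range(k)]
    for r, row in enumerate(matrix):
        for c, v in enumerate(row):
            ones[row_groups[r]][col_groups[c]] += v

    # Block densities, clamped to [eps, 1 - eps] so the logs stay finite.
    dens = [[min(max(ones[i][j] / (n_i[i] * m_j[j]), eps), 1.0 - eps)
             for j in range(l)] for i in range(k)]

    new_groups = []
    for row in matrix:
        # Ones of this row inside each column group.
        row_ones = [0] * l
        for c, v in enumerate(row):
            row_ones[col_groups[c]] += v

        def bits(i):
            # Cost of coding this row with the densities of row group i.
            return sum(-row_ones[j] * math.log2(dens[i][j])
                       - (m_j[j] - row_ones[j]) * math.log2(1.0 - dens[i][j])
                       for j in range(l))

        new_groups.append(min(range(k), key=bits))
    return new_groups

M = [[1, 1, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 1, 1],
     [0, 0, 1, 1]]
# Row 2 starts in the wrong group; one pass moves it next to row 3.
print(shuffle_rows(M, [0, 0, 0, 1], [0, 0, 1, 1]))
```

A column shuffle would be the same procedure applied to the transposed matrix; split and merge steps, which change k or ℓ, are not covered by this sketch.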
Graph clustering – [Chakrabarti+ KDD’04]
Given a graph of interactions or associations:
Customers to products
Documents to terms
People to people
Computer communications
Financial transactions
Find simultaneously:
Communities (source and destination)
Their number