This talk describes techniques for tracking cluster evolution patterns in large evolving networks. Put simply: given a dynamic graph that is updated at each moment, how can we incrementally monitor the evolution patterns of graph clusters? Typical evolution patterns include appear/disappear, grow/decay, and merge/split. We discuss the incremental computation framework, in contrast to the traditional graph snapshot sequence approach. The ICDE 2014 paper can be found at http://www.cs.ubc.ca/~peil/research.html
Pei Lee, ICDE 2014, Chicago, IL, USA
Incremental Cluster Evolution Tracking
from Highly Dynamic Network Data
Pei Lee, Laks V.S. Lakshmanan
Computer Science Department
University of British Columbia
Vancouver, BC, Canada
Evangelos E. Milios
Computer Science Department
Dalhousie University
Halifax, NS, Canada
2014-4-16
Outline
Motivation
Evolving network meets social event
Incremental Computation Framework
Divide-and-conquer vs. incremental computation
Post Network Construction
Combat noise
Network and Cluster Evolution
Evolution operations
Empirical Study
Examples
Evolving Network
Network changes with time
Examples:
Social Network
add/remove friends or followers
Co-authorship/citation network
new collaborations/citations added every year
Email/Calling Graph
every edge has a time stamp
An illustration of an evolving co-authorship network
Taken from http://wiki.cns.iu.edu/pages/viewpage.action?pageId=2199676
Social Streams:
Twitter, Facebook, etc.
Social Event Evolution Tracking
Event Evolution Patterns
Figure: Post Network (time t) and Post Network (time t+1) form the Evolving Network; Event Snapshots (time t) and Event Snapshots (time t+1) form the Social Events.
Evolution Patterns: emerge, disappear, grow, decay, merge, split, evolve
Model the social stream as an evolving network
Traditional Evolving Network Mining Approaches
Divide and Conquer:
decompose a dynamic network into a series of snapshots, one per moment,
apply graph mining algorithms to each snapshot to find useful patterns,
match patterns between consecutive moments to generate a dynamic pattern sequence.
As an example, consider finding evolving clusters
Illustrating Divide-and-Conquer
Taken from http://sydney.edu.au/engineering/it/~shhong/gallery.htm
Figure: network snapshots at Moment 1, Moment 2, Moment 3, Moment 4 and Moment 5
Divide-and-Conquer: Clustering in evolving networks
C_t: a cluster found in the snapshot at time t;
C_{t+1}: a cluster found in the snapshot at time t+1.
How to define “C_t evolves to C_{t+1}”?
Heuristic:
If the overlap between C_t and C_{t+1} is above a given threshold, we say they are matched.
Formally, based on Jaccard similarity: |C_t ∩ C_{t+1}| / |C_t ∪ C_{t+1}| ≥ K, where K is the threshold.
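The overlap heuristic can be sketched as follows. This is a minimal illustration, assuming clusters are represented as sets of node ids and that K is the matching threshold; the function names `jaccard` and `match_clusters` are hypothetical and not taken from the talk.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a ∩ b| / |a ∪ b|; defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_clusters(prev, curr, k):
    """Return index pairs (i, j) where prev[i] and curr[j] have Jaccard >= k."""
    return [
        (i, j)
        for i, c_prev in enumerate(prev)
        for j, c_curr in enumerate(curr)
        if jaccard(c_prev, c_curr) >= k
    ]


# Hypothetical clusters found independently at two consecutive snapshots.
clusters_t = [{1, 2, 3, 4}, {5, 6}]
clusters_t1 = [{1, 2, 3}, {4, 5, 6, 7}]
matches = match_clusters(clusters_t, clusters_t1, k=0.5)
```

Note how the result depends entirely on the chosen threshold: raising `k` from 0.5 to 0.8 drops every match in this toy input.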
Drawbacks of Divide-and-Conquer
Quality:
It is difficult to decide the threshold K
The matching between two consecutive snapshots will lose accuracy
Performance:
Need to cluster each snapshot from scratch
Lots of redundant computation
New Proposal: Incremental Computation for dense subgraph mining
Basic Idea:
For the very first snapshot, mine the graph pattern set S_0 from scratch
After this, this step is never applied again.
In the steady state, let t start at 1
Obtain the graph update ΔG by comparing the network at moment t with the network at moment t-1
Derive S_t from S_{t-1} based on ΔG
Let t increase to t+1
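The loop above can be sketched in a few lines. This is only an illustration under stated assumptions: snapshots are sets of undirected edges, and the maintained "pattern" is a simple stand-in (node degrees) rather than the dense subgraph patterns of the paper. The names `graph_delta`, `mine_from_scratch` and `derive` are hypothetical.

```python
from collections import Counter


def graph_delta(prev_edges, curr_edges):
    """Return the update ΔG as (added_edges, removed_edges)."""
    return curr_edges - prev_edges, prev_edges - curr_edges


def mine_from_scratch(edges):
    """Initial step: compute the pattern (here, node degrees) on the full graph."""
    degrees = Counter()
    for u, v in edges:
        degrees[u] += 1
        degrees[v] += 1
    return degrees


def derive(prev_state, delta):
    """Steady state: obtain S_t from S_{t-1} by touching only the edges in ΔG."""
    added, removed = delta
    state = Counter(prev_state)
    for u, v in added:
        state[u] += 1
        state[v] += 1
    for u, v in removed:
        state[u] -= 1
        state[v] -= 1
    return +state  # unary plus drops nodes whose degree fell to zero


# Hypothetical snapshots of a small evolving network.
snapshots = [
    {("a", "b"), ("b", "c")},
    {("a", "b"), ("b", "c"), ("c", "d")},
    {("b", "c"), ("c", "d")},
]
states = [mine_from_scratch(snapshots[0])]        # S_0, computed once
for t in range(1, len(snapshots)):
    delta = graph_delta(snapshots[t - 1], snapshots[t])
    states.append(derive(states[-1], delta))      # S_t derived from S_{t-1}
```

The work done by `derive` is proportional to the size of ΔG, not to the size of the whole network, which is the point of the incremental approach.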
Divide-and-Conquer vs. Incremental Computation
Divide-and-Conquer:
1, 2, 3, 4
Incremental Computation:
Initial step: 1
Steady state: 5
Advantages:
Avoid redundant computation
More accurately capture the evolution patterns
Incremental Computation Framework
Adjust the clusters at each moment as the network is updated
Post Network Construction
A social stream is a FIFO queue of posts
Post similarity: combines content similarity and time distance
Post Network:
Each post is a node
An edge is constructed between two posts if their similarity is higher than a given threshold
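A post network of this kind could be built as sketched below. The similarity function is an assumption made for illustration (Jaccard overlap of word sets, damped by the time distance in hours); the talk only states that similarity involves content similarity and time distance. `Post`, `similarity` and `build_post_network` are hypothetical names.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Post:
    pid: int
    words: frozenset
    time: float  # timestamp in hours (assumed unit)


def similarity(p: Post, q: Post) -> float:
    """Assumed form: content similarity divided by (1 + time distance)."""
    union = p.words | q.words
    content = len(p.words & q.words) / len(union) if union else 0.0
    return content / (1.0 + abs(p.time - q.time))


def build_post_network(posts, threshold):
    """Each post is a node; keep an edge only if similarity exceeds the threshold."""
    edges = {}
    for p, q in combinations(posts, 2):
        s = similarity(p, q)
        if s > threshold:
            edges[(p.pid, q.pid)] = s
    return edges


posts = [
    Post(1, frozenset({"apple", "iphone", "launch"}), 0.0),
    Post(2, frozenset({"apple", "iphone", "event"}), 0.5),
    Post(3, frozenset({"good", "morning"}), 0.2),
]
edges = build_post_network(posts, threshold=0.3)
```

In this toy input only posts 1 and 2 are linked; post 3 shares no words with the others and stays isolated.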
Evolving Post Network
We can build a post network for your daily timeline in Facebook/Twitter/LinkedIn
As posts stream in, the post network evolves very quickly
Challenges of evolving post network mining:
The quick surge of post streams (speed)
A large number of posts are noise (quality)
The huge number of posts (scalability)
Observing Time Window
Notations:
Len: time window length
Δt: amount by which the time window shifts at each moment
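The window can be sketched with a FIFO queue. This is a minimal illustration, assuming posts arrive in timestamp order and that the window at moment `now` covers the interval (now - Len, now]; the class name `SlidingWindow` and its method are hypothetical.

```python
from collections import deque


class SlidingWindow:
    """Keeps the posts whose timestamps fall in (now - length, now]."""

    def __init__(self, length):
        self.length = length
        self.posts = deque()  # (timestamp, post_id) in arrival order

    def advance(self, now, arrivals):
        """Shift the window end to `now`; return (added, expired) posts."""
        added = list(arrivals)
        self.posts.extend(added)
        expired = []
        # Oldest posts sit at the left end of the FIFO queue.
        while self.posts and self.posts[0][0] <= now - self.length:
            expired.append(self.posts.popleft())
        return added, expired


window = SlidingWindow(length=10)                   # Len = 10
step1 = window.advance(5, [(1, "a"), (4, "b")])     # Δt = 5 between calls
step2 = window.advance(10, [(7, "c")])
step3 = window.advance(15, [(12, "d")])
```

The `(added, expired)` pair returned at each step is exactly the set of node additions and removals that drives the network update.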
How to filter out noise?
Noise is ubiquitous in social streams
“Good morning”, “thank you ^.^”, etc.
About 40% of tweets carry very little meaning
How to filter out noise?
Classify posts into three types: core, border and noise
w_t(p): the priority of post p at moment t
By analogy with a social network:
Core: person with lots of friends
Border: not core, but a friend of core
Noise: not core, and not a friend of core
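The three-way split can be sketched as below. The sketch assumes a post is core when its priority reaches a threshold `delta`, and it uses the number of neighbours as a stand-in priority, mirroring the friends analogy; how w_t(p) is actually computed is not given on the slide. `classify_posts` is a hypothetical name.

```python
def classify_posts(adj, priority, delta):
    """Label each node 'core', 'border' or 'noise'.

    adj:      node -> set of neighbouring nodes
    priority: node -> priority value (stand-in for w_t(p))
    delta:    assumed priority threshold for core posts
    """
    core = {p for p in adj if priority[p] >= delta}
    labels = {}
    for p, neighbours in adj.items():
        if p in core:
            labels[p] = "core"
        elif neighbours & core:       # not core, but adjacent to a core post
            labels[p] = "border"
        else:                         # neither core nor adjacent to core
            labels[p] = "noise"
    return labels


adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
    "e": {"f"},
    "f": {"e"},
}
priority = {p: len(neighbours) for p, neighbours in adj.items()}
labels = classify_posts(adj, priority, delta=2)
```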
Skeletal graph of a post network
Skeletal Graph:
A graph consisting of all core posts
A brief summary of the original post network
Clusters can be derived from skeletal graphs
Our algorithm monitors changes in the skeletal graph
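One way to picture this is sketched below, assuming the skeletal graph is the subgraph induced by the core posts and that clusters are its connected components. The second assumption is ours for illustration; the talk only says that clusters can be derived from skeletal graphs. `skeletal_graph` and `clusters` are hypothetical names.

```python
def skeletal_graph(adj, core):
    """Subgraph induced by the core posts: keep only core-to-core edges."""
    return {p: adj[p] & core for p in core}


def clusters(skeleton):
    """Connected components of the skeletal graph (assumed cluster definition)."""
    seen = set()
    result = []
    for start in skeleton:
        if start in seen:
            continue
        component = {start}
        stack = [start]
        while stack:
            u = stack.pop()
            for v in skeleton[u]:
                if v not in component:
                    component.add(v)
                    stack.append(v)
        seen |= component
        result.append(component)
    return result


# Hypothetical post network: x is a non-core post linking two groups.
adj = {
    "a": {"b", "x"},
    "b": {"a"},
    "c": {"d"},
    "d": {"c", "x"},
    "x": {"a", "d"},
}
core = {"a", "b", "c", "d"}
skeleton = skeletal_graph(adj, core)
found = clusters(skeleton)
```

Because `x` is not core, it is absent from the skeletal graph, so the two groups stay separate clusters even though `x` connects them in the full post network.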
Network Evolution Operations
Add a post
Remove a post
Cluster Evolution Operations
We define 6 cluster evolution patterns:
appear, disappear, grow, decay, merge and split
Summary: Cluster Evolution
Add a post:
A new cluster may appear
An existing cluster may grow
Multiple clusters may merge into a single one
Delete a post:
An existing cluster may disappear
An existing cluster may decay
An existing cluster may split into multiple clusters
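The mapping from node operations to cluster evolution patterns can be sketched as below. The sketch assumes every node is a core post and that clusters are connected components, and it inspects only the components around the affected node. It is an illustration, not the ICM or eTrack algorithm from the paper; all function names are hypothetical.

```python
def component_of(adj, start):
    """Nodes reachable from `start`."""
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return frozenset(seen)


def distinct_components(adj, nodes):
    """Set of components that contain at least one of `nodes`."""
    found, covered = set(), set()
    for n in nodes:
        if n not in covered:
            comp = component_of(adj, n)
            found.add(comp)
            covered |= comp
    return found


def add_post(adj, node, neighbours):
    """Insert a node and report 'appear', 'grow' or 'merge'."""
    neighbours = {n for n in neighbours if n in adj}
    touched = distinct_components(adj, neighbours)  # clusters before the insert
    adj[node] = set(neighbours)
    for n in neighbours:
        adj[n].add(node)
    if not touched:
        return "appear"
    return "grow" if len(touched) == 1 else "merge"


def remove_post(adj, node):
    """Delete a node and report 'disappear', 'decay' or 'split'."""
    neighbours = adj.pop(node)
    for n in neighbours:
        adj[n].discard(node)
    remaining = distinct_components(adj, neighbours)  # clusters after the delete
    if not remaining:
        return "disappear"
    return "decay" if len(remaining) == 1 else "split"


adj = {}
events = [
    add_post(adj, "a", []),
    add_post(adj, "b", ["a"]),
    add_post(adj, "c", []),
    add_post(adj, "d", ["b", "c"]),
    remove_post(adj, "d"),
    remove_post(adj, "c"),
    remove_post(adj, "b"),
]
```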
Network Evolution to Cluster Evolution
Cluster evolution of adding a post
Network Evolution to Cluster Evolution
Cluster evolution of deleting a post
Bulk Updating
Existing incremental computation on dynamic graphs usually treats the addition/deletion of nodes or edges one by one
Since social posts arrive at a high speed, post-by-post incremental updating will lead to very poor performance
Bulk updating: update subgraph-by-subgraph
a bulk = a post cluster
More details in Section VII of the paper
Proposed Algorithms
ICM: Incremental Cluster Maintenance
eTrack: Cluster Evolution Tracking
Twitter Technology domain data sets
Time span: 1 month
Tech-Lite: all the timelines of users listed in the Technology category of “Who to follow” and of the users they retweeted
streaming rate is about 11,700 tweets/day
Tech-Full: all the timelines followed by users who are in the Technology category
streaming rate is about 7,216 tweets/hour
Ground Truth
Major events from news articles:
Crawl news from major technology websites
By treating the news article titles as posts, we apply our approach to extract events
Peaks in Google Trends
Precision and recall
HashtagPeaks: use common hashtags to compute post similarity
UnigramPeaks: use common unigrams to compute post similarity
Louvain: use common entities to compute post similarity and apply Louvain community detection algorithm
eTrack: use common entities to compute post similarity and apply our approach
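For reference, precision and recall against a ground-truth event set can be computed as sketched below. The event labels are placeholders rather than results from the talk, and `precision_recall` is a hypothetical helper that assumes detected events have already been matched to ground-truth events by identifier.

```python
def precision_recall(detected: set, truth: set):
    """Precision = TP / |detected|, recall = TP / |truth|."""
    true_positives = len(detected & truth)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Placeholder identifiers, not data from the experiments.
detected_events = {"e1", "e2", "e3", "e4"}
ground_truth = {"e2", "e3", "e5"}
precision, recall = precision_recall(detected_events, ground_truth)
```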
Top 10 social events detected by different methods
Running time
(a) Adjusting time window length
(b) Adjusting step length
Cluster Evolution Examples
Conclusion
Theoretical side:
We propose an incremental computation framework for cluster evolution tracking in highly dynamic networks
Application side:
We propose an efficient tracking system for event evolution patterns in social streams
Q & A
Post Network Mining
A snapshot of the post network is constructed from the posts in the same time window
As social posts stream in, events (dense clusters) are identified
Relationships between post network, skeletal graph and clusters
The skeletal graph is a sketch of the post network
Clusters can be generated from the skeletal graph