Upload
others
View
3
Download
0
Embed Size (px)
Citation preview
Rise and Fall Patterns of Information Diffusion: Model and Implications �Yasuko Matsubara (Kyoto University), Yasushi Sakurai (NTT), B. Aditya Prakash (CMU), Lei Li (UCB), Christos Faloutsos (CMU) �
KDD 2012 1 Y. Matsubara et al.
Motivation �
Social media facilitate faster diffusion of news and rumors
KDD 2012 2
Q: How do news and rumors spread in social media? �
Y. Matsubara et al.
News spread in social media �
MemeTracker [Leskovec et al. KDD’09] short phrases sourced from U.S. politics in 2008
KDD 2012 3
20 40 60 80 100 120 140 1600
100
200
Time (hours)
# of
men
tions
“you can put lipstick on a pig” (# of mentions in blogs)
“yes we can”
Y. Matsubara et al.
20 40 60 80 100 120 140 1600
50
100
Time (hours)
# of
men
tions
(per hour, 1 week)
News spread in social media �
MemeTracker [Leskovec et al. KDD’09] short phrases sourced from U.S. politics in 2008
KDD 2012 4
20 40 60 80 100 120 140 1600
100
200
Time (hours)
# of
men
tions
“you can put lipstick on a pig” (# of mentions in blogs)
“yes we can”
Y. Matsubara et al.
20 40 60 80 100 120 140 1600
50
100
Time (hours)
# of
men
tions
Breaking news�
Decay �News
spread
(per hour, 1 week)
Rise and fall patterns in social media�
Twitter (# of hashtags per hour)
Google trend (# of queries per week)
KDD 2012 5
0 20 40 60 80 10020406080
100120
Time (weeks)
Valu
e
10 20 30 40 50 600
50
100
Time
Valu
e
0 50 100 1500
100
200
Time
Valu
e
0 50 100 1500
1000
2000
Time
Valu
e
Y. Matsubara et al.
“#assange” “#stevejobs”
“harry potter” (2010 - 2011) “tsunami” (in 2005)
(per hour, 1week) (per hour, 1 week)
(per week, 1 year) (per week, 2 years)
0 50 1000
50
100
0 50 1000
50
100
0 50 1000
50
100
0 50 1000
50
100
0 50 1000
50
100
0 50 1000
50
100
Rise and fall patterns in social media�
KDD 2012 6
How many patterns are there? -Earlier work claims there’re several classes • four classes on YouTube [Crane et al. PNAS’08] • six classes on Media [Yang et al. WSDM’11]
Y. Matsubara et al.
Rise and fall patterns in social media�
KDD 2012 7
Q. How many classes are there after all?
A. Our answer is “ONE”!
We can represent all patterns by single model
20 40 60 80 100 1200
50
100
Time
Valu
e
OriginalSpikeM
20 40 60 80 100 1200
50
100
Time
Valu
e
OriginalSpikeM
20 40 60 80 100 1200
50
100
Time
Valu
e
OriginalSpikeM
20 40 60 80 100 1200
50
100
Time
Valu
e
OriginalSpikeM
20 40 60 80 100 1200
50
100
Time
Valu
e
OriginalSpikeM
20 40 60 80 100 1200
50
100
Time
Valu
e
OriginalSpikeM
Y. Matsubara et al.
Outline �
KDD 2012 8
- Motivation - Problem definition - Proposed method - Experiments - Discussion – SpikeM at work - Conclusions �
Y. Matsubara et al.
Problem definition �
KDD 2012 9
0 50 1000
50
100
0 50 1000
50
100
0 50 1000
50
100
0 50 1000
50
100
0 50 1000
50
100
0 50 1000
50
100
Goal: predict/model social activity �
Given:
- Network of bloggers/users - External shock/event - Quality of the event β Find:
- How blogging activity will evolve over time
Problem 1 (What-if?)�β
Y. Matsubara et al.
Problem definition �
KDD 2012 10
0 50 1000
50
100
0 50 1000
50
100
0 50 1000
50
100
0 50 1000
50
100
0 50 1000
50
100
0 50 1000
50
100
Goal: predict/model social activity �
Given:
- Behavior of spikes Find:
- Equation/model that can explain them, e.g., - # of potential bloggers - Strength of external shock - Quality of the event β
Epidemic process by word-of-mouth
Problem 2 (Model design) �
β
Y. Matsubara et al.
Outline �
KDD 2012 11
- Motivation - Problem definition - Proposed method - Experiments - Discussion – SpikeM at work - Conclusions �
Y. Matsubara et al.
Proposed method�
SpikeM capture 3 properties of real spike
KDD 2012 12
20 40 60 80 100 120 140 1600
100
200
Time (hours)
# of
men
tions
1. periodicities
Y. Matsubara et al.
Proposed method�
SpikeM capture 3 properties of real spike
KDD 2012 13
20 40 60 80 100 120 140 1600
100
200
Time (hours)
# of
men
tions
2. avoid infinity
Y. Matsubara et al.
1. periodicities
Proposed method�
SpikeM capture 3 properties of real spike
KDD 2012 14
20 40 60 80 100 120 140 1600
100
200
Time (hours)
# of
men
tions
3. power-law fall
Y. Matsubara et al.
1. periodicities 2. avoid infinity
Proposed method�
SpikeM capture 3 properties of real spike
KDD 2012 15
20 40 60 80 100 120 140 1600
100
200
Time (hours)
# of
men
tions
3. power-law fall
SpikeM capture behavior of real spikes using few parameters
Y. Matsubara et al.
1. periodicities 2. avoid infinity
Main idea (details) �- 1. Un-informed bloggers (clique of N bloggers/nodes)
KDD 2012 16
Time n=0
Y. Matsubara et al.
Nodes (bloggers) consist of two states
- Un-informed of rumor
- informed, and Blogged about rumor
U
B
Main idea (details) �- 1. Un-informed bloggers (clique of N bloggers/nodes)
- 2. External shock at time nb (e.g, breaking news)
KDD 2012 17
Time n=0 Time n=nb
Y. Matsubara et al.
External shock
- Event happened at time - bloggers are informed, blog about news
nb
Sb
Sb
Main idea (details) �- 1. Un-informed bloggers (clique of N bloggers/nodes)
- 2. External shock at time nb (e.g, breaking news)
- 3. Infection (word-of-mouth effects)
KDD 2012 18
Time n=0 Time n=nb Time n=nb+1
β
Y. Matsubara et al.
Infectiveness of a blog-post
- Strength of infection (quality of news) - Decay function (how infective a blog posting is)
!f (n)
Main idea (details) �- 1. Un-informed bloggers (clique of N bloggers/nodes)
- 2. External shock at time nb (e.g, breaking news)
- 3. Infection (word-of-mouth effects)
KDD 2012 19
Time n=0 Time n=nb Time n=nb+1
β
Y. Matsubara et al.
Infectiveness of a blog-post
- Strength of infection (quality of news) - Decay function (how infective a blog posting is)
!f (n)
Decay function: �
f (n) = ! *n!1.5
f (n)
n
Linear scale f (n)
n
Log scale
-1.5
SpikeM-base (details) �Equations of SpikeM (base)
KDD 2012 20
- Total population of available bloggers - Strength of infection/news - External shock at birth (time ) - Background noise
nbSb
N
nb,Sb
!
!
Y. Matsubara et al.
!B(n+1) =U(n) " (!B(t)+ S(t)) " f (n+1# t)+!t=nb
n
$
U(n+1) =U(n)!"B(n+1)Un-informed
Blogged
SpikeM - with periodicity (details) �Full equation of SpikeM
KDD 2012 21 Y. Matsubara et al.
U(n+1) =U(n)!"B(n+1)
Un-informed
Blogged
!B(n+1) = p(n+1) " U(n) " (!B(t)+ S(t)) " f (n+1# t)+!t=nb
n
$%
&''
(
)**Periodicity
12pm Peak activity 3am
Low activity
Time n
activity
p(n)Bloggers change their
activity over time (e.g., daily, weekly, yearly)�
Learning parameters - Given a real time sequence - Minimize the error (Levenberg-Marquardt (LM) fitting)
Model fitting (Details) �
SpikeM consists of 7 parameters
KDD 2012 22
! = {N,",nb,Sb,#,Pa,Ps}
X = {X(1),...X(n),...,X(nd )}
D(X,! ) = (X(n)!"B(n))2n=1
nd
#
Y. Matsubara et al.
Analysis �
SpikeM matches reality exponential rise and power-raw fall
KDD 2012 23
Y. Matsubara et al.
rise fall
SpikeM vs. SI model (susceptible infected model)
Analysis �
KDD 2012 24
Linear-log
Log-log
Rise-part SpikeM: exponential SI model: exponential
Y. Matsubara et al.
rise fall
Reverse x-axis�
Analysis �
KDD 2012 25
Linear-log
Log-log
Y. Matsubara et al.
rise fall
Fall-part SpikeM: power law SI model: exponential
SpikeM matches reality
Outline �
KDD 2012 26
- Motivation - Problem definition - Proposed method - Experiments - Discussion – SpikeM at work - Conclusions �
Y. Matsubara et al.
Experiments�
We answer the following questions…
KDD 2012 27
Q1. Match real spikes - Q1-1: K-SC clusters - Q1-2: MemeTracker - Q1-3: Twitter - Q1-4: Google trend
Q2. Forecast future patterns
Y. Matsubara et al.
Q1-1 Explaining K-SC clusters�
KDD 2012 28
Six patterns of K-SC [Yang et al. WSDM’11]
SpikeM can generate all patterns in K-SC
20 40 60 80 100 1200
50
100
Time
Valu
e
OriginalSpikeM
20 40 60 80 100 1200
50
100
Time
Valu
e
OriginalSpikeM
20 40 60 80 100 1200
50
100
Time
Valu
e
OriginalSpikeM
20 40 60 80 100 1200
50
100
Time
Valu
e
OriginalSpikeM
20 40 60 80 100 1200
50
100
Time
Valu
e
OriginalSpikeM
20 40 60 80 100 1200
50
100
Time
Valu
e
OriginalSpikeM
Y. Matsubara et al.
Q1-2 Matching MemeTracker patterns �
KDD 2012 29
MemeTracker (memes in blogs) [Leskovec et al. KDD’09]
SpikeM can fit various patterns in blog
Linear scale
Log scale
Y. Matsubara et al.
Noise-robust
fitting�Outliers
Q1-3 Matching Twitter data�
KDD 2012 30
Twitter data (hashtags)
SpikeM can generate various patterns in social media
Y. Matsubara et al.
Linear scale
Log scale
Q1-4 Matching Google trend data�
KDD 2012 31
Volume of searches for queries on Google
SpikeM can capture various patterns
Y. Matsubara et al.
Q2 Tail-part forecasts�
KDD 2012 32
- Given a first part of the spike - forecast the tail part
SpikeM can capture tail part (AR: fail)
Y. Matsubara et al.
Outline �
KDD 2012 33
- Motivation - Problem definition - Proposed method - Experiments - Discussion – SpikeM at work - Conclusions �
Y. Matsubara et al.
SpikeM at work �
SpikeM is capable of various applications
KDD 2012 34
- A1. What-if forecasting
- A2. Outlier detection
- A3. Reverse engineering
Y. Matsubara et al.
A1. “What-if” forecasting �
KDD 2012 35
Forecast not only tail-part, but also rise-part!
e.g., given (1) first spike, (2) release date of two sequel movies (3) access volume before the release date
? � ? �
(1) First spike (2) Release date (3) Two weeks before release
Y. Matsubara et al.
A1. “What-if” forecasting �
KDD 2012 36
Forecast not only tail-part, but also rise-part!
SpikeM can forecast upcoming spikes
Y. Matsubara et al.
(1) First spike (2) Release date (3) Two weeks before release
A2. Outlier detection �
KDD 2012 37
Fitting result of “tsunami (Google trend)” in log-log scale
Y. Matsubara et al.
Another earthquake �
One year after Indian Ocean earthquake �Indian Ocean
earthquake �
A3. Reverse engineering �
KDD 2012 38
SpikeM provide an intuitive explanation PDF of parameters over 1,000 memes/hashtags
Y. Matsubara et al.
Meme�
Twitter�
A3. Reverse engineering �
KDD 2012 39
SpikeM provide an intuitive explanation PDF of parameters over 1,000 memes/hashtags
Y. Matsubara et al.
Observation 1 Total population N is
almost same
N =1,000 ~ 2,000
Meme�
Twitter�
A3. Reverse engineering �
KDD 2012 40
SpikeM provide an intuitive explanation PDF of parameters over 1,000 memes/hashtags
Y. Matsubara et al.
Observation 2 Strength of first burst (news) is
! *N =1.0
Meme�
Twitter�
A3. Reverse engineering �
KDD 2012 41
SpikeM provide an intuitive explanation PDF of parameters over 1,000 memes/hashtags
Y. Matsubara et al.
Observation 3 Daily periodicity with phase shift
Every meme has the same periodicity without lag
Ps = 0
(Twitter) Daily periodicity with
more spread in (i.e., Multiple time zone)
Ps
Meme�
Twitter�
Outline �
KDD 2012 42
- Motivation - Background - Proposed method - Experiments - Discussion – SpikeM at work - Conclusions �
Y. Matsubara et al.
Conclusions�
KDD 2012 43
SpikeM has following advantages:
• Unification power It includes earlier patterns/models
• Practicality: It Matches real datasets
• Parsimony It requires only 7 parameters
• Usefulness: What-if scenarios, outliers, etc.
Y. Matsubara et al.
Acknowledgements
Thanks Jaewon Yang & Jure Leskovec for the six clusters [WSDM’11]
Funding
44 KDD 2012 Y. Matsubara et al.
0 50 1000
50
100
0 50 1000
50
100
0 50 1000
50
100
0 50 1000
50
100
0 50 1000
50
100
0 50 1000
50
100
Thank you�
KDD 2012 45 Y. Matsubara et al.
Code: http://www.kecl.ntt.co.jp/csl/sirg/people/yasuko/software.html
Email: matsubara.yasuko lab.ntt.co.jp
Yasushi Sakurai
Yasuko Matsubara
B. Aditya Prakash
Lei Li Christos Faloutsos