142
CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU

Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

  • Upload
    others

  • View
    6

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

Mining Billion-Node Graphs - Patterns and Algorithms

Christos Faloutsos CMU

Page 2: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

Thank you! •  Panos Chrysanthis •  Ling Liu •  Vladimir Zadorozhny •  Prashant Krishnamurthy

C-BIG'12 C. Faloutsos (CMU) 2

Page 3: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 3

Resource

Open source system for mining huge graphs: PEGASUS project (PEta GrAph mining

System) •  www.cs.cmu.edu/~pegasus •  code and papers C-BIG'12

Page 4: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 4

Roadmap

•  Introduction – Motivation •  Problem#1: Patterns in graphs •  Problem#2: Tools •  Problem#3: Scalability •  Conclusions

C-BIG'12

Page 5: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 5

Graphs - why should we care?

Internet Map [lumeta.com]

Food Web [Martinez ’91]

>$10B revenue

>0.5B users

C-BIG'12

Page 6: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 6

Graphs - why should we care? •  IR: bi-partite graphs (doc-terms)

•  web: hyper-text graph

•  ... and more:

D1

DN

T1

TM

... ...

C-BIG'12

Page 7: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 7

Graphs - why should we care? •  ‘viral’ marketing •  web-log (‘blog’) news propagation •  computer network security: email/IP traffic

and anomaly detection •  .... •  Subject-verb-object -> graph •  Many-to-many db relationship -> graph

C-BIG'12

Page 8: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 8

Outline

•  Introduction – Motivation •  Problem#1: Patterns in graphs

– Static graphs – Weighted graphs – Time evolving graphs

•  Problem#2: Tools •  Problem#3: Scalability •  Conclusions

C-BIG'12

Page 9: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 9

Problem #1 - network and graph mining

•  What does the Internet look like? •  What does FaceBook look like?

•  What is ‘normal’/‘abnormal’? •  which patterns/laws hold?

C-BIG'12

Page 10: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 10

Problem #1 - network and graph mining

•  What does the Internet look like? •  What does FaceBook look like?

•  What is ‘normal’/‘abnormal’? •  which patterns/laws hold?

–  To spot anomalies (rarities), we have to discover patterns

C-BIG'12

Page 11: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 11

Problem #1 - network and graph mining

•  What does the Internet look like? •  What does FaceBook look like?

•  What is ‘normal’/‘abnormal’? •  which patterns/laws hold?

–  To spot anomalies (rarities), we have to discover patterns

–  Large datasets reveal patterns/anomalies that may be invisible otherwise…

C-BIG'12

Page 12: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 12

Graph mining •  Are real graphs random?

C-BIG'12

Page 13: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 13

Laws and patterns •  Are real graphs random? •  A: NO!!

– Diameter –  in- and out- degree distributions –  other (surprising) patterns

•  So, let’s look at the data

C-BIG'12

Page 14: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 14

Solution# S.1 •  Power law in the degree distribution

[SIGCOMM99]

log(rank)

log(degree)

internet domains

att.com

ibm.com

C-BIG'12

Page 15: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 15

Solution# S.1 •  Power law in the degree distribution

[SIGCOMM99]

log(rank)

log(degree)

-0.82

internet domains

att.com

ibm.com

C-BIG'12

Page 16: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 16

Solution# S.2: Eigen Exponent E

•  A2: power law in the eigenvalues of the adjacency matrix

E = -0.48

Exponent = slope

Eigenvalue

Rank of decreasing eigenvalue

May 2001

C-BIG'12

Page 17: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 17

Solution# S.2: Eigen Exponent E

•  [Mihail, Papadimitriou ’02]: slope is ½ of rank exponent

E = -0.48

Exponent = slope

Eigenvalue

Rank of decreasing eigenvalue

May 2001

C-BIG'12

Page 18: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 18

But: How about graphs from other domains?

C-BIG'12

Page 19: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 19

More power laws: •  web hit counts [w/ A. Montgomery]

Web Site Traffic

in-degree (log scale)

Count (log scale)

Zipf

users sites

``ebay’’

C-BIG'12

Page 20: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 20

epinions.com •  who-trusts-whom

[Richardson + Domingos, KDD 2001]

(out) degree

count

trusts-2000-people user

C-BIG'12

Page 21: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

And numerous more •  # of sexual contacts •  Income [Pareto] –’80-20 distribution’ •  Duration of downloads [Bestavros+] •  Duration of UNIX jobs (‘mice and

elephants’) •  Size of files of a user •  … •  ‘Black swans’ C-BIG'12 C. Faloutsos (CMU) 21

Page 22: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 22

Roadmap

•  Introduction – Motivation •  Problem#1: Patterns in graphs

– Static graphs •  degree, diameter, eigen, •  triangles •  cliques

– Weighted graphs – Time evolving graphs

•  Problem#2: Tools C-BIG'12

Page 23: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 23

Solution# S.3: Triangle ‘Laws’

•  Real social networks have a lot of triangles

C-BIG'12

Page 24: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 24

Solution# S.3: Triangle ‘Laws’

•  Real social networks have a lot of triangles –  Friends of friends are friends

•  Any patterns?

C-BIG'12

Page 25: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 25

Triangle Law: #S.3 [Tsourakakis ICDM 2008]

ASN HEP-TH

Epinions X-axis: # of participating triangles Y: count (~ pdf)

C-BIG'12

Page 26: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 26

Triangle Law: #S.3 [Tsourakakis ICDM 2008]

ASN HEP-TH

Epinions

C-BIG'12

X-axis: # of participating triangles Y: count (~ pdf)

Page 27: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 27

Triangle Law: #S.4 [Tsourakakis ICDM 2008]

SN Reuters

Epinions X-axis: degree Y-axis: mean # triangles n friends -> ~n1.6 triangles

C-BIG'12

Page 28: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 28

Triangle Law: Computations [Tsourakakis ICDM 2008]

But: triangles are expensive to compute (3-way join; several approx. algos)

Q: Can we do that quickly?

details

C-BIG'12

Page 29: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 29

Triangle Law: Computations [Tsourakakis ICDM 2008]

But: triangles are expensive to compute (3-way join; several approx. algos)

Q: Can we do that quickly? A: Yes!

#triangles = 1/6 Sum ( λi3 )

(and, because of skewness (S2) , we only need the top few eigenvalues!

details

C-BIG'12

Page 30: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 30

Triangle Law: Computations [Tsourakakis ICDM 2008]

1000x+ speed-up, >90% accuracy

details

C-BIG'12

Page 31: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

Triangle counting for large graphs? Anomalous nodes in Twitter(~ 3 billion edges)

[U Kang, Brendan Meeder, +, PAKDD’11] 31 C-BIG'12 31 C. Faloutsos (CMU)

Page 32: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

Triangle counting for large graphs? Anomalous nodes in Twitter(~ 3 billion edges)

[U Kang, Brendan Meeder, +, PAKDD’11] 32 C-BIG'12 32 C. Faloutsos (CMU)

Page 33: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

Triangle counting for large graphs? Anomalous nodes in Twitter(~ 3 billion edges)

[U Kang, Brendan Meeder, +, PAKDD’11] 33 C-BIG'12 33 C. Faloutsos (CMU)

Page 34: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

Triangle counting for large graphs? Anomalous nodes in Twitter(~ 3 billion edges)

[U Kang, Brendan Meeder, +, PAKDD’11] 34 C-BIG'12 34 C. Faloutsos (CMU)

Page 35: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

EigenSpokes B. Aditya Prakash, Mukund Seshadri, Ashwin

Sridharan, Sridhar Machiraju and Christos Faloutsos: EigenSpokes: Surprising Patterns and Scalable Community Chipping in Large Graphs, PAKDD 2010, Hyderabad, India, 21-24 June 2010.

C. Faloutsos (CMU) 35 C-BIG'12

Page 36: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

EigenSpokes • Eigenvectors of adjacency matrix

!  equivalent to singular vectors (symmetric, undirected graph)

36 C. Faloutsos (CMU) C-BIG'12

Page 37: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

EigenSpokes • Eigenvectors of adjacency matrix

!  equivalent to singular vectors (symmetric, undirected graph)

37 C. Faloutsos (CMU) C-BIG'12

N

N

details

Page 38: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

EigenSpokes • Eigenvectors of adjacency matrix

!  equivalent to singular vectors (symmetric, undirected graph)

38 C. Faloutsos (CMU) C-BIG'12

N

N

details

Page 39: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

EigenSpokes • Eigenvectors of adjacency matrix

!  equivalent to singular vectors (symmetric, undirected graph)

39 C. Faloutsos (CMU) C-BIG'12

N

N

details

Page 40: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

EigenSpokes • Eigenvectors of adjacency matrix

!  equivalent to singular vectors (symmetric, undirected graph)

40 C. Faloutsos (CMU) C-BIG'12

N

N

details

Page 41: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

EigenSpokes •  EE plot: •  Scatter plot of

scores of u1 vs u2 •  One would expect

– Many points @ origin

– A few scattered ~randomly

C. Faloutsos (CMU) 41

u1

u2

C-BIG'12

1st Principal component

2nd Principal component

Page 42: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

EigenSpokes •  EE plot: •  Scatter plot of

scores of u1 vs u2 •  One would expect

– Many points @ origin

– A few scattered ~randomly

C. Faloutsos (CMU) 42

u1

u2 90o

C-BIG'12

Page 43: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

EigenSpokes - pervasiveness • Present in mobile social graph

! across time and space • Patent citation graph

43 C. Faloutsos (CMU) C-BIG'12

Page 44: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

EigenSpokes - explanation

Near-cliques, or near-bipartite-cores, loosely connected

44 C. Faloutsos (CMU) C-BIG'12

Page 45: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

EigenSpokes - explanation

Near-cliques, or near-bipartite-cores, loosely connected

45 C. Faloutsos (CMU) C-BIG'12

Page 46: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

EigenSpokes - explanation

Near-cliques, or near-bipartite-cores, loosely connected

46 C. Faloutsos (CMU) C-BIG'12

Page 47: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

EigenSpokes - explanation

Near-cliques, or near-bipartite-cores, loosely connected

So what?

! Extract nodes with high scores

!  high connectivity ! Good “communities”

spy plot of top 20 nodes

47 C. Faloutsos (CMU) C-BIG'12

Page 48: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

Bipartite Communities!

magnified bipartite community

patents from same inventor(s)

`cut-and-paste’ bibliography!

48 C. Faloutsos (CMU) C-BIG'12

Page 49: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

(maybe, botnets?)

Victim IPs?

Botnet members?

49 C. Faloutsos (CMU) C-BIG'12

Page 50: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 50

Roadmap

•  Introduction – Motivation •  Problem#1: Patterns in graphs

– Static graphs •  degree, diameter, eigen, •  triangles •  cliques

– Weighted graphs – Time evolving graphs

•  Problem#2: Tools C-BIG'12

Page 51: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 51

Observations on weighted graphs?

•  A: yes - even more ‘laws’!

M. McGlohon, L. Akoglu, and C. Faloutsos Weighted Graphs and Disconnected Components: Patterns and a Generator. SIG-KDD 2008

C-BIG'12

Page 52: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 52

Observation W.1: Fortification Q: How do the weights of nodes relate to degree?

C-BIG'12

Page 53: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 53

Observation W.1: Fortification

More donors, more $ ?

$10

$5

C-BIG'12

‘Reagan’

‘Clinton’ $7

Page 54: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

Edges (# donors)

In-weights ($)

C. Faloutsos (CMU) 54

Observation W.1: fortification: Snapshot Power Law

•  Weight: super-linear on in-degree •  exponent ‘iw’: 1.01 < iw < 1.26

Orgs-Candidates

e.g. John Kerry, $10M received, from 1K donors

More donors, even more $

$10

$5

C-BIG'12

Page 55: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 55

Roadmap

•  Introduction – Motivation •  Problem#1: Patterns in graphs

– Static graphs – Weighted graphs – Time evolving graphs

•  Problem#2: Tools •  …

C-BIG'12

Page 56: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 56

Problem: Time evolution •  with Jure Leskovec (CMU ->

Stanford)

•  and Jon Kleinberg (Cornell – sabb. @ CMU)

C-BIG'12

Page 57: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 57

T.1 Evolution of the Diameter •  Prior work on Power Law graphs hints

at slowly growing diameter: –  diameter ~ O(log N) –  diameter ~ O(log log N)

•  What is happening in real data?

C-BIG'12

Page 58: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 58

T.1 Evolution of the Diameter •  Prior work on Power Law graphs hints

at slowly growing diameter: –  diameter ~ O(log N) –  diameter ~ O(log log N)

•  What is happening in real data? •  Diameter shrinks over time

C-BIG'12

Page 59: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 59

T.1 Diameter – “Patents”

•  Patent citation network

•  25 years of data •  @1999

–  2.9 M nodes –  16.5 M edges

time [years]

diameter

C-BIG'12

Page 60: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 60

T.2 Temporal Evolution of the Graphs

•  N(t) … nodes at time t •  E(t) … edges at time t •  Suppose that

N(t+1) = 2 * N(t) •  Q: what is your guess for

E(t+1) =? 2 * E(t)

C-BIG'12

Page 61: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 61

T.2 Temporal Evolution of the Graphs

•  N(t) … nodes at time t •  E(t) … edges at time t •  Suppose that

N(t+1) = 2 * N(t) •  Q: what is your guess for

E(t+1) =? 2 * E(t)

•  A: over-doubled! – But obeying the ``Densification Power Law’’

C-BIG'12

Page 62: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 62

T.2 Densification – Patent Citations

•  Citations among patents granted

•  @1999 –  2.9 M nodes –  16.5 M edges

•  Each year is a datapoint

N(t)

E(t)

1.66

C-BIG'12

Page 63: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 63

Roadmap

•  Introduction – Motivation •  Problem#1: Patterns in graphs

– Static graphs – Weighted graphs – Time evolving graphs

•  Problem#2: Tools •  …

C-BIG'12

Page 64: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 64

T.3 : popularity over time

Post popularity drops-off – exponentially?

lag: days after post

# in links

1 2 3

@t

@t + lag

C-BIG'12

Page 65: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 65

T.3 : popularity over time

Post popularity drops-off – exponentially? POWER LAW! Exponent?

# in links (log)

days after post (log)

C-BIG'12

Page 66: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 66

T.3 : popularity over time

Post popularity drops-off – exponentially? POWER LAW! Exponent? -1.6 •  close to -1.5: Barabasi’s stack model •  and like the zero-crossings of a random walk

# in links (log) -1.6

days after post (log)

C-BIG'12

Page 67: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C-BIG'12 C. Faloutsos (CMU) 67

-1.5 slope J. G. Oliveira & A.-L. Barabási Human Dynamics: The

Correspondence Patterns of Darwin and Einstein. Nature 437, 1251 (2005) . [PDF]

Response time (log)

Prob(RT > x) (log)

Page 68: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

T.4: duration of phonecalls Surprising Patterns for the Call

Duration Distribution of Mobile Phone Users

Pedro O. S. Vaz de Melo, Leman Akoglu, Christos Faloutsos, Antonio A. F. Loureiro

PKDD 2010

C-BIG'12 C. Faloutsos (CMU) 68

Page 69: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

Probably, power law (?)

C-BIG'12 C. Faloutsos (CMU) 69

??

Page 70: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

No Power Law!

C-BIG'12 C. Faloutsos (CMU) 70

Page 71: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

‘TLaC: Lazy Contractor’ •  The longer a task (phonecall) has taken, •  The even longer it will take

C-BIG'12 C. Faloutsos (CMU) 71

Odds ratio= Casualties(<x): Survivors(>=x) == power law

Page 72: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

72

Data Description

"  Data from a private mobile operator of a large city "  4 months of data "  3.1 million users "  more than 1 billion phone records

"  Over 96% of ‘talkative’ users obeyed a TLAC distribution (‘talkative’: >30 calls)

C-BIG'12 C. Faloutsos (CMU)

Page 73: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

Outliers:

C-BIG'12 C. Faloutsos (CMU) 73

Page 74: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 74

Roadmap

•  Introduction – Motivation •  Problem#1: Patterns in graphs •  Problem#2: Tools

– Belief Propagation – Tensors – Spike analysis

•  Problem#3: Scalability •  Conclusions

C-BIG'12

Page 75: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C-BIG'12 C. Faloutsos (CMU) 75

E-bay Fraud detection

w/ Polo Chau & Shashank Pandit, CMU [www’07]

Page 76: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C-BIG'12 C. Faloutsos (CMU) 76

E-bay Fraud detection

Page 77: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C-BIG'12 C. Faloutsos (CMU) 77

E-bay Fraud detection

Page 78: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C-BIG'12 C. Faloutsos (CMU) 78

E-bay Fraud detection - NetProbe

Page 79: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C-BIG'12 C. Faloutsos (CMU) 79

E-bay Fraud detection - NetProbe

F A H F 99% A 99% H 49% 49%

Compatibility matrix

heterophily

Page 80: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

Popular press

And less desirable attention: •  E-mail from ‘Belgium police’ (‘copy of

your code?’) C-BIG'12 C. Faloutsos (CMU) 80

Page 81: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 81

Roadmap

•  Introduction – Motivation •  Problem#1: Patterns in graphs •  Problem#2: Tools

– Belief Propagation – Tensors – Spike analysis

•  Problem#3: Scalability •  Conclusions

C-BIG'12

Page 82: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

GigaTensor: Scaling Tensor Analysis Up By 100 Times –

Algorithms and Discoveries

U Kang

Christos Faloutsos

KDD’12

Evangelos Papalexakis

Abhay Harpale

C-BIG'12 82 C. Faloutsos (CMU)

Page 83: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

Background: Tensor

•  Tensors (=multi-dimensional arrays) are everywhere – Hyperlinks &anchor text [Kolda+,05]

URL 1�

URL 2�

Anchor Text�

Java�

C++�

C#�

11

1

11

1 1

C-BIG'12 83 C. Faloutsos (CMU)

Page 84: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

Background: Tensor

•  Tensors (=multi-dimensional arrays) are everywhere – Sensor stream (time, location, type) – Predicates (subject, verb, object) in knowledge base

“Barrack Obama is the president of

U.S.” �

“Eric Clapton plays guitar” �

(26M)�

(26M)�

(48M)�

NELL (Never Ending Language Learner) data

Nonzeros =144M�

C-BIG'12 84 C. Faloutsos (CMU)

Page 85: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

Problem Definition

•  How to decompose a billion-scale tensor? – Corresponds to SVD in 2D case

C-BIG'12 85 C. Faloutsos (CMU)

Page 86: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS Problem Definition

#  Q1: Dominant concepts/topics?�#  Q2: Find synonyms to a given noun phrase? #  (and how to scale up: |data| > RAM)

(26M)�

(26M)�

(48M)�

NELL (Never Ending Language Learner) data

Nonzeros =144M�

C-BIG'12 86 C. Faloutsos (CMU)

Page 87: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

Experiments

•  GigaTensor solves 100x larger problem

Number of nonzero = I / 50�

(J)�

(I)�

(K)�

GigaTensor

Out of Memory

100x

C-BIG'12 87 C. Faloutsos (CMU)

Page 88: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

A1: Concept Discovery

•  Concept Discovery in Knowledge Base

C-BIG'12 88 C. Faloutsos (CMU)

Page 89: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS A1: Concept Discovery

C-BIG'12 89 C. Faloutsos (CMU)

Page 90: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS A2: Synonym Discovery

C-BIG'12 90 C. Faloutsos (CMU)

Page 91: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 91

Roadmap

•  Introduction – Motivation •  Problem#1: Patterns in graphs •  Problem#2: Tools

– Belief propagation – Tensors – Spike analysis

•  Problem#3: Scalability -PEGASUS •  Conclusions

C-BIG'12

Page 92: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

•  Meme (# of mentions in blogs) –  short phrases Sourced from U.S. politics in 2008

92

20 40 60 80 100 120 140 1600

100

200

Time (hours)

# of

men

tions

20 40 60 80 100 120 140 1600

50

100

Time (hours)

# of

men

tions

“you can put lipstick on a pig”

“yes we can”

Rise and fall patterns in social media

C. Faloutsos (CMU) C-BIG'12

Page 93: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

0 50 1000

50

100

0 50 1000

50

100

0 50 1000

50

100

0 50 1000

50

100

0 50 1000

50

100

0 50 1000

50

100

Rise and fall patterns in social media�

93

•  Can we find a unifying model, which includes these patterns?

•  four classes on YouTube [Crane et al. ’08] •  six classes on Meme [Yang et al. ’11]

C. Faloutsos (CMU) C-BIG'12

Page 94: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

Rise and fall patterns in social media�

94

•  Answer: YES!

•  We can represent all patterns by single model

20 40 60 80 100 1200

50

100

Time

Valu

e

OriginalSpikeM

20 40 60 80 100 1200

50

100

Time

Valu

e

OriginalSpikeM

20 40 60 80 100 1200

50

100

Time

Valu

e

OriginalSpikeM

20 40 60 80 100 1200

50

100

Time

Valu

e

OriginalSpikeM

20 40 60 80 100 1200

50

100

Time

Valu

e

OriginalSpikeM

20 40 60 80 100 1200

50

100

Time

Valu

e

OriginalSpikeM

C. Faloutsos (CMU) C-BIG'12 In Matsubara+ SIGKDD 2012

Page 95: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

95

Main idea - SpikeM�-  1. Un-informed bloggers (uninformed about rumor)

-  2. External shock at time nb (e.g, breaking news)

-  3. Infection (word-of-mouth)

Time n=0� Time n=nb�

��

C. Faloutsos (CMU) C-BIG'12

Infectiveness of a blog-post at age n:

- Strength of infection (quality of news) - Decay function

β

f (n)

Time n=nb+1�

Page 96: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

96

-  1. Un-informed bloggers (uninformed about rumor)

-  2. External shock at time nb (e.g, breaking news)

-  3. Infection (word-of-mouth)

Time n=0� Time n=nb�

��

C. Faloutsos (CMU) C-BIG'12

Infectiveness of a blog-post at age n:

- Strength of infection (quality of news) - Decay function

β

f (n)

Time n=nb+1�

f (n) = β *n−1.5

Main idea - SpikeM�

Page 97: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

SpikeM - with periodicity�•  Full equation of SpikeM

97

ΔB(n+1) = p(n+1) ⋅ U(n) ⋅ (ΔB(t)+ S(t)) ⋅ f (n+1− t)+εt=nb

n

∑%

&''

(

)**

Periodicity�

noon Peak� 3am

Dip�

Time n�

Bloggers change their activity over time

(e.g., daily, weekly, yearly)�

activity�

p(n)

C. Faloutsos (CMU) C-BIG'12

Page 98: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

Details�•  Analysis – exponential rise and power-raw fall

98

Lin-log�

Log-log�

Rise-part

SI -> exponential SpikeM -> exponential

C. Faloutsos (CMU) C-BIG'12

Page 99: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

Details�•  Analysis – exponential rise and power-raw fall

99

Lin-log�

Log-log�

Fall-part

SI -> exponential SpikeM -> power law

C. Faloutsos (CMU) C-BIG'12

Page 100: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

Tail-part forecasts�

100

•  SpikeM can capture tail part

C. Faloutsos (CMU) C-BIG'12

Page 101: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

“What-if” forecasting�

101

e.g., given (1) first spike, (2) release date of two sequel movies (3) access volume before the release date

?� ?�

(1) First spike�

(2) Release date�

(3) Two weeks before release�

C. Faloutsos (CMU) C-BIG'12

Page 102: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

“What-if” forecasting�

102

– SpikeM can forecast not only tail-part, but also rise-part!

•  SpikeM can forecast shape of upcoming spikes

(1) First spike�

(2) Release date�

(3) Two weeks before release�

C. Faloutsos (CMU) C-BIG'12

Page 103: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 103

Roadmap

•  Introduction – Motivation •  Problem#1: Patterns in graphs •  Problem#2: Tools

– Belief propagation – Spike analysis – Tensors

•  Problem#3: Scalability -PEGASUS •  Conclusions

C-BIG'12

Page 104: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C-BIG'12 C. Faloutsos (CMU) 104

Scalability •  Google: > 450,000 processors in clusters of ~2000

processors each [Barroso, Dean, Hölzle, “Web Search for a Planet: The Google Cluster Architecture” IEEE Micro 2003]

•  Yahoo: 5Pb of data [Fayyad, KDD’07] •  Problem: machine failures, on a daily basis •  How to parallelize data mining tasks, then? •  A: map/reduce – hadoop (open-source clone)

http://hadoop.apache.org/

Page 105: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 105

Centralized Hadoop/PEGASUS

Degree Distr. old old

Pagerank old old

Diameter/ANF old HERE

Conn. Comp old HERE

Triangles done HERE

Visualization started

Roadmap – Algorithms & results

C-BIG'12

Page 106: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

HADI for diameter estimation •  Radius Plots for Mining Tera-byte Scale

Graphs U Kang, Charalampos Tsourakakis, Ana Paula Appel, Christos Faloutsos, Jure Leskovec, SDM’10

•  Naively: diameter needs O(N**2) space and up to O(N**3) time – prohibitive (N~1B)

•  Our HADI: linear on E (~10B) – Near-linear scalability wrt # machines – Several optimizations -> 5x faster

C. Faloutsos (CMU) 106 C-BIG'12

Page 107: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

????

19+ [Barabasi+]

107 C. Faloutsos (CMU)

Radius

Count

C-BIG'12

~1999, ~1M nodes

Page 108: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges) •  Largest publicly available graph ever studied.

????

19+ [Barabasi+]

108 C. Faloutsos (CMU)

Radius

Count

C-BIG'12

??

~1999, ~1M nodes

Page 109: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges) •  Largest publicly available graph ever studied.

????

19+? [Barabasi+]

109 C. Faloutsos (CMU)

Radius

Count

C-BIG'12

14 (dir.) ~7 (undir.)

Page 110: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges) • 7 degrees of separation (!) • Diameter: shrunk

????

19+? [Barabasi+]

110 C. Faloutsos (CMU)

Radius

Count

C-BIG'12

14 (dir.) ~7 (undir.)

Page 111: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges) Q: Shape?

????

111 C. Faloutsos (CMU)

Radius

Count

C-BIG'12

~7 (undir.)

Page 112: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

112 C. Faloutsos (CMU)

YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges) •  effective diameter: surprisingly small. •  Multi-modality (?!)

C-BIG'12

Page 113: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

Radius Plot of GCC of YahooWeb.

113 C. Faloutsos (CMU) C-BIG'12

Page 114: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

114 C. Faloutsos (CMU)

YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges) •  effective diameter: surprisingly small. •  Multi-modality: probably mixture of cores .

C-BIG'12

Page 115: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

115 C. Faloutsos (CMU)

YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges) •  effective diameter: surprisingly small. •  Multi-modality: probably mixture of cores .

C-BIG'12

EN

~7

Conjecture: DE

BR

Page 116: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

116 C. Faloutsos (CMU)

YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges) •  effective diameter: surprisingly small. •  Multi-modality: probably mixture of cores .

C-BIG'12

~7

Conjecture:

Page 117: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

Running time - Kronecker and Erdos-Renyi Graphs with billions edges.

details

Page 118: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 118

Centralized Hadoop/PEGASUS

Degree Distr. old old

Pagerank old old

Diameter/ANF old HERE

Conn. Comp old HERE

Triangles HERE

Visualization started

Roadmap – Algorithms & results

C-BIG'12

Page 119: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS Generalized Iterated Matrix

Vector Multiplication (GIMV)

C. Faloutsos (CMU) 119

PEGASUS: A Peta-Scale Graph Mining System - Implementation and Observations. U Kang, Charalampos E. Tsourakakis, and Christos Faloutsos. (ICDM) 2009, Miami, Florida, USA. Best Application Paper (runner-up).

C-BIG'12

Page 120: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS Generalized Iterated Matrix

Vector Multiplication (GIMV)

C. Faloutsos (CMU) 120

•  PageRank •  proximity (RWR) •  Diameter •  Connected components •  (eigenvectors, •  Belief Prop. •  … )

Matrix – vector Multiplication

(iterated)

C-BIG'12

details

Page 121: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

121

Example: GIM-V At Work •  Connected Components – 4 observations:

Size

Count

C. Faloutsos (CMU) C-BIG'12

Page 122: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

122

Example: GIM-V At Work •  Connected Components

Size

Count

C. Faloutsos (CMU) C-BIG'12

1) 10K x larger than next

Page 123: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

123

Example: GIM-V At Work •  Connected Components

Size

Count

C. Faloutsos (CMU) C-BIG'12

2) ~0.7B singleton nodes

Page 124: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

124

Example: GIM-V At Work •  Connected Components

Size

Count

C. Faloutsos (CMU) C-BIG'12

3) SLOPE!

Page 125: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

125

Example: GIM-V At Work •  Connected Components

Size

Count 300-size

cmpt X 500. Why? 1100-size cmpt

X 65. Why?

C. Faloutsos (CMU) C-BIG'12

4) Spikes!

Page 126: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

126

Example: GIM-V At Work •  Connected Components

Size

Count

suspicious financial-advice sites

(not existing now)

C. Faloutsos (CMU) C-BIG'12

Page 127: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

127

GIM-V At Work •  Connected Components over Time •  LinkedIn: 7.5M nodes and 58M edges

Stable tail slope after the gelling point

C. Faloutsos (CMU) C-BIG'12

Page 128: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 128

Roadmap

•  Introduction – Motivation •  Problem#1: Patterns in graphs •  Problem#2: Tools •  Problem#3: Scalability •  Conclusions

C-BIG'12

Page 129: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 129

OVERALL CONCLUSIONS – low level:

•  Several new patterns (fortification, triangle-laws, conn. components, etc)

•  New tools: –  belief propagation, gigaTensor, etc

•  Scalability: PEGASUS / hadoop

C-BIG'12

Page 130: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 130

OVERALL CONCLUSIONS – high level

•  BIG DATA: Large datasets reveal patterns/outliers that are invisible otherwise

C-BIG'12

Page 131: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 131

References •  Leman Akoglu, Christos Faloutsos: RTG: A Recursive

Realistic Graph Generator Using Random Typing. ECML/PKDD (1) 2009: 13-28

•  Deepayan Chakrabarti, Christos Faloutsos: Graph mining: Laws, generators, and algorithms. ACM Comput. Surv. 38(1): (2006)

C-BIG'12

Page 132: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 132

References •  Deepayan Chakrabarti, Yang Wang, Chenxi Wang,

Jure Leskovec, Christos Faloutsos: Epidemic thresholds in real networks. ACM Trans. Inf. Syst. Secur. 10(4): (2008)

•  Deepayan Chakrabarti, Jure Leskovec, Christos Faloutsos, Samuel Madden, Carlos Guestrin, Michalis Faloutsos: Information Survival Threshold in Sensor and P2P Networks. INFOCOM 2007: 1316-1324

C-BIG'12

Page 133: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 133

References •  Christos Faloutsos, Tamara G. Kolda, Jimeng Sun:

Mining large graphs and streams using matrix and tensor tools. Tutorial, SIGMOD Conference 2007: 1174

C-BIG'12

Page 134: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 134

References •  T. G. Kolda and J. Sun. Scalable Tensor

Decompositions for Multi-aspect Data Mining. In: ICDM 2008, pp. 363-372, December 2008.

C-BIG'12

Page 135: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 135

References •  Jure Leskovec, Jon Kleinberg and Christos Faloutsos

Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations, KDD 2005 (Best Research paper award).

•  Jure Leskovec, Deepayan Chakrabarti, Jon M. Kleinberg, Christos Faloutsos: Realistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication. PKDD 2005: 133-145

C-BIG'12

Page 136: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 136

References •  Jimeng Sun, Yinglian Xie, Hui Zhang, Christos

Faloutsos. Less is More: Compact Matrix Decomposition for Large Sparse Graphs, SDM, Minneapolis, Minnesota, Apr 2007.

•  Jimeng Sun, Spiros Papadimitriou, Philip S. Yu, and Christos Faloutsos, GraphScope: Parameter-free Mining of Large Time-evolving Graphs ACM SIGKDD Conference, San Jose, CA, August 2007

C-BIG'12

Page 137: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

References •  Jimeng Sun, Dacheng Tao, Christos

Faloutsos: Beyond streams and graphs: dynamic tensor analysis. KDD 2006: 374-383

C-BIG'12 C. Faloutsos (CMU) 137

Page 138: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 138

References •  Hanghang Tong, Christos Faloutsos, and

Jia-Yu Pan, Fast Random Walk with Restart and Its Applications, ICDM 2006, Hong Kong.

•  Hanghang Tong, Christos Faloutsos, Center-Piece Subgraphs: Problem Definition and Fast Solutions, KDD 2006, Philadelphia, PA

C-BIG'12

Page 139: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 139

References •  Hanghang Tong, Christos Faloutsos, Brian

Gallagher, Tina Eliassi-Rad: Fast best-effort pattern matching in large attributed graphs. KDD 2007: 737-746

C-BIG'12

Page 140: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 140

Project info

C-BIG'12

Thanks to: NSF IIS-0705359, IIS-0534205, CTA-INARC; Yahoo (M45), LLNL, IBM, SPRINT, Google, INTEL, HP, iLab

www.cs.cmu.edu/~pegasus

Page 141: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 141

Cast

Akoglu, Leman

Chau, Polo

Kang, U

McGlohon, Mary

Tong, Hanghang

Prakash, Aditya

C-BIG'12

Koutra, Danae

Beutel, Alex

Papalexakis, Vagelis

Page 142: Mining Billion-Node Graphs - Patterns and Algorithms Billion... · 2014. 4. 18. · CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU . ... CMU SCS

CMU SCS

C. Faloutsos (CMU) 142

OVERALL CONCLUSIONS – high level

•  BIG DATA: Large datasets reveal patterns/outliers that are invisible otherwise

C-BIG'12