16
(C) C. Faloutsos, 2017 1 CMU SCS Part 1: Graph Mining – patterns Christos Faloutsos CMU CMU SCS Our goal: Open source system for mining huge graphs: PEGASUS project (PEta GrAph mining System) www.cs.cmu.edu/~pegasus code and papers Tepper, CMU, April 4 (c) C. Faloutsos, 2017 2 CMU SCS References D. Chakrabarti, C. Faloutsos: Graph Mining – Laws, Tools and Case Studies, Morgan Claypool 2012 • http://www.morganclaypool.com/doi/abs/10.2200/ S00449ED1V01Y201209DMK006 Tepper, CMU, April 4 (c) C. Faloutsos, 2017 3 CMU SCS Outline Introduction – Motivation Part#1: Patterns in graphs Part#2: Tools (Ranking, proximity) • Conclusions Tepper, CMU, April 4 (c) C. Faloutsos, 2017 4

Part 1: Graph Mining – patternschristos/TALKS/17-CMU-Tepper-Alan/faloutsos-P1-patterns.pdf(C) C. Faloutsos, 2017 4 CMU SCS Graph mining • Are real graphs random? Tepper, CMU, April

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Part 1: Graph Mining – patternschristos/TALKS/17-CMU-Tepper-Alan/faloutsos-P1-patterns.pdf(C) C. Faloutsos, 2017 4 CMU SCS Graph mining • Are real graphs random? Tepper, CMU, April

(C) C. Faloutsos, 2017

1

CMU SCS

Part 1: Graph Mining – patterns

Christos Faloutsos CMU

CMU SCS

Our goal:

Open source system for mining huge graphs: PEGASUS project (PEta GrAph mining

System) •  www.cs.cmu.edu/~pegasus •  code and papers Tepper, CMU, April 4 (c) C. Faloutsos, 2017 2

CMU SCS

References •  D. Chakrabarti, C. Faloutsos: Graph Mining – Laws,

Tools and Case Studies, Morgan Claypool 2012 •  http://www.morganclaypool.com/doi/abs/10.2200/

S00449ED1V01Y201209DMK006

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 3

CMU SCS

Outline

•  Introduction – Motivation •  Part#1: Patterns in graphs •  Part#2: Tools (Ranking, proximity) •  Conclusions

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 4

Page 2: Part 1: Graph Mining – patternschristos/TALKS/17-CMU-Tepper-Alan/faloutsos-P1-patterns.pdf(C) C. Faloutsos, 2017 4 CMU SCS Graph mining • Are real graphs random? Tepper, CMU, April

(C) C. Faloutsos, 2017

2

CMU SCS

Graphs - why should we care?

Internet Map [lumeta.com]

Food Web [Martinez ’91]

Protein Interactions [genomebiology.com]

Friendship Network [Moody ’01]

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 5

CMU SCS

Graphs - why should we care? •  IR: bi-partite graphs (doc-terms)

•  web: hyper-text graph

•  ... and more:

D1

DN

T1

TM

... ...

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 6

CMU SCS

Graphs - why should we care? •  network of companies & board-of-directors

members •  ‘viral’ marketing •  web-log (‘blog’) news propagation •  computer network security: email/IP traffic

and anomaly detection •  ....

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 7

CMU SCS

Outline

•  Introduction – Motivation •  Patterns in graphs

– Patterns in Static graphs – Patterns in Weighted graphs – Patterns in Time evolving graphs

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 8

Page 3: Part 1: Graph Mining – patternschristos/TALKS/17-CMU-Tepper-Alan/faloutsos-P1-patterns.pdf(C) C. Faloutsos, 2017 4 CMU SCS Graph mining • Are real graphs random? Tepper, CMU, April

(C) C. Faloutsos, 2017

3

CMU SCS

Network and graph mining

•  How does the Internet look like? •  How does FaceBook look like?

•  What is ‘normal’/‘abnormal’? •  which patterns/laws hold?

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 9

CMU SCS

Network and graph mining

•  How does the Internet look like? •  How does FaceBook look like?

•  What is ‘normal’/‘abnormal’? •  which patterns/laws hold?

–  To spot anomalies (rarities), we have to discover patterns

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 10

CMU SCS

Network and graph mining

•  How does the Internet look like? •  How does FaceBook look like?

•  What is ‘normal’/‘abnormal’? •  which patterns/laws hold?

–  To spot anomalies (rarities), we have to discover patterns

–  Large datasets reveal patterns/anomalies that may be invisible otherwise…

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 11

CMU SCS

Topology

How does the Internet look like? Any rules?

(Looks random – right?)

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 12

Page 4: Part 1: Graph Mining – patternschristos/TALKS/17-CMU-Tepper-Alan/faloutsos-P1-patterns.pdf(C) C. Faloutsos, 2017 4 CMU SCS Graph mining • Are real graphs random? Tepper, CMU, April

(C) C. Faloutsos, 2017

4

CMU SCS

Graph mining •  Are real graphs random?

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 13

CMU SCS

Laws and patterns •  Are real graphs random? •  A: NO!!

– Diameter –  in- and out- degree distributions –  other (surprising) patterns

•  So, let’s look at the data

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 14

CMU SCS

Laws – degree distributions •  Q: avg degree is ~2 - what is the most

probable degree?

degree

count ??

2

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 15

CMU SCS

Laws – degree distributions •  Q: avg degree is ~2 - what is the most

probable degree?

degree degree

count ?? count

2 2

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 16

Page 5: Part 1: Graph Mining – patternschristos/TALKS/17-CMU-Tepper-Alan/faloutsos-P1-patterns.pdf(C) C. Faloutsos, 2017 4 CMU SCS Graph mining • Are real graphs random? Tepper, CMU, April

(C) C. Faloutsos, 2017

5

CMU SCS

Solution S1 .Power-law: outdegree O

The plot is linear in log-log scale [FFF’99]

freq = degree (-2.15)

O = -2.15 Exponent = slope

Outdegree

Frequency

Nov’97

-2.15

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 17

CMU SCS

Solution# S.1’

•  Power law in the degree distribution [SIGCOMM99]

log(rank)

log(degree)

-0.82

internet domains

att.com

ibm.com

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 18

CMU SCS

Solution# S.2: Eigen Exponent E

•  A2: power law in the eigenvalues of the adjacency matrix

E = -0.48

Exponent = slope

Eigenvalue

Rank of decreasing eigenvalue

May 2001

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 19

CMU SCS

Solution# S.2: Eigen Exponent E

•  [Mihail, Papadimitriou ’02]: slope is ½ of rank exponent

E = -0.48

Exponent = slope

Eigenvalue

Rank of decreasing eigenvalue

May 2001

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 20

Page 6: Part 1: Graph Mining – patternschristos/TALKS/17-CMU-Tepper-Alan/faloutsos-P1-patterns.pdf(C) C. Faloutsos, 2017 4 CMU SCS Graph mining • Are real graphs random? Tepper, CMU, April

(C) C. Faloutsos, 2017

6

CMU SCS

But: How about graphs from other domains?

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 21

CMU SCS

More power laws: •  web hit counts [w/ A. Montgomery]

Web Site Traffic

in-degree (log scale)

Count (log scale)

Zipf

users sites

``ebay’’

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 22

CMU SCS

epinions.com •  who-trusts-whom

[Richardson + Domingos, KDD 2001]

(out) degree

count

trusts-2000-people user

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 23

CMU SCS

And numerous more •  # of sexual contacts •  Income [Pareto] –’80-20 distribution’ •  Duration of downloads [Bestavros+] •  Duration of UNIX jobs (‘mice and

elephants’) •  Size of files of a user •  … •  ‘Black swans’ Tepper, CMU, April 4 (c) C. Faloutsos, 2017 24

Page 7: Part 1: Graph Mining – patternschristos/TALKS/17-CMU-Tepper-Alan/faloutsos-P1-patterns.pdf(C) C. Faloutsos, 2017 4 CMU SCS Graph mining • Are real graphs random? Tepper, CMU, April

(C) C. Faloutsos, 2017

7

CMU SCS

Outline

•  Introduction – Motivation •  Patterns in graphs

– Patterns in Static graphs •  Degree •  Triangles •  …

– Patterns in Weighted graphs – Patterns in Time evolving graphs

•  Generators Tepper, CMU, April 4 (c) C. Faloutsos, 2017 25

CMU SCS

Solution# S.3: Triangle ‘Laws’

•  Real social networks have a lot of triangles

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 26

CMU SCS

Solution# S.3: Triangle ‘Laws’

•  Real social networks have a lot of triangles –  Friends of friends are friends

•  Any patterns?

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 27

CMU SCS

Triangle Law: #S.3 [Tsourakakis ICDM 2008]

ASN HEP-TH

Epinions X-axis: # of Triangles a node participates in Y-axis: count of such nodes

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 28

Page 8: Part 1: Graph Mining – patternschristos/TALKS/17-CMU-Tepper-Alan/faloutsos-P1-patterns.pdf(C) C. Faloutsos, 2017 4 CMU SCS Graph mining • Are real graphs random? Tepper, CMU, April

(C) C. Faloutsos, 2017

8

CMU SCS

Triangle Law: #S.3 [Tsourakakis ICDM 2008]

ASN HEP-TH

Epinions X-axis: # of Triangles a node participates in Y-axis: count of such nodes

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 29

CMU SCS

Triangle Law: #S.4 [Tsourakakis ICDM 2008]

SN Reuters

Epinions X-axis: degree Y-axis: mean # triangles n friends -> ~n1.6 triangles

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 30

CMU SCS

Outline

•  Introduction – Motivation •  Patterns in graphs

– Patterns in Static graphs – Patterns in Weighted graphs – Patterns in Time evolving graphs

•  Generators

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 31

CMU SCS Observations on weighted

graphs? •  A: yes - even more ‘laws’!

M. McGlohon, L. Akoglu, and C. Faloutsos Weighted Graphs and Disconnected Components: Patterns and a Generator. SIG-KDD 2008

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 32

Page 9: Part 1: Graph Mining – patternschristos/TALKS/17-CMU-Tepper-Alan/faloutsos-P1-patterns.pdf(C) C. Faloutsos, 2017 4 CMU SCS Graph mining • Are real graphs random? Tepper, CMU, April

(C) C. Faloutsos, 2017

9

CMU SCS

Observation W.1: Fortification Q: How do the weights of nodes relate to degree?

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 33

CMU SCS

Observation W.1: Fortification

More donors, more $ ?

$10

$5

‘Reagan’

‘Clinton’ $7

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 34

CMU SCS

Edges (# donors)

In-weights ($)

Observation W.1: fortification: Snapshot Power Law

•  Weight: super-linear on in-degree •  exponent ‘iw’: 1.01 < iw < 1.26

Orgs-Candidates

e.g. John Kerry, $10M received, from 1K donors

More donors, even more $

$10

$5

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 35

CMU SCS

Outline

•  Introduction – Motivation •  Patterns in graphs

– Patterns in Static graphs – Patterns in Weighted graphs – Patterns in Time evolving graphs

•  Generators

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 36

Page 10: Part 1: Graph Mining – patternschristos/TALKS/17-CMU-Tepper-Alan/faloutsos-P1-patterns.pdf(C) C. Faloutsos, 2017 4 CMU SCS Graph mining • Are real graphs random? Tepper, CMU, April

(C) C. Faloutsos, 2017

10

CMU SCS

Problem: Time evolution •  with Jure Leskovec (CMU ->

Stanford)

•  and Jon Kleinberg (Cornell – sabb. @ CMU)

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 37

CMU SCS

T.1 Evolution of the Diameter •  Prior work on Power Law graphs hints

at slowly growing diameter: –  diameter ~ O(log N) –  diameter ~ O(log log N)

•  What is happening in real data?

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 38

CMU SCS

T.1 Evolution of the Diameter •  Prior work on Power Law graphs hints

at slowly growing diameter: –  diameter ~ O(log N) –  diameter ~ O(log log N)

•  What is happening in real data? •  Diameter shrinks over time

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 39

CMU SCS

T.1 Diameter – “Patents”

•  Patent citation network

•  25 years of data •  @1999

–  2.9 M nodes –  16.5 M edges

time [years]

diameter

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 40

Page 11: Part 1: Graph Mining – patternschristos/TALKS/17-CMU-Tepper-Alan/faloutsos-P1-patterns.pdf(C) C. Faloutsos, 2017 4 CMU SCS Graph mining • Are real graphs random? Tepper, CMU, April

(C) C. Faloutsos, 2017

11

CMU SCS T.2 Temporal Evolution of the

Graphs

•  N(t) … nodes at time t •  E(t) … edges at time t •  Suppose that

N(t+1) = 2 * N(t) •  Q: what is your guess for

E(t+1) =? 2 * E(t)

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 41

CMU SCS T.2 Temporal Evolution of the

Graphs

•  N(t) … nodes at time t •  E(t) … edges at time t •  Suppose that

N(t+1) = 2 * N(t) •  Q: what is your guess for

E(t+1) =? 2 * E(t)

•  A: over-doubled! – But obeying the ``Densification Power Law’’

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 42

CMU SCS T.2 Densification – Patent

Citations •  Citations among

patents granted •  @1999

–  2.9 M nodes –  16.5 M edges

•  Each year is a datapoint

N(t)

E(t)

1.66

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 43

CMU SCS

Outline

•  Introduction – Motivation •  Patterns in graphs

– Patterns in Static graphs – Patterns in Weighted graphs – Patterns in Time evolving graphs

•  Generators

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 44

Page 12: Part 1: Graph Mining – patternschristos/TALKS/17-CMU-Tepper-Alan/faloutsos-P1-patterns.pdf(C) C. Faloutsos, 2017 4 CMU SCS Graph mining • Are real graphs random? Tepper, CMU, April

(C) C. Faloutsos, 2017

12

CMU SCS

More on Time-evolving graphs

M. McGlohon, L. Akoglu, and C. Faloutsos Weighted Graphs and Disconnected Components: Patterns and a Generator. SIG-KDD 2008

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 45

CMU SCS

Observation T.3: NLCC behavior Q: How do NLCC’s emerge and join with

the GCC? (``NLCC’’ = non-largest conn. components) – Do they continue to grow in size? –  or do they shrink? –  or stabilize?

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 46

CMU SCS

Observation T.3: NLCC behavior •  After the gelling point, the GCC takes off, but

NLCC’s remain ~constant (actually, oscillate).

IMDB

CC size

Time-stamp Tepper, CMU, April 4 (c) C. Faloutsos, 2017 47

CMU SCS Generalized Iterated Matrix

Vector Multiplication (GIMV)

PEGASUS: A Peta-Scale Graph Mining System - Implementation and Observations. U Kang, Charalampos E. Tsourakakis, and Christos Faloutsos. (ICDM) 2009, Miami, Florida, USA. Best Application Paper (runner-up).

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 48

Page 13: Part 1: Graph Mining – patternschristos/TALKS/17-CMU-Tepper-Alan/faloutsos-P1-patterns.pdf(C) C. Faloutsos, 2017 4 CMU SCS Graph mining • Are real graphs random? Tepper, CMU, April

(C) C. Faloutsos, 2017

13

CMU SCS

Example: GIM-V At Work •  Connected Components

Size

Count

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 49

CMU SCS

Example: GIM-V At Work •  Connected Components

Size

Count

~0.7B singleton nodes

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 50

CMU SCS

Example: GIM-V At Work •  Connected Components

Size

Count

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 51

CMU SCS

Example: GIM-V At Work •  Connected Components

Size

Count 300-size

cmpt X 500. Why? 1100-size cmpt

X 65. Why?

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 52

Page 14: Part 1: Graph Mining – patternschristos/TALKS/17-CMU-Tepper-Alan/faloutsos-P1-patterns.pdf(C) C. Faloutsos, 2017 4 CMU SCS Graph mining • Are real graphs random? Tepper, CMU, April

(C) C. Faloutsos, 2017

14

CMU SCS

Example: GIM-V At Work •  Connected Components

Size

Count

suspicious financial-advice sites

(not existing now)

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 53

CMU SCS

GIM-V At Work •  Connected Components over Time •  LinkedIn: 7.5M nodes and 58M edges

Stable tail slope after the gelling point

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 54

CMU SCS

Timing for Blogs

•  with Mary McGlohon (CMU) •  Jure Leskovec (CMU->Stanford) •  Natalie Glance (now at Google) •  Mat Hurst (now at MSR) [SDM’07]

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 55

CMU SCS

T.4 : popularity over time

Post popularity drops-off – exponentially?

lag: days after post

# in links

1 2 3

@t

@t + lag

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 56

Page 15: Part 1: Graph Mining – patternschristos/TALKS/17-CMU-Tepper-Alan/faloutsos-P1-patterns.pdf(C) C. Faloutsos, 2017 4 CMU SCS Graph mining • Are real graphs random? Tepper, CMU, April

(C) C. Faloutsos, 2017

15

CMU SCS

T.4 : popularity over time

Post popularity drops-off – exponentially? POWER LAW! Exponent?

# in links (log)

1 2 3 days after post (log)

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 57

CMU SCS

T.4 : popularity over time

Post popularity drops-off – exponentially? POWER LAW! Exponent? -1.6 •  close to -1.5: Barabasi’s stack model •  and like the zero-crossings of a random walk

# in links (log)

1 2 3

-1.6

days after post (log)

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 58

CMU SCS

Conclusions (part1)

MANY patterns in real graphs •  Skewed degree distributions •  Small (and shrinking) diameter •  Power-laws wrt triangles •  Oscillating size of connected components •  … and more

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 59

CMU SCS

References •  D. Chakrabarti, C. Faloutsos: Graph Mining – Laws,

Tools and Case Studies, Morgan Claypool 2012 •  http://www.morganclaypool.com/doi/abs/10.2200/

S00449ED1V01Y201209DMK006

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 60

Page 16: Part 1: Graph Mining – patternschristos/TALKS/17-CMU-Tepper-Alan/faloutsos-P1-patterns.pdf(C) C. Faloutsos, 2017 4 CMU SCS Graph mining • Are real graphs random? Tepper, CMU, April

(C) C. Faloutsos, 2017

16

CMU SCS

References •  Jure Leskovec, Jon Kleinberg and Christos Faloutsos

Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations, KDD 2005 (Best Research paper award).

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 61

CMU SCS

Project info www.cs.cmu.edu/~pegasus

Akoglu, Leman

Chau, Polo

Kang, U

McGlohon, Mary

Tsourakakis, Babis

Tong, Hanghang

Prakash, Aditya

Thanks to: NSF IIS-0705359, IIS-0534205, CTA-INARC; Yahoo (M45), LLNL, IBM, SPRINT, INTEL, HP Tepper, CMU, April 4 (c) C. Faloutsos, 2017 62

CMU SCS

Part1 END

Tepper, CMU, April 4 (c) C. Faloutsos, 2017 63