RTG: A Recursive Realistic Graph Generator using Random Typing

Leman Akoglu and Christos Faloutsos
Carnegie Mellon University


Outline
• Motivation
• Problem Definition
• Related Work
• A Little History
• Proposed Model
• Experimental Results
• Conclusion

04/20/23 Akoglu, Faloutsos ECML PKDD 2009 2

Motivation - 1

Complex graphs (WWW, computer, biological, social networks, etc.) exhibit many common properties:
- power laws
- small and shrinking diameter
- community structure
- …

How can we produce synthetic but realistic graphs?


http://www.aharef.info/static/htmlgraph/

Motivation - 2

Why do we need synthetic graphs?
• Simulation
• Sampling/Extrapolation
• Summarization/Compression
• Motivation to understand pattern-generating processes

Problem Definition

Discover a graph generator that is:
G1. simple: the more intuitive the better!
G2. realistic: outputs graphs that obey all “laws”
G3. parsimonious: requires few parameters
G4. flexible: able to produce the cross-product of un/weighted, un/directed, uni/bipartite graphs
G5. fast: generation should take time linear in the size of the output graph


Related Work

1. Graph Properties: what we want to match
2. Graph Generators: what has been proposed earlier

Related Work 1: Graph Properties


Related Work 2: Graph Generators

• Erdős-Rényi (ER) model [Erdős, Rényi '60]
• Small-world model [Watts, Strogatz '98]
• Preferential Attachment [Barabási, Albert '99]
• Winners don’t take all [Pennock et al. '02]
• Forest Fire model [Leskovec, Faloutsos '05]
• Butterfly model [McGlohon et al. '08]

Limitations of these models:
• Model some static graph property
• Neglect dynamic properties
• Cannot produce weighted graphs

Related Work 2: Graph Generators

• Random dot-product graphs [Kraetzl, Nickel '05] [Young, Scheinerman '07]
  - Produce only undirected graphs
  - Cannot produce weighted graphs
  - Require quadratic time
• Utility-based models [Fabrikant et al. '02] [Even-Bar et al. '07] [Laoutaris '08]
  - Hard to analyze
• Kronecker graphs [Leskovec et al. '07] [Akoglu et al. '08]
  - Multinomial/lognormal distributions
  - Fixed number of nodes


A Little History - 1

[Zipf, 1932] In many natural languages, the rank $r$ and the frequency $f_r$ of words follow a power law:

$f_r \propto 1/r$

[Figure: count vs. rank]

A Little History - 2

[Mandelbrot, 1953] “Humans optimize avg. information per unit transmission cost.”

A Little History - 2

[Miller, 1957] “A monkey types randomly on a keyboard: the distribution of words follows a power law.”

[Figure: keyboard with k equiprobable keys (a, b, …) and a Space key]
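The random-typing experiment is easy to try out. The following is a minimal sketch, not the authors' code: the function name `monkey_words` is hypothetical, `k` and `q` follow the notation used later in the deck (number of keys, probability of hitting Space), and dropping empty words is an assumption made here for simplicity.

```python
import random
from collections import Counter

def monkey_words(k, q, n_words, seed=0):
    """Type n_words words on a keyboard with k equiprobable keys plus Space.

    Space is hit with probability q; the other k keys share 1 - q equally.
    Empty words (Space hit immediately) are dropped, which is an assumption.
    """
    rng = random.Random(seed)
    keys = [chr(ord("a") + i) for i in range(k)]  # assumes k <= 26
    counts = Counter()
    for _ in range(n_words):
        word = []
        while rng.random() >= q:          # keep typing until Space
            word.append(rng.choice(keys))
        if word:
            counts["".join(word)] += 1
    return counts

counts = monkey_words(k=4, q=0.3, n_words=100_000)
ranked = [c for _, c in counts.most_common()]   # frequencies by rank
for r in (1, 10, 100, 1000):
    if r <= len(ranked):
        print(r, ranked[r - 1])
```

Plotting `ranked` against rank on log-log axes should show the roughly linear (stepped) trend that the slide refers to.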

A Little History - 2

[Conrad and Mitzenmacher, 2004] “Same relation still holds when keys have unequal probabilities.”

[Figure: keyboard with unequally probable keys (a, b, …) and a Space key]


Preliminary Model 1
RTG-IE: RTG with Independent Equiprobable keys

[Figure: keyboard with a Space key]

Lemma 1. W is super-linear in N (power law).
Lemma 2. W is super-linear in E (power law).
Lemma 3. In(out)-weight W_n of node n is super-linear in in(out)-degree d_n (power law).

Please find the exact exponents and the proofs in the paper.

Graph properties covered by the lemmas:
L05. Densification PL
L10. Snapshot PL
L11. Weight PL

Advantages of the Preliminary Model 1

G1 - Intuitive
G1 - Easy to implement
G2 - Realistic: provably follows several rules
G3 - Handful of parameters: k, q, W
G5 - Fast: generating a random sequence of characters
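To illustrate why the preliminary model is easy to implement, here is a minimal sketch of RTG-IE under stated assumptions. It assumes that each multi-edge is formed by typing a source label and then a destination label independently; the function names `random_word` and `rtg_ie` are hypothetical, and keeping the empty label as an ordinary node is a simplification.

```python
import random
from collections import Counter

def random_word(keys, q, rng):
    """Type equiprobable keys until Space (probability q) is hit."""
    chars = []
    while rng.random() >= q:
        chars.append(rng.choice(keys))
    return "".join(chars)  # may be empty; kept as an ordinary label here

def rtg_ie(k, q, W, seed=0):
    """Return a Counter mapping (source, destination) -> multi-edge weight."""
    rng = random.Random(seed)
    keys = [chr(ord("a") + i) for i in range(k)]  # assumes k <= 26
    edges = Counter()
    for _ in range(W):
        src = random_word(keys, q, rng)   # source label
        dst = random_word(keys, q, rng)   # destination label, independent
        edges[(src, dst)] += 1
    return edges

edges = rtg_ie(k=5, q=0.2, W=10_000)
nodes = {label for edge in edges for label in edge}
print("N =", len(nodes), "E =", len(edges), "W =", sum(edges.values()))
```

The running time is dominated by the number of characters typed, which is why generation cost grows with W rather than with the square of the number of nodes.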

Problems of the Preliminary Model 1

1- Multinomial degree distributions

[Figure: count vs. rank and count vs. in-degree]

Problems of the Preliminary Model 1

2- No homophily, no community structure
Node i connects to any node j with prob. d_i*d_j independently, rather than connecting to ‘similar’ nodes.

Preliminary Model 2
RTG-IU: RTG with Independent Un-equiprobable keys

Solution to Problem 1: [Conrad and Mitzenmacher, 2004]

[Figure: count vs. rank and count vs. in-degree, shown for keyboards with equiprobable and un-equiprobable keys]

Proposed Model
RTG: Random Typing Graphs

Solution to Problem 2: “2D keyboard”
• Generate source-destination labels in one shot.
• Pick one of the nine keys randomly.
• Repeat recursively.
• Terminate each label when the space key is typed on each dimension (dark blue).

How do we choose the keys? Independent model does not yield community structure!

Key probabilities under the independent model (keys a, b and space S, with space probability q):

        a        b        S
a     pa*pa    pa*pb    pa*q
b     pb*pa    pb*pb    pb*q
S     q*pa     q*pb     q*q

• Boost probability of diagonal keys and decrease probability of off-diagonal ones (0<β<1: imbalance factor)
• Favoring of diagonal keys creates homophily.

Proposed Model

Parameters
• k: Number of keys
• q: Probability of hitting the space key S
• W: Number of multi-edges in output graph G
• β: imbalance factor
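The 2D-keyboard idea can be sketched as follows. This is an illustration under several assumptions that the slides do not spell out: keys are taken as equiprobable, off-diagonal cells are set to β times the independent product and each diagonal cell absorbs the remainder of its row (so row and column sums are unchanged), and once one label has hit Space, further characters on that dimension are ignored. The names `keyboard_2d` and `rtg` are hypothetical.

```python
import random
from collections import Counter

def keyboard_2d(p, beta):
    """Build the 2D key matrix from 1D probabilities p (last entry = Space).

    Assumption: off-diagonal cells are beta * p[i] * p[j]; each diagonal
    cell takes what is left of its row, so every row still sums to p[i].
    """
    n = len(p)
    P = [[p[i] * p[j] * beta if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    for i in range(n):
        P[i][i] = p[i] - sum(P[i])
    return P

def rtg(k, q, W, beta, seed=0):
    """Return a Counter mapping (source, destination) -> multi-edge weight."""
    rng = random.Random(seed)
    p = [(1.0 - q) / k] * k + [q]         # equiprobable keys, then Space
    P = keyboard_2d(p, beta)
    cells = [(i, j) for i in range(k + 1) for j in range(k + 1)]
    weights = [P[i][j] for i, j in cells]
    alphabet = [chr(ord("a") + i) for i in range(k)]  # assumes k <= 26
    edges = Counter()
    for _ in range(W):
        src, dst = [], []
        src_done = dst_done = False
        while not (src_done and dst_done):
            i, j = rng.choices(cells, weights=weights)[0]  # one 2D key
            if not src_done:
                if i == k:
                    src_done = True       # Space on the source dimension
                else:
                    src.append(alphabet[i])
            if not dst_done:
                if j == k:
                    dst_done = True       # Space on the destination dimension
                else:
                    dst.append(alphabet[j])
        edges[("".join(src), "".join(dst))] += 1
    return edges

edges = rtg(k=2, q=0.2, W=10_000, beta=0.3)
print(len(edges), "distinct edges,", sum(edges.values()), "multi-edges")
```

With a small β, most draws land on diagonal keys, so source and destination labels tend to share a prefix; that shared prefix is what produces the homophily described on the slide.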

Proposed Model

Up to this point, we discussed directed, weighted and unipartite graphs.

Generalizations
- Undirected graphs: Ignore edge directions; edge generation is symmetric.
- Unweighted graphs: Ignore duplicate edges.
- Bipartite graphs: Different key sets on source and destination; labels are different.
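As a small illustration of the first two generalizations, the sketch below post-processes a directed, weighted edge list. It assumes the edge list is stored as a `Counter` keyed by (source, destination) label pairs; that representation and the helper names `to_undirected` and `to_unweighted` are choices made for this example, not part of the slides.

```python
from collections import Counter

def to_undirected(edges):
    """Ignore edge directions: merge (u, v) and (v, u) into one weighted edge."""
    out = Counter()
    for (u, v), w in edges.items():
        out[tuple(sorted((u, v)))] += w
    return out

def to_unweighted(edges):
    """Ignore duplicate edges: keep each distinct (u, v) pair once."""
    return set(edges)

directed = Counter({("a", "b"): 2, ("b", "a"): 3, ("a", "a"): 1})
print(to_undirected(directed))
print(to_unweighted(directed))
```

For the bipartite case, one simple option is to draw source and destination labels from disjoint alphabets so that the two label sets can never coincide.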


Experimental Results
How does RTG model real graphs?

• Blognet: a social network of blogs based on citations
  undirected, unweighted and unipartite
  N = 27,726; E = 126,227; over 80 time ticks.
• Com2Cand: the U.S. electoral campaign donations network from organizations to candidates
  directed, weighted ($ amounts) and bipartite
  N = 23,191; E = 877,721; W = 4,383,105,580; over 29 time ticks.

Experimental Results: Blognet vs. RTG

L01. Power-law degree distribution [Faloutsos et al. '99, Kleinberg et al. '99, Chakrabarti et al. '04, Newman '04]

[Figure: count vs. degree, for Blognet and RTG]

Experimental Results: Blognet vs. RTG

L02. Triangle Power Law (TPL) [Tsourakakis '08]

[Figure: count vs. triangles, for Blognet and RTG]

Experimental Results 1: Blognet vs. RTG

L03. Eigenvalue Power Law (EPL) [Siganos et al. '03]

[Figure: λ_rank vs. rank, for Blognet and RTG]


Experimental Results 1: Blognet vs. RTG

L05. Densification Power Law (DPL) [Leskovec et al. '05]

[Figure: #edges vs. #nodes, for Blognet and RTG]

Experimental Results: Blognet vs. RTG

L06. Small and shrinking diameter [Albert and Barabási '99, Leskovec et al. '05]

[Figure: diameter vs. time, for Blognet and RTG]

Experimental Results: Blognet vs. RTG

L07. Constant size 2nd and 3rd connected components [McGlohon et al. '08]

[Figure: size vs. time, for Blognet and RTG]

Experimental Results 1: Blognet vs. RTG

L08. Principal Eigenvalue Power Law (λ1PL) [Akoglu et al. '08]

[Figure: λ1 vs. #edges, for Blognet and RTG]

Experimental Results 1: Blognet vs. RTG

L09. Bursty/self-similar edge/weight additions [Gomez and Santonja '98, Gribble et al. '98, Crovella and Bestavros '99, McGlohon et al. '08]

[Figure: entropy vs. resolution, for Blognet and RTG]


Experimental Results 2: Com2Cand vs. RTG

[Figures: diameter vs. time; size vs. time, for Com2Cand and RTG]

Experimental Results 2: Com2Cand vs. RTG

[Figures: λ1 vs. #edges; λ_rank vs. rank, for Com2Cand and RTG]

Experimental Results 2: Com2Cand vs. RTG

[Figures: count vs. in-degree; entropy vs. resolution, for Com2Cand and RTG]

Experimental Results 2: Com2Cand vs. RTG

L10. Snapshot Power Law (SPL) [McGlohon et al. '08]

[Figure: in-weight ($ amount) vs. in-degree (#checks), for Com2Cand and RTG]

Experimental Results 2: Com2Cand vs. RTG

L11. Weight Power Law (WPL) [McGlohon et al. '08]

[Figure: total weight vs. #edges, for Com2Cand and RTG]


Experimental Results
On “modularity” [Girvan and Newman '02]

• No significant modularity for RTG-IE
• “Modularity” decreases with increasing β

[Figure annotation: more community structure]


Experimental Results
On complexity

• Computation time grows linearly with increasing W
• 2M multi-edges in 7 sec.

[Figure: time (ms) vs. #multi-edges]


Conclusion 1

Our model is:
G1. simple and intuitive: few lines of code
G2. realistic: graphs that obey all eleven properties of real graphs
G3. parsimonious: only a handful of parameters
G4. flexible: can generate weighted/unweighted, directed/undirected, unipartite/bipartite graphs and any combination of those
G5. fast: linear in the size of the output graph

Conclusion 2

We showed that RTG mimics real graphs well.

Contact

Leman Akoglu
www.cs.cmu.edu/[email protected]

Christos Faloutsos
www.cs.cmu.edu/[email protected]

A Little History - 3

The infinite monkey theorem: A monkey typing randomly on a keyboard for an infinite amount of time will almost surely type a given text, such as the complete works of William Shakespeare.

Proposed Model

Burstiness and Self-similarity

If each step is a time tick, weight additions are uniform!

• Start with a uniform interval
• Recursively subdivide weight additions to each half, quarter, and so on, according to the bias b > 0.5
• A b-fraction of the additions happens in one “half” and the remainder in the other.

[Figure: total weight vs. time]
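The recursive subdivision can be sketched as below. This is a minimal illustration, assuming the half that receives the b-fraction is picked at random at every split (the slide only says "one half"); the function name `bursty_ticks` and the parameters `total` and `levels` are hypothetical.

```python
import random

def bursty_ticks(total, levels, b=0.7, seed=0):
    """Split `total` weight over 2**levels time ticks with bias b > 0.5.

    At every level each interval is halved; a b-fraction of its weight goes
    to one half (picked at random, an assumption) and the rest to the other.
    """
    rng = random.Random(seed)
    weights = [float(total)]
    for _ in range(levels):
        split = []
        for w in weights:
            left, right = b * w, (1.0 - b) * w
            if rng.random() < 0.5:        # randomly choose the heavy half
                left, right = right, left
            split.extend((left, right))
        weights = split
    return weights

ticks = bursty_ticks(total=1000, levels=4, b=0.8)
print([round(w, 1) for w in ticks])
```

With b = 0.5 every tick receives the same weight, which is the uniform case the slide warns about; larger b concentrates the additions in a few ticks.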
