RTG: A Recursive Realistic Graph Generator using Random Typing
Leman Akoglu and Christos Faloutsos, Carnegie Mellon University
Outline
• Motivation
• Problem Definition
• Related Work
• A Little History
• Proposed Model
• Experimental Results
• Conclusion
04/20/23 Akoglu, Faloutsos ECML PKDD 2009 2
Motivation - 1
Complex graphs (WWW, computer, biological, and social networks, etc.) exhibit many common properties:
- power laws
- small and shrinking diameter
- community structure
- …
How can we produce synthetic but realistic graphs?
http://www.aharef.info/static/htmlgraph/
Motivation - 2
Why do we need synthetic graphs?
• Simulation
• Sampling/Extrapolation
• Summarization/Compression
• Understanding the processes that generate the observed patterns
Problem Definition
Discover a graph generator that is:
G1. simple: the more intuitive, the better!
G2. realistic: outputs graphs that obey all “laws”
G3. parsimonious: requires few parameters
G4. flexible: able to produce the cross-product of un/weighted, un/directed, uni/bipartite graphs
G5. fast: generation should take time linear in the size of the output graph
Related Work
1. Graph Properties: what we want to match
2. Graph Generators: what has been proposed earlier
Related Work 2: Graph Generators
• Erdős-Rényi (ER) model [Erdős, Rényi ’60]
• Small-world model [Watts, Strogatz ’98]
• Preferential Attachment [Barabási, Albert ’99]
• Winners don’t take all [Pennock et al. ’02]
• Forest Fire model [Leskovec, Faloutsos ’05]
• Butterfly model [McGlohon et al. ’08]
Limitations of these models:
• Each models only some static graph properties.
• They neglect dynamic properties.
• They cannot produce weighted graphs.
Related Work 2: Graph Generators
• Random dot-product graphs [Kraetzl, Nickel ’05] [Young, Scheinerman ’07]: produce only undirected graphs; cannot produce weighted graphs; require quadratic time.
• Utility-based models [Fabrikant et al. ’02] [Even-Dar et al. ’07] [Laoutaris ’08]: hard to analyze.
• Kronecker graphs [Leskovec et al. ’07] [Akoglu et al. ’08]: multinomial/lognormal degree distributions; fixed number of nodes.
A Little History - 1
[Zipf, 1932] In many natural languages, the rank $r$ and the frequency $f_r$ of words follow a power law: $f_r \propto 1/r$.
[Figure: word count vs. rank]
A Little History - 2
[Mandelbrot, 1953] “Humans optimize avg. information per unit transmission cost.”
A Little History - 2
[Miller, 1957] “A monkey types randomly on a keyboard: the distribution of words follows a power law.”
[Figure: keyboard with k equiprobable keys (a, b, …, λ, $, +) and a Space key]
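Miller's random-typing process is easy to simulate. The following is a minimal Python sketch, not code from the authors: it assumes k equiprobable letter keys plus a space key hit with probability q, and the function name `random_typing_words` and the default values are illustrative.

```python
import random
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"

def random_typing_words(num_words, k=5, q=0.2, seed=0):
    """Miller-style random typing.

    Each keystroke hits the space key with probability q; otherwise it types
    one of k equiprobable letters (probability (1 - q) / k each). A word ends
    at the first space, so a space typed first yields the empty word, which
    this sketch keeps for simplicity.
    """
    if not 1 <= k <= len(LETTERS):
        raise ValueError("k must be between 1 and 26")
    rng = random.Random(seed)
    letters = LETTERS[:k]
    words = []
    for _ in range(num_words):
        chars = []
        while rng.random() >= q:  # not the space key: type a letter
            chars.append(rng.choice(letters))
        words.append("".join(chars))
    return words

# Rank-frequency table: words of equal length are equally likely, so the
# log-log rank plot is a staircase that follows a power law overall.
counts = Counter(random_typing_words(100_000))
for rank, (word, freq) in enumerate(counts.most_common(10), start=1):
    print(rank, repr(word), freq)
```

Sorting the word counts by frequency gives the rank-frequency data that, in Miller's analysis, follows a power law on log-log axes.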
A Little History - 2
[Conrad and Mitzenmacher, 2004] “Same relation still holds when keys have unequal probabilities.”
[Figure: keyboard whose keys (a, b, …, λ, $, +, Space) have unequal probabilities]
Preliminary Model 1
RTG-IE: RTG with Independent Equiprobable keys
[Figure: keyboard with k equiprobable keys and a Space key]
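RTG-IE applies the same typing process to graphs. Below is a minimal Python sketch, under the assumption (not spelled out on this slide) that labels are typed one after another and consecutive labels are paired as source and destination, so W pairs give W multi-edges. The name `rtg_ie`, the letter alphabet and the defaults k = 5, q = 0.4 are illustrative choices.

```python
import random
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"

def rtg_ie(W, k=5, q=0.4, seed=0):
    """RTG-IE sketch: k equiprobable letter keys plus a space key.

    Assumption: labels are typed one after another and consecutive labels are
    paired as (source, destination); W pairs give W multi-edges. Returns a
    Counter mapping (source, destination) -> multiplicity (the edge weight).
    """
    rng = random.Random(seed)
    letters = LETTERS[:k]

    def label():
        chars = []
        while rng.random() >= q:  # letter keys share probability 1 - q
            chars.append(rng.choice(letters))
        return "".join(chars)  # may be empty; kept as a node for simplicity

    edges = Counter()
    for _ in range(W):
        src = label()
        dst = label()
        edges[src, dst] += 1
    return edges

edges = rtg_ie(W=10_000)
nodes = {u for u, _ in edges} | {v for _, v in edges}
print("N =", len(nodes), "E =", len(edges), "W =", sum(edges.values()))
```

Rerunning with growing W and recording the number of distinct nodes N and distinct edges E gives the (W, N) and (W, E) points that the lemmas describe.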
Lemma 1. W is super-linear in N (power law).
Lemma 2. W is super-linear in E (power law).
Lemma 3. The in- (out-) weight $W_n$ of node n is super-linear in its in- (out-) degree $d_n$ (power law).
Please find the proofs in the paper.
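As a back-of-the-envelope illustration of Lemma 1 (a heuristic, not the proof from the paper), assume each letter key has probability p = (1 - q)/k and that the 2W labels are typed independently. A label of length ℓ shows up in the output roughly when its expected count is at least one:

```latex
\begin{aligned}
\Pr[\text{a given label of length } \ell] &= p^{\ell} q,
  \qquad p = \tfrac{1-q}{k} < \tfrac{1}{k} \\
2W p^{\ell} q \gtrsim 1 \;&\Longleftrightarrow\; \ell \le L,
  \qquad p^{L} \approx \tfrac{1}{2Wq} \\
N \approx k^{L} = (2Wq)^{\log k / \log(1/p)}
  \;&\Longrightarrow\; W \propto N^{\log(1/p) / \log k}
\end{aligned}
```

Since p < 1/k, log(1/p) > log k, so the exponent exceeds 1 and W grows super-linearly in N.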
These results correspond to three of the laws checked in the experiments:
• L05. Densification PL
• L10. Snapshot PL
• L11. Weight PL
Advantages of the Preliminary Model 1
• G1 - Intuitive
• G1 - Easy to implement
• G2 - Realistic: provably follows several rules
• G3 - Handful of parameters: k, q, W
• G5 - Fast: only generates a random sequence of characters
Problems of the Preliminary Model 1
1- Multinomial degree distributions
[Figure: count vs. rank and count vs. in-degree for RTG-IE]
Problems of the Preliminary Model 1
2- No homophily, no community structure: node i connects to any node j with probability proportional to $d_i d_j$, independently, rather than preferentially connecting to ‘similar’ nodes.
Preliminary Model 2
RTG-IU: RTG with Independent Un-equiprobable keys
Solution to Problem 1: [Conrad and Mitzenmacher, 2004]
[Figure: count vs. rank and count vs. in-degree with equiprobable keys and with un-equiprobable keys, each shown next to its keyboard (a, b, …, λ, $, +, Space)]
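RTG-IU changes only the key probabilities. A minimal Python sketch, assuming the relative key weights are given as a dict; `rtg_iu`, `in_degree_histogram` and the example weights are hypothetical names and values. The histogram helper counts distinct in-neighbors per node, so the in-degree distribution can be inspected directly.

```python
import random
from collections import Counter

def rtg_iu(W, key_probs, q=0.4, seed=0):
    """RTG-IU sketch: like RTG-IE, but the letter keys are un-equiprobable.

    key_probs maps each letter to a relative weight; the letters share the
    probability mass 1 - q, and the space key has probability q.
    Returns a Counter mapping (source, destination) -> edge weight.
    """
    rng = random.Random(seed)
    letters = list(key_probs)
    weights = [key_probs[c] for c in letters]

    def label():
        chars = []
        while rng.random() >= q:  # not the space key
            chars.append(rng.choices(letters, weights=weights)[0])
        return "".join(chars)

    edges = Counter()
    for _ in range(W):
        src = label()
        dst = label()
        edges[src, dst] += 1
    return edges

def in_degree_histogram(edges):
    """Map in-degree -> number of nodes with that in-degree (unweighted)."""
    in_deg = Counter(dst for _, dst in edges)
    return Counter(in_deg.values())

edges = rtg_iu(W=20_000, key_probs={"a": 0.5, "b": 0.3, "c": 0.2})
for d, n in sorted(in_degree_histogram(edges).items())[:10]:
    print(d, n)
```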
Proposed Model
RTG: Random Typing Graphs
Solution to Problem 2: “2D keyboard”
• Generate source-destination labels in one shot.
• Pick one of the nine keys randomly.
• Repeat recursively.
• Terminate each label when the space key is typed on its dimension.
Solution to Problem 2: “2D keyboard”
How do we choose the keys? Independent model does not yield community structure!
Independent 2D keyboard (rows: source key; columns: destination key):
          a        b        Space
a         pa*pa    pa*pb    pa*q
b         pb*pa    pb*pb    pb*q
Space     q*pa     q*pb     q*q
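The independent 2D keyboard is simply the outer product of the 1D key probabilities. A small Python sketch (the `keyboard_2d` name and the probability values are illustrative) that builds the nine-key grid for keys a, b and Space:

```python
def keyboard_2d(key_probs, q):
    """Independent 2D keyboard: cell (i, j) = P(source key i) * P(dest key j).

    key_probs: probabilities of the letter keys (assumed to sum to 1 - q);
    the last row and column stand for the space key, with probability q.
    """
    probs = list(key_probs) + [q]
    return [[pi * pj for pj in probs] for pi in probs]

# Two letter keys (a, b) plus space: the 3 x 3 = 9-key grid.
pa, pb, q = 0.35, 0.25, 0.4  # illustrative values
grid = keyboard_2d([pa, pb], q)
for row in grid:
    print(" ".join(f"{x:.4f}" for x in row))
```

Every row is proportional to the same vector, so the destination key carries no information about the source key; this is why the independent model yields no community structure.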
Solution to Problem 2: “2D keyboard”
• Boost the probability of the diagonal keys and decrease the probability of the off-diagonal ones (0 < β < 1: imbalance factor).
• Favoring the diagonal keys creates homophily.
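The slides do not spell out the exact β adjustment, so the following Python sketch assumes one simple scheme that matches the description: multiply every off-diagonal cell by β and move the removed mass onto the diagonal cell of the same row, which keeps each key's 1D probability unchanged. It also assumes that once one label has hit space, the other keeps typing on its 1D marginal. `rtg_matrix`, `rtg` and `first_key_agreement` are hypothetical names.

```python
import random
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"

def rtg_matrix(key_probs, q, beta):
    """2D keyboard with imbalance factor beta (assumed scheme).

    Start from the independent product p_i * p_j (last index = space key),
    multiply every off-diagonal cell by beta, and add the removed mass to the
    diagonal cell of the same row. Because the product matrix is symmetric,
    both row and column marginals stay equal to the 1D key probabilities.
    """
    probs = list(key_probs) + [q]
    n = len(probs)
    m = [[probs[i] * probs[j] for j in range(n)] for i in range(n)]
    for i in range(n):
        removed = 0.0
        for j in range(n):
            if j != i:
                removed += (1 - beta) * m[i][j]
                m[i][j] *= beta
        m[i][i] += removed
    return m

def rtg(W, key_probs, q, beta, seed=0):
    """Type W (source, destination) label pairs on the 2D keyboard.

    Assumption: once one label has hit space, the other keeps typing on its
    1D marginal distribution until it also hits space.
    """
    m = rtg_matrix(key_probs, q, beta)
    n = len(m)
    space = n - 1
    letters = LETTERS[:space]
    cells = [(i, j) for i in range(n) for j in range(n)]
    cell_w = [m[i][j] for i, j in cells]
    row_w = [sum(row) for row in m]                       # source-key marginal
    col_w = [sum(row[j] for row in m) for j in range(n)]  # destination-key marginal
    rng = random.Random(seed)
    edges = Counter()
    for _ in range(W):
        src, dst = [], []
        src_open = dst_open = True
        while src_open or dst_open:
            if src_open and dst_open:
                i, j = rng.choices(cells, weights=cell_w)[0]
            elif src_open:  # destination label already finished
                i, j = rng.choices(range(n), weights=row_w)[0], space
            else:           # source label already finished
                i, j = space, rng.choices(range(n), weights=col_w)[0]
            if src_open:
                if i == space:
                    src_open = False
                else:
                    src.append(letters[i])
            if dst_open:
                if j == space:
                    dst_open = False
                else:
                    dst.append(letters[j])
        edges["".join(src), "".join(dst)] += 1
    return edges

def first_key_agreement(edges):
    """Fraction of edge weight whose two labels start with the same key."""
    total = sum(edges.values())
    return sum(w for (u, v), w in edges.items() if u[:1] == v[:1]) / total

for beta in (1.0, 0.1):
    e = rtg(W=5_000, key_probs=[0.35, 0.25], q=0.4, beta=beta)
    print("beta =", beta, "agreement =", round(first_key_agreement(e), 3))
```

With β = 1 the matrix is the independent product, so the expected agreement is the sum of squared key probabilities (0.345 for these values); smaller β pushes it toward 1, which is the homophily effect described above.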
Proposed Model
Parameters
• k: number of keys
• q: probability of hitting the space key S
• W: number of multi-edges in the output graph G
• β: imbalance factor
Up to this point, we discussed directed, weighted and unipartite graphs.
Generalizations
• Undirected graphs: ignore edge directions; edge generation is symmetric.
• Unweighted graphs: ignore duplicate edges.
• Bipartite graphs: use different key sets for the source and the destination, so that source and destination labels are different.
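The generalizations amount to simple post-processing of a directed, weighted edge list. A Python sketch with hypothetical helper names, assuming the generator's output is a `Counter` mapping (source, destination) to weight:

```python
from collections import Counter

def to_undirected(edges):
    """Ignore directions: merge (u, v) and (v, u), summing their weights."""
    out = Counter()
    for (u, v), w in edges.items():
        out[min(u, v), max(u, v)] += w
    return out

def to_unweighted(edges):
    """Ignore duplicate edges: keep each distinct (u, v) pair once."""
    return set(edges)

def to_bipartite(edges):
    """Keep source and destination labels apart by tagging each side.

    Equivalent to typing with two disjoint key sets that have the same
    probabilities; the slide's version also allows different key sets.
    """
    out = Counter()
    for (u, v), w in edges.items():
        out[("src", u), ("dst", v)] += w
    return out

# Toy directed, weighted multigraph: edge -> weight.
edges = Counter({("a", "b"): 2, ("b", "a"): 1, ("a", "a"): 3})
print(to_undirected(edges))
print(to_unweighted(edges))
print(to_bipartite(edges))
```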
Experimental Results
How does RTG model real graphs?
• Blognet: a social network of blogs based on citations; undirected, unweighted and unipartite; N = 27,726; E = 126,227; over 80 time ticks.
• Com2Cand: the U.S. electoral campaign donations network from organizations to candidates; directed, weighted ($ amounts) and bipartite; N = 23,191; E = 877,721; W = 4,383,105,580; over 29 time ticks.
Experimental Results 1: Blognet vs. RTG
[Figure: count vs. degree; panels: Blognet, RTG]
L01. Power-law degree distribution [Faloutsos et al. ’99, Kleinberg et al. ’99, Chakrabarti et al. ’04, Newman ’04]
Experimental Results 1: Blognet vs. RTG
[Figure: count vs. #triangles; panels: Blognet, RTG]
L02. Triangle Power Law (TPL) [Tsourakakis ’08]
Experimental Results 1: Blognet vs. RTG
[Figure: eigenvalue λ vs. rank; panels: Blognet, RTG]
L03. Eigenvalue Power Law (EPL) [Siganos et al. ’03]
Experimental Results 1: Blognet vs. RTG
[Figure: #edges vs. #nodes; panels: Blognet, RTG]
L05. Densification Power Law (DPL) [Leskovec et al. ’05]
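Laws such as L05 are checked by fitting a line on log-log axes; for densification, E(t) ∝ N(t)^a with a > 1. The slides do not say how the exponents were fitted, so this Python sketch uses ordinary least squares on the logged points (`loglog_slope` and the snapshot values are hypothetical):

```python
import math

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x).

    For a power law y = c * x**a the points lie on a line of slope a.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    return sxy / sxx

# Hypothetical snapshots (N(t), E(t)) obeying E = N**1.3 exactly.
nodes = [100, 200, 400, 800, 1600]
edges = [n ** 1.3 for n in nodes]
print(round(loglog_slope(nodes, edges), 6))
```

The same helper applies to the other power laws in this section (for example total weight vs. #edges) by passing the corresponding pairs of measurements.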
Experimental Results 1: Blognet vs. RTG
[Figure: diameter vs. time; panels: Blognet, RTG]
L06. Small and shrinking diameter [Albert and Barabási ’99, Leskovec et al. ’05]
Experimental Results 1: Blognet vs. RTG
[Figure: size vs. time; panels: Blognet, RTG]
L07. Constant size 2nd and 3rd connected components [McGlohon et al. ’08]
Experimental Results 1: Blognet vs. RTG
[Figure: λ1 vs. #edges; panels: Blognet, RTG]
L08. Principal Eigenvalue Power Law (λ1PL) [Akoglu et al. ’08]
Experimental Results 1: Blognet vs. RTG
[Figure: entropy vs. resolution; panels: Blognet, RTG]
L09. Bursty/self-similar edge/weight additions [Gomez and Santonja ’98, Gribble et al. ’98, Crovella and Bestavros ’99, McGlohon et al. ’08]
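The entropy vs. resolution panels appear to be entropy plots: at resolution r the timeline is split into 2^r intervals and the entropy of the share of additions per interval is recorded. A minimal Python sketch, assuming the additions arrive as a per-tick list whose length is a multiple of 2^max_res (`entropy_plot` is a hypothetical name):

```python
import math

def entropy_plot(series, max_res):
    """Entropy plot for a time series of additions (e.g. new edges per tick).

    At resolution r the series is cut into 2**r equal intervals; H(r) is the
    entropy, in bits, of the fraction of all additions falling in each one.
    Uniform additions give H(r) = r (slope 1); bursty, self-similar additions
    give a straight line with a slope below 1.
    """
    if len(series) % (2 ** max_res):
        raise ValueError("len(series) must be a multiple of 2**max_res")
    total = sum(series)
    out = []
    for r in range(max_res + 1):
        bins = 2 ** r
        width = len(series) // bins
        h = 0.0
        for b in range(bins):
            p = sum(series[b * width:(b + 1) * width]) / total
            if p > 0:
                h -= p * math.log2(p)
        out.append(h)
    return out

print(entropy_plot([1] * 16, 4))          # uniform additions
print(entropy_plot([5, 1, 3, 1] * 4, 4))  # uneven additions
```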
Experimental Results 2: Com2Cand vs. RTG
[Figure: diameter vs. time and size vs. time; panels: Com2Cand, RTG]
Experimental Results 2: Com2Cand vs. RTG
[Figure: λ1 vs. #edges and eigenvalue λ vs. rank; panels: Com2Cand, RTG]
Experimental Results 2: Com2Cand vs. RTG
[Figure: count vs. in-degree and entropy vs. resolution; panels: Com2Cand, RTG]
Experimental Results 2: Com2Cand vs. RTG
[Figure: in-weight vs. in-degree; panels: Com2Cand (in-weight as $ amount, in-degree as #checks) and RTG]
L10. Snapshot Power Law (SPL) [McGlohon et al. ’08]
Experimental Results 2: Com2Cand vs. RTG
[Figure: total weight vs. #edges; panels: Com2Cand, RTG]
L11. Weight Power Law (WPL) [McGlohon et al. ’08]
Experimental Results: On “modularity” [Girvan and Newman ’02]
• RTG-IE: no significant modularity.
• RTG: “modularity” decreases with increasing β (smaller β gives more community structure).
[Figure: modularity vs. β, with an arrow labeled “more community structure”]
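Modularity scores how well a partition of the nodes separates them into communities. The slides do not say how communities were found, so this Python sketch only evaluates the standard modularity Q for a given partition of an undirected, unweighted graph (`modularity` is a hypothetical helper, not a library call):

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman-Girvan modularity of an undirected, unweighted simple graph.

    edges: list of (u, v) pairs, each undirected edge listed once;
    community: dict mapping every node to a community id.
    Q = sum over communities c of  L_c / m - (d_c / (2 m))**2,
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the total degree of the nodes in c.
    """
    m = len(edges)
    inside = defaultdict(int)
    degree = defaultdict(int)
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            inside[cu] += 1
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

# Two triangles joined by a single edge, split into their natural groups.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(modularity(edges, groups))  # 5/14, about 0.357
```

Putting every node in one community gives Q = 0, so values well above zero indicate community structure beyond what the degrees alone explain.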
Experimental Results: On complexity
• Computation time grows linearly with increasing W.
• 2M multi-edges are generated in 7 seconds.
[Figure: time (ms) vs. #multi-edges]
Conclusion 1
Our model is:
G1. simple and intuitive: a few lines of code
G2. realistic: generates graphs that obey all eleven properties of real graphs
G3. parsimonious: only a handful of parameters
G4. flexible: can generate weighted/unweighted, directed/undirected, unipartite/bipartite graphs and any combination of those
G5. fast: linear in the size of the output graph
Conclusion 2
We showed that RTG mimics real graphs well.
Contact
Leman Akoglu: www.cs.cmu.edu/~lakoglu, lakoglu@cs.cmu.edu
Christos Faloutsos: www.cs.cmu.edu/~christos, christos@cs.cmu.edu
A Little History - 3
The infinite monkey theorem: a monkey typing randomly on a keyboard for an infinite amount of time will almost surely type a given text, such as the complete works of William Shakespeare.
Proposed Model
Burstiness and Self-similarity
If each step is a time tick, weight additions are uniform! Instead:
• Start with a uniform interval.
• Recursively subdivide the weight additions between each half, quarter, and so on, according to the bias b > 0.5.
• A b-fraction of the additions happens in one “half” and the remaining fraction in the other.
[Figure: total weight vs. time]
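The recursive subdivision is a b-model-style cascade. A minimal Python sketch, assuming the half that receives the b-fraction is chosen at random at every split (the slide only says "one half"); `b_model` and the parameter values are illustrative:

```python
import random

def b_model(total, levels, b=0.7, rng=None):
    """Recursive b-model: split `total` weight additions between the two
    halves of an interval as b : (1 - b), then recurse on each half.

    Which half receives the b-fraction is picked at random here (an
    assumption). Returns 2**levels values, the (fractional) additions per
    time tick.
    """
    if rng is None:
        rng = random.Random(0)
    if levels == 0:
        return [total]
    big, small = b * total, (1 - b) * total
    left, right = (big, small) if rng.random() < 0.5 else (small, big)
    return b_model(left, levels - 1, b, rng) + b_model(right, levels - 1, b, rng)

ticks = b_model(total=1_000, levels=5, b=0.7)
print(len(ticks), round(sum(ticks), 6))
print([round(x, 1) for x in ticks])
```

With b = 0.5 every tick receives the same share, recovering the uniform case; larger b concentrates the additions into bursts at every time scale.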