38
ALGORITHMS AND SOFTWARE FOR THE ANALYSIS OF LARGE COMPLEX NETWORKS CHRISTIAN STAUDT Verteidigung der Dissertation Fakultät für Informatik des Karlsruher Instituts für Technologie (KIT)

ALGORITHMS AND SOFTWARE FOR THE ANALYSIS OF LARGE … › wp-content › uploads › 2016 › 07 › ... · OUTLINE & CONTRIBUTIONS 2 data analysis • inspection • transformation

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: ALGORITHMS AND SOFTWARE FOR THE ANALYSIS OF LARGE … › wp-content › uploads › 2016 › 07 › ... · OUTLINE & CONTRIBUTIONS 2 data analysis • inspection • transformation

ALGORITHMS AND SOFTWARE FOR THE ANALYSIS OF LARGE COMPLEX NETWORKS

CHRISTIAN STAUDT

Verteidigung der

Dissertation

Fakultät für Informatik des Karlsruher Instituts für Technologie (KIT)

Page 2: ALGORITHMS AND SOFTWARE FOR THE ANALYSIS OF LARGE … › wp-content › uploads › 2016 › 07 › ... · OUTLINE & CONTRIBUTIONS 2 data analysis • inspection • transformation

OUTLINE & CONTRIBUTIONS

2

data analysis• inspection

• transformation

• modeling

main algorithmic contributions• parallel heuristics for community detection

[Staudt, Meyerhenke ‘16, in TPDS] • network sparsification via edge centrality rating

[Hamann, Lindner, Meyerhenke, Staudt, Wagner ‘16, in SNAM]

• realistic synthetic networks via generative models [with Gutfraind, Safro, Hamann, Meyerhenke, unpublished]

software package• NetworKit

[Staudt, Sazonovs, Meyerhenke ‘16, in Network Science]

|{z

}

network science• analysis of relational

data known and novel

algorithms by numerous contributors

+

graph algorithmics

network science

software engineering

data analysis / data science

algorithmics

this thesis

Page 3: ALGORITHMS AND SOFTWARE FOR THE ANALYSIS OF LARGE … › wp-content › uploads › 2016 › 07 › ... · OUTLINE & CONTRIBUTIONS 2 data analysis • inspection • transformation

INTRODUCTIONPart I

Page 4: ALGORITHMS AND SOFTWARE FOR THE ANALYSIS OF LARGE … › wp-content › uploads › 2016 › 07 › ... · OUTLINE & CONTRIBUTIONS 2 data analysis • inspection • transformation

NETWORK SCIENCE

4

��

��

��

��

��

��

��

��

����

��

��

��

����

��

�� ��

��

��

��

��

��

��

��

��

��

��

�� ��

����

����

��

����

��

���

��

��

����

��

��

��

����

��

����

��

��

visualization: node-edge diagram

graph representationobservation of phenomenon

phenomenon network concept network dataabstraction representation

network model

network model formation [Brandes et al. 13]

abstraction to network concept

V = NE = {(u, v) 2 D : x(u, v) 62 {0,1}}

!(u, v) = x(u, v) if (u, v) 2 E

example: Dolphin social network study [Lusseau et al. ‘03]

G = (V,E,!)

|V | = n, |E| = m

network analysisinterpretation network structure information

N =?

8(u, v) 2 D = N ⇥N , x(u, v) =?

Page 5: ALGORITHMS AND SOFTWARE FOR THE ANALYSIS OF LARGE … › wp-content › uploads › 2016 › 07 › ... · OUTLINE & CONTRIBUTIONS 2 data analysis • inspection • transformation

NETWORKIT: A TOOL SUITE FOR THE ANALYSIS OF LARGE COMPLEX NETWORKS

Part II

Page 6: ALGORITHMS AND SOFTWARE FOR THE ANALYSIS OF LARGE … › wp-content › uploads › 2016 › 07 › ... · OUTLINE & CONTRIBUTIONS 2 data analysis • inspection • transformation

NETWORKIT

6

contribution• NetworKit, an open-source tool suite for the analysis of large networks

[Staudt, Sazonovs, Meyerhenke ’16 in Network Science, to appear]

state of the art & targeted improvement• various graph processing or network analysis libraries exist (e.g. Boost Graph, JUNG, NetworkX… ) • increase performance, parallelism, focus on network science, … • provide short path from algorithms research to data analysis applications

Page 7: ALGORITHMS AND SOFTWARE FOR THE ANALYSIS OF LARGE … › wp-content › uploads › 2016 › 07 › ... · OUTLINE & CONTRIBUTIONS 2 data analysis • inspection • transformation

PRINCIPLES AND ARCHITECTURE

7

architecture overview

design goals• performance • usability and integration

algorithm and implementation patterns

• parallelism • heuristics and approximation

algorithms • modular design

C++ / OpenMP

Data Structures I/OAlgorithms

Cython

PythonTask-oriented Interface Additional

Functionality

Pythonized Classes

Wrapper Classes

NetworKit

pandas

numpy

matplotlib

ext. Python modules

Python shell / program

Page 8: ALGORITHMS AND SOFTWARE FOR THE ANALYSIS OF LARGE … › wp-content › uploads › 2016 › 07 › ... · OUTLINE & CONTRIBUTIONS 2 data analysis • inspection • transformation

USE CASES

8

e.g. as component in a protein network analysis pipeline [Flick ‘14]

as algorithm library

e.g. network profiles for explorative network analysis, here: connectome network

[Staudt et al. ‘16]

as interactive data analysis tool (e.g. via Jupyter Notebook)

Page 9: ALGORITHMS AND SOFTWARE FOR THE ANALYSIS OF LARGE … › wp-content › uploads › 2016 › 07 › ... · OUTLINE & CONTRIBUTIONS 2 data analysis • inspection • transformation

COMPARISON AND EVALUATION

9

comparative benchmark• consistently fastest average processing rate for typical analysis kernels on varied set of networks

Page 10: ALGORITHMS AND SOFTWARE FOR THE ANALYSIS OF LARGE … › wp-content › uploads › 2016 › 07 › ... · OUTLINE & CONTRIBUTIONS 2 data analysis • inspection • transformation

COMMUNITY DETECTION IN COMPLEX NETWORKS

Part III

Page 11: ALGORITHMS AND SOFTWARE FOR THE ANALYSIS OF LARGE … › wp-content › uploads › 2016 › 07 › ... · OUTLINE & CONTRIBUTIONS 2 data analysis • inspection • transformation

COMMUNITY DETECTION

11

0

1

2

3

4

5

6

7

8 9

10

11

12

13

14

15

16

17

18

1920

21

22

23

24

2526

2728

29

30 31

32

33

34

35

36

37

38

39

40

41

4243

44

45

46

47

48

49

50

51

52

53

54

55

5657

58

59

60

61

M�(G, ⇣, �) :=X

C2⇣

|E(C)||E|

| {z }coverage

�� ·X

C2⇣

�Pv2C deg(v)

�2

(2 · |E|)2| {z }

expected coverage

(multi-resolution) modularity ([Lambiotte ’10]) [Girvan & Newman ‘02]

community detection• reveal modular composition of network by finding

internally dense, externally sparse subgraphs (communities)

• community: vague concept, formalized via e.g. objective function modularity

modularity• optimization NP-hard [Brandes et al. ‘08] • parameter-free, well understood -> commonly

applied for explorative network analysis

Dolphins social network - node color by modularity-based communities

Page 12: ALGORITHMS AND SOFTWARE FOR THE ANALYSIS OF LARGE … › wp-content › uploads › 2016 › 07 › ... · OUTLINE & CONTRIBUTIONS 2 data analysis • inspection • transformation

ENGINEERING PARALLEL ALGORITHMS FOR COMMUNITY DETECTION

12

contribution• two fast, robust & scalable parallel heuristics: PLP & PLM • PLM: best quality/running time tradeoff in experimental comparison with current competitors

[Staudt & Meyerhenke ’13 at ICPP] [Staudt & Meyerhenke ’16 in TPDS]

state of the art & targeted improvement• O(m)-time heuristics for modularity maximization exist, but few parallel algorithms (10th DIMACS

Challenge 2012 [Bader et al. ’13]) • desired: implementation that robustly processes billion-edge graphs on typical multicore computer in

minutes

Page 13: ALGORITHMS AND SOFTWARE FOR THE ANALYSIS OF LARGE … › wp-content › uploads › 2016 › 07 › ... · OUTLINE & CONTRIBUTIONS 2 data analysis • inspection • transformation

PLM: PARALLEL LOUVAIN METHOD

13

predecessor• sequential locally greedy multilevel algorithm

[Blondel et al. ‘08] • optional extension: additional refinement phase

(PLMR) [Rotta & Noack ‘11]

our improvement• first parallelization

move phase

coarsening phaseprolongation

(+ refinement phase)

Page 14: ALGORITHMS AND SOFTWARE FOR THE ANALYSIS OF LARGE … › wp-content › uploads › 2016 › 07 › ... · OUTLINE & CONTRIBUTIONS 2 data analysis • inspection • transformation

PLM: PARALLELIZATION

14

parallelization issues and solutions• lock-free parallelisation where possible • move/refinement phase

• race conditions? yes, but mitigated by self-correcting iterative algorithm

• locks only on update of the volume of communities

Page 15: ALGORITHMS AND SOFTWARE FOR THE ANALYSIS OF LARGE … › wp-content › uploads › 2016 › 07 › ... · OUTLINE & CONTRIBUTIONS 2 data analysis • inspection • transformation

ENGINEERING PARALLEL ALGORITHMS FOR COMMUNITY DETECTION

15

PLM: strong scaling on 3 billion edge web graph

Pareto evaluation in comparison with codes from 10th DIMACS challenge [Bader et al.

’13] and others

experimental evaluation• running time and solution quality (modularity) measured on diverse set of networks • performance and scaling behavior optimized through algorithm engineering

−0.25 −0.20 −0.15 −0.10 −0.05 0.00 0.05 0.10

PRGulaULty scRUe

0

1

2

4

8

16

32

64

128

256

512

tLP

e s

cRUe

3L0

3L0*

3L3

3L05CL8BTBB

5*

LRuvaLn

E33

C**CC**CL

CEL

1 2 4 8 16 32

threads

0200400600800

1000120014001600

tim

e [

s]

1 2 4 8 16 32

threads

0

2

4

6

8

10

speedup

Page 16: ALGORITHMS AND SOFTWARE FOR THE ANALYSIS OF LARGE … › wp-content › uploads › 2016 › 07 › ... · OUTLINE & CONTRIBUTIONS 2 data analysis • inspection • transformation

EDGE CENTRALITY MEASURES FOR NETWORK SPARSIFICATION

Part IV

Page 17: ALGORITHMS AND SOFTWARE FOR THE ANALYSIS OF LARGE … › wp-content › uploads › 2016 › 07 › ... · OUTLINE & CONTRIBUTIONS 2 data analysis • inspection • transformation

EXAMPLES

17

Page 18: ALGORITHMS AND SOFTWARE FOR THE ANALYSIS OF LARGE … › wp-content › uploads › 2016 › 07 › ... · OUTLINE & CONTRIBUTIONS 2 data analysis • inspection • transformation

SPARSIFICATION

18

contributions • conceptual framework: network sparsification as edge centrality rating and filtering • first comparative study of various edge centrality measures for the purpose of structure-preserving network

sparsification • an effective novel method • (parallelized) efficient implementations

[Lindner, Staudt, Hamann, Meyerhenke, Wagner ’15 at ASONAM][Hamann, Lindner, Meyerhenke, Staudt, Wagner ’16 in SNAM]

state of the art & targeted improvement• numerous sparsification methods proposed, but lack of comparative work and unifying concepts

(edge) sparsification• reduce the edge set of a network while preserving important structural properties

edge score calculation edge filteringG = (V,E) G0 = (V,E0)

target edge ratioedge centrality measure

Page 19: ALGORITHMS AND SOFTWARE FOR THE ANALYSIS OF LARGE … › wp-content › uploads › 2016 › 07 › ... · OUTLINE & CONTRIBUTIONS 2 data analysis • inspection • transformation

EVALUATION

19

conclusions • class of methods (Simmelian Backbones, Jaccard Similarity, Algebraic Distance) that effectively preserves

community structures • novel method Local Degree preserves shortest paths, connectivity, many centralities • local filtering improves the preservation of almost all properties (diameter, centralities,…)

experimental evaluation • quantify how structural properties vary with decreasing edge ratio • network set: >100 social graphs (Facebook and others)

Page 20: ALGORITHMS AND SOFTWARE FOR THE ANALYSIS OF LARGE … › wp-content › uploads › 2016 › 07 › ... · OUTLINE & CONTRIBUTIONS 2 data analysis • inspection • transformation

GENERATIVE MODELS FOR REALISTIC SYNTHETIC NETWORKS

Part V

Page 21: ALGORITHMS AND SOFTWARE FOR THE ANALYSIS OF LARGE … › wp-content › uploads › 2016 › 07 › ... · OUTLINE & CONTRIBUTIONS 2 data analysis • inspection • transformation

GENERATING SCALED REPLICAS OF REAL-WORLD NETWORKS

21

contributions• fitting schemes for a variety of generative models • experimental study clarifying the degree of realism of various models • LFR+ generator, an effective tool for creating realistic (scaled) replicas of networks

[current joint work with Gutfraind, Safro, Meyerhenke, Hamann, unpublished]

state of the art• large variety of generative models with varying degrees of (claimed) realism

motivation• algorithm engineering: given a small real network, generate realistic (scaled) replicas to enable representative

experiments on larger data sets

model fitting scheme generator(scaling factor)

Ooriginal network

(scaled) replica

model parameters

network analysis

R

Page 22: ALGORITHMS AND SOFTWARE FOR THE ANALYSIS OF LARGE … › wp-content › uploads › 2016 › 07 › ... · OUTLINE & CONTRIBUTIONS 2 data analysis • inspection • transformation

EXAMPLE REPLICATION

22

original dolphins social network

consider key structural features…• degrees • connectedness • clustering • community structure • …

models (& generator algorithms)• [Erdös & Renyi ’60]([Batagelj & Brandes ‘05]) • [Barabasi & Albert ‘02] • [Chung et al. ‘00] • Edge-Switching Markov Chain

[Milo et al. ’03] • RMAT [Chakrabarti et al. ‘04] • Hyperbolic Unit-Disk Graph

[Krioukov et al. ’10]([Looz et al ’15]) • BTER [Kolda et al. ‘13] • LFR [Lancichinetti et al. ‘08]

Page 23: ALGORITHMS AND SOFTWARE FOR THE ANALYSIS OF LARGE … › wp-content › uploads › 2016 › 07 › ... · OUTLINE & CONTRIBUTIONS 2 data analysis • inspection • transformation

LFR+ GENERATOR: EXAMPLE REPLICATION

23

epidemiological contact network used in HIV research [Potterat et al.

’02]

scale-2 replica produced by LFR+ generator

sample from scale-200k replica produced by LFR+ generator

Page 24: ALGORITHMS AND SOFTWARE FOR THE ANALYSIS OF LARGE … › wp-content › uploads › 2016 › 07 › ... · OUTLINE & CONTRIBUTIONS 2 data analysis • inspection • transformation

SUMMARY

24

all contributions• Part II

• NetworKit[Staudt, Sazonovs, Meyerhenke ‘16, in Network Science, to appear]

• network analysis on distributed systems [Koch, Staudt, Vogel, Meyerhenke ’15 at FAB]

• Part III • parallel heuristics for community detection

[Staudt, Meyerhenke ‘13, at ICPP][Staudt, Meyerhenke ‘16, in TPDS]

• heuristics for selective community detection [Staudt, Marrakchi, Meyerhenke ’14 at IEEE BigData]

• Part IV• network sparsification via edge centrality

rating [Lindner, Staudt, Hamann, Meyerhenke, Wagner ‘15, at ASONAM][Hamann, Lindner, Meyerhenke, Staudt, Wagner ‘16, in SNAM]

• Part V• realistic synthetic networks via generative

models [with Gutfraind, Safro, Hamann, Meyerhenke, unpublished]

graph algorithmics

network science

software engineering

data analysis / data science

algorithmics

this thesis

Page 25: ALGORITHMS AND SOFTWARE FOR THE ANALYSIS OF LARGE … › wp-content › uploads › 2016 › 07 › ... · OUTLINE & CONTRIBUTIONS 2 data analysis • inspection • transformation

Appendix

Page 26: ALGORITHMS AND SOFTWARE FOR THE ANALYSIS OF LARGE … › wp-content › uploads › 2016 › 07 › ... · OUTLINE & CONTRIBUTIONS 2 data analysis • inspection • transformation

CHARACTERIZING THE STRUCTURE OF NETWORKS

26

distance• e.g. diameter, algebraic distance

node centrality• e.g. degree, betweenness, PageRank

edge centrality• e.g. edge betweenness, -> sparsification

partitioning• e.g. components, k-cores, communities

correlations• e.g. assortativity

emergent properties• e.g. epidemic spreading

0

1

2

3

4

5

6

7

8 9

10

11

12

13

14

15

16

17

18

1920

21

22

23

24

2526

2728

29

30 31

32

33

34

35

36

37

38

39

40

41

4243

44

45

46

47

48

49

50

51

52

53

54

55

5657

58

59

60

61

0

1

2

3

4

5

6

7

8 9

10

11

12

13

14

15

16

17

18

1920

21

22

23

24

2526

2728

29

30 31

32

33

34

35

36

37

38

39

40

41

4243

44

45

46

47

48

49

50

51

52

53

54

55

5657

58

59

60

61

Dolphins social network - betweenness Dolphins social network - k-core decomposition

Page 27: ALGORITHMS AND SOFTWARE FOR THE ANALYSIS OF LARGE … › wp-content › uploads › 2016 › 07 › ... · OUTLINE & CONTRIBUTIONS 2 data analysis • inspection • transformation

NETWORKIT API: CODE EXAMPLE

27

Page 28: ALGORITHMS AND SOFTWARE FOR THE ANALYSIS OF LARGE … › wp-content › uploads › 2016 › 07 › ... · OUTLINE & CONTRIBUTIONS 2 data analysis • inspection • transformation

FUNCTIONALITY

28

module description example componentscentrality node centrality measures centrality.Betweennesscommunity community detection community.PLMcomponents connected components components.ConnectedComponentscorrelation correlations correlations.Assortativitydistance distance measures distance.Diametergenerators generative models generators.ErdosRenyiGeneratorgraph graph API graph.Graph

linkprediction link prediction algorithms linkprediction.JaccardIndexprofiling network profiling tool profiling.Profilesimulation simulations simulation.EpidemicSimulationSEIR

examples of NetworKit’s modules and components

Page 29: ALGORITHMS AND SOFTWARE FOR THE ANALYSIS OF LARGE … › wp-content › uploads › 2016 › 07 › ... · OUTLINE & CONTRIBUTIONS 2 data analysis • inspection • transformation

NETWORKIT IN COMPARISON WITH DISTRIBUTED GRAPH PROCESSING FRAMEWORKS

29

conclusions• good scaling can be due to distributing own

overheads [McSherry et al ’15] • distributed frameworks have significant overheads,

so their application should be motivated by lack of memory on a single node

Staudt – Complex Network Analysis on Distributed Systems 9

Overview | Programming Models & Frameworks

Distributed Computing

General-Purpose Graph-Specific

MapReduce PACT Vertex-Centric

Graph-Centric

Pregel GAS

ApacheGiraph

GraphLab Giraph++Hadoop Apache Flink

distributed (graph) computing models & frameworks

24.9100.585.252.750.3

0306090120

1 2 4 8

0.54

1.42

1.97 2.01

2.99

0

1

2

3

4

NetworKit GraphLab

flickr-edges

159 122148.68880.75 65

69

347

51

1

10

100

1000

Giraph Giraph++ Flink

16.7

4857

41

87

0

30

60

90

120

NetworKit GraphLab

livejournal-links-u

698 633322233

1411

1

10

100

1000

10000

Giraph Giraph++ Flink

24.9

100.5

85.2

52.7 50.3

0

30

60

90

120

NetworKit GraphLab

orkut-links

1462 1318648

179

2511

1

10

100

1000

10000

Giraph Giraph++ Flink

27.2

140

0

30

60

90

120

150

NetworKit GraphLab

uk-2002

1856

1

10

100

1000

10000

Giraph

258

0

100

200

300

GraphLab

1856

1

10

100

1000

10000

Giraph

wikipedia-links

0.091.417202.815282.21224262.2624240.010.1110100

1 2 4 8

0.09

1.4

17 20

2.8

1528

2.2

1224 26

2.26

24 24

0.01

0.1

1

10

100

NetworKit GraphLab Giraph Giraph++ Flink

flickr-edges

0.2

2477 5274 79

45 3563

1438 3647

11

6434

0.01

0.1

1

10

100

NetworKit GraphLab Giraph Giraph++ Flink

livejournal-links-d

12

244

1

10

100

Giraph Giraph++

uk-2002

41

0

25

50

Giraph

twitter-l

nodes

community detection via label propagation

Page 30: ALGORITHMS AND SOFTWARE FOR THE ANALYSIS OF LARGE … › wp-content › uploads › 2016 › 07 › ... · OUTLINE & CONTRIBUTIONS 2 data analysis • inspection • transformation

PLP: PARALLEL LABEL PROPAGATION

30

predecessor• sequential label propagation algorithm [Raghavan

et al. ‘07] • a local coverage maximizer (implicitly maximizes

modularity by getting stuck in local optima)

our improvement• parallelization & optimizing heuristics

Page 31: ALGORITHMS AND SOFTWARE FOR THE ANALYSIS OF LARGE … › wp-content › uploads › 2016 › 07 › ... · OUTLINE & CONTRIBUTIONS 2 data analysis • inspection • transformation

SELECTIVE COMMUNITY DETECTION

31

s1

s2

s3

task- given a set of seed nodes, detect the communities that contain them

state of the art & targeted improvement- plethora of objective functions and heuristics proposed - lack of comparative work

contributions- comparative study clarifying the real-world performance of existing and

novel algorithms - Greedy Community Expansion: generic greedy algorithm for the

incremental maximization of different objective functions, subsuming several previous efforts

- SelSCAN: application of density-based clustering to SCDs

C

s

o1

o2

c1

c2

c3

v1

density-based community

Page 32: ALGORITHMS AND SOFTWARE FOR THE ANALYSIS OF LARGE … › wp-content › uploads › 2016 › 07 › ... · OUTLINE & CONTRIBUTIONS 2 data analysis • inspection • transformation

EDGE CENTRALITY MEASURES FOR SPARSIFICATION

32

methods- Random Edge (RE) - Triange Count (Tri) - Jaccard Similarity (JS) [Satuluri et

al. ’11] - (Quadrilateral or Triadic) Simmelian

Backbones (TS, QLS) [Nick et al. ’13][Nocaj et al. ’14]

- Edge Forest Fire (EFF) [Leskovec et al. ’06]

- Algebraic Distance (AD) [Chen & Safro ’11]

- Local Degree (LD)

edge rank correlations

Page 33: ALGORITHMS AND SOFTWARE FOR THE ANALYSIS OF LARGE … › wp-content › uploads › 2016 › 07 › ... · OUTLINE & CONTRIBUTIONS 2 data analysis • inspection • transformation

EDGE CENTRALITY: LOCAL DEGREE

33

motivation• hierarchy of hub nodes (with

high degree) is important for real-world network’s connectivity (esp. shortest paths)

measure• retain edges to top

neighbors by degree • time: [http://rocs.northwestern.edu/]O(m log(d

max

))

bdeg(u)↵c

experimental results• preserves wide range of properties (connectivity, shortest paths, centralities, epidemic simulation

behavior) • can be strongly correlated with edge betweenness - depending on the network • (edge betweenness: time)O(nm)

Page 34: ALGORITHMS AND SOFTWARE FOR THE ANALYSIS OF LARGE … › wp-content › uploads › 2016 › 07 › ... · OUTLINE & CONTRIBUTIONS 2 data analysis • inspection • transformation

SPARSIFICATION BY LOCAL FILTERING

34

edge score calculation edge filteringG = (V,E) G0 = (V,E0)

target edge ratioedge centrality measure

score transformationbdeg(u)↵c, ↵ 2 [0, 1]keep top

edges incident to every node…

… by applying global threshold to transformed

scores lc({u, v})

c({u, v})

advantages• improves preservation of all

properties (by preserving connectivity)

• accomodates structurally heterogeneous regions in networks

Quadrilateral Simmelian Backbone (QLS) [Nocaj et al.

’14]

Quadrilateral Simmelian Backbone

with local filtering (LQLS)

Page 35: ALGORITHMS AND SOFTWARE FOR THE ANALYSIS OF LARGE … › wp-content › uploads › 2016 › 07 › ... · OUTLINE & CONTRIBUTIONS 2 data analysis • inspection • transformation

REALISTIC SCALING OF NETWORKS

35

Page 36: ALGORITHMS AND SOFTWARE FOR THE ANALYSIS OF LARGE … › wp-content › uploads › 2016 › 07 › ... · OUTLINE & CONTRIBUTIONS 2 data analysis • inspection • transformation

LFR+ ALGORITHM

36

predecessor• LFR generator for community detection

benchmarking [Lancichinetti, Fortunato, Radicchi ’08]

our modification• utilize core algorithm but accept more

general input, fitted to original graph • -> increased flexibility and realism of the

replica

⇣communities

degrees

(�⇣(u))u2V

intra-community degrees

1. assign degrees to nodes 2. assign sizes to communities 3. assign nodes randomly & iteratively to

communities so that intra-community degrees are satisified

4. connect graph using Edge-Switching Markov Chain Generator model (one

graph per community, one global graph, rewiring step to remove additional intra-

community edges)

d = (deg(u))u2V

core algorithm

degree distribution with power-law exponent �

community size distribution with power-law exponent

mixing parameter µ

Ooriginal graph

vanilla LFR parameters LFR+ parameters

synthetic graph R

Page 37: ALGORITHMS AND SOFTWARE FOR THE ANALYSIS OF LARGE … › wp-content › uploads › 2016 › 07 › ... · OUTLINE & CONTRIBUTIONS 2 data analysis • inspection • transformation

STRUCTURE REPLICATION

37

Page 38: ALGORITHMS AND SOFTWARE FOR THE ANALYSIS OF LARGE … › wp-content › uploads › 2016 › 07 › ... · OUTLINE & CONTRIBUTIONS 2 data analysis • inspection • transformation

RUNNING TIME REPLICATION

38

Are algorithm running times obtained on synthetic graphs representative for those on real-world inputs?