Inferring Coarse Views of Connectivity in Very Large Graphs
Reza Motamedi, Reza Rejaie, Walter Willinger, Daniel Lowd, Roberto Gonzalez
http://onrg.cs.uoregon.edu/WalkAbout
10/8/14

Slides: mirage.cs.uoregon.edu/slide/cosn2014-slides.pdf


Introduction

• Large-scale networked systems (e.g., OSNs) are often represented as graphs
• Characterizing the connectivity structure of such a graph provides deeper insight into the system
• A coarse view of a graph allows a top-down analysis
  - Identify a few tightly connected regions along with their inter- and intra-region connectivity
  - If needed or desired, zoom in on an individual region and recurse
• How can one capture a coarse view of a large graph?


Obtaining a Coarse View of a Graph

• Community detection techniques optimize an objective function
  - They detect communities with hundreds of nodes in real-world graphs
  - Some techniques have limited scalability
• Graph partitioning techniques divide the graph into strongly connected partitions
  - May produce balanced partitions
  - May require seeds for each partition or the number of partitions as input


This paper presents

• The design of a scalable technique (WalkAbout) to infer coarse (regional) views of a graph
• An illustration of WalkAbout "in action" for inferring the regional connectivity of Flickr, Twitter, and Google+
• A study of the relationship between regional- and community-level views of a large graph
• An initial attempt at answering the question "Are (Flickr) regions meaningful?"


Random Walks (RW)

• Consider an undirected, connected, non-bipartite graph $G = [V, E]$
• The probability that a very long RW visits node x converges to $\frac{\deg(x)}{2 \times |E|}$
• The mixing time $T_G(\varepsilon)$ is the walk length at which the probability of being at node x is within $\varepsilon$ of the stationary distribution
  - We use mixing time rather informally, not specifying $\varepsilon$
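The convergence claim above can be checked empirically on a toy graph. The sketch below is only an illustration, not part of WalkAbout: the four-node graph, the function name `visit_frequencies`, and the step count are all assumptions chosen for the example. It runs one long random walk and compares the visit frequency of each node with deg(x) / (2 × |E|).

```python
import random

# Toy graph (an assumption for illustration): a triangle 0-1-2 plus a
# pendant node 3. It is undirected, connected, and non-bipartite.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}


def visit_frequencies(graph, steps, seed=0):
    """Run one long random walk and return the fraction of steps spent at each node."""
    rng = random.Random(seed)
    visits = {x: 0 for x in graph}
    node = next(iter(graph))  # arbitrary starting node
    for _ in range(steps):
        node = rng.choice(graph[node])  # move to a uniformly chosen neighbor
        visits[node] += 1
    return {x: visits[x] / steps for x in graph}


# Each undirected edge appears twice in the adjacency lists.
num_edges = sum(len(nbrs) for nbrs in graph.values()) // 2
stationary = {x: len(graph[x]) / (2 * num_edges) for x in graph}
empirical = visit_frequencies(graph, steps=200_000)

for x in graph:
    print(x, round(empirical[x], 3), stationary[x])
```

The two columns should agree to roughly two decimal places; the agreement tightens as the number of steps grows.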


Behavior of Many RWs

• Start |V| RWs in parallel (one from each node)
• V(x, wl): the expected number of RWs that are at node x after wl steps
• As wl reaches the mixing time, the number of walkers at node x converges to
  $V(x, wl) \approx |V| \cdot \frac{\deg(x)}{2 \times |E|}$
• The degree/visit ratio (dvr) converges to the average node degree:
  $\frac{\deg(x)}{V(x, wl)} \approx \frac{2 \times |E|}{|V|}$
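A minimal sketch of the dvr computation, assuming the expected walker counts are propagated exactly rather than sampled. The function names `expected_walkers` and `dvr` and the toy graph are hypothetical and used only for illustration.

```python
# Toy graph (an assumption for illustration): triangle 0-1-2 plus pendant node 3.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}


def expected_walkers(graph, wl):
    """Expected number of walkers at each node after wl steps, one walker starting per node."""
    walkers = {x: 1.0 for x in graph}
    for _ in range(wl):
        nxt = {x: 0.0 for x in graph}
        for x, nbrs in graph.items():
            share = walkers[x] / len(nbrs)  # walkers leaving x split evenly over its edges
            for y in nbrs:
                nxt[y] += share
        walkers = nxt
    return walkers


def dvr(graph, wl):
    """Degree/visit ratio deg(x) / V(x, wl) for every node."""
    walkers = expected_walkers(graph, wl)
    return {x: len(graph[x]) / walkers[x] for x in graph}


num_nodes = len(graph)
num_edges = sum(len(nbrs) for nbrs in graph.values()) // 2
avg_degree = 2 * num_edges / num_nodes  # 2|E| / |V|

print(dvr(graph, wl=1))    # still spread out after one step
print(dvr(graph, wl=100))  # every value close to avg_degree
```

For this graph 2|E| / |V| = 8 / 4 = 2, so after enough steps every node's dvr approaches 2 regardless of its own degree.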


Validation through Simulations

• Use simulation over synthetic graphs to explore the dependency of dvr on different parameters
  - More results in the paper

[Figure: three panels. (1) PDF of dvr for synthetic graphs with average degree 24.74, 33.94, 44.30, and 53.29. (2) PDF of dvr for walk lengths wl = 10, 20, 50, and 100. (3) dvr as a function of wl for nodes with deg < 50 and deg > 100.]


Detecting Regions – Key Idea

• Suppose a graph consists of a few weakly connected regions
• Start RWs from randomly selected nodes on a graph $G = [V, E]$ that has multiple regions
  - Region i is $G_i = [V_i, E_i]$
• If wl is close to the mixing time of the regions, a majority of RWs remain in their starting region
  - The graph can be viewed as disconnected regions
• $dvr_i(x)$ converges to the average node degree of region i:
  $dvr_i(x) = \frac{\deg(x)}{E[V(x, wl)]} = \frac{2 \times |E_i|}{|V_i|}$
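The effect can be reproduced on a small synthetic example. The sketch below assumes two cliques (8 and 4 nodes) joined by a single bridge edge; the construction, the function names, and the walk lengths are illustrative choices, not the graphs used in the paper. With a short walk the dvr values of the two regions stay well apart; with a very long walk they all collapse to the global average degree.

```python
def two_cliques(n_a, n_b):
    """Two cliques joined by one bridge edge a0-b0 (hypothetical test graph)."""
    a = [f"a{i}" for i in range(n_a)]
    b = [f"b{i}" for i in range(n_b)]
    graph = {}
    for group in (a, b):
        for x in group:
            graph[x] = [y for y in group if y != x]
    graph["a0"].append("b0")
    graph["b0"].append("a0")
    return graph, a, b


def dvr(graph, wl):
    """deg(x) / expected walkers at x after wl steps, one walker starting per node."""
    walkers = {x: 1.0 for x in graph}
    for _ in range(wl):
        nxt = {x: 0.0 for x in graph}
        for x, nbrs in graph.items():
            for y in nbrs:
                nxt[y] += walkers[x] / len(nbrs)
        walkers = nxt
    return {x: len(graph[x]) / walkers[x] for x in graph}


graph, region_a, region_b = two_cliques(8, 4)
short = dvr(graph, wl=3)     # close to the mixing time of each clique
long_ = dvr(graph, wl=1000)  # far beyond the mixing time of the whole graph

global_avg = sum(len(n) for n in graph.values()) / len(graph)  # 2|E| / |V|
print("region A, wl=3:", sorted(round(short[x], 2) for x in region_a))
print("region B, wl=3:", sorted(round(short[x], 2) for x in region_b))
print("all nodes, wl=1000:", round(long_["a1"], 4), "global average:", round(global_avg, 4))
```

At wl = 3 the values for region A sit near 7 and those for region B near 3 to 4, which is the separation that produces distinct peaks in a dvr histogram.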


Key Idea (cont’d)

• Regions with different average degrees form separate peaks in the dvr histogram
  - Region: a non-overlapping range of dvr values
• Formation of peaks is a transient phenomenon
  - As wl increases beyond the mixing time of the regions, dvr for all nodes converges to a single value

=> The similarity of dvr implies tighter connectivity among the nodes in a region
=> The dvr signal is indirect and efficient, hence scalable


Validation on Synthetic Graphs

• A graph with two regions (average degree of 70, 60) connected with b bridge edges
• Only a single region or the bridge is changed at a time

[Figure: three surface plots of the dvr PDF as one parameter varies: (1) average degree of a region, (2) region size, (3) bridge size.]


WalkAbout

• Use many short RWs to infer/explore the regional connectivity of large graphs
  - Determine the number of regions, the nodes per region, and the inter- and intra-region connectivity
• Basic challenges
  - The variation and rate of convergence of dvr are inversely proportional to node degree (i.e., low-degree nodes are noisy)
  - Regions may have similar average degrees but different mixing times
• Identify regions in two steps:
  - Detect the core (high-degree) nodes of each region
  - Map low-degree nodes to the detected core of each region
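One possible way to realize the second step is sketched below. It is an assumption for illustration, not the authors' procedure: from a low-degree node, launch several short random walks, record which core each walk reaches first, and assign the node to the most frequently reached core. The function name `map_to_core`, its parameters, and the toy graph are all hypothetical.

```python
import random
from collections import Counter


def map_to_core(graph, core_of, start, n_walks=50, max_len=10, rng=None):
    """Assign `start` to the core that short random walks from it reach first most often."""
    rng = rng or random.Random(0)
    hits = Counter()
    for _ in range(n_walks):
        node = start
        for _ in range(max_len):
            node = rng.choice(graph[node])
            if node in core_of:          # the walk has entered some core
                hits[core_of[node]] += 1
                break
    return hits.most_common(1)[0][0] if hits else None


# Hypothetical example: two triangle-shaped cores joined by the edge a3-b1,
# plus two low-degree nodes u and v hanging off different cores.
graph = {
    "a1": ["a2", "a3", "u"], "a2": ["a1", "a3", "u"], "a3": ["a1", "a2", "b1"],
    "b1": ["b2", "b3", "a3"], "b2": ["b1", "b3", "v"], "b3": ["b1", "b2"],
    "u": ["a1", "a2"], "v": ["b2"],
}
core_of = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B", "b3": "B"}

print(map_to_core(graph, core_of, "u"))  # u only borders core A
print(map_to_core(graph, core_of, "v"))  # v only borders core B
```

In this example every walk from `u` enters core A on its first step, so the assignment does not depend on the random seed; on real graphs the vote would be noisier and the number and length of walks would need tuning.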


WalkAbout – Main Steps

• Emulate RWs and generate the dvr histogram
  - Remove low-degree nodes ($D_{min}$) to reduce noise
• Identify the core of each region
  - Search for the walk length that leads to pronounced peaks
  - Detect a peak and its associated dvr range => nodes per region
• Map low-degree nodes to cores
  - Based on relative reachability (using multiple RWs)
• Produce the regional view
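The peak-detection step can be sketched as follows. This is a deliberately simple stand-in, not WalkAbout's actual detector: it bins dvr values, reports local maxima as peaks, and cuts the dvr axis at the emptiest bin between neighboring peaks. The function names, `bin_width`, `min_count`, and the sample values are assumptions made for the example.

```python
def dvr_histogram(dvr_values, bin_width):
    """Fixed-width histogram of dvr values; returns (lowest value, list of bin counts)."""
    lo = min(dvr_values)
    n_bins = int((max(dvr_values) - lo) / bin_width) + 1
    counts = [0] * n_bins
    for v in dvr_values:
        counts[int((v - lo) / bin_width)] += 1
    return lo, counts


def find_peaks(counts, min_count=1):
    """Bins with at least min_count entries, strictly higher than the left
    neighbor and at least as high as the right neighbor."""
    peaks = []
    for i, c in enumerate(counts):
        left = counts[i - 1] if i > 0 else 0
        right = counts[i + 1] if i + 1 < len(counts) else 0
        if c >= min_count and c > left and c >= right:
            peaks.append(i)
    return peaks


def peak_ranges(lo, counts, peaks, bin_width):
    """Split the dvr axis into one non-overlapping range per peak,
    cutting at the lowest bin between consecutive peaks."""
    cuts = [0]
    for p, q in zip(peaks, peaks[1:]):
        cuts.append(min(range(p + 1, q), key=lambda i: counts[i]))
    cuts.append(len(counts))
    return [(lo + a * bin_width, lo + b * bin_width) for a, b in zip(cuts, cuts[1:])]


# Hypothetical dvr values of core nodes from two regions.
values = [3.0, 3.1, 3.2, 3.2, 3.3, 3.6, 6.4, 6.9, 7.0, 7.0, 7.1, 7.2]
lo, counts = dvr_histogram(values, bin_width=0.5)
peaks = find_peaks(counts, min_count=2)
ranges = peak_ranges(lo, counts, peaks, bin_width=0.5)
print(counts, peaks, ranges)
```

Each returned range plays the role of a region's dvr interval: the nodes whose dvr falls inside it would form that region's core. Repeating this for several walk lengths and keeping the one with the most pronounced peaks would correspond to the search over wl described above.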


Inferring vs Exploring

• WalkAbout provides a few parameters that affect the resulting regional view ($D_{min}$, wl)
  - Parameters can be set based on domain knowledge
• Sensitivity to these parameters offers insight into the graph structure
• WalkAbout is being developed as an interactive tool with a GUI
  - Publicly available at http://onrg.cs.uoregon.edu/WalkAbout


WalkAbout in Action

• Infer the regional view of connectivity of the LCC (largest connected component) for Flickr, Twitter, and Google+
• For contrast: apply the Louvain community detection method
• Default setting: $D_{min} = 500$
  - See the tech report for results on the sensitivity to $D_{min}$

              Flickr   Twitter   GPlus
Nodes         1.6M     41.6M     51.7M
Edges         31.M     1,468M    869.4M
Communities   28K      39K       24K


Regional View of Flickr

[Figure: two panels. (1) PDF of dvr for Flickr, with peaks labeled R0 to R4. (2) dvr PDF as a function of walk length wl.]

Setting: wl = 30, $D_{min} = 500$

      Cores   Regions   Regions   Regions
      Size    %Nodes    %Edges    Avg. Deg   Mod.
R0    4000    92.8      58.2      11.9       0.4
R1    569     1.2       3.2       50.1       0.5
R2    3010    4         17.6      83.7       0.7
R3    2120    1.8       16.6      174.2      0.6
R4    1140    0.2       4.4       431        0.3


Lessons Learned

• Regions with closer dvr tend to have stronger inter-region connectivity
  - Incorrectly placed high-degree nodes
  - Regions with different sizes and mixing times
• The number of peaks changes with walk length
  - The number and selection of peaks affect the regional view
• Identified regions can be very imbalanced in size
  - Detect possible sub-regions in a hierarchical manner


Regions & Communities

• Comparing/relating the regional and community views
  - A typical community is much smaller and more modular
  - The largest communities have sizes comparable to regions
  => There are orders of magnitude more communities
• The highest-degree nodes of each region are placed in a few communities whose size and modularity are comparable to those of regions

[Figure: three panels (modularity, average degree, size) comparing Louvain communities (Louvain), large Louvain communities (Large Louv), and WalkAbout regions (WA) for TW, G+, FL, and OR.]


Mapping Communities to Regions

• Community c is mapped to the region R that contains most of its nodes
  - Mapping confidence: the fraction of c's nodes located in R
• Across the regions of all OSNs
  - For 75% of communities, the confidence is 100%
  - For 90% of communities, the confidence is more than 80%

=> Regions can be viewed as collections of communities
=> A coarser view of the graph
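The mapping rule and its confidence follow directly from the definition above. In this sketch the function name `map_community` and the sample node-to-region assignment are hypothetical.

```python
from collections import Counter


def map_community(community, region_of):
    """Return (region holding most of the community's nodes, mapping confidence)."""
    counts = Counter(region_of[node] for node in community)
    region, hits = counts.most_common(1)[0]
    return region, hits / len(community)


# Hypothetical example: five nodes, four of which lie in region R1.
region_of = {1: "R1", 2: "R1", 3: "R1", 4: "R2", 5: "R1"}
print(map_community([1, 2, 3, 4, 5], region_of))  # → ('R1', 0.8)
```

A confidence of 1.0 means the community lies entirely inside one region, which the slide reports for 75% of communities.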


Per-Region Analysis of Communities

• Do the characteristics of communities generally reveal the features of their region?
  - There is no strong relation between the modularity of the communities in a region and the modularity of the region
• The inter-connectivity among communities is critical in determining the features of each region

[Figure: per-region plots of community average degree, modularity, and size for FL (R0 to R4), TW (R0 to R5), and G+ (R0 to R5).]


Run-time

• Comparing the run times of WalkAbout and the Louvain community detection technique
  - On an Intel X5650 (2.66GHz) computer with 72GB RAM
• WalkAbout run time is split into
  - dvr calculations to detect cores, and
  - Mapping of low-degree nodes to those cores
• WalkAbout exhibits a shorter run time for large graphs

[Figure: run time in seconds for G+, TW, and FL, comparing Louvain with the two WalkAbout components (WA: Map to Core, WA: dvr).]


A New Kind of Validation

• Do users in a region exhibit similar social attributes?
  - Need social context for users
• 99K social groups in Flickr: group name, users/group
  - The group name provides information about the group's interest or context
  - Map each group to the region where most of its users are located
  - Mapping confidence for R1-R4 is high even for large groups
  - e.g., group names in R1 are related to male nudity

=> Social forces appear to drive the formation of regions

[Figure: group mapping confidence (0 to 1) for regions R0 to R4.]


Conclusion & Outlook

• WalkAbout, a new technique to infer/explore coarse views of large graphs
• Applying WalkAbout to three major OSNs
• Are regions meaningful?
  - Relating the regional- and community-level views
  - Showing social cohesion of regions in Flickr
• Future plans
  - Exploring the recursive application of WalkAbout
  - Multi-scale characterization of graph connectivity and its application to examine graph evolution
