Measuring and Extracting Proximity in Complex Networks Emden Gansner, Yehuda Koren, Stephen North , Chris Volinsky AT&T Labs Research

Page 1: Measuring  and  Extracting Proximity  in  Complex Networks

Measuring and Extracting Proximity in Complex Networks

Emden Gansner, Yehuda Koren, Stephen North, Chris Volinsky

AT&T Labs Research

Page 2: Measuring  and  Extracting Proximity  in  Complex Networks

AT&T “Safe Harbor”

The following contains "forward-looking statements" which are based on management's beliefs as well as on a number of assumptions concerning future events made by and information currently available to management. Readers are cautioned not to put undue reliance on such forward-looking statements, which are not a guarantee of performance and are subject to a number of uncertainties and other factors, many of which are outside AT&T's control, that could cause actual results to differ materially from such statements. For a more detailed description of the factors that could cause such a difference, please see AT&T's filings with the Securities and Exchange Commission. AT&T disclaims any intention or obligation to update or revise any forward-looking statements, whether as a result of new information, future events or otherwise.

Page 3: Measuring  and  Extracting Proximity  in  Complex Networks

large social networks

data source    |V|     |E|
co-authors     438K    31M
actor-actor    896K    1.1M
phone calls    300M    1000M
IM             200M    800M

Page 4: Measuring  and  Extracting Proximity  in  Complex Networks

Connecting Co-Authors in DBLP

Adam? Glenn? Emden?

18 node subgraph  Proximity: 1.35e+01  Captured: 1.31e+01 (97%)

Page 5: Measuring  and  Extracting Proximity  in  Complex Networks

95% of communication between…

5 node subgraph  Proximity: 7.10e+00  Captured: 6.74e+00 (95%)

Page 6: Measuring  and  Extracting Proximity  in  Complex Networks

Our goals

• Measure proximity between nodes.

• Explain proximity by extracting connection subgraphs that are readily visualized.

Page 7: Measuring  and  Extracting Proximity  in  Complex Networks

What is proximity?

• proximity [prox·im·i·ty || prɑk'sɪmətɪ /prɒ-] n. adjacency, nearness, closeness, vicinity

• Network proximity is an elusive notion!

• Let’s work by refining a series of definitions.

Page 8: Measuring  and  Extracting Proximity  in  Complex Networks

Measuring proximity

• Simplest approach – length of shortest path

Easily visualized

Page 9: Measuring  and  Extracting Proximity  in  Complex Networks

Measuring proximity

• Simplest approach – length of shortest path

Easily illustrated

Disregards alternative paths

Captures 56%

Captures 98%

Page 10: Measuring  and  Extracting Proximity  in  Complex Networks

Measuring proximity

• Simplest approach – length of shortest path

Easily visualized

Disregards alternative paths

Naïve calculation will be fooled by high degrees

Example from a telephone call graph…

Page 11: Measuring  and  Extracting Proximity  in  Complex Networks

Which pair is closer?

Lefty Stephen

Suresh Shankar

• Both paths have 2 hops and about the same length

• But when considering node-degrees…

Meaningful connection

Random connection?

Page 12: Measuring  and  Extracting Proximity  in  Complex Networks

Measuring proximity – 2nd try

• Net network flow between the nodes

Accounts for multiple paths

Distance indifferent – might favor long paths

High degrees are still an issue

Page 13: Measuring  and  Extracting Proximity  in  Complex Networks

Measuring proximity – 3rd try

• Delivered electric current (effective conductance)

• Resistor network model

Accounts for multiple paths

Penalizes long paths

• High degrees??

• Getting us closer…

• “intuitive”

Physical analogy is not perfect!

[Figure: resistor network; edge weights are conductances (inverse resistances), with the source held at 1V and the target at 0V]
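The resistor-network measure can be tried out on toy graphs. The following is a minimal sketch, not the authors' implementation: it assumes an undirected network given as `(u, v, conductance)` triples, holds `s` at 1V and `t` at 0V, relaxes the remaining voltages by simple Gauss-Seidel sweeps, and returns the current leaving `s`. The function name, the edge format, and the toy networks are illustrative assumptions.

```python
def effective_conductance(edges, s, t, sweeps=10_000, tol=1e-12):
    """Current delivered from s (held at 1V) to t (held at 0V).

    edges: iterable of (u, v, conductance) for an undirected network.
    """
    # Symmetric adjacency map: node -> {neighbor: conductance}.
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w

    volt = {node: 0.0 for node in adj}
    volt[s] = 1.0
    # Gauss-Seidel relaxation: every free node settles at the
    # conductance-weighted average of its neighbors' voltages.
    for _ in range(sweeps):
        change = 0.0
        for node, nbrs in adj.items():
            if node == s or node == t:
                continue
            new = sum(w * volt[n] for n, w in nbrs.items()) / sum(nbrs.values())
            change = max(change, abs(new - volt[node]))
            volt[node] = new
        if change < tol:
            break
    # With a 1V drop, the current leaving s is the effective conductance.
    return sum(w * (volt[s] - volt[n]) for n, w in adj[s].items())


# Hypothetical toy networks with unit conductances.
short_path = [("s", "a", 1.0), ("a", "t", 1.0)]
long_path = [("s", "a", 1.0), ("a", "b", 1.0), ("b", "c", 1.0), ("c", "t", 1.0)]
two_paths = short_path + [("s", "b", 1.0), ("b", "t", 1.0)]

for name, net in [("short", short_path), ("long", long_path), ("two", two_paths)]:
    print(name, round(effective_conductance(net, "s", "t"), 6))
```

On these toy inputs the longer path delivers less current than the shorter one, and a second parallel path adds to the delivered current, which matches the two properties listed on the slide.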

Page 14: Measuring  and  Extracting Proximity  in  Complex Networks

When is the electrical current analogy misleading?

Significant connection Noise?

What does current flow mean?

Page 15: Measuring  and  Extracting Proximity  in  Complex Networks

When is the electric current analogy misleading?

Noise? Significant connection

• Same current flow in both cases!

• Degree-1 nodes are neutral (attract no flow)

• Degree-1 nodes are very common, due to incomplete information

Page 16: Measuring  and  Extracting Proximity  in  Complex Networks

Augment network by a universal sink [Faloutsos, McCurley & Tomkins, KDD 2004]

• Connect all nodes to a grounded universal sink (with 0V)

• Tax each node - deliver portion of the flow to the sink

No internal nodes of degree 1 (above problem solved)

Penalizes long paths

A new parameter to worry about: Which tax system? Constant tax? Proportional tax? Tax brackets? How much?

• There is a worse problem…

Page 17: Measuring  and  Extracting Proximity  in  Complex Networks
Page 18: Measuring  and  Extracting Proximity  in  Complex Networks

Universal sink and (non-)monotonicity

• In our previous notions of proximity, adding nodes/edges to the network couldn’t decrease proximity

• Hmmm…this “blind monotonicity” was part of their shortcoming…

[Plot: proximity vs. network size]

Page 19: Measuring  and  Extracting Proximity  in  Complex Networks

Universal sink and (non-)monotonicity

• For all previous measures, adding nodes/edges to the network couldn’t decrease proximity

• With universal sink – no monotonicity: larger network ⇒ proximity tends to zero, since the sink attracts more flow

• Even adding s-t paths can decrease proximity!

[Plot: proximity vs. network size]

Page 20: Measuring  and  Extracting Proximity  in  Complex Networks

Universal sink and (non-)monotonicity

• Problems with non-monotonicity:

– Counter-intuitive and hard to use

– Size bias makes proximity-comparison across different pairs completely unreliable

– Impossible to explain (size-dependent) proximity using a connection subgraph

[Plot: proximity vs. network size]

Page 21: Measuring  and  Extracting Proximity  in  Complex Networks

A random-walk perspective

• Current-flow model has a direct random-walk (r.w.) interpretation

• Reminder: we defined proximity by “delivered current” or “effective conductance”

• The escape probability, Pesc(s→t), is the probability that a r.w. originating at s will reach t before visiting s again

• Let Deg(s) be the number of r.w.’s originating at s

• The effective conductance between s and t is Pesc(s→t)*Deg(s)
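The random-walk reading can be checked numerically on small graphs. The sketch below is only an illustration under stated assumptions: edges are `(u, v, weight)` triples, a walk leaves a node along an edge with probability proportional to its weight, and Deg(s) is taken to be the weighted degree of `s`. The helper name `escape_probability` and the toy graphs are hypothetical.

```python
def escape_probability(edges, s, t, sweeps=10_000, tol=1e-12):
    """Return (Pesc(s->t), Deg(s) * Pesc(s->t)) for an undirected graph."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w

    # reach[i]: probability that a walk currently at i hits t before s.
    reach = {node: 0.0 for node in adj}
    reach[t] = 1.0
    for _ in range(sweeps):
        change = 0.0
        for node, nbrs in adj.items():
            if node == s or node == t:
                continue
            new = sum(w * reach[n] for n, w in nbrs.items()) / sum(nbrs.values())
            change = max(change, abs(new - reach[node]))
            reach[node] = new
        if change < tol:
            break

    deg_s = sum(adj[s].values())
    # First step out of s, then the chance of reaching t before returning.
    p_esc = sum(w * reach[n] for n, w in adj[s].items()) / deg_s
    return p_esc, deg_s * p_esc


# Hypothetical toy graphs: a 2-hop path, then the same path with a
# dead-end node hanging off the middle node or off the source.
path = [("s", "a", 1.0), ("a", "t", 1.0)]
dead_end_mid = path + [("a", "d", 1.0)]
dead_end_src = path + [("s", "d", 1.0)]

for name, net in [("path", path), ("mid", dead_end_mid), ("src", dead_end_src)]:
    print(name, escape_probability(net, "s", "t"))
```

In this sketch a dead end on the middle node changes neither quantity, while a dead end on the source halves the escape probability but leaves Pesc*Deg(s) unchanged, which is why the product, not the bare probability, matches the delivered current.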

Page 22: Measuring  and  Extracting Proximity  in  Complex Networks

• “Dead end” paths have no influence on escape probability

• Both graphs have the same escape-probability from red to green

Lower red→green escape probability

Higher red→green escape probability

In both cases higher effective conductance by Rayleigh’s Monotonicity Law

Page 23: Measuring  and  Extracting Proximity  in  Complex Networks

Extending escape probability

• The escape probability, Pesc(s→t), is the probability that a r.w. originating at s will reach t before visiting s again

• The cycle-free escape probability, Pc.f.esc(s→t), is the probability that a r.w. originating at s will reach t without visiting any node more than once

• Multiply by degree to get an absolute quantity (accounting for the number of "actually initiated" r.w.'s): the c.f. effective conductance between s and t is Pc.f.esc(s→t)*Deg(s)
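For a graph small enough to enumerate, the cycle-free quantities can be computed directly from the definition. The sketch below is a brute-force illustration, not a practical method: it assumes `(u, v, weight)` edges, takes the probability of a path to be the product of w(u,v)/deg(u) over its steps, and sums over every simple path found by depth-first search. All names are hypothetical.

```python
def build_adjacency(edges):
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def simple_paths(adj, s, t):
    """Yield every simple (cycle-free) path from s to t, depth first."""
    stack = [[s]]
    while stack:
        path = stack.pop()
        if path[-1] == t:
            yield path
            continue
        for nxt in adj[path[-1]]:
            if nxt not in path:
                stack.append(path + [nxt])


def path_probability(adj, path):
    """Probability that a random walk follows exactly this path."""
    prob = 1.0
    for u, v in zip(path, path[1:]):
        prob *= adj[u][v] / sum(adj[u].values())
    return prob


def cycle_free_escape(edges, s, t):
    """Return (Pc.f.esc(s->t), Pc.f.esc(s->t) * Deg(s)) by full enumeration."""
    adj = build_adjacency(edges)
    p = sum(path_probability(adj, path) for path in simple_paths(adj, s, t))
    return p, sum(adj[s].values()) * p


# Hypothetical toy graphs with unit weights.
path = [("s", "a", 1.0), ("a", "t", 1.0)]
dead_end_mid = path + [("a", "d", 1.0)]
two_paths = path + [("s", "b", 1.0), ("b", "t", 1.0)]

for name, net in [("path", path), ("mid", dead_end_mid), ("two", two_paths)]:
    print(name, cycle_free_escape(net, "s", "t"))
```

Unlike the ordinary escape probability, the cycle-free version drops from 1/2 to 1/3 when a dead-end node is attached to the middle of the path, because a walk that wanders into the dead end cannot come back without repeating a node.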

Page 24: Measuring  and  Extracting Proximity  in  Complex Networks

Higher red→green c.f. escape probability

Lower red→green c.f. escape probability

• The c.f. effective conductance is a good candidate proximity measure:

– Accounts for multiple paths

– Favors short paths

– Penalizes high-degree nodes

– Penalizes dead-end paths

– Parameter free

– Has the “right” monotonicity

– Accommodates edge directions

– Has a natural extension to multiple endpoints

Page 25: Measuring  and  Extracting Proximity  in  Complex Networks

Computing c.f. escape probability

• Unlike previous measures, exact computation is impossible

• Practically, we can estimate it extremely well

• Probability of paths declines exponentially (e.g., the 100th path is 10^6 times less probable than the first one)

• Estimate using the most probable paths:

\[ P_{\text{c.f.esc}}(s \to t) = \sum_{\text{simple path } p\,[s \to t]} \mathrm{prob}(p) \]

\[ P_{\text{c.f.esc}}(s \to t) \approx \sum_{\text{highly probable simple path } p\,[s \to t]} \mathrm{prob}(p) \]

Page 26: Measuring  and  Extracting Proximity  in  Complex Networks

Finding k most probable paths

• For an edge u-v of weight w(u,v), define its length:

\[ l(u,v) = -\log \frac{w(u,v)}{\sqrt{\deg(u)\,\deg(v)}} \]

• Edge lengths are positive

• Exp(-<length of path>) = Prob(path)

• Short path ⇔ highly probable path

• Compute k shortest simple paths in O(k|E|log|E|) time [Katoh, Ibaraki and Mine, 1982]

• Stop searching when the probability drops below 10^-6 of that of the first path
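The idea of turning probabilities into lengths and reading paths off in order of increasing length can be sketched as follows. This is not the Katoh, Ibaraki and Mine algorithm: it is a plain best-first search over partial simple paths, exponential in the worst case, and meant only for toy graphs. It also assumes the one-sided edge length -log(w(u,v)/deg(u)), chosen here so that exp(-length) equals the walk probability of the path exactly. The function name and the `ratio` and `k` parameters are hypothetical.

```python
import heapq
import math


def most_probable_paths(edges, s, t, ratio=1e-6, k=100):
    """Return up to k simple s-t paths as (probability, path), most probable first.

    The search stops once a path is less than `ratio` times as probable
    as the first path found.
    """
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    deg = {u: sum(nbrs.values()) for u, nbrs in adj.items()}

    cutoff = -math.log(ratio)      # extra length allowed beyond the first path
    heap = [(0.0, [s])]            # (length so far, partial simple path)
    found = []
    while heap and len(found) < k:
        length, path = heapq.heappop(heap)
        node = path[-1]
        if node == t:
            # Lengths are nonnegative, so complete paths pop in length order.
            if found and length > found[0][0] + cutoff:
                break
            found.append((length, path))
            continue
        for nxt, w in adj[node].items():
            if nxt not in path:
                step = -math.log(w / deg[node])
                heapq.heappush(heap, (length + step, path + [nxt]))
    return [(math.exp(-length), path) for length, path in found]


# Hypothetical toy graph: a 2-hop route and a 3-hop route from s to t.
net = [("s", "a", 1.0), ("a", "t", 1.0),
       ("s", "b", 1.0), ("b", "c", 1.0), ("c", "t", 1.0)]
for prob, path in most_probable_paths(net, "s", "t"):
    print(round(prob, 4), path)
```

Summing the returned probabilities gives the truncated estimate of the cycle-free escape probability; a production version would replace the search with a proper k-shortest-simple-paths routine.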

Page 27: Measuring  and  Extracting Proximity  in  Complex Networks

Extracting and explaining proximity

Page 28: Measuring  and  Extracting Proximity  in  Complex Networks

Extracting proximity

• Cycle free effective conductance (CFEC) depends on the full graph

• Find a small subgraph that captures the most proximity

• A tradeoff between “size” and “captured proximity” can be expressed in alternative ways:

– Extract a subgraph with at most B nodes that captures maximal CFEC

  • Maybe with B+1 nodes we can capture much more???

– Extract a minimal-sized subgraph that captures at least P% of total CFEC

  • Maybe we can capture (P-1)% of total CFEC with a much smaller subgraph???

Page 29: Measuring  and  Extracting Proximity  in  Complex Networks

Extracting proximity

• Find a small subgraph that captures most proximity

• Achieve an efficient balance between “size” and “proximity” by maximizing the ratio:

\[ \frac{\mathrm{CFEC}(s \to t;\ \text{subgraph})^{\alpha}}{|\text{subgraph}|} \]

• Larger α emphasizes proximity ⇒ larger subgraph

– α=0 returns only the shortest path

– α=∞ returns all paths

• Optionally, explicitly fix lower and upper bounds on subgraph size

Page 30: Measuring  and  Extracting Proximity  in  Complex Networks

What solutions do we seek?

• Overlapping paths delivering the most flow

Page 31: Measuring  and  Extracting Proximity  in  Complex Networks

The path merger algorithm

• We already have a collection of paths

• Find the subset of the paths that maximizes

\[ \frac{\mathrm{CFEC}(s \to t;\ \text{subgraph})^{\alpha}}{|\text{subgraph}|} \]

• Combine the selected paths into a “proximity subgraph”

• Overlapping paths are cheaper to add

• An NP-hard problem…

Page 32: Measuring  and  Extracting Proximity  in  Complex Networks

Optimal algorithm

• Scanning all subsets takes O(2^k) time (can we do better?)

• A branch-and-bound pruning significantly reduces running time

• Huge deviations in path-quality make this approach effective, e.g. often it is clear that the best subset must contain the first path(s)

• Prematurely terminate exponential algorithm after scanning “too many” subsets
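The exhaustive scan is easy to write down for a handful of paths. The sketch below makes several assumptions that are not spelled out on the slides: the score of a subset is (captured CFEC)^α divided by the number of nodes in the merged subgraph, the captured CFEC counts every candidate path whose edges all lie in the merged subgraph, and each path comes with a precomputed weight Deg(s)*prob(path). The path weights are made up for illustration, and no branch-and-bound pruning is attempted.

```python
from itertools import combinations


def subset_score(paths, subset, alpha):
    """Score = (captured weight ** alpha) / (number of nodes in the union)."""
    nodes, edges = set(), set()
    for i in subset:
        route = paths[i][1]
        nodes.update(route)
        edges.update(frozenset(e) for e in zip(route, route[1:]))
    # A candidate path is captured if all of its edges are in the union,
    # even when it was not selected explicitly (overlap comes for free).
    captured = sum(
        weight for weight, route in paths
        if all(frozenset(e) in edges for e in zip(route, route[1:]))
    )
    return captured ** alpha / len(nodes)


def best_subset(paths, alpha=1.0):
    """Scan all 2^k - 1 non-empty subsets; return (score, subset indices)."""
    best = (0.0, ())
    for r in range(1, len(paths) + 1):
        for subset in combinations(range(len(paths)), r):
            score = subset_score(paths, subset, alpha)
            if score > best[0]:
                best = (score, subset)
    return best


# Hypothetical candidate paths as (weight, node list), most probable first.
paths = [
    (0.50, ["s", "a", "t"]),
    (0.30, ["s", "b", "t"]),
    (0.10, ["s", "a", "b", "t"]),       # reuses nodes of the first two paths
    (0.01, ["s", "c", "d", "e", "t"]),  # costs three new nodes for little gain
]

for alpha in (0.0, 1.0, 100.0):
    print(alpha, best_subset(paths, alpha))
```

With these made-up weights, α=0 keeps only the first path, α=1 adds the second path and the overlapping third one (which needs no new nodes), and a very large α pulls in every path.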

Page 33: Measuring  and  Extracting Proximity  in  Complex Networks

Agglomerative algorithm

• If the optimal algorithm could not finish, improve the current result with an agglomerative algorithm

• Iteratively, merge the two subsets that maximize the ratio

• Record the best subset discovered
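A greedy merge loop in this spirit might look as follows. It is a sketch under assumptions: it starts from single-path clusters rather than from a partial result of the optimal search, scores a cluster as (captured weight)^α over the node count of the merged subgraph, and uses made-up path weights. The names `agglomerate` and `subset_score` are hypothetical.

```python
from itertools import combinations


def subset_score(paths, subset, alpha):
    """Score = (captured weight ** alpha) / (number of nodes in the union)."""
    nodes, edges = set(), set()
    for i in subset:
        route = paths[i][1]
        nodes.update(route)
        edges.update(frozenset(e) for e in zip(route, route[1:]))
    captured = sum(
        weight for weight, route in paths
        if all(frozenset(e) in edges for e in zip(route, route[1:]))
    )
    return captured ** alpha / len(nodes)


def agglomerate(paths, alpha=1.0):
    """Greedily merge clusters of paths; return the best cluster seen."""
    clusters = [frozenset([i]) for i in range(len(paths))]
    best = max(clusters, key=lambda c: subset_score(paths, c, alpha))
    while len(clusters) > 1:
        # Pick the pair of clusters whose union scores highest.
        a, b = max(
            combinations(clusters, 2),
            key=lambda pair: subset_score(paths, pair[0] | pair[1], alpha),
        )
        merged = a | b
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
        # Keep merging to the end, but remember the best cluster on the way.
        if subset_score(paths, merged, alpha) > subset_score(paths, best, alpha):
            best = merged
    return best


# Hypothetical candidate paths as (weight, node list).
paths = [
    (0.50, ["s", "a", "t"]),
    (0.30, ["s", "b", "t"]),
    (0.10, ["s", "a", "b", "t"]),
    (0.01, ["s", "c", "d", "e", "t"]),
]
print(sorted(agglomerate(paths, alpha=1.0)))
```

Each round evaluates a quadratic number of candidate merges instead of an exponential number of subsets, at the price of possibly missing the optimum.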

Page 34: Measuring  and  Extracting Proximity  in  Complex Networks

Working with large graphs in external storage

• Dealing with the full graph is sometimes infeasible and usually unnecessary

• Prior to running the algorithm, we construct a candidate graph in main memory

• We begin by growing increasing neighborhoods around the endpoints

Page 35: Measuring  and  Extracting Proximity  in  Complex Networks

S T

Page 36: Measuring  and  Extracting Proximity  in  Complex Networks

S T

Dist(S,i)=2  Dist(T,i)=2

Page 37: Measuring  and  Extracting Proximity  in  Complex Networks

S T

Dist(S,i)=3  Dist(T,i)=3

Page 38: Measuring  and  Extracting Proximity  in  Complex Networks

S T

Dist(S,i)=4  Dist(T,i)=4

Page 39: Measuring  and  Extracting Proximity  in  Complex Networks

S T

Dist(S,i)=5  Dist(T,i)=5

Shortest path of length 10

Page 40: Measuring  and  Extracting Proximity  in  Complex Networks

No use for low-probability paths... Paths longer than “24” are unneeded: beyond length 10 + ln(10^6) ≈ 24, a path is less than 10^-6 as probable as the first one.

Most probable path of length 10 was found

Page 41: Measuring  and  Extracting Proximity  in  Complex Networks

S T

Dist(S,i)=12  Dist(T,i)=12

Page 42: Measuring  and  Extracting Proximity  in  Complex Networks

S T

Dist(S,i)=12  Dist(T,i)=12  i

• Stop adding nodes

• Any s-t path through an unscanned node must be longer than “24”, thus useless

• Can we prune the resulting graph?

• Yes!

• From two circles into an ellipse…

Page 43: Measuring  and  Extracting Proximity  in  Complex Networks

Pruning the candidate graph

• We can safely prune a significant portion of the candidate graph

• Use the fact: dist(i,s)+dist(i,t)>L ⇒ all s-t paths going via i are longer than L

• We ignore much less probable paths: paths longer than “24” are not interesting

• Take only nodes within the ellipse defined by: dist(i,s)+dist(i,t)≤24
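The ellipse test is straightforward once distances from both endpoints are known. The sketch below works on a small in-memory graph rather than on neighborhoods grown from external storage, and it assumes undirected edges given as `(u, v, length)` with lengths on the -log(probability) scale. The budget is the shortest s-t length plus ln(1/ratio), which reproduces the “24” of the example (10 + ln 10^6 ≈ 23.8). Function names and the toy graph are hypothetical.

```python
import heapq
import math


def dijkstra(adj, source):
    """Shortest-path length from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, length in adj[u].items():
            nd = d + length
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def prune_candidates(edges, s, t, ratio=1e-6):
    """Keep nodes i with dist(i,s) + dist(i,t) <= budget; return (nodes, budget)."""
    adj = {}
    for u, v, length in edges:
        adj.setdefault(u, {})[v] = length
        adj.setdefault(v, {})[u] = length
    from_s, from_t = dijkstra(adj, s), dijkstra(adj, t)
    # Paths longer than this are less than `ratio` as probable as the best one.
    budget = from_s[t] - math.log(ratio)
    keep = {
        i for i in adj
        if from_s.get(i, math.inf) + from_t.get(i, math.inf) <= budget
    }
    return keep, budget


# Hypothetical toy graph: the best route has length 10, a second route
# has length 12, and a third route of length 30 falls outside the ellipse.
net = [("s", "a", 5.0), ("a", "t", 5.0),
       ("s", "b", 6.0), ("b", "t", 6.0),
       ("s", "c", 15.0), ("c", "t", 15.0)]
nodes, budget = prune_candidates(net, "s", "t")
print(sorted(nodes), round(budget, 2))
```

Any node outside the returned set can only lie on s-t paths longer than the budget, so dropping it does not affect the paths the estimate would use.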

Page 44: Measuring  and  Extracting Proximity  in  Complex Networks

S T

From 2-centers of circles to 2-foci of ellipse

Dist(S,i)=12  Dist(T,i)=12

Dist(S,i)+Dist(T,i)=24

Page 45: Measuring  and  Extracting Proximity  in  Complex Networks

Some statistics…

Page 46: Measuring  and  Extracting Proximity  in  Complex Networks

Distribution of proximities in phone-call network

Page 47: Measuring  and  Extracting Proximity  in  Complex Networks
Page 48: Measuring  and  Extracting Proximity  in  Complex Networks

Distribution of #hops in phone-call network

Page 49: Measuring  and  Extracting Proximity  in  Complex Networks
Page 50: Measuring  and  Extracting Proximity  in  Complex Networks
Page 51: Measuring  and  Extracting Proximity  in  Complex Networks
Page 52: Measuring  and  Extracting Proximity  in  Complex Networks
Page 53: Measuring  and  Extracting Proximity  in  Complex Networks

Summary

• Proposed cycle free effective conductance (CFEC) with a random walk interpretation to measure “proximity” in social networks and other ad-hoc networks

• Described a way of approximating CFEC

• Described a way of visualizing CFEC as a subgraph

• Extended the method to external datasets

• Showed empirical evidence for its utility

Page 54: Measuring  and  Extracting Proximity  in  Complex Networks

Extensions

• Study proximity in other kinds of networks.

• Extend c.f. effective conductance to:

– Multiple endpoints (already demonstrated)

– Directed edges (future work – use k-shortest paths in a directed graph, alg. due to Hershberger et al.)