Traceroute-like exploration of unknown networks: a statistical analysis A. Barrat, LPT, Université Paris-Sud, France I. Alvarez-Hamelin (LPT, France) L. Dall’Asta (LPT, France) A. Vázquez (Notre-Dame University, USA) A. Vespignani (LPT, France) cond-mat/0406404 http://www.th.u-psud.fr/page_perso/Barrat


Page 1: I. Alvarez-Hamelin (LPT, France) L. Dall’Asta (LPT, France)

Traceroute-like exploration of unknown networks: a statistical analysis

A. Barrat, LPT, Université Paris-Sud, France
I. Alvarez-Hamelin (LPT, France)
L. Dall’Asta (LPT, France)
A. Vázquez (Notre-Dame University, USA)
A. Vespignani (LPT, France)

cond-mat/0406404

http://www.th.u-psud.fr/page_perso/Barrat

Page 2: I. Alvarez-Hamelin (LPT, France) L. Dall’Asta (LPT, France)

Plan of the talk

• Apparent complexity of Internet’s structure

• Problem of sampling biases

• Model for traceroute

• Theoretical approach to the traceroute mapping process

• Numerical results

• Conclusions

Page 3: I. Alvarez-Hamelin (LPT, France) L. Dall’Asta (LPT, France)

Internet representation

Many projects (CAIDA, NLANR, RIPE, IPM, PingER...)

Graph representation, at different granularities:

• Multi-probe reconstruction (router level)

• Use of BGP tables for the Autonomous System level (domains)

Page 4: I. Alvarez-Hamelin (LPT, France) L. Dall’Asta (LPT, France)

=>Large-scale visualizations

TOPOLOGICAL ANALYSIS by STATISTICAL TOOLS and GRAPH THEORY !

Page 5: I. Alvarez-Hamelin (LPT, France) L. Dall’Asta (LPT, France)

Main topological characteristics

•Small world

Distribution of Shortest paths (# hops) between two nodes

Page 6: I. Alvarez-Hamelin (LPT, France) L. Dall’Asta (LPT, France)

Main topological characteristics

• Small world: captured by Erdős-Rényi graphs

An edge is established between each pair of vertices with probability p

Poisson degree distribution

<k> = p N
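The construction can be illustrated with a minimal sketch in plain Python. The graph size and the value of p below are illustrative assumptions, not values from the talk; the sketch only checks that the measured average degree is close to p N.

```python
import random

def erdos_renyi(n, p, seed=0):
    """Adjacency sets of a random graph: each pair of vertices is linked with probability p."""
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj

N, p = 1000, 0.01          # illustrative values only
adj = erdos_renyi(N, p)
mean_degree = sum(len(neighbours) for neighbours in adj) / N
print(f"measured <k> = {mean_degree:.2f}, p*N = {p * N:.2f}")
```

The exact expectation is p (N - 1), which is indistinguishable from p N for large N.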

Page 7: I. Alvarez-Hamelin (LPT, France) L. Dall’Asta (LPT, France)

Main topological characteristics

• Small world

• Large clustering: different neighbours of a node will likely know each other

[Figure: a node with neighbours 1, 2, 3, ..., n; the neighbours have a higher probability of being connected]

=> graph models with large clustering, e.g. Watts-Strogatz 1998
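One common way to quantify clustering is the local clustering coefficient: the fraction of pairs of neighbours of a vertex that are themselves linked. The sketch below is an illustration on a hypothetical toy graph, not a computation from the talk.

```python
from itertools import combinations

def local_clustering(adj, v):
    """Fraction of pairs of neighbours of v that are linked to each other."""
    neighbours = adj[v]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))

# Hypothetical toy graph: a triangle 0-1-2 plus a pendant vertex 3 attached to 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
coeffs = {v: local_clustering(adj, v) for v in adj}
print(coeffs)
```

Vertices 1 and 2 sit in a closed triangle and get coefficient 1, while vertex 0 has only one linked pair out of three possible pairs of neighbours.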

Page 8: I. Alvarez-Hamelin (LPT, France) L. Dall’Asta (LPT, France)

Main topological characteristics

• Small world

• Large clustering

• Dynamical network

• Broad connectivity distributions

• also observed in many other contexts (from biological to social networks)

• huge activity of modeling

Page 9: I. Alvarez-Hamelin (LPT, France) L. Dall’Asta (LPT, France)

Main topological characteristics

Broad connectivity distributions: obtained from mapping projects

Govindan et al 2002

Result of a sampling: is this reliable ?

Page 10: I. Alvarez-Hamelin (LPT, France) L. Dall’Asta (LPT, France)

Sampling biases

Internet mapping:

• Sampling is incomplete

• Lateral connectivity is missed (edges are underestimated)

• Finite size sample

=> spanning tree

Page 11: I. Alvarez-Hamelin (LPT, France) L. Dall’Asta (LPT, France)

Sampling biases

● Vertices and edges best sampled in the proximity of sources

● Bad estimation of some topological properties

Statistical properties of the sampled graph may sharply differ from the original one

Lakhina et al. 2002

Clauset & Moore 2004

De Los Rios & Petermann 2004

Guillaume & Latapy 2004

Bad sampling?

Page 12: I. Alvarez-Hamelin (LPT, France) L. Dall’Asta (LPT, France)

Evaluating sampling biases

Real graph G=(V,E) (known, with given properties)

=> simulated sampling

Sampled graph G’=(V’,E’)

=> analysis of G’, comparison with G

Page 13: I. Alvarez-Hamelin (LPT, France) L. Dall’Asta (LPT, France)

Model for traceroute

G=(V, E)

Sources Targets

First approximation: union of shortest paths

NB: Unique Shortest Path

Page 14: I. Alvarez-Hamelin (LPT, France) L. Dall’Asta (LPT, France)

Model for traceroute

G’=(V’, E’)

First approximation: union of shortest paths

Very simple model, but: allows for some analyticaland numerical understanding
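The union-of-shortest-paths model can be sketched in a few lines of Python. This is a minimal illustration, assuming a breadth-first search with a fixed neighbour order to select a single shortest path per source-target pair; the function names and the toy graph are hypothetical.

```python
from collections import deque

def shortest_path(adj, source, target):
    """Return one shortest path from source to target via BFS (None if unreachable)."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            break
        for w in sorted(adj[u]):      # fixed order, so each pair gets a unique path
            if w not in parent:
                parent[w] = u
                queue.append(w)
    if target not in parent:
        return None
    path = [target]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path[::-1]

def traceroute_sample(adj, sources, targets):
    """Union of one shortest path per source-target pair: returns (V', E')."""
    nodes, edges = set(), set()
    for s in sources:
        for t in targets:
            if s == t:
                continue
            path = shortest_path(adj, s, t)
            if path is None:
                continue
            nodes.update(path)
            edges.update(frozenset(e) for e in zip(path, path[1:]))
    return nodes, edges

# Hypothetical toy graph: a 6-cycle 0-1-2-3-4-5-0 with a chord 0-3.
adj = {0: {1, 5, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4, 0}, 4: {3, 5}, 5: {4, 0}}
nodes, edges = traceroute_sample(adj, sources=[0], targets=[2, 4])
print(sorted(nodes), sorted(tuple(sorted(e)) for e in edges))
```

With one source and two targets, the sampled graph is a tree rooted at the source: it contains 4 of the 7 edges and misses vertex 5 and the lateral edges 2-3, 4-5 and 5-0.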

Page 15: I. Alvarez-Hamelin (LPT, France) L. Dall’Asta (LPT, France)

More formally...

$G = (V, E)$: sparse undirected graph with a set of

• $N_S$ sources $S = \{ i_1, i_2, \ldots, i_{N_S} \}$

• $N_T$ targets $T = \{ j_1, j_2, \ldots, j_{N_T} \}$

randomly placed.

The sampled graph $G' = (V', E')$ is obtained by considering the union of all the traceroute-like paths connecting source-target pairs.

PARAMETERS: $N$, $\rho_S = N_S / N$, $\rho_T = N_T / N$, $\varepsilon = N_S N_T / N = \rho_S \rho_T N$ (probing effort)

Usually $N_S = O(1)$, $\rho_T = O(1)$
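As a worked example of these parameters, the sketch below assumes the definitions ρ_S = N_S/N, ρ_T = N_T/N and ε = N_S N_T / N (densities of sources and targets, and probing effort). The numbers are illustrative only and do not come from the talk.

```python
def probing_parameters(n, n_sources, n_targets):
    """Return (rho_S, rho_T, epsilon) for a graph with n vertices."""
    rho_s = n_sources / n                 # density of sources
    rho_t = n_targets / n                 # density of targets
    epsilon = n_sources * n_targets / n   # probing effort
    return rho_s, rho_t, epsilon

# Illustrative numbers only: few sources, a finite density of targets.
rho_s, rho_t, epsilon = probing_parameters(n=10_000, n_sources=2, n_targets=1_000)
print(rho_s, rho_t, epsilon)
```

Note that ε equals ρ_S ρ_T N, so the product ρ_S ρ_T stays small when the number of sources is O(1).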

Page 16: I. Alvarez-Hamelin (LPT, France) L. Dall’Asta (LPT, France)

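Analysis of the mapping process

For each set $\Omega = \{ S, T \}$, the indicator function that a given edge $(i, j)$ belongs to the sampled graph is

$$\pi_{ij} = 1 - \prod_{l \neq m} \left( 1 - \sum_{s=1}^{N_S} \delta_{l, i_s} \sum_{t=1}^{N_T} \delta_{m, j_t}\, \sigma_{ij}^{(l,m)} \right)$$

with

$$\sigma_{ij}^{(l,m)} = \begin{cases} 1 & \text{if } (i,j) \in \text{path between } l, m \\ 0 & \text{otherwise,} \end{cases}$$

and

$$\left\langle \sum_{s=1}^{N_S} \delta_{i, i_s} \right\rangle = \frac{N_S}{N} = \rho_S, \qquad \left\langle \sum_{t=1}^{N_T} \delta_{i, j_t} \right\rangle = \frac{N_T}{N} = \rho_T$$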

Page 17: I. Alvarez-Hamelin (LPT, France) L. Dall’Asta (LPT, France)

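Mean-field statistical analysis

Averaging over all the possible realizations of the set $\Omega = \{ S, T \}$,

$$\langle \pi_{ij} \rangle = 1 - \left\langle \prod_{l \neq m} \left( 1 - \sum_{s=1}^{N_S} \sum_{t=1}^{N_T} \delta_{l, i_s} \delta_{m, j_t}\, \sigma_{ij}^{(l,m)} \right) \right\rangle \simeq 1 - \prod_{l \neq m} \left( 1 - \rho_T \rho_S\, \sigma_{ij}^{(l,m)} \right)$$

WE HAVE NEGLECTED CORRELATIONS !!

• usually $\rho_S \rho_T \ll 1$

$$\langle \pi_{ij} \rangle \simeq 1 - \exp( - \rho_S \rho_T\, b_{ij} )$$

where $b_{ij}$ is the betweenness of the edge $(i, j)$.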
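The passage from the product over pairs to the exponential form can be sketched as follows. This is an illustrative derivation, assuming $\rho_S \rho_T \ll 1$ (so that $1 - x \simeq e^{-x}$ for each factor) and a unique shortest path per pair, in which case the edge betweenness is $b_{ij} = \sum_{l \neq m} \sigma_{ij}^{(l,m)}$:

```latex
\begin{align*}
\langle \pi_{ij} \rangle
  &\simeq 1 - \prod_{l \neq m} \left( 1 - \rho_T \rho_S\, \sigma_{ij}^{(l,m)} \right) \\
  &\simeq 1 - \prod_{l \neq m} \exp\left( - \rho_T \rho_S\, \sigma_{ij}^{(l,m)} \right) \\
  &= 1 - \exp\Big( - \rho_T \rho_S \sum_{l \neq m} \sigma_{ij}^{(l,m)} \Big)
   = 1 - \exp\left( - \rho_S \rho_T\, b_{ij} \right).
\end{align*}
```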

Page 18: I. Alvarez-Hamelin (LPT, France) L. Dall’Asta (LPT, France)

Betweenness centrality b

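for each pair of nodes $(l, m)$ in the graph, there are

• $\sigma^{(l,m)}$ shortest paths between $l$ and $m$

• $\sigma_{ij}^{(l,m)}$ shortest paths going through $ij$

$b_{ij}$ is the sum of $\sigma_{ij}^{(l,m)} / \sigma^{(l,m)}$ over all pairs $(l, m)$

Similar concept: node betweenness $b_i$

[Figure: example graph with nodes i, j, k; edge ij has large betweenness, edge jk has small betweenness]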

Also: flow of information if each individual sends a message to all other individuals
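The definition of edge betweenness can be checked by brute force on a small graph. The sketch below is an illustration, not the authors' code: it sums over ordered pairs (l, m), counts shortest paths with a breadth-first search, and uses a hypothetical toy graph made of two triangles joined by a bridge.

```python
from collections import deque

def bfs_counts(adj, s):
    """Distances from s and number of shortest paths from s to each reachable vertex."""
    dist, count = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                count[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                count[w] += count[u]
    return dist, count

def edge_betweenness(adj, i, j):
    """Sum over ordered pairs (l, m), l != m, of the fraction of shortest paths using edge (i, j)."""
    info = {v: bfs_counts(adj, v) for v in adj}
    total = 0.0
    for l in adj:
        dist_l, count_l = info[l]
        for m in adj:
            if m == l or m not in dist_l:
                continue
            through = 0
            for a, b in ((i, j), (j, i)):   # the edge can be crossed in either direction
                dist_b, count_b = info[b]
                if a in dist_l and m in dist_b and dist_l[a] + 1 + dist_b[m] == dist_l[m]:
                    through += count_l[a] * count_b[m]
            total += through / count_l[m]
    return total

# Hypothetical toy graph: triangles 0-1-2 and 3-4-5 joined by the bridge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
b_bridge = edge_betweenness(adj, 2, 3)
b_inner = edge_betweenness(adj, 0, 1)
print(b_bridge, b_inner)
```

The bridge carries every path between the two triangles (3 x 3 pairs, counted in both directions), while the edge 0-1 is used only by its own endpoints, which gives the minimum value of 2.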

Page 19: I. Alvarez-Hamelin (LPT, France) L. Dall’Asta (LPT, France)

Consequences of the analysis

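$$\langle \pi_{ij} \rangle \simeq 1 - \exp( - \rho_S \rho_T\, b_{ij} )$$

• Smallest betweenness $b_{ij} = 2$ => $\langle \pi_{ij} \rangle \simeq 2 \rho_S \rho_T$ (i.e. i and j have to be source and target to discover i-j)

• Largest betweenness $b_{ij} = O(N^2)$ => $\langle \pi_{ij} \rangle \simeq 1$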
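The two limits can be checked numerically with the mean-field expression 1 - exp(-ρ_S ρ_T b_ij). The values of N, ρ_S and ρ_T below are illustrative assumptions, not values from the talk.

```python
import math

def edge_discovery_probability(rho_s, rho_t, b_ij):
    """Mean-field estimate 1 - exp(-rho_S * rho_T * b_ij), computed with expm1 for accuracy."""
    return -math.expm1(-rho_s * rho_t * b_ij)

N = 10_000                      # illustrative graph size
rho_s, rho_t = 2 / N, 0.1       # illustrative densities of sources and targets

p_min = edge_discovery_probability(rho_s, rho_t, 2)        # smallest betweenness
p_max = edge_discovery_probability(rho_s, rho_t, N ** 2)   # betweenness of order N^2
print(p_min, 2 * rho_s * rho_t, p_max)
```

For the smallest betweenness the estimate is essentially the linear term 2 ρ_S ρ_T, while for betweenness of order N^2 it saturates at 1.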

Page 20: I. Alvarez-Hamelin (LPT, France) L. Dall’Asta (LPT, France)

Results for the vertices

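With the probing effort $\varepsilon = \rho_S \rho_T N$:

$\langle \pi_{ij} \rangle \simeq 1 - \exp( - \varepsilon\, b_{ij} / N )$

$\langle \pi_i \rangle \simeq 1 - (1 - \rho_T) \exp( - \varepsilon\, b_i / N )$ (discovery probability)

$N^*_k / N_k \simeq 1 - \exp( - \varepsilon\, b(k) / N )$ (discovery frequency)

$\langle k^* \rangle / k \simeq \varepsilon ( 1 + b(k)/N ) / k$ (discovered connectivity)

Summary:

• discovery probability strongly related to the centrality;

• vertex discovery favored by a finite density of targets $\rho_T$;

• accuracy increased by increasing probing effort $\varepsilon$.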

Page 21: I. Alvarez-Hamelin (LPT, France) L. Dall’Asta (LPT, France)

Numerical simulations

1. Homogeneous graphs:

• peaked distributions of k and b

• narrow range of betweenness

[Formula for the betweenness range, involving b_max; not recoverable from the extracted text]

(ex: ER random graphs)

=> Good sampling expected only for high probing effort

Page 22: I. Alvarez-Hamelin (LPT, France) L. Dall’Asta (LPT, France)

Numerical simulations

1. Homogeneous graphs

2. Heavy-tailed graphs

• broad distributions of k and b: $P(k) \sim k^{-3}$

• large range of available values

(ex: Scale-free BA model)

=> Expected: Hubs well-sampled independently of $\varepsilon$ ($b(k) \sim k^{\beta}$)
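A heavy-tailed test graph of this kind can be generated with a preferential-attachment rule. The sketch below is a simple illustration of the Barabási-Albert construction, not the code used for the talk; the size n and the number m of links per new vertex are illustrative assumptions.

```python
import random

def barabasi_albert(n, m, seed=0):
    """Grow a graph where each new vertex links to m existing vertices chosen proportionally to degree."""
    rng = random.Random(seed)
    # Start from a complete graph on m + 1 vertices.
    adj = {v: {u for u in range(m + 1) if u != v} for v in range(m + 1)}
    # Each vertex appears in this list once per incident edge (degree-weighted sampling).
    stubs = [v for v in adj for _ in adj[v]]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(stubs))
        adj[new] = set()
        for old in chosen:
            adj[new].add(old)
            adj[old].add(new)
            stubs.extend((new, old))
    return adj

n, m = 3000, 2                  # illustrative values only
graph = barabasi_albert(n, m)
degrees = [len(graph[v]) for v in graph]
mean_k = sum(degrees) / n
max_k = max(degrees)
print(f"<k> = {mean_k:.2f}, k_max = {max_k}")
```

The largest degree is far above the average, in contrast with a homogeneous random graph where degrees stay close to the mean.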

Page 23: I. Alvarez-Hamelin (LPT, France) L. Dall’Asta (LPT, France)

Numerical simulations

Homogeneous graphs

Homogeneously pretty badly sampled

Page 24: I. Alvarez-Hamelin (LPT, France) L. Dall’Asta (LPT, France)

Numerical simulations

Scale-free graphs

Hubs are well discovered

Page 25: I. Alvarez-Hamelin (LPT, France) L. Dall’Asta (LPT, France)

Numerical simulations

Homogeneous graphs

No heavy-tail behaviour except...

• heavy-tailed P*(k) only for NS = 1 (cf. Clauset and Moore 2004)

• cut-off around <k> => large, unrealistic <k> needed

• bad sampling of P(k)

[Figure panel: NS = 1]

Page 26: I. Alvarez-Hamelin (LPT, France) L. Dall’Asta (LPT, France)

Numerical simulations

Scale-free graphs

• good sampling, especially of the heavy tail;

• almost independent of NS;

• slight bending for low-degree (less central) nodes => bad evaluation of the exponents.

Page 27: I. Alvarez-Hamelin (LPT, France) L. Dall’Asta (LPT, France)

Summary

Analytical approach to a traceroute-like sampling process

Link with the topological properties, in particular the betweenness

Usual random graphs more “difficult” to sample than heavy-tailed graphs

Heavy tails well sampled

Bias yielding a sampled scale-free network from a homogeneous network: only in few cases

<k> has to be unrealistically large

Heavy-tailed properties are a genuine feature of the Internet

however

Quantitative analysis might be strongly biased (wrong exponents...)

Page 28: I. Alvarez-Hamelin (LPT, France) L. Dall’Asta (LPT, France)

Perspectives

• Optimized strategies:

  • separate influence of T, S

  • location of sources, targets

• Investigation of other networks

• Results on redundancy issues

• Massive deployment: traceroute@home

and...

• The Internet is a weighted network: bandwidth, traffic, efficiency, router capacity

• Data are scarce and on a limited scale

• Interaction between topology and traffic

cf. also Guillaume and Latapy 2004