40
Should we build Gnutella on a structured overlay? Ant Rowstron joint work with Miguel Castro, Manuel Costa Microsoft Research Cambridge

Should we build Gnutella on a structured overlay? · Can we build Gnutella on a structured overlay? •Complex queries are important –unstructured overlays support them –structured

  • Upload
    others

  • View
    15

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Should we build Gnutella on a structured overlay? · Can we build Gnutella on a structured overlay? •Complex queries are important –unstructured overlays support them –structured

Should we build Gnutella on astructured overlay?

Ant Rowstron

joint work with

Miguel Castro, Manuel Costa

Microsoft Research Cambridge

Page 2: Should we build Gnutella on a structured overlay? · Can we build Gnutella on a structured overlay? •Complex queries are important –unstructured overlays support them –structured

Structured P2P overlay networks

• Structured overlay network maps keys to nodes• Routes messages to keys; (can implement hash table)

overlay network with N nodes

k,v

[CAN, Chord, Kademlia, Pastry, Skipnets, Tapestry, Viceroy]

route(“insert v”, k)

route(“lookup”, k) v

Page 3: Should we build Gnutella on a structured overlay? · Can we build Gnutella on a structured overlay? •Complex queries are important –unstructured overlays support them –structured

Mapping keys to nodes

• Large id space (128-bit integers)

• NodeIds picked randomly from space

• Key is managed by its root node:

• Live node with id closest to the key

root nodefor key

id space

nodeIdkey

Page 4: Should we build Gnutella on a structured overlay? · Can we build Gnutella on a structured overlay? •Complex queries are important –unstructured overlays support them –structured

Pastry

2033*2032*2031*2030*

203*202*201*200*

23*22*21*20*

3*2*1*0*

203231

• routing table• nodeIds and keys in some base 2b (e.g., 4)• prefix constraints on nodeIds for each slot

leaf set

nodeId

Page 5: Should we build Gnutella on a structured overlay? · Can we build Gnutella on a structured overlay? •Complex queries are important –unstructured overlays support them –structured

Structured overlays• Overlay topology

– nodes self organize into structured graph– node identity constrains set of neighbors

• Data placement– data identified by a key– data stored at node responsible for key

• Queries– efficient key lookups (O(logN))

examples: CAN, Chord, Pastry, Tapestry

Page 6: Should we build Gnutella on a structured overlay? · Can we build Gnutella on a structured overlay? •Complex queries are important –unstructured overlays support them –structured

Gnutella

• Nodes form random graph (unstructured overlay)• Node stores its own published content• Lookups flooded through network (inefficient)

route(“insert v”)

v

overlay network with N nodes

route(“lookup”, reg. exp)

Page 7: Should we build Gnutella on a structured overlay? · Can we build Gnutella on a structured overlay? •Complex queries are important –unstructured overlays support them –structured

Gnutella

• Nodes form random graph (unstructured overlay)• Node stores its own published content• Lookup using random walks (needles and haystacks!)

route(“insert v”)

v

overlay network with N nodes

route(“lookup”, reg. exp)

Page 8: Should we build Gnutella on a structured overlay? · Can we build Gnutella on a structured overlay? •Complex queries are important –unstructured overlays support them –structured

Unstructured overlay

• Overlay topology– nodes self-organize into random graph

• Data placement– node stores data it publishes

• Queries– overlay supports arbitrarily complex queries– floods or random walks disseminate query– each node evaluates query locally

example: Gnutella

Page 9: Should we build Gnutella on a structured overlay? · Can we build Gnutella on a structured overlay? •Complex queries are important –unstructured overlays support them –structured

Can we build Gnutella on astructured overlay?

• Complex queries are important– unstructured overlays support them

– structured overlays do support them

• Peers are extremely transient– unstructured overlays more robust to churn

– structured overlays have higher overhead

[Chawathe et al. SIGCOMM’03]

Page 10: Should we build Gnutella on a structured overlay? · Can we build Gnutella on a structured overlay? •Complex queries are important –unstructured overlays support them –structured

Complex queries

• Arbitrarily complex queries– Unstructured overlay

• Flood– High overhead due to duplicates

• Random walks– High lookup latency

– Support arbitrarily complex queries

– Structured overlays• ?

Page 11: Should we build Gnutella on a structured overlay? · Can we build Gnutella on a structured overlay? •Complex queries are important –unstructured overlays support them –structured

Complex queries (structured)

• Structured overlay topology– nodes self organize into structured graph

• Same data placement as unstructured– node stores data it publishes

• Same queries as unstructured– overlay supports arbitrarily complex queries– floods or random walks disseminate queries– each node evaluates query locally

Page 12: Should we build Gnutella on a structured overlay? · Can we build Gnutella on a structured overlay? •Complex queries are important –unstructured overlays support them –structured

Flood queries

0x

1x

2x

3x

• Exploit structure to avoid duplicates

Page 13: Should we build Gnutella on a structured overlay? · Can we build Gnutella on a structured overlay? •Complex queries are important –unstructured overlays support them –structured

03x

0x

1x

2x

3x

Flood queries 00x

01x

02x

Page 14: Should we build Gnutella on a structured overlay? · Can we build Gnutella on a structured overlay? •Complex queries are important –unstructured overlays support them –structured

Random walk queries 1

Page 15: Should we build Gnutella on a structured overlay? · Can we build Gnutella on a structured overlay? •Complex queries are important –unstructured overlays support them –structured

Random walk queries 2

Page 16: Should we build Gnutella on a structured overlay? · Can we build Gnutella on a structured overlay? •Complex queries are important –unstructured overlays support them –structured

03x

0x

1x

2x

3x

Random walkqueries 3

00x

01x

02x• Exploiting routing tables

• Breadth-firstsearch

Page 17: Should we build Gnutella on a structured overlay? · Can we build Gnutella on a structured overlay? •Complex queries are important –unstructured overlays support them –structured

Story so far….

• Gnutella is built using an unstructured overlay• Described hybrid approach

– Structured overlay graph– Unstructured overlay data placement

• Described how to exploit structure in lookup– Same techniques as in an unstructured overlay– Implemented more efficiently

Next part: Churn and overhead

Page 18: Should we build Gnutella on a structured overlay? · Can we build Gnutella on a structured overlay? •Complex queries are important –unstructured overlays support them –structured

Overhead

• Both structured and unstructured– detect failures

– repair overlay graph when nodes join or leave

Page 19: Should we build Gnutella on a structured overlay? · Can we build Gnutella on a structured overlay? •Complex queries are important –unstructured overlays support them –structured

Detecting failures

• Probe neighbors in overlay

• Exploit symmetric state– Heartbeats versus probes

• Number of heartbeats is number of neighbors• Supress heartbeats with application traffic

Page 20: Should we build Gnutella on a structured overlay? · Can we build Gnutella on a structured overlay? •Complex queries are important –unstructured overlays support them –structured

Exploiting structure for maintenance

• Heartbeat sent to neighbor on the left• Probe node if no heartbeat• Tell others about failure if no probe reply

• Leads to lower overhead

Page 21: Should we build Gnutella on a structured overlay? · Can we build Gnutella on a structured overlay? •Complex queries are important –unstructured overlays support them –structured

Comparing overhead

• Unstructured overlay (Gnutella 0.4)– Max and min bounds placed on # neighbors

– Node discovery on join using random walks

– Failure detection heartbeat every 30 seconds

• Structured overlay (MS Pastry)– Leafsets

• Failure detection using heartbeats every 30 seconds

– Routing table• Failure detection using probes (tuned to churn)

Page 22: Should we build Gnutella on a structured overlay? · Can we build Gnutella on a structured overlay? •Complex queries are important –unstructured overlays support them –structured

Experimental comparison

• Discrete event simulator– Transit-stub network topology

• UW trace of node arrivals and departures– [Saroiu et al. MMCN’02]

– 60 hours trace

– average session = 2.3 hours, median ~ 1 hour

– Active nodes varies between 2,700 and 1,300

Page 23: Should we build Gnutella on a structured overlay? · Can we build Gnutella on a structured overlay? •Complex queries are important –unstructured overlays support them –structured

Gnutella trace: Failure rate

0.00E+00

5.00E-05

1.00E-04

1.50E-04

2.00E-04

2.50E-04

3.00E-04

0 10 20 30 40 50 60

Time (Hours)

Nod

e fa

ilure

s pe

r se

cond

per

nod

e

Page 24: Should we build Gnutella on a structured overlay? · Can we build Gnutella on a structured overlay? •Complex queries are important –unstructured overlays support them –structured

Overhead: Configuration

• Gnutella 0.4 (4)– Min neighbors 4, max neighbors 8 (avg. 5.8)

• Gnutella 0.4 (8)– Min neighbors 8, max neighbors 32 (avg. 11)

• Pastry– b=1, no proximity neighbor selection, l = 32

Page 25: Should we build Gnutella on a structured overlay? · Can we build Gnutella on a structured overlay? •Complex queries are important –unstructured overlays support them –structured

Overhead: Maintenance

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

0 10 20 30 40 50 60

Time(hours)

Mes

sag

es /

seco

nd

/ n

od

e

Gnutella 0.4 (8)

Gnutella 0.4 (4)

Pastry

Page 26: Should we build Gnutella on a structured overlay? · Can we build Gnutella on a structured overlay? •Complex queries are important –unstructured overlays support them –structured

Gnutella 0.6 (SuperPeers)

• Super peers form random graph– Uses Gnutella 0.4 algorithm

• Normal nodes use super peers as proxies– Failure detections using heartbeats (30 secs)

– Connect to multiple super peers

Page 27: Should we build Gnutella on a structured overlay? · Can we build Gnutella on a structured overlay? •Complex queries are important –unstructured overlays support them –structured

SuperPastry

• Super peers form Pastry overlay

• Normal nodes use super peers as proxies– Failure detections using heartbeats (30 secs)

Page 28: Should we build Gnutella on a structured overlay? · Can we build Gnutella on a structured overlay? •Complex queries are important –unstructured overlays support them –structured

Overhead: Configuration

• 0.2 probability of node being a super peer

• Gnutella 0.6 configured:– Min neighbours = 10

– Max neighbours = 32

• SuperPastry configured– Max in-degree from routing table = 32

• Super peers proxy for 30 normal nodes

• Normal nodes pick 3 super peers

Page 29: Should we build Gnutella on a structured overlay? · Can we build Gnutella on a structured overlay? •Complex queries are important –unstructured overlays support them –structured

Overhead: Maintenance

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0 10 20 30 40 50 60

Time(hours)

Mes

sag

es /

seco

nd

/ n

od

e

Gnutella 0.6

SuperPastry

Page 30: Should we build Gnutella on a structured overlay? · Can we build Gnutella on a structured overlay? •Complex queries are important –unstructured overlays support them –structured

Gia [Chawathe et al. SIGCOMM’03]

• Adapts overlay to exploit heterogeneity– Uses a per-node metric of satisfaction

– Seeks new neighbors if unsatisfied

– Use parameters in Sigcomm Paper

– Neighbors [min = 3, max = max(3,min(128,C/4)) ]• Average 15.8

0.001

0.049

0.30

0.45

0.20

Probability

128

125

25

3

3

NeighborsCapacity

1000

10000

100

10

1

Page 31: Should we build Gnutella on a structured overlay? · Can we build Gnutella on a structured overlay? •Complex queries are important –unstructured overlays support them –structured

HeteroPastry

• Routing table neighbor selection usingcapacity metric

• Uses routing table in-degree bound– Calculated as for Gia

Page 32: Should we build Gnutella on a structured overlay? · Can we build Gnutella on a structured overlay? •Complex queries are important –unstructured overlays support them –structured

Overhead: Maintenance

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0 10 20 30 40 50 60

Time(hours)

Mes

sag

es /

seco

nd

/ n

od

e

Gia

HeteroPastry

Page 33: Should we build Gnutella on a structured overlay? · Can we build Gnutella on a structured overlay? •Complex queries are important –unstructured overlays support them –structured

The story so far….

• Both structured and unstructured– detect failures

– repair overlay graph when nodes join or leave

• Structured exploits structure– Lower overheads

• Unstructured overlays sensitive to neighbors choice– Random walks between node discovery

Finally: Putting it all together….

Page 34: Should we build Gnutella on a structured overlay? · Can we build Gnutella on a structured overlay? •Complex queries are important –unstructured overlays support them –structured

Search: Configuration

• eDonkey file trace [Fessant et al. IPTPS’04]– 37,000 peers (25,172 contribute no files)

– 923,000 unique files (heavy tail zipf-like)

• Each node performs 0.01 lookups persecond (using a Poisson process)– Random walks TTL 128

• One hop replication [Chawathe et al. SIGCOMM’03]

– Uses routing table in structured overlays (***)

Page 35: Should we build Gnutella on a structured overlay? · Can we build Gnutella on a structured overlay? •Complex queries are important –unstructured overlays support them –structured

Search: Messages

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2

0 10 20 30 40 50 60Time (hours)

Mes

sag

es /

seco

nd

/ n

od

e

GiaGnutella 0.6SuperPastryHeteroPastry

Page 36: Should we build Gnutella on a structured overlay? · Can we build Gnutella on a structured overlay? •Complex queries are important –unstructured overlays support them –structured

Search: Success rate

0

0.2

0.4

0.6

0.8

1

1.2

0 10 20 30 40 50 60Time(hours)

Su

cces

s ra

te

HeteroPastryGiaGnutella 0.6SuperPastry

Page 37: Should we build Gnutella on a structured overlay? · Can we build Gnutella on a structured overlay? •Complex queries are important –unstructured overlays support them –structured

Search: Delay

0

5000

10000

15000

20000

25000

30000

0 10 20 30 40 50 60Time(hours)

Del

ay (

ms)

Gnutella 0.6SuperPastryGiaHeteroPastry

Page 38: Should we build Gnutella on a structured overlay? · Can we build Gnutella on a structured overlay? •Complex queries are important –unstructured overlays support them –structured

Conclusions

• Structure can improve Gnutella– Handles transient peers well

– Exploits structure to reduce maintenance overhead

– Supports complex queries

– Can also support DHT functionality

– Can exploit heterogenity

Page 39: Should we build Gnutella on a structured overlay? · Can we build Gnutella on a structured overlay? •Complex queries are important –unstructured overlays support them –structured

And finally a question…

Does structure make security easier?

For slides:

http://www.research.microsoft.com/~antr/camb-ast.ppt

For more information:

http://www.research.microsoft.com/~antr/Pastry

Page 40: Should we build Gnutella on a structured overlay? · Can we build Gnutella on a structured overlay? •Complex queries are important –unstructured overlays support them –structured

Flooding queries

• exploit structure to avoid duplicates

• flooding a query q– if node is source of q do

for each routing table row r

send <flood, q, r> to nodes in row r

– if node receives <flood, q, s> do

for each routing table row r such that r > s

send <flood, q, r> to nodes in row r

• recursively partitions nodes into disjoint sets