Structured overlays: Self-organization and Scalability @ SASO 2009 1
Structured Overlays: Self-organization and Scalability
by
Anwitaman Datta – Nanyang Technological University, Singapore – [email protected]
Ali Ghodsi – UC Berkeley, USA
The P2P paradigm: a brief introduction
Part I
Outline
• The P2P paradigm: history and philosophy
• P2P in the realm of distributed systems
  – Concepts: decentralization, self-organization, overlays
• The resource location problem at large
  – Structured overlay networks
  – Unstructured overlay networks
P2P is more than just Pirate-to-Pirate file-sharing & distributing illegal copies!
The P2P paradigm
Sharing resources in large-scale networks
[Figure: resources shared (knowledge, bandwidth, storage, processing, content) by networks of Homo sapiens]
The P2P paradigm: Application Perspective
• Centralized solutions are undesirable or unattainable
• Exploit resources at the edge
  – No dedicated infrastructure/servers
  – Peers act as both clients and servers ("servents")
• Autonomous participants
  – Large scale
  – Dynamic system and workload
  – Sources of unpredictability, e.g., correlated failures
• Lack of global control or knowledge
  – Rely on self-organization
The P2P paradigm: Systems Perspective
• So where does P2P fit in the realm of distributed systems?
A collection of (probably heterogeneous) automata whose distribution is transparent to the user so that the system appears as one local machine. This is in contrast to a network, where the user is aware that there are several machines, and their location, storage replication, load balancing and functionality is not transparent.
[http://foldoc.org/index.cgi?distributed+system]
– In its loosest sense, a distributed system is any system with several nodes and a network between them
P2P is just distributed systems
Acknowledgement: The following discussion on how p2p paradigm fits in the realm of distributed systems is inspired by J. Kangasharju’s take on the same issue.
• The definition (representing the traditional view of distributed systems) implies a managed and controlled entity that acts as a single, logical system
  – Often it also relies on dedicated infrastructure
• In contrast, P2P is decentralized and is neither controlled nor managed. P2P uses individually unreliable, autonomous participants and generally relies on self-organization.
  – Still, ideally, the system should provide some overall reliability guarantees
P2P in the realm of distributed systems
• Grid
  – "Coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations." (Ian Foster)
• Note that a Grid is generally centralized
P2P in the realm of distributed systems
• Ad-hoc networks
  – "A wireless ad hoc network is a decentralized wireless network. The network is ad hoc because each node is willing to forward data for other nodes, and so the determination of which nodes forward data is made dynamically based on the network connectivity. This is in contrast to wired networks, in which routers perform the task of routing. It is also in contrast to managed (infrastructure) wireless networks, in which a special node known as an access point manages communication among the other nodes."
  – Can be seen as a "kind of" peer-to-peer network
    • Though often very different research communities are involved, and the focus of problems and functionality is also very different
P2P in the realm of distributed systems
Self-organization
• Self-organizing systems are common in nature
  – Physics, biology, ecology, economics, sociology, cybernetics
  – Microscopic (local) interactions
  – Limited information, individual decisions
• Distribution of control => decentralization
  – Symmetry in roles: peer-to-peer
• Emergence of macroscopic (global) properties
  – Resilience
    • Fault tolerance as well as recovery
  – Adaptivity
Resource discovery in the large - Structured overlay basics
Part II
Structured overlays/Distributed hash tables
what it is
What’s a Distributed Hash Table?
• An ordinary hash table … which is distributed

  Key        Value
  Anwitaman  Singapore
  Ali        Berkeley
  Alberto    Trento
  Kurt       Kassel
  Ozalp      Bologna
  Randy      Berkeley

• Every node provides a lookup operation
  – Given a key: return the associated value
• Nodes keep routing pointers
  – If an item is not found locally, route to another node
Why’s that interesting?
• Characteristic properties
  – Self-management in the presence of joins/leaves/failures
    • Routing information
    • Data items
  – Scalability
    • The number of nodes can be huge
    • The number of items can be huge
short interlude
applications
Name-based communication pattern
• Map node names to locations
  – Can store all kinds of contact information
    • Mediator peers for NAT hole punching
    • Profile information
• Used this way by:
  – Host Identity Protocol (HIP)
  – P2P Session Initiation Protocol (P2PSIP)
  – Wuala
  – Internet Indirection Infrastructure (i3)
  Key      Value
  anwita   130.237.32.51
  ali      193.10.64.99
  alberto  18.7.22.83
  ozalp    128.178.50.12
  …        …

[Figure: the table partitioned across nodes A-D]
Global File System
• Similar to a DFS (e.g., NFS, AFS)
  – But files/metadata are stored in the directory
  – E.g., Wuala, WheelFS, …
• What is new?
  – Application logic is self-managed
    • Add/remove servers on the fly
    • Automatic failure handling
    • Automatic load-balancing
  – No manual configuration for these operations
  Key       Value
  /home/... 130.237.32.51
  /usr/…    193.10.64.99
  /boot/…   18.7.22.83
  /etc/…    128.178.50.12
  …         …

[Figure: the table partitioned across nodes A-D]
P2P Proxy
• A distributed web proxy/cache
  – Every node in the LAN runs a DHT client
• Browsing for a page:
  – Check the DHT
    • If the page exists locally, download it from that peer
  – Otherwise, fetch and cache it
• Seamlessly add/remove workstations
  – No central servers
• Example:
  – Squirrel

  Key      Value
  www.s... 130.237.32.51
  www2…    193.10.64.99
  www3…    18.7.22.83
  cs.edu   128.178.50.12
  …        …

[Figure: the table partitioned across nodes A-D]
P2P Web Servers
• Distributed web server
  – Pages stored in the directory
• What is new?
  – Application logic is self-managed
    • Automatically load-balances
    • Add/remove servers on the fly
    • Automatically handles failures
• Example:
  – CoralCDN

  Key      Value
  www.s... 130.237.32.51
  www2     193.10.64.99
  www3     18.7.22.83
  cs.edu   128.178.50.12
  …        …

[Figure: the table partitioned across nodes A-D]
Access Layers for DHTs
• A relational view of the DHT (PIER)
  – Use SQL to fetch data
  – Standard operations (projection, selection, equi-join)

    select name, salary
    from emp, sal
    where emp.id = sal.f_id

• Approximate matching (CUBIT)
  – Get the k items with keys most similar to a given key

    get("arwitanam", 1) returns ("anwita" : "130")

[Figure: two DHT key-value tables (URL to address; name to salary) partitioned across nodes A-D]
towards DHT construction
consistent hashing
Hash tables
• Ordinary hash tables
  – put(key, value)
    • Store <key,value> in bucket hash(key) mod 7
  – get(key)
    • Fetch <key,v> such that <key,v> is in bucket hash(key) mod 7

Buckets: 0 1 2 3 4 5 6
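The put/get pair above can be sketched as a tiny Python hash table with 7 buckets (the bucket count and the key-value pairs are illustrative):

```python
# Minimal sketch of an ordinary hash table: <key, value> pairs live in
# bucket hash(key) mod NUM_BUCKETS.
NUM_BUCKETS = 7
buckets = [[] for _ in range(NUM_BUCKETS)]

def put(key, value):
    b = buckets[hash(key) % NUM_BUCKETS]
    for i, (k, _) in enumerate(b):
        if k == key:              # overwrite an existing key
            b[i] = (key, value)
            return
    b.append((key, value))

def get(key):
    for k, v in buckets[hash(key) % NUM_BUCKETS]:
        if k == key:
            return v
    return None

put("Anwitaman", "Singapore")
put("Ali", "Berkeley")
```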
DHT by mimicking hash tables
• Let each bucket be a server
  – n servers means n buckets
• Problem
  – How do we remove or add buckets?
  – A single bucket change requires re-shuffling a large fraction of the items
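A quick sketch of the re-shuffling problem, using the key itself as a stand-in for its hash: going from n to n+1 buckets changes the bucket of almost every key.

```python
# With bucket = hash(key) mod n, adding one bucket moves nearly all keys.
n = 7
keys = range(10_000)
moved = sum(1 for k in keys if k % n != k % (n + 1))
fraction = moved / len(keys)     # roughly 7/8 of all keys change bucket
```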
Consistent Hashing Idea
• Logical name space, called the identifier space, consisting of identifiers {0, 1, 2, …, N-1}
• The identifier space is a logical ring modulo N
• Every node picks a random identifier
• Example:
  – Space N=16: {0, …, 15}
  – Five nodes a, b, c, d, e
    • a picks 6
    • b picks 5
    • c picks 0
    • d picks 5
    • e picks 2
[Figure: identifier ring 0-15 with the chosen node positions marked]
Definition of Successor
• The successor of an identifier is the first node met going in the clockwise direction, starting at the identifier
• Example
  – succ(12) = 14
  – succ(15) = 2
  – succ(6) = 6
[Figure: identifier ring 0-15; successors are found clockwise]
Where to store items?
• Use a globally known hash function H
• Each item <key, value> gets the identifier H(key)
• Store the item at the successor of H(key)
  – Terminology: that node is responsible for the item
• Example
  – H("Anwitaman") = 12
  – H("Ali") = 2
  – H("Alberto") = 9
  – H("Ozalp") = 14
[Figure: ring with each item stored at the successor of its hashed key]

  Key        Value
  Anwitaman  Singapore
  Ali        Berkeley
  Alberto    Trento
  Kurt       Kassel
  Ozalp      Bologna
Consistent hashing: summary
• Scalable
  – Each node stores on average D/n items (for D items and n nodes in total)
  – Only about D/n items are reshuffled on every join/leave/failure
• Everybody knows everybody
  – Akamai works this way
  – Amazon Dynamo too
• Load balancing
  – W.h.p. an O(log n) imbalance
  – Eliminate the imbalance by having each server "simulate" log(n) random virtual buckets
towards dht construction
reducing neighbors
Structured overlays: Self-organization and Scalability @ SASO 2009 30
PP
2
Where to point (Chord)?
• Each node points to its successor
  – The successor of a node p is succ(p+1)
  – Known as the node's succ pointer
• Each node points to its predecessor
  – The first node met in the anti-clockwise direction, starting at p-1
  – Known as the node's pred pointer
• Example
  – 0's successor is succ(1) = 2
  – 2's successor is succ(3) = 5
  – 5's successor is succ(6) = 6
  – 6's successor is succ(7) = 11
  – 11's successor is succ(12) = 0
DHT Lookup
• To look up a key k
  – Calculate H(k)
  – Follow succ pointers until the node responsible for k is found
• Example: look up "Alberto" at node 2
  – H("Alberto") = 9
  – Traverse nodes 2, 5, 6, 11 (BINGO)
  – Return "Trento" to the initiator
Key Value
Anwitaman Singapore
Ali Berkeley
Alberto Trento
Kurt Kassel
Ozalp Bologna
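The lookup above can be traced directly (a sketch; the node set {0, 2, 5, 6, 11} and H("Alberto") = 9 are taken from the example):

```python
# Linear DHT lookup: follow succ pointers until the responsible node.
N = 16
nodes = sorted([0, 2, 5, 6, 11])
succ_of = {p: nodes[(i + 1) % len(nodes)] for i, p in enumerate(nodes)}
pred_of = {p: nodes[i - 1] for i, p in enumerate(nodes)}

def responsible(node, key):
    """node owns key iff key lies in (pred(node), node] on the ring."""
    lo = pred_of[node]
    return 0 < (key - lo) % N <= (node - lo) % N

def lookup(start, key):
    node, path = start, [start]
    while not responsible(node, key):
        node = succ_of[node]
        path.append(node)
    return node, path
```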
towards dht construction
handling joins/leaves/failures
Dealing with failures
• Each node keeps a successor-list
  – Pointers to the f closest successors
    • succ(p+1)
    • succ(succ(p+1)+1)
    • succ(succ(succ(p+1)+1)+1)
    • ...
• Rule: if the successor fails
  – Replace it with the closest alive successor
• Rule: if the predecessor fails
  – Set pred to nil
• Set f = log(n)
  – With failure probability 0.5, w.h.p. not all nodes in the list fail: (1/2)^log(n) = 1/n
Handling Dynamism
• Periodic stabilization used to make pointers eventually correct
– Try pointing succ to closest alive successor
– Try pointing pred to closest alive predecessor
Periodically at node p:
  1. set v := succ.pred
  2. if v ≠ nil and v is in (p, succ]
  3.   set succ := v
  4. send a notify(p) to succ

When receiving notify(q) at node p:
  1. if pred = nil or q is in (pred, p]
  2.   set pred := q
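The two pseudocode rules above can be turned into a runnable sketch. The node ids 0, 5, 10 and the joining node 7 are illustrative: the new node sets only its succ pointer, and a few rounds of stabilize/notify repair all succ and pred pointers.

```python
# Chord-style periodic stabilization on a ring of size N = 16.
N = 16

class Node:
    def __init__(self, ident):
        self.id, self.succ, self.pred = ident, None, None

def between(x, lo, hi):
    """True iff x lies in the ring interval (lo, hi] modulo N."""
    return 0 < (x - lo) % N <= (hi - lo) % N

def stabilize(p):
    v = p.succ.pred
    if v is not None and between(v.id, p.id, p.succ.id):
        p.succ = v
    notify(p.succ, p)

def notify(p, q):                 # node p receives notify(q)
    if p.pred is None or between(q.id, p.pred.id, p.id):
        p.pred = q

a, b, c = Node(0), Node(5), Node(10)
a.succ, b.succ, c.succ = b, c, a
a.pred, b.pred, c.pred = c, a, b

d = Node(7)                       # joining node: succ := lookup(7) = node 10
d.succ = c
for _ in range(4):                # a few periodic rounds
    for n in (a, b, c, d):
        stabilize(n)
```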
Structured overlays: Self-organization and Scalability @ SASO 2009 35
PP
2
Handling joins
• When a new node n joins
  – Find n's successor with lookup(n)
  – Set n's succ to that successor
  – Stabilization fixes the rest
Structured overlays: Self-organization and Scalability @ SASO 2009 36
PP
2
Handling leaves
• When n leaves– Just dissappear (like failure)
• When pred detected failed– Set pred to nil
• When succ detected failed– Set succ to closest alive in successor list
Speeding up lookups with fingers
• If only the succ(p+1) pointer is used
  – Worst-case lookup time is n hops, for n nodes
• Improving lookup time (binary search):
  – Point to succ(p+1)
  – Point to succ(p+2)
  – Point to succ(p+4)
  – Point to succ(p+8)
  – …
  – Point to succ(p+2^((log N)-1))
• The distance to the destination is always halved: log hops
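A sketch of finger-based routing on the running example (N = 16, nodes {0, 2, 5, 6, 11}), using a greedy "farthest finger that does not overshoot the key" rule; the exact forwarding rule varies slightly between Chord variants.

```python
import bisect

N = 16
nodes = sorted([0, 2, 5, 6, 11])

def succ(x):
    i = bisect.bisect_left(nodes, x % N)
    return nodes[i % len(nodes)]

def fingers(p):                    # succ(p + 2^i) for i = 0 .. log2(N)-1
    return [succ(p + 2**i) for i in range(4)]

def between(x, lo, hi):            # x in (lo, hi] mod N
    return 0 < (x - lo) % N <= (hi - lo) % N

def lookup(p, key):
    path, target = [p], succ(key)  # target is responsible for key
    while p != target:
        cand = [f for f in fingers(p) if between(f, p, key)]
        nxt = max(cand, key=lambda f: (f - p) % N) if cand else succ(p + 1)
        path.append(nxt)
        p = nxt
    return path
```

With fingers, the lookup of key 9 starting at node 2 takes 2 hops (2, 6, 11) instead of the 3 hops of the purely successor-based traversal.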
Handling dynamism of fingers and successor-list
• Node p periodically:
  – Updates fingers
    • Looks up p+2^1, p+2^2, p+2^3, …, p+2^((log N)-1)
  – Updates the successor-list
    • slist := trunc(succ · succ.slist)
Chord: Summary
• Lookup hops: logarithmic in n
  – Fast routing/lookup, like in a dictionary
• Routing table size: logarithmic in n
  – Few nodes to ping
Reliable Routing
• Iterative lookup
  – Generally slow (handling NATs, firewalls)
  – Reliability easy to achieve
    • The initiator is in full control
• Recursive lookup
  – Generally fast (uses established links)
  – Several ways to achieve reliability
    • End-to-end timeouts
    • Any-node timeouts
      – Difficult to determine the timeout value
• Transitive lookup
  – Reliability: end-to-end timeouts
Replication of items
• Successor-list replication (Chord, Pastry)
  – Idea: replicate on nodes
    • If node p is responsible for a set of items K
    • Replicate K on p's immediate successors
• Symmetric replication (DKS)
  – Idea: replicate identifiers
    • Items with keys 0, 16, 32, 48 are equivalent
    • Whoever is responsible for 0 also stores 16, 32, 48
    • Whoever is responsible for 16 also stores 0, 32, 48
    • …
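Symmetric replication's equivalence classes can be computed directly. A sketch with identifier space N = 64 and replication factor f = 4, which reproduces the class {0, 16, 32, 48} above:

```python
# Symmetric replication: key k is equivalent to k + i*(N/f) (mod N),
# for i = 0 .. f-1, so every key belongs to a class of f keys.
N, f = 64, 4

def replica_keys(k):
    return sorted((k + i * N // f) % N for i in range(f))
```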
towards proximity awareness
Plaxton mesh (PRR), Pastry/Tapestry
Plaxton Mesh [PRR]
• Identifiers represented in radix/base k
  – We use k=16, hexadecimal radix
  – Ring size N is a large power of k, e.g., 16^40
Plaxton Mesh (2)
• Additional routing table on top of the ring
• Routing table construction, by example
  – Node 3a7f keeps the following routing table (the Kleene star * denotes a wildcard):

    Row 1: 0* 1* 2* self 4* 5* 6* 7* 8* 9* a* b* c* d* e* f*
    Row 2: 30* 31* 32* 33* 34* 35* 36* 37* 38* 39* self 3b* 3c* 3d* 3e* 3f*
    Row 3: 3a0* 3a1* 3a2* 3a3* 3a4* 3a5* 3a6* self 3a8* 3a9* 3aa* 3ab* 3ac* 3ad* 3ae* 3af*
    Row 4: 3a70* 3a71* 3a72* 3a73* 3a74* 3a75* 3a76* 3a77* 3a78* 3a79* 3a7a* 3a7b* 3a7c* 3a7d* 3a7e* self

  – Flexibility to choose proximate neighbors
• Invariant: the candidates for any entry in row i (all nodes matching that prefix) are interchangeable
Plaxton Routing
• To route from 1234 to abcd:
  1. 1234 uses rt row 1: jump to a*, e.g., a999
  2. a999 uses rt row 2: jump to ab*, e.g., ab11
  3. ab11 uses rt row 3: jump to abc*, e.g., abc0
  4. abc0 uses rt row 4: jump to abcd
• Routing terminates in log(N) hops
  – In practice log(n), where N is the identifier-space size and n is the number of nodes
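The four prefix-correcting hops can be simulated directly. A sketch over 4-digit hex identifiers; the intermediate node ids (a999, ab11, abc0) are the slide's illustrative choices:

```python
# Plaxton-style prefix routing: each hop reaches some node matching at
# least one more digit of the target identifier.
def prefix_len(a, b):
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n

def route(source, target, nodes):
    path, cur = [source], source
    while cur != target:
        p = prefix_len(cur, target)
        # any node matching a longer prefix will do; pick the first found
        cur = next(n for n in nodes if prefix_len(n, target) > p)
        path.append(cur)
    return path

nodes = ["1234", "a999", "ab11", "abc0", "abcd"]
```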
Pastry
• Leaf set
  – Successor-list in both directions
  – Periodically gossiped to all leafs, O(n²) [Bamboo]
• Plaxton mesh on top of the ring
  – Failures in the routing table
    • Get a replacement from any node on the same row
• Routing
  1. Route directly to the responsible node if it is in the leaf set, otherwise
  2. Route to a (prefix-)closer node, otherwise
  3. Route on the ring
Routing Table Initialization
• How does a new node initialize its routing table (RT)?
  – The new node looks up its own id
  – At step i, it copies row i of the node reached
    • Good if latencies are symmetric
  – Example: assume new node abcd knows 1234
    1. 1234 uses rt row 1: jump to a*, e.g., a999
    2. a999 uses rt row 2: jump to ab*, e.g., ab11
    3. ab11 uses rt row 3: jump to abc*, e.g., abc0
    4. abc0 uses rt row 4: jump to abcd
constant number of neighbors
De Bruijn graphs, Koorde
Even less routing info…
• How much routing state is necessary?
• Moore bound from graph theory
  – Assume each node has k neighbors
  – How many nodes (at most) are reachable in d hops?
    • 0 hops: 1
    • 1 hop: 1 + k
    • 2 hops: 1 + k + k(k-1)
    • 3 hops: 1 + k + k(k-1) + k(k-1)²
    • d hops: 1 + k·[(k-1)^0 + … + (k-1)^(d-1)] = 1 + k((k-1)^d - 1)/(k-2)
Moore bound
• Given k pointers per node
  – In d hops at most n nodes are reachable
  – n ≤ 1 + k((k-1)^d - 1)/(k-2)
  – Solving for d as a function of n:
    • d ≥ log_(k-1)[(n(k-2)+2)/k] ≈ log_k n
• In DHTs, each node has k = log(n) neighbors
  – d ≈ log_(log n) n = log n / log(log n)
• So, optimally, for n nodes with log(n) pointers we should reach everyone in log(n)/log(log n) hops
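The bound can be evaluated numerically (a sketch; min_hops just searches for the smallest d that satisfies the Moore bound, so it is the best possible diameter, not what any particular DHT achieves):

```python
# Moore bound: with k neighbors per node (k > 2), at most
# 1 + k((k-1)^d - 1)/(k-2) nodes are reachable within d hops.
def reachable(k, d):
    # integer division is exact here, since (k-1)^d ≡ 1 (mod k-2)
    return 1 + k * ((k - 1)**d - 1) // (k - 2)

def min_hops(k, n):
    d = 0
    while reachable(k, d) < n:
        d += 1
    return d
```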
Optimal Graphs
• De Bruijn graphs achieve these bounds
• Example, k=2:
  – Consider each node's identifier in binary
  – Each node i should know 2 neighbors:
    • 2i (mod N)
    • 2i+1 (mod N)
• Example:
  – Node 011011 knows 110110 and 110111
Routing in De Bruijn Graphs
• Example
  – k=2, n=2³=8
• Routing
  – Main idea: each hop shifts one digit of the key into the identifier (left-to-right)
  – E.g., node 110 wants to find 011:
    • 110 jumps to 100
    • 100 jumps to 001
    • 001 jumps to 011
Routing in De Bruijn Graphs
• Lookup algorithm at node m
  – Initially kshift = k (the key to look up)
  – All shift operations (<<) are modulo N
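Both the neighbor rule and shift-based routing can be checked against the slide's examples (a sketch for k = 2 over b-bit identifiers, N = 2^b):

```python
# De Bruijn graph for k = 2: node i links to 2i mod N and 2i+1 mod N,
# and routing shifts the key's bits into the identifier one hop at a time.
def neighbors(i, b):
    N = 1 << b
    return ((2 * i) % N, (2 * i + 1) % N)

def route(start, key, b):
    N, path, cur = 1 << b, [start], start
    for shift in range(b - 1, -1, -1):   # key bits, left to right
        bit = (key >> shift) & 1
        cur = ((cur << 1) | bit) % N
        path.append(cur)
    return path
```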
Making a DHT of De Bruijn Graphs
• With d=2 pointers we get log(N) hops, where N is the identifier-space size (2^160)
  – How do we achieve log(n), where n = number of nodes?
• Main idea
  – Route on an imaginary De Bruijn graph over all 2^160 identifiers
    • Invariant: go to the predecessor of the imaginary node
  – Store a pointer, called d, to the predecessor of 2i
Koorde DHT
• Algorithm at node m
  – i is the imaginary node
    • Initially i = m.successor
  – Initially kshift = k
2-hop Lemma
• The number of hops is w.h.p. at most 3·log(N)
  – i.e., we need about 2 successor traversals per De Bruijn hop
• When at m = predecessor(i)
  – Jump to 2m
  – Traverse successors to reach predecessor(2i)
    • That is a (2i-2m)/N fraction of the space, holding n(2i-2m)/N nodes
    • On average, i-m = N/n
    • So n(2N/n)/N = 2 nodes are traversed
QED
Koorde works! (2)
• Still O(log N); how do we get O(log n)?
• Use the flexibility in the i parameter
  – i can be set to any identifier in the range (m, m.succ]
  – Set the low-order bits of i to maximize the number of already-matching final digits
O(log n) hops Koorde theorem
• Distance between m and i: on average N/n
  – W.h.p. the distance is more than N/n²
  – Number of low-order bits free in that range:
    • log(N) - 2·log(n) bits can be set arbitrarily
• Need to route a total of log(N) bits
  – log(N) - 2·log(n) are already done
  – 2·log(n) bits remain to be shifted
QED
architecture of structured overlays
a formal view of DHTs
General architecture for DHTs
• Metric space S with distance function d
  – d(x,y) ≥ 0
  – d(x,x) = 0
  – d(x,y) = 0 => x = y
  – d(x,z) ≤ d(x,y) + d(y,z)
  – d(x,y) = d(y,x) (not always required)
• E.g.:
  – d(x,y) = y - x (mod N)   (Chord)
  – d(x,y) = x xor y   (Kademlia)
  – d(x,y) = sqrt((x1-y1)² + … + (xd-yd)²)   (CAN)
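The three example metrics can be written out directly (a sketch; the ring size N = 16 and the sample points are illustrative). Note how the Chord distance is directional while the XOR metric is symmetric:

```python
import math

N = 16

def d_chord(x, y):        # directed ring distance (not symmetric)
    return (y - x) % N

def d_kademlia(x, y):     # XOR metric (symmetric, unidirectional)
    return x ^ y

def d_can(x, y):          # Euclidean distance between coordinate tuples
    return math.sqrt(sum((a - b)**2 for a, b in zip(x, y)))
```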
Graph Embedding
• Embed a virtual graph for routing
  – Powers of 2 (Chord)
  – Plaxton mesh (Pastry/Tapestry)
  – Hypercube
  – De Bruijn (Koorde, DH)
  – Butterfly (Viceroy)
• A node is responsible for many virtual identifiers
  – E.g., a Chord node is responsible for all virtual ids between its predecessor's id and its own id
P-Grid (EPFL)

[Figure: abstracting a tree vs. the actual connectivity graph; a binary trie over prefixes 0/1, 00/01, …, with leaves 000-111 held by peers A-H]

• Structural replication
  – Multiple peers responsible for the same key-space
  – Multiple routes resolving the same prefix
Query forwarding in P-Grid
• Query at A for 010: A forwards it to D

[Figure: the trie with leaves 000-111 held by peers A-H]
Query forwarding in P-Grid
• Query at A for 010: D forwards it to C, who has the answer!

[Figure: the trie with leaves 000-111 held by peers A-H]
Node joining in P-Grid

[Figure: new node y wants to join the network and contacts z; y and z negotiate to repartition the key-space, splitting prefix 0 into 00 and 01 (alternatively, they could have decided to be replicas)]

• Multiple peers can also decide to be replicas of the same partition
  – Structural replication (a.k.a. zone overloading)
  – A different kind of replication than in Chord
P-Grid's self-referential directory and overlay maintenance
• The overlay itself serves as the directory (logical ID <-> IP address)
  – Routing is based on logical addresses (and cached IPs)
  – In case of failure (if the local cache does not work), look up the IP address through the overlay
• Churn: membership dynamics (peers leave and re-join), with peers rejoining under dynamic IP addresses
• You may want to reconnect with the same guy
  – Social/trust networks …
  – Storage systems … (returning with content)
[Figure: P-Grid trie (prefixes 0, 1, 00, 01, 10, 11, 000, …, 101) with each peer's routing table and replica list; the legend distinguishes up-to-date cache entries, stale cache entries, peers presently online, and peers presently offline. This toy example uses a 4-bit representation of an ID as the corresponding key: information about peer 4 is stored under key 0100 at peers 5 and 14.]

[Aberer04]
[Figure: the same trie and routing tables, used to trace a query]

query(01*) @ 7
… query(0101) @ 7 (for stale entry 5; cycle, abort)
… query(1110) @ 7 (for stale entry 14; forward to 12 or 13)
… query(1110) @ 12 (is offline)
… query(1110) @ 13 (for stale entry 2)
… … query(0010) @ 13 (forward to 5)
… … query(0010) @ 5 (forward to 7)
… … query(0010) @ 7 (forward to 9)
… … query(0010) @ 9 (new entry for 2 found!)
… query(1110) @ 2 (new entry for 14 found!)
query(01*) @ 14 (finally)
Self-healing recursive queries
• Encountering unusable routes triggers queries recursively
• Recursive queries heal the network
  – A family of overlay maintenance schemes that is more efficient and adaptive than proactive approaches
  – Two extremes of this family: Correction-on-Use and Correction-on-Failure
• The system operates at a dynamic equilibrium
Dynamic equilibrium under churn
• At steady state, the effects of churn and self-healing cancel out
  – Churn => ID-to-IP changes (unusable routing entries)
  – Healing => makes routes usable again
• Steady state: the probability distribution of the number of stale references does not change
• We can obtain the repair cost and routing performance (latency / message cost) corresponding to this steady state

[Figure: Markov chain over states "0 stale refs", "1 stale ref", "2 stale refs", …, "r stale refs"; ID changes move the chain toward more staleness, repairs move it back. Here r is the number of references (redundancy) per routing level per peer.]
Dynamic equilibrium under churn
[Figure: contour map of cost/resilience trade-offs]
Dynamic equilibrium under churn
Dynamic equilibrium under churn
• Comparison of maintenance mechanisms based on the degree of laziness
• Breakdown of the lazy mechanism
  – Analogous to Bamboo's empirical experience of positive feedback!
Taxonomy of route maintenance mechanisms (circa 2004)
[Figure: the taxonomy, spanning reactive to proactive strategies]
Prevention is better than cure
• Predictive and proactive strategies for routing table maintenance
  – Kademlia
    • Like P-Grid, but uses the XOR metric for routing
  – Accordion
    • Like Chord, but exploits properties of algebraic small-world networks
Kademlia
• Note the similarity of its topology with P-Grid's, but Kademlia uses a different (XOR) routing mechanism
[Maymounkov02]
XOR routing
Reducing the effect of churn
• Empirical observation from a Gnutella trace: the probability of remaining online for another hour (y-axis) increases with uptime (x-axis, in minutes)
• Least-recently-seen eviction policy for the "k-bucket"
  – But never evicts live nodes
Accordion [Li05]
• Proactive route maintenance, based on the small-world distribution [Kleinberg00], which is flexible (only long-distance routes need it):

  Pr[neighbor is x away in ID space] ∝ 1/x

• Guarantees poly-log(n) lookup hops
• Allows smooth expansion of the routing table
Proactive route maintenance
• Main idea: evict stale entries efficiently
  – Delete proactively, before a lookup times out
  – Pinging uses bandwidth inefficiently
  – Instead, predict each entry's Pr(alive)
  – Delete entries with Pr(alive) < threshold
Choosing the best deletion threshold
• Analytic results: delete entries with Pr(alive) < x

[Figure: average lookup hops + timeouts (y-axis) as a function of the deletion threshold x (x-axis); deleting too aggressively and deleting too lazily both hurt, and the best threshold lies in between]
Predicting routing entry liveness
• U: the entry's known uptime; A: its age, the time since it was last contacted

  Timeline:  joined  |-- U --|  last contacted  |-- A --|  now

• With Pareto session times:

  Pr(alive) = Pr(lifetime > U+A | lifetime > U) = U / (U+A)

• Delete an entry if U/(U+A) < threshold
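A sketch of the eviction rule, assuming the simple Pareto form Pr(alive) = U/(U+A) given above (the sample entries and the 0.5 threshold are illustrative, not Accordion's actual parameters):

```python
# Predict liveness of routing-table entries and evict the unlikely ones.
def pr_alive(uptime, age):
    # Pareto session-time assumption: Pr(alive) = U / (U + A)
    return uptime / (uptime + age)

def evict(entries, threshold=0.5):
    """Keep only entries whose predicted liveness meets the threshold."""
    return [e for e in entries if pr_alive(e["uptime"], e["age"]) >= threshold]

table = [
    {"id": "a", "uptime": 3600, "age": 600},   # long-lived, seen recently
    {"id": "b", "uptime": 60, "age": 600},     # short-lived, stale
]
kept = evict(table)
```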
Evaluation: performance/cost tradeoff

[Figure: performance (average lookup latency, msec, y-axis) vs. cost (bandwidth budget, bytes/node/sec, x-axis)]
Comparing with parameterized DHTs

[Figure: average lookup latency (msec, y-axis) vs. average bandwidth consumed (bytes/node/sec, x-axis)]
[Figure: average lookup latency (msec, y-axis) vs. average bandwidth consumed (bytes/node/sec, x-axis)]
• The convex hull outlines the best tradeoffs
Lowest latency for varying churn (fixed budget, variable churn)

[Figure: average lookup latency (msec, y-axis) vs. median node session time (hours, x-axis)]

• Accordion has the lowest latency at low churn
• Accordion's latency increases only slightly at high churn
Accordion stays within budget (fixed budget, variable churn)

[Figure: average bandwidth (bytes/node/sec, y-axis) vs. median node session time (hours, x-axis)]

• Other protocols' bandwidth increases with churn
Conclusions
• Reactive strategies
  – Redundancy can be exploited
    • To determine the degree of laziness
    • Trade-off between cost and resilience
  – May lead to catastrophic failures under high churn (particularly for a lazy reactive strategy)
    • E.g., because of positive feedback
• Proactive strategies
  – Reduce the chance of catastrophic failure
    • At the cost of continuous bandwidth usage, sometimes unnecessarily
• Most maintenance strategies ignore the fact that persistent IDs may be useful
  – E.g., they do not look into the storage maintenance costs that need to be carried out as collateral
Bootstrapping structured overlays
Part III
Issues:
• Properties of the resulting overlay
  – Load-balance, proximity, …
• Bootstrapping mechanisms
  – Sequential, parallelized (some implicit centralization)
  – Decentralized
• Cost and overheads
  – Construction cost & latency, …
Bootstrapping structured overlays
In the beginning, there was …
Trivia:
The term "bootstrapping" alludes to a German legend about Baron Münchhausen, who claimed to have been able to lift himself out of a swamp by pulling himself up by his own hair. In later versions of the legend, he used his own boot straps to pull himself out of the sea which gave rise to the term bootstrapping. The term is believed to have entered computer jargon during the early 1950s by way of Heinlein's short story By His Bootstraps first published in 1941. (from Wikipedia)
Bootstrapping
Load-balancing in DHTs
Load balancing in peer-to-peer (P2P) systems is a mechanism to spread various kinds of load, such as storage, access, and message forwarding, among participating peers in order to achieve fair or optimal utilization of contributed resources such as storage and bandwidth.
Bootstrapping overlays
While bootstrapping an overlay network, we need to ensure good load-balancing characteristics.
– In a system with N homogeneous nodes
– The load is optimally balanced when each node carries around 1/N of the total load
Load-balancing in DHTs
A First step: DHT
Use uniform hashing
The basic idea: Generate keys for each object to be stored by applying uniform (consistent) hashing (e.g. SHA-1)
• The keys are then uniformly distributed over the key-space
Assign peers to parts of the key-space by applying the same hashing to, say, each peer's IP address*
• Peers are then distributed uniformly over the key-space
This was expected to achieve load-balance
* Hashing was also expected to provide security in the original design of Chord, etc.
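The basic idea can be sketched in a few lines of Python. This is a toy illustration in the spirit of Chord, not any particular implementation; the 16 hypothetical peer IP addresses, the 32-bit key-space, and the successor assignment rule are all assumptions for the example:

```python
import hashlib
from bisect import bisect_left

def h(value: str, bits: int = 32) -> int:
    """Uniform hashing via SHA-1, truncated to a 32-bit key-space."""
    return int(hashlib.sha1(value.encode()).hexdigest(), 16) % (2 ** bits)

# Peers are placed on the ring by hashing their (hypothetical) IP addresses.
peers = sorted(h(f"10.0.0.{i}") for i in range(16))

def responsible_peer(key: str) -> int:
    """A key is assigned to its successor on the ring (Chord-style rule)."""
    k = h(key)
    idx = bisect_left(peers, k)
    return peers[idx % len(peers)]  # wrap around past the largest peer ID

node = responsible_peer("some-document.txt")
```

Because the hash output is (approximately) uniform, both keys and peers spread evenly over the key-space; the next slides show why that alone is not enough for load-balance.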
• Analysis of distribution of data
• Example– Parameters
• 4,096 nodes• 500,000 documents
– Optimum: ~122 documents per node
Optimal distribution of documents across nodes
Load-balancing in DHTs
[Rieche06]
• Number of nodes storing no document– Parameters
• 4,096 nodes
• 100,000 to 1,000,000 documents
– Some nodes w/o any load
Load-balancing in DHTs
Something’s wrong! What? Why??
Balls into bins analogy
• n number of intervals (bins)– Intervals of equal size
• m number of items (balls)• sequentially choose a bin randomly for
each ball– A bin is hit with probability p = 1/n
• The number of balls in a bin is then given by the binomial distribution
  – Binomial distribution: P(load_b = i) = C(m, i) · (1/n)^i · (1 − 1/n)^(m−i)
  – Standard deviation: σ_b = √( m · (1/n) · (1 − 1/n) )
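The imbalance these formulas predict is easy to reproduce in a short simulation; the sketch below uses the example's numbers (4,096 bins, 500,000 balls), with the random seed as an arbitrary choice:

```python
import random
from math import sqrt

random.seed(1)
n, m = 4096, 500_000                 # bins (nodes) and balls (documents)
load = [0] * n
for _ in range(m):
    load[random.randrange(n)] += 1   # each ball hits a bin with probability 1/n

mean = m / n                              # ~122 documents per node (the optimum)
std = sqrt(m * (1 / n) * (1 - 1 / n))     # binomial standard deviation, ~11
print(f"mean={mean:.1f} std={std:.1f} max={max(load)} min={min(load)}")
```

With 4,096 bins, the most and least loaded bins land several standard deviations away from the mean, which is exactly the imbalance seen in the document-distribution plots.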
A quick example (balls into bins)
Using mathematica
In[61]:= Array[a, 50]; Do[a[i] = 0, {i, 1, 50}];
         Do[x = Ceiling[50 Random[]]; a[x] = a[x] + 1, {j, 1, 1000}];
         Histogram[Array[a, 50], FrequencyData -> True]

[Histogram: loads of the 50 bins after 1,000 balls, scattered widely around the expected value of 1000/50 = 20]
Load-balancing in DHTs: Virtual Servers
• Each node is responsible for several intervals– "Virtual server"
• Example– Chord
[Figure: Chord ring with nodes A, B, C, each responsible for several intervals]
Increase the effective “n” by having many virtual peers for the same physical computer
σ_b = √( m · (1/n) · (1 − 1/n) )   [Rao03]
• Each node is responsible for several intervals– log (n) virtual servers
• Load balancing– Different possibilities to change servers
• One-to-one• One-to-many• Many-to-many
– Copy of an interval is like removing and inserting a node in a DHT
Virtual Server
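A toy experiment suggests how multiple virtual servers per node even out the owned key-space. The node count, ring size, and the successor-interval ownership rule below are illustrative assumptions, not the protocol itself:

```python
import random

random.seed(7)
RING = 2 ** 32
n, v = 64, 6               # 64 physical nodes; log2(64) = 6 virtual servers each

def owned_share(ids_per_node):
    """Fraction of the ring each physical node owns (successor intervals)."""
    points = sorted((vid, node) for node, vids in ids_per_node.items() for vid in vids)
    share = {node: 0.0 for node in ids_per_node}
    closing = [(points[0][0] + RING, None)]          # wrap the ring around
    for (vid, node), (nxt, _) in zip(points, points[1:] + closing):
        share[node] += (nxt - vid) / RING
    return share

plain = {i: [random.randrange(RING)] for i in range(n)}           # 1 ID per node
virt = {i: [random.randrange(RING) for _ in range(v)] for i in range(n)}

p = list(owned_share(plain).values())
q = list(owned_share(virt).values())
print(max(p), max(q))
```

Each node's share becomes a sum of several small intervals instead of one large one, so the variance of the shares shrinks noticeably.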
[Figure: heavy (H) and light (L) nodes on the ring]
Load stealing/shedding
• One-to-One– Light node picks a random ID– Contacts the node x responsible for it– Accepts load if x is heavy
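A minimal sketch of the one-to-one scheme. The numeric loads, the heaviness threshold, and the "pick a random node" stand-in for a random-ID lookup are assumptions for illustration:

```python
import random

random.seed(3)
# Toy model: each node has a numeric load; 'heavy' means above THRESHOLD.
loads = {node: random.randint(0, 20) for node in range(32)}
THRESHOLD = 14
total = sum(loads.values())

def one_to_one_step(light):
    """A light node probes the node responsible for a random ID and
    takes over excess load if that node turns out to be heavy."""
    if loads[light] >= THRESHOLD:
        return                              # no longer light: nothing to do
    x = random.choice(list(loads))          # stands in for looking up a random ID
    if loads[x] > THRESHOLD:                # x is heavy: shed its excess
        move = min(loads[x] - THRESHOLD, THRESHOLD - loads[light])
        loads[x] -= move
        loads[light] += move

for node in list(loads):
    one_to_one_step(node)
```

Note the scheme only moves load when a random probe happens to hit a heavy node, which is why the one-to-many and many-to-many variants add directories.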
[Figure: light nodes L1–L5 report to directories D1, D2; heavy nodes H1–H3 query the directories]
• One-to-Many– Light nodes report their load information to directories– Heavy node H gets this information by contacting a directory– H contacts the light node which can accept the excess load
Load stealing/shedding
[Figure: heavy nodes H1–H3 and light nodes L1–L5 rendezvous via directories D1, D2]
• Many-to-Many
  – Many heavy and light nodes rendezvous at each step
  – Directories periodically compute the transfer schedule and report it back to the nodes, which then do the actual transfer
Load stealing/shedding
• Advantages– Easy shifting of load
• Whole Virtual Servers are shifted
– Can be extended for heterogeneous environments*
  • More virtual servers for a resource-rich node
• Disadvantages– Increased administrative and message overheads
• Maintenance of all Finger-Tables
– Much load is shifted– Much more overlay traffic
Load-balancing in DHTs: Virtual Servers
* [Godfrey05]
• Idea– One hash function for all nodes
• h0
– Multiple hash functions for data• h1, h2, h3, …hd
• Two options– Data is stored at one node– Data is stored at one node &
other nodes store a pointer
Load-balancing in DHTs: Power of 2 choices
[Byers03]
• Inserting Data– Results of all hash functions are calculated
• h1(x), h2(x), h3(x), …hd(x)
– Data is stored on the retrieved node with the lowest load
– Alternative: other nodes store a pointer
Load-balancing in DHTs: Power of 2 choices
• Retrieving
  – Without pointers
    • Results of all hash functions are calculated
    • Request all of the possible nodes in parallel
    • One node will answer
  – With pointers
    • Request only one of the possible nodes
    • That node can forward the request directly to the final node
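The insert path described above can be sketched as follows. The salted-SHA-1 construction for the d hash functions and the 64-node key-space are assumptions for illustration, not part of [Byers03]:

```python
import hashlib

NODES = list(range(64))              # hypothetical node IDs covering the key-space
load = {node: 0 for node in NODES}

def h(item: str, salt: int) -> int:
    """The d hash functions h1..hd, derived here by salting SHA-1."""
    return int(hashlib.sha1(f"{salt}:{item}".encode()).hexdigest(), 16) % len(NODES)

def insert(item: str, d: int = 2):
    """Compute h1(x)..hd(x) and store the item on the least-loaded candidate."""
    candidates = [h(item, salt) for salt in range(d)]
    target = min(candidates, key=lambda node: load[node])
    load[target] += 1
    return target, candidates

for i in range(1000):
    insert(f"doc-{i}")
```

Retrieval without pointers would query all d candidate nodes in parallel; with pointers, any candidate can forward to the one actually storing the item.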
Load-balancing in DHTs: Power of 2 choices
• Advantages
  – Simple
  – Generic randomized algorithm
• Disadvantages (with the specific realization)
  – Message overhead when inserting data
  – With pointers
    • Additional administration of pointers: more load, more adverse effects of churn
  – Without pointers
    • Message overhead at every search
Load-balancing in DHTs: Power of 2 choices
A quick example (power of two choices)
Using mathematica
In[67]:= Array[b, 50]; Do[b[i] = 0, {i, 1, 50}];
         Do[x = Ceiling[50 Random[]]; y = Ceiling[50 Random[]];
            If[b[x] > b[y], b[y] = b[y] + 1, b[x] = b[x] + 1], {j, 1, 1000}];
         Histogram[Array[b, 50], FrequencyData -> True]

[Histogram: bin loads are much more concentrated around the expected value of 20 than with a single choice]
[Figure: balls into bins with d = 2 choices per ball]
So far ...
• Bootstrapping DHTs
  – Uniform key distributions
  – Peers joined the network quasi-sequentially
    • The network was partitioned incrementally
• Next
  – Non-uniform keys
  – Parallelized construction

CAN network construction
Beyond DHTs: Data-oriented overlays
Preserve ordering information (as occurring in natural language, say).
Needs more sophisticated (storage) load-balancing mechanisms to support range-partitioned data.
Uniform hashing (used in DHTs) destroys ordering information!
Resource → Key: What is a suitable function? Depends on the application needs!
Figure courtesy Sarunas
Beyond DHTs: Data-oriented overlays
• Complex queries– Approximate or similarity queries
• DHTs can only support exact search
– Range queries– etc.
• e.g., Skyline queries
• Overlay supporting arbitrarily skewed load-distributions– DHTs are just a special case
Parallelized construction of overlays
• Shortcomings of sequential construction– Implicitly assumes some coordinator
• Implicit centralization
– Slow• Since peers join one by one
• Parallelized construction
  – Faster
  – Analogous to (re-)indexing a new attribute in a DB
  – Can be useful for recovery from catastrophic failures
Parallelized construction of overlays
[Figure: peers 1–8 under a skewed load-distribution]
• Given– A mechanism to meet other random peers
• e.g., an existing unstructured overlay
– A parameter p• Determined according to the load-skew
[Aberer05]
Distributed proportional partitioning:
- p fraction of peers take one half of the space (partition 0)
- 1−p fraction of peers take the other half (partition 1)
- Needed for partitioning the key-space at a granularity adaptive to the load-skew
[Figure: peers 1–8 under a skewed load-distribution, partitioned into halves 0 and 1 with p = 0.75]
Parallelized construction of overlays
Referential integrity:
- Each peer needs to know some peer from the complementary partition
- Needed for overlay routing
- This constraint necessitates a non-trivial algorithm (in order to reduce communication cost during overlay construction)
[Figure: skewed load-distribution; each peer in partition 0 needs to know peer 7 or 8 in partition 1, and vice versa]
Parallelized construction of overlays
Markov partitioning process
- peers are decided (0/1) or undecided
- each undecided peer interacts with some random peer, which has decided 0/1 or is still undecided
[Figure: skewed load-distribution; a peer in partition 0 knows peer 6, a peer in partition 1 knows peer 1, and vice versa]
Parallelized construction of overlays
[Aberer05]
Used recursively:
- Partitions are repartitioned (using appropriate parameters)
- A load-balanced overlay is formed
[Figure: recursive repartitioning of peers 1–8 into partitions 00, 010, 011, and 1]
Several other practical issues - local estimates of parameter p - replication factor (re-)balancing
Now we can build a load-balanced overlay in a parallelized manner for rather arbitrary load-skews
Parallelized construction of overlays
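The recursive partitioning above can be caricatured in a few lines. This is a much-simplified sketch, not the [Aberer05] protocol: here each undecided peer decides locally with bias p, whereas the real Markov process decides through pairwise peer interactions, and p itself is estimated from local samples:

```python
import random

random.seed(11)

def partition(peers, p):
    """Split peers so that roughly a p-fraction lands in half '0'. In the real
    protocol each undecided peer decides through pairwise interactions; here
    it simply decides locally with bias p."""
    return {peer: ('0' if random.random() < p else '1') for peer in peers}

def build(peers, depth, p=0.75):
    """Recursively repartition, assigning each peer a key-space path."""
    if depth == 0 or len(peers) <= 1:
        return {peer: '' for peer in peers}
    side = partition(peers, p)
    paths = {}
    for half in ('0', '1'):
        sub = [peer for peer in peers if side[peer] == half]
        for peer, suffix in build(sub, depth - 1, p).items():
            paths[peer] = half + suffix
    return paths

paths = build(list(range(200)), depth=3)
```

With p = 0.75, about three quarters of the peers take responsibility for half 0 at every level, matching a load-skew where that half holds more data.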
• Advantages– Parallelized and fast construction– No need of coordination
• Since no need of sequential joins
– Load-balancing for arbitrary load-skews
• Disadvantages (with the specific realization)
– Complex• Algorithm design, analysis and implementation• Needs partial global information
– e.g., parameter choices (based on sampling)
Distributed proportional partitioning
• Pairing and merging virtual trees– Pair nodes randomly
• By probing potential successors – and accepting/rejecting probes
• Paired nodes act as virtual supernode• Repeat the process
– needs a mechanism to merge such virtual trees
Other parallelized construction mechanism: Sorting peer-IDs to build a ring
[Figure: pairing peers into virtual nodes, then merging the resulting trees (with sorted peers)]
[Angluin05]
• Gossip-based mechanism
  – Nodes start with random subsets
    • Leaf-set: maintain a constant number of nodes
      – Arrange them as potential predecessors/successors (ideally an equal number of each)
    • Gossip leaf-set information with nodes it knows
      – e.g., its current leaf-set nodes (may also include past ones)
    • Refine information & repeat the process
Other parallelized construction mechanism: Sorting peer-IDs to build a ring
node : leaf-set
7 : 4, 5, 9, 10
9 : 6, 8, 12, 14
…

Node 7 gossips its leaf-set with nodes it knows (including node 9), and each node refreshes its leaf-set.

After gossip, the recalculated leaf-sets:
7 : 5, 6, 8, 9
9 : 7, 8, 10, 12
…
Gradually converges to form a sorted list [Montresor05]
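A toy version of such leaf-set gossip is sketched below. The ring size, leaf-set size, initial-view size, and round count are arbitrary choices, and this is a simplification of [Montresor05], not the actual protocol:

```python
import random

random.seed(5)
SPAN, N, K = 1000, 40, 4          # ring size, number of peers, leaf-set size
ids = random.sample(range(SPAN), N)

def ring_dist(a, b):
    d = abs(a - b) % SPAN
    return min(d, SPAN - d)

def closest(me, known):
    """Node me keeps the K ring-wise closest known peers as its leaf-set."""
    return sorted((x for x in known if x != me), key=lambda x: ring_dist(me, x))[:K]

# Nodes start with random subsets, then repeatedly gossip: two views are
# merged, and each side keeps the K closest peers it now knows of.
view = {i: closest(i, random.sample(ids, 8)) for i in ids}
for _ in range(15):
    for i in ids:
        j = random.choice(view[i])
        merged = set(view[i]) | set(view[j]) | {i, j}
        view[i] = closest(i, merged)
        view[j] = closest(j, merged)
```

Because each node keeps only ring-wise close peers and gossips with them, knowledge of true neighbors spreads along the ring, and the views converge toward a sorted list.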
• Advantages
  – Parallelized and fast construction
  – No need for coordination
    • Since no need for sequential joins
  – Relatively simple (no global information)
  – Gossip-based mechanism is robust against churn during the sorting process
• Disadvantages (with the specific realizations)
  – Do not take into account load-balancing issues
  – Not directly applicable for systems using structural replication / zone over-loading
  – Just builds the basic ring, but not the long-range links
    • Though these are not complicated to build once the ring is in place
  – Pairing & merging mechanism is vulnerable to churn during the sorting process
Sorting peer IDs
So far …
Bootstrapping issues: – Load-balance– How?
• Sequential• Parallelized
Assumes (implicitly) that any peer can potentially meet any other peer, i.e., that all peers are already part of one connected network, and then builds a single structured overlay composed of all these peers.
Figure from http://www.tellagate.com/kojima/blog/
[Figure: clusters A, B, C, …, X; networks 1 and 2, each formed over time, join]
Merger is accomplished trivially & transparently
The network can thus grow by organic merger of smaller (originally isolated) networks, allowing decentralized bootstrapping of Gnutella like unstructured overlays
Bootstrapping in unstructured networks
• Overlay merger– Needed for decentralized bootstrapping
– Needed for recovery from partitioning• Ignored in P2P literature!
– Lack of experience with real deployments
– Focus on other issues like churn
• Trivial in unstructured and super-peer networks
– Merger of index is a standard DB issue
Merging structured overlays
[Datta07]
• Overlay merger– Correctness of routing
• Maintain routing table
– Correct and complete key binding• Ship data to responsible peer(s)
– Replica synchronization
For locating the desired data/content, both are essential!
Merging structured overlays
• Merging a tree topology with structural replication (e.g., P-Grid) poses significantly different challenges from merging ring-based networks.
  – Merging P-Grid networks transparently is much simpler algorithmically.
    • So we use this case to illustrate the idea, the challenges, and important metrics …
Merging structured overlays
• If they have the same path– Synchronize replicas
• If one has a strict prefix path– Extend path and routing table, synchronize replica
• Stimulate new interactions
When peers from different networks meet
When peers from different networks meet
• Keys continue to be accessible to peers
  – Replica synchronization is needed to access keys from the other network

The merger process should be transparent: at the application level, all keys that were once accessible continue to be accessible to individual users (unless the application deletes them).
When peers from different networks meet
- This transparency may be violated until the replicas are actually synchronized!
- Ideally, we would detect when the sync process has completed throughout the network, while still being able to distinguish peers from the originally different networks in the meanwhile; then we could retain transparency.
- Instead of detecting global completion, use a heuristic timeout (once local sync is completed).
When peers from different networks meet
• 3 axes
– Network sizes
– Duplicate content in the original unmerged networks
– Heuristic parameter (timeout)
Parameter space
• Recall (over time) R_{i/j}
  – Metric from information retrieval
    • Recall is the fraction of the documents relevant to the query that are successfully retrieved
  – R_{i/i} should always be 1 for the merger to be transparent to applications
• Volume of data transferred
Important performance metrics
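The recall metric itself is straightforward to compute. The key sets below are a hypothetical post-merger snapshot, purely for illustration:

```python
def recall(retrieved: set, relevant: set) -> float:
    """Recall: the fraction of relevant documents that were successfully retrieved."""
    if not relevant:
        return 1.0
    return len(retrieved & relevant) / len(relevant)

# R_{i/j}: how well peers of network i can retrieve keys originating in network j.
# Hypothetical snapshot mid-merger: network 1 still sees all of its own keys
# (R_{1/1} = 1, required for transparency) but only part of network 2's keys.
keys_net1 = {"a", "b", "c"}
keys_net2 = {"d", "e", "f", "g"}
reachable_from_net1 = {"a", "b", "c", "d", "e"}

r_11 = recall(reachable_from_net1, keys_net1)   # must stay 1.0
r_12 = recall(reachable_from_net1, keys_net2)   # grows toward 1.0 as replicas sync
```

Tracking R_{i/j} over time during the merger is exactly what the following plots show.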
Recall
Volume of data transferred
Increases when there is more common data!
- The allotment of peers' key-space changes, so …

Merger of two networks with originally arbitrarily different key-space partitions
Volume of data transferred
Volume of data transferred
Recall (with worst choice of timeout)
Concluding remarks
Part IV
DHTs
• Characteristic property– Self-manage responsibilities in presence:
• Node joins• Node leaves• Node failures• Load-imbalance• Replicas
• Basic structure of DHTs– Metric space– Embed graph with efficient search algo– Let each node simulate many virtual nodes
The future of DHTs
• DHTs automatically handle– Replication, faults, load-balancing, joins, leaves, …
• One-size-fits-all?
  – Need dynamically auto-tunable DHTs
  – Different applications have different needs
• Stronger guarantees– Consistency models– Transactions– Access layers
To P2P or not to P2P
• P2P vs. dedicated infrastructure
  – Is it technically feasible to realize everything using P2P?
    • Maybe: it's an open issue
    • Unlikely, in terms of performance
    • Harder to guarantee reliability (no one is fully accountable)
To P2P or not to P2P
• P2P vs. dedicated infrastructure
  – Shall P2P be preferred whenever technically possible?
    • Don't think so …
– A matter of risk/cost vs. benefit trade-offs
To P2P or not to P2P
• P2P vs. dedicated infrastructure
  – How about scalability?
    • Again, it depends
      – With enough money, client-server can scale in many cases (e.g., Google)
      – Network-resource-consuming applications like content distribution may scale better using a P2P approach than client-server
To summarize …
• P2P makes sense if:– Budget/resource is limited
• Dedicated infrastructure is unsustainable or makes less economic sense
– Wide interest and relevance • To form a critical mass of users contributing resources
– Trust between participants is reasonably `high’• What’s `high’ depends on the application
– Rate of change is manageable• E.g., membership dynamics is not `too high’
– Criticality is `low'
  • Since it is harder to guarantee reliability or QoS in P2P
  • E.g., Skype's disclaimer states it's not for making emergency calls!
To summarize …
• P2P systems exhibit following characteristics:– Autonomy from central servers– Use of edge resources
• Instead of dedicated infrastructure– Intermittent connectivity– Reliance on self-organizing mechanisms using
limited (locally available) information • No global coordination and control
– Unlike other distributed systems like Grid
References
[Aberer05] Indexing Data-oriented Overlay Networks. K. Aberer, A. Datta, M. Hauswirth, R. Schmidt (VLDB 2005)
[Angluin05] Fast Construction of Overlay Networks. D. Angluin, J. Aspnes, J. Chen, Y. Wu, Y. Yin (SPAA 2005)
[Byers03] Simple Load Balancing for Distributed Hash Tables. J. Byers, J. Considine, M. Mitzenmacher (IPTPS 2003)
[Datta07] Merging Intra-Planetary Index Structures: Decentralized Bootstrapping of Overlays. A. Datta (SASO 2007)
[Ghodsi06] Distributed k-ary System: Algorithms for Distributed Hash Tables. A. Ghodsi, Dissertation, KTH (Royal Institute of Technology), Sweden, 2006
[Godfrey05] Heterogeneity and Load Balance in Distributed Hash Tables. P. B. Godfrey, I. Stoica (INFOCOM 2005)
[Montresor05] Chord on Demand. A. Montresor, M. Jelasity, O. Babaoglu (P2P 2005)
[Rao03] Load Balancing in Structured P2P Systems. A. Rao, K. Lakshminarayanan, S. Surana, R. Karp, I. Stoica (IPTPS 2003)
[Rieche06] Reliability and Load-Balancing in DHTs. S. Rieche, K. Wehrle, H. Niedermayer, S. Götz. In: R. Steinmetz, K. Wehrle (Eds.), Peer-to-Peer Systems and Applications
[Xu03] On the Fundamental Tradeoffs between Routing Table Size and Network Diameter in Peer-to-Peer Networks. J. Xu, A. Kumar, X. Yu (JSAC 2003)
References
[Aberer04] Efficient, Self-contained Handling of Identity in Peer-to-Peer Systems. K. Aberer, A. Datta, M. Hauswirth. IEEE Transactions on Knowledge and Data Engineering (TKDE) 16(7), 2004
[Kleinberg00] The Small-World Phenomenon: An Algorithmic Perspective. J. Kleinberg (STOC 2000)
[Li05] Bandwidth-efficient Management of DHT Routing Tables. J. Li, J. Stribling, R. Morris, M. F. Kaashoek (NSDI 2005)
[Maymounkov02] Kademlia: A Peer-to-peer Information System Based on the XOR Metric. P. Maymounkov, D. Mazières (IPTPS 2002)
[Rhea04] Handling Churn in a DHT. S. Rhea, D. Geels, T. Roscoe, J. Kubiatowicz (USENIX Annual Technical Conference, 2004)