Overlays and DHTs
Presented by
Dong Wang and Farhana Ashraf
Schedule
• Review of Overlay and DHT
• RON
• Pastry
• Kelips
Review of Overlay and DHT
Overlay
• A network built on top of another network
• Nodes connected to each other by logical/virtual links
• Improves Internet routing and is easy to deploy
• P2P (Gnutella, Chord, …), RON
DHT
• Allows lookup, insert, and delete of objects with keys in a distributed setting
• Performance concerns: load balancing, fault tolerance, efficiency of lookups and inserts, locality
CAN, Chord, Pastry, and Tapestry are all DHTs
Resilient Overlay Network
David G. Andersen et al.
MIT
SOSP 2001. Acknowledgements: http://nms.csail.mit.edu/ron/ and previous CS525 courses
RON (Resilient Overlay Network): Motivation, Goal, Design, Evaluation, Discussion
RON-Motivation
The current Internet backbone is NOT able to:
• Detect failed paths and recover quickly (BGP takes several minutes to recover from faults)
• Detect flooding and congestion effectively
• Leverage redundant paths efficiently
• Express fine-grained policies/metrics
RON-Basic Idea
[Figure: four nodes A, B, C, D connected through the Internet]
• The triangle inequality does not usually hold on the Internet; e.g., for latency it is possible that AB + BC < AC
• RON exploits the underlying path redundancy of the Internet to provide better paths and to route around failures
• RON is an end-to-end solution: packets are simply wrapped and sent normally
RON-Main Goals
• Fast failure detection and recovery: average detect-and-recover delay < 20 s
• Tighter integration of routing/path selection with the application: applications can specify metrics that affect routing
• Expressive policy routing: fine-grained, aimed at users and hosts
RON-Design
Overlay
• An old idea in networks
• Easily deployed; lets the Internet focus on scalability
• Keeps functionality only between active peers
Approach
• Aggressively probe all inter-RON-node paths
• Exchange routing information
• Route along the best path (from the end-to-end view) consistent with the routing policy
RON-Design: Architecture
• Probe between nodes to detect path quality
• Store path qualities in a performance database
• Run a link-state routing protocol among the nodes
• Data are handled by an application-specific conduit and forwarded in UDP
RON-Design: Routing and Lookup
Policy routing
• Classify by policy; generate a table per policy
Metric optimization
• The application tags the packet with its specific metric; generate a table per metric
Multi-level routing table and 3-stage lookup: Policy -> Metric -> Next hop (see the sketch below)
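A minimal sketch of that 3-stage lookup, assuming a nested-dictionary layout; the class and field names (RoutingTable, policy, metric, dest) are illustrative, not from the paper:

```python
# Hypothetical sketch of RON's 3-stage lookup (policy -> metric -> next hop).
# The nested-dictionary layout and every name here are illustrative.

class RoutingTable:
    def __init__(self):
        # tables[policy][metric][destination] -> next hop
        self.tables = {}

    def install(self, policy, metric, dest, next_hop):
        self.tables.setdefault(policy, {}).setdefault(metric, {})[dest] = next_hop

    def lookup(self, packet):
        per_policy = self.tables[packet["policy"]]   # stage 1: policy tag
        per_metric = per_policy[packet["metric"]]    # stage 2: metric tag
        return per_metric[packet["dest"]]            # stage 3: next hop

rt = RoutingTable()
rt.install("no-commercial-over-internet2", "latency", "nodeC", "nodeB")
print(rt.lookup({"policy": "no-commercial-over-internet2",
                 "metric": "latency", "dest": "nodeC"}))  # -> nodeB
```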
RON Design: Probing and Outage Detection
• Probe every PROBE_INTERVAL (12 s); with 3 packets, both participants get an RTT and reachability without synchronized clocks
• If a probe is lost, send the next one immediately, up to 3 more probes (PROBE_TIMEOUT = 3 s)
• Declare an outage after 4 consecutive probe losses
• Outage detection time: 19 s on average (see the sketch below)
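A hedged sketch of the per-path probing loop using the constants quoted above; probe() and on_outage() are hypothetical stand-ins for the 3-packet exchange and the reroute trigger:

```python
import time

PROBE_INTERVAL = 12   # seconds between routine probes
PROBE_TIMEOUT = 3     # per-probe timeout once a loss is suspected
OUTAGE_THRESHOLD = 4  # consecutive losses that signal an outage

def monitor_path(probe, on_outage):
    """probe(timeout) -> bool and on_outage() are hypothetical stand-ins:
    probe runs the 3-packet exchange, on_outage triggers rerouting."""
    while True:
        if probe(timeout=PROBE_TIMEOUT):
            time.sleep(PROBE_INTERVAL)   # path healthy: low-rate probing
            continue
        # A loss: send follow-up probes back-to-back instead of waiting 12 s.
        losses = 1
        while losses < OUTAGE_THRESHOLD and not probe(timeout=PROBE_TIMEOUT):
            losses += 1
        if losses >= OUTAGE_THRESHOLD:
            on_outage()                  # flood a link-state update, reroute
```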
RON Design: Policy Routing
• Allow users to define the types of traffic allowed on particular network links
• Place policy tags on packets and build policy-based routing tables
• Two policy mechanisms: exclusive cliques and general policies
RON-Evaluation
Two main datasets from the Internet deployment:
• RON1: N = 12 nodes, 132 distinct paths, traversing 36 ASes and 74 inter-AS paths
• RON2: N = 16 nodes, 240 distinct paths, traversing 50 ASes and 118 inter-AS paths
• Policy: prohibit sending traffic to or from commercial sites over Internet2
[Figure: deployment sites - CA-T1, CCI, Aros, Utah, CMU, MIT, MA-Cable, Cisco, Cornell, NYU, OR-DSL, plus links to vu.nl and Lulea.se]
RON Evaluation: Major Results
• Increases the resilience of the overlay network: RON takes ~10 s to route around a failure, compared to BGP's several minutes
• Many Internet outages are avoidable
• Improves performance: loss rate, latency, TCP throughput
• Single-hop indirect routing works well
• Overhead is reasonable and acceptable
RON vs. Internet: 30-minute loss rates
• RON1: able to route around all outages
• RON2: about 60% of outages are overcome
Performance-Loss rate
Performance-Latency
Performance-TCP Throughput
Performance improvement: RON employs application-specific metric optimization to select paths
Scalability
Conclusions
• RON improves network reliability and performance
• The overlay approach is attractive for resiliency: ease of development, fewer nodes, a simple substrate
• Single-hop indirection in RON works well
• RON also introduces more probing and update traffic into the network
RON-Discussion
Aggressiveness
• RON never backs off as TCP does
• Can it coexist with current traffic on the Internet?
• What happens if everyone starts to use RON?
• Is it possible to modify RON to achieve good behavior in a global sense?
Scalability
• Trades scalability for improved reliability
• Many RONs coexisting in the Internet
• A hierarchical structure of RON networks
Distributed Hash Table
Problem
• Route a message with key K to the node Z whose ID is closest to K
• Not scalable if the routing table contains all the nodes
• Tradeoffs: memory per node, lookup latency, number of messages
[Figure: routing example - Route(d46a1c) from node A = 65a1fc toward key X = d46a1c, via nodes d13da3, d4213f, d462ba, d467c4, d471f1]
Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems
Antony Rowstron and Peter Druschel
Middleware 2001. Acknowledgements: previous CS525 courses
Motivation
• Node IDs are assigned randomly; with high probability, nodes with adjacent IDs are diverse
• Considers network locality: seeks to minimize the distance messages travel
• Uses a scalar proximity metric: # IP routing hops, or RTT
Pastry: Node Soft State
• Leaf set: immediate neighbors in ID space (used for routing; similar to successor and predecessor lists)
• Neighborhood set: nodes closest according to locality (used to update the routing table)
• Routing table: used for routing (similar to finger table entries)
• Storage requirement at each node = O(log N)
Pastry: Node Soft State
• Leaf set: contains L nodes, the closest in the ID space
• Neighborhood set: contains M nodes, the closest according to the proximity metric
• Routing table: entries in row n share exactly the first n digits with the local node; entries are chosen according to the proximity metric (see the sketch below)
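A minimal sketch of this per-node state, assuming hex-string IDs and the common choice b = 4; the names and default set sizes are illustrative:

```python
# Illustrative sketch of one Pastry node's soft state; hex-string IDs and
# b = 4 (16-way digits) are assumptions, as are the default set sizes.
B = 4                # bits per ID digit
DIGITS = 2 ** B      # 16 possible values per digit
ID_LEN = 32          # digits in a 128-bit ID

class PastryNode:
    def __init__(self, node_id, L=16, M=32):
        self.node_id = node_id
        self.leaf_set = []       # L nodes numerically closest in ID space
        self.neighborhood = []   # M nodes closest by the proximity metric
        # routing_table[row][col]: some node sharing the first `row` digits
        # with node_id and having digit value `col` at position `row`.
        self.routing_table = [[None] * DIGITS for _ in range(ID_LEN)]
```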
Pastry: Routing
Case I: key within the leaf set
• Route to the node in the leaf set with the ID closest to the key
Case II (prefix routing): key not within the leaf set
• Route to a node in the routing table that shares one more digit with the key than the local node does
Case III: key not within the leaf set and Case II not possible
• Route to a node that shares at least the same number of digits with the key but is numerically closer to the key than the local node
(A sketch of the full forwarding decision follows.)
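Putting the three cases together, a hedged sketch building on the PastryNode sketch above (a real implementation orders the leaf set around the local ID rather than scanning lists):

```python
def shared_prefix_len(a, b):
    """Number of leading hex digits two IDs share."""
    n = 0
    while n < len(a) and a[n] == b[n]:
        n += 1
    return n

def route(node, key):
    """Sketch of Pastry's forwarding decision at `node` (the PastryNode
    above); IDs are hex strings of equal length."""
    if key == node.node_id:
        return node.node_id
    dist = lambda nid: abs(int(nid, 16) - int(key, 16))
    # Case I: key falls within the leaf set's range -> deliver to the
    # numerically closest node (possibly the local node itself).
    if node.leaf_set:
        ids = [int(n, 16) for n in node.leaf_set]
        if min(ids) <= int(key, 16) <= max(ids):
            return min(node.leaf_set + [node.node_id], key=dist)
    # Case II: prefix routing -- forward to the routing-table entry that
    # shares one more leading digit with the key than we do.
    p = shared_prefix_len(node.node_id, key)
    entry = node.routing_table[p][int(key[p], 16)]
    if entry is not None:
        return entry
    # Case III: no such entry -- fall back to any known node that shares
    # at least as many digits with the key and is numerically closer.
    known = node.leaf_set + node.neighborhood + \
            [n for row in node.routing_table for n in row if n]
    candidates = [n for n in known
                  if shared_prefix_len(n, key) >= p and dist(n) < dist(node.node_id)]
    return min(candidates, key=dist) if candidates else node.node_id
```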
Routing Example
• Each hop cuts the remaining ID space by a factor of 2^b
• Number of hops needed is log_{2^b} N (a worked example follows)
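As a quick sanity check of this bound, a worked example assuming the commonly used b = 4 and N = 10^6 nodes:

```latex
\log_{2^b} N = \frac{\log_2 N}{b}
\quad\Longrightarrow\quad
\log_{16} 10^{6} = \frac{\log_2 10^{6}}{4} \approx \frac{19.93}{4} \approx 5\ \text{hops}.
```

So a lookup takes about five hops in a million-node network.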
[Figure: lookup(d46a1c) from node 65a1fc resolves toward key d46a1c via d13da3, d4213f, d462ba, d467c4, d471f1]
Self Organization: Node Join
[Figure: new node X = d46a1c joins through nearby node A = 65a1fc; the join message Route(d46a1c) travels via d13da3, d4213f, d462ba to Z = d467c4, the live node with the ID numerically closest to X]
• New node: X = d46a1c
• A is X's neighbor (close by the proximity metric)
Pastry: Node State Initialization
• Leaf set of X = leaf set of Z
• Neighborhood set of X = neighborhood set of A
• Routing table: row zero of X = row zero of A; row one of X = row one of B
• In general, row i of X's routing table is copied from the i-th node encountered along the join route, since that node shares the first i digits with X
(A sketch follows after the figure.)
[Figure: the same join route, labeling B = d13da3 as the node after A = 65a1fc on the path from new node X = d46a1c to Z = d467c4]
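A minimal sketch of this initialization, assuming the join message records the nodes it visits (route_nodes = [A, B, …, Z]); it builds on the PastryNode sketch above:

```python
def initialize_state(x, route_nodes):
    """x: the joining PastryNode; route_nodes: the nodes the join message
    visited, in order [A, B, ..., Z]. Illustrative sketch only."""
    a, z = route_nodes[0], route_nodes[-1]
    x.leaf_set = list(z.leaf_set)          # Z is numerically closest to X
    x.neighborhood = list(a.neighborhood)  # A is physically closest to X
    # Row i comes from the i-th node on the route, which shares (at least)
    # the first i digits with X, so its row i entries suit X as well.
    for i, n in enumerate(route_nodes):
        x.routing_table[i] = list(n.routing_table[i])
```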
Pastry: Node State Update
• X informs any nodes that need to be aware of its arrival
• X also improves its table's locality by requesting neighborhood sets from all the nodes it knows
• In practice: an optimistic approach
Pastry: Node Departure (Failure)
• Leaf set repair (eager, all the time): leaf set members exchange keep-alive messages; request the set from the furthest live node in the set
• Routing table repair (lazy, upon failure): get a replacement entry from peers in the same row; if not found, from higher rows (see the sketch below)
• Neighborhood set repair (eager)
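A hedged sketch of the lazy routing-table repair; ask() is a hypothetical RPC stand-in:

```python
def repair_entry(node, row, col, ask):
    """Replace a failed routing-table entry at (row, col). ask(peer, row, col)
    is a hypothetical RPC returning that peer's entry, or None."""
    # First ask peers in the same row (they share the same prefix length),
    # then widen the search to higher, longer-prefix rows.
    rows_to_try = [node.routing_table[row]] + node.routing_table[row + 1:]
    for candidates in rows_to_try:
        for peer in candidates:
            if peer is None:
                continue
            replacement = ask(peer, row, col)
            if replacement is not None:
                node.routing_table[row][col] = replacement
                return replacement
    return None  # no live peer knew a suitable replacement
```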
Routing Performance
• Average number of hops = log_{2^b} N
• Pastry additionally uses locality information to keep each hop short
Kelips: Building an Efficient and Stable P2P DHT through Increased Memory and Background Overhead
Indranil Gupta, Ken Birman, Prakash Linga, Al Demers, and Robert van Renesse
IPTPS 2003. Acknowledgements: previous CS525 courses
Motivation
• For n = 1,000,000 and 20 bytes per entry, the storage requirement at a Pastry node = 120 bytes
• Memory is not being used efficiently
• How can we achieve O(1) lookup latency? Increase the memory usage per node
Design
• Consists of k virtual affinity groups
• Each node is a member of one affinity group
Soft state (see the sketch below)
• Affinity group view: a (partial) set of the other nodes lying in the same affinity group
• Contacts: a (constant-sized) set of nodes lying in each of the foreign affinity groups
• Filetuples: a (partial) set of (filename, host IP address) pairs, where the host (homenode) lies in the same affinity group
• Each entry carries an RTT and a heartbeat count
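A minimal sketch of this soft state as Python dataclasses; all field names are illustrative assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class Entry:
    node: str            # peer's address
    hbeat: int = 0       # heartbeat count, refreshed by gossip
    rtt_ms: float = 0.0  # measured round-trip time

@dataclass
class KelipsNode:
    group: int                                      # own affinity group id
    k: int                                          # number of affinity groups
    view: list = field(default_factory=list)        # Entries for same-group peers
    contacts: dict = field(default_factory=dict)    # group id -> list of Entries
    filetuples: dict = field(default_factory=dict)  # filename -> homenode address
```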
Kelips: Node Soft State
[Figure: soft state at a Kelips node, across affinity groups #0, #1, …, #k-1 (member nodes shown: 129, 30, 15, 160, 76, 18, 167)]
  affinity group view: id 18 (hbeat 1890, rtt 23 ms); id 167 (hbeat 2067, rtt 67 ms)
  contacts: group 0 -> [129, 30, …]; group 1 -> [15, 160, …]
  filetuple: p2p.txt -> homenode in [18, 167, …]
Storage requirement at a Kelips node: S(k, n) = n/k + c(k-1) + F/k entries
(affinity group view + contacts + filetuples)
• Minimized at k = √((n+F)/c) (derivation below)
• Assuming F is proportional to n and c is fixed, the optimal k = O(√n)
• For n = 1,000,000 and F = 10 million: total memory requirement < 20 MB
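Setting the derivative of S with respect to k to zero recovers the quoted optimum:

```latex
S(k,n) \;=\; \frac{n}{k} \;+\; c\,(k-1) \;+\; \frac{F}{k},
\qquad
\frac{\partial S}{\partial k} \;=\; -\,\frac{n+F}{k^{2}} + c \;=\; 0
\;\;\Longrightarrow\;\;
k \;=\; \sqrt{\frac{n+F}{c}}.
```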
Algorithm: Lookup
Lookup(key D) at node A:
• The affinity group G of D's homenode = hash(D)
• A sends the message to the closest node X in its contact set for affinity group G
• X finds the homenode of D from its filetuple set
• O(1) lookup (see the sketch below)
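A minimal sketch of the one-hop lookup, reusing the KelipsNode sketch above and mapping the hash into one of the k groups; hash_fn and send() are hypothetical stand-ins:

```python
def kelips_lookup(node, filename, hash_fn, send):
    """Sketch of the O(1) lookup from `node` (the KelipsNode above).
    hash_fn and send(target, msg) are hypothetical stand-ins."""
    g = hash_fn(filename) % node.k            # group of the file's homenode
    if g == node.group:
        return node.filetuples.get(filename)  # tuple may be held locally
    # One hop: ask our closest (lowest-RTT) contact in group g, which
    # resolves the filename against its own filetuple set.
    contact = min(node.contacts[g], key=lambda e: e.rtt_ms)
    return send(contact.node, ("lookup", filename))
```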
Maintaining Soft State
• Heartbeat mechanism: soft-state entries are refreshed periodically, within and across groups
• Each node periodically selects a few nodes as gossip targets and sends them partial soft-state information
• Uses a constant gossip message size
• O(log n) complexity for gossiping (see the sketch below)
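A hedged sketch of one gossip round, reusing Entry from the soft-state sketch; the fanout and per-message entry budget are illustrative constants that keep messages constant-sized:

```python
import random

def gossip_round(node, same_group_peers):
    """One heartbeat/gossip round at `node` (the KelipsNode above); peers
    are hypothetical objects with a receive() method."""
    FANOUT, BUDGET = 3, 32  # illustrative constants: message size stays fixed
    targets = random.sample(same_group_peers, min(FANOUT, len(same_group_peers)))
    payload = random.sample(node.view, min(BUDGET, len(node.view)))
    for t in targets:
        t.receive(payload)  # receiver keeps, per node, whichever entry
                            # carries the higher heartbeat count
```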
Load Balance
[Plots: load distribution for N = 1500 with 38 affinity groups, and for 1000 nodes with 30 affinity groups]
Discussion Points
• What happens when the triangle inequality does not hold for a proximity metric?
• What happens under a high churn rate in Pastry and Kelips?
• What was the intuition behind affinity groups?
• Can we use Kelips in an Internet-scale network?
Conclusion
A DHT tradeoff: storage requirement vs. lookup latency
Going one step further: one-hop lookups for p2p overlays
http://www.usenix.org/events/hotos03/tech/talks/gupta_talk.pdf
                      Chord      Pastry     Kelips
Storage requirement   O(log N)   O(log N)   O(√N)
Lookup latency        O(log N)   O(log N)   O(1)