Upload
bob
View
46
Download
0
Tags:
Embed Size (px)
DESCRIPTION
Week 2: P2P Overlay Networks. Minseok Kwon Department of Computer Science Rochester Institute of Technology [email protected] http://www.cs.rit.edu/~jmk. Client: Lookup (“title”). N2. N1. N3. N7. N4. N5. N8. N6. - PowerPoint PPT Presentation
Citation preview
1
Minseok Kwon
Department of Computer ScienceRochester Institute of Technology
http://www.cs.rit.edu/~jmk
Week 2: P2P Overlay Week 2: P2P Overlay NetworksNetworks
2
Peer-to-Peer (P2P) Networks
Internet
PublisherKey = “Davinci Code”
File = mpg data
ClientLookup(“Davinci Code”)
File = mpg data
• Lookup problem: Given a network of peers and a keyword, how can we find the data?
N1
N4
N5
N6
N2
N7
N3
N8
Client: Lookup (“title”)
• Structured P2P: Can we find a target efficiently, specifically in O(log n) steps?
3
Distributed Hash Table (DHT)• Question: Can we do the lookup in the logarithmic time?
If yes, how?• In CS 3, we know how to do this if numbers are sorted in
an array. Run binary search!• In P2P networks, we first compute node identifiers and
key identifiers using a hash function. • Key-identifier = SHA-1(keyword)• Node-identifier = SHA-1(IP address)
• We then arrange nodes and data items in a certain order in the same ID space. This is called DHT!
• Now, how can we map key IDs to node IDs to support the lookup?
• At the same time, the system must be scalable, robust, self-organizing, and completely decentralized.
4
CAN: Insert/Retrieve Files
y = hy(K)
I
• Node I wants to add file F with keyword K.
• First, get the target coordinates x and y.
• Second, route to the target (x, y).
• Third, store F at the target.
• How can we retrieve F later?
• Answer: get the target coordinates using the same hash functions.
(K,F)
x = hx(K)
5
CAN: Routing
(a,b)
(x,y)
• Greedy approach• Select one of the
neighbors which gets closer to the target.
6
CAN: Node Join
(p,q)
I
X
new node
• A new node contacts a random node I initially.
• I routes the new node to other randomly picked target (p, q), which is node X.
• The node X zone is split in half in which the new node takes one half.
new node
7
• Suppose that there are n nodes and d dimensions.• Scalable?
• Inserting a new node affects only a single other node and its immediate neighbors
• A node only maintains state for its immediate neighboring nodes. The number of neighbors is 2d.
• Efficient?• Average routing path: (dn1/d)/4 hops.
• Robust?• Resilient to node and link failures• No single point of failure since the system is completely
distributed.
How Good is CAN?
8
Chord: Basic Lookup
N32
N105
N90
Circular 7-bitID space
K5
K20
Key 5:
Node 105:
Which node is K80 stored at?
A key is stored at its successor
9
Chord: Advanced Lookup
N80
N120 1/2
1/4
1/8
1/161/32
Finger i points to successor of n+2^i
N80
N99
N110
N5
N10
N20
N32
N60
K19
Lookup(K19)
Lookup takes O(log(N)) steps
10
Chord: Join
N25
N40
K30, K38
K36 wants to join
N25
N40
K30, K38
K36 wants to join N36 K30
11
How Good is Chord?• Suppose that there are n nodes.• Scalable?
• O(log(n)) states per lookup
• Efficient?• O(log(n)) messages (or steps) per lookup
• Robust?• Survives massive failures
12
Is DHT a Silver Bullet?• What is the dominant P2P applications?
• Mass-market file sharing mostly for music, video, and software
• Potential problems with these applications?• Wide range of heterogeneity• Large transient user population
• Flooding does not scale; how about DHTs?• DHTs scale, especially for finding the location of a given
filename.• One problem with DHTs: require lots of extra work including
caching, keyword searching
• Do we really need DHTs for mass-market file sharing?• NOT necessarily!• DHTs are great at finding rare files, but most queries are about
popular files.
13
Suggested Solution: GIA• Can we make Gnutella-like P2P systems
scalable?• Our answer is yes, and we designed GIA!• Idea
• Unstructured (based on flooding), but take node capacity into account.
• High-capacity nodes have room for more queries; why not sending more queries to them?
• This will work only if: • High-capacity nodes have correspondingly more answers.• High-capacity nodes are easily reachable from other
nodes.
14
GIA (Gianduia) Design• Make high-capacity nodes easily reachable
• Dynamic topology adaptation
• Make high-capacity nodes have more answers• One-hop replication
• Search efficiently• Biased random walks
• Prevent overloaded nodes• Active flow control Query
15
Dynamic Topology Adaptation• Goals
• High capacity nodes are high-degree nodes.• Low capacity nodes are close to higher capacity ones.
new nodehigh-capacity node
• How does a new node join?• A new node selects the node with max capacity (> its own
capacity) among a small number of randomly selected nodes.
• Uses a level of satisfaction as a function of capacity, degree, and age.
• Neighbors must have outgoing capacity to handle forwarded queries.
16
One-hop Replication
• Content information is exchanged during connection, and updated incrementally.
• High-capacity peers can act as a proxy for low capacity peers.• Nodes keep an index of their neighbors’ shared files.
high-capacity node
high-capacity node maintains indices for bothitself and neighbors
17
Active Flow Control• Sender sends queries to a neighbor only if that
neighbor accepts queries.
• How to know whether or not accept queries?• If that neighbor sends a token…
• Active flow control periodically assigns tokens to neighbors.
• Token allocation rate varies on query processing capability and buffer queue.
18
Search Protocol• Biased random walk
• Rather than forwarding incoming queries to randomly chosen neighbors, a node selects the highest capacity neighbor for which it has flow-control tokens and sends the query to that neighbor.
• Use GUID to send queries to different paths.• TTL and max_responses bound propagation.
• Advantage: reduce flooding• Disadvantage: sensitive to peer failures
19
Performance
0.00001
0.001
0.1
10
1000
0.01 0.1 1Replication Rate (percentage)
Collapse Point (qps/node)
GIA: N=10,000
SUPER: N=10,000
RWRT: N=10,000
FLOOD: N=10,000
20
Transient Behavior
• GIA outperforms under heavy churn.
0.001
0.01
0.1
1
10
100
1000
10 100 1000 10000Per-node max-lifetime (seconds)
Collapse point (qps/node)
replication rate = 1.0%
replication rate = 0.5%
replication rate = 0.1%
21
Summary• GIA: scalable Gnutella
• 3-5 orders of magnitude improvement in system capacity.
• Unstructured approach is good enough• DHTs may be overkill!