21
1 Minseok Kwon Department of Computer Science Rochester Institute of Technology [email protected] http://www.cs.rit.edu/~jmk Week 2: P2P Overlay Week 2: P2P Overlay Networks Networks

Minseok Kwon Department of Computer Science Rochester Institute of Technology [email protected]

  • Upload
    bob

  • View
    46

  • Download
    0

Embed Size (px)

DESCRIPTION

Week 2: P2P Overlay Networks. Minseok Kwon Department of Computer Science Rochester Institute of Technology [email protected] http://www.cs.rit.edu/~jmk. Client: Lookup (“title”). N2. N1. N3. N7. N4. N5. N8. N6. - PowerPoint PPT Presentation

Citation preview

Page 1: Minseok Kwon Department of Computer Science Rochester Institute of Technology jmk@cs.rit

1

Minseok Kwon

Department of Computer ScienceRochester Institute of Technology

[email protected]

http://www.cs.rit.edu/~jmk

Week 2: P2P Overlay Week 2: P2P Overlay NetworksNetworks

Page 2: Minseok Kwon Department of Computer Science Rochester Institute of Technology jmk@cs.rit

2

Peer-to-Peer (P2P) Networks

Internet

PublisherKey = “Davinci Code”

File = mpg data

ClientLookup(“Davinci Code”)

File = mpg data

• Lookup problem: Given a network of peers and a keyword, how can we find the data?

N1

N4

N5

N6

N2

N7

N3

N8

Client: Lookup (“title”)

• Structured P2P: Can we find a target efficiently, specifically in O(log n) steps?

Page 3: Minseok Kwon Department of Computer Science Rochester Institute of Technology jmk@cs.rit

3

Distributed Hash Table (DHT)• Question: Can we do the lookup in the logarithmic time?

If yes, how?• In CS 3, we know how to do this if numbers are sorted in

an array. Run binary search!• In P2P networks, we first compute node identifiers and

key identifiers using a hash function. • Key-identifier = SHA-1(keyword)• Node-identifier = SHA-1(IP address)

• We then arrange nodes and data items in a certain order in the same ID space. This is called DHT!

• Now, how can we map key IDs to node IDs to support the lookup?

• At the same time, the system must be scalable, robust, self-organizing, and completely decentralized.

Page 4: Minseok Kwon Department of Computer Science Rochester Institute of Technology jmk@cs.rit

4

CAN: Insert/Retrieve Files

y = hy(K)

I

• Node I wants to add file F with keyword K.

• First, get the target coordinates x and y.

• Second, route to the target (x, y).

• Third, store F at the target.

• How can we retrieve F later?

• Answer: get the target coordinates using the same hash functions.

(K,F)

x = hx(K)

Page 5: Minseok Kwon Department of Computer Science Rochester Institute of Technology jmk@cs.rit

5

CAN: Routing

(a,b)

(x,y)

• Greedy approach• Select one of the

neighbors which gets closer to the target.

Page 6: Minseok Kwon Department of Computer Science Rochester Institute of Technology jmk@cs.rit

6

CAN: Node Join

(p,q)

I

X

new node

• A new node contacts a random node I initially.

• I routes the new node to other randomly picked target (p, q), which is node X.

• The node X zone is split in half in which the new node takes one half.

new node

Page 7: Minseok Kwon Department of Computer Science Rochester Institute of Technology jmk@cs.rit

7

• Suppose that there are n nodes and d dimensions.• Scalable?

• Inserting a new node affects only a single other node and its immediate neighbors

• A node only maintains state for its immediate neighboring nodes. The number of neighbors is 2d.

• Efficient?• Average routing path: (dn1/d)/4 hops.

• Robust?• Resilient to node and link failures• No single point of failure since the system is completely

distributed.

How Good is CAN?

Page 8: Minseok Kwon Department of Computer Science Rochester Institute of Technology jmk@cs.rit

8

Chord: Basic Lookup

N32

N105

N90

Circular 7-bitID space

K5

K20

Key 5:

Node 105:

Which node is K80 stored at?

A key is stored at its successor

Page 9: Minseok Kwon Department of Computer Science Rochester Institute of Technology jmk@cs.rit

9

Chord: Advanced Lookup

N80

N120 1/2

1/4

1/8

1/161/32

Finger i points to successor of n+2^i

N80

N99

N110

N5

N10

N20

N32

N60

K19

Lookup(K19)

Lookup takes O(log(N)) steps

Page 10: Minseok Kwon Department of Computer Science Rochester Institute of Technology jmk@cs.rit

10

Chord: Join

N25

N40

K30, K38

K36 wants to join

N25

N40

K30, K38

K36 wants to join N36 K30

Page 11: Minseok Kwon Department of Computer Science Rochester Institute of Technology jmk@cs.rit

11

How Good is Chord?• Suppose that there are n nodes.• Scalable?

• O(log(n)) states per lookup

• Efficient?• O(log(n)) messages (or steps) per lookup

• Robust?• Survives massive failures

Page 12: Minseok Kwon Department of Computer Science Rochester Institute of Technology jmk@cs.rit

12

Is DHT a Silver Bullet?• What is the dominant P2P applications?

• Mass-market file sharing mostly for music, video, and software

• Potential problems with these applications?• Wide range of heterogeneity• Large transient user population

• Flooding does not scale; how about DHTs?• DHTs scale, especially for finding the location of a given

filename.• One problem with DHTs: require lots of extra work including

caching, keyword searching

• Do we really need DHTs for mass-market file sharing?• NOT necessarily!• DHTs are great at finding rare files, but most queries are about

popular files.

Page 13: Minseok Kwon Department of Computer Science Rochester Institute of Technology jmk@cs.rit

13

Suggested Solution: GIA• Can we make Gnutella-like P2P systems

scalable?• Our answer is yes, and we designed GIA!• Idea

• Unstructured (based on flooding), but take node capacity into account.

• High-capacity nodes have room for more queries; why not sending more queries to them?

• This will work only if: • High-capacity nodes have correspondingly more answers.• High-capacity nodes are easily reachable from other

nodes.

Page 14: Minseok Kwon Department of Computer Science Rochester Institute of Technology jmk@cs.rit

14

GIA (Gianduia) Design• Make high-capacity nodes easily reachable

• Dynamic topology adaptation

• Make high-capacity nodes have more answers• One-hop replication

• Search efficiently• Biased random walks

• Prevent overloaded nodes• Active flow control Query

Page 15: Minseok Kwon Department of Computer Science Rochester Institute of Technology jmk@cs.rit

15

Dynamic Topology Adaptation• Goals

• High capacity nodes are high-degree nodes.• Low capacity nodes are close to higher capacity ones.

new nodehigh-capacity node

• How does a new node join?• A new node selects the node with max capacity (> its own

capacity) among a small number of randomly selected nodes.

• Uses a level of satisfaction as a function of capacity, degree, and age.

• Neighbors must have outgoing capacity to handle forwarded queries.

Page 16: Minseok Kwon Department of Computer Science Rochester Institute of Technology jmk@cs.rit

16

One-hop Replication

• Content information is exchanged during connection, and updated incrementally.

• High-capacity peers can act as a proxy for low capacity peers.• Nodes keep an index of their neighbors’ shared files.

high-capacity node

high-capacity node maintains indices for bothitself and neighbors

Page 17: Minseok Kwon Department of Computer Science Rochester Institute of Technology jmk@cs.rit

17

Active Flow Control• Sender sends queries to a neighbor only if that

neighbor accepts queries.

• How to know whether or not accept queries?• If that neighbor sends a token…

• Active flow control periodically assigns tokens to neighbors.

• Token allocation rate varies on query processing capability and buffer queue.

Page 18: Minseok Kwon Department of Computer Science Rochester Institute of Technology jmk@cs.rit

18

Search Protocol• Biased random walk

• Rather than forwarding incoming queries to randomly chosen neighbors, a node selects the highest capacity neighbor for which it has flow-control tokens and sends the query to that neighbor.

• Use GUID to send queries to different paths.• TTL and max_responses bound propagation.

• Advantage: reduce flooding• Disadvantage: sensitive to peer failures

Page 19: Minseok Kwon Department of Computer Science Rochester Institute of Technology jmk@cs.rit

19

Performance

0.00001

0.001

0.1

10

1000

0.01 0.1 1Replication Rate (percentage)

Collapse Point (qps/node)

GIA: N=10,000

SUPER: N=10,000

RWRT: N=10,000

FLOOD: N=10,000

Page 20: Minseok Kwon Department of Computer Science Rochester Institute of Technology jmk@cs.rit

20

Transient Behavior

• GIA outperforms under heavy churn.

0.001

0.01

0.1

1

10

100

1000

10 100 1000 10000Per-node max-lifetime (seconds)

Collapse point (qps/node)

replication rate = 1.0%

replication rate = 0.5%

replication rate = 0.1%

Page 21: Minseok Kwon Department of Computer Science Rochester Institute of Technology jmk@cs.rit

21

Summary• GIA: scalable Gnutella

• 3-5 orders of magnitude improvement in system capacity.

• Unstructured approach is good enough• DHTs may be overkill!