COOPIS 2001, Trento, Italy
©2001, Karl Aberer, EPFL-DSC, Laboratoire de systèmes d'informations répartis


P-Grid: A Self-organizing Access Structure for P2P Information Systems

Karl Aberer
EPFL-DSC
Distributed Information Systems Laboratory


Overview

1. Peer-to-Peer Information Systems
2. Data Access in a P2P Information System
3. P-Grid
   1. Structure
   2. Construction algorithm
   3. Simulation
4. P-Grid Search and Update
   1. Algorithms
   2. Simulation
5. Application to Gnutella
6. Conclusions


1. P2P Information Systems

• P2P systems currently draw a lot of attention
  – File-sharing systems
    • Napster, Gnutella, FreeNet, etc.
  – Conferences
    • O'Reilly P2P conference 2001 (conferences.oreilly.com/p2p/)
    • 2001 International Conference on Peer-to-Peer Computing (P2P2001) (www.ida.liu.se/conferences/p2p/p2p2001/)
    • …


Napster [www.napster.com]

[Figure: peers A and C and the central Napster server, connected via the Internet; file labels XXX.mp3 and YYY.mp3]

1. A asks Napster: "I am searching for XXX.mp3"
2. Napster tells A: "C should have XXX.mp3"
3. A asks C: "I am requesting XXX.mp3"
4. C delivers XXX.mp3 to A
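The four steps above can be sketched as a lookup against a central directory. This is only an illustration: the class `CentralDirectory`, its methods, and the assumption that peer C registers both XXX.mp3 and YYY.mp3 are hypothetical and are not taken from the Napster protocol.

```python
from collections import defaultdict


class CentralDirectory:
    """Hypothetical stand-in for the central server: maps file names to peers."""

    def __init__(self):
        self._index = defaultdict(set)

    def register(self, peer, filenames):
        # A peer announces the files it is willing to share.
        for name in filenames:
            self._index[name].add(peer)

    def lookup(self, filename):
        # Steps 1-2: the server answers with the peers that should have the file.
        return sorted(self._index.get(filename, ()))


directory = CentralDirectory()
directory.register("C", ["XXX.mp3", "YYY.mp3"])  # assumed file placement
holders = directory.lookup("XXX.mp3")
# Steps 3-4 (the transfer itself) happen directly between A and the returned peer.
```

The search is a single lookup at the server, which is why a central database is efficient, but every peer depends on that one server.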


Gnutella [www.gnutella.com]

[Figure: peers A, B and C connected via the Internet; file labels XXX.mp3, YYY.mp3 and ZZZ.mp3]

1. A asks B: "I am searching for XXX.mp3"
2. B tells A: "C should have XXX.mp3"
3. A asks C: "I am requesting XXX.mp3"
4. C delivers XXX.mp3 to A


Properties of P2P Information Systems

• No central coordination
• No central database
• No peer has a global view of the system
• Global behavior emerges from local interactions
• Peers are autonomous
• Peers and connections are unreliable

• Despite these limitations: all existing information should be accessible


2. Data Access in a P2P System

• B2B servers, Napster, eBay, etc.
  – Central database (efficient)!
• Gnutella
  – Search requests are broadcast (inefficient)
  – Anecdote: the founder of Napster computed that a single search request (18 bytes) on a Napster community would generate 90 Mbytes of data transfers. [http://www.darkridge.com/~jpr5/doc/gnutella.html]
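To see why broadcasting is costly, the following minimal sketch counts the messages produced by a hop-limited flood over a small overlay. It is a simplification and not the Gnutella wire protocol: the function `flood_search`, the `ttl` parameter and the four-peer topology are assumptions, and a peer here forwards to all neighbours, including the one it received the request from.

```python
from collections import deque


def flood_search(neighbors, start, ttl):
    """Return (peers reached, messages sent) for a hop-limited broadcast."""
    seen = {start}
    frontier = deque([(start, ttl)])
    messages = 0
    while frontier:
        peer, remaining = frontier.popleft()
        if remaining == 0:
            continue  # hop budget exhausted, do not forward further
        for nxt in neighbors[peer]:
            messages += 1  # every forwarded copy costs one message
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, remaining - 1))
    return seen, messages


# Assumed toy overlay with four peers.
overlay = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["B", "C"],
}
reached, sent = flood_search(overlay, "A", ttl=2)
```

Even in this four-peer overlay, one request from A produces 8 messages to reach 3 other peers; the number of messages grows with the number of links traversed, not with the number of answers.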


Problem

• Can a set of peers provide
  – efficient search on a data set
  – whose storage requirement substantially exceeds the resources of each peer, e.g. s_local = O(log(s_global))?
• Answer
  – In principle, yes!
  – Requires a scalable data access structure


Scalable Data Access Structures

• Work in the following way
  – Every peer maintains a small fragment of the database and a routing table
  – The routing tables are organized such that requests can be forwarded at different levels of granularity
  – Replication is used to increase robustness

[Figure: routing hierarchy with route R0 and route R1 at the first level, route R00 and route R01 at the second level, and data D01 at the leaf]


Approaches

• Scalable data access structures
  – [Plaxton 97] (distributed object addressing)
  – CHORD [Dabek 01] (distributed object addressing)
  – CAN (distributed object addressing)
  – FreeNet [Clarke 00] (file sharing systems)
  – [Litwin 97] (distributed databases)
  – [Yokota 99] (parallel databases)
  – P-Grid [Aberer 01] (decentralized databases)
  – etc.
• Question
  – Are they decentralized?


Comparison Criteria

Scalable data access structure
• Routing criteria
  – trees, key similarity, hashing, multidim. keys, …
• Search criteria
  – equality, prefix, range, similarity
• Performance
  – search, update, join and leave the network
• Robustness
  – use of replication

De-centralization
• Global knowledge (except nature of search keys)
  – number of ex. addresses
• Global control
  – coordinator, central repository
• Local autonomy
  – fixed association of roles with address


Comparison – Data Access Structure

System    Routing               Search    Search perform.  Replication
Plaxton   Binary tree           equality  O(log n)         yes
CHORD     Implicit binary tree  equality  O(log n)         no
CAN       Multi-dim. grid       equality  O(n^(1/d))       yes
FreeNet   Key similarity        equality  O(log n) ?       yes
Yokota    B-Tree                range     O(log n)         no
P-Grid    Binary tree           prefix    O(log n)         yes


Comparison - Decentralization

System    Global knowledge    Global control  Local autonomy
Plaxton   Max # participants  no              no
CHORD     IP address space    no              no
CAN       none                no              yes
FreeNet   none                no              no
Yokota    all                 yes             no
P-Grid    none                no              yes


3. The P-Grid Search Structure


Data Structure of a Peer

[Figure: peer a in the binary search tree. Path of peer: R0, R01, R010, R0101. References kept at each level: R1, R00, R011, R0100. At the leaf the peer stores the data references "ref data R0101".]
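The structure shown for peer a can be sketched as a small record: a binary path plus, for each level of that path, references to peers on the complementary branch. The field names (`path`, `refs`, `data_refs`) and the neighbour addresses "b" to "e" are illustrative assumptions, not taken from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    address: str
    path: str = ""                                 # binary search path, e.g. "0101"
    refs: list = field(default_factory=list)       # refs[i]: addresses of peers on the other branch at level i
    data_refs: dict = field(default_factory=dict)  # key -> addresses of peers holding the data item

    def complement_prefix(self, level):
        """Prefix covered by the references stored at `level` (0-based)."""
        bit = "1" if self.path[level] == "0" else "0"
        return self.path[:level] + bit


# Peer a from the figure, with one hypothetical reference per level.
a = Peer(address="a", path="0101", refs=[["b"], ["c"], ["d"], ["e"]])
prefixes = [a.complement_prefix(i) for i in range(len(a.path))]
# prefixes == ["1", "00", "011", "0100"], i.e. R1, R00, R011, R0100
```

Each peer thus stores only one routing entry set per level of its own path, so the routing state grows with the path length rather than with the number of peers.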


P-Grid Construction

• Bootstrap problem: how to build the P-Grid?
  – without a fixed association of addresses with keys
    • i.e. a global schema to assign roles
    • which would violate local autonomy
  – efficiently


P-Grid Construction Algorithm (Bootstrap)

• When peers meet (randomly)
  – Compare the current search paths p and q
• Case 1: p and q are the same
  – If the maximal path length is not reached, extend the paths and split the search space, i.e. to p0 and q1
• Case 2: p is a subpath of q, i.e. q = p0…
  – Extend p by the complement of q, i.e. p1
• Case 3: only a common prefix exists
  – Forward to one of the referenced peers
  – Limit forwarding by recmax
• The peers remember each other and, in addition, exchange references at all levels
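The three cases can be sketched as a recursive `exchange` function. This is a minimal sketch under stated assumptions, not the author's implementation: the names `Peer`, `add_ref` and `exchange` are hypothetical, `k` stands for the maximal path length, references are capped at `refmax` by random eviction, and the exchange of references at all common levels is omitted for brevity (peers only remember each other).

```python
import random
from dataclasses import dataclass, field


@dataclass(eq=False)
class Peer:
    name: str
    path: str = ""                                        # current search path
    refs: list = field(default_factory=list, repr=False)  # refs[i]: peers on the other branch at level i


def common_prefix_len(p, q):
    n = 0
    while n < min(len(p), len(q)) and p[n] == q[n]:
        n += 1
    return n


def flip(bit):
    return "1" if bit == "0" else "0"


def add_ref(peer, level, other, refmax, rng):
    """Remember `other` at `level`, keeping at most refmax references."""
    while len(peer.refs) <= level:
        peer.refs.append([])
    if other not in peer.refs[level]:
        peer.refs[level].append(other)
    while len(peer.refs[level]) > refmax:
        peer.refs[level].pop(rng.randrange(len(peer.refs[level])))


def exchange(a, b, k=6, recmax=2, refmax=1, depth=0, rng=random):
    lc = common_prefix_len(a.path, b.path)
    if len(a.path) == lc and len(b.path) == lc:
        # Case 1: identical paths, split the search space if k is not reached.
        if lc < k:
            a.path += "0"
            b.path += "1"
            add_ref(a, lc, b, refmax, rng)
            add_ref(b, lc, a, refmax, rng)
    elif len(a.path) == lc or len(b.path) == lc:
        # Case 2: one path is a prefix of the other; the shorter one takes
        # the complement of the longer path's next bit.
        short, long_ = (a, b) if len(a.path) == lc else (b, a)
        short.path += flip(long_.path[lc])
        add_ref(short, lc, long_, refmax, rng)
        add_ref(long_, lc, short, refmax, rng)
    else:
        # Case 3: the paths diverge at level lc. a's references at that level
        # lie on b's branch (and vice versa), so each peer is forwarded to one
        # of the other's references, up to recursion depth recmax.
        for_b = [r for r in a.refs[lc] if r is not b]
        for_a = [r for r in b.refs[lc] if r is not a]
        add_ref(a, lc, b, refmax, rng)
        add_ref(b, lc, a, refmax, rng)
        if depth < recmax:
            if for_b:
                exchange(b, rng.choice(for_b), k, recmax, refmax, depth + 1, rng)
            if for_a:
                exchange(a, rng.choice(for_a), k, recmax, refmax, depth + 1, rng)
```

A usage example: two fresh peers that meet split the key space, so after `exchange(Peer("a"), Peer("b"))` the first has path "0" and the second path "1", and each holds the other as its level-0 reference. Repeating random meetings over a population lets the paths grow without any peer being assigned a role in advance.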


Simulations

• Implementation in Mathematica
• Simulation parameters (n, k, recmax, refmax)
  – Peer population size n
  – Key length k
  – Recursion depth recmax
  – Multiple references refmax
• Determine the number of meetings required
  – by each peer
  – to reach on average 99% of the maximal path length


Dependency on Peer Population Size

• (n = 200..1000, k = 6, recmax = 2, refmax = 1)
• None!?

[Plot: x-axis n from 200 to 1000; y-axis ticks from 21.5 to 26.5]


Dependency on Key Length

• (n = 500, k = 2..7, recmax = 2, refmax = 1)
• Exponential

[Plot: x-axis k from 2 to 7; y-axis ticks from 0 to 400]


Dependency on Recursion Depth

• (n = 500, k = 6, recmax = 0..6, refmax = 1)
• There exists an optimal value

[Plot: x-axis recmax from 0 to 6; y-axis ticks from 0 to 80]


Replica Distribution

• (n = 20000, k = 10, recmax = 2, refmax = 20)


Properties of P-Grid Bootstrap Algorithm

• Convergence?
  – Does not depend on population size
  – Depends on key length exponentially
  – Depends on recursion depth
• Distribution of replicas?
  – Simulations indicate a reasonable distribution
  – Access paths to replicas are non-uniformly distributed
• Balanced trees?
  – A simple argument (and simulations) shows that this is very likely


4. Search and Update

• Search is straightforward
  – Follow own path or references
  – At most k steps
  – If multiple references are online, select randomly
• Updates
  – All replicas need to be found
  – Repeated searches
    • Breadth first (limited recursion breadth)
    • Depth first
    • Depth first and contact buddies with same key
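The search rule above can be sketched as follows. The sketch assumes a prebuilt grid in which `refs[i]` of a peer holds peers on the complementary branch at level i; the names `Peer`, `search` and the `online` predicate are illustrative, and the four-peer grid with paths 00, 01, 10 and 11 is a made-up example.

```python
import random
from dataclasses import dataclass, field


@dataclass(eq=False)
class Peer:
    name: str
    path: str
    refs: list = field(default_factory=list, repr=False)  # refs[i]: peers on the other branch at level i


def common_prefix_len(p, q):
    n = 0
    while n < min(len(p), len(q)) and p[n] == q[n]:
        n += 1
    return n


def search(peer, key, online=lambda p: True, rng=random):
    """Return a peer responsible for `key`, or None if no online route exists."""
    lc = common_prefix_len(peer.path, key)
    if lc == len(peer.path) or lc == len(key):
        return peer  # follow own path: this peer covers the key
    # Otherwise the key leaves the peer's path at level lc; try the online
    # references for that level in random order.
    candidates = [r for r in peer.refs[lc] if online(r)]
    rng.shuffle(candidates)
    for r in candidates:
        found = search(r, key, online, rng)
        if found is not None:
            return found
    return None


# Hypothetical four-peer grid with paths of length 2.
p00, p01, p10, p11 = (Peer(n, n) for n in ("00", "01", "10", "11"))
p00.refs = [[p10], [p01]]
p01.refs = [[p11], [p00]]
p10.refs = [[p00], [p11]]
p11.refs = [[p01], [p10]]

hit = search(p00, "11")                                   # routed via p10 to p11
miss = search(p00, "11", online=lambda p: p is not p10)   # only route is offline
```

Every forwarding step reaches a peer that agrees with the key on at least one more bit, which is why a search takes at most k steps for keys of length k.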


Simulation Result

• (n = 20000, k = 10, recmax = 2, refmax = 20)
• Online probability 30%

[Plot: x-axis ticks from 1000 to 5000; y-axis ticks from 0.2 to 1; curves: breadth first search, search with buddies, depth first search]


Update vs. Search Cost

• Trade lower update quality for higher search cost
  – Use repeated searches to confirm results

recbreadth  update rep.  success rate  query cost  insertion cost
2           1            1             137         78
2           2            1             34          147
2           3            1             17          224
3           1            1             112         637
3           2            1             13          1434
3           3            1             13          2086

recbreadth  update rep.  success rate  query cost  insertion cost
2           1            0.65          5.5         72
2           2            0.85          5.6         145
2           3            0.89          5.4         212
3           1            0.95          5.5         734
3           2            0.98          5.5         1363
3           3            0.994         5.4         2080


P-Grid Variations

• To be further explored
  – No global, maximal key length
  – Growing and shrinking of keys
    • problem: integrity of referenced peers
  – Joining and leaving P-Grids


P-Grid Flexibility

• The algorithm is a framework rather than a single solution
  – options are left open and leave room for optimization
  – e.g. taking into account
    • access probability
    • existing data distribution
    • reachability and access cost


5. Application to Gnutella

• Currently under implementation
• Uses Gnutella protocol and software
• Controls routing of search requests using P-Grid
• Problem: non-uniform distribution of search keys
  – Build statistics
  – Compute a global, prefix-preserving hash function


Computing the Required Resources

• Assume
  – 10^7 searchable keys (substrings of filenames)
  – 10 bytes for storing a peer address
  – 10^5 bytes per peer provided for indexing
  – 30% online probability
  – 99% answer reliability
• Then
  – Approx. 20,000 peers can be supported
  – refmax = 20 is sufficient
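One way to reproduce these figures is the following back-of-the-envelope calculation. The model is an assumption, not stated on the slide: a search of key length k succeeds if at each of its k steps at least one of refmax references is online; a peer must store the key references of its leaf plus k·refmax routing references; and each of the 2^k leaves is replicated at about refmax peers. The function name and signature are hypothetical.

```python
import math


def required_resources(num_keys, addr_bytes, index_bytes, p_online, reliability):
    """Return (k, refmax, peers) for the smallest key length k that fits."""
    capacity = index_bytes // addr_bytes  # references one peer can store
    for k in range(1, 64):
        # Smallest refmax such that all k routing steps succeed with the
        # requested reliability: (1 - (1 - p)^refmax)^k >= reliability.
        refmax = 1
        while (1 - (1 - p_online) ** refmax) ** k < reliability:
            refmax += 1
        keys_per_leaf = math.ceil(num_keys / 2 ** k)
        if keys_per_leaf + k * refmax <= capacity:
            # Assumed population: every leaf replicated at about refmax peers.
            return k, refmax, 2 ** k * refmax
    raise ValueError("no key length below 64 bits satisfies the constraints")


k, refmax, peers = required_resources(
    num_keys=10**7, addr_bytes=10, index_bytes=10**5,
    p_online=0.3, reliability=0.99,
)
# k == 10, refmax == 20, peers == 20480
```

Under these assumptions k = 10 is the first key length for which a leaf's share of the keys (9,766 references) plus 200 routing references fits into the 10,000 references a peer can hold, and (1 - 0.7^20)^10 ≈ 0.992 meets the 99% target, while refmax = 19 would fall just short.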


6. Conclusions

• Scalable distributed and decentralized access structures are possible
• P-Grids offer a lot of flexibility to be further exploited
• Powerful tools for analysis required
• Foundation for many fully decentralized P2P applications
• Application in mobile ad-hoc networks (www.terminode.org), Swiss national research centre at EPFL


References

• [Aberer 01] Karl Aberer, Zoran Despotovic. Managing Trust in a Peer-2-Peer Information System. To appear in the Proceedings of the Ninth International Conference on Information and Knowledge Management (CIKM 2001) 2001.

• [Vingralek 98] Radek Vingralek, Yuri Breitbart, Gerhard Weikum: Snowball: Scalable Storage on Networks of Workstations with Balanced Load. Distributed and Parallel Databases 6(2): 117-156 (1998)

• [Stonebraker 96] Michael Stonebraker, Paul M. Aoki, Witold Litwin, Avi Pfeffer, Adam Sah, Jeff Sidell, Carl Staelin, Andrew Yu: Mariposa: A Wide-Area Distributed Database System. VLDB Journal 5(1): 48-63 (1996)

• [Plaxton 97] C. Greg Plaxton, Rajmohan Rajaraman, Andréa W. Richa: Accessing Nearby Copies of Replicated Objects in a Distributed Environment. SPAA 1997: 311-320.

• [Yokota 99] Haruo Yokota, Yasuhiko Kanemasa, Jun Miyazaki: Fat-Btree: An Update-Conscious Parallel Directory Structure. ICDE 1999: 448-457.

• [Litwin 97] Witold Litwin, Marie-Anne Neimat: LH*s: A High-Availability and High-Security Scalable Distributed Data Structure. RIDE 1997.

• [Stoica 00] Ion Stoica, Robert Morris, David Karger, Frans Kaashoek, Hari Balakrishnan. Chord: A Scalable Peer-To-Peer Lookup Service for Internet Applications. Proceedings of the ACM SIGCOMM, 2001.

• [Clarke 00] Ian Clarke, Oskar Sandberg, Brandon Wiley, and Theodore W. Hong. Freenet: A Distributed Anonymous Information Storage and Retrieval System. Designing Privacy Enhancing Technologies: International Workshop on Design Issues in Anonymity and Unobservability. LNCS 2009. Springer Verlag 2001.

• [Ratnasamy 01] Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, Scott Shenker. A Scalable Content-Addressable Network. Proceedings of the ACM SIGCOMM, 2001.