©2001, Karl Aberer, EPFL-DSC, Laboratoire de systèmes d'informations répartis
COOPIS 2001, Trento, Italy
P-Grid: A Self-organizing Access Structure for P2P Information Systems
Karl Aberer, EPFL-DSC
Distributed Information Systems [email protected]
Overview
1. Peer-to-Peer Information Systems
2. Data Access in a P2P Information System
3. P-Grid
   1. Structure
   2. Construction algorithm
   3. Simulation
4. P-Grid Search and Update
   1. Algorithms
   2. Simulation
5. Application to Gnutella
6. Conclusions
1. P2P Information Systems
• P2P systems currently draw a lot of attention
  – File-sharing systems
    • Napster, Gnutella, FreeNet, etc.
  – Conferences
    • O’Reilly P2P conference 2001 (conferences.oreilly.com/p2p/)
    • 2001 International Conference on Peer-to-Peer Computing (P2P2001) (www.ida.liu.se/conferences/p2p/p2p2001/)
    • …
Napster [www.napster.com]
[Figure: peers A and C and the Napster server, connected via the Internet; files shown: XXX.mp3, YYY.mp3]
1. A asks Napster: "I am searching for XXX.mp3"
2. Napster tells A: "C should have XXX.mp3"
3. A asks C: "I am requesting XXX.mp3"
4. C delivers XXX.mp3 to A
Gnutella [www.gnutella.com]
[Figure: peers A, B and C, connected via the Internet; files shown: XXX.mp3, YYY.mp3, ZZZ.mp3]
1. A asks B: "I am searching for XXX.mp3"
2. B tells A: "C should have XXX.mp3"
3. A asks C: "I am requesting XXX.mp3"
4. C delivers XXX.mp3 to A
Properties of P2P Information Systems
• No central coordination
• No central database
• No peer has a global view of the system
• Global behavior emerges from local interactions
• Peers are autonomous
• Peers and connections are unreliable
• Despite these limitations: all existing information should be accessible
2. Data Access in a P2P System
• B2B servers, Napster, eBay etc.
  – Central database (efficient)!
• Gnutella
  – Search requests are broadcast (inefficient)
  – Anecdote: the founder of Napster computed that a single search request (18 Bytes) on a Napster community would generate 90 Mbytes of data transfers. [http://www.darkridge.com/~jpr5/doc/gnutella.html]
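A rough message count illustrates why broadcasting scales poorly. The sketch below uses an idealized flooding model that is an assumption and not taken from the slides: every peer forwards a request to all of its neighbours except the one it came from, the overlay has no cycles, and the fan-out and time-to-live values are purely illustrative. It does not attempt to reproduce the 90 Mbytes figure quoted above.

```python
def flooding_messages(fanout: int, ttl: int) -> int:
    """Messages sent for one request in an idealized, cycle-free overlay.

    The first hop sends `fanout` messages. Every later hop multiplies the
    count by (fanout - 1), because a peer does not send the request back
    to the neighbour it received it from.
    """
    total = 0
    sent_this_hop = fanout
    for _ in range(ttl):
        total += sent_this_hop
        sent_this_hop *= fanout - 1
    return total


if __name__ == "__main__":
    # Illustrative parameters only (assumptions, not measurements).
    for ttl in (1, 3, 5, 7):
        print(ttl, flooding_messages(fanout=4, ttl=ttl))
```

The count grows exponentially with the time-to-live, whereas a central index answers the same request with a constant number of messages.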
Problem
• Can a set of peers provide
  – efficient search on a data set
  – whose storage requirements substantially exceed the resources of each peer, e.g. s_local = O(log(s_global))
• Answer
  – In principle, yes!
  – Requires a scalable data access structure
Scalable Data Access Structures
• Work in the following way
  – Every peer maintains a small fragment of the database and a routing table
  – The routing tables are organized such that requests can be forwarded at different levels of granularity
  – Replication is used to increase robustness
[Figure: routing hierarchy with routes R0 and R1 at the first level, routes R00 and R01 at the second level, and data D01 at the leaf]
Approaches
• Scalable data access structures
  – [Plaxton 97] (distributed object addressing)
  – CHORD [Dabek 01] (distributed object addressing)
  – CAN (distributed object addressing)
  – FreeNet [Clarke 00] (file sharing systems)
  – [Litwin 97] (distributed databases)
  – [Yokota 99] (parallel databases)
  – P-Grid [Aberer 01] (decentralized databases)
  – etc.
• Question
  – Are they decentralized?
Comparison Criteria
Scalable data access structure
• Routing criteria
  – trees, key similarity, hashing, multidim. keys, …
• Search criteria
  – equality, prefix, range, similarity
• Performance
  – search, update, join and leave the network
• Robustness
  – use of replication
De-centralization
• Global knowledge (except nature of search keys)
  – number of ex. addresses
• Global control
  – coordinator, central repository
• Local autonomy
  – fixed association of roles with address
Comparison – Data Access Structure
           Routing               Search     Search perform.   Replication
Plaxton    Binary tree           equality   O(log n)          yes
CHORD      Implicit binary tree  equality   O(log n)          no
CAN        Multi-dim. grid       equality   O(n^(1/d))        yes
FreeNet    Key similarity        equality   O(log n) ?        yes
Yokota     B-Tree                range      O(log n)          no
P-Grid     Binary tree           prefix     O(log n)          yes
Comparison - Decentralization
           Global knowledge     Global control   Local autonomy
Plaxton    Max # participants   no               no
CHORD      IP address space     no               no
CAN        none                 no               yes
FreeNet    none                 no               no
Yokota     all                  yes              no
P-Grid     none                 no               yes
3. The P-Grid Search Structure
Data Structure of a Peer
[Figure: peer a with path 0101. For each level of its path (R0, R01, R010, R0101) the peer keeps references to peers on the complementary side (R1, R00, R011, R0100). At the leaf it stores the references to the data for R0101.]
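The peer state shown in the figure can be modelled roughly as follows. This is a minimal sketch: the class name `Peer`, its field names and the helper `complement_prefix` are hypothetical choices made for illustration, not identifiers from P-Grid itself.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """Hypothetical in-memory model of one P-Grid peer."""
    address: str
    path: str = ""                            # binary path of the peer, e.g. "0101"
    refs: dict = field(default_factory=dict)  # level -> addresses on the other side
    data: dict = field(default_factory=dict)  # key -> addresses holding the item

    def complement_prefix(self, level: int) -> str:
        """Prefix covered by the references kept at `level` (1-based)."""
        flipped = "1" if self.path[level - 1] == "0" else "0"
        return self.path[:level - 1] + flipped


a = Peer(address="a", path="0101")
# Prefixes that peer a must hold references for, one per level of its path.
print([a.complement_prefix(level) for level in range(1, len(a.path) + 1)])
```

For the path 0101 this yields the prefixes 1, 00, 011 and 0100, which match the reference labels R1, R00, R011 and R0100 in the figure.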
P-Grid Construction
• Bootstrap problem: how to build the P-Grid?
  – without a fixed association of addresses with keys
    • i.e. a global schema to assign roles
    • violating local autonomy
  – efficiently
P-Grid Construction Algorithm (Bootstrap)
• When peers meet (randomly)
  – Compare the current search paths p and q
• Case 1: p and q are the same
  – If the maximal path length is not reached, extend the paths and split the search space, i.e. to p0 and q1
• Case 2: p is a subpath (prefix) of q, i.e. q = p0…
  – Extend p by the complement of q, i.e. p1
• Case 3: only a common prefix exists
  – Forward to one of the referenced peers
  – Limit forwarding by recmax
• The peers remember each other and, in addition, exchange references at all levels
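The three cases above can be sketched as a single meeting function. This is one possible reading of the slide and not the author's implementation: the names `Peer`, `add_ref` and `exchange`, the random eviction once `refmax` references are stored, and the choice to let peers remember each other in every case are assumptions made for illustration.

```python
import random
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare by identity; reference graphs are cyclic
class Peer:
    name: str
    path: str = ""                            # current search path, e.g. "010"
    refs: dict = field(default_factory=dict)  # level (1-based) -> list of Peer

    def add_ref(self, level, other, refmax):
        """Remember `other` at `level`, keeping at most `refmax` entries."""
        bucket = self.refs.setdefault(level, [])
        if other is not self and other not in bucket:
            bucket.append(other)
            if len(bucket) > refmax:
                bucket.pop(random.randrange(len(bucket)))


def common_prefix_len(p, q):
    n = 0
    while n < min(len(p), len(q)) and p[n] == q[n]:
        n += 1
    return n


def exchange(a, b, k, recmax, refmax, depth=0):
    """One meeting between peers a and b (k is the maximal path length)."""
    if a is b:
        return
    lc = common_prefix_len(a.path, b.path)
    # Swap references for every level of the shared prefix.
    for level in range(1, lc + 1):
        for taker, giver in ((a, b), (b, a)):
            for ref in list(giver.refs.get(level, [])):
                taker.add_ref(level, ref, refmax)
    if len(a.path) == lc and len(b.path) == lc:
        # Case 1: same path. Split the search space if the limit allows.
        if lc < k:
            a.path += "0"
            b.path += "1"
            a.add_ref(lc + 1, b, refmax)
            b.add_ref(lc + 1, a, refmax)
    elif len(a.path) == lc or len(b.path) == lc:
        # Case 2: one path is a prefix of the other. Extend the shorter
        # path with the complement of the longer path's next bit.
        short, long_ = (a, b) if len(a.path) == lc else (b, a)
        short.path += "1" if long_.path[lc] == "0" else "0"
        short.add_ref(lc + 1, long_, refmax)
        long_.add_ref(lc + 1, short, refmax)
    else:
        # Case 3: the paths diverge after a common prefix. Each peer is
        # forwarded to a peer that the other one references on its side.
        for_b = [r for r in a.refs.get(lc + 1, []) if r is not b]
        for_a = [r for r in b.refs.get(lc + 1, []) if r is not a]
        a.add_ref(lc + 1, b, refmax)
        b.add_ref(lc + 1, a, refmax)
        if depth < recmax:
            if for_b:
                exchange(b, random.choice(for_b), k, recmax, refmax, depth + 1)
            if for_a:
                exchange(a, random.choice(for_a), k, recmax, refmax, depth + 1)


if __name__ == "__main__":
    random.seed(1)
    peers = [Peer(f"p{i}") for i in range(64)]
    for _ in range(2000):
        a, b = random.sample(peers, 2)
        exchange(a, b, k=4, recmax=2, refmax=2)
    print(sum(len(p.path) for p in peers) / len(peers))  # average path length
```

Because paths only ever grow, a reference stored at a given level stays valid: the referenced peer keeps the complementary prefix it had when the two peers met.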
Simulations
• Implementation in Mathematica
• Simulation parameters (n, k, recmax, refmax)
  – Peer population size n
  – Key length k
  – Recursion depth recmax
  – Multiple references refmax
• Determine the number of meetings required
  – by each peer
  – to reach on average 99% of the maximal path length
Dependency on Peer Population Size
• (n = 200..1000, k = 6, recmax = 2, refmax = 1)
• None!?
[Plot: required meetings per peer (y-axis, 21.5 to 26.5) against population size n (x-axis, 200 to 1000)]
Dependency on Key Length
• (n = 500, k = 2..7, recmax = 2, refmax = 1)
• Exponential
[Plot: required meetings per peer (y-axis, 0 to 400) against key length k (x-axis, 2 to 7)]
Dependency on Recursion Depth
• (n = 500, k = 6, recmax = 0..6, refmax = 1)
• There exists an optimal value
[Plot: required meetings per peer (y-axis, 0 to 80) against recursion depth recmax (x-axis, 0 to 6)]
Replica Distribution
• (n = 20000, k = 10, recmax = 2, refmax = 20)
Properties of P-Grid Bootstrap Algorithm
• Convergence?
  – Does not depend on population size
  – Depends exponentially on key length
  – Depends on recursion depth
• Distribution of replicas?
  – Simulations indicate a reasonable distribution
  – Access paths to replicas are non-uniformly distributed
• Balanced trees?
  – A simple argument (and simulations) shows that this is very likely
4. Search and Update
• Search is straightforward
  – Follow own path or references
  – At most k steps
  – If multiple references are online, select randomly
• Updates
  – All replicas need to be found
  – Repeated searches
    • Breadth first (limited recursion breadth)
    • Depth first
    • Depth first and contact buddies with same key
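The search rule above can be sketched as a short recursive routine. This is a minimal sketch under assumptions: the `Peer` class, the `online` flag and the backtracking over alternative references are illustrative choices, and the four-peer grid in the usage part is hand-built rather than produced by the bootstrap algorithm.

```python
import random
from dataclasses import dataclass, field


@dataclass(eq=False)
class Peer:
    name: str
    path: str
    online: bool = True
    refs: dict = field(default_factory=dict)  # level (1-based) -> list of Peer


def search(peer, key, hops=0):
    """Return (responsible_peer, hops), or (None, hops) if routing fails."""
    lc = 0
    while lc < min(len(peer.path), len(key)) and peer.path[lc] == key[lc]:
        lc += 1
    if lc == len(peer.path) or lc == len(key):
        return peer, hops          # follow own path: this peer is responsible
    # Otherwise use the references at the first level where the key differs.
    candidates = [r for r in peer.refs.get(lc + 1, []) if r.online]
    random.shuffle(candidates)     # several online references: pick randomly
    for nxt in candidates:
        found, used = search(nxt, key, hops + 1)
        if found is not None:
            return found, used
    return None, hops


# Hand-built grid with key length 2 (illustrative only).
p00, p01, p10, p11 = (Peer(n, n) for n in ("00", "01", "10", "11"))
p00.refs = {1: [p10], 2: [p01]}
p01.refs = {1: [p11], 2: [p00]}
p10.refs = {1: [p00], 2: [p11]}
p11.refs = {1: [p01], 2: [p10]}

found, hops = search(p00, "11")
print(found.name, hops)
```

Each forwarding step lengthens the prefix shared with the key by at least one bit, which is why a successful search needs at most k steps.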
Simulation Result
• (n = 20000, k = 10, recmax = 2, refmax = 20)
• Online probability 30%
[Plot: three curves (breadth-first search, depth-first search, search with buddies); x-axis from 1000 to 5000, y-axis from 0.2 to 1]
Update vs. Search Cost
• Trade lower update quality for higher search cost
  – Use repeated searches to confirm results

recbreadth   update rep.   success rate   query cost   insertion cost
2            1             1              137          78
2            2             1              34           147
2            3             1              17           224
3            1             1              112          637
3            2             1              13           1434
3            3             1              13           2086

recbreadth   update rep.   success rate   query cost   insertion cost
2            1             0.65           5.5          72
2            2             0.85           5.6          145
2            3             0.89           5.4          212
3            1             0.95           5.5          734
3            2             0.98           5.5          1363
3            3             0.994          5.4          2080
P-Grid Variations
• To be further explored
  – No global, maximal key length
  – Growing and shrinking of keys
    • problem: integrity of referenced peers
  – Joining and leaving P-Grids
P-Grid Flexibility
• The algorithm is a framework rather than a single solution
  – options are left open and leave room for optimization
  – e.g. taking into account
    • access probability
    • existing data distribution
    • reachability and access cost
5. Application to Gnutella
• Currently under implementation
• Uses Gnutella protocol and software
• Controls routing of search requests using P-Grid
• Problem: non-uniform distribution of search keys
  – Build statistics
  – Compute a global, prefix-preserving hash function
Computing the Required Resources
• Assume
  – 10^7 searchable keys (substrings of filenames)
  – 10 Bytes for storing a peer address
  – 10^5 Bytes per peer provided for indexing
  – 30% online probability
  – 99% answer reliability
• Then
  – Approx. 20,000 peers can be supported
  – refmax = 20 is sufficient
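The figures above can be reproduced with a simple sizing model. The model is an assumption and not stated on the slide: the 2^k leaves share the keys evenly with one stored address per key, each peer additionally stores k * refmax routing references, a search succeeds if at each of its k hops at least one of refmax references is online, and each leaf is replicated refmax times. The function name `size_p_grid` is hypothetical.

```python
def size_p_grid(keys, ref_bytes, peer_bytes, p_online, reliability, max_k=64):
    """Smallest key length k, with its refmax, that fits the per-peer budget.

    Returns (k, refmax, peers), where peers = 2**k * refmax under the
    assumed model that every leaf is replicated refmax times.
    """
    for k in range(1, max_k + 1):
        # Smallest refmax such that all k hops succeed often enough.
        refmax = 1
        while (1 - (1 - p_online) ** refmax) ** k < reliability:
            refmax += 1
        # Index entries of one leaf plus the routing references.
        storage = keys / 2 ** k * ref_bytes + k * refmax * ref_bytes
        if storage <= peer_bytes:
            return k, refmax, 2 ** k * refmax
    raise ValueError("no key length up to max_k fits the budget")


if __name__ == "__main__":
    k, refmax, peers = size_p_grid(
        keys=10**7, ref_bytes=10, peer_bytes=10**5,
        p_online=0.3, reliability=0.99,
    )
    print(k, refmax, peers)  # prints: 10 20 20480
```

Under these assumptions k = 10 is the shortest key length whose leaf index fits into 10^5 Bytes, refmax = 20 is the smallest value that reaches 99% reliability over 10 hops, and 2^10 * 20 = 20480 peers is consistent with the approximate figure of 20,000.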
6. Conclusions
• Scalable distributed and decentralized access structures are possible
• P-Grids offer a lot of flexibility to be further exploited
• Powerful tools for analysis required
• Foundation for many fully decentralized P2P applications
• Application in mobile ad-hoc networks (www.terminode.org), Swiss national research centre at EPFL
References
• [Aberer 01] Karl Aberer, Zoran Despotovic. Managing Trust in a Peer-2-Peer Information System. To appear in the Proceedings of the Tenth International Conference on Information and Knowledge Management (CIKM 2001), 2001.
• [Vingralek 98] Radek Vingralek, Yuri Breitbart, Gerhard Weikum: Snowball: Scalable Storage on Networks of Workstations with Balanced Load. Distributed and Parallel Databases 6(2): 117-156 (1998)
• [Stonebraker 96] Michael Stonebraker, Paul M. Aoki, Witold Litwin, Avi Pfeffer, Adam Sah, Jeff Sidell, Carl Staelin, Andrew Yu: Mariposa: A Wide-Area Distributed Database System. VLDB Journal 5(1): 48-63 (1996)
• [Plaxton 97] C. Greg Plaxton, Rajmohan Rajaraman, Andréa W. Richa: Accessing Nearby Copies of Replicated Objects in a Distributed Environment. SPAA 1997: 311-320.
• [Yokota 99] Haruo Yokota, Yasuhiko Kanemasa, Jun Miyazaki: Fat-Btree: An Update-Conscious Parallel Directory Structure. ICDE 1999: 448-457.
• [Litwin 97] Witold Litwin, Marie-Anne Neimat: LH*s: A High-Availability and High-Security Scalable Distributed Data Structure. RIDE 1997.
• [Stoica 00] Ion Stoica, Robert Morris, David Karger, Frans Kaashoek, Hari Balakrishnan. Chord: A Scalable Peer-To-Peer Lookup Service for Internet Applications. Proceedings of the ACM SIGCOMM, 2001.
• [Clarke 00] Ian Clarke, Oskar Sandberg, Brandon Wiley, and Theodore W. Hong. Freenet: A Distributed Anonymous Information Storage and Retrieval System. Designing Privacy Enhancing Technologies: International Workshop on Design Issues in Anonymity and Unobservability. LNCS 2009. Springer Verlag 2001.
• [Ratnasamy 01] Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, Scott Shenker. A Scalable Content-Addressable Network. Proceedings of the ACM SIGCOMM, 2001.