gervase-hines
P2P Search
COP6731 Advanced Database Systems
P2P Computing
Powerful personal computers share computing resources
Advantages:
  Shared infrastructure costs
  Highly scalable
  No single point of failure (SPOF)
  Censorship resistance
P2P Search Techniques
Centralized P2P systems, e.g., Napster, SETI@home
Decentralized and unstructured P2P systems, e.g., Gnutella
Hybrid (partially decentralized) systems, e.g., Freenet
Structured P2P systems: DHT systems (CAN/Chord/Pastry/Tapestry) and skip-list based systems
Napster
MP3 file sharing with a centralized catalog
Peers hold the files
Napster Inc.’s servers hold the catalog
File transfer is P2P, using a proprietary protocol
[Figure: peer 192.1.2.3 sends (xyz.mp3, 192.1.2.3) to the central Napster server]
Napster: Publish a File
Users upload their IP addresses and the music titles they wish to share
Users query the server to find peers from which to download the desired files
[Figure: a peer asks the central Napster server "xyz.mp3?" and the server answers 192.1.2.3]
Napster: Query for a File
File transfer is P2P, using a proprietary protocol
[Figure: the requesting peer downloads xyz.mp3 directly from peer 192.1.2.3]
Napster: Transfer Requested File
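The publish/query split above can be sketched as a central index that maps titles to the addresses of peers sharing them; the server never holds the files themselves. This is a minimal Python sketch: the class and method names (CentralCatalog, publish, query) are hypothetical, and Napster's real protocol was proprietary.

```python
from collections import defaultdict


class CentralCatalog:
    """Hypothetical stand-in for Napster's central index server.

    It stores only metadata (title -> peer addresses); the files stay on peers.
    """

    def __init__(self) -> None:
        self._index: dict[str, set[str]] = defaultdict(set)

    def publish(self, peer_ip: str, titles: list[str]) -> None:
        # A peer uploads its IP address and the titles it is willing to share.
        for title in titles:
            self._index[title].add(peer_ip)

    def query(self, title: str) -> list[str]:
        # Return the peers that claim to hold the title (sorted for determinism).
        return sorted(self._index.get(title, set()))


catalog = CentralCatalog()
catalog.publish("192.1.2.3", ["xyz.mp3"])
peers = catalog.query("xyz.mp3")
print(peers)  # ['192.1.2.3']
# The download itself would then go directly to a peer in `peers`, bypassing the server.
```

Every publish and every query passes through this one object, which is exactly why the centralized directory is a performance bottleneck and a single point of failure.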
Disadvantages of a Centralized Directory
Performance bottleneck
Single point of failure
Can we do it without a directory?
Gnutella
No catalog
Pings the network to locate Gnutella peers
File requests are broadcast to peers (flooding, i.e., breadth-first search)
When a provider is located, the file is transferred via HTTP
[Figure: a peer issues the query "xyz.mp3?"]
Gnutella: Issue a Request
Gnutella: Flood the Request
[Figure: the peer holding xyz.mp3 replies and the file is returned to the requester]
Gnutella: Reply with the File
Gnutella: Disadvantages
Network flooding generates unnecessary network traffic
Limiting the flood with a TTL means some files might not be found
Alternatives:
  Using ultranodes (or supernodes), e.g., Morpheus, Kazaa
  Using depth-first search, e.g., Freenet
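Flooding with a TTL can be sketched as a breadth-first traversal in which each forwarded copy of the query costs one message and no peer forwards once the hop budget is spent. The topology, the function name flood_search, and the rule of not sending a query straight back to the peer it came from are illustrative assumptions, not the Gnutella wire protocol.

```python
from collections import deque


def flood_search(graph, files, start, wanted, ttl):
    """TTL-limited flooding (breadth-first), loosely in the style of Gnutella.

    graph: peer -> list of neighbor peers
    files: peer -> set of file names that peer shares
    Returns (peers holding `wanted`, number of query messages sent).
    """
    seen = {start}
    queue = deque([(start, ttl, None)])  # (peer, hops left, peer it came from)
    hits, messages = [], 0
    while queue:
        peer, hops_left, came_from = queue.popleft()
        if wanted in files.get(peer, set()):
            hits.append(peer)
        if hops_left == 0:
            continue  # TTL exhausted: this peer does not forward the query
        for neighbor in graph[peer]:
            if neighbor == came_from:
                continue  # do not send the query straight back
            messages += 1  # every forwarded copy costs a message
            if neighbor not in seen:  # duplicates are dropped on arrival
                seen.add(neighbor)
                queue.append((neighbor, hops_left - 1, peer))
    return hits, messages


# Hypothetical topology: only peer E shares xyz.mp3.
graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
         "D": ["B", "C", "E"], "E": ["D"]}
files = {"E": {"xyz.mp3"}}
print(flood_search(graph, files, "A", "xyz.mp3", ttl=2))  # ([], 4): not found
print(flood_search(graph, files, "A", "xyz.mp3", ttl=3))  # (['E'], 6)
```

The two calls show both disadvantages at once: the duplicate message from C to D is pure overhead, and with ttl=2 the file on E is never found.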
[Figure: supernode layer. Peers A to I form three clusters, and each cluster center keeps an index for its cluster. Query: "Who has file X?" Reply: "Peer H has file X." File X is then downloaded from peer H.]
Using Ultranodes
Queries flood only the network of ultranodes
Other peer nodes are shielded from query traffic
Combines the benefits of centralized and decentralized search
Takes advantage of the heterogeneity in peer capabilities
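The two-tier scheme can be sketched as a flood that runs only over the supernode links, with each supernode answering from the index it keeps for its cluster. This is a hypothetical model: the supernode names (S1, S2, S3), the index layout, and the function supernode_search are assumptions made for illustration, not the Morpheus or Kazaa protocol.

```python
from collections import deque


def supernode_search(links, cluster_index, entry, wanted):
    """Flood a query over the supernode layer only (hypothetical two-tier model).

    links: supernode -> neighboring supernodes
    cluster_index: supernode -> {file name: set of peers in its cluster}
    entry: the supernode that the querying peer is attached to
    Returns (sorted peers holding `wanted`, number of supernodes contacted).
    """
    visited = {entry}
    queue = deque([entry])
    hits = set()
    while queue:
        sn = queue.popleft()
        # Answer from the cluster index; ordinary peers never see the query.
        hits |= cluster_index[sn].get(wanted, set())
        for nxt in links[sn]:
            if nxt not in visited:
                visited.add(nxt)
                queue.append(nxt)
    return sorted(hits), len(visited)


# Hypothetical layout mirroring the figure: three clusters (A to C, D to F, G to I).
links = {"S1": ["S2"], "S2": ["S1", "S3"], "S3": ["S2"]}
cluster_index = {
    "S1": {"file Y": {"B"}},
    "S2": {},
    "S3": {"file X": {"H"}},
}
print(supernode_search(links, cluster_index, "S1", "file X"))  # (['H'], 3)
```

Only three nodes handle the query here, while the nine ordinary peers receive no query traffic at all.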
Freenet: Depth-First Search
[Figure: peers A, B, C, D, E; the query "Who has file X?" is forwarded depth-first until it reaches peer E, and file X is downloaded from peer E]
Freenet: File Not Found
[Figure: peers A to F; one branch of the search returns "NOT FOUND!", the query backs out, and peer E answers "I HAVE FILE X!"; file X is downloaded from peer E]
The requested file is not found on the first path because of a poor routing decision made at peer D
In this case, the query backs out of the dead end and tries another peer in a depth-first manner
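The back-out behavior can be sketched as a recursive depth-first search with a hop limit, where returning None means "dead end, try the next neighbor". As an assumption, each peer's neighbor list is taken to be its routing preference; real Freenet chooses neighbors by key closeness and also caches results, which this sketch omits. The function name depth_first_search is hypothetical.

```python
def depth_first_search(graph, files, start, wanted, max_hops):
    """Depth-first query with backtracking, loosely modeled on the Freenet slides.

    graph: peer -> neighbors, in the (hypothetical) order the peer prefers them
    files: peer -> set of file names held by that peer
    Returns the path from `start` to a peer holding `wanted`, or None.
    """
    visited = set()  # a peer that has already seen the query rejects it

    def visit(peer, path, hops_left):
        visited.add(peer)
        if wanted in files.get(peer, set()):
            return path
        if hops_left == 0:
            return None
        for neighbor in graph[peer]:
            if neighbor not in visited:
                found = visit(neighbor, path + [neighbor], hops_left - 1)
                if found is not None:
                    return found
        return None  # dead end: back out to the previous peer

    return visit(start, [start], max_hops)


# D first makes the poor choice F (a dead end), backs out, then tries E.
graph = {"A": ["B"], "B": ["A", "D"], "D": ["B", "F", "E"],
         "F": ["D"], "E": ["D"]}
files = {"E": {"file X"}}
print(depth_first_search(graph, files, "A", "file X", max_hops=5))  # ['A', 'B', 'D', 'E']
```

Unlike flooding, only one copy of the query is in flight at a time, at the cost of possibly long detours when a peer routes badly.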
Structured P2P Systems
DHT-based:
  Chord / Pastry / Tapestry: hashing into a one-dimensional space
  CAN: hashing into a multi-dimensional space
  P-Grid: hashing into a virtual binary search tree
Skip-list based: Skip Graph / SkipNet
Index-tree based: BATON
DHT Design Goals
An “overlay” network with:
  Flexible mapping of keys to physical nodes
  Data independence
  Small network diameter
  Small degree (fan-out)
  Local routing decisions
  Robustness to churn
  Routing flexibility
  Proximity
A “storage” or “memory” mechanism with:
  No guarantees on persistence
  Maintenance via soft state
Metrics
Searching/lookup:
  Number of hops in searching
  Number of messages
  Database-related metrics: total disk I/O, response time, accuracy
Maintenance:
  Number of hops
  Number of messages
How to Bound the Search Space?
Work on placement!
Basic Idea: Hashing
Objects have hash keys: object “y” has key H(y)
Peer nodes also have hash keys in the same hash space: peer “x” has key H(x)
A peer x joins the P2P network with Join(H(x)); an object y is published with Publish(H(y))
Place each object at the peer with the closest hash key
Viewed as a Distributed Hash Table
[Figure: a hash table with key space 0 to 2^128 - 1, partitioned among the peer nodes]
Each peer node is responsible for a range of the hash table, according to its peer hash key
Objects are placed at the peer with the closest key
Note that peers are at the edges of the Internet
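Placement by hashing can be sketched by giving peers and objects keys in the same 0 to 2^128 - 1 space and picking the peer whose key is closest. Two choices here are assumptions: MD5 is used only because its digests happen to be 128 bits, and "closest" is measured as distance around the circular key space (Chord, for instance, uses the clockwise successor instead). The helper names h, circular_distance, and responsible_peer are hypothetical.

```python
import hashlib

KEY_BITS = 128          # matches the 0 .. 2^128 - 1 key space on the slide
SPACE = 2 ** KEY_BITS


def h(name: str) -> int:
    """Hash a peer or object name into the shared key space (MD5 is an assumption)."""
    return int.from_bytes(hashlib.md5(name.encode("utf-8")).digest(), "big")


def circular_distance(a: int, b: int) -> int:
    """Distance between two keys when the key space wraps around."""
    d = abs(a - b)
    return min(d, SPACE - d)


def responsible_peer(peers: list[str], obj: str) -> str:
    """Return the peer whose hash key is closest to the object's hash key."""
    key = h(obj)
    return min(peers, key=lambda p: circular_distance(h(p), key))


peers = ["peer-a", "peer-b", "peer-c", "peer-d"]
print(responsible_peer(peers, "object-y"))
```

Because both sides hash into one space, any node that knows the peer set can compute where an object lives without asking a directory.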
How to Find an Object?
[Figure: a hash table with key space 0 to 2^128 - 1, partitioned among the peer nodes]
Simplest idea: everyone knows everyone else!
Only one hop to find the object, but we want each peer to keep only a few entries!
Using a Distributed Hash Table (DHT):
  A peer only needs to know its logical neighbors
  Search is based on multihop routing
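Multihop routing with a small routing table can be sketched in the style of Chord, one of the DHTs named earlier: each node keeps only M "finger" entries, and a lookup jumps to the farthest finger that still precedes the key. To keep the numbers readable, this sketch assumes a 6-bit ring (64 IDs) instead of 128 bits; the node IDs and function names are hypothetical.

```python
M = 6                     # hypothetical: 6-bit IDs (ring of 64) instead of 128 bits
RING = 2 ** M
NODES = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]  # hypothetical sorted node IDs


def successor(k):
    """First node at or clockwise after k: the node responsible for key k."""
    k %= RING
    for n in NODES:
        if n >= k:
            return n
    return NODES[0]  # wrap around past the largest ID


def between(x, a, b, inclusive_right):
    """Is x in the clockwise interval (a, b), or (a, b] if inclusive_right?"""
    if x == b:
        return inclusive_right
    if a < b:
        return a < x < b
    return x > a or x < b  # interval wraps past 0 (a == b means the whole ring)


def fingers(n):
    """The only routing state n keeps: successor(n + 2^i) for i = 0 .. M-1."""
    return [successor(n + 2 ** i) for i in range(M)]


def lookup(start, key):
    """Greedy Chord-style routing; returns (responsible node, nodes visited)."""
    key %= RING
    n, path = start, [start]
    while True:
        succ = fingers(n)[0]
        if between(key, n, succ, inclusive_right=True):
            path.append(succ)
            return succ, path
        # Jump to the farthest finger that still precedes the key.
        n = next(f for f in reversed(fingers(n))
                 if between(f, n, key, inclusive_right=False))
        path.append(n)


print(lookup(8, 54))  # (56, [8, 42, 51, 56])
```

Each node stores M entries rather than a list of all nodes, yet the lookup above reaches the responsible node in three hops; with this finger layout the hop count grows roughly logarithmically with the number of nodes.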
[Figure: peer nodes placed in the hash space 0 to 2^128 - 1, each storing (key, value) pairs]
DHT in action
Operation: take a key as input and route messages to the node holding that key
DHT in action: put()
insert(K1,V1) is routed to the node responsible for K1, which stores (K1,V1)
DHT in action: get()
retrieve(K1) is routed the same way to the node responsible for K1, which returns the stored value V1
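The put()/get() interface can be sketched as a table of nodes, each with its own local storage, where both operations compute the same responsible node from the key. In this minimal sketch the multihop routing is collapsed into a direct computation, MD5 supplies the 128-bit keys, and responsibility follows a successor rule (first node ID at or after the key, wrapping around); all three are assumptions, and ToyDHT is a hypothetical name.

```python
import bisect
import hashlib


def h(s: str) -> int:
    # Hypothetical choice: MD5 gives a key in 0 .. 2^128 - 1.
    return int.from_bytes(hashlib.md5(s.encode("utf-8")).digest(), "big")


class ToyDHT:
    """Minimal put()/get() sketch: each node stores the keys it is responsible for."""

    def __init__(self, node_names):
        self.ids = sorted(h(n) for n in node_names)
        self.store = {node_id: {} for node_id in self.ids}  # per-node local storage

    def _responsible(self, key: str) -> int:
        # Successor rule: first node ID >= hash(key), wrapping past the largest ID.
        i = bisect.bisect_left(self.ids, h(key))
        return self.ids[i % len(self.ids)]

    def put(self, key: str, value) -> int:
        node = self._responsible(key)
        self.store[node][key] = value
        return node

    def get(self, key: str):
        # Returns None if no value was stored under this key.
        return self.store[self._responsible(key)].get(key)


dht = ToyDHT(["node-1", "node-2", "node-3", "node-4"])
dht.put("K1", "V1")
print(dht.get("K1"))  # V1
```

The design point is that put() and get() never coordinate directly: both derive the same node from the key alone, so exactly one node holds (K1, V1).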
CAN – Content Addressable Network
Each peer is responsible for one zone, i.e., it stores all (key, value) pairs that fall in that zone
Each peer knows the neighbors of its zone
Random assignment of peers to zones at startup
Dimension-ordered multihop routing
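Dimension-ordered routing can be sketched on a 2-D coordinate space: a message first moves zone by zone along x until the x coordinate matches, then along y, and every hop goes only to an adjacent zone that the current peer already knows. As simplifying assumptions, the unit square is split into a fixed 4 x 4 grid of equal zones without wrap-around; real CAN splits zones dynamically as peers join and treats the space as a torus. The function names are hypothetical.

```python
GRID = 4  # hypothetical: unit square split into a 4 x 4 grid of equal zones


def zone_of(point):
    """Zone (column, row) containing a point in [0, 1) x [0, 1)."""
    x, y = point
    return (int(x * GRID), int(y * GRID))


def neighbors(zone):
    """Zones sharing an edge with `zone`: all the routing state a peer needs."""
    cx, cy = zone
    cand = [(cx - 1, cy), (cx + 1, cy), (cx, cy - 1), (cx, cy + 1)]
    return [(a, b) for a, b in cand if 0 <= a < GRID and 0 <= b < GRID]


def route(src_zone, target_point):
    """Dimension-ordered routing: fix the x coordinate first, then y."""
    tx, ty = zone_of(target_point)
    cx, cy = src_zone
    path = [(cx, cy)]
    while cx != tx:  # dimension 1
        cx += 1 if tx > cx else -1
        path.append((cx, cy))
    while cy != ty:  # dimension 2
        cy += 1 if ty > cy else -1
        path.append((cx, cy))
    return path


print(route((0, 0), (0.9, 0.6)))  # [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2)]
```

Each step changes one coordinate by one zone, so every hop lands in a zone from the current peer's neighbor list.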
CAN: Object Publishing
node I::publish(K,V)
(1) a = hx(K), b = hy(K): the key is hashed to the point (x, y) = (a, b)
(2) route (K,V) -> J, the node in charge of the zone containing (a, b)
(3) J stores (K,V)
CAN: Object Retrieval
node I::retrieve(K)
(1) a = hx(K), b = hy(K)
(2) route “retrieve(K)” to J, the node in charge of (a, b)
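The publish and retrieve steps can be sketched with two independent hash functions, hx and hy, that map a key to a point (a, b) in the unit square, plus a lookup of the zone owner J. In this sketch hx and hy are built from SHA-256 with different salts, zones are a fixed 4 x 4 grid, and the multihop route from I to J is collapsed into a direct lookup; all of these, and the names publish, retrieve, and stores, are assumptions for illustration.

```python
import hashlib

GRID = 4  # hypothetical: 4 x 4 grid of equal zones over [0, 1) x [0, 1)


def _unit_hash(salt: str, key: str) -> float:
    """Map a key into [0, 1) using 48 bits of SHA-256; the salt separates hx from hy."""
    digest = hashlib.sha256((salt + key).encode("utf-8")).digest()
    return int.from_bytes(digest[:6], "big") / 2 ** 48


def hx(key: str) -> float:
    return _unit_hash("x:", key)


def hy(key: str) -> float:
    return _unit_hash("y:", key)


def owner(a: float, b: float) -> tuple[int, int]:
    """The zone (and hence peer J) in charge of point (a, b)."""
    return (int(a * GRID), int(b * GRID))


# Local storage of each zone's peer, keyed by zone coordinates.
stores = {(i, j): {} for i in range(GRID) for j in range(GRID)}


def publish(key: str, value) -> tuple[int, int]:
    a, b = hx(key), hy(key)   # (1) hash the key to a point
    j = owner(a, b)           # (2) route (K,V) to J (routing collapsed here)
    stores[j][key] = value    # (3) J stores (K,V)
    return j


def retrieve(key: str):
    a, b = hx(key), hy(key)   # (1) the same point as at publish time
    return stores[owner(a, b)].get(key)  # (2) ask J, in charge of (a, b)


zone = publish("K", "V")
print(retrieve("K"))  # V
```

Publisher and requester never contact each other: both hash K to the same point, so both end up at the same J.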
Some Research Topics
Content-based Image Retrieval in P2P
Location Management in P2P
Security Considerations for DHT
P2P Backup
Wireless P2P