
P2P Search

COP6731 Advanced Database Systems

P2P Computing

Powerful personal computers share computing resources with one another

Advantages: shared infrastructure costs, high scalability, no single point of failure (SPOF), censorship resistance


P2P Search Techniques

Centralized P2P systems, e.g., Napster, SETI@home

Decentralized and unstructured P2P systems, e.g., Gnutella

Hybrid (partially decentralized) P2P systems, e.g., Freenet

Structured P2P systems: DHT systems (CAN/Chord/Pastry/Tapestry) and skip-list-based systems

Napster

MP3 file sharing with a centralized catalog

Peers hold the files; Napster Inc.’s servers hold the catalog. File transfer is P2P, using a proprietary protocol

[Figure: the peer at 192.1.2.3 registers the entry (xyz.mp3, 192.1.2.3) with the central Napster server]

Napster: Publish a File

Users upload their IP addresses and the titles of the music files they wish to share

Users search the catalog for peers from which to download desired files
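The following is a minimal sketch of the central catalog. It assumes the server simply maps each title to the set of peer addresses that published it. The class and method names (`CentralCatalog`, `publish`, `search`) are hypothetical and are not Napster's proprietary protocol. Only the lookup goes through the server; the download itself would happen peer to peer.

```python
from collections import defaultdict

class CentralCatalog:
    """Hypothetical central index: file title -> IP addresses of peers sharing it."""

    def __init__(self):
        self._index = defaultdict(set)

    def publish(self, peer_ip, titles):
        # A peer uploads its IP address together with the titles it shares.
        for title in titles:
            self._index[title].add(peer_ip)

    def search(self, title):
        # Return the peers holding the title; .get() avoids inserting empty entries.
        return sorted(self._index.get(title, set()))

catalog = CentralCatalog()
catalog.publish("192.1.2.3", ["xyz.mp3", "abc.mp3"])
print(catalog.search("xyz.mp3"))  # ['192.1.2.3']
```

Because every publish and search passes through one object, the sketch also makes the bottleneck and single point of failure easy to see.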

[Figure: a peer asks the central server "xyz.mp3 ?" and receives the address 192.1.2.3]

Napster: Query for a File


File transfer is P2P, using a proprietary protocol

[Figure: the requesting peer downloads xyz.mp3 directly from 192.1.2.3]

Napster: Transfer Requested File


Disadvantages of a Centralized Directory

Performance bottleneck; single point of failure

Can we do it without a directory?

Gnutella

No catalog; a peer pings the network to locate other peers

File requests are broadcast to peers: flooding, i.e., breadth-first search

When a provider is located, the file is transferred via HTTP
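To make the cost of flooding concrete, here is a sketch of a breadth-first query limited by a TTL. It runs over a hypothetical four-peer chain A, B, C, D, where only D holds xyz.mp3. The function name and the message-counting rule are assumptions for illustration. Under that rule, every forwarded copy counts, even one the receiver drops as a duplicate.

```python
from collections import deque

def flood_query(graph, files, start, wanted, ttl):
    """Breadth-first flooding with a TTL; returns (providers found, messages sent)."""
    seen = {start}
    frontier = deque([(start, ttl)])
    providers, messages = [], 0
    while frontier:
        node, hops_left = frontier.popleft()
        if wanted in files.get(node, ()):
            providers.append(node)
        if hops_left == 0:
            continue  # TTL exhausted: this peer does not forward the query
        for neighbor in graph[node]:
            messages += 1  # every copy sent costs a message, duplicates included
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, hops_left - 1))
    return providers, messages

# Hypothetical four-peer chain A - B - C - D; only D shares xyz.mp3.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
files = {"D": {"xyz.mp3"}}
print(flood_query(graph, files, "A", "xyz.mp3", ttl=2))  # ([], 3)
print(flood_query(graph, files, "A", "xyz.mp3", ttl=3))  # (['D'], 5)
```

With TTL 2 the query dies one hop short of D, so the file exists but is not found. Raising the TTL finds it at the price of more messages.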

[Figure: a peer issues the query "xyz.mp3 ?"]

Gnutella: Issue a Request

Gnutella: Flood the Request

[Figure: the request for xyz.mp3 reaches the peer that holds the file]

Gnutella: Reply with the File

Gnutella: Disadvantages

Network flooding creates unnecessary network traffic

With a TTL (time-to-live) limit on queries, some files might not be found

Alternatives: ultranodes (or supernodes), e.g., Morpheus, Kazaa; depth-first search, e.g., Freenet

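[Figure: peers A through I are grouped into three clusters. The center of each cluster keeps an index for its cluster, and the centers form the supernode layer. The query "Who has file X" is answered with "Peer H has file X", and file X is then downloaded from peer H.]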

Using Ultranodes

Queries flood only the network of ultranodes

Other peer nodes are shielded from query traffic

Combines the benefits of centralized and decentralized search

Takes advantage of the heterogeneity in peer capabilities
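Here is a sketch of the two-tier idea. The query is flooded only across the supernode layer, and each supernode answers from its local index of its own cluster. The supernode names S1 to S3, the cluster assignment, and the dictionary layout are hypothetical; only the peer letters and "H has file X" come from the figure.

```python
from collections import deque

def supernode_search(super_links, cluster_index, start, wanted):
    """Flood a query over supernodes only; ordinary peers receive no query traffic.

    super_links:   supernode -> list of neighboring supernodes
    cluster_index: supernode -> {peer: set of files} for the peers in its cluster
    """
    seen = {start}
    queue = deque([start])
    hits = []
    while queue:
        sn = queue.popleft()
        # Answer from this supernode's index of its cluster members.
        for peer, shared in cluster_index[sn].items():
            if wanted in shared:
                hits.append(peer)
        for nxt in super_links[sn]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return hits

super_links = {"S1": ["S2", "S3"], "S2": ["S1", "S3"], "S3": ["S1", "S2"]}
cluster_index = {
    "S1": {"A": set(), "B": set(), "C": set()},
    "S2": {"D": set(), "E": set(), "F": set()},
    "S3": {"G": set(), "H": {"X"}, "I": set()},
}
print(supernode_search(super_links, cluster_index, "S1", "X"))  # ['H']
```

The requester would then download file X directly from peer H, outside the supernode layer.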

Freenet - Depth-First Search

[Figure: peers A, B, C, D, and E; the query "Who has file X" is forwarded from peer to peer depth-first, and file X is downloaded from peer E]

Freenet – File not Found

[Figure: peers A through F; one branch of the search ends with "NOT FOUND!", while peer E replies "I HAVE FILE X!" and file X is downloaded from peer E]

The requested file is not found because of a poor routing decision made at peer D

In this case, the query backs out of the dead end and tries another peer in depth-first manner
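The backtracking behavior can be sketched as a plain recursive depth-first search. Actual Freenet routing picks the next peer by key closeness and limits hops; here a fixed neighbor order stands in for each peer's routing decision. The topology is hypothetical: D makes the poor choice and forwards to a dead end at F, and E holds file X.

```python
def depth_first_search(graph, files, node, wanted, visited=None):
    """Forward a query depth-first, backing out of dead ends.

    Returns the path from the requester to the provider, or None if not found.
    """
    if visited is None:
        visited = set()
    visited.add(node)
    if wanted in files.get(node, ()):
        return [node]
    for neighbor in graph[node]:  # neighbor order models the routing decision
        if neighbor not in visited:
            path = depth_first_search(graph, files, neighbor, wanted, visited)
            if path is not None:
                return [node] + path
    return None  # dead end: report NOT FOUND so the caller tries its next peer

graph = {"A": ["B"], "B": ["D", "C"], "D": ["F"], "F": [], "C": ["E"], "E": []}
files = {"E": {"X"}}
print(depth_first_search(graph, files, "A", "X"))  # ['A', 'B', 'C', 'E']
```

The query first goes A, B, D, F. It gets NOT FOUND, backs out to B, and then succeeds through C to E.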

Structured P2P Systems

DHT-based:

Chord / Pastry / Tapestry: hash-based, into a one-dimensional space

CAN: hash-based, into a multi-dimensional space

P-Grid: hash-based, into a virtual binary search tree

Skip-list-based: Skip Graph / SkipNet

Index-tree-based: BATON

DHT Design Goals

An “overlay” network with: flexible mapping of keys to physical nodes, data independence, small network diameter, small degree (fan-out), local routing decisions, robustness to churn, routing flexibility, proximity

A “storage” or “memory” mechanism with no guarantees on persistence and maintenance via soft state

Metrics

Searching/lookup: number of hops in searching, number of messages

Database-related metrics: total disk I/O, response time, accuracy

Maintenance: number of hops, number of messages

How to Bound the Search Space?

Work on placement!

Basic Idea: Hashing

Objects have hash keys: object “y” has key H(y)

Peer nodes also have hash keys in the same hash space: peer “x” has key H(x)

A peer joins the P2P network with Join(H(x)); an object is published with Publish(H(y))

Place each object at the peer with the closest hash key
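Here is a minimal sketch of closest-key placement. It uses MD5 as the hash only because MD5's 128-bit output matches the 0 to 2^128 - 1 key space used in these slides; real systems choose their own hash (Chord, for example, uses SHA-1). Distance is measured around the circular key space. The names `h`, `ring_distance`, and `place`, and the peer and object names, are hypothetical.

```python
import hashlib

RING = 2 ** 128  # MD5 digests are 128 bits: keys lie in 0 .. 2^128 - 1

def h(name: str) -> int:
    """Hash a peer address or object name into the shared key space."""
    return int.from_bytes(hashlib.md5(name.encode()).digest(), "big")

def ring_distance(a: int, b: int) -> int:
    d = abs(a - b)
    return min(d, RING - d)  # wrap around the circular key space

def place(object_name, peer_names):
    """Return the peer whose hash key is closest to H(object)."""
    key = h(object_name)
    return min(peer_names, key=lambda p: ring_distance(h(p), key))

peers = ["peer-x", "peer-u", "peer-w"]
print(place("object-y", peers))
```

Because peers and objects share one hash space, any node can compute where an object belongs without consulting a directory.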

Viewed as a Distributed Hash Table

Hash table: keys 0 to 2^128 - 1

Peer nodes: each is responsible for a range of the hash table, according to the peer's hash key

Objects are placed at the peer with the closest key. Note that peers are Internet edges.

How to Find an Object?

Simplest idea: everyone knows everyone else! Then it takes only one hop to find the object.

But we want each peer to keep only a few entries!

Using a distributed hash table (DHT), a peer only needs to know its logical neighbors; search is based on multihop routing
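Here is a sketch of multihop routing with small per-peer tables. It uses Chord-style finger tables as one concrete choice; the slides do not prescribe a routing scheme. The 64-id space (M = 6), the node ids, and the function names are hypothetical. Each node keeps only M entries, yet a lookup reaches the responsible node in a few hops.

```python
M = 6              # hypothetical small key space: ids 0 .. 2^M - 1
SPACE = 2 ** M

def successor(nodes, k):
    """First node id at or after k going clockwise; nodes is a sorted list."""
    for n in nodes:
        if n >= k:
            return n
    return nodes[0]  # wrap around past the top of the key space

def in_interval(x, a, b, include_b=False):
    """True if x lies in (a, b) on the ring, or in (a, b] when include_b is set."""
    if include_b and x == b:
        return True
    if a < b:
        return a < x < b
    return x > a or x < b  # interval wraps past 0 (a == b covers all but a)

def fingers(nodes, n):
    # Node n keeps only M entries: the successors of n + 2^i.
    # (Computed from a global list here purely to keep the sketch short.)
    return [successor(nodes, (n + 2 ** i) % SPACE) for i in range(M)]

def lookup(nodes, start, key):
    """Route a key greedily through finger tables; return (responsible node, hops)."""
    node, hops = start, 0
    while True:
        table = fingers(nodes, node)
        succ = table[0]
        if in_interval(key, node, succ, include_b=True):
            return succ, hops + (1 if succ != node else 0)
        # Forward to the farthest finger that still precedes the key;
        # table[0] always qualifies here, so a next hop always exists.
        node = next(f for f in reversed(table) if in_interval(f, node, key))
        hops += 1

nodes = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]
print(lookup(nodes, 8, 54))  # (56, 3): 8 -> 42 -> 51 -> 56
```

Each forward at least halves the remaining clockwise distance to the key. That is why a table of M entries suffices for O(log N) hops.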

[Figure: the hash table spans keys 0 to 2^128 - 1 and is partitioned among peer nodes, each storing (K, V) pairs]

DHT in action


Operation: take a key as input; route messages to the node holding that key


DHT in action: put()

[Figure: insert(K1,V1) is issued at one peer and routed through the overlay until the node responsible for K1 stores (K1,V1)]




DHT in action: get()


[Figure: retrieve(K1) is issued at one peer and routed through the overlay to the node that stores (K1,V1)]
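The put() and get() operations can be sketched on a ring where each node knows only its successor. This routing rule is deliberately simple and chosen for brevity; real DHTs add shortcut links to cut the hop count. The 64-id space, the node ids, the use of SHA-1, and the names `put`, `get`, and `route` are assumptions. The slides call the operations insert and retrieve.

```python
import hashlib

SPACE = 64  # hypothetical tiny identifier space so the ids stay readable

class Node:
    def __init__(self, node_id):
        self.id = node_id
        self.successor = self  # the only logical neighbor this node knows
        self.store = {}        # (K, V) pairs this node is responsible for

def key_id(key):
    # Hash the key into the identifier space (SHA-1 is an arbitrary choice here).
    return int.from_bytes(hashlib.sha1(key.encode()).digest(), "big") % SPACE

def between(x, a, b):
    """True if x lies in (a, b] going clockwise around the ring."""
    if a < b:
        return a < x <= b
    return x > a or x <= b

def route(start, kid):
    # Multihop routing: pass the request along until it reaches the node
    # whose range (predecessor, node] contains the key id.
    node = start
    while not between(kid, node.id, node.successor.id):
        node = node.successor
    return node.successor

def build_ring(ids):
    nodes = [Node(i) for i in sorted(ids)]
    for a, b in zip(nodes, nodes[1:] + nodes[:1]):
        a.successor = b
    return nodes

def put(start, key, value):
    owner = route(start, key_id(key))
    owner.store[key] = value  # the responsible node stores (K, V)
    return owner.id

def get(start, key):
    return route(start, key_id(key)).store.get(key)

ring = build_ring([5, 20, 40, 63])
owner = put(ring[0], "K1", "V1")   # insert(K1, V1) issued at node 5
print(owner, get(ring[2], "K1"))   # retrieve(K1) issued at node 40
```

The requester never needs to know where (K1, V1) lives. Both operations hash the key and let the overlay route to the same responsible node.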

CAN – Content Addressable Network

Each peer is responsible for one zone, i.e., stores all (key, value) pairs of the zone

Each peer knows the neighbors of its zone

Random assignment of peers to zones at startup

Dimension-ordered multihop routing
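Dimension-ordered routing can be sketched on a simplified CAN. Assume the space is divided into a grid of equal-sized zones, each knowing only its four neighbors. Real CAN zones have unequal sizes produced by splits, and CAN's coordinate space is a torus; this sketch ignores the wraparound. The function name and the (column, row) zone labels are hypothetical.

```python
def can_route(src, dst):
    """Dimension-ordered routing across a grid of equal-sized CAN zones.

    Zones are named by (column, row) and each knows only its four neighbors.
    The message first moves along x until the column matches, then along y.
    Returns every zone the message visits, including both endpoints.
    """
    x, y = src
    path = [(x, y)]
    while x != dst[0]:               # dimension 1: fix the x coordinate
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:               # dimension 2: then fix the y coordinate
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

route = can_route((0, 0), (2, 3))
print(route, len(route) - 1)  # [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (2, 3)] 5
```

Every forwarding decision is local: a zone only compares its own coordinates with the destination and hands the message to one neighbor.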

CAN: Object Publishing

node I::publish(K,V)

(1) a = hx(K), b = hy(K), i.e., the key maps to the point x = a, y = b

(2) route (K,V) -> J

(3) J stores (K,V)


CAN: Object Retrieval

node I::retrieve(K)

(1) a = hx(K), b = hy(K)

(2) route “retrieve(K)” to J, the node in charge of (a,b), which holds (K,V)
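The publish and retrieval steps can be sketched on a 2-D CAN. Assume one byte of two differently salted SHA-1 digests plays the role of hx and hy, giving coordinates in 0..255. Assume also four hypothetical zones that tile the space as if produced by earlier splits. For brevity, a global scan over zones stands in for the dimension-ordered multihop routing described above. All names here are illustrative.

```python
import hashlib

def hx(key):
    # One byte of a SHA-1 digest gives an x coordinate in 0..255 (an assumption).
    return hashlib.sha1(b"x:" + key.encode()).digest()[0]

def hy(key):
    # A differently salted digest gives the y coordinate.
    return hashlib.sha1(b"y:" + key.encode()).digest()[0]

class CANNode:
    def __init__(self, name, zone):
        self.name = name
        self.zone = zone  # (x0, x1, y0, y1), half-open on both axes
        self.store = {}

    def owns(self, a, b):
        x0, x1, y0, y1 = self.zone
        return x0 <= a < x1 and y0 <= b < y1

# Hypothetical zones produced by earlier splits; together they tile [0,256)^2.
nodes = [
    CANNode("A", (0, 128, 0, 256)),
    CANNode("B", (128, 256, 0, 128)),
    CANNode("C", (128, 192, 128, 256)),
    CANNode("D", (192, 256, 128, 256)),
]

def node_in_charge(a, b):
    # Stands in for multihop routing: find the node J whose zone contains (a, b).
    return next(n for n in nodes if n.owns(a, b))

def publish(key, value):
    a, b = hx(key), hy(key)        # (1) hash K to the point (a, b)
    j = node_in_charge(a, b)       # (2) route (K, V) -> J
    j.store[key] = value           # (3) J stores (K, V)
    return j.name

def retrieve(key):
    a, b = hx(key), hy(key)        # (1) the same hashes give the same point
    return node_in_charge(a, b).store.get(key)  # (2) ask J for K

print(publish("K", "V"), retrieve("K"))
```

Publisher and requester never coordinate directly. They agree on J only because both apply the same hx and hy to K.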

Some Research Topics

Content-based Image Retrieval in P2P

Location Management in P2P

Security Considerations for DHT

P2P Backup

Wireless P2P

