42
Foreseer A Novel, Locality-Aware Peer-to-Peer System Architecture for Keyword Searches Developed by Hailong Cai & Jun Wang Presented By: Alex Thompson

Foreseer A Novel, Locality-Aware Peer-to-Peer System Architecture for Keyword Searches Developed by Hailong Cai & Jun Wang Presented By: Alex Thompson

Embed Size (px)

Citation preview

Page 1: Foreseer A Novel, Locality-Aware Peer-to-Peer System Architecture for Keyword Searches Developed by Hailong Cai & Jun Wang Presented By: Alex Thompson

ForeseerA Novel, Locality-Aware Peer-to-Peer System Architecture for Keyword SearchesDeveloped by Hailong Cai & Jun Wang

Presented By: Alex Thompson

Page 2: Foreseer A Novel, Locality-Aware Peer-to-Peer System Architecture for Keyword Searches Developed by Hailong Cai & Jun Wang Presented By: Alex Thompson

Overview1. Peer-to-Peer Network Architecture

2. The standard (Gnutella)

3. Improving the design

4. Foreseer

5. The Search Algorithm

6. Experiment

7. Results

8. Further Study

9. Conclusions

Page 3: Foreseer A Novel, Locality-Aware Peer-to-Peer System Architecture for Keyword Searches Developed by Hailong Cai & Jun Wang Presented By: Alex Thompson

Peer to Peer Architecture

• Motivations• Rapidly distribute data• Create vast repository of data spread over many nodes• Increase data availability• Leverage collective computing power to ease stress on

individual nodes and connections• Conserve bandwidth by distributing load throughout network

• Obstacles• Scalability• Bottlenecking• Wasted Bandwidth• File Corruption

Page 4: Foreseer A Novel, Locality-Aware Peer-to-Peer System Architecture for Keyword Searches Developed by Hailong Cai & Jun Wang Presented By: Alex Thompson

Peer to Peer Architecture

Simple

Page 5: Foreseer A Novel, Locality-Aware Peer-to-Peer System Architecture for Keyword Searches Developed by Hailong Cai & Jun Wang Presented By: Alex Thompson

Peer to Peer Architecture

Simple

Complex

Page 6: Foreseer A Novel, Locality-Aware Peer-to-Peer System Architecture for Keyword Searches Developed by Hailong Cai & Jun Wang Presented By: Alex Thompson

Gnutella

• ‘True’ Peer-to-Peer• All nodes equally participate• All nodes expected to share workload and bandwidth

similarly• Key ideal: As long as 2 nodes exist, network will persist

• Searching in Gnutella• Send query to all connected nodes (typically 5 to 10)• These nodes search their file base

• If the data exists, they set up a connection and transfer the data.• If not, they forward the query the nodes to which they are connected.

• This process continues until query is found or dropped• Query will attempt to reach outer limits of network.

Page 7: Foreseer A Novel, Locality-Aware Peer-to-Peer System Architecture for Keyword Searches Developed by Hailong Cai & Jun Wang Presented By: Alex Thompson

Gnutella

• Issues:• Not very scalable

• Message flooding occurs as network size increases.

• Inefficient use of bandwidth• Searches may cross same path multiple times.

• Searches take “walks to nowhere” by querying nodes with nothing shared.

• Searches completely directionless and bounce around network.

• Node tracking and inventory nonexistent• Nodes do not inform peers when they disconnect and searches may be sent

to dead ends.

Page 8: Foreseer A Novel, Locality-Aware Peer-to-Peer System Architecture for Keyword Searches Developed by Hailong Cai & Jun Wang Presented By: Alex Thompson

Improving the Design

• Use a limited “blind” searching algorithm• Random Walk – forward query to random subset of

neighbors.• Pro: Reduces message flooding• Con: Still have repeated walks and walks to nowhere

• Directed BSF – select neighbors that have produced quality results before.

• Pro: Reuses paths by maintaining ‘Shortcuts’ list to previously useful nodes.

• Con: If a peer in the path departs, the path is lost.• Con: may delay queries for objects not on ‘shortcut’ neighbors as that

list must be exhausted before other neighbors are contacted.

Page 9: Foreseer A Novel, Locality-Aware Peer-to-Peer System Architecture for Keyword Searches Developed by Hailong Cai & Jun Wang Presented By: Alex Thompson

Improving the Design

• Use an “informed” searching algorithm • Intelligent BFS – maintain query tables of data classes for

neighbors at each node.• Pro: Reuses paths for queries of similar classes.• Con: The space required for this is too large, with update overhead

offsetting potential benefits.• Con: Node data deletion and node departure are not addressed.

• Local Indices – Maintain list of files for neighbors within a certain radius (r)

• Pro: A single node may respond for many others• Con: If r is small, many queries cannot be satisfied• Con: If r is large, updating indices becomes too expensive

Page 10: Foreseer A Novel, Locality-Aware Peer-to-Peer System Architecture for Keyword Searches Developed by Hailong Cai & Jun Wang Presented By: Alex Thompson

Foreseer

• Exploit inherent 2-dimensional locality in P2P systems• By using geographical and temporal localities, search

efficiency and scalability can be achieved.• Geographical Locality: Nodes which are geographically close to each

other.• Temporal Locality: Nodes which have been useful previously.

• 3 Components• Neighbors overlay (geographical component).• Friends overlay (temporal component).• Bloom Filters (indices distributed according to

relationships between friends and neighbors).

Page 11: Foreseer A Novel, Locality-Aware Peer-to-Peer System Architecture for Keyword Searches Developed by Hailong Cai & Jun Wang Presented By: Alex Thompson

Foreseer

Neighbor Link

Friend Link

Page 12: Foreseer A Novel, Locality-Aware Peer-to-Peer System Architecture for Keyword Searches Developed by Hailong Cai & Jun Wang Presented By: Alex Thompson

Foreseer

• Rationale: Daily Life• Everyone has friends and neighbors

• Neighbors live close by and you meet when you first settle in• Friends are met through business dealings and may live distantly

• Together, these constitute social network• If you need help on a project, you will go to this network• If no one in your circle of friends and neighbors can help, they will

check with their own circle.

Page 13: Foreseer A Novel, Locality-Aware Peer-to-Peer System Architecture for Keyword Searches Developed by Hailong Cai & Jun Wang Presented By: Alex Thompson

Foreseer

• Bloom Filters - Hash-based data structure representing a set.• Why use this approach

• # of shared files on peers is limited (75% share 100 or less).• Documents on peer tend to share common topics.• Most files shared are multimedia, which has few keywords.

• Some stats• Returns no false negatives• May return false positives at a rate of

where m = # of bits in the filter, n = # of elements in the set• Assuming K-max keywords = 10,000 and k hash functions = 8, the

equation to compute m is

n

m

)6185.0(

bitskKMax 416,114

2ln

8000,10

2ln

Page 14: Foreseer A Novel, Locality-Aware Peer-to-Peer System Architecture for Keyword Searches Developed by Hailong Cai & Jun Wang Presented By: Alex Thompson

Foreseer

• Your Neighbors• Bidirectional – neighbors are aware of e/other.• Latency used as metric for distance

• The longer the delay, the further away the potential neighbor must be

• Moving into the neighborhood• Established at setup

• ‘Ping’ sent out at startup• If a potential neighbor has fewer than ‘max’ neighbors, it will reply.• New neighbors also send filters along with positive replies.

• Process continues until new peer has minimum # of neighbors.

Page 15: Foreseer A Novel, Locality-Aware Peer-to-Peer System Architecture for Keyword Searches Developed by Hailong Cai & Jun Wang Presented By: Alex Thompson

Foreseer

• Your Friends• Unidirectional

• Each peer has a certain group that it considers friends.• Each peer has a certain group that considers it a friend (backfriends)

• Nodes keep content info of friends, but not backfriends.

• How to make friends• At setup, new node asks its neighbors who their friends are.• Any friends of the neighbors that can accept a friend will

reply positively.• This is only an ‘initial’ list.

• After successful downloads, peer may update its friends list.• The idea: if a node was useful in finding data before, it might be

again – make it a friend.

Page 16: Foreseer A Novel, Locality-Aware Peer-to-Peer System Architecture for Keyword Searches Developed by Hailong Cai & Jun Wang Presented By: Alex Thompson

Foreseer

• Node departure• Expected/Intentional - Nodes inform friends and neighbors

when they leave.• Drop – friends and neighbors are not informed

• Neighbors may learn of departure when they try and contact• No provision given for node to release backfriends

• Re-establishment• Attempt to contact neighbors and friends• If neighbors and friends can still accept you, they will

• May be necessary to repeat initialization if sufficient time has passed.

• No one knows you anymore or you’re in a new neighborhood.

Page 17: Foreseer A Novel, Locality-Aware Peer-to-Peer System Architecture for Keyword Searches Developed by Hailong Cai & Jun Wang Presented By: Alex Thompson

Search Algorithm

• Basic Steps1. Initiating node attempts to resolve query locally

2. If query cannot be resolved locally, it is forwarded along the friends and neighbors links.

• Note: ‘Free-riders’ will not be searched until neighbors are touched.

3. If query is resolved, send success confirmation message.

Page 18: Foreseer A Novel, Locality-Aware Peer-to-Peer System Architecture for Keyword Searches Developed by Hailong Cai & Jun Wang Presented By: Alex Thompson

Search Algorithm

• Local Matching• Keywords from stored documents are mapped to a filter:

Local Filter Keywords

“Geographical Locality”

“Temporal Locality”

“Foreseer”

“Bloom Filter”

Friend Filter Keywords

“search”

“distributed has table”

“caching”

“Bloom Filter”

Local

Filter

Friend

Filter

Page 19: Foreseer A Novel, Locality-Aware Peer-to-Peer System Architecture for Keyword Searches Developed by Hailong Cai & Jun Wang Presented By: Alex Thompson

Search Algorithm

• Expanding to Friends and Neighbors

1hh 121 hhhh

21 hhh

Search Friends

Search Neighbors

Terminate Search

h1 = the # of hops along the friends overlay

h2 = the # of hops along the neighbors overlay

Page 20: Foreseer A Novel, Locality-Aware Peer-to-Peer System Architecture for Keyword Searches Developed by Hailong Cai & Jun Wang Presented By: Alex Thompson

Search Algorithm• Friends go 1st

1.Friends files interest the source nodes, so friends of friends files are more likely to interest the source node.

2.Friends links don’t refer to free-riders.

3.Friends link are likely more spread out over the network.

HopNodes

TouchedNodes

ForseenMsgs

Produced0 1 f + n 01 f (f + n) f f… … … …i f-i (f + n) f-i f-i

… … … …h1 f-h1 (f + n) f-h1 f-h1

Page 21: Foreseer A Novel, Locality-Aware Peer-to-Peer System Architecture for Keyword Searches Developed by Hailong Cai & Jun Wang Presented By: Alex Thompson

Search Algorithm• Then neighbors

• The friends search is limited to h1 hops.

• After this, the neighbors are searched for h2 hops.

HopNodes

Touched Nodes ForseenMsgs

Produced0 1 (f + n) f-h1 01 n (f + n) n f-h1 n… … … …i n-j (f + n) n-j f-h1 n-j

… … … …h2 n-h2 (f + n) n-h2 f-h1 n-h2

Page 22: Foreseer A Novel, Locality-Aware Peer-to-Peer System Architecture for Keyword Searches Developed by Hailong Cai & Jun Wang Presented By: Alex Thompson

Search Algorithm

• Other resource-considerations• Free Riders – When there’s nothing to look for…

• (Initially) skipping free-riders saves a great deal of pointless walks.• Free-riders may issue queries to the system, but will have no friends.• Eliminates many meaningless messages and recovers bandwidth.

• False Positives – “But it’s supposed to be here?!”• Identification – multiple friends/neighbors seem to have the matching

data.• Resolution – selectively forward query to 2 or more friends/neighbors

• Chance of double error extremely slim.

Page 23: Foreseer A Novel, Locality-Aware Peer-to-Peer System Architecture for Keyword Searches Developed by Hailong Cai & Jun Wang Presented By: Alex Thompson

Search Algorithm

• System Maintenance• Object Publishing & Removal

• Extract keywords from new file• Recreate context filter• Send ‘changes’ message to all backfriends and neighbors

• Alters bits on remote content filters.

• Node join & Departure• When a node leaves, it tells its backfriends and neighbors

• Backfriends & neighbors remove connection, delete content filter.

• Departing node caches F & N’s list for contact upon reconnection.

• When a node drops, BF’s & N’s wont know until they attempt contact• A more aggressive ‘ping-pong’ system could be implemented to deal

with this.

Page 24: Foreseer A Novel, Locality-Aware Peer-to-Peer System Architecture for Keyword Searches Developed by Hailong Cai & Jun Wang Presented By: Alex Thompson

Experiment

• Foreseer settings

Neighbors (N)

Friends (F)

Backfriends (F-1)

102 N

84 F

201 F

Page 25: Foreseer A Novel, Locality-Aware Peer-to-Peer System Architecture for Keyword Searches Developed by Hailong Cai & Jun Wang Presented By: Alex Thompson

Experiment

• The Challengers• Gnutella

• TTL = 7

• IBS (Interest Based Shortcuts)• TTL = 7• Max Shortcuts = 10

Page 26: Foreseer A Novel, Locality-Aware Peer-to-Peer System Architecture for Keyword Searches Developed by Hailong Cai & Jun Wang Presented By: Alex Thompson

Experiment• LI (Local Indices)

• LI-1• r = 1• Note: less indices but can’t

satisfy enough queries

• LI-2• r = 2• Note: more indices but

expensive to maintain

Page 27: Foreseer A Novel, Locality-Aware Peer-to-Peer System Architecture for Keyword Searches Developed by Hailong Cai & Jun Wang Presented By: Alex Thompson

Experiment• Network Topology

• Transit-Stub model• Euclidean coordinate space

• 51,984 physical nodes

• Randomly create nodes• 9 transit domains• 16 transit nodes per domain (on average)• 9 stub domains per transit node• 40 stub nodes (on average)

• Randomly create links• Transit nodes connect at a probability of 60%• Stub nodes connect at probability of 40%

Page 28: Foreseer A Novel, Locality-Aware Peer-to-Peer System Architecture for Keyword Searches Developed by Hailong Cai & Jun Wang Presented By: Alex Thompson

Experiment• Latency

Page 29: Foreseer A Novel, Locality-Aware Peer-to-Peer System Architecture for Keyword Searches Developed by Hailong Cai & Jun Wang Presented By: Alex Thompson

Experiment• Hardware

• SUNFire 880• 16 CPUs• Java 2 SDK environment

• Software• Java

• Test Data• Trace of an eDonkey system

• 923,000 filenames

• 37,000 peers

• Conducted during 1st Week, Nov. 2003

• Reconstruct download trace• Consists of query, response, transfer

• Assume only 1 copy of file is shared before simulation begins

Page 30: Foreseer A Novel, Locality-Aware Peer-to-Peer System Architecture for Keyword Searches Developed by Hailong Cai & Jun Wang Presented By: Alex Thompson

Experiment• Test Metrics

• Success Rate – % successful queries.• Response Time – average time to fine 1st match.• Relative Distance – ‘Virtual’ distance traveled to reach destination.• Messaged Produced – average # of messages produced during search.• Nodes touched – average # of nodes visited during search.• Free-riders touched – average # of non-participating nodes visited during

search.

Page 31: Foreseer A Novel, Locality-Aware Peer-to-Peer System Architecture for Keyword Searches Developed by Hailong Cai & Jun Wang Presented By: Alex Thompson

Results• Foreseer simulations

• Optimal policy chosen is P1: {F5 N1}• Fewer free riders than ‘best’ policy

PolicySuccess

rateAvg. hops

Avg. Response Time

Relative Disance

# of messages

# of nodes

touched# of free-riders

touched

P1: {F3 N3} 93.64% 3.01 426 3.91 194 164 56P1: {F4 N2} 99.70% 2.95 453 4.22 234 175 18P1: {F5 N1} 99.90% 2.97 459 4.29 262 177 2P1: {F6 N0} 99.34% 2.95 457 4.24 250 169 0P2: {N4 F4} 25.30% 5.64 482 4.36 181 163 5P3: {F4 // N2} 94.87% 2.77 402 3.7 191 150 7P3: {F4 // N3} 98.25% 2.75 382 3.42 210 170 28P3: {F4 // N4} 99.29% 2.74 374 3.36 228 187 42P4: {N2 F2 N2} 96.46% 4.14 442 4.01 228 181 18P5: {F2 N2 F2} 98.10% 3.35 403 3.65 208 174 47

Page 32: Foreseer A Novel, Locality-Aware Peer-to-Peer System Architecture for Keyword Searches Developed by Hailong Cai & Jun Wang Presented By: Alex Thompson

Results• Queries Served

• Majority served by friends• Why use Neighbors?

• h > 3 = trouble

Page 33: Foreseer A Novel, Locality-Aware Peer-to-Peer System Architecture for Keyword Searches Developed by Hailong Cai & Jun Wang Presented By: Alex Thompson

Results• Search Efficiency

• Response Time• Foreseer exploits both temporal

and geographical locality.

• IBS pays greater penalty prior to contacting neighbors.

• Gnutella and LI-x since neither exploit both localities.

Page 34: Foreseer A Novel, Locality-Aware Peer-to-Peer System Architecture for Keyword Searches Developed by Hailong Cai & Jun Wang Presented By: Alex Thompson

Results• Search Efficiency

• Relative Distance• Only Foreseer defines neighbor as

‘temporally’ close.

• Other systems may be sending greater distances to neighbors.

Page 35: Foreseer A Novel, Locality-Aware Peer-to-Peer System Architecture for Keyword Searches Developed by Hailong Cai & Jun Wang Presented By: Alex Thompson

Results• Search Cost

• # of Messages• Gnutella floods all queries

• Others do if not met by shortcuts or local indices

• Foreseer benefits• Friends satisfy most queries

Page 36: Foreseer A Novel, Locality-Aware Peer-to-Peer System Architecture for Keyword Searches Developed by Hailong Cai & Jun Wang Presented By: Alex Thompson

Results• Search Cost

• # of nodes touched• Gnutella – no optimization

• IBS – temporal locality

• LI-2 – abundant indexing

• Foreseer – combines both intelligently

Page 37: Foreseer A Novel, Locality-Aware Peer-to-Peer System Architecture for Keyword Searches Developed by Hailong Cai & Jun Wang Presented By: Alex Thompson

Results• Search Cost

• Free-riders touched• Challengers mostly the same %

• But Foreseer still touches fewer total nodes

Page 38: Foreseer A Novel, Locality-Aware Peer-to-Peer System Architecture for Keyword Searches Developed by Hailong Cai & Jun Wang Presented By: Alex Thompson

• Search Cost• Update Messages

• IBS & Gnutella inapplicable

• Foreseer almost as good as LI-2 with less bandwidth used

• Bloom filters decrease size of remote filters

Page 39: Foreseer A Novel, Locality-Aware Peer-to-Peer System Architecture for Keyword Searches Developed by Hailong Cai & Jun Wang Presented By: Alex Thompson

Results• Foreseer testing

• Response time• 4 groups, e/ roughly double

size of smaller group.

• Note: very little improvement from adding neighbors.

• Message Count• Same idea as above

• Note: The paper concedes that for this test there was an anomaly with Max |F| = 4

• Query could not reach enough nodes due to the upper bound on friends.

Page 40: Foreseer A Novel, Locality-Aware Peer-to-Peer System Architecture for Keyword Searches Developed by Hailong Cai & Jun Wang Presented By: Alex Thompson

Results• Foreseer testing

• Update costs• Linear function

• y = 1.5x + b

• # of free-riders touched• Varying N has little to do with

result once Max |F| >= 8• At Max |F| >= 12, query will not

touch free riders• Infers resolution within friends

overlay.

• Indicates, again, reliance on friends overlay for success.

Page 41: Foreseer A Novel, Locality-Aware Peer-to-Peer System Architecture for Keyword Searches Developed by Hailong Cai & Jun Wang Presented By: Alex Thompson

Further Study

• Non-participant flags for free-riders.• Actual message costs of ping-pong peer detection.• Friends neighbors intersection resolution.

Page 42: Foreseer A Novel, Locality-Aware Peer-to-Peer System Architecture for Keyword Searches Developed by Hailong Cai & Jun Wang Presented By: Alex Thompson

Conclusion

• Created 2 overlays for an unstructured P2P network• Geographical (neighbors)• Temporal (friends)

• Achieved high search efficiency• Content filters

• Trace-driven simulation• 70% improvement

• response time • relative distance

• 90% improvement• # of query messages• # of nodes touched