Upload
daisy-price
View
223
Download
1
Tags:
Embed Size (px)
Citation preview
ForeseerA Novel, Locality-Aware Peer-to-Peer System Architecture for Keyword SearchesDeveloped by Hailong Cai & Jun Wang
Presented By: Alex Thompson
Overview1. Peer-to-Peer Network Architecture
2. The standard (Gnutella)
3. Improving the design
4. Foreseer
5. The Search Algorithm
6. Experiment
7. Results
8. Further Study
9. Conclusions
Peer to Peer Architecture
• Motivations• Rapidly distribute data• Create vast repository of data spread over many nodes• Increase data availability• Leverage collective computing power to ease stress on
individual nodes and connections• Conserve bandwidth by distributing load throughout network
• Obstacles• Scalability• Bottlenecking• Wasted Bandwidth• File Corruption
Peer to Peer Architecture
Simple
Peer to Peer Architecture
Simple
Complex
Gnutella
• ‘True’ Peer-to-Peer• All nodes equally participate• All nodes expected to share workload and bandwidth
similarly• Key ideal: As long as 2 nodes exist, network will persist
• Searching in Gnutella• Send query to all connected nodes (typically 5 to 10)• These nodes search their file base
• If the data exists, they set up a connection and transfer the data.• If not, they forward the query the nodes to which they are connected.
• This process continues until query is found or dropped• Query will attempt to reach outer limits of network.
Gnutella
• Issues:• Not very scalable
• Message flooding occurs as network size increases.
• Inefficient use of bandwidth• Searches may cross same path multiple times.
• Searches take “walks to nowhere” by querying nodes with nothing shared.
• Searches completely directionless and bounce around network.
• Node tracking and inventory nonexistent• Nodes do not inform peers when they disconnect and searches may be sent
to dead ends.
Improving the Design
• Use a limited “blind” searching algorithm• Random Walk – forward query to random subset of
neighbors.• Pro: Reduces message flooding• Con: Still have repeated walks and walks to nowhere
• Directed BSF – select neighbors that have produced quality results before.
• Pro: Reuses paths by maintaining ‘Shortcuts’ list to previously useful nodes.
• Con: If a peer in the path departs, the path is lost.• Con: may delay queries for objects not on ‘shortcut’ neighbors as that
list must be exhausted before other neighbors are contacted.
Improving the Design
• Use an “informed” searching algorithm • Intelligent BFS – maintain query tables of data classes for
neighbors at each node.• Pro: Reuses paths for queries of similar classes.• Con: The space required for this is too large, with update overhead
offsetting potential benefits.• Con: Node data deletion and node departure are not addressed.
• Local Indices – Maintain list of files for neighbors within a certain radius (r)
• Pro: A single node may respond for many others• Con: If r is small, many queries cannot be satisfied• Con: If r is large, updating indices becomes too expensive
Foreseer
• Exploit inherent 2-dimensional locality in P2P systems• By using geographical and temporal localities, search
efficiency and scalability can be achieved.• Geographical Locality: Nodes which are geographically close to each
other.• Temporal Locality: Nodes which have been useful previously.
• 3 Components• Neighbors overlay (geographical component).• Friends overlay (temporal component).• Bloom Filters (indices distributed according to
relationships between friends and neighbors).
Foreseer
Neighbor Link
Friend Link
Foreseer
• Rationale: Daily Life• Everyone has friends and neighbors
• Neighbors live close by and you meet when you first settle in• Friends are met through business dealings and may live distantly
• Together, these constitute social network• If you need help on a project, you will go to this network• If no one in your circle of friends and neighbors can help, they will
check with their own circle.
Foreseer
• Bloom Filters - Hash-based data structure representing a set.• Why use this approach
• # of shared files on peers is limited (75% share 100 or less).• Documents on peer tend to share common topics.• Most files shared are multimedia, which has few keywords.
• Some stats• Returns no false negatives• May return false positives at a rate of
where m = # of bits in the filter, n = # of elements in the set• Assuming K-max keywords = 10,000 and k hash functions = 8, the
equation to compute m is
n
m
)6185.0(
bitskKMax 416,114
2ln
8000,10
2ln
Foreseer
• Your Neighbors• Bidirectional – neighbors are aware of e/other.• Latency used as metric for distance
• The longer the delay, the further away the potential neighbor must be
• Moving into the neighborhood• Established at setup
• ‘Ping’ sent out at startup• If a potential neighbor has fewer than ‘max’ neighbors, it will reply.• New neighbors also send filters along with positive replies.
• Process continues until new peer has minimum # of neighbors.
Foreseer
• Your Friends• Unidirectional
• Each peer has a certain group that it considers friends.• Each peer has a certain group that considers it a friend (backfriends)
• Nodes keep content info of friends, but not backfriends.
• How to make friends• At setup, new node asks its neighbors who their friends are.• Any friends of the neighbors that can accept a friend will
reply positively.• This is only an ‘initial’ list.
• After successful downloads, peer may update its friends list.• The idea: if a node was useful in finding data before, it might be
again – make it a friend.
Foreseer
• Node departure• Expected/Intentional - Nodes inform friends and neighbors
when they leave.• Drop – friends and neighbors are not informed
• Neighbors may learn of departure when they try and contact• No provision given for node to release backfriends
• Re-establishment• Attempt to contact neighbors and friends• If neighbors and friends can still accept you, they will
• May be necessary to repeat initialization if sufficient time has passed.
• No one knows you anymore or you’re in a new neighborhood.
Search Algorithm
• Basic Steps1. Initiating node attempts to resolve query locally
2. If query cannot be resolved locally, it is forwarded along the friends and neighbors links.
• Note: ‘Free-riders’ will not be searched until neighbors are touched.
3. If query is resolved, send success confirmation message.
Search Algorithm
• Local Matching• Keywords from stored documents are mapped to a filter:
Local Filter Keywords
“Geographical Locality”
“Temporal Locality”
“Foreseer”
“Bloom Filter”
Friend Filter Keywords
“search”
“distributed has table”
“caching”
“Bloom Filter”
Local
Filter
Friend
Filter
Search Algorithm
• Expanding to Friends and Neighbors
1hh 121 hhhh
21 hhh
Search Friends
Search Neighbors
Terminate Search
h1 = the # of hops along the friends overlay
h2 = the # of hops along the neighbors overlay
Search Algorithm• Friends go 1st
1.Friends files interest the source nodes, so friends of friends files are more likely to interest the source node.
2.Friends links don’t refer to free-riders.
3.Friends link are likely more spread out over the network.
HopNodes
TouchedNodes
ForseenMsgs
Produced0 1 f + n 01 f (f + n) f f… … … …i f-i (f + n) f-i f-i
… … … …h1 f-h1 (f + n) f-h1 f-h1
Search Algorithm• Then neighbors
• The friends search is limited to h1 hops.
• After this, the neighbors are searched for h2 hops.
HopNodes
Touched Nodes ForseenMsgs
Produced0 1 (f + n) f-h1 01 n (f + n) n f-h1 n… … … …i n-j (f + n) n-j f-h1 n-j
… … … …h2 n-h2 (f + n) n-h2 f-h1 n-h2
Search Algorithm
• Other resource-considerations• Free Riders – When there’s nothing to look for…
• (Initially) skipping free-riders saves a great deal of pointless walks.• Free-riders may issue queries to the system, but will have no friends.• Eliminates many meaningless messages and recovers bandwidth.
• False Positives – “But it’s supposed to be here?!”• Identification – multiple friends/neighbors seem to have the matching
data.• Resolution – selectively forward query to 2 or more friends/neighbors
• Chance of double error extremely slim.
Search Algorithm
• System Maintenance• Object Publishing & Removal
• Extract keywords from new file• Recreate context filter• Send ‘changes’ message to all backfriends and neighbors
• Alters bits on remote content filters.
• Node join & Departure• When a node leaves, it tells its backfriends and neighbors
• Backfriends & neighbors remove connection, delete content filter.
• Departing node caches F & N’s list for contact upon reconnection.
• When a node drops, BF’s & N’s wont know until they attempt contact• A more aggressive ‘ping-pong’ system could be implemented to deal
with this.
Experiment
• Foreseer settings
Neighbors (N)
Friends (F)
Backfriends (F-1)
102 N
84 F
201 F
Experiment
• The Challengers• Gnutella
• TTL = 7
• IBS (Interest Based Shortcuts)• TTL = 7• Max Shortcuts = 10
Experiment• LI (Local Indices)
• LI-1• r = 1• Note: less indices but can’t
satisfy enough queries
• LI-2• r = 2• Note: more indices but
expensive to maintain
Experiment• Network Topology
• Transit-Stub model• Euclidean coordinate space
• 51,984 physical nodes
• Randomly create nodes• 9 transit domains• 16 transit nodes per domain (on average)• 9 stub domains per transit node• 40 stub nodes (on average)
• Randomly create links• Transit nodes connect at a probability of 60%• Stub nodes connect at probability of 40%
Experiment• Latency
Experiment• Hardware
• SUNFire 880• 16 CPUs• Java 2 SDK environment
• Software• Java
• Test Data• Trace of an eDonkey system
• 923,000 filenames
• 37,000 peers
• Conducted during 1st Week, Nov. 2003
• Reconstruct download trace• Consists of query, response, transfer
• Assume only 1 copy of file is shared before simulation begins
Experiment• Test Metrics
• Success Rate – % successful queries.• Response Time – average time to fine 1st match.• Relative Distance – ‘Virtual’ distance traveled to reach destination.• Messaged Produced – average # of messages produced during search.• Nodes touched – average # of nodes visited during search.• Free-riders touched – average # of non-participating nodes visited during
search.
Results• Foreseer simulations
• Optimal policy chosen is P1: {F5 N1}• Fewer free riders than ‘best’ policy
PolicySuccess
rateAvg. hops
Avg. Response Time
Relative Disance
# of messages
# of nodes
touched# of free-riders
touched
P1: {F3 N3} 93.64% 3.01 426 3.91 194 164 56P1: {F4 N2} 99.70% 2.95 453 4.22 234 175 18P1: {F5 N1} 99.90% 2.97 459 4.29 262 177 2P1: {F6 N0} 99.34% 2.95 457 4.24 250 169 0P2: {N4 F4} 25.30% 5.64 482 4.36 181 163 5P3: {F4 // N2} 94.87% 2.77 402 3.7 191 150 7P3: {F4 // N3} 98.25% 2.75 382 3.42 210 170 28P3: {F4 // N4} 99.29% 2.74 374 3.36 228 187 42P4: {N2 F2 N2} 96.46% 4.14 442 4.01 228 181 18P5: {F2 N2 F2} 98.10% 3.35 403 3.65 208 174 47
Results• Queries Served
• Majority served by friends• Why use Neighbors?
• h > 3 = trouble
Results• Search Efficiency
• Response Time• Foreseer exploits both temporal
and geographical locality.
• IBS pays greater penalty prior to contacting neighbors.
• Gnutella and LI-x since neither exploit both localities.
Results• Search Efficiency
• Relative Distance• Only Foreseer defines neighbor as
‘temporally’ close.
• Other systems may be sending greater distances to neighbors.
Results• Search Cost
• # of Messages• Gnutella floods all queries
• Others do if not met by shortcuts or local indices
• Foreseer benefits• Friends satisfy most queries
Results• Search Cost
• # of nodes touched• Gnutella – no optimization
• IBS – temporal locality
• LI-2 – abundant indexing
• Foreseer – combines both intelligently
Results• Search Cost
• Free-riders touched• Challengers mostly the same %
• But Foreseer still touches fewer total nodes
• Search Cost• Update Messages
• IBS & Gnutella inapplicable
• Foreseer almost as good as LI-2 with less bandwidth used
• Bloom filters decrease size of remote filters
Results• Foreseer testing
• Response time• 4 groups, e/ roughly double
size of smaller group.
• Note: very little improvement from adding neighbors.
• Message Count• Same idea as above
• Note: The paper concedes that for this test there was an anomaly with Max |F| = 4
• Query could not reach enough nodes due to the upper bound on friends.
Results• Foreseer testing
• Update costs• Linear function
• y = 1.5x + b
• # of free-riders touched• Varying N has little to do with
result once Max |F| >= 8• At Max |F| >= 12, query will not
touch free riders• Infers resolution within friends
overlay.
• Indicates, again, reliance on friends overlay for success.
Further Study
• Non-participant flags for free-riders.• Actual message costs of ping-pong peer detection.• Friends neighbors intersection resolution.
Conclusion
• Created 2 overlays for an unstructured P2P network• Geographical (neighbors)• Temporal (friends)
• Achieved high search efficiency• Content filters
• Trace-driven simulation• 70% improvement
• response time • relative distance
• 90% improvement• # of query messages• # of nodes touched