Search in Peer-to-Peer File-Sharing Systems: Like Metasearch Engines, But Not Really
Wai Gen Yee, Dongmei Jia, Linh Thai Nguyen
{yee, jiadong, nguylin}@iit.edu
Information Retrieval Laboratory
Illinois Institute of Technology
Chicago, IL USA
Yee, Jia, Nguyen. OSWIR 2005 Workshop, Compiegne, France
Goal
To motivate research in peer-to-peer information retrieval (P2P IR).
To model P2P IR in terms of a metasearch engine.
Model
Peers share data objects, each described with a descriptor (bag of terms).
Peers are connected in a random graph. Queries (bags of terms) are routed to peers (servers) that return references to data objects $O$ such that $D_O \supseteq Q$, where $D_O$ is the descriptor of $O$.
Each descriptor also contains the hash value of the data object.
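The matching rule above can be sketched in a few lines of Python. This is a minimal sketch, assuming each descriptor's bag of terms is treated as a set of lowercase terms (only containment matters for the rule); the `Descriptor` class, the `local_search` function, and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptor:
    """A descriptor: the object's terms plus the hash value of the data object."""
    terms: frozenset[str]
    object_hash: str


def matches(descriptor: Descriptor, query: frozenset[str]) -> bool:
    # O is returned when every query term appears in its descriptor (D_O ⊇ Q).
    return query <= descriptor.terms


def local_search(shared: list[Descriptor], query: frozenset[str]) -> list[Descriptor]:
    """Return references (here, the descriptors themselves) to matching objects."""
    return [d for d in shared if matches(d, query)]


shared = [
    Descriptor(frozenset({"mozart", "piano", "concerto", "21"}), "h1"),
    Descriptor(frozenset({"mozart", "requiem"}), "h2"),
]
hits = local_search(shared, frozenset({"mozart", "concerto"}))
```

Returning the hash with each reference is what later lets a client recognize that results from different peers are replicas of the same object.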
Metadata Distribution Example
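Assume Q = {Mozart, Concerto}. Ungrouped results:
[Table of ungrouped results with columns Hash Key, Descriptors, and Sources; contents not recoverable from the extraction.]
All descriptors contain Q.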
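A minimal Python sketch of how such ungrouped results might look when inspected per hash key. The rows are invented for illustration (the slide's table did not survive extraction), and `descriptors_by_hash` and `sources_by_hash` are hypothetical names.

```python
from collections import defaultdict

# Hypothetical ungrouped results for Q = {mozart, concerto}:
# (hash key, descriptor terms, source peer). Invented rows, not the slide's data.
results = [
    ("A1", frozenset({"mozart", "piano", "concerto"}), "peer1"),
    ("A1", frozenset({"mozart", "concerto", "21", "mp3"}), "peer2"),
    ("B7", frozenset({"mozart", "violin", "concerto"}), "peer3"),
    ("A1", frozenset({"mozart", "piano", "concerto"}), "peer4"),
]
query = frozenset({"mozart", "concerto"})

# Every descriptor contains Q ...
assert all(query <= terms for _, terms, _ in results)

# ... yet one hash key can carry several distinct descriptors, because each
# replica's descriptor is maintained independently by the peer that shares it.
descriptors_by_hash: dict[str, set[frozenset[str]]] = defaultdict(set)
sources_by_hash: dict[str, list[str]] = defaultdict(list)
for key, terms, peer in results:
    descriptors_by_hash[key].add(terms)
    sources_by_hash[key].append(peer)

for key in sorted(descriptors_by_hash):
    print(f"{key}: {len(sources_by_hash[key])} source(s), "
          f"{len(descriptors_by_hash[key])} distinct descriptor(s)")
```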
Motivation for Model
Peer-to-peer file sharing:
• Millions of users.
• Petabytes of data.
• Data objects are replicated.
• A replica’s descriptor is independently maintained.
Metasearch Engines
Search other search engines, e.g.:
• dogpile.com
• askjeeves.com
Main Metasearch Engine Activities
• Source selection: which search engines to search.
• Query dispatching: translating a query to a local format.
• Result selection: picking from the multiple result sets.
• Result merging: unifying/ranking the selected results.
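The four activities can be wired together as one pipeline. This is a hypothetical Python sketch, not any particular engine's design: each engine is modeled as a callable, each profile is reduced to a single quality score, and merging is a profile-weighted sum of local scores; `metasearch`, `profiles`, `translators`, `k_sources`, and `min_profile` are illustrative names.

```python
from typing import Callable

# Hypothetical engine: takes a query in its local format, returns (doc_id, score) pairs.
Engine = Callable[[str], list[tuple[str, float]]]


def metasearch(query: set[str],
               engines: dict[str, Engine],
               profiles: dict[str, float],
               translators: dict[str, Callable[[set[str]], str]],
               k_sources: int = 2,
               min_profile: float = 0.0) -> list[str]:
    # 1. Source selection: keep the k engines with the best profile scores.
    chosen = sorted(engines, key=lambda name: profiles[name], reverse=True)[:k_sources]
    # 2. Query dispatching: translate the query into each engine's local format.
    result_sets = {name: engines[name](translators[name](query)) for name in chosen}
    # 3. Result selection: drop result lists from engines whose profile is too low.
    selected = {n: r for n, r in result_sets.items() if profiles[n] >= min_profile}
    # 4. Result merging: weight each local score by its engine's profile, sum per document.
    merged: dict[str, float] = {}
    for name, results in selected.items():
        for doc, score in results:
            merged[doc] = merged.get(doc, 0.0) + profiles[name] * score
    return sorted(merged, key=merged.get, reverse=True)
```

A real metasearch engine would use much richer profiles; a single score per engine just keeps the four stages visible.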
Source Selection
Metasearch Engine:
• Employs a profile of each search engine to make the decision.
P2P File-Sharing System:
• Routing:
  • Flooding.
  • Use of statistics about neighbors.
  • Distributed hash tables.
• Cost is related to peer autonomy.
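Flooding over a random graph can be sketched as a breadth-first search bounded by a time-to-live (TTL). This is a minimal sketch under stated assumptions: `random_graph` builds a hypothetical overlay with a fixed number of random links per peer, and `flood` returns every peer the query reaches together with its hop count, which makes the reach (and cost) of flooding visible as the TTL grows.

```python
import random
from collections import deque


def random_graph(n: int, degree: int, seed: int = 0) -> dict[int, set[int]]:
    """Hypothetical overlay: each peer links to `degree` random others (undirected links)."""
    rng = random.Random(seed)
    graph: dict[int, set[int]] = {p: set() for p in range(n)}
    for p in range(n):
        for q in rng.sample([x for x in range(n) if x != p], degree):
            graph[p].add(q)
            graph[q].add(p)
    return graph


def flood(graph: dict[int, set[int]], origin: int, ttl: int) -> dict[int, int]:
    """Forward a query to all neighbors hop by hop; return peer -> hop count."""
    reached = {origin: 0}
    frontier = deque([origin])
    while frontier:
        peer = frontier.popleft()
        hops = reached[peer]
        if hops == ttl:
            continue  # TTL exhausted: this peer does not forward the query further
        for nbr in graph[peer]:
            if nbr not in reached:  # peers drop queries they have already seen
                reached[nbr] = hops + 1
                frontier.append(nbr)
    return reached


overlay = random_graph(50, 3)
print({ttl: len(flood(overlay, 0, ttl)) for ttl in range(4)})
```

A larger TTL reaches more peers at the price of more messages; that cost is what the other two routing approaches try to reduce.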
Query Dispatching
Metasearch Engine:
• One search engine may use a vector space model, and another might use a Boolean model.
P2P File-Sharing System:
• Some clients, such as eMule, access multiple networks.
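To make the contrast concrete, the same bag-of-terms query can be evaluated under both models. This is a minimal sketch assuming raw term-frequency vectors without IDF weighting; the documents and the function names `boolean_and` and `cosine_rank` are hypothetical.

```python
import math
from collections import Counter

docs = {  # hypothetical documents as term lists
    "d1": ["mozart", "piano", "concerto"],
    "d2": ["mozart", "requiem"],
    "d3": ["concerto", "violin", "concerto"],
}


def boolean_and(query: list[str], docs: dict[str, list[str]]) -> list[str]:
    # Boolean model: a document matches only if it contains every query term.
    return sorted(d for d, terms in docs.items() if set(query) <= set(terms))


def cosine_rank(query: list[str], docs: dict[str, list[str]]) -> list[tuple[str, float]]:
    # Vector space model: cosine similarity of raw term-frequency vectors.
    q = Counter(query)
    q_norm = math.sqrt(sum(v * v for v in q.values()))
    scored = []
    for d, terms in docs.items():
        t = Counter(terms)
        dot = sum(q[w] * t[w] for w in q)
        if dot:
            d_norm = math.sqrt(sum(v * v for v in t.values()))
            scored.append((d, dot / (q_norm * d_norm)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With the query `["mozart", "concerto"]`, the Boolean model returns only `d1`, while the vector space model also ranks the partial matches `d3` and `d2` below it, so a dispatcher has to decide how to express one query for each kind of back end.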
Result Selection
Metasearch Engine:
• Some result lists might be pruned if they come from less relevant search engines.
• Uses search engine profiles.
P2P File-Sharing System:
• Generally, all results are sent to the client.
Result Merging
Metasearch Engine:
• Rankings from individual lists.
• Profiles of search engines.
P2P File-Sharing System:
• Group results.
• Rank based on likelihood of successful download:
  • Group size.
  • Connection quality.
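Grouping and ranking as described above might look like the following Python sketch. The `bandwidth_kbps` field and the linear scoring rule are assumptions chosen for illustration; real clients may weigh group size and connection quality differently.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Result:
    object_hash: str
    source: str
    bandwidth_kbps: float  # hypothetical connection-quality signal sent with the result


def group_and_rank(results: list[Result], weight: float = 0.5) -> list[tuple[str, int, float]]:
    """Group results by hash, then rank groups by size and best connection quality.

    Returns (hash, group size, score) tuples, best first. The score
    size + weight * (best bandwidth / best bandwidth overall) is an assumption.
    """
    groups: dict[str, list[Result]] = defaultdict(list)
    for r in results:
        groups[r.object_hash].append(r)  # replicas share a hash, so they form one group
    top_bw = max((r.bandwidth_kbps for r in results), default=1.0) or 1.0
    ranked = []
    for h, members in groups.items():
        best = max(m.bandwidth_kbps for m in members) / top_bw
        ranked.append((h, len(members), len(members) + weight * best))
    return sorted(ranked, key=lambda g: g[2], reverse=True)
```

Group size dominates this score: a larger group means more sources to download from, while connection quality mainly breaks ties between groups of similar size.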
Example Search on Limewire’s Gnutella
[Screenshot of a search result list; annotated elements: query (number of results), descriptors, group size.]
Basic Difference
• Metasearch engines assume a fixed and reliable set of search engines.
• They can therefore collect statistics on those search engines to improve query processing and results.
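As an illustration of such statistics, a metasearch engine could keep a smoothed estimate of each engine's precision. This is a hypothetical sketch (the `EngineProfile` class, the `alpha` smoothing factor, and the 0.5 prior are assumptions); it only pays off because the same engines answer query after query, an assumption that transient, autonomous peers do not satisfy.

```python
from dataclasses import dataclass


@dataclass
class EngineProfile:
    """Running statistics a metasearch engine might keep per search engine."""
    alpha: float = 0.2       # smoothing factor: weight given to the newest observation
    usefulness: float = 0.5  # prior estimate before any feedback
    queries_seen: int = 0

    def update(self, relevant_returned: int, returned: int) -> None:
        # Observed precision of this engine's result list for one query.
        observed = relevant_returned / returned if returned else 0.0
        # Exponential moving average: recent queries count more than old ones.
        self.usefulness = (1 - self.alpha) * self.usefulness + self.alpha * observed
        self.queries_seen += 1


profile = EngineProfile()
profile.update(relevant_returned=8, returned=10)
profile.update(relevant_returned=0, returned=0)  # an empty result list counts as precision 0
```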
P2P File Sharing Research Areas (1/2)
• Source selection: inexpensive routing with autonomous peers.
• Query dispatching: translating queries to maximize the precision and recall of the final result set.
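Precision and recall of a final result set can be computed directly, which makes it easy to compare two translations of the same query. A minimal sketch, assuming results are identified by hash key and the set of relevant objects is known; the sets below are hypothetical.

```python
def precision_recall(returned: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision = hits / |returned|, recall = hits / |relevant| (0.0 for empty sets)."""
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


relevant = {"h1", "h2", "h3"}
strict = {"h1"}                          # e.g. every query term required
relaxed = {"h1", "h2", "h3", "h4", "h5"}  # e.g. one query term dropped
print(precision_recall(strict, relevant))
print(precision_recall(relaxed, relevant))
```

The strict translation is precise but misses most relevant objects; the relaxed one finds them all but returns extra noise, which is the trade-off a query translator has to balance.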
P2P File Sharing Research Areas (2/2)
• Result selection: using queries and local statistics to prune returned results.
• Result merging: using replication and distributed metadata to improve rankings.
  • Analogy: link analysis for Web search.
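One way replication and distributed metadata could feed a ranking, by analogy with link analysis (where many pages describe one target): treat each replica's descriptor as a vote for its terms. This is a hypothetical Python sketch; the scoring rule (the average fraction of replica descriptors containing each query term) is an assumption for illustration, not the authors' method.

```python
from collections import Counter, defaultdict


def aggregate_descriptors(results):
    """Merge the independently maintained descriptors of each replica group.

    results: iterable of (object_hash, descriptor_terms) pairs.
    Returns hash -> (number of replicas, Counter of replica descriptors using each term).
    """
    replicas: dict[str, int] = defaultdict(int)
    term_votes: dict[str, Counter] = defaultdict(Counter)
    for h, terms in results:
        replicas[h] += 1
        term_votes[h].update(set(terms))  # each replica votes at most once per term
    return {h: (replicas[h], term_votes[h]) for h in replicas}


def score(query, replicas: int, votes: Counter) -> float:
    # Average, over query terms, of the fraction of replicas whose descriptor has the term.
    if not query:
        return 0.0
    return sum(votes[t] / replicas for t in query) / len(query)


agg = aggregate_descriptors([
    ("A", ["mozart", "concerto"]),
    ("A", ["mozart", "piano"]),
    ("B", ["mozart", "concerto"]),
])
```

Here object `A` scores lower than `B` because its replicas disagree about "concerto", so the distributed metadata acts as evidence about how well each object fits the query.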
Goals of Open Source in P2P File-Sharing Systems
• Allow the communal development of the technology.
  • New routing techniques.
  • New ranking functions.
• Disclose all functionality.
  • Better security.
  • No spyware.
Examples of Openness in P2P File-Sharing
• Gnutella is an open protocol.
  • Limewire, Bearshare, Kazaa.
• Limewire publishes an open-source implementation of the Gnutella protocol.
• eMule is another open-source project, built on a competing protocol.
Conclusion
• Many research areas.
  • Can be modeled as a form of metasearch engine.
• High impact.
  • Many users and petabytes of data.
• An active open-source community already exists.
  • Large community of users and much existing source code.
Questions and Contact Information
Wai Gen Yee, yee@iit.edu, …/~waigen
• Recent results and publications.
Information Retrieval Laboratory, Illinois Institute of Technology: ir.iit.edu