Search in Peer-to-Peer File-Sharing Systems: Like Metasearch Engines, But Not Really
Wai Gen Yee, Dongmei Jia, Linh Thai Nguyen
{yee, jiadong, nguylin}@iit.edu
Information Retrieval Laboratory, Illinois Institute of Technology, Chicago, IL USA


Yee, Jia, Nguyen OSWIR, 2005 Workshop, Compiegne, France


Goal

To motivate research in peer-to-peer information retrieval (P2P IR).

To model P2P IR in terms of a metasearch engine.


Model

Peers share data objects, each described with a descriptor (bag of terms).

Peers are connected in a random graph.

Queries (bags of terms) are routed to peers (servers) that return references to data objects O such that $D_O \supseteq Q$.
  $D_O$ is the descriptor of O.

Each descriptor also contains the hash value of the data object.
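The matching rule in the model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `make_descriptor` and `matches` are hypothetical, terms are lowercased before comparison, and SHA-1 stands in for whatever hash function a real system uses; the slides specify none of these details.

```python
import hashlib
from collections import Counter


def make_descriptor(terms, content):
    """Build a descriptor: a bag of terms plus the hash of the object's content.

    Lowercasing and SHA-1 are assumptions made for this sketch.
    """
    return {
        "terms": Counter(t.lower() for t in terms),
        "hash": hashlib.sha1(content).hexdigest(),
    }


def matches(query_terms, descriptor):
    """Return True when the query bag is contained in the descriptor bag."""
    query = Counter(t.lower() for t in query_terms)
    # A Counter returns 0 for missing terms, so absent terms fail the test.
    return all(descriptor["terms"][term] >= count for term, count in query.items())
```

A server would return a reference to an object only when `matches(query, descriptor)` holds for that object's descriptor.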


Metadata Distribution Example

Assume Q={Mozart, Concerto}. Ungrouped results:

[Figure: table of ungrouped results, labeled with each result's hash key and sources. All descriptors contain Q.]


Motivation for Model

Peer-to-peer file-sharing.
  Millions of users.
  Petabytes of data.
    • Data objects are replicated.
    • A replica’s descriptor is independently maintained.


Metasearch Engines

Search other search engines.
  dogpile.com
  askjeeves.com


Main Metasearch Engine Activities

Source selection.
  Which search engines to search.
Query dispatching.
  Translating a query to a local format.
Result selection.
  Picking from the multiple result sets.
Result merging.
  Unifying/ranking the selected results.
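The four activities above can be read as stages of a pipeline. The sketch below is a toy illustration, not a description of any real metasearch engine: the profile representation (a set of terms per engine), the two query formats, the per-engine cutoff, and the reciprocal-rank merge are all assumptions chosen to keep the example short.

```python
def select_sources(query, profiles, k=2):
    """Source selection: pick the k engines whose profiles overlap the query most."""
    ranked = sorted(profiles, key=lambda name: len(profiles[name] & query), reverse=True)
    return ranked[:k]


def dispatch(query, engine_format):
    """Query dispatching: translate the query into an engine's local format."""
    terms = sorted(query)
    if engine_format == "boolean":
        return " AND ".join(terms)
    return " ".join(terms)  # plain term list, e.g. for a vector space engine


def select_results(result_lists, per_engine=2):
    """Result selection: keep only the top results of each engine's list."""
    return {name: results[:per_engine] for name, results in result_lists.items()}


def merge_results(selected):
    """Result merging: unify the lists, ranking items by summed reciprocal rank."""
    scores = {}
    for results in selected.values():
        for rank, item in enumerate(results, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / rank
    # Break ties alphabetically so the output order is deterministic.
    return sorted(scores, key=lambda item: (-scores[item], item))
```

An item that ranks well in several lists accumulates a higher score, which is one simple way to combine rankings from individual lists.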


Source Selection

Metasearch engine.
  Employs profiles of each search engine to make its decision.
P2P file-sharing system.
  Routing:
    • Flooding.
    • Use of statistics of neighbors.
    • Distributed hash tables.
  Cost related to peer autonomy.
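Flooding, the simplest of the routing options listed above, can be sketched as a breadth-first traversal of the peer graph. The hop limit (`ttl`) and the message counter are assumptions added for illustration; the slides do not describe how flooding is bounded. The sketch also simplifies real behavior by letting a peer forward the query back to the neighbor it came from.

```python
from collections import deque


def flood(graph, origin, ttl):
    """Flood a query from `origin` for at most `ttl` hops.

    `graph` maps each peer to a list of its neighbors.
    Returns the set of peers reached and the number of messages sent.
    """
    reached = {origin}
    frontier = deque([(origin, ttl)])
    messages = 0
    while frontier:
        peer, hops_left = frontier.popleft()
        if hops_left == 0:
            continue  # this peer may not forward the query any further
        for neighbor in graph[peer]:
            messages += 1  # duplicate deliveries still cost a message
            if neighbor not in reached:
                reached.add(neighbor)
                frontier.append((neighbor, hops_left - 1))
    return reached, messages
```

Comparing the message count with the number of peers reached shows why flooding is expensive: many messages arrive at peers that have already seen the query.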


Query Dispatching

Metasearch engine.
  One search engine may use a vector space model, and another might use a Boolean model.
P2P file-sharing system.
  Some search engines, such as eMule, access multiple networks.


Result Selection

Metasearch engine.
  Some result lists might be pruned if they come from less relevant search engines.
  Uses search engine profiles.
P2P file-sharing system.
  Generally, all results are sent to the client.


Result Merging

Metasearch engine.
  Rankings from individual lists.
  Profiles of search engines.
P2P file-sharing system.
  Group results.
  Rank based on likelihood of successful download:
    • Group size.
    • Connection quality.
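The P2P side of result merging can be sketched as grouping results by hash key and ranking the groups by size. This is a minimal illustration under assumptions: each result is taken to be a hypothetical `(hash_key, terms, source)` tuple, a group's descriptor is the union of its replicas' terms, and connection quality is left out because the slides do not say how it is measured.

```python
from collections import defaultdict


def group_and_rank(results):
    """Group results by hash key and rank groups by number of distinct sources.

    `results` is an iterable of (hash_key, terms, source) tuples.
    Returns a list of (hash_key, group_size, merged_terms), largest group first.
    """
    groups = defaultdict(lambda: {"terms": set(), "sources": set()})
    for hash_key, terms, source in results:
        groups[hash_key]["terms"].update(terms)
        groups[hash_key]["sources"].add(source)
    # Larger groups first; ties are broken by hash key for a stable order.
    ranked = sorted(groups.items(), key=lambda kv: (-len(kv[1]["sources"]), kv[0]))
    return [(key, len(g["sources"]), sorted(g["terms"])) for key, g in ranked]
```

Because replicas of the same object share a hash key, a large group suggests more sources to download from.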


Example Search on Limewire’s Gnutella

[Figure: screenshot of a Limewire search, with callouts for the query (number of results), the descriptors, and the group size.]


Basic Difference

Metasearch engines assume a fixed and reliable set of search engines.

Can collect statistics on search engines to improve query processing and results.


P2P File Sharing Research Areas (1/2)

Source selection:
  Inexpensive routing with autonomous peers.
Query dispatching:
  Translating queries to maximize precision and recall of final result set.


P2P File Sharing Research Areas (2/2)

Result selection:
  Usage of queries and local statistics to prune returned results.
Result merging:
  Usage of replication and distributed metadata to improve rankings.
  Recall: link analysis for Web search.


Goals of Open Source in P2P File-Sharing Systems

Allow the communal development of the technology.
  New routing techniques.
  New ranking functions.
Disclose all functionality.
  Better security.
  No spyware.


Examples of Openness in P2P File-Sharing

Gnutella is an open protocol.
  Limewire, Bearshare, Kazaa.

Limewire publishes an open-source implementation of the Gnutella protocol.

eMule is another open-source project built on a competing protocol.


Conclusion

Many research areas.
  Can be modeled as a form of metasearch engine.
High impact.
  Many users and petabytes of data.
There already exists an active open-source community.
  A large community of users and much source code exist.


Questions and Contact Information

Wai Gen Yee, yee@iit.edu (web page: …/~waigen)

• Recent results and publications.

Information Retrieval Laboratory, Illinois Institute of Technology: ir.iit.edu