Hypersearching the Web

Soumen Chakrabarti, Byron Dom, S. Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins

Jacob Kalakal Joseph

CS 572 (Spring 2011) | Class Presentation | June 21, 2011

Outline

• Characteristics of the WWW
• Motivation for building search engines
• Traditional SEs and the challenges
• Improvements and the associated problems
• CLEVER
• Power of hyperlinks
• Hubs and Authorities
• Algorithm
• Evaluate CLEVER
• Future scope
• Answer questions and class discussion

WWW ~ Universe

Motivation for search engines

Initial Attempts

• Ranking functions based on simple heuristics

Challenges: Synonymy

Challenges: Polysemy

Challenges: Spamming

• Cheap airtickets Cheap airtickets Cheap airtickets Cheap airtickets Cheap airtickets

• White font on White background

Improvements

• Semantic networks: help with synonymy but worsen polysemy
• Human selectors: impractical

Hyperlinks - What a CLEVER idea!

Hubs & Authorities

How it works
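The hub and authority idea can be sketched in a few lines of Python. This is a minimal illustration of the iterative score updates described by Kleinberg, not CLEVER's actual implementation: the function names `hits` and `normalize`, the toy link graph, and the page names are all assumptions made for the example. A page's authority score is the sum of the hub scores of the pages pointing to it, and a page's hub score is the sum of the authority scores of the pages it points to; both are rescaled after every round.

```python
from math import sqrt


def normalize(scores):
    """Scale scores to unit Euclidean length (left unchanged if all are zero)."""
    norm = sqrt(sum(v * v for v in scores.values())) or 1.0
    return {page: v / norm for page, v in scores.items()}


def hits(links, iterations=5):
    """Hypothetical helper: links maps each page to the pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    # Every page starts with the same hub and authority score.
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority update: add up the hub scores of pages linking in.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        # Hub update: add up the fresh authority scores of pages linked to.
        new_hub = {p: sum(new_auth[d] for d in links.get(p, [])) for p in pages}
        auth, hub = normalize(new_auth), normalize(new_hub)

    return hub, auth


# Toy graph (made up): three link-list pages pointing at three content pages.
links = {"hub1": ["a", "b"], "hub2": ["a", "b", "c"], "hub3": ["a"]}
hub, auth = hits(links)
print(max(auth, key=auth.get))  # page with the top authority score: "a"
print(max(hub, key=hub.get))    # page with the top hub score: "hub2"
```

In this toy graph "a" is linked from all three list pages, so it ends up as the top authority, while "hub2" links to the most authorities and ends up as the top hub.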

Clever vs. Google

• Google’s faster!
• Clever looks back also

Pros

• Rapid convergence (5 iterations for root set of 3000 pages)
• Independent of the initial H, A scores
• Get info even before we actually crawl
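The claim that the result does not depend on the initial H, A scores can be checked on a small example. The sketch below is an illustration under stated assumptions, not a proof: the function name `authority_scores`, the toy graph, and both starting vectors are made up, and the independence it shows relies on the starting scores being positive and the link matrix having a single dominant eigenvalue.

```python
from math import sqrt


def authority_scores(links, hub, iterations=20):
    """Hypothetical helper: run the hub/authority updates from a given start."""
    pages = list(hub)
    for _ in range(iterations):
        # Authority update from the current hub scores, then rescale.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]
        norm = sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # Hub update from the fresh authority scores, then rescale.
        hub = {p: sum(auth[d] for d in links.get(p, [])) for p in pages}
        norm = sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return auth


# Toy graph (made up): three link-list pages pointing at three content pages.
links = {"hub1": ["a", "b"], "hub2": ["a", "b", "c"], "hub3": ["a"]}
pages = ["hub1", "hub2", "hub3", "a", "b", "c"]

uniform_start = {p: 1.0 for p in pages}
skewed_start = {p: float(i + 1) for i, p in enumerate(pages)}

from_uniform = authority_scores(links, uniform_start)
from_skewed = authority_scores(links, skewed_start)

# Largest per-page difference between the two runs.
max_gap = max(abs(from_uniform[p] - from_skewed[p]) for p in pages)
print(max_gap < 1e-6)  # True: both starts settle on the same scores
```

Both runs settle on the same authority scores because repeated updates pull any positive starting vector toward the same dominant direction of the link structure.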

Segregation of web into clusters

Cons

• The underlying assumption, “Web links confer authority”, could be incorrect! Links may also exist for:
– Navigation
– Advertisement
– Disapproval

Cons

• Ignores the anchor text
• It is not necessary for every page to be either a hub or an authority
• Universally popular websites like Wikipedia will be an authority on almost everything
• May return a general result for a narrow topic search

What’s next?

References

• S. Chakrabarti, B. Dom, D. Gibson, J. Kleinberg, S.R. Kumar, P. Raghavan, S. Rajagopalan, A. Tomkins. Hypersearching the Web. Scientific American, June 1999.
• CLEVER project (http://www.almaden.ibm.com/projects/clever.shtml)
• J. Kleinberg. Authoritative sources in a hyperlinked environment. Proc. 9th ACM-SIAM Symposium on Discrete Algorithms, 1998.
• S. Brin, L. Page. The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, Vol. 30, No. 1-7, pp. 107-117, 1998.
• WordNet Project (http://wordnet.princeton.edu/)

Group Discussion
