Center for E-Business Technology, Seoul National University
Seoul, Korea
Socially Filtered Web Search: An approach using social bookmarking tags to personalize web search
Kay-Uwe Schmidt*, Tobias Sarnow*, Ljiljana Stojanovic**
*SAP Research, Vincenz-Prießnitz-Straße 1, 76131 Karlsruhe, Germany
**Forschungszentrum Informatik, Haid-und-Neu-Straße 10-14, 76131 Karlsruhe, Germany
Symposium on Applied Computing (2009)
2009. 08. 13.
Summarized & presented by Babar Tareen, IDS Lab., Seoul National University
Copyright 2008 by CEBT
Introduction
Search engines do not consider the user's current work context
Results are static: every user gets the same list
Server-side personalization has limited use
Client-side search engines rely on additional terms extracted from documents and therefore do not scale
Search result personalization based on social bookmarking addresses these issues
Related Work
Google History
goZone.com
Mahalo.com
UCAIR
Motivation
A developer is looking for guidelines for testing DB code
Visits
www.ibm.com/db2
www.hsqldb.org
Googles
“Test”
Original Results
Web based certification
Personality test
Bandwidth test
Personalized Results
DB2 training
DB2 programming test
Personalizing Search Results
Tracking browsing behavior
Create user model
URLs
Tags fetched from Delicious
Issue original query
Enhance search query by adding tags
Issue new query
Display both results
Tags given by a community of users provide a good summary of web page content
URL               Tags (Metadata)
www.youtube.com   video, youtube, entertainment, web2.0
www.amazon.com    shopping, books, amazon, music
www.snu.ac.kr     university, snu, korea, 서울대
www.hsqldb.org    database, java, sql, opensource
www.ibm.com/db2   ibm, db2, database, unix
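The user model described on this slide (visited URLs plus the tags fetched for them) could be kept in a structure like the one below. This is only a minimal sketch: the class name `UserModel`, the injected `fetch_tags` callable, and the `EXAMPLE_TAGS` dictionary are hypothetical stand-ins, and no real Delicious API call is made.

```python
from typing import Callable, Dict, List


class UserModel:
    """Tracks visited URLs and the community tags attached to them."""

    def __init__(self, fetch_tags: Callable[[str], List[str]]) -> None:
        # fetch_tags is a hypothetical stand-in for a Delicious lookup.
        self._fetch_tags = fetch_tags
        self.visits: Dict[str, int] = {}       # url -> number of visits
        self.tags: Dict[str, List[str]] = {}   # url -> community tags

    def record_visit(self, url: str) -> None:
        """Count the visit and fetch tags the first time a URL is seen."""
        self.visits[url] = self.visits.get(url, 0) + 1
        if url not in self.tags:
            self.tags[url] = self._fetch_tags(url)


# Example data copied from the table above (assumed, not fetched live).
EXAMPLE_TAGS = {
    "www.hsqldb.org": ["database", "java", "sql", "opensource"],
    "www.ibm.com/db2": ["ibm", "db2", "database", "unix"],
}

model = UserModel(lambda url: EXAMPLE_TAGS.get(url, []))
for visited in ["www.ibm.com/db2", "www.hsqldb.org", "www.ibm.com/db2"]:
    model.record_visit(visited)
```

Injecting the tag lookup keeps the sketch independent of any particular bookmarking service and makes it easy to test with fixed data.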
Architecture [1]
Search Module
Carries out original query
Inserts space (<DIV>) for personalized results
Metric Module
Includes a metric that delivers a tag for personalized search
Search Enhancer Module
Combines search string with metric module tags
Metadata Module
Extracts metadata for a visited website from Delicious
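One way to picture how the four modules fit together is the sketch below. The function names follow the module names on the slide, but their signatures, the `tag_source` dictionary, and the injected `search` callable are assumptions made for illustration. The metric here is a placeholder (the tag shared by the most visited websites), not the metric used by the authors.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple


def metadata_module(visited: List[str],
                    tag_source: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Look up tags for each visited website (stand-in for a Delicious call)."""
    return {url: tag_source.get(url, []) for url in visited}


def metric_module(tags_by_url: Dict[str, List[str]]) -> str:
    """Deliver one tag: the one attached to the most visited websites."""
    counts = Counter(tag for tags in tags_by_url.values() for tag in set(tags))
    if not counts:
        return ""
    best = max(counts.values())
    # Break ties alphabetically so the result is deterministic.
    return min(tag for tag, n in counts.items() if n == best)


def search_enhancer_module(query: str, tag: str) -> str:
    """Combine the original search string with the tag from the metric module."""
    return f"{query} {tag}".strip()


def search_module(query: str,
                  visited: List[str],
                  tag_source: Dict[str, List[str]],
                  search: Callable[[str], List[str]]) -> Tuple[List[str], List[str]]:
    """Run the original and the enhanced query and return both result lists."""
    original = search(query)
    tag = metric_module(metadata_module(visited, tag_source))
    personalized = search(search_enhancer_module(query, tag))
    return original, personalized
```

With the two database sites from the motivation slide as browsing history, the placeholder metric picks "database", so the query "Test" is reissued as "Test database" and both result lists are returned for display.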
Architecture [2]
Built as an add-on on top of
Firefox
Internet Explorer
Metric [1]
Two datasets
Collection of visited websites
Tags for each website
Query the last 20 distinct websites from the user model
Format (url, count)
Sorted by weight ‘γ’
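The website query above could look roughly like the following sketch. The slide does not define the weight γ, so the code assumes the visit count itself serves as the weight. The function name `last_distinct_sites` and the list-based browsing history are also hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def last_distinct_sites(history: List[str], limit: int = 20) -> List[Tuple[str, int]]:
    """Return (url, count) pairs for the most recently visited distinct URLs.

    `history` is a chronological list of visited URLs, oldest first.
    """
    counts = Counter(history)
    recent: List[str] = []
    for url in reversed(history):  # newest visit first
        if url not in recent:
            recent.append(url)
            if len(recent) == limit:
                break
    # Assumption: the visit count is used as the weight; ties keep recency order.
    return sorted(((url, counts[url]) for url in recent), key=lambda pair: -pair[1])
```

For example, a history of `["a", "b", "a", "c", "b", "a"]` yields the pairs for `a`, `b`, and `c` in that order, with counts 3, 2, and 1.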
Metric [2]
Tags assigned to website
Format (tag, number of users)
t → tags assigned to a website
T → tags for all websites
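The metric formula itself did not survive in these slides, so the sketch below only illustrates how the two datasets, (url, count) and (tag, number of users), might be combined. The scoring rule is an assumption, not the authors' metric: each website's visit count is spread over its tags in proportion to how many users assigned each tag, and the per-tag shares are summed across all websites. The user counts in the usage example are invented for illustration.

```python
from typing import Dict, List, Tuple


def rank_tags(sites: List[Tuple[str, int]],
              tags: Dict[str, List[Tuple[str, int]]]) -> List[Tuple[str, float]]:
    """Rank tags across all websites (T) from the per-website tag lists (t)."""
    scores: Dict[str, float] = {}
    for url, count in sites:
        site_tags = tags.get(url, [])
        total_users = sum(users for _, users in site_tags)
        if total_users == 0:
            continue  # no tags known for this website
        for tag, users in site_tags:
            # Assumed rule: visit count times the tag's share of users.
            scores[tag] = scores.get(tag, 0.0) + count * users / total_users
    # Highest score first; ties broken alphabetically.
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))


# Illustrative input; the user counts are made up.
ranked = rank_tags(
    [("www.ibm.com/db2", 3), ("www.hsqldb.org", 1)],
    {
        "www.ibm.com/db2": [("db2", 60), ("database", 40)],
        "www.hsqldb.org": [("database", 50), ("java", 50)],
    },
)
```

Under this assumed rule the top-ranked tag would be the one appended to the search query.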
Algorithm
Result
Evaluation
How effective can this be?
Can Social Bookmarking Improve Web Search?
Paul Heymann, Georgia Koutrika, Hector Garcia-Molina
Dept. of Computer Science, Stanford University
USA
Web Search and Data Mining 2008
Positive Factors [1]
URLs
Pages posted on delicious are often recently modified
– Delicious users post interesting pages that are actively updated or have been recently created
Approximately 25% of URLs posted by users are new, unindexed pages
– Delicious can serve as a small data source for new web pages and help with crawl ordering
Roughly 9% of results for search queries are URLs present in delicious
– Delicious URLs are disproportionately common in search results compared to their coverage
While some users are more prolific than others, the top 10% of users only account for 56% of the posts
– Delicious is not highly reliant on a relatively small group of users
Positive Factors [2]
URLs
30-40% of URLs and approximately one in eight domains posted were not previously in delicious.
– Delicious has relatively little redundancy in page information
Tags
Popular query terms and tags overlap significantly
– Delicious may be able to help with queries where tags overlap with query terms
In this study, most tags were deemed relevant and objective by users
– Tags are on the whole accurate
Negative Factors
URLs
Approximately 120,000 URLs are posted to delicious each day
– The number of posts per day is relatively small; for instance, it represents 1/10 of the number of blog posts per day
There are roughly 115 million public posts, coinciding with about 30-50 million unique URLs
– The total number of posts is relatively small; for instance, this is a small portion of the web as a whole (perhaps 1/1000)
Tags
Tags are present in the page text of 50% of the pages they annotate
– A substantial proportion of tags are obvious in context, and many tagged pages would be discovered by a search engine
Domains are often highly correlated with particular tags and vice versa
– It may be more efficient to train librarians to label domains than to ask users to tag pages
Discussion
Query expansion model based on Social tagging
What is the probability of finding tags for a random URL on delicious.com?
Generalization vs. Specialization