Sackler – May 11, 2003
Organizing Search Results
Susan Dumais
Microsoft Research

Organizing Search Results
Algorithms and interfaces that improve the effectiveness of search
  Beyond ranked lists
  Main goal to support search
  Also information analysis and discovery
Example applications
  SWISH, results classification
  GridViz, results summarization
  SIS, personal landmarks for context

Searching with Information Structured Hierarchically (SWISH)
Collaborators
  Edward Cutrell, Hao Chen (Berkeley)
Key Themes
  Going beyond long lists of results
  Classification algorithms
  UI techniques
More about it
  http://research.microsoft.com/~sdumais

Organizing Search Results
Query: “jaguar”
[Figure: the same results shown two ways. List Organization, with results annotated => Shopping, => Automotive, => Automotive, => Computers; and SWISH Category Organization.]

LookSmart Directory Structure
  ~400k pages; 17k categories; 7 levels
  13 top-level categories; 150 second-level categories
Web Directory
  Top-level Categories: Automotive; Business & Finance; Computers & Internet; Entertainment & Media; Health & Fitness; Hobbies & Interests; Home & Family; People & Chat; Reference & Education; Shopping & Services; Society & Politics; Sports & Recreation; Travel & Vacations
  Second-level categories (under Automotive): Buy or Sell a Car; Chat; Finance & Insurance; Magazines & Books; Maintenance & Repair; Makes, Models & Clubs; Motorcycles; New Car Showrooms; Off-Road, 4X4 & RVs; Other Auto Interests; Shows & Museums; Trucks & Tractors; Vintage & Classic

SWISH System
Combines the advantages of
  Directories - Manually crafted structure but small <~3 million pages>
  Search engines - Broad coverage but limited metadata <~3 billion pages>
Project search engine results to category structure
Two main components
  Text classification models
  UI for integrating search results and structure
    Context (category structure) plus focus (search results)
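Projecting search results onto a category structure amounts to classifying each result and then grouping results by predicted category. The following is a minimal sketch of that idea, not the SWISH implementation: `classify` is a hypothetical keyword lookup standing in for the text classification models, and the keyword sets and result titles are made up for illustration.

```python
from collections import defaultdict

# Hypothetical keyword lookup standing in for the text classification models.
KEYWORDS = {
    "Automotive": {"car", "dealer", "xj"},
    "Computers & Internet": {"os", "software", "mac"},
    "Shopping & Services": {"buy", "store"},
}

def classify(text):
    """Return the category whose keyword set overlaps the text most."""
    words = set(text.lower().split())
    best = max(KEYWORDS, key=lambda cat: len(KEYWORDS[cat] & words))
    return best if KEYWORDS[best] & words else "Uncategorized"

def group_results(results):
    """Group ranked results by category, keeping rank order within each group."""
    groups = defaultdict(list)
    for title in results:
        groups[classify(title)].append(title)
    # Show the categories with the most results first.
    return sorted(groups.items(), key=lambda kv: -len(kv[1]))

# Made-up result titles for the query "jaguar".
results = [
    "Jaguar car dealer locator",
    "Jaguar XJ car reviews",
    "Mac OS Jaguar software update",
    "Buy Jaguar posters at our store",
]
grouped = group_results(results)
for category, titles in grouped:
    print(category, titles)
```

Keeping rank order inside each group preserves the search engine's ordering (the focus) while the category headings supply the context.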

SWISH Architecture
[Diagram: manually classified web pages -> Train (offline) -> SVM model; web search results, local search results, ... -> Classify (online)]
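The split between offline training and online classification can be sketched as follows. The slides say only that an SVM model is trained on manually classified pages, so this example swaps in a simple stand-in: a linear SVM fitted by subgradient descent on the hinge loss, in pure Python. The function names, hyperparameters, and training snippets are all assumptions made for illustration.

```python
from collections import defaultdict

def features(text):
    """Bag-of-words term counts."""
    counts = defaultdict(int)
    for word in text.lower().split():
        counts[word] += 1
    return counts

def train_linear_svm(examples, epochs=20, lr=0.1, reg=0.01):
    """Offline step: fit word weights by subgradient descent on hinge loss.

    examples is a list of (text, label) pairs with label +1 or -1.
    """
    weights = defaultdict(float)
    for _ in range(epochs):
        for text, label in examples:
            x = features(text)
            margin = label * sum(weights[w] * c for w, c in x.items())
            # L2 regularization shrinks every weight a little each step.
            for w in list(weights):
                weights[w] *= 1.0 - lr * reg
            # Hinge loss only pushes on examples inside the margin.
            if margin < 1.0:
                for w, c in x.items():
                    weights[w] += lr * label * c
    return dict(weights)

def classify(weights, text):
    """Online step: score a new snippet with the stored weights."""
    score = sum(weights.get(w, 0.0) * c for w, c in features(text).items())
    return score > 0.0

# Hypothetical manually classified snippets: +1 = Automotive, -1 = not.
examples = [
    ("jaguar car dealer", +1),
    ("car engine parts", +1),
    ("jaguar cat jungle", -1),
    ("wild cat habitat", -1),
]
model = train_linear_svm(examples)            # done once, offline
print(classify(model, "used car engine"))     # done per search result, online
print(classify(model, "cat in the jungle"))
```

Because the online step is only a sparse dot product over the words in a snippet, it is cheap enough to run on each search result as it arrives.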

Learning & Classification
Support Vector Machine (SVM)
  Accurate and efficient for text classification (Dumais et al., Joachims)
  Model = weighted vector of words
    “Automobile” = motorcycle, vehicle, parts, automobile, harley, car, auto, honda, porsche …
    “Computers & Internet” = rfc, software, provider, windows, user, users, pc, hosting, os, downloads ...
Hierarchical models for LS directory
  1 model for top level; N models for second
  Very useful in conjunction w/ user interaction
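A two-level scheme with weighted word vectors might look like the sketch below: one set of models picks the top-level category, then that category's own second-level models pick the subcategory. The weight values are hypothetical (the slide lists the words but not their weights), and the threshold and helper names are assumptions.

```python
# Hypothetical weights; the slide lists the words but not their values.
TOP_LEVEL = {
    "Automotive": {"car": 1.0, "auto": 0.9, "motorcycle": 0.8, "parts": 0.5},
    "Computers & Internet": {"software": 1.0, "windows": 0.8, "pc": 0.7, "hosting": 0.6},
}
SECOND_LEVEL = {
    "Automotive": {
        "Motorcycles": {"motorcycle": 1.0, "harley": 0.9},
        "Maintenance & Repair": {"repair": 1.0, "parts": 0.8},
    },
    "Computers & Internet": {},  # second-level models omitted in this sketch
}

def score(model, words):
    """Sum the model's weights for the words present in the text."""
    return sum(model.get(w, 0.0) for w in words)

def best_category(models, words, threshold=0.0):
    """Return the highest-scoring category above threshold, else None."""
    if not models:
        return None
    name = max(models, key=lambda c: score(models[c], words))
    return name if score(models[name], words) > threshold else None

def classify_hierarchical(text):
    """Pick a top-level category, then a second-level one beneath it."""
    words = text.lower().split()
    top = best_category(TOP_LEVEL, words)
    if top is None:
        return (None, None)
    return (top, best_category(SECOND_LEVEL[top], words))

print(classify_hierarchical("harley motorcycle parts"))
print(classify_hierarchical("windows pc software"))
```

Only the second-level models under the chosen top-level category are evaluated, so each result is scored against a small fraction of the second-level models rather than all of them.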

User Interface Experiments
[Figure: List Organization vs. Category Organization]

[Chart: conditions Hover, Inline, No Cat Names, Browse, Hover, Inline + Cat Names, grouped as Group Interface and List Interface; axis range 60 to 120]

Effect of Query Difficulty
[Chart: Group vs. List for Easy and Hard queries; axis range 0 to 140]
  Easy queries are faster (p<0.01)
  Group faster than List (p<0.01)
  Benefit is larger for hard queries (p<0.06)

SWISH: Summary and Design Implications
Text Classification
  Learn accurate category models
  Classify new web pages on-the-fly
  Organize search results
User Interface
  Tightly couple search results with category structure
  User manipulation of presentation of category structure

GridViz
Collaborators
  George Robertson, Edward Cutrell, Jeremy Goecks (Georgia Tech)
Key Themes
  Abstract beyond individual results
  Highly interactive interface to support understanding of trends and relationships
More about it
  http://research.microsoft.com/~sdumais

GridViz
Summarize the results of a search
Grid-based design
  Axes represent topic, time, people
  Cells encode frequency, recency
Supports activities like:
  What newsgroups are active (on topic x)?
  What people are active, authoritative (on topic x)?
  When did I last interact w/ people?
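The grid summary can be illustrated with a small aggregation: pick two attributes as axes, then let each cell hold a frequency (item count) and a recency (latest date). This is a sketch under assumptions, not the GridViz code; the field names and sample posts are hypothetical.

```python
from collections import defaultdict
from datetime import date

def build_grid(items, row_key, col_key):
    """Aggregate items into cells keyed by (row, column).

    Each cell records frequency (item count) and recency (latest date).
    """
    grid = defaultdict(lambda: {"frequency": 0, "latest": None})
    for item in items:
        cell = grid[(item[row_key], item[col_key])]
        cell["frequency"] += 1
        if cell["latest"] is None or item["date"] > cell["latest"]:
            cell["latest"] = item["date"]
    return dict(grid)

# Hypothetical search results (newsgroup posts matching a query).
posts = [
    {"author": "ana", "topic": "svm", "date": date(2003, 4, 2)},
    {"author": "ana", "topic": "svm", "date": date(2003, 5, 1)},
    {"author": "raj", "topic": "svm", "date": date(2003, 3, 9)},
    {"author": "raj", "topic": "ui", "date": date(2003, 5, 7)},
]
grid = build_grid(posts, row_key="author", col_key="topic")
for (author, topic), cell in sorted(grid.items()):
    print(author, topic, cell["frequency"], cell["latest"])
```

Swapping `row_key` and `col_key` for other attributes (for example a time bucket) gives the other axis pairings, and a question such as "what people are active on topic x" becomes a scan down one column.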

GridViz Demo

User Interface Experiments
[Figures: List View and GridViz; charts comparing GridViz and List-view, each with axis range 0 to 40]

GridViz Summary
Abstracting beyond individual results
Highly interactive interface
Grid-based design
  Axes represent people, topic, time
  Cells encode frequency, recency
Preliminary but promising

Stuff I’ve Seen (SIS)
Collaborators
  Edward Cutrell, Raman Sarin, JJ Cadiz, Gavin Jancke, Daniel Robbins, Merrie Ringel (Stanford)
Key Themes
  Your content
  Information re-use
  Integration across sources
More about it
  … internal for now

Search Today …
Many locations, interfaces for finding things (e.g., web, mail, local files, help, history, intranet)
Often slow

Search with SIS
Unified index of stuff you’ve seen
  Unify access to information regardless of source – mail, archives, calendar, files, web pages, etc.
  Full-text index of content plus metadata attributes (e.g., creation time, author, title, size)
  Automatic and immediate update of index
  Rich UI possibilities, since it’s your content
Architecture
  Client side indexing and storage
  Built using MS Search components
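A unified index of this kind pairs a full-text inverted index with per-item metadata, so one query can span mail, files, and web pages and then be filtered or sorted by attribute. The sketch below is a toy in-memory version under assumptions; SIS itself is built on MS Search components, and the class name, fields, and sample items here are hypothetical.

```python
from collections import defaultdict
from datetime import date

class UnifiedIndex:
    """Toy full-text index over items from any source, with metadata."""

    def __init__(self):
        self.items = {}                    # item id -> metadata dict
        self.postings = defaultdict(set)   # word -> ids of items containing it

    def add(self, item_id, source, author, title, created, text):
        """Index an item as soon as it is seen."""
        self.items[item_id] = {
            "source": source, "author": author,
            "title": title, "created": created,
        }
        for word in (title + " " + text).lower().split():
            self.postings[word].add(item_id)

    def search(self, query, sort_by="created", **filters):
        """Return ids matching all query words and metadata filters."""
        words = query.lower().split()
        if not words:
            return []
        ids = set.intersection(*(self.postings.get(w, set()) for w in words))
        hits = [i for i in ids
                if all(self.items[i][k] == v for k, v in filters.items())]
        # Newest first when sorting by date.
        return sorted(hits, key=lambda i: self.items[i][sort_by],
                      reverse=(sort_by == "created"))

# Hypothetical items from three different sources.
index = UnifiedIndex()
index.add(1, "mail", "cutrell", "SWISH study results", date(2003, 3, 1), "timing data attached")
index.add(2, "web", "dumais", "SWISH project page", date(2003, 4, 15), "classification results")
index.add(3, "file", "dumais", "GridViz notes", date(2003, 5, 2), "grid design")
print(index.search("swish results"))
print(index.search("swish", source="mail"))
```

Sorting by creation date by default reflects the importance of time for personal content; any other metadata field can be passed as `sort_by`.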

SIS Demo

SIS Alpha Observations
800+ internal users
  Usage logs (incl. different interfaces), survey data
File types opened
  76% Email
  14% Web pages
  10% Files
Age of items accessed
  7% today
  22% within the last week
  46% within the last month
[Chart: Item Access Distribution; x-axis: Days Since Item First Seen (0 to 2500); y-axis: Frequency (0 to 120)]

SIS Alpha Observations
Use of other search tools
  Non-SIS search for web, email, and files decreases
Importance of people
  25% of the queries involve people’s names
Importance of time
  Date by far the most popular sort field, followed by rank, author, title
  Even when rank is the default
[Chart: Pre-usage vs. Post-usage for Files, Email, Web Pages; axis range 0 to 6]

SIS UI Innovations: Timeline w/ Landmarks
Importance of time
Timeline interface
Contextualize results using important landmarks as pointers into human memory
  General: holidays, world events
  Personal: important photos, appointments
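One way to picture a timeline with landmarks is as a merge of two dated streams, search results and landmark events, into a single ordering. The sketch below assumes simple `(date, label)` pairs and made-up sample data; it illustrates the idea only and is not the SIS interface code.

```python
from datetime import date

def timeline_with_landmarks(results, landmarks):
    """Merge dated results and landmarks into one newest-first timeline."""
    entries = [(d, "landmark", label) for d, label in landmarks]
    entries += [(d, "result", title) for d, title in results]
    # Sort by date descending; on ties, landmarks come before results.
    return sorted(entries, key=lambda e: (-e[0].toordinal(), e[1]))

# Hypothetical search results and general landmarks (holidays).
results = [
    (date(2003, 1, 3), "Budget spreadsheet"),
    (date(2002, 12, 20), "Trip itinerary email"),
]
landmarks = [
    (date(2003, 1, 1), "New Year's Day"),
    (date(2002, 12, 25), "Christmas Day"),
]
timeline = timeline_with_landmarks(results, landmarks)
for when, kind, label in timeline:
    print(when.isoformat(), kind, label)
```

Personal landmarks such as appointments or important photos would enter the same way, as additional `(date, label)` pairs in `landmarks`.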

Milestones in Time Demo

Milestones in Timeline
[Chart: Search Time (s) for Landmarks + Dates vs. Dates Only; axis range 0 to 30]

SIS Summary
Unified index of stuff you’ve seen
  Fast access to full-text and metadata, from heterogeneous sources
  Automatic and immediate update of index
  Rich UI possibilities
Next steps
  Better support for tagging -> “flatland”
  Implicit queries for finding related info, and identifying “Stuff I Should See”
  Integration with richer activity-based info, Eve

Organizing Search Results
Algorithms and interfaces to improve search
  Use structure and context
Examples and key themes
  SWISH … grouping
  GridViz … abstraction
  SIS … personal content and landmarks
Also
  Important attributes: People, topics, time
  Interaction
  Evaluation
More information
  http://research.microsoft.com/~sdumais
  [email protected]@microsoft.com
  Christopher Lee of (SIG)IR … http://www.cdvp.dcu.ie/SIGIR/index.html