
Sackler – May 11, 2003 Organizing Search Results Susan Dumais Microsoft Research


Page 1: Sackler – May 11, 2003 Organizing Search Results Susan Dumais Microsoft Research

Organizing Search Results

Susan Dumais

Microsoft Research

Page 2: Sackler – May 11, 2003 Organizing Search Results Susan Dumais Microsoft Research

Organizing Search Results

- Algorithms and interfaces that improve the effectiveness of search
  - Beyond ranked lists
  - Main goal to support search
  - Also information analysis and discovery
- Example applications
  - SWISH, results classification
  - GridViz, results summarization
  - SIS, personal landmarks for context

Page 3: Sackler – May 11, 2003 Organizing Search Results Susan Dumais Microsoft Research

Searching with Information Structured Hierarchically (SWISH)

- Collaborators
  - Edward Cutrell, Hao Chen (Berkeley)
- Key Themes
  - Going beyond long lists of results
  - Classification algorithms
  - UI techniques
- More about it
  - http://research.microsoft.com/~sdumais

Page 4: Sackler – May 11, 2003 Organizing Search Results Susan Dumais Microsoft Research

Organizing Search Results

Query: “jaguar”

List Organization

=> Shopping

=> Automotive

=> Automotive

=> Computers

SWISH Category Organization

Page 5: Sackler – May 11, 2003 Organizing Search Results Susan Dumais Microsoft Research

LookSmart Directory Structure

- ~400k pages; 17k categories; 7 levels
- 13 top-level categories; 150 second-level categories
- Top-level Categories

Web Directory

Automotive; Business & Finance; Computers & Internet; Entertainment & Media; Health & Fitness; Hobbies & Interests; Home & Family; People & Chat; Reference & Education; Shopping & Services; Society & Politics; Sports & Recreation; Travel & Vacations

Buy or Sell a Car; Chat; Finance & Insurance; Magazines & Books; Maintenance & Repair; Makes, Models & Clubs; Motorcycles; New Car Showrooms; Off-Road, 4X4 & RVs; Other Auto Interests; Shows & Museums; Trucks & Tractors; Vintage & Classic

Page 6: Sackler – May 11, 2003 Organizing Search Results Susan Dumais Microsoft Research

SWISH System

- Combines the advantages of
  - Directories - Manually crafted structure but small <~3 million pages>
  - Search engines - Broad coverage but limited metadata <~3 billion pages>
- Project search engine results to category structure
- Two main components
  - Text classification models
  - UI for integrating search results and structure
    - Context (category structure) plus focus (search results)
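Projecting ranked search results onto a category structure can be sketched as below. This is only an illustration under stated assumptions, not the SWISH implementation: `toy_classify` is a hypothetical keyword lookup standing in for the learned text classifier, the result titles are invented, and ordering categories by their best-ranked result is an assumed display rule.

```python
from collections import defaultdict

def toy_classify(title):
    """Hypothetical stand-in for a learned classifier: simple keyword lookup."""
    keywords = {
        "car": "Automotive",
        "dealer": "Automotive",
        "software": "Computers & Internet",
        "store": "Shopping & Services",
    }
    for word in title.lower().split():
        if word in keywords:
            return keywords[word]
    return "Uncategorized"

def group_by_category(results, classify):
    """Group a ranked result list by category, keeping rank order inside each group."""
    groups = defaultdict(list)
    for rank, title in enumerate(results, start=1):
        groups[classify(title)].append((rank, title))
    # Assumed display rule: show categories in order of their best-ranked result.
    return sorted(groups.items(), key=lambda item: item[1][0][0])

# Invented result titles for the query "jaguar".
results = [
    "Jaguar car dealer locator",
    "Jaguar software downloads",
    "Jaguar classic car club",
    "Jaguar merchandise store",
]
grouped = group_by_category(results, toy_classify)
for category, items in grouped:
    print(category)
    for rank, title in items:
        print(f"  {rank}. {title}")
```

The category headings supply the context and the ranked titles under each heading supply the focus.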

Page 7: Sackler – May 11, 2003 Organizing Search Results Susan Dumais Microsoft Research

SWISH Architecture

[Diagram: manually classified web pages are used to train (offline) an SVM model, which is then used to classify (online) web search results, local search results, ...]
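The train-offline, classify-online split can be sketched with a small linear SVM. The slides do not say how the SVM was trained; the sketch below assumes a Pegasos-style stochastic sub-gradient method on the hinge loss, binary word-presence features, and no bias term. The training documents and the function names are invented for illustration.

```python
import random

def featurize(text):
    """Binary bag of words: the set of lowercase tokens."""
    return set(text.lower().split())

def train_linear_svm(docs, labels, epochs=20, seed=0):
    """Offline step. labels are +1 or -1; returns a dict mapping word -> weight."""
    rng = random.Random(seed)
    lam = 0.01                      # assumed regularization strength
    weights = {}
    order = list(range(len(docs)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)   # Pegasos step size
            feats = featurize(docs[i])
            margin = labels[i] * sum(weights.get(w, 0.0) for w in feats)
            shrink = 1.0 - 1.0 / t  # equals 1 - eta * lam
            for w in weights:
                weights[w] *= shrink
            if margin < 1:          # hinge loss is active: move toward the example
                for w in feats:
                    weights[w] = weights.get(w, 0.0) + eta * labels[i]
    return weights

def classify(weights, text):
    """Online step: sign of the weighted sum of the words present."""
    score = sum(weights.get(w, 0.0) for w in featurize(text))
    return 1 if score > 0 else -1

# Invented training data: +1 = Automotive, -1 = Computers & Internet.
docs = ["car engine repair", "honda car dealer",
        "software download windows", "pc hosting software"]
labels = [1, 1, -1, -1]
model = train_linear_svm(docs, labels)
print(classify(model, "used car parts"), classify(model, "free software"))
```

Training cost is paid once offline; classifying a new result is a single sparse dot product, which is what makes on-the-fly classification of search results cheap.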

Page 8: Sackler – May 11, 2003 Organizing Search Results Susan Dumais Microsoft Research

Learning & Classification

- Support Vector Machine (SVM)
  - Accurate and efficient for text classification (Dumais et al., Joachims)
  - Model = weighted vector of words
    - “Automobile” = motorcycle, vehicle, parts, automobile, harley, car, auto, honda, porsche …
    - “Computers & Internet” = rfc, software, provider, windows, user, users, pc, hosting, os, downloads ...
- Hierarchical models for LS directory
  - 1 model for top level; N models for second
  - Very useful in conjunction w/ user interaction
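A model that is a weighted vector of words, applied hierarchically, might look like the sketch below. The words come from the slide and the LookSmart category names, but every weight is invented, and picking the highest-scoring category at each level is an assumed decision rule.

```python
def score(weights, text):
    """Sum the weights of the words that occur in the text."""
    return sum(weights.get(word, 0.0) for word in text.lower().split())

def best_category(models, text):
    """Return the category whose weight vector scores the text highest."""
    return max(models, key=lambda name: score(models[name], text))

def classify_hierarchical(top_models, second_models, text):
    """Choose a top-level category, then a second-level one beneath it if available."""
    top = best_category(top_models, text)
    children = second_models.get(top)
    second = best_category(children, text) if children else None
    return top, second

# Hypothetical weights; the slides list the words but not their weights.
top_models = {
    "Automotive": {"motorcycle": 1.0, "harley": 1.0, "car": 1.0, "parts": 0.5},
    "Computers & Internet": {"software": 1.0, "windows": 1.0, "downloads": 0.5},
}
# One set of second-level models per top-level category (only one shown here).
second_models = {
    "Automotive": {
        "Motorcycles": {"motorcycle": 1.0, "harley": 1.0},
        "Maintenance & Repair": {"repair": 1.0, "parts": 1.0},
    },
}
print(classify_hierarchical(top_models, second_models, "harley motorcycle parts"))
print(classify_hierarchical(top_models, second_models, "windows software downloads"))
```

Only the second-level models under the chosen top-level category are evaluated, so the cost per result stays small even with many second-level categories.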

Page 9: Sackler – May 11, 2003 Organizing Search Results Susan Dumais Microsoft Research

User Interface Experiments

[Figure: List Organization and Category Organization interfaces]

Page 10: Sackler – May 11, 2003 Organizing Search Results Susan Dumais Microsoft Research

[Chart: results for the Group Interface and List Interface conditions; condition labels: Hover, Inline, No Cat Names, Browse, Hover, Inline + Cat Names; value axis from 60 to 120, units not recoverable]

Page 11: Sackler – May 11, 2003 Organizing Search Results Susan Dumais Microsoft Research

Effect of Query Difficulty

[Chart: Group vs. List for HARD and EASY queries; value axis from 0 to 140]

Easy queries are faster (p<0.01)

Group faster than List (p<0.01)

Benefit is larger for hard queries (p<0.06)

Page 12: Sackler – May 11, 2003 Organizing Search Results Susan Dumais Microsoft Research

SWISH: Summary and Design Implications

- Text Classification
  - Learn accurate category models
  - Classify new web pages on-the-fly
  - Organize search results
- User Interface
  - Tightly couple search results with category structure
  - User manipulation of presentation of category structure

Page 13: Sackler – May 11, 2003 Organizing Search Results Susan Dumais Microsoft Research

GridViz

- Collaborators
  - George Robertson, Edward Cutrell, Jeremy Goecks (Georgia Tech)
- Key Themes
  - Abstract beyond individual results
  - Highly interactive interface to support understanding of trends and relationships
- More about it
  - http://research.microsoft.com/~sdumais

Page 14: Sackler – May 11, 2003 Organizing Search Results Susan Dumais Microsoft Research

GridViz

- Summarize the results of a search
- Grid-based design
  - Axes represent topic, time, people
  - Cells encode frequency, recency
- Supports activities like:
  - What newsgroups are active (on topic x)?
  - What people are active, authoritative (on topic x)?
  - When did I last interact w/ people?
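A grid whose cells encode frequency and recency can be sketched as a dictionary keyed by (row, column). This is an assumption about the data structure, not a description of GridViz itself; the posts, field names, and `build_grid` are invented for illustration.

```python
from datetime import date

def build_grid(items, row_key, col_key):
    """Aggregate items into cells holding a count (frequency) and latest date (recency)."""
    cells = {}
    for item in items:
        key = (item[row_key], item[col_key])
        cell = cells.setdefault(key, {"count": 0, "latest": item["date"]})
        cell["count"] += 1
        cell["latest"] = max(cell["latest"], item["date"])
    return cells

# Invented search results: newsgroup posts with an author, a topic, and a date.
posts = [
    {"person": "alice", "topic": "search", "date": date(2003, 4, 1)},
    {"person": "alice", "topic": "search", "date": date(2003, 5, 2)},
    {"person": "bob", "topic": "search", "date": date(2003, 3, 15)},
    {"person": "bob", "topic": "ui", "date": date(2003, 5, 9)},
]
grid = build_grid(posts, row_key="person", col_key="topic")
for (person, topic), cell in sorted(grid.items()):
    print(person, topic, cell["count"], cell["latest"].isoformat())
```

Swapping `row_key` and `col_key` for other attributes, such as a time bucket, gives the other axis combinations the slide mentions.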

Page 15: Sackler – May 11, 2003 Organizing Search Results Susan Dumais Microsoft Research

GridViz Demo

Page 16: Sackler – May 11, 2003 Organizing Search Results Susan Dumais Microsoft Research

User Interface Experiments

[Chart: List View vs. GridViz; two value axes, each from 0 to 40; series labels: GridViz, List-view]

Page 17: Sackler – May 11, 2003 Organizing Search Results Susan Dumais Microsoft Research

GridViz Summary

- Abstracting beyond individual results
- Highly interactive interface
- Grid-based design
  - Axes represent people, topic, time
  - Cells encode frequency, recency
- Preliminary but promising

Page 18: Sackler – May 11, 2003 Organizing Search Results Susan Dumais Microsoft Research

Stuff I’ve Seen (SIS)

- Collaborators
  - Edward Cutrell, Raman Sarin, JJ Cadiz, Gavin Jancke, Daniel Robbins, Merrie Ringel (Stanford)
- Key Themes
  - Your content
  - Information re-use
  - Integration across sources
- More about it
  - … internal for now

Page 19: Sackler – May 11, 2003 Organizing Search Results Susan Dumais Microsoft Research

Search Today …

- Many locations, interfaces for finding things (e.g., web, mail, local files, help, history, intranet)
- Often slow

Page 20: Sackler – May 11, 2003 Organizing Search Results Susan Dumais Microsoft Research

Search with SIS

- Unified index of stuff you’ve seen
  - Unify access to information regardless of source: mail, archives, calendar, files, web pages, etc.
  - Full-text index of content plus metadata attributes (e.g., creation time, author, title, size)
  - Automatic and immediate update of index
  - Rich UI possibilities, since it’s your content
- Architecture
  - Client side indexing and storage
  - Built using MS Search components
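A unified index over several sources, with full text plus metadata, can be sketched as a single inverted index. SIS was built on MS Search components; the in-memory class below is only a toy under that caveat, and `UnifiedIndex`, its methods, and the sample items are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime

class UnifiedIndex:
    """Toy in-memory index: one inverted index shared by every source."""

    def __init__(self):
        self.items = {}                   # item id -> source, text, and metadata
        self.postings = defaultdict(set)  # word -> ids of items containing it

    def add(self, item_id, source, text, **metadata):
        """Index an item as soon as it is seen (the 'immediate update' idea)."""
        self.items[item_id] = {"source": source, "text": text, **metadata}
        for word in text.lower().split():
            self.postings[word].add(item_id)

    def search(self, query, sort_by="date"):
        """Return ids of items containing every query word, newest first by default."""
        words = query.lower().split()
        if not words:
            return []
        ids = set.intersection(*(self.postings.get(w, set()) for w in words))
        return sorted(ids, key=lambda i: self.items[i][sort_by], reverse=True)

index = UnifiedIndex()
index.add("m1", "mail", "Budget review meeting notes", date=datetime(2003, 5, 1), author="Pat")
index.add("w1", "web", "Budget travel tips", date=datetime(2003, 4, 20), author="unknown")
index.add("f1", "file", "Quarterly budget review spreadsheet", date=datetime(2003, 5, 8), author="Lee")
print(index.search("budget"))
print(index.search("budget review"))
```

Because every item carries its source and metadata alongside the text, one query spans mail, web pages, and files, and results can be ordered by any stored attribute. Re-adding an existing id would leave stale postings; a fuller version would remove the old entry first.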

Page 21: Sackler – May 11, 2003 Organizing Search Results Susan Dumais Microsoft Research

SIS Demo

Page 22: Sackler – May 11, 2003 Organizing Search Results Susan Dumais Microsoft Research

SIS Alpha Observations

- 800+ internal users
  - Usage logs (incl different interfaces), survey data
- File types opened
  - 76% Email
  - 14% Web pages
  - 10% Files
- Age of items accessed
  - 7% today
  - 22% within the last week
  - 46% within the last month

[Chart: Item Access Distribution; x-axis: Days Since Item First Seen (0 to 2500); y-axis: Frequency (0 to 120)]

Page 23: Sackler – May 11, 2003 Organizing Search Results Susan Dumais Microsoft Research

SIS Alpha Observations

- Use of other search tools
  - Non-SIS search for web, email, and files decreases
- Importance of people
  - 25% of the queries involve people’s names
- Importance of time
  - Date by far the most popular sort field, followed by rank, author, title
  - Even when rank is the default

[Chart: Pre-usage vs. Post-usage for Files, Email, Web Pages; value axis from 0 to 6]

Page 24: Sackler – May 11, 2003 Organizing Search Results Susan Dumais Microsoft Research

SIS UI Innovations: Timeline w/ Landmarks

- Importance of time
- Timeline interface
- Contextualize results using important landmarks as pointers into human memory
  - General: holidays, world events
  - Personal: important photos, appointments
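Contextualizing a result with landmarks can be sketched as a lookup of the landmarks on either side of the result's date. The landmark list and `nearest_landmarks` are invented for illustration; how SIS actually selects and displays landmarks is not stated in the slides.

```python
import bisect
from datetime import date

def nearest_landmarks(landmarks, when):
    """Return the landmarks just before and just after a date.

    landmarks must be a list of (date, label) pairs sorted by date.
    """
    dates = [d for d, _ in landmarks]
    i = bisect.bisect_right(dates, when)
    before = landmarks[i - 1] if i > 0 else None
    after = landmarks[i] if i < len(landmarks) else None
    return before, after

# Invented landmarks: two general (holidays) and one personal (appointment).
landmarks = [
    (date(2003, 1, 1), "New Year's Day"),
    (date(2003, 2, 14), "Valentine's Day"),
    (date(2003, 3, 10), "Dentist appointment"),
]
before, after = nearest_landmarks(landmarks, date(2003, 2, 20))
print("after", before[1], "and before", after[1])
```

A result dated 20 February 2003 is thus placed between two memorable events rather than shown with a bare date.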

Page 25: Sackler – May 11, 2003 Organizing Search Results Susan Dumais Microsoft Research

Milestones in Time Demo

Page 26: Sackler – May 11, 2003 Organizing Search Results Susan Dumais Microsoft Research

Milestones in Timeline

[Chart: Search Time (s) for Landmarks + Dates vs. Dates Only; value axis from 0 to 30]

Page 27: Sackler – May 11, 2003 Organizing Search Results Susan Dumais Microsoft Research

SIS Summary

- Unified index of stuff you’ve seen
  - Fast access to full-text and metadata, from heterogeneous sources
  - Automatic and immediate update of index
  - Rich UI possibilities
- Next steps
  - Better support for tagging -> “flatland”
  - Implicit queries for finding related info, and identifying “Stuff I Should See”
  - Integration with richer activity-based info, Eve

Page 28: Sackler – May 11, 2003 Organizing Search Results Susan Dumais Microsoft Research

Organizing Search Results

- Algorithms and interfaces to improve search
  - Use structure and context
- Examples and key themes
  - SWISH … grouping
  - GridViz … abstraction
  - SIS … personal content and landmarks
- Also
  - Important attributes: People, topics, time
  - Interaction
  - Evaluation
- More information
  - http://research.microsoft.com/~sdumais
  - [email protected]
- Christopher Lee of (SIG)IR …
  - http://www.cdvp.dcu.ie/SIGIR/index.html