DESCRIPTION
This presentation outlines the principles of information seeking as a dialogue and walks through concrete examples that illustrate the principles of human-computer information retrieval (HCIR). The foundation is an interactive set retrieval approach that responds to queries with an overview of the user's current context and an organized set of options for incremental exploration. Contextual summaries of document sets optimize the system's communication with the user, while query refinement options optimize the user's communication with the system. By enabling bidirectional communication between the user and the system, we can address the inherent limitations of best-match approaches.
© 2008 Endeca Technologies, Inc. All rights reserved.
Set Retrieval 2.0
Daniel Tunkelang
Chief Scientist, Endeca
howdy!
• 1988 – 1992
• 1993 – 1998
• 1999 -
overview
what’s right with search today?
what’s wrong with search today?
how do we fix it?
let’s quickly review some history…
1947: Hans Peter Luhn
1968: Gerard Salton
1972: Karen Spärck Jones
1980s: lots of progress
1990s – 2000s: WWW
today
so, do we all feel lucky?
recession? what recession?
ask the users…
…though they do have complaints
78% wish search engines could read their minds
what frustrates users most?
– 25%: deluge of results
– 24%: too many paid listings
– 19%: inability to understand their keywords
– 19%: disorganized / random results
The State of Search
Autobytel & Kelton Research, Oct ’07
web search vs. enterprise search
“Search on the internet is solved. I always find what I need.
But why not in the enterprise?
Seems like a solution waiting to happen.”
- a Fortune 500 CTO
enterprise users really have complaints
Why is Joe the Knowledge Worker so upset?
– 49%: finding the information needed to do their job is difficult and time consuming
– 50%: findability within organization worse than on their own consumer-facing site
Market IQ Report on Findability
AIIM, June ’08
selection bias?
the library and information science critique
• models
– relevance is subjective
• evaluation
– neglects interactivity
• tools
– no support for exploration
the rebuttal
"Tell us what to do, and we will do it."
besides, search is 90% solved
we need to call a truce
- real, effective systems
- that support interaction
- cost-effective to evaluate
let’s go back to the 80s for a moment
then vs. now
• known-item search was an open problem
– now it’s a commodity
• library and information science ideas of the 80s
– ahead of their time
• now we can find known items
– let’s tackle more ambitious information needs
requirements
transparency
control
guidance
precision = fraction of retrieved documents that are relevant
recall = fraction of relevant documents that are retrieved
[diagram labels: retrieved documents, relevant documents]
set retrieval
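As a small worked example of the two definitions on this slide, the Python sketch below computes precision and recall for a single retrieved set. The document IDs and the function name are made up for illustration and do not come from the presentation.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a retrieved set against a relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were actually retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document IDs, for illustration only.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d2", "d4", "d5", "d6", "d7"}
precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)
```

Here two of the four retrieved documents are relevant, and two of the five relevant documents were retrieved. Retrieving more documents can only keep recall the same or raise it, while precision may fall.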
the classic trade-off
[plot axes: recall, precision]
set retrieval: 2 out of 3
set retrieval 2.0 = set retrieval + guidance
Did you mean: guidance
Related Searches
Guidance Counselor Salary
Guidance Counselor Job Description
Definition of Guidance
Guidance Counseling
History of Guidance Counseling
Child Guidance
Career Guidance
What Is the Meaning of Guidance
Free Marriage Counseling
Problems in Marriage
Career Exploration
Role of School Counselor
guidance vs. mind reading
• system can’t read your mind
• spouse / best friend can’t read your mind
• sometimes you can’t read your own mind
so where does guidance come from?
it’s people!
human-computer information retrieval
• don’t just guess the user’s intent
– optimize communication
• de-emphasize the top ten documents
– response is a set of documents
• think beyond single queries
– support refinement and exploration
hcir cheats the trade-off
[plot axes: recall, precision]
but how do we get there?
set retrieval 2.0
• set retrieval that responds to queries with
– overview of the user's current context
– organized set of options for exploration
• contextual summaries of document sets
– optimize system’s communication with user
• query refinement options
– optimize user’s communication with system
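One way to picture this kind of response is the minimal Python sketch below. It is not Endeca's implementation: the catalog, the facet names, and the `respond` function are all hypothetical. Each response carries the current selections (the context), the matching set with its size, and value counts for the facets that can still be refined.

```python
from collections import Counter

# Hypothetical catalog: each document maps facet names to values.
DOCS = [
    {"id": 1, "type": "microwave", "brand": "Acme", "color": "white"},
    {"id": 2, "type": "microwave", "brand": "Acme", "color": "black"},
    {"id": 3, "type": "microwave", "brand": "Bolt", "color": "black"},
    {"id": 4, "type": "ceiling fan", "brand": "Acme", "color": "white"},
]
FACETS = ("type", "brand", "color")


def respond(docs, selections):
    """Return the matching set plus value counts for each facet not yet selected."""
    matches = [d for d in docs
               if all(d.get(f) == v for f, v in selections.items())]
    options = {
        f: Counter(d[f] for d in matches if f in d)
        for f in FACETS if f not in selections
    }
    return {"context": dict(selections), "count": len(matches),
            "documents": matches, "options": options}


# First query: the summary tells the user what the set looks like.
first = respond(DOCS, {"type": "microwave"})
# The user picks one of the offered options to narrow the set.
second = respond(DOCS, {**first["context"], "color": "black"})
print(first["count"], dict(first["options"]["color"]))
print(second["count"], [d["id"] for d in second["documents"]])
```

The counts serve as the system's summary to the user, and choosing one of the offered values is the user's refinement back to the system, so every step of the dialogue goes through the same function.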
faceted search guides refinement
showing the right facets: microwaves
showing the right facets: ceiling fans
query-driven clarification before refinement
Matching Categories include:
Appliances > Small Appliances > Irons & Steamers
Appliances > Small Appliances > Microwaves & Steamers
Bath > Sauna & Spas > Steamers
Kitchen > Bakeware & Cookware > Cookware > Open Stock Pots > Double Boilers & Steamers
Kitchen > Small Appliances > Steamers
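The slide does not say how the matching categories are found, so the following Python sketch is only one plausible reading. It assumes the query was something like "steamers" and that a category matches when its leaf name contains the query as a whole word. The function name and the extra "Toasters" category are hypothetical.

```python
# Category paths taken from the slide; the matching rule is an assumption.
CATEGORIES = [
    "Appliances > Small Appliances > Irons & Steamers",
    "Appliances > Small Appliances > Microwaves & Steamers",
    "Bath > Sauna & Spas > Steamers",
    "Kitchen > Bakeware & Cookware > Cookware > Open Stock Pots > Double Boilers & Steamers",
    "Kitchen > Small Appliances > Steamers",
    "Kitchen > Small Appliances > Toasters",  # hypothetical non-matching category
]


def matching_categories(query, categories):
    """Return category paths whose leaf name contains the query as a whole word."""
    term = query.strip().lower()
    matches = []
    for path in categories:
        leaf = path.split(">")[-1]  # last segment of the category path
        words = leaf.replace("&", " ").lower().split()
        if term in words:
            matches.append(path)
    return matches


options = matching_categories("steamers", CATEGORIES)
for path in options:
    print(path)
```

Offering these paths before showing any results lets the user resolve the ambiguity (garment steamer, food steamer, or spa steamer) with one choice instead of scanning a mixed result list.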
results-driven clarification before refinement
Search: storage
taxonomies are so 1990s
dynamic topic facet
Subject
Electronic data processing (1002)
Distributed processing (937)
Parallel processing (619)
Computer networks (562)
Fault-tolerant-computing (365)
Show more…
facets populated using entity extraction
apple production
bootstrap on folksonomies
or learn from users
hcir using set retrieval 2.0
emphasize set summaries over ranked lists
establish a dialog between the user and the data
enable exploration and discovery
think outside the (search) box
• best-first search works for many use cases
• but not for some of the most valuable ones
• set retrieval 2.0 = set retrieval + guidance
• human-computer information retrieval
thank you
communication 1.0
email: [email protected]
communication 2.0
blog: http://thenoisychannel.com
twitter: http://twitter.com/dtunkelang