IR evaluation: Putting the user back in the loop
Evangelos Kanoulas
[email protected]
Change the search algorithm.
How can we know whether we made the users happier?
Different approaches to evaluation
• User studies
• In-situ evaluation
  – A/B testing
  – Interleaving
• Collection-based evaluation
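Interleaving, listed above as an in-situ method, merges the rankings of two systems into a single result list and credits each click to the system that contributed the clicked document. A minimal sketch of one common variant, team-draft interleaving (the function name and coin-flip scheme are illustrative, not taken from the slides):

```python
import random

def team_draft_interleave(ranking_a, ranking_b, seed=0):
    """Merge two rankings into one interleaved list (team-draft style).

    Each round a coin flip decides which 'team' picks first; each team
    then contributes its highest-ranked document not yet shown. Clicks
    on the interleaved list are later credited to the contributing team.
    """
    rng = random.Random(seed)
    interleaved, team = [], {}
    a, b = list(ranking_a), list(ranking_b)
    while a or b:
        order = (("A", a), ("B", b)) if rng.random() < 0.5 else (("B", b), ("A", a))
        for name, ranking in order:
            # skip documents the other team already contributed
            while ranking and ranking[0] in team:
                ranking.pop(0)
            if ranking:
                doc = ranking.pop(0)
                team[doc] = name
                interleaved.append(doc)
    return interleaved, team
```

At evaluation time, the system whose team collected more clicks wins the comparison for that query.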
In-situ Evaluation
A/B Testing
Baseline (control) Experimental (treatment)
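In an A/B test, each user is assigned to either the baseline (control) or the experimental (treatment) ranker, and an online metric is compared between the two groups. A minimal sketch, assuming hash-based bucketing and click-through rate as the metric — both common choices, not specified in the slides:

```python
import hashlib

def assign_bucket(user_id, experiment="ranker-v2", treatment_share=0.5):
    """Deterministically assign a user to control or treatment.

    Hashing (user, experiment) gives a stable, roughly uniform split,
    so the same user always sees the same variant during the test.
    """
    h = hashlib.sha256(f"{user_id}:{experiment}".encode()).hexdigest()
    return "treatment" if int(h, 16) % 10000 < treatment_share * 10000 else "control"

def ctr(clicks, impressions):
    """Click-through rate: one simple online success metric."""
    return clicks / impressions if impressions else 0.0
```

Bucketing on a hash rather than on random draws per request keeps a user's experience consistent across the experiment.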
Collection-based Evaluation
Machine Learning
• Feature vectors
• Labels
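In the machine-learning framing, each query-document pair becomes a feature vector and its relevance judgment becomes a label. A toy sketch of a pointwise linear scorer trained on such data (the features and the plain-SGD learning rule are illustrative, not from the slides):

```python
def train_linear_scorer(X, y, lr=0.1, epochs=200):
    """Fit weights w so that w . x approximates the relevance label
    (a pointwise learning-to-rank objective, trained with plain SGD)."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for x, target in zip(X, y):
            pred = sum(wi * xi for wi, xi in zip(w, x))
            err = pred - target
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w

# Toy feature vectors for query-document pairs: [retrieval score, term overlap]
# (illustrative features); labels: 1 = relevant, 0 = not relevant.
X = [[0.9, 1.0], [0.2, 0.0], [0.7, 1.0], [0.1, 0.0]]
y = [1, 0, 1, 0]
w = train_linear_scorer(X, y)
```

At serving time, documents are ranked by the learned score w . x.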
Cranfield Collections
Information Retrieval
• Documents
• Queries
• Labels – relevance judgments
Query 1 Query 2 Query N
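With a Cranfield-style collection, system quality is computed offline from the relevance judgments (qrels) per query. A minimal sketch using precision@k as the measure (the metric choice and data layout are illustrative):

```python
def precision_at_k(ranking, qrels, k=10):
    """Fraction of the top-k retrieved documents judged relevant.

    `qrels` maps document id -> graded judgment; any grade > 0
    counts as relevant for this binary measure.
    """
    return sum(1 for doc in ranking[:k] if qrels.get(doc, 0) > 0) / k

# Judgments for one query (illustrative ids and grades).
qrels = {"d1": 1, "d3": 2, "d7": 0}
```

Averaging the per-query scores over the whole query set gives the collection-level comparison between systems.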
Cranfield Paradigm
• Simple user model
• Controlled experiments
• Reusable but static test collections
Online Evaluation
• Full user participation
• Many degrees of freedom
• Unrepeatable experiments
Evaluation Landscape
System Focus ←→ User Focus
TREC Tasks TREC Session
TREC Total Recall
TREC OpenSearch
TREC Total Recall
results
human assessor
search algorithm
query
document collection
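The Total Recall setup is a feedback loop: the search algorithm ranks the document collection for the query, a human assessor judges the top results, and the judgments feed back into the next ranking. A minimal sketch of such a continuous loop, with the ranker and the assessor abstracted as callables (all names are illustrative):

```python
def continuous_active_learning(collection, score, judge, budget):
    """Sketch of a Total Recall-style loop: repeatedly surface the
    highest-scoring unjudged document, ask the human assessor to
    judge it, and fold the judgment back in before the next round.

    score(doc, judged_so_far) -> float : ranker, may use judgments
    judge(doc) -> bool                 : the human assessor
    budget                             : cap on assessor effort
    """
    judged = {}
    for _ in range(budget):
        unjudged = [d for d in collection if d not in judged]
        if not unjudged:
            break
        best = max(unjudged, key=lambda d: score(d, judged))
        judged[best] = judge(best)
    return judged
```

The goal of the track is high recall: keep looping until (nearly) all relevant documents have passed in front of the assessor.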
TREC Session Track
TREC Session Track [2010–2014]
1. improve search by using session information
2. improve search over an entire user’s session instead of a single query
"Paris Luxurious Hotels"  vs.  "Paris Hilton"
Test Collection
Evaluating Retrieval over Sessions: The TREC Session Track 2011–2014
Ben Carterette1, Paul Clough2, Mark Hall3, Evangelos Kanoulas4, Mark Sanderson5
1 University of Delaware, 2 University of Sheffield, 3 Edge Hill University, 4 University of Amsterdam, 5 RMIT University
Objectives
• Test if the retrieval effectiveness of a query could be improved by using previous queries, ranked results, and user interactions.
Test Collection
Four test collections (2011–2014) comprising N sessions of varying length, each consisting of:
• m_i blocks of user interactions (the session's length);
• the current query q_{m_i} in the session;
• the m_i − 1 blocks of interactions in the session prior to the current query, composed of:
  – the user queries in the session, q_1, q_2, ..., q_{m_i − 1};
  – the ranked list of URLs seen by the user for each of those queries;
  – the set of clicked URLs/snippets.
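The session structure described above maps directly onto a small data model. A minimal sketch of the layout (class and field names are illustrative, not the track's distribution format):

```python
from dataclasses import dataclass, field

@dataclass
class Interaction:
    """One prior block in the session: a query q_1 ... q_{m_i - 1},
    the ranked URLs the user saw, and the URLs they clicked."""
    query: str
    ranked_urls: list
    clicked_urls: list = field(default_factory=list)

@dataclass
class Session:
    """A session: its history plus the current query q_{m_i}."""
    current_query: str
    history: list = field(default_factory=list)  # list of Interaction

    @property
    def length(self):
        # m_i blocks in total: the history plus the current query
        return len(self.history) + 1
```

A participant's system receives the `history` and must produce a ranking for `current_query`.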
Test Collection Statistics
                            2011           2012           2013             2014
collection                  ClueWeb09      ClueWeb09      ClueWeb12        ClueWeb12
topic properties
  topic set size            62             48             61               60
  topic cat. dist.          known-item     10 exploratory 10 exploratory   15 exploratory
                                           6 interpretive 9 interpretive   15 interpretive
                                           20 known-item  32 known-item    15 known-item
                                           12 known-subj  10 known-subj    15 known-subj
session properties
  user population           U. Sheffield   U. Sheffield   U. Sheffield +   MTurk
                                                          IR researchers
  search engine             BOSS + CW09    BOSS + CW09    indri            indri
                            filter         filter
  total sessions            76             98             133              1,257
  sessions per topic        1.2            2.0            2.2              21.0
  mean length (in queries)  3.7            3.0            3.7              3.7
  median time btw. queries  68.5s          66.7s          72.2s            25.6s
relevance judgments
  topics judged             62             48             49               51
  total judgments           19,413         17,861         13,132           16,949
Algorithmic Improvements
• Session history can be used to improve effectiveness over basic ad hoc retrieval.
[Figure: max change in nDCG@10 from the RL1 baseline, plotted per run number, one series per track year 2011–2014]
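The measure behind these comparisons, nDCG@10, discounts each document's relevance gain by its rank and normalises by the ideal (relevance-sorted) ranking. A minimal sketch:

```python
import math

def ndcg_at_k(gains, ideal_gains, k=10):
    """nDCG@k: discounted cumulative gain of the ranking, divided by
    the DCG of the ideal ordering of the judged gains.

    `gains` are the graded judgments in retrieved order; `ideal_gains`
    are all judged gains for the query (any order).
    """
    def dcg(gs):
        return sum(g / math.log2(i + 2) for i, g in enumerate(gs[:k]))
    ideal = dcg(sorted(ideal_gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0
```

A perfect ranking scores 1.0; placing relevant documents lower in the list erodes the score logarithmically.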
Topic - System Analysis
• Known-subject and exploratory topics benefit most from access to session history.
• There is substantial variability across topics, due to the way users perform their search and formulate their queries.
[Figure: per-topic boxplots of the difference in ∆nDCG@10 over sessions, topics ordered by median; topic labels such as 2012-10, 2012-47, 2014-40, ...]
Conclusions
• Retrieval effectiveness can be improved for ad hoc retrieval using data based on session history.
• The more detailed the session data, the greater the improvement.
SIGIR 2016
TREC Tasks Track
TREC Tasks Track [2015–now]
1. understand the user's underlying task
2. assist user in completing the task
Make Improvements At Home
TASK UNDERSTANDING

Make Improvements At Home
TASK COMPLETION
CLEF Dynamic Search for Complex Tasks
CLEF Complex Tasks [now]
1. Produce methodology and algorithms that will lead to a dynamic test collection by simulating users
2. Understand and quantify what constitutes a good ranking of documents at different stages of a session, and a good overall session
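Simulating users, as the first objective describes, amounts to closing the search loop programmatically: issue a query, inspect the results, then decide whether to stop or reformulate. A minimal sketch with the search system, reformulation policy, and stopping criterion abstracted as callables (all names are illustrative):

```python
def simulate_session(search, reformulate, is_satisfied, first_query, max_queries=5):
    """Drive a search system with a simulated user.

    search(q) -> ranked list        the system under evaluation
    reformulate(q, results) -> str  the simulated user's next query
    is_satisfied(results) -> bool   the simulated stopping criterion
    Returns the session log: a list of (query, results) pairs.
    """
    query, log = first_query, []
    for _ in range(max_queries):
        results = search(query)
        log.append((query, results))
        if is_satisfied(results):
            break
        query = reformulate(query, results)
    return log
```

Because the simulated user reacts to whatever the system returns, such a collection is dynamic: different systems induce different sessions, and the log of each session is what gets scored.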
TREC OpenSearch