
MediaEval 2016 - IR Evaluation: Putting the User Back in the Loop



Page 1: MediaEval 2016 - IR Evaluation: Putting the User Back in the Loop

IR evaluation: Putting the user back in the loop

Evangelos Kanoulas, [email protected]

Page 2: MediaEval 2016 - IR Evaluation: Putting the User Back in the Loop

Change the search algorithm.

How can we know whether we made the users happier?

Page 3: MediaEval 2016 - IR Evaluation: Putting the User Back in the Loop

Different approaches to evaluation

• User studies

• In-situ evaluation
  • A/B testing
  • Interleaving (sketched below)

• Collection-based evaluation
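Interleaving compares two rankers on live traffic by merging their result lists and crediting whichever ranker contributed the documents the user clicks. A minimal sketch of team-draft interleaving, under the assumption that rankings are lists of document ids; the function and variable names are illustrative, not from the talk:

```python
import random

def team_draft_interleave(ranking_a, ranking_b):
    """Merge two rankings, remembering which ranker contributed each slot."""
    union = set(ranking_a) | set(ranking_b)
    interleaved, owners, seen = [], [], set()
    while len(seen) < len(union):
        # Each round, both rankers pick their best unseen document, in random order.
        for owner, ranking in random.sample([("A", ranking_a), ("B", ranking_b)], 2):
            pick = next((d for d in ranking if d not in seen), None)
            if pick is not None:
                interleaved.append(pick)
                owners.append(owner)
                seen.add(pick)
    return interleaved, owners

def interleaving_credit(owners, clicked_positions):
    """Credit each click to the ranker that owns the clicked slot."""
    credit = {"A": 0, "B": 0}
    for pos in clicked_positions:
        credit[owners[pos]] += 1
    return credit
```

Aggregated over many queries, the ranker with more credited clicks is preferred; this is one concrete way to run the in-situ comparison the list refers to.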

Page 4: MediaEval 2016 - IR Evaluation: Putting the User Back in the Loop

In-situ evaluation

Page 5: MediaEval 2016 - IR Evaluation: Putting the User Back in the Loop

A/B Testing

Baseline (control) vs. Experimental (treatment)
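In A/B testing, each user is deterministically assigned to either the control (baseline) or treatment (experimental) system, and an online metric such as click-through rate is compared between the two buckets. A minimal sketch of the bucketing step; the experiment name and hashing scheme are illustrative assumptions:

```python
import hashlib

def assign_bucket(user_id: str, experiment: str, treatment_fraction: float = 0.5) -> str:
    """Deterministically hash (experiment, user) into the control or treatment bucket."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    slot = int(digest, 16) % 10_000 / 10_000     # pseudo-uniform value in [0, 1)
    return "treatment" if slot < treatment_fraction else "control"

# The same user always lands in the same bucket for a given experiment,
# so control and treatment can be compared on an online metric such as CTR.
print(assign_bucket("user-42", "new-ranker-2016"))
```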

Page 6: MediaEval 2016 - IR Evaluation: Putting the User Back in the Loop
Page 7: MediaEval 2016 - IR Evaluation: Putting the User Back in the Loop
Page 8: MediaEval 2016 - IR Evaluation: Putting the User Back in the Loop

Collection-based evaluation

Page 9: MediaEval 2016 - IR Evaluation: Putting the User Back in the Loop

Machine Learning

• Feature vectors

• Labels

Cranfield Collections

Information Retrieval

• Documents
• Queries
• Labels (relevance judgments)

Query 1   Query 2   ...   Query N
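In a Cranfield-style collection the "labels" are relevance judgments (qrels): for each query, a set of judged documents with a relevance grade. A minimal sketch of how such judgments are loaded and used to score a ranked list, assuming the common TREC qrels format `topic 0 docid grade`; precision@k is used here only as the simplest example metric:

```python
from collections import defaultdict

def load_qrels(path):
    """Read TREC-style qrels lines of the form: 'topic 0 docid grade'."""
    qrels = defaultdict(dict)
    with open(path) as handle:
        for line in handle:
            topic, _, docid, grade = line.split()
            qrels[topic][docid] = int(grade)
    return qrels

def precision_at_k(ranked_docids, judgments, k=10):
    """Fraction of the top-k documents that were judged relevant (grade > 0)."""
    top_k = ranked_docids[:k]
    return sum(1 for d in top_k if judgments.get(d, 0) > 0) / k
```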

Page 10: MediaEval 2016 - IR Evaluation: Putting the User Back in the Loop
Page 11: MediaEval 2016 - IR Evaluation: Putting the User Back in the Loop

Evaluation Landscape

Cranfield Paradigm
• Simple user model
• Controlled experiments
• Reusable but static test collections

Online Evaluation
• Full user participation
• Many degrees of freedom
• Unrepeatable experiments

System Focus ↔ User Focus

TREC Tasks · TREC Session · TREC Total Recall · TREC OpenSearch

Page 12: MediaEval 2016 - IR Evaluation: Putting the User Back in the Loop

TREC Total Recall

[Diagram: a query is issued over a document collection; the search algorithm returns results to a human assessor, whose relevance judgments are fed back to the search algorithm.]
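The Total Recall setup is a feedback loop: the system repeatedly proposes the documents it currently believes are most likely relevant, the human assessor judges them, and the judgments inform the next batch. A rough sketch of that loop; the scoring function, stopping rule, and batch size are placeholders, not the track's actual baseline:

```python
def total_recall_loop(query, collection, assess, score, should_stop, batch_size=10):
    """Retrieve, judge, retrain: keep the human assessor in the loop until stopping."""
    judged, relevant = {}, set()
    while not should_stop(judged):
        candidates = [doc for doc in collection if doc["id"] not in judged]
        if not candidates:
            break
        # Rank the unjudged documents given everything judged so far, take a batch.
        batch = sorted(candidates, key=lambda d: score(query, d, judged), reverse=True)
        for doc in batch[:batch_size]:
            label = assess(doc)                 # human-in-the-loop relevance judgment
            judged[doc["id"]] = label
            if label:
                relevant.add(doc["id"])
    return relevant
```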

Page 13: MediaEval 2016 - IR Evaluation: Putting the User Back in the Loop

TREC Session Track

Page 14: MediaEval 2016 - IR Evaluation: Putting the User Back in the Loop

TREC Session Track [2010–2014]

1. improve search by using session information

2. improve search over an entire user's session instead of a single query

Page 15: MediaEval 2016 - IR Evaluation: Putting the User Back in the Loop

Paris Luxurious Hotels  →  Paris Hilton
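The query "Paris Hilton" is ambiguous on its own, but following "Paris Luxurious Hotels" it almost certainly refers to the hotel. One simple way to exploit the session is to mix terms from earlier queries, with decayed weights, into the current query. A toy sketch; the decay factor is an arbitrary illustrative choice, not a method from the track:

```python
from collections import Counter

def expand_with_history(current_query, previous_queries, decay=0.5):
    """Weight current-query terms at 1.0 and older-query terms by age-decayed weights."""
    weights = Counter({term: 1.0 for term in current_query.lower().split()})
    for age, query in enumerate(reversed(previous_queries), start=1):
        for term in query.lower().split():
            weights[term] = max(weights[term], decay ** age)
    return weights

# expand_with_history("paris hilton", ["paris luxurious hotels"])
# -> {'paris': 1.0, 'hilton': 1.0, 'luxurious': 0.5, 'hotels': 0.5}
```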

Page 16: MediaEval 2016 - IR Evaluation: Putting the User Back in the Loop

Test Collection

Evaluating Retrieval over Sessions: The TREC Session Track 2011–2014
Ben Carterette¹, Paul Clough², Mark Hall³, Evangelos Kanoulas⁴, Mark Sanderson⁵

¹ University of Delaware, ² University of Sheffield, ³ Edge Hill University, ⁴ University of Amsterdam, ⁵ RMIT University

Objectives

• Test whether the retrieval effectiveness of a query can be improved by using previous queries, ranked results, and user interactions.

Test Collection

Four test collections (2011–2014) comprising N sessions of varying length, each consisting of:
• m_i blocks of user interactions (the session's length);
• the current query q_{m_i} in the session;
• m_i − 1 blocks of interactions in the session prior to the current query, composed of:
  – the user queries in the session, q_1, q_2, ..., q_{m_i − 1};
  – the ranked list of URLs seen by the user for each of those queries;
  – the set of clicked URLs/snippets.
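Concretely, each session can be thought of as the current query plus the interaction history that preceded it. A hypothetical rendering of one session as a Python structure; the field names are mine, not the track's actual distribution format:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Interaction:
    query: str                        # one of q_1 ... q_{m_i - 1}
    ranked_urls: List[str]            # results the user saw for that query
    clicked_urls: List[str] = field(default_factory=list)

@dataclass
class Session:
    session_id: int
    current_query: str                # q_{m_i}, the query systems must answer
    history: List[Interaction]        # the m_i - 1 earlier interaction blocks
```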

Test Collection Statistics

                               2011            2012             2013               2014
collection                     ClueWeb09       ClueWeb09        ClueWeb12          ClueWeb12

topic properties
  topic set size               62              48               61                 60
  topic category distribution  known-item      10 exploratory   10 exploratory     15 exploratory
                                               6 interpretive   9 interpretive     15 interpretive
                                               20 known-item    32 known-item      15 known-item
                                               12 known-subj    10 known-subj      15 known-subj

session properties
  user population              U. Sheffield    U. Sheffield     U. Sheffield +     MTurk
                                                                IR researchers
  search engine                BOSS + CW09     BOSS + CW09      indri              indri
                               filter          filter
  total sessions               76              98               133                1,257
  sessions per topic           1.2             2.0              2.2                21.0
  mean length (in queries)     3.7             3.0              3.7                3.7
  median time between queries  68.5s           66.7s            72.2s              25.6s

relevance judgments
  topics judged                62              48               49                 51
  total judgments              19,413          17,861           13,132             16,949

Algorithmic Improvements

• Session history can be used to improve effectiveness over basic ad hoc retrieval.

[Figure: maximum change in nDCG@10 from the RL1 baseline per submitted run (run number on the x-axis), for the 2011, 2012, 2013, and 2014 tracks.]
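The figure's metric, nDCG@10, discounts the graded relevance of each of the top ten results by its rank and normalises by the best possible ordering; runs are then compared by their change over the current-query-only baseline (RL1). A compact sketch of the computation, using the common log-discount form (the track's exact gain and discount settings may differ):

```python
import math

def dcg(grades):
    """Discounted cumulative gain: each grade divided by log2 of its 1-based rank + 1."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(grades))

def ndcg_at_k(ranked_docids, judgments, k=10):
    """nDCG@k: DCG of the run's top k, normalised by the ideal DCG over the judgments."""
    grades = [judgments.get(d, 0) for d in ranked_docids[:k]]
    ideal = sorted(judgments.values(), reverse=True)[:k]
    return dcg(grades) / dcg(ideal) if dcg(ideal) > 0 else 0.0

# Change over the current-query-only baseline (RL1), as plotted in the figure:
# delta = ndcg_at_k(session_aware_run, judgments) - ndcg_at_k(rl1_run, judgments)
```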

Topic - System Analysis

• Known-subject and exploratory topics benefit most from access to session history.
• There is substantial variability across topics due to the way the users perform their search and formulate their queries.

[Figure: per-topic difference in ΔnDCG@10 over sessions, with topics (labelled by year and topic number, e.g. 2012-10, 2014-47) ordered by their median.]

Conclusions

• Retrieval effectiveness can be improved for ad hoc retrieval using data based on session history.

• The more detailed the session data, the greater the improvement.

SIGIR 2016

Page 17: MediaEval 2016 - IR Evaluation: Putting the User Back in the Loop

TREC Session Track [2010–2014]

1. improve search by using session information

2. improve search over an entire user's session instead of a single query

Page 18: MediaEval 2016 - IR Evaluation: Putting the User Back in the Loop
Page 19: MediaEval 2016 - IR Evaluation: Putting the User Back in the Loop

TREC Tasks Track

Page 20: MediaEval 2016 - IR Evaluation: Putting the User Back in the Loop

TREC Tasks Track [2015–now]

1. understand the user's underlying task

2. assist the user in completing the task

Page 21: MediaEval 2016 - IR Evaluation: Putting the User Back in the Loop

Make Improvements At Home

TASK UNDERSTANDING

Page 22: MediaEval 2016 - IR Evaluation: Putting the User Back in the Loop

Make Improvements At Home

TASK COMPLETION

Page 23: MediaEval 2016 - IR Evaluation: Putting the User Back in the Loop

TREC Session Track [2010–2014]

1. improve search by using session information

2. improve search over an entire user's session instead of a single query

Page 24: MediaEval 2016 - IR Evaluation: Putting the User Back in the Loop

CLEF Dynamic Search for Complex Tasks

Page 25: MediaEval 2016 - IR Evaluation: Putting the User Back in the Loop

CLEF Complex Tasks [now]

1. Produce methodology and algorithms that will lead to a dynamic test collection by simulating users (a toy user simulator is sketched below)

2. Understand and quantify what constitutes a good ranking of documents at different stages of a session, and a good overall session
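One way to make a test collection "dynamic" is to replace the logged user with a simulated one that scans a ranking, clicks documents with a probability depending on their relevance grade, and decides whether to continue or reformulate. A deliberately naive sketch of such a simulator; all probabilities are illustrative and not the lab's actual user model:

```python
import random

# Illustrative click probabilities per relevance grade (not the lab's model).
CLICK_PROB = {0: 0.05, 1: 0.5, 2: 0.9}

def simulate_user(ranking, judgments, patience=0.8):
    """Scan results top-down; click depending on grade; stop with probability 1 - patience."""
    clicks = []
    for docid in ranking:
        grade = judgments.get(docid, 0)
        if random.random() < CLICK_PROB.get(grade, 0.05):
            clicks.append(docid)
        if random.random() > patience:        # simulated user abandons or reformulates
            break
    return clicks
```

Feeding the simulated clicks back to the system at each step is what would let the collection react to a system's choices while the experiment stays repeatable.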

Page 26: MediaEval 2016 - IR Evaluation: Putting the User Back in the Loop

TREC Open Search

Page 27: MediaEval 2016 - IR Evaluation: Putting the User Back in the Loop