Evaluation of Relevance Feedback
Algorithms for XML Retrieval
Silvana Solomon
27 February 2007
Supervisor:
Dr. Ralf Schenkel
Silvana Solomon Evaluation of RF Algorithms for XML Retrieval
27 Feb 2007
Outline
Short introduction
Motivation & Goals
Evaluating retrieval effectiveness
INEX tool
Evaluation methodology
Results
Introduction
Path to the result / Content of the result:

[Figure: XML document tree. An article element contains frontmatter, body, and backmatter. The frontmatter holds the author („Ian Ruthven“), the body holds sec and subsec elements with paragraphs („The IR process is composed…“, „For small collections…“, „Figure 1 outlines…“), and the backmatter holds a citation („D. Harman“). The path from the article root down to a matching element is the path to the result; the element's text is the content of the result.]

[Figure: Relevance feedback loop between the feedback component and the XML search engine: (1) query → (2) results → (3) feedback → (4) expanded query → (5) results of the expanded query.]
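The loop in the figure can be sketched in a few lines. The search engine and the query-expansion step below are toy stand-ins (hypothetical, not the system evaluated in the talk); only the five-step protocol matches the figure:

```python
# Minimal sketch of the relevance feedback loop from the figure.
# The engine and the expansion step are toy stand-ins, not the
# actual XML search engine used in the experiments.

def search(query, collection):
    """(1)-(2) Return element paths whose text contains any query term."""
    return [path for path, text in collection.items()
            if any(term in text for term in query)]

def expand_query(query, relevant_texts):
    """(3)-(4) Add terms from user-marked relevant results to the query."""
    expanded = set(query)
    for text in relevant_texts:
        expanded.update(text.split())
    return expanded

collection = {
    "doc[1]/bdy[1]/sec[1]": "xml retrieval process",
    "doc[2]/bdy[1]/sec[2]": "relevance feedback for retrieval",
    "doc[3]/bdy[1]": "wikipedia collection statistics",
}

query = {"xml"}
results = search(query, collection)           # (2) results
feedback = [collection[results[0]]]           # (3) user marks a result relevant
expanded = expand_query(query, feedback)      # (4) expanded query
new_results = search(expanded, collection)    # (5) results of expanded query
print(sorted(new_results))
```

The expanded query also matches doc[2], which the original one-term query missed; that gain on unseen elements is exactly what the evaluation methods below try to isolate.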
Motivation
Best way to compare feedback algorithms?
Cannot use standard evaluation tools on feedback results
Goals:
Analyze evaluation methods
Develop an evaluation tool
Evaluating Retrieval Effectiveness
Document collection
Topic set
Assessments set
Human assessors
Metrics
INEX: INitiative for the Evaluation of XML Retrieval
2006 document collection: 600,000 Wikipedia documents
INEX Tool: EvalJ
Tool for evaluation of information retrieval experiments
Implements a set of metrics used for evaluation
Limitation: cannot measure the improvement of runs produced with feedback
RF Evaluation – Ranking Effect
Baseline run:
doc[1]/bdy[1]
doc[3]
doc[2]/bdy[1]
doc[8]/bdy[1]/article[3]
doc[4]/bdy[1]/article[1]/sec[6]

Mark relevant results in the top of the baseline run: doc[3], doc[8]/bdy[1]/article[3]

Feedback run:
doc[3]
doc[8]/bdy[1]/article[3]
doc[1]
doc[7]/article[3]
doc[2]/bdy[1]/article[1]

Ranking effect: pushing the known relevant results to the top of the element ranking artificially improves recall/precision figures.
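The inflation is easy to make concrete with precision@k: moving the two known relevant elements to ranks 1 and 2 raises early precision even though nothing new was retrieved. A small sketch on toy ranks shaped like the slide's example:

```python
# Sketch: reranking the known relevant results to the top inflates
# precision@k without retrieving any new relevant element (toy data).

def precision_at_k(run, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for r in run[:k] if r in relevant) / k

relevant = {"doc[3]", "doc[8]/bdy[1]/article[3]"}  # marked in the top results

baseline = ["doc[1]/bdy[1]", "doc[3]", "doc[2]/bdy[1]",
            "doc[8]/bdy[1]/article[3]", "doc[4]/bdy[1]/article[1]/sec[6]"]

# Feedback run: the same elements, known relevant ones pushed to the top.
feedback = ["doc[3]", "doc[8]/bdy[1]/article[3]", "doc[1]/bdy[1]",
            "doc[2]/bdy[1]", "doc[4]/bdy[1]/article[1]/sec[6]"]

print(precision_at_k(baseline, relevant, 2))  # 0.5
print(precision_at_k(feedback, relevant, 2))  # 1.0
```

Precision@5 is identical for both runs: the "improvement" exists only at the top of the ranking.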
RF Evaluation – Feedback Effect
Goal: measure the improvement on unseen relevant elements; this feedback effect is not directly tested by standard evaluation.
Approach: modify the feedback run and evaluate the untrained results only.

Baseline run:
doc[1]/bdy[1]
doc[3]
doc[2]/bdy[1]
doc[8]/bdy[1]/article[3]
doc[4]/bdy[1]/article[1]/sec[6]

Mark relevant results in the top of the baseline run: doc[3], doc[8]/bdy[1]/article[3]

Feedback run: trained on the marked results doc[3] and doc[8]/bdy[1]/article[3].
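The "evaluate untrained results" step amounts to dropping everything the feedback algorithm was trained on (the marked top results) from the feedback run before scoring it. A minimal sketch with the slide's toy paths:

```python
# Sketch of the feedback-effect setup: results the user already marked
# (the training data of the feedback algorithm) are removed from the
# feedback run, so only improvement on unseen elements is measured.

def untrained(feedback_run, marked):
    """Feedback run restricted to results the user never saw."""
    return [r for r in feedback_run if r not in marked]

marked = {"doc[3]", "doc[8]/bdy[1]/article[3]"}   # marked relevant in top results
feedback_run = ["doc[3]", "doc[8]/bdy[1]/article[3]",
                "doc[1]/bdy[1]", "doc[9]"]

print(untrained(feedback_run, marked))  # ['doc[1]/bdy[1]', 'doc[9]']
```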
Evaluation Methodology (1)
1. Standard text IR: freezing the known results at the top (independent-results assumption)
2. New approach: remove the known results + X from the collection
resColl-result: remove the results only (~document retrieval)
resColl-desc: remove results + descendants
resColl-anc: remove results + ancestors
resColl-path: remove results + descendants + ancestors
resColl-doc: remove the whole document containing known results
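The five removal modes reduce to simple path-prefix tests: a descendant's path starts with the result's path, an ancestor's path is a prefix of it, and resColl-doc drops everything in the result's document. A sketch with toy element paths (the real runs use full XPath-like paths):

```python
# Sketch of the resColl removal modes via path-prefix tests (toy paths).

def doc_of(path):
    """Document containing an element path like doc[2]/bdy[1]/sec[1]."""
    return path.split("/")[0]

def remove(collection, known, mode):
    """Return the collection with known results removed per the given mode."""
    removed = set()
    for elem in collection:
        for res in known:
            is_result = elem == res
            is_desc = elem.startswith(res + "/")   # elem below the result
            is_anc = res.startswith(elem + "/")    # elem above the result
            same_doc = doc_of(elem) == doc_of(res)
            if (mode == "result" and is_result
                    or mode == "desc" and (is_result or is_desc)
                    or mode == "anc" and (is_result or is_anc)
                    or mode == "path" and (is_result or is_desc or is_anc)
                    or mode == "doc" and same_doc):
                removed.add(elem)
    return [e for e in collection if e not in removed]

collection = ["doc[2]", "doc[2]/bdy[1]", "doc[2]/bdy[1]/sec[1]",
              "doc[3]", "doc[5]/bdy[1]"]
known = ["doc[2]/bdy[1]"]

print(remove(collection, known, "result"))  # only the result itself removed
print(remove(collection, known, "path"))    # result + descendants + ancestors
```

With a single known result inside doc[2], resColl-path and resColl-doc happen to coincide here; they differ as soon as a document contains relevant and non-relevant branches side by side.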
Evaluation Methodology (2)
Freezing: block the top-k results of the baseline run, then append the results of the feedback run.

Baseline run:
doc[7]/bdy[1]
doc[3]
doc[2]/bdy[1]
doc[8]/bdy[1]/article[3]
doc[4]/bdy[1]/article[1]/sec[6]

Feedback run:
doc[2]/bdy[1]/article[1]
doc[9]
doc[4]/bdy[1]/article[2]
doc[2]/bdy[1]
doc[4]/bdy[1]/article[4]

Frozen run (top-3 of the baseline blocked, feedback results appended):
doc[7]/bdy[1]
doc[3]
doc[2]/bdy[1]
doc[2]/bdy[1]/article[1]
doc[9]
doc[4]/bdy[1]/article[2]
doc[2]/bdy[1]
doc[4]/bdy[1]/article[4]
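The freezing construction above is a one-line list operation; this sketch reproduces the slide's example ranks:

```python
# Sketch of the freezing method: the top-k known results of the baseline
# run stay fixed, and the feedback run is appended after them
# (toy ranks taken from the slide's example).

def freeze(baseline, feedback, k):
    """Block the baseline's top-k, then append the feedback run."""
    return baseline[:k] + feedback

baseline = ["doc[7]/bdy[1]", "doc[3]", "doc[2]/bdy[1]",
            "doc[8]/bdy[1]/article[3]", "doc[4]/bdy[1]/article[1]/sec[6]"]
feedback = ["doc[2]/bdy[1]/article[1]", "doc[9]", "doc[4]/bdy[1]/article[2]",
            "doc[2]/bdy[1]", "doc[4]/bdy[1]/article[4]"]

frozen = freeze(baseline, feedback, 3)
print(frozen)
```

Note that doc[2]/bdy[1] now appears twice, once frozen and once from the feedback run; handling such duplicates is one reason the resColl modes were introduced.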
Evaluation Methodology (3)
resColl-path: the known results are removed from the collection together with their descendants and ancestors, and the feedback run is evaluated on what remains.

Baseline run:
doc[7]/bdy[1]
doc[3]
doc[2]/bdy[1]
doc[8]/bdy[1]/article[3]
doc[4]/bdy[1]/article[1]/sec[6]

Feedback run:
doc[2]/bdy[1]/article[1]
doc[9]
doc[4]/bdy[1]/article[2]
doc[2]/bdy[1]
doc[4]/bdy[1]/article[4]

Feedback run after resColl-path (the known result doc[2]/bdy[1] is removed):
doc[2]/bdy[1]/article[1]
doc[9]
doc[4]/bdy[1]/article[2]
doc[4]/bdy[1]/article[4]
Best Evaluation Methodology?
[Figure: the XML document tree from the introduction (article with frontmatter, body, and backmatter; sec, subsec, and p elements; author „Ian Ruthven“; citation „D. Harman“), illustrating which elements resColl-path removes.]

Answer: resColl-path
Testing Evaluated Results
Standard method: averaging over topics. Problem: a single outlier topic can dominate the average:

Topic-id            205   280   307   325   341   400   Avg.
Baseline            0.2   0.3   0.1   0.1   0.2   0.3   0.2
Modified feedback   0.2   0.2   0.1   0.9   0.2   0.2   0.3

t-test & Wilcoxon signed-rank test: give the probability p that the baseline run is better than the feedback run.
An experiment is significant if p < 0.05 (or p < 0.01).
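The table's numbers make the point directly: topic 325 alone lifts the feedback average from 0.2 to 0.3, while the feedback run is equal or worse on every other topic. This sketch computes only the paired t statistic with the standard library; the p-value then comes from the t distribution with n−1 degrees of freedom (scipy's stats.ttest_rel and stats.wilcoxon perform the full tests):

```python
# Paired t statistic on the per-topic scores from the slide's table.
# A significance test checks whether the average gain is systematic
# or driven by a single outlier topic.
import math

baseline = [0.2, 0.3, 0.1, 0.1, 0.2, 0.3]
feedback = [0.2, 0.2, 0.1, 0.9, 0.2, 0.2]

diffs = [f - b for f, b in zip(feedback, baseline)]
n = len(diffs)
mean = sum(diffs) / n                                  # 0.1: the average gain
var = sum((d - mean) ** 2 for d in diffs) / (n - 1)    # sample variance
t_stat = mean / math.sqrt(var / n)                     # paired t statistic

print(round(t_stat, 3))  # 0.707: nowhere near significance at p < 0.05
```

Despite the higher average, the difference is not significant: the large variance contributed by topic 325 keeps the t statistic small.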
Results (1)
Evaluation mode: resColl-path
Feedback file          INEX metric   Abs. improv.   Rel. improv.   t-test   WSR
TopX_CO_Content.xml    0.0185         0.0112         1.5467        0.0001   0.0001
xfirm_r1_cosc3s.xml    0.0028         0.0015         1.0975        0.0003   0.0023
xfirm_r1_cosc5.xml     0.0026         0.0012         0.9222        0.0028   0.0422
xfirm_r1_cosc3.xml     0.0025         0.0012         0.8854        0.0032   0.0441
xfirm_r1_coc3s3.xml    0.0031        -0.0017        -0.3564        0.9301   0.9995
xfirm2_r2_cop4.xml     0.0032        -0.0018        -0.3594        0.8532   0.9732
xfirm2_r2_cot40.xml    0.0025        -0.0024        -0.4863        0.9239   0.9987
xfirm2_r2_cot10.xml    0.0023        -0.0026        -0.5334        0.9429   0.9999
xfirm_r1_coc3.xml      0.0014        -0.0034        -0.7186        0.9993   0.9999
xfirm_r1_coc10.xml     0.0013        -0.0035        -0.7281        0.9989   0.9999
Results (2)
Comparison of evaluation techniques based on relative improvement w.r.t. the baseline run (ranking of feedback files per mode):

Rank   freezing   resColl-anc   resColl-desc   resColl-doc   resColl-path   resColl-result
1      c3s        c3s           TopX           TopX          TopX           c3s
2      TopX       c5            c3s            c3s           c3s            c5
3      c5         TopX          c5             c5            c5             TopX
4      c3         c3            c3             c3            c3             c3

TopX = TopX_CO_Content.xml
c3   = xfirm_r1_cosc3.xml
c3s  = xfirm_r1_cosc3s.xml
c5   = xfirm_r1_cosc5.xml
Conclusions & Future Work
Evaluation based on different techniques & metrics
Correct measurement of the improvement achieved by feedback
Not solved: comparing several systems with different output
Future work: possibly a hybrid evaluation mode