Even More TopX: Relevance Feedback
Ralf Schenkel
Joint work with Osama Samodi, Martin Theobald
TopX Results with INEX 2007
• 660,000 XMLified English Wikipedia articles
• 107 topics with
  – structural query (CAS)
  – nonstructural (aka keyword) query (CO)
  – informal description of information need
  – assessed answers (text passages)
• Evaluation metric based on recall/precision: fraction of relevant characters retrieved
• Precision at 1% recall over the result list: P[0.01] = R/C, where C = #characters retrieved and R = #relevant characters retrieved
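The character-based metric on this slide can be illustrated with a minimal Python sketch. It walks down a ranked result list, accumulates C and R, and reports R/C at the first rank where 1% of all relevant characters has been retrieved. The function name, the input layout (one pair of character counts per result) and the lack of interpolation are assumptions made for illustration; the official INEX evaluation tools may compute the value differently, for example by interpolating precision.

```python
from typing import List, Tuple


def precision_at_recall(results: List[Tuple[int, int]],
                        total_relevant_chars: int,
                        recall_level: float = 0.01) -> float:
    """Return R/C at the first rank where recall reaches recall_level.

    results: ranked list of (characters retrieved, relevant characters
             retrieved) per result -- a hypothetical input format.
    total_relevant_chars: number of relevant characters for the topic.
    """
    if total_relevant_chars <= 0:
        return 0.0
    retrieved = 0  # C: characters retrieved so far
    relevant = 0   # R: relevant characters retrieved so far
    for chars, relevant_chars in results:
        retrieved += chars
        relevant += relevant_chars
        if retrieved > 0 and relevant / total_relevant_chars >= recall_level:
            return relevant / retrieved
    return 0.0  # the recall level is never reached


# Toy run: 1% of 4000 relevant characters is reached at the second result.
ranked = [(100, 0), (200, 50), (300, 300)]
p = precision_at_recall(ranked, total_relevant_chars=4000)
```

In the toy run the first result contributes no relevant characters, so the measured precision is 50 relevant characters out of 300 retrieved.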
Results with INEX 2007
[Chart comparing runs: structure queries, keyword queries, document retrieval, improved structure queries, improved keyword queries (unchecked)]
Structural constraints can improve result quality
Users vs. Structural XML IR
//professor[contains(., "SB") and contains(.//course, "IR")]
I need information about a professor in SB who teaches IR.
Structural query languages do not work in practice:
• Schema is unknown or heterogeneous
• Language is too complex
• Humans don't think in XPath
• Results often unsatisfying
System support to generate "good" structured queries:
• User interfaces ("advanced search")
• Natural language processing
• Interactive query refinement
Relevance Feedback for Interactive Query Refinement
1. User submits query (example: query evaluation)
2. User marks relevant and nonrelevant docs
3. System finds best terms to distinguish between relevant and nonrelevant docs
4. System submits expanded query (expansion in the example: XML not(Fagin))
[Diagram: ranked result list 1, 2, 3, 4, … whose documents contain terms such as XML, IR, index and Fagin]
Feedback for XML IR:
• Start with keyword query
• Find structural expansions
• Create structural query
Structural Features
[Diagram: document tree with article; frontmatter, body, backmatter; author "Baeza-Yates"; sec "Semistructured data…"; subsec "XML has evolved…"; p "With the advent of XSLT…". User marks relevant result.]
Possible features:
• C: Content of result, e.g. XML
• D: Tag+Content of descendants, e.g. p[XSLT]
• A: Tag+Content of ancestors, e.g. sec[data]
• AD: Tag+Content of descendants of ancestors, e.g. article//author[Baeza]
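The four candidate classes (C, D, A, AD) can be sketched in Python with the standard library's ElementTree. This is only an illustration under several assumptions: the toy document mirrors the labels in the slide's tree, terms are lower-cased word tokens, and for D, A and AD only an element's own text (not the text of its children) is used. The function names are hypothetical and not part of TopX.

```python
import re
import xml.etree.ElementTree as ET


def own_terms(elem):
    """Lower-cased word tokens from the element's own text (children excluded)."""
    parts = [elem.text or ""] + [child.tail or "" for child in elem]
    return set(re.findall(r"\w+", " ".join(parts).lower()))


def candidate_features(root, result):
    """Return (class, feature) pairs for one result marked as relevant."""
    parent = {child: p for p in root.iter() for child in p}
    ancestors = []
    node = result
    while node in parent:  # walk up to the root
        node = parent[node]
        ancestors.append(node)

    features = set()
    # C: terms from the full content of the result
    for term in re.findall(r"\w+", " ".join(result.itertext()).lower()):
        features.add(("C", term))
    # D: tag + term of descendants of the result
    for d in result.iter():
        if d is not result:
            features.update(("D", f"{d.tag}[{t}]") for t in own_terms(d))
    # A: tag + term of ancestors of the result
    for a in ancestors:
        features.update(("A", f"{a.tag}[{t}]") for t in own_terms(a))
    # AD: descendants of ancestors that lie outside the result's own path
    on_path = set(result.iter()) | set(ancestors)
    for a in ancestors:
        for d in a.iter():
            if d not in on_path:
                features.update(
                    ("AD", f"{a.tag}//{d.tag}[{t}]") for t in own_terms(d))
    return features


DOC = """<article>
  <frontmatter><author>Baeza-Yates</author></frontmatter>
  <body>
    <sec>Semistructured data
      <subsec>XML has evolved
        <p>With the advent of XSLT</p>
      </subsec>
    </sec>
  </body>
  <backmatter/>
</article>"""

root = ET.fromstring(DOC)
features = candidate_features(root, root.find(".//subsec"))
```

With the subsec element as the marked result, the extracted set contains lower-cased counterparts of the slide's examples, such as `("D", "p[xslt]")` and `("AD", "article//author[baeza]")`.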
Feature Selection

Compute Robertson-Sparck-Jones weight for each feature (also used as weight in query):

$$ w_{RSJ}(f) = \log\frac{r_f + 0.5}{R - r_f + 0.5} + \log\frac{E - ef_f - R + r_f + 0.5}{ef_f - r_f + 0.5} $$

where
• r_f: number of relevant results with f
• R: number of relevant results
• ef_f: number of elements that contain f
• E: number of all elements

Order features by Robertson Selection Value:

$$ RSV(f) = w_{RSJ}(f) \cdot (p_f - q_f) $$

where
• p_f: probability that f occurs in relevant result
• q_f: probability that f occurs in nonrelevant result
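A small Python sketch of the weighting and selection step follows. It uses the standard form of the Robertson-Sparck-Jones weight with 0.5 smoothing and the Robertson Selection Value as the weight times (p_f - q_f). How p_f and q_f are estimated is not stated on the slide; the sketch assumes p_f = r_f / R and q_f = n_f / N, where n_f and N (hypothetical names) count the marked nonrelevant results with f and all marked nonrelevant results.

```python
import math
from typing import Dict, List, Tuple


def rsj_weight(r_f: int, R: int, ef_f: int, E: int) -> float:
    """Robertson-Sparck-Jones weight with 0.5 smoothing.

    Requires r_f <= R, r_f <= ef_f and E - ef_f >= R - r_f.
    """
    relevant_odds = (r_f + 0.5) / (R - r_f + 0.5)
    collection_odds = (E - ef_f - R + r_f + 0.5) / (ef_f - r_f + 0.5)
    return math.log(relevant_odds) + math.log(collection_odds)


def rank_features(stats: Dict[str, Tuple[int, int, int]],
                  R: int, N: int, E: int,
                  top_k: int = 10) -> List[Tuple[str, float, float]]:
    """Order features by RSV and keep the top_k.

    stats maps feature -> (r_f, n_f, ef_f); n_f and N are assumptions
    used to estimate q_f from the marked nonrelevant results.
    Returns (feature, RSJ weight, RSV) triples, best first.
    """
    scored = []
    for feature, (r_f, n_f, ef_f) in stats.items():
        w = rsj_weight(r_f, R, ef_f, E)
        p_f = r_f / R if R else 0.0
        q_f = n_f / N if N else 0.0
        scored.append((feature, w, w * (p_f - q_f)))
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored[:top_k]


# Toy statistics, invented for illustration only.
stats = {"p[xslt]": (8, 1, 20), "sec[data]": (2, 5, 500)}
ranked = rank_features(stats, R=10, N=10, E=10000)
```

In the toy data the feature that occurs mostly in relevant results ranks first, while the one that is more frequent among nonrelevant results receives a negative selection value.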
Query Construction
Selected features: C: XML, D: p[XSLT], A: sec[data], AD: article//author[Baeza]
Initial query: query evaluation
• Content of result: *[query evaluation] becomes *[query evaluation XML]
• Tag+Content of descendants: p[XSLT]
• Tag+Content of ancestors: sec[data]
• Tag+Content of descendants of ancestors: article, author[Baeza]
Diagram notes: "needs schema information!", "descendant-or-self axis"
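The construction on this slide can be sketched as a small Python function that turns the initial keywords and the selected features into one structured query string. The output uses a NEXI-style about() notation as an assumption; the exact syntax accepted by TopX, the order of the ancestor steps (which in practice needs schema information), and the function name are all hypothetical.

```python
from typing import List, Tuple


def build_structured_query(keywords: List[str],
                           c_terms: List[str],
                           d_feats: List[Tuple[str, str]],
                           a_feats: List[Tuple[str, str]],
                           ad_feats: List[Tuple[str, str, str]]) -> str:
    """Assemble a NEXI-style query string (illustrative syntax only).

    d_feats / a_feats: (tag, term) pairs.
    ad_feats: (ancestor tag, descendant tag, term) triples.
    Ancestor steps are emitted in the order given by the caller.
    """
    steps = []
    # AD features: condition on a descendant of an ancestor
    for anc_tag, desc_tag, term in ad_feats:
        steps.append(f"//{anc_tag}[about(.//{desc_tag}, {term})]")
    # A features: condition on the ancestor's own content
    for tag, term in a_feats:
        steps.append(f"//{tag}[about(., {term})]")
    # Target element: initial keywords plus C terms, then D features
    content = " ".join(list(keywords) + list(c_terms))
    predicates = [f"about(., {content})"]
    predicates += [f"about(.//{tag}, {term})" for tag, term in d_feats]
    steps.append("//*[" + " and ".join(predicates) + "]")
    return "".join(steps)


query = build_structured_query(
    keywords=["query", "evaluation"],
    c_terms=["XML"],
    d_feats=[("p", "XSLT")],
    a_feats=[("sec", "data")],
    ad_feats=[("article", "author", "Baeza")],
)
```

The wildcard target step keeps the expanded query applicable to any element type, while the ancestor steps add the structural constraints.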
More Fancy Query Construction
[Query diagram with nodes: *[query evaluation] becomes *[query evaluation XML]; p[XSLT]; sec[data]; article; author[Baeza]]
• No valid NEXI query, but XPath (ancestor axis)
• DAG queries in TopX
• needs disjunctive evaluation
Example: "pyramids of egypt"
Architecture
[Architecture diagram. Components: TopX Search Engine; INEX Tools & Assessments; Candidate Classes (C Module, D Module, A Module, AD Module); Weighting + Selection. Arrow labels: query, results, query + results, feedback, expanded query.]
RF in the TopX 2.0 Interface
Evaluation Methodology
Goal: avoid "training on the data"
• Freeze known results at the top
• Remove known results+X from the collection
  – resColl-result: remove results only (~doc retrieval)
  – resColl-desc: remove results+descendants
  – resColl-anc: remove results+ancestors
  – resColl-path: remove results+desc+anc
  – resColl-doc: remove whole doc with known results
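The residual-collection variants can be illustrated with a short Python sketch that filters a ranked list before evaluation. Elements are modelled as (document id, path) pairs, with the path given as a tuple of steps; this representation, the mode names and the function names are assumptions for illustration, not the INEX tooling.

```python
from typing import List, Tuple

Element = Tuple[str, Tuple[str, ...]]  # (document id, path from the root)

MODES = ("result", "desc", "anc", "path", "doc")


def is_ancestor(a: Element, b: Element) -> bool:
    """True if a is a proper ancestor of b in the same document."""
    return (a[0] == b[0]
            and len(a[1]) < len(b[1])
            and b[1][:len(a[1])] == a[1])


def residual_filter(ranking: List[Element],
                    known: List[Element],
                    mode: str) -> List[Element]:
    """Drop known results (plus related elements, depending on mode)."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")

    def removed(elem: Element) -> bool:
        for k in known:
            if elem == k:
                return True  # known results are always removed
            if mode in ("desc", "path") and is_ancestor(k, elem):
                return True  # descendant of a known result
            if mode in ("anc", "path") and is_ancestor(elem, k):
                return True  # ancestor of a known result
            if mode == "doc" and elem[0] == k[0]:
                return True  # same document as a known result
        return False

    return [elem for elem in ranking if not removed(elem)]


sec = ("d1", ("article", "body", "sec"))
p = ("d1", ("article", "body", "sec", "p"))
body = ("d1", ("article", "body"))
front = ("d1", ("article", "frontmatter"))
other = ("d2", ("article",))
ranking = [sec, p, body, front, other]
residual = {mode: residual_filter(ranking, [sec], mode) for mode in MODES}
```

The stricter the variant, the fewer elements remain: with a single known sec result, the resColl-doc variant leaves only elements from other documents.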
Evaluation: INEX 2003&2004
• INEX collection (IEEE-CS journal and conference articles):
  – 12,107 XML docs with 12 million elements
  – queries with manual relevance assessments
• 52 keyword queries from 2003 & 2004 with our TopX Search Engine [VLDB05]
• Baseline run with MAP~0.1, Precision@20=0.174
• Automatic feedback for top-k from relevance assessments
• Evaluation ignores results used for feedback and descendants of results (resColl-desc)
INEX 2003&2004, resColl-desc
All dimensions together are best.
Reasonable results for INEX 2005 RF Track
Results for INEX 2005 Track
• INEX IEEE collection (scientific articles)
• Feedback for the top-20 from the assessments (with the strict quantisation -> only "relevant" and "nonrelevant")
• top 10 expansion features
• runs with top 1500 results
• MAP with inex_eval (with strict quantisation)
(Some) Results for INEX 2006 RF Track
• INEX Wikipedia collection
• Feedback for the top-20 from the assessments (with the generalised quantisation -> graded relevance)
• top 10 expansion features
• runs with top 100 results for first 50 topics (time…)
• MAP with inex_eval (with generalised quantisation)
Significance tests (Wilcoxon signed-rank, t-test)
Conclusions
• Queries with structural constraints to improve result quality
• Relevance Feedback to create such queries
• Structure of collection matters a lot