Even More TopX: Relevance Feedback

Ralf Schenkel

Joint work with Osama Samodi, Martin Theobald

TopX Results with INEX 2007

• 660,000 XMLified English Wikipedia articles
• 107 topics with
  – structural query (CAS)
  – nonstructural (aka keyword) query (CO)
  – informal description of information need
  – assessed answers (text passages)
• Evaluation metric based on recall/precision: fraction of relevant characters retrieved

[Figure: result list, cut off at 1% recall]

C: #characters retrieved
R: #relevant characters retrieved
P[0.01] = R/C
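A minimal sketch of this metric, assuming each retrieved passage is given as a pair (characters retrieved, relevant characters retrieved) and that precision is read off at the first rank where 1% of all relevant characters has been reached. The function name and the input layout are illustrative assumptions, and the sketch omits any interpolation an official evaluation tool may apply.

```python
def precision_at_recall(results, total_relevant_chars, recall_level=0.01):
    """results: ranked list of (chars_retrieved, relevant_chars_retrieved)."""
    c = 0  # C: characters retrieved so far
    r = 0  # R: relevant characters retrieved so far
    for chars, relevant in results:
        c += chars
        r += relevant
        if r >= recall_level * total_relevant_chars:
            return r / c  # precision at the cut-off: R / C
    return 0.0  # the recall level is never reached


# Toy ranking (made-up numbers): 10,000 relevant characters exist in total,
# so 1% recall is reached once 100 relevant characters have been retrieved.
ranked = [(200, 0), (300, 150), (500, 400)]
p = precision_at_recall(ranked, total_relevant_chars=10_000)
```

In the toy ranking the cut-off falls after the second passage, so C = 500 and R = 150.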

Results with INEX 2007

[Chart: runs for structure queries, keyword queries, document retrieval, improved structure queries, improved keyword queries (unchecked)]

Structural constraints can improve result quality

Users vs. Structural XML IR

Information need: I need information about a professor in SB who teaches IR.

Structured query: //professor[contains(.,SB) and contains(.//course,IR)]
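To make the gap between the information need and the structured query concrete, here is a small sketch that evaluates the same condition by hand over a toy document. Python's `xml.etree.ElementTree` supports only a limited XPath subset without `contains()`, so the containment test is written out explicitly. The sample document and the helper `full_text` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy document; element names other than professor/course are assumptions.
DOC = """
<department>
  <professor>
    <name>A. Example</name><city>SB</city>
    <course>IR</course>
  </professor>
  <professor>
    <name>B. Sample</name><city>SB</city>
    <course>Databases</course>
  </professor>
</department>
"""


def full_text(elem):
    """Concatenated text of an element and all its descendants."""
    return "".join(elem.itertext())


root = ET.fromstring(DOC)
# Same condition as the query: the professor's content contains "SB"
# and some descendant course contains "IR".
matches = [
    prof for prof in root.iter("professor")
    if "SB" in full_text(prof)
    and any("IR" in full_text(course) for course in prof.iter("course"))
]
names = [prof.findtext("name") for prof in matches]
```

Even in this tiny case the user has to know that courses are stored in `course` elements below `professor`, which is exactly the schema knowledge a typical user lacks.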

Structural query languages do not work in practice:
• Schema is unknown or heterogeneous
• Language is too complex
• Humans don't think in XPath
• Results are often unsatisfying

System support to generate "good" structured queries:
• User interfaces ("advanced search")
• Natural language processing
• Interactive query refinement

Relevance Feedback for Interactive Query Refinement

1. User submits query
2. User marks relevant and nonrelevant docs
3. System finds best terms to distinguish between relevant and nonrelevant docs
4. System submits expanded query

[Figure: example for the query "query evaluation" with a result list ranked 1, 2, 3, 4, …; results contain terms such as XML, IR, index, Fagin; the expansion adds XML not(Fagin)]

Feedback for XML IR:
• Start with keyword query
• Find structural expansions
• Create structural query
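Steps 2 to 4 of the classic keyword-based loop can be sketched as follows. This is only an illustration: it scores each term by the difference between the fraction of relevant and of nonrelevant documents containing it, which is a simple stand-in and not necessarily the weighting used in TopX. The function name, the toy documents and the whitespace tokenisation are assumptions.

```python
def feedback_terms(relevant_docs, nonrelevant_docs, query_terms):
    """Return (best positive term, best negative term) for query expansion."""
    rel = [set(d.lower().split()) for d in relevant_docs]
    non = [set(d.lower().split()) for d in nonrelevant_docs]
    # Candidate terms: everything seen in the marked docs except the query itself.
    vocab = set().union(*rel, *non) - {t.lower() for t in query_terms}

    def frac(term, docs):
        return sum(term in d for d in docs) / len(docs) if docs else 0.0

    # Positive score: typical for relevant docs; negative: typical for nonrelevant ones.
    score = {t: frac(t, rel) - frac(t, non) for t in vocab}
    return max(score, key=score.get), min(score, key=score.get)


query = ["query", "evaluation"]
marked_relevant = ["xml query evaluation", "xml index query evaluation"]
marked_nonrelevant = ["fagin query evaluation", "fagin index query evaluation"]

pos, neg = feedback_terms(marked_relevant, marked_nonrelevant, query)
expanded = " ".join(query + [pos, f"not({neg})"])
```

The term "index" occurs equally often on both sides and is therefore useless for distinguishing the two sets, while "xml" and "fagin" separate them cleanly.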

Structural Features

[Figure: document tree with the elements article, frontmatter, body, backmatter, sec ("Semistructured data…"), subsec ("XML has evolved…"), p ("With the advent of XSLT…") and author ("Baeza-Yates"); the user marks one element as a relevant result]

Possible features:
• C: Content of result, e.g. XML
• D: Tag+Content of descendants, e.g. p[XSLT]
• A: Tag+Content of ancestors, e.g. sec[data]
• AD: Tag+Content of descendants of ancestors, e.g. article//author[Baeza]
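A sketch of how the four candidate classes C, D, A and AD could be collected for one marked result. The toy document mirrors the element names and text snippets from the slide, but the exact nesting (for example, author under frontmatter) is an assumption, as are the function names and the choice to lower-case all terms.

```python
import re
import xml.etree.ElementTree as ET

# Assumed toy document; the nesting is a guess based on the element names on the slide.
DOC = """<article>
  <frontmatter><author>Baeza-Yates</author></frontmatter>
  <body>
    <sec>Semistructured data
      <subsec>XML has evolved
        <p>With the advent of XSLT</p>
      </subsec>
    </sec>
  </body>
</article>"""


def own_terms(elem):
    """Lower-cased terms in the element's own text (excluding its children)."""
    parts = [elem.text or ""] + [child.tail or "" for child in elem]
    return set(re.findall(r"[a-z]+", " ".join(parts).lower()))


def candidate_features(root, result):
    parent = {child: p for p in root.iter() for child in p}
    subtree = set(result.iter())  # the result and all its descendants
    feats = {("C", t) for t in own_terms(result)}
    for d in subtree - {result}:
        feats |= {("D", f"{d.tag}[{t}]") for t in own_terms(d)}
    # Walk up from the result to the root to collect its ancestors.
    ancestors = []
    node = parent.get(result)
    while node is not None:
        ancestors.append(node)
        node = parent.get(node)
    for anc in ancestors:
        feats |= {("A", f"{anc.tag}[{t}]") for t in own_terms(anc)}
        for d in anc.iter():
            if d in subtree or d in ancestors:
                continue  # already covered by the C, D or A classes
            feats |= {("AD", f"{anc.tag}//{d.tag}[{t}]") for t in own_terms(d)}
    return feats


root = ET.fromstring(DOC)
features = candidate_features(root, root.find(".//subsec"))
```

With the subsec element marked as relevant, the sketch produces lower-cased counterparts of the four example features from the slide, among others.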

Feature Selection

Compute Robertson-Sparck-Jones weight for each feature (also used as weight in query):

$$w_{RSJ}(f) = \log\frac{r_f + 0.5}{R - r_f + 0.5} + \log\frac{E - ef_f - R + r_f + 0.5}{ef_f - r_f + 0.5}$$

where
• r_f: number of relevant results with f
• R: number of relevant results
• ef_f: number of elements that contain f
• E: number of all elements

Order features by Robertson Selection Value:

$$RSV(f) = w_{RSJ}(f) \cdot (p_f - q_f)$$

where
• p_f: probability that f occurs in relevant result
• q_f: probability that f occurs in nonrelevant result
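The two formulas can be sketched in a few lines. The slides do not say how p_f and q_f are estimated; the sketch assumes the common maximum-likelihood estimates p_f = r_f / R and q_f = (ef_f - r_f) / (E - R). The function names and all statistics in the example are made up for illustration.

```python
import math


def rsj_weight(r_f, R, ef_f, E):
    """Robertson-Sparck-Jones weight of a feature f."""
    return (math.log((r_f + 0.5) / (R - r_f + 0.5))
            + math.log((E - ef_f - R + r_f + 0.5) / (ef_f - r_f + 0.5)))


def rsv(r_f, R, ef_f, E):
    """Robertson Selection Value: RSJ weight times (p_f - q_f)."""
    p_f = r_f / R                 # assumed estimate: share of relevant results with f
    q_f = (ef_f - r_f) / (E - R)  # assumed estimate: share of other elements with f
    return rsj_weight(r_f, R, ef_f, E) * (p_f - q_f)


# Hypothetical statistics: 4 relevant results, 1000 elements overall,
# and per feature the pair (r_f, ef_f).
R, E = 4, 1000
stats = {
    "p[XSLT]": (3, 10),
    "sec[data]": (2, 200),
    "article//author[Baeza]": (1, 500),
}
ranked = sorted(stats, key=lambda f: rsv(stats[f][0], R, stats[f][1], E),
                reverse=True)
```

A feature that is frequent among the relevant results but rare in the collection, like p[XSLT] in these toy numbers, ends up at the top of the ranking.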

Query Construction

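Initial query: query evaluation

Selected features: C: XML   D: p[XSLT]   A: sec[data]   AD: article//author[Baeza]

[Figure: the initial query *[query evaluation] is expanded step by step]
• Content of result: *[query evaluation] becomes *[query evaluation XML]
• Tag+Content of descendants: p[XSLT] below the result
• Tag+Content of ancestors: sec[data] above the result
• Tag+Content of descendants of ancestors: article with descendant author[Baeza]

Annotations in the figure: "needs schema information!" and "descendant-or-self axis"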
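One way to turn the selected features into a single path query is sketched below, using NEXI-style `about()` predicates. The helper names and the input layout are assumptions, and the order of the ancestors (article above sec) has to be supplied by the caller, which is the kind of schema information the slide refers to. The resulting string is an illustration, not necessarily the query TopX generates.

```python
def about(path, terms):
    """Format a NEXI-style about() predicate."""
    return f"about({path}, {' '.join(terms)})"


def build_query(keywords, c_terms, d_feats, ancestors):
    """Assemble a path query from the feature classes.

    d_feats:   list of (tag, term) for descendants of the result
    ancestors: list of (tag, own_terms, [(desc_tag, term), ...]),
               ordered from the outermost ancestor to the innermost
    """
    steps = []
    for tag, own, desc in ancestors:
        preds = [about(".", own)] if own else []
        preds += [about(f".//{t}", [term]) for t, term in desc]
        steps.append(f"//{tag}" + (f"[{' and '.join(preds)}]" if preds else ""))
    # Target step: original keywords plus C terms, then the D features.
    target = [about(".", keywords + c_terms)]
    target += [about(f".//{t}", [term]) for t, term in d_feats]
    steps.append(f"//*[{' and '.join(target)}]")
    return "".join(steps)


query = build_query(
    keywords=["query", "evaluation"],
    c_terms=["XML"],
    d_feats=[("p", "XSLT")],
    ancestors=[("article", [], [("author", "Baeza")]),
               ("sec", ["data"], [])],
)
```

For the example features this yields `//article[about(.//author, Baeza)]//sec[about(., data)]//*[about(., query evaluation XML) and about(.//p, XSLT)]`.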

More Fancy Query Construction

[Figure: query graph over *[query evaluation] / *[query evaluation XML], p[XSLT], sec[data], article and author[Baeza]]

• No valid NEXI query, but XPath (ancestor axis); DAG queries in TopX
• needs disjunctive evaluation

Example: "pyramids of egypt"

Architecture

[Figure: architecture diagram]
• Components: TopX Search Engine; Weighting + Selection; Candidate Classes (C Module, D Module, A Module, AD Module); INEX Tools & Assessments
• Data flows: query, results, query + results, feedback, expanded query

RF in the TopX 2.0 Interface

Evaluation Methodology

Goal: avoid "training on the data"
• Freeze known results at the top
• Remove known results+X from the collection
  – resColl-result: remove results only (~doc retrieval)
  – resColl-desc: remove results+descendants
  – resColl-anc: remove results+ancestors
  – resColl-path: remove results+desc+anc
  – resColl-doc: remove whole doc with known results
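The five residual-collection variants can be sketched as a filter over a ranked result list. The representation of a result as a (document id, path tuple) pair and the function names are assumptions made for this illustration. The sketch covers only the removal variants, not the "freeze at the top" alternative.

```python
def is_prefix(a, b):
    """True if path a is equal to path b or an ancestor of it."""
    return len(a) <= len(b) and b[:len(a)] == a


def is_removed(result, known, mode):
    doc, path = result
    for kdoc, kpath in known:
        if doc != kdoc:
            continue
        if mode == "doc":
            return True  # whole document with a known result
        if path == kpath:
            return True  # the known result itself (all modes)
        if mode in ("desc", "path") and is_prefix(kpath, path):
            return True  # descendant of a known result
        if mode in ("anc", "path") and is_prefix(path, kpath):
            return True  # ancestor of a known result
    return False


def residual(ranking, known, mode):
    """mode is one of: result, desc, anc, path, doc."""
    return [r for r in ranking if not is_removed(r, known, mode)]


known = [("d1", ("article", "body", "sec[1]"))]
ranking = [
    ("d1", ("article", "body", "sec[1]")),          # known result
    ("d1", ("article", "body", "sec[1]", "p[2]")),  # its descendant
    ("d1", ("article", "body")),                    # its ancestor
    ("d1", ("article", "body", "sec[2]")),          # sibling in the same doc
    ("d2", ("article",)),                           # other document
]
sizes = {m: len(residual(ranking, known, m))
         for m in ("result", "desc", "anc", "path", "doc")}
```

Comparing paths as tuples rather than strings avoids treating sec[1] as a prefix of sec[10].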

Evaluation: INEX 2003&2004

• INEX collection (IEEE-CS journal and conference articles):
  – 12,107 XML docs with 12 million elements
  – queries with manual relevance assessments
• 52 keyword queries from 2003 & 2004 with our TopX Search Engine [VLDB05]
• Baseline run with MAP~0.1, Precision@20=0.174
• Automatic feedback for top-k from relevance assessments
• Evaluation ignores results used for feedback and descendants of results (resColl-desc)
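For reference, the two baseline measures can be sketched as follows for plain binary relevance. This is a textbook formulation with illustrative function names and toy data; it ignores the quantisation functions that the official INEX evaluation tools apply.

```python
def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for r in ranking[:k] if r in relevant) / k


def average_precision(ranking, relevant):
    """Mean of the precision values at the ranks of the relevant results."""
    hits, total = 0, 0.0
    for rank, r in enumerate(ranking, start=1):
        if r in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """runs: list of (ranking, relevant set) pairs, one per query."""
    return sum(average_precision(rk, rel) for rk, rel in runs) / len(runs)


# Toy data: two queries with made-up result ids.
runs = [
    (["a", "b", "c", "d"], {"a", "c", "e"}),
    (["x", "y"], {"y"}),
]
p2 = precision_at_k(runs[0][0], runs[0][1], k=2)
map_score = mean_average_precision(runs)
```

Relevant results that are never retrieved, such as "e" in the first query, still count in the denominator of average precision.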

INEX 2003&2004, rescoll-desc

• All dimensions together are best.
• Reasonable results for INEX 2005 RF Track

Results for INEX 2005 Track

• INEX IEEE collection (scientific articles)
• Feedback for the top-20 from the assessments (with the strict quantisation -> only "relevant" and "nonrelevant")
• top 10 expansion features
• runs with top 1500 results
• MAP with inex_eval (with strict quantisation)

(Some) Results for INEX 2006 RF Track

• INEX Wikipedia collection
• Feedback for the top-20 from the assessments (with the generalised quantisation -> graded relevance)
• top 10 expansion features
• runs with top 100 results for first 50 topics (time…)
• MAP with inex_eval (with generalised quantisation)

Significance tests (Wilcoxon signed-rank, t-test)

Conclusions

• Queries with structural constraints to improve result quality
• Relevance Feedback to create such queries
• Structure of collection matters a lot
