Beyond Bags of Words: A Markov Random Field Model for Information Retrieval


Don Metzler

Bag of Words Representation

71 the, 31 garden, 22 and, 19 of, 19 to, 19 house, 18 in, 18 white, 12 trees, 12 first, 11 a, 11 president, 8 for, 7 as, 7 gardens, 7 rose, 7 tour, 6 on, 6 was, 6 east, 6 tours, 5 planting, 5 he, 5 is, 5 grounds, 5 that, 5 gardener, 4 history, 4 text-decoration, 4 john, 4 kennedy, 4 april, 4 been, 4 today, 4 with, 4 none, 4 adams, 4 spring, 4 at, 4 had, 3 mrs, 3 lawn, …

Bag of Words Models

Binary Independence Model

Unigram Language Models

Language modeling was first used in speech recognition to model speech generation. In IR, language models are models of text generation.

Typical scenario:

Estimate a language model for every document

Rank documents by the likelihood that the document's model generated the query

Documents are modeled as multinomial distributions over a fixed vocabulary

Unigram Language Models

[Slide figure: an example query and document, each shown as a sequence of terms]

Ranking via Query Likelihood

P(Q | θ_D) = P(q₁ | θ_D) ∙ P(q₂ | θ_D) ∙ P(q₃ | θ_D)

Estimation

P(w | D) = λ ∙ P(w | θ_D) + (1 − λ) ∙ P(w | θ_C)
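The query-likelihood scheme above can be sketched in a few lines, using Jelinek-Mercer smoothing to mix the document multinomial with the collection model. The toy documents and the value λ = 0.8 are illustrative, not from the slides:

```python
import math

def score_query_likelihood(query, doc, collection, lam=0.8):
    """Log query likelihood under a Jelinek-Mercer smoothed unigram
    document model:
        P(w | D) = lam * tf(w, D)/|D| + (1 - lam) * cf(w)/|C|
    """
    score = 0.0
    for w in query:
        p_doc = doc.count(w) / len(doc)
        p_coll = collection.count(w) / len(collection)
        score += math.log(lam * p_doc + (1 - lam) * p_coll)
    return score

# Toy data: the document about the rose garden should win for "rose garden".
d1 = "the white house rose garden tour".split()
d2 = "the president planted trees in spring".split()
coll = d1 + d2
q = "rose garden".split()
print(score_query_likelihood(q, d1, coll) > score_query_likelihood(q, d2, coll))  # True
```

Smoothing with the collection model keeps the log well-defined when a query term is missing from the document, which is exactly why the mixture above is preferred over the raw maximum-likelihood estimate.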

Bag of Words Models

Pros:
Simple model
Easy to implement
Decent results

Cons:
Too simple
Unrealistic assumptions
Inability to explicitly model term dependencies

Beyond Bags of Words

Tree Dependence Model

A method for approximating a complex joint distribution:

Compute the EMIM (expected mutual information measure) between every pair of terms

Build a maximum spanning tree over these pairwise weights

The tree encodes first-order dependencies

[Slide figure: example dependence tree with root D, children A and C under D, and children B and E under C]

P(A, B, C, D, E) = P(D) ∙ P(A | D) ∙ P(C | D) ∙ P(E | C) ∙ P(B | C)
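The construction can be sketched with Prim's algorithm, using a toy symmetric weight table standing in for the pairwise EMIM estimates. The weight values are invented so that the result is a tree rooted at D like the slide's example:

```python
def max_spanning_tree(terms, weight):
    """Prim's algorithm for a maximum spanning tree; weight holds a
    symmetric score (standing in for EMIM) for each unordered term pair."""
    in_tree = {terms[0]}          # root the tree at the first term
    edges = []                    # collected (parent, child) edges
    while len(in_tree) < len(terms):
        parent, child = max(
            ((u, v) for u in in_tree for v in terms if v not in in_tree),
            key=lambda e: weight[frozenset(e)],
        )
        edges.append((parent, child))
        in_tree.add(child)
    return edges

# Invented weights; the four heavy pairs become the tree's edges.
terms = ["D", "A", "B", "C", "E"]
weight = {frozenset(p): s for p, s in [
    (("A", "D"), 0.9), (("C", "D"), 0.8), (("C", "E"), 0.7), (("B", "C"), 0.6),
    (("A", "B"), 0.1), (("A", "C"), 0.1), (("A", "E"), 0.1),
    (("B", "D"), 0.1), (("B", "E"), 0.1), (("D", "E"), 0.1),
]}
print(max_spanning_tree(terms, weight))
# [('D', 'A'), ('D', 'C'), ('C', 'E'), ('C', 'B')]
```

Reading the returned (parent, child) edges off the tree gives the first-order factorization P(D)·P(A|D)·P(C|D)·P(E|C)·P(B|C).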

n-Gram Language Models


[Slide figure: an example query and document, each shown as a sequence of terms]

Ranking via Query Likelihood

P(Q | D) = P(q₁ | D) ∙ P(q₂ | q₁, D) ∙ P(q₃ | q₂, D)

Estimation

P(w_i | w_{i−1}, D) = λ ∙ [ λ₁ ∙ tf(w_{i−1} w_i, D) / tf(w_{i−1}, D) + (1 − λ₁) ∙ cf(w_{i−1} w_i) / cf(w_{i−1}) ] + (1 − λ) ∙ [ λ₂ ∙ tf(w_i, D) / |D| + (1 − λ₂) ∙ cf(w_i) / |C| ]
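A minimal sketch of such a smoothed bigram estimate, interpolating a document bigram model with a smoothed unigram fallback. The interpolation structure here is deliberately simpler than the full slide formula, and the λ values are illustrative:

```python
def p_bigram(w_prev, w, doc, coll, lam=0.5, mu=0.8):
    """Interpolated bigram estimate:
        P(w | w_prev, D) = lam * tf(w_prev w, D)/tf(w_prev, D)
                         + (1-lam) * [mu * tf(w, D)/|D| + (1-mu) * cf(w)/|C|]
    """
    tf_big = sum(1 for a, b in zip(doc, doc[1:]) if (a, b) == (w_prev, w))
    tf_prev = doc.count(w_prev)
    p_big = tf_big / tf_prev if tf_prev else 0.0
    p_uni = mu * doc.count(w) / len(doc) + (1 - mu) * coll.count(w) / len(coll)
    return lam * p_big + (1 - lam) * p_uni

doc = "white house rose garden white house tour".split()
coll = doc + "garden tour of the white house".split()
# "house" follows "white" in the document; "garden" never does:
print(p_bigram("white", "house", doc, coll) > p_bigram("white", "garden", doc, coll))  # True
```

The unigram fallback plays the same role as in the unigram model: it keeps the estimate non-zero for bigrams the document never exhibits.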

Dependency Models

Pros:
More realistic assumptions
Improved effectiveness

Cons:
Less efficient
Limited notion of dependence
Not well understood

State of the Art IR

Desiderata

Our desired model should be able to…

Support standard IR tasks (ranking, query expansion, etc.)

Easily model dependencies between terms

Handle textual and non-textual features

Consistently and significantly improve effectiveness over bag of words models and existing dependence models

Proposed solution: Markov random fields

Markov Random Fields

MRFs provide a general, robust way of modeling a joint distribution.

The anatomy of an MRF:

Graph G
• vertices represent random variables
• edges encode dependence semantics

Potentials over the cliques of G
• non-negative functions over clique configurations
• measure the ‘compatibility’ of a configuration

P_G(X) = (1 / Z_G) ∙ ∏_{c ∈ Cliques(G)} ψ(c ; Λ)
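As a toy instance of this definition, the joint over a three-variable chain MRF can be computed by multiplying clique potentials and normalizing by the partition function. The agreement-favoring potential below is invented for illustration; any non-negative function over clique configurations would do:

```python
import itertools

# Chain-structured MRF over three binary variables A - B - C:
# cliques {A, B} and {B, C}, each scored by the same potential.
def psi(x, y):
    return 2.0 if x == y else 1.0  # non-negative 'compatibility' score

configs = list(itertools.product([0, 1], repeat=3))
unnorm = {x: psi(x[0], x[1]) * psi(x[1], x[2]) for x in configs}
Z = sum(unnorm.values())                   # partition function Z_G
P = {x: v / Z for x, v in unnorm.items()}  # P_G(X) = (1/Z_G) * prod of potentials
print(P[(0, 0, 0)] > P[(0, 1, 0)])  # agreeing configurations are more probable: True
```

Note that ranking by P_G never needs Z_G when it is constant across the objects being ranked, which is what makes MRF ranking tractable in practice.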

Markov Random Fields for IR

Modeling Dependencies

Parameter Tying

In theory, a potential function can be associated with every clique in a graph.
The typical solution is to define potentials over the maximal cliques of G.
We need more fine-grained control over our potentials, so we use clique sets:

A set of cliques that share a parameter and potential function.
We identified 7 clique sets that are relevant to IR tasks.

Clique Sets

Single term document/query cliques

TD = cliques w/ one query term + D

ψ(domestic, D), ψ(adoption, D), ψ(laws, D)

Ordered terms document/query cliques

OD = cliques w/ two or more contiguous query terms + D

ψ(domestic, adoption, D), ψ(adoption, laws, D),ψ(domestic, adoption, laws, D)

Unordered terms document/query cliques

UD = cliques w/ two or more query terms (in any order) + D

ψ(domestic, adoption, D), ψ(adoption, laws, D), ψ(domestic, laws, D), ψ(domestic, adoption, laws, D)

Clique Sets

Single term query cliques
TQ = cliques w/ one query term
ψ(domestic), ψ(adoption), ψ(laws)

Ordered terms query cliques
OQ = cliques w/ two or more contiguous query terms

ψ(domestic, adoption), ψ(adoption, laws), ψ(domestic, adoption, laws)

Unordered terms query cliques
UQ = cliques w/ two or more query terms (in any order)

ψ(domestic, adoption), ψ(adoption, laws), ψ(domestic, laws), ψ(domestic, adoption, laws)

Document clique
D = singleton clique w/ D

ψ(D)
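A ranking function built from these clique sets can be sketched as a weighted sum of features, one term per clique set. The sketch below covers only the three document-side sets (T_D, O_D, U_D) in the sequential-dependence spirit, with simple smoothed match counts standing in for the model's log-probability features; the weights and window size are illustrative:

```python
import math

def sdm_score(query, doc, lam=(0.85, 0.10, 0.05)):
    """Weighted sum of features over three clique sets: single terms (T_D),
    contiguous ordered pairs (O_D), and pairs co-occurring within an
    unordered window (U_D)."""
    lt, lo, lu = lam
    f_t = sum(math.log(1 + doc.count(q)) for q in query)
    pairs = list(zip(query, query[1:]))
    f_o = sum(math.log(1 + sum(1 for i in range(len(doc) - 1)
                               if doc[i:i + 2] == [a, b]))
              for a, b in pairs)
    window = 8
    f_u = sum(math.log(1 + sum(1 for i, w in enumerate(doc) if w == a
                               and b in doc[max(0, i - window):i + window + 1]))
              for a, b in pairs)
    return lt * f_t + lo * f_o + lu * f_u

q = "white house tour".split()
d1 = "the white house tour starts at the white house".split()
d2 = "house paint in white for a garden tour".split()
print(sdm_score(q, d1) > sdm_score(q, d2))  # True
```

The document that keeps the query terms adjacent and in order beats one that merely contains them, which is exactly the behavior the ordered and unordered clique sets are designed to reward.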

Features

Parameter Estimation

Given a set of relevance judgments R, we want the maximum a posteriori estimate:

What is P(Λ | R)? What are P(R | Λ) and P(Λ)?
It depends on how the model is being evaluated!
We want P(Λ | R) to be peaked around the parameter setting that maximizes the metric we are interested in.

Λ̂ = argmax_Λ P(Λ | R)
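One simple way to drive the estimate toward the metric's maximizer is coordinate ascent: line-search one parameter at a time against the evaluation metric itself. This is a sketch of metric-driven training under assumed grid steps and a made-up quadratic toy metric, not the authors' exact procedure:

```python
def coordinate_ascent(metric, n_params=3, steps=(0.0, 0.25, 0.5, 0.75, 1.0), sweeps=5):
    """Cyclically line-search one parameter at a time over a fixed grid,
    keeping whichever value maximizes the (possibly non-differentiable)
    evaluation metric."""
    params = [1.0 / n_params] * n_params
    for _ in range(sweeps):
        for i in range(n_params):
            params[i] = max(steps,
                            key=lambda v: metric(params[:i] + [v] + params[i + 1:]))
    return params

# Toy 'metric' peaked at (0.75, 0.25, 0.0); the search recovers the peak.
peak = [0.75, 0.25, 0.0]
metric = lambda p: -sum((a - b) ** 2 for a, b in zip(p, peak))
print(coordinate_ascent(metric))  # [0.75, 0.25, 0.0]
```

Because the search only ever calls the metric as a black box, it works for rank-based measures like mean average precision that have no useful gradient.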


Evaluation Metric Surfaces

Feature Induction

Query Expansion


Query Expansion Example

Original query: hubble telescope achievements


Applications

Application: Ad Hoc Retrieval

Ad Hoc Results

Application: Web Search

Application: XML Retrieval

Example XML Document

<PLAY>
<TITLE>The Tragedy of Romeo and Juliet</TITLE>
<ACT>
<TITLE>ACT I</TITLE>
<PROLOGUE>
<TITLE>PROLOGUE</TITLE>
<SPEECH>
<SPEAKER>NARRATOR</SPEAKER>
<LINE>Two households, both alike in dignity,</LINE>
<LINE>In fair Verona, where we lay our scene,</LINE>
<LINE>From ancient grudge break to new mutiny,</LINE>
<LINE>Where civil blood makes civil hands unclean.</LINE>
<LINE>From forth the fatal loins of these two foes</LINE>
<LINE>A pair of star-cross’d lovers take their life;</LINE>
…
</SPEECH>
</PROLOGUE>
<SCENE>
<TITLE>SCENE I. Verona. A public place.</TITLE>
<SPEECH>
<SPEAKER>SAMPSON</SPEAKER>
<LINE>Gregory, o’ my word, we’ll not carry coals.</LINE>
…
</PLAY>

Content and Structure Queries

NEXI query language
derived from XPath

allows mixture of content and structure

//scene[about(., poison juliet)]
Return scene tags that are about poison and juliet.

//*[about(., plague both houses)]
Return elements (of any type) about plague both houses.

//scene[about(., juliet chamber)]//speech[about(.//line, jewel)]
Return speech tags about jewel where the scene is about juliet chamber.

Application: Text Classification

Conclusions
