Beyond Bags of Words: A Markov Random Field Model for Information Retrieval


Don Metzler

Bag of Words Representation

71 the, 31 garden, 22 and, 19 of, 19 to, 19 house, 18 in, 18 white, 12 trees, 12 first, 11 a, 11 president, 8 for, 7 as, 7 gardens, 7 rose, 7 tour, 6 on, 6 was, 6 east, 6 tours, 5 planting, 5 he, 5 is, 5 grounds, 5 that, 5 gardener, 4 history, 4 text-decoration, 4 john, 4 kennedy, 4 april, 4 been, 4 today, 4 with, 4 none, 4 adams, 4 spring, 4 at, 4 had, 3 mrs, 3 lawn, …

Bag of Words Models

Binary Independence Model

Unigram Language Models

Language modeling was first used in speech recognition to model speech generation. In IR, language models are models of text generation.

Typical scenario:

Estimate a language model for every document

Rank documents by the likelihood that the document's model generated the query

Documents are modeled as multinomial distributions over a fixed vocabulary

Unigram Language Models

[Slide figure: an example query and document, each shown as a sequence of terms]

Ranking via Query Likelihood

P(Q | θ_D) = P(q₁ | θ_D) ∙ P(q₂ | θ_D) ∙ P(q₃ | θ_D)

Estimation

P(w | D) = λ ∙ P(w | θ_D) + (1 − λ) ∙ P(w | θ_C)
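The query-likelihood scheme above can be sketched in a few lines, using Jelinek-Mercer smoothing to mix the document multinomial with the collection model. The toy documents and the value λ = 0.8 are illustrative, not from the slides:

```python
import math

def score_query_likelihood(query, doc, collection, lam=0.8):
    """Log query likelihood under a Jelinek-Mercer smoothed unigram
    document model:
        P(w | D) = lam * tf(w, D)/|D| + (1 - lam) * cf(w)/|C|
    """
    score = 0.0
    for w in query:
        p_doc = doc.count(w) / len(doc)
        p_coll = collection.count(w) / len(collection)
        score += math.log(lam * p_doc + (1 - lam) * p_coll)
    return score

# Toy data: the document about the rose garden should win for "rose garden".
d1 = "the white house rose garden tour".split()
d2 = "the president planted trees in spring".split()
coll = d1 + d2
q = "rose garden".split()
print(score_query_likelihood(q, d1, coll) > score_query_likelihood(q, d2, coll))  # True
```

Smoothing with the collection model keeps the log well-defined when a query term is missing from the document, which is exactly why the mixture above is preferred over the raw maximum-likelihood estimate.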

Bag of Words Models

Pros:
Simple model
Easy to implement
Decent results

Cons:
Too simple
Unrealistic assumptions
Inability to explicitly model term dependencies

Beyond Bags of Words

Tree Dependence Model

A method for approximating a complex joint distribution:

Compute the EMIM (expected mutual information measure) between every pair of terms

Build a maximum spanning tree over these pairwise weights

The tree encodes first-order dependencies

[Slide figure: example dependence tree with root D, children A and C under D, and children B and E under C]

P(A, B, C, D, E) = P(D) ∙ P(A | D) ∙ P(C | D) ∙ P(E | C) ∙ P(B | C)
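The construction can be sketched with Prim's algorithm, using a toy symmetric weight table standing in for the pairwise EMIM estimates. The weight values are invented so that the result is a tree rooted at D like the slide's example:

```python
def max_spanning_tree(terms, weight):
    """Prim's algorithm for a maximum spanning tree; weight holds a
    symmetric score (standing in for EMIM) for each unordered term pair."""
    in_tree = {terms[0]}          # root the tree at the first term
    edges = []                    # collected (parent, child) edges
    while len(in_tree) < len(terms):
        parent, child = max(
            ((u, v) for u in in_tree for v in terms if v not in in_tree),
            key=lambda e: weight[frozenset(e)],
        )
        edges.append((parent, child))
        in_tree.add(child)
    return edges

# Invented weights; the four heavy pairs become the tree's edges.
terms = ["D", "A", "B", "C", "E"]
weight = {frozenset(p): s for p, s in [
    (("A", "D"), 0.9), (("C", "D"), 0.8), (("C", "E"), 0.7), (("B", "C"), 0.6),
    (("A", "B"), 0.1), (("A", "C"), 0.1), (("A", "E"), 0.1),
    (("B", "D"), 0.1), (("B", "E"), 0.1), (("D", "E"), 0.1),
]}
print(max_spanning_tree(terms, weight))
# [('D', 'A'), ('D', 'C'), ('C', 'E'), ('C', 'B')]
```

Reading the returned (parent, child) edges off the tree gives the first-order factorization P(D)·P(A|D)·P(C|D)·P(E|C)·P(B|C).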

n-Gram Language Models


[Slide figure: an example query and document, each shown as a sequence of terms]

Ranking via Query Likelihood

P(Q | D) = P(q₁ | D) ∙ P(q₂ | q₁, D) ∙ P(q₃ | q₂, D)

Estimation

P(w_i | w_{i−1}, D) = λ ∙ [ λ₁ ∙ tf(w_{i−1} w_i, D) / tf(w_{i−1}, D) + (1 − λ₁) ∙ cf(w_{i−1} w_i) / cf(w_{i−1}) ] + (1 − λ) ∙ [ λ₂ ∙ tf(w_i, D) / |D| + (1 − λ₂) ∙ cf(w_i) / |C| ]
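A minimal sketch of such a smoothed bigram estimate, interpolating a document bigram model with a smoothed unigram fallback. The interpolation structure here is deliberately simpler than the full slide formula, and the λ values are illustrative:

```python
def p_bigram(w_prev, w, doc, coll, lam=0.5, mu=0.8):
    """Interpolated bigram estimate:
        P(w | w_prev, D) = lam * tf(w_prev w, D)/tf(w_prev, D)
                         + (1-lam) * [mu * tf(w, D)/|D| + (1-mu) * cf(w)/|C|]
    """
    tf_big = sum(1 for a, b in zip(doc, doc[1:]) if (a, b) == (w_prev, w))
    tf_prev = doc.count(w_prev)
    p_big = tf_big / tf_prev if tf_prev else 0.0
    p_uni = mu * doc.count(w) / len(doc) + (1 - mu) * coll.count(w) / len(coll)
    return lam * p_big + (1 - lam) * p_uni

doc = "white house rose garden white house tour".split()
coll = doc + "garden tour of the white house".split()
# "house" follows "white" in the document; "garden" never does:
print(p_bigram("white", "house", doc, coll) > p_bigram("white", "garden", doc, coll))  # True
```

The unigram fallback plays the same role as in the unigram model: it keeps the estimate non-zero for bigrams the document never exhibits.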

Dependency Models

Pros:
More realistic assumptions
Improved effectiveness

Cons:
Less efficient
Limited notion of dependence
Not well understood

State of the Art IR

Desiderata

Our desired model should be able to…

Support standard IR tasks (ranking, query expansion, etc.)

Easily model dependencies between terms

Handle textual and non-textual features

Consistently and significantly improve effectiveness over bag of words models and existing dependence models

Proposed solution: Markov random fields

Markov Random Fields

MRFs provide a general, robust way of modeling a joint distribution.

The anatomy of an MRF:

Graph G
• vertices represent random variables
• edges encode dependence semantics

Potentials over the cliques of G
• non-negative functions over clique configurations
• measure the ‘compatibility’ of a configuration

P_G(X) = (1 / Z_G) ∙ ∏_{c ∈ Cliques(G)} ψ(c ; Λ)
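As a toy instance of this definition, the joint over a three-variable chain MRF can be computed by multiplying clique potentials and normalizing by the partition function. The agreement-favoring potential below is invented for illustration; any non-negative function over clique configurations would do:

```python
import itertools

# Chain-structured MRF over three binary variables A - B - C:
# cliques {A, B} and {B, C}, each scored by the same potential.
def psi(x, y):
    return 2.0 if x == y else 1.0  # non-negative 'compatibility' score

configs = list(itertools.product([0, 1], repeat=3))
unnorm = {x: psi(x[0], x[1]) * psi(x[1], x[2]) for x in configs}
Z = sum(unnorm.values())                   # partition function Z_G
P = {x: v / Z for x, v in unnorm.items()}  # P_G(X) = (1/Z_G) * prod of potentials
print(P[(0, 0, 0)] > P[(0, 1, 0)])  # agreeing configurations are more probable: True
```

Note that ranking by P_G never needs Z_G when it is constant across the objects being ranked, which is what makes MRF ranking tractable in practice.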

Markov Random Fields for IR

Modeling Dependencies

Parameter Tying

In theory, a potential function can be associated with every clique in a graph.
The typical solution is to define potentials over the maximal cliques of G.
We need more fine-grained control over our potentials, so we use clique sets:

A set of cliques that share a parameter and potential function.
We identified 7 clique sets that are relevant to IR tasks.

Clique Sets

Single term document/query cliques

TD = cliques w/ one query term + D

ψ(domestic, D), ψ(adoption, D), ψ(laws, D)

Ordered terms document/query cliques

OD = cliques w/ two or more contiguous query terms + D

ψ(domestic, adoption, D), ψ(adoption, laws, D),ψ(domestic, adoption, laws, D)

Unordered terms document/query cliques

UD = cliques w/ two or more query terms (in any order) + D

ψ(domestic, adoption, D), ψ(adoption, laws, D), ψ(domestic, laws, D), ψ(domestic, adoption, laws, D)

Clique Sets

Single term query cliques
TQ = cliques w/ one query term
ψ(domestic), ψ(adoption), ψ(laws)

Ordered terms query cliques
OQ = cliques w/ two or more contiguous query terms

ψ(domestic, adoption), ψ(adoption, laws), ψ(domestic, adoption, laws)

Unordered terms query cliques
UQ = cliques w/ two or more query terms (in any order)

ψ(domestic, adoption), ψ(adoption, laws), ψ(domestic, laws), ψ(domestic, adoption, laws)

Document clique
D = singleton clique w/ D

ψ(D)
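A ranking function built from these clique sets can be sketched as a weighted sum of features, one term per clique set. The sketch below covers only the three document-side sets (T_D, O_D, U_D) in the sequential-dependence spirit, with simple smoothed match counts standing in for the model's log-probability features; the weights and window size are illustrative:

```python
import math

def sdm_score(query, doc, lam=(0.85, 0.10, 0.05)):
    """Weighted sum of features over three clique sets: single terms (T_D),
    contiguous ordered pairs (O_D), and pairs co-occurring within an
    unordered window (U_D)."""
    lt, lo, lu = lam
    f_t = sum(math.log(1 + doc.count(q)) for q in query)
    pairs = list(zip(query, query[1:]))
    f_o = sum(math.log(1 + sum(1 for i in range(len(doc) - 1)
                               if doc[i:i + 2] == [a, b]))
              for a, b in pairs)
    window = 8
    f_u = sum(math.log(1 + sum(1 for i, w in enumerate(doc) if w == a
                               and b in doc[max(0, i - window):i + window + 1]))
              for a, b in pairs)
    return lt * f_t + lo * f_o + lu * f_u

q = "white house tour".split()
d1 = "the white house tour starts at the white house".split()
d2 = "house paint in white for a garden tour".split()
print(sdm_score(q, d1) > sdm_score(q, d2))  # True
```

The document that keeps the query terms adjacent and in order beats one that merely contains them, which is exactly the behavior the ordered and unordered clique sets are designed to reward.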

Features

Parameter Estimation

Given a set of relevance judgments R, we want the maximum a posteriori estimate:

What is P(Λ | R)? What are P(R | Λ) and P(Λ)?
It depends on how the model is being evaluated!
We want P(Λ | R) to be peaked around the parameter setting that maximizes the metric we are interested in.

Λ̂ = argmax_Λ P(Λ | R)
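One simple way to drive the estimate toward the metric's maximizer is coordinate ascent: line-search one parameter at a time against the evaluation metric itself. This is a sketch of metric-driven training under assumed grid steps and a made-up quadratic toy metric, not the authors' exact procedure:

```python
def coordinate_ascent(metric, n_params=3, steps=(0.0, 0.25, 0.5, 0.75, 1.0), sweeps=5):
    """Cyclically line-search one parameter at a time over a fixed grid,
    keeping whichever value maximizes the (possibly non-differentiable)
    evaluation metric."""
    params = [1.0 / n_params] * n_params
    for _ in range(sweeps):
        for i in range(n_params):
            params[i] = max(steps,
                            key=lambda v: metric(params[:i] + [v] + params[i + 1:]))
    return params

# Toy 'metric' peaked at (0.75, 0.25, 0.0); the search recovers the peak.
peak = [0.75, 0.25, 0.0]
metric = lambda p: -sum((a - b) ** 2 for a, b in zip(p, peak))
print(coordinate_ascent(metric))  # [0.75, 0.25, 0.0]
```

Because the search only ever calls the metric as a black box, it works for rank-based measures like mean average precision that have no useful gradient.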


Evaluation Metric Surfaces

Feature Induction

Query Expansion


Query Expansion Example

Original query: hubble telescope achievements


Applications

Application: Ad Hoc Retrieval

Ad Hoc Results

Application: Web Search

Application: XML Retrieval

Example XML Document

<PLAY>
<TITLE>The Tragedy of Romeo and Juliet</TITLE>
<ACT>
<TITLE>ACT I</TITLE>
<PROLOGUE>
<TITLE>PROLOGUE</TITLE>
<SPEECH>
<SPEAKER>NARRATOR</SPEAKER>
<LINE>Two households, both alike in dignity,</LINE>
<LINE>In fair Verona, where we lay our scene,</LINE>
<LINE>From ancient grudge break to new mutiny,</LINE>
<LINE>Where civil blood makes civil hands unclean.</LINE>
<LINE>From forth the fatal loins of these two foes</LINE>
<LINE>A pair of star-cross’d lovers take their life;</LINE>
…
</SPEECH>
</PROLOGUE>
<SCENE>
<TITLE>SCENE I. Verona. A public place.</TITLE>
<SPEECH>
<SPEAKER>SAMPSON</SPEAKER>
<LINE>Gregory, o’ my word, we’ll not carry coals.</LINE>
…
</PLAY>

Content and Structure Queries

NEXI query language
derived from XPath

allows mixture of content and structure

//scene[about(., poison juliet)]
Return scene tags that are about poison and juliet.

//*[about(., plague both houses)]
Return elements (of any type) about plague both houses.

//scene[about(., juliet chamber)]//speech[about(.//line, jewel)]
Return speech tags about jewel where the scene is about juliet chamber.

Application: Text Classification

Conclusions
