A Markov Random Field Model for Term Dependencies
Chetan Mishra
CS 6501 Paper Presentation
Ideas, graphs, charts, and results from the paper of the same name by Metzler and Croft, 2005 (SIGIR)
CS 6501: Text Mining 2
Agenda
1. Motivation behind the work
2. Background – What is a Markov Random Field (MRF)?
3. Research Insight – How did the authors use MRF to model term dependencies? Results?
4. Future Work – If you thought this was interesting, how could you build on this?
5. Conclusion
CS@UVa
Agenda Motivation Background Research Insight Future Work Conclusion
Motivation
• Terms are not independently distributed
  – A model incorporating term dependencies should outperform a model that ignores them
• One problem: models incorporating term dependencies seemed to do no better, and sometimes worse
  – Statistical models weren't effectively modeling term dependencies
  – Why?
Motivation
• Two problems (from the authors' perspective):
  – Problem 1: Most models have taken bag-of-words-like approaches (which have tremendous data requirements)
  – Solution 1: We need a new type of model
  – Problem 2: Term dependency modeling (even with a reasonable model) requires a significant corpus
  – Solution 2: Add large, web-scraped corpora to the research test collections
Background
• What is a Markov random field (MRF) model?
  – A fancy name for an undirected graph-based model
  – Often used in machine learning to succinctly model joint distributions
• MRF models are used in the paper to tackle the problem of retrieving documents in response to a query
Model Overview
• Problem: Find documents that are relevant to a query Q
  – Imagine there's a set of documents relevant with respect to each query.
  – We will model the probability P(D|Q) of a document D being relevant to a query Q
  – The model will provide the user a ranked list of documents
Model Overview
• We will be modeling the joint distribution

    P_Λ(Q, D) = (1/Z_Λ) ∏_{c ∈ C(G)} ψ(c; Λ)

• ψ(c; Λ) is called the "potential function"
  – Its exact form depends widely on the problem one is solving
  – Non-negative

P_Λ(Q, D) is the joint distribution of the query and document. C(G) is the set of cliques in MRF G. ψ(c; Λ) is a measure of the compatibility of the term(s) in clique c with the topic of D.
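The factorization above can be sketched in a few lines of Python. The cliques and potential values here are made-up stand-ins for illustration, not the paper's actual features:

```python
def joint_score(cliques, psi):
    """Unnormalized joint P_Lambda(Q, D): the product of non-negative
    clique potentials psi(c) over all cliques c in C(G)."""
    score = 1.0
    for c in cliques:
        value = psi(c)
        assert value >= 0, "potential functions must be non-negative"
        score *= value
    return score

# Toy example: cliques are tuples of query terms (each implicitly joined
# to the document node), with hypothetical compatibility scores.
toy_psi = {("markov",): 2.0, ("random",): 1.5, ("markov", "random"): 3.0}
cliques = list(toy_psi)
print(joint_score(cliques, lambda c: toy_psi[c]))  # 2.0 * 1.5 * 3.0 = 9.0
```

Dividing by the normalizer Z_Λ would turn this score into a proper probability, but as the next slide shows, Z_Λ can be ignored for ranking.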
Model Overview
• Model output needs to be ranked
  – Since only the order matters, we can make the following simplifications:

    P_Λ(D|Q) = P_Λ(Q, D) / P_Λ(Q)          (by the law of conditional probability)

    Since Z_Λ and P_Λ(Q) are the same for every document,

    P_Λ(D|Q)  rank=  log P_Λ(Q, D) − log P_Λ(Q)  rank=  Σ_{c ∈ C(G)} log ψ(c; Λ)
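Ranking by the sum of log potentials is then straightforward. A minimal sketch, where the log-potential function is a hypothetical stand-in (a log term-count rather than anything from the paper):

```python
import math

def rank_documents(docs, cliques, log_psi):
    """Rank documents by the sum over cliques of log psi(c, D),
    which is rank-equivalent to P(D|Q)."""
    scored = [(sum(log_psi(c, d) for c in cliques), d) for d in docs]
    return [d for score, d in sorted(scored, reverse=True)]

# Hypothetical log-potential: log(1 + count) of each clique term in the doc.
def toy_log_psi(clique, doc):
    return sum(math.log1p(doc.count(term)) for term in clique)

docs = ["markov random field model", "field notes", "random thoughts"]
cliques = [("markov",), ("random",), ("markov", "random")]
print(rank_documents(docs, cliques, toy_log_psi))
```

The document matching both query terms outscores the one matching a single term, which outscores the one matching none.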
The Markov Random Field Model
• What is G?
  – The Markov random field's underlying graph
• What does G contain?
  – A document node D and a node for each term in the query (q1, …, qn)
  – Edges between every qi and D
  – Edges between each qi and qj that are not sufficiently independent
The Markov Random Field Model
• “Edges between each qi and qj that are not sufficiently independent.”
• Recall P_Λ(Q, D) ∝ ∏_{c ∈ C(G)} ψ(c; Λ)
  – This means that each clique represents a set of completely mutually dependent words plus the document.
  – Ranking scores each document by the sum of log potentials over these mutually dependent subsets of query terms
The Markov Random Field Model
• The paper looks at the performance of three general types of dependence assumptions:
  – Full independence
  – Sequential dependence
  – Full dependence
• Visual depiction: [graph diagrams of the three variants, from the paper]
Metzler and Croft ‘05
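The three variants differ only in which edges, and therefore which cliques, the graph contains. A sketch that enumerates the query-term cliques for each variant (listing only the cliques that contain query terms; each is implicitly joined to the document node D):

```python
from itertools import combinations

def term_cliques(query_terms, variant):
    """Query-term cliques under the three dependence assumptions."""
    singles = [(t,) for t in query_terms]
    if variant == "full_independence":
        # No edges between terms: only single-term cliques.
        return singles
    if variant == "sequential_dependence":
        # Edges between adjacent query terms: add bigram cliques.
        return singles + [tuple(query_terms[i:i + 2])
                          for i in range(len(query_terms) - 1)]
    if variant == "full_dependence":
        # Edges between all term pairs: every non-empty subset is a clique.
        return [c for k in range(1, len(query_terms) + 1)
                for c in combinations(query_terms, k)]
    raise ValueError(variant)

q = ["white", "house", "rose", "garden"]
print(len(term_cliques(q, "full_independence")))     # 4
print(len(term_cliques(q, "sequential_dependence")))  # 4 singles + 3 bigrams = 7
print(len(term_cliques(q, "full_dependence")))        # 2^4 - 1 = 15
```

The exponential growth of the full-dependence clique set is why it is only practical for short queries.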
Potential Functions
• How do we measure how relevant a set of mutually dependent words is to a document?
  – If one word: a potential based on a smoothed term frequency of the word in D
  – If >1 word and the exact sequence is unobserved: a potential based on a smoothed count of the clique's terms appearing unordered within a window of extra terms in D
  – If >1 word and the sequence is observed: a potential based on a smoothed count of the ordered sequence appearing in D

All potentials are on a log scale!
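For the single-word case, one smoothed estimate consistent with the paper's language-modeling setup is a Dirichlet-smoothed term probability; the μ default below is a typical value, an assumption rather than a number from the slides, and the λ weight and window-based multi-term counts are omitted:

```python
import math

def log_potential_term(tf, doc_len, cf, coll_len, mu=2500):
    """Single-term log potential: log of the Dirichlet-smoothed
    language-model probability of the term given the document.
    tf: term count in the doc; cf: term count in the whole collection."""
    return math.log((tf + mu * cf / coll_len) / (doc_len + mu))

# Example: term appears 3 times in a 100-word doc and
# 1000 times in a 10-million-word collection.
score = log_potential_term(tf=3, doc_len=100, cf=1000, coll_len=10_000_000)
print(score)
```

Smoothing keeps the potential finite (and non-zero before the log) even when a term never appears in the document, which the product-of-potentials form requires.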
Parameter Training
• Don’t use Maximum Likelihood Estimation. Why?
  – The sample space is extremely large compared to the training data
  – An MLE estimate is unlikely to be accurate
• Instead, let’s directly maximize our retrieval accuracy metric (mean average precision)
  – And let’s constrain the clique-type weights: λ_T + λ_O + λ_U = 1
Parameter Training
• What optimization technique do we use?
  – Via a parameter sweep, the authors found that the metric surface has a common shape across collections
  – A hill-climbing search should work well
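A minimal sketch of such a search: coordinate-wise hill climbing over the three weights, kept on the simplex by renormalizing after each move. The quadratic toy surface stands in for the real metric (mean average precision), which would require running retrieval over a training collection:

```python
def hill_climb(metric, start, step=0.05, iters=100):
    """Greedy hill climbing over (lambda_T, lambda_O, lambda_U),
    constrained to sum to 1, maximizing the given retrieval metric."""
    def normalize(w):
        total = sum(w)
        return tuple(x / total for x in w)

    best = normalize(start)
    best_score = metric(best)
    for _ in range(iters):
        improved = False
        for i in range(3):           # try nudging each weight up and down
            for delta in (step, -step):
                cand = list(best)
                cand[i] = max(1e-9, cand[i] + delta)
                cand = normalize(cand)
                s = metric(cand)
                if s > best_score:
                    best, best_score, improved = cand, s, True
        if not improved:             # local optimum within one step
            break
    return best, best_score

# Toy concave surface peaked at (0.8, 0.1, 0.1), standing in for MAP.
peak = (0.8, 0.1, 0.1)
toy_metric = lambda w: -sum((a - b) ** 2 for a, b in zip(w, peak))
weights, score = hill_climb(toy_metric, start=(1/3, 1/3, 1/3))
print(tuple(round(x, 2) for x in weights))
```

Because the authors observed a single broad peak in the metric surface, a greedy search like this is unlikely to stall in a bad local optimum.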
Results
• Did MRFs help?
  – I’d say so. Significant gains across data sets
[Results table from the paper: retrieval performance under full independence, sequential dependence, and full dependence]
Future Work
• Query expansion
  – If we know a document relates to a query we just received, how can we expand the query?
• Statistical techniques to indicate which terms should be declared “dependent”
  – Perhaps based on an expected mutual information measure
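One way to pursue that last idea: compute the mutual information between two terms from a 2x2 document co-occurrence table and add a dependence edge only when it exceeds a threshold. This is a standard formulation sketched here as an illustration, not something from the paper:

```python
import math

def mutual_information(n11, n10, n01, n00):
    """Mutual information between two terms from document counts:
    n11 docs contain both terms, n10 only the first,
    n01 only the second, n00 neither."""
    n = n11 + n10 + n01 + n00
    mi = 0.0
    # For each cell: (joint count, marginal of row event, marginal of column event)
    for joint, row, col in [
        (n11, n11 + n10, n11 + n01),
        (n10, n11 + n10, n10 + n00),
        (n01, n01 + n00, n11 + n01),
        (n00, n01 + n00, n10 + n00),
    ]:
        if joint:
            mi += (joint / n) * math.log((n * joint) / (row * col))
    return mi

# Terms that always co-occur carry maximal information about each other.
print(mutual_information(50, 0, 0, 50))
# Independent terms: co-occurrence matches chance, so MI is 0.
print(mutual_information(25, 25, 25, 25))
```

High-MI pairs would get a dependence edge (and hence a multi-term clique); low-MI pairs would be treated as independent, pruning the clique set.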
Conclusion
1. Motivation behind the work
2. Background – What is a Markov Random Field (MRF) model?
3. Research Insight – How did the authors use MRF to model term dependencies? Results?
4. Future Work – If you thought this was interesting, how could you build on this?
5. Conclusion