Risk Minimization and Language Modeling in Text Retrieval. ChengXiang Zhai. Thesis Committee: John Lafferty (Chair), Jamie Callan, Jaime Carbonell, David A. Evans, W. Bruce Croft (Univ. of Massachusetts, Amherst)


Page 1: Risk Minimization and  Language Modeling in Text Retrieval

1

Risk Minimization and Language Modeling in Text Retrieval

ChengXiang Zhai

Thesis Committee:
John Lafferty (Chair), Jamie Callan,
Jaime Carbonell, David A. Evans,
W. Bruce Croft (Univ. of Massachusetts, Amherst)

Page 2: Risk Minimization and  Language Modeling in Text Retrieval

2

Information Overflow

[Chart: Web Site Growth]

Page 3: Risk Minimization and  Language Modeling in Text Retrieval

3

Text Retrieval (TR)

• User issues a query ("Tips on thesis defense") to the Retrieval System
• The Retrieval System searches a database/collection of text docs
• Relevant docs are returned to the User

Page 4: Risk Minimization and  Language Modeling in Text Retrieval

4

Challenges in TR

Relevance (independent, topical)

Ad hoc parameter tuning

Utility

Page 5: Risk Minimization and  Language Modeling in Text Retrieval

5

Sophisticated Parameter Tuning in the Okapi System

(Robertson et al. 1999)

“k1, b and k3 are parameters which depend on the nature of the queries and possibly on the database; k1 and b default to 1.2 and 0.75 respectively, but smaller values of b are sometimes advantageous; in long queries k3 is often set to 7 or 1000 (effectively infinite).”

Page 6: Risk Minimization and  Language Modeling in Text Retrieval

6

More Than “Relevance”

Relevance Ranking vs. Desired Ranking

Redundancy

Readability

Page 7: Risk Minimization and  Language Modeling in Text Retrieval

7

Meeting the Challenges

Bayesian Decision Theory + Statistical Language Models

→ Risk Minimization Framework

→ Utility-based Retrieval, Parameter Estimation

Page 8: Risk Minimization and  Language Modeling in Text Retrieval

8

Map of Thesis

New TR Framework:
• Risk Minimization Framework

New TR Models → Features:
• Two-stage Language Model → automatic parameter setting
• KL-divergence Retrieval Model → natural incorporation of feedback
• Aspect Retrieval Model → non-traditional ranking

Page 9: Risk Minimization and  Language Modeling in Text Retrieval

9

Retrieval as Decision-Making

Query → Ranked list? (1 2 3 4)
        Unordered subset?
        Clustering?

Given a query,
– Which documents should be selected? (D)
– How should these docs be presented to the user? (π)
Choose: (D, π)

Page 10: Risk Minimization and  Language Modeling in Text Retrieval

10

Generative Model of Document & Query

User U → query model θ_Q ~ p(θ_Q | U) (partially observed) → query q ~ p(q | θ_Q, U) (observed)

Source S → doc model θ_D ~ p(θ_D | S) (inferred) → document d ~ p(d | θ_D, S) (observed)

Page 11: Risk Minimization and  Language Modeling in Text Retrieval

11

Bayesian Decision Theory

Choices: (D₁, π₁), (D₂, π₂), …, (Dₙ, πₙ), each with loss L

Observed: query q, user U, doc set C, source S
Hidden: θ = (θ_Q, θ₁, …, θ_N)

  (D*, π*) = argmin_{D,π} ∫_Θ L(D, π, θ) p(θ | q, U, C, S) dθ

Bayes risk for choice (D, π) → RISK MINIMIZATION

Page 12: Risk Minimization and  Language Modeling in Text Retrieval

12

Special Cases

• Set-based models (choose D) → Boolean model

• Ranking models (choose π)

– Independent loss (→ PRP)

• Relevance-based loss → probabilistic relevance model, two-stage LM

• Distance-based loss → vector-space model, KL-divergence model

– Dependent loss

• MMR loss

• MDR loss → aspect retrieval model

Page 13: Risk Minimization and  Language Modeling in Text Retrieval

13

Map of Existing TR Models

Relevance

• Similarity: Δ(R(q), R(d)); different rep & similarity
  – Vector space model (Salton et al., 75)
  – Prob. distr. model (Wong & Yao, 89)

• Probability of Relevance: P(r=1|q,d), r ∈ {0,1}
  – Generative Model
    • Doc generation: Classical prob. model (Robertson & Sparck Jones, 76)
    • Query generation: LM approach (Ponte & Croft, 98; Lafferty & Zhai, 01a)
  – Regression Model (Fox, 83)

• Probabilistic inference: P(d→q) or P(q→d); different inference system
  – Prob. concept space model (Wong & Yao, 95)
  – Inference network model (Turtle & Croft, 91)

Page 14: Risk Minimization and  Language Modeling in Text Retrieval

14

Where Are We?

Risk Minimization Framework
• Two-stage Language Model
• KL-divergence Retrieval Model
• Aspect Retrieval Model

Page 15: Risk Minimization and  Language Modeling in Text Retrieval

15

Two-stage Language Models

Generative model: θ_Q ~ p(θ_Q|U), q ~ p(q|θ_Q, U); θ_D ~ p(θ_D|S), d ~ p(d|θ_D, S)

Loss function:
  l(d, θ_Q, θ_D) = 0 if θ_Q = θ_D, and c otherwise

Risk ranking formula: rank d by
  R(d, q) ∝ p(q | θ̂_D, U)

Stage 1: compute θ̂_D (Dirichlet prior smoothing)
Stage 2: compute p(q | θ̂_D, U) (mixture model)
→ Two-stage smoothing

Page 16: Risk Minimization and  Language Modeling in Text Retrieval

16

The Need for Query Modeling (Dual Role of Smoothing)

Verbose queries

Keyword queries

Page 17: Risk Minimization and  Language Modeling in Text Retrieval

17

Interaction of the Two Roles of Smoothing

Relative performance of JM, Dir., and AD (precision):

Query Type | JM    | Dir   | AD
Title      | 0.228 | 0.256 | 0.237
Long       | 0.278 | 0.276 | 0.260

[Bar chart: precision by method (JM, DIR, AD) for Title vs. Long queries]

Page 18: Risk Minimization and  Language Modeling in Text Retrieval

18

Two-stage Smoothing

  p(w|d) = (1−λ) · (c(w,d) + μ·p(w|C)) / (|d| + μ) + λ·p(w|U)

Stage 1 (Dirichlet prior, Bayesian): explains unseen words
Stage 2 (2-component mixture): explains noise in the query
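As an illustration (not code from the thesis), a minimal Python sketch of the two-stage smoothed probability and the resulting query-likelihood score; the function names and the defaults μ=2000, λ=0.7 are illustrative assumptions:

```python
import math
from collections import Counter

def two_stage_prob(w, doc_counts, doc_len, p_coll, p_user, mu=2000.0, lam=0.7):
    """Two-stage smoothed p(w|d): Dirichlet-prior smoothing toward the
    collection model p(w|C), then interpolation with a user background
    model p(w|U).  p_coll and p_user map word -> probability."""
    # Stage 1: Dirichlet prior smoothing (explains unseen document words)
    p_dir = (doc_counts.get(w, 0) + mu * p_coll.get(w, 0.0)) / (doc_len + mu)
    # Stage 2: two-component mixture (explains noise in the query)
    return (1 - lam) * p_dir + lam * p_user.get(w, 0.0)

def query_likelihood(query_terms, doc, p_coll, p_user, mu=2000.0, lam=0.7):
    """Score a document by log p(q | theta_D, U) under two-stage smoothing."""
    counts = Counter(doc)
    return sum(math.log(two_stage_prob(w, counts, len(doc), p_coll, p_user, mu, lam))
               for w in query_terms)
```

Because each stage interpolates proper distributions, the smoothed probabilities still sum to one over the vocabulary.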

Page 19: Risk Minimization and  Language Modeling in Text Retrieval

19

Estimating μ Using Leave-One-Out

Leave-one-out: predict each word occurrence from the document with that occurrence removed: P(w₁|d−w₁), P(w₂|d−w₂), …, P(wₙ|d−wₙ).

Log-likelihood:

  ℓ₋₁(μ|C) = Σᵢ₌₁ᴺ Σ_{w∈dᵢ} c(w,dᵢ) · log[ (c(w,dᵢ) − 1 + μ·p(w|C)) / (|dᵢ| − 1 + μ) ]

Maximum Likelihood Estimator:  μ̂ = argmax_μ ℓ₋₁(μ|C)  (solved with Newton's method)
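A sketch of the leave-one-out estimate of μ; the thesis solves for μ̂ with Newton's method, while this illustrative version uses a coarse grid search (the function names and the grid values are assumptions):

```python
import math
from collections import Counter

def loo_loglik(mu, docs, p_coll):
    """Leave-one-out log-likelihood l_-1(mu|C): each occurrence of w in d
    is predicted from d with that one occurrence held out."""
    ll = 0.0
    for d in docs:
        counts, n = Counter(d), len(d)
        for w, c in counts.items():
            ll += c * math.log((c - 1 + mu * p_coll[w]) / (n - 1 + mu))
    return ll

def estimate_mu(docs, p_coll, grid=(10.0, 50.0, 100.0, 500.0, 1000.0, 2000.0, 5000.0)):
    """Pick the mu on the grid maximizing the leave-one-out likelihood."""
    return max(grid, key=lambda mu: loo_loglik(mu, docs, p_coll))
```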

Page 20: Risk Minimization and  Language Modeling in Text Retrieval

20

Estimating λ Using a Mixture Model

Stage 1: smoothed doc models p(w|d₁), …, p(w|d_N) (Dirichlet prior with μ̂)
Stage 2: for each document, the query q = q₁…q_m is modeled as (1−λ)·p(w|dᵢ) + λ·p(w|U)

  p(q|λ, U) = Πᵢ₌₁ᴺ Πⱼ₌₁ᵐ [ (1−λ)·p(qⱼ|θ̂_{dᵢ}) + λ·p(qⱼ|U) ]

  λ̂ = argmax_λ p(q|λ, U)

Maximum Likelihood Estimator, computed with the Expectation-Maximization (EM) algorithm.

Page 21: Risk Minimization and  Language Modeling in Text Retrieval

21

Average precision (3 DBs × 4 query types, 150 topics); automatic 2-stage results vs. optimal 1-stage results:

Collection | Query | Optimal-JM | Optimal-Dir | Auto-2stage
AP88-89    | SK    | 20.3%      | 23.0%       | 22.2%*
AP88-89    | LK    | 36.8%      | 37.6%       | 37.4%
AP88-89    | SV    | 18.8%      | 20.9%       | 20.4%
AP88-89    | LV    | 28.8%      | 29.8%       | 29.2%
WSJ87-92   | SK    | 19.4%      | 22.3%       | 21.8%*
WSJ87-92   | LK    | 34.8%      | 35.3%       | 35.8%
WSJ87-92   | SV    | 17.2%      | 19.6%       | 19.9%
WSJ87-92   | LV    | 27.7%      | 28.2%       | 28.8%*
ZIFF1-2    | SK    | 17.9%      | 21.5%       | 20.0%
ZIFF1-2    | LK    | 32.6%      | 32.6%       | 32.2%
ZIFF1-2    | SV    | 15.6%      | 18.5%       | 18.1%
ZIFF1-2    | LV    | 26.7%      | 27.9%       | 27.9%*

Page 22: Risk Minimization and  Language Modeling in Text Retrieval

22

Where Are We?

Risk Minimization Framework
• Two-stage Language Model
• KL-divergence Retrieval Model
• Aspect Retrieval Model

Page 23: Risk Minimization and  Language Modeling in Text Retrieval

23

KL-divergence Retrieval Models

Generative model: θ_Q ~ p(θ_Q|U), q ~ p(q|θ_Q, U); θ_D ~ p(θ_D|S), d ~ p(d|θ_D, S)

Loss function:
  l(d, θ_Q, θ_D) = c·Δ(θ_Q, θ_D) = c·D(θ_Q || θ_D)

Risk ranking formula: rank d by
  R(d, q) ∝ D(θ̂_Q || θ̂_D)  (ascending divergence)
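A minimal sketch of KL-divergence ranking over word distributions represented as dicts (the dict representation and the `eps` guard for zero probabilities are implementation conveniences, not the thesis's estimator):

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """D(p || q) over word distributions given as dicts; eps guards log(0)."""
    return sum(pw * math.log(pw / max(q.get(w, 0.0), eps))
               for w, pw in p.items() if pw > 0)

def rank_by_kl(query_model, doc_models):
    """Rank doc ids by ascending D(theta_Q || theta_D): a smaller divergence
    between query model and doc model means a better match."""
    return sorted(doc_models, key=lambda d: kl_divergence(query_model, doc_models[d]))
```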

Page 24: Risk Minimization and  Language Modeling in Text Retrieval

24

Expansion-based vs. Model-based

• Expansion-based feedback (query-likelihood scoring, p(Q|θ_D)): the feedback docs are used to modify the query Q itself, which is then scored against each doc model θ_D.

• Model-based feedback (KL-divergence scoring, D(θ_Q || θ_D)): the feedback docs are used to modify the query model θ_Q, which is then compared against each doc model θ_D.

Page 25: Risk Minimization and  Language Modeling in Text Retrieval

25

Feedback as Model Interpolation

Document D → doc model θ_D; scoring: D(θ_Q' || θ_D) → results

  θ_Q' = (1−α)·θ_Q + α·θ_F

θ_F is estimated from the feedback docs F = {d₁, d₂, …, dₙ} (generative model or divergence minimization).

α = 0: no feedback (θ_Q' = θ_Q)
α = 1: full feedback (θ_Q' = θ_F)
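The interpolation can be sketched directly (assuming both models are dicts over a shared vocabulary):

```python
def interpolate_feedback(theta_q, theta_f, alpha):
    """theta_Q' = (1 - alpha) * theta_Q + alpha * theta_F.
    alpha=0 keeps the original query model; alpha=1 uses only feedback."""
    vocab = set(theta_q) | set(theta_f)
    return {w: (1 - alpha) * theta_q.get(w, 0.0) + alpha * theta_f.get(w, 0.0)
            for w in vocab}
```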

Page 26: Risk Minimization and  Language Modeling in Text Retrieval

26

θ_F Estimation Method I: Generative Mixture Model

Each word occurrence in F = {d₁, …, dₙ} is generated by the topic model P(w|θ) (topic words) with probability 1−λ, or by the background model P(w|C) (background words) with probability λ:

  log p(F|θ) = Σᵢ Σ_w c(w, dᵢ) · log[ (1−λ)·p(w|θ) + λ·p(w|C) ]

Maximum Likelihood:  θ_F = argmax_θ log p(F|θ)
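A hedged EM sketch for Method I with the noise weight λ held fixed; the initialization, iteration count, and dict-based representation are assumptions for illustration, not the thesis's implementation:

```python
from collections import Counter

def estimate_feedback_model(feedback_docs, p_coll, lam=0.5, iters=20):
    """EM for the two-component mixture: each word occurrence in F comes
    from the topic model theta_F with prob (1 - lam) or from the background
    p(w|C) with prob lam.  lam is held fixed; theta_F is re-estimated."""
    counts = Counter(w for d in feedback_docs for w in d)
    total = sum(counts.values())
    theta = {w: c / total for w, c in counts.items()}   # ML initialization
    for _ in range(iters):
        # E-step: posterior that an occurrence of w came from theta_F
        post = {w: (1 - lam) * theta[w] /
                   ((1 - lam) * theta[w] + lam * p_coll.get(w, 1e-12))
                for w in counts}
        # M-step: re-estimate theta_F from expected topic counts
        norm = sum(counts[w] * post[w] for w in counts)
        theta = {w: counts[w] * post[w] / norm for w in counts}
    return theta
```

The effect is the one shown on the next slide: frequent background words (high p(w|C)) are explained by the background component, so the estimated θ_F concentrates on topical words.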

Page 27: Risk Minimization and  Language Modeling in Text Retrieval

27

θ_F Estimation Method II: Empirical Divergence Minimization

θ should stay close to the feedback doc models θ_{d₁}, …, θ_{dₙ} (F = {d₁, …, dₙ}) and far from the background model θ_C:

  D_emp(θ, F, C) = (1/n)·Σᵢ₌₁ⁿ D(θ || θ_{dᵢ}) − λ·D(θ || θ_C)

Divergence minimization:  θ_F = argmin_θ D_emp(θ, F, C)

Page 28: Risk Minimization and  Language Modeling in Text Retrieval

28

Example of Feedback Query Model

TREC topic 412: "airport security" (mixture model approach, Web database, top 10 docs)

λ = 0.9:

W | p(W|θ_F)
security 0.0558
airport 0.0546
beverage 0.0488
alcohol 0.0474
bomb 0.0236
terrorist 0.0217
author 0.0206
license 0.0188
bond 0.0186
counter-terror 0.0173
terror 0.0142
newsnet 0.0129
attack 0.0124
operation 0.0121
headline 0.0121

λ = 0.7:

W | p(W|θ_F)
the 0.0405
security 0.0377
airport 0.0342
beverage 0.0305
alcohol 0.0304
to 0.0268
of 0.0241
and 0.0214
author 0.0156
bomb 0.0150
terrorist 0.0137
in 0.0135
license 0.0127
state 0.0127
by 0.0125

Page 29: Risk Minimization and  Language Modeling in Text Retrieval

29

Model-based Feedback vs. Simple LM

Collection | Metric | Simple LM | Mixture   | Improv. | Div.Min.  | Improv.
AP88-89    | AvgPr  | 0.21      | 0.296     | +41%    | 0.295     | +40%
AP88-89    | InitPr | 0.617     | 0.591     | -4%     | 0.617     | +0%
AP88-89    | Recall | 3067/4805 | 3888/4805 | +27%    | 3665/4805 | +19%
TREC8      | AvgPr  | 0.256     | 0.282     | +10%    | 0.269     | +5%
TREC8      | InitPr | 0.729     | 0.707     | -3%     | 0.705     | -3%
TREC8      | Recall | 2853/4728 | 3160/4728 | +11%    | 3129/4728 | +10%
WEB        | AvgPr  | 0.281     | 0.306     | +9%     | 0.312     | +11%
WEB        | InitPr | 0.742     | 0.732     | -1%     | 0.728     | -2%
WEB        | Recall | 1755/2279 | 1758/2279 | +0%     | 1798/2279 | +2%

Page 30: Risk Minimization and  Language Modeling in Text Retrieval

30

Where Are We?

Risk Minimization Framework
• Two-stage Language Model
• KL-divergence Retrieval Model
• Aspect Retrieval Model

Page 31: Risk Minimization and  Language Modeling in Text Retrieval

31

Aspect Retrieval

Query: What are the applications of robotics in the world today?

Find as many DIFFERENT applications as possible.

Example Aspects:
A1: spot-welding robotics
A2: controlling inventory
A3: pipe-laying robots
A4: talking robot
A5: robots for loading & unloading memory tapes
A6: robot [telephone] operators
A7: robot cranes
… …

Aspect judgments:

     A1 A2 A3 …    Ak
d1    1  1  0  0 … 0 0
d2    0  1  1  1 … 0 0
d3    0  0  0  0 … 1 0
…
dk    1  0  1  0 … 0 1

Page 32: Risk Minimization and  Language Modeling in Text Retrieval

32

Evaluation Measures

• Aspect Coverage (AC): measures per-doc coverage

– #distinct-aspects / #docs

– Maximizing AC is equivalent to the "set cover" problem, NP-hard

• Aspect Uniqueness (AU): measures redundancy

– #distinct-aspects / #aspects

– Maximizing AU is equivalent to the "volume cover" problem, NP-hard

• Example

d1: 0001001   d2: 0101100   d3: 1000101

Accumulated counts:
#doc        1        2        3
#asp        2        5        8
#uniq-asp   2        4        5
AC       2/1=2.0  4/2=2.0  5/3=1.67
AU       2/2=1.0  4/5=0.8  5/8=0.625
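The accumulated AC/AU computation can be sketched as follows (the input representation, a ranked list of per-document aspect sets, is an assumption); it reproduces the slide's example counts:

```python
def aspect_metrics(ranked_aspect_sets):
    """Accumulated Aspect Coverage (AC = #distinct-aspects / #docs) and
    Aspect Uniqueness (AU = #distinct-aspects / #aspects) at each rank."""
    seen, total_aspects = set(), 0
    ac, au = [], []
    for k, aspects in enumerate(ranked_aspect_sets, start=1):
        total_aspects += len(aspects)
        seen |= set(aspects)
        ac.append(len(seen) / k)
        au.append(len(seen) / total_aspects)
    return ac, au
```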

Page 33: Risk Minimization and  Language Modeling in Text Retrieval

33

Loss Function L(θ_{k+1} | θ₁ … θ_k)

Known: ranked docs d₁, …, d_k with models θ₁, …, θ_k; candidate d_{k+1} with model θ_{k+1}.

Maximal Marginal Relevance (MMR):
combine Novelty/Redundancy Nov(θ_{k+1} | θ₁ … θ_k) with Relevance Rel(θ_{k+1});
the best d_{k+1} is novel & relevant.

Maximal Diverse Relevance (MDR):
compare aspect coverage distributions p(a|θᵢ);
the best d_{k+1} is complementary in coverage.

Page 34: Risk Minimization and  Language Modeling in Text Retrieval

34

Maximal Marginal Relevance (MMR) Models

• Maximizing aspect coverage indirectly through redundancy elimination

• Elements

– Redundancy/Novelty measure

– Combination of novelty and relevance

• Proposed & studied six novelty measures

• Proposed & studied four combination strategies

Page 35: Risk Minimization and  Language Modeling in Text Retrieval

35

Comparison of Novelty Measures (Aspect Coverage)

[Plot: Avg. Aspect Coverage vs. Aspect Recall for Relevance, AvgKL, AvgMix, KLMin, KLAvg, MixMin, MixAvg]

Page 36: Risk Minimization and  Language Modeling in Text Retrieval

36

Comparison of Novelty Measures (Aspect Uniqueness)

[Plot: Avg. Aspect Uniqueness vs. Aspect Recall for Relevance, AvgKL, AvgMix, KLMin, KLAvg, MixMin, MixAvg]

Page 37: Risk Minimization and  Language Modeling in Text Retrieval

37

A Mixture Model for Redundancy

A candidate document is modeled as a mixture of the reference (old) document model and the collection background model:

  p(w|New) = λ·p(w|Old) + (1−λ)·p(w|Background)

λ = ? Estimated by Maximum Likelihood via the Expectation-Maximization algorithm; the fitted mixture weight λ serves as the redundancy measure.

Page 38: Risk Minimization and  Language Modeling in Text Retrieval

38

Cost-based Combination of Relevance and Novelty

Loss:

  l(d_{k+1} | d₁, …, d_k, {θᵢ}, θ_Q) = c₂·p(Rel|d_{k+1})·(1 − p(New|d_{k+1})) + c₃·(1 − p(Rel|d_{k+1}))

Minimizing this loss is rank-equivalent to maximizing

  p(q|d_{k+1}) · (ρ + p(New|d_{k+1})),  where ρ = (c₃ − c₂)/c₂ ≥ 0 (c₃ ≥ c₂)

Relevance score: p(q|d_{k+1});  Novelty score: p(New|d_{k+1})
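A greedy re-ranker using the rank-equivalent combination score; this is a sketch, and the `rel` dict, the `novelty` callback, and collapsing the costs into a single constant ρ are illustrative assumptions:

```python
def mmr_rerank(candidates, rel, novelty, rho=1.0):
    """Greedy ranking by the rank-equivalent cost-based combination:
    at each step pick the candidate maximizing
        rel(d) * (rho + novelty(d, selected)),
    i.e. relevance score times (rho + novelty score)."""
    selected, remaining = [], list(candidates)
    while remaining:
        best = max(remaining, key=lambda d: rel[d] * (rho + novelty(d, selected)))
        selected.append(best)
        remaining.remove(best)
    return selected
```

A small ρ trades relevance for novelty aggressively; a large ρ reduces the ranking to pure relevance.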

Page 39: Risk Minimization and  Language Modeling in Text Retrieval

39

Maximal Diverse Relevance (MDR) Models

• Maximizing aspect coverage directly through aspect modeling

• Elements

– Aspect loss function

– Generative Aspect Model

• Proposed & studied KL-divergence aspect loss function

• Explored two aspect models (PLSI, LDA)

Page 40: Risk Minimization and  Language Modeling in Text Retrieval

40

Aspect Generative Model of Document & Query

User U → θ_Q ~ p(θ_Q|U) → query q ~ p(q|θ_Q)
Source S → θ_D ~ p(θ_D|S) → document d ~ p(d|θ_D), with aspect models θ = (θ₁, …, θ_k)

PLSI:
  p(d|θ_D) = Πᵢ₌₁ⁿ Σ_{a∈A} p(a|θ_D)·p(dᵢ|a),  where d = d₁ … dₙ

LDA:
  p(d|θ_D) = ∫ Πᵢ₌₁ⁿ Σ_{a∈A} p(a|π)·p(dᵢ|a) · Dir(π|θ_D) dπ

Page 41: Risk Minimization and  Language Modeling in Text Retrieval

41

Aspect Loss Function

  l(θ_{k+1} | θ₁, …, θ_k, θ_Q, {dᵢ})  rank=  D(θ_Q || θ̄_{1,…,k+1})

where the combined coverage is

  p(a | θ̄_{1,…,k+1}) = (1/(k+1))·p(a|θ_{k+1}) + (1 − 1/(k+1))·p(a|θ̄_{1,…,k})

Page 42: Risk Minimization and  Language Modeling in Text Retrieval

42

Aspect Loss Function: Illustration

Desired coverage p(a|Q) is compared against the "already covered" distributions p(a|θ₁) … p(a|θ_{k−1}) and a new candidate p(a|θ_k): a candidate can be non-relevant, redundant, or perfect.

Combined coverage:

  p(a | θ̄_{1,…,k}) = (1/k)·p(a|θ_k) + (1 − 1/k)·p(a|θ̄_{1,…,k−1})
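A sketch of the aspect loss over aspect-coverage distributions represented as dicts (the representation and the `eps` guard are assumptions for illustration):

```python
import math

def combined_coverage(prev_cov, new_cov, k):
    """Coverage after adding a candidate to k already-selected docs:
    interpolate the accumulated coverage with the candidate's coverage."""
    w = 1.0 / (k + 1)
    vocab = set(prev_cov) | set(new_cov)
    return {a: w * new_cov.get(a, 0.0) + (1 - w) * prev_cov.get(a, 0.0)
            for a in vocab}

def aspect_loss(desired, prev_cov, new_cov, k, eps=1e-12):
    """Aspect loss: KL divergence from the desired coverage p(a|Q) to the
    combined coverage after adding the candidate (smaller is better)."""
    comb = combined_coverage(prev_cov, new_cov, k)
    return sum(p * math.log(p / max(comb.get(a, 0.0), eps))
               for a, p in desired.items() if p > 0)
```

A candidate covering the aspects the earlier picks missed drives the combined coverage toward p(a|Q) and hence gets a lower loss than a redundant one.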

Page 43: Risk Minimization and  Language Modeling in Text Retrieval

43

Preliminary Evaluation: MMR vs. MDR

Ranking Method | Relevant Data: AC | AU     | Mixed Data: AC | AU    | Prec.
MMR            | +2.6%             | +13.8% | +1.5%          | +2.2% | +3.4%
MDR            | +9.8%             | +4.5%  | +1.5%          | 0.0%  | -13.8%

• On the relevant data set, both MMR and MDR are effective, but they complement each other:

– MMR improves AU more than AC

– MDR improves AC more than AU

• On the mixed data set, however:

– MMR is only effective when relevance ranking is accurate

– MDR improves AC, even though relevance ranking is degraded

Page 44: Risk Minimization and  Language Modeling in Text Retrieval

44

Further Work is Needed

• Controlled experiments with synthetic data

– Level of redundancy

– Density of relevant documents

– Per-document aspect counts

• Alternative loss functions

• Aspect language models, especially along the line of LDA

– Aspect-based feedback

Page 45: Risk Minimization and  Language Modeling in Text Retrieval

45

Summary of Contributions

New TR Framework: Risk Minimization Framework
• Unifies existing models
• Incorporates LMs
• Serves as a map for exploring new models

New TR Models and specific contributions:

Two-stage Language Model:
• Empirical study of smoothing (dual role of smoothing)
• New smoothing method (two-stage smoothing)
• Automatic parameter setting (leave-one-out, mixture)

KL-divergence Retrieval Model:
• Query/document distillation
• Feedback with LMs (mixture model & divergence minimization)

Aspect Retrieval Model:
• Evaluation criteria (AC, AU)
• Redundancy/novelty measures (mixture weight)
• MMR with LMs (cost combination)
• Aspect-based loss function ("collective KL-divergence")

Page 46: Risk Minimization and  Language Modeling in Text Retrieval

46

Future Research Directions

• Better approximation of the risk integral

• More effective LMs for “traditional” retrieval

– Can we beat TF-IDF without increasing computational complexity?

– Automatic parameter setting, especially for feedback models

– Flexible passage retrieval, especially with HMM

– Beyond unigrams (more linguistics)

Page 47: Risk Minimization and  Language Modeling in Text Retrieval

47

More Future Research Directions

• Aspect Retrieval Models

– Document structure/sub-topic modeling

– Aspect-based feedback

• Interactive information retrieval models

– Risk minimization for information filtering

– Personalized & context-sensitive retrieval

Page 48: Risk Minimization and  Language Modeling in Text Retrieval

48

Thank you!