Passage Retrieval using HMMs

HARD 2004

University of Illinois at Urbana-Champaign

Jing Jiang, ChengXiang Zhai



Motivation – Variable Length Passages

APE20030911.0887

APE20030922.0156

Nokia, the world’s biggest … acquired Sega … Japanese video game maker, … its mobile N-Gage game … features of a cell phone, MP3-player … Nokia is the cell phone market leader …

Nintendo Co.’s … now works as a videophone … which makes mobile and Internet equipment … Nintendo has sold more than 10 million Game Boy …

Motivation – Variable Length Passages

HARD-422: video game crash

APE20030911.0887

APE20030922.0156

Passage length is document-dependent.

Motivation – Variable Length Passages

HARD-443: hand-held electronics

HARD-422: video game crash

APE20030911.0887

APE20030922.0156

Passage length is query-dependent.

Research Question

Passage length is both document-dependent and query-dependent.

How to detect variable-length passages?

Previous Work on Passage Retrieval

Structural or semantic boundary: the passage is not query-specific.

Fixed-length: the passage length is not query-specific, and the passage content may not be coherent.

Arbitrary (MultiText): only query words are considered, and heuristics are used to reduce the search space.

HMM-based: the method is promising, but previous work didn’t fully explore its potential.

HMM-Based Method

w w … w w … w w w w w … w w

document

HMM-Based Method

Q: hand-held electronics

HMM states: B1, R, B2

Transition probabilities:

p(B1|B1) = 0.9

p(R|B1) = 0.1

p(R|R) = 0.95

p(B2|R) = 0.05

p(B2|B2) = 1

Output probabilities:

p(w|B1): the: 0.060 … cell: 0.00001, mp3: 0.000005 …

p(w|R): the: 0.031, cell: 0.033, mp3: 0.016 …

p(w|B2): the: 0.060 … cell: 0.00001, mp3: 0.000005 …

document (with the relevant passage marked): w w … w w … w w w w w … w w

HMM-Based Method

Q: hand-held electronics

HMM states: B1, R, B2 (same transition and output probabilities as on the previous slide)

state sequence: B R … B B … R R R R B … B R

document: w w … w w … w w w w w … w w

relevant passage: the span of words labeled R
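The labeling step above can be illustrated with a small Viterbi decoder. This is only a sketch under stated assumptions, not the authors' implementation: it uses the three states and the probabilities shown on the slide, assumes the document starts in B1, treats transitions not shown on the slide as impossible, and uses a hypothetical `FLOOR` probability for words the slide does not list. The function names `viterbi` and `relevant_span` are illustrative.

```python
import math

STATES = ["B1", "R", "B2"]

# Transition probabilities from the slide; unlisted transitions are assumed impossible.
TRANS = {
    "B1": {"B1": 0.9, "R": 0.1},
    "R": {"R": 0.95, "B2": 0.05},
    "B2": {"B2": 1.0},
}

# Output probabilities from the slide (only the three words shown there).
EMIT = {
    "B1": {"the": 0.060, "cell": 0.00001, "mp3": 0.000005},
    "R": {"the": 0.031, "cell": 0.033, "mp3": 0.016},
    "B2": {"the": 0.060, "cell": 0.00001, "mp3": 0.000005},
}
FLOOR = 1e-4  # hypothetical probability for words not listed on the slide


def log_emit(state, word):
    """Log output probability of `word` in `state`."""
    return math.log(EMIT[state].get(word, FLOOR))


def viterbi(words):
    """Most likely state sequence, assuming the document starts in B1."""
    neg_inf = float("-inf")
    # score[s] is the best log-probability of any path ending in state s.
    score = {s: neg_inf for s in STATES}
    score["B1"] = log_emit("B1", words[0])
    backpointers = []
    for word in words[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            best_prev, best = None, neg_inf
            for prev in STATES:
                p = TRANS[prev].get(s, 0.0)
                if p > 0.0 and score[prev] > neg_inf:
                    cand = score[prev] + math.log(p)
                    if cand > best:
                        best_prev, best = prev, cand
            new_score[s] = best + log_emit(s, word) if best_prev else neg_inf
            pointers[s] = best_prev
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def relevant_span(path):
    """Half-open (start, end) word range labeled R, or None if there is none."""
    idx = [i for i, s in enumerate(path) if s == "R"]
    return (idx[0], idx[-1] + 1) if idx else None
```

With a toy document such as `["the"] * 3 + ["cell", "mp3", "cell", "mp3"] + ["the"] * 6`, the decoder labels the four middle words R. Note that with only a few trailing background words the cheapest path can stay in R until the end, because leaving R costs p(B2|R) = 0.05.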

Constructing the HMM

B1 R B2

Constructing the HMM

B1 R B2 E

end-of-doc state

Constructing the HMM

B3

B1 Q B2 E

0.01

0.99

0.005

smoothing achieved by transitions

end-of-doc state

Constructing the HMM

B3

B1 FB B2 E

0.01

0.99

0.005

expanded query LM to incorporate feedback

smoothing achieved by transitions

end-of-doc state

Constructing the HMM

B3

B1 FB B2 E

0.01

0.99

0.005

expanded query LM to incorporate feedback

smoothing achieved by transitions

end-of-doc state

transition probabilities trained for each document

Passage Extension

short passage with artificial boundary: w w … w w w … w w w w w … w w

HMM states: B3, B1, FB, B2, E

passage extended to the natural topical boundary: w w … w w w … w w w w w … w w w

true passage: w w … w w w … w w w w w … w w

Retrieval – Approach 1

[Diagram: ranked list 1, 2, 3, …, n; ranking; passage extraction]

Retrieval – Approach 2

[Diagram: ranked list 1, 2, 3, …, n; ranking; passage extraction]

Retrieval – Our Approach

[Diagram: fixed-length passages; ranking; HMM; ranked lists 1, 2, 3, …, n]

b0: whole-document ranking, pseudo-feedback

f0: 120-word passages, relevance feedback

f1: HMM-extended 60-word passages, relevance feedback

our focus

Passage-Level Results

Overall, the baseline was the best.

Run   Explanation                    BPref @ 12K chars   Rec @ 10 psgs   Prec @ 10 psgs
b0    whole-doc, pseudo FB           0.2710              0.2517          0.1570
f0    fixed 120 psgs, rel FB         0.2080              0.1067          0.2391
f1    fixed 60 psgs + HMM, rel FB    0.1860              0.1494          0.1411

Effectiveness of HMM method

Method                 BPref @ 12K   Prec @ 12K   CharRPrec
Fixed 60               0.1208        0.1623       0.0776
Fixed 60 + HMM         0.1868        0.2143       0.1424
Relative improvement   54.6%         32.0%        83.5%
Fixed 120              0.1738        0.2088       0.1043
Fixed 120 + HMM        0.2131        0.2265       0.1562
Relative improvement   22.6%         8.48%        49.8%

The HMM method improved performance over fixed-length passages.

There is less improvement if the fixed length is closer to the optimal length.
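The relative-improvement rows follow from the scores in the table as (HMM score - fixed-length score) / fixed-length score. A short sketch of that arithmetic, with the scores copied from the table and illustrative names (`relative_improvement`, `ROWS`, `gains`):

```python
def relative_improvement(baseline, improved):
    """Percentage change of `improved` relative to `baseline`."""
    return 100.0 * (improved - baseline) / baseline


# (fixed-length scores, fixed-length + HMM scores) in the column order
# BPref @ 12K, Prec @ 12K, CharRPrec, as given in the table.
ROWS = {
    "Fixed 60": ((0.1208, 0.1623, 0.0776), (0.1868, 0.2143, 0.1424)),
    "Fixed 120": ((0.1738, 0.2088, 0.1043), (0.2131, 0.2265, 0.1562)),
}

gains = {
    name: [relative_improvement(b, h) for b, h in zip(base, hmm)]
    for name, (base, hmm) in ROWS.items()
}

for name, values in gains.items():
    print(name, ", ".join(f"{v:.1f}%" for v in values))
```

The computed values agree with the percentages on the slide up to rounding.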

Diagnosis Runs

Factor                   Feedback    Ranking    HMM      Overall
b0                       pseudo FB   doc        yes      N/A
f1                       rel FB      passage    no       N/A
f1 vs. b0 (BPref@12K)    -12.1%      -29.7%     +10.0%   -32.1%

Feedback: non-optimal parameter setting

Ranking: KL-divergence works poorly on passages

HMM: HMM improves boundaries
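For readers unfamiliar with KL-divergence ranking, the following sketch scores a text by the negative KL divergence of a query language model from a smoothed language model of the text. It is a generic illustration, not the configuration used in these runs: the slides do not say which smoothing was used, so Jelinek-Mercer smoothing with a hypothetical weight `lam`, the toy background model, and the function names are all assumptions.

```python
import math
from collections import Counter


def smoothed_lm(tokens, background, lam=0.5):
    """Jelinek-Mercer smoothed unigram model of `tokens` (assumed smoothing).

    The background model must give every query word a non-zero probability.
    """
    counts = Counter(tokens)
    total = len(tokens)

    def prob(word):
        return (1.0 - lam) * counts[word] / total + lam * background.get(word, 0.0)

    return prob


def neg_kl_score(query_lm, text_lm):
    """Negative KL divergence of the query model from the text model.

    Higher is better; only words with non-zero query probability contribute.
    """
    return -sum(p * math.log(p / text_lm(w)) for w, p in query_lm.items() if p > 0)


# Toy data, invented for illustration only.
background = {"video": 0.1, "game": 0.1, "the": 0.6, "market": 0.2}
query_lm = {"video": 0.5, "game": 0.5}
on_topic = smoothed_lm(["the", "video", "game", "market"], background)
off_topic = smoothed_lm(["the", "the", "market", "market"], background)

score_on = neg_kl_score(query_lm, on_topic)
score_off = neg_kl_score(query_lm, off_topic)
```

A short passage gives sparse counts, so its smoothed model leans heavily on the background model. This may be one reason passage ranking is harder than document ranking, although the slides do not give a cause.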

Discussions and Conclusions

The HMM method improved performance over fixed-length passages.

The LM (KL-divergence) method gives worse performance on passage ranking than on document ranking.

The End

Questions?