Passage Retrieval using HMMs
HARD 2004
University of Illinois at Urbana-Champaign
Jing Jiang ChengXiang Zhai
Motivation – Variable Length Passages
Example documents: APE20030911.0887, APE20030922.0156
Excerpts:
Nokia, the world’s biggest … acquired Sega … Japanese video game maker, … its mobile N-Gage game … features of a cell phone, MP3-player … Nokia is the cell phone market leader …
Nintendo Co.’s … now works as a videophone … which makes mobile and Internet equipment … Nintendo has sold more than 10 million Game Boy …
Example topics: HARD-422 (video game crash), HARD-443 (hand-held electronics)
Relevant passage length varies in two ways: document-dependent and query-dependent
Research Question
Passage length is both document-dependent and query-dependent.
How to detect variable-length passages?
Previous Work on Passage Retrieval
Structural or semantic boundary: the passage is not query-specific.
Fixed-length: passage length is not query-specific; passage content may not be coherent.
Arbitrary (MultiText): only query words are considered; heuristics are used to reduce the search space.
HMM-based: the method is promising, but previous work didn’t fully explore its potential.
HMM-Based Method
Q: hand-held electronics
HMM states: B1 → R → B2
Output probabilities:
p(w|B1): the: 0.060 … cell: 0.00001, mp3: 0.000005 …
p(w|R): the: 0.031, cell: 0.033, mp3: 0.016 …
p(w|B2): the: 0.060 … cell: 0.00001, mp3: 0.000005 …
Transition probabilities:
p(B1|B1) = 0.9, p(R|B1) = 0.1
p(R|R) = 0.95, p(B2|R) = 0.05
p(B2|B2) = 1
Document: w w … w w … w w w w w … w w
State sequence: B B … B B … R R R R B … B B
The words labeled R form the relevant passage.
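The decoding step on this slide can be sketched with a small Viterbi decoder. This is a minimal illustration, not the authors' implementation: it reuses the transition and output probabilities listed above, and it makes two assumptions that are not on the slide, namely that the chain always starts in B1 and that every unlisted word has the same probability (the hypothetical constant `OTHER`) in all states. The helper names `viterbi` and `relevant_passage` are also made up for this sketch.

```python
import math

STATES = ["B1", "R", "B2"]

# Transition probabilities as listed on the slide; missing pairs are impossible.
TRANS = {
    ("B1", "B1"): 0.9, ("B1", "R"): 0.1,
    ("R", "R"): 0.95, ("R", "B2"): 0.05,
    ("B2", "B2"): 1.0,
}

# Output probabilities for "the", "cell", "mp3" as listed on the slide.
BACKGROUND = {"the": 0.060, "cell": 0.00001, "mp3": 0.000005}
RELEVANT = {"the": 0.031, "cell": 0.033, "mp3": 0.016}
EMIT = {"B1": BACKGROUND, "R": RELEVANT, "B2": BACKGROUND}
OTHER = 0.001  # ASSUMPTION: probability of any unlisted word, in every state


def safe_log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(words):
    """Most likely state sequence for `words` (ASSUMPTION: start in B1)."""
    # best[s] = (log-probability of the best path ending in s, that path)
    best = {s: (float("-inf"), []) for s in STATES}
    best["B1"] = (safe_log(EMIT["B1"].get(words[0], OTHER)), ["B1"])
    for w in words[1:]:
        step = {}
        for s in STATES:
            emit = safe_log(EMIT[s].get(w, OTHER))
            step[s] = max(
                (
                    (score + safe_log(TRANS.get((prev, s), 0.0)) + emit, path + [s])
                    for prev, (score, path) in best.items()
                ),
                key=lambda cand: cand[0],
            )
        best = step
    return max(best.values(), key=lambda cand: cand[0])[1]


def relevant_passage(words, states):
    """Words whose decoded state is R."""
    return [w for w, s in zip(words, states) if s == "R"]


# Toy document (hypothetical): query-related words surrounded by common words.
doc = "the the the cell mp3 cell the the the the the the".split()
states = viterbi(doc)
print(states)
print(relevant_passage(doc, states))
```

On this toy document the decoder labels only the three query-related words as R. Because p(B2|R) = 0.05 makes leaving R expensive, a document with very few words after the passage may be decoded as staying in R until the end.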
Constructing the HMM
States: B1, FB, B3, B2, E
Transition probabilities shown in the diagram: 0.01, 0.99, 0.005
FB: expanded query LM to incorporate feedback
E: end-of-doc state
Smoothing is achieved by transitions.
Transition probabilities are trained for each document.
Passage Extension
[Figure: the same document word sequence (w w … w w) shown at three stages]
Short passage with artificial boundary
HMM (states B1, FB, B3, B2, E)
Passage extended to the natural topical boundary
True passage
Retrieval – Our Approach
Pipeline: fixed-length passages (1, 2, 3, …, n) → ranking → HMM (our focus) → passages (1, 2, 3, …, n)
Runs:
b0: whole-document ranking, pseudo-feedback
f0: 120-word passages, relevance feedback
f1: HMM-extended 60-word passages, relevance feedback
Passage-Level Results
Overall, baseline was the best.
Run  Explanation                  BPref @ 12K chars  Rec @ 10 psgs  Prec @ 10 psgs
b0   whole-doc, pseudo FB         0.2710             0.2517         0.1570
f0   fixed 120 psgs, rel FB       0.2080             0.1067         0.2391
f1   fixed 60 psgs + HMM, rel FB  0.1860             0.1494         0.1411
Effectiveness of HMM method
Method                BPref @ 12K  Prec @ 12K  CharRPrec
Fixed 60              0.1208       0.1623      0.0776
Fixed 60 + HMM        0.1868       0.2143      0.1424
Relative improvement  54.6%        32.0%       83.5%
Fixed 120             0.1738       0.2088      0.1043
Fixed 120 + HMM       0.2131       0.2265      0.1562
Relative improvement  22.6%        8.48%       49.8%
HMM method improved performance over fixed-length passages.
Less improvement if the fixed length is closer to the optimal length.
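The relative-improvement rows can be re-derived from the raw scores as (with HMM − fixed) / fixed. The short check below uses only the numbers in the table; the function and variable names are hypothetical, chosen for this sketch.

```python
def relative_improvement(baseline, improved):
    """Percentage change of `improved` relative to `baseline`."""
    return 100.0 * (improved - baseline) / baseline


# Scores copied from the table: (BPref @ 12K, Prec @ 12K, CharRPrec).
fixed_60 = (0.1208, 0.1623, 0.0776)
fixed_60_hmm = (0.1868, 0.2143, 0.1424)
fixed_120 = (0.1738, 0.2088, 0.1043)
fixed_120_hmm = (0.2131, 0.2265, 0.1562)

gains_60 = [relative_improvement(b, h) for b, h in zip(fixed_60, fixed_60_hmm)]
gains_120 = [relative_improvement(b, h) for b, h in zip(fixed_120, fixed_120_hmm)]

print([round(g, 1) for g in gains_60])
print([round(g, 2) for g in gains_120])
```

The recomputed values agree with the table after rounding. The gain is smaller on all three measures when starting from 120-word passages than from 60-word passages.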
Diagnosis Runs
Factor                  Feedback   Ranking  HMM     Overall
b0                      pseudo FB  doc      yes     N/A
f1                      rel FB     passage  no      N/A
f1 vs. b0 (BPref@12K)   -12.1%     -29.7%   +10.0%  -32.1%
Feedback: non-optimal parameter setting
Ranking: KL-divergence works poorly on passages
HMM: HMM improves boundaries
Discussions and Conclusions
HMM method improved the performance over fixed-length passages.
LM (KL-divergence) method gives worse performance on passage ranking than on document ranking.