Talk given at the FGIR LWA workshop 2007, Halle (Saale), Germany
A Modified Information Retrieval Approach to Produce Answer Candidates for Question Answering

Johannes Leveling
Intelligent Information and Communication Systems (IICS), University of Hagen (FernUniversität in Hagen)
58084 Hagen
[email protected]

LWA 2007 Workshop, Halle (Saale), Germany
Outline

1 IRSAW
2 QA phases
3 MIRA
  Embedding of MIRA
  Expected answer types
  TüBa-D/Z annotation
4 MAVE
5 Evaluation
6 Summary and Future Work
Johannes Leveling A modified information retrieval approach to produce answer candidates for QA 2 / 18
IRSAW question answering framework

[Architecture figure: a natural language question passes through question processing; three answer producers (InSicht, MIRA, QAP) generate answer candidates from the preprocessed document collection stored in a local database; answer validation and selection (MAVE) yields the final answer.]

IRSAW: Intelligent Information Retrieval on the Basis of a Semantically Annotated Web, funded by the DFG (Deutsche Forschungsgemeinschaft)
Question answering phases

1 Process document collection
2 Preprocess question (⇐ natural language question)
3 Retrieve text segments
4 Match document and question representations
5 Return answer candidates
6 Merge and validate answer candidates (⇒ answer)
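The six phases can be sketched as a minimal pipeline. This is a toy illustration, not the IRSAW code base: the function names, the stop-word list, and the overlap scoring are all invented for the sketch.

```python
# Toy sketch of the QA phases: preprocess question, retrieve segments,
# match/score candidates, select the best one (illustrative only).

def preprocess(question):
    """Phase 2: reduce the question to a bag of query terms."""
    stop = {"who", "what", "the", "in", "of", "did"}
    return [t.strip("?.,").lower() for t in question.split()
            if t.strip("?.,").lower() not in stop]

def retrieve(terms, collection):
    """Phase 3: return text segments sharing at least one query term."""
    return [seg for seg in collection
            if any(t in seg.lower() for t in terms)]

def match(segment, terms):
    """Phases 4-5: score a segment as an answer candidate by term overlap."""
    return (sum(t in segment.lower() for t in terms), segment)

def answer(question, collection):
    """Phase 6: merge candidates and return the best-scoring one."""
    terms = preprocess(question)
    ranked = sorted((match(s, terms) for s in retrieve(terms, collection)),
                    reverse=True)
    return ranked[0][1] if ranked else None

docs = ["David Ben Gurion became Israel's first Prime Minister in 1948.",
        "The occupation of Germany officially ended in 1955."]
print(answer("Who became the first Prime Minister of Israel in 1948?", docs))
```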
Embedding of MIRA in IRSAW

• Employ different modules to produce data streams containing answer candidates:
  • InSicht (matching semantic network representations; Hartrumpf and Leveling (2007))
  • QAP (Question Answering by Pattern matching; Leveling (2006)), and
  • MIRA (Modified Information Retrieval Approach)
• Use different methods to produce answer streams to increase recall and robustness
• Merge, rank, and logically validate answer candidates and select the best answer (MAVE; Glöckner et al. (2007))
MIRA

• Shallow question answering
• Expected answer type (EAT) of the question determined by a Bayesian classifier: PERSON, SUBSTANCE, ...
• Manually annotated corpus with EAT tags (e.g. PERSON) and subclasses (e.g. person-first, person-last)
• TüBa-D/Z newspaper corpus (Tübingen Treebank of Written German; http://www.sfs.uni-tuebingen.de/en_tuebadz.shtml), approximately 470,000 words
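A Bayesian EAT classifier of the kind named above can be sketched as a tiny naive Bayes model. The training pairs below are invented for illustration; the actual classifier is trained on the annotated TüBa-D/Z corpus.

```python
# Minimal naive Bayes sketch for classifying a question's expected
# answer type (EAT). Toy training data; illustrative only.
import math
from collections import Counter, defaultdict

train = [("who became the first prime minister", "PERSON"),
         ("who discovered penicillin", "PERSON"),
         ("in what year did the occupation end", "TIME"),
         ("when was the wall built", "TIME")]

prior = Counter(label for _, label in train)
word_counts = defaultdict(Counter)
for text, label in train:
    word_counts[label].update(text.split())

def classify(question):
    vocab = len(set(w for c in word_counts.values() for w in c))
    scores = {}
    for label in prior:
        total = sum(word_counts[label].values())
        score = math.log(prior[label] / len(train))
        for w in question.lower().split():
            # Laplace smoothing handles words unseen for this class
            score += math.log((word_counts[label][w] + 1) / (total + vocab))
        scores[label] = score
    return max(scores, key=scores.get)

print(classify("who was the first chancellor"))  # PERSON
```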
Expected answer types (1/3)

• Question (German): Wer wurde 1948 erster Ministerpräsident Israels?
• Question (English): Who became the first Prime Minister of Israel in 1948?
• EAT: PERSON
• Answer string: David ben Gurion
• Tag sequence: person-first person-part person-last
Expected answer types (2/3)

• Question (German): In welchem Jahr endete offiziell die Besetzung Deutschlands?
• Question (English): In what year did the occupation of Germany officially end?
• EAT: TIME
• Answer string: im Jahr 1955 ("in the year 1955")
• Tag sequence: prep year num-card
Expected answer types (3/3)

• Question (German): Wie wird der Ebolavirus übertragen?
• Question (English): How is the Ebola virus transmitted?
• EAT: OTHER
• Answer string: (Übertragen werden die Ebolaviren durch direkten Körperkontakt und bei Kontakt mit Körperausscheidungen infizierter Personen per Kontaktinfektion bzw. Schmierinfektion.) ["Ebola viruses are transmitted through direct body contact and through contact with bodily excretions of infected persons, via contact or smear infection."]
• Tag sequence: – (other entity type → answer not found!)
EAT frequency in annotated TüBa-D/Z

Name class      Corpus frequency
LOCATION         8,274
PERSON          14,527
ORGANIZATION     7,148
TIME            14,524
MEASURE            895
SUBSTANCE          293
OTHER            2,987
EAT subclass frequency in annotated TüBa-D/Z

LOCATION subclass   Frequency
city                3,717
country             1,955
region                926
street                613
state                 370
other                 206
building              195
streetno              124
river                  85
island                 55
sea                    17
mountain               11
Tagging with subclasses

Token       EAT        Subclass
Vor         TIME       prep
25          TIME       num-card
Jahren      TIME       year
betrat      –          –
Neil        PERSON     person-first
Armstrong   PERSON     person-last
als         –          –
erster      –          –
Mensch      –          –
den         –          –
Mond        LOCATION   other
,           –          –
doch        –          –
heute       TIME       deictic
stagniert   –          –
die         –          –
bemannte    –          –
Raumfahrt   –          –
.           –          –
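Given token-level EAT tags like those in the table, answer candidates can be read off as maximal contiguous runs of tokens carrying the question's EAT. A minimal sketch (the extraction step is illustrative, not the actual MIRA implementation):

```python
# Sketch: answer candidates as maximal runs of tokens whose EAT tag
# matches the question's expected answer type (illustrative only).
from itertools import groupby

tagged = [("Vor", "TIME"), ("25", "TIME"), ("Jahren", "TIME"),
          ("betrat", None), ("Neil", "PERSON"), ("Armstrong", "PERSON"),
          ("als", None), ("erster", None), ("Mensch", None),
          ("den", None), ("Mond", "LOCATION"), (",", None),
          ("doch", None), ("heute", "TIME")]

def candidates(tokens, eat):
    """Return maximal contiguous token spans tagged with the given EAT."""
    return [" ".join(tok for tok, _ in group)
            for tag, group in groupby(tokens, key=lambda t: t[1])
            if tag == eat]

print(candidates(tagged, "PERSON"))  # ['Neil Armstrong']
print(candidates(tagged, "TIME"))    # ['Vor 25 Jahren', 'heute']
```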
MAVE – MultiNet-based Answer Verification

• Validate answer candidates
• Test logical validity of answer candidates by using
  a) inferences, entailments
  b) heuristic quality indicators (fallback strategy)
• Select the most trusted answer
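The selection logic described above (logical validation first, heuristic indicators as fallback) can be sketched as follows. This is a rough sketch of the strategy only, assuming each candidate carries a validity flag and a heuristic score; the scores and data shape are invented, not MAVE's actual interface.

```python
# Sketch of validate-then-fallback answer selection (illustrative only):
# prefer logically validated candidates; otherwise rank all candidates
# by a heuristic quality score.

def select_answer(cands):
    """cands: list of (answer, logically_valid, heuristic_score)."""
    validated = [c for c in cands if c[1]]
    pool = validated if validated else cands   # fallback strategy
    return max(pool, key=lambda c: c[2])[0] if pool else None

cands = [("1955", True, 0.7), ("1949", False, 0.9), ("1990", False, 0.4)]
print(select_answer(cands))  # '1955': validation outranks a higher heuristic score
```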
Evaluation results (1/3)

Performance results for InSicht, QAP, and MIRA based on questions from QA@CLEF data from 2004 to 2006:

System    # Candidates   Coverage   # Correct   Precision
InSicht    1,212         226/600      625       51.6%
QAP        2,562         114/600    1,190       46.6%
MIRA      14,946         520/600    1,738       11.6%
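Precision in the table is the share of produced candidates that are correct, i.e. #Correct / #Candidates; for example, MIRA's 1,738 correct answers out of 14,946 candidates give about 11.6%. A minimal check:

```python
# Candidate precision = # correct candidates / # produced candidates.
def precision(correct, produced):
    return correct / produced

print(f"InSicht: {precision(625, 1212):.1%}")   # 51.6%
print(f"MIRA: {precision(1738, 14946):.1%}")    # 11.6%
```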
Evaluation results (2/3)

Performance results including answer selection by MAVE, based on questions from QA@CLEF data from 2004 to 2006:

Run                       # Correct   # Inexact   # Wrong
InSicht+Mira+QAP            247.4       15.8       307.8
InSicht+Mira+QAP (opt.)     305.0       17.0       249.0
Evaluation results (3/3)

Results for MIRA answer candidates for QA@CLEF data from 2003 to 2006 (top-N):

                         N=50     N=30     N=10    N=5
# Correct (2006)           798      615      215     95
# Inexact (2006)            56       53       20     12
# Wrong (2006)           4,436    3,421    1,360    722
# Correct (2003–2006)    1,864    1,503      609    263
# Inexact (2003–2006)      287      248      103     54
# Wrong (2003–2006)     17,326   14,102    5,694  3,013
Summary and Future Work

MIRA:
• Produces a highly recall-oriented answer stream,
• Covers more questions than the other answer producers in IRSAW, and
• Returns the largest number of correct answer candidates.

Future work:
• Return additional answer support for temporal deictic expressions
• Support processing of list questions
Selected References

Glöckner, Ingo; Sven Hartrumpf; and Johannes Leveling (2007). Logical validation, answer merging and witness selection – a case study in multi-stream question answering. In Proceedings of RIAO 2007, Large-Scale Semantic Access to Content (Text, Image, Video and Sound). Pittsburgh, USA: C.I.D.

Hartrumpf, Sven and Johannes Leveling (2007). Interpretation and normalization of temporal expressions for question answering. In Evaluation of Multilingual and Multi-modal Information Retrieval: 7th Workshop of the Cross-Language Evaluation Forum, CLEF 2006 (edited by Carol Peters et al.), volume 4730 of LNCS, pp. 432–439. Berlin: Springer.

Leveling, Johannes (2006). On the role of information retrieval in the question answering system IRSAW. In Proceedings of the LWA 2006, Workshop Information Retrieval, pp. 119–125. Hildesheim, Germany: Universität Hildesheim.