A Modified Information Retrieval Approach to Produce Answer Candidates for Question Answering

Johannes Leveling
Intelligent Information and Communication Systems (IICS)
University of Hagen (FernUniversität in Hagen)
58084 Hagen, Germany
[email protected]

LWA 2007 Workshop, Halle (Saale), Germany


Talk given at the FGIR LWA workshop 2007, Halle (Saale), Germany



Outline

1 IRSAW

2 QA phases

3 MIRA (Embedding of MIRA, Expected answer types, TüBa-D/Z annotation)

4 MAVE

5 Evaluation

6 Summary and Future Work


IRSAW question answering framework

[Figure: IRSAW framework. A natural language question is passed through question processing; three answer producers (InSicht, QAP, MIRA) generate answer candidates from preprocessed documents held in a local database; answer validation and selection (MAVE) then returns the answer.]

IRSAW: Intelligent Information Retrieval on the Basis of a Semantically Annotated Web, funded by the DFG (Deutsche Forschungsgemeinschaft)


Question answering phases

1 Process document collection
2 Preprocess question (⇐ Natural language question)
3 Retrieve text segments
4 Match document and question representations
5 Return answer candidates
6 Merge and validate answer candidates (⇒ Answer)
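The six phases above can be sketched end-to-end as a toy pipeline. This is a minimal illustration assuming a bag-of-words overlap model for retrieval and matching; it is not the IRSAW implementation.

```python
# Toy sketch of the six QA phases (illustrative only; assumes a simple
# token-overlap retrieval model, not the IRSAW implementation).

def preprocess(text):
    # Phases 1/2: normalize documents and the question into token sets.
    return set(text.lower().split())

def answer_question(question, documents):
    index = [(doc, preprocess(doc)) for doc in documents]       # phase 1
    query = preprocess(question)                                # phase 2
    # Phases 3/4: retrieve and score text segments by token overlap.
    scored = [(len(query & toks), doc) for doc, toks in index]
    # Phase 5: candidates are the non-zero matches, best first.
    candidates = [doc for score, doc in sorted(scored, reverse=True) if score > 0]
    # Phase 6: "validation" here simply selects the top candidate.
    return candidates[0] if candidates else None

docs = ["David Ben-Gurion became the first Prime Minister of Israel in 1948",
        "The occupation of Germany officially ended in 1955"]
print(answer_question("Who became the first Prime Minister of Israel?", docs))
```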


Embedding of MIRA in IRSAW

• Employ different modules to produce data streams containing answer candidates:
  • InSicht (matching semantic network representations, Hartrumpf and Leveling (2007)),
  • QAP (Question Answering by Pattern matching, Leveling (2006)), and
  • MIRA (Modified Information Retrieval Approach)
• Use different methods to produce answer streams to increase recall and robustness
• Merge, rank, and logically validate answer candidates and select the best answer (MAVE, Glöckner et al. (2007))

MIRA

• Shallow question answering
• Expected answer type (EAT) of the question determined by a Bayesian classifier: PERSON, SUBSTANCE, ...
• Manually annotated corpus with EAT tags (e.g. PERSON) and subclasses (e.g. person-first, person-last)
• TüBa-D/Z newspaper corpus (Tübingen Treebank of Written German; http://www.sfs.uni-tuebingen.de/en_tuebadz.shtml), approximately 470,000 words
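A minimal sketch of EAT classification with a Laplace-smoothed Naive Bayes model over question words; the tiny training set below is invented for illustration and stands in for the annotated corpus.

```python
# Toy Naive Bayes EAT classifier over question words (illustrative; the
# training questions and the two classes are invented examples).
from collections import Counter, defaultdict
import math

train = [("wer wurde erster ministerpräsident", "PERSON"),
         ("wer ist der autor", "PERSON"),
         ("in welchem jahr endete die besetzung", "TIME"),
         ("wann wurde die mauer gebaut", "TIME")]

prior = Counter(label for _, label in train)
counts = defaultdict(Counter)
for text, label in train:
    counts[label].update(text.split())

def classify(question):
    words = question.lower().split()
    vocab = len({w for c in counts.values() for w in c})
    def log_prob(label):
        total = sum(counts[label].values())
        # Laplace-smoothed multinomial Naive Bayes.
        return math.log(prior[label]) + sum(
            math.log((counts[label][w] + 1) / (total + vocab)) for w in words)
    return max(prior, key=log_prob)

print(classify("wer wurde präsident"))  # → PERSON
```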


Expected answer types (1/3)

• Question (German): Wer wurde 1948 erster Ministerpräsident Israels?
• Question (English): Who became the first Prime Minister of Israel in 1948?
• EAT: PERSON
• Answer string: David ben Gurion
• Tag sequence: person-first person-part person-last


Expected answer types (2/3)

• Question (German): In welchem Jahr endete offiziell die Besetzung Deutschlands?
• Question (English): In what year did the occupation of Germany officially end?
• EAT: TIME
• Answer string: im Jahr 1955
• Tag sequence: prep year num-card


Expected answer types (3/3)

• Question (German): Wie wird der Ebolavirus übertragen?
• Question (English): How is the Ebola virus transmitted?
• EAT: OTHER
• Answer string: (Übertragen werden die Ebolaviren durch direkten Körperkontakt und bei Kontakt mit Körperausscheidungen infizierter Personen per Kontaktinfektion bzw. Schmierinfektion.)
  (English: The Ebola viruses are transmitted by direct body contact and by contact with bodily excretions of infected persons, via contact or smear infection.)
• Tag sequence: – (other entity type → answer not found!)


EAT frequency in annotated TüBa-D/Z

Name class      Corpus frequency
LOCATION                  8,274
PERSON                   14,527
ORGANIZATION              7,148
TIME                     14,524
MEASURE                     895
SUBSTANCE                   293
OTHER                     2,987


EAT subclass frequency in annotated TüBa-D/Z

LOCATION subclass   Frequency
city                    3,717
country                 1,955
region                    926
street                    613
state                     370
other                     206
building                  195
streetno                  124
river                      85
island                     55
sea                        17
mountain                   11


Tagging with subclasses

Token      EAT        Subclass
Vor        TIME       prep
25         TIME       num-card
Jahren     TIME       year
betrat     –          –
Neil       PERSON     person-first
Armstrong  PERSON     person-last
als        –          –
erster     –          –
Mensch     –          –
den        –          –
Mond       LOCATION   other
doch       –          –
heute      TIME       deictic
stagniert  –          –
die        –          –
bemannte   –          –
Raumfahrt  –          –
.          –          –
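Given such a tagged token stream, answer candidates of the expected type can be extracted, for example as maximal runs of adjacent tokens carrying the matching EAT. A sketch under that assumption (the extraction rule is illustrative, not MIRA's exact procedure):

```python
# Extract answer candidates as maximal runs of adjacent tokens whose EAT
# matches the expected answer type (heuristic sketch; tags follow a prefix
# of the slide's example sentence).

def extract_candidates(tagged, eat):
    runs, current = [], []
    for token, tag in tagged:
        if tag == eat:
            current.append(token)
        elif current:
            runs.append(" ".join(current))
            current = []
    if current:
        runs.append(" ".join(current))
    return runs

tagged = [("Vor", "TIME"), ("25", "TIME"), ("Jahren", "TIME"),
          ("betrat", None), ("Neil", "PERSON"), ("Armstrong", "PERSON"),
          ("als", None), ("erster", None), ("Mensch", None),
          ("den", None), ("Mond", "LOCATION")]

print(extract_candidates(tagged, "PERSON"))  # → ['Neil Armstrong']
```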


MAVE: MultiNet-based Answer Verification

• Validate answer candidates
• Test the logical validity of an answer candidate using
  a) inferences and entailments
  b) heuristic quality indicators (fallback strategy)
• Select the most trusted answer
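The validate-then-fall-back selection can be sketched generically; both predicates below are stand-ins, not the MultiNet-based checks.

```python
# Sketch of MAVE-style selection: prefer candidates that pass a logical
# check, fall back to heuristic quality scores otherwise (both predicates
# here are illustrative stand-ins, not the MultiNet-based implementation).

def select_answer(candidates, logically_valid, quality):
    valid = [c for c in candidates if logically_valid(c)]
    pool = valid if valid else candidates      # fallback strategy
    return max(pool, key=quality) if pool else None

cands = ["1955", "1949", "im Jahr"]
print(select_answer(cands,
                    logically_valid=lambda c: c.isdigit(),
                    quality=lambda c: len(c)))  # → 1955
```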


Evaluation results (1/3)

Performance results for InSicht, QAP, and MIRA based on questions from QA@CLEF data from 2004 to 2006:

System    # Candidates   Coverage   # Correct   Precision
InSicht          1,212    226/600         625       51.6%
QAP              2,562    114/600       1,190       46.6%
MIRA            14,946    520/600       1,738       11.6%
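The precision column can be checked against the counts (precision ≈ # correct / # candidates). Note that the QAP row computes to 46.4% rather than 46.6%, so the slide may round or count slightly differently.

```python
# Recompute precision as correct candidates / produced candidates; the
# counts are taken from the table above.

results = {"InSicht": (1212, 625), "QAP": (2562, 1190), "MIRA": (14946, 1738)}
for system, (produced, correct) in results.items():
    print(f"{system}: {100 * correct / produced:.1f}%")
```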


Evaluation results (2/3)

Performance results including answer selection by MAVE, based on questions from QA@CLEF data from 2004 to 2006:

Run                       # Correct   # Inexact   # Wrong
InSicht+Mira+QAP              247.4        15.8     307.8
InSicht+Mira+QAP (opt.)       305.0        17.0     249.0


Evaluation results (3/3)

Results for MIRA answer candidates for QA@CLEF data from 2003 to 2006:

top-N                    N=50     N=30    N=10    N=5
# Correct (2006)          798      615     215     95
# Inexact (2006)           56       53      20     12
# Wrong (2006)          4,436    3,421   1,360    722
# Correct (2003–2006)   1,864    1,503     609    263
# Inexact (2003–2006)     287      248     103     54
# Wrong (2003–2006)    17,326   14,102   5,694  3,013


Summary and Future Work

MIRA:
• Produces a highly recall-oriented answer stream,
• Covers more questions than the other answer producers in IRSAW, and
• Returns the largest number of correct answer candidates.

Future work:
• Return additional answer support for temporal deictic expressions
• Support processing of list questions


Selected References

Glöckner, Ingo; Sven Hartrumpf; and Johannes Leveling (2007). Logical validation, answer merging and witness selection – a case study in multi-stream question answering. In Proceedings of RIAO 2007, Large-Scale Semantic Access to Content (Text, Image, Video and Sound). Pittsburgh, USA: C.I.D.

Hartrumpf, Sven and Johannes Leveling (2007). Interpretation and normalization of temporal expressions for question answering. In Evaluation of Multilingual and Multi-modal Information Retrieval: 7th Workshop of the Cross-Language Evaluation Forum, CLEF 2006 (edited by Carol Peters et al.), volume 4730 of LNCS, pp. 432–439. Berlin: Springer.

Leveling, Johannes (2006). On the role of information retrieval in the question answering system IRSAW. In Proceedings of the LWA 2006, Workshop Information Retrieval, pp. 119–125. Hildesheim, Germany: Universität Hildesheim.
