A Modified Information Retrieval Approach to Produce Answer Candidates for Question Answering


Talk given at the FGIR LWA workshop 2007, Halle (Saale), Germany



Johannes Leveling

Intelligent Information and Communication Systems (IICS)
University of Hagen (FernUniversität in Hagen)
58084 Hagen, Germany
johannes.leveling@fernuni-hagen.de

LWA 2007 Workshop, Halle (Saale), Germany


Outline

1 IRSAW

2 QA phases

3 MIRA
   Embedding of MIRA
   Expected answer types
   TüBa-D/Z annotation

4 MAVE

5 Evaluation

6 Summary and Future Work


IRSAW question answering framework

[Figure: the IRSAW framework. A natural language question passes through question processing; three answer producers (InSicht, MIRA, QAP) produce answer candidates from the preprocessed document collection and a local database; answer validation and selection (MAVE) returns the final answer.]

IRSAW: Intelligent Information Retrieval on the Basis of a Semantically Annotated Web, funded by the DFG (Deutsche Forschungsgemeinschaft)


Question answering phases

1 Process document collection
2 Preprocess question (⇐ Natural language question)
3 Retrieve text segments
4 Match document and question representations
5 Return answer candidates
6 Merge and validate answer candidates (⇒ Answer)
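A toy Python sketch of phases 2 to 6 (phase 1, processing the document collection, happens offline; the word-overlap retrieval and capitalized-token extraction are illustrative stand-ins, not the IRSAW implementation):

# Toy sketch of QA phases 2-6; all names and methods are illustrative,
# not IRSAW's. Retrieval is plain word overlap; candidate extraction
# takes capitalized token runs as stand-ins for typed candidates.
import re

def preprocess(text):                        # phase 2: normalize text
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query, docs, k=2):              # phase 3: rank segments by overlap
    return sorted(docs, key=lambda d: -len(query & preprocess(d)))[:k]

def extract_candidates(segment):             # phases 4/5: crude extraction
    return re.findall(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)*", segment)

def answer(question, docs):                  # phase 6: merge and select
    query = preprocess(question)
    cands = [c for seg in retrieve(query, docs)
             for c in extract_candidates(seg)
             if not preprocess(c) <= query]  # drop echoes of the question
    return max(set(cands), key=cands.count) if cands else None

docs = ["David Ben Gurion became the first Prime Minister of Israel in 1948.",
        "The occupation of Germany officially ended in 1955."]
print(answer("Who became the first Prime Minister of Israel in 1948?", docs))
# prints: David Ben Gurion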


Embedding of MIRA in IRSAW

• Employ different modules to produce data streams containing answer candidates:
   • InSicht (matching semantic network representations, Hartrumpf and Leveling (2007))
   • QAP (Question Answering by Pattern matching, Leveling (2006)), and
   • MIRA (Modified Information Retrieval Approach)
• Use different methods to produce answer streams to increase recall and robustness
• Merge, rank, and logically validate answer candidates and select the best answer (MAVE, Glöckner et al. (2007))
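The multi-stream idea can be sketched as running each producer independently and pooling the streams for downstream validation (the producer internals below are placeholders; only the pooling is the point):

def insicht(question):   # stand-in for semantic network matching
    return [("David ben Gurion", 0.9)]

def qap(question):       # stand-in for pattern matching
    return [("Ben Gurion", 0.7)]

def mira(question):      # stand-in for the modified IR approach
    return [("David ben Gurion", 0.5), ("Israel", 0.2)]

def answer_streams(question):
    """Pool the answer streams of all producers for MAVE."""
    return [cand for producer in (insicht, qap, mira)
            for cand in producer(question)]

for text, score in answer_streams("Wer wurde 1948 erster Ministerpräsident Israels?"):
    print(text, score)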


MIRA

• Shallow question answering
• Expected answer type (EAT) of question determined by Bayesian classifier: PERSON, SUBSTANCE, ...
• Manually annotated corpus with EAT tags (e.g. PERSON) and subclasses (e.g. person-first, person-last)
• TüBa-D/Z newspaper corpus (Tübingen Treebank of Written German; http://www.sfs.uni-tuebingen.de/en_tuebadz.shtml), approximately 470,000 words
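A minimal sketch of a naive Bayes EAT classifier over question words (the tiny training set and add-one smoothing are invented for illustration; the talk does not specify which features MIRA's classifier uses):

# Naive Bayes EAT classifier over question words; the toy training
# data and add-one smoothing are invented for illustration.
import math, re
from collections import Counter, defaultdict

TRAIN = [("wer wurde 1948 erster ministerpräsident israels", "PERSON"),
         ("wer erfand das telefon", "PERSON"),
         ("in welchem jahr endete die besetzung deutschlands", "TIME"),
         ("wann wurde die mauer gebaut", "TIME"),
         ("wie wird der ebolavirus übertragen", "OTHER")]

def tokens(q):
    return re.findall(r"\w+", q.lower())

class_counts = Counter(label for _, label in TRAIN)
word_counts = defaultdict(Counter)
for q, label in TRAIN:
    word_counts[label].update(tokens(q))
vocab = {w for counts in word_counts.values() for w in counts}

def classify(question):
    def log_p(label):    # log P(label) + sum of log P(word | label)
        total = sum(word_counts[label].values())
        prior = math.log(class_counts[label] / len(TRAIN))
        return prior + sum(math.log((word_counts[label][w] + 1) /
                                    (total + len(vocab)))
                           for w in tokens(question))
    return max(class_counts, key=log_p)

print(classify("Wer entdeckte Amerika?"))  # prints: PERSON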


Expected answer types (1/3)

• Question (German): Wer wurde 1948 erster Ministerpräsident Israels?
• Question (English): Who became the first Prime Minister of Israel in 1948?
• EAT: PERSON
• Answer string: David ben Gurion
• Tag sequence: person-first person-part person-last


Expected answer types (2/3)

• Question (German): In welchem Jahr endete offiziell die Besetzung Deutschlands?
• Question (English): In what year did the occupation of Germany officially end?
• EAT: TIME
• Answer string: im Jahr 1955
• Tag sequence: prep year num-card


Expected answer types (3/3)

• Question (German): Wie wird der Ebolavirus übertragen?
• Question (English): How is the Ebola virus transmitted?
• EAT: OTHER
• Answer string: (Übertragen werden die Ebolaviren durch direkten Körperkontakt und bei Kontakt mit Körperausscheidungen infizierter Personen per Kontaktinfektion bzw. Schmierinfektion.)
• Tag sequence: – (other entity type → answer not found!)


EAT frequency in annotated TüBa-D/Z

Name class      Corpus frequency
LOCATION                   8,274
PERSON                    14,527
ORGANIZATION               7,148
TIME                      14,524
MEASURE                      895
SUBSTANCE                    293
OTHER                      2,987
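One plausible use of these counts is as class priors P(EAT) for the Bayesian classifier (an assumption for illustration; the talk does not state how the frequencies enter the model):

# Turning the annotated EAT counts into class priors; using them this
# way is an assumption for illustration, not stated in the talk.
freq = {"LOCATION": 8274, "PERSON": 14527, "ORGANIZATION": 7148,
        "TIME": 14524, "MEASURE": 895, "SUBSTANCE": 293, "OTHER": 2987}
total = sum(freq.values())                     # 48,648 annotated names
priors = {eat: n / total for eat, n in freq.items()}
print(f"P(PERSON) = {priors['PERSON']:.3f}")   # prints: P(PERSON) = 0.299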


EAT subclass frequency in annotated TüBa-D/Z

LOCATION subclass   Frequency
city                    3,717
country                 1,955
region                    926
street                    613
state                     370
other                     206
building                  195
streetno                  124
river                      85
island                     55
sea                        17
mountain                   11


Tagging with subclasses

Token       EAT        Subclass
Vor         TIME       prep
25          TIME       num-card
Jahren      TIME       year
betrat      –
Neil        PERSON     person-first
Armstrong   PERSON     person-last
als         –
erster      –
Mensch      –
den         –
Mond        LOCATION   other
,           –
doch        –
heute       TIME       deictic
stagniert   –
die         –
bemannte    –
Raumfahrt   –
.           –
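Once tokens carry EAT tags, answer candidates can be read off as maximal runs of tokens with the expected type. A small sketch over the example sentence (the grouping logic is an assumed reconstruction, not MIRA's actual extraction code):

# Group contiguous tokens sharing the expected EAT into candidates;
# an assumed reconstruction, with token/tag pairs from the table above.
from itertools import groupby

tagged = [("Vor", "TIME"), ("25", "TIME"), ("Jahren", "TIME"),
          ("betrat", None), ("Neil", "PERSON"), ("Armstrong", "PERSON"),
          ("als", None), ("erster", None), ("Mensch", None), ("den", None),
          ("Mond", "LOCATION"), (",", None), ("doch", None),
          ("heute", "TIME"), ("stagniert", None), ("die", None),
          ("bemannte", None), ("Raumfahrt", None), (".", None)]

def candidates(tagged_tokens, eat):
    """Return maximal runs of adjacent tokens tagged with the expected EAT."""
    return [" ".join(token for token, _ in run)
            for tag, run in groupby(tagged_tokens, key=lambda t: t[1])
            if tag == eat]

print(candidates(tagged, "PERSON"))  # prints: ['Neil Armstrong']
print(candidates(tagged, "TIME"))    # prints: ['Vor 25 Jahren', 'heute']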


MAVE - MultiNet-based Answer Verification

• Validate answer candidates
• Test logical validity of answer candidates by using
   a) inferences, entailments
   b) heuristic quality indicators (fallback strategy)
• Select most trusted answer
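A rough sketch of this two-stage selection (the heuristic indicators and weights are invented for illustration; MAVE's actual validation reasons over MultiNet semantic representations):

# Candidate selection with heuristic fallback scoring; indicators and
# weights are invented stand-ins, not MAVE's real quality model.
def heuristic_score(cand):                     # b) fallback quality indicators
    score = cand["retrieval_score"]            # producer's own confidence
    if cand["eat_matches"]:                    # candidate type fits question EAT
        score += 0.5
    return score + 0.1 * cand["stream_count"]  # found by several producers

def select_answer(cands):
    proven = [c for c in cands if c["logically_valid"]]  # a) inference/entailment
    return max(proven or cands, key=heuristic_score)     # fall back if none proven

cands = [{"text": "1955", "retrieval_score": 0.7, "eat_matches": True,
          "stream_count": 2, "logically_valid": False},
         {"text": "im Jahr 1955", "retrieval_score": 0.6, "eat_matches": True,
          "stream_count": 1, "logically_valid": True}]
print(select_answer(cands)["text"])            # prints: im Jahr 1955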


Evaluation results (1/3)

Performance results for InSicht, QAP, and MIRA based on questions from QA@CLEF data from 2004 to 2006

System    # Candidates   Coverage   # Correct   Precision
InSicht          1,212    226/600         625       51.6%
QAP              2,562    114/600       1,190       46.6%
MIRA            14,946    520/600       1,738       11.6%
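The precision column is the fraction of produced candidates that are correct; recomputing it from the reported counts (QAP comes out at 46.4% rather than the 46.6% shown, possibly a rounding or extraction artifact):

# Recompute precision = (# correct candidates) / (# candidates).
for system, cands, correct in [("InSicht", 1212, 625),
                               ("QAP", 2562, 1190),
                               ("MIRA", 14946, 1738)]:
    print(f"{system}: {correct}/{cands} = {correct / cands:.1%}")
# InSicht: 51.6%   QAP: 46.4%   MIRA: 11.6%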


Evaluation results (2/3)

Performance results including answer selection by MAVE based on questions from QA@CLEF data from 2004 to 2006

Run                       # Correct   # Inexact   # Wrong
InSicht+Mira+QAP              247.4        15.8     307.8
InSicht+Mira+QAP (opt.)       305.0        17.0     249.0


Evaluation results (3/3)

Results for MIRA answer candidates for QA@CLEF data from 2003 to 2006

                              top-N
                        N=50     N=30    N=10    N=5
# Correct (2006)         798      615     215     95
# Inexact (2006)          56       53      20     12
# Wrong (2006)         4,436    3,421   1,360    722
# Correct (2003–2006)  1,864    1,503     609    263
# Inexact (2003–2006)    287      248     103     54
# Wrong (2003–2006)   17,326   14,102   5,694  3,013
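Candidate-level precision at each cutoff follows directly from these counts; even at small N the MIRA stream stays recall-oriented rather than precise:

# Candidate-level precision for MIRA, computed from the 2003-2006 rows.
rows = {50: (1864, 287, 17326), 30: (1503, 248, 14102),
        10: (609, 103, 5694), 5: (263, 54, 3013)}
for n, (correct, inexact, wrong) in rows.items():
    total = correct + inexact + wrong
    print(f"top-{n}: {correct}/{total} = {correct / total:.1%} correct")
# top-50: 9.6%, top-30: 9.5%, top-10: 9.5%, top-5: 7.9%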


Summary and Future Work

MIRA:
• Produces a highly recall-oriented answer stream,
• Covers more questions than the other answer producers in IRSAW, and
• Returns the largest number of correct answer candidates.

Future work:
• Return additional answer support for temporal deictic expressions
• Support processing of list questions


Selected References

Glöckner, Ingo; Sven Hartrumpf; and Johannes Leveling (2007). Logical validation, answer merging and witness selection – a case study in multi-stream question answering. In Proceedings of RIAO 2007, Large-Scale Semantic Access to Content (Text, Image, Video and Sound). Pittsburgh, USA: C.I.D.

Hartrumpf, Sven and Johannes Leveling (2007). Interpretation and normalization of temporal expressions for question answering. In Evaluation of Multilingual and Multi-modal Information Retrieval: 7th Workshop of the Cross-Language Evaluation Forum, CLEF 2006 (edited by Carol Peters et al.), volume 4730 of LNCS, pp. 432–439. Berlin: Springer.

Leveling, Johannes (2006). On the role of information retrieval in the question answering system IRSAW. In Proceedings of the LWA 2006, Workshop Information Retrieval, pp. 119–125. Hildesheim, Germany: Universität Hildesheim.
