
Answer Mining by Combining Extraction Techniques with Abductive Reasoning
Sanda Harabagiu, Dan Moldovan, Christine Clark, Mitchell Bowden, John Williams and Jeremy Bensley (LCC)

TREC 2003 Question Answering Track

Abstract

• Information Extraction Technique:
  – Axiomatic knowledge derived from WordNet for justifying answers extracted from the AQUAINT text collection
• CICERO LITE:
  – Named entity recognizer
  – Recognizes precisely a large set of entities that range over an extended set of semantic categories
• Theorem Prover:
  – Produces abductive justifications of the answers when it has access to the axiomatic transformations of the WordNet glosses

Introduction

• TREC-2003: Main task & Passage task
• Main task:
  – Factoids
  – Lists
  – Definitions
• Main_task_score = ½ * factoid_score + ¼ * list_score + ¼ * definition_score
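A one-line sketch of this weighting in Python; the component scores in the example are purely hypothetical:

```python
def main_task_score(factoid_score: float, list_score: float,
                    definition_score: float) -> float:
    """TREC-2003 main task score: 1/2 factoid + 1/4 list + 1/4 definition."""
    return 0.5 * factoid_score + 0.25 * list_score + 0.25 * definition_score

# Hypothetical component scores, just to show the weighting:
print(main_task_score(0.8, 0.6, 0.4))   # ≈ 0.65
```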

• Factoid questions:
  – Seek short, fact-based answers
  – Ex. “What are pennies made of?”

• List questions:
  – Request a set of instances of specified types
  – Ex. “What grapes are used in making wine?”
  – Final answer set was created from the participants & assessors
  – IR = #instances judged correct and distinct / #answers in the final set
  – IP = #instances judged correct and distinct / #instances returned
  – F = (2 * IP * IR) / (IP + IR)
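A minimal sketch of these list metrics in Python; the counts in the example are hypothetical:

```python
def list_f_score(num_correct_distinct: int, num_in_final_set: int,
                 num_returned: int) -> float:
    """F-measure for list questions: harmonic mean of instance recall and precision."""
    ir = num_correct_distinct / num_in_final_set   # IR = correct & distinct / answers in final set
    ip = num_correct_distinct / num_returned       # IP = correct & distinct / instances returned
    if ip + ir == 0:
        return 0.0
    return (2 * ip * ir) / (ip + ir)

# e.g. 6 correct & distinct instances, 10 in the final set, 8 returned:
# IR = 0.6, IP = 0.75, F ≈ 0.667
print(list_f_score(6, 10, 8))
```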

• Definition questions:
  – Assessor created a list of acceptable info nuggets, some of which are deemed essential
  – NR (Nugget Recall) = #essential nuggets returned in response / #essential nuggets
  – NP (Nugget Precision):
    • Allowance = 100 * #essential and acceptable nuggets returned
    • Length = total #non-white-space characters in answer strings
    • NP = 1, if length < allowance
    • NP = 1 – (length – allowance) / length, otherwise
  – F = (26 * NP * NR) / (25 * NP + NR)
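A worked sketch of this nugget scoring in Python (the 26 and 25 constants correspond to an F-measure with beta = 5, which weights recall more heavily than precision); the counts in the example are hypothetical:

```python
def definition_f_score(num_essential_returned: int, num_essential: int,
                       num_essential_and_acceptable_returned: int,
                       length: int) -> float:
    """Nugget F-score for definition questions (F with beta = 5)."""
    nr = num_essential_returned / num_essential            # nugget recall
    allowance = 100 * num_essential_and_acceptable_returned
    if length < allowance:
        np_ = 1.0                                          # nugget precision
    else:
        np_ = 1.0 - (length - allowance) / length
    if 25 * np_ + nr == 0:
        return 0.0
    return (26 * np_ * nr) / (25 * np_ + nr)

# e.g. 3 of 5 essential nuggets returned, 4 essential+acceptable nuggets, 500 characters:
# allowance = 400, NP = 1 - 100/500 = 0.8, NR = 0.6, F ≈ 0.61
print(definition_f_score(3, 5, 4, 500))
```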

• TREC-2003:
  – Factoids: 413
  – Lists: 37
  – Definitions: 50

Answer Type                              Count
Answers to Factoid                         383
NIL-answers to Factoid                      30
Answer instances in List final set         549
Essential nuggets for Definition           207
Total nuggets for Definition               417

The Architecture of the QA System

Question Processing

• Factoid or List questions:
  – Identify the expected answer type, encoded as:
    • A semantic class recognized by CICERO LITE, or
    • A concept in a hierarchy of semantic concepts built from the WordNet hierarchies for verbs and nouns
  – Ex. “What American revolutionary general turned over West Point to the British?”
    • Expected answer type is PERSON due to the noun general found in the hierarchy of humans in WordNet
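This answer-type check can be approximated with the NLTK WordNet interface; the sketch below (which assumes nltk and its WordNet corpus are installed) only illustrates the idea and is not LCC's actual Answer Type Hierarchy:

```python
from nltk.corpus import wordnet as wn

def is_person_concept(noun: str) -> bool:
    """Return True if any noun sense of `noun` has person.n.01 among its hypernyms."""
    person = wn.synset('person.n.01')
    for synset in wn.synsets(noun, pos=wn.NOUN):
        for path in synset.hypernym_paths():    # paths from the WordNet root to this sense
            if person in path:
                return True
    return False

# "What American revolutionary general turned over West Point to the British?"
# The head noun "general" falls under the person/human hierarchy,
# so the expected answer type is PERSON.
print(is_person_concept('general'))   # True
```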

• Definition questions:
  – Parsed to detect the NPs, which are matched against a set of patterns
  – Ex. “What is Iqra?”
    • Matched against the question pattern <What is Question-Phrase>
    • Associated with the answer pattern <Question-Phrase, which means Answer-Phrase>
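A toy version of this pattern match in Python; the regular expressions and the sample text snippet are illustrative stand-ins for the system's actual definition patterns:

```python
import re

# Question pattern: <What is Question-Phrase>
QUESTION_PATTERN = re.compile(r'^What is (?P<qp>.+?)\??$', re.IGNORECASE)

def answer_pattern(question_phrase: str):
    # Answer pattern: <Question-Phrase, which means Answer-Phrase>
    return re.compile(re.escape(question_phrase) + r',\s*which means\s*(?P<ap>[^,.]+)',
                      re.IGNORECASE)

question = 'What is Iqra?'
match = QUESTION_PATTERN.match(question)
if match:
    qp = match.group('qp')                                    # "Iqra"
    snippet = 'Iqra, which means "read", is the name of ...'  # illustrative text snippet
    hit = answer_pattern(qp).search(snippet)
    if hit:
        print(hit.group('ap').strip())                        # candidate Answer-Phrase
```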

Document Processing

• Retrieve relevant passages based on the keywords provided by question processing

• Factoid questions:
  – Ranks the candidate passages
• List questions:
  – Ranks higher those passages having multiple occurrences of concepts of the expected answer type (a toy ranking sketch follows this list)
• Definition questions:
  – Allows multiple matches of keywords
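The ranking function itself is not spelled out in this preview, so the sketch below only illustrates the idea of question-type-aware passage ranking: keyword hits plus, for list questions, a bonus for each occurrence of a concept of the expected answer type. The scoring weights and helper names are assumptions.

```python
from typing import Iterable, List, Set

def score_passage(tokens: List[str], keywords: Set[str],
                  expected_type_concepts: Set[str], question_type: str) -> float:
    """Toy score: count keyword hits; for list questions, also reward passages
    that mention several concepts of the expected answer type."""
    score = float(sum(tok.lower() in keywords for tok in tokens))
    if question_type == 'list':
        score += 0.5 * sum(tok.lower() in expected_type_concepts for tok in tokens)
    return score

def rank_passages(passages: Iterable[List[str]], keywords: Set[str],
                  expected_type_concepts: Set[str], question_type: str) -> List[List[str]]:
    return sorted(passages,
                  key=lambda p: score_passage(p, keywords, expected_type_concepts, question_type),
                  reverse=True)
```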

Answer Extraction

• Factoid:
  – Answers are first extracted based on the answer phrase provided by CICERO LITE
  – If the answer is not a named entity, it is justified abductively by using a theorem prover that makes use of axioms derived from WordNet
  – Ex. “What apostle was crucified?”
• List:
  – Extracted by using the ranked set of extracted answers
  – Then determining a cutoff based on the semantic similarity of answers
• Definition:
  – Relies on pattern matching

Extracting Answers for Factoid Questions

• 289 correct answers
  – 234: identified by CICERO LITE or recognized from the Answer Type Hierarchy
  – 65: due to the theorem prover reported in Moldovan et al. 2003
• The role of the theorem prover is to boost precision by filtering out incorrect answers that are not supported by an abductive justification

• Ex. “What country does Greenland belong to?”
  – Answered by “Greenland, which is a territory of Denmark”
  – The gloss of the synset {territory, dominion, province} is “a territorial possession controlled by a ruling state”
  – The logical transformation of this gloss:
    • control:v#1(e,x1,x2) & country:n#1(x1) & ruling:a#1(x1) & possession:n#2(x2) & territorial:a#1(x2)
  – Refined expression:
    • process:v#2(e,x1,x2) & COUNTRY:n#1(x1) & ruling:a#1(x1) & territory:n#2(x2)
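To make the abductive justification concrete, here is a toy sketch in Python: the question's logic-form literals are checked against the answer's literals extended with the gloss-derived axiom, and any literal that cannot be matched is counted as an assumption. The predicate names follow the slide's example, but the matching-by-predicate-name procedure is a drastic simplification for illustration, not the prover actually used by the system.

```python
# Logic forms as sets of (predicate, args) literals, following the slide's notation.
question_lf = {('belong_to', ('e', 'Greenland', 'x1')), ('COUNTRY', ('x1',))}

answer_lf = {('territory', ('Greenland',)), ('of', ('Greenland', 'Denmark'))}

# Axiom derived from the WordNet gloss of {territory, dominion, province}:
# a territory is a possession controlled by a ruling state/country.
gloss_axiom = {('control', ('e', 'x1', 'x2')), ('COUNTRY', ('x1',)),
               ('ruling', ('x1',)), ('territory', ('x2',))}

def abductive_check(question, answer, axioms, max_assumptions=1):
    """Accept the answer if every question literal is either supported by the
    answer + axioms (matching on predicate name only, a big simplification)
    or can be assumed, with the number of assumptions kept below a threshold."""
    supported_preds = {pred for pred, _ in answer | axioms}
    assumptions = [lit for lit in question if lit[0] not in supported_preds]
    return len(assumptions) <= max_assumptions, assumptions

ok, assumed = abductive_check(question_lf, answer_lf, gloss_axiom)
print(ok, assumed)   # COUNTRY(x1) is supported by the axiom; belong_to must be assumed
```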

Extracting Answers for Definition Questions

• 50 definition questions evaluated
• 207 essential nuggets
• 417 total nuggets
• 485 answers extracted by this system
  – Two runs: Exact answers & Corresponding sentence-type answers
  – Vital matches: 68 (exact) & 86 (sentence) from the 207 essential nuggets
  – 110 (exact) & 114 (sentence) from the final set of 417 nuggets
• 38 patterns
  – 23 patterns had at least one match for the tested questions

Extracting Answers for List Questions

• 37 list questions
• A threshold-based cutoff of the extracted answers
• Decided on the threshold value by using concept similarities between candidate answers
• Given N list answers:
  – First computes the similarity between the first and the last answer
  – Similarity of a pair of answers:
    • Considers a window of three noun or verb concepts to the left and to the right of the exact answer
    • Separates the concepts into nouns and verbs
    • Similarity formula: (an illustrative stand-in is sketched below)

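The similarity formulas themselves are not reproduced in this preview, so the sketch below uses a simple Jaccard overlap of the noun/verb concepts in the context windows as a stand-in and applies a threshold cutoff in the spirit described above; the threshold value and the similarity measure are assumptions.

```python
def concept_overlap(window_a, window_b):
    """Stand-in similarity: Jaccard overlap of the noun/verb concepts found in a
    window of three content words on each side of the exact answer."""
    a, b = set(window_a), set(window_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def cutoff_answers(ranked_answers, windows, threshold=0.2):
    """Keep the top-ranked answers until the concept similarity with the
    first answer's context drops below the threshold (a simplification of
    the cutoff procedure described in the slides)."""
    if not ranked_answers:
        return []
    kept = [ranked_answers[0]]
    for answer, window in zip(ranked_answers[1:], windows[1:]):
        if concept_overlap(windows[0], window) < threshold:
            break
        kept.append(answer)
    return kept
```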

Performance Evaluation

• Two different runs:
  – Exact answers & the whole sentence containing the answer

Conclusion

• The second submission scored slightly higher than the first submission
• Definition questions got a higher score:
  – An entire sentence allowed more vital nuggets to be identified by the assessors
• Factoid questions in the main task scored slightly better than in the passage task
  – A passage might have contained multiple concepts similar to the answer, and thus produced a vaguer evaluation context
