Advanced Information Retrieval, Winter semester 2009/10
Part 7: Question Answering
Uwe Quasthoff, Universität Leipzig, Institut für Informatik
[email protected]
Based on material by Giuseppe Attardi, Dipartimento di Informatica, Università di Pisa
Question Answering
IR: find documents relevant to query
– query: boolean combination of keywords
QA: find answer to question
– Question: expressed in natural language
– Answer: short phrase (< 50 bytes)
TREC-9 Q&A track

693 fact-based, short-answer questions
– either short (50 B) or long (250 B) answers
~3 GB newspaper/newswire text (AP, WSJ, SJMN, FT, LAT, FBIS)
Score: MRR (a correct answer at rank n earns only 1/n, so answers beyond the first rank are penalized)
Resources: top 50 documents (no answer for 130 questions)
Questions: 186 (Encarta), 314 (seeds from Excite logs), 193 (syntactic variants of 54 originals)
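For reference, a minimal sketch of the MRR computation (names are illustrative; `is_correct` stands in for the human judgment):

```python
def mean_reciprocal_rank(ranked_answer_lists, is_correct):
    """MRR over all questions: each question contributes 1/rank of its
    first correct answer, or 0 if no returned answer is correct."""
    total = 0.0
    for answers in ranked_answer_lists:
        for rank, answer in enumerate(answers, start=1):
            if is_correct(answer):
                total += 1.0 / rank
                break
    return total / len(ranked_answer_lists)
```

This is why a correct answer at rank 2 earns only half the credit of one at rank 1.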
Commonalities
Approaches:
– question classification
– finding entailed answer type
– use of WordNet
High-quality document search is helpful (e.g. Queens College)
Sample Questions
Q: Who shot President Abraham Lincoln?
A: John Wilkes Booth

Q: How many lives were lost in the Pan Am crash in Lockerbie?
A: 270

Q: How long does it take to travel from London to Paris through the Channel?
A: three hours 45 minutes

Q: Which Atlantic hurricane had the highest recorded wind speed?
A: Gilbert (200 mph)

Q: Which country has the largest part of the rain forest?
A: Brazil (60%)
Question Types
Class 1. A: single datum or list of items; C: who, when, where, how (old, much, large)
Class 2. A: multi-sentence; C: extracted from multiple sentences
Class 3. A: across several texts; C: comparative/contrastive
Class 4. A: an analysis of retrieved information; C: synthesized coherently from several retrieved fragments
Class 5. A: result of reasoning; C: world/domain knowledge and common-sense reasoning
Question subtypes
Class 1.A About subjects, objects, manner, time or location
Class 1.B About properties or attributes
Class 1.C Taxonomic nature
Results (long)

[Figure: bar chart of MRR scores (official and unofficial runs) on the long-answer track, scale 0 to 0.8, for SMU, Queens, Waterloo, IBM, LIMSI, NTT, Pisa.]
Falcon: Architecture

Question Processing: Collins Parser + NE Extraction, Question Taxonomy, Question Expansion (WordNet); produces the Question Semantic Form, Question Logical Form, and Expected Answer Type.
Paragraph Processing: Paragraph Index and Paragraph Filtering; returns candidate Answer Paragraphs.
Answer Processing: Collins Parser + NE Extraction, Coreference Resolution, Abduction Filter; produces the Answer Semantic Form and Answer Logical Form, and finally the Answer.
Question parse

Who was the first Russian astronaut to walk in space
WP  VBD  DT  JJ  NNP  NN  TO  VB  IN  NN

[Parse tree: the tagged words combine into NPs and a PP, which form the VPs of the embedded and top-level S.]
Question semantic form

[Semantic graph: walk relates astronaut (modified by first and Russian, typed PERSON) to space.]

Question logic form:
first(x) ∧ astronaut(x) ∧ Russian(x) ∧ space(z) ∧ walk(y, z, x) ∧ PERSON(x)
Answer type

Question: What is the size of Argentina?
The question word size, with argument Argentina, maps via WordNet to the concept dimension, giving the Expected Answer Type QUANTITY.
Questions about definitions
Special patterns:
– What {is|are} …?
– What is the definition of …?
– Who {is|was|are|were} …?
Answer patterns:
– … {is|are} …
– …, {a|an|the} …
– … - …
Question Taxonomy

Question
– Reason
– Number: Speed, Degree, Dimension, Rate, Duration, Percentage, Count
– Manner
– Location: Country, City, Province, Continent
– Organization
– Product
– Language
– Mammal
– Reptile
– Game
– Currency
– Nationality
Question expansion
Morphological variants
– invented → inventor
Lexical variants
– killer → assassin
– far → distance
Semantic variants
– like → prefer
Indexing for Q/A
Alternatives:
– IR techniques
– Parse texts and derive conceptual indexes
Falcon uses paragraph indexing:
– Vector-Space plus proximity
– Returns weights used for abduction
(A toy version of such scoring is sketched below.)
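Falcon's actual weighting scheme is not detailed here; the following is a toy sketch of how a vector-space overlap score can be combined with a proximity bonus over a tokenized paragraph:

```python
import math
from collections import Counter

def score_paragraph(query_terms, para_tokens):
    """Toy score: log-TF overlap with the query (vector-space style)
    plus a bonus that grows as the matched terms appear closer together."""
    tf = Counter(para_tokens)
    overlap = sum(math.log1p(tf[t]) for t in query_terms if tf[t] > 0)
    positions = [i for i, tok in enumerate(para_tokens) if tok in query_terms]
    if len(positions) >= 2:
        span = max(positions) - min(positions) + 1
        proximity = len(positions) / span  # 1.0 when all matches are adjacent
    else:
        proximity = 0.0
    return overlap + proximity
```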
Abduction to justify answers
Backchaining proofs from questions
Axioms:
– Logical form of answer
– World knowledge (WordNet)
– Coreference resolution in answer text
Effectiveness:
– 14% improvement
– Filters out 121 erroneous answers (of 692)
– Requires 60% of the question processing time
TREC 13 QA
Several subtasks:– Factoid questions– Definition questions– List questions– Context questions
LCC still achieved the best performance, but with a different architecture
LCC Block Architecture

Question Processing (Question Parse, Semantic Transformation, Recognition of Expected Answer Type, Keyword Extraction): captures the semantics of the question and selects keywords for passage retrieval.
Passage Retrieval (Document Retrieval; keywords in, passages out): extracts and ranks passages using surface-text techniques.
Answer Processing (Answer Extraction, Answer Justification with a Theorem Prover over an Axiomatic Knowledge Base, Answer Reranking): extracts and ranks answers using NL techniques.
Shared resources: WordNet and NER.
Question Processing
Two main tasks:
– Determining the type of the answer
– Extracting keywords from the question and formulating a query
Answer Types
Factoid questions…
– Who, where, when, how many…
– The answers fall into a limited and somewhat predictable set of categories
  • Who questions are going to be answered by…
  • Where questions…
– Generally, systems select answer types from a set of Named Entities, augmented with other types that are relatively easy to extract
Answer Types
Of course, it isn’t that easy…
– Who questions can have organizations as answers
  • Who sells the most hybrid cars?
– Which questions can have people as answers
  • Which president went to war with Mexico?
Answer Type Taxonomy

Contains ~9000 concepts reflecting expected answer types
Merges named entities with the WordNet hierarchy
Answer Type Detection
Most systems use a combination of hand-crafted rules and supervised machine learning to determine the right answer type for a question.
It is not worthwhile to do something complex here if it can’t also be done in candidate answer passages.
Keyword Selection
Answer Type indicates what the question is looking for:
– It can be mapped to a NE type and used for search in an enhanced index
Lexical terms (keywords) from the question, possibly expanded with lexical/semantic variations, provide the required context.
Keyword Extraction
Questions approximated by sets of unrelated keywords (questions from the TREC QA track):

Q002: What was the monetary value of the Nobel Peace Prize in 1989? → monetary, value, Nobel, Peace, Prize
Q003: What does the Peugeot company manufacture? → Peugeot, company, manufacture
Q004: How much did Mercury spend on advertising in 1993? → Mercury, spend, advertising, 1993
Q005: What is the name of the managing director of Apricot Computer? → name, managing, director, Apricot, Computer
Keyword Selection Algorithm
1. Select all non-stopwords in quotations
2. Select all NNP words in recognized named entities
3. Select all complex nominals with their adjectival modifiers
4. Select all other complex nominals
5. Select all nouns with adjectival modifiers
6. Select all other nouns
7. Select all verbs
8. Select the answer type word
(A simplified implementation is sketched below.)
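A minimal sketch of this cascade, assuming POS-tagged input with per-token quotation and named-entity flags already computed; heuristics 3-6 (complex nominals) require an NP chunker and are collapsed into the plain-noun case here:

```python
def select_keywords(tagged_tokens, in_quotes, in_entity, answer_type_word, stopwords):
    """Apply the heuristics in priority order; each keyword is tagged
    with its heuristic number so that later relaxation steps can drop
    the lowest-priority keywords first."""
    keywords = []
    for i, (tok, tag) in enumerate(tagged_tokens):
        if in_quotes[i] and tok.lower() not in stopwords:
            keywords.append((1, tok))          # heuristic 1: quoted non-stopwords
        elif tag == "NNP" and in_entity[i]:
            keywords.append((2, tok))          # heuristic 2: NNP in named entities
        elif tag.startswith("NN"):
            keywords.append((5, tok))          # heuristics 3-6 collapsed: nouns
        elif tag.startswith("VB"):
            keywords.append((7, tok))          # heuristic 7: verbs
    if answer_type_word:
        keywords.append((8, answer_type_word)) # heuristic 8: answer type word
    return keywords
```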
Passage Retrieval

Extracts and ranks passages using surface-text techniques
Passage Extraction Loop
Passage Extraction Component:
– Extracts passages that contain all selected keywords
– Passage size is dynamic
– Start position is dynamic
Passage quality and keyword adjustment (see the sketch below):
– In the first iteration, use the first 6 keyword selection heuristics
– If the number of passages is lower than a threshold ⇒ query is too strict ⇒ drop a keyword
– If the number of passages is higher than a threshold ⇒ query is too relaxed ⇒ add a keyword
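A minimal sketch of that feedback loop; `search` stands in for whatever backend returns the passages, and the thresholds are illustrative:

```python
def retrieve_passages(keywords, search, low=10, high=500, max_iters=5):
    """Tighten or relax the keyword query until the number of
    returned passages falls inside [low, high]."""
    active = list(keywords)   # ordered by selection priority, best first
    dropped = []              # keywords removed so far
    passages = search(active)
    for _ in range(max_iters):
        if len(passages) < low and len(active) > 1:
            dropped.append(active.pop())   # too strict: drop lowest-priority keyword
        elif len(passages) > high and dropped:
            active.append(dropped.pop())   # too relaxed: restore a keyword
        else:
            break
        passages = search(active)
    return passages
```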
Passage Scoring

Passages are scored based on keyword windows. For example, if a question has a set of keywords {k1, k2, k3, k4}, and in a passage k1 and k2 are matched twice, k3 is matched once, and k4 is not matched, the following windows are built:

[Figure: four windows (Window 1 to Window 4), one for each combination of the two k1 occurrences with the two k2 occurrences, each also covering the single k3 match.]
Passage Scoring
Passage ordering is performed using a sort that involves three scores (sketched below):
– The number of words from the question that are recognized in the same sequence in the window
– The number of words that separate the most distant keywords in the window
– The number of unmatched keywords in the window
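A minimal sketch of the three scores for one window (token lists; greedy in-order matching approximates the first score):

```python
def window_scores(question_words, window_words, keywords):
    """Return the three sort keys for one candidate window."""
    # 1. question words matched in the same order in the window (greedy)
    same_order, pos = 0, 0
    for qw in question_words:
        try:
            pos = window_words.index(qw, pos) + 1
            same_order += 1
        except ValueError:
            continue
    # 2. words separating the most distant matched keywords
    hits = [i for i, w in enumerate(window_words) if w in keywords]
    distance = max(hits) - min(hits) - 1 if len(hits) > 1 else 0
    # 3. keywords with no match in the window
    unmatched = sum(1 for k in keywords if k not in window_words)
    return same_order, distance, unmatched

# Windows are then sorted by (-same_order, distance, unmatched):
# more in-order matches first, then smaller separation, then fewer misses.
```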
Answer Extraction

Extracts and ranks answers using NL techniques
Ranking Candidate Answers

Q066: Name the first private citizen to fly in space.
Answer type: Person
Text passage: “Among them was Christa McAuliffe, the first private citizen to fly in space. Karen Allen, best known for her starring role in “Raiders of the Lost Ark”, plays McAuliffe. Brian Kerwin is featured as shuttle pilot Mike Smith...”
Best candidate answer: Christa McAuliffe
Features for Answer Ranking

– Number of question terms matched in the answer passage
– Number of question terms matched in the same phrase as the candidate answer
– Number of question terms matched in the same sentence as the candidate answer
– Flag set to 1 if the candidate answer is followed by a punctuation sign
– Number of question terms matched, separated from the candidate answer by at most three words and one comma
– Number of terms occurring in the same order in the answer passage as in the question
– Average distance from candidate answer to question term matches
(Two of these are sketched below.)
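A minimal sketch of two of these features, assuming tokenized passages and a candidate answer given as token offsets (names are illustrative):

```python
def answer_features(question_terms, passage_tokens, answer_span):
    """Compute (a) question terms matched in the passage and (b) the
    average distance from the candidate answer to those matches."""
    start, end = answer_span                      # token offsets of the candidate
    matches = [i for i, t in enumerate(passage_tokens) if t in question_terms]
    n_matched = len({passage_tokens[i] for i in matches})
    center = (start + end) / 2
    avg_dist = (sum(abs(i - center) for i in matches) / len(matches)
                if matches else 0.0)
    return n_matched, avg_dist
```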
Lexical Chains

Question: When was the internal combustion engine invented?
Answer: The first internal combustion engine was built in 1867.
Lexical chain:
(1) invent:v#1 → HYPERNYM → create_by_mental_act:v#1 → HYPERNYM → create:v#1 → HYPONYM → build:v#1

Question: How many chromosomes does a human zygote have?
Answer: 46 chromosomes lie in the nucleus of every normal human cell.
Lexical chain:
(1) zygote:n#1 → HYPERNYM → cell:n#1 → HAS.PART → nucleus:n#1
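Such chains can be recovered by a breadth-first search over WordNet relations between a question term and an answer term; a minimal sketch using NLTK's WordNet interface (relation set and depth bound chosen for illustration):

```python
from collections import deque
from nltk.corpus import wordnet as wn

RELATIONS = ("hypernyms", "hyponyms", "part_meronyms", "part_holonyms")

def lexical_chain(src_word, dst_word, pos, max_depth=4):
    """Breadth-first search from the synsets of src_word to any synset
    of dst_word, recording the relations along the way."""
    targets = set(wn.synsets(dst_word, pos))
    queue = deque((s, [s.name()]) for s in wn.synsets(src_word, pos))
    seen = set()
    while queue:
        synset, path = queue.popleft()
        if synset in targets:
            return path
        if len(path) // 2 >= max_depth or synset in seen:
            continue
        seen.add(synset)
        for rel in RELATIONS:
            for nxt in getattr(synset, rel)():
                queue.append((nxt, path + [rel.upper(), nxt.name()]))
    return None

# lexical_chain("invent", "build", wn.VERB) should return a path through
# create_by_mental_act and create, matching the first example above.
```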
Theorem Prover

Q: What is the age of the solar system?
QLF: quantity_at(x2) & age_nn(x2) & of_in(x2,x3) & solar_jj(x3) & system_nn(x3)
Question Axiom: exists x1 x2 x3 (quantity_at(x2) & age_nn(x2) & of_in(x2,x3) & solar_jj(x3) & system_nn(x3))
Answer: The solar system is 4.6 billion years old.
WordNet Gloss: old_jj(x6) ↔ live_vb(e2,x6,x2) & for_in(e2,x1) & relatively_jj(x1) & long_jj(x1) & time_nn(x1) & or_cc(e5,e2,e3) & attain_vb(e3,x6,x2) & specific_jj(x2) & age_nn(x2)
Linguistic Axiom: all x1 (quantity_at(x1) & solar_jj(x1) & system_nn(x1) → of_in(x1,x1))
Proof: ¬quantity_at(x2) | ¬age_nn(x2) | ¬of_in(x2,x3) | ¬solar_jj(x3) | ¬system_nn(x3)
The refutation assigns a value to x2, which yields the answer.
Is the Web Different?
In TREC (and most commercial applications), retrieval is performed against a smallish closed collection of texts.
The diversity and creativity in how people express themselves necessitates all that work to bring the question and the answer texts together.
But…
The Web is Different
On the Web, popular factoids are likely to be expressed in a gazillion different ways.
At least a few of these will likely match the way the question was asked.
So why not just grep (or agrep) the Web using all or pieces of the original question?
AskMSR
Process the question by…
– Forming a search engine query from the original question
– Detecting the answer type
Get some results
Extract answers of the right type based on…
– How often they occur
Step 1: Rewrite the questions
Intuition: the user’s question is often syntactically quite close to sentences that contain the answer
– Where is the Louvre Museum located?
  • The Louvre Museum is located in Paris
– Who created the character of Scrooge?
  • Charles Dickens created the character of Scrooge.
Query rewriting

Classify the question into one of seven categories:
– Who is/was/are/were…?
– When is/did/will/are/were…?
– Where is/are/were…?
a. Hand-crafted, category-specific transformation rules, e.g. for where questions, move ‘is’ to all possible locations and look to the right of the query terms for the answer:

“Where is the Louvre Museum located?”
→ “is the Louvre Museum located”
→ “the is Louvre Museum located”
→ “the Louvre is Museum located”
→ “the Louvre Museum is located”
→ “the Louvre Museum located is”
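A minimal sketch of this rewrite step for the where-is case (whitespace tokenization; the full system also attaches a reliability weight to each rewrite):

```python
def rewrite_where_question(question):
    """Generate rewrites for a 'Where is X ...?' question by moving
    'is' into every position of the remaining token sequence."""
    tokens = question.rstrip("?").split()
    assert tokens[0].lower() == "where" and tokens[1].lower() == "is"
    rest = tokens[2:]
    return [" ".join(rest[:i] + ["is"] + rest[i:]) for i in range(len(rest) + 1)]

# rewrite_where_question("Where is the Louvre Museum located?")
# -> ["is the Louvre Museum located", "the is Louvre Museum located", ...,
#     "the Louvre Museum located is"]
```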
Step 2: Query search engine
Send all rewrites to a Web search engine
Retrieve the top N answers (100-200)
For speed, rely only on the search engine’s “snippets”, not the full text of the actual documents
Step 3: Gathering N-Grams

Enumerate all N-grams (N = 1, 2, 3) in all retrieved snippets
Weight of an n-gram: occurrence count, each occurrence weighted by the “reliability” (weight) of the rewrite rule that fetched the document
Example: “Who created the character of Scrooge?”
  Dickens 117
  Christmas Carol 78
  Charles Dickens 75
  Disney 72
  Carl Banks 54
  A Christmas 41
  Christmas Carol 45
  Uncle 31
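A minimal sketch of the counting step, assuming each snippet arrives paired with the weight of the rewrite rule that produced it:

```python
from collections import defaultdict

def gather_ngrams(snippets_with_weights, max_n=3):
    """Score 1-, 2- and 3-grams over all snippets; every occurrence
    adds the reliability weight of the rewrite that fetched it."""
    scores = defaultdict(float)
    for snippet, weight in snippets_with_weights:
        tokens = snippet.split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                scores[" ".join(tokens[i:i + n])] += weight
    return scores
```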
Step 4: Filtering N-Grams
Each question type is associated with one or more “data-type filters”: regular expressions for answer types
Boost the score of n-grams that match the expected answer type
Lower the score of n-grams that don’t match
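A minimal sketch with two hypothetical filters; the regular expressions and the boost/penalty factors are illustrative, not the system's actual values:

```python
import re

# Hypothetical data-type filters: question type -> regex the answer should match.
TYPE_FILTERS = {
    "when": re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b"),  # looks like a year
    "how-many": re.compile(r"\b\d+([.,]\d+)?\b"),       # looks like a number
}

def apply_type_filter(scores, qtype, boost=2.0, penalty=0.5):
    """Rescale n-gram scores by whether they match the expected answer type."""
    pattern = TYPE_FILTERS.get(qtype)
    if pattern is None:
        return scores
    return {ng: s * (boost if pattern.search(ng) else penalty)
            for ng, s in scores.items()}
```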
Step 5: Tiling the Answers

Overlapping n-grams are tiled into longer answers, and the merged n-grams are discarded:
  Dickens (score 20), Charles Dickens (score 15), Mr Charles (score 10)
  → merged: Mr Charles Dickens (score 45)
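A minimal sketch of tiling: repeatedly merge any two n-grams where one contains the other or the end of one overlaps the start of the other, summing their scores:

```python
def tile(a, b):
    """Merged n-gram if b is contained in a or overlaps a's end, else None."""
    aw, bw = a.split(), b.split()
    if any(aw[i:i + len(bw)] == bw for i in range(len(aw) - len(bw) + 1)):
        return a                                  # b is contained in a
    for k in range(min(len(aw), len(bw)), 0, -1):
        if aw[-k:] == bw[:k]:
            return " ".join(aw + bw[k:])          # suffix/prefix overlap
    return None

def tile_answers(scored):
    """Greedy tiling until no pair of n-grams can be merged."""
    items = dict(scored)
    changed = True
    while changed:
        changed = False
        for a in list(items):
            for b in list(items):
                if a == b or a not in items or b not in items:
                    continue
                merged = tile(a, b)
                if merged:
                    s = items.pop(a) + items.pop(b)
                    items[merged] = items.get(merged, 0) + s
                    changed = True
    return items

# tile_answers({"Dickens": 20, "Charles Dickens": 15, "Mr Charles": 10})
# -> {"Mr Charles Dickens": 45}
```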
Results
Standard TREC test-bed (TREC 2001): 1M documents; 900 questions
– The technique does OK, not great (it would have placed in the top 9 of ~30 participants)
– But with access to the Web, they do much better: they would have come in second on TREC 2001
New method: prepare questions

Instead of: taking user-posed questions, processing them, and searching for answers

Now: generate questions from the text and use these prepared questions as an “index”
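A minimal sketch of the generalization step, assuming a named-entity tagger and a hypothetical mapping from entity types to German question words; the real method also abstracts non-entity constituents (ETWAS, IRGENDWIE, …), as the examples on the following slides show:

```python
# Hypothetical mapping from entity/numeral types to German question words.
QUESTION_WORDS = {"PERSON": "WER", "ORG": "WER", "LOC": "WO",
                  "NUMBER": "WIEVIELE", "MISC": "WAS"}

def generate_questions(tokens, entities):
    """For each tagged entity (start, end, type) in a sentence, replace
    its token span by a question word to obtain one candidate question."""
    questions = []
    for start, end, etype in entities:
        qword = QUESTION_WORDS.get(etype, "WAS")
        q_tokens = tokens[:start] + [qword] + tokens[end:]
        questions.append(" ".join(q_tokens).rstrip(".") + "?")
    return questions
```

The generated questions are then stored as the index keys under which the original sentence can be retrieved.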
Generating questions for answers I

An zahlreichen Boeing-Modellen ist der Konzern heute maßgeblich beteiligt.
(The company today holds a significant stake in numerous Boeing models.)

Generalizations:
An VIELEN XXX-Modellen ist der Konzern (heute) IRGENDWIE beteiligt.
An ETWAS ist der Konzern (heute) IRGENDWIE beteiligt.
An ETWAS ist IRGENDWER beteiligt.
Generated questions:
An WAS ist der Konzern heute maßgeblich beteiligt?
WER ist an zahlreichen Boeing-Modellen beteiligt?

Im Werk Lübeck-Dänischburg wären etwa 270 der insgesamt rund 1200 Beschäftigten betroffen.
(At the Lübeck-Dänischburg plant, about 270 of the roughly 1200 employees in total would be affected.)

Generalizations:
Im Werk XXX wären IRGENDWIEVIEL PERSONEN betroffen.
IRGENDWO wären IRGENDWIEVIEL PERSONEN betroffen.
Generated questions:
WIEVIELE PERSONEN sind im Werk Lübeck-Dänischburg betroffen?
WER ist im Werk Lübeck-Dänischburg betroffen?
Generating questions for answers II

In der ganzen Welt beschäftige der Konzern derzeit mehr als 12 600 Mitarbeiter.
(Worldwide, the company currently employs more than 12,600 staff.)

Generalizations:
IRGENDWO beschäftige der Konzern (derzeit) IRGENDWIEVIEL PERSONEN.
IRGENDWO beschäftige IRGENDWER IRGENDWEN.
Generated questions:
WIEVIEL PERSONEN beschäftige der Konzern?
WO beschäftige der Konzern derzeit mehr als 12 600 Mitarbeiter?
WER beschäftige mehr als 12 600 Mitarbeiter?

Vorstandsmitglied Horst Teltschik war im Vorfeld dieses Besuchs mehrfach in China gewesen.
(Board member Horst Teltschik had been to China several times in the run-up to this visit.)

Generalizations:
Vorstandsmitglied XXX war IRGENDWO gewesen.
IRGENDWER war IRGENDWO gewesen.
Generated questions:
WER war mehrfach in China gewesen?
WO war Horst Teltschik?
Generating questions for answers III

Der Hersteller von Kunststoffspielwaren litt unter der Abwertung der Währungen von England und Italien.
(The manufacturer of plastic toys suffered from the devaluation of the currencies of England and Italy.)

Generalizations:
Der Hersteller von ETWAS litt unter der Abwertung von ETWAS.
Der Hersteller von ETWAS litt unter ETWAS.
IRGENDWER litt unter ETWAS.
Generated questions:
WER litt unter der Abwertung der Währungen von England und Italien?
WER litt unter der Abwertung?
Unter der Abwertung von WAS litt der Hersteller von Kunststoffspielwaren?
Unter WAS litt der Hersteller von Kunststoffspielwaren?
Generating questions for answers IV

Bis zur Entscheidung des Kartellamts sollen die Geschäfte unabhängig voneinander fortgeführt werden.
(Until the cartel office's decision, the businesses are to be run independently of each other.)

Generalizations:
Bis zur Entscheidung von JEMAND sollen die Geschäfte unabhängig voneinander fortgeführt werden.
Bis IRGENDWANN sollen die Geschäfte IRGENDWIE fortgeführt werden.
Bis IRGENDWANN soll IRGENDWAS getan werden.
Generated questions:
Bis WANN sollen die Geschäfte unabhängig voneinander fortgeführt werden?
Bis zu WESSEN Entscheidung sollen die Geschäfte unabhängig voneinander fortgeführt werden?
WAS soll getan werden bis zur Entscheidung des Kartellamts?
Generating questions for answers V

Eberhard Müller sei von dem früheren Seat-Chef ganz schnell in Frühpension geschickt worden.
(Eberhard Müller is said to have been sent into early retirement very quickly by the former Seat boss.)

Generalizations:
IRGENDWER sei von dem XXX-Chef IRGENDWIE in XXX-Pension geschickt worden.
IRGENDWER sei von IRGENDWEM IRGENDWIE in XXX-Pension geschickt worden.
IRGENDWER sei in XXX-Pension geschickt worden.
IRGENDWER sei/ist BETROFFEN_VON IRGENDWAS.
Generated questions:
WER sei/ist von dem früheren Seat-Chef in Frühpension geschickt worden?
WER sei/ist von dem CHEF in Pension geschickt worden?
VON WAS ist Eberhard Müller BETROFFEN?
Generating questions for answers VI

Einige Jahre hindurch galt Seat als das gewinnträchtigste Unternehmen des VW-Konzerns.
(For several years, Seat was considered the most profitable company of the VW group.)

Generalizations:
IRGENDWANN galt IRGENDWER als das BESTE Unternehmen des XXX-Konzerns.
IRGENDWIELANGE galt IRGENDWER als das BESTE Unternehmen des XXX-Konzerns.
IRGENDWIELANGE galt IRGENDWER als IRGENDWAS.
Generated questions:
WANN galt Seat als das gewinnträchtigste Unternehmen des VW-Konzerns?
WIELANGE galt Seat als das gewinnträchtigste Unternehmen des VW-Konzerns?
WER galt als das gewinnträchtigste Unternehmen des VW-Konzerns?
Als WAS galt Seat einige Jahre hindurch?
Generating questions for answers VII

Amerikas Arbeitsmarkt durchläuft einen tiefgehenden Strukturwandel.
(America's labor market is undergoing a profound structural change.)

Generalizations:
IRGENDEIN Arbeitsmarkt durchläuft einen STARKEN XXX-Wandel.
IRGENDWER durchläuft IRGENDWAS.
IRGENDWER VERÄNDERT_SICH.
Generated questions:
WAS durchläuft Amerikas Arbeitsmarkt?
WER durchläuft einen STARKEN XXX-Wandel?
WER VERÄNDERT_SICH?