Upload
allistair-haley
View
73
Download
6
Embed Size (px)
DESCRIPTION
Question Answering. Marti Hearst November 14, 2005. Question Answering. Outline Introduction to QA A typical full-fledged QA system A very simple system, in response to this An intermediate approach Incorporating a reasoning system Machine Learning of mappings - PowerPoint PPT Presentation
Citation preview
1
Question Answering
Marti HearstNovember 14, 2005
2
Question Answering
OutlineIntroduction to QAA typical full-fledged QA systemA very simple system, in response to thisAn intermediate approachIncorporating a reasoning systemMachine Learning of mappingsOther question types (e.g., biography, definitions)
3
A of Search Types
What is the typical height of a giraffe?
What are some good ideas for landscaping my client’s yard?
What are some promising untried treatments for Raynaud’s disease?Text Data Mining
Browse and Build
Question/Answer
4Adapted from slide by Surdeanu and Pasca
Document RetrievalUsers submit queries corresponding to their information needs.System returns (voluminous) list of full-length documents.It is the responsibility of the users to find information of interest within the returned documents.
Open-Domain Question Answering (QA)Users ask questions in natural language.
What is the highest volcano in Europe?System returns list of short answers.
… Under Mount Etna, the highest volcano in Europe, perches the fabulous town …A real use for NLP
Beyond Document Retrieval
5
Questions and Answers
What is the height of a typical giraffe?
The result can be a simple answer, extracted from existing web pages.Can specify with keywords or a natural language query
– However, most web search engines are not set up to handle questions properly.
– Get different results using a question vs. keywords
6
7
8
9
10Adapted from slide by Surdeanu and Pasca
The Problem of Question Answering
What is the nationality of Pope John Paul II? … stabilize the country with its help, the Catholic hierarchy stoutly held out
for pluralism, in large part at the urging of Polish-born Pope John Paul II. When the Pope emphatically defended the Solidarity trade union during a 1987 tour of the…
Natural language question, not keyword
queries
Short text fragment, not URL
list
11Adapted from slides by Manning, Harabagiu, Kushmeric, and ISI
Question Answering from text
With massive collections of full-text documents, simply finding relevant documents is of limited use: we want answers QA: give the user a (short) answer to their question, perhaps supported by evidence. An alternative to standard IR
The first problem area in IR where NLP is really making a difference.
12Adapted from slides by Manning, Harabagiu, Kushmeric, and ISI
People want to ask questions…
Examples from AltaVista query logwho invented surf music?how to make stink bombswhere are the snowdens of yesteryear?which english translation of the bible is used in official catholic liturgies?how to do clayarthow to copy psxhow tall is the sears tower?
Examples from Excite query log (12/1999)how can i find someone in texaswhere can i find information on puritan religion?what are the 7 wonders of the worldhow can i eliminate stressWhat vacuum cleaner does Consumers Guide recommend
13Adapted from slides by Manning, Harabagiu, Kushmeric, and ISI
A Brief (Academic) HistoryIn some sense question answering is not a new research area Question answering systems can be found in many areas of NLP research, including:
Natural language database systems– A lot of early NLP work on these
Problem-solving systems– STUDENT (Winograd ’77)– LUNAR (Woods & Kaplan ’77)
Spoken dialog systems– Currently very active and commercially relevant
The focus is now on open-domain QA is newFirst modern system: MURAX (Kupiec, SIGIR’93):
– Trivial Pursuit questions – Encyclopedia answers
FAQFinder (Burke et al. ’97)TREC QA competition (NIST, 1999–present)
14Adapted from slides by Manning, Harabagiu, Kushmeric, and ISI
AskJeeves
AskJeeves is probably most hyped example of “Question answering”How it used to work:
Do pattern matching to match a question to their own knowledge base of questionsIf a match is found, returns a human-curated answer to that known questionIf that fails, it falls back to regular web search(Seems to be more of a meta-search engine now)
A potentially interesting middle ground, but a fairly weak shadow of real QA
15Adapted from slides by Manning, Harabagiu, Kushmeric, and ISI
Question Answering at TREC
Question answering competition at TREC consists of answering a set of 500 fact-based questions, e.g.,
“When was Mozart born?”.
Has really pushed the field forward.The document set
Newswire textual documents from LA Times, San Jose Mercury News, Wall Street Journal, NY Times etcetera: over 1M documents now.Well-formed lexically, syntactically and semantically (were reviewed by professional editors).
The questionsHundreds of new questions every year, the total is ~2400
TaskInitially extract at most 5 answers: long (250B) and short (50B).Now extract only one exact answer.Several other sub-tasks added later: definition, list, biography.
16
Sample TREC questions
1. Who is the author of the book, "The Iron Lady: A Biography of Margaret Thatcher"?
2. What was the monetary value of the Nobel Peace Prize in 1989?
3. What does the Peugeot company manufacture?
4. How much did Mercury spend on advertising in 1993?
5. What is the name of the managing director of Apricot Computer?
6. Why did David Koresh ask the FBI for a word processor?
7. What is the name of the rare neurological disease with symptoms such as: involuntary movements (tics), swearing, and incoherent vocalizations (grunts, shouts, etc.)?
17
TREC Scoring
For the first three years systems were allowed to return 5 ranked answer snippets (50/250 bytes) to each question.
Mean Reciprocal Rank Scoring (MRR):– Each question assigned the reciprocal rank of the first correct
answer. If correct answer at position k, the score is 1/k. 1, 0.5, 0.33, 0.25, 0.2, 0 for 1, 2, 3, 4, 5, 6+ position
Mainly Named Entity answers (person, place, date, …)
From 2002 on, the systems are only allowed to return a single exact answer and the notion of confidence has been introduced.
18Adapted from slides by Manning, Harabagiu, Kusmerick, ISI
Top Performing SystemsIn 2003, the best performing systems at TREC can answer approximately 60-70% of the questionsApproaches and successes have varied a fair deal
Knowledge-rich approaches, using a vast array of NLP techniques stole the show in 2000-2003
– Notably Harabagiu, Moldovan et al. ( SMU/UTD/LCC )
Statistical systems starting to catch upAskMSR system stressed how much could be achieved by very simple methods with enough text (and now various copycats)People are experimenting with machine learning methods
Middle ground is to use large collection of surface matching patterns (ISI)
19
Example QA System
This system contains many components used by other systems, but more complex in some waysMost work completed in 2001; there have been advances by this group and others since then.Next slides based mainly on:
Paşca and Harabagiu, High-Performance Question Answering from Large Text Collections, SIGIR’01.Paşca and Harabagiu, Answer Mining from Online Documents, ACL’01.Harabagiu, Paşca, Maiorano: Experiments with Open-Domain Textual Question Answering. COLING’00
20Adapted from slide by Surdeanu and Pasca
QA Block Architecture
QuestionProcessing
QuestionProcessing
PassageRetrieval
PassageRetrieval
AnswerExtraction
AnswerExtraction
WordNet
NER
Parser
WordNet
NER
ParserDocumentRetrieval
DocumentRetrieval
Keywords Passages
Question Semantics
Captures the semantics of the questionSelects keywords for PR
Extracts and ranks passagesusing surface-text techniques
Extracts and ranks answersusing NL techniques
Q A
21Adapted from slide by Surdeanu and Pasca
Question Processing Flow
Q Questionparsing
Questionparsing
Construction of thequestion representation
Construction of thequestion representation
Answer type detectionAnswer type detection
Keyword selectionKeyword selection
Questionsemantic representation
AT category
Keywords
22Adapted from slide by Surdeanu and Pasca
Question Stems and Answer Types
Question Question stem Answer type
Q555: What was the name of Titanic’s captain? What Person
Q654: What U.S. Government agency registers trademarks?
What Organization
Q162: What is the capital of Kosovo? What City
Q661: How much does one ton of cement cost? How much Quantity
Other question stems: Who, Which, Name, How hot...Other answer types: Country, Number, Product...
Identify the semantic category of expected answers
23Adapted from slide by Surdeanu and Pasca
Detecting the Expected Answer Type
In some cases, the question stem is sufficient to indicate the answer type (AT)
Why REASONWhen DATE
In many cases, the question stem is ambiguousExamples
– What was the name of Titanic’s captain ?– What U.S. Government agency registers trademarks?– What is the capital of Kosovo?
Solution: select additional question concepts (AT words) that help disambiguate the expected answer typeExamples
– captain– agency– capital
24
Answer Type TaxonomyEncodes 8707 English concepts to help recognize expected answer typeMapping to parts of Wordnet done by hand
Can connect to Noun, Adj, and/or Verb subhierarchies
25Adapted from slide by Surdeanu and Pasca
Answer Type Detection Algorithm
Select the answer type word from the question representation.
Select the word(s) connected to the question. Some “content-free” words are skipped (e.g. “name”).From the previous set select the word with the highest connectivity in the question representation.
Map the AT word in a previously built AT hierarchyThe AT hierarchy is based on WordNet, with some concepts associated with semantic categories, e.g. “writer” PERSON.
Select the AT(s) from the first hypernym(s) associated with a semantic category.
28Adapted from slide by Surdeanu and Pasca
Keyword Selection
Answer Type indicates what the question is looking for, but provides insufficient context to locate the answer in very large document collectionLexical terms (keywords) from the question, possibly expanded with lexical/semantic variations provide the required context.
29Adapted from slide by Surdeanu and Pasca
Lexical Term Extraction
Questions approximated by sets of unrelated words (lexical terms)Similar to bag-of-word IR models
Question (from TREC QA track) Lexical terms
Q002: What was the monetary value of the Nobel Peace Prize in 1989?
monetary, value, Nobel, Peace, Prize
Q003: What does the Peugeot company manufacture?
Peugeot, company, manufacture
Q004: How much did Mercury spend on advertising in 1993?
Mercury, spend, advertising, 1993
Q005: What is the name of the managing director of Apricot Computer?
name, managing, director, Apricot, Computer
30Adapted from slide by Surdeanu and Pasca
Keyword Selection Algorithm
1. Select all non-stopwords in quotations2. Select all NNP words in recognized named entities3. Select all complex nominals with their adjectival
modifiers4. Select all other complex nominals5. Select all nouns with adjectival modifiers6. Select all other nouns7. Select all verbs8. Select the AT word (which was skipped in all previous
steps)
31Adapted from slide by Surdeanu and Pasca
Keyword Selection Examples
What researcher discovered the vaccine against Hepatitis-B?
Hepatitis-B, vaccine, discover, researcher
What is the name of the French oceanographer who owned Calypso?
Calypso, French, own, oceanographer
What U.S. government agency registers trademarks?U.S., government, trademarks, register, agency
What is the capital of Kosovo?Kosovo, capital
32Adapted from slide by Surdeanu and Pasca
Passage Retrieval
QuestionProcessing
QuestionProcessing
PassageRetrieval
PassageRetrieval
AnswerExtraction
AnswerExtraction
WordNet
NER
Parser
WordNet
NER
ParserDocumentRetrieval
DocumentRetrieval
Keywords Passages
Question Semantics
Captures the semantics of the questionSelects keywords for PR
Extracts and ranks passagesusing surface-text techniques
Extracts and ranks answersusing NL techniques
Q A
33Adapted from slide by Surdeanu and Pasca
Passage Extraction Loop
Passage Extraction ComponentExtracts passages that contain all selected keywordsPassage size dynamicStart position dynamic
Passage quality and keyword adjustmentIn the first iteration use the first 6 keyword selection heuristicsIf the number of passages is lower than a threshold query is too strict drop a keywordIf the number of passages is higher than a threshold query is too relaxed add a keyword
34Adapted from slide by Surdeanu and Pasca
Passage Retrieval Architecture
Passage ExtractionPassage Extraction
Passage Quality
Keyword Adjustment
Keyword Adjustment
Passage Scoring
Passage Scoring
Passage Ordering
Passage Ordering
Keywords No
Passages
Yes
Documents
DocumentRetrieval
DocumentRetrieval
RankedPassages
35Adapted from slide by Surdeanu and Pasca
Passage ScoringPassages are scored based on keyword windowsFor example, if a question has a set of keywords: {k1, k2, k3, k4}, and in a passage k1 and k2 are matched twice, k3 is matched once, and k4 is not matched, the following windows are built:
k1 k2 k3k2 k1
Window 1
k1 k2 k3k2 k1
Window 2
k1 k2 k3k2 k1
Window 3
k1 k2 k3k2 k1
Window 4
36Adapted from slide by Surdeanu and Pasca
Passage Scoring
Passage ordering is performed using a radix sort that involves three scores:
SameWordSequenceScore (largest)– Computes the number of words from the question that
are recognized in the same sequence in the window
DistanceScore (largest)– The number of words that separate the most distant
keywords in the window
MissingKeywordScore (smallest)– The number of unmatched keywords in the window
37Adapted from slide by Surdeanu and Pasca
Answer Extraction
QuestionProcessing
QuestionProcessing
PassageRetrieval
PassageRetrieval
AnswerExtraction
AnswerExtraction
WordNet
NER
Parser
WordNet
NER
ParserDocumentRetrieval
DocumentRetrieval
Keywords Passages
Question Semantics
Captures the semantics of the questionSelects keywords for PR
Extracts and ranks passagesusing surface-text techniques
Extracts and ranks answersusing NL techniques
Q A
38Adapted from slide by Surdeanu and Pasca
Ranking Candidate Answers
Answer type: Person Text passage:
“Among them was Christa McAuliffe, the first private citizen to fly in space. Karen Allen, best known for her starring role in “Raiders of the Lost Ark”, plays McAuliffe. Brian Kerwin is featured as shuttle pilot Mike Smith...”
Best candidate answer: Christa McAuliffe
Q066: Name the first private citizen to fly in space.
39Adapted from slide by Surdeanu and Pasca
Features for Answer Ranking
relNMW number of question terms matched in the answer passage
relSP number of question terms matched in the same phrase as the candidate answerrelSS number of question terms matched in the same sentence as the candidate answerrelFP flag set to 1 if the candidate answer is followed by a punctuation signrelOCTW number of question terms matched, separated from the candidate answer by at most three words and one commarelSWS number of terms occurring in the same order in the answer passage as in the questionrelDTW average distance from candidate answer to question term matches
SIGIR ‘01
40Adapted from slide by Surdeanu and Pasca
Answer Ranking based on Machine Learning
Relative relevance score computed for each pair of candidates (answer windows)relPAIR = wSWS relSWS + wFP relFP
+ wOCTW relOCTW + wSP relSP + wSS relSS
+ wNMW relNMW + wDTW relDTW + threshold
If relPAIR positive, then first candidate from pair is more relevant
Perceptron model used to learn the weightsScores in the 50% MRR for short answers, in the 60% MRR for long answers
42Adapted from slides by Manning, Harabagiu, Kusmerick, ISI
Can we make this simpler?
One reason systems became so complex is that they have to pick out one sentence within a small collection
The answer is likely to be stated in a hard-to-recognize manner.Alternative Idea:
What happens with a much larger collection? The web is so huge that you’re likely to see the answer stated in a form similar to the question
Goal: make the simplest possible QA system by exploiting this redundancy in the web
Use this as a baseline against which to compare more elaborate systems.The next slides based on:
– Web Question Answering: Is More Always Better? Dumais, Banko, Brill, Lin, Ng, SIGIR’02
– An Analysis of the AskMSR Question-Answering System, Brill, Dumais, and Banko, EMNLP’02.
43Adapted from slides by Manning, Harabagiu, Kusmerick, ISI
AskMSR System Architecture
1 2
3
45
44Adapted from slides by Manning, Harabagiu, Kusmerick, ISI
Step 1: Rewrite the questions
Intuition: The user’s question is often syntactically quite close to sentences that contain the answer
Where is the Louvre Museum located? The Louvre Museum is located in Paris
Who created the character of Scrooge?
Charles Dickens created the character of Scrooge.
45Adapted from slides by Manning, Harabagiu, Kusmerick, ISI
Query rewritingClassify question into seven categories
Who is/was/are/were…?When is/did/will/are/were …?Where is/are/were …?
a. Hand-crafted category-specific transformation rulese.g.: For where questions, move ‘is’ to all possible locations
Look to the right of the query terms for the answer.
“Where is the Louvre Museum located?” “is the Louvre Museum located” “the is Louvre Museum located” “the Louvre is Museum located” “the Louvre Museum is located” “the Louvre Museum located is”
b. Expected answer “Datatype” (eg, Date, Person, Location, …)When was the French Revolution? DATE
Nonsense,but ok. It’sonly a fewmore queriesto the search engine.
46Adapted from slides by Manning, Harabagiu, Kusmerick, ISI
Query Rewriting - weighting
Some query rewrites are more reliable than others.
+“the Louvre Museum is located”
Where is the Louvre Museum located?
Weight 5if a match,probably right
+Louvre +Museum +located
Weight 1Lots of non-answerscould come back too
47Adapted from slides by Manning, Harabagiu, Kusmerick, ISI
Step 2: Query search engine
Send all rewrites to a Web search engineRetrieve top N answers (100-200)For speed, rely just on search engine’s “snippets”, not the full text of the actual document
48Adapted from slides by Manning, Harabagiu, Kusmerick, ISI
Step 3: Gathering N-Grams
Enumerate all N-grams (N=1,2,3) in all retrieved snippetsWeight of an n-gram: occurrence count, each weighted by “reliability” (weight) of rewrite rule that fetched the document
Example: “Who created the character of Scrooge?”
Dickens 117Christmas Carol 78Charles Dickens 75Disney 72Carl Banks 54A Christmas 41Christmas Carol 45Uncle 31
49Adapted from slides by Manning, Harabagiu, Kusmerick, ISI
Step 4: Filtering N-Grams
Each question type is associated with one or more “data-type filters” = regular expressionWhen…Where…What …Who …
Boost score of n-grams that match regexpLower score of n-grams that don’t match regexpDetails omitted from paper….
Date
Location
Person
50Adapted from slides by Manning, Harabagiu, Kusmerick, ISI
Step 5: Tiling the Answers
Dickens
Charles Dickens
Mr Charles
Scores
20
15
10
merged, discardold n-grams
Mr Charles DickensScore 45
N-Gramstile highest-scoring n-gram
N-Grams
Repeat, until no more overlap
51Adapted from slides by Manning, Harabagiu, Kusmerick, ISI
Results
Standard TREC contest test-bed (TREC 2001):~1M documents; 900 questions
Technique doesn’t do too well (though would have placed in top 9 of ~30 participants)
– MRR: strict: .34– MRR: lenient: .43– 9th place
52
ResultsFrom EMNLP’02 paper
MMR of .577; answers 61% correctlyWould be near the top of TREC-9 runs
Breakdown of feature contribution:
53Adapted from slides by Manning, Harabagiu, Kusmerick, ISI
Issues
Works best/only for “Trivial Pursuit”-style fact-based questionsLimited/brittle repertoire of
question categoriesanswer data types/filtersquery rewriting rules
54Adapted from slides by Manning, Harabagiu, Kusmerick, ISI
Intermediate Approach:Surface pattern discovery
Based on:Ravichandran, D. and Hovy E.H. Learning Surface Text Patterns for a Question Answering System, ACL’02Hovy, et al., Question Answering in Webclopedia, TREC-9, 2000.
Use of Characteristic Phrases"When was <person> born”
Typical answers– "Mozart was born in 1756.”– "Gandhi (1869-1948)...”
Suggests regular expressions to help locate correct answer– "<NAME> was born in <BIRTHDATE>”– "<NAME> ( <BIRTHDATE>-”
55Adapted from slides by Manning, Harabagiu, Kusmerick, ISI
Use Pattern Learning
Examples:“The great composer Mozart (1756-1791) achieved fame at a young age”“Mozart (1756-1791) was a genius”“The whole world would always be indebted to the great music of Mozart (1756-1791)”
Longest matching substring for all 3 sentences is "Mozart (1756-1791)”
Suffix tree would extract "Mozart (1756-1791)" as an output, with score of 3
Reminiscent of IE pattern learning
56Adapted from slides by Manning, Harabagiu, Kusmerick, ISI
Pattern Learning (cont.)
Repeat with different examples of same question type
“Gandhi 1869”, “Newton 1642”, etc.
Some patterns learned for BIRTHDATEa. born in <ANSWER>, <NAME>b. <NAME> was born on <ANSWER> , c. <NAME> ( <ANSWER> -d. <NAME> ( <ANSWER> - )
57Adapted from slides by Manning, Harabagiu, Kusmerick, ISI
QA Typology from ISITypology of typical question forms—94 nodes (47 leaf nodes)Analyzed 17,384 questions (from answers.com)
(THING ((AGENT (NAME (FEMALE-FIRST-NAME (EVE MARY ...)) (MALE-FIRST-NAME (LAWRENCE SAM ...)))) (COMPANY-NAME (BOEING AMERICAN-EXPRESS)) JESUS ROMANOFF ...) (ANIMAL-HUMAN (ANIMAL (WOODCHUCK YAK ...)) PERSON) (ORGANIZATION (SQUADRON DICTATORSHIP ...)) (GROUP-OF-PEOPLE (POSSE CHOIR ...)) (STATE-DISTRICT (TIROL MISSISSIPPI ...)) (CITY (ULAN-BATOR VIENNA ...)) (COUNTRY (SULTANATE ZIMBABWE ...)))) (PLACE (STATE-DISTRICT (CITY COUNTRY...)) (GEOLOGICAL-FORMATION (STAR CANYON...)) AIRPORT COLLEGE CAPITOL ...) (ABSTRACT (LANGUAGE (LETTER-CHARACTER (A B ...))) (QUANTITY (NUMERICAL-QUANTITY INFORMATION-QUANTITY MASS-QUANTITY MONETARY-QUANTITY TEMPORAL-QUANTITY ENERGY-QUANTITY TEMPERATURE-QUANTITY ILLUMINATION-QUANTITY
(SPATIAL-QUANTITY (VOLUME-QUANTITY AREA-QUANTITY DISTANCE-QUANTITY))
... PERCENTAGE))) (UNIT ((INFORMATION-UNIT (BIT BYTE ... EXABYTE)) (MASS-UNIT (OUNCE ...)) (ENERGY-UNIT (BTU ...)) (CURRENCY-UNIT (ZLOTY PESO ...)) (TEMPORAL-UNIT (ATTOSECOND ... MILLENIUM)) (TEMPERATURE-UNIT (FAHRENHEIT KELVIN CELCIUS)) (ILLUMINATION-UNIT (LUX CANDELA)) (SPATIAL-UNIT ((VOLUME-UNIT (DECILITER ...)) (DISTANCE-UNIT (NANOMETER ...)))) (AREA-UNIT (ACRE)) ... PERCENT)) (TANGIBLE-OBJECT ((FOOD (HUMAN-FOOD (FISH CHEESE ...))) (SUBSTANCE ((LIQUID (LEMONADE GASOLINE BLOOD ...)) (SOLID-SUBSTANCE (MARBLE PAPER ...)) (GAS-FORM-SUBSTANCE (GAS AIR)) ...)) (INSTRUMENT (DRUM DRILL (WEAPON (ARM GUN)) ...) (BODY-PART (ARM HEART ...)) (MUSICAL-INSTRUMENT (PIANO))) ... *GARMENT *PLANT DISEASE)
58Adapted from slides by Manning, Harabagiu, Kusmerick, ISI
Experiments
6 different question typesfrom Webclopedia QA Typology
– BIRTHDATE– LOCATION– INVENTOR– DISCOVERER– DEFINITION– WHY-FAMOUS
59Adapted from slides by Manning, Harabagiu, Kusmerick, ISI
Experiments: pattern precision
BIRTHDATE:1.0 <NAME> ( <ANSWER> - )0.85 <NAME> was born on <ANSWER>,0.6 <NAME> was born in <ANSWER>0.59 <NAME> was born <ANSWER>0.53 <ANSWER> <NAME> was born0.50 - <NAME> ( <ANSWER>0.36 <NAME> ( <ANSWER> -
INVENTOR1.0 <ANSWER> invents <NAME>1.0 the <NAME> was invented by <ANSWER>1.0 <ANSWER> invented the <NAME> in
60Adapted from slides by Manning, Harabagiu, Kusmerick, ISI
Experiments (cont.)
DISCOVERER1.0 when <ANSWER> discovered <NAME>1.0 <ANSWER>'s discovery of <NAME>0.9 <NAME> was discovered by <ANSWER> in
DEFINITION1.0 <NAME> and related <ANSWER>1.0 form of <ANSWER>, <NAME>0.94 as <NAME>, <ANSWER> and
61Adapted from slides by Manning, Harabagiu, Kusmerick, ISI
Experiments (cont.)
WHY-FAMOUS1.0 <ANSWER> <NAME> called1.0 laureate <ANSWER> <NAME>0.71 <NAME> is the <ANSWER> of
LOCATION1.0 <ANSWER>'s <NAME>1.0 regional : <ANSWER> : <NAME>0.92 near <NAME> in <ANSWER>
Depending on question type, get high MRR (0.6–0.9), with higher results from use of Web than TREC QA collection
62Adapted from slides by Manning, Harabagiu, Kusmerick, ISI
Shortcomings & Extensions
Need for POS &/or semantic types– "Where are the Rocky Mountains?”– "Denver's new airport, topped with white fiberglass
cones in imitation of the Rocky Mountains in the background , continues to lie empty”
– <NAME> in <ANSWER>
NE tagger &/or ontology could enable system to determine "background" is not a location
63Adapted from slides by Manning, Harabagiu, Kusmerick, ISI
Shortcomings... (cont.)
Long distance dependencies"Where is London?”
"London, which has one of the busiest airports in the world, lies on the banks of the river Thames”
would require pattern like:<QUESTION>, (<any_word>)*, lies on <ANSWER>
Abundance & variety of Web data helps system to find an instance of patterns w/o losing answers to long distance dependencies
64Adapted from slides by Manning, Harabagiu, Kusmerick, ISI
Shortcomings... (cont.)
System currently has only one anchor wordDoesn't work for Q types requiring multiple words from question to be in answer
– "In which county does the city of Long Beach lie?”– "Long Beach is situated in Los Angeles County”– required pattern:
<Q_TERM_1> is situated in <ANSWER> <Q_TERM_2>
Does not use case– "What is a micron?”– "...a spokesman for Micron, a maker of semiconductors,
said SIMMs are..."
If Micron had been capitalized in question, would be a perfect answer
65Adapted from slide by Harabagiu and Narayanan
The results of the past 5 TREC evaluations of QA systems indicate that current state-of-the-art QA is determined by the recognition of Named EntitiesIn TREC 2003 the LCC QA system extracted 289 correct answers for factoid questionsThe Name Entity Recognizer was responsible for 234 of them
QUANTITY 55 ORGANIZATION 15 PRICE 3
NUMBER 45 AUTHORED WORK 11 SCIENCE NAME 2
DATE 35 PRODUCT 11 ACRONYM 1
PERSON 31 CONTINENT 5 ADDRESS 1
COUNTRY 21 PROVINCE 5 ALPHABET 1
OTHER LOCATIONS 19 QUOTE 5 URI 1
CITY 19 UNIVERSITY 3
The Importance of NER
66Adapted from slide by Harabagiu and Narayanan
The Special Case of Names
1934: What is the play “West Side Story” based on?Answer: Romeo and Juliet
1976: What is the motto for the Boy Scouts?Answer: Be prepared.
1982: What movie won the Academy Award for best picture in 1989?Answer: Driving Miss Daisy
2080: What peace treaty ended WWI?Answer: Versailles
2102: What American landmark stands on Liberty Island?Answer: Statue of Liberty
Questions asking for names of authored works
67Adapted from slide by Shauna Eggers
Problems
NE assumes all answers are named entitiesOversimplifies the generative power of language!What about: “What kind of flowers did Van Gogh paint?”
Does not account well for morphological, lexical, and semantic alternations
Question terms may not exactly match answer terms; connections between alternations of Q and A terms often not documented in flat dictionaryExample: “When was Berlin’s Brandenburger Tor erected?” no guarantee to match builtRecall suffers
68Adapted from slide by Shauna Eggers
LCC Approach:WordNet to the rescue!WordNet can be used to inform all three steps of the Q/A process1. Answer-type recognition (Answer Type Taxonomy)2. Passage Retrieval (“specificity” constraints)3. Answer extraction (recognition of keyword alternations)
Using WN’s lexico-semantic info: Examples“What kind of flowers did Van Gogh paint?”
– Answer-type recognition: need to know (a) answer is a kind of flower, and (b) sense of the word flower
– WordNet encodes 470 hyponyms of flower sense #1, flowers as plants
– Nouns from retrieved passages can be searched against these hyponyms
“When was Berlin’s Brandenburger Tor erected?”– Semantic alternation: erect is a hyponym of sense #1 of
build
69
WN for Answer Type RecognitionEncodes 8707 English concepts to help recognize expected answer typeMapping to parts of Wordnet done by hand
Can connect to Noun, Adj, and/or Verb subhierarchies
70Adapted from slide by Shauna Eggers
WN in Passage Retrieval
Identify relevant passages from textExtract keywords from the question, andPass them to the retrieval module
“Specificity” – filtering question concepts/keywords
Focuses search, improves performance and precisionQuestion keywords can be omitted from the search if they are too generalSpecificity calculated by counting the hyponyms of a given keyword in WordNet
– Count ignores proper names and same-headed concepts– Keyword is thrown out if count is above a given threshold
(currently 10)
71Adapted from slide by Shauna Eggers
WN in Answer ExtractionIf keywords alone cannot find an acceptable answer, look for alternations in WordNet!
Q196: Who wrote “Hamlet”?
Morphological Alternation: wrote written
Answer: before the young playwright has written Hamlet – and Shakespeare seizes the opportunity
Q136: Who is the queen of Holland?
Lexical Alternation: Holland Netherlands
Answer: Pricess Margrit, sister of Queen Beatrix of the Netherlands, was also present
Q196: What is the highest mountain in the world?
Semantic Alternation: mountain peak
Answer: first African country to send an expedition to Mount Everest, the world’s highest peak
72Adapted from slide by Shauna Eggers
EvaluationPaşca/Harabagiu (NAACL’01 Workshop) measured approach using TREC-8 and TREC-9 test collectionsWN contributions to Answer Type Recognition
Count number of questions for which acceptable answers were found; 3GB text collection, 893 questions
All What only
Flat dictionary (baseline) 227 (32%) 48 (13%)
A-type taxonomy (static) 445 (64%) 179 (50%)
A-type taxonomy (dynamic) 463 (67%) 196 (56%)
A-type taxonomy (dynamic + answer patterns) 533 (76%) 232 (65%)
Method # questions with correct answer type
73Adapted from slide by Shauna Eggers
Evaluation
WN contributions to Passage RetrievalImpact of keyword alternations
Impact of specificity knowledge
No alternations enabled 55.3% precision
Lexical alternations enabled 67.6%
Lexical + semantic alternations enabled 73.7%
Morphological expansions enabled 76.5%
TREC-8 TREC-9
Not included 133 (65%) 463 (67%)
Included 151 (76%) 515 (74%)
Specificity knowledge
# questions with correct answer infirst 5 documents returned
74
Going Beyond Word Matching
Use techniques from artificial intelligence to try to draw inferences from the meanings of the wordsThis is a highly unusual and ambitious approach.
Surprising it works at all!Requires huge amounts of hand-coded information
Uses notions of proofs and inference from logicAll birds fly. Robins are birds. Thus, robins fly.forall(X): bird(X) -> fly(x)forall(X,Y): student(X), enrolled(X,Y) -> school(Y)
75Adapted from slides by Manning, Harabagiu, Kusmerick, ISI
Inference via a Logic Prover
The LCC system attempts inference to justify an answerIts inference engine is a kind of funny middle ground between logic and pattern matchingBut quite effective: 30% improvement
Q: When was the internal combustion engine invented?A: The first internal-combustion engine was built in 1867.
invent -> create_mentally -> create -> build
76Adapted from slide by Surdeanu and Pasca
COGEXWorld knowledge from:
WordNet glosses converted to logic forms in the eXtended WordNet (XWN) project Lexical chains
– game:n#3 HYPERNYM recreation:n#1 HYPONYM sport:n#1
– Argentine:a#1 GLOSS Argentina:n#1
NLP axioms to handle complex NPs, coordinations, appositions, equivalence classes for prepositions etceteraNamed-entity recognizer
– John Galt HUMAN
A relaxation mechanism is used to iteratively uncouple predicates, remove terms from LFs. The proofs are penalized based on the amount of relaxation involved.
77Adapted from slides by Manning, Harabagiu, Kusmerick, ISI
Logic Inference Example
“How hot does the inside of an active volcano get?” get(TEMPERATURE, inside(volcano(active))) “lava fragments belched out of the mountain were as hot as 300 degrees Fahrenheit” fragments(lava, TEMPERATURE(degrees(300)), belched(out, mountain)) volcano ISA mountain lava ISPARTOF volcano -> lava inside volcano fragments of lava HAVEPROPERTIESOF lava
The needed semantic information is in WordNet definitions, and was successfully translated into a form that was used for rough ‘proofs’
78Adapted from slide by Harabagiu and Narayanan
Axiom Creation
XWN AxiomsA major source of world knowledge is a general purpose knowledge base of more than 50,000 parsed and disambiguated WordNet glosses that are transformed into logical form for use during the course of a proof.
Gloss: Kill is to cause to die
Logical Form: kill_VB_1(e1,x1,x2) -> cause_VB_1(e1,x1,x3) & to_TO(e1,e2) & die_VB_1(e2,x2,x4)
79Adapted from slide by Harabagiu and Narayanan
Lexical ChainsLexical Chains
Lexical chains provide an improved source of world knowledge by supplying the Logic Prover with much needed axioms to link question keywords with answer concepts.
Question: How were biological agents acquired by bin Laden?
Answer: On 8 July 1998 , the Italian newspaper Corriere della Serra indicated that members of The World Front for Fighting Jews and Crusaders , which was founded by Bin Laden , purchased three chemical and biological_agent production facilities in
Lexical Chain: ( v - buy#1, purchase#1 ) HYPERNYM ( v - get#1, acquire#1 )
80Adapted from slide by Harabagiu and Narayanan
Axiom Selection Lexical chains and the XWN knowledge base work together to select and
generate the axioms needed for a successful proof when all the keywords in the questions are not found in the answer.
Question: How did Adolf Hitler die?
Answer: … Adolf Hitler committed suicide …
The following Lexical Chain is detected:( n - suicide#1, self-destruction#1, self-annihilation#1 ) GLOSS ( v - kill#1 ) GLOSS ( v - die#1, decease#1, perish#1, go#17, exit#3, pass_away#1, expire#2, pass#25 ) 2
The following axioms are loaded into the Prover: exists x2 all e1 x1 (suicide_nn(x1) -> act_nn(x1) & of_in(x1,e1) &
kill_vb(e1,x2,x2)).
exists x3 x4 all e2 x1 x2 (kill_vb(e2,x1,x2) -> cause_vb_2(e1,x1,x3) & to_to(e1,e2) & die_vb(e2,x2,x4)).
81
LCC System Refecences
The previous set of slides drew information from these sources:
The Informative Role of WordNet in Open-Domain Question Answering, Pasca and Harabagiu, WordNet and Other Lexical Resources, NAACL 2001 WorkshopPasca and Harabagiu, High Performance Question/Answering, SIGIR’01Moldovan, Clark, Harabagiu, Maiorano: COGEX: A Logic Prover for Question Answering. HLT-NAACL 2003Moldovan, Pasca, Harabagiu, and Surdeanu: Performance issues and error analysis in an open-domain question answering system. ACM Trans. Inf. Syst. 21(2): 133-154 (2003) Harabagiu and Maiorano, Abductive Processes for Answer Justification, AAAI Spring Symposium on Mining Answers from Texts and Knowledge Bases, 2002
82
Using Machine Learning in QA
The following slides are based on:Ramakrishnan, Chakrabarti, Paranjpe, Bhattacharyya, Is Question Answering an Acquired Skill? WWW’04
83
Learning Answer Type Mapping
Idea: use machine learning techniques to automatically determine answer types and query terms from questions.Two types of answer types:
Surface patterns– Infinite set, so can’t be covered by a lexicon– DATES NUMBERS PERSON NAMES LOCATIONS– “at DD:DD” “in the ‘DDs” “in DDDD” “Xx+ said”– Can also associate with synset [date#n#7]
WordNet synsets– Consider: “name an animal that sleeps upright”– Answer: “horse”
Ramakrishnan et al., Is Question Answering an Acquired Skill? WWW’04
84Ramakrishnan et al., Is Question Answering an Acquired Skill? WWW’04
Determining Answer Types
The hard ones are “what” and “which” questions.Two useful heuristics:
If the head of the NP appearing before the auxiliary or main verb is not a wh-word, mark this as an a-type clueOtherwise, the head of the NP appearing after the auxiliary/main verb is an atype clue.
85
Learning Answer Types
Given a QA pair (q, a):– (“name an animal that sleeps upright”, “horse”)
(1a) See which atype(s) “horse” can map to(1b) Look up the hypernyms of “horse” -> S(2a) Record the k words to the right of the q-word(2b) For each of these k words, look up their synsets
– An, animal, that
(2c) Increment the counts for those synsets that also appear in SDo significance testing
Compare synset frequencies against a background setRetain only those that are significantly associated with the question word more so than in general (chi-square)
Ramakrishnan et al., Is Question Answering an Acquired Skill? WWW’04
86
Learning Answer Tyeps
Ramakrishnan et al., Is Question Answering an Acquired Skill? WWW’04
87
Learning to Choose Query Terms
Which words from the question to use in the query?A tradeoff between precision and recall.Example:
“Tokyo is the capital of which country?”Want to use “Tokyo” verbatimProbably “capital” as wellBut maybe not “country”; maybe “nation” or maybe this word won’t appear in the retrieved passage at all.
– Also, “country” corresponds to the answer type, so probably we don’t want to require it to be in the answer text.
Ramakrishnan et al., Is Question Answering an Acquired Skill? WWW’04
88
Learning to Choose Query Terms
Features:POS assigned to word and immediate neighborsStarts with uppercase letterIs a stopwordIDF scoreIs an answer-type for this questionAmbiguity indicators:
– # of possible WordNet senses (NumSense)– # of other WordNet synsets that describe this sense
E.g., for “buck”: stag, deer, doe (NumLemma)
Learner:J48 decision tree worked best
Ramakrishnan et al., Is Question Answering an Acquired Skill? WWW’04
89
Learning to Choose Query TermsResults
WordNet ambiguity indicators were very helpful– Raised accuracy from 71-73% to 80%
Atype flag improved accuracy from 1-3%
90
Learning to Score Passages
Given a question, and answer, & a passage (q, a, r)Assign +1 if r contains aAssign –1 otherwise
Features:Do selected terms s from q appear in r?Does r have an answer zone a that does not s?Are the distances between tokens in a and s small?Does a have a strong WordNet similarity with q’s answer type?
Learner:Use logistic regression, since it produces a ranking rather than a hard classification into +1 or –1
– Produces a continuous estimate between 0 and 1
Ramakrishnan et al., Is Question Answering an Acquired Skill? WWW’04
91
Learning to Score Passages
Results:F-scores are low (.33 - .56)However, reranking greatly improves the rank of the corresponding passages.Eliminates many non-answers, pushing better passages towards the top.
92Ramakrishnan et al., Is Question Answering an Acquired Skill? WWW’04
Learning to Score Passages
93
Computing WordNet Similarity
Path-based similarity measures are not all that good in WordNet
3 hops from entity to artifact3 hops from mammal to elephant
An alternative: Given a target synset t and an answer synset a:
– Measure the overlap of nodes on the path from t to all noun roots and from a to all noun roots
– Algorithm for computing similarity of t to a: If t is not a hypernym of a: assign 0 Else collect the set of hypernym synsets of t and a Call them Ht and Ha Compute the Jaccard overlap
» |Ht Intersect Ha| / |Ht Union Ha|
Ramakrishnan et al., Is Question Answering an Acquired Skill? WWW’04
94
Computing WordNet SimilarityAlgorithm for computing similarity of t to a:
– |Ht Intersect Ha| / |Ht Union Ha|
chordate
vertebrate
mammal
elephant
placental mammalproboscidean
animal
living thing
organism
object
entity|Ht Intersect Ha|
|Ht Union Ha|
Ht = mammal, Ha = elephant7/10 = .7
Ht = animal, Ha = elephant5/10 = .5
Ht = animal, Ha = mammal4/7 = .57
Ht = mammal, Ha = fox7/11 = .63
Ramakrishnan et al., Is Question Answering an Acquired Skill? WWW’04
95Adapted from slide by Surdeanu and Pasca
System Extension: Definition Questions
Definition questions ask about the definition or description of a concept:
Who is John Galt?What is anorexia nervosa?
Many “information nuggets” are acceptable answersWho is George W. Bush?
– … George W. Bush, the 43rd President of the United States… – George W. Bush defeated Democratic incumbent
Ann Richards to become the 46th Governor of the State of Texas…
ScoringAny information nugget is acceptablePrecision score over all information nuggets
96Adapted from slide by Surdeanu and Pasca
Definition Detection with Pattern Matching
What <be> a <QP> ?Who <be> <QP> ? example: “Who is Zebulon Pike?”<QP>, the <AP><QP> (a <AP>)<AP HumanConcept> <QP> example: “explorer Zebulon Pike”
Question patterns
Answer patterns
Q386: What is anorexia nervosa?
cause of anorexia nervosa, an eating disorder...
Q358: What is a meerkat? the meerkat, a type of mongoose, thrives in...
Q340: Who is Zebulon Pike? in 1806, explorer Zebulon Pike sighted the...
97Adapted from slide by Surdeanu and Pasca
Answer Detection with Concept Expansion
Enhancement for Definition questionsIdentify terms that are semantically related to the phrase to defineUse WordNet hypernyms (more general concepts)
Question WordNet hypernym Detected answer candidate
What is a shaman? {priest, non-Christian priest}
Mathews is the priest or shaman
What is a nematode?
{worm} nematodes, tiny worms in soil
What is anise? {herb, herbaceous plant}
anise, rhubarb and other herbs
98Adapted from slides by Manning, Harabagiu, Kushmeric, and ISI
Online QA Examples
Examples (none work very well)AnswerBus:
– http://www.answerbus.com
Ionaut: – http://www.ionaut.com:8400/
LCC: – http://www.languagecomputer.com/demos/question
_answering/index.html