Question Answering and Information Extraction
CMSC 473/673, UMBC
December 11th, 2017


Page 1

Question Answering and Information Extraction

CMSC 473/673, UMBC

December 11th, 2017

Page 2

Course Announcement 1: Project

Due Wednesday 12/20, 11:59 AM

Late days cannot be used

Any questions?

Page 3

Course Announcement 2: Final Exam

No mandatory final exam

December 20th, 1pm-3pm: optional second midterm/final

Averaged into first midterm score

No practice questions

Register by Monday 12/11: https://goo.gl/forms/aXflKkP0BIRxhOS83

Page 4

Course Announcement 3: Evaluations

Please fill them out! (We do pay attention to them)

Links from [email protected]

Page 5

Recap from last time…

Page 6

Pat and Chandler agreed on a plan.

He said Pat would try the same tactic again.

is “he” the same person as “Chandler”?

Entity Coreference Resolution

Page 7

Basic System

Input Text → Preprocessing → Mention Detection → Coref Model → Output

Page 8

What are Named Entities?

Named entity recognition (NER)

Identify proper names in text and classify them into a set of predefined categories of interest:

– Person names
– Organizations (companies, government organisations, committees, etc.)
– Locations (cities, countries, rivers, etc.)
– Date and time expressions
– Measures (percent, money, weight, etc.), email addresses, Web addresses, street addresses, etc.
– Domain-specific: names of drugs, medical conditions, names of ships, bibliographic references, etc.

Cunningham and Bontcheva (2003, RANLP Tutorial)

Page 9

Two kinds of NE approaches

Knowledge Engineering

– rule based
– developed by experienced language engineers
– make use of human intuition
– requires only a small amount of training data
– development can be very time consuming
– some changes may be hard to accommodate

Learning Systems

– requires some (large?) amounts of annotated training data
– some changes may require re-annotation of the entire training corpus
– annotators can be cheap

Cunningham and Bontcheva (2003, RANLP Tutorial)

Page 10

Baseline: list lookup approach

System that recognises only entities stored in its lists (gazetteers).

Advantages - Simple, fast, language independent, easy to retarget (just create lists)

Disadvantages – impossible to enumerate all names, collection and maintenance of lists, cannot deal with name variants, cannot resolve ambiguity

Cunningham and Bontcheva (2003, RANLP Tutorial)
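A minimal sketch of such a lookup system, assuming a small illustrative gazetteer mapping phrases to entity types (the function name and entries are invented for this example):

```python
def gazetteer_tag(tokens, gazetteer):
    """Greedy longest-match lookup: BIO-tag any span found in the gazetteer."""
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        # try the longest span starting at position i first
        for j in range(len(tokens), i, -1):
            span = " ".join(tokens[i:j])
            if span in gazetteer:
                label = gazetteer[span]
                tags[i] = "B-" + label
                for k in range(i + 1, j):
                    tags[k] = "I-" + label
                i = j
                break
        else:
            i += 1  # no gazetteer entry starts here
    return tags

# illustrative gazetteer
gaz = {"Sherwood Forest": "LOC", "Pat": "PER"}
print(gazetteer_tag("Pat visited Sherwood Forest".split(), gaz))
# -> ['B-PER', 'O', 'B-LOC', 'I-LOC']
```

The sketch also makes the slide's disadvantages concrete: anything not literally in `gaz` (name variants, ambiguous strings) is missed or mis-tagged.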

Page 11

Shallow Parsing Approach (internal structure)

Internal evidence – names often have internal structure. These components can be either stored or guessed, e.g. location:

Cap. Word + {City, Forest, Center, River} → Sherwood Forest

Cap. Word + {Street, Boulevard, Avenue, Crescent, Road} → Portobello Street

Cunningham and Bontcheva (2003, RANLP Tutorial)
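Patterns like these map naturally onto regular expressions; a hedged sketch using the keyword lists above (the pattern name is ours):

```python
import re

# One or more capitalized words followed by a location keyword.
# (?:...) groups are non-capturing so findall returns the full match.
LOC_PATTERN = re.compile(
    r"\b(?:[A-Z][a-z]+ )+(?:City|Forest|Center|River|Street|Boulevard|Avenue|Crescent|Road)\b"
)

text = "They met near Sherwood Forest and again on Portobello Street."
print(LOC_PATTERN.findall(text))
# -> ['Sherwood Forest', 'Portobello Street']
```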

Page 12

NER and Shallow Parsing: Machine Learning

Sequence models (HMM, CRF) often effective

BIO encoding

Pat and Chandler Smith agreed on a plan.

NER tags:
Pat/B-PER and/O Chandler/B-PER Smith/I-PER agreed/O on/O a/O plan/O

Chunk tags (separate NPs):
Pat/B-NP and/O Chandler/B-NP Smith/I-NP agreed/B-VP on/O a/B-NP plan/I-NP

Chunk tags (coordinated NP):
Pat/B-NP and/I-NP Chandler/I-NP Smith/I-NP agreed/B-VP on/O a/B-NP plan/I-NP
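Given a tagged sequence, entity spans can be read off deterministically with one pass; a minimal sketch (the function name is ours):

```python
def bio_to_spans(tokens, tags):
    """Collect (label, text) spans from a BIO tag sequence."""
    spans, current, label = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:  # close any open span
                spans.append((label, " ".join(current)))
            current, label = [tok], tag[2:]
        elif tag.startswith("I-") and current:
            current.append(tok)  # continue the open span
        else:  # "O", or a stray I- with no open span
            if current:
                spans.append((label, " ".join(current)))
            current, label = [], None
    if current:
        spans.append((label, " ".join(current)))
    return spans

tokens = "Pat and Chandler Smith agreed on a plan .".split()
tags = ["B-PER", "O", "B-PER", "I-PER", "O", "O", "O", "O", "O"]
print(bio_to_spans(tokens, tags))
# -> [('PER', 'Pat'), ('PER', 'Chandler Smith')]
```

This is exactly why the B/I distinction matters: without it, the adjacent mentions "Pat" and "Chandler Smith" could not be separated from a single three-token mention.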

Page 13

Pat and Chandler agreed on a plan.

He said Pat would try the same tactic again.

?

Model Attempt 1: Binary Classification

Mention-Pair Model

Page 14

Pat and Chandler agreed on a plan.

He said Pat would try the same tactic again.

Model Attempt 1: Binary Classification

training

observed positive

instances

negative instances

naïve approach (take all non-positive pairs): highly imbalanced!

Soon et al. (2001): heuristic for more balanced selection

solution: go left-to-right

for a mention m, select the closest preceding coreferent mention

otherwise, no antecedent is found for m

possible problem: not transitive
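The instance-selection heuristic can be sketched as follows, assuming gold clusters are known at training time (the function and variable names here are illustrative, not from Soon et al.'s implementation):

```python
def training_pairs(mentions, cluster_of):
    """For each mention, emit its closest preceding coreferent mention as a
    positive pair and every mention strictly between them as a negative pair."""
    pairs = []
    for j in range(len(mentions)):
        # scan right-to-left for the closest coreferent antecedent
        antecedent = None
        for i in range(j - 1, -1, -1):
            if cluster_of[mentions[i]] == cluster_of[mentions[j]]:
                antecedent = i
                break
        if antecedent is None:
            continue  # no antecedent found for this mention
        pairs.append((mentions[antecedent], mentions[j], True))
        for k in range(antecedent + 1, j):
            pairs.append((mentions[k], mentions[j], False))
    return pairs

# mentions in document order, with gold cluster ids
mentions = ["Pat", "Chandler", "He", "Pat2"]
cluster_of = {"Pat": 1, "Chandler": 2, "He": 2, "Pat2": 1}
print(training_pairs(mentions, cluster_of))
# -> [('Chandler', 'He', True), ('Pat', 'Pat2', True),
#     ('Chandler', 'Pat2', False), ('He', 'Pat2', False)]
```

Note how much smaller this is than all non-coreferent pairs: intervening mentions only, which is the point of the heuristic.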

Page 15

Anaphora

does a mention have an antecedent?

Chris told Pat he aced the test.

Page 16

Model 2: Entity-Mention Model

entity 1

entity 2

entity 3

entity 4

Pat and Chandler agreed on a plan.

He said Pat would try the same tactic again.

advantage: featurize based on all (or some or none) of the clustered mentions

disadvantage: clustering doesn’t address anaphora

Page 17

Model 3: Cluster-Ranking Model (Rahman and Ng, 2009)

entity 1

entity 2

entity 3

entity 4

Pat and Chandler agreed on a plan.

He said Pat would try the same tactic again.

learn to rank the clusters and items in them

Page 18

Stanford Coref (Lee et al., 2011)

Page 19

<Core Task Here> Applications

Question answering

Information extraction

Machine translation

Text summarization

Information retrieval

Page 20
Page 21

IBM Watson

https://www.youtube.com/watch?v=C5Xnxjq63Zg

https://youtu.be/WFR3lOm_xhE?t=34s

Page 22

What happened with Watson?

(let’s ask Google)

Page 23

What Happened with Watson?

David Ferrucci, the manager of the Watson project at IBM Research, explained during a viewing of the show on Monday morning that several things probably confused Watson. First, the category names on Jeopardy! are tricky. The answers often do not exactly fit the category. Watson, in his training phase, learned that categories only weakly suggest the kind of answer that is expected, and, therefore, the machine downgrades their significance. The way the language was parsed provided an advantage for the humans and a disadvantage for Watson, as well. “What US city” wasn’t in the question. If it had been, Watson would have given US cities much more weight as it searched for the answer. Adding to the confusion for Watson, there are cities named Toronto in the United States and the Toronto in Canada has an American League baseball team. It probably picked up those facts from the written material it has digested. Also, the machine didn’t find much evidence to connect either city’s airport to World War II. (Chicago was a very close second on Watson’s list of possible answers.) So this is just one of those situations that’s a snap for a reasonably knowledgeable human but a true brain teaser for the machine.

https://www.huffingtonpost.com/2011/02/15/watson-final-jeopardy_n_823795.html

Page 24

How many children does the Queen have?

Page 25
Page 26

There are still errors (but some questions are harder than others)

Page 27

Question Answering Motivation

Question answering

Information extraction

Machine translation

Text summarization

Information retrieval

Page 28
Page 29

Brief Aside: Information Extraction

ATTACK template
– Type: Gun attack
– Perp: Shining Path
– # killed: 3

BUSINESS NEGOTIATION template
– ….

Three people have been fatally shot, and five people, including a mayor, were seriously wounded as a result of a Shining Path attack today.

Page 30

Remember Our Logical Forms?

Papa ate the caviar

KB

Page 31

Two Types of QA

Closed domain
Often tied to a structured database

Open domain
Often tied to unstructured data

Page 32

Remember Our Logical Forms?

Papa ate the caviar

KB + Corpus

Page 33

Open Domain:START (1993-Present; Katz, 1997)

SynTactic Analysis using Reversible Transformations

http://start.csail.mit.edu

Page 34

Open Domain:START (1993-Present; Katz, 1997)

“The START Server is built on two foundations: the sentence level Natural Language processing capability provided by the START Natural Language system (Katz, 1990) and the idea of natural language annotations for multimedia information segments. This paper starts with an overview of sentence level processing in the START system and then explains how annotating information segments with collections of English sentences makes it possible to use the power of sentence level natural language processing in the service of multimedia information access. The paper ends with a proposal to annotate the World Wide Web.” (Katz, 1997)

SynTactic Analysis using Reversible Transformations
http://start.csail.mit.edu

Page 35

Open Domain:START (1993-Present; Katz, 1997)

Decompose sentences into (subject, verb, object) triples

“De-questionify” the input: “How many children does the queen have” → “The queen has how many children”

Apply any needed inference rules

Query against knowledge base

SynTactic Analysis using Reversible Transformations
http://start.csail.mit.edu
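START's actual machinery uses reversible syntactic transformations; purely as a toy illustration of the "de-questionify" step, a sketch that handles only this one question shape (the function, regex, and verb table are invented for this example):

```python
import re

THIRD_PERSON = {"have": "has"}  # tiny irregular-verb table for this sketch

def dequestionify(question):
    """Rewrite 'WH does SUBJ VERB?' into declarative order, START-style."""
    m = re.match(r"(.+?) does (.+?) (\w+)\??$", question)
    if not m:
        return question  # question shape not handled by this sketch
    wh, subject, verb = m.groups()
    verb = THIRD_PERSON.get(verb, verb + "s")  # crude 3rd-person agreement
    result = f"{subject} {verb} {wh.lower()}"
    return result[0].upper() + result[1:]

print(dequestionify("How many children does the queen have?"))
# -> The queen has how many children
```

The declarative form can then be matched against stored (subject, verb, object) triples the same way ordinary assertions are.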


Page 36
Page 37

Basic System

Input Question → Question Analysis → Question Classification → Query Construction → Document Retrieval → Sentence Retrieval → Sentence NLP → Answer Extraction → Answer Validation → Answer

(the retrieval and validation stages draw on a Corpus and a KB)

To Learn More: NLP, Information Retrieval (IR), Information Extraction (IE)

Neves --- https://hpi.de/fileadmin/user_upload/fachgebiete/plattner/teaching/NaturalLanguageProcessing/NLP09_QuestionAnswering.pdf

Page 38

Aspects of NLP
– POS tagging
– Stemming
– Shallow Parsing (chunking)
– Predicate argument representation: verb predicates and nominalization
– Entity Annotation: stand-alone NERs with a variable number of classes; dates, times and numeric value normalization
– Identification of semantic relations: complex nominals, genitives, adjectival phrases, and adjectival clauses
– Event identification
– Semantic Parsing

Page 39
Page 40

Question Classification

Albert Einstein was born in 14 March 1879.

Albert Einstein was born in Germany.

Albert Einstein was born in a Jewish family.

Neves --- https://hpi.de/fileadmin/user_upload/fachgebiete/plattner/teaching/NaturalLanguageProcessing/NLP09_QuestionAnswering.pdf

Page 41

Question Classification Taxonomy

LOC:other  Where do hyenas live?
NUM:date  When was Ozzy Osbourne born?
LOC:other  Where do the adventures of “The Swiss Family Robinson” take place?
LOC:other  Where is Procter & Gamble based in the U.S.?
HUM:ind  What barroom judge called himself The Law West of the Pecos?
HUM:gr  What Polynesian people inhabit New Zealand?

http://cogcomp.org/Data/QA/QC/train_1000.label
SLP3: Figure 28.4
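Real systems learn this mapping from labeled data like the set above; purely as a toy illustration of the task, a wh-word heuristic might look like this (the function, rules, and fallback label are simplified inventions, not the full taxonomy):

```python
def coarse_question_class(question):
    """Guess a coarse question class from the leading wh-word."""
    q = question.lower()
    if q.startswith("where"):
        return "LOC"
    if q.startswith("when"):
        return "NUM:date"
    if q.startswith("how many"):
        return "NUM:count"
    if q.startswith("who"):
        return "HUM"
    return "DESC"  # fallback for everything this sketch doesn't cover

print(coarse_question_class("Where do hyenas live?"))         # -> LOC
print(coarse_question_class("When was Ozzy Osbourne born?"))  # -> NUM:date
```

The "What ..." examples above show why wh-words alone are not enough: "What Polynesian people..." is HUM:gr, so learned classifiers also use head nouns, NER, and other features.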

Page 42
Page 43

NLP Techniques: Vector Space Model, Probabilistic Model, Language Model

Software: Lucene, sklearn, nltk

Document & Sentence Retrieval

tf-idf(d, w): term frequency × inverse document frequency

tf: frequency of word w in document d
idf: inverse frequency of documents containing w

tf-idf(d, w) = (count(w ∈ d) / # tokens in d) * log(# documents / # documents containing w)
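The formula translates directly into Python; a minimal sketch where documents are token lists (the function name and toy corpus are illustrative):

```python
import math

def tf_idf(word, doc, corpus):
    """tf-idf as defined above; doc and each corpus entry are token lists."""
    tf = doc.count(word) / len(doc)           # term frequency in d
    df = sum(1 for d in corpus if word in d)  # documents containing w
    if df == 0:
        return 0.0  # word unseen in the corpus
    return tf * math.log(len(corpus) / df)

docs = [
    "the queen has four children".split(),
    "the children play".split(),
    "four rivers".split(),
]
print(round(tf_idf("four", docs[0], docs), 3))
# -> 0.081   (tf = 1/5, idf = log(3/2))
```

Note that a very common word like "the" appears in most documents, so its idf (and hence its score) goes toward zero, which is exactly the weighting retrieval wants.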

Page 44

Current NLP QA Tasks

TREC (Text Retrieval Conference)
http://trec.nist.gov/
Started in 1992

Freebase Question Answering
e.g., https://nlp.stanford.edu/software/sempre/
Yao et al. (2014)

WikiQA
https://www.microsoft.com/en-us/research/publication/wikiqa-a-challenge-dataset-for-open-domain-question-answering/

Page 45

orthography

morphology

lexemes

syntax

semantics

pragmatics

discourse

VISION
AUDIO

prosody

intonation

color

Page 46

Visual Question Answering

http://www.visualqa.org/

Page 47

Course Goals
• Be introduced to some of the core problems and solutions of NLP (big picture)
• Learn different ways that success and progress can be measured in NLP
• Relate to statistics, machine learning, and linguistics
• Implement NLP programs
• Read and analyze research papers
• Practice your (written) communication skills

Page 48
Page 49

Course Recap

Basics of Probability
– Requirements to be a distribution (“proportional to”, ∝)
– Definitions of conditional probability, joint probability, and independence
– Bayes rule, (probability) chain rule

Basics of language modeling
– Goal: model (be able to predict) and give a score to language (whole sequences of characters or words)
– Simple count-based model
– Smoothing (and why we need it): Laplace (add-λ), interpolation, backoff
– Evaluation: perplexity

Tasks and Classification (use Bayes rule!)
– Posterior decoding vs. noisy channel model
– Evaluations: accuracy, precision, recall, and Fβ (F1) scores
– Naïve Bayes (given the label, generate/explain each feature independently) and connection to language modeling

Maximum Entropy Models
– Meanings of feature functions and weights
– Use for language modeling or conditional classification (“posterior in one go”)
– How to learn the weights: gradient descent

Distributed Representations & Neural Language Models
– What embeddings are and what their motivation is
– A common way to evaluate: cosine similarity

(Word Modeling)

Latent Models
– What is meant by “latent”
– Expectation Maximization
– Basic Example: Unigram Mixture Model (3 coins)

Machine Translation Alignment
– Family of methods for learning word-to-word translations
– IBM Model 1
– Can be used beyond MT (e.g., semantics, paraphrasing)

Hidden Markov Model
– Basic Definition: generative bigram model of latent tags
– 3 Tasks: Likelihood, Most-Likely Sequence, Parameter Estimation
– 3 Basic Algorithms: Forward (Backward), Viterbi, Baum-Welch
– 2 Types of Decoding: Viterbi & Posterior

Semi-Supervised Learning
– Labeled data (small amount) + unlabeled data (large amount)
– Apply EM to get fractional counts to re-estimate parameters

(Latent Sequences)

Syntactic Parsing
– Basic linguistic intuitions
– Capturing of some ambiguities and light semantics

Constituency Parsing
– Basic Definition: generative tree
– 3 Tasks: Likelihood, Most-Likely Sequence, Parameter Estimation
– 3 Basic Algorithms: Inside, Viterbi, Outside
– 2 Types of Decoding: Viterbi & Posterior

Dependency Parsing
– Word-to-word relations
– Shift-reduce parsing
– Greedy vs. beam search

Semantic Forms
– Roles, Frames, and Labeling
– Ways to get human judgments (methodology and infrastructure)
– Lexical & knowledge resources

(Latent Structures)

Page 50

Natural Language Processing

Page 51

Conditional vs. Sequence

CRF Tutorial, Fig 1.2, Sutton & McCallum (2012)

We’ll cover these in 678

Page 52

Gradient Ascent

Page 53

Pick Your Toolkit

PyTorch, Deeplearning4j, TensorFlow, DyNet, Caffe, Keras, MxNet, Gluon, CNTK

Comparisons:
https://en.wikipedia.org/wiki/Comparison_of_deep_learning_software
https://deeplearning4j.org/compare-dl4j-tensorflow-pytorch
https://github.com/zer0n/deepframeworks (older, 2015)

Page 54

http://www.qwantz.com/index.php?comic=170

Page 55

Thank you for a great semester!

ITE [email protected]

Natural language processing

Semantics

Vision & language processing

Learning with low-to-no supervision