Natural Language Processing & Information Extraction CSE 573 Dan Weld


Page 1:

Natural Language Processing & Information

Extraction

CSE 573

Dan Weld

Page 2:

todo

• Be more explicit on how rule learning works (before Rapier, BWI) – show search through rule space

• Too much of a lecture – add more interaction / questions

Page 3:

Confession

• Proposal 1
  – Keep homework and final exam
  – Teach next week so we can cover all the cool stuff

• Proposal 2
  – Talk quickly
  – Mini-project replaces homework & final, due 12/11

Page 4:

Mini-Project Options

1. Write a program to solve the counterfeit coin problem on the midterm.

2. Build a DPLL and/or a WalkSAT satisfiability solver.

3. Build a spam filter using naïve Bayes, decision trees, or compare learners in the Weka ML package.

4. Write a program which learns Bayes nets.

Page 5:

Communication

Page 6:

Is the Problem

Page 7:

Of the century…

Herman is reprinted with permission from LaughingStock Licensing Inc., Ottawa, Canada. All rights reserved.

Page 8:

Page 9:

Why Study NLP?

• A hallmark of human intelligence.

• Text is the largest repository of human knowledge and is growing quickly.
  – emails, news articles, web pages, IM, scientific articles, insurance claims, customer complaint letters, transcripts of phone calls, technical documents, government documents, patent portfolios, court decisions, contracts, …

Page 10:

What is NLP?

• Natural Language Processing
  – Understand human language (speech, text)
  – Successfully communicate (NL generation)

• Pragmatics: “Can you pass the salt?”
• Semantics: “Paris is the capital of France”
  – How to represent meaning?

• Syntax: analyze the structure of a sentence
• Morphology: analyze the structure of words
  – Dogs = dog + s

Page 11:

Fundamental NLP Tasks

• Speech recognition: speech signal to words
• Parsing: decompose sentence into units
  – Part-of-speech tagging: e.g., noun vs. verb

• Semantic role labeling: for a verb, find the units that fill pre-defined “semantic roles” (e.g., Agent, Patient, or Instrument)
  – Example: “John hit the ball with the bat”

Page 12:

Fundamental NLP Tasks

• Semantics: map sentence to corresponding “meaning representation” (e.g., logic)
  – give(John, Book, Sally)
  – Quantification: Everybody loves somebody

• Word Sense Disambiguation
  – orange juice vs. orange coat
  – Where do word senses come from?

• Co-reference resolution:
  – The dog found a cookie; He ate it.

• Implicit “text” – what is left unsaid?
  – Joe entered the restaurant, ordered, ate and left. The owner said “Come back soon!”
  – Joe entered the restaurant, ordered, ate and left. The owner said “Hey, I’ll call the police!”

Page 13:

NLP Applications

• Question answering
  – Who is the first Taiwanese president?

• Text Categorization/Routing
  – e.g., customer e-mails

• Text Mining
  – Find everything that interacts with BRCA1.

• Machine Translation
• Information Extraction
• Spelling correction
  – Is that just dictionary lookup?

Page 14:

Main Challenge in NLP: Ambiguity

• Lexical:
  – Label (noun or verb)?
  – London (Jack or capital of the UK)?

• Syntactic (examples from newspaper headlines):

– Prepositional Phrase Attachment
  • Ban on Nude Dancing on Governor’s Desk
  • Killer Sentenced to Die for Second Time in 10 Years

– Word Sense Disambiguation
  • 1990: Iraqi Head Seeking Arms

– Syntactic Ambiguity (what’s modifying what)
  • Juvenile Court to Try Shooting Defender.

• Semantic ambiguity:
  – “snake poison”

• Rampant metaphors:
  – “prices went up”

Page 15:

Speech Act Theory (Searle ’75)

• Example Speech Acts:
  – assertives = speech acts that commit a speaker to the truth of the expressed proposition
  – directives = speech acts that are meant to cause the hearer to take a particular action, e.g., requests, commands, and advice
  – commissives = speech acts that commit a speaker to some future action, e.g., promises and oaths

• “Could you pass the salt?”– “Yes.”

• Research has focused on specifying semantics and analyzing primitives instead of implemented “speech actors”

Page 16:

Semantics

• How do you map a sentence to a semantic representation?
  – What are the semantic primitives?

• Schank: 12 (or so) primitives
  – (Conceptual Dependency theory, 1972)
  • The basis of natural language is conceptual.
  • The conceptual level is interlingual, while the sentential level is language-specific.
  – De-emphasize syntax
  – Focus on semantic expectations as the basis for NLU

• Status: didn’t yield practical systems, out of favor, but..

Page 17:

Syntax is about Sentence Structures

• Sentences have structures and are made up of constituents.
  – The constituents are phrases.
  – A phrase consists of a head and modifiers.

• The category of the head determines the category of the phrase
  – e.g., a phrase headed by a noun is a noun phrase

Page 18:

Alternative Parses

Teacher strikes idle kids

Parse 1 (tags: N N V N): [S [NP Teacher strikes] [VP idle [NP kids]]]
Parse 2 (tags: N V A N): [S [NP Teacher] [VP strikes [NP idle kids]]]

Page 19:

How do you choose the parse?

• Old days: syntactic rules, semantics.
• Since early 1980’s: statistical parsing.
  – parsing as supervised learning

• Input: tree bank of (sentence, tree) pairs
• Output: PCFG parser + lexical language model

• Parsing decisions depend on the words!
  • Example: VP → V + NP + NP?
  • Does the verb take two objects or one?
  • ‘told Sue the story’ versus ‘invented the Internet’

Page 20:

Issues with Statistical Parsing

• Statistical parsers still make “plenty” of errors

• Tree banks are language specific
• Tree banks are genre specific
  – Train on WSJ, fail on the Web
  – violates the standard distributional assumption

• Unsupervised, un-lexicalized parsers exist
  – But performance is substantially weaker

Page 21:

How Difficult is Morphology?

• Examples from Woods et al. 2000
  – delegate (de + leg + ate): take the legs from
  – caress (car + ess): female car
  – cashier (cashy + er): more wealthy
  – lacerate (lace + rate): speed of tatting
  – ratify (rat + ify): infest with rodents
  – infantry (infant + ry): childish behavior

Page 22:

What is “Information Extraction”

As a task: filling slots in a database from sub-segments of text.

October 14, 2002, 4:00 a.m. PT

For years, Microsoft Corporation CEO Bill Gates railed against the economic philosophy of open-source software with Orwellian fervor, denouncing its communal licensing as a "cancer" that stifled technological innovation.

Today, Microsoft claims to "love" the open-source concept, by which software code is made public to encourage improvement and development by outside programmers. Gates himself says Microsoft will gladly disclose its crown jewels--the coveted code behind the Windows operating system--to select customers.

"We can be open source. We love the concept of shared source," said Bill Veghte, a Microsoft VP. "That’s a super-important shift for us in terms of code access."

Richard Stallman, founder of the Free Software Foundation, countered saying…

NAME TITLE ORGANIZATION

Slides from Cohen & McCallum

Page 23:

What is “Information Extraction”

As a task: filling slots in a database from sub-segments of text.

October 14, 2002, 4:00 a.m. PT

For years, Microsoft Corporation CEO Bill Gates railed against the economic philosophy of open-source software with Orwellian fervor, denouncing its communal licensing as a "cancer" that stifled technological innovation.

Today, Microsoft claims to "love" the open-source concept, by which software code is made public to encourage improvement and development by outside programmers. Gates himself says Microsoft will gladly disclose its crown jewels--the coveted code behind the Windows operating system--to select customers.

"We can be open source. We love the concept of shared source," said Bill Veghte, a Microsoft VP. "That’s a super-important shift for us in terms of code access."

Richard Stallman, founder of the Free Software Foundation, countered saying…

NAME             TITLE    ORGANIZATION
Bill Gates       CEO      Microsoft
Bill Veghte      VP       Microsoft
Richard Stallman founder  Free Soft..

IE

Slides from Cohen & McCallum

Page 24:

What is “Information Extraction”

Information Extraction = segmentation + classification + clustering + association

As a family of techniques:

October 14, 2002, 4:00 a.m. PT

For years, Microsoft Corporation CEO Bill Gates railed against the economic philosophy of open-source software with Orwellian fervor, denouncing its communal licensing as a "cancer" that stifled technological innovation.

Today, Microsoft claims to "love" the open-source concept, by which software code is made public to encourage improvement and development by outside programmers. Gates himself says Microsoft will gladly disclose its crown jewels--the coveted code behind the Windows operating system--to select customers.

"We can be open source. We love the concept of shared source," said Bill Veghte, a Microsoft VP. "That’s a super-important shift for us in terms of code access."

Richard Stallman, founder of the Free Software Foundation, countered saying…

Microsoft Corporation
CEO
Bill Gates
Microsoft
Gates
Microsoft
Bill Veghte
Microsoft
VP
Richard Stallman
founder
Free Software Foundation

Slides from Cohen & McCallum


Page 27:

What is “Information Extraction”

Information Extraction = segmentation + classification + association + clustering

As a family of techniques:

October 14, 2002, 4:00 a.m. PT

For years, Microsoft Corporation CEO Bill Gates railed against the economic philosophy of open-source software with Orwellian fervor, denouncing its communal licensing as a "cancer" that stifled technological innovation.

Today, Microsoft claims to "love" the open-source concept, by which software code is made public to encourage improvement and development by outside programmers. Gates himself says Microsoft will gladly disclose its crown jewels--the coveted code behind the Windows operating system--to select customers.

"We can be open source. We love the concept of shared source," said Bill Veghte, a Microsoft VP. "That’s a super-important shift for us in terms of code access."

Richard Stallman, founder of the Free Software Foundation, countered saying…

Microsoft Corporation
CEO
Bill Gates
Microsoft
Gates
Microsoft
Bill Veghte
Microsoft
VP
Richard Stallman
founder
Free Software Foundation

NAME             TITLE    ORGANIZATION
Bill Gates       CEO      Microsoft
Bill Veghte      VP       Microsoft
Richard Stallman founder  Free Soft..

Slides from Cohen & McCallum

Page 28:

IE in Context

[IE-in-context pipeline diagram: a document collection is spidered and filtered by relevance; IE (Segment → Classify → Associate → Cluster) fills a database, guided by a created ontology; labeled training data is used to train the extraction models; the database is then loaded, queried/searched, and data mined.]

Slides from Cohen & McCallum

Page 29:

IE History

Pre-Web
• Mostly news articles
  – De Jong’s FRUMP [1982]
    • Hand-built system to fill Schank-style “scripts” from news wire
  – Message Understanding Conference (MUC) DARPA [’87-’95], TIPSTER [’92-’96]
• Most early work dominated by hand-built models
  – E.g. SRI’s FASTUS, hand-built FSMs.
  – But by 1990’s, some machine learning: Lehnert, Cardie, Grishman and then HMMs: Elkan [Leek ’97], BBN [Bikel et al ’98]

Web
• AAAI ’94 Spring Symposium on “Software Agents”
  – Much discussion of ML applied to the Web. Maes, Mitchell, Etzioni.
• Tom Mitchell’s WebKB, ’96
  – Build KB’s from the Web.
• Wrapper Induction
  – First by hand, then ML: [Doorenbos ’96], [Soderland ’96], [Kushmerick ’97], …

Slides from Cohen & McCallum

Page 30:

www.apple.com/retail

What Makes IE from the Web Different?

Less grammar, but more formatting & linking: the directory structure, link structure, formatting & layout of the Web is its own new grammar.

Apple to Open Its First Retail Store in New York City

MACWORLD EXPO, NEW YORK--July 17, 2002--Apple's first retail store in New York City will open in Manhattan's SoHo district on Thursday, July 18 at 8:00 a.m. EDT. The SoHo store will be Apple's largest retail store to date and is a stunning example of Apple's commitment to offering customers the world's best computer shopping experience.

"Fourteen months after opening our first retail store, our 31 stores are attracting over 100,000 visitors each week," said Steve Jobs, Apple's CEO. "We hope our SoHo store will surprise and delight both Mac and PC users who want to see everything the Mac can do to enhance their digital lifestyles."

www.apple.com/retail/soho

www.apple.com/retail/soho/theatre.html

Newswire vs. Web

Slides from Cohen & McCallum

Page 31:

Landscape of IE Tasks (1/4): Pattern Feature Domain

A spectrum: text paragraphs without formatting → grammatical sentences and some formatting & links → non-grammatical snippets with rich formatting & links → tables.

Astro Teller is the CEO and co-founder of BodyMedia. Astro holds a Ph.D. in Artificial Intelligence from Carnegie Mellon University, where he was inducted as a national Hertz fellow. His M.S. in symbolic and heuristic computation and B.S. in computer science are from Stanford University. His work in science, literature and business has appeared in international media from the New York Times to CNN to NPR.

Slides from Cohen & McCallum

Page 32:

Landscape of IE Tasks (2/4): Pattern Scope

A spectrum: web-site specific (e.g., Amazon.com book pages; driven by formatting) → genre specific (e.g., resumes; driven by layout) → wide, non-specific (e.g., university names; driven by language).

Slides from Cohen & McCallum

Page 33:

Landscape of IE Tasks (3/4): Pattern Complexity

E.g., word patterns:

• Closed set (e.g., U.S. states)
  – He was born in Alabama…
  – The big Wyoming sky…

• Regular set (e.g., U.S. phone numbers)
  – Phone: (413) 545-1323
  – The CALD main office can be reached at 412-268-1299

• Complex pattern (e.g., U.S. postal addresses)
  – University of Arkansas, P.O. Box 140, Hope, AR 71802
  – Headquarters: 1128 Main Street, 4th Floor, Cincinnati, Ohio 45210

• Ambiguous patterns, needing context + many sources of evidence (e.g., person names)
  – …was among the six houses sold by Hope Feldman that year.
  – Pawel Opalinski, Software Engineer at WhizBang Labs.
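The “regular set” row can be captured with a regular expression. A minimal sketch for U.S. phone numbers (an illustration only; real extractors would also handle extensions, country codes, and other formats):

```python
import re

# A simple U.S. phone-number pattern for the "regular set" examples above.
# Illustrative sketch, not a production-grade pattern.
PHONE = re.compile(r"\(?\b(\d{3})\)?[ -](\d{3})-(\d{4})\b")

for text in ["Phone: (413) 545-1323",
             "The CALD main office can be reached at 412-268-1299"]:
    print(PHONE.search(text).groups())  # ('413', '545', '1323'), then ('412', '268', '1299')
```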

Slides from Cohen & McCallum

Page 34:

Landscape of IE Tasks (4/4): Pattern Combinations

• Single entity (“named entity” extraction)
  – Person: Jack Welch
  – Person: Jeffrey Immelt
  – Location: Connecticut

• Binary relationship
  – Relation: Person-Title; Person: Jack Welch; Title: CEO
  – Relation: Company-Location; Company: General Electric; Location: Connecticut

• N-ary record
  – Relation: Succession; Company: General Electric; Title: CEO; Out: Jack Welch; In: Jeffrey Immelt

Example: “Jack Welch will retire as CEO of General Electric tomorrow. The top role at the Connecticut company will be filled by Jeffrey Immelt.”

Slides from Cohen & McCallum

Page 35:

Evaluation of Single Entity Extraction

TRUTH: Michael Kearns and Sebastian Seung will start Monday’s tutorial, followed by Richard M. Karpe and Martin Cooke.

PRED: Michael Kearns and Sebastian Seung will start Monday’s tutorial, followed by Richard M. Karpe and Martin Cooke.

Precision = # correctly predicted segments / # predicted segments = 2/6

Recall = # correctly predicted segments / # true segments = 2/4

F1 = harmonic mean of Precision & Recall = 2 / ((1/P) + (1/R))
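The slide’s counts (2 correctly predicted segments, 6 predicted, 4 true) plug in directly; a small sketch:

```python
# Precision/recall/F1 from segment counts, as on this slide:
# 2 correctly predicted segments, 6 predicted segments, 4 true segments.
def prf1(correct, predicted, true):
    p = correct / predicted
    r = correct / true
    f1 = 2 / ((1 / p) + (1 / r))   # harmonic mean of P and R
    return p, r, f1

p, r, f1 = prf1(2, 6, 4)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.333 0.5 0.4
```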

Slides from Cohen & McCallum

Page 36:

State of the Art Performance

• Named entity recognition
  – Person, Location, Organization, …
  – F1 in high 80’s or low- to mid-90’s

• Binary relation extraction
  – Contained-in(Location1, Location2)
  – Member-of(Person1, Organization1)
  – F1 in 60’s or 70’s or 80’s

• Wrapper induction
  – Extremely accurate performance obtainable
  – Human effort (~30 min) required on each site

Slides from Cohen & McCallum

Page 37:

Landscape of IE Techniques (1/1): Models

Any of these models can be used to capture words, formatting or both.

Lexicons

Alabama, Alaska, …, Wisconsin, Wyoming

Abraham Lincoln was born in Kentucky.

member?

Classify Pre-segmented Candidates

Abraham Lincoln was born in Kentucky.

Classifier

which class?

…and beyond

Sliding Window

Abraham Lincoln was born in Kentucky.

Classifier

which class?

Try alternate window sizes:

Boundary Models

Abraham Lincoln was born in Kentucky.

Classifier

which class?

BEGIN END BEGIN END

BEGIN

Context Free Grammars

Abraham Lincoln was born in Kentucky.

POS tags: NNP NNP V V P NNP; constituents: NP, PP, VP, VP, S

Most likely parse?

Finite State Machines

Abraham Lincoln was born in Kentucky.

Most likely state sequence?

Slides from Cohen & McCallum

Page 38:

Landscape: Our Focus

Pattern complexity: closed set → regular → complex → ambiguous
Pattern feature domain: words → words + formatting → formatting
Pattern scope: site-specific → genre-specific → general
Pattern combinations: entity → binary → n-ary
Models: lexicon, regex, window, boundary, FSM, CFG

Slides from Cohen & McCallum

Page 39:

Sliding Windows

Slides from Cohen & McCallum

Page 40:

Extraction by Sliding Window

GRAND CHALLENGES FOR MACHINE LEARNING

Jaime Carbonell School of Computer Science Carnegie Mellon University

3:30 pm 7500 Wean Hall

Machine learning has evolved from obscurity in the 1970s into a vibrant and popular discipline in artificial intelligence during the 1980s and 1990s. As a result of its success and growth, machine learning is evolving into a collection of related disciplines: inductive concept acquisition, analytic learning in problem solving (e.g. analogy, explanation-based learning), learning theory (e.g. PAC learning), genetic algorithms, connectionist learning, hybrid systems, and so on.

CMU UseNet Seminar Announcement

E.g., looking for seminar location

Slides from Cohen & McCallum


Page 44:

A “Naïve Bayes” Sliding Window Model [Freitag 1997]

… 00 : pm  Place : Wean Hall Rm 5409  Speaker : Sebastian Thrun …
w(t-m) … w(t-1)   [ w(t) … w(t+n) ]   w(t+n+1) … w(t+n+m)
     prefix            contents             suffix

Other examples of sliding window: [Baluja et al 2000] (decision tree over individual words & their context)

1. Try all “reasonable” windows (vary length, position).
2. Estimate Pr(LOCATION | window) using Bayes rule.
3. Assume independence for length, prefix words, suffix words, content words.
4. Estimate from data quantities like Pr(“Place” in prefix | LOCATION).
5. If P(“Wean Hall Rm 5409” = LOCATION) is above some threshold, extract it.
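The scoring idea above can be sketched in a few lines. Every probability below is an invented illustration, not an estimate from real training data, and the smoothing is the crudest possible:

```python
import math

# A toy sketch of the Naive Bayes sliding-window score described above.
prefix_probs = {"Place": 0.4, ":": 0.3}             # Pr(word in prefix | LOCATION)
content_probs = {"Wean": 0.3, "Hall": 0.3, "Rm": 0.2, "5409": 0.2}
length_probs = {3: 0.2, 4: 0.3}                     # Pr(field length | LOCATION)
UNSEEN = 1e-4                                       # crude smoothing for unseen words

def window_score(prefix, window):
    """Log Pr(LOCATION | window) up to a constant, assuming independence
    of length, prefix words, and content words."""
    score = math.log(length_probs.get(len(window), UNSEEN))
    for w in prefix:
        score += math.log(prefix_probs.get(w, UNSEEN))
    for w in window:
        score += math.log(content_probs.get(w, UNSEEN))
    return score

# The correctly aligned window outscores a misaligned one, so thresholding
# on this score extracts "Wean Hall Rm 5409" as the LOCATION.
loc = window_score(["Place", ":"], ["Wean", "Hall", "Rm", "5409"])
other = window_score(["pm", "Place"], [":", "Wean", "Hall"])
print(loc > other)  # True
```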

Slides from Cohen & McCallum

Page 45:

“Naïve Bayes” Sliding Window Results

GRAND CHALLENGES FOR MACHINE LEARNING

Jaime Carbonell School of Computer Science Carnegie Mellon University

3:30 pm 7500 Wean Hall

Machine learning has evolved from obscurity in the 1970s into a vibrant and popular discipline in artificial intelligence during the 1980s and 1990s. As a result of its success and growth, machine learning is evolving into a collection of related disciplines: inductive concept acquisition, analytic learning in problem solving (e.g. analogy, explanation-based learning), learning theory (e.g. PAC learning), genetic algorithms, connectionist learning, hybrid systems, and so on.

Domain: CMU UseNet Seminar Announcements

Field         F1
Person Name:  30%
Location:     61%
Start Time:   98%

Slides from Cohen & McCallum

Page 46:

Realistic sliding-window-classifier IE

• What windows to consider?
  – all windows containing as many tokens as the shortest example, but no more tokens than the longest example

• How to represent a classifier? It might:
  – Restrict the length of window;
  – Restrict the vocabulary or formatting used before/after/inside window;
  – Restrict the relative order of tokens, etc.

• Learning Method
  – SRV: Top-Down Rule Learning [Freitag AAAI ’98]
  – Rapier: Bottom-Up [Califf & Mooney, AAAI ’99]
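The window-enumeration rule above can be written as a generator: consider every contiguous token span whose length lies between the shortest and the longest training example (the length bounds and tokens here are illustrative):

```python
# Enumerate candidate windows: every contiguous span whose length is
# between the shortest and longest training example (bounds are illustrative).
def candidate_windows(tokens, min_len, max_len):
    for n in range(min_len, max_len + 1):        # window length
        for i in range(len(tokens) - n + 1):     # window start position
            yield tokens[i:i + n]

tokens = "3:30 pm 7500 Wean Hall".split()
windows = list(candidate_windows(tokens, 2, 3))
print(len(windows))  # 4 windows of length 2 plus 3 of length 3 = 7
```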

Slides from Cohen & McCallum

Page 47:

Rapier: results – precision/recall

Slides from Cohen & McCallum

Page 48:

Rule-learning approaches to sliding-window classification: Summary

• SRV, Rapier, and WHISK [Soderland KDD ‘97]

– Representations for classifiers allow restriction of the relationships between tokens, etc

– Representations are carefully chosen subsets of even more powerful representations based on logic programming (ILP and Prolog)

– Use of these “heavyweight” representations is complicated, but seems to pay off in results

• Can simpler representations for classifiers work?

Slides from Cohen & McCallum

Page 49:

BWI: Learning to detect boundaries

• Another formulation: learn three probabilistic classifiers:
  – START(i) = Prob(position i starts a field)
  – END(j) = Prob(position j ends a field)
  – LEN(k) = Prob(an extracted field has length k)

• Then score a possible extraction (i, j) by START(i) * END(j) * LEN(j - i)

• LEN(k) is estimated from a histogram

[Freitag & Kushmerick, AAAI 2000]
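The scoring rule above is easy to sketch. The classifier outputs below are invented for illustration; in BWI they come from boosted boundary detectors and a length histogram estimated from data:

```python
# Score span (i, j) as START(i) * END(j) * LEN(j - i), as described above.
start = {3: 0.8, 7: 0.1}                 # START(i): Prob(position i starts a field)
end = {6: 0.7, 9: 0.2}                   # END(j): Prob(position j ends a field)
length_hist = {2: 0.1, 3: 0.5, 4: 0.2}   # LEN(k), estimated from a histogram

def span_score(i, j):
    return start.get(i, 0.0) * end.get(j, 0.0) * length_hist.get(j - i, 0.0)

# Pick the best (i, j) pair among candidate boundaries.
best = max(((i, j) for i in start for j in end if j > i),
           key=lambda ij: span_score(*ij))
print(best)  # (3, 6): score 0.8 * 0.7 * 0.5
```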

Slides from Cohen & McCallum

Page 50:

BWI: Learning to detect boundaries

• BWI uses boosting to find “detectors” for START and END

• Each weak detector has a BEFORE and AFTER pattern (on tokens before/after position i).

• Each “pattern” is a sequence of
  – tokens, and/or
  – wildcards like anyAlphabeticToken, anyNumber, …

• Weak learner for “patterns” uses greedy search (+ lookahead) to repeatedly extend a pair of empty BEFORE,AFTER patterns

Slides from Cohen & McCallum

Page 51:

BWI: Learning to detect boundaries

Field         F1 (Naïve Bayes)   F1 (BWI)
Person Name:  30%                69.3%
Location:     61%                75.5%
Start Time:   98%                99.6%

Slides from Cohen & McCallum

Page 52:

Problems with Sliding Windows and Boundary Finders

• Decisions in neighboring parts of the input are made independently from each other.

– Naïve Bayes Sliding Window may predict a “seminar end time” before the “seminar start time”.

– It is possible for two overlapping windows to both be above threshold.

– In a Boundary-Finding system, left boundaries are laid down independently from right boundaries, and their pairing happens as a separate step.

Slides from Cohen & McCallum

Page 53:

Rule Learning

• Bootstrap
  – <class name, plural> “such as” <proper NP>+
  – Use names to label other pages
  – Now have training data!
  – Learn more rules:
    – <proper NP>, mayor of <proper NP>
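The first bootstrap step, harvesting seed instances with a Hearst-style “such as” pattern, can be sketched with a regular expression. The sentence and the pattern details are illustrative assumptions, not from a real system:

```python
import re

# Toy bootstrap seed harvesting: match "<plural class noun> such as
# <ProperName>(, <ProperName>)* (and <ProperName>)" and collect the names.
HEARST = re.compile(r"\b([a-z]+s) such as ((?:[A-Z][a-zA-Z]+(?:, | and )?)+)")

def harvest(text):
    seeds = {}
    for cls, names in HEARST.findall(text):
        seeds.setdefault(cls, set()).update(re.split(r", | and ", names))
    return seeds

text = "He toured cities such as Paris, Berlin and Seattle."
print(harvest(text))  # {'cities': {'Paris', 'Berlin', 'Seattle'}}
```

The harvested names would then label occurrences on other pages, producing training data for learning further rules.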

Page 54:

Hidden Markov Models
Conditional Random Fields

Finite State Machines

Abraham Lincoln was born in Kentucky.

Most likely state sequence?
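The “most likely state sequence” question is answered by the Viterbi algorithm. Here is a tiny sketch; the states, probabilities, and vocabulary are invented for illustration, whereas a real NER HMM would have many states estimated from labeled data:

```python
# Viterbi decoding for a toy two-state HMM tagger (all numbers invented).
states = ["OTHER", "PERSON"]
start_p = {"OTHER": 0.8, "PERSON": 0.2}
trans_p = {"OTHER": {"OTHER": 0.7, "PERSON": 0.3},
           "PERSON": {"OTHER": 0.6, "PERSON": 0.4}}
emit_p = {"OTHER": {"Abraham": 0.01, "Lincoln": 0.01, "was": 0.3, "born": 0.2},
          "PERSON": {"Abraham": 0.4, "Lincoln": 0.4, "was": 0.01, "born": 0.01}}

def viterbi(obs):
    # V[t][s]: probability of the best path ending in state s at time t.
    V = [{s: start_p[s] * emit_p[s].get(obs[0], 1e-6) for s in states}]
    back = []
    for word in obs[1:]:
        col, ptr = {}, {}
        for s in states:
            prev = max(states, key=lambda p: V[-1][p] * trans_p[p][s])
            ptr[s] = prev
            col[s] = V[-1][prev] * trans_p[prev][s] * emit_p[s].get(word, 1e-6)
        V.append(col)
        back.append(ptr)
    # Follow backpointers from the best final state.
    path = [max(states, key=lambda s: V[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

print(viterbi("Abraham Lincoln was born".split()))
# ['PERSON', 'PERSON', 'OTHER', 'OTHER']
```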

Page 55:

References

• [Bikel et al 1997] Bikel, D.; Miller, S.; Schwartz, R.; Weischedel, R.: Nymble: a high-performance learning name-finder. In Proceedings of ANLP’97, pp. 194-201.
• [Califf & Mooney 1999] Califf, M.E.; Mooney, R.: Relational Learning of Pattern-Match Rules for Information Extraction. In Proceedings of the Sixteenth National Conference on Artificial Intelligence (AAAI-99).
• [Cohen, Hurst, Jensen 2002] Cohen, W.; Hurst, M.; Jensen, L.: A flexible learning system for wrapping tables and lists in HTML documents. In Proceedings of the Eleventh International World Wide Web Conference (WWW-2002).
• [Cohen, Kautz, McAllester 2000] Cohen, W.; Kautz, H.; McAllester, D.: Hardening soft information sources. In Proceedings of the Sixth International Conference on Knowledge Discovery and Data Mining (KDD-2000).
• [Cohen 1998] Cohen, W.: Integration of Heterogeneous Databases Without Common Domains Using Queries Based on Textual Similarity. In Proceedings of ACM SIGMOD-98.
• [Cohen 2000a] Cohen, W.: Data Integration using Similarity Joins and a Word-based Information Representation Language. ACM Transactions on Information Systems, 18(3).
• [Cohen 2000b] Cohen, W.: Automatically Extracting Features for Concept Learning from the Web. In Machine Learning: Proceedings of the Seventeenth International Conference (ML-2000).
• [Collins & Singer 1999] Collins, M.; Singer, Y.: Unsupervised models for named entity classification. In Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, 1999.
• [De Jong 1982] De Jong, G.: An Overview of the FRUMP System. In Lehnert, W. & Ringle, M. H. (eds.), Strategies for Natural Language Processing. Lawrence Erlbaum, 1982, pp. 149-176.
• [Freitag 1998] Freitag, D.: Information extraction from HTML: application of a general machine learning approach. In Proceedings of the Fifteenth National Conference on Artificial Intelligence (AAAI-98).
• [Freitag 1999] Freitag, D.: Machine Learning for Information Extraction in Informal Domains. Ph.D. dissertation, Carnegie Mellon University.
• [Freitag 2000] Freitag, D.: Machine Learning for Information Extraction in Informal Domains. Machine Learning 39(2/3): 99-101 (2000).
• [Freitag & Kushmerick 2000] Freitag, D.; Kushmerick, N.: Boosted Wrapper Induction. In Proceedings of the Seventeenth National Conference on Artificial Intelligence (AAAI-2000).
• [Freitag & McCallum 1999] Freitag, D.; McCallum, A.: Information extraction using HMMs and shrinkage. In Proceedings of the AAAI-99 Workshop on Machine Learning for Information Extraction. AAAI Technical Report WS-99-11.
• [Kushmerick 2000] Kushmerick, N.: Wrapper Induction: efficiency and expressiveness. Artificial Intelligence, 118, pp. 15-68.
• [Lafferty, McCallum & Pereira 2001] Lafferty, J.; McCallum, A.; Pereira, F.: Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. In Proceedings of ICML-2001.
• [Leek 1997] Leek, T. R.: Information extraction using hidden Markov models. Master’s thesis, UC San Diego.
• [McCallum, Freitag & Pereira 2000] McCallum, A.; Freitag, D.; Pereira, F.: Maximum entropy Markov models for information extraction and segmentation. In Proceedings of ICML-2000.
• [Miller et al 2000] Miller, S.; Fox, H.; Ramshaw, L.; Weischedel, R.: A Novel Use of Statistical Parsing to Extract Information from Text. In Proceedings of the 1st Annual Meeting of the North American Chapter of the ACL (NAACL), pp. 226-233.

Slides from Cohen & McCallum

Page 56: Natural Language Processing & Information Extraction CSE 573 Dan Weld
Page 57: Natural Language Processing & Information Extraction CSE 573 Dan Weld

Introduction to NLP

Oren Etzioni

(CSE 473)

Page 58: Natural Language Processing & Information Extraction CSE 573 Dan Weld

Communication

Page 59: Natural Language Processing & Information Extraction CSE 573 Dan Weld

Is the Problem

Page 60: Natural Language Processing & Information Extraction CSE 573 Dan Weld

Of the century…

Herman is reprinted with permission from LaughingStock Licensing Inc., Ottawa, Canada. All rights reserved.

Page 61: Natural Language Processing & Information Extraction CSE 573 Dan Weld

Page 62: Natural Language Processing & Information Extraction CSE 573 Dan Weld

Why Study NLP?

• A hallmark of human intelligence.
• Text is the largest repository of human knowledge and is growing quickly.
  – emails, news articles, web pages, IM, scientific articles, insurance claims, customer complaint letters, transcripts of phone calls, technical documents, government documents, patent portfolios, court decisions, contracts, …

Page 63: Natural Language Processing & Information Extraction CSE 573 Dan Weld

References

• Russell & Norvig, ch. 22 & 23
• Daniel Jurafsky and James H. Martin, Speech and Language Processing
• Foundations of Statistical NLP (Manning and Schütze)

Page 64: Natural Language Processing & Information Extraction CSE 573 Dan Weld

What is NLP?

• Natural Language Processing
  – Understand human language (speech, text)
  – Successfully communicate (NL generation)
• Pragmatics: "Can you pass the salt?"
• Semantics: "Paris is the capital of France"
  – "snow is white" iff what holds true in the world?
  – What does 'water' refer to? On 'twin earth'?
• Syntax: analyze the structure of a sentence
• Morphology: analyze the structure of words
  – Dogs = dog + s

Page 65: Natural Language Processing & Information Extraction CSE 573 Dan Weld

Why is Natural Language Understanding difficult?

• The hidden structure of language is highly ambiguous

• Structures for: Fed raises interest rates 0.5% in effort to control inflation (NYT headline 5/17/00)

Page 66: Natural Language Processing & Information Extraction CSE 573 Dan Weld

Where are the ambiguities?

Page 67: Natural Language Processing & Information Extraction CSE 573 Dan Weld

Fundamental NLP Tasks

• Speech recognition: speech signal to words
• Parsing: decompose sentence into units
  – Part-of-speech tagging: e.g., noun vs. verb
• Semantic role labeling: for a verb, find the units that fill pre-defined "semantic roles" (e.g., Agent, Patient, or Instrument)
  – Example: "John hit the ball with the bat"

Page 68: Natural Language Processing & Information Extraction CSE 573 Dan Weld

Fundamental NLP Tasks

• Semantics: map sentence to corresponding "meaning representation" (e.g., logic)
  – give(John, Book, Sally)
  – Quantification: Everybody loves somebody
• Word Sense Disambiguation
  – orange juice vs. orange coat
  – Where do word senses come from?
• Co-reference resolution: The dog found a cookie; He ate it.
• Implicit "text": what is left unsaid?

Page 69: Natural Language Processing & Information Extraction CSE 573 Dan Weld

NLP Applications

• Question answering
  – Who is the first Taiwanese president?
• Text Categorization/Routing
  – e.g., customer e-mails
• Text Mining
  – Find everything that interacts with BRCA1.
• Machine Translation
• Information Extraction
• Spelling correction
  – Is that just dictionary lookup?

Page 70: Natural Language Processing & Information Extraction CSE 573 Dan Weld
Page 71: Natural Language Processing & Information Extraction CSE 573 Dan Weld

Challenges in NLP: Ambiguity

• Lexical: label (noun or verb)?
  – London (Jack or capital of the UK)?
• Syntactic:
  – I smelled a wumpus in (2,2)
  – Teacher Strikes Idle Kids
  – Killer Sentenced to Die for Second Time in 10 Years
• Semantic ambiguity: "dog lover"
• Rampant metaphors: "prices went up"

Page 72: Natural Language Processing & Information Extraction CSE 573 Dan Weld

Pragmatics

How do you map an utterance to the intended meaning?

• Do you have a standard set of transformations?

• Do you have a model of beliefs & agents?– He knows that I can pass the salt, therefore…

• Can this be formulated as a learning problem?

Page 73: Natural Language Processing & Information Extraction CSE 573 Dan Weld

Speech Act Theory (Searle ’75)

• Example Speech Acts:
  – assertives = speech acts that commit a speaker to the truth of the expressed proposition
  – directives = speech acts that are meant to cause the hearer to take a particular action, e.g., requests, commands, and advice
  – commissives = speech acts that commit a speaker to some future action, e.g., promises and oaths
• Research has focused on specifying semantics and analyzing primitives rather than on implemented "speech actors"

Page 74: Natural Language Processing & Information Extraction CSE 573 Dan Weld

Semantics

• How do you map a sentence to a semantic representation?
  – What are the semantic primitives? (Schank's conceptual dependency theory, 1972)
• Schank: 12 (or so) primitives:
  – The basis of natural language is conceptual.
  – The conceptual level is interlingual, while the sentential level is language-specific.
  – De-emphasize syntax
  – Focus on semantic expectations as the basis for NLU
  – Status: didn't yield practical systems, out of favor, but…

Page 75: Natural Language Processing & Information Extraction CSE 573 Dan Weld

Syntax is about Sentence Structures

• Sentences have structures and are made up of constituents.
• The constituents are phrases.
• A phrase consists of a head and modifiers.
• The category of the head determines the category of the phrase
  – e.g., a phrase headed by a noun is a noun phrase

Page 76: Natural Language Processing & Information Extraction CSE 573 Dan Weld

Parsing

• Analyze the structure of a sentence

The student put the book on the table

Parse tree (figure, shown here in bracket notation):
[S [NP [D The] [N student]] [VP [V put] [NP [D the] [N book]] [PP [P on] [NP [D the] [N table]]]]]

Page 77: Natural Language Processing & Information Extraction CSE 573 Dan Weld

Alternative Parses

Teacher strikes idle kids

Two parse trees (figures, shown here in bracket notation):
• N N V N reading: [S [NP [N Teacher] [N strikes]] [VP [V idle] [NP [N kids]]]]
• N V A N reading: [S [NP [N Teacher]] [VP [V strikes] [NP [A idle] [N kids]]]]

Page 78: Natural Language Processing & Information Extraction CSE 573 Dan Weld

How do you choose the parse?

• Old days: syntactic rules, semantics.
• Since the early 1980s: statistical parsing.
  – parsing as supervised learning
    • input: tree bank of (sentence, tree) pairs
    • output: PCFG parser + lexical language model
• Parsing decisions depend on the words!
  – Example: VP → V + NP + NP?
  – Does the verb take two objects or one?
  – 'told Sue the story' versus 'invented the Internet'
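The verb-dependence point lends itself to a tiny sketch: a lexicalized model stores P(frame | verb) and prefers VP → V NP NP only for ditransitive verbs. All probabilities here are invented for illustration:

```python
# Toy lexicalized rule probabilities: P(subcategorization frame | verb).
# All numbers are invented for illustration.
frame_given_verb = {
    "told":     {"V NP NP": 0.6, "V NP": 0.3, "V": 0.1},
    "invented": {"V NP NP": 0.05, "V NP": 0.8, "V": 0.15},
}

def best_frame(verb):
    """Pick the most probable subcategorization frame for a verb."""
    probs = frame_given_verb[verb]
    return max(probs, key=probs.get)

print(best_frame("told"))      # 'V NP NP': ditransitive (told Sue the story)
print(best_frame("invented"))  # 'V NP': one object (invented the Internet)
```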

Page 79: Natural Language Processing & Information Extraction CSE 573 Dan Weld

Issues with Statistical Parsing

• Statistical parsers still make "plenty" of errors
  – Have the report on my desk by Monday
• Tree banks are language-specific
• Tree banks are genre-specific
  – Train on WSJ, fail on the Web
  – standard distributional assumption
• Unsupervised, un-lexicalized parsers exist, but performance is substantially weaker

Page 80: Natural Language Processing & Information Extraction CSE 573 Dan Weld

How Difficult is Morphology?

• Examples from Woods et al. 2000
  – delegate (de + leg + ate): take the legs from
  – caress (car + ess): female car
  – cashier (cashy + er): more wealthy
  – lacerate (lace + rate): speed of tatting
  – ratify (rat + ify): infest with rodents
  – infantry (infant + ry): childish behavior

Page 81: Natural Language Processing & Information Extraction CSE 573 Dan Weld

NLP Application Case Study: Machine Translation

• Focus is on translating single sentences using a probabilistic model
• Classical case of statistical NLP (both good & bad)
• Source sentence is E (English)
• Target sentence is F (French)
• What is P(F|E)?

Page 82: Natural Language Processing & Information Extraction CSE 573 Dan Weld

Noisy Channel Model

• argmax over F of P(F|E)
• P(F|E) = [P(E|F) * P(F)] / P(E)
• Ignore P(E). Why?
• P(F) = language model of French, acquired from a French corpus
• P(E|F) = "translation model", acquired from a sentence-"aligned" English-French corpus
• What's the point? Instead of estimating P(F|E) directly, we decompose the problem into:
  – find possible translations of E into French
  – rank them using a model of French
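The decomposition can be sketched directly: score each candidate translation F by P(E|F) * P(F) and take the argmax (P(E) is constant over candidates). The candidates and probabilities below are invented stand-ins for real translation and language models:

```python
# Candidate French translations of an English sentence E, with
# invented translation-model and language-model scores.
candidates = {
    "le chien brun": {"p_e_given_f": 0.20, "p_f": 0.30},
    "le brun chien": {"p_e_given_f": 0.25, "p_f": 0.02},  # bad French word order
}

def decode(cands):
    """argmax over F of P(E|F) * P(F); P(E) is constant, so it is ignored."""
    return max(cands, key=lambda f: cands[f]["p_e_given_f"] * cands[f]["p_f"])

print(decode(candidates))  # 'le chien brun': the language model penalizes bad word order
```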

Page 83: Natural Language Processing & Information Extraction CSE 573 Dan Weld

Language Models

• To get a probability distribution over sentences, build an N-gram model
• P(go | let's) is high, but P(let's | go) is low
• The equation below is a bigram model; trigrams are common
• N is a tradeoff based on corpus size, sparsity
• Used widely in NLP (e.g., speech)

P(f1 … fn) = ∏i P(fi | fi-1)
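A minimal bigram language model over a toy, invented corpus, using maximum-likelihood estimates:

```python
from collections import Counter

# Toy corpus; <s> marks sentence starts. Invented for illustration.
corpus = [
    "<s> let's go home".split(),
    "<s> let's go now".split(),
    "<s> go home now".split(),
]

bigrams = Counter()
unigrams = Counter()
for sent in corpus:
    for prev, word in zip(sent, sent[1:]):
        bigrams[(prev, word)] += 1
        unigrams[prev] += 1

def p(word, prev):
    """MLE bigram probability P(word | prev)."""
    return bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0

def sentence_prob(words):
    """P(w1..wn) = product over i of P(wi | wi-1), starting from <s>."""
    prob = 1.0
    for prev, word in zip(["<s>"] + words, words):
        prob *= p(word, prev)
    return prob

print(p("go", "let's"))  # 1.0: in this corpus "let's" is always followed by "go"
print(sentence_prob("let's go home".split()))  # (2/3) * 1 * (2/3) = 4/9
```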

Page 84: Natural Language Processing & Information Extraction CSE 573 Dan Weld

Translation Model P(E|F)

Relies on a corpus where French sentences are paired with their English translations (e.g., European parliament proceedings)

• Corpus too sparse to translate whole sentences
• So again, decompose into N-grams:
  P(the dog | le chien) = P(the | le) * P(dog | chien)
• How does this fail?
  – word order: brown dog → chien brun
  – one-to-many: house → à la maison
  – idioms (he "kicked the bucket"), metaphors, etc.
• How does this scale to 49,000,000 language pairs?

Page 85: Natural Language Processing & Information Extraction CSE 573 Dan Weld

Most statistical machine translation research has focused on a few high-resource languages (European, Chinese, Japanese, Arabic).

Approximate parallel text available (with English):
• French, Chinese, Arabic: ~200M words
• Various Western European languages (Italian, Danish, Finnish, …): parliamentary proceedings, govt documents, ~30M words
• Serbian, Khmer, Chechen, …: Bible/Koran/Book of Mormon/Dianetics, ~1M words
• Bengali, Uzbek, …: nothing / Univ. Decl. of Human Rights, ~1K words

Page 86: Natural Language Processing & Information Extraction CSE 573 Dan Weld

Conclusions

• NLP is harder than it might seem naively
• Many subtasks
• Statistical NLP is the dominant paradigm
  – supervised learning
  – corpus-based statistics (language models)
  – Some important limitations in practice!
• NL "understanding" has received very little attention

Page 87: Natural Language Processing & Information Extraction CSE 573 Dan Weld

Why is NLP Hard?

• Ambiguity
  – Prepositional Phrase Attachment
    • Newspaper headline example: Ban on Nude Dancing on Governor's Desk
  – Word Sense Disambiguation
    • 1990: Iraqi Head Seeking Arms
  – Syntactic Ambiguity (what's modifying what)
    • Juvenile Court to Try Shooting Defender

Examples from Chris Manning

Page 88: Natural Language Processing & Information Extraction CSE 573 Dan Weld

• More
  – Teacher Strikes Idle Kids
  – Stolen Painting Found by Tree
  – Local High School Dropouts Cut in Half
  – Red Tape Holds Up New Bridges
  – Clinton Wins on Budget, But More Lies Ahead
  – Hospitals are Sued by Seven Foot Doctors
  – Kids Make Nutritious Snacks
  – Minister Accused of Having Eight Wives in Jail

Page 89: Natural Language Processing & Information Extraction CSE 573 Dan Weld

• The reason these headlines work for human beings is that natural language understanding essentially depends on making complex and subtle use of context: the context of the rest of the sentence, the context of prior discourse, and the things around you in the world. So for humans it's obvious what people are saying.

• AI-complete tasks

Page 90: Natural Language Processing & Information Extraction CSE 573 Dan Weld

Chart Parsing

• R&N p 800

Page 91: Natural Language Processing & Information Extraction CSE 573 Dan Weld

Small devices

• With a big monitor, humans can scan for the right information
• On a small screen, there's hugely more value from a system that can show you what you want:
  – phone number
  – business hours
  – email summary
    • "Call me at 11 to finalize this"

Page 92: Natural Language Processing & Information Extraction CSE 573 Dan Weld

NLP for IR/web search?

• It's a no-brainer that NLP should be useful and used for web search (and IR in general):
  – Search for 'Jaguar'
    • the computer should know or ask whether you're interested in big cats [scarce on the web], cars, or perhaps a molecule geometry and solvation energy package, or a package for fast network I/O in Java
  – Search for 'Michael Jordan'
    • The basketballer or the machine learning guy?
  – Search for laptop, don't find notebook
  – Google doesn't even stem:
    • Search for probabilistic model, and you don't even match pages with probabilistic models

Page 93: Natural Language Processing & Information Extraction CSE 573 Dan Weld

NLP for IR/web search?

• Word sense disambiguation technology generally works well (like text categorization)
• Synonyms can be found or listed
• Lots of people have been into fixing this
  – e-Cyc had a beta version with Hotbot that disambiguated senses, and was going to go live in 2 months … 14 months ago
  – Lots of startups:
    • LingoMotors
    • iPhrase: "Traditional keyword search technology is hopelessly outdated"

Page 94: Natural Language Processing & Information Extraction CSE 573 Dan Weld

NLP for IR/web search?

• But in practice it's an idea that hasn't gotten much traction
  – Correctly finding linguistic base forms is straightforward, but produces little advantage over crude stemming, which just (slightly over-) equivalence-classes words
  – Word sense disambiguation only helps on average in IR if over 90% accurate (Sanderson 1994), and that's about where we are
  – Syntactic phrases should help, but people have been able to get most of the mileage with "statistical phrases" – which have been aggressively integrated into systems recently

Page 95: Natural Language Processing & Information Extraction CSE 573 Dan Weld

NLP for IR/web search?

• People can easily scan among results (on their 21" monitor) … if you're above the fold
• Much more progress has been made in link analysis, and use of anchor text, etc.
• Anchor text gives human-provided synonyms
• Link or click-stream analysis gives a form of pragmatics: what do people find correct or important (in a default context)
• Focus on short, popular queries, news, etc.
• Using human intelligence always beats artificial intelligence

Page 96: Natural Language Processing & Information Extraction CSE 573 Dan Weld

NLP for IR/web search?

• Methods which use rich ontologies, etc., can work very well for intranet search within a customer's site (where anchor-text, link, and click patterns are much less relevant)
  – But don't really scale to the whole web
• Moral: it's hard to beat keyword search for the task of general ad hoc document retrieval
• Conclusion: one should move up the food chain to tasks where finer-grained understanding of meaning is needed

Page 97: Natural Language Processing & Information Extraction CSE 573 Dan Weld
Page 98: Natural Language Processing & Information Extraction CSE 573 Dan Weld

Product information

Page 99: Natural Language Processing & Information Extraction CSE 573 Dan Weld

Product info

• C-net markets this information
• How do they get most of it?
  – Phone calls
  – Typing

Page 100: Natural Language Processing & Information Extraction CSE 573 Dan Weld

Inconsistency: digital cameras

• Image Capture Device: 1.68 million pixel 1/2-inch CCD sensor
• Image Capture Device Total Pixels: Approx. 3.34 million; Effective Pixels: Approx. 3.24 million
• Image sensor Total Pixels: Approx. 2.11 million-pixel
• Imaging sensor Total Pixels: Approx. 2.11 million; 1,688 (H) x 1,248 (V)
• CCD Total Pixels: Approx. 3,340,000 (2,140 [H] x 1,560 [V])
  – Effective Pixels: Approx. 3,240,000 (2,088 [H] x 1,550 [V])
  – Recording Pixels: Approx. 3,145,000 (2,048 [H] x 1,536 [V])

• These all came off the same manufacturer's website!!
• And this is a very technical domain. Try sofa beds.

Page 101: Natural Language Processing & Information Extraction CSE 573 Dan Weld

Product information/ Comparison shopping, etc.

• Need to learn to extract info from online vendors
• Can exploit uniformity of layout, and (partial) knowledge of domain, by querying with known products
• E.g., Jango Shopbot (Etzioni and Weld)
  – Gives convenient aggregation of online content
• Bug: not popular with vendors
  – A partial solution is for these tools to be personal agents rather than web services

Page 102: Natural Language Processing & Information Extraction CSE 573 Dan Weld

Question answering from text

• TREC 8/9 QA competition: an idea originating from the IR community
• With massive collections of on-line documents, manual translation of knowledge is impractical: we want answers from textbases [cf. bioinformatics]
• Evaluated output is 5 answers of 50/250-byte snippets of text drawn from a 3 GB text collection, required to contain at least one concept of the semantic category of the expected answer type. (IR think. Suggests the use of named-entity recognizers.)
• Get reciprocal points for the highest correct answer.
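"Reciprocal points for the highest correct answer" is the reciprocal-rank score; averaged over questions it gives the mean reciprocal rank (MRR) used in TREC QA. A small sketch with an invented two-question example:

```python
def reciprocal_rank(answers, is_correct):
    """1/rank of the highest-ranked correct answer (0 if none is correct)."""
    for rank, ans in enumerate(answers, start=1):
        if is_correct(ans):
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (ranked answers, gold answer) pairs."""
    return sum(reciprocal_rank(a, lambda x: x == gold) for a, gold in runs) / len(runs)

# Toy example: two questions, each with a ranked answer list.
runs = [
    (["Paris", "Lyon"], "Paris"),        # correct at rank 1 -> 1.0
    (["1997", "1999", "2001"], "2001"),  # correct at rank 3 -> 1/3
]
print(mean_reciprocal_rank(runs))  # (1.0 + 1/3) / 2 ≈ 0.667
```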

Page 103: Natural Language Processing & Information Extraction CSE 573 Dan Weld

Pasca and Harabagiu (200) show value of sophisticated NLP

• Good IR is needed: paragraph retrieval based on SMART
• Large taxonomy of question types and expected answer types is crucial
• Statistical parser (modeled on Collins 1997) used to parse questions and relevant text for answers, and to build knowledge base
• Controlled query expansion loops (morphological, lexical synonyms, and semantic relations) are all important
• Answer ranking by simple ML method

Page 104: Natural Language Processing & Information Extraction CSE 573 Dan Weld

Question Answering Example

• How hot does the inside of an active volcano get?
  – get(TEMPERATURE, inside(volcano(active)))
• "lava fragments belched out of the mountain were as hot as 300 degrees Fahrenheit"
  – fragments(lava, TEMPERATURE(degrees(300)), belched(out, mountain))
• Axioms:
  – volcano ISA mountain
  – lava ISPARTOF volcano → lava inside volcano
  – fragments of lava HAVEPROPERTIESOF lava
• The needed semantic information is in WordNet definitions, and was successfully translated into a form that can be used for rough 'proofs'

Page 105: Natural Language Processing & Information Extraction CSE 573 Dan Weld

Conclusion

• Complete human-level natural language understanding is still a distant goal
• But there are now practical and usable partial NLU systems applicable to many problems
• An important design decision is finding an appropriate match between (parts of) the application domain and the available methods
• But, used with care, statistical NLP methods have opened up new possibilities for high-performance text understanding systems.

Page 106: Natural Language Processing & Information Extraction CSE 573 Dan Weld

NLP Research at the Turing Center

Oren Etzioni
Turing Center

University of Washington

http://turing.cs.washington.edu

Page 107: Natural Language Processing & Information Extraction CSE 573 Dan Weld
Page 108: Natural Language Processing & Information Extraction CSE 573 Dan Weld

Turing Center Foci

• Scale MT to 49,000,000 language pairs
  – 2,500,000-word translation graph
  – P(V F C)?
  – PanImages
• Accumulate knowledge from the Web
• A new paradigm for Web Search
Page 109: Natural Language Processing & Information Extraction CSE 573 Dan Weld

Outline

1. A New Paradigm for Search

2. Open Information Extraction

3. Tractable Inference

4. Conclusions

Page 110: Natural Language Processing & Information Extraction CSE 573 Dan Weld

Web Search in 2020?

• Type key words into a search box?
• Social or "human powered" search?
• The Semantic Web?
• What about our technology exponentials?

"The best way to predict the future is to invent it!"
Page 111: Natural Language Processing & Information Extraction CSE 573 Dan Weld

Intelligent Search

Instead of merely retrieving Web pages, read ‘em!

Machine Reading = Information Extraction (IE) + tractable inference

• IE(sentence) = who did what?
  – speaker(Alon Halevy, UW)
• Inference = uncover implicit information
  – Will Alon visit Seattle?

Page 112: Natural Language Processing & Information Extraction CSE 573 Dan Weld

Application: Information Fusion

• What kills bacteria?
• What west coast nano-technology companies are hiring?
• Compare Obama's "buzz" versus Hillary's
• What is a quiet, inexpensive, 4-star hotel in Vancouver?

Page 113: Natural Language Processing & Information Extraction CSE 573 Dan Weld

Opinion Mining

• Opine (Popescu & Etzioni, EMNLP '05)
• IE(product reviews)
  – Informative
  – Abundant, but varied
  – Textual
• Summarize reviews without any prior knowledge of product category

Page 114: Natural Language Processing & Information Extraction CSE 573 Dan Weld
Page 115: Natural Language Processing & Information Extraction CSE 573 Dan Weld
Page 116: Natural Language Processing & Information Extraction CSE 573 Dan Weld

But “Reading” the Web is Tough

• Traditional IE is narrow• IE has been applied to small, homogenous

corpora• No parser achieves high accuracy• No named-entity taggers• No supervised learning

How about semi-supervised learning?

Page 117: Natural Language Processing & Information Extraction CSE 573 Dan Weld

Semi-Supervised Learning

• Few hand-labeled examples → limit on the number of concepts
• Concepts are pre-specified → problematic for the Web

• Alternative: self-supervised learning
  – Learner discovers concepts on the fly
  – Learner automatically labels examples per concept!

Page 118: Natural Language Processing & Information Extraction CSE 573 Dan Weld

2. Open IE = Self-supervised IE (Banko, Cafarella, Soderland, et al., IJCAI '07)

                 Traditional IE                  Open IE
Input:           Corpus + hand-labeled data      Corpus
Relations:       Specified in advance            Discovered automatically
Complexity:      O(D * R), R relations           O(D), D documents
Text analysis:   Parser + named-entity tagger    NP chunker

Page 119: Natural Language Processing & Information Extraction CSE 573 Dan Weld

Extractor Overview (Banko & Etzioni, ’08)

1. Use a simple model of relationships in English to label extractions
2. Bootstrap a general model of relationships in English sentences, encoded as a CRF
3. Decompose each sentence into one or more (NP1, VP, NP2) "chunks"
4. Use the CRF model to retain relevant parts of each NP and VP

The extractor is relation-independent!

Page 120: Natural Language Processing & Information Extraction CSE 573 Dan Weld

TextRunner Extraction

• Extract a Triple representing a binary relation (Arg1, Relation, Arg2) from a sentence:

Internet powerhouse, EBay, was originally founded by Pierre Omidyar.

(EBay, founded by, Pierre Omidyar)
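TextRunner's extractor is a learned CRF; purely to illustrate the triple format, a single hand-written pattern (invented here, not part of TextRunner) can pull the same triple out of this sentence:

```python
import re

# One hand-written pattern for "<X>, was (originally) founded by <Y>".
# A toy stand-in for a learned extractor, invented for illustration.
PATTERN = re.compile(r"(\w+), was (?:originally )?founded by ([A-Z]\w+ [A-Z]\w+)")

def extract_triple(sentence):
    """Return an (Arg1, Relation, Arg2) triple, or None if no match."""
    m = PATTERN.search(sentence)
    if not m:
        return None
    return (m.group(1), "founded by", m.group(2))

s = "Internet powerhouse, EBay, was originally founded by Pierre Omidyar."
print(extract_triple(s))  # ('EBay', 'founded by', 'Pierre Omidyar')
```

A single pattern obviously misses the synonymy, aliasing, and non-verb relationships discussed on the next slide; that is why TextRunner learns a general model instead.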

Page 121: Natural Language Processing & Information Extraction CSE 573 Dan Weld

Numerous Extraction Challenges

• Drop non-essential info:
  – "was originally founded by" → founded by
• Retain key distinctions:
  – EBay founded by Pierre ≠ EBay founded Pierre
• Non-verb relationships
  – "George Bush, president of the U.S…"
• Synonymy & aliasing
  – Albert Einstein = Einstein ≠ Einstein Bros.

Page 122: Natural Language Processing & Information Extraction CSE 573 Dan Weld

TextRunner (Web’s 1st Open IE system)

1. Self-Supervised Learner: automatically labels example extractions & learns an extractor
2. Single-Pass Extractor: makes a single pass over the corpus, identifying extractions in each sentence
3. Query Processor: indexes extractions, enabling queries at interactive speeds

Page 124: Natural Language Processing & Information Extraction CSE 573 Dan Weld
Page 125: Natural Language Processing & Information Extraction CSE 573 Dan Weld
Page 126: Natural Language Processing & Information Extraction CSE 573 Dan Weld

From a sample of 9 million Web pages:

• Triples: 11.3 million
• With well-formed relation: 9.3 million
• With well-formed entities: 7.8 million
  – Abstract: 6.8 million (79.2% correct)
  – Concrete: 1.0 million (88.1% correct)

• Concrete facts: (Oppenheimer, taught at, Berkeley)
• Abstract facts: (fruit, contain, vitamins)

Page 127: Natural Language Processing & Information Extraction CSE 573 Dan Weld

3. Tractable Inference

Much of textual information is implicit

I. Entity and predicate resolution

II. Probability of correctness

III. Composing facts to draw conclusions

Page 128: Natural Language Processing & Information Extraction CSE 573 Dan Weld

I. Entity Resolution

Resolver (Yates & Etzioni, HLT ’07): determines synonymy based on relations found by TextRunner (cf. Pantel & Lin ‘01)

• (X, born in, 1941)   (M, born in, 1941)
• (X, citizen of, US)   (M, citizen of, US)
• (X, friend of, Joe)   (M, friend of, Mary)

P(X = M) ~ shared relations
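The "P(X = M) ~ shared relations" intuition can be sketched with a Jaccard score over the (relation, argument) contexts two strings appear in; this is a toy proxy, not Resolver's actual probabilistic model:

```python
# Toy proxy for Resolver's idea: entities that share many
# (relation, argument) contexts are likely synonyms.
facts = {
    "X": {("born in", "1941"), ("citizen of", "US"), ("friend of", "Joe")},
    "M": {("born in", "1941"), ("citizen of", "US"), ("friend of", "Mary")},
    "Z": {("located in", "Ohio")},
}

def shared_context_score(a, b):
    """Jaccard similarity of the (relation, argument) sets of a and b."""
    inter = facts[a] & facts[b]
    union = facts[a] | facts[b]
    return len(inter) / len(union)

print(shared_context_score("X", "M"))  # 2 shared / 4 total = 0.5
print(shared_context_score("X", "Z"))  # 0.0: no shared contexts
```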

Page 129: Natural Language Processing & Information Extraction CSE 573 Dan Weld

Relation Synonymy

• (1, R, 2), (2, R, 4), (4, R, 8), etc.
• (1, R', 2), (2, R', 4), (4, R', 8), etc.

P(R = R') ~ shared argument pairs

• Unsupervised probabilistic model
• O(N log N) algorithm run on millions of docs

Page 130: Natural Language Processing & Information Extraction CSE 573 Dan Weld

II. Probability of Correctness

How likely is an extraction to be correct?

Factors to consider include:
• Authoritativeness of source
• Confidence in extraction method
• Number of independent extractions
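One simple way to combine multiple independent extractions is a noisy-or; this is a generic illustration, not the specific model used in the KnowItAll work:

```python
# Generic noisy-or combination (illustrative only): if each of k
# independent extractions is individually correct with probability p_i,
# the fact is correct unless every single extraction is an error.
def prob_correct(ps):
    """P(correct) = 1 - prod(1 - p_i) over independent extractions."""
    q = 1.0
    for p in ps:
        q *= (1.0 - p)
    return 1.0 - q

print(prob_correct([0.6]))            # 0.6: a single extraction
print(prob_correct([0.6, 0.6, 0.6]))  # ~0.936: three independent sightings
```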

Page 131: Natural Language Processing & Information Extraction CSE 573 Dan Weld

III Compositional Inference (work in progress, Schoenmackers, Etzioni, Weld)

Implicit information (2 + 2 = 4):
• TextRunner: (Turing, born in, London)
• WordNet: (London, part of, England)
• Rule: 'born in' is transitive through 'part of'
• Conclusion: (Turing, born in, England)
• Mechanism: MLN instantiated on the fly
• Rules: learned from corpus (future work)
• Inference Demo
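The single rule on this slide can be applied by plain forward chaining (a sketch of the logic only, not the MLN mechanism):

```python
# Minimal sketch of the slide's inference rule:
# (X, born in, Y) and (Y, part of, Z)  =>  (X, born in, Z).
# Plain forward chaining, not the MLN-based mechanism itself.
facts = {
    ("Turing", "born in", "London"),   # from TextRunner
    ("London", "part of", "England"),  # from WordNet
}

def apply_rule(facts):
    """Close the fact set under "'born in' is transitive through 'part of'"."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        new = {
            (x, "born in", z)
            for (x, r1, y) in derived if r1 == "born in"
            for (y2, r2, z) in derived if r2 == "part of" and y2 == y
        }
        if not new <= derived:
            derived |= new
            changed = True
    return derived

closure = apply_rule(facts)
print(("Turing", "born in", "England") in closure)  # True
```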

Page 132: Natural Language Processing & Information Extraction CSE 573 Dan Weld

KnowItAll Family Tree

• Precursors: WebKB '99, Mulder '01, PMI-IR '01
• KnowItAll '04
• Opine '05, UrnsBE '05, KnowItNow '05
• Woodward '06
• TextRunner '07, Resolver '07, REALM '07
• Inference '08

Page 133: Natural Language Processing & Information Extraction CSE 573 Dan Weld

KnowItAll Team

• Michele Banko
• Michael Cafarella
• Doug Downey
• Alan Ritter
• Dr. Stephen Soderland
• Stefan Schoenmackers
• Prof. Dan Weld
• Mausam

• Alumni: Dr. Ana-Maria Popescu, Dr. Alex Yates, and others.

Page 134: Natural Language Processing & Information Extraction CSE 573 Dan Weld

4. Conclusions

Imagine search systems that operate over a (more) semantic space:

• Key words, documents → extractions
• TF-IDF, PageRank → relational models
• Web pages, hyperlinks → entities, relations

Reading the Web → new Search Paradigm