Introduction to NLP ch1 What is Natural Language Processing?

Dan Jurafsky

The original slides from:

http://web.stanford.edu/~jurafsky/NLPCourseraSlides.html

Some changes have been made to these slides to fit our NLP course.

Natural language processing (NLP) is a field of computer science, artificial intelligence, and computational linguistics concerned with the interactions between computers and human (natural) languages.

Major applications and tasks in NLP:
• Machine translation
• Named entity recognition
• Part-of-speech tagging
• Parsing
• Question answering
• Sentiment analysis
• Speech recognition
• Information retrieval
• Information extraction
• Automatic summarization

Question Answering: IBM’s Watson

Information Extraction

Subject: curriculum meeting
Date: January 15, 2012
To: Dan Jurafsky

Hi Dan, we’ve now scheduled the curriculum meeting. It will be in Gates 159 tomorrow from 10:00-11:30. -Chris

Create new Calendar entry:
Event: Curriculum mtg
Date: Jan-16-2012
Start: 10:00am
End: 11:30am
Where: Gates 159
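The step from the email to the calendar entry can be sketched with a few hand-written patterns. This is only an illustration under stated assumptions: the function name `extract_event`, the regular expressions, and the output field names are hypothetical, and real information extraction systems are usually learned from data rather than written by hand.

```python
import re
from datetime import datetime, timedelta

EMAIL = """Subject: curriculum meeting
Date: January 15, 2012
To: Dan Jurafsky

Hi Dan, we've now scheduled the curriculum meeting.
It will be in Gates 159 tomorrow from 10:00-11:30.
-Chris"""


def extract_event(email: str) -> dict:
    """Pull calendar fields out of one email using hand-written patterns."""
    headers, body = email.split("\n\n", 1)
    subject = re.search(r"^Subject:\s*(.+)$", headers, re.M).group(1).strip()
    sent_text = re.search(r"^Date:\s*(.+)$", headers, re.M).group(1).strip()
    sent = datetime.strptime(sent_text, "%B %d, %Y")

    # Resolve the relative expression "tomorrow" against the sent date.
    has_tomorrow = re.search(r"\btomorrow\b", body, re.I)
    day = sent + timedelta(days=1) if has_tomorrow else sent

    # A time range such as 10:00-11:30 and a "<Building> <room number>" place.
    start, end = re.search(r"(\d{1,2}:\d{2})\s*-\s*(\d{1,2}:\d{2})", body).groups()
    where = re.search(r"\bin ([A-Z]\w* \d+)", body).group(1)

    return {
        "event": subject,
        "date": day.strftime("%b-%d-%Y"),
        "start": start,
        "end": end,
        "where": where,
    }


print(extract_event(EMAIL))
```

The interesting part is the date: the email never states January 16, so the extractor has to combine the header date with the word "tomorrow" in the body.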

Information Extraction & Sentiment Analysis

• nice and compact to carry!
• since the camera is small and light, I won't need to carry around those heavy, bulky professional cameras either!
• the camera feels flimsy, is plastic and very light in weight you have to be very delicate in the handling of this camera

Size and weight

Attributes: zoom, affordability, size and weight, flash, ease of use
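One simple way to connect reviews to an attribute and a polarity is a word-list lookup. The sketch below assumes tiny hand-picked lexicons (`ASPECT_TERMS`, `POSITIVE`, `NEGATIVE`) and a hypothetical `analyze` function; it is not the method behind the slide, only a minimal illustration of the task.

```python
import re

# Hypothetical mini-lexicons, chosen by hand for this example only.
ASPECT_TERMS = {
    "size and weight": {"compact", "small", "light", "heavy", "bulky", "weight"},
}
POSITIVE = {"nice", "compact", "small", "light"}
NEGATIVE = {"heavy", "bulky", "flimsy", "delicate"}

REVIEWS = [
    "nice and compact to carry!",
    "since the camera is small and light, I won't need to carry around "
    "those heavy, bulky professional cameras either!",
    "the camera feels flimsy, is plastic and very light in weight you have "
    "to be very delicate in the handling of this camera",
]


def analyze(review: str) -> tuple[list[str], str]:
    """Return (attributes mentioned, polarity label) for one review."""
    tokens = set(re.findall(r"[a-z']+", review.lower()))
    aspects = [name for name, terms in ASPECT_TERMS.items() if tokens & terms]
    # Polarity is the count of positive words minus the count of negative words.
    score = len(tokens & POSITIVE) - len(tokens & NEGATIVE)
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return aspects, label


for review in REVIEWS:
    print(analyze(review))
```

All three reviews are assigned to "size and weight". The first comes out positive and the third negative, but the second comes out neutral: a bag of words cannot tell that "heavy" and "bulky" describe other cameras, which is one reason sentiment analysis needs more than word lists.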

Machine Translation

• Fully automatic
• Helping human translators

Enter Source Text:
这 不过 是 一 个 时间 的 问题 .

Translation from Stanford’s Phrasal:
This is only a matter of time.

Language Technology

mostly solved
• Spam detection: "Let’s go to Agra!" ✓ / "Buy V1AGRA …" ✗
• Part-of-speech (POS) tagging: Colorless/ADJ green/ADJ ideas/NOUN sleep/VERB furiously/ADV
• Named entity recognition (NER): Einstein [PERSON] met with UN [ORG] officials in Princeton [LOC]

making good progress
• Sentiment analysis: "Best roast chicken in San Francisco!" / "The waiter ignored us for 20 minutes."
• Coreference resolution: Carter told Mubarak he shouldn’t run again.
• Word sense disambiguation (WSD): I need new batteries for my mouse.
• Parsing: I can see Alcatraz from the window!
• Machine translation (MT): 第 13届上海国际电影节开幕… / The 13th Shanghai International Film Festival…
• Information extraction (IE): "You’re invited to our dinner party, Friday May 27 at 8:30" gives the calendar entry "Party, May 27" (add)

still really hard
• Question answering (QA): Q. How effective is ibuprofen in reducing fever in patients with acute febrile illness?
• Paraphrase: "XYZ acquired ABC yesterday" / "ABC has been taken over by XYZ"
• Summarization: "The Dow Jones is up", "The S&P500 jumped", "Housing prices rose" gives "Economy is good"
• Dialog: "Where is Citizen Kane playing in SF?" / "Castro Theatre at 7:30. Do you want a ticket?"

Ambiguity makes NLP hard: We say some input is ambiguous if there are multiple alternative linguistic structures that can be built for it.

Examples of ambiguity (100% real headlines):
• Violinist Linked to JAL Crash Blossoms
  - A violinist linked to the JAL crash blossoms
  - A violinist is linked to "JAL crash blossoms"
• Red Tape Holds Up New Bridges
  - Red tape delays new bridges
  - Red tape supports new bridges

Ambiguity is pervasive

Fed raises interest rates

New York Times headline (17 May 2000)

Fed raises interest rates 0.5%
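To see how quickly alternative structures multiply, the sketch below enumerates every part-of-speech sequence for the four-word headline. It assumes a hypothetical lexicon (`LEXICON`) in which each word can be a noun or a verb; a real tagger would use a much larger tag set and would score the candidates rather than just list them.

```python
from itertools import product

# Hypothetical lexicon: each word in the headline can be a noun or a verb.
LEXICON = {
    "Fed": ["NOUN", "VERB"],
    "raises": ["NOUN", "VERB"],
    "interest": ["NOUN", "VERB"],
    "rates": ["NOUN", "VERB"],
}


def tag_sequences(sentence: str) -> list[list[tuple[str, str]]]:
    """List every way to assign one lexicon tag to each word."""
    words = sentence.split()
    options = [LEXICON[word] for word in words]
    return [list(zip(words, tags)) for tags in product(*options)]


candidates = tag_sequences("Fed raises interest rates")
print(len(candidates))  # 2 * 2 * 2 * 2 = 16 candidate tag sequences

# The reading a human picks: the Fed (noun) raises (verb) interest rates (nouns).
intended = [("Fed", "NOUN"), ("raises", "VERB"), ("interest", "NOUN"), ("rates", "NOUN")]
print(intended in candidates)
```

Only one of the 16 sequences is the intended reading, and the count doubles with every additional two-way ambiguous word.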

Why else is natural language understanding difficult?

• non-standard English: Great job @justinbieber! Were SOO PROUD of what youve accomplished! U taught us 2 #neversaynever & you yourself should never give up either♥
• segmentation issues: the New York-New Haven Railroad (the units are "New York" and "New Haven", not "York-New")
• idioms: dark horse, get cold feet, lose face, throw in the towel
• neologisms: unfriend, Retweet, bromance
• world knowledge: Mary and Sue are sisters. / Mary and Sue are mothers.
• tricky entity names: Where is A Bug’s Life playing … / Let It Be was recorded … / … a mutation on the for gene …

But that’s what makes it fun!

Making progress on this problem…

• The task is difficult! What tools do we need?
  • Knowledge about language
  • Knowledge about the world
  • A way to combine knowledge sources
• How we generally do this:
  • probabilistic models built from language data
  • P(“maison” → “house”) high
  • P(“L’avocat général” → “the general avocado”) low
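A minimal sketch of how such probabilities could be estimated from data is relative frequency over aligned phrase pairs. The pairs and counts below are invented toy data, and the names `ALIGNED_PAIRS` and `translation_prob` are hypothetical; real translation models are trained on large parallel corpora with more careful estimation.

```python
from collections import Counter

# Toy aligned (French, English) phrase pairs, invented for illustration only.
ALIGNED_PAIRS = (
    [("maison", "house")] * 9
    + [("maison", "home")] * 1
    + [("L'avocat général", "the attorney general")] * 10
)

pair_counts = Counter(ALIGNED_PAIRS)
source_counts = Counter(source for source, _ in ALIGNED_PAIRS)


def translation_prob(source: str, target: str) -> float:
    """Relative-frequency estimate: count(source, target) / count(source)."""
    if source_counts[source] == 0:
        return 0.0
    return pair_counts[(source, target)] / source_counts[source]


print(translation_prob("maison", "house"))
print(translation_prob("L'avocat général", "the general avocado"))
```

With these toy counts the first estimate is high (9 of 10 alignments) and the second is zero, because "the general avocado" never appears as a translation in the data. A practical model would smooth the estimates so that unseen pairs get a small nonzero probability.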

• To summarize, the knowledge of language needed to engage in complex language behavior can be separated into six distinct categories:

1. Phonetics and Phonology – The study of linguistic sounds.
2. Morphology – The study of the meaningful components of words.
3. Syntax – The study of the structural relationships between words.
4. Semantics – The study of meaning.
5. Pragmatics – The study of how language is used to accomplish goals.
6. Discourse – The study of linguistic units larger than a single utterance.

This class

• Teaches key theory and methods for statistical NLP:
  • Viterbi
  • Naïve Bayes, Maxent classifiers
  • N-gram language modeling
  • Statistical Parsing
  • Inverted index, tf-idf, vector models of meaning
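As a taste of one item on this list, here is a minimal sketch of N-gram language modeling with N = 2, using unsmoothed maximum-likelihood estimates. The three-sentence corpus and the function name `train_bigram` are assumptions made for the example, not material from the course.

```python
from collections import Counter


def train_bigram(sentences):
    """Return prob(prev, word) = count(prev, word) / count(prev)."""
    context_counts, bigram_counts = Counter(), Counter()
    for sentence in sentences:
        # Pad with start and end markers so boundaries are modeled too.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        context_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens, tokens[1:]))

    def prob(prev, word):
        if context_counts[prev] == 0:
            return 0.0
        return bigram_counts[(prev, word)] / context_counts[prev]

    return prob


# Toy corpus, assumed for illustration.
corpus = ["I am Sam", "Sam I am", "I do not like green eggs and ham"]
prob = train_bigram(corpus)

print(prob("<s>", "i"))   # 2 of the 3 sentences start with "i"
print(prob("i", "am"))    # "i" is followed by "am" in 2 of its 3 occurrences
print(prob("am", "sam"))  # "am" is followed by "sam" in 1 of its 2 occurrences
```

Unseen bigrams get probability zero here, which is why practical language models add smoothing.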
