1 Parts of Speech Sudeshna Sarkar 7 Aug 2008

Parts of Speech


Page 1: Parts of Speech

1

Parts of Speech

Sudeshna Sarkar

7 Aug 2008

Page 2: Parts of Speech

2

Why Do We Care about Parts of Speech?

• Pronunciation: Hand me the lead pipe.

• Predicting what words can be expected next: Personal pronoun (e.g., I, she) ____________

• Stemming: -s means singular for verbs, plural for nouns

• As the basis for syntactic parsing and then meaning extraction: I will lead the group into the lead smelter.

• Machine translation:
  • (E) content +N → (F) contenu +N
  • (E) content +Adj → (F) content +Adj or satisfait +Adj

Page 3: Parts of Speech

3

What is a Part of Speech?

Is this a semantic distinction? For example, maybe Noun is the class of words for people, places and things. Maybe Adjective is the class of words for properties of nouns.

Consider: green book

book is a Noun

green is an Adjective

Now consider: book worm

This green is very soothing.

Page 4: Parts of Speech

4

How Many Parts of Speech Are There?

A first cut at the easy distinctions:

Open classes:

•nouns, verbs, adjectives, adverbs

Closed classes: function words

•conjunctions: and, or, but

•pronouns: I, she, him

•prepositions: with, on

•determiners: the, a, an

Page 5: Parts of Speech

5

Part of speech tagging

8 (ish) traditional parts of speech: noun, verb, adjective, preposition, adverb, article, interjection, pronoun, conjunction, etc.

This idea has been around for over 2000 years (Dionysius Thrax of Alexandria, c. 100 B.C.)

Called: parts-of-speech, lexical category, word classes, morphological classes, lexical tags, POS

We’ll use POS most frequently

I’ll assume that you all know what these are

Page 6: Parts of Speech

6

POS examples

N    noun         chair, bandwidth, pacing
V    verb         study, debate, munch
ADJ  adjective    purple, tall, ridiculous
ADV  adverb       unfortunately, slowly
P    preposition  of, by, to
PRO  pronoun      I, me, mine
DET  determiner   the, a, that, those

Page 7: Parts of Speech

7

Tagsets

Brown corpus tagset (87 tags):

http://www.scs.leeds.ac.uk/amalgam/tagsets/brown.html

Penn Treebank tagset (45 tags):

http://www.cs.colorado.edu/~martin/SLP/Figures/ (8.6)

C7 tagset (146 tags)

http://www.comp.lancs.ac.uk/ucrel/claws7tags.html

Page 8: Parts of Speech

8

POS Tagging: Definition

The process of assigning a part-of-speech or lexical class marker to each word in a corpus:

WORDS: the koala put the keys on the table

TAGS: N, V, P, DET

Page 9: Parts of Speech

9

POS Tagging example

WORD   TAG
the    DET
koala  N
put    V
the    DET
keys   N
on     P
the    DET
table  N

Page 10: Parts of Speech

10

POS tagging: Choosing a tagset

There are many parts of speech and many potential distinctions we can draw

To do POS tagging, need to choose a standard set of tags to work with

Could pick a very coarse tagset: N, V, Adj, Adv.

More commonly used set is finer grained, the “UPenn TreeBank tagset”, 45 tags

PRP$, WRB, WP$, VBG

Even more fine-grained tagsets exist

Page 11: Parts of Speech

11

Penn TreeBank POS Tag set

Page 12: Parts of Speech

12

Using the UPenn tagset

The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS ./.

Prepositions and subordinating conjunctions marked IN (“although/IN I/PRP..”)

Exception: the preposition/complementizer “to” is simply tagged TO.
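The slash-separated word/TAG notation above is easy to read back into a program. A minimal sketch, assuming whitespace-separated tokens and that the tag follows the last slash; the function name `parse_tagged` is made up for illustration:

```python
def parse_tagged(text):
    """Split whitespace-separated 'word/TAG' tokens into (word, tag) pairs."""
    pairs = []
    for token in text.split():
        # Split on the LAST slash so a token such as './.' still parses.
        word, tag = token.rsplit("/", 1)
        pairs.append((word, tag))
    return pairs


sentence = ("The/DT grand/JJ jury/NN commented/VBD on/IN a/DT "
            "number/NN of/IN other/JJ topics/NNS ./.")
pairs = parse_tagged(sentence)
print(pairs[:3])
```

Splitting on the last slash rather than the first is a design choice: it keeps words that themselves contain a slash intact.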

Page 13: Parts of Speech

13

POS Tagging

Words often have more than one POS, e.g. back:

The back door = JJ

On my back = NN

Win the voters back = RB

Promised to back the bill = VB

The POS tagging problem is to determine the POS tag for a particular instance of a word.

Page 14: Parts of Speech

14

How hard is POS tagging? Measuring ambiguity

Page 15: Parts of Speech

15

Algorithms for POS Tagging

• Ambiguity: in the Brown corpus, 11.5% of the word types are ambiguous (using 87 tags).

Worse, 40% of the tokens are ambiguous.
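The difference between type ambiguity and token ambiguity can be made concrete with a small count. This is only a sketch on a made-up toy corpus (not Brown corpus data): a word type is counted as ambiguous if it appears with more than one tag.

```python
from collections import defaultdict

# Hypothetical toy corpus of (word, tag) pairs, for illustration only.
tagged = [
    ("the", "DET"), ("back", "ADJ"), ("door", "N"),
    ("on", "P"), ("my", "PRO"), ("back", "N"),
    ("the", "DET"), ("door", "N"),
]

# Collect the set of tags observed for each word type.
tags_seen = defaultdict(set)
for word, tag in tagged:
    tags_seen[word].add(tag)

ambiguous_types = {w for w, tags in tags_seen.items() if len(tags) > 1}

# Share of distinct word types that are ambiguous.
type_rate = len(ambiguous_types) / len(tags_seen)
# Share of running tokens that belong to an ambiguous type.
token_rate = sum(1 for w, _ in tagged if w in ambiguous_types) / len(tagged)

print(f"types: {type_rate:.0%}, tokens: {token_rate:.0%}")
```

Token ambiguity tends to be higher than type ambiguity because the ambiguous types are often the frequent ones.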

Page 16: Parts of Speech

16

Algorithms for POS Tagging

Why can’t we just look them up in a dictionary?

•Words that aren’t in the dictionary

http://story.news.yahoo.com/news?tmpl=story&cid=578&ncid=578&e=1&u=/nm/20030922/ts_nm/iraq_usa_dc

•One idea: for a word wi that is not in the dictionary, estimate P(ti | wi) as the probability that a random hapax legomenon (a word that occurs only once) in the corpus has tag ti.

Nouns are more likely than verbs, which are more likely than pronouns.

•Another idea: use morphology.
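The hapax legomenon idea can be sketched as follows: count the tags of words that occur exactly once and use their relative frequencies as the tag distribution for unknown words. The toy corpus and variable names are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy corpus of (word, tag) pairs.
tagged = [
    ("the", "DET"), ("koala", "N"), ("put", "V"), ("the", "DET"),
    ("keys", "N"), ("on", "P"), ("the", "DET"), ("table", "N"),
    ("she", "PRO"), ("put", "V"), ("the", "DET"), ("keys", "N"),
    ("on", "P"), ("the", "DET"), ("chair", "N"), ("slowly", "ADV"),
]

word_counts = Counter(word for word, _ in tagged)

# Tags of the words that occur exactly once (the hapax legomena).
hapax_tags = Counter(tag for word, tag in tagged if word_counts[word] == 1)

# Relative frequencies, used as the estimate of P(tag | unknown word).
total = sum(hapax_tags.values())
unknown_tag_prob = {tag: count / total for tag, count in hapax_tags.items()}

print(unknown_tag_prob)
```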

Page 17: Parts of Speech

17

Algorithms for POS Tagging - Knowledge

• Dictionary

• Morphological rules, e.g.:
  • _____-tion
  • _____-ly
  • capitalization

• N-gram frequencies:
  • to _____
  • DET _____ N
  • But what about rare words, e.g., smelt (two verb forms, melt and past tense of smell, and one noun form, a small fish)?

• Combining these:
  • V _____-ing: I was gracking vs. Gracking is fun.
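The knowledge sources above can be combined in a small guesser for words missing from the dictionary. This is a rough sketch only: the rules, their order, the coarse tag names, and the function `guess_tag` are illustrative assumptions, not a tested rule set.

```python
def guess_tag(word, prev_tag=None):
    """Guess a coarse tag for an unknown word.

    prev_tag is the tag of the preceding word, or None at sentence start.
    """
    # Capitalization away from the sentence start suggests a proper noun.
    if word[:1].isupper() and prev_tag is not None:
        return "N"
    if word.endswith("tion"):
        return "N"
    if word.endswith("ly"):
        return "ADV"
    if word.endswith("ing"):
        # Morphology plus context: after a verb ("was gracking") read it
        # as a verb; otherwise ("Gracking is fun") read it as a noun.
        return "V" if prev_tag == "V" else "N"
    # Fallback: unknown words are most often nouns.
    return "N"


print(guess_tag("gracking", prev_tag="V"))  # I was gracking
print(guess_tag("Gracking"))                # Gracking is fun
```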

Page 18: Parts of Speech

18

POS Tagging - Approaches

Approaches:

• Rule-based tagging (ENGTWOL)

• Stochastic (= probabilistic) tagging: HMM (Hidden Markov Model) tagging

• Transformation-based tagging: Brill tagger

• Do we return one best answer or several answers and let later steps decide?

• How does the requisite knowledge get entered?

Page 19: Parts of Speech

19

3 methods for POS tagging

1. Rule-based tagging

Example: Karlsson (1995) EngCG tagger, based on the Constraint Grammar architecture and the ENGTWOL lexicon

– Basic idea:

• Assign all possible tags to words (using a morphological analyzer)

• Remove wrong tags according to a set of constraint rules (typically more than 1000 hand-written constraint rules, but they may be machine-learned)
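The two steps above (assign all possible tags, then remove wrong ones) can be sketched in a few lines. The lexicon, the two constraints, and all names here are invented for illustration; this is not the EngCG rule formalism or the ENGTWOL lexicon.

```python
# Hypothetical mini-lexicon standing in for a morphological analyzer:
# each word maps to every tag it could carry.
LEXICON = {
    "the": {"DET"},
    "back": {"ADJ", "N", "ADV", "V"},
    "door": {"N"},
    "to": {"TO"},
    "bill": {"N", "V"},
}


def no_verb_after_det(prev_tags, tags):
    """Drop the verb reading when the previous word can only be DET."""
    if prev_tags == {"DET"} and len(tags) > 1:
        return tags - {"V"}
    return tags


def verb_after_to(prev_tags, tags):
    """Keep only the verb reading when the previous word can only be TO."""
    if prev_tags == {"TO"} and "V" in tags:
        return {"V"}
    return tags


CONSTRAINTS = [no_verb_after_det, verb_after_to]


def tag_with_constraints(words):
    # Step 1: assign all possible tags to every word.
    candidates = [set(LEXICON[w]) for w in words]
    # Step 2: remove readings that violate a constraint, left to right.
    for i in range(1, len(candidates)):
        for rule in CONSTRAINTS:
            candidates[i] = rule(candidates[i - 1], candidates[i])
    return candidates


print(tag_with_constraints(["to", "back", "the", "bill"]))
print(tag_with_constraints(["the", "back", "door"]))
```

In the second sentence "back" keeps three readings: with only two constraints some ambiguity survives, which is one reason real rule sets grow so large.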

Page 20: Parts of Speech

20

3 methods for POS tagging

2. Transformation-based tagging

Example: Brill (1995) tagger, a combination of rule-based and stochastic (probabilistic) tagging methodologies

– Basic idea:

• Start with a tagged corpus + dictionary (with most frequent tags)

• Set the most probable tag for each word as a start value

• Change tags according to rules of type “if word-1 is a determiner and word is a verb then change the tag to noun” in a specific order (like rule-based taggers)

• Machine learning is used: the rules are automatically induced from a previously tagged training corpus (like the stochastic approach)
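The application side of this procedure can be sketched as below, using the example rule quoted above. The dictionary of most frequent tags is made up, and the rule list is written by hand here; in a real transformation-based tagger the rules and their order would be induced from the training corpus, which this sketch leaves out.

```python
# Hypothetical dictionary of each word's most frequent tag.
MOST_FREQUENT_TAG = {"the": "DET", "can": "V", "rusted": "V"}

# Ordered rules of the form (tag of word-1, current tag, new tag).
# Hand-written for illustration; normally learned from a tagged corpus.
RULES = [("DET", "V", "N")]


def tbl_tag(words):
    # Start value: the most probable tag per word (default to noun).
    tags = [MOST_FREQUENT_TAG.get(w, "N") for w in words]
    # Apply each transformation in order across the whole sentence.
    for prev_tag, from_tag, to_tag in RULES:
        for i in range(1, len(tags)):
            if tags[i - 1] == prev_tag and tags[i] == from_tag:
                tags[i] = to_tag
    return tags


print(tbl_tag(["the", "can", "rusted"]))
```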

Page 21: Parts of Speech

21

3 methods for POS tagging

3. Stochastic (= probabilistic) tagging

Example: HMM (Hidden Markov Model) tagging: a training corpus is used to compute the probability (frequency) of a given word having a given POS tag in a given context

Page 22: Parts of Speech

22

Hidden Markov Model (HMM) Tagging

Using an HMM to do POS tagging

HMM is a special case of Bayesian inference

It is also related to the “noisy channel” model in ASR (Automatic Speech Recognition)

Page 23: Parts of Speech

23

Hidden Markov Model (HMM) Taggers

Goal: maximize P(word|tag) x P(tag|previous n tags)

P(word|tag): lexical information
• word/lexical likelihood
• probability that, given this tag, we have this word
• NOT the probability that this word has this tag
• modeled through language model (word-tag matrix)

P(tag|previous n tags): syntagmatic information
• tag sequence likelihood
• probability that this tag follows these previous tags
• modeled through language model (tag-tag matrix)
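To see how the two matrices work together, here is a minimal sketch that finds the tag sequence maximizing the product of P(word|tag) and P(tag|previous tag). It assumes a bigram model (only one previous tag), uses the standard Viterbi dynamic-programming search, which the slide itself does not name, and all probabilities are invented toy values.

```python
TAGS = ["DET", "N", "V"]

# Tag-tag matrix: P(tag | previous tag); "<s>" marks the sentence start.
# All numbers are made up for illustration.
TRANS = {
    "<s>": {"DET": 0.8, "N": 0.1, "V": 0.1},
    "DET": {"DET": 0.0, "N": 0.9, "V": 0.1},
    "N":   {"DET": 0.1, "N": 0.3, "V": 0.6},
    "V":   {"DET": 0.6, "N": 0.3, "V": 0.1},
}

# Word-tag matrix: P(word | tag); unseen pairs count as 0.
EMIT = {
    "DET": {"the": 1.0},
    "N":   {"koala": 0.4, "keys": 0.4, "put": 0.2},
    "V":   {"put": 0.9, "keys": 0.1},
}


def viterbi(words):
    # best[i][t] = (probability of the best path ending in tag t at
    # position i, previous tag on that path).
    best = [{t: (TRANS["<s>"][t] * EMIT[t].get(words[0], 0.0), None)
             for t in TAGS}]
    for word in words[1:]:
        column = {}
        for t in TAGS:
            emit = EMIT[t].get(word, 0.0)
            column[t] = max(
                (best[-1][p][0] * TRANS[p][t] * emit, p) for p in TAGS
            )
        best.append(column)
    # Follow the back-pointers from the best final tag.
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return list(reversed(path))


print(viterbi(["the", "koala", "put", "the", "keys"]))
```

Plain products are used to keep the sketch short; on longer sentences an implementation would normally sum log probabilities to avoid underflow.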

Page 24: Parts of Speech

24

POS tagging as a sequence classification task

We are given a sentence (an “observation” or “sequence of observations”)

Secretariat is expected to race tomorrow

sequence of n words w1…wn.

What is the best sequence of tags which corresponds to this sequence of observations?

Probabilistic/Bayesian view: consider all possible sequences of tags

Out of this universe of sequences, choose the tag sequence which is most probable given the observation sequence of n words w1…wn.
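Written out, this choice is an argmax over tag sequences. The derivation below is a standard sketch, not taken from the slides: it applies Bayes' rule, drops the denominator because it is the same for every tag sequence, and then makes two simplifying assumptions (each word depends only on its own tag, and each tag only on the previous tag, with t_0 a sentence-start marker). The notation \hat{t}_1^n for the chosen sequence is introduced here.

```latex
\begin{align*}
\hat{t}_1^n
  &= \operatorname*{arg\,max}_{t_1^n} P(t_1^n \mid w_1^n) \\
  &= \operatorname*{arg\,max}_{t_1^n}
     \frac{P(w_1^n \mid t_1^n)\, P(t_1^n)}{P(w_1^n)} \\
  &= \operatorname*{arg\,max}_{t_1^n} P(w_1^n \mid t_1^n)\, P(t_1^n) \\
  &\approx \operatorname*{arg\,max}_{t_1^n}
     \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})
\end{align*}
```

The last line has the same shape as the HMM goal stated earlier: a word likelihood multiplied by a tag sequence likelihood, here with one previous tag.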