1
Parts of Speech
Sudeshna Sarkar
7 Aug 2008
2
Why Do We Care about Parts of Speech?
•Pronunciation: Hand me the lead pipe.
•Predicting what words can be expected next: Personal pronoun (e.g., I, she) ____________
•Stemming: -s means singular for verbs, plural for nouns
•As the basis for syntactic parsing and then meaning extraction: I will lead the group into the lead smelter.
•Machine translation: (E) content +N → (F) contenu +N; (E) content +Adj → (F) content +Adj or satisfait +Adj
3
What is a Part of Speech?
Is this a semantic distinction? For example, maybe Noun is the class of words for people, places and things. Maybe Adjective is the class of words for properties of nouns.
Consider: green book
book is a Noun
green is an Adjective
Now consider: book worm
This green is very soothing.
4
How Many Parts of Speech Are There?
A first cut at the easy distinctions:
Open classes:
•nouns, verbs, adjectives, adverbs
Closed classes: function words
•conjunctions: and, or, but
•pronouns: I, she, him
•prepositions: with, on
•determiners: the, a, an
5
Part of speech tagging
8 (ish) traditional parts of speech: noun, verb, adjective, preposition, adverb, article, interjection, pronoun, conjunction, etc.
This idea has been around for over 2000 years (Dionysius Thrax of Alexandria, c. 100 B.C.)
Called: parts of speech, lexical categories, word classes, morphological classes, lexical tags, POS
We’ll use POS most frequently
I’ll assume that you all know what these are
6
POS examples
N noun chair, bandwidth, pacing
V verb study, debate, munch
ADJ adj purple, tall, ridiculous
ADV adverb unfortunately, slowly
P preposition of, by, to
PRO pronoun I, me, mine
DET determiner the, a, that, those
7
Tagsets
Brown corpus tagset (87 tags):
http://www.scs.leeds.ac.uk/amalgam/tagsets/brown.html
Penn Treebank tagset (45 tags):
http://www.cs.colorado.edu/~martin/SLP/Figures/ (8.6)
C7 tagset (146 tags)
http://www.comp.lancs.ac.uk/ucrel/claws7tags.html
8
POS Tagging: Definition
The process of assigning a part-of-speech or lexical class marker to each word in a corpus:
WORDS: the koala put the keys on the table
TAGS: N, V, P, DET
9
POS Tagging example
WORD tag
the DET
koala N
put V
the DET
keys N
on P
the DET
table N
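The lookup view of this example can be sketched in a few lines of Python; the tag dictionary below is hand-built for this one sentence, purely for illustration.

```python
# Hand-built tag dictionary covering only the example sentence
# (illustration only; real words take many tags).
TAG_DICT = {
    "the": "DET", "koala": "N", "put": "V",
    "keys": "N", "on": "P", "table": "N",
}

def tag_sentence(words):
    """Assign each word its dictionary tag ('UNK' if unseen)."""
    return [(w, TAG_DICT.get(w, "UNK")) for w in words]

print(tag_sentence("the koala put the keys on the table".split()))
```

A real tagger cannot rely on such a table alone, since most frequent words are ambiguous; the later slides address exactly that.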
10
POS tagging: Choosing a tagset
There are many parts of speech and many potential distinctions we could draw
To do POS tagging, need to choose a standard set of tags to work with
Could pick a very coarse tagset: N, V, Adj, Adv.
A more commonly used set is finer grained: the “UPenn TreeBank tagset”, with 45 tags
PRP$, WRB, WP$, VBG
Even more fine-grained tagsets exist
11
Penn TreeBank POS Tag set
12
Using the UPenn tagset
The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS ./.
Prepositions and subordinating conjunctions marked IN (“although/IN I/PRP..”)
Except the preposition/complementizer “to”, which is just tagged “TO”.
13
POS Tagging
Words often have more than one POS: back
The back door = JJ
On my back = NN
Win the voters back = RB
Promised to back the bill = VB
The POS tagging problem is to determine the POS tag for a particular instance of a word.
14
How hard is POS tagging? Measuring ambiguity
15
Algorithms for POS Tagging
•Ambiguity – In the Brown corpus, 11.5% of the word types are ambiguous (using 87 tags):
Worse, 40% of the tokens are ambiguous.
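The type/token distinction above can be measured directly on any tagged corpus. A minimal sketch, using a tiny invented corpus rather than Brown:

```python
from collections import defaultdict

# Tiny invented tagged corpus (not Brown), just to show the computation.
tagged_tokens = [
    ("the", "DET"), ("back", "JJ"), ("door", "N"),
    ("win", "V"), ("the", "DET"), ("voters", "N"), ("back", "RB"),
]

tags_of = defaultdict(set)
for word, tag in tagged_tokens:
    tags_of[word].add(tag)

# A type is ambiguous if it was seen with more than one tag.
ambiguous = {w for w, tags in tags_of.items() if len(tags) > 1}
type_ambiguity = len(ambiguous) / len(tags_of)
token_ambiguity = sum(w in ambiguous for w, _ in tagged_tokens) / len(tagged_tokens)
```

Token ambiguity comes out higher than type ambiguity because ambiguous words tend to be frequent, which is exactly the Brown-corpus effect (11.5% of types vs. 40% of tokens).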
16
Algorithms for POS Tagging
Why can’t we just look them up in a dictionary?
•Words that aren’t in the dictionary
http://story.news.yahoo.com/news?tmpl=story&cid=578&ncid=578&e=1&u=/nm/20030922/ts_nm/iraq_usa_dc
•One idea: P(ti | wi) = the probability that a random hapax legomenon in the corpus has tag ti.
Nouns are more likely than verbs, which are more likely than pronouns.
•Another idea: use morphology.
17
Algorithms for POS Tagging - Knowledge
•Dictionary
•Morphological rules, e.g., _____-tion, _____-ly, capitalization
•N-gram frequencies: to _____, DET _____ N
But what about rare words, e.g., smelt (two verb forms, melt and the past tense of smell, and one noun form, a small fish)?
•Combining these: V _____-ing (“I was gracking” vs. “Gracking is fun.”)
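These knowledge sources can be combined into a simple guesser for unknown words. The suffix and capitalization rules below are a hypothetical toy set, not the rules of any actual tagger:

```python
def guess_tag(word):
    """Guess a POS tag for an out-of-dictionary word from its shape."""
    if word.endswith("tion"):   # _____-tion is almost always a noun
        return "N"
    if word.endswith("ly"):     # _____-ly is usually an adverb
        return "ADV"
    if word.endswith("ing"):    # -ing form, as in "I was gracking"
        return "V"
    if word[0].isupper():       # capitalization suggests a proper noun
        return "N"
    return "N"                  # default to the most likely open class
```

A real system would weight such cues probabilistically rather than apply them as hard rules.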
18
POS Tagging - Approaches
Approaches:
Rule-based tagging (ENGTWOL)
Stochastic (= probabilistic) tagging: HMM (Hidden Markov Model) tagging
Transformation-based tagging: Brill tagger
• Do we return one best answer or several answers and let later steps decide?
• How does the requisite knowledge get entered?
19
3 methods for POS tagging
1. Rule-based tagging
Example: Karlsson (1995) EngCG tagger, based on the Constraint Grammar architecture and the ENGTWOL lexicon
– Basic Idea:
Assign all possible tags to words (morphological analyzer used)
Remove wrong tags according to set of constraint rules (typically more than 1000 hand-written constraint rules, but may be machine-learned)
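The two-step idea (over-generate, then constrain) can be sketched as follows; the mini-lexicon and the single constraint rule are invented for illustration, whereas a real system like EngCG has a full lexicon and over 1000 rules:

```python
# Toy lexicon: every tag a word can take (invented for illustration).
LEXICON = {"the": {"DET"}, "back": {"JJ", "NN", "RB", "VB"}, "door": {"NN"}}

def constraint_tag(words):
    # Step 1: assign all possible tags from the lexicon.
    candidates = [set(LEXICON.get(w, {"NN"})) for w in words]
    # Step 2: remove tags that violate constraints. Toy rule: right
    # after a determiner, a word cannot be a verb or an adverb.
    for i in range(1, len(words)):
        if "DET" in candidates[i - 1] and len(candidates[i]) > 1:
            candidates[i] -= {"VB", "RB"}
    return candidates
```

For "the back door" this prunes "back" from four candidate tags down to {JJ, NN}; like EngCG, the sketch may leave residual ambiguity rather than guess.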
20
3 methods for POS tagging
2. Transformation-based tagging
Example: Brill (1995) tagger, a combination of rule-based and stochastic (probabilistic) tagging methodologies
– Basic Idea:
Start with a tagged corpus + dictionary (with most frequent tags)
Set the most probable tag for each word as a start value
Change tags according to rules of the type “if word-1 is a determiner and word is a verb then change the tag to noun”, applied in a specific order (like rule-based taggers)
Machine learning is used: the rules are automatically induced from a previously tagged training corpus (like the stochastic approach)
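One pass of this procedure, seeding with most-frequent tags and then applying the example rule from the slide, might look like this (the frequency dictionary is a toy, invented for illustration):

```python
# Most frequent tag per word, as read off a tagged corpus (toy values).
MOST_FREQUENT = {"the": "DET", "book": "V", "flight": "N"}

def brill_pass(words):
    # Start value: the most probable tag for each word.
    tags = [MOST_FREQUENT.get(w, "N") for w in words]
    # Rule from the slide: "if word-1 is a determiner and word is a
    # verb then change the tag to noun".
    for i in range(1, len(tags)):
        if tags[i - 1] == "DET" and tags[i] == "V":
            tags[i] = "N"
    return tags
```

In a real Brill tagger this rule would not be hand-written: it is induced automatically by searching for the transformation that most reduces errors on the training corpus.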
21
3 methods for POS tagging
3. Stochastic (= probabilistic) tagging
Example: HMM (Hidden Markov Model) tagging, where a training corpus is used to compute the probability (frequency) of a given word having a given POS tag in a given context
22
Hidden Markov Model (HMM) Tagging
Using an HMM to do POS tagging
HMM is a special case of Bayesian inference
It is also related to the “noisy channel” model in ASR (Automatic Speech Recognition)
23
Goal: maximize P(word|tag) x P(tag|previous n tags)
P(word|tag): word/lexical likelihood
the probability that, given this tag, we have this word (NOT the probability that this word has this tag)
modeled through the language model (word-tag matrix)
P(tag|previous n tags): tag sequence likelihood
the probability that this tag follows these previous tags
modeled through the language model (tag-tag matrix)
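Both matrices are just relative frequencies read off a tagged training corpus. A minimal sketch over an invented two-sentence corpus:

```python
from collections import Counter

# Invented two-sentence tagged corpus (illustration only).
corpus = [
    [("the", "DET"), ("race", "NN")],
    [("race", "VB"), ("fast", "RB")],
]

emit, trans, tag_count = Counter(), Counter(), Counter()
for sent in corpus:
    prev = "<s>"                    # sentence-start pseudo-tag
    for word, tag in sent:
        emit[(word, tag)] += 1      # word-tag matrix counts
        trans[(prev, tag)] += 1     # tag-tag matrix counts
        tag_count[tag] += 1
        prev = tag

def p_word_given_tag(word, tag):
    return emit[(word, tag)] / tag_count[tag]

def p_tag_given_prev(tag, prev):
    total = sum(c for (p, _), c in trans.items() if p == prev)
    return trans[(prev, tag)] / total
```

Note the direction the slide stresses: p_word_given_tag divides by the tag's count, so it answers "given this tag, how likely is this word", not the reverse.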
Hidden Markov Model (HMM) Taggers
These two terms capture lexical information and syntagmatic information, respectively.
24
POS tagging as a sequence classification task
We are given a sentence (an “observation” or “sequence of observations”)
Secretariat is expected to race tomorrow
sequence of n words w1…wn.
What is the best sequence of tags which corresponds to this sequence of observations?
Probabilistic/Bayesian view: consider all possible sequences of tags
Out of this universe of sequences, choose the tag sequence which is most probable given the observation sequence of n words w1…wn.
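This "universe of sequences" view can be made literal for tiny inputs: enumerate every tag sequence over the tagset and keep the most probable one under the bigram model. All probabilities below are invented toy numbers, and a real tagger would use the Viterbi algorithm rather than brute force:

```python
from itertools import product

# Toy emission and transition probabilities (invented numbers).
EMIT = {("the", "DET"): 0.6, ("race", "NN"): 0.001, ("race", "VB"): 0.002}
TRANS = {("DET", "<s>"): 0.5, ("NN", "DET"): 0.4, ("VB", "DET"): 0.05}

def seq_prob(words, tags):
    """P(words, tags) under the bigram HMM: prod of P(w|t) * P(t|t_prev)."""
    p, prev = 1.0, "<s>"
    for w, t in zip(words, tags):
        p *= EMIT.get((w, t), 0.0) * TRANS.get((t, prev), 0.0)
        prev = t
    return p

def best_tags(words, tagset=("DET", "NN", "VB")):
    # Brute force over the whole universe of tag sequences.
    return max(product(tagset, repeat=len(words)),
               key=lambda tags: seq_prob(words, tags))
```

With these numbers, best_tags picks NN over VB for "race" after "the": the transition probability P(NN|DET) = 0.4 outweighs the slightly higher emission probability of the verb reading.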