1 Christel Kemke Morphology
COMP 4060 Natural Language Processing
Morphology, Word Classes, POS Tagging
Overview
Morphology
Stemming
Word Classes
POS Tagging
(Jurafsky, 2nd edition, Ch. 2, 3, 5; Allen Ch. 2,3)
Morphology
Morphemes and Words
Morpheme = "minimal meaning-bearing unit in a language"
Combine morphemes to create words:
Inflection: combination of a word stem with a grammatical morpheme; same word class, e.g. clean (verb), clean-ing (verb)
Derivation: combination of a word stem with a grammatical morpheme; yields different word class, e.g. clean (verb), clean-ing (noun)
Compounding: combination of multiple word stems
Cliticization: combination of a word stem with a clitic; different words from different syntactic categories, e.g. I’ve = I + have
Inflectional Morphology
Inflectional morphology: word stem + grammatical morpheme, e.g. cat + s
only for nouns, verbs, and some adjectives
Nouns
  plural:
    regular: +s, +es
    irregular: mouse - mice; ox - oxen
    rules for exceptions: e.g. -y -> -ies, like butterfly - butterflies
  possessive: +'s, +'
Verbs
  main verbs (sleep, eat, walk)
  modal verbs (can, will, should)
  primary verbs (be, have, do)
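The noun plural rules above can be sketched in a few lines of Python. This is only a toy illustration: the function name pluralize and the table IRREGULAR_PLURALS are made up for this example, and the coverage is limited to the cases named on the slide plus the usual +es after sibilant endings.

```python
# Toy sketch of the noun plural rules; names and coverage are illustrative assumptions.
IRREGULAR_PLURALS = {"mouse": "mice", "ox": "oxen"}

def pluralize(noun: str) -> str:
    """Return the plural of an English noun using a handful of rules."""
    # Irregular forms are looked up first, so "ox" never reaches the +es rule.
    if noun in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[noun]
    # Exception rule: consonant + y -> ies (butterfly -> butterflies, but boy -> boys).
    if noun.endswith("y") and len(noun) > 1 and noun[-2] not in "aeiou":
        return noun[:-1] + "ies"
    # Regular +es after sibilant endings (fox -> foxes).
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    # Regular +s.
    return noun + "s"

for word in ["cat", "fox", "butterfly", "mouse", "ox"]:
    print(word, "->", pluralize(word))
```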
Inflectional Morphology (verbs)
Verb inflections only for: main verbs (sleep, eat, walk); primary verbs (be, have, do)

Morphological Form      Regularly Inflected Form
stem                    walk      merge     try      map
-s form                 walks     merges    tries    maps
-ing participle         walking   merging   trying   mapping
past; -ed participle    walked    merged    tried    mapped

Morphological Form      Irregularly Inflected Form
stem                    eat       catch     cut
-s form                 eats      catches   cuts
-ing participle         eating    catching  cutting
-ed past                ate       caught    cut
-ed participle          eaten     caught    cut
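As a rough illustration of how the regular forms in the first table could be generated, here is a small Python sketch. The function names and the consonant-doubling heuristic (single vowel, consonant-vowel-consonant ending) are assumptions made for this example; they cover the four stems in the table but not English in general (e.g. see, visit), and irregular verbs would need a lookup table.

```python
# Toy generator for regular verb inflections; heuristics are illustrative assumptions.
VOWELS = "aeiou"

def doubles_final_consonant(stem: str) -> bool:
    """Heuristic: short stem with one vowel, ending consonant-vowel-consonant (map -> mapp-)."""
    if len(stem) < 3 or stem[-1] in VOWELS or stem[-1] in "wxy":
        return False
    one_vowel = sum(ch in VOWELS for ch in stem) == 1
    return one_vowel and stem[-2] in VOWELS and stem[-3] not in VOWELS

def inflect(stem: str) -> dict:
    """Return the -s form, -ing participle and -ed form of a regular verb stem."""
    if stem.endswith("e"):                       # merge -> merges, merging, merged
        return {"s": stem + "s", "ing": stem[:-1] + "ing", "ed": stem + "d"}
    if stem.endswith("y") and len(stem) > 1 and stem[-2] not in VOWELS:
        # try -> tries, trying, tried
        return {"s": stem[:-1] + "ies", "ing": stem + "ing", "ed": stem[:-1] + "ied"}
    if doubles_final_consonant(stem):            # map -> maps, mapping, mapped
        last = stem[-1]
        return {"s": stem + "s", "ing": stem + last + "ing", "ed": stem + last + "ed"}
    s_suffix = "es" if stem.endswith(("s", "x", "z", "ch", "sh")) else "s"
    return {"s": stem + s_suffix, "ing": stem + "ing", "ed": stem + "ed"}

for stem in ["walk", "merge", "try", "map"]:
    print(stem, inflect(stem))
```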
Inflectional and Derivational Morphology (adjectives)
Adjective inflections and derivations:
prefix  un-    unhappy    adjective, negation
suffix  -ly    happily    adverb, mode
        -er    happier    adjective, comparative
        -est   happiest   adjective, superlative
suffix  -ness  happiness  noun
plus combinations, like unhappiest, unhappiness.
Distinguish different adjective classes, which can or cannot take certain inflectional or derivational forms, e.g. no negation for big.
Inflectional Morphology
Noun Inflections
Verb Inflections
Derivational Morphology
Noun Derivation
Adjective Derivation
Clitics
Verb Clitics
Methods, Algorithms
Stemming
Stemming algorithms strip off word affixes
yield stem only, no additional information (like plural, 3rd person etc.)
used, e.g., in web search engines
famous stemming algorithm: the Porter stemmer
Stemming Methods
Rule-based stemming
Example rules:
ATIONAL → ATE, e.g., relational → relate
ING → ε (suffix deleted) if stem contains vowel, e.g., motoring → motor
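The two example rules can be written down directly. The sketch below is not the Porter stemmer, only these two rules applied in order; the function name stem and the simplified vowel test (a, e, i, o, u only) are assumptions for illustration.

```python
# Minimal sketch of two rewrite rules in the style of a rule-based stemmer.
import re

def stem(word: str) -> str:
    """Apply ATIONAL -> ATE, then ING -> (empty) if the remaining stem contains a vowel."""
    w = word.lower()
    if w.endswith("ational"):
        return w[: -len("ational")] + "ate"
    if w.endswith("ing"):
        candidate = w[:-3]
        # The vowel condition blocks over-stripping of short words such as "sing".
        if re.search(r"[aeiou]", candidate):
            return candidate
    return w

for word in ["relational", "motoring", "sing"]:
    print(word, "->", stem(word))
```

Note that the output is a bare stem: nothing records that motoring was an -ing form, which is exactly the "no additional information" property of stemming.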
Stemming Problems
Errors of Commission            Errors of Omission
organization    organ           European    Europe
doing           doe             analysis    analyzes
generalization  generic         matrices    matrix
numerical       numerous        noise       noisy
policy          police          sparse      sparsity
Tokenization, Word Segmentation
Tokenization or word segmentation
separate out “words” (lexical entries) from running text
expand abbreviated terms, e.g. I’m into I am, it’s into it is
collect tokens forming a single lexical entry, e.g. New York marked as one single entry
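A minimal sketch of these three steps (split, expand, collect) follows. The tables CLITIC_EXPANSIONS and MULTIWORD_ENTRIES are tiny hypothetical lexicons invented for this example; a real system would use much larger stored lists, and expanding it's to it is ignores the it has reading.

```python
# Toy tokenizer: split running text, expand clitics, merge multiword entries.
import re

CLITIC_EXPANSIONS = {"i'm": ["I", "am"], "it's": ["it", "is"], "i've": ["I", "have"]}
MULTIWORD_ENTRIES = {("new", "york"): "New York"}

def tokenize(text: str) -> list:
    text = text.replace("\u2019", "'")  # normalize typographic apostrophes
    # Words (optionally with one clitic), numbers, or single punctuation marks.
    raw = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?|\d+|[^\w\s]", text)
    expanded = []
    for tok in raw:
        expanded.extend(CLITIC_EXPANSIONS.get(tok.lower(), [tok]))
    tokens = []
    i = 0
    while i < len(expanded):
        pair = tuple(t.lower() for t in expanded[i:i + 2])
        if pair in MULTIWORD_ENTRIES:       # two tokens forming one lexical entry
            tokens.append(MULTIWORD_ENTRIES[pair])
            i += 2
        else:
            tokens.append(expanded[i])
            i += 1
    return tokens

print(tokenize("I\u2019m in New York."))
```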
Tokenization, Word Segmentation
Finite state transducer (FST)
Modifies input string (rules)
Recognizes (stored) abbreviations and composite words
See Fig. 3.22 in Jurafsky, Ch. 3
More of an issue in languages like Chinese
Lemmatization
Lemmatization maps words with same root but different surface appearances onto the same lexeme
e.g. buys, bought, buying -> buy
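A lookup-based sketch of the buy example is given below. The tables IRREGULAR_FORMS and KNOWN_LEMMAS are hypothetical mini-lexicons for illustration only; unlike a stemmer, the result is always checked against the lexicon, so only known lexemes are returned for inflected forms.

```python
# Toy lemmatizer: irregular forms by lookup, regular forms by suffix stripping
# validated against a small lexicon (both tables are illustrative assumptions).
IRREGULAR_FORMS = {"bought": "buy", "ate": "eat", "eaten": "eat", "mice": "mouse"}
KNOWN_LEMMAS = {"buy", "eat", "walk", "mouse"}

def lemmatize(word: str) -> str:
    w = word.lower()
    if w in KNOWN_LEMMAS:
        return w
    if w in IRREGULAR_FORMS:
        return IRREGULAR_FORMS[w]
    for suffix in ("ing", "ed", "s"):
        base = w[: -len(suffix)]
        if w.endswith(suffix) and base in KNOWN_LEMMAS:
            return base
    return w  # unknown word: returned unchanged

print([lemmatize(w) for w in ["buys", "bought", "buying"]])
```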
Morphological Processing
Word Recognition
Spelling Errors
Mark non-words based on dictionary/lexicon
Use “minimum edit distance”
  Dynamic programming
  Table-based
  Transform operations: deletion, substitution, insertion
  Calculate minimum path
Morphological Parser = FST
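The table-based computation of the minimum edit distance can be sketched as follows. Unit costs for deletion and insertion are assumed; the substitution cost is left as a parameter (sub_cost, a name chosen for this sketch) because some presentations charge 2 for a substitution. The helper closest_word is likewise only an illustrative way to use the distance for spelling correction.

```python
# Dynamic-programming sketch of minimum edit distance.
def min_edit_distance(source: str, target: str, sub_cost: int = 1) -> int:
    n, m = len(source), len(target)
    # dist[i][j] = cost of transforming source[:i] into target[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i                      # i deletions
    for j in range(1, m + 1):
        dist[0][j] = j                      # j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + 1,                              # deletion
                dist[i][j - 1] + 1,                              # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost),  # substitution / match
            )
    return dist[n][m]

def closest_word(word: str, lexicon: list) -> str:
    """Suggest the lexicon entry with the smallest edit distance to a non-word."""
    return min(lexicon, key=lambda entry: min_edit_distance(word, entry))

print(min_edit_distance("intention", "execution"))
print(closest_word("walkng", ["walking", "talking", "waking"]))
```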
Morphological Processing Knowledge
lexical entry: stem plus possible prefixes, suffixes plus word classes, e.g. endings for verb forms (see tables above)
rules: how to combine stem and affixes, e.g. add s to form plural of noun as in dogs
orthographic rules: spelling, e.g. double consonant as in mapping
Processing: Finite State Transducers take information above and analyze word token / generate word form
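To make the three knowledge sources concrete, here is a small analyzer in Python. It is not a finite state transducer, only a lookup-based sketch that uses a lexicon, suffix rules, and two orthographic rules in the same roles; all table contents and the feature labels (+PL, +3SG, ...) are assumptions for illustration.

```python
# Toy morphological analyzer: lexicon + suffix rules + orthographic rules.
LEXICON = {"dog": "N", "fox": "N", "cat": "N", "map": "V", "walk": "V"}

SUFFIX_RULES = [            # (suffix, word class it attaches to, feature)
    ("s", "N", "+PL"),
    ("s", "V", "+3SG"),
    ("ing", "V", "+PresPart"),
    ("ed", "V", "+Past"),
]

def candidate_stems(base: str, suffix: str) -> list:
    """Undo orthographic rules to recover possible stems."""
    stems = [base]
    if suffix == "s" and base.endswith("e"):
        stems.append(base[:-1])             # e-insertion: fox + s -> foxes
    if suffix in ("ing", "ed") and len(base) >= 2 and base[-1] == base[-2]:
        stems.append(base[:-1])             # consonant doubling: map + ing -> mapping
    return stems

def analyze(word: str) -> list:
    analyses = []
    if word in LEXICON:                     # bare stem
        cls = LEXICON[word]
        analyses.append(f"{word} +{cls}" + (" +SG" if cls == "N" else ""))
    for suffix, cls, feature in SUFFIX_RULES:
        if not word.endswith(suffix):
            continue
        base = word[: -len(suffix)]
        for stem in candidate_stems(base, suffix):
            if LEXICON.get(stem) == cls:
                analyses.append(f"{stem} +{cls} {feature}")
    return analyses

for word in ["dogs", "foxes", "mapping", "maps"]:
    print(word, analyze(word))
```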
Fig. 3.3 FSA for verb inflection.
Fig. 3.5 More detailed FSA for adjective inflection.
Fig. 3.4 Simple FSA for adjective inflection.
Fig. 3.7 Compiled FSA for noun inflection.
Fig. 3.12 Lexical and intermediate tape of a FS Transducer
Fig. 3.13 Lexical, intermediate, and surface tape after spelling transformation.
Word Classes and POS Tagging
Word Classes
Sort words into categories according to:
morphological properties: Which types of morphological forms do they take? e.g. form plural: noun+s; 3rd person: verb+s
distributional properties: What other words or phrases can occur nearby? e.g. possessive pronoun before noun
semantic coherence: Classify according to similar semantic type, e.g. nouns refer to object-like entities
Open vs. Closed Word Classes
Open Class Types
The set of words in these classes can change over time, with the development of the language, e.g. spaghetti and download
Open Class Types: nouns, verbs, adjectives, adverbs
Open vs. Closed Word Classes
Closed Class Types
The set of words in these classes is largely fixed and hardly ever changes for a given language.
Closed Class Types: prepositions, determiners, pronouns, conjunctions, auxiliary verbs, particles, numerals
Open Class Words: Nouns
Nouns denote objects, concepts, entities, events
Proper Nouns: names for specific individual objects, entities, e.g. the Eiffel Tower, Dr. Kemke
Common Nouns: names for categories, classes, abstracts, events, e.g. fruit, banana, table, freedom, sleep, race, ...
Count Nouns: enumerable entities, e.g. two bananas
Mass Nouns: not countable items, e.g. water, salt, freedom
Open Class Words: Verbs
Verbs denote actions, processes, and states, e.g. smoke, dream, rest, run
several morphological forms, e.g.
non-3rd person - eat, sleep
3rd person - eats, sleeps
progressive / present participle / gerundive - eating, sleeping
past participle - eaten, slept
simple past - ate, slept
Open Class Words: Verbs (2)
non-3rd person    eat      I eat. We eat. They eat.
3rd person        eats     He eats. She eats. It eats.
progressive       eating   He is eating. He will be eating. He has been eating.
  e.g. present participle  He is eating.
  gerundive                Eating scorpions [NP] is common in China.
  use as adjective         Eating children [NP] are common at McDonalds.
past participle   eaten    He has eaten the scorpion. The scorpion was eaten.
simple past       ate      He ate the scorpion.
Verb Forms 1 - The five verb forms
Fig.2.6. The five verb forms. (Allen, 1995, p.28)
Verb Forms 2 - The basic tenses
Fig.2.7. The basic tenses. (Allen, 1995, p.29)
Verb Forms 3 - The progressive tenses
Fig.2.8. The progressive tenses. (Allen, 1995, p.29)
Simple
  Past: An action that ended at a point in the past. Form: cooked. (time clue)* e.g. He cooked yesterday.
  Present: An action that exists, is usual, or is repeated. Form: cook / cooks. (time clue)* e.g. He cooks dinner every Friday.
  Future: A plan for future action. Form: will cook. (time clue)* e.g. He will cook tomorrow.
Progressive (be + main verb + ing)
  Past: An action was happening (past progressive) when another action happened (simple past). Form: was / were cooking. (time clue)* e.g. He was cooking when the phone rang.
  Present: An action that is happening now. Form: am / is / are cooking. (time clue)* e.g. He is cooking now.
  Future: An action that will be happening over time, in the future, when something else happens. Form: will be cooking. (time clue)* e.g. He will be cooking when you come.
Perfect (have + main verb)
  Past: An action that ended before another action or time in the past. Form: had cooked. (time clue)* e.g. He had cooked the dinner when the phone rang.
  Present: An action that happened at an unspecified time in the past. Form: has / have cooked. (time clue)* e.g. He has cooked many meals.
  Future: An action that will end before another action or time in the future. Form: will have cooked. (time clue)* e.g. He will have cooked dinner by the time you come.
Perfect Progressive (have + be + main verb + ing)
  Past: An action that happened over time, in the past, before another time or action in the past. Form: had been cooking. (time clue)* e.g. He had been cooking for a long time before he took lessons.
  Present: An action occurring over time that started in the past and continues into the present. Form: has / have been cooking. (time clue)* e.g. He has been cooking for over an hour.
  Future: An action occurring over time, in the future, before another action or time in the future. Form: will have been cooking. (time clue)* e.g. He will have been cooking all day by the time she gets home.
Verb Tense Chart. From: http://www.athabascau.ca/courses/engl/155/support/verb_tenses.htm
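The chart is compositional: each cell is a tensed auxiliary chain plus a form of the main verb. The sketch below builds the twelve verb groups for a third-person singular subject; the form tables COOK, BE, HAVE and the function names are assumptions made for this illustration, and agreement (am / are / were / have) is deliberately left out.

```python
# Compose tense and aspect into a verb group (third-person singular only).
COOK = {"base": "cook", "s": "cooks", "past": "cooked", "past_part": "cooked", "ing": "cooking"}
BE = {"base": "be", "s": "is", "past": "was", "past_part": "been", "ing": "being"}
HAVE = {"base": "have", "s": "has", "past": "had", "past_part": "had", "ing": "having"}

def finite(verb: dict, tense: str) -> list:
    """Return the tensed form of a verb as a list of words."""
    if tense == "past":
        return [verb["past"]]
    if tense == "present":
        return [verb["s"]]
    if tense == "future":
        return ["will", verb["base"]]
    raise ValueError(f"unknown tense: {tense}")

def verb_group(tense: str, aspect: str, main: dict = COOK) -> str:
    if aspect == "simple":
        words = finite(main, tense)
    elif aspect == "progressive":            # be + main verb + ing
        words = finite(BE, tense) + [main["ing"]]
    elif aspect == "perfect":                # have + past participle
        words = finite(HAVE, tense) + [main["past_part"]]
    elif aspect == "perfect progressive":    # have + been + main verb + ing
        words = finite(HAVE, tense) + [BE["past_part"], main["ing"]]
    else:
        raise ValueError(f"unknown aspect: {aspect}")
    return " ".join(words)

for aspect in ["simple", "progressive", "perfect", "perfect progressive"]:
    print(aspect, [verb_group(t, aspect) for t in ["past", "present", "future"]])
```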
Open Class Words: Adjectives
Adjectives denote qualities or properties of objects e.g. heavy, blue, content
most languages have concepts for
colour - white, green, ...
age - young, old, ...
value - good, bad, ...
not all languages have adjectives as separate class
Open Class Words: Adverbs 1
Adverbs denote modifications of actions (verbs) or qualities (adjectives), e.g. walk slowly or heavily drunk
Directional or Locational Adverbs: specify direction or location, e.g. go home, stay here
Open Class Words: Adverbs 2
Degree Adverbs: specify extent of process, action, property, e.g. extremely slow, very modest
Manner Adverbs: specify manner of action or process, e.g. walk slowly, run fast
Temporal Adverbs: specify time of event or action, e.g. yesterday, Monday
Closed Word Classes
Closed Class Types:
Prepositions: on, under, over, at, from, to, with, ...
Determiners: a, an, the, ...
Pronouns: he, she, it, his, her, who, I, ...
Conjunctions: and, or, as, if, when, ...
Auxiliary verbs: can, may, should, are, …
Particles: up, down, on, off, in, out, …
Numerals: one, two, three, ..., first, second, ...
Closed Word Class: Prepositions
Prepositions
occur before noun phrases; describe relations; often spatial or temporal relations
e.g. on the table (spatial)
in two hours (temporal)
Closed Word Class: Pronouns
Pronouns: reference to entities, events, relations etc.
Personal Pronouns: refer to persons or entities, e.g. you, he, it, ...
Possessive Pronouns: possession or relation between person and object, e.g. his, her, my, its, ...
Wh-Pronouns: reference in question or back reference, e.g. Who did this ..., Frieda, who is 80 years old ...
Closed Word Class: Conjunctions
Conjunctions join phrases or sentences; semantics is varied and complex
Coordinating Conjunction: joins two phrases or sentences on the same level through conjunctions like and, or, but, ...
e.g. He takes a cat and a dog. He takes a dog and she takes a cat.
Subordinating Conjunction: connects embedded phrases through e.g. that
e.g. He thinks that the cat is nicer than the dog.
Closed Word Class: Auxiliary Verbs
Auxiliary Verbs mark semantic features of the main verb. Often describe tense and modality aspects. Semantics is difficult.
Tense: addition expressing present, past or future, ... e.g. He will take the cat home.
Aspect: addition expressing completion of action, e.g. He is taking the cat home. (incomplete)
Mood: addition expressing necessity of action, e.g. He can take the cat home. (possible)
Closed Word Class: Copula, Modal Verbs
Copula (be, do, have) and Modal Verbs (can, should, ...) are subclasses of Auxiliary Verbs.
Describe state, process, or tense / modality of action. Semantics: difficult (e.g. modal logic)
State / Process: be and do, e.g. He is at home. He does nothing.
Tense: have, e.g. He has taken the cat home.
Modality: can, ought to, should, must, e.g. He can take the cat home. (possibility)
Tagsets and POS Tagging
POS Tagging - Tagsets
Tagsets for English
Penn Treebank, 45 tags
Brown corpus, 87 tags
C5 tagset, 61 tags
C7 tagset, 146 tags
For references see Jurafsky, p.296
C5 and C7 tagsets are listed in Appendix C
Fig. 8.6 Penn Treebank, 45 tags
Ambiguity in POS Tagging
Fig. 8.7 Ambiguity in tagging. The left column classifies words according to the number of tags, which can be used for them. The right column shows how many words fall into each class. E.g. there are 264 words which can be tagged with 3 different POS tags, and 1 word (“still”) which has 7 possible tags. (based on the Brown Corpus)
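A table like the one in Fig. 8.7 can be computed from any tagged corpus by counting distinct tags per word type. The sketch below uses a made-up seven-token list, not the Brown Corpus, and the function name ambiguity_histogram is an assumption for this example.

```python
# Count how many word types have 1, 2, 3, ... distinct POS tags.
from collections import Counter, defaultdict

def ambiguity_histogram(tagged_tokens):
    """Map 'number of distinct tags' -> 'number of word types with that many tags'."""
    tags_per_word = defaultdict(set)
    for word, tag in tagged_tokens:
        tags_per_word[word.lower()].add(tag)
    return Counter(len(tags) for tags in tags_per_word.values())

# Hypothetical toy data with Penn Treebank style tags.
toy_corpus = [
    ("book", "NN"), ("book", "VB"),
    ("the", "DT"), ("flight", "NN"),
    ("still", "RB"), ("still", "JJ"), ("still", "NN"),
]
print(ambiguity_histogram(toy_corpus))
```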
POS Tagging - Taggers
Methods for POS Tagging:
Rule-Based Tagging: use dictionary to assign POS; then use rules to disambiguate different POS/word classes (e.g. book as verb or noun)
Stochastic Tagging: determines tags based on the probability of the occurrence of the tag, given the observed word, in the context of the preceding tags. Similar to Hidden Markov Models (probabilistic finite state machines).
Learn tagging rules
Problem in POS Tagging: Ambiguity
Problem in POS Tagging: Which tag set to use?
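As a sketch of stochastic tagging, the following bigram HMM tagger uses the Viterbi algorithm to resolve the book ambiguity from context. All probabilities in TRANS and EMIT are invented for illustration (a real tagger would estimate them from a tagged corpus), the tagset is cut down to three tags, and raw probabilities are multiplied instead of summing log probabilities, which is only acceptable for very short inputs.

```python
# Bigram HMM tagging with Viterbi decoding; all numbers are made-up toy values.
TAGS = ["DT", "NN", "VB"]
TRANS = {   # P(tag | previous tag); "<s>" marks the sentence start
    "<s>": {"DT": 0.5, "NN": 0.2, "VB": 0.3},
    "DT": {"DT": 0.01, "NN": 0.9, "VB": 0.09},
    "NN": {"DT": 0.2, "NN": 0.3, "VB": 0.5},
    "VB": {"DT": 0.6, "NN": 0.3, "VB": 0.1},
}
EMIT = {    # P(word | tag)
    "DT": {"the": 0.7, "that": 0.3},
    "NN": {"book": 0.3, "flight": 0.7},
    "VB": {"book": 0.8, "flight": 0.2},
}

def viterbi(words, tags=TAGS, trans=TRANS, emit=EMIT):
    """Return the most probable tag sequence for a non-empty list of words."""
    # Each column maps tag -> (probability of best path ending in tag, previous tag).
    columns = [{t: (trans["<s>"].get(t, 0.0) * emit[t].get(words[0], 0.0), None) for t in tags}]
    for word in words[1:]:
        column = {}
        for t in tags:
            prev, score = max(
                ((p, columns[-1][p][0] * trans[p].get(t, 0.0)) for p in tags),
                key=lambda item: item[1],
            )
            column[t] = (score * emit[t].get(word, 0.0), prev)
        columns.append(column)
    # Follow back-pointers from the best final tag.
    path = [max(tags, key=lambda t: columns[-1][t][0])]
    for column in reversed(columns[1:]):
        path.append(column[path[-1]][1])
    return list(reversed(path))

print(viterbi(["book", "the", "flight"]))
print(viterbi(["the", "book"]))
```

With these toy numbers, book comes out as a verb at the start of book the flight and as a noun after the.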