Parts of Speech
More Fine-Grained Classes
Actually, I ran home extremely quickly yesterday
The closed classes
Example of POS tagging
The Penn Treebank Part-of-Speech Tagset
The Universal POS tagset
https://universaldependencies.org
POS tagging
goal: resolve POS ambiguities
Most Frequent Class Baseline
Training on the WSJ corpus and testing on sections 22–24 of the same corpus, the most-frequent-tag baseline achieves an accuracy of 92.34%.
● ~97% tag accuracy is achievable by most algorithms (HMMs, MEMMs, neural networks, rule-based algorithms)
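The most-frequent-class baseline above can be sketched in a few lines. This is a minimal illustration, assuming tagged sentences as lists of (word, tag) pairs; unknown words fall back to the overall most frequent tag:

```python
from collections import Counter, defaultdict

def train_baseline(tagged_sentences):
    """Map each word to its most frequent tag in the training data."""
    word_tag_counts = defaultdict(Counter)
    tag_counts = Counter()
    for sent in tagged_sentences:
        for word, tag in sent:
            word_tag_counts[word][tag] += 1
            tag_counts[tag] += 1
    most_frequent = {w: c.most_common(1)[0][0]
                     for w, c in word_tag_counts.items()}
    default_tag = tag_counts.most_common(1)[0][0]  # fallback for unknown words
    return most_frequent, default_tag

def tag_baseline(words, most_frequent, default_tag):
    """Tag each word with its most frequent training tag."""
    return [most_frequent.get(w, default_tag) for w in words]
```

Despite its simplicity, this baseline is strong because most word types are unambiguous and ambiguous types usually occur with their dominant tag.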
Why POS tagging
▪ Text-to-speech
  ▪ record, lead, protest
▪ Lemmatization
  ▪ saw/V → see, saw/N → saw
▪ Preprocessing for harder disambiguation problems
  ▪ syntactic parsing
  ▪ semantic parsing
Generative sequence labeling: Hidden Markov Models
▪ In the real world, many events are not directly observable
  ▪ Speech recognition: we observe acoustic features but not the phones
  ▪ POS tagging: we observe words but not the POS tags
Hidden Markov Models
q1
q2
qn ...
HMM
From J&M
HMM example
HMMs: Algorithms
Forward
Viterbi
Forward–Backward; Baum–Welch
HMM tagging as decoding
How many possible choices?
Part of speech tagging example
Slide credit: Noah Smith
The Viterbi Algorithm
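The Viterbi recursion can be sketched as follows. This is a minimal dict-based version, assuming the HMM parameters are given as dictionaries (pi for initial tag probabilities, A for transitions, B for emissions); the 1e-10 floor for unseen words is an illustrative assumption, not a real smoothing method:

```python
import math

def viterbi(words, tags, pi, A, B):
    """Viterbi decoding for an HMM tagger, in log space.
    pi[t]: P(t starts the sentence); A[prev][t]: transition P(t | prev);
    B[t][w]: emission P(w | t). Unseen words get a tiny floor probability."""
    V = [{t: math.log(pi[t]) + math.log(B[t].get(words[0], 1e-10))
          for t in tags}]
    back = [{}]
    for i, w in enumerate(words[1:], start=1):
        V.append({})
        back.append({})
        for t in tags:
            # best previous tag for reaching t at position i
            prev, score = max(
                ((p, V[i - 1][p] + math.log(A[p][t])) for p in tags),
                key=lambda x: x[1])
            V[i][t] = score + math.log(B[t].get(w, 1e-10))
            back[i][t] = prev
    # follow backpointers from the best final tag
    path = [max(tags, key=lambda t: V[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))
```

Each chart cell stores the score of the best path ending in that tag, so the work is O(n·|T|²) instead of enumerating all |T|ⁿ tag sequences.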
Beam search
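Beam search trades exactness for speed: instead of the full Viterbi chart, it keeps only the top-k partial tag sequences at each step. A minimal sketch, assuming the same dict-based HMM parameters as above and an illustrative 1e-10 floor for unseen words:

```python
import math

def beam_search(words, tags, pi, A, B, beam_width=2):
    """Approximate HMM decoding: keep only the top-k scoring partial tag
    sequences (k = beam_width) at each time step. Faster for large
    tagsets, but can miss the globally optimal path."""
    beam = sorted(
        (([t], math.log(pi[t]) + math.log(B[t].get(words[0], 1e-10)))
         for t in tags),
        key=lambda x: x[1], reverse=True)[:beam_width]
    for w in words[1:]:
        # extend every surviving path by every possible next tag
        candidates = [
            (path + [t],
             score + math.log(A[path[-1]][t]) + math.log(B[t].get(w, 1e-10)))
            for path, score in beam for t in tags]
        beam = sorted(candidates, key=lambda x: x[1], reverse=True)[:beam_width]
    return beam[0][0]  # best-scoring complete path
```

With beam_width = |tags| this explores the same extensions as Viterbi; smaller widths prune aggressively.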
HMMs: Algorithms
Forward
Viterbi
Forward–Backward; Baum–Welch
The Forward Algorithm
sum instead of max
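Replacing the max in the Viterbi recursion with a sum gives the likelihood of the whole word sequence. A minimal sketch in probability space, assuming the same dict-based HMM parameters used for decoding:

```python
def forward(words, tags, pi, A, B):
    """Forward algorithm: the same recursion as Viterbi, but summing over
    previous tags instead of maximizing, which yields the total
    probability of the word sequence under the HMM."""
    # alpha[t] = P(words so far, current tag = t)
    alpha = {t: pi[t] * B[t].get(words[0], 0.0) for t in tags}
    for w in words[1:]:
        alpha = {t: B[t].get(w, 0.0) * sum(alpha[p] * A[p][t] for p in tags)
                 for t in tags}
    return sum(alpha.values())
```

In practice the products underflow on long sentences, so real implementations work in log space or rescale alpha at each step.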
Viterbi
▪ n-best decoding
▪ relationship to sequence alignment
Extending the HMM Algorithm to Trigrams
▪ Word shape
  ▪ lower case → x
  ▪ upper case → X
  ▪ numbers → d
  ▪ punctuation → .
  ▪ I.M.F → X.X.X
  ▪ DC10-30 → XXdd-dd
▪ Shorter word shape: runs of consecutive identical character types are collapsed
  ▪ DC10-30 → Xd-d
▪ Prefixes & suffixes
  ▪ -s, -ed, -ing
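The word-shape mapping above can be sketched directly; collapsing runs for the shorter shape is one regex substitution:

```python
import re

def word_shape(word, short=False):
    """Map characters to x (lower), X (upper), d (digit); keep punctuation.
    With short=True, runs of the same shape character are collapsed."""
    shape = []
    for ch in word:
        if ch.islower():
            shape.append('x')
        elif ch.isupper():
            shape.append('X')
        elif ch.isdigit():
            shape.append('d')
        else:
            shape.append(ch)
    s = ''.join(shape)
    if short:
        s = re.sub(r'(.)\1+', r'\1', s)  # XXdd-dd -> Xd-d
    return s
```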
Unknown Words
Brants (2000)
▪ a trigram HMM
▪ handling of unknown words
▪ 96.7% on the Penn Treebank
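A trigram HMM needs smoothed transition probabilities, since many tag trigrams are unseen. A minimal sketch in the spirit of TnT (Brants 2000), interpolating unigram, bigram, and trigram maximum-likelihood estimates; the count representation and fixed lambdas here are illustrative assumptions (TnT sets the lambdas by deleted interpolation):

```python
def interpolated_trigram(t3, t2, t1, counts, lambdas):
    """Smoothed transition P(t3 | t1, t2) as a weighted mix of unigram,
    bigram, and trigram MLE estimates. counts maps tag n-grams (tuples)
    to frequencies; lambdas = (l1, l2, l3) should sum to 1."""
    l1, l2, l3 = lambdas
    total = sum(v for k, v in counts.items() if len(k) == 1)
    p1 = counts.get((t3,), 0) / total
    p2 = counts.get((t2, t3), 0) / counts.get((t2,), 1)
    p3 = counts.get((t1, t2, t3), 0) / counts.get((t1, t2), 1)
    return l1 * p1 + l2 * p2 + l3 * p3
```

The interpolation guarantees a nonzero probability whenever the unigram estimate is nonzero, so unseen trigrams never zero out a path.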
Generative vs. Discriminative models
▪ Generative models specify a joint distribution over the labels and the data; with such a model you can generate new data
▪ Discriminative models specify the conditional distribution of the label y given the data x; these models focus on how to discriminate between the classes
From Bamman
Maximum Entropy Markov Models (MEMM)
▪ HMM: T̂ = argmax_T ∏ᵢ P(wᵢ | tᵢ) P(tᵢ | tᵢ₋₁)
▪ MEMM: T̂ = argmax_T ∏ᵢ P(tᵢ | wᵢ, tᵢ₋₁)
Features in a MEMM
▪ well-dressed
Decoding and Training MEMMs
Decoding MEMMs
greedy approach:
doesn’t use evidence from future decisions
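Greedy decoding commits to one tag per position, left to right. A minimal sketch; the `toy_prob` function is a hypothetical stand-in for a trained MEMM classifier's P(tag | word, prev_tag), with made-up numbers for illustration:

```python
def greedy_decode(words, tags, prob):
    """Greedy left-to-right decoding for a MEMM-style tagger: at each
    position, pick the best tag given only the previous decision."""
    prev = "<s>"
    output = []
    for w in words:
        prev = max(tags, key=lambda t: prob(t, w, prev))
        output.append(prev)
    return output

# Hypothetical stand-in for a trained classifier (illustration only).
def toy_prob(tag, word, prev_tag):
    if word == "dogs":
        return 0.9 if tag == "N" else 0.1
    if word == "run":
        return 0.8 if (tag == "V" and prev_tag == "N") else 0.2
    return 0.5
```

Because each decision is final, an early mistake cannot be revised by later evidence, which is exactly why Viterbi decoding is preferred.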
Decoding MEMMs
Viterbi
▪ filling the Viterbi chart with
  ▪ HMM: vₜ(j) = maxᵢ vₜ₋₁(i) · P(sⱼ | sᵢ) · P(oₜ | sⱼ)
  ▪ MEMM: vₜ(j) = maxᵢ vₜ₋₁(i) · P(sⱼ | sᵢ, oₜ)
Bidirectionality
▪ Label bias or observation bias problem
  ▪ will/NN to/TO fight/VB
▪ Linear-chain CRF (Lafferty et al. 2001)
▪ A bidirectional version of the MEMM (Toutanova et al. 2003)
▪ bi-LSTM
Neural sequence tagger
▪ Lample et al. 2016, Neural Architectures for Named Entity Recognition
Multilingual POS tagging
▪ In morphologically rich languages like Czech, Hungarian, and Turkish
  ▪ a 250,000-word-token corpus of Hungarian has more than twice as many word types as a similarly sized corpus of English
  ▪ a 10-million-word-token corpus of Turkish contains four times as many word types as a similarly sized English corpus
▪ ⇒ many UNKs
▪ more information is coded in morphology
Multilingual POS tagging
▪ In non-word-space languages like Chinese, word segmentation is either applied before tagging or done jointly
▪ UNKs are difficult: the majority of unknown words are common nouns and verbs, because of extensive compounding
▪ Universal POS tagset accounts for cross-linguistic differences
Named Entity Recognition
Named Entity tags
Ambiguity in NER
NER as Sequence Labeling
IOB tagging scheme
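In the IOB scheme, B- marks the first token of an entity, I- a continuation, and O a token outside any entity. A minimal encoder, assuming entities given as (start, end, type) token offsets with exclusive end:

```python
def spans_to_iob(tokens, spans):
    """Encode entity spans as IOB tags over the token sequence.
    spans: list of (start, end, type) token offsets, end exclusive."""
    tags = ["O"] * len(tokens)
    for start, end, etype in spans:
        tags[start] = "B-" + etype          # entity-initial token
        for i in range(start + 1, end):
            tags[i] = "I-" + etype          # continuation tokens
    return tags
```

This turns span-finding into per-token sequence labeling, so the same taggers used for POS apply directly to NER.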
A feature-based algorithm for NER
▪ gazetteers
  ▪ a list of place names providing millions of entries for locations, with detailed geographical and political information
  ▪ binary indicator features
Evaluation of NER
▪ F-score
▪ segmentation is a confound
  ▪ e.g., American/B-ORG Airlines
  ▪ 2 errors: a false positive for O and a false negative for I-ORG
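Entity-level evaluation first converts IOB sequences back to spans and then matches gold against predicted spans exactly, which is how one boundary mistake yields both a false positive and a false negative. A minimal sketch, assuming strict IOB (entities start with B-):

```python
def iob_to_spans(tags):
    """Extract (start, end, type) entity spans from an IOB tag sequence."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a final entity
        if tag.startswith("B-") or tag == "O" or \
           (tag.startswith("I-") and tag[2:] != etype):
            if start is not None:
                spans.append((start, i, etype))
                start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans

def entity_f1(gold_tags, pred_tags):
    """Entity-level F1: a predicted span counts only if it matches a gold
    span exactly in boundaries and type."""
    gold = set(iob_to_spans(gold_tags))
    pred = set(iob_to_spans(pred_tags))
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```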
HMMs in Automatic Speech Recognition
ssssssssppppeeeeeeetshshshshllllaeaeaebbbbb
“speech lab”
HMMs in Automatic Speech Recognition
[Figure: words w1, w2 → sound types s1…s7 → acoustic observations a1…a7; the language model generates the word sequence and the acoustic model links sound types to acoustic observations]
Recommended