[Slide 2]
Knowledge-Formalisms Map (including probabilistic formalisms)

[Diagram mapping linguistic knowledge to formalisms:]
• State Machines (and prob. versions): Finite State Automata, Finite State Transducers, Markov Models -> Morphology
• Rule systems (and prob. versions), e.g., (Prob.) Context-Free Grammars -> Syntax
• Logical formalisms (First-Order Logics) -> Semantics
• AI planners -> Pragmatics: Discourse and Dialogue
[Slide 3]
Today Sep 18
• Dealing with spelling errors
  – Noisy channel model
  – Bayes rule applied to the noisy channel model (single and multiple spelling errors)
• Min Edit Distance ?
• Start n-gram models: Language Models
[Slide 4]
Background knowledge
• Morphological analysis
• P(x) (prob. distribution)
• Joint P(x, y)
• Conditional P(x | y)
• Bayes rule
• Chain rule
[Slide 5]
Spelling: the problem(s)

• Non-word, isolated
  – Detection: is w ∈ V?
  – Correction: find the most likely correct word (funn -> funny, fun, ...)
• Non-word, context
  – Correction: ...in this context (trust funn, a lot of funn)
• Real-word, isolated
  – Detection: ?!
• Real-word, context
  – Detection: is it an impossible (or very unlikely) word in this context? (.. a wild dig.)
  – Correction: find the most likely substitution word in this context
[Slide 6]
Spelling: Data
• Error rates: .05% - 3% - 38%
• 80% of misspelled words contain a single error:
  – insertion (toy -> tony)
  – deletion (tuna -> tua)
  – substitution (tone -> tony)
  – transposition (length -> legnth)
• Types of errors:
  – Typographic (more common; the user knows the correct spelling... the -> rhe)
  – Cognitive (the user doesn't know... piece -> peace)
[Slide 7]
Noisy Channel
• An influential metaphor in language processing is the noisy channel model
• Special case of Bayesian classification

[Diagram: signal -> noisy channel -> noisy signal]
[Slide 8]
Bayes and the Noisy Channel: Spelling (Non-word, isolated)

Goal: find the most likely word given some observed (misspelled) word:

\hat{w} = \arg\max_{w \in V} P(w \mid O)
[Slide 10]
Solution

1. Apply Bayes Rule:

\hat{w} = \arg\max_{w \in V} P(w \mid O) = \arg\max_{w \in V} \frac{P(O \mid w)\, P(w)}{P(O)}

2. Simplify (P(O) is the same for every candidate w, so it can be dropped):

\hat{w} = \arg\max_{w \in V} P(O \mid w)\, P(w)   (likelihood × prior)
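A minimal sketch of this decision rule in Python; candidates, likelihood, and prior are hypothetical stand-ins for the pieces developed on the next slides:

```python
# Noisy-channel decision rule: pick the w maximizing P(O|w) * P(w).
# candidates: iterable of vocabulary words w that could underlie O
# likelihood(O, w): estimate of P(O|w); prior(w): estimate of P(w)
def correct(observed, candidates, likelihood, prior):
    return max(candidates, key=lambda w: likelihood(observed, w) * prior(w))
```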
[Slide 11]
Estimate of prior P(w) (Easy)

Maximum likelihood estimate from a corpus of N tokens:

P(w) = \frac{C(w)}{N}

With smoothing (to avoid zero probabilities):

P(w) = \frac{C(w) + 0.5}{N + 0.5\,|V|}

Always verify that the result is still a probability distribution:

\sum_{w \in V} P(w) = \sum_{w \in V} \frac{C(w) + 0.5}{N + 0.5\,|V|} = 1
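A sketch of the smoothed prior, with a toy corpus to illustrate the "always verify" check (the corpus and names are illustrative):

```python
from collections import Counter

def smoothed_prior(word, counts, n_tokens, vocab_size):
    # P(w) = (C(w) + 0.5) / (N + 0.5 * |V|)
    return (counts[word] + 0.5) / (n_tokens + 0.5 * vocab_size)

counts = Counter("the cat sat on the mat the end".split())
n_tokens = sum(counts.values())
vocab = set(counts)

# Always verify: the smoothed estimates over V must sum to 1.
total = sum(smoothed_prior(w, counts, n_tokens, len(vocab)) for w in vocab)
assert abs(total - 1.0) < 1e-9
```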
[Slide 12]
Estimate of P(O|w) is feasible (Kernighan et al. '90)

For a one-error misspelling:
• Estimate the probability of each possible error type, e.g., insert a after c, substitute f with h
• P(O|w) is equal to the probability of the error that generated O from w, e.g., P(cbat | cat) = P(insert b after c)
[Slide 13]
Estimate P(error type)

E.g., for substitution sub[x,y]: from a large corpus, compute a confusion (count) matrix.

[Confusion matrix: rows and columns indexed by characters a, b, c, d, ...; cell sub[a,b] = # of times b was incorrectly used for a]

P(\text{sub } b \text{ for } a) = \frac{sub[a,b]}{Count(a)}

where Count(a) = # of occurrences of a in the corpus.
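A sketch of turning confusion counts into error probabilities (the counts below are made up for illustration):

```python
from collections import Counter

# sub_counts[(a, b)]: # of times b was incorrectly used where a was intended
sub_counts = Counter({("a", "b"): 5, ("a", "c"): 88, ("a", "d"): 15})
char_counts = Counter({"a": 10000})  # Count(a) = # of a in the corpus

def p_sub_b_for_a(a, b):
    # P(sub b for a) = sub[a, b] / Count(a)
    return sub_counts[(a, b)] / char_counts[a]

print(p_sub_b_for_a("a", "b"))  # 0.0005
```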
[Slide 14]
Corpus: Example

... On 16 January, he sais [sub[i,y] 3] that because of astronaut safety tha [del[a,t] 4] would be no more space shuttle missions to miantain [tran[a,i] 2] and upgrade the orbiting telescope ...
[Slide 15]
Final Method: single error

(1) Given O, collect all the w_i that could have generated O by one error. E.g., O = acress => w_1 = actress (t deletion), w_2 = across (sub o with e), ...
    How to do (1): generate all the strings that could have generated O by one error, then keep only the ones that are words (see the sketch below).

(2) For all the w_i compute:

P(O \mid w_i)\, P(w_i)

where P(O | w_i) is the probability of the error generating O from w_i and P(w_i) is the word prior.

(3) Sort and display the top n to the user.
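One way to implement step (1), in the spirit of Norvig's spell-correct script linked later in these slides (a sketch; the toy vocabulary is illustrative):

```python
import string

def edits1(word):
    """All strings one edit (deletion, transposition, substitution, insertion) away."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    substitutions = [L + c + R[1:] for L, R in splits if R for c in letters]
    inserts = [L + c + R for L, R in splits for c in letters]
    return set(deletes + transposes + substitutions + inserts)

def candidates(observed, vocabulary):
    """Step (1): keep only the edit-distance-1 strings that are real words."""
    return edits1(observed) & vocabulary

print(candidates("acress", {"actress", "across", "acres", "cress", "caress", "access"}))
```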
[Slide 16]
Example: O = acress

"...stellar and versatile acress whose..."

[Table with columns: w_i | C(w_i) | P(O|w_i) | P(w_i) | P(O|w_i) P(w_i); the candidate rows are blank in the original. Counts from the 1988 AP newswire corpus, 44 million words.]
[Slide 18]
Corpora: issues to remember

• Zero counts in the corpus: just because an event didn't happen in the corpus doesn't mean it won't happen. E.g., cress does not really have zero probability.
• Getting a corpus that matches the actual use. E.g., kids don't misspell the same way that adults do.
[Slide 19]
Multiple Spelling Errors

• (BEFORE) Given O, collect all the w_i that could have generated O by one error.
• (NOW) Given O, collect all the w_i that could have generated O by 1..k errors.
  How? (for two errors): collect all the strings that could have generated O by one error, then collect all the w_i that could have generated one of those strings by one error. Etc.
[Slide 20]
Final Method: multiple errors

(1) Given O, for each w_i that can be generated from O by a sequence of edit operations EdOp_i, save EdOp_i.

(2) For all the w_i compute:

P(O \mid w_i)\, P(w_i), \quad \text{with } P(O \mid w_i) = \prod_{x \in EdOp_i} P(x)

i.e., the probability of the errors generating O from w_i times the word prior (a sketch follows below).

(3) Sort and display the top n to the user.
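A sketch of the 1..k-error generalization, reusing edits1 from the earlier sketch; the second helper computes the product of the individual error probabilities:

```python
import math

def edits_k(word, k, vocabulary):
    # Apply edits1 repeatedly: strings reachable by 1..k single-error steps.
    # (The frontier grows combinatorially; fine for a sketch with small k.)
    frontier, found = {word}, set()
    for _ in range(k):
        frontier = {e for w in frontier for e in edits1(w)}
        found |= frontier & vocabulary
    return found

def p_given_errors(edit_ops, p_error):
    # P(O | w_i) = product over x in EdOp_i of P(x)
    return math.prod(p_error(x) for x in edit_ops)
```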
[Slide 21]
Spelling: the problem(s)

• Non-word, isolated
  – Detection: is w ∈ V?
  – Correction: find the most likely correct word (funn -> funny, funnel, ...)
• Non-word, context
  – Correction: ...in this context (trust funn, a lot of funn)
• Real-word, isolated
  – Detection: ?!
• Real-word, context
  – Detection: is it an impossible (or very unlikely) word in this context? (.. a wild dig.)
  – Correction: find the most likely substitution word in this context
[Slide 22]
Real Word Spelling Errors

• Collect a set of common confusion sets: C = {C_1 .. C_n}, e.g., {(their/they're/there), (to/too/two), (weather/whether), (lave/have), ...}
• Whenever some c' ∈ C_i is encountered, compute the probability of the sentence in which it appears
• Substitute each c ∈ C_i (c ≠ c') and compute the probability of the resulting sentence
• Choose the alternative with the higher probability (a sketch follows below)
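A sketch of this confusion-set procedure; sentence_prob is a hypothetical language-model scoring function (for instance, the n-gram models introduced later in these slides):

```python
CONFUSION_SETS = [{"their", "they're", "there"}, {"to", "too", "two"},
                  {"weather", "whether"}, {"lave", "have"}]

def correct_real_words(words, sentence_prob):
    words = list(words)
    for i, w in enumerate(words):
        for cset in CONFUSION_SETS:
            if w in cset:
                # Score the sentence with every member of the confusion set
                # substituted in, and keep the most probable version.
                words[i] = max(cset, key=lambda c: sentence_prob(words[:i] + [c] + words[i + 1:]))
    return words
```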
Want to play with spelling correction? A minimal noisy channel model implementation:
• (Python) http://www.norvig.com/spell-correct.html
[Slide 23]
• By the way, Peter Norvig is Director of Research at Google Inc.
[Slide 24]
Today Sep 18
• Dealing with spelling errors
  – Noisy channel model
  – Bayes rule applied to the noisy channel model (single and multiple spelling errors)
• Min Edit Distance ?
• Start n-gram models: Language Models
[Slide 25]
Minimum Edit Distance

• Def. The minimum number of edit operations (insertion, deletion and substitution) needed to transform one string into another.

Example: gumbo -> gumb (delete o) -> gum (delete b) -> gam (substitute u by a)
[Slide 26]
Minimum Edit Distance Algorithm

• Dynamic programming (a very common technique in NLP)
• High-level description:
  – Fills in a matrix of partial comparisons
  – The value of a cell is computed as a "simple" function of the surrounding cells
  – Output: not only the number of edit operations but also the sequence of operations
[Slide 27]
Minimum Edit Distance Algorithm: Details

ed[i,j] = min distance between the first i chars of the source and the first j chars of the target

Costs: del-cost = 1, ins-cost = 1, sub-cost = 2

Update rule, with x = ed[i-1, j-1], y = ed[i-1, j], z = ed[i, j-1]:

ed[i,j] = MIN( z + 1 (ins), y + 1 (del), x + (2 if sub, 0 if equal) )
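A direct implementation of this DP recurrence, with the slide's costs as defaults:

```python
def min_edit_distance(source, target, del_cost=1, ins_cost=1, sub_cost=2):
    """Fill the DP matrix: ed[i][j] = min distance between source[:i] and target[:j]."""
    n, m = len(source), len(target)
    ed = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        ed[i][0] = i * del_cost
    for j in range(1, m + 1):
        ed[0][j] = j * ins_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            ed[i][j] = min(ed[i][j - 1] + ins_cost,                       # insertion
                           ed[i - 1][j] + del_cost,                       # deletion
                           ed[i - 1][j - 1] + (0 if same else sub_cost))  # sub or equal
    return ed[n][m]

print(min_edit_distance("gumbo", "gam"))  # 4: delete o, delete b, substitute u by a
```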
[Slide 30]
Today Sep 18
• Dealing with spelling errors
  – Noisy channel model
  – Bayes rule applied to the noisy channel model (single and multiple spelling errors)
• Min Edit Distance ?
• Start n-gram models: Language Models
[Slide 31]
Key Transition
• Up to this point we’ve mostly been discussing words in isolation
• Now we’re switching to sequences of words
• And we’re going to worry about assigning probabilities to sequences of words
[Slide 32]
Knowledge-Formalisms Map (including probabilistic formalisms)

[Diagram mapping linguistic knowledge to formalisms:]
• State Machines (and prob. versions): Finite State Automata, Finite State Transducers, Markov Models -> Morphology
• Rule systems (and prob. versions), e.g., (Prob.) Context-Free Grammars -> Syntax
• Logical formalisms (First-Order Logics) -> Semantics
• AI planners -> Pragmatics: Discourse and Dialogue
[Slide 33]
Only Spelling?

A. Assign a probability to a sentence
   • Part-of-speech tagging
   • Word-sense disambiguation
   • Probabilistic parsing
B. Predict the next word
   • Speech recognition
   • Hand-writing recognition
   • Augmentative communication for the disabled

Both A and B need P(w_1, .., w_n): impossible to estimate directly.
[Slide 34]
Decompose: apply chain rule

Chain Rule:

P(A_1, .., A_n) = \prod_{i=1}^{n} P(A_i \mid A_1, .., A_{i-1})

Applied to a word sequence from position 1 to n:

P(w_1^n) = P(w_1)\, P(w_2 \mid w_1) \cdots P(w_n \mid w_1^{n-1}) = P(w_1) \prod_{k=2}^{n} P(w_k \mid w_1^{k-1})
[Slide 35]
Example

• Sequence: "The big red dog barks"
• P(The big red dog barks) = P(The) * P(big|the) * P(red|the big) * P(dog|the big red) * P(barks|the big red dog)

Note: P(The) is better expressed as P(The | <Beginning of sentence>), written as P(The | <S>).
[Slide 36]
Not a satisfying solution

Even for small n (e.g., 6) we would need a far too large corpus to estimate:

P(w_6 \mid w_1^5)

Markov Assumption: the entire prefix history isn't necessary.

P(w_n \mid w_1^{n-1}) \approx P(w_n \mid w_{n-N+1}^{n-1})

• N = 1 (unigram): P(w_n \mid w_1^{n-1}) \approx P(w_n)
• N = 2 (bigram):  P(w_n \mid w_1^{n-1}) \approx P(w_n \mid w_{n-1})
• N = 3 (trigram): P(w_n \mid w_1^{n-1}) \approx P(w_n \mid w_{n-1}, w_{n-2})
[Slide 37]
Prob of a sentence: N-Grams

Chain rule (exact):

P(w_1, .., w_n) = P(w_1) \prod_{k=2}^{n} P(w_k \mid w_1^{k-1})

• unigram: P(w_1, .., w_n) \approx P(w_1) \prod_{k=2}^{n} P(w_k)
• bigram:  P(w_1, .., w_n) \approx P(w_1) \prod_{k=2}^{n} P(w_k \mid w_{k-1})
• trigram: P(w_1, .., w_n) \approx P(w_1) \prod_{k=2}^{n} P(w_k \mid w_{k-1}, w_{k-2})
[Slide 38]
Bigram

Sequence: <S> The big red dog barks

P(The big red dog barks) = P(The|<S>) * P(big|the) * P(red|big) * P(dog|red) * P(barks|dog)

In general:

P(w_1, .., w_n) \approx P(w_1 \mid <S>) \prod_{k=2}^{n} P(w_k \mid w_{k-1})

Trigram?
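A sketch of the bigram computation above, with made-up probability values:

```python
import math

# Hypothetical bigram probabilities P(w_k | w_{k-1}); real values would be
# estimated from a corpus as on the next slide.
bigram_p = {("<S>", "the"): 0.2, ("the", "big"): 0.1, ("big", "red"): 0.05,
            ("red", "dog"): 0.04, ("dog", "barks"): 0.3}

def bigram_sentence_prob(words):
    # P(w_1..n) ~= P(w_1 | <S>) * prod over k of P(w_k | w_{k-1})
    return math.prod(bigram_p[pair] for pair in zip(["<S>"] + words, words))

print(bigram_sentence_prob(["the", "big", "red", "dog", "barks"]))  # ~1.2e-05
```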
[Slide 39]
Estimates for N-Grams

bigram:

P(w_n \mid w_{n-1}) = \frac{P(w_{n-1}, w_n)}{P(w_{n-1})} = \frac{C(w_{n-1}, w_n)\,/\,N_{pairs}}{C(w_{n-1})\,/\,N_{words}} \approx \frac{C(w_{n-1}, w_n)}{C(w_{n-1})}

..in general:

P(w_n \mid w_{n-N+1}^{n-1}) = \frac{C(w_{n-N+1}^{n-1}\, w_n)}{C(w_{n-N+1}^{n-1})}
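A minimal sketch of these relative-frequency estimates over a toy corpus (the corpus is illustrative):

```python
from collections import Counter

corpus = "the big red dog barks and the big dog sleeps".split()
unigram_c = Counter(corpus)
bigram_c = Counter(zip(corpus, corpus[1:]))

def p_bigram(prev, word):
    # MLE: P(w_n | w_{n-1}) = C(w_{n-1}, w_n) / C(w_{n-1})
    return bigram_c[(prev, word)] / unigram_c[prev]

print(p_bigram("the", "big"))  # 1.0: every "the" in this toy corpus is followed by "big"
print(p_bigram("big", "red"))  # 0.5
```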