Outline
• HMM, PFSA, and PCFG
• Inside and outside probability
• Expected counts and update formulae
• Relation to EM
• Relation between inside-outside and forward-backward algorithms
PCFG
• A PCFG is a tuple $(N, \Sigma, N^1, R, P)$:
– $N$ is a set of non-terminals: $\{N^i\}$
– $\Sigma$ is a set of terminals: $\{w^k\}$
– $N^1$ is the start symbol
– $R$ is a set of rules
– $P$ is the set of probabilities on rules
• We assume the PCFG is in Chomsky Normal Form.
• Parsing algorithms:
– Earley (top-down)
– CYK (bottom-up)
– …
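A minimal sketch of how such a grammar might be stored, assuming non-terminals are numbered from 0 and that 0 plays the role of the start symbol $N^1$. The names `binary`, `lexical`, and `is_normalized` and the toy grammar are illustrative and do not come from the slides.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky Normal Form (not from the slides).
# Non-terminals are integers; 0 plays the role of the start symbol N^1.
binary = {(0, 0, 0): 0.4}     # binary[(j, r, s)] = P(N^j -> N^r N^s)
lexical = {(0, "a"): 0.6}     # lexical[(j, word)] = P(N^j -> word)


def is_normalized(binary, lexical, tol=1e-9):
    """Return True if the rule probabilities of each non-terminal sum to 1."""
    totals = defaultdict(float)
    for (j, _r, _s), prob in binary.items():
        totals[j] += prob
    for (j, _word), prob in lexical.items():
        totals[j] += prob
    return all(abs(total - 1.0) <= tol for total in totals.values())
```

Keeping binary and lexical rules in separate dictionaries mirrors the two rule shapes that Chomsky Normal Form allows.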
PFSA vs. PCFG
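• A PFSA can be seen as a special case of a PCFG:
– State → non-terminal
– Output symbol → terminal
– Arc → context-free rule
– Path → parse tree (only right-branching binary trees)
[Figure: the PFSA path S1 –a→ S2 –b→ S3 and the corresponding right-branching parse tree with rules S1 → a S2, S2 → b S3, S3 → ε.]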
PFSA and HMM
• Add a “Start” state and a transition from “Start” to any state in the HMM.
• Add a “Finish” state and a transition from any state in the HMM to “Finish”.
[Figure: Start → HMM → Finish]
The connection between two algorithms
• An HMM can (almost) be converted to a PFSA.
• A PFSA is a special case of a PCFG.
• Inside-outside is an algorithm for PCFGs.
⇒ The inside-outside algorithm will work for HMMs.
• Forward-backward is an algorithm for HMMs.
⇒ In fact, the inside-outside algorithm is the same as forward-backward when the PCFG is a PFSA.
Forward and backward probabilities
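[Figure: the state sequence X1 … Xt … Xn Xn+1 emits the observations O1 … On. The forward probability $\alpha_i(t)$ covers O1 … Ot-1 together with Xt = i; the backward probability $\beta_i(t)$ covers Ot … On given Xt = i.]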
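The slides do not spell out the forward and backward recursions, so the following is a minimal sketch under the usual arc-emission convention, where $\alpha_i(t) = P(o_1 \ldots o_{t-1}, X_t = i)$ and $\beta_i(t) = P(o_t \ldots o_T \mid X_t = i)$. The function names, the 0-based state numbering, and the toy parameters are assumptions made for illustration.

```python
# Hypothetical 2-state arc-emission HMM (illustrative numbers only).
pi = [0.6, 0.4]                           # pi[i] = P(X_1 = i)
A = [[0.7, 0.3], [0.4, 0.6]]              # A[i][j] = P(X_{t+1} = j | X_t = i)
B = [[[0.9, 0.1], [0.2, 0.8]],            # B[i][j][k] = P(O_t = k | X_t = i, X_{t+1} = j)
     [[0.5, 0.5], [0.3, 0.7]]]
obs = [0, 1, 0]                           # observation indices o_1..o_T


def forward(obs, pi, A, B):
    """alpha[t][i] = P(o_1..o_{t-1}, X_t = i) for t = 1..T+1 (row 0 unused)."""
    n, T = len(pi), len(obs)
    alpha = [[0.0] * n for _ in range(T + 2)]
    alpha[1] = list(pi)
    for t in range(1, T + 1):
        o = obs[t - 1]
        for j in range(n):
            alpha[t + 1][j] = sum(alpha[t][i] * A[i][j] * B[i][j][o] for i in range(n))
    return alpha


def backward(obs, A, B):
    """beta[t][i] = P(o_t..o_T | X_t = i) for t = 1..T+1 (row 0 unused)."""
    n, T = len(A), len(obs)
    beta = [[0.0] * n for _ in range(T + 2)]
    beta[T + 1] = [1.0] * n
    for t in range(T, 0, -1):
        o = obs[t - 1]
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[i][j][o] * beta[t + 1][j] for j in range(n))
    return beta
```

Under this convention the observation probability can be read off either chart: summing `alpha[T + 1]` or summing `pi[i] * beta[1][i]` over the states should give the same value.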
Backward/forward prob vs. Inside/outside prob
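[Figure: PFSA vs. PCFG, both with Xt = N^i. PFSA: the forward probability $\alpha_i(t)$ covers O1 … Ot-1 and the backward probability $\beta_i(t)$ covers Ot … On. PCFG: the outside probability $\alpha_i(t,l)$ covers the words outside Ot … Ol and the inside probability $\beta_i(t,l)$ covers Ot … Ol. Forward corresponds to outside; backward corresponds to inside.]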
Definitions
• Inside probability: total probability of generating the words $w_p \ldots w_q$ from non-terminal $N^j$:
$$\beta_j(p,q) = P(w_{pq} \mid N^j_{pq})$$
• Outside probability: total probability of beginning with the start symbol $N^1$ and generating $N^j_{pq}$ and all the words outside $w_p \ldots w_q$:
$$\alpha_j(p,q) = P(w_{1(p-1)}, N^j_{pq}, w_{(q+1)m})$$
• When $p > q$, $\alpha_j(p,q) = \beta_j(p,q) = 0$.
Calculating inside probability (CYK algorithm)
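Base case:
$$\beta_j(k,k) = P(N^j \to w_k)$$
Recursive case:
$$\beta_j(p,q) = \sum_{r,s} \sum_{d=p}^{q-1} P(N^j \to N^r N^s)\, \beta_r(p,d)\, \beta_s(d+1,q)$$
[Figure: $N^j$ spans $w_p \ldots w_q$, with children $N^r$ over $w_p \ldots w_d$ and $N^s$ over $w_{d+1} \ldots w_q$.]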
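The base and recursive cases can be sketched as a bottom-up chart fill. This is a minimal illustration and not the slides' own code: the function name `inside`, the dictionary layout of the grammar, and the one-rule toy grammar are assumptions, and non-terminal 0 stands in for $N^1$.

```python
# Hypothetical toy grammar: N^1 -> N^1 N^1 (0.4) | "a" (0.6); index 0 is N^1.
binary = {(0, 0, 0): 0.4}     # binary[(j, r, s)] = P(N^j -> N^r N^s)
lexical = {(0, "a"): 0.6}     # lexical[(j, word)] = P(N^j -> word)


def inside(words, binary, lexical, n):
    """Return beta with beta[j][p][q] = inside probability of N^j over w_p..w_q.

    Word positions p and q are 1-based, as on the slides; n is the number of
    non-terminals.
    """
    m = len(words)
    beta = [[[0.0] * (m + 2) for _ in range(m + 2)] for _ in range(n)]
    # Base case: beta_j(k, k) = P(N^j -> w_k)
    for k in range(1, m + 1):
        for j in range(n):
            beta[j][k][k] = lexical.get((j, words[k - 1]), 0.0)
    # Recursive case: fill the chart from short spans to long spans
    for span in range(2, m + 1):
        for p in range(1, m - span + 2):
            q = p + span - 1
            for (j, r, s), prob in binary.items():
                for d in range(p, q):      # split point, d = p..q-1
                    beta[j][p][q] += prob * beta[r][p][d] * beta[s][d + 1][q]
    return beta


beta = inside(["a", "a", "a"], binary, lexical, n=1)
```

For the three-word input, `beta[0][1][3]` should come out near 0.06912, because the sentence has two parse trees and each has probability 0.4² × 0.6³.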
Calculating outside probability (case 1)
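Contribution to $\alpha_j(p,q)$ when $N^j$ is the left child of its parent $N^f$, with sibling $N^g$ to its right:
$$\sum_{f,g} \sum_{e=q+1}^{m} \alpha_f(p,e)\, P(N^f \to N^j N^g)\, \beta_g(q+1,e)$$
[Figure: $N^1$ spans $w_1 \ldots w_m$; $N^f$ spans $w_p \ldots w_e$, with children $N^j$ over $w_p \ldots w_q$ and $N^g$ over $w_{q+1} \ldots w_e$.]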
Calculating outside probability (case 2)
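Contribution to $\alpha_j(p,q)$ when $N^j$ is the right child of its parent $N^f$, with sibling $N^g$ to its left:
$$\sum_{f,g} \sum_{e=1}^{p-1} \alpha_f(e,q)\, P(N^f \to N^g N^j)\, \beta_g(e,p-1)$$
[Figure: $N^1$ spans $w_1 \ldots w_m$; $N^f$ spans $w_e \ldots w_q$, with children $N^g$ over $w_e \ldots w_{p-1}$ and $N^j$ over $w_p \ldots w_q$.]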
Outside probability
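Base case:
$$\alpha_j(1,m) = \begin{cases} 1 & \text{if } j = 1 \\ 0 & \text{otherwise} \end{cases}$$
Recursive case (sum of case 1 and case 2):
$$\alpha_j(p,q) = \sum_{f,g} \sum_{e=q+1}^{m} \alpha_f(p,e)\, P(N^f \to N^j N^g)\, \beta_g(q+1,e) + \sum_{f,g} \sum_{e=1}^{p-1} \alpha_f(e,q)\, P(N^f \to N^g N^j)\, \beta_g(e,p-1)$$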
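A minimal sketch of the outside pass, working top-down over the same chart. It repeats a compact inside pass so that the block runs on its own; the function names, the grammar layout, and the toy grammar are illustrative assumptions, with non-terminal 0 standing in for $N^1$.

```python
# Hypothetical toy grammar: N^1 -> N^1 N^1 (0.4) | "a" (0.6); index 0 is N^1.
binary = {(0, 0, 0): 0.4}
lexical = {(0, "a"): 0.6}


def inside(words, binary, lexical, n):
    """beta[j][p][q] for 1-based word positions p <= q."""
    m = len(words)
    beta = [[[0.0] * (m + 2) for _ in range(m + 2)] for _ in range(n)]
    for k in range(1, m + 1):
        for j in range(n):
            beta[j][k][k] = lexical.get((j, words[k - 1]), 0.0)
    for span in range(2, m + 1):
        for p in range(1, m - span + 2):
            q = p + span - 1
            for (j, r, s), prob in binary.items():
                for d in range(p, q):
                    beta[j][p][q] += prob * beta[r][p][d] * beta[s][d + 1][q]
    return beta


def outside(words, binary, beta, n):
    """alpha[j][p][q] for 1-based word positions; needs the inside chart beta."""
    m = len(words)
    alpha = [[[0.0] * (m + 2) for _ in range(m + 2)] for _ in range(n)]
    alpha[0][1][m] = 1.0                   # base case: only N^1 covers the sentence
    # Parents cover longer spans, so visit spans from long to short.
    for span in range(m - 1, 0, -1):
        for p in range(1, m - span + 2):
            q = p + span - 1
            for (f, left, right), prob in binary.items():
                # Case 1: the node is the left child; its sibling covers q+1..e
                for e in range(q + 1, m + 1):
                    alpha[left][p][q] += alpha[f][p][e] * prob * beta[right][q + 1][e]
                # Case 2: the node is the right child; its sibling covers e..p-1
                for e in range(1, p):
                    alpha[right][p][q] += alpha[f][e][q] * prob * beta[left][e][p - 1]
    return alpha


words = ["a", "a", "a"]
beta = inside(words, binary, lexical, n=1)
alpha = outside(words, binary, beta, n=1)
```

Each rule is visited once per span and contributes to its left child through case 1 and to its right child through case 2, so a rule such as $N^f \to N^j N^j$ is counted in both roles.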
Probability of a sentence
$$P(w_{1m}) = \beta_1(1,m)$$
$$P(w_{1m}) = \sum_j \alpha_j(k,k)\, P(N^j \to w_k) \quad \text{for any } k$$
$$P(w_{1m}, N^j_{pq}) = \alpha_j(p,q)\, \beta_j(p,q)$$
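A short worked step, not on the original slide, showing why the sum over $j$ gives the sentence probability for any position $k$. It assumes the grammar is in Chomsky Normal Form, so every parse has exactly one pre-terminal node directly above $w_k$; summing the joint probability over the label of that node therefore recovers $P(w_{1m})$.

```latex
\begin{aligned}
P(w_{1m}) &= \sum_j P(w_{1m}, N^j_{kk}) \\
          &= \sum_j \alpha_j(k,k)\, \beta_j(k,k) \\
          &= \sum_j \alpha_j(k,k)\, P(N^j \to w_k)
\end{aligned}
```

The last line uses the inside base case $\beta_j(k,k) = P(N^j \to w_k)$.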
Recap so far
• Inside probability: bottom-up
• Outside probability: top-down using the same chart.
• Probability of a sentence can be calculated in many ways.
The probability of a binary rule is used
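For a fixed span $(p,q)$:
$$P(N^j_{pq}, N^j \to N^r N^s \mid w_{1m}) = \frac{\sum_{d=p}^{q-1} \alpha_j(p,q)\, P(N^j \to N^r N^s)\, \beta_r(p,d)\, \beta_s(d+1,q)}{P(w_{1m})}$$
Summing over all spans:
$$P(N^j, N^j \to N^r N^s \mid w_{1m}) = \sum_{p=1}^{m} \sum_{q=p}^{m} P(N^j_{pq}, N^j \to N^r N^s \mid w_{1m}) = \frac{\sum_{p=1}^{m} \sum_{q=p}^{m} \sum_{d=p}^{q-1} \alpha_j(p,q)\, P(N^j \to N^r N^s)\, \beta_r(p,d)\, \beta_s(d+1,q)}{P(w_{1m})} \qquad (1)$$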
The probability of Nj is used
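$$P(w_{1m}, N^j_{pq}) = \alpha_j(p,q)\, \beta_j(p,q)$$
For a fixed span $(p,q)$:
$$P(N^j_{pq} \mid w_{1m}) = \frac{P(N^j_{pq}, w_{1m})}{P(w_{1m})} = \frac{\alpha_j(p,q)\, \beta_j(p,q)}{P(w_{1m})}$$
Summing over all spans:
$$P(N^j \mid w_{1m}) = \sum_{p=1}^{m} \sum_{q=p}^{m} P(N^j_{pq} \mid w_{1m}) = \frac{\sum_{p=1}^{m} \sum_{q=p}^{m} \alpha_j(p,q)\, \beta_j(p,q)}{P(w_{1m})} \qquad (2)$$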
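Re-estimated probability of a binary rule, dividing (1) by (2):
$$P(N^j \to N^r N^s \mid w_{1m}) = \frac{P(N^j, N^j \to N^r N^s \mid w_{1m})}{P(N^j \mid w_{1m})} = \frac{(1)}{(2)} = \frac{\sum_{p=1}^{m} \sum_{q=p}^{m} \sum_{d=p}^{q-1} \alpha_j(p,q)\, P(N^j \to N^r N^s)\, \beta_r(p,d)\, \beta_s(d+1,q)}{\sum_{p=1}^{m} \sum_{q=p}^{m} \alpha_j(p,q)\, \beta_j(p,q)}$$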
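The probability that a unary rule is used
$$P(N^j \to w^k, N^j \text{ is used} \mid w_{1m}) = \frac{\sum_{h=1}^{m} \alpha_j(h,h)\, \beta_j(h,h)\, \delta(w_h, w^k)}{P(w_{1m})} \qquad (3)$$
Here $\delta(w_h, w^k)$ is 1 if the word at position $h$ is $w^k$ and 0 otherwise.
Re-estimated probability of a unary rule, dividing (3) by (2):
$$P(N^j \to w^k \mid N^j, w_{1m}) = \frac{P(N^j \to w^k, N^j \mid w_{1m})}{P(N^j \mid w_{1m})} = \frac{(3)}{(2)} = \frac{\sum_{h=1}^{m} \alpha_j(h,h)\, \beta_j(h,h)\, \delta(w_h, w^k)}{\sum_{p=1}^{m} \sum_{q=p}^{m} \alpha_j(p,q)\, \beta_j(p,q)}$$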
Multiple training sentences
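For sentence $W_i$ (with words $w_{1m}$):
$$h_i(j) = P(N^j \mid w_{1m}) = \frac{\sum_{p=1}^{m} \sum_{q=p}^{m} \alpha_j(p,q)\, \beta_j(p,q)}{P(w_{1m})} \qquad (2)$$
$$f_i(j,r,s) = P(N^j, N^j \to N^r N^s \mid w_{1m}) = \frac{\sum_{p=1}^{m} \sum_{q=p}^{m} \sum_{d=p}^{q-1} \alpha_j(p,q)\, P(N^j \to N^r N^s)\, \beta_r(p,d)\, \beta_s(d+1,q)}{P(w_{1m})} \qquad (1)$$
Update, summing over all training sentences:
$$P(N^j \to N^r N^s) = \frac{\sum_i f_i(j,r,s)}{\sum_i h_i(j)}$$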
Inner loop of the Inside-outside algorithm
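Given an input sequence $w_{1m}$ and the current rule probabilities:
1. Calculate inside probability:
• Base case:
$$\beta_j(k,k) = P(N^j \to w_k)$$
• Recursive case:
$$\beta_j(p,q) = \sum_{r,s} \sum_{d=p}^{q-1} P(N^j \to N^r N^s)\, \beta_r(p,d)\, \beta_s(d+1,q)$$
2. Calculate outside probability:
• Base case:
$$\alpha_j(1,m) = \begin{cases} 1 & \text{if } j = 1 \\ 0 & \text{otherwise} \end{cases}$$
• Recursive case:
$$\alpha_j(p,q) = \sum_{f,g} \sum_{e=q+1}^{m} \alpha_f(p,e)\, P(N^f \to N^j N^g)\, \beta_g(q+1,e) + \sum_{f,g} \sum_{e=1}^{p-1} \alpha_f(e,q)\, P(N^f \to N^g N^j)\, \beta_g(e,p-1)$$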
Inside-outside algorithm (cont)
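3. Collect the counts:
$$P(N^j, N^j \to N^r N^s \mid w_{1m}) = \frac{\sum_{p=1}^{m} \sum_{q=p}^{m} \sum_{d=p}^{q-1} \alpha_j(p,q)\, P(N^j \to N^r N^s)\, \beta_r(p,d)\, \beta_s(d+1,q)}{P(w_{1m})}$$
$$P(N^j \to w^k, N^j \text{ is used} \mid w_{1m}) = \frac{\sum_{h=1}^{m} \alpha_j(h,h)\, \beta_j(h,h)\, \delta(w_h, w^k)}{P(w_{1m})}$$
4. Normalize and update the parameters:
$$P(N^j \to N^r N^s) = \frac{Cnt(N^j \to N^r N^s)}{\sum_{r',s'} Cnt(N^j \to N^{r'} N^{s'})} = \frac{P(N^j, N^j \to N^r N^s \mid w_{1m})}{\sum_{r',s'} P(N^j, N^j \to N^{r'} N^{s'} \mid w_{1m})}$$
$$P(N^j \to w^k) = \frac{Cnt(N^j \to w^k)}{\sum_{k'} Cnt(N^j \to w^{k'})} = \frac{P(N^j \to w^k, N^j \text{ is used} \mid w_{1m})}{\sum_{k'} P(N^j \to w^{k'}, N^j \text{ is used} \mid w_{1m})}$$
Normalizing binary and lexical rules separately is valid only when $N^j$ expands by one kind of rule. If $N^j$ has both binary and lexical rules, divide both counts by the total expected count of $N^j$, $\sum_{r,s} Cnt(N^j \to N^r N^s) + \sum_k Cnt(N^j \to w^k)$, so that all rule probabilities of $N^j$ sum to 1.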
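The four steps can be sketched as one EM iteration over a list of training sentences. This is an illustrative sketch and not the slides' own code: all names are assumptions, non-terminal 0 stands in for $N^1$, and both kinds of rule are normalized by the expected number of times $N^j$ is used, which also covers non-terminals that have both binary and lexical rules.

```python
from collections import defaultdict


def inside(words, binary, lexical, n):
    m = len(words)
    beta = [[[0.0] * (m + 2) for _ in range(m + 2)] for _ in range(n)]
    for k in range(1, m + 1):
        for j in range(n):
            beta[j][k][k] = lexical.get((j, words[k - 1]), 0.0)
    for span in range(2, m + 1):
        for p in range(1, m - span + 2):
            q = p + span - 1
            for (j, r, s), prob in binary.items():
                for d in range(p, q):
                    beta[j][p][q] += prob * beta[r][p][d] * beta[s][d + 1][q]
    return beta


def outside(words, binary, beta, n):
    m = len(words)
    alpha = [[[0.0] * (m + 2) for _ in range(m + 2)] for _ in range(n)]
    alpha[0][1][m] = 1.0
    for span in range(m - 1, 0, -1):
        for p in range(1, m - span + 2):
            q = p + span - 1
            for (f, left, right), prob in binary.items():
                for e in range(q + 1, m + 1):
                    alpha[left][p][q] += alpha[f][p][e] * prob * beta[right][q + 1][e]
                for e in range(1, p):
                    alpha[right][p][q] += alpha[f][e][q] * prob * beta[left][e][p - 1]
    return alpha


def em_step(sentences, binary, lexical, n):
    """One inside-outside iteration; returns re-estimated (binary, lexical)."""
    bin_cnt = defaultdict(float)   # expected count of N^j -> N^r N^s
    lex_cnt = defaultdict(float)   # expected count of N^j -> word
    used = defaultdict(float)      # expected count of N^j being used
    for words in sentences:
        m = len(words)
        beta = inside(words, binary, lexical, n)            # step 1
        z = beta[0][1][m]                                   # P(w_1m)
        if z == 0.0:
            continue               # the current grammar cannot generate this sentence
        alpha = outside(words, binary, beta, n)             # step 2
        for p in range(1, m + 1):                           # step 3: collect counts
            for q in range(p, m + 1):
                for j in range(n):
                    used[j] += alpha[j][p][q] * beta[j][p][q] / z
                for (j, r, s), prob in binary.items():
                    for d in range(p, q):
                        bin_cnt[(j, r, s)] += (alpha[j][p][q] * prob
                                               * beta[r][p][d] * beta[s][d + 1][q]) / z
        for h in range(1, m + 1):
            for j in range(n):
                lex_cnt[(j, words[h - 1])] += alpha[j][h][h] * beta[j][h][h] / z
    # Step 4: normalize by the expected count of the left-hand side
    new_binary = {rule: c / used[rule[0]] for rule, c in bin_cnt.items() if c > 0}
    new_lexical = {rule: c / used[rule[0]] for rule, c in lex_cnt.items() if c > 0}
    return new_binary, new_lexical


# Hypothetical toy grammar: N^1 -> N^1 N^1 (0.4) | "a" (0.6)
new_binary, new_lexical = em_step([["a", "a"]], {(0, 0, 0): 0.4}, {(0, "a"): 0.6}, n=1)
```

On the single training sentence "a a", the only parse uses the binary rule once and the lexical rule twice, so the re-estimated probabilities should come out near 1/3 and 2/3.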
Relation to EM
• A PCFG is a PM (Product of Multinomials) model.
• The inside-outside algorithm is a special case of the EM algorithm for PM models.
• X (observed data): each data point is a sentence $w_{1m}$.
• Y (hidden data): parse tree Tr.
• Θ (parameters): $P(N^j \to N^r N^s)$ and $P(N^j \to w^k)$
Relation to EM (cont)
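Expected count of a binary rule:
$$count(N^j \to N^r N^s) = \sum_Y P(Y \mid X, \Theta) \cdot count(X, Y, N^j \to N^r N^s)$$
$$= \sum_{Tr} P(Tr \mid w_{1m}, \Theta) \cdot count(Tr, w_{1m}, N^j \to N^r N^s)$$
$$= \sum_{p=1}^{m} \sum_{q=p}^{m} P(N^j_{pq}, N^j \to N^r N^s \mid w_{1m}, \Theta)$$
$$= P(N^j, N^j \to N^r N^s \mid w_{1m}, \Theta)$$
Expected count of a unary rule:
$$count(N^j \to w^k) = \sum_Y P(Y \mid X, \Theta) \cdot count(X, Y, N^j \to w^k)$$
$$= \sum_{Tr} P(Tr \mid w_{1m}, \Theta) \cdot count(w_{1m}, Tr, N^j \to w^k)$$
$$= P(N^j \to w^k, N^j \text{ is used} \mid w_{1m}, \Theta)$$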
Summary
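• HMM (forward-backward):
– Forward probability $\alpha_i(t)$ and backward probability $\beta_j(t+1)$ around the arc from $X_t$ to $X_{t+1}$ that emits $O_t$
– $a_{ij} = P(X_{t+1} = j \mid X_t = i)$
– $b_{ijk} = P(O_t = w^k \mid X_t = i, X_{t+1} = j)$
• PCFG (inside-outside):
– Outside probability $\alpha_j(p,q)$ and inside probability $\beta_j(p,q)$ for $N^j$ spanning $w_p \ldots w_q$ below $N^1$, with children $N^r$ over $w_p \ldots w_d$ and $N^s$ over $w_{d+1} \ldots w_q$
– $P(N^j \to N^r N^s)$
– $P(N^j \to w^k)$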
Summary (cont)
• Topology is known:
– (states, arcs, output symbols) in HMM
– (non-terminals, rules, terminals) in PCFG
• Probabilities of arcs/rules are unknown.
• Estimating probs using EM (introducing hidden data Y)
Converting HMM to PCFG
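• Given an HMM = (S, Σ, π, A, B), create a PCFG = (S1, Σ1, S0, R, P) as follows:
– S1 = $\{Start\} \cup \{N^i, D_{ij}, D_0 \mid i, j \in [1, N]\}$
– Σ1 = $\Sigma \cup \{BOS, EOS\}$
– S0 = Start
– R = $\{N^i \to D_{ij} N^j \mid i, j \in [1, N]\} \cup \{Start \to D_0 N^i\} \cup \{D_{ij} \to w^k,\; D_0 \to BOS,\; N^i \to EOS\}$
– P:
$$P(N^i \to D_{ij} N^j) = a_{ij}$$
$$P(Start \to D_0 N^i) = \pi_i$$
$$P(D_{ij} \to w^k) = b_{ijk}$$
$$P(D_0 \to BOS) = 1$$
$$P(N^i \to EOS) = 1$$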
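The construction can be sketched as a function that lists the rules with their probabilities. This is a hedged illustration: the function name `hmm_to_pcfg`, the string names for non-terminals, and the toy HMM are assumptions, and the ordering Start → D0 N^i is one reading of the slide. As on the slide, the rules for each N^i are not renormalized after adding N^i → EOS, which is one reason the conversion is only "almost" exact.

```python
def hmm_to_pcfg(pi, A, B, vocab):
    """Build the rule table of the PCFG described above.

    pi[i], A[i][j], and B[i][j][k] are indexed from 0 in the code; the rule
    names use 1-based state numbers (N1, D1_2, ...) to match the slides.
    Returns a dict mapping (lhs, rhs_tuple) to a probability.
    """
    n = len(pi)
    rules = {("D0", ("BOS",)): 1.0}                          # D_0 -> BOS
    for i in range(n):
        ni = f"N{i + 1}"
        rules[("Start", ("D0", ni))] = pi[i]                 # Start -> D_0 N^i
        rules[(ni, ("EOS",))] = 1.0                          # N^i -> EOS
        for j in range(n):
            dij = f"D{i + 1}_{j + 1}"
            rules[(ni, (dij, f"N{j + 1}"))] = A[i][j]        # N^i -> D_ij N^j
            for k, word in enumerate(vocab):
                rules[(dij, (word,))] = B[i][j][k]           # D_ij -> w^k
    return rules


# Hypothetical 2-state HMM with a 2-word vocabulary (illustrative numbers only).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.3, 0.7]]]
rules = hmm_to_pcfg(pi, A, B, vocab=["x", "y"])
```

Every binary rule has the form N^i → D_ij N^j, so all parse trees of this grammar are right-branching, which is what makes the PFSA view possible.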
Outside probability
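• Outside prob for $N^j$: start from $\alpha_j(p,q)$. Setting q = T gives $\alpha_j(p,T)$; renaming (j,i), (p,t) gives $\alpha_i(t,T) = \alpha_i(t)$, the forward probability.
• Outside prob for $D_{ij}$ (written with index $i\_j$): start from $\alpha_{i\_j}(p,q)$. Setting q = p gives $\alpha_{i\_j}(p,p)$; renaming (p,t) gives $\alpha_{i\_j}(t,t) = \alpha_i(t)\, a_{ij}\, \beta_j(t+1)$.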
Inside probability
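• Inside prob for $N^j$: start from $\beta_j(p,q)$. Setting q = T gives $\beta_j(p,T)$; renaming (j,i), (p,t) gives $\beta_i(t,T) = \beta_i(t)$, the backward probability.
• Inside prob for $D_{ij}$ (written with index $i\_j$): start from $\beta_{i\_j}(p,q)$. Setting q = p gives $\beta_{i\_j}(p,p)$; renaming (p,t) gives $\beta_{i\_j}(t,t) = b_{ij o_t}$.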
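Estimating $P(N^i_{tT}, N^i \to D_{ij} N^j \mid O_{1T})$
PCFG:
$$P(N^j_{pq}, N^j \to N^r N^s \mid w_{1m}) = \frac{\sum_{d=p}^{q-1} \alpha_j(p,q)\, P(N^j \to N^r N^s)\, \beta_r(p,d)\, \beta_s(d+1,q)}{P(w_{1m})}$$
Renaming: (j,i), (s,j), (p,t), (m,T)
$$P(N^i_{tq}, N^i \to N^r N^j \mid O_{1T}) = \frac{\sum_{d=t}^{q-1} \alpha_i(t,q)\, P(N^i \to N^r N^j)\, \beta_r(t,d)\, \beta_j(d+1,q)}{P(O_{1T})}$$
With $N^r = D_{ij}$, $q = T$, $d = t$:
$$P(N^i_{tT}, N^i \to D_{ij} N^j \mid O_{1T}) = \frac{\alpha_i(t,T)\, P(N^i \to D_{ij} N^j)\, \beta_{i\_j}(t,t)\, \beta_j(t+1,T)}{P(O_{1T})} = \frac{\alpha_i(t)\, a_{ij}\, b_{ij o_t}\, \beta_j(t+1)}{P(O_{1T})}$$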
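Estimating $P(N^i, N^i \to D_{ij} N^j \mid O_{1T})$
PCFG:
$$P(N^j, N^j \to N^r N^s \mid w_{1m}) = \sum_{p=1}^{m} \sum_{q=p}^{m} P(N^j_{pq}, N^j \to N^r N^s \mid w_{1m})$$
Renaming: (j,i), (s,j), (p,t), (m,T)
$$P(N^i, N^i \to N^r N^j \mid O_{1T}) = \sum_{t=1}^{T} \sum_{q=t}^{T} P(N^i_{tq}, N^i \to N^r N^j \mid O_{1T})$$
With $N^r = D_{ij}$, $q = T$, $d = t$:
$$P(N^i, N^i \to D_{ij} N^j \mid O_{1T}) = \sum_{t=1}^{T} P(N^i_{tT}, N^i \to D_{ij} N^j \mid O_{1T}) = \sum_{t=1}^{T} \xi_{ij}(t)$$
where $\xi_{ij}(t) = \alpha_i(t)\, a_{ij}\, b_{ij o_t}\, \beta_j(t+1) / P(O_{1T})$.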
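Estimating $P(N^i \mid O_{1T})$
PCFG:
$$P(N^j \mid w_{1m}) = \frac{\sum_{p=1}^{m} \sum_{q=p}^{m} \alpha_j(p,q)\, \beta_j(p,q)}{P(w_{1m})}$$
Renaming: (j,i), (s,j), (p,t), (m,T)
$$P(N^i \mid O_{1T}) = \frac{\sum_{t=1}^{T} \sum_{q=t}^{T} \alpha_i(t,q)\, \beta_i(t,q)}{P(O_{1T})}$$
With $q = T$:
$$P(N^i \mid O_{1T}) = \frac{\sum_{t=1}^{T} \alpha_i(t)\, \beta_i(t)}{P(O_{1T})} = \sum_{t=1}^{T} \gamma_i(t)$$
where $\gamma_i(t) = \alpha_i(t)\, \beta_i(t) / P(O_{1T})$.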
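Calculating $P(N^i \to N^r N^j \mid O_{1T})$
PCFG:
$$P(N^j \to N^r N^s \mid w_{1m}) = \frac{P(N^j, N^j \to N^r N^s \mid w_{1m})}{P(N^j \mid w_{1m})}$$
Renaming: (j,i), (s,j), (w,o), (m,T)
$$P(N^i \to N^r N^j \mid O_{1T}) = \frac{P(N^i, N^i \to N^r N^j \mid O_{1T})}{P(N^i \mid O_{1T})} = \frac{\sum_{t=1}^{T} \xi_{ij}(t)}{\sum_{t=1}^{T} \gamma_i(t)} = a_{ij}$$
where $\xi_{ij}(t) = \alpha_i(t)\, a_{ij}\, b_{ij o_t}\, \beta_j(t+1) / P(O_{1T})$ and $\gamma_i(t) = \alpha_i(t)\, \beta_i(t) / P(O_{1T})$, so the result is the forward-backward re-estimate of $a_{ij}$.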
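The ratio of summed $\xi_{ij}(t)$ to summed $\gamma_i(t)$ can be sketched directly with forward and backward charts. This is an illustrative sketch under the arc-emission convention, with $\xi_{ij}(t) = \alpha_i(t)\, a_{ij}\, b_{ij o_t}\, \beta_j(t+1) / P(O_{1T})$ and $\gamma_i(t) = \alpha_i(t)\, \beta_i(t) / P(O_{1T})$; the function names, the 0-based state numbering, and the toy parameters are assumptions, and every state is assumed to have a non-zero expected count.

```python
def forward(obs, pi, A, B):
    """alpha[t][i] = P(o_1..o_{t-1}, X_t = i) for t = 1..T+1 (row 0 unused)."""
    n, T = len(pi), len(obs)
    alpha = [[0.0] * n for _ in range(T + 2)]
    alpha[1] = list(pi)
    for t in range(1, T + 1):
        o = obs[t - 1]
        for j in range(n):
            alpha[t + 1][j] = sum(alpha[t][i] * A[i][j] * B[i][j][o] for i in range(n))
    return alpha


def backward(obs, A, B):
    """beta[t][i] = P(o_t..o_T | X_t = i) for t = 1..T+1 (row 0 unused)."""
    n, T = len(A), len(obs)
    beta = [[0.0] * n for _ in range(T + 2)]
    beta[T + 1] = [1.0] * n
    for t in range(T, 0, -1):
        o = obs[t - 1]
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[i][j][o] * beta[t + 1][j] for j in range(n))
    return beta


def reestimate_transitions(obs, pi, A, B):
    """Return new a_ij = sum_t xi_ij(t) / sum_t gamma_i(t)."""
    n, T = len(pi), len(obs)
    alpha, beta = forward(obs, pi, A, B), backward(obs, A, B)
    p_obs = sum(alpha[T + 1])                       # P(O_1T)
    xi_sum = [[0.0] * n for _ in range(n)]
    gamma_sum = [0.0] * n
    for t in range(1, T + 1):
        o = obs[t - 1]
        for i in range(n):
            gamma_sum[i] += alpha[t][i] * beta[t][i] / p_obs
            for j in range(n):
                xi_sum[i][j] += alpha[t][i] * A[i][j] * B[i][j][o] * beta[t + 1][j] / p_obs
    # Assumes gamma_sum[i] > 0 for every state i.
    return [[xi_sum[i][j] / gamma_sum[i] for j in range(n)] for i in range(n)]


# Hypothetical 2-state HMM (illustrative numbers only).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.3, 0.7]]]
new_A = reestimate_transitions([0, 1, 0], pi, A, B)
```

Summing $\xi_{ij}(t)$ over $j$ gives $\gamma_i(t)$, so each row of `new_A` should sum to 1.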
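Calculating $P(D_{i\_j} \to w^k \mid O_{1T})$
PCFG:
$$P(N^j \to w^k \mid w_{1m}) = \frac{\sum_{h=1}^{m} \alpha_j(h,h)\, \beta_j(h,h)\, \delta(w_h, w^k)}{\sum_{p=1}^{m} \sum_{q=p}^{m} \alpha_j(p,q)\, \beta_j(p,q)}$$
Renaming: (j,i_j), (s,j), (p,t), (h,t), (m,T), (w,O), (N,D)
$$P(D_{i\_j} \to w^k \mid O_{1T}) = \frac{\sum_{t=1}^{T} \alpha_{i\_j}(t,t)\, \beta_{i\_j}(t,t)\, \delta(O_t, w^k)}{\sum_{t=1}^{T} \sum_{q=t}^{T} \alpha_{i\_j}(t,q)\, \beta_{i\_j}(t,q)}$$
With $q = t$:
$$\alpha_{i\_j}(t,t) = \alpha_i(t)\, a_{ij}\, \beta_j(t+1)$$
$$\beta_{i\_j}(t,t) = b_{ij o_t}$$
Therefore:
$$P(D_{i\_j} \to w^k \mid O_{1T}) = \frac{\sum_{t=1}^{T} \alpha_i(t)\, a_{ij}\, b_{ij o_t}\, \beta_j(t+1)\, \delta(O_t, w^k)}{\sum_{t=1}^{T} \alpha_i(t)\, a_{ij}\, b_{ij o_t}\, \beta_j(t+1)} = \frac{\sum_{t=1}^{T} \xi_{ij}(t)\, \delta(O_t, w^k)}{\sum_{t=1}^{T} \xi_{ij}(t)}$$
where $\xi_{ij}(t) = \alpha_i(t)\, a_{ij}\, b_{ij o_t}\, \beta_j(t+1) / P(O_{1T})$; the factor $1 / P(O_{1T})$ cancels between numerator and denominator.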