• Hidden Markov Models (HMMs)
– probabilistic models for learning patterns in sequences (e.g. DNA, speech, weather, cards...)
(figure: 2nd-order Markov model)
• an observable Markov model
– directly get the sequence of states
– p(s1,s2,...,sn|λ) = p(s1) · Πi=2..n p(si|si-1)
– (why I don’t like the Urn example in the book)
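A minimal numeric sketch of this product for a hypothetical two-state observable chain (the states, initial distribution, and transition matrix are all invented for illustration):

import numpy as np

# Hypothetical 2-state observable Markov chain: states 0 and 1.
pi = np.array([0.6, 0.4])              # p(s1): initial state probabilities
A  = np.array([[0.7, 0.3],             # A[i, j] = p(s_next = j | s_current = i)
               [0.2, 0.8]])

def sequence_probability(states):
    # p(s1, ..., sn | lambda) = p(s1) * prod_{i=2..n} p(s_i | s_{i-1})
    p = pi[states[0]]
    for prev, cur in zip(states[:-1], states[1:]):
        p *= A[prev, cur]
    return p

print(sequence_probability([0, 0, 1, 1]))   # 0.6 * 0.7 * 0.3 * 0.8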
• Hidden Markov model
– only observe the sequence of symbols generated by the states
– for each state, there is a probability distribution over a finite set of symbols (emission probabilities) – see the toy parameter sketch after this example
– example: think of a soda machine
• observations: messages on the display (“insert 20 cents more”), output a can, give change
• states: coins inserted so far add up to N cents...
• state transitions are determined by the coins input
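To make the parameters π, A, B concrete, here is a minimal sketch for a hypothetical 2-state, 3-symbol HMM (all numbers are invented for illustration; this is not the soda machine model). The later code sketches in these notes reuse these toy arrays:

import numpy as np

# Hypothetical HMM with N=2 hidden states and M=3 observable symbols.
pi = np.array([0.5, 0.5])              # pi[i]   = p(q1 = S_i)
A  = np.array([[0.9, 0.1],             # A[i, j] = p(q_{t+1} = S_j | q_t = S_i)
               [0.4, 0.6]])
B  = np.array([[0.7, 0.2, 0.1],        # B[i, m] = p(o_t = v_m | q_t = S_i)
               [0.1, 0.3, 0.6]])       # each row is one state's emission distribution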
• tasks
1) given a sequence, compute the probability it came from one of a set of models (e.g. most likely phoneme) – classification: compute p(O|λ)
2) infer the most likely sequence of states underlying a sequence of symbols: find Q* such that p(Q*|O,λ) = maxQ p(Q|O,λ)
3) train the HMM by learning the parameters (transition and emission probabilities) from a set of examples: given seqs X, find λ* such that p(X|λ*) = maxλ p(X|λ)
• given an observation sequence O = o1...oT
– if we also knew the state seq Q = q1..qT, then we could easily calculate p(O|Q,λ)
– joint probability: p(O,Q|λ) = p(q1) · Πi=2..T p(qi|qi-1) · Πi=1..T p(oi|qi)
– could calculate p(O|λ) by marginalization: p(O|λ) = ΣQ p(O,Q|λ)
– intractable as written: have to sum over all N^T possible state sequences Q (a brute-force version is sketched after this list)
– the forward–backward algorithm is a recursive procedure that solves this efficiently (via dynamic programming)
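The marginalization can be written directly as a brute-force sum; the sketch below does exactly that for the toy 2-state, 3-symbol model (invented numbers) and is feasible only for tiny T, since it enumerates all N^T state sequences:

import itertools
import numpy as np

# Toy model (invented numbers): 2 states, 3 symbols.
pi = np.array([0.5, 0.5])
A  = np.array([[0.9, 0.1], [0.4, 0.6]])
B  = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])

def brute_force_likelihood(O):
    # p(O|lambda) = sum over all N^T state sequences Q of p(O, Q | lambda)
    N, T = A.shape[0], len(O)
    total = 0.0
    for Q in itertools.product(range(N), repeat=T):     # all N^T sequences
        p = pi[Q[0]] * B[Q[0], O[0]]                    # p(q1) * p(o1|q1)
        for t in range(1, T):
            p *= A[Q[t-1], Q[t]] * B[Q[t], O[t]]        # p(qt|qt-1) * p(ot|qt)
        total += p
    return total

print(brute_force_likelihood([0, 2, 1]))

The forward–backward recursion below computes the same quantity in O(N²T).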
• Forward variable:
Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0)
αt(i) is the probability of observing the prefix o1..ot and ending in state qi

αt(i) ≡ P(O1...Ot, qt = Si | λ)

Initialization: α1(i) = πi bi(O1)
Recursion: αt+1(j) = [ Σi=1..N αt(i) aij ] bj(Ot+1)
P(O|λ) = Σi=1..N αT(i)
• Backward variable:
βt(i) is the probability of being in state qi at time t and observing the suffix ot+1..oT

βt(i) ≡ P(Ot+1...OT | qt = Si, λ)

Initialization: βT(i) = 1
Recursion: βt(i) = Σj=1..N aij bj(Ot+1) βt+1(j)
Forward–backward algorithm, O(N²T)
forward pass: for each time step t = 1..T, calculate αt(i) for every state i by summing over all predecessor states j
reverse pass: for each time step t = T..1, calculate βt(i) for every state i by summing over all successor states j
function ForwardBackward(O, S, π, A, B) : returns p(O|π,A,B)
  for each state si do
    α1(i) ← πi · Bi(O1)
  end for
  for t ← 2,3,...,T do
    for each state sj do
      αt(j) ← Σk ( αt-1(k) · Akj · Bj(Ot) )
    end for
  end for
  // β is not needed for the output, but is often computed for other purposes (e.g. γ, Baum-Welch)
  for each state si do
    βT(i) ← 1
  end for
  for t ← T-1,...,1 do
    for each state sj do
      βt(j) ← Σk ( Ajk · Bk(Ot+1) · βt+1(k) )
    end for
  end for
  return Σi αT(i)
end function
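Here is a runnable NumPy sketch of the same procedure (my own illustration, not the book's code; it mirrors the pseudocode above and uses the invented toy parameters from earlier):

import numpy as np

def forward_backward(O, pi, A, B):
    # returns p(O|lambda) plus the full alpha and beta tables
    N, T = A.shape[0], len(O)
    alpha = np.zeros((T, N))
    beta  = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]                       # alpha_1(i) = pi_i * b_i(O_1)
    for t in range(1, T):
        alpha[t] = (alpha[t-1] @ A) * B[:, O[t]]     # sum over predecessor states
    beta[T-1] = 1.0                                  # beta_T(i) = 1
    for t in range(T-2, -1, -1):
        beta[t] = A @ (B[:, O[t+1]] * beta[t+1])     # sum over successor states
    return alpha[T-1].sum(), alpha, beta             # p(O|lambda) = sum_i alpha_T(i)

pi = np.array([0.5, 0.5])
A  = np.array([[0.9, 0.1], [0.4, 0.6]])
B  = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])
likelihood, alpha, beta = forward_backward([0, 2, 1], pi, A, B)
print(likelihood)    # matches the brute-force marginalization above

For long sequences these probabilities underflow, so a practical implementation would scale each αt/βt or work in log space.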
γt(i) ≡ P(qt = Si | O, λ) = αt(i) βt(i) / Σj=1..N αt(j) βt(j)

Choose the state that has the highest probability, for each time step:
qt* = arg maxi γt(i)

No! The individually most likely states chosen this way need not form the most likely (or even a valid) state sequence – hence Viterbi’s algorithm below.
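A small sketch of this per-step decoding (it takes the alpha and beta tables, e.g. those returned by the forward_backward sketch above, normalizes alpha*beta at each time step, and picks the argmax):

import numpy as np

def posterior_decode(alpha, beta):
    # gamma_t(i) = alpha_t(i)*beta_t(i) / sum_j alpha_t(j)*beta_t(j)
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)     # normalize each time step
    return gamma, gamma.argmax(axis=1)            # per-step best states; may not be a valid path

# e.g.: gamma, states = posterior_decode(alpha, beta)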
Viterbi’s Algorithm
δt(i) ≡ maxq1q2 ∙∙∙ qt-1 p(q1q2∙∙∙qt-1,qt =Si,O1∙∙∙Ot | λ)
• Initialization: δ1(i) = πibi(O1), ψ1(i) = 0
• Recursion: δt(j) = [ maxi δt-1(i)aij ] bj(Ot), ψt(j) = argmaxi δt-1(i)aij
– note: I think the book has wrong formula for ψt(j)
• Termination: p* = maxi δT(i), qT* = argmaxi δT(i)
• Path backtracking: qt* = ψt+1(qt+1*), t = T-1, T-2, ..., 1
function VITERBI(O, S, π, A, B) : returns state sequence q1*..qT*
  for each state si do
    δ1(i) ← πi · Bi(O1)
    ψ1(i) ← 0
  end for
  for t ← 2,3,...,T do
    for each state sj do
      δt(j) ← maxk ( δt-1(k) · Akj · Bj(Ot) )
      ψt(j) ← argmaxk ( δt-1(k) · Akj · Bj(Ot) )
    end for
  end for
  // traceback, extract sequence of states
  p* ← maxi δT(i)
  qT* ← argmaxi δT(i)
  for t ← T-1, T-2, ..., 1 do
    qt* ← ψt+1(qt+1*)
  end for
  return q1*..qT*
end function
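A runnable NumPy sketch of Viterbi (again my own illustration with the invented toy parameters; plain probabilities are used here, but log probabilities are preferable for long sequences):

import numpy as np

def viterbi(O, pi, A, B):
    # returns the most likely state sequence q1*..qT* and its probability p*
    N, T = A.shape[0], len(O)
    delta = np.zeros((T, N))
    psi   = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, O[0]]                     # delta_1(i) = pi_i * b_i(O_1)
    for t in range(1, T):
        cand = delta[t-1][:, None] * A             # cand[i, j] = delta_{t-1}(i) * a_ij
        psi[t]   = cand.argmax(axis=0)             # best predecessor for each state j
        delta[t] = cand.max(axis=0) * B[:, O[t]]
    q = np.zeros(T, dtype=int)                     # traceback
    q[T-1] = delta[T-1].argmax()
    p_star = delta[T-1].max()
    for t in range(T-2, -1, -1):
        q[t] = psi[t+1][q[t+1]]
    return q, p_star

pi = np.array([0.5, 0.5])
A  = np.array([[0.9, 0.1], [0.4, 0.6]])
B  = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])
print(viterbi([0, 2, 1], pi, A, B))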
Baum-Welch (EM) algorithm

learn the model parameters (transition probabilities aij and emission probabilities bj(m)) with highest likelihood for a given set of training examples

define ξt(i,j) as the probability of being in Si at time t and Sj at time t+1, given the sequence of observations O:

ξt(i,j) ≡ P(qt = Si, qt+1 = Sj | O, λ) = αt(i) aij bj(Ot+1) βt+1(j) / Σk Σl αt(k) akl bl(Ot+1) βt+1(l)

define latent variables zt^i and zt^ij as indicators of which states a sequence passes through at each time step:

zt^i = 1 if qt = Si, and 0 otherwise
zt^ij = 1 if qt = Si and qt+1 = Sj, and 0 otherwise
E-step: E[zt^i] = γt(i), E[zt^ij] = ξt(i,j)

M-step (over K training sequences, sequence k of length Tk):

âij = [ Σk=1..K Σt=1..Tk-1 ξt^k(i,j) ] / [ Σk=1..K Σt=1..Tk-1 γt^k(i) ]

b̂j(m) = [ Σk=1..K Σt=1..Tk γt^k(j) · 1(Ot^k = vm) ] / [ Σk=1..K Σt=1..Tk γt^k(j) ]

π̂i = [ Σk=1..K γ1^k(i) ] / K
recall γt(i) = αt(i)βt(i) / Σj αt(j)βt(j), the probability of being in state i at time t
the numerator of âij is the expected number of transitions from Si to Sj
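A compact NumPy sketch of one Baum-Welch iteration for a single training sequence (my own illustration of the E-step/M-step above, not the book's code; the toy parameters are invented, and a real implementation would iterate to convergence, pool counts over K sequences, and use scaling or log space):

import numpy as np

def baum_welch_step(O, pi, A, B):
    # one EM update of (pi, A, B) from a single observation sequence O
    N, M, T = A.shape[0], B.shape[1], len(O)
    # forward and backward passes
    alpha = np.zeros((T, N)); beta = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t-1] @ A) * B[:, O[t]]
    beta[T-1] = 1.0
    for t in range(T-2, -1, -1):
        beta[t] = A @ (B[:, O[t+1]] * beta[t+1])
    # E-step: gamma_t(i) and xi_t(i,j)
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    xi = np.zeros((T-1, N, N))
    for t in range(T-1):
        x = alpha[t][:, None] * A * (B[:, O[t+1]] * beta[t+1])[None, :]
        xi[t] = x / x.sum()
    # M-step: re-estimate parameters from expected counts
    new_pi = gamma[0]
    new_A  = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]   # expected transitions / expected visits
    new_B  = np.zeros_like(B)
    for m in range(M):
        new_B[:, m] = gamma[np.array(O) == m].sum(axis=0)       # expected emissions of symbol v_m
    new_B /= gamma.sum(axis=0)[:, None]
    return new_pi, new_A, new_B

pi = np.array([0.5, 0.5])
A  = np.array([[0.9, 0.1], [0.4, 0.6]])
B  = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])
print(baum_welch_step([0, 2, 1, 1, 0], pi, A, B))

With K training sequences, the numerators and denominators in the M-step are summed over all sequences before dividing, exactly as in the formulas above.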