
Page 1:

• Hidden Markov Models (HMMs) – probabilistic models for learning patterns in sequences (e.g. DNA, speech, weather, cards...) (2nd-order model)

Page 2:

• an observable Markov model
  – directly get the sequence of states
  – p(s1, s2, ..., sn | λ) = p(s1) · ∏i=2..n p(si | si-1)
  – (why I don't like the Urn example in the book)

• Hidden Markov model
  – only observe the sequence of symbols generated by states
  – for each state, there is a probability distribution over a finite set of symbols (emission probabilities)
  – example: think of a soda machine
    • observations: message on display ("insert 20 cents more"), output can, give change
    • states: coins inserted so far add up to N cents...
    • state transitions are determined by coins input
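Before the algorithms, it may help to see what the parameters λ = (A, B, π) look like in code. Below is a minimal Python/NumPy sketch; the two-state weather model and every name in it (states, symbols, pi, A, B, O) are illustrative assumptions, not from the slides. Later snippets reuse these definitions.

import numpy as np

# hypothetical toy HMM: 2 hidden states, 3 observable symbols
states  = ["Rainy", "Sunny"]          # hidden states S1, S2
symbols = ["walk", "shop", "clean"]   # observable symbols v1..v3

pi = np.array([0.6, 0.4])             # pi[i]  = p(q1 = Si)
A  = np.array([[0.7, 0.3],            # A[i,j] = aij = p(q_t+1 = Sj | q_t = Si)
               [0.4, 0.6]])
B  = np.array([[0.1, 0.4, 0.5],       # B[j,m] = bj(vm) = p(o_t = vm | q_t = Sj)
               [0.6, 0.3, 0.1]])

O = [0, 2, 1]                         # an observation sequence, as indices into symbols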

Page 3:

• tasks
  1) given a sequence, compute the probability it came from one of a set of models (e.g. most likely phoneme) – classification
  2) infer the most likely sequence of states underlying a sequence of symbols
     find Q* such that: Q* = argmaxQ p(Q | O, λ)
  3) train the HMM by learning the parameters λ (transition and emission probabilities) from a set of examples
     given seqs X, find λ* such that: λ* = argmaxλ p(X | λ)

Page 4:

• given an observation sequence O = o1...oT
  – if we also knew the state sequence Q = q1..qT, then we could easily calculate p(O|Q,λ)
  – joint probability:
    p(O,Q|λ) = p(q1) · ∏i=2..T p(qi|qi-1) · ∏i=1..T p(oi|qi)
  – could calculate p(O|λ) by marginalization:
    p(O|λ) = ΣQ p(O,Q|λ)
  – intractable: have to sum over all possible sequences Q
  – the forward-backward algorithm is a recursive procedure that solves this efficiently (via dynamic programming)
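To see why the plain marginalization is intractable, here is that sum written out directly (a hypothetical helper reusing the toy parameters above; fine for tiny T, hopeless for realistic sequence lengths, since it enumerates all N^T paths):

from itertools import product
import numpy as np

def p_obs_brute_force(O, pi, A, B):
    """p(O|lambda) by summing p(O,Q|lambda) over every state sequence Q."""
    N, T = len(pi), len(O)
    total = 0.0
    for Q in product(range(N), repeat=T):    # N**T terms: the intractable part
        p = pi[Q[0]] * B[Q[0], O[0]]         # p(q1) * p(o1|q1)
        for t in range(1, T):
            p *= A[Q[t-1], Q[t]] * B[Q[t], O[t]]
        total += p
    return total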

Page 5:

• Forward variable:

αt(i) ≡ P(O1···Ot, qt = Si | λ) – the probability of observing the prefix o1..ot and ending in state qi

Initialization: α1(i) = πi bi(O1)
Recursion: αt+1(j) = [ Σi=1..N αt(i) aij ] bj(Ot+1)
Termination: P(O|λ) = Σi=1..N αT(i)

(Lecture Notes for E. Alpaydın, Introduction to Machine Learning 2e, © 2010 The MIT Press, V1.0)
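A minimal NumPy sketch of this recursion, reusing the hypothetical toy parameters from earlier (a real implementation would rescale or work in log space to avoid underflow on long sequences):

import numpy as np

def forward(O, pi, A, B):
    """alpha[t, i] = P(o1..o_{t+1}, q_{t+1} = S_i | lambda), with 0-based t."""
    T, N = len(O), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]                    # initialization: pi_i * b_i(O1)
    for t in range(1, T):
        alpha[t] = (alpha[t-1] @ A) * B[:, O[t]]  # sum over predecessor states
    return alpha                                  # P(O|lambda) = alpha[-1].sum()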

Page 6:

• Backward variable:


βt(i) ≡ P(Ot+1···OT | qt = Si, λ) – the probability of observing the suffix ot+1..oT, given being in state qi at time t

Initialization: βT(i) = 1
Recursion: βt(i) = Σj=1..N aij bj(Ot+1) βt+1(j)
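And the matching backward sketch, under the same toy-model assumptions:

import numpy as np

def backward(O, pi, A, B):
    """beta[t, i] = P(o_{t+2}..o_T | q_{t+1} = S_i, lambda), with 0-based t."""
    T, N = len(O), len(pi)
    beta = np.ones((T, N))                        # initialization: beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t+1]] * beta[t+1])  # sum over successor states
    return beta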

Page 7:

Forward-backward algorithm, O(N²T)

forward pass:
  for each time step i = 1..T
    calculate α(i) by summing over all predecessor states j

reverse pass:
  for each time step i = T..1
    calculate β(i) by summing over all successor states j

Page 8:

function ForwardBackward(O, S, π, A, B) : returns p(O|π,A,B)
  for each state si do
    α1(i) ← πi · Bi(O1)
  end for
  for t ← 2, 3, ..., T do
    for each state sj do
      αt(j) ← Σk (αt-1(k) · Akj · Bj(Ot))
    end for
  end for
  // β is not needed for the output, but is often computed for other purposes
  for each state si do
    βT(i) ← 1
  end for
  for t ← T-1, ..., 1 do
    for each state sj do
      βt(j) ← Σk (Ajk · Bk(Ot+1) · βt+1(k))
    end for
  end for
  return Σi αT(i)
end function
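As a quick sanity check on the sketches so far (all names are the hypothetical ones from the earlier snippets): the forward termination, the brute-force enumeration, and Σi αt(i)βt(i) at any fixed t should all give the same p(O|λ).

alpha, beta = forward(O, pi, A, B), backward(O, pi, A, B)
p_fwd   = alpha[-1].sum()                  # forward termination
p_brute = p_obs_brute_force(O, pi, A, B)   # exponential enumeration
p_t     = (alpha * beta).sum(axis=1)       # one value per t, all equal p(O|lambda)
assert np.isclose(p_fwd, p_brute) and np.allclose(p_t, p_fwd)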

Page 9:

γt(i) ≡ P(qt = Si | O, λ) = αt(i) βt(i) / Σj=1..N αt(j) βt(j)

Choose the state that has the highest probability, for each time step:

qt* = arg maxi γt(i)

Does this solve task 2, the most likely state sequence? No! – the individually most likely states need not form the most likely, or even a feasible, state sequence; Viterbi's algorithm (next slide) does.
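In the NumPy sketches above, γ is one normalization away from α and β (helper names are the hypothetical ones introduced earlier):

alpha, beta = forward(O, pi, A, B), backward(O, pi, A, B)
gamma = alpha * beta
gamma /= gamma.sum(axis=1, keepdims=True)  # gamma[t, i] = P(q_{t+1} = S_i | O, lambda)
q_star = gamma.argmax(axis=1)              # individually most likely state per step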


Page 10:

Viterbi’s Algorithm

δt(i) ≡ maxq1q2 ∙∙∙ qt-1 p(q1q2∙∙∙qt-1,qt =Si,O1∙∙∙Ot | λ)

• Initialization: δ1(i) = πibi(O1), ψ1(i) = 0

• Recursion: δt(j) = maxi δt-1(i) aij · bj(Ot), ψt(j) = argmaxi δt-1(i) aij – note: I think the book has the wrong formula for ψt(j)

• Termination: p* = maxi δT(i), qT* = argmaxi δT(i)

• Path backtracking: qt* = ψt+1(qt+1*), t = T-1, T-2, ..., 1


Page 11:

function Viterbi(O, S, π, A, B) : returns state sequence q1*..qT*
  for each state si do
    δ1(i) ← πi · Bi(O1)
    ψ1(i) ← 0
  end for
  for t ← 2, 3, ..., T do
    for each state sj do
      δt(j) ← maxk (δt-1(k) · Akj · Bj(Ot))
      ψt(j) ← argmaxk (δt-1(k) · Akj · Bj(Ot))
    end for
  end for
  // traceback, extract sequence of states
  p* ← maxi δT(i)
  qT* ← argmaxi δT(i)
  for t ← T-1, T-2, ..., 1 do
    qt* ← ψt+1(qt+1*)
  end for
  return q1*..qT*
end function
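A NumPy sketch of the same algorithm, under the toy-model assumptions from earlier (a practical version would use log probabilities so the products do not underflow):

import numpy as np

def viterbi(O, pi, A, B):
    """Most likely state sequence for observations O, by max-product DP."""
    T, N = len(O), len(pi)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, O[0]]                 # initialization
    for t in range(1, T):
        trans = delta[t-1][:, None] * A        # trans[k, j] = delta_{t-1}(k) * a_kj
        psi[t] = trans.argmax(axis=0)          # best predecessor for each state j
        delta[t] = trans.max(axis=0) * B[:, O[t]]
    q = np.zeros(T, dtype=int)
    q[-1] = delta[-1].argmax()                 # termination
    for t in range(T - 2, -1, -1):             # path backtracking
        q[t] = psi[t + 1, q[t + 1]]
    return q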

Page 12:

Baum-Welch (EM) algorithm

• learn the model parameters λ (transition probabilities aij and emission probabilities bj) with highest likelihood for a given set of training examples

• define ξt(i,j) as the probability of being in si at time t and sj at time t+1, given the sequence of observations O:

ξt(i,j) ≡ P(qt = Si, qt+1 = Sj | O, λ) = αt(i) aij bj(Ot+1) βt+1(j) / Σk Σl αt(k) akl bl(Ot+1) βt+1(l)

• define latent variables zt^i and zt^ij as indicators of which states a sequence passes through at each time step:

zt^i = 1 if qt = Si, 0 otherwise
zt^ij = 1 if qt = Si and qt+1 = Sj, 0 otherwise
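A sketch of these E-step quantities, built on the hypothetical forward/backward helpers above:

import numpy as np

def e_step(O, pi, A, B):
    """xi[t, i, j] = P(q_t = S_i, q_t+1 = S_j | O); gamma[t, i] = P(q_t = S_i | O)."""
    alpha, beta = forward(O, pi, A, B), backward(O, pi, A, B)
    T, N = len(O), len(pi)
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        xi[t] = alpha[t][:, None] * A * B[:, O[t+1]] * beta[t+1]
        xi[t] /= xi[t].sum()                   # normalize over all (i, j) pairs
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    return xi, gamma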

Page 13:

Baum-Welch (EM)

E-step: compute γt(i) and ξt(i,j) from the training sequences, using the current parameters λ

M-step: re-estimate the parameters from expected counts, summing over the K training sequences:

âij = Σk=1..K Σt=1..Tk-1 ξt^k(i,j) / Σk=1..K Σt=1..Tk-1 γt^k(i)

b̂j(m) = Σk=1..K Σt=1..Tk γt^k(j) · 1(Ot^k = vm) / Σk=1..K Σt=1..Tk γt^k(j)

π̂i = Σk=1..K γ1^k(i) / K


recall, γt(i) = Σj=1..N ξt(i,j) – the probability of being in state i at time t; the numerator of âij is then the expectation of the number of transitions from Si to Sj
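Tying it together, a single EM iteration for one training sequence might look like the sketch below (illustrative only: a real implementation would loop to convergence, accumulate the counts over all K sequences, smooth zero counts, and work in log space):

import numpy as np

def baum_welch_step(O, pi, A, B):
    """One Baum-Welch (EM) iteration on a single observation sequence."""
    xi, gamma = e_step(O, pi, A, B)                 # E-step (sketch above)
    # M-step: re-estimate parameters from expected counts
    pi_new = gamma[0]                               # expected start-state distribution
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.zeros_like(B)
    for m in range(B.shape[1]):                     # for each symbol v_m
        seen = (np.asarray(O) == m)                 # time steps where v_m was emitted
        B_new[:, m] = gamma[seen].sum(axis=0) / gamma.sum(axis=0)
    return pi_new, A_new, B_new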