
Page 1:

• Hidden Markov Models (HMMs) – probabilistic models for learning patterns in sequences (e.g. DNA, speech, weather, cards...) (2nd-order model)

Page 2:

• an observable Markov model
  – directly get the sequence of states
  – p(s1, s2, ..., sn | λ) = p(s1) · ∏i=2..n p(si | si-1)
  – (why I don't like the Urn example in the book)

• Hidden Markov model
  – only observe the sequence of symbols generated by states
  – for each state, there is a probability distribution over a finite set of symbols (emission probabilities)
  – example: think of a soda machine
    • observations: message on display ("insert 20 cents more"), output can, give change
    • states: coins inserted so far add up to N cents...
    • state transitions are determined by coins input
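Before the algorithms, it may help to see what the parameters λ = (A, B, π) look like in code. Below is a minimal Python/NumPy sketch; the two-state weather model and every name in it (states, symbols, pi, A, B, O) are illustrative assumptions, not from the slides. Later snippets reuse these definitions.

import numpy as np

# hypothetical toy HMM: 2 hidden states, 3 observable symbols
states  = ["Rainy", "Sunny"]          # hidden states S1, S2
symbols = ["walk", "shop", "clean"]   # observable symbols v1..v3

pi = np.array([0.6, 0.4])             # pi[i]  = p(q1 = Si)
A  = np.array([[0.7, 0.3],            # A[i,j] = aij = p(q_t+1 = Sj | q_t = Si)
               [0.4, 0.6]])
B  = np.array([[0.1, 0.4, 0.5],       # B[j,m] = bj(vm) = p(o_t = vm | q_t = Sj)
               [0.6, 0.3, 0.1]])

O = [0, 2, 1]                         # an observation sequence, as indices into symbols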

Page 3:

• tasks
  1) given a sequence, compute the probability it came from one of a set of models (e.g. most likely phoneme) – classification
  2) infer the most likely sequence of states underlying a sequence of symbols
     find Q* such that: Q* = argmaxQ p(Q | O, λ)
  3) train the HMM by learning the parameters λ (transition and emission probabilities) from a set of examples
     given seqs X, find λ* such that: λ* = argmaxλ p(X | λ)

Page 4:

• given an observation sequence O = o1...oT
  – if we also knew the state sequence Q = q1..qT, then we could easily calculate p(O|Q,λ)
  – joint probability:
    p(O,Q|λ) = p(q1) · ∏i=2..T p(qi|qi-1) · ∏i=1..T p(oi|qi)
  – could calculate p(O|λ) by marginalization:
    p(O|λ) = ΣQ p(O,Q|λ)
  – intractable: have to sum over all possible sequences Q
  – the forward-backward algorithm is a recursive procedure that solves this efficiently (via dynamic programming)
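To see why the plain marginalization is intractable, here is that sum written out directly (a hypothetical helper reusing the toy parameters above; fine for tiny T, hopeless for realistic sequence lengths, since it enumerates all N^T paths):

from itertools import product
import numpy as np

def p_obs_brute_force(O, pi, A, B):
    """p(O|lambda) by summing p(O,Q|lambda) over every state sequence Q."""
    N, T = len(pi), len(O)
    total = 0.0
    for Q in product(range(N), repeat=T):    # N**T terms: the intractable part
        p = pi[Q[0]] * B[Q[0], O[0]]         # p(q1) * p(o1|q1)
        for t in range(1, T):
            p *= A[Q[t-1], Q[t]] * B[Q[t], O[t]]
        total += p
    return total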

Page 5:

• Forward variable:

αt(i) ≡ P(O1···Ot, qt = Si | λ) – the probability of observing the prefix o1..ot and ending in state qi

Initialization: α1(i) = πi bi(O1)
Recursion: αt+1(j) = [ Σi=1..N αt(i) aij ] bj(Ot+1)
Termination: P(O|λ) = Σi=1..N αT(i)

(Lecture Notes for E. Alpaydın, Introduction to Machine Learning 2e, © 2010 The MIT Press, V1.0)
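A minimal NumPy sketch of this recursion, reusing the hypothetical toy parameters from earlier (a real implementation would rescale or work in log space to avoid underflow on long sequences):

import numpy as np

def forward(O, pi, A, B):
    """alpha[t, i] = P(o1..o_{t+1}, q_{t+1} = S_i | lambda), with 0-based t."""
    T, N = len(O), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]                    # initialization: pi_i * b_i(O1)
    for t in range(1, T):
        alpha[t] = (alpha[t-1] @ A) * B[:, O[t]]  # sum over predecessor states
    return alpha                                  # P(O|lambda) = alpha[-1].sum()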

Page 6:

• Backward variable:


βt(i) ≡ P(Ot+1···OT | qt = Si, λ) – the probability of observing the suffix ot+1..oT, given being in state qi at time t

Initialization: βT(i) = 1
Recursion: βt(i) = Σj=1..N aij bj(Ot+1) βt+1(j)
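And the matching backward sketch, under the same toy-model assumptions:

import numpy as np

def backward(O, pi, A, B):
    """beta[t, i] = P(o_{t+2}..o_T | q_{t+1} = S_i, lambda), with 0-based t."""
    T, N = len(O), len(pi)
    beta = np.ones((T, N))                        # initialization: beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t+1]] * beta[t+1])  # sum over successor states
    return beta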

Page 7:

Forward-backward algorithm, O(N²T)

forward pass:
  for each time step i = 1..T
    calculate α(i) by summing over all predecessor states j

reverse pass:
  for each time step i = T..1
    calculate β(i) by summing over all successor states j

Page 8:

function ForwardBackward(O, S, π, A, B) : returns p(O|π,A,B)
  for each state si do
    α1(i) ← πi · Bi(O1)
  end for
  for t ← 2, 3, ..., T do
    for each state sj do
      αt(j) ← Σk (αt-1(k) · Akj · Bj(Ot))
    end for
  end for
  // β is not needed for the output, but is often computed for other purposes
  for each state si do
    βT(i) ← 1
  end for
  for t ← T-1, ..., 1 do
    for each state sj do
      βt(j) ← Σk (Ajk · Bk(Ot+1) · βt+1(k))
    end for
  end for
  return Σi αT(i)
end function
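As a quick sanity check on the sketches so far (all names are the hypothetical ones from the earlier snippets): the forward termination, the brute-force enumeration, and Σi αt(i)βt(i) at any fixed t should all give the same p(O|λ).

alpha, beta = forward(O, pi, A, B), backward(O, pi, A, B)
p_fwd   = alpha[-1].sum()                  # forward termination
p_brute = p_obs_brute_force(O, pi, A, B)   # exponential enumeration
p_t     = (alpha * beta).sum(axis=1)       # one value per t, all equal p(O|lambda)
assert np.isclose(p_fwd, p_brute) and np.allclose(p_t, p_fwd)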

Page 9:

γt(i) ≡ P(qt = Si | O, λ) = αt(i) βt(i) / Σj=1..N αt(j) βt(j)

Choose the state that has the highest probability, for each time step:

qt* = arg maxi γt(i)

Does this solve task 2, the most likely state sequence? No! – the individually most likely states need not form the most likely, or even a feasible, state sequence; Viterbi's algorithm (next slide) does.
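In the NumPy sketches above, γ is one normalization away from α and β (helper names are the hypothetical ones introduced earlier):

alpha, beta = forward(O, pi, A, B), backward(O, pi, A, B)
gamma = alpha * beta
gamma /= gamma.sum(axis=1, keepdims=True)  # gamma[t, i] = P(q_{t+1} = S_i | O, lambda)
q_star = gamma.argmax(axis=1)              # individually most likely state per step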


Page 10:

Viterbi’s Algorithm

δt(i) ≡ maxq1q2 ∙∙∙ qt-1 p(q1q2∙∙∙qt-1,qt =Si,O1∙∙∙Ot | λ)

• Initialization: δ1(i) = πibi(O1), ψ1(i) = 0

• Recursion: δt(j) = maxi δt-1(i) aij · bj(Ot), ψt(j) = argmaxi δt-1(i) aij – note: I think the book has the wrong formula for ψt(j)

• Termination: p* = maxi δT(i), qT* = argmaxi δT(i)

• Path backtracking: qt* = ψt+1(qt+1*), t = T-1, T-2, ..., 1


Page 11:

function Viterbi(O, S, π, A, B) : returns state sequence q1*..qT*
  for each state si do
    δ1(i) ← πi · Bi(O1)
    ψ1(i) ← 0
  end for
  for t ← 2, 3, ..., T do
    for each state sj do
      δt(j) ← maxk (δt-1(k) · Akj · Bj(Ot))
      ψt(j) ← argmaxk (δt-1(k) · Akj · Bj(Ot))
    end for
  end for
  // traceback, extract sequence of states
  p* ← maxi δT(i)
  qT* ← argmaxi δT(i)
  for t ← T-1, T-2, ..., 1 do
    qt* ← ψt+1(qt+1*)
  end for
  return q1*..qT*
end function
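A NumPy sketch of the same algorithm, under the toy-model assumptions from earlier (a practical version would use log probabilities so the products do not underflow):

import numpy as np

def viterbi(O, pi, A, B):
    """Most likely state sequence for observations O, by max-product DP."""
    T, N = len(O), len(pi)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, O[0]]                 # initialization
    for t in range(1, T):
        trans = delta[t-1][:, None] * A        # trans[k, j] = delta_{t-1}(k) * a_kj
        psi[t] = trans.argmax(axis=0)          # best predecessor for each state j
        delta[t] = trans.max(axis=0) * B[:, O[t]]
    q = np.zeros(T, dtype=int)
    q[-1] = delta[-1].argmax()                 # termination
    for t in range(T - 2, -1, -1):             # path backtracking
        q[t] = psi[t + 1, q[t + 1]]
    return q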

Page 12:

Baum-Welch (EM) algorithm

• learn the model parameters λ (transition probabilities aij and emission probabilities bj) with highest likelihood for a given set of training examples

• define ξt(i,j) as the probability of being in si at time t and sj at time t+1, given the sequence of observations O:

ξt(i,j) ≡ P(qt = Si, qt+1 = Sj | O, λ) = αt(i) aij bj(Ot+1) βt+1(j) / Σk Σl αt(k) akl bl(Ot+1) βt+1(l)

• define latent variables zt^i and zt^ij as indicators of which states a sequence passes through at each time step:

zt^i = 1 if qt = Si, 0 otherwise
zt^ij = 1 if qt = Si and qt+1 = Sj, 0 otherwise
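A sketch of these E-step quantities, built on the hypothetical forward/backward helpers above:

import numpy as np

def e_step(O, pi, A, B):
    """xi[t, i, j] = P(q_t = S_i, q_t+1 = S_j | O); gamma[t, i] = P(q_t = S_i | O)."""
    alpha, beta = forward(O, pi, A, B), backward(O, pi, A, B)
    T, N = len(O), len(pi)
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        xi[t] = alpha[t][:, None] * A * B[:, O[t+1]] * beta[t+1]
        xi[t] /= xi[t].sum()                   # normalize over all (i, j) pairs
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    return xi, gamma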

Page 13:

Baum-Welch (EM)

E-step: compute γt(i) and ξt(i,j) from the training sequences, using the current parameters λ

M-step: re-estimate the parameters from expected counts, summing over the K training sequences:

âij = Σk=1..K Σt=1..Tk-1 ξt^k(i,j) / Σk=1..K Σt=1..Tk-1 γt^k(i)

b̂j(m) = Σk=1..K Σt=1..Tk γt^k(j) · 1(Ot^k = vm) / Σk=1..K Σt=1..Tk γt^k(j)

π̂i = Σk=1..K γ1^k(i) / K


recall, γt(i) = Σj=1..N ξt(i,j) – the probability of being in state i at time t; the numerator of âij is then the expectation of the number of transitions from Si to Sj
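Tying it together, a single EM iteration for one training sequence might look like the sketch below (illustrative only: a real implementation would loop to convergence, accumulate the counts over all K sequences, smooth zero counts, and work in log space):

import numpy as np

def baum_welch_step(O, pi, A, B):
    """One Baum-Welch (EM) iteration on a single observation sequence."""
    xi, gamma = e_step(O, pi, A, B)                 # E-step (sketch above)
    # M-step: re-estimate parameters from expected counts
    pi_new = gamma[0]                               # expected start-state distribution
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.zeros_like(B)
    for m in range(B.shape[1]):                     # for each symbol v_m
        seen = (np.asarray(O) == m)                 # time steps where v_m was emitted
        B_new[:, m] = gamma[seen].sum(axis=0) / gamma.sum(axis=0)
    return pi_new, A_new, B_new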