
Hidden Markov Models


Page 1: Hidden Markov Models

Hidden Markov Models

[Figure: a state graph with hidden states 1, 2, …, K]

Page 2: Hidden Markov Models

Outline

• Hidden Markov Models – Formalism

• The Three Basic Problems of HMMs

• Solutions

• Applications of HMMs for Automatic Speech Recognition (ASR)

Page 3: Hidden Markov Models

Example: The Dishonest Casino

A casino has two dice:
• Fair die: P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6
• Loaded die: P(1) = P(2) = P(3) = P(4) = P(5) = 1/10, P(6) = 1/2

Casino player switches back-&-forth between fair and loaded die once in a while

Game:
1. You bet $1
2. You roll (always with a fair die)
3. Casino player rolls (maybe with the fair die, maybe with the loaded die)
4. Highest number wins $2

Page 4: Hidden Markov Models

Question # 1 – Evaluation

GIVEN

A sequence of rolls by the casino player

12455264621461461361366616646616366163661636165

QUESTION

How likely is this sequence, given our model of how the casino works?

This is the EVALUATION problem in HMMs

Page 5: Hidden Markov Models

Question # 2 – Decoding

GIVEN

A sequence of rolls by the casino player

12455264621461461361366616646616366163661636165

QUESTION

What portion of the sequence was generated with the fair die, and what portion with the loaded die?

This is the DECODING question in HMMs

Page 6: Hidden Markov Models

Question # 3 – Learning

GIVEN

A sequence of rolls by the casino player

12455264621461461361366616646616366163661636165

QUESTION

How “loaded” is the loaded die? How “fair” is the fair die? How often does the casino player change from fair to loaded, and back?

This is the LEARNING question in HMMs

Page 7: Hidden Markov Models

The dishonest casino model

FAIR ⇄ LOADED

Transition probabilities:
P(Fair → Fair) = 0.95, P(Fair → Loaded) = 0.05
P(Loaded → Loaded) = 0.95, P(Loaded → Fair) = 0.05

Emission probabilities:
P(1|F) = 1/6, P(2|F) = 1/6, P(3|F) = 1/6, P(4|F) = 1/6, P(5|F) = 1/6, P(6|F) = 1/6
P(1|L) = 1/10, P(2|L) = 1/10, P(3|L) = 1/10, P(4|L) = 1/10, P(5|L) = 1/10, P(6|L) = 1/2

Page 8: Hidden Markov Models

Example: the dishonest casino

Let the sequence of rolls be:

O = 1, 2, 1, 5, 6, 2, 1, 6, 2, 4

Then, what is the likelihood of

X= Fair, Fair, Fair, Fair, Fair, Fair, Fair, Fair, Fair, Fair?

(say initial probs P(t=0,Fair) = ½, P(t=0,Loaded)= ½)

½ · P(1 | Fair) · P(Fair | Fair) · P(2 | Fair) · P(Fair | Fair) · … · P(4 | Fair)

= ½ · (1/6)^10 · (0.95)^9 = 0.00000000521 ≈ 5.2 × 10^-9

Page 9: Hidden Markov Models

Example: the dishonest casino

So, the likelihood the die is fair throughout this run is just ≈ 5.2 × 10^-9.

OK, but what is the likelihood of

X= Loaded, Loaded, Loaded, Loaded, Loaded, Loaded, Loaded, Loaded, Loaded, Loaded?

½ · P(1 | Loaded) · P(Loaded | Loaded) · … · P(4 | Loaded)

= ½ · (1/10)^8 · (1/2)^2 · (0.95)^9 = 0.00000000078781 ≈ 7.9 × 10^-10

Therefore, it is after all about 6.6 times more likely that the die is fair all the way than that it is loaded all the way.
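For readers who want to check these numbers, here is a minimal sketch (not part of the original slides) that reproduces both hand calculations; the function name seq_likelihood and the dictionary-based emission tables are illustrative choices, not anything defined by the deck.

    # Verify the two likelihoods above for O = 1, 2, 1, 5, 6, 2, 1, 6, 2, 4.
    O = [1, 2, 1, 5, 6, 2, 1, 6, 2, 4]

    def seq_likelihood(obs, emit, stay=0.95, start=0.5):
        """P(O, X) for a run that stays in a single state the whole time."""
        p = start                       # initial state probability (1/2)
        for t, o in enumerate(obs):
            p *= emit[o]                # emission at time t
            if t < len(obs) - 1:
                p *= stay               # one of the T-1 self-transitions
        return p

    fair = {o: 1/6 for o in range(1, 7)}
    loaded = {o: 1/10 for o in range(1, 6)}
    loaded[6] = 1/2

    p_fair = seq_likelihood(O, fair)       # ~5.2e-9
    p_loaded = seq_likelihood(O, loaded)   # ~7.9e-10
    print(p_fair, p_loaded, p_fair / p_loaded)   # ratio ~6.6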

Page 10: Hidden Markov Models

Example: the dishonest casino

Let the sequence of rolls be:

O = 1, 6, 6, 5, 6, 2, 6, 6, 3, 6

Now, what is the likelihood X = F, F, …, F?

½ · (1/6)^10 · (0.95)^9 ≈ 5.2 × 10^-9, same as before

What is the likelihood

X= L, L, …, L?

½ · (1/10)^4 · (1/2)^6 · (0.95)^9 = 0.00000049238 ≈ 4.9 × 10^-7

So, it is about 100 times more likely that the die is loaded.

Page 11: Hidden Markov Models

HMM Timeline

• Arrows indicate probabilistic dependencies.

• x’s are hidden states, each dependent only on the previous state.

– The Markov assumption holds for the state sequence.

• o’s are observations, each dependent only on its corresponding hidden state.

[Figure: time runs left to right; hidden states x_1 … x_{t-1}, x_t, x_{t+1} … x_T each emit an observation o_1 … o_{t-1}, o_t, o_{t+1} … o_T.]

Page 12: Hidden Markov Models

HMM Formalism

• An HMM can be specified by 3 matrices {Π, A, B}:

• Π = {π_i} are the initial state probabilities

• A = {a_ij} are the state transition probabilities, a_ij = Pr(x_j | x_i)

• B = {b_ik} are the observation probabilities, b_ik = Pr(o_k | x_i)
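As a concrete illustration (not from the slides), the dishonest casino can be written down as exactly these three matrices; the variable names Pi, A and B below are assumptions made for this sketch, with state 0 = Fair, state 1 = Loaded, and die face k stored in column k-1 of B.

    import numpy as np

    Pi = np.array([0.5, 0.5])              # initial state probabilities
    A = np.array([[0.95, 0.05],            # a_ij = Pr(x_j | x_i)
                  [0.05, 0.95]])
    B = np.array([[1/6] * 6,               # b_ik = Pr(o_k | x_i), Fair
                  [1/10] * 5 + [1/2]])     # Loaded

    # Sanity checks: each distribution sums to one.
    assert np.isclose(Pi.sum(), 1)
    assert np.allclose(A.sum(axis=1), 1)
    assert np.allclose(B.sum(axis=1), 1)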

Page 13: Hidden Markov Models

Generating a sequence by the model

Given a HMM, we can generate a sequence of length n as follows:

1. Start at state x_i according to prob π_i

2. Emit letter o_1 according to prob b_i(o_1)

3. Go to state x_j according to prob a_ij

4. … until emitting o_T

[Figure: generation as a walk through the trellis, choosing one of the states 1 … N at each step and emitting o_1, o_2, o_3, … o_T; e.g. o_1 is emitted from state 2 with probability b_{2 o_1}.]
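A small sketch of this generation procedure (again an illustration, not the slides' own code), reusing the Pi, A, B matrices defined earlier; observations are reported as die faces 1-6.

    import numpy as np

    def sample_sequence(Pi, A, B, T, seed=0):
        rng = np.random.default_rng(seed)
        N, K = B.shape
        states, obs = [], []
        x = rng.choice(N, p=Pi)            # 1. start state ~ Pi
        for _ in range(T):
            o = rng.choice(K, p=B[x])      # 2. emit o_t from b_x
            states.append(x)
            obs.append(o + 1)              # report die faces 1..6
            x = rng.choice(N, p=A[x])      # 3. move to the next state
        return states, obs                 # 4. stop after emitting o_T

    # states, rolls = sample_sequence(Pi, A, B, T=20)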

Page 14: Hidden Markov Models

The three main questions on HMMs

1. Evaluation

GIVEN an HMM λ and a sequence O,
FIND Prob[O | λ]

2. Decoding

GIVEN an HMM λ and a sequence O,
FIND the sequence X of states that maximizes P[X | O, λ]

3. Learning

GIVEN a sequence O,
FIND a model λ with parameters Π, A and B that maximizes P[O | λ]

Page 15: Hidden Markov Models

Problem 1: Evaluation

Find the likelihood a sequence is generated by the model

Page 16: Hidden Markov Models

Probability of an Observation

Given an observation sequence and a model, compute the probability of the observation sequence:

Compute $\Pr(O \mid \lambda)$, where $O = (o_1 \ldots o_T)$ and $\lambda = (A, B, \Pi)$.

Page 17: Hidden Markov Models

Probability of an Observation

Let $X = x_1 \ldots x_T$ be the state sequence. Then

$\Pr(O \mid X, \lambda) = b_{x_1 o_1}\, b_{x_2 o_2} \cdots b_{x_T o_T} = \prod_{t=1}^{T} b_{x_t o_t}$

$\Pr(X \mid \lambda) = \pi_{x_1}\, a_{x_1 x_2}\, a_{x_2 x_3} \cdots a_{x_{T-1} x_T} = \pi_{x_1} \prod_{t=1}^{T-1} a_{x_t x_{t+1}}$

Page 18: Hidden Markov Models

Probability of an Observation

$\Pr(O \mid \lambda) = \sum_X \Pr(O \mid X, \lambda)\, \Pr(X \mid \lambda) = \sum_X \Pr(O, X \mid \lambda)$

$= \sum_X \pi_{x_1}\, b_{x_1 o_1} \prod_{t=1}^{T-1} a_{x_t x_{t+1}}\, b_{x_{t+1} o_{t+1}}$
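This sum over all state sequences can be evaluated literally, which is useful only as a sanity check for tiny T; the sketch below (an illustration, with the assumed name evaluate_naive) enumerates all N^T paths exactly as in the formula, and the next slide explains why this is too expensive in general.

    import itertools
    import numpy as np

    def evaluate_naive(Pi, A, B, obs):
        """Pr(O | lambda) by brute force: sum over every state sequence X."""
        N, T = A.shape[0], len(obs)
        total = 0.0
        for X in itertools.product(range(N), repeat=T):   # N^T paths
            p = Pi[X[0]] * B[X[0], obs[0]]
            for t in range(1, T):
                p *= A[X[t - 1], X[t]] * B[X[t], obs[t]]
            total += p
        return total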

Page 19: Hidden Markov Models

HMM – Evaluation (cont.)

• Why isn’t it efficient? – For a given state sequence of length T we have about 2T

calculations– Let N be the number of states in the graph.– There are NT possible state sequences. – Complexity : O(2TNT )– Can be done more efficiently by the forward-backward (F-

B) procedure.

Page 20: Hidden Markov Models

The Forward Procedure (Prefix Probs)

$\alpha_i(t) = P(o_1 \ldots o_t, x_t = i \mid \lambda)$

The probability of being in state i after generating the first t observations.

Page 21: Hidden Markov Models

Forward Procedure

$\alpha_j(t+1) = P(o_1 \ldots o_{t+1}, x_{t+1} = j)$

$= P(o_1 \ldots o_{t+1} \mid x_{t+1} = j)\, P(x_{t+1} = j)$

$= P(o_1 \ldots o_t \mid x_{t+1} = j)\, P(o_{t+1} \mid x_{t+1} = j)\, P(x_{t+1} = j)$

$= P(o_1 \ldots o_t, x_{t+1} = j)\, P(o_{t+1} \mid x_{t+1} = j)$

Page 22: Hidden Markov Models

oTo1 otot-1 ot+1

x1 xt+1 xTxtxt-1

Forward Procedure

)1( tj

)|(),...(

)()|()|...(

)()|...(

),...(

1111

11111

1111

111

jxoPjxooP

jxPjxoPjxooP

jxPjxooP

jxooP

tttt

ttttt

ttt

tt

Page 23: Hidden Markov Models

oTo1 otot-1 ot+1

x1 xt+1 xTxtxt-1

Forward Procedure

)1( tj

)|(),...(

)()|()|...(

)()|...(

),...(

1111

11111

1111

111

jxoPjxooP

jxPjxoPjxooP

jxPjxooP

jxooP

tttt

ttttt

ttt

tt

Page 24: Hidden Markov Models

oTo1 otot-1 ot+1

x1 xt+1 xTxtxt-1

Forward Procedure

)1( tj

)|(),...(

)()|()|...(

)()|...(

),...(

1111

11111

1111

111

jxoPjxooP

jxPjxoPjxooP

jxPjxooP

jxooP

tttt

ttttt

ttt

tt

Page 25: Hidden Markov Models

Forward Procedure

$\alpha_j(t+1) = \sum_{i=1 \ldots N} P(o_1 \ldots o_t, x_t = i, x_{t+1} = j)\, P(o_{t+1} \mid x_{t+1} = j)$

$= \sum_{i=1 \ldots N} P(o_1 \ldots o_t, x_{t+1} = j \mid x_t = i)\, P(x_t = i)\, P(o_{t+1} \mid x_{t+1} = j)$

$= \sum_{i=1 \ldots N} P(o_1 \ldots o_t, x_t = i)\, P(x_{t+1} = j \mid x_t = i)\, P(o_{t+1} \mid x_{t+1} = j)$

$= \sum_{i=1 \ldots N} \alpha_i(t)\, a_{ij}\, b_{j o_{t+1}}$

Page 26: Hidden Markov Models

Nijoiji

ttttNi

tt

tttNi

ttt

ttNi

ttt

tbat

jxoPixjxPixooP

jxoPixPixjxooP

jxoPjxixooP

...1

111...1

1

11...1

11

11...1

11

1)(

)|()|(),...(

)|()()|,...(

)|(),,...(

oTo1 otot-1 ot+1

x1 xt+1 xTxtxt-1

Forward Procedure

Page 27: Hidden Markov Models

Nijoiji

ttttNi

tt

tttNi

ttt

ttNi

ttt

tbat

jxoPixjxPixooP

jxoPixPixjxooP

jxoPjxixooP

...1

111...1

1

11...1

11

11...1

11

1)(

)|()|(),...(

)|()()|,...(

)|(),,...(

oTo1 otot-1 ot+1

x1 xt+1 xTxtxt-1

Forward Procedure

Page 28: Hidden Markov Models

Nijoiji

ttttNi

tt

tttNi

ttt

ttNi

ttt

tbat

jxoPixjxPixooP

jxoPixPixjxooP

jxoPjxixooP

...1

111...1

1

11...1

11

11...1

11

1)(

)|()|(),...(

)|()()|,...(

)|(),,...(

oTo1 otot-1 ot+1

x1 xt+1 xTxtxt-1

Forward Procedure

Page 29: Hidden Markov Models

The Forward Procedure

Initialization: $\alpha_i(1) = \pi_i\, b_{i o_1}$, for $1 \le i \le N$

Iteration: $\alpha_j(t+1) = \Big[ \sum_{i=1 \ldots N} \alpha_i(t)\, a_{ij} \Big]\, b_{j o_{t+1}}$, for $1 \le t \le T-1$ and $1 \le j \le N$

Termination: $P(O \mid \lambda) = \sum_{i=1 \ldots N} \alpha_i(T)$

Computational Complexity: O(N^2 T)
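The summary above translates almost line for line into code. This is a sketch under the same conventions as before (the names forward, Pi, A, B and the 0-based observation indices are assumptions made for the sketch), not an implementation from the slides.

    import numpy as np

    def forward(Pi, A, B, obs):
        """Prefix probabilities alpha (shape T x N) and Pr(O | lambda)."""
        T, N = len(obs), A.shape[0]
        alpha = np.zeros((T, N))
        alpha[0] = Pi * B[:, obs[0]]                      # initialization
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]  # O(N^2) per step
        return alpha, alpha[-1].sum()                     # termination

    # On short sequences this agrees with evaluate_naive(), but in O(N^2 T).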

Page 30: Hidden Markov Models

Another Version: The Backward Procedure (Suffix Probs)

$\beta_i(t) = P(o_{t+1} \ldots o_T \mid x_t = i)$

Initialization: $\beta_i(T) = 1$

Iteration: $\beta_i(t) = \sum_{j=1 \ldots N} a_{ij}\, b_{j o_{t+1}}\, \beta_j(t+1)$

The probability of generating the remaining observations, given the state at time t.
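A companion sketch for the suffix probabilities, using the $\beta_i(T) = 1$ convention reconstructed above (names and layout are assumptions, mirroring the forward sketch):

    import numpy as np

    def backward(A, B, obs):
        """Suffix probabilities beta (shape T x N)."""
        T, N = len(obs), A.shape[0]
        beta = np.zeros((T, N))
        beta[-1] = 1.0                                     # beta_i(T) = 1
        for t in range(T - 2, -1, -1):
            beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
        return beta

    # Check: (alpha[t] * beta[t]).sum() equals Pr(O | lambda) for every t.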

Page 31: Hidden Markov Models

Problem 2: Decoding

Find the best state sequence

Page 32: Hidden Markov Models

Decoding

• Given an HMM and a new sequence of observations, find the most probable sequence of hidden states that generated these observations:

• In general, there is an exponential number of possible sequences.

• Use dynamic programming to reduce the search space to O(N^2 T).

$\hat{X} = \arg\max_X \Pr(X \mid O, \lambda) = \arg\max_X \Pr(X, O \mid \lambda)$

Page 33: Hidden Markov Models

Viterbi Algorithm

$\delta_j(t) = \max_{x_1 \ldots x_{t-1}} P(x_1 \ldots x_{t-1}, o_1 \ldots o_{t-1}, x_t = j, o_t)$

The probability of the state sequence which maximizes the probability of seeing the observations up to time t-1, landing in state j, and seeing the observation at time t.

Page 34: Hidden Markov Models

Viterbi Algorithm

Initialization: $\delta_j(0) = \pi_j$

Page 35: Hidden Markov Models

Viterbi Algorithm

Recursion:

$\delta_j(t+1) = \max_i \delta_i(t)\, a_{ij}\, b_{j o_{t+1}}$  (prob. of ML state)

$\psi_j(t+1) = \arg\max_i \delta_i(t)\, a_{ij}\, b_{j o_{t+1}}$  (name of ML state)

Page 36: Hidden Markov Models

Viterbi Algorithm

Termination:

$\hat{X}_T = \arg\max_i \delta_i(T)$

$\hat{X}_t = \psi_{\hat{X}_{t+1}}(t+1)$

$P(\hat{X}) = \max_i \delta_i(T)$

“Read out” the most likely state sequence, working backwards.
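Putting the three Viterbi slides together, here is an illustrative sketch (not the slides' own code) that runs the recursion and the backtrace; it folds the $\delta_j(0) = \pi_j$ initialization and the first emission into delta[0], and skips the usual log-space trick for readability.

    import numpy as np

    def viterbi(Pi, A, B, obs):
        """Most likely state path and its probability."""
        T, N = len(obs), A.shape[0]
        delta = np.zeros((T, N))
        psi = np.zeros((T, N), dtype=int)
        delta[0] = Pi * B[:, obs[0]]
        for t in range(1, T):
            scores = delta[t - 1][:, None] * A        # delta_i(t) * a_ij
            psi[t] = scores.argmax(axis=0)            # best predecessor of j
            delta[t] = scores.max(axis=0) * B[:, obs[t]]
        path = [int(delta[-1].argmax())]              # X_T = argmax_i delta_i(T)
        for t in range(T - 1, 0, -1):
            path.append(int(psi[t][path[-1]]))        # read out backwards
        return path[::-1], delta[-1].max()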

Page 37: Hidden Markov Models

Problem 3: Learning

Re-estimate the parameters of the model based on training data

Page 38: Hidden Markov Models

Learning by Parameter Estimation:

• Goal: Given an observation sequence, find the model that is most likely to produce that sequence.

• Problem: We don’t know the relative frequencies of the hidden states visited.

• No analytical solution is known for HMMs.

• We will approach the solution by successive approximations.

Page 39: Hidden Markov Models

The Baum-Welch Algorithm

• Find the expected frequencies of possible values of the hidden variables.

• Compute the maximum likelihood distributions of the hidden variables (by normalizing, as usual for MLE).

• Repeat until “convergence.”

• This is the Expectation-Maximization (EM) algorithm for parameter estimation.

• Applicable to any stochastic process, in theory.

• The special case for HMMs is called the Baum-Welch algorithm.

Page 40: Hidden Markov Models

Arc and State Probabilities

$p_t(i, j) = \dfrac{\alpha_i(t)\, a_{ij}\, b_{j o_{t+1}}\, \beta_j(t+1)}{\sum_{m=1 \ldots N} \alpha_m(t)\, \beta_m(t)}$

The probability of traversing an arc from state i (at time t) to state j (at time t+1).

$\gamma_i(t) = \sum_{j=1 \ldots N} p_t(i, j)$

The probability of being in state i at time t.

Page 41: Hidden Markov Models

Aggregation and Normalization

Now we can compute the new MLEs of the model parameters:

$\hat{\pi}_i = \gamma_i(1)$

$\hat{a}_{ij} = \dfrac{\sum_{t=1}^{T-1} p_t(i, j)}{\sum_{t=1}^{T-1} \gamma_i(t)}$

$\hat{b}_{ik} = \dfrac{\sum_{\{t \,:\, o_t = k\}} \gamma_i(t)}{\sum_{t=1}^{T} \gamma_i(t)}$

Page 42: Hidden Markov Models

The Baum-Welch Algorithm

1. Initialize A, B and Π (pick the best guess for the model parameters, or arbitrary values)
2. Repeat:
3.   Calculate $\alpha_j(t)$ and $\beta_j(t)$
4.   Calculate $p_t(i, j)$ and $\gamma_j(t)$
5.   Estimate $\hat{\pi}_i$, $\hat{a}_{ij}$ and $\hat{b}_{ik}$
6. Until the changes are small enough
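Steps 3-5 of the loop can be sketched as a single re-estimation function (an illustration reusing the forward/backward sketches above; baum_welch_step is an assumed name, and only a single training sequence is handled):

    import numpy as np

    def baum_welch_step(Pi, A, B, obs):
        """One EM update of (Pi, A, B) from a single observation sequence."""
        alpha, pO = forward(Pi, A, B, obs)
        beta = backward(A, B, obs)
        T, N = len(obs), A.shape[0]
        gamma = alpha * beta / pO                    # gamma_i(t)
        xi = np.zeros((T - 1, N, N))                 # p_t(i, j)
        for t in range(T - 1):
            xi[t] = (alpha[t][:, None] * A * B[:, obs[t + 1]] * beta[t + 1]) / pO
        Pi_new = gamma[0]
        A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        B_new = np.zeros_like(B)
        obs = np.asarray(obs)
        for k in range(B.shape[1]):
            B_new[:, k] = gamma[obs == k].sum(axis=0) / gamma.sum(axis=0)
        return Pi_new, A_new, B_new

    # Repeat baum_welch_step until the parameters stop changing noticeably.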

Page 43: Hidden Markov Models

The Baum-Welch Algorithm – Comments

Time Complexity: (# iterations) × O(N^2 T)

• Guaranteed to increase the (log) likelihood of the model

$P(\lambda \mid O) = P(O, \lambda) / P(O) = P(O \mid \lambda)\, P(\lambda) / P(O)$

• Not guaranteed to find globally best parameters

Converges to local optimum, depending on initial conditions

• Too many parameters / too large model - Overtraining

Page 44: Hidden Markov Models

Application : Automatic Speech Recognition

Page 45: Hidden Markov Models

Examples (1)

Page 46: Hidden Markov Models

Examples (2)

Page 47: Hidden Markov Models

Examples (3)

Page 48: Hidden Markov Models

Examples (4)

Page 49: Hidden Markov Models
Page 50: Hidden Markov Models

Phones

Page 51: Hidden Markov Models

Speech Signal

• Waveform

• Spectrogram

Page 52: Hidden Markov Models

Speech Signal cont.

Articulation

Page 53: Hidden Markov Models

Feature Extraction

Frame 1

Frame 2

Feature Vector X1

Feature Vector X2
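For a rough idea of what this slide's framing step might look like in code, here is an illustrative sketch (not the actual ASR front end from the slides): the waveform is cut into overlapping frames and each frame is mapped to a simple log-spectrum feature vector; real systems typically use MFCCs or similar features.

    import numpy as np

    def frames_to_features(signal, frame_len=400, hop=160):
        """Split a waveform into frames and return one feature vector per frame."""
        feats = []
        for start in range(0, len(signal) - frame_len + 1, hop):
            frame = signal[start:start + frame_len] * np.hamming(frame_len)
            spectrum = np.abs(np.fft.rfft(frame))
            feats.append(np.log(spectrum + 1e-10))   # feature vector X_t
        return np.array(feats)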

Page 54: Hidden Markov Models
Page 55: Hidden Markov Models