PigML 28.03.2007
A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition
Author: Rabiner, L.
Journal: Proceedings of the IEEE, 1989
Speaker: Simone Tognetti
2. Outline
• Introduction
  – Graphical models for signals
• Learning a model
  – ML
  – EM
• Markov chains
• HMMs
  – Solving HMMs
• Extensions of HMMs
• Applications
• Conclusion
3. Introduction: we can model what we see
Real-world systems generate observable outputs.
Outputs can be viewed as stochastic signals.
We can build a model of that signal because:
• the model provides information about how to process the signal (e.g., how to remove noise)
• the model gives information about the system that generated the signal
• from the model we can make predictions.
[Figure: an unknown system emits an observable sequence 0 1 2 4 2 3 ...; we fit a model M(Θ) to it.]
4. Which model do we choose?
Graphical models.
[Figure: a taxonomy of graphical models. Bayesian networks (BN) generalize to dynamic Bayesian networks (DBN); among the DBNs are Markov chains (MC), hidden Markov models (HMM), Kalman filters (KF), and NNs, each drawn as a small graph over states X_t (or I_t) and observations Y_t.]
6. Learning BNs: ML
Maximum likelihood estimation:
• Estimation of the parameters of a stochastic model when all the random variables are observable.
• Find the parameters that maximize the likelihood: $\theta^* = \arg\max_\theta \prod_t p(y_t \mid \theta)$
• Example: a sequence of i.i.d. random variables $Y_t \sim N(\mu, \sigma^2)$; the ML estimates are the sample mean and sample variance.
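A minimal numpy sketch of that Gaussian example (the data are made up for illustration): the closed-form ML estimates are just the empirical moments.

```python
import numpy as np

# ML estimation for i.i.d. Gaussian samples: the maximizers of the
# likelihood are the sample mean and the (biased) sample variance.
rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.5, size=1000)   # illustrative data
mu_hat = y.mean()
sigma2_hat = ((y - mu_hat) ** 2).mean()
print(mu_hat, sigma2_hat)  # close to 2.0 and 2.25
```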
7. Learning BNs: EM
Expectation Maximization (EM): learn the parameters of a stochastic model when some variables are hidden (e.g., HMMs).
• Here "learning" means estimating something that we don't observe directly.
• A simple example: [Figure: a two-node model with hidden X and observed Y.]
EM maximizes the likelihood iteratively, alternating two steps:
• Expectation: compute the expected values of the hidden quantities under the current parameters.
• Maximization: re-estimate the parameters using those expectations.
9. Discrete Markov Processes
A set of states S and a transition matrix M.
Prior probability: $\pi(0) = [\,P(S_1)\ \ P(S_2)\ \ P(S_3)\,]$
Marginal (posterior) probability: $\pi(t) = \pi(0)\, M^{t}$
$$M = \begin{pmatrix} P(S_1 \mid S_1) & P(S_2 \mid S_1) & P(S_3 \mid S_1) \\ P(S_1 \mid S_2) & P(S_2 \mid S_2) & P(S_3 \mid S_2) \\ P(S_1 \mid S_3) & P(S_2 \mid S_3) & P(S_3 \mid S_3) \end{pmatrix}$$
[Figure: a three-state transition diagram over S_1, S_2, S_3, and its DBN unrolling S_{t-1} → S_t → S_{t+1}.]
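A minimal numpy sketch of the marginal computation π(t) = π(0) Mᵗ; the transition values are made-up illustration numbers.

```python
import numpy as np

# Hypothetical 3-state transition matrix: M[i, j] = P(S_j | S_i).
M = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.5, 0.2],
              [0.2, 0.3, 0.5]])
pi0 = np.array([1.0, 0.0, 0.0])   # start in S1 with certainty

# Marginal at time t: pi(t) = pi(0) @ M^t
t = 4
pi_t = pi0 @ np.linalg.matrix_power(M, t)
print(pi_t)  # state distribution after 4 steps
```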
10. Hidden Markov Models
How to think about them? E.g., coin tossing.
• An underlying Markov process over hidden states q_t that is not observable.
• An output process producing the observations O_t that is observable.
Hypothesis on the underlying model:
• Underlying process: choose which coin to toss.
• Output process: the outcome (H or T) depends on the chosen coin.
[Figure: the DBN q_{t-1} → q_t with emissions O_{t-1}, O_t, and a coin-tossing example in which hidden states S_1, S_2, S_3 generate an observed sequence H T H T T H T ...]
11. Hidden Markov Models
Set of states: $q_t \in \{S_1, S_2, \dots, S_N\}$
Underlying-process transition matrix A: $a_{ij} = P(q_t = S_j \mid q_{t-1} = S_i)$
Output-process matrix B (produces the observation sequence O):
• Discrete: $O_t \in \{v_1, v_2, \dots, v_D\}$, with $b_i(k) = P(O_t = v_k \mid q_t = S_i)$; B is N × D.
• Continuous: $O_t \in \mathbb{R}$, with $b_i = p(O_t \mid q_t = S_i)$; B is a vector of distributions.
Prior probability on states: $\pi(0) = [\,P(q_0 = S_1),\ P(q_0 = S_2),\ \dots,\ P(q_0 = S_N)\,]$
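To make the definition concrete, a hedged numpy sketch of a two-coin HMM λ = (A, B, π) and a sampler for it; all numbers are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-coin HMM: hidden state = which coin, observation = H/T.
A  = np.array([[0.9, 0.1],    # A[i, j] = P(q_t = S_j | q_{t-1} = S_i)
               [0.2, 0.8]])
B  = np.array([[0.5, 0.5],    # B[i, k] = b_i(k) = P(O_t = v_k | q_t = S_i)
               [0.8, 0.2]])   # the second coin is biased towards heads
pi = np.array([0.6, 0.4])     # pi[i] = P(q_0 = S_i)

def sample(A, B, pi, T):
    """Draw a hidden state sequence and an observation sequence of length T."""
    q = rng.choice(len(pi), p=pi)
    states, obs = [], []
    for _ in range(T):
        states.append(q)
        obs.append(rng.choice(B.shape[1], p=B[q]))  # emit from current coin
        q = rng.choice(A.shape[1], p=A[q])          # transition to next coin
    return states, obs

states, obs = sample(A, B, pi, T=10)
print(states, obs)  # hidden coins and H(=0)/T(=1) outcomes
```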
12. Learning HMMs
The three basic problems:
We have an HMM λ = (A, B, π) and an observation sequence O = O_1, O_2, ..., O_T.
1. P(O|λ): the probability that the observation sequence came from the given HMM.
2. P(q_1, q_2, ..., q_T | O): the probability of a specific state sequence given the observation sequence.
3. How to adjust λ to maximize P(O|λ): learning the HMM.
13. Problem 1: P(O|λ)
Given a fixed state sequence Q = q_1 q_2 ... q_T:
• Probability of the observation sequence: $P(O \mid Q, \lambda) = \prod_{t=1}^{T} b_{q_t}(O_t)$
• Probability of the state sequence: $P(Q \mid \lambda) = \pi_{q_1}\, a_{q_1 q_2}\, a_{q_2 q_3} \cdots a_{q_{T-1} q_T}$
• Joint probability of Q and O: $P(O, Q \mid \lambda) = P(O \mid Q, \lambda)\, P(Q \mid \lambda)$
• Probability of O by summation over all state sequences: $P(O \mid \lambda) = \sum_{Q} P(O \mid Q, \lambda)\, P(Q \mid \lambda)$, which has O(N^T) terms if evaluated directly.
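As a sanity check on the summation above, it can be evaluated literally for a toy model; a hedged sketch (A, B, pi as in the earlier two-coin example, obs a list of symbol indices):

```python
import numpy as np
from itertools import product

def likelihood_bruteforce(A, B, pi, obs):
    """P(O | lambda) by explicit summation over all N**T state sequences.
    Only feasible for tiny models; shown to match the formula above."""
    N, T = A.shape[0], len(obs)
    total = 0.0
    for Q in product(range(N), repeat=T):
        p = pi[Q[0]] * B[Q[0], obs[0]]
        for t in range(1, T):
            p *= A[Q[t-1], Q[t]] * B[Q[t], obs[t]]
        total += p
    return total
```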
14. Problem 1: P(O|λ)
Forward procedure: compute a forward variable
$\alpha_i(t) = P(O_1 \dots O_t,\ q_t = S_i \mid \lambda)$
• Initialization: $\alpha_i(1) = \pi_i\, b_i(O_1)$
• Induction: $\alpha_j(t+1) = \big[\sum_{i=1}^{N} \alpha_i(t)\, a_{ij}\big]\, b_j(O_{t+1})$
• Termination: $P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_i(T)$, at O(N²T) cost instead of O(N^T).
[Figure: the state lattice from time 1 to T; each α_j(t+1) gathers contributions α_i(t) a_ij from all states S_i.]
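A hedged numpy sketch of the recursion (same A, B, pi conventions as before):

```python
import numpy as np

def forward(A, B, pi, obs):
    """Forward procedure: alpha[t, i] = P(O_1..O_t, q_t = S_i | lambda)."""
    N, T = A.shape[0], len(obs)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                  # initialization
    for t in range(1, T):                         # induction
        alpha[t] = (alpha[t-1] @ A) * B[:, obs[t]]
    return alpha

# P(O | lambda) = alpha[-1].sum(); agrees with brute force on small cases.
```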
15. EM for HMMs: the forward-backward procedure
Forward variable: $\alpha_i(t) = P(O_1 \dots O_t,\ q_t = S_i \mid \lambda)$
Backward variable: $\beta_i(t) = P(O_{t+1} \dots O_T \mid q_t = S_i, \lambda)$
• Initialization: $\beta_i(T) = 1$
• Induction: $\beta_i(t) = \sum_{j=1}^{N} a_{ij}\, b_j(O_{t+1})\, \beta_j(t+1)$
Alpha and beta together: $\alpha_i(t)\, \beta_i(t) = P(O,\ q_t = S_i \mid \lambda)$
Computing these quantities is the E-step.
[Figure: α_i(t) summarizes the lattice to the left of time t, β_j(t+1) the lattice to the right.]
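The matching sketch for the backward variable:

```python
import numpy as np

def backward(A, B, obs):
    """Backward procedure: beta[t, i] = P(O_{t+1}..O_T | q_t = S_i, lambda)."""
    N, T = A.shape[0], len(obs)
    beta = np.zeros((T, N))
    beta[T-1] = 1.0                               # initialization
    for t in range(T-2, -1, -1):                  # induction, backwards in time
        beta[t] = A @ (B[:, obs[t+1]] * beta[t+1])
    return beta
```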
16. Problem 2: P(q_1, q_2, ..., q_T | O)
Probability of being in state S_i at time t given the observation sequence:
$\gamma_i(t) = P(q_t = S_i \mid O, \lambda)$
Expression in terms of the forward and backward variables:
$\gamma_i(t) = \dfrac{\alpha_i(t)\, \beta_i(t)}{\sum_{j=1}^{N} \alpha_j(t)\, \beta_j(t)}$
Possible solution: pick the locally best state $q_t^* = \arg\max_i \gamma_i(t)$ at each t
• but the resulting sequence may not be an admissible state sequence (e.g., it can chain together transitions of probability zero).
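In code, γ is one normalization away from the α and β tables computed above:

```python
import numpy as np

def gamma_posteriors(alpha, beta):
    """gamma[t, i] = P(q_t = S_i | O, lambda), from the forward/backward tables."""
    g = alpha * beta
    return g / g.sum(axis=1, keepdims=True)

# Locally best states; these may form an inadmissible sequence:
# best_states = gamma_posteriors(alpha, beta).argmax(axis=1)
```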
17. Problem 2: P(q_1, q_2, ..., q_T | O)
Viterbi algorithm:
• Best score along a single path: $\delta_j(t) = \max_{q_1 \dots q_{t-1}} P(q_1 \dots q_{t-1},\ q_t = S_j,\ O_1 \dots O_t \mid \lambda)$, with initialization $\delta_j(1) = \pi_j\, b_j(O_1)$ and recursion $\delta_j(t) = \big[\max_i \delta_i(t-1)\, a_{ij}\big]\, b_j(O_t)$.
• Keep track of the best path that can reach state j at time t with a backtrace variable: $\psi_j(t) = \arg\max_i \delta_i(t-1)\, a_{ij}$.
[Figure: the state lattice from O_1 to O_T, showing the δ recursion from time t-1 to t.]
18. Problem 2: P(q_1, q_2, ..., q_T | O)
Global behaviour:
• At time T, pick the most probable final state: $q_T^* = \arg\max_i \delta_i(T)$.
• Go back with the backtrace variable: $q_t^* = \psi_{q_{t+1}^*}(t+1)$, for t = T-1, ..., 1.
[Figure: the state trellis from time 1 to T, with the single best path recovered by backtracking.]
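A hedged sketch of the full δ/ψ recursion with backtracking:

```python
import numpy as np

def viterbi(A, B, pi, obs):
    """Most probable state sequence via the delta/psi recursion above."""
    N, T = A.shape[0], len(obs)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t-1][:, None] * A          # scores[i, j] = delta_i(t-1) a_ij
        psi[t] = scores.argmax(axis=0)            # backtrace pointers
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    # Backtrack from the most probable final state.
    path = [delta[T-1].argmax()]
    for t in range(T-1, 0, -1):
        path.append(psi[t][path[-1]])
    return path[::-1]
```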
19. Problem 3: Learning λ
Baum-Welch (the EM algorithm for HMMs):
• Probability of being in state i at time t and in state j at time t+1:
$\xi_{ij}(t) = P(q_t = S_i,\ q_{t+1} = S_j \mid O, \lambda)$
[Figure: the arc S_i → S_j between times t and t+1, weighted by a_ij b_j(O_{t+1}), with α_i(t) covering O_1..O_t on the left and β_j(t+1) covering O_{t+2}..O_T on the right.]
This isolates the effect of the parameter a_ij.
20. Problem 3: Learning λ
Baum-Welch (the EM algorithm for HMMs):
• Expression with the forward and backward variables: $\xi_{ij}(t) = \dfrac{\alpha_i(t)\, a_{ij}\, b_j(O_{t+1})\, \beta_j(t+1)}{P(O \mid \lambda)}$
• Relation between γ and ξ: $\gamma_i(t) = \sum_{j=1}^{N} \xi_{ij}(t)$
• Expected number of transitions from S_i: $\sum_{t=1}^{T-1} \gamma_i(t)$
• Expected number of transitions from S_i to S_j: $\sum_{t=1}^{T-1} \xi_{ij}(t)$
21. Problem 3: Learning λ
Baum-Welch: re-estimation of the parameters
$\bar{\pi}_i = \gamma_i(1)$
$\bar{a}_{ij} = \dfrac{\sum_{t=1}^{T-1} \xi_{ij}(t)}{\sum_{t=1}^{T-1} \gamma_i(t)}$ (expected transitions S_i → S_j over expected transitions out of S_i)
$\bar{b}_j(k) = \dfrac{\sum_{t\,:\,O_t = v_k} \gamma_j(t)}{\sum_{t=1}^{T} \gamma_j(t)}$ (expected time in S_j observing v_k over expected time in S_j)
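A hedged numpy sketch of one M-step built from these statistics (conventions as in the earlier sketches; obs is a list of symbol indices):

```python
import numpy as np

def m_step(A, B, alpha, beta, obs):
    """One Baum-Welch M-step from the xi/gamma statistics defined above."""
    T, N = alpha.shape
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    # xi[t, i, j] ∝ alpha[t, i] * a_ij * b_j(O_{t+1}) * beta[t+1, j]
    xi = (alpha[:-1, :, None] * A[None] *
          (B[:, obs[1:]].T * beta[1:])[:, None, :])
    xi /= xi.sum(axis=(1, 2), keepdims=True)
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):                   # expected emission counts
        new_B[:, k] = gamma[np.array(obs) == k].sum(axis=0)
    new_B /= gamma.sum(axis=0)[:, None]
    return new_A, new_B, new_pi
```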
22. Problem 3: Learning λ
Baum-Welch algorithm:
1. Initial estimate of λ.
2. E-step: compute α and β.
3. M-step: new estimate of the parameters, λ'.
4. Repeat from 2 until convergence.
EM view: deriving the updates by maximizing the likelihood yields the same equations as the M-step above.
Problem: learning is sensitive to the initial parameter values when we have continuous observations.
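A minimal driver wiring together the forward, backward, and m_step sketches above; note it omits scaling, so it will underflow on long sequences:

```python
import numpy as np

def baum_welch(A, B, pi, obs, max_iter=100, tol=1e-6):
    """Iterate E-step (forward/backward) and M-step until the
    log-likelihood stops improving."""
    old_ll = -np.inf
    for _ in range(max_iter):
        alpha = forward(A, B, pi, obs)             # E-step
        beta = backward(A, B, obs)
        ll = np.log(alpha[-1].sum())               # log P(O | lambda)
        if ll - old_ll < tol:
            break
        old_ll = ll
        A, B, pi = m_step(A, B, alpha, beta, obs)  # M-step
    return A, B, pi
```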
24. Types of HMMs
The type changes with the structure of the transition matrix of the underlying process:
• Ergodic HMM with 4 states
• Left-to-right HMM with 4 states
• Parallel-path left-to-right HMM with 6 states
25. Types of HMMs
Continuous observations: we can model the output probability distribution with a mixture,
$b_j(O) = \sum_{m=1}^{M} c_{jm}\, \mathcal{N}(O;\ \mu_{jm}, \Sigma_{jm})$
• New parameters to estimate: the prior (weight) vector, the mean vectors, and the covariance matrices.
• Null transitions: transitions that emit no output.
• Kalman filter: the closely related model with a continuous state. [Figure: the DBN X_{t-1} → X_t with observations Y_{t-1}, Y_t.]
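A minimal sketch of evaluating such a mixture output density for one state, assuming scalar observations and one Gaussian per component; all parameter values are illustrative:

```python
import numpy as np

def mixture_density(o, weights, means, variances):
    """b_j(O) as a scalar Gaussian mixture, matching the formula above.
    weights, means, variances are per-component arrays for one state j."""
    norm = 1.0 / np.sqrt(2.0 * np.pi * variances)
    return np.sum(weights * norm * np.exp(-0.5 * (o - means) ** 2 / variances))

# e.g. a two-component mixture evaluated at a scalar observation:
b = mixture_density(0.3, np.array([0.6, 0.4]),
                    np.array([0.0, 1.0]), np.array([1.0, 0.25]))
```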
26. Other types and issues
• Explicit state duration (modelling the time spent in a state directly)
• A distance measure between two HMMs
• Scaling (α and β underflow numerically on long sequences; see the sketch below)
• A symmetric version of the distance
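On the scaling issue: the standard remedy normalizes α at every step and accumulates the logs of the scaling coefficients; a hedged sketch:

```python
import numpy as np

def forward_scaled(A, B, pi, obs):
    """Forward procedure with per-step normalization: returns the scaled
    alphas and log P(O | lambda), avoiding underflow on long sequences."""
    N, T = A.shape[0], len(obs)
    alpha = np.zeros((T, N))
    log_likelihood = 0.0
    alpha[0] = pi * B[:, obs[0]]
    for t in range(T):
        if t > 0:
            alpha[t] = (alpha[t-1] @ A) * B[:, obs[t]]
        c = alpha[t].sum()                        # scaling coefficient
        alpha[t] /= c
        log_likelihood += np.log(c)               # log P = sum of log c_t
    return alpha, log_likelihood
```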
28. Speech recognition
A generic framework:
• Temporal and spectral analysis to obtain an observation sequence.
• Recognition of units of the language: words are divided into small units so that a small set of models suffices.
• Mapping from HMMs to voice units: given the observations, choose the HMM that gives the best P(O|λ).
• Composition of units into a word.
29. Single-word recognition
A single word is pronounced and then recognized.
Key element: a lot of domain knowledge is needed to extract useful features.
30. Segmental k-means: segmentation into states
When continuous observations are used (e.g., in speech recognition), the initial estimate of the observation distributions is important for convergence; segmental k-means provides such an estimate by segmenting the training observations into states and fitting each state's distribution to its own segment.
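A hedged sketch of the clustering ingredient, assuming scalar observations and one Gaussian per state; this is an illustrative initializer, not Rabiner's exact segmental procedure:

```python
import numpy as np

def kmeans_init(observations, n_states, n_iter=20, seed=0):
    """Initialize per-state Gaussian emissions with plain k-means over the
    scalar observations, one cluster per state."""
    rng = np.random.default_rng(seed)
    x = np.asarray(observations, dtype=float)
    centers = rng.choice(x, size=n_states, replace=False)
    for _ in range(n_iter):
        labels = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
        for k in range(n_states):
            if np.any(labels == k):
                centers[k] = x[labels == k].mean()
    variances = np.array([x[labels == k].var() if np.any(labels == k) else 1.0
                          for k in range(n_states)])
    return centers, variances
```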
31. Recognition of a sequence of digits
The observation sequence is matched at each step against a single-word recognition system.
• The observation sequences are not pre-classified, and we have no information about where each word ends.
• The building level matches the observation sequence to a digit sequence with some probability.
• An alternative is to use the state segmentation with a higher-level model.
32. Conclusion
• HMMs are general stochastic models.
• EM is a good algorithm for learning such models.
• We need prior knowledge to define the structure of the model.
• Lots of parameters need lots of data.
• They perform very well in many applications if applied in the correct way:
  – signal segmentation and classification
  – clustering of signals
  – prediction
33. End
Questions?