PigML 28.03.2007
A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition
Author: Rabiner, L.
Journal: Proceedings of the IEEE, 1989
Speaker: Simone Tognetti
2. Outline
• Introduction
  – Graphical models for signals
• Learning a model
  – ML
  – EM
• Markov chains
• HMMs
  – Solving HMMs
• Extensions of HMMs
• Applications
• Conclusion
3. Introduction: we can model what we see
Real-world systems generate observable outputs.
Outputs can be viewed as stochastic signals.
We can build a model of that signal because:
• the model provides information about how to process the signal (e.g., how to remove noise)
• the model gives information about the system that generated the signal
• from the model we can make predictions.
[Figure: an unknown system emits an observable sequence 0 1 2 4 2 3 ...; we fit a model M(Θ) to it.]
4. Which model do we choose?
Graphical models.
[Figure: a taxonomy of graphical models. Bayesian networks (BN) generalize to dynamic Bayesian networks (DBN); among the DBNs are Markov chains (MC), hidden Markov models (HMM), Kalman filters (KF), and NNs, each drawn as a small graph over states X_t (or I_t) and observations Y_t.]
6. Learning BNs: ML
Maximum likelihood estimation:
• Estimation of the parameters of a stochastic model when all the random variables are observable.
• Find the parameters that maximize the likelihood: $\theta^* = \arg\max_\theta \prod_t p(y_t \mid \theta)$
• Example: a sequence of i.i.d. random variables $Y_t \sim N(\mu, \sigma^2)$; the ML estimates are the sample mean and sample variance.
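A minimal numpy sketch of that Gaussian example (the data are made up for illustration): the closed-form ML estimates are just the empirical moments.

```python
import numpy as np

# ML estimation for i.i.d. Gaussian samples: the maximizers of the
# likelihood are the sample mean and the (biased) sample variance.
rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.5, size=1000)   # illustrative data
mu_hat = y.mean()
sigma2_hat = ((y - mu_hat) ** 2).mean()
print(mu_hat, sigma2_hat)  # close to 2.0 and 2.25
```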
7. Learning BNs: EM
Expectation Maximization (EM): learn the parameters of a stochastic model when some variables are hidden (e.g., HMMs).
• Here "learning" means estimating something that we don't observe directly.
• A simple example: [Figure: a two-node model with hidden X and observed Y.]
EM maximizes the likelihood iteratively, alternating two steps:
• Expectation: compute the expected values of the hidden quantities under the current parameters.
• Maximization: re-estimate the parameters using those expectations.
9. Discrete Markov Processes
A set of states S and a transition matrix M.
Prior probability: $\pi(0) = [\,P(S_1)\ \ P(S_2)\ \ P(S_3)\,]$
Marginal (posterior) probability: $\pi(t) = \pi(0)\, M^{t}$
$$M = \begin{pmatrix} P(S_1 \mid S_1) & P(S_2 \mid S_1) & P(S_3 \mid S_1) \\ P(S_1 \mid S_2) & P(S_2 \mid S_2) & P(S_3 \mid S_2) \\ P(S_1 \mid S_3) & P(S_2 \mid S_3) & P(S_3 \mid S_3) \end{pmatrix}$$
[Figure: a three-state transition diagram over S_1, S_2, S_3, and its DBN unrolling S_{t-1} → S_t → S_{t+1}.]
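A minimal numpy sketch of the marginal computation π(t) = π(0) Mᵗ; the transition values are made-up illustration numbers.

```python
import numpy as np

# Hypothetical 3-state transition matrix: M[i, j] = P(S_j | S_i).
M = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.5, 0.2],
              [0.2, 0.3, 0.5]])
pi0 = np.array([1.0, 0.0, 0.0])   # start in S1 with certainty

# Marginal at time t: pi(t) = pi(0) @ M^t
t = 4
pi_t = pi0 @ np.linalg.matrix_power(M, t)
print(pi_t)  # state distribution after 4 steps
```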
10. Hidden Markov Models
How to think about them? E.g., coin tossing.
• An underlying Markov process over hidden states q_t that is not observable.
• An output process producing the observations O_t that is observable.
Hypothesis on the underlying model:
• Underlying process: choose which coin to toss.
• Output process: the outcome (H or T) depends on the chosen coin.
[Figure: the DBN q_{t-1} → q_t with emissions O_{t-1}, O_t, and a coin-tossing example in which hidden states S_1, S_2, S_3 generate an observed sequence H T H T T H T ...]
11. Hidden Markov Models
Set of states: $q_t \in \{S_1, S_2, \dots, S_N\}$
Underlying-process transition matrix A: $a_{ij} = P(q_t = S_j \mid q_{t-1} = S_i)$
Output-process matrix B (produces the observation sequence O):
• Discrete: $O_t \in \{v_1, v_2, \dots, v_D\}$, with $b_i(k) = P(O_t = v_k \mid q_t = S_i)$; B is N × D.
• Continuous: $O_t \in \mathbb{R}$, with $b_i = p(O_t \mid q_t = S_i)$; B is a vector of distributions.
Prior probability on states: $\pi(0) = [\,P(q_0 = S_1),\ P(q_0 = S_2),\ \dots,\ P(q_0 = S_N)\,]$
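To make the definition concrete, a hedged numpy sketch of a two-coin HMM λ = (A, B, π) and a sampler for it; all numbers are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-coin HMM: hidden state = which coin, observation = H/T.
A  = np.array([[0.9, 0.1],    # A[i, j] = P(q_t = S_j | q_{t-1} = S_i)
               [0.2, 0.8]])
B  = np.array([[0.5, 0.5],    # B[i, k] = b_i(k) = P(O_t = v_k | q_t = S_i)
               [0.8, 0.2]])   # the second coin is biased towards heads
pi = np.array([0.6, 0.4])     # pi[i] = P(q_0 = S_i)

def sample(A, B, pi, T):
    """Draw a hidden state sequence and an observation sequence of length T."""
    q = rng.choice(len(pi), p=pi)
    states, obs = [], []
    for _ in range(T):
        states.append(q)
        obs.append(rng.choice(B.shape[1], p=B[q]))  # emit from current coin
        q = rng.choice(A.shape[1], p=A[q])          # transition to next coin
    return states, obs

states, obs = sample(A, B, pi, T=10)
print(states, obs)  # hidden coins and H(=0)/T(=1) outcomes
```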
12. Learning HMMs
The three basic problems:
We have an HMM λ = (A, B, π) and an observation sequence O = O_1, O_2, ..., O_T.
1. P(O|λ): the probability that the observation sequence came from the given HMM.
2. P(q_1, q_2, ..., q_T | O): the probability of a specific state sequence given the observation sequence.
3. How to adjust λ to maximize P(O|λ): learning the HMM.
13. Problem 1: P(O|λ)
Given a fixed state sequence Q = q_1 q_2 ... q_T:
• Probability of the observation sequence: $P(O \mid Q, \lambda) = \prod_{t=1}^{T} b_{q_t}(O_t)$
• Probability of the state sequence: $P(Q \mid \lambda) = \pi_{q_1}\, a_{q_1 q_2}\, a_{q_2 q_3} \cdots a_{q_{T-1} q_T}$
• Joint probability of Q and O: $P(O, Q \mid \lambda) = P(O \mid Q, \lambda)\, P(Q \mid \lambda)$
• Probability of O by summation over all state sequences: $P(O \mid \lambda) = \sum_{Q} P(O \mid Q, \lambda)\, P(Q \mid \lambda)$, which has O(N^T) terms if evaluated directly.
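As a sanity check on the summation above, it can be evaluated literally for a toy model; a hedged sketch (A, B, pi as in the earlier two-coin example, obs a list of symbol indices):

```python
import numpy as np
from itertools import product

def likelihood_bruteforce(A, B, pi, obs):
    """P(O | lambda) by explicit summation over all N**T state sequences.
    Only feasible for tiny models; shown to match the formula above."""
    N, T = A.shape[0], len(obs)
    total = 0.0
    for Q in product(range(N), repeat=T):
        p = pi[Q[0]] * B[Q[0], obs[0]]
        for t in range(1, T):
            p *= A[Q[t-1], Q[t]] * B[Q[t], obs[t]]
        total += p
    return total
```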
14. Problem 1: P(O|λ)
Forward procedure: compute a forward variable
$\alpha_i(t) = P(O_1 \dots O_t,\ q_t = S_i \mid \lambda)$
• Initialization: $\alpha_i(1) = \pi_i\, b_i(O_1)$
• Induction: $\alpha_j(t+1) = \big[\sum_{i=1}^{N} \alpha_i(t)\, a_{ij}\big]\, b_j(O_{t+1})$
• Termination: $P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_i(T)$, at O(N²T) cost instead of O(N^T).
[Figure: the state lattice from time 1 to T; each α_j(t+1) gathers contributions α_i(t) a_ij from all states S_i.]
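A hedged numpy sketch of the recursion (same A, B, pi conventions as before):

```python
import numpy as np

def forward(A, B, pi, obs):
    """Forward procedure: alpha[t, i] = P(O_1..O_t, q_t = S_i | lambda)."""
    N, T = A.shape[0], len(obs)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                  # initialization
    for t in range(1, T):                         # induction
        alpha[t] = (alpha[t-1] @ A) * B[:, obs[t]]
    return alpha

# P(O | lambda) = alpha[-1].sum(); agrees with brute force on small cases.
```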
15. EM for HMMs: the forward-backward procedure
Forward variable: $\alpha_i(t) = P(O_1 \dots O_t,\ q_t = S_i \mid \lambda)$
Backward variable: $\beta_i(t) = P(O_{t+1} \dots O_T \mid q_t = S_i, \lambda)$
• Initialization: $\beta_i(T) = 1$
• Induction: $\beta_i(t) = \sum_{j=1}^{N} a_{ij}\, b_j(O_{t+1})\, \beta_j(t+1)$
Alpha and beta together: $\alpha_i(t)\, \beta_i(t) = P(O,\ q_t = S_i \mid \lambda)$
Computing these quantities is the E-step.
[Figure: α_i(t) summarizes the lattice to the left of time t, β_j(t+1) the lattice to the right.]
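The matching sketch for the backward variable:

```python
import numpy as np

def backward(A, B, obs):
    """Backward procedure: beta[t, i] = P(O_{t+1}..O_T | q_t = S_i, lambda)."""
    N, T = A.shape[0], len(obs)
    beta = np.zeros((T, N))
    beta[T-1] = 1.0                               # initialization
    for t in range(T-2, -1, -1):                  # induction, backwards in time
        beta[t] = A @ (B[:, obs[t+1]] * beta[t+1])
    return beta
```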
16. Problem 2: P(q_1, q_2, ..., q_T | O)
Probability of being in state S_i at time t given the observation sequence:
$\gamma_i(t) = P(q_t = S_i \mid O, \lambda)$
Expression in terms of the forward and backward variables:
$\gamma_i(t) = \dfrac{\alpha_i(t)\, \beta_i(t)}{\sum_{j=1}^{N} \alpha_j(t)\, \beta_j(t)}$
Possible solution: pick the locally best state $q_t^* = \arg\max_i \gamma_i(t)$ at each t
• but the resulting sequence may not be an admissible state sequence (e.g., it can chain together transitions of probability zero).
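In code, γ is one normalization away from the α and β tables computed above:

```python
import numpy as np

def gamma_posteriors(alpha, beta):
    """gamma[t, i] = P(q_t = S_i | O, lambda), from the forward/backward tables."""
    g = alpha * beta
    return g / g.sum(axis=1, keepdims=True)

# Locally best states; these may form an inadmissible sequence:
# best_states = gamma_posteriors(alpha, beta).argmax(axis=1)
```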
17. Problem 2: P(q_1, q_2, ..., q_T | O)
Viterbi algorithm:
• Best score along a single path: $\delta_j(t) = \max_{q_1 \dots q_{t-1}} P(q_1 \dots q_{t-1},\ q_t = S_j,\ O_1 \dots O_t \mid \lambda)$, with initialization $\delta_j(1) = \pi_j\, b_j(O_1)$ and recursion $\delta_j(t) = \big[\max_i \delta_i(t-1)\, a_{ij}\big]\, b_j(O_t)$.
• Keep track of the best path that can reach state j at time t with a backtrace variable: $\psi_j(t) = \arg\max_i \delta_i(t-1)\, a_{ij}$.
[Figure: the state lattice from O_1 to O_T, showing the δ recursion from time t-1 to t.]
18. Problem 2: P(q_1, q_2, ..., q_T | O)
Global behaviour:
• At time T, pick the most probable final state: $q_T^* = \arg\max_i \delta_i(T)$.
• Go back with the backtrace variable: $q_t^* = \psi_{q_{t+1}^*}(t+1)$, for t = T-1, ..., 1.
[Figure: the state trellis from time 1 to T, with the single best path recovered by backtracking.]
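A hedged sketch of the full δ/ψ recursion with backtracking:

```python
import numpy as np

def viterbi(A, B, pi, obs):
    """Most probable state sequence via the delta/psi recursion above."""
    N, T = A.shape[0], len(obs)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t-1][:, None] * A          # scores[i, j] = delta_i(t-1) a_ij
        psi[t] = scores.argmax(axis=0)            # backtrace pointers
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    # Backtrack from the most probable final state.
    path = [delta[T-1].argmax()]
    for t in range(T-1, 0, -1):
        path.append(psi[t][path[-1]])
    return path[::-1]
```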
19. Problem 3: Learning λ
Baum-Welch (the EM algorithm for HMMs):
• Probability of being in state i at time t and in state j at time t+1:
$\xi_{ij}(t) = P(q_t = S_i,\ q_{t+1} = S_j \mid O, \lambda)$
[Figure: the arc S_i → S_j between times t and t+1, weighted by a_ij b_j(O_{t+1}), with α_i(t) covering O_1..O_t on the left and β_j(t+1) covering O_{t+2}..O_T on the right.]
This isolates the effect of the parameter a_ij.
20. Problem 3: Learning λ
Baum-Welch (the EM algorithm for HMMs):
• Expression with the forward and backward variables: $\xi_{ij}(t) = \dfrac{\alpha_i(t)\, a_{ij}\, b_j(O_{t+1})\, \beta_j(t+1)}{P(O \mid \lambda)}$
• Relation between γ and ξ: $\gamma_i(t) = \sum_{j=1}^{N} \xi_{ij}(t)$
• Expected number of transitions from S_i: $\sum_{t=1}^{T-1} \gamma_i(t)$
• Expected number of transitions from S_i to S_j: $\sum_{t=1}^{T-1} \xi_{ij}(t)$
21. Problem 3: Learning λ
Baum-Welch: re-estimation of the parameters
$\bar{\pi}_i = \gamma_i(1)$
$\bar{a}_{ij} = \dfrac{\sum_{t=1}^{T-1} \xi_{ij}(t)}{\sum_{t=1}^{T-1} \gamma_i(t)}$ (expected transitions S_i → S_j over expected transitions out of S_i)
$\bar{b}_j(k) = \dfrac{\sum_{t\,:\,O_t = v_k} \gamma_j(t)}{\sum_{t=1}^{T} \gamma_j(t)}$ (expected time in S_j observing v_k over expected time in S_j)
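A hedged numpy sketch of one M-step built from these statistics (conventions as in the earlier sketches; obs is a list of symbol indices):

```python
import numpy as np

def m_step(A, B, alpha, beta, obs):
    """One Baum-Welch M-step from the xi/gamma statistics defined above."""
    T, N = alpha.shape
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    # xi[t, i, j] ∝ alpha[t, i] * a_ij * b_j(O_{t+1}) * beta[t+1, j]
    xi = (alpha[:-1, :, None] * A[None] *
          (B[:, obs[1:]].T * beta[1:])[:, None, :])
    xi /= xi.sum(axis=(1, 2), keepdims=True)
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):                   # expected emission counts
        new_B[:, k] = gamma[np.array(obs) == k].sum(axis=0)
    new_B /= gamma.sum(axis=0)[:, None]
    return new_A, new_B, new_pi
```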
22. Problem 3: Learning λ
Baum-Welch algorithm:
1. Initial estimate of λ.
2. E-step: compute α and β.
3. M-step: new estimate of the parameters, λ'.
4. Repeat from 2 until convergence.
EM view: deriving the updates by maximizing the likelihood yields the same equations as the M-step above.
Problem: learning is sensitive to the initial parameter values when we have continuous observations.
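A minimal driver wiring together the forward, backward, and m_step sketches above; note it omits scaling, so it will underflow on long sequences:

```python
import numpy as np

def baum_welch(A, B, pi, obs, max_iter=100, tol=1e-6):
    """Iterate E-step (forward/backward) and M-step until the
    log-likelihood stops improving."""
    old_ll = -np.inf
    for _ in range(max_iter):
        alpha = forward(A, B, pi, obs)             # E-step
        beta = backward(A, B, obs)
        ll = np.log(alpha[-1].sum())               # log P(O | lambda)
        if ll - old_ll < tol:
            break
        old_ll = ll
        A, B, pi = m_step(A, B, alpha, beta, obs)  # M-step
    return A, B, pi
```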
24. Types of HMMs
The type changes with the structure of the transition matrix of the underlying process:
• Ergodic HMM with 4 states
• Left-to-right HMM with 4 states
• Parallel-path left-to-right HMM with 6 states
25. Types of HMMs
Continuous observations: we can model the output probability distribution with a mixture,
$b_j(O) = \sum_{m=1}^{M} c_{jm}\, \mathcal{N}(O;\ \mu_{jm}, \Sigma_{jm})$
• New parameters to estimate: the prior (weight) vector, the mean vectors, and the covariance matrices.
• Null transitions: transitions that emit no output.
• Kalman filter: the closely related model with a continuous state. [Figure: the DBN X_{t-1} → X_t with observations Y_{t-1}, Y_t.]
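A minimal sketch of evaluating such a mixture output density for one state, assuming scalar observations and one Gaussian per component; all parameter values are illustrative:

```python
import numpy as np

def mixture_density(o, weights, means, variances):
    """b_j(O) as a scalar Gaussian mixture, matching the formula above.
    weights, means, variances are per-component arrays for one state j."""
    norm = 1.0 / np.sqrt(2.0 * np.pi * variances)
    return np.sum(weights * norm * np.exp(-0.5 * (o - means) ** 2 / variances))

# e.g. a two-component mixture evaluated at a scalar observation:
b = mixture_density(0.3, np.array([0.6, 0.4]),
                    np.array([0.0, 1.0]), np.array([1.0, 0.25]))
```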
26. Other types and issues
• Explicit state duration (modelling the time spent in a state directly)
• A distance measure between two HMMs
• Scaling (α and β underflow numerically on long sequences; see the sketch below)
• A symmetric version of the distance
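On the scaling issue: the standard remedy normalizes α at every step and accumulates the logs of the scaling coefficients; a hedged sketch:

```python
import numpy as np

def forward_scaled(A, B, pi, obs):
    """Forward procedure with per-step normalization: returns the scaled
    alphas and log P(O | lambda), avoiding underflow on long sequences."""
    N, T = A.shape[0], len(obs)
    alpha = np.zeros((T, N))
    log_likelihood = 0.0
    alpha[0] = pi * B[:, obs[0]]
    for t in range(T):
        if t > 0:
            alpha[t] = (alpha[t-1] @ A) * B[:, obs[t]]
        c = alpha[t].sum()                        # scaling coefficient
        alpha[t] /= c
        log_likelihood += np.log(c)               # log P = sum of log c_t
    return alpha, log_likelihood
```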
28. Speech recognition
A generic framework:
• Temporal and spectral analysis to obtain an observation sequence.
• Recognition of units of the language: words are divided into small units so that a small set of models suffices.
• Mapping from HMMs to voice units: given the observations, choose the HMM that gives the best P(O|λ).
• Composition of units into a word.
29. Single-word recognition
A single word is pronounced and then recognized.
Key element: a lot of domain knowledge is needed to extract useful features.
30. Segmental k-means: segmentation into states
When continuous observations are used (e.g., in speech recognition), the initial estimate of the observation distributions is important for convergence; segmental k-means provides such an estimate by segmenting the training observations into states and fitting each state's distribution to its own segment.
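A hedged sketch of the clustering ingredient, assuming scalar observations and one Gaussian per state; this is an illustrative initializer, not Rabiner's exact segmental procedure:

```python
import numpy as np

def kmeans_init(observations, n_states, n_iter=20, seed=0):
    """Initialize per-state Gaussian emissions with plain k-means over the
    scalar observations, one cluster per state."""
    rng = np.random.default_rng(seed)
    x = np.asarray(observations, dtype=float)
    centers = rng.choice(x, size=n_states, replace=False)
    for _ in range(n_iter):
        labels = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
        for k in range(n_states):
            if np.any(labels == k):
                centers[k] = x[labels == k].mean()
    variances = np.array([x[labels == k].var() if np.any(labels == k) else 1.0
                          for k in range(n_states)])
    return centers, variances
```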
31. Recognition of a sequence of digits
The observation sequence is matched at each step against a single-word recognition system.
• The observation sequences are not pre-classified, and we have no information about where each word ends.
• The building level matches the observation sequence to a digit sequence with some probability.
• An alternative is to use the state segmentation with a higher-level model.
32. Conclusion
• HMMs are general stochastic models.
• EM is a good algorithm for learning such models.
• We need prior knowledge to define the structure of the model.
• Lots of parameters need lots of data.
• They perform very well in many applications if applied in the correct way:
  – signal segmentation and classification
  – clustering of signals
  – prediction
33. End
Questions?