EE613 Machine Learning for Engineers
HIDDEN MARKOV MODELS
Sylvain Calinon, Robot Learning & Interaction Group
Idiap Research Institute, Dec. 5, 2019
Outline
• Markov models
• Hidden Markov model (HMM)
• Forward-backward algorithm
• Viterbi decoding (dynamic programming)
• Hidden semi-Markov model (HSMM)
• HMM with dynamic features (Trajectory-HMM)
Markov models
Markov models - Parameters
K possible states
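For reference (assuming the slides follow the standard notation), a Markov model over K states is parameterized by an initial state distribution and a transition matrix:

\pi_i = P(s_1 = i), \qquad a_{ij} = P(s_t = j \mid s_{t-1} = i), \qquad \sum_{j=1}^{K} a_{ij} = 1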
Markov models in language modeling
Example of text generated from a 4-gram model, trained on a corpus of 400 million words.
The first 4 words are specified by hand, the model generates the 5th word, and then the results are fed back into the model.
Source: http://www.fit.vutbr.cz/~imikolov/rnnlm/gen-4gram.txt
SAYS IT’S NOT IN THE CARDS LEGENDARY RECONNAISSANCE BY ROLLIE DEMOCRACIES UNSUSTAINABLE COULD STRIKE REDLINING VISITS TO PROFIT BOOKING WAIT HERE AT MADISON SQUARE GARDEN COUNTY COURTHOUSE WHERE HE HAD BEEN DONE IN THREE ALREADY IN ANY WAY IN WHICH A TEACHER …
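To make the generation procedure concrete, here is a minimal Python sketch (the `model` dictionary and its format are hypothetical illustrations, not the code behind the quoted example; following the slide's description, the next word is sampled given the previous four):

import random

def generate(model, seed, length):
    """Generate text from an n-gram model.

    model: dict mapping a tuple of 4 context words to a list of
           (next_word, probability) pairs -- hypothetical format
    seed:  the 4 starting words, specified by hand
    """
    words = list(seed)
    for _ in range(length):
        context = tuple(words[-4:])
        candidates, probs = zip(*model[context])
        # sample the next word, then feed the result back into the model
        words.append(random.choices(candidates, weights=probs)[0])
    return " ".join(words)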
MLE of transition matrix in Markov models
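The result, stated here for completeness (standard maximum-likelihood estimation): with N_{ij} denoting the number of observed transitions from state i to state j, the MLE of the transition matrix is the normalized count

\hat{a}_{ij} = \frac{N_{ij}}{\sum_{k=1}^{K} N_{ik}}

and \hat{\pi}_i is the fraction of sequences that start in state i.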
Hidden Markov model (HMM)
Python notebook: demo_HMM.ipynb
Matlab code: demo_HMM01.m
In a Markov chain, the state is directly visible to the observer → the transition probabilities are the only parameters.
In an HMM, the state is not directly visible, but an output dependent on the state is visible.
Hidden Markov model (HMM)
Hidden states
Observed output (emission probability)
Image adapted from Wikipedia
Initial state: All we know is that 60% of the days are rainy on average.
The transition probability represents the change of the weather in the underlying Markov chain. Here, there is a 30% chance that tomorrow will be sunny if today is rainy.
Hidden Markov model (HMM)
Hidden states
Observed output (emission probability)
Image adapted from Wikipedia
You can think of an HMM either as:
• a Markov chain with stochastic measurements
• a GMM with latent variables changing over time
The emission probability represents how likely Bob is to perform a certain activity on each day. If it is sunny, there is a 60% chance that he is outside for a walk. If it is rainy, there is a 50% chance that he cleans his apartment, etc.
Inference problems associated with HMMs
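The classical formulation (Rabiner, 1989), which this lecture covers in turn:
• Evaluation: compute the likelihood of an observation sequence (forward algorithm)
• Decoding: find the most probable hidden state sequence (Viterbi algorithm)
• Learning: estimate the model parameters from data (EM/Baum-Welch)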
Emission/output distributions in HMM
• Discrete tables (over symbols V1, V2, V3)
• Gaussian distribution
• Mixture of Gaussians
Transition matrix structures in HMM
HMM - Examples of application
HMM is used in many fields as a tool for time series or sequence analysis, and in fields where the goal is to recover a data sequence that is not immediately observable:
• Speech recognition
• Speech synthesis
• Part-of-speech tagging
• Natural language modeling
• Machine translation
• Gene prediction
• Molecule kinetic analysis
• DNA motif discovery
• Alignment of bio-sequences (e.g., proteins)
• Metamorphic virus detection
• Document separation in scanning solutions
• Cryptanalysis
• Activity recognition
• Protein folding
• Human motion science
• Online handwriting recognition
• Robotics
Automatic speech recognition
ξt can represent features extracted from the speech signal, and st can represent the word being spoken. The transition model P(st|st-1) represents the language model, and the observation model P(ξt|st) represents the acoustic model.
Part-of-speech tagging
ξt can represent a word, and st represents its part of speech (noun, verb, adjective, etc.).
Activity recognition
ξt can represent features extracted from a video frame, and st is the class of activity the person is engaged in (e.g., running, walking, sitting, etc.).
Gene finding
ξt can represent the DNA nucleotides (A,T,G,C), and st can represent whether we are inside a gene-coding region or not.
In these examples, ξt denotes the observation and st the hidden state.
[Graphical models: GMM vs. HMM]
HMM parameters
From now on, we will consider a single Gaussian as state output
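In the standard parameterization (Rabiner, 1989), which the slides appear to follow, an HMM with K states and a single Gaussian per state is defined by

\Theta = \left\{ \{\pi_i\}_{i=1}^{K},\; \{a_{ij}\}_{i,j=1}^{K},\; \{\mu_i, \Sigma_i\}_{i=1}^{K} \right\}

with initial distribution \pi_i = P(s_1 = i), transition probabilities a_{ij} = P(s_t = j \mid s_{t-1} = i), and emission distribution p(\xi_t \mid s_t = i) = \mathcal{N}(\xi_t \mid \mu_i, \Sigma_i).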
Useful intermediary variables in HMM
Forward variable
Backward variable
Smoothed node marginals
Smoothed edge marginals
Forward algorithm
[Graphical model: hidden state chain s1 → s2 → s3 → s4, each state st emitting an observation ξt]
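As a complement to the slides, a minimal NumPy sketch of the forward pass (variable names `pi`, `A`, `B` are my own; `B[t, i]` would hold the emission likelihood N(ξt | μi, Σi)):

import numpy as np

def forward(pi, A, B):
    """Forward pass: alpha[t, i] = p(xi_1, ..., xi_t, s_t = i).

    pi: (K,) initial state probabilities
    A:  (K, K) transition matrix, A[i, j] = P(s_t = j | s_{t-1} = i)
    B:  (T, K) emission likelihoods, B[t, i] = p(xi_t | s_t = i)
    """
    T, K = B.shape
    alpha = np.zeros((T, K))
    alpha[0] = pi * B[0]                      # initialization
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[t]  # induction: sum over previous states
    return alpha                              # p(xi_1..xi_T) = alpha[-1].sum()

Each step costs O(K^2), so the full pass is O(K^2 T), instead of the O(K^T) cost of naive enumeration over state sequences.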
Low influence of transition probabilities w.r.t. emission probabilities in HMM
[Figure: the forward variable α computed on the same data with learned transition probabilities (self-transitions ≈ .94) and with manually set transition probabilities (self-transitions ≈ .06); the color of each datapoint corresponds to the value of the forward variable α, with an arrow indicating the direction of motion]
Useful intermediary variables in HMM
Forward variable
Backward variable
Smoothed node marginals
Smoothed edge marginals
Backward algorithm
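For completeness, the backward recursion in Rabiner's notation (with b_j(ξt) the emission likelihood of state j):

\beta_T(i) = 1, \qquad \beta_t(i) = \sum_{j=1}^{K} a_{ij}\, b_j(\xi_{t+1})\, \beta_{t+1}(j)

so that \beta_t(i) = p(\xi_{t+1}, \ldots, \xi_T \mid s_t = i). It is computed from right to left, mirroring the forward pass.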
Useful intermediary variables in HMM
Forward variable
Backward variable
Smoothed node marginals
Smoothed edge marginals
These variables are sometimes called "smoothed values", as they combine forward and backward probabilities in the computation.
You can think of their roles as passing "messages" from left to right, and from right to left, and then combining the information at each node.
Smoothed node marginals
Conditional independence property
[Graphical model: hidden state chain s1 → s2 → s3 → s4, each state st emitting an observation ξt]
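In equations (standard HMM smoothing, with γ the usual symbol for the node marginals):

\gamma_t(i) = P(s_t = i \mid \xi_{1:T}) = \frac{\alpha_t(i)\, \beta_t(i)}{\sum_{k=1}^{K} \alpha_t(k)\, \beta_t(k)}

The conditional independence of past and future observations given st is what lets the joint probability factorize into the product of the forward and backward variables.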
Useful intermediary variables in HMM
Forward variable
Backward variable
Smoothed node marginals
Smoothed edge marginals
Smoothed edge marginals
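In equations (standard notation, with ζ denoting the edge marginals):

\zeta_t(i,j) = P(s_t = i,\, s_{t+1} = j \mid \xi_{1:T}) = \frac{\alpha_t(i)\, a_{ij}\, b_j(\xi_{t+1})\, \beta_{t+1}(j)}{\sum_{k=1}^{K} \sum_{l=1}^{K} \alpha_t(k)\, a_{kl}\, b_l(\xi_{t+1})\, \beta_{t+1}(l)}

The four factors in the numerator correspond to the evidence up to t, the transition, the next emission, and the evidence after t+1.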
EM for HMM
Similar to Markov models
Similar to GMM
K Gaussians, M trajectories, Tm points per trajectory
EM for HMM - Summary
These results can be formally derived with EM (also called the Baum-Welch algorithm in the context of HMMs).
The update rules can be interpreted as normalized counts, with several types of weighted averages required in the computation.
K Gaussians, M trajectories, Tm points per trajectory
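As a sketch of what "normalized counts" means here, the M-step updates for a single trajectory of length T are (standard Baum-Welch with one Gaussian per state; in the multi-trajectory case the sums additionally run over the M trajectories):

\pi_i \leftarrow \gamma_1(i), \qquad a_{ij} \leftarrow \frac{\sum_{t=1}^{T-1} \zeta_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}

\mu_i \leftarrow \frac{\sum_{t=1}^{T} \gamma_t(i)\, \xi_t}{\sum_{t=1}^{T} \gamma_t(i)}, \qquad \Sigma_i \leftarrow \frac{\sum_{t=1}^{T} \gamma_t(i)\, (\xi_t - \mu_i)(\xi_t - \mu_i)^\top}{\sum_{t=1}^{T} \gamma_t(i)}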
Numerical underflow issue in HMM
This issue is sometimes not covered in textbooks, although it remains very important for practical implementations of HMMs!
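One common remedy, sketched below (the scaling scheme follows Rabiner's tutorial; variable names are mine): normalize α at every time step and accumulate the log-likelihood from the scaling factors, so no quantity ever underflows.

import numpy as np

def forward_scaled(pi, A, B):
    """Scaled forward pass that avoids underflow on long sequences."""
    T, K = B.shape
    alpha = np.zeros((T, K))
    loglik = 0.0
    alpha[0] = pi * B[0]
    for t in range(T):
        if t > 0:
            alpha[t] = (alpha[t - 1] @ A) * B[t]
        c = alpha[t].sum()       # scaling factor c_t
        alpha[t] /= c            # alpha[t] now holds p(s_t = i | xi_1..xi_t)
        loglik += np.log(c)      # log p(xi_1..xi_T) = sum_t log c_t
    return alpha, loglik

An alternative is to carry out all computations in the log domain, at the cost of log-sum-exp operations in the recursion.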
Summary - Why did we introduce these four intermediary variables in HMM?
Forward variable
Backward variable
Smoothed node marginals
Smoothed edge marginals
Viterbi decoding (MAP vs MPE estimates)
Python notebook: demo_HMM.ipynb
Matlab code: demo_HMM_Viterbi01.m
Maximum a posteriori (MAP) vs. most probable explanation (MPE)
Viterbi decoding - Trellis representation
Viterbi decoding - Algorithm
This is the probability of ending up in state i at time step t by taking the most probable path
It tells us the most likely previous state on the most probable path to st = i
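In Rabiner's δ/ψ notation (which I take to correspond to the two quantities described above), the recursion is

\delta_t(i) = \max_{j} \left[ \delta_{t-1}(j)\, a_{ji} \right] b_i(\xi_t), \qquad \psi_t(i) = \arg\max_{j} \left[ \delta_{t-1}(j)\, a_{ji} \right]

initialized with \delta_1(i) = \pi_i\, b_i(\xi_1). The most probable path is recovered by backtracking: s_T^* = \arg\max_i \delta_T(i), then s_t^* = \psi_{t+1}(s_{t+1}^*).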
Viterbi decoding - Example
Image adapted from Kevin P. Murphy (2012), Machine Learning: A Probabilistic Perspective
Numerical underflow issue in Viterbi
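Since Viterbi only multiplies probabilities (no sums), the usual remedy is simply to work in the log domain. A minimal sketch (array names are mine):

import numpy as np

def viterbi_log(pi, A, B):
    """Log-domain Viterbi: most probable state sequence, without underflow."""
    T, K = B.shape
    logA = np.log(A)
    delta = np.zeros((T, K))           # delta[t, i]: best log-prob of reaching i at t
    psi = np.zeros((T, K), dtype=int)  # psi[t, i]: best predecessor of state i at t
    delta[0] = np.log(pi) + np.log(B[0])
    for t in range(1, T):
        scores = delta[t - 1][:, None] + logA  # scores[j, i] over predecessors j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + np.log(B[t])
    s = np.zeros(T, dtype=int)         # backtracking
    s[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        s[t] = psi[t + 1][s[t + 1]]
    return s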
Hidden semi-Markov model (HSMM)
Python notebook: demo_HSMM.ipynb
Matlab code: demo_HSMM01.m
By artificially duplicating the number of states while keeping the same emission distribution, other state duration distributions can be modeled.
State duration probability in standard HMM
The state duration follows a geometric distribution
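Concretely, with a_ii the self-transition probability of state i, the probability of staying exactly d steps in state i is

P(d) = a_{ii}^{\,d-1}\, (1 - a_{ii})

a geometric distribution with expected duration 1/(1 - a_{ii}) and mode at d = 1, which is often a poor match for real state durations.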
Another approach is to provide an explicit model of the state duration instead of relying on self-transition probabilities.
Hidden semi-Markov model (HSMM)
[Graphical models: GMM vs. HMM vs. HSMM]
Parametric duration distribution
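To make the difference with a standard HMM concrete, here is a minimal sampling sketch (my own illustration; the Poisson duration model is just one possible parametric choice): instead of testing a self-transition at every step, we sample a state, then an explicit duration, then stay for that many steps.

import numpy as np

rng = np.random.default_rng(0)

def sample_hsmm_states(pi, A, lam, T):
    """Sample a state sequence from an HSMM with Poisson duration models.

    pi:  (K,) initial state distribution
    A:   (K, K) transition matrix with zero diagonal (no self-transitions)
    lam: (K,) Poisson duration parameter of each state
    """
    states = []
    s = rng.choice(len(pi), p=pi)
    while len(states) < T:
        d = 1 + rng.poisson(lam[s])     # explicit duration model, d >= 1
        states.extend([s] * d)          # stay in state s for d steps
        s = rng.choice(len(A), p=A[s])  # then jump to a different state
    return np.array(states[:T])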
HMM with dynamic features
(Trajectory-HMM)
Matlab code: demo_trajHSMM01.m
D dimensions, C derivatives, T time steps (C = 3 here)
Large sparse matrix
Weighted Least Squares!
HMM with dynamic features - Summary
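A compact sketch of the final computation (my own minimal illustration for a 1D signal with C = 2, i.e., positions and velocities, and diagonal covariances): stacking the static and dynamic features as ζ = Φx, the smooth trajectory is retrieved from the state sequence's Gaussians by weighted least squares, x̂ = (ΦᵀΣ⁻¹Φ)⁻¹ ΦᵀΣ⁻¹μ.

import numpy as np

def trajectory_from_states(mu, sigma2, T):
    """Retrieve a smooth 1D trajectory from per-time-step Gaussians over
    (position, velocity) features, by weighted least squares.

    mu:     (T, 2) mean of the Gaussian active at each time step
    sigma2: (T, 2) corresponding variances (diagonal covariances assumed)
    """
    # Phi maps the T positions x to stacked (position, velocity) features,
    # with the velocity approximated by a first-order finite difference.
    Phi = np.zeros((2 * T, T))
    for t in range(T):
        Phi[2 * t, t] = 1.0                # position row: x[t]
        Phi[2 * t + 1, t] = 1.0            # velocity row: x[t] - x[t-1]
        if t > 0:
            Phi[2 * t + 1, t - 1] = -1.0
    W = np.diag(1.0 / sigma2.reshape(-1))  # Sigma^{-1} (block-diagonal here)
    m = mu.reshape(-1)
    # Weighted least squares: xhat = (Phi^T W Phi)^{-1} Phi^T W m
    return np.linalg.solve(Phi.T @ W @ Phi, Phi.T @ W @ m)

Here mu[t] and sigma2[t] would come from the Gaussian of the state active at time t (e.g., obtained by Viterbi decoding); in the D-dimensional, C-derivative case, Φ becomes the large sparse matrix mentioned in the slides.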
References
Hidden Markov model (HMM)
L. R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–285, February 1989
Hidden semi-Markov model (HSMM)
S.-Z. Yu. Hidden semi-Markov models. Artificial Intelligence, 174:215–243, 2010
S. E. Levinson. Continuously variable duration hidden Markov models for automatic speech recognition. Computer Speech & Language, 1(1):29–45, 1986
HMM with dynamic features (Trajectory HMM)
S. Furui. Speaker-independent isolated word recognition using dynamic features of speech spectrum. IEEE Trans. on Acoustics, Speech, and Signal Processing, 34(1):52–59, 1986
H. Zen, K. Tokuda, and T. Kitamura. Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic feature vector sequences. Computer Speech and Language, 21(1):153–173, 2007
Appendix
Markov models - Transition matrix
MLE of transition matrix in Markov models
HMM: Smoothed edge marginals
Conditional independence property
[Graphical model: hidden state chain s1 → s2 → s3 → s4, each state st emitting an observation ξt]
HSMM: Initialization of forward variable