Hidden Markov Model
Nov 11, 2008
Sung-Bae Cho
Agenda
• Hidden Markov Model
• Inference of Hidden Markov Model
• Path Tracking of HMM
• Learning of Hidden Markov Model
• Hidden Markov Model Applications
• Summary & Review
Temporal Pattern Recognition
• The world is constantly changing.
• Temporal data sequence = …, X_{-2}, X_{-1}, X_0, X_1, X_2, …
• Observed vs. real value
– Real value: X
– Observation: Y
Hidden Concept and Actual Realization
(Figure: a hidden state sequence X_1, X_2, …, X_n at the "idea" level generates the observation sequence Y_1, Y_2, …, Y_n at the "reality" level.)
Hidden Markov Model
• Definition:
– A statistical model in which the system being modeled is assumed to be a Markov process with unknown parameters
– The challenge is to determine the hidden parameters from the observable parameters
– Extracted model parameters can be used to perform further analysis
• Expression:
– A hidden random variable X_t that conditions another random variable Y_t
– X_t ∈ S = {1, 2, …, N}; the hidden states form a chain X_1 → X_2 → … → X_t, each state emitting an observation Y_t
– State diagram: states 1, 2, …, N linked by transition probabilities such as P_{1|1}, P_{2|1}, P_{3|2}
– Joint factor at time t: P(X_t) P(Y_t | X_t)
Random Processes
• X_{t-1} → X_t, i.e. X_t | X_{t-1} => Markov process
– Description: {P(X_t | X_{t-1})}
• Y_t | X_t => Random process (often a Gaussian process)
– Description: {P(Y_t | X_t)}
• Combination: {P(X_t | X_{t-1}) P(Y_t | X_t)}
– Doubly stochastic process
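To make the doubly stochastic structure concrete, here is a minimal Python/NumPy sampling sketch: at each step a hidden state is drawn from {P(X_t | X_{t-1})} and an observation from {P(Y_t | X_t)}. The two-state parameter values are invented for illustration, not taken from the slides:

import numpy as np

rng = np.random.default_rng(0)

# Illustrative two-state parameters (assumed, not from the slides).
A  = np.array([[0.7, 0.3],    # P(X_t = j | X_{t-1} = i): the Markov process
               [0.4, 0.6]])
B  = np.array([[0.9, 0.1],    # P(Y_t = v | X_t = i): the observation process
               [0.2, 0.8]])
pi = np.array([0.5, 0.5])

def sample(T):
    """Draw hidden states and observations from the doubly stochastic process."""
    xs, ys = [], []
    x = rng.choice(2, p=pi)                # initial hidden state
    for _ in range(T):
        xs.append(x)
        ys.append(rng.choice(2, p=B[x]))   # emit Y_t given hidden X_t
        x = rng.choice(2, p=A[x])          # transition X_t -> X_{t+1}
    return xs, ys

print(sample(10))   # only the ys would be visible to an observer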
Why HMM?
• A good model for highly variable discrete-time sequences
– often noisy, uncertain, and incomplete
• Generalizes DTW template matching
• Rigorous theoretical foundation
– the model can be optimized
• Models spatiotemporal variability elegantly
– greater variability-modeling power than a plain Markov chain
• Efficient inference/computation algorithms
• Theoretically grounded, robust learning algorithm
• Can be combined to model complex patterns (composition and extension)
What is an HMM - Notation
• Three sets of parameters: λ = (π, A, B)
– Initial state probabilities π:
π = {π_i : π_i = Pr(X_1 = i)}
• constraints: π_i ≥ 0, Σ_{i=1}^{N} π_i = 1
– Transition probabilities A:
A = {a_ij : a_ij = Pr(X_{t+1} = j | X_t = i)}
• constraints: a_ij ≥ 0, Σ_{j=1}^{N} a_ij = 1
– Observation probabilities B:
B = {b_j(v) : b_j(v) = Pr(o_t = v | X_t = j)}
• constraints: b_j(v) ≥ 0, Σ_{v∈V} b_j(v) = 1
Model Parameters
• State space/alphabet: S = {1, 2, 3}, N = 3; V = {1, 2, 3, 4}
• Matrices (example model from the slide's state diagram, where e.g. a_12 = 0.3):
π = (1.0, 0.0, 0.0)
A = [[0.7, 0.3, 0.0],
     [0.0, 0.4, 0.6],
     [0.0, 0.2, 0.8]]
b_1(v) = (0.4, 0.3, 0.1, 0.2)
• N : number of hidden states
• Q : state set
– Q = {q_1, q_2, …, q_N}
• M : number of observation symbols
• S : observation symbol set
– S = {s_1, s_2, …, s_M}
• A : transition probabilities
• B : observation probabilities
• π : initial state probabilities
• λ : HMM model
– λ = (A, B, π)
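The parameter triple λ = (π, A, B) maps directly onto arrays. A minimal sketch using this slide's values; π and A are as shown above, while only b_1(v) is fully legible in the original figure, so the remaining rows of B are assumed to repeat it:

import numpy as np

# lambda = (pi, A, B) for the N = 3, M = 4 example above.
pi = np.array([1.0, 0.0, 0.0])
A  = np.array([[0.7, 0.3, 0.0],
               [0.0, 0.4, 0.6],
               [0.0, 0.2, 0.8]])
# Row 0 is b_1(v) from the slide; the other rows are assumed to match it.
B  = np.array([[0.4, 0.3, 0.1, 0.2],
               [0.4, 0.3, 0.1, 0.2],
               [0.4, 0.3, 0.1, 0.2]])

# Constraint checks from the notation slide: every row is a distribution.
assert np.isclose(pi.sum(), 1.0)
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)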
Markov Model Rule
• Observation sequence
– O = {o_1, o_2, …, o_T}
• Chain rule
– P(o_1, o_2, …, o_T) = Π_{i=1}^{T} P(o_i | o_{i-1}, …, o_1)
• Markov assumption
– Observation o_i is affected only by observation o_{i-1}
– P(o_i | o_1, …, o_{i-1}) = P(o_i | o_{i-1})
• Markov chain rule
– P(o_1, o_2, …, o_T) = Π_{i=1}^{T} P(o_i | o_{i-1})
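A small worked example of the Markov chain rule (no hidden states yet): score a symbol sequence as P(o_1) Π P(o_i | o_{i-1}). All probability values here are invented for illustration:

import numpy as np

p0 = np.array([0.5, 0.3, 0.2])      # P(o_1)
P  = np.array([[0.6, 0.3, 0.1],     # P(o_i = col | o_{i-1} = row)
               [0.2, 0.5, 0.3],
               [0.1, 0.4, 0.5]])

def chain_prob(obs):
    """Markov chain rule: P(o_1..o_T) = P(o_1) * prod_i P(o_i | o_{i-1})."""
    p = p0[obs[0]]
    for prev, cur in zip(obs, obs[1:]):
        p *= P[prev, cur]
    return p

print(chain_prob([0, 1, 2]))   # 0.5 * 0.3 * 0.3 = 0.045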
Agenda
• Hidden Markov Model
• Inference of Hidden Markov Model
• Path Tracking of HMM
• Learning of Hidden Markov Model
• Hidden Markov Model Applications
• Summary & Review
Three Basic Problems
• Evaluation (estimation) problem
– given an HMM λ and an observation sequence o_1, o_2, …, o_T
– compute the probability of the observation, P(o_1, o_2, …, o_T | λ)
– Solution: Forward algorithm, Backward algorithm
• Decoding problem
– given an HMM λ and an observation sequence o_1, o_2, …, o_T
– compute the most likely state sequence s_{q_1}, s_{q_2}, …, s_{q_T}
– i.e. argmax_{q_1,…,q_T} P(o_1, o_2, …, o_T, q_1, …, q_T | λ)
– Solution: Viterbi algorithm
• Learning / optimization problem
– given an HMM λ and an observation sequence o_1, o_2, …, o_T
– find an HMM λ̄ such that P(o_1, o_2, …, o_T | λ̄) ≥ P(o_1, o_2, …, o_T | λ)
– Solution: Baum-Welch algorithm
The Evaluation Problem
• We know:
P(o_1, o_2, …, o_T, q_1, …, q_T | λ) = π_{q_1} b_{q_1}(o_1) Π_{k=2}^{T} a_{q_{k-1} q_k} b_{q_k}(o_k)
• From this:
P(o_1, o_2, …, o_T | λ) = Σ_{q_1=1}^{N} Σ_{q_2=1}^{N} ⋯ Σ_{q_T=1}^{N} π_{q_1} b_{q_1}(o_1) Π_{k=2}^{T} a_{q_{k-1} q_k} b_{q_k}(o_k)
• Obvious:
For sufficiently large values of T, it is infeasible to compute the above term over all N^T possible state sequences → we need another solution
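The blow-up is easy to see in code: direct evaluation sums P(O, Q | λ) over every one of the N^T state sequences. A sketch, reusing the earlier 3-state example (rows of B other than b_1 are assumed values for illustration):

import itertools
import numpy as np

pi = np.array([1.0, 0.0, 0.0])
A  = np.array([[0.7, 0.3, 0.0], [0.0, 0.4, 0.6], [0.0, 0.2, 0.8]])
B  = np.array([[0.4, 0.3, 0.1, 0.2],     # b_1(v) from the example slide
               [0.2, 0.2, 0.3, 0.3],     # assumed for illustration
               [0.1, 0.1, 0.4, 0.4]])    # assumed for illustration

def brute_force_likelihood(obs):
    """Sum P(O, Q | lambda) over all N^T state sequences: O(T * N^T) work."""
    N, total = len(pi), 0.0
    for q in itertools.product(range(N), repeat=len(obs)):
        p = pi[q[0]] * B[q[0], obs[0]]
        for t in range(1, len(obs)):
            p *= A[q[t - 1], q[t]] * B[q[t], obs[t]]
        total += p
    return total

print(brute_force_likelihood([0, 1, 3, 2]))   # feasible only for tiny T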
The Forward Algorithm
• At time t and state i, the probability of the partial observation sequence o_1, o_2, …, o_t is α_t(i), stored as an array alpha[time][state]
α_1(i) = π_i b_i(o_1), 1 ≤ i ≤ N
α_{t+1}(j) = [Σ_{i=1}^{N} α_t(i) a_ij] b_j(o_{t+1})
• As a result, at the last time T:
P(o_1, o_2, …, o_T | λ) = Σ_{i=1}^{N} alpha[T][i]
Forward Algorithm
• Definition
– α_t(i) = P(o_1 o_2 … o_t, q_t = s_i | λ)
• Algorithm
– Initialization
α_1(i) = π_i b_i(o_1), 1 ≤ i ≤ N
– Induction
α_{t+1}(j) = [Σ_{i=1}^{N} α_t(i) a_ij] b_j(o_{t+1}), (1 ≤ t ≤ T-1, 1 ≤ j ≤ N)
– End condition
P(O | λ) = Σ_{i=1}^{N} α_T(i)
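A minimal NumPy sketch of the recursion above; the forward pass costs O(N²T) instead of O(T·N^T). Parameters are the same illustrative values used earlier (B rows partly assumed):

import numpy as np

def forward(obs, pi, A, B):
    """alpha[t, i] = P(o_1 .. o_{t+1}, q_{t+1} = s_i | lambda) (0-based t)."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                       # initialization
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]   # induction
    return alpha

pi = np.array([1.0, 0.0, 0.0])
A  = np.array([[0.7, 0.3, 0.0], [0.0, 0.4, 0.6], [0.0, 0.2, 0.8]])
B  = np.array([[0.4, 0.3, 0.1, 0.2], [0.2, 0.2, 0.3, 0.3], [0.1, 0.1, 0.4, 0.4]])

alpha = forward([0, 1, 3, 2], pi, A, B)
print(alpha[-1].sum())   # end condition: P(O | lambda) = sum_i alpha_T(i)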
Backward Algorithm
• Definition
– β_t(i) = P(o_{t+1} … o_T | q_t = s_i, λ)
• Algorithm
– Initialization
β_T(i) = 1, 1 ≤ i ≤ N
– Induction
β_t(i) = Σ_{j=1}^{N} a_ij b_j(o_{t+1}) β_{t+1}(j), (t = T-1, …, 1, 1 ≤ i ≤ N)
– End condition
P(O | λ) = Σ_{i=1}^{N} π_i b_i(o_1) β_1(i)
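A matching sketch of the backward recursion; its end condition reproduces the same P(O | λ) as the forward pass, which makes a handy consistency check:

import numpy as np

def backward(obs, pi, A, B):
    """beta[t, i] = P(o_{t+2} .. o_T | q_{t+1} = s_i, lambda) (0-based t)."""
    T, N = len(obs), len(pi)
    beta = np.ones((T, N))                               # beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])   # induction
    return beta

pi = np.array([1.0, 0.0, 0.0])
A  = np.array([[0.7, 0.3, 0.0], [0.0, 0.4, 0.6], [0.0, 0.2, 0.8]])
B  = np.array([[0.4, 0.3, 0.1, 0.2], [0.2, 0.2, 0.3, 0.3], [0.1, 0.1, 0.4, 0.4]])

obs = [0, 1, 3, 2]
beta = backward(obs, pi, A, B)
print((pi * B[:, obs[0]] * beta[0]).sum())   # P(O | lambda), matches forward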
Agenda
• Hidden Markov Model
• Inference of Hidden Markov Model
• Path Tracking of HMM
• Learning of Hidden Markov Model
• Hidden Markov Model Applications
• Summary & Review
The Decoding Problem
• Finding the “optimal” state sequence associated with the given observation sequence
Forward-Backward
• Optimality criterion: choose the states that are individually most likely at each time t
• The probability of being in state i at time t:
γ_t(i) = P(q_t = i | O, λ) = α_t(i) β_t(i) / Σ_{j=1}^{N} α_t(j) β_t(j)
• α_t(i) accounts for the partial observation sequence o_1, o_2, …, o_t
• β_t(i) accounts for the remainder o_{t+1}, o_{t+2}, …, o_T
• The individually most likely state at time t is then q_t* = argmax_{1≤i≤N} γ_t(i)
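Combining the two passes gives γ_t(i); a sketch of this forward-backward (posterior) decoding, repeating the small forward/backward routines so the snippet stays self-contained (parameters again partly assumed):

import numpy as np

def forward(obs, pi, A, B):
    alpha = np.zeros((len(obs), len(pi)))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, len(obs)):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha

def backward(obs, pi, A, B):
    beta = np.ones((len(obs), len(pi)))
    for t in range(len(obs) - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta

def posteriors(obs, pi, A, B):
    """gamma_t(i) = alpha_t(i) beta_t(i) / sum_j alpha_t(j) beta_t(j)."""
    g = forward(obs, pi, A, B) * backward(obs, pi, A, B)
    return g / g.sum(axis=1, keepdims=True)

pi = np.array([1.0, 0.0, 0.0])
A  = np.array([[0.7, 0.3, 0.0], [0.0, 0.4, 0.6], [0.0, 0.2, 0.8]])
B  = np.array([[0.4, 0.3, 0.1, 0.2], [0.2, 0.2, 0.3, 0.3], [0.1, 0.1, 0.4, 0.4]])

gamma = posteriors([0, 1, 3, 2], pi, A, B)
print(gamma.argmax(axis=1))   # individually most likely state at each t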
Viterbi Algorithm
• Solution to the model decoding problem
– Given Y = O = o_1 o_2 o_3 ⋯ o_T,
– What is the best state sequence X̂_{1,T} among all possible state sequences that might have produced O?
• The best?
– Evaluated in probabilistic terms, two candidates:
1. A sequence of the most likely states at each time (greedy fashion)?
2. The most likely complete state sequence (from any one of the start states to any one of the final states): P(X, O | λ)
Viterbi Path
• X̂_{1,T} is the path whose joint probability with the observation is the most likely:
X̂ = argmax_X P(O, X | λ)
• Simple rewriting (let X = X_{1,T} = x_1 x_2 … x_T):
P(O, X | λ) = P(O | X, λ) P(X | λ) = π_{x_1} a_{x_1 x_2} a_{x_2 x_3} ⋯ a_{x_{T-1} x_T} · b_{x_1}(o_1) b_{x_2}(o_2) ⋯ b_{x_T}(o_T)
• N^T possible paths of X
• O(T·N^T) multiplications with exhaustive enumeration
Viterbi Path Likelihood
Partial Viterbi path likelihood (for X_{1,t}, t ≤ T):
δ_t(j) = max_{X_{1,t-1}} P(o_1 ⋯ o_t, X_{1,t-1}, x_t = j | λ)
       = max_{1≤i≤N} [δ_{t-1}(i) a_ij] b_j(o_t), j = 1, …, N, t = 1, …, T
Back pointer to the previous best state:
ψ_t(j) = argmax_{1≤i≤N} δ_{t-1}(i) a_ij
Viterbi Algorithm
• Initialization
δ_1(i) = π_i b_i(o_1), ψ_1(i) = 0
• Recursion
δ_t(j) = max_{1≤i≤N} δ_{t-1}(i) a_ij b_j(o_t)
ψ_t(j) = argmax_{1≤i≤N} δ_{t-1}(i) a_ij
• Termination
P* = max_{1≤i≤N} δ_T(i)
x*_T = argmax_{1≤i≤N} δ_T(i)
• Backtracking
x*_t = ψ_{t+1}(x*_{t+1}), t = T-1, …, 1
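A sketch of the full recursion, termination, and backtracking (same illustrative parameters as the earlier snippets); it returns the best path X* and P(O, X* | λ):

import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely state path for obs, plus its joint probability with O."""
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))                # partial path likelihoods
    psi = np.zeros((T, N), dtype=int)       # back pointers (psi_1 unused)
    delta[0] = pi * B[:, obs[0]]            # initialization
    for t in range(1, T):                   # recursion
        scores = delta[t - 1][:, None] * A  # scores[i, j] = delta_{t-1}(i) a_ij
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    path = [int(delta[-1].argmax())]        # termination: x*_T
    for t in range(T - 1, 0, -1):           # backtracking
        path.append(int(psi[t][path[-1]]))
    return path[::-1], delta[-1].max()

pi = np.array([1.0, 0.0, 0.0])
A  = np.array([[0.7, 0.3, 0.0], [0.0, 0.4, 0.6], [0.0, 0.2, 0.8]])
B  = np.array([[0.4, 0.3, 0.1, 0.2], [0.2, 0.2, 0.3, 0.3], [0.1, 0.1, 0.4, 0.4]])

print(viterbi([0, 1, 3, 2], pi, A, B))      # (best path, P(O, X* | lambda))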
Viterbi Algorithm: Example
• Viterbi trellis construction (figure): a 3-state HMM over the symbols {R, G, B} with π = [1 0 0]^T; the original diagram lists the transition and emission probabilities and the δ value computed at each trellis node for the observation sequence R R G B.
• Result: the best path is X* = 1 → 1 → 2 → 3, with
P(O, X* | λ) = Pr(RRGB, X = 1123 | λ) = 0.01008
Agenda
• Hidden Markov Model
• Inference of Hidden Markov Model
• Path Tracking of HMM
• Learning of Hidden Markov Model
• Hidden Markov Model Applications
• Summary & Review
The Learning / Optimization problem
• How do we adjust the model parameters λ to maximize P(O | λ)?
• Parameter estimation
• Baum-Welch algorithm (EM: Expectation-Maximization)
• Iterative procedure
Parameter Estimation
• Probability of being in state i at time t and state j at time t+1:
ξ_t(i, j) = P(q_t = i, q_{t+1} = j | O, λ) = α_t(i) a_ij b_j(o_{t+1}) β_{t+1}(j) / Σ_{i=1}^{N} Σ_{j=1}^{N} α_t(i) a_ij b_j(o_{t+1}) β_{t+1}(j)
• Probability of being in state i at time t, given the entire observation sequence and the model: γ_t(i)
• We can relate these by summing over j:
γ_t(i) = Σ_{j=1}^{N} ξ_t(i, j)
Parameter Estimation (3)
• By summing over the time index t:
– Σ_{t=1}^{T-1} γ_t(i) = expected number of times that state i is visited in O = expected number of transitions made from state i
– Σ_{t=1}^{T-1} ξ_t(i, j) = expected number of transitions made from state i to j in O
• Update λ̄ = (Ā, B̄, π̄) using ξ_t(i, j) & γ_t(i)
– π̄_i = γ_1(i): expected frequency (number of times) in state i at time t = 1
Parameter Estimation (5)
• New transition probability:
ā_ij = (expected number of transitions from state i to j) / (expected number of transitions from state i)
     = Σ_{t=1}^{T-1} ξ_t(i, j) / Σ_{t=1}^{T-1} γ_t(i)
Parameter Estimation (6)
• New observation probability:
b̄_j(k) = (expected number of times in state j observing symbol v_k) / (expected number of times in state j)
        = Σ_{t=1, o_t=v_k}^{T} γ_t(j) / Σ_{t=1}^{T} γ_t(j)
Parameter Estimation (7)
• From λ = (A, B, π), we define a new model λ̄ = (Ā, B̄, π̄)
• The new model is more likely than the old one, in the sense that P(O | λ̄) ≥ P(O | λ)
– the observation sequence is more likely to be produced by the new model
– this has been proved by Baum and his colleagues
• Iteratively use the new model in place of the old one and repeat the re-estimation calculation: "ML estimation"
Baum-Welch Algorithm (1)
• Definition
– ξ_t(i, j) = P(q_t = s_i, q_{t+1} = s_j | O, λ)
• Calculation
– ξ_t(i, j) = α_t(i) a_ij b_j(o_{t+1}) β_{t+1}(j) / Σ_{i=1}^{N} Σ_{j=1}^{N} α_t(i) a_ij b_j(o_{t+1}) β_{t+1}(j)
• Definition
– γ_t(i) = P(q_t = s_i | O, λ)
• Calculation
– γ_t(i) = Σ_{j=1}^{N} ξ_t(i, j)
Baum-Welch Algorithm (2)
• Algorithm
1. Set the initial model λ_0
2. Estimation: calculate γ_t(i) and ξ_t(i, j) under the current model
3. Maximization: find the new model λ
π̄_i = γ_1(i)
ā_ij = Σ_{t=1}^{T-1} ξ_t(i, j) / Σ_{t=1}^{T-1} γ_t(i)
b̄_i(k) = Σ_{t=1}^{T} γ_t(i) δ(o_t, v_k) / Σ_{t=1}^{T} γ_t(i), where δ(o_t, v_k) = 1 if o_t = v_k, and 0 otherwise
4. If P(O | λ) - P(O | λ_0) < threshold, stop
5. Else set λ_0 = λ and go to step 2 (repetition)
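A sketch of one Baum-Welch re-estimation step implementing the ξ/γ formulas above for a single observation sequence. It omits the numerical scaling a practical implementation needs for long sequences; parameter values are the illustrative ones used earlier:

import numpy as np

def baum_welch_step(obs, pi, A, B):
    """One EM re-estimation of (pi, A, B) from a single sequence obs."""
    obs = np.asarray(obs)
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N)); alpha[0] = pi * B[:, obs[0]]     # forward pass
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta = np.ones((T, N))                                     # backward pass
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    # xi[t, i, j] = alpha_t(i) a_ij b_j(o_{t+1}) beta_{t+1}(j), normalized
    xi = (alpha[:-1, :, None] * A[None, :, :]
          * (B[:, obs[1:]].T * beta[1:])[:, None, :])
    xi /= xi.sum(axis=(1, 2), keepdims=True)
    gamma = alpha * beta                                       # state posteriors
    gamma /= gamma.sum(axis=1, keepdims=True)
    new_pi = gamma[0]                                          # pi_i = gamma_1(i)
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]   # transition counts
    new_B = np.array([gamma[obs == k].sum(axis=0)              # emission counts
                      for k in range(B.shape[1])]).T
    new_B /= gamma.sum(axis=0)[:, None]
    return new_pi, new_A, new_B

pi = np.array([1.0, 0.0, 0.0])
A  = np.array([[0.7, 0.3, 0.0], [0.0, 0.4, 0.6], [0.0, 0.2, 0.8]])
B  = np.array([[0.4, 0.3, 0.1, 0.2], [0.2, 0.2, 0.3, 0.3], [0.1, 0.1, 0.4, 0.4]])
obs = [0, 1, 3, 2, 2, 3]
for _ in range(10):                              # in practice, iterate until the
    pi, A, B = baum_welch_step(obs, pi, A, B)    # gain in P(O|lambda) is tiny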
Classification Algorithm
• Classification: choose the class whose model best explains the observation
k̂ = argmax_k P(Y_{1,T} | λ_k) P(λ_k) ≈ argmax_k P(λ_k) max_{1≤i≤N} δ_T^{(k)}(i)
• Viterbi algorithm (supplies the max_i δ_T^{(k)}(i) score per class model)
• Domain/linguistic knowledge
– Markov source model for character probability:
P(W) = P(w_1 w_2 … w_n) = P(w_1) P(w_2 | w_1) ⋯ P(w_n | w_{n-1})
P("123") = P("1") P("2" | "1") P("3" | "2")
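A sketch of per-class HMM classification. The slide scores each class with the Viterbi path likelihood max_i δ_T(i); this snippet uses the full forward likelihood P(O | λ_k) instead, which is the exact version of the same decision rule. Both class models here are hypothetical:

import numpy as np

def likelihood(obs, pi, A, B):
    """P(O | lambda) via the forward algorithm."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

# Two hypothetical class models lambda_k = (pi, A, B), equal priors P(lambda_k).
m0 = (np.array([1.0, 0.0]),
      np.array([[0.9, 0.1], [0.1, 0.9]]),
      np.array([[0.8, 0.2], [0.3, 0.7]]))
m1 = (np.array([0.5, 0.5]),
      np.array([[0.5, 0.5], [0.5, 0.5]]),
      np.array([[0.2, 0.8], [0.6, 0.4]]))

obs = [0, 0, 1, 0]
scores = [likelihood(obs, *m) for m in (m0, m1)]   # times P(lambda_k) if priors differ
print(int(np.argmax(scores)))                      # k_hat = argmax_k P(O|lambda_k) P(lambda_k)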
Agenda
• Hidden Markov Model
• Inference of Hidden Markov Model
• Path Tracking of HMM
• Learning of Hidden Markov Model
• Hidden Markov Model Applications
• Summary & Review
University of Alberta
• National ICT Australia project, University of Alberta, Canada
• Objective
– Human motion/gesture recognition
• Sensors
– Active, magnetic-field, acoustic, laser, and camera sensors
• Method
– Coupled hidden Markov model (CHMM)
– Coupled HMMs provide an efficient way to resolve many complex problems, and offer superior training speeds, model likelihoods, and robustness to initial conditions
– Proposed by M. Brand (1997)
[M. Brand, N. Oliver, and A. Pentland, "Coupled hidden Markov models for complex action recognition," in IEEE Intl. Conf. Computer Vision and Pattern Recognition, 1997, pp. 994-999.]
University of Bologna
• Micrel Lab, University of Bologna, Italy (2004)
• Research
– Setting up ubiquitous environments
– Sensory data processing
– Gesture recognition
• Sensors
– Developed: wireless MOCA (motion capture with integrated accelerometers)
• Accelerometer, gyroscope
• Small size, low power consumption, wireless
• Worn on the body
• Recognition method
– Hidden Markov Model
MIT Media LAB
• Media Laboratory, Massachusetts Institute of Technology
• Area: Visual Contextual Awareness in Wearable Computing (1998)
• Sensor: vision
• Method
– Probabilistic object recognition
• Based on observed, diverse feature vectors
• Using probabilistic relations (O: object, M: measurement)
– Task recognition with HMM
eWatch Sensor Platform
• CMU Computer Science Lab, 2005
• Activity recognition + improved power consumption
• Hardware
– LCD, LED, vibration motor, speaker, Bluetooth for wireless communication
– Li-Ion battery with a capacity of 700 mAh
• Sensors
– A two-axis accelerometer (ADXL202; +/- 2 g)
– Microphone, light & temperature sensors
• Method
– Multi-class SVMs + HMM-based selective sampling
Summary
• Hidden Markov Model introduction
• HMM inference method (estimation)
• HMM path tracking (decoding)
• HMM learning
• HMM application