Master 2 MOSIG
Knowledge Representation and Reasoning
HMM and Bayesian Filtering
Elise Arnaud
Université Joseph Fourier / INRIA Rhône-Alpes
Overview
1. Introduction on HMM
2. Filtering : Problem Statement
3. Overview of existing solutions
4. Forward algorithm
5. Kalman filter
6. Particle filter
7. Applications
8. Conclusion
References
– A. Doucet, S.J. Godsill, C. Andrieu. On sequential Monte Carlo sampling methods for Bayesian filtering. Statistics and Computing, 10, 197-208, 2000.
– S.M. Arulampalam, S. Maskell, N.J. Gordon, T.C. Clapp. A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Transactions on Signal Processing, 50(2), 174-188, February 2002.
– A. Doucet, N. de Freitas, N. Gordon (eds). Sequential Monte Carlo Methods in Practice. Springer-Verlag, 2001.
Reminder on Markov chain
– we observe a system at discrete times 0, 1, 2, ..., t
– the system can be in one state of a collection of possible states
– the observation of the system is considered as an experiment whose (random) result is the system's state → stochastic process
examples :
– state of an engine (working, not working)
– weather (rain, cloud, snow, sun)
– robot’s position on a grid
The system is evolving in time
Markov property : the state of a system at time t only depends on the state at
time t − 1
Knowing the present, we can forget the past to predict the future
Let X be a Markov chain :
X = {X0, X1, X2, . . . , Xk, . . .} = {Xk ; k ≥ 0}
Xk takes its value in a finite set of possible values : the state space X
p(xk+1|x0, x1, . . . , xk) = p(xk+1|xk)
To define a Markov chain X = {Xk ; k ≥ 0}, one needs :
– the state space X (the m possible values if the state space is discrete)
– the initial distribution p(X0)
– the transition matrix Q that describes the probabilities to go from one state to
another p(xk|xk−1)
To do inference calculations, we will use the joint law :
p(x0:t) = p(x0) ∏_{k=1:t} p(xk|xk−1)
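As a minimal sketch, the joint law above can be turned into code: simulate the chain step by step and accumulate the product p(x0) ∏_{k=1:t} p(xk|xk−1). The engine states come from the example above; the probabilities are illustrative assumptions, not values from the course.

```python
import random

# Hypothetical two-state engine example (the states are from the slides,
# the numbers are illustrative assumptions): "working" or "broken".
states = ["working", "broken"]
p0 = {"working": 0.9, "broken": 0.1}                  # initial distribution p(X0)
Q = {"working": {"working": 0.95, "broken": 0.05},    # transition law p(xk|xk-1)
     "broken":  {"working": 0.30, "broken": 0.70}}

def sample_chain(t, seed=0):
    """Draw a trajectory x_0, ..., x_t from the Markov chain."""
    rng = random.Random(seed)
    x = rng.choices(states, weights=[p0[s] for s in states])[0]
    traj = [x]
    for _ in range(t):
        x = rng.choices(states, weights=[Q[x][s] for s in states])[0]
        traj.append(x)
    return traj

def joint_probability(traj):
    """Joint law p(x_0:t) = p(x0) * prod_{k=1:t} p(xk|xk-1)."""
    p = p0[traj[0]]
    for prev, cur in zip(traj, traj[1:]):
        p *= Q[prev][cur]
    return p
```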
Markov chain ... Hidden Markov chain
Let X = {Xk ; k ≥ 0} be a Markov chain
What we are interested in :
Knowing the state of the chain at instant k, xk ∈ X
– We would like to know the weather (rain, cloud, sun, snow)
– We would like to know the position of a robot on a grid
Problem : the state of the system is indirectly / partially observed
– We would like to know the weather ... but we measure the temperature only
– We would like to know the position of a robot on a grid ... but we gather data from a gyroscope on top of the robot
Such a system is described by a hidden Markov chain
Hidden Markov chain
Hidden Markov Model HMM = {Xk, Zk}k≥0
{Xk}k≥0 : state process
– state space X
– Markovian process
– transition law p(xk|xk−1) (transition matrix Q if X discrete and finite)
{Zk}k≥0 : observation (measurement) process
– observation space Z
– the measurement at time k only depends on the state at time k
– likelihood p(zk|xk) (likelihood matrix B if Z discrete and finite)
Hidden Markov chain
Hidden Markov Chain : model of a dynamic system
described by
1. state space X and observation space Z
2. initial distribution p(X0)
3. transition law p(xk|x0:k−1, z1:k−1) = p(xk|xk−1)
4. likelihood p(zk|x0:k−1, z1:k−1) = p(zk|xk)
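To make the four ingredients concrete, here is a minimal sketch that samples a trajectory from a hypothetical weather HMM: hidden state = weather, observation = a coarse temperature reading. The state space, observation space and all probabilities are illustrative assumptions, not the course's numbers.

```python
import random

# Hypothetical weather HMM (illustrative numbers).
states = ["rain", "sun"]
obs_space = ["cold", "warm"]
p0 = {"rain": 0.5, "sun": 0.5}                 # initial distribution p(X0)
Q = {"rain": {"rain": 0.7, "sun": 0.3},        # transition law p(xk|xk-1)
     "sun":  {"rain": 0.2, "sun": 0.8}}
B = {"rain": {"cold": 0.9, "warm": 0.1},       # likelihood p(zk|xk)
     "sun":  {"cold": 0.3, "warm": 0.7}}

def sample_hmm(t, seed=0):
    """Draw (x_0:t, z_1:t): the hidden chain, then one observation per step."""
    rng = random.Random(seed)
    x = rng.choices(states, weights=[p0[s] for s in states])[0]
    xs, zs = [x], []
    for _ in range(t):
        x = rng.choices(states, weights=[Q[x][s] for s in states])[0]
        xs.append(x)
        zs.append(rng.choices(obs_space, weights=[B[x][o] for o in obs_space])[0])
    return xs, zs
```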
Goal
calculate the state of a system from a set of observations
– estimate the weather today from a set of temperatures measured till today
– estimate the position of the robot from the gyroscope data
From a sequence of observations z1:k = {z1, . . . , zk}, the goal is to find the state xk for which the probability p(xk|z1:k) is maximal.
filtering problem
Goal
other related problems
From a sequence of observations z1:k = {z1, . . . , zk}, and the model, estimate
– the sequence of states x0:k = {x0, . . . , xk} for which the probability p(x0:k|z1:k) is maximal : trajectography
– the most probable (previous) state at time t, with t < k : smoothing
– the most probable (future) state at time t, with t > k : prediction
– the probability of occurrence of the sequence of observations (to study rare events)
From a sequence of observations z1:k, and a sequence of states x1:k, estimate the model parameters : learning
Applications
positioning, navigation and tracking
– target tracking
– computer vision
– mobile robotics
– ambient intelligence
– sensor networks, etc.
Applications
among others ...
data assimilation
environmental sciences
(oceanography, meteorology, atmospheric pollution)
information theory
bioinformatics
speech recognition
handwriting recognition
finance
...
Problem Statement
Dynamic system modeled as a Hidden Markov Chain
described by
1. state space X ; measurement space Z
2. the initial distribution p(X0)
3. an evolution model (transition law) p(xk|x0:k−1, z1:k−1) = p(xk|xk−1)
4. an observation model (likelihood) p(zk|x0:k−1, z1:k−1) = p(zk|xk)
Problem Statement
Dynamic system modeled as a Hidden Markov Chain
We have :
p(x0:t, z1:t) = p(x0) ∏_{k=1:t} p(xk|xk−1) p(zk|xk)
p(x0:t, z1:t) = p(xt|xt−1) p(zt|xt) p(x0:t−1, z1:t−1)
Problem Statement
Filtering - tracking :
estimation of the state given the past and present measurements
p(xk|z1:k)
Trajectography :
estimation of the state trajectory given the past and present measurements
p(x0:k|z1:k)
Smoothing :
estimation of the state given the past and some future measurements
p(xk|z1:t) t > k
Prediction :
estimation of a future state given the measurements up to a past time
p(xk|z1:t) t < k
Problem Statement
Filtering
– estimation of the state given the past and present measurements
filtering distribution : p(xk|z1:k)
– This estimation has to be sequential, i.e. :
p(xk−1|z1:k−1) → Algorithm → p(xk|z1:k)
Problem Statement
Toy example : the white car tracking
– state xk : position + velocity
– evolution model : the car evolves at constant velocity
– observation zk : detected white cars
– observation model : the tracked car should be one of
the detected cars
p(xk|z1:k) current position of the white car
knowing all previous and current detected white cars
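A constant-velocity evolution model like the one above can be sketched in a few lines; the 1-D state layout, the time step and the noise level are illustrative assumptions, not values from the course.

```python
import random

# Sketch of a constant-velocity evolution model for the car example:
# state x = (position, velocity); numbers below are assumptions.
def evolve(pos, vel, rng, dt=1.0, noise_std=0.5):
    """One step of p(xk|xk-1): the car keeps its velocity, up to noise."""
    new_pos = pos + vel * dt + rng.gauss(0.0, noise_std)
    new_vel = vel + rng.gauss(0.0, noise_std)
    return new_pos, new_vel

rng = random.Random(0)
trajectory = [(0.0, 10.0)]        # start at position 0 with velocity 10
for _ in range(5):
    trajectory.append(evolve(*trajectory[-1], rng))
```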
Problem Statement
Objective : Sequential estimation of the filtering distribution p(xk|z1:k)
p(xk|z1:k) = C p(xk, z1:k)
= C ∫ p(xk, Xk−1, z1:k−1, zk) dXk−1
= C ∫ p(zk|xk, Xk−1, z1:k−1) p(xk, Xk−1, z1:k−1) dXk−1
= C p(zk|xk) ∫ p(xk|Xk−1, z1:k−1) p(Xk−1, z1:k−1) dXk−1
= C p(zk|xk) ∫ p(xk|Xk−1) p(Xk−1, z1:k−1) dXk−1
= C p(zk|xk) ∫ p(xk|Xk−1) p(Xk−1|z1:k−1) p(z1:k−1) dXk−1
= C′ p(zk|xk) ∫ p(xk|Xk−1) p(Xk−1|z1:k−1) dXk−1
where C′ = C p(z1:k−1)
Problem Statement
Objective : Sequential estimation of the filtering distribution p(xk|z1:k)
i.e. estimation of p(xk|z1:k) knowing p(xk−1|z1:k−1)
p(xk|z1:k) = C p(xk, z1:k) = C′ p(zk|xk) ∫ p(xk|Xk−1) p(Xk−1|z1:k−1) dXk−1
with
C = 1 / p(z1:k)
then
C′ = p(z1:k−1) / p(z1:k)
= p(z1:k−1) / p(z1:k−1, zk)
= p(z1:k−1) / [ p(zk|z1:k−1) p(z1:k−1) ]
= 1 / p(zk|z1:k−1)
= 1 / ∫ p(zk, Xk|z1:k−1) dXk
= 1 / ∫ p(zk|Xk) p(Xk|z1:k−1) dXk
Problem Statement
Objective : Sequential estimation of the filtering distribution p(xk|z1:k)
i.e. estimation of p(xk|z1:k) knowing p(xk−1|z1:k−1)
Optimal Bayesian Filter
1. prediction :
p(xk|z1:k−1) = ∫ p(xk|Xk−1) p(Xk−1|z1:k−1) dXk−1
2. update :
p(xk|z1:k) = p(zk|xk) p(xk|z1:k−1) / ∫ p(zk|Xk) p(Xk|z1:k−1) dXk
... but how to compute these two integrals ?
Problem Statement
So far ...
p(xk−1|z1:k−1) → Algorithm → p(xk|z1:k)
– exact solution : Optimal Bayesian Filter
– but this solution implies the calculation of two huge integrals ...
– various algorithms can be proposed, depending on the model :
p(xk|xk−1) and p(zk|xk)
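When the state space is finite, the two integrals of the optimal Bayesian filter reduce to sums and the recursion can be sketched directly. The weather model below is a hypothetical illustration (states, observations and numbers are assumptions); Q is the transition law and B the likelihood, as in the HMM definition.

```python
# Minimal sketch of the optimal Bayesian filter over a finite state space.
states = ["rain", "sun"]
p0 = {"rain": 0.5, "sun": 0.5}
Q = {"rain": {"rain": 0.7, "sun": 0.3},
     "sun":  {"rain": 0.2, "sun": 0.8}}
B = {"rain": {"cold": 0.9, "warm": 0.1},
     "sun":  {"cold": 0.3, "warm": 0.7}}

def bayes_filter_step(belief, z):
    """p(x_{k-1}|z_1:k-1) -> Algorithm -> p(x_k|z_1:k)."""
    # 1. prediction: p(xk|z_1:k-1) = sum_x' p(xk|x') p(x'|z_1:k-1)
    predicted = {x: sum(Q[xp][x] * belief[xp] for xp in states) for x in states}
    # 2. update: multiply by the likelihood p(zk|xk) and normalize
    unnorm = {x: B[x][z] * predicted[x] for x in states}
    c = sum(unnorm.values())      # = p(zk|z_1:k-1), the normalization constant
    return {x: unnorm[x] / c for x in states}

belief = dict(p0)                 # start from p(X0)
for z in ["cold", "cold", "warm"]:
    belief = bayes_filter_step(belief, z)
```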
Overview of existing solutions
– Both X and Z are discrete and finite → Forward algorithm
– Otherwise
– Linear Gaussian model → Kalman filter
– weakly nonlinear, Gaussian → extensions of the Kalman filter
– nonlinear, non-Gaussian → Particle filter (Sequential Monte Carlo methods)
Forward algorithm
Let us suppose that the HMM is characterized by the transition matrix Q defined by :
qij = p(Xk+1 = j | Xk = i)
and the observation matrix B defined by :
bi(j) = p(Zk = j | Xk = i)
then we can use the forward algorithm to calculate p(xk|z1:k)
Forward algorithm
Example of a 2-state HMM, with states marche ("on", M) and arrêt ("off", A)
[State diagram: M stays in M with probability 0.9 and moves to A with 0.1 ; A moves to M with 0.7 and stays in A with 0.3]
Q = | qMM qMA | = | 0.9 0.1 |
    | qAM qAA |   | 0.7 0.3 |
Forward algorithm
Example of the same 2-state HMM, with observations R and V
[Diagram: state M emits R with probability 0.2 and V with 0.8 ; state A emits R with 0.95 and V with 0.05]
B = | bM(R) bM(V) | = | 0.2  0.8  |
    | bA(R) bA(V) |   | 0.95 0.05 |
Forward algorithm
Example of a 2-state HMM - representation on a lattice
[Lattice figure: the states M and A unrolled over times t = 0, 1, 2, ..., k]
p(X0 = M) = µ(M) = 0.9 ; p(X0 = A) = µ(A) = 0.1
p(X3 = M | Z1 = R, Z2 = R, Z3 = V) ?
p(X3 = A | Z1 = R, Z2 = R, Z3 = V) ?
Forward algorithm
Example of a 2-state HMM - representation on a lattice
[Lattice figure: the states M and A unrolled over time]
p(X1 = M |Z1 = R)
∝ p(X1 = M,Z1 = R)
= [µ(M) ∗ p(X1 = M |X0 = M) + µ(A) ∗ p(X1 = M |X0 = A)] p(Z1 = R|X1 = M)
= [µ(M) qMM + µ(A) qAM ] bM (R)
= α1(M)
in a similar manner :
p(X1 = A|Z1 = R) ∝ p(X1 = A,Z1 = R)
= [µ(M) qMA + µ(A) qAA] bA(R) = α1(A)
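The same computation can be checked numerically with the example's numbers (Q, B and µ are from the slides; the dictionary layout is an implementation choice):

```python
# Numeric check of the alpha_1 computation above
# (M = marche, A = arrêt; observations R and V).
mu = {"M": 0.9, "A": 0.1}
Q = {"M": {"M": 0.9, "A": 0.1}, "A": {"M": 0.7, "A": 0.3}}
B = {"M": {"R": 0.2, "V": 0.8}, "A": {"R": 0.95, "V": 0.05}}

a1 = {i: sum(mu[j] * Q[j][i] for j in "MA") * B[i]["R"] for i in "MA"}
# a1["M"] = (0.9*0.9 + 0.1*0.7) * 0.2  = 0.176
# a1["A"] = (0.9*0.1 + 0.1*0.3) * 0.95 = 0.114
```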
Forward algorithm
Example of a 2-state HMM - representation on a lattice
[Lattice figure: the states M and A unrolled over time]
p(X2 = M |Z1 = R,Z2 = R)
∝ p(X2 = M,Z1 = R,Z2 = R)
= α1(M) ∗ qMM ∗ bM (R) + α1(A) ∗ qAM ∗ bM (R) = α2(M)
in a similar manner :
p(X2 = A|Z1 = R,Z2 = R)
∝ p(X2 = A,Z1 = R,Z2 = R)
= α1(M) ∗ qMA ∗ bA(R) + α1(A) ∗ qAA ∗ bA(R) = α2(A)
... we can do the same for p(X3|Z1 = R,Z2 = R,Z3 = V )
Forward algorithm
Generalization
We define αk(i) = p(z1:k, Xk = i), the probability to observe z1 . . . zk with a sequence of states that ends in state i.
We have p(Xk = i|z1:k) ∝ αk(i)
The forward algorithm gives us an efficient way to calculate αk+1(i) knowing αk(j), ∀ i, j ∈ X
Forward algorithm
Generalization
Initialization
α0(i) = µ(i)
Induction
αk+1(i) = [ ∑_{j=1:N} αk(j) qji ] bi(zk+1)
We can also calculate
p(z1:k) = ∑_{j=1:N} αk(j)
T observations, N states ⇒ N²T operations
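Putting the initialization and induction together, a sketch of the forward algorithm on the 2-state example (µ, Q and B from the slides) answers the question p(X3 = · | Z1 = R, Z2 = R, Z3 = V):

```python
# Forward algorithm on the 2-state example (M = marche, A = arrêt).
mu = {"M": 0.9, "A": 0.1}
Q = {"M": {"M": 0.9, "A": 0.1}, "A": {"M": 0.7, "A": 0.3}}
B = {"M": {"R": 0.2, "V": 0.8}, "A": {"R": 0.95, "V": 0.05}}
states = ["M", "A"]

def forward(observations):
    """Return alpha_k(i) = p(z_1:k, X_k = i) after the last observation."""
    alpha = dict(mu)                       # initialization: alpha_0(i) = mu(i)
    for z in observations:                 # induction: N^2 work per observation
        alpha = {i: sum(alpha[j] * Q[j][i] for j in states) * B[i][z]
                 for i in states}
    return alpha

alpha3 = forward(["R", "R", "V"])
p_z = sum(alpha3.values())                 # p(z_1:3)
posterior = {i: alpha3[i] / p_z for i in states}   # posterior["M"] ≈ 0.984
```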
Forward algorithm
[Figure: states x1, x2, . . . , xN at time k all feed state xi at time k+1 with weights q1i, . . . , qNi ; the observation z is linked to xi with weight bi(z)]
Other problems
– trajectography : Viterbi algorithm
– smoothing : Forward-Backward algorithm
– learning : Baum-Welch algorithm