Master 2 MOSIG, Knowledge Representation and Reasoning: HMM and Bayesian Filtering. Elise Arnaud, Université Joseph Fourier / INRIA Rhône-Alpes, [email protected]



  • Master 2 MOSIG

    Knowledge Representation and Reasoning

    HMM and Bayesian Filtering

    Elise Arnaud

    Université Joseph Fourier / INRIA Rhône-Alpes

    [email protected]

  • Overview

    1. Introduction on HMM

    2. Filtering : Problem Statement

    3. Overview of existing solutions

    4. Forward algorithm

    5. Kalman filter

    6. Particle filter

    7. Applications

    8. Conclusion

  • References

    – A. Doucet, S.J. Godsill, C. Andrieu. On sequential Monte Carlo sampling methods for Bayesian filtering. Statistics and Computing, 10, 197-208, 2000.

    – S.M. Arulampalam, S. Maskell, N.J. Gordon, T.C. Clapp. A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Transactions on Signal Processing, 50(2), 174-188, February 2002.

    – A. Doucet, N. de Freitas, N. Gordon (eds). Sequential Monte Carlo Methods in Practice. Springer-Verlag, 2001.

  • Overview

    1. Introduction on HMM

    2. Filtering : Problem Statement

    3. Overview of existing solutions

    4. Forward algorithm

    5. Kalman filter

    6. Particle filter

    7. Applications

    8. Conclusion

  • 2 (Introduction on HMM : 1)

    Reminder on Markov chain

    – we observe a system at discrete times 0, 1, 2, ..., t.

    – The system can be in one state of a collection of possible states

    – The observation of the system is treated as a random experiment whose (random)

    outcome is the system's state → stochastic process

    examples :

    – state of an engine (working, not working)

    – weather (rain, cloud, snow, sun)

    – robot’s position on a grid

    The system is evolving in time

  • 3 (Introduction on HMM : 2)

    Reminder on Markov chain

    Markov property : the state of a system at time t only depends on the state at

    time t − 1

    Knowing the present, we can forget the past to predict the future

    Let X be a Markov chain :

    X = {X0,X1,X2, . . . ,Xk . . .} = {Xk; k > 0}

    Xk takes its value in a finite set of possible values : the state space X

    p(xk+1|x0,x1, . . .xk) = p(xk+1|xk)

  • 4 (Introduction on HMM : 3)

    Reminder on Markov chain

    To define a Markov chain X = {Xk; k > 0}, one needs :

    – the state space X (the m possible values if the state space is discrete)

    – the initial distribution p(X0)

    – the transition matrix Q that describes the probabilities to go from one state to

    another p(xk|xk−1)

    To do inference calculation, we will use the joint law :

    p(x0:t) = p(x0) ∏_{k=1:t} p(xk|xk−1)
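    As a small illustration (a sketch, not from the slides; the function names are mine), such a chain can be simulated and its joint law evaluated directly from p(X0) and Q:

```python
import numpy as np

def sample_markov_chain(mu0, Q, T, rng=None):
    """Draw x_0, ..., x_T from a finite-state Markov chain.

    mu0 : initial distribution p(X0), shape (m,)
    Q   : transition matrix, Q[i, j] = p(X_k = j | X_{k-1} = i), shape (m, m)
    """
    rng = np.random.default_rng() if rng is None else rng
    states = [rng.choice(len(mu0), p=mu0)]                    # x_0 ~ p(X0)
    for _ in range(T):
        states.append(rng.choice(len(mu0), p=Q[states[-1]]))  # x_k ~ p(. | x_{k-1})
    return states

def chain_log_prob(states, mu0, Q):
    """log p(x_{0:t}) = log p(x_0) + sum_{k=1:t} log p(x_k | x_{k-1})."""
    logp = np.log(mu0[states[0]])
    for prev, cur in zip(states[:-1], states[1:]):
        logp += np.log(Q[prev, cur])
    return logp
```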

  • 5 (Introduction on HMM : 4)

    Markov chain ... Hidden Markov chain

    Let X = {Xk; k > 0} be a Markov chain

    What we are interested in :

    Knowing the state xk ∈ X of the chain at instant k ?

    – We would like to know the weather (rain, cloud, sun, snow)

    – We would like to know the position of a robot on a grid

  • 6 (Introduction on HMM : 5)

    Markov chain ... Hidden Markov chain

    Let X = {Xk; k > 0} be a Markov chain

    What we are interested in :

    Knowing the state xk ∈ X of the chain at instant k ?

    – We would like to know the weather (rain, cloud, sun, snow)

    – We would like to know the position of a robot on a grid

    Problem : the state of the system is indirectly / partially observed

    – We would like to know the weather ... but we measure the temperature only

    – We would like to know the position of a robot on a grid .... but we gather data

    from a gyroscope on top of the robot

  • 7 (Introduction on HMM : 6)

    Markov chain ... Hidden Markov chain

    Let X = {Xk; k > 0} be a Markov chain

    What we are interested in :

    Knowing the state xk ∈ X of the chain at instant k ?

    – We would like to know the weather (rain, cloud, sun, snow)

    – We would like to know the position of a robot on a grid

    Problem : the state of the system is indirectly / partially observed

    – We would like to know the weather ... but we measure the temperature only

    – We would like to know the position of a robot on a grid .... but we gather data

    from a gyroscope on top of the robot

    → such a system is described by a hidden Markov chain

  • 8 (Introduction on HMM : 7)

    Hidden Markov chain

    Hidden Markov Model HMM = {Xk,Zk}k>0

  • 9 (Introduction on HMM : 8)

    Hidden Markov chain

    Hidden Markov Model HMM = {Xk,Zk}k>0

    {Xk}k>0 : state process

    – state space X

    – Markovian process

    – transition law p(xk|xk−1) (transition matrix Q if X discrete and finite)

  • 10 (Introduction on HMM : 9)

    Hidden Markov chain

    Hidden Markov Model HMM = {Xk,Zk}k>0

    {Xk}k>0 : state process

    – state space X

    – Markovian process

    – transition law p(xk|xk−1) (transition matrix Q if X discrete and finite)

    {Zk}k>0 : observation (measurement) process

    – observation space Z

    – the measurement at time k only depends on the state at time k

    – likelihood p(zk|xk) (likelihood matrix B if Z discrete and finite)

  • 11 (Introduction on HMM : 10)

    Hidden Markov chain

    Hidden Markov Chain : model of a dynamic system

    described by

    1. state space X and observation space Z

    2. initial distribution p(X0)

    3. transition law p(xk|x0:k−1, z1:k−1) = p(xk|xk−1)

    4. likelihood p(zk|x0:k−1, z1:k−1) = p(zk|xk)
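    As an illustration, the four ingredients above can be bundled into a small container; this is a hedged sketch, the class and field names are my own:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class DiscreteHMM:
    """Finite-state HMM: initial law, transition matrix Q, likelihood matrix B."""
    mu0: np.ndarray  # p(X0),                              shape (N,)
    Q: np.ndarray    # Q[i, j] = p(X_k = j | X_{k-1} = i), shape (N, N)
    B: np.ndarray    # B[i, j] = p(Z_k = j | X_k = i),     shape (N, M)

    def __post_init__(self):
        # mu0 and every row of Q and B must be a probability distribution
        assert np.isclose(self.mu0.sum(), 1.0)
        assert np.allclose(self.Q.sum(axis=1), 1.0)
        assert np.allclose(self.B.sum(axis=1), 1.0)
```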

  • 12 (Introduction on HMM : 11)

    Goal

    calculate the state of a system from a set of observations


  • 13 (Introduction on HMM : 12)

    Goal

    calculate the state of a system from a set of observations

    – estimate the weather today from a set of temperatures measured till today

    – estimate the position of the robot from the gyroscope data

  • 14 (Introduction on HMM : 13)

    Goal

    calculate the state of a system from a set of observations

    – estimate the weather today from a set of temperatures measured till today

    – estimate the position of the robot from the gyroscope data

    From a sequence of observations z1:k = {z1, . . . , zk}, the goal is to find the state

    xk for which the probability p(xk|z1:k) is maximal.

    filtering problem

  • 15 (Introduction on HMM : 14)

    Goal

    other problems

    From a sequence of observations z1:k = {z1, . . . , zk}, and the model, estimate

    – the sequence of states x0:k = {x0, . . . , xk} for which the probability p(x0:k|z1:k) is

    maximal : trajectography

    – the most probable (previous) state at time t, with t < k : smoothing

    – the most probable (future) state at time t, with t > k : prediction

    – the probability of occurrence of the observation sequence (to study rare events)

    From a sequence of observations z1:k, and a sequence of states x1:k, estimate the model

    parameters : learning

  • 16 (Introduction on HMM : 15)

    Applications

    positioning, navigation and tracking

    – target tracking

    – computer vision

    – mobile robotics

    – ambient intelligence

    – sensor networks, etc.

  • 17 (Introduction on HMM : 16)

    Applications

    among others ...

    data assimilation

    environmental sciences

    (oceanography, meteorology, atmospheric pollution)

    information theory

    bioinformatics

    speech recognition

    handwriting recognition

    finance

    ...

  • Overview

    1. Introduction on HMM

    2. Filtering : Problem Statement

    3. Overview of existing solutions

    4. Forward algorithm

    5. Kalman filter

    6. Particle filter

    7. Applications

    8. Conclusion

  • 18 (Filtering : Problem Statement : 1)

    Problem Statement

    Dynamic system modeled as a Hidden Markov Chain

    described by

    1. state space X ; measurement space Z

    2. the initial distribution p(X0)

    3. an evolution model (transition law) p(xk|x0:k−1, z1:k−1) = p(xk|xk−1)

    4. an observation model (likelihood) p(zk|x0:k−1, z1:k−1) = p(zk|xk)

  • 19 (Filtering : Problem Statement : 2)

    Problem Statement

    Dynamic system modeled as a Hidden Markov Chain

    We have :

    p(x0:t, z1:t) = p(x0) ∏_{k=1:t} p(xk|xk−1) p(zk|xk)

    p(x0:t, z1:t) = p(xt|xt−1) p(zt|xt) p(x0:t−1, z1:t−1)
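    The same factorization written as a short evaluation routine; a sketch under the finite-state conventions used above (state and observation indices), not code from the slides:

```python
import numpy as np

def hmm_joint_log_prob(states, obs, mu0, Q, B):
    """log p(x_{0:t}, z_{1:t}) = log p(x_0)
       + sum_{k=1:t} [ log p(x_k | x_{k-1}) + log p(z_k | x_k) ].

    states : state indices [x_0, x_1, ..., x_t]
    obs    : observation indices [z_1, ..., z_t] (one per transition)
    """
    logp = np.log(mu0[states[0]])
    for k, z in enumerate(obs, start=1):
        logp += np.log(Q[states[k - 1], states[k]]) + np.log(B[states[k], z])
    return logp
```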

  • 20 (Filtering : Problem Statement : 3)

    Problem Statement

    Filtering - tracking :

    estimation of the state given the past and present measurements

    p(xk|z1:k)

    Trajectography :

    estimation of the state trajectory given the past and present measurements

    p(x0:k|z1:k)

    Smoothing :

    estimation of the state given the past and some future measurements

    p(xk|z1:t) t > k

    Prediction :

    estimation of a future state given the measurements up to a past time

    p(xk|z1:t) t < k

  • 21 (Filtering : Problem Statement : 4)

    Problem Statement

    Filtering

    – estimation of the state given the past and present measurements

    filtering distribution : p(xk|z1:k)

    – This estimation has to be sequential, i.e. :

    p(xk−1|z1:k−1) → Algorithm → p(xk|z1:k)

  • 22 (Filtering : Problem Statement : 5)

    Problem Statement

    Toy example : tracking a white car

    – state xk : position + velocity

    – evolution model : the car evolves at constant velocity

    – observation zk : detected white cars

    – observation model : the tracked car should be one of

    the detected cars

    p(xk|z1:k) current position of the white car

    knowing all previous and current detected white cars

  • 23 (Filtering : Problem Statement : 6)

    Problem Statement

    Objective : Sequential estimation of the filtering distribution p(xk|z1:k)

    p(xk|z1:k) = C p(xk, z1:k)

    = C ∫ p(xk, Xk−1, z1:k−1, zk) dXk−1

    = C ∫ p(zk|xk, Xk−1, z1:k−1) p(xk, Xk−1, z1:k−1) dXk−1

    = C p(zk|xk) ∫ p(xk|Xk−1, z1:k−1) p(Xk−1, z1:k−1) dXk−1

    = C p(zk|xk) ∫ p(xk|Xk−1) p(Xk−1, z1:k−1) dXk−1

    = C p(zk|xk) ∫ p(xk|Xk−1) p(Xk−1|z1:k−1) p(z1:k−1) dXk−1

    = C′ p(zk|xk) ∫ p(xk|Xk−1) p(Xk−1|z1:k−1) dXk−1

    where C′ = C p(z1:k−1)

  • 24 (Filtering : Problem Statement : 7)

    Problem Statement

    Objective : Sequential estimation of the filtering distribution p(xk|z1:k)

    i.e. estimation of p(xk|z1:k) knowing p(xk−1|z1:k−1)

    p(xk|z1:k) = C p(xk, z1:k) = C′ p(zk|xk) ∫ p(xk|Xk−1) p(Xk−1|z1:k−1) dXk−1

    with C = 1 / p(z1:k), then

    C′ = p(z1:k−1) / p(z1:k)

    = p(z1:k−1) / p(z1:k−1, zk)

    = p(z1:k−1) / [ p(zk|z1:k−1) p(z1:k−1) ]

    = 1 / p(zk|z1:k−1)

    = 1 / ∫ p(zk, Xk|z1:k−1) dXk

    = 1 / ∫ p(zk|Xk) p(Xk|z1:k−1) dXk

  • 25 (Filtering : Problem Statement : 8)

    Problem Statement

    Objective : Sequential estimation of the filtering distribution p(xk|z1:k)

    i.e. estimation of p(xk|z1:k) knowing p(xk−1|z1:k−1)

    Optimal Bayesian Filter

    1. prediction :

    p(xk|z1:k−1) = ∫ p(xk|Xk−1) p(Xk−1|z1:k−1) dXk−1

    2. update :

    p(xk|z1:k) = p(zk|xk) p(xk|z1:k−1) / ∫ p(zk|Xk) p(Xk|z1:k−1) dXk

  • 26 (Filtering : Problem Statement : 9)

    Problem Statement

    Objective : Sequential estimation of the filtering distribution p(xk|z1:k)

    i.e. estimation of p(xk|z1:k) knowing p(xk−1|z1:k−1)

    Optimal Bayesian Filter

    1. prediction :

    p(xk|z1:k−1) = ∫ p(xk|Xk−1) p(Xk−1|z1:k−1) dXk−1

    2. update :

    p(xk|z1:k) = p(zk|xk) p(xk|z1:k−1) / ∫ p(zk|Xk) p(Xk|z1:k−1) dXk

    ... but how to compute these two integrals ?
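    When the state space is finite, both integrals reduce to sums over the states, which is exactly what the forward algorithm of the next sections exploits. A minimal sketch of one predict/update step in that discrete case (the function name and the conventions Q[i, j] = p(j|i), B[i, j] = p(z = j|x = i) are my assumptions):

```python
import numpy as np

def bayes_filter_step(prior, z, Q, B):
    """One step of the optimal Bayesian filter for a finite state space.

    prior : p(x_{k-1} | z_{1:k-1}) as a vector of length N
    z     : index of the new observation z_k
    Returns p(x_k | z_{1:k}) as a vector of length N.
    """
    # prediction: p(x_k | z_{1:k-1}) = sum_j p(x_k | x_{k-1} = j) p(x_{k-1} = j | z_{1:k-1})
    predicted = Q.T @ prior
    # update: multiply by the likelihood p(z_k | x_k), then renormalize
    # (the denominator is the sum that plays the role of the integral over X_k)
    unnormalized = B[:, z] * predicted
    return unnormalized / unnormalized.sum()
```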

  • 27 (Filtering : Problem Statement : 10)

    Problem Statement

    So far ...

    p(xk−1|z1:k−1) → Algorithm → p(xk|z1:k)

    – exact solution : Optimal Bayesian Filter

    – but this solution implies the calculation of two huge integrals ...

    – various algorithms can be proposed, depending on the model :

    p(xk|xk−1) and p(zk|xk)

  • Overview

    1. Introduction on HMM

    2. Filtering : Problem Statement

    3. Overview of existing solutions

    4. Forward algorithm

    5. Kalman filter

    6. Particle filter

    7. Applications

    8. Conclusion

  • 28 (Overview of existing solutions : 1)

    Overview of existing solutions

    – Both X and Z are discrete and finite → Forward algorithm

    – Otherwise

    – Linear Gaussian model → Kalman filter

    – weakly nonlinear, Gaussian → Extensions of the Kalman filter

    – nonlinear, non-Gaussian → Particle filter (Sequential Monte Carlo methods)

  • Overview

    1. Introduction on HMM

    2. Filtering : Problem Statement

    3. Overview of existing solutions

    4. Forward algorithm

    5. Kalman filter

    6. Particle filter

    7. Applications

    8. Conclusion

  • 29 (Forward algorithm : 1)

    Forward algorithm

    Let us suppose that the HMM is characterized by the transition matrix Q defined by :

    qij = p(Xk+1 = j|Xk = i)

    and the observation matrix B defined by :

    bi(j) = p(Zk = j|Xk = i)

    then, we can use the forward algorithm to calculate p(xk|z1:k)

  • 30 (Forward algorithm : 2)

    Forward algorithm

    Example of a 2-state HMM, with states M ("marche", on) and A ("arrêt", off)

    [Figure : two-state transition diagram; self-loops 0.9 on M and 0.3 on A, transitions 0.1 (M → A) and 0.7 (A → M)]

    Q = ( qMM  qMA ; qAM  qAA ) = ( 0.9  0.1 ; 0.7  0.3 )

  • 31 (Forward algorithm : 3)

    Forward algorithm

    Example of a 2-state HMM, with observations R and V

    [Figure : same two-state diagram, with emission probabilities bM(R) = 0.2, bM(V) = 0.8 for state M and bA(R) = 0.95, bA(V) = 0.05 for state A]

    B = ( bM(R)  bM(V) ; bA(R)  bA(V) ) = ( 0.2  0.8 ; 0.95  0.05 )
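    For later reference, the two matrices above transcribed in Python/NumPy; a sketch of mine, using index 0 for M and 1 for A (and 0 for R, 1 for V among the observations):

```python
import numpy as np

# state indices: 0 = M ("marche"), 1 = A ("arrêt"); observation indices: 0 = R, 1 = V
Q = np.array([[0.9, 0.1],     # from M: qMM, qMA
              [0.7, 0.3]])    # from A: qAM, qAA
B = np.array([[0.2, 0.8],     # in M: bM(R), bM(V)
              [0.95, 0.05]])  # in A: bA(R), bA(V)
```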

  • 32 (Forward algorithm : 4)

    Forward algorithm

    Example of a 2-state HMM - representation on a lattice

    [Figure : lattice with the two states M and A at each time step t = 0, 1, 2, . . . , k]

    p(X0 = M) = µ(M) = 0.9 ; p(X0 = A) = µ(A) = 0.1

    p(X3 = M |Z1 = R, Z2 = R, Z3 = V ) ?

    p(X3 = A |Z1 = R, Z2 = R, Z3 = V ) ?

  • 33 (Forward algorithm : 5)

    Forward algorithm

    Example of a 2-state HMM - representation on a lattice

    [Figure : lattice with the two states M and A at each time step t = 0, 1, 2, . . . , k]

    p(X1 = M |Z1 = R)

    ∝ p(X1 = M,Z1 = R)

    = [µ(M) ∗ p(X1 = M |X0 = M) + µ(A) ∗ p(X1 = M |X0 = A)] p(Z1 = R|X1 = M)

    = [µ(M) qMM + µ(A) qAM ] bM (R)

    = α1(M)

    in a similar manner :

    p(X1 = A|Z1 = R) ∝ p(X1 = A,Z1 = R)

    = [µ(M) qMA + µ(A) qAA] bA(R) = α1(A)

  • 34 (Forward algorithm : 6)

    Forward algorithm

    Example of a 2-state HMM - representation on a lattice

    [Figure : lattice with the two states M and A at each time step t = 0, 1, 2, . . . , k]

    p(X2 = M |Z1 = R,Z2 = R)

    ∝ p(X2 = M,Z1 = R,Z2 = R)

    = α1(M) ∗ qMM ∗ bM (R) + α1(A) ∗ qAM ∗ bM (R) = α2(M)

    in a similar manner :

    p(X2 = A|Z1 = R,Z2 = R)

    ∝ p(X2 = A,Z1 = R,Z2 = R)

    = α1(M) ∗ qMA ∗ bA(R) + α1(A) ∗ qAA ∗ bA(R) = α2(A)

    ... we can do the same for p(X3|Z1 = R,Z2 = R,Z3 = V )
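    A quick numerical check of these recursions (a sketch, reusing the Q and B arrays written down after slide 31 and the initial law µ of the lattice slide; variable names are mine):

```python
import numpy as np

mu = np.array([0.9, 0.1])        # p(X0 = M), p(X0 = A)
R, V = 0, 1                      # observation indices

alpha1 = (mu @ Q) * B[:, R]      # alpha_1(i) = [ mu(M) qMi + mu(A) qAi ] b_i(R)
alpha2 = (alpha1 @ Q) * B[:, R]  # alpha_2(i) = [ sum_j alpha_1(j) q_ji ] b_i(R)
alpha3 = (alpha2 @ Q) * B[:, V]  # alpha_3(i) = [ sum_j alpha_2(j) q_ji ] b_i(V)

print(alpha1)                    # matches the hand computation: approx. [0.176, 0.114]
print(alpha3 / alpha3.sum())     # p(X3 | Z1=R, Z2=R, Z3=V), heavily in favour of M (approx. 0.98)
```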

  • 35 (Forward algorithm : 7)

    Forward algorithm

    Generalization

    We define αk(i) = p(z1:k, Xk = i), the probability of observing z1 . . . zk with a

    sequence of states that ends in state i.

    We have p(Xk = i|z1:k) ∝ αk(i)

    The forward algorithm gives us an efficient way to calculate αk+1(i) knowing

    αk(j) ∀i, j ∈ X

  • 36 (Forward algorithm : 8)

    Forward algorithm

    Generalization

    Initialization

    α0(i) = µ(i)

    Induction

    αk+1(i) = [ ∑_{j=1:N} αk(j) qji ] bi(zk+1)

    We can also calculate

    p(z1:k) = ∑_{j=1:N} αk(j)

    T observations, N states ⇒ N²T operations
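    A compact sketch of this recursion in Python/NumPy (the function name and array layout are my own, consistent with the Q and B arrays above):

```python
import numpy as np

def forward(mu, Q, B, observations):
    """Forward algorithm: returns [alpha_0, alpha_1, ..., alpha_T] where
    alpha_k(i) = p(z_{1:k}, X_k = i).  Cost: O(N^2) per observation."""
    alpha = np.asarray(mu, dtype=float).copy()  # initialization: alpha_0(i) = mu(i)
    history = [alpha]
    for z in observations:
        alpha = (alpha @ Q) * B[:, z]           # induction step
        history.append(alpha)
    return history

# usage with the running example (R = 0, V = 1):
#   alphas = forward(mu, Q, B, [R, R, V])
#   p_z = alphas[-1].sum()              # p(z_{1:T})
#   filtering = alphas[-1] / p_z        # p(X_T | z_{1:T})
```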

  • 37 (Forward algorithm : 9)

    Forward algorithm

    [Figure : states x1, x2, x3, . . . , xN at time k all feed into state xi at time k+1 through the transitions q1i, q2i, q3i, . . . , qNi ; the observation z is attached to xi with weight bi(z)]

  • 38 (Forward algorithm : 10)

    Other problems

    – trajectography : Viterbi algorithm

    – smoothing : Forward-Backward algorithm

    – learning : Baum-Welch algorithm

  • Overview

    1. Introduction on HMM

    2. Filtering : Problem Statement

    3. Overview of existing solutions

    4. Forward algorithm

    5. Kalman filter

    6. Particle filter

    7. Applications

    8. Conclusion