Time series, HMMs, Kalman Filters
Machine Learning – 10701/15781
Carlos Guestrin
Carnegie Mellon University
March 28th, 2005

Classic HMM tutorial – see class website: L. R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proc. of the IEEE, Vol. 77, No. 2, pp. 257–286, 1989.

Time series, HMMs, Kalman Filtersguestrin/Class/10701-S05/slides/hmms.pdf · Kalman filter Continuous vars version of HMMs Assumes Gaussian distributions Equivalent to linear system



Adventures of our BN hero

  • Compact representation for probability distributions
  • Fast inference
  • Fast learning

But… Who are the most popular kids?

1. Naïve Bayes

2 and 3. Hidden Markov models (HMMs), Kalman Filters

Handwriting recognition

Character recognition, e.g., kernel SVMs

[Figure: handwritten characters with candidate letter labels]


Example of a hidden Markov model (HMM)

Understanding the HMM Semantics

[Figure: HMM chain X1 → X2 → X3 → X4 → X5, where each hidden variable Xi takes a value in {a,…,z} and each observation Oi (shown as an image on the slide) depends on Xi]

HMMs semantics: Details

Just 3 distributions:
  • Starting state distribution: P(X1)
  • Transition probabilities: P(Xi+1 | Xi)
  • Observation model: P(Oi | Xi)

HMMs semantics: Joint distribution

P(X1, …, Xn, O1, …, On) = P(X1) P(O1 | X1) × ∏ over i = 2..n of P(Xi | Xi-1) P(Oi | Xi)
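The factorization of the HMM joint distribution can be illustrated with a minimal sketch. The two-letter state space, the observation symbols "u" and "v", and every probability below are made-up values chosen only for illustration; they do not come from the slides.

```python
# Toy HMM: hypothetical numbers, two hidden letters instead of a..z.
STATES = ["a", "b"]
initial = {"a": 0.6, "b": 0.4}                 # P(X1)
transition = {"a": {"a": 0.7, "b": 0.3},       # P(Xi | Xi-1)
              "b": {"a": 0.2, "b": 0.8}}
emission = {"a": {"u": 0.9, "v": 0.1},         # P(Oi | Xi)
            "b": {"u": 0.3, "v": 0.7}}

def joint_probability(hidden, observed):
    """P(x1..xn, o1..on) = P(x1) P(o1|x1) * prod_i P(xi|xi-1) P(oi|xi)."""
    prob = initial[hidden[0]] * emission[hidden[0]][observed[0]]
    for prev, cur, obs in zip(hidden, hidden[1:], observed[1:]):
        prob *= transition[prev][cur] * emission[cur][obs]
    return prob

p = joint_probability(["a", "a", "b"], ["u", "u", "v"])
print(p)
```

Only the three tables are needed no matter how long the sequence is, which is the parameter sharing the deck refers to.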

Learning HMMs from fully observable data is easy

Learn 3 distributions: P(X1), P(Xi+1 | Xi), and P(Oi | Xi), each estimated from counts in the fully observed data.
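When both the hidden letters and the observations are recorded, maximum likelihood learning reduces to counting and normalizing. The sketch below assumes a hypothetical data layout (each sequence is a list of `(hidden, observed)` pairs) and a hypothetical helper name `learn_hmm`; the tiny data set is invented for illustration.

```python
from collections import Counter, defaultdict

def normalize(counts):
    """Turn a table of counts into a probability table."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}

def learn_hmm(sequences):
    """Estimate P(X1), P(Xi+1 | Xi) and P(Oi | Xi) from fully observed sequences."""
    init_counts = Counter()
    trans_counts = defaultdict(Counter)
    emit_counts = defaultdict(Counter)
    for seq in sequences:
        init_counts[seq[0][0]] += 1                     # first hidden state
        for hidden, observed in seq:
            emit_counts[hidden][observed] += 1          # observation given state
        for (prev, _), (cur, _) in zip(seq, seq[1:]):
            trans_counts[prev][cur] += 1                # consecutive hidden states
    initial = normalize(init_counts)
    transition = {x: normalize(c) for x, c in trans_counts.items()}
    emission = {x: normalize(c) for x, c in emit_counts.items()}
    return initial, transition, emission

# Hypothetical training data: two short labeled sequences.
data = [
    [("a", "u"), ("a", "u"), ("b", "v")],
    [("b", "v"), ("b", "u"), ("a", "u")],
]
initial, transition, emission = learn_hmm(data)
print(initial, transition, emission)
```

In practice one would usually add smoothing so that unseen transitions do not get probability zero; that is left out here to keep the counting visible.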

Possible inference tasks in an HMM

Marginal probability of a hidden variable: P(Xi | o1:n)

Viterbi decoding – most likely trajectory for hidden vars: argmax over x1:n of P(x1:n | o1:n)

Using variable elimination to compute P(Xi|o1:n)

Compute: P(Xi | o1:n)

Variable elimination order?

Example:

What if I want to compute P(Xi|o1:n) for each i?

Compute: P(Xi | o1:n) for every i = 1, …, n

Variable elimination for each i?

Variable elimination for each i, what's the complexity? (One run is linear in n, so n separate runs cost O(n²).)

Reusing computation

Compute: P(Xi | o1:n) for every i

The forwards-backwards algorithm

Forwards pass:
  • Initialization
  • For i = 2 to n: generate a forwards factor by eliminating Xi-1

Backwards pass:
  • Initialization
  • For i = n-1 to 1: generate a backwards factor by eliminating Xi+1

∀ i, probability is: P(Xi | o1:n) ∝ (forwards factor at i) × (backwards factor at i)
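The two passes can be sketched in a few lines of plain Python. The toy HMM below (two states, two observation symbols) and the names `alpha` and `beta` for the forwards and backwards factors are assumptions made for illustration; this is one common way to write the algorithm, not necessarily the form used in the lecture.

```python
# Hypothetical toy HMM parameters.
initial = {"a": 0.6, "b": 0.4}
transition = {"a": {"a": 0.7, "b": 0.3}, "b": {"a": 0.2, "b": 0.8}}
emission = {"a": {"u": 0.9, "v": 0.1}, "b": {"u": 0.3, "v": 0.7}}

def forward_backward(initial, transition, emission, observations):
    """Return P(Xi | o1:n) for every position i as a list of dicts."""
    states = list(initial)
    n = len(observations)
    # Forwards factors: alpha[i][x] = P(Xi = x, o1..oi), built by eliminating Xi-1.
    alpha = [{x: initial[x] * emission[x][observations[0]] for x in states}]
    for i in range(1, n):
        alpha.append({
            x: emission[x][observations[i]]
               * sum(alpha[i - 1][y] * transition[y][x] for y in states)
            for x in states
        })
    # Backwards factors: beta[i][x] = P(oi+1..on | Xi = x), built by eliminating Xi+1.
    beta = [None] * n
    beta[n - 1] = {x: 1.0 for x in states}
    for i in range(n - 2, -1, -1):
        beta[i] = {
            x: sum(transition[x][y] * emission[y][observations[i + 1]] * beta[i + 1][y]
                   for y in states)
            for x in states
        }
    # Posterior at each position: normalized product of the two factors.
    posteriors = []
    for i in range(n):
        unnormalized = {x: alpha[i][x] * beta[i][x] for x in states}
        z = sum(unnormalized.values())
        posteriors.append({x: v / z for x, v in unnormalized.items()})
    return posteriors

posteriors = forward_backward(initial, transition, emission, ["u", "v"])
print(posteriors)
```

Each pass touches every position once, so all n marginals come out of two sweeps instead of n separate elimination runs. For long sequences the factors underflow, and a real implementation would rescale them or work in log space.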

Most likely explanation

Compute: argmax over x1:n of P(x1:n | o1:n)

Variable elimination order?

Example:

The Viterbi algorithm

Forwards pass:
  • Initialization
  • For i = 2 to n: generate a forwards factor by eliminating Xi-1 (maximizing over Xi-1 instead of summing)

Computing best explanation:
  • For i = n-1 to 1: use argmax to get explanation
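A minimal sketch of Viterbi decoding follows, using the same kind of hypothetical toy HMM as a stand-in for the letter model. The back-pointer table `back` is an implementation choice for recovering the argmax trajectory, not something the slides specify.

```python
# Hypothetical toy HMM parameters.
initial = {"a": 0.6, "b": 0.4}
transition = {"a": {"a": 0.7, "b": 0.3}, "b": {"a": 0.2, "b": 0.8}}
emission = {"a": {"u": 0.9, "v": 0.1}, "b": {"u": 0.3, "v": 0.7}}

def viterbi(initial, transition, emission, observations):
    """Return the most likely hidden trajectory and its joint probability."""
    states = list(initial)
    # best[i][x]: probability of the best assignment to X1..Xi that ends in Xi = x.
    best = [{x: initial[x] * emission[x][observations[0]] for x in states}]
    back = []  # back[i][x]: best predecessor of x when moving to position i + 1
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for x in states:
            # Eliminate the previous variable with max instead of sum.
            prev = max(states, key=lambda y: best[-1][y] * transition[y][x])
            pointers[x] = prev
            scores[x] = best[-1][prev] * transition[prev][x] * emission[x][obs]
        best.append(scores)
        back.append(pointers)
    # Backwards pass: follow the argmax pointers from the last position.
    last = max(states, key=lambda x: best[-1][x])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]

path, prob = viterbi(initial, transition, emission, ["u", "u", "v"])
print(path, prob)
```

The structure mirrors the forwards pass of forwards-backwards; only the elimination operator changes from sum to max.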

What about continuous variables?

In general, very hard!
  • Must represent complex distributions

A special case is very doable
  • When everything is Gaussian
  • Called a Kalman filter
  • One of the most used algorithms in the history of probabilities!

Time series data example: Temperatures from sensor network

[Figure: floor plan with temperature sensors numbered 1 to 54; labeled areas include SERVER, LAB, KITCHEN, COPY, ELEC, PHONE, QUIET, STORAGE, CONFERENCE, and OFFICE]

Operations in Kalman filter

Compute the belief over the current state given the observations so far

Start with the prior over the first state
At each time step t:
  • Condition on observation
  • Roll-up (marginalize previous time step)
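The condition and roll-up loop can be sketched for a single scalar state, such as one temperature reading. The model form (next state = a × state + noise, observation = h × state + noise) and the parameter names `a`, `q`, `h`, `r` are assumptions introduced here, and all numbers are invented.

```python
def condition(mean, var, obs, h, r):
    """Condition the belief N(mean, var) on obs = h * x + noise, noise ~ N(0, r)."""
    gain = var * h / (h * h * var + r)
    new_mean = mean + gain * (obs - h * mean)
    new_var = (1.0 - gain * h) * var
    return new_mean, new_var

def roll_up(mean, var, a, q):
    """Push the belief through x_next = a * x + noise, noise ~ N(0, q)."""
    return a * mean, a * a * var + q

def kalman_filter(observations, prior_mean, prior_var, a, q, h, r):
    """Return the filtered (mean, variance) after each observation."""
    mean, var = prior_mean, prior_var
    filtered = []
    for obs in observations:
        mean, var = condition(mean, var, obs, h, r)   # condition on observation
        filtered.append((mean, var))
        mean, var = roll_up(mean, var, a, q)          # marginalize previous step
    return filtered

# Hypothetical temperature readings with a prior belief of N(20, 4).
filtered = kalman_filter([22.0, 21.0], prior_mean=20.0, prior_var=4.0,
                         a=1.0, q=1.0, h=1.0, r=1.0)
print(filtered)
```

The belief stays Gaussian at every step, so two numbers (or a mean vector and covariance matrix in the multivariate case) are all that has to be carried forward.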

Detour: Understanding Multivariate Gaussians

Observe attributes
  • Example: Observe X1=18
  • P(X2|X1=18)
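For two jointly Gaussian variables, observing X1 shifts the mean of X2 and shrinks its variance. The sketch below works through P(X2 | X1 = 18) for a hypothetical joint distribution; the means and covariance entries are invented for illustration.

```python
def condition_bivariate(mu1, mu2, s11, s12, s22, x1):
    """Mean and variance of X2 given X1 = x1 for a bivariate Gaussian.

    (mu1, mu2) is the mean vector; s11, s12, s22 are the entries of the
    symmetric 2x2 covariance matrix.
    """
    mean = mu2 + (s12 / s11) * (x1 - mu1)
    var = s22 - s12 * s12 / s11
    return mean, var

# Hypothetical joint: X1 ~ 20 degrees, X2 ~ 21 degrees, positively correlated.
mean, var = condition_bivariate(mu1=20.0, mu2=21.0,
                                s11=4.0, s12=3.0, s22=4.0, x1=18.0)
print(mean, var)
```

Because X1 was observed two degrees below its mean and the variables are positively correlated, the conditional mean of X2 drops below 21, and the conditional variance is smaller than the prior variance of 4.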


Characterizing a multivariate Gaussian

Mean vector:

Covariance matrix:
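For reference, the standard definitions of the two parameters and of the density of a d-dimensional Gaussian are written out below. The symbols μ, Σ, and d are the usual textbook notation and are assumed here, since the slide's own formulas did not survive extraction.

```latex
\mu = \mathbb{E}[X], \qquad
\Sigma = \mathbb{E}\big[(X-\mu)(X-\mu)^\top\big], \qquad
p(x) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}}
       \exp\!\Big(-\tfrac12 (x-\mu)^\top \Sigma^{-1} (x-\mu)\Big)
```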

Conditional Gaussians

Conditional probabilities
  • P(Y|X)

Kalman filter with Gaussians

Equivalent to a linear system
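One common way to write the linear system is shown below. The matrix names A, H, Q, and R are assumptions (the slide's notation is not recoverable), and constant offset terms are omitted for brevity.

```latex
\begin{aligned}
X_{t+1} &= A\,X_t + \varepsilon_t, & \varepsilon_t &\sim \mathcal{N}(0, Q) \\
O_t     &= H\,X_t + \delta_t,      & \delta_t      &\sim \mathcal{N}(0, R)
\end{aligned}
```

With a Gaussian prior on the first state, every variable in the chain is then jointly Gaussian, which is what makes exact inference tractable.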

Detour 2: Canonical form

Standard form and canonical form are related by inverting the covariance matrix.

  • Conditioning is easy in canonical form
  • Marginalization is easy in standard form
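The relation between the two parameterizations can be stated as follows. The symbols Λ (precision matrix) and η (information vector) are one common notation and are assumed here; other texts use K and h for the same quantities.

```latex
p(x) \propto \exp\!\Big(-\tfrac12\, x^\top \Lambda\, x + \eta^\top x\Big),
\qquad
\Lambda = \Sigma^{-1},\quad \eta = \Sigma^{-1}\mu
\qquad\Longleftrightarrow\qquad
\Sigma = \Lambda^{-1},\quad \mu = \Lambda^{-1}\eta
```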


Conditioning in canonical form

First multiply:

Then, condition on value B = y
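The multiply-then-condition step can be sketched for a scalar state X and a scalar observation O. It assumes the observation model O = h × X + noise with variance r, and it uses the (η, Λ) canonical notation; both are assumptions made for this illustration, as are the numbers.

```python
def multiply_with_observation_model(eta_x, lam_x, h, r):
    """Canonical parameters of prior(X) * P(O | X) over the pair (X, O).

    Multiplying canonical-form factors just adds their parameters.
    """
    eta = (eta_x, 0.0)
    lam = ((lam_x + h * h / r, -h / r),
           (-h / r, 1.0 / r))
    return eta, lam

def condition_on_observation(eta, lam, y):
    """Condition the joint over (X, O) on O = y; return canonical params over X."""
    eta_x = eta[0] - lam[0][1] * y   # eta_A - Lambda_AB * y
    lam_x = lam[0][0]                # Lambda_AA is unchanged
    return eta_x, lam_x

# Hypothetical prior N(20, 4) in canonical form: lam = 1/4, eta = 20/4.
eta, lam = multiply_with_observation_model(eta_x=5.0, lam_x=0.25, h=1.0, r=1.0)
eta_post, lam_post = condition_on_observation(eta, lam, y=22.0)
post_mean, post_var = eta_post / lam_post, 1.0 / lam_post
print(post_mean, post_var)
```

Conditioning only reads off a block of Λ and shifts η, with no matrix inversion, which is why this step is cheap in canonical form.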

Operations in Kalman filter

Compute the belief over the current state given the observations so far

Start with the prior over the first state
At each time step t:
  • Condition on observation
  • Roll-up (marginalize previous time step)


Roll-up in canonical form

First multiply:

Then, marginalize Xt:
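The roll-up step can be sketched the same way for a scalar state. It assumes the transition model X_next = a × X_t + noise with variance q and the (η, Λ) canonical notation; the starting belief N(21.6, 0.8) is a made-up example.

```python
def multiply_with_transition_model(eta_t, lam_t, a, q):
    """Canonical parameters of belief(X_t) * P(X_next | X_t) over (X_next, X_t)."""
    eta = (0.0, eta_t)
    lam = ((1.0 / q, -a / q),
           (-a / q, a * a / q + lam_t))
    return eta, lam

def marginalize_current(eta, lam):
    """Integrate X_t out of the joint over (X_next, X_t)."""
    eta_next = eta[0] - lam[0][1] * eta[1] / lam[1][1]
    lam_next = lam[0][0] - lam[0][1] * lam[1][0] / lam[1][1]
    return eta_next, lam_next

# Hypothetical belief N(21.6, 0.8) in canonical form: lam = 1.25, eta = 27.
eta, lam = multiply_with_transition_model(eta_t=27.0, lam_t=1.25, a=1.0, q=1.0)
eta_next, lam_next = marginalize_current(eta, lam)
next_mean, next_var = eta_next / lam_next, 1.0 / lam_next
print(next_mean, next_var)
```

Marginalizing in canonical form needs a division by the eliminated block (a matrix inverse in the multivariate case), whereas in standard form it would only drop rows and columns.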

Operations in Kalman filter

Compute the belief over the current state given the observations so far

Start with the prior over the first state
At each time step t:
  • Condition on observation
  • Roll-up (marginalize previous time step)


Learning a Kalman filter

Must learn:

Learn joint, and use division rule:
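Written out for the transition model, the division rule and its Gaussian consequence look as follows. The block notation for the joint mean and covariance over (X_{t+1}, X_t) is assumed for this illustration.

```latex
\begin{aligned}
P(X_{t+1} \mid X_t) &= \frac{P(X_{t+1}, X_t)}{P(X_t)} \\
\mathbb{E}[X_{t+1} \mid X_t = x] &= \mu_{t+1} + \Sigma_{t+1,t}\,\Sigma_{t,t}^{-1}\,(x - \mu_t) \\
\mathrm{Cov}[X_{t+1} \mid X_t = x] &= \Sigma_{t+1,t+1} - \Sigma_{t+1,t}\,\Sigma_{t,t}^{-1}\,\Sigma_{t,t+1}
\end{aligned}
```

The same rule applied to the joint over (O_t, X_t) gives the observation model.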


Maximum likelihood learning of a multivariate Gaussian

Data:

Means are just empirical means:

Empirical covariances:
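The two estimators can be sketched in plain Python. The function name `fit_gaussian` and the four two-dimensional samples are hypothetical; note that the maximum likelihood covariance divides by the number of samples m, not by m − 1.

```python
def fit_gaussian(samples):
    """Maximum likelihood mean vector and covariance matrix of the samples."""
    m = len(samples)
    d = len(samples[0])
    mean = [sum(x[j] for x in samples) / m for j in range(d)]
    cov = [[sum((x[j] - mean[j]) * (x[k] - mean[k]) for x in samples) / m
            for k in range(d)]
           for j in range(d)]
    return mean, cov

# Hypothetical paired temperature readings (X1, X2).
samples = [(18.0, 19.0), (20.0, 21.0), (22.0, 23.0), (20.0, 19.0)]
mean, cov = fit_gaussian(samples)
print(mean, cov)
```

Fitting the joint over consecutive time steps this way, then applying the division rule, yields the conditional models a Kalman filter needs.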

What you need to know

Hidden Markov models (HMMs)
  • Very useful, very powerful!
  • Speech, OCR,…
  • Parameter sharing, only learn 3 distributions
  • Trick reduces inference from O(n²) to O(n)
  • Special case of BN

Kalman filter
  • Continuous vars version of HMMs
  • Assumes Gaussian distributions
  • Equivalent to linear system
  • Simple matrix operations for computations