Introduction to Hidden Markov Models
Introduction
• A Hidden Markov Model (HMM) is a simple temporal Bayesian
network.
• The structure of the network dictates how a system that varies over
time moves from one state to the next.
• Generally, we start at an initial state at time zero and the process
moves to each of its possible new states with a probability based
solely on the current state.
• In Markov models, the states are directly observable – they are the
values being measured.
• In Hidden Markov Models, the states are hidden, but lead to
observable values with certain probabilities.
• Set of states: {s_1, s_2, ..., s_N}
• Process moves from one state to another generating a sequence of states:
s_i1, s_i2, ..., s_ik, ...
• Markov chain property: probability of each subsequent state depends only on
what was the previous state:
P(s_ik | s_i1, s_i2, ..., s_ik-1) = P(s_ik | s_ik-1)
• To define a Markov model, the following probabilities have to be specified:
transition probabilities a_ij = P(s_i | s_j) and initial probabilities π_i = P(s_i)
Firstly, Markov Models
[Diagram: two-state Markov chain over ‘Rain’ and ‘Dry’ with the transition probabilities listed below]
• Two states : ‘Rain’ and ‘Dry’.
• Transition probabilities:
P(‘Rain’|‘Rain’)=0.3 , P(‘Dry’|‘Rain’)=0.7 ,
P(‘Rain’|‘Dry’)=0.2, P(‘Dry’|‘Dry’)=0.8
• Initial probabilities: say P(‘Rain’)=0.4 , P(‘Dry’)=0.6 .
Example of Markov Model
• By the Markov chain property, the probability of a state sequence can be found by the
formula:
• Suppose we want to calculate the probability of a sequence of states in our
example, {‘Dry’,’Dry’,’Rain’,’Rain’}.
P({‘Dry’,’Dry’,’Rain’,’Rain’})
= P(‘Rain’|’Rain’) P(‘Rain’|’Dry’) P(‘Dry’|’Dry’) P(‘Dry’)
= 0.3*0.2*0.8*0.6 = 0.0288
Calculation of sequence probability
P(s_i1, s_i2, ..., s_ik)
= P(s_ik | s_i1, s_i2, ..., s_ik-1) P(s_i1, s_i2, ..., s_ik-1)
= P(s_ik | s_ik-1) P(s_i1, s_i2, ..., s_ik-1)
= ...
= P(s_ik | s_ik-1) P(s_ik-1 | s_ik-2) ... P(s_i2 | s_i1) P(s_i1)
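To make the formula concrete, here is a minimal Python sketch of the Rain/Dry chain from the example above; the names (initial, transition, sequence_probability) are illustrative, not part of the original slides.

```python
# Rain/Dry Markov chain from the example above.
initial = {"Rain": 0.4, "Dry": 0.6}
transition = {                      # transition[prev][next] = P(next | prev)
    "Rain": {"Rain": 0.3, "Dry": 0.7},
    "Dry":  {"Rain": 0.2, "Dry": 0.8},
}

def sequence_probability(states):
    """P(s1, ..., sk) = P(s1) * product of P(s_t | s_t-1), by the Markov chain property."""
    p = initial[states[0]]
    for prev, nxt in zip(states, states[1:]):
        p *= transition[prev][nxt]
    return p

print(sequence_probability(["Dry", "Dry", "Rain", "Rain"]))  # 0.0288
```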
[Diagram: HMM with hidden states ‘Low’ and ‘High’ and observations ‘Rain’ and ‘Dry’; transition and observation probabilities as listed below]
• Two states : ‘Low’ and ‘High’ atmospheric pressure.
• Two observations : ‘Rain’ and ‘Dry’.
• Transition probabilities:
P(‘Low’|‘Low’)=0.3 , P(‘High’|‘Low’)=0.7 ,
P(‘Low’|‘High’)=0.2, P(‘High’|‘High’)=0.8
• Observation probabilities :
P(‘Rain’|‘Low’)=0.6 , P(‘Dry’|‘Low’)=0.4 , P(‘Rain’|‘High’)=0.4 ,
P(‘Dry’|‘High’)=0.6 .
• Initial probabilities: say P(‘Low’)=0.4 , P(‘High’)=0.6 .
Example of Hidden Markov Model
• Suppose we want to calculate the probability of a sequence of observations in
our example, {‘Dry’,’Rain’}.
• Consider all possible hidden state sequences:
P({‘Dry’,’Rain’})
= P({‘Dry’,’Rain’} , {‘Low’,’Low’}) + P({‘Dry’,’Rain’} ,
{‘Low’,’High’}) + P({‘Dry’,’Rain’} , {‘High’,’Low’}) +
P({‘Dry’,’Rain’} , {‘High’,’High’})
where the first term is:
P({‘Dry’,’Rain’} , {‘Low’,’Low’})
= P({‘Dry’,’Rain’} | {‘Low’,’Low’}) P({‘Low’,’Low’})
= P(‘Dry’|’Low’)P(‘Rain’|’Low’) P(‘Low’)P(‘Low’|’Low’)
= 0.4*0.6*0.4*0.3 = 0.0288
Calculation of observation sequence probability
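To make the summation over hidden state sequences concrete, here is a minimal brute-force Python sketch of the Low/High model above (using P(‘Dry’|‘High’)=0.6 so the observation probabilities sum to one); all names are illustrative.

```python
from itertools import product

# HMM from the example: hidden states Low/High, observations Rain/Dry.
initial    = {"Low": 0.4, "High": 0.6}
transition = {"Low":  {"Low": 0.3, "High": 0.7},
              "High": {"Low": 0.2, "High": 0.8}}
emission   = {"Low":  {"Rain": 0.6, "Dry": 0.4},
              "High": {"Rain": 0.4, "Dry": 0.6}}

def observation_probability(obs):
    """P(obs) = sum over all hidden state sequences of P(obs, states)."""
    total = 0.0
    for states in product(initial, repeat=len(obs)):
        p = initial[states[0]] * emission[states[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= transition[states[t - 1]][states[t]] * emission[states[t]][obs[t]]
        total += p
    return total

print(observation_probability(["Dry", "Rain"]))
```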
How HMM Works
• HMM is a stochastic generative
model for sequences
• Defined by
– finite set of states S
– finite alphabet A
– transition prob matrix T
– emission prob matrix E
• Move from state to state according to T while emitting
symbols according to E
[Diagram: states s1, s2, ..., sk emitting symbols a1, a2, ...]
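A small Python sketch of this generative view, under the dictionary-based representation used in the earlier examples; the function name generate is illustrative only.

```python
import random

def generate(initial, transition, emission, length):
    """Sample a hidden state path and the symbols it emits from an HMM."""
    states, symbols = [], []
    state = random.choices(list(initial), weights=list(initial.values()))[0]
    for _ in range(length):
        states.append(state)
        # Emit a symbol according to E, then move to the next state according to T.
        symbols.append(random.choices(list(emission[state]),
                                      weights=list(emission[state].values()))[0])
        state = random.choices(list(transition[state]),
                               weights=list(transition[state].values()))[0]
    return states, symbols

# e.g. generate(initial, transition, emission, 5) with the Low/High model above
```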
• Game:
– You bet £1
– You roll your die
– Casino rolls their die
– Highest number wins £2
• Question: Suppose we played 2
games, and the sequence of rolls
was 1, 6, 2, 6. Have we been
cheated?
Example: Dishonest Casino
• Casino has two dice:
– Fair die
• P(i) = 1/6, i = 1..6
– Loaded die
• P(i) = 1/10, i = 1..5
• P(i) = 1/2, i = 6
• Casino switches between fair &
loaded die with probability 1/2.
• Initially, the die is always fair
“Visualisation” of Dishonest Casino
1, 6, 2, 6?
We were probably cheated...
P(1, 6, 2, 6 along the path Fair, Loaded, Fair, Loaded) =
E(1|Fair) * T(?, Fair) *
E(6|Loaded) * T(Fair, Loaded) *
E(2|Fair) * T(Loaded, Fair) *
E(6|Loaded) * T(Fair, Loaded)
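A quick check of this intuition, as a hedged Python sketch: it assumes, as the slides state, that the die starts fair and that staying and switching each have probability 1/2, and it compares the suspicious alternating path against the all-Fair path.

```python
# Dishonest casino: fair and loaded dice, switch probability 1/2, start fair.
emission   = {"Fair":   {i: 1 / 6 for i in range(1, 7)},
              "Loaded": {**{i: 1 / 10 for i in range(1, 6)}, 6: 1 / 2}}
transition = {"Fair":   {"Fair": 0.5, "Loaded": 0.5},
              "Loaded": {"Fair": 0.5, "Loaded": 0.5}}

def path_probability(rolls, path):
    """Joint probability of the observed rolls along a given hidden die path."""
    p = emission[path[0]][rolls[0]]               # die is always fair initially
    for t in range(1, len(rolls)):
        p *= transition[path[t - 1]][path[t]] * emission[path[t]][rolls[t]]
    return p

rolls = [1, 6, 2, 6]
print(path_probability(rolls, ["Fair", "Loaded", "Fair", "Loaded"]))  # ~8.7e-4
print(path_probability(rolls, ["Fair", "Fair", "Fair", "Fair"]))      # ~9.6e-5
```

Under these assumptions the alternating Fair/Loaded path is roughly nine times more likely than the all-Fair explanation, which is why the sequence 1, 6, 2, 6 looks like cheating.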
• Evaluation problem. Given the HMM M=(A, B, π) and the observation sequence O=o1
o2 ... oK , calculate the probability that model M has generated sequence O.
• Decoding problem. Given the HMM M=(A, B, π) and the observation sequence O=o1
o2 ... oK , calculate the most likely sequence of hidden states si that produced this observation
sequence O.
• Learning problem. Given some training observation sequences O=o1 o2 ... oK and the general
structure of the HMM (numbers of hidden and visible states), determine the HMM parameters
M=(A, B, π) that best fit the training data.
O=o1...oK denotes a sequence of observations ok ∈ {v1,…,vM}.
Main uses of HMMs:
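The Evaluation problem is normally solved without enumerating every hidden path: the forward algorithm computes the same sum by dynamic programming. Below is a minimal Python sketch using the same dictionary-based model representation as the earlier examples; names are illustrative, not part of the slides.

```python
def forward_probability(obs, initial, transition, emission):
    """Evaluation problem: P(obs | model) via the forward algorithm."""
    # alpha[s] = P(o1..ot, state at time t is s)
    alpha = {s: initial[s] * emission[s][obs[0]] for s in initial}
    for o in obs[1:]:
        alpha = {s: sum(alpha[prev] * transition[prev][s] for prev in alpha)
                    * emission[s][o]
                 for s in initial}
    return sum(alpha.values())

# With the Low/High weather model it agrees with the brute-force enumeration above.
```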
• Typed word recognition; assume all characters are separated.
• Character recognizer outputs the probability of the image being a
particular character, P(image|character).
[Figure: example recognizer output probabilities for characters ‘a’ … ‘z’, e.g. 0.5, 0.03, 0.005, …, 0.31]
Word recognition example(1).
[Diagram: hidden state = character, observation = character image]
• If a lexicon is given, we can construct a separate HMM model
for each lexicon word.
Amherst: a → m → h → e → r → s → t
Buffalo: b → u → f → f → a → l → o
• Here recognition of the word image is equivalent to the problem
of evaluating a few HMM models.
• This is an application of the Evaluation problem.
Word recognition example(3).
• We can construct a single HMM for all words.
• Hidden states = all characters in the alphabet.
• Transition probabilities and initial probabilities are calculated
from the language model.
• Observations and observation probabilities are as before.
[Diagram: a single HMM whose hidden states are the characters of the lexicon words (a, m, h, e, r, s, t, b, u, f, l, o), with transitions between them]
• Here we have to determine the best sequence of hidden states,
the one that most likely produced the word image.
• This is an application of the Decoding problem.
Word recognition example(4).
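Decoding is typically done with the Viterbi algorithm rather than by scoring every possible path. Here is a minimal Python sketch in the same dictionary-based representation used earlier; it is an illustrative sketch, not the slides' own implementation.

```python
def viterbi(obs, initial, transition, emission):
    """Decoding problem: most likely hidden state sequence for obs."""
    # best[s] = (probability, path) of the best state path ending in state s
    best = {s: (initial[s] * emission[s][obs[0]], [s]) for s in initial}
    for o in obs[1:]:
        new = {}
        for s in initial:
            p, path = max((best[prev][0] * transition[prev][s], best[prev][1])
                          for prev in best)
            new[s] = (p * emission[s][o], path + [s])
        best = new
    return max(best.values())   # (probability of best path, best path)
```

For the word-recognition HMM, obs would be the sequence of character images and the returned path the most likely character sequence.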
What makes a good HMM problem space?
Characteristics:
• Classification problems
There are two main types of output from an HMM:
– Scoring of sequences
• For example, protein family modelling
– Labeling of observations within a sequence
• For example, identifying genes in a particular sequence
HMM Requirements
So you’ve decided you want to build an HMM,
here’s what you need:
• An architecture
– Probably the hardest part
– Should be sound & easy to interpret
• A well-defined success measure
– Necessary for any form of machine learning
HMM Requirements
Continued
• Training data
– Labeled or unlabeled – it depends
• You do not always need a labeled training set to do
observation labeling, but it helps
– Amount of training data needed is:
• Directly proportional to the number of free parameters in the
model
• Inversely proportional to the size of the training sequences
HMM Advantages
• Statistical Grounding
– Statisticians are comfortable with the theory behind hidden
Markov models
– Freedom to manipulate the training and verification processes
– Mathematical / theoretical analysis of the results and processes
– HMMs are still very powerful modeling tools – far more powerful
than many statistical methods
HMM Advantages continued
• Modularity
– HMMs can be combined into larger HMMs
• Transparency of the Model
– Assuming an architecture with a good design
– People can read the model and make sense of it
– The model itself can help increase understanding
HMM Advantages continued
• Incorporation of Prior Knowledge
– Incorporate prior knowledge into the architecture
– Initialize the model close to something believed to be correct
– Use prior knowledge to constrain training process
HMM Disadvantages
• Markov Chains
– States are supposed to be independent
– P(y) must be independent of P(x), and vice versa
– This usually isn’t true
– Can get around it when relationships are local
HMM Disadvantages
continued
• …and then there are the standard Machine Learning Problems
• Watch out for local maxima
– Model may not converge to a truly optimal parameter set for a
given training set
• Avoid over-fitting
– You’re only as good as your training set
– More training is not always good
HMM Disadvantages
continued
• Speed!!!
– Almost everything one does in an HMM involves: “enumerating
all possible paths through the model”
– There are efficient (dynamic programming) ways to do this
– Still slow in comparison to other methods
Conclusions
• HMMs have problems where they excel, and problems where they
do not
• You should consider using one if:
– Problem can be phrased as classification
– Observations are ordered
– The observations follow some sort of grammatical structure
(optional)
Conclusions
Advantages:
– Statistics
– Modularity
– Transparency
– Prior Knowledge
Disadvantages:
– State independence
– Over-fitting
– Local maxima
– Speed