
Page 1

Hidden Markov Models

BMI/CS 776

www.biostat.wisc.edu/~craven/776.html

Mark Craven

[email protected]

March 2002

Page 2

Announcements

• talks this week
   – Quantitative Genetics in the Age of Genomics
     Bruce Walsh, Univ. of Arizona
     4pm 3/11, 184 Russell Labs
   – Genomic Expression Programs in Yeast Cells Responding to Diverse Environmental Changes
     Audrey Gasch, Lawrence Berkeley National Lab
     10am 3/13, Biotech Center Auditorium
   – Population-Genetics Models for the Evolution of New Gene Function
     Bruce Walsh
     4pm 3/13, 1221 Statistics

Page 3

A Simple HMM

[figure: an HMM with two states, each of which can emit any of A, C, G, T]

• given, say, a T in our input sequence, which state emitted it?

Page 4

Hidden State

• we’ll distinguish between the observed parts of a problem and the hidden parts

• in the Markov models we’ve considered previously, it is clear which state accounts for each part of the observed sequence

• in the model above, there are multiple states that could account for each part of the observed sequence

– this is the hidden part of the problem

Page 5

The Parameters of an HMM

• since we’ve decoupled states and characters, we might also have emission probabilities

)|Pr()( kbxbe iik

)|Pr( 1 kla iikl

probability of emitting character b in state k

probability of a transition from state k to l represents a path (sequence of states) through the model

• as in Markov chain models, we have transition probabilities

Page 6

Simple HMM for Gene Finding

Figure from A. Krogh, An Introduction to Hidden Markov Models for Biological Sequences

[figure: a gene-finding HMM with a start state, single-nucleotide states for A, C, G, T, and states labeled GCTAC, AAAAA, TTTTT, CTACG, CTACA, CTACC, CTACT]

Page 7

HMM for Eukaryotic Gene Finding

Figure from A. Krogh, An Introduction to Hidden Markov Models for Biological Sequences

Page 8

Announcements

• reading for next week
   – Chapter 5 (through section 5.5) of Durbin et al.
   – Chapter 6 of Durbin et al.

• talks
   – Population-Genetics Models for the Evolution of New Gene Function
     Bruce Walsh, University of Arizona
     4pm 3/13, 1221 Computer Sciences
   – Information Fusion for Multidocument Summarization
     Regina Barzilay, Columbia University
     4pm 3/14, 1325 Computer Sciences

• HW #2 ready this afternoon; due 4/3

Page 9

The Course Project

• implement an algorithm or two
• experiment with it on some real data
• milestones

1. proposal (due 4/8)
   • the algorithm(s) you will be investigating
   • the data set(s) you will be using
   • 1-3 hypotheses

2. description of your experiments (due 4/15)
   • how you will test your hypotheses
   • data to be used
   • what will be varied
   • methodology

3. final write-up (due 5/17)
   • ~8 pages similar in format to a CS conference paper
   • prototypical organization: introduction, description of methods, description of experiments, discussion of results

Page 10

A Simple HMM

[figure: a five-state HMM. The begin state (0) moves to state 1 or state 2 with probability 0.5 each; state 1 self-loops with probability 0.2 and moves to state 3 with probability 0.8; state 2 self-loops with probability 0.8 and moves to state 4 with probability 0.2; state 3 self-loops with probability 0.4 and moves to the end state (5) with probability 0.6; state 4 self-loops with probability 0.1 and moves to the end state with probability 0.9. Emission probabilities: state 1: A 0.4, C 0.1, G 0.2, T 0.3; state 2: A 0.4, C 0.1, G 0.1, T 0.4; state 3: A 0.2, C 0.3, G 0.3, T 0.2; state 4: A 0.1, C 0.4, G 0.4, T 0.1]

$e_2(A) = 0.4$   probability of emitting character A in state 2

$a_{13} = 0.8$   probability of a transition from state 1 to state 3
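To make the figure concrete, here is one way its parameters might be laid out in Python. This is a sketch, not part of the lecture: the names (`a`, `e`, `EMITTING`) are my own, and the assignment of the figure's 0.1/0.9 labels to state 4's self-loop versus its transition to the end state is an assumption.

```python
import numpy as np

N = 6                                    # states 0..5; 0 = begin, 5 = end (both silent)
a = np.zeros((N, N))                     # a[k, l] = Pr(pi_i = l | pi_{i-1} = k)
a[0, 1] = 0.5; a[0, 2] = 0.5
a[1, 1] = 0.2; a[1, 3] = 0.8
a[2, 2] = 0.8; a[2, 4] = 0.2
a[3, 3] = 0.4; a[3, 5] = 0.6
a[4, 4] = 0.1; a[4, 5] = 0.9             # assumed split of the figure's 0.1/0.9 labels

e = {                                    # e[k][b] = Pr(x_i = b | pi_i = k)
    1: {'A': 0.4, 'C': 0.1, 'G': 0.2, 'T': 0.3},
    2: {'A': 0.4, 'C': 0.1, 'G': 0.1, 'T': 0.4},
    3: {'A': 0.2, 'C': 0.3, 'G': 0.3, 'T': 0.2},
    4: {'A': 0.1, 'C': 0.4, 'G': 0.4, 'T': 0.1},
}
EMITTING = [1, 2, 3, 4]                  # the states that emit characters
```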

Page 11

Three Important Questions

• How likely is a given sequence?

the Forward algorithm

• What is the most probable “path” for generating a given sequence?

the Viterbi algorithm

• How can we learn the HMM parameters given a set of sequences?

the Forward-Backward (Baum-Welch) algorithm

Page 12

How Likely is a Given Sequence?

• the probability that the path is taken and the sequence is generated:

(assuming begin/end are the only silent states on path)

$\Pr(x_1 \ldots x_L,\, \pi_0 \ldots \pi_N) = a_{0\pi_1} \prod_{i=1}^{L} e_{\pi_i}(x_i)\, a_{\pi_i \pi_{i+1}}$

where $x_1 \ldots x_L$ is the sequence and $\pi_0 \ldots \pi_N$ is the path

Page 13

How Likely is a Given Sequence?

[figure: the simple HMM from page 10]

$\Pr(\mathrm{AAC}, \pi) = a_{01}\, e_1(A)\, a_{11}\, e_1(A)\, a_{13}\, e_3(C)\, a_{35} = 0.5 \times 0.4 \times 0.2 \times 0.4 \times 0.8 \times 0.3 \times 0.6$
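The product above can be checked mechanically. A minimal sketch, reusing the `a` and `e` parameters from the page 10 sketch; `joint_prob` is a hypothetical helper, not a name from the lecture:

```python
def joint_prob(x, path, a, e, begin=0, end=5):
    """Pr(x, pi) = a_{0,pi_1} * prod_i [ e_{pi_i}(x_i) * a_{pi_i, pi_{i+1}} ]."""
    p = a[begin, path[0]]                    # transition out of the begin state
    for i, (c, k) in enumerate(zip(x, path)):
        p *= e[k][c]                         # emit x_i in state pi_i
        if i + 1 < len(path):
            p *= a[k, path[i + 1]]           # transition to the next state
    return p * a[path[-1], end]              # transition into the end state

print(joint_prob("AAC", [1, 1, 3], a, e))    # 0.002304, the product shown above
```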

Page 14

How Likely is a Given Sequence?

• the probability over all paths is:

$\Pr(x_1 \ldots x_L) = \sum_{\pi} \Pr(x_1 \ldots x_L,\, \pi_0 \ldots \pi_N)$

• but the number of paths can be exponential in the length of the sequence...

• the Forward algorithm enables us to compute this efficiently

Page 15

How Likely is a Given Sequence: The Forward Algorithm

• define $f_k(i)$ to be the probability of being in state k having observed the first i characters of x

• we want to compute $f_N(L)$, the probability of being in the end state having observed all of x

• can define this recursively

Page 16

The Forward Algorithm

• because of the Markov property, we don't have to explicitly enumerate every path – we can use dynamic programming instead

• e.g. compute $f_4(i)$ using $f_2(i-1)$ and $f_4(i-1)$

[figure: the simple HMM from page 10]

Page 17

The Forward Algorithm

• initialization:

$f_0(0) = 1$   probability that we're in the start state and have observed 0 characters from the sequence

$f_k(0) = 0$, for states k that are not silent

Page 18

The Forward Algorithm

• recursion for emitting states (i = 1…L):

$f_l(i) = e_l(x_i) \sum_k f_k(i-1)\, a_{kl}$

• recursion for silent states:

$f_l(i) = \sum_k f_k(i)\, a_{kl}$

Page 19

The Forward Algorithm

• termination:

$\Pr(x) = \Pr(x_1 \ldots x_L) = f_N(L) = \sum_k f_k(L)\, a_{kN}$

probability that we're in the end state and have observed the entire sequence

Page 20

Forward Algorithm Example

[figure: the simple HMM from page 10]

• given the sequence x = TAGA

Page 21

Forward Algorithm Example

• given the sequence x = TAGA

• initialization:

$f_0(0) = 1$,  $f_1(0) = 0, \ldots, f_5(0) = 0$

• computing other values:

$f_1(1) = e_1(T) \times (f_0(0)\, a_{01} + f_1(0)\, a_{11}) = 0.3 \times (1 \times 0.5 + 0 \times 0.2) = 0.15$

$f_2(1) = 0.4 \times (1 \times 0.5 + 0 \times 0.8) = 0.2$

$f_1(2) = e_1(A) \times (f_0(1)\, a_{01} + f_1(1)\, a_{11}) = 0.4 \times (0 \times 0.5 + 0.15 \times 0.2) = 0.012$

…

$\Pr(\mathrm{TAGA}) = f_5(4) = f_3(4)\, a_{35} + f_4(4)\, a_{45}$

Page 22

Three Important Questions

• How likely is a given sequence?

• What is the most probable “path” for generating a given sequence?

• How can we learn the HMM parameters given a set of sequences?

Page 23

Finding the Most Probable Path: The Viterbi Algorithm

• define $v_k(i)$ to be the probability of the most probable path accounting for the first i characters of x and ending in state k

• we want to compute $v_N(L)$, the probability of the most probable path accounting for all of the sequence and ending in the end state

• can define recursively

• can use dynamic programming to find $v_N(L)$ efficiently

Page 24

Finding the Most Probable Path: The Viterbi Algorithm

• initialization:

$v_0(0) = 1$

$v_k(0) = 0$, for states k that are not silent

Page 25

The Viterbi Algorithm

• recursion for emitting states (i = 1…L):

$v_l(i) = e_l(x_i)\, \max_k \left[ v_k(i-1)\, a_{kl} \right]$

$\mathrm{ptr}_l(i) = \arg\max_k \left[ v_k(i-1)\, a_{kl} \right]$   keep track of the most probable path

• recursion for silent states:

$v_l(i) = \max_k \left[ v_k(i)\, a_{kl} \right]$

$\mathrm{ptr}_l(i) = \arg\max_k \left[ v_k(i)\, a_{kl} \right]$

Page 26

The Viterbi Algorithm

• termination:

$\Pr(x, \pi) = \max_k \left[ v_k(L)\, a_{kN} \right]$

$\pi_L = \arg\max_k \left[ v_k(L)\, a_{kN} \right]$

• traceback: follow the pointers back, starting at $\pi_L$
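A matching Viterbi sketch with traceback, under the same assumptions as the forward sketch (begin/end are the only silent states; `a`, `e`, and `EMITTING` are from the page 10 sketch):

```python
import numpy as np

def viterbi(x, a, e, emitting, begin=0, end=5):
    """Return (Pr(x, pi*), pi*), the most probable path and its joint probability."""
    L, N = len(x), a.shape[0]
    v = np.zeros((N, L + 1))
    ptr = np.zeros((N, L + 1), dtype=int)
    v[begin, 0] = 1.0                              # initialization: v_0(0) = 1
    for i in range(1, L + 1):                      # recursion for emitting states
        for l in emitting:
            scores = [v[k, i - 1] * a[k, l] for k in range(N)]
            ptr[l, i] = int(np.argmax(scores))     # remember the best predecessor
            v[l, i] = e[l][x[i - 1]] * max(scores)
    finals = [v[k, L] * a[k, end] for k in range(N)]
    state = int(np.argmax(finals))                 # termination: pi_L
    path = [state]
    for i in range(L, 1, -1):                      # traceback via the pointers
        state = ptr[state, i]
        path.append(state)
    return max(finals), path[::-1]

prob, path = viterbi("TAGA", a, e, EMITTING)
print(prob, path)                                  # most probable state path for TAGA
```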

Page 27

Three Important Questions

• How likely is a given sequence?

• What is the most probable “path” for generating a given sequence?

• How can we learn the HMM parameters given a set of sequences?

Page 28

Announcements

• class ends at 1:45 today

• talk today: Lattice Models, Runs-related Statistics, and Applications to Computational Biology
   – Yong Kong, Curagen Corp.
   – 2:00 PM, Biotech Center Auditorium, 425 Henry Mall

• HW #2 due date now 4/8

• project proposal due 4/8

Page 29

The Learning Task

• given:

– a model

– a set of sequences (the training set)

• do:

– find the most likely parameters to explain the training sequences

• the goal is to find a model that generalizes well to sequences we haven’t seen before

Page 30

Learning Parameters

• if we know the state path for each training sequence, learning the model parameters is simple
   – no hidden state during training
   – count how often each parameter is used
   – normalize/smooth to get probabilities
   – process is just like it was for Markov chain models

• if we don’t know the path for each training sequence, how can we determine the counts?
   – key insight: estimate the counts by considering every path, weighted by its probability

Page 31

Learning Parameters: The Baum-Welch Algorithm

• a.k.a. the Forward-Backward algorithm

• an EM approach

• algorithm sketch:

– initialize parameters of model

– iterate until convergence

• calculate the expected number of times each transition or emission is used

• adjust the parameters to maximize the likelihood of these expected values

Page 32

The Expectation Step

• we want to know the probability of producing sequence x with the i th symbol being produced by state k (for all x, i and k)

[figure: the simple HMM from page 10, with the observed sequence C A G T below it]

Page 33

The Expectation Step

• the forward algorithm gives us $f_k(i)$, the probability of being in state k having observed the first i characters of x

[figure: the simple HMM from page 10, with the observed sequence C A G T below it]

Page 34

The Expectation Step

• the backward algorithm gives us $b_k(i)$, the probability of observing the rest of x, given that we’re in state k after i characters

[figure: the simple HMM from page 10, with the observed sequence C A G T below it]

Page 35

The Expectation Step

• putting forward and backward together, we can compute the probability of producing sequence x with the i th symbol being produced by state k

[figure: the simple HMM from page 10, with the observed sequence C A G T below it]

Page 36

The Expectation Step

• first, we need to know $\Pr(\pi_i = k \mid x)$, the probability of the i th symbol being produced by state k, given sequence x

• given this, we can compute our expected counts for state transitions and character emissions

Page 37

The Expectation Step

• the probability of producing x with the i th symbol being produced by state k is

$\Pr(x,\, \pi_i = k) = \Pr(x_1 \ldots x_i,\, \pi_i = k) \times \Pr(x_{i+1} \ldots x_L \mid \pi_i = k)$

• the first term is $f_k(i)$, computed by the forward algorithm

• the second term is $b_k(i)$, computed by the backward algorithm

Page 38

The Backward Algorithm

• initialization:

$b_k(L) = a_{kN}$   for states k with a transition to the end state

Page 39

The Backward Algorithm

• recursion (i = L-1…1):

$b_k(i) = \sum_l \begin{cases} a_{kl}\, b_l(i) & \text{if } l \text{ is a silent state} \\ a_{kl}\, e_l(x_{i+1})\, b_l(i+1) & \text{otherwise} \end{cases}$

Page 40

The Backward Algorithm

• termination:

$\Pr(x) = \Pr(x_1 \ldots x_L) = \sum_l \begin{cases} a_{0l}\, b_l(0) & \text{if } l \text{ is a silent state} \\ a_{0l}\, e_l(x_1)\, b_l(1) & \text{otherwise} \end{cases}$
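A sketch of the backward algorithm under the same assumptions as before; because begin and end are the toy model's only silent states, the silent-state branch of the recursion is handled entirely by the initialization and termination steps:

```python
import numpy as np

def backward(x, a, e, emitting, begin=0, end=5):
    """Return (b, Pr(x)), where b[k, i] = b_k(i) = Pr(x_{i+1}..x_L | state k at position i)."""
    L, N = len(x), a.shape[0]
    b = np.zeros((N, L + 1))
    for k in emitting:
        b[k, L] = a[k, end]                  # initialization: b_k(L) = a_kN
    for i in range(L - 1, 0, -1):            # recursion: i = L-1 .. 1
        for k in emitting:
            b[k, i] = sum(a[k, l] * e[l][x[i]] * b[l, i + 1] for l in emitting)
    px = sum(a[begin, l] * e[l][x[0]] * b[l, 1] for l in emitting)   # termination
    return b, px

b, px = backward("TAGA", a, e, EMITTING)
print(px)                                    # should equal the forward algorithm's Pr(TAGA)
```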

Page 41

The Expectation Step

• now we can calculate $\Pr(\pi_i = k \mid x)$, the probability of the i th symbol being produced by state k, given x:

$\Pr(\pi_i = k \mid x) = \frac{\Pr(\pi_i = k,\, x)}{\Pr(x)} = \frac{f_k(i)\, b_k(i)}{\Pr(x)}$
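In code, the posterior falls straight out of the two tables; a sketch assuming the `forward()` and `backward()` sketches above are in scope:

```python
f, px = forward("TAGA", a, e, EMITTING)
b, _ = backward("TAGA", a, e, EMITTING)
posterior = f * b / px                   # posterior[k, i] = Pr(pi_i = k | x)
print(posterior[:, 1:].sum(axis=0))      # sanity check: each column sums to 1
```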

Page 42

The Expectation Step

• now we can calculate $n_{k,c}$, the expected number of times letter c is emitted by state k:

$n_{k,c} = \sum_x \frac{1}{\Pr(x)} \sum_{\{i \mid x_i = c\}} f_k(i)\, b_k(i)$

where the outer sum is over sequences and the inner sum is over the positions i where c occurs in x

Page 43

The Expectation Step

• and we can calculate $n_{k \to l}$, the expected number of times that the transition from k to l is used:

$n_{k \to l} = \sum_x \frac{1}{\Pr(x)} \sum_i f_k(i)\, a_{kl}\, e_l(x_{i+1})\, b_l(i+1)$

• or, if l is a silent state:

$n_{k \to l} = \sum_x \frac{1}{\Pr(x)} \sum_i f_k(i)\, a_{kl}\, b_l(i)$
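A sketch of both expected-count computations, again assuming the `forward()` and `backward()` sketches above; transitions touching the silent begin and end states are accumulated separately, since the backward sketch folds $b_{\text{end}}(L) = 1$ into its initialization:

```python
import numpy as np

def expected_counts(seqs, a, e, emitting, begin=0, end=5):
    """E-step: expected emission counts n_emit[k][c] and transition counts n_trans[k, l]."""
    N = a.shape[0]
    n_emit = {k: {c: 0.0 for c in "ACGT"} for k in emitting}
    n_trans = np.zeros((N, N))
    for x in seqs:
        f, px = forward(x, a, e, emitting)
        b, _ = backward(x, a, e, emitting)
        L = len(x)
        for k in emitting:
            for i in range(1, L + 1):        # n_{k,c}: add f_k(i) b_k(i) at positions with x_i = c
                n_emit[k][x[i - 1]] += f[k, i] * b[k, i] / px
            for l in emitting:               # n_{k->l}: f_k(i) a_kl e_l(x_{i+1}) b_l(i+1)
                n_trans[k, l] += sum(f[k, i] * a[k, l] * e[l][x[i]] * b[l, i + 1]
                                     for i in range(1, L)) / px
            n_trans[k, end] += f[k, L] * a[k, end] / px     # into the silent end state
        for l in emitting:                   # out of the silent begin state
            n_trans[begin, l] += a[begin, l] * e[l][x[0]] * b[l, 1] / px
    return n_emit, n_trans
```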

Page 44

The Maximization Step

• let $n_{k,c}$ be the expected number of emissions of c from state k for the training set

• estimate new emission parameters by:

$e_k(c) = \frac{n_{k,c}}{\sum_{c'} n_{k,c'}}$

• just like in the simple case

• but typically we’ll do some “smoothing” (e.g. add pseudocounts)

Page 45

The Maximization Step

• let $n_{k \to l}$ be the expected number of transitions from state k to state l for the training set

• estimate new transition parameters by:

$a_{kl} = \frac{n_{k \to l}}{\sum_m n_{k \to m}}$
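Both maximization formulas are plain count normalization. A sketch; the `pseudo` argument implements the pseudocount smoothing mentioned on the previous page:

```python
import numpy as np

def maximize(n_emit, n_trans, pseudo=0.0):
    """M-step: turn expected counts into new emission and transition parameters."""
    e_new = {}
    for k, counts in n_emit.items():
        total = sum(counts.values()) + pseudo * len(counts) or 1.0   # avoid 0/0 for unused states
        e_new[k] = {c: (n + pseudo) / total for c, n in counts.items()}
    a_new = np.array(n_trans, dtype=float)
    for k in range(a_new.shape[0]):
        row_sum = a_new[k].sum()
        if row_sum > 0:                      # a_kl = n_{k->l} / sum_m n_{k->m}
            a_new[k] /= row_sum
    return e_new, a_new
```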

Page 46

The Baum-Welch Algorithm

• initialize parameters of model

• iterate until convergence

– calculate the expected number of times each transition or emission is used

– adjust the parameters to maximize the likelihood of these expected values

Page 47

Baum-Welch Convergence

• this algorithm will converge to a local maximum (in the likelihood of the data given the model)

• usually in a fairly small number of iterations
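Tying the pieces together: a minimal Baum-Welch loop, assuming the `expected_counts()` and `maximize()` sketches above are in scope. The training sequences below are made up for illustration, and convergence is declared when the data log-likelihood stops improving, at the local maximum noted above:

```python
import math

def baum_welch(seqs, a, e, emitting, n_iter=100, tol=1e-6):
    """EM for HMM parameters: alternate E-step and M-step until the log-likelihood converges."""
    prev_ll = -math.inf
    for _ in range(n_iter):
        n_emit, n_trans = expected_counts(seqs, a, e, emitting)   # E-step
        e, a = maximize(n_emit, n_trans, pseudo=0.1)              # M-step, with smoothing
        ll = sum(math.log(forward(x, a, e, emitting)[1]) for x in seqs)
        if ll - prev_ll < tol:               # converged to a local maximum
            break
        prev_ll = ll
    return a, e

a_new, e_new = baum_welch(["TAGA", "CTTAC", "ACGGA"], a, e, EMITTING)   # toy training set
```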