Viterbi, Forward, and Backward Algorithms for Hidden Markov Models Prof. Carolina Ruiz Computer Science Department Bioinformatics and Computational Biology Program WPI



Resources used for these slides

• Durbin, Eddy, Krogh, and Mitchison. "Biological Sequence Analysis". Cambridge University Press. 1998. Sections 3.1-3.3.

• Prof. Moran's Algorithms in Computational Biology course (Technion Univ.):
  – Ydo Wexler & Dan Geiger's Markov Chain Tutorial.
  – Hidden Markov Models (HMMs) Tutorial.

HMM: Coke/Pepsi Example

Hidden States:
• start: fake start state
• A: The prices of Coke and Pepsi are the same
• R: “Red sale”: Coke is on sale (cheaper than Pepsi)
• B: “Blue sale”: Pepsi is on sale (cheaper than Coke)

Emissions:
• C: Coke
• P: Pepsi

[State diagram: start leads to A, R, and B; A, R, and B are fully connected; each of them emits C or P.]

Transition probabilities p(next state | current state), as used in the worked examples:

from  | to A | to R | to B
start | 0.6  | 0.1  | 0.3
A     | 0.2  | 0.1  | 0.7
R     | 0.1  | 0.1  | 0.8
B     | 0.4  | 0.3  | 0.3

Emission probabilities p(drink | state), as used in the worked examples:

state | C   | P
A     | 0.6 | 0.4
R     | 0.9 | 0.1
B     | 0.5 | 0.5
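As a minimal sketch, the Coke/Pepsi HMM can be written down as plain Python dictionaries. The probabilities are the ones used in the worked examples of these slides; the variable names (STATES, START, TRANS, EMIT) and the helper is_distribution are illustrative assumptions, not part of the original slides. The check at the end only confirms that every row is a probability distribution.

```python
import math

# Coke/Pepsi HMM parameters, taken from the worked examples in these slides.
# All names below are illustrative assumptions.
STATES = ["A", "R", "B"]
START = {"A": 0.6, "R": 0.1, "B": 0.3}           # p(state | start)
TRANS = {                                        # p(next state | current state)
    "A": {"A": 0.2, "R": 0.1, "B": 0.7},
    "R": {"A": 0.1, "R": 0.1, "B": 0.8},
    "B": {"A": 0.4, "R": 0.3, "B": 0.3},
}
EMIT = {                                         # p(drink | state)
    "A": {"C": 0.6, "P": 0.4},
    "R": {"C": 0.9, "P": 0.1},
    "B": {"C": 0.5, "P": 0.5},
}


def is_distribution(row):
    """Return True if the probabilities in one row sum to 1 (up to float error)."""
    return math.isclose(sum(row.values()), 1.0)


all_rows_ok = (
    is_distribution(START)
    and all(is_distribution(TRANS[s]) for s in STATES)
    and all(is_distribution(EMIT[s]) for s in STATES)
)
print(all_rows_ok)
```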

1. Finding the most likely trajectory

• Given an HMM and a sequence of observables: x1, x2, …, xL
• determine the most likely sequence of states that generated x1, x2, …, xL:

$$
\begin{aligned}
S^* &= (s^*_1, s^*_2, \ldots, s^*_L) \\
    &= \arg\max_{s_1,\ldots,s_L} \; p(s_1,\ldots,s_L \mid x_1,\ldots,x_L) \\
    &= \arg\max_{s_1,\ldots,s_L} \; \frac{p(s_1,\ldots,s_L;\, x_1,\ldots,x_L)}{p(x_1,\ldots,x_L)} \\
    &= \arg\max_{s_1,\ldots,s_L} \; p(s_1,\ldots,s_L;\, x_1,\ldots,x_L)
\end{aligned}
$$

The last step holds because p(x1, x2, …, xL) does not depend on the choice of states.
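To make the definition concrete, the following sketch evaluates the argmax literally, by enumerating every state path for the observations CPC and keeping the one with the largest joint probability. It assumes the Coke/Pepsi parameters used in the worked examples of these slides; the function names joint and most_likely_path_brute_force are hypothetical. Enumeration takes time exponential in L, so it is only a check of the definition, not a practical method.

```python
from itertools import product

# Coke/Pepsi HMM parameters from the worked examples (names are assumptions).
STATES = ["A", "R", "B"]
START = {"A": 0.6, "R": 0.1, "B": 0.3}
TRANS = {
    "A": {"A": 0.2, "R": 0.1, "B": 0.7},
    "R": {"A": 0.1, "R": 0.1, "B": 0.8},
    "B": {"A": 0.4, "R": 0.3, "B": 0.3},
}
EMIT = {
    "A": {"C": 0.6, "P": 0.4},
    "R": {"C": 0.9, "P": 0.1},
    "B": {"C": 0.5, "P": 0.5},
}


def joint(path, obs):
    """p(s1..sL ; x1..xL) for one state path, leaving the fake start state first."""
    p = START[path[0]] * EMIT[path[0]][obs[0]]
    for prev, cur, x in zip(path, path[1:], obs[1:]):
        p *= TRANS[prev][cur] * EMIT[cur][x]
    return p


def most_likely_path_brute_force(obs):
    """Evaluate the argmax directly over all |states|^L candidate paths."""
    return max(product(STATES, repeat=len(obs)), key=lambda path: joint(path, obs))


best = most_likely_path_brute_force("CPC")
print(best, joint(best, "CPC"))
```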

$$
\begin{aligned}
S^* &= \arg\max_{s_1,\ldots,s_L} \; p(s_1,\ldots,s_L;\, x_1,\ldots,x_L) \\
    &= \arg\max_{s_1,\ldots,s_L} \; p(s_1,\ldots,s_{L-1};\, x_1,\ldots,x_{L-1})\, p(s_L \mid s_{L-1})\, p(x_L \mid s_L)
\end{aligned}
$$

This inspires a recursive formulation of S*. Viterbi’s idea: S* can be calculated using dynamic programming. Let

$$
v(k,t) = \max_{s_1,\ldots,s_{t-1}} \; p(s_1,\ldots,s_{t-1}, s_t = k;\, x_1,\ldots,x_t)
$$

that is, the probability of a most probable path up to time t that ends in state k. By the above derivation:

$$
\begin{aligned}
v(k,t) &= \max_{s_1,\ldots,s_{t-1}} \; p(s_1,\ldots,s_{t-1};\, x_1,\ldots,x_{t-1})\, p(s_t = k \mid s_{t-1})\, p(x_t \mid s_t = k) \\
       &= \max_j \; v(j,t-1)\, p(s_t = k \mid s_{t-1} = j)\, p(x_t \mid s_t = k) \\
       &= p(x_t \mid s_t = k)\, \max_j \; v(j,t-1)\, p(s_t = k \mid s_{t-1} = j)
\end{aligned}
$$
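The recursion for v(k,t) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the Coke/Pepsi parameters are those used in the worked examples of these slides, the fake start state is folded into the first column, and the function name viterbi_table and the list-of-dicts layout are choices made here for readability. Alongside each value it records the maximizing predecessor j as the "parent" of the cell.

```python
# Coke/Pepsi HMM parameters from the worked examples (names are assumptions).
STATES = ["A", "R", "B"]
START = {"A": 0.6, "R": 0.1, "B": 0.3}
TRANS = {
    "A": {"A": 0.2, "R": 0.1, "B": 0.7},
    "R": {"A": 0.1, "R": 0.1, "B": 0.8},
    "B": {"A": 0.4, "R": 0.3, "B": 0.3},
}
EMIT = {
    "A": {"C": 0.6, "P": 0.4},
    "R": {"C": 0.9, "P": 0.1},
    "B": {"C": 0.5, "P": 0.5},
}


def viterbi_table(obs):
    """Return (v, parent); v[t][k] is v(k, t+1) in the slides' 1-based notation."""
    # First column: the only predecessor is the fake start state, v(start,0) = 1.
    v = [{k: EMIT[k][obs[0]] * START[k] for k in STATES}]
    parent = [{k: "start" for k in STATES}]
    for x in obs[1:]:
        prev = v[-1]
        column, column_parent = {}, {}
        for k in STATES:
            # argmax over j of v(j,t-1) * p(s_t = k | s_{t-1} = j)
            best_j = max(STATES, key=lambda j: prev[j] * TRANS[j][k])
            column[k] = EMIT[k][x] * prev[best_j] * TRANS[best_j][k]
            column_parent[k] = best_j
        v.append(column)
        parent.append(column_parent)
    return v, parent


v, parent = viterbi_table("CPC")
for t, column in enumerate(v, start=1):
    print(t, column, parent[t - 1])
```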

Viterbi’s Algorithm - Example

Given: Coke/Pepsi HMM, and sequence of observations: CPC
Find the most likely path S* = (s*1, s*2, s*3) that generated x1, x2, x3 = CPC

Initialization:

v     | t = 0 | x1 = C | x2 = P | x3 = C
start | 1     | 0      | 0      | 0
A     | 0     |        |        |
R     | 0     |        |        |
B     | 0     |        |        |

Viterbi’s Algorithm - Example (cont.)

Column x1 = C, using v(k,t) = p(xt|st=k) max_j v(j,t-1) p(st=k|st-1=j):

• v(A,1) = p(C|A) max{v(start,0)p(A|start), 0, 0, 0} = p(C|A) v(start,0) p(A|start) = 0.6*1*0.6 = 0.36. Parent: start
• v(R,1) = p(C|R) max{v(start,0)p(R|start), 0, 0, 0} = 0.9*1*0.1 = 0.09. Parent: start
• v(B,1) = p(C|B) max{v(start,0)p(B|start), 0, 0, 0} = 0.5*1*0.3 = 0.15. Parent: start

Viterbi’s Algorithm - Example (cont.)

Column x2 = P, using v(k,t) = p(xt|st=k) max_j v(j,t-1) p(st=k|st-1=j) and the previous column v(A,1) = 0.36, v(R,1) = 0.09, v(B,1) = 0.15:

• v(A,2) = p(P|A) max{v(start,1)p(A|start), v(A,1)p(A|A), v(R,1)p(A|R), v(B,1)p(A|B)} = 0.4 * max{0, 0.36*0.2, 0.09*0.1, 0.15*0.4} = 0.4*0.072 = 0.0288. Parent: A
• v(R,2) = p(P|R) max{v(start,1)p(R|start), v(A,1)p(R|A), v(R,1)p(R|R), v(B,1)p(R|B)} = 0.1 * max{0, 0.36*0.1, 0.09*0.1, 0.15*0.3} = 0.1*0.045 = 0.0045. Parent: B
• v(B,2) = p(P|B) max{v(start,1)p(B|start), v(A,1)p(B|A), v(R,1)p(B|R), v(B,1)p(B|B)} = 0.5 * max{0, 0.36*0.7, 0.09*0.8, 0.15*0.3} = 0.5*0.252 = 0.126. Parent: A

Viterbi’s Algorithm - Example (cont.)

Column x3 = C, using v(k,t) = p(xt|st=k) max_j v(j,t-1) p(st=k|st-1=j) and the previous column v(A,2) = 0.0288, v(R,2) = 0.0045, v(B,2) = 0.126:

• v(A,3) = p(C|A) max{v(start,2)p(A|start), v(A,2)p(A|A), v(R,2)p(A|R), v(B,2)p(A|B)} = 0.6 * max{0, 0.0288*0.2, 0.0045*0.1, 0.126*0.4} = 0.6*0.0504 = 0.03024. Parent: B
• v(R,3) = p(C|R) max{v(start,2)p(R|start), v(A,2)p(R|A), v(R,2)p(R|R), v(B,2)p(R|B)} = 0.9 * max{0, 0.0288*0.1, 0.0045*0.1, 0.126*0.3} = 0.9*0.0378 = 0.03402. Parent: B
• v(B,3) = p(C|B) max{v(start,2)p(B|start), v(A,2)p(B|A), v(R,2)p(B|R), v(B,2)p(B|B)} = 0.5 * max{0, 0.0288*0.7, 0.0045*0.8, 0.126*0.3} = 0.5*0.0378 = 0.0189. Parent: B

Viterbi’s Algorithm - Example (cont.)

Completed table:

v     | t = 0 | x1 = C               | x2 = P             | x3 = C
start | 1     | 0                    | 0                  | 0
A     | 0     | 0.36 (Parent: start) | 0.0288 (Parent: A) | 0.03024 (Parent: B)
R     | 0     | 0.09 (Parent: start) | 0.0045 (Parent: B) | 0.03402 (Parent: B)
B     | 0     | 0.15 (Parent: start) | 0.126 (Parent: A)  | 0.0189 (Parent: B)

Hence, the most likely path that generated CPC is: start, A, B, R.

This maximum likelihood path is extracted from the table as follows:
• The last state of the path is the one with the highest value in the right-most column
• The previous state in the path is the one recorded as Parent of the last
• Keep following the Parents trail backwards until you arrive at start
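The traceback described above can be sketched as follows. The values and parents are copied from the completed Viterbi table for CPC; the names V_LAST, PARENT, and traceback are illustrative assumptions.

```python
# Right-most column of the Viterbi table and the recorded parents, copied from
# the completed table for the observations CPC. Names are assumptions.
V_LAST = {"A": 0.03024, "R": 0.03402, "B": 0.0189}
PARENT = [
    {"A": "start", "R": "start", "B": "start"},  # column x1 = C
    {"A": "A", "R": "B", "B": "A"},              # column x2 = P
    {"A": "B", "R": "B", "B": "B"},              # column x3 = C
]


def traceback(v_last, parent):
    """Follow the Parent trail backwards from the best final state to start."""
    state = max(v_last, key=v_last.get)   # highest value in the right-most column
    path = [state]
    for column in reversed(parent):       # walk the columns from right to left
        state = column[state]
        path.append(state)
    path.reverse()
    return path


path = traceback(V_LAST, PARENT)
print(path)
```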

2. Calculating the probability of a sequence of observations

• Given an HMM and a sequence of observations: x1, x2, …, xL
• determine p(x1, x2, …, xL):

$$
\begin{aligned}
p(x_1,\ldots,x_L) &= \sum_{s_1,\ldots,s_L} p(s_1,\ldots,s_L;\, x_1,\ldots,x_L) \\
                  &= \sum_{s_1,\ldots,s_L} p(s_1,\ldots,s_{L-1};\, x_1,\ldots,x_{L-1})\, p(s_L \mid s_{L-1})\, p(x_L \mid s_L)
\end{aligned}
$$
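As a sanity check of this definition, the sum over all state paths can be evaluated directly for a short sequence. The sketch below assumes the Coke/Pepsi parameters used in the worked examples of these slides; joint and sequence_probability_brute_force are hypothetical names. The number of paths grows exponentially with L, so this is only practical for tiny inputs.

```python
from itertools import product

# Coke/Pepsi HMM parameters from the worked examples (names are assumptions).
STATES = ["A", "R", "B"]
START = {"A": 0.6, "R": 0.1, "B": 0.3}
TRANS = {
    "A": {"A": 0.2, "R": 0.1, "B": 0.7},
    "R": {"A": 0.1, "R": 0.1, "B": 0.8},
    "B": {"A": 0.4, "R": 0.3, "B": 0.3},
}
EMIT = {
    "A": {"C": 0.6, "P": 0.4},
    "R": {"C": 0.9, "P": 0.1},
    "B": {"C": 0.5, "P": 0.5},
}


def joint(path, obs):
    """p(s1..sL ; x1..xL) for one state path, leaving the fake start state first."""
    p = START[path[0]] * EMIT[path[0]][obs[0]]
    for prev, cur, x in zip(path, path[1:], obs[1:]):
        p *= TRANS[prev][cur] * EMIT[cur][x]
    return p


def sequence_probability_brute_force(obs):
    """Sum the joint probability over every possible state path."""
    return sum(joint(path, obs) for path in product(STATES, repeat=len(obs)))


p_cpc = sequence_probability_brute_force("CPC")
print(p_cpc)
```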

Let

$$
f(k,t) = p(s_t = k;\, x_1,\ldots,x_t)
$$

that is, the probability of emitting x1,…,xt and being in state k at time t. In other words, the sum of the probabilities of all the paths that emit (x1,…,xt) and end in state st = k.

$$
\begin{aligned}
f(k,t) &= p(s_t = k;\, x_1,\ldots,x_{t-1}, x_t) \\
       &= \sum_j p(s_{t-1} = j;\, x_1,\ldots,x_{t-1})\, p(s_t = k \mid s_{t-1} = j)\, p(x_t \mid s_t = k) \\
       &= p(x_t \mid s_t = k) \sum_j p(s_{t-1} = j;\, x_1,\ldots,x_{t-1})\, p(s_t = k \mid s_{t-1} = j) \\
       &= p(x_t \mid s_t = k) \sum_j f(j,t-1)\, p(s_t = k \mid s_{t-1} = j)
\end{aligned}
$$
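The recursion for f(k,t) has the same shape as the Viterbi recursion with the max replaced by a sum. A minimal sketch, assuming the Coke/Pepsi parameters used in the worked examples of these slides (the function name forward and the list-of-dicts layout are choices made here):

```python
# Coke/Pepsi HMM parameters from the worked examples (names are assumptions).
STATES = ["A", "R", "B"]
START = {"A": 0.6, "R": 0.1, "B": 0.3}
TRANS = {
    "A": {"A": 0.2, "R": 0.1, "B": 0.7},
    "R": {"A": 0.1, "R": 0.1, "B": 0.8},
    "B": {"A": 0.4, "R": 0.3, "B": 0.3},
}
EMIT = {
    "A": {"C": 0.6, "P": 0.4},
    "R": {"C": 0.9, "P": 0.1},
    "B": {"C": 0.5, "P": 0.5},
}


def forward(obs):
    """Return f as a list of columns; f[t][k] is f(k, t+1) in 1-based notation."""
    # First column: the only predecessor is the fake start state, f(start,0) = 1.
    f = [{k: EMIT[k][obs[0]] * START[k] for k in STATES}]
    for x in obs[1:]:
        prev = f[-1]
        f.append({
            k: EMIT[k][x] * sum(prev[j] * TRANS[j][k] for j in STATES)
            for k in STATES
        })
    return f


f = forward("CPC")
p_cpc = sum(f[-1].values())   # p(x1..xL) is the sum of the last column
print(f)
print(p_cpc)
```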

Forward Algorithm - Example

Given: Coke/Pepsi HMM, and sequence of observations: CPC
Find the probability that the HMM emits x1, x2, x3 = CPC. That is, find p(CPC).

Initialization:

f     | t = 0 | x1 = C | x2 = P | x3 = C
start | 1     | 0      | 0      | 0
A     | 0     |        |        |
R     | 0     |        |        |
B     | 0     |        |        |

Forward Algorithm - Example (cont.)

Column x1 = C, using f(k,t) = p(xt|st=k) Σ_j f(j,t-1) p(st=k|st-1=j):

• f(A,1) = p(C|A) (f(start,0)p(A|start) + 0 + 0 + 0) = p(C|A) f(start,0) p(A|start) = 0.6*1*0.6 = 0.36
• f(R,1) = p(C|R) (f(start,0)p(R|start) + 0 + 0 + 0) = 0.9*1*0.1 = 0.09
• f(B,1) = p(C|B) (f(start,0)p(B|start) + 0 + 0 + 0) = 0.5*1*0.3 = 0.15

Forward Algorithm - Example (cont.)

Column x2 = P, using f(k,t) = p(xt|st=k) Σ_j f(j,t-1) p(st=k|st-1=j) and the previous column f(A,1) = 0.36, f(R,1) = 0.09, f(B,1) = 0.15:

• f(A,2) = p(P|A) (f(start,1)p(A|start) + f(A,1)p(A|A) + f(R,1)p(A|R) + f(B,1)p(A|B)) = 0.4 * (0 + 0.36*0.2 + 0.09*0.1 + 0.15*0.4) = 0.4*0.141 = 0.0564
• f(R,2) = p(P|R) (f(start,1)p(R|start) + f(A,1)p(R|A) + f(R,1)p(R|R) + f(B,1)p(R|B)) = 0.1 * (0 + 0.36*0.1 + 0.09*0.1 + 0.15*0.3) = 0.1*0.09 = 0.009
• f(B,2) = p(P|B) (f(start,1)p(B|start) + f(A,1)p(B|A) + f(R,1)p(B|R) + f(B,1)p(B|B)) = 0.5 * (0 + 0.36*0.7 + 0.09*0.8 + 0.15*0.3) = 0.5*0.369 = 0.1845

Forward Algorithm - Example (cont.)

Column x3 = C, using f(k,t) = p(xt|st=k) Σ_j f(j,t-1) p(st=k|st-1=j) and the previous column f(A,2) = 0.0564, f(R,2) = 0.009, f(B,2) = 0.1845:

• f(A,3) = p(C|A) (f(start,2)p(A|start) + f(A,2)p(A|A) + f(R,2)p(A|R) + f(B,2)p(A|B)) = 0.6 * (0 + 0.0564*0.2 + 0.009*0.1 + 0.1845*0.4) = 0.6*0.08598 = 0.05159
• f(R,3) = p(C|R) (f(start,2)p(R|start) + f(A,2)p(R|A) + f(R,2)p(R|R) + f(B,2)p(R|B)) = 0.9 * (0 + 0.0564*0.1 + 0.009*0.1 + 0.1845*0.3) = 0.9*0.06189 = 0.05570
• f(B,3) = p(C|B) (f(start,2)p(B|start) + f(A,2)p(B|A) + f(R,2)p(B|R) + f(B,2)p(B|B)) = 0.5 * (0 + 0.0564*0.7 + 0.009*0.8 + 0.1845*0.3) = 0.5*0.10203 = 0.05102

Forward Algorithm - Example (cont.)

Completed table:

f     | t = 0 | x1 = C | x2 = P | x3 = C
start | 1     | 0      | 0      | 0
A     | 0     | 0.36   | 0.0564 | 0.05159
R     | 0     | 0.09   | 0.009  | 0.05570
B     | 0     | 0.15   | 0.1845 | 0.05102

Hence, the probability of CPC being generated by this HMM is:
p(CPC) = Σ_j f(j,3) = 0.05159 + 0.05570 + 0.05102 = 0.15831

3. Calculating the probability of St = k given a sequence of observations

• Given an HMM and a sequence of observations: x1, x2, …, xL
• determine the probability that the state visited at time t was k: p(st=k | x1, x2, …, xL), where 1 ≤ t ≤ L

$$
p(s_t = k \mid x_1,\ldots,x_L) = \frac{p(x_1,\ldots,x_L;\, s_t = k)}{p(x_1,\ldots,x_L)}
$$

Note that p(x1, x2, …, xL) can be found using the forward algorithm. We’ll focus now on determining p(x1, x2, …, xL; st=k).

$$
\begin{aligned}
p(x_1,\ldots,x_t,\ldots,x_L;\, s_t = k)
  &= p(x_1,\ldots,x_t;\, s_t = k)\; p(x_{t+1},\ldots,x_L \mid x_1,\ldots,x_t;\, s_t = k) \\
  &= \underbrace{p(x_1,\ldots,x_t;\, s_t = k)}_{f(k,t):\ \text{forward algorithm}}\;
     \underbrace{p(x_{t+1},\ldots,x_L \mid s_t = k)}_{b(k,t):\ \text{backward algorithm}}
\end{aligned}
$$

$$
\begin{aligned}
b(k,t) &= p(x_{t+1},\ldots,x_L \mid s_t = k) \\
       &= \sum_j p(s_{t+1} = j \mid s_t = k)\, p(x_{t+1} \mid s_{t+1} = j)\,
          \underbrace{p(x_{t+2},\ldots,x_L \mid s_{t+1} = j)}_{b(j,t+1)}
\end{aligned}
$$
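The backward recursion fills the table from right to left, starting from b(k,L) = 1. A minimal sketch, assuming the Coke/Pepsi parameters used in the worked examples of these slides (the function name backward and the list-of-dicts layout are choices made here):

```python
# Coke/Pepsi HMM parameters from the worked examples (names are assumptions).
STATES = ["A", "R", "B"]
TRANS = {
    "A": {"A": 0.2, "R": 0.1, "B": 0.7},
    "R": {"A": 0.1, "R": 0.1, "B": 0.8},
    "B": {"A": 0.4, "R": 0.3, "B": 0.3},
}
EMIT = {
    "A": {"C": 0.6, "P": 0.4},
    "R": {"C": 0.9, "P": 0.1},
    "B": {"C": 0.5, "P": 0.5},
}


def backward(obs):
    """Return b as a list of columns; b[t][k] is b(k, t+1) in 1-based notation."""
    b = [{k: 1.0 for k in STATES}]            # initialization: b(k, L) = 1
    # Walk the observations x_L, x_{L-1}, ..., x_2; each one yields the column
    # to the left of the current left-most column.
    for x_next in reversed(obs[1:]):
        nxt = b[0]
        b.insert(0, {
            k: sum(TRANS[k][j] * EMIT[j][x_next] * nxt[j] for j in STATES)
            for k in STATES
        })
    return b


b = backward("CPC")
print(b)
```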

Backward Algorithm - Example

Given: Coke/Pepsi HMM, and sequence of observations: CPC
Find the probability that the HMM emits xt+1,…,xL given that St = k: p(xt+1,…,xL | st=k)

Initialization:

b | x1 = C | x2 = P | x3 = C
A |        |        | 1
R |        |        | 1
B |        |        | 1

Backward Algorithm - Example (cont.)

Column x2 = P (t = 2), using b(k,t) = Σ_j p(st+1=j|st=k) p(xt+1|st+1=j) b(j,t+1) with b(A,3) = b(R,3) = b(B,3) = 1:

• b(A,2) = Σ_j p(s3=j|s2=A) p(C|s3=j) b(j,3) = p(A|A)p(C|A)b(A,3) + p(R|A)p(C|R)b(R,3) + p(B|A)p(C|B)b(B,3) = 0.2*0.6*1 + 0.1*0.9*1 + 0.7*0.5*1 = 0.56
• b(R,2) = Σ_j p(s3=j|s2=R) p(C|s3=j) b(j,3) = p(A|R)p(C|A)b(A,3) + p(R|R)p(C|R)b(R,3) + p(B|R)p(C|B)b(B,3) = 0.1*0.6*1 + 0.1*0.9*1 + 0.8*0.5*1 = 0.55
• b(B,2) = Σ_j p(s3=j|s2=B) p(C|s3=j) b(j,3) = p(A|B)p(C|A)b(A,3) + p(R|B)p(C|R)b(R,3) + p(B|B)p(C|B)b(B,3) = 0.4*0.6*1 + 0.3*0.9*1 + 0.3*0.5*1 = 0.66

Backward Algorithm - Example (cont.)

Column x1 = C (t = 1), using b(k,t) = Σ_j p(st+1=j|st=k) p(xt+1|st+1=j) b(j,t+1) with b(A,2) = 0.56, b(R,2) = 0.55, b(B,2) = 0.66:

• b(A,1) = Σ_j p(s2=j|s1=A) p(P|s2=j) b(j,2) = p(A|A)p(P|A)b(A,2) + p(R|A)p(P|R)b(R,2) + p(B|A)p(P|B)b(B,2) = 0.2*0.4*0.56 + 0.1*0.1*0.55 + 0.7*0.5*0.66 = 0.2813
• b(R,1) = Σ_j p(s2=j|s1=R) p(P|s2=j) b(j,2) = p(A|R)p(P|A)b(A,2) + p(R|R)p(P|R)b(R,2) + p(B|R)p(P|B)b(B,2) = 0.1*0.4*0.56 + 0.1*0.1*0.55 + 0.8*0.5*0.66 = 0.2919
• b(B,1) = Σ_j p(s2=j|s1=B) p(P|s2=j) b(j,2) = p(A|B)p(P|A)b(A,2) + p(R|B)p(P|R)b(R,2) + p(B|B)p(P|B)b(B,2) = 0.4*0.4*0.56 + 0.3*0.1*0.55 + 0.3*0.5*0.66 = 0.2051

Backward Algorithm - Example (cont.)

Completed table:

b | x1 = C | x2 = P | x3 = C
A | 0.2813 | 0.56   | 1
R | 0.2919 | 0.55   | 1
B | 0.2051 | 0.66   | 1

We can calculate the probability of CPC being generated by this HMM from the Backward table as follows:

p(CPC) = Σ_j b(j,1) p(j|start) p(C|j)
= (0.2813*0.6*0.6) + (0.2919*0.1*0.9) + (0.2051*0.3*0.5)
= 0.158304

This matches, up to rounding of the table entries, the value 0.15831 that we obtained from the Forward table on a previous slide.
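The sum over the first Backward column can be checked with a few lines of Python. The numbers are copied from the Backward table and from the Coke/Pepsi parameters used in the worked examples; the names B1, START, and EMIT_C are illustrative assumptions.

```python
# First column of the Backward table, b(j,1), for the observations CPC.
B1 = {"A": 0.2813, "R": 0.2919, "B": 0.2051}
# p(j | start) and p(C | j) for the Coke/Pepsi HMM (names are assumptions).
START = {"A": 0.6, "R": 0.1, "B": 0.3}
EMIT_C = {"A": 0.6, "R": 0.9, "B": 0.5}

# p(CPC) = sum over j of b(j,1) * p(j|start) * p(C|j)
p_cpc = sum(B1[j] * START[j] * EMIT_C[j] for j in B1)
print(round(p_cpc, 6))
```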

3. (cont.) Using the Forward and Backward tables to calculate the probability of St = k given a sequence of observations

Example:
• Given: Coke/Pepsi HMM, and sequence of observations: CPC
• Find the probability that the state visited at time 2 was B, that is, p(s2=B | CPC)

In other words, given that the person drank CPC, what’s the probability that Pepsi was on sale during the 2nd week?

Based on the calculations we did on the previous slides:

p(s2=B | CPC) = p(CPC; s2=B) / p(CPC)
= [ p(x1=C, x2=P; s2=B) p(x3=C | x1=C, x2=P; s2=B) ] / p(x1=C, x2=P, x3=C)
= [ p(x1=C, x2=P; s2=B) p(x3=C | s2=B) ] / p(CPC)
= [ f(B,2) b(B,2) ] / p(CPC)
= [0.1845 * 0.66] / 0.15831
≈ 0.769

Here, p(CPC) was calculated by summing the last column of the Forward table.

So there is a high probability that Pepsi was on sale during week 2, given that the person drank Pepsi that week!
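Putting the two tables together, the posterior p(st=k | x1,…,xL) = f(k,t) b(k,t) / p(x1,…,xL) can be computed for every state and time step. The sketch below is a minimal illustration, assuming the Coke/Pepsi parameters used in the worked examples of these slides; the function names forward, backward, and posteriors are choices made here.

```python
# Coke/Pepsi HMM parameters from the worked examples (names are assumptions).
STATES = ["A", "R", "B"]
START = {"A": 0.6, "R": 0.1, "B": 0.3}
TRANS = {
    "A": {"A": 0.2, "R": 0.1, "B": 0.7},
    "R": {"A": 0.1, "R": 0.1, "B": 0.8},
    "B": {"A": 0.4, "R": 0.3, "B": 0.3},
}
EMIT = {
    "A": {"C": 0.6, "P": 0.4},
    "R": {"C": 0.9, "P": 0.1},
    "B": {"C": 0.5, "P": 0.5},
}


def forward(obs):
    """f[t][k] = p(s_{t+1} = k ; x_1..x_{t+1}), with 0-based list index t."""
    f = [{k: EMIT[k][obs[0]] * START[k] for k in STATES}]
    for x in obs[1:]:
        prev = f[-1]
        f.append({k: EMIT[k][x] * sum(prev[j] * TRANS[j][k] for j in STATES)
                  for k in STATES})
    return f


def backward(obs):
    """b[t][k] = p(x_{t+2}..x_L | s_{t+1} = k), with 0-based list index t."""
    b = [{k: 1.0 for k in STATES}]
    for x_next in reversed(obs[1:]):
        nxt = b[0]
        b.insert(0, {k: sum(TRANS[k][j] * EMIT[j][x_next] * nxt[j] for j in STATES)
                     for k in STATES})
    return b


def posteriors(obs):
    """p(s_t = k | x_1..x_L) = f(k,t) * b(k,t) / p(x_1..x_L) for every t and k."""
    f, b = forward(obs), backward(obs)
    p_obs = sum(f[-1].values())
    return [{k: f[t][k] * b[t][k] / p_obs for k in STATES} for t in range(len(obs))]


post = posteriors("CPC")
print(post[1]["B"])   # posterior that Pepsi was on sale in week 2
```

Each column of the result sums to 1, which is a quick way to check that the forward and backward tables are consistent with each other.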