Overview

Markov Models
Hidden Markov Models
Types of Hidden Markov Models
Applications using HMMs
Three central problems
Evaluation: Forward algorithm
Backward algorithm
Decoding: Viterbi algorithm
Learning: Forward-Backward algorithm
Speech recognition with HMMs
Isolated word recognizer
Measured performance
Conclusion
Markov Models

- A system of n states: at time t the system is in state w(t)
- Changes of state are nondeterministic but depend on the previous states
- Transition probabilities: often shown as a state transition matrix
- In a first-order, ..., n-th-order model the changes depend on the previous 1, ..., n states
- Vector of initial probabilities π
Example: Weather system

Three states: sunny, cloudy, rainy

Transition matrix A (note that each column sums to 1; the matrix advances the column state vector):

A = \begin{pmatrix} 0.5 & 0.25 & 0.25 \\ 0.375 & 0.125 & 0.375 \\ 0.125 & 0.625 & 0.375 \end{pmatrix}

State vector p = (1, 0, 0)^T
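To sanity-check the example, one can run a single step of the chain. The following is a minimal sketch, assuming Python with NumPy (neither appears in the original slides) and the column-vector convention p(t+1) = A p(t) suggested by the slide:

```python
import numpy as np

# Transition matrix as printed on the slide: column j holds the
# probabilities of leaving state j (sun, clouds, rain), so each
# column sums to 1 and the chain advances as p(t+1) = A p(t).
A = np.array([[0.500, 0.250, 0.250],
              [0.375, 0.125, 0.375],
              [0.125, 0.625, 0.375]])

p = np.array([1.0, 0.0, 0.0])   # today is certainly sunny

p_next = A @ p                  # distribution over tomorrow's weather
print(p_next)                   # [0.5 0.375 0.125]
```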
Hidden Markov Models

- n invisible (hidden) states
- Every state emits at time t a visible symbol v(t)
- The system generates a sequence of visible symbols V^T = {v(1), v(2), v(3), ..., v(T)}
- Probability of emitting symbol k in state w(t) = j: P(v(t) | w(t) = j) = b_j(k)
- Transition probability: P(w(t+1) = j | w(t) = i) = a_ij
- Normalization conditions: \sum_j a_{ij} = 1 for all i (transition matrix A) and \sum_k b_j(k) = 1 for all j (confusion matrix B)
- Vector of initial probabilities π
Transition matrix A:

        sun     clouds  rain
sun     0.5     0.25    0.25
clouds  0.375   0.125   0.375
rain    0.125   0.625   0.375

Confusion matrix B:

        Dry     Dryish  Damp    Soggy
sun     0.60    0.20    0.15    0.05
clouds  0.25    0.25    0.25    0.25
rain    0.05    0.10    0.35    0.50

State vector p = (1, 0, 0)^T
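Since the later algorithms all operate on (A, B, π), it helps to have the weather HMM in code once. A minimal sketch, assuming Python/NumPy and the row-stochastic convention of the formulas (a_ij = P(w(t+1) = j | w(t) = i)), which makes A here the transpose of the printed matrix:

```python
import numpy as np

states  = ["sun", "clouds", "rain"]
symbols = ["dry", "dryish", "damp", "soggy"]

# Transpose of the printed A, so row i = transitions *from* state i.
A = np.array([[0.500, 0.375, 0.125],    # from sun
              [0.250, 0.125, 0.625],    # from clouds
              [0.250, 0.375, 0.375]])   # from rain

# Confusion matrix B: row = hidden state, column = emitted symbol.
B = np.array([[0.60, 0.20, 0.15, 0.05],   # sun
              [0.25, 0.25, 0.25, 0.25],   # clouds
              [0.05, 0.10, 0.35, 0.50]])  # rain

pi = np.array([1.0, 0.0, 0.0])            # vector of initial probabilities

# Normalization conditions from the previous slide:
assert np.allclose(A.sum(axis=1), 1.0) and np.allclose(B.sum(axis=1), 1.0)
```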
Types of Hidden Markov Models

Ergodic:
All states can be reached within one step from everywhere; the entries of the transition matrix A are never zero.

A = \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{pmatrix}
Left-right models:

A = \begin{pmatrix} a_{11} & a_{12} & a_{13} & 0 \\ 0 & a_{22} & a_{23} & a_{24} \\ 0 & 0 & a_{33} & a_{34} \\ 0 & 0 & 0 & a_{44} \end{pmatrix}

No backward transitions: a_ij = 0 for j < i

Additional constraint: a_ij = 0 for j > i + Δ (e.g. Δ = 2 means no jumps of more than 2 states ahead); see the sketch below.
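To make both constraints concrete, here is a small illustration of our own (not from the slides) that builds a random 4-state left-right transition matrix with Δ = 2:

```python
import numpy as np

N, Delta = 4, 2
rng = np.random.default_rng(0)

# a_ij = 0 for j < i (no backward transitions) and for j > i + Delta
mask = np.fromfunction(lambda i, j: (j >= i) & (j <= i + Delta), (N, N))
A = rng.random((N, N)) * mask
A /= A.sum(axis=1, keepdims=True)   # renormalize each row to sum to 1

print(np.round(A, 3))   # banded upper triangle; the last state is absorbing
```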
Applications using HMMs

Speech recognition
Language modelling
Handwriting recognition
Protein sequence analysis
Financial/economic models
Three central problems

1. Evaluation: Given an HMM (A, B, π), find the probability that a sequence of visible states V^T was generated by that model.

2. Decoding: Given an HMM (A, B, π) and a set of observations, find the most probable sequence of hidden states that led to those observations.

3. Learning: Given the number of visible and hidden states and some sequences of training observations, find the parameters a_ij and b_j(k).
Evaluation

Probability that the model M produces a sequence V^T:

P(V^T) = \sum_{r=1}^{r_{\max}} P(V^T \mid w_r^T) P(w_r^T)

summed over all possible sequences of hidden states w_r^T = {w(1), w(2), ..., w(T)}, of which there are r_max = N^T. That means: take every sequence of hidden states, calculate the probability that it generated V^T, and add these probabilities up:

P(V^T) = \sum_{r=1}^{N^T} \prod_{t=1}^{T} P(v(t) \mid w(t)) P(w(t) \mid w(t-1))

N is the number of states, T is the number of visible symbols / steps to go.

Problem: the complexity is O(N^T T), as the brute-force sketch below makes plain.
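For comparison with the forward algorithm, the brute-force sum can be written out directly. This is an illustrative sketch of ours (Python/NumPy conventions as before, observations as 0-based symbol indices), feasible only for tiny N and T:

```python
from itertools import product

def evaluate_brute_force(A, B, pi, obs):
    """Sum P(V^T, w^T) over all N^T hidden state sequences: O(N^T * T)."""
    N = A.shape[0]
    total = 0.0
    for seq in product(range(N), repeat=len(obs)):   # every hidden path
        p = pi[seq[0]] * B[seq[0], obs[0]]           # first step uses pi
        for t in range(1, len(obs)):
            p *= A[seq[t - 1], seq[t]] * B[seq[t], obs[t]]
        total += p                                   # add this path's probability
    return total
```

With the weather HMM above and obs = [0, 2, 3] (dry, damp, soggy), this already enumerates 3^3 = 27 paths.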
The Forward algorithm

Calculate the probability recursively:

\alpha_j(t) = \begin{cases} \pi_j b_j(v(0)) & \text{for } t = 0 \\ \Big[ \sum_{i=1}^{N} \alpha_i(t-1) a_{ij} \Big] b_j(v(t)) & \text{otherwise} \end{cases}

b_j(v(t)) is the probability of emitting the symbol selected by v(t) in state j.
α_j(t) is the probability that the model is in state j and has produced the first t elements of V^T.
Forward algorithm

initialize α_j(0) = π_j b_j(v(0)), t = 0; given a_ij, b_j, visible sequence V^T
for t ← t + 1
    α_j(t) ← [ \sum_{i=1}^{N} α_i(t-1) a_ij ] b_j(v(t)) for all j ≤ N
until t = T
return P(V^T) = α_final(T)
end

Complexity of this algorithm: O(N² T)
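A direct transcription of this pseudocode into Python/NumPy might look as follows (a sketch under our conventions: 0-based time and states, row-stochastic A; where the slide returns α_final(T) for a designated final state, this sketch sums over all final states, the more common variant):

```python
import numpy as np

def forward(A, B, pi, obs):
    """alpha[t, j] = P(v(0..t), w(t) = j); total cost O(N^2 T)."""
    T, N = len(obs), A.shape[0]
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                 # alpha_j(0) = pi_j b_j(v(0))
    for t in range(1, T):
        # alpha_j(t) = [sum_i alpha_i(t-1) a_ij] b_j(v(t)) for all j
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha, alpha[-1].sum()                # P(V^T)
```

For the weather HMM, forward(A, B, pi, [0, 2, 3]) returns the same probability as the brute-force sum above.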
Example: Forward algorithm

[Trellis figure: the forward probabilities for states 1-3 over the time steps t = 1, 2, 3]
Backward algorithm

initialize β_i(T) = 1, t = T; given a_ij, b_j(k), visible sequence V^T
for t ← t - 1
    β_i(t) ← \sum_{j=1}^{N} a_ij b_j(v(t+1)) β_j(t+1) for all i ≤ N
until t = 0
return P(V^T) = β_i(0) for the initial state i
end

β_i(t) is the probability that the model is in state i and will produce the last T - t elements of V^T.
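The backward pass in the same style (again a sketch under the same assumptions; beta is filled from the last time step downwards):

```python
import numpy as np

def backward(A, B, obs):
    """beta[t, i] = P(v(t+1 .. T-1) | w(t) = i)."""
    T, N = len(obs), A.shape[0]
    beta = np.zeros((T, N))
    beta[T - 1] = 1.0                            # beta_i(T) = 1
    for t in range(T - 2, -1, -1):
        # beta_i(t) = sum_j a_ij b_j(v(t+1)) beta_j(t+1)
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta
```

A handy consistency check: (pi * B[:, obs[0]] * backward(A, B, obs)[0]).sum() reproduces the forward probability P(V^T).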
Decoding (Viterbi algorithm)

Finds the sequence of hidden states in an N-state model M that most probably generated a sequence V^T = {v(0), v(1), ..., v(T)} of visible states.

δ_i(t) = max P(w(1), ..., w(t) = i, v(1), ..., v(t) | M) over all paths w

δ_i(t) is the maximal probability along a path w(1), ..., w(t) = i to generate the sequence v(1), ..., v(t) (the partial best path). It can be calculated recursively with the Viterbi algorithm:

δ_j(t) = max_i [δ_i(t-1) a_ij] b_j(v(t)), with δ_i(0) = π_i b_i(v(0))

To keep track of the path maximizing δ_j(t), the array ψ_j(t) is used.
initialize δ_i(0) = π_i b_i(v(0)), ψ_i(0) = 0, t = 0 for all i
for t ← t + 1
    for all states j:
        δ_j(t) = max_{1≤i≤N} [δ_i(t-1) a_ij] b_j(v(t))
        ψ_j(t) = arg max_{1≤i≤N} [δ_i(t-1) a_ij]
until t = T

Sequence termination: w(T) = arg max_{1≤i≤N} [δ_i(T)]

Sequence backtracking: for t = T-1, t ← t - 1: w(t) = ψ_{w(t+1)}(t+1), until t = 0
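Put together, a compact Viterbi sketch in Python/NumPy (our transcription of the pseudocode; psi stores the best predecessor of each state for the backtracking step):

```python
import numpy as np

def viterbi(A, B, pi, obs):
    T, N = len(obs), A.shape[0]
    delta = np.zeros((T, N))                # best path probability per state
    psi = np.zeros((T, N), dtype=int)       # best predecessor per state
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A  # scores[i, j] = delta_i(t-1) a_ij
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    # termination: best final state, then backtrack through psi
    path = [int(delta[T - 1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1], float(delta[T - 1].max())
```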
Positive aspects of the Viterbi algorithm

- Reduction of computational complexity by recursion
- Takes the entire context into account to find an optimal solution, therefore lower error rates with noisy data
Learning (Forward-Backward algorithm)

Determine the N-state model parameters a_ij and b_j(k) based on a training sequence by iteratively calculating better values.

Definition:

\xi_{ij}(t) = P(w_i(t), w_j(t+1) \mid V^T, M) = \frac{\alpha_i(t) a_{ij} b_j(v(t+1)) \beta_j(t+1)}{P(V^T \mid M)} = \frac{\alpha_i(t) a_{ij} b_j(v(t+1)) \beta_j(t+1)}{\sum_{k=1}^{N} \sum_{l=1}^{N} \alpha_k(t) a_{kl} b_l(v(t+1)) \beta_l(t+1)}

is the probability of a transition from w_i at time t to w_j at time t+1, given that the model M generated V^T by any path.

\gamma_i(t) = \sum_{j=1}^{N} \xi_{ij}(t)

is the probability of being in state w_i at time t.
This gives us better values for the a_ij:

a'_{ij} = \frac{\sum_{t=1}^{T-1} \xi_{ij}(t)}{\sum_{t=1}^{T-1} \gamma_i(t)} = \frac{\sum_{t=1}^{T-1} \alpha_i(t) a_{ij} b_j(v(t+1)) \beta_j(t+1)}{\sum_{t=1}^{T-1} \sum_{k=1}^{N} \alpha_i(t) a_{ik} b_k(v(t+1)) \beta_k(t+1)}

where

- \sum_{t=1}^{T-1} \xi_{ij}(t) is the expected number of transitions from w_i to w_j
- \sum_{t=1}^{T-1} \gamma_i(t) is the expected number of transitions from w_i to any other state
- \sum_{t=1}^{T} \gamma_i(t) is the expected number of times in w_i
Calculation of the b_j':

b'_j(k) = \frac{\sum_{t=1,\, v(t)=k}^{T} \gamma_j(t)}{\sum_{t=1}^{T} \gamma_j(t)}

The denominator is the expected number of times in w_j; the numerator is the expected number of times in w_j while emitting v_k.

π'(i) = γ_i(0) is the probability of being in state i at time 0.
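One full reestimation step can be written out as below: an unscaled sketch of ours (Python/NumPy, obs as a list of 0-based symbol indices; a real implementation would add the scaling discussed on a later slide):

```python
import numpy as np

def baum_welch_step(A, B, pi, obs):
    T, N = len(obs), A.shape[0]
    obs = np.asarray(obs)
    # forward and backward passes, as in the earlier sketches
    alpha, beta = np.zeros((T, N)), np.ones((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    pV = alpha[-1].sum()                              # P(V^T | M)
    # xi[t, i, j] = alpha_i(t) a_ij b_j(v(t+1)) beta_j(t+1) / P(V^T | M)
    xi = (alpha[:-1, :, None] * A[None, :, :]
          * (B[:, obs[1:]].T * beta[1:])[:, None, :]) / pV
    gamma = alpha * beta / pV                         # gamma_i(t)
    # reestimation formulas from the last two slides
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.array([gamma[obs == k].sum(axis=0)
                      for k in range(B.shape[1])]).T / gamma.sum(axis=0)[:, None]
    pi_new = gamma[0]                                 # pi'(i) = gamma_i(0)
    return A_new, B_new, pi_new
```

Iterating baum_welch_step until the parameters stop changing implements the learning procedure.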
Positive aspect: arbitrary precision of estimation.

Problem: how to choose the initial parameters a_ij and b_j?

- For π and the a_ij, either random or uniform initial values work.
- For the b_j, better initial estimates are very useful for fast convergence; estimate these parameters by manual segmentation or maximum-likelihood segmentation.
What is the appropriate model?

This must be decided based on the kind of signal being modeled. In speech recognition a left-right model is often used to model the advancing of time, e.g. every syllable gets a state, plus a final silent state.
Scaling

Problem: the α_i and β_i are always smaller than 1, so the calculated values tend toward zero and soon exceed the precision range even of double-precision arithmetic.

Solution: multiply the α_i and β_i by a scaling coefficient c_t that is independent of i but depends on t, for example:

c_t = \frac{1}{\sum_{i=1}^{N} \alpha_i(t)}

These coefficients cancel out in the calculation of a'_ij and b'_j.
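In code the scaling folds naturally into the forward loop. A sketch of ours that returns log P(V^T) instead of the underflowing raw value:

```python
import numpy as np

def forward_scaled(A, B, pi, obs):
    """Forward pass with c_t = 1 / sum_i alpha_i(t); returns log P(V^T)."""
    alpha = pi * B[:, obs[0]]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = (alpha @ A) * B[:, obs[t]]
        c = 1.0 / alpha.sum()       # scaling coefficient: depends on t, not i
        alpha = alpha * c           # rescaled alphas now sum to 1
        log_p -= np.log(c)          # P(V^T) = prod_t (1 / c_t)
    return log_p
```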
Speech recognizers using HMMs

Isolated word recognizer

Build an HMM for each word in a vocabulary and calculate its (A, B, π) parameters (train the model).

For each word to be recognized:
- Feature analysis (vector quantization): generates observation vectors from the signal
- Run Viterbi on all models to find the most probable model for that observation sequence (a sketch follows below)
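A sketch of the recognition loop (hypothetical names of ours: word_models would come from training, obs from the vector quantizer; scoring uses the Viterbi maximum without backtracking, since only the best-scoring model is needed):

```python
import numpy as np

def viterbi_score(model, obs):
    """Probability of the single best state path through one word's HMM."""
    A, B, pi = model
    delta = pi * B[:, obs[0]]
    for v in obs[1:]:
        delta = (delta[:, None] * A).max(axis=0) * B[:, v]
    return delta.max()

def recognize(word_models, obs):
    """word_models: dict mapping each vocabulary word to its (A, B, pi)."""
    return max(word_models, key=lambda w: viterbi_score(word_models[w], obs))
```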
Measured performance of an isolated word recognizer

Test material: 100 digits spoken by 100 talkers (50 female / 50 male).

Original training: the original training set was used
TS2: the original speakers as in the training
TS3: a completely new set of speakers
TS4: another new set of speakers
Conclusion

There are various processes where the real activity is invisible and only a generated pattern can be observed. These can be modeled by HMMs.

Advantages:
- Acceptable calculation complexity
- Low error rates

Limitations of HMMs:
- They need enough training data
- The Markov assumption that each state depends only on the previous state is not always true

HMMs are the predominant method for current automatic speech recognition and play a great role in other recognition systems.
Bibliography

Rabiner, L.R.: A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, Vol. 77, No. 2, Feb. 1989, pp. 257-286.

Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification, Chapter 3.10.

Keller, E. (ed.): Fundamentals of Speech Synthesis and Speech Recognition, Chapter 8 (by Kari Torkkola).

Websites:

http://www.comp.leeds.ac.uk/roger/HiddenMarkovModels/html_dev/main.html - Introduction to Hidden Markov Models, University of Leeds

http://www.cnel.ufl.edu/~yadu/report1.html - Speech Recognition Using Hidden Markov Models, Yadunandana Nagaraja Rao, University of Florida