7-Speech Recognition
Speech Recognition Concepts
Speech Recognition Approaches
Recognition Theories
Bayes Rule
Simple Language Model
P(A|W) Network Types
7-Speech Recognition (Cont’d)
HMM Calculating Approaches
Neural Components
Three Basic HMM Problems
Viterbi Algorithm
State Duration Modeling
Training In HMM
Recognition Tasks
Isolated Word Recognition (IWR)
Connected Word (CW) and Continuous Speech Recognition (CSR)
Speaker Dependent, Multiple Speaker, and Speaker Independent
Vocabulary Size
– Small: <20
– Medium: >100, <1000
– Large: >1000, <10000
– Very Large: >10000
Speech Recognition Concepts
[Diagram: Speech Synthesis maps Text → NLP → Speech Processing → Speech; Speech Recognition / Speech Understanding maps Speech → Speech Processing → NLP → Phone Sequence / Text]
Speech recognition is the inverse of speech synthesis.
Speech Recognition Approaches
Bottom-Up Approach
Top-Down Approach
Blackboard Approach
Bottom-Up Approach
[Diagram: Signal Processing → Feature Extraction → Segmentation (Voiced/Unvoiced/Silence) → Sound Classification Rules → Phonotactic Rules → Lexical Access → Language Model → Recognized Utterance, with knowledge sources feeding each stage]
Top-Down Approach
[Diagram: Feature Analysis → Unit Matching System → Lexical Hypothesis → Syntactic Hypothesis → Semantic Hypothesis → Utterance Verifier/Matcher → Recognized Utterance, driven by knowledge sources: Inventory of speech recognition units, Word Dictionary, Grammar, Task Model]
Blackboard Approach
[Diagram: a shared Blackboard accessed by Environmental, Acoustic, Lexical, Syntactic, and Semantic Processes]
Recognition Theories
Articulatory Based Recognition
– Uses the articulatory system for recognition
– This theory has been the most successful so far
Auditory Based Recognition
– Uses the auditory system for recognition
Hybrid Based Recognition
– A hybrid of the above theories
Motor Theory
– Models the intended gestures of the speaker
Recognition Problem
We have the sequence of acoustic symbols and we want to find the words expressed by the speaker.
Solution: find the most probable word sequence given the acoustic symbols.
Recognition Problem
A : Acoustic Symbols
W : Word Sequence
We should find Ŵ such that
$$P(\hat{W} \mid A) = \max_{W} P(W \mid A)$$
Bayes Rule
$$P(x \mid y)\,P(y) = P(x, y)$$
$$P(x \mid y) = \frac{P(y \mid x)\,P(x)}{P(y)}$$
$$P(W \mid A) = \frac{P(A \mid W)\,P(W)}{P(A)}$$
Bayes Rule (Cont’d)
$$P(\hat{W} \mid A) = \max_{W} P(W \mid A) = \max_{W} \frac{P(A \mid W)\,P(W)}{P(A)}$$
$$\hat{W} = \arg\max_{W} P(W \mid A) = \arg\max_{W} P(A \mid W)\,P(W)$$
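As a minimal sketch of how this argmax is used in practice (the candidate word sequences and all probability values below are purely illustrative, not from the slides), the recognizer combines the acoustic score P(A|W) and the language-model score P(W) in the log domain and picks the best-scoring hypothesis:

```python
import math

# Hypothetical candidate word sequences with illustrative scores.
# log_p_acoustic approximates log P(A|W); log_p_lm approximates log P(W).
candidates = {
    "recognize speech":   {"log_p_acoustic": math.log(0.20), "log_p_lm": math.log(0.010)},
    "wreck a nice beach": {"log_p_acoustic": math.log(0.25), "log_p_lm": math.log(0.001)},
}

# W_hat = argmax_W P(A|W) * P(W)  ->  argmax_W [log P(A|W) + log P(W)]
best_w = max(candidates,
             key=lambda w: candidates[w]["log_p_acoustic"] + candidates[w]["log_p_lm"])
print(best_w)  # "recognize speech": the language model outweighs the acoustic score
```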
Simple Language Model
$$W = w_1 w_2 w_3 \cdots w_n$$
$$P(W) = P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1, w_2)\,P(w_4 \mid w_1, w_2, w_3)\cdots P(w_n \mid w_1, \ldots, w_{n-1}) = \prod_{i=1}^{n} P(w_i \mid w_1, \ldots, w_{i-1})$$
Computing this probability directly is very difficult and requires a very large database, so we use trigram and bigram models instead.
Simple Language Model (Cont’d)
Trigram:
$$P(W) \approx \prod_{i=1}^{n} P(w_i \mid w_{i-1}, w_{i-2})$$
Bigram:
$$P(W) \approx \prod_{i=1}^{n} P(w_i \mid w_{i-1})$$
Monogram:
$$P(W) \approx \prod_{i=1}^{n} P(w_i)$$
Simple Language Model (Cont’d)
Computing method for P(w3 | w1, w2): number of occurrences of w3 after w1 w2, divided by the total number of occurrences of w1 w2:
$$P(w_3 \mid w_1, w_2) = \frac{\mathrm{count}(w_1 w_2 w_3)}{\mathrm{count}(w_1 w_2)}$$
AdHoc method (interpolation):
$$P(w_3 \mid w_1, w_2) = p_1\, f(w_3 \mid w_1, w_2) + p_2\, f(w_3 \mid w_2) + p_3\, f(w_3)$$
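A minimal counting sketch of these estimates, assuming a whitespace-tokenized toy corpus (the corpus, the interpolation weights p1, p2, p3, and the function names are illustrative, not from the slides):

```python
from collections import Counter

corpus = "the cat sat on the mat the cat sat on the hat".split()  # toy corpus

uni = Counter(corpus)
bi  = Counter(zip(corpus, corpus[1:]))
tri = Counter(zip(corpus, corpus[1:], corpus[2:]))

def f_tri(w1, w2, w3):
    # relative frequency f(w3 | w1, w2) = count(w1 w2 w3) / count(w1 w2)
    return tri[(w1, w2, w3)] / bi[(w1, w2)] if bi[(w1, w2)] else 0.0

def f_bi(w2, w3):
    return bi[(w2, w3)] / uni[w2] if uni[w2] else 0.0

def f_uni(w3):
    return uni[w3] / len(corpus)

def p_interp(w1, w2, w3, p1=0.6, p2=0.3, p3=0.1):
    # interpolated ("ad hoc") estimate; p1 + p2 + p3 = 1, values here are illustrative
    return p1 * f_tri(w1, w2, w3) + p2 * f_bi(w2, w3) + p3 * f_uni(w3)

print(f_tri("the", "cat", "sat"), p_interp("the", "cat", "sat"))
```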
Error Production Factors
Prosody (recognition should be prosody independent)
Noise (noise should be prevented)
Spontaneous Speech
P(A|W) Computing Approaches
Dynamic Time Warping (DTW)
Hidden Markov Model (HMM)
Artificial Neural Network (ANN)
Hybrid Systems
Dynamic Time Warping Method (DTW)
To obtain a global distance between two speech patterns, a time alignment must be performed.
Example: a time alignment path between a template pattern “SPEECH” and a noisy input “SsPEEhH”.
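A minimal sketch of DTW between two one-dimensional feature sequences, assuming an absolute-difference local distance (the sequences and names below are illustrative):

```python
import numpy as np

def dtw_distance(x, y):
    """Global DTW distance between 1-D sequences x and y."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])          # local distance between frames
            D[i, j] = cost + min(D[i - 1, j],        # insertion
                                 D[i, j - 1],        # deletion
                                 D[i - 1, j - 1])    # match
    return D[n, m]

template = np.array([1.0, 2.0, 3.0, 3.0, 2.0])       # template-like pattern
noisy    = np.array([1.0, 1.2, 2.1, 3.0, 2.9, 2.0])  # time-warped, noisy input
print(dtw_distance(template, noisy))
```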
Artificial Neural Network
[Diagram: simple computational element with inputs $x_0, x_1, \ldots, x_{N-1}$, weights $w_0, w_1, \ldots, w_{N-1}$, and output $y$]
$$y = \varphi\!\left(\sum_{i=0}^{N-1} w_i x_i\right)$$
Simple Computation Element of a Neural Network
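A minimal sketch of this computational element, assuming a sigmoid activation for φ (the weights and inputs are illustrative):

```python
import numpy as np

def neuron(x, w, phi=lambda a: 1.0 / (1.0 + np.exp(-a))):
    # y = phi( sum_{i=0}^{N-1} w_i * x_i )
    return phi(np.dot(w, x))

x = np.array([0.5, -1.0, 2.0])   # inputs x_0 ... x_{N-1}
w = np.array([0.8,  0.2, 0.1])   # weights w_0 ... w_{N-1}
print(neuron(x, w))
```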
Artificial Neural Network (Cont’d)
Neural Network Types
– Perceptron
– Time Delay
– Time Delay Neural Network Computational Element (TDNN)
Artificial Neural Network (Cont’d)
[Diagram: Single Layer Perceptron with inputs $x_0, \ldots, x_{N-1}$ and outputs $y_0, \ldots, y_{M-1}$]
Artificial Neural Network (Cont’d)
[Diagram: Three Layer Perceptron]
Hybrid Methods
Hybrid Neural Network and Matched Filter for Recognition
[Diagram: speech acoustic features pass through delays into a pattern classifier that produces the output units]
Neural Network Properties
The system is simple, but too many iterations are needed for training
Does not determine a specific structure
Despite its simplicity, the results are good
The training set is large, so training should be done offline
Accuracy is relatively good
Hidden Markov Model
Observations: $O = o_1, o_2, o_3, \ldots, o_t$
States in time: $q = q_1, q_2, q_3, \ldots, q_t$
All states: $s_1, s_2, \ldots, s_N$
[Diagram: states $S_i$ and $S_j$ with transition probabilities $a_{ij}$ and $a_{ji}$]
Hidden Markov Model (Cont’d)
Discrete Markov Model
$$P(q_t = s_j \mid q_{t-1} = s_i, q_{t-2} = s_k, \ldots) = P(q_t = s_j \mid q_{t-1} = s_i)$$
Degree 1 (first-order) Markov Model
Hidden Markov Model (Cont’d)
$$a_{ij} = P(q_t = s_j \mid q_{t-1} = s_i)$$
$a_{ij}$ : transition probability from $S_i$ to $S_j$, $1 \le i, j \le N$
Discrete Markov Model Example
S1 : The weather is rainy
S2 : The weather is cloudy
S3 : The weather is sunny
$$A = \{a_{ij}\} = \begin{array}{c|ccc} & \text{rainy} & \text{cloudy} & \text{sunny} \\ \hline \text{rainy} & 0.4 & 0.3 & 0.3 \\ \text{cloudy} & 0.2 & 0.6 & 0.2 \\ \text{sunny} & 0.1 & 0.1 & 0.8 \end{array}$$
Hidden Markov Model Example (Cont’d)
Question 1: what is the probability of observing Sunny-Sunny-Sunny-Rainy-Rainy-Sunny-Cloudy-Cloudy?
$$(q_1, q_2, \ldots, q_8) = (s_3, s_3, s_3, s_1, s_1, s_3, s_2, s_2)$$
$$P = \pi_3\, a_{33}\, a_{33}\, a_{31}\, a_{11}\, a_{13}\, a_{32}\, a_{22} = 1 \times 0.8 \times 0.8 \times 0.1 \times 0.4 \times 0.3 \times 0.1 \times 0.6 \approx 4.6 \times 10^{-4}$$
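A minimal sketch of this chain-probability computation with the transition matrix above, assuming the chain starts in the sunny state with probability 1:

```python
import numpy as np

A = np.array([[0.4, 0.3, 0.3],   # rainy  -> rainy, cloudy, sunny
              [0.2, 0.6, 0.2],   # cloudy -> rainy, cloudy, sunny
              [0.1, 0.1, 0.8]])  # sunny  -> rainy, cloudy, sunny

# q = sunny, sunny, sunny, rainy, rainy, sunny, cloudy, cloudy (0-based state indices)
q = [2, 2, 2, 0, 0, 2, 1, 1]

p = 1.0  # pi_sunny assumed to be 1
for s_prev, s_next in zip(q, q[1:]):
    p *= A[s_prev, s_next]
print(p)  # ~4.608e-04
```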
Hidden Markov Model Example (Cont’d)
Question 2: what is the probability of staying in state $S_i$ for $d$ days?
$$P(q_1 = s_i) = \pi_i, \quad 1 \le i \le N \qquad \text{(probability of being in state } i \text{ at time } t = 1\text{)}$$
$$P(\underbrace{s_i, s_i, \ldots, s_i}_{d \text{ days}}, s_{j \ne i}) = (a_{ii})^{d-1}(1 - a_{ii}) = P_i(d)$$
Discrete Density HMM Components
N : number of states
M : number of outputs
A (N×N) : state transition probability matrix
B (N×M) : output occurrence probability in each state
π (1×N) : initial state probability
λ = (A, B, π) : set of HMM parameters
Three Basic HMM Problems
Given an HMM λ and a sequence of observations O, what is the probability P(O|λ)?
Given a model λ and a sequence of observations O, what is the most likely state sequence in the model that produced the observations?
Given a model λ and a sequence of observations O, how should we adjust the model parameters in order to maximize P(O|λ)?
First Problem Solution
$$P(O \mid q, \lambda) = \prod_{t=1}^{T} P(o_t \mid q_t, \lambda) = \prod_{t=1}^{T} b_{q_t}(o_t)$$
$$P(q \mid \lambda) = \pi_{q_1}\, a_{q_1 q_2}\, a_{q_2 q_3} \cdots a_{q_{T-1} q_T}$$
We know that:
$$P(x, y) = P(x \mid y)\, P(y)$$
and
$$P(x, y \mid z) = P(x \mid y, z)\, P(y \mid z)$$
First Problem Solution (Cont’d)
$$P(O, q \mid \lambda) = P(O \mid q, \lambda)\, P(q \mid \lambda) = \pi_{q_1} b_{q_1}(o_1)\, a_{q_1 q_2} b_{q_2}(o_2) \cdots a_{q_{T-1} q_T} b_{q_T}(o_T)$$
$$P(O \mid \lambda) = \sum_{q} P(O, q \mid \lambda) = \sum_{q_1, q_2, \ldots, q_T} \pi_{q_1} b_{q_1}(o_1)\, a_{q_1 q_2} b_{q_2}(o_2) \cdots a_{q_{T-1} q_T} b_{q_T}(o_T)$$
Computation order: $O(2T N^T)$
Forward Backward Approach
Computing $\alpha_t(i)$:
$$\alpha_t(i) = P(o_1, o_2, \ldots, o_t, q_t = i \mid \lambda)$$
1) Initialization:
$$\alpha_1(i) = \pi_i\, b_i(o_1), \quad 1 \le i \le N$$
Forward Backward Approach (Cont’d)
2) Induction:
$$\alpha_{t+1}(j) = \left[\sum_{i=1}^{N} \alpha_t(i)\, a_{ij}\right] b_j(o_{t+1}), \quad 1 \le t \le T-1, \; 1 \le j \le N$$
3) Termination:
$$P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$$
Computation order: $O(N^2 T)$
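A minimal sketch of the forward recursion for a discrete-output HMM (the toy two-state model below is illustrative, not from the slides; a practical implementation would add scaling or work in the log domain):

```python
import numpy as np

def forward(pi, A, B, obs):
    """alpha_t(i) = P(o_1..o_t, q_t = i | lambda); returns alpha and P(O|lambda)."""
    N, T = len(pi), len(obs)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                      # initialization
    for t in range(1, T):                             # induction
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha, alpha[-1].sum()                     # termination: sum_i alpha_T(i)

# Toy 2-state, 2-symbol model (illustrative numbers)
pi = np.array([0.6, 0.4])
A  = np.array([[0.7, 0.3],
               [0.4, 0.6]])
B  = np.array([[0.9, 0.1],
               [0.2, 0.8]])
obs = [0, 1, 0]
alpha, p_obs = forward(pi, A, B, obs)
print(p_obs)
```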
Backward Variable
$$\beta_t(i) = P(o_{t+1}, o_{t+2}, \ldots, o_T \mid q_t = i, \lambda)$$
1) Initialization:
$$\beta_T(i) = 1, \quad 1 \le i \le N$$
2) Induction:
$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j), \quad t = T-1, T-2, \ldots, 1, \; 1 \le i \le N$$
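A companion sketch of the backward recursion, using the same conventions as the forward sketch above:

```python
import numpy as np

def backward(A, B, obs):
    """beta_t(i) = P(o_{t+1}..o_T | q_t = i, lambda)."""
    N, T = A.shape[0], len(obs)
    beta = np.zeros((T, N))
    beta[-1] = 1.0                                    # initialization: beta_T(i) = 1
    for t in range(T - 2, -1, -1):                    # induction, t = T-1, ..., 1
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta

# With the toy model from the forward sketch,
# (pi * B[:, obs[0]] * backward(A, B, obs)[0]).sum() also equals P(O|lambda).
```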
Second Problem Solution
Finding the most likely state sequence
$$\gamma_t(i) = P(q_t = i \mid O, \lambda) = \frac{P(O, q_t = i \mid \lambda)}{P(O \mid \lambda)} = \frac{\alpha_t(i)\, \beta_t(i)}{\sum_{i=1}^{N} \alpha_t(i)\, \beta_t(i)}$$
Individually most likely state:
$$q_t^{*} = \arg\max_{1 \le i \le N} [\gamma_t(i)], \quad 1 \le t \le T$$
Viterbi Algorithm
Define:
$$\delta_t(i) = \max_{q_1, q_2, \ldots, q_{t-1}} P(q_1, q_2, \ldots, q_{t-1}, q_t = i, o_1, o_2, \ldots, o_t \mid \lambda), \quad 1 \le i \le N$$
$\delta_t(i)$ is the probability of the most likely state sequence that ends in state $i$ at time $t$ and accounts for the first $t$ observations.
Viterbi Algorithm (Cont’d)
$$\delta_{t+1}(j) = \left[\max_{i} \delta_t(i)\, a_{ij}\right] b_j(o_{t+1})$$
1) Initialization:
$$\delta_1(i) = \pi_i\, b_i(o_1), \quad 1 \le i \le N$$
$$\psi_1(i) = 0$$
$\psi_t(i)$ keeps track of the most likely previous state (at time $t-1$) leading to state $i$ at time $t$.
Viterbi Algorithm (Cont’d)
2) Recursion:
$$\delta_t(j) = \max_{1 \le i \le N} \left[\delta_{t-1}(i)\, a_{ij}\right] b_j(o_t), \quad 2 \le t \le T, \; 1 \le j \le N$$
$$\psi_t(j) = \arg\max_{1 \le i \le N} \left[\delta_{t-1}(i)\, a_{ij}\right]$$
Viterbi Algorithm (Cont’d)
3) Termination:
$$p^{*} = \max_{1 \le i \le N} [\delta_T(i)]$$
$$q_T^{*} = \arg\max_{1 \le i \le N} [\delta_T(i)]$$
4) Backtracking:
$$q_t^{*} = \psi_{t+1}(q_{t+1}^{*}), \quad t = T-1, T-2, \ldots, 1$$
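A minimal sketch of the Viterbi recursion and backtracking, using the same discrete-output HMM conventions as the earlier sketches (toy parameters are illustrative):

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Most likely state sequence q* and its probability p*."""
    N, T = len(pi), len(obs)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, obs[0]]                          # initialization
    for t in range(1, T):                                 # recursion
        scores = delta[t - 1][:, None] * A                # scores[i, j] = delta_{t-1}(i) * a_ij
        psi[t] = scores.argmax(axis=0)                    # best predecessor for each state j
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    q = np.zeros(T, dtype=int)
    q[-1] = delta[-1].argmax()                            # termination
    p_star = delta[-1].max()
    for t in range(T - 2, -1, -1):                        # backtracking
        q[t] = psi[t + 1][q[t + 1]]
    return q, p_star

pi = np.array([0.6, 0.4])
A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.9, 0.1], [0.2, 0.8]])
print(viterbi(pi, A, B, [0, 1, 0]))
```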
Third Problem Solution
Parameter estimation using the Baum-Welch or Expectation Maximization (EM) approach
Define:
$$\xi_t(i, j) = P(q_t = i, q_{t+1} = j \mid O, \lambda) = \frac{P(O, q_t = i, q_{t+1} = j \mid \lambda)}{P(O \mid \lambda)} = \frac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}$$
Third Problem Solution (Cont’d)
$$\gamma_t(i) = \sum_{j=1}^{N} \xi_t(i, j)$$
$$\sum_{t=1}^{T-1} \gamma_t(i) \;:\; \text{expected number of transitions out of state } i$$
$$\sum_{t=1}^{T-1} \xi_t(i, j) \;:\; \text{expected number of transitions from state } i \text{ to state } j$$
Third Problem Solution (Cont’d)
$$\bar{\pi}_i = \gamma_1(i)$$
$$\bar{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$$
$$\bar{b}_j(k) = \frac{\sum_{t=1,\; o_t = V_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}$$
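A minimal sketch of one Baum-Welch reestimation pass implementing the formulas above for a single observation sequence (toy parameters are illustrative; a practical trainer would iterate to convergence and use scaling or log arithmetic):

```python
import numpy as np

def baum_welch_step(pi, A, B, obs):
    N, T = len(pi), len(obs)
    # forward and backward variables
    alpha = np.zeros((T, N)); beta = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    p_obs = alpha[-1].sum()

    # gamma_t(i) and xi_t(i, j)
    gamma = alpha * beta / p_obs
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        xi[t] = alpha[t][:, None] * A * B[:, obs[t + 1]] * beta[t + 1] / p_obs

    # reestimation formulas
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):
        mask = (np.array(obs) == k)
        new_B[:, k] = gamma[mask].sum(axis=0) / gamma.sum(axis=0)
    return new_pi, new_A, new_B

pi = np.array([0.6, 0.4])
A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.9, 0.1], [0.2, 0.8]])
print(baum_welch_step(pi, A, B, [0, 1, 0, 0, 1]))
```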
Baum Auxiliary Function
$$Q(\lambda', \lambda) = \sum_{q} P(O, q \mid \lambda') \log P(O, q \mid \lambda)$$
$$\text{if } Q(\lambda', \lambda) \ge Q(\lambda', \lambda') \text{ then } P(O \mid \lambda) \ge P(O \mid \lambda')$$
With this approach we reach a local optimum.
Restrictions of Reestimation Formulas
$$\sum_{i=1}^{N} \bar{\pi}_i = 1$$
$$\sum_{j=1}^{N} \bar{a}_{ij} = 1, \quad 1 \le i \le N$$
$$\sum_{k=1}^{M} \bar{b}_j(k) = 1, \quad 1 \le j \le N$$
Continuous Observation Density
We have a continuous probability density function instead of the discrete output probability
$$b_j(k) = P(o_t = V_k \mid q_t = j)$$
We have
$$b_j(o) = \sum_{k=1}^{M} C_{jk}\, \mathcal{N}(o; \mu_{jk}, \Sigma_{jk}), \qquad \int b_j(o)\, do = 1$$
where $C_{jk}$ are the mixture coefficients, $\mu_{jk}$ the means, and $\Sigma_{jk}$ the variances (covariances).
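A minimal sketch of evaluating this mixture density $b_j(o)$ for one state, assuming diagonal covariances (all parameter values are illustrative):

```python
import numpy as np

def gaussian_diag(o, mu, var):
    """Multivariate Gaussian density with diagonal covariance."""
    return np.exp(-0.5 * np.sum((o - mu) ** 2 / var)) / np.sqrt((2 * np.pi) ** len(o) * np.prod(var))

def b_j(o, C, mus, vars_):
    """b_j(o) = sum_k C_jk * N(o; mu_jk, Sigma_jk) for a single state j."""
    return sum(c * gaussian_diag(o, mu, v) for c, mu, v in zip(C, mus, vars_))

# Two mixtures, two-dimensional observations (illustrative parameters)
C     = np.array([0.6, 0.4])                       # mixture coefficients, sum to 1
mus   = np.array([[0.0, 0.0], [2.0, 1.0]])         # means mu_jk
vars_ = np.array([[1.0, 1.0], [0.5, 0.5]])         # diagonal variances
print(b_j(np.array([0.5, 0.2]), C, mus, vars_))
```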
Continuous Observation Density
Mixture in HMM
Dominant mixture:
$$b_j(o) \approx \max_{k}\; C_{jk}\, \mathcal{N}(o; \mu_{jk}, \Sigma_{jk})$$
[Diagram: states S1, S2, S3, each with mixtures M1 to M4]
Continuous Observation Density (Cont’d)
Model parameters:
$$\lambda = (\pi, A, C, \mu, \Sigma)$$
$\pi$: 1×N, $A$: N×N, $C$: N×M, $\mu$: N×M×K, $\Sigma$: N×M×K×K
N : number of states
M : number of mixtures in each state
K : dimension of observation vector
Continuous Observation Density (Cont’d)
$$\bar{C}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j, k)}{\sum_{t=1}^{T} \sum_{k=1}^{M} \gamma_t(j, k)}$$
$$\bar{\mu}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j, k)\, o_t}{\sum_{t=1}^{T} \gamma_t(j, k)}$$
Continuous Observation Density (Cont’d)
$$\bar{\Sigma}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j, k)\, (o_t - \mu_{jk})(o_t - \mu_{jk})'}{\sum_{t=1}^{T} \gamma_t(j, k)}$$
$\gamma_t(j, k)$ : probability of being in state $j$ with mixture $k$ at time $t$
State Duration Modeling
[Diagram: states $S_i$ and $S_j$ with self-loops $a_{ii}$, $a_{jj}$ and transitions $a_{ij}$, $a_{ji}$]
Probability of staying $d$ times in state $i$:
$$P_i(d) = (a_{ii})^{d-1}(1 - a_{ii})$$
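This is a geometric duration distribution. As a short worked example (the value of $a_{ii}$ below is an assumed illustrative number, not from the slides), the mean duration implied by the self-loop probability follows directly:
$$\bar{d}_i = \sum_{d=1}^{\infty} d\, P_i(d) = \sum_{d=1}^{\infty} d\, (a_{ii})^{d-1}(1 - a_{ii}) = \frac{1}{1 - a_{ii}}, \qquad \text{e.g. } a_{ii} = 0.8 \;\Rightarrow\; \bar{d}_i = 5 \text{ frames}$$
The exponentially decaying shape of this distribution is often a poor fit for real phone durations, which motivates the explicit duration models on the following slides.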
State Duration Modeling (Cont’d)
[Diagram: HMM with explicit state duration; states $S_i$ and $S_j$ have duration distributions $P_i(d)$ and $P_j(d)$ and transitions $a_{ij}$, $a_{ji}$]
State Duration Modeling (Cont’d)
HMM consideration with state duration:
– Select the initial state $q_1 = i$ using the initial probabilities $\pi_i$
– Select the duration $d_1$ using $P_{q_1}(d_1)$
– Select the observation sequence $o_1, o_2, \ldots, o_{d_1}$ using $b_{q_1}(o_1, o_2, \ldots, o_{d_1})$; in practice we assume the following independence: $b_{q_1}(o_1, o_2, \ldots, o_{d_1}) = \prod_{t=1}^{d_1} b_{q_1}(o_t)$
– Select the next state $q_2 = j$ using the transition probabilities $a_{q_1 q_2}$. We also have an additional constraint: $a_{q_1 q_1} = 0$ (no self-transitions)
Training In HMM
Maximum Likelihood (ML)
Maximum Mutual Information (MMI)
Minimum Discrimination Information (MDI)
Training In HMM
Maximum Likelihood (ML)
[Diagram: the observation sequence is scored against each model, giving $P(O \mid \lambda_1), P(O \mid \lambda_2), P(O \mid \lambda_3), \ldots, P(O \mid \lambda_n)$]
$$P^{*} = \max_{v} \left[ P(O^{r} \mid \lambda_v) \right]$$
Training In HMM (Cont’d)
Maximum Mutual Information (MMI)
Mutual information:
$$I(O, \lambda) = \log \frac{P(O, \lambda)}{P(O)\, P(\lambda)}$$
$$I(O, \lambda) = \log P(O \mid \lambda_w) - \log \left[ \sum_{v=1}^{V} P(O \mid \lambda_v)\, P(w_v) \right], \qquad \lambda = \{\lambda_v\}$$
Training In HMM (Cont’d)
Minimum Discrimination Information (MDI)
Observation : $O = (O_1, O_2, \ldots, O_T)$
Autocorrelation : $R = (R_1, R_2, \ldots, R_T)$
$$I(Q : P_\lambda) = \int q(o) \log \frac{q(o)}{P(o \mid \lambda)}\, do$$
$$\nu(R, P_\lambda) = \inf_{Q \in Q(R)} I(Q : P_\lambda)$$