7/26/2019 A Crash Course on Speech Recognition and Using CMU Sphinx to Build ASRs 379
A crash course on Automatic Speech Recognition and using CMU Sphinx to build small ASRs
By Shyam.k
Swathanthra Malayalam Computing
2006-06-01
Introduction
Why speech?
Most effective and natural form of human communication
Systems can be more user-friendly, and more people will be able to access the technology with ease
Applications are enormous and I leave that to your imagination (language translators and tutors, IVRS, indexing audio recordings, etc.)
ASR is still an unsolved problem; more research is needed to utilize its full potential
But it is mature enough to use under controlled conditions
Introduction-2
Why is speech recognition hard?
Tremendous range of variability in speech, large vocabulary...
Disciplines in speech technology
Physiology of speech production and hearing
Signal processing
Linear algebra
Probability theory, statistical estimation and modeling
Information theory
Linguistics
Speech production and Acoustic Phonetic approach
Analogy of speech production with an electric network, modelling the vocal tract
But speech is not that deterministic. Statistical approaches showed much better results than the acoustic phonetic approach, and so these methods lost their importance
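The electric-network analogy is usually formalised as the source-filter model: a periodic excitation (the glottal source) is passed through a filter whose resonances (formants) stand in for the vocal tract. The sketch below is a minimal illustration under assumed values (8000 Hz sample rate, 100 Hz pitch, two formants at 700 Hz and 1200 Hz with made-up bandwidths) and is not taken from the slides.

```python
import math

def impulse_train(n_samples, sample_rate, f0):
    """Glottal source approximated as a unit impulse every 1/f0 seconds."""
    period = int(round(sample_rate / f0))
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]

def resonator(signal, sample_rate, freq, bandwidth):
    """Two-pole IIR resonator: y[n] = x[n] + a1*y[n-1] + a2*y[n-2]."""
    r = math.exp(-math.pi * bandwidth / sample_rate)  # pole radius set by bandwidth
    theta = 2 * math.pi * freq / sample_rate          # pole angle set by centre frequency
    a1, a2 = 2 * r * math.cos(theta), -r * r
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def synthesize_vowel(n_samples=400, sample_rate=8000, f0=100.0,
                     formants=((700.0, 80.0), (1200.0, 90.0))):
    """Source-filter synthesis: impulse train through a cascade of formant resonators."""
    signal = impulse_train(n_samples, sample_rate, f0)
    for freq, bw in formants:  # hypothetical formant (frequency, bandwidth) pairs
        signal = resonator(signal, sample_rate, freq, bw)
    return signal
```

Such a deterministic model produces vowel-like sounds, but, as the slide notes, real speech varies too much for this alone to drive recognition.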
Statistical pattern recognition approach
Speech recognition is a type of pattern recognition problem. Raw sample streams of audio are not well suited for matching
Features (mathematical expressions) are formulated which, when applied to an audio stream, represent the changes in speech
So the extraction of such features is the very first step
Spectral analysis... but not enough
Cepstral analysis... aha! but not yet there
Mel Frequency Cepstral Coefficients (MFCC)... WOW!
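The MFCC pipeline for a single frame can be sketched as: window, power spectrum, triangular filters spaced on the mel scale, log, then a DCT. The parameters below (8 kHz audio, 256-sample frame, Hamming window, 20 filters, 13 coefficients, the common 2595·log10(1 + f/700) mel formula) are assumed textbook choices, not necessarily what Sphinx uses, and the naive DFT is for clarity rather than speed.

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """|DFT|^2 / N for bins 0..N/2 (naive O(N^2) DFT, fine for a demo)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        spec.append((re * re + im * im) / n)
    return spec

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with edges equally spaced on the mel scale."""
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mels = [low + (high - low) * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):              # rising edge
            filt[k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):             # falling edge
            filt[k] = (right - k) / max(right - centre, 1)
        bank.append(filt)
    return bank

def dct_ii(values, n_coeffs):
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
            for k in range(n_coeffs)]

def mfcc_frame(frame, sample_rate=8000, n_filters=20, n_coeffs=13):
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]      # Hamming window
    spec = power_spectrum(windowed)
    energies = [sum(w * p for w, p in zip(filt, spec))
                for filt in mel_filterbank(n_filters, n, sample_rate)]
    log_e = [math.log(max(e, 1e-10)) for e in energies]  # floor avoids log(0)
    return dct_ii(log_e, n_coeffs)
```

A real front end would also split the signal into overlapping frames (often with pre-emphasis) and append delta features; those steps are omitted here.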
Template matching
A simple idea in statistical pattern recognition is to pre-record each word to be recognized, compute its feature vectors, and compare them with the feature vectors of the input to find the closest match.
The same idea is extended to whole ASR systems
The concept of template matching is refined into the famous Dynamic Programming (DP) algorithm
Simple templates are replaced by models, which are trained beforehand. There are different types of models: acoustic, pronunciation and language models.
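The DP-based template matching mentioned above is usually called dynamic time warping (DTW): it finds the cheapest alignment between two feature sequences, allowing either one to be stretched in time. This is a minimal sketch assuming Euclidean local distance and the simple three-way step pattern; the templates and 2-D "feature vectors" are hypothetical stand-ins for real MFCC frames.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_distance(seq_a, seq_b):
    """Minimum cumulative distance over all monotonic alignments of the two sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # repeat seq_b[j-1]
                                 cost[i][j - 1],      # repeat seq_a[i-1]
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]

def recognize(features, templates):
    """Return the word whose stored template is closest under DTW."""
    return min(templates, key=lambda word: dtw_distance(features, templates[word]))

if __name__ == "__main__":
    templates = {"yes": [[1, 0], [2, 0], [3, 1]],    # hypothetical templates
                 "no":  [[0, 3], [0, 2], [1, 2]]}
    spoken = [[1, 0], [1, 0], [2, 0], [3, 1], [3, 1]]  # "yes", spoken slowly
    print(recognize(spoken, templates))
```

The slow utterance aligns to the "yes" template at zero cost because DTW absorbs the repeated frames, which plain frame-by-frame comparison could not do.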
Hidden Markov Models
Speech can be considered invariant over a very small interval of time. An HMM models speech as a sequence of such small intervals, each segment being invariant within itself
Each such segment can be modeled by an HMM state. Each state has a model built on a probability distribution that describes the variation of the feature vectors (i.e. the speech variation) over that segment
Thus, from that distribution, we get the probability that a given speech segment was produced by a particular HMM state
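Combining the per-state distributions with the transition probabilities gives the likelihood of a whole utterance, computed with the forward algorithm. This sketch assumes a single 1-D Gaussian per state for readability; real systems use mixtures of multivariate Gaussians over MFCC vectors, and work in the log domain or with scaling to avoid underflow.

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def forward_likelihood(obs, initial, trans, means, variances):
    """P(obs | HMM) via the forward algorithm, one 1-D Gaussian per state.

    alpha[s] = probability of the observations so far AND being in state s now.
    No scaling is applied, so long sequences would underflow.
    """
    n_states = len(initial)
    alpha = [initial[s] * gaussian_pdf(obs[0], means[s], variances[s])
             for s in range(n_states)]
    for x in obs[1:]:
        alpha = [sum(alpha[p] * trans[p][s] for p in range(n_states))
                 * gaussian_pdf(x, means[s], variances[s])
                 for s in range(n_states)]
    return sum(alpha)
```

With a left-to-right model whose state means are 0, 5 and 10, an observation sequence rising through those values scores far higher than the same values in reverse order, since the transitions only allow moving forward.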
HMM example
Figure: A simple HMM with three states
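A three-state HMM is often arranged left to right, each state either looping on itself or moving to the next; that topology is assumed here, with hypothetical parameters. Viterbi decoding then recovers the single most likely state sequence for a run of observations, which is how frames get assigned to states.

```python
import math

# Hypothetical 3-state left-to-right HMM: each state may loop or advance.
TRANS = [[0.6, 0.4, 0.0],
         [0.0, 0.6, 0.4],
         [0.0, 0.0, 1.0]]
INITIAL = [1.0, 0.0, 0.0]
MEANS = [0.0, 5.0, 10.0]       # 1-D emission means (stand-ins for feature vectors)
VARIANCES = [1.0, 1.0, 1.0]

def log_or_neg_inf(p):
    return math.log(p) if p > 0 else float("-inf")

def log_gaussian(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def viterbi(obs):
    """Most likely state sequence for obs, computed in the log domain."""
    n = len(INITIAL)
    score = [log_or_neg_inf(INITIAL[s]) + log_gaussian(obs[0], MEANS[s], VARIANCES[s])
             for s in range(n)]
    backptrs = []
    for x in obs[1:]:
        new_score, ptr = [], []
        for s in range(n):
            best_prev = max(range(n), key=lambda p: score[p] + log_or_neg_inf(TRANS[p][s]))
            ptr.append(best_prev)
            new_score.append(score[best_prev] + log_or_neg_inf(TRANS[best_prev][s])
                             + log_gaussian(x, MEANS[s], VARIANCES[s]))
        score = new_score
        backptrs.append(ptr)
    state = max(range(n), key=lambda s: score[s])   # best final state
    path = [state]
    for ptr in reversed(backptrs):                  # follow back-pointers
        state = ptr[state]
        path.append(state)
    return list(reversed(path))
```

For observations such as `[0.1, -0.2, 4.8, 5.1, 9.9, 10.2]` the decoded path spends two frames in each state, following the left-to-right topology.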
sphinx2 is a fast speech recognition system using semi-continuous HMMs
pocketsphinx is the fastest recognition system, though it is not as accurate as sphinx2 or sphinx3
sphinx3 uses continuous HMMs
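The semi-continuous vs continuous distinction concerns how each state's emission density is built. In a semi-continuous (tied-mixture) HMM all states share one codebook of Gaussians and differ only in their mixture weights, so the codebook densities are computed once per frame and reused, which is a large part of the speed advantage. In a continuous HMM each state owns its Gaussians. The 1-D sketch below uses hypothetical parameters to show both.

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Semi-continuous: one shared codebook of Gaussians (hypothetical values)...
CODEBOOK = [(0.0, 1.0), (4.0, 1.0), (8.0, 1.0)]          # (mean, variance)
# ...and each state keeps only its own mixture weights over that codebook.
SC_WEIGHTS = {"s1": [0.7, 0.2, 0.1], "s2": [0.1, 0.3, 0.6]}

# Continuous: each state owns its Gaussians as (weight, mean, variance).
CONT_MIXTURES = {"s1": [(0.5, 0.5, 1.0), (0.5, 2.0, 2.0)],
                 "s2": [(1.0, 7.0, 1.5)]}

def semi_continuous_scores(x):
    shared = [gaussian_pdf(x, m, v) for m, v in CODEBOOK]  # computed once per frame
    return {s: sum(w * d for w, d in zip(ws, shared)) for s, ws in SC_WEIGHTS.items()}

def continuous_scores(x):
    return {s: sum(w * gaussian_pdf(x, m, v) for w, m, v in mix)
            for s, mix in CONT_MIXTURES.items()}
```

The trade-off is that sharing the codebook limits how precisely each state can fit its own data, which is consistent with continuous models being the more accurate choice.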
CMUSPHINX - Training
SphinxTrain is the training package
It requires the following files:
Phone list: specifies the phones used in our particular application, one phone per line
Dictionary: specifies how each and every word in our vocabulary is made up of the phones in the phone list
Filler dictionary: specifies the special words such as silence, breath, cough etc.
Transcripts: specify the content of each audio file in the database, using the words in the dictionary
And, obviously, the speech files, with file names matching those in the transcripts
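Because these files must agree with each other, a quick consistency check before training can save a failed run. The hypothetical helper below reports transcript words missing from both dictionaries and dictionary phones missing from the phone list. The line formats it assumes (`WORD PH1 PH2 ...` in dictionaries, `<s> WORDS </s> (utterance_id)` in transcripts) follow the usual SphinxTrain conventions as we understand them; check them against the version in use.

```python
def parse_phone_list(text):
    return {line.strip() for line in text.splitlines() if line.strip()}

def parse_dictionary(text):
    """Each line: WORD PHONE1 PHONE2 ... -> {word: [phones]}"""
    entries = {}
    for line in text.splitlines():
        parts = line.split()
        if parts:
            entries[parts[0]] = parts[1:]
    return entries

def parse_transcripts(text):
    """Each line: <s> WORD ... </s> (utterance_id) -> {utterance_id: [tokens]}"""
    utterances = {}
    for line in text.splitlines():
        parts = line.split()
        if parts:
            utterances[parts[-1].strip("()")] = parts[:-1]  # assumes id is last
    return utterances

def check_training_files(phones_txt, dict_txt, filler_txt, trans_txt):
    """Return (unknown words, unknown phones) found in the training files."""
    phones = parse_phone_list(phones_txt)
    lexicon = parse_dictionary(dict_txt)
    lexicon.update(parse_dictionary(filler_txt))  # fillers such as <s>, </s>, <sil>
    unknown_words = sorted({w for words in parse_transcripts(trans_txt).values()
                            for w in words if w not in lexicon})
    unknown_phones = sorted({p for prons in lexicon.values()
                             for p in prons if p not in phones})
    return unknown_words, unknown_phones
```

For example, a transcript containing `THERE` with no dictionary entry, or a pronunciation using a phone absent from the phone list, shows up in the returned lists.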
CMUSPHINX - Training: minimal usage
Perl scripts
Perl scripts are provided for easy use of SphinxTrain, so once the files are ready in the appropriate locations we just need to run the Perl scripts to get the acoustic models ready. So let's go the easy way!
Files such as the phone list, dictionary, transcripts etc. are kept in the project/etc directory; speech files are kept in a separate directory for each user inside the project/wav directory
Once these are ready, we can set up the project by calling sphinx_tutorial.pl and then make_feats.pl (if necessary)
Then we train the model either by just calling the RunAll.pl script in the scripts_pl dir, or by running each component program of the training manually.
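With one subdirectory per speaker under project/wav, the list of training utterances can be generated rather than typed. The hypothetical helper below walks project/wav and writes a fileids-style listing (paths relative to wav/, without the extension) into project/etc. The output file name `<task>_train.fileids` and the default task name `project` are assumptions for illustration; use whatever names your SphinxTrain setup expects.

```python
import os

def write_fileids(project_dir, task_name="project"):
    """List every .wav under <project>/wav/<speaker>/ as 'speaker/utterance'."""
    wav_root = os.path.join(project_dir, "wav")
    entries = []
    for speaker in sorted(os.listdir(wav_root)):
        speaker_dir = os.path.join(wav_root, speaker)
        if not os.path.isdir(speaker_dir):
            continue
        for name in sorted(os.listdir(speaker_dir)):
            stem, ext = os.path.splitext(name)
            if ext.lower() == ".wav":
                entries.append(f"{speaker}/{stem}")
    etc_dir = os.path.join(project_dir, "etc")
    os.makedirs(etc_dir, exist_ok=True)
    # Hypothetical file name; adjust to the names your setup expects.
    out_path = os.path.join(etc_dir, f"{task_name}_train.fileids")
    with open(out_path, "w") as fh:
        fh.write("\n".join(entries) + ("\n" if entries else ""))
    return entries
```

Non-audio files in the speaker directories are skipped, and sorting keeps the listing stable so it can be matched line by line against the transcripts.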
CMUSPHINX - Decoding
There are different types of decoders, namely sphinx2, sphinx3, sphinx4 and pocketsphinx; we can select one of these decoders according to the type of application we have
Once we have the trained models, we just call the decoder, providing those model files and the other data necessary for decoding
Sphinx has a nice API set which enables us developers to integrate Sphinx into our own applications.