A Crash Course on Speech Recognition and Using CMU Sphinx to Build ASRs 379

  • 7/26/2019 A Crash Course on Speech Recognition and Using CMU Sphinx to Build ASRs 379

    A crash course on Automatic Speech Recognition and using CMU Sphinx to build small ASRs

    By Shyam.k

    Swathanthra Malayalam Computing

    2006-06-01 2

    Introduction

    Why speech?

        Most effective and natural form of human communication

        Systems can be more user-friendly, and more people will be able to access the technology with ease

        Applications are enormous and I leave that to your imagination (language translators and tutors, IVRS, indexing audio recordings, etc.)

    ASR is still an unsolved problem

        Research is still needed to utilize its full potential

        But it is mature enough to use under controlled conditions

    Introduction-2

    Why is Speech Recognition hard?

        Tremendous range of variability in speech, large vocabulary...

    Disciplines in Speech Technology

        Physiology of speech production and hearing

        Signal processing

        Linear Algebra

        Probability Theory, statistical estimation and modeling

        Information Theory

        Linguistics

    Speech production and Acoustic Phonetic approach

    Analogy of speech production with an electric network, and modelling the vocal tract

    But speech is not that deterministic

        Statistical approaches showed much better results than the acoustic phonetic approach, and so these methods lost their importance

    Statistical pattern recognition approach

    Speech Recognition is a type of pattern recognition problem

        Raw sample streams of audio are not well suited for matching

    Features, or mathematical expressions, are formulated which, when applied to an audio stream, represent changes in speech

    So extraction of such features is the very first step

        spectral analysis... but not enough

        cepstral analysis... aha! but not yet there

        Mel Frequency Cepstral Coefficients (MFCC)... WOW!
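The progression from spectral to cepstral to mel-cepstral analysis can be sketched for a single audio frame. This is only a minimal pure-Python illustration, not the Sphinx front end: the Hamming window, the 20 triangular filters, the 13 coefficients, the 16 kHz sample rate and all function names are assumptions chosen for the example, and a naive DFT is used in place of an FFT for readability.

```python
import cmath
import math

def hz_to_mel(hz):
    # A commonly used formula for the mel scale.
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def power_spectrum(frame):
    """Spectral analysis: naive DFT of a Hamming-windowed frame (bins 0..N/2)."""
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum

def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters whose centres are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(top * i / (num_filters + 1)) for i in range(num_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * hz / sample_rate)) for hz in edges]
    bank = []
    for m in range(1, num_filters + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):      # rising slope
            filt[k] = (k - left) / (centre - left)
        for k in range(centre, right):     # falling slope
            filt[k] = (right - k) / (right - centre)
        bank.append(filt)
    return bank

def mfcc(frame, sample_rate, num_filters=20, num_ceps=13):
    """Mel filterbank energies -> log -> DCT-II gives the cepstral coefficients."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    log_energies = [math.log(sum(f * p for f, p in zip(filt, spec)) + 1e-10)
                    for filt in bank]
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / num_filters)
                for j, e in enumerate(log_energies))
            for c in range(num_ceps)]

# One 256-sample frame of a 1 kHz tone sampled at 16 kHz (synthetic test data).
frame = [math.sin(2 * math.pi * 1000 * i / 16000) for i in range(256)]
coeffs = mfcc(frame, 16000)
print(len(coeffs), "coefficients for this frame")
```

In a real recognizer the same computation is repeated for overlapping frames, so an utterance becomes a sequence of feature vectors.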

    Template matching

    A simple idea in statistical pattern recognition is to pre-record each word to be recognized, compute its feature vectors, and compare them with the feature vectors of the input to find the closest match.

    The same theory is expanded for whole ASR systems

        The concept of matching templates is enhanced with the famous Dynamic Programming (DP) algorithm

    Simple templates are replaced by models which are initially trained

        There are different types of models: acoustic, pronunciation and language models.
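Template matching with dynamic programming can be sketched as dynamic time warping (DTW), one common DP formulation for comparing two feature sequences of different lengths. The slides do not say which DP variant is meant, so treat this as an illustrative assumption; the one-dimensional "feature vectors" and the words in the template table are invented for the example.

```python
import math

def dtw_distance(template, query):
    """Cost of the best time alignment between two sequences of feature vectors."""
    n, m = len(template), len(query)
    inf = float("inf")
    # cost[i][j] = best cost of aligning the first i template frames
    # with the first j query frames.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(template[i - 1], query[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # template frame repeated
                                     cost[i][j - 1],      # query frame repeated
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]

def recognise(templates, query):
    """Return the word whose pre-recorded template is closest to the query."""
    return min(templates, key=lambda word: dtw_distance(templates[word], query))

# Hypothetical templates: one short feature sequence per word.
templates = {
    "yes": [[0.0], [1.0], [2.0]],
    "no": [[5.0], [5.0], [4.0]],
}
# The same shape as "yes", but spoken more slowly (frames repeated).
query = [[0.0], [0.0], [1.0], [2.0], [2.0]]
best = recognise(templates, query)
print(best)
```

The warping step is what lets a slowly spoken word still match its template, which plain frame-by-frame comparison cannot do.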

    Hidden Markov Models

    Speech can be considered invariant over a very small interval of time

        An HMM models speech as a sequence of such small intervals, with each segment treated as invariant within itself

    Each such segment can be modeled by an HMM state

        Each state has a model built on a probability distribution that describes the feature vector variation (that is, the speech variation) over that segment

    Thus the probability distribution gives us the probability that a given speech segment belongs to a particular HMM state
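The idea of one probability distribution per state can be sketched with a three-state, left-to-right HMM whose states emit a single feature value from a Gaussian. All numbers below (means, variances, transition probabilities, observations) are invented for illustration, and real acoustic models use multi-dimensional features and mixtures of Gaussians. The Viterbi search shown is one standard way to find the most likely state sequence.

```python
import math

NEG_INF = float("-inf")

def safe_log(p):
    return math.log(p) if p > 0 else NEG_INF

def log_gaussian(x, mean, var):
    """Log-probability density of x under a 1-D Gaussian state model."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def viterbi(observations, means, variances, trans, start):
    """Most likely state sequence for the observations, computed in the log domain."""
    n_states = len(means)
    log_trans = [[safe_log(p) for p in row] for row in trans]
    score = [safe_log(start[s]) + log_gaussian(observations[0], means[s], variances[s])
             for s in range(n_states)]
    back = []
    for x in observations[1:]:
        prev, score, pointers = score, [], []
        for s in range(n_states):
            # Best predecessor state for arriving in state s at this frame.
            best = max(range(n_states), key=lambda p: prev[p] + log_trans[p][s])
            pointers.append(best)
            score.append(prev[best] + log_trans[best][s]
                         + log_gaussian(x, means[s], variances[s]))
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Hypothetical model: each state expects feature values near its mean.
means = [0.0, 5.0, 10.0]
variances = [1.0, 1.0, 1.0]
trans = [[0.6, 0.4, 0.0],   # a state may repeat or move to the next one
         [0.0, 0.6, 0.4],
         [0.0, 0.0, 1.0]]
start = [1.0, 0.0, 0.0]     # always begin in the first state

path = viterbi([0.1, -0.2, 4.8, 5.3, 9.9, 10.2], means, variances, trans, start)
print(path)
```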

    HMM example

    Figure: a simple HMM with three states

    sphinx2 is a fast speech recognition system that uses semi-continuous HMMs

    pocket sphinx is the fastest recognition system, though it is not as accurate as sphinx2 or sphinx3

    sphinx3 uses continuous HMMs

    CMUSPHINX-Training

    SphinxTrain is the training package

    It requires the following files

        phone list - specifies the phones used in our particular application, with each phone on a separate line

        dictionary - specifies how each and every word in our vocabulary is made up from the phones specified in the above list

        filler dictionary - specifies the special words such as silence, breath, cough etc.

        transcripts - specifies the content of each audio file in the database, using the words in the dictionary

        and, obviously, the speech files, with the same file names as given in the transcripts
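Because these files must agree with one another, a small consistency check before training can save time. The sketch below is not part of SphinxTrain. It assumes a dictionary format of one word followed by its phones per line, and transcript lines of the form `<s> WORDS </s> (file_id)`; the sample phones, words and file ids are made up for illustration.

```python
def parse_dictionary(text):
    """Map each word to its list of phones (first token is the word)."""
    entries = {}
    for line in text.splitlines():
        parts = line.split()
        if parts:
            entries[parts[0]] = parts[1:]
    return entries

def parse_transcripts(text):
    """Return {file_id: [words]} from lines like '<s> HELLO </s> (file_id)'."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        words_part, _, file_id = line.rpartition("(")
        result[file_id.rstrip(")").strip()] = words_part.split()
    return result

def check_training_files(phones, dictionary, fillers, transcripts):
    """List every phone or word that is used but never defined."""
    problems = []
    known_phones = set(phones.split())
    vocab = parse_dictionary(dictionary)
    filler_vocab = parse_dictionary(fillers)
    for word, pron in {**vocab, **filler_vocab}.items():
        for phone in pron:
            if phone not in known_phones:
                problems.append(f"phone {phone!r} in {word!r} is not in the phone list")
    for file_id, words in parse_transcripts(transcripts).items():
        for word in words:
            if word not in vocab and word not in filler_vocab:
                problems.append(f"word {word!r} in {file_id} is not in any dictionary")
    return problems

# Hypothetical file contents, kept tiny on purpose.
phones = "AH\nHH\nL\nOW\nSIL\n"
dictionary = "HELLO HH AH L OW\n"
fillers = "<s> SIL\n</s> SIL\n<sil> SIL\n"
transcripts = ("<s> HELLO </s> (speaker1_001)\n"
               "<s> HELLO WORLD </s> (speaker1_002)\n")

problems = check_training_files(phones, dictionary, fillers, transcripts)
for problem in problems:
    print(problem)
```

Here the second transcript uses a word that has no dictionary entry, which is the kind of mismatch the check is meant to surface.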

    CMUSPHINX-Training: minimal usage

    Perl scripts

        Perl scripts are provided for easy usage of SphinxTrain, so that once we have the files ready at the appropriate locations we just need to run the Perl scripts to get the acoustic models ready. So let's go the easy way!

    Files such as the phone list, dictionary, transcripts etc. are kept in the project/etc directory

        Speech files are kept in the project/wav directory, in a separate directory for each user

    Once we have these ready, we can set up the project by calling sphinx_tutorial.pl and thereafter make_feats.pl (if necessary)

    Then we train the model either by just calling the RunAll.pl script in the scripts_pl dir or by calling each component program of training manually.
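Since the scripts expect everything to be in place, it can help to verify the project layout first. The sketch below only checks the etc and wav directories described above. It is not a SphinxTrain tool, and the file names, the `speaker/file` utterance ids and the `.wav` suffix are assumptions made for the example.

```python
import tempfile
from pathlib import Path

def missing_inputs(project_root, required_etc_files, utterance_ids, audio_suffix=".wav"):
    """Return the paths that training would need but that do not exist yet."""
    root = Path(project_root)
    missing = [root / "etc" / name
               for name in required_etc_files
               if not (root / "etc" / name).is_file()]
    for utt in utterance_ids:
        # Utterance ids are assumed to look like "speaker/file", relative to wav/.
        audio = root / "wav" / (utt + audio_suffix)
        if not audio.is_file():
            missing.append(audio)
    return missing

# Build a throwaway project with one file of each kind deliberately absent.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "project"
    (root / "etc").mkdir(parents=True)
    (root / "wav" / "speaker1").mkdir(parents=True)
    (root / "etc" / "phonelist").write_text("AH\n")
    (root / "wav" / "speaker1" / "001.wav").write_bytes(b"")

    missing = missing_inputs(root,
                             ["phonelist", "dictionary"],
                             ["speaker1/001", "speaker1/002"])
    for path in missing:
        print("missing:", path.relative_to(root))
```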

    CMUSPHINX-Decoding

    There are different types of decoders, namely sphinx2, sphinx3, sphinx4 and pocket sphinx

        We can select one of these decoders according to the type of application we have

    Once we have the trained models, we just need to call the decoder, providing those model files and the other data necessary for decoding

    Sphinx has a nice API set which enables us developers to integrate Sphinx into our own applications.