HTK
Arnel Fajardo, student, under Professor Yoon-joong Kim, PhD
Using HTK for Phoneme-level Recognition of the Developed Filipino Phonetically Balanced Words
Phoneme Recognition Unit
Recognition unit: phoneme
Number of phonemes: 36
Recognized phonemes: 33
Set of phonemes
Vowels: /a/ /e/ /i/ /o/ /u/
Consonants: /b/ /k/ /d/ /g/ /h/ /l/ /m/ /n/ /ŋ/ /p/ /r/ /s/ /t/ /w/ /y/
Diphthongs: /iw/ /ay/ /aw/ /oy/ /ey/ /uy/
Additions: /p:/ /b:/ /m:/ /t:/ /d:/ /n:/ /s:/ /l:/ /k:/ /g:/
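As a quick consistency check on the counts above, the inventory can be written down as data. This is a sketch: /ŋ/ is assumed for the consonant whose symbol was lost in extraction, and the group names are illustrative only.

```python
# Phoneme inventory from the slide. /ŋ/ is an assumption for the consonant
# whose symbol was lost; the ':' units are the slide's "Additions".
VOWELS = ["a", "e", "i", "o", "u"]
CONSONANTS = ["b", "k", "d", "g", "h", "l", "m", "n", "ŋ",
              "p", "r", "s", "t", "w", "y"]
DIPHTHONGS = ["iw", "ay", "aw", "oy", "ey", "uy"]
ADDITIONS = ["p:", "b:", "m:", "t:", "d:", "n:", "s:", "l:", "k:", "g:"]

INVENTORY = {
    "vowels": VOWELS,
    "consonants": CONSONANTS,
    "diphthongs": DIPHTHONGS,
    "additions": ADDITIONS,
}


def total_phonemes(inventory):
    """Count every phoneme unit, refusing duplicates across groups."""
    units = [p for group in inventory.values() for p in group]
    if len(units) != len(set(units)):
        raise ValueError("duplicate phoneme unit in inventory")
    return len(units)


if __name__ == "__main__":
    for name, group in INVENTORY.items():
        print(f"{name}: {len(group)}")
    print("total:", total_phonemes(INVENTORY))
```

The four groups give 5 + 15 + 6 + 10 = 36 units, matching the stated number of phonemes.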
Speech Recognition Process
Recognition process using HTK
Preparation of the Data
Speech data for training and testing
-Data (wave files): 50 sets (25 male and 25 female), 257 words (2-word list)
-Training data: 40 sets (20 male and 20 female), 257 words (2-word list), 212 words (3-word list)
-Test data: speaker-dependent, 20 sets each for male and female; speaker-independent, 5 sets each for male and female
-Location: Test_htk_samples/Htk_phoneme_2/Data/pbw_list_2/speakernumber/PBW2/speech file
-Features of the speech data: wave files (*.wav), 16 kHz, 16-bit linear PCM, Phonetically Balanced Words (PBW)
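The speaker counts above imply a fixed partition of the 50 speakers. A minimal sketch of that split follows; the speaker IDs are hypothetical, and it assumes (as the 20 + 20 counts suggest) that speaker-dependent testing uses the same speakers as training.

```python
def split_speakers(male, female, n_train_per_gender=20):
    """Split speaker IDs into training and test groups.

    The first n_train_per_gender speakers of each gender are used for
    training; the rest form the speaker-independent test set. The
    speaker-dependent test set is assumed to reuse the training speakers.
    """
    train = male[:n_train_per_gender] + female[:n_train_per_gender]
    test_independent = male[n_train_per_gender:] + female[n_train_per_gender:]
    return {
        "train": train,
        "test_dependent": list(train),
        "test_independent": test_independent,
    }


# Hypothetical IDs: M01..M25 and F01..F25 (25 male, 25 female speakers).
male = [f"M{i:02d}" for i in range(1, 26)]
female = [f"F{i:02d}" for i in range(1, 26)]
sets = split_speakers(male, female)
print({k: len(v) for k, v in sets.items()})
# {'train': 40, 'test_dependent': 40, 'test_independent': 10}
```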
Preparation of the Data
Creating the phone-level transcription in an MLF (Master Label File)
HLEd -n modelList/monoList -l '*' -d dic/phoneDict -i mlfs/phones.mlf scripts/mono.led mlfs/words.mlf
HLEd inputs: dictionary (phoneDict), edit script (mono.led), word-level transcription (words.mlf)
HLEd outputs: phone-level transcription (phones.mlf), phone-level model list (monoList)
Edit script command: EX
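The EX command in mono.led expands each word label into the phones listed for it in the dictionary. The Python below is a rough emulation for illustration, not HLEd itself; the dictionary lines and pronunciations are hypothetical, and the sorted phone list is only one way to collect what -n writes to monoList.

```python
def load_dictionary(lines):
    """Parse 'WORD phone1 phone2 ...' lines into {word: [phones]}."""
    pron = {}
    for line in lines:
        fields = line.split()
        if fields:
            # Keep only the first pronunciation if a word appears twice.
            pron.setdefault(fields[0], fields[1:])
    return pron


def expand_to_phones(words, pron):
    """Replace each word label with its phone sequence (like HLEd's EX)."""
    phones = []
    for word in words:
        if word not in pron:
            raise KeyError(f"word not in dictionary: {word}")
        phones.extend(pron[word])
    return phones


def phone_list(transcriptions):
    """Distinct phone labels across all transcriptions (cf. HLEd -n)."""
    return sorted({p for phones in transcriptions for p in phones})


# Hypothetical dictionary entries, for illustration only.
pron = load_dictionary(["sil sil", "agad a g a d", "akin a k i n"])
phones = expand_to_phones(["sil", "agad", "sil"], pron)
print(phones)  # ['sil', 'a', 'g', 'a', 'd', 'sil']
```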
Preparation of the Data
[input] MLF (creating a file)
File: mlfs/words.mlf
Contents:
#!MLF!#
"*/PBW2001.lab"
sil
aba'y
sil
.
"*/PBW2002.lab"
sil
agad
sil
.
"*/PBW2003.lab"
sil
akin
sil
.
[input] Creating a model list
File: modelList/wordList
Contents (one entry per line): sil aba'y agad akin aking ako'y alam alon amin aming anak anong apat araw atin ating ayaw
Preparation of the Data
[output] Phone-level file: mlfs/phones.mlf
[output] Phoneme model list: modelList/monoList
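Since words.mlf follows a simple text layout (a #!MLF!# header, a quoted label-file pattern, one label per line, and a terminating period), it can be generated by a script rather than typed by hand. A minimal writer and reader, assuming untimed word labels as in the example above; the function names are illustrative:

```python
def write_mlf(transcriptions):
    """Render {label_name: [labels]} as master label file text."""
    lines = ["#!MLF!#"]
    for name, labels in transcriptions.items():
        lines.append(f'"*/{name}.lab"')
        lines.extend(labels)
        lines.append(".")  # a lone period ends each transcription
    return "\n".join(lines) + "\n"


def read_mlf(text):
    """Parse untimed MLF text back into {label_name: [labels]}."""
    result, current, name = {}, [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "#!MLF!#":
            continue
        if line.startswith('"'):
            # "*/PBW2001.lab" -> PBW2001
            name = line.strip('"').split("/")[-1].rsplit(".", 1)[0]
            current = []
        elif line == ".":
            result[name] = current
        else:
            current.append(line)
    return result


words = {"PBW2001": ["sil", "aba'y", "sil"],
         "PBW2002": ["sil", "agad", "sil"]}
text = write_mlf(words)
print(text)
```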
Preparation of the Data
Feature parameter extraction
HCopy -C configs/HCopy.config -S scripts/HCopy.scp
Preparation of the Data
[input] scripts/HCopy.scp
[input] configs/HCopy.config
# Coding parameters
SOURCEKIND = WAVEFORM
SOURCEFORMAT = WAVE
SOURCERATE = 625
TARGETKIND = MFCC_0
TARGETRATE = 100000.0
SAVECOMPRESSED = T
SAVEWITHCRC = T
WINDOWSIZE = 250000.0
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 26
CEPLIFTER = 22
NUMCEPS = 12
ENORMALISE = F
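HTK expresses time-related configuration values in units of 100 ns, which is where the numbers in HCopy.config come from: at 16 kHz the sample period is 62.5 µs (SOURCERATE = 625), the 10 ms frame shift is TARGETRATE = 100000.0 and the 25 ms window is WINDOWSIZE = 250000.0. The same arithmetic gives the vector size: 12 cepstra plus C0 is 13 values for MFCC_0, and 39 once deltas and accelerations are appended (MFCC_0_D_A). The helper names below are illustrative.

```python
HTK_UNITS_PER_SECOND = 10_000_000  # HTK expresses times in 100 ns units


def sample_period(sample_rate_hz):
    """SOURCERATE value for a given sampling rate."""
    return HTK_UNITS_PER_SECOND / sample_rate_hz


def ms_to_htk(milliseconds):
    """Convert milliseconds to 100 ns units (e.g. TARGETRATE, WINDOWSIZE)."""
    return milliseconds * 10_000


def feature_dim(numceps=12, c0=True, delta=False, accel=False):
    """Length of one feature vector for a given parameter kind."""
    static = numceps + int(c0)
    return static * (1 + int(delta) + int(accel))


print(sample_period(16000))                 # 625.0 -> SOURCERATE = 625
print(ms_to_htk(10), ms_to_htk(25))         # 100000 250000
print(feature_dim())                        # 13 for MFCC_0
print(feature_dim(delta=True, accel=True))  # 39 for MFCC_0_D_A
```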
Preparation of the Data
Create the file list for training: scripts/train.scp
Create the file list for testing: scripts/test.scp
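HCopy.scp pairs each source wave file with its output feature file, while train.scp and test.scp list only the feature files. A sketch that builds both, assuming a hypothetical mfc/ output directory and mirroring the speaker directory layout so that files with the same name from different speakers do not collide:

```python
from pathlib import PurePosixPath


def make_scp_lines(wav_paths, wav_root, mfc_root):
    """Build HCopy.scp lines ('source target') and the matching feature list.

    Each .mfc file mirrors the wav file's position under wav_root, so files
    from different speaker directories never overwrite each other.
    """
    root = PurePosixPath(wav_root)
    hcopy, features = [], []
    for wav in map(PurePosixPath, wav_paths):
        mfc = PurePosixPath(mfc_root) / wav.relative_to(root).with_suffix(".mfc")
        hcopy.append(f"{wav} {mfc}")
        features.append(str(mfc))
    return hcopy, features


# Hypothetical paths following the Data/pbw_list_2/<speaker>/PBW2/ layout.
wavs = ["Data/pbw_list_2/01/PBW2/PBW2001.wav",
        "Data/pbw_list_2/02/PBW2/PBW2001.wav"]
hcopy, train = make_scp_lines(wavs, "Data/pbw_list_2", "mfc")
print("\n".join(hcopy))
```

To save the lists, join the lines with newlines and write them to scripts/HCopy.scp and scripts/train.scp (or scripts/test.scp for the test speakers). Paths containing spaces are best avoided.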
Acoustic Model Creation
Topology: 7-state left-to-right HMM for PBW 2
Acoustic Model Creation
General model generation
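In HTK's prototype format the state count includes the non-emitting entry and exit states, so a 7-state left-to-right HMM is assumed here to have 5 emitting states. The sketch below writes such a prototype for 39-dimensional MFCC_0_D_A vectors; the initial self-loop probability of 0.6, the zero means and the unit variances are placeholder assumptions, since HCompV replaces the means and variances.

```python
def make_proto(num_states=7, vec_size=39, kind="MFCC_0_D_A", self_loop=0.6):
    """Text of a left-to-right prototype HMM with one Gaussian per state.

    num_states counts the non-emitting entry (1) and exit (num_states)
    states, so states 2..num_states-1 are the emitting ones.
    """
    zeros = " ".join(["0.0"] * vec_size)
    ones = " ".join(["1.0"] * vec_size)
    out = [f"~o <VecSize> {vec_size} <{kind}>", '~h "proto"', "<BeginHMM>",
           f"<NumStates> {num_states}"]
    for s in range(2, num_states):
        out += [f"<State> {s}", f"<Mean> {vec_size}", zeros,
                f"<Variance> {vec_size}", ones]
    out.append(f"<TransP> {num_states}")
    for i in range(num_states):
        row = [0.0] * num_states
        if i == 0:
            row[1] = 1.0                   # entry state jumps to first emitting state
        elif i < num_states - 1:
            row[i] = self_loop             # stay in the current state
            row[i + 1] = 1.0 - self_loop   # or move one state to the right
        out.append(" ".join(f"{p:.1f}" for p in row))  # exit row stays all zero
    out.append("<EndHMM>")
    return "\n".join(out) + "\n"


print(make_proto())
```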
HCompV -C configs/config -f 0.01 -m -S scripts/train.scp -M monoHmms/m0 monoHmms/proto
Acoustic Model Creation
[input] monoHmms/proto
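HCompV scans the files in train.scp, computes the global mean and variance of the features, copies them into the prototype's states (-m also updates the means) and, with -f 0.01, writes a variance floor equal to 1% of the global variance. A toy version of that arithmetic, using plain lists instead of real MFCC files; the function names are illustrative:

```python
def global_stats(frames):
    """Per-dimension mean and variance over all training frames.

    Uses the maximum-likelihood (divide-by-N) variance estimate.
    """
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n for d in range(dim)]
    return mean, var


def variance_floor(var, fraction=0.01):
    """Variance floor as a fraction of the global variance (cf. -f 0.01)."""
    return [fraction * v for v in var]


frames = [[1.0, 2.0], [3.0, 6.0]]  # toy 2-dimensional "features"
mean, var = global_stats(frames)
print(mean, var, variance_floor(var))
```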
Acoustic Model Creation
[input] scripts/train.scp
[input] configs/config
Config file change: TARGETKIND = MFCC_0_D_A
# Coding parameters
NONUMESCAPES = T
TARGETKIND = MFCC_0_D_A
TARGETRATE = 100000.0
SAVECOMPRESSED = T
SAVEWITHCRC = T
WINDOWSIZE = 250000.0
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 26
CEPLIFTER = 22
NUMCEPS = 12
ENORMALISE = F
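The _D_A suffix in TARGETKIND makes HTK append delta and acceleration coefficients to the 13 static MFCC_0 values. HTK computes deltas with a regression over ±Θ neighbouring frames, d_t = Σθ(c_{t+θ} − c_{t−θ}) / (2Σθ²), replicating the first and last frames at the edges, and applies the same formula to the deltas to get accelerations. A sketch assuming the default window Θ = 2 for both (DELTAWINDOW and ACCWINDOW are not set in the config above):

```python
def deltas(frames, window=2):
    """Regression deltas over +/-window frames, replicating edge frames."""
    T = len(frames)
    denom = 2 * sum(th * th for th in range(1, window + 1))
    out = []
    for t in range(T):
        acc = [0.0] * len(frames[t])
        for th in range(1, window + 1):
            nxt = frames[min(t + th, T - 1)]  # repeat the last frame past the end
            prv = frames[max(t - th, 0)]      # repeat the first frame before the start
            for k in range(len(acc)):
                acc[k] += th * (nxt[k] - prv[k])
        out.append([x / denom for x in acc])
    return out


def append_deltas_accels(frames, window=2):
    """Static + delta + acceleration vectors (13 -> 39 values for MFCC_0)."""
    d = deltas(frames, window)
    a = deltas(d, window)  # accelerations: the same regression applied to deltas
    return [list(f) + dd + aa for f, dd, aa in zip(frames, d, a)]


ramp = [[float(t)] for t in range(10)]  # a 1-D feature rising by 1 per frame
print([round(v[0], 3) for v in deltas(ramp)])
```

Away from the edges the ramp's delta is exactly 1.0 per frame; near the edges the replicated frames pull it down.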
Acoustic Model Creation
[output] monoHmms/m0/vFloor
Acoustic Model Creation
[output] monoHmms/m0/proto
Acoustic Model Creation
Macro file for the phoneme-unit models: monoHmms/m0/macros
Created by adding the global options header ~o (vector size 39) to the contents of vFloor
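Building the macros file is a text operation: a ~o header declaring the vector size and parameter kind, followed by the variance-floor macro from vFloor. A sketch, assuming the header layout used in common HTK tutorials and that the vFloor text is already loaded as a string; the size check is an extra safeguard added here, not an HTK requirement.

```python
import re


def make_macros(vfloor_text, vec_size=39, kind="MFCC_0_D_A"):
    """Prepend a ~o global-options header to the variance-floor macro text."""
    # HTK tags are case-insensitive, so match <Variance> or <VARIANCE>.
    match = re.search(r"<Variance>\s*(\d+)", vfloor_text, flags=re.IGNORECASE)
    if match is None or int(match.group(1)) != vec_size:
        raise ValueError("vFloor variance size does not match vec_size")
    header = f"~o\n<VecSize> {vec_size}\n<{kind}>\n"
    return header + vfloor_text


# Illustrative 3-dimensional floor; a real vFloor for this setup has 39 values.
vf = '~v "varFloor1"\n<VARIANCE> 3\n 1.0e-03 2.0e-03 3.0e-03\n'
print(make_macros(vf, vec_size=3))
```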
Acoustic Model Creation
Phoneme-unit models: monoHmms/m0/hmmdefs
hmmdefs defines an HMM for every phoneme unit to be recognized:
~h "a"
...
~h "b"
...
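hmmdefs is typically produced by copying the HCompV-updated prototype once per name in the phone model list and renaming each copy. A minimal sketch, assuming the prototype text is loaded as a string and that its global ~o header is dropped because the macros file already carries it:

```python
import re


def make_hmmdefs(proto_text, phones):
    """Clone the prototype's HMM body once per phone, named with ~h."""
    # Everything from <BeginHMM> on is the model itself (tags are case-insensitive).
    match = re.search(r"<BeginHMM>", proto_text, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("no <BeginHMM> found in prototype")
    body = proto_text[match.start():].rstrip("\n")
    return "".join(f'~h "{p}"\n{body}\n' for p in phones)


proto = '~o\n<VecSize> 39\n<MFCC_0_D_A>\n~h "proto"\n<BEGINHMM>\n<NUMSTATES> 7\n<ENDHMM>\n'
print(make_hmmdefs(proto, ["a", "b"]))
```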
Training: Embedded Training
HERest -C configs/config -I mlfs/phones.mlf -S scripts/train.scp -H monoHmms/m0/macros -H monoHmms/m0/hmmdefs -M monoHmms/m1 modelList/monoList
HERest data flow: HERest reads the configuration file (config), the phone-level transcription (phones.mlf), the training files listed in train.scp, the HMM list (monoList) and hmm0 (m0/macros, m0/hmmdefs), and writes hmm1 (m1/macros, m1/hmmdefs).
Training: Embedded Training
[input] monoHmms/m0/macros
[input] mlfs/phones.mlf
[input] monoHmms/m0/hmmdefs
[input] scripts/train.scp
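Each HERest pass reads one model set and writes the next, so the five passes from m0 to m5 differ only in their directory names. A sketch that builds those command lines as argument lists; it only constructs them (nothing is executed), and the function name and defaults are illustrative:

```python
def herest_commands(first=0, last=5, config="configs/config",
                    mlf="mlfs/phones.mlf", scp="scripts/train.scp",
                    hmmlist="modelList/monoList", root="monoHmms"):
    """Argument lists for chained HERest passes m{first} -> m{last}."""
    cmds = []
    for i in range(first, last):
        src, dst = f"{root}/m{i}", f"{root}/m{i + 1}"
        cmds.append(["HERest", "-C", config, "-I", mlf, "-S", scp,
                     "-H", f"{src}/macros", "-H", f"{src}/hmmdefs",
                     "-M", dst, hmmlist])
    return cmds


for cmd in herest_commands():
    print(" ".join(cmd))
```

Each list can be passed to subprocess.run(cmd, check=True) once the output directory (for example monoHmms/m1) exists; creating it first with os.makedirs(..., exist_ok=True) is the safe assumption.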
Training: Embedded Training
Re-estimation
HERest -C configs/config -I mlfs/phones.mlf -S scripts/train.scp -H monoHmms/m1/macros -H monoHmms/m1/hmmdefs -M monoHmms/m2 modelList/monoList
HERest -C configs/config -I mlfs/phones.mlf -S scripts/train.scp -H monoHmms/m2/macros -H monoHmms/m2/hmmdefs -M monoHmms/m3 modelList/monoList
Training: Embedded Training
Further re-estimation
HERest -C configs/config -I mlfs/phones.mlf -S scripts/train.scp -H monoHmms/m3/macros -H monoHmms/m3/hmmdefs -M monoHmms/m4 modelList/monoList
HERest -C configs/config -I mlfs/phones.mlf -S scripts/train.scp -H monoHmms/m4/macros -H monoHmms/m4/hmmdefs -M monoHmms/m5 modelList/monoList
Recognition
Preparation for recognition
Pronunciation dictionary (dic/phoneDict)
Word models
Dictionary format: WORD followed by its models, where WORD is the target word for recognition and the models are HMM names from the model list
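HVite can only expand a dictionary entry if every phone in it has an HMM in the model list. A small check for that, assuming the WORD phone1 phone2 ... layout with an optional [OUTSYM] field after the word; the pronunciations in the example are hypothetical:

```python
def check_dictionary(dict_lines, model_list):
    """Return the phones used in the dictionary that have no HMM."""
    models = set(model_list)
    missing = set()
    for line in dict_lines:
        fields = line.split()
        if not fields:
            continue
        phones = fields[1:]
        # Skip an optional [OUTSYM] field that follows the word.
        if phones and phones[0].startswith("[") and phones[0].endswith("]"):
            phones = phones[1:]
        missing.update(p for p in phones if p not in models)
    return sorted(missing)


entries = ["agad a g a d", "aking [aking] a k i ng", "sil sil"]
print(check_dictionary(entries, ["a", "d", "g", "i", "k", "sil"]))  # ['ng']
```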
Recognition
Preparation for recognition
Creating a grammar file (dic/pbwGram)
Grammar rules for creating the grammar file
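The slide's actual grammar rules did not survive extraction. Since every utterance in words.mlf is transcribed as sil, one word, sil, one plausible HParse grammar is a single word variable wrapped in silence. The sketch below generates that form from the word list; it is an assumption, not the authors' file, and whether HParse accepts words with an apostrophe (such as aba'y) unescaped has not been checked here.

```python
def make_isolated_word_grammar(words, silence="sil"):
    """HParse-style grammar: one word from the list, surrounded by silence."""
    vocab = [w for w in words if w != silence]
    return ("$word = " + " | ".join(vocab) + ";\n"
            f"( {silence} $word {silence} )\n")


print(make_isolated_word_grammar(["sil", "agad", "akin", "aking"]))
```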
Recognition
Preparation for recognition: recognition network creation
HParse dic/pbwGram dic/tag_Net
HParse -C configs/config dic/pbwGram dic/tag_Net
Recognition
[input] dic/pbwGram
[input] configs/config
Recognition
[output] dic/tag_Net
Recognition
HVite -C configs/config -S scripts/test.scp -H monoHmms/m5/hmmdefs -H Hmms/m7/macros -w dic/tag_Net -i mlfs/recOutWordm7.mlf dic/dict modelList/monoList
Recognition
[input] configs/config
[input] scripts/test_d.scp
Recognition
[input] dic/dict
[input] modelList/wordList
Recognition
[output] mlfs/pbw2_dependent_result.mlf
Recognition
Analysis of recognition results
HResults -I mlfs/words.mlf modelList/wordList mlfs/pbw2_dependent_result.mlf
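HResults reports %Corr = H/N × 100 and Acc = (H − I)/N × 100, where N is the number of reference labels, H = N − D − S, and S, D and I are the substitutions, deletions and insertions found by aligning each recognized transcription with its reference. The sketch below uses a plain unit-cost edit-distance alignment, a simplification of HResults' own alignment, so counts may differ where several alignments tie:

```python
def align_counts(ref, hyp):
    """(S, D, I) from a unit-cost minimum-edit-distance alignment."""
    R, H = len(ref), len(hyp)
    # cost[i][j] = (edits, S, D, I) for ref[:i] against hyp[:j]
    cost = [[None] * (H + 1) for _ in range(R + 1)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, R + 1):
        cost[i][0] = (i, 0, i, 0)  # only deletions
    for j in range(1, H + 1):
        cost[0][j] = (j, 0, 0, j)  # only insertions
    for i in range(1, R + 1):
        for j in range(1, H + 1):
            e, s, d, n = cost[i - 1][j - 1]
            sub = int(ref[i - 1] != hyp[j - 1])
            diag = (e + sub, s + sub, d, n)
            e, s, d, n = cost[i - 1][j]
            dele = (e + 1, s, d + 1, n)
            e, s, d, n = cost[i][j - 1]
            ins = (e + 1, s, d, n + 1)
            cost[i][j] = min(diag, dele, ins)
    _, s, d, n = cost[R][H]
    return s, d, n


def hresults_scores(n, substitutions, deletions, insertions):
    """Percent Correct and Accuracy as defined for HResults."""
    hits = n - deletions - substitutions
    correct = 100.0 * hits / n
    accuracy = 100.0 * (hits - insertions) / n
    return correct, accuracy


print(align_counts(["sil", "agad", "sil"], ["sil", "akin", "sil"]))  # (1, 0, 0)
print(hresults_scores(10, 2, 1, 1))  # (70.0, 60.0)
```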