Speech Recognition Application
Voice Enabled Phone Directory
- Yousef Rabah
Process of Speech Recognition
– Speaker dependent vs. speaker independent
– Vocabulary
– Isolated vs. continuous
– Frequency changes
– Pronunciation
– Speech processing
– HMM: probabilities, parameters, training
– Phonemes to words
Problem
Automated, speech-driven phone directory assistance that works without a human operator.
Automatic Speech Recognition - Sphinx
Acoustic modeling
Language Model
– Unigrams: <s> & </s>
– Bigrams: P(word2 | word1)
– Trigrams: P(word3 | word1, word2)
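The bigram term P(word2 | word1) can be illustrated with a small relative-frequency estimate. This is only a sketch: the toy corpus and the function name train_bigram are made up for illustration, and it says nothing about how Sphinx itself builds or smooths its language model.

```python
from collections import Counter

def train_bigram(sentences):
    """Return prob(word2, word1), an unsmoothed estimate of P(word2 | word1)."""
    history_counts = Counter()
    bigram_counts = Counter()
    for words in sentences:
        # <s> and </s> mark the sentence start and end, as on the slide.
        tokens = ["<s>"] + list(words) + ["</s>"]
        # Every token except the last can serve as a history (word1).
        history_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))

    def prob(word2, word1):
        if history_counts[word1] == 0:
            return 0.0  # unseen history; a real model would back off or smooth
        return bigram_counts[(word1, word2)] / history_counts[word1]

    return prob

# Toy corpus, invented for this example.
corpus = [["ONE", "TWO"], ["ONE", "ZERO"], ["TWO", "ONE", "TWO"]]
p = train_bigram(corpus)
print(p("TWO", "ONE"))   # count(ONE TWO) / count(ONE)
print(p("ONE", "<s>"))   # how often a sentence starts with ONE
```

A trigram model follows the same pattern with a two-word history (word1, word2) as the counting key.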
Lexicon Structure
– ZERO Z IH R OW
– ONE W AH N
– TWO T UW
– <sil>
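The lexicon entries above map each word to a phoneme sequence. A minimal sketch of reading such entries into a lookup table follows, assuming one entry per line with the word first and its phonemes after it. The names load_lexicon and words_to_phones are hypothetical, and the <sil> entry is left out because the slide does not show what it maps to.

```python
# The three entries shown on the slide, one per line: WORD PHONE PHONE ...
LEXICON_TEXT = """\
ZERO  Z IH R OW
ONE   W AH N
TWO   T UW
"""

def load_lexicon(text):
    """Parse 'WORD PH1 PH2 ...' lines into a dict of word -> phoneme list."""
    lexicon = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        lexicon[parts[0]] = parts[1:]
    return lexicon

def words_to_phones(words, lexicon):
    """Concatenate the phoneme sequences of a list of words."""
    phones = []
    for word in words:
        phones.extend(lexicon[word])  # raises KeyError for out-of-vocabulary words
    return phones

lexicon = load_lexicon(LEXICON_TEXT)
print(lexicon["ZERO"])
print(words_to_phones(["ONE", "TWO"], lexicon))
```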
Input / Output
24003 samples in file
/usr/local/share/sphinx3/model/lm/an4/hell.raw
INFO: live.c(239): live_nfeatvec: 13
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil>
INFO: live.c(239): live_nfeatvec: 12
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> A(2)
INFO: live.c(239): live_nfeatvec: 13
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> EIGHTH
INFO: live.c(239): live_nfeatvec: 12
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H
INFO: live.c(239): live_nfeatvec: 13
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H E
INFO: live.c(239): live_nfeatvec: 12
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H E
INFO: live.c(239): live_nfeatvec: 13
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H E L
INFO: live.c(239): live_nfeatvec: 12
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H E L
INFO: live.c(239): live_nfeatvec: 13
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H E L OH
Backtrace(null)
LatID  SFrm  EFrm      AScr     LScr  Type
  254     0    45   -391470   -74100    -1  <sil>
  594    46    81   -472155  -148846     0  H
 1291    82   102   -288621  -148846     0  E
 1850   103   126   -235274  -148846     0  L
 2599   127   147   -430694  -148846     0  L
 2650   148   148         0  -148846     0  </s>
          0   148  -1818214  -818330        (Total)
FWDVIT: H E L L (null)
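The backtrace above can be checked by hand: the per-word scores should add up to the (Total) row, and the non-marker words should spell the FWDVIT hypothesis. The sketch below does that in Python. It assumes SFrm/EFrm are start and end frames and AScr/LScr are acoustic and language model scores; the slide does not define the columns, and parse_backtrace is a hypothetical helper, not part of Sphinx.

```python
# Rows copied from the backtrace, with a space inserted between Type and word.
BACKTRACE = """\
254 0 45 -391470 -74100 -1 <sil>
594 46 81 -472155 -148846 0 H
1291 82 102 -288621 -148846 0 E
1850 103 126 -235274 -148846 0 L
2599 127 147 -430694 -148846 0 L
2650 148 148 0 -148846 0 </s>
"""

def parse_backtrace(text):
    """Turn whitespace-separated backtrace rows into a list of dicts."""
    rows = []
    for line in text.splitlines():
        lat_id, sfrm, efrm, ascr, lscr, typ, word = line.split()
        rows.append({
            "lat_id": int(lat_id),
            "sfrm": int(sfrm),   # assumed: first frame of the word
            "efrm": int(efrm),   # assumed: last frame of the word
            "ascr": int(ascr),   # assumed: acoustic score
            "lscr": int(lscr),   # assumed: language model score
            "type": int(typ),
            "word": word,
        })
    return rows

rows = parse_backtrace(BACKTRACE)
total_ascr = sum(r["ascr"] for r in rows)
total_lscr = sum(r["lscr"] for r in rows)
# Drop <sil> and </s> markers to recover the recognized letters.
hypothesis = " ".join(r["word"] for r in rows if not r["word"].startswith("<"))

print(total_ascr, total_lscr)  # compare with the (Total) row
print(hypothesis)              # compare with the FWDVIT line
```

Both sums match the (Total) row (-1818214 and -818330), and the recovered hypothesis is H E L L.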
Difficulties
– Hardware issues
– ASR software issues
– Letter phonemes - “e-set”
– Time
Solution
Database (PostgreSQL)
– Names
– Numbers
– Phone number
– Fast access
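A directory table with an index on the name column is one way to get the fast access listed above. The sketch below uses Python's built-in sqlite3 module as a stand-in so that it runs without a server; the project itself uses PostgreSQL. The table name, column names, and sample rows are assumptions made for illustration only.

```python
import sqlite3

# In-memory SQLite database standing in for the PostgreSQL directory.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE directory (
           id INTEGER PRIMARY KEY,
           name TEXT NOT NULL,
           phone_number TEXT NOT NULL
       )"""
)
# Index on name so lookups by name do not scan the whole table.
conn.execute("CREATE INDEX idx_directory_name ON directory (name)")

# Invented sample entries.
conn.executemany(
    "INSERT INTO directory (name, phone_number) VALUES (?, ?)",
    [("SAM", "555-0100"), ("SUE", "555-0101")],
)
conn.commit()

def lookup(connection, name):
    """Return all phone numbers stored for an exact name match."""
    cur = connection.execute(
        "SELECT phone_number FROM directory WHERE name = ?", (name,)
    )
    return [row[0] for row in cur.fetchall()]

print(lookup(conn, "SAM"))
print(lookup(conn, "BOB"))  # no such entry, so the list is empty
```

The parameterized query (the ? placeholder) keeps recognizer output from being pasted directly into SQL text.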
Solution
Architecture of application
– User interaction
– Connects to database
– Communicates with Sphinx
– Uses C, Perl, and shell scripts
Example (general idea):
…
PC: Say the letters of the first name; press the space bar before and after you speak:
User: S AA EM
PC: Did you say, SAM ?
…
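The exchange above can be sketched as a small spell-and-confirm step. This is only an illustration of the general idea: the SPOKEN_TO_LETTER table is a hypothetical mapping inferred from the single example on the slide (S, AA, EM), and the recognizer is replaced by a plain string, since the way the application talks to Sphinx is not shown.

```python
# Hypothetical mapping from recognized letter tokens to letters,
# inferred only from the "S AA EM" example.
SPOKEN_TO_LETTER = {"S": "S", "AA": "A", "EM": "M"}

def spelled_name(recognized):
    """Join recognized letter tokens (e.g. 'S AA EM') into a name."""
    letters = []
    for token in recognized.split():
        letters.append(SPOKEN_TO_LETTER[token])  # KeyError for unknown tokens
    return "".join(letters)

def confirmation_prompt(name):
    """Build the confirmation question the PC asks back."""
    return f"Did you say, {name} ?"

def is_confirmed(answer):
    """Treat 'y' or 'yes' (any case, surrounding spaces ignored) as confirmation."""
    return answer.strip().lower() in {"y", "yes"}

name = spelled_name("S AA EM")
print("PC:", confirmation_prompt(name))
print(is_confirmed("Yes"))
```

Asking for confirmation before the directory lookup gives the user a chance to correct a misrecognized letter.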
Solution
Check List
– Reading
– ASR system
– Database - PSQL
– Applications in C, Perl, PHP, vxml, shell
Timeline