Speech Recognition Application Voice Enabled Phone Directory - Yousef Rabah

Process of Speech Recognition

Speaker dependent vs. speaker independent
Vocabulary
Isolated vs. continuous
Frequency changes
Pronunciation
Speech processing
HMM – probabilities, parameters, training
Phonemes to words

Problem

Automated, speech-driven phone directory assistance that requires no human operator.

Automatic Speech Recognition - Sphinx

Acoustic modeling
Language model
– Unigrams: <s> & </s>
– Bigrams: P(word2 | word1)
– Trigrams: P(word3 | word1, word2)
Lexicon structure
– ZERO Z IH R OW
– ONE W AH N
– TWO T UW
– <sil>
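The bigram probabilities listed above can be estimated from counts. The following is a minimal sketch, assuming a tiny made-up corpus of digit strings and plain maximum-likelihood estimates with no smoothing or backoff; the function name `bigram_probs` and the corpus are hypothetical and are not taken from the Sphinx tooling.

```python
from collections import Counter

def bigram_probs(sentences):
    """Maximum-likelihood P(word2 | word1) from whitespace-tokenised sentences."""
    history_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        # <s> and </s> mark the sentence start and end.
        words = ["<s>"] + sentence.split() + ["</s>"]
        for w1, w2 in zip(words, words[1:]):
            history_counts[w1] += 1
            bigram_counts[(w1, w2)] += 1
    return {
        (w1, w2): count / history_counts[w1]
        for (w1, w2), count in bigram_counts.items()
    }

# Hypothetical toy corpus, for illustration only.
corpus = ["ZERO ONE TWO", "ONE TWO", "ZERO TWO"]
probs = bigram_probs(corpus)
print(probs[("ONE", "TWO")])   # ONE is always followed by TWO in this corpus
print(probs[("<s>", "ZERO")])  # two of the three sentences start with ZERO
```

A trigram model follows the same pattern with a two-word history, P(word3 | word1, word2).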

Input / Output

24003 samples in file /usr/local/share/sphinx3/model/lm/an4/hell.raw
INFO: live.c(239): live_nfeatvec: 13
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil>
INFO: live.c(239): live_nfeatvec: 12
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> A(2)
INFO: live.c(239): live_nfeatvec: 13
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> EIGHTH
INFO: live.c(239): live_nfeatvec: 12
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H
INFO: live.c(239): live_nfeatvec: 13
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H E
INFO: live.c(239): live_nfeatvec: 12
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H E
INFO: live.c(239): live_nfeatvec: 13
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H E L
INFO: live.c(239): live_nfeatvec: 12
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H E L
INFO: live.c(239): live_nfeatvec: 13
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H E L OH

Backtrace(null)

LatID  SFrm  EFrm      AScr     LScr  Type
  254     0    45   -391470   -74100    -1  <sil>
  594    46    81   -472155  -148846     0  H
 1291    82   102   -288621  -148846     0  E
 1850   103   126   -235274  -148846     0  L
 2599   127   147   -430694  -148846     0  L
 2650   148   148         0  -148846     0  </s>
          0   148  -1818214  -818330  (Total)

FWDVIT: H E L L (null)
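The backtrace can be checked mechanically. The sketch below assumes that AScr and LScr are the per-segment acoustic and language-model scores and that SFrm and EFrm are start and end frames; the function name `parse_backtrace` is hypothetical. It parses the six segment rows, sums the two score columns, and rebuilds the final hypothesis.

```python
def parse_backtrace(rows):
    """Split each backtrace row into typed fields.

    Expected columns: LatID SFrm EFrm AScr LScr Type Word.
    """
    segments = []
    for row in rows:
        _lat_id, sfrm, efrm, ascr, lscr, _seg_type, word = row.split()
        segments.append({
            "word": word,
            "start_frame": int(sfrm),
            "end_frame": int(efrm),
            "acoustic_score": int(ascr),
            "lm_score": int(lscr),
        })
    return segments

# Rows copied from the backtrace, with "-1<sil>" split into two fields.
rows = [
    "254 0 45 -391470 -74100 -1 <sil>",
    "594 46 81 -472155 -148846 0 H",
    "1291 82 102 -288621 -148846 0 E",
    "1850 103 126 -235274 -148846 0 L",
    "2599 127 147 -430694 -148846 0 L",
    "2650 148 148 0 -148846 0 </s>",
]
segments = parse_backtrace(rows)
total_acoustic = sum(s["acoustic_score"] for s in segments)
total_lm = sum(s["lm_score"] for s in segments)
# Drop silence and the end-of-sentence marker to get the spoken letters.
hypothesis = " ".join(
    s["word"] for s in segments if s["word"] not in ("<sil>", "</s>")
)
print(total_acoustic, total_lm)  # -1818214 -818330, matching the (Total) row
print(hypothesis)                # H E L L
```

The sums agree with the (Total) row, and the rebuilt string agrees with the FWDVIT line, which differs from the last partial hypothesis (H E L OH).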

Difficulties

Hardware issues
ASR software issues
Letter phonemes - “e-set”
Time
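The “e-set” difficulty can be made concrete with a small sketch. It assumes CMU-dictionary-style pronunciations for spoken letter names; the `LETTER_PRONUNCIATIONS` table is written out here for illustration and is not taken from the project's lexicon. Grouping letters by their final phoneme shows how many letter names end in the same vowel and differ only in a short initial consonant.

```python
from collections import defaultdict

# Assumed pronunciations of spoken letter names (illustrative subset).
LETTER_PRONUNCIATIONS = {
    "A": ["EY"], "B": ["B", "IY"], "C": ["S", "IY"], "D": ["D", "IY"],
    "E": ["IY"], "G": ["JH", "IY"], "H": ["EY", "CH"], "L": ["EH", "L"],
    "M": ["EH", "M"], "P": ["P", "IY"], "S": ["EH", "S"], "T": ["T", "IY"],
    "V": ["V", "IY"], "Z": ["Z", "IY"],
}

def group_by_final_phoneme(pronunciations):
    """Map each final phoneme to the letters whose names end in it."""
    groups = defaultdict(list)
    for letter, phonemes in pronunciations.items():
        groups[phonemes[-1]].append(letter)
    return dict(groups)

groups = group_by_final_phoneme(LETTER_PRONUNCIATIONS)
e_set = sorted(groups["IY"])
print(e_set)  # ['B', 'C', 'D', 'E', 'G', 'P', 'T', 'V', 'Z']
```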

Solution

Database (PostgreSQL)
– Names
– Numbers
– Phone number
– Fast access
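One possible shape for such a directory table is sketched below. Everything here is an assumption for illustration: the table and column names are hypothetical, the sample rows are made up, and Python's built-in `sqlite3` module stands in for PostgreSQL so that the example runs without a database server. The index on the name column is what gives fast access when looking up a spelled name.

```python
import sqlite3

def create_directory(conn):
    """Create a hypothetical directory table with an index on first_name."""
    conn.execute(
        "CREATE TABLE directory ("
        " id INTEGER PRIMARY KEY,"
        " first_name TEXT NOT NULL,"
        " last_name TEXT NOT NULL,"
        " phone_number TEXT NOT NULL)"
    )
    # The index avoids a full table scan on name lookups.
    conn.execute(
        "CREATE INDEX idx_directory_first_name ON directory (first_name)"
    )

def find_numbers(conn, first_name):
    """Return (first_name, last_name, phone_number) rows for a first name."""
    rows = conn.execute(
        "SELECT first_name, last_name, phone_number FROM directory"
        " WHERE first_name = ? ORDER BY last_name",
        (first_name.upper(),),
    )
    return rows.fetchall()

# sqlite3 in-memory database as a stand-in for PostgreSQL; made-up entries.
conn = sqlite3.connect(":memory:")
create_directory(conn)
conn.executemany(
    "INSERT INTO directory (first_name, last_name, phone_number)"
    " VALUES (?, ?, ?)",
    [("SAM", "EXAMPLE", "555-0100"), ("SUE", "SAMPLE", "555-0101")],
)
print(find_numbers(conn, "Sam"))  # [('SAM', 'EXAMPLE', '555-0100')]
```

Names are stored in upper case here because the recognizer output in this deck is upper case; that convention is also an assumption.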

Solution

Architecture of application
– User interaction
– Connects to database
– Communicates with Sphinx
– Uses C, Perl, shell scripts

Example (general idea):

…

PC: Say the letters of the first name; press the space bar before and after you speak:

User: S AA EM

PC: Did you say, SAM?

…
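The confirmation step in the dialog can be sketched as a small loop. This is only an outline under stated assumptions: `recognize_letters` and `ask_yes_no` are hypothetical callables standing in for the Sphinx call and the user's yes/no answer, and the retry limit is invented for the example.

```python
def spell_name(letters):
    """Join recognized letters into an upper-case name."""
    return "".join(letter.upper() for letter in letters)

def confirm_name(recognize_letters, ask_yes_no, max_attempts=3):
    """Ask for a spelled name until the user confirms it or attempts run out."""
    for _ in range(max_attempts):
        letters = recognize_letters()   # hypothetical recognizer wrapper
        name = spell_name(letters)
        if ask_yes_no(f"Did you say, {name}?"):
            return name
    return None

# Stand-ins so the sketch runs without a recognizer or a live user.
attempts = iter([["S", "A", "N"], ["S", "A", "M"]])
prompts = []

def fake_recognizer():
    return next(attempts)

def fake_yes_no(prompt):
    prompts.append(prompt)
    return prompt == "Did you say, SAM?"

result = confirm_name(fake_recognizer, fake_yes_no)
print(prompts)  # ['Did you say, SAN?', 'Did you say, SAM?']
print(result)   # SAM
```

Passing the recognizer and the prompt as functions keeps the dialog logic independent of how Sphinx is actually invoked.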

Solution

Check List

Reading
ASR system
Database - PSQL
Applications in C, Perl, PHP, VXML, shell

Timeline

Documents
