Speech to text conversion

SPEECH TO TEXT CONVERSION

Why this project?

Speech recognition technology is one from the fast growing

engineering technologies.

Nearly 20% people of the world are suffering from various

disabilities; many of them are blind or unable to use their hands

effectively. they can share information with people by operating

computer through voice input.

Our project is capable to recognize the speech and convert the

input audio into text; it also enables a user to perform operations

such as open calculator, wordpad, notepad, log off computer.

APPLICATIONS

In Car Systems

Health Care

Military

Training air traffic controllers

Telephony and other domains

Usage in education and daily life

PERFORMANCE

The performance of speech recognition systems is

usually evaluated in terms of accuracy and speed.

Accuracy is usually rated with word error rate (WER),

whereas speed is measured with the real time factor.

Other measures of accuracy include Single Word Error

Rate (SWER) and Command Success Rate (CSR).

Accuracy

Accuracy of speech recognition vary with the following:

Vocabulary size and confusabilitySpeaker dependence vs. independence Isolated, discontinuous, or continuous speechTask and language constraintsRead vs. spontaneous speech

SYSTEM BLOCK DIAGRAM

Acoustic Model

An acoustic model is created by taking audio recordings of speech, and their

text transcriptions, and using software to create statistical representations of the

sounds that make up each word. It is used by a speech recognition engine to

recognize speech.

Language Model

A language model is a file containing the probabilities of sequences of words.

Language models are used for dictation applications, whereas grammars are

used in desktop command and control or telephony interactive voice

response (IVR) type applications.

Speech Engine

A speech engine is software that gives your computer the ability to play back

text in a spoken voice (referred to as text-to-speech or TTS).

HMM (HIDDEN MARKOV MODEL)

These are statistical models that output a sequence of symbols or

quantities. HMMs are used in speech recognition because a speech

signal can be viewed as a piecewise stationary signal or a short-

time stationary signal. In a short time-scale (e.g., 10 milliseconds),

speech can be approximated as a stationary process. Speech can be

thought of as a Markov model for many stochastic purposes.

HMM Codebook

HMM Speech Process

Advantages

Able to write the text both through keyboard and voice input

Voice recognition of different notepad commands such as

open save and clear.

Open different windows soft wares, based on voice input.

Lower operational costs.

Provide significant help for the people with disabilities.

Requires less consumption of time in writing text.

WEAKNESS

Homonyms:

Are the words that are differently spelled and have the different meaning but acquires

the same meaning, for example “there” “their”, “be” and “bee”. This is a challenge for

computer machine to distinguish between such types of phrases that sound alike.

Speeches:

A second challenge in the process, is to understand the speech uttered by different

users, current systems have a difficulty to separate simultaneous speeches form

multiple users.

Noise factor:

the program requires hearing the words uttered by a human distinctly and clearly. Any

extra sound can create interference, first you need to place system away from noisy

environments and then speak clearly else the machine will confuse and will mix up the

words.

FUTURE SCOPE

Accuracy will become better and better

Dictation speech recognition will gradually become accepted

Greater use will be made of “intelligent systems” which will attempt

to guess what the speaker intended to say, rather than what was

actually said, as people often misspeak and make unintentional

mistakes.

Microphone and sound systems will be designed to adapt more

quickly to changing background noise levels, different environments,

with better recognition of extraneous material to be discarded.

Speech to text conversion

Engineering

Text-to-Speech Text-to-Speech Part II 1 Intelligent Robot Lecture Note

Hungarian text–to–speech conversion: linguistic models ... · Olaszy and Géza Németh. The first versions of the program worked on the DOS operating system, and the speech signal

Voice Conversion applied to Text-to-Speech systemsnlp.lsi.upc.edu/papers/thesis_helenca.pdfspeakers for Text-To-Speech systems (TTS). The goal of this thesis is to develop a VC system

Visual speech to text conversion applicable to telephone communication

CONVERSION OF EMOTION IN SPEECH SIGNAL USING DWT & ADAPTIVE … · Conversion of Emotion in Speech Signal using DWT & Adaptive Filtering 84 CONVERSION OF EMOTION IN SPEECH SIGNAL

Design and Implementation of Text To Speech Conversion for ... · Design and Implementation of Text To Speech Conversion for Visually Impaired People ... speech synthesizer will be

Assembly Speech Text

Letter-to-Phoneme Conversion for a German Text-to-Speech ...vera/Diplomarbeit.pdf · word stress assignment for a German text-to-speech system. In the rst part of the thesis (chapter

Text to Speech Conversion for Odia Languageethesis.nitrkl.ac.in/7229/1/Test_Jena_2015.pdf · A Text-to-Speech (TTS) system converts normal language text into natural sounding speech

SOPC-Based Speech-to-Text Conversion · A speech-to-text system can also improve system accessibility by providing data entry options for blind, deaf, or physically handicapped users

Text-to-speech in Vocabulary Acquisition and Student ...csisso/docs/Sisson-technical-report.pdf · 1.4 Text-to-speech (TTS) 1.4.1 Types of TTS Text to speech systems generate speech

Speech Recognition: Technology & Patent Landscape · PDF filespeech-enabled communication, ... Word recognition 2,044 Speech to text conversion 1,521 ... Speech recognition application

Text to-speech

ASSISTIVE CONTEXT-AWARE TOOLKIT (ACAT) - … · 6.6 Handling Calibration ... Text-to-Speech Conversion from text to speech through the Microsoft ... files from the folder where you

Text to Speech 사용법

Voice Conversion - speech.ee.ntu.edu.twspeech.ee.ntu.edu.tw/~tlkagk/courses/DLHLP20/Voice... · Voice Conversion Using Transformer with Text-to-Speech Pretraining, arXiv, 2019 •[Biadsy,

Conversion and Conversation: Speech and Social Change in ... · Conversion and Conversation: Speech and Social Change in Jean Toomer's Cane ... Conversion and Conversation: Speech

CS378 - Mobile Computing Speech to Text, Text to Speech, Telephony

TEXT-TO-SPEECH INTRODUCTION. What is text-to-speech? Text-to-speech (TTS) is a process where digital text is converted in to spoken words. –“Talking text”

Low-Cost Portable Text Recognition and Speech Synthesis ... · Low-cost Portable Text Recognition and Speech Synthesis with ... portable text recognition and speech synthesis