
Tech Seminar Ppt


DESCRIPTION

It is about a speech-to-text converter.


Page 1: Tech Seminar Ppt

CVSR COLLEGE OF ENGINEERING

DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING

TECHNICAL SEMINAR ON

SPEECH TO TEXT CONVERSION

BY

Y. RAJENDER REDDY (08H61A04C5)

Page 2: Tech Seminar Ppt

INTRODUCTION

Speech recognition is the process of capturing spoken words using a microphone or telephone and converting them into a digitally stored set of words.

Speech-to-text conversion is one of the applications of speech recognition.

A speech-to-text system improves accessibility by providing data entry options for blind, deaf, or physically handicapped users.

Page 3: Tech Seminar Ppt

BLOCK DIAGRAM

Speech acquisition

Speech preprocessing

Hidden Markov model

Text storage

External Hardware

Through Microphone

Page 4: Tech Seminar Ppt

SPEECH ACQUISITION

The microphone input port with the audio codec receives the signal, amplifies it, and converts it into 16-bit PCM digital samples at a sampling rate of 8 kHz.

The system needs a parallel/serial interface to the Nios II processor and an application running on the processor that acquires and stores data in memory.

The received samples are stored in memory on the Altera Development and Education (DE2) board.
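
As a rough illustration of what 16-bit PCM at 8 kHz means for storage, the sketch below computes the memory needed for a recording and unpacks raw bytes into signed samples. The function names and the little-endian byte order are assumptions made for this example; the slides do not describe the codec's byte layout or the Nios II application code.

```python
import struct

SAMPLE_RATE_HZ = 8000   # sampling rate stated on the slide
BYTES_PER_SAMPLE = 2    # 16-bit PCM

def pcm_bytes_for(seconds):
    """Memory needed to hold a mono recording of the given length."""
    return int(seconds * SAMPLE_RATE_HZ) * BYTES_PER_SAMPLE

def decode_pcm16(raw):
    """Unpack raw bytes into signed 16-bit samples (little-endian is assumed)."""
    count = len(raw) // BYTES_PER_SAMPLE
    return list(struct.unpack("<%dh" % count, raw[:count * BYTES_PER_SAMPLE]))
```

Under these figures, one second of speech occupies 16,000 bytes, so a short spoken digit fits comfortably in on-board memory.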

Page 5: Tech Seminar Ppt

SPEECH PREPROCESSING

Preprocessing involves taking the speech samples as input, blocking the samples into frames, and returning a unique pattern for each sample.

The unique pattern can be achieved by the following steps:

1. The digital samples are divided into overlapped frames.

2. The system checks the frames for voice activity using endpoint detection and energy threshold calculations.

Page 6: Tech Seminar Ppt

CONTINUED

3. The speech samples are passed through a pre-emphasis filter.

4. The frames with voice activity are passed through a Hamming window.

5. The system finds linear predictive coding (LPC) coefficients for the frames.

6. From the LPC coefficients, the system determines the cepstral coefficients.

The cepstral coefficients serve as feature vectors.
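
The six preprocessing steps can be sketched in plain Python as follows. This is a minimal illustration, not the code that runs on the Nios II: the frame length (240 samples), frame step (80 samples), pre-emphasis factor (0.95), LPC order (10), and energy threshold are all assumed values, and every function name is hypothetical. The LPC coefficients are found here with the Levinson-Durbin recursion on the frame autocorrelation, which is one common choice; the slides do not say which method the system uses. Pre-emphasis is applied per frame to follow the order of the steps.

```python
import math

def split_frames(samples, size=240, step=80):
    """Step 1: overlapped frames (assumed 30 ms long, one every 10 ms)."""
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, step)]

def has_voice(frame, threshold):
    """Step 2: energy test against an assumed threshold."""
    return sum(s * s for s in frame) > threshold

def pre_emphasis(frame, alpha=0.95):
    """Step 3: first-order filter y[n] = x[n] - alpha * x[n-1]."""
    return [frame[0]] + [frame[n] - alpha * frame[n - 1] for n in range(1, len(frame))]

def hamming(frame):
    """Step 4: taper the frame edges with a Hamming window."""
    last = len(frame) - 1
    return [s * (0.54 - 0.46 * math.cos(2 * math.pi * n / last))
            for n, s in enumerate(frame)]

def autocorrelation(frame, order):
    """Autocorrelation values r[0..order] of one frame."""
    return [sum(frame[n] * frame[n + k] for n in range(len(frame) - k))
            for k in range(order + 1)]

def levinson_durbin(r, order):
    """Step 5: predictor coefficients a[1..order] from autocorrelation r."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:          # degenerate frame, stop early
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k
    return a[1:]

def lpc_to_cepstrum(a, count):
    """Step 6: cepstral coefficients c[1..count] from LPC coefficients."""
    p = len(a)
    c = [0.0] * (count + 1)
    for m in range(1, count + 1):
        acc = a[m - 1] if m <= p else 0.0
        for k in range(1, m):
            if m - k <= p:
                acc += (k / m) * c[k] * a[m - k - 1]
        c[m] = acc
    return c[1:]

def extract_features(samples, order=10, threshold=1e3):
    """Run steps 1 to 6 and return one cepstral vector per voiced frame."""
    features = []
    for frame in split_frames(samples):
        if not has_voice(frame, threshold):
            continue
        windowed = hamming(pre_emphasis(frame))
        lpc = levinson_durbin(autocorrelation(windowed, order), order)
        features.append(lpc_to_cepstrum(lpc, order))
    return features
```

Calling extract_features on a list of PCM samples returns a list of feature vectors, with silent frames dropped by the energy test.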

Page 7: Tech Seminar Ppt

HIDDEN MARKOV MODEL

A hidden Markov model (HMM) is used for speech recognition, which converts speech to text.

This model consists of three steps:

• Training

• HMM-based recognition

• Digit models

Page 8: Tech Seminar Ppt

TRAINING

Training involves creating a pattern representative of the features of a class using one or more test patterns that correspond to speech sounds of the same class.

An important part of speech-to-text conversion using pattern recognition is training.

Page 9: Tech Seminar Ppt

HMM-BASED RECOGNITION

Recognition is the process of comparing the unknown test pattern with each sound class reference pattern and computing a measure of similarity between the test pattern and each reference pattern.

Page 10: Tech Seminar Ppt

DIGIT MODELS

The input speech sample is preprocessed and the feature vector is extracted.

Then, the index of the nearest codebook vector for each frame is sent to all digit models.

The model with the maximum probability is chosen as the recognised digit.
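
The recognition step described above can be sketched as vector quantisation followed by scoring with each digit model. The code below is a hedged illustration only: it assumes discrete-output HMMs scored with the forward algorithm, represents each model as a hypothetical (start, transition, emission) tuple, and uses squared Euclidean distance for the codebook search. None of these details are stated in the slides.

```python
import math

def nearest_codeword(vec, codebook):
    """Index of the codebook vector closest to vec (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((v - c) ** 2 for v, c in zip(vec, codebook[i])))

def forward_log_prob(obs, start, trans, emit):
    """log P(obs | model) for a discrete HMM, using the scaled forward algorithm."""
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][obs[t]]
                     for s in range(n)]
        scale = sum(alpha)
        if scale == 0.0:            # the model cannot produce this sequence
            return float("-inf")
        alpha = [a / scale for a in alpha]
        log_p += math.log(scale)
    return log_p

def recognise(features, codebook, models):
    """Quantise each frame, score every model, return the most probable name."""
    obs = [nearest_codeword(f, codebook) for f in features]
    return max(models, key=lambda name: forward_log_prob(obs, *models[name]))

# Toy data, invented for illustration: a two-entry codebook and two
# left-to-right models that prefer opposite symbol orders.
codebook = [[0.0], [1.0]]
models = {
    "model_a": ([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]]),
    "model_b": ([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], [[0.1, 0.9], [0.9, 0.1]]),
}
best = recognise([[0.1], [0.2], [0.9]], codebook, models)
```

In a real system there would be one trained model per digit and the codebook would be built from training feature vectors; working in log probabilities avoids numeric underflow on long utterances.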

Page 11: Tech Seminar Ppt

TEXT STORAGE

The Nios II processor on the DE2 board sends the digital speech data to a PC.

A target program running on the PC receives the text and writes it to the disk.

Page 12: Tech Seminar Ppt

APPLICATIONS

Interactive voice response system (IVRS)

Voice-dialing in mobile phones and telephones

Hands-free dialing in wireless Bluetooth headsets

PIN and numeric password entry modules

Automated teller machines (ATMs)

Page 13: Tech Seminar Ppt

REFERENCES

1. Topic taken from seminartopics.co.in/ece-seminar-topics/

2. Garg, Mohit. Linear Prediction Algorithms. Indian Institute of Technology, Bombay, India, Apr 2003.

3. Li, Gongjun and Taiyi Huang. An Improved Training Algorithm in HMM-Based Speech Recognition. National Laboratory of Pattern Recognition, Chinese Academy of Sciences, Beijing.

4. Altera Nios II document.

Page 14: Tech Seminar Ppt

THANK YOU

Page 15: Tech Seminar Ppt

QUERIES