Speech Processing Basics


Speech Processing

• Fundamentals of Digital Speech Processing

1. Anatomy and physiology of speech organs

2. The process of speech production

3. The acoustic theory of speech production

4. Digital models for speech signals

Applications of Speech Processing

• 1. Speech recognition: speech to text

• 2. Speech understanding: the meaning matters rather than the exact words (e.g. speech translation)

• 3. Speech synthesis: text to speech, so the computer can speak to you

• 4. Word processing: check and correct spelling, grammar and style

• 5. Text prediction: speed up word processing

• 6. Automatic summarization: topic identification, summary generation

• 7. Text mining: extracting the necessary data

• Anatomy: the study of the structure of the bodies of people or animals.

• Physiology: the study of how the bodies of people and animals function. For speech, this includes understanding the higher-order mechanisms within the human central nervous system that account for speech production.

• Acoustics: the scientific study of sound.

• Phonetics: the study of the sounds of words and of the sounds used in languages.

• Phoneme: the smallest unit of sound that is significant in a language.

• Articulation: the action of producing a sound or word clearly, in speech or music.

• Linguistics: the study of the way in which language works.

• Semantics: the branch of linguistics that deals with the meanings of words and sentences.

Speech Processing

[Figure: speech processing and the fields it draws on]

• Signal processing: Fourier transforms, discrete-time filters, AR(MA) models

• Information theory: entropy, communication theory, rate-distortion theory

• Statistical SP: stochastic models

• Acoustics: psychoacoustics, room acoustics, speech production

• Phonetics

• Algorithms (programming)

ASR: Application

© James Glass, MIT


Automatic Speech Recognition

[Figure: recognition block diagram. Components: voice input, analog-to-digital conversion, acoustic model, language model, speech engine, display, feedback.]
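The diagram names an acoustic model and a language model but does not say how their outputs are combined. One common approach, assumed here for illustration, is to pick the word sequence with the highest sum of the two log scores. The function name `decode`, the weight `lm_weight`, and all scores below are hypothetical.

```python
import math

def decode(acoustic_logp, language_logp, lm_weight=1.0):
    """Return (hypothesis, score) with the highest combined log score."""
    best, best_score = None, -math.inf
    for hyp, ac in acoustic_logp.items():
        # A hypothesis missing from the language model gets log-probability -inf.
        score = ac + lm_weight * language_logp.get(hyp, -math.inf)
        if score > best_score:
            best, best_score = hyp, score
    return best, best_score

# Hypothetical log scores, invented for illustration (not measured data).
acoustic = {"recognize speech": -12.0, "wreck a nice beach": -11.5}
language = {"recognize speech": -4.0, "wreck a nice beach": -9.0}

best, score = decode(acoustic, language)
print(best, score)
```

In this toy case the acoustic model slightly prefers the second hypothesis, and the language model tips the decision toward the first.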

Speech Generation

• First, the talker formulates a message (in his or her mind) to transmit to the listener via speech.

• Message formulation is comparable to creating printed text that expresses the words of the message.

• The next step is the conversion of the message into a language code.

• This roughly corresponds to converting the printed text of the message into a sequence of phonemes corresponding to the sounds that make up the words, together with the pitch accents associated with those sounds.

• Once the language code is chosen, the talker must execute a series of neuromuscular commands that cause the vocal cords to vibrate when appropriate and shape the vocal tract so that the proper sequence of speech sounds is produced. The final output is an acoustic signal.
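The text-to-language-code step can be sketched as a dictionary lookup from words to phoneme sequences. This is a minimal illustration only: the `LEXICON` entries and the function `text_to_phonemes` are hypothetical, and pitch accents are ignored.

```python
# Hypothetical mini-lexicon; a real system would use a full
# pronunciation dictionary and also assign pitch accents.
LEXICON = {
    "see": ["s", "iy"],
    "my": ["m", "ay"],
    "bat": ["b", "ae", "t"],
}

def text_to_phonemes(text):
    """Map each word to its phoneme sequence; unknown words raise KeyError."""
    phonemes = []
    for word in text.lower().split():
        phonemes.extend(LEXICON[word])
    return phonemes

print(text_to_phonemes("See my bat"))
```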

Speech Recognition

• First, the listener processes the acoustic signal along the basilar membrane in the inner ear, which provides a running spectrum analysis of the incoming signal.

• The neural activity along the auditory nerve is converted into a language code at higher centers of processing within the brain, and message comprehension is achieved.

• The lungs and the associated muscles act as the source of air for exciting the vocal mechanism.

• The muscle force pushes air out of the lungs (shown as a piston pushing up within a cylinder) and through the bronchi and trachea.

• When the vocal cords are tensed, the air flow causes them to vibrate, producing so-called voiced speech sounds.

• When the vocal cords are relaxed, the air flow must pass through a constriction in the vocal tract in order to produce a sound. It thereby becomes turbulent, producing so-called unvoiced speech sounds.
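In digital models of speech, these two cases are often approximated by two excitation signals: a periodic impulse train for voiced sounds and random noise for unvoiced sounds. The sketch below assumes a sampling rate of 8000 Hz and a pitch of 100 Hz. Both values, and the function names, are illustrative choices that do not come from the slides.

```python
import random

def voiced_excitation(n_samples, pitch_period):
    """Impulse train: one unit pulse every pitch_period samples."""
    return [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]

def unvoiced_excitation(n_samples, seed=0):
    """Uniform random noise standing in for turbulent air flow."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]

fs = 8000              # assumed sampling rate in Hz
period = fs // 100     # 100 Hz pitch gives 80 samples per period
voiced = voiced_excitation(400, period)
unvoiced = unvoiced_excitation(400)
```

In a full model, either excitation would then be passed through a filter representing the vocal tract. That step is omitted here.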

Classifications

• 1. Silence (S): no speech is produced.

• 2. Unvoiced (U): the vocal cords are not vibrating, so the speech signal is aperiodic or random in nature.

• 3. Voiced (V): the vocal cords vibrate periodically as air flows from the lungs, so the speech signal is quasi-periodic.
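The slides do not name a method for telling the three classes apart. One common heuristic, assumed here, uses short-time energy to detect silence and the zero-crossing rate to separate noisy unvoiced frames from periodic voiced ones. The thresholds and test signals below are illustrative guesses, not tuned values.

```python
import math
import random

def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def classify_frame(frame, energy_floor=1e-4, zcr_threshold=0.25):
    """Label a frame 'S', 'U' or 'V'. Thresholds are assumptions."""
    if short_time_energy(frame) < energy_floor:
        return "S"
    return "U" if zero_crossing_rate(frame) > zcr_threshold else "V"

fs = 8000   # assumed sampling rate in Hz
n = 240     # 30 ms frame at 8000 Hz
silence = [0.0] * n
voiced = [0.5 * math.sin(2 * math.pi * 120 * t / fs) for t in range(n)]
rng = random.Random(0)
unvoiced = [rng.uniform(-0.3, 0.3) for _ in range(n)]

labels = [classify_frame(f) for f in (silence, voiced, unvoiced)]
print(labels)
```

Real speech is less clean than these synthetic frames, so practical systems usually combine more features and smooth the decisions over time.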

Speech Waveform Characteristics

• Loudness

• Voiced/Unvoiced.

• Pitch.

– Fundamental frequency.

• Spectral envelope.

– Formants.
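The fundamental frequency of a voiced frame can be estimated in several ways. The sketch below uses the autocorrelation method, chosen here for illustration: the lag at which the frame best matches a shifted copy of itself is taken as the pitch period. The search range of 60 to 400 Hz and the synthetic test frame are assumptions.

```python
import math

def autocorrelation(frame, lag):
    """Unnormalized autocorrelation of the frame at the given lag."""
    return sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))

def estimate_f0(frame, fs, f_min=60.0, f_max=400.0):
    """Return the F0 in Hz whose lag maximizes the autocorrelation."""
    lag_min = int(fs / f_max)
    lag_max = int(fs / f_min)   # the frame must be longer than lag_max
    lags = range(lag_min, lag_max + 1)
    best_lag = max(lags, key=lambda k: autocorrelation(frame, k))
    return fs / best_lag

fs = 8000  # assumed sampling rate in Hz
# Synthetic voiced frame: 100 Hz fundamental plus a weaker second harmonic.
frame = [
    math.sin(2 * math.pi * 100 * n / fs) + 0.5 * math.sin(2 * math.pi * 200 * n / fs)
    for n in range(400)
]
f0 = estimate_f0(frame, fs)
print(f0)
```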

Speech Waveform Characteristics Cont.

[Figure: example waveforms of voiced speech (/ih/) and unvoiced speech (/s/)]

Phoneme Hierarchy

Speech sounds

• Vowels: iy, ih, ae, aa, ah, ao, ax, eh, er, ow, uh, uw

• Diphthongs: ay, ey, oy, aw

• Consonants

– Plosive: p, b, t, d, k, g

– Nasal: m, n, ng

– Fricative: f, v, th, dh, s, z, sh, zh, h

– Retroflex liquid: r

– Lateral liquid: l

– Glide: w, y

The phoneme inventory is language dependent. There are about 50 phonemes in English.
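The hierarchy can be held in a small lookup table. This is a sketch covering only the symbols listed in the chart above; the names `PHONEME_CLASSES`, `CLASS_OF`, and `phoneme_class` are hypothetical.

```python
# Classes and symbols as listed in the chart; not a complete English inventory.
PHONEME_CLASSES = {
    "vowel": ["iy", "ih", "ae", "aa", "ah", "ao", "ax", "eh", "er", "ow", "uh", "uw"],
    "diphthong": ["ay", "ey", "oy", "aw"],
    "plosive": ["p", "b", "t", "d", "k", "g"],
    "nasal": ["m", "n", "ng"],
    "fricative": ["f", "v", "th", "dh", "s", "z", "sh", "zh", "h"],
    "retroflex liquid": ["r"],
    "lateral liquid": ["l"],
    "glide": ["w", "y"],
}

# Reverse index: phoneme symbol -> class name.
CLASS_OF = {p: c for c, symbols in PHONEME_CLASSES.items() for p in symbols}

def phoneme_class(symbol):
    """Return the class of a phoneme symbol, or 'unknown' if not listed."""
    return CLASS_OF.get(symbol, "unknown")

print(phoneme_class("sh"), phoneme_class("ay"))
```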

Signal processing

Digital speech processing

• Speech signals are composed of a sequence of sounds.

• The arrangement of these sounds is governed by the rules of language. The study of these rules and their implications in human communication is the domain of linguistics.

• The study and classification of the sounds of speech is called phonetics.
