
speech processing basics


Page 1: speech processing basics

Speech Processing

• Fundamentals of Digital Speech Processing

1. Anatomy and physiology of speech organs

2. The process of speech production

3. The acoustic theory of speech production

4. Digital models for speech signals

Page 2: speech processing basics

Applications of Speech Processing

1. Speech recognition: speech to text

2. Speech understanding: the meaning matters rather than the exact words (e.g. speech translation)

3. Speech synthesis: text to speech, so the computer can speak to you

4. Word processing: check and correct spelling, grammar and style

5. Text prediction: speed up word processing

6. Automatic summarization: topic identification, summary generation

7. Text mining: extracting the necessary data

Page 4: speech processing basics

• Anatomy: the study of the structure of the bodies of people or animals.

• Physiology: the study of how the bodies of people and animals function. Here it includes understanding the higher-order mechanisms within the human central nervous system that account for speech production.

• Acoustics: the scientific study of sound.

• Phonetics: the study of the sound of a word and of the sounds that are used in languages.

• Phoneme: the smallest unit of sound that is significant in a language.

• Articulation: the action of producing a sound or word clearly, in speech or music.

• Linguistics: the study of the way in which language works.

• Semantics: the branch of linguistics that deals with the meanings of words and sentences.

Page 5: speech processing basics

Speech Processing

Speech processing draws on several fields:

• Signal processing: Fourier transforms, discrete-time filters, AR(MA) models

• Information theory: entropy, communication theory, rate-distortion theory

• Statistical signal processing: stochastic models

• Acoustics: psychoacoustics, room acoustics, speech production

• Phonetics

• Algorithms (programming)

Page 6: speech processing basics

ASR: Application

© James Glass, MIT

Page 7: speech processing basics

Recognition

[Figure: block diagram of a recognition system with the blocks Voice Input, Analog to Digital, Acoustic Model, Language Model, Speech Engine, Display, and Feedback]

Page 8: speech processing basics

Automatic Speech Recognition

Page 10: speech processing basics

Speech Generation

• First, the talker formulates a message (in his mind) that he wants to transmit to the listener via speech.

• Message formulation can be thought of as creating printed text that expresses the words of the message.

• The next step is the conversion of the message into a language code.

• This roughly corresponds to converting the printed text of the message into a sequence of phonemes corresponding to the sounds that make up the words, along with the pitch accent associated with those sounds.

Page 11: speech processing basics

• Once the language code is chosen, the talker must execute a series of neuromuscular commands that cause the vocal cords to vibrate when appropriate and shape the vocal tract so that the proper sequence of speech sounds is created and spoken, producing an acoustic signal as the final output.

Page 12: speech processing basics

Speech Recognition

• First, the listener processes the acoustic signal along the basilar membrane in the inner ear, which provides a running spectrum analysis of the incoming signal.

• The resulting neural activity along the auditory nerve is converted into a language code at higher centers of processing within the brain, and message comprehension is achieved.
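
The "running spectrum analysis" above can be illustrated with a short-time Fourier analysis: the signal is cut into overlapping frames and a magnitude spectrum is computed for each one. The sketch below is only an engineering analogy, not a model of the ear. The sampling rate, frame length, hop size, and the two-tone test signal are all assumptions chosen for illustration, and no window function is applied.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Magnitude of the DFT of one frame, for bins 0 .. N/2."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]

def running_spectrum(signal, frame_len=64, hop=32):
    """One magnitude spectrum per overlapping frame (rectangular window)."""
    return [
        dft_magnitudes(signal[start:start + frame_len])
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]

def peak_frequency(magnitudes, fs, frame_len):
    """Frequency in Hz of the strongest bin in one spectrum."""
    peak_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__)
    return peak_bin * fs / frame_len

# Assumed test signal: a 500 Hz tone followed by a 1000 Hz tone, sampled at 8 kHz.
fs = 8000
signal = [math.sin(2 * math.pi * 500 * t / fs) for t in range(128)]
signal += [math.sin(2 * math.pi * 1000 * t / fs) for t in range(128)]

spectra = running_spectrum(signal)
peaks = [peak_frequency(m, fs, 64) for m in spectra]
print(peaks[0], peaks[-1])
```

The first frame peaks at the first tone and the last frame at the second, which is the sense in which the spectrum "runs" along with the signal.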

Page 18: speech processing basics

• The lungs and the associated muscles act as the source of air for exciting the vocal mechanism.

• The muscle force pushes air out of the lungs (shown as a piston pushing up within a cylinder) and through the bronchi and trachea.

• When the vocal cords are tensed, the air flow causes them to vibrate, producing so-called voiced speech sounds.

• When the vocal cords are relaxed, the air flow must pass through a constriction in the vocal tract in order to produce a sound. It thereby becomes turbulent, producing so-called unvoiced speech sounds.

Page 19: speech processing basics

Classifications

• 1. Silence (S): no speech is produced.

• 2. Unvoiced (U): the vocal cords are not vibrating, so the speech signal is aperiodic or random in nature.

• 3. Voiced (V): the vocal cords vibrate periodically as air flows from the lungs, so the speech signal is quasi-periodic.
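
The three classes above can be told apart roughly from two short-time measurements: energy (near zero for silence) and zero-crossing rate (high for noise-like unvoiced sounds, low for periodic voiced sounds). The slides do not name a method, so the sketch below is one common heuristic offered for illustration. The function names, both thresholds, the 8 kHz sampling rate, and the synthetic frames are all assumptions.

```python
import math
import random

def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def classify_frame(frame, energy_floor=1e-4, zcr_threshold=0.25):
    """Label a frame 'S', 'U', or 'V'. Both thresholds are illustrative guesses."""
    if short_time_energy(frame) < energy_floor:
        return "S"
    if zero_crossing_rate(frame) > zcr_threshold:
        return "U"
    return "V"

# Synthetic 30 ms frames at an assumed 8 kHz sampling rate.
fs, n = 8000, 240
rng = random.Random(0)
silence = [0.0] * n
voiced = [0.5 * math.sin(2 * math.pi * 120 * t / fs) for t in range(n)]  # periodic
unvoiced = [rng.gauss(0.0, 0.1) for _ in range(n)]                       # noise-like

labels = [classify_frame(f) for f in (silence, voiced, unvoiced)]
print(labels)
```

On real recordings the thresholds would have to be tuned to the microphone level and background noise, and frames near class boundaries are often ambiguous.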

Page 20: speech processing basics

Speech Waveform Characteristics

• Loudness

• Voiced/Unvoiced.

• Pitch.

– Fundamental frequency.

• Spectral envelope.

– Formants.
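
Pitch is tied to the fundamental frequency of a voiced segment. One common way to estimate it, shown here only as an illustration since the slides do not name a method, is to look for the lag at which the frame's autocorrelation peaks. The function name, the 60 to 400 Hz search range, and the synthetic two-harmonic test frame are assumptions.

```python
import math

def estimate_f0(frame, fs, f0_min=60.0, f0_max=400.0):
    """Estimate the fundamental frequency (Hz) from the autocorrelation peak."""
    lag_min = int(fs / f0_max)
    lag_max = min(int(fs / f0_min), len(frame) - 1)
    best_lag, best_value = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized autocorrelation of the frame at this lag.
        value = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return fs / best_lag

# Assumed test frame: 60 ms of a 100 Hz fundamental plus its second harmonic.
fs = 8000
frame = [
    math.sin(2 * math.pi * 100 * t / fs) + 0.5 * math.sin(2 * math.pi * 200 * t / fs)
    for t in range(480)
]
f0 = estimate_f0(frame, fs)
print(f0)
```

The search range matters: if it is wide enough to include twice the true period, the estimator can lock onto a multiple of the period and report a pitch that is too low.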

Page 21: speech processing basics

Speech Waveform Characteristics Cont.

[Figure: example waveforms of voiced speech (/ih/) and unvoiced speech (/s/)]

Page 24: speech processing basics

Phoneme Hierarchy

Speech sounds

• Vowels: iy, ih, ae, aa, ah, ao, ax, eh, er, ow, uh, uw

• Diphthongs: ay, ey, oy, aw

• Consonants

– Plosive: p, b, t, d, k, g

– Nasal: m, n, ng

– Fricative: f, v, th, dh, s, z, sh, zh, h

– Retroflex liquid: r

– Lateral liquid: l

– Glide: w, y

Phonemes are language dependent. There are about 50 in English.
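
The hierarchy above maps naturally onto a nested dictionary, from which a symbol-to-class lookup can be derived. This is a minimal sketch: the names `PHONEME_HIERARCHY`, `build_lookup`, and `PHONEME_CLASS` are hypothetical, and the symbols are copied from the slide as listed.

```python
# Phoneme classes exactly as listed on the slide.
PHONEME_HIERARCHY = {
    "Vowels": ["iy", "ih", "ae", "aa", "ah", "ao", "ax", "eh", "er", "ow", "uh", "uw"],
    "Diphthongs": ["ay", "ey", "oy", "aw"],
    "Consonants": {
        "Plosive": ["p", "b", "t", "d", "k", "g"],
        "Nasal": ["m", "n", "ng"],
        "Fricative": ["f", "v", "th", "dh", "s", "z", "sh", "zh", "h"],
        "Retroflex liquid": ["r"],
        "Lateral liquid": ["l"],
        "Glide": ["w", "y"],
    },
}

def build_lookup(hierarchy):
    """Map each phoneme symbol to its path in the hierarchy."""
    lookup = {}
    for top, members in hierarchy.items():
        if isinstance(members, dict):
            # Consonants carry a second level (manner of articulation).
            for sub, symbols in members.items():
                for symbol in symbols:
                    lookup[symbol] = (top, sub)
        else:
            for symbol in members:
                lookup[symbol] = (top,)
    return lookup

PHONEME_CLASS = build_lookup(PHONEME_HIERARCHY)
print(PHONEME_CLASS["ng"])
print(PHONEME_CLASS["ay"])
```

A lookup like this is convenient when labelling transcribed speech, for example to group frames by manner of articulation.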

Page 25: speech processing basics

Signal processing

Page 26: speech processing basics

Digital speech processing

Page 28: speech processing basics

• Speech signals are composed of a sequence of sounds.

• The arrangement of these sounds is governed by the rules of language. The study of these rules and their implications in human communication is the domain of linguistics.

• The study and classification of the sounds of speech is called phonetics.
