EE679: Speech Processing

A preview

Dept of Electrical Engineering, I.I.T. Bombay


Outline

• Speech production (physiology)

• Classification of sounds: articulatory, acoustic

• Speech analysis (signal processing methods for information extraction)

• Hearing, and speech perception

• Speech technology (speech compression, ASR, TTS)

• Audio/music technology


Speech communication


Acoustic waves

Speed = wavelength x frequency
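A quick numerical check of the relation above, as a minimal sketch. The speed of sound (340 m/s in air) and the function name `wavelength_m` are assumptions chosen for illustration, not values from the slides.

```python
def wavelength_m(frequency_hz, speed_m_per_s=340.0):
    """Wavelength from speed = wavelength x frequency (speed in air is an assumed 340 m/s)."""
    return speed_m_per_s / frequency_hz


# Wavelengths at a few frequencies across the audible range.
for f in (100.0, 1000.0, 10000.0):
    print(f"{f:>7.0f} Hz -> {wavelength_m(f):.3f} m")
```

Under this assumption, audible wavelengths range from metres at low frequencies down to centimetres at high frequencies.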


Information in speech

• Linguistic (phone->word->sentence->message)

• Paralinguistic:

-- speaker-based (pronunciation, age, sex, etc.),

-- expressive (emotions, mood)

The speech signal is characterised by an enormous range of perceptually contrasting sounds!


Generating speech*

Respiration -> phonation -> articulation

Vibrating vocal cords create puffs of air, giving rise to air pressure variations which reach our ears.

*HyperPhysics, Sound and Hearing, Georgia State University


Vocal tract: Acoustic resonances*

$$f_1 = \frac{c}{4L};\quad f_2 = \frac{3c}{4L};\quad f_3 = \frac{5c}{4L};\quad \ldots$$

*HyperPhysics, Sound and Hearing, Georgia State University (http://hyperphysics.phy-astr.gsu.edu/hbase/sound/)
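The resonance formula can be evaluated for a rough vocal tract model. This is a sketch only: it treats the tract as a uniform tube closed at the glottis and open at the lips, and the tube length (17 cm), the speed of sound (340 m/s) and the function name `tube_resonances` are illustrative assumptions.

```python
def tube_resonances(length_m, n_resonances=3, speed_of_sound=340.0):
    """Resonances (Hz) of a uniform tube closed at one end: f_n = (2n - 1) c / (4 L)."""
    return [(2 * n - 1) * speed_of_sound / (4.0 * length_m)
            for n in range(1, n_resonances + 1)]


# Assumed vocal tract length of 0.17 m (a commonly quoted adult value).
for n, f in enumerate(tube_resonances(0.17), start=1):
    print(f"f{n} = {f:.0f} Hz")
```

With these assumed values the first three resonances fall near 500, 1500 and 2500 Hz. A shorter tube moves all of them proportionally higher.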


Speech production (Childers, Speech Overview, 1993)


Articulation: producing the various sounds of speech*

[Figure: vocal tract schematic. Articulators (moving muscles which alter the resonant cavities): tongue, jaw, lips, teeth, velum. Cavities: vocal, pharyngeal, nasal, oral. Legend: static cavity, dynamic cavity. Other labels: vocal cords, trachea connection to lungs, oral sound output, nasal sound output.]

*Securivox tutorial


Von Kempelen's talking machine

1791

"Briefly, the device was operated in the following manner. The right arm rested on the main bellows and expelled air through a vibrating reed to produce voiced sounds." (This is illustrated in the lower half of the figure.) "The fingers of the right hand controlled the air passages for the fricatives /sh/ and /s/, as well as the 'nostril' openings and the reed on-off control. For vowel sounds, all the passages were closed and the reed turned on. Control of vowel resonances was effected with the left hand by suitably deforming the leather resonator at the front of the device. Unvoiced sounds were produced with the reed off, and by a turbulent flow through a suitable passage. In the original work, von Kempelen claimed that approximately 19 consonant sounds could be made passably well." Flanagan, Speech Analysis, Synthesis and Perception, 166-167.


1875

• Alexander Graham Bell invents the method of, and apparatus for, “transmitting vocal or other sounds telegraphically ... by causing electrical undulations, similar in form to the vibrations of the air accompanying the said vocal or other sound”.

=> Major impetus to modern speech processing.

• 1930s: Electrical synthesis of speech by Dudley’s vocoder


Sound -> electrical form*

*The Physics Classroom: http://www.glenbrook.k12.il.us/gbssci/phys/Class/sound/u11l2a.html


Speech “waveform”


Speech Waveforms from “my speech”

(a) start of “y” vowel

(b) “ee” vowel

(c) “s” consonant


Components of sound

A sound is usually composed of several frequency components.

Depending on the relationships between the frequency components, the sound can elicit a sensation of pitch.
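One such relationship is harmonicity: components that are whole-number multiples of a common fundamental. The sketch below is a deliberately crude illustration, not a model of pitch perception. It assumes integer-valued component frequencies, and the function name `common_fundamental_hz` is hypothetical.

```python
from functools import reduce
from math import gcd


def common_fundamental_hz(component_freqs_hz):
    """Largest frequency (Hz) of which every integer-valued component is a whole multiple."""
    return reduce(gcd, component_freqs_hz)


harmonic = [200, 400, 600]             # fundamental present
missing_fundamental = [400, 600, 800]  # still multiples of 200 Hz
inharmonic = [200, 417, 631]           # no useful common fundamental

for name, comps in [("harmonic", harmonic),
                    ("missing fundamental", missing_fundamental),
                    ("inharmonic", inharmonic)]:
    print(f"{name:>20}: {comps} -> {common_fundamental_hz(comps)} Hz")
```

The first two sets share a 200 Hz fundamental even though the second contains no 200 Hz component. The third set reduces to 1 Hz, meaning it has no meaningful harmonic structure.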


Speech production

• Vocal cords (larynx) modulate the airflow from the lungs by rapid opening-closing; the rate of vibration is determined by their mass and tension.

Pitch frequency ranges:

male: 80-160 Hz; female: 160-320 Hz;

singers: over 2 octaves.

• Vocal tract shapes the vocal cord vibrations into the intricate sounds of speech via changes in shape to produce various acoustic resonances.
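The quoted pitch ranges can be converted into pitch periods and octave spans. A minimal sketch, with hypothetical helper names `pitch_period_ms` and `span_octaves`:

```python
from math import log2


def pitch_period_ms(f0_hz):
    """Duration of one vocal cord cycle, in milliseconds."""
    return 1000.0 / f0_hz


def span_octaves(low_hz, high_hz):
    """Width of a frequency range in octaves (one octave = a doubling of frequency)."""
    return log2(high_hz / low_hz)


for label, (low, high) in {"male": (80.0, 160.0), "female": (160.0, 320.0)}.items():
    print(f"{label}: {span_octaves(low, high):.1f} octave, "
          f"periods {pitch_period_ms(high):.2f}-{pitch_period_ms(low):.2f} ms")
```

Each quoted speaking range spans one octave, compared with the two or more octaves noted for singers.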


Vocal tract “filter”*

• The sound spectrum is modified by the shape of the vocal tract.

• The resonant frequencies of the vocal tract cause peaks in the spectrum called formants.

*Childers, Speech Overview


Most important aspects of speech…

• The intelligence in speech is encoded in the power spectrum of the acoustic pressure wave.

• Different articulatory configurations result in signals with different spectra, especially different resonance frequencies called formants, which are perceived as different sounds.

• The different spectra make up the finite alphabet of symbols (linguistic code) governed by a hierarchy of linguistic rules.


Basic sounds of speech: Phones

• The speech signal can be divided into sound segments with fixed articulation and acoustics over short intervals.

i.e. articulatory configuration <=> acoustic properties

Smallest meaningful sound unit: “phone”

(i.e. set of distinctive sounds of a language)

In Indian written scripts, one symbol represents one phone.


Classification of speech sounds

Vowels and Consonants

• Vowels: steady sounds specified by position of the articulators (typically, tongue)

• Consonants: (dynamic) sounds classified by place and manner of articulation


Place of articulation

(constriction of vocal tract)


“Decoding” the speech signal: visible speech

Spectrogram of “my speech”

Dark areas of spectrogram show high intensity

– Voiced segments are much louder than unvoiced

– Horizontal dark bands are the formant peaks

– “s” has high frequency content

– Vertical bands are individual larynx closures

– The “y” of “my” is a diphthong: two successive vowels
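A spectrogram of this kind can be approximated by taking the magnitude spectrum of successive short frames. The sketch below uses only the standard library and a synthetic 1 kHz tone instead of real speech. The Hamming window, frame length, hop size and all function names are illustrative assumptions, not the settings behind the slide's figure.

```python
import cmath
import math


def frame_spectrum(frame):
    """DFT magnitude (bins 0..N/2) of one Hamming-windowed frame."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed)))
            for k in range(n // 2 + 1)]


def spectrogram(signal, frame_len, hop):
    """List of frame spectra; each inner list is one time slice (column) of the picture."""
    return [frame_spectrum(signal[s:s + frame_len])
            for s in range(0, len(signal) - frame_len + 1, hop)]


fs = 8000  # assumed sampling rate in Hz
tone = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(512)]
spec = spectrogram(tone, frame_len=64, hop=32)

# Strongest bin in each time slice, converted to Hz (bin spacing = fs / frame_len).
peak_bins = [max(range(len(col)), key=col.__getitem__) for col in spec]
print([b * fs / 64 for b in peak_bins])
```

For a steady tone every time slice peaks at the same frequency, which would appear as a single horizontal dark band. Short frames like this give fine time resolution at the cost of coarse frequency resolution.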


Machli jal ki hai raani jeevan uska he paani (Hindi: "The fish is the queen of the water; water is its life")


Indian costumes are quite colourful


Speech perception

Distinct stages of physiological processing in the auditory system:

Peripheral auditory system (Ears) -> analysis

Auditory nervous system (Brain) -> synthesis


Audible sound


Sound and Sensation

A sound of given frequency components and sound pressure levels leads to perceived sensations that can be distinguished in terms of:

– loudness <-- intensity

– pitch <-- fundamental frequency

– timbre (“quality” or “colour”) <-- other spectro-temporal properties


Our auditory apparatus

Cochlea: Ear’s microphone

HyperPhysics, Sound and Hearing, Georgia State University (http://hyperphysics.phy-astr.gsu.edu/hbase/sound/soucon.html#soucon)


Basilar Membrane

Location-dependent frequency “resonance”

• Thickness and tension vary along its length

• Traveling wave has maximum vibration amplitude at a location depending on its frequency


Basilar Membrane

Frequency-to-place transformation (Fourier analysis)

HyperPhysics, Sound and Hearing, Georgia State University (http://hyperphysics.phy-astr.gsu.edu/hbase/sound/soucon.html#soucon)


Applications

• Automatic speech recognition/understanding

• Text-to-speech synthesis

• Speaker verification (biometric)

• Digital storage/transmission of speech

• Aids to the handicapped

• Enhancement of quality


Transmission/storage

Waveform coding: distortion vs bit rate

What distortion is “acceptable” depends on the application and on human perception.


Digital audio bit rates: Waveform coding

Format               Sample rate (kHz)   Bits/sample
Telephony            8                   12 (=> 96 kbps)
Wideband audio       16                  16
Hi-fidelity audio    44.1                16
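The bit rate of uncompressed waveform coding is the sample rate times the bits per sample. The sketch below reproduces the 96 kbps telephony figure and works out the other rows the same way. The `channels` parameter and the function name `bit_rate_kbps` are additions for illustration; the table itself does not state a channel count.

```python
# (sample rate in kHz, bits per sample), copied from the table above.
FORMATS = {
    "Telephony": (8.0, 12),
    "Wideband audio": (16.0, 16),
    "Hi-fidelity audio": (44.1, 16),
}


def bit_rate_kbps(sample_rate_khz, bits_per_sample, channels=1):
    """Uncompressed bit rate in kbps; kHz x bits gives kbps directly."""
    return sample_rate_khz * bits_per_sample * channels


for name, (rate_khz, bits) in FORMATS.items():
    print(f"{name:<18} {bit_rate_kbps(rate_khz, bits):7.1f} kbps per channel")
```

Passing `channels=2` for the hi-fidelity row doubles the figure, on the assumption of stereo material.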


Source-filter model parameters

Pitch and vocal tract shape vary slowly in time


Frame-based coding of speech
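Because pitch and vocal tract shape vary slowly, the signal can be cut into short frames and one set of model parameters estimated per frame. A minimal framing sketch follows. The 20 ms frame length, 10 ms hop and the function name `split_into_frames` are assumed typical choices, not values given in the slides.

```python
def split_into_frames(samples, sample_rate_hz, frame_ms=20.0, hop_ms=10.0):
    """Cut a sample sequence into overlapping fixed-length frames (trailing partial frame dropped)."""
    frame_len = int(round(sample_rate_hz * frame_ms / 1000.0))
    hop = int(round(sample_rate_hz * hop_ms / 1000.0))
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]


# One second of (silent) telephony-rate signal, just to count frames.
one_second = [0.0] * 8000
frames = split_into_frames(one_second, 8000)
print(len(frames), "frames of", len(frames[0]), "samples")
```

A coder would then transmit a handful of parameters per frame rather than every sample, which is where the bit-rate saving over waveform coding comes from.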


Automatic speech recognition

• To extract the linguistic code (a structured sequence of discrete symbols) from an analysis of the acoustic speech signal.

• That is, only continuous, noisy measurements of a non-stationary function of time are available.


Automatic speech recognition

• Feature calculation (to a more distinctive domain)

• Pattern classification with respect to previously trained models of phones/words

• Improved transcription based on language model


ASR: block diagram*

*K. Samudravijaya, A Tutorial on Speech and Speaker Recognition


ASR: Challenges

• Inter- and intra-speaker variations

• Effects of coarticulation in continuous speech

• Background noise and variable channels


Categories of speech recognition tasks

Human to machine:

• Database query/information retrieval

• Dictation

Human to human:

• Broadcast news

• Lectures

• Voice mail

• Meeting

• Telephone conversation


Speaker recognition

(voice-based biometric)

• The voice signal is considered relatively easy to acquire/collect.

• Speech enables an (indirect) measurement of physiological features (i.e. characteristics of the speaker’s voice production system).

• Applications:

Commercial (access control, segmentation)

Military, Forensic


Speech Synthesis

What: To convert a text string into a speech waveform

Why: For technology to communicate when a display would be inconvenient.


Basic TTS System

Prosody => A phone is long/short, loud/soft, high/low-pitched




Text / References

• Douglas O'Shaughnessy, Speech Communications: Human and Machine, Universities Press (India) Ltd., 2001

• Rabiner and Schafer, Digital Processing of Speech Signals

• IITB Moodle for all course-related hand-outs


Recognition: “Vowel triangle”
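The idea behind the vowel triangle can be sketched as nearest-template classification in the (F1, F2) plane. The corner-vowel formant values below are approximate textbook averages for adult male speakers, included as assumptions for illustration. The function name `nearest_vowel` is hypothetical, and practical recognisers use richer features and statistical models.

```python
import math

# Approximate (F1, F2) in Hz for the three corner vowels; illustrative values only.
CORNER_VOWELS_HZ = {
    "i": (270.0, 2290.0),  # as in "beet"
    "a": (730.0, 1090.0),  # as in "father"
    "u": (300.0, 870.0),   # as in "boot"
}


def nearest_vowel(f1_hz, f2_hz, templates=CORNER_VOWELS_HZ):
    """Label of the template closest to the measured formants (Euclidean distance in Hz)."""
    return min(templates,
               key=lambda v: math.hypot(f1_hz - templates[v][0],
                                        f2_hz - templates[v][1]))


for f1, f2 in [(300.0, 2200.0), (700.0, 1100.0), (320.0, 900.0)]:
    print((f1, f2), "->", nearest_vowel(f1, f2))
```

Fixed templates like these break down when the speaker's formants differ systematically from the stored ones, for example with a shorter vocal tract.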


Speaker variability: due to differences in vocal physiology