Speech Signal Processing Lecturer: Jonas Samuelsson TAs: Barbara Resch and Jan Plasberg Speech Processing Group (TSB) Dept. Signals, Sensors, and Systems (S3)

Speech Signal Processing

Lecturer: Jonas Samuelsson
TAs: Barbara Resch and Jan Plasberg

Speech Processing Group (TSB)
Dept. Signals, Sensors, and Systems (S3)

Speech Processing

Contributing fields: Signal Processing, Information Theory, Phonetics, Acoustics, Algorithms (Programming), Statistical SP

• Fourier transforms, discrete-time filters, AR(MA) models

• Entropy, communication theory, rate-distortion theory

• Stochastic models

• Psychoacoustics, room acoustics, speech production

Topics, part I

• Analysis of speech signals:
– Fourier analysis; spectrogram
– Autocorrelation; pitch estimation
– Linear prediction; compression, recognition
– Cepstral analysis; pitch estimation, enhancement

Topics, part II

• Speech compression.
– Scalar quantization (PCM, DPCM).
– (Transform coding.)
– Vector quantization.
– State-of-the-art speech coders: CELP, sinusoidal

Topics, part III

• Statistical modeling of speech.
– Gaussian mixtures; speaker identification.
– Hidden Markov models; speech recognition.

Topics, part IV

• Speech enhancement:
– Microphone array processing.
  • Beamforming.
  • Blind signal separation (cocktail party).
– Echo cancellation.
  • The LMS algorithm.
– Noise suppression.
  • Spectral subtraction.
  • The Wiener filter.

Practicalities

• 12 lectures, 12 exercises (48 h altogether).

• 4 compulsory (graded) assignments.

• 1 written exam.

• 4 study points awarded on successful completion.

• 4 pts = 17 h/week.

• “Spoken Language Processing. A guide…” by Huang et al., available at Kårbokhandeln.

• Borrow headphones against 200 SEK deposit.

• More info in syllabus and on http://www.s3.kth.se/speech/courses/2E1400/

Tools for Speech Processing: Prerequisites

• Fourier transform (continuous and discrete time, periodic and aperiodic signals).

• Digital filter theory. Z-transform.

• Random processes. Innovation processes, AR, MA. Filtering of stochastic signals.

• Probability theory. ML and MMSE estimation.

• And more… cf. chapters 3 and 5 in Huang.

Speech Production

[Figure: the speech production apparatus, with the lungs labeled]

Speech Sounds

• Coarse classification with phonemes.

• A phone is the acoustic realization of a phoneme.

• Allophones are context-dependent variants of a phoneme.

Phoneme Hierarchy

Speech sounds

• Vowels: iy, ih, ae, aa, ah, ao, ax, eh, er, ow, uh, uw

• Diphthongs: ay, ey, oy, aw

• Consonants:
– Plosive: p, b, t, d, k, g
– Nasal: m, n, ng
– Fricative: f, v, th, dh, s, z, sh, zh, h
– Retroflex liquid: r
– Lateral liquid: l
– Glide: w, y

Language dependent. About 50 in English.

Speech Waveform Characteristics

• Loudness.

• Voiced/Unvoiced.

• Pitch.
– Fundamental frequency.

• Spectral envelope.
– Formants.

Speech Waveform Characteristics Cont.

[Figure: waveforms of voiced speech (/ih/) and unvoiced speech (/s/)]

Short-Time Speech Analysis

• Segments (or frames, or vectors) are typically 20 ms long.
– Speech characteristics are approximately constant within a segment.
– Allows for relatively simple modeling.

• Often overlapping segments are extracted.
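
The segmentation described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the 8 kHz sampling rate, the 50% overlap, and the helper name `frame_signal` are choices made here, not values given in the slides (only the 20 ms frame length is).

```python
def frame_signal(x, fs, frame_ms=20.0, overlap=0.5):
    """Split the sample list x into overlapping frames.

    fs       : sampling rate in Hz (assumed, not from the slides)
    frame_ms : frame length in milliseconds (20 ms as on the slide)
    overlap  : fraction of each frame shared with the next (assumed 50%)
    """
    n = int(round(fs * frame_ms / 1000.0))           # samples per frame
    hop = max(1, int(round(n * (1.0 - overlap))))    # shift between frame starts
    frames = []
    start = 0
    while start + n <= len(x):                       # drop the incomplete tail
        frames.append(x[start:start + n])
        start += hop
    return frames


fs = 8000                   # assumed sampling rate
x = [0.0] * fs              # one second of placeholder samples
frames = frame_signal(x, fs)
print(len(frames), "frames of", len(frames[0]), "samples")
```

Dropping the incomplete tail is one possible convention; zero-padding the last frame is an equally reasonable alternative.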

[Figure: adjacent bands of width B, with B = 1/N]

The Spectrogram

• A classic analysis tool.
– Consists of DFTs of overlapping, windowed frames.

• Displays the distribution of energy in time and frequency.
– $10 \log_{10} |X_m(f)|^2$ is typically displayed.
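
As a rough illustration of the spectrogram computation, the sketch below takes the DFT of each overlapping, windowed frame m and stores the log power 10 log10 |X_m(f)|^2. The Hamming window, the frame length and hop, the small floor added before the logarithm, and all function names are assumptions made for this example; the slides do not specify them. A direct DFT is used to keep the example dependency-free; in practice an FFT routine would be used.

```python
import cmath
import math


def hamming(n):
    """Hamming window of length n (the window type is an assumption)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def dft(frame):
    """Direct DFT, returning the non-negative frequency bins 0..n/2."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n // 2 + 1)]


def spectrogram_db(x, frame_len, hop, floor=1e-12):
    """One row per frame m, holding 10*log10(|X_m(f)|^2) for each bin."""
    w = hamming(frame_len)
    rows = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = [x[start + i] * w[i] for i in range(frame_len)]
        # floor avoids log10(0) for silent frames
        rows.append([10.0 * math.log10(abs(X) ** 2 + floor) for X in dft(frame)])
    return rows


fs = 8000                                   # assumed sampling rate
x = [math.sin(2.0 * math.pi * 1000.0 * t / fs) for t in range(800)]  # 1 kHz test tone
S = spectrogram_db(x, frame_len=160, hop=80)
peak_bin = max(range(len(S[0])), key=lambda k: S[0][k])
print("peak frequency of first frame:", peak_bin * fs / 160, "Hz")
```

For the test tone, the strongest bin of each frame should sit at the tone frequency; plotting the rows of `S` against time gives the usual spectrogram image.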

The Spectrogram Cont.

Short-time ACF

[Figure: short-time ACF and |DFT| for frames of /m/, /ow/, and /s/]
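
A minimal sketch of how a short-time autocorrelation function (ACF) can be used for pitch estimation, one of the uses listed under "Topics, part I". The 50 to 400 Hz search range, the 8 kHz sampling rate, the 40 ms frame (chosen here so that several pitch periods fit in it), and the function names are assumptions for illustration only. For a voiced frame the ACF peaks at multiples of the pitch period, so the lag of the largest peak inside the search range gives a pitch estimate.

```python
import math


def short_time_acf(frame, max_lag):
    """Biased short-time ACF: r[k] = sum over t of frame[t] * frame[t + k]."""
    n = len(frame)
    return [sum(frame[t] * frame[t + k] for t in range(n - k))
            for k in range(max_lag + 1)]


def estimate_pitch(frame, fs, f_min=50.0, f_max=400.0):
    """Pick the ACF peak between the lags for f_max and f_min (assumed range)."""
    lag_min = int(fs / f_max)
    lag_max = int(fs / f_min)            # the frame must be longer than this
    r = short_time_acf(frame, lag_max)
    best_lag = max(range(lag_min, lag_max + 1), key=lambda k: r[k])
    return fs / best_lag


fs = 8000                                # assumed sampling rate
# Synthetic "voiced" frame: 100 Hz fundamental plus one harmonic, 40 ms long.
frame = [math.sin(2.0 * math.pi * 100.0 * t / fs)
         + 0.5 * math.sin(2.0 * math.pi * 200.0 * t / fs)
         for t in range(320)]
f0 = estimate_pitch(frame, fs)
print("estimated pitch:", f0, "Hz")
```

For an unvoiced frame such as /s/ the ACF has no pronounced peak away from lag zero, so a practical estimator would also need a voiced/unvoiced decision, which this sketch omits.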