24
4/25/2001 ECE566 Philip Felber 1 Speech Recognition A report of an Isolated Word experiment. By Philip Felber Illinois Institute of Technology April 25, 2001 Prepared for Dr. Henry Stark ECE 566 Statistical Pattern Recognition

presentation.ppt

Embed Size (px)

DESCRIPTION

presentation.ppt

Citation preview

Page 1: presentation.ppt

4/25/2001 ECE566 Philip Felber 1

Speech Recognition A report of an Isolated Word experiment.

By Philip FelberIllinois Institute of Technology

April 25, 2001Prepared for Dr. Henry StarkECE 566 Statistical Pattern Recognition

Page 2: presentation.ppt

4/25/2001 ECE566 Philip Felber 2

Speech Recognition

Speech recognition and production are components of the larger subject of speech processing.Speech recognition is as old as the hills.Survey of speech recognition in general.Description of a simple isolated word computer experiment programmed in MATLAB.

Page 3: presentation.ppt

4/25/2001 ECE566 Philip Felber 3

Sounds of Spoken Language

Phonetic components (1877): Sweet Voiced, unvoiced and plosive Vowels and consonants

Acoustic wave patterns (1874): Bell Oscilloscope (amplitude vs. time) Spectroscope (power vs. frequency) Spectrogram (power vs. freq. vs. time)

Koenig, Dunn, and Lacey (1946).

Page 4: presentation.ppt

4/25/2001 ECE566 Philip Felber 4

Vocabulary (numbers)with Phonetic Spellings

one W AH N six S IH K S

two T UW seven S EH V AH N

three TH R IY eight EY T

four F AO R nine N AY N

five F AY V zero Z IH R OW

Page 5: presentation.ppt

4/25/2001 ECE566 Philip Felber 5

The Word “SIX”Oscillograph and

Spectrogram

0 500 1000 1500 2000 2500 3000 3500 4000-1

-0.8

-0.6

-0.4

-0.2

0

0.2

0.4

0.6

0.8

1 SIX

Time

Freq

uenc

y

SIX

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.450

500

1000

1500

2000

2500

3000

3500

4000

Page 6: presentation.ppt

4/25/2001 ECE566 Philip Felber 6

Contributions toAutomatic Speech Recognizers

Vocoder (1928): DudleyLinear Predictive Coding (1967): Atal, Schroeder, and HanaeurHidden Markov Models (1985): Rabiner, Juang, Levinson, and SondhiContinuous speech (199x): various using ANN and HMM

Page 7: presentation.ppt

4/25/2001 ECE566 Philip Felber 7

Automatic Speech Recognizers

HAL 9000 from Kubrick’s film 2001: A Space OdysseyCommand / ControlSecurity – Access controlSpeech to textTranslation

Page 8: presentation.ppt

4/25/2001 ECE566 Philip Felber 8

Survey of Speech to Text

IBM VoiceType – ViaVoiceDragon Systems DragonDictateKurzweil VoicePlus

Page 9: presentation.ppt

4/25/2001 ECE566 Philip Felber 9

Speech Waveform Capture

Analog to digital conversionSound cardSampling rateSampling resolutionStandardized in amplitude and time

Page 10: presentation.ppt

4/25/2001 ECE566 Philip Felber 10

Pre-processing

Analog to digital conversion.Speech has an overall spectral tilt of5 to 12 dB per octave.A pre‑emphasis filter is normally used.Normalize or standardize in loudness.Temporal alignment.

Page 11: presentation.ppt

4/25/2001 ECE566 Philip Felber 11

Feature Extraction

Linear predictive coding (LPC)LPC-cepstrum

Page 12: presentation.ppt

4/25/2001 ECE566 Philip Felber 12

The Word “SIX”LPC and LPC-Cepstrum

0 2 4 6 8 10 12 14 16 18 20-0.4

-0.2

0

0.2

0.4

0.6

0.8

1 SIX

0 2 4 6 8 10 12 14 16 18 20-0.2

-0.15

-0.1

-0.05

0

0.05

0.1

0.15

0.2

0.25 SIX

Page 13: presentation.ppt

4/25/2001 ECE566 Philip Felber 13

Response of LPC Filterfor “FOUR” and “SIX”

Frequency (Hz)

0 500 1000 1500 2000 2500 3000 3500 4000-10

0

10

20

Frequency (Hz)

Mag

nitu

de (

dB)

Frequency (Hz)

0 500 1000 1500 2000 2500 3000 3500 4000-20

-10

0

10

20

Frequency (Hz)

Mag

nitu

de (

dB)

Page 14: presentation.ppt

4/25/2001 ECE566 Philip Felber 14

Classification

Simple metric distance to mean (parametric) k-nearest neighbor (non-parametric)

Advanced recognizers Hidden Markov models (HMM) Artificial neural networks (ANN)

Page 15: presentation.ppt

4/25/2001 ECE566 Philip Felber 15

Several small (10 words) vocabularies.Separate training and testing data.Linear predictive coding and cepstrum.A correlation ratio, Euclidian distance, k-nearest neighbor, and Mahalanobis.

An Isolated Word Experiment

Page 16: presentation.ppt

4/25/2001 ECE566 Philip Felber 16

The Apparatus

ComputerWindows NTMATLAB (student or full version)Sound cardLoudspeakers and microphoneAbout a dozen MATLAB programs

Page 17: presentation.ppt

4/25/2001 ECE566 Philip Felber 17

Program Structure

Array ofFeatureVectors

Training

Extracting

Testing

Matching

Extracting

Clasification

Page 18: presentation.ppt

4/25/2001 ECE566 Philip Felber 18

Extractors

Linear predictive coding (LPC) Coefficients of an all pole filter that

represents the formants.

LPC cepstrum Coefficients of the Fourier transform

of the log magnitude of the spectrum.

Page 19: presentation.ppt

4/25/2001 ECE566 Philip Felber 19

Classifiers

A correlation measure Inner-product against feature average.

Euclidean distance Distance to feature average.

k-nearest neighbor (non-parametric) Sorted distance to each feature.

Mahalanobis distance Distance adjusted by covariance.

Page 20: presentation.ppt

4/25/2001 ECE566 Philip Felber 20

The Experiments

Male and female speakers.Several vocabularies.Separate training and testing tapes.Standard “runs” against various algorithm combinations.

Page 21: presentation.ppt

4/25/2001 ECE566 Philip Felber 21

The Results 

Extract

 

Match

Linear Prediction LPC Cepstrum

numbers1-9 & 0

aeiour g b

yes no

numbers 1-9 & 0

aeiour g b

yes no

Correlation metric21(9) features

98.75%(87.5)

68.75%92.5%(48.75)

68.75%

Euclidean distance 21(9) features

98.75%(93.75)

75%92.5%(56.25)

70%

3-nearest neighbors 19(9) features

100%(97.5)

92.5%97.5%(78.75)

95%

Mahalanobis dist.9(9) features

51.25%(51.25)

81.25%61.25%(61.25)

77.5%

Page 22: presentation.ppt

4/25/2001 ECE566 Philip Felber 22

Summary

LPC worked better than LPC-cepstrum.Poor results from Mahalanobis because of insufficient data for estimate of covariance matrix.Laboratory worked better than studio.Good noise canceling microphone helps.

Page 23: presentation.ppt

4/25/2001 ECE566 Philip Felber 23

Where To Get More Information

D. Jurafsky and James H. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Prentice‑Hall, 2000. Search the ‘NET’ for speech recognition.

Page 24: presentation.ppt

4/25/2001 ECE566 Philip Felber 24

Food for Thought