8/14/2019 Ann Speech Recog
Artificial Neural Network for Speech Recognition
Austin Marshall, March 3, 2005, 2nd Annual Student Research Showcase
Overview
- Presenting an Artificial Neural Network to recognize and classify speech
  - Spoken digits: one, two, three, etc.
- Choosing a speech representation scheme
- Training
- Perceptron
- Results
Representing Speech: Problem
- Recorded samples never produce identical waveforms: length, amplitude, background noise, and sample rate all vary
- However, perceptual information relative to speech remains consistent
Solution
- Extract speech-related information
- See: Spectrogram
Representing Speech: "one"
[Figure: waveform and spectrogram of the spoken digit "one"]
Spectrogram
- Shows change in amplitude spectra over time
- Three dimensions:
  - X axis: time
  - Y axis: frequency
  - Z axis: color intensity represents magnitude
[Figure: spectrogram of the spoken digit "one"]
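The spectrogram described above is a short-time Fourier transform rendered as an image. A minimal NumPy sketch (the sample rate, frame length, hop size, and test tone are illustrative assumptions; the slides do not specify them):

```python
import numpy as np

fs = 8000                                  # sample rate (Hz), assumed
t = np.arange(0, 1.0, 1 / fs)
signal = np.sin(2 * np.pi * 440 * t)       # stand-in for a recorded digit

frame_len, hop = 256, 128                  # assumed analysis parameters
frames = [signal[i:i + frame_len] * np.hanning(frame_len)
          for i in range(0, len(signal) - frame_len, hop)]

# Rows = frequency (Y axis), columns = time (X axis),
# values = magnitude (rendered as color intensity on the Z axis)
spec = np.abs(np.fft.rfft(frames, axis=1)).T
freqs = np.fft.rfftfreq(frame_len, d=1 / fs)
```

Each column of `spec` is one frame's magnitude spectrum, so a pure 440 Hz tone shows up as a horizontal band near that frequency.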
Mel Frequency Cepstrum Coefficients
- A spectrogram provides a good visual representation of speech but still varies significantly between samples
- Cepstral analysis is a popular method for feature extraction in speech recognition applications, and can be accomplished using Mel Frequency Cepstrum Coefficient (MFCC) analysis
Mel Frequency Cepstrum Coefficients
- The inverse Fourier transform of the log of the Fourier transform of a signal, using the Mel scale filterbank
- The mfcc function returns vectors of 13 dimensions
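The mfcc function referenced in these slides is MATLAB code from Slaney's Auditory Toolbox. As an illustration only, here is a simplified Python sketch of the pipeline the slide describes: power spectrum, mel-spaced triangular filterbank, log, and a DCT playing the role of the inverse Fourier transform of the log spectrum. The filter shapes and constants here are common textbook choices, not the toolbox's exact values:

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters spaced evenly on the mel scale (standard construction)."""
    mel = lambda f: 2595 * np.log10(1 + f / 700)
    inv_mel = lambda m: 700 * (10 ** (m / 2595) - 1)
    pts = inv_mel(np.linspace(0, mel(fs / 2), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
        fb[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge
    return fb

def mfcc_frame(frame, fs, n_coeffs=13, n_filters=26):
    """Fourier transform -> mel filterbank -> log -> DCT (the cepstral step)."""
    power = np.abs(np.fft.rfft(frame)) ** 2
    energies = mel_filterbank(n_filters, len(frame), fs) @ power
    log_e = np.log(energies + 1e-10)
    n = np.arange(n_filters)
    # DCT-II of the log filterbank energies; keep the first 13 coefficients
    dct = np.cos(np.pi * np.outer(np.arange(n_coeffs), 2 * n + 1) / (2 * n_filters))
    return dct @ log_e
```

Applied to one 256-sample frame, this returns the 13-dimensional coefficient vector the slide mentions.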
Network Architecture
- Input layer: 26 cepstral coefficients
- Hidden layer: 100 fully connected hidden-layer units
  - Weights range between -1 and +1
  - Initially random; remain constant
- Output: 1 output unit for each target, limited to values between 0 and +1
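Initializing this architecture can be sketched in Python/NumPy (the uniform draw is an assumption; the slides state only the ranges and that hidden weights stay fixed):

```python
import numpy as np

rng = np.random.default_rng(0)
n_input, n_hidden = 26, 100   # 26 cepstral coefficients, 100 hidden units

# Hidden-layer weights: random in [-1, +1] and, per the slides, never updated
W = rng.uniform(-1, 1, size=(n_hidden, n_input))

# Output weights for one target unit: these are what training adjusts
v = rng.uniform(-1, 1, size=n_hidden)
```

One such `v` vector exists per target digit; only the output side of the network learns.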
Sample Training Stimuli (Spectrograms)
[Figure: spectrograms of the spoken digits "one", "two", and "three"]
Training the Network
- Spoken digits were recorded: seven samples of each digit, one through eight
- Total of 56 recordings with varying lengths and environmental conditions
- Background noise was removed from each sample
Training the Network
- Calculate MFCC using Malcolm Slaney's Auditory Toolbox:
  c = mfcc(s, fs, fix((3*fs)/(length(s)-256)))
  This limits the frame rate so that mfcc always produces a matrix of two vectors, corresponding to the coefficients of the two halves of the sample
- Convert the 13x2 matrix to a 26-dimensional column vector:
  c = c(:)
Training the Network
- Supervised learning: choose the intended target and create a target vector
- The target vector is 56-dimensional
- If training the network to recognize a spoken one, the target has a value of +1 for each of the known one stimuli and 0 for everything else
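Building that 56-dimensional target vector can be sketched as follows (the sample ordering, seven consecutive samples per digit, is an assumption about the data layout):

```python
import numpy as np

# 56 recordings: seven samples of each digit, one through eight (assumed order)
labels = np.repeat(np.arange(1, 9), 7)

def target_vector(digit):
    """+1 for every stimulus of the intended digit, 0 for everything else."""
    return (labels == digit).astype(float)

t_one = target_vector(1)   # +1 for the seven "one" samples, 0 elsewhere
```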
Training the Network
- Train a multilayer perceptron with feature vectors (simplified):
  - Select a stimulus at random
  - Calculate the response to the stimulus
  - Calculate the error
  - Update the weights
  - Repeat
- In a finite amount of time, the perceptron will successfully learn to distinguish between stimuli of the intended target and all others
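The simplified loop above can be sketched end to end in Python. Random vectors stand in for the real MFCC feature vectors, and the data layout is assumed; the learning rate (+1) and bias (-1) follow the values reported later in the Results slide:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(0)
n_samples, n_input, n_hidden = 56, 26, 100
stimuli = rng.uniform(-1, 1, (n_samples, n_input))   # stand-ins for MFCC vectors
targets = (np.repeat(np.arange(1, 9), 7) == 1).astype(float)  # "one" vs. the rest

W = rng.uniform(-1, 1, (n_hidden, n_input))   # fixed hidden-layer weights
v = rng.uniform(-1, 1, n_hidden)              # trained output weights
gamma, bias = 1.0, -1.0                       # learning rate +1, bias -1

for _ in range(3000):
    i = rng.integers(n_samples)         # select a stimulus at random
    h = sigmoid(W @ stimuli[i] + bias)  # calculate the hidden-layer response
    o = sigmoid(v @ h + bias)           # calculate the output response
    v += gamma * (targets[i] - o) * h   # update weights from the error (t - o)
```

After training, responses to the target's stimuli sit above responses to everything else.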
Training the Network
- Calculate the response to a stimulus:
  - Hidden layer: h = sigmoid(W*s + bias)
  - Response: o = sigmoid(v*h + bias)
- The sigmoid transfer function maps values between 0 and +1:
  sigmoid(x) = 1 / (1 + e^-x)
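The sigmoid transfer function from the slide, as a small Python sketch:

```python
import numpy as np

def sigmoid(x):
    """sigmoid(x) = 1 / (1 + e^-x): squashes any real value into (0, +1)."""
    return 1 / (1 + np.exp(-x))

# Large negative inputs approach 0, large positive inputs approach +1
sigmoid(0.0)   # exactly 0.5
```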
Training the Network
- Calculate the error: for a given stimulus, the error is the difference between target and response, t - o
  - t will be either 0 or +1
  - o will be between 0 and +1
Training the Network
- Update the weights: v = v_previous + γ(t - o)h^T
  - v is the weight vector between the hidden-layer units and the output
  - γ (gamma) is the learning rate
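The update rule translates directly into code (a sketch; γ appears as the argument gamma, and the example numbers are made up for illustration):

```python
import numpy as np

def update_output_weights(v, h, t, o, gamma=1.0):
    """Slide's rule: v = v_previous + gamma * (t - o) * h^T."""
    return v + gamma * (t - o) * h

# Example: target 1, response 0.25 -> each weight moves toward its hidden activation
v_new = update_output_weights(np.zeros(3), np.ones(3), t=1.0, o=0.25)
```

When the response overshoots the target, (t - o) is negative and the same rule pushes the weights back down.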
Results
- Learning rate: +1
- Bias: -1
- 100 hidden-layer units
- 3000 iterations
- 316 seconds to learn the target
- Target = one
Results
- Response to unseen stimuli: stimuli produced by the same voice used to train the network, with noise removed
- The network was tested against eight unseen stimuli corresponding to the eight spoken digits
- Returned 1 (full activation) for one and 0 for all other stimuli
- Results were consistent across targets, i.e. when trained to recognize two, three, etc.: sigmoid(v*sigmoid(W*t1 + bias) + bias) == 1
Results
- Response to a noisy sample: the network returned a low but positive (> 0) response to a sample without noise removed
- Response to a foreign speaker: the network responded with mixed results when presented with samples from speakers not in the training stimuli
- In all cases, the error rate decreased and accuracy improved with more learning iterations
References
- Jurafsky, Daniel and Martin, James H. (2000). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition (1st ed.). Prentice Hall.
- Golden, Richard M. (1996). Mathematical Methods for Neural Network Analysis and Design (1st ed.). MIT Press.
- Anderson, James A. (1995). An Introduction to Neural Networks (1st ed.). MIT Press.
- Hosom, John-Paul, Cole, Ron, Fanty, Mark, Schalkwyk, Johan, Yan, Yonghong, and Wei, Wei (1999, February 2). Training Neural Networks for Speech Recognition. Center for Spoken Language Understanding, Oregon Graduate Institute of Science and Technology. http://speech.bme.ogi.edu/tutordemos/nnet_training/tutorial.html
- Slaney, Malcolm. Auditory Toolbox. Interval Research Corporation.