benjamin-bengfort
DESCRIPTION
A presentation of Hermansky & Morgan's 1994 paper, RASTA Processing of Speech. Learn the dramatic effect of RASTA on critical band analysis when combined with PLP to do speech detection! Hermansky, Hynek, and Nelson Morgan. "RASTA processing of speech." Speech and Audio Processing, IEEE Transactions on 2.4 (1994): 578-589.
RASTA Processing of Speech
Hynek Hermansky & Nelson Morgan
The Question
Stochastic techniques for deriving information from sound seem wasteful, especially since non-speech components have a predictable effect on the speech signal.
Can we suppress spectral components that change too quickly or slowly to be speech?
The Answer
RASTA, much like human listeners, isolates not the speech components themselves but the relative spectral changes, in order to reduce slowly changing or steady-state factors (noise!). This emphasizes changes, or “edges”.
Quick disclaimer: we definitely know what we’re talking about
Edge Detection
Inspiration
Humans perceive speech-like sounds based on the spectral difference between the current sound and the preceding sound.
Sounds!
An analogous situation might occur in time-reversed speech:
Intelligibility of Time Reversed Speech
More Sounds!
Filters
More Sounds!
What band pass filters sound like from Chris’ experiments.
Speech Processing Review: http://www.learnartificialneuralnetworks.com/images/srfig01.jpg
Perceptual Linear Prediction: http://svr-www.eng.cam.ac.uk/~ajr/SA95/img181.gif
Replace the conventional critical-band short-term spectrum in PLP analysis with a spectral estimate in which each frequency channel is band-pass filtered by a filter with a sharp spectral zero at zero frequency.
New estimate is less sensitive to variations.
The RASTA Method
1. Compute the critical-band power spectrum (PLP)
2. Transform the spectral amplitude through a compressing static nonlinear transformation (RASTA)
3. Filter the time trajectory of each transformed spectral component (RASTA)
4. Transform the filtered speech representation through an expanding static nonlinear transformation (RASTA)
5. Multiply by the equal-loudness curve and raise to the power 0.33 to simulate hearing (PLP)
6. Compute an all-pole model of the result (PLP)
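Steps 2 to 4 can be sketched in a few lines for the logarithmic flavor of RASTA. This is a minimal illustration, not the authors' implementation: the function names `rasta_filter` and `log_rasta`, the `floor` guard, and the list-of-lists input layout are assumptions made here, and the filter is written in causal form using the coefficients of the transfer function reported in the cited paper (the four-frame advance is dropped).

```python
import math

def rasta_filter(trajectory, pole=0.98):
    """Band-pass filter one band's time trajectory (causal form, zero initial state)."""
    # Numerator taps of 0.1 * (2 + z^-1 - z^-3 - 2 z^-4); they sum to zero,
    # so a constant input is rejected once the start-up transient has decayed.
    taps = (0.2, 0.1, 0.0, -0.1, -0.2)
    filtered = []
    previous_output = 0.0
    for n in range(len(trajectory)):
        fir = sum(b * trajectory[n - k] for k, b in enumerate(taps) if n - k >= 0)
        previous_output = fir + pole * previous_output  # single-pole recursion
        filtered.append(previous_output)
    return filtered

def log_rasta(band_powers, floor=1e-10):
    """band_powers[t][b] is the critical-band power at frame t, band b.

    Returns a list of the same shape holding the RASTA-filtered spectrum.
    """
    n_frames = len(band_powers)
    n_bands = len(band_powers[0])
    out = [[0.0] * n_bands for _ in range(n_frames)]
    for b in range(n_bands):
        # Step 2: compressing static nonlinearity (natural log, floored to avoid log(0)).
        compressed = [math.log(max(frame[b], floor)) for frame in band_powers]
        # Step 3: filter the time trajectory of this band.
        filtered = rasta_filter(compressed)
        # Step 4: expanding static nonlinearity (antilog).
        for t in range(n_frames):
            out[t][b] = math.exp(filtered[t])
    return out
```

Because a fixed channel gain becomes an additive constant after the logarithm, feeding the same band trajectory with and without a constant gain should give nearly identical outputs once the slow start-up transient has died away.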
RASTA-PLP
The key: suppress constant factors in the auditory-like spectrum prior to estimation of the all-pole model.
Research issues:
- What domain is filtering in?
- What filter to use?
Speech Signal
Spectral Analysis
Bank of Compressing Static Nonlinearities
Bank of Linear Bandpass Filters
Bank of Expanding Static Nonlinearities
Continued Processing
For this paper: an IIR filter with this transfer function
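For reference, the RASTA filter reported in the cited paper (Hermansky and Morgan, 1994) has the following transfer function. The coefficients are quoted from that paper rather than recovered from the slide.

```latex
H(z) = 0.1\, z^{4}\, \frac{2 + z^{-1} - z^{-3} - 2z^{-4}}{1 - 0.98\, z^{-1}}
```

The numerator vanishes at z = 1, which places the sharp spectral zero at zero modulation frequency, and the single pole at 0.98 controls how slowly the filter forgets earlier frames.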
Resulting Filter
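The band-pass shape of the filter can be checked numerically by evaluating its transfer function on the unit circle. The sketch below assumes the coefficients reported in the cited paper and a frame rate of 100 Hz (one frame every 10 ms); the frame rate and the function name `rasta_response` are assumptions of this example.

```python
import cmath
import math

FRAME_RATE_HZ = 100.0  # assumption: one analysis frame every 10 ms

def rasta_response(mod_freq_hz, pole=0.98, frame_rate_hz=FRAME_RATE_HZ):
    """Magnitude of the RASTA filter at a given modulation frequency.

    Evaluates 0.1 * (2 + z^-1 - z^-3 - 2 z^-4) / (1 - pole * z^-1) at
    z = exp(j * w). The z^4 advance in the published form only shifts the
    output in time, so it does not affect the magnitude and is omitted.
    """
    w = 2.0 * math.pi * mod_freq_hz / frame_rate_hz
    z_inv = cmath.exp(-1j * w)
    numer = 0.1 * (2.0 + z_inv - z_inv**3 - 2.0 * z_inv**4)
    denom = 1.0 - pole * z_inv
    return abs(numer / denom)

if __name__ == "__main__":
    for f in (0.0, 0.05, 1.0, 5.0, 10.0, 40.0, 50.0):
        print(f"{f:6.2f} Hz  |H| = {rasta_response(f):.4f}")
```

Under these assumptions the response is zero at 0 Hz and at half the frame rate, and largest for modulation frequencies of a few hertz, which is the range where speech changes most.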
- Affects choice of compressing/expanding static nonlinear function (The domain):
1. Logarithmic
2. Lin-Log
Two Flavors of RASTA
Logarithmic amplitude transformation (step 2)
Antilogarithmic (exponential) transformation (step 4)
Lin-log: a natural logarithm that depends on J, a signal-dependent positive constant. The transform is linear-like for J ≪ 1 and logarithmic-like for J ≫ 1.
J=0.1
J=1.0
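The lin-log transform can be sketched as y = ln(1 + J·x), which is the form given in the cited paper. The function names below are chosen for this example. The approximate inverse e^y / J is included because, as far as the paper describes it, the exact inverse can go negative after filtering; treat that detail as recalled from the paper rather than from the slides.

```python
import math

def lin_log(x, j):
    """Compressing nonlinearity y = ln(1 + j * x), for x >= 0 and j > 0."""
    return math.log1p(j * x)

def lin_log_exact_inverse(y, j):
    """Exact inverse x = (e^y - 1) / j; negative whenever y < 0."""
    return math.expm1(y) / j

def lin_log_approx_inverse(y, j):
    """Approximate inverse x = e^y / j; always positive."""
    return math.exp(y) / j

if __name__ == "__main__":
    for j in (0.1, 1.0):
        for x in (0.01, 1.0, 100.0):
            print(f"J={j:<4} x={x:<7} y={lin_log(x, j):.5f}")
```

When J·x is small the output is close to J·x (linear-like), and when J·x is large it is close to ln(J·x) (logarithmic-like), which is why one constant moves the whole analysis between the two domains.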
Results
Digits recorded over phone lines, with or without noise or changes in noise over time
Isolated Digits Recognition
Large Vocab Continuous Speech
Four speakers each reading 2,652 sentences
Sentences were preserved as recorded or had a low-pass filter applied to them
Next Experiments
● Let’s train the model with no noise and then test it in a situation with noise in the background
● Analogous to software assembled in the factory and used in the real world
● RASTA > PLP when noise changes between training and test
● Success of RASTA depends on transform of signal
Isolated Digits Recognition
Large Vocab Continuous Speech
● Again, success depends on filter used
Optimizing J
● It seems important, then, to pick an appropriate J (the domain parameter) for each level of noise
● This can be approximated by measuring energy at the first part of an utterance
● Performance improves even more!
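One way to pick J from the start of an utterance is sketched below. It assumes the first few frames contain only background noise and that J should shrink as the noise gets stronger. The inverse-proportional mapping, the `scale` constant, the number of leading frames, and the function name `estimate_j` are all hypothetical choices for illustration, not values taken from the paper.

```python
def estimate_j(frame_energies, n_leading_frames=10, scale=1.0, min_noise=1e-12):
    """Estimate the lin-log constant J from the leading frames of an utterance.

    frame_energies: per-frame signal energy (non-negative floats).
    The leading frames are assumed to be speech-free, so their mean energy
    serves as a rough noise-power estimate.
    """
    leading = frame_energies[:n_leading_frames]
    if not leading:
        raise ValueError("need at least one frame to estimate the noise level")
    noise_power = max(sum(leading) / len(leading), min_noise)
    # Hypothetical mapping: J inversely proportional to the noise power.
    return 1.0 / (scale * noise_power)
```

With this mapping a noisier recording gets a smaller J, which pushes the analysis toward the linear-like end of the lin-log transform for low-energy components.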
Consequences of RASTA Processing
● Most important advance of RASTA: compare current information to previous information
● This highlights transitions and changes → edge detection!