Spectral envelope analysis of TIMIT corpus using LP, WLSP, and MVDR
Steve Vest
Matlab implementation of methods by Tien-Hsiang Lo
Overview
• Methods
  • WLSP
  • MVDR
• TIMIT corpus
• Measurements
Analysis methods
• LP
  • Linear Prediction using autocorrelation method
• WLSP
  • Weighted-sum Line Spectrum Pairs
• MVDR
  • Minimum Variance Distortionless Response
• MVDR of WLSP
  • MVDR applied to WLSP coefficients
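As a rough illustration of the LP entry, the autocorrelation method can be sketched in plain Python (this is not the Matlab code used for this work). The function names are invented for the sketch, and it assumes the convention A(z) = 1 + a1 z^-1 + ... + aN z^-N with a real-valued signal. The normal equations are solved with the standard Levinson-Durbin recursion.

```python
import math

def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of a real sequence x."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LP normal equations; returns (a, err) with a[0] == 1."""
    a = [1.0] + [0.0] * order
    err = r[0]                        # prediction error power at order 0
    for m in range(1, order + 1):
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err                # reflection coefficient
        prev = a[:]
        for i in range(1, m):
            a[i] = prev[i] + k * prev[m - i]
        a[m] = k
        err *= 1.0 - k * k
    return a, err

def lp_envelope_db(a, err, omega):
    """LP envelope 10*log10(err / |A(e^{j*omega})|^2), omega in rad/sample."""
    re = sum(c * math.cos(omega * i) for i, c in enumerate(a))
    im = -sum(c * math.sin(omega * i) for i, c in enumerate(a))
    return 10.0 * math.log10(err / (re * re + im * im))
```

For a frame `x`, a call such as `levinson_durbin(autocorrelation(x, 22), 22)` would give an order-22 model, the order quoted later in these slides.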
WLSP
• Purpose: Increase spectral dynamics between peaks and valleys in spectral envelope
  • Maximizes difference between peak and valley amplitudes
  • Uses autocorrelation values beyond N to obtain better accuracy
• When applied to Speech coding
  • Improves quality of decoded speech
  • Attenuates quantization noise level in the valleys
WLSP Algorithm
1. Apply Hamming window to signal
2. Calculate N-1 order LP coefficients
3. Using LP coefficients calculate LSP polynomials
$$p = \hat{a} + \hat{a}^{R}$$
$$q = \hat{a} - \hat{a}^{R}$$

where p and q are the symmetric and antisymmetric LSP polynomials, $\hat{a}$ is the zero-extended vector of LP coefficients, and $\hat{a}^{R}$ is the reversal of $\hat{a}$.
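Steps 1 and 3 can be sketched in plain Python as follows. The function names are invented for this sketch, the window is the common symmetric Hamming form (an assumption about which variant was used), and the LP coefficients are assumed to be given as a list starting with 1.

```python
import math

def hamming(n):
    """Symmetric Hamming window of length n."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1))
            for i in range(n)]

def lsp_polynomials(a):
    """Return (p, q) for LP coefficients a = [1, a1, ..., a_{N-1}]."""
    a_ext = list(a) + [0.0]      # zero-extended vector
    a_rev = a_ext[::-1]          # its reversal
    p = [u + v for u, v in zip(a_ext, a_rev)]   # symmetric polynomial
    q = [u - v for u, v in zip(a_ext, a_rev)]   # antisymmetric polynomial
    return p, q
```

With `a = [1.0, -0.5]`, the extended vector is `[1.0, -0.5, 0.0]`, so `p` comes out palindromic and `q` changes sign under reversal, matching the symmetric and antisymmetric labels.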
WLSP Algorithm
4. Calculate WLSP polynomial

$$d = \lambda p + (1 - \lambda) q$$

with the weight λ between 0 and 1.

5. λ is the weighting parameter chosen to minimize the error between the autocorrelations of the speech and of the WLSP all-pole filter impulse response
  • Autocorrelations match for n = 1:N
  • Minimize SSE for n = N+1:N+1+L
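One way to picture the choice of λ is a brute-force grid search, sketched below in plain Python. This is an assumption-laden illustration, not the search method used in this work. The names, the grid size, the truncation of the impulse response to `ir_len` samples, and the normalization of both autocorrelations by their lag-0 value are all choices made for the sketch. The Matlab-style range n = N+1:N+1+L is read here as zero-based lags N to N+L.

```python
def levinson_durbin(r, order):
    """LP coefficients [1, a1, ...] and error power from autocorrelation r."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err
        prev = a[:]
        for i in range(1, m):
            a[i] = prev[i] + k * prev[m - i]
        a[m] = k
        err *= 1.0 - k * k
    return a, err

def impulse_response(d, length):
    """First `length` samples of the all-pole filter 1/D(z)."""
    d = [c / d[0] for c in d]
    h = []
    for n in range(length):
        acc = 1.0 if n == 0 else 0.0
        acc -= sum(d[i] * h[n - i] for i in range(1, min(n, len(d) - 1) + 1))
        h.append(acc)
    return h

def wlsp(r, n_order, extra_lags, grid=99, ir_len=512):
    """Grid-search lambda; r must hold lags 0..n_order+extra_lags."""
    a, _ = levinson_durbin(r, n_order - 1)      # order N-1 LP model
    a_ext = a + [0.0]
    a_rev = a_ext[::-1]
    p = [u + v for u, v in zip(a_ext, a_rev)]
    q = [u - v for u, v in zip(a_ext, a_rev)]
    top = n_order + extra_lags
    best = None
    for step in range(1, grid + 1):
        lam = step / (grid + 1)                 # strictly inside (0, 1)
        d = [lam * u + (1.0 - lam) * v for u, v in zip(p, q)]
        h = impulse_response(d, ir_len)
        rh = [sum(h[i] * h[i + lag] for i in range(ir_len - lag))
              for lag in range(top + 1)]
        # squared error on the lags beyond those the model already matches
        sse = sum((rh[k] / rh[0] - r[k] / r[0]) ** 2
                  for k in range(n_order, top + 1))
        if best is None or sse < best[0]:
            best = (sse, lam, d)
    return best[1], best[2]
```

Setting λ = 0.5 gives d equal to the zero-extended LP polynomial, since p + q is twice that vector, so the plain LP model is always one of the candidates the search considers.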
WLSP vs. LP
MVDR
• Estimates the power at each frequency by applying a special FIR filter
• Distortionless constraint
  • FIR filter minimizes the total output power while preserving unity gain at the frequency being estimated
  • Solving for the distortionless filter is a constrained optimization problem
• More robust modeling method than LP, but can be computed from the LP coefficients
MVDR Algorithm
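1. Calculate LP coefficients $a_k$
2. Calculate MVDR coefficients $\mu_k$

$$\mu_k = \begin{cases} \dfrac{1}{P_e} \displaystyle\sum_{i=0}^{N-k} (N + 1 - k - 2i)\, a_i\, a_{i+k}^{*}, & \text{for } k = 0:N \\[2ex] \mu_{-k}^{*}, & \text{for } k = -N:-1 \end{cases}$$

where $P_e$ is the LP prediction error power.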
Note that MVDR coefficients are symmetric and have order 2N+1
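The two steps can be sketched in plain Python as below. The sketch assumes real-valued LP coefficients, so the conjugates drop out and the coefficients for negative k mirror those for positive k. The function names are invented here, and the spectrum is evaluated as the reciprocal of the sum of μ_k e^{-jωk} over k = -N..N.

```python
import math

def mvdr_coefficients(a, pe):
    """mu[k] for k = 0..N from real LP coefficients a = [1, a1, ..., aN]."""
    n = len(a) - 1
    return [sum((n + 1 - k - 2 * i) * a[i] * a[i + k]
                for i in range(n - k + 1)) / pe
            for k in range(n + 1)]

def mvdr_spectrum(mu, omega):
    """MVDR power estimate at omega (rad/sample), using mu[-k] == mu[k]."""
    denom = mu[0] + 2.0 * sum(m * math.cos(omega * k)
                              for k, m in enumerate(mu) if k > 0)
    return 1.0 / denom
```

For N = 1 this reduces to P_e / (2 + 2 a1 cos ω), which agrees with evaluating 1 / (v^H R^{-1} v) directly for a 2x2 autocorrelation matrix.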
MVDR vs. LP
MVDR of WLSP
• Just an exercise out of curiosity
  • Performs WLSP
  • Performs MVDR using coefficients from WLSP instead of LP
• Resulting conclusion
  • It’s a bad idea…
MVDR of WLSP vs. MVDR
TIMIT corpus
• “The TIMIT corpus of read speech has been designed to provide speech data for the acquisition of acoustic-phonetic knowledge and for the development and evaluation of automatic speech recognition systems.”
• Large collection of speech samples from 8 regions of the USA
• Samples are phonetically labeled
TIMIT regions
• Region 1: New England
• Region 2: Northern
• Region 3: North Midland
• Region 4: South Midland
• Region 5: Southern
• Region 6: New York City
• Region 7: Western
• Region 8: Army Brat (moved around)
Analyzed Vowels
• iy beet
• ih bit
• eh bet
• ey bait
• ae bat
• aa bott
• aw bout
• ay bite
• ah but
• ao bought
• oy boy
• ow boat
• uh book
• uw boot
• ux toot
• er bird
• ax about
• ix debit
• axr butter
• ax-h suspect
Collected Data
• First three formants
  • Frequency [Hz]
  • Amplitude [dB]
• Valleys after formants
  • Frequency [Hz]
  • Delta [dB]
    • Difference between formant amplitude and valley amplitude
• Collected from entire training data set in TIMIT corpus
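A minimal sketch of how such measurements could be read off a sampled envelope is given below in plain Python. It is a simplification under stated assumptions: every local maximum is treated as a formant and the valley is taken as the first local minimum after it, which may not match the picking rules actually used. The function and field names are hypothetical.

```python
def formants_and_valleys(freqs_hz, env_db, n_formants=3):
    """Peak and following-valley data for the first n_formants local maxima."""
    peaks = [i for i in range(1, len(env_db) - 1)
             if env_db[i - 1] < env_db[i] >= env_db[i + 1]]
    results = []
    for i in peaks[:n_formants]:
        j = i + 1
        # walk downhill until the envelope stops falling
        while j + 1 < len(env_db) and env_db[j + 1] < env_db[j]:
            j += 1
        results.append({
            "formant_hz": freqs_hz[i],
            "formant_db": env_db[i],
            "valley_hz": freqs_hz[j],
            "delta_db": env_db[i] - env_db[j],   # peak minus valley amplitude
        })
    return results
```

The envelope passed in could come from any of the four methods, so the same routine would give directly comparable deltas across LP, WLSP, and MVDR.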
Collected Data
• Data organized by:
  • Vowel
  • Region
  • Sex
  • Spectral approximation method
  • Trineme
    • Phonemes preceding and following vowel
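One possible way to hold data organized like this is a flat list of records that can be averaged over any combination of keys, sketched below in plain Python. The field names and the helper are hypothetical, and the numbers in the example are made-up placeholders, not TIMIT measurements.

```python
from collections import defaultdict
from statistics import mean

def average_by(records, keys, field):
    """Average `field` over all records sharing the same values for `keys`."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[k] for k in keys)].append(rec[field])
    return {group: mean(values) for group, values in groups.items()}

# Placeholder records (invented values for illustration only).
records = [
    {"vowel": "iy", "region": 1, "sex": "F", "method": "LP",
     "trineme": ("b", "iy", "t"), "f1_hz": 300.0},
    {"vowel": "iy", "region": 2, "sex": "M", "method": "LP",
     "trineme": ("s", "iy", "d"), "f1_hz": 340.0},
    {"vowel": "iy", "region": 1, "sex": "F", "method": "MVDR",
     "trineme": ("b", "iy", "t"), "f1_hz": 310.0},
]

by_vowel_and_method = average_by(records, ("vowel", "method"), "f1_hz")
```

Keeping one record per measurement means any grouping (by region, by sex, by trineme) is a single call rather than a separate table of precomputed averages.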
Collected Data
• Filter orders N=22
  • LP: N → 22
  • WLSP: M=N+1=23
  • MVDR: M=2(2N)+1=89
  • MVDR of WLSP: M=2(2N)+1=89
• WLSP data is erroneous
  • Hamming window was not applied, which has a noticeable impact on results
• MVDR of WLSP needs to be excluded
• MVDR order is too high
General Observations
• Formant locations vary greatly
  • Between different speakers
  • Between different Trinemes
  • 100-200 Hz for F1
  • 300-600 Hz for F2
  • 600-1000 Hz for F3
Work still to be done
• Optimize methods
  • e.g. WLSP search method for λ
  • Analysis of data took over 5 hrs
• Determine best filter orders for each method
• Reorganize data storage for easier analysis
  • Very difficult to sort through 100,000 sets of data averages
• Determine exact statistics to be taken
• Perform analysis of TIMIT data again
Sources
• Murthi, Manohar N. “All-Pole Modeling of Speech Based on the Minimum Variance Distortionless Response Spectrum”. IEEE Transactions on Speech and Audio Processing, Vol. 8, No. 3, May 2000
• Backstrom, Tom. “All-Pole Modeling Technique Based on Weighted Sum of LSP Polynomials”. IEEE Signal Processing Letters, Vol. 10, No. 6, June 2003