Mel-spectrum computation (new_fe_sp.c). Presentation by Yu Zhang (scuyuzh@hotmail.com), Oct 1st, 2003. Seminar Speech Recognition

04 Mel-Spectrum Computation


Page 1: 04 Mel-Spectrum Computation

Mel-spectrum computation new_fe_sp.c

Presentation by Yu Zhang (scuyuzh@hotmail.com)

Oct 1st, 2003

Seminar Speech Recognition

Page 2: 04 Mel-Spectrum Computation

The human ear perceives pitch on an approximately linear scale for frequencies below 1 kHz and on a logarithmic scale for frequencies above 1 kHz.

The mel-frequency scale therefore uses linear frequency spacing below 1000 Hz and logarithmic spacing above 1000 Hz.

Voice signals carry most of their energy at low frequencies, so it is natural to use a mel-spaced filter bank, which has the characteristics described above.

Mel-frequency Wrapping

Page 3: 04 Mel-Spectrum Computation

line 165 of new_fe_sp.c

    float32 fe_mel(float32 x)
    {
        return (2595.0 * (float32) log10(1.0 + x / 700.0));
    }

    float32 fe_melinv(float32 x)
    {
        return (700.0 * ((float32) pow(10.0, x / 2595.0) - 1.0));
    }

Mel-frequency Wrapping

Use the following approximate formula to compute the mels for a given frequency f in Hz:

$$\mathrm{mel}(f) = 2595 \log_{10}\left(1 + \frac{f}{700}\right)$$

Page 4: 04 Mel-Spectrum Computation

For each tone with an actual frequency, f, measured in Hz, a subjective pitch is measured on a scale called the ‘mel’ scale. The pitch of a 1 kHz tone, 40 dB above the perceptual hearing threshold, is defined as 1000 mels.
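As a sanity check, the approximate formula used in new_fe_sp.c, mel(f) = 2595 log10(1 + f/700), reproduces this reference point closely:

$$\mathrm{mel}(1000) = 2595 \log_{10}\left(1 + \frac{1000}{700}\right) \approx 2595 \times 0.38535 \approx 1000$$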


Page 5: 04 Mel-Spectrum Computation

Mel-frequency Wrapping

Figure 1: Power Spectrum without Mel-frequency Wrapping
Figure 2: Mel-frequency Wrapping of Power Spectrum

Viewed as a whole, the image with mel-frequency wrapping contains less information than the one without it. Looking at the details, however, we see that the mel-wrapped image keeps the low frequencies and discards some of the remaining information. To summarize, mel-frequency wrapping lets us keep only the useful part of the information.

Page 6: 04 Mel-Spectrum Computation

Mel spectrum

The Mel spectrum is computed by multiplying the power spectrum by each of the triangular mel weighting filters and integrating the result:

$$\tilde{S}[l] = \sum_{k=0}^{N/2} S[k]\, M_l[k], \qquad l = 0, 1, \ldots, L-1$$

where S[k] is the power spectrum, M_l[k] is the l-th triangular mel weighting filter, N is the length of the Discrete Fourier Transform, and L is the total number of triangular mel weighting filters.
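The sum can be written out directly as a double loop. The sketch below is not the Sphinx implementation. It assumes the filter weights M_l[k] are stored densely, row by row, in a flat array, and the name `mel_spectrum` and its parameters are hypothetical.

```c
#include <stddef.h>

/*
 * Computes mel_spec[l] = sum over k = 0..N/2 of power_spec[k] * M_l[k]
 * for l = 0 .. num_filters-1.
 * ASSUMPTION: filters holds num_filters rows of (fft_size/2 + 1) weights,
 * stored row-major, so M_l[k] is filters[l * num_bins + k].
 */
static void mel_spectrum(const double *power_spec, const double *filters,
                         size_t num_filters, size_t fft_size,
                         double *mel_spec)
{
    size_t num_bins = fft_size / 2 + 1;   /* k = 0 .. N/2 inclusive */
    size_t l, k;

    for (l = 0; l < num_filters; ++l) {
        double acc = 0.0;
        for (k = 0; k < num_bins; ++k)
            acc += power_spec[k] * filters[l * num_bins + k];
        mel_spec[l] = acc;
    }
}
```

Because each triangular filter is zero outside a narrow band, a practical implementation only needs to visit the bins inside that band. The dense loop here trades that efficiency for a one-to-one match with the formula.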

Page 7: 04 Mel-Spectrum Computation

Building the Triangular Mel Weighting filters

line 62 in new_fe_sp.c

    int32 fe_build_melfilters(melfb_t *MEL_FB)
    {
        // estimate filter coefficients
        MEL_FB->filter_coeffs = (float32 **) fe_create_2d(MEL_FB->num_filters, MEL_FB->fft_size, sizeof(float32));
        MEL_FB->left_apex = (float32 *) calloc(MEL_FB->num_filters, sizeof(float32));
        MEL_FB->width = (int32 *) calloc(MEL_FB->num_filters, sizeof(int32));
        filt_edge = (float32 *) calloc(MEL_FB->num_filters + 2, sizeof(float32));
        …
        melmax = fe_mel(MEL_FB->upper_filt_freq);
        melmin = fe_mel(MEL_FB->lower_filt_freq);
        for (i = 0; i <= MEL_FB->num_filters + 1; ++i) {
            filt_edge[i] = fe_melinv(i * dmelbw + melmin);
        }
        …
        for (whichfilt = 0; whichfilt < MEL_FB->num_filters; ++whichfilt) {
            // Building the triangular mel weighting filters
            …
        }
        …
    }

Page 8: 04 Mel-Spectrum Computation

Building the Mel spectrum

line 156 in new_fe_sp.c

    void fe_mel_spec(fe_t *FE, float64 *spec, float64 *mfspec)
    {
        int32 whichfilt, start, i;
        float32 dfreq;

        dfreq = FE->SAMPLING_RATE / (float32) FE->FFT_SIZE;

        for (whichfilt = 0; whichfilt < FE->MEL_FB->num_filters; whichfilt++) {
            start = (int32) (FE->MEL_FB->left_apex[whichfilt] / dfreq) + 1;
            mfspec[whichfilt] = 0;
            for (i = 0; i < FE->MEL_FB->width[whichfilt]; i++)
                mfspec[whichfilt] += FE->MEL_FB->filter_coeffs[whichfilt][i] * spec[start + i];
        }
    }

    /*
     * FE contains the triangular mel weighting filters (FE->MEL_FB)
     * spec is the power spectrum
     * mfspec is the mel spectrum
     * the variables marked in red on the original slide are the
     * coefficients of the mel weighting filters
     */

$$\tilde{S}[l] = \sum_{k=0}^{N/2} S[k]\, M_l[k], \qquad l = 0, 1, \ldots, L-1$$
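To see how `start` in `fe_mel_spec` selects the first FFT bin of a filter, consider a worked example. The sampling rate of 16000 Hz, the FFT size of 512, and the left edge of 200 Hz are illustrative assumptions, not values from the slides:

$$\mathrm{dfreq} = \frac{16000}{512} = 31.25\ \mathrm{Hz}, \qquad \mathrm{start} = \left\lfloor \frac{200}{31.25} \right\rfloor + 1 = \lfloor 6.4 \rfloor + 1 = 7$$

Bin 7 lies at 7 × 31.25 = 218.75 Hz, the first bin above the filter's left edge. The inner loop then accumulates `width` consecutive bins from there, so only the bins where the triangle is non-zero contribute to the sum.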
