Modeling of Mel Frequency Features for Non Stationary Noise
I. Andrianakis, P. R. White
Signal Processing and Control Group, Institute of Sound and Vibration Research, University of Southampton
Outline
Introduction.
Mel Frequency Log Spectrum and Cepstrum.
Distribution of the MFLS and MFC coefficients.
Physical Interpretation of the distributions.
Modeling of data with Gaussian Mixture Models and the EM algorithm.
Results.
Summary & Further work.
Introduction
When working with speech or noise, one often wishes to extract some salient features of the signals, so that instead of working with the whole data set one can concentrate on a smaller set that conveys the most significant information.
Such features are the Mel Frequency Log Spectral and Cepstral Coefficients.
Their favourable property is that they focus mostly on the low frequency components, where most of the car or train noise energy lies, while compacting the higher frequencies, which usually carry less energy.
We shall present some results from our research on the application of MFLSCs and MFCCs to noise signals and their modelling with Gaussian Mixture Models.
Mel Frequency Log Spectrum and Cepstrum
Noise → STFT → |.|^2 → Mel Frequency Filter Banks → Log( . ) → Mel Frequency Log Spectrum
Mel Frequency Log Spectrum → DCT( . ) → Mel Frequency Cepstrum
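The processing chain above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the 8 kHz sampling rate, the Hann window, the frame length and hop, the triangular filter shape, the mel mapping 2595·log10(1 + f/700) and the unnormalised DCT-II are all choices made here for the example. The function names `mel_filter_bank` and `mel_features` are hypothetical.

```python
import numpy as np

def mel_filter_bank(n_filters, n_fft, fs):
    """Triangular filters equally spaced on an assumed mel axis."""
    to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    # n_filters + 2 band edges between 0 Hz and the Nyquist frequency.
    edges = to_hz(np.linspace(0.0, to_mel(fs / 2.0), n_filters + 2))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    bank = np.zeros((n_filters, freqs.size))
    for k in range(n_filters):
        lo, mid, hi = edges[k], edges[k + 1], edges[k + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[k] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def mel_features(x, fs, n_fft=256, hop=128, n_filters=20):
    """Return (MFLS, MFC), each of shape (n_frames, n_filters)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2              # STFT, then |.|^2
    mel_energy = power @ mel_filter_bank(n_filters, n_fft, fs).T  # mel filter banks
    mfls = np.log(mel_energy + 1e-12)                             # Log(.)
    # Unnormalised DCT-II written out explicitly so that only NumPy is needed.
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(n, n + 0.5) / n_filters)
    mfc = mfls @ dct.T                                            # DCT(.)
    return mfls, mfc
```

Calling `mel_features(x, fs=8000)` on a one-dimensional signal `x` returns one row of 20 log spectral coefficients and one row of 20 cepstral coefficients per frame. With the unnormalised DCT used here, the first cepstral coefficient is simply the sum of the log filter-bank energies of that frame.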
Rationale Behind the Use of Mel Frequency Features
Mel frequency warping focuses on the low frequencies (< 1 kHz), where the filter bank spacing is linear.
Energy above 1 kHz is compacted, as the filters have logarithmically increasing pass bands.
Suitable for representing ambient noise (e.g. in cars and trains), because its energy is concentrated in the lower frequencies.
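The warping can be illustrated numerically. The sketch below assumes the widely used analytic mel mapping 2595·log10(1 + f/700), which the slides do not specify; it only approximates the "linear below 1 kHz, logarithmic above" layout, since its spacing grows smoothly rather than switching at 1 kHz. The 20 filters and the 4000 Hz upper limit are taken from the figures, and the helper names are hypothetical.

```python
import numpy as np

def hz_to_mel(f_hz):
    """Assumed mel mapping: 2595 * log10(1 + f / 700)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_centre_frequencies(n_filters=20, f_max=4000.0):
    """Centre frequencies (Hz) of bands equally spaced on the mel axis."""
    edges_mel = np.linspace(0.0, hz_to_mel(f_max), n_filters + 2)
    return mel_to_hz(edges_mel)[1:-1]

centres = mel_centre_frequencies()
# Gap between neighbouring centres: small at low frequencies, large at high ones.
spacing = np.diff(centres)
```

Printing `centres` and `spacing` shows the filters packed closely at the bottom of the band and spread out towards 4 kHz, which is the compaction of the higher frequencies described above.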
Rationale Behind the Use of Mel Frequency Features (II)
Filter banks are closely spaced where the signal’s energy is higher.
Comparison With LPC
[Figure: PSD of Train and Car noise compared with the 13 LPC Spectrum and the 20 Mel Spectrum; horizontal axis: Frequency [Hz]]
Distribution of the Mel Frequency Coefficients
We are concerned with the form of the probability distribution of the Mel
Frequency features, that is, the Mel Log Spectrum and the Mel
Cepstrum.
In the following, we shall present the distribution of MF Log Spectrum
Coefficients and MF Cepstral Coefficients for various types of signals.
We shall also try to give a physical explanation for the form of the
distribution for each case.
‘Stationary’ Noise
[Figure: waveform of the noise segment against Time [s] (0 to 120), and its Spectrogram, Frequency (0 to 4000) against Time]
This is a segment of car noise and its respective spectrogram.
The signal looks fairly stationary in its mean and variance, while the spectrogram shows that its frequency components do not vary with time either.
We shall proceed now to examine the distribution of its Mel Frequency Features.
Mel Log Spectrum
[Figure: Mel Log Spectrum, Coefficients (1 to 20) against Time [s]; Kurtosis of Coefficients; histograms of coefficients 1, 5, 16 and 20]
The figure shows the evolution with time of the previous signal’s Mel Log Spectrum, the kurtosis of its coefficients and some characteristic distributions.
The coefficients follow an almost Gaussian distribution.
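Kurtosis is used here as a quick check of how Gaussian each coefficient is. A minimal sketch of such a check follows, assuming the excess kurtosis convention (3 is subtracted so that a Gaussian scores 0); the slides do not state which convention was used. The function name and the synthetic test data are illustrative only.

```python
import numpy as np

def excess_kurtosis(features):
    """Excess kurtosis of each column of a (n_frames, n_coefficients) array."""
    centred = features - features.mean(axis=0)
    m2 = np.mean(centred ** 2, axis=0)   # second central moment
    m4 = np.mean(centred ** 4, axis=0)   # fourth central moment
    return m4 / m2 ** 2 - 3.0            # 0 for a Gaussian, > 0 for longer tails

# Synthetic stand-ins for two coefficient tracks: one Gaussian, one long-tailed.
rng = np.random.default_rng(0)
gaussian = rng.standard_normal((20000, 1))
long_tailed = rng.laplace(size=(20000, 1))
k = excess_kurtosis(np.hstack([gaussian, long_tailed]))
```

The first entry of `k` comes out close to zero, while the second is clearly positive, which is the pattern one would look for when comparing coefficients.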
Mel Cepstrum
[Figure: Mel Cepstrum, Coefficients (1 to 20) against Time [s]; Kurtosis of Coefficients; histograms of coefficients 1, 2, 12 and 15]
This is the evolution with time of the Mel Cepstrum, the kurtosis of its coefficients and some selected distributions.
The coefficients are again almost Gaussian. The high kurtosis for coefficients 1 and 2 is due to a few outliers.
Non-Stationary Noise
[Figure: waveform of the noise segment against Time [s] (0 to 200), and its Spectrogram, Frequency (0 to 4000) against Time]
We shall now examine how the distributions vary in the case of non-stationary noise.
This is a segment of train noise, in which a number of amplitude fluctuations occur due to events such as changes of rail and other trains passing by.
Mel Log Spectrum
[Figure: Mel Log Spectrum, Coefficients (1 to 20) against Time [s]; Kurtosis of Coefficients; histograms of coefficients 1, 7, 11 and 19]
The Mel Log Spectrum now varies with time, reflecting the different sound events. The kurtosis also increases for the higher coefficients.
The first few coefficients are close to Gaussian, but the higher ones develop longer tails.
Mel Cepstrum
[Figure: Mel Cepstrum, Coefficients (1 to 20) against Time [s]; Kurtosis of Coefficients; histograms of coefficients 1, 2, 4 and 11]
The sound events are now reflected in the first few Cepstrum coefficients.
Unlike the Log Spectrum, the first coefficients now have longer tails, while the higher ones tend towards Gaussian.
Log Spectrum Distribution - Physical Interpretation
The lower Mel Log Spectrum coefficients represent the lower frequencies of the spectrum, where noise energy is always present.
Thus, they take consistently high values with few fluctuations, which keeps their distributions close to Gaussian.
Higher coefficients assume high values only temporarily, due to non-stationary events.
This results in their distributions having longer tails.
When energy is present at high frequencies for prolonged periods, they can even be bimodal.
[Figure: Mel Log Spectrum, Coefficients (1 to 20) against Time [s]; histograms of coefficients 1 and 19]
Cepstrum Distribution - Physical Interpretation
The lower Cepstrum coefficients reflect fluctuations in the amplitude and the spectral envelope.
As both of these vary in non-stationary signals, so do the lower MFCCs, resulting in distributions with long tails.
Higher coefficients, however, convey mostly information about harmonic components, which are less dominant in the more broadband noise of trains and cars and do not fluctuate quickly.
[Figure: Mel Cepstrum, Coefficients (1 to 20) against Time [s]; histograms of coefficients 1 and 11]
Modelling the Data
The previous analysis showed that the distribution of the Mel Log Spectrum and Mel Cepstrum coefficients deviates from the normal, especially in the case of non-stationary noise, which is of most interest.
To model the coefficients successfully we used Gaussian Mixture Models, which are capable of approximating irregularly shaped distributions.
An algorithm that allows us to fit mixtures of Gaussians to our data is the Expectation Maximization (EM) algorithm.
The Expectation Maximization Algorithm for Gaussian Mixture Models
We assume the probabilistic model:
$$p(\mathbf{x} \mid \Theta) = \sum_{i=1}^{M} \alpha_i \, p_i(\mathbf{x} \mid \theta_i)$$
where:
$$\Theta = (\alpha_i, \theta_i), \quad i = 1 \ldots M$$
We assume a latent random variable $y_j$ that determines the distribution $x_j$ comes from.
We then find the expected value of the log likelihood $\log(p(\mathbf{x}, \mathbf{y} \mid \Theta))$ with respect to $\mathbf{y}$, given $\mathbf{x}$ and an initial guess of the parameters $\Theta^g$.
That is:
$$E[\log p(\mathbf{x}, \mathbf{y} \mid \Theta) \mid \mathbf{x}, \Theta^g] = \int_{\mathbf{y}} \log p(\mathbf{x}, \mathbf{y} \mid \Theta) \, f(\mathbf{y} \mid \mathbf{x}, \Theta^g) \, d\mathbf{y}$$
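For a mixture model the latent variable is a discrete component label, so the expectation can be written as a sum. The standard form, following the Bilmes tutorial cited in these slides, is sketched below. It assumes N independent samples x_j (N is a symbol introduced here), mixture weights α_l, component parameters θ_l, a current parameter guess marked with the superscript g, and component indices l and k running from 1 to M.

```latex
% Posterior probability that component l generated sample x_j, under the current guess
p(l \mid x_j, \Theta^g) =
  \frac{\alpha_l^g \, p_l(x_j \mid \theta_l^g)}
       {\sum_{k=1}^{M} \alpha_k^g \, p_k(x_j \mid \theta_k^g)}

% Expected complete-data log likelihood for N independent samples
E[\log p(\mathbf{x}, \mathbf{y} \mid \Theta) \mid \mathbf{x}, \Theta^g]
  = \sum_{l=1}^{M} \sum_{j=1}^{N}
    \log\big(\alpha_l \, p_l(x_j \mid \theta_l)\big) \, p(l \mid x_j, \Theta^g)
```

The posterior in the first line is often called the responsibility of component l for sample x_j; the expectation is then a responsibility-weighted sum of per-component log likelihoods.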
The Expectation Maximization Algorithm for Gaussian Mixture Models (II)
This was the Expectation step. In the Maximization step we maximize the expected value with respect to $\Theta$, i.e.
$$\Theta^{i+1} = \arg\max_{\Theta} \left( E[\log p(\mathbf{x}, \mathbf{y} \mid \Theta) \mid \mathbf{x}, \Theta^{i}] \right)$$
where $\Theta^{i}$ is the parameter estimate from iteration $i$, which takes the place of the initial guess used in the Expectation step.
The two steps are repeated until convergence.
For an excellent tutorial of EM see:
J. Bilmes, A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models
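The two steps can be sketched for a one-dimensional Gaussian mixture as follows. This is a minimal illustration and not the authors' code: the quantile-based initial guess, the small variance floor, the stopping tolerance and the function names are assumptions made for the example. The Maximization step uses the standard closed-form updates for Gaussian components (responsibility-weighted counts, means and variances).

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """Univariate normal density, broadcast over samples and components."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def mixture_log_likelihood(x, weights, means, variances):
    """Log likelihood of the mixture, summed over all samples."""
    joint = weights * gaussian_pdf(x[:, None], means, variances)
    return np.sum(np.log(joint.sum(axis=1)))

def fit_gmm_em(x, n_components, n_iter=500, tol=1e-9):
    """Fit a 1-D Gaussian mixture by EM; returns (weights, means, variances)."""
    x = np.asarray(x, dtype=float)
    # Initial guess (illustrative): equal weights, means at evenly spaced
    # quantiles of the data, and a shared variance.
    weights = np.full(n_components, 1.0 / n_components)
    means = np.quantile(x, (np.arange(n_components) + 0.5) / n_components)
    variances = np.full(n_components, np.var(x))
    previous = -np.inf
    for _ in range(n_iter):
        # Expectation step: responsibility of each component for each sample.
        joint = weights * gaussian_pdf(x[:, None], means, variances)
        resp = joint / joint.sum(axis=1, keepdims=True)
        # Maximization step: closed-form updates for a Gaussian mixture.
        counts = resp.sum(axis=0)
        weights = counts / x.size
        means = (resp * x[:, None]).sum(axis=0) / counts
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / counts + 1e-9
        current = mixture_log_likelihood(x, weights, means, variances)
        if current - previous < tol:   # stop once the likelihood stops improving
            break
        previous = current
    return weights, means, variances
```

For bimodal data, `fit_gmm_em(x, 2)` should reach a higher log likelihood than `fit_gmm_em(x, 1)`, which can be checked with `mixture_log_likelihood`. EM only finds a local optimum, so in practice the result depends on the initial guess.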
Fitting GMM to the Data
[Figure: panels of coefficient distributions with the fitted densities; legend: Single Gaussian, Two Gaussians, Three Gaussians]
Here we present some results of fitting GMMs to
various distributions.
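A fit of this kind can be inspected by evaluating the mixture density on the histogram bin centres and comparing it with a histogram normalised to unit area. The sketch below assumes that this is how the plotted distributions were scaled, which the slides do not state. The two-component parameters are hypothetical stand-ins for a fitted model, and the samples are synthetic.

```python
import numpy as np

def gmm_pdf(grid, weights, means, variances):
    """Mixture density sum_i alpha_i * N(x; mu_i, sigma_i^2) on a grid of x values."""
    grid = np.asarray(grid, dtype=float)[:, None]
    comp = np.exp(-0.5 * (grid - means) ** 2 / variances) / np.sqrt(2.0 * np.pi * variances)
    return comp @ weights

# Hypothetical parameters standing in for a fitted two-component model.
weights = np.array([0.7, 0.3])
means = np.array([0.0, 5.0])
variances = np.array([1.0, 4.0])

# Synthetic data drawn from that same mixture.
rng = np.random.default_rng(1)
labels = rng.choice(2, size=50000, p=weights)
samples = rng.normal(means[labels], np.sqrt(variances[labels]))

# density=True scales the histogram so that it integrates to one, like a pdf.
hist, edges = np.histogram(samples, bins=60, density=True)
centres = 0.5 * (edges[:-1] + edges[1:])
fit = gmm_pdf(centres, weights, means, variances)
max_abs_error = np.max(np.abs(hist - fit))
```

A small `max_abs_error` indicates that the mixture follows the histogram closely; plotting `hist` and `fit` against `centres` gives a picture of the same kind as the panels above.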
Summary
Today we have discussed:
The distribution of the Mel Frequency Log Spectral and Cepstral Coefficients.
The form this distribution assumes in the presence of non-stationary noise, together with a physical explanation.
How it can be modelled with Gaussian Mixture Models via the EM algorithm.
Finally, we showed some results of fitting GMMs to our data.
Further Work
Examine the distribution of Mel Frequency features for noisy speech and see how these are altered by the presence of different noise types.
Construct Optimal Estimators for clean speech Mel features, given the noisy ones and the noise models.
Use HMMs with Gaussian Mixture Models for accommodating the different noise states.