Transcript

HIWIRE MEETING
CRETE, SEPTEMBER 23-24, 2004

JOSÉ C. SEGURA LUNA

GSTC UGR

HIWIRE Meeting – Crete, 23-24 September, 2004

Schedule

VAD for noise suppression & frame-dropping
    Long-Term Spectral Divergence
    Subband OS-based detector

Non-linear feature normalization
    Histogram equalization
    OS-based equalization
    Segmental implementation

VAD (1)

VAD: motivation
    To get an estimate of the background noise for
        Wiener filter design
        Spectral subtraction
    To discard non-speech frames

[Block diagram. Blocks: NOISY SPEECH, WIENER FILTER / SS, VAD, NOISE ESTIMATION, FRAME DROPPING, RECOGNIZER]

VAD (2)

Our approach
    Use of rather long time spans (~100 ms) instead of instantaneous measures
        Increased discrimination
    Use of a statistical model in the log-FBE domain
        Smoother estimates
    Use of a feedback decision coupled with noise suppression
        VAD works on less noisy speech
    Use of order statistics
        More robust estimation

Long-Term Spectral Divergence (1)

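J. Ramírez, J.C. Segura, C. Benítez, A. de la Torre and A.J. Rubio, Efficient voice activity detection algorithms using long-term speech information, Speech Communication 42 (2004) 271–287

$$\mathrm{LTSE}_N(k,t) = \max_{-N \le n \le N} X(k,t+n)$$

$$\mathrm{LTSD}_N(t) = 10 \log_{10}\left( \frac{1}{\mathrm{NFFT}} \sum_{k=0}^{\mathrm{NFFT}-1} \frac{\mathrm{LTSE}_N^2(k,t)}{N^2(k,t)} \right)$$

$$N(k,t) = \begin{cases} \alpha\,N(k,t-1) + (1-\alpha)\,N_K(k,t) & \text{non-speech} \\ N(k,t-1) & \text{speech} \end{cases}$$

$$N_K(k,t) = \frac{1}{2K+1} \sum_{n=-K}^{K} X(k,t+n)$$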
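A minimal Python sketch of an LTSD-style detector may help make the definitions on this slide concrete. It assumes the magnitude spectra are already available as a list of frames with strictly positive values. The function names, the decision threshold, the smoothing factor `alpha` and the window orders are illustrative assumptions, not values taken from the slides.

```python
import math

def ltse(spectra, t, order):
    """Long-term spectral envelope: per-bin maximum over frames t-order..t+order."""
    lo, hi = max(0, t - order), min(len(spectra) - 1, t + order)
    n_bins = len(spectra[0])
    return [max(spectra[j][k] for j in range(lo, hi + 1)) for k in range(n_bins)]

def ltsd(spectra, noise, t, order):
    """Divergence in dB between the long-term envelope at frame t and the noise spectrum."""
    env = ltse(spectra, t, order)
    ratio = sum(e * e / (n * n) for e, n in zip(env, noise)) / len(env)
    return 10.0 * math.log10(ratio)

def ltsd_vad(spectra, noise, order=6, threshold_db=6.0, alpha=0.95, k_avg=3):
    """Frame-wise speech/non-speech decisions with noise update in non-speech frames.

    threshold_db, alpha and k_avg are assumed values chosen for illustration.
    """
    noise = list(noise)
    decisions = []
    last = len(spectra) - 1
    for t in range(len(spectra)):
        is_speech = ltsd(spectra, noise, t, order) > threshold_db
        decisions.append(is_speech)
        if not is_speech:
            # Average the spectrum over 2*k_avg+1 frames (clipped at the edges)
            # and smooth it into the running noise estimate.
            lo, hi = max(0, t - k_avg), min(last, t + k_avg)
            for k in range(len(noise)):
                avg = sum(spectra[j][k] for j in range(lo, hi + 1)) / (hi - lo + 1)
                noise[k] = alpha * noise[k] + (1 - alpha) * avg
    return decisions
```

Because the envelope is a maximum over neighbouring frames, frames just before and after a speech segment are also flagged as speech, which acts as a built-in hangover.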

Long-Term Spectral Divergence (2)

Long-Term Spectral Divergence (3)

Long-Term Spectral Divergence (4)

Long-Term Spectral Divergence (5)

Long-Term Spectral Divergence (7)

Recognition experiments with AURORA 2 and 3

Long-Term Spectral Divergence (6)

Subband OSF VAD (1)

J. Ramírez, J.C. Segura, C. Benítez, A. de la Torre and A.J. Rubio, An Effective Subband OSF-based VAD with Noise Reduction for Robust Speech Recognition, IEEE Trans. on Speech and Audio Processing (to appear in 2005)

Decision is based on an averaged QSNR, defined as an inter-quantile difference

Feedback structure
    VAD operates over the noise-reduced signal

Subband OSF VAD (2)

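$E(k,t)$: log-energy at time $t$ in band $k$

Temporal buffer: $\{E(k,t-N), \ldots, E(k,t), \ldots, E(k,t+N)\}$

Order statistics: $E_{(1)}(k,t) \le \cdots \le E_{(r)}(k,t) \le \cdots \le E_{(2N+1)}(k,t)$

$$Q_p(k,t) = (1-f)\,E_{(s)}(k,t) + f\,E_{(s+1)}(k,t), \quad s = \lfloor 2pN \rfloor, \quad f = 2pN - s$$

$$E_N(k,t) = \alpha\,E_N(k,t-1) + (1-\alpha)\,Q_{0.5}(k,t) \quad \text{(update for non-speech)}$$

$$\mathrm{QSNR}(k,t) = Q_p(k,t) - E_N(k,t), \quad p = 0.9$$

$$\mathrm{SNR}(t) = \frac{1}{K} \sum_{k=1}^{K} \mathrm{QSNR}(k,t), \qquad \mathrm{SNR}(t) > \eta \;\Rightarrow\; \text{speech}$$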
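The quantile-based decision on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the subband log-energies are given as `log_energies[t][k]`, the threshold and smoothing factor are made-up values, and the quantile uses 0-based interpolation over the sorted buffer so that p = 0.5 is exactly the median. The noise-reduction feedback loop mentioned in the talk is not modelled here.

```python
def quantile(sorted_vals, p):
    """Quantile by linear interpolation between two adjacent order statistics.

    Assumption: 0-based position p * (len - 1), so p = 0.5 gives the median.
    """
    span = len(sorted_vals) - 1          # equals 2N for a full buffer of 2N+1 values
    if span == 0:
        return sorted_vals[0]
    pos = p * span
    s = min(int(pos), span - 1)
    f = pos - s
    return (1 - f) * sorted_vals[s] + f * sorted_vals[s + 1]

def osf_vad(log_energies, noise, n_half=8, p=0.9, threshold=3.0, alpha=0.97):
    """Speech/non-speech decisions from the subband-averaged QSNR.

    noise[k] is the initial log-energy noise level of band k.
    threshold and alpha are assumed values chosen for illustration.
    """
    noise = list(noise)
    n_frames, n_bands = len(log_energies), len(noise)
    decisions = []
    for t in range(n_frames):
        lo, hi = max(0, t - n_half), min(n_frames - 1, t + n_half)
        qsnr_sum = 0.0
        medians = []
        for k in range(n_bands):
            buf = sorted(log_energies[j][k] for j in range(lo, hi + 1))
            qsnr_sum += quantile(buf, p) - noise[k]      # QSNR(k, t)
            medians.append(quantile(buf, 0.5))
        is_speech = qsnr_sum / n_bands > threshold       # SNR(t) against the threshold
        decisions.append(is_speech)
        if not is_speech:
            # Noise level tracks the median of the buffer during non-speech.
            noise = [alpha * nk + (1 - alpha) * m for nk, m in zip(noise, medians)]
    return decisions
```

Using a high quantile (p = 0.9) rather than the buffer maximum keeps the detector from reacting to a single outlier frame, while the median-based noise update is insensitive to short bursts.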

Subband OSF VAD (3)

Subband OSF VAD (4)

Subband OSF VAD (5)

Accurate VAD

Open topics
    New alternatives to improve the performance
        New decision criteria based on OS filters
            Already used for edge detection in images
    Computational efficiency
        Development of computationally efficient algorithms

Feature normalization

Objective
    Transform features to remove undesired variability

Linear techniques
    CMS: cepstral mean subtraction
        Removes the effect of linear channel distortion
    CMVN: cepstral mean and variance normalization
        Extension of CMS to deal with the variance reduction caused by additive noise
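As a small illustration of the two linear techniques, the following Python sketch applies CMS and CMVN per utterance to a list of feature vectors. The function names and the per-utterance scope are assumptions made for the example.

```python
import math

def cms(features):
    """Cepstral mean subtraction: remove the per-dimension mean over the utterance."""
    n, dim = len(features), len(features[0])
    means = [sum(f[d] for f in features) / n for d in range(dim)]
    return [[f[d] - means[d] for d in range(dim)] for f in features]

def cmvn(features, eps=1e-12):
    """Cepstral mean and variance normalization: zero mean and unit variance per dimension."""
    centered = cms(features)
    n, dim = len(centered), len(centered[0])
    stds = [math.sqrt(sum(f[d] ** 2 for f in centered) / n) for d in range(dim)]
    # eps guards against division by zero for a constant feature dimension.
    return [[f[d] / max(stds[d], eps) for d in range(dim)] for f in centered]
```

A constant offset added to every frame (a linear channel in the cepstral domain) leaves the output of `cms` unchanged, and a positive scaling plus offset leaves the output of `cmvn` unchanged.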

Feature normalization

Non-linear feature distortion
    Environment effects are non-linear for MFCC features
    And can hardly be removed with linear techniques
    Because not only the location (mean) and scale (variance) of the feature distributions are affected, but also the shape (affecting higher order moments of the distribution)

Non-linear extensions
    CDF-matching approaches (HEQ and related)
    Have been shown to be more effective than linear ones
    Normalize more than just the first two moments of the probability distributions

CDF-matching based equalization

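The main idea
    Transform the features to match a given PDF
    In the one-dimensional case, CDF matching gives the solution

$$x \sim p_X(x) \;\longrightarrow\; \hat{x} = T[x] \sim \phi(\hat{x})$$

$$C_X(x) = \int_{-\infty}^{x} p_X(u)\,du = \int_{-\infty}^{\hat{x}} \phi(u)\,du = \Phi(\hat{x})$$

$$C_X(x) = \Phi(\hat{x}) \;\Longrightarrow\; \hat{x} = T[x] = \Phi^{-1}(C_X(x))$$

($\phi$, $\Phi$: reference PDF and CDF)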
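The CDF-matching idea can be sketched in Python for one feature dimension and a standard Gaussian reference. This is a simplified illustration: it estimates the CDF from ranks, (rank - 0.5)/n, rather than from a histogram, and the function name is an assumption.

```python
from statistics import NormalDist

def heq_gaussian(values):
    """Map a 1-D sequence onto a standard normal reference by CDF matching.

    The empirical CDF of each sample is (rank - 0.5) / n (an assumed estimator),
    and the output is the reference quantile at that probability.
    Tied values receive consecutive ranks in input order.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    reference = NormalDist()                 # zero mean, unit variance
    out = [0.0] * n
    for rank, i in enumerate(order, start=1):
        out[i] = reference.inv_cdf((rank - 0.5) / n)
    return out
```

The output preserves the ordering of the input samples, while its distribution follows the reference regardless of the shape of the input distribution.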

Equalization and robust classifiers

$$y = \log\left(\exp(x + h) + \exp(n)\right), \quad h = 0.8, \quad n = 3.5$$

Invariance

CMS is invariant to additive bias
CMVN is invariant to linear transformations
Equalization to a reference distribution is invariant to any invertible transformation (including non-linear ones)

$y = G(x)$: a general transformation

$$\hat{x} = T_X(x) = \Phi^{-1}(C_X(x))$$

$$\hat{y} = T_Y(y) = \Phi^{-1}(C_Y(y))$$

If $G()$ is invertible, then

$$C_Y(G(x)) = C_X(x)$$

and therefore

$$\hat{y} = \Phi^{-1}(C_Y(G(x))) = \Phi^{-1}(C_X(x)) = \hat{x}$$
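The invariance property can be checked numerically with a short Python sketch. It uses a rank-based equalizer to a Gaussian reference (an assumed estimator) and an arbitrary non-linear distortion `G`. Note that in one dimension the identity C_Y(G(x)) = C_X(x) requires G to be increasing; the example uses such a G, and all names and values are illustrative.

```python
import math
from statistics import NormalDist

def equalize(values):
    """Rank-based equalization of a 1-D sequence to a standard normal reference."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    reference = NormalDist()
    out = [0.0] * n
    for rank, i in enumerate(order, start=1):
        out[i] = reference.inv_cdf((rank - 0.5) / n)
    return out

# Clean features and a distorted version y = G(x), with G invertible and increasing.
x = [0.3, -1.2, 2.5, 0.0, 1.1, -0.4]
y = [math.exp(0.5 * v) + 3.0 for v in x]

# Equalizing x and equalizing G(x) gives the same normalized features,
# because G preserves the ranks of the samples.
same = equalize(x) == equalize(y)
```

With a linear technique such as CMVN the two normalized sequences would differ, since an exponential distortion changes the shape of the distribution and not only its mean and variance.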

HEQ for robust speech recognition (1)

A. de la Torre, A.M. Peinado, J.C. Segura, J.L. Pérez, C. Benítez and A.J. Rubio, Histogram equalization of speech representation for robust speech recognition, IEEE Trans. on Speech and Audio Processing (to appear in 2005)

Transformation of each component of the MFCC vector to a Gaussian reference

Cumulative distributions are estimated using histograms

Performance compared with CMS, CMVN and model-based feature compensation (VTS)

Combination with VTS

HEQ for robust speech recognition (2)

HEQ for robust speech recognition (3)

HEQ for robust speech recognition (4)

HEQ for robust speech recognition (5)

Segmental HEQ (1)

J.C. Segura, C. Benítez, A. de la Torre, A.J. Rubio and J. Ramírez, Cepstral Domain Segmental Nonlinear Feature Transformations for Robust Speech Recognition, IEEE Signal Processing Letters, 11(5), May 2004

A segmental implementation of HEQ for non-stationary noise

A temporal buffer is used for the histogram estimation instead of the full sentence

The algorithmic delay is T frames

$$X_t = \{x_{t-T}, \ldots, x_t, \ldots, x_{t+T}\}$$

Segmental HEQ (2)

OSEQ: An efficient implementation (1)

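A very computationally efficient algorithm based on Order Statistics

Temporal buffer: $X_t = \{x_{t-T}, \ldots, x_t, \ldots, x_{t+T}\}$

Order statistics: $x_{(1)} \le \cdots \le x_{(r)} \le \cdots \le x_{(2T+1)}$

CDF estimation: $\hat{C}_X(x_{(r)}) = \dfrac{r - 0.5}{2T+1}$

Lookup table: $G[r] = \Phi^{-1}\left(\dfrac{r - 0.5}{2T+1}\right), \quad r = 1, \ldots, 2T+1$

Transformation: $\hat{x}_t = \Phi^{-1}(\hat{C}_X(x_t)) = \Phi^{-1}\left(\dfrac{r(x_t) - 0.5}{2T+1}\right) = G[r(x_t)]$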
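A minimal Python sketch of order-statistics equalization for one feature dimension is given here, under a few assumptions: the reference is a standard Gaussian, the rank of the current frame is one plus the number of strictly smaller values in its buffer, and the buffer is simply shortened at the utterance edges. The function name and the edge handling are illustrative choices.

```python
from statistics import NormalDist

def oseq(features, half_window):
    """Equalize a 1-D feature sequence with a sliding buffer of 2T+1 frames.

    half_window is T. Each frame is replaced by the reference quantile
    at probability (r - 0.5) / (2T + 1), where r is its rank in the buffer.
    """
    size = 2 * half_window + 1
    reference = NormalDist()
    # Lookup table G[r] for r = 1..2T+1, computed once.
    table = [reference.inv_cdf((r - 0.5) / size) for r in range(1, size + 1)]
    n = len(features)
    out = []
    for t, x in enumerate(features):
        lo, hi = max(0, t - half_window), min(n - 1, t + half_window)
        window = features[lo:hi + 1]
        r = 1 + sum(1 for v in window if v < x)      # rank of x_t in its buffer
        if len(window) == size:
            out.append(table[r - 1])
        else:
            # Assumed edge handling: shorter buffer, quantile computed directly.
            out.append(reference.inv_cdf((r - 0.5) / len(window)))
    return out
```

Only the rank of the centre frame is needed, so no histogram is built and no full sort of the buffer is required; the per-frame cost is a count over 2T+1 values plus one table lookup.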

OSEQ: An efficient implementation (2)

Feature normalization

Open topics
    Reference distribution
        Clean speech / Gaussian / others?
    Dynamic features normalization (Δ and ΔΔ)
        After, before or simultaneously [Obuchi, Stern, EUSP’03]
    Progressive normalization
        Not all MFCCs are equally affected, nor do they have equal discriminative power [de Wet, …, ICASSP’03]
        Lower order moments normalization [Hsu, Lee, ICASSP’04]
    Parametric techniques
        Current approaches are non-parametric [Haverinen, Kiss, EUSP’03]
    New applications
        Speaker independence and adaptation
        Multi-stream normalization

Combination of techniques

Development of a combined robust front-end
    An accurate VAD
        For noise parameter estimation
    A noise reduction technique
        Spectral subtraction or Wiener filter
        Statistical feature compensation
    A frame-dropping algorithm
        To discard non-speech frames
    And a feature normalization block
        For residual non-linear distortion compensation

VAD (1)

Development of a combined robust front-end

[Block diagram. Blocks: NOISY SPEECH, WIENER FILTER / SS, VAD, NOISE ESTIMATION, FRAME DROPPING, FEATURE EQUALIZATION, RECOGNIZER]
