HIWIRE MEETING CRETE, SEPTEMBER 23-24, 2004 JOSÉ C. SEGURA LUNA GSTC UGR

  • Slide 1
  • HIWIRE MEETING CRETE, SEPTEMBER 23-24, 2004 JOSÉ C. SEGURA LUNA GSTC UGR
  • Slide 2
  • Schedule. VAD for noise suppression and frame-dropping: long-term spectral divergence; subband OS-based detector. Non-linear feature normalization: histogram equalization; OS-based equalization; segmental implementation.
  • Slide 3
  • VAD (1). VAD: motivation. To get an estimate of the background noise for Wiener filter design and spectral subtraction (SS); to discard non-speech frames. [Block diagram: NOISY SPEECH, WIENER FILTER / SS, VAD, NOISE ESTIMATION, FRAME DROPPING, RECOGNIZER]
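As a rough illustration of how a VAD decision can drive both noise estimation and frame dropping, the sketch below uses a plain frame-energy threshold as a stand-in for the detector; it is not one of the VADs presented in these slides. The function name `suppress_and_drop`, the parameters `n_init`, `threshold_db`, `alpha` and `floor`, and the toy power spectra are all assumptions made for the example.

```python
import math

def suppress_and_drop(power_frames, n_init=3, threshold_db=6.0, alpha=0.9, floor=0.05):
    """Spectral subtraction plus frame dropping, driven by a simple energy VAD.

    power_frames: list of frames, each a list of strictly positive power values.
    All parameter values are hypothetical defaults chosen for the example.
    """
    n_bins = len(power_frames[0])
    # Assumption: the first n_init frames contain noise only.
    noise = [sum(f[k] for f in power_frames[:n_init]) / n_init for k in range(n_bins)]
    kept = []
    for frame in power_frames:
        snr_db = 10.0 * math.log10(sum(frame) / sum(noise))
        if snr_db < threshold_db:
            # Non-speech: refresh the noise estimate recursively, then drop the frame.
            noise = [alpha * n + (1.0 - alpha) * x for n, x in zip(noise, frame)]
            continue
        # Speech: power spectral subtraction with a spectral floor.
        kept.append([max(x - n, floor * x) for x, n in zip(frame, noise)])
    return kept

# Toy input: noise-only frames around a two-frame "speech" burst.
frames = [[1.0, 1.0]] * 4 + [[9.0, 5.0]] * 2 + [[1.0, 1.0]] * 4
clean = suppress_and_drop(frames)
```

Only the two burst frames survive, each with the running noise estimate subtracted. A Wiener filter could replace the subtraction line without changing the surrounding structure.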
  • Slide 4
  • VAD (2). Our approach: use of rather long time spans (~100 ms) instead of instantaneous measures, to increase discrimination; use of a statistical model in the log-FBE domain, for smoother estimates; use of a feedback decision coupled with noise suppression, so that the VAD works on less noisy speech; use of order statistics, for more robust estimation.
  • Slide 5
  • Long-Term Spectral Divergence (1). J. Ramírez, J.C. Segura, C. Benítez, A. de la Torre and A.J. Rubio, "Efficient voice activity detection algorithms using long-term speech information", Speech Communication 42 (2004) 271-287
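A minimal sketch of the long-term spectral divergence, assuming the formulation in the paper cited above: the long-term spectral envelope (LTSE) of a frame is the per-bin maximum amplitude over a window of 2N+1 neighbouring frames, and the LTSD is the mean ratio of the squared envelope to the noise power, expressed in dB. The function names, the default `order`, the toy spectra and the 3 dB threshold are illustrative assumptions, not values from the slides.

```python
import math

def ltse(spectra, l, order):
    """Per-bin maximum amplitude over frames l-order .. l+order (clipped at the edges)."""
    lo, hi = max(0, l - order), min(len(spectra), l + order + 1)
    n_bins = len(spectra[0])
    return [max(spectra[j][k] for j in range(lo, hi)) for k in range(n_bins)]

def ltsd(spectra, l, noise, order=6):
    """Long-term spectral divergence (dB) of frame l against a noise amplitude spectrum."""
    env = ltse(spectra, l, order)
    ratio = sum((e * e) / (n * n) for e, n in zip(env, noise)) / len(noise)
    return 10.0 * math.log10(ratio)

# Toy amplitude spectra (4 bins): noise-only frames around a louder burst.
noise = [1.0, 1.0, 1.0, 1.0]
frames = [[1.0, 1.0, 1.0, 1.0]] * 5 + [[4.0, 3.0, 1.0, 1.0]] * 3 + [[1.0, 1.0, 1.0, 1.0]] * 5
scores = [ltsd(frames, l, noise, order=1) for l in range(len(frames))]
speech = [s > 3.0 for s in scores]  # hypothetical 3 dB decision threshold
```

Because the envelope takes a maximum over neighbouring frames, the detected region extends by `order` frames on each side of the burst, which acts like a built-in hangover.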
  • Slide 6
  • Long-Term Spectral Divergence (2)
  • Slide 7
  • Long-Term Spectral Divergence (3)
  • Slide 8
  • Long-Term Spectral Divergence (4)
  • Slide 9
  • Long-Term Spectral Divergence (5)
  • Slide 10
  • Long-Term Spectral Divergence (7). Recognition experiments with AURORA 2 and 3.
  • Slide 11
  • Long-Term Spectral Divergence (6)
  • Slide 12
  • Subband OSF VAD (1). J. Ramírez, J.C. Segura, C. Benítez, A. de la Torre and A.J. Rubio, "An Effective Subband OSF-based VAD with Noise Reduction for Robust Speech Recognition", IEEE Trans. on Speech and Audio Processing (to appear in 2005). The decision is based on an averaged QSNR, defined as an inter-quantile difference. Feedback structure: the VAD operates on the noise-reduced signal.
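One way to read the averaged inter-quantile QSNR, sketched under assumptions: for each subband, take an upper and a lower quantile of the log-energies in a window around the current frame, subtract them, and average the differences over subbands. The 0.9 and 0.1 quantiles, the window length, the toy log-energies and all function names are hypothetical; the exact definition is the one in the cited paper.

```python
def quantile(values, p):
    """p-quantile by linear interpolation between order statistics."""
    s = sorted(values)
    pos = p * (len(s) - 1)
    i = int(pos)
    frac = pos - i
    return s[i] if frac == 0 else (1 - frac) * s[i] + frac * s[i + 1]

def averaged_qsnr(log_energy, l, half_window, p_hi=0.9, p_lo=0.1):
    """Mean over subbands of the inter-quantile difference of log-energy
    computed on frames l-half_window .. l+half_window (clipped at the edges)."""
    lo, hi = max(0, l - half_window), min(len(log_energy), l + half_window + 1)
    n_bands = len(log_energy[0])
    total = 0.0
    for k in range(n_bands):
        band = [log_energy[j][k] for j in range(lo, hi)]
        total += quantile(band, p_hi) - quantile(band, p_lo)
    return total / n_bands

# Toy subband log-energies (2 bands): stationary noise around a speech burst.
frames = [[10.0, 10.0]] * 6 + [[20.0, 16.0]] * 4 + [[10.0, 10.0]] * 6
quiet = averaged_qsnr(frames, 0, 2)   # window holds noise only
onset = averaged_qsnr(frames, 5, 2)   # window straddles the speech onset
```

On stationary noise the two quantiles coincide and the measure stays near zero; once speech enters the window, the upper quantile rises while the lower one stays at the noise level.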
  • Slide 13
  • Subband OSF VAD (2)
  • Slide 14
  • Subband OSF VAD (3)
  • Slide 15
  • Subband OSF VAD (4)
  • Slide 16
  • Subband OSF VAD (5)
  • Slide 17
  • Accurate VAD: open topics. New alternatives to improve performance: new decision criteria based on OS filters, already used for edge detection in images. Computational efficiency: development of computationally efficient algorithms.
  • Slide 18
  • Feature normalization. Objective: transform features to remove undesired variability. Linear techniques: CMS (cepstral mean subtraction) removes the effect of linear channel distortion; CMVN (cepstral mean and variance normalization) extends CMS to deal with the variance reduction caused by additive noise.
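The two linear techniques can be sketched in a few lines, assuming sentence-level statistics computed independently for each cepstral coefficient. The function names and the toy two-coefficient frames are illustrative.

```python
from statistics import mean, pstdev

def cms(frames):
    """Cepstral mean subtraction: remove the per-coefficient mean over all frames."""
    mu = [mean(col) for col in zip(*frames)]
    return [[x - m for x, m in zip(f, mu)] for f in frames]

def cmvn(frames):
    """Cepstral mean and variance normalization: zero mean, unit variance per coefficient.

    Assumes no coefficient is constant over the utterance (non-zero standard deviation).
    """
    cols = list(zip(*frames))
    mu = [mean(col) for col in cols]
    sd = [pstdev(col) for col in cols]
    return [[(x - m) / s for x, m, s in zip(f, mu, sd)] for f in frames]

# Toy utterance: three frames, two coefficients each.
frames = [[1.0, 10.0], [2.0, 30.0], [3.0, 20.0]]
centered = cms(frames)
normalized = cmvn(frames)
```

CMS fixes only the location of each coefficient's distribution; CMVN fixes location and scale. Neither changes the shape of the distribution.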
  • Slide 19
  • Feature normalization: non-linear feature distortion. Environment effects are non-linear for MFCC features and can hardly be removed with linear techniques, because not only the location (mean) and scale (variance) of the feature distributions are affected, but also the shape (affecting higher-order moments of the distribution). Non-linear extensions: CDF-matching approaches (HEQ and related) have proved more effective than linear ones, and they normalize more than the first two moments of the probability distributions.
  • Slide 20
  • CDF-matching based equalization. The main idea: transform the features to match a given PDF. In the one-dimensional case, CDF matching gives the solution.
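The one-dimensional solution can be written out explicitly. In the notation assumed here (not taken from the slides), C_x is the CDF of the observed feature x, C_ref is the CDF of the reference distribution, and y = T(x) is a monotonically increasing transformation whose output should follow the reference:

```latex
C_x(x) = P(X \le x) = P\bigl(T(X) \le T(x)\bigr) = C_{\mathrm{ref}}\bigl(T(x)\bigr)
\quad\Longrightarrow\quad
y = T(x) = C_{\mathrm{ref}}^{-1}\bigl(C_x(x)\bigr)
```

In practice C_x is unknown and has to be estimated from the data, for instance with a histogram or with order statistics.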
  • Slide 21
  • Equalization and robust classifiers
  • Slide 22
  • Invariance. CMS is invariant to additive bias. CMVN is invariant to linear transformations. Equalization to a reference distribution is invariant to any invertible transformation (including non-linear ones).
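These three invariances can be checked in a few lines. The notation is assumed for illustration: x is the original feature with mean μ_x, standard deviation σ_x and CDF C_x; x' is the distorted feature; C_ref is the reference CDF. The derivation assumes a positive gain a for CMVN and a strictly increasing distortion g for equalization.

```latex
\begin{aligned}
\text{CMS:}\quad & x' = x + b \;\Rightarrow\; \mu_{x'} = \mu_x + b,
  \quad x' - \mu_{x'} = x - \mu_x \\
\text{CMVN:}\quad & x' = a x + b,\ a > 0 \;\Rightarrow\;
  \mu_{x'} = a\mu_x + b,\ \sigma_{x'} = a\sigma_x,
  \quad \frac{x' - \mu_{x'}}{\sigma_{x'}} = \frac{x - \mu_x}{\sigma_x} \\
\text{Equalization:}\quad & x' = g(x),\ g \text{ strictly increasing} \;\Rightarrow\;
  C_{x'}(x') = P\bigl(g(X) \le g(x)\bigr) = C_x(x),
  \quad C_{\mathrm{ref}}^{-1}\bigl(C_{x'}(x')\bigr) = C_{\mathrm{ref}}^{-1}\bigl(C_x(x)\bigr)
\end{aligned}
```

In each case the normalized feature computed from x' equals the one computed from x, so the distortion is removed without ever being estimated explicitly.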
  • Slide 23
  • HEQ for robust speech recognition (1). A. de la Torre, A.M. Peinado, J.C. Segura, J.L. Pérez, C. Benítez and A.J. Rubio, "Histogram equalization of speech representation for robust speech recognition", IEEE Trans. on Speech and Audio Processing (to appear in 2005). Each component of the MFCC vector is transformed to a Gaussian reference. Cumulative distributions are estimated using histograms. Performance is compared with CMS, CMVN and model-based feature compensation (VTS). Combination with VTS.
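A minimal sketch of histogram-based equalization of a single coefficient trajectory to a standard Gaussian reference. The bin count, the choice of evaluating the CDF at bin centres, the function name and the toy data are assumptions for illustration and are not taken from the cited paper.

```python
from statistics import NormalDist

def heq_gaussian(track, n_bins=20):
    """Equalize one coefficient trajectory to N(0, 1) using a histogram CDF estimate."""
    lo, hi = min(track), max(track)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if the track is constant

    def bin_of(x):
        return min(int((x - lo) / width), n_bins - 1)

    counts = [0] * n_bins
    for x in track:
        counts[bin_of(x)] += 1
    # Cumulative probability at each bin centre: mass below the bin plus half its own mass.
    cdf, acc = [], 0
    for c in counts:
        cdf.append((acc + 0.5 * c) / len(track))
        acc += c
    ref = NormalDist(0.0, 1.0)
    # Only occupied bins are looked up, so the probabilities stay strictly inside (0, 1).
    return [ref.inv_cdf(cdf[bin_of(x)]) for x in track]

# Toy trajectory with a strongly skewed distribution.
track = [float(i * i) for i in range(50)]
equalized = heq_gaussian(track)
```

Samples that fall into the same bin map to the same output value, so the bin count trades resolution against the reliability of the CDF estimate.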
  • Slide 24
  • HEQ for robust speech recognition (2)
  • Slide 25
  • HEQ for robust speech recognition (3)
  • Slide 26
  • HEQ for robust speech recognition (4)
  • Slide 27
  • HEQ for robust speech recognition (5)
  • Slide 28
  • Segmental HEQ (1). J.C. Segura, C. Benítez, A. de la Torre, A.J. Rubio and J. Ramírez, "Cepstral Domain Segmental Nonlinear Feature Transformations for Robust Speech Recognition", IEEE Signal Processing Letters, 11(5), May 2004. A segmental implementation of HEQ for non-stationary noise: a temporal buffer is used for the histogram estimation instead of the full sentence. The algorithmic delay is T frames.
  • Slide 29
  • Segmental HEQ (2)
  • Slide 30
  • OSEQ: An efficient implementation (1). A very computationally efficient algorithm based on order statistics.
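A sketch of how an order-statistics equalizer might work, under assumptions: each sample is compared with a buffer of up to 2T+1 frames centred on it (hence a delay of T frames), its rank r in that buffer gives the CDF estimate (r - 0.5) / (buffer length), and the inverse Gaussian CDF maps that estimate to the output. The function name `oseq`, the tie handling and the toy ramp are illustrative choices, not details confirmed by the slides.

```python
from statistics import NormalDist

def oseq(track, T, ref=NormalDist(0.0, 1.0)):
    """Equalize each sample against a window of up to 2T+1 frames centred on it."""
    out = []
    for t, x in enumerate(track):
        window = track[max(0, t - T): t + T + 1]
        # 1-based rank of x in the window; tied values share the average rank.
        below = sum(v < x for v in window)
        equal = sum(v == x for v in window)  # counts x itself, so equal >= 1
        rank = below + 0.5 * (equal + 1)
        out.append(ref.inv_cdf((rank - 0.5) / len(window)))
    return out

# A slow upward drift: every interior sample is the median of its own window.
ramp = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
flat = oseq(ramp, T=1)
```

No histogram is built: each output needs only comparisons within the buffer, and the drift in the toy ramp is removed because each interior sample sits at the centre of its local distribution.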
  • Slide 31
  • OSEQ: An efficient implementation (2)
  • Slide 32
  • Feature normalization: open topics. Reference distribution: clean speech / Gaussian / others? Dynamic features normalization (Δ and ΔΔ): after, before or simultaneously [Obuchi, Stern, EUSP03]. Progressive normalization: not all MFCCs are equally affected, and they do not have equal discriminative power [de Wet, , ICASSP03]. Lower-order moments normalization [Hsu, Lee, ICASSP04]. Parametric techniques: current approaches are non-parametric [Haverinen, Kiss, EUSP03]. New applications: speaker independence and adaptation; multi-stream normalization.
  • Slide 33
  • Combination of techniques. Development of a combined robust front-end: an accurate VAD, for noise parameter estimation; a noise reduction technique (spectral subtraction or Wiener filter; statistical feature compensation); a frame-dropping algorithm, to discard non-speech frames; and a feature normalization block, for residual non-linear distortion compensation.
  • Slide 34
  • VAD (1). Development of a combined robust front-end. [Block diagram: NOISY SPEECH, WIENER FILTER / SS, VAD, NOISE ESTIMATION, FRAME DROPPING, FEATURE EQUALIZATION, RECOGNIZER]
  • Slide 35
  • HIWIRE MEETING CRETE, SEPTEMBER 23-24, 2004 JOSÉ C. SEGURA LUNA GSTC UGR