RCC-Mean Subtraction Robust Feature and Comparison of Various Feature-Based Methods for Robust Speech Recognition in the Presence of Telephone Noise
Amin Fazel
Hossein Sameti, Mohammad T. Manzuri
February 2005
Computer Engineering Department, Sharif University of Technology
Wednesday, February 18, 2005
Computer Engineering Department, Sharif University of Technology
Outline
• Introduction
• Feature-based methods
  – MFCC, RCC, CMN, PLP, RASTA
• Mean Normalization Root Cepstral Coefficients
• Experimental Results
  – Experiment 1: Sharif CSR and the TFARSDAT database
  – Experiment 2: HTK CSR and the AURORA 2 database
• Summary
Effect of Noise on ASR
• Two phases in most ASR systems
  – Training
  – Operating (testing)
• Mismatch causes a reduction in accuracy
• Mismatch occurs because of
  – Environment
    • Microphone, babble, distance, transmission channel
  – Speaker
    • A specific speaker: speaking rate, …
    • Various speakers: gender, age, accent, …
Effect of Noise on ASR
• Noise
  – Additive noise (stationary or non-stationary)
    • Babble, car, subway
    • Exhibition, office, …
  – Convolutional noise
    • Channel, telephone line
    • Microphone effect
    • Distance from the speaker to the microphone
  – Others
    • Lombard effect, reflections from buildings
Effect of Noise on ASR
• Simple model: the corrupted speech is the clean speech convolved with the channel (convolutional noise), plus additive noise:

  y(n) = s(n) * h(n) + v(n)

  [Diagram: Clean Speech → Convolutional Noise → (+ Additive Noise) → Corrupted Speech]
• Robust speech recognition is the study of building speech recognizers that handle mismatched conditions.
Robustness Methods
• Signal domain
  – Speech enhancement
• Feature domain
  – Robust feature extraction
• Model domain
  – Changing the model parameters
  – Model training
• [Diagram: training phase: speech signal → feature extraction → features → model training → model; testing phase: speech signal → feature extraction → features → model]
Mel-Frequency Cepstral Coefficient
• Compute the magnitude squared of the Fourier transform
• Apply triangular frequency weights that represent the effects of peripheral auditory frequency resolution
• Take the log of the outputs (for RCC, a root is taken instead of the log)
• Compute cepstra using the discrete cosine transform
• Smooth by dropping the higher-order coefficients
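The steps above can be sketched in NumPy. The filterbank layout and the parameter choices (24 filters, 13 coefficients, 8 kHz sampling) are illustrative assumptions, not the exact configuration used in the experiments.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular weights, equally spaced on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mfcc_frame(frame, sr, n_filters=24, n_ceps=13):
    n_fft = len(frame)
    power = np.abs(np.fft.rfft(frame)) ** 2            # magnitude squared
    energies = mel_filterbank(n_filters, n_fft, sr) @ power
    log_e = np.log(energies + 1e-10)                   # log (root for RCC)
    # DCT-II; keeping only the low-order coefficients smooths the envelope.
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), n + 0.5) / n_filters)
    return dct @ log_e
```
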
Temporal processing
• To capture the temporal features of the spectral envelope and to provide robustness:
  – Delta features: first- and second-order differences; regression
  – Cepstral Mean Subtraction (CMS)
    • Normalizes channel effects and adjusts for spectral slope
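Both operations are a few lines of NumPy; the simple first-order difference below stands in for the regression-based delta computation.

```python
import numpy as np

def cepstral_mean_subtraction(cepstra):
    """Subtract the per-utterance mean of each cepstral dimension.

    A stationary convolutional channel appears as a constant additive
    offset in the cepstral domain, so removing the mean removes it.
    cepstra: array of shape (n_frames, n_ceps).
    """
    return cepstra - cepstra.mean(axis=0, keepdims=True)

def delta_features(cepstra):
    """First-order frame-to-frame differences (a simple stand-in for
    the regression-based delta computation)."""
    return np.diff(cepstra, axis=0, prepend=cepstra[:1])
```
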
Perceptual Linear Prediction (PLP)
• Compute the magnitude squared of the Fourier transform
• Apply triangular frequency weights that represent the effects of peripheral auditory frequency resolution
• Apply compressive nonlinearities
• Compute the discrete cosine transform
• Smooth using autoregressive modeling
• Compute cepstra using a linear recursion
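The last step, turning the all-pole (autoregressive) model into cepstra with a linear recursion, can be sketched as follows. This is the standard LPC-to-cepstrum recursion, shown as an illustration rather than the exact routine used in the experiments.

```python
import numpy as np

def lpc_to_cepstrum(a, n_ceps):
    """Cepstra of the all-pole model H(z) = 1 / (1 + sum_k a[k-1] z^-k),
    via the standard linear recursion
        c_n = -a_n - sum_{k=1}^{n-1} (k/n) c_k a_{n-k}
    with a_n taken as 0 beyond the model order.
    """
    p = len(a)
    c = np.zeros(n_ceps)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(1, n):
            a_nk = a[n - k - 1] if (n - k) <= p else 0.0
            acc += (k / n) * c[k - 1] * a_nk
        c[n - 1] = -acc
    return c
```

For a first-order model 1/(1 + 0.5 z⁻¹) the recursion reproduces the series expansion of −ln(1 + 0.5 z⁻¹), i.e. c₁ = −0.5, c₂ = 0.125, c₃ = −0.125/3.
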
PLP (Cont.)
• Algorithm (processing pipeline):
  1. Speech signal
  2. Critical-band analysis
  3. Equal-loudness pre-emphasis
  4. Intensity-loudness conversion
  5. Inverse DFT
  6. Autoregressive coefficients (all-pole model)
RelAtive SpecTral Analysis
• Makes PLP (and possibly also other short-term-spectrum-based techniques) more robust to linear spectral distortions
• The new spectral estimate is less sensitive to slow variations in the short-term spectrum
• Filters the temporal trajectories of some function of each of the spectral values, to provide more reliable spectral features
  – The filter is usually a bandpass filter, maintaining the linguistically important spectral envelope modulations (1–16 Hz)
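The classic RASTA bandpass filter from the literature can be sketched directly (up to an overall delay). The pole value is a tunable smoothing parameter; values around 0.94–0.98 appear in published implementations, and 0.94 here is an assumption.

```python
import numpy as np

def rasta_filter(traj, pole=0.94):
    """Apply the RASTA bandpass filter to one temporal trajectory.

    Transfer function (from the RASTA literature, delay omitted):
        H(z) = 0.1 * (2 + z^-1 - z^-3 - 2 z^-4) / (1 - pole * z^-1)
    The numerator has a zero at DC, so constant (channel-like)
    components are removed while speech-range modulations are kept.
    """
    num = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])
    y = np.zeros(len(traj))
    for n in range(len(traj)):
        ff = sum(num[k] * traj[n - k] for k in range(5) if n - k >= 0)
        y[n] = ff + (pole * y[n - 1] if n > 0 else 0.0)
    return y
```

A quick sanity check: feeding in a constant trajectory (a pure channel offset) yields an output that decays toward zero.
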
RASTA (Cont.)
⢠Algorithm
• Algorithm (processing pipeline):
  1. Speech signal
  2. Spectral analysis
  3. Bank of compressing static nonlinearities
  4. Bank of linear bandpass filters
  5. Bank of expanding static nonlinearities
  6. Optional further processing
RASTA-PLP
• Algorithm: the RASTA bandpass filtering applied within the PLP analysis chain
RCC-Mean Normalization
• Root Cepstral Coefficients (RCC)
  – Derived using root compression rather than log compression on the filterbank energies
• Advantages of RCC over MFCC
  – More immune to noise
  – Faster decoding
  e[j] = (1/N) * Σ_k w_j[k] S~[k],                         j = 1, 2, …, P

  RCC[i] = Σ_{j=1..P} (e[j])^γ cos(i (j − 0.5) π / P),     i = 1, 2, …, m

  where S~[k] is the power spectrum, w_j[k] the j-th triangular filter, P the number of filters, and γ the compression root.
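The RCC computation (root-compressed filterbank energies followed by a cosine transform) can be sketched as below. The root value γ is a tunable parameter; 0.1 is a common choice in the root-cepstrum literature but is an assumption here.

```python
import numpy as np

def rcc(energies, n_ceps, gamma=0.1):
    """Root cepstral coefficients from mel filterbank energies e[1..P]:
        RCC[i] = sum_j (e[j])**gamma * cos(i * (j - 0.5) * pi / P)
    i.e. the MFCC pipeline with the log replaced by root compression.
    """
    P = len(energies)
    j = np.arange(1, P + 1)
    comp = np.asarray(energies, dtype=float) ** gamma
    return np.array([np.sum(comp * np.cos(i * (j - 0.5) * np.pi / P))
                     for i in range(1, n_ceps + 1)])
```

By DCT orthogonality, a flat energy vector yields all-zero coefficients, which makes a convenient sanity check.
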
RCC-Mean Normalization
• Mean normalization:

  C_MN-RCC;i = C_y;i − C_y;avg,   i = 1, …, N
  C_y;avg = (1/N) * Σ_{i=1..N} C_y;i

• If we approximate the root with the logarithm, then for a convolutional channel y(n) = s(n) * h(n) the cepstra are additive:

  C_y = C_s + C_h
  C_y;avg = C_s;avg + C_h;avg

  C_MN-RCC;i = C_y;i − C_y;avg = (C_s;i + C_h) − (C_s;avg + C_h) = C_s;i − C_s;avg

  so the constant channel term C_h cancels.
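The cancellation of the channel term under mean normalization can be checked numerically with synthetic cepstra; the data below is random and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
clean = rng.normal(size=(50, 13))   # hypothetical clean-speech cepstra C_s
channel = rng.normal(size=13)       # constant channel offset C_h
corrupted = clean + channel         # log approximation: C_y = C_s + C_h

def mean_normalize(c):
    return c - c.mean(axis=0)

# After mean normalization the corrupted cepstra equal the clean ones:
assert np.allclose(mean_normalize(corrupted), mean_normalize(clean))
```
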
Experiment 1
• Database
  – TFARSDAT
    • 64 speakers
    • 8 hours of telephone speech data
• ASR
  – Sharif ASR system
    • HMM-based
    • Training: segmental k-means
    • Search: beam Viterbi
Experiment 1
• Test results

  Feature     | Accuracy | Correctness
  ------------|----------|------------
  MFCC        | 54.97%   | 59.32%
  MFCC_CMS    | 51.62%   | 56.63%
  RASTA_PLP   | 58.38%   | 65.59%
  RCC         | 55.67%   | 59.85%
  RCC_MN      | 56.89%   | 64.31%
Experiment 2
• Aurora 2.0
  – Noisy connected-digits recognition
  – 4 hours of training data, 2 hours of test data, over 70 noise-type/SNR conditions
• HTK
  – HMM-based
  – One model per digit
    • 16 states with 3 Gaussian mixtures
Experiment 2
• Average results on AURORA
  – The average obtained over the various SNRs of each noise type
Experiment 2
• Subway noise at various SNRs
Experiment 2
• Babble noise at various SNRs
Experiment 2
• Car noise at various SNRs
Experiment 2
• Exhibition noise at various SNRs
Summary
• Various robust features were tested
• RCC_MN was introduced
• In the first experiment
  – RASTA-PLP performed best, although RCC_MN is also good
• In the second experiment
  – RCC_MN gave the best results
Thanks for your patience!