ANGRY EMOTION DETECTION FROM REAL-LIFE CONVERSATIONAL SPEECH BY LEVERAGING CONTENT STRUCTURE
Wooil Kim and John H. L. Hansen
Chun-Yu Chen
Outline
• Real conversational speech corpus
• TEO-CB-AUTO-ENV
• Emotional language model score
• Experimental results
Real conversational speech corpus
• Neutral speech
  • digits, letters, and other words (First, July, August)
  • specific information
• Angry speech
  • negative words (not, no, can’t, even, how)
  • complaints
  • others (that, this, here)
TEO-CB-AUTO-ENV
• one of the acoustic features used for angry speech detection
• designed to represent nonlinear characteristics of voiced sound production (e.g., vowels)
• the resulting vector of area coefficients has been shown to be larger for neutral speech than for angry speech (a sketch of the feature follows)
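The deck gives no implementation details, so the following is a minimal Python sketch of the TEO-CB-Auto-Env idea: band-pass a voiced frame into critical bands, apply the Teager Energy Operator, and take the area under the envelope of each band's normalized autocorrelation. The band edges, filter order, and Hilbert-based envelope estimate here are illustrative assumptions, not the published configuration.

```python
import numpy as np
from scipy.signal import butter, sosfilt, hilbert

def teager_energy(x):
    # Teager Energy Operator: psi[x(n)] = x(n)^2 - x(n-1) * x(n+1)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

def teo_cb_auto_env(frame, fs, bands):
    """Per critical band: band-pass filter, apply the TEO, then take
    the area under the envelope of the normalized autocorrelation."""
    feats = []
    for lo, hi in bands:
        sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
        teo = teager_energy(sosfilt(sos, frame))
        ac = np.correlate(teo, teo, mode="full")[len(teo) - 1:]  # lags >= 0
        ac = ac / (ac[0] + 1e-12)          # normalized autocorrelation
        env = np.abs(hilbert(ac))          # crude envelope estimate
        feats.append(np.trapz(env))        # area under the envelope
    return np.array(feats)

# illustrative low critical bands (Hz); the actual feature uses the
# full Bark-scale filterbank
bands = [(100, 200), (200, 300), (300, 400), (400, 510)]
fs = 8000
frame = np.random.randn(int(0.025 * fs))   # stand-in for a 25 ms voiced frame
print(teo_cb_auto_env(frame, fs, bands))
```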
Emotional language model score
• Two types of combination methods (a minimal sketch of both follows this list):
  1. Feature combination: the MFCC feature vector is appended to the TEO-CB-Auto-Env feature vector.
  2. Classifier combination: the likelihood scores from both classifiers are combined with a scale factor.
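A minimal sketch of the two strategies, assuming per-frame MFCC and TEO-CB-Auto-Env vectors and per-classifier log-likelihood scores are already computed; the scale factor alpha is a hypothetical tuning parameter, not a value reported in the deck.

```python
import numpy as np

# 1. Feature combination: append the MFCC vector to the
#    TEO-CB-Auto-Env vector and train one classifier on the joint feature.
def combine_features(mfcc_vec, teo_vec):
    return np.concatenate([mfcc_vec, teo_vec])

# 2. Classifier combination: mix the two classifiers' log-likelihood
#    scores with a scale factor (alpha would be tuned on held-out data).
def combine_scores(loglik_mfcc, loglik_teo, alpha=0.5):
    return alpha * loglik_mfcc + (1.0 - alpha) * loglik_teo

# e.g. decide "angry" when the combined angry-model score wins:
# is_angry = combine_scores(s_mfcc_angry, s_teo_angry) > \
#            combine_scores(s_mfcc_neutral, s_teo_neutral)
```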
• “Emotional” language models
  • based on an initial language model with a large vocabulary (HUB4)
  • adapted using the transcripts of neutral and angry speech
  • the HTK and CMU-Cambridge SLM toolkits are used to adapt the initial language model
  • the scores under the two adapted models formulate a 2-dimensional feature vector as a “lexical” feature (sketched below)
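A toy sketch of how the 2-dimensional lexical feature could be formed: score the recognized word string under both adapted language models. An add-one-smoothed bigram model stands in here for the adapted HUB4 models, and the training sentences are invented for illustration.

```python
import math
from collections import Counter

class BigramLM:
    """Toy add-one-smoothed bigram LM; a stand-in for the HUB4-based
    models adapted with the HTK and CMU-Cambridge SLM toolkits."""
    def __init__(self, sentences):
        self.uni, self.bi = Counter(), Counter()
        for s in sentences:
            toks = ["<s>"] + s.split() + ["</s>"]
            self.uni.update(toks)
            self.bi.update(zip(toks, toks[1:]))
        self.vocab = len(self.uni)

    def logprob(self, sentence):
        toks = ["<s>"] + sentence.split() + ["</s>"]
        return sum(
            math.log((self.bi[(a, b)] + 1) / (self.uni[a] + self.vocab))
            for a, b in zip(toks, toks[1:])
        )

# one LM adapted with neutral transcripts, one with angry transcripts
neutral_lm = BigramLM(["the flight leaves in july", "my number is one two"])
angry_lm = BigramLM(["no i can't even", "this is not what i asked"])

# the 2-dimensional "lexical" feature for a recognized word string
hyp = "no this is not right"
print((neutral_lm.logprob(hyp), angry_lm.logprob(hyp)))
```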
Emotional language model score
• Collected data
  • 15 female and 13 male speakers
  • 136 segments of neutral speech and 124 segments of angry speech
  • each segment is 3-6 seconds long
Experimental results
• Two types of models for testing (the open-speaker split is sketched after this list):
  1. Open-speaker: the model is trained on all data except the test speaker’s.
  2. Closed-speaker: each speaker’s data is split into two parts; the test speaker speaks only utterances from part A, and the model is trained on part B. Performance improves because more data, including the test speaker’s, is used for training.
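A minimal sketch of the open-speaker protocol, assuming each segment carries a speaker ID; the closed-speaker variant would instead split each speaker’s own segments between parts A and B.

```python
import numpy as np

def open_speaker_splits(speaker_ids):
    """Open-speaker protocol: for each test speaker, train on every
    other speaker's segments (leave-one-speaker-out)."""
    speaker_ids = np.asarray(speaker_ids)
    for spk in np.unique(speaker_ids):
        test = speaker_ids == spk
        yield np.where(~test)[0], np.where(test)[0]

# e.g. four segments from two speakers
for train_idx, test_idx in open_speaker_splits(["A", "A", "B", "B"]):
    print(train_idx, test_idx)
```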
• Without EMLS
  • MFCC-EDZ is the best single feature
Experimental results
• With EMLS