ANGRY EMOTION DETECTION FROM REAL-LIFE CONVERSATIONAL SPEECH BY LEVERAGING CONTENT STRUCTURE


Wooil Kim and John H. L. Hansen

Presented by Chun-Yu Chen

Outline

• Real conversational speech corpus
• TEO-CB-AUTO-ENV
• Emotional language model score
• Experimental results

Real conversational speech corpus

• Neutral speech
  • digits, letters, and other words (First, July, August)
  • specific information
• Angry speech
  • negative words (not, no, can’t, even, how)
  • complaints
  • others (that, this, here)


TEO-CB-AUTO-ENV

• one of the acoustic features used for angry speech detection
• designed to represent the nonlinear characteristics of voiced sound production (e.g., vowels)
• the resulting vector of area coefficients has been shown to be large for neutral speech (a sketch of the computation follows this list)
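The slides do not show the extraction itself, so here is a minimal Python sketch of the usual TEO-CB-Auto-Env pipeline: critical-band filtering, the Teager energy operator, then the area under the normalized autocorrelation envelope of each band's energy profile. The band edges, filter order, and lag range are illustrative assumptions, not values from the paper.

```python
import numpy as np
from scipy.signal import butter, lfilter

def teager_energy(x):
    # Teager Energy Operator: psi[x(n)] = x(n)^2 - x(n-1) * x(n+1)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

def teo_cb_auto_env(frame, sr, bands):
    """One area coefficient per critical band for a single speech frame.

    bands: list of (low_hz, high_hz) edges. The partition, the filter
    order, and the lag range are illustrative choices, not values from
    the paper.
    """
    coeffs = []
    for lo, hi in bands:
        # Band-pass filter the frame down to one critical band.
        b, a = butter(4, [lo / (sr / 2), hi / (sr / 2)], btype="band")
        band_sig = lfilter(b, a, frame)
        # Teager energy profile of the band-limited signal.
        teo = teager_energy(band_sig)
        # Normalized autocorrelation envelope of the TEO profile.
        ac = np.correlate(teo, teo, mode="full")[len(teo) - 1:]
        ac = ac / (ac[0] + 1e-12)
        # Discrete area under the envelope over half the lag range;
        # this per-band area is the coefficient the slide refers to.
        coeffs.append(np.sum(ac[: len(ac) // 2]))
    return np.array(coeffs)
```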

Emotional language model score

• two types of combination methods
  1. feature combination
    • the MFCC feature vector is appended to the TEO-CB-Auto-Env feature vector
  2. classifier combination
    • the likelihood scores from both classifiers are combined with a scale factor (see the sketch below)
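The slide only names the method; below is a minimal sketch of what combining the likelihood scores with a scale factor could look like, assuming a weighted sum in the log domain. The exact fusion rule and the alpha value are assumptions, not details from the paper.

```python
def fused_score(ll_mfcc, ll_teo, alpha):
    # Log-domain fusion: MFCC classifier score plus the TEO-CB-Auto-Env
    # classifier score weighted by a scale factor alpha.
    return ll_mfcc + alpha * ll_teo

def classify(ll_mfcc, ll_teo, alpha=0.5):
    """Pick the class (neutral vs. angry) with the higher fused score.

    ll_mfcc / ll_teo: dicts mapping class name -> log-likelihood from
    the two classifiers; alpha=0.5 is a placeholder to be tuned on
    development data.
    """
    return max(ll_mfcc,
               key=lambda c: fused_score(ll_mfcc[c], ll_teo[c], alpha))
```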

• “Emotional” language models
  • based on an initial language model with a large vocabulary (HUB4)
  • adapted using the transcripts of neutral and angry speech
  • HTK and the CMU-Cambridge SLM toolkit are used to adapt the initial language model
  • the scores from the two adapted models form a 2-dimensional feature vector used as a “lexical” feature (see the sketch below)
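As a rough illustration of that 2-dimensional lexical feature, the sketch below scores a transcript with two language models, one adapted toward each emotion class. A toy add-one bigram model stands in for the HUB4-based models adapted with HTK and the CMU-Cambridge SLM toolkit, and the per-word normalization is an assumption, not a detail from the slides.

```python
import math
from collections import Counter

class BigramLM:
    # Toy add-one-smoothed bigram LM; only the scoring step matters here.
    def __init__(self, sentences):
        self.uni, self.bi = Counter(), Counter()
        for s in sentences:
            toks = ["<s>"] + s.lower().split()
            self.uni.update(toks[:-1])            # history counts
            self.bi.update(zip(toks, toks[1:]))   # bigram counts
        self.vocab = max(len(self.uni), 1)

    def logprob(self, sentence):
        toks = ["<s>"] + sentence.lower().split()
        return sum(math.log((self.bi[(a, b)] + 1.0)
                            / (self.uni[a] + self.vocab))
                   for a, b in zip(toks, toks[1:]))

def emls_feature(transcript, lm_neutral, lm_angry):
    # 2-D "lexical" feature: per-word log-likelihood of the transcript
    # under the neutral-adapted and angry-adapted language models.
    n = max(len(transcript.split()), 1)
    return (lm_neutral.logprob(transcript) / n,
            lm_angry.logprob(transcript) / n)
```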


Emotional language model score

• Collected data
  • 15 female and 13 male speakers
  • 136 segments of neutral speech and 124 segments of angry speech
  • each segment is 3-6 sec long

Experimental results

• Two types of model for testing
  1. Open-speaker
    • the model is trained on all data except the tester's (see the leave-one-speaker-out sketch after this list)
  2. Closed-speaker
    • the data is split into two parts
    • the tester speaks only the utterances in part A
    • the model is trained on part B
    • better performance is obtained by including more data
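A minimal sketch of the open-speaker protocol, i.e. leave-one-speaker-out evaluation; the (speaker, features, label) segment layout is hypothetical, used only for illustration.

```python
def open_speaker_splits(segments):
    # Each speaker in turn is held out for testing while the model is
    # trained on every other speaker's segments.
    segments = list(segments)
    for test_spk in sorted({spk for spk, _, _ in segments}):
        train = [s for s in segments if s[0] != test_spk]
        test = [s for s in segments if s[0] == test_spk]
        yield test_spk, train, test
```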

• Without EMLS
  • MFCC-EDZ is the best single feature

• With EMLS
  (results shown as tables in the original slides)