CINEMO – A French Spoken Language Resource for Complex Emotions: Facts and Baselines Björn Schuller , Riccardo Zaccarelli, Nicolas Rollet, Laurence Devillers CNRS-LIMSI Spoken Language Processing Group Orsay, France Thursday 20th May 2010, 12.25-12.45 PM, O21 - Emotion, Sentiment

Page 1: CINEMO –  A French Spoken Language Resource for Complex Emotions: Facts and Baselines

CINEMO – A French Spoken Language Resource for Complex Emotions: Facts and Baselines
Björn Schuller, Riccardo Zaccarelli, Nicolas Rollet, Laurence Devillers
CNRS-LIMSI Spoken Language Processing Group, Orsay, France

Thursday 20th May 2010, 12.25-12.45 PM, O21 - Emotion, Sentiment

Page 2: CINEMO –  A French Spoken Language Resource for Complex Emotions: Facts and Baselines

Outline

• Introduction
• CINEMO Corpus Statistics
• Recognition of Complex Emotions
• Conclusions

Page 3: CINEMO –  A French Spoken Language Resource for Complex Emotions: Facts and Baselines

Models of Emotion

• Dimensional Model
    Orthogonal system: arousal, valence, dominance/potency, ...
    Ideally non-correlated
• Categorical Model
    Discrete affective states, e.g. “Big 6” (Ekman/MPEG-4)
    Assignable in emotion sphere
    “Intensity” turns category into dimension
• Complex Emotions
    “Soft” hit for several categories
    “Major / minor” emotion

[Figure: emotion plane e = [v, a]^T with valence v and arousal a, each ranging from -1.0 to 1.0; categories placed in the plane: Surprise, Joy, Anticipation, Acceptance, Neutrality, Sadness, Disgust, Anger, Fear]

Page 4: CINEMO –  A French Spoken Language Resource for Complex Emotions: Facts and Baselines

Databases – Nine Popular Examples

Corpus      Content            # Emotions    # Instances  h:mm  # Subjects  Type
ABC         German, fixed      6             431          1:15  8 (4 f)     acted
AVIC        English, variable  I (5)         3002         1:47  21 (10 f)   natural
DES         Danish, fixed      5             419          0:28  4 (2 f)     acted
EMO-DB      German, fixed      7             494          0:22  10 (5 f)    acted
eNTERFACE   English, fixed     6             1277         1:00  42 (8 f)    acted
SAL         English, variable  A/V           1692         1:41  4 (2 f)     natural
SmartKom    German, variable   (10)          3823         7:08  79 (47 f)   natural
SUSAS       English, fixed     (3)           3593         1:01  7 (3 f)     natural
VAM         German, variable   A/V/D (3x5)   946          0:47  47 (32 f)   natural

Page 6: CINEMO –  A French Spoken Language Resource for Complex Emotions: Facts and Baselines

Corpus Stats and Protocol

• Size
    3 992 instances after segmentation
    2:13:59 h net playtime
• Subjects
    51 speakers: 21 female (1 656 instances), 30 male (2 336 instances)
    4 age groups
    No professional actors
• Protocol
    Dubbing selected scenes from 12 French movies
    Broad coverage of emotions
    Situations close to everyday emotions (Rottenberg et al., 2007)
    Well suited to induce mood (Gerrards-Hesse et al., 1994)
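
The size figures on this slide can be cross-checked with a few lines of Python. This is only a sanity-check sketch: the numbers are the ones quoted on the slide, while the helper name `hms_to_seconds` and the derived mean duration per instance are additions made here, not values reported by the authors.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an 'h:mm:ss' string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds

# Figures quoted on the slide.
instances_by_gender = {"female": 1656, "male": 2336}
net_playtime = "2:13:59"

total_instances = sum(instances_by_gender.values())
total_seconds = hms_to_seconds(net_playtime)
mean_duration = total_seconds / total_instances

print(total_instances)           # 3992, matching the stated corpus size
print(round(mean_duration, 2))   # 2.01, i.e. about two seconds per instance
```

The per-gender counts add up to the stated corpus size, and the derived average of about two seconds is consistent with the later remark that shorter segments were preferred.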

Page 7: CINEMO –  A French Spoken Language Resource for Complex Emotions: Facts and Baselines

A Dozen Movies

• Good Blend to Cover Emotions
    Extrapolation of interpersonal behavior patterns
    Affective Computing
• Areas of Application
    Interpretation of the user intention
    Accommodation in the communication
    Objective measurement
    Transmission of emotion
    Emotional adaptation
    Multimedia Retrieval
    Video gaming and entertainment
    Surveillance
    Encoding

Page 8: CINEMO –  A French Spoken Language Resource for Complex Emotions: Facts and Baselines

Movies

• “Karaoke”
    Participants superpose their voice on the actor’s
    Actor’s voice audible or muted
    Dialog/pauses shown as in karaoke
    Current word highlighted
    Spoken interactions, natural contexts
• Example Scene: “Chaos”
    Affective state: sadness, disappointment
    Description: speaker reports humiliating behavior of boyfriend
    Degree of involvement: highly involved
    Type of action: storytelling
    Implied temporalities: recent past

Page 9: CINEMO –  A French Spoken Language Resource for Complex Emotions: Facts and Baselines

Scenes and Roles

• Numbers
    29 scenes, 1 or 2 players at a time: 14 male, 7 female, 6 mixed gender, 2 female–female scenes
    31 roles: 14 female and 17 male
• Scene Repetition
    Each scene could be repeated
    Number of occurrences per attempt: 1 945 (first), 1 518 (second), 433 (third), 84 (fourth), 12 (fifth)
    Mean number of scene repetitions: 1.67
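
The repetition statistics can be reproduced from the per-attempt counts. The sketch below assumes that the reported mean is the average attempt index weighted by the number of instances per attempt; that reading is an assumption made here, but it does reproduce the quoted value.

```python
# Number of instances per attempt, as quoted on the slide.
occurrences = {1: 1945, 2: 1518, 3: 433, 4: 84, 5: 12}

total = sum(occurrences.values())
mean_attempt = sum(attempt * n for attempt, n in occurrences.items()) / total

print(total)                    # 3992, the full corpus
print(round(mean_attempt, 2))   # 1.67
```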

Page 10: CINEMO –  A French Spoken Language Resource for Complex Emotions: Facts and Baselines

A Linguistic Perspective

• N-Gram Frequencies
    119 turns with 1 609 words
    Vocabulary size of 562
    4.4 graphemes on average
    Uni-grams “c’” (this), “est” (is), and “j’” (I) > 50 times
    Bi-gram “c’est” > 10 times
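
Counts of this kind could be obtained along the following lines. This is a minimal sketch, not the authors' tooling: the tokenization rule (elided clitics such as "c'" and "j'" kept as separate tokens) is inferred from the uni-grams listed above, and the example turns are invented for illustration.

```python
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    """Lower-case and split into word tokens, keeping elided clitics
    such as "c'" and "j'" as separate tokens."""
    return re.findall(r"[^\W\d_]+['’]?", text.lower())

def ngram_counts(tokens: list[str], n: int) -> Counter:
    """Count all n-grams of length n in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Hypothetical turns for illustration only; not taken from the corpus.
turns = ["C'est vrai, c'est tout.", "J'ai dit que c'est fini."]

unigrams, bigrams = Counter(), Counter()
for turn in turns:
    tokens = tokenize(turn)          # counted per turn, so no n-gram spans two turns
    unigrams.update(ngram_counts(tokens, 1))
    bigrams.update(ngram_counts(tokens, 2))

print(unigrams[("c'",)])         # 3
print(bigrams[("c'", "est")])    # 3
```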

Page 11: CINEMO –  A French Spoken Language Resource for Complex Emotions: Facts and Baselines

Segmentation and Annotation

• Sequential Processing
    At present complete annotation by 2 experienced labelers:
        L1: male, 31 years; L2: female, 26 years
    2 strategies intentionally followed:
        L1 provided with sequential order, manually segmented audio
        L2 provided with single instances in random order for verification
• Balanced Segmentation Interests
    Syntax, pragmatics, stationarity of major emotion
    Shorter segments preferred
    Predominant non-linguistic vocalizations as boundaries
    After segmentation: min. 24, max. 189, median 74, std. dev. 41 instances per speaker

Page 12: CINEMO –  A French Spoken Language Resource for Complex Emotions: Facts and Baselines

Segmentation and Annotation

• Labelling per Instance
    Speaker ID/gender, movie ID, attempt, running ID, begin/end time
    Major and minor emotion attribute (16 options)
    Mood (7 options: amusement, irritation, neutrality, embarrassment, positivity, stress, timidity; κ = 0.41)
    6 Dimensions: 3 states

Page 13: CINEMO –  A French Spoken Language Resource for Complex Emotions: Facts and Baselines

Annotation

• Major and Minor
    Frequencies per labeller

Page 14: CINEMO –  A French Spoken Language Resource for Complex Emotions: Facts and Baselines

Annotation

• Major and Minor
    Heat map of pairs
    Potentially 256 combinations, 118 found in the set
    Strong presence of blended emotions
    Full agreement on major/minor: 105 combinations, 2 091 instances, i.e. half of the corpus
    Blended emotions well identifiable

Page 15: CINEMO –  A French Spoken Language Resource for Complex Emotions: Facts and Baselines

Annotation

• Distribution of Dimensions
    Typical imbalance in favor of negative valence

Page 16: CINEMO –  A French Spoken Language Resource for Complex Emotions: Facts and Baselines

Annotation

• Agreement Dimensions
    Monotonic increase from unweighted to quadratic kappa: label confusions preferably in neighboring classes
    Apart from suddenness, good concurrence at κ ≥ 0.4
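
The step from unweighted to quadratic kappa can be illustrated with a small implementation of Cohen's kappa with optional linear or quadratic disagreement weights. This is a sketch under assumptions: the slides do not say how kappa was computed, the function name `weighted_kappa` is chosen here, and the two rating sequences are invented. The states are coded 0, 1, 2 to match the three states per dimension.

```python
def weighted_kappa(a, b, n_classes, weighting="none"):
    """Cohen's kappa for two raters over ordinal classes 0..n_classes-1.

    weighting: "none" (unweighted), "linear" or "quadratic".
    """
    n = len(a)
    # Observed joint distribution of the two raters' labels.
    observed = [[0.0] * n_classes for _ in range(n_classes)]
    for x, y in zip(a, b):
        observed[x][y] += 1.0 / n
    row = [sum(observed[i]) for i in range(n_classes)]
    col = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]

    def disagreement(i, j):
        if weighting == "none":
            return float(i != j)
        d = abs(i - j) / (n_classes - 1)
        return d if weighting == "linear" else d * d

    obs = sum(disagreement(i, j) * observed[i][j]
              for i in range(n_classes) for j in range(n_classes))
    exp = sum(disagreement(i, j) * row[i] * col[j]
              for i in range(n_classes) for j in range(n_classes))
    return 1.0 - obs / exp

# Hypothetical ratings in which the raters only ever differ by one step.
l1 = [0, 0, 1, 1, 2, 2, 1, 0, 2, 1]
l2 = [0, 1, 1, 2, 2, 1, 1, 0, 2, 0]

k_none = weighted_kappa(l1, l2, 3, "none")
k_lin = weighted_kappa(l1, l2, 3, "linear")
k_quad = weighted_kappa(l1, l2, 3, "quadratic")
print(k_none < k_lin < k_quad)   # True
```

Because every disagreement in the toy data falls between neighboring states, the weighted variants penalize it less than the unweighted one, which is the pattern the slide describes.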

Page 17: CINEMO –  A French Spoken Language Resource for Complex Emotions: Facts and Baselines

Recognition of Complex Emotions

Page 18: CINEMO –  A French Spoken Language Resource for Complex Emotions: Facts and Baselines

Data Partitioning

• Train, Development, Test
    Foster easy reproducibility of results
    Proper definition of a development set
    Straightforward three-fold partitioning by speaker index:
        Train (≈40% / 21 speakers: ID 1–21)
        Development (≈30% / 15 speakers: ID 22–36)
        Test (≈30% / 15 speakers: ID 37–51)
    Strict speaker independence
    ‘Genuine’ results w/o previous fine-tuning on the test partition
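
The partitioning rule is simple enough to state as code. The ID ranges are the ones given on the slide; the function name `partition` and the partition labels are chosen here for illustration.

```python
def partition(speaker_id: int) -> str:
    """Map a CINEMO speaker ID (1..51) to its partition as defined on the slide."""
    if 1 <= speaker_id <= 21:
        return "train"
    if 22 <= speaker_id <= 36:
        return "development"
    if 37 <= speaker_id <= 51:
        return "test"
    raise ValueError(f"speaker ID out of range: {speaker_id}")

# Count speakers per partition.
sizes = {}
for sid in range(1, 52):
    name = partition(sid)
    sizes[name] = sizes.get(name, 0) + 1
print(sizes)   # {'train': 21, 'development': 15, 'test': 15}
```

Assigning whole speakers to exactly one partition is what guarantees the strict speaker independence mentioned above.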

Page 19: CINEMO –  A French Spoken Language Resource for Complex Emotions: Facts and Baselines

Acoustic Features

• openEAR
    openSMILE’s “base” set: 988 features
    Slight extension over INTERSPEECH 2009 Emotion Challenge
    Systematic brute-forcing
    19 functionals of 26 low-level descriptors
    SMA LP filtered
    Plus regression coeff’s
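
The brute-forcing scheme can be sketched in plain Python. This is an illustration, not the openSMILE implementation: it assumes that "regression coeff's" means one delta contour per low-level descriptor, which doubles the 26 descriptors and gives 19 × 26 × 2 = 988. The first-order difference used for the delta, the four functionals, and the contour names are simplified stand-ins.

```python
import statistics

def moving_average(x, width=3):
    """Simple moving average (SMA) low-pass filter; the window shrinks at the edges."""
    half = width // 2
    return [sum(x[max(0, i - half): i + half + 1]) / len(x[max(0, i - half): i + half + 1])
            for i in range(len(x))]

def delta(x):
    """First-order difference, a simplified stand-in for delta regression coefficients."""
    return [0.0] + [b - a for a, b in zip(x, x[1:])]

# A small subset of functionals for illustration (the slide's set has 19).
FUNCTIONALS = {"mean": statistics.fmean, "stdev": statistics.pstdev,
               "min": min, "max": max}

def brute_force_features(contours):
    """Apply every functional to every smoothed contour and its delta."""
    feats = {}
    for name, contour in contours.items():
        smoothed = moving_average(contour)
        for suffix, series in (("", smoothed), ("_de", delta(smoothed))):
            for fname, func in FUNCTIONALS.items():
                feats[f"{name}{suffix}_{fname}"] = func(series)
    return feats

# Hypothetical frame-level contours of two descriptors.
contours = {"f0": [110.0, 112.0, 115.0, 113.0], "energy": [0.2, 0.4, 0.5, 0.3]}
feats = brute_force_features(contours)

print(len(feats))     # 16 = 2 descriptors x 2 (contour + delta) x 4 functionals
print(19 * 26 * 2)    # 988, the size of the feature set on the slide
```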

Page 20: CINEMO –  A French Spoken Language Resource for Complex Emotions: Facts and Baselines

Problem Complexity

• Upper Bounds
    First major and minor emotions separately
        Max. 16 classes
    Then complex compound
        Max. 256 classes (quadratic number as order matters)
        Not all permutations occur
    Dependencies among labels have to be assumed: scripted recording protocol and in general
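
The upper bound of 256 follows from counting ordered (major, minor) pairs over 16 attributes. A short sketch makes this concrete. The label names `E01` to `E16` are placeholders, since the slides do not list all 16 attributes, and the five annotated instances are invented.

```python
from collections import Counter
from itertools import product

# Placeholder names; the slides do not list all 16 emotion attributes.
labels = [f"E{i:02d}" for i in range(1, 17)]

# Ordered (major, minor) pairs: order matters, and major == minor is counted too.
all_pairs = list(product(labels, repeat=2))
print(len(all_pairs))   # 256

def observed_pairs(majors, minors):
    """Count how often each (major, minor) pair occurs in an annotation."""
    return Counter(zip(majors, minors))

# Hypothetical annotation of five instances.
counts = observed_pairs(["E01", "E01", "E02", "E03", "E01"],
                        ["E02", "E02", "E01", "E03", "E16"])
print(len(counts))      # 4 distinct pairs out of 256 possible
```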

Page 21: CINEMO –  A French Spoken Language Resource for Complex Emotions: Facts and Baselines

Classification Strategy

• Alternatives
    Best fuzzy architecture for multiple labels: e.g. multi-task neural networks?
    Different weighting of major/minor emotion
    Comparison with the N-best result list?
• Chosen Way
    ‘Traditional’ Support Vector Machines
    Polynomial Kernel
    Pair-wise multi-class discrimination
    Sequential Minimal Optimization learning
    Training up-sampled in case of high class imbalance
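
The slides do not specify the up-sampling procedure. One common variant is random oversampling, which duplicates instances of the smaller classes until all classes are equally frequent. The sketch below assumes that variant. The class sizes are those of the ‘Fixed Minor’ example in this deck; they cover the whole task rather than the training partition alone, so they serve only as an illustration. The feature vectors are dummies.

```python
import random
from collections import Counter, defaultdict

def upsample(features, labels, seed=0):
    """Randomly duplicate instances of smaller classes until every class
    is as frequent as the largest one. Returns new (features, labels) lists."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(features, labels):
        by_class[y].append(x)
    target = max(len(items) for items in by_class.values())
    out_x, out_y = [], []
    for y, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for x in items + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y

# Class sizes of the 'Fixed Minor' task; feature vectors are dummy placeholders.
sizes = {"AMU-NEU": 79, "DEC-NEU": 204, "ENE-NEU": 359, "INQ-NEU": 202, "SAT-NEU": 106}
labels = [name for name, n in sizes.items() for _ in range(n)]
features = [[float(i)] for i in range(len(labels))]

x_up, y_up = upsample(features, labels)
print(Counter(y_up))   # every class now has 359 instances
```

Only the training partition would be balanced this way; development and test data keep their natural class distribution so that the reported recalls remain meaningful.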

Page 22: CINEMO –  A French Spoken Language Resource for Complex Emotions: Facts and Baselines

Three Examples

• ‘Fixed Minor’
    ‘Conventional’ case
    Minor emotion fixed as neutral
    Major emotion varied
    Full labeler agreement
    950 instances, 5 classes providing sufficient instances (major–minor, # instances):
        AMU–NEU (79)
        DEC–NEU (204)
        ENE–NEU (359)
        INQ–NEU (202)
        SAT–NEU (106)

Page 23: CINEMO –  A French Spoken Language Resource for Complex Emotions: Facts and Baselines

Three Examples

• ‘Fixed Major’
    Different blends of irritation
    Major emotion fixed as irritation
    Minor emotion varied
    Full labeler agreement
    607 instances, again 5 classes providing sufficient instances:
        ENE–COL (186)
        ENE–DEC (110)
        ENE–INQ (66)
        ENE–IRO (51)
        ENE–NEU (184)

Page 24: CINEMO –  A French Spoken Language Resource for Complex Emotions: Facts and Baselines

Three Examples

• ‘Fully Mixed’
    Full labeler agreement
    533 instances, again 5 classes providing sufficient instances:
        INQ–NEU (114)
        STR–INQ (63)
        ENE–COL (186)
        ENE–DEC (110)
        JOI–SUR (60)
    Examples in no stricter relation to each other
    But: demonstrate that feasible even in full major/minor mix

Page 25: CINEMO –  A French Spoken Language Resource for Complex Emotions: Facts and Baselines

Three Examples

• Results
    Weighted Average Recall (WAR, i.e. recognition rate)
    Unweighted Average Recall (UAR, reflects imbalance among classes)
    Area under the receiver operating characteristic curve (AUC)
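
The difference between WAR and UAR is easiest to see on an imbalanced task. The sketch below computes both from reference and predicted labels and applies them to a trivial classifier that always predicts the largest class. The class sizes are those of the ‘Fixed Minor’ example; the function name `war_uar` and the trivial classifier are illustrative additions, not results from the talk.

```python
from collections import Counter

def war_uar(reference, predicted):
    """Return (weighted average recall, unweighted average recall)."""
    totals = Counter(reference)
    hits = Counter(r for r, p in zip(reference, predicted) if r == p)
    war = sum(hits.values()) / len(reference)                       # overall accuracy
    uar = sum(hits[c] / totals[c] for c in totals) / len(totals)    # mean per-class recall
    return war, uar

# A trivial classifier that always predicts the majority class of the
# 'Fixed Minor' task (class sizes as quoted on the slide).
sizes = {"AMU": 79, "DEC": 204, "ENE": 359, "INQ": 202, "SAT": 106}
reference = [c for c, n in sizes.items() for _ in range(n)]
predicted = ["ENE"] * len(reference)

war, uar = war_uar(reference, predicted)
print(round(war, 3), round(uar, 3))   # 0.378 0.2
```

Always guessing the majority class already yields a WAR of about 38%, while the UAR stays at the five-class chance level of 20%, which is why UAR is the more informative figure under class imbalance.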

Page 26: CINEMO –  A French Spoken Language Resource for Complex Emotions: Facts and Baselines

Regression Baseline

• Results for Selected Dimensions
    Ground truth by mean of labellers
    All instances used
    Cross correlation (CC), mean linear error (MLE)
    Support Vector Regression
    Prediction can be used as features for complex emotions
    Highly imbalanced distribution
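
The two regression measures can be computed as follows. This sketch assumes that CC denotes the Pearson correlation coefficient and that MLE denotes the mean absolute difference between prediction and ground truth; the slides do not define either. The ratings and predictions are invented.

```python
import math

def cross_correlation(truth, pred):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(truth)
    mt, mp = sum(truth) / n, sum(pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(truth, pred))
    var_t = sum((t - mt) ** 2 for t in truth)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov / math.sqrt(var_t * var_p)

def mean_linear_error(truth, pred):
    """Mean absolute difference between ground truth and prediction."""
    return sum(abs(t - p) for t, p in zip(truth, pred)) / len(truth)

def labeller_mean(l1, l2):
    """Ground truth as the mean of the two labellers' ratings."""
    return [(a + b) / 2 for a, b in zip(l1, l2)]

# Hypothetical ratings of one dimension (three states coded 0, 1, 2).
l1 = [0, 1, 2, 1, 0]
l2 = [0, 2, 2, 1, 1]
truth = labeller_mean(l1, l2)          # [0.0, 1.5, 2.0, 1.0, 0.5]
pred = [0.2, 1.2, 1.8, 1.1, 0.7]       # hypothetical regressor output

cc = cross_correlation(truth, pred)
mle = mean_linear_error(truth, pred)
print(cc > 0.9, round(mle, 2))
```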

Page 27: CINEMO –  A French Spoken Language Resource for Complex Emotions: Facts and Baselines

Conclusions

Page 28: CINEMO –  A French Spoken Language Resource for Complex Emotions: Facts and Baselines

Conclusions

• Corpus for Complex Emotions
    Comparatively large CINEMO corpus
• Baselines
    First impressions on the challenge
• Future Directions
    … Future large resources with recordings ‘in the wild’
    Tailored classification architectures:
        Exploit the mutual information among major and minor emotions
        Complex ‘language models’ to reflect transition probabilities

Page 29: CINEMO –  A French Spoken Language Resource for Complex Emotions: Facts and Baselines

Thank you.

This work was partly funded by the ANR project Affective Avatar.