CINEMO – A French Spoken Language Resource for Complex Emotions: Facts and Baselines
Björn Schuller, Riccardo Zaccarelli, Nicolas Rollet, Laurence Devillers
CNRS-LIMSI Spoken Language Processing Group, Orsay, France
Thursday 20th May 2010, 12.25-12.45 PM, O21 - Emotion, Sentiment
Outline
• Introduction
• CINEMO Corpus Statistics
• Recognition of Complex Emotions
• Conclusions
Models of Emotion
• Dimensional Model
  - Orthogonal system: arousal, valence, dominance/potency, ...
  - Ideally non-correlated
• Categorical Model
  - Discrete affective states, e.g. “Big 6” (Ekman/MPEG-4)
  - Assignable in emotion sphere
  - “Intensity” turns category into dimension
• Complex Emotions
  - “Soft” hit for several categories
  - “Major / minor” emotion
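[Figure: two-dimensional emotion space e = [v, a]^T with valence v and arousal a, each ranging from -1.0 to 1.0. Categories placed in the plane: Surprise, Joy, Anticipation, Acceptance, Neutrality, Sadness, Disgust, Anger, Fear.]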
Databases – Nine Popular Examples
Corpus     Content            # Emotions   # Instances  h:mm  # Subjects  Type
ABC        German, fixed      6            431          1:15  8 (4 f)     acted
AVIC       English, variable  I (5)        3002         1:47  21 (10 f)   natural
DES        Danish, fixed      5            419          0:28  4 (2 f)     acted
EMO-DB     German, fixed      7            494          0:22  10 (5 f)    acted
eNTERFACE  English, fixed     6            1277         1:00  42 (8 f)    acted
SAL        English, variable  A/V          1692         1:41  4 (2 f)     natural
SmartKom   German, variable   (10)         3823         7:08  79 (47 f)   natural
SUSAS      English, fixed     (3)          3593         1:01  7 (3 f)     natural
VAM        German, variable   A/V/D (3x5)  946          0:47  47 (32 f)   natural
CINEMO Corpus Statistics
Corpus Stats and Protocol
• Size
  - 3 992 instances after segmentation
  - 2:13:59 h net playtime
• Subjects
  - 51 speakers: 21 female (1 656 instances), 30 male (2 336 instances)
  - 4 age groups
  - No professional actors
• Protocol
  - Dubbing selected scenes from 12 French movies
  - Broad coverage of emotions
  - Situations close to everyday emotions (Rottenberg et al., 2007)
  - Well suited to inducing mood (Gerrards-Hesse et al., 1994)
A Dozen Movies
• Good Blend to Cover Emotions
  - Extrapolation of interpersonal behavior patterns
  - Affective Computing
• Areas of Application
  - Interpretation of the user intention
  - Accommodation in the communication
  - Objective measurement
  - Transmission of emotion
  - Emotional adaptation
  - Multimedia retrieval
  - Video gaming and entertainment
  - Surveillance
  - Encoding
Movies
• “Karaoke”
  - Participants superpose their voice on the actor’s
  - Actor’s voice audible or muted
  - Dialog/pauses shown as a karaoke
  - Current word highlighted
  - Spoken interactions, natural contexts
• Example Scene: “Chaos”
  - Affective state: sadness, disappointment
  - Description: speaker reports humiliating behavior of boyfriend
  - Degree of involvement: highly involved
  - Type of action: storytelling
  - Implied temporalities: recent past
Scenes and Roles
• Numbers
  - 29 scenes, 1 or 2 players at a time: 14 male, 7 female, 6 mixed gender, 2 female–female scenes
  - 31 roles: 14 female and 17 male
• Scene Repetition
  - Each scene could be repeated
  - Number of occurrences per attempt: 1 945 (first), 1 518 (second), 433 (third), 84 (fourth), 12 (fifth)
  - Mean number of scene repetitions: 1.67
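The mean of 1.67 can be reproduced from the per-attempt counts if it is read as the instance-weighted average of the attempt index. That reading is an assumption, not something the slide states; the sketch below only checks that the listed numbers are consistent with it.

```python
# Instances per attempt number, as listed on the slide.
instances_per_attempt = {1: 1945, 2: 1518, 3: 433, 4: 84, 5: 12}


def mean_attempt(counts):
    """Instance-weighted mean of the attempt index (assumed definition)."""
    total = sum(counts.values())
    return sum(attempt * n for attempt, n in counts.items()) / total


total_instances = sum(instances_per_attempt.values())
print(total_instances)                                # 3992
print(round(mean_attempt(instances_per_attempt), 2))  # 1.67
```

The counts also add up to the 3 992 instances reported for the whole corpus.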
A Linguistic Perspective
• N-Gram Frequencies
  - 119 turns with 1 609 words
  - Vocabulary size of 562
  - 4.4 graphemes on average
  - Uni-grams “c’” (this), “est” (is), and “j’” (I) > 50 times
  - Bi-gram “c’est” > 10 times
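Counts of this kind can be obtained with a few lines of code. The sketch below is illustrative only: the tokenizer (splitting elided clitics such as "c'" and "j'" into their own tokens) and the two example turns are assumptions, not the procedure or data used for CINEMO.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case and split so that elided clitics (c', j') become tokens."""
    text = text.lower().replace("\u2019", "'")  # normalise typographic apostrophes
    return re.findall(r"\w+'|\w+", text)


def ngram_counts(turns, n):
    """Count n-grams per turn; n-grams never span a turn boundary."""
    counts = Counter()
    for turn in turns:
        tokens = tokenize(turn)
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts


# Hypothetical turns for illustration; not taken from the CINEMO scripts.
turns = ["C'est bien.", "J'ai dit que c'est fini."]
unigrams = ngram_counts(turns, 1)
bigrams = ngram_counts(turns, 2)
vocabulary_size = len(unigrams)
```

With this tokenization the bi-gram "c'est" is counted as the pair ("c'", "est"), which matches the way the slide lists "c'" and "est" as separate uni-grams.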
Segmentation and Annotation
• Sequential Processing
  - At present complete annotation by 2 experienced labelers: L1: male, 31 years; L2: female, 26 years
  - 2 strategies intentionally followed:
    L1 provided with sequential order, manually segmented audio
    L2 provided with single instances in random order for verification
• Balanced Segmentation Interests
  - Syntax, pragmatics, stationarity of major emotion
  - Shorter segments preferred
  - Predominant non-linguistic vocalizations as boundaries
  - After segmentation: min. 24, max. 189, median 74, std. dev. 41 instances per speaker
Segmentation and Annotation
• Labeling per Instance
  - Speaker ID/gender, movie ID, attempt, running ID, begin/end time
  - Major and minor emotion attribute (16 options)
  - Mood (7 options: amusement, irritation, neutrality, embarrassment, positivity, stress, timidity; κ = 0.41)
  - 6 Dimensions: 3 states
Annotation
• Major and Minor
  - Frequencies per labeler
Annotation
• Major and Minor
  - Heat map of pairs
  - Potentially 256 combinations, 118 found in the set
  - Strong presence of blended emotions
  - Full agreement on major/minor: 105 combinations, 2 091 instances, i.e. half of the corpus
  - Blended emotions well identifiable
Annotation
• Distribution of Dimensions
  - Typical imbalance in favor of negative valence
Annotation
• Agreement Dimensions
  - Monotonic increase from unweighted to quadratic kappa: label confusions preferably in neighboring classes
  - Apart from suddenness, good concurrence at κ ≥ 0.4
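The step from unweighted to linearly and quadratically weighted kappa can be illustrated with a small pure-Python implementation of Cohen's kappa. This is a minimal sketch under stated assumptions: the two label sequences are hypothetical (three ordinal states, with all disagreements between neighboring states), and the function is not the tool used for the reported values.

```python
def weighted_kappa(a, b, n_classes, weights=None):
    """Cohen's kappa for two label sequences over ordinal classes 0..n_classes-1.

    weights: None (unweighted), "linear" or "quadratic".
    """
    n = len(a)
    # Observed joint distribution of the two labelers and its marginals.
    observed = [[0.0] * n_classes for _ in range(n_classes)]
    for x, y in zip(a, b):
        observed[x][y] += 1.0 / n
    row = [sum(observed[i]) for i in range(n_classes)]
    col = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]

    def disagreement(i, j):
        if weights is None:
            return float(i != j)
        d = abs(i - j) / (n_classes - 1)
        return d if weights == "linear" else d * d

    cells = [(i, j) for i in range(n_classes) for j in range(n_classes)]
    obs = sum(disagreement(i, j) * observed[i][j] for i, j in cells)
    exp = sum(disagreement(i, j) * row[i] * col[j] for i, j in cells)
    return 1.0 - obs / exp


# Hypothetical annotations with three ordinal states (0, 1, 2).
labeler_1 = [0, 0, 1, 1, 2, 2, 0, 1, 2, 1]
labeler_2 = [0, 1, 1, 2, 2, 1, 0, 1, 2, 0]

k_unweighted = weighted_kappa(labeler_1, labeler_2, 3)
k_linear = weighted_kappa(labeler_1, labeler_2, 3, "linear")
k_quadratic = weighted_kappa(labeler_1, labeler_2, 3, "quadratic")
```

For these toy labels the three values are roughly 0.39, 0.52 and 0.67. Weighted kappa penalizes near misses less than distant ones, so it rises when confusions fall mostly between neighboring classes, which is the pattern the slide describes.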
Recognition of Complex Emotions
Data Partitioning
• Train, Development, Test
  - Foster easy reproducibility of results
  - Proper definition of a development set
  - Straightforward three-fold partitioning by speaker index:
    Train (≈40% / 21 speakers: ID 1–21)
    Development (≈30% / 15 speakers: ID 22–36)
    Test (≈30% / 15 speakers: ID 37–51)
  - Strict speaker independence
  - ‘Genuine’ results w/o previous fine-tuning on the test partition
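The partitioning rule is simple enough to state as code. The sketch below assumes each instance carries an integer speaker ID from 1 to 51; the function name and the string labels are illustrative choices, not part of the corpus distribution.

```python
from collections import Counter


def partition_for_speaker(speaker_id):
    """Map a speaker ID (1..51) to its partition, following the slide."""
    if 1 <= speaker_id <= 21:
        return "train"
    if 22 <= speaker_id <= 36:
        return "development"
    if 37 <= speaker_id <= 51:
        return "test"
    raise ValueError(f"unknown speaker ID: {speaker_id}")


# Every speaker lands in exactly one partition, so no speaker is shared
# between training, development and test data.
speakers_per_partition = Counter(partition_for_speaker(s) for s in range(1, 52))
print(speakers_per_partition)
```

Assigning whole speakers rather than single instances is what makes the split strictly speaker-independent.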
Acoustic Features
• openEAR
  - openSMILE’s “base” set: 988 features
  - Slight extension over INTERSPEECH 2009 Emotion Challenge
  - Systematic brute-forcing: 19 functionals of 26 low-level descriptors
  - SMA LP filtered
  - Plus regression coefficients
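The figure of 988 features follows from the numbers on the slide if "plus regression coefficients" is read as one first-order delta contour per low-level descriptor. That interpretation is an assumption made here for the arithmetic.

```python
n_low_level_descriptors = 26
n_functionals = 19

# Assumption: each descriptor contributes its smoothed contour and one
# first-order regression (delta) contour, i.e. two contours per descriptor.
n_contours = n_low_level_descriptors * 2

n_features = n_contours * n_functionals
print(n_features)  # 988
```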
Problem Complexity
• Upper Bounds
  - First, major and minor emotions separately: max. 16 classes
  - Then, complex compound: max. 256 classes (quadratic number, as order matters)
  - Not all permutations occur
  - Dependencies among labels have to be assumed: owing to the scripted recording protocol, and in general
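The upper bound of 256 is the number of ordered (major, minor) pairs over 16 labels. A short enumeration makes the "order matters" point concrete; the label names below are placeholders, since the full list of 16 attributes is not given on the slides.

```python
from itertools import product

# Placeholder names standing in for the 16 emotion attributes.
labels = [f"E{i:02d}" for i in range(16)]

# Ordered pairs: (major, minor) is a different class from (minor, major).
compound_classes = list(product(labels, repeat=2))
print(len(compound_classes))  # 256

major_minor = ("E00", "E01")
minor_major = ("E01", "E00")
print(major_minor != minor_major)  # True
```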
Classification Strategy
• Alternatives
  - Best fuzzy architecture for multiple labels: e.g. multi-task neural networks?
  - Different weighting of major/minor emotion: comparison with the N-best result list?
• Chosen Way
  - ‘Traditional’ Support Vector Machines
  - Polynomial kernel
  - Pair-wise multi-class discrimination
  - Sequential Minimal Optimization learning
  - Training up-sampled in case of high class imbalance
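Two parts of the chosen setup lend themselves to a short sketch: the number of binary classifiers needed for pair-wise discrimination, and up-sampling of the training set. The slide does not say how up-sampling was done; random replication of minority-class instances up to the majority-class size is assumed here purely for illustration.

```python
import random
from collections import defaultdict


def n_pairwise_classifiers(n_classes):
    """Binary classifiers needed for one-vs-one multi-class discrimination."""
    return n_classes * (n_classes - 1) // 2


def upsample(instances, labels, seed=0):
    """Replicate random minority-class instances until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(instances, labels):
        by_class[y].append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(xs + extra)
        out_y.extend([y] * target)
    return out_x, out_y


# Toy data: four instances of class "a", two of class "b".
toy_x, toy_y = upsample(list(range(6)), ["a"] * 4 + ["b"] * 2)
```

For a five-class task, `n_pairwise_classifiers(5)` gives 10 binary machines. Up-sampling should be applied to the training partition only, so that development and test distributions stay untouched.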
Three Examples
• ‘Fixed Minor’
  - ‘Conventional’ case
  - Minor emotion fixed as neutral
  - Major emotion varied
  - Full labeler agreement
  - 950 instances, 5 classes providing sufficient instances (major–minor, # instances):
    AMU–NEU (79), DEC–NEU (204), ENE–NEU (359), INQ–NEU (202), SAT–NEU (106)
Three Examples
• ‘Fixed Major’
  - Different blends of irritation
  - Major emotion fixed as irritation
  - Minor emotion varied
  - Full labeler agreement
  - 607 instances, again 5 classes providing sufficient instances:
    ENE–COL (186), ENE–DEC (110), ENE–INQ (66), ENE–IRO (51), ENE–NEU (184)
Three Examples
• ‘Fully Mixed’
  - Full labeler agreement
  - 533 instances, again 5 classes providing sufficient instances:
    INQ–NEU (114), STR–INQ (63), ENE–COL (186), ENE–DEC (110), JOI–SUR (60)
  - Examples in no stricter relation to each other
  - But: they demonstrate that recognition is feasible even in a full major/minor mix
Three Examples
• Results
  - Weighted Average Recall (WAR, i.e. recognition rate)
  - Unweighted Average Recall (UAR, reflects imbalance among classes)
  - Area under the receiver operating characteristic curve (AUC)
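The difference between the two recall measures shows up as soon as the classes are imbalanced. The sketch below computes both from reference and predicted labels; the toy labels are hypothetical and the function is not the evaluation script behind the reported results.

```python
from collections import Counter


def war_uar(y_true, y_pred):
    """Return (weighted average recall, unweighted average recall)."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    # WAR: overall share of correctly recognized instances.
    war = sum(correct.values()) / len(y_true)
    # UAR: mean of the per-class recalls, each class counting equally.
    uar = sum(correct[c] / total[c] for c in total) / len(total)
    return war, uar


# Hypothetical, imbalanced example: a classifier that always answers "ENE".
y_true = ["ENE"] * 8 + ["AMU"] * 2
y_pred = ["ENE"] * 10
war, uar = war_uar(y_true, y_pred)
print(war, uar)  # 0.8 0.5
```

A classifier that ignores the minority class still reaches a WAR of 0.8 here, while the UAR drops to chance level for two classes.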
Regression Baseline
• Results for Selected Dimensions
  - Ground truth by mean of labelers
  - All instances used
  - Cross correlation (CC), mean linear error (MLE)
  - Support Vector Regression
  - Prediction can be used as features for complex emotions
  - Highly imbalanced distribution
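The two regression measures can be sketched as follows. Two assumptions are made: CC is taken to be the Pearson correlation coefficient between reference and prediction, and MLE the mean absolute difference between them. The ratings and predictions are hypothetical values for illustration.

```python
import math


def cross_correlation(y_true, y_pred):
    """Pearson correlation coefficient (assumed meaning of CC)."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def mean_linear_error(y_true, y_pred):
    """Mean absolute difference (assumed meaning of MLE)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Ground truth as the mean of two labelers' ratings (hypothetical values).
labeler_a = [-1.0, 0.0, 1.0, 0.0]
labeler_b = [-1.0, 0.0, 1.0, 0.0]
ground_truth = [(a + b) / 2 for a, b in zip(labeler_a, labeler_b)]

predicted = [-0.5, 0.0, 0.5, 0.0]
cc = cross_correlation(ground_truth, predicted)
mle = mean_linear_error(ground_truth, predicted)
```

In this toy case the prediction has the right shape but half the amplitude, so CC is 1 while MLE is 0.25. The two measures capture different aspects of the fit, which is why both are reported.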
Conclusions
• Corpus for Complex Emotions
  - Comparatively large CINEMO corpus
• Baselines
  - First impressions on the challenge
• Future Directions…
  - Future large resources with recordings ‘in the wild’
  - Tailored classification architectures:
    Exploit the mutual information among major and minor emotions
    Complex ‘language models’ to reflect transition probabilities
Thank you.
This work was partly funded by the ANR project Affective Avatar.