Emotion-AI:
December 16th, 2017
1
Affective Computing (1995):
the study and development of systems and devices that can recognize, interpret, process, and simulate human affects
Professor Rosalind Picard, MIT Media Lab
Annual Conference on Affective Computing and Intelligent Interaction (ACII)
ACII 2017 @ San Antonio
2
3
?
4
People also smile when they are miserable. Paul Ekman, Telling Lies: Clues to Deceit in the Marketplace, Politics, and Marriage
5
6
1. Philosophy: discussing emotion through philosophy
2. Mind-Body Dualism: combining the physical world with emotion
3. Turn to Practical: combining the physical and the emotional, and beginning to apply these systems to humans
4. Modern Theory
5. Cognitive Process: cognitive theory
7
https://aquileana.wordpress.com/2014/04/14/platos-phaedrus-the-allegory-of-the-chariot-and-the-tripartite-nature-of-the-soul/
Plato's horses
Successful person: the reason horse is more in control
Plato described emotion and reason as two horses pulling us in opposite directions.
Philosophy: discussing emotion through philosophy
8
Stoicism Aristippus
Philosophy: discussing emotion through philosophy
9
Mind-Body Dualism
In the 17th century, René Descartes viewed the body's emotional apparatus as largely hydraulic. He believed that when a person felt angry or sad it was because certain internal valves opened and released such fluids as bile and phlegm.
Mind-Body Dualism: combining the physical world with emotion
10
Charles Darwin believed that emotions were beneficial for evolution because emotions improved chances of survival. For example, the brain uses emotion to keep us away from a dangerous animal (fear), away from rotting food and fecal matter (disgust), in control of our resources (anger), and in pursuit of a good meal or a good mate (pleasure and lust).
Damasio, Antonio R. Looking for Spinoza: Joy, Sorrow, and the Feeling Brain. New York NY: Harcourt, Inc., 2003.
Turn to Practical: discussing the combination of the physical and the emotional, and beginning to apply these systems to humans
11
James, William. 1884. "What Is an Emotion?" Mind. 9, no. 34: 188-205.
"Our feeling of the same changes as they occur is the emotion."
Modern Theory
12
James-Lange / Cannon-Bard / Schachter & Singer
13
James-Lange / Cannon-Bard / Schachter & Singer
[Comparison slides originally in Chinese; most text lost in extraction. They contrasted the James-Lange theory, the Cannon-Bard theory (hypothalamus, limbic system), and Schachter & Singer's two-factor theory of emotion.]
Cognitive Process: Cognitive Theory
21
James-Lange
22
Arnold, Lazarus: appraisal theory
Tomkins, Izard: basic/discrete emotions
23
24
25
INFERENCE?
26
?
TAG?
27
Charles Darwin believed that emotions were beneficial for evolution because emotions improved chances of survival. For example, the brain uses emotion to keep us away from a dangerous animal (fear), away from rotting food and fecal matter (disgust), in control of our resources (anger), and in pursuit of a good meal or a good mate (pleasure and lust).
Damasio, Antonio R. Looking for Spinoza: Joy, Sorrow, and the Feeling Brain. New York NY: Harcourt, Inc., 2003.
28
100
Paul Ekman
29
30
Are There Universal Facial Expressions? Just guess
31
Facial Action Coding System (FACS)
AU
32
Mascolo, M. F., Fischer, K. W., & Li,J. (2003). Dynamic development of component system of emotions: Pride, shame, and guilt in China and the United States. In R. J. Davidson, K. R. Scherer, & H. H. Goldsmith (Eds.), Handbook of affective sciences (pp. 375-408). New York: Oxford University Press.Shaver, P. R., Wu, S., & Schwartz, J. C. (1992). Cross-cultural similarities and differences in emotion and its representation: A prototype approach. In Clark, M. S. (Ed.), Review of Personality and Social Psychology, 13, pp. 231-251. Sage: Thousand Oaks.
33
34
?
35
label ?
there is no limit to the number of possible different emotions
William James
36
Silvan Tomkins (1962) concluded that there are eight basic emotions:
surprise, interest, joy, rage, fear, disgust, shame, and anguish
Carroll Izard (University of Delaware, 1993): 12 discrete emotions labeled: Interest, Joy, Surprise, Sadness, Anger, Disgust, Contempt,
Self-Hostility, Fear, Shame, Shyness, and Guilt
Differential Emotions Scale or DES-IV
37
Ekman (1972): six basic emotions
38
Dimensional models of emotion
Define emotions according to one or more dimensions
Wilhelm Max Wundt (1897): three dimensions: "pleasurable versus unpleasurable", "arousing or subduing", and "strain or relaxation"
Harold Schlosberg (1954): three dimensions of emotion: "pleasantness-unpleasantness", "attention-rejection", and "level of activation"
Prevalent models incorporate valence and arousal dimensions
39
Circumplex model; Vector model; Positive activation - negative activation (PANA) model; Plutchik's model; PAD emotional state model; Lövheim cube of emotion; Cowen & Keltner 2017
40
Circumplex model: Perceptual
Developed by James Russell (1980): a two-dimensional circular space containing arousal and valence dimensions
Arousal represents the vertical axis and valence represents the horizontal axis
Prevalent in use as labels
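As an illustration (not from the original slides), a minimal Python sketch of how a valence-arousal point can be mapped to a coarse quadrant label; the label names and boundaries here are illustrative assumptions:

```python
# Minimal sketch: mapping a point in Russell's valence-arousal space to a
# coarse quadrant label. Label names and boundaries are illustrative only.
def circumplex_quadrant(valence: float, arousal: float) -> str:
    """valence and arousal are assumed to lie in [-1, 1]."""
    if arousal >= 0:
        return "excited/happy" if valence >= 0 else "angry/afraid"
    return "calm/content" if valence >= 0 else "sad/depressed"

print(circumplex_quadrant(0.7, 0.4))    # -> excited/happy
print(circumplex_quadrant(-0.6, -0.3))  # -> sad/depressed
```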
41
Positive activation - Negative activation (PANA): Self-Report
Created by Watson and Tellegen in 1985; suggests that positive affect and negative affect are two separate systems (responsible for different functions)
States of higher arousal tend to be defined by their valence; states of lower arousal tend to be more neutral in terms of valence
The vertical axis represents low to high positive affect; the horizontal axis represents low to high negative affect; the dimensions of valence and arousal lie at a 45-degree rotation over these axes
42
43
Cowen & Keltner
2017, University of California, Berkeley researchers Alan S. Cowen & Dacher Keltner (PNAS)
27 distinct emotions: http://news.berkeley.edu/2017/09/06/27-emotions/
(A.) Admiration. (B.) Adoration. (C.) Aesthetic appreciation. (D.) Amusement. (E.) Anger. (F.) Anxiety. (G.) Awe. (H.) Awkwardness. (I.) Boredom. (J.) Calmness. (K.) Confusion. (L.) Craving. (M.) Disgust. (N.) Empathic pain. (O.) Entrancement. (P.) Excitement. (Q.) Fear. (R.) Horror. (S.) Interest. (T.) Joy. (U.) Nostalgia. (V.) Relief. (W.) Romance. (X.) Sadness. (Y.) Satisfaction. (Z.) Sexual desire. Surprise.
44
http://news.berkeley.edu/2017/09/06/27-emotions/
Affective Computing
Many Theories
Many Models/Annotations
Take Away? Stable
45
(Data-driven AI Learning and Inference) ?
46
?
47
Affective Computing
reference: https://www.gartner.com/newsroom/id/3412017/
fast-growing, but still not a mature technology
48
Face
Affective Computing
Speech, Body Gesture, Multi-Modal, Physiology, Language
reference: http://blog.ventureradar.com/2016/09/21/15-leading-affective-computing-companies-you-should-know/
Education Health Care Gaming Advertisement Retail Legal
Emotion Recognition as Part of a Larger System (API, SDK)
50
51
Little Dragon(Affectiva- Education)
make learning more enjoyable and more effective, by providing an educational tool that is both universal and personalized
reference: https://www.affectiva.com/success-story/
https://www.youtube.com/watch?v=SmjAa8iMkjU
53
Nevermind(Affectiva- Gaming)
bio-feedback horror game
senses a player's facial expressions for signs of emotional distress, and adapts gameplay accordingly
reference: https://www.affectiva.com/success-story/
https://www.youtube.com/watch?v=NGr0orAqRH4&t=497s
Brain Power(Affectiva- Health Care)
The World's First Augmented Reality Smart-Glass System to empower children and adults with autism to teach themselves crucial social and cognitive skills.
reference: https://www.affectiva.com/success-story/
https://www.youtube.com/watch?v=qfoTprgWyns
55
56
MediaRebel(Affectiva- Legal)
Legal video deposition management platform MediaRebel uses Affectiva's Emotion SDK for facial expression analysis and emotion recognition.
Intelligent analytical features include: search transcripts based upon witness emotions; instantly play back testimony based upon selected emotions; identify positive, negative & neutral witness behavior
reference: https://www.affectiva.com/success-story/
https://www.mediarebel.com/
shelfPoint(Affectiva- Retail)
Cloverleaf is a retail technology company for the modern brick-and-mortar marketer and merchandiser
shelfPoint solution: brands and retailers can now capture customer engagement and sentiment data at the moment of purchase decision, something previously unavailable in physical retail stores.
reference: https://www.affectiva.com/success-story/
https://www.youtube.com/watch?v=S9gDqpF6kLs
https://www.youtube.com/watch?v=W6UnahO_zXs
59
60
Data-driven AI Learning and Inference ?
61
?
?
62
63
Year | Database | Language | Setting | Protocol | Elicitation
1997 | DES | Dan. | Single | Scr. | Induced
2000 | GEMEP | Fre. | Single | Scr. & Spo. | Acted
2005 | eNTERFACE'05 | Eng. | Single | Scr. | Induced
2007 | HUMAINE | Eng. | TV Talk | Scr. & Spo. | Mix.
2008 | VAM | Ger. | TV Talk | Spo. | Acted
2008 | IEMOCAP | Eng. | Dyadic | Scr. & Spo. | Acted
2009 | SAVEE | Eng. | Single | Spo. | Acted
2010 | CIT | Eng. | Dyadic | Scr. & Spo. | Acted
2010 | SEMAINE | Eng. | Dyadic | Scr. | Mix.
2013 | RECOLA | Fre. | Dyadic | Spo. | Acted
2016 | CHEAVD | Chi. | TV talk | Spo. | Posed
2017 | NNIME | Chi. | Dyadic | Spo. | Acted
Language: Danish
Participants: 4 (Male: 2; Female: 2)
Recordings: Audio; Total: 0.5 hours; Sentences: 5200 utterances
Labels: Perspectives: Naïve-Observer; Raters: 20; Discrete session-level annotation; Categorical (5)
DES: Design, Recording and Verification of a Danish Emotional Speech Database
64
Engberg, Inger S., et al. "Design, recording and verification of a Danish emotional speech database." Fifth European Conference on Speech Communication and Technology. 1997.
Available: Tom Brøndsted ([email protected])
DES
Loss-Scaled Large-Margin Gaussian Mixture Models for Speech Emotion Classification1 (Cat.: 0.676)
Automatic emotional speech classification2 (Cat.: 0.516)
65
1Yun, Sungrack, and Chang D. Yoo. "Loss-scaled large-margin Gaussian mixture models for speech emotion classification."IEEE Transactions on Audio, Speech, and Language Processing20.2 (2012): 585-598.
2Ververidis, Dimitrios, Constantine Kotropoulos, and Ioannis Pitas. "Automatic emotional speech classification." Acoustics, Speech, and Signal Processing, 2004. Proceedings.(ICASSP'04). IEEE International Conference on. Vol. 1. IEEE, 2004..
Language: French
Participants: 10 (Male: 5; Female: 5)
Recordings: Dual-channel Audio; HD Video; Manual Transcript; Face & Head; Body Posture & Gestures
Sentences: 7300 sequences
Labels: Perspectives: Naïve-Observer; Discrete session-level annotation; Categorical (18)
GEMEP: Geneva Multimodal Emotion Portrayals corpus
66
Bänziger, Tanja, Hannes Pirker, and K. Scherer. "GEMEP - GEneva Multimodal Emotion Portrayals: A corpus for the study of multimodal emotional expressions." Proceedings of LREC. Vol. 6. 2006.
Bänziger, Tanja, and Klaus R. Scherer. "Using actor portrayals to systematically study multimodal emotion expression: The GEMEP corpus." International conference on affective computing and intelligent interaction. Springer, Berlin, Heidelberg, 2007.
Available: Tanja Bänziger (Tanja.Banziger@ pse.unige.ch)
GEMEP
Multimodal emotion recognition from expressive faces, body gestures and speech
(Cat.: 0.571)
67
Kessous, Loic, Ginevra Castellano, and George Caridakis. "Multimodal emotion recognition in speech-based interaction using facial expression, body gesture and acoustic analysis." Journal on Multimodal User Interfaces 3.1 (2010): 33-48.
Language: English
Participants: 42 (Male: 34; Female: 24) (14 different nationalities)
Recordings: Dual-channel Audio; HD Video; Script
Total: 1166 video sequences
Emotion-related atmosphere: to express six emotions
eNTERFACE'05: The eNTERFACE'05 Audio-Visual Emotion Database
68
Martin, Olivier, et al. "The enterface05 audio-visual emotion database." Data Engineering Workshops, 2006. Proceedings. 22nd International Conference on. IEEE, 2006.
Available: O. Martin ([email protected])
eNTERFACE'05
Sparse autoencoder-based feature transfer learning for speech emotion recognition1 (Cat.: 59.1)
Unsupervised learning in cross-corpus acoustic emotion recognition2 (Val./Act.: 0.574/0.616)
69
1Deng, Jun, et al. "Sparse autoencoder-based feature transfer learning for speech emotion recognition." Affective Computing and Intelligent Interaction (ACII), 2013 Humaine Association Conference on. IEEE, 2013.
2Zhang, Zixing, et al. "Unsupervised learning in cross-corpus acoustic emotion recognition." Automatic Speech Recognition and Understanding (ASRU), 2011 IEEE Workshop on. IEEE, 2011.
Language: English
Participants: Many (includes 8 datasets)
Recordings (naturalistic (TV shows, interviews) / induced data): Audio; Video; Gesture; Emotion words
Labels: Perspectives: Naïve-Observer; Raters: 4; Continuous-in-time annotation; Dimensional (8) [Intensity, Activation, Valence, Power, Expect, Word]; Discrete annotation (5); Emotion-related states; Key Events; Everyday Emotion words
HUMAINE: Addressing the Collection and Annotation of Naturalistic and Induced Emotional Data
70
Douglas-Cowie, Ellen, et al. "The HUMAINE database: addressing the collection and annotation of naturalistic and induced emotional data." Affective computing and intelligent interaction (2007): 488-500.
Available: [email protected]
HUMAINE
A Multimodal Database for Affect Recognition and Implicit Tagging1 (Val./Act.: 0.761/0.677)
Abandoning Emotion Classes - Towards Continuous Emotion Recognition with Modelling of Long-Range Dependencies2 (Val./Act. [MSE]: 0.18/0.08)
71
1Soleymani, Mohammad, et al. "A multimodal database for affect recognition and implicit tagging." IEEE Transactions on Affective Computing 3.1 (2012): 42-55.
2Wöllmer, Martin, et al. "Abandoning emotion classes - towards continuous emotion recognition with modelling of long-range dependencies." Ninth Annual Conference of the International Speech Communication Association. 2008.
Language: German (TV shows)
Participants: 47
Recordings: Audio; Video (Face); Manual Transcript
Total: 12 hours; Sentences: 946 utterances
Labels: Perspectives: Peer, Director, Self, Naïve-Observer; Raters: 17; Continuous-in-time annotation; Dimensional (Valence-Activation-Dominance) for Audio; Discrete session-level annotation; Categorical (7) for Faces
VAM: The Vera am Mittag German Audio-Visual Spontaneous Speech Database
72
Grimm, Michael, Kristian Kroschel, and Shrikanth Narayanan. "The Vera am Mittag German audio-visual emotional speech database." Multimedia and Expo, 2008 IEEE International Conference on. IEEE, 2008.
Available: [email protected]
VAM
Towards robust spontaneous speech recognition with emotional speech adapted acoustic models1 (Word Acc.: 42.75)
Selecting training data for cross-corpus speech emotion recognition: Prototypicality vs. generalization2 (Val./Act.: 0.502/0.677)
73
1Vlasenko, Bogdan, Dmytro Prylipko, and Andreas Wendemuth. "Towards robust spontaneous speech recognition with emotional speech adapted acoustic models." Poster and Demo Track of the 35th German Conference on Artificial Intelligence, KI-2012, Saarbrücken, Germany. 2012.
2Schuller, Björn, et al. "Selecting training data for cross-corpus speech emotion recognition: Prototypicality vs. generalization." Proc. 2011 Afeka-AVIOS Speech Processing Conference, Tel Aviv, Israel. 2011.
Language: English
Participants: 10 (Male: 5; Female: 5)
Recordings: Dual-channel Audio; HD Video; Manual Transcript; 53-Marker Motion Capture (Face and Head)
Total: 12 hours, 50 sessions (3 min/session); Sentences: 6904
Labels: Perspectives: Naïve-Observer, Self (6/10); Raters: 6; Continuous-in-time annotation; Dimensional (Valence-Activation-Dominance); Discrete session-level annotation; Categorical (5)
IEMOCAP: The Interactive Emotional Dyadic Motion Capture database
74
Busso, Carlos, et al. "IEMOCAP: Interactive emotional dyadic motion capture database." Language resources and evaluation 42.4 (2008): 335.
Available: Anil Ramakrishna ([email protected])
IEMOCAP
Tracking continuous emotional trends of participants during affective dyadic interactions using body language and speech information1 (Val./Act./Dom.: 0.619/0.637/0.62)
Modeling mutual influence of interlocutor emotion states in dyadic spoken interactions2 (Cat./Val./Act.: 0.552/0.634/0.650)
75
1Metallinou, Angeliki, Athanasios Katsamanis, and Shrikanth Narayanan. "Tracking continuous emotional trends of participants during affective dyadic interactions using body language and speech information." Image and Vision Computing 31.2 (2013): 137-152.
2Lee, Chi-Chun, et al. "Modeling mutual influence of interlocutor emotion states in dyadic spoken interactions." Tenth Annual Conference of the International Speech Communication Association. 2009.
Language: English
Participants: 4 (Male: 4)
Recordings: Dual-channel Audio; Video; Face Markers
Sentences: 480 utterances
Labels: Perspectives: Naïve-Observer; Discrete session-level annotation; Categorical (6)
SAVEE: Surrey Audio-Visual Expressed Emotion database
76
Jackson, P., and S. Haq. "Surrey Audio-Visual Expressed Emotion(SAVEE) Database." University of Surrey: Guildford, UK (2014).
Available: P Jackson ([email protected])
SAVEE
77
Speaker-Dependent Audio-Visual Emotion Recognition2 (Cat.: 97.5)
Audio-Visual Feature Selection and Reduction for Emotion Classification3 (Cat.: 96.7)
2S. Haq and P.J.B. Jackson. "Speaker-Dependent Audio-Visual Emotion Recognition", In Proc. Int'l Conf. on Auditory-Visual Speech Processing, pages 53-58, 2009.
3S. Haq, P.J.B. Jackson, and J.D. Edge. "Audio-Visual Feature Selection and Reduction for Emotion Classification." In Proc. Int'l Conf. on Auditory-Visual Speech Processing, pages 185-190, 2008.
Language: English
Participants: 16 (Male: 7; Female: 9)
Recordings: Dual-channel Audio; HD Video; Transcript; Body gesture
Total: 48 dyadic sessions; Sentences: 2162
Labels: Perspectives: Naïve-Observer; Raters: 3; Discrete session-level annotation; Continuous-in-time annotation; Dimensional (Valence-Activation-Dominance)
CIT: The USC CreativeIT database of multimodal dyadic interactions: from speech and full body motion capture to continuous emotional annotations
78
Metallinou, Angeliki, et al. "The USC CreativeIT database: A multimodal database of theatrical improvisation." Multimodal Corpora: Advances in Capturing, Coding and Analyzing Multimodality (2010): 55.
Metallinou, Angeliki, et al. "The USC CreativeIT database of multimodal dyadic interactions: From speech and full body motion capture to continuous emotional annotations." Language resources and evaluation 50.3 (2016): 497-521.
Available: Manoj Kumar ([email protected])
CIT
79
1Yang, Zhaojun, and Shrikanth S. Narayanan. "Modeling dynamics of expressive body gestures in dyadic interactions." IEEE Transactions on Affective Computing 8.3 (2017): 369-381.
2Yang, Zhaojun, and Shrikanth S. Narayanan. "Analyzing Temporal Dynamics of Dyadic Synchrony in Affective Interactions." INTERSPEECH. 2016.
3Chang, Chun-Min, and Chi-Chun Lee. "Fusion of multiple emotion perspectives: Improving affect recognition through integrating cross-lingual emotion information." Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on. IEEE, 2017.
Analyzing Temporal Dynamics of Dyadic Synchrony in Affective Interactions2
Language: English
Participants: 150
Recordings: Dual-channel Audio; HD Video; Manual Transcript
Multi-Interaction (like a TV talk show): Human vs. Human; Semi-human vs. Human; Machine vs. Human
Total: 959 dyadic sessions (3 min/session)
Labels: Perspectives: Naïve-Observer; Raters: 8; Continuous-in-time annotation; Dimensional (Valence-Activation); Discrete Categorical (27)
SEMAINE: The SEMAINE Database: Annotated Multimodal Records of Emotionally Colored Conversations between a Person and a Limited Agent
80
McKeown, Gary, et al. "The semaine database: Annotated multimodal records of emotionally colored conversations between a person and a limited agent." IEEE Transactions on Affective Computing 3.1 (2012): 5-17.
Available: [email protected]
SEMAINE
Building autonomous sensitive artificial listeners1
A Dynamic Appearance Descriptor Approach to Facial Actions Temporal Modeling2 (0.701)
81
1Schroder, Marc, et al. "Building autonomous sensitive artificial listeners." IEEE Transactions on Affective Computing 3.2 (2012): 165-183.
2Jiang, Bihan, et al. "A dynamic appearance descriptor approach to facial actions temporal modeling." IEEE transactions on cybernetics 44.2 (2014): 161-174.
Language: French
Participants: 46 (Male: 19; Female: 27)
Recordings: Dual-channel Audio; HD Video (15 facial action units); Electrocardiogram; Electrodermal activity
Total: 11 hours, 102 dyadic sessions (3 min/session); Sentences: 1306
Labels: Perspectives: Self, Naïve-Observer; Raters: 6; Continuous-in-time annotation; Dimensional (Valence-Activation)
RECOLA: Remote Collaborative and Affective Interactions
82
Ringeval, Fabien, et al. "Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions." Automatic Face and Gesture Recognition (FG), 2013 10th IEEE International Conference and Workshops on. IEEE, 2013.
Available: Fabien Ringeval ([email protected])
RECOLA
Prediction of asynchronous dimensional emotion ratings from audiovisual and physiological data1 (Val./Act.: 0.804/0.528)
End-to-end speech emotion recognition using a deep convolutional recurrent network2 (Val./Act.: 0.741/0.325)
Face Reading from Speech: Predicting Facial Action Units from Audio Cues3 (0.650)
83
1Ringeval, Fabien, et al. "Prediction of asynchronous dimensional emotion ratings from audiovisual and physiological data." Pattern Recognition Letters 66 (2015): 22-30.
2Trigeorgis, George, et al. "Adieu features? End-to-end speech emotion recognition using a deep convolutional recurrent network." Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on. IEEE, 2016.
3Ringeval, Fabien, et al. "Face Reading from Speech: Predicting Facial Action Units from Audio Cues." Sixteenth Annual Conference of the International Speech Communication Association. 2015.
Language: Chinese
Participants: 238
Recordings: Audio; Video (34 films, 2 TV series, 4 TV shows)
Total: 2.3 hours
Labels: Raters: 4; Discrete session-level annotation; Fake/suppressed emotions; Multi-emotion annotation for some segments; Categorical (26 non-prototypical)
2017 Multimodal Emotion Recognition Challenge (MEC 2017: http://www.chineseldc.org/htdocsEn/emotion.html)
CHEAVD: A Chinese natural emotional audio-visual database
84
Li, Ya, et al. "CHEAVD: a Chinese natural emotional audiovisual database." Journal of Ambient Intelligence and Humanized Computing 8.6 (2017): 913-924.
Available: Ya Li ([email protected])
CHEAVD
MEC 2016: the multimodal emotion recognition challenge of CCPR 20161 (Cat.: 37.03)
Chinese Speech Emotion Recognition2 (Cat.: 47.33)
Transfer Learning of Deep Neural Network for Speech Emotion Recognition3 (Cat.: 50.01)
85
1Li, Ya, et al. "MEC 2016: the multimodal emotion recognition challenge of CCPR 2016." Chinese Conference on Pattern Recognition. Springer Singapore, 2016.
2Zhang, Shiqing, et al. "Feature Learning via Deep Belief Network for Chinese Speech Emotion Recognition." Chinese Conference on Pattern Recognition. Springer Singapore, 2016.
3Huang, Ying, et al. "Transfer Learning of Deep Neural Network for Speech Emotion Recognition." Chinese Conference on Pattern Recognition. Springer Singapore, 2016.
Language: Chinese
Participants: 44 (Male: 20; Female: 24)
Recordings: Dual-channel Audio; HD Video; Manual Transcript; Electrocardiogram
Total: 11 hours, 102 dyadic sessions (3 min/session); Sentences: 6029 utterances
Labels: Perspectives: Peer, Director, Self, Naïve-Observer; Raters: 49; Continuous-in-time annotation; Discrete session-level annotation; Dimensional (Valence-Activation); Categorical (6)
NNIME: The NTHU-NTUA Chinese Interactive Multimodal Emotion Corpus
86
Huang-Cheng Chou, Wei-Cheng Lin, Lien-Chiang Chang, Chyi-Chang Li, Hsi-Pin Ma, Chi-Chun Lee "NNIME: The NTHU-NTUA Chinese Interactive Multimodal Emotion Corpus" in Proceedings of ACII 2017
Available: Huang-Cheng Chou ([email protected])Chi-Chun Lee ([email protected])
NNIME
Cross-Lingual Emotion Information1,3 (session-level) (Val./Act.: 0.682/0.604)
Dyad-Level Interaction2 (Cat.: 0.65)
87
1Chun-Min Chang, Bo-Hao Su, Shih-Chen Lin, Jeng-Lin Li, Chi-Chun Lee*, "A Bootstrapped Multi-View Weighted Kernel Fusion Framework for Cross-Corpus Integration of Multimodal Emotion Recognition" in Proceedings of ACII 2017
2Yun-Shao Lin, Chi-Chun Lee*, "Deriving Dyad-Level Interaction Representation using Interlocutors' Structural and Expressive Multimodal Behavior Features" in Proceedings of the International Speech Communication Association (Interspeech), pp. 2366-2370, 2017
3Chun-Min Chang, Chi-Chun Lee*, "Fusion of Multiple Emotion Perspectives: Improving Affect Recognition Through Integrating Cross-Lingual Emotion Information" in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 5820-5824, 2017
Access These Emotion Database
88
Year | Database | Website
1997 | DES | http://kom.aau.dk/~tb/speech/Emotions/
2000 | GEMEP | https://www.affective-sciences.org/gemep/
2005 | eNTERFACE'05 | http://www.enterface.net/enterface05/
2007 | HUMAINE | http://emotion-research.net/download/pilot-db/
2008 | VAM | http://emotion-research.net/download/vam
2008 | IEMOCAP | http://sail.usc.edu/iemocap/
2009 | SAVEE | http://kahlan.eps.surrey.ac.uk/savee/
2010 | CIT | http://sail.usc.edu/CreativeIT/ImprovRelease.htm
2010 | SEMAINE | https://semaine-db.eu/
2013 | RECOLA | https://diuf.unifr.ch/diva/recola/download.html
2016 | CHEAVD | Upon request
2017 | NNIME | http://nnime.ee.nthu.edu.tw/
Key take-away
89
Data-driven AI Learning and Inference ?
90
Data-driven AI Learning and Inference ?
91
Speech
Text
Gesture
Face
Human Expression
92
Paralinguistic Expression
Linguistic Expression
93
Recognition rates: Achievement 88.4%, Amusement 90.4%, Contentment 52.4%, Pleasure 61.6%, Relief 83.9% (73%-94%)
Confusions: amusement/disgust, pleasure/sadness, sadness/relief (24%, 17.5%), amusement (12%, 7.9%)
Categories: Achievement, Amusement, Anger, Contentment, Disgust, Pleasure, Relief, Sadness, Surprise
Overall: 69.9%
Sauter, Disa. An investigation into vocal expressions of emotions: the roles of valence, culture, and acoustic factors. University of London, University College London (United Kingdom), 2007.
94
Laugh, Cry, Sigh, Whisper, Whine
Laukka, Petri, et al. "Cross-cultural decoding of positive and negative non-linguistic emotion vocalizations." Frontiers in Psychology 4 (2013).
Gupta, Rahul, et al. "Detecting paralinguistic events in audio stream using context in features and probabilistic decisions." Computer Speech & Language 36 (2016): 72-92.
Laughter & Fillers (2015)
IS2013 sub-challenge
AUC for Detection: Laughter: 95.3%; Fillers: 90.4%
Cross-Culture (2013)
Universal Emotion, Non-Verbal Signals
Speak: India, USA, Kenya, Singapore; Listen: Sweden
95
?
Sahu, Saurabh, Gupta, Rahul, Sivaraman, Ganesh, AbdAlmageed, Wael, Espy-Wilson, Carol. (2017). Adversarial Auto-Encoders for Speech Based Emotion Recognition. 1243-1247. 10.21437/Interspeech.2017-1421.
Rao, K. Sreenivasa, Shashidhar G. Koolagudi, and Ramu Reddy Vempada. "Emotion recognition from speech using global and local prosodic features." International Journal of Speech Technology 16.2 (2013): 143-160.
Lalitha, S., et al. "Emotion detection using MFCC and Cepstrum features." Procedia Computer Science 70 (2015): 29-35.
Huang, Che-Wei, and Shrikanth Narayanan. "Characterizing Types of Convolution in Deep Convolutional Recurrent Neural Networks for Robust Speech Emotion Recognition." arXiv preprint arXiv:1706.02901 (2017).
Lee, Jinkyu, and Ivan Tashev. "High-level feature representation using recurrent neural network for speech emotion recognition." INTERSPEECH. 2015.
Emo-DB
Prosodic features + SVM: 62.43%
MFCC + ANN: 85.7%
Deep Convolution
High-Level Representation (time series)
96
Dimosa, Kostis, Leopold Dickb, and Volker Dellwoc. "Perception of levels of emotion in speech prosody." The Scottish Consortium for ICPhS (2015).
Erickson, Donna. "Expressive speech: Production, perception and application to speech synthesis." Acoustical Science and Technology 26.4 (2005): 317-325.
Sauter, Disa. An investigation into vocal expressions of emotions: the roles of valence, culture, and acoustic factors. University of London, University College London (United Kingdom), 2007.
"Emotional prosody does not function categorically, distinguishing only different emotions, but also indicates different degrees of the expressed emotion."
Pitch and pitch variation are especially important for people to recognize emotion from non-verbal sounds
Voice quality: tension
Some experiments: change the sound (remove pitch, noisy channel, ...)
(descriptors)
A Review: Research Findings of Acoustic and Perceptual Studies
97
Flow chart
Learning Representation Discriminative Model
98
Low-Level Descriptors (10-15 ms frames):
Mel-Frequency Cepstral Coefficients, Pitch, Signal Energy, Loudness, Voice Quality (Jitter, Shimmer), Log Filterbank Energies, Linear Prediction Cepstral Coefficients, CHROMA and CENS Features (Music)
Compute statistics over frames (functionals)
Continuous: Pitch, Energy, Formants
Qualitative: Voice quality (harsh, tense, breathy)
Spectral: LPC, MFCC, LFPC
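A minimal sketch of this LLD-plus-functionals recipe, assuming librosa is installed and a mono file named speech.wav exists (both are assumptions, not part of the slides):

```python
import numpy as np
import librosa

y, sr = librosa.load("speech.wav", sr=16000)     # hypothetical input file

# Frame-level low-level descriptors at a 10 ms hop (160 samples @ 16 kHz)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, hop_length=160)
energy = librosa.feature.rms(y=y, hop_length=160)              # signal energy
f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr, hop_length=160)  # pitch contour

# Stack LLD contours (trimmed to a common frame count), then compute
# utterance-level functionals: statistics over the frame-level contours
T = min(mfcc.shape[1], energy.shape[1], len(f0))
lld = np.vstack([mfcc[:, :T], energy[:, :T], f0[None, :T]])
functionals = np.concatenate([lld.mean(axis=1), lld.std(axis=1)])
print(functionals.shape)   # one fixed-length feature vector per utterance
```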
99
Arias, Juan Pablo, Carlos Busso, and Nestor Becerra Yoma. "Shape-based modeling of the fundamental frequency contour for emotion detection in speech." Computer Speech & Language28.1 (2014): 278-294.
emotionally salient temporal segments
75.8% in binary emotion classification. Dotted/dashed lines: subjective ratings and their deviation; solid line: objective.
100
Source-Filter
Example: High arousal
Physically, the vocal production system: Respiration, Vocal Fold Vibration, Articulation
increased tension in laryngeal musculature → raised subglottal pressure → changed production of sound at the glottis → vocal quality
Johnstone, Tom & Scherer, Klaus. (2000). Vocal communication of emotion. Handbook of Emotions.
Mel-scale Filter Bank
The response of the basilar membrane as a function of frequency, measured at six different distances from the stapes
The psychoacoustical transfer function
Stern, Richard M., and Nelson Morgan. "Features based on auditory physiology and perception." Techniques for Noise Robustness in Automatic Speech Recognition (2012): 193-227.
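A minimal sketch of building and applying a mel-scale filter bank (librosa assumed; the file name and parameters are illustrative):

```python
import numpy as np
import librosa

y, sr = librosa.load("speech.wav", sr=16000)     # hypothetical input file
power_spec = np.abs(librosa.stft(y, n_fft=512, hop_length=160)) ** 2

# Triangular filters spaced on the perceptually motivated mel scale
mel_fb = librosa.filters.mel(sr=sr, n_fft=512, n_mels=26)
log_mel = np.log(mel_fb @ power_spec + 1e-10)    # log mel-band energies
print(log_mel.shape)                             # (26, n_frames)
```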
Support Vector Machine (SVM)
Convolutional Neural Network
Hidden Markov Model (HMM)
Recurrent Neural Network
Time series Model
103
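For the classical (non-neural) classifiers above, a minimal scikit-learn sketch with random toy data standing in for real functional features and emotion labels (all names and sizes are illustrative):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 88))      # toy utterance-level feature vectors
y = rng.integers(0, 4, size=200)    # toy labels for four emotion classes

# Standardize features, then fit an RBF-kernel SVM with cross-validation
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(clf, X, y, cv=5)
print(scores.mean())
```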
End-to-End: From LLDs to Deep Learning
Z. Aldeneh and E. M. Provost, "Using regional saliency for speech emotion recognition," 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, 2017, pp. 2741-2745. doi: 10.1109/ICASSP.2017.7952655C. W. Huang and S. S. Narayanan, "Deep convolutional recurrent neural network with attention mechanism for robust speech emotion recognition," 2017 IEEE International Conference on Multimedia and Expo (ICME), Hong Kong, 2017, pp. 583-588. doi: 10.1109/ICME.2017.8019296
signal → Neural Network → emotion
CNN for Time-Series Signal + Attention
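A minimal PyTorch sketch of this idea, 1-D convolutions over frame-level features with attention pooling over time; layer sizes are illustrative assumptions, not the cited papers' exact architectures:

```python
import torch
import torch.nn as nn

class CNNAttention(nn.Module):
    def __init__(self, n_features=40, n_classes=4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_features, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=5, padding=2), nn.ReLU(),
        )
        self.att = nn.Linear(64, 1)          # scores each time step
        self.out = nn.Linear(64, n_classes)

    def forward(self, x):                    # x: (batch, n_features, time)
        h = self.conv(x).transpose(1, 2)     # (batch, time, 64)
        w = torch.softmax(self.att(h), dim=1)   # attention weights over time
        pooled = (w * h).sum(dim=1)          # weighted temporal pooling
        return self.out(pooled)

logits = CNNAttention()(torch.randn(8, 40, 300))  # toy batch of 8 signals
print(logits.shape)                               # torch.Size([8, 4])
```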
104
YAAFE, an Easy to Use and Efficient Audio Feature Extraction Software, B. Mathieu, S. Essid, T. Fillon, J. Prado, G. Richard, proceedings of the 11th ISMIR conference, Utrecht, Netherlands, 2010.
Florian Eyben, Felix Weninger, Florian Gross, Björn Schuller: Recent Developments in openSMILE, the Munich Open-Source Multimedia Feature Extractor, In Proc. ACM Multimedia (MM), Barcelona, Spain, ACM, ISBN 978-1-4503-2404-5, pp. 835-838, October 2013. doi:10.1145/2502081.2502224
Paul Boersma & David Weenink (2013): Praat: doing phonetics by computer [Computer program].
105
Paralinguistic Expression
Linguistic Expression
106
?
Schwarz-Friesel, Monika. "Language and emotion." The Cognitive Linguistic Perspective, in: Ulrike Ldtke (Hg.), Emotion in Language. TheoryResearchApplication, Amsterdam (2015): 157-173.
Lexicon, Grammar, Ideational Meaning
Lindquist, Kristen A., Jennifer K. MacCormack, and Holly Shablack. "The role of language in emotion: predictions from psychological constructionism." Frontiers in psychology 6 (2015).
developmental and cognitive science, demonstrating that language helps
107
Human Behavior Evaluation
Couples Therapy
Oral Presentation
Reviews
Hotels HBRNN
Amazon Cross-Lingual
Movie (93%), Book (92%), DVD (93%), PNN + RBM
Tweets
Positive & Negative
DCNN & LSTM
Ain, Qurat Tul, et al. "Sentiment analysis using deep learning techniques: a review." Int J Adv Comput Sci Appl 8.6 (2017): 424.
Review Article Social Media Talk
It's terrible!
What Texts Tell Us (Topics), Emotional Polarity
It's cool!
Parts of Speech (POS) tags; N-Grams
https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html
VB, VBD, NN, NNS, JJ, JJR, JJS, IN, TO
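A minimal NLTK sketch of these two feature types (NLTK plus its punkt and averaged_perceptron_tagger data packages are assumed to be installed):

```python
import nltk

tokens = nltk.word_tokenize("It's cool!")
print(nltk.pos_tag(tokens))      # Penn Treebank tags, e.g. ('cool', 'JJ')

bigrams = list(nltk.ngrams(tokens, 2))   # n-gram features (here n = 2)
print(bigrams)
```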
109
Dictionary-Based Sentiment Analysis
110
Changqin Quan and Fuji Ren. 2009. Construction of a blog emotion corpus for Chinese emotional expression analysis. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3 (EMNLP '09). Association for Computational Linguistics, Stroudsburg, PA, USA, 1446-1454.
Mohammad, Saif M., and Peter D. Turney. "Crowdsourcing a word-emotion association lexicon." Computational Intelligence 29.3 (2013): 436-465.
Pennebaker, James W., et al. The development and psychometric properties of LIWC2015. 2015.
LIWC (Linguistic Inquiry and Word Count): 64 categories, about 4500 words; positive/negative emotion words: 406/499
Seed words as gold standard
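A minimal sketch of dictionary-based scoring with a toy seed lexicon (a real system would use LIWC or another validated dictionary; the word lists here are illustrative):

```python
# Toy seed lexicons; real dictionaries are far larger and validated.
POSITIVE = {"cool", "love", "great", "happy"}
NEGATIVE = {"terrible", "hate", "sad", "awful"}

def lexicon_score(text: str) -> float:
    """Return a sentiment score in [-1, 1] from lexicon hits."""
    tokens = [t.strip("!.,?") for t in text.lower().split()]
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / max(pos + neg, 1)

print(lexicon_score("It's cool!"), lexicon_score("It's terrible!"))
```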
111
Data-driven? Sentiment Analysis (Unsupervised): data-driven discovery of latent structure (representations) for recognition
112
Sentiment Analysis (Supervised)
= 0.76, 0.6, 0.79; = 0.66; = 0.73 (Naïve Bayes, SVM)
Aman, Saima, and Stan Szpakowicz. "Identifying expressions of emotion in text." Text, speech and dialogue. Springer Berlin/Heidelberg, 2007.
Feature Representation → Classifier → Emotion Label
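A minimal scikit-learn sketch of this feature-representation-plus-classifier pipeline, with a Naive Bayes classifier as in the study above (the tiny corpus is illustrative):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["I love it", "It's cool", "It's terrible", "I hate this"]
labels = ["positive", "positive", "negative", "negative"]

# Bag-of-words + bigram features feeding a Naive Bayes classifier
model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), MultinomialNB())
model.fit(texts, labels)
print(model.predict(["so cool"]))   # -> ['positive']
```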
113
Deep Model
Lopez, Marc Moreno, and Jugal Kalita. "Deep Learning applied to NLP." arXiv preprint arXiv:1703.03091 (2017).
Diagram: tokens "I", "love", "it" → Embedding layer → LSTM → "positive"
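A minimal PyTorch sketch matching the diagram (toy vocabulary and token ids; a real model would be trained with cross-entropy on labeled text):

```python
import torch
import torch.nn as nn

class LSTMSentiment(nn.Module):
    def __init__(self, vocab_size=1000, emb_dim=50, hidden=64, n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_classes)

    def forward(self, token_ids):        # token_ids: (batch, seq_len)
        h, _ = self.lstm(self.emb(token_ids))
        return self.out(h[:, -1])        # classify from the last time step

ids = torch.tensor([[11, 42, 7]])        # e.g., "I love it" as toy token ids
print(LSTMSentiment()(ids).shape)        # torch.Size([1, 2])
```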
114
?
115
Automatic Speech Recognition (ASR)
f(speech) = text
Challenging task
speaker, gender
Mapping / Translation
116
phoneme
Aldeneh, Zakaria & Khorram, Soheil & Dimitriadis, Dimitrios & Mower Provost, Emily. (2017). Pooling acoustic and lexical features for the prediction of valence. 68-72. 10.1145/3136755.3136760.
Affect: Natural Language, Non-Verbal, Speech, Bio-Information, Image
Pooling Intermediate Representations → Performance, Robustness
117
Facial Action Coding System (FACS)
AU
118
Facial Action Coding System (FACS)
FACS: the tool for annotating facial expressions
"What The Face Reveals" is strong evidence for the fruitfulness of the systematic analysis of facial expression
Paul Ekman and Wallace V. Friesen, 1976
Action Units (AUs)
AUs are considered to be the smallest visually discernible facial movements
As AUs are independent of any interpretation, they can be used as the basis for recognition of basic emotions
FACS is an explicit means of describing all possible movements of the face in 46 action points
Action Units (AUs): FACS is a tool for measuring facial expressions; each observable component of facial movement is called an AU
All facial expressions can be broken down into their constituent AUs
AU | Description
1 | Inner Brow Raiser
4 | Brow Lowerer
7 | Lid Tightener
12 | Lip Corner Puller
13 | Cheek Puffer
20 | Lip Stretcher
AU framework
Pipeline: Automatic face & facial feature detection → Face alignment → Multiple image windows at a variety of locations and scales → Image filter (modify or enhance the image; e.g., Gabor filter coefficients) → Feature extraction (facilitate subsequent learning and generalization, leading to better human interpretation) → Facial AUs (e.g., AU1, AU7, AU6+AU15) → Rule-based classifier → Facial expressions of emotion (e.g., happy, fear, disgust, surprise)
"Recognizing action units for facial expression analysis.Tian, Y-I., Takeo Kanade, and Jeffrey F. Cohn.
Recognize AUs for Facial Expression Analysis- Rule-based Classifier
Informed by FACS AUs, they group the facial features into upper and lower parts because the facial actions in the two regions are relatively independent for AU recognition [14]
[14] P. Ekman and W.V. Friesen, The facial action coding system: A technique for the measurement of facial movement
single AU detection
combined AU detection
Recognize AUs for Facial Expression Analysis - Results
AU detection (Ekman-Hager database), recognition rate:
| | Single AU detection | Combined AU detection |
| Upper face | 75% | 86.7% |
| Lower face | 95.8% | 90.7% |
AU detection (cross-database: train on one database, test on the other), recognition rate:
| | Cohn-Kanade | Ekman-Hager |
| Upper face | 93.2% | 86.7% |
| Lower face | 90.7% | 93.4% |
Facial Expressions of Emotion(e.g., happy, fear, disgust, surprise, etc)
Automatic face & Facial feature detection
Face alignment
Multiple image windows at a variety
of Locations and scales
Feature extraction:Facilitate subsequent learning and generalization, leading to better human interpretation
Image filter:Modify or enhance the image
Facial AU(e.g., AU1, AU7,
AU6+ASU15, etc)
Rule-basedclassifier
"Recognizing Facial Expressions of Emotion using Action Unit Specific Decision Thresholds Mustafa Sert, and Nukhet Aksoy
AAM face track model
Recognizing Facial Expressions of Emotion using Action Unit Specific Decision Thresholds
Extract facial images from Active Appearance Model (AAM) to form an appearance model
Facial AU multi-class classification using ADT for both AU detection and facial expression recognition
ADT learns a separate decision threshold for each AU category, and assigns an instance x to an AU category if and only if:
f(x) = w^T φ(x) + b > θ_AU
where φ(·) is the mapping function that maps the SVM input into a high-dimensional space
Recognizing Facial Expressions of Emotion using Action Unit Specific Decision Thresholds
Table: Prototypic and major variants of AU combinations for the facial expression "fear". "+" denotes logical AND; "," indicates logical OR
Facial expression recognition accuracy of the proposed scheme. Bold bracketed numbers indicate best result, bold numbers denote second best
Recognizing Facial Expressions of Emotion using Action Unit Specific Decision Thresholds
ADT-based AU detector along with the rule-based emotion classifier (B&D) outperforms the baseline methods (A&C)
Among the proposed methods, D gives the best results in all facial emotion categories except surprise
The proposed ADT scheme outperforms the baseline method by an average F1-score of 6.383% for 17 AUs
It gives superior performance in terms of F1-score compared with the baseline method for all AUs except AU2
"Compound facial expressions of emotion: from basic research to clinical applicationsShichuan Du, and Aleix M. Martinez
Observations under distinct compound emotions
Compound facial expressions of emotion
Compound facial expressions of emotion
AU intensity shown in a cumulative histogram for each AU and emotion category
The x-axis in these histograms specifies the intensity of activation; the y-axis defines the cumulative percentage of intensity (scale 0 to 1)
Numbers between zero and one specify the percentage of people using the specified and smaller intensities.
Fig. AUs used to express a compound emotion are consistent with the AUs used to express its component categories
Key take-away
[Take-away points on AUs and AU+DNN approaches, originally in Chinese; text lost in extraction.]
135
136
Affect: Natural Language, Non-Verbal (Speech, Physiology, Face, Body Gestures)
reference: de Meijer, M. The contribution of general features of body movement to the attribution of emotions. Journal of Nonverbal Behavior 13, 4 (1989), 247-268.
Psychology:
Bull, P. E. Posture and gesture. Pergamon Press, 1987.
Pollick, F. E., Paterson, H. M., Bruderlin, A., and Sanford, A. J. Perceiving affect from arm movement. Cognition 82, 2 (2001), B51-B61.
Coulson, M. Attributing emotion to static body postures: Recognition accuracy, confusions, and viewpoint dependence. Journal of Nonverbal Behavior 28, 2 (2004), 117-139.
Boone, R. T., and Cunningham, J. G. Children's decoding of emotion in expressive body movement: The development of cue attunement. Developmental Psychology 34 (1998), 1007-1016.
de Meijer, M. The contribution of general features of body movement to the attribution of emotions. Journal of Nonverbal Behavior 13, 4 (1989), 247-268.
Engineering:
Balomenos, T., Raouzaiou, A., Ioannou, S., Drosopoulos, A., Karpouzis, K., and Kollias, S. Emotion analysis in man-machine interaction systems. In Machine Learning for Multimodal Interaction. Springer, 2005, 318-328.
Coulson, M. Attributing emotion to static body postures: Recognition accuracy, confusions, and viewpoint dependence. Journal of Nonverbal Behavior 28, 2 (2004), 117-139.
reference: Stefano Piana, Alessandra Staglianò, Francesca Odone, Alessandro Verri, Antonio Camurri, Real-time Automatic Emotion Recognition from Body Gestures, in ArXiv 2014
?
12 actors (four female and eight male), aged between 24 and 60; a total of about 100 videos; separate clips of expressive gestures
reference: 1) Stefano Piana, Alessandra Staglianò, Francesca Odone, Alessandro Verri, Antonio Camurri, Real-time Automatic Emotion Recognition from Body Gestures, in ArXiv 2014
2) Amol S. Patwardhan and Gerald M. Knapp, Augmenting Supervised Emotion Recognition with Rule-Based Decision Model, in ArXiv 2016
Qualisys, Kinect
139
Data Validation: human annotation?
The sole 3D skeleton is a guarantee that the user is not exploiting other information
Not easy for humans to recognize emotion based only on gesture
reference: Stefano Piana, Alessandra Staglianò, Francesca Odone, Alessandro Verri, Antonio Camurri, Real-time Automatic Emotion Recognition from Body Gestures, in ArXiv 2014
Skeleton-based features
Emotions: anger, sadness, happiness, fear, surprise, disgust
reference: 1) Stefano Piana, Alessandra Staglianò, Francesca Odone, Alessandro Verri, Antonio Camurri, Real-time Automatic Emotion Recognition from Body Gestures, in ArXiv 2014
2) Piana, S., Staglianò, A., Camurri, A., and Odone, F. A set of full-body movement features for emotion recognition to help children affected by autism spectrum condition. In IDGEI International Workshop (2013).
Histogram: energy in each of the frames
Classification Results
Qualisys data with 310 gestures
Kinect data with 579 gestures
Clean dataset
Noisy dataset
Almost the same as humans' recognition ability
Skeleton Capture Methods
Qualisys: expensive, sophisticated system with multiple high-speed cameras
Kinect: cheap, easy-to-get RGB-D 3D camera device
OpenPose: free, new software system based on CNNs
reference: https://itp.nyu.edu/classes/dance-f16/kinect/
https://github.com/CMU-Perceptual-Computing-Lab/openpose
https://www.qualisys.com/
143
OpenPose: CNN based Method
144
Pose difference/movement indicative of arousal mostly
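A minimal sketch of turning pose-estimation output into such a movement-energy cue; it assumes OpenPose was run with its --write_json option, producing one JSON file per frame in a hypothetical output_json/ folder with BODY_25-style keypoints:

```python
import json
import glob
import numpy as np

frames = []
for path in sorted(glob.glob("output_json/*.json")):   # hypothetical folder
    with open(path) as f:
        people = json.load(f)["people"]
    if people:
        # OpenPose stores keypoints as a flat [x, y, confidence, ...] list
        kp = np.array(people[0]["pose_keypoints_2d"]).reshape(-1, 3)
        frames.append(kp[:, :2])                       # drop confidences

# Mean per-joint displacement between consecutive frames: a crude proxy
# for movement energy, which the slide links mostly to arousal. Assumes
# at least two frames with a detected person.
energy = [np.linalg.norm(b - a, axis=1).mean()
          for a, b in zip(frames, frames[1:])]
print(np.mean(energy))
```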
145
Affect: Natural Language, Non-Verbal (Speech, Physiology, Face, Body Gestures)
Expressive Data AI Learning and Inference ?
146
Internal Data AI Learning and Inference ?
147
Polyvagal theory
Stephen Porges
148
(immobilization)
(fight-flight) (mobilization)
149
reference: D. S. Quintana, A. J. Guastella, T. Outhred, I. B. Hickie, and A. H. Kemp. Heart rate variability is associated with emotion recognition: direct evidence for a relationship between the autonomic nervous system and social cognition. Int. J. of Psychophysiol, 86(2):168-172, 2012
http://blog.sina.com.cn/s/blog_753e49f90100pop2.html
http://www.xzbu.com/6/view-2908185.htm
150
HRV (Heart Rate Variability)
HRV reflects autonomic nervous system (ANS) activity; emotion recognition via HRV analysis [2]
reference: 1) María Teresa Valderas, Juan Bolea, Pablo Laguna, Montserrat Vallverdú, Raquel Bailón, Human Emotion Recognition Using Heart Rate Variability Analysis with Spectral Bands Based on Respiration, in Engineering in Medicine and Biology Society (EMBC), 2015 37th Annual International Conference of the IEEE.
2) Task Force of the European Society of Cardiology and the North American Society of Pacing and Electrophysiology (1996) Heart rate variability. Standards of measurement, physiological interpretation, and clinical use. Eur Heart J 17(3):354-81
3) D. S. Quintana, A. J. Guastella, T. Outhred, I. B. Hickie, and A. H. Kemp. Heart rate variability is associated with emotion recognition: direct evidence for a relationship between the autonomic nervous system and social cognition. Int. J. of Psychophysiol, 86(2):168-172, 2012.
151
HRV spectral measures:
Measure | Unit | Band / Definition
Total power (TP) | ms² | ≤ 0.4 Hz
Very low frequency power (VLFP) | ms² | ≤ 0.04 Hz
Low frequency power (LFP) | ms² | 0.04-0.15 Hz
High frequency power (HFP) | ms² | 0.15-0.4 Hz
Normalized LFP (nLFP) | n.u. | LF/(TP-VLF)
Normalized HFP (nHFP) | n.u. | HF/(TP-VLF)
LF/HF ratio | |
https://zh.wikipedia.org/wiki/%E5%BF%83%E7%8E%87%E8%AE%8A%E7%95%B0%E5%88%86%E6%9E%90
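A minimal SciPy sketch of computing the LF and HF band powers in the table above from a toy RR-interval series (the RR data, resampling rate, and window length are illustrative assumptions):

```python
import numpy as np
from scipy.signal import welch
from scipy.interpolate import interp1d

rr_ms = np.array([810, 790, 850, 820, 900, 870, 840, 860] * 20)  # toy RR data
t = np.cumsum(rr_ms) / 1000.0                # beat times in seconds
fs = 4.0                                     # even resampling rate (Hz)
grid = np.arange(t[0], t[-1], 1 / fs)
rr_even = interp1d(t, rr_ms)(grid)           # evenly sampled tachogram

f, psd = welch(rr_even - rr_even.mean(), fs=fs, nperseg=256)

def band_power(lo, hi):
    m = (f >= lo) & (f < hi)
    return np.trapz(psd[m], f[m])            # integrate PSD over the band

lf, hf = band_power(0.04, 0.15), band_power(0.15, 0.4)
print(lf, hf, lf / hf)                       # LFP, HFP, LF/HF ratio
```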
Emotion elicitation: real experiences, film clips, problem solving, computer game interfaces, images, spoken words, music
Movie clips: an emotion-inducing method more efficient than others, as verified by previous studies
4 films (3-10 min each); 4 emotions: angry, fear, sad, and happy; ECG data was recorded for 90 sec, starting 2 min before the end of the movies.
reference: 1)Han Wen Guo, Yu Shun Huang, Jen Chien Chien, Jiann Shing Shieh, Short-term Analysis of Heart Rate Variability for Emotion Recognition via a Wearable ECG Device in Intelligent Informatics and Biomedical Sciences (ICIIBMS), 20152) Mimma Nardelli, Gaetano Valenza, Alberto Greco, Antonio Lanata, Enzo Pasquale Scilingo, Recognizing Emotions Induced by Affective Sounds through Heart Rate Variability in IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, VOL. 6, NO. 4, OCTOBER-DECEMBER 2015
induced
153
ECG process pipeline
reference: Abhishek Vaish and Pinki Kumari, A Comparative Study on Machine Learning Algorithms in Emotion State Recognition Using ECG in Proceedings of the Second International Conference on Soft Computing for Problem Solving (SocProS 2012), December 28-30, 2012
154
ECG Feature Extraction: HRV
Time-Domain and Frequency-Domain Features
reference: Han Wen Guo, Yu Shun Huang, Jen Chien Chien, Jiann Shing Shieh, Short-term Analysis of Heart Rate Variability for Emotion Recognition via a Wearable ECG Device in Intelligent Informatics and Biomedical Sciences (ICIIBMS), 2015
Time-domain features:
1. MeanRRI: average of the resultant RR intervals
2. CVRR: the ratio of the standard deviation to the mean of the RR intervals
3. SDRR: standard deviation of the RR intervals
4. SDSD: standard deviation of the successive differences of the RR intervals
Frequency-domain features:
1. LF (low frequency): standardized LF power (0.04-0.15 Hz)
2. HF (high frequency): standardized HF power (0.15-0.4 Hz)
3. LHratio: the ratio LF/HF
Statistical features: the shapes of the probability distributions; evaluate the distribution
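A minimal NumPy sketch of the four time-domain features listed above (toy RR-interval data in milliseconds):

```python
import numpy as np

rr = np.array([810, 790, 850, 820, 900, 870, 840, 860], dtype=float)  # ms

mean_rri = rr.mean()               # 1. MeanRRI: average RR interval
sdrr = rr.std(ddof=1)              # 3. SDRR: std of RR intervals
cvrr = sdrr / mean_rri             # 2. CVRR: std / mean of RR intervals
sdsd = np.diff(rr).std(ddof=1)     # 4. SDSD: std of successive differences
print(mean_rri, cvrr, sdrr, sdsd)
```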
155
Analysis on Feature
Time Domain Feature Frequency Domain Feature Statistics Feature
reference: Han Wen Guo, Yu Shun Huang, Jen Chien Chien, Jiann Shing Shieh, Short-term Analysis of Heart Rate Variability for Emotion Recognition via a Wearable ECG Device in Intelligent Informatics and Biomedical Sciences (ICIIBMS), 2015
156
Classifier
reference: Han Wen Guo, Yu Shun Huang, Jen Chien Chien, Jiann Shing Shieh, Short-term Analysis of Heart Rate Variability for Emotion Recognition via a Wearable ECG Device in Intelligent Informatics and Biomedical Sciences (ICIIBMS), 2015 157
158
1
/
2
4
1
*
*
*
3
6
1
**
1
5
? !
159
160
Group-Level Emotion, Thin Slice
Multi-Task
Cross-Corpus, Common Ground
Cross-Lingual Perspective
161
LLD
Encoding
Result & Discussion (Binary Classification: Unweighted Average Recall)
Database | Act. | Feature | Rep. | Val. | Feature | Rep.
CIT | 0.658 | Praat | BoAW | 0.613 | Praat | FV
IEMOCAP | 0.769 | EGEMAPS | Func. | 0.663 | Praat | FV
NNIME | 0.65 | Praat | FV | 0.564 | Praat | BoAW
RECOLA | 0.634 | EGEMAPS | Func. | 0.602 | Praat | BoAW
VAM | 0.811 | ComP_LLD | FV | 0.665 | EGE_LLD | BoW
Variational Deep Embedding Fisher Scoring
Generated Perspectives Multi-view Kernel Fusion
1. Chun-Min Chang, Bo-Hao Su, Shih-Chen Lin, Jeng-Lin Li, Chi-Chun Lee, "A Bootstrapped Multi-View Weighted Kernel Fusion Framework for Cross-Corpus Integration of Multimodal Emotion Recognition" in Proceedings of ACII 2017
2. Chun-Min Chang, Chi-Chun Lee, "Fusion of Multiple Emotion Perspectives: Improving Affect Recognition Through Integrating Cross-Lingual Emotion Information" in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2017
?
Group-Level Emotion, Thin Slice
Multi-Task
Cross-Corpus, Common Ground
Cross-Lingual Perspective
166
memory
cognitive
emotion
167
168
alexithymia
169
170
Mental Well-being
Ref: https://ac.els-cdn.com/S1877042815003080/1-s2.0-S1877042815003080-main.pdf?_tid=238e46fe-da36-11e7-86e8-00000aab0f26&acdnat=1512531351_6db4641b5d3531d365e0f207f474d65f
171
Well-being
"Boredom, it turns out, can be a dangerous and disruptive state of mind that damages your health" (Mann)
Ref: http://alcoholrehab.com/drug-addiction/boredom-and-substance-abuse/
Boredom
Ref: On the Function of Boredom (Shane W. Bench)
Ref: The Facilitation of Social-Emotional Understanding and Social Interaction in High-Functioning Children with Autism: Intervention Outcomes
Ref: Social Skills Deficits in Children with Autism Spectrum Disorders: Evidence Based Interventions
173
(Pintrich, 1991, p. 199)
(special issue of the Educational Psychologist)
Ref: The Importance of Students' Goals in Their Emotional Experience of Academic Failure: Investigating the Precursors and Consequences of Shame (Jeannine E. Turner)
174
The Importance of Students' Goals in Their Emotional Experience of Academic Failure: Investigating the Precursors and Consequences of Shame (Jeannine E. Turner)
175
fMRI
Ref: https://blog.hubspot.com/marketing/emotions-in-advertising-examples
176
(communication and perception of emotion in music)
(emotional consequences of music listening)
(predictors of music preferences)
Swathi Swaminathan
Ref: Current Emotion Research in Music Psychology (Swathi Swaminathan)
SENSE EMOTION
178
!
179
functional Magnetic Resonance Imaging (fMRI)
180
181
Uses a standard MRI scanner
Acquires a series of images (numbers)
Measures changes in blood oxygenation
Uses non-invasive, non-ionizing radiation
Can be repeated many times; can be used for a wide range of subjects
Combines good spatial and reasonable temporal resolution
Synopsis of fMRI
182
Blood-Oxygen-Level dependent (BOLD)
183
Emotion Perception Decoding from fMRI
Pipeline: fMRI Dataset (interaction behavior) → SPM Preprocessing → Machine Learning → Emotion (behavior observation)
184
Emotional modules
185
Co-activation graph for each emotion category
A) Force-directed graphs for each emotion category, based on the Fruchterman-Reingold spring algorithm
B) The same connections in the anatomical space of the brain.
186
?
187
?
INFERENCE?
188
189
?
Our Research: Human-centered Behavioral Signal Processing (BSP)
Prof. Shrikanth Narayanan
Seek a window into the human mind and traits through an engineering approach
S. Narayanan and P. G. Georgiou, "Behavioral signal processing: Deriving human behavioral informatics from speech and language," Proceedings of the IEEE, vol. 101, no. 5, pp. 1203-1233, 2013.
Daniel Bone, Chi-Chun Lee, Theodora Chaspari, James Gibson, Shrikanth Narayanan, "Signal Processing and Machine Learning for Mental Health Research and Clinical Applications", in IEEE Signal Processing Magazine
(EECS713) Behavioral Informatics and Interaction Computation Laboratory (BIIC)
Signal Processing
Machine Learning
Decision Analytics
High-dimensional Behavior Space, Non-linear Predictive Recognition, Multimodal Integration, Experts' Decision Mechanism
Spatial-temporal modeling, De-noising, Feature extraction
Supervised, Unsupervised, Semi-supervised
Our Technology: Human-centric Decision Analytics Research & Development
Core Technology
Speech & Language: Diarization, Speaker ID, ASR, Paralinguistic Descriptors, Emotion-AI, Sentiment, Word-Topic Representation
Computer Vision: Segmentation, Tracking, Image-Video Descriptors
Multimodal Fusion: Joint speech-language-gesture modeling for multimodal prediction, Multi-party interaction modeling
Representation Learning: Behavior embedded space learning, clinical health informatics data representation
Predictive Learning: Deep-learning, machine learning based predictive modeling
BIIC Interdisciplinary Research
ASD
PAIN
EHR
fMRI
EMO-AI
ORAL
Mental Health
Clinical Health
Affective Computing
Education
Our Application: Human-centered Exemplary BSP Domains
Flow
Consumer Behavior
EMO-AI
Neuroscience
KEY APPLICATIONS
Affective Computing
Mental Health
Clinical Health
Education
Neuroscience
Consumer Behavior
193
Computing beyond status-quo in making a positive impact
Factual Conceptual Procedural Metacognitive
Computation Blueprints
BehaviorComputing
HealthAnalytics
Affect Recognition
Emphatic Computing
Social Computing
Value-Sensitive Technology
Affective Feedback
Interpersonal Relationship Computing
Cognitive Feedback
Fulfillment Empowerment
Motivation
Internal States
External Functions
Our Vision: Human-Centric Computing (HCC): computationally innovate human-centric empowerment, enabling next-generation entity intelligence
195
PHD
BIIC LAB MEMBERS
196
BIIC Lab @ NTHU EE
http://biic.ee.nthu.edu.tw
THANK YOU . . .