Emotion-AI:
December 16th, 2017
1
Affective Computing (1995):
the study and development of systems and devices that can recognize, interpret, process, and simulate human affects
Professor Rosalind Picard, MIT Media Lab
Annual Conference on Affective Computing and Intelligent Interaction (ACII)
ACII 2017 @ San Antonio
2
3
?
4
People also smile when they are miserable. Paul Ekman, Telling Lies: Clues to Deceit in the Marketplace, Politics, and Marriage
5
6
1. Philosophy: discussing emotion through philosophy
2. Mind-Body Dualism: combining the physical world with emotion
3. Turn to Practical: combining the physical and the emotional, and beginning to apply these systems to humans
4. Modern Theory
5. Cognitive Process: cognitive theory
7
https://aquileana.wordpress.com/2014/04/14/platos-phaedrus-the-allegory-of-the-chariot-and-the-tripartite-nature-of-the-soul/
Plato's horses
Successful person: the reason horse is more in control
Plato described emotion and reason as two horses pulling us in opposite directions.
Philosophy: discussing emotion through philosophy
8
Stoicism Aristippus
Philosophy: discussing emotion through philosophy
9
Mind-Body Dualism
In the 17th century, René Descartes viewed the body's emotional apparatus as largely hydraulic. He believed that when a person felt angry or sad it was because certain internal valves opened and released such fluids as bile and phlegm.
Mind-Body Dualism: combining the physical world with emotion
10
Charles Darwin believed that emotions were beneficial for evolution because emotions improved chances of survival. For example, the brain uses emotion to keep us away from a dangerous animal (fear), away from rotting food and fecal matter (disgust), in control of our resources (anger), and in pursuit of a good meal or a good mate (pleasure and lust).
Damasio, Antonio R. Looking for Spinoza: Joy, Sorrow, and the Feeling Brain. New York NY: Harcourt, Inc., 2003.
Turn to Practical: discussing the combination of the physical and the emotional, and beginning to apply these systems to humans
11
James, William. 1884. "What Is an Emotion?" Mind. 9, no. 34: 188-205.
"Our feeling of the same changes as they occur is the emotion."
Modern Theory
12
James-Lange / Cannon-Bard / Schachter & Singer
13
James-Lange / Cannon-Bard / Schachter & Singer
[Comparison slides originally in Chinese; most text lost in extraction. They contrasted the James-Lange theory, the Cannon-Bard theory (hypothalamus, limbic system), and Schachter & Singer's two-factor theory of emotion.]
Cognitive Process: Cognitive Theory
21
James-Lange
22
Arnold, Lazarus: appraisal theory
Tomkins, Izard: basic/discrete emotions
23
24
25
INFERENCE?
26
?
TAG?
27
Charles Darwin believed that emotions were beneficial for evolution because emotions improved chances of survival. For example, the brain uses emotion to keep us away from a dangerous animal (fear), away from rotting food and fecal matter (disgust), in control of our resources (anger), and in pursuit of a good meal or a good mate (pleasure and lust).
Damasio, Antonio R. Looking for Spinoza: Joy, Sorrow, and the Feeling Brain. New York NY: Harcourt, Inc., 2003.
28
100
Paul Ekman
29
30
Are There Universal Facial Expressions? Just guess
31
Facial Action Coding System (FACS)
AU
32
Mascolo, M. F., Fischer, K. W., & Li,J. (2003). Dynamic development of component system of emotions: Pride, shame, and guilt in China and the United States. In R. J. Davidson, K. R. Scherer, & H. H. Goldsmith (Eds.), Handbook of affective sciences (pp. 375-408). New York: Oxford University Press.Shaver, P. R., Wu, S., & Schwartz, J. C. (1992). Cross-cultural similarities and differences in emotion and its representation: A prototype approach. In Clark, M. S. (Ed.), Review of Personality and Social Psychology, 13, pp. 231-251. Sage: Thousand Oaks.
33
34
?
35
label ?
there is no limit to the number of possible different emotions
William James
36
Silvan Tomkins (1962) concluded that there are eight basic emotions:
surprise, interest, joy, rage, fear, disgust, shame, and anguish
Carroll Izard (University of Delaware, 1993): 12 discrete emotions labeled: Interest, Joy, Surprise, Sadness, Anger, Disgust, Contempt,
Self-Hostility, Fear, Shame, Shyness, and Guilt
Differential Emotions Scale or DES-IV
37
Ekman (1972): six basic emotions
38
Dimensional models of emotion
Define emotions according to one or more dimensions
Wilhelm Max Wundt (1897): three dimensions: "pleasurable versus unpleasurable", "arousing or subduing", and "strain or relaxation"
Harold Schlosberg (1954): three dimensions of emotion: "pleasantness-unpleasantness", "attention-rejection", and "level of activation"
Prevalent models incorporate valence and arousal dimensions
39
Circumplex model; Vector model; Positive activation - negative activation (PANA) model; Plutchik's model; PAD emotional state model; Lövheim cube of emotion; Cowen & Keltner 2017
40
Circumplex model: Perceptual
Developed by James Russell (1980): a two-dimensional circular space containing arousal and valence dimensions
Arousal represents the vertical axis and valence represents the horizontal axis
Prevalent in use as labels
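As an illustration (not from the original slides), a minimal Python sketch of how a valence-arousal point can be mapped to a coarse quadrant label; the label names and boundaries here are illustrative assumptions:

```python
# Minimal sketch: mapping a point in Russell's valence-arousal space to a
# coarse quadrant label. Label names and boundaries are illustrative only.
def circumplex_quadrant(valence: float, arousal: float) -> str:
    """valence and arousal are assumed to lie in [-1, 1]."""
    if arousal >= 0:
        return "excited/happy" if valence >= 0 else "angry/afraid"
    return "calm/content" if valence >= 0 else "sad/depressed"

print(circumplex_quadrant(0.7, 0.4))    # -> excited/happy
print(circumplex_quadrant(-0.6, -0.3))  # -> sad/depressed
```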
41
Positive activation - Negative activation (PANA): Self-Report
Created by Watson and Tellegen in 1985; suggests that positive affect and negative affect are two separate systems (responsible for different functions)
States of higher arousal tend to be defined by their valence; states of lower arousal tend to be more neutral in terms of valence
The vertical axis represents low to high positive affect; the horizontal axis represents low to high negative affect; the dimensions of valence and arousal lie at a 45-degree rotation over these axes
42
43
Cowen & Keltner
2017, University of California, Berkeley researchers Alan S. Cowen & Dacher Keltner (PNAS)
27 distinct emotions: http://news.berkeley.edu/2017/09/06/27-emotions/
(A.) Admiration. (B.) Adoration. (C.) Aesthetic appreciation. (D.) Amusement. (E.) Anger. (F.) Anxiety. (G.) Awe. (H.) Awkwardness. (I.) Boredom. (J.) Calmness. (K.) Confusion. (L.) Craving. (M.) Disgust. (N.) Empathic pain. (O.) Entrancement. (P.) Excitement. (Q.) Fear. (R.) Horror. (S.) Interest. (T.) Joy. (U.) Nostalgia. (V.) Relief. (W.) Romance. (X.) Sadness. (Y.) Satisfaction. (Z.) Sexual desire. Surprise.
44
http://news.berkeley.edu/2017/09/06/27-emotions/
Affective Computing
Many Theories
Many Models/Annotations
Take Away? Stable
45
(Data-driven AI Learning and Inference) ?
46
?
47
Affective Computing
reference: https://www.gartner.com/newsroom/id/3412017/
fast-growing, but still not a mature technology
48
Face
Affective Computing
Speech, Body Gesture, Multi-Modal, Physiology, Language
reference: http://blog.ventureradar.com/2016/09/21/15-leading-affective-computing-companies-you-should-know/
Education Health Care Gaming Advertisement Retail Legal
Emotion Recognition as Part of a Larger System (API, SDK)
50
51
Little Dragon(Affectiva- Education)
make learning more enjoyable and more effective, by providing an educational tool that is both universal and personalized
reference: https://www.affectiva.com/success-story/
https://www.youtube.com/watch?v=SmjAa8iMkjU
53
Nevermind(Affectiva- Gaming)
bio-feedback horror game
senses a player's facial expressions for signs of emotional distress, and adapts gameplay accordingly
reference: https://www.affectiva.com/success-story/
https://www.youtube.com/watch?v=NGr0orAqRH4&t=497s
Brain Power(Affectiva- Health Care)
The World's First Augmented Reality Smart-Glass System to empower children and adults with autism to teach themselves crucial social and cognitive skills.
reference: https://www.affectiva.com/success-story/
https://www.youtube.com/watch?v=qfoTprgWyns
55
56
MediaRebel(Affectiva- Legal)
Legal video deposition management platform MediaRebel uses Affectiva's Emotion SDK for facial expression analysis and emotion recognition.
Intelligent analytical features include: search transcripts based upon witness emotions; instantly play back testimony based upon selected emotions; identify positive, negative & neutral witness behavior
reference: https://www.affectiva.com/success-story/
https://www.mediarebel.com/
shelfPoint(Affectiva- Retail)
Cloverleaf is a retail technology company for the modern brick-and-mortar marketer and merchandiser
shelfPoint solution: brands and retailers can now capture customer engagement and sentiment data at the moment of purchase decision, something previously unavailable in physical retail stores.
reference: https://www.affectiva.com/success-story/
https://www.youtube.com/watch?v=S9gDqpF6kLs
https://www.youtube.com/watch?v=W6UnahO_zXs
59
60
Data-driven AI Learning and Inference ?
61
?
?
62
63
Year | Database | Language | Setting | Protocol | Elicitation
1997 | DES | Dan. | Single | Scr. | Induced
2000 | GEMEP | Fre. | Single | Scr. & Spo. | Acted
2005 | eNTERFACE'05 | Eng. | Single | Scr. | Induced
2007 | HUMAINE | Eng. | TV Talk | Scr. & Spo. | Mix.
2008 | VAM | Ger. | TV Talk | Spo. | Acted
2008 | IEMOCAP | Eng. | Dyadic | Scr. & Spo. | Acted
2009 | SAVEE | Eng. | Single | Spo. | Acted
2010 | CIT | Eng. | Dyadic | Scr. & Spo. | Acted
2010 | SEMAINE | Eng. | Dyadic | Scr. | Mix.
2013 | RECOLA | Fre. | Dyadic | Spo. | Acted
2016 | CHEAVD | Chi. | TV talk | Spo. | Posed
2017 | NNIME | Chi. | Dyadic | Spo. | Acted
Language: Danish
Participants: 4 (Male: 2; Female: 2)
Recordings: Audio; Total: 0.5 hours; Sentences: 5200 utterances
Labels: Perspectives: Naïve-Observer; Raters: 20; Discrete session-level annotation; Categorical (5)
DES: Design, Recording and Verification of a Danish Emotional Speech Database
64
Engberg, Inger S., et al. "Design, recording and verification of a Danish emotional speech database." Fifth European Conference on Speech Communication and Technology. 1997.
Available: Tom Brøndsted ([email protected])
DES
Loss-Scaled Large-Margin Gaussian Mixture Models for Speech Emotion Classification1 (Cat.: 0.676)
Automatic emotional speech classification2 (Cat.: 0.516)
65
1Yun, Sungrack, and Chang D. Yoo. "Loss-scaled large-margin Gaussian mixture models for speech emotion classification."IEEE Transactions on Audio, Speech, and Language Processing20.2 (2012): 585-598.
2Ververidis, Dimitrios, Constantine Kotropoulos, and Ioannis Pitas. "Automatic emotional speech classification." Acoustics, Speech, and Signal Processing, 2004. Proceedings.(ICASSP'04). IEEE International Conference on. Vol. 1. IEEE, 2004..
Language: French
Participants: 10 (Male: 5; Female: 5)
Recordings: Dual-channel Audio; HD Video; Manual Transcript; Face & Head; Body Posture & Gestures
Sentences: 7300 sequences
Labels: Perspectives: Naïve-Observer; Discrete session-level annotation; Categorical (18)
GEMEP: Geneva Multimodal Emotion Portrayals corpus
66
Bänziger, Tanja, Hannes Pirker, and K. Scherer. "GEMEP - GEneva Multimodal Emotion Portrayals: A corpus for the study of multimodal emotional expressions." Proceedings of LREC. Vol. 6. 2006.
Bänziger, Tanja, and Klaus R. Scherer. "Using actor portrayals to systematically study multimodal emotion expression: The GEMEP corpus." International conference on affective computing and intelligent interaction. Springer, Berlin, Heidelberg, 2007.
Available: Tanja Bänziger (Tanja.Banziger@ pse.unige.ch)
GEMEP
Multimodal emotion recognition from expressive faces, body gestures and speech
(Cat.: 0.571)
67
Kessous, Loic, Ginevra Castellano, and George Caridakis. "Multimodal emotion recognition in speech-based interaction using facial expression, body gesture and acoustic analysis." Journal on Multimodal User Interfaces 3.1 (2010): 33-48.
Language: English
Participants: 42 (Male: 34; Female: 24) (14 different nationalities)
Recordings: Dual-channel Audio; HD Video; Script
Total: 1166 video sequences
Emotion-related atmosphere: to express six emotions
eNTERFACE'05: The eNTERFACE'05 Audio-Visual Emotion Database
68
Martin, Olivier, et al. "The enterface05 audio-visual emotion database." Data Engineering Workshops, 2006. Proceedings. 22nd International Conference on. IEEE, 2006.
Available: O. Martin ([email protected])
eNTERFACE'05
Sparse autoencoder-based feature transfer learning for speech emotion recognition1 (Cat.: 59.1)
Unsupervised learning in cross-corpus acoustic emotion recognition2 (Val./Act.: 0.574/0.616)
69
1Deng, Jun, et al. "Sparse autoencoder-based feature transfer learning for speech emotion recognition." Affective Computing and Intelligent Interaction (ACII), 2013 Humaine Association Conference on. IEEE, 2013.
2Zhang, Zixing, et al. "Unsupervised learning in cross-corpus acoustic emotion recognition." Automatic Speech Recognition and Understanding (ASRU), 2011 IEEE Workshop on. IEEE, 2011.
Language: English
Participants: Many (includes 8 datasets)
Recordings (naturalistic (TV shows, interviews) / induced data): Audio; Video; Gesture; Emotion words
Labels: Perspectives: Naïve-Observer; Raters: 4; Continuous-in-time annotation; Dimensional (8) [Intensity, Activation, Valence, Power, Expect, Word]; Discrete annotation (5); Emotion-related states; Key Events; Everyday Emotion words
HUMAINE: Addressing the Collection and Annotation of Naturalistic and Induced Emotional Data
70
Douglas-Cowie, Ellen, et al. "The HUMAINE database: addressing the collection and annotation of naturalistic and induced emotional data." Affective computing and intelligent interaction (2007): 488-500.
Available: [email protected]
HUMAINE
A Multimodal Database for Affect Recognition and Implicit Tagging1 (Val./Act.: 0.761/0.677)
Abandoning Emotion Classes - Towards Continuous Emotion Recognition with Modelling of Long-Range Dependencies2 (Val./Act. [MSE]: 0.18/0.08)
71
1Soleymani, Mohammad, et al. "A multimodal database for affect recognition and implicit tagging." IEEE Transactions on Affective Computing 3.1 (2012): 42-55.
2Wöllmer, Martin, et al. "Abandoning emotion classes - towards continuous emotion recognition with modelling of long-range dependencies." Ninth Annual Conference of the International Speech Communication Association. 2008.
Language: German (TV shows)
Participants: 47
Recordings: Audio; Video (Face); Manual Transcript
Total: 12 hours; Sentences: 946 utterances
Labels: Perspectives: Peer, Director, Self, Naïve-Observer; Raters: 17; Continuous-in-time annotation; Dimensional (Valence-Activation-Dominance) for Audio; Discrete session-level annotation; Categorical (7) for Faces
VAM: The Vera am Mittag German Audio-Visual Spontaneous Speech Database
72
Grimm, Michael, Kristian Kroschel, and Shrikanth Narayanan. "The Vera am Mittag German audio-visual emotional speech database." Multimedia and Expo, 2008 IEEE International Conference on. IEEE, 2008.
Available: [email protected]
VAM
Towards robust spontaneous speech recognition with emotional speech adapted acoustic models1 (Word Acc.: 42.75)
Selecting training data for cross-corpus speech emotion recognition: Prototypicality vs. generalization2 (Val./Act.: 0.502/0.677)
73
1Vlasenko, Bogdan, Dmytro Prylipko, and Andreas Wendemuth. "Towards robust spontaneous speech recognition with emotional speech adapted acoustic models." Poster and Demo Track of the 35th German Conference on Artificial Intelligence, KI-2012, Saarbrücken, Germany. 2012.
2Schuller, Björn, et al. "Selecting training data for cross-corpus speech emotion recognition: Prototypicality vs. generalization." Proc. 2011 Afeka-AVIOS Speech Processing Conference, Tel Aviv, Israel. 2011.
Language: English
Participants: 10 (Male: 5; Female: 5)
Recordings: Dual-channel Audio; HD Video; Manual Transcript; 53-Marker Motion Capture (Face and Head)
Total: 12 hours, 50 sessions (3 min/session); Sentences: 6904
Labels: Perspectives: Naïve-Observer, Self (6/10); Raters: 6; Continuous-in-time annotation; Dimensional (Valence-Activation-Dominance); Discrete session-level annotation; Categorical (5)
IEMOCAP: The Interactive Emotional Dyadic Motion Capture database
74
Busso, Carlos, et al. "IEMOCAP: Interactive emotional dyadic motion capture database." Language resources and evaluation 42.4 (2008): 335.
Available: Anil Ramakrishna ([email protected])
IEMOCAP
Tracking continuous emotional trends of participants during affective dyadic interactions using body language and speech information1 (Val./Act./Dom.: 0.619/0.637/0.62)
Modeling mutual influence of interlocutor emotion states in dyadic spoken interactions2 (Cat./Val./Act.: 0.552/0.634/0.650)
75
1Metallinou, Angeliki, Athanasios Katsamanis, and Shrikanth Narayanan. "Tracking continuous emotional trends of participants during affective dyadic interactions using body language and speech information." Image and Vision Computing 31.2 (2013): 137-152.
2Lee, Chi-Chun, et al. "Modeling mutual influence of interlocutor emotion states in dyadic spoken interactions." Tenth Annual Conference of the International Speech Communication Association. 2009.
Language: English
Participants: 4 (Male: 4)
Recordings: Dual-channel Audio; Video; Face Markers
Sentences: 480 utterances
Labels: Perspectives: Naïve-Observer; Discrete session-level annotation; Categorical (6)
SAVEE: Surrey Audio-Visual Expressed Emotion database
76
Jackson, P., and S. Haq. "Surrey Audio-Visual Expressed Emotion(SAVEE) Database." University of Surrey: Guildford, UK (2014).
Available: P Jackson ([email protected])
SAVEE
77
Speaker-Dependent Audio-Visual Emotion Recognition2 (Cat.: 97.5)
Audio-Visual Feature Selection and Reduction for Emotion Classification3 (Cat.: 96.7)
2S. Haq and P.J.B. Jackson. "Speaker-Dependent Audio-Visual Emotion Recognition", In Proc. Int'l Conf. on Auditory-Visual Speech Processing, pages 53-58, 2009.
3S. Haq, P.J.B. Jackson, and J.D. Edge. "Audio-Visual Feature Selection and Reduction for Emotion Classification." In Proc. Int'l Conf. on Auditory-Visual Speech Processing, pages 185-190, 2008.
Language: English
Participants: 16 (Male: 7; Female: 9)
Recordings: Dual-channel Audio; HD Video; Transcript; Body gesture
Total: 48 dyadic sessions; Sentences: 2162
Labels: Perspectives: Naïve-Observer; Raters: 3; Discrete session-level annotation; Continuous-in-time annotation; Dimensional (Valence-Activation-Dominance)
CIT: The USC CreativeIT database of multimodal dyadic interactions: from speech and full body motion capture to continuous emotional annotations
78
Metallinou, Angeliki, et al. "The USC CreativeIT database: A multimodal database of theatrical improvisation." Multimodal Corpora: Advances in Capturing, Coding and Analyzing Multimodality (2010): 55.
Metallinou, Angeliki, et al. "The USC CreativeIT database of multimodal dyadic interactions: From speech and full body motion capture to continuous emotional annotations." Language resources and evaluation 50.3 (2016): 497-521.
Available: Manoj Kumar ([email protected])
CIT
79
1Yang, Zhaojun, and Shrikanth S. Narayanan. "Modeling dynamics of expressive body gestures in dyadic interactions." IEEE Transactions on Affective Computing 8.3 (2017): 369-381.
2Yang, Zhaojun, and Shrikanth S. Narayanan. "Analyzing Temporal Dynamics of Dyadic Synchrony in Affective Interactions." INTERSPEECH. 2016.
3Chang, Chun-Min, and Chi-Chun Lee. "Fusion of multiple emotion perspectives: Improving affect recognition through integrating cross-lingual emotion information." Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on. IEEE, 2017.
Analyzing Temporal Dynamics of Dyadic Synchrony in Affective Interactions2
Language: English
Participants: 150
Recordings: Dual-channel Audio; HD Video; Manual Transcript
Multi-Interaction (like a TV talk show): Human vs. Human; Semi-human vs. Human; Machine vs. Human
Total: 959 dyadic sessions (3 min/session)
Labels: Perspectives: Naïve-Observer; Raters: 8; Continuous-in-time annotation; Dimensional (Valence-Activation); Discrete Categorical (27)
SEMAINE: The SEMAINE Database: Annotated Multimodal Records of Emotionally Colored Conversations between a Person and a Limited Agent
80
McKeown, Gary, et al. "The semaine database: Annotated multimodal records of emotionally colored conversations between a person and a limited agent." IEEE Transactions on Affective Computing 3.1 (2012): 5-17.
Available: [email protected]
SEMAINE
Building autonomous sensitive artificial listeners1
A Dynamic Appearance Descriptor Approach to Facial Actions Temporal Modeling2 (0.701)
81
1Schroder, Marc, et al. "Building autonomous sensitive artificial listeners." IEEE Transactions on Affective Computing 3.2 (2012): 165-183.
2Jiang, Bihan, et al. "A dynamic appearance descriptor approach to facial actions temporal modeling." IEEE transactions on cybernetics 44.2 (2014): 161-174.
Language: French
Participants: 46 (Male: 19; Female: 27)
Recordings: Dual-channel Audio; HD Video (15 facial action units); Electrocardiogram; Electrodermal activity
Total: 11 hours, 102 dyadic sessions (3 min/session); Sentences: 1306
Labels: Perspectives: Self, Naïve-Observer; Raters: 6; Continuous-in-time annotation; Dimensional (Valence-Activation)
RECOLA: Remote Collaborative and Affective Interactions
82
Ringeval, Fabien, et al. "Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions." Automatic Face and Gesture Recognition (FG), 2013 10th IEEE International Conference and Workshops on. IEEE, 2013.
Available: Fabien Ringeval ([email protected])
RECOLA
Prediction of asynchronous dimensional emotion ratings from audiovisual and physiological data1 (Val./Act.: 0.804/0.528)
End-to-end speech emotion recognition using a deep convolutional recurrent network2 (Val./Act.: 0.741/0.325)
Face Reading from Speech: Predicting Facial Action Units from Audio Cues3 (0.650)
83
1Ringeval, Fabien, et al. "Prediction of asynchronous dimensional emotion ratings from audiovisual and physiological data." Pattern Recognition Letters 66 (2015): 22-30.
2Trigeorgis, George, et al. "Adieu features? End-to-end speech emotion recognition using a deep convolutional recurrent network." Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on. IEEE, 2016.
3Ringeval, Fabien, et al. "Face Reading from Speech: Predicting Facial Action Units from Audio Cues." Sixteenth Annual Conference of the International Speech Communication Association. 2015.
Language: Chinese
Participants: 238
Recordings: Audio; Video (34 films, 2 TV series, 4 TV shows)
Total: 2.3 hours
Labels: Raters: 4; Discrete session-level annotation; Fake/suppressed emotions; Multi-emotion annotation for some segments; Categorical (26 non-prototypical)
2017 Multimodal Emotion Recognition Challenge (MEC 2017: http://www.chineseldc.org/htdocsEn/emotion.html)
CHEAVD: A Chinese natural emotional audio-visual database
84
Li, Ya, et al. "CHEAVD: a Chinese natural emotional audiovisual database." Journal of Ambient Intelligence and Humanized Computing 8.6 (2017): 913-924.
Available: Ya Li ([email protected])
CHEAVD
MEC 2016: the multimodal emotion recognition challenge of CCPR 20161 (Cat.: 37.03)
Chinese Speech Emotion Recognition2 (Cat.: 47.33)
Transfer Learning of Deep Neural Network for Speech Emotion Recognition3 (Cat.: 50.01)
85
1Li, Ya, et al. "MEC 2016: the multimodal emotion recognition challenge of CCPR 2016." Chinese Conference on Pattern Recognition. Springer Singapore, 2016.
2Zhang, Shiqing, et al. "Feature Learning via Deep Belief Network for Chinese Speech Emotion Recognition." Chinese Conference on Pattern Recognition. Springer Singapore, 2016.
3Huang, Ying, et al. "Transfer Learning of Deep Neural Network for Speech Emotion Recognition." Chinese Conference on Pattern Recognition. Springer Singapore, 2016.
Language: Chinese
Participants: 44 (Male: 20; Female: 24)
Recordings: Dual-channel Audio; HD Video; Manual Transcript; Electrocardiogram
Total: 11 hours, 102 dyadic sessions (3 min/session); Sentences: 6029 utterances
Labels: Perspectives: Peer, Director, Self, Naïve-Observer; Raters: 49; Continuous-in-time annotation; Discrete session-level annotation; Dimensional (Valence-Activation); Categorical (6)
NNIME: The NTHU-NTUA Chinese Interactive Multimodal Emotion Corpus
86
Huang-Cheng Chou, Wei-Cheng Lin, Lien-Chiang Chang, Chyi-Chang Li, Hsi-Pin Ma, Chi-Chun Lee "NNIME: The NTHU-NTUA Chinese Interactive Multimodal Emotion Corpus" in Proceedings of ACII 2017
Available: Huang-Cheng Chou ([email protected])Chi-Chun Lee ([email protected])
NNIME
Cross-Lingual Emotion Information1,3 (session-level) (Val./Act.: 0.682/0.604)
Dyad-Level Interaction2 (Cat.: 0.65)
87
1Chun-Min Chang, Bo-Hao Su, Shih-Chen Lin, Jeng-Lin Li, Chi-Chun Lee*, "A Bootstrapped Multi-View Weighted Kernel Fusion Framework for Cross-Corpus Integration of Multimodal Emotion Recognition" in Proceedings of ACII 2017
2Yun-Shao Lin, Chi-Chun Lee*, "Deriving Dyad-Level Interaction Representation using Interlocutors' Structural and Expressive Multimodal Behavior Features" in Proceedings of the International Speech Communication Association (Interspeech), pp. 2366-2370, 2017
3Chun-Min Chang, Chi-Chun Lee*, "Fusion of Multiple Emotion Perspectives: Improving Affect Recognition Through Integrating Cross-Lingual Emotion Information" in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 5820-5824, 2017
Access These Emotion Database
88
Year | Database | Website
1997 | DES | http://kom.aau.dk/~tb/speech/Emotions/
2000 | GEMEP | https://www.affective-sciences.org/gemep/
2005 | eNTERFACE'05 | http://www.enterface.net/enterface05/
2007 | HUMAINE | http://emotion-research.net/download/pilot-db/
2008 | VAM | http://emotion-research.net/download/vam
2008 | IEMOCAP | http://sail.usc.edu/iemocap/
2009 | SAVEE | http://kahlan.eps.surrey.ac.uk/savee/
2010 | CIT | http://sail.usc.edu/CreativeIT/ImprovRelease.htm
2010 | SEMAINE | https://semaine-db.eu/
2013 | RECOLA | https://diuf.unifr.ch/diva/recola/download.html
2016 | CHEAVD | Upon request
2017 | NNIME | http://nnime.ee.nthu.edu.tw/
Key take-away
89
Data-driven AI Learning and Inference ?
90
Data-driven AI Learning and Inference ?
91
Speech
Text
Gesture
Face
Human Expression
92
Paralinguistic Expression
Linguistic Expression
93
Recognition rates: Achievement 88.4%, Amusement 90.4%, Contentment 52.4%, Pleasure 61.6%, Relief 83.9% (73%-94%)
Confusions: amusement/disgust, pleasure/sadness, sadness/relief (24%, 17.5%), amusement (12%, 7.9%)
Categories: Achievement, Amusement, Anger, Contentment, Disgust, Pleasure, Relief, Sadness, Surprise
Overall: 69.9%
Sauter, Disa. An investigation into vocal expressions of emotions: the roles of valence, culture, and acoustic factors. University of London, University College London (United Kingdom), 2007.
94
Laugh, Cry, Sigh, Whisper, Whine
Laukka, Petri, et al. "Cross-cultural decoding of positive and negative non-linguistic emotion vocalizations." Frontiers in Psychology 4 (2013).
Gupta, Rahul, et al. "Detecting paralinguistic events in audio stream using context in features and probabilistic decisions." Computer Speech & Language 36 (2016): 72-92.
Laughter & Fillers (2015)
IS2013 sub-challenge
AUC for Detection: Laughter: 95.3%; Fillers: 90.4%
Cross-Culture (2013)
Universal Emotion, Non-Verbal Signals
Speak: India, USA, Kenya, Singapore; Listen: Sweden
95
?
Sahu, Saurabh, Gupta, Rahul, Sivaraman, Ganesh, AbdAlmageed, Wael, Espy-Wilson, Carol. (2017). Adversarial Auto-Encoders for Speech Based Emotion Recognition. 1243-1247. 10.21437/Interspeech.2017-1421.
Rao, K. Sreenivasa, Shashidhar G. Koolagudi, and Ramu Reddy Vempada. "Emotion recognition from speech using global and local prosodic features." International Journal of Speech Technology 16.2 (2013): 143-160.
Lalitha, S., et al. "Emotion detection using MFCC and Cepstrum features." Procedia Computer Science 70 (2015): 29-35.
Huang, Che-Wei, and Shrikanth Narayanan. "Characterizing Types of Convolution in Deep Convolutional Recurrent Neural Networks for Robust Speech Emotion Recognition." arXiv preprint arXiv:1706.02901 (2017).
Lee, Jinkyu, and Ivan Tashev. "High-level feature representation using recurrent neural network for speech emotion recognition." INTERSPEECH. 2015.
Emo-DB
Prosodic features + SVM: 62.43%
MFCC + ANN: 85.7%
Deep Convolution
High-Level Representation (time series)
96
Dimosa, Kostis, Leopold Dickb, and Volker Dellwoc. "Perception of levels of emotion in speech prosody." The Scottish Consortium for ICPhS (2015).
Erickson, Donna. "Expressive speech: Production, perception and application to speech synthesis." Acoustical Science and Technology 26.4 (2005): 317-325.
Sauter, Disa. An investigation into vocal expressions of emotions: the roles of valence, culture, and acoustic factors. University of London, University College London (United Kingdom), 2007.
"Emotional prosody does not function categorically, distinguishing only different emotions, but also indicates different degrees of the expressed emotion."
Pitch and pitch variation are especially important for people to recognize emotion from non-verbal sounds
Voice quality: tension
Some experiments: change the sound (remove pitch, noisy channel, ...)
(descriptors)
A Review: Research Findings of Acoustic and Perceptual Studies
97
Flow chart
Learning Representation Discriminative Model
98
Low-Level Descriptors (10-15 ms frames):
Mel-Frequency Cepstral Coefficients, Pitch, Signal Energy, Loudness, Voice Quality (Jitter, Shimmer), Log Filterbank Energies, Linear Prediction Cepstral Coefficients, CHROMA and CENS Features (Music)
Compute statistics over frames (functionals)
Continuous: Pitch, Energy, Formants
Qualitative: Voice quality (harsh, tense, breathy)
Spectral: LPC, MFCC, LFPC
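A minimal sketch of this LLD-plus-functionals recipe, assuming librosa is installed and a mono file named speech.wav exists (both are assumptions, not part of the slides):

```python
import numpy as np
import librosa

y, sr = librosa.load("speech.wav", sr=16000)     # hypothetical input file

# Frame-level low-level descriptors at a 10 ms hop (160 samples @ 16 kHz)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, hop_length=160)
energy = librosa.feature.rms(y=y, hop_length=160)              # signal energy
f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr, hop_length=160)  # pitch contour

# Stack LLD contours (trimmed to a common frame count), then compute
# utterance-level functionals: statistics over the frame-level contours
T = min(mfcc.shape[1], energy.shape[1], len(f0))
lld = np.vstack([mfcc[:, :T], energy[:, :T], f0[None, :T]])
functionals = np.concatenate([lld.mean(axis=1), lld.std(axis=1)])
print(functionals.shape)   # one fixed-length feature vector per utterance
```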
99
Arias, Juan Pablo, Carlos Busso, and Nestor Becerra Yoma. "Shape-based modeling of the fundamental frequency contour for emotion detection in speech." Computer Speech & Language28.1 (2014): 278-294.
emotionally salient temporal segments
75.8% in binary emotion classification. Dotted/dashed lines: subjective ratings and their deviation; solid line: objective.
100
Source-Filter
Example: High arousal
Physically, the vocal production system: Respiration, Vocal Fold Vibration, Articulation
increased tension in laryngeal musculature → raised subglottal pressure → changed production of sound at the glottis → vocal quality
Johnstone, Tom & Scherer, Klaus. (2000). Vocal communication of emotion. Handbook of Emotions.
Mel-scale Filter Bank
The response of the basilar membrane as a function of frequency, measured at six different distances from the stapes
The psychoacoustical transfer function
Stern, Richard M., and Nelson Morgan. "Features based on auditory physiology and perception." Techniques for Noise Robustness in Automatic Speech Recognition (2012): 193-227.
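A minimal sketch of building and applying a mel-scale filter bank (librosa assumed; the file name and parameters are illustrative):

```python
import numpy as np
import librosa

y, sr = librosa.load("speech.wav", sr=16000)     # hypothetical input file
power_spec = np.abs(librosa.stft(y, n_fft=512, hop_length=160)) ** 2

# Triangular filters spaced on the perceptually motivated mel scale
mel_fb = librosa.filters.mel(sr=sr, n_fft=512, n_mels=26)
log_mel = np.log(mel_fb @ power_spec + 1e-10)    # log mel-band energies
print(log_mel.shape)                             # (26, n_frames)
```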
Support Vector Machine (SVM)
Convolutional Neural Network
Hidden Markov Model (HMM)
Recurrent Neural Network
Time series Model
103
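For the classical (non-neural) classifiers above, a minimal scikit-learn sketch with random toy data standing in for real functional features and emotion labels (all names and sizes are illustrative):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 88))      # toy utterance-level feature vectors
y = rng.integers(0, 4, size=200)    # toy labels for four emotion classes

# Standardize features, then fit an RBF-kernel SVM with cross-validation
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(clf, X, y, cv=5)
print(scores.mean())
```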
End-to-End: From LLDs to Deep Learning
Z. Aldeneh and E. M. Provost, "Using regional saliency for speech emotion recognition," 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, 2017, pp. 2741-2745. doi: 10.1109/ICASSP.2017.7952655C. W. Huang and S. S. Narayanan, "Deep convolutional recurrent neural network with attention mechanism for robust speech emotion recognition," 2017 IEEE International Conference on Multimedia and Expo (ICME), Hong Kong, 2017, pp. 583-588. doi: 10.1109/ICME.2017.8019296
signal → Neural Network → emotion
CNN for Time-Series Signal + Attention
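A minimal PyTorch sketch of this idea, 1-D convolutions over frame-level features with attention pooling over time; layer sizes are illustrative assumptions, not the cited papers' exact architectures:

```python
import torch
import torch.nn as nn

class CNNAttention(nn.Module):
    def __init__(self, n_features=40, n_classes=4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_features, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=5, padding=2), nn.ReLU(),
        )
        self.att = nn.Linear(64, 1)          # scores each time step
        self.out = nn.Linear(64, n_classes)

    def forward(self, x):                    # x: (batch, n_features, time)
        h = self.conv(x).transpose(1, 2)     # (batch, time, 64)
        w = torch.softmax(self.att(h), dim=1)   # attention weights over time
        pooled = (w * h).sum(dim=1)          # weighted temporal pooling
        return self.out(pooled)

logits = CNNAttention()(torch.randn(8, 40, 300))  # toy batch of 8 signals
print(logits.shape)                               # torch.Size([8, 4])
```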
104
YAAFE, an Easy to Use and Efficient Audio Feature Extraction Software, B. Mathieu, S. Essid, T. Fillon, J. Prado, G. Richard, proceedings of the 11th ISMIR conference, Utrecht, Netherlands, 2010.
Florian Eyben, Felix Weninger, Florian Gross, Björn Schuller: Recent Developments in openSMILE, the Munich Open-Source Multimedia Feature Extractor, In Proc. ACM Multimedia (MM), Barcelona, Spain, ACM, ISBN 978-1-4503-2404-5, pp. 835-838, October 2013. doi:10.1145/2502081.2502224
Paul Boersma & David Weenink (2013): Praat: doing phonetics by computer [Computer program].
105
Paralinguistic Expression
Linguistic Expression
106
?
Schwarz-Friesel, Monika. "Language and emotion." The Cognitive Linguistic Perspective, in: Ulrike Ldtke (Hg.), Emotion in Language. TheoryResearchApplication, Amsterdam (2015): 157-173.
Lexicon, Grammar, Ideational Meaning
Lindquist, Kristen A., Jennifer K. MacCormack, and Holly Shablack. "The role of language in emotion: predictions from psychological constructionism." Frontiers in psychology 6 (2015).
developmental and cognitive science, demonstrating that language helps
107
Human Behavior Evaluation
Couples Therapy
Oral Presentation
Reviews
Hotels HBRNN
Amazon Cross-Lingual
Movie (93%), Book (92%), DVD (93%), PNN + RBM
Tweets
Positive & Negative
DCNN & LSTM
Ain, Qurat Tul, et al. "Sentiment analysis using deep learning techniques: a review." Int J Adv Comput Sci Appl 8.6 (2017): 424.
Review Article Social Media Talk
It's terrible!
What Texts Tell Us (Topics), Emotional Polarity
It's cool!
Parts of Speech (POS) tags; N-Grams
https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html
VB, VBD, NN, NNS, JJ, JJR, JJS, IN, TO
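A minimal NLTK sketch of these two feature types (NLTK plus its punkt and averaged_perceptron_tagger data packages are assumed to be installed):

```python
import nltk

tokens = nltk.word_tokenize("It's cool!")
print(nltk.pos_tag(tokens))      # Penn Treebank tags, e.g. ('cool', 'JJ')

bigrams = list(nltk.ngrams(tokens, 2))   # n-gram features (here n = 2)
print(bigrams)
```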
109
Dictionary-Based Sentiment Analysis
110
Changqin Quan and Fuji Ren. 2009. Construction of a blog emotion corpus for Chinese emotional expression analysis. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3 (EMNLP '09). Association for Computational Linguistics, Stroudsburg, PA, USA, 1446-1454.
Mohammad, Saif M., and Peter D. Turney. "Crowdsourcing a word-emotion association lexicon." Computational Intelligence 29.3 (2013): 436-465.
Pennebaker, James W., et al. The development and psychometric properties of LIWC2015. 2015.
LIWC (Linguistic Inquiry and Word Count): 64 categories, about 4500 words; positive/negative emotion words: 406/499
Seed words as gold standard
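A minimal sketch of dictionary-based scoring with a toy seed lexicon (a real system would use LIWC or another validated dictionary; the word lists here are illustrative):

```python
# Toy seed lexicons; real dictionaries are far larger and validated.
POSITIVE = {"cool", "love", "great", "happy"}
NEGATIVE = {"terrible", "hate", "sad", "awful"}

def lexicon_score(text: str) -> float:
    """Return a sentiment score in [-1, 1] from lexicon hits."""
    tokens = [t.strip("!.,?") for t in text.lower().split()]
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / max(pos + neg, 1)

print(lexicon_score("It's cool!"), lexicon_score("It's terrible!"))
```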
111
Data-driven? Sentiment Analysis (Unsupervised): data-driven discovery of latent structure (representations) for recognition
112
Sentiment Analysis (Supervised)
= 0.76, 0.6, 0.79; = 0.66; = 0.73 (Naïve Bayes, SVM)
Aman, Saima, and Stan Szpakowicz. "Identifying expressions of emotion in text." Text, speech and dialogue. Springer Berlin/Heidelberg, 2007.
Feature Representation → Classifier → Emotion Label
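A minimal scikit-learn sketch of this feature-representation-plus-classifier pipeline, with a Naive Bayes classifier as in the study above (the tiny corpus is illustrative):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["I love it", "It's cool", "It's terrible", "I hate this"]
labels = ["positive", "positive", "negative", "negative"]

# Bag-of-words + bigram features feeding a Naive Bayes classifier
model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), MultinomialNB())
model.fit(texts, labels)
print(model.predict(["so cool"]))   # -> ['positive']
```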
113
Deep Model
Lopez, Marc Moreno, and Jugal Kalita. "Deep Learning applied to NLP." arXiv preprint arXiv:1703.03091 (2017).
Diagram: tokens "I", "love", "it" → Embedding layer → LSTM → "positive"
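A minimal PyTorch sketch matching the diagram (toy vocabulary and token ids; a real model would be trained with cross-entropy on labeled text):

```python
import torch
import torch.nn as nn

class LSTMSentiment(nn.Module):
    def __init__(self, vocab_size=1000, emb_dim=50, hidden=64, n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_classes)

    def forward(self, token_ids):        # token_ids: (batch, seq_len)
        h, _ = self.lstm(self.emb(token_ids))
        return self.out(h[:, -1])        # classify from the last time step

ids = torch.tensor([[11, 42, 7]])        # e.g., "I love it" as toy token ids
print(LSTMSentiment()(ids).shape)        # torch.Size([1, 2])
```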
114
?
115
Automatic Speech Recognition (ASR)
f(speech) = text
Challenging task
speaker, gender
Mapping / Translation
116
phoneme
Aldeneh, Zakaria & Khorram, Soheil & Dimitriadis, Dimitrios & Mower Provost, Emily. (2017). Pooling acoustic and lexical features for the prediction of valence. 68-72. 10.1145/3136755.3136760.
Affect: Natural Language, Non-Verbal, Speech, Bio-Information, Image
Pooling Intermediate Representations → Performance, Robustness
117
Facial Action Coding System (FACS)
AU
118
Facial Action Coding System (FACS)
FACS: the tool for annotating facial expressions
"What The Face Reveals" is strong evidence for the fruitfulness of the systematic analysis of facial expression
Paul Ekman and Wallace V. Friesen, 1976
Action Units (AUs)
AUs are considered to be the smallest visually discernible facial movements
As AUs are independent of any interpretation, they can be used as the basis for recognition of basic emotions
FACS is an explicit means of describing all possible movements of the face in 46 action points
Action Units (AUs): FACS is a tool for measuring facial expressions; each observable component of facial movement is called an AU
All facial expressions can be broken down into their constituent AUs
AU | Description
1 | Inner Brow Raiser
4 | Brow Lowerer
7 | Lid Tightener
12 | Lip Corner Puller
13 | Cheek Puffer
20 | Lip Stretcher
AU framework
Pipeline: Automatic face & facial feature detection → Face alignment → Multiple image windows at a variety of locations and scales → Image filter (modify or enhance the image; e.g., Gabor filter coefficients) → Feature extraction (facilitate subsequent learning and generalization, leading to better human interpretation) → Facial AUs (e.g., AU1, AU7, AU6+AU15) → Rule-based classifier → Facial expressions of emotion (e.g., happy, fear, disgust, surprise)
"Recognizing action units for facial expression analysis.Tian, Y-I., Takeo Kanade, and Jeffrey F. Cohn.
Recognize AUs for Facial Expression Analysis- Rule-based Classifier
Informed by FACS AUs, they group the facial features into upper and lower parts because the facial actions in the two regions are relatively independent for AU recognition [14]
[14] P. Ekman and W.V. Friesen, The facial action coding system: A technique for the measurement of facial movement
single AU detection
combined AU detection
Recognize AUs for Facial Expression Analysis - Results
AU detection (Ekman-Hager database), recognition rate:
| | Single AU detection | Combined AU detection |
| Upper face | 75% | 86.7% |
| Lower face | 95.8% | 90.7% |
AU detection (cross-database: train on one database, test on the other), recognition rate:
| | Cohn-Kanade | Ekman-Hager |
| Upper face | 93.2% | 86.7% |
| Lower face | 90.7% | 93.4% |
Facial Expressions of Emotion(e.g., happy, fear, disgust, surprise, etc)
Automatic face & Facial feature detection
Face alignment
Multiple image windows at a variety
of Locations and scales
Feature extraction:Facilitate subsequent learning and generalization, leading to better human interpretation
Image filter:Modify or enhance the image
Facial AU(e.g., AU1, AU7,
AU6+ASU15, etc)
Rule-basedclassifier
"Recognizing Facial Expressions of Emotion using Action Unit Specific Decision Thresholds Mustafa Sert, and Nukhet Aksoy
AAM face track model
Recognizing Facial Expressions of Emotion using Action Unit Specific Decision Thresholds
Extract facial images from Active Appearance Model (AAM) to form an appearance model
Facial AU multi-class classification using ADT for both AU detection and facial expression recognition
ADT learns a separate decision threshold for each AU category, and assigns an instance x to an AU category if and only if:
f(x) = w^T φ(x) + b > θ_AU
where φ(·) is the mapping function that maps the SVM input into a high-dimensional space
Recognizing Facial Expressions of Emotion using Action Unit Specific Decision Thresholds
Table: Prototypic and major variants of AU combinations for the facial expression "fear". "+" denotes logical AND; "," indicates logical OR
Facial expression recognition accuracy of the proposed scheme. Bold bracketed numbers indicate best result, bold numbers denote second best
Recognizing Facial Expressions of Emotion using Action Unit Specific Decision Thresholds
ADT-based AU detector along with the rule-based emotion classifier (B&D) outperforms the baseline methods (A&C)
Among the proposed methods, D gives the best results in all facial emotion categories except surprise
The proposed ADT scheme outperforms the baseline method by an average F1-score of 6.383% for 17 AUs
It gives superior performance in terms of F1-score compared with the baseline method for all AUs except AU2
"Compound facial expressions of emotion: from basic research to clinical applicationsShichuan Du, and Aleix M. Martinez
Observations under distinct compound emotions
Compound facial expressions of emotion
Compound facial expressions of emotion
AU intensity shown in a cumulative histogram for each AU and emotion category
The x-axis in these histograms specifies the intensity of activation; the y-axis defines the cumulative percentage of intensity (scale 0 to 1)
Numbers between zero and one specify the percentage of people using the specified and smaller intensities.
Fig. AUs used to express a compound emotion are consistent with the AUs used to express its component categories
Key take-away
[Take-away points on AUs and AU+DNN approaches, originally in Chinese; text lost in extraction.]
135
136
Affect: Natural Language, Non-Verbal (Speech, Physiology, Face, Body Gestures)
reference: de Meijer, M. The contribution of general features of body movement to the attribution of emotions. Journal of Nonverbal Behavior 13, 4 (1989), 247-268.
Psychology:
Bull, P. E. Posture and gesture. Pergamon Press, 1987.
Pollick, F. E., Paterson, H. M., Bruderlin, A., and Sanford, A. J. Perceiving affect from arm movement. Cognition 82, 2 (2001), B51-B61.
Coulson, M. Attributing emotion to static body postures: Recognition accuracy, confusions, and viewpoint dependence. Journal of Nonverbal Behavior 28, 2 (2004), 117-139.
Boone, R. T., and Cunningham, J. G. Children's decoding of emotion in expressive body movement: The development of cue attunement. Developmental Psychology 34 (1998), 1007-1016.
de Meijer, M. The contribution of general features of body movement to the attribution of emotions. Journal of Nonverbal Behavior 13, 4 (1989), 247-268.
Engineering:
Balomenos, T., Raouzaiou, A., Ioannou, S., Drosopoulos, A., Karpouzis, K., and Kollias, S. Emotion analysis in man-machine interaction systems. In Machine Learning for Multimodal Interaction. Springer, 2005, 318-328.
Coulson, M. Attributing emotion to static body postures: Recognition accuracy, confusions, and viewpoint dependence. Journal of Nonverbal Behavior 28, 2 (2004), 117-139.
reference: Stefano Piana, Alessandra Staglianò, Francesca Odone, Alessandro Verri, Antonio Camurri, Real-time Automatic Emotion Recognition from Body Gestures, in ArXiv 2014
?
12 actors (four female and eight male), aged between 24 and 60; a total of about 100 videos; separate clips of expressive gestures
reference: 1) Stefano Piana, Alessandra Staglianò, Francesca Odone, Alessandro Verri, Antonio Camurri, Real-time Automatic Emotion Recognition from Body Gestures, in ArXiv 2014
2) Amol S. Patwardhan and Gerald M. Knapp, Augmenting Supervised Emotion Recognition with Rule-Based Decision Model, in ArXiv 2016
Qualisys, Kinect
139
Data Validation: human annotation?
The sole 3D skeleton is a guarantee that the user is not exploiting other information
Not easy for humans to recognize emotion based only on gesture
reference: Stefano Piana, Alessandra Staglianò, Francesca Odone, Alessandro Verri, Antonio Camurri, Real-time Automatic Emotion Recognition from Body Gestures, in ArXiv 2014
Skeleton-based features
Emotions: anger, sadness, happiness, fear, surprise, disgust
reference: 1) Stefano Piana, Alessandra Staglianò, Francesca Odone, Alessandro Verri, Antonio Camurri, Real-time Automatic Emotion Recognition from Body Gestures, in ArXiv 2014
2) Piana, S., Staglianò, A., Camurri, A., and Odone, F. A set of full-body movement features for emotion recognition to help children affected by autism spectrum condition. In IDGEI International Workshop (2013).
Histogram: energy in each of the frames
Classification Results
Qualisys data with 310 gestures
Kinect data with 579 gestures
Clean dataset
Noisy dataset
Almost the same as humans' recognition ability
Skeleton Capture Methods
Qualisys: expensive, sophisticated system with multiple high-speed cameras
Kinect: cheap, easy-to-get RGB-D 3D camera device
OpenPose: free, new software system based on CNNs
reference: https://itp.nyu.edu/classes/dance-f16/kinect/
https://github.com/CMU-Perceptual-Computing-Lab/openpose
https://www.qualisys.com/
143
OpenPose: CNN based Method
144
Pose difference/movement indicative of arousal mostly
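A minimal sketch of turning pose-estimation output into such a movement-energy cue; it assumes OpenPose was run with its --write_json option, producing one JSON file per frame in a hypothetical output_json/ folder with BODY_25-style keypoints:

```python
import json
import glob
import numpy as np

frames = []
for path in sorted(glob.glob("output_json/*.json")):   # hypothetical folder
    with open(path) as f:
        people = json.load(f)["people"]
    if people:
        # OpenPose stores keypoints as a flat [x, y, confidence, ...] list
        kp = np.array(people[0]["pose_keypoints_2d"]).reshape(-1, 3)
        frames.append(kp[:, :2])                       # drop confidences

# Mean per-joint displacement between consecutive frames: a crude proxy
# for movement energy, which the slide links mostly to arousal. Assumes
# at least two frames with a detected person.
energy = [np.linalg.norm(b - a, axis=1).mean()
          for a, b in zip(frames, frames[1:])]
print(np.mean(energy))
```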
145
Affect: Natural Language, Non-Verbal (Speech, Physiology, Face, Body Gestures)
Expressive Data AI Learning and Inference ?
146
Internal Data AI Learning and Inference ?
147
Polyvagal theory
Stephen Porges
148
(immobilization)
(fight-flight) (mobilization)
149
reference: D. S. Quintana, A. J. Guastella, T. Outhred, I. B. Hickie, and A. H. Kemp. Heart rate variability is associated with emotion recognition: direct evidence for a relationship between the autonomic nervous system and social cognition. Int. J. of Psychophysiol, 86(2):168-172, 2012
http://blog.sina.com.cn/s/blog_753e49f90100pop2.html
http://www.xzbu.com/6/view-2908185.htm
150
HRV (Heart Rate Variability)
HRV reflects autonomic nervous system (ANS) activity; emotion recognition via HRV analysis [2]
reference: 1) María Teresa Valderas, Juan Bolea, Pablo Laguna, Montserrat Vallverdú, Raquel Bailón, Human Emotion Recognition Using Heart Rate Variability Analysis with Spectral Bands Based on Respiration, in Engineering in Medicine and Biology Society (EMBC), 2015 37th Annual International Conference of the IEEE.
2) Task Force of the European Society of Cardiology and the North American Society of Pacing and Electrophysiology (1996) Heart rate variability. Standards of measurement, physiological interpretation, and clinical use. Eur Heart J 17(3):354-81
3) D. S. Quintana, A. J. Guastella, T. Outhred, I. B. Hickie, and A. H. Kemp. Heart rate variability is associated with emotion recognition: direct evidence for a relationship between the autonomic nervous system and social cognition. Int. J. of Psychophysiol, 86(2):168-172, 2012.
151
HRV spectral measures:
Measure | Unit | Band / Definition
Total power (TP) | ms² | ≤ 0.4 Hz
Very low frequency power (VLFP) | ms² | ≤ 0.04 Hz
Low frequency power (LFP) | ms² | 0.04-0.15 Hz
High frequency power (HFP) | ms² | 0.15-0.4 Hz
Normalized LFP (nLFP) | n.u. | LF/(TP-VLF)
Normalized HFP (nHFP) | n.u. | HF/(TP-VLF)
LF/HF ratio | |
https://zh.wikipedia.org/wiki/%E5%BF%83%E7%8E%87%E8%AE%8A%E7%95%B0%E5%88%86%E6%9E%90
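A minimal SciPy sketch of computing the LF and HF band powers in the table above from a toy RR-interval series (the RR data, resampling rate, and window length are illustrative assumptions):

```python
import numpy as np
from scipy.signal import welch
from scipy.interpolate import interp1d

rr_ms = np.array([810, 790, 850, 820, 900, 870, 840, 860] * 20)  # toy RR data
t = np.cumsum(rr_ms) / 1000.0                # beat times in seconds
fs = 4.0                                     # even resampling rate (Hz)
grid = np.arange(t[0], t[-1], 1 / fs)
rr_even = interp1d(t, rr_ms)(grid)           # evenly sampled tachogram

f, psd = welch(rr_even - rr_even.mean(), fs=fs, nperseg=256)

def band_power(lo, hi):
    m = (f >= lo) & (f < hi)
    return np.trapz(psd[m], f[m])            # integrate PSD over the band

lf, hf = band_power(0.04, 0.15), band_power(0.15, 0.4)
print(lf, hf, lf / hf)                       # LFP, HFP, LF/HF ratio
```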
Emotion elicitation: real experiences, film clips, problem solving, computer game interfaces, images, spoken words, music
Movie clips: an emotion-inducing method more efficient than others, as verified by previous studies
4 films (3-10 min each); 4 emotions: angry, fear, sad, and happy; ECG data was recorded for 90 sec, starting 2 min before the end of the movies.
reference: 1)Han Wen Guo, Yu Shun Huang, Jen Chien Chien, Jiann Shing Shieh, Short-term Analysis of Heart Rate Variability for Emotion Recognition via a Wearable ECG Device in Intelligent Informatics and Biomedical Sciences (ICIIBMS), 20152) Mimma Nardelli, Gaetano Valenza, Alberto Greco, Antonio Lanata, Enzo Pasquale Scilingo, Recognizing Emotions Induced by Affective Sounds through Heart Rate Variability in IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, VOL. 6, NO. 4, OCTOBER-DECEMBER 2015
induced
153
ECG process pipeline
reference: Abhishek Vaish and Pinki Kumari, A Comparative Study on Machine Learning Algorithms in Emotion State Recognition Using ECG in Proceedings of the Second International Conference on Soft Computing for Problem Solving (SocProS 2012), December 28-30, 2012
154
ECG Feature Extraction: HRV
Time-Domain and Frequency-Domain Features
reference: Han Wen Guo, Yu Shun Huang, Jen Chien Chien, Jiann Shing Shieh, Short-term Analysis of Heart Rate Variability for Emotion Recognition via a Wearable ECG Device in Intelligent Informatics and Biomedical Sciences (ICIIBMS), 2015
Time-domain features:
1. MeanRRI: average of the resultant RR intervals
2. CVRR: the ratio of the standard deviation to the mean of the RR intervals
3. SDRR: standard deviation of the RR intervals
4. SDSD: standard deviation of the successive differences of the RR intervals
Frequency-domain features:
1. LF (low frequency): standardized LF power (0.04-0.15 Hz)
2. HF (high frequency): standardized HF power (0.15-0.4 Hz)
3. LHratio: the ratio LF/HF
Statistical features: the shapes of the probability distributions; evaluate the distribution
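A minimal NumPy sketch of the four time-domain features listed above (toy RR-interval data in milliseconds):

```python
import numpy as np

rr = np.array([810, 790, 850, 820, 900, 870, 840, 860], dtype=float)  # ms

mean_rri = rr.mean()               # 1. MeanRRI: average RR interval
sdrr = rr.std(ddof=1)              # 3. SDRR: std of RR intervals
cvrr = sdrr / mean_rri             # 2. CVRR: std / mean of RR intervals
sdsd = np.diff(rr).std(ddof=1)     # 4. SDSD: std of successive differences
print(mean_rri, cvrr, sdrr, sdsd)
```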
155
Analysis on Feature
Time Domain Feature Frequency Domain Feature Statistics Feature
reference: Han Wen Guo, Yu Shun Huang, Jen Chien Chien, Jiann Shing Shieh, Short-term Analysis of Heart Rate Variability for Emotion Recognition via a Wearable ECG Device in Intelligent Informatics and Biomedical Sciences (ICIIBMS), 2015
156
Classifier
reference: Han Wen Guo, Yu Shun Huang, Jen Chien Chien, Jiann Shing Shieh, Short-term Analysis of Heart Rate Variability for Emotion Recognition via a Wearable ECG Device in Intelligent Informatics and Biomedical Sciences (ICIIBMS), 2015 157
158
1
/
2
4
1
*
*
*
3
6
1
**
1
5
? !
159
160
Group-Level Emotion, Thin Slice
Multi-Task
Cross-Corpus, Common Ground
Cross-Lingual Perspective
161
LLD
Encoding
Result & Discussion (Binary Classification: Unweighted Average Recall)
Database | Act. | Feature | Rep. | Val. | Feature | Rep.
CIT | 0.658 | Praat | BoAW | 0.613 | Praat | FV
IEMOCAP | 0.769 | EGEMAPS | Func. | 0.663 | Praat | FV
NNIME | 0.65 | Praat | FV | 0.564 | Praat | BoAW
RECOLA | 0.634 | EGEMAPS | Func. | 0.602 | Praat | BoAW
VAM | 0.811 | ComP_LLD | FV | 0.665 | EGE_LLD | BoW
Variational Deep Embedding Fisher Scoring
Generated Perspectives Multi-view Kernel Fusion
1. Chun-Min Chang, Bo-Hao Su, Shih-Chen Lin, Jeng-Lin Li, Chi-Chun Lee, "A Bootstrapped Multi-View Weighted Kernel Fusion Framework for Cross-Corpus Integration of Multimodal Emotion Recognition" in Proceedings of ACII 2017
2. Chun-Min Chang, Chi-Chun Lee, "Fusion of Multiple Emotion Perspectives: Improving Affect Recognition Through Integrating Cross-Lingual Emotion Information" in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2017
?
Group-Level Emotion, Thin Slice
Multi-Task
Cross-Corpus, Common Ground
Cross-Lingual Perspective
166
memory
cognitive
emotion
167
168
alexithymia
169
170
Mental Well-being
Ref: https://ac.els-cdn.com/S1877042815003080/1-s2.0-S1877042815003080-main.pdf?_tid=238e46fe-da36-11e7-86e8-00000aab0f26&acdnat=1512531351_6db4641b5d3531d365e0f207f474d65f
171
Well-being
"Boredom, it turns out, can be a dangerous and disruptive state of mind that damages your health" (Mann)
Ref: http://alcoholrehab.com/drug-addiction/boredom-and-substance-abuse/
Boredom
Ref: On the Function of Boredom (Shane W. Bench)
Ref: The Facilitation of Social-Emotional Understanding and Social Interaction in High-Functioning Children with Autism: Intervention Outcomes
Ref: Social Skills Deficits in Children with Autism Spectrum Disorders: Evidence Based Interventions
173
(Pintrich, 1991, p. 199)
(special issue of the Educational Psychologist)
Ref: The Importance of Students' Goals in Their Emotional Experience of Academic Failure: Investigating the Precursors and Consequences of Shame (Jeannine E. Turner)
174
The Importance of Students' Goals in Their Emotional Experience of Academic Failure: Investigating the Precursors and Consequences of Shame (Jeannine E. Turner)
175
fMRI
Ref: https://blog.hubspot.com/marketing/emotions-in-advertising-examples
176
(communication and perception of emotion in music)
(emotional consequences of music listening)
(predictors of music preferences)
Swathi Swaminathan
Ref: Current Emotion Research in Music Psychology (Swathi Swaminathan)
SENSE EMOTION
178
!
179
functional Magnetic Resonance Imaging (fMRI)
180
181
Uses a standard MRI scanner
Acquires a series of images (numbers)
Measures changes in blood oxygenation
Uses non-invasive, non-ionizing radiation
Can be repeated many times; can be used for a wide range of subjects
Combines good spatial and reasonable temporal resolution
Synopsis of fMRI
182
Blood-Oxygen-Level dependent (BOLD)
183
Emotion Perception Decoding from fMRI
Pipeline: fMRI Dataset (interaction behavior) → SPM Preprocessing → Machine Learning → Emotion (behavior observation)
184
Emotional modules
185
Co-activation graph for each emotion category
A) Force-directed graphs for each emotion category, based on the Fruchterman-Reingold spring algorithm
B) The same connections in the anatomical space of the brain.
186
?
187
?
INFERENCE?
188
189
?
Our Research: Human-centered Behavioral Signal Processing (BSP)
Prof. Shrikanth Narayanan
Seek a window into the human mind and traits through an engineering approach
S. Narayanan and P. G. Georgiou, "Behavioral signal processing: Deriving human behavioral informatics from speech and language," Proceedings of the IEEE, vol. 101, no. 5, pp. 1203-1233, 2013.
Daniel Bone, Chi-Chun Lee, Theodora Chaspari, James Gibson, Shrikanth Narayanan, "Signal Processing and Machine Learning for Mental Health Research and Clinical Applications", in IEEE Signal Processing Magazine
(EECS713) Behavioral Informatics and Interaction Computation Laboratory (BIIC)
Signal Processing
Machine Learning
Decision Analytics
High-dimensional Behavior Space, Non-linear Predictive Recognition, Multimodal Integration, Experts' Decision Mechanism
Spatial-temporal modeling, De-noising, Feature extraction
Supervised, Unsupervised, Semi-supervised
Our Technology: Human-centric Decision Analytics Research & Development
Core Technology
Speech & Language: Diarization, Speaker ID, ASR, Paralinguistic Descriptors, Emotion-AI, Sentiment, Word-Topic Representation
Computer Vision: Segmentation, Tracking, Image-Video Descriptors
Multimodal Fusion: Joint speech-language-gesture modeling for multimodal prediction, Multi-party interaction modeling
Representation Learning: Behavior embedded space learning, clinical health informatics data representation
Predictive Learning: Deep-learning, machine learning based predictive modeling
BIIC Interdisciplinary Research
ASD
PAIN
EHR
fMRI
EMO-AI
ORAL
Mental Health
Clinical Health
Affective Computing
Education
Our Application: Human-centered Exemplary BSP Domains
Flow
Consumer Behavior
EMO-AI
Neuroscience
KEY APPLICATIONS
Affective Computing
Mental Health
Clinical Health
Education
Neuroscience
Consumer Behavior
193
Computing beyond status-quo in making a positive impact
Factual Conceptual Procedural Metacognitive
Computation Blueprints
BehaviorComputing
HealthAnalytics
Affect Recognition
Emphatic Computing
Social Computing
Value-Sensitive Technology
Affective Feedback
Interpersonal Relationship Computing
Cognitive Feedback
Fulfillment Empowerment
Motivation
Internal States
External Functions
Our Vision: Human-Centric Computing (HCC): computationally innovate human-centric empowerment, enabling next-generation entity intelligence
195
PHD
BIIC LAB MEMBERS
196
BIIC Lab @ NTHU EE
http://biic.ee.nthu.edu.tw
THANK YOU . . .