Mohammad Bagheri, Negar Namdarian
Technical University of Munich
Department of Informatics
Master Seminar: Emotional Awareness in Autonomous Driving
28 June 2019
Affective Computing in Psychology and Computer
Science, Challenges and Advancements
1. Introduction
2. Challenges
3. Proposed Approaches (Emotion Recognition Techniques)
a. Speech Emotion Recognition
b. Facial Expression Recognition
c. Multimodal Systems
4. Conclusion and Summary
Outline
3
1. Introduction
a. Affect Definition in Psychology and Computer Science
b. The Necessity of Affective Computing
c. Levels of Automation in Autonomous Driving
d. Affective Computing in Autonomous Driving
4
Introduction
Affect Definition in Psychology and Computer Science:
● Psychology: discrete categories such as fear, sadness, anger, joy, disgust, surprise, love.
● Computer Science: affect recognition vs. sentiment recognition
● Affect recognition: fine-grained affect recognition, aiming to classify data into a large set of emotion labels
● Sentiment recognition: coarse-grained affect recognition, a binary classification task (positive versus negative)
5
6
Affect Taxonomy:
The Necessity of Affective Computing:
● Could improve systems' functionality and artificial intelligence, and the ability to respond according to users' affective feedback.
● Increases trustworthiness, reliability, and security.
7
https://andrewbolwell.com/2014/09/22/robots-and-human-emotion-2/
Levels of Automation in Autonomous Driving
8
Affective Computing in Autonomous Driving
9
Why do we need Emotion Recognition in Autonomous Driving?
● Increasing acceptance and reliability among users
● Inducing a sense of security and reliability in users
● Detecting drowsiness of the driver while driving (facial analysis)
● Making sure the driver is paying enough attention to the road
● Preventing the execution of dangerous commands from the driver
● Correcting misinterpreted driver commands
● Introducing a more enjoyable and comfortable cabin atmosphere for the passengers
Affective Computing in Autonomous Driving
10
https://www.notanygadgets.com/product/meet-cozmo-the-ai-robot-with-emotions/
11
Challenges
Challenges of Emotion Recognition in Autonomous Driving:
● Robustness
● Accuracy
● Reliability
● Security
● Challenges of affective computing
(discussed below with each affective computing method)
12
Proposed Approaches
(Emotion Recognition Techniques)
● Speech Emotion Recognition
● Facial Expression Recognition
● Physiological Emotion Recognition Techniques
● Body Gesture and Movement
● Multimodal Systems
13
Emotion Recognition Techniques
Speech Emotion Recognition:
Recognizing the speaker's emotional state from the speech signal.
The main issues:
● selection of a suitable feature set
● choice of proper classifiers
● preparation of suitable datasets/databases
14
Proposed Approaches (Emotion Recognition
Techniques)
Applications:
● speech synthesis
● customer service
● education (e-learning)
● medical purposes (rehabilitation, counseling)
● entertainment (music players)
15
Speech Emotion Recognition:
Speech databases:
1. Natural speech databases:
collected naturally/spontaneously from speakers in real conditions
Example: AIBO database
2. Acted speech databases:
consist of emotional utterances spoken by professional or non-professional actors
Example: Berlin Database of Emotional Speech
16
Speech Emotion Recognition:
Speech databases:
17
Speech Emotion Recognition:
Corpus Name | Language | Type | Emotions
AIBO | German | Natural | anger, boredom, emphatic, helpless, ironic, joyful, motherese, reprimanding, rest, surprise, touchy
Berlin Database of Emotional Speech | German | Acted | happiness, anger, anxiety, fear, boredom, disgust, neutral
INTERFACE | English, Spanish, French, Slovenian | Acted | anger, sadness, joy, fear, disgust, surprise, neutral (soft, slow, normal, loud, fast)
Danish Emotional Speech Database (DES) | Danish | Acted | angry, happy, sad, surprise, neutral
Speech emotion recognition System steps:
Preprocessing: removes noise from the signal and brings out its high-frequency characteristics; the preprocessed signal is then passed to the feature extraction and selection steps.
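As an illustration, a minimal sketch of one common preprocessing step, pre-emphasis filtering, which boosts the high-frequency content of the signal (the coefficient 0.97 is a common convention, not taken from the slides; all code sketches below use Python):

```python
import numpy as np

def pre_emphasis(signal: np.ndarray, coeff: float = 0.97) -> np.ndarray:
    """Boost high-frequency components: y[t] = x[t] - coeff * x[t-1]."""
    return np.append(signal[0], signal[1:] - coeff * signal[:-1])

# Example: one second of a dummy 16 kHz signal.
x = np.random.randn(16000)
y = pre_emphasis(x)
```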
18
Speech Emotion Recognition:
[Pipeline diagram: emotional speech input → feature extraction → feature selection → classification → emotion output]
19
Speech Emotion Recognition:
Speech feature vector categorization:
1. Short-time (segmental): calculated once for every small time frame (25–50 ms)
2. Long-time (suprasegmental): calculated over the entire utterance duration

1. Low-level descriptors (LLDs): contain prosodic features (suprasegmental) and spectral features (segmental)
2. Functionals: statistics computed over the LLDs across the utterance
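To make the LLD/functional distinction concrete, a hypothetical sketch: frame-wise short-time energy as a segmental LLD, aggregated into suprasegmental functionals (the frame sizes and statistics chosen here are illustrative, not prescribed by the slides):

```python
import numpy as np

def short_time_energy(signal, frame_len=400, hop=160):
    """Segmental LLD: energy per 25 ms frame (400 samples at 16 kHz, 10 ms hop)."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len, hop)]
    return np.array([np.sum(f ** 2) for f in frames])

signal = np.random.randn(16000)              # dummy 1 s utterance at 16 kHz
lld = short_time_energy(signal)              # one value per frame (segmental)
functionals = [lld.mean(), lld.std(), lld.max(), lld.min()]  # utterance level
```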
20
Speech Emotion Recognition:
Speech feature vector categorization:
1. Continuous (pitch, energy, …)
2. Qualitative (voice quality, such as harsh, tense, …)
3. Spectral (LPC, MFCC, …)
4. TEO (Teager energy operator)-based features
21
Speech Emotion Recognition:
Feature Extraction Methods:
Reduce the dimensionality of the input vector while maintaining the perceptive power of the signal.
● Mel-frequency cepstral coefficients (MFCCs)
● Linear predictive cepstral coefficients (LPCCs)
● Perceptual linear predictive coefficients (PLPs)
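A minimal sketch of MFCC extraction; librosa is our choice of library here, not named in the slides:

```python
import numpy as np
import librosa

# Dummy 1 s utterance at 16 kHz; in practice, load a real recording,
# e.g. signal, sr = librosa.load("utterance.wav", sr=16000).
signal = np.random.randn(16000).astype(np.float32)

# 13 MFCCs per short-time frame -> matrix of shape (13, n_frames).
mfccs = librosa.feature.mfcc(y=signal, sr=16000, n_mfcc=13)
print(mfccs.shape)
```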
22
Speech Emotion Recognition:
Feature Selection Methods:
● Principal Component Analysis (PCA)
● Canonical Correlation Analysis (CCA)
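A sketch of PCA-based feature selection with scikit-learn (our assumed library; the slides only name the methods). PCA projects the feature vectors onto the directions of highest variance:

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.randn(200, 40)        # 200 utterances x 40 extracted features (dummy)
pca = PCA(n_components=10)          # keep the 10 directions of highest variance
X_reduced = pca.fit_transform(X)
print(X_reduced.shape, pca.explained_variance_ratio_.sum())
```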
23
Speech Emotion Recognition:
Single Classifiers:
● Support vector machines (SVM)
● Hidden Markov models (HMM)
● Artificial neural networks (ANN)
● K-nearest neighbors (k-NN)
● Gaussian mixture models (GMM)
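A sketch of the classification step using an SVM in scikit-learn; the features and labels here are dummies, standing in for the output of the preceding extraction/selection steps:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Dummy utterance-level feature vectors and emotion labels.
X = np.random.randn(300, 10)
y = np.random.choice(["anger", "joy", "neutral"], size=300)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = SVC(kernel="rbf").fit(X_tr, y_tr)     # train on labeled utterances
print("accuracy:", clf.score(X_te, y_te))   # ~chance level on random data
```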
24
Speech Emotion Recognition:
Hybrid Classifiers:
● Combining several classifiers in order to improve performance
25
Speech Emotion Recognition:
Challenges:
● Selection of the best feature set
● Different languages, accents, and speaking styles
● Limitations of speech databases and the difficulty of obtaining natural speech databases (ethics, …)
26
Facial Expression Recognition:
● one of the most popular and effective approaches in autonomous vehicles
● classifies facial feature deformations and facial motions into emotion
categories.
27
Proposed Approaches (Emotion Recognition
Techniques)
Facial Expression:
can be described by three phases:
1. Onset phase (attack): the intensity of the muscle activation increases toward the apex phase.
2. Apex phase (sustain): the plateau where the intensity of the muscle activation stabilizes.
3. Offset phase (relaxation): progressive muscular relaxation toward the neutral state (no manifestation of muscle activation).
28
Facial Expression Recognition:
Facial Expression Measurement Methods:
● Judgment-based approach: specific facial patterns are mapped directly onto emotional categories (anger, disgust, fear, happiness, sadness, and surprise)
● Sign-based approach: facial motion and deformation are coded into visual classes. Facial actions are hereby abstracted and described by their location and intensity.
29
Facial Expression Recognition:
Facial Expression Analysis steps:
30
Facial Expression Recognition:
[Pipeline diagram: face acquisition → facial feature extraction → facial expression classification → output]
31
Facial Expression Recognition:
An automatic face detector locates faces in any complicated or cluttered background.
Pose and illumination variations affect the accuracy; hence some normalizations are recommended before advancing to the next steps.
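A sketch of the face-acquisition step using OpenCV's bundled Haar-cascade detector (our choice of library; the slides do not prescribe one, and "frame.jpg" is a hypothetical input):

```python
import cv2

# Haar cascade shipped with OpenCV; detection runs on grayscale images.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("frame.jpg")                 # hypothetical camera frame
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray = cv2.equalizeHist(gray)                 # crude illumination normalization
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:                    # bounding boxes of detected faces
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
```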
32
Facial Expression Recognition:
● Local vs. holistic approaches: in holistic feature expression analysis the whole face is processed, whereas in local feature expression analysis only the areas of the face affected by the facial expressions are processed.
33
Facial Expression Recognition:
● Deformation- vs. motion-based approaches:
Motion-based approaches concentrate directly on the facial changes that occur as a result of facial expressions. Deformation-based approaches require a neutral image of the face or a face model in order to function; they extract only the facial features relevant to facial actions, discarding irrelevant data (e.g., wrinkles of old age).
34
Facial Expression Recognition:
● Support vector machines (SVM)
● Nearest neighbor (NN)
● k-nearest neighbors (k-NN)
● Linear regression (LR)
● Partial least squares (PLS)
● Neural networks
● Rule-based classifiers
Application in Autonomous Driving:
Detecting fatigue in drivers, thereby providing effective methods to improve driving safety.
Fatigue Detection Methods:
● Detecting changes and movements of the eyes (driver's gaze direction, blinking rate, or eye closure); see the eye-aspect-ratio sketch below
● Assessing the mouth shape or head position of the driver
Examples:
● DAISY (Driver AssIsting SYstem): a monitoring and warning aid for the driver in longitudinal and lateral control on German motorways
● Copilot: a video-based system that measures slow eyelid closure
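For the eye-closure cue mentioned above, a hypothetical sketch of the widely used eye-aspect-ratio (EAR) measure over six eye landmarks; the landmark detector (e.g., dlib or MediaPipe) is assumed, and the 0.2 threshold is a common heuristic, not taken from the slides:

```python
import numpy as np

def eye_aspect_ratio(eye: np.ndarray) -> float:
    """EAR = (|p2-p6| + |p3-p5|) / (2 * |p1-p4|); drops toward 0 as the eye closes."""
    v1 = np.linalg.norm(eye[1] - eye[5])   # vertical distances
    v2 = np.linalg.norm(eye[2] - eye[4])
    h = np.linalg.norm(eye[0] - eye[3])    # horizontal distance
    return (v1 + v2) / (2.0 * h)

# Dummy landmark coordinates for one open eye (p1..p6).
eye = np.array([[0, 2], [2, 3], [4, 3], [6, 2], [4, 1], [2, 1]], dtype=float)
drowsy = eye_aspect_ratio(eye) < 0.2
```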
35
Facial Expression Recognition:
Challenges in Autonomous Driving:
● Illumination
● Pose Variations
● Not yet accurate enough to control fully automated vehicles; currently suitable only as a warning system
36
Facial Expression Recognition:
37
Multimodal Fusion Emotion Recognition
1. Feature-level fusion (early fusion):
Features extracted from different channels, such as audio, text, and video, are fused into one general feature vector, and the combined features are then sent for analysis.
● Advantages:
Correlating the various multimodal features at an early stage can potentially lead to better task accomplishment.
● Disadvantages:
The features belong to diverse modalities and can differ widely in many aspects.
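A minimal sketch of feature-level fusion: per-modality feature vectors are concatenated per sample before a single classifier sees them (dummy data; the dimensions are illustrative):

```python
import numpy as np

audio_feat = np.random.randn(100, 13)    # e.g. utterance-level MFCC statistics
visual_feat = np.random.randn(100, 20)   # e.g. facial-action features

# Early fusion: concatenate per-sample vectors -> shape (100, 33),
# then train a single classifier on the joint representation.
fused = np.concatenate([audio_feat, visual_feat], axis=1)
```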
38
Information Fusion Techniques
2. Decision-level fusion (late fusion):
Features of each modality are examined and classified separately, and the results are then fused into a decision vector to obtain the final decision.
Advantages:
● Fusing the decisions obtained from the various modalities is easier than feature-level fusion, since the decisions from multiple modalities usually have the same form.
● Every modality can use the classifier or model best suited to learning its features.
★ Researchers have tended to prefer decision-level fusion over feature-level fusion.
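A sketch of decision-level fusion by averaging per-modality class probabilities; the two scikit-learn classifiers and the averaging rule are our assumptions, one simple choice among many:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

Xa, Xv = np.random.randn(100, 13), np.random.randn(100, 20)  # audio / visual features
y = np.random.choice(3, size=100)                            # shared emotion labels

clf_a = SVC(probability=True).fit(Xa, y)          # each modality gets its own model
clf_v = LogisticRegression(max_iter=1000).fit(Xv, y)

proba = (clf_a.predict_proba(Xa) + clf_v.predict_proba(Xv)) / 2
decision = proba.argmax(axis=1)                   # fused prediction per sample
```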
39
Information Fusion Techniques
3. Hybrid multimodal fusion:
Combination of both feature-level and decision-level fusion methods.
● Advantages:
Advantages of both early fusion and late fusion techniques!
40
Information Fusion Techniques
4. Model-level fusion:
A technique that exploits the correlation between data observed under different modalities, with a relaxed fusion of the data.
● Rule-based fusion methods:
linear weighted fusion, majority voting, custom-defined rules
● Classification-based fusion methods:
SVMs, Bayesian inference, Dempster-Shafer theory, dynamic Bayesian networks, neural networks, and maximum entropy models
● Estimation-based fusion methods:
Kalman filter, extended Kalman filter, and particle-filter-based fusion methods
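To illustrate two of the rule-based methods named above, a minimal sketch of majority voting over hard per-modality labels and linear weighted fusion over soft per-class scores (the weights are hypothetical, chosen by hand):

```python
import numpy as np
from collections import Counter

# Majority voting over hard decisions from three modalities.
votes = ["anger", "anger", "neutral"]
majority = Counter(votes).most_common(1)[0][0]    # -> "anger"

# Linear weighted fusion of per-class scores (weights sum to 1).
scores_audio = np.array([0.7, 0.2, 0.1])
scores_video = np.array([0.4, 0.5, 0.1])
fused = 0.6 * scores_audio + 0.4 * scores_video   # class 0 wins
```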
41
Information Fusion Techniques
42
Multimodal Fusion Emotion Recognition
Challenges in multimodal affective computing:
● Continuous data from real, noisy sensors may produce incorrect data.
● Identifying whether the extracted audio and utterance refer to the same content.
● Multimodal affect analysis models should be trained on Big Data from diverse contexts in order to build generalized models.
● Effective modeling of temporal information in the Big Data.
● For real-time analysis of multimodal Big Data, an appropriately scalable Big Data architecture and platform needs to be designed to cope effectively with the heterogeneous Big Data challenges of growing space and time complexity.
43
Conclusion and Summary
Speech Modality:
Advantages: low hardware cost; microphones already exist in cars
Disadvantages: not all speech captured by the microphone may be relevant; drivers are not speaking all the time → no constant and reliable driver monitoring
Vision Modality:
Advantages: omnipresent
Disadvantages: suffers from changing lighting conditions, occlusion, and pose variations
Multimodal Approach:
Advantages: more robustness to the environment and sensor noise; redundant information improves performance for humans → likewise for computers
Disadvantages: complexity; finding the best approach for fusing multimodal information
44
Questions?
45
Thank you for your attention!