Mohammad Bagheri, Negar Namdarian
Technical University of Munich
Department of Informatics
Master Seminar: Emotional Awareness in Autonomous Driving
28 June 2019
Affective Computing in Psychology and Computer
Science, Challenges and Advancements
1. Introduction
2. Challenges
3. Proposed Approaches (Emotion Recognition Techniques)
a. Speech Emotion Recognition
b. Facial Expression Recognition
c. Multimodal Systems
4. Conclusion and Summary
Outline
3
1. Introduction
a. Affect Definition in Psychology and Computer Science
b. The Necessity of Affective Computing
c. Levels of Automation in Autonomous Driving
d. Affective Computing in Autonomous Driving
4
Introduction
Affect Definition in Psychology and Computer Science:
● Psychology: discrete categories such as fear, sadness, anger, joy, disgust, surprise, love.
● Computer Science: affect recognition vs. sentiment recognition
● Affect recognition: fine-grained affect recognition, aiming to classify data into a large set of emotion labels
● Sentiment recognition: coarse-grained affect recognition, a binary classification task (positive versus negative)
5
6
Affect Taxonomy:
The Necessity of Affective Computing:
● Could improve systems' functionality and artificial intelligence, and the ability to respond according to users' affective feedback.
● Increases trustworthiness, reliability, and security.
7
https://andrewbolwell.com/2014/09/22/robots-and-human-emotion-2/
Levels of Automation in Autonomous Driving
8
Affective Computing in Autonomous Driving
9
Why do we need Emotion Recognition in Autonomous Driving?
● Increasing acceptance and reliability among users
● Inducing a sense of security and reliability in users
● Detecting drowsiness of the driver while driving (facial analysis)
● Making sure the driver is paying enough attention to the road
● Preventing the execution of dangerous commands from the driver
● Correcting misinterpreted driver commands
● Introducing a more enjoyable and comfortable cabin atmosphere for the passengers
Affective Computing in Autonomous Driving
10
https://www.notanygadgets.com/product/meet-cozmo-the-ai-robot-with-emotions/
11
Challenges
Challenges of Emotion Recognition in Autonomous Driving:
● Robustness
● Accuracy
● Reliability
● Security
● Challenges of affective computing
(discussed below with each affective computing method)
12
Proposed Approaches
(Emotion Recognition Techniques)
● Speech Emotion Recognition
● Facial Expression Recognition
● Physiological Emotion Recognition Techniques
● Body Gesture and Movement
● Multimodal Systems
13
Emotion Recognition Techniques
Speech Emotion Recognition:
Recognizing the speaker's emotional state from the speech signal.
The main issues:
● selection of a suitable feature set
● choice of proper classifiers
● preparation of suitable datasets/databases
14
Proposed Approaches (Emotion Recognition
Techniques)
Applications:
● speech synthesis
● customer service
● education (e-learning)
● medical purposes (rehabilitation, counseling)
● entertainment (music players)
15
Speech Emotion Recognition:
Speech databases:
1. Natural speech databases:
collected naturally/spontaneously from speakers in real conditions
Example: AIBO database
2. Acted speech databases:
consist of emotional utterances spoken by professional or non-professional actors
Example: Berlin Database of Emotional Speech
16
Speech Emotion Recognition:
Speech databases:
17
Speech Emotion Recognition:
Corpus Name | Language | Type | Emotions
AIBO | German | Natural | anger, boredom, emphatic, helpless, ironic, joyful, motherese, reprimanding, rest, surprise, touchy
Berlin Database of Emotional Speech | German | Acted | happiness, anger, anxiety, fear, boredom, disgust, neutral
INTERFACE | English, Spanish, French, Slovenian | Acted | anger, sadness, joy, fear, disgust, surprise, neutral (soft, slow, normal, loud, fast)
Danish Emotional Speech Database (DES) | Danish | Acted | angry, happy, sad, surprise, neutral
Speech emotion recognition System steps:
Preprocessing: removes noise from the signal and brings out its high-frequency characteristics; the preprocessed signal is then passed to the feature extraction and selection steps.
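As an illustration, a minimal sketch of one common preprocessing step, pre-emphasis filtering, which boosts the high-frequency content of the signal (the coefficient 0.97 is a common convention, not taken from the slides; all code sketches below use Python):

```python
import numpy as np

def pre_emphasis(signal: np.ndarray, coeff: float = 0.97) -> np.ndarray:
    """Boost high-frequency components: y[t] = x[t] - coeff * x[t-1]."""
    return np.append(signal[0], signal[1:] - coeff * signal[:-1])

# Example: one second of a dummy 16 kHz signal.
x = np.random.randn(16000)
y = pre_emphasis(x)
```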
18
Speech Emotion Recognition:
[Pipeline diagram: emotional speech input → feature extraction → feature selection → classification → emotion output]
19
Speech Emotion Recognition:
Speech feature vector categorization:
1. Short-time (segmental): calculated once for every small time frame (25–50 ms)
2. Long-time (suprasegmental): calculated over the entire utterance duration

1. Low-level descriptors (LLDs): contain prosodic features (suprasegmental) and spectral features (segmental)
2. Functionals: statistics computed over the LLDs across the utterance
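To make the LLD/functional distinction concrete, a hypothetical sketch: frame-wise short-time energy as a segmental LLD, aggregated into suprasegmental functionals (the frame sizes and statistics chosen here are illustrative, not prescribed by the slides):

```python
import numpy as np

def short_time_energy(signal, frame_len=400, hop=160):
    """Segmental LLD: energy per 25 ms frame (400 samples at 16 kHz, 10 ms hop)."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len, hop)]
    return np.array([np.sum(f ** 2) for f in frames])

signal = np.random.randn(16000)              # dummy 1 s utterance at 16 kHz
lld = short_time_energy(signal)              # one value per frame (segmental)
functionals = [lld.mean(), lld.std(), lld.max(), lld.min()]  # utterance level
```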
20
Speech Emotion Recognition:
Speech feature vector categorization:
1. Continuous (pitch, energy, …)
2. Qualitative (voice quality, such as harsh, tense, …)
3. Spectral (LPC, MFCC, …)
4. TEO (Teager energy operator)-based features
21
Speech Emotion Recognition:
Feature Extraction Methods:
Reduce the dimensionality of the input vector while maintaining the perceptive power of the signal.
● Mel-frequency cepstral coefficients (MFCCs)
● Linear predictive cepstral coefficients (LPCCs)
● Perceptual linear predictive coefficients (PLPs)
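A minimal sketch of MFCC extraction; librosa is our choice of library here, not named in the slides:

```python
import numpy as np
import librosa

# Dummy 1 s utterance at 16 kHz; in practice, load a real recording,
# e.g. signal, sr = librosa.load("utterance.wav", sr=16000).
signal = np.random.randn(16000).astype(np.float32)

# 13 MFCCs per short-time frame -> matrix of shape (13, n_frames).
mfccs = librosa.feature.mfcc(y=signal, sr=16000, n_mfcc=13)
print(mfccs.shape)
```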
22
Speech Emotion Recognition:
Feature Selection Methods:
● Principal Component Analysis (PCA)
● Canonical Correlation Analysis (CCA)
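A sketch of PCA-based feature selection with scikit-learn (our assumed library; the slides only name the methods). PCA projects the feature vectors onto the directions of highest variance:

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.randn(200, 40)        # 200 utterances x 40 extracted features (dummy)
pca = PCA(n_components=10)          # keep the 10 directions of highest variance
X_reduced = pca.fit_transform(X)
print(X_reduced.shape, pca.explained_variance_ratio_.sum())
```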
23
Speech Emotion Recognition:
Single Classifiers:
● Support vector machines (SVM)
● Hidden Markov models (HMM)
● Artificial neural networks (ANN)
● K-nearest neighbors (k-NN)
● Gaussian mixture models (GMM)
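A sketch of the classification step using an SVM in scikit-learn; the features and labels here are dummies, standing in for the output of the preceding extraction/selection steps:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Dummy utterance-level feature vectors and emotion labels.
X = np.random.randn(300, 10)
y = np.random.choice(["anger", "joy", "neutral"], size=300)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = SVC(kernel="rbf").fit(X_tr, y_tr)     # train on labeled utterances
print("accuracy:", clf.score(X_te, y_te))   # ~chance level on random data
```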
24
Speech Emotion Recognition:
Hybrid Classifiers:
● Combining several classifiers in order to improve performance
25
Speech Emotion Recognition:
Challenges:
● Selection of the best feature set
● Different languages, accents, and speaking styles
● Limitations of speech databases and the difficulty of obtaining natural speech databases (ethics, …)
26
Facial Expression Recognition:
● one of the most popular and effective approaches in autonomous vehicles
● classifies facial feature deformations and facial motions into emotion
categories.
27
Proposed Approaches (Emotion Recognition
Techniques)
Facial Expression:
can be described by three phases:
1. Onset phase (attack): the intensity of the muscle activation increases toward the apex phase.
2. Apex phase (sustain): the plateau where the intensity of the muscle activation stabilizes.
3. Offset phase (relaxation): progressive muscular relaxation toward the neutral state (no manifestation of muscle activation).
28
Facial Expression Recognition:
Facial Expression Measurement Methods:
● Judgment-based approach: specific facial patterns are mapped directly onto emotional categories (anger, disgust, fear, happiness, sadness, and surprise)
● Sign-based approach: facial motion and deformation are coded into visual classes. Facial actions are hereby abstracted and described by their location and intensity.
29
Facial Expression Recognition:
Facial Expression Analysis steps:
30
Facial Expression Recognition:
[Pipeline diagram: face acquisition → facial feature extraction → facial expression classification → output]
31
Facial Expression Recognition:
An automatic face detector locates faces in any complicated or cluttered background.
Pose and illumination variations affect the accuracy; hence some normalizations are recommended before advancing to the next steps.
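A sketch of the face-acquisition step using OpenCV's bundled Haar-cascade detector (our choice of library; the slides do not prescribe one, and "frame.jpg" is a hypothetical input):

```python
import cv2

# Haar cascade shipped with OpenCV; detection runs on grayscale images.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("frame.jpg")                 # hypothetical camera frame
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray = cv2.equalizeHist(gray)                 # crude illumination normalization
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:                    # bounding boxes of detected faces
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
```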
32
Facial Expression Recognition:
● Local vs. holistic approaches: in holistic feature expression analysis the whole face is processed, whereas in local feature expression analysis only the areas of the face affected by the facial expressions are processed.
33
Facial Expression Recognition:
● Deformation- vs. motion-based approaches:
Motion-based approaches concentrate directly on the facial changes that occur as a result of facial expressions. Deformation-based approaches require a neutral image of the face or a face model in order to function; they extract only the facial features relevant to facial actions, discarding irrelevant data (e.g., wrinkles of old age).
34
Facial Expression Recognition:
● Support vector machines (SVM)
● Nearest neighbor (NN)
● k-nearest neighbors (k-NN)
● Linear regression (LR)
● Partial least squares (PLS)
● Neural networks
● Rule-based classifiers
Application in Autonomous Driving:
Detecting fatigue in drivers, thereby providing effective methods to improve driving safety.
Fatigue Detection Methods:
● Detecting changes and movements of the eyes (driver's gaze direction, blinking rate, or eye closure); see the eye-aspect-ratio sketch below
● Assessing the mouth shape or head position of the driver
Examples:
● DAISY (Driver AssIsting SYstem): a monitoring and warning aid for the driver in longitudinal and lateral control on German motorways
● Copilot: a video-based system that measures slow eyelid closure
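For the eye-closure cue mentioned above, a hypothetical sketch of the widely used eye-aspect-ratio (EAR) measure over six eye landmarks; the landmark detector (e.g., dlib or MediaPipe) is assumed, and the 0.2 threshold is a common heuristic, not taken from the slides:

```python
import numpy as np

def eye_aspect_ratio(eye: np.ndarray) -> float:
    """EAR = (|p2-p6| + |p3-p5|) / (2 * |p1-p4|); drops toward 0 as the eye closes."""
    v1 = np.linalg.norm(eye[1] - eye[5])   # vertical distances
    v2 = np.linalg.norm(eye[2] - eye[4])
    h = np.linalg.norm(eye[0] - eye[3])    # horizontal distance
    return (v1 + v2) / (2.0 * h)

# Dummy landmark coordinates for one open eye (p1..p6).
eye = np.array([[0, 2], [2, 3], [4, 3], [6, 2], [4, 1], [2, 1]], dtype=float)
drowsy = eye_aspect_ratio(eye) < 0.2
```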
35
Facial Expression Recognition:
Challenges in Autonomous Driving:
● Illumination
● Pose Variations
● Not yet accurate enough to control fully automated vehicles; currently suitable only as a warning system
36
Facial Expression Recognition:
37
Multimodal Fusion Emotion Recognition
1. Feature-level fusion (early fusion):
Features extracted from different channels, such as audio, text, and video, are fused into one general feature vector, and the combined features are then sent for analysis.
● Advantages:
Correlating the various multimodal features at an early stage can potentially lead to better task accomplishment.
● Disadvantages:
The features belong to diverse modalities and can differ widely in many aspects.
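A minimal sketch of feature-level fusion: per-modality feature vectors are concatenated per sample before a single classifier sees them (dummy data; the dimensions are illustrative):

```python
import numpy as np

audio_feat = np.random.randn(100, 13)    # e.g. utterance-level MFCC statistics
visual_feat = np.random.randn(100, 20)   # e.g. facial-action features

# Early fusion: concatenate per-sample vectors -> shape (100, 33),
# then train a single classifier on the joint representation.
fused = np.concatenate([audio_feat, visual_feat], axis=1)
```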
38
Information Fusion Techniques
2. Decision-level fusion (late fusion):
Features of each modality are examined and classified separately, and the results are then fused into a decision vector to obtain the final decision.
Advantages:
● Fusing the decisions obtained from the various modalities is easier than feature-level fusion, since the decisions from multiple modalities usually have the same form.
● Every modality can use the classifier or model best suited to learning its features.
★ Researchers have tended to prefer decision-level fusion over feature-level fusion.
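A sketch of decision-level fusion by averaging per-modality class probabilities; the two scikit-learn classifiers and the averaging rule are our assumptions, one simple choice among many:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

Xa, Xv = np.random.randn(100, 13), np.random.randn(100, 20)  # audio / visual features
y = np.random.choice(3, size=100)                            # shared emotion labels

clf_a = SVC(probability=True).fit(Xa, y)          # each modality gets its own model
clf_v = LogisticRegression(max_iter=1000).fit(Xv, y)

proba = (clf_a.predict_proba(Xa) + clf_v.predict_proba(Xv)) / 2
decision = proba.argmax(axis=1)                   # fused prediction per sample
```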
39
Information Fusion Techniques
3. Hybrid multimodal fusion:
Combination of both feature-level and decision-level fusion methods.
● Advantages:
Advantages of both early fusion and late fusion techniques!
40
Information Fusion Techniques
4. Model-level fusion:
A technique that exploits the correlation between data observed under different modalities, with a relaxed fusion of the data.
● Rule-based fusion methods:
linear weighted fusion, majority voting, custom-defined rules
● Classification-based fusion methods:
SVMs, Bayesian inference, Dempster-Shafer theory, dynamic Bayesian networks, neural networks, and maximum entropy models
● Estimation-based fusion methods:
Kalman filter, extended Kalman filter, and particle-filter-based fusion methods
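To illustrate two of the rule-based methods named above, a minimal sketch of majority voting over hard per-modality labels and linear weighted fusion over soft per-class scores (the weights are hypothetical, chosen by hand):

```python
import numpy as np
from collections import Counter

# Majority voting over hard decisions from three modalities.
votes = ["anger", "anger", "neutral"]
majority = Counter(votes).most_common(1)[0][0]    # -> "anger"

# Linear weighted fusion of per-class scores (weights sum to 1).
scores_audio = np.array([0.7, 0.2, 0.1])
scores_video = np.array([0.4, 0.5, 0.1])
fused = 0.6 * scores_audio + 0.4 * scores_video   # class 0 wins
```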
41
Information Fusion Techniques
42
Multimodal Fusion Emotion Recognition
Challenges in multimodal affective computing:
● Continuous data from real, noisy sensors may produce incorrect data.
● Identifying whether the extracted audio and utterance refer to the same content.
● Multimodal affect analysis models should be trained on Big Data from diverse contexts in order to build generalized models.
● Effective modeling of temporal information in the Big Data.
● For real-time analysis of multimodal Big Data, an appropriately scalable Big Data architecture and platform needs to be designed to cope effectively with the heterogeneous Big Data challenges of growing space and time complexity.
43
Conclusion and Summary
Speech Modality:
Advantages: low hardware cost; microphones already exist in cars
Disadvantages: not all speech captured by the microphone may be relevant; drivers are not speaking all the time → no constant and reliable driver monitoring
Vision Modality:
Advantages: omnipresent
Disadvantages: suffers from changing lighting conditions, occlusion, and pose variations
Multimodal Approach:
Advantages: more robustness to the environment and sensor noise; redundant information improves performance for humans → likewise for computers
Disadvantages: complexity; finding the best approach for fusing multimodal information
44
Questions?
45
Thank you for your attention!