
EMOTIC: Emotions in Context Dataset

Ronak Kosti∗, Jose M. Alvarez†, Adria Recasens‡, Agata Lapedriza∗

∗Universitat Oberta de Catalunya   †Data61 / CSIRO   ‡Massachusetts Institute of Technology

{rkosti, alapedriza}@uoc.edu, [email protected], [email protected]

Abstract

Recognizing people's emotions from their frame of reference is very important in our everyday life. This capacity helps us to perceive or predict the subsequent actions of people, to interact effectively with them, and to be sympathetic and sensitive toward them. Hence, one should expect that a machine needs a similar capability of understanding people's feelings in order to interact correctly with humans. Current research on emotion recognition has focused on the analysis of facial expressions. However, recognizing emotions also requires understanding the scene in which a person is immersed. The unavailability of suitable data to study such a problem has made research on emotion recognition in context difficult. In this paper, we present the EMOTIC database (from EMOTions In Context), a database of images of people in real environments, annotated with their apparent emotions. We defined an extended list of 26 emotion categories to annotate the images, and combined these annotations with three common continuous dimensions: Valence, Arousal, and Dominance. Images in the database were annotated using the Amazon Mechanical Turk (AMT) platform. The resulting set contains 18,313 images with 23,788 annotated people. The goal of this paper is to present the EMOTIC database, detailing how it was created and the information it provides. We expect this dataset can help to open up new horizons for creating systems capable of recognizing rich information about people's apparent emotional states.

1. Introduction

When we look at a person, it is very easy for us to put ourselves in her situation, and even to feel, to some degree, things that this person appears to be feeling. We constantly use this exceptional ability to guess how others feel in our daily lives. Such empathizing capacity helps us to be more helpful, sensitive, sympathetic, affectionate and cordial in our social interactions. More generally, this capacity helps us to better understand other people, to understand the motivations and goals behind their actions, and to predict how they will react to different events.

Figure 1: What can we estimate about these people's emotional states? Each example (a), (b), (c) is shown at three levels: face/head, body, and person in context.

In this paper we introduce the EMOTIC (Emotions in Context) database. The EMOTIC database is a large-scale annotated image database of people in context. In this database, people are annotated according to their apparent emotions, with a rich set of 26 emotional categories and also with continuous dimensions (valence, arousal and dominance). The images, which show the context of the person, cover a wide range of scene types and activities, allowing the study of emotion recognition beyond the analysis of facial expressions.

There has been a lot of research on emotion recognition from images. In particular, remarkable work has been done on recognizing the 6 basic emotions [11] (anger, disgust, fear, happiness, sadness, and surprise), mainly from facial expression but also from body language analysis. In Section 2, we give a brief overview of some of the most relevant works. However, despite these efforts, machines are still far from recognizing emotional states as we do.

The problem of emotional state recognition is extremely complex, but our hypothesis is that there are two main limitations in current approaches. (1) First, most of the existing databases for emotion recognition lack fine-grained labels of human emotions. Most studies classify emotions into 6 categories, but this is far from the fine-grained categorization that humans are capable of. In this work we introduce a more sophisticated set of 26 emotion categories and combine them with the common continuous dimensions (valence, arousal and dominance). This combination provides a rich description of the emotional state of a person. (2) Second, the context (the surroundings of the person) is an important source of information and has traditionally been dismissed. The EMOTIC database attempts to overcome these two issues.

Recent research in psychology highlights the importance of context in the perception of emotions [4]. The importance of the context for inferring fine-grained information about apparent emotional states is illustrated in Figure 1. For example, in Figure 1.a, when we look only at the face of the person (first column), we can guess that the person is feeling Happiness, but it is hard to infer additional information about her emotions. The body pose and clothing (second column) give additional clues, and we can infer that she is practicing some sport. However, when we consider the whole context (third column), we can see that she was involved in a competition and got the first prize. From this information, we can say she probably feels Excitement and Confidence. A similar story can be told for the person in Figure 1.b. We only see part of the face (first column), which is not very informative, but the body (second column) indicates that the person is looking away toward something or someone that apparently has his attention. Even then we cannot tell much. It is only when we look at the whole context (third column) that it becomes clear the person is in a meeting room and is paying attention to a person talking, probably feeling Engagement. Figure 1.c shows an even more challenging situation. We just see the back of the person's head (first column), which does not give any information about his emotional state. The body pose (second column) reveals part of the story, but it is only the whole image (third column) that tells us that the boy is playing with some other kids, so he probably feels Engagement, and he is probably in a state of Anticipation of the trajectory of the ball. The goal of the EMOTIC dataset is to provide data for developing automatic systems that can make this type of inference.

The EMOTIC database was introduced in a paper accepted at the Computer Vision and Pattern Recognition 2017 conference [18]. In this paper we give more details on the dataset, on the image annotation, and on the annotation consistency among different annotators. We also present extended statistics and an algorithmic analysis of the data using state-of-the-art methods for scene category and scene attribute recognition. Our analysis shows different distribution patterns of emotions depending on the scenes and attributes. The obtained results indicate that current scene understanding systems can be used to incorporate the analysis of context into the understanding of people's emotions. Thus, we think that the EMOTIC dataset, in combination with previous datasets for emotion estimation (see Section 2.1 for an overview), can help to make further steps in the direction of designing systems capable of recognizing people's emotions as humans do.

2. Related Work

Most of the research in computer vision on recognizing emotional states is contextualized in facial expression analysis (e.g., [5, 13]). Some of these methods are based on the Facial Action Coding System [15, 29]. This system uses a set of specific localized movements of the face, called Action Units, to encode the facial expression. These Action Units can be recognized from geometric-based features and/or appearance features extracted from face images [23, 19, 12]. Recent works on emotion recognition based on facial expression use CNNs to recognize the emotions and/or the Action Units [5].

Instead of recognizing emotion categories, some recent works on facial expression [27] use the continuous dimensions of the VAD Emotional State Model [21] to represent emotions. The VAD model describes emotions using 3 numerical dimensions: Valence (V), which measures how positive or pleasant an emotion is, ranging from negative to positive; Arousal (A), which measures the agitation level of the person, ranging from non-active/calm to agitated/ready to act; and Dominance (D), which measures the person's level of control over the situation, ranging from submissive/non-control to dominant/in-control. On the other hand, Du et al. [10] proposed a set of 21 facial emotion categories, defined as different combinations of the basic emotions, like 'happily surprised' or 'happily disgusted'. This categorization gives more detail about the expressed emotion.

Although most of the work on recognizing emotions is focused on face analysis, there are a few works in computer vision that address emotion recognition using visual cues other than the face. For instance, some works [22] consider the location of the shoulders as additional information to the face features for recognizing basic emotions. More generally, Schindler et al. [26] used the body pose to recognize the 6 basic emotions, performing experiments on a small dataset of non-spontaneous poses acquired under controlled conditions.

2.1. Related datasets

In recent years we have observed a significant emergence of affective datasets for recognizing people's emotions. The GENKI database [1] contains frontal face images of a single person in wide-ranging illumination, geographical, personal and ethnic settings, and the images are labeled as smiling or non-smiling. The ICML face-expression recognition dataset [2] consists of 28k images annotated with the 6 basic emotions and a neutral category. The UCDSEE dataset [28] has a set of 9 emotion expressions acted by 4 persons, acquired under strictly the same lab setting in order to focus mainly on the facial expression of the person.

Dynamic body movement is also an essential source of emotion. The studies [16, 17] establish the relationship between affect and body posture, using the base rate of human observers as ground truth. The data used consist of a spontaneous subset acquired under a controlled setting (people playing Wii games). The GEMEP database [3] is a multi-modal (audio and video) dataset and comprises 10 actors playing 18 affective states. The dataset has videos of actors showing emotions through acting, combining body pose and facial expression.

The EMOTIW challenge [7] hosts 3 databases: 1) the AFEW database [6] focuses on emotion recognition in video frames taken from movies and TV shows, where the actions are semi-spontaneous and are annotated with attributes like name, age of actor, age of character, pose, gender, expression of person, the overall clip expression, and the basic 6 emotions plus a neutral category; 2) the SFEW dataset [8] is a subset of the AFEW database consisting of face-frame images annotated specifically with the basic 6 emotions and a neutral category; and 3) the HAPPEI database [9] focuses on the problem of group-level emotion estimation and is a first attempt to use context for the problem of predicting happiness in groups of people.

Finally, the MSCOCO dataset has recently been annotated with object attributes [24], including some emotion categories for people, such as happy and curious. These attributes show some overlap with the categories that we define in this paper. However, COCO attributes are not intended to be exhaustive for emotion recognition, and not every person in the dataset is annotated with affect attributes.

1. Peace: well-being and relaxed; no worry; having positive thoughts or sensations; satisfied
2. Affection: fond feelings; love; tenderness
3. Esteem: feelings of favorable opinion or judgment; respect; admiration; gratefulness
4. Anticipation: state of looking forward; hoping for or getting prepared for possible future events
5. Engagement: paying attention to something; absorbed in something; curious; interested
6. Confidence: feeling of being certain; conviction that an outcome will be favorable; encouraged; proud
7. Happiness: feeling delighted; feeling enjoyment or amusement
8. Pleasure: feeling of delight in the senses
9. Excitement: feeling enthusiasm; stimulated; energetic
10. Surprise: sudden discovery of something unexpected
11. Sympathy: state of sharing others' emotions, goals or troubles; supportive; compassionate
12. Doubt/Confusion: difficulty understanding or deciding; thinking about different options
13. Disconnection: feeling not interested in the main event of the surroundings; indifferent; bored; distracted
14. Fatigue: weariness; tiredness; sleepy
15. Embarrassment: feeling ashamed or guilty
16. Yearning: strong desire to have something; jealous; envious; lust
17. Disapproval: feeling that something is wrong or reprehensible; contempt; hostile
18. Aversion: feeling disgust, dislike, repulsion; feeling hate
19. Annoyance: bothered by something or someone; irritated; impatient; frustrated
20. Anger: intense displeasure or rage; furious; resentful
21. Sensitivity: feeling of being physically or emotionally wounded; feeling delicate or vulnerable
22. Sadness: feeling unhappy, sorrow, disappointed, or discouraged
23. Disquietment: nervous; worried; upset; anxious; tense; pressured; alarmed
24. Fear: feeling suspicious or afraid of danger, threat, evil or pain; horror
25. Pain: physical suffering
26. Suffering: psychological or emotional pain; distressed; anguished

Table 1: Proposed emotion categories with definitions.

3. EMOTIC Dataset Creation

Our aim was to create a database of natural images, capturing the subjects and their contexts in their natural, unconstrained environments. We started by collecting images from well-established datasets like MSCOCO [20] and ADE20K [33]. These datasets host a good number of images that satisfy our criteria. We also downloaded images returned by the Google search engine, using various combinations of words representing a varied mixture of subjects, locations, situations and contexts. This resulted in a challenging collection that combines images of people in different situations, performing different tasks, and showing a wide range of emotional states. Currently, the EMOTIC database consists of 18,316 images with 23,788 annotated people. The EMOTIC dataset is split into training (70%), validation (10%), and testing (20%) sets.


3.1. Emotion Representation

The EMOTIC dataset combines 2 methods to represent emotions:

• Discrete Categories: We define an extended list of 26 emotional categories. Table 1 gives the definition of each emotion category. Two example images for each emotion category are shown in Figure 2.

• Continuous Dimensions: We also use the VAD Emotional State Model to represent emotions. The continuous dimension annotations in the database are on a 1-10 scale. Figure 3 shows examples of people with different levels of each of these three dimensions.

In Figure 4 we show images of the EMOTIC database along with their annotations.

To define the 26 emotion categories, we collected a vocabulary of affective states. Using word connections (synonyms, affiliations, relevance) and the inter-dependence of groups of words (psychological research and affective computing [14, 25]), we started forming word groupings. After multiple iterations and cross-referencing with dictionaries and research in affective computing, we obtained the final 26 categories (see Table 1). While forming this group of 26 emotion categories, we required the group to satisfy two important conditions: (1) disjointness and (2) visual separability. By disjointness, we mean that for any category pair {c1, c2}, we could always find an example image where just one of the categories applies (and not the other). By visual separability we mean that two affective states were assigned to the same emotion group if we found, qualitatively, that the two could not be visually separated under the conditions of our database. For example, in Figure 2 the images for 7. Happiness and 1. Peace clearly show the visual separability of these categories despite their similarity. However, the category Excitement, for instance, includes the subcategories "enthusiastic, stimulated, and energetic." Each of these three words has a specific and different meaning, but it is very difficult to separate one from another after seeing just a single image. In our list of categories we decided to avoid a neutral category, since we think that, in general, at least one category applies, even if only with low intensity.

Notice that our 26 categories also include the 6 basic emotions (categories 7, 10, 18, 20, 22, 24) defined by Ekman [11]. Note that Aversion is a general form of the basic emotion Disgust, hence it makes more sense to keep it as a main emotion category.
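To make the combination of the two annotation types concrete, the following is a minimal sketch of how a single annotated person could be represented programmatically. This is not the official EMOTIC annotation format; the field names, bounding-box convention, and example values are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# The 26 EMOTIC categories from Table 1 (order follows the table).
EMOTIC_CATEGORIES = [
    "Peace", "Affection", "Esteem", "Anticipation", "Engagement", "Confidence",
    "Happiness", "Pleasure", "Excitement", "Surprise", "Sympathy",
    "Doubt/Confusion", "Disconnection", "Fatigue", "Embarrassment", "Yearning",
    "Disapproval", "Aversion", "Annoyance", "Anger", "Sensitivity", "Sadness",
    "Disquietment", "Fear", "Pain", "Suffering",
]

@dataclass
class PersonAnnotation:
    """One annotated person: bounding box, discrete categories, and VAD scores."""
    image_file: str
    bbox: List[int]          # hypothetical [x1, y1, x2, y2] person box
    categories: List[str]    # subset of EMOTIC_CATEGORIES (multi-label)
    valence: float           # 1-10
    arousal: float           # 1-10
    dominance: float         # 1-10

    def category_vector(self) -> List[int]:
        # Multi-hot encoding over the 26 categories.
        return [int(c in self.categories) for c in EMOTIC_CATEGORIES]

# Example record (values are illustrative only):
ann = PersonAnnotation("baseball.jpg", [34, 20, 210, 380],
                       ["Engagement", "Anticipation", "Confidence"],
                       valence=7, arousal=8, dominance=7)
print(ann.category_vector())
```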

3.2. Image Annotation

Images were annotated on the Amazon Mechanical Turk (AMT) platform. Figure 6 shows the two different annotation interfaces created for labeling images with emotion categories (Figure 6.a) and continuous dimensions (Figure 6.b). In the AMT interface for continuous dimensions we also asked workers to annotate the gender and the estimated age range of the person in the bounding box, according to the following age ranges: kid (0-12 years old), teenager (13-20 years old), adult (more than 20 years old).

We adopted two methods to control the annotation quality, apart from providing the AMT workers with extended annotation instructions and examples. First, we launched a qualification task to monitor the annotation of one image according to discrete categories and one image according to continuous dimensions. In each of the control images we manually selected all the categories that, in our opinion, clearly did not apply to that image, as well as those ranges of continuous dimensions that were out of an acceptable response. For instance, for the image shown in Figure 6, a worker that selected Pleasure, Disconnection, Sadness, Fear, Pain, or Suffering, or a worker that selected 1-2 for valence, arousal or dominance, would not pass. We considered that a worker whose labels respected these light restrictions understood the task well and did not label randomly. Only workers that labeled the control images under the mentioned restrictions were allowed to label images of the dataset. Secondly, we inserted 2 control images, with restrictions similar to the ones mentioned above, for every 18 images, in order to track consistency during the annotation. The annotations in our database come from those workers that kept passing the control images during their HITs.
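The following is a minimal sketch of the kind of control-image check described above. The disallowed categories and the accepted score range are illustrative placeholders, not the actual control annotations used for EMOTIC.

```python
# Hypothetical restrictions for one control image.
DISALLOWED_CATEGORIES = {"Pleasure", "Disconnection", "Sadness",
                         "Fear", "Pain", "Suffering"}
MIN_ACCEPTED_SCORE = 3   # scores 1-2 for V, A, or D fail this control image


def passes_control(selected_categories, valence, arousal, dominance):
    """Return True if a worker's answers respect the light restrictions."""
    if DISALLOWED_CATEGORIES & set(selected_categories):
        return False
    if min(valence, arousal, dominance) < MIN_ACCEPTED_SCORE:
        return False
    return True


# A worker with plausible categories and mid-range scores passes:
print(passes_control(["Happiness", "Excitement"], 7, 6, 5))   # True
# A worker selecting a clearly inapplicable category is rejected:
print(passes_control(["Sadness"], 7, 6, 5))                   # False
```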

3.3. Annotation Consistency

The validation set was annotated by 5 different workers in order to check the annotation consistency among different people. Although there is an inherent subjective nature to this annotation task, we observed a degree of agreement among different annotators. To measure this agreement quantitatively, we computed the Fleiss' kappa (κ) statistic using the 5 annotations of each image in the validation set. The obtained result is κ = 0.31. Furthermore, more than 50% of the images in the validation set have κ > 0.31, which indicates a much higher agreement level than random chance (notice that random annotations would produce κ ∼ 0).
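As a rough illustration of this agreement measure, the sketch below computes Fleiss' kappa after turning the multi-label category annotations into per-(person, category) binary decisions, i.e., each of the 5 annotators either selected the category or did not. This is one plausible reading of the computation; the exact formulation behind the reported κ = 0.31 may differ, and the counts below are toy data.

```python
import numpy as np


def fleiss_kappa(counts: np.ndarray) -> float:
    """counts: (n_items, n_classes); each row sums to the number of raters."""
    n_items, _ = counts.shape
    n_raters = counts.sum(axis=1)[0]
    p_classes = counts.sum(axis=0) / (n_items * n_raters)        # class proportions
    P_i = (np.square(counts).sum(axis=1) - n_raters) / (n_raters * (n_raters - 1))
    P_bar, P_e = P_i.mean(), np.square(p_classes).sum()
    return (P_bar - P_e) / (1 - P_e)


# selected[i, c] = how many of the 5 annotators marked category c for person i
selected = np.array([[5, 0, 1], [4, 1, 0], [0, 0, 3]])           # toy counts
# Two rating "classes" per (person, category) item: selected vs. not selected.
counts = np.stack([selected.ravel(), 5 - selected.ravel()], axis=1)
print(round(fleiss_kappa(counts), 2))
```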

For every image in the validation set, there is at least one category that was annotated by 2 or more annotators. In 60% of the validation images, there is at least one category annotated by at least 3 annotators. These statistics, which are reflected in Figure 5, are a good indicator of the agreement amongst the human annotators.

For the continuous dimensions, the standard deviation among the 3 different workers is, on average, 1.41, 0.70 and 2.12 for valence, arousal and dominance, respectively. Dominance therefore shows a higher dispersion than the other dimensions, that is, the agreement deviation is higher for dominance. Note that the range of values for each dimension is 1 to 10, 1 being the lowest and 10 the highest level.

Figure 2: Visual examples of the 26 emotion categories defined in Table 1. For each category we show two images where the person marked with the red bounding box has been annotated with the corresponding category.

Figure 3: Examples of images from the EMOTIC dataset with different scores of Valence, Arousal, and Dominance.
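The per-dimension disagreement figures above can be reproduced with a short computation of the kind sketched below: for each annotated person, take the standard deviation of each continuous dimension across the workers, then average over people. The array shapes and values here are hypothetical.

```python
import numpy as np

# scores[p, w, d]: person p, worker w, dimension d (0=valence, 1=arousal, 2=dominance)
scores = np.array([
    [[7, 8, 6], [6, 8, 5], [8, 7, 9]],
    [[3, 5, 2], [4, 5, 3], [3, 6, 7]],
], dtype=float)

per_person_std = scores.std(axis=1)        # std across workers, per dimension
mean_std = per_person_std.mean(axis=0)     # average over all annotated people
print(dict(zip(["valence", "arousal", "dominance"], mean_std.round(2))))
```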

4. Dataset Statistics

Of the 23,788 persons annotated in the dataset, 66% are males and 34% are females. The age distribution of the annotated people is the following: 11% children, 11% teenagers, and 78% adults.

Figure 7.a shows the number of people for each emotional category, while Figures 7.b, 7.c and 7.d show the number of people per score for the valence, arousal and dominance continuous dimensions, respectively.

Figure 4: Annotated images from the EMOTIC dataset, shown with their emotion categories and their Valence (V), Arousal (A) and Dominance (D) scores.

Figure 5: Percentage of validation-set images with at least one annotated category selected by more than 1, 2, 3, or 4 annotators.

Figure 6: AMT interfaces for emotion category (a) and continuous dimension (b) annotation.

Figure 7: (a) Number of people per emotion category in the EMOTIC dataset (total: 23,788 people in 18,316 images); (b), (c), (d) number of people per score for each of the continuous dimensions.

It is interesting to see how the values (1 = lowest to 10 = highest) of a given continuous dimension are spread across the dataset for each emotional category. Figure 8.a shows the spread of Valence values across each category, with the categories sorted according to their mean Valence. We can clearly see that the lowest mean Valence corresponds to emotions like Suffering, Pain and Sadness, indicating that Valence values are, on average, low for these categories. This shows consistency in the annotations of our EMOTIC dataset, since one would expect that images with emotion categories like Suffering, Pain and Sadness should exhibit, on average, low levels of Valence. Similarly, it is clear that emotions like Happiness, Affection and Pleasure should have a higher level of Valence, and this fact is apparent from the plot itself. Finally, as expected, the plot also shows the correlation between a person being in a disconnected emotional state and a mid-level Valence value: Disconnection lies in the mid-range of the sorted emotion categories. Similarly, Figure 8.b shows the same type of information for the arousal dimension. Notice that Fatigue and Sadness are the categories with the lowest arousal score on average, meaning that when these feelings are present, people usually show a low level of agitation. On the other hand, Confidence and Excitement are the categories with the highest arousal level. Finally, Figure 8.c shows the distribution of the dominance scores. The categories with the lowest dominance level (people feeling they are not in control of the situation) are Suffering and Pain, while the highest average dominance levels are shown by the categories Confidence and Excitement. We observe that these category orderings are consistent with our common-sense knowledge. However, we also observe in these plots that, for each category, there is some relevant variability in the continuous dimension scores. This suggests that the information contributed by each type of annotation can be complementary rather than redundant.
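The category ordering used in Figure 8 can be obtained with a computation of the kind sketched below: sort the 26 categories by the mean valence of the people annotated with them. The `annotations` list here is a hypothetical stand-in for the dataset's (categories, valence) pairs.

```python
from collections import defaultdict

# Each entry: (emotion categories of one annotated person, their valence score).
annotations = [
    (["Happiness", "Affection"], 9),
    (["Suffering", "Pain"], 2),
    (["Engagement"], 6),
    (["Happiness"], 8),
]

valence_per_category = defaultdict(list)
for categories, valence in annotations:
    for c in categories:
        valence_per_category[c].append(valence)

mean_valence = {c: sum(v) / len(v) for c, v in valence_per_category.items()}
# Categories printed from lowest to highest mean valence, as in Figure 8.a.
for c in sorted(mean_valence, key=mean_valence.get):
    print(f"{c}: {mean_valence[c]:.1f}")
```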

Figure 8: For each of the continuous dimensions (Valence, Arousal, Dominance), distribution of the scores across the different emotion categories.

We also computed the co-occurrence probabilities of categories, shown in Figure 9. Given a row, corresponding to the emotional category c1, and a column, corresponding to the emotional category c2, each entry corresponds to the probability P(c2|c1). From these results we can observe interesting patterns of category co-occurrence. For instance, we see that when a person feels Affection it is very likely that she also feels Happiness, or that when a person feels Anger she is also likely to feel Annoyance. More generally, we used k-means to cluster the category annotations and observed that some category groups appear frequently in the EMOTIC database. Some examples are {anticipation, engagement, confidence}, {affection, happiness, pleasure}, {doubt/confusion, disapproval, annoyance}, and {yearning, annoyance, disquietment}.
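The conditional co-occurrence probability behind Figure 9 can be estimated directly from a multi-hot label matrix, as in the sketch below: P(c2|c1) is the number of people annotated with both c1 and c2 divided by the number annotated with c1. The label matrix here is a toy example with 3 categories rather than 26.

```python
import numpy as np

# labels[p, c] = 1 if person p is annotated with category c (26 columns in EMOTIC).
labels = np.array([
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 1],
    [0, 1, 0],
])

joint = labels.T @ labels                  # joint[c1, c2] = # people with both labels
counts = np.diag(joint).astype(float)      # # people annotated with each category
cooccurrence = joint / counts[:, None]     # row c1 holds P(c2 | c1); assumes counts > 0
print(np.round(cooccurrence, 2))
```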

5. Dataset Algorithmic Analysis

We classified the scenes of the EMOTIC dataset using a state-of-the-art Convolutional Neural Network model for scene recognition [32]. Figure 10 shows two examples of scene category and scene attribute recognition on images of the EMOTIC dataset. The figure shows the original images, the class activation maps [30] (which correspond to the region of the image that supports the decision of the classifier), and the scene categories (top 2) and scene attributes (top 5) automatically recognized in each image. As we can see, the results are very accurate. In general, as reported by the authors in [31], the recognition accuracy of these systems is around 78%, according to the feedback provided by users of the online demo for scene recognition.
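As a rough illustration of the class activation map technique [30] used in Figure 10, the sketch below computes a CAM for a generic CNN with global average pooling followed by a single fully connected layer. The model (torchvision's ImageNet-trained ResNet-18) and the random input tensor are stand-ins, not the scene-recognition network or images used in the paper.

```python
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
fc_weights = model.fc.weight.data                         # (num_classes, C)

# Capture the last convolutional feature map via a forward hook.
features = {}
model.layer4.register_forward_hook(
    lambda module, inp, out: features.update(feat=out))

image = torch.randn(1, 3, 224, 224)                       # stand-in for a real image
with torch.no_grad():
    logits = model(image)
top_class = logits.argmax(dim=1).item()

# CAM = class-specific weighted sum of the feature-map channels.
feat = features["feat"][0]                                # (C, H, W)
cam = torch.einsum("c,chw->hw", fc_weights[top_class], feat)
cam = F.relu(cam)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to [0, 1]
cam = F.interpolate(cam[None, None], size=(224, 224),
                    mode="bilinear", align_corners=False)[0, 0]
print(cam.shape)                                          # torch.Size([224, 224])
```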

Figure 9: Category co-occurrence probabilities P(c2|c1), in percent, over the 26 emotion categories (26 x 26 matrix).

Figure 10: Examples of scene category and attribute recognition on images from the EMOTIC dataset, with the corresponding class activation maps. Example outputs: scene categories "home office, living room" with attributes "no horizon, enclosed area, man-made, dry, metal"; and scene categories "ocean, coast" with attributes "warm, open area, far-away horizon, natural light, cloth".

Using the classification labels obtained with these CNN models, we can now study patterns in the emotion distribution across different scene categories and under different scene attributes. Using our data, we computed the probability of each emotion at each place as P(emo|place) = N_emo / N_people, where N_people is the number of people observed in the specific place category and N_emo is the number of those people labeled with the specific emotion category. Figure 11 shows representative examples of the emotion category distribution in different scenes. We see that, in our data, there are different patterns. For instance, the most frequent emotions in a bedroom are engagement, happiness, pleasure, peace, and affection, while in a baseball field the most frequent emotions are engagement, anticipation, confidence, and excitement. Among these examples, we also see that emotions like disquietment, annoyance, or anger are more frequent in bedrooms (where people have some degree of privacy), in offices (where people can feel tired from working), or in hospital rooms (where people can be worried).

Figure 11: Emotion distribution per place category (representative examples: bedroom, game room, waiting room, office, hospital room, baseball field).
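The conditional probability P(emo|place) = N_emo / N_people described above can be computed as in the sketch below, which pairs each annotated person's emotion labels with the place category predicted for the image. The records are toy data, not EMOTIC annotations.

```python
from collections import Counter, defaultdict

# Each record: (place category predicted for the image, emotion categories of one person).
records = [
    ("bedroom", ["Peace", "Happiness"]),
    ("bedroom", ["Fatigue"]),
    ("baseball_field", ["Engagement", "Anticipation"]),
    ("baseball_field", ["Excitement", "Engagement"]),
]

people_per_place = Counter(place for place, _ in records)   # N_people per place
emo_counts = defaultdict(Counter)                           # N_emo per place
for place, emotions in records:
    emo_counts[place].update(emotions)

p_emo_given_place = {
    place: {emo: n / people_per_place[place] for emo, n in counts.items()}
    for place, counts in emo_counts.items()
}
print(p_emo_given_place["baseball_field"]["Engagement"])    # 1.0 in this toy example
```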

We can do a similar analysis using the recognized scene attributes. Representative results are shown in Figure 12. Here, we also observe some interesting patterns. Among these examples, we see that the category peace is significantly more frequent when scenes have the attribute vegetation. This seems reasonable, since natural areas are more suitable for relaxing. Similarly, the highest frequency for sympathy is found for the attribute socializing, followed by the attribute vegetation. Taking into account the 6 attributes shown in Figure 12, it seems reasonable to see sympathy more frequently in scenes with these two attributes than, for instance, in scenes of sports or competing. We also observe very high frequencies for the categories engagement, anticipation, confidence, and excitement in scenes with the attributes sports and competing. Notice that, in general, the most frequent emotions correlate with the most frequent emotions in the whole database (see Figure 7), as expected. For instance, there are many people in the database showing engagement, since people are usually "involved in something." As a consequence, we see a high frequency of the emotion engagement across all scene categories and attributes. However, even though the main pattern of the general emotion distribution is preserved, we see specific variations in each scene category and attribute.

Regarding the continuous dimensions, we also observe interpretable patterns. Figure 13 shows an overview of the scene categories (top section of each box) and scene attributes (bottom section of each box) with the highest and lowest average valence, arousal, and dominance, respectively.

Figure 12: Emotion distribution per place attribute (representative examples: vegetation, working, competing, training, socializing, sports).

Figure 13: Place categories and attributes with the highest and lowest average scores for valence, arousal, and dominance.

6. Conclusions

In this paper we present the EMOTIC database, a database of images of people in context annotated according to their apparent emotional states. The EMOTIC database combines 2 different annotation approaches: an extended set of 26 emotional categories and a set of 3 continuous dimensions (Valence, Arousal, and Dominance). In this paper we present the details of the EMOTIC database creation and its statistics. We also present an algorithmic analysis of the data performed using state-of-the-art methods for scene category and scene attribute recognition. Our results suggest that current scene understanding techniques can be used to incorporate the analysis of context into the recognition of emotional states. Thus, the EMOTIC dataset, in combination with previous datasets on emotion estimation, can open the door to new approaches for apparent emotion estimation in the wild from visual information.

Acknowledgments

This work has been partially supported by the Ministerio de Economia, Industria y Competitividad (Spain), under the Grant Ref. TIN2015-66951-C2-2-R. The authors also thank NVIDIA for their generous hardware donations.

References

[1] GENKI database. http://mplab.ucsd.edu/wordpress/?page_id=398. Accessed: 2017-04-12.
[2] ICML face expression recognition dataset. https://goo.gl/nn9w4R. Accessed: 2017-04-12.
[3] T. Banziger, H. Pirker, and K. Scherer. GEMEP - Geneva multimodal emotion portrayals: A corpus for the study of multimodal emotional expressions. In Proceedings of LREC, volume 6, pages 15–019, 2006.
[4] L. F. Barrett, B. Mesquita, and M. Gendron. Context in emotion perception. Current Directions in Psychological Science, 20(5):286–290, 2011.
[5] C. F. Benitez-Quiroz, R. Srinivasan, and A. M. Martinez. EmotioNet: An accurate, real-time algorithm for the automatic annotation of a million facial expressions in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 2016.
[6] A. Dhall et al. Collecting large, richly annotated facial-expression databases from movies. 2012.
[7] A. Dhall, R. Goecke, J. Joshi, J. Hoey, and T. Gedeon. EmotiW 2016: Video and group-level emotion recognition challenges. In Proceedings of the 18th ACM International Conference on Multimodal Interaction, pages 427–432. ACM, 2016.
[8] A. Dhall, R. Goecke, S. Lucey, and T. Gedeon. Static facial expression analysis in tough conditions: Data, evaluation protocol and benchmark. In Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on, pages 2106–2112. IEEE, 2011.
[9] A. Dhall, J. Joshi, I. Radwan, and R. Goecke. Finding happiest moments in a social context. In Asian Conference on Computer Vision, pages 613–626. Springer, 2012.
[10] S. Du, Y. Tao, and A. M. Martinez. Compound facial expressions of emotion. Proceedings of the National Academy of Sciences, 111(15):E1454–E1462, 2014.
[11] P. Ekman and W. V. Friesen. Constants across cultures in the face and emotion. Journal of Personality and Social Psychology, 17(2):124, 1971.
[12] S. Eleftheriadis, O. Rudovic, and M. Pantic. Discriminative shared Gaussian processes for multiview and view-invariant facial expression recognition. IEEE Transactions on Image Processing, 24(1):189–204, 2015.
[13] S. Eleftheriadis, O. Rudovic, and M. Pantic. Joint facial action unit detection and feature fusion: A multi-conditional learning approach. IEEE Transactions on Image Processing, 25(12):5727–5742, 2016.
[14] E. G. Fernandez-Abascal, B. García, M. Jimenez, M. Martín, and F. Domínguez. Psicología de la emoción. Editorial Universitaria Ramon Areces, 2010.
[15] E. Friesen and P. Ekman. Facial Action Coding System: A technique for the measurement of facial movement. Palo Alto, 1978.
[16] A. Kleinsmith and N. Bianchi-Berthouze. Recognizing affective dimensions from body posture. In Proceedings of the 2nd International Conference on Affective Computing and Intelligent Interaction, ACII '07, pages 48–58, Berlin, Heidelberg, 2007. Springer-Verlag.
[17] A. Kleinsmith, N. Bianchi-Berthouze, and A. Steed. Automatic recognition of non-acted affective postures. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 41(4):1027–1038, Aug 2011.
[18] R. Kosti, J. M. Alvarez, A. Recasens, and A. Lapedriza. Emotion recognition in context. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[19] Z. Li, J.-i. Imai, and M. Kaneko. Facial-component-based bag of words and PHOG descriptor for facial expression recognition. In SMC, pages 1353–1358, 2009.
[20] T. Lin, M. Maire, S. J. Belongie, L. D. Bourdev, R. B. Girshick, J. Hays, P. Perona, D. Ramanan, P. Dollar, and C. L. Zitnick. Microsoft COCO: Common objects in context. CoRR, abs/1405.0312, 2014.
[21] A. Mehrabian. Framework for a comprehensive description and measurement of emotional states. Genetic, Social, and General Psychology Monographs, 1995.
[22] M. A. Nicolaou, H. Gunes, and M. Pantic. Continuous prediction of spontaneous affect from multiple cues and modalities in valence-arousal space. IEEE Transactions on Affective Computing, 2(2):92–105, 2011.
[23] M. Pantic and L. J. Rothkrantz. Expert system for automatic analysis of facial expressions. Image and Vision Computing, 18(11):881–905, 2000.
[24] G. Patterson and J. Hays. COCO attributes: Attributes for people, animals, and objects. In European Conference on Computer Vision, pages 85–100. Springer, 2016.
[25] R. Picard. Affective Computing, volume 252. MIT Press, Cambridge, 1997.
[26] K. Schindler, L. Van Gool, and B. de Gelder. Recognizing emotions expressed by body pose: A biologically inspired neural model. Neural Networks, 21(9):1238–1246, 2008.
[27] M. Soleymani, S. Asghari-Esfeden, Y. Fu, and M. Pantic. Analysis of EEG signals and facial expressions for continuous emotion detection. IEEE Transactions on Affective Computing, 7(1):17–28, 2016.
[28] J. L. Tracy, R. W. Robins, and R. A. Schriber. Development of a FACS-verified set of basic and self-conscious emotion expressions. Emotion, 9(4):554, 2009.
[29] W.-S. Chu, F. De la Torre, and J. F. Cohn. Selective transfer machine for personalized facial expression analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016.
[30] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba. Learning deep features for discriminative localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2921–2929, 2016.
[31] B. Zhou, A. Khosla, A. Lapedriza, A. Torralba, and A. Oliva. Places: An image database for deep scene understanding. arXiv preprint arXiv:1610.02055, 2016.
[32] B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, and A. Oliva. Learning deep features for scene recognition using Places database. In Advances in Neural Information Processing Systems, pages 487–495, 2014.
[33] B. Zhou, H. Zhao, X. Puig, S. Fidler, A. Barriuso, and A. Torralba. Semantic understanding of scenes through ADE20K dataset. 2016.