

Robotics and Autonomous Systems 60 (2012) 1449–1456


It's not all written on the robot's face
Jiaming Zhang, Amanda J.C. Sharkey

Neurocomputing and Robotics Group, Department of Computer Science, University of Sheffield, Regent Court, Portobello, Sheffield S1 4DP, UK

Article info

Article history: Available online 29 May 2012

Keywords: Robot; Emotions; Facial expressions; Context

Abstract

Past work on creating robots that can make convincing emotional expressions has concentrated on the quality of those expressions, and on assessing people's ability to recognize them in neutral contexts, without any strong emotional valence. It would be interesting to find out whether observers' judgments of the facial cues of a robot would be affected by a surrounding emotional context. This paper takes its inspiration from the contextual effects found on our interpretation of the expressions on human faces and computer avatars, and looks at the extent to which they also apply to the interpretation of the facial expressions of a mechanical robot head. The kinds of contexts that affect the recognition of robot emotional expressions, the circumstances under which such contextual effects occur, and the relationship between emotions and the surrounding situation, are observed and analyzed. Design implications for believable emotional robots are drawn.

© 2012 Elsevier B.V. All rights reserved.

1. Introduction

In natural settings, we receive important cues from both an expresser's facial expressions and the emotion-eliciting situations in which the expression occurs. Emotions are intimately related to a situation being experienced or imagined by a human. The interpretation of human affect is context sensitive: for instance, we interpret a smile differently if we know that the person smiling has just heard that they have won the lottery than we would if we knew that the same person had just been accused of bullying. Without the context, a human may misunderstand the expresser's emotional expressions. In the future, it is quite likely that we will encounter socially interactive robots that have to function in our environment in unpredictable emotional settings. Will we be able to recognize their emotional expressions, and how appropriate will they seem?

The extent to which the surrounding emotional context might affect the interpretation and recognition of the facial expressions of robots is an issue that has as yet received little attention. This paper presents a further report of studies using a mechanical-like robot head that explored the influence of a surrounding context on observers' attributions of facial expressions to the robot [1].

Previous research on the recognition of robot emotions has tended to focus more on efforts to create universally recognizable expressions in a neutral context. Breazeal [2] reported research using the Kismet robot that demonstrated that even though that robot has a mechanical appearance and lacks many of the facial features of humans (most notably, skin, teeth, and nose), its facial expressions were still readable by people with minimal to no prior familiarity with the robot. Kismet's emotional expressions were based on Russell's circumplex model of affect containing two dimensions, valence and arousal [3], and on Smith and Scott's [4] theory of mapping Action Units (AU) of the Facial Action Coding System (FACS) [5] to the expressions corresponding to basic emotions. The resulting expressions were quite well recognized.

The Probo robot is more creature-like than Kismet, and has more facial features [6]. Like Kismet, Probo's facial expressions were created with the help of Action Units (AU) and an emotional space based on Russell's circumplex model of affect [7]. The Probo project [8] also created convincing and believable facial expressions for the robot, obtaining an overall recognition rate for its facial expressions which was better than that of Kismet.

Androids are more human-like in terms of their appearance and behavior, and Ishiguro's creations have pushed the human-likeness of these robots to a very high level, as in the android robot "Geminoid F". According to Becker-Asano and Ishiguro [9], at first sight and from a distance it is difficult to tell Geminoid F and its human model person apart, even when both of them are moving slightly. Surprisingly, the android failed to achieve an overall recognition rate of the facial expressions corresponding to basic emotions as high as that of the Kismet and Probo projects. For instance, the overall average recognition rate of the six facial expressions (happy, sad, fear, disgust, surprise, and anger) for Probo was 84% (including a disgust, but no neutral expression), and for Kismet was 73% (including disgust, but no neutral expression), whereas for Geminoid F the overall recognition rate was only 61% (no disgust, but a neutral expression included). The project did find that similar facial displays portrayed by the model person for Geminoid F achieved higher recognition rates overall than those of Geminoid F itself.

It seems that people can recognize the facial expressions of emotional robots to an extent. However, when it comes to recognizing human emotions, context plays a significant role, as a growing body of empirical research has begun to demonstrate. For example, in a review of experimental investigations of the factors that determine the recognition of human emotional expressions, Niedenthal et al. [10] concluded that there are many factors coloring the recognition of emotions, such as the emotion-eliciting situation in which the expression occurs, the observer's own emotional state, and the types of facial expression that the observer has encountered previously, which may result in contrast effects. Although both the human face and the surrounding context made important contributions to emotion judgments, the way in which facial and context cues were combined (congruent or incongruent with each other) affected observers' judgments of facial expressions. When the surrounding emotional context did not match the facial expression, observers often reinterpreted either the facial expression (i.e., the person's face does not reveal the person's real feelings) or the meaning of the context (i.e., the situation does not have the usual meaning for the person). Secondly, observers of human facial expressions were influenced by their own emotional state in their attributions of emotional states to the people they observed.

The facial dominance account of human emotions proposed by Izard [11] and Ekman [12] assumed that a clear prototypical facial expression of a basic emotion would override any expectations derived from the surrounding situation. However, an increasing amount of evidence supports an account of limited situational dominance, whereby facial expressions are seen as only one of the elements contributing to the emotion attributed to a person, and the surrounding situation is found to have a strong effect. For example, Carroll and Russell [13] reported a study in which they found that even when facial expressions were of a basic emotion, the context in which they occurred determined the specific emotion believed to be expressed by the expresser. In their study, the surrounding situation affected the interpretation of the facial expression, but sometimes such effects are bidirectional. For instance, when De Gelder and Vroomen [14] explored the combination of information from human facial expressions and voice tone, they found bidirectional contextual effects. They presented a series of pictures of faces ranging from happy to sad, accompanied by a sad voice, a happy voice or no voice. When the emotions matched, people's reaction time in judgments was faster than when they were mismatched. The vocally expressed emotions affected judgments of the facial emotions, and vice versa.

Context has also been found to affect our interpretation of the emotional expressions of computer avatars [15–18]. These studies based the avatar's facial expressions on the FACS, with the exception of Creed and Beale, who based their facial expressions on other work by Ekman [19]. Creed and Beale [15] investigated how mismatched facial and audio expressions were perceived, combining an animated avatar face with a female human voice. They found that subjects attempted to make mismatched expressions consistent on both the visual and audio dimensions of animations, with the result that their perception of the emotional expressions became confused. Noël et al. [16], on the other hand, found that recognition of the emotional expressions of an avatar was not improved by the presentation of congruent, rather than incongruent, emotional text. However, the texts they used were very short, and their emotional expressions were preselected on the basis of being extremely recognizable.

Mower et al. [17,18] used computer simulated avatars in an effort to determine how participants made emotional decisions when presented with either conflicting (e.g. angry avatar face, happy avatar voice) or congruent information (e.g. happy avatar face, happy avatar voice) in an animated display consisting of two channels: the facial expression and the vocal expression. They found that: (1) when presented with a congruent combination of audio and visual information, users were better able to differentiate between happy and angry emotional expressions than when they were presented with either of the two channels individually; (2) when faced with a conflicting emotional presentation, users predominantly attended to the vocal channel rather than the visual channel. In other words, they were more likely to see the avatar's expressed emotions as being expressed by means of its voice than reflected in its facial expressions. Their findings indicate that the observer's recognition of facial expressions can be influenced by the surrounding context, and also that emotion conveyed by other modalities, in this case the voice, can override that expressed by the face.

Empirical research on human and avatar emotions shows that people's attribution of specific emotions to facial expressions is susceptible to situational information. This research goes against the idea of universally recognizable human facial expressions. Will situational information similarly influence the recognition of synthetic robot facial expressions, even when they are based on the FACS system? Niedenthal et al. [10] concluded that contextual effects on the recognition of human emotional expressions are stronger when the expressions are ambiguous or neutral and less intense. Since synthetic robot facial expressions are more ambiguous than human facial expressions, it seems likely that the recognition of the simulated emotions shown by a robot will be similarly influenced by a surrounding affective context.

A robot's facial expressions can be viewed as a modality containing emotional information, as can a surrounding context, such as recorded BBC News or selected affective pictures. The two modalities may reflect congruent or incongruent emotions. In the following experiments, the effects of congruent and incongruent combinations of context and robot expressions were investigated.

2. Method

2.1. Interaction design

Two between-subjects experiments were conducted in a quiet and brightly lit room. The experiments were based on a robot head known as CIM [20] in the NRG lab at the University of Sheffield. The electronic interface for the CIM robot was built around the Sun SPOT (Sun Small Programmable Object Technology) system, incorporating an I2C bus to connect the individual components to the dedicated microprocessor. This microprocessor was a dedicated Java processor capable of running Java programs. In the two experiments, the FACS (Facial Action Coding System) [21] was applied to set up the parameters of the servos (see Table 1) so that the robot head could produce sequences of the six static facial expressions: joy (e.g., AUs 6+12), fear (e.g., AUs 1+2+4+10+12), surprise (e.g., AUs 1+2+10+12+58+63), anger (e.g., AUs 2+23+42+44), disgust (e.g., AUs 4+9+15+16), and sadness (e.g., AUs 1+4+6+15+64).
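To make the FACS-based set-up concrete, the following Python sketch illustrates one way an AU-to-servo mapping of this kind could be organized. It is only illustrative: the AU combinations come from the paper, but the servo channel names, target positions, and send_command() interface are hypothetical, and the actual CIM head was driven by Java code running on the Sun SPOT over an I2C bus.

# Illustrative sketch only: the AU sets follow the paper, but the servo
# channel names, positions, and send_command() interface are hypothetical.
EXPRESSION_AUS = {
    "joy":      [6, 12],
    "fear":     [1, 2, 4, 10, 12],
    "surprise": [1, 2, 10, 12, 58, 63],
    "anger":    [2, 23, 42, 44],
    "disgust":  [4, 9, 15, 16],
    "sadness":  [1, 4, 6, 15, 64],
}

# Hypothetical mapping from an Action Unit to servo set-points (channel, position).
AU_TO_SERVOS = {
    1:  [("brow_left", 80), ("brow_right", 80)],    # inner brow raiser
    2:  [("brow_left", 95), ("brow_right", 95)],    # outer brow raiser
    4:  [("brow_left", 30), ("brow_right", 30)],    # brow lowerer
    6:  [("cheek_left", 70), ("cheek_right", 70)],  # cheek raiser
    12: [("lip_left", 85), ("lip_right", 85)],      # lip corner puller
    # ...remaining AUs would be filled in analogously
}

def show_expression(name, send_command):
    """Drive the servos for one expression; send_command(channel, position) is assumed."""
    for au in EXPRESSION_AUS[name]:
        for channel, position in AU_TO_SERVOS.get(au, []):
            send_command(channel, position)

# Example: print the commands instead of sending them to hardware.
show_expression("joy", lambda channel, position: print(channel, "->", position))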

Two three-minute sequences of simulated facial expressions were developed. They consisted of two types: Positive Affect, which mainly consisted of three different versions of joyful and surprised expressions (motions such as looking around and nodding were added in the gaps between these expressions), and Negative Affect, which mainly consisted of three different versions of sad, angry, and disgusted expressions (motions such as shaking were added in the gaps).

Examples of the simulated facial expressions shown by CIM are presented in Fig. 1.

In the first experiment, the source of the recorded News that the subjects listened to was BBC World News (see Table 2).


Table 1. Servos set up for CIM.
Facial expression: action set up (servo speed)
Joy: smiling lips, raised cheeks (relatively fast)
Sad: crying brows, crying lips, raised cheeks, eyes down (slowest)
Anger: angry brows, eyes narrowed, tightened lips (fast)
Fear: raised eyebrows, mouth opened (fast)
Disgust: frowning eyebrows, nose wrinkled, raised upper lip (slow)
Surprise: raised eyebrows, eyes widened, eyes up, neck backward (fastest)

Table 2. Content of the recorded BBC World News.
1. Positive/Neutral: The world's oldest spider web was discovered in Britain (01/11/2009). Negative: Collective international action was called for to prevent the tiger population from dying out (28/10/2009).
2. Positive/Neutral: NASA said a 'significant amount' of frozen water has been found on the moon (14/11/2009). Negative: Scientists found that snow and ice on Africa's highest mountain, Kilimanjaro, was melting rapidly and could vanish within 20 years (03/11/2009).
3. Positive/Neutral: Scientists identified more than 17,000 species in the deepest parts of the oceans (23/11/2009). Negative: British scientists warned that the medical use in humans of nanotechnology may damage the DNA or genetic building blocks of cells (06/11/2009).
4. Positive/Neutral: Scientists used the Large Hadron Collider (LHC) at CERN to produce the first particle collisions (24/11/2009). Negative: The United Nations children's agency said nearly 200 million children under the age of five living in the developing world were stunted as a result of malnutrition (12/11/2009).
5. Positive/Neutral: The draw for the 2010 Football World Cup in South Africa took place in Cape Town (05/12/2009). Negative: The United Nations Food Agency appealed for more money to feed one billion hungry people (19/11/2009).

Fig. 1. Joy (top left), surprise (top middle), fear (top right), sadness (bottom left), anger (bottom middle), and disgust (bottom right) of CIM.

Two types of News were used in this study: one was Positive/Neutral News, and the other was Negative News.

The affective pictures used in the second experiment were selected from the International Affective Picture System (IAPS) [22]. The selected affective pictures were either all pleasant (No. 1463, 1610, 1710, 1920, 2040, 2057, 2070, 2080, 2150, 2160, 2311, 2340, 2360, 2530, 2550, 2660, 4220, 4250, 5480, 5760, 5910, 7330, 7350, 7400, 7470, 7580, 8190, 8200, 8210, 8370, 8470, 8540) or all unpleasant (No. 2205, 2590, 2661, 2710, 2750, 2800, 2900, 3180, 3220, 3230, 3300, 3350, 3530, 6260, 6350, 6560, 6570, 9040, 9050, 9110, 9120, 9181, 9220, 9330, 9340, 9421, 9433, 9560, 9600, 9830, 9910, 9920). The content of the pictures ranged from human faces (18 positive and 15 negative) and animal faces (4 positive examples) to accident scenarios (5 negative examples). Each type consisted of 32 slides of affective pictures, each of which was presented for 6 s.

2.2. Hypotheses

As discussed, context effects can be found when people are making judgments about human emotional facial expressions, and different judgments are made depending on whether the expressions and context are congruent or incongruent [10]. A similar effect was also found when judgments were made about the expressions of an avatar: users were more able to recognize the happy and angry facial expressions of the avatar when confronted with a congruent audio–visual presentation than when they encountered a conflicting audio–visual presentation [17,18]. In order to explore whether this is also the case when they make judgments about robot emotional facial expressions, we formulated one primary hypothesis as follows:

Hypothesis 1 (H1): When there is a surrounding emotional context, people will be better at recognizing the emotional expressions portrayed by the robot head when that context is congruent with the emotional valence of its facial expressions than when the context is incongruent with the emotional valence of its facial expressions.

It was previously found [17,18] that users were influenced more by the vocal channel than by an avatar's facial expression when making judgments about the avatar's emotional state. Limited situational dominance was found in studies of human and avatar faces [13,16]. Contextual influence was found to increase when human facial expressions were ambiguous [10], and, as suggested above, synthetic robot expressions are likely to be more ambiguous than human facial expressions.


Table 3. Percentage of six possible labels (items in the first row) chosen to match the displayed facial expressions (items in the first column).
% match Joy Sadness Anger Fear Disgust Surprise % correct
Joy 64 0 0 3 5 28 64
Sadness 0 88 1 3 8 0 88
Anger 0 4 57 14 24 1 57
Fear 11 0 0 0 13 76 0
Disgust 0 4 45 11 40 0 40
Surprise 14 0 0 18 3 65 65

Table 4. Percentage of three possible labels (items in the first row) chosen to match the displayed sequence of facial expressions (positive or negative).
% match Positive Affect Neutral Affect Negative Affect % correct
Positive Affect Condition, Group 1 (congruent news): 94.4 5.6 0 94.4
Positive Affect Condition, Group 3 (conflicting news): 44.4 27.8 27.8 44.4
Negative Affect Condition, Group 2 (conflicting news): 44.4 22.3 33.3 33.3
Negative Affect Condition, Group 4 (congruent news): 0 5.6 94.4 94.4

The surrounding context is therefore likely to influence the interpretation of the synthetic robot facial expressions, and even to dominate that interpretation. If H1 is validated, then this reasoning leads to the following hypothesis:

Hypothesis 2 (H2): When subjects are presented with conflicting information from the robot's face and an accompanying emotional context at the same time, their attribution of robot emotions to its facial expressions will be more influenced by the surrounding context than by the robot face itself.

2.2.1. Experiment 1: procedure

In this experiment, each participant was seated on a high back office chair, in front of the left-hand side of a long rectangular desk (the experimenter was seated in front of the right-hand side of the desk). Each participant adjusted the height of the chair and the distance between the chair and the desk such that he/she could view the robot head on the left-hand side of the desk and the computer screen (to the right of the robot head). During the experiment each participant listened to the recorded BBC News through stereo headphones.

Warmup: The emotional robot head CIM displayed six different static facial expressions to all subjects. Then subjects were asked to fill in a questionnaire about how they perceived the robot's simulated emotions.

Experiment: The subjects listened to a sequence of recorded BBC News items (either all positive/neutral, or all negative) while being shown a 3 min sequence of the robot head's facial expressions (positive or negative).

Responses: After listening to the recorded BBC News, subjects were asked to answer the following questions:

1. As a total impression, please select what kind of News you think you were listening to from the following given choices. ____________________
A: Positive News  B: Neutral News  C: Negative News.

2. As a total impression, please select what kind of emotion (affect) you think the robot was feeling from the following given choices. ____________________
A: Positive Affect  B: Neutral Affect  C: Negative Affect.

3. As a detailed impression, please select what emotions you think the robot was feeling from the following given choices. ____________________ (you can choose more than one option)
A: Joy  B: Sadness  C: Anger  D: Fear  E: Disgust  F: Surprise.

Then, all subjects were asked to fill in one further questionnaire: the Brief Mood Introspection Scale (BMIS) [23].

Subjects were divided into four groups according to different combinations of BBC News (positive/neutral vs. negative) and robot expressions (positive vs. negative) (Group 1: Positive/Neutral BBC News with Robot's Positive Affect; Group 2: Positive/Neutral BBC News with Robot's Negative Affect; Group 3: Negative BBC News with Robot's Positive Affect; Group 4: Negative BBC News with Robot's Negative Affect). Two of the groups thus received congruent emotional information from the news and the robot head (Group 1 and Group 4), whilst two received incongruent, or conflicting, information (Group 2 and Group 3).

The 72 subjects (38 male and 34 female, average age 26.82) who participated in this experiment had various nationalities (41 British, 4 Chinese, 8 other Asian, and 13 other European participants). They were recruited through a university mailing list, and were not paid, but were offered the chance to win a book token worth £45. There were 18 subjects in each group, which meant that there were 36 subjects in the Congruent Condition (group 1 and group 4) and 36 subjects in the Incongruent Condition (group 2 and group 3).

2.2.2. Experiment 1: results

In the warming-up procedure, some of the robot's static expressions were more easily recognized than others (see Table 3). Most of the subjects (88%) recognized the robot's sad expression, and a majority were able to identify the 'joy' expression (64%) and the surprise expression (65%). Anger was recognized correctly by 57% of the subjects, but most (76%) confused the fearful expression with a surprised expression. The disgust and anger expressions of the robot were easily confused, and some of the subjects (less than one third) found it difficult to tell the difference between the joy and surprise expressions.

The responses to the questionnaire administered after viewing the robot were analyzed. As shown in Table 4, subjects' judgements about the robot emotions do seem to be affected by the accompanying News. A response was considered correct when the robot head was said to have shown Positive Affect in the Positive Affect Condition, or Negative Affect in the Negative Affect Condition (the neutral choice was counted as wrong in both conditions). When the robot's positive expressions were accompanied by "positive/neutral" News, they were correctly recognized as such 94.4% of the time, as compared to only 44.4% of the time when accompanied by "negative" News. Conversely, the robot's negative expressions were correctly recognized as such 94.4% of the time when paired with "negative" News, as opposed to 33.3% of the time when paired with "positive/neutral" News.


Table 5. Percentage of three possible labels (items in the first row) chosen to match the played sequence of BBC News items (positive/neutral or negative).
% match Positive News Neutral News Negative News % correct
Positive/Neutral BBC News Condition, Group 1 (congruent robot): 77.8 22.2 0 100
Positive/Neutral BBC News Condition, Group 2 (conflicting robot): 33.3 66.7 0 100
Negative BBC News Condition, Group 3 (conflicting robot): 5.6 11.1 83.3 83.3
Negative BBC News Condition, Group 4 (congruent robot): 0 5.6 94.4 94.4

Although subjects were often not good at identifying the robot's expressions, they were much better when the robot's expressions were accompanied by a congruent context. A Chi-square test for independence (with Yates Continuity Correction) indicated a significant association between information style (congruent or conflicting) and accuracy of attributing robot emotions to its facial expressions. In other words, the accuracy of subjects' perception of the robot's emotions was significantly different depending on whether the robot behaved congruently with the BBC News (a higher accuracy, 34/36 (94.4% correct, 5.6% incorrect)) or conflictingly with the BBC News (a much lower accuracy, 14/36 (38.9% correct, 61.1% incorrect)); χ2(1, n = 72) = 22.562, p < 0.0001, correlation coefficient phi = 0.589 (large effect). Consequently, H1, that when there is a surrounding emotional context, people will be better at recognizing the emotional expressions portrayed by the robot head when that context is congruent with the emotional valence of its facial expressions than when the context is incongruent with the emotional valence of its facial expressions, was supported.

It seems that subjects' identification of the emotion conveyed by the News was good, and did not seem to be affected by whether or not the robot face displayed matching emotions (see Table 5). A response was considered to be correct when the News was said to have been Positive or Neutral in the Positive/Neutral BBC News Condition, or Negative in the Negative BBC News Condition (neutral News was not counted in this condition).

The hypothesis (H2), that when subjects are presented with conflicting information from the robot's face and an accompanying emotional context at the same time, their attribution of robot emotions to its facial expressions will be more influenced by the surrounding context than by the robot face itself, was also tested. A Chi-square test for independence (with Yates Continuity Correction) indicated that the BBC News modality had a stronger perceptual effect than the Robot Expression modality when subjects were presented with conflicting information (33/36 accuracy for BBC News vs. 14/36 accuracy for Robot Emotions); χ2(1, n = 72) = 19.854, p < 0.0001, correlation coefficient phi = −0.554 (large effect). Consequently, H2 was supported.
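As a check on the arithmetic, both contingency tables can be reconstructed from the counts reported above. The following minimal sketch uses scipy rather than SPSS (which the authors used); its output matches the reported statistics.

from math import sqrt
from scipy.stats import chi2_contingency

# H1: accuracy of judging the robot's affect, congruent vs. conflicting news
# (34/36 correct vs. 14/36 correct).
h1_table = [[34, 2], [14, 22]]
# H2: accuracy under conflicting information, news judgments vs. robot judgments
# (33/36 correct vs. 14/36 correct).
h2_table = [[33, 3], [14, 22]]

for label, table in [("H1", h1_table), ("H2", h2_table)]:
    n = sum(sum(row) for row in table)
    chi2_yates, p, dof, _ = chi2_contingency(table, correction=True)
    chi2_raw, _, _, _ = chi2_contingency(table, correction=False)
    phi = sqrt(chi2_raw / n)  # phi is computed from the uncorrected statistic
    print(label, round(chi2_yates, 3), p, round(phi, 3))

# Approximate output, matching the values reported in the text (the paper gives
# the H2 phi a negative sign, which depends only on how the categories are coded):
# H1 22.562 ~2e-06 0.589
# H2 19.854 ~8e-06 0.554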

In addition, a two-way ANOVA (unrelated) in SPSS was conducted to explore the impact of the recorded BBC News and the synthetic robot emotions on subjects' moods, as measured by the Brief Mood Introspection Scale (BMIS) [23]. The interaction effect between the recorded BBC News and the synthetic robot emotions on the subjects' moods was not statistically significant, F(1, 68) = 0.032, p = 0.858; in addition, neither the main effect for recorded BBC News (F(1, 68) = 0.013, p = 0.910) nor the main effect for synthetic robot emotions (F(1, 68) = 0.021, p = 0.884) reached statistical significance. There was no significant difference in scores between participants who listened to the positive/neutral News (M = 47.5278, SD = 6.45196) and participants who listened to the negative News (M = 47.7222, SD = 7.76664). There was also no significant difference in scores between participants who watched the positive sequence of robot facial expressions (M = 47.7500, SD = 5.47918) and participants who watched the negative sequence of robot facial expressions (M = 47.5000, SD = 8.48023). Hence, neither the recorded BBC News nor the synthetic robot emotions appeared to affect subjects' moods.

2.2.3. Experiment 2: procedure

This experiment took place in a smaller room than that used for Experiment 1, but with the same robot head (on the left-hand side of the desk) and the same computer screen (on the right-hand side of the desk, next to the robot head). The experimenter was seated in front of another desk. The participants could watch the affective pictures on the computer screen and, at the same time, see the robot head.

This experiment followed the same procedures as used in the first experiment, except that a series of pictures was shown in place of the recorded news:

Warmup: Six different facial expressions of the emotional robot CIM and six static pictures with neutral valence (e.g., a picture of a blue coffee cup), selected from the IAPS and shown on a computer screen, were presented simultaneously to all subjects. Then subjects were asked to fill in a questionnaire about how they perceived the robot's static facial expressions. This time, before the warming-up procedure, subjects were also asked to read the instructions for the Affect Grid used to measure current levels of pleasure and arousal [24].

Experiment: Subjects watched 32 slides of affective pictures while being simultaneously shown a 3 min sequence of facial expressions (either positive or negative) of the emotional robot.

Responses: After the last picture was shown, subjects were asked to use the Affect Grid to indicate their current state, before they answered the following questions:

1. As a total impression, please select what kind of affective pictures you think you were viewing from the following given choices. ____________________
A: Pleasant Pictures  B: Neutral Pictures  C: Unpleasant Pictures.

2. As a total impression, please select what kind of emotion (affect) you think the robot was feeling from the following given choices. ____________________
A: Positive Affect  B: Neutral Affect  C: Negative Affect.

3. As a detailed impression, please select what emotions you think the robot was feeling from the following given choices. ____________________ (you can choose more than one option)
A: Joy  B: Sadness  C: Anger  D: Caring  E: Disgust  F: Surprise.

Subjects were divided into four groups according to different combinations of affective pictures (positive vs. negative) and robot expressions (positive vs. negative) (Group 1: Positive Affective Pictures with Robot's Positive Affect; Group 2: Negative Affective Pictures with Robot's Positive Affect; Group 3: Positive Affective Pictures with Robot's Negative Affect; Group 4: Negative Affective Pictures with Robot's Negative Affect). Two of the groups thus received congruent emotional information from the affective pictures and the robot head (Group 1 and Group 4), whilst two received incongruent, or conflicting, information (Group 2 and Group 3).

56 volunteers with an average age of 24.5 participated in this experiment. These 23 male and 33 female participants had various nationalities (21 British, 15 Chinese, 9 other Asian, and 7 other European participants).


Table 6. Percentage of three possible labels (items in the first row) chosen to match the displayed sequence of facial expressions (positive or negative).
% match Positive Affect Neutral Affect Negative Affect % correct
Positive Affect Condition, Group 1 (congruent pictures): 78.6 7.1 14.3 78.6
Positive Affect Condition, Group 2 (conflicting pictures): 14.3 7.1 78.6 14.3
Negative Affect Condition, Group 3 (conflicting pictures): 21.4 21.5 57.1 57.1
Negative Affect Condition, Group 4 (congruent pictures): 7.1 0 92.9 92.9

Table 7. Percentage of three possible labels (items in the first row) chosen to match the displayed set of affective pictures (positive or negative).
% match Positive Pictures Neutral Pictures Negative Pictures % correct
Positive Pictures Condition, Group 1 (congruent robot): 100 0 0 100
Positive Pictures Condition, Group 3 (conflicting robot): 96 4 0 96
Negative Pictures Condition, Group 2 (conflicting robot): 0 0 100 100
Negative Pictures Condition, Group 4 (congruent robot): 0 4 96 96

The volunteers were recruited through the university mailing list, and were offered the chance to win a book token worth £50. Care was taken to ensure that no subjects in Experiment 2 had participated in Experiment 1 (this was mentioned in the recruiting email, and their names were checked). There were 14 subjects in each group, resulting in 28 subjects in the Congruent Condition (group 1 and group 4) and 28 subjects in the Conflicting Condition (group 2 and group 3).

2.2.4. Experiment 2: results

The responses to the questionnaire administered after viewing the robot were analyzed. Table 6 shows subjects' perception of the facial expressions of the robot as they were displayed together with the affective pictures. A response was considered to be correct when the robot head was said to have shown Positive Affect in the Positive Affect Condition, or Negative Affect in the Negative Affect Condition (the neutral choice was counted as wrong in both conditions). It seemed that subjects' judgements about the robot were affected by the accompanying affective pictures. When the robot's positive expressions were accompanied by "positive" pictures, they were correctly recognized as such 78.6% of the time, as compared to only 14.3% of the time when accompanied by "negative" pictures. Conversely, the robot's negative expressions were correctly recognized as such 92.9% of the time when paired with "negative" pictures, as opposed to 57.1% of the time when paired with "positive" pictures.

A Chi-square test for independence (with Yates Continuity Correction) indicated a significant association between information style (Conflicting Information or Congruent Information) and accuracy of attributing robot emotions to its facial expressions. In other words, subjects were less able to recognize the robot's emotional expressions when the expressions conflicted with the valence of the affective pictures (a lower accuracy, 10/28 (35.7% correct)) than when the robot behaved congruently with the affective pictures (a much higher accuracy, 24/28 (85.7% correct)); χ2(1, n = 56) = 12.652, p < 0.0001, correlation coefficient phi = 0.512 (large effect). Consequently, H1, that when there is a surrounding emotional context, people will be better at recognizing the emotional expressions portrayed by the robot head when that context is congruent with the emotional valence of its facial expressions than when the context is incongruent with the emotional valence of its facial expressions, was supported.

As shown in Table 7, subjects were mostly good at identifying the emotional content of the pictures. A response was considered to be correct when the pictures were said to have been Positive in the Positive Pictures Condition, or Negative in the Negative Pictures Condition (a neutral response was counted as incorrect in either condition). It seemed that subjects' identification of the emotion conveyed by the affective pictures was not affected by whether or not the robot face displayed matching emotions.

The hypothesis (H2), that when subjects are presented with conflicting information from the robot's face and an accompanying emotional context at the same time, their attribution of robot emotions to its facial expressions will be more influenced by the surrounding context than by the robot face itself, was also tested. Two Chi-square tests for independence (with Yates Continuity Correction) indicated that the Affective Pictures modality had a stronger perceptual effect than the Robot Emotions modality in Group 2 (14/14 accuracy for Affective Pictures vs. 2/14 accuracy for Robot Emotions) but not in Group 3 (13/14 accuracy for Affective Pictures vs. 8/14 accuracy for Robot Emotions) when subjects were presented with conflicting information. In Group 2, χ2(1, n = 28) = 17.646, p < 0.0001, correlation coefficient phi = 0.866 (large effect); in Group 3, χ2(1, n = 28) = 3.048, p = 0.081 (greater than 0.05), correlation coefficient phi = 0.412 (medium effect). Consequently, H2 was only weakly supported.

In addition, an independent-samples t-test in SPSS was conducted to compare the unpleasant–pleasant scores on the Affect Grid for participants who watched the unpleasant pictures and participants who watched the pleasant pictures. There was a significant difference in scores between participants who watched the unpleasant pictures (M = −1.5357, SD = 1.31887) and participants who watched the pleasant pictures (M = 1.7143, SD = 1.21281), with t(54) = 9.598, p < 0.0001 (two-tailed). The magnitude of the difference in means (mean difference = 3.25000, 95% CI: 2.57114–3.92886) was very large (eta squared = 0.63; 0.01 = small effect, 0.06 = moderate effect, 0.14 = large effect). There was evidence that the pleasant pictures made people feel happier than the unpleasant pictures.
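For completeness, the reported t value follows directly from the published means, standard deviations, and group sizes. A minimal check, again using scipy rather than SPSS:

from scipy.stats import ttest_ind_from_stats

# Pleasant pictures: M = 1.7143, SD = 1.21281, n = 28
# Unpleasant pictures: M = -1.5357, SD = 1.31887, n = 28
t, p = ttest_ind_from_stats(1.7143, 1.21281, 28,
                            -1.5357, 1.31887, 28,
                            equal_var=True)
print(round(t, 3), p)  # t = 9.598 with 54 degrees of freedom, p < 0.0001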

3. Conclusion and discussion

This paper has examined how the surrounding emotional context (congruent or incongruent) influenced users' perception of a robot's simulated emotional expressions. Hypothesis 1, that when there is a surrounding emotional context, people will be better at recognizing robot emotions when that context is congruent with the emotional valence of the robot emotions than when the context is incongruent with the emotional valence of the robot emotions, was supported in both of the experiments reported here. This suggests that the recognition of robot emotions can be strongly affected by a surrounding context, and that such emotions are more likely to be recognized as intended when they are congruent with that context. The second hypothesis, that when a robot's expressions are not appropriate given the context, subjects' judgments of those expressions will be more affected by the context than by the expressions themselves, was clearly validated in the first experiment and weakly supported in the second. In summary, when confronted with a congruent surrounding context (the recorded BBC News or the selected affective pictures), people were more able to recognize the robot emotions as intended than when faced with an incongruent surrounding context. In addition, it was found that the recorded BBC News had a more dominant effect on judgements about the robot's expressions than the content of the expressions themselves. The affective pictures showed a weaker, but still dominant, effect.

There are two possible explanations of the effect of the accompanying emotional context on judgements about the robot's emotional expressions. One is that the context affects the emotional state of the observer, which in turn affects their perception of the robot. Niedenthal et al. [10] found that observers of human facial expressions were influenced by their own emotional state in their attributions of emotional states to the people they observed. The other is that the observer interprets the robot's expressions as though the robot was also witnessing the context and responding to it. In this explanation, the observers' judgements are not determined by their own emotional state.

Which explanation is better supported by the experiments reported here? A significant association between subjects' moods and their perception of the synthetic robot emotions was only found in the second experiment. We suggest therefore that the effect of the surrounding context observed in the first experiment was not due to the observers' own emotional state coloring their perception of the robot, but resulted instead from witnessing the context and the robot at the same time, and interpreting the robot's expressions as if the robot was responding to the surrounding context. When the context was negative, the robot's expressions were seen to be negative, even when on objective criteria (the underlying FACS coding) they should have been seen as positive. The reverse was also true (the robot's expressions were seen as positive when the surrounding context was positive, even when there were objective reasons to expect the expression to be seen as a negative one). The same effect was found in the second experiment, although in this case the affective pictures were found to have influenced subjects' emotional state. In this experiment, judgments about the robot's expressions may have been influenced by a combination of the observers' own emotional state and their interpretation of the robot's expressions. It is also possible that the results of Experiment 2 are the result of the influence of the pictures on the subjects' moods alone, and not a consequence of the subjects interpreting the robot's expressions as a reaction to the pictures. The positioning of the robot head was such that it would not have appeared to be looking at the pictures (although subjects were told to watch the robot face and the computer screen at the same time). It is thus possible that different explanations underlie the context effects in the two experiments.

More research is needed to explore the relative effects of different kinds of surrounding context, and to distinguish between possible explanations of the contextual effects. There is some evidence that similar results can be obtained even if different accompanying contexts are used from those reported here. Further evidence of the effect of an emotional musical context was provided in another study by Zhang and Sharkey [25]. They used the same experimental set-up as in Experiment 2 (same room, same computer, same robot head, and same relative position between the computer and the robot head) to explore the effects of a musical context. Again, care was taken to ensure that no subjects in this study had participated in Experiment 1 or 2. They found that emotionally valenced classical music also affected judgments of the robot's emotional facial expressions. The expressions were more likely to be recognized as intended when they occurred with music of a similar valence. Interestingly, it was found that the contextual influence was bidirectional: the music influenced judgments of the robot's emotional expressions, and the robot's expressions affected people's judgments of the emotional valence of the music, with the result that no dominant effect of the classical music on judgements about the robot's expressions was found.

By contrast, when a film context was used in a further experiment in [31], no context effect was found. In that experiment, emotional film clips were used as an accompanying context. Using similar procedures to those of Experiments 1 and 2, two film clips were used as context: an amusing film clip (a fake orgasm scene in a restaurant from the film "When Harry met Sally", 2:53 min) and a sad film clip (a death scene from the film "The Champ", 2:45 min). The reasons that the film clips did not influence the interpretation of the robot's expressions are discussed further in [31], but may well be due to the choice of film clip: some participants seem to have been unclear about whether or not the amusing film clip was an example of positive, as opposed to neutral or negative, content. Nonetheless, it seems that it cannot be assumed that every emotional context will affect the recognition and interpretation of a robot's facial expressions.

We can also consider whether the results obtained in these experiments would generalize to other robots. The robot head used in these studies has an admittedly limited ability to articulate expressions. It lacks skin and hair and has limited degrees of freedom: left and right eyes (yaw and tilt); left and right eyebrows (capable of raising the right and left sides); upper and lower lip (capable of bending up and down); neck (allowing the head to turn and tilt); nose (allowing movement backward and forward); and cheeks (capable of rotating up and down). It might be that contextual effects were found here precisely because the robot's expressions were somewhat ambiguous. It would be interesting to know whether other robots with more expressive faces, such as Kismet and Probo, are similarly susceptible to the effects of a surrounding context or not.

Given that even judgments about human facial expressions have also been shown to be influenced by surrounding contexts [10], we can speculate that such robots would be affected, at least to some degree. Research by Becker-Asano and Ishiguro [9] suggests that robot facial expressions are likely to be more ambiguous than human facial expressions. Consequently, it is expected that even the emotional judgments of the facial expressions of androids (e.g., [27–29]) or robots with hybrid faces (e.g., the BERT2 head, composed of a plastic face plate and an LCD display [30]) will be affected by the surrounding context to some degree.

We can also question the extent to which the present results are a consequence of the particular questions asked of the participants. In the experiments reported here, and in [25,31], subjects were asked to make a forced choice between describing the robot's sequence of expressions as positive, neutral, or negative. The simplicity of this choice could be partly responsible for the effects: the present results do not indicate whether or not similar results would be obtained if different questions had been asked. It would be interesting in future research to explore this, and to find out what results would be obtained if subjects were allowed a wider choice of terms.

Nonetheless, the overall set of results reported here, and by Zhang and Sharkey [25], coheres with the limited situational dominance account [13], even though that account was proposed in the context of the recognition and interpretation of human facial expressions. The limited situational dominance account views the facial expression itself as only one element in the interpretation of emotional expressions, the others being the surrounding situation and the current state of the observer. These studies extend our knowledge about the kinds of context that affect the recognition of robot emotional expressions. It seems that recorded speech with an emotional content, affective pictures, and emotionally valenced music can all affect such recognition. In the two experiments reported in this paper, the news and affective pictures both had a more dominant effect than the robot face: subjects were more affected by the emotional content of the BBC News, and by the content of the affective pictures, in their judgments of the robot's emotions than they were by the content of the expressions themselves. Other researchers also found a stronger effect of surrounding context on the recognition of avatar and human faces [13,16–18]. However, unlike the spoken and visual contexts investigated here, music appears to have a less dominant, and more bidirectional, influence. This is consistent with earlier findings obtained with avatars and humans [14,15,26].

In summary, previous research had shown that the recognition of human and avatar emotions can be affected by a surrounding emotional context, and by the extent to which it matches, or conflicts with, those emotions. The present paper provides evidence that the recognition of the emotional expressions of a moving robot head can also be affected by a surrounding context. In general, these findings imply ways of enhancing the expressive skills of emotional robots. It seems that achieving a good match between a robot's simulated emotions and the surrounding context is important. Hong et al. [26] used neural networks to map real-time surrounding emotional contexts to synthetic avatar emotions; similar approaches, in which the valence of the surrounding context is detected and the robot's expressions are adapted to match it, may well prove to be a good way of creating robots in the future that are seen as more convincing and believable.
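As a purely illustrative sketch of that design implication (not an implementation used by Hong et al. or in the present work), a robot controller could estimate the valence of the surrounding context and then select a congruent expression sequence. The classify_valence() thresholds and the expression lists below are hypothetical placeholders:

POSITIVE_SEQUENCE = ["joy", "surprise", "joy"]
NEGATIVE_SEQUENCE = ["sadness", "anger", "disgust"]

def classify_valence(score):
    """Map a context valence score in [-1, 1] to a coarse label (thresholds are arbitrary)."""
    if score > 0.2:
        return "positive"
    if score < -0.2:
        return "negative"
    return "neutral"

def choose_expressions(context_valence_score):
    """Return an expression sequence congruent with the detected context valence."""
    label = classify_valence(context_valence_score)
    if label == "positive":
        return POSITIVE_SEQUENCE
    if label == "negative":
        return NEGATIVE_SEQUENCE
    return ["neutral"]

print(choose_expressions(0.7))   # ['joy', 'surprise', 'joy']
print(choose_expressions(-0.5))  # ['sadness', 'anger', 'disgust']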

References

[1] J. Zhang, A.J.C. Sharkey, Contextual recognition of robot emotions, in: R. Groß, et al. (Eds.), TAROS 2011, in: LNAI, vol. 6856, 2011, pp. 78–89.
[2] C. Breazeal, Designing Sociable Robots, MIT Press, Cambridge, 2002.
[3] J. Russell, Reading emotions from and into faces: resurrecting a dimensional-contextual perspective, in: J. Russell, J. Fernandez-Dols (Eds.), The Psychology of Facial Expression, Cambridge University Press, Cambridge, UK, 1997, pp. 295–320.
[4] C. Smith, H. Scott, A componential approach to the meaning of facial expressions, in: J. Russell, J. Fernandez-Dols (Eds.), The Psychology of Facial Expression, Cambridge University Press, Cambridge, UK, 1997, pp. 229–254.
[5] P. Ekman, W. Friesen, Facial Action Coding System, Consulting Psychologists Press, 1978.
[6] J. Saldien, K. Goris, B. Vanderborght, B. Verrelst, R. Van Ham, D. Lefeber, ANTY: The Development of an Intelligent Huggable Robot for Hospitalized Children, CLAWAR, 2006.
[7] J. Posner, J. Russell, B. Peterson, The circumplex model of affect: an integrative approach to affective neuroscience, cognitive development, and psychopathology, Development and Psychopathology 17 (03) (2005) 715–734.
[8] J. Saldien, K. Goris, B. Vanderborght, J. Vanderfaeillie, D. Lefeber, Expressing emotions with the social robot Probo, International Journal of Social Robotics 2 (2010) 377–389.
[9] C. Becker-Asano, H. Ishiguro, Evaluating facial displays of emotion for the android robot Geminoid F, in: Proceedings IEEE SSCI Workshop on Affective Computational Intelligence, 2011, pp. 22–29.
[10] P.M. Niedenthal, S. Kruth-Gruber, F. Ric, What information determines the recognition of emotion? in: The Psychology of Emotion: Interpersonal, Experiential, and Cognitive Approaches, in: Principles of Social Psychology, Psychology Press, New York, 2006, pp. 136–144.
[11] C.E. Izard, The Face of Emotion, Appleton-Century-Crofts, New York, 1971.
[12] P. Ekman, Universals and cultural differences in facial expressions of emotion, in: J.K. Cole (Ed.), Nebraska Symposium on Motivation, vol. 19, University of Nebraska Press, Lincoln, 1972, pp. 207–283.
[13] J.M. Carroll, J.A. Russell, Do facial expressions signal specific emotions? Judging the face in context, Journal of Personality and Social Psychology 70 (1996) 205–218.
[14] B. de Gelder, J. Vroomen, The perception of emotions by ear and by eye, Cognition & Emotion 14 (2000) 289–311.
[15] C. Creed, R. Beale, Psychological responses to simulated displays of mismatched emotional expressions, Interacting with Computers 20 (2) (2007) 225–239.
[16] S. Noël, S. Dumoulin, G. Lindgaard, Interpreting human and avatar facial expressions, in: T. Gross, J. Gulliksen, P. Kotzé, L. Oestreicher, P. Palanque, R. Prates, M. Winckler (Eds.), Human–Computer Interaction – INTERACT 2009, Springer, Berlin, Heidelberg, 2009, pp. 98–110.
[17] E. Mower, S. Lee, M.J. Matarić, S. Narayanan, Human perception of synthetic character emotions in the presence of conflicting and congruent vocal and facial expressions, in: IEEE Int. Conf. Acoustics, Speech, and Signal Processing, ICASSP 2008, Las Vegas, NV, 2008, pp. 2201–2204.
[18] E. Mower, M.J. Matarić, S. Narayanan, Human perception of audio–visual synthetic character emotion expression in the presence of ambiguous and conflicting information, IEEE Transactions on Multimedia 11 (5) (2009).
[19] P. Ekman, Emotions Revealed: Recognizing Faces and Feelings to Improve Communication and Emotional Life, Henry Holt & Co., 2004.
[20] A. Bailey, Synthetic social interactions with a robot using the basic personality model, M.Sc. Thesis, University of Sheffield, Sheffield, England, 2006.
[21] P. Ekman, W. Friesen, J. Hager, Facial Action Coding System, Research Nexus, Salt Lake City, Utah, 2002.
[22] M.M. Bradley, P.J. Lang, The international affective picture system (IAPS) in the study of emotion and attention, in: J.A. Coan, J.B. Allen (Eds.), The Handbook of Emotion Elicitation and Assessment, 2007, pp. 29–46.
[23] J.D. Mayer, Y.N. Gaschke, The experience and meta-experience of mood, Journal of Personality and Social Psychology 55 (1988) 102–111.
[24] J.A. Russell, A. Weiss, G.A. Mendelsohn, Affect grid: a single-item scale of pleasure and arousal, Journal of Personality and Social Psychology 57 (3) (1989) 493–502.
[25] J. Zhang, A.J.C. Sharkey, Listening to sad music while seeing a happy robot face, in: B. Mutlu, et al. (Eds.), ICSR 2011, in: LNAI, vol. 7072, 2011, pp. 173–182.
[26] P. Hong, Z. Wen, T. Huang, Real-time speech driven expressive synthetic talking faces using neural networks, IEEE Transactions on Neural Networks (2002).
[27] P. Jaeckel, N. Campbell, C. Melhuish, Gaussian process regression for facial behavior mapping, in: Proceedings TAROS 2008, 2008.
[28] P. Jaeckel, N. Campbell, C. Melhuish, Facial behavior mapping – from video footage to a robot head, Robotics and Autonomous Systems (2008).
[29] C. Henrik, P. Jaeckel, N. Campbell, N. Lawrence, C. Melhuish, Shared Gaussian process latent variable models for handling ambiguous facial expressions, in: Proceedings Intelligent Systems and Automation: 2nd Mediterranean Conference on Intelligent Systems and Automation, CISA'09, vol. 1107, 5 March 2009, pp. 147–153.
[30] D. Bazo, R. Vaidyanathan, A. Lenz, C. Melhuish, Design and testing of a hybrid expressive face for the BERT2 humanoid robot, in: IEEE International Conference on Intelligent Robots and Systems, IROS, Taipei, Taiwan, October 2010, pp. 5317–5322.
[31] J. Zhang, Contextual recognition of robot emotions, Doctoral Thesis, University of Sheffield, March 2012 (submitted for publication).

Jiaming Zhang is currently in his last year as a Ph.D. student. He joined the Neurocomputing and Robotics Group in the Department of Computer Science, University of Sheffield, in October 2008. His Ph.D. topic is focused on the investigation of contextual effects on the recognition of the emotional expressions of robots. His fields of interest include Artificial Intelligence, Affective Computing, Robotics, and Human Robot Interaction (HRI). His interest primarily stems from the two degrees, a B.Sc. in Communication Engineering and an M.Sc. in Control Systems, that he completed at Chongqing University, China, and at the University of Sheffield, UK, respectively.

Amanda J.C. Sharkey has an interdisciplinary background. After taking a first degree in Psychology, she held a variety of research positions at the University of Exeter, the MRC Cognitive Development Unit, and Yale and then Stanford, USA. She completed her Ph.D. in Psycholinguistics in 1989 at the University of Essex. Since then she has conducted research in neural computing at the University of Exeter, before moving to the University of Sheffield, where she is now a senior lecturer in the Department of Computer Science and researches human–robot interaction and associated ethical issues, swarm robotics, and combining neural nets and other estimators. Amanda has over 70 publications. She is a founding member of the scientific committee for the international series of workshops on Multiple Classifier Systems and is Editor of the journal Connection Science.