
An ECA Expressing Appreciations

Sabrina Campano, Caroline Langlet, Nadine Glas, Chloé Clavel
Institut Mines-Télécom, Télécom-ParisTech, CNRS-LTCI

46 Rue Barrault, 75013 Paris, France
Telephone: +33(0)1 45 81 75 14

Email: {firstname.lastname}@telecom-paristech.fr

Catherine Pelachaud
CNRS-LTCI, Télécom-ParisTech

46 Rue Barrault, 75013 Paris, France
Telephone: +33(0)1 45 81 75 93

Email: [email protected]

Abstract—In this paper, we propose a computational model that provides an Embodied Conversational Agent (ECA) with the ability to generate verbal other-repetition (repetitions of some of the words uttered in the previous user speaker turn) when interacting with a user in a museum setting. We focus on the generation of other-repetitions expressing emotional stances in appreciation sentences. Emotional stances and their semantic features are selected according to the user's verbal input, and the ECA's utterance is generated according to these features. We present an evaluation of this model through users' subjective reports. Results indicate that the expression of emotional stances by the ECA has a positive effect on user engagement, and that the ECA's behaviours are rated as more believable by users when the ECA utters other-repetitions.

Keywords—other-repetition; engagement; alignment; emotional stance; embodied conversational agent

I. INTRODUCTION

Embodied Conversational Agents (ECAs) are computer-generated characters that are able to produce and respond to verbal and nonverbal communication. Fostering the user's engagement [1] during interactions with ECAs or robots is an important concern in human-agent interaction. A disengaged user may leave the interaction too early, preventing the agent from completing its task. A method that may contribute to engaging a user with an ECA is to simulate alignment processes. Various terms are used to designate alignment processes or similar processes, which differ in the way they integrate temporal and dynamic aspects. For example, mimicry is defined as the direct imitation of what the other participant produces [2], while synchrony is defined as the dynamic and reciprocal adaptation of the temporal structures of behaviors between interactive partners [3]. Implementations of alignment strategies in human-computer dialogue have mainly concerned alignment on lexical and syntactic choices [4], while the human-agent face-to-face interaction community furthers implementations of nonverbal alignment using terminologies that differ slightly from those used in corpus studies: mimicry [5], synchrony [3], social/emotional resonance [6], [7], emotional mirroring [8], dynamical coupling [9].

The work presented in this paper is conducted in the framework of the French national project A1:1¹, in which a human-size Embodied Conversational Agent (ECA) is being developed, dedicated to sustaining face-to-face interaction with museum visitors. The scientific focus of the project is to trigger and maintain the user's engagement during the interaction. The ECA has knowledge about the museum's

¹http://lifesizeavatar.com/

objects and artworks, and its role is to discuss them with visitors.

In this context, the present study focuses on the development of verbal alignment strategies acting at two levels: the appreciation level (sharing appreciations) and the lexical level, by using other-repetition (OR) to express appreciations. OR is the intentional repetition by the hearer of part of what the speaker has just said, in order to convey a communicative function that was not present in the first instance [10], [2]. To our knowledge, no computational model of OR has been proposed so far in human-agent interaction. To take a step in this direction, we focus on one of the communicative functions of OR: OR expressing emotional stances [10]². In this process, the hearer repeats a part of the speaker's last sentence in order to convey surprise, or a negative/positive evaluation about it. Emotional stances about museum topics can be expressed through the verbal form of appreciations, which is a basic activity for visitors in a museum [11]. This type of OR has been identified in our previous work grounded in corpus analysis, including the human-agent interaction corpus SEMAINE [12]. In this paper, we describe and evaluate a computational model of OR that allows an ECA to convey emotional stances through language, by using sentences expressing an appreciation in response to the user's previous utterance. The model is able to select a given emotional stance and to generate the corresponding appreciation sentence to be said by the ECA (Section II). The evaluation method of the model (a user study including questionnaires addressed to 33 participants) and the results are presented in Section III.

II. SELECTION AND GENERATION OF APPRECIATION SENTENCES

The Detection of User's Appreciations Module (Detect-Appr Module) transmits a set of semantic features corresponding to the user's last utterance as input to the Other-Repetition with Emotional Stances Module (OR-Emo Module). The inputs of the OR-Emo Module also include the Agent's Preferences. The OR-Emo Module proceeds in two steps, detailed in Figure 1. First (blue boxes of Figure 1), the module selects an emotional stance according to the inputs (semantic features of the user's sentence and the agent's preferences); second (red boxes of Figure 1), it generates a sentence corresponding to the selected emotional stance. This output sentence to be uttered by the agent contains an appreciation including an

²Other communicative functions are expressing understanding, formulating a request for clarification, and promoting topical talk.

978-1-4799-9953-8/15/$31.00 ©2015 IEEE 962

2015 International Conference on Affective Computing and Intelligent Interaction (ACII)


Fig. 1: Decision tree of the OR-Emo Module. The process starts once the user has produced a sentence.

other-repetition. The OR-Emo Module has been built both on theoretical studies ([10] for the selection of emotional stances with OR and [13] for the generation of appreciation sentences) and on the study of the SEMAINE corpus [12]. The module has been integrated into a human-agent system. The GRETA system [14] is used for the agent model. The dialogue model used in the GRETA platform is DISCO [15], a hierarchical task network.

A. Inputs: User Appreciations and Agent Preferences

A list of semantic features is associated with each user sentence expressing an appreciation. In this study, the detection of the user's appreciations is carried out manually by a human expert (acting as a Wizard of Oz) during the interaction between a human participant and the agent. We built a database of appreciation sentences and associated each sentence with semantic features. During the interaction, when the human expert detects an appreciation, he/she selects the appreciation sentence that best matches the one uttered by the participant. The human expert is trained for this task and provides a quick answer that does not damage the quality of the interaction. When the expert selects an appreciation sentence, its semantic features are transmitted to the OR-Emo Module³. Given a user usr, each appreciation App_usr is associated with a semantic feature set following Martin and White's model [13], as described in [16]. The feature set includes: the polarity of the appreciation, polApp_usr ∈ {positive, negative}; the source⁴ of the appreciation, srcApp_usr; the target of the appreciation, targetApp_usr ∈ T; the lemmatized form of the appreciation lexia, lexiaLemApp_usr, belonging to a list L of appreciation lexia; and the part of speech of the lexical unit expressing the appreciation, lexiaPosTagApp_usr ∈ {ADJ, VERB}. For example, the semantic features corresponding to the sentence "I don't like the baroque style" uttered by usr are: polApp_usr = negative; srcApp_usr = usr; targetApp_usr = baroque style; lexiaLemApp_usr = like; lexiaPosTagApp_usr = VERB.

³We are currently developing an automatic appreciation detection system [16] that is intended to replace the Wizard of Oz.

⁴In our interaction model, only the appreciations for which the user is the source have been considered, which is not always the case in user utterances: in the sentence "My wife loves Klimt", the source is the user's wife.

Given a set Topics of different conversation topics, the agent's preferences over Topics represent the agent's liking/disliking of each topic (e.g. a specific painting or artist). Let agt be the agent conversing with the user usr. ∀topic ∈ Topics, Pref_agt(topic) ∈ [−1, 0[ ∪ ]0, 1] is the preference value of topic from the point of view of agent agt (if Pref_agt(topic) < 0, agt dislikes the topic). In this work, pol(Pref_agt(topic)) refers to the polarity of the agent's preference about topic.
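The appreciation features and preference values above can be sketched in code as follows. This is a minimal illustration; the class, field, and variable names are ours, not taken from the authors' implementation.

```python
from dataclasses import dataclass

# Hypothetical encoding of the semantic features of a user appreciation App_usr.
# Field names mirror the features described in the text but are otherwise invented.
@dataclass
class Appreciation:
    polarity: str      # "positive" or "negative"
    source: str        # e.g. "usr"
    target: str        # e.g. "baroque style", an element of T
    lexia_lemma: str   # lemmatized appreciation word, e.g. "like"
    lexia_pos: str     # "ADJ" or "VERB"

# Agent preferences over topics: a value in [-1, 0) U (0, 1]; the sign gives
# the polarity pol(Pref_agt(topic)). Topic names and values are invented.
agent_prefs = {"baroque style": 0.8, "Picasso": -0.6}

def pref_polarity(value: float) -> str:
    """Polarity of a non-zero preference value."""
    return "positive" if value > 0 else "negative"

# The example from the text: "I don't like the baroque style".
app = Appreciation("negative", "usr", "baroque style", "like", "VERB")
```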

B. Selecting Emotional Stances and Appreciation Sentences with OR

The selection process is rule-based: we use a binary decision tree (Figure 1), which allows for a representation of conditional rule-based decision processes. This decision process is grounded in theoretical work from linguistic studies. Relying on the concepts of appreciation [13] and other-repetition [10], two levels of alignment between the ECA and the user are represented in our model. The first concerns alignment at the level of polarity (positive/negative) between a user's appreciation and an ECA's appreciation on the same target. In this case, the ECA shares its appreciation with the user. The second concerns alignment at the lexical level with other-repetition [10]. In this case, the ECA expresses an appreciation by intentionally using word(s) previously uttered by the user. Thus, an emotional stance is selected according to: (i) the presence or absence of an appreciation in the user's last sentence, and (ii) the polarity of the user's appreciation in that sentence, which can be the same as or divergent from the agent's preference for the target of the user's appreciation.

Two main decision parameters are set according to the inputs described in Section II-A. The first is the variable appreciation ∈ {True, False}. It equals True when the Detect-Appr module has detected at least one appreciation in the user's last speaking turn, and False otherwise. The second is the variable divergence(App_usr(target: name(topic)), Pref_agt(topic)) ∈ {True, False}. It equals True when the polarity of the user's appreciation for the target name(topic) differs from the polarity of the agent's preference for topic, and False otherwise. Formally, we denote App_usr an appreciation originating from a user usr, with targetApp_usr = name(topic). The decision process is carried out through the decision tree shown in Figure 1, which shows how an OR-Emo is selected. The ECA has the possibility to repeat an appreciation word uttered in the user's last sentence, denoted lexiaLemApp_usr, or a topic name uttered in the user's last sentence, denoted name(topic) ∈ T. The selection of the repeated word and the generation of the sentence are detailed in Section II-C. When multiple appreciations are detected in the user's speaking turn, the OR-Emo model takes into account the last appreciation formulated by the user, denoted App_usr.

When a user's appreciation App_usr is detected (left branch of the decision tree), the ECA has the ability to repeat the appreciation word lexiaLemApp_usr. This can lead to the expression of two OR-Emo types. If there is no divergence (divergence(App_usr(target: name(topic)), Pref_agt(topic)) = False), the ECA displays an Aligned OR-Emo. This means that the polarity of the ECA's appreciation will be the same as the polarity of the user's appreciation App_usr. When


there is a divergence (divergence(App_usr(target: name(topic)), Pref_agt(topic)) = True), the ECA expresses a Surprise OR-Emo. Surprise is modelled here as a non-valenced emotional stance. When an other-repetition formulated by the hearer expresses surprise, it is a request for elaboration that asks the speaker to tell more [10]. Using surprise is thus an interesting strategy to foster the user's engagement, because it allows us to (i) avoid disalignment, and (ii) make the user tell more about a given topic⁵.

When no user appreciation is detected (right branch of the decision tree), there are two possibilities. If the user uttered a topic name name(topic) ∈ T, the ECA still has the possibility to repeat this name(topic) within an appreciation (e.g. "I like name(topic)"). In this case, a Basic OR-Emo is selected. It represents a positive or negative emotional stance that is unaligned (as opposed to oppositely aligned). If no topic name was uttered, the default sentence is selected. It does not represent any emotional stance. This sentence should be consistent with the scenario and the topic currently discussed. It is pre-defined and is passed as a parameter to the OR-Emo Module during the interaction.
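The branch logic just described can be sketched as a small function. This is an illustrative re-implementation of the decision tree of Figure 1; the function name, dictionary representation, and string labels are our own.

```python
def pref_polarity(value):
    """Polarity of an agent preference value in [-1, 0) U (0, 1]."""
    return "positive" if value > 0 else "negative"

def select_or_emo(appreciation, topic_name, agent_prefs):
    """Select the OR-Emo type for the agent's next utterance.

    appreciation: dict with "target" and "polarity" keys, or None.
    topic_name:   topic name uttered by the user, or None.
    agent_prefs:  mapping from topic name to a preference value.
    """
    if appreciation is not None:
        # Left branch: a user appreciation was detected.
        pref = agent_prefs.get(appreciation["target"], 0.0)
        divergence = pref_polarity(pref) != appreciation["polarity"]
        # Divergent polarities trigger surprise; otherwise align with the user.
        return "Surprise OR-Emo" if divergence else "Aligned OR-Emo"
    if topic_name is not None:
        # Right branch: repeat the topic name within an appreciation.
        return "Basic OR-Emo"
    # Neither an appreciation nor a topic name: scripted default sentence.
    return "default sentence"
```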

C. Generation of the Agent’s OR-Emo Sentences

For each kind of OR-Emo in the decision tree, one specific pattern for the ECA's sentence is defined. First, the model provides rules to define the relevant semantic features of the agent's utterance: the sentence form, assertive or interrogative (sentenceForm: [interrogative|assertive]), and the feature set of AppBase. The Surprise OR-Emo pattern has an interrogative form starting with a NewsMark⁶ (e.g. "Really"); the source of the appreciation is the user (the agent repeats the user's appreciation), and the polarity of AppBase corresponds to the polarity of the user's appreciation polApp_usr (e.g. "Ah bon, vous n'aimez pas Picasso ?" in French, transl.: "Really, you don't like Picasso?"). The Basic OR-Emo and Aligned OR-Emo patterns have an assertive form; the source of the appreciation is the agent (it expresses its own appreciation), and the polarity of AppBase corresponds to the agent's preference pol(Pref_agt(topic)) (e.g. "Moi non plus, je n'aime pas Picasso" in French, transl.: "Me neither, I don't like Picasso"). In the Aligned OR-Emo, a syntactic variable AlignmentExpr is added at the beginning of the sentence. Two simple phrases express that the agent agrees with the user's appreciation: (i) AlignmentExpr(negation: false) → "Moi aussi" ("Me too" in English); (ii) AlignmentExpr(negation: true) → "Moi non plus" ("Me neither" in English).

Once the AppBase features have been defined, its syntactic form is produced using a predefined appreciation pattern. We defined two kinds of pattern: an adjectival one (AppBase(termCat: adj)), used when the appreciation word is an adjective (AppAdj), and a verbal one (AppBase(termCat: verb)), used when the appreciation word is a verb (AppVb). The adjectival pattern has as its main verbal form (OpinionVb) "trouver que" ("to think", "to consider" in English), followed by an adjective. An example in English would be "I consider this painting as beautiful". Two

⁵Other work models surprise as a general emotional reaction to belief-disconfirmation [17].

⁶"which implies information that is treated as both new and as surprising or interesting" [10].

forms are defined for this adjectival pattern: a negative one and an affirmative one. These two forms can be used either for an agent's appreciation (AppBase(src: agent)) or for a user's one (AppBase(src: user)). When AppBase(src: user), the pronoun has the second person plural form ("vous"), and when AppBase(src: agent), it has the first person singular form ("je").

In the verbal pattern, the appreciation on the target is conveyed by the verb. An example in English would be "I love this painting". The verbal pattern also has a negative and a positive form. Again, these two forms are used either for an agent's appreciation (AppBase(src: agent)) or for a user's one (AppBase(src: user)). As in the adjectival pattern, the verb is conjugated to the right form according to the pronoun and the nature of the source (user or agent).
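The pattern-based generation above can be illustrated with simple string templates. The sketch below uses English templates in place of the authors' French ones; the template wording, dictionary keys, and function names are invented for illustration.

```python
# Illustrative template realisation of the three OR-Emo sentence patterns.
TEMPLATES = {
    # Interrogative form, NewsMark first, user's own polarity repeated.
    "Surprise OR-Emo": "Really, you {verb} {target}?",
    # Assertive form, AlignmentExpr first, agent's polarity.
    "Aligned OR-Emo": "{align}, I {verb} {target}.",
    # Assertive form, agent's own appreciation of the repeated topic name.
    "Basic OR-Emo": "I {verb} {target}.",
}

def realise(or_emo, target, verb="like", negation=False):
    """Fill the pattern for the selected OR-Emo with the repeated word(s)."""
    v = f"don't {verb}" if negation else verb
    align = "Me neither" if negation else "Me too"  # AlignmentExpr
    return TEMPLATES[or_emo].format(verb=v, target=target, align=align)
```

For instance, `realise("Aligned OR-Emo", "Picasso", negation=True)` yields "Me neither, I don't like Picasso.", matching the translated example above.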

The ECA's sentences are accompanied by nonverbal behaviors, corresponding to performative acts such as argue or inform, or to emotional stances (negative, positive, or surprise). The agent's communicative intentions are described in the FML-APML standardised format [18].

III. EVALUATION AND RESULTS

A. Experimental Conditions

We rely on subjective measures of user engagement, assessed through a questionnaire. We also aim to evaluate the ECA's believability as perceived by the user. We used 3 conditions in our experiment: 1) When the dialogue system is set to the EMO-WITH-OR condition, the OR-Emo model is activated, and the ECA expresses emotional stances with ORs, following the decision rules of Figure 1. 2) When the dialogue system is set to the EMO-WITHOUT-OR condition, the ECA expresses emotional stances without ORs. To do so, a list of generic appreciation words with negative or positive polarity has been defined. The word in the generated sentence is randomly chosen from this list (excluding the user's appreciation word), according to the polarity that must be expressed. 3) When the dialogue system is set to the NO-EMO condition, the ECA does not express emotional stances. Instead, it utters a pre-defined sentence that has approximately the same length as a verbal emotional stance produced by the OR-Emo model. These pre-defined sentences are written in the scenario script, such that when this condition is applied, the default sentence is returned instead of the OR-Emo model output.
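The word choice in the EMO-WITHOUT-OR condition can be sketched as follows: a generic appreciation word of the required polarity is drawn at random, with the user's own appreciation word excluded so that no other-repetition occurs. The word lists below are invented for illustration.

```python
import random

# Hypothetical lists of generic appreciation words by polarity.
GENERIC_WORDS = {
    "positive": ["love", "enjoy", "appreciate"],
    "negative": ["dislike", "can't stand"],
}

def pick_appreciation_word(polarity, user_word, rng=random):
    """Random generic word of the given polarity, excluding the user's word."""
    candidates = [w for w in GENERIC_WORDS[polarity] if w != user_word]
    return rng.choice(candidates)
```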

We asked human participants to visit an improvised museum, then talk to the virtual agent called Leonard, which takes the role of another visitor, and finally fill in a questionnaire. We hung 4 pictures of existing artworks in the corridor. The objects were chosen so as to vary in style and in the type of affect they might evoke: a photo of the exhibition of Balloon Dog by Jeff Koons, and printed images of the paintings The Kiss by Gustav Klimt, Composition A by Piet Mondrian, and The Anatomy Lesson of Dr. Frederick Ruysch by Jan van Neck. We placed another artwork between the screen of the virtual agent and the user to serve as a first conversation topic: the picture of a statue named Soldier Drawing his Bow, by Jacques Bousseau. Each participant went through one condition only, which was kept throughout the interaction. Leonard was displayed on a 75-inch vertically placed screen and has the appearance of a


cartoon-like version of a man of about 70 years old. Pictures of the experiment can be found in [19].

B. Questionnaire

The questionnaire used in the experiment is composed of 5 separate sections, devoted to the evaluation of the OR-Emo model. The questions in sections 2 to 5, originally written in French, are rated on a 7-point scale, from Not at all to Very much.

Section 1 collects the user profile (5 items): birth date, gender, country of residence for the last 5 years, education level, and industry.

Section 2 concerns engagement in the interaction, grounded in the definition by [20]. Thus, we created 4 items⁷ to assess both the user's own engagement and the user's perception of the agent's engagement: During the interaction, to what extent did you: 2.1 → want to stay together with Leonard? 2.2 → think Leonard would like to stay together with you? 2.3 → want to continue the conversation? 2.4 → think Leonard would like to continue the conversation?

Section 3 concerns engagement in the interaction, grounded in the Temple Presence Inventory (TPI) questionnaire [21]. This questionnaire, already used in [1], aims at measuring a person's immersive tendency, or presence, in a virtual environment. As recommended by the authors of the TPI, we selected individual items that were useful and appropriate for our study, and we adapted the selected items to a human-agent interaction context. We added a supplementary item on the general liking of the interaction. The corresponding questions are presented to the user as follows: To what extent: 3.1 → did you feel involved in the interaction? 3.2 → was this experience boring or lively? 3.3 → was the information delivered by Leonard interesting? 3.4 → did you like this interaction?

Section 4 is dedicated to the user's perception of the agent's emotional stances and appreciations (5 items).
It aims at assessing whether the user (i) perceived that the agent reacted to the user's appreciations, (ii) perceived these reactions as appropriate, (iii) perceived that the agent has its own preferences, and (iv) felt that the agent and he/she share the same preferences. This makes it possible to assess whether the emotional stances and appreciations formulated by the agent are effectively perceived by the user, and whether the difference between the agent's preferences and the user's appreciations could play a role in the user's engagement. In the questions of this section, we used the word "opinion" instead of "appreciation", in order to clarify the terminology for the user. The user is given the following instructions: Please indicate your level of agreement with the following statements. 4.1 → In general, Leonard took into account what I said during the conversation. 4.2 → Leonard reacted to the opinions I expressed. 4.3 → When Leonard reacted to my opinions, it did so in an appropriate manner. I had the feeling that: 4.4 → Leonard had its own opinions. 4.5 → Leonard and I share the same opinions.

Section 5 collects the user's perception of the agent's believability (6 items). The items in this section are inspired by the definition of believable agents by Bates (1994) [22], the TPI questionnaire [21], and the evaluation protocol used by [23] for assessing agent believability. The user is given the following instructions: Please indicate your level of agreement with the

⁷The same items were used in [19] for another study.

following statements: I had the feeling that: 5.1 → Leonard's utterances and behaviours are consistent. 5.2 → Leonard's utterances and behaviours are lively. 5.3 → Leonard is able to think by itself. 5.4 → Leonard has feelings. 5.5 → Leonard's utterances and behaviours are common in human behaviour. 5.6 → Leonard's utterances and behaviours could occur in human behaviour.

C. Results and Discussion

33 participants (13 women and 20 men) took part in the study. They were recruited in the offices of Télécom-ParisTech (Paris, France) and were external to the virtual agents team. They worked in research laboratories or companies, and included administrative staff as well as researchers and engineers with a good level of French. 33 interaction sessions of approximately 6-10 minutes in length were thus recorded.

Before analysing the results, we applied the Shapiro-Wilk test [24] to each individual sample by group, to check whether the samples are likely to come from a normally distributed population. For a fairly large proportion of samples (39%), the results indicate that the null hypothesis can be rejected (p < 0.05), meaning that these samples are likely to come from a non-normally distributed population. Hence, for all of the following results, we used statistical tests for non-parametric data. As each participant passed the experiment under only one experimental condition, the statistical tests to be used are for non-repeated measures. When only two samples were compared, we used the Wilcoxon rank sum test [25] with continuity correction (W, p). When more than two samples were tested, we used the Kruskal-Wallis test (χ²(2), p) [26], followed by a post-hoc test, Mann-Whitney with Bonferroni correction. The post-hoc test is used to perform pairwise comparisons and obtain supplementary information. In order to test whether several items in the questionnaire can be combined into a single Likert scale, we used Cronbach's α (alpha) [27], which verifies whether several items measure the same construct.
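For reference, Cronbach's α can be computed from its standard definition, α = k/(k−1) · (1 − Σᵢ σᵢ² / σ_T²), where k is the number of items, σᵢ² the sample variance of item i, and σ_T² the variance of the respondents' total scores. The sketch below is a pure-Python illustration of that formula, not the authors' analysis code.

```python
def sample_variance(xs):
    """Unbiased sample variance (n - 1 denominator)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cronbach_alpha(items):
    """Cronbach's alpha; 'items' holds one list of scores per questionnaire
    item, all over the same respondents in the same order."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent totals
    item_var = sum(sample_variance(it) for it in items)
    return k / (k - 1) * (1 - item_var / sample_variance(totals))
```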

Perception of the agent's emotional stances with appreciations. We analysed the results corresponding to section 4 of the questionnaire, focusing on the answers to questions 4.3 and 4.5. Regarding the 4.3 answers, when the ECA reacted to the user's appreciations, participants found that it reacted in an appropriate manner: EMO-WITHOUT-OR μ = 5.45, σ = 1.13; EMO-WITH-OR μ = 5.45, σ = 0.69; and NO-EMO μ = 4.27, σ = 1.74.

Overall, users had the feeling that they shared the same appreciations as the ECA when it expressed emotional stances. This is shown by the ratings obtained for question 4.5. The statistics of this item for the groups are: EMO-WITHOUT-OR μ = 4.64, σ = 1.69; EMO-WITH-OR μ = 4.73, σ = 1.01; and NO-EMO μ = 3.64, σ = 1.36. Regarding the impact of ORs, the comparison between the EMO-WITH-OR and NO-EMO groups gave a significant result (W = 90.5, p < 0.05), whereas the comparison between the EMO-WITHOUT-OR and NO-EMO groups was not significant (W = 86, p = 0.09334). This means that when the ECA expresses appreciations with ORs, it significantly reinforces the users' feeling that they share the same appreciations as the ECA, compared with when the ECA does not express emotional stances. When the ECA expresses appreciations without ORs, the results suggest that


there is no difference in comparison with when the ECA did not express emotional stances. This result concerning the ORs is encouraging, as the sharing of appreciations is important for building rapport and affiliation between two speakers, which contributes to their engagement [28].

Engagement. We tested whether the items of the two sections (2 and 3) can be aggregated into a single Likert scale. The Cronbach’s α measure of scale reliability for the Likert items8 is α = 0.903, which indicates an excellent reliability. We combined these scales into a single Likert scale representing the user’s engagement. A Kruskal-Wallis test on the user’s engagement Likert scale for the 3 groups gave no significant results (χ2(2) = 3.3038, p = 0.1917), which means that the 3 groups cannot be distinguished from each other. The mean and standard deviation by group are shown in Table I. The highest engagement score was obtained for the EMO-WITHOUT-OR group, closely followed by the EMO-WITH-OR condition. The results suggest that when the agent formulates emotional stances with ORs, user’s engagement is not improved compared to when it expresses emotional stances without ORs. This result can be compared to the one obtained by Ivaldi et al. [29], who found that their gaze mechanism improves the pace of the interaction, but does not seem to increase perceived engagement. Subjective measures may not fully reflect the user’s engagement; they should be complemented by objective ones, such as the analysis of the user’s facial expressions, mutual gaze, speech or verbal content [30]. In order to evaluate the impact of the ECA’s expressions of emotional stances on user’s engagement, two Wilcoxon rank-sum tests with continuity correction were performed. The comparison between groups EMO-WITH-OR and NO-EMO gave no significant result (W = 84, p = 0.1297), nor did the comparison between the groups EMO-WITHOUT-OR and NO-EMO (W = 84, p = 0.1303). However, the null hypothesis can be rejected at an 87% confidence level, which suggests that it would be interesting to re-conduct the experiment in order to test again the hypothesis that an ECA expressing emotional stances could enhance user engagement.
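The aggregation step relies on Cronbach’s α, which scipy does not provide directly; it can be computed from its standard definition, α = (k / (k − 1)) · (1 − Σ var(item) / var(scale)). A minimal sketch (the rating matrix is illustrative, not the experiment’s data):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a (participants x items) Likert rating matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    scale_var = items.sum(axis=1).var(ddof=1)   # variance of the summed scale
    return (k / (k - 1)) * (1 - item_vars.sum() / scale_var)
```

A value above roughly 0.9, as obtained here, is conventionally read as excellent internal consistency, justifying averaging the items into one engagement scale.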

During our experiment, we noticed that when the agent expressed an emotional stance about a topic, the user often responded in kind; the expression of emotional stances by the agent thus seems to have a positive impact on user engagement. Although this has to be confirmed in a further study, this result suggests that the expression of emotional stances containing verbal appreciations is an interesting feature to include in the capabilities of an ECA.

As a supplementary result, we evaluated the impact of the 3 conditions on the agent’s engagement as perceived by the users. The questions 2.2 and 2.4 were aggregated into a single Likert scale (average). The result is not significant (χ2(2) = 3.7414, p = 0.154), but the confidence level (85%) suggests that the experiment could be re-conducted. The statistics for the groups EMO-WITHOUT-OR, EMO-WITH-OR and NO-EMO are respectively µ = 5.82, σ = 1.19; µ = 5.18, σ = 1.27; and µ = 4.68, σ = 1.68. Hence, agent engagement as perceived by users obtained a lower rating when emotional stances contained ORs, compared to when they did

8As items 2.2 and 2.4 are related to the agent’s engagement, they were not considered for the aggregation.

TABLE I: MAIN STATISTICS

                    User’s Engagement      Believability
group               mean       sd          mean       sd

EMO-WITH-OR         5.06       0.87        5.45       1.16
EMO-WITHOUT-OR      5.14       1.27        5.33       0.82
NO-EMO              4.62       0.76        4.64       0.80

not. It would be interesting to re-conduct an experiment in order to assess whether the user is less engaged when an agent uses the same vocabulary (appreciation words and topic names). However, in the next paragraph we will see that when the agent uses the same words as the user, its behaviours are rated as more believable. These results offer an interesting research perspective, showing that the link between ORs and believability / engagement should be further explored.

Believability. Cronbach’s α measure is α = 0.891 for the items of Section 5, which indicates a good reliability. The Likert items corresponding to the evaluation of the agent’s believability were then averaged into a single Likert scale. A Kruskal-Wallis test was performed on this Likert scale for the 3 groups, which gave results very close to the significance threshold (χ2(2) = 5.6779, p = 0.05849). This suggests that there is a difference between the 3 groups regarding the agent’s believability as perceived by the user. Table I shows that the highest mean is obtained with the EMO-WITH-OR group and the lowest with the NO-EMO group. A post-hoc test using Mann-Whitney tests with Bonferroni correction gave no significant results (p = 0.081 for groups NO-EMO and EMO-WITHOUT-OR). This suggests that there is no difference between the two groups NO-EMO and EMO-WITHOUT-OR.

A Kruskal-Wallis test was performed for each item of Section 5 in the questionnaire, in order to identify the most discriminant items. Regarding the answers to question 5.4, the test gave a significant result (χ2(2) = 11.809, p < 0.01). The post-hoc test using Mann-Whitney tests showed significant differences between the groups EMO-WITH-OR and NO-EMO (p < 0.01), and between the groups EMO-WITHOUT-OR and NO-EMO (p < 0.05). The difference between the means for the NO-EMO group and the other groups is quite important (Table II). This means that users tend to think that the agent has feelings when it expresses emotional stances with appreciations, whereas in the other case they tend to think it has no feelings.
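The omnibus-then-post-hoc procedure used for each item can be sketched as follows; the group arrays are again illustrative placeholders, and the Bonferroni correction is applied by hand as scipy does not do it for you.

```python
from itertools import combinations
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Illustrative per-group ratings for one questionnaire item.
groups = {
    "EMO-WITH-OR": rng.normal(4.9, 1.1, 11),
    "EMO-WITHOUT-OR": rng.normal(4.8, 1.25, 11),
    "NO-EMO": rng.normal(3.0, 1.26, 11),
}

# Omnibus Kruskal-Wallis test across the three conditions.
h_stat, p_omnibus = stats.kruskal(*groups.values())

# Post-hoc pairwise Mann-Whitney tests with Bonferroni correction:
# each raw p value is multiplied by the number of comparisons (capped at 1).
pairs = list(combinations(groups, 2))
corrected = {
    (a, b): min(1.0, len(pairs) * stats.mannwhitneyu(
        groups[a], groups[b], alternative="two-sided").pvalue)
    for a, b in pairs
}
```

Only when the omnibus p is small is it meaningful to inspect `corrected` to see which pair of conditions drives the effect, as done for item 5.4 above.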

Regarding the answers to question 5.3, a Kruskal-Wallis test revealed a non-significant effect of group, but the p value is close to the significance threshold (χ2(2) = 5.6222, p = 0.06014). The post-hoc test using Mann-Whitney tests showed no significant differences, despite a good confidence level for the groups EMO-WITH-OR and NO-EMO (p = 0.09, hence 91% confidence). The means for the 3 groups are above or equal to the middle of the scale (Table II). This experiment has to be re-conducted to test this hypothesis again, but these results could indicate that users tend to think that the agent has its own feelings when it expresses emotional stances with appreciations rather than when it does not.

Regarding the answers to question 5.6, the Kruskal-Wallis test gave a non-significant result, despite a p value close to the significance threshold (χ2(2) = 5.2996, p = 0.07067). The means for the 3 groups are shown in Table II. It can be



TABLE II: STATISTICS FOR 3 ITEMS IN THE BELIEVABILITY SECTION

                    Feelings (5.4)    Thoughts (5.3)    Possible behaviours (5.6)
group               mean      sd      mean      sd      mean      sd

EMO-WITH-OR         4.91      1.14    5.27      1.19    5.91      1.38
EMO-WITHOUT-OR      4.82      1.25    4.91      1.14    5.27      0.65
NO-EMO              3.00      1.26    4.00      1.26    5.00      0.89

noticed that for emotional stances containing ORs, the agent’s behaviours were rated as more believable (in particular, as more likely to occur) than for emotional stances without ORs.

IV. CONCLUSION

This study presented a computational model of other-repetitions (ORs) conveying emotional stances, which can be used by an Embodied Conversational Agent (ECA) in response to the user’s utterances. The evaluation of the model showed that the expression of verbal emotional stances by an ECA tends to improve the user’s perception of his/her own engagement. The OR model also seems to have an impact on some aspects of the ECA’s believability as perceived by the user. The results suggest that the experiment should be re-conducted to confirm or disconfirm this hypothesis. On the other hand, OR does not seem to affect the user’s perception of his/her own engagement. This indicates that the study of ORs should be complemented by objective measures for a comprehensive exploration of their contribution. However, the presence of ORs in the ECA’s appreciations had a positive effect on the participants’ feeling that they shared the same appreciations as the ECA. We are currently integrating into our model other inputs, such as the user’s level of engagement (based on the level of the user’s talkativeness), in order to handle timing aspects for the triggering of appreciations [31]. We further plan to replicate the evaluation protocol used in the present study. For future work, we would also like to add objective measures of engagement, such as gaze cues or speech rate.

ACKNOWLEDGMENT

The authors thank the members of the Laboratoire Parole et Langage, France, for valuable insights and suggestions, as well as the A1:1 partners and the Greta team of Telecom-ParisTech for their help with the experimental setup.

REFERENCES

[1] C. L. Sidner, C. Lee, C. D. Kidd, N. Lesh, and C. Rich, “Explorations in engagement for humans and robots,” Artificial Intelligence, vol. 166, no. 1, pp. 140–164, 2005.

[2] R. Bertrand, G. Ferre, M. Guardiola et al., “French face-to-face interaction: repetition as a multimodal resource,” Coverbal Synchrony in Human-Machine Interaction, p. 141, 2013.

[3] E. Delaherche, M. Chetouani, A. Mahdhaoui, C. Saint-Georges, S. Viaux, and D. Cohen, “Interpersonal synchrony: A survey of evaluation methods across disciplines,” T. on Affective Computing, vol. 3, no. 3, pp. 349–365, 2012.

[4] H. Buschmeier, K. Bergmann, and S. Kopp, “An alignment-capable microplanner for natural language generation,” in Workshop on Natural Language Generation. ACL, 2009, pp. 82–89.

[5] U. Hess, P. Philippot, and S. Blairy, “8. Mimicry,” The Social Context of Nonverbal Behavior, p. 213, 1999.

[6] J. Gratch, S.-H. Kang, and N. Wang, “Using social agents to explore theories of rapport and emotional resonance,” Social Emotions in Nature and Artifact, p. 181, 2013.

[7] S. Kopp, “Social resonance and embodied coordination in face-to-face conversation with artificial interlocutors,” Speech Communication, vol. 52, no. 6, pp. 587–597, 2010.

[8] J. C. Acosta and N. G. Ward, “Achieving rapport with turn-by-turn, user-responsive emotional coloring,” Speech Communication, vol. 53, no. 9–10, pp. 1137–1148, 2011.

[9] K. Prepin, M. Ochs, and C. Pelachaud, “Beyond backchannels: co-construction of dyadic stance by reciprocal reinforcement of smiles between virtual agents,” in CogSci, 2013.

[10] J. Svennevig, “Other-repetition as display of hearing, understanding and emotional stance,” Discourse Studies, vol. 6, no. 4, pp. 489–516, 2004.

[11] L. H. Silverman, “Meaning making matters: Communication, consequences, and exhibit design,” Exhibitionist, 1999.

[12] S. Campano, J. Durand, and C. Clavel, “Comparative analysis of verbal alignment in human-human and human-agent interactions,” in LREC, 2014.

[13] J. R. Martin and P. R. White, The Language of Evaluation. Palgrave Macmillan, Basingstoke and New York, 2005.

[14] E. de Sevin, R. Niewiadomski, E. Bevacqua, A.-M. Pez, M. Mancini, and C. Pelachaud, “Greta, une plateforme d’agent conversationnel expressif et interactif,” Technique et science informatiques, vol. 29, no. 7, p. 751, 2010.

[15] C. Rich and C. L. Sidner, “Using collaborative discourse theory to partially automate dialogue tree authoring,” in Intelligent Virtual Agents. Springer, 2012, pp. 327–340.

[16] C. Langlet and C. Clavel, “Improving social relationships in face-to-face human-agent interactions: when dislikes,” in ACL, 2015, to appear.

[17] R. Reisenzein, E. Hudlicka, M. Dastani, J. Gratch, K. V. Hindriks, E. Lorini, and J.-J. C. Meyer, “Computational modeling of emotion: Toward improving the inter- and intradisciplinary exchange,” T. Affective Computing, vol. 4, no. 3, pp. 246–266, 2013.

[18] M. Mancini and C. Pelachaud, “The fml-apml language,” in Proc. of the Workshop on FML at AAMAS, vol. 8, 2008.

[19] N. Glas and C. Pelachaud, “User engagement and preferences in information-giving chat with virtual agents,” in Workshop on Engagement in Social Intelligent Virtual Agents, 2015, forthcoming.

[20] I. Poggi, Mind, Hands, Face and Body: A Goal and Belief View of Multimodal Communication. Weidler, 2007.

[21] M. Lombard, L. Weinstein, and T. Ditton, “Measuring telepresence: The validity of the Temple Presence Inventory (TPI) in a gaming context,” in ISPR, 2011.

[22] J. Bates et al., “The role of emotion in believable agents,” Communications of the ACM, vol. 37, no. 7, pp. 122–125, 1994.

[23] S. Campano, N. Sabouret, E. de Sevin, and V. Corruble, “An evaluation of the cor-e computational model for affective behaviors,” in AAMAS, 2013, pp. 745–752.

[24] S. S. Shapiro and M. B. Wilk, “An analysis of variance test for normality (complete samples),” Biometrika, pp. 591–611, 1965.

[25] H. B. Mann and D. R. Whitney, “On a test of whether one of two random variables is stochastically larger than the other,” The Annals of Mathematical Statistics, pp. 50–60, 1947.

[26] W. H. Kruskal and W. A. Wallis, “Use of ranks in one-criterion variance analysis,” Journal of the American Statistical Association, vol. 47, no. 260, pp. 583–621, 1952.

[27] J. C. Nunnally and I. Bernstein, “The assessment of reliability,” Psychometric Theory, vol. 3, pp. 248–292, 1994.

[28] R. Zhao, A. Papangelis, and J. Cassell, “Towards a dyadic computational model of rapport management for human-virtual agent interaction,” in Intelligent Virtual Agents. Springer, 2014, pp. 514–527.

[29] S. Ivaldi, S. M. Anzalone, W. Rousseau, O. Sigaud, and M. Chetouani, “Robot initiative in a team learning task increases the rhythm of interaction but not the perceived engagement,” Frontiers in Neurorobotics, vol. 8, p. 5, 2014.

[30] K. Hook, P. Persson, and M. Sjolinder, “Evaluating users’ experience of a character-enhanced information space,” AI Communications, vol. 13, no. 3, pp. 195–212, 2000.

[31] S. Campano, C. Clavel, and C. Pelachaud, “I like this painting too: When an ECA shares appreciations to engage users,” in AAMAS, 2015.
