
This is a contribution from Interaction Studies 14:3. © 2013 John Benjamins Publishing Company.

This electronic file may not be altered in any way. The author(s) of this article is/are permitted to use this PDF file to generate printed copies to be used by way of offprints, for their personal use only. Permission is granted by the publishers to post this file on a closed server which is accessible to members (students and staff) only of the author's/s' institute; it is not permitted to post this PDF on the open internet. For any other use of this material prior written permission should be obtained from the publishers or through the Copyright Clearance Center (for USA: www.copyright.com). Please contact [email protected] or consult our website: www.benjamins.com

Tables of Contents, abstracts and guidelines are available at www.benjamins.com

John Benjamins Publishing Company


Interaction Studies 14:3 (2013), 366–389. DOI 10.1075/is.14.3.04yam
ISSN 1572–0373 / E-ISSN 1572–0381 © John Benjamins Publishing Company

Interactions between a quiz robot and multiple participants
Focusing on speech, gaze and bodily conduct in Japanese and English speakers

Akiko Yamazaki (Tokyo University of Technology), Keiichi Yamazaki (Saitama University), Keiko Ikeda (Kansai University), Matthew Burdelski (Osaka University), Mihoko Fukushima, Tomoyuki Suzuki, Miyuki Kurihara, Yoshinori Kuno & Yoshinori Kobayashi (Saitama University)

This paper reports on a quiz robot experiment in which we explore similarities and differences in human participant speech, gaze, and bodily conduct in responding to a robot’s speech, gaze, and bodily conduct across two languages. Our experiment involved three-person groups of Japanese and English-speaking participants who stood facing the robot and a projection screen that displayed pictures related to the robot’s questions. The robot was programmed so that its speech was coordinated with its gaze, body position, and gestures in relation to transition relevance places (TRPs), key words, and deictic words and expressions (e.g. this, this picture) in both languages. Contrary to findings on human interaction, we found that the frequency of English speakers’ head nodding was higher than that of Japanese speakers in human-robot interaction (HRI). Our findings suggest that the coordination of the robot’s verbal and non-verbal actions surrounding TRPs, key words, and deictic words and expressions is important for facilitating HRI irrespective of participants’ native language.

Keywords: coordination of verbal and non-verbal actions; robot gaze comparison between English and Japanese; human-robot interaction (HRI); transition relevance place (TRP); conversation analysis

1. Introduction

“Were an ethologist from Mars to take a preliminary look at the dominant animal on this planet, he would be immediately struck by how much of its behavior, within a rather extraordinary array of situations and settings (from camps in the tropical rain forest to meetings in Manhattan skyscrapers), was organized through face-to-face interaction with other members of its species” (M. Goodwin 1990: p. 1).


The study of face-to-face interaction has long been a central concern across various social scientific disciplines. More recently it has become an important focus within fields related to technology and computer-mediated discourse (e.g. Heath & Luff 2000; Suchman 2006). This is especially true of research adopting ethnomethodology (Garfinkel 1967) and conversation analysis (hereafter abbreviated as CA) (Sacks, Schegloff & Jefferson 1974) that examines human-robot interaction (HRI) (e.g. Pitsch et al. 2013; A. Yamazaki et al. 2010; A. Yamazaki et al. 2008). One of the central issues in this area is multicultural and inter-cultural patterns in human-robot and human-virtual agent interaction. Many of the relevant corpora are of human-human interaction, but were collected for the purposes of HRI or human-agent interaction, such as the CUBE-G corpus (e.g. Nakano & Rehm 2009), corpora of the natural language dialogue group of USC (e.g. Traum et al. 2012), and the CMU cross-cultural receptionist corpus (e.g. Makatchev, Simmons & Sakr 2012). These works also focus on verbal and non-verbal behavior of human interactions in order to develop virtual agents. While we have a similar interest in examining and developing technology that can be employed in real-world environments across various cross-cultural and cross-linguistic settings, our research project utilizes a robot that is able to verbalize and display its bodily orientation towards objects in the immediate vicinity and towards multiple participants, in order to facilitate participants' understanding of, and engagement in, the robot's talk. Thus, a key feature of the present research is not only the use of speech and non-verbal resources, but also the coordination of these resources in order to facilitate visitors' engagement in human-robot interaction (A. Yamazaki et al. 2010). In the present paper we focus on how non-verbal actions (e.g. gaze, torso, gesture) are related to question-response sequences in multi-party HRI.

A rich literature on human social interaction can be found in studies on CA. A main focus of these studies is to identify the underlying social organization in constructing sequences of interaction. In particular, a growing number of studies examines question-response sequences. For example, as Stivers and Rossano (2010) point out, a question typically elicits a response from the recipient of the question turn. Thus asking a question is a technique for selecting a next speaker (Sacks, Schegloff & Jefferson 1974). Since a question forms the first part of an adjacency pair, it calls for a specific type of second pair part (cf. Sacks 1987; Schegloff 2007). Rossano (2013) points out that in all question-response sequences there is a transition relevance place (TRP). A TRP is defined as follows: "The first possible completion of a first such unit constitutes an initial transition-relevance place" (Sacks, Schegloff & Jefferson 1974: p. 703); it is a place where turn transfer or speaker change may occur. At a TRP, a hearer can provide a response to the speaker, but may not necessarily take the turn (e.g. verbal continuers, head nods). When somebody asks a question, a hearer has a normative obligation to answer (Rossano 2013). Stivers and her colleagues (2010) examined question-response sequences of naturally occurring dyadic interactions in ten different languages. While there are some variations in the ways speakers produce question formats, what is common among them is that the speaker typically gazes towards the addressee(s) when asking a question to a multiparty audience (e.g. Hayashi 2010).

A number of researchers have studied gaze in human interaction in various settings and sequential contexts. In particular, Kendon (1967) points out that gaze patterns are systematically related to particular features of talk. C. Goodwin (1981) clarified that hearers display their engagement towards the speaker by using gaze. In addition, Bavelas and colleagues (2002) describe how mutual gaze plays a role in speaker-hearer interactions. A recent study of dyadic interactions in cross-cultural settings reveals similarities and differences in gaze behaviors among different languages and cultures (Rossano, Levinson & Brown 2009). While gaze has been given much thought and acknowledged as an important resource in human-human interaction, a question still remains as to how gaze is deployed and can be employed in multiparty HRI.

Within the current research on gaze behavior in HRI (e.g. Knight & Simmons 2012; Mutlu et al. 2009) there has not yet been discussion of multiparty question-response sequences in cross-cultural settings. The present paper begins to fill this gap by comparing human-robot interaction in Japanese and English within a quiz robot experiment. The use of a robot allows us to ask the same questions employing the same bodily gestures, and to compare the responses of participants under the same conditions. Utilizing videotaped recordings, our analysis involves detailed transcriptions of human-robot interaction and a quantitative summary of the kinds of participants' responses. We show that a main difference between the two language groups is that the frequency of nodding of English speakers is significantly higher than that of Japanese speakers. This is contrary to research on human-human interaction that argues that Japanese speakers nod more often than English speakers (e.g. Maynard 1990). Participants show their engagement in interaction with a robot when the robot's utterance and bodily behavior such as gaze are coordinated appropriately.

This paper is organized in the following manner. In Section 2, we discuss the background of this study. In Section 3, we explain the setup of the quiz robot experiment. In Section 4, we offer an initial analysis. In Section 5, we provide a detailed analysis of participants' responses in regard to the robot's gaze and talk. Discussion and concluding remarks follow in Section 6.


2. Background of this study

2.1 Cross-cultural communicative differences: Word order

A rich literature on multicultural and inter-cultural variations in interaction can be found in the literature on 'interactional linguistics' (e.g. Tanaka 1999; Iwasaki 2009), in which scholars with training in CA and related fields tackle cultural differences from a cross-linguistic perspective. They do not follow the traditional syntactic approach to defining "differences" among languages used in interaction. Rather, they reveal an interactional word order associated with a specific social interactional activity. For instance, one study focused on a cross-cultural comparison between Japanese and English (Fox, Hayashi & Jasperson 1996). Differences between interactions involving these two languages are particularly interesting due to their respective word orders. There is a distinctive difference in 'projection' in regard to the timing of completion of a current turn-constructional unit (e.g. a sentential unit), which is defined as 'projectability' in CA.

In regard to question formats, which include interrogatives, declaratives and tag-questions (Stivers 2010), English and Japanese word order exhibits differences and similarities in interaction. For interrogatives, the sentence structure is nearly the opposite between Japanese and English. As Tanaka (1999: p. 103) states, "[i]n a crude sense, the structures of Japanese and English can be regarded as polar opposites. This is reflected in differences in participant orientations to turn-construction and projection." For declaratives, there are no dramatic differences between the two languages in terms of word order; however, the placement of the question word in the sentence is different. For tag-questions, the sentence structure is similar between Japanese and English, as a question format 'tagged on' to the end of a statement.

2.2 Coordination of verbal and non-verbal actions and questioning strategy

We have conducted ethnographic research in several museums and exhibitions in Japan and the United States in order to explore ways that expert human guides engage visitors in the exhibits. We analyzed video recordings by adopting CA and have applied those findings to a guide robot that can be employed in museums. Based on the following findings, we employed two central design principles in the robot for the current project.

First, as the coordination of verbal and non-verbal actions is an essential part of human interaction (C. Goodwin 2000), we observed that guides often turn their heads toward the visitors when they mention a key word in their talk, and point towards a display when they use a deictic word or expression (e.g. this painting). We also found that visitors display their engagement by nodding and shifting their gaze, in particular at key words, deictic words and expressions, and sentence endings (one place of a TRP). Adopting these findings, we programmed the coordination of verbal and non-verbal actions into a museum guide robot, and found that in dyadic robot guide-human interactions participants frequently nodded and shifted their gaze towards the object (A. Yamazaki et al. 2010).

Second, we observed that human guides often use a question-answer strategy in engaging visitors. In particular, guides use questions aimed at engaging visitors while coordinating their gaze (K. Yamazaki et al. 2009; Diginfonews 2010). Guides ask a pre-question (first question) regarding an exhibit, and at the same time monitor the visitors' responses (during a pause) to check whether visitors are displaying engagement (e.g. nodding). Then the guide asks a main question (second question) towards a particular visitor who displays engagement. The first question serves as a "clue" to the main question. We programmed this questioning strategy into the robot's talk, and the results showed that the robot could select an appropriate visitor who had displayed "knowledge" in order to provide an answer to the main question (A. Yamazaki et al. 2012).

In relation to the second finding, we found that guides use a combination of three types of pre-question and main question formats towards multiple visitors: (1) the guide begins with a pre-question as an interrogative, and then asks a main question; (2) the guide begins with a pre-question as a tag-question, and then asks a main question; and (3) the guide begins with a declarative sentence without telling the name or attribution of the referent, and then asks a question regarding the referent. We implemented these three combinations of pre-question and main question in the current quiz robot, as we are interested in how coordination of the robot's questions (verbal) and gaze (non-verbal actions) is effective in multiparty interaction in English and Japanese.

3. The present experiment: A quiz robot in Japanese and English

In this experiment, we implemented the robot's movements based on the ethnography described above, coordinated with its verbal actions in both English and Japanese.

3.1 Robot system

We used Robovie-R ver.3. The experimental system was designed to provide explanations to three participants. The system has three pan-tilt-zoom (PTZ) cameras and three PCs, each dedicated to processing images from one PTZ camera observing one participant. Figure 1 presents an overview of the robot system.


In addition to the three cameras outside of the body of the robot and three PCs for image processing, we used a laser range sensor (Hokuyo UTM-30LX) and another PC for processing the range sensor data and for integrating the sensor processing results. The system detects and tracks the bodies of multiple visitors in the range sensor data. A PTZ camera is assigned to each detected body and is controlled to turn toward the observed body of that participant. For detecting and tracking a face and computing its direction, we used Face API (http://www.seeingmachines.com/product/faceapi/). The pan, tilt, and zoom of each PTZ camera are automatically adjusted based on the face detection results, so that the face remains focused in the center of the image. The system can locate a human body with a margin of error of 6 cm for position and 6 degrees for orientation. It can measure 3D face direction within 3 degrees of error at 30 frames per second. The head orientation is measured around three coordinate axes (roll, pitch and yaw) with the origin at the center of the head.
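To make the division of labor concrete, the following minimal sketch illustrates how tracked bodies from the range sensor might each be assigned a dedicated PTZ camera. This is not the actual system code; all names (TrackedBody, assign_cameras) are hypothetical stand-ins.

```python
# A minimal sketch, assuming one PTZ camera per tracked body, left to right.
# The real system uses a Hokuyo UTM-30LX range sensor, three PTZ cameras,
# and Face API; none of that hardware interfacing is shown here.
from dataclasses import dataclass

@dataclass
class TrackedBody:
    body_id: int
    x: float   # metres, lateral position from the laser range sensor (+/- 6 cm error)
    y: float   # metres, distance from the robot

def assign_cameras(bodies, num_cameras=3):
    """Assign one PTZ camera per detected body, ordered left to right.

    Returns {camera_index: body_id}; each camera would then be panned,
    tilted, and zoomed so its participant's face stays centred in the image.
    """
    ordered = sorted(bodies, key=lambda b: b.x)[:num_cameras]
    return {cam: body.body_id for cam, body in enumerate(ordered)}

if __name__ == "__main__":
    bodies = [TrackedBody(7, 1.4, 2.0), TrackedBody(3, 0.0, 2.3), TrackedBody(9, -1.1, 2.0)]
    print(assign_cameras(bodies))   # {0: 9, 1: 3, 2: 7}
```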

Figure 1. Quiz robot system (robot and robot control PC; body tracking and face tracking PCs; pan-tilt control and main process; three PTZ cameras; laser range sensor)

From the face direction results, the system can recognize the following behaviors of participants: nodding, shaking, cocking and tilting the head, and gazing away, and it can choose an appropriate answerer based on such recognition results (A. Yamazaki et al. 2012). However, in the experiments described later, we did not use these autonomous functions, so as to prevent potential recognition errors from influencing human behaviors. The system detected and tracked the participants' positions using the laser range sensor in order to turn its head precisely toward them. A human experimenter, who was seated behind the screen and could see the participants' faces, controlled the robot to point to and ask one of the three participants to answer the question. In other words, we adopted a WOZ (Wizard of Oz) method for selecting an answerer, employing a human experimenter who controlled the robot's body movement.
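To give a feel for how such behaviors can be read off the 30-frames-per-second head orientation stream, here is a rough sketch of nod detection from the pitch angle alone. The threshold and window values are illustrative assumptions, not those of the actual recognizer.

```python
# A rough sketch of nod detection from the head-pitch stream; thresholds are
# illustrative assumptions, not the values used in the actual system.
def detect_nod(pitch_deg, down_thresh=8.0, window=15):
    """Return True if the pitch trace contains a quick down-and-back movement.

    pitch_deg: sequence of head pitch angles in degrees (positive = down),
    sampled at 30 frames per second; a window of 15 frames is about 0.5 s.
    """
    for start in range(len(pitch_deg) - window):
        segment = pitch_deg[start:start + window]
        baseline = segment[0]
        dipped = max(segment) - baseline >= down_thresh           # head goes down...
        returned = abs(segment[-1] - baseline) <= down_thresh / 2  # ...and comes back up
        if dipped and returned:
            return True
    return False

# Example: a flat trace with one ~11-degree dip is classified as a nod.
trace = [0]*10 + [2, 5, 9, 11, 9, 5, 2, 0] + [0]*10
print(detect_nod(trace))  # True
```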

Following the findings of our ethnographic research on expert human guides, we programmed the robot to move its gaze by moving its head (its eyes can be moved as well) and arms/hands in relation to its speech, using canned phrases (a built-in text-to-speech system can be used as well). The robot can open and close its hands and move its forefingers to point towards a target projected on the screen, similar to a human guide. The robot speaks English towards English-speaking participants and Japanese towards Japanese-speaking participants. A rough outline of the sequence of speech and movement is as follows: (1) before the robot begins to talk, it looks towards the participants; (2) when the robot says the first word, it moves its gaze and hands to a picture projected on the projection screen behind it; (3) during its speech, the robot moves its gaze and hand when it utters deictic words and expressions such as 'here' and 'this picture'; (4) at the end of each sentence, the robot moves its gaze towards the three participants one at a time, or looks at a particular participant depending on the length of the sentence, as expert human guides do; (5) when the robot asks the main question, it moves its gaze and hand towards a particular participant (selected by the experimenter); (6) if the participant gives the correct answer to the main question, the robot makes a clapping gesture (an experimenter operates a PC to play a hand-clapping sound) and repeats the answer to the question; when the participant gives an incorrect answer, the robot says, "That's incorrect" (Chigaimasu in Japanese) and then produces the correct answer.
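A minimal sketch of how this outline could be scripted is shown below, assuming word-level timing of the canned phrases. The gaze and point helpers are hypothetical stand-ins for the Robovie control API, which the paper does not describe at code level.

```python
# A minimal sketch, assuming each canned utterance carries (word index -> action)
# cues that fire as the words are spoken; `gaze` and `point` are hypothetical.
import time

def gaze(target):  print(f"  [gaze -> {target}]")
def point(target): print(f"  [point -> {target}]")

QUESTION = {
    "words": "Do you know this castle?".split(),
    "cues": {
        0: lambda: gaze("screen"),                      # (2) first word: to the picture
        3: lambda: (gaze("screen"), point("screen")),   # (3) deictic "this": gaze + point
        4: lambda: gaze("participant 1"),               # (4)/(5) sentence end (TRP): to a participant
    },
}

def speak(utterance, seconds_per_word=0.4):
    """Utter the canned phrase word by word, firing the cue attached to each word."""
    gaze("participants")                                # (1) look at participants before talking
    for i, word in enumerate(utterance["words"]):
        if i in utterance["cues"]:
            utterance["cues"][i]()
        print(word, end=" ", flush=True)
        time.sleep(seconds_per_word)
    print()

speak(QUESTION)
```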

3.2 Experimental setup

The following are the details of the experiment we conducted in English and Japanese with the quiz robot. As described above, each group consisted of three participants.

1. Experiment 1 (in English): Kansai University (Osaka, Japan), 21 participants (7 groups): 18 male and 3 female native speakers of English (mainly Americans, New Zealanders, and Australians) (15 June 2012). All participants were international undergraduate/graduate students or researchers who either were presently studying Japanese language and culture or had in-depth knowledge of Japan.

2. Experiment 2 (in Japanese): Saitama University (Saitama, Japan, near Tokyo), 51 participants (17 groups): 31 male and 20 female native speakers of Japanese (4 July 2012). All participants were undergraduate students of Saitama University.


During the experiment, we used three video cameras. The positions of the robot (R), the participants (1, 2 and 3), and the three video cameras (A, B, C) are shown in Figure 2.

Figure 2. Bird's eye view of the experimental setup: the robot (R) stands in front of the projection screen (display image size: 180 cm width × 120 cm height), facing the three participants (1, 2, 3); cameras A, B, and C record the scene.

Before the experiments, our staff asked the participants to answer the robot's quiz questions. Then the robot asked each group six questions related to six different pictures projected on a screen in back of the robot. The content of these questions (Q1–Q6) was the following:

Q1: Name of the war portrayed in Picasso's painting 'Guernica' (Screen: Guernica painting) (Answer: Spanish civil war).

Q2: Name of a Japanese puppet play (Screen: Picture of a famous playwright and a person operating a puppet) (Answer: Bunraku).

Q3: Name of the prefecture in which the city of Kobe is located (Screen: picture of Kobe port and a Christmas light show called 'Luminarie') (Answer: Hyogo prefecture) (Figure 3).

Q4: Name (either first, last, or both) of the lord of Osaka Castle (picture of Osaka castle and Lord Hideyoshi Toyotomi) (Answer: Hideyoshi Toyotomi) (Figure 4).

Q5-a (English speakers only): Full name of a Japanese baseball player in the American major leagues (Screen: photo of Ichiro Suzuki) (Answer: Ichiro Suzuki).

Q5-b (Japanese speakers only): Full name of the chief cabinet secretary in former Japanese Prime Minister Kan's cabinet (Screen: photo of Cabinet Secretary Yukio Edano) (Answer: Yukio Edano).


Q6: Name of the former governor of California who was the husband of the niece of a former president of the United States (Screen: map of California, John F. Kennedy) (Answer: Arnold Schwarzenegger) (Figure 5).

Figure 3. Image projected on screen at Q3 (the left is Kobe port and the right is “Luminarie”)

Figure 4. Image projected on screen at Q4 (the left is Osaka castle and the right is Hideyoshi Toyotomi)

Figure 5. Image projected on screen at Q6 (President Kennedy is on the left and a map of the United States with California highlighted is on the right)

As an image is projected on the screen (as in Figures 3, 4 and 5), the robot poses a pre-question to the visitors (e.g. “Do you know this castle?”), and then provides an explanation of the image before asking a main question (e.g. “Do you know the name of the famous person in this photo who had this castle built?”).


Due to differences in the topics of Q5 in English and Japanese and the low frequency of correct answers by participants for Q1 and Q2, here we do not analyze these three questions in detail. Rather, we focus on Q3, Q4, and Q6. We also report on some of the results of a questionnaire we asked participants to fill out after the experiment on whether they knew the answers to the pre-questions and main questions of Q3, Q4 and Q6.

3.3 Experimental stimuli

In what follows we explain the three question types in English and Japanese in relation to robot gaze and discuss similarities and differences in terms of word order.

1. Declarative question: Q3

For each utterance there are three lines of transcript. Rh stands for the robot's hand motion and Rg represents the robot's head motion. The third line, R, represents the robot's speech.

Transcription symbols are as follows: f = robot facing forward towards the participants; a comma-like mark (,) represents the robot moving its hand/head; 'i' indicates the robot in a still position facing towards the screen; d = robot's hand/head down; '1' indicates the robot in a still position facing towards Participant 1, who stands furthest to the right of the three participants in regard to the robot's position; '2' means the robot is in a still position facing towards Participant 2, who stands in between the other two participants; '3' represents the robot in a still position facing towards Participant 3, who stands furthest to the left of the three participants; 'o' indicates the robot spreading its hands and arms outward; 'm' represents the robot moving its hands.

We adopt the Jefferson (1984) transcription notation for describing the robot's speech. The transcription symbols are as follows:

> < represents a portion of an utterance delivered at a pace noticeably quicker than the surrounding talk, and < > noticeably slower.
(.) represents a brief pause.
↑ marks a sharp rise in pitch register.
↓ marks a noticeable fall in pitch register.
_ Underlining represents vocalic emphasis.

Q3 in English

01. Rh: f,,,,,iiiiiiiiiiiiiiiiiiiiiiiiiiiii
02. Rg: f,,,,,iiiiiiiiiiiiiiiiiii,,,,,fffff
03. R : This well-known(.)port in Japan has

04. Rh: iiiiiiiiiiiiiiiiiiii,,,,,,ooooommmmmmmm
05. Rg: ffffffffffffffffffff,,,,,,111111,,,,,,,
06. R : some famous tourist sites↓(.)such as

07. Rh: mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm
08. Rg: ,,2,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,3,,,
09. R : <Mt.(.)Rokko>a:nd Arima (.)>hot springs<.

We explain the robot's gaze in relation to the robot's utterance. In line 3, after uttering in, the robot looks towards the participants. While saying the word sites, the robot shifts its gaze towards Participant 1. When the robot says such as, it shifts its gaze away from Participant 1, and when saying Mount, it turns towards Participant 2. When the robot finishes saying hot springs (a key word, and a TRP) (line 9), it turns towards Participant 3.
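Because the three tiers are aligned character by character, the robot's head state during any stretch of speech can be read off positionally. The following small parser is our own illustration of the notation (not part of the study's tooling) and uses shortened, made-up tiers:

```python
# Illustrative parser for the three-tier transcripts; the tiers below are
# synthetic and shortened, but aligned the same way as in the paper.
def state_at(tier, speech, word):
    """Return the set of annotation states over one word of the speech line."""
    start = speech.index(word)
    return set(tier[start:start + len(word)])

rg     = "ff,,111,,222"   # head: forward, moving, Participant 1, moving, Participant 2
speech = "Do you know?"
print(state_at(rg, speech, "you"))   # {',', '1'}: turning to, then facing, Participant 1
```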

In Q3 from the Japanese data, the first and second lines again represent the robot's hand motion and head motion. The third line is the robot's utterance in Japanese, the following line provides an English interlinear gloss, and an English translation is given in parentheses.

The interlinear gloss abbreviations are as follows:
acc means accusative particle.
cop represents a copula.
lk stands for a linking nominal.
q indicates a question particle.
tag represents a tag-like expression.
top stands for a topic particle.


Q3 in Japanese

1 Rh: ddddddddddddddddddddddddddddddddd,,,,
2 Rg: fffffffffffffffffffffffffffffffff,,,,
3 R : Nihon o daihyoo suru minatomachi dearu
      Japan acc well-known port-town cop

4 Rh: ,,,,,,iiiiiiiiiiiiiiiiii
5 Rg: ,,,,,,iiiiiiiiii,,11,,22
6 R : kochira no toshi wa(.)
      this lk town top

7 Rh: iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii,,,,,,,,,,,,,,,,
8 Rg: ,,333333333333333333333333333333333,,,,,,,,22,,1111
9 R : Rokkoosan ya Arima onsen nado(.) kankoochi
      Mt. Rokko and Arima hot springs etc tourist site

10 Rh: ,,,,,,,oooooooooooommmmmm
11 Rg: 111111111111,,,,,,,,,,22,
12 R : toshite mo yuumei desu
       as also famous cop

(Translation: This well-known port in Japan has some famous tourist sites such as Mount Rokko and Arima Hot Springs.)

In line 3, while uttering dearu 'to be (inanimate)', the robot shifts its gaze towards the picture (Figure 3). By the point of uttering kochira 'this', this move is completed, and the robot turns towards the participants, from Participant 1 to Participant 2 to Participant 3. While uttering Rokkoosan ya Arima onsen nado 'such as Mount Rokko and Arima hot springs', the robot looks towards Participant 3. While saying kankoochi 'tourist sites', the robot shifts its gaze away from Participant 3. It then looks towards Participant 2 while saying yuumei desu at the sentence end.

The location of the key word 'Arima Hot Springs' differs between Japanese and English in Q3. In English the key word is located at the end of the sentence (at a TRP), whereas in Japanese it is located in the middle of the sentence (not at a TRP). In both languages, the robot looks towards Participant 3 at the key word.

2. Q4: Interrogative format

Q4 in English

01. Rh: f,,,,,iiiiiiiiiiiiiiiiii
02. Rg: f,,,,,iiiiiiiiiiiiiiiiii
03. R : Do you know this castle↑?


Here, the robot looks towards the three participants, utters the phrase Do you (line 3), and then turns its head and body towards the picture on the screen (Figure 4).

Q4 in Japanese

01. Rh: f,,,,,iiiiiiiiiiiiiiiiiiiiiiiiiiiii
02. Rg: f,,,,,iiiiiiiiiiiiiiiiiiiiiiiii,,11
03. R : Kochira no oshiro o sitte imasu ka↑?
        this lk castle acc know q

(Translation: Do you know this castle?)

Upon uttering kochira 'this', the robot moves its hand and looks towards the picture; it says imasu 'to be (animate)', and then begins shifting its gaze away from the picture and towards Participant 1 (line 3).

Here, the sentence structure is polar opposite between Japanese and English in regard to projectability:

English (early arrival): Do you know this castle?
Japanese (late arrival): Kochira no oshiro o shitte imasu ka?

The projection of the forthcoming sentence as a question thus arrives much earlier in English than in Japanese.

3. Q6: Tag question format

Q6 in English

01. Rh: f,,,iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii
02. Rg: f,,,iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii
03. R : <The ↑man on the left of this photo>(.)is a very famous

04. Rh: iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii,,,,,,,ommmmmmmmm
05. Rg: iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii,,,,,,,1111111111
06. R : president of the United States,<isn't he? (2.5)

When the robot produces the article the, it looks at the picture (line 3). After uttering United States, it turns its gaze towards Participant 1 while saying isn't he (tag format) (line 6).

Q6 in Japanese.

1 Rh: d,,,,,,,iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii
2 Rg: f,,,,,,,iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii


3 R : kochira no hidarigawa no shashin no jinbutsu wa(.)
      this lk left side lk photo lk person top

4 Rh: iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii
5 Rg: iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii
6 R : Amerika gasshuukoku no daitooryoo toshite mo
      America United States lk president as also

7 Rh: i,,,,,,,,,,,,,ooo
8 Rg: i,,,,,,,,,,,,,111
9 R : yuumei desu yone?
      famous cop tag

(Translation: the man on the left of this photo is a very famous president of the United States of America, isn’t he?)

When the robot begins the question, saying kochira 'this', it looks at the picture on the screen (line 3). In line 9, when saying yuumei 'famous', the robot moves its gaze and looks at Participant 1 while uttering yone 'isn't he' (tag format). In Q6, the sentence structure is thus similar between English and Japanese, with 'isn't he?' paralleling desu yone?

4. Initial analysis

Initially we wanted to know whether linguistic features such as key words (e.g. names of people and places) and question type affect the participants' reactions, and how these reactions differ between English and Japanese. To this end, we analyzed participants' responses, including utterances and bodily behaviors, based on detailed transcriptions of the video data and quantitative summaries of the kinds of participants' responses. In particular, we analyzed participants' responses to the robot's three different question formats (interrogative, tag question, and declarative sentence) and compared their responses in English and Japanese. In the following, we analyze the participants' reactions in English and Japanese. We pay attention to (1) the sentence beginning and (2) the sentence ending (TRP) and key words, and analyze participants' responses at the end of the first word of the sentence, at the end of the sentence, and at the end of key words.

(1) Sentence beginning

Although there are differences in word order, the majority of participants looked at the picture even when they were looking somewhere else prior to the pre-question. Also, we can observe that there is a deictic word 'this' in three of the questions (in Japanese, in Q4 and Q6; and in English, in Q3), which is coordinated with the robot's pointing to the picture. In response, the participants looked towards the screen (see Table 1). Except for Q4, there are no major differences in the participants' reactions between the two language groups.

Table 1. The directions of participants' gaze in English (E) and Japanese (J) at sentence beginning (Q3, Q4 and Q6) (The 'n' in parentheses is the number of tokens.)

                  Q3 (E)        Q3 (J)        Q4 (E)        Q4 (J)        Q6 (E)        Q6 (J)
Image             81.0% (n=17)  82.4% (n=42)  47.6% (n=10)  88.2% (n=45)  90.5% (n=19)  94.1% (n=48)
Robot             9.5% (n=2)    2.0% (n=1)    23.8% (n=5)   0% (n=0)      9.5% (n=2)    3.9% (n=2)
Other direction   9.5% (n=2)    15.7% (n=8)   28.6% (n=6)   11.8% (n=6)   0% (n=0)      2.0% (n=1)

(2) Sentence ending

For all three questions, the robot is looking at either Participant 1 or Participant 3 at the end of the pre-question (TRP) in both Japanese and English. Except for one English participant who kept looking at the robot, all the participants shifted their gaze towards the robot (Q3). In other words, when the robot looked at the participants, participants tended to look at the robot (Q3 (E), Q6 (E) and Q6 (J) in Table 2). We also noticed that some participants nod at this point. Some English participants also nodded towards the robot while producing minimal utterances such as 'Yes, he is' during Q6. In our initial observations, the English-speaking group nodded more often than the Japanese-speaking group (Table 3). This finding differs from previous literature that finds that Japanese people nod more than English speakers in conversation (e.g. Maynard 1990). We will explore this matter in the detailed analysis section.

Table 2. Directions of participants' gaze in English and Japanese at sentence ending (Q3, Q4, and Q6)

                  Q3 (E)        Q3 (J)        Q4 (E)        Q4 (J)        Q6 (E)        Q6 (J)
Image             52.4% (n=11)  86.3% (n=44)  81.0% (n=17)  86.3% (n=44)  28.6% (n=6)   76.5% (n=39)
Robot             38.1% (n=8)   5.9% (n=3)    4.8% (n=1)    5.9% (n=3)    52.4% (n=11)  21.6% (n=11)
Other direction   9.5% (n=2)    7.8% (n=4)    14.3% (n=3)   7.8% (n=4)    19.0% (n=4)   2.0% (n=1)


Table 3. Fraction of participants who nodded in English and Japanese at sentence ending (Q3, Q4, and Q6)

                    Q3            Q4           Q6
English speakers    28.6% (n=6)   0% (n=0)     33.3% (n=7)
Japanese speakers   3.9% (n=2)    9.8% (n=5)   11.8% (n=6)
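The paper reports these proportions without an explicit significance test; as a quick check, a reader could for example run Fisher's exact test on the Q6 counts from Table 3, taking the group sizes (21 English speakers, 51 Japanese speakers) from Section 3.2:

```python
# Fisher's exact test on the Q6 nodding counts from Table 3 (our own check,
# not from the paper): 7 of 21 English vs. 6 of 51 Japanese speakers nodded.
from scipy.stats import fisher_exact

table = [[7, 21 - 7],    # English: nodded, did not nod
         [6, 51 - 6]]    # Japanese: nodded, did not nod
odds_ratio, p_value = fisher_exact(table)
print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.4f}")
```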

(3) Key words

When the robot utters key words such as 'Arima hot springs', English speakers tended to shift their gaze towards the robot more often than Japanese speakers, and to nod more often (see Table 4). As Lee et al. (2004) discuss, nodding shows that participants are highly engaged in human-robot interactions.

Table 4. Percentage and number of participants who shifted their gaze towards the robot and fraction of participants who nodded in English and Japanese at the end of the key word 'Arima Hot Springs' (Q3)

                    Gaze shift to the robot   Nodding
English speakers    33.3% (n=7)               28.6% (n=6)
Japanese speakers   5.9% (n=3)                3.9% (n=2)

These results show a rather distinctive difference in the frequency of gaze shifts and nodding at key words (e.g. 'Arima Hot Springs') between Japanese speakers and English speakers. We discuss this result in Section 5.

5. Detailed analysis

In this section, we examine in detail participants' responses during the robot's use of a key word and a tag-question, in order to discuss why English speakers' frequency of responses may be higher than that of Japanese speakers in regard to robot gaze.

5.1 Comparing responses during the keyword (in Q3)

The previous section showed a large difference in regard to frequency of nodding and gaze shifts at the end of key words (e.g. ‘Arima hot springs’) between Japanese speakers and English speakers.


(1) Frequency of responses in regard to participants’ standing position

In regard to participants’ standing position, all participants in the position of Par-ticipant 3 showed some responses (gaze shift, utterance and nodding) towards the robot in English. In Japanese, as mentioned at Section 3.3, the robot says Arima Onsen (‘Arima Hot Springs’) in the middle of the sentence while looking at Par-ticipant 3. The response frequency of Participant 3 is the highest of the three par-ticipants (Table 5). This suggests that not only gaze shift but also placement of the keyword at the TRP might have an influence upon participants’ responses.

Table 5. Participant’s response frequency in regard to standing position by English and Japanese speakers

Participant 1 Participant 2 Participant 3

English Speaker 14.3%(n = 1)

71.4%(n = 5)

100%(n = 7)

Japanese Speaker 5.9%(n = 1)

5.9%(n = 1)

35.3%(n = 6)

What we can observe from these findings is that the participants whom the robot gazed toward tended to react more often. In English, the key word 'Arima Hot Springs' is in sentence-end position, in this case a TRP. It seems that both the placement of the key word in the utterance (middle of the sentence in Japanese) and the direction of the robot's gaze explain this result.

(2) Fraction of participants who nodded in relation to participants' standing position

Since nodding often expresses engagement in human interaction (Stivers 2008), we also examine the amount of nodding in regard to each participant’s standing position (Table 6).

Table 6. Fraction of participants who nodded in regard to standing position for English and Japanese speakers

                    Participant 1   Participant 2   Participant 3
English speakers    0               42.9% (n=3)     42.9% (n=3)
Japanese speakers   0               0               11.8% (n=2)

For English speakers, 3 out of 7 participants in each of the Participant 2 and Participant 3 positions nodded their heads. For Japanese speakers, two participants nodded towards the robot (both in the Participant 3 position). The response frequency of English-speaking participants is higher than that of Japanese-speaking participants in general, and this is particularly true with regard to nodding. Nodding among Japanese speakers is of particular interest to many scholars (e.g. Kogure 2007). It is often argued that the frequency of nodding is higher among Japanese than among English speakers. Our results contradict this research.

One possible explanation for these findings is that English speakers who live in the Kansai region of Japan (where the experiment with English speakers was conducted) have a better knowledge of places such as 'Arima Hot Springs' than do Japanese students who live in the Tokyo region (where the experiment with Japanese speakers was conducted). According to the post-experiment questionnaire, 16 of 21 English speakers had prior knowledge of both 'Hyogo Prefecture' and 'Luminarie' (see the right image of Figure 3), whereas only 13 of 51 Japanese speakers had prior knowledge of Luminarie (all had prior knowledge of Hyogo Prefecture). The participants' prior knowledge likely influenced their responses to an extent. However, it does not fully explain the low incidence of nodding of Japanese speakers.

Another explanation is that the combination of robot gaze and placement of a key word within the utterance is important for eliciting participants' responses in HRI. In particular, at a TRP participants shift their gaze according to the robot's gaze, and there is mutual gaze between the robot and participants. In Japanese, the key word is a noun modifying another noun, kankoochi ('tourist site'), and it was far from the TRP; Japanese participants rarely nodded or shifted their gaze. In English, the key word was at the sentence end (TRP), where hearers can display their engagement by nodding (Rossano 2013). Both the placement of the key word within the sentence and the gaze shift of the robot (cf. Section 3.3) seem to be the two main factors that have led to this difference between the two groups.

5.2 Comparing responses to tag-part of a tag-question (in Q6)

In this section, we analyze participant responses to the tag-question in Q6. As shown in the previous section, there is also a large difference in responses to the tag part of a tag-question between the two groups, despite a similarity in word order (see Section 3.3).

Here we examine whether the robot's gaze shift affects the participants' responses in regard to where they were standing in relation to the robot when the tag-question is uttered, both in English (Table 7) and Japanese (Table 8). We categorized participants' reactions into five groups, as follows.


Table 7. English-speaking participants' responses in regard to their standing position

                gaze only     gaze + nod    gaze + word   nodding       uttering words
Participant 1   28.5% (n=2)   28.5% (n=2)   14.2% (n=1)   0             0
Participant 2   42.8% (n=3)   14.2% (n=1)   0             14.2% (n=1)   14.2% (n=1)
Participant 3   14.2% (n=1)   0             0             28.5% (n=2)   14.2% (n=1)

First, we explain the English speakers' responses. When the robot says isn't he, five participants in the Participant 1 position engage in mutual gaze with the robot. Three participants uttered words (e.g. 'yes, he is' and 'huh') (Table 7). This shows that English-speaking participants are highly engaged in the HRI.

Table 8. Japanese-speaking participants' responses in regard to their standing position

                gaze only     gaze + nod   gaze + word   nodding       uttering words
Participant 1   29.4% (n=5)   5.9% (n=1)   0             11.8% (n=2)   0
Participant 2   11.8% (n=2)   5.9% (n=1)   0             0             0
Participant 3   0             5.9% (n=1)   0             5.9% (n=1)    0

For Japanese participants, the response frequency is not as high, apart from Participant 1's responses (Table 8).

From the comparison of these two tables, we can make three observations. First, in terms of gaze shift, the furthest-right participants produced more reactions in both English and Japanese; in other words, those whom the robot turned towards produced more reactions. Second, in terms of nodding, similarly to gaze shift, the furthest-right participants produced more reactions in English. Third, there are more overall reactions by English speakers, such as gaze shift, talking, and nodding, than by Japanese speakers. One possible reason for this is the difference in sentence structure between English and Japanese. For instance, in the tag-question, in Japanese the adjective yuumei ('famous') comes at the end of the sentence, functioning in the grammatical role of both adjective and predicate, whereas in English this adjective comes in the middle of the sentence, functioning as an adjective modifying the word 'president.' This difference results in a difference in the meaning of the question. In Japanese, the question builds on the assumption that the participants know that President Kennedy is famous, and the correct answer is based on subjective judgment. In English, the question asks whether or not the participants know President Kennedy, who is famous, and the answer is factual.

What we can observe from this is that there is a relationship between the robot's gaze and the participants' reactions. In fact, English-speaking participants whom the robot gazed towards produced more reactions through both gaze shift and nodding. This is also true for the Japanese-speaking participants with regard to gaze, but not in terms of nodding.

6. Discussion and conclusion

In order to address issues of multiparty question-answer sequences in regard to non-verbal behaviors from a cross-linguistic perspective, we compared response activities such as gaze shift and nodding between English-speaking and Japanese-speaking participants within a quiz robot experiment. We found that participants coordinate their gaze and nodding in relation to the robot's speech, gaze, and pointing.

First, we verified whether a grammatical feature affects the participants' reactions and how the reactions differ in each language group. Notwithstanding the polar-opposite sentence structures in Q4, we could not find much of a difference between the two groups.

By analyzing participants' responses in regard to their standing position, we identified the reasons why English participants display a higher frequency of engagement in Q3: (1) placement of the key word at the TRP of the utterance, and (2) robot gaze adequately coordinated with the utterance.

We have also observed a rather remarkable difference between the two groups in response to tag-questions, in spite of the similarity in word order between Japanese and English. Our findings are as follows: (1) the clarity of the question differs between English and Japanese; however, (2) the coordinated combination of robot gaze and the placement of words at interactionally significant places (cf. TRPs) in the utterance is much more important for eliciting participants' responses in general.

These findings support the belief that the coordination of verbal and non-verbal actions is a key element of HRI.

Finally, we discuss the implications of our research for social interaction and HRI. In question-response sequences in multiparty interactions, gaze coordinated with utterance plays a significant role in eliciting responses (e.g. nodding) from participants. There is a difference in engagement in regard to a participant's position: when we examined the participants' responses in regard to their standing position, the participants who were gazed at by the robot reacted more. This not only confirms the claim of previous research (Stivers 2010); it also seems that a participant's standing or seating position, which is correlated with the respective culture (Rossano, Levinson & Brown 2009), plays a much more crucial role in an individual's response in multiparty interactions. Standing position in multiparty settings should be considered further.

Some researchers argue that Japanese speakers tend to accept a robot as a communicative partner more than Westerners do (Nomura et al. 2007). Other researchers report that the frequency of nodding is higher among Japanese than English speakers (Maynard 1990). Our results do not confirm these findings, as English-speaking participants nodded more often, and by doing so seemed to display engagement. Participants show their engagement in interaction with a robot when the robot's utterance and bodily behavior such as gaze are coordinated appropriately. However, we should consider how participants recognize robot gaze and hand gestures. This suggests that we have to consider the height and size of a robot and the standing position of humans, especially in cross-cultural and cross-linguistic settings. Our results suggest that an in-depth understanding of social interaction in cross-cultural settings is key for successful implementation of human-robot interaction (HRI).

However, the number of English speakers in this experiment was rather small (21), and two of the question topics required knowledge of Japan. In order to verify our claim about the importance of verbal and non-verbal actions in multiparty interaction in cross-cultural settings, we have to replicate the study with a larger number of participants, not only in English but also in other languages, using this robot and robots of different sizes. In this way, we can hope to improve the implementation of robots in work settings such as museums in order to facilitate visitors' engagement.

Acknowledgement

This work was supported by JSPS KAKENHI Grant Numbers 21300316 and 23252001.

References

Bavelas, J.B., Coates, L., & Johnson, T. (2002). Listener responses as a collaborative process: The role of gaze. Journal of Communication, 52(3), 566–580.

CMU cross-cultural receptionist corpus. Retrieved from https://github.com/maxipesfix/receptionist_corpus

Diginfonews (2010). Museum Guide Robot: Diginfo [Video file]. Retrieved from https://www.youtube.com/watch?v=2cSCwMOxccs

Face API. Retrieved September 6, 2013, from http://www.seeingmachines.com/product/faceapi/

Fox, B.A., Hayashi, M., & Jasperson, R. (1996). Resources and repair: A cross-linguistic study of syntax and repair. Studies in Interactional Sociolinguistics, 13, 185–237.

Garfinkel, H. (1967). Studies in ethnomethodology. Englewood Cliffs, NJ: Prentice-Hall.

Goodwin, C. (1981). Conversational organization: Interaction between speakers and hearers. Language, thought, and culture. New York: Academic Press.


Goodwin, C. (2000). Action and embodiment within situated human interaction. Journal of Pragmatics, 32(10), 1489–1522. DOI: 10.1016/S0378-2166(99)00096-X

Goodwin, M.H. (1990). He-said-she-said: Talk as social organization among black children (Vol. 618). Bloomington: Indiana University Press.

Hayashi, M. (2010). An overview of the question-response system in Japanese. Journal of Pragmatics, 42(10), 2685–2702. DOI: 10.1016/j.pragma.2010.04.006

Heath, C., & Luff, P. (2000). Technology in action. Cambridge, UK: Cambridge University Press. DOI: 10.1017/CBO9780511489839

Iwasaki, S. (2009). Initiating interactive turn spaces in Japanese conversation: Local projection and collaborative action. Discourse Processes, 46(2–3), 226–246. DOI: 10.1080/01638530902728918

Jefferson, G. (1984). Transcript notation. In J.M. Atkinson & J. Heritage (Eds.), Structures of social action: Studies in conversation analysis (pp. ix–xvi). Cambridge, UK: Cambridge University Press.

Kendon, A. (1967). Some functions of gaze-direction in social interaction. Acta Psychologica, 26, 22–63.

Knight, H., & Simmons, R. (2012). Estimating human interest and attention via gaze analysis. In Proceedings of the 12th International Conference on Intelligent Virtual Agents (IVA’12, pp. 245–251).

Kogure, M. (2007). Nodding and smiling in silence during the loop sequence of backchannels in Japanese conversation. Journal of Pragmatics, 39(7), 1275–1289.

Lee, C., Lesh, N., Sidner, C.L., Morency, L.P., Kapoor, A., & Darrell, T. (2004). Nodding in conversations with a robot. In CHI'04 Extended Abstracts on Human Factors in Computing Systems (pp. 785–786). ACM.

Makatchev, M., Simmons, R., & Sakr, M. (2012). A cross-cultural corpus of annotated verbal and nonverbal behaviors in receptionist encounters. In Gaze in HRI: From Modeling to Communication, workshop at the 7th ACM/IEEE International Conference on Human-Robot Interaction (HRI), Boston, USA, March 2012.

Maynard, S.K. (1990). Conversation management in contrast: Listener response in Japanese and American English. Journal of Pragmatics, 14(3), 397–412. DOI: 10.1016/0378-2166(90)90097-W

Mutlu, B., Shiwa, T., Kanda, T., Ishiguro, H., & Hagita, N. (2009). Footing in human-robot conversations: How robots might shape participant roles using gaze cues. In Proceedings of the 4th ACM/IEEE International Conference on Human-Robot Interaction (pp. 61–68). ACM.

Nakano, Y., & Rehm, M. (2009). Multimodal corpus analysis as a method for ensuring cultural usability of embodied conversational agents. In Human centered design (pp. 521–530). Berlin, Heidelberg: Springer. DOI: 10.1007/978-3-642-02806-9_60

Nomura, T., Kanda, T., Suzuki, T., Han, J., Shin, N., Burke, J., & Kato, K. (2007). Implications on humanoid robots in pedagogical applications from cross-cultural analysis between Japan, Korea, and the USA. In Robot and Human Interactive Communication, 2007. RO-MAN 2007. The 16th IEEE International Symposium on (pp. 1052–1057). IEEE. DOI: 10.1109/ROMAN.2007.4415237

Pitsch, K., Vollmer, A.L., & Muhlig, M. (2013). Robot feedback shapes the tutor's presentation: How a robot's online gaze strategies lead to micro-adaptation of the human's conduct. Interaction Studies, 14(2), 268–296.

Rossano, F. (2013). Gaze in conversation. In J. Sidnell & T. Stivers (Eds.), The handbook of conversation analysis (pp. 308–329). Malden, MA: Blackwell.

Rossano, F., Brown, P., & Levinson, S.C. (2009). Gaze, questioning and culture. In J. Sidnell (Ed.), Conversation analysis: Comparative perspectives (pp. 187–249). Cambridge, UK: Cambridge University Press.


Sacks, H., Schegloff, E.A., & Jefferson, G. (1974). A simplest systematics for the organization of turn-taking for conversation. Language, 50(4), 696–735. DOI: 10.2307/412243

Sacks, H. (1987). On the preferences for agreement and contiguity in sequences in conversation. In G. Button & J.R.E. Lee (Eds.), Talk and social organization (pp. 54–69). Clevedon, UK: Multilingual Matters.

Schegloff, E.A. (2007). Sequence organization in interaction: A primer in conversation analysis (Vol. 1). Cambridge University Press.

Suchman, L. (2006). Human-machine reconfigurations: Plans and situated actions. Cambridge, UK: Cambridge University Press.

Stivers, T. (2008). Stance, alignment, and affiliation during storytelling: When nodding is a token of affiliation. Research on Language and Social Interaction, 41(1), 31–57.

Stivers, T. (2010). An overview of the question-response system in American English conversation. Journal of Pragmatics, 42(10), 2772–2781.

Stivers, T., & Rossano, F. (2010). Mobilizing response. Research on Language and Social Interac-tion, 43(1), 3–31.

Tanaka, H. (1999). Turn-taking in Japanese conversation: A study in grammar and interaction (Vol. 3). Amsterdam: John Benjamins.

Traum, D., Aggarwal, P., Artstein, R., Foutz, S., Gerten, J., Katsamanis, A., Leuski, A., Noren, D., & Swartout, W. (2012). Ada and grace: Direct interaction with museum visitors. In Y. Nakano et al. (Eds.), Proceedings of the 12th International Conference on Intelligent Virtual Agents (IVA’12, pp. 245–251). Heidelberg: Springer.

Yamazaki, A., Yamazaki, K., Burdelski, M., Kuno, Y., & Fukushima, M. (2010). Coordination of verbal and non-verbal actions in human-robot interaction at museums and exhibitions. Journal of Pragmatics, 42(9), 2398–2414. DOI: 10.1016/j.pragma.2009.12.023

Yamazaki, A., Yamazaki, K., Kuno, Y., Burdelski, M., Kawashima, M., & Kuzuoka, H. (2008). Precision timing in human-robot interaction: Coordination of head movement and utterance. In Proceedings of the Twenty-Sixth Annual SIGCHI Conference on Human Factors in Computing Systems (CHI'08, pp. 131–140). ACM. DOI: 10.1145/1357054.1357077

Yamazaki, A., Yamazaki, K., Ohyama, T., Kobayashi, Y., & Kuno, Y. (2012). A techno-sociological solution for designing a museum guide robot: Regarding choosing an appropriate visitor. In Proceedings of the Seventh Annual ACM/IEEE International Conference on Human-Robot Interaction (HRI’ 12, pp. 309–316). ACM.

Yamazaki, K., Yamazaki, A., Okada, M., Kuno, Y., Kobayashi, Y., Hoshi, Y., & Heath, C. (2009). Revealing Gauguin: Engaging visitors in robot guide's explanation in an art museum. In Proceedings of the 27th International Conference on Human Factors in Computing Systems (CHI'09, pp. 1437–1446). ACM. DOI: 10.1145/1518701.1518919

Authors’ addresses

Akiko Yamazaki, Tokyo University of Technology, 1404–1 Katakura, Hachioji, Tokyo, Japan 192–0982. [email protected]

Keiichi Yamazaki, Saitama University, 255 Shimo-Okubo, Sakura-ku, Saitama-shi, Japan 338–8570. [email protected]

Keiko Ikeda, Kansai University, 3-3-35 Yamate-cho, Suita-shi, Osaka, Japan 564–8680. [email protected]

Matthew Burdelski, Osaka University, 1–5 Machikaneyama, Toyonaka-shi, Osaka, Japan 560–8532. [email protected]

Mihoko Fukushima, Saitama University, 255 Shimo-Okubo, Sakura-ku, Saitama-shi, Japan 338–8570. [email protected]

Tomoyuki Suzuki, Saitama University, 255 Shimo-Okubo, Sakura-ku, Saitama-shi, Japan 338–8570. [email protected]

Miyuki Kurihara, Saitama University, 255 Shimo-Okubo, Sakura-ku, Saitama-shi, Japan 338–8570. [email protected]

Yoshinori Kuno, Saitama University, 255 Shimo-Okubo, Sakura-ku, Saitama-shi, Japan 338–8570. [email protected]

Yoshinori Kobayashi, Saitama University, 255 Shimo-Okubo, Sakura-ku, Saitama-shi, Japan 338–8570; JST PRESTO. [email protected]

Authors’ biographical notes

Akiko Yamazaki is an Associate Professor of sociology in the Department of Media at Tokyo University of Technology.

Keiichi Yamazaki is a Professor of sociology at Saitama University.

Keiko Ikeda is an Associate Professor and a coordinator for the Japanese Language Program Division of International Affairs, Kansai University.

Matthew Burdelski is Associate Professor of Japanese Linguistics in the Graduate School of Letters at Osaka University.

Mihoko Fukushima is a researcher at Saitama University.

Tomoyuki Suzuki is an undergraduate student at Saitama University.

Miyuki Kurihara is an undergraduate student at Saitama University.

Yoshinori Kuno is a Professor at Saitama University.

Yoshinori Kobayashi is an Assistant Professor in the Department of Information and Computer Sciences, Saitama University.