
A Wizard-of-Oz platform for embodied conversational agents


By Edward Brown and Neil Barrett*

A low-cost prototyping environment for experimenting with embodied conversational agents is discussed. The platform allows modeling and experimenting with different agent constructs and protocols prior to significant investment in the construction of the agent environment. Problems in the design of such a platform include the substantial number of agent controls needed and the flexibility required to represent the constructs of different theories, protocols and target environments as they are introduced and developed. These problems are addressed by augmenting a movie clip manager with a general drawing palette as a design tool. The result is a prototyping environment which simulates multiple agents on a desktop while allowing arbitrary notational conventions. The current version does not render multiple agents in a shared virtual environment, but the protocol-based architecture is amenable to such extensions. In the meantime, valuable results regarding the social character of multiple agent interaction can be explored with the existing tool. Copyright © 2006 John Wiley & Sons, Ltd.

Received: 10 April 2006; Revised: 2 May 2006; Accepted: 10 May 2006

KEY WORDS: agent; conversational; prototyping; Wizard-of-Oz

Introduction

Prototyping environments for user interfaces typically adopt a notation for describing the interaction between the user and the system. Whether this is a state-machine, flowchart or storyboard [1–3], the choice of notation tends to be structured. This can have the drawback of restricting choices for the character of the interface produced. For example, if the system prototypes conversational agents (virtual people), it may target constructs such as tone, facial expression or body gestures to reflect psychosocial elements of the conversation. The resulting over-structured environments [1–7] tend to be brittle (i.e., tend to be restricted to a particular modeling approach, social or linguistic theory). Furthermore, higher level psychological and affective constructs, such as attitudes, beliefs, opinions, and disposition, are generally difficult to model and test, particularly if the scenario involves multiple actors and multiple users.

For the purpose of experimenting with different embodied conversational agent (or agent) personalities, different theoretical notations, and different combinations of multiple agents, we have developed a low-cost prototyping environment which is not limited to a prescriptive notation. This flexibility allows the experimenter to deal with conversations which are too complex for conventional notational constructs. In addition, the experimenter can change to a different notation when the theoretical constructs underlying their investigation change (e.g., from a speech act theory to a group dynamics theory).

The challenge that is discussed in the remainder of

this paper is the design of a flexible prototyping

environment, which will allow fast and effective control

of multiple agents by a ‘behind the scenes’ operator. In

addition to convenient layout of agent controls, the tool

is amenable to a variety of experimental models and

theories, which evolve as the research progresses. Our

system constitutes a simple and novel solution within

these constraints.

The discussion herein is separated into five major sections. The section ‘The Wizard Interface’ describes the tool’s main interface; the section ‘Prototyping an Agent Interface’ discusses the steps involved during the prototyping process; the section ‘Programming of Modifications’ covers enhancing the prototyping environment by script programming; the section ‘Laboratory Environment’ describes the physical setup; and the section ‘Implementation Technologies’ gives the details behind the laboratory environment.

COMPUTER ANIMATION AND VIRTUAL WORLDS

Comp. Anim. Virtual Worlds 2006; 17: 249–257

Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/cav.129

*Correspondence to: N. Barrett, Department of Computer Science, Memorial University of Newfoundland, St. John’s, NF, Canada, A1B 3X5. E-mail: [email protected]

The Wizard Interface

The Wizard-of-Oz approach to interface design provides a way to prototype complex or intelligent interfaces by allowing a human (wizard) to operate ‘behind the scenes’, simulating some of the complexity of the interface design before it is actually built. Our tool, named WOZECA, extends the Wizard-of-Oz concept to the visual representation of an agent. In its simplest interpretation, WOZECA is a specialized movie player that allows a wizard to select movies which are immediately or subsequently played for a user. In broad concept, WOZECA simulates the participation of multiple conversational agents, while the behaviors of the agents and subject are observed and captured for later analysis. Agents are simulated by real actors, who are filmed and whose clips are segmented prior to user experiments. The tool uses an inventory, or bank, of prerecorded video clips as available responses of the agent.

One variation of our Wizard-of-Oz tool for conversational agents is illustrated in Figure 1. Agent behaviors are represented by buttons which the wizard lays out on her operating canvas. These buttons can be created by the wizard or copied from an inventory of clips which is also available on screen. During an experimental session with a participant (subject), the wizard activates agent behaviors by pressing the appropriate buttons, directing agents to interact with the users or with each other on the participant’s screen.

The wizard may lay out the agent responses (i.e., their corresponding buttons or movie clips) according to whatever organizational construct is appropriate for their experiment. For example, the clips can be arranged according to their corresponding speech act, their role or stage in the dialog, the tone or attitude of the agent, or any other factor or combination of factors which is relevant to the experiment.

Figure 1. Design using the wizard’s palette.

The wizard tool has a standard drawing palette to lay out 2½-D graphic objects on the same canvas as the buttons. By mixing arbitrary graphic elements with clip buttons, theory- or practice-driven notation can be added to the layout of clip elements on the screen. The wizard may adopt a standard notation or develop a new one; this relies on the adoption of conventions by the wizard as opposed to forcing a particular structure on the design. To support the development and/or adoption of conventions by the wizard, the tool has multiple graphic features. For example, color might represent the agent attitude, while ellipses could be used to illustrate stages in the dialog.

Using the graphical elements (such as color, position, size, text fonts, shapes) for grouping behaviors, conversational state, or particular themes increases demand on the screen real estate. The potential for this flexibility to produce clutter is compounded when simulating multiple agents simultaneously. To deal with screen real estate limitations, quick-windows were added, which the wizard can pop up and use as a layout canvas in the same manner as the main canvas. Quick-windows were initially conceived for thematic groupings of clips, but they can extend any modeling notation by collapsing part of a dialog diagram. For example, alternative responses can be collected in a quick-window rather than consume a large screen area for a conceptually simple part of the interaction. A simple gesture (currently, a mouse click) converts a clip button into a quick-window button. Once the quick-window is created, copy/paste operations can be used to move or duplicate parts of the notation.

Prototyping an Agent Interface

Typical user interface prototyping involves a sequence of design, test and analyze phases [2]. The present section will discuss how these phases apply to the WOZECA tool in contrast to more conventional approaches.

The design phase begins with the creation of a script that contains a set of text-statements. The text-statements are subsequently recorded with consumer-grade digital video tools and equipment, resulting in a set of video clips (short movies). The video clips are added to an inventory, which the wizard then lays out as buttons on the WOZECA canvas in preparation for the pilot test.

Also part of the design phase is the layout of notation and clip buttons within WOZECA; the wizard prepares the organization she anticipates will be suitable for the upcoming interaction with the test participant. Any notation which can be represented with the drawing palette tools may be adopted. Figure 2, for example, illustrates a notation based loosely on a speech act model from Reference [8]. It is this flexibility which distinguishes our WOZECA system from conventional prototyping environments such as SUEDE [2].

Designed to prototype speech interfaces, SUEDE provides the wizard with conversational responses, rigidly structured in a form similar to a state-machine, where each response operates as a state transition. To operate SUEDE during a participant session, the wizard selects audio clips from a list. After the system produces the corresponding output, which represents a response from the simulated agent, the system transitions to the next list.

SUEDE illustrates our concern with over-structured prototyping environments. Such tools confine the designer to a rigid representation of the interaction between system and user. Even conventional graphical user interface prototyping tools (without conversational agents), such as CrossWeaver [3] and DEMAIS [1], depend on a structuring mechanism such as a storyboard or flowchart.

WOZECA’s approach provides general tools to organize and structure the prototyping environment rather than impose a particular structure on the agent-user conversation. Our Wizard-of-Oz tool allows the wizard to represent different structures or to proceed without any preconception of the appropriate structure. Even if there are repeated and observable tendencies [9], with WOZECA the conversation need not follow a predetermined path. More concretely, WOZECA furnishes mechanisms such as a drawing palette which help a wizard organize the available clips without an underlying structure such as a finite state machine.

Having laid out the conversational design, the wizard is prepared to move into the testing phase. In this phase, the prototype is used as a pilot, and the participant is present. While a user interacts with the system, the wizard pushes buttons and manipulates the layout to launch clips she created during the previous design phase. When the wizard requests a clip by pressing a button while another clip is already playing, the new clip is queued. Play requests queue on the user’s computer, thus preserving the order in which buttons are pressed. This allows the wizard to compose agent behaviors from several clips.
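The queuing behavior described above can be sketched as a first-in, first-out queue. This is only an illustration under assumptions: the class name `ClipQueue`, its methods and the clip names are hypothetical and are not taken from the WOZECA code.

```python
from collections import deque


class ClipQueue:
    """FIFO queue of clip requests for one agent window (hypothetical helper)."""

    def __init__(self):
        self.pending = deque()   # clips waiting to be played, in press order
        self.playing = None      # clip currently on screen, if any

    def request(self, clip_name):
        """Called when the wizard presses a clip button."""
        if self.playing is None:
            self.playing = clip_name        # nothing playing: start at once
        else:
            self.pending.append(clip_name)  # otherwise keep the press order

    def clip_finished(self):
        """Called when the current clip ends; starts the next one, if any."""
        self.playing = self.pending.popleft() if self.pending else None
        return self.playing


# The wizard composes one behavior from three clips pressed in quick succession.
queue = ClipQueue()
for name in ["greeting", "nod", "ball_is_blue"]:
    queue.request(name)

played = [queue.playing]
while queue.clip_finished() is not None:
    played.append(queue.playing)
```

Because requests are appended at one end and consumed from the other, the clips play in exactly the order in which the buttons were pressed.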

The wizard may choose to merely activate the agent behaviors in response to the participant’s actions, or the wizard may rearrange and reorganize the layout while the testing is ongoing, for WOZECA operates identically during the design and test phases. As a simple example, buttons could be deleted from the layout once they are used.

Finally, an analysis is performed using information

gathered during the test phase which may include

information such as event logs or a video record of the

user. The data analysis contributes to the evaluation of

the conversational agent interface design. In principle,

there is no distinction between WOZECA and other

systems with respect to analysis.

Programming of Modifications

WOZECA’s flexible nature has implications beyond the use of different organizational notations. The ubiquity of the palette’s features blurs the distinction between designer, developer, and application user, as the wizard moves seamlessly between the activities of canvas layout (design) and interacting with participants (test). The same individual may undertake all of these as part of the investigation of their particular research programme. The layout and design of the code and the wizard canvas area may in fact represent elements of the theory that is being tested, which change periodically, or even evolve as a program of research evolves.

In some cases, the notational flexibility of the drawing tools may not be sufficient for the envisioned layout. Since the wizard tool interface is built using a scripting language, designers with programming skills can modify the functionality of the tool. Our wizard population is drawn from our research group, many of whom are competent programmers (or have access to team members with programming skills) and may have significant software development experience.

Figure 2. A notation based on speech act theory.

Systems of this character (that blend design, programming and end-use) have been studied and created under terms such as ‘end-user programming’ [10], ‘programmable design environments’ [11], and ‘integrated’ user environments [12]. (An example of such a system is Mathematica [13].) These systems blend design and user environments to different degrees, extending the notion of what the user does into the design and re-design of the environment itself.

Figure 3. A wizard’s interface during the test phase.

Figure 4. A graphical depiction of the WOZECA system.

The code architecture has been structured to support end-user programming, with programming hooks at anticipated modification points. For example, visual feedback might be useful to indicate which agent behaviors have occurred; consequently, a specific program event is fired once a behavior clip is triggered. The wizard-programmer can script a response to this event, for example, moving, coloring, and/or deactivating the clip button corresponding to the particular agent behavior.
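A hook of this kind might look roughly as follows. The sketch is written in Python for illustration only; the wizard tool itself is scripted in Revolution, and the names `HookRegistry`, `ClipButton`, `clip_triggered` and `mark_used` are assumptions made for this example rather than identifiers from the tool.

```python
from collections import defaultdict


class HookRegistry:
    """Maps event names to the handlers scripted by the wizard-programmer."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def on(self, event_name, handler):
        """Attach a handler at a modification point."""
        self._handlers[event_name].append(handler)

    def fire(self, event_name, **details):
        """Called by the tool when the event occurs."""
        for handler in self._handlers[event_name]:
            handler(**details)


class ClipButton:
    """Minimal stand-in for a clip button on the wizard canvas."""

    def __init__(self, clip_name):
        self.clip_name = clip_name
        self.enabled = True
        self.color = "white"


buttons = {name: ClipButton(name) for name in ("greeting", "redirect")}
hooks = HookRegistry()


def mark_used(clip_name):
    """Scripted response: grey out and deactivate the button that was used."""
    button = buttons[clip_name]
    button.color = "grey"
    button.enabled = False


hooks.on("clip_triggered", mark_used)

# The tool fires the event once the wizard triggers the 'greeting' clip.
hooks.fire("clip_triggered", clip_name="greeting")
```

Keeping the handler separate from the code that fires the event lets a wizard change the visual feedback without touching the clip-playing logic.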

As a consequence of its flexibility and end-user programming, our wizard tool is continually evolving. Both the layout notation on the canvas and the tool code are revised for the particular experimental questions under examination. Unlike conventional program development, there is no anticipation of working towards a completed application, and no version of the tool is authoritative. Each is appropriate to the investigation at hand.

This evolution partly relies on the programming skills of our users. While the effects are similar to those of end-user programming, our group has had access to professional-quality programming skills among our users. For us, it is more a matter of providing possibilities to existing designers. We do not believe that any programming is necessary to make effective use of the tool, but it does extend the designer’s repertoire. For wizard/operators who choose not to use the scripting capability, the provision of standard drawing tools to support the canvas layout provides flexibility well beyond fixed notation. Thus, even without relying on programming features, the wizard tool has the requisite flexibility to support multiple notations and theories, and allows the wizard to rapidly configure the layout for specific experiments.

Implicit in this approach are two claims that will only be tested as we gain experience working with this evolutionary approach. The first is that the standard drawing tool paradigm and features (group/copy/cut/paste/drag/drop) provide sufficient efficacy and usability in comparison to a tool that enforces a fixed notation. That is, the loss of specialized features and gestures for a particular notation is more than compensated for by the flexibility of the generalized drawing tools. The second claim is that the additional burden on the designer to think about the notation (and possible programming effort) will not distract from the designer’s main task of creating conversational agents. On the contrary, we expect the extra design features will enrich this main task.

The remainder of this section relates some of our experience (piloting several sessions) with managing the flexibility of WOZECA. To date, restricting the conversation topic and imposing a time limit (10 minutes) improved the wizard’s ability to maintain a dialogue with the user. We separated conversational tendencies into seven categories (see Table 1). The two most important categories are greeting and redirection, for these categories keep the conversational topic within bounds that are addressable by the recorded clips. A proper greeting constrains the conversation and provides a direction for the first user interaction, and the redirection category consists of statements that allow the wizard to change the topic when it deviates from the expected. Applying a color scheme to the categories renders approximately 70 buttons (see Figure 3), each a movie clip, quite manageable. With this scheme the wizard finds and activates ECA responses to user queries and statements almost instantly. The current scheme’s categories are greeting, affirmative, negative, confused, and redirection in one color, plus a color for specific information and a color for general information. As the number of clips and the complexity of the conversation grow, the wizard can make use of notational capabilities (quick-windows, drawing tools, etc.) to further organize the interaction with more complex designs.

Category      Description                       Example
Greeting      Start or stop a conversation      Hello
Affirmative   Positive response                 Yes
Negative      Negative response                 No
Neutral       Neutral response                  Maybe
Confused      Problem with the user’s input     I did not understand...
Redirection   Control and direct conversation   Consider this...
Information   Information statements            The ball is blue.

Table 1. Dialogue categories

The wizard’s singular difficulty is a non-linear style of conversation. When the user asks two or more questions at once (e.g., ‘Is the ball green and the square blue?’), it is often difficult to answer both questions in a natural manner. Since users seem to tolerate ‘breakdown’ in the conversation, one mechanism for handling non-linear conversation is to ignore all but one aspect, rendering the conversation linear. Thus, if the user asks, ‘Is the ball green and the square blue?’, the embodied conversational agent could answer, ‘The square is not blue. Maybe we can talk about the green diamonds’.

What we see evolving through Table 1, including its color and spatial representation on the wizard canvas, is an unplanned (if rudimentary) theory of conversation. We also envision WOZECA being used in a more prescriptive fashion by testing existing conversational theories (such as speech act theory [8]).

Laboratory Environment

Low cost was a principal objective in developing our environment. WOZECA is embedded in a low-cost portable lab environment to facilitate embodied conversational agent research (see Figure 4). The production of the movie clips and the configuration of the clips into windows on the participant’s computer screen is what creates the illusion of autonomous conversational agents. Agents appear in their prescribed window configuration and their related audio track (typically speech) is heard via attached speakers. Participants appear to communicate with the agents by typing in a chatter box window, located in the bottom portion of the screen, although they are really communicating with the wizard. The chatter box resembles Instant Messaging or Internet Relay Chat in functionality. We have considered the addition of a microphone for voice communication, but for the moment we use a keyboard interface to limit the reaction time required of the wizard. The participant set-up currently has no use for a mouse.

A separate video record is made of the participant’s

activity. The wizard also has a view through the

participant camera (a ‘web-cam’) so she can make

judgments about the flow of conversation based on the

participant’s reactions.

Consistent time stamps on the different data acquisition equipment are critical for proper analysis, since data capture occurs with different applications on different machines and the data from these sources must be collated after the experimental sessions. Timing data from the video feed, the wizard tool, the agent behaviors and the participant’s input is synchronized as part of experimental data capture.
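Once the clocks agree, collating the separate logs amounts to merging them by time stamp. The following is a minimal sketch, assuming each source already writes a time-ordered list of `(seconds, source, description)` entries measured from a shared origin; the log contents and the function name `collate` are invented for illustration.

```python
import heapq

# Hypothetical per-source logs, each already sorted by time stamp (seconds).
video_log = [
    (0.0, "video", "recording started"),
    (12.4, "video", "participant smiles"),
]
wizard_log = [
    (3.1, "wizard", "pressed 'greeting'"),
    (9.8, "wizard", "pressed 'redirect'"),
]
chat_log = [
    (7.5, "participant", "typed: is the ball green?"),
]


def collate(*logs):
    """Merge time-ordered logs into one timeline, ordered by time stamp."""
    return list(heapq.merge(*logs, key=lambda entry: entry[0]))


timeline = collate(video_log, wizard_log, chat_log)
for seconds, source, description in timeline:
    print(f"{seconds:6.1f}s  {source:<12} {description}")
```

If the clocks on the machines drift, entries from different sources interleave incorrectly, which is why consistent time stamps matter before any merge of this kind.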

Implementation Technologies

The production of movie clips is deliberately separated from the wizard tool. The best available tools and environment for this activity may change, and even the type of tool used to construct virtual agents may change. Currently, we use a digital video camera (Canon Elura 85) to record real actors directly to a hard drive. The video is edited post capture, and an XML inventory file is created, making the movie clips available to the wizard tool (this is the most time-consuming part of the design phase). Once the XML inventory file is available, the wizard loads the file into an inventory window, which is available to the wizard at any time, although it is most commonly used during the design phase (the layout of the wizard canvas in preparation for participant tests).
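The paper does not give the layout of the XML inventory file, so the following sketch rests entirely on an assumed schema: an `<inventory>` root with `<clip>` elements carrying hypothetical `name`, `category` and `file` attributes. It only illustrates how such a file could be read into a lookup table of clips.

```python
import xml.etree.ElementTree as ET

# Assumed inventory format; the element and attribute names are hypothetical.
INVENTORY_XML = """\
<inventory>
  <clip name="greeting_hello" category="Greeting" file="clips/hello.avi"/>
  <clip name="confused_repeat" category="Confused" file="clips/repeat.avi"/>
</inventory>
"""


def load_inventory(xml_text):
    """Return a dict mapping each clip name to its category and movie file."""
    root = ET.fromstring(xml_text)
    return {
        clip.get("name"): {
            "category": clip.get("category"),
            "file": clip.get("file"),
        }
        for clip in root.iter("clip")
    }


inventory = load_inventory(INVENTORY_XML)
for name, info in inventory.items():
    print(name, info["category"], info["file"])
```

Keeping the inventory in a separate file of this kind fits the stated separation between clip production and the wizard tool: the clips can be re-recorded or re-edited without changing the tool.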

WOZECA implements a cross-platform client-server architecture for testing prototypes. The wizard’s tool, written using Revolution 2.6, connects to a server, written in the Python language, which acts as an abstraction layer hiding the details of the movie player (currently, the open source MPlayer application). Upon receiving a TCP connection from the wizard tool, the server dispatches a thread to handle the simple low-bandwidth protocol. As an example of this protocol, the server responds to the ‘play’ command by queuing or starting the appropriate movie clip. The server is designed to manage multiple connections simultaneously in anticipation of multiple wizards when the experiments advance to larger agent scenarios. Since the agent movie clips reside on the server (user’s) machine, the network is only burdened with the participant video feed and the simple command protocol. WOZECA is expected to perform well with limited bandwidth (e.g., WIFI). In the near future, a wireless computer on a mobile platform will provide a portable experimental station which could be moved to suitable locations in our institution, bringing the lab to the participants.
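The thread-per-connection design can be sketched with Python’s standard `socketserver` module. This is not the WOZECA server: the line-based wire format (`play <clip>` answered by `started`, `queued` or `error`) and the names `PlayerState`, `WizardHandler`, `ClipServer` and `demo` are assumptions, and the movie player is replaced by a stand-in that only tracks state.

```python
import socket
import socketserver
import threading
from collections import deque


class PlayerState:
    """Stand-in for the movie player: tracks the current clip and a queue."""

    def __init__(self):
        self.lock = threading.Lock()  # several wizard threads may call play()
        self.current = None
        self.pending = deque()

    def play(self, clip):
        with self.lock:
            if self.current is None:
                self.current = clip
                return "started"
            self.pending.append(clip)
            return "queued"


class WizardHandler(socketserver.StreamRequestHandler):
    """Runs in its own thread for each connected wizard tool."""

    def handle(self):
        for raw in self.rfile:  # one command per line
            parts = raw.decode("utf-8").split()
            if len(parts) == 2 and parts[0] == "play":
                reply = self.server.state.play(parts[1])
            else:
                reply = "error"
            self.wfile.write((reply + "\n").encode("utf-8"))


class ClipServer(socketserver.ThreadingTCPServer):
    allow_reuse_address = True
    daemon_threads = True  # handler threads do not block shutdown

    def __init__(self, address):
        super().__init__(address, WizardHandler)
        self.state = PlayerState()


def demo():
    """Start a server on a free local port and send it three commands."""
    with ClipServer(("127.0.0.1", 0)) as server:
        threading.Thread(target=server.serve_forever, daemon=True).start()
        replies = []
        with socket.create_connection(server.server_address) as conn:
            with conn.makefile("rw", encoding="utf-8", newline="\n") as stream:
                for command in ("play greeting", "play nod", "stop"):
                    stream.write(command + "\n")
                    stream.flush()
                    replies.append(stream.readline().strip())
        server.shutdown()
    return replies


if __name__ == "__main__":
    print(demo())
```

Only short text commands cross the network in this arrangement, which is consistent with the low-bandwidth claim: the clips themselves stay on the machine that plays them.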

WOZECA runs on two Gentoo Linux (2.6.11-r1) systems with 64-bit AMD Athlon 3000 processors, one io Vibe IEEE 1394 card (NEC uPD72874 chip) connected to a Canon Elura 85 camera and one Nvidia NV43 GeForce 6200 dual headed graphics card. The wizard tool is written in Revolution 2.6 driving a Python server for clip recording/playing. Kino is a simple open-source video editor. Outlay for the installation (excluding software development and movie production expenses) is under 3500 Canadian dollars. There are some stability issues with some of the software, particularly with Kino versions prior to 0.8, but these are within the range of typical research installations, and in any case do not come into play during participant sessions.

Conclusions

We have built an environment for the study of complex conversational agent interactions, based on a Wizard-of-Oz simulation of agents. It serves as a proof-of-concept that a low-cost, mobile and flexible environment can address genuine questions regarding complex human-agent interaction. With careful preparation, experiments can be conducted on the efficacy of artificial personality constructs and agent protocols before expensive and complex systems are actually built and tested. The environment is currently limited to representing agents with movie clips, although the underlying protocol could easily be extended to activate rendering of agents. This means that the agent behaviors tend to be those that can be readily filmed as independent agent movie clips, as opposed to agents physically interacting within one movie clip. Eventually, we anticipate adding clips in which agents interact physically as well.

We hope that the novel design concepts incorporated into WOZECA will enhance both the capabilities of the prototyping environment and the conversational agent prototypes built with it. The design of WOZECA has emphasized a simple participant model, a highly configurable wizard tool and the ability to capture both experimental data and the wizard tool configuration, making them available for post-experiment analysis. Our environment is low cost and easily replicated. Because the protocols and data capture have been built on standard network technology with low-bandwidth requirements, the environment is highly mobile, and the participant stations and the wizard’s station can be easily relocated.

The primary advantage of our system, however, is that it does not restrict the investigator to any particular representation of the human-agent conversation. This flexibility in representing different structures means the designer is not restricted to a built-in notation, but can adopt their own notation or adhere to known conventions. Our next step is to expand the scope of the experiments we conduct, which will test our claims regarding the flexibility of this environment.



Authors’ biographies:

Dr Edward Brown is an associate professor at Memorial’s Department of Computer Science with principal research interests in user interface agents, intellectual property, and privacy issues related to technology. His teaching responsibilities are primarily software design courses. He has an undergraduate degree from Memorial University, MSc and PhD from the University of Toronto, and LL.B. from the University of Victoria, Canada. Dr Brown has worked in the area of toolkit implementation and user interface design in North America and Europe, and with intellectual property specialty firms in Canada and the US. Dr Brown was called to the Bar in June of 2004, and is developing a practice in technology law.

Neil Barrett is completing a Master of Science degree at Memorial University of Newfoundland, with the goal of continuing in academia, beginning with a PhD.
