
Proceedings of the 2nd International Conference on Information Technology for Application (ICITA 2004)

A Distributed Architecture for Knowledge-Based Interactive Robots

Pattara Kiatisevi, Non Member, Vuthichai Ampornaramveth, and Haruki Ueno, Member, IEEE

Abstract— The development of an interactive robot usually involves combining components into an integrated system. This paper presents a distributed architecture for interactive robots. Various kinds of components are integrated, including robotics devices, for example the robot's neck and arms, and software components, e.g., face recognition and speech recognition. The architecture enables inter-operation among these components regardless of their computing platforms and programming languages by utilizing XML-RPC [1]. The SPAK Knowledge Manager [2] is integrated into the architecture and is responsible for intelligence tasks. An interactive robot is developed, and a multi-modal human-robot interaction experiment is conducted both on the simulation platform and in a real environment with a humanoid robot.

Index Terms— Architecture for Interactive Robot, Multi-modal Human-Robot Interaction, Symbiotic Information Systems, SPAK

I. INTRODUCTION

Computers and robots are more and more accessible to humans. In our research on Symbiotic Information Systems (SIS) [3], we believe that, instead of training humans to be computer-literate in order to use information systems, the systems should be developed so that they are able to interact with us in a human way.

Although robotics technology has made impressive progress in the past decades, most robots produced are merely integrations of mechanical and electronic components with little intelligence. SIS focuses on combining these with intelligence technology to achieve a robot that interacts with people in a human-friendly and symbiotic manner.

We emphasize the knowledge-based approach and have developed a general-purpose knowledge software platform called the Software Platform for Agents and Knowledge, or SPAK [2], in order to support the realization of such robots.

Useful applications are, e.g., service and friend robots for elderly persons, as the number of elderly persons is increasing in many parts of the world [4], [5].

Meanwhile, researchers are struggling to make machines more intelligent. A lot of training data is needed, e.g., for applications like speech processing and language understanding.

Pattara Kiatisevi is with the Intelligent Systems Research Division, National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, and is a student of the Graduate University for Advanced Studies, Shonan Village, Hayama, Kanagawa 240-0193, Japan (e-mail: [email protected])

Dr. Vuthichai Ampornaramveth is with the Intelligent Systems Research Division, National Institute of Informatics (e-mail: [email protected])

Prof. Dr. Haruki Ueno is the Director of the Intelligent Systems Research Division, National Institute of Informatics, and Professor at the Graduate University for Advanced Studies (e-mail: [email protected])

ICITA 2004 ISBN 0-646-42313-4

Given that elderly persons usually have free time and a lot of experience, the robot can learn new things from them, and they can be good companions for each other.

The issue is therefore how to build such a robot that is helpful for elderly people, capable of interacting in a human way, and capable of learning new things from these interactions.

The development of an interactive robot involves many kinds of components, e.g., robotics devices (the robot's arms, the robot's head) and software engines for image processing and speech processing. Since these components are the results of research and development in each separate field, they differ dramatically depending on their technologies, usages, and manufacturers.

Since all these components are to be combined to form an integrated system, a mechanism is needed that allows the reuse of these components, efficient cooperation between them, and a fast development process in order to cope with the rapid change of technology.

In this paper, we present a distributed architecture for interactive robots that meets these needs. Using the architecture and the SPAK Knowledge Manager, an interactive robot is developed. A simulation tool was created to enable a faster development process. A multi-modal human-robot interaction experiment was conducted both in the simulation environment and with the real humanoid robot.

In Section II, the architecture is presented. In Section III, the experiment scenario and the results are shown. The paper concludes in Section IV, with future work discussed in Section V.

II. THE ARCHITECTURE

The design goal is to have an architecture in which various components, or agents¹, cooperate with each other effectively. The architecture is chosen to be distributed because these components might reside on different computing platforms. Interaction between components developed in different programming languages must be supported. Also, the use of the architecture should be intuitive and simple.

There is a large body of research and development on agent software platforms [6], ranging from distributed software frameworks like CORBA and DCOM to more comprehensive agent platforms like FIPA-OS and ZEUS. FIPA [7] aims to create standards among agent platforms. However, most projects in this group are concerned with developing a comprehensive platform for general applications. The complexity is therefore

¹The term agent in this paper refers to an individual computing element which performs a specific task using the resources it controls. It can be autonomous or non-autonomous, intelligent or non-intelligent.


Fig. 1. Overview Diagram of an Interactive Robot (primitive agents KnowledgeServer, RoboNeck, RoboMouth, FaceDetector, FaceRecognizer, and SpeechRecognizer, each wrapped by an XML-RPC server, connected over a TCP/IP network to devices such as a microphone, speaker, video camera, and the robot's neck)

high, and only a few of their features are needed in our interactive robot.

In the DARPA Communicator project, the Galaxy architecture [8] was developed for its spoken-dialogue system, where various kinds of components communicate over this architecture. Its multi-platform support is not yet extensive, and its use in other applications is rare.

As there is no architecture that perfectly matches our needs, we are developing a new one while at the same time trying to reuse existing technologies where possible.

In this section, the developed distributed architecture is presented. It is multi-platform and simple, without unnecessary features. It is modular: adding a new component or replacing an existing one is trivial. It is designed to work with the SPAK knowledge platform. The concepts and details of the architecture are discussed below, one aspect at a time.

Primitive Agent

An interactive robot is decomposed into smaller components called Primitive Agents. Primitive agents include software entities that represent robotics equipment, e.g., sensors, camera, microphone; software components that perform specific tasks, e.g., face recognizer, speech recognizer; and the knowledge servers, i.e., software that performs intelligence tasks.

A primitive agent can be passive, waiting for incoming requests, or more complex and active, with a certain level of autonomy. The knowledge server can be considered a special primitive agent because of its functionality and role (described later). Figure 1 illustrates an example interactive robot and its six primitive agents.

Primitive agents are designed to be small, and each is either responsible for a certain specific task, for example speech recognition or face detection, or represents a certain robotics device it is connected to, e.g., the video camera or the robot's neck. A primitive agent can be accessed by other primitive agents on the network through its interface.

Communications between Primitive Agents

The software in each primitive agent has a different programming interface depending on its language, usage, and manufacturer. Some software runs only on specific platforms. In our system, for example, the knowledge server primitive agent is written in Java and hence multi-platform, while the face recognizer software is written in C and developed on UNIX machines.

Since all these primitive agents are to be integrated, they must be able to communicate with each other effectively. Therefore, an effective communication mechanism for this heterogeneous and distributed system is needed.

We evaluated various technologies and finally selected XML-RPC [1] as the communication protocol between primitive agents. XML-RPC enables remote procedure calls (RPC) across various computing platforms and programming languages. It is simple and light-weight. XML-RPC messages are transported in the text-based XML format, which is open, standardized, and easy for humans to inspect. There exist many XML-RPC implementations in various computer languages, e.g., C/C++, Java, Perl, and Python, and on various operating systems, e.g., GNU/Linux, Microsoft Windows, and Sun Solaris. We consider the simplicity and the cross-platform support of XML-RPC to be its main advantages.

On the other hand, using XML as the data format, the amount of data to be transferred is much larger than that of a binary protocol. Using the gzip extension of HTTP/1.1 might alleviate the problem, but more computing power is then needed to compress and decompress the data. Moreover, as XML-RPC is a text-based protocol, all data are sent unencrypted over the network. More security can be achieved via the use of HTTPS, at the price of computing resources for encryption and decryption. However, these disadvantages are not critical in our experiment.
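To make the size trade-off concrete, the following sketch (our illustration, using Python's standard library) measures an XML-RPC request for a hypothetical call before and after gzip compression:

```python
# Measure the size of XML-RPC's text encoding and verify that the
# gzip compression mentioned above is lossless. Illustrative only;
# a binary protocol could carry this call in a few bytes.
import gzip
import xmlrpc.client

request = xmlrpc.client.dumps((1,), methodname="example.HelloWorld").encode()
compressed = gzip.compress(request)

print(len(request), "bytes as XML,", len(compressed), "bytes gzipped")
# Compression is lossless: the receiver recovers the exact request.
assert gzip.decompress(compressed) == request
```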

Compared to middleware frameworks like CORBA, which is also multi-platform, or other XML-based remote-procedure-call protocols like SOAP, XML-RPC has far fewer features. However, it fulfills our need for inter-primitive-agent communication and is much less complicated.

An example XML-RPC message representing a remote procedure call to the function example.HelloWorld(), passing a parameter of type integer with value 1, is as follows:

<?xml version="1.0"?>
<methodCall>
  <methodName>example.HelloWorld</methodName>
  <params>
    <param>
      <value><i4>1</i4></value>
    </param>
  </params>
</methodCall>
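For comparison, Python's standard xmlrpc.client module can marshal the same call (this is our sketch, not part of the paper's system); note that it emits the equivalent <int> tag rather than <i4>, both of which are valid XML-RPC integer tags:

```python
# Marshal a call to example.HelloWorld(1) into an XML-RPC request body.
import xmlrpc.client

payload = xmlrpc.client.dumps((1,), methodname="example.HelloWorld")
print(payload)
```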

A primitive agent is therefore basically a piece of software performing a specific task (e.g., a speech recognizer) or representing hardware (e.g., the robot's neck), wrapped by an XML-RPC server. This server waits for requests from other agents. When a request arrives, it accepts the request and passes it to the appropriate part of the code that performs the real processing.
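This wrapping pattern can be sketched with Python's standard library (our illustration; the agent name, echo task, and port handling are hypothetical, not components of the paper's robot):

```python
# A toy primitive agent wrapped by an XML-RPC server, plus a client call.
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

class EchoAgent:
    """The 'real processing' here is just echoing text back."""
    def ping(self):
        return "pong"
    def process(self, text):
        return "echo: " + text

# Port 0 lets the OS pick a free port; a real agent would use a fixed one.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_instance(EchoAgent())
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Any agent on the network could now invoke the interface via XML-RPC:
client = ServerProxy("http://127.0.0.1:%d/" % port)
pong = client.ping()
echoed = client.process("hi")
server.shutdown()
print(pong, echoed)
```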


Primitive Agent Abstraction

Although primitive agents can be very different, they share some common properties. For example, all primitive agents should provide a mechanism for other agents to check their status and test their reachability. Therefore, we designed a generic interface for such common tasks. Two generic functions, getStatus() and ping(), are for checking an agent's status and testing its reachability, respectively. The verbosity of the status messages can be set with the function setDebug(). The knowledge server can retrieve information about a certain primitive agent by calling the function getSPAKInfo() (the details of this are discussed later). An example generic interface (in pseudocode) for all primitive agents is as follows:

• void setDebug(boolean setDebug)
• string getSPAKInfo()
• string getStatus()
• string ping()
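As a sketch, this generic interface could look like the following Python base class (the method names mirror the pseudocode above; the return values are our illustrative stand-ins):

```python
# Generic interface shared by all primitive agents (illustrative sketch).
class PrimitiveAgent:
    def __init__(self):
        self.debug = False

    def setDebug(self, setDebug):
        # Toggle the verbosity of status messages.
        self.debug = bool(setDebug)

    def getSPAKInfo(self):
        # Each concrete agent returns its own SPAK-readable XML description.
        return "<FRAME>...</FRAME>"

    def getStatus(self):
        return "status: ok (debug)" if self.debug else "status: ok"

    def ping(self):
        return "pong"

agent = PrimitiveAgent()
agent.setDebug(True)
print(agent.getStatus())
```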

Moreover, similar primitive agents might share even more common properties; e.g., two face detector agents (using different face detection algorithms) would have the same interface that accepts an image and returns the location(s) of face(s). Therefore, primitive agents are categorized into classes, and some class-specific methods are commonly defined. An example interface for the face detector primitive agent class is:

• void setImage(base64 encoded data imagecontents): set the input image contents

• string getFaceLocations(): do the face detection and return the face location(s)

• base64 encoded data getProcessedImage(): return the input image with a rectangle around each face
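A stub of this class-specific interface might look as follows (the base64 plumbing is real; the detection result is a hard-coded placeholder, since the actual detector [11] is external software):

```python
# Stub face detector agent showing the class interface's data flow.
import base64

class FaceDetectorStub:
    def __init__(self):
        self._image = b""

    def setImage(self, imagecontents):
        # imagecontents is base64-encoded image data.
        self._image = base64.b64decode(imagecontents)

    def getFaceLocations(self):
        # A real agent would run a detector here; we return a fixed box.
        return "x=10,y=20,w=64,h=64" if self._image else ""

    def getProcessedImage(self):
        # A real agent would first draw a rectangle around each face.
        return base64.b64encode(self._image).decode("ascii")

detector = FaceDetectorStub()
detector.setImage(base64.b64encode(b"raw image bytes").decode("ascii"))
print(detector.getFaceLocations())
```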

Frame-based Knowledge Model

We utilize the SPAK Knowledge Manager, which features frame-based knowledge management, a GUI knowledge editor, and forward- and backward-chaining engines [2]. A frame is a data structure for representing a stereotyped situation [9]. Benefits of frame-based systems are the ability to represent concepts and situations in class and object hierarchies and to match the current situation to a frame. We also use the frame-based system as the main framework incorporating other techniques like procedural scripts and logics.

The knowledge contents in the Knowledge Manager represent the world model of the experiment. Concepts like Human, Student, Professor, Event, and Behavior are maintained as frames. Information about each frame is stored in the frame's slots. Slot values can be scalars (e.g., integer, string), pointers to other frame instances, or procedural scripts (SPAK supports JavaScript). The special slot onInstantiate is executed once when the frame is instantiated.

When there is a change in the environment, it can result in the creation of a new frame and the triggering of an action. The Knowledge Manager acts as the central module of the system, receiving input events from other primitive agents, incorporating changes into its knowledge contents, and causing output actions as a response.

Knowledge Exchange between Primitive Agents

At the moment, knowledge exchange takes place between the knowledge server and the other agents.

To be flexible and modular, the basic information of a primitive agent is designed to be embedded in that primitive agent itself, in a format understandable by the SPAK Knowledge Manager (currently an XML-based format), instead of hard-coding it inside the Knowledge Manager.

In the initialization phase of the system, the SPAK Knowledge Manager queries primitive agents to retrieve this basic information by calling their getSPAKInfo() methods (which exist in every primitive agent, as mentioned earlier). The retrieved data are processed, and a frame representing each primitive agent is created and added to the knowledge tree to reflect the existence of that primitive agent. In the future, more contents can be included in this basic information so that the Knowledge Manager learns more about the primitive agent automatically from it.

An example of the basic information of a primitive agent of class Mouth named RobovieMouth:

<FRAME>
  <NAME>RobovieMouth</NAME>
  <ISA>Mouth</ISA>
  <ISINSTANCE>TRUE</ISINSTANCE>
  <SLOTLIST>
    <SLOT>
      <NAME>URL</NAME>
      <TYPE>TYPE STR</TYPE>
      <CONDITION>COND ANY</CONDITION>
      <ARGUMENT></ARGUMENT>
      <VALUE>http://robovie.local:8080/RPC2</VALUE>
      <REQUIRED>TRUE</REQUIRED>
      <SHARED>TRUE</SHARED>
    </SLOT>
    [...]
    <SLOT>
      <NAME>sayTextFn</NAME>
      <TYPE>TYPE STR</TYPE>
      <CONDITION>COND ANY</CONDITION>
      <ARGUMENT></ARGUMENT>
      <VALUE>sayText</VALUE>
      <REQUIRED>TRUE</REQUIRED>
      <SHARED>TRUE</SHARED>
    </SLOT>
    <SLOT>
      <NAME>utterFn</NAME>
      <TYPE>TYPE STR</TYPE>
      <CONDITION>COND ANY</CONDITION>
      <ARGUMENT></ARGUMENT>
      <VALUE>utter</VALUE>
      <REQUIRED>TRUE</REQUIRED>
      <SHARED>TRUE</SHARED>
    </SLOT>
  </SLOTLIST>
</FRAME>

After this knowledge has been processed by the Knowledge Manager, a new frame RobovieMouth is created, as shown in Figure 2 (only the sub-tree of the knowledge contents starting from the RemoteAgents frame, which is the super-class of RobovieMouth, is shown).
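On the Knowledge Manager side, turning such a description into a frame amounts to walking the XML. A minimal sketch, using our own parsing code over a shortened copy of the example above (not SPAK's actual implementation):

```python
# Parse a primitive agent's FRAME XML into a simple frame structure.
import xml.etree.ElementTree as ET

frame_xml = """
<FRAME>
  <NAME>RobovieMouth</NAME>
  <ISA>Mouth</ISA>
  <ISINSTANCE>TRUE</ISINSTANCE>
  <SLOTLIST>
    <SLOT><NAME>sayTextFn</NAME><VALUE>sayText</VALUE></SLOT>
    <SLOT><NAME>utterFn</NAME><VALUE>utter</VALUE></SLOT>
  </SLOTLIST>
</FRAME>
"""

root = ET.fromstring(frame_xml)
frame = {
    "name": root.findtext("NAME"),
    "isa": root.findtext("ISA"),
    # Collect each slot as name -> value.
    "slots": {slot.findtext("NAME"): slot.findtext("VALUE")
              for slot in root.iter("SLOT")},
}
print(frame)
```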

In the next section, a human-robot interaction experiment based on the architecture presented in this section is discussed.

III. EXPERIMENT

The experiment is conducted to evaluate the proposed architecture. The experiment scenario is designed as a simple


Fig. 2. RobovieMouth frame and RemoteAgents Sub-Tree

human-robot greeting conversation using multi-modal interfaces, namely speech, vision, and gesture. The details of the scenario are as follows:

There is a robot standing in the laboratory. People who arrive at the office usually walk past and greet it. Mr. Vuthi walks toward the robot. The robot spots his face and starts the conversation (it is assumed they know each other).

Robot: Hi Vuthi, how are you today?
Vuthi: I'm OK. How about you?
Robot: I'm fine, thank you.

Five minutes later, Pattara (assumed not known to the robot) arrives. The robot now talks to him.

Robot: Hi, who are you? We haven't known each other before, have we? What's your name?
Pattara: Hi, my name is Pattara.
Robot: Nice to meet you, Pattara. How are you today? Vuthi just arrived here 5 minutes ago.
Pattara: Ah, I see. I'm fine, thank you. See you.
Robot: See you.

Now Pattara is leaving. He stops at the robot and talks to it.

Pattara: Are there still any students here?
Robot: Hassan left at 6 o'clock. Alex should still be here.
Pattara: OK, I see, thank you. Bye bye.
Robot: Bye bye, have a nice evening, Pattara (waving its hand).

In this scenario, the robot interacts with humans using speech, vision, and gesture. It must be able to detect and recognize humans, understand basic greeting words, and remember persons' statuses and some past events. The robot also follows the human's face while having a conversation.

The Robovie humanoid robot [10] is used in the experiment. It was developed by ATR, Japan, with a human upper torso placed on an ActivMedia wheeled robot. It has two eye cameras and a speaker at its mouth. Robovie can interact with users by moving its arms and head, or by using its voice. As it was not equipped with a microphone, a wireless microphone is attached

Fig. 3. Overview Diagram of the System

to its head. The control software is installed on Robovie's internal Linux PC.

An overview diagram of the system is illustrated in Figure 3. The system is composed of many primitive agents, namely RobovieNeck, RobovieEyes, RobovieTrunk, FaceDetector, FaceRecognizer, SpeechRecognizer, RobovieMouth, and the Knowledge Server, running on networked computers. Some of these primitive agents are discussed below, with selected important functions.

RobovieNeck

RobovieNeck offers functions to move the neck of Robovie to a specified destination. For simplicity, the destination is given in (x, y) coordinates, each with a value ranging from -2 to 2. There are therefore 25 possible positions in total. Important functions offered by RobovieNeck are as follows:

• void move(int posX, int posY): move to an absolute position (posX, posY).

• void followObject(int posX, int posY): move the neck appropriately so that it follows the object at position (posX, posY).
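The coordinate model above can be sketched as follows (the clamping to [-2, 2] follows the text; the one-step-toward-the-target behavior of followObject is our guess at a plausible policy, not the paper's implementation):

```python
# Toy model of RobovieNeck's 5x5 position grid.
def clamp(v, lo=-2, hi=2):
    return max(lo, min(hi, v))

class NeckModel:
    def __init__(self):
        self.x, self.y = 0, 0   # start centered

    def move(self, posX, posY):
        # Absolute move, clamped to the 25 valid positions.
        self.x, self.y = clamp(posX), clamp(posY)

    def followObject(self, posX, posY):
        # Step one grid unit toward the object's position.
        step = lambda cur, tgt: cur + (tgt > cur) - (tgt < cur)
        self.move(step(self.x, posX), step(self.y, posY))

neck = NeckModel()
neck.move(5, -7)          # out-of-range request clamps to (2, -2)
neck.followObject(0, 0)   # steps back toward the center: (1, -1)
print(neck.x, neck.y)
```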

RobovieEyes

The RobovieEyes agent is connected to Robovie's video cameras and provides a function to retrieve a snapshot image. At the moment it uses only one camera, with an image size of 320x240 pixels. It can be configured to take snapshots regularly and automatically ask the FaceDetector primitive agent to perform face detection.

FaceDetector

We use face detection software from Carnegie Mellon University [11] to find face locations in an image. The FaceDetector agent receives the input image via the setImage() function. The face locations can be retrieved by calling getFaceLocations() (similar to the interface shown in Section II). It can be set to automatically submit a new event frame to the Knowledge Server agent whenever a face has been found in the image.

FaceRecognizer

This agent performs face recognition and currently uses Eigenface [12] as its back-end. FaceRecognizer does not do face detection; therefore, the input image must contain only a face. Its interface includes the following functions:


Fig. 4. SPAK Knowledge Editor

• void resetDB(): clear the database.

• string recognize(base64 encoded data imagecontents): recognize the person present in the imagecontents data.

• void setName(string name): assign the name to the last detected-as-unknown person and add it to the face database.

SpeechRecognizer

We use the Sphinx [13] speech recognition software with a simple language model containing only text related to the experiment. The agent accepts input speech contents either in the compressed Ogg Vorbis format or the uncompressed WAV format, processes them, and returns a text string as output via the following interface:

• string recognize(base64 encoded data soundcontents)

RobovieMouth

RobovieMouth is connected directly to the sound device of Robovie. It accepts strings of input text and forwards them to the Festival text-to-speech software [14], which synthesizes the speech output. RobovieMouth also supports raw sound input contents.

Knowledge Server

The Knowledge Server is the intelligence part of the system. It is basically the SPAK Knowledge Manager wrapped by XML-RPC server code. Input data can be sent to the Knowledge Manager as a text message through its getMessage() function:

• void getMessage(string message)

Knowledge contents can be edited using the SPAK Knowledge Editor shown in Figure 4. In the preparation phase, the basic knowledge needed for the experiment, e.g., the concepts of Student, Human, and Professor, was added to the system.

Incoming events from other primitive agents to the Knowledge Manager during the experiment are in the form of event

Fig. 5. FollowFace Behavior Frame

frames, e.g., AudioVisual events and TouchSensor events. Event-action behavior is achieved through Behavior frames. Behavior frames are designed with proper slots and constraints in order to match certain situations and generate actions.

For example, when an unknown face is detected by the face detector primitive agent, an event frame reporting this is sent to the Knowledge Manager. Upon receiving this frame, the condition of the behavior frame Greet, which requires the existence of an unknown person in front of the robot, is fulfilled; therefore, a new instance of the Greet frame is created and some actions, e.g., uttering a greeting, are triggered according to the contents of its onInstantiate slot.
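This event-to-behavior matching can be sketched like so (the frame and slot names follow the paper; the dictionary-based matching and the greeting string are our simplification of SPAK's frame engine):

```python
# Toy event->behavior matching: an UnknownFaceFound event frame
# fulfills the Greet behavior's condition and triggers its action.
triggered_actions = []

BEHAVIORS = {
    # behavior name -> (required event type, onInstantiate action)
    "Greet": ("UnknownFaceFound",
              lambda ev: triggered_actions.append("utter: Hi, who are you?")),
}

def on_event(event):
    for name, (required, action) in BEHAVIORS.items():
        if event["type"] == required:
            action(event)   # "instantiate" the behavior frame

on_event({"type": "UnknownFaceFound", "location": (120, 80)})
print(triggered_actions)
```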

Another example is the FollowFace behavior frame that enables the robot to follow the face of its conversation partner. It has four slots, namely Name, mNewFaceFound, mNeck, and onInstantiate, as shown in Figure 5. The Name slot is a string representing the frame name. mNewFaceFound and mNeck are of Instance type (which means a pointer to another frame instance) for the frame NewFaceFound (an audiovisual event frame) and the frame Neck (a RemoteAgents frame), respectively.

This means the FollowFace frame will be instantiated if there exists an instance of the NewFaceFound event frame, which is usually generated by the FaceRecognizer primitive agent, and an instance of the Neck RemoteAgents frame, which will exist if a primitive agent of class Neck (in the experiment, RobovieNeck) is present in the system.

Provided that in the setup phase an instance of the Neck frame has been created to reflect the existence of the RobovieNeck agent, then if a new face is detected by the FaceRecognizer agent, a FollowFace frame will be instantiated (because both conditions are now fulfilled) and the action specified in its onInstantiate slot will be executed. In this case, the action is to make an XML-RPC call to the followObject() function of the RobovieNeck primitive agent (with the face location obtained from the mNewFaceFound frame sent as parameters), and the robot's neck will move accordingly.

Other actions that should take place when certain events occur are handled in a similar way: e.g., the ParseSpeechInput behavior frame, which parses the recognized input speech and generates SpeechAct frames, is instantiated when there is a new SpeechInput event frame; the AnswerSpeechInput behavior frame is instantiated when there is a new SpeechAct event frame.

For the whole experiment, more than 50 frames are maintained in the knowledge base.

Simulation of Primitive Agents

With defined interfaces for all involved primitive agents, some primitive agents can be substituted by simulated ones with the same interfaces. Also, some events which


Fig. 6. Robosim Web Interface

Fig. 7. Robot makes a conversation with a human subject

are, according to the design, generated by certain primitive agents can be virtually generated in the simulation environment. Therefore, the whole system can be tested and debugged first without having to implement all the components. This has sped up the development process considerably.
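Because real and simulated agents expose the same interface, swapping one for the other is transparent to the rest of the system. A duck-typing sketch (our illustration, not Robosim code; the scenario and moves are hypothetical):

```python
# A scenario runs unchanged whether the neck agent is real or simulated.
class SimulatedNeck:
    """Records moves instead of driving hardware."""
    def __init__(self):
        self.log = []

    def move(self, posX, posY):
        self.log.append((posX, posY))

def greeting_scenario(neck):
    neck.move(1, 0)   # turn toward the visitor
    neck.move(0, 0)   # return to center

sim = SimulatedNeck()
greeting_scenario(sim)
print(sim.log)
```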

We have created a simulation environment called Robosim. Its web interface is shown in Figure 6.

After the system worked well in the simulation environment, it was successfully tested with the real robot, with some adjustments: primitive agents that require much computing power, like SpeechRecognizer, were moved to machines with high computing power in order to achieve faster results, and primitive agents that interact with each other very often, e.g., RobovieEyes and FaceDetector, were placed on the same machine to reduce communication overhead. Figure 7 illustrates the Robovie robot having a conversation with a human subject in the experiment.

IV. CONCLUSION

In this paper, we present our distributed architecture for interactive robots. An interactive robot is developed based on

the architecture, and a multi-modal human-robot interaction experiment has been successfully conducted in which a human and the robot held a conversation using natural interfaces: speech, vision, and gesture.

The proposed architecture is distributed and simple. Integrating various primitive agents running on different computing platforms or written in different programming languages can be done effectively and with a short development time. The SPAK Knowledge Manager is integrated with the architecture and serves as the intelligence center of the system.

V. FUTURE WORK

Future work includes improving the primitive agents' wrapper (XML-RPC server) code to be multi-threaded and reentrant in order to support concurrent requests. In the case that many requests arrive at the same time, task priorities should be supported. This includes the ability to pause and resume (or cancel) the current task while processing a higher-priority request.

ACKNOWLEDGEMENT

We would like to thank Prof. Kanade at Carnegie Mellon University for kindly allowing us to use the face detection software.

REFERENCES

[1] UserLand Software, Inc. XML-RPC Home Page. [Online]. Available:http://www.xmlrpc.org/

[2] Vuthichai Ampornaramveth, Pattara Kiatisevi, and Haruki Ueno, "Toward a Software Platform for Knowledge Management in Human-Robot Environment," Technical Report of IEICE, vol. 103, no. 83, pp. 15-20, 2003.

[3] Haruki Ueno, "Symbiotic Information Systems: Towards an Ideal Relationship of Human-Beings and Information Systems," Technical Report of IEICE, KBSE2001-15:27-34, August 2001.

[4] Kevin Kinsella and Victoria A. Velkoff. An Aging World: 2001, International Population Reports. [Online]. Available: http://www.census.gov/

[5] Ministry of Public Management, Home Affairs, Posts, and Telecommunications. Japan Statistical Yearbook 2003. [Online]. Available: http://www.stat.go.jp/

[6] IEEE Computer Society. IEEE Distributed Systems Online: Distributed Agents Projects. [Online]. Available: http://dsonline.computer.org/agents/projects.htm

[7] The Foundation for Intelligent Physical Agents. FIPA Web Site.[Online]. Available: http://www.fipa.org/

[8] DARPA Communicator Project. Galaxy Architecture. [Online].Available: http://communicator.sf.net

[9] Marvin Minsky, "A Framework for Representing Knowledge," MIT AI Laboratory Memo 306, June 1974.

[10] Takayuki Kanda, Hiroshi Ishiguro, Tetsuo Ono, Michita Imai, and Ryohei Nakatsu, "Development and Evaluation of an Interactive Humanoid Robot: Robovie," IEEE International Conference on Robotics and Automation (ICRA 2002), 2002.

[11] Henry A. Rowley, Shumeet Baluja, and Takeo Kanade, "Neural Network-Based Face Detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 1, pp. 23-38, January 1998.

[12] M. A. Turk and A. P. Pentland, "Face recognition using eigenfaces," Proceedings of the Eleventh International Conference on Pattern Recognition, pp. 586-591, 1991.

[13] Carnegie Mellon University. CMU Sphinx: Open Source SpeechRecognition. [Online]. Available: http://www.speech.cs.cmu.edu/sphinx/

[14] University of Edinburgh. The Festival Speech Synthesis System.[Online]. Available: http://www.cstr.ed.ac.uk/projects/festival/