
UNIVERSIDAD POLITÉCNICA DE MADRID

ESCUELA TÉCNICA SUPERIOR DE INGENIEROS

DE TELECOMUNICACIÓN

Proyecto Fin de Máster

Implementation of an affective conversational agent for controlling a Hi-Fi system

Author: Justo Javier Saavedra Guada
Tutors: Syaheerah Lutfi, Dr. Juan Manuel Montero Martínez

Madrid, 2011

Contents

1. Introduction

1.1. Objectives

2. System Architecture

2.1. GALAXY-II: A Reference Architecture for Conversational System Development

2.2. NEMO Architecture: Our Architecture

2.2.1. Previous Dialogue System Architecture

2.2.2. NEMO Architecture: New Architecture

3. System Communications

3.1. Concepts

3.1.1. Peer-to-Peer

3.1.2. Client/Server Model

3.1.3. Advantages/Disadvantages

3.2. Communication Methods & Tools

3.2.1. Sockets

3.2.2. Service-Oriented Architecture (SOA)

3.2.3. Web Services and SOA

3.2.3.1. Simple Object Access Protocol (SOAP)

3.3. Final Implementation

4. Emotional System

4.1. Emotional Theories

4.1.1. Two-Factor Theory of Emotion

4.1.2. Ortony, Clore, and Collins

4.1.3. Theory of Roseman

4.1.4. Theory of Frijda

4.1.5. Theory of Oatley and Johnson-Laird

4.2. Theories Behind Our Emotional System

4.2.1. Maslow's Hierarchy of Needs

4.2.2. Appraisal Theory

4.3. NEMO: Emotional System

4.3.1. Needs

4.3.2. Appraisals

4.3.3. Emotions

4.3.3.1. Constant Weight f(w)

4.3.3.2. Emotion Timing

4.3.3.3. Neutral Emotion

4.3.4. Mapping Appraisals

5. Object-Oriented Analysis

5.1. Objects

5.2. Classes

5.2.1. Methods of a Class

5.2.2. Encapsulation & Accessibility

5.2.3. Inheritance

5.2.4. Polymorphism

5.3. Classes in the NEMO Emotional System

5.3.1. Need Classes

5.3.2. Emotion Classes

6. Hi-Fi System

6.1. Introduction to the Hi-Fi Dialog System

6.2. Hi-Fi Application [36]

6.3. Menu

6.4. System Message

6.5. Configuration

6.5.1. Settings That Affect the Knowledge Management Module

6.6. State of the System

6.7. Recognition

6.7.1. Voice Activity Detector Setup Dedicated Controls

6.7.2. Dedicated Controls for Recording and Playback

6.7.3. Oscilloscope Setup Dedicated Controls

6.7.4. Recognition Dedicated Controls

6.8. Language Understanding Module

6.9. Dialogue Manager

6.9.1. Dialogue Goals Dedicated Controls

6.9.2. Dialogue Concepts Dedicated Controls

6.9.3. Threshold Configuration Controls

6.9.4. Dialogue Memory Controls

6.10. Execution Module

6.11. Response Generation Module

6.12. Synthesis

6.13. Dialogue Features Used in the Emotional System

7. Supervisor

7.1. How It Works

7.2. Interface

7.3. User Case Model

A. Computer Vision

A.1. Facial Expressions

A.2. Techniques

A.2.1. Haar Classifier: The Viola-Jones Method

A.2.2. Motion Flow

A.3. Our Smile Detector

A.3.1. Training

A.3.2. Picture Test

A.3.4. Real-Time Test

B. How to Run the System

C. Response Generation Template Examples

Bibliography

Figures

Figure 1: Galaxy-II Architecture [2]

Figure 2: Dialogue System basic architecture

Figure 3: Architecture

Figure 4: Architecture data flow

Figure 5: Classification of Computer Systems [5]

Figure 6: Peer-to-Peer Architecture

Figure 7: Client-Server Architecture

Figure 8: SOAP request example

Figure 9: SOAP response example

Figure 10: Example rule.xml

Figure 11: sendemotionstohifi Jabon XML message

Figure 12: Two-Factor Theory [17]

Figure 13: Structure of emotion types in the theory of Ortony, Clore and Collins [18]

Figure 14: Frijda Emotional System [24]

Figure 15: Maslow's Hierarchy of Needs

Figure 16: NEMO Architecture

Figure 17: NEMO UML diagram

Figure 18: LevelTimeHistory class

Figure 19: CIOProcess class

Figure 20: CNeedProcess class

Figure 21: TNIFV class

Figure 22: CSurvival class

Figure 23: CSafety class

Figure 24: CSocial class

Figure 25: CSuccess class

Figure 26: CEthics class

Figure 27: CEmotion class

Figure 28: CSurprise class

Figure 29: CSad class

Figure 30: CHappy class

Figure 31: CFear class

Figure 32: CAngry class

Figure 33: CShame class

Figure 34: Dialogue System Architecture

Figure 35: Hi-Fi Main Window

Figure 36: Modules submenu

Figure 37: Ver submenu

Figure 38: Configuration dialog

Figure 39: Overview of the dialog box "HiFi State"

Figure 40: Overview of the dialog box "HiFi Recognition"

Figure 41: Representation of the energy of an audio signal, along with the levels of the detector and the frames marked as the start and end

Figure 42: Dedicated controls for recording and playback of audio files, located in the dialog box "HiFi Recognition"

Figure 43: Dedicated oscilloscope control settings, located in the dialog box "HiFi Recognition"

Figure 44: Recognition dedicated controls, located in the dialog box "HiFi Recognition"

Figure 45: Overview of the dialog box "Comprension Hi-Fi"

Figure 46: Overview of the dialog box "HiFi Dialogue Manager"

Figure 47: Dialogue Objectives

Figure 48: Controls dedicated to presenting the classified dialogue concepts

Figure 49: Threshold controls of the dialogue manager

Figure 50: Dialogue Memory Controls

Figure 51: Execution Module

Figure 52: Response Generation Box

Figure 53: Synthesizer

Figure 54: Supervisor in the system architecture

Figure 55: Supervisor Interface

Figure 56: User Case Model

Figure 57: Haar features

Figure 58: Cascade of classifiers

Figure 59: Flowchart to filter the images

Figure 60: Comparison of the methods (Left: Picture 1, Right: Picture 2)

Figure 61: OpenCV application

Figure 62: IR USB Driver Task Bar

Figure 63: Test Command IR Driver

Figure 64: HI-FI System Loading

Figure 65: HI-FI System Ready

Figure 66: IR Activation Box

Figure 67: HI-FI System Ready and IR Activated

Figure 68: Supervisor not ready

Figure 69: Supervisor (Agent Face) Neutral

Figure 70: Emotional System Running

Figure 71: Supervisor System Ready

Figure 72: Run the Recognition Module

Figure 73: Recognition Dialogue Box

Figure 74: Recognition Dialogue Box Running

Tables

Table 1: Advantages/Disadvantages P2P vs. Client-Server

Table 2: Ortony, Clore, and Collins variables

Table 3: Plan junctures [27]

Table 4: Mapping Appraisal Weights into Emotions

Table 5: System message

Table 6: Combinations for links between the modules Knowledge Manager, Dialogue Manager and Performance


1. Introduction

The purpose of this work is the design, development, and adaptation of an emotional model to be used in a dialogue system that provides several functions through speech recognition. The main goal is to improve the existing dialogue system by adding new features related to an emotional model aimed at creating a robotic servant with basic human-like emotions.

In the near future, emotional systems will play an important role in the development of intelligent/affective systems. Both recognition of the user's emotions and an emotional response from the system are highly desirable. To create a system/robot with such features, several aspects have to be considered:

Emotional model

Dialogue System

Speech recognition

Language understanding module

Dialogue manager

Emotional response generator

Emotional Speech Synthesizer

User emotion recognition by the use of a webcam

We can create a good emotional model, but without a dialogue system adapted to it, users will not notice the emotions felt by the system; for example, an emotional speech synthesizer is needed for those emotions to be heard. Another important feature is the capacity to sense user emotions, interaction, and movements in order to supply our emotional system with more information so that it can generate better responses.

Our experimental work is based on a hi-fi audio system to which the dialogue system was adapted, providing functionality to control its CD player, two tape decks, AM/FM radio, and other components. In the past, this hi-fi audio system was tested and evaluated by users rating its performance without the emotional model. Part of this work is to improve the dialogue system, adapt the new emotional features, and set the basis for a future re-evaluation of the Hi-Fi dialogue system.

1.1. Objectives

Adaptation of a human affective and cognitive model to the servant robotic system.

Improvement and adaptation of the existing dialogue system that controls a Hi-Fi system.

Design and Development of the Architectural Communication System to be used.

Emotion recognition by webcam in order to feed the emotional model.


2. System architecture

In this chapter, the reference model Galaxy-II is presented. Galaxy-II is used for conversational system development and guided us in creating our system architecture. Secondly, the basic dialogue system (used as the base for our work) is presented, and finally our final architecture, which adapts the new dialogue system and the emotional model.

2.1. GALAXY-II: A reference architecture for conversational system development

This architecture was developed by the Spoken Language Systems Group at the Massachusetts Institute of Technology. Through their experience in designing spoken dialogue systems, they realized that an essential element in being able to rapidly configure new systems is to allow as many aspects of the system design as possible to be specifiable without modifying source code []. By doing so, they have been able to configure multi-modal, multi-domain, multi-user, and multilingual systems with much less effort than previously. As new and increasingly complex spoken dialogue systems are built, the task of evaluating and expanding these systems becomes both more important and more difficult. First, a spoken dialogue system typically comprises multiple modules, each of which performs its task within an overall framework, sometimes completely independently but most often with input from other modules. Second, once a mechanism is in place for running data through an off-line system, a simple re-processing of data with a new version of any component can lead to an incoherent interaction, as only one side of a two-sided conversation has changed. Finally, it must be decided what to evaluate (e.g., an individual component vs. overall system behaviour) and how (error rates vs. some measure of usability).

The GALAXY-II architecture consists of a central hub that controls the flow of information among a suite of servers, which may be running on the same machine or at remote locations []. Figure 1 shows a typical hub configuration for a generic spoken dialogue system. The hub's interaction with the servers is controlled via a scripting language. A hub script includes a list of the active servers, specifying the host, port, and set of operations each server supports, as well as a set of one or more programs. Each program consists of a set of rules, where each rule specifies an operation, a set of conditions under which that rule should fire, a list of INPUT and OUTPUT variables for the rule, and optional STORE/RETRIEVE variables into/from the discourse history. When a rule is fired, the input variables are packaged into a token and sent to the server that handles the operation. The hub expects the server to return a token containing the output variables at a later time. The variables are all recorded in a hub-internal master token. The conditions consist of simple logical and/or arithmetic tests on the values of the typed variables in the master token. The hub communicates with the various servers via a standardized frame-based protocol.

Figure 1: Galaxy-II Architecture []

A simple communication protocol has been adopted and standardized for all hub/server interactions. Upon initiation, the hub first handshakes with all of the specified servers, confirming that they are up and running and sending them a welcome token that may contain some initialization information, as specified in the hub script []. The hub then launches a wait loop in which the servers are continuously polled for any return tokens. Each token is named according to its corresponding program in the hub script, and may also contain a rule index to locate its place in program execution, and a token id to associate it with the appropriate master token in the hub's internal memory. The rule is consulted to determine which OUTPUT variables to update in the master, and which variables, if any, to store in the discourse history. Following this, the master token is evaluated against the complete set of rules subsequent to the rule index, and any rules that pass their test conditions are then executed. In the current implementation, the usual case is that only one rule is fired, although simultaneous rule executions can be used to implement parallelism, a feature that is used. Servers other than those that implement user interface functions are typically stateless; any history they may need is sent back to the hub for safekeeping, where it is associated with the current utterance. Common state can thus be shared among multiple servers. To execute a given rule, a new token is created from the master token, containing only the subset of variables specified in the INPUT variables for the rule in question. This token is then sent to the server assigned to execute the operation specified by the rule. If the designated server is determined to be busy (it has not yet replied to a preceding rule, either within this dialogue or in a competing dialogue), the token is queued up for later transmission. Thus the hub is in theory never stalled waiting for a server to receive a token. The hub then checks whether the server that sent the token has any tokens in its input queue. If so, it will pop the queue before returning to the wait loop. For example, the recognizer sends each hypothesis to the hub as a separate token, signalling completion with a special final token. The selected token is processed through discourse inheritance via the hub script and sent on to the dialogue manager. The dialogue manager usually initiates a subdialogue in order to retrieve information from the database. The retrieved database entries are returned to the dialogue manager for interpretation. These activities are controlled by a separate program in the hub script, which we refer to as a module-to-module subdialogue. Finally, the dialogue manager sends a reply frame to the hub, which is passed along to generation and synthesis. After the synthesized speech has been transmitted to the user, the audio server is freed up to begin listening for the next user utterance [].

One of the challenges that is well addressed by the Galaxy-II architecture is managing multimodal interactions. In a multimodal interaction separate control threads need to manage the various input/output modalities. These threads need to be coordinated and synchronized. In this architecture, the execution model of having a set of active tokens for which rules are fired as their conditions are matched has proven effective in supporting the multiple threads. Different tokens correspond to activity in different threads. Tying all the threads for a given user session together is a session identifier in every token.

All of these multimodal interactions are handled in a straightforward manner within the Galaxy-II architecture. The parallel execution, multiple token programming model supports these interactions in a simpler manner than would be possible in a traditional programming language such as C.

The GALAXY-II architecture has proven to be a powerful tool for evaluation. It has made possible a wide range of system configurations specifically designed for monitoring system performance, resulting in a suite of hub programs concerned with evaluation []. In some cases, it can be used to evaluate only a particular aspect of system performance, such as recognition or understanding. In other cases, it can evaluate the performance of the entire system, perhaps comparing a new version with the one that existed when a log file was first created. At other times it can be useful to look at ways of measuring system performance as it relates to user satisfaction, along measurable dimensions.

2.2. NEMO Architecture: Our architecture

Based on the study of the Galaxy-II architecture presented previously, we adapted our work to operate in a similar way using the tools available in the Speech Technology Group. The main objectives were to reuse as much previous work on dialogue system architectures as possible, and to adapt it to an architecture capable of managing multiple servers interacting with each other. The servers in our dialogue system (similarly to Galaxy-II) can be on the same machine or on different machines, giving broad flexibility to operate and distribute computational processing.

Main objectives:

Multimodal interface: The system must be capable of interacting with the user in several ways. This is also helpful for testing and evaluating the system.

Scalable architecture: It must make it easy to add/delete modules and functionalities.

Emotional behaviour: The system needs to address emotional behaviour independently of the task. The emotional state of the system is computed from information related to the performance of the tasks involved in the dialogue system, and from information supplied by the new modules. It follows a human psychological theory adapted to an affective agent.

Multi-server interaction: It must provide an architecture and protocol capable of managing the different server interactions.

2.2.1. Previous dialogue system architecture

The previous dialogue system architecture is presented in Figure 2. It contains six basic blocks that were designed, developed, and adapted in previous works to control the Hi-Fi system by voice commands []:

Speech recognition: It translates the input voice from a microphone into a set of recognized words derived from a previously trained vocabulary.

Language Understanding Module: It extracts semantic concepts from the recognized words. This module consists of a set of context-dependent rules, handcrafted by an expert in the system application domain, and the concept dictionary, a list of the relevant semantic concepts that each word can be related to. The output of the language understanding module is a list of attribute-value pairs [].

Dialogue Manager: This list of semantic concepts is the input to the dialogue manager. From that list, the dialogue manager fills an execution frame with the information required to execute the different actions present in the query utterance. It establishes an explicit confirmation mechanism, giving the user feedback about the action that is going to be executed [].

Execution Module: It is in charge of sending the infrared commands to the system and keeping track of the actions executed by the mini hi-fi. It should be mentioned that the commercial system is not able to give us feedback on the commands executed, which can produce a synchronization loss [].

Generation Module: The generation module creates different sentences for the same dialogue goal, to give the user feedback about the actions carried out and to ask the user for information needed to achieve the dialogue goal or perform the requested action. In order to make the dialogue more natural, it uses different sentences each time. It is based on specific templates for each possible dialogue goal [].

Synthesiser: The text to speech module synthesizes the speech from the sentence proposed by the generation module. As we are developing an emotional system, the synthesizer used is capable of expressing emotions.

(Diagram: Front End, Speech Recognition, Language Understanding Module, Dialogue Manager, Generation Module, Synthesizer, Execution Module, with text passed between modules)

Figure 2: Dialogue System basic architecture

2.2.2. NEMO Architecture: New architecture

Our work consisted of adapting and expanding the functionality of the dialogue system by adding the emotional model. It comprised the following tasks:

Analyze the different blocks of the basic dialogue system in order to detect which information could be useful to the emotional model.

Modify the basic blocks in order to supply the information necessary to compute the emotions.

Expand the functionality of the generation module by adding more concepts such as relationship (Friend, Known, Unknown, etc.) and emotions (sadness, happiness, fear, surprise, anger, disgust). Furthermore, new concepts such as cultural background could be added in order to adapt phrases depending on these concepts.

Research and develop the emotional model to be used.

Adapt a computer vision application based on OpenCV (open source computer vision library) that uses the webcam to recognize user movements, smiles, and presence, improving our system.

Adapt a new synthesizer, developed by our Speech Technology Group, that has the capacity of expressing emotions given a phrase and an emotion.

The final architecture is presented in Figure 3. The idea consists of a central hub similar to the one implemented in Galaxy-II. This hub is implemented using a SOAP (Simple Object Access Protocol) toolkit called Jabon, developed by the Intelligent Control Research Group of the Universidad Politécnica de Madrid. It provides message communication between nodes. These messages must have an envelope that contains a compulsory body and an optional header field. The toolkit provides an interface to generate C++ web services, which we used to communicate our different servers and clients. The implementation consists of defining (in the main loop of the principal program) the different rules and calls to each server. In addition, each server declares the services it provides and the input and output variables required. To call a service, the input/output variables, the name of the service, and the host and port must be specified.

Considering the previous architecture, where all the components ran on the same machine and each module was invoked by function calls, we analyzed which components needed to be changed in order to set them up as servers providing services to the other components of the dialogue system.

As seen in Galaxy-II, all the components acted as servers, but that does not mean it is necessary to change all the previous modules of the dialogue system if doing so does not bring a substantial improvement. The new/changed components of the emotional dialogue system are (see Figure 3):

OpenCV webcam client: This program, based on the OpenCV libraries, processes real-time video input from a webcam in order to detect movement, faces, and smiles. As soon as it detects any of these events, the client communicates with the OpenCV webcam server running in the main program. This application represents the beginning of an unexploited potential in the field of computer vision. In the future, it could recognize users, the full set of emotions, user movements (hand gestures, hand movements, head movements), etc.

OpenCV webcam server: This server runs in the main program, and its function is to receive events from the OpenCV client. Once the events are received, the principal program processes them.

New emotional system server: It is in charge of receiving the events used to compute and update the emotional state, and of returning the emotional state in order to generate an adequate response through the speech synthesiser.

Synthesiser server: This module is in charge of synthesizing the voice. Its input parameters are the phrase and the emotion. Because a different synthesizer is now used, it was necessary to separate this module from the previous dialogue system.

Synthesiser client: It runs in the main program and is in charge of sending service requests to the synthesizer server.

Response Generator: Although this module is kept in the main program, it was changed to add more concepts, such as the relationship to the user and the emotions. It looks for a given set of concepts and finds the appropriate phrases depending on the context. Setting more concepts allows us to create more adequate and realistic responses.

In Figure 3, the modules in blue are located in the main application program, while the modules in green are outside the main program and are accessed through a server/service by means of SOAP. The application program acts as the hub of the architecture, managing the different rules.

The new emotional system needs the performance results of the different modules of the dialogue system in order to compute the emotions. As presented later, the idea behind the emotional model is to make it independent of the task in order to adapt it to other systems in the future. The different blocks/modules, such as the recognizer, the language understanding module, the dialogue module, and the newly introduced OpenCV video application, supply the emotional model with information.

Figure 3: Architecture

In Figure 4, the data flow of the new architecture is shown. First, the speech recognizer translates the voice command into text, and also sends performance information (detection confidence, phrases) to the emotional system. Second, the language understanding module extracts the semantic concepts from the text and passes this information to the dialogue manager, while also sending information to the emotional model. Then, the dialogue manager works on the list of semantic concepts from the previous and current interactions, and sends information to the execution module that operates the hi-fi system. While all these steps occur, the OpenCV webcam server in the main program passes information to the emotional model about events detected by the webcam. The emotional system computes this information in parallel; then, as the dialogue manager sends the concepts to the response generator module, the emotional system updates the required information so that the response generator will look up the appropriate response depending on the concepts. Finally, through the synthesizer client, the response with the emotion is passed to the synthesizer server, which reproduces the response.

(Diagram: Speech Recognizer, Language Understanding Module, Dialogue Manager, Response Generator, OpenCV Client, Synthesiser Client, OpenCV Server, Emotional System (Update Emotions), Synthesis Server, Execution Module (IR, Roomba, Andromotic); data exchanged: voice command, text, text & emotion, face/smile/motion detection, emotional response, IR commands, set emotions, semantic concepts)


Figure 4: Architecture data flow

3. System Communications

In this chapter, the research on the different communication topologies is presented, along with the available tools and the considerations made in choosing the implementation used in the final architecture. The communication system allows the chosen architecture to be implemented efficiently, and provides communication between all the modules needed to operate the whole system.

3.1. Concepts

All computer systems can be classified into two categories: centralized and distributed (see Figure 5). Distributed systems can be divided into the Client-Server model and the Peer-to-Peer model. The Client-Server model can be flat, where all clients communicate with a single server, or hierarchical for improved scalability. In a hierarchical model, the servers of one level act as clients to higher-level servers [].

The Peer-to-Peer architecture is split into pure and hybrid architectures. The pure architecture works without a central server, whereas the hybrid architecture first contacts a server to obtain meta-information, such as the identity of the peer, on which some information is stored, or to verify security credentials [].

Figure 5: Classification of Computer Systems []

3.1.1. Peer-to-peer

A peer-to-peer network, commonly abbreviated P2P, is any distributed network architecture composed of participants that make a portion of their resources (such as processing power, disk storage, or network bandwidth) directly available to other network participants, without the need for central coordination instances (such as servers or stable hosts) []. Peers are both suppliers and consumers of resources, in contrast to the traditional client-server model, where only servers supply and clients consume.

A peer is a network node that can act as a client or a server, with or without centralized control, and with or without continuous connectivity. The term peer can be applied to a wide range of device types, from small handheld devices to powerful server-class machines that are closely managed [].

Figure 6: Peer-to-Peer Architecture

3.1.2. Client/Server Model

The client-server model of computing is a distributed application structure that partitions tasks or workloads between the providers of a resource or service, called servers, and service requesters, called clients []. Clients and servers often communicate over a computer network on separate hardware, but both client and server may reside in the same system. A server machine is a host that runs one or more server programs which share their resources with clients. A client does not share any of its resources, but requests a server's content or service function. Clients therefore initiate communication sessions with servers, which await (listen for) incoming requests.

Server: A provider of services; the server must compute requests and return the results with an appropriate protocol. A server can run on the same device as the client, or on a different device reachable over the network. The decision to outsource a service from an application in the form of a server can have different reasons.

Client: A client is typically a device or a process that uses the service of one or more servers. Since clients are often the interface between server information and people, clients are designed for information input and visualization. Although clients had few resources and little functionality in the past, today most clients are PCs with far greater resources and functionality [].

Figure 7: Client-Server Architecture

3.1.3. Advantages/Disadvantages

The advantages and disadvantages of each model are summarized below (see Table 1).

Peer-to-Peer advantages:

In a pure Peer-to-Peer architecture there is no single point of failure: if one peer breaks down, the rest of the peers are still able to communicate.

Peer-to-Peer provides the opportunity to take advantage of unused resources such as processing power and storage capacity, whereas in Client-Server architectures the centralized system bears the majority of the cost.

Peer-to-Peer prevents bottlenecks such as the traffic overload of a central-server architecture, because it can distribute data and balance requests across the network without a central server.

Client-Server advantages:

Data management is much easier because the files are in one location, allowing fast backups and efficient error management. Multiple levels of permissions can prevent users from damaging files.

The server hardware is designed to serve client requests quickly. All data are processed on the server, and only the results are returned to the client, reducing network traffic between server and client and improving network performance.

Peer-to-Peer disadvantages:

Many applications today need a high security standard, which is not satisfied by current Peer-to-Peer solutions.

The connections between peers are normally not designed for high throughput rates, even as the coverage of ADSL and cable modem connections increases.

A centralized or Client-Server system works as long as the service provider keeps it up and running, but if peers start to abandon a Peer-to-Peer system, its services become unavailable to everyone.

Client-Server disadvantages:

Client-Server systems are very expensive and need a lot of maintenance.

The server constitutes a single point of failure. If the server fails, the system may suffer heavy delays or break down completely, potentially blocking hundreds of clients from working with their data or applications. Within companies, high costs can accumulate due to server downtime.

Table 1: Advantages/Disadvantages P2P vs. Client-Server

3.2. Communication Methods & Tools

Several communication methods exist to pass data from one process or application to another. Since our architecture is based on a client-server model (as seen in Chapter 2), it is worth reviewing some of the mechanisms used for client-server communication.

3.2.1. Sockets

Internet sockets constitute a mechanism for delivering incoming data packets to the appropriate application process or thread, based on a combination of local and remote IP addresses and port numbers. Each socket is mapped by the operating system to a communicating application process or thread.

To use plain sockets, a formatted message of some kind is required. Socket programming demands a lot of time: if we weigh the programming benefits against the time spent on parsing, error handling, and related tasks for a custom message infrastructure, a SOAP-based standard mechanism is preferable, since it is more flexible and hides all the complexity of socket programming.
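The framing and parsing burden described above can be illustrated with a minimal sketch (not code from the thesis): a toy Hi-Fi command sent over a plain TCP socket with a hand-rolled, length-prefixed message format. The `PLAY` command and its reply are purely illustrative.

```python
# Minimal sketch of why plain sockets need a custom message format:
# every field must be framed, sent, parsed and error-checked by hand.
import socket
import struct
import threading

def send_msg(sock, payload: bytes) -> None:
    # Custom framing: 4-byte big-endian length prefix, then the payload.
    sock.sendall(struct.pack(">I", len(payload)) + payload)

def _recv_exactly(sock, n: int) -> bytes:
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        buf += chunk
    return buf

def recv_msg(sock) -> bytes:
    (length,) = struct.unpack(">I", _recv_exactly(sock, 4))
    return _recv_exactly(sock, length)

def server(listener):
    conn, _ = listener.accept()
    with conn:
        request = recv_msg(conn).decode()
        # The "protocol" is an ad-hoc convention the client must also follow.
        command, _, argument = request.partition(" ")
        if command == "PLAY":
            send_msg(conn, f"OK playing {argument}".encode())
        else:
            send_msg(conn, b"ERR unknown command")

listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
threading.Thread(target=server, args=(listener,), daemon=True).start()

client = socket.create_connection(listener.getsockname())
send_msg(client, b"PLAY track01")
reply = recv_msg(client).decode()
client.close()
print(reply)  # OK playing track01
```

Even this toy exchange already needs framing, partial-read handling and an ad-hoc command grammar, which is exactly the work a SOAP stack takes over.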

3.2.2. Service-Oriented Architecture (SOA)

A service-oriented architecture (SOA) is a flexible set of design principles used during the phases of systems development and integration. A deployed SOA-based architecture will provide a loosely-integrated suite of services that can be used within multiple business domains.

Service-orientation requires loose coupling of services with operating systems and other technologies that underlie applications. SOA separates functions into distinct units, or services, which developers make accessible over a network in order to allow users to combine and reuse them in the production of applications. These services and their corresponding consumers communicate with each other by passing data in a well-defined, shared format, or by coordinating an activity between two or more services.

SOA also generally provides a way for consumers of services, such as web-based applications, to be aware of available SOA-based services. For example, several disparate departments within a company may develop and deploy SOA services in different implementation languages; their respective clients will benefit from a well understood, well defined interface to access them. XML is commonly used for interfacing with SOA services, though this is not required.

Web services can implement a service-oriented architecture. Web services make functional building-blocks accessible over standard Internet protocols independent of platforms and programming languages. These services can represent either new applications or just wrappers around existing legacy systems to make them network-enabled.

Using SOA in the web-service approach, each module can play one or more of the following roles:

Service provider - The service provider creates a web service and possibly publishes its interface and access information to the service registry. Each provider must decide which services to expose, how to trade off security against easy availability, how to price the services, or (if no charges apply) how or whether to exploit them for other value. The provider also has to decide in which category the service should be listed for a given broker service and what sort of trading-partner agreements are required to use the service.

Service broker - The service broker, or registry, registers which services are available and lists all the potential service recipients. The implementer of the broker decides its scope: public brokers are available through the Internet, while private brokers are only accessible to a limited audience, for example users of a company intranet. The amount of information offered also has to be decided. Some brokers specialize in many listings; others offer high levels of trust in the listed services. Some cover a broad landscape of services, others focus on an industry, and some brokers even catalogue other brokers. Depending on the business model, brokers can attempt to maximize look-up requests, the number of listings, or the accuracy of the listings. The Universal Description, Discovery and Integration (UDDI) specification defines a way to publish and discover information about Web services. Other service-broker technologies include, for example, ebXML (Electronic Business using eXtensible Markup Language) and those based on the ISO/IEC 11179 Metadata Registry (MDR) standard.

Service consumer - The service consumer, or web-service client, locates entries in the broker registry using various find operations and then binds to the service provider in order to invoke one of its web services. Whatever service a consumer needs, it looks the service up in the broker, binds to the respective service, and then uses it. A consumer can access multiple services if the provider offers multiple services.

3.2.3. Web Services and SOA

Web services technology is a collection of standards (or emerging standards) that can be used to implement an SOA. Web services technology is vendor and platform-neutral, interoperable, and supported by many vendors today.

Web services are self-contained, modular applications that can be described, published, located, and invoked over networks. Web services encapsulate business functions, ranging from a simple request-reply to full business process interactions. The services can be new or wrap around existing applications.

The W3C defines a "Web service" as a software system designed to support interoperable machine-to-machine interaction over a network. It has an interface described in a machine-processable format (specifically WSDL). Other systems interact with the Web service in a manner prescribed by its description using SOAP messages, typically conveyed using HTTP with an XML serialization in conjunction with other Web-related standards.

The Web Services Description Language (WSDL) is an XML-based language that provides a model for describing Web services. WSDL is often used in combination with SOAP and an XML Schema to provide web services over the Internet. A client program connecting to a web service can read the WSDL to determine what operations are available on the server. Any special data types used are embedded in the WSDL file in the form of XML Schema. The client can then use SOAP to actually call one of the operations listed in the WSDL.

3.2.3.1. Simple Object Access Protocol (SOAP)

SOAP is a lightweight protocol for the exchange of information in a decentralized, distributed environment. It is an XML-based protocol that consists of three parts:

1. The format of a SOAP message is an envelope containing zero or more headers and exactly one body. The envelope is the top element of the XML document, providing a container for control information, the addressee of a message, and the message itself. Headers contain control information such as quality-of-service attributes. The body contains the message identification and its parameters.

2. Encoding rules are used for expressing instances of application-defined data types. SOAP defines a programming-language-independent data-type schema based on the XML Schema Definition (XSD), plus encoding rules for all data types defined in this model.

3. RPC representation is the convention for representing remote procedure calls (RPC) and responses.

SOAP tries to pick up where XML-RPC left off by adding user-defined data types, the ability to specify the recipient, message-specific processing control, and other features.

SOAP's greatest feature is its ability to step past XML-RPC's limitations and customize every portion of the message, which allows developers to describe exactly what they want within their message. The downside is that the more a message is customized, the more work it takes to make a foreign system do anything beyond simply parsing it.

In the example below, a GetStockPrice request is sent to a server. The request has a StockName parameter, and a Price parameter that will be returned in the response. The namespace for the function is defined in "http://www.example.org/stock".

<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
  <soap:Body>
    <m:GetStockPrice xmlns:m="http://www.example.org/stock">
      <m:StockName>IBM</m:StockName>
    </m:GetStockPrice>
  </soap:Body>
</soap:Envelope>

Figure 8: SOAP request example

<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
  <soap:Body>
    <m:GetStockPriceResponse xmlns:m="http://www.example.org/stock">
      <m:Price>34.5</m:Price>
    </m:GetStockPriceResponse>
  </soap:Body>
</soap:Envelope>

Figure 9: SOAP response example
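For illustration, the request above can also be built programmatically. The following sketch (not code from the thesis) constructs the GetStockPrice envelope with Python's standard library, using the SOAP 1.2 envelope namespace and the example namespace from the text.

```python
# Illustrative sketch: building and re-parsing the GetStockPrice SOAP
# request with the standard library.  The envelope namespace follows the
# SOAP 1.2 convention; the stock namespace comes from the example above.
import xml.etree.ElementTree as ET

SOAP_NS = "http://www.w3.org/2003/05/soap-envelope"
STOCK_NS = "http://www.example.org/stock"

def build_request(stock_name: str) -> bytes:
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{STOCK_NS}}}GetStockPrice")
    ET.SubElement(call, f"{{{STOCK_NS}}}StockName").text = stock_name
    return ET.tostring(envelope, encoding="utf-8")

request = build_request("IBM")

# A receiving service would parse the parameter back out of the envelope:
parsed = ET.fromstring(request)
name = parsed.find(f".//{{{STOCK_NS}}}StockName").text
print(name)  # IBM
```

The point of the exercise is that the whole message is ordinary, well-formed XML, so any XML toolkit on any platform can produce or consume it.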

3.3. Final implementation

Given these advantages and features, and the GTH's prior experience with SOAP, it was chosen as the protocol for creating the web services. The modules in the architecture can run on the same machine or on different machines over the network, interacting with each other.

Implementing SOAP brings a number of benefits, including modularization, transparency, and distribution of the computational load. Because each function is a specific web service, a module in our architecture can be replaced in order to evaluate and compare its performance.

The Jabon toolkit, developed by the Intelligent Control Research Group of the Universidad Politecnica de Madrid and based on the SOAP protocol, was used in the development. SOAP messages can be exchanged in two ways:

1. Using a central server: A central server mediates between clients and services. This hub contains the rules, locations, and input/output data parameters of each service. Clients send requests to the central server, which associates each request with a specific service. The central server stores this information in a set of rules (rules.xml) and agents (agents.xml).


Figure 10: Example rule.xml

2. Clients directly call a specific service in a known location: Clients request services from the service provider directly, knowing the location and the input/output parameters of each service.

The final application uses this second approach. The different modules communicate with the main program, which acts as a server without the use of rules, and the server communicates with specific services at specific locations when needed. The central hub provided by the Jabon tools was not used because the main program already acts as the server that distributes the information, so another hub was not needed. Figure 11 presents the XML implementation of a SOAP message sent from the emotional module to the main application. The message, called sendemotionlevelstoHifi, consists of several float variables that store emotional information such as the happiness, sadness, and anger levels; these variables are sent to the Hi-Fi main program to be processed and used, for example to synthesize an adequate response to a given request.
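As a hypothetical sketch only, the serialization and parsing of such a message could look like the following. The message name and the happy/sad/anger float fields come from the text; the exact element layout used by the Jabon toolkit is an assumption made for illustration.

```python
# Hypothetical sketch of a sendemotionlevelstoHifi payload.  The message
# name and the happy/sad/anger float fields come from the thesis text;
# the element layout is an assumption, not the actual Jabon format.
import xml.etree.ElementTree as ET

def build_emotion_message(levels: dict) -> bytes:
    msg = ET.Element("sendemotionlevelstoHifi")
    for emotion, level in levels.items():
        ET.SubElement(msg, emotion).text = f"{level:.2f}"
    return ET.tostring(msg, encoding="utf-8")

def parse_emotion_message(data: bytes) -> dict:
    # The Hi-Fi main program would recover the float levels like this.
    root = ET.fromstring(data)
    return {child.tag: float(child.text) for child in root}

wire = build_emotion_message({"happy": 0.80, "sad": 0.05, "anger": 0.10})
levels = parse_emotion_message(wire)
print(levels["happy"])  # 0.8
```

Whatever the concrete layout, the key property is the one the text relies on: the emotional levels travel as plain XML floats that the main program can parse and act on.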



Figure 11: sendemotionstohifi Jabon-xml message

4. Emotional System

Emotion is fundamental to human experience, influencing cognition, perception, and everyday tasks such as learning, communication, and even rational decision-making. However, technologists have largely ignored emotion and have often created frustrating experiences for people, partly because affect has been misunderstood and is hard to measure. The study and development of systems and devices that can recognize, interpret, process, and simulate human emotions is nowadays known as affective computing, an interdisciplinary field spanning computer science, psychology, and cognitive science.

This modern branch of computer science originated with Rosalind Picard's 1995 paper on affective computing. The main motivation behind affective computing is the ability to simulate empathy: the system should be able to interpret the emotional states of its users and adapt its behaviour to them.

4.1. Emotional Theories

Several theories try to explain emotional behaviour in human beings. Applying these theories and adapting them to a robotic agent is the first step towards creating an emotional agent. Our emotional system is based on appraisal theory, a psychological theory which establishes that our emotions are based on the judgments we make about the events happening around us; the evaluation of a given situation therefore determines our emotional response.

The other foundation of the emotional system is need theory, based on Maslow's hierarchy of needs, which was adapted to the emotional agent in order to feed the appraisal model.

This section provides the theoretical foundations of the emotional theories. Many of these theories complement each other, and their studies, results, and conclusions provide important guidelines for applying the theories to a robotic agent. It is interesting to note that the majority of computational models of emotion, when they refer expressly to psychological theories, are based on the so-called appraisal theories. These approaches can be converted into program code by adapting the models to the robotic agent.

4.1.1. Two factor theory of emotion

The Two Factor theory or the Schachter-Singer Theory of Emotion is a social psychological theory of affective experience, which integrates the role of both physiological arousal and cognitive factors in determining emotion. The theory posits that the experience of particular emotions is dependent on cognitive labels exerting a steering function over general physiological arousal.

The theory thus presents a model of emotional experience based on cognitive labels in response to physiological excitation. In this theory the individual senses the particular emotional object of the situation through the sense organs. An induced form of autonomic arousal then follows this perception. Accompanying this general pattern of sympathetic excitation is a specific cognitive label, which allows one to interpret this stirred-up state in terms of the characteristics of the precipitating situation and one's apperceptive mass. The theory also addresses the salience of feedback mechanisms, as past experiences provide the framework within which one understands and labels one's feelings.

Figure 12: Two Factor Theory []

4.1.2. Ortony, Clore, and Collins

Ortony, Clore, and Collins define a cognitive approach to emotions. This theory is extremely useful for the project of modelling agents which can experience emotions. The cornerstone of their analysis is that emotions are valenced reactions: they do not describe events in a way that will cause emotions; rather, emotions occur as a result of how people understand events. This approach is surprisingly subtle and nuanced; there are many constraints and caveats, but these are all logical considering the perspective of the model.

One can be pleased about the consequences of an event or not (pleased/displeased); one can endorse or reject the actions of an agent (approve/disapprove) or one can like or not like aspects of an object (like/dislike).

A further differentiation consists of the fact that events can have consequences for others or for oneself, and that an acting agent can be another or oneself. The consequences of an event for another agent can be divided into desirable and undesirable; the consequences for oneself into relevant and irrelevant expectations. Relevant expectations for oneself can finally be differentiated according to whether they actually occur or not (confirmed/disconfirmed).

This differentiation leads to the following structure of emotion types:

Figure 13: Structure of emotion types in the theory of Ortony, Clore and Collins []

The intensity of an emotional feeling is determined predominantly by three central intensity variables:

Desirability is linked with the reaction to events and is evaluated with regard to goals.

Praiseworthiness is linked with the reaction to actions of agents and is evaluated with regard to standards.

Appealingness is linked with the reaction to objects and is evaluated with regard to attitudes.

This theory further defines a set of global and local intensity variables. Sense of reality, proximity, unexpectedness and arousal are the four global variables which operate over all three emotion categories. The local variables, to which the central intensity variables mentioned above also belong, are:

EVENTS: Desirability, Desirability for other, Deservingness, Likelihood, Effort, Realization

AGENT: Praiseworthiness, Strength of cognitive unit, Expectation deviation

OBJECTS: Appealingness, Familiarity, Liking

Table 2: Ortony, Clore, and Collins variables

In a specific case, each of these variables is assigned a value and a weight. Furthermore, there is a threshold value for each emotion, below which the emotion is not subjectively felt.

On the basis of this model, the emergence of an emotion can be described in formal language. Let D(p,e,t) be the desirability D of an event e for a person p at a certain time t. This function has a positive value for a desirable event and a negative value for an undesirable one. Furthermore, let Ig(p,e,t) be a combination of the global intensity variables and Pj(p,e,t) the potential for a state of joy. Then the following rule for "joy" can be given:

IF D(p,e,t) > 0

THEN set Pj(p,e,t) = fj(D(p,e,t), Ig(p,e,t))

The resulting function fj triggers a further rule which determines the intensity of joy (Ij) and thereby makes the experience of the joy emotion possible. Let Tj be a threshold value; then:

IF Pj(p,e,t) > Tj(p,t)

THEN set Ij(p,e,t) = Pj(p,e,t) - Tj(p,t)

ELSE set Ij(p,e,t) = 0

If the threshold value is exceeded, this rule produces the emotion of joy; otherwise it supplies the value "zero", i.e., no emotional feeling. Depending upon the intensity of the emotion, different tokens are used for its description. Such tokens are words which describe this emotion.
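The two rules above can be transcribed directly into code. Since Ortony, Clore and Collins leave the combination function fj unspecified, a simple product of desirability and global intensity is assumed here purely for illustration.

```python
# Direct sketch of the two "joy" rules above.  The combination function
# fj is not fixed by the OCC model; a plain product is assumed here.
def joy_potential(desirability: float, global_intensity: float) -> float:
    # IF D(p,e,t) > 0 THEN Pj = fj(D, Ig); otherwise no joy potential.
    if desirability > 0:
        return desirability * global_intensity  # assumed fj
    return 0.0

def joy_intensity(potential: float, threshold: float) -> float:
    # IF Pj > Tj THEN Ij = Pj - Tj ELSE Ij = 0
    if potential > threshold:
        return potential - threshold
    return 0.0

p = joy_potential(desirability=0.75, global_intensity=1.0)
print(joy_intensity(p, threshold=0.25))  # above threshold: felt joy, 0.5
print(joy_intensity(p, threshold=0.9))   # below threshold: 0.0, no feeling
```

The same two-stage pattern (potential, then thresholded intensity) would apply to any other OCC emotion, only with different appraisal variables feeding the potential.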

Ortony, Clore and Collins supply no formalization for all of their defined emotions but give only a few examples. They postulate, however, that every emotion can be described using a formal notation, although with many emotions this is by far more complex than with the presented example.

4.1.3. Theory of Roseman

The theory of Roseman, first presented in 1979, was modified by him several times in the following years. Some of its details changed substantially; only the basic approach of an appraisal theory of emotions remained constant.

Roseman developed his first theory based upon 200 written reports of emotional experiences. From the analysis of these documents, he derived his model, in which five cognitive dimensions determine whether an emotion arises and which one it is.

The first dimension describes whether a person is motivated towards a desired situational state or away from an unwanted one. This dimension thus has the states "positive" and "negative".

The second dimension describes whether the situation agrees with the motivational state of the person or not, with the states "situation present" and "situation absent".

The third dimension describes whether an event is perceived as certain or only as a possibility, with the states "certain" and "uncertain".

The fourth dimension describes whether a person perceives the event as deserved or undeserved, with the states "deserved" and "undeserved".

The fifth dimension, finally, describes from whom the event originates, with the states "the circumstances", "others", and "oneself".

From the combination of these five dimensions and their values, a table can be arranged, from which, according to Roseman, emotions can be predicted.

Altogether, 48 combinations can be formed from Roseman's dimensions (positive/negative x present/absent x certain/uncertain x deserved/undeserved x circumstances/others/oneself). According to Roseman, 13 emotions correspond to these 48 cognitive appraisals.
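The count of 48 follows directly from the five dimensions (2 x 2 x 2 x 2 x 3) and can be checked by enumeration:

```python
# The 48 appraisal combinations of Roseman's first model, enumerated
# directly from the five dimensions described above.
from itertools import product

dimensions = [
    ("positive", "negative"),                  # motivation
    ("situation present", "situation absent"), # situational state
    ("certain", "uncertain"),                  # certainty
    ("deserved", "undeserved"),                # legitimacy
    ("circumstances", "others", "oneself"),    # origin of the event
]

combinations = list(product(*dimensions))
print(len(combinations))  # 2 * 2 * 2 * 2 * 3 = 48
```

Mapping the 13 emotions onto these 48 cells would then be a lookup table over such tuples, which is exactly what makes Roseman's model attractive for implementation.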

After experimental examinations of this approach failed to furnish the results Roseman postulated, he modified his model. The second dimension of the original model (situation present or absent) now contained the states "motive consistent" and "motive inconsistent", whereby "motive consistent" always corresponds to the value "positive" of the first dimension and "motive inconsistent" to the value "negative". The terms "appetitive" and "aversive" were used in place of the alternatives "present" and "absent".

A further correction concerned the fourth dimension of the original model (deserved/undeserved). Roseman replaced it with the dimension of strength, i.e. whether a person in a given situation perceives himself or herself as strong or weak; the states of this dimension are thus "strong" and "weak".

Roseman also supplemented the third dimension of his original model (certain/uncertain) with a further state, "unknown". That was necessary in order to incorporate the emotion of surprise in his model.

Roseman concedes that this model, too, could not be empirically validated. As a consequence he developed a third version of his model. It differs from his second approach in several points: the fourth dimension (strong/weak) is replaced by a relational appraisal of one's own control potential, with the states "low" and "high". The value "unknown" of the third dimension is replaced by the state "unexpected", since this is, according to Roseman, the condition for the emotion of surprise. Finally, Roseman adds another dimension for the negative emotions, which he calls "type of problem": it describes whether an event is perceived as negative because it blocks a goal (with the result "frustration") or because it is negative in its nature (with the result "abhorrence"). This dimension has the states "non-characteristic" and "characteristic".

How far this (as of now) last model by Roseman can be proven empirically cannot yet be said. One weakness of the model, however, is evident: it has problems dealing with a situation in which one person makes two different appraisals. If, for example, a student believes that his teacher has given him an unfair test but knows at the same time that he has not prepared sufficiently for it, then Roseman's model cannot clearly predict the student's emotions, because two states of the fifth dimension are present at the same time.

4.1.4. Theory of Frijda

Frijda points out that the word "emotion" does not refer to a "natural class" and cannot refer to a well-defined class of phenomena clearly distinguishable from other mental and behavioural events. For Frijda, therefore, the process of emotion emergence is of greater interest.

The centre of Frijda's theory is the term concern. A concern is the disposition of a system to prefer certain states of the environment and of its own organism over the absence of such states. Concerns produce goals and preferences for a system. If the system has problems realizing these concerns, emotions develop. The strength of such an emotion is determined essentially by the strength of the relevant concern(s).

Frijda defines six substantial characteristics of the emotion system which describe its function:

Concern relevance detection: The emotion subsystem announces the meaning of events for the concerns of the overall system to all other components of the system. This signal is called affect by Frijda. This means the system must be able to pick up information from the environment and from the system itself.

Appraisal: Next, the meaning of the stimulus for the concerns of the system has to be appraised. This is a two-stage process with the subprocesses relevance appraisal and context appraisal.

Control precedence: If the relevance signal is strong enough, it changes the priorities of perception, attention and processing. It produces a tendency to affect the behaviour of the system. Frijda calls this control precedence.

Action readiness changes: According to Frijda, this represents the heart of the emotional reaction. Change of the action readiness means changes in the dispatching of processing and attention resources as well as the tendency towards certain kinds of actions.

Regulation: Apart from the activation of certain forms of action readiness, the emotion system monitors all processes of the overall system and events of the environment which can affect this action readiness, in order to be able to intervene accordingly.

Social nature of the environment: The emotion system is adjusted to the fact that it operates in a predominantly social environment. Many appraisal categories are therefore of a social nature; action readiness is predominantly a readiness for social actions.

For Frijda, emotions are absolutely necessary for systems which realize multiple concerns in an uncertain environment. If a situation occurs in which the realization of these concerns appears endangered, so-called action tendencies develop. These action tendencies are linked closely with emotional states and serve as a safety device for what Frijda calls concern realization (CR).

Frijda defines the following actions, with the associated emotions in parentheses:

Approach (Desire)

Avoidance (Fear)

Being-with (Enjoyment, Confidence)

Attending (Interest)

Rejecting (Disgust)

Nonattending (Indifference)

Agonistic (Attack/Threat, Anger)

Interrupting (Shock, Surprise)

Dominating (Arrogance)

Submitting (Humility, Resignation)

According to Frijda, a functioning emotional system must have the following components:

Concerns: Internal representations against which the existing conditions are tested.

Action Repertoire: Consisting of fast emergency reactions, social signals and mechanisms to develop new plans.

Appraisal Mechanisms: Mechanisms which establish the fit between events and concerns as well as connections to the action control system and the action repertoire.

Analyser: Observes incoming information and codes it regarding its implications and consequences.

Comparator: Tests all information for concern relevance. The results are relevance signals, which activate the action system and the Diagnoser and cause attentional arousal.

Diagnoser: Responsible for context evaluation, scanning the information for action-relevant references. It performs a number of tests (e.g. whether the consequences of an event are certain or uncertain, who is responsible for it, etc.) and produces an appraisal profile.

Evaluator: Combines the agreement or discrepancy signals of the Comparator and the profile of the Diagnoser into the final relevance signal and its intensity parameter. The intensity signals the urgency of an action to the action system; the relevance signal constitutes the so-called control precedence signal.

Action Proposer: Prepares the action by selecting a suitable alternative course of action and by making available the resources necessary for it.

Actor: Generates actions.

This general description of an emotional system can be formalized in such a way that it can form the basis for a computer model:

Figure 14: Frijda Emotional System []
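As a schematic illustration only, the data flow through these components can be sketched in code. The concern representation, the matching rule, and the intensity formula below are assumptions made to show the chain from stimulus to control precedence; they are not Frijda's own formalization.

```python
# Schematic sketch of Frijda's component chain (Comparator -> Diagnoser
# -> Evaluator).  The concern representation, matching rule and the
# intensity formula are illustrative assumptions, not Frijda's model.
def comparator(event: dict, concerns: dict) -> float:
    # Relevance signal: how strongly the event touches any concern.
    return max(
        (strength for name, strength in concerns.items()
         if name in event["affects"]),
        default=0.0,
    )

def diagnoser(event: dict) -> dict:
    # Context appraisal: a minimal appraisal profile.
    return {
        "certain": event.get("certain", True),
        "caused_by": event.get("caused_by", "circumstances"),
    }

def evaluator(relevance: float, profile: dict) -> float:
    # Combine relevance and context into the control-precedence signal;
    # uncertain events are discounted (assumed factor 0.5).
    return relevance * (1.0 if profile["certain"] else 0.5)

concerns = {"safety": 0.9, "comfort": 0.4}
event = {"affects": ["safety"], "certain": True, "caused_by": "others"}

relevance = comparator(event, concerns)
control_precedence = evaluator(relevance, diagnoser(event))
print(control_precedence)  # 0.9
```

A strong control-precedence signal would then reprioritize attention and trigger the Action Proposer, mirroring the pipeline in the figure.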

4.1.5. Theory of Oatley and Johnson-Laird

Oatley and Johnson-Laird assume in their theory, which they call the "communicative theory of emotions", a hierarchy of parallel processing instances that work asynchronously on different tasks. These instances are coordinated by a central control, which contains a model of the entire system.

The individual modules of the system communicate with one another so that the system can work. According to Oatley and Johnson-Laird there are two kinds of communication. They call the first kind propositional or symbolic; through it, actual information about the environment is conveyed. The second kind of communication is nonpropositional or emotional in nature; its task is not to convey information but to shift the entire system of modules into a state of increased attention, the so-called emotion mode. This function is comparable to global interrupt programs on computers:

Emotion signals provide a specific communication system which can invoke the actions of some processors [modules] and switch others off. It sets the whole system into an organized emotion mode without propositional data having to be evaluated by a high-level conscious operating system...The emotion signal simply propagates globally through the system to set into one of a small number of emotion modes.

According to Oatley, the central postulate of the theory is that each goal and plan has a monitoring mechanism that evaluates events relevant to it. When a substantial change of probability of achieving an important goal or sub goal occurs, the monitoring mechanism broadcasts (to the whole cognitive system) a signal that can set it into readiness to respond to this change. Humans experience these signals and the states of readiness they induce as emotions.

Emotions coordinate quasi-autonomous processes in the nervous system by communicating significant waymarks of current plans (plan junctures). Oatley and Johnson-Laird connect such plan junctures with elementary emotions:

Plan juncture -> Emotion

Subgoals being achieved -> Happiness
Failure of major plan -> Sadness
Self-preservation goal violated -> Anxiety
Active plan frustrated -> Anger
Gustatory goal violated -> Disgust

Table 3: Plan junctures []
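The juncture-to-emotion mapping of Table 3 translates directly into a lookup table; the sketch below is illustrative only, with the broadcast step reduced to returning the matching emotion signal.

```python
# The plan-juncture -> emotion mapping of Table 3 as a lookup table.
PLAN_JUNCTURES = {
    "subgoals being achieved": "happiness",
    "failure of major plan": "sadness",
    "self-preservation goal violated": "anxiety",
    "active plan frustrated": "anger",
    "gustatory goal violated": "disgust",
}

def broadcast_emotion(juncture: str) -> str:
    # The monitoring mechanism broadcasts the matching emotion signal
    # to the whole cognitive system; here it is simply returned.
    return PLAN_JUNCTURES.get(juncture, "no emotion signal")

print(broadcast_emotion("active plan frustrated"))  # anger
```

In a fuller model, the returned signal would switch every module into the corresponding emotion mode, as the quoted passage describes.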

Since they arise at plan junctures, emotions are a design solution for problems of plan changes in systems with a multiplicity of goals.

The name "communicative theory of emotions" was chosen because it is the task of emotions to convey certain information to all modules of the overall system.

4.2. Theories behind our emotional system

The Nemo emotional model is based on the study and adaptation of psychological theories that try to explain how motivation, emotions and appraisal work in human beings. The adaptations are necessary in order to create a model for an intelligent system restricted to domestic use.

The idea behind the emotional model implemented in the Hi-Fi system is to create a more human-like interface that provides more comfort and naturalness as the user interacts with it. The basic theories that support it are Maslow's hierarchy of needs and appraisal theory.

4.2.1. Maslow's hierarchy of needs

Abraham Maslow attempted to synthesize a large body of research related to human motivation. Prior to Maslow, researchers generally focused separately on factors such as biology, achievement, or power to explain what energizes, directs, and sustains human behaviour. Maslow posited a hierarchy of human needs based on two groupings: deficiency needs and growth needs. Within the deficiency needs, each lower need must be met before moving to the next higher level. Once each of these needs has been satisfied, if at some future time a deficiency is detected, the individual will act to remove the deficiency. Maslow believed that these needs are similar to instincts and play a major role in motivating behaviour. The first four levels that represent the deficiency needs are:

1. Physiological: hunger, thirst, bodily comforts.

2. Safety/security: out of danger.

3. Belongingness and Love: affiliate with others, be accepted.

4. Esteem: to achieve, be competent, gain approval and recognition.

As people progress up the pyramid, needs become increasingly psychological and social. Soon, the need for love, friendship and intimacy become important. Further up the pyramid, the need for personal esteem and feelings of accomplishment take priority. An individual is ready to act upon the growth needs if and only if the deficiency needs are met. Growth needs do not stem from a lack of something, but rather from a desire to grow as a person. Maslow emphasized the importance of self-actualization, which is a process of growing and developing as a person to achieve individual potential. Maslow's initial conceptualization included only one growth need called self-actualization. Self-actualized people are characterized by:

1. Being problem-focused

2. Incorporating an ongoing freshness of appreciation of life

3. A concern about personal growth

4. The ability to have peak experiences.

The figure below shows Maslow's hierarchy of needs.

Figure 15: Maslow's Hierarchy of Needs

The different levels in Maslow's hierarchy of needs are explained as follows:

Physiological Needs

These include the most basic needs that are vital to survival, such as the need for water, air, food and sleep. Maslow believed that these needs are the most basic and instinctive needs in the hierarchy because all other needs become secondary until these physiological needs are met.

Security Needs

These include needs for safety and security. Security needs are important for survival, but they are not as demanding as the physiological needs. Examples of security needs include a desire for steady employment, health insurance, safe neighbourhoods and shelter from the environment.

Social Needs

These include needs for belonging, love and affection. Maslow considered these needs to be less basic than physiological and security needs. Relationships such as friendships, romantic attachments and families help fulfil this need for companionship and acceptance, as involvement in social, community or religious groups does.

Esteem Needs

After the first three needs have been satisfied, esteem needs become increasingly important. These include the need for things that reflect on self-esteem, personal worth, social recognition and accomplishment.

Self-actualizing Needs

This is the highest level of Maslow's hierarchy of needs. Self-actualizing people are self-aware, concerned with personal growth, less concerned with the opinions of others and interested in fulfilling their potential.

4.2.2. Appraisal theory

Appraisal theory is the idea that emotions are extracted from our evaluations/judgements of events that cause specific reactions in different people. Essentially, our appraisal of a situation causes an emotional, or affective, response that is going to be based on that appraisal. An example of this is going on a first date. If the date is perceived as positive, one might feel happiness, joy, giddiness, excitement, and/or anticipation, because they have appraised this event as one that could have positive long term effects, i.e. starting a new relationship, engagement, or even marriage. On the other hand, if the date is perceived negatively, then our emotions, as a result, might include dejection, sadness, emptiness, or fear.

Appraisal Theory came to be as an explanation of the following:

Intensity of Response and Variance: How can we account for varying emotional response and degree of response in a situation? There are several distinct emotions (such as joy, sadness, fear, and anger), as manifested in different facial expressions observable across cultures. These similarities indicate that emotion is more universal than originally thought, and thus Appraisal Theory helps to explain the question of degree and affective variability.

Different Reaction in Similar Situations: How can we explain individual differences in affective responses to the same stimulus? Without Appraisal Theory, a stimulus should cause the same reaction in every individual who encounters it. In terms of Appraisal Theory, an aroused state will elicit different responses from different people depending on the context preceding arousal. For example, if a friendship is coming to an end, one person might feel sadness, guilt, or anger, while the other person could possibly feel relief and apathy. Based on each person's view of the friendship, their affective responses to the dissolution of the relationship will differ.

Different Stimuli and Similar Reactions: How can we account for the array of stimuli that cause a similar affective response? There is no way to quantify all the stimuli that lead to a particular affective response. Any range of context, whether considered normal to produce a particular emotional outcome or not, can produce any emotion. Narrower theories cannot account for the discrepancies in response and stimuli, whereas Appraisal Theory can.

The Start of Affective Response: What begins the emotional response? Appraisal Theory accounts for the fact that our affective responses aren't pulled from thin air. A response to a stimulus is intensified within the context of a current situation. For example, if a person were to lose their mother, and a month later lose an acquaintance, the emotional response to losing the acquaintance would be intensified by the context of having recently lost a parent, more so than if they had not recently lost a close loved one.

Role of Effective Emotional Response: How can we explain the effectiveness of an affective response? Based on Appraisal Theory, if we react in anger to a situation where anger would be a waste of energy, we are ineffectively coping with the situation. Our emotional responses are highly evolved so that they waste as little energy as possible while helping us to manage a situation.

When Affective Responses Seem Irrational: How do we explain the absurdity of emotions? Appraisal Theory helps to clarify why seemingly irrational emotions occur. Other theories, which state that emotions function to help us achieve our goals and that we can stop them at any time, cannot explain these irrational affective responses. Appraisal Theory, instead, explains how illogical emotions can be disruptive, without trying to justify them. []

4.3. Nemo: Emotional System

The model adapts human emotion theories to an emotional agent. As discussed above, the theories on which the model is based are Maslow's hierarchy of needs and appraisal theory. This emotional system is currently being developed by Syaheerah Lutfi at the Speech Technology Group in this University.

The main design objectives of the emotional system are scalability and adaptability, and it is aimed at domestic agents. The agent calculates its emotions according to appraisal variables that depend on needs. The Nemo architecture can be seen in the next figure. Tasks provide inputs to certain need levels (e.g. the battery level is related to the survival level). Different events are inputs to these need levels, and the levels are recalculated taking previous states into consideration. While needs are calculated, the agent appraises its situation according to certain appraisal variables (Desirability, Familiarity, Unexpectedness, etc.). Appraised needs are output as vectors called the Need Independent Features (NIF). Each NIF vector is mapped into an emotion of a specific type and intensity following an Emotion Matrix.


Figure 16: Nemo Architecture
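The pipeline of Figure 16 (tasks feed need levels, needs are appraised into a NIF vector, and the NIF vector is mapped into emotions) can be sketched as follows. This is an illustrative toy model: the need names come from the thesis, but the update and appraisal formulas here are placeholders, not the actual NEMO implementation.

```python
# Hypothetical sketch of the NEMO flow: events -> needs -> appraisals (NIF).
NEEDS = ["survival", "safety", "social", "success", "ethics"]

def update_needs(needs, event):
    """Apply an event (need name, delta) to the need levels, clamped to 0-100."""
    name, delta = event
    needs[name] = max(0, min(100, needs[name] + delta))
    return needs

def appraise(needs, previous):
    """Produce a task-independent feature vector (NIF) from the need levels."""
    return {
        "desirability": sum(needs[n] - previous[n] for n in NEEDS),
        "urgency": 100 - min(needs.values()),  # the worst-off need drives urgency
    }

needs = {n: 80 for n in NEEDS}
previous = dict(needs)
update_needs(needs, ("survival", -30))  # e.g. the battery level drops
nif = appraise(needs, previous)
print(nif)  # negative desirability and raised urgency
```

The point of the structure, as the text explains, is that the appraisal step only sees need levels, never the tasks themselves.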

4.3.1. Needs

Needs are based on Maslow's pyramid, adapted to an intelligent emotional system. Each level is evaluated on a scale from 0 to 100, where 0 means unsatisfied and 100 totally satisfied. The levels adapted to the domestic agent are described below:

Survival: As its name suggests, it depends on the critical components of the hardware (battery life, memory). Since this level is at the base (highest priority), failure to satisfy this need would bring the system to a halt.

Safety: It is related to the availability of the network and its resources. If a network resource is unavailable for more than a certain period of time, the safety level decreases. Safety is computed periodically (e.g. every 100 ms).

Social: It depends on how often the system interacts with the user, whether the interaction was made by voice or through a webcam, and whether the system received a caress. It is related to the agent's interpretation of its social environment.

Success: It depends on the accomplishment or non-accomplishment of certain objectives (recognition, positive interaction with the user, winning a game, etc). There are positive and negative success events that can increase or decrease this level.

Ethics: This level is influenced by threats, insults, cheating on a game, etc. (e.g. If the system detects that the user is constantly trying to confuse the system, this level will be low). The computation of Ethics is similar to Success because there are positive and negative events.

The need levels are not constant in time. They change constantly, driven by new events and by decay as time passes.
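The decay behaviour just described might be sketched like this; the decay rate and the resting value are assumptions chosen for illustration, not values from the thesis.

```python
# Illustrative time decay of a need level: with no new events, the level
# drifts towards a resting value (rate and resting point are assumptions).
def decay(level, dt, rate=0.05, resting=50.0):
    """Pull the need level towards `resting` as time `dt` passes."""
    return level + (resting - level) * min(1.0, rate * dt)

level = 90.0
for _ in range(10):          # ten time steps with no events
    level = decay(level, dt=1.0)
print(round(level, 2))       # the level has drifted part-way towards 50
```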

4.3.2. Appraisals

Appraisals take their information from the need hierarchy pyramid, remaining totally independent from tasks. These appraised need levels are output as NIF (Need Independent Features), which are mapped into a specific emotion and intensity.

The values of each appraisal range from 0 to 100, with the exception of desirability, which ranges from -100 to 100. They are described as follows:

Desirability: It ranges from -100 to 100, and it refers to the degree of satisfaction of the situation. An event is desirable if it contributes to satisfying the agent's needs; contrarily, an undesirable event brings difficulties to the agent. Desirability is modelled by observing the current and the previous state.

Unexpectedness: Refers to the degree of unexpectedness of a certain situation. Unexpected situations produce surprise, and might suspend the actions being performed in order to understand the situation. The degree of unexpectedness is based on whether or not the system predicts an event from its experience. Unexpectedness can be produced by positive or negative events.

where Level(n) is the value of the current state, Level(n-1) is the value of the previous state, and Level(n-2) is the value of the state two steps back (these definitions apply to all the computations).
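One plausible reading of these definitions (a sketch, not the thesis formula, which is not reproduced in this text) is to predict the next level by linear extrapolation from the two previous states and take the prediction error as the degree of unexpectedness:

```python
# Sketch: unexpectedness as prediction error over the level history.
def unexpectedness(level_n, level_n1, level_n2):
    """level_n: current state; level_n1, level_n2: the two previous states."""
    predicted = level_n1 + (level_n1 - level_n2)   # linear extrapolation
    return min(100.0, abs(level_n - predicted))    # clamp to the 0-100 scale

print(unexpectedness(80, 79, 78))   # steady trend: fully expected -> 0
print(unexpectedness(20, 79, 78))   # sudden drop: highly unexpected -> 60
```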

Relevance: Indicates the importance of a certain event or situation. An event is relevant if it inhibits the system from satisfying its needs or when the system is in critical condition, with 0 meaning totally irrelevant and 100 completely relevant.

where CriticalValue depends on the need.

Urgency: Estimates the time and the distance to critical levels. The urgency considers the distance between the current state of the need and the critical level. In other words, the time available before reaching a critical situation. A situation is deemed urgent when the distance between the state of the need and the critical zone is narrow or when it reaches the critical level. A situation is perceived as not so urgent if there is a reasonable distance between the state of the need and the critical zone.
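The distance-to-critical idea can be sketched as below; the critical threshold and the linear shape are assumptions for illustration (the thesis makes the threshold depend on the need).

```python
# Sketch: urgency grows as the need level approaches its critical zone.
def urgency(level, critical=20.0):
    """100 at or below the critical level, falling linearly to 0 at level 100."""
    if level <= critical:
        return 100.0                                   # already critical
    return 100.0 * (100.0 - level) / (100.0 - critical)

print(urgency(20))    # at the critical level -> 100.0
print(urgency(60))    # halfway out of danger -> 50.0
print(urgency(100))   # fully satisfied      -> 0.0
```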

Controllability: Controllability does not directly depend on the levels. Whether or not an event is controllable depends on the task. Events that are considered controllable are positive, while events that are uncontrollable are negative. The system periodically computes this by comparing the positive or the negative events against all actions.

Unfamiliarity: Depends on whether or not the system recognizes the event or the state as a known situation. If the difference in changes of several events over time is exactly the same, the agent becomes increasingly familiar with the event. After a sufficient number of repetitions, the unfamiliarity level drops to zero (the agent has become highly familiar with that particular change of event). On the other hand, if there is a new change of events that never took place before, the situation is considered unfamiliar.

where MeanLevel is the mean of the levels and StandardDeviation is their standard deviation.
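A sketch consistent with these definitions (the scaling constant and the z-score form are assumptions; the actual thesis formula is not reproduced in this text): unfamiliarity falls as the observed change matches the history of past changes.

```python
import statistics

# Sketch: score how far a new change of event sits from the familiar ones.
def unfamiliarity(history, change):
    """history: past level changes; change: the new change of event."""
    if len(history) < 2:
        return 100.0                       # nothing to compare against yet
    mean = statistics.mean(history)
    sd = statistics.stdev(history)
    if sd == 0:                            # history is perfectly regular
        return 0.0 if change == mean else 100.0
    z = abs(change - mean) / sd            # distance from familiar behaviour
    return min(100.0, 25.0 * z)            # 25.0 is an arbitrary scale factor

print(unfamiliarity([-5, -5, -5, -5], -5))   # exactly the familiar change
print(unfamiliarity([-5, -5, -5, -5], 40))   # never happened before
```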

Changeability: Related to the range of the need variation along the time axis. As an estimate, 4.0*sigma is chosen. The wider the range of the need variation, the higher the changeability.

Priority (Weight): It takes into account the need level producing the event so that importance of the need is projected.

4.3.3. Emotions

The main goal of the architecture is that emotions do not depend directly on tasks and needs. Updates of the need values generate changes in the NIF vectors, which become the input for the computation of each emotion and its dynamics. If needs are not updated for some time, the emotion level decreases and returns to the neutral level. The emotions modelled in the Hi-Fi system are basic and are limited by the application (e.g. the system cannot be disgusted). They range from 0 to 100. The list of emotions is presented below:

Surprise: Provoked by new events that the system did not expect.

Fear: It is provoked by low levels in the critical (lower) levels of the needs pyramid.

Happiness (Joy): Provoked by social interaction and successfully accomplished tasks.

Sadness: Provoked by the absence of interaction with the system.

Neutral: The neutral emotion represents 100 minus the sum of all other emotions.

Anger with itself: Provoked when a highly controllable task cannot be achieved.

Shame: Provoked by controllable ethics-related events.

In general, the emotion dynamics are expressed by the computation below:

where:

The decay term represents the flow of emotion decay, causing the neutral emotion to increase.

Weight: represents a constant weight related to the different need levels in the pyramid; lower levels have higher weights.

sign: represents the positive or negative sign of desirability.

Kur, Ku, Kf, Kagent, Kd: the weights of the corresponding appraisal variables; these weights depend on distinctive circumstances.

In NEMO, a positive sign of desirability is related to the positive emotions, happiness and positive surprise; positive events increase the probability of either of these two emotions arising. A negative sign of desirability is related to the rest of the emotions, such as fear, sadness, anger and negative surprise. []

4.3.3.1. Constant weight f(w)

Happiness, surprise and sadness are bound to the higher levels of the need pyramid, while fear is related to the lower levels. Therefore, for the latter, the weight is only applied if the affected need level is lower than or equal to Safety. This indicates that fear is more related to survival or safety satisfaction levels than to the higher need levels. On the contrary, anger is weighted only when the affected need level is higher than or equal to Success. This translates to the agent being angry when its success need is not satisfied, or when there is an absence of ethics in the events that took place. Additionally, since events that provoke anger could also provoke fear, the weight for fear is lifted at these levels.

4.3.3.2. Emotion timing

There are two distinguished types of emotions based on time; short-lived emotions such as surprise, and long-lived emotions such as happiness and sadness. As mentioned, surprise is viewed as an adaptive response strategy due to unexpectedness. After a while, these emotions are suppressed and return to neutral.

4.3.3.3. Neutral emotion

There are no events that provoke the neutral emotion; rather, the absence of events gives rise to neutralness. Hence, the neutral emotion equals one hundred (100) minus the sum of all other emotions.

The rise of a particular emotion causes neutralness to fall, while the opposite causes it to rise. Neutralness is modelled in such a way that the agent becomes accustomed to its state of need satisfaction, and so as to encourage dynamic emotional expression over a period of time.
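The neutral-emotion rule stated above is simple enough to write down directly; the emotion names below are taken from the list in section 4.3.3, and the clamping at zero is a small defensive assumption.

```python
# Neutral emotion = 100 minus the sum of all other emotions (clamped at 0).
def neutral(emotions):
    """emotions: mapping from emotion name to intensity in [0, 100]."""
    return max(0.0, 100.0 - sum(emotions.values()))

emotions = {"happiness": 30.0, "surprise": 10.0}
print(neutral(emotions))       # 60.0: other emotions leave 60 unclaimed
emotions["happiness"] = 0.0    # happiness decays with no new events
print(neutral(emotions))       # 90.0: neutralness rises as emotions fall
```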

4.3.4. Mapping appraisals

Each emotion has an associated weight value for each of the appraisal variables. These weights determine the behaviour of each emotion.

Sign   Desirability   Unexpectedness   Urgency   Controllability   Unfamiliar   Changeability   Emotion

 0     1.0            2.0              1.0       2.0               0.0          0.5             Surprise
-1     1.0            1.0              3.0       5.0               0.0          0.0             Fear
+1     1.0            1.0              1.0       2.0               0.0          0.0             Happiness
-1     1.0            1.0              1.0       2.0               0.0          0.0             Sadness
-1     4.0            1.0              2.0       1.0               0.0          0.0             Anger
-1     1.0            1.0              4.0       1.0               0.0          0.0             Shame


Table 4: Mapping Appraisal Weights into Emotions
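One way the weights of Table 4 could be applied is sketched below: each emotion's score is a weighted sum of the appraisal variables, gated by the sign of desirability. The weight values are copied from the table, but the combination rule (sign gate plus weighted sum) is an assumption for illustration, not the thesis formula.

```python
# Weights from Table 4: (sign, desirability, unexpectedness, urgency,
# controllability, unfamiliarity, changeability) per emotion.
EMOTION_MATRIX = {
    "surprise":  (0,  1.0, 2.0, 1.0, 2.0, 0.0, 0.5),
    "fear":      (-1, 1.0, 1.0, 3.0, 5.0, 0.0, 0.0),
    "happiness": (+1, 1.0, 1.0, 1.0, 2.0, 0.0, 0.0),
    "sadness":   (-1, 1.0, 1.0, 1.0, 2.0, 0.0, 0.0),
    "anger":     (-1, 4.0, 1.0, 2.0, 1.0, 0.0, 0.0),
    "shame":     (-1, 1.0, 1.0, 4.0, 1.0, 0.0, 0.0),
}

def score(nif):
    """nif = (desirability, unexpectedness, urgency, controllability,
    unfamiliarity, changeability); desirability carries the sign."""
    des = nif[0]
    scores = {}
    for emotion, (sign, *weights) in EMOTION_MATRIX.items():
        if sign != 0 and sign * des < 0:
            continue                        # sign gate: wrong valence
        scores[emotion] = sum(w * abs(v) for w, v in zip(weights, nif))
    return scores

# A negative, urgent, hard-to-control situation favours fear:
scores = score((-50, 10, 80, 60, 0, 0))
print(max(scores, key=scores.get))
```

Note how the sign gate reproduces the rule from section 4.3.3: positive desirability can only yield happiness or (sign-neutral) surprise, while negative desirability feeds the remaining emotions.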

5. Object oriented analysis

It is always important to take into consideration the numerous advantages of object oriented analysis (OOA) because it provides code efficiency, flexibility, easy maintenance, stability, and reusability. OOA organizes both information and the processing that manipulates that information, according to the real world objects that the information describes [].

For designing the emotional system architecture NEMO, the use of OOA provided great advantages. In this chapter, the basic OOA concepts used in the design and development of the NEMO architectural classes are explained, and then the implementation of the different classes is reviewed. As our emotional system is based on the evaluation of needs and appraisals in order to calculate the emotions, several classes were implemented; their purpose and relationships will be explained.

The terms class and object are sometimes used interchangeably, but in fact, classes describe the type of objects, while objects are usable instances of classes. The object-oriented methodology does not focus its analysis on the data or the functions, but on the objects. An object is an entity with a certain behaviour: it carries data, can have relationships with other objects, and, most importantly, includes the functions to get or change its own data. So, in object-oriented analysis, the main tasks are to identify the different behaviours, attributes, services, and functions that can be performed on, to, with, or by the objects [].

5.1. Objects

Objects can be classified into three types according to Jacobson, although other classifications exist. The classification helps us to identify objects and associate them with real-world objects.

Entity Objects: These are the ones that mirror the objects in the user's real world and carry the data the users are primarily interested in. All the associated behaviour is modelled by the methods in the objects []. These objects can be categorized into:

Concrete Objects: They are tangible kinds of objects (e.g.: person, car, store, book)

Conceptual Objects: They are intangible objects, typically defined in terms of other classes or objects. (e.g.: organization, strategy)

Event and State Objects: As their name suggests, they are related to the occurrence of an event or to one or more objects changing to a different state (e.g. sale, deposit, status).

Interface objects: They are used to handle communication between the systems and external entities such as users, operator, or other systems []. (e.g. radio buttons, windows, scroll bars)

Control Objects: These objects typically carry out a task involving data from many different classes of objects. They carry very little data and usually don't have many attributes; most of the data is found in other objects [].

5.2. Classes

Classes are groups of objects with common behaviour (operations), similar properties (attributes), the same semantics, and common relationships with other objects []. An object of a given class is called an instance of the class. Classes provide a place to put the program code for the methods, so the definitions and data types for the attributes need not be repeated for each object instance. Every class implements an interface by providing structure and method implementations.

5.2.1. Methods of a class

A method is a subroutine associated with a class. Methods provide a mechanism for accessing and manipulating the encapsulated data stored in an object. There exist several types of methods in a class:

Constructor methods are in charge of creating the instances/objects of a class. Whenever a class or struct is created, its constructor is called. A class or struct may have multiple constructors that take different arguments. Constructors enable the programmer to set default values, limit instantiation, and write code that is flexible and easy to read []. Instances of a class share the same set of attributes yet may differ in what those attributes contain. For example, a class "Person" would describe the attributes common to all instances of the Person class. Each person is generally like the others, but varies in attributes such as "height" and "weight". The description of the class would name such attributes and define the actions a person can perform, such as "run", "jump", "sleep" or "walk".

Destructor methods are in charge of destroying the instance previously created. Generally, after an object has been used, it is removed from memory to make room for other programs or objects to take its place. For this to happen, a destructor method is called upon that object. Destroying an object will cause any references to it to become invalid [].

An instance method is a method associated with one object; it uses the instance variables of that object.

Static methods are associated with the class itself. They are not invoked on the instances created from the class, but on the class itself.
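The method kinds above can be illustrated in Python, reusing the "Person" example from the constructor discussion (Python uses __init__ as the constructor and __del__ as the destructor; other languages spell these differently):

```python
# Constructor, destructor, instance method and static method in one class.
class Person:
    def __init__(self, height, weight):   # constructor: sets the instance state
        self.height = height               # attributes vary per instance
        self.weight = weight

    def __del__(self):                     # destructor: runs when the instance
        pass                               # is destroyed (e.g. release resources)

    def bmi(self):                         # instance method: uses this object's data
        return self.weight / self.height ** 2

    @staticmethod
    def describe():                        # static method: belongs to the class,
        return "A Person has a height and a weight."  # not to any instance

p = Person(1.80, 72.0)
print(round(p.bmi(), 2))    # computed from this instance's attributes
print(Person.describe())    # invoked on the class itself
```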

5.2.2. Encapsulation & Accessibility

Encapsulation's main purpose is to separate the interface of a class from its implementation. Hiding the internal variables and methods of a class protects its integrity by preventing other processes from setting the internal data into an invalid or inconsistent state. Methods and variables that are not intended to be used from outside the class or assembly can be hidden to limit the potential for coding errors or malicious exploits. A benefit of encapsulation is that it can reduce system complexity and thus increase robustness, by allowing the developer to limit the interdependencies between software components. A common set of access specifiers that many object-oriented languages support are:

Private restricts the access to the class itself in which it is defined. Only methods that are part of the same class can access private members.

Protected allows the class itself and all its subclasses to access the member.

Public means that the variables, methods or classes are accessible to every process.
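Python has no enforced access specifiers, but its conventions approximate the list above: a single leading underscore marks a member as "protected" by convention, and a double leading underscore triggers name mangling, which approximates "private". A sketch:

```python
# Encapsulation sketch: internal state is hidden behind public methods.
class Account:
    def __init__(self, balance):
        self.__balance = balance      # "private": mangled to _Account__balance

    def deposit(self, amount):        # public method: the only way to change state
        if amount <= 0:
            raise ValueError("deposit must be positive")  # keeps state valid
        self.__balance += amount

    @property
    def balance(self):                # public, read-only view of the state
        return self.__balance

acct = Account(100)
acct.deposit(50)
print(acct.balance)                   # 150
# acct.__balance raises AttributeError: the internal variable is hidden
```

The invalid-state check in deposit() is exactly the integrity benefit the paragraph above describes: outside code cannot push the balance into an inconsistent state.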

5.2.3. Inheritance

Inheritance enables you to create new classes that reuse, extend, and modify the behaviour that is defined in other classes. The class whose members are inherited is called the base class, and the class that inherits those members is called the derived class. Conceptually, a derived class is a specialization of the base class. For example, if you have a base class