
Dynamic Story Shadows and Sounds

António João Martins Leonardo

Dissertação para obtenção do Grau de Mestre em

Engenharia Informática e de Computadores

Júri

Presidente: Prof. Paulo Jorge Pires Ferreira

Orientador: Prof ª Ana Maria Severino Almeida e Paiva

Vogais: Prof. David Manuel Martins de Matos

Junho 2009


Abstract

English

Sound and music have always surrounded those fortunate enough to be aware of their existence. The world would not be complete without them. Besides its entertainment value, music can also be used to complement a story and add information to it.

In the context of storytelling environments this is especially useful, since we want viewers not only to enjoy watching the story but also to understand it as well as possible. The task of scoring a story becomes more challenging when the story is created in real time, since there is no time to prepare and adapt the music before the story is shown to the viewers.

This work presents a solution to the challenge of using music and sounds to score a story created in real time in a storytelling environment. We use pre-composed music and sounds to score the story and change musical parameters according to changes happening in the story. The characters present in the scene, their actions, and the emotions they feel are some of the features this work relies on to make those changes.

The experiments validated the proposed solution as adding entertainment and understanding to a story, making it more complete and enjoyable to watch.

Keywords: Music, Real-time scoring, Storytelling environment, Emotions, Entertainment, Understanding

Portuguese

Sound and music have always been a constant presence in the lives of those fortunate enough to hear them. The world would not be complete without them. Beyond the entertainment that music provides, it can also be used as a way of complementing a story and conveying additional information about it.

In the context of storytelling environments this is especially useful, since it is desirable not only that the story be pleasant to watch but also that what we are watching be understood as well as possible. The task of creating a musical accompaniment for a story becomes a greater challenge when the story is being created in real time, since there is no time to prepare and adapt the music and sounds to be applied to it before it is shown to the audience.

This work presents a solution to the challenge of using music and sounds to accompany a story being created in real time in a storytelling environment. Pre-composed music and sounds are used to accompany the story, and musical parameters are changed according to changes that arise in it. The characters present in the scene, their actions, and the emotions they feel are some of the features on which this work relies to make those changes.

The results validated the proposed solution as adding entertainment and comprehension to a story, making it more complete and enjoyable to watch.

Keywords: Music, Real-time musical accompaniment, Storytelling environments, Emotions, Entertainment, Comprehension


Contents

1. Introduction .......... 1
1.1 Motivation .......... 1
1.2 Objectives .......... 2
1.3 Outline .......... 3
2. Storytelling environments .......... 5
2.1. A Generic Storytelling Environment .......... 5
2.2. Valence and Arousal .......... 6
2.3. Story Development .......... 7
2.4. Related Storytelling Environments .......... 8
2.5. Summary .......... 11
3. Background: Music, Emotions and Film Scoring .......... 12
3.1. Music and Emotions .......... 12
3.2. Film Scoring .......... 17
3.3. Summary .......... 23
4. Related Work .......... 24
4.1. Marc Downie's work .......... 24
4.2. Roberto Bresin's work .......... 25
4.3. Pietro Casella's work .......... 25
4.4. Andrew de Quincey's work .......... 26
4.5. Nakamura's work .......... 26
4.6. Alex Berndt's work .......... 27
4.7. David Cope's work .......... 27
4.8. Summary .......... 28
5. The sound of D3S .......... 30
5.1. Understanding and enjoying a story .......... 30
5.2. Event sounds and background music .......... 32
5.3. The Artistic task .......... 37
5.4. Adding dynamism to music and sounds .......... 38
5.5. Summary .......... 40
6. Architecture and Integration .......... 41
6.1. I-Sounds .......... 41
6.2. Apollo – D3S Music Manager .......... 46
6.3. Integration with the storytelling systems .......... 51
6.4. Summary .......... 55
7. Evaluation .......... 56
7.1. General system evaluation - D3S and story enjoyment .......... 56
7.2. Background music's tempo and environment's intensity .......... 58
7.3. Association between characters and instruments .......... 61
7.4. Themes associated to characters .......... 63
7.5. Use of sounds to underscore events .......... 67
7.6. Volume and relations between characters .......... 69
7.7. Character theme and mood .......... 71
7.8. Music associated to story's key moments – Friendship and Duel Themes .......... 73
7.9. Music associated to story's key moments – Climax Theme .......... 76
8. Conclusions .......... 78
8.1. Future Work .......... 79
References .......... 81
Appendix A .......... 84
Appendix B .......... 87


List of Figures

Fig. 1 – A generic storytelling environment .......... 5
Fig. 2 – Emotion Model (Russell, 1980) .......... 6
Fig. 3 – Freytag's Pyramid .......... 7
Fig. 4 – Affective Guideline .......... 8
Fig. 5 – I-Shadows System .......... 9
Fig. 6 – I-Shadows story .......... 10
Fig. 7 – FearNot! story .......... 11
Fig. 8 – Thayer's model of mood .......... 14
Fig. 9 – Features extracted from music and their correlation with emotional state .......... 15
Fig. 10 – Relations between musical parameters and human expressions .......... 17
Fig. 11 – Project (void *) .......... 24
Fig. 12 – Ghostwriter .......... 26
Fig. 13 – Berndt's system overview of the approach to realtime adaptive orchestration and performance .......... 27
Fig. 14 – Relation between discussed works and D3S .......... 29
Fig. 15 – Story enjoyment and understanding .......... 31
Fig. 16 – D3S Framework .......... 32
Fig. 17 – Association between Mood Values and the type of Hero Theme .......... 35
Fig. 18 – Changes of acts in I-Shadows .......... 36
Fig. 19 – Possible discretization of the General Music .......... 36
Fig. 20 – Disney's Peter and the Wolf – 1946 .......... 39
Fig. 21 – Freytag's pyramid with music's Tempo variable .......... 40
Fig. 22 – Original I-Sounds architecture .......... 42
Fig. 23 – Integration of Apollo .......... 44
Fig. 24 – D3S Interface .......... 46
Fig. 25 – Input and Output of Apollo – The music manager .......... 49
Fig. 26 – Internal Structure of Apollo for the Background Music .......... 50
Fig. 27 – Communication between I-Shadows and I-Sounds .......... 51
Fig. 28 – Interaction example between I-Shadows and I-Sounds .......... 52
Fig. 29 – FearNot! Flow Chart .......... 54
Fig. 30 – Different scoring systems and associated interest .......... 58
Fig. 31 – Different music tempo speeds and associated perception of story intensity .......... 60
Fig. 32 – Percentages of recognition .......... 62
Fig. 33 – Frequency percent of score .......... 63
Fig. 34 – Characters used in this experiment .......... 64
Fig. 35 – Percentages of recognition .......... 65
Fig. 36 – Frequency percent of score .......... 65
Fig. 37 – Hero, Friend and Villain Colors .......... 66
Fig. 38 – Experiment 5 results .......... 68
Fig. 39 – Experiment 6 box plot .......... 70
Fig. 40 – Experiment 7 results .......... 72
Fig. 41 – Participant distribution .......... 73
Fig. 42 – Answers for the videos with the boy and the dragon .......... 75
Fig. 43 – Answers for the videos with the boy and the fairy .......... 75
Fig. 44 – Experiment 9 answers .......... 77
Fig. 45 – Brief description of the testing procedure .......... 84
Fig. 46 – Experiment 1 versions .......... 88
Fig. 47 – Experiment 1 Descriptive Statistics .......... 88
Fig. 48 – Experiment 1 Friedman Test .......... 88
Fig. 49 – Experiment 2 versions .......... 89
Fig. 50 – Experiment 2 Descriptive Statistics .......... 89
Fig. 51 – Experiment 2 Kruskal-Wallis Test .......... 90
Fig. 52 – Experiment 3 Descriptive Statistics .......... 90
Fig. 53 – Experiment 3 Chi-Square Test .......... 91
Fig. 54 – Experiment 4 versions .......... 92
Fig. 55 – Experiment 4 Descriptive Statistics – with sound .......... 92
Fig. 56 – Experiment 4 Chi-Square Test – with sound .......... 93
Fig. 57 – Experiment 4 Chi-Square Test – without sound .......... 94
Fig. 58 – Experiment 5 versions .......... 95
Fig. 59 – Experiment 5 Descriptive Statistics .......... 95
Fig. 60 – Experiment 5 Mann-Whitney test .......... 95
Fig. 61 – Experiment 6 versions .......... 96
Fig. 62 – Experiment 6 Kruskal-Wallis test .......... 96
Fig. 63 – Experiment 7 versions .......... 97
Fig. 64 – Experiment 7 Kruskal-Wallis test .......... 97
Fig. 65 – Experiment 8 versions .......... 98
Fig. 66 – Experiment 8 Kruskal-Wallis test for themes .......... 98
Fig. 67 – Experiment 8 Kruskal-Wallis test for characters .......... 99
Fig. 68 – Experiment 8 Kruskal-Wallis test for versions .......... 99
Fig. 69 – Experiment 9 versions .......... 100
Fig. 70 – Experiment 9 Descriptive Statistics .......... 100
Fig. 71 – Experiment 9 Mann-Whitney test .......... 100


1. Introduction

Have you ever imagined the world without sounds and music? It would certainly be an incomplete world, where people could not communicate with each other through speech and birds could not sing. Bands and composers would not exist, and we would not be able to hear the great musical pieces that delight our ears every time they are played. Movies would be silent, with no music or sounds in them.

Sound and music have a strong influence on the way we experience everything around us. Without them, the world would not be perceived as it is today. We cannot deny the existence and importance of sounds and music: they are everywhere. Consequently, they also play a major role in entertainment.

Music and sounds can be used to score a story told in the cinema, in the theatre, in animated cartoons or in games. By that, we mean that they can accompany the story, emphasize its situations, create suspense, and evoke fear or other emotions in the viewer.

Consider the silent movies of the early 20th century, when it was not possible to record dialogue. Even without spoken dialogue, directors used at least some background music to bring color to the black-and-white silent movies of that era. Although that background music served little purpose beyond entertaining the audience, so that they would not sit in complete silence, the use of music to accompany films and stories has evolved considerably since then.

Nowadays, music is used as a tool to aid the understanding of a story: for example, it helps identify who the characters are and what their roles are, the situations they are involved in, the emotions they feel, and other characteristics of the story.

Music can also create emotional effects on the audience: the audience's emotional response to a story can be enhanced or even influenced through the use of music.

This work aims to create a solution that scores stories automatically in real time, adding entertainment to the viewing of the story and helping the audience understand it better.

1.1 Motivation

We can think about the music used in cinema movies, in theatres, in animated cartoons and even in video games. The importance of sound and music in these environments is undeniable: the score complements the visual content and helps the audience understand it better. In these environments, the stories being told are static and pre-made. Therefore, when the music is created to accompany them, it is possible to know what will happen in the story, and the use of music can be planned according to the events and situations occurring in it.


In contrast, another place where sounds and music matter is in real-time storytelling environments. In these environments, a story is created in real time by a computer and told to an audience that expects to understand and enjoy it. We can consider extending the artistic elements of music and sound to these environments as well.

In these environments, stories are told in a dynamic and interactive way, where the actors are real and/or virtual and have a strong emotional component. Since the stories are created in real time and are usually interactive, one cannot predict what will happen throughout the story. The unpredictability of user actions and other factors, such as the story's development, makes the creation of music and sounds to accompany it more difficult. Extending the use of music to these environments is therefore not straightforward and presents a problem that needs to be addressed. Furthermore, the presence of autonomous virtual agents makes these stories more emergent, complicating the creation of adequate sounds and music.

In these environments, the story is the most important aspect. One might enjoy watching a movie with a large variety of interesting special effects without really understanding its story; in storytelling environments this is not possible, since everything revolves around the story.

We must therefore think about using sound and music to help the audience understand it. By understanding a story, we mean understanding the following: the characters' roles, their actions, their emotional states, and the interactions and relations between characters.

The challenge of this thesis can then be summarized as follows: how can we score a story being created in real time by these environments, using sounds and music, in a way that adds understanding of what is happening in the story and consequently increases the viewer's enjoyment?

1.2 Objectives

The goal of this thesis is to create a system that generates music to accompany an interactive storytelling system, making the experience of watching the story more enjoyable for the viewer. We consider that, for viewers to enjoy a story, they must understand it. Therefore, we need to find a way to score the story that adds comprehension to it and thus helps viewers understand what is happening.

On the other hand, we also want the music itself to be entertaining: enjoyable to listen to, and not merely a tool that helps the viewer understand the story being told.

Ultimately, we want to understand how important and influential music and sounds are for the scoring of storytelling environments.


We can then summarize the objectives of this thesis as the following:

1) Create a framework that allows the scoring of storytelling environments in real time. This framework should be generic and able to be integrated with various storytelling environments.

2) Use music as a tool to enhance the understanding of the story. We want music to help the audience understand what is going on in the story.

To help define the understanding of a story, we formulate the following hypothesis: to understand a story, one must understand:

- The characters' roles;

- The characters' actions;

- The characters' emotional states;

- The interaction and relations between characters.

3) Guarantee the entertainment of the audience through the use of music. Besides enhancing understanding, music should add entertainment to the story simply through the pleasure of listening to it.

4) Include the work of this thesis in the I-Sounds system, an existing framework that allows the scoring of storytelling environments. This framework was previously used to integrate a real-time music composition module and connect it to the storytelling environment I-Shadows.

5) Extend the framework and adapt it to other storytelling environments, such as FearNot!

These are the objectives that Dynamic Story Shadows and Sounds (D3S) will try to meet.
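Objective 1 calls for a generic scoring layer that any storytelling environment can drive. As a rough illustration only (the class, field and parameter names below are hypothetical, not the actual D3S or I-Sounds API), such a layer might consume story events and emit musical parameter changes, for instance mapping scene intensity to tempo and emotional valence to a major or minor theme variant:

```python
from dataclasses import dataclass

@dataclass
class StoryEvent:
    """One notification from the storytelling environment.
    Field names are illustrative, not the real message format."""
    character: str   # e.g. "Hero"
    action: str      # e.g. "celebrate", "attack"
    valence: float   # pleasantness of the felt emotion, in [-1.0, 1.0]
    arousal: float   # intensity of the scene, in [0.0, 1.0]

class ScoringEngine:
    """Generic scoring layer: consumes story events and emits musical
    parameter changes, independently of which environment sends them."""

    def __init__(self) -> None:
        self.tempo_bpm = 100
        self.active_themes: dict = {}

    def on_event(self, event: StoryEvent) -> dict:
        # Higher scene intensity -> faster background-music tempo.
        self.tempo_bpm = int(80 + 60 * event.arousal)
        # Positive emotion -> major variant of the character's theme,
        # negative emotion -> minor variant.
        mode = "major" if event.valence >= 0 else "minor"
        self.active_themes[event.character] = mode
        return {"tempo_bpm": self.tempo_bpm,
                "theme": (event.character, mode)}
```

For example, `ScoringEngine().on_event(StoryEvent("Hero", "celebrate", 0.8, 0.9))` returns a tempo of 134 bpm and the major variant of the hero's theme. The point of the sketch is the separation it enforces: the environment only reports what happens, while all musical decisions stay in the scoring layer, which is what makes the framework reusable across environments.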

1.3. Outline

The remainder of this document is organized into seven chapters.

This dissertation will be contextualized in chapter 2, where storytelling environments will be discussed. We will give a brief description of a generic storytelling environment, explaining its main components and their relevance for this work.

In the first part of the “Background: Music, Emotions and Film Scoring” chapter we will report on work that establishes relations between musical parameters and emotions. In the second part, we will give an overview of the area of creating music and sound to score films (film scoring), which is tightly connected to this thesis and was a major source of inspiration for its features.

In the “Related Work” chapter we will review computing systems that have been developed in the same area as this thesis.

The “Sound of D3S” section will explain the theory that supports what has been done in D3S; the framework and all its features will be described in detail.

In the “Architecture” chapter, we will explain the technical details not only of the system D3S was built on (the I-Sounds system), but also of its integration with other storytelling systems such as FearNot!. The music manager module Apollo, implemented to fulfill some of the objectives of this thesis, will also be described in this chapter, and an example of interaction between D3S and a storytelling system will be given.

In the “Evaluation” chapter, all testing of the system and the analysis and discussion of its results will be presented. Among other topics, we will see to what extent music is important and relevant for scoring a storytelling environment.

The final section, “Conclusions”, will give the final considerations about this thesis and include a sub-section on future work that can be developed within the area of musical scoring of storytelling environments.


2. Storytelling environments

In this chapter we will contextualize our thesis within storytelling environments and discuss some of their main characteristics. This will support the further study of music integration in them. We will also present the storytelling systems that are closely related to this work, since they are the systems on which it was tested: the I-Shadows system and the FearNot! system. The first is a shadow theatre, where shadows manipulated by real users interact with other shadows created by the computer. The second creates improvised dramas to address bullying problems.

2.1. A Generic Storytelling Environment

A storytelling environment is a place where a story is being told. It is a combination of characters and their surrounding environment that together develop that story. Generally, these environments are created and developed in real time while the audience is watching the story. In some of them, the audience can interact with the characters and join the story. Fig. 1 shows a representation of a generic storytelling environment.

Fig. 1 – A generic storytelling environment

The characters have a name, a role in the story and a mood. The name is what the characters will be recognized as, while the role is their function within the story. The mood corresponds to


their inner emotional state, whether they are happy or sad. The characters also have emotional relations between them: a character might like another character while, at the same time, disliking a third one.

The characters can perform actions in the environment: actions towards other characters, actions towards objects, or the expression of emotions. These actions have an impact on the environment and may change the characters' moods and the relations between them.

The environment is where the characters act. It contains not only the characters but also objects with which they can interact. The environment also has generic arousal and valence values, which correspond to how tense/relaxed or happy/sad the emotions in the environment are. Due to the importance of these concepts for this dissertation, the next section is dedicated to a more detailed explanation of the arousal/valence concepts.

2.2. Valence and Arousal

In psychology, there are two concepts related to emotions that are also connected to storytelling: “valence” and “arousal”. Valence is a term used in psychology, especially when discussing emotions, that denotes the intrinsic attractiveness (positive valence) or aversiveness (negative valence) of an event, object or situation [1]. For example, emotions usually referred to as “positive” (such as happiness or joy) have positive valence, while “negative” emotions (such as fear or anger) have negative valence. Arousal is a physiological and psychological state of being awake. It is important in regulating consciousness, attention and information processing, and in motivating certain behaviours. Emotions can thus be conceptualized along these two dimensions: arousal is a single axis of intensity, increasing from neutral to maximally arousing, while valence is a bipolar dimension, consisting of positive and negative values [2]. It is therefore possible to place emotions within this two-dimensional space. In 1980, Russell defined an Emotion Model in which several emotions were distributed over this space according to their valence and arousal characteristics, with valence represented on the horizontal axis and arousal on the vertical axis (Fig. 2).

Fig. 2 - Emotion Model (Russell, 1980)
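To make this model concrete, the sketch below (our own illustration, not part of Russell's work or of D3S; the coordinates are rough assumptions, not Russell's published values) places a few emotions in the valence-arousal plane and maps an arbitrary point to its nearest labelled emotion.

```python
import math

# Illustrative (valence, arousal) coordinates in [-1, 1] for a few emotions.
EMOTIONS = {
    "happy":   ( 0.8,  0.5),
    "excited": ( 0.6,  0.9),
    "calm":    ( 0.6, -0.7),
    "sad":     (-0.7, -0.5),
    "angry":   (-0.6,  0.8),
    "afraid":  (-0.8,  0.6),
}

def nearest_emotion(valence, arousal):
    """Return the labelled emotion closest (Euclidean) to the given point."""
    return min(EMOTIONS,
               key=lambda e: math.dist((valence, arousal), EMOTIONS[e]))

print(nearest_emotion(0.7, 0.6))    # a positive, highly aroused point
print(nearest_emotion(-0.7, -0.4))  # a negative, low-arousal point
```

A scene's current valence and arousal values could be fed to such a lookup to obtain an emotion label for selecting music.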


This model is interesting in the context of this thesis because it relates the most important human emotions directly to the dimensions of valence and arousal. These dimensions are present in storytelling environments and can be used to help introduce music that accompanies the story. Next, we will see how stories can be created and developed in a storytelling environment, and how valence and arousal can be associated with them.

2.3. Story Development

In storytelling environments where users can interact, one faces the problem of reconciling the story planned by the system with the actions the users are free to take, which may change the story. Because of that, when creating a story to be followed in real time by characters (whether real or virtual), a guideline needs to be followed so that the story makes sense and the users cause no critical deviations from it.

Freytag (1863) defined a pyramid describing drama development, which generally follows the same pattern along a variable called Tension.

In the Freytag Pyramid, there are five different acts:

• Exposition – provides information about the environment, the characters and their relations.

• Rising Action – reaction to negative events that happen to prevent the protagonist

from reaching his goal

• Climax – turning point, where the protagonist usually succeeds in his goal

• Falling Action – everything returns to normal

• Denouement – conclusion of the story

Fig. 3 - Freytag’s Pyramid

It is possible to establish an association between this pyramid and valence. That way, we can

say that the story starts with a positive mood, suffers a negative impact and then ends in a

positive conclusion. Figure 4 shows the Affective Guideline of the story development used in the I-Shadows storytelling environment, which illustrates that association.


Fig. 4 - Affective Guideline
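As a rough illustration (our own sketch, not the I-Shadows implementation; the breakpoint values are arbitrary assumptions), such an affective guideline can be encoded as a piecewise-linear target-valence curve over normalized story time: positive at the exposition, dipping negative towards the crisis, and recovering to a positive denouement.

```python
def guideline_valence(t):
    """Target valence at story progress t (0 = start, 1 = end)."""
    # Assumed breakpoints: (time, valence) for exposition, crisis, denouement.
    points = [(0.0, 0.3), (0.5, -0.8), (1.0, 0.7)]
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t0 <= t <= t1:
            # Linear interpolation between neighbouring breakpoints.
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    raise ValueError("t must lie in [0, 1]")

print(guideline_valence(0.0))   # exposition: mildly positive
print(guideline_valence(0.5))   # crisis: strongly negative
print(guideline_valence(1.0))   # denouement: positive
```

A music module could compare the story's current valence against this target curve to anticipate the coming mood rather than merely react to it.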

If we wish to introduce a musical component into these environments, it is essential to take story development into account: not only its impact on the current emotional state of each character and on the scene emotion, but also its value for predicting what will happen next in the story, allowing the music to anticipate it as well and help create effects of suspense or anticipation.

2.4. Related Storytelling Environments

In storytelling environments, the story is the most important factor. However, it is possible to emphasize the emotions emerging in each scene by using music that goes along with the action. Besides that, music can create in the viewers effects of suspense and anticipation about what will happen in the story. It is a powerful way to intensify the viewers' enjoyment and experience when watching a story created in such environments.

We will dedicate special attention to the I-Shadows and FearNot! systems, since they were the two environments on which this thesis was tested. A brief explanation and overview of the two systems follows.

2.4.1. I-Shadows

I-Shadows (Fig. 5) is an Affective Interactive Drama system, based on Autonomous Affective Characters and Drama theory. The objective of this system is to create an interactive experience between real actors (the users) and virtual actors (Chinese shadow puppets). In this system, users can use different puppets and create a story in cooperation with the system. To be able to interact with the story, the virtual actors have an agent architecture that supports emotional reactions, goal-oriented behaviour and social


interactions. Besides that, a specific agent (a story director) coordinates the actors, allowing them to appear in or disappear from the scene of the story. [3][4]

Fig. 5 – I-Shadows System

Characters

The I-Shadows system implements a very rich cast of characters, with appropriate actions and emotional behaviour. To achieve this emotional behaviour, the minds of the characters and of the director use an architecture (FAtiMA) [5], based on the OCC model of emotions (Ortony, Clore, & Collins, 1988) and developed at GAIPS (Intelligent Agents and Synthetic Characters Group). [3]

There are two different types of characters in I-Shadows: the Real Characters, puppets manipulated by the user and detected by the system using a vision component, and the Virtual Autonomous Characters, implemented by the system itself. The Director's function is to reconcile the perspectives of both Real and Virtual characters in the story development [3].

Relations between characters

In I-Shadows, each character has its own view of the environment: not only do they react differently to events, but they also have different relations amongst themselves. Each character has an initial relation towards all the other characters. These relations can develop and change dynamically as the story develops, and they influence the characters' behaviour.

Some characters may be more relevant for the story, such as the hero, the villain and the princess, since they are the main characters and the story develops around them. The relations between these are usually stereotyped and can be set as initial relations. For example, the villain's emotional relation towards the other two is negative (anger, hate), while the emotional relation between the hero and the princess is positive (love, friendship).

Emotional state

Each character has its own personality. Because of that, each character reacts differently when perceiving an event from the environment. They also have their own emotional state, which corresponds to their mood at each moment. Based on their valenced reaction to each event or to


their relations towards other characters present in the scene, their emotional state may change and adapt to the new situation.

Fig. 6 – I-Shadows story
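A minimal sketch of the character state described above (a plausible shape assumed for illustration only; this is not the actual FAtiMA architecture): each character keeps a mood and per-character relations, both nudged by the valence of perceived events.

```python
class Character:
    def __init__(self, name, role, mood=0.0):
        self.name = name
        self.role = role
        self.mood = mood        # current emotional state in [-1, 1]
        self.relations = {}     # other character's name -> relation in [-1, 1]

    def perceive(self, actor, valence, weight=0.3):
        """React to an event performed by `actor` with the given valence."""
        # Mood drifts toward the event's valence, clamped to [-1, 1].
        self.mood = max(-1.0, min(1.0, self.mood + weight * valence))
        # The relation toward the actor shifts in the same direction.
        old = self.relations.get(actor, 0.0)
        self.relations[actor] = max(-1.0, min(1.0, old + weight * valence))

hero = Character("hero", "protagonist", mood=0.2)
hero.perceive("villain", -1.0)   # a hostile act lowers mood and the relation
print(hero.mood, hero.relations["villain"])
```

Stereotyped initial relations (villain negative towards hero and princess, hero positive towards princess) would simply be preset entries in `relations`.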

2.4.2. FearNot!

This application was developed within VICTEC (Virtual ICT with Empathic Characters), a European Framework V project carried out between 2002 and 2005. The project applied 3D animated synthetic characters and emergent narrative to create improvised dramas addressing bullying problems for children aged 8-12 in the UK, Germany and Portugal. One of its main goals was to develop synthetic characters that would create an empathic connection with the user. The users would be children who would be exposed to bullying scenarios through the application. They could then explore those bullying scenarios and work with the synthetic characters to find strategies that could solve their problems. [6][7]

Characters

The characters in FearNot! consist of a bullied child (the main character), a pair of bullies and a pair of friends of the main character. The gender of the characters changes according to the gender of the child using the application, so that users can identify more with the situations they are exposed to during the story.

Story Sequence

The story presented in FearNot! has two different aspects: one is shown when the child is outdoors being bullied; the other takes place when the child is at home, chats with the user about what is happening and asks for help. In the first, the user cannot interact and the story is generated automatically. In the second, the user can


write to the synthetic character using a text chat and work with him/her to find strategies to solve the problems.

Fig. 7 – FearNot! story

2.5. Summary

In this chapter we have presented the context behind the development of this thesis. A brief explanation of generic storytelling systems was given. We discussed relevant themes such as valence and arousal and their relation to story development, which we can find in the I-Shadows system. Special attention was given to the two storytelling systems that are closely related to this thesis, I-Shadows and FearNot!. Before this work was implemented, the musical scoring of I-Shadows was limited to an association with the emotional state of the environment, and the FearNot! system did not have any sound scoring at all. With D3S, the musical scoring of I-Shadows became more complete, and D3S was also used to score FearNot! stories.


3. Background: Music, Emotions and Film Scoring

This chapter is divided into two major sections. In the first section, we will discuss relevant aspects of the relation between music and emotions and the importance of that relation in the context of this dissertation. We will present some works that relate several musical parameters to different emotions.

The second section will focus on film scoring, the process of creating a musical accompaniment to a film. We will discuss most of its characteristics in order to show its relevance and purpose within movies. The film scoring area was a major source of inspiration for the work developed in this dissertation.

3.1. Music and Emotions

Wittgenstein [8] tells us that we understand music and language in similar ways, but that music is not a language because we cannot communicate through music as we can through language. However, among other things, it is still possible to communicate something through music: emotions. From another perspective, Aristotle tells us that music is mimetic or imitative. By that, he meant that music can be a representation of a person's emotions and moods. If he is right, then perhaps we have a more direct connection to music than we do when attempting to put words to our emotions or to our emotive responses to music, because music can be imitative of our emotions. Following Wittgenstein and Aristotle, we can understand music as something that goes beyond words and only exists beyond words. That can provide an explanation of why we respond the way we do to music: music may be so close to the affective field that it cannot be described by words. [8]

In storytelling environments, it is important for the viewer to understand the emotions that each scene may suggest. That way, viewers can understand the story better and some ambiguities may disappear. Since music is an exceptional way to transmit emotions, we need to establish an association between musical features and emotional states.

The ability of music to affect and manipulate emotions and the brain is undeniable, and yet largely unexplained. Until recently, few studies had been made in this area, since the fields of music and biology were considered mutually exclusive.

One of the problems found when trying to study music's emotional power is that defining music's emotional content may be very subjective, since a piece of music can be experienced differently by each person who hears it. The emotion created can be affected by the memories it evokes in the listener, the environment where the music is being played, the listener's mood at the time, his or her personality and culture, and multiple other factors. [9]


A study [10] was made in order to test brain responses to pleasant and unpleasant music. Unpleasant music was defined in this test as music with a high rate of dissonance. The findings of this study suggested that music may recruit neural mechanisms similar to

those previously associated with pleasant/unpleasant emotional states.

The rewards given by the brain while listening to pleasant music were analyzed by Blood and Zatorre in a study [11] where several subjects were tested while listening to pleasant music. Brain rewards such as “shivers down the spine” or “chills” were analyzed by measuring cerebral blood flow. Besides those chills, there were reports of changes in heart rate, electromyogram and respiration. While the subjects listened to pleasant music, brain structures were activated that are related to other euphoria-inducing stimuli, such as food, sex and drugs of abuse. In conclusion, the authors stated that this finding links music to biologically relevant, survival-related stimuli via their common recruitment of brain circuitry involved in pleasure and reward.

Another quantifiable aspect of emotional responses to music is its effect on hormone levels in the body [12][13]. There is evidence that music can lower levels of cortisol (associated with arousal and stress) and raise levels of melatonin (which can induce sleep) [12]. This explains music's ability to relax, calm and bring peace.

Despite several studies in this area, questions such as “How does music succeed in prompting emotions within us?” or “Why are these emotions often so powerful?” cannot be answered yet. It is possible to quantify the emotional responses caused by music, but not yet to explain them [9].

Classifying the emotions associated with music is a challenging problem. Simply assigning an emotion class to a song segment in a deterministic way does not work well, because not everyone shares the same feelings about a song: listening mood, environment, personality, age, cultural background and other factors can all influence emotion perception. Because of this, classification methods that deterministically assign one emotion class to each song do not perform well in practice [14][15][16].

Yang, Liu and Chen [17] therefore proposed a fuzzy approach to classifying music emotions, creating a model based on Thayer's model of mood [18] (Fig. 8). They divide the two-dimensional space (arousal-valence) into four parts, each associated with a type of music.


Fig. 8 - Thayer’s model of mood [18]
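The quadrant split can be sketched as follows (an illustration of ours, not code from the thesis; the quadrant labels are our own choice of names commonly used with Thayer-style models):

```python
def thayer_quadrant(valence, arousal):
    """Map a (valence, arousal) pair to one of four mood quadrants."""
    if arousal >= 0:
        # High-energy half: positive valence vs. negative valence.
        return "exuberant" if valence >= 0 else "anxious"
    # Low-energy half.
    return "contented" if valence >= 0 else "depressed"

print(thayer_quadrant(0.5, 0.5))    # positive, energetic
print(thayer_quadrant(-0.5, -0.5))  # negative, low-energy
```

A fuzzy classifier like Yang, Liu and Chen's would assign graded memberships to these quadrants rather than a single crisp label as done here.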

The use of music to intensify emotions has also appeared in artistic areas other than narration, such as painting. Cheng-Te and Man-Kwan [19] developed a system that automatically generates music to accompany a slideshow of impressionist paintings. They argued that paintings have more affective content than photos, and that impressionists in particular are concerned with conveying emotions. The musical elements that affect emotion include melody, rhythm, tempo, mode, key, harmony, dynamics and tone colour; among these, melody, mode, tempo and rhythm have the strongest effects. Generally speaking, a major scale is brighter and happier than a minor one, and a rapid tempo is more exciting or tense than a slow one.

Other relevant works have been developed on the association between emotions and different types of music. Due to their high importance for this thesis, the next sub-section is dedicated to them.

3.1.1. Different types of music for different emotions

When listening to some type of music, one may feel emotions and associate them with that music. As explained before, this association may depend on numerous factors, including who is listening. However, it is still possible to relate characteristics of music to the emotions they may suggest. We will present three different works in this area.

Juslin, Bresin and Friberg's work

Juslin, Bresin and Friberg [20][21] correlated emotions with certain musical features. Several users listened to music played in different performances, each with its own musical features. They showed that most listeners correctly recognized the intended emotions when the features associated with those emotions were used. Some of the results of their work are shown in the table below (Fig. 9).


Emotion      Music Feature    Value

Fear         Tempo            Irregular
             Sound Level      Low
             Articulation     Mostly non-legato

Anger        Tempo            Very rapid
             Sound Level      Loud
             Articulation     Mostly non-legato

Happiness    Tempo            Fast
             Sound Level      Moderate or loud
             Articulation     Airy

Fig. 9 - Features extracted from music and their correlation with emotional state [20][21]

Gabrielsson and Lindström's work

Similarly, Gabrielsson and Lindström [22] associated the emotions of happiness and sadness with several musical parameters: articulation, harmony, loudness, melodic range, melodic direction, mode, pitch level, rhythm, tempo and timbre. The results were the following:

Happiness in music parameters:

- Articulation: staccato
- Harmony: simple and consonant
- Loudness: loud
- Melodic range: wide
- Melodic direction: ascending
- Mode: major
- Pitch level: high
- Rhythm: regular / smooth
- Tempo: fast
- Timbre: few harmonics

Sadness in music parameters:

- Articulation: legato
- Harmony: complex / dissonant
- Loudness: soft
- Melodic range: narrow
- Melodic direction: descending
- Mode: minor
- Pitch level: low
- Rhythm: firm
- Tempo: slow
- Timbre: few harmonics, soft

As a result of their study, they concluded that happiness and sadness behave like opposite emotions with respect to these parameters, since most of them had opposite results. An exception was found in timbre, where both happiness and sadness are expressed with few harmonics.

Jan Berg and Johnny Wingstedt's work

Another example is Berg and Wingstedt's work [23]. Their study considered six musical parameters in relation to the emotions 'happiness' and 'sadness': mode (major, minor), instrumentation (timbre), tempo, articulation (legato-staccato), volume and register.

The mode of a piece of music is often associated with the emotions of happiness and sadness, the major mode being linked to the first and the minor mode to the second. Besides that, the


major mode can also be associated with grace, serenity and solemnity, while the minor mode can be associated with dreamy or dignified qualities, as well as with tension, disgust and anger [22]. In this work, tests were made with users, with the objective of relating the major/minor mode of music to the happiness/sadness emotions. Results show that over 95% of the users associated the major mode with happiness and the minor mode with sadness. Besides that, the authors concluded that users with musical training made this association even more strongly than non-musicians. This was the test with the most relevant and conclusive results, which helps us conclude that there is a strong link between mode and those two emotions.

Regarding the instrumentation parameter, it has been shown that sounds with a rich harmonic spectrum may suggest potency, anger, disgust, fear, activity or surprise. Sounds with few, low or suppressed harmonics may be associated with pleasantness or happiness, as well as with tenderness, sadness or boredom [22][23].

Tempo is another relevant factor: fast tempo can be associated with expressions such as happiness/joy, activity/excitement, potency, surprise, anger or fear, while slow tempo may be associated with sadness, calmness/serenity, dignity/solemnity, tenderness, boredom and disgust [22][23].

Articulation defines the overall note length of the music played: staccato is associated with shorter notes and with expressions of gaiety, energy, activity, fear and anger, while legato is associated with longer notes and may transmit expressions of sadness, tenderness, solemnity and softness [22][23].

Volume is associated with the loudness of the music. Loud music may transmit expressions like

joy, intensity, power, tension or anger. Soft music may transmit sadness, softness, tenderness,

solemnity or fear [22][23].

Register indicates how high or low the pitch level is. High pitches are usually associated with

happiness, grace, serenity, dreaminess, excitement, surprise, potency, anger, fear and activity.

Low pitches suggest sadness, dignity, solemnity, boredom or pleasantness [22][23].

Even though these parameters can be directly associated with emotions, when two or more are combined the result may be different. They showed that the minor mode was mostly associated with sadness; however, if it is combined with a fast tempo, the music may be perceived as 'happy' [23]. The following table (Fig. 10) summarizes the associations between the parameters and the expressions/emotions.


Fig. 10 – Relations between musical parameters and human expressions [22][23]

3.2. Film Scoring

Film scoring consists of creating a musical accompaniment to a film. This area has been studied and developed for many years. It started in the early days of cinema, when scores were used in silent movies to keep the viewer focused on and interested in the movie. Nowadays, film scores are very important to a movie: music is composed exclusively for films and played by large orchestras or well-known bands. Even if its purpose has changed, film music remains a very important component of films and still serves other purposes, as we will see in this section.

Since a film involves a story, we can relate films to other storytelling environments, such as I-Shadows. Since this thesis focuses on music to accompany stories, film scoring is the closest existing practice to seek inspiration from, and it is therefore important to dedicate the rest of this chapter to it.

3.2.1. Film score functions

It is often hard to understand the difference between a film score and film songs. While film songs are songs used in a film that can be compiled into a soundtrack, the film score is the “illustration” of the movie through music composed exclusively for that film.

The score can have a variety of functions in a film. These functions are generally dictated by the film's director, and one or more can be used at the same time. The main functions of a film score are described next. [24]

(Table of Fig. 10 – relations between musical parameters and human expressions)

Parameter        Variation        Associated emotions/expressions
Mode             Major mode       happiness, grace, serenity, solemnity
                 Minor mode       sadness, dreamy, disgust, anger
Instrumentation  Rich harmonics   potency, anger, disgust, fear, activity, surprise
                 Few harmonics    pleasantness, happiness, tenderness, sadness, boredom
Tempo            Fast tempo       happiness/joy, activity/excitement, potency, surprise, anger, fear
                 Slow tempo       sadness, calmness/serenity, dignity/solemnity, tenderness, boredom, disgust
Articulation     Staccato         gaiety, energy, activity, fear, anger
                 Legato           sadness, tenderness, solemnity, softness
Volume           Loud volume      joy, intensity, power, tension, anger
                 Soft volume      sadness, softness, tenderness, solemnity, fear
Register         High pitch       happiness, grace, serenity, dreaminess, excitement, surprise, potency, anger, fear, activity
                 Low pitch        sadness, dignity, solemnity, boredom, pleasantness
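As a hedged sketch of how such a table could be used in practice (our own illustrative encoding, not the D3S implementation), the mapping can be inverted so that a target emotion yields candidate settings for each musical parameter; only a few rows are encoded here.

```python
# A small subset of the parameter -> emotions associations, as an assumption
# for illustration (labels simplified from the table above).
TABLE = {
    ("Mode", "major"): {"happiness", "grace", "serenity", "solemnity"},
    ("Mode", "minor"): {"sadness", "dreaminess", "disgust", "anger"},
    ("Tempo", "fast"): {"happiness", "excitement", "potency", "surprise",
                        "anger", "fear"},
    ("Tempo", "slow"): {"sadness", "serenity", "solemnity", "tenderness",
                        "boredom", "disgust"},
    ("Articulation", "staccato"): {"gaiety", "energy", "activity", "fear",
                                   "anger"},
    ("Articulation", "legato"): {"sadness", "tenderness", "solemnity",
                                 "softness"},
}

def settings_for(emotion):
    """Return {parameter: value} choices whose associated emotions include `emotion`."""
    return {param: value
            for (param, value), emotions in TABLE.items()
            if emotion in emotions}

print(settings_for("sadness"))
```

Note that this naive inversion ignores the combination effect discussed above (e.g. minor mode plus fast tempo reading as 'happy'); a real scorer would need to weigh parameters jointly.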


Source Music

Source Music is music that plays a key part in a scene of a film in which it is inserted in a scene

and every character of that scene is aware of it. This music is part of the film and it can be

heard or played by one of the characters. For example, if a character is playing an instrument

on the film, the sound of that instrument must be provided. The most complex type of source

music is music composed exclusively for the movie, such as orchestral music for an orchestra

that is playing on that movie.

Thematic music

Thematic music includes opening themes and character themes. This kind of music usually parallels the action and can help the audience recognize a certain character just by hearing its theme, or even recognize changes in that character, as explained below in the section “thematic music in character development”. [24]

Defining the ethnicity, location and period of the film

Film score music can be used to give the audience hints about the ethnicity, location and period of the film, adding realism to the scenes. The director can choose an existing piece of music that represents or defines the period or location of the movie, or ask the composer to produce music with characteristics that give the audience hints about where and when the story happens. Those hints can be given through the choice of instruments, as in the movie Braveheart (1995), where bagpipes were used to give a Scottish feeling to the film, helping to clearly identify the location and ethnicity of its characters. [24]

Paralleling the action of the film

Music can also be used to emphasize action in the film, with the objective of accentuating what the viewer is already seeing. One kind of film that uses this intensively is the animated cartoon, where almost every action of a character is accompanied by music (for example, when the coyote falls from a cliff while chasing the Road Runner, the music also falls in pitch, usually in a chromatic sequence). Paralleling the action of the film is also called underscoring, which will be expanded upon further ahead. [24]

Commenting on the film and adding to scenes

This function is considered one of the most intelligent ones, since music here provides the viewer with additional information that is not explicit in the scene being scored. This can be done in numerous ways. The first is through an overture, an introductory piece of music that contains all the major themes of the movie and summarizes it, giving hints about the plot. For example, if a love theme is present in the overture, the viewer might guess that love will be present in the movie. A good example is Superman (1978),


whose overture includes a heroic march and a love theme, so the viewer knows that the hero will most probably find love during his adventure.

Commenting on the film through the music score can also be done to describe a location shown in the movie. Some landscapes may be misinterpreted if there is no musical score hinting at their nature. For example, in the movie Legends of the Fall (1994), whose score was composed by James Horner, the opening shows a beautiful landscape accompanied by the score. If the scene were shown in silence, the viewer's attention would not be focused on that beauty, and the scene could be understood as expressing loneliness.

Another interesting example is the movie Jaws (1975), where the two-note musical motif is played when the shark is approaching, serving not only to comment on the film by hinting at what is about to happen but also to provide emotional focus, as explained in the next part. [24]

Providing emotional focus

Another function of a film score is its ability to generate an emotional response from the viewer. This is considered the strongest and most notable function of film scoring; all the other functions are related to this one in some way, since all of them aim to produce certain emotions in the viewer. The orchestra is the preferred means of doing this, where “strings emphasize romance and tragedy, brass instruments emphasize power and sorrow (when used in solos), and percussion heightens the suspense”. [24] Other relations between music and emotions are described in more detail in the section “Music and Emotions” of this document.

Underscoring

Underscoring consists of a piece of background music used to parallel the action of a film. It can also be used as a mood-enhancing accompaniment to a situation in a movie. However, that music should not distract the viewer from the movie. [24] Character themes can be used as underscoring, where the music that represents each character is played when that character is in the scene, with the variation needed for the different scenes and to accompany the character's development. We can also see the example of the movie Gladiator (2000), where the battle scenes were scored with really intense music, while other parts of the movie were scored with music that was not intense and had other musical characteristics.

Sometimes underscoring is overused, making the emphasizing effect disappear. A bad example of the use of underscoring is the movie The Rock (1996), where it is impossible to musically distinguish the scenes, since they were all scored in the same bombastic manner. [24]


Thematic music in character development

As explained above, thematic music consists of opening themes and character themes. A character theme is a piece of music associated with a certain character, usually played in the most relevant moments of the film in which that character appears. It is mostly used for main characters and can help the audience not only to recognize the hero or the villain but also to better notice changes in that character resulting from the action of the film. The music can follow the character's development through the film, changing with the character. A good example is the character theme John Williams scored for Luke Skywalker in the Star Wars movie (1977). In the first scenes, Luke is shown as an inexperienced young adult through vigorous orchestrations reflecting his youth, with strings in particular representing hope and innocence. By the film's end, when Luke finishes his hero's journey, the same theme is played in a different way, using trumpets and presented as a march. This expresses his maturity, newly found confidence and strength: changes in the character during the film that are also reflected in the music. [24]

Other functions outside the film

Music can have other functions outside the film, as it can be sold as a soundtrack so viewers can listen to their favorite film music at home. Some film themes have become so popular that they serve as publicity for the corresponding film, the theme sometimes being better known than the film itself. [24]

3.2.2. Pre-made music vs composed music

Sometimes, pre-made music is used temporarily during the creation of a movie while the composer is writing its dedicated music. That way, the director can give a first idea of what he intends for each scene by using music with characteristics similar to the one under production.

When making music for film scoring, orchestras are preferred over popular music, not only because the vast number of instruments in the orchestra allows the composer to better express different emotions, but also because orchestral music ages much more gracefully, while popular music tends to date quickly as styles rapidly evolve. [25]

3.2.3. Creation of the film score – production, composition and

synchronization

Since a film is the result of a collective effort, the composer working on it is not free to work alone. He must submit to demands such as the budget, the number of performers, timing (sometimes calculated in fractions of a second), the mood of each sequence, the producer's tastes, and the pressures of production schedules. [26]


After the complete film, or part of it, has been shot, the composer is shown a “rough cut” and talks with the director about the type of music to be used. This process is called “spotting”. Sometimes, when the musical score needs time to be developed, due to its complexity or importance, or because the scenes depend on the music (dances, for example), the director may talk with the composer before shooting takes place.

During the process of composition, the composer may choose to write paper scores or use a computer-based environment. The advantage of using the computer is that it is possible to generate a MIDI file of the music, which can be listened to and approved before it is given to the orchestra to perform. [25]

One of the hardest parts of the film score creation process is the timing and synchronization of the score with the film scenes. This synchronization may be done by adapting the image to the music or the music to the image, depending on which is easier. For example, in an animated cartoon it is usually the music that adapts to the image, since the continuity and fluency of the music is secondary and the animation is the main point. If, in a movie, the director wants a scene with a dance, then the image should adapt to the music, the music being the most important element in this case. In the context of a real-time storytelling environment such as I-Shadows or FearNot!, the music has to adapt to the scenes, since what obviously matters is the story being shown in the shadow theatre, which is created in real time with no previous preparation (in terms of timing and synchronization) of the music produced for it.

Claudia Gorbman [27] defined seven principles describing the functions of background music accompanying a narrative and what should be achieved when composing, mixing and editing film music. Those principles are:

1 – ‘Invisibility’ – the technical apparatus that produces film music must remain invisible

2 – ‘Inaudibility’ – since background music is not the main focus of a film, it should not be heard consciously, so the viewer can focus on what is most important: the dialogue, the visuals, etc.

3 – ‘Signifier of emotion’ – music can suggest moods and emotions

4 – ‘Narrative cueing’ – music should work a) ‘referentially’/‘narratively’, indicating point-of-view and character/setting, and b) ‘connotatively’, interpreting and illustrating narrative events

5 – ‘Continuity’ – one of the objectives of background music is to fill “gaps”

6 – ‘Unity’ – this is achieved by using variations and repetition of musical material (such as themes)

7 – “One might break with any of the above rules within the boundaries of reason”, making these rules not absolute, but a guide to better understand the functions of background music.


3.2.4. The Symbolic, Real and Imaginary - three regimes of film

music

Robert Spande [28] proposed a theory stating that the film experience must imitate in some way all three overlapping dimensions of subjective reality: the symbolic, the real and the imaginary, and that the use of film music is fundamental for this. These three dimensions are equally important and none of them has any sort of “priority” or “primacy” over the others; all are important for the experience a subject has when watching a film.

The Symbolic relates to film music's objective of emphasizing and underlining the important dramatic aspects of the film.

The Real is the dimension related to facts that happened to the subject, such as traumas; the author argues that film music may have effects on the listener similar to those of traumas.

The Imaginary is connected to the fact that the music is omniscient and always knows what is going to happen in each scene.

3.2.5. Animated Movies

These movies take special advantage of music scoring. Since most of them (especially the older ones) have no dialogue or spoken commentary, the soundtrack consists simply of sound effects and music, which are therefore of great importance for these movies. Some of the best examples of film scoring in animated movies are Walt Disney's, where the synchronization between the soundtrack and the visual elements is so complete that it becomes difficult to imagine viewing the cartoons without sound.

“The music and sound effects in animated movies have to create a lot more of the environment

and, many times, the emotions of the film. Animated, 2-D figures can’t emote in the way a

human actor can. The music in animated features often has to convey a lot more.” [29]

3.2.6. Scoring in Games

With the evolution of technology and the growing demands of the games market, the use of music as scoring in games has become fundamental to improving the player's experience.

Games are closer than films to storytelling environments such as I-Shadows, because there is interactivity between the user and the environment. For example, music can be triggered when the player reaches a certain point in a game, emphasizing the emotions the game gives the player at that point.
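Such trigger-based scoring can be sketched minimally as follows. The cue table, location names and the once-only firing policy are our own illustrative assumptions, not taken from any particular game engine:

```python
# Hypothetical sketch of trigger-based game scoring: when the player
# reaches a marked location, the cue associated with it starts playing.
# The cue table and all names are illustrative assumptions.

CUES = {
    "boss_gate": "battle_theme",
    "safe_room": "calm_theme",
}

class MusicTrigger:
    def __init__(self):
        self.playing = None   # name of the cue currently playing, if any
        self.fired = set()    # locations whose cue has already fired

    def on_enter(self, location: str):
        """Fire each location's cue only the first time it is reached."""
        cue = CUES.get(location)
        if cue and location not in self.fired:
            self.fired.add(location)
            self.playing = cue

trigger = MusicTrigger()
trigger.on_enter("boss_gate")   # entering the boss area starts its theme
```

The once-only policy is one of many possible design choices; a real game might instead crossfade back when the player leaves the area.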

Besides maintaining all of the functions found in film or television sound, game audio has some distinct differences in the ways it functions. Karen Collins [30] listed several of those functions: commercial functions, kinetic functions, anticipating action, drawing attention, structural functions, reinforcement, illusionary and spatial functions, environmental functions and communication of emotional meaning.


3.3. Summary

In the first part of this chapter, we discussed the relation between music and emotions. We presented some studies about the effect of listening to music on the human brain, and saw that listening to music may arouse emotions. When listening to pleasant music, the rewards given by the brain to the listener are similar to survival-related rewards. We also discussed that each person has their own experience when listening to music, and the emotions aroused by the same piece of music can differ between individuals. We then saw which relations have been established between different kinds of emotions and musical characteristics, such as mode, instrumentation, tempo, articulation, volume, register and others. The most relevant association was between happiness and sadness, which are opposite emotions in terms of musical characteristics.

In the context of this thesis, knowing that each person feels differently towards a piece of music will be useful when gathering the music that fits a possible emotional state. One cannot go into much detail by selecting music for each type of emotion, since the association between music and emotions is subjective and depends on the listener. It is more reasonable to group emotions according to their similarities. The knowledge obtained by associating emotions with different kinds of musical characteristics will help in that task as well.

In the second part of this chapter, we discussed the importance of using music as accompaniment to films. Several of its functions were described, such as helping to define ethnicity, location and period, paralleling the action, underscoring and providing emotional focus. We also discussed topics such as the use of thematic music in character development and pre-made vs. composed music, and gave an overview of the film score creation process.

As a final conclusion for this section, we can say that a film score cannot save a bad film. However, when done well along with a good film, it can multiply the emotions felt by the viewer and make the experience a much more enjoyable one.


4. Related Work

After reviewing significant work in the fields of music, emotions and film scoring, it is essential to present some of the work in computing that used the concepts mentioned in the previous sections. In this section we present seven distinct works: Marc Downie's behavior, animation, music: the music and movement of synthetic characters; Roberto Bresin's Virtual Virtuosity – Studies in Automatic Music Performance; Pietro Casella's Music, Agents and Emotions; Andrew de Quincey's Herman; Nakamura et al.'s automatic background music generation based on actors' mood and emotions; Axel Berndt's Adaptive Musical Expression from Automatic Realtime Orchestration and Performance; and David Cope's Experiments in Musical Intelligence.

These works are relevant for this thesis and inspired the work reported here, since some of their objectives and goals are similar.

4.1. Marc Downie’s work

Marc Downie's work [31], named behavior, animation, music: the music and movement of synthetic characters, consisted of the creation of synthetic characters within an architecture of reactive behavior systems. These autonomous characters are complete but simple; they are situated inside virtual worlds and we can interact with them. They relate to music because one of the objectives of Downie's work was the creation of music through those characters: one abstract character would be responsible for controlling the music.

One musical character was built for the project (void *) (Fig. 11), which consists of a group of characters that perform dance moves according to the user's interaction with them. The musical character was created to complete the environment with a musical score. The music it created should never sound bad, and should respond to the emotional changes and behavioral decisions of the dancing characters as well as to changes in the camera system. The music should also exist only as a background feature and never be the primary target of attention.

Fig. 11 – Project (void *)

The flow of the system for this musical character consists of a behavior system that controls tiles of pre-composed music (which reduces the risk of “sounding bad”), choosing when to play


them and how to change some of the tile features, such as volume. There are also pattern generators between the behavior system and the tile layout that decide the sequence of tiles to play. These pattern generators can also be used to predict the future.
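This tile-and-pattern-generator flow can be sketched roughly as follows. The tile names, moods and volume mapping are our own illustrative assumptions, not the actual (void *) implementation:

```python
import random

# Hypothetical sketch of Downie-style tile control: a behavior system
# picks pre-composed tiles and a pattern generator orders them.
# All tile and mood names are illustrative, not from (void *).

TILES = {
    "calm":    ["pad_a", "pad_b"],
    "excited": ["arp_a", "arp_b", "stab_a"],
}

class PatternGenerator:
    """Decides the sequence of tiles to play for a given mood."""
    def next_sequence(self, mood: str, length: int = 4) -> list:
        pool = TILES[mood]
        return [random.choice(pool) for _ in range(length)]

class BehaviorSystem:
    """Chooses when to play tiles and adjusts tile features such as volume."""
    def __init__(self):
        self.generator = PatternGenerator()

    def score_moment(self, mood: str, arousal: float) -> list:
        volume = 0.3 + 0.7 * arousal  # louder when the scene is more aroused
        return [(tile, volume) for tile in self.generator.next_sequence(mood)]

playlist = BehaviorSystem().score_moment("excited", arousal=0.8)
```

Because every tile is pre-composed, any sequence the generator emits stays within the intended sound, which is what reduces the risk of “sounding bad”.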

The author noticed that in games most of the music is environmental background, consisting of music loops and occasional fades to silence. Because of that, he wanted to go beyond the use of possibly irrelevant background music and try to provide, for a medium with no script, a musical score that is up to the challenges offered by rich characters.

The use of music in (void *) follows the thematic development of the characters, uses special music for key events, and takes into account the emotions of the scene and which characters are in the scene at each moment.

The biggest problem raised by this musical system was the difficulty of dealing with time-related components, such as synchronization with the action in real time.

4.2. Roberto Bresin’s work

Roberto Bresin's work [32], entitled Virtual Virtuosity – Studies in Automatic Music Performance, consisted of research in the field of automatic music performance, with a special focus on the piano. The objective of this work was to bring the virtuosity of live performances made by humans into an automatic music system. That way, it should be possible to have a system that creates music including time, sound-level and timbre deviations from a deadpan1 realization of the score.

They proposed a system based on artificial neural networks (ANN). The designed system listens to the last played note, predicts the performance of the next note, looks three notes ahead in the score, and plays the current tone. It generates real-time sound-level and time deviations for each note represented in the input to the ANN. The ANN model was also used to produce punctuation in performances; however, it failed at realizing legato and staccato articulation. These performance features were used to create six macro-rules representing six different emotions: anger, sadness, happiness, fear, solemnity and tenderness.
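The windowing described above (previous note, current note, three notes ahead) can be illustrated with a small sketch. The linear `predict` function is a placeholder assumption standing in for Bresin's trained ANN:

```python
# Rough sketch of the note window used by a Bresin-style performance model:
# for each note, the model sees the previous note, the current note and the
# next three, then outputs sound-level and timing deviations.
# The linear "predict" below is an invented stand-in, not the actual ANN.

def predict(window):
    """Stand-in for the ANN: map a 5-note pitch window to deviations."""
    contour = window[-1] - window[0]  # crude melodic-direction feature
    return {"level_db": 0.5 * contour, "time_ms": -2.0 * contour}

def perform(pitches):
    deviations = []
    for i in range(len(pitches)):
        prev = pitches[max(i - 1, 0)]
        ahead = pitches[i + 1:i + 4]
        ahead += [pitches[-1]] * (3 - len(ahead))  # pad at the end of the score
        deviations.append(predict([prev, pitches[i]] + ahead))
    return deviations

devs = perform([60, 62, 64, 65, 67])  # MIDI pitches of a short phrase
```

Each note thus receives its own level and timing deviation, which is exactly the kind of output a deadpan rendition lacks.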

4.3. Pietro Casella’s work

The objective of Pietro Casella's work [33], named Music, Agents and Emotions, was the creation of a generic cinematic musical agent able to create film-like music on demand. To achieve that goal, automatic music generation algorithms with an emotional goal were combined with automatic management of the interaction between action and music, using specific knowledge oriented to affective goals. Two components were created to achieve this objective: an agent named MAgentA and a system named Mediator.

MAgentA was the name given to the music creation agent, which focused on building music through composition algorithms that incorporate emotions. However, it

1 A deadpan rendition of a piece of music does not include any kind of expressive performance deviation.


could not bring “film-like” music to the setting since, according to the author, the necessary technology is still missing.

Mediator was the system responsible for integrating music into an interactive virtual environment. It also served as a test-bed for models of action-music interaction, as well as a research tool for game-music composers and directors.

4.4. Andrew de Quincey’s work

Andrew de Quincey's work [34], named Herman, is a project capable of generating music at varying levels of tension in real time. It generates background music for a 3D haunted-house environment, Judy Robertson's PhD system Ghostwriter [35] (Fig. 12), operated by children to produce plays. One of the main objectives is to convey fear through suspense and surprise effects in the music. According to the flow of the narrative, one of the human users can manually set the level of tension, to which the system responds. The music is generated at 3 levels: 1 – high-level form, 2 – rhythm and volume, and 3 – melody and harmony. The system changes the parameters of each step according to the level of tension previously set by the user. At the melody and harmony level, a custom probabilistic model is used to choose the melody, together with a set of composition rules and parameter values drawn from harmony theories.

Fig. 12 - Ghostwriter
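The idea of one tension knob driving all three generation levels can be sketched as follows. The concrete thresholds and mappings below are invented for illustration, not de Quincey's actual rules:

```python
# Illustrative sketch of Herman's three-level idea: one tension value
# (set by a human user) drives high-level form, rhythm/volume, and
# melody/harmony parameters. All mappings are invented for illustration.

def parameters_for_tension(tension: float) -> dict:
    """tension in [0, 1]; returns parameters for the three generation levels."""
    assert 0.0 <= tension <= 1.0
    return {
        "form":    {"section": "climax" if tension > 0.7 else "build"},
        "rhythm":  {"density": 1 + int(tension * 4),       # busier when tense
                    "volume": 0.4 + 0.6 * tension},        # louder when tense
        "harmony": {"dissonance_weight": tension,          # favor dissonance
                    "surprise_prob": 0.05 + 0.25 * tension},
    }

params = parameters_for_tension(0.9)  # a very tense narrative moment
```

The point of the sketch is the architecture: the user touches one value, and each generation level reads off its own parameters from it.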

4.5. Nakamura’s work

Nakamura et al.'s work [36] consisted of the creation of an automatic music generation system that helps improve the overall quality of computer-generated animations. The inputs to this system are music parameters (mood types and musical motifs2) and, for each scene of the animation, motion parameters describing the action. The system then generates the music for that scene, starting by choosing the tempo according to the emotional state.

The harmony is then chosen using a system of rules, with the selection based on the mood type. Next, the system chooses the melody, using the motifs instantiated over the harmony and looped as many times as necessary. The rhythm is chosen using pre-made rhythmic patterns associated with emotions, and its selection is based on the mood type and tempo.

2 Motifs are short pieces of music. In this case they are pre-composed.


Finally, sound effects are selected for the motions, determined according to their characteristics and intensity. These effects are also synchronized with the music. Both the background music and the sound effects are generated so that the transitions between scenes are smooth.
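The pipeline just described (tempo from mood, harmony from rules, melody from a looped motif, rhythm from mood and tempo) can be sketched as follows. All lookup tables are illustrative stand-ins, not Nakamura et al.'s actual rule sets:

```python
# Pipeline sketch of a Nakamura-style generator. All tables are
# illustrative stand-ins for the system's actual rules.

TEMPO_BY_MOOD = {"happy": 120, "sad": 60}                       # step 1: tempo
HARMONY_RULES = {"happy": ["C", "F", "G", "C"],                 # step 2: harmony
                 "sad": ["Am", "Dm", "E", "Am"]}
RHYTHM_BY_MOOD = {("happy", "fast"): "8ths",                    # step 4: rhythm
                  ("sad", "slow"): "halves"}

def generate_scene_music(mood: str, motif: list, bars: int) -> dict:
    tempo = TEMPO_BY_MOOD[mood]
    harmony = HARMONY_RULES[mood]
    # step 3: loop the pre-composed motif until it fills the scene
    melody = (motif * ((bars // len(harmony)) + 1))[:bars]
    feel = "fast" if tempo >= 100 else "slow"
    return {"tempo": tempo, "harmony": harmony,
            "melody": melody, "rhythm": RHYTHM_BY_MOOD[(mood, feel)]}

music = generate_scene_music("sad", motif=[57, 60, 64], bars=8)
```

The order of the steps mirrors the description in the text: each later choice (rhythm) can depend on an earlier one (tempo).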

4.6. Axel Berndt’s work

Axel Berndt [37] developed a new approach named Adaptive Musical Expression from Automatic Realtime Orchestration and Performance. It is based on musical orchestration principles and consists of adapting the performative expression characteristics of a musical piece.

The work focuses on how to make smooth transitions between musical pieces while including some humanization features in those transitions through the use of dynamics and tempo.

This work uses pre-composed music and changes its dynamics and tempo. The changes in dynamics consist of giving loudness instructions and moving between different loudness levels (crescendos or decrescendos). The changes in tempo consist of subtle acceleration or slowing (accelerando, ritardando) and of introducing delays in the music. The MIDI standard is used to output the music.

Fig. 13 – Berndt’s system overview of the approach to realtime adaptive orchestration and performance
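The dynamics and tempo adaptation can be illustrated with a small sketch. The linear ramps and MIDI-like velocity/BPM fields are our own simplifying assumptions, not Berndt's actual algorithm:

```python
# Sketch of Berndt-style expressive adaptation: ramp loudness (crescendo/
# decrescendo) and tempo (accelerando/ritardando) between two pieces so
# the transition is smooth. The linear scheme is an illustrative
# assumption, not the actual system.

def ramp(start: float, end: float, n: int) -> list:
    """Linear interpolation used for both the loudness and tempo curves."""
    return [start + (end - start) * i / (n - 1) for i in range(n)]

def transition(piece_a: dict, piece_b: dict, n: int = 5) -> list:
    velocities = ramp(piece_a["velocity"], piece_b["velocity"], n)  # decrescendo
    tempos = ramp(piece_a["bpm"], piece_b["bpm"], n)                # ritardando
    return [{"velocity": round(v), "bpm": round(t)}
            for v, t in zip(velocities, tempos)]

# A loud, fast piece handing over to a quiet, slow one:
steps = transition({"velocity": 100, "bpm": 140}, {"velocity": 60, "bpm": 80})
```

Each step of the result could be emitted as MIDI velocity and tempo events; in Berndt's system the curves are of course shaped more carefully than a straight line.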

4.7. David Cope’s work

David Cope [38] started developing his work called Experiments in Musical Intelligence (EMI) in 1981, intending to create a computer program that composed complete works in the styles of various classical composers. He started by coding rules that the program would follow to generate new music. However, the music created was lifeless and without much musical energy. He then changed the initial concept and revised the program to create new output from music stored in a database. That way, the program would retrieve instructions that


are contained in each piece of music and then create different music based on those instructions. The discovery of instructions is based in part on the concept of recombinancy, the method of producing new music by recombining extant music into new logical successions. To do this recombinancy successfully, EMI uses three basic principles: 1) deconstruction (analyze and separate into parts); 2) signatures (commonality: retain that which signifies style); 3) compatibility (recombinancy: recombine into new works). This is one of the best-known works in the field of automatic composition.
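The recombinancy idea can be illustrated with a toy sketch. The fixed fragment size and the pitch-proximity compatibility test are deliberate simplifications of Cope's far richer musical analysis:

```python
import random

# Toy sketch of the recombinancy idea behind EMI: deconstruct pieces into
# fragments, then recombine fragments whose boundaries are compatible.
# The compatibility test (nearby end/start pitches) is a deliberately
# simple stand-in for Cope's actual analysis and signature extraction.

def deconstruct(piece: list, size: int = 3) -> list:
    """Principle 1: analyze and separate a piece into fragments."""
    return [tuple(piece[i:i + size]) for i in range(0, len(piece) - size + 1, size)]

def compatible(a: tuple, b: tuple) -> bool:
    """Principle 3: only join fragments that connect smoothly."""
    return abs(a[-1] - b[0]) <= 2

def recombine(fragments, length: int, rng: random.Random) -> list:
    out = list(rng.choice(fragments))
    while len(out) < length:
        options = [f for f in fragments if compatible(tuple(out[-3:]), f)]
        out.extend(rng.choice(options or fragments))
    return out[:length]

corpus = deconstruct([60, 62, 64, 65, 67, 69, 71, 72, 74])  # MIDI pitches
melody = recombine(corpus, length=9, rng=random.Random(0))
```

Principle 2 (signatures) is omitted here; in EMI it filters which fragments are retained so that the recombined result still carries the source composer's style.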

4.8. Summary

In this section we presented some works related to this thesis. In all of them, a system was created that manages to create music automatically. In Downie's work [31], a musical agent was created to assist the virtual environment (void *) with music creation. In Bresin's work [32], the main goal was to generate automatic music with some kind of virtuosity and expression, instead of just playing a deadpan reproduction of the score. In Casella's work [33], a musical composition agent was created and integrated into a system that places the music it creates in a virtual environment. In Quincey's work [34], music was generated to accompany a story taking place in a haunted-house virtual environment. In Nakamura et al.'s work [36], music was generated to improve computer-generated animations, based on the actors' emotions and motions. In Berndt's work [37], special attention is given to making smooth transitions between musical pieces through changes in dynamics and tempo. Finally, in Cope's work [38], new music is automatically generated based on the recombinancy concept. To the work of this thesis we can relate the idea of a module that coordinates the creation of music, used in both Downie's and Casella's work, through the use of Apollo, the D3S music manager, which will be explained further on. We can also associate the virtuosity and expression of music with some of the changes in musical parameters made for this thesis, as in Berndt's work. Other similarities between these works and D3S can be found. As a final conclusion for this chapter, we present a table summarizing the features of these systems that influenced the D3S work. The D3S features will be explained in more detail in chapter 5.


Fig. 14 – Relation between discussed works and D3S


5. The sound of D3S

The big difference between scoring a movie and scoring a real-time storytelling environment is that these environments create the story in real time. While for movies there is time to select the scenes to score, compose music suitable for each scene, and then edit and synchronize the music with the images, in real-time environments this is much harder or even impossible without the aid of a computing system. We can make some associations between what was presented in the film scoring chapter and what we could have in our system. For instance, it would be interesting to have character themes associated with each main character. It is also possible to use music to give hints to the viewer, mostly about what could happen in the story's near future. For example, when the villain is about to make his entrance in the shadow theatre, the music can hint at it by starting to play his theme just before he enters. The villain's theme should be a piece of music with sinister and scary characteristics, just as the other character themes have their own characteristics.

In this thesis we will consider the use of pre-made music, since making a system that not only accompanies a story effectively but also composes music in real time is beyond the objective of this thesis. Moreover, pre-made music, mostly classical or instrumental, suits well the storytelling environments designed for a young audience.

5.1. Understanding and enjoying a story

Our main goal in this thesis is to increase the viewer's enjoyment of a story generated in a storytelling environment. To enjoy viewing a story, in the majority of cases, one needs to understand it. One could say that if we watch, for example, a movie with many interesting special effects, the movie was enjoyable to watch because of those features, even if we did not understand what was going on in the story. However, if we understand the story, then we have the chance of enjoying it even more. By adding music to these storytelling environments, besides adding entertainment value, we also want to add understanding to the story, so the user can enjoy watching it even more.

To understand a story, it is essential that we understand the characters in it. We then need to:

• Know their roles – who is the hero, who is the villain, etc.;

• Understand their actions – who is doing what;

• Perceive how they are feeling – not only individually but also towards other characters;

• Understand what is happening in the scene as the result of the characters’ interaction – knowledge about the environment’s valence and arousal, which results from the interaction of the characters.
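The information listed above can be collected into a small data structure. The field names and the simple averaging of valence and arousal are our own illustration, not the final D3S data model:

```python
from dataclasses import dataclass

# Minimal sketch of the information the points above ask a scoring system
# to track: roles, actions, per-character feelings, and the scene's
# overall valence/arousal derived from the characters' interaction.
# Field names and the averaging rule are illustrative assumptions.

@dataclass
class Character:
    name: str
    role: str        # "hero", "villain", ...
    action: str      # what the character is doing
    valence: float   # how the character feels, in [-1, 1]
    arousal: float   # how intense the feeling is, in [0, 1]

def scene_state(characters: list) -> tuple:
    """Environment valence/arousal as the mean of the characters' states."""
    n = len(characters)
    return (sum(c.valence for c in characters) / n,
            sum(c.arousal for c in characters) / n)

cast = [Character("hero", "hero", "fighting", 0.2, 0.9),
        Character("villain", "villain", "fighting", -0.6, 0.9)]
valence, arousal = scene_state(cast)
```

Averaging is only one possible aggregation; a scoring system might instead weight the characters currently in focus more heavily.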


Besides understanding the story, we also want the user to be entertained by hearing music that he may enjoy. Music can help with understanding the story, but if the viewer does not like the way it sounds, then it will be harder to reach the final goal of increasing enjoyment.

The scheme in Fig. 15 illustrates the relation between the main goal – the enjoyment of the story – and the features of the thesis. These will be explained in the following sections of this chapter.

Fig. 15 – Story enjoyment and understanding


[Labels from the framework diagram: Background music and Event Sounds; Character Themes (Hero Theme, Villain Theme, Friend Theme, with Hero Theme variations PH, PL, NL, NH); Key Moments Themes (Duel Theme, Friendship Theme, Climax Theme); Music that emphasizes emotions (Tempo and Arousal; Instruments and Characters; Volume and Relations); Filler Music.]

Note that there is also the entertainment factor of music – we want the music to sound good to the story's viewers. That is guaranteed mostly through the use of pre-composed music for all the different types of background music, although the use of pre-composed music is not explicit in this scheme.

Based on the features in the bottom layer of that figure, we can organize them into a generic D3S framework for scoring storytelling environments, shown in the next figure:

Fig. 16 – D3S Framework

Using the understanding and enjoyment factors as guidelines, in this chapter we will explain this

framework and present the theory behind the development of this work.

5.2. Event sounds and background music

As we have seen before, several films and animated movies use different types of music and sounds to score what is happening in them. We can notice certain resemblances between affective environments such as I-Shadows and animated movies: both use fictional characters made specifically for them; both are aimed at a younger audience; and in both, actions and feelings are stereotyped through the use of images – for example, if a character is in love with another one, red hearts may appear in the scene as evidence of that feeling.

Using the last resemblance, we can analyze how these actions are scored in animated movies – through underscoring. This technique consists of scoring actions throughout the story in a way that emphasizes them and makes them more noticeable to the audience [24]. That way, we can select specific actions in the affective environment – which


we will call events – and associate with each one a piece of music or a sound to be played while it is happening in the scene. That way, we can underscore those events, paralleling the action with sound.

Besides using music to score events, we also want another kind of music, one that is present in almost every animated movie (especially the Disney classics) and in live-action films as well – background music. This type of music may have different purposes and objectives: it can emphasize the presence of certain characters in a scene; it can hint to the audience the intensity and importance of the actions happening in the story arc, such as the climax of the story; or it can simply be used as a filler to complement the visual aspect of the story [24] [27].

We can then conclude that there will be two parallel layers when considering music execution – event music and background music. They are considered parallel since both can be playing at the same time. The first is oriented towards short musical sequences that represent certain actions, while the second consists of pre-composed music played while the story develops.

5.2.1. Event Sounds

Since events are associated with actions performed by characters, we classified the arguments of an event as follows:

Subject – the character that executes the action

Target – the character that is the target of the action

Action – the action itself of the event

Thus, for every event there is always a subject that executes it, there may or may not be a target (since some events are individual and do not involve other characters), and there is the action, which describes what kind of event it is and what music will be played to score it.
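As a sketch, the three arguments can be captured in a small structure; the class and method names here are our own illustration, not the system's actual types:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Event:
    """An event received from the storytelling environment."""
    subject: str                  # the character that executes the action
    action: str                   # selects which sound scores the event
    target: Optional[str] = None  # absent for individual events

    def sound_file(self) -> str:
        # Each action maps to a MIDI file named after it (see section 5.3).
        return f"{self.action}.mid"
```

An individual event such as a character laughing simply leaves the target unset.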

The event sounds help the user understand what actions are being performed and who is performing them. For example, when a character wants to show that he is happy, he can express it with a certain body movement. Along with that movement, a happy sound can also be played to reinforce the fact that the character is happy.

5.2.2. Background Music

With background music, we wanted music that could offer all the purposes we have seen before for scoring stories, with a special focus on enhancing the understanding of the story. We can thus classify background music into four categories:

Character Themes

This music is played whenever an important character enters the scene for the first time. This decision was based on the character themes that are also present in many movies, as we have seen in chapter 2. The objective of using this music is to create an impact on the audience when that character makes its first appearance, revealing that it is an important character who will take a major role in the story.

To choose which characters would be worthy of having a character theme, we relied on Propp's study [39]. He analyzed the basic plot components of Russian folk tales to identify common narrative elements among them. By analyzing the structure of those stories, he concluded that all the existing characters could be resolved into seven types: hero, villain, princess, donor, helper, dispatcher and false hero. In the context of this thesis, we consider only three of them as main characters: the hero, the villain and the princess. Since the princess character might be too limited, we broadened its definition and called it the friend character: it can still be a princess, but it can also be just a good friend of the hero, someone who is on the hero's side. We pick only three characters so that it is easier for the audience to remember and associate a theme with a character: if every character in the scene had an associated theme, it would be confusing and hard to understand who the main characters of the story are. Thus, those three characters – the Hero, the Villain and the Friend – each have a dedicated character theme. With that, we have our first type of background music:

Hero Theme – played when the Hero enters the scene for the first time.

Villain Theme – played when the Villain enters the scene for the first time and whenever the villain is in the scene without the hero also being present.

Friend Theme – played when the Friend enters the scene for the first time or when it is alone in the scene.

The Hero Theme is also played when the hero is alone in the scene. However, some changes are made to it according to the emotions felt by the hero at that time, as we will see next.

Background music that emphasizes emotions

As we have seen before, music can be useful to emphasize the emotions present in the scene at each moment. The D3S system uses this type of music through the hero: the hero theme is played when the hero is alone in the scene (which will likely happen often, since he is the main character). To make the theme emphasize the emotions felt by the hero at that time, we associate a different variant of the hero theme with each mood the hero may be feeling; that variant carries the associated emotions within it. Using the notions of valence and intensity seen before and the notation used by the I-Shadows system to classify moods, the mood of the hero can be divided into four main mood intervals:


Mood Value   Type of Hero Theme
[-10;-5[     Hero Theme with Negative Valence and High Intensity (Hero Theme NH)
[-5;0[       Hero Theme with Negative Valence and Low Intensity (Hero Theme NL)
[0;5[        Hero Theme with Positive Valence and Low Intensity (Hero Theme PL)
[5;10]       Hero Theme with Positive Valence and High Intensity (Hero Theme PH)

Fig. 17- Association between Mood Values and the type of Hero Theme
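The interval table in Fig. 17 translates directly into a selection function; a minimal sketch (the function name is ours):

```python
def hero_theme_for_mood(mood: float) -> str:
    """Map a mood value in [-10, 10] to a hero theme variant (Fig. 17)."""
    if not -10 <= mood <= 10:
        raise ValueError("mood must be in [-10, 10]")
    if mood < -5:
        return "Hero Theme NH"  # negative valence, high intensity
    if mood < 0:
        return "Hero Theme NL"  # negative valence, low intensity
    if mood < 5:
        return "Hero Theme PL"  # positive valence, low intensity
    return "Hero Theme PH"      # positive valence, high intensity
```

Note that the interval bounds are half-open exactly as in the table, so a mood of -5 already selects the NL variant.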

To apply the valence and intensity changes to the theme, we can make an analogy with some well-known parameters seen before [2]: valence can be associated with mode, major mode corresponding to positive valence and minor mode to negative valence; intensity can be associated with rhythm and volume, a faster rhythm and louder volume corresponding to high intensity and a slower rhythm and softer volume to low intensity. These associations between the musical parameters of mode, rhythm and volume and the valence/intensity they suggest were based on the studies discussed in chapter 3, most specifically on Thayer's model of mood [18] and on the other studies in that chapter that relate musical parameters with emotions.
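Under these analogies, a mood value determines the mode, rhythm speed and volume of the theme. A hedged sketch – the concrete tempo factors and MIDI volume levels are illustrative choices, not values fixed by the thesis:

```python
def theme_parameters(mood: float) -> dict:
    """Translate a mood value in [-10, 10] into musical parameters.

    Mode follows valence (major/minor); the tempo factor and MIDI
    volume follow intensity. The numeric values are illustrative.
    """
    positive = mood >= 0
    high = abs(mood) >= 5
    return {
        "mode": "major" if positive else "minor",
        "tempo_factor": 1.25 if high else 0.85,  # faster vs. slower rhythm
        "midi_volume": 110 if high else 70,      # louder vs. softer (0-127)
    }
```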

Background music for key moments

Throughout the story there will be key moments of great importance, and that importance should also be emphasized through music. We have seen examples in films where the type of music played was related to what was happening in the scene at that moment. Using the existing knowledge about the common structure of these stories and following Propp's structure [39], we identified some key moments that have background music associated with them:

Duel Theme – played when at least the hero and the villain are in the scene.

Friendship Theme – played when the hero and the friend are alone in the scene.

Climax Theme – played when the story reaches its climax. Since the climax itself is just a single point in the story – the one that ends the rising action act and starts the denouement act, for example when the villain is finally defeated – we have to find a way to fit the playing of the music associated with it. Thus, it starts being played when the rising action act reaches its halfway point and ends when we reach the halfway point of the denouement act.
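The climax window described above can be sketched as a predicate over the story state; the act names and the 0–1 progress representation are our own assumptions, not D3S's actual interface:

```python
def climax_theme_active(act: str, progress: float) -> bool:
    """Return True while the Climax Theme should play.

    The window opens at the halfway point of the rising action act and
    closes at the halfway point of the denouement act. `progress` is
    the fraction (0.0-1.0) elapsed within the current act.
    """
    if act == "rising_action":
        return progress >= 0.5
    if act == "denouement":
        return progress < 0.5
    return False  # exposition and other acts never trigger the climax theme
```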


Fig. 18 - Changes of acts in I-Shadows

Background music as filler

When there is nothing relevant happening in a scene, a different type of background music is played. We called it General Music, since it is the music played when, for example, only secondary characters are present in the scene, and these will most likely not do anything as relevant to the development of the story as a hero or a villain would.

This General Music is the only type of music in the system that is not pre-composed: it is generated automatically by the existing Amadeus module of I-Sounds, which generates music for the scene according to the environment's characteristics at that time. In this specific case, the music changes according to the valence of the characters' mood, and those changes are reflected in the mode and rhythm. Amadeus is explained in more detail in chapter 6.

This kind of background music could also use pre-composed music instead of automatically generated music. We could follow an approach similar to the one used for the hero themes: discretize the environment valence/arousal space into four intervals and associate different music with each of them.

Fig. 19 - Possible discretization of the General Music
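Such a discretization could be sketched as follows, mirroring the hero-theme intervals; we assume valence in [-10, 10] and arousal in [0, 10], and the quadrant labels are our own shorthand:

```python
def general_music_quadrant(valence: float, arousal: float) -> str:
    """Discretize the environment valence/arousal space into four
    quadrants, one per pre-composed General Music variant (Fig. 19)."""
    v = "P" if valence >= 0 else "N"   # positive vs. negative valence
    a = "H" if arousal >= 5 else "L"   # high vs. low arousal, assuming [0, 10]
    return f"General Music {v}{a}"
```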


5.2.3. The use of silence

A common saying tells us that “silence is golden”. Even though we have said that the use of

music is important to emphasize several aspects of the story, sometimes we may use silence to

emphasize other aspects.

As mentioned before, suspense can be created through certain kinds of music, like the drums played in the circus before a dangerous act. However, silence can have the same effect. When the music suddenly stops after playing constantly for some time, it may raise questions and emotions in the audience. The simple question “why did the music stop?” may make the audience start imagining and guessing what is going to happen next, building an effect of suspense [40].

In the concrete case of D3S, silence is used before the villain's first entrance. Since D3S is informed about a character's entrance before it happens, that anticipation time is used to stop the music currently playing. That way, when the villain theme starts playing and the villain itself then appears in the scene, the impact is bigger and the villain's evilness is emphasized. We can divide the villain's entrance into three steps:

1 – The music currently playing stops – this happens when the system receives the information that the villain is about to enter the scene, so that we can anticipate the entrance and prepare for it by stopping the music.

2 – The villain theme starts playing – this suggests to the viewers that something important is about to happen in the scene.

3 – The villain enters the scene – the suspense effect ends, and the entrance makes a greater impact on the audience.
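The three steps can be sketched as an ordered list of playback cues; the cue names are hypothetical, not D3S's actual interface:

```python
def villain_entrance_cues(current_music: str) -> list:
    """Ordered playback cues emitted between the 'villain about to
    enter' notification and the villain actually appearing on stage."""
    return [
        ("stop", current_music),    # 1 - silence builds suspense
        ("play", "Villain Theme"),  # 2 - hints that something important is coming
        ("on_stage", "villain"),    # 3 - the entrance lands with greater impact
    ]
```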

5.3. The Artistic task

The choice of which music and sounds are played for each event is mostly an artistic task. One can use pre-composed music or music generated by systems such as Amadeus. We chose pre-composed music and sounds, since this increases the chance of adding entertainment value. There have been advances in the automatic music composition field, with good examples such as EMI; however, despite those advances, automatically composed music has not yet reached the quality of music created by human composers.

Since we want to keep the system open to changes in the music to be played, all the sounds and music are read from files identified by name. Each event has an associated file called “name.mid”, where name is the action's name. The same happens with the background music, for which there is also a MIDI file associated with each type. In the current system, some of the event sounds were composed by the author of this thesis and others were adapted from existing motifs. Most of the background music was chosen from existing music, taking care to select pieces with the characteristics explained above for each type of background music.

However, even though the nature of the music may change by choosing other music files to be

played, it is possible to change some musical parameters dynamically. Those changes are

explained in the next section.

5.4. Adding dynamism to music and sounds

Since D3S is being used to score dynamic affective environments, the music and the sounds

need to be dynamic as well to adapt to the nature of those environments. Because of that, we

need to find which musical parameters we can change and when / how to change them.

Besides, we need to make associations and analogies between those musical parameters and other parameters found in the affective environment. The MIDI format proves useful in this matter: its parameters are easier to tweak than those of formats such as WAV or MP3, and there are existing libraries that allow those changes.

5.4.1. Volume and Emotional Relations

The volume of music corresponds to the intensity of its output in decibels: loud music has a high decibel output, soft music a lower one. We can associate this intensity with the other kind of intensity we have seen before – the intensity of emotions.

Making an analogy between the volume of the music and the intensity of the emotions, we can change the first according to the second. Taking an event as an example: if the character John gives a flower to the character Mary, a sound associated with the action offer is played. The volume of that sound is higher or lower according to the relation between the two characters. If John and Mary have a strong relationship, the sound is played at a higher volume; if their relationship is neutral or weak, the sound is quieter and less significant.

This way, we add dynamism to the event sounds and hint to the audience the importance of an action through the volume of the sound scoring it.
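A minimal sketch of this volume mapping, assuming relation intensities in [0, 10] (as in the connection-update messages described in chapter 6); the linear mapping onto the MIDI volume range is our own illustrative choice:

```python
def event_volume(relation_intensity: float) -> int:
    """Scale an event sound's MIDI volume (0-127) with the emotional
    relation between subject and target.

    A strong relation (10) plays at full volume; a neutral or weak one
    stays near a quiet floor of 40 so the sound is still audible."""
    intensity = max(0.0, min(10.0, relation_intensity))
    return int(40 + (127 - 40) * intensity / 10)
```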

5.4.2. Instruments and Characters

Instruments also play an important role when we want the audience to associate them with a specific kind of character. This kind of association has already been seen in the musical world; the most famous example is Peter and the Wolf, a composition by Sergei Prokofiev written in 1936.


Fig. 20 - Disney’s Peter and the Wolf – 1946 [41]

This composition consists of a children's story with both music and text written by the composer. The story is spoken by a narrator and accompanied by an orchestra. There are several characters in it, and each character has an associated instrument: for example, the bird's lines are played by a flute, the duck's by an oboe, the cat's by a clarinet, and so on [42]. The main objective was that the listener could easily associate the music with the characters, giving a better perception of what is happening in the story. For example, if the flute and the oboe are playing at the same time, then we know that the bird and the duck are both in the scene.

Using a similar analogy, in D3S we can associate instruments with the characters as well. However, since the background music can be complex and use several instruments at the same time, the event sounds are the best place to make these associations. For example, when John offers a flower to Mary, a specific instrument scores that action; if Mary offers the same or something else to John, a different instrument is used. Even if the sound played is the same, the use of a different instrument makes it possible for the audience to better distinguish, just by listening, who gave the object to whom.
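A sketch of such an assignment, using the role instruments adopted later in the implementation (trumpet, cello and flute for hero, villain and friend; see section 6.1.2) as General MIDI program numbers; the fallback scheme for secondary characters is our own illustration:

```python
# General MIDI program numbers (0-based): 56 = trumpet, 42 = cello, 73 = flute.
ROLE_PROGRAMS = {"hero": 56, "villain": 42, "friend": 73}

def instrument_for(role: str, registration_order: int = 0) -> int:
    """Return the MIDI program used to play a character's event sounds.

    Main characters get their fixed role instrument; every other
    character gets a program chosen by its order of registration."""
    if role in ROLE_PROGRAMS:
        return ROLE_PROGRAMS[role]
    return registration_order % 128  # illustrative fallback, not the thesis's rule
```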

5.4.3. Tempo and Environment Arousal

The Tempo of a piece of music is its speed or pace [43]. When the Tempo changes, each note

duration changes too, making the music slower or faster.

Usually, the tempo of music is measured in BPM (beats per minute), the same unit used to measure one's heart rate. When we talk about the environment's arousal, we talk about how agitated or calm that environment is at a given time; the heart of someone who is agitated usually beats faster than that of someone who is calm.

We can then make an analogy between the tempo of a piece of music and the environment arousal – the intensity of the scene at each moment – playing the background music at a faster tempo when the environment arousal increases and at a slower tempo when it decreases. With these changes, we expect the tempo of the background music to follow Freytag's Pyramid model (Fig. 21), with the Tension variable associated with the tempo of the music being played.
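A hedged sketch of this tempo scaling, assuming arousal in [0, 10]; the base tempo and the scaling range are illustrative choices, not values prescribed by the thesis:

```python
def background_tempo(arousal: float, base_bpm: float = 100.0) -> float:
    """Scale the background music's tempo with the environment arousal.

    The tempo swings from 0.6x the base tempo at complete calm up to
    1.4x at maximum arousal, following the rise and fall of tension."""
    arousal = max(0.0, min(10.0, arousal))
    factor = 0.6 + 0.8 * arousal / 10
    return base_bpm * factor
```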


Fig. 21 – Freytag’s pyramid with music’s Tempo variable

5.5. Summary

In this chapter, we explained how we could meet the goals of making the viewer better

understand a story and at the same time enjoy it more through the use of music and sounds.

We considered the use of two different and parallel layers while scoring a storytelling

environment: the event sounds layer and the background music layer. Within the first, we have seen that we can associate a sound with an event that happens in the environment, making it possible to better understand what kinds of events are happening. Changing the instrument according to the character that performed the event helps the viewer understand who did it, and changing the volume helps convey the strength of the relation between two characters. Within the background music dimension, we considered the use of character themes to help the viewer recognize the main characters of the story. The hero theme changes according to the hero's mood, and that change is transmitted to the audience through a variation of the theme. Other themes are used in key moments of the story, such as the duel, friendship and climax scenes. We have also considered changing the background music's tempo to emphasize changes in the environment's intensity along the story. We have seen that background music can also have other functions, such as acting as a filler to entertain the audience while no major events are happening in the environment. Finally, the use of silence was explained as a way of creating anticipation and suspense in the audience.


6. Architecture and Integration

In this chapter we give an overview not only of the system architecture but also of the integration with the storytelling systems. We describe the main components of the base system used to develop the D3S features – the I-Sounds system – and explain the changes needed to adapt it to D3S. We then explain how the music manager module Apollo works and what its purpose is for our work. We also describe how the integration was made with the two storytelling systems used to test this work – the I-Shadows and FearNot! systems.

6.1. I-Sounds

The I-Sounds system is an existing framework developed by Ricardo Cruz [44] that allows the scoring of storytelling environments. It was created to score the storytelling environment I-Shadows, in response to the need for a system that could manage all the sound resources and handle the connection with interactive storytelling systems.

It is a three-layered system that uses the information received from the storytelling system to produce sound accompanying the story created by that system. The layers are an Affective Layer, a Composition Layer and an Output Layer.

Additionally, an application specific driver was developed in order to facilitate the

communication with the storytelling environments. An interface was also developed to ease the

process of testing the system and keeping track of the changes in the system while the story is

being scored. In the following sections, we will give more details not only about the driver but

also about each of the layers that compose the I-Sounds system and about the Interface.

6.1.1. Application Specific Driver

This driver is responsible for decoding the messages that come from the storytelling system and arrive at the I-Sounds system. It then makes all the changes needed in the Affective Layer according to the content of those messages. Each driver is dedicated to a specific storytelling system: to adapt the system to a different storytelling system, a new driver needs to be created and integrated with it. Below we present examples of the messages understood by a driver. Notice that most of these messages can be used with any generic storytelling environment.

Register Actor – this message registers a new actor in the local environment. It also informs the I-Sounds system whenever an actor enters the scene.

Establish Connection – this message establishes an initial connection between two actors. That connection is an affective one, which will be used to update their emotional relations.

Remove Actor – this message is sent whenever an actor leaves the scene.


Update mood – this message updates the mood of a specific actor. The mood can be positive if

the actor is feeling positive emotions (such as happiness, joy, etc.) or negative in the case of

negative emotions (sadness, anger, etc.).

Fig. 22 – Original I-Sounds architecture [44]


Update emotional state – this message updates the emotional state of an actor. It gives more

information about the intensity of each specific emotion that is being felt by that actor. For

example, it can update the emotion “sad” for the character “John” with a value 10 (maximum),

which would mean that the character is very sad.

Updates connection – message used to update the emotional connection between two actors,

which was created previously. We can then update the system saying that, for example, a

character hates another character with an intensity of 10 while at the same time likes a third

character with an intensity of 5.

Story State Message – In this message, the storytelling system updates the information about

the story state, namely about the mood, valence and intensity of the environment and the

current story act. The story acts correspond to Freytag's pyramid acts, which allows the D3S system to know the current story state and predict changes in the music accordingly.

Event Message – This message is sent whenever an event occurs, informing the I-Sounds

system about the subject, target and action of that event.

The last two messages were added for D3S, since we needed to retrieve more information from the storytelling environment about the story states and the events performed.
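A driver of this kind is essentially a message dispatcher over the affective-layer state. A minimal sketch – the payload shapes and the dictionary-based state are our own assumptions, not I-Sounds' actual interface:

```python
def handle_message(state: dict, msg: dict) -> None:
    """Decode one storytelling-system message and update the affective state."""
    kind = msg["type"]
    if kind == "register_actor":
        state.setdefault("actors", {})[msg["name"]] = {"role": msg["role"], "mood": 0}
    elif kind == "update_mood":
        state["actors"][msg["name"]]["mood"] = msg["value"]
    elif kind == "remove_actor":
        state["actors"].pop(msg["name"], None)
    elif kind == "event":
        # Events carry subject, (optional) target and action, as in section 5.2.1.
        state.setdefault("events", []).append(
            (msg["subject"], msg.get("target"), msg["action"])
        )
    elif kind == "story_state":
        state["act"] = msg["act"]  # current Freytag act
```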

6.1.2. Affective Layer

In this layer, the I-Sounds system keeps a record of all the entities present in the affective environment, as well as a record of the environment's properties. This registry of the environment helps decide what kind of music to generate at each moment.

The information about each entity consists of its name, role, mood and instrument. The name and role are given by the storytelling system through the Register Actor message, and the mood is updated whenever needed through the Update Mood message. The instrument is assigned to a character according to its role: we decided that the main characters (hero, villain and friend) would have pre-assigned instruments (trumpet, cello and flute, respectively), making it easier to distinguish between them. Every other character is assigned a different instrument, according to its order of registration in the system.

The information about the environment's properties consists of its mood, valence, intensity and the background music being played. There is also a registry of the existing actors and of the actors currently acting in the scene. This distinction is needed because we must know who enters or exits the scene and who is acting at a given time, so that the music can be played accordingly.

This layer needed slight changes to adapt it to the new information received from systems whose story development follows Freytag's Pyramid. Therefore, the affective environment also keeps a registry of the story state, making it possible to know which act of the story we are in at a given moment. For storytelling systems that do not follow Freytag's Pyramid, this parameter is not used to produce the background music.


6.1.3. Composition Layer

This layer gets information from the previous layer, creates the music and/or sounds to be produced based on the information received, and sends the results to the next layer, the output layer.

The original I-Sounds system was able to compose music in real time according to mood changes, using mode and rhythm [44]. That music creation is the job of the Amadeus module, the composition algorithm implemented in the I-Sounds framework. It is able to compose short music segments representative of certain defined emotions. Those compositions are based on two parameters: mode and rhythm. The mode parameter follows the diatonic major and minor modes, and three notes are used to produce the melody of the sequences: the tonic, mediant and dominant scale degrees. The rhythm parameter follows the Just in Time theory by Eduardo Lopes, which is based on pulse salience and kinetics to produce different rhythmic sequences and associate them with emotions [45].
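The note material Amadeus draws from (tonic, mediant and dominant of a diatonic major or minor scale) can be sketched in MIDI note numbers; the helper name and the choice of tonic are ours:

```python
def melody_degrees(tonic_midi: int, mode: str) -> list:
    """MIDI pitches of the tonic, mediant and dominant scale degrees.

    The mediant sits a major third (4 semitones) above the tonic in
    major mode and a minor third (3 semitones) in minor mode; the
    dominant is a perfect fifth (7 semitones) above in both."""
    third = 4 if mode == "major" else 3
    return [tonic_midi, tonic_midi + third, tonic_midi + 7]
```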

To make it possible for the system to use pre-composed music without changing Amadeus, we

created another composer – Ludwig – which is responsible for the selection of pre-composed

music.

To manage which composer is selected at a given time, a music manager – Apollo – was created. This manager bridges the affective layer and the composition layer: whenever a message is received and the information in the first layer has been updated, it selects the most appropriate composer to produce the sounds or music to be output. More information about Apollo is given later in this chapter.

Fig. 23 – Integration of Apollo


6.1.4. Output Layer

This final layer is responsible for reading the results produced by the composition layer and outputting them as music or sounds. Originally, the output layer was built to reproduce only one layer of sound. Since we needed two different layers for D3S – background music and events – changes had to be made to adapt it to reproduce two layers of music at the same time. Since the MIDI protocol is used, we added a new synthesizer responsible for playing the event sounds, while the existing one plays the background music. Other changes were made, related to the variation of musical parameters (the music's tempo, volume and instruments), so that this layer could support these features.
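The dual-output arrangement can be illustrated with a small routing sketch. It models only the routing logic; the real output layer drives two MIDI synthesizers, which are replaced here by a log, and the S1/S2 names follow the interaction example later in this chapter.

```python
class OutputLayer:
    """Routes each request to one of two outputs: S1 plays the background
    music and S2 plays the event sounds, so both can sound simultaneously."""

    def __init__(self):
        self.log = []  # stand-in for the two MIDI synthesizers

    def play(self, layer, item, tempo=120, volume=100):
        """Record which synthesizer would play the item, with its parameters."""
        synth = "S1" if layer == "background" else "S2"
        self.log.append((synth, item, tempo, volume))
        return synth
```

With this routing, a background theme and an event sound can be requested at the same time without one interrupting the other.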

6.1.5. Interface

The I-Sounds system has an interface built for the purpose of keeping track of all the changes happening in the environment. It has four different divisions, where we can keep track of the affective entities in the environment and their respective emotion values, the current environment control policy, the composer being used and the output handler active at a given moment.

A new division was added to the interface for the purpose of testing the D3S changes. Through it, it is possible to track the following variables:

Actors registered – the list of the actors registered in the I-Sounds system;

Events list – the list of all the events received by the system since the start of its execution;

Current State – the current story state. It can take values between 1 and 4, each integer corresponding to an act. It can also track middle states between acts, which is useful if we want to consider changes that happen between acts, such as the climax of the story;

Environment mood – the overall mood of the environment;

Hero Mood – the mood of the first hero that showed up in the story;

Music being played – the music currently playing in the environment, which can be heard by the users;

Music value – the value of the music being played. Each musical theme has an associated value: the most relevant and important themes for the story have the higher values (such as the Hero Theme or Villain Theme), and the less relevant ones have the lower values (such as General Music). After a certain defined time, this music value is reset to 0 by a timeout trigger, to give less valuable themes the opportunity to be played. This music hierarchy is presented later in this chapter;

Actors in scene – the list of actors that have entered the scene and will directly affect the story development.


6.2. Apollo – D3S Music Manager

Since we have many musical features we want to include, we need to find a way of coordinating them and making them fully integrated with the I-Sounds system. To do that, the module Apollo was created.

Apollo is the module responsible for updating the affective environment and performing the music selection in D3S.

Fig. 24 – D3S Interface

It has three main functions:

- updating the affective environment whenever a new message is received;

- deciding which changes in sound / music are needed according to the new state of the environment;

- selecting which composer module – Amadeus or Ludwig – will be activated to generate the sound or music to play.
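The third function, composer selection, can be sketched as a simple lookup. The situation names are illustrative, not the actual identifiers used in I-Sounds:

```python
# Situations whose music is pre-composed and therefore handled by Ludwig;
# anything else falls back to Amadeus' real-time composition.
PRECOMPOSED = {
    "event", "hero_theme", "villain_theme", "friend_theme",
    "duel_theme", "friendship_theme", "climax_music",
}

def select_composer(situation):
    """Return the composer module responsible for the given situation."""
    return "Ludwig" if situation in PRECOMPOSED else "Amadeus"
```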

6.2.1. Amadeus and Ludwig

Amadeus and Ludwig are the two available composition modules in the I-Sounds system. They have different natures – Ludwig is oriented to pre-composed music, while Amadeus is oriented to real-time composed music. Ludwig was created during this work, while Amadeus already existed.

As we have seen, there are two parallel paths when considering music execution – event sounds and background music.

Since events are sounds reproduced whenever an action occurs, they were pre-composed so that the same action always has the same sound associated to it.


The background music can either be pre-composed or composed in real time, depending on which kind of music we intend to have. Pre-composed music has the advantage of generally sounding better and being more interesting to hear; however, it has to be composed beforehand by someone. Music composed in real time has the advantage of being composed while the story develops, so it can adapt better to changes in the environment; however, it might not sound as good as pre-composed music. As such, Ludwig plays all the music and sounds associated to special events, as well as the pre-composed background music, while Amadeus is responsible for generating and playing background music whenever there is no obvious pre-composed music for a given situation, as explained below.

6.2.2. Input

The input for Apollo consists of variables sent by I-Shadows whenever relevant changes occur in the story that justify a possible change in the sound as well.

These variables are:

Scene Valence – the valence of the current scene, in the interval between -10 and 10;

Hero Mood – the mood of the hero, in the interval between -10 and 10;

Event – a special action made by a character;

Scene Change – sent whenever the story act changes; it can take the values corresponding to the beginning and middle point of each act: Exposition, Rising Action, Climax, Falling Action, Denouement;

Character Entry – sent when a character enters the scene;

Character Exit – sent when a character exits the scene.
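The variables above can be modeled as a single message type; the field names and the validation rule below are illustrative, not the actual I-Sounds data structures.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ApolloInput:
    """One update sent by the storytelling system to Apollo."""
    scene_valence: int = 0              # interval -10 .. 10
    hero_mood: int = 0                  # interval -10 .. 10
    event: Optional[str] = None         # e.g. "grab", "offer"
    scene_change: Optional[str] = None  # "Exposition", "Rising Action", ...
    character_entry: Optional[str] = None
    character_exit: Optional[str] = None

    def __post_init__(self):
        # Enforce the documented ranges for valence and mood.
        if not (-10 <= self.scene_valence <= 10 and -10 <= self.hero_mood <= 10):
            raise ValueError("valence and mood must lie in [-10, 10]")
```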

6.2.3. Output

The output created by Apollo is given to the destination composer, which uses it to generate the sound to play. When Amadeus is selected, Apollo provides information about the scene mood, which Amadeus uses to generate its music. When Ludwig is selected, Apollo indicates which music to play, whether an event sound or some type of background music.

6.2.4. Internal Structure

Since the events and the background music follow parallel paths, there are also two corresponding paths in the manager. Whenever an event occurs, the function that handles events is called. When a change occurs in the environment that may lead to a change in the background music, another function is called to analyze that occurrence.

To score the background music, Apollo uses a music selection structure based on a hierarchy. This hierarchy is ordered by the importance and relevance of each piece of music to the story. Sorted in descending order of relevance, it consists of:


• Hero Theme First Time – theme played when the hero enters a scene for the first time

• Villain Theme First Time – theme played when the villain enters a scene for the first time

• Friend Theme First Time – theme played when the friend enters a scene for the first time

• Climax Music – music played when the story reaches its climax

• Hero Theme – theme played when the hero is the only character in the scene. The characteristics of this theme are based on the current mood of the hero.

• Friend Theme – theme played when the friend is the only character in the scene.

• Duel Theme – theme played when at least the hero and the villain are in the scene.

• Friendship Theme – theme played when the hero and the friend are alone in the scene.

• Villain Theme – theme played when the villain is in the scene, either alone or with more characters. The hero can't be present.

• General Music – music played when none of the above situations happen, that is, when we have only secondary characters in the scene. This is the only situation in which Amadeus is called, since the music it generates is more basic than the pre-composed music; all the other states above are handled by the composer Ludwig.

The hierarchy is defined according to the importance and relevance of each theme to the understanding of the story. More concretely, we want to guarantee that the viewers understand the characters' roles, their actions and emotional state, how they interact with and feel about other characters, and the key moments of the story.
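The selection-by-hierarchy logic, combined with the timeout that resets the value of the theme being played, can be sketched as follows. The numeric values are illustrative; only their ordering reflects the hierarchy above.

```python
# Theme values ordered by relevance; the numbers are illustrative.
THEME_VALUES = {
    "Hero Theme First Time": 10, "Villain Theme First Time": 9,
    "Friend Theme First Time": 8, "Climax Music": 7,
    "Hero Theme": 6, "Friend Theme": 5, "Duel Theme": 4,
    "Friendship Theme": 3, "Villain Theme": 2, "General Music": 1,
}

def next_theme(values, eligible):
    """Among the themes eligible for the current scene, pick the one
    with the highest current value."""
    return max(eligible, key=lambda theme: values[theme])

def on_minimum_time_expired(values, playing):
    """Reset the playing theme's value so lower-valued themes get a turn."""
    values[playing] = 0
```

This reproduces the behavior of the interaction example: after the Friend Theme's minimum playback time expires, its value drops to zero and the next-highest eligible theme (the Friendship Theme) is selected.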


Fig. 25 - Input and Output of Apollo – The music manager


Fig. 26 - Internal Structure of Apollo for the Background Music


6.3. Integration with the storytelling systems

To enable the I-Sounds system to connect to a storytelling system, we designed a driver that makes that connection. This driver can be modified, or a new one created, to adapt I-Sounds to other systems. During the development of this thesis, we used two different storytelling environments to test the D3S features: the I-Shadows system and the FearNot! system. As we have seen before, the I-Shadows system consists of a theatre where computer-generated shadows interact with user shadows in order to create children-oriented stories [4]. The FearNot! system is another storytelling environment, whose main theme is bullying: it creates scenarios where a child character is being bullied by others, and the user's objective is to help that character through the story [6][7].

6.3.1. Integration with I-Shadows

The integration between I-Sounds and I-Shadows required the creation of the following components:

On the I-Sounds side:

A driver – called the I-Shadows driver – responsible for detecting incoming messages from the I-Shadows system. Those messages are transmitted over the UDP protocol and organized in an XML structure, for better portability and easier integration. This driver already existed in the original I-Sounds system.

On the I-Shadows side, three components were created:

1. Director – the entity that knows what is happening in the storytelling environment and is responsible for sending messages to the sound system whenever something relevant happens in the story.

2. Sound Interface – this component contains all the information about the possible messages that can be sent to I-Sounds. It is accessed directly by the I-Shadows director, which orders the creation of a message whenever needed.

3. Messages structure – this component is responsible for serializing the messages and converting them into the XML structure, so that the I-Sounds system can later read and decipher them.

Fig. 27 - Communication between I-Shadows and I-Sounds
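The XML-over-UDP message format can be illustrated with a small serializer. The tag names below are assumptions for illustration, not the actual I-Shadows message schema.

```python
import xml.etree.ElementTree as ET

def build_event_message(subject, action, target=None):
    """Serialize an event (subject, action, optional target) as XML."""
    msg = ET.Element("Message", type="Event")
    ET.SubElement(msg, "Subject").text = subject
    ET.SubElement(msg, "Action").text = action
    if target is not None:
        ET.SubElement(msg, "Target").text = target
    return ET.tostring(msg, encoding="unicode")
```

The resulting string could then be sent to the I-Sounds driver over a UDP socket (e.g. with `socket.sendto`), which parses it back into its fields.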


Example of interaction between I-Shadows and I-Sounds

Next, we present an example of the interaction between these two systems. The diagram in Fig. 28 traces a sequence of ten messages sent by I-Shadows (character registrations, scene entries and exits, events such as grabbing and offering a flower, expressing happiness, the story climax and the villain's death) through the affective, composition and output layers of I-Sounds, showing the theme or sound selected at each step and whether it is played by synthesizer S1 (background music) or S2 (event sounds).

Fig. 28 – Interaction example between I-Shadows and I-Sounds

Whenever I-Shadows sends a message to I-Sounds, that message is read and decoded by the driver, which then updates the affective layer according to the information it contains. Apollo then checks whether the background music needs to change or an event sound needs to be played. If so, it forwards the music selection request to Ludwig, which locates the music to play and passes it to the output layer; the output layer applies the necessary changes and outputs it in MIDI format. Notice that S1 and S2 correspond to the synthesizers used to output that music or sound. As explained before, two synthesizers are needed because we want two layers of music playing at the same time: the background music and the event sounds. To simplify, we will assume that the environment objects have already been registered in I-Sounds.

Below, we give more details about the 10 messages in the above example.

1) I-Shadows sends the message "Register Actor" for the Hero, Villain and Friend characters. The driver decodes the messages and registers those entities in I-Sounds, and the respective instruments are assigned to the characters. Notice that at this moment no sound is output, since the scene is still empty of characters.

2) I-Shadows sends the message "Register Actor" again for the Hero character. Since this character is already registered in I-Sounds, the system puts it in the scene. Apollo notices that a significant change occurred in the environment and selects the Hero Theme to be played, since the hero is alone in the scene. The output layer then starts outputting the music.

3) Similarly, this message puts the Friend character in the scene. Since another significant change occurred in the environment, Apollo decides to change the music to the Friend Theme. Every background music theme has a minimum playback time; we want this time to exist so that the change between themes is not abrupt (in cases where the environment changes quickly). When that time expires, the value of the music currently being played is reset to zero. That way, the highly valued themes (such as the character themes) do not play forever, and other themes are given the chance to be played as well. In this case, the Friend Theme is played for that minimum time; when it expires, its value is reset to zero. Apollo then checks whether there is another theme that can be played at that moment whose value is above zero. It finds the Friendship Theme as the theme with the next highest value and sends it to the output layer for reproduction.

4) The storytelling system sends an Event Message with the Hero as the subject, a flower as the target and grab as the action. Apollo detects this new message and forwards the sound associated with the grab action to be played by the output layer. Since the Hero is the subject, that event is played with the instrument associated to that character.

5) Another event is sent to I-Sounds; this time, it is the offer event. The subject is still the hero, but the target changes to the friend. Notice that the volume of this sound depends on the relation between the hero and the friend at the time.

6) A third event is then sent, corresponding to the hero expressing happiness. The subject is the hero himself, and there is no target, since it is an action he performs alone.

7) The villain then enters the scene. The Villain Theme is played for the minimum playback time. After that, Apollo changes the background music to the Duel Theme, since at least the hero and the villain are in the scene at that moment.

8) A Story State message arrives at I-Sounds with the information that the story has reached its climax, so Apollo chooses the Climax Theme as the next theme to be played.

9) The Villain dies and the message with the respective event is sent. The corresponding sound is played with the instrument associated to the villain.


10) After the death of the villain, I-Shadows removes it from the scene, leaving the hero and the friend alone, so Apollo changes the music back to the Friendship Theme.

Notice that throughout the story, the environment's mood and arousal change, along with the affective relations between the characters. As the environment's arousal changes, the background music's tempo also changes, being faster when the arousal is higher and slower when it is lower. The messages related to updates of the environment and of the characters' relations are not represented in the scheme above.
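The arousal-to-tempo relation described above can be sketched as a linear mapping; the BPM constants are assumptions for illustration, not the values used by D3S.

```python
def tempo_for_arousal(arousal, base_bpm=110, bpm_span=40):
    """Map arousal in [-10, 10] linearly onto a BPM range around base_bpm:
    higher arousal gives a faster tempo, lower arousal a slower one."""
    arousal = max(-10, min(10, arousal))  # clamp to the documented range
    return base_bpm + arousal * bpm_span // 10
```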

6.3.2. Integration with FearNot!

A driver for I-Shadows already existed in the original I-Sounds system, but a new one, adapted to the needs of the FearNot! system, had to be created. Besides, on the FearNot! side, we needed not only to implement the SoundInterface and Messages components, but also to adapt the director existing in I-Shadows, so that it would make the calls to create a message whenever needed. Since both the I-Shadows and FearNot! systems use the OCC-based architecture FAtiMA [5], this adaptation was possible without major problems.

Interaction between FearNot! and I-Sounds

The story structure in FearNot! is different from that of I-Shadows: it follows a sequence that alternates episodes with coping parts. The episodes have no user interaction, and the characters of the story interact only with each other. The coping parts consist of a small chat between the bullied child and the user, in which the latter can give advice to the child.

In the next image we can see the flow chart of the story sequence, where 1, 2 and 3 are the scenes with no interaction, intercalated with coping parts. The content of the final episode depends on the choices made by the user concerning the coping strategies decided along the story.

Fig. 29 - FearNot! Flow Chart [35]

Legend: • Introduction (I): Type in of code, name, age and gender, introduction of characters and school • Bullying episode (1-3) • In between episodes: interaction with victim character in resource room (cope) • Educational message (F): after end of episode 3.


Even though the structure is different, it does not change the way D3S interacts with the storytelling environment. During the episodes, the process is very similar to that of an I-Shadows story. Between episodes, all the actors are removed from the scene, which remains silent while the user chats with the bullied child. Therefore, FearNot! behaves like a set of small stories (the episodes) intercalated with the silent chat parts.

6.4. Summary

In this chapter, we described the architecture of I-Sounds and the changes made to it so we could integrate the new features of D3S. We described the D3S music manager, Apollo, which is the module responsible for updating the affective environment whenever a change occurs, choosing the sound / music to be played and selecting which composer module will generate the output. We defined its structure and described a new composer module, Ludwig, and how it is used alongside the existing module Amadeus: the former uses pre-composed music, while the latter uses music composed in real time. We presented an interaction example between the I-Shadows and I-Sounds systems and the changes that happened during that interaction in every layer of the I-Sounds system. Finally, we explained how the integration was made with the two storytelling systems that served as a testing base for this thesis work – the I-Shadows and FearNot! systems.


7. Evaluation

With this dissertation work, we intended to find a solution to the problem of scoring a story that is being created in real time, using sounds and music in a way that adds understanding of what is happening in that story and consequently increases the viewer's enjoyment.

In the previous chapters we discussed some features that together constitute a solution to that problem. We can summarize those features in the following list:

1) Themes associated to characters

2) Association between characters and instruments

3) Use of sounds to underscore events

4) Change of character theme according to mood

5) Volume of sound associated to events

6) Music for key moments: Friendship Theme, Duel Theme, Climax Theme

7) Background music tempo

In this chapter we present the evaluation model followed to validate these features. For that validation, the features were evaluated individually in independent tests. Additionally, we also tested the system as a whole, so that we could verify that the main goal of the thesis was met. The storytelling environment used for the evaluation was the I-Shadows system.

To obtain data for the different tests performed, all the experiments were gathered in online forms which the participants answered. We created 9 different versions of the form, and the participants watched videos and heard sounds online. The sample was collected amongst university students aged between 18 and 30. We understand that the ideal would be to have answers from I-Shadows' target age group of users, around the age of 10; unfortunately, it was not possible to gather a significant number of users of that age.

Note: even though several different variables are tested, we assume that they are independent of each other, which simplifies the evaluation of the experiments.

7.1. General system evaluation - D3S and story enjoyment

The objective of this experiment is to validate whether D3S brings a better global understanding of the story and consequently improves the viewer's enjoyment while watching it, which was the main goal of this thesis. We evaluate whether the use of D3S makes the stories more interesting to viewers by comparing it with the absence of any sound system and with the Amadeus composing system, present in the original I-Sounds project [44], which uses only real-time composed music.

7.1.1. Method

Design

The independent variable of this experiment is the type of scoring system accompanying the storytelling environment, while the dependent variable is the enjoyment of the story. This experiment followed a repeated-measures design, and the data collected consist of 5-point Likert scale scores (0-5).

Participants

A total of 62 participants took part in this experiment. They were asked to answer under all the conditions of the test, so there were 62 answers for each of the Silent, Amadeus and D3S versions.

Procedure

Three videos were shown to each of the participants:

- a first video showing a complete story, with no background music played during its exhibition;

- a second video where the story is scored using Amadeus;

- a third video of that same story where all the D3S features are present.

At the end, the participants were asked to rate their interest in each of the videos on a 5-point Likert scale (0-5), where 0 corresponded to "Not Interesting" and 5 to "Very Interesting".

To guarantee that the order of presentation did not influence the answers, the order of the videos was changed among the 9 versions.

7.1.2. Results

In this first experiment, D3S got the highest mean value of interest (3.21 out of 5), while the silent version got the lowest (1.73 out of 5). The Amadeus version got the middle result (2.27 out of 5).

The comparison between the three different conditions can be seen in the box plot graph in the

next figure:


Fig. 30 - Different scoring systems and associated interest.

A non-parametric Friedman test was performed on the data. The results tell us that the level of interest obtained was significantly affected by the type of scoring system used, χ²(2) = 75.8, p < 0.001.
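For reference, the Friedman statistic used above can be computed as follows. This is a self-contained sketch (equivalent, without the tie-correction factor, to `scipy.stats.friedmanchisquare`), not the statistical tool actually used in the evaluation.

```python
def friedman_chi2(data):
    """Friedman chi-square for a repeated-measures design.
    `data` holds one row per participant with that participant's scores
    under the k conditions. Ties receive average ranks; no tie-correction
    factor is applied to the statistic itself."""
    n, k = len(data), len(data[0])
    rank_sums = [0.0] * k
    for row in data:
        # Rank the k scores within this participant's row (1 = lowest).
        order = sorted(range(k), key=lambda j: row[j])
        ranks = [0.0] * k
        i = 0
        while i < k:
            j = i
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1  # extend the run of tied scores
            avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie run
            for m in range(i, j + 1):
                ranks[order[m]] = avg_rank
            i = j + 1
        for j in range(k):
            rank_sums[j] += ranks[j]
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
```

The statistic is then compared against a chi-square distribution with k-1 degrees of freedom to obtain the p-value.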

7.1.3. Discussion

The results of this experiment tell us that the use of a musical system to score a storytelling environment increases the interest of watching a story being created there; the results are statistically significant, with p < 0.001. Participants preferred D3S over Amadeus, which suggests that the new features included in this project make the stories more interesting to viewers. This increase in interest might be related to the use of pre-composed music in D3S, as opposed to the real-time composed music used by Amadeus: the final result is more pleasant to the listener's ears and might have significantly influenced the interest ratings.

7.2. Background music’s tempo and environment’s intensity

The objective of this experiment was to test whether changes in the background music's tempo translate into a better perception of the story's intensity. We aimed at studying whether there is a relation between the music's tempo and the perceived intensity of the environment: whether a fast tempo induces the perception of greater intensity and a slow tempo induces the opposite.


7.2.1. Method

Design

The independent variable of this experiment is the background music's tempo, while the dependent variable is the perception of the environment's intensity. This experiment followed an independent-measures design, and the data collected consist of 5-point Likert scale scores (0-5).

Participants

A total of 62 participants took part in this experiment. Each participant got one version with a given background music tempo: 21 of them watched a video with a slow tempo, another 21 watched a video with a medium tempo, and the remaining 20 watched a video with a fast tempo.

Procedure

The participants were divided into 3 groups:

- to the first group we showed a video of a story segment with a slow music tempo;

- to the second group we showed a video of the same story segment with a medium music tempo;

- to the third group we showed the same story segment with a fast music tempo.

At the end, we asked the participants to rate their perception of the intensity of the story on a 5-point Likert scale (0-5), where 0 corresponded to "Not Intense" and 5 to "Very Intense".

7.2.2. Results

In this experiment, the highest mean value of intensity belonged to the version of the video with

the medium tempo (3.05 out of 5), while the slow tempo version got the lowest (2.57 out of 5).

The fast tempo version got the middle result (2.80 out of 5).

The comparison between the three different conditions can be seen in the box plot graph in the

next figure:


Fig. 31 - Different music’s tempo speeds and associated perception of the story intensity

A Kruskal-Wallis test was applied to the data. Analyzing its results, we conclude that the change in the music's tempo did not significantly affect the perception of the environment's intensity, since p = 0.465 is greater than the significance level of 0.05.
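As with the Friedman test above, the Kruskal-Wallis H statistic can be computed with a self-contained sketch (mirroring `scipy.stats.kruskal`, without the tie-correction factor); this is for illustration, not the tool used in the evaluation.

```python
def kruskal_h(groups):
    """Kruskal-Wallis H for an independent-groups design.
    Scores from all groups are pooled and ranked (ties get average
    ranks); no tie-correction factor is applied."""
    pooled = sorted((value, gi) for gi, g in enumerate(groups) for value in g)
    n_total = len(pooled)
    ranks = [0.0] * n_total
    i = 0
    while i < n_total:
        j = i
        while j + 1 < n_total and pooled[j + 1][0] == pooled[i][0]:
            j += 1  # extend the run of tied values
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie run
        for m in range(i, j + 1):
            ranks[m] = avg_rank
        i = j + 1
    rank_sums = [0.0] * len(groups)
    for (_, gi), r in zip(pooled, ranks):
        rank_sums[gi] += r
    return 12.0 / (n_total * (n_total + 1)) * sum(
        rs * rs / len(g) for rs, g in zip(rank_sums, groups)
    ) - 3.0 * (n_total + 1)
```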

7.2.3. Discussion

The results of this test do not allow us to conclude that there is an association between the music's tempo and the environment's intensity, since the significance value was above 0.05. However, even though the test results were inconclusive, we strongly believe that the association exists. We can summarize some of the reasons that might have led to the inconclusive results of this experiment:

- The design of this experiment is independent, so each participant saw and rated only one video. The music used was one of the pieces assigned to the climax part. Climax music already has an intense character, and even at a medium tempo the perceived intensity might be high. Because of that, participants who watched the medium-tempo video might have considered the intensity high enough to score it with a four or five on the 5-point scale, which would inflate the results for the medium version.


- The users might have different understandings of the word "intensity". It might have an ambiguous meaning: some might relate it to being agitated or calm, while others might associate it with the number of characters or actions happening in the scene, or even other factors.

- The fact that this experiment was made right after the first one might have had an unexpected impact on the answers. The first experiment asked the participants three questions, in which they had to rate the videos on a 5-point interest scale. In this experiment the question changed; however, the scale is the same, and the participants might not have noticed that the question had changed, answering it as they would an interest scale.

- The last possible reason is that it might be easier to perceive a change in the music's tempo than to quantify its intensity. Therefore, what one participant considers an intensity of 3 might be understood as a 1 by another. However, if we asked the participants whether the music's tempo changes in a video, and asked about the intensity at the beginning and at the end of the video, the results might be more in line with the theory behind this experiment.

7.3. Association between characters and

instruments

The objective of this experiment was to test if the association between a character and an instrument helps the viewer understand which character performed a certain action. We wanted to analyze whether the participants instinctively associate an instrument with a character and can recognize the characters by hearing that instrument. That way, the listener could also better distinguish the characters’ actions by associating them with the respective instrument.

7.3.1. Method

Design

The independent variable of this experiment is the association between characters and instruments, while the dependent variable is the understanding of who performed an action. This experiment followed an independent design and the data collected consists of a score.

Participants

A total of 62 participants answered this experiment. They all watched the same video.

Procedure

We showed the participants a video containing several events performed by characters with different instruments associated with them – a trumpet, a flute and a cello for the hero, his friend and the dragon, respectively.


After the video, the participants heard 3 events, each played by one of the instruments. The first was an “express happiness” event, played by the trumpet. The second was also an “express happiness” event, but played by the flute. The last was a “show villainy intention” event, played by the cello. These events consisted of movements made by the characters representing the respective intention of either expressing happiness or showing a villainy intention. We asked the participants to say who produced the sound they had just heard. This test was the same in all 9 versions.

7.3.2. Results

To analyze the results, a score was assigned to each participant according to their answers to this experiment, as follows:

0 – No characters recognized correctly

1 – One character recognized correctly

2 – Two characters recognized correctly

3 – All characters recognized correctly

Analyzing the results according to the score, we can see that the mean was 2.42 out of 3, which is a positive value. Considering scores of 0 and 1 as negative and scores of 2 and 3 as positive, 81% of the participants had a positive score, while 19% had a negative one.
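As an illustration only, the 0–3 score above amounts to counting correct attributions per participant. The character labels and answer lists below are hypothetical, not the actual experimental data:

```python
# Hypothetical sketch: the true owner of each of the 3 sounds, in order.
TRUTH = ["hero", "friend", "villain"]

def recognition_score(answers):
    """Count how many of the three sounds were attributed correctly (0-3)."""
    return sum(a == t for a, t in zip(answers, TRUTH))

# Two illustrative participants: one fully correct, one who confused
# the trumpet (hero) with the flute (friend).
print(recognition_score(["hero", "friend", "villain"]))   # 3
print(recognition_score(["friend", "friend", "villain"])) # 2
```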

Additionally, we can analyze the individual scores for each character and the percentage of participants that successfully recognized them by hearing the sounds. The hero (boy) and his friend (the girl) had success rates of 73% and 74%, respectively. The villain (dragon) got the highest score, with 92% of the participants able to recognize it. The Chi-Square test showed a significance of p < 0.001 for all the characters and for the score. In the next figures we can see the recognition percentages for each character and the percentage of participants that fall in each score category:

Fig. 32 – Percentages of recognition


Fig. 33 – Frequency percent of score
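The chi-square statistic behind the significance values reported above can be sketched in one line; this is only an illustration with made-up counts, and the exact test setup used in the thesis (the expected frequencies, in particular) may differ:

```python
def chi_square_stat(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over categories.
    The p-value would come from a chi-square distribution with
    len(observed) - 1 degrees of freedom."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Illustrative: 10 vs 20 answers in two categories, against a uniform
# expectation of 15 each.
print(chi_square_stat([10, 20], [15, 15]))
```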

7.3.3. Discussion

As we can see from the results of this experiment, about 81% of the participants could recognize at least 2 of the 3 characters. The biggest doubts in the recognition process appeared while listening to the sounds of the hero (boy) and the friend (girl). When this experiment was designed, the boy and the girl had the same event sound associated: both played the “express happiness” event, but with different instruments. The higher recognition rate for the dragon, and the lower, similar rates for the boy and the girl, might be related to this. Participants might more easily recognize a musical motif than an instrument; therefore, most of them might have found it hard to distinguish the same motif being played by two different instruments. That did not happen with the dragon, since its associated motif was different. Another explanation might be related to the timbre of the instruments: the trumpet and the flute might be closer to each other than to the cello, since the first two use air to produce sound while the latter uses the vibration of strings.

7.4. Themes associated to characters

The objective of this experiment was to test if our system helps to identify the characters’ roles in the story – especially the main characters – by associating a theme with each character. This test covered the hero, friend and villain themes, corresponding to the main characters considered in this thesis. We will then see if the chosen themes are appropriate for the type of character they represent.


7.4.1. Method

Design

The independent variable of this experiment is the background theme associated with each main character, while the dependent variable is the understanding of the characters’ roles. This experiment followed an independent design and the data collected consists of a score.

Participants

A total of 62 participants answered this experiment: 35 of them watched the versions with sound, while the remaining 27 watched the versions without sound.

Procedure

The experiment participants were divided in 2 groups:

- We showed the first group a video where 4 characters appeared in a defined order. The characters all had the boy’s shape, but different colors: blue, yellow, red and green. The same character shape was used to minimize the influence of shape on the participants’ decisions. White and black were avoided because they might easily be associated with good and evil, respectively. No sound was played during the video.

- We showed the second group another video where the same 4 characters appeared. However, the hero, friend and villain themes were played. The order of the played themes changed among the different versions.

Fig. 34 - Characters used on this experiment

7.4.2. Results

To analyze the results, a score was assigned to each participant according to their answers to this experiment, as follows:

0 – No themes recognized correctly

1 – One theme recognized correctly


2 – Two themes recognized correctly

3 – All themes recognized correctly

Analyzing the results according to the score, we can see that the mean was 1.71 out of 3, which is a positive value. However, the Chi-Square test for the score gave a non-significant value of p = 0.543. The Chi-Square test also gave non-significant values for the recognition of the hero and friend themes (p = 0.866 for both). However, it was significant for the recognition of the villain theme (p = 0.028). In the next figures we can see the recognition percentages for each theme and the percentage of participants that fall in each score category:

Fig. 35 – Percentages of recognition

Fig. 36 – Frequency percent of score


Additionally, we can check if the order in which the characters were presented had any influence on the answers. The Kruskal-Wallis test was applied to verify this; the non-significant result of p = 0.108 indicates that the order was not relevant.
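For reference, the Kruskal-Wallis statistic used here and in the later experiments can be sketched in a few lines. This is a minimal, illustrative implementation (midranks for ties, tie correction omitted), not the actual analysis code used for the thesis, and the input data below is made up:

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic for k independent samples.
    The p-value would come from a chi-square distribution with
    k - 1 degrees of freedom (tie correction omitted for brevity)."""
    pooled = sorted(v for g in groups for v in g)

    def midrank(v):
        # Average 1-based rank of all occurrences of v in the pooled sample.
        first = pooled.index(v) + 1
        return first + (pooled.count(v) - 1) / 2

    n = len(pooled)
    rank_term = sum(sum(midrank(v) for v in g) ** 2 / len(g) for g in groups)
    return 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)

# Illustrative: two clearly separated groups of Likert-style answers.
print(round(kruskal_wallis_h([[1, 2], [3, 4]]), 3))  # 2.4
```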

In the experiment versions without sound, we can also try to establish a connection between the type (hero, friend and villain) and the colors (blue, yellow, red and green) of the characters. The Chi-Square test gave significant values for all the characters (p = 0.009, p < 0.001 and p = 0.013 for the hero, friend and villain, respectively). Below, we can see how the participants picked the colors for each character:

Fig. 37 – Hero, Friend and Villain Colors

Legend: 1 – Blue, 2 – Yellow, 3 – Red, 4 – Green

7.4.3. Discussion

In the versions with sound, the villain theme was the only one successfully recognized by a significant amount of participants (69%), since it was the only one with a significance value of p = 0.028, within the acceptance threshold of p < 0.05. The hero and friend themes were hard to distinguish in this experiment. The reason might be the musical characteristics of the two themes, which are close to each other: for example, both are in the major mode and are happy themes. The villain theme, by contrast, is in the minor mode and has a sinister nature. It might also be hard for the participants to distinguish characters through the theme alone. The visual aspect is also very important, and since all the characters had the same shape and differed only in color, it was hard to distinguish between them.

In the versions without sound, we can see that the predominant choice of colors was blue, yellow and red for the hero, friend and villain, respectively. However, these were also the first three colors presented to the participants. Since there was no sound and they had nothing else to base their decision on, they might have picked the colors in the same order as the questions asked of them. The high number of red answers for the hero can also be explained by the use of that same color in the previous videos, which might have influenced these answers.

7.5. Use of sounds to underscore events

The objective of this experiment was to test if the use of sounds to underscore events helps the viewer understand the events that happened in the story. By adding or removing a sound associated with an event, we wanted to study if that sound helps to understand the action made by the character responsible for it. In this experiment we will see the difference that the presence of a sound associated with an event can make, compared with the absence of sound.

7.5.1. Method

Design

The independent variable of this experiment is the use of sounds to underscore events, while the dependent variable is the understanding of what action was performed. This experiment followed an independent design and the data collected consists of a frequency.

Participants

A total of 62 participants answered this experiment: 35 watched the versions without sound while the remaining 27 watched the versions with sound.

Procedure

The experiment participants were divided in 2 groups:

- We showed the first group a video of a scene where 4 actions are performed by the characters. Those actions have no sound associated with them.


- We showed the second group another video where the same actions happen, but this time with the respective sounds associated with them.

The last action consists of the death of the dragon. While in the first video no sound is played along, in the second video a motif of the “Funeral March” from Frédéric Chopin’s Piano Sonata No. 2 in B-flat minor, Op. 35 is played, which is known in popular culture as the music usually associated with death.

After each participant had seen their video, we asked what they thought had happened to the dragon in the end. They could choose between the following options: fell, ran away, died, fell asleep or none of the above.

7.5.2. Results

About 86% of the participants who watched the video without sound recognized the event in it. In the video with sound, 96% recognized the event. However, the Mann-Whitney test gave a result of p = 0.166, which tells us that this difference in recognition is not significant for this test.

Fig. 38 – Experiment 5 results
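As a hedged illustration of the Mann-Whitney test used here (coding each answer as 1 if the event was recognized and 0 otherwise is an assumption on our part, and the group sizes below are made up), the U statistic simply counts, over all cross-group pairs, how often one group’s value exceeds the other’s, with ties counted as half:

```python
def mann_whitney_u(x, y):
    """Mann-Whitney U statistic for sample x versus sample y.
    Counts the pairs (xi, yi) where xi wins; a tie adds 0.5.
    The p-value would come from the U distribution (or a normal
    approximation for larger samples)."""
    return sum((xi > yi) + 0.5 * (xi == yi) for xi in x for yi in y)

# Illustrative 0/1 "recognized the event" answers for two small groups.
with_sound = [1, 1, 1, 0]
without_sound = [1, 0, 0, 0]
print(mann_whitney_u(with_sound, without_sound))  # 12.0
```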

7.5.3. Discussion

Even though recognition of the event increased in the version with sound, the inference test result was not significant (p > 0.05), so this test is not conclusive. This might be explained by the fact that the movement the dragon makes to represent the “die” event – falling down the screen and rotating 45º at the end – might on its own be a good enough representation of a dying action. Besides that, the fact that the character that died was the dragon – a natural villain – might have influenced the answers, as it would be natural for it to die. We strongly believe that the use of sound helps to understand the events performed; however, to show a significant effect it has to be applied in a more ambiguous situation. In this case, we could suggest a scenario where the character that dies is not the dragon but another (like the boy or the girl), and where the movement performed is more ambiguous as well.


7.6. Volume and relations between characters

The objective of this experiment was to test if changes of volume translate into a better perception of the strength of the relation between characters. We intended to test the assumption that a high volume corresponds to a strong relation between two characters, while a lower volume corresponds to a weak one. In this experiment we changed the volume parameter across several videos and checked whether the participants changed their answers accordingly.

7.6.1. Method

Design

The independent variable of this experiment is the volume of the sound associated with events, while the dependent variable is the understanding of the strength of the relation between characters. This experiment followed an independent design and the data collected consists of a frequency.

Participants

A total of 62 participants answered this experiment: 20 of them watched the video without any sound, 15 watched the video with the same volume, 13 the Low-High volume version and 14 the High-Low volume version.

Procedure

The experiment participants were divided in 4 different groups:

- We showed the first group a video where the boy sends two hearts: one to the girl and the other to the fairy, in this order. The heart is the icon that represents the event “show love intention”. In the end, the girl and the fairy leave the scene and the boy is left alone. There is no sound associated with these events.

- We showed the same video to the second group; however, the sending of the hearts is underscored by the respective sound, with the same volume for both hearts.

- The same video was shown to the third group. This time, the sound underscoring the first heart (girl) had a higher volume than the second (fairy).

- We also used the same video for the fourth and last group. The sound underscoring the first heart (girl) had a lower volume than the second (fairy).

In the end, we asked the participants to say who they thought the boy liked most: the girl, the fairy, or both the same.

7.6.2. Results

On the silent version, the predominant answer was “both”, with 60% of the answers. The girl got 30% and the fairy only 10%.


On the version where the events were scored with the same volume on both hearts, 73% of the participants answered “both”, while 27% answered that the boy liked the girl more. No participant answered that the boy liked the fairy more.

On the version where the volume was lower for the girl and higher for the fairy, the girl got 15%, the fairy 62% and “both” 23% of the answers.

On the version where the volume was higher for the girl and lower for the fairy, the girl got 86%, the fairy 7% and “both” 7% of the answers.

A Kruskal-Wallis test was applied to the results; the value of p = 0.001 tells us that the change of answer according to the video shown was significant, which validates our results. Below is a box plot of the answers to this test:

Fig. 39 – Experiment 6 box plot

Legend:

Conditions:
Silent – video without any sound
Same Volume – same volume on both hearts
LH – lower volume for the girl heart and higher volume for the fairy heart
HL – higher volume for the girl heart and lower volume for the fairy heart

Answer:
Girl – The boy liked the girl more
Fairy – The boy liked the fairy more
Both – The boy liked both the same


7.6.3. Discussion

This test supported our assumption of a relation between sound volume and the strength of the relation between characters. The results were validated since we got a significance value of p = 0.001. We can see that in the silent and same-volume versions participants tended to answer that the boy liked both the same, because they could not perceive any difference between the two actions. However, there was still a higher percentage of answers for the girl than for the fairy, since it would be more natural for the boy to like that character. When the volume changed, the doubts dissipated and the majority of the answers fell on the side of the higher volume (62% and 86%). However, in the version where the volume was higher for the fairy, some participants still answered that the boy liked the girl more (15%), probably because it would be a more natural answer.

7.7. Character theme and mood

The objective of this experiment was to test if changing the character’s theme according to mood helps to understand how that character is feeling. To test this, we presented the participants with three different situations, where the music played was different, and then analyzed whether these changes affected the answers concerning the mood of the character.

7.7.1. Method

Design

The independent variable of this experiment is the change of the character’s theme according to mood, while the dependent variable is the understanding of the mood of a character. This experiment followed an independent design and the data collected consists of a frequency.

Participants

A total of 62 participants answered this experiment: 20 watched the silent version, 21 watched the version with a happy theme and 21 the version with a sad theme.

Procedure

The video used in this experiment was the same as in the previous one; however, this test focuses on its last part, when the boy is alone in the scene after sending the hearts to the two other characters.

The test participants were divided in 3 groups:

- The first group saw a silent video.

- The second group saw a video where, at the end – when the boy was alone – a theme corresponding to a happy mood was played.


- The last group saw a video where the ending theme corresponded to a sad mood.

In the end, we asked the participants to say whether they thought that, in the end, the boy was very happy, happy, sad, very sad or felt some other feeling.

7.7.2. Results

In order to better analyze the results, the answers were grouped in 3 segments according to their valence: neutral (other feeling), negative (sad and very sad) or positive (happy and very happy).
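This grouping can be sketched as a simple mapping from answer options to valence segments. A minimal sketch; the answer strings are our paraphrase of the options, not the exact wording shown to participants:

```python
# Hypothetical answer-to-valence mapping, for illustration only.
VALENCE = {
    "very sad": "negative",
    "sad": "negative",
    "happy": "positive",
    "very happy": "positive",
    "other feeling": "neutral",
}

def valence_counts(answers):
    """Count answers per valence segment."""
    counts = {"negative": 0, "neutral": 0, "positive": 0}
    for answer in answers:
        counts[VALENCE[answer]] += 1
    return counts

print(valence_counts(["sad", "very sad", "happy", "other feeling"]))
# {'negative': 2, 'neutral': 1, 'positive': 1}
```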

In the silent version, we can see that the negative valence got 65% of the answers, while the positive valence got 20%; the remaining 15% corresponded to the neutral valence.

In the version where a happy theme was played, 81% of the answers were of positive valence, while 19% were of negative valence. There were no neutral answers for this version of the video.

In the version where a sad theme was played, 95% of the answers were of negative valence and 5% of positive valence. Again, there were no neutral answers for this version of the video.

A Kruskal-Wallis test was applied to the results and we obtained a value of p < 0.001, which indicates that the results are significant.

Below is a box plot that better illustrates the results:

Fig. 40 – Experiment 7 results


7.7.3. Discussion

This test supported the initial hypothesis that the nature of the theme played can influence the perception of a character’s mood in a story. The results were validated since we got a significance value of p < 0.001. We can see that the sad theme was less ambiguous than the happy theme, since it got 95% of the answers on the negative valence, as expected, while the happy theme got fewer (81%) on the positive valence. When no music was played, the fact that 65% of the participants answered on the negative valence might be related to two factors. First, in the initial part of the video the boy sends hearts to the two other characters, who then leave the scene, leaving the boy alone; participants might have interpreted these events as making the boy sad, since the other characters did not respond to his hearts. The second factor might be the association between loneliness and sadness: since humans are social beings, they do not like to be alone, and therefore loneliness might be associated with sadness.

7.8. Music associated to story’s key moments –

Friendship and Duel Themes

The objective of this experiment is to test if the use of music associated with key moments, such as the friendship and duel parts, helps to understand what is happening in the story.

7.8.1. Method

Design

The independent variable of this experiment is the use of the friendship / duel themes, while the dependent variable is the understanding of what is happening in the story. This experiment followed an independent design and the data collected consists of a frequency.

Participants

A total of 62 participants answered this experiment. 6 different videos were used, and the distribution of the participants was the following:

Fig. 41 – Participants distribution

Characters       Music             Participants
Boy and Dragon   Silent            8
Boy and Dragon   Friendship Theme  13
Boy and Dragon   Duel Theme        14
Boy and Fairy    Silent            7
Boy and Fairy    Friendship Theme  7
Boy and Fairy    Duel Theme        13


Procedure

In this experiment, we showed the participants a video where two characters moved and interacted with each other in an ambiguous way. The characters that interacted and the music played in the background varied across 6 different videos, which were shown to different groups of participants:

1) Silent video with the boy and the dragon

2) Video with the boy and the dragon with the duel theme

3) Video with the boy and the dragon with the friendship theme

4) Silent video with the boy and the fairy

5) Video with the boy and the fairy with the duel theme

6) Video with the boy and the fairy with the friendship theme

In the end, we asked the participants to say what they thought the characters were doing – playing or fighting.

7.8.2. Results

Next, we present the results for each video, showing what percentage of participants gave each answer – that the characters were playing or fighting.

The video with the boy and the dragon without any sound got 50% for playing and 50% for fighting. The video with the boy and the dragon scored with the duel theme got 7% for playing and 93% for fighting. The video with the boy and the dragon scored with the friendship theme got 100% for playing and 0% for fighting. The video with the boy and the fairy without any sound got 100% for playing and 0% for fighting. The video with the boy and the fairy scored with the duel theme got 15% for playing and 85% for fighting. The video with the boy and the fairy scored with the friendship theme got 100% for playing and 0% for fighting.

To analyze the significance of this test, we want to see not only if the change of theme between the videos (friendship and duel) influences the answers, but also if the change of characters (dragon and fairy) does. We then applied the Kruskal-Wallis test to these 2 dimensions of independent variables: themes and characters. The first got a result of p < 0.001, which makes it highly significant, while the second got a result of p = 0.542, which makes it not significant. Additionally, a third Kruskal-Wallis test verified that the changes were significant among all 6 versions of the videos, with a significant result of p < 0.001.

The two graphs below illustrate the differences in the answers according to the change of theme:


Fig. 42 - Answers for the videos with the boy and the dragon

Fig. 43 - Answers for the videos with the boy and the fairy

7.8.3. Discussion

Since the change of characters did not have any influence on the answers (non-significant Kruskal-Wallis result, p > 0.05), we can conclude that the only influence on the answers came from the theme change. This influence was enough to completely change the participants’ opinion about what the characters were doing.

The success of this test might be associated with the ambiguity of the videos, which gave more space for the music to influence the participants. Since the viewers could not clearly tell what the characters were doing just by looking at their movements, they had to get hints from their hearing. Therefore, the music being played strongly affected their opinion about the interaction of those characters.


7.9. Music associated to story’s key moments –

Climax Theme

The objective of this experiment was to test if the use of music associated with the climax helps to understand the increased intensity of the environment at that point. We tested the participants under two different scenarios. In the first, we showed a video of a climax scene without any sound. In the second, the same video was shown, but scored with one of the climax themes available in D3S. We then analyzed the perception of intensity by the participants under the two conditions.

7.9.1. Method

Design

The independent variable of this experiment is the use of the climax theme, while the dependent variable is the perception of the story’s intensity. This experiment followed an independent design and the data collected consists of a Likert scale score (0-5).

Participants

A total of 62 participants answered this experiment: 27 of them watched the version without any sound while 35 watched the version with sound.

Procedure

The experiment participants were divided in 2 groups:

- We showed the first group a video of a story reaching its climax, without any sound.

- We showed the second group the same video, but this time played along with a climax theme.

In the end, the participants were asked to rate their perception of the intensity of the story in each video on a 0-5 Likert scale, where 0 corresponded to “Not Intense” and 5 to “Very Intense”.

7.9.2. Results

The video without any sound got a mean intensity of 2.85, while the video with sound got a mean of 3.91.

A Mann-Whitney test was performed in order to validate the significance of this test. The result of p = 0.001 shows that the answers changed significantly according to the video shown. The following box plot illustrates the answers under the two conditions:


Fig. 44 – Experiment 9 answers

7.9.3. Discussion

The results of this experiment were validated since we got a significance value of p = 0.001. They showed that the existence of climax music adds to the perceived intensity of a scene. This makes it beneficial when we want the viewers to understand the scene’s high intensity, which is a characteristic of the climax part of almost every story.


8. Conclusions

At the beginning of this thesis, we proposed to find a solution for the following question: how can we score a story that is being created in real time by a storytelling environment, using sounds and music, in a way that adds understanding of what is happening in the story and consequently increases the viewer’s enjoyment?

The D3S project is intended as a contribution to both the fields of interactive storytelling and automatic scoring. With it, we explored a different approach to producing sound and music that accompanies a story, helping the viewer understand it better while offering a more entertaining experience.

To help the viewer understand the story, we used sounds to score the actions made by the characters, so that those actions could be better understood. In this way, the viewer could easily distinguish who performed an action through the use of different instruments for different characters. In the case of actions towards other characters, viewers could better understand the strength of the connection between those characters, perceived through changes in volume.

Additionally, we used background music to score the story being created in the storytelling environment. This music helps the viewer understand the nature of the characters that appear in the story, namely the hero, his friend and the villain, as well as the story's key moments, such as the duel, friendship and climax scenes. The background music was also used to emphasize the emotions felt by the hero, by changing the nature of his theme, and to convey the intensity of the environment through changes in tempo.

Adding sound and music to a previously silent environment increases the viewer's enjoyment, since engaging the sense of hearing makes the experience more complete. The use of pre-composed music was an important contributor to this increase, since it minimizes the risk of the music sounding unpleasant.

The integration with the I-Sounds system was successful: we managed to include all the features of this thesis in that system. Additionally, we created the Apollo manager, which will ease the addition of future composition modules and thus increase the extensibility of the I-Sounds system.

Besides that, even though the main storytelling system used to test the features of this thesis was I-Shadows, we also integrated I-Sounds with FearNot!. This proved the adaptability not only of the I-Sounds system but also of the work developed for D3S, since all the features implemented for this thesis worked in that storytelling environment as well. The successful integration with two different storytelling systems demonstrates the generality of the D3S framework.

The results obtained were satisfying in that we reached our initially proposed goals: viewers could better understand the story being created and have a more enjoyable experience while watching it. We validated the results for the following features: increased enjoyment of the story when using D3S compared with Amadeus or with no sound scoring system at all; better identification of the author of an event when instruments are associated with characters; better understanding of the emotional relation between two characters when volume differences are used; better understanding of a character's mood when its theme changes accordingly; and better perception of key moments when different kinds of background music are used.

The results for the use of themes to help identify the hero and friend characters were not conclusive. This suggests that the choice of these themes was not the best possible, making it hard for the viewer to distinguish between the two. We must also take into account that both characters had the same shape, which might have influenced the result. Additionally, the results for the perception of the environment's intensity through changes in the music's tempo, and for the use of sounds to make events better understood, were not conclusive either. We believe, however, that these two tests could be reevaluated with some changes, which are explained in the next and final section of this thesis.

8.1. Future Work

In this work we explored several features of music and sound and used them to help the viewer better understand the story. These features, such as changes in the background music's tempo, the use of different instruments for different characters and changes of volume according to an emotion's strength, were used to enhance certain characteristics of the story being created. Other features can be explored in future work. For instance, we could use different octaves according to a character's gender. Male voices are usually lower in frequency than female voices; by analogy between voice pitch and octaves, the sounds associated with a character could be played in an octave that matches the character's gender. Features like this one could be integrated into the system to create new ways of understanding the story.
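As an illustration, the proposed gender-based octave shift could be sketched as follows. This is a minimal sketch, not part of the implemented system; the function name and the fixed one-octave shift are assumptions made only for the example.

```python
# Hypothetical sketch of the proposed gender-based octave shift.
# In MIDI, one octave is 12 semitones, so shifting a note number
# by +/-12 transposes the motif by an octave.
def transpose_for_gender(midi_notes, gender):
    """Play a character's motif an octave lower for male characters
    and an octave higher for female ones, clamped to the MIDI range."""
    shift = -12 if gender == "male" else 12
    return [max(0, min(127, note + shift)) for note in midi_notes]

# A C major triad (C4-E4-G4) moved to the octave below for a male character:
print(transpose_for_gender([60, 64, 67], "male"))   # [48, 52, 55]
```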

Other changes could be made in the output system. As it stands, the system only supports MIDI files, which were chosen because they are easy to manipulate, allowing us to change musical parameters as intended. In the future, the system could be adapted to WAV or MP3 files, which have an overall greater quality and would produce more interesting results. The challenge would be adapting the parameter changes (such as tempo and volume) to the new format.
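To illustrate why MIDI makes tempo manipulation easy: a Standard MIDI File stores tempo as microseconds per quarter note, so speeding the music up is a single division. The helper names below are illustrative, not the system's actual code.

```python
# MIDI set-tempo meta events store microseconds per quarter note,
# so a faster performance means a *smaller* stored value.
def scale_tempo(us_per_quarter, speed_factor):
    """Return the new tempo value after speeding up by speed_factor."""
    return int(round(us_per_quarter / speed_factor))

def to_bpm(us_per_quarter):
    """Convert microseconds per quarter note to beats per minute."""
    return 60_000_000 / us_per_quarter

# 500000 us/quarter is 120 BPM; doubling the speed yields 240 BPM.
faster = scale_tempo(500_000, 2.0)
print(faster, to_bpm(faster))   # 250000 240.0
```

Performing the same change on WAV or MP3 audio would instead require time-stretching the waveform, which is considerably harder to do without artifacts.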

Other approaches could also be explored in Apollo, the D3S music manager. Different semantics could be tried, such as other ways of selecting the music to play. New composer modules (such as Amadeus or Ludwig) could be created and integrated into the system, enriching the type of music available and enabling new advances in the automatic composition field. The system was built so that it is easy to extend by adding or changing composer modules; the coordination of those modules, and the decisions on when and how to use them, are made at the manager level.
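The manager-level coordination described above could look like the following sketch. The class and method names are hypothetical, chosen only to illustrate the design; they are not the actual Apollo API.

```python
# Hypothetical sketch of a manager that dispatches story events to
# pluggable composer modules, illustrating the extensibility idea.
class ComposerModule:
    def can_score(self, event):
        raise NotImplementedError
    def compose(self, event):
        raise NotImplementedError

class DuelComposer(ComposerModule):
    def can_score(self, event):
        return event == "duel"
    def compose(self, event):
        return "duel-theme.mid"

class Manager:
    """Decides, at the manager level, which module scores each event."""
    def __init__(self):
        self.modules = []
    def register(self, module):
        self.modules.append(module)
    def score(self, event):
        for module in self.modules:
            if module.can_score(event):
                return module.compose(event)
        return None  # no registered module claims this event

manager = Manager()
manager.register(DuelComposer())
print(manager.score("duel"))        # duel-theme.mid
print(manager.score("friendship"))  # None
```

Adding a new composer is then a matter of implementing the module interface and registering it, with no change to the manager itself.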

We could also reevaluate the tests that did not produce conclusive results by changing some aspects of them. For the relation between the music's tempo and the environment's intensity, a new test could focus on finding a relation between a change of music and a change in the perceived intensity, instead of a relation between a tempo value and an intensity value. For the events' sounds test, a new evaluation could use a more ambiguous video, which would likely produce clearer results about the understanding of the events.


Appendix A

The evaluation form

The test subjects answered an online form in which they watched videos or heard sounds and then answered questions about them. A brief description of the form was shown before it started:

Dynamic Story Shadows and Sounds

Welcome! D3S is a system developed to accompany a storytelling environment. Through the use of music and sounds, it aims to give the viewer a better experience, helping their comprehension of the story and increasing their enjoyment. The following tests were developed to validate the system as a whole and some of its features in particular. Since many videos with sound are used, please put on your headphones before you start; note, however, that some videos have no sound. Watch every video attentively, since each should be watched only once. After each video, a question will be asked. There are no right or wrong answers. We thank you for your participation. Click "next" when you are ready.

Fig. 45 – Brief description of the testing procedure

The structure of the form can be seen in the next figure. Note that the questions and available answers were the same across all 9 versions; the only differences between them were the videos displayed in each one.


Appendix B

Evaluation Tests – Organization and Data Calculation

Experiment 7.1 - General system evaluation - D3S and story enjoyment

Version   Experiment 1
A         S D A
B         D S A
C         S A D
D         A D S
E         A S D
F         D A S
G         S A D
H         S D A
I         A S D

Fig. 46 – Experiment 1 versions

Legend: S – Silent version; A – Amadeus version; D – D3S version

          N    Mean   Std. Deviation   Minimum   Maximum
Silent    62   1.73   0.813            1         4
Amadeus   62   2.27   0.978            1         5
D3S       62   3.21   1.026            1         5

Fig. 47 - Experiment 1 Descriptive Statistics

Friedman Test

Ranks
          Mean Rank
Silent    1.35
Amadeus   1.90
D3S       2.74

Test Statistics(a)
N             62
Chi-Square    75.798
df            2
Asymp. Sig.   .000
a. Friedman Test

Fig. 48 - Experiment 1 Friedman Test
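The Friedman statistic reported above can be reproduced from first principles: each participant's ratings are ranked across the k conditions, and the statistic is 12/(nk(k+1)) ΣR_j² − 3n(k+1). The sketch below uses made-up ratings (not the experiment's raw answers) and assumes no ties within a participant's row.

```python
# Friedman chi-square from scratch: rank each subject's ratings across
# the k conditions, sum the ranks per condition, then apply
# chi2 = 12 * sum(R_j^2) / (n*k*(k+1)) - 3*n*(k+1).
def friedman_statistic(*conditions):
    n = len(conditions[0])   # number of subjects
    k = len(conditions)      # number of conditions
    rank_sums = [0.0] * k
    for i in range(n):
        row = [c[i] for c in conditions]
        order = sorted(range(k), key=lambda j: row[j])
        for rank, j in enumerate(order, start=1):
            rank_sums[j] += rank
    return 12.0 * sum(r * r for r in rank_sums) / (n * k * (k + 1)) - 3 * n * (k + 1)

# Three subjects who all rate Silent < Amadeus < D3S (made-up ratings):
print(friedman_statistic([1, 2, 1], [2, 3, 2], [4, 5, 3]))  # 6.0
```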


Experiment 7.2 - Background music’s tempo and environment’s intensity

Version   Experiment 2
A         S
B         M
C         F
D         S
E         M
F         F
G         S
H         M
I         F

Fig. 49 – Experiment 2 versions

Legend: S – Slow tempo; M – Medium tempo; F – Fast tempo

Descriptive Statistics
         N    Mean   Std. Deviation   Minimum   Maximum
Slow     21   2.57   1.028            1         5
Medium   21   3.05   1.203            1         5
Fast     20   2.80   1.196            1         5

Fig. 50 - Experiment 2 Descriptive Statistics

Kruskal-Wallis Test

Ranks (Intensity)
Tempo    N    Mean Rank
Slow     21   27.95
Medium   21   34.57
Fast     20   32.00
Total    62

Test Statistics(a,b)
              Intensity
Chi-Square    1.532
df            2
Asymp. Sig.   .465
a. Kruskal-Wallis Test
b. Grouping Variable: Tempo

Fig. 51 - Experiment 2 Kruskal-Wallis Test
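The Kruskal-Wallis H above can likewise be computed directly: pool all answers, rank them (ties receive the average rank), and apply H = 12/(N(N+1)) ΣR_j²/n_j − 3(N+1). The sketch below uses made-up data and omits the tie-correction factor that SPSS applies.

```python
from collections import defaultdict

# Kruskal-Wallis H from scratch: pooled average ranks, then
# H = 12 * sum(R_j^2 / n_j) / (N*(N+1)) - 3*(N+1).
def kruskal_h(*groups):
    pooled = sorted(x for g in groups for x in g)
    positions = defaultdict(list)
    for idx, value in enumerate(pooled, start=1):
        positions[value].append(idx)
    # tied values share the average of their positions
    avg_rank = {v: sum(p) / len(p) for v, p in positions.items()}
    n_total = len(pooled)
    total = sum(sum(avg_rank[x] for x in g) ** 2 / len(g) for g in groups)
    return 12.0 * total / (n_total * (n_total + 1)) - 3 * (n_total + 1)

# Three fully separated groups of two answers each:
print(kruskal_h([1, 2], [3, 4], [5, 6]))
```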

Experiment 7.3 - Association between characters and instruments

Descriptive Statistics
         N    Mean   Std. Deviation   Minimum   Maximum
Boy      62   0.73   0.450            0         1
Girl     62   0.74   0.441            0         1
Dragon   62   0.92   0.275            0         1
Score    62   2.42   0.897            0         3

Fig. 52 - Experiment 3 Descriptive Statistics

Chi-Square Test Frequencies

Boy
        Observed N   Expected N   Residual
Wrong   17           31.0         -14.0
Right   45           31.0         14.0
Total   62

Girl
        Observed N   Expected N   Residual
Wrong   16           31.0         -15.0
Right   46           31.0         15.0
Total   62

Dragon
        Observed N   Expected N   Residual
Wrong   5            31.0         -26.0
Right   57           31.0         26.0
Total   62

Score
            Observed N   Expected N   Residual
None        2            15.5         -13.5
1 Right     11           15.5         -4.5
2 Right     8            15.5         -7.5
All Right   41           15.5         25.5
Total       62

Test Statistics
              Boy       Girl      Dragon    Score
Chi-Square    12.645a   14.516a   43.613a   58.645b
df            1         1         1         3
Asymp. Sig.   .000      .000      .000      .000
a. 0 cells (0.0%) have expected frequencies less than 5. The minimum expected cell frequency is 31.0.
b. 0 cells (0.0%) have expected frequencies less than 5. The minimum expected cell frequency is 15.5.

Fig. 53 - Experiment 3 Chi-Square Test
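Each chi-square value above is a goodness-of-fit statistic, Σ(O−E)²/E against uniform expected counts: the Boy value, for instance, follows from the observed counts 17 and 45 against an expected 31 each. A minimal sketch:

```python
# Chi-square goodness of fit against a uniform expected count:
# sum((observed - expected)^2 / expected) over the categories.
def chi_square(observed):
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)

# "Boy" and "Dragon" rows of Experiment 3 (17 wrong vs 45 right,
# 5 wrong vs 57 right, expected 31 each):
print(round(chi_square([17, 45]), 3))   # 12.645
print(round(chi_square([5, 57]), 3))    # 43.613
```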


Experiment 7.4 - Themes associated to characters

Version   Experiment 4
A         FVHN
B         S
C         HFVN
D         S
E         VFNH
F         S
G         HFVN
H         S
I         FVHN

Fig. 54 – Experiment 4 versions

Legend: F – Friend theme; V – Villain theme; H – Hero theme; N – No theme associated; S – Silent version

Versions with sound:

Descriptive Statistics
          N    Mean   Std. Deviation   Minimum   Maximum
Hero      35   0.51   0.507            0         1
Friend    35   0.49   0.507            0         1
Villain   35   0.69   0.471            0         1
Score     35   1.71   1.045            0         3

Fig. 55 - Experiment 4 Descriptive Statistics – with sound

Chi-Square Test Frequencies

Hero
        Observed N   Expected N   Residual
Wrong   17           17.5         -0.5
Right   18           17.5         0.5
Total   35

Friend
        Observed N   Expected N   Residual
Wrong   18           17.5         0.5
Right   17           17.5         -0.5
Total   35

Villain
        Observed N   Expected N   Residual
Wrong   11           17.5         -6.5
Right   24           17.5         6.5
Total   35

Score
            Observed N   Expected N   Residual
None        5            8.8          -3.8
1 Right     10           8.8          1.3
2 Right     10           8.8          1.3
All Right   10           8.8          1.3
Total       35

Test Statistics
              Hero     Friend   Villain   Score
Chi-Square    0.029a   0.029a   4.829a    2.143b
df            1        1        1         3
Asymp. Sig.   .866     .866     .028      .543
a. 0 cells (0.0%) have expected frequencies less than 5. The minimum expected cell frequency is 17.5.
b. 0 cells (0.0%) have expected frequencies less than 5. The minimum expected cell frequency is 8.8.

Fig. 56 - Experiment 4 Chi-Square test – with sound


Chi-Square Test Frequencies

Hero
         Observed N   Expected N   Residual
Blue     12           6.8          5.3
Yellow   4            6.8          -2.8
Red      10           6.8          3.3
Green    1            6.8          -5.8
Total    27

Friend
         Observed N   Expected N   Residual
Blue     4            6.8          -2.8
Yellow   16           6.8          9.3
Red      1            6.8          -5.8
Green    6            6.8          -0.8
Total    27

Villain
         Observed N   Expected N   Residual
Blue     7            6.8          0.3
Yellow   1            6.8          -5.8
Red      13           6.8          6.3
Green    6            6.8          -0.8
Total    27

Test Statistics
              Hero      Friend    Villain
Chi-Square    11.667a   18.778a   10.778a
df            3         3         3
Asymp. Sig.   .009      .000      .013
a. 0 cells (0.0%) have expected frequencies less than 5. The minimum expected cell frequency is 6.8.

Fig. 57 - Experiment 4 Chi-Square Test – without sound


Experiment 7.5 - Use of sounds to underscore events

Version   Experiment 5
A         X
B         Y
C         X
D         Y
E         X
F         Y
G         X
H         Y
I         X

Fig. 58 – Experiment 5 versions

Legend: X – Silent; Y – With sound

Descriptive Statistics
                N    Mean   Std. Deviation   Minimum   Maximum
Understanding   62   0.90   0.298            0         1

Fig. 59 - Experiment 5 Descriptive Statistics

Mann-Whitney Test

Ranks (Understanding)
Sound     N    Mean Rank   Sum of Ranks
Without   35   30.07       1052.50
With      27   33.35       900.50
Total     62

Test Statistics(a)
                         Understanding
Mann-Whitney U           422.500
Wilcoxon W               1052.500
Z                        -1.386
Asymp. Sig. (2-tailed)   .166
a. Grouping Variable: Sound

Fig. 60 - Experiment 5 Mann-Whitney test
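The Mann-Whitney U reported above counts, over all cross-group pairs of answers, how often one group's answer exceeds the other's, with ties counting half. A minimal sketch on made-up answers (not the experiment's raw data):

```python
# Mann-Whitney U by direct pair counting: for every (a, b) pair,
# a win for `a` adds 1 and a tie adds 0.5.
def mann_whitney_u(a, b):
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

scores_with = [1, 1, 0, 1]      # made-up "understood the event" answers
scores_without = [0, 1, 0, 0]
print(mann_whitney_u(scores_with, scores_without))  # 12.0
```

The two directions always sum to n1 * n2, which is a quick sanity check when reproducing SPSS output.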


Experiment 7.6 - Volume and relations between characters

Version   Experiment 6
A         S
B         S
C         X
D         LH
E         LH
F         X
G         HL
H         HL
I         X

Fig. 61 – Experiment 6 versions

Legend: S – Same volume; X – Silent; L – Lower volume; H – Higher volume

Kruskal-Wallis Test

Ranks (Answer)
Conditions    N    Mean Rank
Silent        20   36.15
Same Volume   15   39.27
LH            13   31.69
HL            14   16.36
Total         62

Test Statistics(a,b)
              Answer
Chi-Square    16.360
df            3
Asymp. Sig.   .001
a. Kruskal-Wallis Test
b. Grouping Variable: Conditions

Fig. 62 - Experiment 6 Kruskal-Wallis test


Experiment 7.7 - Character theme and mood

Version   Experiment 7
A         H
B         S
C         X
D         H
E         S
F         X
G         H
H         S
I         X

Fig. 63 – Experiment 7 versions

Legend: H – Happy theme played at the end; S – Sad theme played at the end; X – Silent

Kruskal-Wallis Test

Ranks (Valence)
ThemeValence    N    Mean Rank
SilentValence   20   28.50
HappyValence    21   45.31
SadValence      21   20.55
Total           62

Test Statistics(a,b)
              Valence
Chi-Square    27.724
df            2
Asymp. Sig.   .000
a. Kruskal-Wallis Test
b. Grouping Variable: ThemeValence

Fig. 64 - Experiment 7 Kruskal-Wallis test


Experiment 7.8 - Music associated to story’s key moments – Friendship and Duel Themes

Version   Experiment 8
A         BDS
B         BDDu
C         BDFr
D         BFDu
E         BFS
F         BFFr
G         BDFr
H         BFDu
I         BDDu

Fig. 65 – Experiment 8 versions

Legend: B – Boy; D – Dragon; F – Fairy; S – Silent; Fr – Friendship theme; Du – Duel theme

Kruskal-Wallis Test - Themes

Ranks (Answer)
Theme        N    Mean Rank
Silent       15   25.77
Friendship   20   17.50
Duel         27   45.06
Total        62

Test Statistics(a,b)
              Answer
Chi-Square    38.753
df            2
Asymp. Sig.   .000
a. Kruskal-Wallis Test
b. Grouping Variable: Theme

Fig. 66 - Experiment 8 Kruskal-Wallis test for themes


Kruskal-Wallis Test - Characters

Ranks (Answer)
Character   N    Mean Rank
Dragon      35   32.56
Fairy       27   30.13
Total       62

Test Statistics(a,b)
              Answer
Chi-Square    0.371
df            1
Asymp. Sig.   .542
a. Kruskal-Wallis Test
b. Grouping Variable: Character

Fig. 67 - Experiment 8 Kruskal-Wallis test for characters

Kruskal-Wallis Test - Video versions

Ranks (Answer)
Type    N    Mean Rank
BDS     8    33.00
BDF     13   17.50
BDD     14   46.29
BFS     7    17.50
BFF     7    17.50
BFD     13   43.73
Total   62

Test Statistics(a,b)
              Answer
Chi-Square    42.643
df            5
Asymp. Sig.   .000
a. Kruskal-Wallis Test
b. Grouping Variable: Type

Fig. 68 - Experiment 8 Kruskal-Wallis test for versions


Experiment 7.9 - Music associated to story’s key moments – Climax Theme

Version   Experiment 9
A         Y
B         X
C         Y
D         X
E         Y
F         X
G         Y
H         X
I         Y

Fig. 69 – Experiment 9 versions

Legend: Y – With climax music; X – Silent

Descriptive Statistics (Intensity)
Type            Mean   N    Std. Deviation
Without Music   2.85   27   1.262
With Music      3.91   35   0.919
Total           3.45   62   1.197

Fig. 70 - Experiment 9 Descriptive Statistics

Mann-Whitney Test

Ranks (Intensity)
Type            N    Mean Rank   Sum of Ranks
Without Music   27   23.19       626.00
With Music      35   37.91       1327.00
Total           62

Test Statistics(a)
                         Intensity
Mann-Whitney U           248.000
Wilcoxon W               626.000
Z                        -3.307
Asymp. Sig. (2-tailed)   .001
a. Grouping Variable: Type

Fig. 71 - Experiment 9 Mann-Whitney test