
    Modelling affective-based music compositional intelligence with the aid of ANS analyses

    Toshihito Sugimoto b, Roberto Legaspi a,*, Akihiro Ota b, Koichi Moriyama a, Satoshi Kurihara a, Masayuki Numao a

    a The Institute of Scientific and Industrial Research, Osaka University, 8-1 Mihogaoka, Ibaraki, Osaka 567-0047, Japan
    b Department of Information Science and Technology, Osaka University, 1-5 Yamadaoka, Suita, Osaka 565-0871, Japan

    Available online 23 November 2007

    Abstract

    This research investigates the use of emotion data derived from analyzing change in activity in the autonomic nervous system (ANS), as revealed by brainwave production, to support the creative music compositional intelligence of an adaptive interface. A relational model of the influence of musical events on the listener's affect is first induced using inductive logic programming paradigms, with the emotion data and musical score features as inputs of the induction task. The components of composition such as interval and scale, instrumentation, chord progression and melody are automatically combined using a genetic algorithm and melodic transformation heuristics that depend on the predictive knowledge and character of the induced model. Out of the four targeted basic emotional states, namely, stress, joy, sadness, and relaxation, the empirical results reported here show that the system is able to successfully compose tunes that convey one of these affective states.

    © 2007 Elsevier B.V. All rights reserved.

    Keywords: Adaptive user interface; EEG-based emotion spectrum analysis; User modelling; Automated reasoning; Machine learning

    1. Introduction

    It is no surprise that only a handful of research works have factored in human affect in creating an intelligent music system or interface (e.g., [1,6,13,17,23]). One major reason is that the general issues alone when investigating music and emotion are enough to immediately confront and intimidate the researcher. More specifically, how can music composition, which is a highly structured cognitive process, be modelled, and how can emotion, which consists of very complex elements and is dependent on individuals and stimuli, be measured? [7]. The other is that music, being a reliable elicitor of affective responses, immediately raises the question of what exactly in music can influence an individual's mood. For example, is it the case that musical structures contain related musical events (e.g., chord progression, melody change, etc.) that allow emotionally-stimulating mental images to surface? Although attempts have been made to pin-point which features of the musical structure elicit which affect (e.g., [2,20]), the problem remains compelling because the solutions are either partial or uncertain.

    Our research addresses the problem of determining the extent to which emotion-inducing music can be modelled and generated using creative music compositional AI. Our approach involves inducing an affects-music relations model that describes musical events related to the listener's affective reactions and then using the predictive knowledge and character of the model to automatically control the music generation task. We have embodied our solution in a constructive adaptive user interface (CAUI) that re-arranges or composes [13] a musical piece based on one's affect. We have reported the results of combining inductive logic programming (in [8,13]) or multiple-part learning (in [7]) to induce the model and a genetic algorithm whose fitness function is influenced by the model. In these previous versions of the CAUI, an evaluation instrument based on the semantic differential method (SDM) was used to measure affective responses. The listener rated musical pieces on a scale of 1–5 for a set of bipolar affective descriptor pairs (e.g., happy–sad). Each subjective rating indicates the degree of the positive or negative affect.

    We argue that for the CAUI to accurately capture the listener's affective responses, it must satisfy necessary conditions that the SDM-based self-reporting instrument does not address. First, emotion detection must capture the dynamic nature of both music and emotion. With the rating instrument, the listener can only evaluate after the music is played. This means that only one evaluation is mapped to the entire musical piece rather than having possibly varied evaluations as the musical events unfold. Secondly, the detection task should not impose a heavy cognitive load upon the listener. It must ensure that listening to music remains enjoyable and avoid, if not minimize, disturbing the listener. In our prior experiments, the listener was asked to evaluate 75 musical pieces, getting interrupted the same number of times. If indeed the listener experienced stress or anxiety in the process, it was difficult to factor this into the calculations. Lastly, the emotion detection task should be language independent, which can later on permit cross-cultural analyses. This flexibility avoids the need to change the affective labels (e.g., Japanese to English).

    We believe that the conditions stated above can be satisfied by using a device that can analyze emotional states by observing the change in activity in the autonomic nervous system (ANS). Any intense feeling has consequent physiological effects on the ANS [19]. These effects include faster and stronger heartbeat, increased blood pressure or breathing rate, muscle tension and sweating, and accelerated mental activity, among others. This is the reason ANS effects can be observed using devices that can measure blood pressure, skin or heart responses, or brainwave production. Researchers in the field of affective computing are active in developing such devices (e.g., [14]). We have modified the learning architecture of the CAUI to incorporate an emotion spectrum analyzing system (ESA)¹ that detects emotional states by observing the brainwave activities that accompany the emotion [11].

    The learning architecture is shown in Fig. 1. The relational model is induced by employing the inductive logic programming paradigms of FOIL and R, taking as inputs the musical score features and the ESA-provided emotion data. The musical score features are represented as definitions of first-order logic predicates and serve as background knowledge to the induction task. The next task employs a genetic algorithm (GA) that produces variants of the original score features. The fitness function of the GA fits each generated variant to the knowledge provided by the model and music theory. Finally, the CAUI creates, using its melody-generating module, an initial tune consisting of the GA-obtained chord tones and then alters certain chord tones to become non-chord tones in order to embellish the tune.

    Using the ESA has several advantages. First, the dynamic changes in both emotions and musical events can now be monitored and mapped continuously over time. Secondly, it allows mapping of emotion down to the musical bar level. This means that many training examples can be obtained from a single piece. With the self-reporting instrument, the listener needed to hear and evaluate many musical pieces just to obtain a sufficient number of examples. Thirdly, more accurate measurements can now be acquired objectively. Lastly, it is unobtrusive, thereby relieving the listener of any cognitive load and allowing him/her to just sit back and listen to the music.

    In this paper, we first discuss the domain knowledge representations, learning parameters and learning tasks used for the CAUI in Sections 2 and 3. Section 4 details our experimentation methodology and analysis of the empirical results we gathered. Section 5 briefly locates the contribution of the CAUI in the field. Discussions of what we intend to carry out as possible future work can be found as part of our analysis and conclusion.

    2. Knowledge acquisition and representation

    In order to obtain a personalized model of the coupling of emotional expressions and the underlying music parameters, it is vital to: (1) identify which musical features (e.g., tempo, rhythm, harmony, etc.) should be represented as background knowledge, (2) provide an instrument to map the features to identified emotion descriptors, (3) logically represent the music parameters, and (4) automatically induce the model. Although the influence of various features has been well studied (e.g., refer to a comprehensive summary on the influence of compositional parameters [2] and an overview of recent investigations on the influence of performance parameters [4,5]), the task of the CAUI is to automatically find musical structure and sequence features that are influential to specific emotions.

    2.1. Music theory

    The aspect of music theory relevant to our research is the interaction of music elements into patterns that can help the composition techniques. We use a narrow music theory that consists of a limited set of music elements (see Fig. 2). The reason is that we need the predictive model to be tractable in order to perform controlled experimentations and obtain interpretable results. The definitions of the concepts listed in Fig. 2 can be found in texts on music theory. The methods by which music theory is utilized by the genetic algorithm and melodic transformation heuristics are explained in Section 3.

    ¹ Developed by the Brain Functions Laboratory, Inc. (http://www.bfl.co.jp/main.html).

    Fig. 1. The learning architecture of the CAUI.

    Fourteen musical piece segments were prepared, consisting of four pieces from classical music, three from Japanese pop, and seven from harmony textbooks. A segment plays for 7.4 to 48 s (an average of 24.14 s). These pieces were selected, albeit not randomly, from the original 75 segments that were used in our previous experiments. Based on prior results, these selected pieces demonstrate a high degree of variance in emotional content when evaluated by previous users of the system. In other words, these pieces seem to elicit affective flavours that are more distinguishable.

    2.2. Emotion acquisition features of the ESA

    Through proper signal processing, scalp potentials that are measured by an electroencephalograph (EEG) can provide global information about mental activities and emotional states [11]. With the ESA, EEG features associated with emotional states are extracted into a set of 45 cross-correlation coefficients. These coefficients are calculated for each of the θ (5–8 Hz), α (8–13 Hz) and β (13–20 Hz) frequency components, forming a 135-dimensional EEG state vector. Operating a transformation matrix on this state vector linearly transforms it to a 4-dimensional vector E = (e1, e2, e3, e4), with the four components representing levels of stress, joy, sadness and relaxation, respectively. The maximum time resolution of the emotion analysis performed in real time is 0.64 s. More detailed discussions on the ideas behind the ESA can be found in [11]. The emotion charts in Fig. 3 graphically show series of readings that were taken over time. A higher value means that the corresponding emotion is displayed more evidently. The two wave charts at the bottom indicate levels of alertness and concentration, respectively. These readings help gauge the reliability of the emotion readings. For example, the level of alertness should be high when the music is being played, indicating that the listener is attentive to the tune. Low alert points are valid so long as they correspond to the silent pauses inserted between tunes, since there is no need for the user to listen to the pauses. However, acceptably high values for concentration should be expected at any point in time. The collected emotion data are then used by the model induction task.
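    The sketch below only illustrates the shape of this mapping; the matrix values and variable names are placeholders, since the actual ESA transformation matrix is not reproduced in the paper.

```python
import numpy as np

# Illustrative sketch of the ESA mapping described above (placeholder values).
# x is the 135-dim EEG state vector: 45 cross-correlation coefficients for each
# of the theta, alpha and beta bands (3 x 45 = 135), for one 0.64 s frame.
rng = np.random.default_rng(0)
x = rng.standard_normal(135)          # placeholder EEG state vector
W = rng.standard_normal((4, 135))     # placeholder 4 x 135 transformation matrix

e = W @ x                             # E = (e1, e2, e3, e4)
stress, joy, sadness, relaxation = e
print(dict(stress=stress, joy=joy, sadness=sadness, relaxation=relaxation))
```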

    Fig. 2. Basic aspects of music theory that are being used for this version of the CAUI.

    Fig. 3. EEG signals used for emotion analyses are obtained using scalp electrodes.


    Brainwave analysis is a delicate task that can easily be distorted by external factors, including an eye blink. Hence, careful attention needs to be given when acquiring the readings. The listener needs to be in a closed room with as little noise and as few other external distractions as possible. The listener is also required to close his/her eyes at all times. This set-up is necessary to obtain stable readings. Any series of measurements should be taken without disturbing the listener.

    2.3. First-order logic representation of the score features

    The background knowledge of the CAUI consists of definitions in first-order logic that describe musical score features. The language of first-order logic, or predicate logic, is known to be well suited both for data representation and for describing the desired outputs. The representational power of predicate logic permits describing existing feature relations among the data, even complex relations, and provides comprehensibility of the learned results [12]. Score features were encoded into a predicate variable, or relation, named music(), which contains one song_frame() and a list of sequenced chord() relations describing the frame and chord features, respectively. Fig. 4 shows the music() representation (where '-' means NIL) of the musical score segment of the prelude of Jacques Offenbach's Orphée aux Enfers.

    The CAUI needs to learn three kinds of target relations or rules, namely, frame(), pair() and triplet(), wherein the last two represent patterns of two and three successive chords, respectively. These rules comprise the affects-music relational model. Fig. 5 (left), for example, shows structural information contained in the given sample relations and the actual musical notation they represent. Fig. 5 (right) shows a segment of an actual model learned by the CAUI that can be used to construct a musical piece that is supposed to induce in one user a sad feeling.
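    As a rough illustration of how such score features can be held in memory, the sketch below mirrors the music()/song_frame()/chord() relations as plain data structures; the field names and example values are assumptions for the sketch, not the actual attribute set shown in Figs. 2 and 4.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class SongFrame:               # counterpart of song_frame()
    tempo: str                 # e.g. "allegro"
    rhythm: str                # e.g. "4/4"
    key: str                   # e.g. "a_minor"

@dataclass
class Chord:                   # counterpart of chord()
    root: str                  # e.g. "V"
    form: str                  # e.g. "7th"
    inversion: Optional[str]   # None plays the role of the '-' (NIL) marker
    function: str              # tonic / subdominant / dominant

@dataclass
class Music:                   # counterpart of music()
    frame: SongFrame
    chords: List[Chord]        # the sequenced chord() relations

piece = Music(SongFrame("allegro", "4/4", "a_minor"),
              [Chord("I", "triad", None, "tonic"),
               Chord("V", "7th", None, "dominant")])
```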

    2.4. Model induction using FOIL and R

    The CAUI employs the combination of FOIL and R (Refinement by Example) to model the musical structures that correlate with the listener's emotions, with the musical structures comprising the set of training examples.

    Fig. 5. A segment of a set of rules that are supposed to stimulate a sad feeling.

    Fig. 4. A musical score represented in music() predicate.


    FOIL [16] is a first-order inductive learning system that induces a theory represented as function-free Horn clauses. Each clause is a conjunction of literals, where each literal consists of a relation and an ordering of the variable arguments of the relation. The training examples are represented extensionally as sets of ground tuples, i.e., the constant values of the relations present in the examples. Tuples belonging or not belonging to the relation are labelled as ⊕ and ⊖ tuples, respectively. FOIL assumes that all ⊕ tuples exhibit a relationship R and the ⊖ tuples do not. FOIL iteratively learns a clause of the theory and removes from the training set the ⊕ tuples of the relation R covered by that clause, until all ⊕ tuples are covered by one or more clauses.

    Induction of a single clause starts with it having an empty body, and body literals are iteratively added at the end of the clause until no ⊖ tuple is covered by the clause. FOIL selects one literal to be added from a set of candidate literals based on an information gain heuristic that estimates the utility of a literal in discriminating ⊕ from ⊖ tuples. The information gained by adding a literal is computed as

    Gain(L_i) = T_i^{++} \times (I(T_i) - I(T_{i+1}))          (1)

    I(T_i) = -\log_2 \frac{T_i^{+}}{T_i^{+} + T_i^{-}}, \qquad I(T_{i+1}) = -\log_2 \frac{T_{i+1}^{+}}{T_{i+1}^{+} + T_{i+1}^{-}}          (2)

    T_i^{+} and T_i^{-} denote the number of ⊕ and ⊖ tuples in the training set T_i. Adding the literal L_m to the partially developing clause R(v_1, v_2, ..., v_k) :- L_1, L_2, ..., L_{m-1} results in the new set T_{i+1}, which contains the tuples that remained from T_i. T_i^{++} denotes the number of ⊕ tuples in T_i that led to another tuple after adding L_m. The candidate literal L_i that yields the largest gain becomes L_m.
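    A minimal sketch of this computation, with toy tuple counts that are not from the paper:

```python
import math

def info(pos: int, neg: int) -> float:
    """I(T) = -log2( T+ / (T+ + T-) ), as in Eq. (2)."""
    return -math.log2(pos / (pos + neg))

def foil_gain(pos_i, neg_i, pos_next, neg_next, pos_covered):
    """Eq. (1): Gain(L) = T_i^{++} * (I(T_i) - I(T_{i+1})).
    pos_covered is T_i^{++}: the positive tuples of T_i that still lead to a
    tuple in T_{i+1} after the candidate literal is added."""
    return pos_covered * (info(pos_i, neg_i) - info(pos_next, neg_next))

# A literal that keeps 8 of 10 positive tuples while discarding most negatives
print(foil_gain(pos_i=10, neg_i=30, pos_next=8, neg_next=4, pos_covered=8))
```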

    R [21] is a system that automatically refines a theory in function-free first-order logic. It assumes that the induced theory can only be approximately correct and, hence, needs to be refined to improve its accuracy using the training examples. R implements a four-step theory revision process, i.e., (1) operationalization, (2) specialization, (3) rule creation, and (4) unoperationalization. Operationalization expands the theory into a set of operational clauses, detecting and removing useless literals. A literal is useful if its normalized gain, i.e., computing only I(T_i) - I(T_{i+1}) of Eq. (1), is greater than θ, where θ is a specified threshold, and if it produces new variables for the other literals in the clause, i.e., it is generative [21]. R considers the useless literals as faults in the theory. Specialization uses FOIL to add literals to the overly general clauses covering ⊖ tuples to make them more specific. Rule creation uses FOIL to introduce more operational clauses in case some ⊕ tuples cannot be covered by existing ones. Finally, unoperationalization re-organizes the clauses to reflect the hierarchical structure of the original theory.

    The training examples suitable for inducing the model are generated as follows. Each musical piece is divided into musical bars or measures. A piece may contain eight to 16 bars (an average of 11.6 bars per piece). Every three successive bars in a piece, together with the music frame, are treated as one training example, i.e., example_i = (frame, bar_{i-2}, bar_{i-1}, bar_i). Each bar consists of a maximum of four chords. The idea here is that sound flowing from at least three bars is needed to elicit an affective response. The first two examples in every piece, however, will inherently contain only one and two bars, respectively. The components of each bar are extracted from music() and represented as ground tuples. A total of 162 examples were obtained from the 14 pieces, with each bar having an average playtime of 2.1 s.
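    A minimal sketch of this bar-windowing step (the frame and bar objects are placeholders; in the CAUI they are ground tuples extracted from music()):

```python
# example_i = (frame, bar_{i-2}, bar_{i-1}, bar_i); the first two examples of a
# piece inherently contain only one and two bars, as described above.
def make_examples(frame, bars):
    return [(frame,) + tuple(bars[max(0, i - 2): i + 1]) for i in range(len(bars))]

print(make_examples("frame", ["bar1", "bar2", "bar3", "bar4"]))
# [('frame', 'bar1'), ('frame', 'bar1', 'bar2'),
#  ('frame', 'bar1', 'bar2', 'bar3'), ('frame', 'bar2', 'bar3', 'bar4')]
```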

    Recall that emotion readings are taken while the music is being played. Using the available synchronization tools of the ESA and music segmenting tools, the emotion measurements are assigned to the corresponding musical segments. Subsequently, each emotion measure is discretized to a value between 1 and 5 based on a pre-determined threshold. Using the same range of values as that of the SDM-based instrument permits us to retain the learning techniques in [8] while evaluating the new emotion detection scheme. It is also plausible for us to define a set of bipolar affective descriptor pairs ed1–ed2 (e.g., joyful–not joyful). It is important to note that antonymic semantics (e.g., stressed vs. relaxed and joyful vs. sad) do not hold for the ESA, since the four emotions are defined along orthogonal dimensions. Hence, four separate readings are taken instead of just treating one as inversely proportional to the other. This is consistent with the circumplex model of affect [15], where each of the four emotions can be seen in a different quadrant of the model. One relational model is learned for each affect in the four bipolar emotion pairs ed1–ed2 (a total of 4 × 2 = 8 models).

    To generate the training instances specific to FOIL, for any emotion descriptor ed1 in the pair ed1–ed2, the examples labelled 5 are represented as ⊕ tuples, while those labelled ≤4 as ⊖ tuples. Conversely, for ed2, ⊕ and ⊖ tuples are formed from bars that were evaluated as 1 and ≥2, respectively. In other words, there are corresponding sets of ⊕ and ⊖ tuples for each affect, and a tuple's label for ed1 does not determine its label for ed2. Examples are derived in almost the same way for FOIL+R. For example, the ⊕ tuples of ed1 and ed2 are formed from bars labelled ≥4 and ≤2, respectively.
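    A sketch of this labelling rule is given below; treating every example that is not positive as negative in the FOIL+R case is our assumption, since the paper only states that those examples are derived in almost the same way.

```python
# Each element of `examples` pairs one ground-tuple example (frame plus up to
# three bars) with its discretized 1-5 reading for the descriptor being learned.
def split_tuples(examples, descriptor, refinement=False):
    positives, negatives = [], []
    for example, label in examples:
        if descriptor == "ed1":                       # e.g. "joyful"
            is_pos = label >= 4 if refinement else label == 5
        else:                                         # "ed2", e.g. "not joyful"
            is_pos = label <= 2 if refinement else label == 1
        (positives if is_pos else negatives).append(example)
    return positives, negatives
```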

    3. Composing using GA and melody heuristics

    Evolutionary computational models have been dominating the realm of automatic music composition (as reviewed in [24]). One major problem in user-oriented GA-based music creation (e.g., [3,22]), however, is that the user is required to listen to and then rate the composed musical sequences in each generation. This is obviously burdensome, tiring and time-consuming. Although the CAUI is user-oriented, it need not solicit user intervention, since it uses the relational model as a critic to control the quality of the composed tunes.


    We adapted the conventional bit-string chromosome representation in GA as a columns-of-bits representation expressed in music() form (see Fig. 6, where F is the song_frame() and Ci is a chord()). Each bit in a column represents a component of the frame (e.g., tempo) or chord (e.g., root). The performance of our GA depends on two basic operators, namely, single-point crossover and mutation. With the first operator, the columns of bit strings from the beginning of the chromosome to a selected crossover point are copied from one parent and the rest are copied from the other. Mutation inverts selected bits, thereby altering the individual frame and chord information. The more fundamental components (e.g., tempo, rhythm and root) are mutated less frequently to avoid a drastic change in musical events, while the other features are varied more frequently to acquire more variants.
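    A minimal sketch of these two operators, assuming a chromosome is held as a list of columns, each mapping a field name to its bits; the field names, bit widths and mutation rates are illustrative, not the CAUI's actual settings.

```python
import random

def single_point_crossover(parent_a, parent_b):
    # Copy the columns up to a crossover point from one parent, the rest from the other.
    point = random.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]

# Fundamental fields are mutated less frequently to avoid drastic changes
# in the musical events; the rates below are placeholders.
MUTATION_RATE = {"tempo": 0.01, "rhythm": 0.01, "root": 0.02,
                 "form": 0.10, "inversion": 0.10}

def mutate(chromosome):
    for column in chromosome:                       # each column is {field: [bits]}
        for field, bits in column.items():
            rate = MUTATION_RATE.get(field, 0.10)
            column[field] = [b ^ 1 if random.random() < rate else b for b in bits]
    return chromosome
```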

    The fundamental idea of GA is to produce increasingly better solutions in each new generation of the evolutionary process. During the genetic evolution process, candidate chromosomes are produced that may be better or worse than what has already been obtained. Hence, a fitness function is necessary to evaluate the utility of each candidate. The CAUI's fitness function takes into account the user-specific relational model and music theory:

    fitnessChromosome(M) = fitnessUser(M) + fitnessTheory(M)          (3)

    where M is a candidate chromosome. This function makes it possible to generate frames and chord progressions that fit the music theory and stimulate the target feeling. fitnessUser(M) is computed as follows:

    fitnessUser(M) = fitnessFrame(M) + fitnessPair(M) + fitnessTriplet(M)          (4)

    Each function on the right-hand side of Eq. (4) is generally computed as follows:

    fitness_X(M) = \sum_{i=1}^{L} Average( \delta_F(P_i), \delta'_F(P_i), \delta_{FR}(P_i), \delta'_{FR}(P_i) )          (5)

    The meanings of the objects in Eq. (5) are shown in Table 1. The only variable parameter is P_i, which denotes the component/s extracted from M that will serve as input to the four subfunctions of fitness_X. If there are n chord() predicates in M, there will be L P_i's formed, depending on the fitness_X. For example, given chromosome M := music(song_frame(), chord_1(), ..., chord_8()), where the added subscripts denote chord positions, computing fitnessPair(M) will involve 7 P_i's (L = 8 - 1): P_1 = (chord_1(), chord_2()), ..., P_7 = (chord_7(), chord_8()). With fitnessFrame(M), there will only be P_1 = song_frame().

    The values of the subfunctions in Eq. (5) will differ depending on whether an ed1 (e.g., sad) or ed2 (e.g., not sad) music is being composed. Let us denote the target affect of the current composition as emoP and the opposite of this affect as emoN (e.g., if ed1 is emoP then emoN refers to ed2, and vice versa). δ_F and δ_FR (where F and FR refer to the models obtained using FOIL alone or FOIL+R, respectively) return +2 and +1, respectively, if P_i appears in any of the corresponding target relations (see Table 1) in the model learned for emoP. On the other hand, δ'_F and δ'_FR return -2 and -1, respectively, if P_i appears in any of the corresponding relations in the emoN model. In effect, the structure P_i is rewarded if it is part of the desired relations and is penalized if it also appears in the model for the opposite affect, since it does not possess a distinct affective flavour. The returned values (±2 and ±1) were determined empirically.
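    Under the assumption that the learned relations are available as plain sets of frame/pair/triplet structures, Eq. (5) can be sketched as follows (the model containers and their names are illustrative):

```python
# models['F_pos'] / models['FR_pos'] hold the structures appearing in the
# relations learned for the target affect (emoP) with FOIL alone / FOIL+R;
# the '_neg' sets are the corresponding relations of the opposite affect (emoN).
def delta(P, model_relations, weight):
    return weight if P in model_relations else 0.0

def fitness_X(components, models):
    total = 0.0
    for P in components:                           # the L components P_i of M
        scores = (delta(P, models["F_pos"],  +2),  # delta_F
                  delta(P, models["F_neg"],  -2),  # delta'_F
                  delta(P, models["FR_pos"], +1),  # delta_FR
                  delta(P, models["FR_neg"], -1))  # delta'_FR
        total += sum(scores) / len(scores)         # Average(...) of Eq. (5)
    return total
```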

    fitnessTheory(M) seeks to reward chromosomes that are consistent with our music theory and to penalize those that violate it. This is computed in the same way as Eq. (4), except that each of the three functions on the right is now computed as

    fitness_X(M) = \sum_{i=1}^{L} Average( g(P_i) )          (6)

    Fig. 6. GA chromosome structure and operators.

    Table 1
    Meanings of the objects in Eq. (5)

    fitness_X         P_i (component/s of M)                        L        Target relation
    fitnessFrame      song_frame()                                  1        frame()
    fitnessPair       (chord_i(), chord_{i+1}())                    n - 1    pair()
    fitnessTriplet    (chord_i(), chord_{i+1}(), chord_{i+2}())     n - 2    triplet()

    The definitions of the objects in Eq. (6) follow the ones in Table 1, except that P_i is no longer checked against the relational models but against the music theory. The subfunction g returns the score of fitting P_i with the music theory, which is either a reward or a penalty. Structures that earn a high reward include frames that have a complete or half cadence, chord triplets that contain the transition T → S → D of the tonal functions tonic (T), subdominant (S) and dominant (D), and pairs that resolve a secondary dominant (e.g., V/II → II). On the other hand, a penalty is given to pairs or triplets that have the same root, form and inversion values, have the same tonal function and form, or have the transition D → S. All these heuristics are grounded in basic music theory. For example, the cadence types are scored based on the strength of their effects, such that the complete cadence is given the highest score since it is the strongest. Another is that the transition T → S → D is rewarded since it is often used and many songs have been written using it. D → S is penalized since a dominant chord will not resolve to a subdominant.
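    One way to picture the subfunction g for chord pairs and triplets is sketched below; the reward and penalty magnitudes are placeholders and only a few of the heuristics above are shown.

```python
# Illustrative scoring of a chord pair against the heuristics above.
def g_pair(a, b):
    if a["function"] == "D" and b["function"] == "S":
        return -2.0        # D -> S: a dominant will not resolve to a subdominant
    if (a["root"], a["form"], a["inversion"]) == (b["root"], b["form"], b["inversion"]):
        return -1.0        # identical root, form and inversion values
    if (a["function"], a["form"]) == (b["function"], b["form"]):
        return -1.0        # same tonal function and form
    return 0.0

def g_triplet(functions):  # tonal functions of three successive chords, e.g. ("T", "S", "D")
    return 2.0 if functions == ("T", "S", "D") else 0.0   # reward the common T -> S -> D
```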

    Overall, the scheme we just described is defensible given that music theory can be represented using heuristics for evaluating the fitness of each GA-generated music variant. The character of each generated variant is immediately fit not just to the music theory but, more importantly, to the desired affective perception. It is also clear in the computations that the presence of the models permits the absence of human intervention during composition, thereby relieving the user of unnecessary cognitive load and achieving full automation. Fig. 7 shows one of the best-fit GA-generated chromosomes to stimulate a sad feeling.

    The outputs of the GA contain only chord progressions. Musical lines with only chord tones may sound monotonous or homophonic. A non-chord tone may serve to embellish the melodic motion surrounding the chord tones. The CAUI's melody-generating module first generates chord tones using the GA-obtained music() information and then utilizes a set of heuristics to generate the non-chord tones in order to create a non-monotonic piece of music.

    To create the chord tones, certain aspects of music theory are adopted, including the harmonic relations V7 → I (or D → T, which is known to be very strong), T → D, T → S, S → T, and S → D, and keeping the intervals in octaves. Once the chord tones are created, the non-chord tones, which are supposed not to be members of the accompanying chords, are generated by selecting and disturbing the chord tones. All chord tones have an equal chance of being selected. Once selected, a chord tone is modified into a non-chordal broderie, appoggiatura or passing tone. How these non-chord tones are adopted for the CAUI is detailed in [7].
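    The selection-and-disturbance step could be pictured roughly as below; the pitch offsets and selection probability are purely illustrative simplifications of the heuristics detailed in [7].

```python
import random

# Pick chord tones with equal probability and replace each selected one with a
# tone outside the accompanying chord (broderie, appoggiatura or passing tone).
def embellish(chord_tone_pitches, p_select=0.3):
    melody = []
    for pitch in chord_tone_pitches:               # pitches as MIDI note numbers
        if random.random() < p_select:
            kind = random.choice(["broderie", "appoggiatura", "passing"])
            offset = {"broderie": 2, "appoggiatura": 1, "passing": -2}[kind]
            melody.append(pitch + offset)          # illustrative non-chord tone
        else:
            melody.append(pitch)
    return melody

print(embellish([60, 64, 67, 72]))                 # C major chord tones, embellished
```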

    4. Experimentation and analysis of results

    We performed a set of individualized experiments to determine whether the CAUI-composed pieces can actually stimulate the target emotion. Sixteen subjects were asked to hear the 14 musical pieces while wearing the ESA's helmet. The subjects were all Japanese males with ages ranging from 18 to 27 years. Although it is ideal to increase the heterogeneity of the subjects' profiles, it seems more appropriate at this stage to limit their diversity in terms of their background and to focus more on the possibly existing differences in their emotional reactions. For the subject to hear the music playing continuously, all the pieces were sequenced using a music editing tool, and silent pauses of 15 s each were inserted before and after each piece, with the exception of the first, which is preceded by a 30-s silence so as to condition the subject. Personalized models were learned for each subject based on their emotion readings, and new pieces were composed independently for each. The same subjects were then asked to go through the same process using the set of newly composed pieces. Twenty-four tunes were composed for each subject, i.e., three for each of the bipolar affective descriptors. Fig. 8 shows that the CAUI was able to compose a sad piece, even without prior handcrafted knowledge of any affect-inducing piece.

    We computed the difference of the averaged emotion readings for each ed1–ed2 pair. The motivation here is that the higher the difference, the more distinct/distinguishable is the affective flavour of the composed pieces. We also performed a paired t-test on the differences to determine whether they are significant. Table 2 shows that the composed sad pieces are the only ones that correlate with the subjects' emotions. A positive difference was seen in many instances, albeit not necessarily statistically significant. This indicates that the system is not able to differentiate the structures that can arouse such impressions.
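    Since the per-subject differences are tested against zero, the check amounts to a one-sample t-test on the differences; the sketch below reproduces the check for the "Sad" column of Table 2 with SciPy.

```python
from scipy import stats

# "Sad" column of Table 2: per-subject difference of the averaged ed1 (+) and
# ed2 (-) emotion readings for the composed sad pieces.
sad_diff = [-0.67, -1.33, 0.67, 0.67, 1.33, 0.00, 1.67, 1.33,
            1.67, -0.33, 0.67, 2.33, 0.33, 1.00, 0.67, 0.00]

t, p = stats.ttest_1samp(sad_diff, 0.0)    # paired t-test on the differences
print(f"t = {t:.2f}, p = {p:.4f}")         # t is about 2.63, as reported in Table 2
```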

    The version of the CAUI reported in [8] is similar to the current one except for two things: (1) it used self-reporting and (2) it evaluated on a whole-piece, instead of bar, level. Its compositions are significant in only two out of six emotion dimensions at level α = 0.01 using Student's t-test. The current version used only 14 pieces but was able to produce significant outputs for one emotion. This shows that we cannot easily dismiss the potential of the current version.

    Fig. 7. An actual GA-generated musical piece.

    Fig. 8. A CAUI-composed sad musical piece.

    The results obtained can be viewed as acceptable if the current form of the research is taken as a proof of concept. The acceptably sufficient result for one of the emotion dimensions shows promise in the direction we are heading and motivates us to further enhance the system's capability in terms of its learning techniques. The unsatisfactory results obtained for the other emotion descriptors can also be attributed to shortcomings in creating adequately structured tunes due to our narrow music theory. For instance, the composed tunes at this stage consist only of eight bars and are rhythmically monotonic. Admittedly, we need to take more of music theory into consideration. Secondly, since the number of training examples has been downsized, the number of distinct frames, i.e., in terms of attribute values, became fewer. There is no doubt that integrating more complex musical knowledge and scaling to a larger dataset are feasible, provided that the CAUI sufficiently defines and represents the degrees of musical complexity (e.g., structure in the melody) and acquires the storage needed for the training data (this has become our immediate obstacle). It is also an option to investigate the effect of just a single music element that is very influential in creating music and stimulating emotions (e.g., the role of beat in African music). This will permit a more focused study while lessening the complexity in scope.

    5. Related works

    To comprehend the significant link that unites music and emotion has been a subject of considerable interest involving various fields (refer to [5]). For about five decades, artificial intelligence has played a crucial role in computerized music (reviewed in [10]), yet there seems to be a scarcity of research that tackles the compelling issues of user affect-specific automated composition. As far as our limited knowledge of the literature is concerned, it has been difficult to find a study that aims to measure the emotional influence of music and then heads towards a fully automated composition task. This is in contrast to certain works that did not deal with music composition even if they achieved detecting the emotional influence of music (e.g., [1,9]), or to systems that solicit users' ratings during composition (e.g., [22,23]). Other works attempt to compose music with EEG or other biological signals as a direct generative source (e.g., refer to the concepts outlined in [18]) but may not necessarily distinguish the affective characteristics of the composed pieces. We single out the work of Kim and André [6], which deals with more affective dimensions whose measures are based on users' self-reports and the results of physiological sensing. It differs from the CAUI in the sense that it does not induce a relational model and it dealt primarily with generating rhythms.

    6. Conclusion

    This paper proposes a technique for composing music based on the user's emotions as analyzed from changes in brainwave activities. The results reported here show that learning is feasible even with the currently small training set. The current architecture also permitted evading a tiring and burdensome self-reporting-based emotion detection task while achieving partial success in composing an emotion-inducing tune. We cannot deny that the system falls a long way short of human composers; nevertheless, we believe that the potential of its compositional intelligence should not be easily dismissed.

    The CAUI's learning architecture will remain viable even if other ANS measuring devices are used. The problem with the ESA is that it practically limits itself from being bought by ordinary people, since it is expensive and it restricts the user's mobility (e.g., eye blinks can easily introduce noise). We are currently developing a multi-modal emotion recognition scheme that will allow us to investigate other means to measure expressed emotions (e.g., through ANS responses and human locomotive features) using devices that permit mobility and are cheaper than the ESA.

    References

    [1] R. Bresin, A. Friberg, Emotional coloring of computer-controlled music performance, Computer Music Journal 24 (4) (2000) 44–62.

    [2] A. Gabrielsson, E. Lindström, The influence of musical structure on emotional expression, in: P.N. Juslin, J.A. Sloboda (Eds.), Music and Emotion: Theory and Research, Oxford University Press, New York, 2001, pp. 223–248.

    [3] B.E. Johanson, R. Poli, GP-Music: An interactive genetic programming system for music generation with automated fitness raters, Technical Report CSRP-98-13, School of Computer Science, The University of Birmingham, 1998.

    [4] P.N. Juslin, Studies of music performance: A theoretical analysis of empirical findings, in: Proc. Stockholm Music Acoustics Conference, 2003, pp. 513–516.

    [5] P.N. Juslin, J.A. Sloboda, Music and Emotion: Theory and Research, Oxford University Press, New York, 2001.

    [6] S. Kim, E. André, Composing affective music with a generate and sense approach, in: V. Barr, Z. Markov (Eds.), Proc. 17th International FLAIRS Conference, Special Track on AI and Music, AAAI Press, 2004.

    Table 2
    Results of empirical validation (average difference of the ed1 (+) and ed2 (-) emotion analysis values)

    Subject             Stressed    Joyful    Sad       Relaxed
    A                    1.67        2.33     -0.67     -3.00
    B                    0.67        0.33     -1.33      1.33
    C                    1.00       -1.00      0.67     -1.33
    D                   -1.00        0.67      0.67      2.33
    E                   -2.67        1.00      1.33      1.00
    F                    0.67        0.33      0.00     -0.67
    G                    0.67        0.33      1.67      1.33
    H                    1.00        0.00      1.33     -0.67
    I                    0.67       -0.33      1.67     -0.67
    J                    0.67        0.33     -0.33     -2.00
    K                    0.33       -0.33      0.67      0.00
    L                    0.67        0.33      2.33      0.00
    M                   -0.67        0.33      0.33     -1.33
    N                   -0.33       -2.33      1.00      2.00
    O                    0.33        0.33      0.67      1.00
    P                   -1.67       -1.67      0.00      1.00
    Average              0.13        0.04      0.63      0.02
    Sample variance      1.18        1.07      0.85      2.12
    Standard error       0.28        0.27      0.24      0.38
    t value              0.45        0.16      2.63      0.06
    Significant (5%)     False       False     True      False
    Significant (1%)     False       False     True      False


    [7] R. Legaspi, Y. Hashimoto, K. Moriyama, S. Kurihara, M. Numao, Music compositional intelligence with an affective flavour, in: Proc. 12th International Conference on Intelligent User Interfaces, ACM Press, 2007, pp. 216–224.

    [8] R. Legaspi, Y. Hashimoto, M. Numao, An emotion-driven musical piece generator for a constructive adaptive user interface, in: Proc. 9th Pacific Rim International Conference on Artificial Intelligence, Lecture Notes in Artificial Intelligence, vol. 4009, Springer, 2006, pp. 890–894.

    [9] T. Li, M. Ogihara, Detecting emotion in music, in: Proc. 4th International Conference on Music Information Retrieval, 2003, pp. 239–240.

    [10] R. López de Mántaras, J.L. Arcos, AI and music: from composition to expressive performances, AI Magazine 23 (3) (2002) 43–57.

    [11] T. Musha, Y. Terasaki, H.A. Haque, G.A. Ivanitsky, Feature extraction from EEGs associated with emotions, Artificial Life and Robotics 1 (1997) 15–19.

    [12] C. Nattee, S. Sinthupinyo, M. Numao, T. Okada, Learning first-order rules from data with multiple parts: applications on mining chemical compound data, in: Proc. 21st International Conference on Machine Learning, 2004, pp. 77–85.

    [13] M. Numao, S. Takagi, K. Nakamura, Constructive adaptive user interfaces – composing music based on human feelings, in: Proc. 18th National Conference on AI, AAAI Press, 2002, pp. 193–198.

    [14] R.W. Picard, J. Healey, Affective wearables, Personal and Ubiquitous Computing 1 (4) (1997) 231–240.

    [15] J. Posner, J.A. Russell, B.S. Peterson, The circumplex model of affect: an integrative approach to affective neuroscience, cognitive development, and psychopathology, Development and Psychopathology 17 (2005) 715–734.

    [16] J.R. Quinlan, Learning logical definitions from relations, Machine Learning 5 (1990) 239–266.

    [17] D. Riecken, Wolfgang: emotions plus goals enable learning, in: Proc. IEEE International Conference on Systems, Man and Cybernetics, 1998, pp. 1119–1120.

    [18] D. Rosenboom, Extended Musical Interface with the Human Nervous System: Assessment and Prospectus, Leonardo Monograph Series, Monograph No. 1 (1990/1997).

    [19] C. Roz, The autonomic nervous system: barometer of emotional intensity and internal conflict, a lecture given for Confer, 27 March 2001; a copy can be found at: http://www.thinkbody.co.uk/papers/autonomic-nervous-system.htm.

    [20] J.A. Sloboda, Music structure and emotional response: some empirical findings, Psychology of Music 19 (2) (1991) 110–120.

    [21] S. Tangkitvanich, M. Shimura, Refining a relational theory with multiple faults in the concept and subconcept, in: Machine Learning: Proc. of the Ninth International Workshop, 1992, pp. 436–444.

    [22] M. Unehara, T. Onisawa, Interactive music composition system – composition of 16-bars musical work with a melody part and backing parts, in: Proc. IEEE International Conference on Systems, Man and Cybernetics, 2004, pp. 5736–5741.

    [23] M. Unehara, T. Onisawa, Music composition system based on subjective evaluation, in: Proc. IEEE International Conference on Systems, Man and Cybernetics, 2003, pp. 980–986.

    [24] G.A. Wiggins, G. Papadopoulos, S. Phon-Amnuaisuk, A. Tuson, Evolutionary methods for musical composition, International Journal of Computing Anticipatory Systems 1 (1) (1999).

