
Evolutionary Music Composition
A Quantitative Approach

Johannes Høydahl Jensen

Master of Science in Computer Science
Submission date: June 2011
Supervisor: Pauline Haddow, IDI

Norwegian University of Science and Technology
Department of Computer and Information Science


    Problem Description

Artificial Evolution has shown great potential for musical tasks. However, a major challenge faced in Evolutionary Music Composition systems is finding a suitable fitness function. The aim of this research is to devise an automatic fitness function for the evolution of novel melodies.

    Assignment given: 15. January 2011

    Supervisor: Pauline Haddow, IDI


    Abstract

Artificial Evolution has shown great potential in the musical domain. One task in which Evolutionary techniques have shown special promise is the automatic creation or composition of music. However, a major challenge faced when constructing evolutionary music composition systems is finding a suitable fitness function.

Several approaches to fitness have been tried. The most common is interactive evaluation. However, major efficiency challenges with such an approach have inspired the search for automatic alternatives.

In this thesis, a music composition system is presented for the evolution of novel melodies. Motivated by the repetitive nature of music, a quantitative approach to automatic fitness is pursued. Two techniques are explored that both operate on frequency distributions of musical events. The first builds on Zipf’s Law, which captures the scaling properties of music. Statistical similarity governs the second fitness function and incorporates additional domain knowledge learned from existing music pieces.

Promising results show that pleasant melodies can emerge through the application of these techniques. The melodies are found to exhibit several favourable musical properties, including rhythm, melodic locality and motifs.


    Preface

This master’s thesis presents the research completed during my final semester at the Norwegian University of Science and Technology (NTNU). It was carried out in the period January to June 2011 at the Department of Computer and Information Science (IDI). It is a continuation of my project work done as part of the preceding 2010 semester, which resulted in a paper submitted to the Genetic and Evolutionary Computation Conference 2011.

I would like to thank my supervisor, professor Pauline Haddow, for invaluable feedback and support. Our meetings were stimulating and always left me in an uplifted state.

Bill Manaris deserves my gratitude for his help and advice. Credit is given to Classical Archives (www.classicalarchives.com) for kindly providing access to their collection of music files.

I would also like to thank my fellow students at the CRAB lab for their friendship, advice and joyful social gatherings.

Thanks go to my family for their support and especially my father for his valuable feedback. Finally, I wish to thank Mari for her help and constant encouragement.

    Johannes H. Jensen

    Trondheim, 2011


Contents

1 Introduction
  1.1 Goals and Limitations
  1.2 Overview of This Document

2 Background
  2.1 Music Terminology
  2.2 Evolutionary Computation
  2.3 Evolutionary Art and Aesthetics
  2.4 Evolutionary Music
  2.5 Music Representation
    2.5.1 Linear Representations
    2.5.2 Tree-Based Representations
    2.5.3 Phenotype Mapping
  2.6 Fitness
    2.6.1 Interactive Evaluation
    2.6.2 Hardwired Fitness Functions
    2.6.3 Learned Fitness Functions

3 Methodology
  3.1 Music Representation
    3.1.1 Introduction
    3.1.2 Parameters
    3.1.3 Functions and Terminals
    3.1.4 Initialization
    3.1.5 Genetic Operators
    3.1.6 Parsing
  3.2 Fitness
    3.2.1 Metrics
  3.3 Fitness Based on Zipf’s Law
    3.3.1 Zipf’s Law
    3.3.2 Zipf’s Law in Music
    3.3.3 Fitness Function
  3.4 Fitness Based on Distribution Similarity
    3.4.1 Metric Frequency Distributions
    3.4.2 Cosine Similarity
    3.4.3 Fitness Function
    3.4.4 Relationship to Zipf’s Law
    3.4.5 Filtering

4 Experiments: Zipf’s Law
  4.1 A Musical Representation
    4.1.1 Introduction
    4.1.2 Setup
    4.1.3 Experiment Setup
    4.1.4 Results and Discussion
    4.1.5 Summary
  4.2 Tree-Based Composition
    4.2.1 Introduction
    4.2.2 Setup
    4.2.3 Results and Discussion
    4.2.4 Summary
  4.3 Adding Rhythm
    4.3.1 Introduction
    4.3.2 Experiment Setup
    4.3.3 Results and Discussion
    4.3.4 Summary
  4.4 Conclusions

5 Experiments: Distribution Similarity
  5.1 Basics
    5.1.1 Introduction
    5.1.2 Setup
    5.1.3 Experiment Setup
    5.1.4 Results and Discussion
    5.1.5 Summary
  5.2 Improving the Basics
    5.2.1 Introduction
    5.2.2 Setup
    5.2.3 Results and Discussion
    5.2.4 Summary
  5.3 Learning From the Best
    5.3.1 Introduction
    5.3.2 Setup
    5.3.3 Experiment Setup
    5.3.4 Results and Discussion
    5.3.5 Summary
  5.4 Conclusions

6 Conclusion and Future Work

Bibliography

Appendix

Chapter 1

    Introduction

If a composer could say what he had to say in words he would not bother trying to say it in music. (Gustav Mahler)

Artificial Intelligence concerns the creation of computer systems that perform tasks that are normally addressed by humans. Music Composition is one such area. The task of creating music which human listeners would appreciate has been shown to be difficult with current AI techniques (Miranda and Biles, 2007). Although music exhibits many rational properties, it distinguishes itself by also involving human emotions and aesthetics, which are domains not fully understood and also difficult to describe mathematically.

What thought processes are followed when creating music? Asking an experienced composer is unlikely to yield a precise answer. The response might contain some general rules of thumb: a melody should encompass a few central motifs; the tones should fit the chord progression and so on. However, there are few formal rules that explain the process of music composition. The difficulty in formalizing music knowledge and understanding the process of music creation makes music a challenging domain for computers (Minsky and Laske, 1992).

Evolutionary techniques have shown to be powerful for searching in complex domains too difficult to tackle using analytical methods (Floreano and Mattiussi, 2008). It is thus perhaps not so surprising that Artificial Evolution has shown potential in the musical domain. The most popular task pursued is Evolutionary Music Composition (EMC), where the goal is to create music through evolutionary techniques. However, a major challenge with such systems is the design of a suitable fitness function.


Some researchers have considered interactive fitness functions, i.e. fitness assigned by humans. However, interactive evaluation brings many efficiency problems and concerns (see Section 2.6.1). An automatic fitness function would steer evolution towards pleasant music, without extensive human input.

The term “pleasant” is chosen here instead of the subjective term “liking” because pleasantness, in contrast to preference, shows little variation among subjects with different musical tastes (Ritossa and Rickard, 2004). In other words, pleasantness is in essence more of an invariant measurement than liking. For example, many people will admit that they find classical music pleasant, but certainly not all will claim to like classical music.

It is, however, difficult, if not impossible, to be objective when discussing music. There are simply too many factors that influence our musical perception. Thus, when music is described as “pleasant” in this thesis, it is done so in an attempt to be as objective as possible.

    1.1 Goals and Limitations

As stated in the problem description, the focus of this research is automatic fitness functions for the evolution of music. In particular, a fitness function is sought that can:

    1. Capture pleasantness in music

    2. Be applied to evolve music in different styles

An automatic fitness function able to estimate the pleasantness of music would make it possible to evolve music which is at least consistently appealing. In other words, it may serve as a baseline for the evolution of music with more sophisticated properties.

Significant variation exists within the music domain. The ability to model different musical styles would allow similar diversity in evolved music as well. Assuming personal music taste is somewhat correlated with musical style, the capability to evolve similar properties in music is a clear advantage.

Finally, in order to reduce the complexity and narrow the scope of this work, the focus is restricted to the evolution of short, monophonic melodies, i.e. one note played at a time.


    1.2 Overview of This Document

The thesis is structured as follows. Chapter 1 gives an introduction to the research domain, the motivation behind evolutionary music composition systems and the main problem areas explored by this research. In Chapter 2, important background material used throughout the thesis is covered. Music representation is discussed in detail in Section 2.5. An overview of the approaches to musical fitness functions, including a discussion of their strengths and weaknesses, is given in Section 2.6.

In Chapter 3, the theory and concepts employed in this research are presented. The music representation employed is described in Section 3.1, followed by the two approaches to fitness in Sections 3.3 and 3.4.

Chapters 4 and 5 present the experiments performed with fitness based on Zipf’s Law and distribution similarity, respectively. Conclusions from the experiments are drawn at the end of each chapter.

Finally, Chapter 6 concludes the research and suggests areas of future work.

Many music examples are given throughout the thesis, which are presented as printable music scores. The accompanying Zip archive contains both MIDI and audio renderings of all the music examples.


Chapter 2

    Background

This chapter covers important background material used throughout the rest of the thesis. Section 2.1 gives a short introduction to basic music terminology. Section 2.2 explains the main concepts behind Artificial Evolution. In Section 2.3, a short overview of Evolutionary Computation applied to aesthetic domains is given. Section 2.4 gives a survey of related work in the Evolutionary Music field.

Music representation schemes are covered in Section 2.5. Finally, in Section 2.6, approaches to music fitness are discussed.

    2.1 Music Terminology

This section gives a short introduction to the basic music terminology used throughout this document. If the reader is familiar with elementary music terminology, the section can safely be skipped.

Music comes in many forms and flavours, and the styles and conventions vary greatly across cultures. In Western music, however, the common depiction of music is that of a score. A music score represents music as sequences of notes, rests, and other relevant information like tempo, time signature, key and lyrics. Figure 2.1 shows an example music score, which has a tempo of 120 beats per minute (BPM).

Notes and rests are the primary building blocks in the music domain, much as letters are the elementary units in language. The note value determines the duration of the note, while its vertical position in the score dictates the pitch that should be played. Equivalently, the rest value denotes the length of a rest (period of silence).

Figure 2.1 Example music score (“Twinkle Twinkle Little Star”) with 4 bars, 4 beats per bar and a tempo of 120 beats per minute.

Figure 2.2 Note values (a), note modifiers (b) and note pitches (c). (a) Five common note values, from the left: the whole note, half note, quarter note, eighth note and sixteenth note. (b) Two note modifiers: an augmented quarter note (left) and a quarter note tied together with an eighth note (right). (c) Note pitches ranging from Middle C to C5, i.e. a one-octave C major scale.

Note (rest) durations are valued as fractions of the whole note, which typically lasts 4 beats. The half note lasts 1/2 of the whole note, the quarter note is 1/4th of the whole note and so on. The shape of the note (rest) determines its value (duration). Figure 2.2a shows the five most common note values.

Notes can also be augmented, denoted by a dot after the note. An augmented note lasts 1.5 times its original note value. Two notes of the same pitch may also be tied together, which indicates that they should be played as a single note, but for the duration of both. These two note modifiers are shown in Figure 2.2b. The resulting note value (duration) in both examples is the same: 1.5/4 = 1/4 + 1/8 = 3/8.

Pitches are discretized into finite frequencies, corresponding to the keys on a piano. Pitches are grouped into 12 classes ranging from C to B. The accidentals modify a pitch: the sharp ♯ raises the pitch by a semitone (half-step) and the flat ♭ lowers it by the same amount. The interval between two pitches of the same class is called an octave, where the upper pitch has a frequency that is twice that of the lower pitch. Figure 2.2c shows how the different pitches are organized in a music score.

Musical phrases commonly follow a scale which denotes which pitches are to be present. For instance, the C major scale consists of all the white keys on the piano, while the chromatic scale contains all keys (both white and black).

A music score divides time into discrete time intervals called bars (measures), which are separated by a vertical line. The time signature appears at the start of the score: the upper number specifies how many beats are in a bar and the lower number determines which note value forms a beat. For instance, a time signature of 3/4 (the waltz) means that there are 3 beats in a bar and the quarter note constitutes a beat. The score in Figure 2.1 consists of 4 bars and has a time signature of 4/4.

Looser musical structures are also useful when describing music. Motifs are short musical ideas, i.e. short sequences of notes which are repeated and unfolded within a melody. Several motifs can be combined to form a phrase. A theme is the central material, usually a melody, on which the music is founded.

    2.2 Evolutionary Computation

Evolutionary Computation (EC) incorporates a wide set of algorithms that take inspiration from natural evolution and genetics. Applications include solving hard optimization problems, design of digital circuits, creation of novel computer programs and several other areas that are typically addressed by human design. Most evolutionary algorithms are based on the same key elements found in natural evolution: a population, diversity within the population, a selection mechanism and genetic inheritance (Floreano and Mattiussi, 2008).

In an Evolutionary Algorithm (EA), a population of individuals is maintained that represents possible solutions to a target problem. Each solution is encoded as a genome which, when decoded, yields the candidate solution. The structure of the genome is called its genotype, and a wide range of possible representations exist. The phenotype describes the form of the individual and is mostly problem specific. While the phenotype directly describes a solution, the genotype typically employs a more low-level and compact encoding scheme.

An EA typically operates on generations of populations. For each evolutionary step, a selection of individuals are chosen for survival to the next generation. The selection mechanism typically favours individuals according to how well they solve the target problem; that is, according to their fitness.

Tournament selection is a popular selection mechanism, in which several “tournaments” are held between k randomly chosen individuals. With probability 1 − e, the individual with the highest fitness wins the tournament. With probability e, however, a random individual is chosen instead. This mechanism promotes genetic diversity while at the same time maintaining selection pressure.
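As a concrete illustration, here is a minimal tournament selection sketch in Python, following the k contestants and 1 − e winner rule described above; the list-based population and the fitness callable are assumptions made for the example, not details of any particular system.

```python
import random

def tournament_select(population, fitness, k=3, e=0.1):
    """Pick one individual via tournament selection.

    population : list of genomes
    fitness    : function mapping a genome to a numeric score
    k          : tournament size (number of random contestants)
    e          : probability of returning a random contestant instead
                 of the fittest one (promotes genetic diversity)
    """
    contestants = random.sample(population, k)
    if random.random() < e:
        return random.choice(contestants)   # random winner
    return max(contestants, key=fitness)    # fittest contestant wins

# Usage sketch: pick a parent for the next generation.
# parent = tournament_select(population, my_fitness_function)
```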

The fitness function takes as input an individual which is evaluated to produce a numerical fitness score. As such, it is highly problem specific and often involves multiple objectives. A common technique is to combine several objectives as a weighted sum.

In order to explore the solution space, diversity is maintained within the population through genetic operators like mutation and crossover. Random mutations are introduced to the population in order to explore the close neighbourhood of each solution, in search of local optima. That is, mutations cause exploitation of the best solutions found so far.

Many EAs also include a second selection phase called reproduction. During reproduction, two (or more) parent individuals are chosen for mating, and their genetic material is combined to produce one or more offspring. Each offspring then inherits genetic material from its parents. The idea is that combining two good solutions might result in an even better solution. The combination of different genetic material is addressed by the crossover operator, of which many variations exist. Crossover ensures exploration of the solution space.

The evolutionary loop continues like this with selection, (reproduction) and mutation through many generations until some stopping criterion is met, and a satisfying solution emerges from the population. Figure 2.3 depicts the evolutionary loop and its main components.


Figure 2.3 A high level diagram of the evolutionary loop: initialization, population, fitness, selection and diversity.

    In general, the process of constructing an EA involves:

1. Design of a genetic representation (genotype) and its corresponding genetic operators (mutation, crossover, etc.)

2. Creation of a mapping from the genotype to the phenotype

    3. Choosing the selection mechanisms

    4. Design of a fitness function which measures the quality of each solution

Different types of EAs mainly employ different genetic representations and operators. Genetic Algorithms (GA) operate on binary representations, Genetic Programming (GP) uses a tree-based scheme and Evolutionary Programming (EP) operates directly on phenotype parameters. For a detailed introduction to Evolutionary Computation, see Floreano and Mattiussi (2008).


    2.3 Evolutionary Art and Aesthetics

Music is an art form and thus exhibits an important aesthetic aspect. This section gives a brief overview of how evolutionary computation has been applied in other aesthetic domains.

The first to utilize evolutionary techniques in the aesthetic field was Richard Dawkins, the evolutionary biologist of later renown, who in 1987 devised “Biomorphs”, a program that let the user evolve graphical stick figures. Since then, a myriad of evolutionary art systems have been developed (Bentley and Corne, 2002).

Another early pioneer in the field was Karl Sims, who applied interactive evolution to computer graphics. Both 3D and 2D images as well as animations were explored, using a symbolic Lisp tree representation. Completely unexpected images could emerge, some remarkably beautiful (Sims, 1991). He presented his work at the Centre Pompidou in Paris in 1993. Named Genetic Images, the artwork allowed museum visitors to serve as selectors for new generations of evolved images.

Another similar example is NEvAr by Machado and Cardoso (2002), a system which lets the user evolve colour images using Genetic Programming and user-provided fitness assessment.

Secretan et al. (2008) take the concept a step further with Picbreeder – a web-based service where users can evolve pictures collaboratively. It offers an online community where pictures can be shared and evolved further into new designs.

Common to these examples is the interactive approach to fitness, where humans pass aesthetic judgement on the generated pieces. Since humans act as selectors for future generations, these systems are good demonstrations of the power of evolutionary techniques in the aesthetic domains.

Evolution has also been applied in other areas where aesthetics play an important role. Architecture, sculptures, ship design and sound are just a few examples. For a thorough overview of evolutionary art, see Bentley and Corne (2002).


    2.4 Evolutionary Music

One of the early examples of Evolutionary Computation applied to music is Horner and Goldberg (1991), where a Genetic Algorithm was used to perform thematic bridging. Since then, EC has been used in a wide range of musical tasks, including sound synthesis, improvisation, expressive performance analysis and music composition.

Evolutionary Music Composition (EMC) is the application of Evolutionary Computation to the task of creating (generating) music. In EMC systems, the genotype is typically binary (GA) or tree-based (GP). The phenotype is the music score – sequences of notes and rests that make up the composition.

As mentioned earlier, the fitness function poses a significant challenge when constructing evolutionary composition systems. Approaches include interactive evaluation, hardwired rules and machine learning methods.

One well-known EMC system is GenJam, a genetic algorithm for generating jazz solos (Biles, 1994). A binary genotype is employed, with musical (domain specific) genetic operators. GenJam uses interactive fitness evaluation, where a human musical mentor listens through and rates the evolved music phrases.

Johanson and Poli (1998) developed a similar EMC system called GP-Music which can evolve short musical sequences. Like GenJam, it relies on interactive fitness evaluation, but employs a tree-based genotype instead of a bit string.

Motivated by subjectivity and efficiency concerns with interactive fitness assessment, Papadopoulos and Wiggins (1998) developed an automatic fitness function based on music-theoretical rules. It was applied in a GA for the evolution of jazz melodies, with some promising results.

Another similar example is AMUSE, where a hardwired fitness function is used to evolve melodies in given scales and chord progressions (Özcan and Erçal, 2008).

Phon-Amnuaisuk et al. (1999) used a GA to generate traditional musical harmony, i.e. chord progressions. A fitness function derived from music theory was used and surprisingly successful results are reported.

The Swedish composer Palle Dahlstedt presents an evolutionary system which can create complete piano pieces. By using a recursive binary tree representation, combined with formalized fitness criteria, convincing contemporary performances have been generated (Dahlstedt, 2007).


Some have considered probabilistic inference as a framework for automatic fitness functions. A recent example is Little Ludwig (Bellinger, 2011), which learns to compose in the style of known music. Evolved music is evaluated based on the probability of note sequences with respect to some inspirational music piece.

This has been a short introduction to the EMC field and is by no means exhaustive. For a more in-depth survey, the reader is referred to a book dedicated to the subject – “Evolutionary Computer Music” (Miranda and Biles, 2007).

    2.5 Music Representation

The genotypes most commonly used in EMC systems are binary bit strings (GA) and trees (GP). Although the choice of genotype differs, music domain knowledge is commonly encoded into the representation in order to constrain the search space and get more musical results.

Discrete MIDI¹ pitches are usually employed instead of pitch frequencies. They range from 0 to 127 and include all the note pitches possible in a music score.

    2.5.1 Linear Representations

Linear genotypes encode music as a sequence of notes, rests and other musical events in an array-like structure. Binary genotypes fall under this category, where the sequence is encoded in a string of bits. Symbolic vectors are also common, which are at a higher level of abstraction and somewhat easier to deal with. Linear genotypes fall under the Genetic Algorithm (GA) class of EAs.

The genotype is typically of a fixed size specified by the user, dividing the genotype into N slots which each can hold a musical event. This effectively limits the length of the evolved music, either directly or implicitly. A position-based scheme is usually employed, i.e. a note’s position in the score is implicitly determined by its location in the genome.

¹ MIDI (Musical Instrument Digital Interface) is a standardized protocol developed by the music industry which enables electronic instruments to communicate with each other. For more information see the website http://www.midi.org/.

Figure 2.4 Binary genome with implicit note durations: 1 0111100 (note C), 1 0111100 (note C), 1 1000000 (note E), 0 . . . (hold), 1 1000011 (note G), 0 . . . (hold). Each event is one byte long. The first bit denotes the event type, either note (1) or hold (0). The remaining seven bits contain the MIDI pitch to be played for the note event and are simply ignored in the case of a hold event.

Figure 2.5 Music score corresponding to the genome in Figure 2.4, given that event durations are one quarter note long.

Musical events can include notes, rests, chords, dynamics (e.g. loud or soft), articulation (e.g. staccato) and so on. The duration of an event is either specified explicitly in the event, or determined implicitly, e.g. by the events that follow.

Papadopoulos and Wiggins (1998) use a symbolic vector genotype, where the duration of each event is explicitly defined. Rather than absolute pitches, a degree-based scheme is used where the pitches are mapped to degrees of a pre-defined scale. Genomes are thus sequences of (degree, duration) tuples.

An implicit duration scheme is found in GenJam’s binary genotype. Three types of events are supported: note, rest and hold. A note specifies the start of a new note. Similarly, a rest event dictates the beginning of a rest. Both events implicitly mark the end of any previous note or rest. Finally, a hold event results in the lengthening of a preceding note (rest) by a fixed amount (Biles, 1994).

A binary genome with an implicit duration scheme can be seen in Figure 2.4. Each event is one byte long – one bit for the event type and seven bits for the MIDI pitch to be played. In the case of a hold event, the pitch field is simply ignored. Figure 2.5 shows the corresponding music score, given that event durations are one quarter note long.
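To make the implicit-duration scheme concrete, here is a small decoding sketch in Python. It follows the byte layout described for Figure 2.4 (note/hold events only), but it is an illustration rather than GenJam’s actual code; the function name and the quarter-note base duration are assumptions.

```python
def decode_genome(event_bytes, base_duration=0.25):
    """Decode a byte-per-event genome into (midi_pitch, duration) notes.

    Each event is one byte: the top bit is the event type
    (1 = note, 0 = hold), the remaining 7 bits hold the MIDI pitch
    (ignored for hold events). A hold lengthens the preceding note
    by one base duration (a quarter note, 0.25, in this sketch).
    """
    notes = []
    for byte in event_bytes:
        is_note = (byte >> 7) & 1
        pitch = byte & 0x7F
        if is_note:
            notes.append([pitch, base_duration])   # start a new note
        elif notes:
            notes[-1][1] += base_duration          # extend the previous note
    return [(p, d) for p, d in notes]

# The genome of Figure 2.4 (note C, note C, note E, hold, note G, hold)
# decodes to C (1/4), C (1/4), E (1/2), G (1/2):
genome = [0b10111100, 0b10111100, 0b11000000, 0b00000000, 0b11000011, 0b00000000]
print(decode_genome(genome))   # [(60, 0.25), (60, 0.25), (64, 0.5), (67, 0.5)]
```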


Music knowledge is often embedded into the genetic operators. For example, instead of the random point mutation traditionally employed in GAs, various musical permutations can be used. Examples include reversing, rotating and transposing random sequences of notes, as well as copying a fragment from one location to another.
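As a sketch of what such musical mutations might look like on a simple list of (pitch, duration) events, consider the following; these are generic, assumed implementations for illustration, not the operators of any particular system.

```python
import random

def reverse_fragment(melody):
    """Reverse a randomly chosen slice of the melody."""
    i, j = sorted(random.sample(range(len(melody) + 1), 2))
    return melody[:i] + melody[i:j][::-1] + melody[j:]

def transpose_fragment(melody, semitones=2):
    """Shift the pitches of a random slice by a fixed number of semitones."""
    i, j = sorted(random.sample(range(len(melody) + 1), 2))
    shifted = [(p + semitones, d) for p, d in melody[i:j]]
    return melody[:i] + shifted + melody[j:]

def copy_fragment(melody):
    """Copy a random slice and insert it at another random position."""
    i, j = sorted(random.sample(range(len(melody) + 1), 2))
    k = random.randrange(len(melody) + 1)
    return melody[:k] + melody[i:j] + melody[k:]

melody = [(60, 0.25), (62, 0.25), (64, 0.5), (67, 0.5)]   # C D E G
print(reverse_fragment(melody))
```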

Finally, linear genotypes are simple to use, since their structure closely resembles a music score. Interpretation of the genome is thus fairly straightforward for a human.

    2.5.2 Tree-Based Representations

Tree-based genotypes encode music as a tree where the leaf nodes (the terminals) contain the notes and rests, and the inner nodes represent operators that perform some function on the contents of their sub-trees. The tree is recursively parsed to produce a music score. The functions must minimally include some form of concatenation to be able to produce sequences of notes, but often other musical operators are used as well. Tree-based genotypes are part of the Genetic Programming (GP) family of EAs.

Some argue that GP is well suited for music because the tree closely resembles the hierarchical structure found in music, e.g. short sequences of notes make a motif, several motifs form a phrase which combines into melodies and larger sections (Minsky and Laske, 1992).

Johanson and Poli (1998) employ seven different operators (functions), including concatenation, repetition, note elongation, mirroring and transposition of notes. Manaris et al. (2007) also include functions for polyphony, retrograde and inversion.

Figure 2.6 depicts an example tree-based genome for music, with three different operators. The leaf nodes are (pitch, duration) pairs. The repeat operator causes the notes in its sub-tree to be played twice, while slow doubles their duration. The corresponding music score can be seen in Figure 2.5.

As can be seen, music knowledge is embedded into the function nodes. As such, the complexity lies in the interpretation of the genome and the mapping to a music score, i.e. in the phenotype, at a higher level of abstraction than in the linear genotypes.
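A minimal recursive interpreter for a tree like the one in Figure 2.6 could look as follows. This is a sketch using nested Python tuples with MIDI pitch numbers in the terminals; it is illustrative only and is not the representation used later in this thesis, which restricts the function set to concatenation.

```python
def interpret(node):
    """Recursively flatten a tree genome into a list of (pitch, duration) notes.

    A terminal is a (midi_pitch, duration) tuple; an inner node is
    (operator, child, ...) with the operators of Figure 2.6.
    """
    if isinstance(node[0], str):                       # function node
        op, children = node[0], node[1:]
        notes = [n for child in children for n in interpret(child)]
        if op == "concatenate":
            return notes
        if op == "repeat":                             # play the sub-tree twice
            return notes + notes
        if op == "slow":                               # double every duration
            return [(p, 2 * d) for p, d in notes]
        raise ValueError(f"unknown operator: {op}")
    return [node]                                      # terminal note

# The tree of Figure 2.6: concatenate(repeat(C 1/4), slow(concatenate(E 1/4, G 1/4)))
tree = ("concatenate",
        ("repeat", (60, 0.25)),
        ("slow", ("concatenate", (64, 0.25), (67, 0.25))))
print(interpret(tree))   # [(60, 0.25), (60, 0.25), (64, 0.5), (67, 0.5)]
```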

Figure 2.6 An example tree-based music genome: concatenate(repeat((C, 1/4)), slow(concatenate((E, 1/4), (G, 1/4)))). The leaves contain (pitch, duration) pairs and the non-leaf operators perform concatenation, repetition or slowing of the notes in their respective sub-trees. The corresponding music score can be seen in Figure 2.5.

Since the domain knowledge lies at the phenotype level (when parsing the tree), traditional genetic operators from GP can be used without further modification. Mutation simply replaces a random node with a randomly generated sub-tree, while crossover swaps two randomly selected sub-trees between genomes. Care must be taken when performing these operations so that the tree doesn’t grow out of bounds, resulting in extremely long pieces of music. Typically a maximum depth parameter is implemented to limit the height of the tree.

Tree-based genotypes are dynamic in size, meaning that the length of the genomes can change across generations. This in turn allows variation in the length of the evolved music. Introducing new music knowledge or adding more musical possibilities (like polyphony or dynamics) is also fairly straightforward, and simply involves implementation of a new function node. The structure of the genotype remains the same.

It should be noted that trees are more complex structures than vectors and are therefore more difficult to use. For a human, translation of a tree into a music score is not exactly child’s play.

    2.5.3 Phenotype Mapping

The translation from genotype to phenotype often adds further music knowledge to the system, typically in the form of musical constraints.


One common technique is to map the pitches in the genotype to a pre-defined scale. The scale can either be kept the same for the entire score, or follow some chord progression specified by the user.

GenJam makes use of this approach, where jazz melodies are mapped to scales according to a predefined chord progression. This allows the melodies to be applied to many different musical contexts. Since the notes are always in scale, the system never plays a “wrong” note. This scheme effectively reduces the amount of unpleasant melodies the mentor has to listen through (Biles, 1994). In the context of EAs, the search space is reduced to melodies in certain scales.

In many musical genres, however, breaking the “rules” from time to time is encouraged and can give rise to much more interesting music. A jazz soloist who always keeps to the “correct” scales will usually deliver a rather boring performance.

Another feature often employed is that of a reference pitch which specifies the lowest possible pitch to consider during the mapping process. A reference duration can also be considered, to effectively adjust the tempo of the resulting score (given a fixed BPM).

    2.6 Fitness

As mentioned, musical fitness is the primary focus of this research and has proven to be the most challenging aspect of EMC systems. Some measurement of the “goodness” of a music piece is required in order to guide evolution towards aesthetically pleasing individuals. In other words, a fitness function is needed which captures the human perception of pleasantness in music.

Unfortunately, as discussed in Chapter 1, human music creation is an elusive process, and exactly what constitutes pleasant music is also not fully understood. Furthermore, personal musical taste plays an important role in the liking of a given piece of music. Certain styles of music will appeal to some people, while not to others. A fitness function which can be tuned to favour certain musical styles would certainly be beneficial, allowing the generation of a wider spectrum of music.


The approaches to musical fitness can roughly be divided into three categories:

• Interactive evaluation: fitness assigned by humans, e.g. Biles, 1994.

• Hardwired fitness functions: rules based on music theory or experience, e.g. Papadopoulos and Wiggins, 1998; Dahlstedt, 2007.

• Learned fitness functions: Artificial Neural Networks (ANNs), Machine Learning, Markov Models, e.g. Bellinger, 2011.

A more detailed explanation of the different approaches to fitness follows, with a discussion of their main strengths and weaknesses.

    2.6.1 Interactive Evaluation

Probably the most precise fitness assessment available is from the target audiences themselves. If humans evaluate the fitness of the music pieces, the problem of formalizing music knowledge is avoided. Surely this produces a much more accurate assessment of the “goodness” of a music piece than any known algorithmic method?

With interactive fitness functions, one or more human mentors will sit down and listen through each evolved music piece and give them a fitness score, either explicitly or implicitly.

In GenJam, for example, the mentor will press ’G’ on his keyboard a number of times to indicate that he likes what was just played. Similarly, to indicate that a portion of the music was bad, he presses ’B’. Fitness is then simply the number of Gs minus the number of Bs (Biles, 1994).

One of the main problems with the interactive approach is that for every music piece in the population, the mentors must listen carefully through each one and determine which ones are better. This process has to be repeated for each generation.

In the visual art domains, this is less problematic because several images can be presented to the mentor at the same time and thus be compared more quickly and easily. Music, however, is temporal in nature. A piece of music always has to be presented in the correct tempo, and the length of the piece dictates the minimum time required for evaluation. Finally, several music pieces cannot be presented simultaneously, making the evaluation process much more time consuming than with images.

This issue has been termed “the fitness bottleneck” and is the main limiting factor for the population size and number of generations (Biles, 1994). Interactive fitness might provide a good evaluation, but clearly at the expense of efficiency.

Another important problem with the interactive approach is subjectivity. The humans responsible for evaluating each music piece will most likely be biased towards their own musical taste. This can be countered by increasing the number of mentors to gain statistical significance. However, much care must be taken when selecting the participants. The number of people, their age, background etc. are parameters which will clearly affect the results. Determining the important parameters and gathering the right people is a challenging task in itself.

Furthermore, providing consistent evaluation is hard, and it is likely that the mentors will be biased by previous listenings, mood or even boredom. Finally, interactive fitness tells us very little about the processes involved in music composition. The music knowledge which is applied is hidden away in the mentor’s mind and is for that reason of limited research value (Papadopoulos and Wiggins, 1998).

    2.6.2 Hardwired Fitness Functions

Another approach to musical fitness is to study music theory (or other sources) for best practices in music composition. This typically involves examination of relevant music theory and translation of this knowledge into algorithmic fitness assessments. Such methods attempt to address the challenges found in interactive fitness functions.

A fitness function for melodies based on jazz theory was designed by Papadopoulos and Wiggins (1998). The function consists of a weighted sum of eight sub-objectives related to melodic intervals, pattern matching, rhythm, contour and speed. The authors report that their system generated subjectively interesting melodies, although few examples are provided. Notably, the more music knowledge was encoded into the system, the better the results.


A similar rule-based approach is found in Özcan and Erçal (2008). A survey performed with 36 students revealed that the subjects were unable to differentiate between the melodies created by the system and those of a human amateur musician. Further, tests showed a correlation between the fitness of a melody and its statistical rank as given by the human evaluators.

As discussed above, hardwired fitness functions can yield good results. However, they can often be quite challenging to design. The rules of thumb found in most music literature are often vague and hard to interpret algorithmically. They are only best practices and are certainly not followed rigidly by most artists. Furthermore, sufficient knowledge might not even be available in the literature for certain styles of music.

Another issue is scalability. Hardwired fitness functions tend to become highly specialized towards some small subset of music. In order to evolve music in another style, the rules that make up the fitness have to be altered and redesigned by hand – a potentially challenging and time consuming process.

    2.6.3 Learned Fitness Functions

Because of the difficulty in hand-designing good fitness functions for music, many researchers have turned to learned fitness functions for possible solutions. In this approach, machine learning techniques are used to train the fitness function to evaluate music pieces. Such systems learn by extracting knowledge from examples (training data) which are utilized to evaluate new unseen music pieces.

One of the main advantages of machine learning approaches is that they typically require less domain knowledge than hardwired fitness functions. Furthermore, they might discover novel aspects of music which could otherwise be missed by human experts.

Another important advantage is that of adaptability. Hardwired fitness functions are challenging and time consuming to create, and the design process has to be repeated for each musical style. A learned fitness function would ideally only require a new set of training data for analysis.

Unfortunately, machine learning techniques are not powerful enough to process raw music directly. For instance, passing an entire music score as the input to an ANN will not only require an infeasible number of neurons, but the network is unlikely to succeed in extracting any relevant knowledge.

Some form of feature extraction is necessary to both reduce the dimensionality of the problem and assist the algorithm by identifying potentially useful musical characteristics. Identifying features which are musically meaningful is crucial for the algorithm to successfully learn anything.

Relevant, unbiased training data is essential, as well as collecting a sufficient amount of material in order to achieve good performance (Duda et al., 2006). For instance, a collection of music pieces used as training data should include music in all relevant musical styles and by many different authors.

In an attempt to improve the efficiency of GenJam, an ANN was trained based on data gathered from interactive evaluation runs. The hope was that the neural network would learn how to evaluate new music. However, the results were unsuccessful, with diverging fitness for nearly identical genomes (Biles et al., 1996). A similar attempt was made in Johanson and Poli (1998) with inconclusive results – some of the evolved melodies were reportedly nice while others were rather unpleasant.

Even though learned fitness functions have great potential, they are challenging to design and tune. As mentioned, the disadvantages include sensitivity to the input data, many input parameters and difficult feature selection.

Chapter 3

    Methodology

This thesis proposes a quantitative approach to Evolutionary Music Composition. That is, the fitness function operates on the frequency of music events. The evolutionary system is designed to create short, monophonic melodies.

The term “event” is used broadly here, meaning an occurrence of any kind within a music piece. In other words, an event is not restricted to single notes, but can be the relationship between notes or pairs of notes, for example. The types of events explored are covered in Section 3.2.1: Metrics.

Two different techniques for processing the frequency distributions are investigated, i.e. two fitness functions. The first builds on Zipf’s Law, which measures the scaling properties in music and is described in Section 3.3. The second technique is based on the similarity between frequency distributions and is presented in Section 3.4.

A tree-based representation is employed based on Genetic Programming (see Section 2.5.2). Many of the features described in Section 2.5.3 are also included. A detailed description of the genotype and phenotype is given in Section 3.1.

    3.1 Music Representation

In previous work by the author (Jensen, 2010), a symbolic, vector-based genotype was used to evolve music. As discussed in Section 2.5.1, linear representations (e.g. binary, vectors) are commonly found in the EMC literature. They all fall under the Genetic Algorithm class of Evolutionary Computation.


The other type of representation commonly employed is the tree-based genotype, which falls under the Genetic Programming umbrella. It has been argued that GP is more suitable because the tree closely resembles the hierarchical structure found in music (see Section 2.5.2).

    3.1.1 Introduction

In this work, a tree-based genotype is employed, which was shown to outperform the vector-based genotypes previously used (see Section 4.1). As discussed earlier, music is encoded in a tree where the leaf nodes (terminals) contain notes and the inner nodes represent functions that perform some operations on their sub-trees. Recursively parsing a tree results in a sequence of notes – the music score (see Section 2.5.2).

The following sections give an in-depth description of the tree-based representation used throughout this research.

    3.1.2 Parameters

The genotype and phenotype take several parameters which determine various aspects of the representation. They are summarized here and described in more detail in the sections that follow; an illustrative configuration sketch is given after the two parameter lists below.

    Genotype Parameters

    Pitches: Number of pitches (integer)

    Durations: Number of durations (integer)

    Max-depth: Maximum tree depth (integer)

    Initialization method: Tree generation method (“grow” or “full”)

    Function probability: Probability of function nodes (float)

    Terminal probability: Probability of terminal nodes (float)


    Phenotype Parameters

    Pitch reference: Lowest possible pitch (integer)

    Scale: A musical scale mapping (list)

    Resolution: Base note duration (integer)

    Duration map: Set of durations to use (list)
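As referenced above, the parameters can be gathered into a single configuration object. This is purely an illustrative sketch: apart from the 90%/10% node probabilities, the resolution of 16 and the pitch reference of 60 used as examples elsewhere in this chapter, the default values shown are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class RepresentationConfig:
    # Genotype parameters
    pitches: int = 12                  # number of genotype pitch values
    durations: int = 4                 # number of genotype duration values
    max_depth: int = 5                 # maximum tree depth
    init_method: str = "grow"          # tree generation method: "grow" or "full"
    function_prob: float = 0.9         # probability of choosing a function node
    terminal_prob: float = 0.1         # probability of choosing a terminal node

    # Phenotype parameters
    pitch_reference: int = 60          # lowest possible MIDI pitch (Middle C)
    scale: List[int] = field(default_factory=lambda: list(range(12)))  # chromatic
    resolution: Optional[int] = 16     # base note duration (16 = sixteenth notes)
    duration_map: Optional[List[float]] = None  # explicit set of durations, if any
```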

    3.1.3 Functions and Terminals

Inner nodes have two children, i.e. functions take two arguments, thus resulting in binary trees. The function set only contains one function, concatenation, which is denoted by a “+”. Concatenation simply connects the notes from its two sub-trees to form a longer sequence. Although it would be interesting to include other functions as well, it was decided to keep things simple so that the fitness function would be responsible for most of the music knowledge.

The set of terminals contains the notes, which are represented as (pitch, duration) tuples. Pitches and durations are both positive integers, which are in the range [0, N) where N is dictated by the pitches or durations parameters. The way the pitches and durations are interpreted depends on the phenotype parameters and is detailed in Section 3.1.6. Note that rests are not included in the representation.

    3.1.4 Initialization

Initialization of random genome trees is performed on two different occasions:

    1. When creating the initial population at the beginning of the EA.

    2. When generating random sub-trees for mutation.

Trees are generated in a recursive manner, where in each step a random node from either the function set or the terminal set is chosen. Tree growth is constrained by the max-depth parameter, which determines the maximum height of the generated trees.


The initialization method determines how each node is chosen. With the full method, a function node is always chosen before the maximum depth is reached, after which only terminals can be selected. This results in full binary trees with 2^D leaf nodes (notes), where D is the maximum depth. With the grow method, function and terminal nodes are chosen randomly according to the function- and terminal-probability parameters, respectively. The resulting trees are therefore of varying size and shape. See also Koza (1992).
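A sketch of the grow and full generation methods follows, in illustrative Python, assuming the tuple-based tree form used in the other sketches in this chapter: (pitch, duration) terminals and binary “+” concatenation nodes. The function and parameter names are assumptions.

```python
import random

def random_terminal(n_pitches, n_durations):
    """A terminal note: (pitch, duration) with integer fields in [0, N)."""
    return (random.randrange(n_pitches), random.randrange(n_durations))

def random_tree(depth, max_depth, method="grow",
                function_prob=0.9, n_pitches=12, n_durations=4):
    """Recursively generate a random genome tree.

    With "full", a function node is always chosen until max_depth is
    reached; with "grow", function nodes are chosen with probability
    function_prob, so the trees vary in size and shape.
    """
    if depth >= max_depth:
        return random_terminal(n_pitches, n_durations)
    if method == "full" or random.random() < function_prob:
        left = random_tree(depth + 1, max_depth, method, function_prob,
                           n_pitches, n_durations)
        right = random_tree(depth + 1, max_depth, method, function_prob,
                            n_pitches, n_durations)
        return ("+", left, right)                  # concatenation node
    return random_terminal(n_pitches, n_durations)

genome = random_tree(depth=0, max_depth=3, method="full")   # full tree, 2^3 = 8 notes
```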

    3.1.5 Genetic Operators

The traditional GP operators of mutation and crossover are adopted. Mutation replaces a random node with a randomly generated sub-tree. Crossover swaps two random sub-trees between genomes. Both operators conform to the maximum depth, meaning that the resulting genomes will never be higher than this limit.

For both operators, function and terminal nodes are selected according to the function- and terminal probability, respectively. As suggested in Koza (1992), the default function probability is 90% and the terminal probability 10%.
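The two operators can be sketched as follows, again on the tuple-based trees; this is an illustration only, in which node selection is simplified to a uniform choice over all nodes and the max-depth check is omitted for brevity.

```python
import random

def all_paths(tree, path=()):
    """List the paths (as child-index tuples) to every node in the tree."""
    paths = [path]
    if isinstance(tree[0], str):                   # function node "+"
        for i, child in enumerate(tree[1:], start=1):
            paths += all_paths(child, path + (i,))
    return paths

def get_node(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace_node(tree, path, new):
    if not path:
        return new
    children = list(tree)
    children[path[0]] = replace_node(tree[path[0]], path[1:], new)
    return tuple(children)

def mutate(tree, random_subtree):
    """Replace a random node with a randomly generated sub-tree.

    random_subtree is any zero-argument function returning a new
    sub-tree (e.g. the random_tree sketch above)."""
    return replace_node(tree, random.choice(all_paths(tree)), random_subtree())

def crossover(a, b):
    """Swap two randomly selected sub-trees between two genomes."""
    pa, pb = random.choice(all_paths(a)), random.choice(all_paths(b))
    return (replace_node(a, pa, get_node(b, pb)),
            replace_node(b, pb, get_node(a, pa)))

a = ("+", (0, 2), ("+", (2, 1), (4, 1)))
b = ("+", (5, 0), (7, 3))
print(crossover(a, b))
```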

    3.1.6 Parsing

Parsing of the tree is done at the phenotype level, where the tree is recursively traversed. The result is the sequence of notes that make up the music score.

Genotype pitches p in the terminal nodes are offset by the pitch reference and mapped to the specified musical scale:

pitch = pitch_ref + 12 · ⌊p/N⌋ + scale[p mod N]

where N is the length of the scale list. In words, the second term calculates the octave while the last term performs the scale mapping. The resulting integer is a MIDI pitch number.

    The genotype durations d (positive integers) can be interpreted in two ways:

Figure 3.1 An example tree genome (a) and the resulting music score (b). The genome is the tree +(+((0,2), (2,1)), +((4,1), (5,0))).

1. If the resolution parameter r is specified, the duration is computed as 2^d / r. For a resolution of 16, this results in the real-valued durations 1/16, 1/8, 1/4 and 1/2 for d ∈ [0, 3].

2. If a duration map is given, d is interpreted as the index of an element in this map, i.e. duration[d]. This allows the use of durations that are not easily enumerated.

Figure 3.1 shows an example tree genome and the resulting music score after parsing, using a resolution of 16, pitch reference set to 60 (Middle C) and the chromatic scale (no scale).
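A sketch of this parsing step is given below: an illustrative Python rendering of the mapping just described, using the tuple-based tree form from the earlier sketches (function and parameter names are assumptions). Applying the two formulas above to the genome of Figure 3.1(a) with these settings yields the MIDI pitches 60, 62, 64 and 65 with durations 1/4, 1/8, 1/8 and 1/16.

```python
def parse(tree, pitch_ref=60, scale=tuple(range(12)), resolution=16, duration_map=None):
    """Recursively parse a genome tree into a list of (midi_pitch, duration) notes."""
    if isinstance(tree[0], str):                   # "+" concatenation node
        return (parse(tree[1], pitch_ref, scale, resolution, duration_map) +
                parse(tree[2], pitch_ref, scale, resolution, duration_map))

    p, d = tree                                    # terminal (pitch, duration)
    n = len(scale)
    midi_pitch = pitch_ref + 12 * (p // n) + scale[p % n]
    if duration_map is not None:
        duration = duration_map[d]                 # explicit duration table
    else:
        duration = 2 ** d / resolution             # e.g. 1/16, 1/8, 1/4, ... for r = 16
    return [(midi_pitch, duration)]

# The genome of Figure 3.1(a), with resolution 16, pitch reference 60 (Middle C)
# and the chromatic scale:
genome = ("+", ("+", (0, 2), (2, 1)), ("+", (4, 1), (5, 0)))
print(parse(genome))   # [(60, 0.25), (62, 0.125), (64, 0.125), (65, 0.0625)]
```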

    3.2 Fitness

As mentioned, two approaches to automatic fitness for music are explored. The first is based on Zipf’s Law, and is described further in Section 3.3. The second is based on distribution similarity and is detailed in Section 3.4.

Common to the two fitness functions is the quantitative approach – they operate on the frequency of musical events. The difference is how these frequency distributions are utilized to produce a fitness score. Figure 3.2 depicts the structure and relationship between the fitness functions. For both functions, the fitness score is calculated with respect to a set of target measurements.

Figure 3.2 Diagram of the two fitness functions, the first based on Zipf’s Law and the second on distribution similarity. Metric frequency distributions are extracted from the music and compared against a set of target measurements (target slopes for the Zipf fitness, target distributions for the similarity fitness) to produce the fitness score.


    3.2.1 Metrics

The musical events are produced by different metrics, which are summarized here and described below. Most of the metrics are derived from Manaris et al. (2007).

    Pitch: Note pitches (p ∈ [0, 127])

    Chromatic-tone: Note pitches modulo 12 (ct ∈ [0, 11])

    Duration: Note durations (d ∈ R+)

    Pitch duration: Note pitch and duration pairs (p, d)

    Chromatic-tone duration: Chromatic tone and duration pairs (ct, d)

Pitch distance: Time intervals between pitch repetitions (t_p ∈ R+)

Chromatic-tone distance: Time intervals between chromatic-tone repetitions (t_ct ∈ R+)

Melodic interval: Musical intervals within the melody (mi = p_i − p_(i−1), or the absolute form mi = |p_i − p_(i−1)|)

Melodic bigram: Pairs of adjacent melodic intervals (mi_i, mi_(i+1))

Melodic trigram: Triplets of adjacent melodic intervals (mi_i, mi_(i+1), mi_(i+2))

Rhythm: Note durations plus subsequent rests (r ∈ R+)

Rhythmic interval: Relationship between adjacent note rhythms (ri_i = r_i / r_(i−1))

Rhythmic bigram: Pairs of adjacent rhythmic intervals (ri_i, ri_(i+1))

Rhythmic trigram: Triplets of adjacent rhythmic intervals (ri_i, ri_(i+1), ri_(i+2))

Pitches are positive integers corresponding to MIDI pitches, while durations are positive real numbers denoting a time interval. Chromatic tone is simply the octave-independent pitch, e.g. any C will have the chromatic tone number 0, any E will have the number 4, and so on. This is useful because pitches are perceptually invariant (perceived as the same) over octaves (Hulse et al., 1992).


Melodic intervals capture the distance between adjacent note pitches within the melody. As such they provide melodic information independent of the musical key, e.g. the same melody played in the C major and D major scale will contain the same melodic intervals. Hulse et al. (1992) presents evidence that melodies with the same sequence of intervals are perceptually invariant.

The melodic intervals come in two flavours: relative and absolute – the latter discards information about the direction of the interval. The melodic bigrams and trigrams produce pairs and triplets of melodic intervals, respectively.

The rhythm metric was created to describe rhythmic features formed by notes followed by any number of rests. When there are no rests, rhythm is equivalent to the duration metric. Since the genotype does not support rests, this is always the case for the evolved music. Rhythm is however useful when applied to real-world music, which may contain rests.

Rhythmic intervals capture the relationship between two adjacent rhythms as a ratio. They are therefore independent of tempo and are the rhythmic equivalent of melodic intervals. Hulse et al. (1992) also demonstrates that rhythmic structure is perceptually invariant across tempo changes.

A rhythmic interval of 1.0 indicates no rhythmic change between two notes. An interval of less than 1.0 describes an increase in speed, while greater than 1.0 indicates a decrease. Rhythmic bigrams and trigrams are pairs and triplets of rhythmic intervals, respectively.
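To make the metrics concrete, a few of them are sketched below in Python. This is a hedged illustration working on a list of (MIDI pitch, duration) notes; the metric definitions follow the descriptions above, while the function names and note format are assumptions.

```python
from collections import Counter

def pitch_metric(notes):
    """Frequency distribution of note pitches."""
    return Counter(p for p, _ in notes)

def chromatic_tone_metric(notes):
    """Frequency distribution of octave-independent pitches (pitch mod 12)."""
    return Counter(p % 12 for p, _ in notes)

def melodic_interval_metric(notes, absolute=False):
    """Frequency distribution of intervals between adjacent pitches."""
    pitches = [p for p, _ in notes]
    intervals = [b - a for a, b in zip(pitches, pitches[1:])]
    if absolute:
        intervals = [abs(i) for i in intervals]
    return Counter(intervals)

def rhythmic_interval_metric(notes):
    """Frequency distribution of duration ratios between adjacent notes."""
    durations = [d for _, d in notes]
    return Counter(b / a for a, b in zip(durations, durations[1:]))

notes = [(60, 0.25), (60, 0.25), (64, 0.5), (67, 0.5)]
print(melodic_interval_metric(notes))    # intervals 0, +4 and +3, once each
```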

    3.3 Fitness Based on Zipf’s Law

Research on Zipf’s Law has demonstrated that art tends to follow a balance between chaos and monotony. Evidence shows that this is also the case in music (Manaris et al., 2005, 2003, 2007). Building on this research, an automatic fitness function based on Zipf’s Law is proposed for the evolution of novel melodies.

    3.3.1 Zipf’s Law

    The Harvard linguist George Kingsley Zipf studied statistical occurrencesin natural and social phenomena. He defined Zipf’s Law which describes


the scaling properties of many of these phenomena (Zipf, 1949). The law states that the “frequency of an event is inversely proportional to its statistical rank”:

f = r^{-a}    (3.1)

where f is the frequency of occurrence of some event, r is its statistical rank and a is close to 1 (Manaris et al., 2005).

For example, ranking the different words in a book by their frequency, the most frequent word (rank 1) will occur approximately twice as often as the second most frequent word (rank 2), three times as often as the third most frequent word (rank 3) and so on. Plotting these ranks and frequencies on a logarithmic scale produces a straight line with a slope of −1. The slope corresponds to the exponent −a in equation (3.1). Such a plot is known as a rank-frequency distribution.

Figure 3.3 shows the rank-frequency distribution of the 10,000 most frequent words in the Brown Corpus text collection and a straight line which fits the distribution. As predicted, the slope of the line is approximately −1.

Figure 3.3 Rank-frequency distribution of the 10,000 most frequent words in the Brown Corpus and a straight line which fits the distribution. The slope of the line is approximately −1, as predicted by Zipf's Law.


    Zipf’s Law is a special case of a power law. When a is 1, the distribution iscalled 1/f noise, or pink noise. These 1/f distributions have been observed ina wide range of human and natural phenomena, including language, city sizes,incomes, earthquake magnitudes, extinctions of species and in various artforms. Other related distributions are white noise (1/f 0 – uniform random)and brown noise (1/f 2 ).

    3.3.2 Zipf’s Law in Music

In his seminal book Human Behaviour and the Principle of Least Effort, Zipf also found evidence for his theory in music. Analysis of Mozart's Bassoon Concerto in Bb Major revealed an inverse linear relationship between the length of intervals between repetitions of notes and their frequency of occurrence (Zipf, 1949).

Voss and Clarke (1978) studied noise sources for stochastic music composition. They generated music using white, pink and brown noise sources. Samples of the results were played to several hundred people. They discovered that the music from the pink noise source was generally perceived as much more interesting than music from the white and brown noise sources.

Manaris et al. (2003) devised more metrics based on Zipf's Law, and later work expanded and refined them (Manaris et al., 2005, 2007). Each metric counts the frequency of some musical event and plots the counts against their statistical rank on a log-log scale. Linear regression is then performed on the data to estimate the slope of the distribution. The slopes may range from zero to negative infinity, indicating uniform random to monotone distributions, respectively. The coefficient of determination R^2 is also computed to see how well the slope fits the data. R^2 values range from 0.0 (worst fit) to 1.0 (perfect fit).
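The slope and R^2 computation can be sketched as follows. This is an illustrative reconstruction assuming numpy is available, not the metric code of Manaris et al.; the function name and toy data are assumptions.

    import numpy as np
    from collections import Counter

    def zipf_slope_r2(events):
        """Rank events by frequency, fit a line on a log-log scale and
        return (slope, R^2) of the rank-frequency distribution."""
        freqs = np.array(sorted(Counter(events).values(), reverse=True), dtype=float)
        ranks = np.arange(1, len(freqs) + 1)
        x, y = np.log10(ranks), np.log10(freqs)
        slope, intercept = np.polyfit(x, y, 1)
        residuals = y - (slope * x + intercept)
        ss_res, ss_tot = np.sum(residuals**2), np.sum((y - y.mean())**2)
        r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
        return slope, r2

    pitches = [60, 60, 60, 60, 62, 62, 64, 65]   # toy pitch sequence
    print(zipf_slope_r2(pitches))                # slope and goodness of fit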

Some of the relevant metrics explored by Manaris include rank-frequency distributions of: pitches, chromatic tones, note durations, pitch durations, chromatic-tone durations, pitch distances, melodic intervals, melodic bigrams and melodic trigrams. Section 3.2.1 covers the metrics in more detail.

Figure 3.4 shows the rank-frequency distributions and slopes from all the above metrics applied to The Beatles' Let It Be. Most of the metrics display slopes near −1, as predicted. Notice the rather steep slope of −2.0 for note durations, which suggests little variation in the rhythm of the piece.


Figure 3.4 Rank-frequency distributions and slopes for each metric applied to The Beatles' Let It Be. Most of the metrics display slopes near −1, as predicted by Zipf's Law.


A large corpus of MIDI-encoded music in different styles was analysed by Manaris with the Zipf-based metrics. The results showed that all music pieces displayed many near-Zipfian distributions, with strong correlations between the distribution and the linear fit. Non-music (random) pieces exhibited very few (if any) such distributions.

These results suggest that Zipf-based metrics capture essential aspects of the scaling properties in music. They indicate that music tends to follow a distribution balanced between chaos and monotony, i.e. between a near-zero slope and a steep slope approaching negative infinity.

Studies showed that different styles of music exhibited different slopes and demonstrated further that the slopes could be used successfully in several music classification tasks. A connection between Zipf metrics and human aesthetics was also revealed (Manaris et al., 2005).

    3.3.3 Fitness Function

Since Zipf distributions are so prevalent in existing music, it seems reasonable to assume that new music must also exhibit such properties. The obvious question is then: can a fitness function based on Zipf's Law guide evolution towards pleasant music?

Some research exists in this area: Manaris et al. (2007) performed several music generation experiments using Zipf metrics for fitness. Different melodic genes were used for the initial population, and successful results were reported when an existing music piece was used. In other words, variations of existing music were evolved. The work presented herein, however, is focused on creating new music from scratch.

Each Zipf-based metric extracts a slope from a music piece. Assume that the value of some favourable slope is known a priori – that is, a target slope which evolution should search for. The fitness is then a function of the distance (error) to the target.

    For a single metric, the target fitness is defined as a Gaussian:

f_m(x; T) = e^{-\left( \frac{T - x}{\lambda} \right)^2}    (3.2)


Figure 3.5 Fitness plot of the target fitness f_m for different tolerance values λ.

where m denotes the metric, T is the target slope for the given metric, x is the metric slope of some evolved music piece and λ is the tolerance – a positive constant. f_m results in smooth fitness values ranging from 0.0 (when |T − x| is above some threshold) to 1.0 (when T = x). The tolerance λ adjusts this threshold (and the steepness of the fitness curve). Fitness will approach zero when |T − x| is approximately 2λ. Figure 3.5 shows the fitness curves of f_m for different tolerance values.
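A direct reading of equation (3.2) in Python, given as a hedged sketch rather than the thesis implementation (the function name is an assumption):

    import math

    def target_fitness(x, target, tolerance=0.5):
        """Gaussian target fitness (equation 3.2): 1.0 when the measured
        slope x equals the target, approaching 0 as |target - x| grows."""
        return math.exp(-((target - x) / tolerance) ** 2)

    print(target_fitness(-1.0, target=-1.0))   # 1.0
    print(target_fitness(-1.5, target=-1.0))   # ~0.37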

Of course, a single metric is unlikely to be sufficient alone as the fitness function. Several metrics should be taken into account. Thus, instead of a single target slope, a vector of target slopes is used. Combining the target fitness of several different metrics as a weighted sum gives:

f(\mathbf{x}; \mathbf{T}) = \sum_{i=1}^{N} w_i f_i(x_i; T_i)    (3.3)

where N is the number of metrics, i denotes the metric number, w_i its weight and f_i the single-metric target fitness function in equation (3.2). Finally, f is normalized to produce fitness values in the interval [0, 1].
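Combining several metrics as in equation (3.3) then reduces to a normalized weighted sum. The sketch below reuses the hypothetical target_fitness function from the previous example; the slope, target and weight values are illustrative only.

    def combined_fitness(slopes, targets, weights, tolerance=0.5):
        """Normalized weighted sum of single-metric target fitnesses (eq. 3.3)."""
        total = sum(w * target_fitness(x, t, tolerance)
                    for x, t, w in zip(slopes, targets, weights))
        return total / sum(weights)

    slopes  = [-1.1, -0.8, -1.4]   # measured Zipf slopes of an individual
    targets = [-1.0, -1.0, -1.0]   # target slopes for the chosen metrics
    weights = [1.0, 1.0, 1.0]      # uniform weights
    print(combined_fitness(slopes, targets, weights))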


    3.4 Fitness Based on Distribution Similarity

Experiment results from Chapter 4 demonstrate that Zipf metrics can be used successfully as fitness for the evolution of pleasant melodies. Some musical knowledge was necessary for evolution to produce coherent results, e.g. constraints in the form of a scale, the number of possible pitches and so on. However, a majority of the evolved melodies were in fact rather unpleasant. As discussed in Section 4.4, Zipf metrics capture scaling properties only, which were shown to be insufficient for pleasant music on their own.

    Zipf’s Law in music seems to be universal in that it applies to many different(if not all) styles of music. Musical taste, however, varies greatly from personto person and depends on many factors such as nationality, culture andmusical background. Exposure to music will likely affect our musical taste,e.g. an Indian is likely to prefer Indian music over country, simply becausehe is more familiar with the style.

Thus, instead of attempting to model universal musical properties, it might be more fruitful to model the properties of certain styles of music.

Musicians usually focus on a few selected musical styles, but where does the musician's creativity come from? An important part of the musical creative process is undoubtedly listening to a lot of music for inspiration. Musical concepts and ideas are borrowed from music we like, either knowingly or subconsciously. Either way, there is certainly an element of learning involved in musical creativity.

This section presents a more knowledge-rich approach to musical fitness. The approach takes both content and scaling properties into account, both of which can be learned from existing music pieces.

    3.4.1 Metric Frequency Distributions

Each metric (see Section 3.2.1) counts the occurrences of some type of musical event, producing a frequency distribution of events. Figure 3.6 shows the frequency distribution of chromatic tones used in Mozart's Piano Sonata No. 16 in C major (K. 545). From the distribution it can be seen that the piece mainly uses the pitches C (0), D (2), E (4), ..., i.e. the C major scale.


    Figure 3.6 Frequency of chromatic tones from Mozart’s Piano Sonata No. 16.


Figure 3.7 Melodic intervals from Mozart's Piano Sonata No. 16 (a) and Debussy's Prélude Voiles (b). Note the difference in shape and intervals used.

Perhaps more interesting is the frequency of melodic intervals (Figure 3.7a): the major (±2) and minor second (±1) intervals dominate the melody, followed by the minor third (±3), unison (0), perfect fourth (±5) and major third (±4). Compare this to the melodic intervals in Debussy's Prélude Voiles (Book I, No. 2) shown in Figure 3.7b, where the major second is mainly used – evidence of the whole-tone scale that is employed.

As can be seen, there is a wealth of knowledge in such frequency distributions. A fitness function which makes use of this knowledge could steer evolution towards music that is statistically similar to some selected piece.

    3.4.2 Cosine Similarity

The fitness function presented herein takes as input a set of discrete target frequency distributions, which can be learned from existing music. The


    fitness score is calculated based on the similarity to these distributions.

Concepts are borrowed from the field of Information Retrieval (IR), where a common task is to score documents based on similarity. The standard technique operates on the frequencies of different words in a document – term frequencies. Each document is viewed as a vector with elements corresponding to the frequencies of the different terms in the dictionary. With the document vector model, the similarity of two documents can be assessed by considering the angle between their respective vectors – the cosine similarity:

\mathrm{sim}(A, B) = \cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|}    (3.4)

A and B are the two document vectors, the numerator is their dot product and the denominator is the product of their norms. Since the vector elements are non-negative, the similarity score ranges from 0, meaning completely dissimilar (independent), to 1, meaning exactly the same. The cosine similarity has the advantage of being unaffected by differences in document length – the denominator normalizes the term frequencies.
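A minimal sketch of equation (3.4), assuming numpy; two vectors with the same relative frequencies score 1.0 regardless of their absolute counts.

    import numpy as np

    def cosine_similarity(a, b):
        """Cosine of the angle between two frequency vectors (equation 3.4)."""
        a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    print(cosine_similarity([3, 1, 0, 2], [6, 2, 0, 4]))  # 1.0: same proportions
    print(cosine_similarity([1, 0, 0], [0, 1, 0]))        # 0.0: no shared events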

In the musical domain, the documents are music scores. Instead of words, musical features are considered: pitches, melodic intervals, rhythm etc., as described in Section 3.2.1. For instance, the cosine similarity between the melodic intervals in Mozart's and Debussy's pieces (Figure 3.7) is 0.92.

    3.4.3 Fitness Function

For a given metric m, fitness is defined as the cosine similarity between the frequency vectors of the music individual x and a target piece T:

f_m(\mathbf{x}; \mathbf{T}) = \mathrm{sim}(\mathbf{x}, \mathbf{T}) = \frac{\mathbf{x} \cdot \mathbf{T}}{\|\mathbf{x}\| \, \|\mathbf{T}\|}    (3.5)

f_m will thus reward music with features that are statistically similar to the target piece with respect to the metric – in other words, music which exhibits the same events at similar relative frequencies. The target vector can stem from a single music piece or a collection of pieces.
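In practice two pieces rarely contain exactly the same set of events, so the counts must first be aligned over a common event vocabulary before equation (3.5) can be applied. A self-contained sketch with illustrative names and data:

    import numpy as np
    from collections import Counter

    def similarity_fitness(events_x, events_target):
        """Cosine similarity between the event-frequency distributions of an
        individual and a target piece (equation 3.5), aligned over the union
        of observed events."""
        cx, ct = Counter(events_x), Counter(events_target)
        vocab = sorted(set(cx) | set(ct))
        vx = np.array([cx.get(e, 0) for e in vocab], dtype=float)
        vt = np.array([ct.get(e, 0) for e in vocab], dtype=float)
        return float(vx @ vt / (np.linalg.norm(vx) * np.linalg.norm(vt)))

    # e.g. melodic intervals of an evolved melody vs. a target piece
    print(similarity_fitness([2, 2, -2, 1, -1], [2, -2, 2, -2, 1]))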


For example, if the melodic intervals in Figure 3.7b were used as the target vector T, the fitness function would reward music in the whole-tone scale, similar to Debussy's Prélude Voiles. Furthermore, positive intervals are approximately as frequent as negative intervals. Balanced melodies would therefore be favoured, i.e. melodies where upward and downward motions occur approximately equally often.

For multiple features, the fitness is simply the weighted sum of the similarity scores f_m:

f(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N; \mathbf{T}_1, \mathbf{T}_2, \ldots, \mathbf{T}_N) = \sum_{i=1}^{N} w_i f_i(\mathbf{x}_i; \mathbf{T}_i)    (3.6)

where i denotes the metric, x_i and T_i are the metric frequency vectors and w_i is the weight (importance) of metric i. For convenience, the sum is normalized to produce fitness values in the range [0, 1].

As mentioned, the fitness function rewards music which exhibits properties that are statistically similar to a target music piece. The motivation for this approach is not to copy, but rather to learn from existing music by extracting common music knowledge plus a bit of inspiration. The amount of inspiration depends on which metrics are included. For instance, higher-level melodic n-grams will reward music which mimics the melody in the target piece.

    3.4.4 Relationship to Zipf’s Law

    Zipf’s Law applies to rank-frequency distributions, i.e. only the relative fre-quencies of events are considered. Cosine similarity, on the other hand, oper-ates directly on frequency distributions and thus both relative frequency andevent content is taken into account. That is, cosine similarity incorporatesZipf’s Law.

If the cosine similarity of two frequency distributions A and B is 1.0, it follows that their respective rank-frequency distributions will have the same shape. Consequently, their Zipf slopes will also be identical:

sim(A, B) = 1.0 ⇒ slope(A) = slope(B)

Assuming that the target music piece exhibits Zipfian slopes, the similarity-based fitness function (3.6) will thus promote music with similar slopes.
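A brief sketch of why the implication holds, for non-negative, non-zero frequency vectors:

\[
\mathrm{sim}(A, B) = 1 \;\Rightarrow\; A = cB \ \text{for some } c > 0
\quad \text{(equality in the Cauchy--Schwarz inequality)}
\]
\[
\Rightarrow\; f_A(r) = c\, f_B(r) \ \text{for every rank } r
\;\Rightarrow\; \log f_A(r) = \log c + \log f_B(r)
\]

The two log-log rank-frequency plots thus differ only by a vertical offset, so their fitted slopes coincide.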


    3.4.5 Filtering

When counting the many events in real-world music, there are likely to be some events that occur very rarely, i.e. have low frequencies. In other words, there is bound to be some noise. When target vectors are derived from a music score, it is desirable to put emphasis on the most descriptive events. Given the repetitive nature of music, it is reasonable to assume that important events occur often. That is, events with very low frequencies are considered of little value and can be filtered out.

A simple filtering method is to discard events whose frequency is below some threshold. This has proven to be an effective method in text categorization, where the dimensionality of document vectors can be greatly reduced by isolating the most descriptive terms (Yang and Pedersen, 1997).

When real-world music is used for target vectors, events whose normalized frequency is below a threshold are filtered out according to the criterion:

\frac{f}{N} < p    (3.7)

where f is the event frequency, N is the total number of events in the score and p is the threshold expressed as a fraction. For example, a threshold of p = 0.01 would discard events whose frequency accounts for less than 1% of all events.

Figure 3.8a shows the frequency distribution of melodic bigrams from Mozart's Piano Sonata No. 16. Applying a 1% threshold filter results in the removal of 86 events, producing the much smaller distribution with 19 events seen in Figure 3.8b.
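A sketch of the filtering criterion (3.7), keeping only events whose share of the total count reaches the threshold (illustrative, not the thesis code):

    from collections import Counter

    def filter_rare_events(events, p=0.01):
        """Discard events whose relative frequency f/N is below p (eq. 3.7)."""
        counts = Counter(events)
        total = sum(counts.values())
        return {e: f for e, f in counts.items() if f / total >= p}

    events = [0] * 60 + [2] * 40 + [4] * 25 + [7]   # one very rare event (7)
    print(filter_rare_events(events, p=0.01))       # {0: 60, 2: 40, 4: 25}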



Figure 3.8 Frequency of melodic bigrams from Mozart's Piano Sonata No. 16: full distribution (a) and after filtering events below a 1% threshold (b).


Chapter 4

    Experiments: Zipf’s Law

Previous work by the author explored the use of Zipf's Law as fitness for the evolution of short melodies (Jensen, 2010). Results showed that pleasant melodies could indeed be generated with such a technique. Several favourable musical features were seen in the results, including melodic motifs. Some constraints were necessary to achieve any pleasant results, most notably restricting note pitches to a pre-defined scale.

Several different target slopes were tried in order to improve the quality of the evolved melodies, but it was difficult to find a good set of slopes. On average, only 10% of the evolved melodies were perceived as pleasant. Furthermore, the melodies lacked several important musical features, including rhythm and structure.

In this chapter, experiments are presented where the goal was to improve these results. In Section 4.1, the performance of the tree-based representation is investigated. Melodies evolved with the tree-based representation are qualitatively compared to melodies from previous work in Section 4.2. Finally, rhythmic qualities are introduced in Section 4.3.

    4.1 A Musical Representation

In earlier work (Jensen, 2010), three linear genotypes were explored with respect to the Zipf-based fitness function: an event-based binary genotype,



an event-based vector and a dynamic vector representation. All of these representations fall under the GA umbrella, and the event-based vector achieved the best fitness of the three. However, a near-maximum fitness was never achieved and convergence was relatively slow.

Experiments were therefore performed in an attempt to improve fitness and convergence speed. Two options were investigated:

    1. Test a new tree-based genotype.

    2. Modify the old vector-based genotypes.

    4.1.1 Introduction

As discussed in Section 2.5, two approaches to representation are commonly found in the evolutionary music literature. The first is the linear binary/vector genotype (GA), similar to what was already tried. The second is the tree-based representation (GP), which some have argued is well suited for music because of its hierarchical structure (see Section 3.1). It was therefore decided to test a GP approach to see if it improved fitness and convergence speed.

In summary, the event-based genotypes from previous work are similar to the example shown in Figure 2.4, i.e. employing implicit note durations. However, the event-based vector is symbolic instead of binary and consistently achieved better performance. The dynamic vector genotype is simply a list of (enable, pitch, duration) triplets, i.e. with explicit note durations. The first element, enable, is a flag which dictates whether the triplet should be evaluated or ignored when interpreted by the phenotype. This allows for variation in the number of notes.

The hypothesis was that the slow evolutionary convergence of the vector-based genotypes was caused by low genetic diversity in the population. As discussed in Section 2.2, diversity is a key element in Evolutionary Computation. The two vector-based genotypes were changed to see if this was the case. As a first test, the mutation operator was modified so as to change an entire gene instead of a single gene element. Another mutation variant was also tried, where mutation changed a whole genome segment. Finally, the mutation rate was increased to see if it improved the performance.
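The two mutation variants can be sketched as follows for a dynamic-vector genotype of (enable, pitch, duration) triplets. This is an illustrative sketch; the helper names, parameter values and segment-length choice are assumptions, not the operators as implemented in the thesis.

    import random

    PITCHES = range(12)                       # assumed pitch range
    DURATIONS = [0.25, 0.5, 1.0, 2.0, 4.0]    # assumed duration set

    def random_gene():
        """A random (enable, pitch, duration) triplet."""
        return (random.random() < 0.5, random.choice(PITCHES), random.choice(DURATIONS))

    def mutate_entire_gene(genome, rate=0.1):
        """Variant 2.1: replace whole triplets instead of single elements."""
        return [random_gene() if random.random() < rate else g for g in genome]

    def mutate_segment(genome, rate=0.1, max_len=4):
        """Variant 2.2: replace a contiguous segment of triplets."""
        genome = list(genome)
        if random.random() < rate:
            start = random.randrange(len(genome))
            length = random.randint(1, max_len)
            for i in range(start, min(start + length, len(genome))):
                genome[i] = random_gene()
        return genome

    genome = [random_gene() for _ in range(64)]
    child = mutate_entire_gene(genome, rate=0.5)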


    4.1.2 Setup

For all the experiments, evolution was run for 500 generations. For each generation, the maximum fitness was averaged over 30 runs and plotted. The choice of parameters is described in more detail in Section 4.1.3.

    General Parameters

The evolutionary parameters listed below were identical for all experiments unless otherwise noted:

    • Population size: 100

    • Mutation rate: 0.1

    • Crossover rate: 0.9

    • Tournament selection: k = 5, e = 0.1 (see Section 2.2)

    • Fitness: Weighted sum with 10 Zipf metrics:

– Metrics: pitch, chromatic-tone, duration, pitch duration, chromatic-tone duration, pitch distance, chromatic-tone distance, melodic interval (absolute), melodic bigram (absolute), melodic trigram (absolute)

– Target slopes: T_i = −1.0

– Tolerance: λ = 0.5

    – Weights: uniform 1.0 for all metrics

    Tree-Based Representation Parameters

The parameters for the tree-based representation (see Section 3.1) were selected to closely match the properties of the vector-based genotypes:

    • Pitches: 12

    • Durations: 5


    • Resolution: 16

    • Functions: concatenation (+)

    • Terminals: simple (pitch, duration) or event-based (event-type, value)

• Maximum tree depth: 5 or 6, resulting in a maximum of 2^5 = 32 or 2^6 = 64 notes, respectively

    • Initialization method: grow or full

    • Function probability: 0.9 (default)

    • Terminal probability: 0.1 (default)

The event-based terminal scheme is similar to the event-based vector, i.e. tuples of (event-type, value) where event-type is either note or hold and the value holds the pitch.

    Vector-Based Representation Parameters

The parameters for the vector-based representations were derived from the previous work and are summarized below.

    For the event-based vector, the parameters were:

    • Bars (measures): 4

    • Pitches: 12

    • Resolution: 16

The following parameters were used for the dynamic vector:

    • Length: 64

    • Pitches: 12

    • Durations: 5

    • Resolution: 16


    4.1.3 Experiment Setup

In total, 14 experiment runs were performed where different parameters were tested. In six of the runs the tree-based genotype was used, while in eight runs the vector-based genotypes were employed. The setup for each experiment is detailed below.

    Tree-Based Representation

For the tree-based representation, it was important that the parameters were as similar to the vector-based genotypes as possible. This was both to ensure a fair comparison and, more importantly, to isolate whether the tree structure was beneficial for music.

As such, two different terminal schemes were tested: simple and event-based, similar to the dynamic vector and event vector, respectively.

Another key parameter is the maximum tree depth, which dictates the maximum number of possible notes. Fewer possible notes result in less freedom for evolution to find melodies with high fitness. It was therefore important that the max-depth parameter was roughly equivalent to the number of possible notes in the vector-based genotypes: 64. The max-depth was thus set to 6, resulting in 2^6 = 64 possible notes. It was decided to also try a depth of 5 (32 notes) to see how the tree-based genotype would perform with less freedom than the vectors.

Finally, two different tree initialization methods were tested, grow and full, to see if either method led to significant advantages with respect to fitness and convergence speed.

As such, for both simple and event-based terminals, the following configurations were tested:

    1.1. Max tree depth: 6, initialization method: grow

    1.2. Max tree depth: 6, initialization method: full

    1.3. Max tree depth: 5, initialization method: grow

This resulted in a total of 6 experiment runs.


    Vector-Based Representation

As mentioned, it was hypothesized that the genetic diversity was too low, causing slow convergence for the vector-based genotypes. The mutation operator previously employed was designed to mutate only part of the gene, i.e. a single element of the tuple or triplet.

To increase genetic diversity, the mutation behaviour was changed so that the entire gene was randomly changed, i.e. all elements of a tuple (triplet). A second mutation variant was also explored where a whole genome segment is altered, i.e. multiple sequential tuples (triplets), in order to boost genetic diversity even more.

As a final measure, the mutation rate was increased to see if it improved the performance. However, it was decided to only test the increased mutation rate using the mutation operator with the best performance.

Thus, for both the event-based vector and dynamic vector, the following tests were performed:

    2.1. Mutation of entire gene

    2.2. Mutation of genome segment

    2.3. Using the best mutation operator from (2.1) and (2.2), increase muta-tion rate to:

    (a) 0.5

    (b) 0.9

    This resulted in a total of 8 experiment runs.

    4.1.4 Results and Discussion

    The results from the 14 experiments are presented below and compared toearlier results. Finally, the best configurations of the three genotypes arepresented.



Figure 4.1 Fitness plots for the three different configurations of tree-based genotypes with simple (a) and event-based (b) terminals. The event vector from earlier work is also shown as a reference (old).

    Tree-Based Representation

Figure 4.1 shows the fitness plots from the 6 different tree-based configurations tried, along with the old event vector from earlier work as a reference. Surprisingly, all variants of the tree-based genotypes yielded superior performance compared to the vector representation. On average they converged much faster and reached a higher maximum fitness.

From the fitness plots, it is evident that the best performer was the tree with simple terminals (Figure 4.1a), which consistently reached better results than the event-based counterpart (Figure 4.1b). Trees with a maximum depth of 5 also performed well (configuration 1.3), which signifies that even short melodies could achieve a high fitness.

As can be seen, using the full method for initialization (1.2) seemed to result in marginally higher fitness values. However, the melodies evolved using the full method were generally longer than those evolved with grow (1.1): the average number of notes was 45 and 29 for full and grow respectively (with simple terminals). As such, the slight difference in fitness values was likely related to melody length rather than the initialization method itself.

A similar trend was found with the event-based tree, where the average number of notes was 20 (full) and 16 (grow). This indicates that the worse performance found with the event-based terminals was also due to the shorter



Figure 4.2 Fitness plots for the vector-based genotypes using two different mutation operators: 2.1 mutation of entire gene (a) and 2.2 mutation of genome segment (b). For reference, the plots using the old mutation operator are also shown.

melody lengths. Thus there seemed to be no significant difference between the two terminal schemes, at least from a fitness perspective.

At this point, it was not evident why the tree-based representations performed so much better than the vectors. Apart from the difference in structure, the mutation operator used for GP is more explorative than the GA's, in that it can alter many genes (nodes) at the same time as opposed to only one. It was thus hypothesized that mutation was a key element in the success of GP.

    Vector-Based Representation

Figure 4.2 shows the fitness plots from the two mutation operators applied to both vector-based genotypes: 2.1 mutation of entire gene (Figure 4.2a) and 2.2 mutation of genome segment (Figure 4.2b). The fitness plots obtained with the old mutation operator are also included for reference.

As shown in Figure 4.2a, mutation of the entire gene (2.1) resulted in a slight improvement in fitness when applied to the event vector. For the dynamic vector, however, entire-gene mutation drastically improved the performance; much faster convergence and higher fitness were achieved, matching the performance of the event vector. These results confirmed the hypothesis that more genetic diversity would improve performance. However, the fitness was still not comparable to that of the tree-based representations.


Figure 4.3 Fitness for the two vector-based genotypes – event vector (a) and dynamic vector (b) – using entire-gene mutation (2.1) and increased mutation rates of 0.5 (2.3a) and 0.9 (2.3b).

The fitness plots from genome segment mutation (2.2) can be seen in Figure 4.2b. For the event vector, segment mutation led to worse performance compared to the old mutation operator, i.e. mutation was destructive in this case. The dynamic vector displayed a performance increase, but the results were not as good as for 2.1.

Since entire-gene mutation (2.1) resulted in better performance than segment mutation (2.2), it was the operator used in test 2.3: increased mutation rates of 0.5 (2.3a) and 0.9 (2.3b). In Figure 4.3, the fitness plots from the four runs can be seen, along with the results from experiment 2.1 for comparison.

As can be seen, a mutation rate of 0.5 further improved the performance of both vector representations, with the dynamic vector now surpassing the event vector. Increasing the mutation rate further to 0.9 did not seem to improve results at this point.

    A Final Comparison

Figure 4.4 shows the fitness plots for the best-performing configuration of each genotype, where the tree-based representation is the clear winner. The genotype configurations are summarized below:

    Tree: simple terminals, max-depth: 6, initialization method: full (1.2)

    Vectors: entire-gene mutation, mutation rate: 0.5 (2.3a)


Figure 4.4 The best configurations found for the tree-based and vector-based representations.

At this point, no further improvements were made. Why the GP approach performed so well is not fully understood. One hypothesis is that the modular, hierarchical structure of the tree is particularly beneficial for music, at least with respect to Zipfian properties. Another reason might be that trees allow genomes of different lengths, while the vectors explored here did not.

Either way, the primary focus of this research is fitness for music, and it was therefore decided to use the tree-based genotype for future experiments.

It should be noted that these experiments by no means represent an exhaustive comparison of GA versus GP for music. It is almost certain that a GA expert would be able to design a genotype able to match the performance of the GP presented herein. As such, these experiments are primarily included to justify the choice of a tree-based representation for the rest of this research.

    4.1.5 Summary

The tree-based representation was shown to outperform the linear representations previously employed. Although improvements were achieved with the vector-based genotypes, the tree was still clearly the best performer, with


higher fitness and faster convergence speed. Why the tree-based representation performed so well is not fully understood. It is hypothesized that the modular, hierarchical structure of the tree is particularly beneficial for music.

    4.2 Tree-Based Composition

    In the previous experiment, tree and vector-based r