University of Adelaide
Elder Conservatorium of Music
Research Project
Submitted in partial fulfilment of the requirements for the degree of
Bachelor of Music with Honours
Machine Learning in Music Composition
Submitted by
Simon Harley Koehn a1667613
Adelaide, June 2017
CONTENTS
Abstract
Declaration
Acknowledgements
Introduction
Chapter 1: Machine Learning and Algorithmic Composition History
Chapter 2: Machine Learning and Composition Techniques
Chapter 3: Examples of Machine Learning in Music
Chapter 4: Working Towards a New System
Conclusion
Bibliography
ABSTRACT
Within the field of artificial intelligence, machine learning is the study and creation of computer
systems that can learn. Such systems can be applied to algorithmic composition, in which a
machine learning algorithm learns musical rules from existing music samples and applies those rules
to its own generated compositions.
This paper provides a general explanation of certain machine learning concepts, and a historical
context for these. Three existing systems are examined to give a sense of what is musically possible
with such a system, and identify their limitations and possible improvements.
A conceptual model is then proposed for a future system that draws on some of the ideas presented,
taking note of the strengths of techniques used in the example systems. By presenting key concepts
and pointing to possible future directions, this paper lays the foundation for future research into ML
music systems.
DECLARATION
The candidate declares that the material contained in this submission is his/her own work and that appropriate recognition has been given when referring to the work of others.
Signature:........................................
Simon Koehn
Date: 22/6/2017
ACKNOWLEDGEMENTS
Thanks to my supervisor Christian Haines for invaluable guidance throughout this project. Also to Adele Sliuzas, for her expert proof-reading and moral support, all while carrying our unborn child.
INTRODUCTION
Machine learning is a powerful computational tool with numerous possible applications. Current
machine learning techniques have been applied to music composition with considerable success. New
technology and greater computational power in the twenty-first century have expanded the ways in
which machine learning can be manifested in the world of music. As a fast-paced field, these
applications of machine learning are still in their infancy, which opens up possibilities for the future
scope of machine learning in music. Further developments in machine learning as related to algorithmic
composition will allow for increasingly human-like computer music composition.
Artificial intelligence (AI) deals with the application of machines to tasks that are generally
considered to require human intelligence. Machine learning (ML) is an important subfield of this,
specifically concerning computer systems that can be said to learn.1 The study of AI is often met with
debate with regard to its goals and scope. The notion that intelligence, as exhibited by humans, can be
explained or understood to the point of being recreatable is not universally accepted.2 Whether or not
true intelligence can ever be created, many of the techniques that have arisen from this pursuit,
including ML, are of great value.
If the goal of AI is to create human-like general intelligence in a machine, machine learning is one of
the key elements that must operate in conjunction with other subfields of AI. Simply put, these
systems make predictions on data fed into them, based on previous sets of input data. ML can be used
to great effect when applied to tasks that might be difficult to address by developing more explicit
algorithms or procedures. Such tasks might include identifying objects in an image, predicting the
price of a house based on numerous attributes, or even predicting people’s actions in a given situation.
1 Ron Kohavi and Foster Provost, "Glossary of Terms", accessed 1 June 2017, http://robotics.stanford.edu/~ronnyk/glossary.html. Originally published in Machine Learning 30 (1998): 271-274.
2 Ben Goertzel and Cassio Pennachin, eds., Artificial General Intelligence (Berlin: Springer, 2007), vi-vii.
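The prediction idea described above can be illustrated with a short sketch: a system fits a model to previous data and uses it to predict a value for new data. The figures and the choice of a simple linear model are invented for the example; a real system would use many attributes and a more capable learner.

```python
# Minimal illustration of "predictions based on previous sets of input
# data": fit a line to past house sales (floor area -> price) and use it
# to predict the price of an unseen house. All data values are invented.

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - a * mean_x
    return a, b

# Past sales: floor area in square metres -> sale price
areas = [50, 80, 110, 140]
prices = [200_000, 320_000, 440_000, 560_000]

a, b = fit_line(areas, prices)
predicted = a * 100 + b  # predicted price of a 100 square metre house
```

The "learning" here is simply the fitting of the two parameters; the ML systems discussed in later chapters fit far larger parameter sets, but the principle of generalising from sample data is the same.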
In addition to these more practical applications, ML, amongst other techniques from AI, has been
incorporated into artistic practices. This takes the form of either a tool that uses machine learning to
assist with the artistic process, or a ML algorithm that is created to generate artistic content of its own
based on learning from existing art works or other data. The application of ML to music fits obviously
into the discipline of algorithmic composition. Numerous attempts have been made to produce music
with ML, and many are in progress.
Chapter one of this paper will define some of the key concepts in ML, and look at its history, and the
more general history of AI. This is then expanded upon in the second chapter, with specific regard to
artificial neural networks and genetic algorithms. Their inner workings are discussed, along with
approaches to applying ML to composition. In the third chapter, three examples are examined and
assessed in terms of their limitations and musical possibilities. The first of these is referred to in this
paper as LZC, a model proposed by Lichtenwalter, Zorina and Chawla for generating musical scores.3
WaveNet by Google Deepmind is an artificial neural network that can be trained on digital audio files
in order to generate its own audio, sample by sample. It is unable to consider musical parameters as
meaningfully as LZC, but its ability to mimic the input sounds shows promise for future iterations.
Lastly Wekinator represents a different application of ML to musical tasks. It operates as a trainable
control system that allows the user to link almost any input to musical parameters in other software.
These examples demonstrate the scope of possible applications of ML to music and point to future
possibilities. In response to the concepts presented, the fourth chapter proposes a new system,
DuANN, that combines elements of the example systems in chapter three in order to allow for user
interaction in the generation of music.
In all, this paper serves as a summary of machine learning in algorithmic composition, drawing on
concepts from both computer science and musical disciplines. In doing so it provides a foundation for
future study in this area, and points to possible directions that might be pursued.
3 Lichtenwalter, "Applying learning algorithms".
Chapter 1
Machine Learning and Algorithmic Composition History
The use of ML in musical tasks typically fits the definition of algorithmic composition, whether it is
applied to genuine composition, or fits somehow into a broader compositional process. As such, ML
music has historical roots in AC as well as computer science and AI. As AI and ML research
progresses, with increasingly effective systems and greater computational power, so too does the
potential for its application to AC. Algorithmic composition (AC) can be defined as the ‘generation of
musical structure... [by] a formalizable and abstracting procedure’.4 This definition encompasses the
use of any conceivable algorithm, as long as it is used to generate musical structure on some level. It
also covers a spectrum of application, from the use of algorithms to assist with specific compositional
tasks, to generating entire compositions. Certain classes of algorithms have often been used in
composition, and as such have become standard approaches in AC. These include Markov chains,
transition networks, chaos systems, and generative grammars.
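Of the standard approaches listed above, the first-order Markov chain is the simplest to sketch: transition probabilities are counted from a sample melody, and new notes are drawn from those counts. The sample melody below is invented for illustration.

```python
import random

# A first-order Markov chain, one of the standard AC techniques named
# above. Counting which note follows which in a sample melody gives a
# transition table; new material is generated by walking that table.

sample = ["C", "D", "E", "D", "C", "D", "E", "F", "E", "D", "C"]

# Count transitions: for each note, which notes have followed it
transitions = {}
for cur, nxt in zip(sample, sample[1:]):
    transitions.setdefault(cur, []).append(nxt)

def generate(start, length, seed=0):
    """Walk the transition table to produce a new melody."""
    random.seed(seed)
    melody = [start]
    for _ in range(length - 1):
        # choosing uniformly from the recorded successors reproduces
        # the transition probabilities of the sample
        melody.append(random.choice(transitions[melody[-1]]))
    return melody

new_melody = generate("C", 8)
```

Every pair of adjacent notes in the output is a pair that occurred in the sample, which is both the appeal and the limitation of the technique: local note-to-note style is captured, but no larger structure.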
Examples of AC can be found as early as AD 1000, and appear throughout the following millennium.
The calculation of an algorithm is obviously a vital part of the AC process, and as such, the
development of the programmable computer dramatically increased the feasibility of more
complicated systems. Calculations that might once have been arduous manual tasks could be
automated and completed in a much shorter time.5 This allowed for the development and
implementation of evolutionary and adaptive algorithms such as genetic algorithms and artificial
neural networks.6 The latter refers to a specific type of learning algorithm used in ML, which, in
various forms, has been successfully applied to compositional tasks.
4 Gerhard Nierhaus, Algorithmic Composition (Vienna: Springer, 2009), 1.
5 Nierhaus, Algorithmic Composition, 21, 63.
6 Nierhaus, Algorithmic Composition, 4-5.
Within the field of computer science, Machine Learning deals with the study and creation of learning
algorithms. These algorithms make predictions on data sets based on the results of previous
predictions and sample data sets. ML systems avoid using static program directives by making
decisions or predictions dependent on the data they are fed. This attribute is of great value in tasks that
require analysis of large amounts of data or are difficult to explicitly define.
ML is one of several areas of study that have developed out of AI research. The term artificial
intelligence was first suggested by John McCarthy in 1956 while attending a summer workshop at
Dartmouth College alongside other important figures in AI. It concerns the development of systems
that model behaviours usually associated with human intelligence.7 Following the Dartmouth
workshop, many of those in attendance went on to develop basic AI systems, generating a great deal
of enthusiasm for the field. Among these were Allen Newell and Herbert Simon’s reasoning program
the Logic Theorist, Arthur Samuel’s checker playing program, and systems for problem solving,
vision, learning and planning, which were developed by Marvin Minsky and his students.8
While these systems showed promise, more complex real-world problems were still out of reach.
Accurate modeling of intelligence that could solve problems more efficiently requires vast amounts of
knowledge and the ability to easily search through that knowledge. More generally, it was
acknowledged that a greater understanding of cognitive processes was required in order to develop
these models. This led to a split into many areas of AI, allowing researchers to focus on individual
cognitive processes. These include knowledge representation, search, theorem proving, language
processing, vision, robotics, and learning. 9
7 Gheorghe Tecuci, "Artificial Intelligence", WIREs Computational Statistics 4 (2012): 169.
8 Tecuci, "Artificial Intelligence", 179, and Michael Paluszek and Stephanie Thomas, MATLAB Machine Learning (Berkeley: Apress, 2017), 17.
9 Tecuci, "Artificial Intelligence", 170.
Although ML began as a branch of AI, some techniques and related concepts were developed much
earlier. Bayes’ Theorem, which suggests an equation for determining the probability of an event
dependent on another variable, was created by Thomas Bayes in 1763, and is still used in ML
systems. In 1957 Frank Rosenblatt began developing his perceptron, the first model of a type of ML
algorithm called artificial neural networks (ANN).10 Rosenblatt’s model is relatively simple, but with
significant developments it survives as a key element in many ML systems.11
Contemporary ML began as data mining, an area of considerable focus within early AI research
concerned with the analysis of data. While interest in this initially diminished, it was reinvented in the
1990s as machine learning, applied to tasks of pattern recognition. The vast amounts of data that had
since become available thanks to internet connectivity made this approach significantly more feasible.
ML has since been applied to numerous tasks including driverless cars, high-speed stock trading, and
the prediction of human and natural events.12 Development of ML technologies continues today,
fuelled in part by their value to many consumer technologies, but also industrial and military
applications. The integration of human and machine intelligence is also being actively researched, with
potential to lead to autonomous control over machines, or even human augmentation.13
ML and AC both have their roots in numerous fields, usually predating their inception. Developments
in other areas of study continue to contribute to AC, and as ML techniques become increasingly
powerful, facilitated in part by increases in computational power, their application to musical tasks
benefits. The concepts presented in chapter two, and more elaborate examples of those in chapter
three, are a continuation of the historical timeline presented here.
10 Nierhaus, Algorithmic Composition, 207, and Richard F. Lyon, Human and Machine Hearing (Cambridge: Cambridge University Press, 2017), 420.
11 Lyon, Human and Machine Hearing, 420.
12 Paluszek, MATLAB Machine Learning, 21.
13 Paluszek, MATLAB Machine Learning, 22.
Chapter 2
Machine Learning and Composition Techniques
Within ML, many different algorithms and techniques are used, each best suited to particular learning
tasks. In order to address ML approaches to algorithmic composition,
some common and foundational approaches should be understood.14 Two techniques are therefore
described: Artificial Neural Networks and Genetic Algorithms.
A major consideration in AC is the mapping of output data to musical parameters. ML systems
inherently deal with some of the difficulties of this. By making use of sample data to determine the
musical rules they apply, many general musical concepts are automatically imparted. This is one of
the primary attractions of using ML for compositional tasks, however some mapping issues remain. It
must be determined which musical parameters comprise the input data, and how those parameters are
extracted into numerical data. This also applies to the system’s output.15 An awareness of various
approaches to this is as important as understanding the workings of the ML systems themselves. This
will be touched on here in relation to ANNs and genetic algorithms, and differing approaches can be
found in the examples in chapter three.
Artificial neural networks are ML systems that are structurally analogous to neural networks in the
human brain.16 They learn to predict solutions by changing the structural relationships of a network of
data processing units called neurons. ANNs vary significantly in structure and application, from the
most basic, single layer perceptrons, to deep neural networks with complex architecture.17 They are
widely used in the field of ML, valued for their suitability to problems of pattern recognition,
prediction, optimization, and automatic classification.18
14 Nierhaus, Algorithmic Composition, 245.
15 Tom M. Mitchell, Machine Learning (Singapore: McGraw-Hill, 1997), 52.
16 Mitchell, Machine Learning, 82.
17 Lyon, Human and Machine Hearing, 419.
18 Nierhaus, Algorithmic Composition, 205.
Early conceptions of connectionist structures, or ANN-like structures, were created by Warren
McCulloch and Walter Pitts in the study of nervous systems. The McCulloch-Pitts neuron developed
in 1943 is a model that allows for basic logic calculations. In this model three inputs are summed
together and the system returns a 0 or 1 depending on whether the result reaches a given threshold.
Although these neurons are very simple, when arranged in a network they are capable of much more
complex calculations.19
Donald Hebb’s 1949 book The Organization of Behaviour was the next major contribution to the
development of ANNs. In it he proposed the first learning rules for neural architectures as they apply
in the human brain. Regarding the biological model, Hebb suggests that connections between brain
cells are reinforced by a growth process or metabolic change when one repeatedly triggers the other.
This is known as Hebbian learning and is used in various ANN systems, as well as other ML
techniques.20
Rosenblatt's Perceptron (1957) is generally accepted to be the first ANN, and it continues to
be used in ML systems. A perceptron maps a number of inputs to an output, generating a predicted
answer for the problem it has been trained to solve. The connections are affected by weights that are
adjusted according to the accuracy of its predictions on sample data sets. These simple weights
and mappings can be expressed algebraically as:
y = Wx
The multiplication of the weights and inputs is essentially a matrix multiplication. In this case y, the
output of the perceptron, is the sum of the weighted input values; x is the vector of input values; and
W is the set of weights by which the inputs are multiplied. The output is often then compared to a set threshold,
indicating a positive or negative result depending on that comparison. In this way the perceptron can
19 Paluszek, MATLAB Machine Learning, 17.
20 Nierhaus, Algorithmic Composition, 207, and Paluszek, MATLAB Machine Learning, 17.
be used to classify a given input data set, or without a threshold, can make linear numerical
predictions. Adjustments to the weight values are comparable to the varying strength of connections
between neurons in a human brain.21
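The perceptron just described can be sketched directly from the equation y = Wx, with a bias term standing in for the threshold. The training task (a logical AND gate), the learning rate, and the epoch count are illustrative choices; the weight-adjustment step is the classic perceptron learning rule.

```python
# A sketch of the perceptron described above: the output is the weighted
# sum of the inputs (y = Wx, plus a learned bias), compared against a
# threshold; weights are nudged toward correct answers on labelled data.

def predict(weights, bias, inputs, threshold=0.0):
    y = sum(w * x for w, x in zip(weights, inputs)) + bias  # y = Wx + b
    return 1 if y > threshold else 0

def train(samples, epochs=20, rate=0.1):
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            # adjust each weight in proportion to its input and the error
            weights = [w + rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias

# Logical AND: the output is positive only when both inputs are 1
samples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias = train(samples)
```

Because AND is linearly separable, the rule converges to weights that classify all four cases correctly; a single-layer perceptron could not do the same for XOR, which is the limitation discussed below.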
The original perceptron model is known as the single-layer perceptron (SLP). In this model the inputs
are directly mapped to the output via their learned weights, with weighting only occurring once,
between input and output. SLPs are not without limitations, as highlighted by Marvin Minsky and
Seymour Papert in 1969. Their observations temporarily held up the development of ANNs, but it was
later discovered that additional layers of neurons allowed systems to overcome those limitations.22
With the perceptron as a structural foundation, the incorporation of a back-propagation algorithm
allows the ANN to learn. Back-propagation changes the connection weights beginning with those to
the outputs and then moving back through the layers until reaching the inputs. In contrast to the feed-
forward networks described so far, recurrent neural networks incorporate connections between
neurons that do not send data in an input to output direction, and can create cycles within the network.
These networks can retain information from previous data sets and operate dynamically across
different learning events.
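Back-propagation can be sketched on a small feed-forward network: the error gradient is computed at the output and pushed back through the layers, adjusting the output weights first and the input weights last, as described above. The architecture (2 inputs, 2 hidden neurons, 1 output), the XOR training task, the initial weights, and the learning rate are all illustrative choices.

```python
import math

# Back-propagation on a tiny feed-forward network trained on XOR, the
# task a single-layer perceptron cannot solve. Initial weights are fixed
# and asymmetric so that the run is reproducible.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

w1 = [[0.5, -0.4], [-0.3, 0.6]]   # input -> hidden weights
b1 = [0.1, -0.1]                  # hidden biases
w2 = [0.4, -0.5]                  # hidden -> output weights
b2 = 0.2                          # output bias

data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

def forward(x):
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(w1, b1)]
    o = sigmoid(sum(w * hi for w, hi in zip(w2, h)) + b2)
    return h, o

def total_loss():
    return sum((forward(x)[1] - t) ** 2 for x, t in data)

loss_before = total_loss()
rate = 0.5
for _ in range(5000):
    for x, t in data:
        h, o = forward(x)
        # output layer: error gradient through the sigmoid
        delta_o = (o - t) * o * (1 - o)
        # hidden layer: push the output delta back through w2
        delta_h = [delta_o * w2[j] * h[j] * (1 - h[j]) for j in range(2)]
        # update weights, starting at the output and moving back
        for j in range(2):
            w2[j] -= rate * delta_o * h[j]
        b2 -= rate * delta_o
        for j in range(2):
            for i in range(2):
                w1[j][i] -= rate * delta_h[j] * x[i]
            b1[j] -= rate * delta_h[j]
loss_after = total_loss()
```

After training, the squared error over the four cases is far lower than at the start; the hidden layer is what makes the non-linearly-separable XOR mapping learnable at all.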
Genetic algorithms (GA) are a class of evolutionary algorithms whose mechanisms are loosely
modeled on Darwinian biological evolution. They employ stochastic processes in the generation and
recombination of data sets that are representative of a possible solution to the task they are applied to.
GAs excel at very different tasks from ANNs. Rather than learning from sets of data, they search for the
most efficient or desirable solution to a given problem by measuring the success of randomly
generated solutions.23
One of the basic elements of a GA is a population of individuals, which are initially randomly
generated, or generated according to the current best solutions to the problem. Each individual has a
21 Lyon, Human and Machine Hearing, 420.
22 Mitchell, Machine Learning, 95, and Lyon, Human and Machine Hearing, 420.
23 Nierhaus, Algorithmic Composition, 158-159.
chromosome which is a set of data from which a possible solution can be derived. On each cycle of
the system, the chromosome of every member of the population is assessed against the fitness criterion,
which is typically a measurement of the chromosome's success at solving the problem. After the
assessment, individuals are selected to parent the next generation of the population. Those with better
scores in the fitness test are more likely to be selected and their chromosomes combined to make new
individuals. At this point, each new individual has a small chance to be mutated, that is, some data in
its chromosome may be randomly generated. This cycle continues until either the solution is found or
all individuals possess the same data.24
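The cycle just described can be sketched end to end on a toy problem. The task here (evolving a bit-string of all 1s, often called "OneMax"), the population size, and the mutation rate are illustrative stand-ins for a real problem and its tuning.

```python
import random

# A sketch of the genetic-algorithm cycle described above: a random
# population, a fitness test, fitness-biased parent selection,
# single-point crossover, and occasional mutation.

random.seed(1)
LENGTH, POP, MUTATION_RATE = 12, 20, 0.05

def fitness(chromosome):          # number of "correct" genes
    return sum(chromosome)

def select(population):
    """Fitness-proportionate selection of one parent (+1 avoids zero weights)."""
    weights = [fitness(c) + 1 for c in population]
    return random.choices(population, weights=weights)[0]

def crossover(a, b):              # single-point recombination
    cut = random.randrange(1, LENGTH)
    return a[:cut] + b[cut:]

def mutate(chromosome):           # each gene has a small chance to flip
    return [1 - g if random.random() < MUTATION_RATE else g
            for g in chromosome]

population = [[random.randint(0, 1) for _ in range(LENGTH)]
              for _ in range(POP)]
best = max(population, key=fitness)
for generation in range(200):
    candidate = max(population, key=fitness)
    if fitness(candidate) > fitness(best):
        best = candidate
    if fitness(best) == LENGTH:   # perfect solution found
        break
    population = [mutate(crossover(select(population), select(population)))
                  for _ in range(POP)]
```

In a compositional setting the chromosome would encode musical material and the fitness test would measure musical desirability, which is typically the hardest part of the design.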
An alternate technique of evolutionary computation is genetic programming (GP). In these systems
the individuals within the population are themselves computer programs, rather than numerical
strings. The programs in GP algorithms are represented as trees. The trees are comprised of nodes that
indicate a specific mathematical function, and the variables of a given function make up the
descendant nodes in the tree structure. Similarly to GAs, GP uses successive iterations of its
population to improve performance at a task. The program trees' fitness is assessed, and new
generations are created, combining and mutating the previous trees. In this way GP systems are
capable of developing an algorithm that is well suited to a particular task, and as such, can be
especially useful when working in conjunction with other computational techniques.25
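The tree representation described above can be sketched with nested tuples: internal nodes name a function, leaves hold variables or constants, and a program is run by recursing through the tree. Only evaluation is shown; a full GP system would also swap subtrees between trees during crossover. The function set and the example program are illustrative choices.

```python
import operator

# A GP-style program tree: internal nodes are functions, leaves are
# variables or constants. Evaluation recurses from the root down.

FUNCTIONS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def evaluate(node, variables):
    if isinstance(node, tuple):            # function node: (name, left, right)
        name, left, right = node
        return FUNCTIONS[name](evaluate(left, variables),
                               evaluate(right, variables))
    if isinstance(node, str):              # variable leaf
        return variables[node]
    return node                            # constant leaf

# the program x*x + 2 represented as a tree
tree = ("add", ("mul", "x", "x"), 2)
result = evaluate(tree, {"x": 3})
```

Because programs are data structures, crossover and mutation can act on them directly, which is what lets GP evolve the algorithm itself rather than a fixed-length parameter string.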
Applying these techniques to compositional tasks brings its own set of problems. While ML systems
are often able to derive musical and compositional rules from sample music, the way data is extracted
from the music, and also reinterpreted at the output, is a significant decision for the composer. The
ML composition process can be broken into three steps: data processing, which in ANNs is the
conversion of musical features into a numerical representation; data analysis, the processing of the
data through the network; and then the generation of a composition by the trained system. The third
step in particular presents a distinct challenge: especially in the pursuit of genuinely human-like
compositions, the rules put in place at this step should be general enough to encapsulate the rules of
tonal music, but must also be able to more narrowly facilitate the use of these rules to generate music
that resembles the input music.26
24 Mitchell, Machine Learning, 250, and Nierhaus, Algorithmic Composition, 159.
25 Mitchell, Machine Learning, 262.
Various approaches have been used to represent musical parameters in the context of ML. While these
are often dependent on the specific system, there are some more general approaches. Temporal
parameters, such as the duration of specific notes or phrases, have occasionally been dealt with by
creating windows of time as structural components of the composition. The most obvious example is the
use of a musical measure as a temporal building block. Note sequences can then be arranged within a
given window, simplifying the task of relating the contained notes to each other. Windows can also be
easily repeated or arranged to adhere to higher level structural rules.
Pitches, in combination with a note’s temporal features, are much more easily codified, either by
mapping a tuning system to numbers, or using pre-established formats such as MIDI or MusicXML.
So as to learn and generate coherent musical phrases, pitches will often be considered in the context
of a scale and learned relative to the melodic scope of that scale.27
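The two representation ideas above can be combined in a short sketch: pitches are encoded relative to a scale, and a sliding window pairs each note with the notes that precede it, yielding (context, target) training examples. The note values, the C major scale, and the window size are illustrative choices.

```python
# Encoding pitches relative to a scale, then forming sliding-window
# training examples, as described above.

C_MAJOR = [0, 2, 4, 5, 7, 9, 11]   # pitch classes of the C major scale

def to_degree(midi_pitch):
    """Map a MIDI pitch to its degree within the C major scale."""
    octave, pitch_class = divmod(midi_pitch, 12)
    return octave * 7 + C_MAJOR.index(pitch_class)

melody = [60, 62, 64, 65, 67, 65, 64, 62, 60]   # C D E F G F E D C
degrees = [to_degree(p) for p in melody]

# Each example pairs a window of preceding notes with the note that
# follows it; a learner is trained to predict the target from the window.
WINDOW = 3
examples = [(degrees[i:i + WINDOW], degrees[i + WINDOW])
            for i in range(len(degrees) - WINDOW)]
```

Encoding scale degrees rather than raw pitches means that stepwise motion looks the same in every octave and key, which is precisely why pitches are "learned relative to the melodic scope" of the scale.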
26 Ryan Lichtenwalter, Katerina Zorina, and Nitesh V. Chawla, "Applying Learning Algorithms to Music Generation", in Proceedings of the 4th Indian International Conference on Artificial Intelligence (IICAI 2009), 2009, 1.
27 Nierhaus, Algorithmic Composition, 213.
Chapter 3
Examples of Machine Learning in Music
This chapter will examine three unique examples of applied ML in order to cultivate a more concrete
understanding of how ML can be used to generate music. These examples will be analysed for their
use of ML, their limitations, and possible future directions. Each represents music generation
at a different level. The first is a model proposed by Ryan Lichtenwalter, Katerina Zorina and Nitesh
V. Chawla, and discussed in their paper Applying Learning Algorithms to Music Generation.28 This
model, which will be referred to as LZC, learns from and outputs musical notation in the form of
MusicXML files. The second example, Google Deepmind’s WaveNet, generates digital audio
waveforms sample by sample, learning from and outputting sound at a resolution much finer than
musical notation. Finally, Rebecca Fiebrink’s Wekinator applies ML to a very different musical task.
It learns to recognise a series of inputs, which can be controlled by human hand gestures, and maps
them to specified output data. In this way it can be used as a trainable controller for the real-time
manipulation of musical, or other, parameters.
LZC
The system proposed by Lichtenwalter, Zorina and Chawla (LZC), avoids the use of explicit rules to
funnel the output into a limited musical scope, such as blues progressions or jazz improvisation. To do
this it employs a sliding window sequential learning technique. This allows for a given parameter to
be determined in relation to the musical content that temporally precedes it. It is also designed to be
usable with a variety of different learning algorithms. LZC is a powerful example of the musical
potential in ML composition that deals with music generation on a notational level.
An important design decision of LZC is its use of the MusicXML format as both a source of musical
data, and an output. MusicXML is a file format for musical notation, and this ensures that all the
28 Lichtenwalter, "Applying learning algorithms".
data LZC deals with is clearly musically expressible. This contrasts with MIDI files, in which musical
form is more difficult to clearly indicate. Another key feature of the LZC system is the way in which
it considers each note in regard to others in its vicinity. A sliding window system is employed to this
effect, in which a window selection moves along the score, grouping sections of notes together. Each
note’s features are acknowledged by the system to include the notes preceding it in the sliding
window. This information is learned in terms of both the pitch and duration of a note and its
precedents.29
LZC is designed to learn from any musical notation input, which must then be applicable to any
composition it generates. Information acquired from a piece in one key signature needs to be
extrapolated into a universally relevant set of rules. This is done by transposing the input notation into
one of two keys: C major for all pieces determined to be in a major key, and A minor for those in a
minor key. The designers acknowledge that this division is inadequate, as it does not accommodate
minor key modulations that might occur in pieces considered to be in a major key, and suggest that
future work on this system might address this.30
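The normalisation step just described can be sketched in a few lines: every input piece is transposed so that major-key material is learned in C major and minor-key material in A minor. Key detection is assumed to have happened already; the tonic is passed in as a MIDI pitch class, and the function name and phrase are illustrative.

```python
# A sketch of LZC-style key normalisation: transpose all pitches so the
# tonic becomes C (for major keys) or A (for minor keys).

def normalise(pitches, tonic_pc, is_major):
    """Transpose MIDI pitches so the tonic pitch class becomes C or A."""
    target = 0 if is_major else 9          # pitch class of C, or of A
    shift = (target - tonic_pc) % 12
    if shift > 6:                          # prefer the smaller interval
        shift -= 12
    return [p + shift for p in pitches]

# a phrase in D major (tonic pitch class 2) moved down a tone into C major
phrase = [62, 66, 69, 74]                  # D F# A D
normalised = normalise(phrase, tonic_pc=2, is_major=True)  # C E G C
```

This is exactly the step the designers flag as inadequate: a mid-piece modulation to the relative minor would be transposed as if it were still major-key material.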
LZC has clear limitations. Its sequences of notes are unconnected as the system is unable to learn or
implement broader musical structure such as phrasing or repetition. Even with this, and its simplified
handling of modality and chord recognition, it is able to produce “reasonable” sounding music.31 With
many foundational system structures in place, these elements can be improved with further
development, making LZC’s results all the more encouraging.
29 Lichtenwalter, "Applying learning algorithms", 3-4.
30 Lichtenwalter, "Applying learning algorithms", 3.
31 Lichtenwalter, "Applying learning algorithms", 13.
WaveNet
WaveNet is a deep ANN for generating digital audio waveforms created by van den Oord et al. at
Google Deepmind.32 Although it is examined here for its use in music generation, WaveNet’s primary
focus is on mimicking the human voice, particularly in text-to-speech (TTS) applications.
Concatenative TTS is perhaps the most common TTS approach, in which short fragments of speech
are recorded into a database and then called upon to be recombined into words. Once the database has
been established it is difficult to adjust the sound or inflection of the generated speech. In WaveNet,
sounds are learned at the resolution of a digital audio file, making precise and variable imitation
possible. The design of WaveNet is naturally informed by its suitability to voice mimicry. When
trained on musical data sets it is capable of producing interesting results that are clearly derivative of
the sample data. Unlike LZC, WaveNet makes no direct assessment of the musical parameters of its
inputs, and musicality in its output can be attributed to rules that govern the temporal arrangement of
digital audio samples rather than musical parameters.33
A crucial factor in WaveNet's success is the manner in which it learns from a given audio file: each
sample is considered in regard to all those that came before it. The probability of a sample with
particular attributes occurring after a series of samples is modelled in a multi-layered neural network.
The network used within WaveNet is a convolutional neural network (CNN). This is a type of feed-
forward network, meaning that data is passed from input to output via the layers of neurons without
any cyclic loops as in RNNs. This allows each sample to be conditioned only by its temporal
predecessors and not by any samples that come after it. CNNs are much faster to train than RNNs
because of their lack of recurrent connections, but require significantly more layers to achieve the
same temporal scope of input for a given output (referred to as the receptive field).
WaveNet combats this by using dilated convolutions, allowing the network to skip some input values
and operate on a coarser level. Many other refinements to specific functions in WaveNet are
32 Shane Legg, Mustafa Suleyman, and Demis Hassabis, "DeepMind", accessed 13 May 2017, https://deepmind.com/.
33 Aaron van den Oord et al., "WaveNet: A Generative Model for Raw Audio", DeepMind blog, 2016, accessed 13 May 2017, https://deepmind.com/blog/wavenet-generative-model-raw-audio/.
employed, streamlining its processes and making it possible to run the system. When applied to TTS,
WaveNet outperforms other TTS techniques, both in malleability, and convincing mimicry. When
trained on musical data sets its output is recognisably derived from the sounds within the sample data
but with only limited musical coherence, certainly far from the musicality that can be generated by
LZC.34
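The effect of dilation on the receptive field can be shown with simple arithmetic: with kernel size 2, each stacked layer with dilation d widens the field by d samples, so doubling the dilations grows the field exponentially with depth, while undilated layers grow it only linearly. The layer counts and dilation values below are illustrative, not WaveNet's actual published configuration.

```python
# Receptive-field arithmetic for stacked dilated causal convolutions:
# each layer with dilation d and kernel size k adds d*(k-1) samples of
# temporal context on top of the single sample at the output.

KERNEL = 2

def receptive_field(dilations):
    return 1 + sum(d * (KERNEL - 1) for d in dilations)

dilated = receptive_field([1, 2, 4, 8, 16, 32, 64, 128])   # 8 layers
undilated = receptive_field([1] * 8)                       # 8 layers
```

Eight dilated layers see 256 samples of context where eight undilated layers see only 9, which is why dilation makes modelling audio at sample resolution computationally feasible.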
As a system for composition, WaveNet is limited by its audio-based approach. It does not learn any
particular musical rules from sample data because its decision making is concerned with such small
elements in the time domain. This is also its strength, as the system has learnt how to assemble the
sounds themselves, with effective mimicry of pitch and timbre. The fact that the musical output of
WaveNet has hints of musicality arising from an arrangement of such small elements is promising.
Perhaps with further development, increased computational power, and further training, more explicit
musical rules could be learned in the form of these lower level structural rules.
Wekinator
Wekinator was developed by Rebecca Fiebrink (2011) in order to fill a perceived gap in research. It
therefore became the first ‘general-purpose tool for interactively applying supervised learning to real
time musical problems, including audio and gesture analysis'.35 Supervised learning refers to a
broad class of ML algorithms whose output is assessed against known target values in order to learn the system's weights.
Wekinator derives its name from the ML and data mining toolkit Weka. While Weka's infrastructure
does not allow for real-time input streams, it can be used as a library, from which its supervised
learning algorithms have been implemented in the Wekinator software.
34 van den Oord et al., "WaveNet: A Generative Model for Raw Audio".
35 Rebecca Fiebrink, Real-time Human Interaction with Supervised Learning Algorithms for Music Composition and Performance (PhD diss., Princeton University, 2011), 60. http://proxy.library.adelaide.edu.au/login?url=http://search.proquest.com.proxy.library.adelaide.edu.au/docview/854505276?accountid=8203
The Wekinator software works by allowing the user to generate trained models for real-time input and
output. Input data sets are connected, through training, to specific output values that can be sent to
other software. The implication is that certain input data will correlate with musical parameters in
the receiving software according to the user’s training, allowing for real-time manipulation of those
parameters.36
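The workflow just described can be sketched in miniature: the user records input/output pairs, a model is trained on them, and new inputs are mapped to outputs in real time. Wekinator itself offers several Weka-derived algorithms; a 1-nearest-neighbour mapping is used here purely for brevity, and all feature and parameter names are invented for illustration.

```python
# An illustrative sketch of the Wekinator-style workflow: pair input
# features (e.g. hand position from a sensor) with desired output
# parameters, then map new inputs to outputs. In Wekinator the output
# values would be sent onward to other software as OSC messages.
import math

training_examples = []  # list of (input_features, output_parameters)

def record_example(features, parameters):
    """'Training' phase: store a user-demonstrated input/output pair."""
    training_examples.append((features, parameters))

def map_input(features):
    """'Performance' phase: output the parameters of the closest example."""
    def dist(example):
        return math.dist(features, example[0])
    return min(training_examples, key=dist)[1]

# The user demonstrates two gestures: hand low -> quiet and dark,
# hand high -> loud and bright. (Parameter names are hypothetical.)
record_example([0.1, 0.2], {"gain": 0.2, "cutoff": 400})
record_example([0.9, 0.8], {"gain": 0.9, "cutoff": 4000})

print(map_input([0.8, 0.9]))  # resolves to the second gesture's parameters
```

The point of the design is that the mapping is learned from demonstrations rather than programmed explicitly, so the same workflow serves any input device that can produce a feature vector.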
Without user-designated connections between the Wekinator outputs and other software, the output
values are arbitrary. In this way, the ML algorithms are not applied with any particular consideration
of potential musical generation. The system is however tailored for musical applications in other
ways, namely by its use of Open Sound Control (OSC) as a data format for communicating with other
software, and specifically by incorporating an optional component written in and for use with the
ChucK programming language.
Wekinator’s use of ML for interactive tasks differentiates it from other systems. Weka’s tools operate
in a specific way, focusing on generating a model that represents a static data set as accurately as
possible. The learning required for Wekinator is such that input data may change, and this change can
occur in real time. In the case of a user controlling the sound of a digital instrument, it might be
desirable for the neural network connections to be adjusted while the user hears the impact this has on
the end product. This is all made possible in Wekinator by a variety of options in the user
interface that allow the user to switch between training, testing and performance with ease,
implementing the Weka tools as required. The ANN itself is a multilayer perceptron: a feed-forward
network that trains by means of back-propagation to adjust connection weights.
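A multilayer perceptron of this kind can be written out in a few dozen lines. The following from-scratch sketch has one hidden layer and adjusts its connection weights by back-propagating the output error; it is an illustration of the general technique, not Wekinator’s actual implementation, which draws on Weka’s classes.

```python
# A minimal multilayer perceptron: one hidden layer, feed-forward,
# trained by back-propagation of the output error.
import math, random

random.seed(1)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class MLP:
    def __init__(self, n_in, n_hidden):
        # Random initial weights; the trailing +1 in each layer is a bias.
        self.w1 = [[random.uniform(-1, 1) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [random.uniform(-1, 1) for _ in range(n_hidden + 1)]

    def forward(self, x):
        xb = x + [1.0]
        self.h = [sigmoid(sum(w * xi for w, xi in zip(row, xb)))
                  for row in self.w1]
        hb = self.h + [1.0]
        self.y = sigmoid(sum(w * hi for w, hi in zip(self.w2, hb)))
        return self.y

    def backward(self, x, target, lr=0.5):
        # Output-layer error term, then propagate back to hidden weights.
        dy = (self.y - target) * self.y * (1 - self.y)
        hb = self.h + [1.0]
        dh = [dy * self.w2[j] * self.h[j] * (1 - self.h[j])
              for j in range(len(self.h))]
        for j in range(len(self.w2)):
            self.w2[j] -= lr * dy * hb[j]
        xb = x + [1.0]
        for j in range(len(self.w1)):
            for i in range(len(xb)):
                self.w1[j][i] -= lr * dh[j] * xb[i]

# Train on XOR, a task a single-layer perceptron cannot learn.
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
        ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
net = MLP(2, 4)

def total_error():
    return sum((net.forward(x) - t) ** 2 for x, t in data)

err_before = total_error()
for _ in range(5000):
    for x, t in data:
        net.forward(x)
        net.backward(x, t)
err_after = total_error()
```

XOR is the classic example because it is exactly the kind of non-linearly-separable problem that motivated the move from Rosenblatt’s single-layer perceptron to multi-layered networks.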
Wekinator is used for very different musical tasks than WaveNet and LZC, and so cannot be valuably
compared with them in terms of musical output. The system output, although important in terms of
the control it gives the user, has little to do with the specific musical results, and in fact could be quite
usefully applied to non-musical tasks.
36 Fiebrink, Real-time Human Interaction, 61.
When used compositionally, the musical task it performs is one of facilitation. It allows the user to
make connections between input and the resulting sound in ways that might not have otherwise been
conceived, or even possible. In this way, its musical use might be said to be exploratory, allowing for
the discovery of new ideas, all the while still curated by the user or composer’s sensibilities.
In the first example, the LZC system learns from musical notation and composes derivative music.
Certain musical concepts are reduced to a computationally simpler form, which narrows the
musical scope of the compositions it generates. LZC is, however, capable of producing convincing
reworkings of the input data, and with further work could do so with greater
authenticity. WaveNet is a powerful ANN that excels at TTS tasks, and also produces interesting
results when musical sample data is fed to its inputs. It is limited in its ability to generate music with
coherent overarching structure, but it shows promise in its ability to mimic tonal and timbral
qualities. The applications for Wekinator differ significantly from the other examples: it is a
versatile tool for controlling musical parameters in unique but developable ways. While it does not
offer any specific approach to generating musical content, it allows the user to engage with a large
variety of musical tasks in a thoroughly customisable and exploratory manner. These examples
demonstrate a range of possible applications for ML in compositional tasks. Examining their
limitations and future potential helps to lay the foundations for future projects and the
progression of the field.
Chapter 4
Working Towards a New System
This paper has established some of the basic principles of ML in composition and provided insight
into the potential of ML music systems. In this chapter the foundations of a new system will be
proposed. It will be an amalgamation of a selection of ideas and techniques presented in the paper,
and will point to a future project designed to extend the developer’s understanding of ML music
systems. The proposed system, called DuANN, will draw on elements from the examples, combining
them in consideration of their established limitations.
The DuANN system will allow for interactivity with the user, with similar flexibility to Wekinator’s
input options. The input may be gestural, interpreted by video tracking or other methods of
measuring positions in real space. Because this data is fed into an ANN, the input type may vary, as the
network will learn to use any set of data and automatically configure its mappings.
Two ANNs will be used in series to facilitate dialogue between the user and the system. The first
receives the control input and allows the user to map a gesture to an output data set. This data set is
then sent to the second ANN, which predicts a possible note or chord output for that
data set. The system begins the interaction by proposing a note or chord and allowing the user to perform a
gesture that the first ANN will relate to that musical output. If the gesture is then repeated
before another note is proposed, the second ANN reinforces the connections between the note or
chord as it exists in data-set form and the pitch and duration it produces in the form of MIDI data.
In this way a conversation continues between the human and DuANN, building up a library of musical
outputs that correlate with gestures and can also be adjusted.
By using two ANNs, the musical output is abstracted beyond the direct control allowed by Wekinator,
adding to the conversational feel of the interaction. Further developments of DuANN would allow the
second ANN to generate larger musical phrases derived from the initial gesture-to-note mappings, in a
similar manner to the LZC system. DuANN represents a new approach to interactive computer music
generation, drawing on concepts illustrated by the examples.
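Since DuANN is only proposed here, its interaction loop can be sketched with the two ANNs stood in for by simple lookup tables, so that only the flow of data is shown: gesture, intermediate data set, MIDI note, with reinforcement on repetition. Every name and data structure below is invented for illustration.

```python
# A conceptual sketch of the proposed DuANN interaction loop. The two
# ANNs are replaced by dictionaries purely to show the data flow:
# gesture -> data set (ANN 1) -> (pitch, duration) in MIDI terms (ANN 2),
# with the mapping reinforced when a gesture is repeated.
import random

random.seed(0)

gesture_to_data = {}   # stands in for ANN 1: control input -> data set
data_to_note = {}      # stands in for ANN 2: data set -> (pitch, duration)

def propose_note():
    """System opens the dialogue by proposing a MIDI pitch and duration."""
    return (random.randint(48, 72), random.choice([0.25, 0.5, 1.0]))

def perform_gesture(gesture, proposed):
    """User performs a gesture; the mapping to the proposed note is
    created, or reinforced if the gesture is repeated."""
    key = tuple(gesture)
    if key not in gesture_to_data:
        gesture_to_data[key] = key          # ANN 1 learns the mapping
        data_to_note[key] = {"note": proposed, "strength": 1}
    else:
        data_to_note[key]["strength"] += 1  # ANN 2 reinforces it

def play(gesture):
    """Recall the MIDI output now associated with a learned gesture."""
    return data_to_note[gesture_to_data[tuple(gesture)]]["note"]

note = propose_note()
perform_gesture([0.2, 0.7], note)   # user responds to the proposal
perform_gesture([0.2, 0.7], note)   # repeats it, reinforcing the link
```

In the real system both stages would be trained networks able to generalise to unseen gestures, which is precisely what the dictionaries here cannot do; the sketch only fixes the shape of the dialogue.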
Conclusion
ML is a subfield of AI concerned with the study and creation of machines that can learn. Various
techniques have been used to achieve ML, and many of these can be usefully applied to the
composition of music. This paper has summarised the foundational ML concepts and examined
examples in order to provide a foundation for further research.
The numerous ideas that predate the inception of ML have been shown to contribute to its
development. Early concepts of AI and connectionist structures led to the creation of ANNs as well
as other ML algorithms. The origins of their use in algorithmic composition can be similarly
attributed to an established practice of applying ideas from mathematics and natural sciences to
composition.
This paper has shown how the invention of Rosenblatt’s Perceptron led the way to the complex
multi-layered ANNs used in contemporary ML. By adjusting connections between input and output
data, these algorithms are capable of effective learning for the purposes of classification or linear
prediction. Genetic algorithms can be used to stochastically generate solutions to problems by
combining and mutating a large number of possible solutions. Both of these techniques can be used
separately or together in musical composition.
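The genetic-algorithm cycle of selection, recombination and mutation can be illustrated on a musical toy problem. The sketch below evolves short melodies with a fitness function that rewards notes from the C major scale; it is a generic illustration of the technique, not a reconstruction of any system discussed in this paper.

```python
# A small genetic-algorithm illustration: a population of candidate
# melodies is repeatedly selected, recombined and mutated, using a toy
# fitness that counts notes belonging to the C major scale.
import random

random.seed(42)

SCALE = {0, 2, 4, 5, 7, 9, 11}          # C major pitch classes
LENGTH, POP, GENERATIONS = 8, 30, 60

def fitness(melody):
    """Count how many notes fall in the scale."""
    return sum(1 for n in melody if n % 12 in SCALE)

def crossover(a, b):
    """Combine two parent melodies at a random split point."""
    cut = random.randrange(1, LENGTH)
    return a[:cut] + b[cut:]

def mutate(melody, rate=0.1):
    """Randomly replace some notes with new MIDI pitches."""
    return [random.randint(48, 72) if random.random() < rate else n
            for n in melody]

# Random initial population of 8-note melodies (MIDI pitches 48-72).
population = [[random.randint(48, 72) for _ in range(LENGTH)]
              for _ in range(POP)]
for _ in range(GENERATIONS):
    population.sort(key=fitness, reverse=True)
    parents = population[:POP // 2]              # keep the fittest half
    children = [mutate(crossover(random.choice(parents),
                                 random.choice(parents)))
                for _ in range(POP - len(parents))]
    population = parents + children

best = max(population, key=fitness)
```

In musical practice the interesting design problem is the fitness function itself, which is why systems such as GenJam put a human listener in the evaluation loop rather than a formula like the one above.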
The three examples examined in chapter three have shown the diversity of possible applications for
ML to musical tasks. LZC demonstrates how ML can be used to generate scores that derive their
musical rules from a corpus of existing music. WaveNet is capable of creating each sample of a digital
audio file in relation to the preceding samples, such that it convincingly mimics the sound of music or
speech audio that is fed into the system. Wekinator is a tool for the live control of musical parameters.
It differs significantly from the other two systems in that it does not directly learn from or generate
musical rules. Instead it allows the user to create unique control profiles and manipulate sound or
other musical parameters in new and unique ways. Following from these examples, in chapter four, a
new system was proposed, drawing on the ideas presented in the rest of the paper. This system,
DuANN, acts as a conceptual foundation for a future project with the intention of extending the
developer’s knowledge of ML music systems.
By examining the examples in chapter three in relation to the concepts established in chapters one and
two, a basic understanding of ML has been presented, and the scope and possibilities of ML music
demonstrated. This, in conjunction with the proposed new DuANN system, provides a valuable starting
point for further research in the field, and indicates specific plausible steps toward it.
Bibliography

Albertson, Dan, and Ron Hannah. “The Living Composers Project: Eduardo Reck Miranda.” Last modified 2017. Accessed May 10, 2017. http://composers21.com/compdocs/mirandae.htm

Biles, John A. “GenJam: A Genetic Algorithm for Generating Jazz Solos.” In ICMC Proceedings (1994): 131–137.

Coenen, Alcedo. “David Cope, Experiments in Musical Intelligence.” Organised Sound 2, no. 1 (1997): 57–60.
Cope, David. Computer Models of Musical Creativity. Cambridge: The MIT Press, 2005.

Cope, David. The Algorithmic Composer. Madison: A-R Editions, 2000.

De Mantaras, Ramon L., and Josep Lluis Arcos. “AI and Music: From Composition to Expressive Performance.” AI Magazine 23, no. 3 (2002): 43–57.

Doornbusch, Paul. “Computer Sound Synthesis in 1951: The Music of CSIRAC.” Computer Music Journal 28, no. 1 (2004): 10–25.

Fernández, Jose David, and Francisco Vico. “AI Methods in Algorithmic Composition: A Comprehensive Survey.” Journal of Artificial Intelligence Research 48 (2013): 513–582.

Fiebrink, Rebecca. Real-time Human Interaction with Supervised Learning Algorithms for Music Composition and Performance. PhD diss., Princeton University, 2011. http://proxy.library.adelaide.edu.au/login?url=http://search.proquest.com.proxy.library.adelaide.edu.au/docview/854505276?accountid=8203

Goertzel, Ben, and Cassio Pennachin, eds. Artificial General Intelligence. Berlin: Springer, 2007.

Johnson, Colin G. “Towards a Prehistory of Evolutionary and Adaptive Computation in Music.” In Applications of Evolutionary Computing, edited by Colin G. Johnson, 502–509. Berlin: Springer, 2003.

Kirke, A., and Eduardo R. Miranda, eds. Guide to Computing for Expressive Music Performance. London: Springer, 2013.

Kohavi, Ron, and Foster Provost. “Glossary of Terms.” Accessed June 1, 2017. http://robotics.stanford.edu/~ronnyk/glossary.html. Originally published in Machine Learning 30 (1998): 271–274.

Legg, Shane, Mustafa Suleyman, and Demis Hassabis. “DeepMind.” Accessed May 13, 2017. https://deepmind.com/

Lichtenwalter, Ryan, K. Zorina, and Nitesh V. Chawla. “Applying Learning Algorithms to Music Generation.” In Proceedings of the 4th Indian International Conference on Artificial Intelligence (IICAI 2009), 483–502.

Lyon, Richard F. Human and Machine Hearing. Cambridge: Cambridge University Press, 2017.
Minsky, Marvin. The Society of Mind. New York: Simon and Schuster, 1985.

Miranda, Eduardo R. “Cellular Automata Music: An Interdisciplinary Project.” Interface 22, no. 1 (January 1993): 3–21.

Mitchell, Tom M. Machine Learning. Singapore: The McGraw-Hill Book Co., 1997.

Nierhaus, Gerhard. Algorithmic Composition. Vienna: Springer, 2009.

Nierhaus, Gerhard, ed. Patterns of Intuition: Musical Creativity in the Light of Algorithmic Composition. Netherlands: Springer, 2015.

Paluszek, Michael, and Stephanie Thomas. MATLAB Machine Learning. Berkeley: Apress, 2017.

Proudfoot, Diane, and B. Jack Copeland. “Artificial Intelligence.” In The Oxford Handbook of Philosophy of Cognitive Science. Oxford: Oxford University Press, 2012. Accessed June 15, 2017. http://www.oxfordhandbooks.com.proxy.library.adelaide.edu.au/view/10.1093/oxfordhb/9780195309799.001.0001/oxfordhb-9780195309799-e-7

Romero, J., and Penousal Machado, eds. The Art of Artificial Evolution. Berlin: Springer-Verlag, 2008.

Rowe, Robert. Machine Musicianship. Cambridge: The MIT Press, 2001.

Tecuci, Gheorghe. “Artificial Intelligence.” WIREs Computational Statistics 4 (2012): 168–180.

van den Oord, Aaron, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu. “WaveNet: A Generative Model for Raw Audio.” DeepMind, 2016. Accessed May 13, 2017. https://deepmind.com/blog/wavenet-generative-model-raw-audio/