University of Adelaide
Elder Conservatorium of Music
Research Project
Submitted in partial fulfilment of the requirements for the degree of
Bachelor of Music with Honours
Machine Learning in Music Composition
Submitted by
Simon Harley Koehn a1667613
Adelaide, June 2017
CONTENTS
Abstract
Declaration
Acknowledgements
Introduction
Chapter 1: Machine Learning and Algorithmic Composition History
Chapter 2: Machine Learning and Composition Techniques
Chapter 3: Examples of Machine Learning in Music
Chapter 4: Working Towards a New System
Conclusion
Bibliography
ABSTRACT
Within the field of artificial intelligence, machine learning is the study and creation of computer
systems that can learn. Such systems can be applied to algorithmic composition, in which a
machine learning algorithm learns musical rules from existing music samples and applies those rules
to its own generated compositions.
This paper provides a general explanation of certain machine learning concepts, and a historical
context for these. Three existing systems are examined to give a sense of what is musically possible
with such a system, and identify their limitations and possible improvements.
A conceptual model is then proposed for a future system that draws on some of the ideas presented,
taking note of the strengths of techniques used in the example systems. By presenting key concepts
and pointing to possible future directions, this paper lays the foundation for future research into ML
music systems.
DECLARATION
The candidate declares that the material contained in this submission is his/her own work and that appropriate recognition has been given when referring to the work of others.
Signature:........................................
Simon Koehn
Date: 22/6/2017
ACKNOWLEDGEMENTS
Thanks to my supervisor Christian Haines for invaluable guidance throughout this project. Also to Adele Sliuzas, for her expert proof-reading and moral support, all while carrying our unborn child.
INTRODUCTION
Machine learning is a powerful computational tool with numerous possible applications. Current
machine learning techniques have been applied to music composition with considerable success. New
technology and greater computational power in the twenty-first century have expanded the ways in
which machine learning can be manifested in the world of music. As a fast-paced field, these
applications of machine learning are still in their infancy, which opens up possibilities for the future
scope of machine learning in music. Further developments in machine learning as related to algorithmic
composition will allow for increasingly human-like computer music composition.
Artificial intelligence (AI) deals with the application of machines to tasks that are generally
considered to require human intelligence. Machine learning (ML) is an important subfield of this,
specifically concerning computer systems that can be said to learn.1 The study of AI is often met with
debate with regard to its goals and scope. The notion that intelligence, as exhibited by humans, can be
explained or understood to the point of being recreatable is not universally accepted.2 Whether or not
true intelligence can ever be created, many of the techniques that have arisen from this pursuit,
including ML, are of great value.
If the goal of AI is to create human-like general intelligence in a machine, machine learning is one of
the key elements that must operate in conjunction with other subfields of AI. Simply put, these
systems make predictions on data fed into them, based on previous sets of input data. ML can be used
to great effect when applied to tasks that might be difficult to address by developing more explicit
algorithms or procedures. Such tasks might include identifying objects in an image, predicting the
price of a house based on numerous attributes, or even predicting people’s actions in a given situation.
1 Ron Kohavi and Foster Provost, "Glossary of Terms", accessed 1 June 2017, http://robotics.stanford.edu/~ronnyk/glossary.html. Originally published in Machine Learning 30 (1998): 271-274.
2 Ben Goertzel and Cassio Pennachin, eds., Artificial General Intelligence (Berlin: Springer, 2007), vi-vii.
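The prediction idea described above can be illustrated with a short sketch: a system fits a model to previous data and uses it to predict a value for new data. The figures and the choice of a simple linear model are invented for the example; a real system would use many attributes and a more capable learner.

```python
# Minimal illustration of "predictions based on previous sets of input
# data": fit a line to past house sales (floor area -> price) and use it
# to predict the price of an unseen house. All data values are invented.

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - a * mean_x
    return a, b

# Past sales: floor area in square metres -> sale price
areas = [50, 80, 110, 140]
prices = [200_000, 320_000, 440_000, 560_000]

a, b = fit_line(areas, prices)
predicted = a * 100 + b  # predicted price of a 100 square metre house
```

The "learning" here is simply the fitting of the two parameters; the ML systems discussed in later chapters fit far larger parameter sets, but the principle of generalising from sample data is the same.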
In addition to these more practical applications, ML, amongst other techniques from AI, has been
incorporated into artistic practices. This takes the form of either a tool that uses machine learning to
assist with the artistic process, or a ML algorithm that is created to generate artistic content of its own
based on learning from existing art works or other data. The application of ML to music fits obviously
into the discipline of algorithmic composition. Numerous attempts have been made to produce music
with ML, and many are in progress.
Chapter one of this paper will define some of the key concepts in ML, and look at its history, and the
more general history of AI. This is then expanded upon in the second chapter, with specific regard to
artificial neural networks and genetic algorithms. Their inner workings are discussed, along with
approaches to applying ML to composition. In the third chapter, three examples are examined and
assessed in terms of their limitations and musical possibilities. The first of these is referred to in this
paper as LZC, a model proposed by Lichtenwalter, Zorina and Chawla for generating musical scores.3
WaveNet by Google Deepmind is an artificial neural network that can be trained on digital audio files
in order to generate its own audio, sample by sample. It is unable to consider musical parameters as
meaningfully as LZC, but its ability to mimic the input sounds shows promise for future iterations.
Lastly Wekinator represents a different application of ML to musical tasks. It operates as a trainable
control system that allows the user to link almost any input to musical parameters in other software.
These examples demonstrate the scope of possible applications of ML to music and point to future
possibilities. In response to the concepts presented, the fourth chapter proposes a new system,
DuANN, that combines elements of the example systems in chapter three in order to allow for user
interaction in the generation of music.
In all, this paper serves as a summary of machine learning in algorithmic composition, drawing on
concepts from both computer science and musical disciplines. In doing so it provides a foundation for
future study in this area, and points to possible directions that might be pursued.
3 Lichtenwalter, "Applying learning algorithms".
Chapter 1
Machine Learning and Algorithmic Composition History
The use of ML in musical tasks typically fits the definition of algorithmic composition, whether it is
applied to genuine composition, or fits somehow into a broader compositional process. As such, ML
music has historical roots in AC as well as computer science and AI. As AI and ML research
progresses, with increasingly effective systems and greater computational power, so too does the
potential for its application to AC. Algorithmic composition (AC) can be defined as the ‘generation of
musical structure... [by] a formalizable and abstracting procedure’.4 This definition encompasses the
use of any conceivable algorithm, as long as it is used to generate musical structure on some level. It
also covers a spectrum of application, from the use of algorithms to assist with specific compositional
tasks, to generating entire compositions. Certain classes of algorithms have often been used in
composition, and as such have become standard approaches in AC. These include Markov chains,
transition networks, chaos systems, and generative grammars.
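Of the standard approaches listed above, the first-order Markov chain is the simplest to sketch: transition probabilities are counted from a sample melody, and new notes are drawn from those counts. The sample melody below is invented for illustration.

```python
import random

# A first-order Markov chain, one of the standard AC techniques named
# above. Counting which note follows which in a sample melody gives a
# transition table; new material is generated by walking that table.

sample = ["C", "D", "E", "D", "C", "D", "E", "F", "E", "D", "C"]

# Count transitions: for each note, which notes have followed it
transitions = {}
for cur, nxt in zip(sample, sample[1:]):
    transitions.setdefault(cur, []).append(nxt)

def generate(start, length, seed=0):
    """Walk the transition table to produce a new melody."""
    random.seed(seed)
    melody = [start]
    for _ in range(length - 1):
        # choosing uniformly from the recorded successors reproduces
        # the transition probabilities of the sample
        melody.append(random.choice(transitions[melody[-1]]))
    return melody

new_melody = generate("C", 8)
```

Every pair of adjacent notes in the output is a pair that occurred in the sample, which is both the appeal and the limitation of the technique: local note-to-note style is captured, but no larger structure.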
Examples of AC can be found as early as AD 1000, and appear throughout the following millennium.
The calculation of an algorithm is obviously a vital part of the AC process, and as such, the
development of the programmable computer dramatically increased the feasibility of more
complicated systems. Calculations that might once have been arduous manual tasks could be
automated and completed in a much shorter time.5 This allowed for the development and
implementation of evolutionary and adaptive algorithms such as genetic algorithms and artificial
neural networks.6 The latter refers to a specific type of learning algorithm used in ML, which, in
various forms, has been successfully applied to compositional tasks.
4 Gerhard Nierhaus, Algorithmic Composition (Vienna: Springer, 2009), 1.
5 Nierhaus, Algorithmic Composition, 21, 63.
6 Nierhaus, Algorithmic Composition, 4-5.
Within the field of computer science, Machine Learning deals with the study and creation of learning
algorithms. These algorithms make predictions on data sets based on the results of previous
predictions and sample data sets. ML systems avoid using static program directives by making
decisions or predictions dependent on the data they are fed. This attribute is of great value in tasks that
require analysis of large amounts of data or are difficult to explicitly define.
ML is one of several areas of study that have developed out of AI research. The term artificial
intelligence was first suggested by John McCarthy in 1956 while attending a summer workshop at
Dartmouth College alongside other important figures in AI. It concerns the development of systems
that model behaviours usually associated with human intelligence.7 Following the Dartmouth
workshop, many of those in attendance went on to develop basic AI systems, generating a great deal
of enthusiasm for the field. Among these were Allen Newell and Herbert Simon’s reasoning program
the Logic Theorist, Arthur Samuel’s checker playing program, and systems for problem solving,
vision, learning and planning, which were developed by Marvin Minsky and his students.8
While these systems showed promise, more complex real-world problems were still out of reach.
Accurate modeling of intelligence that could solve problems more efficiently requires vast amounts of
knowledge and the ability to easily search through that knowledge. More generally, it was
acknowledged that a greater understanding of cognitive processes was required in order to develop
these models. This led to a split into many areas of AI, allowing researchers to focus on individual
cognitive processes. These include knowledge representation, search, theorem proving, language
processing, vision, robotics, and learning. 9
7 Gheorghe Tecuci, "Artificial Intelligence", WIREs Computational Statistics 4 (2012): 169.
8 Tecuci, "Artificial Intelligence", 179, and Michael Paluszek and Stephanie Thomas, MATLAB Machine Learning (Berkeley: Apress, 2017), 17.
9 Tecuci, "Artificial Intelligence", 170.
Although ML began as a branch of AI, some techniques and related concepts were developed much
earlier. Bayes’ Theorem, which suggests an equation for determining the probability of an event
dependent on another variable, was created by Thomas Bayes in 1763, and is still used in ML
systems. In 1957 Frank Rosenblatt began developing his perceptron, the first model of a type of ML
algorithm called artificial neural networks (ANN).10 Rosenblatt’s model is relatively simple, but with
significant developments it survives as a key element in many ML systems.11
Contemporary ML began as data mining, an area of considerable focus within early AI research
concerned with the analysis of data. While interest in this initially diminished, it was reinvented in the
1990s as machine learning, applied to tasks of pattern recognition. The vast amounts of data that had
since become available thanks to internet connectivity made this approach significantly more feasible.
ML has since been applied to numerous tasks including driverless cars, high-speed stock trading, and
the prediction of human and natural events.12 Development of ML technologies continues today,
fuelled in part by their value to many consumer technologies, but also industrial and military
applications. The integration of human and machine intelligence is also being actively researched, with
potential to lead to autonomous control over machines, or even human augmentation.13
ML and AC both have their roots in numerous fields, usually predating their inception. Developments
in other areas of study continue to contribute to AC, and as ML techniques become increasingly
powerful, facilitated in part by increases in computational power, their application to musical tasks
benefits. The concepts presented in chapter two, and more elaborate examples of those in chapter
three, are a continuation of the historical timeline presented here.
10 Nierhaus, Algorithmic Composition, 207, and Richard F. Lyon, Human and Machine Hearing (Cambridge: Cambridge University Press, 2017), 420.
11 Lyon, Human and Machine Hearing, 420.
12 Paluszek, MATLAB Machine Learning, 21.
13 Paluszek, MATLAB Machine Learning, 22.
Chapter 2
Machine Learning and Composition Techniques
Within ML, many different algorithms and techniques are used, each best suited to particular learning
tasks. In order to address ML approaches to algorithmic composition,
some common and foundational approaches should be understood.14 Two techniques are therefore
described: Artificial Neural Networks and Genetic Algorithms.
A major consideration in AC is the mapping of output data to musical parameters. ML systems
inherently deal with some of the difficulties of this. By making use of sample data to determine the
musical rules they apply, many general musical concepts are automatically imparted. This is one of
the primary attractions of using ML for compositional tasks, however some mapping issues remain. It
must be determined which musical parameters comprise the input data, and how those parameters are
extracted into numerical data. This also applies to the system’s output.15 An awareness of various
approaches to this is as important as understanding the workings of the ML systems themselves. This
will be touched on here in relation to ANNs and genetic algorithms, and differing approaches can be
found in the examples in chapter three.
Artificial neural networks are ML systems that are structurally analogous to neural networks in the
human brain.16 They learn to predict solutions by changing the structural relationships of a network of
data processing units called neurons. ANNs vary significantly in structure and application, from the
most basic, single layer perceptrons, to deep neural networks with complex architecture.17 They are
widely used in the field of ML, valued for their suitability to problems of pattern recognition,
prediction, optimization, and automatic classification.18
14 Nierhaus, Algorithmic Composition, 245.
15 Tom M. Mitchell, Machine Learning (Singapore: McGraw-Hill, 1997), 52.
16 Mitchell, Machine Learning, 82.
17 Lyon, Human and Machine Hearing, 419.
18 Nierhaus, Algorithmic Composition, 205.
Early conceptions of connectionist structures, or ANN-like structures, were created by Warren
McCulloch and Walter Pitts in the study of nervous systems. The McCulloch-Pitts neuron developed
in 1943 is a model that allows for basic logic calculations. In this model three inputs are summed
together and the system returns a 0 or 1 depending on whether the result reaches a given threshold.
Although these neurons are very simple, when arranged in a network they are capable of much more
complex calculations.19
Donald Hebb’s 1949 book The Organization of Behaviour was the next major contribution to the
development of ANNs. In it he proposed the first learning rules for neural architectures as they apply
in the human brain. Regarding the biological model, Hebb suggests that connections between brain
cells are reinforced by a growth process or metabolic change when one repeatedly triggers the other.
This is known as Hebbian learning and is used in various ANN systems, as well as other ML
techniques.20
Rosenblatt's Perceptron (1957) is generally accepted to be the first ANN, and it continues to
be used in ML systems. A perceptron maps a number of inputs to an output, generating a predicted
answer for the problem it has been trained to solve. The connections are affected by weights that are
adjusted according to the accuracy of its predictions on sample data sets. These simple weights
and mappings can be expressed algebraically as:
y = Wx
The multiplication of the weights and inputs is essentially a matrix multiplication. In this case y, the
output of the perceptron, is the sum of the weighted input values; x is the vector of input values; and
W is the set of weights by which the inputs are multiplied. The output is often then compared to a set threshold,
indicating a positive or negative result depending on that comparison. In this way the perceptron can
19 Paluszek, MATLAB Machine Learning, 17.
20 Nierhaus, Algorithmic Composition, 207, and Paluszek, MATLAB Machine Learning, 17.
be used to classify a given input data set, or without a threshold, can make linear numerical
predictions. Adjustments to the weight values are comparable to the varying strength of connections
between neurons in a human brain.21
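The perceptron just described can be sketched directly from the equation y = Wx, with a bias term standing in for the threshold. The training task (a logical AND gate), the learning rate, and the epoch count are illustrative choices; the weight-adjustment step is the classic perceptron learning rule.

```python
# A sketch of the perceptron described above: the output is the weighted
# sum of the inputs (y = Wx, plus a learned bias), compared against a
# threshold; weights are nudged toward correct answers on labelled data.

def predict(weights, bias, inputs, threshold=0.0):
    y = sum(w * x for w, x in zip(weights, inputs)) + bias  # y = Wx + b
    return 1 if y > threshold else 0

def train(samples, epochs=20, rate=0.1):
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            # adjust each weight in proportion to its input and the error
            weights = [w + rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias

# Logical AND: the output is positive only when both inputs are 1
samples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias = train(samples)
```

Because AND is linearly separable, the rule converges to weights that classify all four cases correctly; a single-layer perceptron could not do the same for XOR, which is the limitation discussed below.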
The original perceptron model is known as the single-layer perceptron (SLP). In this model the inputs
are directly mapped to the output via their learned weights, with weighting only occurring once,
between input and output. SLPs are not without limitations, as highlighted by Marvin Minsky and
Seymour Papert in 1969. Their observations temporarily held up the development of ANNs, but it was
later discovered that additional layers of neurons allowed systems to overcome those limitations.22
With the perceptron as a structural foundation, the incorporation of a back-propagation algorithm
allows the ANN to learn. Back-propagation changes the connection weights beginning with those to
the outputs and then moving back through the layers until reaching the inputs. In contrast to the feed-
forward networks described so far, recurrent neural networks incorporate connections between
neurons that do not send data in an input to output direction, and can create cycles within the network.
These networks can retain information from previous data sets and operate dynamically across
different learning events.
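Back-propagation can be sketched on a small feed-forward network: the error gradient is computed at the output and pushed back through the layers, adjusting the output weights first and the input weights last, as described above. The architecture (2 inputs, 2 hidden neurons, 1 output), the XOR training task, the initial weights, and the learning rate are all illustrative choices.

```python
import math

# Back-propagation on a tiny feed-forward network trained on XOR, the
# task a single-layer perceptron cannot solve. Initial weights are fixed
# and asymmetric so that the run is reproducible.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

w1 = [[0.5, -0.4], [-0.3, 0.6]]   # input -> hidden weights
b1 = [0.1, -0.1]                  # hidden biases
w2 = [0.4, -0.5]                  # hidden -> output weights
b2 = 0.2                          # output bias

data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

def forward(x):
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(w1, b1)]
    o = sigmoid(sum(w * hi for w, hi in zip(w2, h)) + b2)
    return h, o

def total_loss():
    return sum((forward(x)[1] - t) ** 2 for x, t in data)

loss_before = total_loss()
rate = 0.5
for _ in range(5000):
    for x, t in data:
        h, o = forward(x)
        # output layer: error gradient through the sigmoid
        delta_o = (o - t) * o * (1 - o)
        # hidden layer: push the output delta back through w2
        delta_h = [delta_o * w2[j] * h[j] * (1 - h[j]) for j in range(2)]
        # update weights, starting at the output and moving back
        for j in range(2):
            w2[j] -= rate * delta_o * h[j]
        b2 -= rate * delta_o
        for j in range(2):
            for i in range(2):
                w1[j][i] -= rate * delta_h[j] * x[i]
            b1[j] -= rate * delta_h[j]
loss_after = total_loss()
```

After training, the squared error over the four cases is far lower than at the start; the hidden layer is what makes the non-linearly-separable XOR mapping learnable at all.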
Genetic algorithms (GA) are a class of evolutionary algorithms whose mechanisms are loosely
modeled on Darwinian biological evolution. They employ stochastic processes in the generation and
recombination of data sets that are representative of a possible solution to the task they are applied to.
GAs excel at very different tasks from ANNs. Rather than learning from sets of data, they search for the
most efficient or desirable solution to a given problem by measuring the success of randomly
generated solutions.23
One of the basic elements of a GA is a population of individuals, which are initially randomly
generated, or generated according to the current best solutions to the problem. Each individual has a
21 Lyon, Human and Machine Hearing, 420.
22 Mitchell, Machine Learning, 95, and Lyon, Human and Machine Hearing, 420.
23 Nierhaus, Algorithmic Composition, 158-159.
chromosome which is a set of data from which a possible solution can be derived. On each cycle of
the system, the chromosome of every member of the population is assessed against the fitness criterion,
which is typically a measurement of the chromosome's success at solving the problem. After the
assessment, individuals are selected to parent the next generation of the population. Those with better
scores in the fitness test are more likely to be selected and their chromosomes combined to make new
individuals. At this point, each new individual has a small chance to be mutated, that is, some data in
its chromosome may be randomly generated. This cycle continues until either the solution is found or
all individuals possess the same data.24
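The cycle just described can be sketched end to end on a toy problem. The task here (evolving a bit-string of all 1s, often called "OneMax"), the population size, and the mutation rate are illustrative stand-ins for a real problem and its tuning.

```python
import random

# A sketch of the genetic-algorithm cycle described above: a random
# population, a fitness test, fitness-biased parent selection,
# single-point crossover, and occasional mutation.

random.seed(1)
LENGTH, POP, MUTATION_RATE = 12, 20, 0.05

def fitness(chromosome):          # number of "correct" genes
    return sum(chromosome)

def select(population):
    """Fitness-proportionate selection of one parent (+1 avoids zero weights)."""
    weights = [fitness(c) + 1 for c in population]
    return random.choices(population, weights=weights)[0]

def crossover(a, b):              # single-point recombination
    cut = random.randrange(1, LENGTH)
    return a[:cut] + b[cut:]

def mutate(chromosome):           # each gene has a small chance to flip
    return [1 - g if random.random() < MUTATION_RATE else g
            for g in chromosome]

population = [[random.randint(0, 1) for _ in range(LENGTH)]
              for _ in range(POP)]
best = max(population, key=fitness)
for generation in range(200):
    candidate = max(population, key=fitness)
    if fitness(candidate) > fitness(best):
        best = candidate
    if fitness(best) == LENGTH:   # perfect solution found
        break
    population = [mutate(crossover(select(population), select(population)))
                  for _ in range(POP)]
```

In a compositional setting the chromosome would encode musical material and the fitness test would measure musical desirability, which is typically the hardest part of the design.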
An alternate technique of evolutionary computation is genetic programming (GP). In these systems
the individuals within the population are themselves computer programs, rather than numerical
strings. The programs in GP algorithms are represented as trees. The trees are comprised of nodes that
indicate a specific mathematical function, and the variables of a given function make up the
descendant nodes in the tree structure. Similarly to GAs, GP uses successive iterations of its
population to improve performance at a task. The program trees' fitness is assessed, and new
generations are created, combining and mutating the previous trees. In this way GP systems are
capable of developing an algorithm that is well suited to a particular task, and as such, can be
especially useful when working in conjunction with other computational techniques.25
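The tree representation described above can be sketched with nested tuples: internal nodes name a function, leaves hold variables or constants, and a program is run by recursing through the tree. Only evaluation is shown; a full GP system would also swap subtrees between trees during crossover. The function set and the example program are illustrative choices.

```python
import operator

# A GP-style program tree: internal nodes are functions, leaves are
# variables or constants. Evaluation recurses from the root down.

FUNCTIONS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def evaluate(node, variables):
    if isinstance(node, tuple):            # function node: (name, left, right)
        name, left, right = node
        return FUNCTIONS[name](evaluate(left, variables),
                               evaluate(right, variables))
    if isinstance(node, str):              # variable leaf
        return variables[node]
    return node                            # constant leaf

# the program x*x + 2 represented as a tree
tree = ("add", ("mul", "x", "x"), 2)
result = evaluate(tree, {"x": 3})
```

Because programs are data structures, crossover and mutation can act on them directly, which is what lets GP evolve the algorithm itself rather than a fixed-length parameter string.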
Applying these techniques to compositional tasks brings its own set of problems. While ML systems
are often able to derive musical and compositional rules from sample music, the way data is extracted
from the music, and also reinterpreted at the output, is a significant decision for the composer. The
ML composition process can be broken into three steps: data processing, which in ANNs is the
conversion of musical features into a numerical representation; data analysis, the processing of the
data through the network; and then the generation of a composition by the trained system. The third
step in particular presents a distinct challenge: especially in the pursuit of genuinely human-like
compositions, the rules put in place at this step should be general enough to encapsulate the rules of
tonal music, but must also be able to more narrowly facilitate the use of these rules to generate music
that resembles the input music.26
24 Mitchell, Machine Learning, 250, and Nierhaus, Algorithmic Composition, 159.
25 Mitchell, Machine Learning, 262.
Various approaches have been used to represent musical parameters in the context of ML. While these
are often dependent on the specific system, there are some more general approaches. Temporal
parameters, such as the duration of specific notes or phrases, have occasionally been dealt with by
creating windows of time as structural components of the composition. The most obvious example is the
use of a musical measure as a temporal building block. Note sequences can then be arranged within a
given window, simplifying the task of relating the contained notes to each other. Windows can also be
easily repeated or arranged to adhere to higher level structural rules.
Pitches, in combination with a note’s temporal features, are much more easily codified, either by
mapping a tuning system to numbers, or using pre-established formats such as MIDI or MusicXML.
So as to learn and generate coherent musical phrases, pitches will often be considered in the context
of a scale and learned relative to the melodic scope of that scale.27
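The two representation ideas above can be combined in a short sketch: pitches are encoded relative to a scale, and a sliding window pairs each note with the notes that precede it, yielding (context, target) training examples. The note values, the C major scale, and the window size are illustrative choices.

```python
# Encoding pitches relative to a scale, then forming sliding-window
# training examples, as described above.

C_MAJOR = [0, 2, 4, 5, 7, 9, 11]   # pitch classes of the C major scale

def to_degree(midi_pitch):
    """Map a MIDI pitch to its degree within the C major scale."""
    octave, pitch_class = divmod(midi_pitch, 12)
    return octave * 7 + C_MAJOR.index(pitch_class)

melody = [60, 62, 64, 65, 67, 65, 64, 62, 60]   # C D E F G F E D C
degrees = [to_degree(p) for p in melody]

# Each example pairs a window of preceding notes with the note that
# follows it; a learner is trained to predict the target from the window.
WINDOW = 3
examples = [(degrees[i:i + WINDOW], degrees[i + WINDOW])
            for i in range(len(degrees) - WINDOW)]
```

Encoding scale degrees rather than raw pitches means that stepwise motion looks the same in every octave and key, which is precisely why pitches are "learned relative to the melodic scope" of the scale.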
26 Ryan Lichtenwalter, Katerina Zorina, and Nitesh V. Chawla, "Applying Learning Algorithms to Music Generation", in Proceedings of the 4th Indian International Conference on Artificial Intelligence (IICAI 2009), 2009, 1.
27 Nierhaus, Algorithmic Composition, 213.
Chapter 3
Examples of Machine Learning in Music
This chapter will examine three unique examples of applied ML in order to cultivate a more concrete
understanding of how ML can be used to generate music. These examples will be analysed for their
use of ML, their limitations, and possible future directions. Each represents music generation
at a different level. The first is a model proposed by Ryan Lichtenwalter, Katerina Zorina and Nitesh
V. Chawla, and discussed in their paper Applying Learning Algorithms to Music Generation.28 This
model, which will be referred to as LZC, learns from and outputs musical notation in the form of
MusicXML files. The second example, Google Deepmind’s WaveNet, generates digital audio
waveforms sample by sample, learning from and outputting sound at a resolution much finer than
musical notation. Finally, Rebecca Fiebrink’s Wekinator applies ML to a very different musical task.
It learns to recognise a series of inputs, which can be controlled by human hand gestures, and maps
them to specified output data. In this way it can be used as a trainable controller for the real-time
manipulation of musical, or other, parameters.
LZC
The system proposed by Lichtenwalter, Zorina and Chawla (LZC), avoids the use of explicit rules to
funnel the output into a limited musical scope, such as blues progressions or jazz improvisation. To do
this it employs a sliding window sequential learning technique. This allows for a given parameter to
be determined in relation to the musical content that temporally precedes it. It is also designed to be
usable with a variety of different learning algorithms. LZC is a powerful example of the musical
potential in ML composition that deals with music generation on a notational level.
An important design decision of LZC is its use of the MusicXML format as both a source of musical
data, and an output. MusicXML is a file format for musical notation, and this ensures that all the
28 Lichtenwalter, "Applying learning algorithms".
data LZC deals with is clearly musically expressible. This contrasts with MIDI files, in which musical
form is more difficult to clearly indicate. Another key feature of the LZC system is the way in which
it considers each note in regard to others in its vicinity. A sliding window system is employed to this
effect, in which a window selection moves along the score, grouping sections of notes together. Each
note’s features are acknowledged by the system to include the notes preceding it in the sliding
window. This information is learned in terms of both the pitch and duration of a note and its
precedents.29
LZC is designed to learn from any musical notation input, which must then be applicable to any
composition it generates. Information acquired from a piece in one key signature needs to be
extrapolated into a universally relevant set of rules. This is done by transposing the input notation into
one of two keys: C major for all pieces determined to be in a major key, and A minor for those in a
minor key. The designers acknowledge that this division is inadequate, as it does not accommodate
minor key modulations that might occur in pieces considered to be in a major key, and suggest that
future work on this system might address this.30
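The normalisation step just described can be sketched in a few lines: every input piece is transposed so that major-key material is learned in C major and minor-key material in A minor. Key detection is assumed to have happened already; the tonic is passed in as a MIDI pitch class, and the function name and phrase are illustrative.

```python
# A sketch of LZC-style key normalisation: transpose all pitches so the
# tonic becomes C (for major keys) or A (for minor keys).

def normalise(pitches, tonic_pc, is_major):
    """Transpose MIDI pitches so the tonic pitch class becomes C or A."""
    target = 0 if is_major else 9          # pitch class of C, or of A
    shift = (target - tonic_pc) % 12
    if shift > 6:                          # prefer the smaller interval
        shift -= 12
    return [p + shift for p in pitches]

# a phrase in D major (tonic pitch class 2) moved down a tone into C major
phrase = [62, 66, 69, 74]                  # D F# A D
normalised = normalise(phrase, tonic_pc=2, is_major=True)  # C E G C
```

This is exactly the step the designers flag as inadequate: a mid-piece modulation to the relative minor would be transposed as if it were still major-key material.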
LZC has clear limitations. Its sequences of notes are unconnected as the system is unable to learn or
implement broader musical structure such as phrasing or repetition. Even with this, and its simplified
handling of modality and chord recognition, it is able to produce “reasonable” sounding music.31 With
many foundational system structures in place, these elements can be improved with further
development, making LZC’s results all the more encouraging.
29 Lichtenwalter, "Applying learning algorithms", 3-4.
30 Lichtenwalter, "Applying learning algorithms", 3.
31 Lichtenwalter, "Applying learning algorithms", 13.
WaveNet
WaveNet is a deep ANN for generating digital audio waveforms created by van den Oord et al. at
Google Deepmind.32 Although it is examined here for its use in music generation, WaveNet’s primary
focus is on mimicking the human voice, particularly in text-to-speech (TTS) applications.
Concatenative TTS is perhaps the most common TTS approach, in which short fragments of speech
are recorded into a database and then called upon to be recombined into words. Once the database has
been established it is difficult to adjust the sound or inflection of the generated speech. In WaveNet,
sounds are learned at the resolution of a digital audio file, making precise and variable imitation
possible. The design of WaveNet is naturally informed by its suitability to voice mimicry. When
trained on musical data sets it is capable of producing interesting results that are clearly derivative of
the sample data. Unlike LZC, WaveNet makes no direct assessment of the musical parameters of its
inputs, and musicality in its output can be attributed to rules that govern the temporal arrangement of
digital audio samples rather than musical parameters.33
A crucial factor in WaveNet's success is the manner in which it learns from a given audio file: each
sample is considered in regard to all those that came before it. The probability of a sample with
particular attributes occurring after a series of samples is modelled in a multi-layered neural network.
The network used within WaveNet is a convolutional neural network (CNN). This is a type of feed-
forward network, meaning that data is passed from input to output via the layers of neurons without
any cyclic loops as in RNNs. This allows each sample to be conditioned only by its temporal
predecessors and not by any samples that come after it. CNNs are much faster to train than RNNs
because of their lack of recurrent connections, but require significantly more layers to achieve the
same temporal scope of input for a given output (referred to as the receptive field).
WaveNet combats this by using dilated convolutions, allowing the network to skip some input values
and operate on a coarser level. Many other refinements to specific functions in WaveNet are
32 Shane Legg, Mustafa Suleyman, and Demis Hassabis, "DeepMind", accessed 13 May 2017, https://deepmind.com/.
33 Aaron van den Oord et al., "WaveNet: A Generative Model for Raw Audio", DeepMind blog, 2016, accessed 13 May 2017, https://deepmind.com/blog/wavenet-generative-model-raw-audio/.
employed, streamlining its processes and making it possible to run the system. When applied to TTS,
WaveNet outperforms other TTS techniques, both in malleability, and convincing mimicry. When
trained on musical data sets its output is recognisably derived from the sounds within the sample data
but with only limited musical coherence, certainly far from the musicality that can be generated by
LZC.34
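The effect of dilation on the receptive field can be shown with simple arithmetic: with kernel size 2, each stacked layer with dilation d widens the field by d samples, so doubling the dilations grows the field exponentially with depth, while undilated layers grow it only linearly. The layer counts and dilation values below are illustrative, not WaveNet's actual published configuration.

```python
# Receptive-field arithmetic for stacked dilated causal convolutions:
# each layer with dilation d and kernel size k adds d*(k-1) samples of
# temporal context on top of the single sample at the output.

KERNEL = 2

def receptive_field(dilations):
    return 1 + sum(d * (KERNEL - 1) for d in dilations)

dilated = receptive_field([1, 2, 4, 8, 16, 32, 64, 128])   # 8 layers
undilated = receptive_field([1] * 8)                       # 8 layers
```

Eight dilated layers see 256 samples of context where eight undilated layers see only 9, which is why dilation makes modelling audio at sample resolution computationally feasible.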
As a system for composition, WaveNet is limited by its audio-based approach. It does not learn any
particular musical rules from sample data because its decision making is concerned with such small
elements in the time domain. This is also its strength, as the system has learnt how to assemble the
sounds themselves, with effective mimicry of pitch and timbre. The fact that the musical output of
WaveNet has hints of musicality arising from an arrangement of such small elements is promising.
Perhaps with further development, increased computational power, and further training, more explicit
musical rules could be learned in the form of these lower level structural rules.
Wekinator
Wekinator was developed by Rebecca Fiebrink (2011) in order to fill a perceived gap in research. It
therefore became the first ‘general-purpose tool for interactively applying supervised learning to real
time musical problems, including audio and gesture analysis'.35 Supervised learning refers to a
broad class of ML algorithms whose output is assessed against known target values in order to learn the system's weights.
Wekinator derives its name from the ML and data mining toolkit Weka. While Weka's infrastructure
does not allow for real-time input streams, it can be used as a library, from which its supervised
learning algorithms have been implemented in the Wekinator software.
34 van den Oord et al., "WaveNet: A Generative Model for Raw Audio".
35 Rebecca Fiebrink, Real-time Human Interaction with Supervised Learning Algorithms for Music Composition and Performance (PhD diss., Princeton University, 2011), 60. http://proxy.library.adelaide.edu.au/login?url=http://search.proquest.com.proxy.library.adelaide.edu.au/docview/854505276?accountid=8203
The Wekinator software works by allowing the user to generate trained models for real-time input and
output. Input data sets are connected, through training, to specific output values that can be sent to
other software. The implication is that certain input data will correlate with musical parameters in
the receiving software according to the user’s training, allowing for real-time manipulation of those
parameters.36
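The workflow just described can be sketched in miniature: the user records input/output pairs, a model is trained on them, and new inputs are mapped to outputs in real time. Wekinator itself offers several Weka-derived algorithms; a 1-nearest-neighbour mapping is used here purely for brevity, and all feature and parameter names are invented for illustration.

```python
# An illustrative sketch of the Wekinator-style workflow: pair input
# features (e.g. hand position from a sensor) with desired output
# parameters, then map new inputs to outputs. In Wekinator the output
# values would be sent onward to other software as OSC messages.
import math

training_examples = []  # list of (input_features, output_parameters)

def record_example(features, parameters):
    """'Training' phase: store a user-demonstrated input/output pair."""
    training_examples.append((features, parameters))

def map_input(features):
    """'Performance' phase: output the parameters of the closest example."""
    def dist(example):
        return math.dist(features, example[0])
    return min(training_examples, key=dist)[1]

# The user demonstrates two gestures: hand low -> quiet and dark,
# hand high -> loud and bright. (Parameter names are hypothetical.)
record_example([0.1, 0.2], {"gain": 0.2, "cutoff": 400})
record_example([0.9, 0.8], {"gain": 0.9, "cutoff": 4000})

print(map_input([0.8, 0.9]))  # resolves to the second gesture's parameters
```

The point of the design is that the mapping is learned from demonstrations rather than programmed explicitly, so the same workflow serves any input device that can produce a feature vector.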
Without user-designated connections between the Wekinator outputs and other software, the output
values are arbitrary. In this way, the ML algorithms are not applied with any particular consideration
of potential musical generation. The system is however tailored for musical applications in other
ways, namely by its use of Open Sound Control (OSC) as a data format for communicating with other
software, and specifically by incorporating an optional component written in and for use with the
ChucK programming language.
Wekinator’s use of ML for interactive tasks differentiates it from other systems. Weka’s tools operate
in a specific way, focusing on generating a model that represents a static data set as accurately as
possible. The learning required for Wekinator is such that input data may change, and this change can
occur in real time. In the case of a user controlling the sound of a digital instrument, it might be
desirable for the neural network connections to be adjusted while the user hears the impact this has on
the end product. This is all made possible in Wekinator by a variety of options in the user
interface that allow the user to switch between training, testing and performance with ease,
implementing the Weka tools as required. The ANN itself is a multilayer perceptron: a feed-forward
network that trains by means of back-propagation to adjust connection weights.
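A multilayer perceptron of this kind can be written out in a few dozen lines. The following from-scratch sketch has one hidden layer and adjusts its connection weights by back-propagating the output error; it is an illustration of the general technique, not Wekinator’s actual implementation, which draws on Weka’s classes.

```python
# A minimal multilayer perceptron: one hidden layer, feed-forward,
# trained by back-propagation of the output error.
import math, random

random.seed(1)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class MLP:
    def __init__(self, n_in, n_hidden):
        # Random initial weights; the trailing +1 in each layer is a bias.
        self.w1 = [[random.uniform(-1, 1) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [random.uniform(-1, 1) for _ in range(n_hidden + 1)]

    def forward(self, x):
        xb = x + [1.0]
        self.h = [sigmoid(sum(w * xi for w, xi in zip(row, xb)))
                  for row in self.w1]
        hb = self.h + [1.0]
        self.y = sigmoid(sum(w * hi for w, hi in zip(self.w2, hb)))
        return self.y

    def backward(self, x, target, lr=0.5):
        # Output-layer error term, then propagate back to hidden weights.
        dy = (self.y - target) * self.y * (1 - self.y)
        hb = self.h + [1.0]
        dh = [dy * self.w2[j] * self.h[j] * (1 - self.h[j])
              for j in range(len(self.h))]
        for j in range(len(self.w2)):
            self.w2[j] -= lr * dy * hb[j]
        xb = x + [1.0]
        for j in range(len(self.w1)):
            for i in range(len(xb)):
                self.w1[j][i] -= lr * dh[j] * xb[i]

# Train on XOR, a task a single-layer perceptron cannot learn.
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
        ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
net = MLP(2, 4)

def total_error():
    return sum((net.forward(x) - t) ** 2 for x, t in data)

err_before = total_error()
for _ in range(5000):
    for x, t in data:
        net.forward(x)
        net.backward(x, t)
err_after = total_error()
```

XOR is the classic example because it is exactly the kind of non-linearly-separable problem that motivated the move from Rosenblatt’s single-layer perceptron to multi-layered networks.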
Wekinator is used for very different musical tasks than WaveNet and LZC, and so cannot be valuably
compared with them in terms of musical output. The system output, although important in terms of
the control it gives the user, has little to do with the specific musical results, and in fact could be quite
usefully applied to non-musical tasks.
36 Fiebrink, Real-time Human Interaction, 61.
When used compositionally, the musical task it performs is one of facilitation. It allows the user to
make connections between input and the resulting sound in ways that might not have otherwise been
conceived, or even possible. In this way, its musical use might be said to be exploratory, allowing for
the discovery of new ideas, all the while still curated by the user or composer’s sensibilities.
In the first example, the LZC system learns from musical notation and composes derivative music.
Certain musical concepts are reduced to a computationally simpler form, which narrows the
musical scope of the compositions it generates. LZC is, however, capable of producing convincing
reworkings of the input data, and with further work could do so with greater
authenticity. WaveNet is a powerful ANN that excels at TTS tasks, and also produces interesting
results when musical sample data is fed to its inputs. It is limited in its ability to generate music with
coherent overarching structure, but it shows promise in its ability to mimic tonal and timbral
qualities. The applications for Wekinator differ significantly from the other examples: it is a
versatile tool for controlling musical parameters in unique but developable ways. While it does not
offer any specific approach to generating musical content, it allows the user to engage with a large
variety of musical tasks in a thoroughly customisable and exploratory manner. These examples
demonstrate a range of possible applications for ML in compositional tasks. Examining their
limitations and future potential helps to lay the foundations for future projects and the
progression of the field.
Chapter 4
Working Towards a New System
This paper has established some of the basic principles of ML in composition and provided insight
into the potential of ML music systems. In this chapter the foundations of a new system will be
proposed. It will be an amalgamation of a selection of ideas and techniques presented in the paper,
and will point to a future project designed to extend the developer’s understanding of ML music
systems. The proposed system, called DuANN, will draw on elements from the examples, combining
them in consideration of their established limitations.
The DuANN system will allow for interactivity with the user, with similar flexibility to Wekinator’s
input options. The input may be gestural, interpreted by video tracking or other methods of
measuring positions in real space. Because this data is fed into an ANN, the input type may vary, as the
network will learn to use any set of data and automatically configure its mappings.
Two ANNs will be used in series to facilitate dialogue between the user and the system. The first
receives the control input and allows the user to map a gesture to an output data set. This data set is
then sent to the second ANN, which predicts a possible note or chord output for that
data set. The system begins the interaction by proposing a note or chord and allowing the user to perform a
gesture that the first ANN will relate to that musical output. If the gesture is then repeated
before another note is proposed, the second ANN reinforces the connections between the note or
chord as it exists in data-set form and the pitch and duration it produces in the form of MIDI data.
In this way a conversation continues between the human and DuANN, building up a library of musical
outputs that correlate with gestures and can also be adjusted.
By using two ANNs, the musical output is abstracted beyond the direct control allowed by Wekinator,
adding to the conversational feel of the interaction. Further developments of DuANN would allow the
second ANN to generate larger musical phrases derived from the initial gesture-to-note mappings, in a
similar manner to the LZC system. DuANN represents a new approach to interactive computer music
generation, drawing on concepts illustrated by the examples.
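Since DuANN is only proposed here, its interaction loop can be sketched with the two ANNs stood in for by simple lookup tables, so that only the flow of data is shown: gesture, intermediate data set, MIDI note, with reinforcement on repetition. Every name and data structure below is invented for illustration.

```python
# A conceptual sketch of the proposed DuANN interaction loop. The two
# ANNs are replaced by dictionaries purely to show the data flow:
# gesture -> data set (ANN 1) -> (pitch, duration) in MIDI terms (ANN 2),
# with the mapping reinforced when a gesture is repeated.
import random

random.seed(0)

gesture_to_data = {}   # stands in for ANN 1: control input -> data set
data_to_note = {}      # stands in for ANN 2: data set -> (pitch, duration)

def propose_note():
    """System opens the dialogue by proposing a MIDI pitch and duration."""
    return (random.randint(48, 72), random.choice([0.25, 0.5, 1.0]))

def perform_gesture(gesture, proposed):
    """User performs a gesture; the mapping to the proposed note is
    created, or reinforced if the gesture is repeated."""
    key = tuple(gesture)
    if key not in gesture_to_data:
        gesture_to_data[key] = key          # ANN 1 learns the mapping
        data_to_note[key] = {"note": proposed, "strength": 1}
    else:
        data_to_note[key]["strength"] += 1  # ANN 2 reinforces it

def play(gesture):
    """Recall the MIDI output now associated with a learned gesture."""
    return data_to_note[gesture_to_data[tuple(gesture)]]["note"]

note = propose_note()
perform_gesture([0.2, 0.7], note)   # user responds to the proposal
perform_gesture([0.2, 0.7], note)   # repeats it, reinforcing the link
```

In the real system both stages would be trained networks able to generalise to unseen gestures, which is precisely what the dictionaries here cannot do; the sketch only fixes the shape of the dialogue.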
Conclusion
ML is a subfield of AI concerned with the study and creation of machines that can learn. Various
techniques have been used to achieve ML, and many of these can be usefully applied to the
composition of music. This paper has summarised the foundational ML concepts and examined
examples in order to provide a foundation for further research.
The numerous ideas that predate the inception of ML have been shown to contribute to its
development. Early concepts of AI and connectionist structures led to the creation of ANNs as well
as other ML algorithms. The origins of their use in algorithmic composition can be similarly
attributed to an established practice of applying ideas from mathematics and natural sciences to
composition.
This paper has shown how the invention of Rosenblatt’s Perceptron led the way to the complex
multi-layered ANNs used in contemporary ML. By adjusting connections between input and output
data, these algorithms are capable of effective learning for the purposes of classification or linear
prediction. Genetic algorithms can be used to stochastically generate solutions to problems by
combining and mutating a large number of possible solutions. Both of these techniques can be used
separately or together in musical composition.
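The genetic-algorithm cycle of selection, recombination and mutation can be illustrated on a musical toy problem. The sketch below evolves short melodies with a fitness function that rewards notes from the C major scale; it is a generic illustration of the technique, not a reconstruction of any system discussed in this paper.

```python
# A small genetic-algorithm illustration: a population of candidate
# melodies is repeatedly selected, recombined and mutated, using a toy
# fitness that counts notes belonging to the C major scale.
import random

random.seed(42)

SCALE = {0, 2, 4, 5, 7, 9, 11}          # C major pitch classes
LENGTH, POP, GENERATIONS = 8, 30, 60

def fitness(melody):
    """Count how many notes fall in the scale."""
    return sum(1 for n in melody if n % 12 in SCALE)

def crossover(a, b):
    """Combine two parent melodies at a random split point."""
    cut = random.randrange(1, LENGTH)
    return a[:cut] + b[cut:]

def mutate(melody, rate=0.1):
    """Randomly replace some notes with new MIDI pitches."""
    return [random.randint(48, 72) if random.random() < rate else n
            for n in melody]

# Random initial population of 8-note melodies (MIDI pitches 48-72).
population = [[random.randint(48, 72) for _ in range(LENGTH)]
              for _ in range(POP)]
for _ in range(GENERATIONS):
    population.sort(key=fitness, reverse=True)
    parents = population[:POP // 2]              # keep the fittest half
    children = [mutate(crossover(random.choice(parents),
                                 random.choice(parents)))
                for _ in range(POP - len(parents))]
    population = parents + children

best = max(population, key=fitness)
```

In musical practice the interesting design problem is the fitness function itself, which is why systems such as GenJam put a human listener in the evaluation loop rather than a formula like the one above.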
The three examples examined in chapter three have shown the diversity of possible applications for
ML to musical tasks. LZC demonstrates how ML can be used to generate scores that derive their
musical rules from a corpus of existing music. WaveNet is capable of creating each sample of a digital
audio file in relation to the preceding samples, such that it convincingly mimics the sound of music or
speech audio that is fed into the system. Wekinator is a tool for the live control of musical parameters.
It differs significantly from the other two systems in that it does not directly learn from or generate
musical rules. Instead it allows the user to create unique control profiles and manipulate sound or
other musical parameters in new and unique ways. Following from these examples, in chapter four, a
new system was proposed, drawing on the ideas presented in the rest of the paper. This system,
DuANN, acts as a conceptual foundation for a future project with the intention of extending the
developer’s knowledge of ML music systems.
By examining the examples in chapter three in relation to the concepts established in chapters one and
two, a basic understanding of ML has been presented, and the scope and possibilities of ML music
demonstrated. This, in conjunction with the proposed new DuANN system, provides a valuable starting
point for further research in the field, and indicates specific plausible steps toward it.
Bibliography

Albertson, Dan, and Ron Hannah. “The Living Composers Project: Eduardo Reck Miranda.” Last modified 2017. Accessed May 10, 2017. http://composers21.com/compdocs/mirandae.htm

Biles, John A. “GenJam: A Genetic Algorithm for Generating Jazz Solos.” In ICMC Proceedings (1994): 131–137.

Coenen, Alcedo. “David Cope, Experiments in Musical Intelligence.” Organised Sound 2, no. 1 (1997): 57–60.
Cope, David. Computer Models of Musical Creativity. Cambridge: The MIT Press, 2005.

Cope, David. The Algorithmic Composer. Madison: A-R Editions, 2000.

De Mantaras, Ramon L., and Josep Lluis Arcos. “AI and Music: From Composition to Expressive Performance.” AI Magazine 23, no. 3 (2002): 43–57.

Doornbusch, Paul. “Computer Sound Synthesis in 1951: The Music of CSIRAC.” Computer Music Journal 28, no. 1 (2004): 10–25.

Fernández, Jose David, and Francisco Vico. “AI Methods in Algorithmic Composition: A Comprehensive Survey.” Journal of Artificial Intelligence Research 48 (2013): 513–582.

Fiebrink, Rebecca. Real-time Human Interaction with Supervised Learning Algorithms for Music Composition and Performance. PhD diss., Princeton University, 2011. http://proxy.library.adelaide.edu.au/login?url=http://search.proquest.com.proxy.library.adelaide.edu.au/docview/854505276?accountid=8203

Goertzel, Ben, and Cassio Pennachin, eds. Artificial General Intelligence. Berlin: Springer, 2007.

Johnson, Colin G. “Towards a Prehistory of Evolutionary and Adaptive Computation in Music.” In Applications of Evolutionary Computing, edited by Colin G. Johnson, 502–509. Berlin: Springer, 2003.

Kirke, A., and Eduardo R. Miranda, eds. Guide to Computing for Expressive Music Performance. London: Springer, 2013.

Kohavi, Ron, and Foster Provost. “Glossary of Terms.” Accessed June 1, 2017. http://robotics.stanford.edu/~ronnyk/glossary.html. Originally published in Machine Learning 30 (1998): 271–274.

Legg, Shane, Mustafa Suleyman, and Demis Hassabis. “DeepMind.” Accessed May 13, 2017. https://deepmind.com/

Lichtenwalter, Ryan, K. Zorina, and Nitesh V. Chawla. “Applying Learning Algorithms to Music Generation.” In Proceedings of the 4th Indian International Conference on Artificial Intelligence (IICAI 2009), 483–502.

Lyon, Richard F. Human and Machine Hearing. Cambridge: Cambridge University Press, 2017.
Minsky, Marvin. The Society of Mind. New York: Simon and Schuster, 1985.

Miranda, Eduardo R. “Cellular Automata Music: An Interdisciplinary Project.” Interface 22, no. 1 (January 1993): 3–21.

Mitchell, Tom M. Machine Learning. Singapore: The McGraw-Hill Book Co., 1997.

Nierhaus, Gerhard. Algorithmic Composition. Vienna: Springer, 2009.

Nierhaus, Gerhard, ed. Patterns of Intuition: Musical Creativity in the Light of Algorithmic Composition. Netherlands: Springer, 2015.

Paluszek, Michael, and Stephanie Thomas. MATLAB Machine Learning. Berkeley: Apress, 2017.

Proudfoot, Diane, and B. Jack Copeland. “Artificial Intelligence.” In The Oxford Handbook of Philosophy of Cognitive Science. Oxford: Oxford University Press, 2012. Accessed June 15, 2017. http://www.oxfordhandbooks.com.proxy.library.adelaide.edu.au/view/10.1093/oxfordhb/9780195309799.001.0001/oxfordhb-9780195309799-e-7

Romero, J., and Penousal Machado, eds. The Art of Artificial Evolution. Berlin: Springer-Verlag, 2008.

Rowe, Robert. Machine Musicianship. Cambridge: The MIT Press, 2001.

Tecuci, Gheorghe. “Artificial Intelligence.” WIREs Computational Statistics 4 (2012): 168–180.

van den Oord, Aaron, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu. “WaveNet: A Generative Model for Raw Audio.” DeepMind, 2016. Accessed May 13, 2017. https://deepmind.com/blog/wavenet-generative-model-raw-audio/