Abstract/Summary
This report discusses the design and implementation of a real time, computer generated drum
accompanist. The report starts off describing the motivation and aim of the project before discussing a
selection of related work. This background work focuses on the concepts of tracking the timing of musical
signals, deriving information from them, and creating a musical accompaniment.
From there, the report gives an overview of the entire system then proceeds to go into detail on
each of the components. Justification for each component and the methods used to create them are
discussed alongside these details, as well as how each component contributes to the overall performance
of system. The final portion of the report discusses the outcome of the project and how well it performed
the designated task of providing a drum accompaniment to the user.
The system described here makes use of machine listening techniques to analyze an incoming
musical audio signal and determine various musical properties such as dynamics, tempo, and style. The
system will then generate an expressive drum beat to accompany the audio signal in real time. These
drum beats will be created with a set of genetic algorithms which are intended to model basic human
creativity in a musical sense. The system is intended as a practice tool as well as a means to observe the
musical interaction that may occur between humans and computers.
Chapter 1: Introduction ...................................................................................................................................1
1.1 Motivation..............................................................................................................................................1
1.2 Overview ................................................................................................................................................2
1.2.1 Aim ..................................................................................................................................................2
1.2.2 Objectives........................................................................................................................................2
1.2.3 Requirements ..................................................................................................................................3
1.2.4 Enhancements ................................................................................................................................5
1.2.5 Deliverables .....................................................................................................................................6
1.3 Methodology ..........................................................................................................................................6
1.3.1 Overview .........................................................................................................................................6
1.3.2 Development Stages .......................................................................................................................7
1.3.3 Schedule ..........................................................................................................................................8
Chapter 2: Related Background .................................................................................................................... 11
2.1 Audio Analysis and Onset Detection ................................................................................................... 11
2.2 Beat Tracking....................................................................................................................................... 12
2.3 Computer Generated Music ............................................................................................................... 14
Chapter 3: Design and Development ............................................................................................................ 17
3.1 Key Concepts ....................................................................................................................................... 17
3.2 Overall Architecture ............................................................................................................................ 17
3.3 General Approach ............................................................................................................................... 18
3.4 Modelling a Drum Kit .......................................................................................................................... 19
3.5 Input .................................................................................................................................................... 20
3.5 Audio Signal Analysis........................................................................................................................... 21
3.5.1 Onset Detection ........................................................................................................................... 21
3.5.2 Beat Tracking................................................................................................................................ 22
3.5.3 Volume ......................................................................................................................................... 23
3.5.4 Sustain .......................................................................................................................................... 24
3.5.5 Complexity ................................................................................................................................... 27
3.6 Drum Beat Generation ........................................................................................................................ 28
3.6.1 Scoring .......................................................................................................................................... 29
3.6.2 Evolution ...................................................................................................................................... 30
3.6.3 Introducing Cymbals and Toms .................................................................................................... 31
3.7 Drum Beat Enhancement .................................................................................................................... 32
3.7.1 Probability Values ........................................................................................................................ 33
3.7.2 Volume Values ............................................................................................................................. 35
3.7.3 Tone and Position Values ............................................................................................................. 35
3.8 Output ................................................................................................................................................. 36
3.9 Simulink Model ................................................................................................................................... 36
Chapter 4: Results ......................................................................................................................................... 38
Chapter 5: Evaluation .................................................................................................................................... 40
5.1 Overview ............................................................................................................................................. 40
5.2 Quantitative Evaluation ...................................................................................................................... 40
5.2.1 Beat Tracking ............................................................................................................... 40
5.2.1.1 Paranoid Android ...................................................................................................................... 41
5.2.1.2 Blitzkrieg Bop ........................................................................................................... 43
5.2.2 Latency ......................................................................................................................................... 44
5.3 Qualitative Evaluation ......................................................................................................................... 46
Chapter 6: Conclusions and Further Work .................................................................................................... 48
6.1 Conclusions ......................................................................................................................................... 48
6.2 Further Work ....................................................................................................................................... 49
References: ................................................................................................................................................... 50
Appendix A - Personal Reflection
Appendix B - Interim Report
Appendix C - Operation Manual
Appendix D - Resources Used
Chapter 1: Introduction
1.1 Motivation
Many musicians enjoy performing and practising music with others; however, it is not always possible to assemble a group of like-minded musicians for this purpose. A software-based solution that provides automatic accompaniment is a good way to practise performing with others, as well as to gauge how a piece of music might sound with additional instruments. For those looking to play their music with a drummer, drum programming software is readily available to anyone with an internet connection. However, most of these programs only allow the programming of simple looping drumbeats and can take time to perfect. Those who are not familiar with drumming may also have a hard time programming a drumbeat that sounds good and fits the music being played. These drum machines also do not typically respond to the musician's playing, which forces the musician to follow whatever has been programmed. This is particularly bad for practising, as the player is not able to play as expressively as they would with a human drummer. Various instrument-playing robots have also been created, both programmable and improvisational. Unfortunately, most musicians cannot afford to purchase robots to use as band mates.
What’s needed is a system which allows a musician to perform material on a live instrument and
hear back a drumbeat which can go along with it. This could be useful as a practice tool and as a
compositional aid for those who may not have immediate access to a human drummer.
The system described in this report is a step towards providing a practice tool for musicians who may not be able to rehearse with others, by allowing them to play their instrument live and have a supporting drum beat provided to them. Many beginning musicians practise by playing along with their favourite songs; however, this method does not allow one to take a leading role in the performance. The finished program can be a useful tool for soloists to improve their group performance in an interactive way, without relying on a metronome.
The program may also be useful for those trying to write music as they could get an idea of how a
given guitar or piano part would sound with drums supporting it. It will also be interesting to see how
human musicians interact with a computer simulated drummer and how it may influence their
performance.
This project is relevant to the field of Artificial Intelligence because it is fundamentally a computerised model of human creativity as it relates to drum performance. The system is designed to exhibit the basic musical intelligence of a drummer in such a way that it can react appropriately to different musical cues and properties.
1.2 Overview
1.2.1 Aim
The overall aim of this project is to design and implement a program which uses machine listening
and learning to analyse audio produced by a human musician in real time. By analysing beat patterns and
rhythms, the program should provide feedback in the form of an accompanying percussion part. The
system will receive a musical audio signal as it is played, determine the tempo and beat pattern of the
signal, then output a percussion accompaniment to the musical signal in real-time.
First, a beat detection feature needs to be implemented. This feature must derive the beat
structure from an incoming audio waveform by detecting onset events within the wave. These onsets are
defined by sudden energy increases within the waveform and typically coincide with a note or chord being
played on the instrument that produced the waveform.
Once the beat structure has been determined, a fitting drum part will be generated in a predictive
manner so that it will be played along with the human musician in real time. This drum part will be
modelled on the components of a standard rock/jazz drum kit consisting of at least a bass drum, snare
drum, and hi hats.
Another aim is to provide a practice tool for musicians who may not be able to rehearse with other
musicians. It will also be interesting to see how human musicians interact with a computer simulated
drummer and how it may influence their performance.
1.2.2 Objectives
The program can be divided into five primary components:
Audio Input/Output Handling
Signal Analysis
Beat Tracker
Drumbeat Generator
Artificial Expression Simulator
There are many ways to design an audio input/output system, so it is important to employ one which is efficient and effective. In order to handle streaming input, a buffer will be implemented so that small samples of the incoming audio can be analysed. The output will be handled in a similar way, most likely by playing out full measures of drumbeats at a time, though this may be subject to change.
The signal analysis component converts the incoming musical waveform into a form that the rest of the program can process. This step is most vital for the beat tracking aspect. Further analysis will
attempt to detect other musical features that may be relevant to the style of the music.
The beat tracker will determine the tempo of a piece of music based on a consistent pattern of
onset moments occurring in the input signal. Because most humans do not possess perfect timing, the beat
tracker will be able to adapt to slight changes in tempo without skipping notes or going out of time. It must
also be responsive to expressive playing where onsets will not always be consistent and straightforward.
The drumbeat generator makes use of a probability template to generate a simple but varying drumbeat. This primarily provides a foundation on which the artificial expression simulator can create interesting and relevant beats to complement the musician's playing. This aspect will hopefully make further use of the input signal analysis to determine the dynamics and overall style of the music being played.
1.2.3 Requirements
Key requirements of the project include the development of a robust and accurate beat tracking
algorithm capable of tracking songs with non-constant tempos. It requires a suitable onset detection
method from which the beat tracking algorithm can derive a tempo. The beat tracking process needs to
work in real time with a constant input therefore it must be able to accurately predict approaching beat
locations.
A suitable drum beat template must also be created with a strong reliance on the derived tempo.
This template will allow for the creation of basic and relatively complex beats while remaining flexible to
changes in tempo. The template does not define the drum beat itself but rather provides an empty structure which allows particular drum hits to be specified. This will prevent the generated drum beat from playing out of time.
Artificial expression is another key aspect of the system. In order to avoid repetitive and emotionless drum beats, an algorithmic approach to mimicking human creativity will be implemented. This will allow the drum beat to respond to a musician’s dynamic level and style as they change throughout a song. With respect to the basic drum beat, extra hits and drum fills should be included but limited so they are not overdone; this will lend a more human feel to the beat. The minimum requirements are as follows:
1. Implement an efficient input/output system
The program will require a buffer system to analyse small samples of an audio signal as it is
produced by a human musician. A circular buffer will most likely be implemented for this purpose. The
output must also not interfere with the input signal as this would cause the program to be influenced by
its own output rather than the human musician.
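As a rough illustration of such a buffer (a Python sketch for clarity; the project itself was built in MATLAB/Simulink, and all names here are hypothetical), a circular buffer overwrites its oldest samples as new audio arrives, so the most recent window is always available for analysis:

```python
import numpy as np

class CircularBuffer:
    """Fixed-size ring buffer holding the most recent audio samples."""

    def __init__(self, size):
        self.buffer = np.zeros(size)
        self.size = size
        self.write_pos = 0

    def push(self, samples):
        """Write incoming samples, wrapping around when the end is reached."""
        for s in samples:
            self.buffer[self.write_pos] = s
            self.write_pos = (self.write_pos + 1) % self.size

    def latest(self, n):
        """Return the n most recently written samples in chronological order."""
        idx = (self.write_pos - n) % self.size
        if idx + n <= self.size:
            return self.buffer[idx:idx + n].copy()
        first = self.buffer[idx:]
        return np.concatenate([first, self.buffer[:n - (self.size - idx)]])
```

In a streaming setting, the analysis stage would call `latest` once per processing tick, so older audio is discarded automatically rather than accumulating.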
2. Analyse beat patterns within the audio signal
Extracting the tempo from input audio is essential to creating a drumbeat which will stay in time
with the user's playing. This can be accomplished by performing an onset analysis on the waveform and
by finding patterns in the peaks that occur. Any regular interval occurring between peaks (even if other
peaks are present during the interval) can produce the tempo. [7]
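A minimal sketch of this interval-based idea, in illustrative Python (not the project's actual code; the quantisation resolution is an assumed parameter): given a list of onset times, a histogram of inter-onset intervals reveals the dominant interval, from which a tempo in beats per minute follows.

```python
from collections import Counter

def estimate_tempo(onset_times, resolution=0.01):
    """Estimate tempo (BPM) from the most common inter-onset interval.

    Intervals are quantised to `resolution` seconds so that near-identical
    intervals vote for the same histogram bin.
    """
    intervals = [
        round((b - a) / resolution) * resolution
        for a, b in zip(onset_times, onset_times[1:])
    ]
    dominant = Counter(intervals).most_common(1)[0][0]
    return 60.0 / dominant

# Onsets exactly half a second apart correspond to 120 BPM.
```

A more robust version would also consider intervals between non-adjacent onsets, as the requirement above notes that the defining interval may span intervening peaks.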
3. Perform in Real Time
In order to provide a true accompaniment, the program must be able to analyse the audio signal,
produce a beat, and play it back in time with the human musician. The program will have to predict what
the user will play and when, then play back the drumbeat to go along with what was predicted.
4. Implement a genetic algorithm to produce potential beat candidates.
The genetic algorithm will focus on generating patterns for each instrument of the drum kit to
produce a whole beat. Beats are scored according to criteria relating to each instrument which will
influence its overall score. The higher scoring beats will move on to the next generation, while beats
meeting certain scoring requirements will be randomly selected for crossover and mutations. The lowest
scoring beats will be scrapped and replaced with new randomly generated beats in order to provide new
possible beats. This algorithm will require modification as development progresses.
A potential beat created by this genetic algorithm could be visualised in the following way, with 1 indicating a hit and 0 representing a rest. The beat in Figure 1 is for a single 4/4 measure split into 16th notes; it can be programmed as a simple integer matrix in most programming languages:
Figure 1. Drum Beat Matrix Representation (adapted from [5])
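As a concrete example of this representation (an illustrative pattern, not a reproduction of the original figure), one measure for a three-instrument kit might look like:

```python
# One 4/4 measure at 16th-note resolution: 1 = hit, 0 = rest.
# Rows: hi hat, snare, kick (an illustrative rock pattern).
beat = [
    [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0],  # hi hat on 8th notes
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],  # snare on beats 2 and 4
    [1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0],  # kick on beats 1 and 3
]
```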
5. Simulate a human drummer with feedback based on the previous analysis of tempo and patterns.
Using all the previous minimum requirements, the final drumbeat will attempt to emulate a
human drummer in the sense that it will adapt to change and provide suitable, non-repetitive drumbeats
in real time. The minimum requirement for this virtual drummer's kit is a bass drum, snare drum, and hi
hat.
1.2.4 Enhancements
Most rock and jazz drummers possess more than just a snare, kick, and hi hat. Including these
additional components would allow for more expressive drumbeats to be created. Each addition will
require a new scoring system for the genetic algorithm, as well as the introduction of new rules in how
each component can interact with the others. Acceptable drumbeats should be playable by a competent human drummer possessing two arms and two legs.
Extracting musical features and information from the input can also improve the system. Further
waveform analysis can result in information which will affect various characteristics of the generated
drumbeat. Analysis may include changes in amplitude, the rate of onsets, and the rate of decay of any
peaks as these can all help influence stylistic decisions which are made during drumbeat creation. [7]
Because most drummers are not simply drum machines, it is important to include a degree of variability within a set beat. This allows a single beat to be repeated a few times with enough variation to create a more natural effect. Instead of using 1s and 0s to indicate a hit or rest, values between 0 and 1 can be used to give the probability of a hit occurring at that moment in time. Hits which help define the beat can be weighted so that they will always be 1 or close to it, while less essential hits can be given lower values so that they are not repeated on every iteration. [5]
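This probabilistic scheme can be sketched in a few lines (illustrative Python; the pattern values are made up for demonstration): each step stores the probability of a hit, and rendering a measure samples those probabilities, so defining hits always play while embellishments come and go.

```python
import random

def render(prob_pattern):
    """Sample a concrete measure from a pattern of hit probabilities."""
    return [1 if random.random() < p else 0 for p in prob_pattern]

# Hi hat example: the backbone 8th notes always play (p = 1.0); the off
# 16ths are occasional embellishments (p = 0.2), so repeats vary slightly.
hi_hat = [1.0, 0.2, 1.0, 0.2, 1.0, 0.2, 1.0, 0.2,
          1.0, 0.2, 1.0, 0.2, 1.0, 0.2, 1.0, 0.2]
measure = render(hi_hat)
```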
Introducing tone values for the cymbals and toms as well as a position value for the hi hats may
help to increase the realism of the virtual drum kit. This would create the effect of a drum kit possessing
multiple toms and cymbals without having to create additional instrument tracks within the drumbeat
template. With the hi hats, this value would represent the distance between the two cymbals. The tone
values would also range from 0 to 1, with lower values corresponding to a lower tone/hi hat distance, and higher values indicating a higher tone/hi hat distance. [5] Allowing for volume adjustments would also let the program create accents and ghost notes, again giving the beat a more human feel.
Additional features such as capacity for triplets, drum fills, and swing beats would also greatly
increase the realism and musical capabilities of the virtual drum accompanist. Many drummers use drum
fills to signify transitions, create tension, and to bridge a section of music where not much else is going on.
A drum fill in this system would temporarily override the current beat for an appropriate duration so it
does not sound like a second drummer has suddenly joined in and dropped out. The fills would need to be
regulated so they do not occur too often or at inappropriate moments. This feature would greatly increase
the program's versatility if correctly implemented.
In an effort to make the program more user friendly, a simple user interface containing all the
functions of the program could be implemented. This would allow any user to simply install the program
and begin using it in a fully live situation without having to compile code or perform any unnecessary
setup.
1.2.5 Deliverables
The deliverables of this project will include a final report and a program which can take a musical
input and output a drumbeat to accompany it in real time.
1.3 Methodology
1.3.1 Overview
Iterative/incremental development [4] allows a developer to gradually incorporate individual features into a program in such a way that each addition results in a fully functional version of the program.
Each addition follows a cycle of planning, design, implementation, testing, and evaluation. Every addition
also gives the developer a chance to reconfigure other existing design aspects within the program if
necessary. This method allows for a great deal of flexibility during architecture construction as well as
early and easier bug detection. [4] [15]
Each working implementation should be thoroughly tested and analysed before work on the next
version begins. Individual features should also be clearly separable in order to accommodate modification.
This process may call for a redesign of the system architecture should the need arise. [4] [15]
A project such as this can benefit greatly from this method of development. Due to the reliance on
genetic algorithms and probability based functions, it is difficult to predict exactly how the program will
react to any given feature implementation. Utilising an iterative/incremental development method, each
addition of a feature can be tested and optimised until the desired result is achieved. This development
method also states that features should be easily separable and well organised, which may allow the developer to disable a given feature at any point in the development cycle in order to evaluate its usefulness and efficiency. Sub-features may interact with each other but will be designed to be independent of one another.
The project will follow the iterative/incremental [4] [15] method by first carrying out the initial
planning phase which involves heavy research and a basic architecture design. A few starting features will
also be considered during this stage in such a way that the next stage will produce workable results. The
next step will be to create the basic beat detection and drumbeat generation structures. These two
features make up the foundation of the project and will serve as a starting point for all additional features
to be implemented. A series of optimising features will then be gradually incorporated into the program.
1.3.2 Development Stages
The development stages of the project are shown below:
Stage 1
Initial Planning
Research
Broad Architecture Proposals
Stage 2
Beat Detection:
Implement as a subsection of main function.
Test to ensure detection is accurate
Drumbeat Generation:
Create basic drumbeat template using genetic algorithm (kick, snare, hi hat)
Do not take input waveform into consideration, create template independent of waveform.
Check to see that top beats are acceptable
Beat generator will only respond to pre-recorded tracks at this point
Stage 3
Beat Detector/Generator Syncing
Link the two features together so that the generated drumbeat will be displayed in time with the song playback
Latency analysis will start here for reference
Stage 4
Incorporate basic wave feature analysis to augment beat generator (will use features to determine stylistic changes to beat). Will have a few different segments (each addition with latency analysis):
Dynamics:
Volume parameters will be added to the beat template; they are determined by relative local amplitude.
Sustain:
Observes peak trails to determine whether a legato (smooth) or staccato (detached) beat should be used
Hi hat openness parameter introduced.
Accents and ghost notes:
Adjust volume parameters of individual hits to allow for more dynamic drum beats.
Stage 5
Real time implementation for beat generator/detection system.
May occur during the feature implementations of stage 4.
Stage 6
Hit Probabilities
Non-essential drum hits can have probability values assigned to them to create the effect of a more varied drumbeat while still retaining its overall feel.
Additional tracks
Cymbals and toms added to structure
Stage 7
User testing
Optimisation and Deployment
1.3.3 Schedule
The tentative schedule as of the writing of the interim report is shown below. This schedule indicated that many of the styling features associated with the drum beat output would be programmed just after the submission of the interim report, at the same time as the conversion to real time. Before this point the system only accepted pre-recorded audio tracks. This plan, shown in Figure 2, allowed for plenty of time at the end for report writing, evaluation, and optimisation.
Figure 2. Initial Project Schedule
The chart below (Figure 3) shows how the process actually unfolded.
Figure 3. Final/Actual Schedule
Converting the system to real time proved to be much more difficult than anticipated. The beat tracking software used with pre-recorded tracks [LabROSA] was not designed for real time operation, so I opted to convert it for real time use. Unfortunately, when attempting real time processing in MATLAB using Simulink, many of the commands and techniques commonly used in MATLAB are not available. This required an extensive period in which every line of the original code that threw errors was substituted with alternative, less efficient blocks of code in order to circumvent the limitations of Simulink. Nearing the end of this process, test runs demonstrated that a significant amount of latency was occurring and that the clock timer built into Simulink would also slow down. These factors led to the decision to start over using the aubio [10] beat tracking system which, while not as accurate, was fully ready to operate in real time. Following a series of licensing, installation, and driver issues, which were resolved thanks to help from the School of Computing support staff, the aubio system was able to bring the project up to speed in terms of real time execution.
This period resulted in other aspects of the project being pushed back, most notably the drum
styling features which are considered vital to the emulation of a human drummer. The various obstacles
which occurred during this time led to an early start on the final report. The limitations associated with
Simulink also prevented the desired method of audio output, utilising a MIDI based drum kit to
accompany the input signal.
Chapter 2: Related Background
This section gives an overview of previous work which relates to this project. Various audio
analysis techniques are discussed first, followed by different approaches to the problem of beat tracking.
The final sub-section looks at a few papers related to the artificial emulation of human creativity.
2.1 Audio Analysis and Onset Detection
Onset detection is a key aspect of the overall beat tracking algorithm. Bello et al. [1] discuss various
approaches involving differences in energy and phase within the waveform. Each method has a potential
application that is largely based on the type of input that will be most commonly received. One method is
to observe the spectral features of the waveform by performing a Fourier transform on the wave [2]. The
features that may be derived from a waveform filtered this way are useful in detecting onsets amidst
relatively noisy and layered inputs.
Temporal features, which relate to a wave's amplitude, may also be taken into consideration. A
valid onset typically occurs during a sudden increase in the waveform's amplitude [26]. It is described in [1]
how rectifying and smoothing the signal can help to accentuate these onset features. It is also argued that
the wave should be filtered so that high amplitudes which are not part of a sudden rise will be lowered.
This makes it easier to identify where the onsets actually occur. It is stated that analysis of the temporal features is a fast and efficient method of onset detection and is particularly useful when the audio signal is being produced by a single, accented instrument such as a guitar or piano.
Ellis [11] has worked to develop a beat tracking program which utilises MATLAB. This program
analyses a waveform and generates an audio file which produces clicks corresponding to the detected
beat. Ellis discusses using an onset strength envelope which is essentially a filtered representation of the
original waveform. The onset strength envelope used in [11] locates sudden energy increases within the
waveform and represents them as individual spikes in the processed waveform. Higher spikes tend to
represent valid onsets. The filtering process involves performing a short-time Fourier transform similar to
the spectral feature analysis approach described in [1]. The signal is also passed through a high pass filter
and is then convolved with a Gaussian envelope [11]. This seems to be an effective approach though it
would need to be modified for real time applications.
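A greatly simplified version of this kind of envelope can be sketched as follows (illustrative Python with NumPy, not Ellis's implementation; the frame and smoothing parameters are arbitrary): the positive frame-to-frame increases in spectral magnitude are summed, then smoothed with a small Gaussian window.

```python
import numpy as np

def onset_strength(signal, frame_len=512, hop=128, sigma=1.0):
    """Crude onset strength envelope: half-wave rectified spectral flux,
    smoothed by convolution with a Gaussian window."""
    frames = [signal[i:i + frame_len] * np.hanning(frame_len)
              for i in range(0, len(signal) - frame_len, hop)]
    mags = np.array([np.abs(np.fft.rfft(f)) for f in frames])
    # Positive differences between successive spectral frames (spectral flux);
    # decreases are clipped to zero so only energy rises register.
    flux = np.maximum(np.diff(mags, axis=0), 0.0).sum(axis=1)
    # Smooth with a small Gaussian envelope.
    t = np.arange(-4, 5)
    gauss = np.exp(-0.5 * (t / sigma) ** 2)
    gauss /= gauss.sum()
    return np.convolve(flux, gauss, mode="same")
```

Spikes in the resulting envelope then mark candidate onsets; Ellis's system additionally applies high-pass filtering and further post-processing omitted here.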
Goto and Muraoka [13] attempt to recognise chord changes in a piece of music in order to detect
onsets for the purpose of beat tracking. By focusing on the lower end of the frequency spectrum, the
onset events are likely to occur on chord changes rather than on potentially complex melodies which are
typically of a higher frequency. While this method is relevant due to its execution in real time, it may not
be robust enough to accommodate pieces of music which do not feature clear and discernible chord
changes.
Tools such as Max/MSP and Pd can also be used for various aspects of audio analysis, as described
in [16]. The fiddle system outlined in [16] attempts to determine the pitch of an incoming audio stream
using sinusoidal decomposition. A rectangular-window discrete Fourier transform is used to obtain the
peaks in the audio along with their corresponding frequencies and amplitudes. From there, the
fundamental frequency of the input is estimated using a likelihood function in which each individual peak
is matched to the nearest frequency corresponding to a musical note. This works for single note inputs as
well as inputs consisting of more than one note and will display the note name(s) to the user. If an input is
not near enough to any of these fundamental frequencies then the input is determined to not have any
pitch at that moment in time.
Another system described in [16] is the bonk application. This system is used to detect the onsets
of percussive hits which are not pitched and therefore not susceptible to sinusoidal decomposition
analysis [17]. The system uses spectral analysis to detect percussive onsets rather than looking for sudden,
sharp increases in amplitude in order to avoid onsets being masked by loudly ringing sounds. This analysis
is further used to help identify which instrument produced the onset; this is done by comparing the
analysis with pre-stored and identified spectral templates. These systems, most notably the bonk program,
may offer interesting features to future implementations of the virtual drum accompanist system.
2.2 Beat Tracking
The concept of beat tracking has been explored in many different ways. Essentially, beat tracking
is the process of deriving the tempo and beat pattern of a piece of music in the same way that a human
may tap their foot in time with the music. A few approaches to this problem will be discussed in this
section.
Goto and Muraoka have developed a series of beat trackers, one of which derives the beat based
on audible percussive hits [12]. This beat tracker is able to function in real time but relies on a steady
drum beat to be present, whereas in this project the drum beat is to be responding to the tempo, not
leading it. Another one of Goto and Muraoka's beat trackers [13] detects chord changes in order to
determine patterns, specifically root note changes which occur between 10 Hz and 1 kHz.
The beat tracker used by Zhe and Wang [35] detects measures by extrapolating from evenly
spaced, pronounced downbeats. Subdivisions are then calculated to fill out each measure. This system
assumes a basic pop song in 4/4 time is being played at a constant tempo. Songs which do not feature a
pronounced downbeat may confuse the system, therefore limiting its ability to detect tempos in songs
that do not follow pop conventions. This system is not likely to have any relevance to this project.
The beat tracking system described by Ellis [11] utilises Matlab to analyse a waveform and
generate an audio file containing clicks at the detected beat locations. As described in section 2.1, Ellis
derives an onset strength envelope from the waveform, using filtering equations based on those of Bello
et al. [1] to represent sudden energy increases as individual spikes. By finding a pattern of recurring and equidistant peaks in
the onset envelope, a general tempo can be easily derived. The tempo calculation is weighted to prevent
extremely distant and near peaks from forming improbable tempos. The derived tempo is biased towards
120 beats per minute, a design decision which acts as a probable middle ground for human created music.
Extremely fast and slow tempos, while still possibly in time, would not represent the common human
interpretation of the beat. This system is open source and easy to set up, though it is designed to work
only with pre-recorded audio files.
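The tempo derivation described above can be illustrated with a short sketch. This is a hedged, pure-Python illustration rather than Ellis's actual Matlab code; the function name `estimate_tempo`, the 40-240 BPM search range, and the log-Gaussian `spread` parameter are all assumptions made for the example:

```python
# Illustrative Ellis-style tempo estimation: autocorrelate an onset strength
# envelope and weight candidate tempos towards a preferred 120 BPM.
import math

def estimate_tempo(envelope, frame_rate, bias_bpm=120.0, spread=0.9):
    """Return the tempo (BPM) whose lag maximises the weighted autocorrelation."""
    n = len(envelope)
    best_bpm, best_score = None, -1.0
    # Consider lags corresponding to tempos between 40 and 240 BPM.
    min_lag = int(frame_rate * 60.0 / 240.0)
    max_lag = int(frame_rate * 60.0 / 40.0)
    for lag in range(max(1, min_lag), min(n, max_lag + 1)):
        acf = sum(envelope[i] * envelope[i - lag] for i in range(lag, n))
        bpm = 60.0 * frame_rate / lag
        # Log-Gaussian weighting biased towards the preferred tempo, so that
        # extremely fast or slow interpretations of the beat are penalised.
        weight = math.exp(-0.5 * (math.log2(bpm / bias_bpm) / spread) ** 2)
        score = acf * weight
        if score > best_score:
            best_score, best_bpm = score, bpm
    return best_bpm
```

An envelope with a spike every half second at a 100 Hz frame rate, for instance, yields 120 BPM, since the half-beat and double-beat lags are both down-weighted relative to the preferred tempo.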
Collins [7] proposes a multi-agent beat tracking algorithm which follows a human musician playing
a MIDI keyboard in real time. This method uses a number of agents which predict where the next beat
should occur. Each agent, a hypothesis of beat locations, has a score and a weight to determine its
accuracy. An agent's score is increased for making correct beat predictions which coincide with a human
musician playing a note. Low scoring agents are erased and new ones are constantly created to allow for
the tracking of dynamic tempos. Each agent contains a set of values describing that particular agent's
current score, weight, and beat estimations. Whenever an onset occurs, each agent is checked to see how
well it predicted the onset occurring at that particular time. Poorly performing agents are eliminated and
new ones are generated based on the onset time. Scoring is weighted depending on the amount of time
since the last onset in order to prevent subdivision of a beat from counting as an actual beat. Essentially,
this weighting prevents overly fast tempos from being derived. Collins' onset detection method is
dependent on MIDI onsets rather than a waveform so it does not utilise any onset detection methods that
would be required of a non-MIDI audio signal.
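The agent lifecycle described above can be sketched as follows. This is an illustrative simplification rather than Collins's implementation; the `Agent` class, the scoring increments, the tolerance and culling thresholds, and the naive 0.5 s period for newly spawned agents are all assumptions:

```python
# Sketch of a multi-agent beat tracker: each agent is a (period, phase)
# hypothesis with a score; onsets reward agents that predicted them.
class Agent:
    def __init__(self, period, phase):
        self.period = period   # seconds between predicted beats
        self.phase = phase     # time of a reference beat
        self.score = 0.0

    def error(self, onset_time):
        """Distance from the onset to this agent's nearest predicted beat."""
        offset = (onset_time - self.phase) % self.period
        return min(offset, self.period - offset)

def process_onset(agents, onset_time, tol=0.05, cull=-3.0):
    for a in agents:
        # Reward agents whose predicted beat coincides with the onset.
        a.score += 1.0 if a.error(onset_time) <= tol else -0.5
    survivors = [a for a in agents if a.score > cull]
    # Spawn a fresh hypothesis anchored at this onset, so that dynamic
    # tempo changes can still be tracked (period here is a naive guess).
    survivors.append(Agent(period=0.5, phase=onset_time))
    return survivors

def best_period(agents):
    return max(agents, key=lambda a: a.score).period
```

Feeding onsets at a steady half-second spacing rewards the agent hypothesising a 0.5 s period, while a mismatched hypothesis accumulates penalties and is eventually culled.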
The aubio real-time audio library [10] provides real time beat tracking for streaming audio input
captured by a microphone. The library uses real time onset detection methods described in [1]. The real
time aspect is possible due to the predictive nature of the system. The next predicted beat is determined
by the intervals between the previous few beats with a Gaussian weighting applied, meaning the most
recent beat intervals will have more influence. The predicted beat location is based on the location of the
beat immediately preceding it. This makes the tracker very adaptable to changes in tempo, though it may
be prone to error if a complex input rhythm is introduced. The primary advantage of this system is the
absence of noticeable lag during the execution of the beat tracking function.
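A minimal sketch of this predictive scheme follows. It assumes a simple recency-based Gaussian weighting of the inter-beat intervals; the exact weighting aubio applies is not reproduced here, and the function name and `sigma` parameter are illustrative:

```python
# Predict the next beat as the last beat plus a Gaussian-weighted average of
# the recent inter-beat intervals, so the newest intervals dominate.
import math

def predict_next_beat(beat_times, sigma=1.5):
    intervals = [b - a for a, b in zip(beat_times, beat_times[1:])]
    if not intervals:
        raise ValueError("need at least two beats to predict the next one")
    n = len(intervals)
    # Weight interval i by its recency (i = n - 1 is the newest interval).
    weights = [math.exp(-0.5 * ((n - 1 - i) / sigma) ** 2) for i in range(n)]
    period = sum(w * d for w, d in zip(weights, intervals)) / sum(weights)
    return beat_times[-1] + period
```

Because each prediction depends only on the few most recent intervals, the tracker adapts quickly when the musician speeds up or slows down, which matches the behaviour described above.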
Toiviainen [27] explains how adaptive oscillators can be used for beat tracking purposes. The even
motion of an oscillator can easily map to the pulse of a beat. The peaks during oscillation can correspond
to downbeats while the troughs can represent upbeats. In this way, reliable counting during a bar can be
modelled. Onsets occurring near the peak of an oscillation can help influence the oscillator's speed
depending on whether the peak occurs before or after the detected onset. This also ensures that the
onsets of complicated rhythms have less influence on the derived tempo, provided that a strong and
clearly defined downbeat is present. Toiviainen's adaptive oscillator also takes into account short term and
long term changes. If a sudden change in tempo occurs, the oscillator will increase speed in order to catch
up before settling back to the original tempo. The long term change tracker looks at how the tempo has
changed over time and adjusts the base speed of the oscillation to match. The adaptive and dynamic
nature of this system looks to be promising though it requires a MIDI input for its onset detection aspect.
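The short-term and long-term adaptation described above can be sketched as a simple phase oscillator. This is a hedged illustration, not Toiviainen's model; the class name, the correction gains `k_short` and `k_long`, and the half-period acceptance window are all assumptions:

```python
# Sketch of an adaptive oscillator for beat tracking: onsets falling just
# before or after the expected peak nudge the period (short-term), while a
# gentler adjustment tracks the base tempo (long-term).
class AdaptiveOscillator:
    def __init__(self, period, k_short=0.3, k_long=0.05):
        self.period = period      # current oscillation period (seconds)
        self.base = period        # slowly adapting base period
        self.next_peak = period   # time of the next expected peak
        self.k_short = k_short
        self.k_long = k_long

    def on_onset(self, t):
        error = t - self.next_peak  # > 0: onset came late, oscillator too fast
        if abs(error) < 0.5 * self.period:
            # Short-term correction pulls the period towards the onset...
            self.period += self.k_short * error
            # ...while the base tempo follows long-term drift more slowly.
            self.base += self.k_long * error
        self.next_peak = t + self.period
```

Feeding the oscillator onsets at a slightly slower spacing than its initial period makes it "catch up" over a few beats, converging on the new tempo in the manner described above.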
The B-Keeper system uses the kick drum from a real drum kit to determine the beat in a live
setting [24]. The program uses a microphone with a line into a Max/MSP program which performs the
onset detection and beat tracking portions. The beat tracking aspect hooks up to a system to play backing
tracks used in live performances without the need for the drummer to play to a click track in a pair of
headphones. This allows for great expressivity within the band by not constraining them to an unchanging
backing track.
2.3 Computer Generated Music
Collins [7] explains how his system creates a melodic accompaniment to a human musician in real
time by detecting the chords and notes being played and producing an opposing melody in order to
inspire the musician to try something new. The rhythm of the melody is also designed to accent beats
which the human musician is overlooking. While many interesting ideas are discussed, the virtual
drummer of this project is meant to provide a more supportive role rather than pushing new ideas.
Perhaps this idea could be incorporated as an option in a future version of the program for musicians who
may be seeking new sources of inspiration or just looking for a challenge.
Collins also touches on the creation of computer generated music in 'Algorithmic Composition
Methods for Breakbeat Science' [5]. Collins explains how a series of probability templates can be set up
which display the probability of a given instrument being activated at that particular moment in time. The
example (a copy of figure 1) below displays a template for a single measure of 4 beats, each divided into a
set of 16th notes. When the probability value is 1.0, the note is guaranteed to be played on every iteration
of the measure and a 0.0 indicates no chance of activation.
Figure 1. Drum Beat Matrix Representation (Adapted from [5])
This method allows for a non-repetitive drumbeat which will cause the generated part to sound
slightly more human. Additional values can be attached to each location which affect volume and pitch if
desired. The template will be set to synchronise with the beat pattern and will continuously update the
time interval between notes as tempo changes in the musician's playing occur. The exact beat locations
must be anticipated if the system is to keep up in real time [6].
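The probability-template idea can be sketched directly. The template values below are hypothetical, chosen only to illustrate guaranteed hits (1.0), impossible hits (0.0), and occasional extras; they are not taken from [5]:

```python
# One measure of 4 beats divided into 16th notes: each slot holds the
# probability that the instrument fires there on this pass of the measure.
import random

def realise(template, rng=random):
    """Roll the template once, returning 1 (hit) or 0 (rest) per 16th note."""
    return [1 if rng.random() < p else 0 for p in template]

# Hypothetical kick drum template: certain hits on beats 1 and 3,
# with occasional extra hits elsewhere in the bar.
kick_template = [1.0, 0.0, 0.0, 0.2,
                 0.0, 0.0, 0.3, 0.0,
                 1.0, 0.0, 0.0, 0.2,
                 0.0, 0.0, 0.3, 0.0]
```

Because the template is rolled afresh on every iteration of the measure, the part varies from bar to bar while the guaranteed hits keep the pulse, which is what makes the generated drumbeat sound slightly more human.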
Another method uses a set of pre-written beats of varying length; rather than each individual hit
having its own probability, each member of the set has a probability of being executed (the hits within it
being fixed at probabilities of 0 or 1). It is also mentioned that additional probability values could be
included that influence an effect or tone for a given hit (applicable only when p(hit) > 0). This concept
could be used for varying volume levels in such a way that enables
ghost notes and accents within a drumbeat. Having current hits influence the probability values of future
hits will also help to create a more interesting drum part by potentially reducing chance based repetition.
Collins is still continuing work on expanding various machine listening and learning techniques in
computer generated musical accompaniment [8].
Weinberg et al. [28] have created a situation in which they can study the interaction between
human and robotic musicians. Shimon is a four armed robot which plays the marimba and Haile uses two
appendages to play a hand drum. The robots are able to perform along with human musicians in real time,
with Haile providing percussion accompaniment [29][30][31] (with lead/support trading capabilities) and
Shimon providing Thelonious Monk inspired marimba parts. Physical robots are used to play the
instruments in order to give a sense of personality to the robotic performers, making musical interaction
with human musicians easier. Shimon is designed with a "head" which moves in time to the music and is
able to track and follow fellow performers. The ideas discussed are very relevant to this project, though
the emphasis on physical robotic performers is not adopted here as it greatly complicates the process. In
the future, however, the program could easily be modified to send musical instructions to a robot
designed to play a drum kit.
The robot, Haile, can also be adapted to play a xylophone or small marimba [33] [32]. Haile's
playing is determined by a genetic algorithm which uses melodic excerpts of a human pianist's playing as
its base population. These excerpts set the style for what the robot will play, essentially providing the
robot with a natural, human inspired starting point. The robot is able to freely improvise due to the
musical instructions it receives from the other instruments. Note densities heavily influence the robot's
switching between lead and support modes, creating an atmosphere similar to when human musicians
improvise together and take turns leading the song. The genetic algorithm used to determine what Haile
plays takes roughly 0.1 seconds to run which enables it to quickly respond to changes in playing if needed.
A notable difference between the Haile system and this project is that Haile relies on multiple samples of
human playing rather than coming up with its own parts. This is useful for selecting a specific style of
playing and may be useful in a future version of this program which enables a musician to select a
drumming style for the song he or she will be playing.
Ramirez, et al. [22] provide an approach rooted in the concept of genetic algorithms [18] and
machine learning in order to model human musician expressivity [21]. The authors make use of recordings
from a professional jazz saxophonist as a training set which their computer generated composition uses
as a basis for deriving creation rules [20]. Because these rules are rooted in a particular style, the
program will create a melody that is similar in style while still being relevant to the song it was created for. The
authors attempted to look at the problem from the point of view of a human musician and how they
would interpret the music being presented to them before improvising an accompaniment [19]. This
method, with its genre based training sets, may be useful for the creation of drumbeats that are intended
for a particular style of music, similar to the work of Weinberg et al. [33].
Chapter 3: Design and Development
3.1 Key Concepts
Onset Detection
In musical audio signal processing, an onset refers to the peak that occurs when a note on an
instrument is first played. The onset marks the very beginning of the note and should not appear in the
middle of a sustained note. Onset detection describes the process of locating these peaks within an
audio waveform, typically for the purpose of beat tracking.
Beat/Tempo Tracking
Beat tracking refers to the process of determining the location of the beat of a musical signal, a beat
being the pulse within the music that one would normally tap their foot to. The beat can be derived from
a set of onsets by finding a consistent pattern between the onsets. The tempo of a song can be derived
from these beat locations by looking at the common interval time between them.
Drum Beats
A drum beat is a musical pattern often performed on a drum kit to accompany a piece of music.
The drum beat is played in a way which complements the music it is accompanying and can help keep a
group of musicians in time with each other.
Modelling Human Expression
While human expression is quite an abstract concept, for the purposes of this report it refers to
the way a human drummer plays a drum beat with variations in volume and style as well as variations on
the drum beat itself.
3.2 Overall Architecture
The system follows a circular process in which it first listens to a section of a streaming input
which consists of a musician playing their instrument. The signal is then processed in order to determine
the beat as well as various musical features. This information is then used to determine the fitness of
potential drum beats created by a genetic algorithm. Once a winning drum beat has been chosen, it is
then output back to the user with additional expressive features to simulate a realistic drum beat. This is a
continuous process so that changes in tempo can be properly accounted for.
This architecture, shown in figure 4, allows the system to be responsive to changes in the
musician's playing, as well as having the potential to influence the musician to respond to the drum beat
which is created. This back and forth interaction is at the heart of many live performances between
human musicians; the system therefore attempts to emulate this process in order to provide a more
natural feel.
Figure 4. System Architecture Flow
3.3 General Approach
The audio signal analysis is what processes the incoming musical waveform into a form that the
rest of the program can recognize. This step is most vital for the beat tracking aspect. The beat tracker will
determine the tempo of the song based on a consistent pattern of onset moments occurring in the input
signal.
As this project is geared toward guitar and piano players, a sampling of both guitar and piano
performances will be used to test the beat tracker. Input is captured with a standard, inexpensive
computer microphone to ensure that lower quality signals will be effective. For validation, the
performances will feature sample songs of varying speeds, volume, and levels of complexity. Additional
samples will contain non-consistent levels of these features, such as a song that is sped up and slowed
down at varying rates. This will be necessary as few musicians are able to continuously play at a perfectly
constant tempo. It is also important as some songs are very dynamic in this regard and should be
accounted for.
Additional analysis will include translating musical concepts into a basic numerical form. These
concepts include dynamics, articulation, and complexity.
The drum beat creation aspect makes use of a probability template to generate a simple but
varying drum beat. This is primarily to provide a groundwork for which the artificial expression simulator
can create interesting and relevant beats to the musician's playing. This aspect will make further use of the
input signal analysis to determine the dynamics and overall style of the song being played.
Because drumming is an art without too many set rules, many of the stylistic decisions made
during drum beat creation are based on a few simple conventions which mostly relate to rock and jazz
drumming. A wide range of musicians, with backgrounds in styles of music which commonly feature a
drum kit, were polled for their opinion on some of the decisions made below. The results of this poll will be
discussed as each concept requiring a musical assumption is brought up.
3.4 Modelling a Drum Kit
In order to provide an accurate representation of a human drummer, a standard rock/jazz drum
kit will need to be emulated. The image on the following page (figure 5) shows a five piece drum kit
complete with hi hats and cymbals.
Figure 5. Drum Kit Components
The bass and snare drums typically make up the core of a drumbeat, with the bass drum
commonly being struck first during a drum beat (known as the down beat) and the snare occurring in
between on the backbeats. Depending on how hard the snare is hit, it can provide loud accents or softer
filler notes known as ghost notes. The hi hats are used primarily to keep time, but when another drum is
being used for time keeping the hi hats may be hit to provide accents. Cymbals will often provide loud
accents to a drum beat but can also be used to keep time if desired. Toms range in sound and tone
depending on size. They are used for accents and drum fills, and occasionally lower toms will be used to
keep time.
3.5 Input
The system receives input from a basic computer microphone picking up the signal of an acoustic
instrument. The microphone is connected to a Linux machine running aubio [10]. aubio is an open source
package with real time beat tracking and onset detection capabilities. The input signal is analyzed and the
predicted beats are output as an audio signal in the form of clicks. aubio makes use of JACK
(http://jackaudio.org/), an open source program which allows for real time audio interfacing. JACK allows
the microphone signal to be sent to aubio and for the signal produced by aubio to be sent to the speaker
channel. Figure 6 shows a graphical representation of this process.
Figure 6. User/Hardware Information Flow: user microphone → Linux machine with aubio and JACK →
Windows machine with MATLAB and Simulink → speakers or headphones.
The click track produced by aubio is then output through the left speaker channel of the
computer, while the raw input signal is sent through the right. A stereo speaker cable then leads to the
microphone input of a Windows machine running an instance of MATLAB. The core program is contained
within a Simulink model, a feature of MATLAB which allows for real-time handling of signals.
The Simulink model receives the signal from the Linux machine and analyses it within two second
windows. The signal is split into two arrays, one containing the click track and one containing the input
signal. The tempo of the audio is then derived from the click track while the input signal undergoes
further analysis.
A parallel process at the start looks at the input signal to make sure the user is about to start
playing. For a piece of music in 4/4, the musician needs to tap out eight hits before the program will
begin. This can be done by clapping, tapping on the instrument, or strumming muted guitar strings. Once
eight hits have been played, the musician can start playing as normal. Because aubio will continuously
output the timing click track, this introduction gives it a chance to readjust itself to the new tempo. The
tap-out introduction also helps the program to identify the down beat of the bar so that the drum beat
will not only stay in time with the musician, but correctly align itself to the time signature. This technique
is often used by musicians in a group for coordination purposes so it should not be an unfamiliar concept.
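The tap-out introduction described above can be sketched as a small routine. This is an illustrative simplification; the function name, return convention, and the choice of starting the piece one beat after the final tap are assumptions made for the example:

```python
# Wait for eight detected onsets (the tap-out count-in), then use their
# spacing to seed the tempo and to locate the downbeat of the bar.
def count_in(onset_times, taps_needed=8):
    """Return (start_time, seconds_per_beat) once enough taps have arrived,
    or None if the system should keep listening."""
    if len(onset_times) < taps_needed:
        return None
    taps = onset_times[:taps_needed]
    intervals = [b - a for a, b in zip(taps, taps[1:])]
    beat = sum(intervals) / len(intervals)
    # Playing proper starts one beat after the final tap, on the downbeat,
    # which aligns the drum beat to the start of the bar in 4/4.
    return taps[-1] + beat, beat
```

Eight taps spaced half a second apart, for instance, yield a start time of 4.0 s and a beat period of 0.5 s (120 BPM).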
3.5 Audio Signal Analysis
3.5.1 Onset Detection
First, a suitable onset detection method must be implemented in order for the beat tracker to
successfully derive the tempo of the song. By running the signal through a short-time Fourier transform,
followed by a high pass filter, and then convolving the signal with a Gaussian envelope, a suitable onset
strength envelope can be created [11]. An example of the onset envelope can be seen in figure 7. These
methods essentially just accentuate possible onsets so that beat tracking is a simple matter of peak
selection and pattern detection. These methods are also utilised by the real time aubio system, so the
onset detection incorporated into its beat tracking system will be used. The input signal analysis must be
executed in real time as there is no guarantee that a musician will be playing the same sample repeatedly
and at a perfectly consistent tempo.
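The envelope derivation can be illustrated with a much-simplified sketch. Note the deliberate substitution: instead of a full short-time Fourier transform, this uses frame energies as a stand-in for spectral magnitudes, then half-wave rectifies their first difference and smooths with a small Gaussian kernel; the frame size and `sigma` are illustrative:

```python
# Simplified analogue of an onset strength envelope: frame energies,
# half-wave rectified difference (energy rises only), Gaussian smoothing.
import math

def onset_envelope(samples, frame=64, sigma=1.0):
    # Frame energies stand in for the spectral magnitudes of the STFT.
    energies = [sum(x * x for x in samples[i:i + frame])
                for i in range(0, len(samples) - frame + 1, frame)]
    # Half-wave rectified difference accentuates sudden energy increases.
    flux = [max(0.0, b - a) for a, b in zip(energies, energies[1:])]
    # Convolve with a Gaussian envelope to suppress spurious single-frame spikes.
    radius = 2
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    norm = sum(kernel)
    out = []
    for i in range(len(flux)):
        acc = 0.0
        for k in range(-radius, radius + 1):
            j = i + k
            if 0 <= j < len(flux):
                acc += kernel[k + radius] * flux[j]
        out.append(acc / norm)
    return out
```

A burst of amplitude after silence produces a single dominant peak in the envelope at the frame where the energy rises, which is exactly the feature the beat tracker's peak selection relies on.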
Figure 7. Waveform Onset Envelope Derivation [11]
3.5.2 Beat Tracking
Accurate beat tracking is vital to the performance of the virtual drum accompanist. If it cannot play
in time with a human musician, then it would be considered a very poor accompanist. The beat tracking
aspect of this system provides the tempo for the drum beat playback aspect by analyzing the interval
between beats.
The image on the following page (figure 8) shows the stages of beat detection used in the LabROSA
Cover Song ID package [11].
Figure 8. The top graph shows the raw waveform of a 17 second piano part. The graph below it displays the waveform
after it has been processed to display onset strength envelope. The bottom plot displays the beat derived from the
onset envelope [11].
Because this system is to be run in real time, the aubio beat tracking system [10] discussed in
section 2.2 will be used to set the tempo for the output.
3.5.3 Volume
A proficient drummer knows when to play loudly and when to play softly, often taking cues from
the leading musicians as to what volume level is appropriate. The automatic volume control of this system
bases the output drum playing volume on the amplitude of the input signal.
The original waveform is monitored and tracked for gradual changes in amplitude, as these will
determine the overall volume of the drum beat. The program looks at the maximum amplitude within the
current window and retains the value in order to set the base output volume. Because the amplitude of the
input signal ranges from -1 to 1, the absolute value is observed, giving a volume range of 0 to about 1.3.
This method for volume control was chosen for its simplicity and fast computing time. When a selection of
musicians were asked if this was a valid musical assumption, all respondents agreed.
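The volume rule reduces to a one-line computation over the window; the function name is illustrative:

```python
# The largest absolute sample in the current analysis window sets the
# base output volume of the drum beat for that window.
def window_volume(samples):
    """Base drum volume for this window; input samples lie in [-1, 1]."""
    return max(abs(x) for x in samples)
```

Each generated drum hit's amplitude would then be scaled by the value returned for the most recent window, so gradual crescendos and decrescendos in the input carry through to the drums.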
The image below (figure 9) shows the first section of the piano part from The Great Gig in the Sky
by Pink Floyd [34]. This song features very soft playing at first with a gradual crescendo (increase in
amplitude) into a louder section with a guitar accompaniment. These volume changes are noted within the
system to dictate the output volume of the drum beat.
Figure 9. Waveform volume change
3.5.4 Sustain
Articulation is a musical term which refers to how the space between notes is handled, with
either silence, sustain, or a degree of both. In the case of guitar and piano, sustained chords will often
correspond to a smoother playing style, while notes with a noticeable silence between them can mean
something a bit more disjointed is being performed. While there are no strict rules dictating how a
drummer should react to differences in articulation, it is fairly common to see a drummer matching his or
her articulations to that of the other musicians. Regardless of what direction a drummer takes in regard to
articulation or sustain, it is a factor that should not be ignored. Sustain will be represented as a rating from
1 to 3, 1 meaning little to no sustain, 3 meaning full sustain, and 2 representing a moderate level of sustain.
The rate of amplitude decrease between peaks is observed in order to determine the
sustain value. Quick, drastic drops in amplitude will give a sustain value of 1, while slow and mild drops in
amplitude between peaks will produce a value of 3. Any decay rate in between these is given a 2. At the
end of the window, the average sustain value is then sent forward to determine drum articulation. The
sustain value does not carry over from window to window as this would greatly hinder the system's ability
to quickly adapt to changing styles.
Testing has shown that quick amplitude drops to below 20% of the maximum amplitude indicate a
staccato (heavily disjointed) style of playing. Drops to this value over a longer period of time (typically
greater than .4 seconds) are mainly due to the natural amplitude decay of a musical signal and therefore
not considered to be staccato. Testing has also shown that if a signal retains at least 30% of its maximum
amplitude, a high level of sustain is present. These values have been gathered with the use of inexpensive
microphones; the use of a compressor in any recording device would not be compatible with this system.
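The thresholds above can be combined into a sketch of the sustain rating. This is an illustrative interpretation of the rules as stated (a drop below 20% of the peak within 0.4 s reads as staccato, holding at least 30% of the peak reads as full sustain); the function name and exact peak-finding logic are assumptions:

```python
# Rate the decay between one onset and the next as 1 (staccato),
# 2 (moderate), or 3 (full sustain) using the stated thresholds.
def sustain_rating(segment, sample_rate):
    peak = max(abs(x) for x in segment)
    if peak == 0.0:
        return 1
    peak_i = max(range(len(segment)), key=lambda i: abs(segment[i]))
    # First sample after the peak that falls below 20% of the peak level.
    drop_i = next((i for i in range(peak_i, len(segment))
                   if abs(segment[i]) < 0.2 * peak), None)
    if drop_i is not None and (drop_i - peak_i) / sample_rate < 0.4:
        return 1  # quick, drastic drop: staccato playing
    if min(abs(x) for x in segment[peak_i:]) >= 0.3 * peak:
        return 3  # the note rings on: full sustain
    return 2      # moderate decay between the two extremes
```

In use, the ratings for each inter-onset segment in the window would be averaged and sent forward to determine the drum articulation, as described above.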
The first image (figure 10) is from a slower, acoustic version of "Blitzkrieg Bop" by the Ramones
[23]. The slower version of the song features long, sustained chords which accounts for the slow rate of
decay in the signal. The signal holds around .4 in amplitude before the next onset occurs, further evidence
of the sustained chord.
Figure 10. Gradual amplitude decay with balance around .4 on the y axis.
The next image (figure 11) is taken from the second movement of "Paranoid Android" by
Radiohead [14]. This part features slightly quicker decay as only a single string is played at a time as
opposed to full chords. This results in a slightly less smooth guitar part, but still with some degree of
sustain, dropping only slightly below .4.
Figure 11. Quick initial amplitude decay with slow latent decay and/or balance.
The final image (figure 12) is taken from the song "Hanuman" by Rodrigo y Gabriela [34]. This is a
much more disjointed guitar part than the previous samples as evidenced by the quick drops in amplitude
as well as the drops below .2 which indicate a brief silence from the guitar aside from background noise
including the sound created by fingers sliding across the muted strings to new positions.
Figure 12. Quick initial decay with occasional balances below .2.
3.5.5 Complexity
Rhythm is a very important aspect of music; the way a particular part is played on a melodic
instrument can greatly influence how a human drummer will perform their part. Typically, rhythmically
simple guitar or piano parts will feature an equally simple drum part. One can look at the classic punk
band, The Ramones, to hear an example of this. For musicians who are practicing simple parts, a complex
drum beat may be detrimental to the feel of the song and may also confuse novices who are not
accustomed to intricate rhythms. Musicians who are able to play slightly more complex rhythms should
therefore be able to handle increasingly complex drum parts. While some may argue that higher level
playing does not always necessitate an elaborate drum beat, it may be useful for practice purposes to help
push a musician to work with and around drum beats of varying difficulty.
The complexity value is determined by comparing the number of offbeat onsets to the
number of detected beats:

complexity = (Ot − Ob) / B

where Ot is the total number of onsets, Ob is the number of onsets which lie on a detected beat,
and B is the total number of beats. This number typically ranges from 0 to anywhere above 2.0, where 0
indicates a very straightforward rhythm, while higher values correspond to an increased complexity. Ob is
determined by looking at the time location of a given onset and comparing it with the time locations from
the beat array. If the time locations are within 1/40th of a second, it is assumed that the onset is on the
beat. While the onsets and beats which occur simultaneously normally have the same time value,
occasional discrepancies between the two waveforms may result in slightly offset onset index locations.
This may also be caused by the musician playing a note slightly ahead or behind the beat. Once found, the
complexity value is sent forward to the genetic algorithm fitness function.
The justification for this method comes from the observation that simple rhythms will follow the
beat of the song fairly strictly, or at least within even subdivisions of the beat. Complex rhythms will
deviate from the beat and typically do not feature consistently spaced notes.
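The computation can be sketched directly from the description above, using the stated 1/40 s tolerance for deciding whether an onset lies on a beat; the function name is illustrative:

```python
# Complexity = offbeat onsets relative to the number of detected beats,
# where an onset counts as on-beat if it lies within 1/40 s of a beat.
def complexity(onset_times, beat_times, tol=1.0 / 40.0):
    if not beat_times:
        return 0.0
    on_beat = sum(1 for o in onset_times
                  if any(abs(o - b) <= tol for b in beat_times))
    return (len(onset_times) - on_beat) / len(beat_times)
```

When every onset coincides with a beat the value is 0; adding one offbeat onset per beat raises it to 1.0, and denser offbeat playing pushes it higher still.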
In figure 13, the graph on the left (from [34]) shows how each onset occurs at the same time as a
beat indicator, this is representative of a very simple input and will be given a complexity value of 0. The
graph on the right (from [14]) shows a number of onsets which do not correspond to any beat as well as a
few beats which have no onset occurring simultaneously. The local window in this sample would have a
higher complexity value due to these factors.
Figure 13. Low complexity value on the left due to an even beat:onset ratio with all onsets occurring on beat. The wave
on the right has a higher complexity value due to the number of onsets which occur off beat, some beats also have no
onsets associated with them.
3.6 Drum Beat Generation
Artificial creativity is a rather abstract concept in which a model of human creativity can be
represented algorithmically. Collins [7] employs a method of finding empty spaces within a musician’s
rhythm and creates counter-melodies within the spaces. This provides an interestingly layered and
complex melodic structure that is meant to inspire the human musician into exploring new musical ideas.
This results in a constant trade of ideas between human and computer rather than having the computer
restricted to a supporting role. Another approach makes use of evolutionary computation and genetic
algorithms to generate artificial creativity [22]. This method creates a number of randomly generated
musical segments, picks from the most suitable segments and uses them to seed new segments. This
process is repeated until an acceptable segment is found.
This system uses a similar technique by using drum beats in the form of matrices as musical
segments. In the first generation, all of the potential drum beat candidates are randomly generated. From
there, the fitness function determines the validity of each drum beat to see what will pass on to the next
generation, what will undergo mutation and crossover, and what will be purged and replaced with new,
randomly generated drum beats.
3.6.1 Scoring
Drum beats receive an overall fitness score which is determined by many independent factors. If a
low complexity value has been detected (between 0 and .5 for most input signals which feature
straightforward rhythms), each individual drum track will receive harsh penalties for extraneous hits and
for not having the kick and snare follow simple patterns. The hi hat pattern must also stay on beat for time
keeping purposes.
Figure 14. A simplified look at drum beat candidate evaluation.
The above image (figure 14) shows the factors that are taken into consideration when scoring a
drum beat for a simple piece of music. The kick drum is scored in the following way:

Score(kick) = 10 × (Kd / Kt)

where Kd is the number of kick drum hits occurring on the down beats (beats 1 and 3) and Kt is the
total number of kick drum hits. This method pushes forward kick drum tracks which keep the pulse of the
beat, while an extra hit or two will be allowed without too much penalty. The maximum score is always
capped at 10 to provide an evenly rounded score across all tracks. The value 10 was chosen for no other
reason than the simplicity of reading scores as percentages and to avoid over-complicating the fitness
functions.
The snare drum is scored in the same way but with an emphasis on the upbeats (2 and 4).
The hi hat, being the primary time keeper in this system, is scored differently:

Score(hi hat) = 10 × (Hb / B)

where Hb is the total number of hi hat hits occurring on the beat and B is the time signature numerator
(commonly 4). This is done so the hi hats stay on beat to help keep time for the musician while still allowing
for some variation on the off beats.
When the input signal has been determined to be of moderate complexity (.5 to 2.0 for most input
signals which are not completely straightforward nor overly intricate), the same equation is applied but
with (+2) appended to the end. This is done so the penalties attributed to extra hits are less severe,
allowing for greater freedom of expression. This number ensures that more complex drum beats are
possible while still ensuring that unacceptable beats are not passed on.
For complexity values greater than 2.0, many restrictions are removed to allow for a wide range of
potential drum beats of varying difficulty. Instead of the scoring system present in the lower complexity
ranges, a set of hard coded penalties have been put in place to prevent very sparse drum beats as well as
drum beat matrices which are overly full of hits, these beats are given maximum scores of 2 out of 10.
Simple beats which only exhibit kick hits on 1 and 3 and snare hits on 2 and 4 are given a maximum score of
5. Otherwise, an acceptable beat will receive a score between 8 and 10.
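The tiered scoring above can be sketched as follows. This is an illustrative Python rendering of the reconstructed kick formula (the system is written in MATLAB), and the 8-step grid and step indices are assumptions for the example:

```python
def kick_score(hit_steps, complexity):
    """Illustrative kick fitness (0 to 10): the fraction of hits landing on
    the down beats (steps 0 and 4 of an 8-step eighth-note bar) scaled to
    10, with the +2 leniency for moderately complex input, capped at 10."""
    downbeats = {0, 4}                  # beats 1 and 3
    on_down = sum(1 for s in hit_steps if s in downbeats)
    score = 10 * on_down / max(len(hit_steps), 1)
    if 0.5 <= complexity <= 2.0:        # moderate complexity: softer penalty
        score += 2
    return min(score, 10)

# Hits only on beats 1 and 3 of a simple input earn full marks
print(kick_score([0, 4], 0.0))  # 10.0
```

Under the same sketch, an extra off-beat hit lowers the score only moderately, matching the text's intent of allowing an extra hit or two without too much penalty.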
3.6.2 Evolution
Once the scoring process has completed, the transition to the next generation begins. Drum beats
scoring in the top 25% are carried on to the next generation without modification. Another 25% of the
next generation is made up of high scoring individual instrument tracks from different drum beats. The
first is a hybrid of the top scoring instruments, while the rest are randomly composed of the top 25%
scoring instrument tracks, with at least one of them being randomly generated. This hybrid set of drum
beats is purely experimental and does not always lead to an optimal solution but it does create a unique
set of drum beats which can potentially rise to the top.
A third 25% of drum beats are subject to vertical crossover, in which the first section of a drum
beat is combined with the complementary end section of another drum beat. Drum beats scoring in the top
50% are potentially subject to this. The dividing line is randomly decided with checks in place to prevent
the same two drum beats from being split in the same place more than once. The following image (figure
15) graphically demonstrates these concepts.
Figure 15. A look at generational transitions associated with the genetic algorithm.
The final quarter of the next generation is randomly generated, ensuring a fresh supply of
new drum beats and instrument tracks is available. Currently, the evolutionary process is run through only
ten iterations, which has been shown to allow passable drum beats to be created without having the
process converge on the same drum beat each run through.
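The generational transition just described (25% elitism, 25% hybrids, 25% vertical crossover, 25% fresh random beats) can be sketched in Python. The hybrid step is simplified here to copies drawn from the top quarter, and all names and the 16-step beat representation are illustrative rather than the MATLAB implementation:

```python
import random

def next_generation(population, scores, rng=random.Random(0)):
    """Illustrative generational step: 25% elitism, 25% hybrids drawn from
    the top quarter, 25% vertical crossover of top-half beats, and 25%
    fresh random beats. A beat is modelled as a flat list of 16 steps."""
    n = len(population)
    ranked = [b for _, b in sorted(zip(scores, population),
                                   key=lambda p: -p[0])]
    elite = ranked[: n // 4]                       # carried over unchanged
    hybrids = [ranked[rng.randrange(n // 4)][:] for _ in range(n // 4)]
    crossed = []
    for _ in range(n // 4):                        # vertical crossover
        a, b = rng.sample(ranked[: n // 2], 2)
        cut = rng.randrange(1, 16)                 # random dividing line
        crossed.append(a[:cut] + b[cut:])
    fresh = [[rng.randint(0, 1) for _ in range(16)]
             for _ in range(n - 3 * (n // 4))]     # fresh random beats
    return elite + hybrids + crossed + fresh

rng = random.Random(7)
pop = [[rng.randint(0, 1) for _ in range(16)] for _ in range(8)]
print(len(next_generation(pop, scores=list(range(8)))))  # population size preserved: 8
```

The checks the report mentions (preventing the same two beats being split at the same point twice) are omitted from this sketch for brevity.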
3.6.3 Introducing Cymbals and Toms
As there are no strict rules for cymbal and tom hits within a drum beat, and some may
argue that going against convention can help push new creative boundaries, the rules defined by the system are
fairly loose while still maintaining some degree of restraint.
One restraining aspect regarding the cymbals is how the downbeat of every fourth drum beat
iteration will feature a cymbal hit. There is no scientific justification for this beyond the author's own
experience as a drummer: cues such as this can help keep a group of performers aware of their position
within a song. Many patterns in rock and pop feature repeats in multiples of four, which is why four
loops was chosen, though this number can easily be changed. Anchors like this will help to reassure the
user of where they are in a piece of music as well as assuring them that the program is in the correct
location.
Cymbals are also introduced to the drum beat mainly as accents which will replace hi hat hits,
depending on the complexity of the input signal. Medium complexity inputs will have a 15% chance of a
cymbal hit replacing a hi hat hit providing that a kick or snare drum hit is present as accented cymbals
sound relatively weak without bottom end support. For higher complexity inputs, this number is increased
to 20%. These numbers allow the cymbals to still be thought of as accents while not risking overuse.
If the sustain and volume values of a signal are high enough, there is also a 50% chance that a
cymbal will completely replace the hi hat track as a time keeper. The extra sustain associated with cymbals
can often help strengthen and support louder, sustained chords. The 50% value is chosen as open hi hats
and sustained cymbal hits are equally valid options.
For higher complexity values, toms may be added in as filler between snare drum hits or even to
replace certain snare and hi hat hits if there is a conflict (this is done to constrain the virtual drummer to
simulate a human drummer, who typically has only two arms available).
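The cymbal accent rule (a 15% replacement chance for medium-complexity input, 20% for high, and only where a kick or snare supports the hit) can be sketched as follows; this is an illustrative Python fragment, not the MATLAB code:

```python
import random

def add_cymbal_accents(hihat, kick, snare, complexity, rng=random.Random(1)):
    """Illustrative accent pass: a hi hat hit may be swapped for a cymbal
    hit when a kick or snare hit is present underneath it; 15% chance for
    medium-complexity input, raised to 20% for high complexity."""
    p = 0.20 if complexity > 2.0 else 0.15
    cymbal = [0] * len(hihat)
    for i, h in enumerate(hihat):
        if h and (kick[i] or snare[i]) and rng.random() < p:
            cymbal[i], hihat[i] = 1, 0   # cymbal replaces the hi hat hit
    return hihat, cymbal

# No kick or snare support: the hi hat track is left untouched
print(add_cymbal_accents([1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0], 1.0))
```

The 50% hi hat/cymbal time-keeper swap and the tom substitutions would be handled by similar conditional passes.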
3.7 Drum Beat Enhancement
Table 2 shows a sample of the drum beat template. This template contains the basic elements of a
standard rock or jazz drum kit. A volume parameter has also been included for each instrument. An overall
volume will be derived from the relative amplitude of the incoming waveform. This volume value can be
adjusted for individual hits in order to allow for accents and ghost notes, common techniques used in the
composition of a dynamic drumbeat. The volume parameter ranges from 0 to 1.
A tone parameter has also been introduced to the tom and cymbal instrument tracks. A lower
value corresponds to a lower pitch and vice versa. This parameter thus allows the template to model a
drum kit containing multiple toms and cymbals, as is common for most standard kits. The tone parameter
also ranges from 0 to 1.
The position value for the hi hat represents the distance between the two cymbals that make up a
pair of hi hats. A value of 0 indicates tightly closed hi hats which produce a short, sharp sound. Completely
open hi hats, a value of 1, occur when there is no contact between the two cymbals; this is rarely used by
most human drummers. Values between 0 and 1 cover the remaining distance and allow for interesting
crescendo effects, accents, and setting an overall style. Figure 16 displays a representation of the drum
beat styling parameters.
Beat Template (two bars, eighth-note grid)

instr.  parameter    1   &   2   &   3   &   4   &   1   &   2   &   3   &   4   &
kick    probability 1.0 0.0 0.0 0.7 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.4 0.4 0.0
        volume      0.8  -   -  0.8  -   -   -   -  0.8  -   -   -   -  0.8 0.8  -
snare   probability 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.4 0.0 1.0 0.0 0.0 0.3
        volume       -   -   -   -  0.8  -   -   -   -   -  0.4  -  0.8  -   -  0.4
hi hat  probability 0.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.5
        volume       -   -  0.8  -  0.8  -  0.8  -  0.8  -  0.8  -  0.8  -  0.8 1.0
        position     -   -  0.3  -  0.3  -  0.3  -  0.3  -  0.3  -  0.3  -  0.3 0.7
tom     hit         0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0
        volume       -   -   -   -   -   -   -   -   -   -   -   -   -  0.8 0.8  -
        tone         -   -   -   -   -   -   -   -   -   -   -   -   -  0.6 0.4  -
cymbal  hit         1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
        volume      0.8  -   -   -   -   -   -   -  0.8  -   -   -   -   -   -   -
        tone        0.3  -   -   -   -   -   -   -  0.5  -   -   -   -   -   -   -
Figure 16. Representation of drum beat styling parameters.
The values contained in the drum beat template are assigned by the drum beat styling aspect. The
probability values are rolled before the final drum beat is sent out so that each potential hit is a 0 or a 1.
This is done each time the beat is looped so that it has a chance of variation on each iteration.
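Rolling the probabilities into a concrete beat on each loop can be sketched as below; this is an illustrative Python fragment (the system itself works on MATLAB matrices):

```python
import random

def roll_beat(prob_row, rng=random.Random(42)):
    """Illustrative per-loop roll: each probability collapses to a definite
    hit (1) or rest (0), so the beat can vary on every repetition."""
    return [1 if rng.random() < p else 0 for p in prob_row]

# Certain (1.0) and impossible (0.0) hits always resolve the same way
print(roll_beat([1.0, 0.0, 1.0, 0.0]))  # [1, 0, 1, 0]
```

Intermediate probabilities such as 0.8 resolve differently from one loop to the next, which is the source of the variation described above.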
3.7.1 Probability Values
Bilmes [3] claims that the key to machine expressivity is in variation and deviation, without which
any computer generated music will sound mechanical. To combat this, a set of probability values will be
introduced so that repeated drum beats are not played exactly the same way every time. These
probability values allow for variation within a drum beat, while retaining some similarity, without having
to generate a completely new one.
The probability values assigned to each track determine how likely a given hit will occur each time
it is looped. Some rhythms require less rigidity in a drum beat, so the probability values can help to keep
things interesting as a particular beat is repeated. The probability values for each hit are determined
randomly but must remain above the minimum in order to help preserve a more natural feeling drum
beat.
Each instrument's set of probability values are determined randomly but with varying minimum
values. The following plot (figure 17) demonstrates how the lowest possible probability value for the kick
drum is determined for a given location. This chart only relates to drum beats where the complexity value
has been determined to be greater than .5, as inputs which score above .5 are typically of a high enough
complexity to warrant these additional drumming techniques.
Figure 17. Top - Minimum possible probability values by location for the kick drum with an emphasis on higher values
for beat one. Bottom - The snare is weighted as a counter to the bass drum in terms of possible probability values.
The area above the curve is the range of possible probability values at each beat location,
randomly generated within the acceptable bounds. In this case, the main pulse of the beat (on beats 1
and 3) has a higher minimum than on beats 2 and 4; this is done to avoid too much degradation of the
normally important hits of the kick drum. The minimum probability function is simply a cosine wave
ranging from 66% to 100%, with clipping on peaks occurring after beat 1. The downbeat remains important
while the offbeats are subject to a greater degree of variability.
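The minimum-probability curve can be sketched as below. This Python fragment interprets the 66-100 range as probability floors of 0.66 to 1.00 and assumes an 8-step bar with peak clipping applied after beat 1, so the exact constants are illustrative:

```python
import math

def kick_min_prob(step, steps_per_bar=8):
    """Illustrative minimum-probability floor for the kick drum: a cosine
    ranging from 0.66 to 1.00 with a period of half a bar, so it peaks on
    beats 1 and 3; peaks after beat 1 are clipped slightly lower so the
    downbeat keeps the highest floor."""
    phase = 2 * math.pi * step / (steps_per_bar / 2)
    value = 0.83 + 0.17 * math.cos(phase)   # oscillates between 0.66 and 1.00
    return value if step == 0 else min(value, 0.98)

print(kick_min_prob(0), kick_min_prob(4))  # 1.0 0.98
```

The actual probability for each step would then be drawn uniformly between this floor and 1.0.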
3.7.2 Volume Values
The next stage is to set the volume levels for each hit. A base volume level will be set based on the
relative amplitude of the input waveform. This base value will adjust to whatever the current dynamic level
of the input signal may be, much in the same way a human drummer will respond to another musician
playing louder or softer.
In drumming, an accent is a hit which is played with an increased power level compared to the hits
surrounding it. This is a commonly used technique by many drummers to increase the dynamic range of
their drum beats (e.g. John Bonham's introduction on Rock n' Roll by Led Zeppelin). Ghost notes occur when
a drummer plays a hit at a reduced volume, allowing for greater improvisational potential without
overloading the drum beat with full volume hits.
For the snare drum, accents and ghost notes are randomly decided when the complexity level is
high enough. The volume level for ghost notes ranges anywhere from (BaseVolume/2.5) to
(BaseVolume/2). Accents are played at double volume, capped at a maximum of 1 so as to avoid overly loud
accents; a human drummer already playing at full volume will have a hard time distinguishing his or her
accents by volume as well.
Tom, cymbal, and kick hits are currently kept at a steady volume but are still adjusted overall
during loud or quiet inputs.
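The accent and ghost-note volume rules can be sketched as follows; this is illustrative Python, and the ghost value shown is the lower bound of the described range rather than a random draw within it:

```python
def snare_hit_volume(base, kind):
    """Illustrative volume assignment: accents double the base volume but
    are capped at 1.0; ghost notes sit at the bottom of the base/2.5 to
    base/2 range described in the text."""
    if kind == "accent":
        return min(2 * base, 1.0)
    if kind == "ghost":
        return base / 2.5
    return base

print(snare_hit_volume(0.8, "accent"))           # 1.0 (capped)
print(round(snare_hit_volume(0.8, "ghost"), 2))  # 0.32
```

With a base volume of 0.8, accents saturate at the cap, which mirrors the point that a drummer already at full volume cannot accent by volume alone.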
3.7.3 Tone and Position Values
The tom tone level will be set to a low value if the tom is being used as the primary beat/time
keeper, essentially when it is being struck in an even, repeating pattern. Isolated tom hits can have
completely random tone levels otherwise. Fast, repeated tom hits which make up a drum fill will
commonly move from high to low, but other methods and combinations are just as valid and should not be
discouraged.
Cymbals are treated in much the same way: a lower value will be assigned if the cymbal is being
used as a time keeper. This is to represent a ride cymbal, which is typically larger than other cymbals.
The position value largely relies on the input waveform (specifically, the intensity rating). A
waveform with longer, high amplitude trails following the onsets is most likely representing an instrument
that is being played with sustained chords. If the hi hat is being used as the time keeper in this section,
the position value will be somewhere near the middle. If the waveform features quick drops in amplitude
after an onset, then this section of the song will be more suited to a closed hi hat featuring a lower position
value. The higher position values are used for accents, which are indicated by a higher volume level for
the hi hat at that moment in time.
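The mapping from sustain to hi hat position can be sketched as below; the thresholds and position values are assumptions for illustration, since the report describes the behaviour only qualitatively:

```python
def hihat_position(sustain, accent=False):
    """Illustrative hi hat openness (0 = closed, 1 = open): sustained input
    pushes the position toward the middle, short decays keep it closed,
    and accents open the hats further. Thresholds are assumptions."""
    if accent:
        return 0.8     # more open for accented hits
    if sustain > 2.0:  # long amplitude trails after each onset
        return 0.5     # roughly half open to support sustained chords
    return 0.2         # mostly closed for short, choppy playing

print(hihat_position(2.4), hihat_position(1.0))  # 0.5 0.2
```

A real implementation would presumably interpolate continuously over the sustain value rather than switch between a few fixed levels.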
3.8 Output
Due to constraints present in MATLAB and Simulink, audio output options are extremely limited.
Instead, a visual representation of the drumbeat is displayed for evaluation purposes along with an
audible click to indicate beat location. A separate vector array, located underneath the drum beat display,
gives a visual cue of where the system currently is within the drum beat output. The output shown
in figure 18 is for a simple kick/snare/hi hat drum beat.
Figure 18. Simulink visual representation of output. The top matrix (titled drumbeat) represents the current drum beat.
The middle array (titled timing) shows the current location of the beat. The bottom matrix (titled beatfx) displays the
styling parameters associated with the drum beat.
3.9 Simulink Model
The system features discussed above are all contained in MATLAB and mostly handled through
Simulink. Simulink allows for the use of Embedded MATLAB functions, which are similar to standard
MATLAB functions but with real time functionality and C code generation options. Unfortunately,
Embedded MATLAB is hindered by many restrictions, including the lack of audio output, of variable
sized matrices, and of some commonly used built in MATLAB functions. The images in figure
19 display the Simulink architecture and explain some of the components within.
Figure 19. Simulink workspace. The three information flows being sent from the Output Processing block are directed
to the drum beat visualisation matrices of Figure 18 in the previous section (3.8).
The input signal is buffered into a two second window for analysis by the Input Processing block
as well as being sent in 0.003125 second samples to the Timing Issues block. Input Processing handles all
the waveform analysis needed to create and style the drum beat, namely the volume, complexity, and
sustain values. Timing Issues is responsible for syncing the output of the drum beat with the beat pattern
being predicted by aubio. Beat Generation is the block which calls the genetic algorithm for drum beat
creation. When a drum beat has been chosen, it is sent to the Output Processing block, which adds the
final styling features such as accents, ghost notes, and hi hat position.
[Figure 19 block labels: Input signal received from JACK at 8000 Hz; Input Processing (volume, complexity, sustain); Timing Issues (synchronisation, time signature); Drum Beat Generation (genetic algorithm, creation of un-styled beats); Output Processing (drum beat styling, drum beat output timing); ProgramStart = 1 when the input amplitude threshold is reached; StartProcess = 1 when 2 bars have been tapped out.]
Chapter 4: Results
Below is a set of example drum beats produced by the program. Each drum beat is in 4/4 timing
divided into eighth notes. A brief description of the beat and the input signal it was accompanying can be
found alongside each chart. These results were taken from the user evaluation stage, in which only 8
count, kick/snare/hi hat beats were used in order to simplify the output display for the user.
vol 0.500
4/4 snapping
comp 0.000
sustain 1.000
kick 1 0 0 0 1 0 0 1
snare 0 0 1 0 0 1 1 0
hat 1 1 1 0 1 0 1 0
beat 1 & 2 & 3 & 4 &
kick prob. 1 1 1 1 1 1 1 1
vol. 0.500 0.500 0.500 0.500 0.500 0.500 0.500 0.500
snare prob. 1 1 1 1 1 1 1 1
vol. 0.500 0.500 0.500 0.500 0.500 0.500 0.500 0.500
hi hat prob. 1 1 1 1 1 1 1 1
vol. 0.500 0.500 0.500 0.500 0.500 0.500 0.500 0.500
pos. 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Figure 20. Drum beat to accompany consistent snapping.
vol 1.072
Just
comp 0.5695
sustain 2.429
kick 1 0 1 0 1 0 0 0
snare 0 0 1 0 0 1 1 0
hat 1 0 1 0 1 1 1 1
beat 1 & 2 & 3 & 4 &
kick prob. 1 0 .8 0 .98 0 0 0
vol. 1.072 1.072 1.072 1.072 1.072 1.072 1.072 1.072
snare prob. 0 0 .99 0 0 .96 .85 0
vol. 0 0 1.072 0 0 .356 1.072 0
hi hat prob. .8 0 .91 0 .89 .87 .83 .82
vol. 1.072 0 1.072 0 1.072 1.072 1.072 1.072
pos. 2.4 0.0 2.4 0.0 2.4 2.4 2.4 2.4
Figure 21. Drum beat to accompany a strummed guitar part.
The input for the drum beat in figure 20 consisted only of consistently timed finger-snapping into a
microphone. The drum beat itself is fairly straightforward, but the offbeat kick and snare hits in the
second half of the bar help to keep it interesting. The styling table is locked with 100% probability and
fixed volume for all tracks due to the low complexity value.
The guitar input for figure 21 was played at full volume and is somewhat more complex, but still
with full chord strumming. The styling directions can be observed in the table. Having the complexity
value above .5 allows for the inclusion of probability values as well as the potential of ghost notes
and accents.
vol 1.013
1400 jam
comp 1.750
sustain 1.920
kick 1 0 1 1 0 0 0 0
snare 0 0 1 0 0 0 1 1
hat 1 1 1 1 1 1 0 1
beat 1 & 2 & 3 & 4 &
kick prob. 1 0 .8 .85 0 0 0 0
vol. 1.013 0 1.013 1.013 0 0 0 0
snare prob. 0 0 .92 0 0 0 .87 .72
vol. 0 0 1.013 0 0 0 1.3 0
hi hat prob. .83 .85 .91 .99 .89 .95 0 .92
vol. 1.9 1.9 1.9 0 1.9 1.9 0 1.9
pos. 1.9 1.9 1.9 1.9 1.9 3 1.9 1.9
Figure 22. Drum beat to accompany a complex guitar part.
vol 1.117
paradise city
comp 1.750
sustain 2.290
kick 1 0 0 0 1 0 0 1
snare 1 0 1 0 0 0 1 0
hat 1 0 1 0 1 1 1 1
beat 1 & 2 & 3 & 4 &
kick prob. .98 0 0 0 .92 0 0 .80
vol. 1.117 0 0 0 1.117 0 0 1.117
snare prob. 0 0 .92 0 0 0 .87 0
vol. 1.117 0 1.117 0 0 0 1.117 0
hi hat prob. .83 0 .91 0 .89 .95 0 .92
vol. 1.117 0 1.117 0 1.117 1.117 1.117 1.117
pos. 2.3 0 2.3 0 2.3 2.3 2.3 2.3
Figure 23. Drum beat to accompany a chord based guitar part where each note is struck individually.
The complexity value can also be heavily influenced by how quickly the tap lead-in is performed. If
the user taps out the beat in eighth notes, the complexity value will be lower than if it is tapped out in
quarter notes. This can be used as a way to give the user slightly more control over the drum beat
generation process.
The guitar part used in the creation of the drum beat in figure 22 was quite complex. It was played
slightly muted but with some chords still left sustained.
In the guitar part accompanying figure 23, the notes of the chords are struck individually in a way
that features many off-beat onsets. Every note typically rings out.
Chapter 5: Evaluation
5.1 Overview
This chapter will observe the performance of the system in terms of accuracy and style. The beat
tracking and onset detection methods will be compared with their respective input waveforms in order to
determine accuracy. Consistency within the beat tracking system is another aspect which will be
evaluated. The beat locations will be compared to the beat locations determined by a human musician.
Any delay in the output will also be noted as this can negatively affect the human musician and
potentially render the system unusable. The evaluation performed looked at the average latency value
between the input and output at different stages, as well as any jitter that may be present. A consistent
latency value would allow the program to accurately jump ahead and predict beat locations while a high
amount of jitter would make accurate accompaniment impossible.
As for the actual performance of the program, a selection of human musicians will be asked to try
out the system and answer a series of questions regarding the quality of the experience. User feedback
can help identify key problem areas as well as provide useful information on potential future
implementations.
5.2 Quantitative Evaluation
5.2.1 Beat Tracking
The two beat tracking systems tested here are the LabROSA beat tracker [11] and the aubio beat
tracker [10]. Each system was tested and compared with the human perceived beat. The analysis looks at the
tempo derived from each method, represented as an average interval between each detected beat (in
seconds). The standard deviation is another important factor to consider as it shows how consistent each
method is. Some degree of variance is to be expected as the test audio files are played by humans so a
natural, human deviation in tempo is present. The minimum and maximum time intervals are also shown
to demonstrate the potential severity of the variance. Two song samples were chosen for analysis in this
section as they are representative of the beat tracking capabilities of each system. The first song shown
will be an excerpt from the second movement of Paranoid Android, originally performed by the group
Radiohead [14]. The song sample is played on a solo acoustic guitar. This sample features a rhythm with
many offbeat notes and accents as well as some slight syncopation, making it ideal to test the limitations
of any of the beat tracking systems. The second song will be a sample of the chorus from Blitzkrieg Bop by
the Ramones [23]. The sample is slower than the original and played on an acoustic guitar. It was chosen
because of its relatively straightforward rhythm, featuring accents only on the beats.
5.2.1.1 Paranoid Android
The charts below (figure 24) show the beat locations in relation to the song sample for each
method. The first chart was produced by the LabROSA beat tracker [11] and looks to be fairly consistent
and accurate (confirmed by listening to the audio click track along with the original sample). The second
chart was produced by aubio [10] and appears to be much less consistent, deriving a much slower tempo
than what is present at the beginning. It should be noted that the tempo here is not simply being tracked
in half-time, which would have been acceptable; many of the beat locations are offset, slightly to greatly,
from some of the obvious onsets. The third chart shows the beat locations determined by a human
tapping along with the song sample. It should be noted that the aubio beat tracker does not show any
beat locations for the first few beats as it has an initial warm up period when initialized.
Figure 24. Waveform with slightly complex rhythm. Top - beat locations as determined by the LabROSA Cover Song ID
beat tracker [11]. Middle - beat locations as determined by aubio beat tracker [10]. Bottom - beat locations as
determined by a human musician.
The chart below (figure 25) displays a numerical representation of what was stated above. The
average interval time between beat locations as determined by the LabROSA beat tracker is very close to
the human determined interval time. Both of these also have a low variance value which is seen as a
positive due to the relatively consistent tempo of the input guitar part. The aubio beat tracker performed
poorly on this sample, with a much higher variance and an extremely outlying maximum interval value.
Paranoid Android LabROSA aubio human
Average Interval 0.3355 0.4995 0.3385
Variance 0.0183 0.1818 0.0144
Max. Interval 0.3720 0.9056 0.3680
Min. Interval 0.2920 0.3019 0.3080
Figure 25. Accuracy information of different beat tracking methods. The Average Interval row displays the mean
interval time for the entire beat location file (essentially giving the tempo of the song). Variance shows how far the
intervals deviate from the mean on average. Max. Interval shows the longest interval time within the beat location file
while Min. Interval shows the shortest. All values are in seconds.
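The table's statistics can be recomputed from a list of beat locations. The sketch below (in Python, though the report's analysis tooling is not specified) uses mean absolute deviation for the Variance row, matching the caption's description of average deviation from the mean:

```python
def interval_stats(beat_times):
    """Recomputes the table statistics from a list of beat locations (s):
    mean inter-beat interval (the tempo), mean absolute deviation (the
    'Variance' row as the caption describes it), and the extremes."""
    intervals = [b - a for a, b in zip(beat_times, beat_times[1:])]
    mean = sum(intervals) / len(intervals)
    spread = sum(abs(i - mean) for i in intervals) / len(intervals)
    return mean, spread, max(intervals), min(intervals)

# Perfectly steady 120 BPM tapping: 0.5 s intervals with zero spread
print(interval_stats([0.0, 0.5, 1.0, 1.5, 2.0]))  # (0.5, 0.0, 0.5, 0.5)
```

A high maximum interval relative to the mean, as seen in the aubio column, indicates an outlying gap in the detected beats rather than a uniformly slower tempo.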
The aubio package has a number of alternative onset detection methods which can be used with
the beat tracker, however, as can be seen below in figure 26, little difference is made in terms of
accuracy. One can conclude that the beat detection algorithm included in aubio would need
improvements in order to handle more complex rhythms.
Figure 26. Aubio beat detection utilising each onset detection method included in the program. While each method
produces greatly different results, none of them stands out as being particularly accurate or consistent.
5.2.1.2 Blitzkrieg Bop:
The beat locations of the charts below (figure 27) appear to be consistent across all three. This is
most likely due to the simpler rhythmic pattern of this song sample compared to the previous one.
Figure 27. Waveform with a simple rhythm. Top - beat locations as determined by the LabROSA Cover Song ID beat
tracker [11]. Middle - beat locations as determined by aubio beat tracker [10]. Bottom - beat locations as determined by
a human musician.
The chart below (figure 28) also shows that both beat detection methods worked as well as the
human tapping method.
Blitzkrieg Bop LabROSA aubio human
Average Interval 0.5163 0.5167 0.5163
Variance 0.0159 0.0124 0.0150
Max. Interval 0.5921 0.5480 0.5640
Min. Interval 0.4760 0.4920 0.4800
Figure 28. Accuracy information of different beat tracking methods. The Average Interval row displays the mean
interval time for the entire beat location file (essentially giving the tempo of the song). Variance shows how far the
intervals deviate from the mean on average. Max. Interval shows the longest interval time within the beat location file
while Min. Interval shows the shortest. All values are in seconds.
5.2.2 Latency
Latency was observed at two stages within the program. The difference between real input time
and the output from JACK was the first to be observed. The output latency here is what the system on the
second computer receives as its input. Figure 29 demonstrates this.
input - JACK delay
(s)
Average delay time 0.0278
Variance 0.0022
Max. delay time 0.0360
Min. delay time 0.0240
Figure 29. Latency measurements from the initial signal input to the signal after beat tracking has been performed.
The delay time after running the signal through the computer running aubio appears to be fairly
predictable. The delay time itself is relatively short and the amount of jitter (variance) is also quite low.
Users reported that no real noticeable lag was present at this stage in the system. The plot below
(figure 30) shows the distribution of delay times.
Figure 30. Distribution of delay times from the initial signal input to the signal after beat tracking has been performed.
The second point observed was at the final output of the program. A signal was sent through each
section of the chain to determine what the overall latency of the system would be.
input - program delay (s)
Average delay time 0.1015
Variance 0.0317
Max. delay time 0.1720
Min. delay time 0.0440
Figure 31. Latency measurements from the signal input to the final output signal (drum beat output).
The delay time here is noticeably higher (figure 31). Taking the delay from JACK into
consideration, the average added by the program itself would be about 0.0737 seconds. The variance here is
also quite a bit higher, sitting on the border between being noticeable and unnoticeable to users. The
image below shows the distribution of delay intervals
(figure 32).
Figure 32. Distribution of delay times from the initial signal input to the final output signal (drum beat output).
When compared to the latency distribution from JACK, the difference is much more noticeable, as
shown in the graph below (figure 33).
Figure 33. Scale comparison of Figures 30 and 32.
The latency of the aubio and JACK systems is tolerable, but the jitter associated with the program
running in MATLAB is of questionable reliability. It does not seem that MATLAB itself is an environment
that should be used for time dependent tasks such as those required for this project.
5.3 Qualitative Evaluation
Qualitative evaluation is very important to a project such as this, as it can be hard to measure
musicality and entertainment mathematically, especially if the factors are largely based upon the personal
preference of the user. Users were asked to test out the system using an acoustic guitar, playing a range
of different styles, and then asked a series of questions regarding the performance of the program. Users
were asked how accurate they felt the system was, what level of interaction was experienced, how they
felt about the decisions made regarding the output drum beat, as well as some other general questions.
The main complaint received from users was the lack of an audio representation of the drum
beats presented to them. The visual cue was not nearly intuitive enough and proved to be difficult to
interpret for users not familiar with drum tabs, though multiple users commented that this method could
benefit those learning drums. The program would basically provide a number of drum beats to the guitar
player who could then ask a drummer to play the beat on an actual drum kit. Another major complaint
shared by users was the accuracy of the beat tracker when attempting songs without straightforward
rhythms. One user felt that the drum beats did not take the actual rhythm of the input into consideration
as much as it could have. The user felt the drum beats were too simple for some of the complex rhythms
which he was playing, and should have been based more on what was being played rather than just
increasing in complexity. Other users commented on the fact that drum beats accompanying verse-
chorus-verse structured songs were not reprised when melody patterns were repeated, so the playing was
not as cohesive as it could have been.
As for the overall quality of the drum beats, reviews were mixed. Some felt that they provided a
very suitable accompaniment to their guitar playing while others felt that the beats were relatively
mediocre. One user felt that none of the generated beats would be considered acceptable for the style of
music which was being played. No users felt that the program had an influence on what they were
playing, saying it felt more like they were playing to the system and not with it.
Users felt there was great potential in the system and did not expect a perfect
accompaniment from a program created within the project's time span. The general consensus was that this system
could be very useful as a practice tool as well as a potential aid for song writing. Users were very
receptive to the concept of the system and a few were quite interested in how the beats were randomly
generated and scored, though it was stated that the fitness functions should be more receptive to
different styles of drumming than just basic rock beats. Some users said they would prefer to manually
choose a genre style for the drum beat rather than having it derived from the input. Users also
commented on the impracticality of requiring two computers to be able to fully run the program.
An observation noted by the author is that the sustain value calculations were tailored to
the author's own playing style. The different playing techniques of other users resulted in unexpected sustain
values, which in turn generated hi-hat position values that did not necessarily fit the part being played. In
order to derive this value accurately and consistently, waveforms created by a wide range of musicians of
varying skill levels playing the same set of parts would need to be observed.
Chapter 6: Conclusions and Further Work
6.1 Conclusions
This report presents the design and development of a virtual drum accompanist for musical
composition and practice purposes. The ultimate goal is to have the virtual drummer seamlessly provide
accurate and appropriate accompaniments to pieces of music as they are played. The system is to model a
human drummer who has no prior knowledge of a song but can still create a suitable and dynamic drum
beat to complement whatever is being played. Through detailed waveform analysis and accurate beat
tracking, the virtual drummer can quickly compose and execute drum beats while modelling basic forms of
creativity and expression commonly found in human musicians.
This is an ongoing project with a prototype system currently being validated with a series of song
samples meant to determine the overall adaptability of the system. There is much that can be done to
optimise this system as it is still in the early stages of development. Future enhancements will help to
increase the virtual drummer's versatility as well as its ability to interact with human musicians. The
variables associated with the drum beat styling aspect are also undergoing constant analysis due to their
weighted influence on drum beat candidates. When an optimal configuration has been found, after
extensive testing with many users from a wide range of styles and abilities, the system may be put
through an initial training phase to reduce the number of invalid drum beats created during the
probability assignment phase.
The system did, however, help to provide an interesting perspective on the interaction between
human and virtual musicians. The way music and maths are related provides a motivation for modelling
musical concepts with computers, and the best way to evaluate the system is to have human musicians
interact with it. While the system is by no means a replacement for a human drummer, it gives
musicians something to experiment with and a way to push their creative boundaries.
The process of designing, implementing, and evaluating this system has shown that while a
number of the techniques were valid and could be expanded upon, different approaches should be taken
in regard to the completely random generation of drum beats. While an element of randomness and
deviation is important to a system like this, perhaps it should not be completely reliant on it. The system
presented here can serve as a springboard for future projects attempting to emulate the expressivity of human
drummers. The potential for use as a practice tool could be seen as the primary focus and help give a
greater sense of direction to further developments.
6.2 Further Work
Some key improvements needed in the system are in the beat detection portion and the audio
output section. A beat tracker which is not as easily influenced by rhythm changes within the same tempo
is of the greatest importance. The audio output aspect of the system also needs improvement which may
require converting the source code to another language due to Simulink's restrictions on various standard
MATLAB operations. Additional drum beat options, such as syncopation, swing beats, and drum fills could
also greatly benefit the system by increasing its flexibility.
Other future developments may include upgrades to the artificial expression simulator and the
drum beat template. Additional parameters may include hit locations for particular instruments which can
greatly expand the range of sound and create a more natural feel. Musical recognition techniques could
also be very beneficial to the system. If a musician performs a piece with a recurring theme, the system
should recognise it and reprise the drum beat that was previously associated with that theme. This
feature would give the system the ability to participate in very structured songs in a more cohesive
manner. This would also allow musicians to teach the system to play a song if it is able to pick up on these
musical cues.
The system could also be greatly expanded to include additional accompanists, such as piano,
bass, guitar, horns, and stringed instruments. However, an extension like that would require a much
higher level of audio analysis to extract the tones of individual notes. Another possibility is the
implementation of a vision system which allows the user to give visual cues to the system to indicate
movement changes, pauses, and other cues which are normally visually communicated between band
members.
References:
[1] Bello JP, Daudet L, Abdallah S, Duxbury C, Davies M, Sandler M, (2005), A tutorial on onset detection in music signals, IEEE Transactions on Speech and Audio Processing, vol. 13, no. 5: pp. 1035-1047.
[2] Bello JP, Duxbury C, Davies M, Sandler M, (2004), On the Use of Phase and Energy for Musical Onset Detection in the Complex Domain, IEEE Signal Processing Letters, vol. 11, no. 6: pp. 553-556.
[3] Bilmes J, (1993), Techniques to foster drum machine expressivity, ICMC 93, pp. 276-283.
[4] Cockburn A, (2008), Using both incremental and iterative development, STSC CrossTalk, 21(5): pp. 27-30.
[5] Collins N, (2001), Algorithmic composition methods for breakbeat science, Proceedings of Music Without Walls.
[6] Collins N, (2007), Musical robots and listening machines, Cambridge Companion to Electronic Music, pp. 171-184.
[7] Collins N, (2010), Contrary Motion: An oppositional interactive music system, NIME Conference, pp. 125-129.
[8] Collins N, (2011), LL: Listening and Learning in an Interactive Improvisation System, University of Sussex, unpublished.
[9] Danby E, Ng K, (2011), Virtual Drum Accompanist: Interactive Multimedia System to Model Expression of Human Drummers, Conference on Distributed Multimedia Systems, vol. 17: pp. 110-113.
[10] Davies M, Brossier P, Plumbley M, (2005), Beat Tracking Towards Automatic Musical Accompaniment, Audio Engineering Society Convention, 118.
[11] Ellis D, (2007), Beat tracking by dynamic programming, Journal of New Music Research, 36(1): pp. 51-60.
[12] Goto M, Muraoka Y, (1994), A Beat Tracking System for Acoustic Signals of Music, ACM Multimedia 94 Proceedings, pp. 365-372.
[13] Goto M, Muraoka Y, (1999), Real-time beat tracking for drumless audio signals: Chord change detection for musical decisions, Speech Communication, vol. 27: pp. 311-355.
[14] Greenwood C, Greenwood J, O'Brien E, Selway P, Yorke T, (1997), "Paranoid Android", OK Computer, Parlophone.
[15] Larman C, Basili V, (2003), Iterative and Incremental Development: A Brief History, IEEE Computer Society, 36(6): pp. 47-56.
[16] Puckette M, Apel T, Zicarelli D, (1998), Real-time audio analysis tools for Pd and MSP, ICMC 98.
[17] Puckette M, Brown J, (1998), Accuracy of Frequency Estimates from the Phase Vocoder, IEEE Transactions on Speech and Audio Processing, vol. 6, no. 2: pp. 166-176.
[18] Ramirez R, Hazan A, (2005), Understanding Expressive Music Performance Using Genetic Algorithms, European Workshop on Evolutionary Music and Art, Berlin: Springer, pp. 508-516.
[19] Ramirez R, Hazan A, Maestre E, Pertusa A, Gomez E, Serra X, (2007), Performance Based Interpreter Identification in Saxophone Audio Recordings, IEEE Transactions on Circuits and Systems for Video Technology, 17(3): pp. 356-364.
[20] Ramirez R, Hazan A, Maestre E, Serra X, (2005), Understanding Expressive Transformations in Saxophone Jazz Performances, Journal of New Music Research, 34(4): pp. 319-330.
[21] Ramirez R, Hazan A, Maestre E, Serra X, (2006), A Data Mining Approach to Expressive Music Performance Modelling, Multimedia Data Mining and Knowledge Discovery, Berlin: Springer, pp. 362-380.
[22] Ramirez R, Hazan A, Maestre E, Serra X, (2008), A genetic rule-based model of expressive performance for jazz saxophone, Computer Music Journal, 32(1): pp. 38-50.
[23] Ramone D, Ramone T, (1976), "Blitzkrieg Bop", Ramones, Sire Records.
[24] Robertson A, Plumbley M, (2007), B-Keeper: A Beat-Tracker for Live Performance, NIME07, pp. 234-237.
[25] Sánchez R, Quintero G, (2009), "Hanuman", 11:11, Rubyworks.
[26] Schloss A, (1985), On the Automatic Transcription of Percussive Music - From Acoustic Signal to High-Level Analysis, Stanford University Ph.D. dissertation, Tech. Rep. STAN-M-27.
[27] Toiviainen P, (1998), An Interactive MIDI Accompanist, Computer Music Journal, 22(4): pp. 63-75.
[28] Weinberg G, Blosser B, Mallikarjuna T, Raman A, (2009), The creation of a multi-human, multi-robot interactive jam session, NIME Conference, pp. 70-73.
[29] Weinberg G, Driscoll S, Parry M, (2005), Musical Interactions with a Perceptual Robotic Percussionist, IEEE International Workshop on Robots and Human Interactive Communication, pp. 456-461.
[30] Weinberg G, Driscoll S, (2006), Human Interaction with an Anthropomorphic Percussionist, CHI 2006 Proceedings, pp. 1229-1232.
[31] Weinberg G, Driscoll S, (2006), Towards Robotic Musicianship, Computer Music Journal, 30(4): pp. 28-45.
[32] Weinberg G, Driscoll S, (2007), The Design of a Perceptual and Improvisational Robotic Marimba Player, IEEE International Conference on Robot & Human Interactive Communication, 15: pp. 769-774.
[33] Weinberg G, Godfrey M, Rae A, Rhoads J, (2007), A real-time genetic algorithm in human-robot musical improvisation, CMMR, pp. 351-359.
[34] Wright R, Torry C, (1973), "The Great Gig in the Sky", The Dark Side of the Moon, Harvest/Capitol.
[35] Zhe J, Wang Y, (2008), Complexity-Scalable Beat Detection with MP3 Audio Bitstreams, Computer Music Journal, 32(1): pp. 71-8.
Appendix A: Personal Reflection
Many unforeseen roadblocks were encountered during the course of this project. The primary
impeding factor was the limitations associated with Simulink with regard to the MATLAB programming
language. Initial testing and design was done in the standard MATLAB environment and it was somewhat
surprising to see that many of the techniques already implemented had to be rewritten or adjusted to
increase compatibility with Simulink. Looking back, MATLAB was probably not the best choice of software
for a project which relied heavily on timing as it is not always accurate in that regard. There were a variety
of methods for implementing MIDI output and processing for the standard MATLAB package, so when the
decision to use MATLAB was made, I had assumed that everything would work out. So much time was
spent adjusting everything for use in Simulink that it detracted from, and in some cases prevented, work
on important timing and audio output issues.
On another note, the DMS conference (http://www.ksi.edu/seke/dms11.html) provided a great
opportunity to see what others in the field had been working on, especially with regard to computing in
music. The conference deadline pushed many things forward and greatly helped with the writing of this
final report. Without the extra pressure from submission and presentation deadlines from the conference,
the system architecture and some of the methods would not have been developed as early and may have
been rushed at the last minute.
Appendix B: Interim Report
Appendix C: Operation Manual
Setup:
The aubio library (http://aubio.org/download) and JACK (jackaudio.org) need to be downloaded
and installed on a Linux machine. A computer microphone should be connected to the microphone input
on this computer.
Another computer must have an up to date version of MATLAB with Simulink. A license for the
Signal Processing Blockset within Simulink is also required. A male-male 1/8" stereo headphone/speaker
cable must be connected to the microphone input on this computer with the other end plugged into the
headphone/speaker output of the first machine (the one running aubio and JACK).
Download the compressed folder containing all of the necessary MATLAB code.
Set up software:
On the aubio machine, start up the JACK software and click Start. Open up a command prompt
and enter:
aubiotrack -j -t .5
where '-j' tells aubio to use JACK and '-t .5' sets the input threshold amplitude to 0.5. This value can be raised
or lowered depending on the user's playing style.
On the JACK interface, click the 'Connect' button. Open the 'system' and 'aubio' drop down
menus. Drag out_1 to playback_1, and capture_1 to playback_2 and in_1. The image below demonstrates
how this will look.
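The same routing can also be made from the command line using JACK's jack_connect utility, which some users may find quicker than dragging ports in the GUI. This is only a sketch: the client and port names below ('aubio:out_1', 'aubio:in_1', 'system:capture_1', 'system:playback_1/2') are assumed to match the labels shown in the JACK 'Connect' drop-down menus and may differ on other installations.

```shell
# Monitor aubio's output on the left playback channel
jack_connect aubio:out_1 system:playback_1

# Send the microphone capture to the right playback channel
# (carried to the MATLAB machine over the stereo cable)
jack_connect system:capture_1 system:playback_2

# Also feed the microphone capture into aubio's beat-tracker input
jack_connect system:capture_1 aubio:in_1

# List all ports and their current connections to confirm the routing
jack_lsp -c
```

jack_lsp prints each port with its connections indented beneath it, which makes it easy to confirm that the routing matches the image above before starting the Simulink model.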
Open MATLAB and map to the drive which contains drummer.mdl as well as all the other code
from the compressed folder. Open drummer.mdl.
Running:
Simply click the run button on the drummer.mdl model window, tap out 8 counts to properly align
the beat tracker, and begin playing right on count 9.
Appendix D: Resources Used
Aubio - http://aubio.org/download
Real time beat tracking software
JACK - jackaudio.org
Audio interfacing/routing between applications
LabROSA Cover Song ID - http://labrosa.ee.columbia.edu/projects/coversongs/
Beat detection and onset detection
MATLAB/Simulink
Programming environment, user interface
Audacity - http://www.download-audacity.com/
Recording and evaluation for aubio processes