Abstract/Summary
This report discusses the design and implementation of a real time, computer generated drum
accompanist. The report starts off describing the motivation and aim of the project before discussing a
selection of related work. This background work focuses on the concepts of tracking the timing of musical
signals, deriving information from them, and creating a musical accompaniment.
From there, the report gives an overview of the entire system then proceeds to go into detail on
each of the components. Justification for each component and the methods used to create them are
discussed alongside these details, as well as how each component contributes to the overall performance
of system. The final portion of the report discusses the outcome of the project and how well it performed
the designated task of providing a drum accompaniment to the user.
The system described here makes use of machine listening techniques to analyze an incoming
musical audio signal and determine various musical properties such as dynamics, tempo, and style. The
system will then generate an expressive drum beat to accompany the audio signal in real time. These
drum beats will be created with a set of genetic algorithms which are intended to model basic human
creativity in a musical sense. The system is intended as a practice tool as well as a means to observe the
musical interaction that may occur between humans and computers.
Chapter 1: Introduction ...................................................................................................................................1
1.1 Motivation..............................................................................................................................................1
1.2 Overview ................................................................................................................................................2
1.2.1 Aim ..................................................................................................................................................2
1.2.2 Objectives........................................................................................................................................2
1.2.3 Requirements ..................................................................................................................................3
1.2.4 Enhancements ................................................................................................................................5
1.2.5 Deliverables .....................................................................................................................................6
1.3 Methodology ..........................................................................................................................................6
1.3.1 Overview .........................................................................................................................................6
1.3.2 Development Stages .......................................................................................................................7
1.3.3 Schedule ..........................................................................................................................................8
Chapter 2: Related Background .................................................................................................................... 11
2.1 Audio Analysis and Onset Detection ................................................................................................... 11
2.2 Beat Tracking....................................................................................................................................... 12
2.3 Computer Generated Music ............................................................................................................... 14
Chapter 3: Design and Development ............................................................................................................ 17
3.1 Key Concepts ....................................................................................................................................... 17
3.2 Overall Architecture ............................................................................................................................ 17
3.3 General Approach ............................................................................................................................... 18
3.4 Modelling a Drum Kit .......................................................................................................................... 19
3.5 Input .................................................................................................................................................... 20
3.5 Audio Signal Analysis........................................................................................................................... 21
3.5.1 Onset Detection ........................................................................................................................... 21
3.5.2 Beat Tracking................................................................................................................................ 22
3.5.3 Volume ......................................................................................................................................... 23
3.5.4 Sustain .......................................................................................................................................... 24
3.5.5 Complexity ................................................................................................................................... 27
3.6 Drum Beat Generation ........................................................................................................................ 28
3.6.1 Scoring .......................................................................................................................................... 29
3.6.2 Evolution ...................................................................................................................................... 30
3.6.3 Introducing Cymbals and Toms .................................................................................................... 31
3.7 Drum Beat Enhancement .................................................................................................................... 32
3.7.1 Probability Values ........................................................................................................................ 33
3.7.2 Volume Values ............................................................................................................................. 35
3.7.3 Tone and Position Values ............................................................................................................. 35
3.8 Output ................................................................................................................................................. 36
3.9 Simulink Model ................................................................................................................................... 36
Chapter 4: Results ......................................................................................................................................... 38
Chapter 5: Evaluation .................................................................................................................................... 40
5.1 Overview ............................................................................................................................................. 40
5.2 Quantitative Evaluation ...................................................................................................................... 40
5.2.1 Beat Tracking ............................................................................................................... 40
5.2.1.1 Paranoid Android ...................................................................................................................... 41
5.2.1.2 Blitzkrieg Bop ........................................................................................................... 43
5.2.2 Latency ......................................................................................................................................... 44
5.3 Qualitative Evaluation ......................................................................................................................... 46
Chapter 6: Conclusions and Further Work .................................................................................................... 48
6.1 Conclusions ......................................................................................................................................... 48
6.2 Further Work ....................................................................................................................................... 49
References: ................................................................................................................................................... 50
Appendix A - Personal Reflection
Appendix B - Interim Report
Appendix C - Operation Manual
Appendix D - Resources Used
Chapter 1: Introduction
1.1 Motivation
Many musicians enjoy performing and practising music with others; however, it is not always possible to assemble a group of like-minded musicians for this purpose. A software-based solution that provides automatic accompaniment is a good way to practise performing with others, as well as to gauge how a piece of music might sound with additional instruments. For those looking to play their music with a drummer, drum programming software is readily available to anyone with an internet connection. However, most of these programs only allow the programming of simple looping drumbeats and can take time to perfect. Those who are not familiar with drumming may also have a hard time programming a drumbeat that sounds good and fits the music being played. These drum machines also do not typically respond to the musician's playing, which forces the musician to follow whatever has been programmed. This is particularly bad for practising, as the player is not able to play as expressively as they would with a human drummer. Various instrument-playing robots have also been created, both programmable and improvisational. Unfortunately, most musicians cannot afford to purchase robots to use as band mates.
What’s needed is a system which allows a musician to perform material on a live instrument and
hear back a drumbeat which can go along with it. This could be useful as a practice tool and as a
compositional aid for those who may not have immediate access to a human drummer.
The system described in this report is a step towards providing a practice tool for musicians who may not be able to rehearse with others, by allowing them to play their instrument live and have a supporting drum beat provided to them. Many beginning musicians practise by playing along with their favourite songs; however, this method does not allow one to take a leading role in the performance. The finished program can be a useful tool for soloists to improve their group performance in an interactive way, without relying on a metronome.
The program may also be useful for those trying to write music as they could get an idea of how a
given guitar or piano part would sound with drums supporting it. It will also be interesting to see how
human musicians interact with a computer simulated drummer and how it may influence their
performance.
This project is relevant to the field of Artificial Intelligence because it is fundamentally a computerised model of human creativity as it relates to drum performance. The system is designed to exhibit the basic musical intelligence of a drummer in such a way that it can react appropriately to different musical cues and properties.
1.2 Overview
1.2.1 Aim
The overall aim of this project is to design and implement a program which uses machine listening
and learning to analyse audio produced by a human musician in real time. By analysing beat patterns and
rhythms, the program should provide feedback in the form of an accompanying percussion part. The
system will receive a musical audio signal as it is played, determine the tempo and beat pattern of the
signal, then output a percussion accompaniment to the musical signal in real-time.
First, a beat detection feature needs to be implemented. This feature must derive the beat
structure from an incoming audio waveform by detecting onset events within the wave. These onsets are
defined by sudden energy increases within the waveform and typically coincide with a note or chord being
played on the instrument that produced the waveform.
Once the beat structure has been determined, a fitting drum part will be generated in a predictive
manner so that it will be played along with the human musician in real time. This drum part will be
modelled on the components of a standard rock/jazz drum kit consisting of at least a bass drum, snare
drum, and hi hats.
Another aim is to provide a practice tool for musicians who may not be able to rehearse with other
musicians. It will also be interesting to see how human musicians interact with a computer simulated
drummer and how it may influence their performance.
1.2.2 Objectives
The program can be divided into five primary components:
Audio Input/Output Handling
Signal Analysis
Beat Tracker
Drumbeat Generator
Artificial Expression Simulator
There are many ways to design an audio input/output system, so it is important to employ one which is efficient and effective. In order to handle streaming input, a buffer will be implemented so that small samples of the incoming audio can be analysed. The output will be handled in a similar way, most likely by playing out full measures of drumbeats at a time, though this may be subject to change.
The signal analysis component converts the incoming musical waveform into a form that the rest of the program can process. This step is most vital for the beat tracking aspect. Further analysis will
attempt to detect other musical features that may be relevant to the style of the music.
The beat tracker will determine the tempo of a piece of music based on a consistent pattern of
onset moments occurring in the input signal. Because most humans do not possess perfect timing, the beat
tracker will be able to adapt to slight changes in tempo without skipping notes or going out of time. It must
also be responsive to expressive playing where onsets will not always be consistent and straightforward.
The drumbeat generator makes use of a probability template to generate a simple but varying drumbeat. This primarily provides a foundation on which the artificial expression simulator can create interesting and relevant beats to complement the musician's playing. This aspect will hopefully make further use of the input signal analysis to determine the dynamics and overall style of the music being played.
1.2.3 Requirements
Key requirements of the project include the development of a robust and accurate beat tracking
algorithm capable of tracking songs with non-constant tempos. It requires a suitable onset detection
method from which the beat tracking algorithm can derive a tempo. The beat tracking process needs to
work in real time with a constant input therefore it must be able to accurately predict approaching beat
locations.
A suitable drum beat template must also be created with a strong reliance on the derived tempo.
This template will allow for the creation of basic and relatively complex beats while remaining flexible to
changes in tempo. The template does not define the drum beat itself but rather provides an empty structure which allows particular drum hits to be specified. This will prevent the generated drum beat from playing out of time.
Artificial expression is another key aspect of the system. In order to avoid repetitive and emotionless drum beats, an algorithmic approach to mimicking human creativity will be implemented. This will allow the drum beat to respond to a musician’s dynamic level and style as they change throughout a song. With respect to the basic drum beat, extra hits and drum fills should be included but limited so they are not overdone; this will lend a more human feel to the beat. The minimum requirements are as follows:
1. Implement an efficient input/output system
The program will require a buffer system to analyse small samples of an audio signal as it is
produced by a human musician. A circular buffer will most likely be implemented for this purpose. The
output must also not interfere with the input signal as this would cause the program to be influenced by
its own output rather than the human musician.
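As a rough illustration of such a buffer (a Python sketch for clarity; the project itself was built in MATLAB/Simulink, and all names here are hypothetical), a circular buffer overwrites its oldest samples as new audio arrives, so the most recent window is always available for analysis:

```python
import numpy as np

class CircularBuffer:
    """Fixed-size ring buffer holding the most recent audio samples."""

    def __init__(self, size):
        self.buffer = np.zeros(size)
        self.size = size
        self.write_pos = 0

    def push(self, samples):
        """Write incoming samples, wrapping around when the end is reached."""
        for s in samples:
            self.buffer[self.write_pos] = s
            self.write_pos = (self.write_pos + 1) % self.size

    def latest(self, n):
        """Return the n most recently written samples in chronological order."""
        idx = (self.write_pos - n) % self.size
        if idx + n <= self.size:
            return self.buffer[idx:idx + n].copy()
        first = self.buffer[idx:]
        return np.concatenate([first, self.buffer[:n - (self.size - idx)]])
```

In a streaming setting, the analysis stage would call `latest` once per processing tick, so older audio is discarded automatically rather than accumulating.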
2. Analyse beat patterns within the audio signal
Extracting the tempo from input audio is essential to creating a drumbeat which will stay in time
with the user's playing. This can be accomplished by performing an onset analysis on the waveform and
by finding patterns in the peaks that occur. Any regular interval occurring between peaks (even if other
peaks are present during the interval) can produce the tempo. [7]
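A minimal sketch of this interval-based idea, in illustrative Python (not the project's actual code; the quantisation resolution is an assumed parameter): given a list of onset times, a histogram of inter-onset intervals reveals the dominant interval, from which a tempo in beats per minute follows.

```python
from collections import Counter

def estimate_tempo(onset_times, resolution=0.01):
    """Estimate tempo (BPM) from the most common inter-onset interval.

    Intervals are quantised to `resolution` seconds so that near-identical
    intervals vote for the same histogram bin.
    """
    intervals = [
        round((b - a) / resolution) * resolution
        for a, b in zip(onset_times, onset_times[1:])
    ]
    dominant = Counter(intervals).most_common(1)[0][0]
    return 60.0 / dominant

# Onsets exactly half a second apart correspond to 120 BPM.
```

A more robust version would also consider intervals between non-adjacent onsets, as the requirement above notes that the defining interval may span intervening peaks.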
3. Perform in Real Time
In order to provide a true accompaniment, the program must be able to analyse the audio signal,
produce a beat, and play it back in time with the human musician. The program will have to predict what
the user will play and when, then play back the drumbeat to go along with what was predicted.
4. Implement a genetic algorithm to produce potential beat candidates.
The genetic algorithm will focus on generating patterns for each instrument of the drum kit to
produce a whole beat. Beats are scored according to criteria relating to each instrument which will
influence its overall score. The higher scoring beats will move on to the next generation, while beats
meeting certain scoring requirements will be randomly selected for crossover and mutations. The lowest
scoring beats will be scrapped and replaced with new randomly generated beats in order to provide new
possible beats. This algorithm will require modification as development progresses.
A potential beat created by this genetic algorithm could be visualised in the following way, with 1 indicating a hit and 0 representing a rest. The beat in Figure 1 is for a single 4/4 measure split into 16th notes; it can be programmed as a simple integer matrix in most programming languages:
Figure 1. Drum Beat Matrix Representation (adapted from [5])
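As a concrete example of this representation (an illustrative pattern, not a reproduction of the original figure), one measure for a three-instrument kit might look like:

```python
# One 4/4 measure at 16th-note resolution: 1 = hit, 0 = rest.
# Rows: hi hat, snare, kick (an illustrative rock pattern).
beat = [
    [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0],  # hi hat on 8th notes
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],  # snare on beats 2 and 4
    [1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0],  # kick on beats 1 and 3
]
```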
5. Simulate a human drummer with feedback based on the previous analysis of tempo and patterns.
Using all the previous minimum requirements, the final drumbeat will attempt to emulate a
human drummer in the sense that it will adapt to change and provide suitable, non-repetitive drumbeats
in real time. The minimum requirement for this virtual drummer's kit is a bass drum, snare drum, and hi
hat.
1.2.4 Enhancements
Most rock and jazz drummers possess more than just a snare, kick, and hi hat. Including these
additional components would allow for more expressive drumbeats to be created. Each addition will
require a new scoring system for the genetic algorithm, as well as the introduction of new rules in how
each component can interact with the others. Acceptable drumbeats should be playable by a competent human drummer possessing two arms and two legs.
Extracting musical features and information from the input can also improve the system. Further
waveform analysis can result in information which will affect various characteristics of the generated
drumbeat. Analysis may include changes in amplitude, the rate of onsets, and the rate of decay of any
peaks as these can all help influence stylistic decisions which are made during drumbeat creation. [7]
Because most drummers are not simply drum machines, it is important to include a degree of variability within a set beat. This allows a single beat to be repeated a few times with enough variation to create a more natural effect. Instead of using 1s and 0s to indicate a hit or rest, values between 0 and 1 can be used to give the probability of a hit occurring at that moment in time. Hits which help define the beat can be weighted so that they will always be 1 or close to it, while less essential hits can be given lower values so that they are not repeated on every iteration. [5]
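This probabilistic scheme can be sketched in a few lines (illustrative Python; the pattern values are made up for demonstration): each step stores the probability of a hit, and rendering a measure samples those probabilities, so defining hits always play while embellishments come and go.

```python
import random

def render(prob_pattern):
    """Sample a concrete measure from a pattern of hit probabilities."""
    return [1 if random.random() < p else 0 for p in prob_pattern]

# Hi hat example: the backbone 8th notes always play (p = 1.0); the off
# 16ths are occasional embellishments (p = 0.2), so repeats vary slightly.
hi_hat = [1.0, 0.2, 1.0, 0.2, 1.0, 0.2, 1.0, 0.2,
          1.0, 0.2, 1.0, 0.2, 1.0, 0.2, 1.0, 0.2]
measure = render(hi_hat)
```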
Introducing tone values for the cymbals and toms as well as a position value for the hi hats may
help to increase the realism of the virtual drum kit. This would create the effect of a drum kit possessing
multiple toms and cymbals without having to create additional instrument tracks within the drumbeat
template. With the hi hats, this value would represent the distance between the two cymbals. The tone
values would also range from 0 to 1, with lower values corresponding to a lower tone/hi hat distance, and higher values indicating a higher tone/hi hat distance. [5] Allowing for volume adjustments would also let the program create accents and ghost notes, again giving the beat a more human feel.
Additional features such as capacity for triplets, drum fills, and swing beats would also greatly
increase the realism and musical capabilities of the virtual drum accompanist. Many drummers use drum
fills to signify transitions, create tension, and to bridge a section of music where not much else is going on.
A drum fill in this system would temporarily override the current beat for an appropriate duration so it
does not sound like a second drummer has suddenly joined in and dropped out. The fills would need to be
regulated so they do not occur too often or at inappropriate moments. This feature would greatly increase
the program's versatility if correctly implemented.
In an effort to make the program more user friendly, a simple user interface containing all the
functions of the program could be implemented. This would allow any user to simply install the program
and begin using it in a fully live situation without having to compile code or perform any unnecessary
setup.
1.2.5 Deliverables
The deliverables of this project will include a final report and a program which can take a musical
input and output a drumbeat to accompany it in real time.
1.3 Methodology
1.3.1 Overview
Iterative/incremental development [4] allows a developer to gradually incorporate individual features into a program in such a way that each addition results in a fully functional version of the program.
Each addition follows a cycle of planning, design, implementation, testing, and evaluation. Every addition
also gives the developer a chance to reconfigure other existing design aspects within the program if
necessary. This method allows for a great deal of flexibility during architecture construction as well as
early and easier bug detection. [4] [15]
Each working implementation should be thoroughly tested and analysed before work on the next
version begins. Individual features should also be clearly separable in order to accommodate modification.
This process may call for a redesign of the system architecture should the need arise. [4] [15]
A project such as this can benefit greatly from this method of development. Due to the reliance on
genetic algorithms and probability based functions, it is difficult to predict exactly how the program will
react to any given feature implementation. Utilising an iterative/incremental development method, each
addition of a feature can be tested and optimised until the desired result is achieved. This development
method also states that features should be easily separable and well organised, which may allow the developer to disable a given feature at any point in the development cycle in order to evaluate its usefulness and efficiency. Sub-features may interact with each other but will be designed to be independent of one another.
The project will follow the iterative/incremental [4] [15] method by first carrying out the initial
planning phase which involves heavy research and a basic architecture design. A few starting features will
also be considered during this stage in such a way that the next stage will produce workable results. The
next step will be to create the basic beat detection and drumbeat generation structures. These two
features make up the foundation of the project and will serve as a starting point for all additional features
to be implemented. A series of optimising features will then be gradually incorporated into the program.
1.3.2 Development Stages
The development stages of the project are shown below:
Stage 1
Initial Planning
Research
Broad Architecture Proposals
Stage 2
Beat Detection:
Implement as a subsection of main function.
Test to ensure detection is accurate
Drumbeat Generation:
Create basic drumbeat template using genetic algorithm (kick, snare, hi hat)
Do not take input waveform into consideration, create template independent of waveform.
Check to see that top beats are acceptable
Beat generator will only respond to pre-recorded tracks at this point
Stage 3
Beat Detector/Generator Syncing
Link the two features together so that the generated drumbeat will be displayed in time with the song playback
Latency analysis will start here for reference
Stage 4
Incorporate basic wave feature analysis to augment beat generator (will use features to determine stylistic changes to beat). Will have a few different segments (each addition with latency analysis):
Dynamics:
Volume parameters will be added to the beat template; they are determined by relative local amplitude.
Sustain:
Observes peak trails to determine whether a legato (smooth) or staccato (detached) beat should be used
Hi hat openness parameter introduced.
Accents and ghost notes:
Adjust volume parameters of individual hits to allow for more dynamic drum beats.
Stage 5
Real time implementation for beat generator/detection system.
May occur during the feature implementations of stage 4.
Stage 6
Hit Probabilities
Non-essential drum hits can have probability values assigned to them to create the effect of a more varied drumbeat while still retaining its overall feel.
Additional tracks
Cymbals and toms added to structure
Stage 7
User testing
Optimisation and Deployment
1.3.3 Schedule
The tentative schedule as of the writing of the interim report is shown below. This schedule indicated that many of the styling features associated with the drum beat output would be programmed just after the submission of the interim report, at the same time as the conversion to real time. Before this point the system only accepted pre-recorded audio tracks. This plan, shown in Figure 2, allowed for plenty of time at the end for report writing, evaluation, and optimisation.
Figure 2. Initial Project Schedule
The chart below (Figure 3) shows how the process actually unfolded.
Figure 3. Final/Actual Schedule
Converting the system to real time proved to be much more difficult than anticipated. The beat tracking software used with pre-recorded tracks [LabROSA] was not designed for real time operation, so I opted to convert it for real time use. Unfortunately, when attempting real time processing in MATLAB using Simulink, many of the commands and techniques commonly used in MATLAB are not available. This required an extensive period in which every line of the original code that threw errors was substituted with alternative, less efficient blocks of code in order to circumvent the limitations of Simulink. Nearing the end of this process, test runs demonstrated that a significant amount of latency was occurring and that the clock timer built into Simulink would also slow down. These factors led to the decision to start over using the aubio [10] beat tracking system which, while not as accurate, was fully ready to operate in real time. Following a series of licensing, installation, and driver issues, which were resolved thanks to help from the School of Computing support staff, the aubio system was able to bring the project up to speed in terms of real time execution.
This period resulted in other aspects of the project being pushed back, most notably the drum
styling features which are considered vital to the emulation of a human drummer. The various obstacles
which occurred during this time led to an early start on the final report. The limitations associated with
Simulink also prevented the desired method of audio output, utilising a MIDI based drum kit to
accompany the input signal.
Chapter 2: Related Background
This section gives an overview of previous work which relates to this project. Various audio
analysis techniques are discussed first, followed by different approaches to the problem of beat tracking.
The final sub-section looks at a few papers related to the artificial emulation of human creativity.
2.1 Audio Analysis and Onset Detection
Onset detection is a key aspect of the overall beat tracking algorithm. Bello et al. [1] discuss various
approaches involving differences in energy and phase within the waveform. Each method has a potential
application that is largely based on the type of input that will be most commonly received. One method is
to observe the spectral features of the waveform by performing a Fourier transform on the wave [2]. The
features that may be derived from a waveform filtered this way are useful in detecting onsets amidst
relatively noisy and layered inputs.
Temporal features, which relate to a wave's amplitude, may also be taken into consideration. A
valid onset typically occurs during a sudden increase in the waveform's amplitude [26]. It is described in [1]
how rectifying and smoothing the signal can help to accentuate these onset features. It is also argued that
the wave should be filtered so that high amplitudes which are not part of a sudden rise will be lowered.
This makes it easier to identify where the onsets actually occur. It is stated that analysis of the temporal features is a fast and efficient method of onset detection and is particularly useful when the audio signal is being produced by a single, accented instrument such as a guitar or piano.
Ellis [11] has worked to develop a beat tracking program which utilises MATLAB. This program
analyses a waveform and generates an audio file which produces clicks corresponding to the detected
beat. Ellis discusses using an onset strength envelope which is essentially a filtered representation of the
original waveform. The onset strength envelope used in [11] locates sudden energy increases within the
waveform and represents them as individual spikes in the processed waveform. Higher spikes tend to
represent valid onsets. The filtering process involves performing a short-time Fourier transform similar to
the spectral feature analysis approach described in [1]. The signal is also passed through a high pass filter
and is then convolved with a Gaussian envelope [11]. This seems to be an effective approach though it
would need to be modified for real time applications.
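A greatly simplified version of this kind of envelope can be sketched as follows (illustrative Python with NumPy, not Ellis's implementation; the frame and smoothing parameters are arbitrary): the positive frame-to-frame increases in spectral magnitude are summed, then smoothed with a small Gaussian window.

```python
import numpy as np

def onset_strength(signal, frame_len=512, hop=128, sigma=1.0):
    """Crude onset strength envelope: half-wave rectified spectral flux,
    smoothed by convolution with a Gaussian window."""
    frames = [signal[i:i + frame_len] * np.hanning(frame_len)
              for i in range(0, len(signal) - frame_len, hop)]
    mags = np.array([np.abs(np.fft.rfft(f)) for f in frames])
    # Positive differences between successive spectral frames (spectral flux);
    # decreases are clipped to zero so only energy rises register.
    flux = np.maximum(np.diff(mags, axis=0), 0.0).sum(axis=1)
    # Smooth with a small Gaussian envelope.
    t = np.arange(-4, 5)
    gauss = np.exp(-0.5 * (t / sigma) ** 2)
    gauss /= gauss.sum()
    return np.convolve(flux, gauss, mode="same")
```

Spikes in the resulting envelope then mark candidate onsets; Ellis's system additionally applies high-pass filtering and further post-processing omitted here.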
Goto and Muraoka [13] attempt to recognise chord changes in a piece of music in order to detect
onsets for the purpose of beat tracking. By focusing on the lower end of the frequency spectrum, the
onset events are likely to occur on chord changes rather than on potentially complex melodies which are
typically of a higher frequency. While this method is relevant due to its execution in real time, it may not
be robust enough to accommodate pieces of music which do not feature clear and discernible chord
changes.
Tools such as Max/MSP and Pd can also be used for various aspects of audio analysis, as described
in [16]. The fiddle system outlined in [16] attempts to determine the pitch of an incoming audio stream
using sinusoidal decomposition. A rectangular-window discrete Fourier transform is used to obtain the
peaks in the audio along with their corresponding frequencies and amplitudes. From there, the
fundamental frequency of the input is estimated using a likelihood function in which each individual peak
is matched to the nearest frequency corresponding to a musical note. This works for single note inputs as
well as inputs consisting of more than one note and will display the note name(s) to the user. If an input is
not near enough to any of these fundamental frequencies then the input is determined to not have any
pitch at that moment in time.
Another system described in [16] is the bonk application. This system is used to detect the onsets
of percussive hits which are not pitched and therefore not susceptible to sinusoidal decomposition
analysis [17]. The system uses spectral analysis to detect percussive onsets rather than looking for sudden,
sharp increases in amplitude in order to avoid onsets being masked by loudly ringing sounds. This analysis
is further used to help identify which instrument produced the onset; this is done by comparing the
analysis with pre-stored and identified spectral templates. These systems, most notably the bonk program,
may offer interesting features to future implementations of the virtual drum accompanist system.
2.2 Beat Tracking
The concept of beat tracking has been explored in many different ways. Essentially, beat tracking
is the process of deriving the tempo and beat pattern of a piece of music in the same way that a human
may tap their foot in time with the music. A few approaches to this problem will be discussed in this
section.
Goto and Muraoka have developed a series of beat trackers, one of which derives the beat based
on audible percussive hits [12]. This beat tracker is able to function in real time but relies on a steady
drum beat to be present, whereas in this project the drum beat is to be responding to the tempo, not
leading it. Another one of Goto and Muraoka's beat trackers [13] detects chord changes in order to
determine patterns, specifically root note changes which occur between 10 Hz and 1 kHz.
The beat tracker used by Zhe and Wang [35] detects measures by extrapolating from evenly
spaced, pronounced downbeats. Subdivisions are then calculated to fill out each measure. This system
assumes a basic pop song in 4/4 time is being played at a constant tempo. Songs which do not feature a
pronounced downbeat may confuse the system, therefore limiting its ability to detect tempos in songs
that do not follow pop conventions. This system is not likely to have any relevance to this project.
The beat tracking system described by Ellis [11] utilises Matlab to analyse a waveform and
generate an audio file containing clicks at the detected beat locations. As described in section 2.1, Ellis
derives an onset strength envelope from the waveform, using filtering equations based on those of Bello
et al. [1] to represent sudden energy increases as individual spikes. By finding a pattern of recurring and equidistant peaks in
the onset envelope, a general tempo can be easily derived. The tempo calculation is weighted to prevent
extremely distant and near peaks from forming improbable tempos. The derived tempo is biased towards
120 beats per minute, a design decision which acts as a probable middle ground for human created music.
Extremely fast and slow tempos, while still possibly in time, would not represent the common human
interpretation of the beat. This system is open source and easy to set up, though it is designed to work
only with pre-recorded audio files.
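The tempo derivation described above can be illustrated with a short sketch. This is a hedged, pure-Python illustration rather than Ellis's actual Matlab code; the function name `estimate_tempo`, the 40-240 BPM search range, and the log-Gaussian `spread` parameter are all assumptions made for the example:

```python
# Illustrative Ellis-style tempo estimation: autocorrelate an onset strength
# envelope and weight candidate tempos towards a preferred 120 BPM.
import math

def estimate_tempo(envelope, frame_rate, bias_bpm=120.0, spread=0.9):
    """Return the tempo (BPM) whose lag maximises the weighted autocorrelation."""
    n = len(envelope)
    best_bpm, best_score = None, -1.0
    # Consider lags corresponding to tempos between 40 and 240 BPM.
    min_lag = int(frame_rate * 60.0 / 240.0)
    max_lag = int(frame_rate * 60.0 / 40.0)
    for lag in range(max(1, min_lag), min(n, max_lag + 1)):
        acf = sum(envelope[i] * envelope[i - lag] for i in range(lag, n))
        bpm = 60.0 * frame_rate / lag
        # Log-Gaussian weighting biased towards the preferred tempo, so that
        # extremely fast or slow interpretations of the beat are penalised.
        weight = math.exp(-0.5 * (math.log2(bpm / bias_bpm) / spread) ** 2)
        score = acf * weight
        if score > best_score:
            best_score, best_bpm = score, bpm
    return best_bpm
```

An envelope with a spike every half second at a 100 Hz frame rate, for instance, yields 120 BPM, since the half-beat and double-beat lags are both down-weighted relative to the preferred tempo.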
Collins [7] proposes a multi-agent beat tracking algorithm which follows a human musician playing
a MIDI keyboard in real time. This method uses a number of agents which predict where the next beat
should occur. Each agent, a hypothesis of beat locations, has a score and a weight to determine its
accuracy. An agent's score is increased for making correct beat predictions which coincide with a human
musician playing a note. Low scoring agents are erased and new ones are constantly created to allow for
the tracking of dynamic tempos. Each agent contains a set of values describing that particular agent's
current score, weight, and beat estimations. Whenever an onset occurs, each agent is checked to see how
well it predicted the onset occurring at that particular time. Poorly performing agents are eliminated and
new ones are generated based on the onset time. Scoring is weighted depending on the amount of time
since the last onset in order to prevent subdivision of a beat from counting as an actual beat. Essentially,
this weighting prevents overly fast tempos from being derived. Collins' onset detection method is
dependent on MIDI onsets rather than a waveform so it does not utilise any onset detection methods that
would be required of a non-MIDI audio signal.
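The agent lifecycle described above can be sketched as follows. This is an illustrative simplification rather than Collins's implementation; the `Agent` class, the scoring increments, the tolerance and culling thresholds, and the naive 0.5 s period for newly spawned agents are all assumptions:

```python
# Sketch of a multi-agent beat tracker: each agent is a (period, phase)
# hypothesis with a score; onsets reward agents that predicted them.
class Agent:
    def __init__(self, period, phase):
        self.period = period   # seconds between predicted beats
        self.phase = phase     # time of a reference beat
        self.score = 0.0

    def error(self, onset_time):
        """Distance from the onset to this agent's nearest predicted beat."""
        offset = (onset_time - self.phase) % self.period
        return min(offset, self.period - offset)

def process_onset(agents, onset_time, tol=0.05, cull=-3.0):
    for a in agents:
        # Reward agents whose predicted beat coincides with the onset.
        a.score += 1.0 if a.error(onset_time) <= tol else -0.5
    survivors = [a for a in agents if a.score > cull]
    # Spawn a fresh hypothesis anchored at this onset, so that dynamic
    # tempo changes can still be tracked (period here is a naive guess).
    survivors.append(Agent(period=0.5, phase=onset_time))
    return survivors

def best_period(agents):
    return max(agents, key=lambda a: a.score).period
```

Feeding onsets at a steady half-second spacing rewards the agent hypothesising a 0.5 s period, while a mismatched hypothesis accumulates penalties and is eventually culled.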
The aubio real-time audio library [10] provides real time beat tracking for streaming audio input
captured by a microphone. The library uses real time onset detection methods described in [1]. The real
time aspect is possible due to the predictive nature of the system. The next predicted beat is determined
by the intervals between the previous few beats with a Gaussian weighting applied, meaning the most
recent beat intervals will have more influence. The predicted beat location is based on the location of the
beat immediately preceding it. This makes the tracker very adaptable to changes in tempo, though it may
be prone to error if a complex input rhythm is introduced. The primary advantage of this system is the
absence of noticeable lag during the execution of the beat tracking function.
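A minimal sketch of this predictive scheme follows. It assumes a simple recency-based Gaussian weighting of the inter-beat intervals; the exact weighting aubio applies is not reproduced here, and the function name and `sigma` parameter are illustrative:

```python
# Predict the next beat as the last beat plus a Gaussian-weighted average of
# the recent inter-beat intervals, so the newest intervals dominate.
import math

def predict_next_beat(beat_times, sigma=1.5):
    intervals = [b - a for a, b in zip(beat_times, beat_times[1:])]
    if not intervals:
        raise ValueError("need at least two beats to predict the next one")
    n = len(intervals)
    # Weight interval i by its recency (i = n - 1 is the newest interval).
    weights = [math.exp(-0.5 * ((n - 1 - i) / sigma) ** 2) for i in range(n)]
    period = sum(w * d for w, d in zip(weights, intervals)) / sum(weights)
    return beat_times[-1] + period
```

Because each prediction depends only on the few most recent intervals, the tracker adapts quickly when the musician speeds up or slows down, which matches the behaviour described above.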
Toiviainen [27] explains how adaptive oscillators can be used for beat tracking purposes. The even
motion of an oscillator can easily map to the pulse of a beat. The peaks during oscillation can correspond
to downbeats while the troughs can represent upbeats. In this way, reliable counting during a bar can be
modelled. Onsets occurring near the peak of an oscillation can help influence the oscillator's speed
depending on whether the peak occurs before or after the detected onset. This also ensures that the
onsets of complicated rhythms have less influence on the derived tempo, provided that a strong and
clearly defined downbeat is present. Toiviainen's adaptive oscillator also takes into account short term and
long term changes. If a sudden change in tempo occurs, the oscillator will increase speed in order to catch
up before settling back to the original tempo. The long term change tracker looks at how the tempo has
changed over time and adjusts the base speed of the oscillation to match. The adaptive and dynamic
nature of this system looks to be promising though it requires a MIDI input for its onset detection aspect.
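The short-term and long-term adaptation described above can be sketched as a simple phase oscillator. This is a hedged illustration, not Toiviainen's model; the class name, the correction gains `k_short` and `k_long`, and the half-period acceptance window are all assumptions:

```python
# Sketch of an adaptive oscillator for beat tracking: onsets falling just
# before or after the expected peak nudge the period (short-term), while a
# gentler adjustment tracks the base tempo (long-term).
class AdaptiveOscillator:
    def __init__(self, period, k_short=0.3, k_long=0.05):
        self.period = period      # current oscillation period (seconds)
        self.base = period        # slowly adapting base period
        self.next_peak = period   # time of the next expected peak
        self.k_short = k_short
        self.k_long = k_long

    def on_onset(self, t):
        error = t - self.next_peak  # > 0: onset came late, oscillator too fast
        if abs(error) < 0.5 * self.period:
            # Short-term correction pulls the period towards the onset...
            self.period += self.k_short * error
            # ...while the base tempo follows long-term drift more slowly.
            self.base += self.k_long * error
        self.next_peak = t + self.period
```

Feeding the oscillator onsets at a slightly slower spacing than its initial period makes it "catch up" over a few beats, converging on the new tempo in the manner described above.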
The B-Keeper system uses the kick drum from a real drum kit to determine the beat in a live
setting [24]. The program uses a microphone with a line into a Max/MSP program which performs the
onset detection and beat tracking portions. The beat tracking aspect hooks up to a system to play backing
tracks used in live performances without the need for the drummer to play to a click track in a pair of
headphones. This allows for great expressivity within the band by not constraining them to an unchanging
backing track.
2.3 Computer Generated Music
Collins [7] explains how his system creates a melodic accompaniment to a human musician in real
time by detecting the chords and notes being played and producing an opposing melody in order to
inspire the musician to try something new. The rhythm of the melody is also designed to accent beats
which the human musician is overlooking. While many interesting ideas are discussed, the virtual
drummer of this project is meant to provide a more supportive role rather than pushing new ideas.
Perhaps this idea could be incorporated as an option in a future version of the program for musicians who
may be seeking new sources of inspiration or just looking for a challenge.
Collins also touches on the creation of computer generated music in 'Algorithmic Composition
Methods for Breakbeat Science' [5]. Collins explains how a series of probability templates can be set up
which display the probability of a given instrument being activated at that particular moment in time. The
example (a copy of figure 1) below displays a template for a single measure of 4 beats, each divided into a
set of 16th notes. When the probability value is 1.0, the note is guaranteed to be played on every iteration
of the measure and a 0.0 indicates no chance of activation.
Figure 1. Drum Beat Matrix Representation (Adapted from [5])
This method allows for a non-repetitive drumbeat which will cause the generated part to sound
slightly more human. Additional values can be attached to each location which affect volume and pitch if
desired. The template will be set to synchronise with the beat pattern and will continuously update the
time interval between notes as tempo changes in the musician's playing occur. The exact beat locations
must be anticipated if the system is to keep up in real time [6].
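The probability-template idea can be sketched directly. The template values below are hypothetical, chosen only to illustrate guaranteed hits (1.0), impossible hits (0.0), and occasional extras; they are not taken from [5]:

```python
# One measure of 4 beats divided into 16th notes: each slot holds the
# probability that the instrument fires there on this pass of the measure.
import random

def realise(template, rng=random):
    """Roll the template once, returning 1 (hit) or 0 (rest) per 16th note."""
    return [1 if rng.random() < p else 0 for p in template]

# Hypothetical kick drum template: certain hits on beats 1 and 3,
# with occasional extra hits elsewhere in the bar.
kick_template = [1.0, 0.0, 0.0, 0.2,
                 0.0, 0.0, 0.3, 0.0,
                 1.0, 0.0, 0.0, 0.2,
                 0.0, 0.0, 0.3, 0.0]
```

Because the template is rolled afresh on every iteration of the measure, the part varies from bar to bar while the guaranteed hits keep the pulse, which is what makes the generated drumbeat sound slightly more human.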
Another method uses a set of pre-written beats of varying length; rather than each individual hit
having its own probability, each member of the set has a probability of being executed (the hits within it
being fixed at probabilities of 0 or 1). It is also mentioned that additional probability values could be
included that influence an effect or tone for a given hit (applicable only when p(hit) > 0). This concept
could be used for varying volume levels in such a way that enables
ghost notes and accents within a drumbeat. Having current hits influence the probability values of future
hits will also help to create a more interesting drum part by potentially reducing chance based repetition.
Collins is still continuing work on expanding various machine listening and learning techniques in
computer generated musical accompaniment [8].
Weinberg et al. [28] have created a situation in which they can study the interaction between
human and robotic musicians. Shimon is a four armed robot which plays the marimba and Haile uses two
appendages to play a hand drum. The robots are able to perform along with human musicians in real time,
with Haile providing percussion accompaniment [29][30][31] (with lead/support trading capabilities) and
Shimon providing Thelonious Monk inspired marimba parts. Physical robots are used to play the
instruments in order to give a sense of personality to the robotic performers, making musical interaction
with human musicians easier. Shimon is designed with a "head" which moves in time to the music and is
able to track and follow fellow performers. The ideas discussed are very relevant to this project, though
the emphasis on physical robotic performers is not adopted here as it greatly complicates the process. In
the future, however, the program could easily be modified to send musical instructions to a robot
designed to play a drum kit.
The robot, Haile, can also be adapted to play a xylophone or small marimba [33] [32]. Haile's
playing is determined by a genetic algorithm which uses melodic excerpts of a human pianist's playing as
its base population. These excerpts set the style for what the robot will play, essentially providing the
robot with a natural, human inspired starting point. The robot is able to freely improvise due to the
musical instructions it receives from the other instruments. Note densities heavily influence the robot's
switching between lead and support modes, creating an atmosphere similar to when human musicians
improvise together and take turns leading the song. The genetic algorithm used to determine what Haile
plays takes roughly 0.1 seconds to run which enables it to quickly respond to changes in playing if needed.
A notable difference between the Haile system and this project is that Haile relies on multiple samples of
human playing rather than coming up with its own parts. This is useful for selecting a specific style of
playing and may be useful in a future version of this program which enables a musician to select a
drumming style for the song he or she will be playing.
Ramirez, et al. [22] provide an approach rooted in the concept of genetic algorithms [18] and
machine learning in order to model human musician expressivity [21]. The authors make use of recordings
from a professional jazz saxophonist as a training set which their computer generated composition uses
as a basis for deriving creation rules [20]. Because these rules are rooted in a particular style, the
program will create a melody that is similar in style while still being relevant to the song it was created for. The
authors attempted to look at the problem from the point of view of a human musician and how they
would interpret the music being presented to them before improvising an accompaniment [19]. This
method, with its genre based training sets, may be useful for the creation of drumbeats that are intended
for a particular style of music, similar to the work of Weinberg et al. [33].
Chapter 3: Design and Development
3.1 Key Concepts
Onset Detection
In musical audio signal processing, an onset refers to the peak that occurs when a note on an
instrument is first played. The onset marks the very beginning of the note and should not appear in the
middle of a sustained note. Onset detection describes the process of locating these peaks within an
audio waveform, typically for the purpose of beat tracking.
Beat/Tempo Tracking
Beat tracking refers to the process of determining the location of the beat of a musical signal, a beat
being the pulse within the music that one would normally tap their foot to. The beat can be derived from
a set of onsets by finding a consistent pattern between the onsets. The tempo of a song can be derived
from these beat locations by looking at the common interval time between them.
Drum Beats
A drum beat is a musical pattern often performed on a drum kit to accompany a piece of music.
The drum beat is played in a way which complements the music it is accompanying and can help keep a
group of musicians in time with each other.
Modelling Human Expression
While human expression is quite an abstract concept, for the purposes of this report it refers to
the way a human drummer plays a drum beat with variations in volume and style as well as variations on
the drum beat itself.
3.2 Overall Architecture
The system follows a circular process in which it first listens to a section of a streaming input
which consists of a musician playing their instrument. The signal is then processed in order to determine
the beat as well as various musical features. This information is then used to determine the fitness of
potential drum beats created by a genetic algorithm. Once a winning drum beat has been chosen, it is
then output back to the user with additional expressive features to simulate a realistic drum beat. This is a
continuous process so that changes in tempo can be properly accounted for.
This architecture, shown in figure 4, allows the system to be responsive to changes in the
musician's playing, as well as having the potential to influence the musician to respond to the drum beat
which is created. This back and forth interaction is at the heart of many live performances between
human musicians; the system therefore attempts to emulate this process in order to provide a more
natural feel.
Figure 4. System Architecture Flow
3.3 General Approach
The audio signal analysis is what processes the incoming musical waveform into a form that the
rest of the program can recognize. This step is most vital for the beat tracking aspect. The beat tracker will
determine the tempo of the song based on a consistent pattern of onset moments occurring in the input
signal.
As this project is geared toward guitar and piano players, a sampling of both guitar and piano
performances will be used to test the beat tracker. Input is captured with a standard, inexpensive
computer microphone to ensure that lower quality signals will be effective. For validation, the
performances will feature sample songs of varying speeds, volume, and levels of complexity. Additional
samples will contain non-consistent levels of these features, such as a song that is sped up and slowed
down at varying rates. This will be necessary as few musicians are able to continuously play at a perfectly
constant tempo. It is also important as some songs are very dynamic in this regard and should be
accounted for.
Additional analysis will include translating musical concepts into a basic numerical form. These
concepts include dynamics, articulation, and complexity.
The drum beat creation aspect makes use of a probability template to generate a simple but
varying drum beat. This is primarily to provide a groundwork for which the artificial expression simulator
can create interesting and relevant beats to the musician's playing. This aspect will make further use of the
input signal analysis to determine the dynamics and overall style of the song being played.
Because drumming is an art without too many set rules, many of the stylistic decisions made
during drum beat creation are based on a few simple conventions which mostly relate to rock and jazz
drumming. A wide range of musicians, with backgrounds in styles of music which commonly feature a
drum kit, were polled for their opinion on some of the decisions made below. The results of this poll will be
discussed as each concept requiring a musical assumption is brought up.
3.4 Modelling a Drum Kit
In order to provide an accurate representation of a human drummer, a standard rock/jazz drum
kit will need to be emulated. The image on the following page (figure 5) shows a five piece drum kit
complete with hi hats and cymbals.
Figure 5. Drum Kit Components
The bass and snare drums typically make up the core of a drumbeat, with the bass drum
commonly being struck first during a drum beat (known as the down beat) and the snare occurring in
between on the backbeats. Depending on how hard the snare is hit, it can provide loud accents or softer
filler notes known as ghost notes. The hi hats are used primarily to keep time, but when another drum is
being used for time keeping the hi hats may be hit to provide accents. Cymbals will often provide loud
accents to a drum beat but can also be used to keep time if desired. Toms range in sound and tone
depending on size. They are used for accents and drum fills, and occasionally lower toms will be used to
keep time.
3.5 Input
The system receives input from a basic computer microphone picking up the signal of an acoustic
instrument. The microphone is connected to a Linux machine running aubio [10]. aubio is an open source
package with real time beat tracking and onset detection capabilities. The input signal is analyzed and the
predicted beats are output as an audio signal in the form of clicks. aubio makes use of JACK
(http://jackaudio.org/), an open source program which allows for real time audio interfacing. JACK allows
the microphone signal to be sent to aubio and for the signal produced by aubio to be sent to the speaker
channel. Figure 6 shows a graphical representation of this process.
Figure 6. User/Hardware Information Flow: user microphone → Linux machine with aubio and JACK →
Windows machine with MATLAB and Simulink → speakers or headphones.
The click track produced by aubio is then output through the left speaker channel of the
computer, while the raw input signal is sent through the right. A stereo speaker cable then leads to the
microphone input of a Windows machine running an instance of MATLAB. The core program is contained
within a Simulink model, a feature of MATLAB which allows for real-time handling of signals.
The Simulink model receives the signal from the Linux machine and analyses it within two second
windows. The signal is split into two arrays, one containing the click track and one containing the input
signal. The tempo of the audio is then derived from the click track while the input signal undergoes
further analysis.
A parallel process at the start looks at the input signal to make sure the user is about to start
playing. For a piece of music in 4/4, the musician needs to tap out eight hits before the program will
begin. This can be done by clapping, tapping on the instrument, or strumming muted guitar strings. Once
eight hits have been played, the musician can start playing as normal. Because aubio will continuously
output the timing click track, this introduction gives it a chance to readjust itself to the new tempo. The
tap-out introduction also helps the program to identify the down beat of the bar so that the drum beat
will not only stay in time with the musician, but correctly align itself to the time signature. This technique
is often used by musicians in a group for coordination purposes so it should not be an unfamiliar concept.
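The tap-out introduction described above can be sketched as a small routine. This is an illustrative simplification; the function name, return convention, and the choice of starting the piece one beat after the final tap are assumptions made for the example:

```python
# Wait for eight detected onsets (the tap-out count-in), then use their
# spacing to seed the tempo and to locate the downbeat of the bar.
def count_in(onset_times, taps_needed=8):
    """Return (start_time, seconds_per_beat) once enough taps have arrived,
    or None if the system should keep listening."""
    if len(onset_times) < taps_needed:
        return None
    taps = onset_times[:taps_needed]
    intervals = [b - a for a, b in zip(taps, taps[1:])]
    beat = sum(intervals) / len(intervals)
    # Playing proper starts one beat after the final tap, on the downbeat,
    # which aligns the drum beat to the start of the bar in 4/4.
    return taps[-1] + beat, beat
```

Eight taps spaced half a second apart, for instance, yield a start time of 4.0 s and a beat period of 0.5 s (120 BPM).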
3.5 Audio Signal Analysis
3.5.1 Onset Detection
First, a suitable onset detection method must be implemented in order for the beat tracker to
successfully derive the tempo of the song. By running the signal through a short-time Fourier transform,
followed by a high pass filter, and then convolving the signal with a Gaussian envelope, a suitable onset
strength envelope can be created [11]. An example of the onset envelope can be seen in figure 7. These
methods essentially just accentuate possible onsets so that beat tracking is a simple matter of peak
selection and pattern detection. These methods are also utilised by the real time aubio system, so the
onset detection incorporated into its beat tracking system will be used. The input signal analysis must be
executed in real time as there is no guarantee that a musician will be playing the same sample repeatedly
and at a perfectly consistent tempo.
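The envelope derivation can be illustrated with a much-simplified sketch. Note the deliberate substitution: instead of a full short-time Fourier transform, this uses frame energies as a stand-in for spectral magnitudes, then half-wave rectifies their first difference and smooths with a small Gaussian kernel; the frame size and `sigma` are illustrative:

```python
# Simplified analogue of an onset strength envelope: frame energies,
# half-wave rectified difference (energy rises only), Gaussian smoothing.
import math

def onset_envelope(samples, frame=64, sigma=1.0):
    # Frame energies stand in for the spectral magnitudes of the STFT.
    energies = [sum(x * x for x in samples[i:i + frame])
                for i in range(0, len(samples) - frame + 1, frame)]
    # Half-wave rectified difference accentuates sudden energy increases.
    flux = [max(0.0, b - a) for a, b in zip(energies, energies[1:])]
    # Convolve with a Gaussian envelope to suppress spurious single-frame spikes.
    radius = 2
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    norm = sum(kernel)
    out = []
    for i in range(len(flux)):
        acc = 0.0
        for k in range(-radius, radius + 1):
            j = i + k
            if 0 <= j < len(flux):
                acc += kernel[k + radius] * flux[j]
        out.append(acc / norm)
    return out
```

A burst of amplitude after silence produces a single dominant peak in the envelope at the frame where the energy rises, which is exactly the feature the beat tracker's peak selection relies on.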
Figure 7. Waveform Onset Envelope Derivation [11]
3.5.2 Beat Tracking
Accurate beat tracking is vital to the performance of the virtual drum accompanist. If it cannot play
in time with a human musician, then it would be considered a very poor accompanist. The beat tracking
aspect of this system provides the tempo for the drum beat playback aspect by analyzing the interval
between beats.
The image on the following page (figure 8) shows the stages of beat detection used in the LabROSA
Cover Song ID package [11].
Figure 8. The top graph shows the raw waveform of a 17 second piano part. The graph below it displays the waveform
after it has been processed to display onset strength envelope. The bottom plot displays the beat derived from the
onset envelope [11].
Because this system is to be run in real time, the aubio beat tracking system [10] discussed in
section 2.2 will be used to set the tempo for the output.
3.5.3 Volume
A proficient drummer knows when to play loudly and when to play softly, often taking cues from
the leading musicians as to what volume level is appropriate. The automatic volume control of this system
bases the output drum playing volume on the amplitude of the input signal.
The original waveform is monitored and tracked for gradual changes in amplitude, as these will
determine the overall volume of the drum beat. The program looks at the maximum amplitude within the
current window and retains the value in order to set the base output volume. Because the amplitude of the
input signal ranges from -1 to 1, the absolute value is observed, giving a volume range of 0 to about 1.3.
This method for volume control was chosen for its simplicity and fast computing time. When a selection of
musicians were asked if this was a valid musical assumption, all respondents agreed.
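The volume rule reduces to a one-line computation over the window; the function name is illustrative:

```python
# The largest absolute sample in the current analysis window sets the
# base output volume of the drum beat for that window.
def window_volume(samples):
    """Base drum volume for this window; input samples lie in [-1, 1]."""
    return max(abs(x) for x in samples)
```

Each generated drum hit's amplitude would then be scaled by the value returned for the most recent window, so gradual crescendos and decrescendos in the input carry through to the drums.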
The image below (figure 9) shows the first section of the piano part from The Great Gig in the Sky
by Pink Floyd [34]. This song features very soft playing at first with a gradual crescendo (increase in
amplitude) into a louder section with a guitar accompaniment. These volume changes are noted within the
system to dictate the output volume of the drum beat.
Figure 9. Waveform volume change
3.5.4 Sustain
Articulation is a musical term which refers to how the space between notes is handled, with
either silence, sustain, or a degree of both. In the case of guitar and piano, sustained chords will often
correspond to a smoother playing style, while notes with a noticeable silence between them can mean
something a bit more disjointed is being performed. While there are no strict rules dictating how a
drummer should react to differences in articulation, it is fairly common to see a drummer matching his or
her articulations to that of the other musicians. Regardless of what direction a drummer takes in regard to
articulation or sustain, it is a factor that should not be ignored. Sustain will be represented as a rating from
1 to 3, 1 meaning little to no sustain, 3 meaning full sustain, and 2 representing a moderate level of sustain.
The rate of amplitude decrease between peaks is observed in order to determine the
sustain value. Quick, drastic drops in amplitude will give a sustain value of 1, while slow and mild drops in
amplitude between peaks will produce a value of 3. Any decay rate in between these is given a 2. At the
end of the window, the average sustain value is then sent forward to determine drum articulation. The
sustain value does not carry over from window to window as this would greatly hinder the system's ability
to quickly adapt to changing styles.
Testing has shown that quick amplitude drops to below 20% of the maximum amplitude indicate a
staccato (heavily disjointed) style of playing. Drops to this value over a longer period of time (typically
greater than .4 seconds) are mainly due to the natural amplitude decay of a musical signal and therefore
not considered to be staccato. Testing has also shown that if a signal retains at least 30% of its maximum
amplitude, a high level of sustain is present. These values have been gathered with the use of inexpensive
microphones; the use of a compressor in any recording device would not be compatible with this system.
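The thresholds above can be combined into a sketch of the sustain rating. This is an illustrative interpretation of the rules as stated (a drop below 20% of the peak within 0.4 s reads as staccato, holding at least 30% of the peak reads as full sustain); the function name and exact peak-finding logic are assumptions:

```python
# Rate the decay between one onset and the next as 1 (staccato),
# 2 (moderate), or 3 (full sustain) using the stated thresholds.
def sustain_rating(segment, sample_rate):
    peak = max(abs(x) for x in segment)
    if peak == 0.0:
        return 1
    peak_i = max(range(len(segment)), key=lambda i: abs(segment[i]))
    # First sample after the peak that falls below 20% of the peak level.
    drop_i = next((i for i in range(peak_i, len(segment))
                   if abs(segment[i]) < 0.2 * peak), None)
    if drop_i is not None and (drop_i - peak_i) / sample_rate < 0.4:
        return 1  # quick, drastic drop: staccato playing
    if min(abs(x) for x in segment[peak_i:]) >= 0.3 * peak:
        return 3  # the note rings on: full sustain
    return 2      # moderate decay between the two extremes
```

In use, the ratings for each inter-onset segment in the window would be averaged and sent forward to determine the drum articulation, as described above.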
The first image (figure 10) is from a slower, acoustic version of "Blitzkrieg Bop" by the Ramones
[23]. The slower version of the song features long, sustained chords which accounts for the slow rate of
decay in the signal. The signal holds around .4 in amplitude before the next onset occurs, further evidence
of the sustained chord.
Figure 10. Gradual amplitude decay with balance around .4 on the y axis.
The next image (figure 11) is taken from the second movement of "Paranoid Android" by
Radiohead [14]. This part features slightly quicker decay as only a single string is played at a time as
opposed to full chords. This results in a slightly less smooth guitar part, but still with some degree of
sustain, dropping only slightly below .4.
Figure 11. Quick initial amplitude decay with slow latent decay and/or balance.
The final image (figure 12) is taken from the song "Hanuman" by Rodrigo y Gabriela [34]. This is a
much more disjointed guitar part than the previous samples as evidenced by the quick drops in amplitude
as well as the drops below .2 which indicate a brief silence from the guitar aside from background noise
including the sound created by fingers sliding across the muted strings to new positions.
Figure 12. Quick initial decay with occasional balances below .2.
3.5.5 Complexity
Rhythm is a very important aspect of music; the way a particular part is played on a melodic
instrument can greatly influence how a human drummer will perform their part. Typically, rhythmically
simple guitar or piano parts will feature an equally simple drum part. One can look at the classic punk
band, The Ramones, to hear an example of this. For musicians who are practicing simple parts, a complex
drum beat may be detrimental to the feel of the song and may also confuse novices who are not
accustomed to intricate rhythms. Musicians who are able to play slightly more complex rhythms should
therefore be able to handle increasingly complex drum parts. While some may argue that higher level
playing does not always necessitate an elaborate drum beat, it may be useful for practice purposes to help
push a musician to work with and around drum beats of varying difficulty.
The complexity value is determined by comparing the number of offbeat onsets to the
number of detected beats:

complexity = (Ot − Ob) / B

where Ot is the total number of onsets, Ob is the number of onsets which lie on a detected beat,
and B is the total number of beats. This number typically ranges from 0 to anywhere above 2.0, where 0
indicates a very straightforward rhythm, while higher values correspond to an increased complexity. Ob is
determined by looking at the time location of a given onset and comparing it with the time locations from
the beat array. If the time locations are within 1/40th of a second, it is assumed that the onset is on the
beat. While the onsets and beats which occur simultaneously normally have the same time value,
occasional discrepancies between the two waveforms may result in slightly offset onset index locations.
This may also be caused by the musician playing a note slightly ahead or behind the beat. Once found, the
complexity value is sent forward to the genetic algorithm fitness function.
The justification for this method comes from the observation that simple rhythms will follow the
beat of the song fairly strictly, or at least within even subdivisions of the beat. Complex rhythms will
deviate from the beat and typically do not feature consistently spaced notes.
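The computation can be sketched directly from the description above, using the stated 1/40 s tolerance for deciding whether an onset lies on a beat; the function name is illustrative:

```python
# Complexity = offbeat onsets relative to the number of detected beats,
# where an onset counts as on-beat if it lies within 1/40 s of a beat.
def complexity(onset_times, beat_times, tol=1.0 / 40.0):
    if not beat_times:
        return 0.0
    on_beat = sum(1 for o in onset_times
                  if any(abs(o - b) <= tol for b in beat_times))
    return (len(onset_times) - on_beat) / len(beat_times)
```

When every onset coincides with a beat the value is 0; adding one offbeat onset per beat raises it to 1.0, and denser offbeat playing pushes it higher still.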
In figure 13, the graph on the left (from [34]) shows how each onset occurs at the same time as a
beat indicator, this is representative of a very simple input and will be given a complexity value of 0. The
graph on the right (from [14]) shows a number of onsets which do not correspond to any beat as well as a
few beats which have no onset occurring simultaneously. The local window in this sample would have a
higher complexity value due to these factors.
Figure 13. Low complexity value on the left due to an even beat:onset ratio with all onsets occurring on beat. The wave
on the right has a higher complexity value due to the number of onsets which occur off beat, some beats also have no
onsets associated with them.
3.6 Drum Beat Generation
Artificial creativity is a rather abstract concept in which a model of human creativity can be
represented algorithmically. Collins [7] employs a method of finding empty spaces within a musician’s
rhythm and creates counter-melodies within the spaces. This provides an interestingly layered and
complex melodic structure that is meant to inspire the human musician into exploring new musical ideas.
This results in a constant trade of ideas between human and computer rather than having the computer
restricted to a supporting role. Another approach makes use of evolutionary computation and genetic
algorithms to generate artificial creativity [22]. This method creates a number of randomly generated
musical segments, picks from the most suitable segments and uses them to seed new segments. This
process is repeated until an acceptable segment is found.
This system uses a similar technique by using drum beats in the form of matrices as musical
segments. In the first generation, all of the potential drum beat candidates are randomly generated. From
there, the fitness function determines the validity of each drum beat to see what will pass on to the next
generation, what will undergo mutation and crossover, and what will be purged and replaced with new,
randomly generated drum beats.
3.6.1 Scoring
Drum beats receive an overall fitness score which is determined by many independent factors. If a
low complexity value has been detected (between 0 and .5 for most input signals which feature
straightforward rhythms), each individual drum track will receive harsh penalties for extraneous hits and
for not having the kick and snare follow simple patterns. The hi hat pattern must also stay on beat for time
keeping purposes.
Figure 14. A simplified look at drum beat candidate evaluation.
The above image (figure 14) shows the factors that are taken into consideration when scoring a
drum beat for a simple piece of music. The kick drum is scored in the following way:

Score(kick) = 10 × (Kd / Kt)

where Kd is the number of kick drum hits occurring on the down beats (beats 1 and 3) and Kt is the
total number of kick drum hits. This method pushes forward kick drum tracks which keep the pulse of the
beat, while an extra hit or two will be allowed without too much penalty. The maximum score is always
capped at 10 to provide an evenly rounded score across all tracks. The value 10 was chosen for no other
reason than the simplicity of reading scores as percentages and to avoid over-complicating the fitness
functions.
The snare drum is scored in the same way but with an emphasis on the upbeats (2 and 4).
The hi hat, being the primary time keeper in this system, is scored differently:

Score(hi hat) = 10 × (Hb / B)

where Hb is the total number of hi hat hits occurring on the beat and B is the time signature numerator
(commonly 4). This is done so the hi hats stay on beat to help keep time for the musician while still allowing
for some variation on the off beats.
When the input signal has been determined to be of moderate complexity (.5 to 2.0 for most input
signals which are not completely straightforward nor overly intricate), the same equation is applied but
with (+2) appended to the end. This is done so the penalties attributed to extra hits are less severe,
allowing for greater freedom of expression. This number ensures that more complex drum beats are
possible while still ensuring that unacceptable beats are not passed on.
For complexity values greater than 2.0, many restrictions are removed to allow for a wide range of
potential drum beats of varying difficulty. Instead of the scoring system present in the lower complexity
ranges, a set of hard coded penalties have been put in place to prevent very sparse drum beats as well as
drum beat matrices which are overly full of hits, these beats are given maximum scores of 2 out of 10.
Simple beats which only exhibit kick hits on 1 and 3 and snare hits on 2 and 4 are given a maximum score of
5. Otherwise, an acceptable beat will receive a score between 8 and 10.
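The tiered scoring above can be sketched as follows. This is an illustrative Python rendering of the reconstructed kick formula (the system is written in MATLAB), and the 8-step grid and step indices are assumptions for the example:

```python
def kick_score(hit_steps, complexity):
    """Illustrative kick fitness (0 to 10): the fraction of hits landing on
    the down beats (steps 0 and 4 of an 8-step eighth-note bar) scaled to
    10, with the +2 leniency for moderately complex input, capped at 10."""
    downbeats = {0, 4}                  # beats 1 and 3
    on_down = sum(1 for s in hit_steps if s in downbeats)
    score = 10 * on_down / max(len(hit_steps), 1)
    if 0.5 <= complexity <= 2.0:        # moderate complexity: softer penalty
        score += 2
    return min(score, 10)

# Hits only on beats 1 and 3 of a simple input earn full marks
print(kick_score([0, 4], 0.0))  # 10.0
```

Under the same sketch, an extra off-beat hit lowers the score only moderately, matching the text's intent of allowing an extra hit or two without too much penalty.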
3.6.2 Evolution
Once the scoring process has completed, the transition to the next generation begins. Drum beats
scoring in the top 25% are carried on to the next generation without modification. Another 25% of the
next generation is made up of high scoring individual instrument tracks from different drum beats. The
first is a hybrid of the top scoring instruments, while the rest are randomly composed of the top 25%
scoring instrument tracks, with at least one of them being randomly generated. This hybrid set of drum
beats is purely experimental and does not always lead to an optimal solution but it does create a unique
set of drum beats which can potentially rise to the top.
A third 25% of drum beats are subject to vertical crossover, in which the first section of a drum
beat is combined with the complementary end section of another drum beat. Drum beats scoring in the top
50% are potentially subject to this. The dividing line is randomly decided with checks in place to prevent
the same two drum beats from being split in the same place more than once. The following image (figure
15) graphically demonstrates these concepts.
Figure 15. A look at generational transitions associated with the genetic algorithm.
The final quarter of the next generation is randomly generated, ensuring a fresh supply of
new drum beats and instrument tracks is available. Currently, the evolutionary process is run through only
ten iterations, which has been shown to allow passable drum beats to be created without having the
process converge on the same drum beat each run through.
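The generational transition just described (25% elitism, 25% hybrids, 25% vertical crossover, 25% fresh random beats) can be sketched in Python. The hybrid step is simplified here to copies drawn from the top quarter, and all names and the 16-step beat representation are illustrative rather than the MATLAB implementation:

```python
import random

def next_generation(population, scores, rng=random.Random(0)):
    """Illustrative generational step: 25% elitism, 25% hybrids drawn from
    the top quarter, 25% vertical crossover of top-half beats, and 25%
    fresh random beats. A beat is modelled as a flat list of 16 steps."""
    n = len(population)
    ranked = [b for _, b in sorted(zip(scores, population),
                                   key=lambda p: -p[0])]
    elite = ranked[: n // 4]                       # carried over unchanged
    hybrids = [ranked[rng.randrange(n // 4)][:] for _ in range(n // 4)]
    crossed = []
    for _ in range(n // 4):                        # vertical crossover
        a, b = rng.sample(ranked[: n // 2], 2)
        cut = rng.randrange(1, 16)                 # random dividing line
        crossed.append(a[:cut] + b[cut:])
    fresh = [[rng.randint(0, 1) for _ in range(16)]
             for _ in range(n - 3 * (n // 4))]     # fresh random beats
    return elite + hybrids + crossed + fresh

rng = random.Random(7)
pop = [[rng.randint(0, 1) for _ in range(16)] for _ in range(8)]
print(len(next_generation(pop, scores=list(range(8)))))  # population size preserved: 8
```

The checks the report mentions (preventing the same two beats being split at the same point twice) are omitted from this sketch for brevity.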
3.6.3 Introducing Cymbals and Toms
As there are no strict rules for cymbal and tom hits within a drum beat, and some may
argue that going against convention can help push new creative boundaries, the rules defined by the system are
fairly loose while still maintaining some degree of restraint.
One restraining aspect regarding the cymbals is how the downbeat of every fourth drum beat
iteration will feature a cymbal hit. There is no scientific justification for this beyond the author's own
experience as a drummer: cues such as this can help keep a group of performers aware of their position
within a song. Many patterns in rock and pop feature repeats in multiples of four, which is why four
loops was chosen, though this number can easily be changed. Anchors like this will help to reassure the
user of where they are in a piece of music as well as assuring them that the program is in the correct
location.
Cymbals are also introduced to the drum beat mainly as accents which will replace hi hat hits,
depending on the complexity of the input signal. Medium complexity inputs will have a 15% chance of a
cymbal hit replacing a hi hat hit providing that a kick or snare drum hit is present as accented cymbals
sound relatively weak without bottom end support. For higher complexity inputs, this number is increased
to 20%. These numbers allow the cymbals to still be thought of as accents while not risking overuse.
If the sustain and volume values of a signal are high enough, there is also a 50% chance that a
cymbal will completely replace the hi hat track as a time keeper. The extra sustain associated with cymbals
can often help strengthen and support louder, sustained chords. The 50% value is chosen as open hi hats
and sustained cymbal hits are equally valid options.
For higher complexity values, toms may be added in as filler between snare drum hits or even to
replace certain snare and hi hat hits if there is a conflict (this is done to constrain the virtual drummer to
simulate a human drummer, who typically has only two arms available).
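The cymbal accent rule (a 15% replacement chance for medium-complexity input, 20% for high, and only where a kick or snare supports the hit) can be sketched as follows; this is an illustrative Python fragment, not the MATLAB code:

```python
import random

def add_cymbal_accents(hihat, kick, snare, complexity, rng=random.Random(1)):
    """Illustrative accent pass: a hi hat hit may be swapped for a cymbal
    hit when a kick or snare hit is present underneath it; 15% chance for
    medium-complexity input, raised to 20% for high complexity."""
    p = 0.20 if complexity > 2.0 else 0.15
    cymbal = [0] * len(hihat)
    for i, h in enumerate(hihat):
        if h and (kick[i] or snare[i]) and rng.random() < p:
            cymbal[i], hihat[i] = 1, 0   # cymbal replaces the hi hat hit
    return hihat, cymbal

# No kick or snare support: the hi hat track is left untouched
print(add_cymbal_accents([1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0], 1.0))
```

The 50% hi hat/cymbal time-keeper swap and the tom substitutions would be handled by similar conditional passes.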
3.7 Drum Beat Enhancement
Table 2 shows a sample of the drum beat template. This template contains the basic elements of a
standard rock or jazz drum kit. A volume parameter has also been included for each instrument. An overall
volume will be derived from the relative amplitude of the incoming waveform. This volume value can be
adjusted for individual hits in order to allow for accents and ghost notes, common techniques used in the
composition of a dynamic drumbeat. The volume parameter ranges from 0 to 1.
A tone parameter has also been introduced to the tom and cymbal instrument tracks. A lower
value corresponds to a lower pitch and vice versa. This parameter thus allows the template to model a
drum kit containing multiple toms and cymbals, as is common for most standard kits. The tone parameter
also ranges from 0 to 1.
The position value for the hi hat represents the distance between the two cymbals that make up a
pair of hi hats. A value of 0 indicates tightly closed hi hats which produce a short, sharp sound. Completely
open hi hats, a value of 1, occur when there is no contact between the two cymbals; this is rarely used by
most human drummers. Values between 0 and 1 cover the remaining distance and allow for interesting
crescendo effects, accents, and setting an overall style. Figure 16 displays a representation of the drum
beat styling parameters.
Beat Template (two bars, eighth-note grid)

instr.  parameter    1   &   2   &   3   &   4   &   1   &   2   &   3   &   4   &
kick    probability 1.0 0.0 0.0 0.7 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.4 0.4 0.0
        volume      0.8  -   -  0.8  -   -   -   -  0.8  -   -   -   -  0.8 0.8  -
snare   probability 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.4 0.0 1.0 0.0 0.0 0.3
        volume       -   -   -   -  0.8  -   -   -   -   -  0.4  -  0.8  -   -  0.4
hi hat  probability 0.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.5
        volume       -   -  0.8  -  0.8  -  0.8  -  0.8  -  0.8  -  0.8  -  0.8 1.0
        position     -   -  0.3  -  0.3  -  0.3  -  0.3  -  0.3  -  0.3  -  0.3 0.7
tom     hit         0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0
        volume       -   -   -   -   -   -   -   -   -   -   -   -   -  0.8 0.8  -
        tone         -   -   -   -   -   -   -   -   -   -   -   -   -  0.6 0.4  -
cymbal  hit         1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
        volume      0.8  -   -   -   -   -   -   -  0.8  -   -   -   -   -   -   -
        tone        0.3  -   -   -   -   -   -   -  0.5  -   -   -   -   -   -   -
Figure 16. Representation of drum beat styling parameters.
The values contained in the drum beat template are assigned by the drum beat styling aspect. The
probability values are rolled before the final drum beat is sent out so that each potential hit is a 0 or a 1.
This is done each time the beat is looped so that it has a chance of variation on each iteration.
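Rolling the probabilities into a concrete beat on each loop can be sketched as below; this is an illustrative Python fragment (the system itself works on MATLAB matrices):

```python
import random

def roll_beat(prob_row, rng=random.Random(42)):
    """Illustrative per-loop roll: each probability collapses to a definite
    hit (1) or rest (0), so the beat can vary on every repetition."""
    return [1 if rng.random() < p else 0 for p in prob_row]

# Certain (1.0) and impossible (0.0) hits always resolve the same way
print(roll_beat([1.0, 0.0, 1.0, 0.0]))  # [1, 0, 1, 0]
```

Intermediate probabilities such as 0.8 resolve differently from one loop to the next, which is the source of the variation described above.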
3.7.1 Probability Values
Bilmes [3] claims that the key to machine expressivity is in variation and deviation, without which
any computer generated music will sound mechanical. To combat this, a set of probability values will be
introduced so that repeated drum beats are not played exactly the same way every time. These
probability values allow for variation within a drum beat, while retaining some similarity, without having
to generate a completely new one.
The probability values assigned to each track determine how likely a given hit will occur each time
it is looped. Some rhythms require less rigidity in a drum beat, so the probability values can help to keep
things interesting as a particular beat is repeated. The probability values for each hit are determined
randomly but must remain above the minimum in order to help preserve a more natural feeling drum
beat.
Each instrument's set of probability values are determined randomly but with varying minimum
values. The following plot (figure 17) demonstrates how the lowest possible probability value for the kick
drum is determined for a given location. This chart only relates to drum beats where the complexity value
has been determined to be greater than .5, as inputs which score above .5 are typically of a high enough
complexity to warrant these additional drumming techniques.
Figure 17. Top - Minimum possible probability values by location for the kick drum with an emphasis on higher values
for beat one. Bottom - The snare is weighted as a counter to the bass drum in terms of possible probability values.
The area above the curve is the range of possible probability values at each beat location,
randomly generated within the acceptable bounds. In this case, the main pulse of the beat (on beats 1
and 3) has a higher minimum than on beats 2 and 4; this is done to avoid too much degradation of the
normally important hits of the kick drum. The minimum probability function is simply a cosine wave
ranging from 66% to 100%, with clipping on peaks occurring after beat 1. The downbeat remains important
while the offbeats are subject to a greater degree of variability.
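The minimum-probability curve can be sketched as below. This Python fragment interprets the 66-100 range as probability floors of 0.66 to 1.00 and assumes an 8-step bar with peak clipping applied after beat 1, so the exact constants are illustrative:

```python
import math

def kick_min_prob(step, steps_per_bar=8):
    """Illustrative minimum-probability floor for the kick drum: a cosine
    ranging from 0.66 to 1.00 with a period of half a bar, so it peaks on
    beats 1 and 3; peaks after beat 1 are clipped slightly lower so the
    downbeat keeps the highest floor."""
    phase = 2 * math.pi * step / (steps_per_bar / 2)
    value = 0.83 + 0.17 * math.cos(phase)   # oscillates between 0.66 and 1.00
    return value if step == 0 else min(value, 0.98)

print(kick_min_prob(0), kick_min_prob(4))  # 1.0 0.98
```

The actual probability for each step would then be drawn uniformly between this floor and 1.0.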
3.7.2 Volume Values
The next stage is to set the volume levels for each hit. A base volume level will be set based on the
relative amplitude of the input waveform. This base value will adjust to whatever the current dynamic level
of the input signal may be, much in the same way a human drummer will respond to another musician
playing louder or softer.
In drumming, an accent is a hit which is played with an increased power level compared to the hits
surrounding it. This is a commonly used technique by many drummers to increase the dynamic range of
their drum beats (e.g. John Bonham's introduction on Rock n' Roll by Led Zeppelin). Ghost notes occur when
a drummer plays a hit at a reduced volume, allowing for greater improvisational potential without
overloading the drum beat with full volume hits.
For the snare drum, accents and ghost notes are randomly decided when the complexity level is
high enough. The volume level for ghost notes ranges anywhere from (BaseVolume/2.5) to
(BaseVolume/2). Accents are played at double volume, capped at a maximum of 1 so as to avoid overly loud
accents; a human drummer already playing at full volume will have a hard time distinguishing his or her
accents by volume as well.
Tom, cymbal, and kick hits are currently kept at a steady volume but are still adjusted overall
during loud or quiet inputs.
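The accent and ghost-note volume rules can be sketched as follows; this is illustrative Python, and the ghost value shown is the lower bound of the described range rather than a random draw within it:

```python
def snare_hit_volume(base, kind):
    """Illustrative volume assignment: accents double the base volume but
    are capped at 1.0; ghost notes sit at the bottom of the base/2.5 to
    base/2 range described in the text."""
    if kind == "accent":
        return min(2 * base, 1.0)
    if kind == "ghost":
        return base / 2.5
    return base

print(snare_hit_volume(0.8, "accent"))           # 1.0 (capped)
print(round(snare_hit_volume(0.8, "ghost"), 2))  # 0.32
```

With a base volume of 0.8, accents saturate at the cap, which mirrors the point that a drummer already at full volume cannot accent by volume alone.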
3.7.3 Tone and Position Values
The tom tone level will be set to a low value if the tom is being used as the primary beat/time
keeper, essentially when it is being struck in an even, repeating pattern. Isolated tom hits can have
completely random tone levels otherwise. Fast, repeated tom hits which make up a drum fill will
commonly move from high to low, but other methods and combinations are just as valid and should not be
discouraged.
Cymbals are treated in much the same way: a lower value will be assigned if the cymbal is being
used as a time keeper. This is to represent a ride cymbal, which is typically larger than other cymbals.
The position value largely relies on the input waveform (specifically, the intensity rating). A
waveform with longer, high amplitude trails following the onsets is most likely representing an instrument
that is being played with sustained chords. If the hi hat is being used as the time keeper in this section,
the position value will be somewhere near the middle. If the waveform features quick drops in amplitude
after an onset, then this section of the song will be more suited to a closed hi hat featuring a lower position
value. The higher position values are used for accents, which are indicated by a higher volume level for
the hi hat at that moment in time.
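The mapping from sustain to hi hat position can be sketched as below; the thresholds and position values are assumptions for illustration, since the report describes the behaviour only qualitatively:

```python
def hihat_position(sustain, accent=False):
    """Illustrative hi hat openness (0 = closed, 1 = open): sustained input
    pushes the position toward the middle, short decays keep it closed,
    and accents open the hats further. Thresholds are assumptions."""
    if accent:
        return 0.8     # more open for accented hits
    if sustain > 2.0:  # long amplitude trails after each onset
        return 0.5     # roughly half open to support sustained chords
    return 0.2         # mostly closed for short, choppy playing

print(hihat_position(2.4), hihat_position(1.0))  # 0.5 0.2
```

A real implementation would presumably interpolate continuously over the sustain value rather than switch between a few fixed levels.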
3.8 Output
Due to constraints present in MATLAB and Simulink, audio output options are extremely limited.
Instead, a visual representation of the drumbeat is displayed for evaluation purposes along with an
audible click to indicate beat location. A separate vector array, located underneath the drum beat display,
gives a visual cue of where the system currently is within the drum beat output. The output shown
in figure 18 is for a simple kick/snare/hi hat drum beat.
Figure 18. Simulink visual representation of output. The top matrix (titled drumbeat) represents the current drum beat.
The middle array (titled timing) shows the current location of the beat. The bottom matrix (titled beatfx) displays the
styling parameters associated with the drum beat.
3.9 Simulink Model
The system features discussed above are all contained in MATLAB and mostly handled through
Simulink. Simulink allows for the use of Embedded MATLAB functions, which are similar to standard
MATLAB functions but with real time functionality and C code generation options. Unfortunately,
Embedded MATLAB is hindered by many restrictions, including the lack of audio output, of variable
sized matrices, and of some commonly used built in MATLAB functions. The images in figure
19 display the Simulink architecture and explain some of the components within.
Figure 19. Simulink workspace. The three information flows being sent from the Output Processing block are directed
to the drum beat visualisation matrices of Figure 18 in the previous section (3.8).
The input signal is buffered into a two second window for analysis by the Input Processing block
as well as being sent in 0.003125 second samples to the Timing Issues block. Input Processing handles all
the waveform analysis needed to create and style the drum beat, namely the volume, complexity, and
sustain values. Timing Issues is responsible for syncing the output of the drum beat with the beat pattern
being predicted by aubio. Beat Generation is the block which calls the genetic algorithm for drum beat
creation. When a drum beat has been chosen, it is sent to the Output Processing block, which adds the
final styling features such as accents, ghost notes, and hi hat position.
[Figure 19 block labels: Input signal received from JACK at 8000 Hz; Input Processing (volume, complexity, sustain); Timing Issues (synchronisation, time signature); Drum Beat Generation (genetic algorithm, creation of un-styled beats); Output Processing (drum beat styling, drum beat output timing); ProgramStart = 1 when the input amplitude threshold is reached; StartProcess = 1 when 2 bars have been tapped out.]
Chapter 4: Results
Below is a set of example drum beats produced by the program. Each drum beat is in 4/4 timing
divided into eighth notes. A brief description of the beat and the input signal it was accompanying can be
found alongside each chart. These results were taken from the user evaluation stage, in which only 8
count, kick/snare/hi hat beats were used in order to simplify the output display for the user.
vol 0.500
4/4 snapping
comp 0.000
sustain 1.000
kick 1 0 0 0 1 0 0 1
snare 0 0 1 0 0 1 1 0
hat 1 1 1 0 1 0 1 0
beat 1 & 2 & 3 & 4 &
kick prob. 1 1 1 1 1 1 1 1
vol. 0.500 0.500 0.500 0.500 0.500 0.500 0.500 0.500
snare prob. 1 1 1 1 1 1 1 1
vol. 0.500 0.500 0.500 0.500 0.500 0.500 0.500 0.500
hi hat prob. 1 1 1 1 1 1 1 1
vol. 0.500 0.500 0.500 0.500 0.500 0.500 0.500 0.500
pos. 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Figure 20. Drum beat to accompany consistent snapping.
vol 1.072
Just
comp 0.5695
sustain 2.429
kick 1 0 1 0 1 0 0 0
snare 0 0 1 0 0 1 1 0
hat 1 0 1 0 1 1 1 1
beat 1 & 2 & 3 & 4 &
kick prob. 1 0 .8 0 .98 0 0 0
vol. 1.072 1.072 1.072 1.072 1.072 1.072 1.072 1.072
snare prob. 0 0 .99 0 0 .96 .85 0
vol. 0 0 1.072 0 0 .356 1.072 0
hi hat prob. .8 0 .91 0 .89 .87 .83 .82
vol. 1.072 0 1.072 0 1.072 1.072 1.072 1.072
pos. 2.4 0.0 2.4 0.0 2.4 2.4 2.4 2.4
Figure 21. Drum beat to accompany a strummed guitar part.
The input for the drum beat in figure 20 consisted only of consistently timed finger-snapping into a
microphone. The drum beat itself is fairly straightforward, but the offbeat kick and snare hits in the
second half of the bar help to keep it interesting. The styling table is locked with 100% probability and
fixed volume for all tracks due to the low complexity value.
The guitar input for figure 21 was played at full volume and is somewhat more complex, but still
with full chord strumming. The styling directions can be observed in the table. Having the complexity
value above .5 allows for the inclusion of probability values as well as the potential of ghost notes
and accents.
vol 1.013
1400 jam
comp 1.750
sustain 1.920
kick 1 0 1 1 0 0 0 0
snare 0 0 1 0 0 0 1 1
hat 1 1 1 1 1 1 0 1
beat 1 & 2 & 3 & 4 &
kick prob. 1 0 .8 .85 0 0 0 0
vol. 1.013 0 1.013 1.013 0 0 0 0
snare prob. 0 0 .92 0 0 0 .87 .72
vol. 0 0 1.013 0 0 0 1.3 0
hi hat prob. .83 .85 .91 .99 .89 .95 0 .92
vol. 1.9 1.9 1.9 0 1.9 1.9 0 1.9
pos. 1.9 1.9 1.9 1.9 1.9 3 1.9 1.9
Figure 22. Drum beat to accompany a complex guitar part.
vol 1.117
paradise city
comp 1.750
sustain 2.290
kick 1 0 0 0 1 0 0 1
snare 1 0 1 0 0 0 1 0
hat 1 0 1 0 1 1 1 1
beat 1 & 2 & 3 & 4 &
kick prob. .98 0 0 0 .92 0 0 .80
vol. 1.117 0 0 0 1.117 0 0 1.117
snare prob. 0 0 .92 0 0 0 .87 0
vol. 1.117 0 1.117 0 0 0 1.117 0
hi hat prob. .83 0 .91 0 .89 .95 0 .92
vol. 1.117 0 1.117 0 1.117 1.117 1.117 1.117
pos. 2.3 0 2.3 0 2.3 2.3 2.3 2.3
Figure 23. Drum beat to accompany a chord based guitar part where each note is struck individually.
The complexity value can also be heavily influenced by how quickly the tap lead-in is performed. If
the user taps out the beat in eighth notes, the complexity value will be lower than if it is tapped out in
quarter notes. This can be used as a way to give the user slightly more control over the drum beat
generation process.
The guitar part used in the creation of the drum beat in figure 22 was quite complex. It was played
slightly muted but with some chords still left sustained.
In the guitar part accompanying figure 23, the notes of the chords are struck individually in a way
that features many off-beat onsets. Every note typically rings out.
Chapter 5: Evaluation
5.1 Overview
This chapter will observe the performance of the system in terms of accuracy and style. The beat
tracking and onset detection methods will be compared with their respective input waveforms in order to
determine accuracy. Consistency within the beat tracking system is another aspect which will be
evaluated. The beat locations will be compared to the beat locations determined by a human musician.
Any delay in the output will also be noted as this can negatively affect the human musician and
potentially render the system unusable. The evaluation performed looked at the average latency value
between the input and output at different stages, as well as any jitter that may be present. A consistent
latency value would allow the program to accurately jump ahead and predict beat locations while a high
amount of jitter would make accurate accompaniment impossible.
As for the actual performance of the program, a selection of human musicians will be asked to try
out the system and answer a series of questions regarding the quality of the experience. User feedback
can help identify key problem areas as well as provide useful information on potential future
implementations.
5.2 Quantitative Evaluation
5.2.1 Beat Tracking
The two beat tracking systems tested here are the LabROSA beat tracker [11] and the aubio beat
tracker [10]. Each system was tested and compared with the human perceived beat. The analysis looks at the
tempo derived from each method, represented as an average interval between each detected beat (in
seconds). The standard deviation is another important factor to consider as it shows how consistent each
method is. Some degree of variance is to be expected as the test audio files are played by humans so a
natural, human deviation in tempo is present. The minimum and maximum time intervals are also shown
to demonstrate the potential severity of the variance. Two song samples were chosen for analysis in this
section as they are representative of the beat tracking capabilities of each system. The first song shown
will be an excerpt from the second movement of Paranoid Android, originally performed by the group
Radiohead [14]. The song sample is played on a solo acoustic guitar. This sample features a rhythm with
many offbeat notes and accents as well as some slight syncopation, making it ideal to test the limitations
of any of the beat tracking systems. The second song will be a sample of the chorus from Blitzkrieg Bop by
the Ramones [23]. The sample is slower than the original and played on an acoustic guitar. It was chosen
because of its relatively straightforward rhythm, featuring accents only on the beats.
5.2.1.1 Paranoid Android
The charts below (figure 24) show the beat locations in relation to the song sample for each
method. The first chart was produced by the LabROSA beat tracker [11] and looks to be fairly consistent
and accurate (confirmed by listening to the audio click track along with the original sample). The second
chart was produced by aubio [10] and appears to be much less consistent, deriving a much slower tempo
than what is present at the beginning. It should be noted that the tempo here is not simply being tracked
in half-time, which would have been acceptable; many of the beat locations are offset, slightly to greatly,
from some of the obvious onsets. The third chart shows the beat locations determined by a human
tapping along with the song sample. It should be noted that the aubio beat tracker does not show any
beat locations for the first few beats as it has an initial warm up period when initialized.
Figure 24. Waveform with slightly complex rhythm. Top - beat locations as determined by the LabROSA Cover Song ID
beat tracker [11]. Middle - beat locations as determined by aubio beat tracker [10]. Bottom - beat locations as
determined by a human musician.
The chart below (figure 25) displays a numerical representation of what was stated above. The
average interval time between beat locations as determined by the LabROSA beat tracker is very close to
the human determined interval time. Both of these also have a low variance value which is seen as a
positive due to the relatively consistent tempo of the input guitar part. The aubio beat tracker performed
poorly on this sample, with a much higher variance and an extremely outlying maximum interval value.
Paranoid Android LabROSA aubio human
Average Interval 0.3355 0.4995 0.3385
Variance 0.0183 0.1818 0.0144
Max. Interval 0.3720 0.9056 0.3680
Min. Interval 0.2920 0.3019 0.3080
Figure 25. Accuracy information of different beat tracking methods. The Average Interval row displays the mean
interval time for the entire beat location file (essentially giving the tempo of the song). Variance shows how far the
intervals deviate from the mean on average. Max. Interval shows the longest interval time within the beat location file
while Min. Interval shows the shortest. All values are in seconds.
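The table's statistics can be recomputed from a list of beat locations. The sketch below (in Python, though the report's analysis tooling is not specified) uses mean absolute deviation for the Variance row, matching the caption's description of average deviation from the mean:

```python
def interval_stats(beat_times):
    """Recomputes the table statistics from a list of beat locations (s):
    mean inter-beat interval (the tempo), mean absolute deviation (the
    'Variance' row as the caption describes it), and the extremes."""
    intervals = [b - a for a, b in zip(beat_times, beat_times[1:])]
    mean = sum(intervals) / len(intervals)
    spread = sum(abs(i - mean) for i in intervals) / len(intervals)
    return mean, spread, max(intervals), min(intervals)

# Perfectly steady 120 BPM tapping: 0.5 s intervals with zero spread
print(interval_stats([0.0, 0.5, 1.0, 1.5, 2.0]))  # (0.5, 0.0, 0.5, 0.5)
```

A high maximum interval relative to the mean, as seen in the aubio column, indicates an outlying gap in the detected beats rather than a uniformly slower tempo.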
The aubio package has a number of alternative onset detection methods which can be used with
the beat tracker, however, as can be seen below in figure 26, little difference is made in terms of
accuracy. One can conclude that the beat detection algorithm included in aubio would need
improvements in order to handle more complex rhythms.
Figure 26. Aubio beat detection utilising each onset detection method included in the program. While each method
produces greatly different results, none of them stands out as being particularly accurate or consistent.
5.2.1.2 Blitzkrieg Bop:
The beat locations of the charts below (figure 27) appear to be consistent across all three. This is
most likely due to the simpler rhythmic pattern of this song sample compared to the previous one.
Figure 27. Waveform with a simple rhythm. Top - beat locations as determined by the LabROSA Cover Song ID beat
tracker [11]. Middle - beat locations as determined by aubio beat tracker [10]. Bottom - beat locations as determined by
a human musician.
The chart below (figure 28) also shows that both beat detection methods worked as well as the
human tapping method.
Blitzkrieg Bop LabROSA aubio human
Average Interval 0.5163 0.5167 0.5163
Variance 0.0159 0.0124 0.0150
Max. Interval 0.5921 0.5480 0.5640
Min. Interval 0.4760 0.4920 0.4800
Figure 28. Accuracy information of different beat tracking methods. The Average Interval row displays the mean
interval time for the entire beat location file (essentially giving the tempo of the song). Variance shows how far the
intervals deviate from the mean on average. Max. Interval shows the longest interval time within the beat location file
while Min. Interval shows the shortest. All values are in seconds.
5.2.2 Latency
Latency was observed at two stages within the program. The difference between real input time
and the output from JACK was the first to be observed. The output latency here is what the system on the
second computer receives as its input. Figure 29 demonstrates this.
input - JACK delay
(s)
Average delay time 0.0278
Variance 0.0022
Max. delay time 0.0360
Min. delay time 0.0240
Figure 29. Latency measurements from the initial signal input to the signal after beat tracking has been performed.
The delay time after running the signal through the computer running aubio appears to be fairly
predictable. The delay time itself is relatively short and the amount of jitter (variance) is also quite low.
Users reported that no real noticeable lag was present at this stage in the system. The plot below
(figure 30) shows the distribution of delay times.
Figure 30. Distribution of delay times from the initial signal input to the signal after beat tracking has been performed.
The second point observed was at the final output of the program. A signal was sent through each
section of the chain to determine what the overall latency of the system would be.
input - program delay (s)
Average delay time 0.1015
Variance 0.0317
Max. delay time 0.1720
Min. delay time 0.0440
Figure 31. Latency measurements from the signal input to the final output signal (drum beat output).
The delay time here is noticeably higher (figure 31). Taking the delay from JACK into
consideration, the average added by the program itself would be about 0.0737 seconds. The variance here is
also quite a bit higher, sitting on the border between being noticeable and unnoticeable to users. The
image below shows the distribution of delay intervals
(figure 32).
Figure 32. Distribution of delay times from the initial signal input to the final output signal (drum beat output).
When compared to the latency distribution from JACK, the difference is much more noticeable, as
shown in the graph below (figure 33).
Figure 33. Scale comparison of Figures 30 and 32.
The latency of the aubio and JACK systems is tolerable, but the jitter associated with the program
running in MATLAB is of questionable reliability. It does not seem that MATLAB itself is an environment
that should be used for time dependent tasks such as those required for this project.
5.3 Qualitative Evaluation
Qualitative evaluation is very important to a project such as this, as it can be hard to measure
musicality and entertainment mathematically, especially if the factors are largely based upon the personal
preference of the user. Users were asked to test out the system using an acoustic guitar, playing a range
of different styles, and then asked a series of questions regarding the performance of the program. Users
were asked how accurate they felt the system was, what level of interaction was experienced, how they
felt about the decisions made regarding the output drum beat, as well as some other general questions.
The main complaint received from users was the lack of an audio representation of the drum
beats presented to them. The visual cue was not nearly intuitive enough and proved to be difficult to
interpret for users not familiar with drum tabs, though multiple users commented that this method could
benefit those learning drums. The program would basically provide a number of drum beats to the guitar
player who could then ask a drummer to play the beat on an actual drum kit. Another major complaint
shared by users was the accuracy of the beat tracker when attempting songs without straightforward
rhythms. One user felt that the drum beats did not take the actual rhythm of the input into consideration
as much as it could have. The user felt the drum beats were too simple for some of the complex rhythms
which he was playing, and should have been based more on what was being played rather than just
increasing in complexity. Other users commented on the fact that drum beats accompanying verse-
chorus-verse structured songs were not reprised when melody patterns were repeated, so the playing was
not as cohesive as it could have been.
As for the overall quality of the drum beats, reviews were mixed. Some felt that they provided a
very suitable accompaniment to their guitar playing while others felt that the beats were relatively
mediocre. One user felt that none of the generated beats would be considered acceptable for the style of
music which was being played. No users felt that the program had an influence on what they were
playing, saying it felt more like they were playing to the system and not with it.
Users felt there was great potential in the system and did not expect a perfect
accompaniment from a program created within the project's time span. The general consensus was that this system
could be very useful as a practice tool as well as a potential aid for song writing. Users were very
receptive to the concept of the system and a few were quite interested in how the beats were randomly
generated and scored, though it was stated that the fitness functions should be more receptive to
different styles of drumming than just basic rock beats. Some users said they would prefer to manually
choose a genre style for the drum beat rather than having it derived from the input. Users also
commented on the impracticality of requiring two computers to be able to fully run the program.
An observation noted by the author is that the sustain value calculations were tailored to
the author's own playing style. The different playing techniques of other users resulted in unexpected sustain
values, which in turn generated hi-hat position values that did not necessarily fit the part being played. In
order to derive this value accurately and consistently, waveforms created by a wide range of musicians of
varying skill levels playing the same set of parts would need to be observed.
Chapter 6: Conclusions and Further Work
6.1 Conclusions
This report presents the design and development of a virtual drum accompanist for musical
composition and practice purposes. The ultimate goal is to have the virtual drummer seamlessly provide
accurate and appropriate accompaniments to pieces of music as they are played. The system is to model a
human drummer who has no prior knowledge of a song but can still create a suitable and dynamic drum
beat to complement whatever is being played. Through detailed waveform analysis and accurate beat
tracking, the virtual drummer can quickly compose and execute drum beats while modelling basic forms of
creativity and expression commonly found in human musicians.
This is an ongoing project with a prototype system currently being validated with a series of song
samples meant to determine the overall adaptability of the system. There is much that can be done to
optimise this system as it is still in the early stages of development. Future enhancements will help to
increase the virtual drummer's versatility as well as its ability to interact with human musicians. The
variables associated with the drum beat styling aspect are also undergoing constant analysis due to their
weighted influence on drum beat candidates. When an optimal configuration has been found, after
extensive testing with many users from a wide range of styles and abilities, the system may be put
through an initial training phase to reduce the number of invalid drum beats created during the
probability assignment phase.
The system did, however, help to provide an interesting perspective on the interaction between
human and virtual musicians. The way music and maths are related provides a motivation for modelling
musical concepts with computers, and the best way to evaluate the system is to have human musicians
interact with it. While the system is by no means a replacement for a human drummer, it gives
musicians something to experiment with and a way to push their creative boundaries.
The process of designing, implementing, and evaluating this system has shown that while a
number of the techniques were valid and could be expanded upon, different approaches should be taken
in regard to the completely random generation of drum beats. While an element of randomness and
deviation is important to a system like this, perhaps it should not be completely reliant on it. The system
presented here can serve as a springboard for future projects attempting to emulate the expressivity of human
drummers. The potential for use as a practice tool could be seen as the primary focus and help give a
greater sense of direction to further developments.
6.2 Further Work
Some key improvements needed in the system are in the beat detection portion and the audio
output section. A beat tracker which is not as easily influenced by rhythm changes within the same tempo
is of the greatest importance. The audio output aspect of the system also needs improvement which may
require converting the source code to another language due to Simulink's restrictions on various standard
MATLAB operations. Additional drum beat options, such as syncopation, swing beats, and drum fills could
also greatly benefit the system by increasing its flexibility.
Other future developments may include upgrades to the artificial expression simulator and the
drum beat template. Additional parameters may include hit locations for particular instruments which can
greatly expand the range of sound and create a more natural feel. Musical recognition techniques could
also be very beneficial to the system. If a musician performs a piece with a recurring theme, the system
should recognise it and reprise the drum beat that was previously associated with that theme. This
feature would give the system the ability to participate in very structured songs in a more cohesive
manner. This would also allow musicians to teach the system to play a song if it is able to pick up on these
musical cues.
The system could also be greatly expanded to include additional accompanists, such as piano,
bass, guitar, horns, and stringed instruments. However, an extension like that would require a much
higher level of audio analysis to extract the tones of individual notes. Another possibility is the
implementation of a vision system which allows the user to give visual cues to the system to indicate
movement changes, pauses, and other cues which are normally visually communicated between band
members.
References:
[1] Bello JP, Daudet L, Abdallah S, Duxbury C, Davies M, Sandler M, (2005), A tutorial on onset detection in music signals, IEEE Transactions on Speech and Audio Processing, vol. 13, no. 5: pp. 1035-1047.
[2] Bello JP, Duxbury C, Davies M, Sandler M, (2004), On the Use of Phase and Energy for Musical Onset Detection in the Complex Domain, IEEE Signal Processing Letters, vol. 11, no. 6: pp. 553-556.
[3] Bilmes J, (1993), Techniques to foster drum machine expressivity, ICMC 93, pp. 276-283.
[4] Cockburn A, (2008), Using both incremental and iterative development, STSC CrossTalk, 21(5): pp. 27-30.
[5] Collins N, (2001), Algorithmic composition methods for breakbeat science, Proceedings of Music Without Walls.
[6] Collins N, (2007), Musical robots and listening machines, Cambridge Companion to Electronic Music, pp. 171-184.
[7] Collins N, (2010), Contrary Motion: An oppositional interactive music system, NIME Conference, pp. 125-129.
[8] Collins N, (2011), LL: Listening and Learning in an Interactive Improvisation System, University of Sussex, unpublished.
[9] Danby E, Ng K, (2011), Virtual Drum Accompanist: Interactive Multimedia System to Model Expression of Human Drummers, Conference on Distributed Multimedia Systems, vol. 17: pp. 110-113.
[10] Davies M, Brossier P, Plumbley M, (2005), Beat Tracking Towards Automatic Musical Accompaniment, Audio Engineering Society Convention, 118.
[11] Ellis D, (2007), Beat tracking by dynamic programming, Journal of New Music Research, 36(1): pp. 51-60.
[12] Goto M, Muraoka Y, (1994), A Beat Tracking System for Acoustic Signals of Music, ACM Multimedia 94 Proceedings, pp. 365-372.
[13] Goto M, Muraoka Y, (1999), Real-time beat tracking for drumless audio signals: Chord change detection for musical decisions, Speech Communication, vol. 27: pp. 311-355.
[14] Greenwood C, Greenwood J, O'Brien E, Selway P, Yorke T, (1997), "Paranoid Android", OK Computer, Parlophone.
[15] Larman C, Basili V, (2003), Iterative and Incremental Development: A Brief History, IEEE Computer Society, 36(6): pp. 47-56.
[16] Puckette M, Apel T, Zicarelli D, (1998), Real-time audio analysis tools for Pd and MSP, ICMC 98.
[17] Puckette M, Brown J, (1998), Accuracy of Frequency Estimates from the Phase Vocoder, IEEE Transactions on Speech and Audio Processing, vol. 6, no. 2: pp. 166-176.
[18] Ramirez R, Hazan A, (2005), Understanding Expressive Music Performance Using Genetic Algorithms, European Workshop on Evolutionary Music and Art, Berlin: Springer, pp. 508-516.
[19] Ramirez R, Hazan A, Maestre E, Pertusa A, Gomez E, Serra X, (2007), Performance Based Interpreter Identification in Saxophone Audio Recordings, IEEE Transactions on Circuits and Systems for Video Technology, 17(3): pp. 356-364.
[20] Ramirez R, Hazan A, Maestre E, Serra X, (2005), Understanding Expressive Transformations in Saxophone Jazz Performances, Journal of New Music Research, 34(4): pp. 319-330.
[21] Ramirez R, Hazan A, Maestre E, Serra X, (2006), A Data Mining Approach to Expressive Music Performance Modelling, Multimedia Data Mining and Knowledge Discovery, Berlin: Springer, pp. 362-380.
[22] Ramirez R, Hazan A, Maestre E, Serra X, (2008), A genetic rule-based model of expressive performance for jazz saxophone, Computer Music Journal, 32(1): pp. 38-50.
[23] Ramone D, Ramone T, (1976), "Blitzkrieg Bop", Ramones, Sire Records.
[24] Robertson A, Plumbley M, (2007), B-Keeper: A Beat-Tracker for Live Performance, NIME07, pp. 234-237.
[25] Sánchez R, Quintero G, (2009), "Hanuman", 11:11, Rubyworks.
[26] Schloss A, (1985), On the Automatic Transcription of Percussive Music - From Acoustic Signal to High-Level Analysis, Stanford University Ph.D. dissertation, Tech. Rep. STAN-M-27.
[27] Toiviainen P, (1998), An Interactive MIDI Accompanist, Computer Music Journal, 22(4): pp. 63-75.
[28] Weinberg G, Blosser B, Mallikarjuna T, Raman A, (2009), The creation of a multi-human, multi-robot interactive jam session, NIME Conference, pp. 70-73.
[29] Weinberg G, Driscoll S, Parry M, (2005), Musical Interactions with a Perceptual Robotic Percussionist, IEEE International Workshop on Robots and Human Interactive Communication, pp. 456-461.
[30] Weinberg G, Driscoll S, (2006), Human Interaction with an Anthropomorphic Percussionist, CHI 2006 Proceedings, pp. 1229-1232.
[31] Weinberg G, Driscoll S, (2006), Towards Robotic Musicianship, Computer Music Journal, 30(4): pp. 28-45.
[32] Weinberg G, Driscoll S, (2007), The Design of a Perceptual and Improvisational Robotic Marimba Player, IEEE International Conference on Robot & Human Interactive Communication, 15: pp. 769-774.
[33] Weinberg G, Godfrey M, Rae A, Rhoads J, (2007), A real-time genetic algorithm in human-robot musical improvisation, CMMR, pp. 351-359.
[34] Wright R, Torry C, (1973), "The Great Gig in the Sky", The Dark Side of the Moon, Harvest/Capitol.
[35] Zhe J, Wang Y, (2008), Complexity-Scalable Beat Detection with MP3 Audio Bitstreams, Computer Music Journal, 32(1): pp. 71-8.
Appendix A: Personal Reflection
Many unforeseen roadblocks were encountered during the course of this project. The primary
impeding factor was the limitations associated with Simulink with regard to the MATLAB programming
language. Initial testing and design was done in the standard MATLAB environment and it was somewhat
surprising to see that many of the techniques already implemented had to be rewritten or adjusted to
increase compatibility with Simulink. Looking back, MATLAB was probably not the best choice of software
for a project which relied heavily on timing as it is not always accurate in that regard. There were a variety
of methods for implementing MIDI output and processing for the standard MATLAB package, so when the
decision to use MATLAB was made, I had assumed that everything would work out. So much time was
spent adjusting everything for use in Simulink that it detracted from, and in some cases prevented, work
on important timing and audio output issues.
On another note, the DMS conference (http://www.ksi.edu/seke/dms11.html) provided a great
opportunity to see what others in the field had been working on, especially with regard to computing in
music. The conference deadline pushed many things forward and greatly helped with the writing of this
final report. Without the extra pressure from submission and presentation deadlines from the conference,
the system architecture and some of the methods would not have been developed as early and may have
been rushed at the last minute.
Appendix B: Interim Report
Appendix C: Operation Manual
Setup:
The aubio library (http://aubio.org/download) and JACK (jackaudio.org) need to be downloaded
and installed on a Linux machine. A computer microphone should be connected to the microphone input
on this computer.
Another computer must have an up to date version of MATLAB with Simulink. A license for the
Signal Processing Blockset within Simulink is also required. A male-male 1/8" stereo headphone/speaker
cable must be connected to the microphone input on this computer with the other end plugged into the
headphone/speaker output of the first machine (the one running aubio and JACK).
Download the compressed folder containing all of the necessary MATLAB code.
Set up software:
On the aubio machine, start up the JACK software and click Start. Open up a command prompt
and enter:
aubiotrack -j -t .5
where '-j' tells aubio to use JACK and '-t .5' sets the input threshold amplitude to 0.5. This value can be raised
or lowered depending on the user's playing style.
On the JACK interface, click the 'Connect' button. Open the 'system' and 'aubio' drop down
menus. Drag out_1 to playback_1, and capture_1 to playback_2 and in_1. The image below demonstrates
how this will look.
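The same routing can also be made from the command line using JACK's jack_connect utility, which some users may find quicker than dragging ports in the GUI. This is only a sketch: the client and port names below ('aubio:out_1', 'aubio:in_1', 'system:capture_1', 'system:playback_1/2') are assumed to match the labels shown in the JACK 'Connect' drop-down menus and may differ on other installations.

```shell
# Monitor aubio's output on the left playback channel
jack_connect aubio:out_1 system:playback_1

# Send the microphone capture to the right playback channel
# (carried to the MATLAB machine over the stereo cable)
jack_connect system:capture_1 system:playback_2

# Also feed the microphone capture into aubio's beat-tracker input
jack_connect system:capture_1 aubio:in_1

# List all ports and their current connections to confirm the routing
jack_lsp -c
```

jack_lsp prints each port with its connections indented beneath it, which makes it easy to confirm that the routing matches the image above before starting the Simulink model.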
Open MATLAB and map to the drive which contains drummer.mdl as well as all the other code
from the compressed folder. Open drummer.mdl.
Running:
Simply click the run button on the drummer.mdl model window, tap out 8 counts to properly align
the beat tracker, and begin playing right on count 9.
Appendix D: Resources Used
Aubio - http://aubio.org/download
Real time beat tracking software
JACK - jackaudio.org
Audio interfacing/routing between applications
LabROSA Cover Song ID - http://labrosa.ee.columbia.edu/projects/coversongs/
Beat detection and onset detection
MATLAB/Simulink
Programming environment, user interface
Audacity - http://www.download-audacity.com/
Recording and evaluation for aubio processes