Drum transcription in polyphonic music using non-negative matrix factorization

Arnaud Moreau (2), Arthur Flexer (1, 2)

(1) Institute of Medical Cybernetics and Artificial Intelligence, Center for Brain Research, Medical University of Vienna, Austria

(2) The Austrian Research Institute for Artificial Intelligence, Freyung 6/6, A-1010 Vienna, Austria

Introduction

- Prerequisite for genre classification or beat/meter detection
- Transcription is more difficult in polyphonic music
- Source separation based system
- Extension of work by Helen and Virtanen from drum/non-drum classification and separation to full polyphonic drum transcription

Overview

[Figure: block diagram of the system. Two paths, one starting from the input signal and one from the drum samples, each pass through an STFT (giving [X]_{f,t}), NMF separation (giving [S]_{f,c} and [A]_{c,t}) and feature extraction. SVM classification and peak picking then produce the transcription.]

- Input audio is divided into 5 sec excerpts
- Magnitude spectrogram representation (window size 2048, hop size 512)
- Non-negative matrix factorisation (NMF) algorithm gives source spectra and time-varying gains of c components
- The c components are classified by a Support Vector Machine (SVM)
- Peak-picking algorithm
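The first two steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the Hann window, the 44100 Hz sample rate, the handling of the final partial excerpt and the function names are all assumptions, since the poster only states the excerpt length, window size and hop size.

```python
import numpy as np

def split_into_excerpts(signal, sample_rate, seconds=5.0):
    """Cut the signal into consecutive excerpts; a shorter final remainder is dropped."""
    length = int(seconds * sample_rate)
    return [signal[i:i + length] for i in range(0, len(signal) - length + 1, length)]

def magnitude_spectrogram(signal, window_size=2048, hop_size=512):
    """Return the magnitude spectrogram X with shape (frequency bins, frames)."""
    window = np.hanning(window_size)  # assumption: the poster does not name the window
    n_frames = 1 + (len(signal) - window_size) // hop_size
    frames = np.stack([
        signal[i * hop_size:i * hop_size + window_size] * window
        for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1)).T

# Usage with a synthetic 12 s test tone (the sample rate is an assumption).
sample_rate = 44100
t = np.arange(12 * sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 440.0 * t)
excerpts = split_into_excerpts(signal, sample_rate)
X = magnitude_spectrogram(excerpts[0])
```

Each column of `X` is one short-time magnitude spectrum, which is the non-negative input that the NMF step factorises.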

Results and Discussion

The algorithm is evaluated on manually labelled 60 sec excerpts from 4 multi-channel recordings, containing a total of 1019 drum onsets.

Song     Onsets   Measure   BD       SD       HH        mean
Song 1   242      Rp%       88.66    54.93    41.03     60.85
                  Rr%       93.33    63.64    98.97     85.31
                  Rh%       78.89     5.45   −43.30     13.68
Song 2   224      Rp%       33.33    69.57    34.76     45.89
                  Rr%       13.75    69.05    93.33     58.71
                  Rh%      −11.25    35.71  −135.00    −36.85
Song 3   206      Rp%       36.84    50.68    81.77     56.43
                  Rr%       20.51    88.24    99.25     69.33
                  Rh%      −10.26   −17.65    74.44     15.51
Song 4   347      Rp%       80.00    31.25    76.63     62.63
                  Rr%       50.00     6.33    63.24     39.85
                  Rh%       37.50    −7.59    42.16     24.02

- Most errors are already made at the classification stage, which harms the subsequent drum transcription
- Results are not comparable with other work, as there is no publicly available data set
- Remaining research questions (among others):
  - What is the optimal feature subset?
  - What are the optimal thresholds for peak-picking?
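The threshold question can be made concrete with a small sketch. The poster does not describe its peak-picking algorithm, so the rule below (a local maximum of a component's gain that exceeds a fixed threshold) is an assumption chosen for illustration, and `pick_peaks`, `threshold` and `min_distance` are hypothetical names.

```python
import numpy as np

def pick_peaks(gain, threshold, min_distance=1):
    """Return frame indices of local maxima of `gain` that exceed `threshold`."""
    peaks = []
    for t in range(1, len(gain) - 1):
        is_local_max = gain[t] > gain[t - 1] and gain[t] >= gain[t + 1]
        if is_local_max and gain[t] > threshold:
            # Suppress peaks that follow the previous accepted peak too closely.
            if not peaks or t - peaks[-1] >= min_distance:
                peaks.append(t)
    return peaks

# Toy gain curve of one drum component (made-up values).
gain = np.array([0.0, 0.1, 0.9, 0.2, 0.1, 0.4, 0.1, 0.0, 1.0, 0.3])
low = pick_peaks(gain, threshold=0.3)   # also accepts the weak peak at frame 5
high = pick_peaks(gain, threshold=0.5)  # keeps only the two strong peaks
```

A lower threshold reports more onsets and a higher one reports fewer, so the threshold directly trades missed onsets against spurious ones.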

Acknowledgement

Helmut Schonleitner of the cultural center AKKU (http://www.akku-steyr.at) provided the multichannel recordings that have been used to evaluate our algorithm. The Austrian Research Institute for Artificial Intelligence acknowledges support from the ministries BMUKK and BMVIT.

System

Features

spectral features               temporal features
spectral centroid               temporal centroid
spectral kurtosis               temporal kurtosis
spectral skewness               temporal skewness
spectral rolloff                crest factor
spectral flatness               peak time
spectral contrast               peak fluctuation
noise likeness                  percussiveness
standard deviation              periodicity
10 MFCCs
20 dynamic MFCCs (mean+std)
20 dynamic ∆MFCCs (mean+std)
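A few of the listed features can be sketched using their common textbook definitions. The poster does not give its exact formulas, so the definitions and function names below are assumptions: spectral features are computed from a component's spectrum and temporal features from its gain curve.

```python
import numpy as np

def spectral_centroid(spectrum, freqs):
    """Magnitude-weighted mean frequency of a component spectrum."""
    return float(np.sum(freqs * spectrum) / np.sum(spectrum))

def spectral_flatness(spectrum, eps=1e-12):
    """Geometric mean divided by arithmetic mean; close to 1 for a flat spectrum."""
    s = spectrum + eps
    return float(np.exp(np.mean(np.log(s))) / np.mean(s))

def temporal_centroid(gain, times):
    """Gain-weighted mean time of a component's gain curve."""
    return float(np.sum(times * gain) / np.sum(gain))

def crest_factor(gain):
    """Peak value divided by the root-mean-square value of the gain curve."""
    return float(np.max(np.abs(gain)) / np.sqrt(np.mean(gain ** 2)))

# Toy component: all spectral energy in one bin, a single symmetric gain bump.
freqs = np.array([0.0, 100.0, 200.0, 300.0, 400.0])
spectrum = np.array([0.0, 0.0, 0.0, 2.0, 0.0])
times = np.array([0.0, 1.0, 2.0])
gain = np.array([0.0, 1.0, 0.0])

centroid_hz = spectral_centroid(spectrum, freqs)
centroid_t = temporal_centroid(gain, times)
```

Stacking such values for every NMF component gives one feature vector per component, which is the input the classifier works on.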

The NMF algorithm

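One short-time spectrum vector x(t) is modelled as a sum of c components, each having a constant spectrum S_i and a time-varying gain A_i(t):

$$x(t) \approx \sum_{i=1}^{c} S_i A_i(t) \quad \text{or} \quad X \approx SA.$$

With X of size f × t, S of size f × c and A of size c × t, the components are estimated using the update rules

$$S = S \mathbin{.*} \frac{(X ./ SA)\,A^{T}}{\mathbf{1}A^{T}} \quad \text{and} \quad A = A \mathbin{.*} \frac{S^{T}(X ./ SA)}{S^{T}\mathbf{1}},$$

where .* and ./ denote element-wise multiplication and division, the fractions are also taken element-wise, and 1 is an all-ones matrix of the same size as X.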

This is a suitable representation for drum instruments, because their spectra don't change over time.
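The multiplicative updates can be sketched in a few lines of NumPy. This is an illustrative version of the divergence-based update rules of Lee and Seung for X ≈ SA, not the authors' code: the random initialisation, the fixed iteration count, the small constant `eps` that guards against division by zero, and the function names are assumptions.

```python
import numpy as np

def nmf_kl(X, n_components, n_iter=200, eps=1e-12, seed=0):
    """Factorise a non-negative X (f x t) into S (f x c) and A (c x t)."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = X.shape
    S = rng.random((n_freq, n_components)) + eps
    A = rng.random((n_components, n_frames)) + eps
    ones = np.ones_like(X)
    for _ in range(n_iter):
        # S = S .* ((X ./ SA) A^T) ./ (1 A^T)
        S *= ((X / (S @ A + eps)) @ A.T) / (ones @ A.T + eps)
        # A = A .* (S^T (X ./ SA)) ./ (S^T 1)
        A *= (S.T @ (X / (S @ A + eps))) / (S.T @ ones + eps)
    return S, A

def divergence(X, Y, eps=1e-12):
    """Generalised Kullback-Leibler divergence between X and its model Y."""
    return float(np.sum(X * np.log((X + eps) / (Y + eps)) - X + Y))

# Toy spectrogram built from two known components.
rng = np.random.default_rng(1)
X = rng.random((64, 2)) @ rng.random((2, 100))
S, A = nmf_kl(X, n_components=2)
S_first, A_first = nmf_kl(X, n_components=2, n_iter=1)
```

Because the updates only multiply by non-negative factors, `S` and `A` stay non-negative throughout, and the divergence after 200 iterations is lower than after a single iteration from the same starting point.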

The classifier

- One SVM for classes drum/non-drum, 2580 feature vectors

- One SVM for classes BD, SD, HH (bass drum, snare drum, hi-hat), 3145 feature vectors

- Implemented in WEKA (www.cs.waikato.ac.nz/ml/weka/)

- Training data: ENST-Drums (perso.enst.fr/~gillet/ENST-drums/) and various drum samples

- Cross-validation results inside training set: 86.28% and 92.94%
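How the two classifiers might be combined can be sketched as a cascade. The poster does not state how the two SVMs are chained, so applying the second classifier only to components that the first one accepts as drums is an assumption. The two toy decision functions are hypothetical stand-ins for the trained SVMs, with made-up thresholds.

```python
from typing import Callable, List, Optional, Sequence

FeatureVector = Sequence[float]

def label_components(
    feature_vectors: Sequence[FeatureVector],
    is_drum: Callable[[FeatureVector], bool],
    drum_class: Callable[[FeatureVector], str],
) -> List[Optional[str]]:
    """Label each component as BD, SD or HH, or None for non-drum components."""
    return [drum_class(v) if is_drum(v) else None for v in feature_vectors]

# Hypothetical stand-ins for the two trained SVMs. Each feature vector here is
# a (percussiveness, spectral centroid in Hz) pair; the thresholds are made up.
def toy_is_drum(v: FeatureVector) -> bool:
    return v[0] > 0.5

def toy_drum_class(v: FeatureVector) -> str:
    if v[1] < 200.0:
        return "BD"
    if v[1] < 3000.0:
        return "SD"
    return "HH"

components = [(0.9, 80.0), (0.2, 1000.0), (0.8, 6000.0), (0.7, 900.0)]
labels = label_components(components, toy_is_drum, toy_drum_class)
```

In such a cascade, a component rejected by the first stage never reaches the second, so an error at the drum/non-drum stage cannot be corrected later.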

Selected References

S. Dixon. Onset detection revisited. In Proc. of the DAFx, pages 133–137, Montreal, Quebec, Canada, Sept. 18–20, 2006.

O. Gillet and G. Richard. ENST-Drums: an extensive audio-visual database for drum signals processing. In Proceedings of the 7th International Conference on Music Information Retrieval, pages 156–159, Victoria, BC, Canada, October 2006.

M. Helen and T. Virtanen. Separation of drums from polyphonic music using non-negative matrix factorization and support vector machine. In Proc. of the 13th EUSIPCO, Antalya, Turkey, September 2005.

D. D. Lee and H. S. Seung. Algorithms for non-negative matrix factorization. In NIPS, pages 556–562, 2000.

J. Paulus and T. Virtanen. Drum transcription with non-negative spectrogram factorisation. In Proc. of the 13th EUSIPCO, Antalya, Turkey, September 2005.

K. Tanghe, S. Degroeve, and B. De Baets. An algorithm for detecting and labeling drum events in polyphonic music. In Proc. of the first MIREX, London, UK, September 11-15, 2005.

C. Uhle, C. Dittmar, and T. Sporer. Extraction of drum tracks from polyphonic music using independent subspace analysis. In Proc. of the 4th ICA, Nara, Japan, April 2003.

a.moreau@gmx.net arthur.flexer@ofai.at
