Drum transcription in polyphonic music using non-negative matrix factorization
Arnaud Moreau (2), Arthur Flexer (1,2)
(1) Institute of Medical Cybernetics and Artificial Intelligence, Center for Brain Research, Medical University of Vienna, Austria
(2) The Austrian Research Institute for Artificial Intelligence, Freyung 6/6, A-1010 Vienna, Austria
Introduction
- Prerequisite for genre classification or beat/meter detection
- Transcription is more difficult in polyphonic music
- Source-separation-based system
- Extension of the work by Helen and Virtanen from drum/non-drum classification and separation to full polyphonic drum transcription
Overview
[Figure: block diagram of the system. Block labels: input signal, drum samples, STFT ([X]_{f,t}), NMF separation ([S]_{f,c}, [A]_{c,t}), feature extraction, SVM classification, peak picking, transcription.]
- Input audio is divided into 5-second excerpts
- Magnitude spectrogram representation (window size 2048, hop size 512)
- Non-negative matrix factorisation (NMF) algorithm gives source spectra and time-varying gains of c components
- The c components are classified by a Support Vector Machine (SVM)
- Peak-picking algorithm
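The first two steps of this pipeline can be sketched as follows. This is a minimal illustration, not the authors' implementation: the window size (2048) and hop size (512) come from the poster, while the Hann window, the 44.1 kHz sample rate in the usage note, and the function names `split_into_excerpts` and `magnitude_spectrogram` are assumptions made for the example.

```python
import numpy as np


def split_into_excerpts(signal, sample_rate, excerpt_seconds=5.0):
    """Cut a signal into consecutive excerpts; the last one may be shorter."""
    step = int(round(excerpt_seconds * sample_rate))
    return [signal[i:i + step] for i in range(0, len(signal), step)]


def magnitude_spectrogram(signal, window_size=2048, hop_size=512):
    """Return the magnitude spectrogram X with shape (frequency bins, frames)."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < window_size:
        # Zero-pad very short inputs so that at least one frame exists.
        signal = np.pad(signal, (0, window_size - len(signal)))
    window = np.hanning(window_size)  # assumption: the poster names no window type
    n_frames = 1 + (len(signal) - window_size) // hop_size
    frames = np.stack([
        signal[i * hop_size:i * hop_size + window_size] * window
        for i in range(n_frames)
    ])
    # rfft keeps the window_size // 2 + 1 non-negative frequency bins.
    return np.abs(np.fft.rfft(frames, axis=1)).T
```

Under the assumed sample rate of 44.1 kHz, a 5-second excerpt has 220500 samples, so `magnitude_spectrogram` returns a matrix with 1025 frequency bins and 427 frames. That matrix plays the role of X in the NMF step.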
Results and Discussion
The algorithm is evaluated on 60-second excerpts from 4 multi-channel recordings, which are labelled manually and contain a total of 1019 drum onsets.
                      BD       SD       HH     mean
Song 1, 242 onsets
  Rp%              88.66    54.93    41.03    60.85
  Rr%              93.33    63.64    98.97    85.31
  Rh%              78.89     5.45   −43.30    13.68
Song 2, 224 onsets
  Rp%              33.33    69.57    34.76    45.89
  Rr%              13.75    69.05    93.33    58.71
  Rh%             −11.25    35.71  −135.00   −36.85
Song 3, 206 onsets
  Rp%              36.84    50.68    81.77    56.43
  Rr%              20.51    88.24    99.25    69.33
  Rh%             −10.26   −17.65    74.44    15.51
Song 4, 347 onsets
  Rp%              80.00    31.25    76.63    62.63
  Rr%              50.00     6.33    63.24    39.85
  Rh%              37.50    −7.59    42.16    24.02
- Most errors are already made at the classification stage, which harms the subsequent drum transcription
- Results are not comparable with other work: there is no publicly available data set
- Remaining research questions (among others):
  - What is the optimal feature subset?
  - What are the optimal thresholds for peak-picking?
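To make the threshold question concrete, here is a generic thresholded local-maximum picker applied to one row of the gain matrix A. The poster does not describe its peak-picking algorithm, so this is only an illustrative sketch. The function name `pick_peaks` and the parameters `threshold` and `min_distance` are hypothetical.

```python
import numpy as np


def pick_peaks(gain, threshold, min_distance=1):
    """Return frame indices of local maxima in `gain` that reach `threshold`.

    Peaks closer than `min_distance` frames to the previously accepted peak
    are skipped, so the earlier peak wins (not necessarily the larger one).
    """
    gain = np.asarray(gain, dtype=float)
    peaks = []
    for t in range(1, len(gain) - 1):
        is_local_max = gain[t] > gain[t - 1] and gain[t] >= gain[t + 1]
        if is_local_max and gain[t] >= threshold:
            if not peaks or t - peaks[-1] >= min_distance:
                peaks.append(t)
    return peaks
```

A frame index t corresponds to an onset time of t * hop_size / sample_rate seconds. Lowering `threshold` trades missed onsets for false detections, which is why its choice matters for the results above.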
Acknowledgement
Helmut Schonleitner of the cultural center AKKU (http://www.akku-steyr.at) provided the multichannel recordings that have been used to evaluate our algorithm. The Austrian Research Institute for Artificial Intelligence acknowledges support from the ministries BMUKK and BMVIT.
System
Features
spectral features       temporal features
spectral centroid       temporal centroid
spectral kurtosis       temporal kurtosis
spectral skewness       temporal skewness
spectral rolloff        crest factor
spectral flatness       peak time
spectral contrast       peak fluctuation
noise likeness          percussiveness
standard deviation      periodicity
10 MFCCs
20 dynamic MFCCs (mean+std)
20 dynamic ∆MFCCs (mean+std)
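Three of the listed features can be sketched as follows, computed on one NMF component: the spectral features on a column of S, the temporal features on a row of A. The poster gives no formulas, so the common textbook definitions used here are an assumption, as are the function names and the default window and hop sizes.

```python
import numpy as np


def spectral_centroid(spectrum, sample_rate, window_size=2048):
    """Magnitude-weighted mean frequency (Hz) of one component spectrum."""
    spectrum = np.asarray(spectrum, dtype=float)
    freqs = np.arange(len(spectrum)) * sample_rate / window_size
    return float(np.sum(freqs * spectrum) / np.sum(spectrum))


def temporal_centroid(gain, sample_rate, hop_size=512):
    """Gain-weighted mean time (seconds) of one component's gain curve."""
    gain = np.asarray(gain, dtype=float)
    times = np.arange(len(gain)) * hop_size / sample_rate
    return float(np.sum(times * gain) / np.sum(gain))


def crest_factor(gain):
    """Ratio of the peak value to the RMS value of the gain curve."""
    gain = np.asarray(gain, dtype=float)
    return float(np.max(np.abs(gain)) / np.sqrt(np.mean(gain ** 2)))
```

For example, a gain curve of `[0, 0, 1, 0]` has a crest factor of 2, while a constant gain has a crest factor of 1. A high crest factor therefore indicates an impulsive gain, which is typical of a drum hit.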
The NMF algorithm
One short-time spectrum vector $x(t)$ is modelled as a sum of $c$ components, each having a constant spectrum $S_i$ and a time-varying gain $A_i(t)$:

\[ x(t) \approx \sum_{i=1}^{c} S_i A_i(t) \quad \text{or} \quad X \approx SA. \]
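The components are estimated using the update rules

\[ S = S \mathbin{.*} \frac{(X \mathbin{./} SA)\, A^{T}}{\mathbf{1} A^{T}} \]

and

\[ A = A \mathbin{.*} \frac{S^{T} (X \mathbin{./} SA)}{S^{T} \mathbf{1}}, \]

where $.*$ and $./$ (and the fraction bars) denote element-wise multiplication and division, and $\mathbf{1}$ is an all-ones matrix of the same size as $X$.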
This is a suitable representation for drum instruments, because their spectra don't change over time.
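The multiplicative updates can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' code: the random initialisation, the fixed iteration count, the small constant `eps` that guards against division by zero, and the names `nmf_kl` and `kl_divergence` are all choices made for the example.

```python
import numpy as np


def nmf_kl(X, n_components, n_iter=200, eps=1e-12, seed=0):
    """Factorise a non-negative matrix X (freq x frames) as X ~ S @ A."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n_freq, n_frames = X.shape
    S = rng.random((n_freq, n_components)) + eps    # component spectra
    A = rng.random((n_components, n_frames)) + eps  # time-varying gains
    ones = np.ones_like(X)
    for _ in range(n_iter):
        # S = S .* ((X ./ SA) A^T) ./ (1 A^T)
        S *= ((X / (S @ A + eps)) @ A.T) / (ones @ A.T + eps)
        # A = A .* (S^T (X ./ SA)) ./ (S^T 1)
        A *= (S.T @ (X / (S @ A + eps))) / (S.T @ ones + eps)
    return S, A


def kl_divergence(X, Y, eps=1e-12):
    """Generalised Kullback-Leibler divergence between X and its model Y."""
    X = np.asarray(X, dtype=float)
    return float(np.sum(X * np.log((X + eps) / (Y + eps)) - X + Y))
```

Because the updates only multiply by non-negative factors, S and A stay non-negative when initialised that way. Calling `kl_divergence(X, S @ A)` after different iteration counts is a simple way to check that the fit improves.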
The classifier
- One SVM for classes drum/non-drum, 2580 feature vectors
- One SVM for classes BD, SD, HH, 3145 feature vectors
- Implemented in WEKA (www.cs.waikato.ac.nz/ml/weka/)
- Training data: ENST-Drums (perso.enst.fr/~gillet/ENST-drums/) and various drum samples
- Cross-validation results inside training set: 86.28% and 92.94%
Selected References
S. Dixon. Onset detection revisited. In Proc. of the DAFx, pages 133–137, Montreal, Quebec, Canada, Sept. 18–20, 2006.
O. Gillet and G. Richard. ENST-Drums: an extensive audio-visual database for drum signals processing. In Proceedings of the 7th International Conference on Music Information Retrieval, pages 156–159, Victoria, BC, Canada, October 2006.
M. Helen and T. Virtanen. Separation of drums from polyphonic music using non-negative matrix factorization and support vector machine. In Proc. of the 13th EUSIPCO, Antalya, Turkey, September 2005.
D. D. Lee and H. S. Seung. Algorithms for non-negative matrix factorization. In NIPS, pages 556–562, 2000.
J. Paulus and T. Virtanen. Drum transcription with non-negative spectrogram factorisation. In Proc. of the 13th EUSIPCO, Antalya, Turkey, September 2005.
K. Tanghe, S. Degroeve, and B. De Baets. An algorithm for detecting and labeling drum events in polyphonic music. In Proc. of the first MIREX, London, UK, September 11–15, 2005.
C. Uhle, C. Dittmar, and T. Sporer. Extraction of drum tracks from polyphonic music using independent subspace analysis. In Proc. of the 4th ICA, Nara, Japan, April 2003.