Music Boundary Detection
Goal: Make a DNN detect the boundaries of popular music
Motivation: Boundary detection is a basic-level task of music structure analysis
Jeongmin Liu, Keyongseok Jeong, Hyunjun Lee
Dataset
- SALAMI [1] v2.0 for training (373 songs) and validation (66 songs)
- Our own dataset (23 songs) for testing
Input: Mel Spectrogram
- 2048-point FFT with 50% overlap
- No. of mel bins: 128
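The framing parameters above fix the hop size and hence the time resolution of the input. A minimal sketch of that arithmetic follows; the 44.1 kHz sample rate and the no-padding framing are assumptions, since the poster states neither.

```python
N_FFT = 2048         # from the poster
OVERLAP = 0.5        # 50% overlap, from the poster
SAMPLE_RATE = 44100  # ASSUMPTION: the poster does not state the sample rate


def hop_length(n_fft: int = N_FFT, overlap: float = OVERLAP) -> int:
    """Number of samples between the starts of consecutive frames."""
    return int(round(n_fft * (1.0 - overlap)))


def num_frames(num_samples: int, n_fft: int = N_FFT, overlap: float = OVERLAP) -> int:
    """Number of full frames, assuming no padding at the edges (ASSUMPTION)."""
    if num_samples < n_fft:
        return 0
    return 1 + (num_samples - n_fft) // hop_length(n_fft, overlap)


def frame_period_seconds(sample_rate: int = SAMPLE_RATE) -> float:
    """Time step between consecutive spectrogram frames, in seconds."""
    return hop_length() / sample_rate
```

Under the assumed sample rate, one frame step is 1024 / 44100 s, roughly 23 ms.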
Output: Boundary Score with kernel [2]
- Gaussian kernel size: 31
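One way to build such a target is to place a Gaussian of 31 frames around each annotated boundary frame. The sketch below is illustrative only: the value of `SIGMA` and the choice to take the maximum where kernels overlap are assumptions, not details given on the poster.

```python
import math

KERNEL_SIZE = 31  # from the poster (in frames)
SIGMA = 5.0       # ASSUMPTION: the poster does not give the kernel width


def gaussian_kernel(size: int = KERNEL_SIZE, sigma: float = SIGMA) -> list:
    """Unnormalized Gaussian with peak value 1.0 at the center."""
    half = size // 2
    return [math.exp(-0.5 * ((i - half) / sigma) ** 2) for i in range(size)]


def boundary_score(num_frames: int, boundary_frames, size: int = KERNEL_SIZE,
                   sigma: float = SIGMA) -> list:
    """Frame-wise target: 1.0 at each boundary, decaying smoothly around it."""
    kernel = gaussian_kernel(size, sigma)
    half = size // 2
    score = [0.0] * num_frames
    for b in boundary_frames:
        for k, weight in enumerate(kernel):
            t = b - half + k
            if 0 <= t < num_frames:
                # Where two kernels overlap, keep the larger value (ASSUMPTION).
                score[t] = max(score[t], weight)
    return score
```

For example, `boundary_score(100, [50])` is 1.0 at frame 50, nonzero on frames 35 to 65, and 0.0 elsewhere.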
Augmentation
- All possible combinations of the following are applied:
  - Pitch-shifting by +1, -1, or 0 steps
  - -24 dBFS noise or clean
  - SpecAugment [3] or no SpecAugment
- (SpecAugment: randomly chosen mel bins (up to 15 bins) are removed.)
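The combinations above amount to 3 × 2 × 2 = 12 variants per song. The sketch below enumerates them and shows one possible form of the mel-bin removal; masking a contiguous band and setting it to zero are assumptions borrowed from SpecAugment-style frequency masking, and all names are hypothetical.

```python
import itertools
import random

PITCH_STEPS = (+1, -1, 0)        # from the poster
NOISE_LEVELS_DBFS = (-24, None)  # None means clean (no added noise)
SPEC_AUGMENT = (True, False)
MAX_MASKED_BINS = 15             # from the poster


def augmentation_configs() -> list:
    """Every (pitch_step, noise_dbfs, use_spec_augment) combination."""
    return list(itertools.product(PITCH_STEPS, NOISE_LEVELS_DBFS, SPEC_AUGMENT))


def mask_mel_bins(mel, max_bins: int = MAX_MASKED_BINS, rng=None) -> list:
    """Return a copy of `mel` (one row per mel bin) with a random band zeroed.

    ASSUMPTION: the removed bins form one contiguous band and are set to 0.0.
    """
    rng = rng or random.Random()
    n_mels = len(mel)
    width = rng.randint(0, min(max_bins, n_mels))
    start = rng.randint(0, n_mels - width)
    return [
        [0.0] * len(row) if start <= i < start + width else list(row)
        for i, row in enumerate(mel)
    ]
```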
Boundary Score Example
DNN
Ideas
Modifying UNet [4]
- Global average pooling layer
- Summing skipped features instead of concatenating them
- Convolution kernels that are long along the time axis
Weighting the loss function to counter data imbalance
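Boundary frames are far rarer than non-boundary frames, so an unweighted loss is dominated by the negative class. The poster does not say which loss or weighting was used; the sketch below assumes a binary cross-entropy with a positive-class weight, which is one common choice.

```python
import math


def weighted_bce(predictions, targets, pos_weight: float, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy with the positive term scaled by `pos_weight`.

    ASSUMPTION: this loss form is illustrative, not the poster's exact loss.
    """
    total = 0.0
    for p, y in zip(predictions, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= pos_weight * y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total / len(targets)


def pos_weight_from_targets(targets) -> float:
    """Ratio of negative to positive mass, a simple balancing heuristic."""
    pos = sum(targets)
    neg = len(targets) - pos
    return neg / pos if pos > 0 else 1.0
```

With one boundary frame in four, `pos_weight_from_targets([1, 0, 0, 0])` gives 3.0, so an error on the boundary frame counts three times as much as an error elsewhere.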
Multiple annotations [5]
- Some songs in SALAMI have two different annotations.
- At every training epoch, one annotation is randomly chosen (unlike [5]).
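The per-epoch choice can be sketched as follows. The song IDs and boundary times are made up for illustration, and `pick_annotations` is a hypothetical helper name.

```python
import random


def pick_annotations(song_annotations: dict, rng: random.Random) -> dict:
    """For each song, pick one of its available annotations at random."""
    return {song: rng.choice(options) for song, options in song_annotations.items()}


# Hypothetical data: boundary times in seconds, one list per annotator.
annotations = {
    "song_a": [[0.0, 15.2, 48.9], [0.0, 15.0, 31.7, 48.9]],  # two annotators
    "song_b": [[0.0, 22.4, 61.0]],                           # one annotator
}

rng = random.Random(0)
for epoch in range(3):
    chosen = pick_annotations(annotations, rng)  # redrawn at every epoch
```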
Peak-Picking Algorithm
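The text of the poster does not describe the peak-picking rule. As a generic illustration only, a common approach keeps frames that exceed a threshold and are the maximum within a local window; the `threshold` and `window` values below are assumptions.

```python
def pick_peaks(score, threshold: float = 0.5, window: int = 15) -> list:
    """Indices of frames that are local maxima of `score` above `threshold`.

    ASSUMPTION: a generic rule, not necessarily the one used on the poster.
    """
    peaks = []
    for t, value in enumerate(score):
        if value < threshold:
            continue
        lo = max(0, t - window)
        hi = min(len(score), t + window + 1)
        neighborhood = score[lo:hi]
        # On a plateau of equal maxima, keep only the first frame.
        if value == max(neighborhood) and neighborhood.index(value) == t - lo:
            peaks.append(t)
    return peaks
```

Multiplying each returned frame index by the frame period converts the peaks to boundary times in seconds.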
Results (Take The Dive by 종현)
Results
[Figure: mean, max, and min F1 scores at tolerances of ±0.5 sec [6] and ±3.0 sec [7]; one label in the figure reads "caused by incomplete bars".]
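The tolerance-based F1 score counts an estimated boundary as a hit when it lies within the tolerance of a reference boundary, with each boundary used at most once. A minimal sketch, assuming boundary times in seconds and a simple in-order one-to-one matching (evaluation toolkits may match differently):

```python
def boundary_f1(reference, estimated, tolerance: float) -> float:
    """F1 score of estimated boundaries against reference boundaries."""
    ref = sorted(reference)
    est = sorted(estimated)
    i = j = hits = 0
    # Walk both sorted lists, pairing boundaries that fall within the tolerance.
    while i < len(ref) and j < len(est):
        if abs(ref[i] - est[j]) <= tolerance:
            hits += 1
            i += 1
            j += 1
        elif est[j] < ref[i]:
            j += 1
        else:
            i += 1
    precision = hits / len(est) if est else 0.0
    recall = hits / len(ref) if ref else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Made-up boundary times (seconds) for illustration.
ref_times = [10.0, 30.0, 60.0]
est_times = [10.3, 32.0, 75.0]
f1_strict = boundary_f1(ref_times, est_times, 0.5)  # one hit out of three
f1_loose = boundary_f1(ref_times, est_times, 3.0)   # two hits out of three
```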
Discussion
Ambiguous boundaries are caused by incomplete bars.
- e.g., 김연자 - 아모르 파티
In many cases, the DNN finds fine-grained structure.
The DNN cannot catch a smooth change of musical idea.
- e.g., 자우림 - 狂犬時代
References
[1] J. B. L. Smith, J. A. Burgoyne, I. Fujinaga, D. De Roure, and J. S. Downie, “Design and creation of a large-scale database of structural annotations,” in ISMIR 2011, 2011, vol. 11, pp. 555–560.
[2] K. Ullrich, J. Schlüter, and T. Grill, “Boundary Detection in Music Structure Analysis Using Convolutional Neural Networks,” in ISMIR 2014, 2014, pp. 417–422.
[3] D. S. Park et al., “SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition,” arXiv preprint arXiv:1904.08779, 2019.
[4] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional Networks for Biomedical Image Segmentation,” in Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015), 2015, vol. 9351, pp. 234–241.
[5] T. Grill and J. Schlüter, “Music boundary detection using neural networks on spectrograms and self-similarity lag matrices,” in EUSIPCO 2015, 2015, pp. 1296–1300.
[6] D. Turnbull, G. Lanckriet, E. Pampalk, and M. Goto, “A Supervised Approach for Detecting Boundaries in Music Using Difference Features and Boosting,” in ISMIR 2007, 2007, pp. 51–54.
[7] M. Levy and M. Sandler, “Structural Segmentation of Musical Audio by Constrained Clustering,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, no. 2, pp. 318–326, Feb. 2008.