Jeongmin Liu, Keyongseok Jeong, Hyunjun Lee Motivation: Basic …juhan/gct634/2019/finals/MUSIC... · 2019. 6. 23. · Motivation: Basic level task of structure analysis Jeongmin

Music Boundary Detection

Goal: Make a DNN detect the boundaries of popular music

Motivation: Boundary detection is a basic-level task of music structure analysis.

Jeongmin Liu, Keyongseok Jeong, Hyunjun Lee

Dataset

- SALAMI [1] v2.0 for training (373 songs) and validation (66 songs)

- Our own dataset (23 songs) for testing

Input: Mel Spectrogram

- 2048-point FFT with 50% overlap
- Number of mel bins: 128
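A minimal NumPy sketch of this input representation follows. The FFT size, the 50% overlap (a hop of 1024 samples), and the 128 mel bins come from the poster; the 22050 Hz sampling rate, the Hann window, the HTK-style mel formula, and the log compression are assumptions, since the poster does not state them.

```python
import numpy as np

N_FFT = 2048      # 2048-point FFT (from the poster)
HOP = N_FFT // 2  # 50% overlap -> hop of 1024 samples
N_MELS = 128      # number of mel bins (from the poster)
SR = 22050        # ASSUMPTION: sampling rate is not stated in the poster

def hz_to_mel(f):
    # ASSUMPTION: HTK-style mel scale; other mel formulas exist
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr=SR, n_fft=N_FFT, n_mels=N_MELS):
    """Triangular filters mapping the n_fft // 2 + 1 FFT bins onto n_mels mel bins."""
    fft_freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(y, sr=SR):
    """(N_MELS, n_frames) log mel power spectrogram of a mono signal y (no edge padding)."""
    window = np.hanning(N_FFT)               # ASSUMPTION: Hann window
    n_frames = 1 + (len(y) - N_FFT) // HOP   # only full frames; y must be >= N_FFT samples
    frames = np.stack([y[i * HOP:i * HOP + N_FFT] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=N_FFT, axis=1)) ** 2  # (n_frames, N_FFT // 2 + 1)
    mel = mel_filterbank(sr) @ power.T                          # (N_MELS, n_frames)
    return np.log(mel + 1e-10)  # ASSUMPTION: log compression with a small floor

y = np.sin(2 * np.pi * 440.0 * np.arange(SR) / SR)  # 1 s test tone
S = log_mel_spectrogram(y)                           # shape (128, 20)
```

In practice a library such as librosa (`librosa.feature.melspectrogram` with `n_fft=2048, hop_length=1024, n_mels=128`) would usually be used; its defaults (centered frames, Slaney-style mel filters) differ from this sketch, which only spells out the steps.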

Output: Boundary score (annotated boundaries smoothed with a Gaussian kernel, as in [2])

- Gaussian kernel size: 31
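The target can be sketched as follows: annotated boundary times are converted to frame indices, set to 1, and convolved with a 31-point Gaussian kernel. Reading the kernel size as frames, the standard deviation (size / 6), the 22050 Hz sampling rate, and clipping overlapping peaks at 1 are assumptions for illustration; `boundary_score` is a hypothetical helper.

```python
import numpy as np

KERNEL_SIZE = 31           # Gaussian kernel size from the poster (read here as frames)
SIGMA = KERNEL_SIZE / 6.0  # ASSUMPTION: the standard deviation is not given in the poster
SR, HOP = 22050, 1024      # ASSUMPTION: sampling rate; hop = 50% of a 2048-point FFT

def gaussian_kernel(size=KERNEL_SIZE, sigma=SIGMA):
    t = np.arange(size) - (size - 1) / 2  # offsets symmetric around 0
    k = np.exp(-0.5 * (t / sigma) ** 2)
    return k / k.max()                    # peak of 1 at the boundary frame itself

def boundary_score(boundary_times, n_frames, sr=SR, hop=HOP):
    """Hypothetical helper: 1 at annotated boundary frames, smeared by the Gaussian kernel."""
    impulses = np.zeros(n_frames)
    frames = np.round(np.asarray(boundary_times) * sr / hop).astype(int)
    impulses[frames[(frames >= 0) & (frames < n_frames)]] = 1.0
    kernel = gaussian_kernel()
    half = len(kernel) // 2
    smoothed = np.convolve(impulses, kernel)[half:half + n_frames]  # centered, length n_frames
    return np.minimum(smoothed, 1.0)  # nearby boundaries must not push the score above 1

score = boundary_score([10 * HOP / SR], n_frames=100)  # one boundary at frame 10
```

Smearing the labels this way rewards predictions that land a few frames off the annotated boundary instead of treating them as plain misses.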

Augmentation

- All possible combinations of the following are applied:

- Pitch shifting by +1, -1, or 0 steps
- -24 dBFS noise, or clean
- SpecAugment [3], or no SpecAugment

- (SpecAugment: up to 15 randomly chosen mel bins are removed.)
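Applying every combination gives 3 × 2 × 2 = 12 versions of each training song. A hedged NumPy sketch of the grid and of the noise and masking operations follows; white Gaussian noise, an RMS-based reading of -24 dBFS, and a contiguous, zero-filled mask (as in SpecAugment's frequency masking) are assumptions. Pitch shifting is left out because it needs a resampling library; for instance, `librosa.effects.pitch_shift(y, sr=sr, n_steps=step)` shifts by semitones by default.

```python
import itertools
import numpy as np

PITCH_STEPS = (+1, -1, 0)        # pitch-shift steps listed in the poster
NOISE_LEVELS = (-24.0, None)     # noise at -24 dBFS, or clean (None)
USE_SPECAUGMENT = (True, False)  # with or without SpecAugment
MAX_MASKED_BINS = 15             # up to 15 mel bins are removed

# Every combination of the three options: 3 * 2 * 2 = 12 variants per song.
AUGMENTATIONS = list(itertools.product(PITCH_STEPS, NOISE_LEVELS, USE_SPECAUGMENT))

def add_noise(y, level_dbfs, rng):
    """Adds white Gaussian noise with an RMS of level_dbfs relative to full scale (1.0).
    ASSUMPTION: white noise and an RMS-based dBFS reading."""
    rms = 10.0 ** (level_dbfs / 20.0)
    return y + rng.normal(0.0, rms, size=y.shape)

def freq_mask(mel, rng, max_bins=MAX_MASKED_BINS):
    """Zeroes a random band of 0..max_bins consecutive mel bins in a (n_mels, n_frames) array.
    ASSUMPTION: a contiguous, zero-filled band as in SpecAugment's frequency masking
    (zero fill suits normalized features)."""
    out = mel.copy()
    width = int(rng.integers(0, max_bins + 1))
    start = int(rng.integers(0, mel.shape[0] - width + 1))
    out[start:start + width, :] = 0.0
    return out
```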

Boundary Score Example

DNN

Ideas

Modified U-Net [4]

- A global average pooling layer
- Skip-connection features are summed instead of concatenated
- Convolution kernels with a long time-axis length
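The three U-Net changes can be illustrated on plain NumPy arrays shaped (channels, frequency, time). This is a sketch, not the authors' network: the 3 × 15 kernel in the usage lines is a hypothetical example of a kernel that is long along the time axis, and pooling over the frequency axis is an assumption about where the global average pooling is applied.

```python
import numpy as np

def conv2d_same(x, w):
    """'Same'-padded 2-D cross-correlation (what DNN libraries call convolution).
    x: (C_in, F, T) feature map; w: (C_out, C_in, kf, kt) kernel with odd kf and kt."""
    _, _, kf, kt = w.shape
    n_freq, n_time = x.shape[1], x.shape[2]
    xp = np.pad(x, ((0, 0), (kf // 2, kf // 2), (kt // 2, kt // 2)))  # zero padding
    y = np.zeros((w.shape[0], n_freq, n_time))
    for i in range(kf):
        for j in range(kt):
            # Add the contribution of kernel tap (i, j) at every output position.
            y += np.einsum("oc,cft->oft", w[:, :, i, j], xp[:, i:i + n_freq, j:j + n_time])
    return y

def merge_skip(decoder_feat, skip_feat, mode="sum"):
    """Combines a decoder feature map with the encoder map passed over the skip connection."""
    if mode == "sum":
        return decoder_feat + skip_feat                       # channel count unchanged
    return np.concatenate([decoder_feat, skip_feat], axis=0)  # original U-Net: channels add up

def global_avg_pool_freq(x):
    """(C, F, T) -> (C, T): one value per channel and frame.
    ASSUMPTION: pooling over frequency; the poster does not say which axis is pooled."""
    return x.mean(axis=1)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 100))          # 8 channels, 16 frequency bins, 100 frames
w = 0.1 * rng.standard_normal((8, 8, 3, 15))   # hypothetical 3 x 15 (freq x time) kernel
y = conv2d_same(x, w)
merged = merge_skip(y, x)                      # shape (8, 16, 100)
frame_features = global_avg_pool_freq(merged)  # shape (8, 100)
```

Summing keeps the decoder's channel count fixed, whereas concatenation (the original U-Net design) doubles it, so the following convolution would need twice as many input channels.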

Weighted loss function to counter data imbalance (boundary frames are far fewer than non-boundary frames)
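One way to weight the loss is a binary cross-entropy whose boundary (positive) term is scaled up. The sketch below assumes that form and sets the weight to the ratio of non-boundary to boundary mass in the target; the poster does not say which weighting it uses, so both functions are hypothetical.

```python
import numpy as np

def positive_weight(target, eps=1e-7):
    """Ratio of non-boundary to boundary mass in the (smoothed) target.
    ASSUMPTION: one common choice; not necessarily the weighting used in the poster."""
    return float((1.0 - target).sum() / max(float(target.sum()), eps))

def weighted_bce(pred, target, pos_weight, eps=1e-7):
    """Binary cross-entropy with the boundary term scaled by pos_weight.
    pred: predicted boundary scores in (0, 1); target: boundary score in [0, 1]."""
    pred = np.clip(pred, eps, 1.0 - eps)  # avoid log(0)
    losses = -(pos_weight * target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))
    return float(losses.mean())
```

Without such a weight, a network that predicts low scores everywhere already reaches a small loss, because most frames are not boundaries.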

Multiple annotations [5]

- Some songs in SALAMI have two different annotations.
- At every training epoch, one annotation is chosen at random (unlike [5]).
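Per-epoch sampling of annotations might look like the sketch below. The dictionary layout (song id to a list of boundary-time lists, one per annotator) and `pick_annotations` are hypothetical; songs with a single annotation always get that one.

```python
import random

def pick_annotations(songs, rng):
    """For one training epoch, choose one annotation (list of boundary times) per song.
    songs: hypothetical dict mapping song id -> list of annotations."""
    return {song_id: rng.choice(annotations) for song_id, annotations in songs.items()}

rng = random.Random(0)
songs = {
    "song_a": [[0.0, 30.2]],                      # one annotation
    "song_b": [[0.0, 15.1], [0.0, 14.8, 40.0]],   # two annotations
}
epochs = [pick_annotations(songs, rng) for _ in range(200)]  # one draw per epoch
```

Over many epochs the network thus sees both annotators' boundaries for a song, but only one of them within any single epoch.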

Peak-Picking Algorithm
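The poster shows the peak-picking step only as a figure. As a hedged illustration, a common scheme keeps frames whose score exceeds a threshold and is the maximum within a local window; the sketch below assumes that scheme, and its threshold, window size, and sampling rate are hypothetical.

```python
import numpy as np

SR = 22050   # ASSUMPTION: sampling rate (not stated in the poster)
HOP = 1024   # 2048-point FFT with 50% overlap

def pick_peaks(score, threshold=0.5, half_window=16):
    """Frames whose score is >= threshold and is the maximum within +/- half_window frames.
    threshold and half_window are HYPOTHETICAL values chosen for illustration."""
    peaks = []
    for t in range(len(score)):
        lo, hi = max(0, t - half_window), min(len(score), t + half_window + 1)
        is_local_max = score[t] == score[lo:hi].max()
        # Skip frames within half_window of the previous peak (e.g. a flat-topped plateau).
        if score[t] >= threshold and is_local_max and (not peaks or t - peaks[-1] > half_window):
            peaks.append(t)
    return peaks

def frames_to_seconds(frames, sr=SR, hop=HOP):
    return np.asarray(frames) * hop / sr
```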

Results (Take The Dive by Jonghyun, 종현)

Results

[Figure: F1 scores (mean, max, min) with tolerance ±0.5 s [6] and ±3.0 s [7]; annotated: "caused by incomplete bars"]
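The F1 scores count an estimated boundary as correct when it lies within the tolerance (±0.5 s or ±3.0 s) of an annotated boundary. A minimal sketch of that metric follows, assuming each boundary can be matched at most once; it uses maximum bipartite matching, similar in spirit to `mir_eval.segment.detection`, but it is not that library's code.

```python
def boundary_f1(reference, estimated, window=0.5):
    """Precision, recall and F1 of estimated boundary times (seconds) against reference times.
    An estimate is a hit if it lies within +/- window seconds of a reference boundary;
    each boundary is used at most once (maximum bipartite matching via augmenting paths)."""
    candidates = [[j for j, r in enumerate(reference) if abs(e - r) <= window] for e in estimated]
    owner = [-1] * len(reference)  # owner[j] = index of the estimate matched to reference j

    def augment(i, visited):
        for j in candidates[i]:
            if not visited[j]:
                visited[j] = True
                if owner[j] == -1 or augment(owner[j], visited):
                    owner[j] = i
                    return True
        return False

    hits = sum(augment(i, [False] * len(reference)) for i in range(len(estimated)))
    precision = hits / len(estimated) if estimated else 0.0
    recall = hits / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1

ref, est = [0.0, 10.0, 20.0], [0.3, 12.0, 20.4]
print(boundary_f1(ref, est, window=0.5))  # the estimate at 12.0 s is a miss at +/-0.5 s
print(boundary_f1(ref, est, window=3.0))  # but a hit at +/-3.0 s
```

The wider tolerance is far more forgiving, which is why the two F1 scores are reported separately.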

Discussion

Ambiguous boundaries caused by an incomplete bar

- e.g., 김연자 (Kim Yeon-ja) - 아모르 파티 (Amor Fati)

In many cases, the DNN finds fine structure.

The DNN cannot catch a gradual change of musical idea.

- e.g., 자우림 (Jaurim) - 狂犬時代

References

[1] J. B. L. Smith, J. A. Burgoyne, I. Fujinaga, D. De Roure, and J. S. Downie, “Design and creation of a large-scale database of structural annotations,” in ISMIR 2011, 2011, vol. 11, pp. 555–560.

[2] K. Ullrich, J. Schlüter, and T. Grill, “Boundary Detection in Music Structure Analysis Using Convolutional Neural Networks,” in ISMIR 2014, 2014, pp. 417–422.

[3] D. S. Park et al., “SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition,” arXiv preprint arXiv:1904.08779, 2019.

[4] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional Networks for Biomedical Image Segmentation,” in Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015), 2015, vol. 9351, pp. 234–241.

[5] T. Grill and J. Schlüter, “Music boundary detection using neural networks on spectrograms and self-similarity lag matrices,” in EUSIPCO 2015, 2015, pp. 1296–1300.

[6] D. Turnbull, G. Lanckriet, E. Pampalk, and M. Goto, “A Supervised Approach for Detecting Boundaries in Music Using Difference Features and Boosting,” in ISMIR 2007, 2007, pp. 51–54.

[7] M. Levy and M. Sandler, “Structural Segmentation of Musical Audio by Constrained Clustering,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, no. 2, pp. 318–326, Feb. 2008.

