© Fraunhofer IDMT
Deep Learning for Jazz Walking Bass Transcription

Jakob Abeßer1,3, Stefan Balke2, Klaus Frieler3, Martin Pfleiderer3, Meinard Müller2

1 Semantic Music Technologies Group, Fraunhofer IDMT, Ilmenau, Germany
2 International Audio Laboratories Erlangen, Germany
3 Jazzomat Research Project, University of Music Franz Liszt, Weimar, Germany
Motivation – What is a Walking Bass Line?

• Example: Miles Davis: “So What” (Paul Chambers, bass)
• Our assumptions for this work:
  • Quarter notes (mostly chord tones)
  • Representation: beat-wise pitch values
[Photo: © Tri Agus Nuradhim]
[Score example: walking bass line over Dm7 (D, F, A, C) – beat-wise pitches: C A F A D F A D A D A F A F]
Motivation – How is this useful?
• Harmonic analysis
  • Composition (lead sheet) vs. actual performance
  • Polyphonic transcription from ensemble recordings is challenging
  • Walking bass line can provide first clues about local harmonic changes
• Features for style & performer classification
Problem Setting
• Challenges
  • Bass is typically not salient
  • Overlap with drums (bass drum) and piano (lower register)
  • High variability of recording quality and playing styles
  • Example: Lester Young – “Body and Soul”
• Goals
  • Train a DNN to extract a bass pitch saliency representation
  • Postprocessing: use manual beat annotations to obtain beat-wise bass pitches
Outline
• Dataset
• Approach
  • Bass Saliency Mapping
  • Semi-Supervised Model Training
  • Beat-Informed Late Fusion
• Evaluation
• Results
• Summary & Outlook
Dataset
• Weimar Jazz Database (WJD) [1]
  • 456 high-quality jazz solo transcriptions
  • Annotations: solo melody, beats, chords, segments (phrase, chorus, ...)
  • 41 files with bass annotations
• Data augmentation (+): pitch-shifting ±1 semitone (sox library [2])
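The pitch-shifting augmentation can be sketched as a pair of sox [2] calls; the file names here are placeholders, and the ±100-cent values correspond to the ±1-semitone shifts named above (sox's `pitch` effect takes an amount in cents):

```python
# Sketch of the ±1-semitone data augmentation via sox [2].
# File names are placeholders; "pitch" takes an amount in cents,
# so ±100 cents corresponds to ±1 semitone.
import subprocess

def augment_commands(in_wav):
    cmds = []
    for cents in (-100, 100):
        out_wav = in_wav.replace(".wav", f"_shift{cents:+d}.wav")
        cmds.append(["sox", in_wav, out_wav, "pitch", str(cents)])
    return cmds

cmds = augment_commands("solo.wav")
# Each command could then be run with subprocess.run(cmd, check=True)
```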
Bass-Saliency Mapping
• Data-driven approach
  • Use a Deep Neural Network (DNN) to learn a mapping from magnitude spectrogram to bass saliency representation
• Spectral analysis
  • Resampling to 22.05 kHz
  • Constant-Q magnitude spectrogram (librosa [3])
  • Pitch range: MIDI 28 (41.2 Hz) – 67 (392 Hz)
• Multilabel classification
  • Input dimensions: 40 × NContextFrames
  • Output dimensions: 40
  • Learning target: bass pitch annotations from the WJD
DNN Hyperparameters

• Layer-wise training [4, 5]
  • Least-squares estimate for weight & bias initialization
• (3–5) fully connected layers, MSE loss
• Frame stacking (3–5 context frames) & feature normalization
• Activation functions: ReLU (hidden layers), sigmoid (final layer)
• Dropout & L2 weight regularization
• Adadelta optimizer
  • Mini-batch size = 500
  • 500 epochs / layer
  • Learning rate = 1
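A minimal PyTorch sketch of a network matching these hyperparameters (fully connected layers, ReLU, dropout, sigmoid output, MSE loss, Adadelta with learning rate 1, L2 via weight decay). Hidden-layer width, dropout rate, and weight-decay strength are assumptions not stated on the slide, and the layer-wise training scheme with least-squares initialization [4, 5] is omitted:

```python
import torch
import torch.nn as nn

N_PITCH, N_CONTEXT = 40, 5  # 40 bass pitches, 5 stacked context frames

# Fully connected layers with ReLU + dropout, sigmoid on the final layer;
# the 512-unit width and 0.2 dropout rate are illustrative assumptions.
model = nn.Sequential(
    nn.Linear(N_PITCH * N_CONTEXT, 512), nn.ReLU(), nn.Dropout(0.2),
    nn.Linear(512, 512), nn.ReLU(), nn.Dropout(0.2),
    nn.Linear(512, N_PITCH), nn.Sigmoid(),
)

# Adadelta with learning rate 1; weight_decay provides L2 regularization
optimizer = torch.optim.Adadelta(model.parameters(), lr=1.0, weight_decay=1e-4)
loss_fn = nn.MSELoss()

# One training step on a random mini-batch of 500 stacked frames
x = torch.rand(500, N_PITCH * N_CONTEXT)
target = torch.rand(500, N_PITCH)
loss = loss_fn(model(x), target)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```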
Semi-Supervised Training
• Goal: select predictions on unseen data as additional training data
[Diagram legend: F = features, T = targets]
Semi-Supervised Training – Sparsity-based Selection

• Train model on labelled dataset D1+ (3899 notes)
• Predict on unlabelled dataset D2+ (11697 notes)
• Select additional training data whose sparsity exceeds a threshold t
• Re-train the model
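The selection step can be sketched with the sparseness measure of Hoyer [6] applied to each predicted saliency frame; the threshold value and frame counts below are illustrative assumptions:

```python
import numpy as np

def hoyer_sparseness(x):
    """Sparseness measure from Hoyer [6]: 1 for a single-peak vector,
    0 for a perfectly uniform vector."""
    n = x.size
    l1 = np.abs(x).sum()
    l2 = np.sqrt((x ** 2).sum())
    return (np.sqrt(n) - l1 / l2) / (np.sqrt(n) - 1)

# Keep only predicted saliency frames that are sufficiently sparse
# (t = 0.6 is an assumed threshold, not the slide's t0..t3 values).
t = 0.6
predictions = np.random.rand(100, 40)  # DNN output on unlabelled data D2+
sparsity = np.array([hoyer_sparseness(f) for f in predictions])
selected = predictions[sparsity > t]   # added to the training set, then re-train
```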
Beat-Informed Late Fusion
• Use manual beat annotations from the Weimar Jazz Database
• Find the most salient pitch per beat
[Figure: frame-wise saliency grouped into beats 1–4]
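The fusion step above can be sketched as follows: average the frame-wise saliency within each annotated beat, then pick the strongest pitch. The frame rate, beat times, and random saliency map are illustrative assumptions:

```python
import numpy as np

# Beat-informed late fusion: one MIDI bass pitch per annotated beat.
FRAME_RATE = 43.07                 # CQT frames/s (hop 512 @ 22.05 kHz, assumed)
saliency = np.random.rand(40, 200)                # (pitches, frames) DNN output
beat_times = np.array([0.0, 0.5, 1.0, 1.5, 2.0])  # manual beat annotations (s)

beat_frames = np.round(beat_times * FRAME_RATE).astype(int)
beat_pitches = []
for start, end in zip(beat_frames[:-1], beat_frames[1:]):
    mean_saliency = saliency[:, start:end].mean(axis=1)  # pool frames in beat
    beat_pitches.append(28 + int(np.argmax(mean_saliency)))  # bin 0 = MIDI 28

# beat_pitches now holds one MIDI pitch per beat interval
```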
Evaluation
• Use manual beat annotations from the Weimar Jazz Database
• Compare against state-of-the-art bass transcription algorithms:
  • D: Dittmar, Dressler, and Rosenbauer [8]
  • SG: Salamon, Serrà, and Gómez [9]
  • RK: Ryynänen and Klapuri [7]
Example
• Chet Baker: “Let’s Get Lost” (0:04 – 0:09)
• Initial model
  • M1 – without data augmentation
  • M1+ – with data augmentation
• Semi-supervised learning
  • M2(0,+) – threshold t0
  • M2(1,+) – threshold t1
  • M2(2,+) – threshold t2
  • M2(3,+) – threshold t3
• Reference methods
  • D – Dittmar et al. [8]
  • SG – Salamon et al. [9]
  • RK – Ryynänen & Klapuri [7]
Results
Summary
• Data-driven approach seems to enhance non-salient instruments
• Beneficial:
  • Data augmentation & dataset enlargement
  • Frame stacking (stable bass pitches)
  • Beat-informed late fusion
• Semi-supervised training did not improve accuracy but made bass-saliency maps sparser
• Model is limited to the training set’s pitch range
Acknowledgements
• Thanks to the authors for sharing bass transcription algorithms / results
• Thanks to the students for transcribing the bass lines
• German Research Foundation (DFG)
  • DFG PF 669/7-2: Jazzomat Research Project (2012–2017)
  • DFG MU 2686/6-1: Stefan Balke and Meinard Müller
References
[1] Weimar Jazz Database: http://jazzomat.hfm-weimar.de
[2] sox: http://sox.soundforge.net
[3] McFee, B., Raffel, C., Liang, D., Ellis, D. P. W., McVicar, M., Battenberg, E., and Nieto, O., “librosa: Audio and Music Signal Analysis in Python,” in Proc. of the Scientific Computing with Python Conference (SciPy), Austin, Texas, 2015.
[4] Uhlich, S., Giron, F., and Mitsufuji, Y., “Deep neural network based instrument extraction from music,” in Proc. of the IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), pp. 2135–2139, Brisbane, Australia, 2015.
[5] Balke, S., Dittmar, C., Abeßer, J., and Müller, M., “Data-Driven Solo Voice Enhancement for Jazz Music Retrieval,” in Proc. of the IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, USA, 2017.
[6] Hoyer, P. O., “Non-negative Matrix Factorization with Sparseness Constraints,” J. of Machine Learning Research, 5, pp. 1457–1469, 2004.
[7] Ryynänen, M. P. and Klapuri, A., “Automatic transcription of melody, bass line, and chords in polyphonic music,” Computer Music J., 32, pp. 72–86, 2008.
[8] Dittmar, C., Dressler, K., and Rosenbauer, K., “A Toolbox for Automatic Transcription of Polyphonic Music,” in Proc. of the Audio Mostly Conf., pp. 58–65, 2007.
[9] Salamon, J., Serrà, J., and Gómez, E., “Tonal Representations for Music Retrieval: From Version Identification to Query-by-Humming,” Int. J. of Multimedia Inf. Retrieval, 2, pp. 45–58, 2013.
• Jazzomat Research Project & Weimar Jazz Database
  • http://jazzomat.hfm-weimar.de/
• Python code and trained model available
  • https://github.com/jakobabesser/walking_bass_transcription_dnn
• Additional online demos
  • https://www.audiolabs-erlangen.de/resources/MIR/2017-AES-WalkingBassTranscription
Thank You!