DRUM TRANSCRIPTION VIA JOINT BEAT AND DRUM MODELING USING CONVOLUTIONAL RNNs

Richard Vogl
ifs.tuwien.ac.at/~vogl

21st Vienna Deep Learning Meetup, 15th of October 2018

Institute of Computational Perception


Richard Vogl (1, 2), Matthias Dorfer (2), Gerhard Widmer (2), Peter Knees (1)

1 2 Institute of Computational Perception


PART 1 AUTOMATIC DRUM TRANSCRIPTION

Task Definition, Problem Modeling, Architectures

PART 2 MULTI-TASK LEARNING

Metadata for Transcripts


WHAT IS DRUM TRANSCRIPTION?

Input: western popular music containing drums
Output: symbolic representation of notes played by drum instruments

audio → ADT system → symbolic representation of drum notes

Possible output representations:
‣ sheet music
‣ list of events
‣ piano roll (bass, snare, hi-hat over time t)
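The list of events and the piano roll named above carry the same information in two layouts. The following is a minimal sketch of the conversion, not code from the talk: the function name, the frame rate, and the example events are all assumptions made for illustration.

```python
# Hypothetical example: convert a list of drum events into a piano roll.
# Frame rate and event values are illustrative assumptions.

INSTRUMENTS = ["bass", "snare", "hi-hat"]


def events_to_piano_roll(events, duration_s, fps):
    """Return one row per instrument with a 1 in every frame that holds an onset."""
    n_frames = int(round(duration_s * fps))
    roll = {name: [0] * n_frames for name in INSTRUMENTS}
    for time_s, name in events:
        frame = int(round(time_s * fps))
        if 0 <= frame < n_frames:
            roll[name][frame] = 1
    return roll


# List of events: (onset time in seconds, instrument)
events = [(0.00, "bass"), (0.00, "hi-hat"), (0.25, "hi-hat"),
          (0.50, "snare"), (0.50, "hi-hat"), (0.75, "hi-hat")]

# A coarse grid of 4 frames per second keeps the printed rows short.
roll = events_to_piano_roll(events, duration_s=1.0, fps=4)
for name in INSTRUMENTS:
    print(f"{name:>6}: {roll[name]}")
```

A real system would use a much finer time grid; the coarse grid here only keeps the output readable.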

WHAT IS DRUM TRANSCRIPTION?

audio → ADT system → notes → drum sampler → audio

♫ (audio examples)

WHY DRUM TRANSCRIPTION?

Wide range of applications
‣ Generate sheet music
‣ Music production: sampling / remixing / resynthesis
‣ Higher-level MIR tasks: use drum patterns for other tasks, e.g. genre classification, song segmentation

FOCUSED INSTRUMENTS

ADT methods focus on bass drum (BD), snare drum (SD), and hi-hat (HH)
‣ Make up the majority of notes in datasets
‣ Beat defining / most important
‣ Well-separated spectral energy distribution

[Figure: labels BD, SD, HH; panels: bass drum, snare drum, hi-hat]

STATE OF THE ART

End-to-end / activation-function-based
Neural networks and NMF-based approaches

Overview article:
Wu, C.-W., Dittmar, C., Southall, C., Vogl, R., Widmer, G., Hockman, J., Müller, M., Lerch, A.: “An Overview of Automatic Drum Transcription,” IEEE TASLP, vol. 26, no. 9, Sept. 2018.

[Figure: spectrogram and activation functions for bass, snare, and hi-hat over t [ms]]

SYSTEM OVERVIEW

audio → signal preprocessing → NN (feature extraction, event detection, classification) → peak picking → events

train data → NN training → NN

Intermediate representations along the pipeline:
‣ waveform: amplitude A over t [s]
‣ spectrogram: f [Hz] over t [s]
‣ activation functions: bass, snare, hi-hat over t [s]
‣ detected peaks: bass, snare, hi-hat over t [s]
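The last pipeline stage turns each activation function into discrete events. The sketch below shows one simple way to do this: keep local maxima that exceed a threshold and are far enough from the previous peak. The threshold, the minimum distance, the frame rate, and the activation values are assumptions for illustration; the talk does not specify its exact peak-picking rule.

```python
# Hypothetical peak picker: threshold + local maximum + minimum distance.
# All parameter values are illustrative assumptions.

def pick_peaks(activation, threshold=0.5, min_distance=2, fps=100):
    """Return onset times (s) at local maxima of `activation` above `threshold`."""
    onsets = []
    last_frame = -min_distance
    for i, value in enumerate(activation):
        left = activation[i - 1] if i > 0 else float("-inf")
        right = activation[i + 1] if i + 1 < len(activation) else float("-inf")
        is_local_max = value >= left and value > right
        if value >= threshold and is_local_max and i - last_frame >= min_distance:
            onsets.append(i / fps)
            last_frame = i
    return onsets


# Made-up activation functions, one value per frame at 10 frames per second.
activations = {
    "bass":   [0.0, 0.9, 0.1, 0.0, 0.0, 0.8, 0.2, 0.0],
    "snare":  [0.0, 0.0, 0.1, 0.7, 0.2, 0.0, 0.0, 0.0],
    "hi-hat": [0.1, 0.6, 0.1, 0.6, 0.1, 0.6, 0.1, 0.4],
}
detected = {name: pick_peaks(act, fps=10) for name, act in activations.items()}
for name, onsets in detected.items():
    print(name, onsets)
```

The final hi-hat value of 0.4 is a local maximum but stays below the threshold, so it is not reported as an onset.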

NETWORK MODELS - RNN

Processing of spectrogram frames as sequential data
Frame-wise detection of instrument onsets

[Figure: RNN train data sample]

Bidirectional RNN architecture with GRUs (input to output):
‣ 3 x 30 bidirectional (BD) GRU
‣ output: 3 sigmoid
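To make the idea of a bidirectional GRU layer concrete, here is a minimal pure-Python sketch. It is not the trained model from the talk: weights are random and untrained, biases are omitted for brevity, the input size of 8 bins is made up, and reading "30 BD GRU" as 30 units per direction is an assumption. The state update uses one common GRU convention, h' = (1 - z) * h + z * c; some libraries swap the roles of z and 1 - z.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """One direction of a GRU layer with random, untrained weights (no biases)."""

    def __init__(self, n_in, n_hidden, rng):
        self.n_hidden = n_hidden
        # Input matrix W and recurrent matrix U for update (z), reset (r), candidate (c).
        self.W = {g: random_matrix(n_hidden, n_in, rng) for g in "zrc"}
        self.U = {g: random_matrix(n_hidden, n_hidden, rng) for g in "zrc"}

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        gated_h = [ri * hi for ri, hi in zip(r, h)]
        c = [math.tanh(a + b)
             for a, b in zip(matvec(self.W["c"], x), matvec(self.U["c"], gated_h))]
        return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, c)]

    def run(self, frames):
        h = [0.0] * self.n_hidden
        outputs = []
        for x in frames:
            h = self.step(x, h)
            outputs.append(h)
        return outputs


def bidirectional(frames, forward_cell, backward_cell):
    """Run one cell forward in time, one backward, and concatenate per frame."""
    fwd = forward_cell.run(frames)
    bwd = backward_cell.run(frames[::-1])[::-1]
    return [f + b for f, b in zip(fwd, bwd)]


rng = random.Random(0)
n_bins, n_hidden, n_frames = 8, 30, 5  # n_bins and n_frames are made up
spectrogram = [[rng.random() for _ in range(n_bins)] for _ in range(n_frames)]
features = bidirectional(spectrogram,
                         GRUCell(n_bins, n_hidden, rng),
                         GRUCell(n_bins, n_hidden, rng))
print(len(features), len(features[0]))  # frames x (2 * hidden units)
```

Because the backward cell sees future frames, every output frame depends on the whole sequence, which is what allows frame-wise onset decisions to use context from both sides.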

NETWORK MODELS - CNN

Operate on small windows of spectrogram (current frame + spectral context)
Acoustic modeling of drum sounds

[Figure: CNN train data sample]

VGG-style architecture (input to output):
‣ 2 x conv: 32 x 3x3 (batch norm)
‣ max pool: 3x3
‣ 2 x conv: 64 x 3x3 (batch norm)
‣ max pool: 3x3
‣ 2 x dense: 256 ReLU
‣ output: 3 sigmoid
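The "small windows of spectrogram" can be sketched as follows: for every frame, cut out that frame plus its neighbours and hand the resulting patch to the CNN. This is an illustrative sketch only; the function name and the zero-padding at the edges are assumptions, and the toy spectrogram is made up.

```python
# Hypothetical helper: cut one context window per spectrogram frame.
# Zero-padding at the edges is an assumption, not taken from the talk.

def context_windows(spectrogram, context=9):
    """Return one (context x n_bins) window per frame, centred on that frame."""
    half = context // 2
    n_bins = len(spectrogram[0])
    pad = [[0.0] * n_bins for _ in range(half)]
    padded = pad + spectrogram + pad
    return [padded[i:i + context] for i in range(len(spectrogram))]


# Toy spectrogram: 5 frames with 2 frequency bins each.
toy = [[float(t), float(t)] for t in range(1, 6)]
windows = context_windows(toy, context=3)
for window in windows:
    print(window)
```

Each window is classified independently, so the CNN models the sound of a drum hit but cannot see longer rhythmic structure beyond its context.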

NETWORK MODELS - CRNN

Low-level CNN for acoustic modeling
High-level RNN for music language model

[Figure: CRNN train data sample]

Stacked CNN + RNN architecture (input to output):
‣ 2 x conv: 32 x 3x3 (batch norm)
‣ max pool: 3x3
‣ 2 x conv: 64 x 3x3 (batch norm)
‣ max pool: 3x3
‣ 3 x RNN: 50 BD GRU
‣ output: 3 sigmoid
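To see what the convolutional stack hands to the recurrent layers, one can trace tensor shapes through it. The sketch below does only this shape arithmetic. It assumes 'same' padding for the convolutions, non-overlapping 3x3 pooling over both axes as written on the slide, a context of 9 frames, and a made-up count of 81 frequency bins; none of these details are confirmed by the talk.

```python
# Hypothetical shape trace through the conv/pool stack of the CRNN.
# Padding mode and number of frequency bins are assumptions.

def conv_same(shape, n_filters):
    """3x3 convolution with 'same' padding: size unchanged, channels replaced."""
    frames, bins, _ = shape
    return (frames, bins, n_filters)


def max_pool(shape, pool=(3, 3)):
    """Non-overlapping max pooling; incomplete border regions are dropped."""
    frames, bins, channels = shape
    return (frames // pool[0], bins // pool[1], channels)


shape = (9, 81, 1)  # (context frames, frequency bins, channels)
trace = [shape]
for n_filters in (32, 64):
    shape = conv_same(conv_same(shape, n_filters), n_filters)  # 2 x conv
    trace.append(shape)
    shape = max_pool(shape)
    trace.append(shape)

feature_size = shape[0] * shape[1] * shape[2]
for step in trace:
    print(step)
print("features per time step fed to the GRU layers:", feature_size)
```

Under these assumptions each time step is reduced to a single feature vector, and the GRU layers then model how those vectors evolve over time.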

WHY IS CONTEXT RELEVANT?

Instruments from the same class often sound quite different
Similar sound for different instruments

♫ Audio examples: snare drums; crash vs. splash

When humans transcribe drums
‣ Function in a track equally important (snare drum vs. backbeat)
‣ Inaudible onsets will be filled in if expected

→ Music Language Model

BASS DRUM OR LOW TOM?

♫ Audio examples: 1: bass drum, 2: floor tom, 3: ?

♫ With context: 3: bass drum

DATASETS

IDMT-SMT-Drums [Dittmar and Gärtner 2014]
‣ Solo drum tracks, recorded, synthesized, and sampled
‣ 95 tracks, total: 24 min, onsets: 8004

ENST-Drums [Gillet and Richard 2006]
‣ Recordings, three drummers on different drum kits, optional accompaniment
‣ 64 tracks, total: 1 h, onsets: 22391

♫ Audio examples: SMT (simple!), ENST solo (harder!), ENST acc. (difficult!)
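The dataset figures above can be put on a common scale with a little arithmetic. The sketch below derives onsets per minute and per track from the numbers on the slide; since the durations are rounded there (24 min, 1 h), the results are rough estimates only.

```python
# Rough onset densities from the slide's figures (durations are rounded).
datasets = {
    "IDMT-SMT-Drums": {"tracks": 95, "minutes": 24, "onsets": 8004},
    "ENST-Drums": {"tracks": 64, "minutes": 60, "onsets": 22391},
}

density = {}
for name, d in datasets.items():
    density[name] = d["onsets"] / d["minutes"]
    per_track = d["onsets"] / d["tracks"]
    print(f"{name}: {density[name]:.1f} onsets/min, {per_track:.1f} onsets/track")
```

The per-minute densities are similar, while ENST-Drums tracks are much longer on average and therefore hold more onsets each.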

NETWORK MODELS

Architecture  Frames  Context  Conv. Layers  Rec. Layers  Dense Layers
RNN (S)       100     -        -             2x50 GRU     -
RNN (L)       400     -        -             3x30 GRU     -
CNN (S)       -       9        (*)           -            2x256
CNN (L)       -       25       (*)           -            2x256
CRNN (S)      100     9        (*)           2x50 GRU     -
CRNN (L)      400     13       (*)           3x60 GRU     -

(*) 2 x 32 3x3 filt., 3x3 max pooling; 2 x 64 3x3 filt., 3x3 max pooling; all w/ batch norm.

Baseline: tsRNN [Vogl et al. ICASSP’17]

Training: early stopping, batch normalization, L2 norm, dropout, ADAM optimizer

RESULTS

[Figure: F-measure [%] (axis from 60 to 100) on SMT, ENST solo, and ENST with accompaniment for RNN (S), RNN (L), CNN (S), CNN (L), CRNN (S), CRNN (L), and the tsRNN baseline]
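The F-measure in the results figure compares detected onsets with annotated ones. Below is a minimal sketch of such an evaluation for a single instrument. The tolerance of 20 ms, the greedy one-to-one matching, and the example onset times are assumptions; the talk does not state its exact evaluation settings.

```python
# Hypothetical onset evaluation: greedy matching within a tolerance window.
# The 20 ms tolerance and the onset times are illustrative assumptions.

def f_measure(reference, detected, tolerance=0.02):
    """Match each reference onset to at most one detection within +/- tolerance (s)."""
    reference = sorted(reference)
    detected = sorted(detected)
    i = j = true_positives = 0
    while i < len(reference) and j < len(detected):
        diff = detected[j] - reference[i]
        if abs(diff) <= tolerance:
            true_positives += 1
            i += 1
            j += 1
        elif diff < 0:
            j += 1  # detection precedes every remaining reference: false positive
        else:
            i += 1  # reference onset has no detection: false negative
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(detected)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


reference = [0.50, 1.00, 1.50, 2.00]
detected = [0.51, 1.00, 1.70, 2.01, 2.50]
score = f_measure(reference, detected)
print(f"F-measure: {100 * score:.1f} %")
```

In this example three of five detections are correct (precision 0.6) and three of four reference onsets are found (recall 0.75).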

HOW DOES IT SOUND?

♫ Audio examples with transcribed bass, snare, and hi-hat:
‣ “Punk” (MedleyDB)
‣ “Hendrix” (MedleyDB)
‣ “Alexa, play some music…”

PART 2 MULTI-TASK LEARNING

Metadata for Transcripts

LIMITATIONS OF CURRENT SYSTEMS

Do not produce additional information for transcripts (drum onset detection vs. drum transcription)
‣ bar lines
‣ tempo
‣ meter
‣ dynamics / accents
‣ stroke / playing technique

Only three instrument classes
Richard Vogl, Gerhard Widmer, and Peter Knees, “Towards multi-instrument drum transcription,” in Proc. 21st Intl. Conf. on Digital Audio Effects (DAFx18), Aveiro, Portugal, Sep. 2018.

HH SD BD

t

ADDITIONAL INFORMATION FOR TRANSCRIPTS

!21

Page 91: DRUM TRANSCRIPTION VIA JOINT BEAT AND …vogl/slides/2018-10-15...2018/10/15  · 21st Vienna Deep Learning Meetup 15th of October 2018 Institute of Computational Perception Richard

HH SD BD

t

ADDITIONAL INFORMATION FOR TRANSCRIPTS

!21

Page 92: DRUM TRANSCRIPTION VIA JOINT BEAT AND …vogl/slides/2018-10-15...2018/10/15  · 21st Vienna Deep Learning Meetup 15th of October 2018 Institute of Computational Perception Richard

Use beat and downbeat tracking to get:HH SD BD

t

1 2 3 4 143beats 2

ADDITIONAL INFORMATION FOR TRANSCRIPTS

!21

Page 93: DRUM TRANSCRIPTION VIA JOINT BEAT AND …vogl/slides/2018-10-15...2018/10/15  · 21st Vienna Deep Learning Meetup 15th of October 2018 Institute of Computational Perception Richard

Use beat and downbeat tracking to get:HH SD BD

t

1 2 3 4 143beats 2

ADDITIONAL INFORMATION FOR TRANSCRIPTS

!21

Page 94: DRUM TRANSCRIPTION VIA JOINT BEAT AND …vogl/slides/2018-10-15...2018/10/15  · 21st Vienna Deep Learning Meetup 15th of October 2018 Institute of Computational Perception Richard

Use beat and downbeat tracking to get:‣ bars lines HH

SD BD

t

1 2 3 4 143beats 2

ADDITIONAL INFORMATION FOR TRANSCRIPTS

!21

Page 95: DRUM TRANSCRIPTION VIA JOINT BEAT AND …vogl/slides/2018-10-15...2018/10/15  · 21st Vienna Deep Learning Meetup 15th of October 2018 Institute of Computational Perception Richard

Use beat and downbeat tracking to get:‣ bars lines‣ tempo

HH SD BD

t

1 2 3 4 143beats 2

ADDITIONAL INFORMATION FOR TRANSCRIPTS

!21

Page 96: DRUM TRANSCRIPTION VIA JOINT BEAT AND …vogl/slides/2018-10-15...2018/10/15  · 21st Vienna Deep Learning Meetup 15th of October 2018 Institute of Computational Perception Richard

Use beat and downbeat tracking to get:‣ bars lines‣ tempo‣ meter

HH SD BD

t

1 2 3 4 143beats 2

ADDITIONAL INFORMATION FOR TRANSCRIPTS

!21

4/4

Page 97: DRUM TRANSCRIPTION VIA JOINT BEAT AND …vogl/slides/2018-10-15...2018/10/15  · 21st Vienna Deep Learning Meetup 15th of October 2018 Institute of Computational Perception Richard

ADDITIONAL INFORMATION FOR TRANSCRIPTS

Use beat and downbeat tracking to get:
‣ bar lines
‣ tempo
‣ meter

[Figure: drum transcript with HH, SD, and BD onsets over time t, with numbered beats (1 2 3 4 per bar) and a 4/4 meter]
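How bar lines, tempo, and meter could follow from tracked beats can be sketched in a few lines. This is only an illustration, not the method used in the talk: the function name `transcript_metadata`, the input layout (beat times plus the beat's number within its bar), and the example values are assumptions. It also only recovers the number of beats per bar, not the full time signature.

```python
from statistics import median

def transcript_metadata(beat_times, beat_numbers):
    """Derive bar lines, tempo, and beats per bar from tracked beats.

    beat_times:   beat positions in seconds, ascending (assumed input format)
    beat_numbers: position of each beat within its bar, 1 = downbeat
    """
    # Bar lines sit on the downbeats, i.e. the beats numbered 1.
    bar_lines = [t for t, n in zip(beat_times, beat_numbers) if n == 1]
    # Tempo in beats per minute from the median inter-beat interval.
    intervals = [b - a for a, b in zip(beat_times, beat_times[1:])]
    tempo_bpm = 60.0 / median(intervals)
    # Beats per bar: the highest beat number observed.
    beats_per_bar = max(beat_numbers)
    return bar_lines, tempo_bpm, beats_per_bar

# Hypothetical example: two bars with four beats each, one beat every 0.5 s.
beat_times = [0.5 * i for i in range(8)]
beat_numbers = [1, 2, 3, 4, 1, 2, 3, 4]
bar_lines, tempo_bpm, beats_per_bar = transcript_metadata(beat_times, beat_numbers)
print(bar_lines, tempo_bpm, beats_per_bar)
```

The median is used so that a single mistracked beat does not distort the tempo estimate.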

LEVERAGE BEAT INFORMATION

Beats are highly correlated with drum patterns (drum onset locations / repetitive patterns)

Assume that prior knowledge of beats is helpful for drum transcription

Use multi-task learning for beats and drums

[Figure: drum transcript with HH, SD, and BD onsets over time t, with numbered beats (1 2 3 4 per bar)]

MULTI-TASK LEARNING

Training one model to solve multiple related tasks
‣ Improve performance for each subtask ➡ context!

Individual activation functions are already learned using multi-task learning
‣ One network for all instruments
‣ Instrument onsets are not independent
‣ MIREX results show that it works better

[Figure: activation functions for bass, snare, and hi-hat over t [ms]]
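The "one network for all instruments" idea can be illustrated with a shared layer feeding one sigmoid output per instrument, trained with a single joint loss. The sketch below is a plain feed-forward stand-in in NumPy, not the CRNN from the talk; the layer sizes, random weights, and random data are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

N_FEATURES = 84   # assumed number of spectrogram bins per frame
N_HIDDEN = 16     # assumed hidden layer size
INSTRUMENTS = ["bass", "snare", "hi-hat"]

# Shared layer: its weights are used for every instrument.
W_shared = rng.normal(0.0, 0.1, (N_FEATURES, N_HIDDEN))
# Output layer: one sigmoid unit per instrument on top of the shared layer.
W_out = rng.normal(0.0, 0.1, (N_HIDDEN, len(INSTRUMENTS)))

def forward(frames):
    """Return one activation function (column) per instrument."""
    hidden = np.tanh(frames @ W_shared)
    return 1.0 / (1.0 + np.exp(-(hidden @ W_out)))

def joint_loss(activations, targets, eps=1e-7):
    """Binary cross-entropy averaged over frames and instruments."""
    a = np.clip(activations, eps, 1.0 - eps)
    bce = -(targets * np.log(a) + (1.0 - targets) * np.log(1.0 - a))
    return bce.mean()

# Random stand-in data: 100 frames with sparse onset targets.
frames = rng.random((100, N_FEATURES))
targets = (rng.random((100, len(INSTRUMENTS))) < 0.05).astype(float)
activations = forward(frames)
loss = joint_loss(activations, targets)
print(activations.shape, float(loss))
```

Because the loss sums over all instruments, gradients from every instrument would update the shared layer, which is where the shared context comes from.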

MULTI-TASK EXPERIMENTS

Three experiments:
‣ Training on drum targets (DT)
‣ Training on drum targets with annotated beats as additional input features (BF)
‣ Training on drum and beat targets as multi-task problem (MT)

Expected increase in performance for BF compared to DT
Desirable increase in performance for MT compared to DT

[Figure: network input with axes f [Hz] and t [s]; network output over t [s]]
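The three setups differ only in where the beat annotations enter: nowhere (DT), on the input side (BF), or on the target side (MT). A minimal sketch of that data layout follows; the frame rate, feature size, helper names, and example onsets are assumptions, not values from the talk.

```python
import numpy as np

FPS = 100  # assumed frame rate in frames per second

def times_to_frames(times, n_frames, fps=FPS):
    """Turn event times in seconds into a 0/1 vector with one entry per frame."""
    target = np.zeros(n_frames)
    idx = np.round(np.asarray(times, dtype=float) * fps).astype(int)
    target[idx[idx < n_frames]] = 1.0
    return target

def build_experiment(name, spectrogram, drum_onsets, beat_times):
    """Return (network input, network target) for experiment DT, BF, or MT."""
    n_frames = spectrogram.shape[0]
    drums = np.stack([times_to_frames(t, n_frames) for t in drum_onsets], axis=1)
    beats = times_to_frames(beat_times, n_frames)[:, None]
    if name == "DT":   # drum targets only
        return spectrogram, drums
    if name == "BF":   # annotated beats as an additional input feature
        return np.hstack([spectrogram, beats]), drums
    if name == "MT":   # beats as an additional target
        return spectrogram, np.hstack([drums, beats])
    raise ValueError(f"unknown experiment: {name}")

# Hypothetical 2-second excerpt with 84 spectrogram bins per frame.
spectrogram = np.zeros((200, 84))
drum_onsets = [[0.0, 1.0], [0.5, 1.5], [0.0, 0.25, 0.5]]  # bass, snare, hi-hat
beat_times = [0.0, 0.5, 1.0, 1.5]

x_dt, y_dt = build_experiment("DT", spectrogram, drum_onsets, beat_times)
x_bf, y_bf = build_experiment("BF", spectrogram, drum_onsets, beat_times)
x_mt, y_mt = build_experiment("MT", spectrogram, drum_onsets, beat_times)
print(x_dt.shape, y_dt.shape, x_bf.shape, y_bf.shape, x_mt.shape, y_mt.shape)
```

BF needs beat annotations at prediction time as well, while MT only needs them for training, which is why an improvement for MT is the more desirable outcome.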

NEW DATASETS

RBMA13-Drums [http://ifs.tuwien.ac.at/~vogl/datasets/]
‣ Free music from the 2013 Red Bull Music Academy, different styles
‣ 27 tracks, total: 1h 43m, onsets: 24365
‣ drum, beat, and downbeat annotations

RBMA (super difficult!)

RESULTS

RBMA (super difficult!)

Model      DT     BF     MT
RNN (S)    59.8   63.6   64.6
RNN (L)    61.8   64.5   64.3
CNN (S)    66.2   66.7   63.3
CNN (L)    66.8   65.2   64.8
CRNN (S)   65.2   66.1   66.9
CRNN (L)   67.3   68.4   67.2

% F-measure for drum onsets, tolerance: ±20 ms, 3-fold cross-validation
Rows: model (S = small, L = large); columns: experiment

DT … drum transcription
BF … DT plus beats as input features
MT … DT and beat detection multi-tasking
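An onset F-measure with a ±20 ms tolerance window can be computed roughly as follows. This is a sketch under assumptions: the greedy one-to-one matching and the function name `onset_f_measure` are illustrative choices, and the evaluation code behind the table may match onsets differently.

```python
def onset_f_measure(reference, detected, tolerance=0.020):
    """F-measure for onset times in seconds.

    A detection counts as a true positive if it lies within +/- tolerance
    of a reference onset; each reference onset is matched at most once.
    """
    reference = sorted(reference)
    detected = sorted(detected)
    tp = 0
    i = j = 0
    while i < len(reference) and j < len(detected):
        diff = detected[j] - reference[i]
        if abs(diff) <= tolerance:
            tp += 1          # matched pair
            i += 1
            j += 1
        elif diff < 0:
            j += 1           # detection before the window: false positive
        else:
            i += 1           # window passed without a detection: false negative
    if tp == 0:
        return 0.0
    precision = tp / len(detected)
    recall = tp / len(reference)
    return 2 * precision * recall / (precision + recall)

# Hypothetical onsets: two detections fall inside the window, two do not.
reference = [0.50, 1.00, 1.50, 2.00]
detected = [0.51, 1.03, 1.49, 2.50]
f = onset_f_measure(reference, detected)
print(round(100 * f, 1))
```

In this example precision and recall are both 2/4, so the F-measure is 50%.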

RESULTS: RNNs

[Figure: bar chart of F-measure [%] (axis 50 to 70) for RNN (small) and RNN (large) in the DT, BF, and MT experiments]

Impact of beats for RNNs:
BF improves for both models ✔
MT improves for both models ✔

RESULTS: CNNs

[Figure: bar chart of F-measure [%] (axis 50 to 70) for CNN (small) and CNN (large) in the DT, BF, and MT experiments]

Impact of beats for CNNs:
BF inconsistent
MT declines for both models
Expected: CNNs have too little context for beats

RESULTS: CRNNs

[Figure: bar chart of F-measure [%] (axis 50 to 70) for CRNN (small) and CRNN (large) in the DT, BF, and MT experiments]

Impact of beats for CRNNs:
BF improves for both models ✔
MT improves for small model ✔
MT equal for large model ?
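The per-architecture observations on the RNN, CNN, and CRNN slides can be checked against the numbers of the RESULTS table. The snippet below only does the arithmetic: the values are copied from the table, and the variable names are illustrative.

```python
# F-measure [%] copied from the RESULTS table: (DT, BF, MT) per model.
results = {
    "RNN (S)":  (59.8, 63.6, 64.6),
    "RNN (L)":  (61.8, 64.5, 64.3),
    "CNN (S)":  (66.2, 66.7, 63.3),
    "CNN (L)":  (66.8, 65.2, 64.8),
    "CRNN (S)": (65.2, 66.1, 66.9),
    "CRNN (L)": (67.3, 68.4, 67.2),
}

# Difference of BF and MT to the DT baseline, in percentage points.
deltas = {
    model: (round(bf - dt, 1), round(mt - dt, 1))
    for model, (dt, bf, mt) in results.items()
}

for model, (d_bf, d_mt) in deltas.items():
    print(f"{model:9s} BF-DT {d_bf:+.1f}   MT-DT {d_mt:+.1f}")
```

The differences match the slides: both RNNs gain from BF and MT, the CNNs are mixed for BF and lose with MT, and the large CRNN stays within 0.1 percentage points of its DT baseline with MT.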

RESULTS FOR RECURRENT ARCHITECTURES

[Figure: bar chart of F-measure [%] (axis 50 to 70) for RNN (S), RNN (L), CRNN (S), and CRNN (L) in the DT, BF, and MT experiments]

No improvement because of beat tracking results?

HOW DOES IT SOUND?

three instruments + beats: bass, snare, hi-hat

HOW DOES IT SOUND?

eight instruments + beats: bass, snare, hi-hat, snare side stick, crash, tom

CONCLUSIONS

Deep learning for automatic drum transcription

CRNNs can outperform RNNs and CNNs, especially on complex data
‣ Modeling of acoustic and rhythmic properties ➡ better generalization!

Leverage multi-task learning effects to increase performance
‣ All instruments under observation within one model
‣ Beats and downbeats as additional metadata for transcripts

CRNN best overall results @ MIREX’17 and MIREX’18 drum transcription
MIREX system:
http://ifs.tuwien.ac.at/~vogl/models/mirex-17.zip
http://ifs.tuwien.ac.at/~vogl/models/mirex-18.tar.gz