Music Processing Christian Dittmar Lecture Applications of Music Processing International Audio Laboratories Erlangen [email protected]

Music Processing Applications of Music Processing · 2017. 2. 14. · Gabor Wavelet Spectrogram [dB] Time [Sec] Frequency [Hz] 0.5 1 1.5 2 277 407 599 880 1293 1901 2794 4106 6035



Music Processing

Christian Dittmar

Lecture

Applications of Music Processing

International Audio Laboratories [email protected]

Singing Voice Detection

Important pre-requisite for:
  • Music segmentation
  • Music thumbnailing (preview version)
  • Singing voice transcription
  • Singing voice separation
  • Lyrics alignment
  • Lyrics recognition

Singing Voice Detection

Detect singing voice activity during the course of a recording.

Assumptions:
  • Real-world, polyphonic music recordings are analyzed
  • Singing voice performs dominant melody above accompaniment

[Figure: singing voice activity along the recording; x-axis: Time in seconds, 10 to 45]

Singing Voice Detection

Challenges:
  • Complex characteristics of singing voice
  • Large diversity of accompaniment music
  • Accompaniment may play same melody as singing
  • Pitch-fluctuating instruments may be similar to singing

[Figure: examples labelled "Stable pitch" and "Fluctuating pitch"]

Singing Voice Detection

Common approach:
  • Frame-wise extraction of audio features
  • Classification via machine learning

[Figure: x-axis: Time in seconds, 10 to 45]

Audio Feature Extraction

Frame-wise processing:
  • Hopsize Q
  • Blocksize K
  • Window function w(n)
  • Signal frame x(n)

Compute for each analysis frame:
  • Time-domain features
  • Spectral features
  • Cepstral features
  • others …
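The frame-wise processing listed above can be sketched as follows. This is a minimal illustration, not the lecture's implementation: the Hann shape chosen for w(n), the helper names `hann` and `frames`, the toy sine signal, and the values K = 16 and Q = 8 are all assumptions made for the example.

```python
import math

def hann(K):
    """Periodic Hann window w(n) of length K (an assumed window shape)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / K) for n in range(K)]

def frames(signal, K, Q, window=None):
    """Cut `signal` into windowed frames of block size K, one every Q samples.

    Trailing samples that do not fill a complete block are dropped.
    """
    w = window if window is not None else hann(K)
    out = []
    for start in range(0, len(signal) - K + 1, Q):
        out.append([signal[start + n] * w[n] for n in range(K)])
    return out

# Toy signal: 64 samples of a sine wave (hypothetical test data).
signal = [math.sin(2.0 * math.pi * 5.0 * n / 64.0) for n in range(64)]
blocks = frames(signal, K=16, Q=8)
print(len(blocks), "frames of", len(blocks[0]), "samples each")
```

With Q smaller than K, consecutive frames overlap; each frame is then passed to the feature extractors.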

Audio Feature Extraction

Time-domain features:
  • Zero Crossing Rate (ZCR): high-pitched vs. low-pitched
  • Linear Prediction Coeff. (LPC): encodes spectral envelope
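As a rough illustration of why the ZCR separates high-pitched from low-pitched content, the sketch below counts sign changes per frame. The function name `zero_crossing_rate`, the normalization by the number of adjacent sample pairs, and the two test tones are assumptions for this example; other normalizations are in use.

```python
import math

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0.0) != (b >= 0.0)
    )
    return crossings / (len(frame) - 1)

# Two hypothetical 64-sample tones; the phase offset avoids exact zeros.
low_tone = [math.sin(2.0 * math.pi * 2.0 * n / 64.0 + 0.1) for n in range(64)]
high_tone = [math.sin(2.0 * math.pi * 16.0 * n / 64.0 + 0.1) for n in range(64)]

zcr_low = zero_crossing_rate(low_tone)
zcr_high = zero_crossing_rate(high_tone)
print(zcr_low, zcr_high)
```

The higher tone changes sign more often within the same frame length, so its ZCR is larger.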

Audio Feature Extraction

Spectral features:
  • Spectrogram, linear vs. logarithmic frequency spacing
  • Spectral Flatness (SF), Spectral Centroid (SC), and many others …

[Figure: STFT Spectrogram [dB]; x-axis: Time [Sec], 0.5 to 2; y-axis: Frequency [Hz], linear spacing, 1000 to 11000]

[Figure: Gabor Wavelet Spectrogram [dB]; x-axis: Time [Sec], 0.5 to 2; y-axis: Frequency [Hz], logarithmic spacing, 277 to 8870]

Audio Feature Extraction

Cepstral features: singing voice as an example
  • Convolutive: excitation * filter
      Excitation: vibration of vocal folds
      Filter: resonance of the vocal tract
  • Magnitude spectrum, multiplicative: excitation · filter
  • Log-magnitude spectrum, additive: excitation + filter
  • "Liftering": separation into smooth spectral envelope and fine-structured excitation

[Figure: Extraction of spectral envelope via cepstral liftering; upper panel: Magnitude spectrum; lower panel: Logarithmic magnitude; x-axis: Frequency (Hz), 0 to 2.5 x 10^4; legend: Observed Spectrum, Spectral Envelope, Excitation Spectrum]
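The chain from convolution to addition can be written out step by step. The symbols below are assumptions introduced for this sketch and do not appear on the slide: e(n) for the excitation, h(n) for the vocal-tract filter, c(q) for the cepstrum, and q_c for a lifter cutoff, with the quefrency index q taken symmetric about zero.

```latex
\begin{align*}
x(n) &= (e * h)(n) && \text{time domain: convolutive} \\
|X(k)| &= |E(k)| \cdot |H(k)| && \text{magnitude spectrum: multiplicative} \\
\log |X(k)| &= \log |E(k)| + \log |H(k)| && \text{log-magnitude spectrum: additive} \\
c(q) &= \mathrm{IDFT}\{\log |X(k)|\} && \text{cepstrum} \\
\log |\hat{H}(k)| &= \mathrm{DFT}\{c(q)\,\ell(q)\}, \quad
\ell(q) = \begin{cases} 1, & |q| < q_c \\ 0, & \text{otherwise} \end{cases}
&& \text{low-quefrency lifter}
\end{align*}
```

Because the smooth envelope varies slowly along frequency, it concentrates at low quefrencies, while the fine harmonic structure of the excitation sits at higher quefrencies; keeping only the low part therefore yields an estimate of the envelope.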

Machine Learning

Application to audio signals:
  • Speech recognition
  • Speaker recognition
  • Singing voice detection
  • Genre classification
  • Instrument recognition
  • Chord recognition
  • etc …

Machine Learning

Learning principles:
  • Unsupervised learning: find structures in data
  • Supervised learning: human observer provides "ground truth"
  • Semi-supervised learning: combination of above principles
  • Reinforcement learning: feedback of "confident" classifications to the training

The Feature Space

Geometric and algebraic interpretation of ML problems:
  • Features contain numerical values
  • Concatenation of several features: dimensionality M
  • The data set contains N observations: cardinality N

Illustrative example: SFM & SCF of 6 complex tones

$$\mathrm{SC} = \frac{\sum_{k=0}^{K-1} f(k)\, s(k)}{\sum_{k=0}^{K-1} s(k)}$$

$$\mathrm{SF} = \frac{\sqrt[K]{\prod_{k=0}^{K-1} s(k)}}{\frac{1}{K} \sum_{k=0}^{K-1} s(k)}$$
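The two features can be computed directly from one spectral frame. In the sketch below, s(k) is assumed to be the magnitude of frequency bin k and f(k) its center frequency in Hz; the surviving slide text does not define either symbol, so this reading, the function names, and the two toy spectra are assumptions.

```python
import math

def spectral_centroid(s, f):
    """Magnitude-weighted mean frequency: sum f(k) s(k) / sum s(k)."""
    return sum(fk * sk for fk, sk in zip(f, s)) / sum(s)

def spectral_flatness(s):
    """Geometric mean of s(k) divided by its arithmetic mean.

    The geometric mean is computed through logarithms for numerical
    stability, so every s(k) must be strictly positive.
    """
    K = len(s)
    log_mean = sum(math.log(sk) for sk in s) / K
    return math.exp(log_mean) / (sum(s) / K)

# Hypothetical 8-bin spectra: one flat (noise-like), one with a single peak.
freqs = [100.0 * k for k in range(8)]
flat = [1.0] * 8
peaky = [1e-3] * 8
peaky[2] = 1.0

sc_flat, sf_flat = spectral_centroid(flat, freqs), spectral_flatness(flat)
sc_peaky, sf_peaky = spectral_centroid(peaky, freqs), spectral_flatness(peaky)
print(sc_flat, sf_flat)
print(sc_peaky, sf_peaky)
```

A flat spectrum gives a flatness of 1, while a spectrum dominated by one bin gives a flatness near 0 and a centroid close to that bin's frequency.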

The Feature Space

  • Each feature has one value: M = 2
  • Number of observations: N = 6

Feature matrix (N rows, M columns):

File                 SpectralCentroid   SpectralFlatness
lpNoiseTone.wav      258.62             0.59
noiseTone.wav        512.73             0.99
hpNoiseTone.wav      550.13             0.92
harmonicNoise.wav    146.50             0.27
pianoTone.wav         47.93             0.01
harmonicTone.wav      43.95             0.01

The Feature Space

  • Mapping of features: SC to y-axis, SF to x-axis
  • Scatter plot with unnormalized axes

[Figure: Scatter plot of Spectral Flatness vs. Spectral Centroid; x-axis: Spectral Flatness, 0 to 1; y-axis: Spectral Centroid, 0 to 600; legend: lpNoiseTone.wav, noiseTone.wav, hpNoiseTone.wav, harmonicNoiseTone.wav, pianoTone.wav, harmonicTone.wav]

The Feature Space

Target class labels: provided by manual annotation

SpectralCentroid   SpectralFlatness   TargetLabels
258.62             0.59               0
512.73             0.99               0
550.13             0.92               0
146.50             0.27               1
 47.93             0.01               1
 43.95             0.01               1
⋮                  ⋮

Classification methods

k-Nearest Neighbours (kNN)

[Figure: feature space; legend: Singing Voice, Accompaniment, Unknown data]

L1-Dist. (Manhattan):

$$d_1 = \sum_{m=1}^{M} |x_m - y_m|$$

L2-Dist. (Euclidean):

$$d_2 = \sqrt{\sum_{m=1}^{M} (x_m - y_m)^2}$$

L∞-Dist. (Maximum):

$$d_\infty = \max\left(|x_1 - y_1|, \dots, |x_M - y_M|\right)$$
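A minimal kNN sketch using the three distances and the six labelled observations from the feature-space example. The majority-vote rule, the function names, k = 3, and the two query points are assumptions chosen for illustration; the features are left unnormalized, as in the scatter plot, so the Spectral Centroid dominates every distance.

```python
import math
from collections import Counter

def l1(x, y):
    """Manhattan distance."""
    return sum(abs(a - b) for a, b in zip(x, y))

def l2(x, y):
    """Euclidean distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def linf(x, y):
    """Maximum distance."""
    return max(abs(a - b) for a, b in zip(x, y))

def knn_predict(train_x, train_y, query, k=3, dist=l2):
    """Return the majority label among the k training points nearest to query."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# (SpectralCentroid, SpectralFlatness) rows and target labels from the slides.
train_x = [(258.62, 0.59), (512.73, 0.99), (550.13, 0.92),
           (146.50, 0.27), (47.93, 0.01), (43.95, 0.01)]
train_y = [0, 0, 0, 1, 1, 1]

# Hypothetical unknown observations.
query_a = (100.0, 0.10)
query_b = (500.0, 0.95)

for dist in (l1, l2, linf):
    print(dist.__name__,
          knn_predict(train_x, train_y, query_a, dist=dist),
          knn_predict(train_x, train_y, query_b, dist=dist))
```

Rescaling each feature to a comparable range before computing distances would let the Spectral Flatness contribute as well.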

Classification methods

Decision Trees (DT)

[Figure: feature space; legend: Singing Voice, Accompaniment, Unknown data]

Classification methods

Random Forests (RF)

[Figure: feature space; legend: Singing Voice, Accompaniment, Unknown data]

Classification methods

Gaussian Mixture Models (GMM)

[Figure: feature space with Gauss components; legend: Singing Voice, Accompaniment, Unknown data]

Classification methods

Support Vector Machines (SVM)

[Figure: feature space with sgn-based decision function; legend: Singing Voice, Accompaniment, Unknown data]

Classification methods

Deep Neural Networks (DNN)

[Figure: network with loss function; legend: Singing Voice, Accompaniment, Unknown data]

Classification methods

Further methods:
  • Hidden Markov Models: transition probabilities between GMMs
  • Sparse Representation Classifier: sparse linear combination of training data
  • Boosting: combine many weak classifiers
  • Convolutional Neural Networks
  • Recurrent Neural Networks
  • Multiple Kernel Learning
  • others …

Singing Voice Detection

[Figure: block diagram with the following elements]
  • Frame x_t
  • Filter Bank
  • Mel-scale Frequency Cepstral Coefficients
  • Gaussian Mixture Model (GMM): Gaussians N(μ_1, Σ_1), N(μ_2, Σ_2), ..., N(μ_G, Σ_G) with weights w_1, w_2, ..., w_G, summed to give p(x | λ)
  • Label sequence: V V V V V N N N N N N N N V V N N N N

Segment-by-Segment Classification:

$$\sum_{i=0}^{W-1} \log p(x_{tW+i} \mid \lambda_S) \;\underset{\text{Accompaniment}}{\overset{\text{Singing}}{\gtrless}}\; \sum_{i=0}^{W-1} \log p(x_{tW+i} \mid \lambda_M)$$
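The segment-wise decision can be sketched as below: frame log-likelihoods under a singing model and an accompaniment model are summed over W frames and compared. To keep the example short it uses scalar features and one-dimensional mixtures instead of MFCC vectors; the model parameters, the feature values, and the label letters "V" and "N" are all hypothetical.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def gmm_loglik(x, weights, means, variances):
    """log p(x | model) for a 1-D Gaussian mixture, using log-sum-exp."""
    terms = [math.log(w) + log_gauss(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def classify_segments(feats, singing, accompaniment, W):
    """Label each block of W frames by comparing summed log-likelihoods."""
    labels = []
    for start in range(0, len(feats) - W + 1, W):
        segment = feats[start:start + W]
        ll_s = sum(gmm_loglik(x, *singing) for x in segment)
        ll_m = sum(gmm_loglik(x, *accompaniment) for x in segment)
        labels.append("V" if ll_s > ll_m else "N")
    return labels

# Hypothetical models: (weights, means, variances) with G = 2 components each.
singing = ([0.5, 0.5], [1.0, 2.0], [0.25, 0.25])
accompaniment = ([0.5, 0.5], [-1.0, -2.0], [0.25, 0.25])

# Hypothetical frame-wise features: four singing-like, four accompaniment-like.
feats = [1.1, 1.9, 0.8, 2.2, -1.0, -2.1, -0.9, -1.8]
segment_labels = classify_segments(feats, singing, accompaniment, W=4)
print(segment_labels)
```

Summing over a segment smooths out isolated frames that would be misclassified on their own, at the cost of coarser time resolution.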

Audio Mosaicing

  • Source signal: Bees
  • Target signal: Beatles–Let it be
  • Mosaic signal: Let it Bee

NMF-Inspired Audio Mosaicing

Non-negative matrix factorization (NMF):
  Non-negative matrix (fixed) ≈ Components (learned) · Activations (learned)

Proposed audio mosaicing approach:
  Target's spectrogram (fixed) ≈ Source's spectrogram (fixed) · Activations (learned) = Mosaic's spectrogram

[Driedger et al. ISMIR 2015]

[Figure: target's spectrogram (Frequency vs. Time target), source's spectrogram (Frequency vs. Time source), activations (Time source vs. Time target), mosaic's spectrogram (Frequency vs. Time target)]
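The "source fixed, activations learned" setup can be illustrated with the standard multiplicative NMF update for a Euclidean cost, applied to H only. This is a plain textbook update under assumed toy matrices, not the extended update rules of Driedger et al.; the names `V`, `W`, `H` and the helper functions are chosen for this sketch.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def sq_error(V, W, H):
    """Squared Euclidean distance between V and W * H."""
    WH = matmul(W, H)
    return sum((v - a) ** 2 for rv, ra in zip(V, WH) for v, a in zip(rv, ra))

def learn_activations(V, W, iterations=200, eps=1e-12):
    """Learn non-negative H with W held fixed: H <- H * (W^T V) / (W^T W H)."""
    n_source, n_target = len(W[0]), len(V[0])
    H = [[1.0] * n_target for _ in range(n_source)]
    Wt = transpose(W)
    WtV, WtW = matmul(Wt, V), matmul(Wt, W)
    for _ in range(iterations):
        WtWH = matmul(WtW, H)
        H = [[H[k][n] * WtV[k][n] / (WtWH[k][n] + eps)
              for n in range(n_target)] for k in range(n_source)]
    return H

# Toy data: W plays the role of the source spectrogram (3 bins x 2 frames),
# V the target spectrogram (3 bins x 3 frames).
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 0.0, 2.0], [0.0, 3.0, 1.0], [1.0, 3.0, 3.0]]

err_before = sq_error(V, W, [[1.0] * 3 for _ in range(2)])
H = learn_activations(V, W)
err_after = sq_error(V, W, H)
mosaic = matmul(W, H)  # mosaic spectrogram = source spectrogram * activations
print(err_before, err_after)
```

Because the updates are purely multiplicative, H stays non-negative when initialized with positive values; the product of the fixed source matrix and the learned activations then approximates the target.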

Basic NMF-Inspired Audio Mosaicing

  Spectrogram target ≈ Spectrogram source · Activation matrix = Spectrogram mosaic

[Figure: spectrogram target (Frequency vs. Time target), spectrogram source (Frequency vs. Time source), activation matrix (Time source vs. Time target), spectrogram mosaic (Frequency vs. Time target)]

Basic NMF-Inspired Audio Mosaicing

Core idea: support the development of sparse diagonal activation structures
  • Iterative updates
  • Preserve temporal context

[Figure: Activation matrix]


Audio Mosaicing

  • Source signal: Whales
  • Target signal: Chic–Good times
  • Mosaic signal

https://www.audiolabs-erlangen.de/resources/MIR/2015-ISMIR-LetItBee

Audio Mosaicing

  • Source signal: Race car
  • Target signal: Adele–Rolling in the Deep
  • Mosaic signal

https://www.audiolabs-erlangen.de/resources/MIR/2015-ISMIR-LetItBee

Drum Source Separation

Drum Source Separation Signal Model

[Figure: waveforms (Relative amplitude vs. Time (seconds), 2 to 3.8) and spectrograms labelled V (Log-frequency vs. Time (seconds), 2 to 3.8), connected by STFT and iSTFT]

Drum Sound Separation

Decomposition via NMFD

  • Score-based information (drum notation)
  • Audio-based information (training drum sounds)

[Figure: lateral slices from W (Log-frequency), rows of H (Time (seconds)), panels labelled U]
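For orientation, the commonly used NMFD (non-negative matrix factor deconvolution) model can be written as below. The notation is an assumption for this sketch and may differ from the lecture's: V is the magnitude spectrogram, W_τ is the slice of the template tensor W at time lag τ, T is the template length in frames, and the arrow denotes shifting the columns of H by τ frames to the right with zero padding.

```latex
V \approx \sum_{\tau=0}^{T-1} W_{\tau} \cdot \overset{\tau \rightarrow}{H}
```

Under this reading, each lateral slice of W is a short spectrogram template of one drum sound, and the matching row of H marks where that template is placed in time.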

Drum Sound Separation

[Figure: spectrogram (Log-frequency vs. Time (seconds), 0 to 6.5) and waveform (Relative amplitude vs. Time (seconds), 0 to 6.5)]

https://www.audiolabs-erlangen.de/resources/MIR/2016-IEEE-TASLP-DrumSeparation