Music Processing Applications of Music Processing · 2017. 2. 14. · Gabor Wavelet Spectrogram...

Music Processing

Christian Dittmar

Lecture

Applications of Music Processing

International Audio Laboratories Erlangenchristian.dittmar@audiolabs-erlangen.de

Singing Voice Detection

Important pre-requisite for: Music segmentation Music thumbnailing (preview version) Singing voice transcription Singing voice separation Lyrics alignment Lyrics recognition

Detect singing voice activity during course of a recording Assumptions: Real-world, polyphonic music recordings are

analyzed Singing voice performs dominant melody above

accompaniment

10 15 20 25 30 35 40 45

Time in seconds

Challenges: Complex characteristics of singing voice Large diversity of accompaniment music Accompaniment may play same melody as singing Pitch-fluctuating instruments my be similar to singing

Stable pitch Fluctuating pitch

Common approach: Frame-wise extraction of audio features Classification via machine learning

10 15 20 25 30 35 40 45

Time in seconds

Audio Feature Extraction

Frame-wise processing: Hopsize Q Blocksize K Window function w(n) Signal frame x(n)

Compute for eachanalysis frame: Time-domain features Spectral features Cepstral feature others …

Time-domain features: Zero Crossing Rate (ZCR) High-pitched vs. Low-pitched

Linear Prediction Coeff. (LPC) Encodes spectral envelope

Spectral features: Spectrogram, linear vs. logarithmic frequency spacing Spectral Flatness (SF), Spectral Centroid (SC), and

many others …STFT Spectrogram [dB]

Time [Sec]

0.5 1 1.5 2

Gabor Wavelet Spectrogram [dB]

Time [Sec]

0.5 1 1.5 2

Cepstral features: Singing voice as an example

Convolutive: excitation * filter Excitation: vibration of vocal folds Filter: resonance of the vocal tract

Magnitude spectrum Multiplicative: excitation · filter

Log-magnitude spectrum Additive: excitation + filter

“Liftering” Separation into smooth spectral

envelope and fine-structured excitation

0 0.5 1 1.5 2 2.5

Extraction of spectral envelope via cepstral liftering

0 0.5 1 1.5 2 2.5

x 104Frequency (Hz)

Observed SpectrumSpectral EnvelopeExcitation Spectrum

Machine Learning

Application to audio signals: Speech recognition Speaker recognition Singing voice detection Genre classification Instrument recognition Chord recognition etc …

Machine Learning

Learning principles: Unsupervised learning

Find structures in data

Supervised learning Human observer provides „ground truth“

Semi-supervised learning Combination of above principles

Reinforcement learning Feedback of „confident“ classifications to

the training

The Feature Space

Geometric and algebraic interpretation of ML problems Features contain numerical values

Concatenation of several features Dimensionality M

The data set contains N observations Cardinality N

Illustrative Example SFM & SCF of 6 complex tones

The Feature Space

Each feature has one value M=2

Number of observations N=6

258.62 0.59

512.73 0.99

550.13 0.92

146.50 0.27

47.93 0.01

43.95 0.01

SpectralCentroid

SpectralFlatness

lpNoiseTone.wav

noiseTone.wav

hpNoiseTone.wav

harmonicNoise.wav

pianoTone.wav

harmonicTone.wav

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

Spectral Flatness

Scatter plot of Spectral Flatness vs. Spectral Centroid

lpNoiseTone.wavnoiseTone.wavhpNoiseTone.wavharmonicNoiseTone.wavpianoTone.wavharmonicTone.wav

The Feature Space

Mapping of features SC to y-axis SF to x-axis Scatter plot with

unnormalized axes

The Feature Space

Mapping of features SC to y-axis SF to x-axis Scatter plot with

unnormalized axes

Target class labels Provided by manual

annotation

258.62 0.59

512.73 0.99

550.13 0.92

146.50 0.27

47.93 0.01

43.95 0.01

SpectralCentroid

SpectralFlatness

TargetLabels

⋮ ⋮

Classification methods

k-Nearest Neighbours (kNN)

Singing VoiceAccompanimentUnknown data

k-Nearest Neighbours (kNN)

L1-Dist. (Manhattan)

mmm yxd

L2-Dist. (Euclidean)2

mmm yxd

L∞-Dist. (Maximum)

MM yxyxd

,,max 11

Decision Trees (DT)

Random Forests (RF)

Gaussian Mixture Models (GMM)

∙ Σ

Gaussian Mixture Models (GMM)

Gauss components

Support Vector Machines (SVM)

Deep Neural Networks (DNN)

⋯ , ⋯ ,

Deep Neural Networks (DNN)

Loss function

Further methods: Hidden Markov Models

Transition probabilities between GMMs Sparse Representation Classifier

Sparse linear combination of training data Boosting

Combine many weak classifiers Convolutional Neural Networks Recurrent Neural Networks Multiple Kernel Learning others …

Mel-scale Frequency Cepstral Coefficients

Filter BankFrame

txGaussian Mixture Model (GMM)

11,Σ22 ,Σ ... GG Σ,

)|( xp

N () N ()

V V V V V N N N N N N N NV V NNN N

Segment-by-Segment Classification

0)|(log)|(log

iSitW pp xx

Singing

Accompaniment

Audio MosaicingSource signal: BeesTarget signal: Beatles–Let it be

Mosaic signal: Let it Bee

NMF-Inspired Audio Mosaicing

Non-negative matrix factorization (NMF)

Proposed audio mosaicing approach

Non-negative matrix Components Activations

Target’s spectrogram Source’s spectrogram Activations Mosaic’s spectrogram

learnedfixed

learned

[Driedger et al. ISMIR 2015]

Time source

Time targetTime target

Basic NMF-Inspired Audio Mosaicing

Time target

Time source

. =≈

Spectrogram target

Spectrogram source

SpectrogrammosaicActivation matrix

Time target

Time source

. =≈

Spectrogram target

Spectrogram source

Core idea: support the development of sparse diagonal activation structures

Activation matrix

Iterative updates

Preserve temporal context

Time target

Time source

. =≈

Spectrogram target

Spectrogram source

Time target

Time source

. =≈

Spectrogram target

Spectrogram source

Audio MosaicingSource signal: WhalesTarget signal: Chic–Good times

Mosaic signal

https://www.audiolabs-erlangen.de/resources/MIR/2015-ISMIR-LetItBee

Audio MosaicingSource signal: Race carTarget signal: Adele–Rolling in the Deep

Mosaic signal

https://www.audiolabs-erlangen.de/resources/MIR/2015-ISMIR-LetItBee

Drum Source Separation

2 2.2 2.4 2.6 2.8 3 3.2 3.4 3.6 3.82 2.2 2.4 2.6 2.8 3 3.2 3.4 3.6 3.8

Time (seconds)

V V V V

Time (seconds)

Drum Source Separation Signal Model

Drum Sound Separation Decomposition via NMFD

Time (seconds)Lateral slices from W

Score-based information(drum notation)

Audio-based information(training drum sounds)

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 5.5 6 6.5

Drum Sound Separation

https://www.audiolabs-erlangen.de/resources/MIR/2016-IEEE-TASLP-DrumSeparation

Time (seconds)

Music Processing Applications of Music Processing · 2017. 2. 14. · Gabor Wavelet Spectrogram...

Documents

Book: Fundamentals of Music Processing · Book: Fundamentals of Music Processing Meinard Müller Fundamentals of Music Processing Audio, Analysis, Algorithms, ... we discuss fundamental

Eng. 100: Music Signal Processing DSP Lecture 8 Project 3 ...fessler/course/100/l/l08-project3.pdf · Eng. 100: Music Signal Processing DSP Lecture 8 Project 3: Music signal processing

Digit Recognition using Spectrogram in JDSP

Time%frequency-Masking- - Interactive Audio Lab · Time%frequency-Masking-Zafar-Raﬁi,-Winter-2014- 33 Music spectrogram time frequency +-0 +-0 +-0 Voice spectrogram time frequency

Book: Fundamentals of Music Processing · Book: Fundamentals of Music Processing MeinardMüller Fundamentals of Music Processing Audio, Analysis, Algorithms, Applications 483 p.,

Signal Processing Methods for Music Transcription

Syntactic Processing in Language and Music

PROCESSING OF SYMBOLIC MUSIC NOTATION VIA …

Speech and Music Classiﬁcation Using Hybrid Form of ... · Speech and Music Classiﬁcation Using Hybrid Form of Spectrogram and Fourier Transformation ... is segments and transformed

Audio Processing and Music Recognition

Music-syntactic Processing and Auditory Memory

Enhancing Digital Signal Processing Education With Audio Signal Processing And Music ... · Audio signal processing and music synthesis are familiar, accessible and engaging motivators

Spectrogram Segmentation 2

Spectrogram Analysis

Signal Reconstruction from its Spectrogram

Mass Spectrometryminerva.union.edu/newmanj/Physics200/MSII.pdfUrasil U 112.087003 Mass Spectrometry in Medicine Spectrogram with Abnormal Proteins Expression Spectrogram of Normal

SPECTROGRAM ANALYSIS OF ANIMAL SOUND PRODUCTION

Learning&phonology& Learning&Phonology& Spectrogram&

Brain Organization for Music Processing

Cheng-Che Lee Pei-Yi Patricia Kuo arXiv:2009.08083v1 [cs ... · Figure 1: Let music change the visual style of an image. For example, a spectrogram of Claude Debussy’s music Sarabande