Music Processing
Meinard Müller
Lecture
Audio Decomposition
International Audio Laboratories
Book: Fundamentals of Music Processing
Meinard Müller
Fundamentals of Music Processing
Audio, Analysis, Algorithms, Applications
483 p., 249 illus., hardcover
ISBN: 978-3-319-21944-8
Springer, 2015
Accompanying website: www.music-processing.de
Chapter 8: Audio Decomposition
In the final Chapter 8 on audio decomposition, we present a challenging research direction that is closely related to source separation. Within this wide research area, we consider three subproblems: harmonic–percussive separation, main melody extraction, and score-informed audio decomposition. Within these scenarios, we discuss a number of key techniques including instantaneous frequency estimation, fundamental frequency (F0) estimation, spectrogram inversion, and nonnegative matrix factorization (NMF). Furthermore, we encounter a number of acoustic and musical properties of audio recordings that have been introduced and discussed in previous chapters, which rounds off the book.
8.1 Harmonic-Percussive Separation
8.2 Melody Extraction
8.3 NMF-Based Audio Decomposition
8.4 Further Notes
Why is Music Processing Challenging?
Example: Chopin, Mazurka Op. 63 No. 3
Waveform (amplitude over time in seconds) / Spectrogram (frequency in Hz over time in seconds)
Performance
– Tempo
– Dynamics
– Note deviations
– Sustain pedal
Polyphony
– Main melody
– Accompaniment
– Additional melody line
Source Separation
Decomposition of audio stream into different sound sources
Central task in digital signal processing
“Cocktail party effect”
Several input signals
Sources are assumed to be statistically independent
Source Separation (Music)
Main melody, accompaniment, drum track
Instrumental voices
Individual note events
Only mono or stereo
Sources are often highly dependent
Harmonic-Percussive Decomposition
Mixture: harmonic component (clearly harmonic sounds) and percussive component (clearly percussive sounds)
Extended decomposition of the mixture: harmonic component, residual component, percussive component
• Harmonic component: clearly harmonic sounds of singing voice and accompaniment
• Percussive component: drum hits; fricatives & plosives in singing voice
• Residual component: noise-like sounds; vibrato/glissando sounds
Demo: https://www.audiolabs-erlangen.de/resources/2014-ISMIR-ExtHPSep/
Literature: [Driedger/Müller/Disch, ISMIR 2014]
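The slides do not state how the three components are computed. The following is a minimal sketch, assuming a median-filtering approach with a separation factor: filtering along time emphasizes horizontal (harmonic) structures, filtering along frequency emphasizes vertical (percussive) structures, and everything that is neither clearly harmonic nor clearly percussive goes to the residual. The function names, the filter lengths, the value of `beta`, and the toy spectrogram are illustrative assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_filter(Y, length, axis):
    """Median-filter the 2D array Y along one axis (odd length, edge padding)."""
    pad = length // 2
    pad_width = [(0, 0), (0, 0)]
    pad_width[axis] = (pad, pad)
    padded = np.pad(Y, pad_width, mode="edge")
    # sliding_window_view appends the window as a new last axis
    windows = sliding_window_view(padded, length, axis=axis)
    return np.median(windows, axis=-1)


def hpr_masks(Y, len_time=9, len_freq=9, beta=2.0):
    """Binary masks (harmonic, percussive, residual) for a magnitude
    spectrogram Y with shape (frequency bins, time frames)."""
    Y_h = median_filter(Y, len_time, axis=1)  # smooth along time
    Y_p = median_filter(Y, len_freq, axis=0)  # smooth along frequency
    eps = np.finfo(float).eps
    M_h = Y_h / (Y_p + eps) > beta    # clearly harmonic
    M_p = Y_p / (Y_h + eps) >= beta   # clearly percussive
    M_r = ~(M_h | M_p)                # everything else
    return M_h, M_p, M_r


# Toy magnitude spectrogram (assumed data): one sustained tone, one click.
Y = np.zeros((64, 64))
Y[10, :] = 1.0   # horizontal line: harmonic-like
Y[:, 20] = 1.0   # vertical line: percussive-like

M_h, M_p, M_r = hpr_masks(Y)
Y_h, Y_p, Y_r = Y * M_h, Y * M_p, Y * M_r  # masked component spectrograms
```

With `beta` larger than 1 the harmonic and percussive masks cannot overlap, so the three masked spectrograms add up to the input. A larger `beta` moves more energy into the residual.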
Singing Voice Extraction
Original recording: singing voice, accompaniment
Figure (spectrograms, frequency over time): original recording → HPR → harmonic component, residual component, percussive component; further processing blocks labeled MR, TR, SL, with an F0 annotation as additional input
Singing voice portions: harmonic portion + fricatives + vibrato & formants → estimate singing voice
Accompaniment portions: harmonic portion + instrument onsets + diffuse instrument sounds → estimate accompaniment
Score-Informed Source Separation
Exploit musical score to support separation process
Figure: score representations (pitch over time) and spectrogram (frequency in Hz over time)
Parametric Model Approach
Rebuild spectrogram information
Figure: spectrogram (frequency in Hz over time in seconds) ≈ model spectrogram; figure labels: Estimate, Parameters, Render
NMF (Nonnegative Matrix Factorization)
Magnitude spectrogram ≈ Templates · Activations
All three matrices are nonnegative (≥ 0); dimension labels in the figure: N, K, M
Templates: Pitch + Timbre (“How does it sound”)
Activations: Onset time + Duration (“When does it sound”)
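The factorization can be sketched in a few lines. This is a minimal example, assuming the widely used multiplicative update rules for the Euclidean distance; the slides do not say which update rules or distance measure are used. The symbols `V` (magnitude spectrogram), `W` (templates), `H` (activations), the function name `nmf`, and the toy data are assumptions made for illustration.

```python
import numpy as np


def nmf(V, rank, n_iter=200, seed=0):
    """Factorize a nonnegative matrix V (frequency x time) as V ≈ W @ H."""
    rng = np.random.default_rng(seed)
    n_freq, n_time = V.shape
    W = rng.random((n_freq, rank)) + 1e-3   # templates: "how does it sound"
    H = rng.random((rank, n_time)) + 1e-3   # activations: "when does it sound"
    eps = np.finfo(float).eps               # avoids division by zero
    for _ in range(n_iter):
        # Multiplicative updates keep all entries nonnegative
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Toy example (assumed data): two spectral templates active at different times.
W_true = np.array([[1.0, 0.0], [0.5, 0.0], [0.0, 1.0], [0.0, 0.8]])
H_true = np.array([[1.0, 1.0, 0.0, 0.0, 0.5],
                   [0.0, 0.0, 1.0, 1.0, 0.5]])
V = W_true @ H_true

W, H = nmf(V, rank=2)
error = np.linalg.norm(V - W @ H)  # approximation error after the updates
```

Because the updates only multiply by nonnegative factors, nonnegativity of `W` and `H` is preserved without any explicit projection step.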
NMF-Decomposition
Random initialization
Figure: initialized templates (frequency over note number) and initialized activations (note number over time)
Figure: learnt templates and learnt activations
Random initialization → No semantic meaning
Constrained initialization
Figure: initialized templates (frequency over note number) and initialized activations (note number over time), with the template constraint and the activation constraints highlighted for p=55
Figure: learnt templates and learnt activations; original spectrogram (“Org”) and model spectrogram (“Model”)
Constrained initialization → NMF as refinement
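A constrained initialization of this kind can be sketched as follows. The example assumes that template constraints allow energy only near the harmonics of a pitch and that activation constraints allow energy only around the note events of a score, and it again assumes multiplicative updates, under which entries initialized to zero stay zero. All function names, tolerance values, the frequency and time axes, the note events, and the random stand-in spectrogram are hypothetical.

```python
import numpy as np


def midi_to_hz(p):
    """Center frequency of MIDI pitch p (A4 = 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((p - 69) / 12.0)


def init_template(p, freqs, n_harmonics=8, tol=0.5):
    """Template for pitch p: nonzero only within +/- tol semitones of each harmonic."""
    w = np.zeros(len(freqs))
    for h in range(1, n_harmonics + 1):
        center = h * midi_to_hz(p)
        band = (freqs >= center * 2 ** (-tol / 12)) & (freqs <= center * 2 ** (tol / 12))
        w[band] = 1.0 / h  # assumed decay of harmonic strength
    return w


def init_activation(notes, times, tol=0.2):
    """Activation row: nonzero only around the note events (onset, offset) in seconds."""
    h = np.zeros(len(times))
    for onset, offset in notes:
        h[(times >= onset - tol) & (times <= offset + tol)] = 1.0
    return h


freqs = np.arange(0, 2001, 10.0)   # assumed frequency axis in Hz
times = np.arange(0, 10.0, 0.1)    # assumed time axis in seconds
score = {55: [(1.0, 2.0)], 59: [(2.0, 3.5), (6.0, 7.0)]}  # hypothetical note events

W0 = np.stack([init_template(p, freqs) for p in score], axis=1)
H0 = np.stack([init_activation(n, times) for n in score.values()], axis=0)

# NMF as refinement, run here on a random stand-in for a real spectrogram
rng = np.random.default_rng(0)
V = rng.random((len(freqs), len(times)))
W, H = W0.copy(), H0.copy()
eps = np.finfo(float).eps
for _ in range(50):
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)
```

Since a zero entry multiplied by any finite factor remains zero, the learnt templates and activations keep the structure given by the score, and each component retains its meaning as one pitch.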
Score-Informed Audio Decomposition
Application: Audio editing
Figure: two spectrogram panels (frequency in Hertz over time in seconds), each with a zoomed excerpt; zoomed frequency labels 500, 523, 580 Hz in the first panel and 500, 554, 580 Hz in the second
Informed Drum-Sound Decomposition
Demo: https://www.audiolabs-erlangen.de/resources/MIR/2016-IEEE-TASLP-DrumSeparation
Literature: [Dittmar/Müller, IEEE/ACM-TASLP 2016]
Remix:
Audio Mosaicing
Source signal: Bees
Target signal: Beatles–Let it be
Mosaic signal: Let it Bee
Demo: https://www.audiolabs-erlangen.de/resources/MIR/2015-ISMIR-LetItBee
Literature: [Driedger/Müller, ISMIR 2015]
NMF-Inspired Audio Mosaicing
Non-negative matrix factorization (NMF): Non-negative matrix (fixed) ≈ Components (learned) · Activations (learned)
Proposed audio mosaicing approach: Target’s spectrogram (fixed) ≈ Source’s spectrogram (fixed) · Activations (learned) = Mosaic’s spectrogram
Figure: spectrogram target (frequency over time target) ≈ spectrogram source (frequency over time source) · activation matrix (time source over time target) = spectrogram mosaic (frequency over time target)
Core idea: support the development of sparse diagonal activation structures
Iterative updates
Preserve temporal context
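The idea of keeping the source spectrogram fixed and learning only the activations can be sketched as follows. This is a simplified illustration and not the procedure of the cited paper: it assumes a Euclidean-distance multiplicative update for the activations and a plain summation along diagonals as a stand-in for the step that supports diagonal activation structures. The function names, the `context` parameter, and the toy data are assumptions.

```python
import numpy as np


def enhance_diagonals(H, context=1):
    """Sum H along its diagonals over 2*context+1 neighbours (zero beyond the border)."""
    K, M = H.shape
    out = np.zeros_like(H)
    for i in range(-context, context + 1):
        dst_r, src_r = slice(max(0, -i), min(K, K - i)), slice(max(0, i), min(K, K + i))
        dst_c, src_c = slice(max(0, -i), min(M, M - i)), slice(max(0, i), min(M, M + i))
        out[dst_r, dst_c] += H[src_r, src_c]  # out[k, m] += H[k + i, m + i]
    return out


def learn_activations(V_target, W_source, n_iter=50, context=0, seed=0):
    """Keep W_source fixed and learn only H so that W_source @ H ≈ V_target."""
    rng = np.random.default_rng(seed)
    H = rng.random((W_source.shape[1], V_target.shape[1])) + 1e-3
    eps = np.finfo(float).eps
    for _ in range(n_iter):
        if context > 0:
            # Favor runs of consecutive source frames (temporal context)
            H = enhance_diagonals(H, context)
        H *= (W_source.T @ V_target) / (W_source.T @ W_source @ H + eps)
    return H


# Toy data (assumed): the target is an excerpt of the source frames.
rng = np.random.default_rng(1)
W_source = rng.random((20, 30))    # source spectrogram: frequency x time source
V_target = W_source[:, 5:15]       # target spectrogram: frequency x time target

H = learn_activations(V_target, W_source)   # time source x time target
V_mosaic = W_source @ H                     # mosaic spectrogram
```

Each column of `H` states which source frames are combined to imitate one target frame, so a diagonal run in `H` corresponds to a stretch of the source played back in its original order.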
Audio Mosaicing
Source signal: Whales
Target signal: Chic–Good times
Mosaic signal
Audio Mosaicing
Source signal: Race car
Target signal: Adele–Rolling in the Deep
Mosaic signal
Links
SiSEC: Signal Separation Evaluation Campaign
https://www.sisec17.audiolabs-erlangen.de/
MedleyDB: A Dataset of Multitrack Audio
http://steinhardt.nyu.edu/marl/research/medleydb
LibROSA (Python)
https://librosa.github.io/librosa/