23
Making Sense of S d dM i Sound and Music Mark Plumbley Centre for Digital Music Centre for Digital Music Queen Mary, University of London CREST Symposium on Human-Harmonized Information Technology Kyoto, Japan 1 April 2012 O i Overview S Separating sounds Extracting musical notes Following beats Visualisation and manipulation Non-speech non-music sounds

Making Sense of SddMiSound and Musicsap.ist.i.kyoto-u.ac.jp/crest/sympo12/program/S2-Plumble... · 2012-04-20 · Y 10 1 X½A 3BX = ½ x A + 3 x B 5 3 These are the original 0 5 10

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Making Sense of SddMiSound and Musicsap.ist.i.kyoto-u.ac.jp/crest/sympo12/program/S2-Plumble... · 2012-04-20 · Y 10 1 X½A 3BX = ½ x A + 3 x B 5 3 These are the original 0 5 10

Making Sense ofS d d M iSound and Music

Mark Plumbley

Centre for Digital MusicCentre for Digital Music

Queen Mary, University of London

CREST Symposium on Human-Harmonized Information TechnologyKyoto, Japan 1 April 2012

O iOverviewS• Separating sounds

• Extracting musical notes

• Following beats

• Visualisation and manipulation

• Non-speech non-music sounds

Page 2: Making Sense of SddMiSound and Musicsap.ist.i.kyoto-u.ac.jp/crest/sympo12/program/S2-Plumble... · 2012-04-20 · Y 10 1 X½A 3BX = ½ x A + 3 x B 5 3 These are the original 0 5 10

M h kMany thanks to …• Samer Abdallah• Emmanouil Benetos• Thomas Blumensath

• Jason Hockmann• Anssi Klapuri

Chris Landone• Thomas Blumensath• Chris Cannam• Matthew Davies

• Chris Landone • Andrew Nesbit• Marcus Pearce

• Mike Davies• Simon Dixon• George Fazekas

• Josh Reiss• Andrew Robertson

Mark Sandler• George Fazekas• Dimitrios Giannoulis• Chris Harte

• Mark Sandler• Adam Stark• Dan Stowell

• Fabio Hedayioglu • Geraint Wigginsand many others…

And thanks to funders:EPSRC, Leverhulme Trust, EU Framework 7, TSB, ...

Separating Mixed SoundsSeparating Mixed Sounds

Page 3: Making Sense of SddMiSound and Musicsap.ist.i.kyoto-u.ac.jp/crest/sympo12/program/S2-Plumble... · 2012-04-20 · Y 10 1 X½A 3BX = ½ x A + 3 x B 5 3 These are the original 0 5 10

Th “C k il P ” blThe “Cocktail Party” problem

Si l “C k il P ” blSimpler “Cocktail Party” problemMi i

Source 1Mi h 1

Mixing

Microphone 1

Source 2Microphone 2

(In Maths: Microphones = Mixing x Sources x = As )

Problem: How can we “unmix” theseProblem: How can we unmix these,if we only have the microphone signals?

Page 4: Making Sense of SddMiSound and Musicsap.ist.i.kyoto-u.ac.jp/crest/sympo12/program/S2-Plumble... · 2012-04-20 · Y 10 1 X½A 3BX = ½ x A + 3 x B 5 3 These are the original 0 5 10

T hi i lTry something simpler

Let’s try with dice instead of sounds

2 coloured dice one Amber (A) and one Blue (B)2 coloured dice, one Amber (A) and one Blue (B)

1 Secretly add some of A to some of B Call this X1. Secretly add some of A to some of B. Call this X

Example: X = ½ x A + 3 x B2. Do again with different amounts. Call this Y

Example: Y = 2 x A + 1 x B3. Roll the dice and write down the numbers X and Y

4. Give me the numbers.4. Give me the numbers.

Can I work out A and B, and how you mixed them?

Mi d di llMixed die rolls

You give me these:

X Y

7 67 6

10.5 7

Let’s plot them6 9

18.5 12

Let s plot them...

(etc...) (etc...)

Page 5: Making Sense of SddMiSound and Musicsap.ist.i.kyoto-u.ac.jp/crest/sympo12/program/S2-Plumble... · 2012-04-20 · Y 10 1 X½A 3BX = ½ x A + 3 x B 5 3 These are the original 0 5 10

S l f h d i20

Scatter plot of the data points

15

20

15

X=7, Y=6

10Y6

5

0 5 10 15 20 250

X7 X

Plotted data points create a lozenge shape. Hmm...

D li ll l i h h hDraw lines parallel with the shape20 1/2

15

20

215

10Y

15 1

3

0 5 10 15 20 250

XX

Page 6: Making Sense of SddMiSound and Musicsap.ist.i.kyoto-u.ac.jp/crest/sympo12/program/S2-Plumble... · 2012-04-20 · Y 10 1 X½A 3BX = ½ x A + 3 x B 5 3 These are the original 0 5 10

R d ff i f A d BRead off mixtures for A and B20 1/2

15

20 1/2

2

Y = 2 x A + 1 x B

15 2

10Y1

X ½ A 3 B 5 3X = ½ x A + 3 x B

These are the original

0 5 10 15 20 250

X

These are the originalmixing equations! *

X

Then use School Math to solve for A and B. Done!* We might have swapped A and B, or scaled them

D h f di i lDo the same for audio signals

I d d t

Method called: “Independent Component Analysis” (ICA)

Unmi edMicrophone 1

IndependentComponent

Analysis UnmixedSource 1

Analysis

Microphone 2UnmixedSource 2

Microphone 2

Page 7: Making Sense of SddMiSound and Musicsap.ist.i.kyoto-u.ac.jp/crest/sympo12/program/S2-Plumble... · 2012-04-20 · Y 10 1 X½A 3BX = ½ x A + 3 x B 5 3 These are the original 0 5 10

Separating More Mixed SoundsSeparating More Mixed Sounds

S b hStereo, but more than two sourcesWith: Nesbit, Jafari

Page 8: Making Sense of SddMiSound and Musicsap.ist.i.kyoto-u.ac.jp/crest/sympo12/program/S2-Plumble... · 2012-04-20 · Y 10 1 X½A 3BX = ½ x A + 3 x B 5 3 These are the original 0 5 10

Si l l d ’ kSimple scatter plot doesn’t work

25

30

??Try our dice again:

3 dice, 2 mixtures

20

25 ?We can’t see three

,

15Y

We can t see threeobvious lines!

5

10 ?What can we do?

0 10 20 300

5

Change the game...0 10 20 30

X

S d h h i f iSounds have changing frequenciesLet’s look at how the frequencies change over timeLet s look at how the frequencies change over time

(a “spectrogram”)High

cy

Sounds only f

High

eque

nc use a fewfrequencies

Fre

This is called“sparse”

Low

TimeTime

Page 9: Making Sense of SddMiSound and Musicsap.ist.i.kyoto-u.ac.jp/crest/sympo12/program/S2-Plumble... · 2012-04-20 · Y 10 1 X½A 3BX = ½ x A + 3 x B 5 3 These are the original 0 5 10

S di lSparse dice exampleImagine you have toImagine you have to throw a 6 before you can score anything. But if we mix themcan score anything.

4 -> 0

But if we mix them, we can now easily see the mixture lines!

20

252 -> 04 -> 06 4 > 4

15

20

Y

6, 4 -> 41 -> 0(etc )

5

10Y(etc.)

Most scores are zero

0 10 20

0-> “Sparse”

0 10 20X

S l fScatter plot of spectrogramMake the scatter plot trick again,Make the scatter plot trick again,but this time with the numbers from the spectrogram

Colour allColour all these patchesne L patches the same colourop

hon L

t mic

ro

R

Left

Changing “Direction of Arrival”

Right microphone

Page 10: Making Sense of SddMiSound and Musicsap.ist.i.kyoto-u.ac.jp/crest/sympo12/program/S2-Plumble... · 2012-04-20 · Y 10 1 X½A 3BX = ½ x A + 3 x B 5 3 These are the original 0 5 10

C l d dColour-coded spectrogramy

quen

cyF

req

Time

Extracting Musical NotesExtracting Musical Notes

Page 11: Making Sense of SddMiSound and Musicsap.ist.i.kyoto-u.ac.jp/crest/sympo12/program/S2-Plumble... · 2012-04-20 · Y 10 1 X½A 3BX = ½ x A + 3 x B 5 3 These are the original 0 5 10

A i M i T i iAutomatic Music TranscriptionWith: Abdallah

MIDI(“Piano Roll”)

Several notes playing at once( )

Play the notes

Several notes playing at once

(Liszt: Etude No. 5 aus Grandes Etudes de Paganini. MIDI from Classical Piano Midi Page http://www.piano-midi.de, copyright Bernd Krueger)

y

P blProblem:Extract notesfrom thisfrom this

Spectrogram

H d hi ?How can we do this?

Musical notes are very sparse

• Out of 88 notes on a piano only a few are played at onceOut of 88 notes on a piano, only a few are played at once

So idea:So idea:

1. Search for ways to turn our spectrogram into something even sparser

2. Then we have found the notes (we hope!)

• We are looking for a “Sparse Representation”

Page 12: Making Sense of SddMiSound and Musicsap.ist.i.kyoto-u.ac.jp/crest/sympo12/program/S2-Plumble... · 2012-04-20 · Y 10 1 X½A 3BX = ½ x A + 3 x B 5 3 These are the original 0 5 10

G i “ ”0 1 5

Getting to “even sparser”

- 0 . 0 5

0

0 . 0 5

0 . 1

0 . 1 5

-0.05

0

0.05Audio signal:Mostly non-zero.Not sparse

1 2 3 4 5 6 7- 0 . 1 5

- 0 . 1

T im e / s

2 2.005 2.01 2.015

Not sparse

1

1.5Spectrogram:Many small valuesFairly sparse

20 40 60 80 100 120

0.5Fairly sparse

1

Notes:

0 20 40 60 80 1000

0.5

Note

Mostly zeroVery sparse

Note

Ob iE l XObservations(Spectrogram

of Music)

Example:

Harpsichord music:Bach Partitai A Mi BWV827

)

in A Minor BWV827

Time

Fre

q Notes aresparser

X = A × S

Time

SparseD iti

pthan spectrogram

X = A × S

A S

DecompositionNote

frequenciesNotes

A Sfrequencies

=

otes

eq

דNotes” are

“Notes” Time

“No

FreNotes are

discovered

Page 13: Making Sense of SddMiSound and Musicsap.ist.i.kyoto-u.ac.jp/crest/sympo12/program/S2-Plumble... · 2012-04-20 · Y 10 1 X½A 3BX = ½ x A + 3 x B 5 3 These are the original 0 5 10

Following BeatsFollowing Beats

B T kiBeat TrackingWith: MEP Davies

1. Measure how much the audio signal changes

-> Biggest at note onsets (“Onset Detection Function”)gg ( )

2. Find regular pattern of peaks -> Beat Locations

Now, the computer can “tap along to the music” ...

Page 14: Making Sense of SddMiSound and Musicsap.ist.i.kyoto-u.ac.jp/crest/sympo12/program/S2-Plumble... · 2012-04-20 · Y 10 1 X½A 3BX = ½ x A + 3 x B 5 3 These are the original 0 5 10

Rh h T f iRhythm Transformation

• Extend Beat Tracking to Bar level: Rhythm Tracking

• Rhythm Tracking on model (top) and original (bottom)Rhythm Tracking on model (top) and original (bottom)

• Time-scale segments of original to rhythm of model

Original

Result

Original

Li b ki iLive beat tracking: accompanimentWith: Robertson

B-Keeper System

Page 15: Making Sense of SddMiSound and Musicsap.ist.i.kyoto-u.ac.jp/crest/sympo12/program/S2-Plumble... · 2012-04-20 · Y 10 1 X½A 3BX = ½ x A + 3 x B 5 3 These are the original 0 5 10

B K idB-Keeper video

• B-Keeper System: Andrew Robertson

• Drums: David NockDrums: David Nock

• Glockenspiel: Dave Meckin

[Video]

Music and InformationMusic and Information

Page 16: Making Sense of SddMiSound and Musicsap.ist.i.kyoto-u.ac.jp/crest/sympo12/program/S2-Plumble... · 2012-04-20 · Y 10 1 X½A 3BX = ½ x A + 3 x B 5 3 These are the original 0 5 10

M i d I f iMusic and InformationWith: Abdallah, Pearce, Wiggins, ...

• Idea: Listening to notes gives information* about music• Notes are: “surprising” (high information)Notes are: surprising (high information)

or “not surprising” (low information)• Each note can tell us something new about the futureg• Not “following” the music, but “predicting” the music!

Prediction

* - Information Theory: same as communications engineers use

Fi d h b d i i iFind the boundaries in music

• Measure how much the prediction has to change

Philip Glass: Two Pages

Page 17: Making Sense of SddMiSound and Musicsap.ist.i.kyoto-u.ac.jp/crest/sympo12/program/S2-Plumble... · 2012-04-20 · Y 10 1 X½A 3BX = ½ x A + 3 x B 5 3 These are the original 0 5 10

H l l i i l lik ?Help explain music people like?

• Too predictable from the start? -> boring

C ’t k f it? d d • Can’t make any sense of it? -> sounds random

• Music unfolds bit by bit? -> just right y j g

“Wundtcurve”curve

Also use this idea to build models of the music

Visualisation & ManipulationVisualisation & Manipulation

Page 18: Making Sense of SddMiSound and Musicsap.ist.i.kyoto-u.ac.jp/crest/sympo12/program/S2-Plumble... · 2012-04-20 · Y 10 1 X½A 3BX = ½ x A + 3 x B 5 3 These are the original 0 5 10

S i Vi liSonic VisualiserCannam, Sandler, ...

• Visualise and edit content-derived metadata (low-level audio features and semantic descriptors)p )

• Open source

• VAMP plug-in API for creating new feature extractors• VAMP plug-in API for creating new feature extractors

• Plug-ins for onset, beat, structural segmentation, key, transcription etctranscription, etc

• Contribute/consume “Web 2.0” Linked Open Data

U d b MIR h i l i t t• Used by MIR researchers, musicologists, etc.(> 200,000 downloads)

S i Vi liSonic Visualiser

[Video]

Page 19: Making Sense of SddMiSound and Musicsap.ist.i.kyoto-u.ac.jp/crest/sympo12/program/S2-Plumble... · 2012-04-20 · Y 10 1 X½A 3BX = ½ x A + 3 x B 5 3 These are the original 0 5 10

H i Vi liHarmonic Visualiser Wen, Sandler

• Signal modelled as quasiharmonic sinusoids plus residual

• Handles inharmonicityHandles inharmonicity, captures complete note

• Models vibrato• Models vibrato, enabling modification

• Musicologists:• Musicologists: “what-if” analysis

• Studio: edit pitch and• Studio: edit pitch and vibrato, remove notes, etcetc.

[Video]

EASAIER: Audio Visual ToolEASAIER: Audio-Visual Tool Zhou, Reiss

Time stretch – ½ - 2 xPitch Shifting – [-1,+1] octaveTransient Detection/ Peak LockingTime Freeze – freezing audio and video in time (see/hear chord played in particular time instance)

Video presentation – use mouse wheel to zoom in/out

[Video]

Page 20: Making Sense of SddMiSound and Musicsap.ist.i.kyoto-u.ac.jp/crest/sympo12/program/S2-Plumble... · 2012-04-20 · Y 10 1 X½A 3BX = ½ x A + 3 x B 5 3 These are the original 0 5 10

Non-speech Non-music soundsNon speech Non music sounds

Separating Heart SoundsSeparating Heart Sounds

Page 21: Making Sense of SddMiSound and Musicsap.ist.i.kyoto-u.ac.jp/crest/sympo12/program/S2-Plumble... · 2012-04-20 · Y 10 1 X½A 3BX = ½ x A + 3 x B 5 3 These are the original 0 5 10

M di l S d H S dMedical Sounds: Heart SoundsWith: Hedayioglu, Jafari, Coimbra, Mattos

• Produced by valves and blood flow

• Important screening toolImportant screening tool

• Difficult to hear

• May just hear “boomp boomp”• May just hear boomp - boomp

• But the second “boomp” has two i t t timportant parts:

• A2 (aorta) – to body

• P2 (pulmonary artery) – to lungs

A2 P2

• Can we separate them?

S1 S2

O l i h ( h )Only one microphone (stethoscope)

• Clinician listens to 4 different places (“auscultation”)

• Each sound is a slightly different mixtureEach sound is a slightly different mixture

• The heart sound repeats, so:

1. Line them upp

2. Use Independent Component Analysis (ICA)

Page 22: Making Sense of SddMiSound and Musicsap.ist.i.kyoto-u.ac.jp/crest/sympo12/program/S2-Plumble... · 2012-04-20 · Y 10 1 X½A 3BX = ½ x A + 3 x B 5 3 These are the original 0 5 10

Li dLine up and separate...Play bothPlay bothwith P2 delayed

Hear both A2 & P2

“boomp - ba-doomp”

A2

Separated

P2

sounds

Natural Sounds: BirdsongNatural Sounds: Birdsong

Page 23: Making Sense of SddMiSound and Musicsap.ist.i.kyoto-u.ac.jp/crest/sympo12/program/S2-Plumble... · 2012-04-20 · Y 10 1 X½A 3BX = ½ x A + 3 x B 5 3 These are the original 0 5 10

Bi d S i & Cl iBirdsong Segmentation & ClusteringWith: Stowell, Briefer, McElligott

[Skylark demo]

C l iConclusions

• Separating sounds, extracting notes, beats, ...

• Visualization and manipulationVisualization and manipulation

Where next?Where next?

• Digital Media everywhere

• Personal devices, social networking, audio and video

• More interaction with music – not just passive consumers

• “Non-speech non-music audio”

• Medical sounds, Environmental sounds, Urban soundsMedical sounds, Environmental sounds, Urban sounds