22
E4896 Music Signal Processing (Dan Ellis) 2013-04-15 - /22 Lecture 12: Alignment and Matching Dan Ellis Dept. Electrical Engineering, Columbia University [email protected] http://www.ee.columbia.edu/~dpwe/e4896/ 1 1. Music Alignment 2. Cover Song Detection 3. Echo Nest Analyze ELEN E4896 MUSIC SIGNAL PROCESSING

ELEN E4896 MUSIC SIGNAL PROCESSING Lecture 12: …dpwe/e4896/lectures/E4896-L12.pdf · E4896 Music Signal Processing ... Lecture 12: Alignment and Matching Dan Ellis Dept. Electrical

Embed Size (px)

Citation preview

Page 1: ELEN E4896 MUSIC SIGNAL PROCESSING Lecture 12: …dpwe/e4896/lectures/E4896-L12.pdf · E4896 Music Signal Processing ... Lecture 12: Alignment and Matching Dan Ellis Dept. Electrical

E4896 Music Signal Processing (Dan Ellis) 2013-04-15 - /22

Lecture 12:Alignment and Matching

Dan EllisDept. Electrical Engineering, Columbia University

[email protected] http://www.ee.columbia.edu/~dpwe/e4896/1

1. Music Alignment2. Cover Song Detection3. Echo Nest Analyze

ELEN E4896 MUSIC SIGNAL PROCESSING

Page 2: ELEN E4896 MUSIC SIGNAL PROCESSING Lecture 12: …dpwe/e4896/lectures/E4896-L12.pdf · E4896 Music Signal Processing ... Lecture 12: Alignment and Matching Dan Ellis Dept. Electrical

E4896 Music Signal Processing (Dan Ellis) 2013-04-15 - /22

1. Music Alignment

• Often have versions of the same music with unmatched time axesdifferent performancesperformance vs. score

• Various applications for aligning themsynchronizing different tracks (with TSM)synchronized score displayground truth transcriptions

2

Kurth et al., 2007

Page 3: ELEN E4896 MUSIC SIGNAL PROCESSING Lecture 12: …dpwe/e4896/lectures/E4896-L12.pdf · E4896 Music Signal Processing ... Lecture 12: Alignment and Matching Dan Ellis Dept. Electrical

E4896 Music Signal Processing (Dan Ellis) 2013-04-15 - /2216 32 48 time / beats

CDEGAB

Let It Be - Nick Cave

Euclidean Distance

16

Let I

t Be

- The

Bea

tles

3248

time

/ bea

tsCDEGAB

1

2

3

4

5

6

The Similarity Matrix• Point-to-point comparison of sequences

3

e.g. Euclidean distance

or normalized inner product (cosine distance)

dcos(i, j) = 1��

k xi(k)yj(k)��k |xi(k)|2

�k |yj(k)|2

deuc(i, j) =�

k

|xi(k)� yj(k)|2

dijj

i

Foote 1999

Page 4: ELEN E4896 MUSIC SIGNAL PROCESSING Lecture 12: …dpwe/e4896/lectures/E4896-L12.pdf · E4896 Music Signal Processing ... Lecture 12: Alignment and Matching Dan Ellis Dept. Electrical

E4896 Music Signal Processing (Dan Ellis) 2013-04-15 - /22

0.1 0.4 0.6 0.8 1.1 1.6 2.3 2.9

0.3 0.3 0.4 0.6 1.0 1.5 2.1 2.6

0.5 0.5 0.4 0.5 0.9 1.5 2.2 2.7

0.8 0.8 0.7 0.6 0.7 1.2 1.8 2.4

1.2 1.2 1.0 1.0 0.7 0.9 1.2 1.5

1.4 1.5 1.3 1.2 1.0 0.9 1.3 1.5

2.1 2.2 2.0 1.8 1.5 1.3 1.2 1.6

2.7 2.8 2.5 2.3 2.0 1.7 1.6 1.51.5

1.2

0.9

0.7

0.6

0.4

0.3

0.1

1 2 3 4 5 6 7 8

1

2

3

4

5

6

7

8

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Dynamic Programming• Find best path combining local + transitions

works for any kind of similarity matrix

4

Allowable transitions

Finds path {ik, jk} to minimize cost ...

... recursively

C�imax,jmax

=�

k

d(ik, jk)

+T (ik � ik�1, jk � jk�1)

T(1,0) = 0.1

T(0,1) = 0.1

T(1,1) = 0.0

{ik, jk}

{ik-1, jk-1}

Local costs dij ; C* ; paths

C�i,j =

minx,y={(1,1),(1,0),(0,1)}

�d(i, j) + T (x, y) + C�

i�x,j�y

� i

j dij

Bellman 1957

Page 5: ELEN E4896 MUSIC SIGNAL PROCESSING Lecture 12: …dpwe/e4896/lectures/E4896-L12.pdf · E4896 Music Signal Processing ... Lecture 12: Alignment and Matching Dan Ellis Dept. Electrical

E4896 Music Signal Processing (Dan Ellis) 2013-04-15 - /22

Audio-to-Audio Alignment• Dynamic programming to get time mapping

+ phase vocoder time scaling

5

50 100 150 200 250 300 350 400

50

100

150

200

250

300

350

400

450

500

Page 6: ELEN E4896 MUSIC SIGNAL PROCESSING Lecture 12: …dpwe/e4896/lectures/E4896-L12.pdf · E4896 Music Signal Processing ... Lecture 12: Alignment and Matching Dan Ellis Dept. Electrical

E4896 Music Signal Processing (Dan Ellis) 2013-04-15 - /22

Audio-Score Alignment• Aligning a score representation (e.g. MIDI)

is a proxy for polyphonic transcription

6

freq

/ Hz

time / sec

Let It Be + aligned MIDI labels

0 2 4 6 8 10 12 14 16 18 20 220

200

400

600

800

1000

Page 7: ELEN E4896 MUSIC SIGNAL PROCESSING Lecture 12: …dpwe/e4896/lectures/E4896-L12.pdf · E4896 Music Signal Processing ... Lecture 12: Alignment and Matching Dan Ellis Dept. Electrical

E4896 Music Signal Processing (Dan Ellis) 2013-04-15 - /22

Peak Structure Distance• How do we match spectra to score notes?

synthesize audio from MIDI & compare audio?“Peak Structure distance”: is energy where we expect?

7

0 50 100 150 200 250 300 350 400 time / frames

time / frames

C2C3C4C5C6

time / sec

freq

/ kHz

freq

/ bins

note

0 5 10 15 200

0.5

1

50 100 150 200 250 300 350 400 45020406080

freq

/ bins

20406080

0 50 100 150 200 250 300 350 4000

MIDI “Piano roll”

Synthesized audio

Predicted spectrum = mask M[k]

“Peak Structure”= energy blw mask

Orio & Schwartz 2001

dpsd = 1��

k M [k]|X[k]|�k |X[k]|

Page 8: ELEN E4896 MUSIC SIGNAL PROCESSING Lecture 12: …dpwe/e4896/lectures/E4896-L12.pdf · E4896 Music Signal Processing ... Lecture 12: Alignment and Matching Dan Ellis Dept. Electrical

E4896 Music Signal Processing (Dan Ellis) 2013-04-15 - /22

2. Cover Song Detection• Musicians are fond of ‘cover versions’

usually alter melody, harmony, instrumentation, rhythm, stylecan be hard to spot even for a human!

• Can try to match via alignment.. with some threshold on best alignment cost?

8

Page 9: ELEN E4896 MUSIC SIGNAL PROCESSING Lecture 12: …dpwe/e4896/lectures/E4896-L12.pdf · E4896 Music Signal Processing ... Lecture 12: Alignment and Matching Dan Ellis Dept. Electrical

E4896 Music Signal Processing (Dan Ellis) 2013-04-15 - /22

Smith-Waterman Local Alignment

• “Local alignment” measure want largest score S*similarity s(i, j) must exceed penalty P(x,y) on avg. (e.g. 0.96 for diagonal, 1.2 for off-diagonal)

9

Beatles vs. Carol Woods ! cosine dist

50 100 150

20406080

100120140160180200

Smith!Waterman cd/2,.96.1.2

50 100 time / beats

time

/ bea

ts• Cover version may have different formdifferent number, ordering of verse/chorus/brigewant to find any large aligned regions

S�i,j = maxx,y

�max{0, s(i, j)� P (x, y) + S�i�x,j�y}

Page 10: ELEN E4896 MUSIC SIGNAL PROCESSING Lecture 12: …dpwe/e4896/lectures/E4896-L12.pdf · E4896 Music Signal Processing ... Lecture 12: Alignment and Matching Dan Ellis Dept. Electrical

E4896 Music Signal Processing (Dan Ellis) 2013-04-15 - /22

Local Alignment Cover Detection

• Smith-Waterman needs predictable valuesuse binary similarity based on best transposition

10

Serrà & Gòmez, 2008

Euclidean Binary Non-cover

Page 11: ELEN E4896 MUSIC SIGNAL PROCESSING Lecture 12: …dpwe/e4896/lectures/E4896-L12.pdf · E4896 Music Signal Processing ... Lecture 12: Alignment and Matching Dan Ellis Dept. Electrical

E4896 Music Signal Processing (Dan Ellis) 2013-04-15 - /22

Cross-correlation Covers System• DP is good for time-warping, but expensive

beat-timing is tempo independent (if it works)simply cross-correlate beat-chroma patches?

11

100 200 300 400 500 beats

100 200 300 400 500 beats

GEDCAch

rom

a bi

ns

GEDCAch

rom

a bi

ns

extract

cross-correlate

Query

Candidate

how big are the pieces?how do we combine individual scores?also expensive

Page 12: ELEN E4896 MUSIC SIGNAL PROCESSING Lecture 12: …dpwe/e4896/lectures/E4896-L12.pdf · E4896 Music Signal Processing ... Lecture 12: Alignment and Matching Dan Ellis Dept. Electrical

E4896 Music Signal Processing (Dan Ellis) 2013-04-15 - /22

Global Cross-Correlation

• Cross-correlate entire beat-chroma matrices... at all possible transpositions (circular)implicit combination of match quality and duration

• One good matching fragment is sufficient...?12

Ellis & Poliner, 2007

100 200 300 400 500 beats @281 BPM

-500 -400 -300 -200 -100 0 100 200 300 400 skew / beats-5

0

+5

GEDCAch

rom

a bi

ns

GEDCAch

rom

a bi

nssk

ew /

sem

itone

s

Elliott Smith - Between the Bars

Glen Phillips - Between the Bars

Cross-correlation

Page 13: ELEN E4896 MUSIC SIGNAL PROCESSING Lecture 12: …dpwe/e4896/lectures/E4896-L12.pdf · E4896 Music Signal Processing ... Lecture 12: Alignment and Matching Dan Ellis Dept. Electrical

E4896 Music Signal Processing (Dan Ellis) 2013-04-15 - /22

Filtered Cross-Correlation

• Raw correlation not as important as precise local matchlooking for large contrast at ±1 beat skewi.e. high-pass filter

13

-500 -400 -300 -200 -100 0 100 200 300 4000

0.20.40.6

-500 -400 -300 -200 -100 0 100 200 300 400 skew / beats

skew / beats

-5

0

+5

skew

/ se

mito

nes Cross-correlation

Cross-correlation @ skew = +2 semitones

raw

filtered

Page 14: ELEN E4896 MUSIC SIGNAL PROCESSING Lecture 12: …dpwe/e4896/lectures/E4896-L12.pdf · E4896 Music Signal Processing ... Lecture 12: Alignment and Matching Dan Ellis Dept. Electrical

E4896 Music Signal Processing (Dan Ellis) 2013-04-15 - /22

Cover Song Results• 23 Covers found in 8700 song ‘uspop2002’

popular ‘decoys’ – normalization issues14

Test

Que

ry

Ab Ad Al Am Be Be Bl Ca Ce Cl Co Co Da En Fa Go Go Gr Hu I_ I_ Le Ta

Abracadabra/sugar_ray

Addicted_To_Love/tina_turner

All_Along_The_Watchtower/dave_matthews_band

America/simon_and_garfunkel

Before_You_Accuse_Me/eric_clapton

Between_The_Bars/glen_phillips

Blue_Collar_Man/styx

Caroline_No/brian_wilson

Cecilia/simon_and_garfunkel

Claudette/roy_orbison

Cocaine/nazareth

Come_Together/beatles

Day_Tripper/cheap_trick

Enjoy_The_Silence/tori_amos

Faith/limp_bizkit

God_Only_Knows/brian_wilson

Gold_Dust_Woman/sheryl_crow

Grand_Illusion/styx

Hush/milli_vanilli

I_Can_t_Get_No_Satisfaction/rolling_stones

I_Love_You/faith_hill

Let_It_Be/nick_cave

Take_Me_To_The_River/annie_lennox

Cover Songs - dpwe23 - 12/23 correct

Page 15: ELEN E4896 MUSIC SIGNAL PROCESSING Lecture 12: …dpwe/e4896/lectures/E4896-L12.pdf · E4896 Music Signal Processing ... Lecture 12: Alignment and Matching Dan Ellis Dept. Electrical

E4896 Music Signal Processing (Dan Ellis) 2013-04-15 - /22

Analyzing Cover Song Correlation

• Look inside global cross-correlation to find matching fragments...xcorr = t f (C1(t, f)⋅C2(t, f)) - view along time

15

Let It Be / Beatles (beats 11-441)

50 100 150 200 250 300 350 400

0 50 100 150 200 250 300 350 400-0.2

00.20.4

Let It Be / Nick Cave (beats 13-443)

50 100 150 200 250 300 350 400 time / beats

time / beats

time / beats

chro

ma

A

CD

FG

chro

ma

A

CD

FG

Page 16: ELEN E4896 MUSIC SIGNAL PROCESSING Lecture 12: …dpwe/e4896/lectures/E4896-L12.pdf · E4896 Music Signal Processing ... Lecture 12: Alignment and Matching Dan Ellis Dept. Electrical

E4896 Music Signal Processing (Dan Ellis) 2013-04-15 - /22

Cover Song False Alarm• Correlation can be weak

“Cocaine” (Clapton) vs. “Satisfaction” (Stones)

16

Eric Clapton - Cocaine - beats 17:1027

100 200 300 400 500 600 700 800 900 1000

Rolling Stones - Satisfaction - beats 1:1011

100 200 300 400 500 600 700 800 900 1000

0 100 200 300 400 500 600 700 800 900 1000-2-1012

chroma

A

CD

FG

chroma

A

CD

FG

Page 17: ELEN E4896 MUSIC SIGNAL PROCESSING Lecture 12: …dpwe/e4896/lectures/E4896-L12.pdf · E4896 Music Signal Processing ... Lecture 12: Alignment and Matching Dan Ellis Dept. Electrical

E4896 Music Signal Processing (Dan Ellis) 2013-04-15 - /22

3. Echo Nest Analyze• Web service to provide

beat, chroma, ... analysis (and much more)

17

240

761

2416

CDE

GAB

time / sec

freq

/ Hz

freq

/ Hz

0 2 4 6 8 10 12 14

240

761

2416

TRKUYPW128F92E1FC0 - Tori Amos - Smells Like Teen Spirit

Original

Resynth

EN Features

register for free API keyhttp://developer.echonest.com/account/register/

upload MP3, get back XML with analysis data

Page 18: ELEN E4896 MUSIC SIGNAL PROCESSING Lecture 12: …dpwe/e4896/lectures/E4896-L12.pdf · E4896 Music Signal Processing ... Lecture 12: Alignment and Matching Dan Ellis Dept. Electrical

E4896 Music Signal Processing (Dan Ellis) 2013-04-15 - /22

EN Analyze Usage• Matlab wrapper function

18

Page 19: ELEN E4896 MUSIC SIGNAL PROCESSING Lecture 12: …dpwe/e4896/lectures/E4896-L12.pdf · E4896 Music Signal Processing ... Lecture 12: Alignment and Matching Dan Ellis Dept. Electrical

E4896 Music Signal Processing (Dan Ellis) 2013-04-15 - /22

Million Song Dataset (MSD)

• Commercial-scale dataset available to MIR researchers1M pop songs250 GB of features(6 years of listening)

19

• EN Analyze features +...Lyrics, Tags, Covers, Listeners ...

http://labrosa.ee.columbia.edu/millionsong

Thierry Bertin-Mahieux

Page 20: ELEN E4896 MUSIC SIGNAL PROCESSING Lecture 12: …dpwe/e4896/lectures/E4896-L12.pdf · E4896 Music Signal Processing ... Lecture 12: Alignment and Matching Dan Ellis Dept. Electrical

E4896 Music Signal Processing (Dan Ellis) 2013-04-15 - /22

MSD Metadata

20

artist: 'Tori Amos' release: 'LIVE AT MONTREUX' title: 'Smells Like Teen Spirit' id: 'TRKUYPW128F92E1FC0' key: 5 mode: 0 loudness: -16.6780 tempo: 87.2330time_signature: 4 duration: 216.4502 sample_rate: 22050 audio_md5: '8' 7digitalid: 5764727 familiarity: 0.8500 year: 1992

%5489,4468, Smells Like Teen SpiritTRTUOVJ128E078EE10 NirvanaTRFZJOZ128F4263BE3 Weird Al YankovicTRJHCKN12903CDD274 Pleasure BeachTRELTOJ128F42748B7 The Flying PicketsTRJKBXL128F92F994D Rhythms Del Mundo feat. ShanadeTRIHLAW128F429BBF8 The Bad PlusTRKUYPW128F92E1FC0 Tori Amos

12 hello 11 i 10 a 9 and 7 it 6 are 6 we 6 now

6 here 6 us 6 entertain 4 the 4 feel 4 yeah 3 to 3 my

3 is 3 with 3 oh 3 out 3 an 3 light 3 less 3 danger

100.0 – cover57.0 – covers43.0 – female vocalists42.0 – piano34.0 – alternative14.0 – singer-songwriter11.0 – acoustic8.0 – tori amos7.0 – beautiful6.0 – rock6.0 – pop6.0 – Nirvana6.0 – female vocalist6.0 – 90s5.0 – out of genre covers

5.0 – cover songs4.0 – soft rock4.0 – nirvana cover4.0 – Mellow4.0 – alternative rock3.0 – chick rock3.0 – Ballad3.0 – Awesome Covers2.0 – melancholic2.0 – k00l ch1x2.0 – indie2.0 – female vocalistist2.0 – female2.0 – cover song2.0 – american

EN Metadata

SHS Covers MxM Lyric Bag-of-Words

Last.fm Tags

Page 21: ELEN E4896 MUSIC SIGNAL PROCESSING Lecture 12: …dpwe/e4896/lectures/E4896-L12.pdf · E4896 Music Signal Processing ... Lecture 12: Alignment and Matching Dan Ellis Dept. Electrical

E4896 Music Signal Processing (Dan Ellis) 2013-04-15 - /22

Summary• Music Alignment

Dynamic Programming finds correspondence

• Cover SongsDP, or cross-correlation for efficiency

• EN AnalyzeWeb service to analyze audio

21

Page 22: ELEN E4896 MUSIC SIGNAL PROCESSING Lecture 12: …dpwe/e4896/lectures/E4896-L12.pdf · E4896 Music Signal Processing ... Lecture 12: Alignment and Matching Dan Ellis Dept. Electrical

E4896 Music Signal Processing (Dan Ellis) 2013-04-15 - /22

References• R. Bellman, Dynamic Programming, Princeton University Press. 1957.

• D. Ellis and G. Poliner, “Identifying Cover Songs With Chroma Features and Dynamic Programming Beat Tracking,” Proc. ICASSP-07, Hawai'i, pp. IV-1429-1432, 2007.

• J. Foote, “Visualizing Music and Audio using Self-Similarity,” In Proc. ACM Multimedia, Orlando, pp. 77-80, 1999.

• Frank Kurth, Meinard Müller, Christian Fremerey, Yoon ha Chang, and Michael Clausen, “Automated synchronization of scanned sheet music with audio recordings,” Proc. 8th International Conference on Music Information Retrieval (ISMIR), Vienna, pp. 261-266, 2007.

• N. Orio & D. Schwarz, “Alignment of monophonic and polyphonic music to a score,” Proc. Int. Comp. Music Conf., Havana, pp.155-158, 2001.

• J. Serrà, E. Gómez, P. Herrera, X. Serra, “Chroma Binary Similarity and Local Alignment Applied to Cover Song Identification,” IEEE Trans. on Audio, Speech and Lang. Proc., 16(6), pp. 1138-1151, 2008.

22