Introduction to Music Informatics: I548/N560, Spring 2011 Instructor: Eric Nichols [email protected] http://tinyurl.com/Info548

Page 1: Introduction to  Music Informatics:  I548/N560, Spring 2011

Introduction to Music Informatics: I548/N560, Spring 2011

Instructor: Eric Nichols
[email protected]
http://tinyurl.com/Info548

Page 2: Introduction to  Music Informatics:  I548/N560, Spring 2011

Overview
Tues, Feb 15

  • HW – questions?
  • HW: contest and output format
  • Dynamic Time Warping for Audio-to-MIDI alignment
  • Symbolic Representations
  • Reading: Dannenberg

Page 3: Introduction to  Music Informatics:  I548/N560, Spring 2011

Polyphonic Audio Matching and Alignment
Ning Hu, Roger B. Dannenberg and George Tzanetakis

  • Goal: align polyphonic audio to a symbolic score
  • Does not perform transcription
  • Used to search MIDI databases for a match to a given audio recording

Page 4: Introduction to  Music Informatics:  I548/N560, Spring 2011

Motivation

  • Query by Humming is an important problem, and it uses a symbolic database.
  • Why is symbolic better than audio matching for this problem?
  • Possible solution: do polyphonic transcription on the query. Then find best match.
  • However, transcription is hard.

Page 5: Introduction to  Music Informatics:  I548/N560, Spring 2011

Idea

  • Instead of transcription of the query, convert the symbolic database into audio!
  • Instead of using an entire spectrum, convert to a chroma vector.
  • Do dynamic time warping (DTW) on audio to look for matches.

Page 6: Introduction to  Music Informatics:  I548/N560, Spring 2011

Chroma Vector

  • For each bin in the FFT
      • Assign the bin to the nearest half-step
      • Remove octave information
  • For each pitch class (1-12), average the value of its associated bins.
  • For this paper: 0.25 seconds of audio per chroma vector. Nonoverlapping windows.
  • Computing pitch from MIDI and vice versa
      freq = 440 * 2^((MIDI-69) / 12.0)
      MIDI = 69 + 12*log(freq/440.0) / log(2)
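The steps on this slide can be sketched in a few lines of Python. This is only an illustration, not the code used in the paper: the function names are made up for this example, pitch classes are numbered 0-11 (0 = C) rather than 1-12, each bin is represented by its centre frequency, and the DC bin is skipped because it has no pitch.

```python
import math

def midi_to_freq(midi):
    """Frequency in Hz of a MIDI note number (MIDI 69 = A4 = 440 Hz)."""
    return 440.0 * 2.0 ** ((midi - 69) / 12.0)

def freq_to_midi(freq):
    """Fractional MIDI note number of a frequency in Hz."""
    return 69.0 + 12.0 * math.log(freq / 440.0) / math.log(2.0)

def chroma_from_spectrum(magnitudes, sample_rate, fft_size):
    """Fold one FFT magnitude spectrum into a 12-element chroma vector.

    magnitudes[k] is the magnitude of bin k, whose centre frequency is
    assumed to be k * sample_rate / fft_size. Index 0 of the result is C.
    """
    sums = [0.0] * 12
    counts = [0] * 12
    for k, mag in enumerate(magnitudes):
        if k == 0:
            continue  # bin 0 is DC: no pitch to assign
        freq = k * sample_rate / fft_size
        nearest = int(round(freq_to_midi(freq)))  # nearest half-step
        pitch_class = nearest % 12                # remove octave information
        sums[pitch_class] += mag
        counts[pitch_class] += 1
    # Average the bins assigned to each pitch class.
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]
```

To follow the slide, one such vector would be computed for each nonoverlapping 0.25-second window of audio.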

Page 7: Introduction to  Music Informatics:  I548/N560, Spring 2011

Chroma Vectors

Page 8: Introduction to  Music Informatics:  I548/N560, Spring 2011

Why chroma?

  • Not super-sensitive to spectral distribution – ignores many details of timbre by collapsing everything into one octave
  • Mostly sensitive to fundamental pitches and chords

Page 9: Introduction to  Music Informatics:  I548/N560, Spring 2011

Converting MIDI to chroma

Two possibilities:

  • Render the MIDI with a synthesizer, and then compute the FFT and then the chroma vector.
  • Go directly from MIDI to chroma with a theoretical model (in this paper, it is assumed that no overtones are present in the chroma for each given MIDI pitch).

One difficulty: dealing with percussive sounds
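The second possibility (going directly from MIDI to chroma) might look roughly like the sketch below. Several things here are assumptions made for illustration and do not come from the paper: the note format `(midi_pitch, onset_seconds, offset_seconds)`, the function name, and the choice to weight each note by the fraction of the frame it sounds in. Percussive notes are assumed to have been filtered out by the caller.

```python
import math

def chroma_from_notes(notes, frame_seconds=0.25):
    """Build one chroma vector per frame straight from note events.

    notes: list of (midi_pitch, onset_seconds, offset_seconds) tuples.
    Each note contributes only to its own pitch class (no overtones).
    Index 0 of each vector is pitch class C.
    """
    if not notes:
        return []
    end_time = max(offset for _, _, offset in notes)
    n_frames = int(math.ceil(end_time / frame_seconds))
    frames = [[0.0] * 12 for _ in range(n_frames)]
    for pitch, onset, offset in notes:
        first = int(onset // frame_seconds)
        for i in range(first, n_frames):
            start = i * frame_seconds
            if start >= offset:
                break  # note has ended before this frame begins
            # Time during which the note sounds inside frame i.
            overlap = min(offset, start + frame_seconds) - max(onset, start)
            if overlap > 0:
                frames[i][pitch % 12] += overlap / frame_seconds
    return frames
```

For example, a C held for 0.5 seconds with an E joining after 0.25 seconds gives two frames: the first contains only C, the second contains C and E.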

Page 10: Introduction to  Music Informatics:  I548/N560, Spring 2011

Chroma Similarity

  • Now we have lists of chroma vectors for an audio query and for a database of MIDI files
  • Normalize all vectors to have mean 0 and variance 1
      • This helps reduce differences in vectors due to absolute loudness
  • Compute the Euclidean distance between vectors (0 distance = perfect match)
  • Compute the entire similarity matrix between vector pairs.
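A minimal sketch of the normalization and distance computation described above. The function names are hypothetical, and returning an all-zero vector for a perfectly flat frame (for example, silence) is an assumption added to avoid dividing by zero.

```python
import math

def normalize(vector):
    """Shift and scale a chroma vector to mean 0 and variance 1."""
    n = len(vector)
    mean = sum(vector) / n
    variance = sum((x - mean) ** 2 for x in vector) / n
    if variance == 0.0:
        return [0.0] * n  # assumption: flat frame maps to all zeros
    std = math.sqrt(variance)
    return [(x - mean) / std for x in vector]

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_matrix(query, reference):
    """matrix[i][j] = distance between query frame i and reference frame j."""
    q = [normalize(v) for v in query]
    r = [normalize(v) for v in reference]
    return [[euclidean(a, b) for b in r] for a in q]
```

Because of the normalization, a frame and a uniformly louder copy of the same frame end up at distance 0, which is the loudness invariance the slide mentions.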

Page 11: Introduction to  Music Informatics:  I548/N560, Spring 2011

Similarity Matrix

  • Dark = highly similar
  • Black diagonal = matching path
  • Note start, end, and length disparity

Page 12: Introduction to  Music Informatics:  I548/N560, Spring 2011

DTW computation
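For reference, here is a sketch of the standard textbook DTW recurrence over a precomputed distance matrix. It is not necessarily the exact variant shown on this slide or used in the paper: it assumes the path must run from the first frame pair to the last, and it allows only horizontal, vertical, and diagonal steps. The function name and input format are illustrative.

```python
def dtw(dist):
    """Find the cheapest monotonic path through a distance matrix.

    dist[i][j] is the distance between query frame i and reference frame j.
    Returns (total_cost, path), where path is a list of (i, j) pairs from
    (0, 0) to the last cell.
    """
    n, m = len(dist), len(dist[0])
    inf = float("inf")
    cost = [[inf] * m for _ in range(n)]
    cost[0][0] = dist[0][0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best = inf
            if i > 0:
                best = min(best, cost[i - 1][j])      # vertical step
            if j > 0:
                best = min(best, cost[i][j - 1])      # horizontal step
            if i > 0 and j > 0:
                best = min(best, cost[i - 1][j - 1])  # diagonal step
            cost[i][j] = dist[i][j] + best
    # Walk back from the last cell, always moving to the cheapest predecessor.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((cost[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((cost[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((cost[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return cost[n - 1][m - 1], path
```

The running time is proportional to the number of cells in the matrix, so it grows with the product of the two sequence lengths.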

Page 13: Introduction to  Music Informatics:  I548/N560, Spring 2011
Page 14: Introduction to  Music Informatics:  I548/N560, Spring 2011

Results: 10 Beatles songs

Page 15: Introduction to  Music Informatics:  I548/N560, Spring 2011

Results 2

Page 16: Introduction to  Music Informatics:  I548/N560, Spring 2011

Results 3

Page 17: Introduction to  Music Informatics:  I548/N560, Spring 2011

Conclusion

  • More sophisticated DTW could be used to speed up the search
  • Gives an example of linking symbolic and audio domains

Page 18: Introduction to  Music Informatics:  I548/N560, Spring 2011

Discussion

  • What elements/features of music should we represent?
  • Can we create a “dream” representation?