E4896 Music Signal Processing (Dan Ellis) 2013-04-01
Lecture 10: Beat Tracking
Dan Ellis, Dept. Electrical Engineering, Columbia University
[email protected] http://www.ee.columbia.edu/~dpwe/e4896/
1. Rhythm Perception
2. Onset Extraction
3. Beat Tracking
4. Dynamic Programming
ELEN E4896 MUSIC SIGNAL PROCESSING
1. Rhythm Perception
• What is rhythm?
aspects, origin?
Rhythm Perception Experiments
• Tapping experiment
ambiguous; hierarchy
[Figure: frequency vs. time / sec plot for mirex06/train10 (Harnoncourt); McKinney & Moelants 2006]
Rhythm Tracking Systems
• Two main components
front end: extract ‘events’ from audio
back end: find plausible beat sequence to match
• Other outputs
tempo
time signature
metrical level(s)
[Diagram: Audio → Onset / Event detection → Beat marking (informed by Musical knowledge) → Beat times etc.]
2. Onset detection
• Simplest thing is
energy envelope
$$e(n_0) = \sum_{n=-W/2}^{W/2} w[n]\, |x(n + n_0)|^2$$
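The energy envelope can be sketched in a few lines of plain Python. This is only an illustration: the function name `energy_envelope`, the Hann-like shape chosen for w[n], and treating samples outside the signal as zero are assumptions, not details from the slides.

```python
import math

def energy_envelope(x, W):
    """e(n0) = sum over n = -W/2..W/2 of w[n] * |x(n + n0)|^2."""
    half = W // 2
    # Assumed window: raised cosine that stays positive over -half..half
    w = [0.5 * (1.0 + math.cos(math.pi * n / (half + 1)))
         for n in range(-half, half + 1)]
    env = []
    for n0 in range(len(x)):
        total = 0.0
        for k, n in enumerate(range(-half, half + 1)):
            idx = n0 + n
            if 0 <= idx < len(x):  # samples outside the signal count as zero
                total += w[k] * abs(x[idx]) ** 2
        env.append(total)
    return env

# Silence followed by a constant burst: the envelope rises where the burst starts.
signal = [0.0] * 50 + [1.0] * 50
env = energy_envelope(signal, W=8)
```

The envelope is zero well before the burst, partial where the window straddles the boundary, and largest once the window lies fully inside the burst.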
[Figure: freq / kHz and level / dB vs. time / sec for two examples, Harnoncourt and Maracatu; Bello et al. 2005]
emphasis on high frequencies?
frequency-weighted sum: $\sum_f f \cdot |X(f, t)|$
plain magnitude sum: $\sum_f |X(f, t)|$
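To see what the frequency weighting does, the sketch below computes both sums for a single frame. It is a hedged illustration: `dft_magnitudes` and `spectral_sums` are hypothetical helper names, the DFT is a slow direct implementation, and the bin index k stands in for the frequency f.

```python
import cmath
import math

def dft_magnitudes(frame):
    """|X(f)| for bins 0..N/2 of one frame, via a direct (slow) DFT."""
    n_samples = len(frame)
    mags = []
    for k in range(n_samples // 2 + 1):
        acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
                  for n in range(n_samples))
        mags.append(abs(acc))
    return mags

def spectral_sums(frame):
    """Return (sum_f |X(f)|, sum_f f * |X(f)|), with f taken as the bin index."""
    mags = dft_magnitudes(frame)
    plain = sum(mags)
    weighted = sum(k * m for k, m in enumerate(mags))
    return plain, weighted

# Two equal-amplitude tones, one in a low bin and one in a high bin.
N = 64
low_tone = [math.sin(2 * math.pi * 2 * n / N) for n in range(N)]
high_tone = [math.sin(2 * math.pi * 20 * n / N) for n in range(N)]
low_plain, low_weighted = spectral_sums(low_tone)
high_plain, high_weighted = spectral_sums(high_tone)
```

The plain sums are essentially equal for the two tones, while the weighted sum is much larger for the high tone, which is why this weighting favors bright, percussive onsets.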
Multiband Derivative
• Sometimes energy just “shifts”
calculate & sum onset in multiple bands
use ratio instead of difference - normalize energy
$$o(t) = \sum_f W(f)\, \max\left(0, \frac{|X(f, t)|}{|X(f, t-1)|} - 1\right)$$
[Figure: freq / kHz and level / dB vs. time / sec; bonk~, Puckette et al. 1998]
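A small sketch of the band-wise ratio onset function o(t), assuming the magnitudes |X(f, t)| are already available as a list of frames. The function name `ratio_onset_strength`, the uniform default weights W(f), and the `floor` guard against division by zero are illustrative assumptions.

```python
def ratio_onset_strength(mag, weights=None, floor=1e-9):
    """o(t) = sum_f W(f) * max(0, |X(f,t)| / |X(f,t-1)| - 1); o(0) is set to 0."""
    n_bands = len(mag[0])
    if weights is None:
        weights = [1.0] * n_bands  # assumed: all bands weighted equally
    onset = [0.0]
    for t in range(1, len(mag)):
        total = 0.0
        for f in range(n_bands):
            ratio = mag[t][f] / max(mag[t - 1][f], floor)
            total += weights[f] * max(0.0, ratio - 1.0)
        onset.append(total)
    return onset

# Energy "shifts" from band 0 to band 1 at frame 2; the summed magnitude stays 1.1.
mag = [[1.0, 0.1], [1.0, 0.1], [0.1, 1.0], [0.1, 1.0]]
onset = ratio_onset_strength(mag)
```

In this toy case a total-energy detector sees nothing at frame 2, while the half-wave-rectified ratio responds to the band that grew.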
Phase Deviation
• When amplitudes don’t change much,
phase discontinuity may signal new note
• Can detect by comparing actual phase with extrapolation from past
combine with amplitude?
[Figure: complex plane (Re{X}, Im{X}) showing X(f, t_{n-1}), X(f, t_n), the prediction X̂(f, t_{n+1}) obtained by advancing the phase by ∆Θ again, the actual X(f, t_{n+1}), and the complex deviation between prediction and actual; Bello et al. 2005]
$$\hat{X}(f, t_{n+1}) = X(f, t_n)\, \frac{X(f, t_n)}{X(f, t_{n-1})}$$
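The prediction and the resulting complex deviation can be sketched for a single frequency bin as follows. The helper names `predict_next` and `complex_deviation` are hypothetical, and the sketch assumes the previous bin value is nonzero.

```python
import cmath
import math

def predict_next(x_prev, x_curr):
    """X_hat(t_{n+1}) = X(t_n) * X(t_n) / X(t_{n-1}); assumes x_prev != 0."""
    return x_curr * x_curr / x_prev

def complex_deviation(x_prev, x_curr, x_next):
    """Distance in the complex plane between the actual and the predicted value."""
    return abs(x_next - predict_next(x_prev, x_curr))

# Steady sinusoid in one bin: constant magnitude, phase advancing 0.3 rad per frame.
steady = [cmath.rect(1.0, 0.3 * n) for n in range(3)]
# Same bin, but the last frame carries an extra 1.0 rad phase jump.
jumped = steady[:2] + [cmath.rect(1.0, 0.3 * 2 + 1.0)]

dev_steady = complex_deviation(*steady)
dev_jumped = complex_deviation(*jumped)
```

For the steady tone the deviation is numerically zero; for the phase jump it is 2·sin(0.5), even though the magnitude never changed.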
3. Rhythm Tracking
• Earliest systems were rule based
based on musicology (Longuet-Higgins and Lee, 1982)
inspired by linguistic grammars - Chomsky
input: event sequence (MIDI)
output: quarter notes, downbeats
Desain & Honing 1999
Resonators
• How to address:
build-up of rhythmic evidence
“ghost events”
(audio input)
• Seems more like a comb filter...
resonant filterbank of
$$y(t) = \alpha\, y(t - T) + (1 - \alpha)\, x(t)$$
for all possible T
Scheirer 1998
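A rough sketch of the comb-filter idea: run the recursion for each candidate delay T and pick the delay whose output has the most energy. The names `comb_filter_energy` and `best_period`, the value alpha = 0.9, and the use of raw output energy as the selection criterion are assumptions made for illustration.

```python
def comb_filter_energy(x, T, alpha=0.9):
    """Run y(t) = alpha * y(t - T) + (1 - alpha) * x(t); return the output energy."""
    y = [0.0] * len(x)
    for t in range(len(x)):
        delayed = y[t - T] if t >= T else 0.0
        y[t] = alpha * delayed + (1.0 - alpha) * x[t]
    return sum(v * v for v in y)

def best_period(x, periods, alpha=0.9):
    """Pick the delay T (in samples) whose resonator output has the most energy."""
    return max(periods, key=lambda T: comb_filter_energy(x, T, alpha))

# Impulse train with a period of 20 samples, as a stand-in for an onset signal.
x = [1.0 if t % 20 == 0 else 0.0 for t in range(800)]
period = best_period(x, range(10, 31))
```

Because each output sample feeds back after T samples, evidence builds up over several periods and persists through a missing event, which addresses both points on the slide.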
Multi-Hypothesis Systems
• Beat is ambiguous → develop several alternatives
inputs: music audio
outputs: beat times, downbeats, BD/SD patterns...
Goto & Muraoka 1994; Goto 2001; Dixon 2001
[Diagram: multi-agent beat tracking system. Stages: A/D conversion, Frequency analysis, Beat prediction, Beat information transmission. Components: Onset-time finders, Onset-time vectorizers, Drum-sound finder, Agents, Higher-level checkers, Manager. Input: musical audio signals (compact disc); output: beat information.]
4. Dynamic Programming
• Re-cast beat tracking as optimization:
Find beat times {t_i} to maximize
$$C(\{t_i\}) = \sum_{i=1}^{N} O(t_i) + \alpha \sum_{i=2}^{N} F(t_i - t_{i-1}, \tau_p)$$
O(t) is onset strength function
F(Δt, τ) is tempo consistency score, e.g.
$$F(\Delta t, \tau) = -\left(\log \frac{\Delta t}{\tau}\right)^2$$
• Looks like an exponential search over all {t_i}... but Dynamic Programming saves us
Ellis 2007
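As a concrete reading of the objective, the sketch below scores a given beat sequence. The names `tempo_consistency`, `beat_sequence_score`, and the toy onset function are hypothetical; only the two formulas come from the slide.

```python
import math

def tempo_consistency(delta_t, tau):
    """F(dt, tau) = -(log(dt / tau))^2: zero when dt equals tau, negative otherwise."""
    return -(math.log(delta_t / tau)) ** 2

def beat_sequence_score(beat_times, onset_strength, tau_p, alpha):
    """C({t_i}) = sum_i O(t_i) + alpha * sum_{i>=2} F(t_i - t_{i-1}, tau_p)."""
    onset_term = sum(onset_strength(t) for t in beat_times)
    tempo_term = sum(tempo_consistency(b - a, tau_p)
                     for a, b in zip(beat_times, beat_times[1:]))
    return onset_term + alpha * tempo_term

def toy_onsets(t):
    """Hypothetical onset strength: 1.0 at multiples of 0.5 s, 0 elsewhere."""
    return 1.0 if abs(t / 0.5 - round(t / 0.5)) < 1e-9 else 0.0

even = beat_sequence_score([0.0, 0.5, 1.0, 1.5], toy_onsets, tau_p=0.5, alpha=100)
uneven = beat_sequence_score([0.0, 0.4, 1.0, 1.5], toy_onsets, tau_p=0.5, alpha=100)
```

The evenly spaced sequence collects every onset and pays no tempo penalty; moving one beat off the grid loses an onset and incurs a penalty on both adjacent intervals.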
Dynamic Programming (DP)
• DP is a general algorithm for optimizing
“optimal substructure” problems
i.e. where optimal total solution can be built from optimal partial solutions
• e.g. best path through cost matrix
path after (i,j) is independent of how we got there
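[Diagram: cost matrix with input frames f_i on one axis and reference frames r_j on the other; transitions T10, T01, T11 lead into cell (i,j)]
$$D(i,j) = d(i,j) + \min\{\, D(i-1,j) + T_{10},\; D(i,j-1) + T_{01},\; D(i-1,j-1) + T_{11} \,\}$$
D(i,j): lowest cost to (i,j)
d(i,j): local match cost
min{...}: best predecessor (including transition cost)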
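The recurrence can be sketched directly, with a backpointer table for recovering the path. The function name `best_path` and the choice to start at (0,0) and end at the last cell are assumptions for illustration.

```python
def best_path(d, T10=0.0, T01=0.0, T11=0.0):
    """Fill D(i,j) = d(i,j) + min over predecessors; return (total cost, path)."""
    n_i, n_j = len(d), len(d[0])
    D = [[float("inf")] * n_j for _ in range(n_i)]
    back = [[None] * n_j for _ in range(n_i)]
    D[0][0] = d[0][0]
    for i in range(n_i):
        for j in range(n_j):
            if i == 0 and j == 0:
                continue
            candidates = []
            if i > 0:
                candidates.append((D[i - 1][j] + T10, (i - 1, j)))
            if j > 0:
                candidates.append((D[i][j - 1] + T01, (i, j - 1)))
            if i > 0 and j > 0:
                candidates.append((D[i - 1][j - 1] + T11, (i - 1, j - 1)))
            best_cost, back[i][j] = min(candidates)
            D[i][j] = d[i][j] + best_cost
    # Trace back from the final cell to (0, 0)
    path = [(n_i - 1, n_j - 1)]
    while back[path[0][0]][path[0][1]] is not None:
        path.insert(0, back[path[0][0]][path[0][1]])
    return D[-1][-1], path

# Local costs that are cheap only along the diagonal.
d = [[0, 5, 5],
     [5, 0, 5],
     [5, 5, 0]]
cost, path = best_path(d)
```

Each cell is filled once from at most three neighbors, so the work grows with the size of the matrix rather than with the number of possible paths.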
Tempo Estimation
• Algorithm needs global tempo period τ
otherwise problem is not “optimal substructure”
[Figure: three panels. Onset Strength Envelope (part) vs. time / s; Raw Autocorrelation and Windowed Autocorrelation vs. lag / s, with the Primary Tempo Period and Secondary Tempo Period marked]
• Pick peak in onset envelope autocorrelation
after applying “human preference” window
check for subbeat
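A minimal sketch of the peak-picking step, under stated assumptions: the preference window is taken to be a Gaussian on a log-lag axis centered at 0.5 s with a width of one octave, the mean is removed before autocorrelating, and the subbeat check is omitted. The function name `estimate_tempo_period` and all parameter values are illustrative, not taken from the slides.

```python
import math

def estimate_tempo_period(onset_env, frame_rate, center_s=0.5,
                          width_octaves=1.0, max_lag_s=4.0):
    """Return the lag (in seconds) at the peak of the windowed autocorrelation."""
    mean = sum(onset_env) / len(onset_env)
    x = [v - mean for v in onset_env]  # remove DC before autocorrelating
    max_lag = min(int(max_lag_s * frame_rate), len(x) - 1)
    best_lag, best_val = None, -float("inf")
    for lag in range(1, max_lag + 1):
        raw = sum(x[n] * x[n - lag] for n in range(lag, len(x)))
        # Assumed "human preference" window: Gaussian in log2(lag), centered on center_s
        octaves = math.log2((lag / frame_rate) / center_s)
        windowed = raw * math.exp(-0.5 * (octaves / width_octaves) ** 2)
        if windowed > best_val:
            best_lag, best_val = lag, windowed
    return best_lag / frame_rate

# Onset envelope at 100 frames/s with an impulse every 0.5 s (120 BPM).
frame_rate = 100
envelope = [1.0 if n % 50 == 0 else 0.0 for n in range(1000)]
period_s = estimate_tempo_period(envelope, frame_rate)
```

The raw autocorrelation also peaks at 1.0 s, 1.5 s, and so on; the window is what biases the choice toward the lag nearest the preferred tempo.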
Beat Tracking by DP
• To optimize
$$C(\{t_i\}) = \sum_{i=1}^{N} O(t_i) + \alpha \sum_{i=2}^{N} F(t_i - t_{i-1}, \tau_p)$$
define C*(t) as best score up to time t
then build up recursively (with traceback P(t))
$$C^*(t) = O(t) + \max_{\tau} \{\, \alpha F(t - \tau, \tau_p) + C^*(\tau) \,\}$$
$$P(t) = \arg\max_{\tau} \{\, \alpha F(t - \tau, \tau_p) + C^*(\tau) \,\}$$
final beat sequence {t_i} is best C* + back-trace
[Figure: O(t) and C*(t) against time, with a candidate predecessor τ marked before the current time t]
beatsimple
• Beat tracking in 15 lines of Matlab
function beats = beat_simple(onset, osr, tempo, alpha)
% beats = beat_simple(onset, osr, tempo, alpha)
%   Core of the DP-based beat tracker
%   <onset> is the onset strength envelope at frame rate <osr>
%   <tempo> is the target tempo (in BPM)
%   <alpha> is weight applied to transition cost
%   <beats> returns the chosen beat sample times (in sec).
% 2007-06-19 Dan Ellis [email protected]

if nargin < 4; alpha = 100; end

% backlink(time) is best predecessor for this point
% cumscore(time) is total cumulated score to this point
localscore = onset;
backlink = -ones(1,length(localscore));
cumscore = zeros(1,length(localscore));

% convert bpm to samples
period = (60/tempo)*osr;

% Search range for previous beat
prange = round(-2*period):-round(period/2);
% Log-gaussian window over that range
txwt = (-alpha*abs((log(prange/-period)).^2));

for i = max(-prange + 1):length(localscore)
  timerange = i + prange;
  % Search over all possible predecessors
  % and apply transition weighting
  scorecands = txwt + cumscore(timerange);
  % Find best predecessor beat
  [vv,xx] = max(scorecands);
  % Add on local score
  cumscore(i) = vv + localscore(i);
  % Store backtrace
  backlink(i) = timerange(xx);
end

% Start backtrace from best cumulated score
[vv,beats] = max(cumscore);
% .. then find all its predecessors
while backlink(beats(1)) > 0
  beats = [backlink(beats(1)),beats];
end

% convert to seconds
beats = (beats-1)/osr;
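For readers without Matlab, here is a line-by-line Python port of the listing above, using only the standard library. It is a sketch rather than a reference implementation: indices are 0-based, so the backtrace stops at a backlink of -1, and Python's `round` breaks ties differently from Matlab's, which can shift the search range by one frame for some tempos.

```python
import math

def beat_simple(onset, osr, tempo, alpha=100.0):
    """Return beat times (sec) for onset envelope `onset` at frame rate `osr`."""
    n = len(onset)
    backlink = [-1] * n      # best predecessor frame for each frame
    cumscore = [0.0] * n     # best cumulated score ending at each frame
    period = (60.0 / tempo) * osr  # beat period in frames
    # Candidate predecessor offsets: -2*period .. -period/2
    prange = list(range(round(-2 * period), -round(period / 2) + 1))
    # Transition weight: 0 at exactly one period back, negative elsewhere
    txwt = [-alpha * math.log(p / -period) ** 2 for p in prange]
    start = max(-p for p in prange)  # first frame with a full search range
    for i in range(start, n):
        best_score, best_prev = -math.inf, -1
        for p, w in zip(prange, txwt):
            cand = w + cumscore[i + p]
            if cand > best_score:
                best_score, best_prev = cand, i + p
        cumscore[i] = best_score + onset[i]
        backlink[i] = best_prev
    # Start the backtrace from the best cumulated score
    beats = [max(range(n), key=lambda k: cumscore[k])]
    while backlink[beats[0]] >= 0:
        beats.insert(0, backlink[beats[0]])
    return [b / osr for b in beats]

# Synthetic onset envelope at 100 frames/s with an impulse every 0.5 s.
onset = [1.0 if k % 50 == 0 else 0.0 for k in range(1000)]
beats = beat_simple(onset, osr=100, tempo=120)
```

On this synthetic input the tracker returns beats every 0.5 s. The first frames, which have no full predecessor range, keep a score of zero, so the impulse at frame 0 is not reported as a beat.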
Results
• Verify
against human tapping data
vary tradeoff weight α
vary tempo estimate τ
[Figure: two examples, bonus1 (Choral) and bonus6 (Jazz). For each: BPM target vs. BPM out (mean output BPM against target BPM), beat accuracy, and a statistic of inter-beat-intervals, plotted for α = 400 and α = 100]
Downbeat Detection
• Downbeat = start of “bar”
one level up in metrical hierarchy
• Approaches
Goto’94 BTS: Pop music SD/BD template
Jehan’05: Trained classifier
Summary
• Rhythm perception
Innate and strong, hierarchic
• Beat tracking models
Need to account for buildup & persistence
• Dynamic Programming
Neat way to maintain multiple hypotheses
References
• J. P. Bello, L. Daudet, S. Abdallah, C. Duxbury, M. Davies, M. B. Sandler, “A Tutorial on Onset Detection in Music Signals,” IEEE Tr. Speech and Audio Proc., vol. 13 no. 5, pp. 1035-1047, September 2005.
• P. Desain & H. Honing, “Computational models of beat induction: The rule-based approach,” J. New Music Research, vol. 28 no. 1, pp. 29-42, 1999.
• Simon Dixon, “Automatic extraction of tempo and beat from expressive performances,” J. New Music Research, vol. 30 no. 1, pp. 39-58, 2001.
• D. Ellis, “Beat Tracking by Dynamic Programming,” J. New Music Research, vol. 36 no. 1, pp. 51-60, March 2007.
• Tristan Jehan, “Creating Music By Listening,” Ph.D Thesis, MIT Media Lab, 2005.
• Masataka Goto & Yoichi Muraoka, “A Beat Tracking System for Acoustic Signals of Music,” ACM Multimedia, pp. 365-372, October 1994.
• Masataka Goto, “An Audio-based Real-time Beat Tracking System for Music With or Without Drum-sounds,” J. New Music Research, vol. 30 no. 2, pp.159-171, June 2001.
• M. F. McKinney and D. Moelants, “Ambiguity in tempo perception: What draws listeners to different metrical levels?” Music Perception, vol. 24 no. 2 pp. 155-166, December 2006.
• M. Puckette, T. Apel, D. Zicarelli, “Real-time audio analysis tools for Pd and MSP,” Proc. Int. Comp. Music Conf., Ann Arbor, pp. 109–112, October 1998.
• Eric D. Scheirer, “Tempo and beat analysis of acoustic musical signals,” J. Acoust. Soc. Am., vol. 103, pp. 588-601, 1998.