Copyright 2009 G. Tzanetakis
From rat laughter to speedmetal - a personal history of phase vocoding
George Tzanetakis ([email protected])
Assistant Professor, Computer Science Department (also in Music, ECE)
University of Victoria, Canada
Linear Systems and Sine Waves
Linearity: in1 -> out1, in2 -> out2, in1 + in2 -> out1 + out2
Sine wave parameters: Amplitude, Period = 1 / Frequency, Phase (0, 180, 360 degrees)
True sine waves last forever
sine wave -> LTI -> new sine wave
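The two properties on this slide can be checked numerically. The sketch below assumes a two-point moving average as the LTI system and an arbitrary 100 Hz test tone; neither choice comes from the slides. It checks that the response to a sum equals the sum of the responses, and that a sine going in comes out as a sine of the same frequency with a new amplitude and phase.

```python
import math

def moving_average(x):
    """A simple LTI system: y[n] = 0.5 * (x[n] + x[n-1]), with x[-1] = 0."""
    return [0.5 * (x[n] + (x[n - 1] if n > 0 else 0.0)) for n in range(len(x))]

sr = 1000            # sample rate in Hz (assumed)
freq = 100           # sine frequency in Hz (assumed)
n_samples = 200
w = 2 * math.pi * freq / sr   # radians per sample

in1 = [math.sin(w * n) for n in range(n_samples)]
in2 = [math.sin(2 * w * n) for n in range(n_samples)]

out1 = moving_average(in1)
out2 = moving_average(in2)
out_sum = moving_average([a + b for a, b in zip(in1, in2)])

# Linearity: the response to in1 + in2 equals out1 + out2.
superposition_error = max(abs(s - (a + b))
                          for s, a, b in zip(out_sum, out1, out2))

# After the first sample, out1 is again a sine at the same frequency:
# amplitude cos(w / 2), phase delayed by w / 2.
gain = math.cos(w / 2)
predicted = [gain * math.sin(w * n - w / 2) for n in range(n_samples)]
sine_error = max(abs(o - p) for o, p in zip(out1[1:], predicted[1:]))

print(superposition_error, sine_error)
```

Both errors should be at the level of floating-point rounding.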
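Time-Frequency Analysis
Fourier series:
$$f(x) = \sum_{n=0}^{\infty} a_n \cos(nx) + \sum_{n=0}^{\infty} b_n \sin(nx)$$
Fourier Transform:
$$F(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{-i\omega t}\, dt$$
Euler's formula:
$$e^{i\theta} = \cos(\theta) + i \sin(\theta)$$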
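A discrete version of the transform makes the role of the complex exponential concrete. The sketch below is a naive DFT using the common negative-exponent sign convention (an assumption here), applied to an arbitrary test cosine that completes 5 cycles in the analysis window.

```python
import cmath
import math

def dft(x):
    """Naive DFT: X[k] = sum_n x[n] * exp(-2*pi*i*k*n/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

N = 64
k0 = 5   # the cosine completes 5 cycles in N samples (assumed test signal)
x = [math.cos(2 * math.pi * k0 * n / N) for n in range(N)]

X = dft(x)
magnitudes = [abs(v) for v in X]
# Search only the first half; a real signal mirrors into the second half.
peak_bin = max(range(N // 2), key=lambda k: magnitudes[k])

# Euler's formula: e^{i*theta} = cos(theta) + i*sin(theta)
theta = 0.7
euler_error = abs(cmath.exp(1j * theta)
                  - complex(math.cos(theta), math.sin(theta)))

print(peak_bin, magnitudes[peak_bin], euler_error)
```

The energy lands in bin 5 with magnitude N/2, because a real cosine splits evenly between a positive-frequency and a negative-frequency exponential.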
Short Time Fourier Transform
Time-varying spectra
Fast Fourier Transform (FFT)
[Figure: input frames at times t, t+1, t+2 pass through filters and oscillators to produce the output; each frame carries amplitude and frequency]
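A minimal sketch of a short-time Fourier transform follows. The frame size, hop size, Hann window, and test signal are all illustrative assumptions, and a naive DFT stands in for the FFT to keep the example dependency-free. The signal changes frequency halfway through, and the strongest bin per frame follows that change.

```python
import cmath
import math

def stft(x, frame_size, hop_size):
    """Return a list of frames; each frame is a list of complex DFT bins."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]            # Hann window
    frames = []
    for start in range(0, len(x) - frame_size + 1, hop_size):
        seg = [x[start + n] * window[n] for n in range(frame_size)]
        bins = [sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                    for n in range(frame_size))
                for k in range(frame_size // 2 + 1)]
        frames.append(bins)
    return frames

frame_size, hop_size = 64, 32
# Test signal (assumed): bin 4 for the first half, bin 12 for the second half.
x = [math.cos(2 * math.pi * (4 if n < 256 else 12) * n / frame_size)
     for n in range(512)]

frames = stft(x, frame_size, hop_size)
peak_per_frame = [max(range(len(f)), key=lambda k: abs(f[k])) for f in frames]
print(peak_per_frame)
```

Early frames peak at bin 4 and late frames at bin 12, which is the time-varying spectrum the slide refers to.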
Phase vocoder
Vocoder = voice encoder (YouTube: Herbie Hancock - I thought it was you)
Phase vocoder = vocoder that takes into account phase information
Time stretching and pitch shifting
Variable speed playback changes the pitch
Why?
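One way to see why: playing a recording faster squeezes every period of the waveform into less time, so every frequency rises by the same factor. The sketch below uses an assumed 200 Hz tone at an assumed 8 kHz sample rate, simulates double-speed playback by reading every second sample, and estimates the frequency by counting upward zero crossings.

```python
import math

sr = 8000        # sample rate in Hz (assumed)
freq = 200.0     # original tone in Hz (assumed)
x = [math.sin(2 * math.pi * freq * n / sr) for n in range(sr)]  # 1 second

def count_cycles(signal):
    """Count upward zero crossings, one per completed cycle."""
    return sum(1 for a, b in zip(signal, signal[1:]) if a < 0 <= b)

def estimate_freq(signal, sample_rate):
    """Rough frequency estimate: cycles divided by duration in seconds."""
    return count_cycles(signal) * sample_rate / len(signal)

# "Double speed": read every second sample, play at the same sample rate.
fast = x[::2]

orig_freq = estimate_freq(x, sr)
fast_freq = estimate_freq(fast, sr)
print(orig_freq, fast_freq, fast_freq / orig_freq)
```

The estimate is coarse (it can miss a partial cycle at the edges), but the ratio between the two comes out at 2: half the duration, twice the pitch. Decoupling those two effects is what time stretching and pitch shifting are for.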
Phase-vocoding
1966 Flanagan and Golden (Bell Labs, Speech)
Analysis/Synthesis system
Signal = sum of time-varying sine waves (amplitude, frequency and phase)
Filterbank constraints = freq. response of each filter identical except center frequency, sum has flat freq. response, center frequencies equally spaced
Phasors
Rectangular -> Polar
Time-varying amplitude = radius
Time-varying frequency = rate of angular rotation
Measured phase: only 0 to 360 degrees
What if we have the following sequence?
180, 225, 270, 315, 0, 45, 90
Need to convert to:
180, 225, 270, 315, 360, 405, 450
Phase unwrapping
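The conversion on this slide can be sketched in a few lines. The function name and the rule used here are illustrative assumptions: the true phase change between consecutive frames is taken to be smaller than 180 degrees, so any larger jump is treated as a wrap-around.

```python
def unwrap_degrees(phases):
    """Unwrap a phase sequence given in degrees, each value in [0, 360)."""
    unwrapped = [phases[0]]
    offset = 0
    for prev, cur in zip(phases, phases[1:]):
        delta = cur - prev
        # Assume the true jump between frames is smaller than 180 degrees.
        if delta < -180:
            offset += 360      # wrapped past 360 going upward
        elif delta > 180:
            offset -= 360      # wrapped past 0 going downward
        unwrapped.append(cur + offset)
    return unwrapped

result = unwrap_degrees([180, 225, 270, 315, 0, 45, 90])
print(result)  # [180, 225, 270, 315, 360, 405, 450]
```

With the unwrapped sequence, the difference between consecutive phases is a constant 45 degrees per frame, which is what allows a steady rotation rate (frequency) to be read off.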
Rats are ticklish
Joint work with bio-acoustician Tecumseh Fitch (Univ. Vienna, Austria)
Rats laugh when tickled but we can’t hear them - ultrasound range (50 kHz)
Bat detectors simply frequency shift the signal, destroying all harmonic structure
Use phase vocoder for harmonic-preserving pitch shifting and time-stretching (different metabolism)
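A small arithmetic example shows why a constant frequency shift destroys harmonic structure while pitch shifting preserves it. The partial frequencies and shift amounts below are assumed values chosen for illustration, not measurements from the rat recordings.

```python
# Fundamental plus two harmonics, in Hz (assumed values).
partials = [50_000.0, 100_000.0, 150_000.0]

def frequency_shift(freqs, offset_hz):
    """Constant-offset shift: subtract the same amount from every partial."""
    return [f - offset_hz for f in freqs]

def pitch_shift(freqs, ratio):
    """Pitch shift: multiply every partial by the same ratio."""
    return [f * ratio for f in freqs]

def harmonic_ratios(freqs):
    """Each partial expressed as a multiple of the lowest one."""
    return [f / freqs[0] for f in freqs]

shifted = frequency_shift(partials, 43_750.0)   # 6250, 56250, 106250 Hz
scaled = pitch_shift(partials, 0.125)           # 6250, 12500, 18750 Hz

print(harmonic_ratios(shifted))  # [1.0, 9.0, 17.0]
print(harmonic_ratios(scaled))   # [1.0, 2.0, 3.0]
```

Both versions bring the fundamental down to 6250 Hz, but only the scaled version keeps the partials at 1x, 2x, 3x the fundamental.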
Pitch Shifting
Original - what you hear is nail scratches
Pitch-shifted so that laughter is audible
Only laughter
Sound Source Separation
Gestalt Grouping Cues from Auditory Scene Analysis:
Frequency continuity
Amplitude continuity
Common Onset
Harmonicity
Time-Frequency tradeoff
FT = global representation of frequency content
Heisenberg uncertainty
Time – Frequency
L2
Time-Frequency
Problem: to have more accurate frequency estimation we need larger windows of time
Larger windows of time mean we don’t know exactly when those frequencies happened
What about making hopsize 1?
Pitch-synchronous overlap-add (monophonic sounds)
Holy grail = low latency polyphonic pitch shifting
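The tradeoff can be put in numbers. For a DFT over a window of N samples at sample rate sr, the bins are sr / N Hz apart and the window lasts N / sr seconds, so the product of the two is always 1. The sample rate and window sizes below are assumed for illustration.

```python
sr = 44100  # sample rate in Hz (assumed)

def resolution(window_size, sample_rate):
    """Return (window duration in ms, DFT bin spacing in Hz)."""
    duration_ms = 1000.0 * window_size / sample_rate
    bin_spacing_hz = sample_rate / window_size
    return duration_ms, bin_spacing_hz

table = {n: resolution(n, sr) for n in (256, 1024, 4096)}
for n, (ms, hz) in table.items():
    print(f"window {n:5d} samples: {ms:6.1f} ms long, bins {hz:6.1f} Hz apart")
```

Going from 256 to 4096 samples makes the bins 16 times finer but also makes each frame 16 times longer, which is why low latency and accurate polyphonic analysis pull in opposite directions.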
IVL Audio Inc.
Local company - products in professional music, consumer electronics, commercial karaoke
Since 1984 - early pitch trackers
Late 80s pitch shifting
Fall 2008 - DJ products
Morpheus Drop-tune
Drop-tuning - retuning guitar strings to play lower (less string tension, multiple guitars)
Morpheus Drop-tune = guitar pedal for drop-tuning (marketed by XP audio, designed by IVL)
Challenge = low latency polyphonic pitch shifting
Based on phase-vocoding at multiple time resolutions
Various YouTube demos
“Classic” multi-stage approach
Grouping Cue 1
Time-Frequency representation
Short Time Fourier Transform
Discrete basis: windowed sine waves
Grouping Cue 2
Partial Tracking (McAulay & Quatieri)
Sound source formation: grouping of partials based on harmonicity
PROBLEMS: Difficult to decide ordering, brittle
Sinusoidal Front-End
The signal within a frame is modelled as:
peaks = sinusoidal components with constant parameters
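The slide's model equation is not reproduced in the text, so the following is only an illustration of the general idea, not necessarily the front-end used in this work. It builds an assumed test frame from two sinusoids with constant amplitude and frequency, then picks local maxima of the windowed magnitude spectrum as the "peaks". The window, threshold, and helper names are all hypothetical.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Magnitudes of the non-negative-frequency DFT bins (naive DFT)."""
    N = len(frame)
    return [abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / N)
                    for n in range(N)))
            for k in range(N // 2 + 1)]

def pick_peaks(mags, threshold):
    """Indices of local maxima whose magnitude exceeds the threshold."""
    return [k for k in range(1, len(mags) - 1)
            if mags[k] > threshold
            and mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]]

N = 128
# Assumed test frame: two sinusoids with constant amplitude and frequency.
frame = [1.0 * math.cos(2 * math.pi * 10 * n / N)
         + 0.5 * math.cos(2 * math.pi * 25 * n / N) for n in range(N)]
window = [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]  # Hann

mags = magnitude_spectrum([s * w for s, w in zip(frame, window)])
peaks = pick_peaks(mags, threshold=1.0)
print([(k, round(mags[k], 2)) for k in peaks])
```

Each peak yields a frequency (the bin) and an amplitude (the peak height), and the louder component produces a peak twice as high as the quieter one.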
Spectral Clustering
Alternative to traditional point-based algorithms (k-means)
Doesn’t assume convex shapes
Doesn’t assume Gaussians
Avoid multiple restarts
Eigenstructure of similarity matrix
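To make "eigenstructure of similarity matrix" concrete, here is a toy two-way split. Everything in it is an assumption for illustration: the 1-D data points, the Gaussian similarity, and the hand-rolled power iteration that stands in for a library eigen-solver. It finds the second eigenvector of the random-walk matrix P = D^-1 W and splits the points by its sign.

```python
import math

def spectral_bipartition(points, sigma=1.0, iterations=50):
    """Split 1-D points into two clusters using the second eigenvector
    of the random-walk matrix P = D^-1 W (W = Gaussian similarity)."""
    n = len(points)
    W = [[math.exp(-((a - b) ** 2) / (2 * sigma ** 2)) for b in points]
         for a in points]
    d = [sum(row) for row in W]               # node degrees
    total = sum(d)
    v = [float(i) for i in range(n)]          # deterministic start vector
    for _ in range(iterations):
        # Remove the trivial constant eigenvector (degree-weighted mean).
        mean = sum(di * vi for di, vi in zip(d, v)) / total
        v = [vi - mean for vi in v]
        # One power-iteration step: v <- P v
        v = [sum(W[i][j] * v[j] for j in range(n)) / d[i] for i in range(n)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        v = [vi / norm for vi in v]
    return [1 if vi > 0 else 0 for vi in v]

points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]   # two well-separated groups (assumed)
labels = spectral_bipartition(points)
print(labels)
```

The grouping comes from the similarity matrix alone, with no cluster centers and no random restarts, which is the contrast with k-means drawn on the slide.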
Comparison with partial tracking
McAulay and Quatieri: Tracking of Partials
Proposed Approach
Synthetic Mixtures of Instruments
4-note mixture
[Figure: peaks plotted by Frequency (Hz), Time (frames), and Amplitude, grouped into Cluster 0 and Cluster 1]
Instrumentation detection based on timbral models (Martins et al., ISMIR 2007)