
Copyright 2009 G. Tzanetakis

From rat laughter to speed metal - a personal history of phase vocoding

George Tzanetakis ([email protected])
Assistant Professor, Computer Science Department (also in Music, ECE)
University of Victoria, Canada


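Linear Systems and Sine Waves

[Figure: superposition. in1 -> out1, in2 -> out2, in1 + in2 -> out1 + out2]

[Figure: a sine wave annotated with Amplitude, Period = 1 / Frequency, and Phase (0, 180, 360 degrees)]

True sine waves last forever

sine wave -> LTI system -> new sine wave (same frequency; only amplitude and phase can change)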
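A minimal NumPy sketch of both properties on this slide, assuming a hypothetical 3-tap FIR filter as the LTI system: superposition holds, and a sine input comes out as a sine of the same frequency, scaled by |H| and shifted by angle(H), where H is the filter's frequency response at that frequency.

```python
import numpy as np

# Hypothetical example system: a 3-tap FIR filter, y[n] = 0.5 x[n] + 0.3 x[n-1] + 0.2 x[n-2]
h = np.array([0.5, 0.3, 0.2])

def lti_system(x):
    """Apply the FIR filter h to x, keeping the output the same length as x."""
    return np.convolve(x, h)[: len(x)]

n = np.arange(2000)
in1 = np.sin(2 * np.pi * 0.05 * n)     # 0.05 cycles per sample
in2 = np.cos(2 * np.pi * 0.11 * n)

# Linearity: the response to in1 + in2 equals out1 + out2
out1, out2 = lti_system(in1), lti_system(in2)
superposition_holds = np.allclose(lti_system(in1 + in2), out1 + out2)

# Sine in -> sine out: same frequency, amplitude scaled by |H|, phase shifted by angle(H)
w = 2 * np.pi * 0.05
H = np.sum(h * np.exp(-1j * w * np.arange(len(h))))   # frequency response at w
predicted = np.abs(H) * np.sin(w * n + np.angle(H))
# skip the first len(h) - 1 samples, where the filter is still filling up
sine_in_sine_out = np.allclose(out1[len(h) - 1:], predicted[len(h) - 1:])
print(superposition_holds, sine_in_sine_out)
```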


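Time-Frequency Analysis

Fourier series:

$$f(x) = \sum_{n=0}^{\infty} a_n \cos(nx) + \sum_{n=0}^{\infty} b_n \sin(nx)$$

Fourier Transform:

$$\hat{f}(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{-i\omega t}\, dt$$

$$e^{i\theta} = \cos(\theta) + i\sin(\theta)$$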
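For sampled signals the integral becomes a finite sum, the discrete Fourier transform X[k] = sum over n of x[n] e^{-i 2 pi k n / N}. This sketch (NumPy assumed) computes it straight from that definition and can be checked against `np.fft.fft`, which evaluates the same sums with the fast FFT algorithm.

```python
import numpy as np

def dft(x):
    """Discrete Fourier transform straight from the definition:
    X[k] = sum_n x[n] * exp(-i 2 pi k n / N)."""
    N = len(x)
    n = np.arange(N)
    k = n.reshape(-1, 1)                    # column of frequency indices
    return np.exp(-2j * np.pi * k * n / N) @ x

N = 64
n = np.arange(N)
x = np.cos(2 * np.pi * 5 * n / N)          # exactly 5 cycles in the window
X = dft(x)
peaks = np.flatnonzero(np.abs(X) > 1e-6)   # bins with non-negligible energy
print(peaks)                                # bin 5 and its mirror image at N - 5
```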


Short Time Fourier Transform

Time-varying spectra

Fast Fourier Transform (FFT)

[Figure: Input -> Filters -> Amplitude, Frequency -> Oscillators -> Output, shown over time frames t, t+1, t+2]
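A minimal STFT sketch: the signal is cut into overlapping frames, each frame is windowed and transformed with the FFT, giving one spectrum per time step. The Hann window, 512-sample frames and 128-sample hop are assumptions chosen for illustration.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Short-time Fourier transform: FFT of overlapping, Hann-windowed frames.
    Returns a complex array of shape (n_frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

sr = 8000
t = np.arange(sr) / sr                      # 1 second
# first half at 500 Hz, second half at 1500 Hz
x = np.where(t < 0.5, np.sin(2 * np.pi * 500 * t), np.sin(2 * np.pi * 1500 * t))
S = stft(x)
bin_hz = sr / 512                           # 15.625 Hz per bin
first_peak = np.argmax(np.abs(S[0])) * bin_hz
last_peak = np.argmax(np.abs(S[-1])) * bin_hz
print(first_peak, last_peak)                # the spectrum changes over time
```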


Phase Vocoder

Vocoder = voice encoder (YouTube: Herbie Hancock - I thought it was you)

Phase vocoder = vocoder that takes phase information into account

Time stretching and pitch shifting

Variable speed playback changes the pitch

Why?
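Why variable speed playback changes the pitch: reading the samples `speed` times faster divides every period by `speed`, so every frequency is multiplied by `speed` while the duration shrinks by the same factor. Duration and pitch are locked together, which is what the phase vocoder decouples. A minimal NumPy sketch, assuming linear interpolation as the resampling method:

```python
import numpy as np

def variable_speed(x, speed):
    """Play x back `speed` times faster by reading it at fractional positions
    (linear interpolation), like a tape or turntable running fast."""
    positions = np.arange(0, len(x) - 1, speed)
    return np.interp(positions, np.arange(len(x)), x)

def dominant_frequency(x, sr):
    """Frequency (Hz) of the strongest FFT bin."""
    spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    return np.fft.rfftfreq(len(x), 1 / sr)[np.argmax(spectrum)]

sr = 8000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t)        # 1 second at 440 Hz
y = variable_speed(x, 2.0)             # twice as fast
print(len(y) / sr, dominant_frequency(y, sr))   # half as long, and an octave higher
```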


Phase-vocoding

1966 Flanagan and Gold (Bell Labs, Speech)

Analysis/Synthesis system

Signal = sum of time-varying sine waves (amplitude, frequency and phase)

Filterbank constraints: the frequency response of each filter is identical except for its center frequency, the sum of the responses is flat, and the center frequencies are equally spaced
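The synthesis side of this signal model can be sketched as a bank of oscillators, one per sine wave, each with its own time-varying amplitude and frequency. In this sketch (NumPy assumed, partial values hypothetical) each oscillator's phase is the running sum of its frequency, so frequency changes never introduce phase jumps.

```python
import numpy as np

def oscillator_bank(amps, freqs, sr):
    """Sum of sinusoids with time-varying amplitude and frequency.
    amps, freqs: arrays of shape (n_partials, n_samples); freqs in Hz."""
    # phase[n] = 2*pi/sr * (f[0] + ... + f[n-1]), so phase[0] = 0
    phase = 2 * np.pi * (np.cumsum(freqs, axis=1) - freqs) / sr
    return np.sum(amps * np.sin(phase), axis=0)

sr = 8000
# two hypothetical partials: a steady 440 Hz tone, and a glide from 220 Hz to 330 Hz that fades out
freqs = np.vstack([np.full(sr, 440.0), np.linspace(220.0, 330.0, sr)])
amps = np.vstack([np.full(sr, 0.5), np.linspace(0.5, 0.0, sr)])
y = oscillator_bank(amps, freqs, sr)
```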


Band-Pass Filter

Heterodyning
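One reading of these two slides: a phase vocoder channel multiplies the input by a complex exponential at the channel's center frequency, which moves that band down to 0 Hz (heterodyning), and then low-pass filters the result. The magnitude of the output gives the amplitude, and the rate of change of its phase gives the frequency offset from the center. A minimal sketch, assuming NumPy and a crude Hann low-pass filter:

```python
import numpy as np

def heterodyne(x, center_hz, sr, n_taps=201):
    """Shift the band around center_hz down to 0 Hz and low-pass it.
    Returns complex baseband samples: |z| tracks the band's amplitude,
    the rate of change of angle(z) tracks its offset from center_hz."""
    n = np.arange(len(x))
    shifted = x * np.exp(-2j * np.pi * center_hz * n / sr)   # mix the band down to 0 Hz
    lowpass = np.hanning(n_taps)
    lowpass /= lowpass.sum()                                  # unity gain at 0 Hz
    return np.convolve(shifted, lowpass, mode="same")

sr = 8000
n = np.arange(sr)
x = 0.8 * np.cos(2 * np.pi * 1005 * n / sr)     # component near the 1000 Hz channel
z = heterodyne(x, 1000, sr)
mid = z[1000:7000]                               # ignore filter edge effects
# factor 2: a real cosine splits into two complex halves, one of which is filtered out
amplitude = 2 * np.median(np.abs(mid))
offset_hz = np.median(np.diff(np.unwrap(np.angle(mid)))) * sr / (2 * np.pi)
print(amplitude, 1000 + offset_hz)
```

The amplitude estimate comes out slightly below 0.8 because the Hann low-pass attenuates the 5 Hz offset a little.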


Phasors: Rectangular -> Polar

Time-varying amplitude = radius
Time-varying frequency = rate of angular rotation

Measured phase is only known from 0 to 360 degrees

What if we have the following sequence?

180, 225, 270, 315, 0, 45, 90

Need to convert to:

180, 225, 270, 315, 360, 405, 450

Phase unwrapping
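A minimal sketch of this slide using only the Python standard library: a phasor rotating 45 degrees per frame is converted from rectangular to polar form, which yields exactly the wrapped sequence above, and unwrapping then adds the smallest equivalent step (between -180 and 180 degrees) each time. Rounding to whole degrees is only there to keep the printed values clean.

```python
import cmath
import math

def to_polar_degrees(z):
    """Rectangular -> polar: radius (amplitude) and angle in whole degrees, wrapped to 0..360."""
    return abs(z), round(math.degrees(cmath.phase(z))) % 360

def unwrap_degrees(phases):
    """Remove 360-degree jumps by adding the smallest equivalent step each time."""
    unwrapped = [phases[0]]
    for prev, cur in zip(phases, phases[1:]):
        step = (cur - prev + 180) % 360 - 180   # step mapped into [-180, 180)
        unwrapped.append(unwrapped[-1] + step)
    return unwrapped

# A phasor of radius 0.5 rotating 45 degrees per frame, starting at 180 degrees
phasor = [cmath.rect(0.5, math.radians(180 + 45 * k)) for k in range(7)]
measured = [to_polar_degrees(z)[1] for z in phasor]
print(measured)                   # [180, 225, 270, 315, 0, 45, 90]
print(unwrap_degrees(measured))   # [180, 225, 270, 315, 360, 405, 450]
```

The constant step of 45 degrees per frame in the unwrapped sequence is the rate of angular rotation, i.e. the frequency.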


Demos (waterfall display + phase vocoder)


Phase Vocoder
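A compact phase vocoder time stretcher, offered as a sketch that ties together the STFT, phase unwrapping and resynthesis steps from the earlier slides. It is a textbook-style variant (periodic Hann window, hop = n_fft / 4, magnitude interpolation between analysis frames), not the implementation behind the demos; all parameter values are assumptions.

```python
import numpy as np

def phase_vocoder_stretch(x, rate, n_fft=1024, hop=256):
    """Time-stretch x by a factor 1 / rate without changing its pitch
    (rate < 1 makes it longer, rate > 1 shorter). Assumes hop == n_fft // 4."""
    # periodic Hann window; with hop = n_fft / 4 the squared windows sum to 1.5
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n_fft) / n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    spec = np.fft.rfft(
        np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]), axis=1)
    # expected phase advance per hop for the center frequency of every bin
    expected = 2 * np.pi * np.arange(n_fft // 2 + 1) * hop / n_fft
    steps = np.arange(0, n_frames - 1, rate)      # fractional analysis-frame positions
    phase = np.angle(spec[0])
    out = np.zeros(len(steps) * hop + n_fft)
    for j, t in enumerate(steps):
        i, frac = int(t), t - int(t)
        # interpolate magnitude between neighbouring analysis frames
        mag = (1 - frac) * np.abs(spec[i]) + frac * np.abs(spec[i + 1])
        out[j * hop : j * hop + n_fft] += window * np.fft.irfft(mag * np.exp(1j * phase), n_fft)
        # measured minus expected phase advance, wrapped to [-pi, pi]: the frequency deviation
        dev = np.angle(spec[i + 1]) - np.angle(spec[i]) - expected
        dev -= 2 * np.pi * np.round(dev / (2 * np.pi))
        phase = phase + expected + dev            # accumulate the unwrapped synthesis phase
    return out / 1.5

sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t)
y = phase_vocoder_stretch(x, 0.5)   # roughly twice as long, still 440 Hz
```

Because each bin's phase is advanced by its measured frequency rather than copied from the analysis frame, overlapping output frames stay phase-coherent even though they are placed at a different rate than they were analysed.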


MarPhasevocoder Demo


Rats are ticklish

Joint work with bio-acoustician Tecumseh Fitch (Univ. Vienna, Austria)

Rats laugh when tickled, but we can’t hear them: ultrasound range (50 kHz)

Bat detectors simply frequency-shift the signal, destroying all harmonic structure

Use the phase vocoder for harmonic-preserving pitch shifting and time stretching (different metabolism)
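A small worked example of why frequency shifting destroys harmonic structure while pitch shifting preserves it. The partial frequencies and the shift amounts are hypothetical values chosen for illustration: subtracting a fixed offset changes the ratios between partials, while multiplying by a common ratio keeps them at 1, 2, 3.

```python
# Hypothetical harmonic call: fundamental at 50 kHz plus two harmonics (values in kHz)
partials_khz = [50, 100, 150]

# Bat-detector style frequency shift: subtract the same offset from every component
shifted = [f - 45 for f in partials_khz]        # [5, 55, 105]
# Pitch shift: multiply every component by the same ratio (here 1/10)
pitched = [f / 10 for f in partials_khz]         # [5.0, 10.0, 15.0]

def ratios_to_lowest(partials):
    """Harmonic structure = ratios between partials (1, 2, 3, ... for a harmonic sound)."""
    return [f / partials[0] for f in partials]

print(ratios_to_lowest(shifted))   # [1.0, 11.0, 21.0]: no longer 1, 2, 3
print(ratios_to_lowest(pitched))   # [1.0, 2.0, 3.0]: harmonic structure preserved
```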


Pitch Shifting

Original - what you hear is nail scratches

Pitch-shifted so that laughter is audible

Only laughter


Sound Source Separation

Gestalt grouping cues from Auditory Scene Analysis:
Frequency continuity
Amplitude continuity
Common onset
Harmonicity
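The harmonicity cue can be sketched as a test of whether each spectral peak lies close to an integer multiple of a candidate fundamental f0. The function name and the 3% tolerance (in units of harmonic number) are assumptions for illustration only.

```python
def harmonic_mask(peak_freqs, f0, tolerance=0.03):
    """Harmonicity cue: flag peaks lying close to an integer multiple of f0.
    tolerance is the allowed deviation in units of harmonic number (an assumed value)."""
    mask = []
    for f in peak_freqs:
        h = round(f / f0)                     # nearest harmonic number
        mask.append(h >= 1 and abs(f / f0 - h) < tolerance)
    return mask

print(harmonic_mask([220.0, 440.0, 661.0, 500.0], f0=220.0))   # [True, True, True, False]
```

Peaks flagged together for the same f0 would be grouped into one source; 500 Hz does not fit the 220 Hz series and would be left for another group.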


Time-Frequency tradeoff

FT = global representation of frequency content

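Heisenberg uncertainty: a signal cannot be sharply localized in both time and frequency. For $f \in L^2(\mathbb{R})$ with time spread $\sigma_t$ and (angular) frequency spread $\sigma_\omega$:

$$\sigma_t \, \sigma_\omega \ge \frac{1}{2}$$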


Time-Frequency

Problem: more accurate frequency estimation requires longer time windows

Longer time windows mean we don’t know exactly when those frequencies occurred

What about making the hop size 1?

Pitch-synchronous overlap-add (monophonic sounds)

Holy grail = low latency polyphonic pitch shifting
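The tradeoff in numbers: an N-sample FFT frame at sample rate sr has bins sr / N Hz apart and spans N / sr seconds, so their product is always 1. The hop size only controls how often frames are computed; each frame still covers the whole window, so a hop of 1 does not escape the tradeoff. A small sketch, assuming a 44.1 kHz sample rate:

```python
def tf_resolution(n_fft, sr):
    """Return (frequency resolution in Hz, window duration in ms) of one FFT frame."""
    return sr / n_fft, 1000.0 * n_fft / sr

sr = 44100
for n_fft in (256, 1024, 4096):
    bin_hz, window_ms = tf_resolution(n_fft, sr)
    # finer frequency bins always cost a proportionally longer window
    print(f"{n_fft:5d} samples: {bin_hz:7.2f} Hz per bin, {window_ms:6.2f} ms window")
```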


IVL Audio Inc.

Local company - products in professional music, consumer electronics, commercial karaoke

Since 1984 - early pitch trackers

Late 80s pitch shifting

Fall 2008 - DJ products


Morpheus Drop-tune

Drop-tuning - retuning guitar strings to play lower (less string tension, multiple guitars)

Morpheus Drop-tune = guitar pedal for drop-tuning (marketed by XP audio, designed by IVL)

Challenge = low latency polyphonic pitch shifting

Based on phase vocoding at multiple time resolutions

Various YouTube demos
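The amount of pitch shift a drop-tuning pedal has to apply follows from equal temperament: shifting by s semitones multiplies every frequency by 2^(s/12). A short worked example, assuming a drop of a whole step applied to the standard low E string (E2, about 82.41 Hz):

```python
def semitone_ratio(semitones):
    """Frequency ratio for a pitch shift of `semitones` in 12-tone equal temperament."""
    return 2.0 ** (semitones / 12.0)

low_e = 82.41                       # standard-tuning low E string (E2), Hz
ratio = semitone_ratio(-2)          # drop a whole step: E -> D
print(round(ratio, 4), round(low_e * ratio, 2))   # 0.8909 73.42
```

Because the same ratio applies to every string and every harmonic at once, the pedal needs polyphonic pitch shifting rather than a single-note pitch tracker.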


“Classic” multi-stage approach

Time-Frequency representation: Short Time Fourier Transform (discrete basis: windowed sine waves)

Grouping Cue 1: Partial Tracking (McAulay & Quatieri)

Grouping Cue 2: Sound source formation, grouping of partials based on harmonicity

PROBLEMS: difficult to decide the ordering, brittle
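The partial tracking stage can be sketched as greedy frame-to-frame peak matching: each live track continues with the nearest peak in the next frame if it is close enough, unmatched tracks end, and leftover peaks start new tracks. This is a simplified McAulay-Quatieri-style sketch; the 20 Hz jump limit and the function name are assumptions. The hard continue/stop decision per peak is one source of the brittleness mentioned above.

```python
def track_partials(frames, max_jump_hz=20.0):
    """Greedy frame-to-frame partial tracking (simplified sketch).
    frames: one list of peak frequencies (Hz) per frame.
    Returns tracks as lists of (frame_index, frequency) pairs."""
    tracks, active = [], []
    for t, peaks in enumerate(frames):
        unused = sorted(peaks)
        continued = []
        for idx in active:                      # try to continue each live track
            if not unused:
                break
            last = tracks[idx][-1][1]
            nearest = min(unused, key=lambda f: abs(f - last))
            if abs(nearest - last) <= max_jump_hz:
                tracks[idx].append((t, nearest))
                unused.remove(nearest)
                continued.append(idx)           # otherwise the track "dies"
        for f in unused:                        # leftover peaks are "births"
            tracks.append([(t, f)])
            continued.append(len(tracks) - 1)
        active = continued
    return tracks

frames = [[440, 1000], [442, 1003], [445, 700]]
for track in track_partials(frames):
    print(track)
```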


Sound Source Separation using Spectral Clustering


Sinusoidal Front-End

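The signal within a frame is modelled as a sum of sinusoids:

$$x(n) = \sum_{k=1}^{K} a_k \cos(\omega_k n + \phi_k)$$

Peaks = sinusoidal components with constant parameters (amplitude, frequency, phase)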
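A minimal sketch of a sinusoidal front-end: window one frame, take its magnitude spectrum, and keep the local maxima as sinusoidal components. The Hann window, the relative threshold of 0.1 and the cap of 20 peaks are assumptions for illustration, not the parameters of the system described here.

```python
import numpy as np

def pick_peaks(frame, sr, max_peaks=20, threshold=0.1):
    """Return (frequencies in Hz, magnitudes) of the spectral peaks in one frame:
    local maxima above `threshold` times the largest magnitude, strongest max_peaks kept."""
    n_fft = len(frame)
    mag = np.abs(np.fft.rfft(frame * np.hanning(n_fft)))
    # a bin is a local maximum if it is larger than both neighbours
    is_peak = (mag[1:-1] > mag[:-2]) & (mag[1:-1] > mag[2:])
    bins = np.flatnonzero(is_peak) + 1
    bins = bins[mag[bins] > threshold * mag.max()]
    strongest = np.sort(bins[np.argsort(mag[bins])[::-1][:max_peaks]])
    return strongest * sr / n_fft, mag[strongest]

sr = 8000
n = np.arange(1024)
x = np.sin(2 * np.pi * 1000 * n / sr) + 0.5 * np.sin(2 * np.pi * 2500 * n / sr)
freqs, mags = pick_peaks(x, sr)
print(freqs)    # the two sinusoidal components, at 1000 Hz and 2500 Hz
```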


Architecture


Dynamic texture windows


Spectral Clustering

Alternative to traditional point-based algorithms (k-means)

Doesn’t assume convex shapes

Doesn’t assume Gaussians

Avoids multiple restarts

Eigenstructure of similarity matrix
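A minimal two-way spectral clustering sketch (normalized-cut style bipartition), assuming a Gaussian similarity with a hypothetical width sigma = 0.5. The groups come from the sign of the second eigenvector of the normalized Laplacian built from the similarity matrix, so no cluster centers, convexity or random restarts are involved. The toy data are 2-D points, not the sinusoidal peaks and similarity measures used in the actual system.

```python
import numpy as np

def normalized_cut_bipartition(points, sigma=0.5):
    """Split points into two groups using the eigenstructure of a similarity matrix."""
    sq_dist = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dist / (2 * sigma ** 2))        # Gaussian similarity matrix
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    # symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    L = np.eye(len(points)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L)                 # eigenvalues in ascending order
    fiedler = d_inv_sqrt * eigvecs[:, 1]           # second-smallest eigenvector, rescaled
    return (fiedler > 0).astype(int)

# Two concentric rings: non-convex clusters that k-means with k = 2 would typically cut in half
angles = np.linspace(0, 2 * np.pi, 40, endpoint=False)
ring = np.column_stack([np.cos(angles), np.sin(angles)])
points = np.vstack([1.0 * ring, 4.0 * ring])
labels = normalized_cut_bipartition(points)
print(labels[:40], labels[40:])   # each ring gets a single label
```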


Denoising of Orca Vocalization

Original

Noise

Separated Vocalization


Comparison with partial tracking

McAulay and Quatieri tracking of partials

Proposed Approach


[Figure: partials plotted as Frequency (Hz) vs Time (frames) vs Amplitude, grouped into Cluster 0 and Cluster 1]

Synthetic Mixtures of Instruments

4-note mixture

Instrumentation detection based on timbral models (Martins et al., ISMIR07)


“Real world” separation results

Original Separated

Mirex database

Live U2

More examples: http://opihi.cs.uvic.ca/NormCutAudio http://opihi.cs.uvic.ca/Dafx2007
