From rat laughter to speed metal - a personal history of phase vocoding

George Tzanetakis ([email protected])
Assistant Professor, Computer Science Department (also in Music, ECE)
University of Victoria, Canada

Copyright 2009 G. Tzanetakis



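Linear Systems and Sine Waves

[Figure: a linear system maps in1 to out1, in2 to out2, and in1 + in2 to out1 + out2]

[Figure: sine wave annotated with amplitude, period = 1 / frequency, and phase (0, 180, 360 degrees)]

True sine waves last forever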

sine wave -> LTI -> new sine wave
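
The sine-in, sine-out property can be checked numerically. The sketch below is only an illustration: the sample rate, the 440 Hz input and the 5-tap moving-average filter are assumptions, not values from the slides. It measures the dominant frequency before and after the filter; only the amplitude (and phase) should change.

```python
import numpy as np

sr = 8000                       # sample rate in Hz (assumed for illustration)
f0 = 440.0                      # input frequency in Hz (assumed)
t = np.arange(sr) / sr          # one second of time stamps
x = np.sin(2 * np.pi * f0 * t)

# A simple LTI system: 5-tap moving average (an FIR low-pass filter).
h = np.ones(5) / 5
y = np.convolve(x, h)[: len(x)]
steady = y[len(h):]             # skip the filter's start-up transient


def peak_frequency(signal, sr):
    """Frequency (Hz) of the strongest bin in the magnitude spectrum."""
    spectrum = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    return np.fft.rfftfreq(len(signal), d=1 / sr)[np.argmax(spectrum)]


f_in = peak_frequency(x, sr)
f_out = peak_frequency(steady, sr)
gain = np.max(np.abs(steady))   # output amplitude for a unit-amplitude input
print(f_in, f_out, gain)
```

The two frequencies should agree to within a bin, while the gain is below 1 because the moving average attenuates 440 Hz slightly.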

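Time-Frequency Analysis

Fourier series:

$$f(x) = \sum_{n=0}^{\infty} a_n \cos(nx) + \sum_{n=0}^{\infty} b_n \sin(nx)$$

Fourier Transform:

$$\hat{f}(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{i\omega t}\, dt$$

Euler's formula:

$$e^{i\theta} = \cos(\theta) + i\sin(\theta)$$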

Short Time Fourier Transform

Time-varying spectra

Fast Fourier Transform (FFT)

[Figure: input over time (frames t, t+1, t+2) -> filters -> oscillators (amplitude, frequency) -> output]
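
A minimal sketch of a short time Fourier transform, assuming a Hann window, a 512-sample frame and a 128-sample hop (none of these values come from the slides). The hypothetical test signal switches from 500 Hz to 1500 Hz halfway through, so the per-frame peak frequency should follow the change.

```python
import numpy as np


def stft(x, win_size=512, hop=128):
    """Return one complex spectrum per frame (rows = time, columns = frequency)."""
    window = np.hanning(win_size)
    n_frames = 1 + (len(x) - win_size) // hop
    frames = np.stack(
        [x[i * hop : i * hop + win_size] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)


sr = 8000                                   # sample rate in Hz (assumed)
t = np.arange(2 * sr) / sr                  # two seconds
# 500 Hz during the first second, 1500 Hz during the second one.
x = np.where(t < 1.0, np.sin(2 * np.pi * 500 * t), np.sin(2 * np.pi * 1500 * t))

S = stft(x)
freqs = np.fft.rfftfreq(512, d=1 / sr)      # centre frequency of each bin
peak_hz = freqs[np.argmax(np.abs(S), axis=1)]
print(S.shape, peak_hz[0], peak_hz[-1])
```

Each row of `S` is the spectrum at one time step, which is the time-varying spectrum the slide refers to.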

Phase vocoder

Vocoder = voice encoder (YouTube: Herbie Hancock - I thought it was you)

Phase vocoder = vocoder that takes into account phase information

Time stretching and pitch shifting

Variable speed playback changes the pitch

Why ?
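
One way to see why: playing the same samples faster squeezes every period by the same factor, so every frequency goes up by that factor. A small sketch, assuming a 440 Hz tone at an 8 kHz sample rate (illustrative values only), where "double speed" is modelled by keeping every second sample:

```python
import numpy as np


def peak_frequency(signal, sr):
    """Frequency (Hz) of the strongest bin in the magnitude spectrum."""
    spectrum = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    return np.fft.rfftfreq(len(signal), d=1 / sr)[np.argmax(spectrum)]


sr = 8000                               # playback sample rate in Hz (assumed)
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t)         # one second at 440 Hz

# Double-speed playback: every second sample, played at the same rate.
fast = x[::2]

f_original = peak_frequency(x, sr)
f_fast = peak_frequency(fast, sr)
print(len(x), len(fast), f_original, f_fast)
```

The clip is half as long and its frequency doubles, so duration and pitch are tied together unless something like a phase vocoder decouples them.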

Phase-vocoding

1966 Flanagan and Golden (Bell Labs, Speech)

Analysis/Synthesis system

Signal = sum of time-varying sine waves (amplitude, frequency and phase)

Filterbank constraints = frequency response of each filter identical except center frequency, sum has flat frequency response, center frequencies equally spaced
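
To make the analysis/synthesis idea concrete, here is a minimal FFT-based time-stretching sketch. It is a common textbook variant, not the original filterbank formulation and not the implementation demonstrated in the talk; the function name `time_stretch`, the window size and the hop size are all assumptions. Each FFT bin plays the role of one filter: its magnitude is kept, and its phase is advanced according to the frequency estimated from the frame-to-frame phase difference.

```python
import numpy as np


def time_stretch(x, stretch, win_size=1024, hop=256):
    """Stretch x in time by `stretch` (2.0 = twice as long), keeping the pitch."""
    window = np.hanning(win_size)
    syn_hop = int(round(hop * stretch))          # synthesis hop size
    n_frames = 1 + (len(x) - win_size) // hop
    # Phase advance per analysis hop expected at each bin's centre frequency.
    omega = 2 * np.pi * np.arange(win_size // 2 + 1) * hop / win_size
    out = np.zeros(syn_hop * (n_frames - 1) + win_size)
    norm = np.zeros_like(out)
    prev_phase = None
    syn_phase = None
    for i in range(n_frames):
        spec = np.fft.rfft(x[i * hop : i * hop + win_size] * window)
        mag, phase = np.abs(spec), np.angle(spec)
        if i == 0:
            syn_phase = phase.copy()
        else:
            # Deviation from the expected advance, wrapped to [-pi, pi).
            delta = phase - prev_phase - omega
            delta = (delta + np.pi) % (2 * np.pi) - np.pi
            # True advance per analysis hop, rescaled to the synthesis hop.
            syn_phase = syn_phase + (omega + delta) * (syn_hop / hop)
        prev_phase = phase
        grain = np.fft.irfft(mag * np.exp(1j * syn_phase), n=win_size) * window
        start = i * syn_hop
        out[start : start + win_size] += grain   # overlap-add
        norm[start : start + win_size] += window ** 2
    return out / np.maximum(norm, 1e-3)


sr = 8000                                        # sample rate in Hz (assumed)
x = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
y = time_stretch(x, 2.0)
print(len(x), len(y))
```

The output is roughly twice as long as the input while the tone stays at 440 Hz. Resampling the stretched signal back to the original length would give pitch shifting instead.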

Band-Pass Filter

Heterodyning
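
A minimal sketch of heterodyning used as a band-pass analysis step, under assumed values (8 kHz sample rate, components at 1000 Hz and 2500 Hz, an 80-sample moving average as the low-pass filter). Multiplying by a complex exponential moves the band of interest down to 0 Hz, and the low-pass filter then removes everything else, leaving the amplitude of the selected component.

```python
import numpy as np

sr = 8000                                   # sample rate in Hz (assumed)
t = np.arange(sr) / sr
# Hypothetical input: a 1000 Hz component (amplitude 0.8) plus a 2500 Hz one.
x = 0.8 * np.sin(2 * np.pi * 1000 * t) + 0.3 * np.sin(2 * np.pi * 2500 * t)

fc = 1000.0                                 # centre frequency of the band
shifted = x * np.exp(-2j * np.pi * fc * t)  # the 1000 Hz component is now at 0 Hz

# Low-pass filter: 80-sample moving average (10 ms), which has nulls at
# multiples of 100 Hz and so removes the other shifted components here.
h = np.ones(80) / 80
baseband = np.convolve(shifted, h, mode="valid")

amplitude = 2 * np.abs(baseband)            # factor 2: a real sine splits its energy
print(amplitude.min(), amplitude.max())
```

The recovered amplitude stays at about 0.8, the amplitude of the 1000 Hz component. The angle of `baseband` carries the phase, which is the quantity the phasor slides deal with.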

Phasors

Rectangular -> Polar

Time varying amplitude = radius

Time varying frequency = rate of angular rotation

only 0 to 360 degrees

What if we have the following sequence?

180, 225, 270, 315, 0, 45, 90

Need to convert to:

180, 225, 270, 315, 360, 405, 450

Phase unwrapping
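
The conversion above can be sketched with NumPy's `np.unwrap`, which works in radians, so the degree values are converted on the way in and out. The variable names are illustrative.

```python
import numpy as np

wrapped_deg = np.array([180, 225, 270, 315, 0, 45, 90], dtype=float)

# np.unwrap removes jumps larger than pi by adding multiples of 2*pi.
unwrapped_deg = np.rad2deg(np.unwrap(np.deg2rad(wrapped_deg)))

# After unwrapping, the step between frames is constant, which is what a
# frequency estimate (rate of angular rotation) needs.
steps = np.diff(unwrapped_deg)
print(np.round(unwrapped_deg), np.round(steps))
```

The jump from 315 to 0 is read as a further advance of 45 degrees rather than a drop of 315 degrees.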

Demos (waterfall display + phase vocoder)

PhaseVocoder

MarPhasevocoder Demo

Rats are ticklish

Joint work with bio-acoustician Tecumseh Fitch (Univ. Vienna, Austria)

Rats laugh when tickled but we can’t hear them - ultrasound range (50 kHz)

Bat detectors simply frequency shift the signal, destroying all harmonic structure

Use phase vocoder for harmonic-preserving pitch shifting and time-stretching (different metabolism)
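
The difference between frequency shifting and pitch shifting comes down to arithmetic on the partials. The sketch below assumes a hypothetical call with a 50 kHz fundamental and three overtones, and a 3 kHz target; the exact numbers are illustrative only. Subtracting a constant breaks the integer ratios between partials, while scaling keeps them.

```python
import numpy as np

# Hypothetical ultrasonic call: fundamental at 50 kHz plus three overtones.
harmonics = 50_000.0 * np.arange(1, 5)      # 50, 100, 150, 200 kHz

shifted = harmonics - 47_000.0              # frequency shift (bat-detector style)
scaled = harmonics * (3_000.0 / 50_000.0)   # pitch shift (scale every partial)


def is_harmonic(freqs):
    """True if the partials are 1x, 2x, 3x, ... the lowest one."""
    return bool(np.allclose(freqs / freqs[0], np.arange(1, len(freqs) + 1)))


print(shifted, is_harmonic(shifted))
print(scaled, is_harmonic(scaled))
```

Both versions bring the lowest partial down to 3 kHz, but only the scaled one still has partials at integer multiples of it.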

Pitch Shifting

Original - what you hear is nail scratches

Pitch-shifted so that laughter is audible

Only laughter

Sound Source Separation

Gestalt Grouping Cues from Auditory Scene Analysis:

Frequency continuity

Amplitude continuity

Common Onset

Harmonicity

Time-Frequency tradeoff

FT = global representation of frequency content

Heisenberg uncertainty

Time – Frequency

L2

Time-Frequency

Problem: to have more accurate frequency estimation we need larger windows of time

Larger windows of time mean we don’t know exactly when those frequencies happened

What about making hop size 1?

Pitch-synchronous overlap-add (monophonic sounds)

Holy grail = low latency polyphonic pitch shifting
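
The trade-off behind these points can be put in numbers. A small sketch, assuming a 44.1 kHz sample rate and three illustrative window sizes: the spacing between FFT bins is `sr / win_size`, and the stretch of time one frame covers is `win_size / sr`, so improving one worsens the other by the same factor.

```python
def resolution(win_size, sr):
    """Return (frequency bin spacing in Hz, window duration in seconds)."""
    return sr / win_size, win_size / sr


sr = 44100  # sample rate in Hz (assumed)
table = {n: resolution(n, sr) for n in (256, 1024, 4096)}

for n, (df, dt) in table.items():
    print(f"window {n:5d}: bins {df:7.2f} Hz apart, frame lasts {1000 * dt:6.2f} ms")
```

The product of the two values is always 1, whatever the window size. A smaller hop size gives more frames per second but does not shorten the window, and a long window also means more latency, which is why low-latency polyphonic pitch shifting is hard.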

IVL Audio Inc.

Local company - products in professional music, consumer electronics, commercial karaoke

Since 1984 (early pitch trackers)

Late 80s pitch shifting

Fall 2008 - DJ products

Morpheus Drop-tune

Drop-tuning - retuning guitar strings to play lower (less string tension, multiple guitars)

Morpheus Drop-tune = guitar pedal for drop-tuning (marketed by XP audio, designed by IVL)

Challenge = low latency polyphonic pitch shifting

Based on phase-vocoding at multiple time resolutions

Various YouTube demos

“Classic” multi-stage approach

Grouping Cue 1

Time-Frequency representation

Short Time Fourier Transform

Discrete basis: windowed sine waves

Grouping Cue 2

Partial Tracking (McAulay & Quatieri)

Sound source formation: grouping of partials based on harmonicity

PROBLEMS: Difficult to decide ordering, brittle

Sound Source Separation using Spectral Clustering

Sinusoidal Front-End

The signal within a frame is modelled as:

peaks = sinusoidal components with constant parameters
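
A minimal sketch of what picking peaks in a frame can look like, assuming the peaks are local maxima of a Hann-windowed magnitude spectrum. The function name `spectral_peaks`, the frame length and the test frequencies are hypothetical, and the front-end described in the talk may estimate its parameters differently.

```python
import numpy as np


def spectral_peaks(frame, sr, max_peaks=5):
    """Return (frequency in Hz, amplitude) of the strongest spectral peaks."""
    window = np.hanning(len(frame))
    mag = np.abs(np.fft.rfft(frame * window))
    # A peak is a bin that is larger than both of its neighbours.
    is_peak = (mag[1:-1] > mag[:-2]) & (mag[1:-1] > mag[2:])
    bins = np.flatnonzero(is_peak) + 1
    strongest = bins[np.argsort(mag[bins])[::-1][:max_peaks]]
    freqs = strongest * sr / len(frame)
    amps = 2 * mag[strongest] / window.sum()   # undo the window's gain
    return sorted(zip(freqs, amps))


sr = 8192                                      # sample rate in Hz (assumed)
t = np.arange(1024) / sr                       # one 1024-sample frame
frame = 1.0 * np.sin(2 * np.pi * 800 * t) + 0.5 * np.sin(2 * np.pi * 2000 * t)

peaks = spectral_peaks(frame, sr, max_peaks=2)
print(peaks)
```

Each peak is one sinusoid with a single frequency and amplitude for the whole frame, which is the "constant parameters" assumption on the slide.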

Architecture

Dynamic texture windows

Spectral Clustering

Alternative to traditional point-based algorithms (k-means)

Doesn’t assume convex shapes

Doesn’t assume Gaussians

Avoid multiple restarts

Eigenstructure of similarity matrix
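
A minimal sketch of the idea on toy 2-D data rather than audio peaks. Two concentric rings are a non-convex case where k-means on raw coordinates would struggle. The Gaussian similarity, its width `sigma`, and the use of the second eigenvector of the normalized Laplacian are generic choices assumed here for illustration; they are not the similarity cues used for sound sources in the talk.

```python
import numpy as np


def two_rings(n_per_ring=40, radii=(1.0, 3.0)):
    """Toy data: evenly spaced points on two concentric circles."""
    angles = np.linspace(0, 2 * np.pi, n_per_ring, endpoint=False)
    circle = np.column_stack([np.cos(angles), np.sin(angles)])
    return np.concatenate([r * circle for r in radii])


def spectral_bipartition(points, sigma=0.5):
    """Split points into two clusters using the similarity matrix's eigenstructure."""
    sq_dist = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dist / (2 * sigma ** 2))          # similarity matrix
    d = W.sum(axis=1)                                # degree of each point
    L = np.eye(len(points)) - W / np.sqrt(np.outer(d, d))  # normalized Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)             # eigenvalues in ascending order
    fiedler = eigvecs[:, 1] / np.sqrt(d)             # second-smallest eigenvector
    return (fiedler > 0).astype(int)


points = two_rings()
labels = spectral_bipartition(points)
print(labels[:40], labels[40:])
```

There is no random initialization, so there is nothing to restart, and each ring ends up with a single label even though neither ring is a convex blob.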

Denoising of Orca Vocalization

Original

Noise

Separated Vocalization

Comparison with partial tracking

McAulay and Quatieri Tracking of Partials

Proposed Approach

Synthetic Mixtures of Instruments

[Figure: 3-D plot of amplitude against frequency (Hz) and time (frames), with peaks labelled Cluster 0 and Cluster 1]

4-note mixture

Instrumentation detection based on timbral models (Martins et al., ISMIR 2007)

“Real world” separation results

Original Separated

Mirex database

Live U2

More examples:
http://opihi.cs.uvic.ca/NormCutAudio
http://opihi.cs.uvic.ca/Dafx2007