41
Lecture 3: Audio Applications Topological Time Series Analysis - Theory And Practice Jose Perea, Michigan State University. Chris Tralie, Duke University 7/20/2016 Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications

Lecture 3: Audio Applications - Angewandte Numerische …...Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications. Biphonation I 2 noncommensurate frequencies

  • Upload
    others

  • View
    9

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Lecture 3: Audio Applications - Angewandte Numerische …...Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications. Biphonation I 2 noncommensurate frequencies

Lecture 3: Audio Applications

Topological Time Series Analysis - Theory And Practice

Jose Perea, Michigan State University. Chris Tralie, Duke University

7/20/2016

Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications

Page 2: Lecture 3: Audio Applications - Angewandte Numerische …...Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications. Biphonation I 2 noncommensurate frequencies

Table of Contents

I Audio Data / Biphonation

B Music Data

Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications

Page 3: Lecture 3: Audio Applications - Angewandte Numerische …...Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications. Biphonation I 2 noncommensurate frequencies

Digital Audio Basics: Representation/Sampling

I 1D time series x [n], sampled at 44100hzB Shannon Nyquist: Need to sample at at least twice the

highest frequency of a bandlimited signal to avoidaliasing

I Very high sampling rate!B 1 second chunk lives in R44100

B 3 second chunk lives in R132300!

Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications

Page 4: Lecture 3: Audio Applications - Angewandte Numerische …...Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications. Biphonation I 2 noncommensurate frequencies

Digital Audio Basics: Representation/Sampling

I 1D time series x [n], sampled at 44100hzB Shannon Nyquist: Need to sample at at least twice the

highest frequency of a bandlimited signal to avoidaliasing

I Very high sampling rate!B 1 second chunk lives in R44100

B 3 second chunk lives in R132300!

Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications

Page 5: Lecture 3: Audio Applications - Angewandte Numerische …...Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications. Biphonation I 2 noncommensurate frequencies

Biphonation

I 2 noncommensurate frequencies present at the same timein biological phenomena

e.g. cos(t) + cos(πt)

Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications

Page 6: Lecture 3: Audio Applications - Angewandte Numerische …...Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications. Biphonation I 2 noncommensurate frequencies

Horse Whinnies

High Valence Negative

Briefer, Elodie F., et al. ”Segregation of information about emotional arousal and valence in horse whinnies.”Scientific reports 4 (2015).

Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications

Page 7: Lecture 3: Audio Applications - Angewandte Numerische …...Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications. Biphonation I 2 noncommensurate frequencies

Horse Whinnies

High Valence Positive

B We’ll be focusing on the positive clip today...

Briefer, Elodie F., et al. ”Segregation of information about emotional arousal and valence in horse whinnies.”Scientific reports 4 (2015).

Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications

Page 8: Lecture 3: Audio Applications - Angewandte Numerische …...Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications. Biphonation I 2 noncommensurate frequencies

Horse Whinnies

High Valence Positive

B We’ll be focusing on the positive clip today...Briefer, Elodie F., et al. ”Segregation of information about emotional arousal and valence in horse whinnies.”Scientific reports 4 (2015).

Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications

Page 9: Lecture 3: Audio Applications - Angewandte Numerische …...Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications. Biphonation I 2 noncommensurate frequencies

Horse Whinnie Audio

Interactively Show Audio File

I Base frequencies on the order of 1000hz (Window size?)I By default, only using 512 samples after the starting time

(≈23 milliseconds of audio)

Have Students Find Steady State Region

Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications

Page 10: Lecture 3: Audio Applications - Angewandte Numerische …...Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications. Biphonation I 2 noncommensurate frequencies

Horse Whinnie Audio

Interactively Show Audio File

I Base frequencies on the order of 1000hz (Window size?)

I By default, only using 512 samples after the starting time(≈23 milliseconds of audio)

Have Students Find Steady State Region

Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications

Page 11: Lecture 3: Audio Applications - Angewandte Numerische …...Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications. Biphonation I 2 noncommensurate frequencies

Horse Whinnie Audio

Interactively Show Audio File

I Base frequencies on the order of 1000hz (Window size?)I By default, only using 512 samples after the starting time

(≈23 milliseconds of audio)

Have Students Find Steady State Region

Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications

Page 12: Lecture 3: Audio Applications - Angewandte Numerische …...Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications. Biphonation I 2 noncommensurate frequencies

Horse Whinnie Audio

Interactively Show Audio File

I Base frequencies on the order of 1000hz (Window size?)I By default, only using 512 samples after the starting time

(≈23 milliseconds of audio)

Have Students Find Steady State Region

Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications

Page 13: Lecture 3: Audio Applications - Angewandte Numerische …...Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications. Biphonation I 2 noncommensurate frequencies

Biphonation Finding Competition

I Pan through audio file to find best region of biphonation, asmeasured by persistence of second most persistent class

I May be corrupted due to noiseI Will keep a running tab of best score on the board!

Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications

Page 14: Lecture 3: Audio Applications - Angewandte Numerische …...Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications. Biphonation I 2 noncommensurate frequencies

Table of Contents

B Audio Data / Biphonation

I Music Data

Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications

Page 15: Lecture 3: Audio Applications - Angewandte Numerische …...Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications. Biphonation I 2 noncommensurate frequencies

Tempo / Repetition

I Music is full of repetition

I Tempo is determined by a train of music “pulses”/“beats” ina periodic pattern

I Foot tappingI Tempo usually 50 - 200 beats per minute

Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications

Page 16: Lecture 3: Audio Applications - Angewandte Numerische …...Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications. Biphonation I 2 noncommensurate frequencies

Tempo / Repetition

I Music is full of repetitionI Tempo is determined by a train of music “pulses”/“beats” in

a periodic pattern

I Foot tappingI Tempo usually 50 - 200 beats per minute

Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications

Page 17: Lecture 3: Audio Applications - Angewandte Numerische …...Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications. Biphonation I 2 noncommensurate frequencies

Tempo / Repetition

I Music is full of repetitionI Tempo is determined by a train of music “pulses”/“beats” in

a periodic pattern

I Foot tapping

I Tempo usually 50 - 200 beats per minute

Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications

Page 18: Lecture 3: Audio Applications - Angewandte Numerische …...Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications. Biphonation I 2 noncommensurate frequencies

Tempo / Repetition

I Music is full of repetitionI Tempo is determined by a train of music “pulses”/“beats” in

a periodic pattern

I Foot tappingI Tempo usually 50 - 200 beats per minute

Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications

Page 19: Lecture 3: Audio Applications - Angewandte Numerische …...Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications. Biphonation I 2 noncommensurate frequencies

Tempo / Repetition

Don’t Stop Believin (120 beats per minute)

Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications

Page 20: Lecture 3: Audio Applications - Angewandte Numerische …...Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications. Biphonation I 2 noncommensurate frequencies

Raw Audio Delay Embedding

I τ ∗ dim = 22050 (why?)

I dT = 441I Taking first 3 seconds of audio

Run it! What happens?

Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications

Page 21: Lecture 3: Audio Applications - Angewandte Numerische …...Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications. Biphonation I 2 noncommensurate frequencies

Raw Audio Delay Embedding

I τ ∗ dim = 22050 (why?)I dT = 441

I Taking first 3 seconds of audio

Run it! What happens?

Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications

Page 22: Lecture 3: Audio Applications - Angewandte Numerische …...Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications. Biphonation I 2 noncommensurate frequencies

Raw Audio Delay Embedding

I τ ∗ dim = 22050 (why?)I dT = 441I Taking first 3 seconds of audio

Run it! What happens?

Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications

Page 23: Lecture 3: Audio Applications - Angewandte Numerische …...Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications. Biphonation I 2 noncommensurate frequencies

Raw Audio Delay Embedding

I τ ∗ dim = 22050 (why?)I dT = 441I Taking first 3 seconds of audio

Run it! What happens?

Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications

Page 24: Lecture 3: Audio Applications - Angewandte Numerische …...Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications. Biphonation I 2 noncommensurate frequencies

Audio Spectrograms: Definition

Aka the Squared Magnitude Short-Time Fourier Transform.GivenB A discrete signal xB A window size W (implicitly τ = 1)B A hop size H (like dT )

S[k ,n] =

∣∣∣∣∣∣∣∣∣FFT

x

nH

nH + 1...

nH + W − 1

[k ]

∣∣∣∣∣∣∣∣∣2

Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications

Page 25: Lecture 3: Audio Applications - Angewandte Numerische …...Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications. Biphonation I 2 noncommensurate frequencies

Audio Spectrograms: Definition

Aka the Squared Magnitude Short-Time Fourier Transform.GivenB A discrete signal xB A window size W (implicitly τ = 1)B A hop size H (like dT )

S[k ,n] =

∣∣∣∣∣∣∣∣∣FFT

x

nH

nH + 1...

nH + W − 1

[k ]

∣∣∣∣∣∣∣∣∣2

Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications

Page 26: Lecture 3: Audio Applications - Angewandte Numerische …...Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications. Biphonation I 2 noncommensurate frequencies

Audio Spectrograms: Definition

S[k ,n] =

∣∣∣∣∣∣∣∣∣FFT

x

nH

nH + 1...

nH + W − 1

[k ]

∣∣∣∣∣∣∣∣∣2

Window 2

Window 3

Window 1

hop

Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications

Page 27: Lecture 3: Audio Applications - Angewandte Numerische …...Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications. Biphonation I 2 noncommensurate frequencies

Audio Spectrograms

Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications

Page 28: Lecture 3: Audio Applications - Angewandte Numerische …...Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications. Biphonation I 2 noncommensurate frequencies

Audio Spectrograms

Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications

Page 29: Lecture 3: Audio Applications - Angewandte Numerische …...Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications. Biphonation I 2 noncommensurate frequencies

Audio Spectrograms

B Look at Journey example, show percussion

Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications

Page 30: Lecture 3: Audio Applications - Angewandte Numerische …...Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications. Biphonation I 2 noncommensurate frequencies

Audio Novelty Functions

f [n] =W−1∑k=0

s(log(S[k + 1,n])− log(S[k ,n]))

where

s(x) ={

x x > 00 otherwise

}Indicator function for “audio onsets”

Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications

Page 31: Lecture 3: Audio Applications - Angewandte Numerische …...Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications. Biphonation I 2 noncommensurate frequencies

Audio Novelty Functions

Show module, show Journey example

B By what factor have we reduced the sampling rate?

B Show synchronized audio

Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications

Page 32: Lecture 3: Audio Applications - Angewandte Numerische …...Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications. Biphonation I 2 noncommensurate frequencies

Audio Novelty Functions

Show module, show Journey example

B By what factor have we reduced the sampling rate?

B Show synchronized audio

Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications

Page 33: Lecture 3: Audio Applications - Angewandte Numerische …...Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications. Biphonation I 2 noncommensurate frequencies

Audio Novelty Functions

Show module, show Journey example

B By what factor have we reduced the sampling rate?

B Show synchronized audio

Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications

Page 34: Lecture 3: Audio Applications - Angewandte Numerische …...Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications. Biphonation I 2 noncommensurate frequencies

Audio Novelty Functions

Lots of variants1 Ellis, Daniel PW. ”Beat tracking by dynamic programming.” Journal of New Music Research 36.1 (2007):

51-60.

2 Gouyon, Fabien, Simon Dixon, and Gerhard Widmer. ”Evaluating low-level features for beat classificationand tracking.” 2007 IEEE International Conference on Acoustics, Speech and SignalProcessing-ICASSP’07. Vol. 4. IEEE, 2007.

3 Boeck, Sebastian, and Gerhard Widmer. ”Maximum filter vibrato suppression for onset detection.”Proceedings of the 16th International Conference on Digital Audio Effects (DAFx-13), Maynooth, Ireland.2013.

e.g. in [1]

Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications

Page 35: Lecture 3: Audio Applications - Angewandte Numerische …...Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications. Biphonation I 2 noncommensurate frequencies

Audio Novelty Functions

Lots of variants1 Ellis, Daniel PW. ”Beat tracking by dynamic programming.” Journal of New Music Research 36.1 (2007):

51-60.

2 Gouyon, Fabien, Simon Dixon, and Gerhard Widmer. ”Evaluating low-level features for beat classificationand tracking.” 2007 IEEE International Conference on Acoustics, Speech and SignalProcessing-ICASSP’07. Vol. 4. IEEE, 2007.

3 Boeck, Sebastian, and Gerhard Widmer. ”Maximum filter vibrato suppression for onset detection.”Proceedings of the 16th International Conference on Digital Audio Effects (DAFx-13), Maynooth, Ireland.2013.

e.g. in [1]

Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications

Page 36: Lecture 3: Audio Applications - Angewandte Numerische …...Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications. Biphonation I 2 noncommensurate frequencies

Music Vs Speech

Show module

B A sliding window of sliding windows!

Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications

Page 37: Lecture 3: Audio Applications - Angewandte Numerische …...Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications. Biphonation I 2 noncommensurate frequencies

Music Vs Speech

Show module

B A sliding window of sliding windows!

Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications

Page 38: Lecture 3: Audio Applications - Angewandte Numerische …...Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications. Biphonation I 2 noncommensurate frequencies

Conclusions

I Quasiperiodicity (biphonation) is present in nature

I Due to noise/artifacts, sometimes necessary to searcharound

I Summary features often better than raw dataI After proper preprocessing, TDA on sliding window

embeddings can pick up on rhythmic periodicities in music

Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications

Page 39: Lecture 3: Audio Applications - Angewandte Numerische …...Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications. Biphonation I 2 noncommensurate frequencies

Conclusions

I Quasiperiodicity (biphonation) is present in natureI Due to noise/artifacts, sometimes necessary to search

around

I Summary features often better than raw dataI After proper preprocessing, TDA on sliding window

embeddings can pick up on rhythmic periodicities in music

Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications

Page 40: Lecture 3: Audio Applications - Angewandte Numerische …...Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications. Biphonation I 2 noncommensurate frequencies

Conclusions

I Quasiperiodicity (biphonation) is present in natureI Due to noise/artifacts, sometimes necessary to search

aroundI Summary features often better than raw data

I After proper preprocessing, TDA on sliding windowembeddings can pick up on rhythmic periodicities in music

Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications

Page 41: Lecture 3: Audio Applications - Angewandte Numerische …...Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications. Biphonation I 2 noncommensurate frequencies

Conclusions

I Quasiperiodicity (biphonation) is present in natureI Due to noise/artifacts, sometimes necessary to search

aroundI Summary features often better than raw dataI After proper preprocessing, TDA on sliding window

embeddings can pick up on rhythmic periodicities in music

Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications