67
Lecture 1: Introduction & DSP Michael Mandel [email protected] CUNY Graduate Center, Computer Science Program http://mr-pc.org/t/csc83060 With much content from Dan Ellis’ EE 6820 course Michael Mandel (83060 SAU) Intro & DSP 1 / 67

Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

  • Upload
    trannga

  • View
    222

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Lecture 1:Introduction & DSP

Michael Mandel [email protected]

CUNY Graduate Center, Computer Science Programhttp://mr-pc.org/t/csc83060

With much content from Dan Ellis’ EE 6820 course

Michael Mandel (83060 SAU) Intro & DSP 1 / 67

Page 2: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Outline

1 Sound and information

2 Course Structure

3 Time domain signal processing

4 Frequency-domain signal processing

5 Time-frequency signal processing

Michael Mandel (83060 SAU) Intro & DSP 2 / 67

Page 3: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Acknowledgments: This lecture uses content from

Dan Ellis’ EE 4810 course (Lectures 1,2,3)https://www.ee.columbia.edu/~dpwe/e4810

Dan Ellis’ EE 6820 course (Lecture 1)https://www.ee.columbia.edu/~dpwe/e6820/

Michael Mandel (83060 SAU) Intro & DSP 3 / 67

Page 4: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Outline

1 Sound and information

2 Course Structure

3 Time domain signal processing

4 Frequency-domain signal processing

5 Time-frequency signal processing

Michael Mandel (83060 SAU) Intro & DSP 4 / 67

Page 5: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Sound and information

Sound is air pressure variation

Mechanical vibration

Pressure waves in air

Motion of sensor

Time-varying voltage

+ + + +

t

v(t)

Transducers convert air pressure ↔ voltage

Michael Mandel (83060 SAU) Intro & DSP 5 / 67

Page 6: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

What use is sound?

Footsteps examples:

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5-0.5

0

0.5

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5-0.5

0

0.5

time / s

Hearing confers an evolutionary advantage

useful information, complements vision

. . . at a distance, in the dark, around corners

listeners are highly adapted to ‘natural sounds’ (incl. speech)

Michael Mandel (83060 SAU) Intro & DSP 6 / 67

Page 7: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

The scope of audio processing

Michael Mandel (83060 SAU) Intro & DSP 7 / 67

Page 8: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

The acoustic communication chain

message signal channel receiver decoder

!

synthesis audio processing recognition

Sound is an information bearer

Received sound reflects source(s)plus effect of environment (channel)

Michael Mandel (83060 SAU) Intro & DSP 8 / 67

Page 9: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Levels of abstraction

Much processing concerns shifting between levels of abstraction

sound p(t)

representation (e.g. t-f energy)

‘information’

abstract

concrete

An

alys

is

Syn

thesis

Different representations serve different tasks

separating aspects, making things explicit, . . .

Michael Mandel (83060 SAU) Intro & DSP 9 / 67

Page 10: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Outline

1 Sound and information

2 Course Structure

3 Time domain signal processing

4 Frequency-domain signal processing

5 Time-frequency signal processing

Michael Mandel (83060 SAU) Intro & DSP 10 / 67

Page 11: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Course structureGoals

I survey topics in sound analysis & processingI develop an intuition for sound signalsI learn some specific technologies

Course structureI weekly reading (20% + 10%)I midterm project proposal (20%)I final project (30% + 10%)

Required textbook for first part of course. Available athttp://dicklyon.com/hmh/Lyon_Hearing_book_

01jan2018.pdf

Richard Lyon (2018), Human and MachineHearing: Extracting meaning from sound.Cambridge University Press.

Michael Mandel (83060 SAU) Intro & DSP 11 / 67

Page 12: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Web-based

Course website:

http://mr-pc.org/t/csc83060

for lecture notes, articles, examples, . . .

+ Blackboard for homework, etc.

Michael Mandel (83060 SAU) Intro & DSP 12 / 67

Page 13: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Assignments: every week

Everyone reads book chapters and/or review papersI reviewing topic of next classI discuss in class

One student presents a contribution paperI journal & conference papers presenting a novel contributionI everyone else reads it as wellI everyone writes a two-paragraph response on blackboard:¶1: summary and ¶2: future directions

Write one-paragraph project statusI ¶3: weekly status updates

Michael Mandel (83060 SAU) Intro & DSP 13 / 67

Page 14: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Final project

Most significant part of course (60%) of grade

Oral proposals mid-semester;Presentations in final class+ paper on week later

ScopeI practical (Python or Matlab recommended)I identify a problem; try some solutionsI evaluation

TopicI few restrictions within world of audioI investigate other resourcesI develop in discussion with me

No copying

Michael Mandel (83060 SAU) Intro & DSP 14 / 67

Page 15: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Examples of past projects

Use of “bubble noise” to compare native Korean listeners toKorean learners – became masters thesis

Use of siamese networks as learned distance function betweenclean and noisy speech – became INTERSPEECH paper

Implementation of classic noise tracking algorithms andapplication to beamforming

Implementation of LSTM-based single-channel speechenhancement

Michael Mandel (83060 SAU) Intro & DSP 15 / 67

Page 16: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Some project ideas

Podcast transcriber and diarizer

Podcast episode recommender

Analysis of long-term environmental recordings

Heatmapping tools applied to automatic speech recognizers

Use of neural speech synthesis in speech enhancement

Michael Mandel (83060 SAU) Intro & DSP 16 / 67

Page 17: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Example responses: ¶1: Contribution article summary

The paper for this week is Mohri, Mehryar, Fernando Pereira, and Michael Riley. “Speech recognition with weightedfinite-state transducers.” Springer Handbook of Speech Processing. Springer Berlin Heidelberg, 2008. 559-584.The paper describes how to use weighted finite-state transducers(WFST) for automatic speech recognitions. Thetransducers can be efficiently used to represent major components of speech recognition like Hidden MarkovModel(HMM), context-dependency model, pronunciation lexicon and statistical grammars. At first authors presentgeneral description of transducers , algorithms to determinize and minimize them. Then they describe how thesemethods are very good match for combining and optimizing different parts of the speech recognizer. A WFST canbe described informally as a machine with fixed set of states, it has a start state, some final states and intermediatestates. Given a input symbol they can transitions from one state to another with producing output symbol. WFSTis weighted in the sense that transitions from state to state can have different weights. Generally these weightsrepresents the probability or likelihood of different paths. It transduces a string that can be read along the pathfrom start state to final state to output string with particular weight. In another word they represent a binaryrelation between input and output strings. For example pronunciation lexicons can be represented by transducers asmapping from pronunciation to word. Pronunciation lexicon (L) for all of the words can be generated by composingall word pronunciation transducers together thorough a super-initial state. Applying a Kleene closure on each of theword transducer allows for words to repeat in any order. If we add the mapping context dependent triphone tocontext independent phone with this, we get a model(C) that maps from context dependent phones to wordstrings. For efficiency we need to determinize and minimize the transducer. Converting non-deterministictransducer to a deterministic counter part is known as a detereminizer, there are algorithms like semiring applicablefor this operation. Deterministic transducers are efficient to search given a particular input-output pair, thus wewant to determinize the transducers for efficiency. Now the problem is all of the models may not be determinizableto start with, for example pronunciation has homophones , different words that sounds the same. Thus they are notdeterminizable,we need to add auxiliary phone symbols to make it determinizable. A unweighted transducer can beminimized with classical algorithms, but for weighted version there is an extra step pushing /reweighting. Finally ifwe combine HMM model (H) with this we get a transducer that maps from distribution to word strings.

My response: Good summary

Michael Mandel (83060 SAU) Intro & DSP 17 / 67

Page 18: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Example responses: ¶2: Contribution article response

The paper for this week talks about transducers and their application to ASR. They first introduce acceptors andthen define transducers. They also mention different algorithms for composition, determinization and minimizationof transducers. According to me it is a very well written paper, the examples are specially well described and easyto reproduce. With the algorithms I found the mathematical descriptions very informative and useful as well. Iwanted to know more about other available algorithms and whether they can be used for speech recognizer. Forcomposition they described state-wise pairing with semiring method which increases the state space of thecomposite automation. Though that is expected as we are combining two transducers. I wanted to know a little bitmore about other available algorithms and why we can or can’t use them. I felt similarly about weight pushingalgorithm, whether there are other algorithms available for reweighing the transducer. Or if there is any algorithmfor directly minimizing weighted transducer pushing and then minimizing.

My response: Good questions, I don’t know all of the answers to them. A semiring isn’t an algorithm or a method,it is a mathematical object that has certain characteristics. Informally, it’s the way to keep track of the weights andhow to combine them. The semiring to represent probabilities is different from, but equivalent to, the semiring torepresent log probabilities. Basically you need a plus operation and a times operation. For probabilities, plus is +and times is *. For log probabilities, plus is log(exp(x1)+exp(x2)) and times is +. So applying the plus operationto two probabilities is equivalent to applying the plus operation to two log probabilities. And similarly for times.Anyway, I think there are several algorithms for weight pushing, minimization, etc.

Michael Mandel (83060 SAU) Intro & DSP 18 / 67

Page 19: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Example responses: ¶3: Project status

For the project I am working to train the deep neural network, which is the main part of the project.

This week I have written a code for simple WARP loss function.

I have also created the training data from CHiME2-GRID. The training data contains one clean chunk, noisy chunkand another negative example.

I am also working towards setting the DNN properly to train with training data.

My response: Ok, have you checked the code into github on your own branch yet? If so, send me a link to it. Ifnot, please do.

Michael Mandel (83060 SAU) Intro & DSP 19 / 67

Page 20: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Outline

1 Sound and information

2 Course Structure

3 Time domain signal processingTime domain signalsTime-domain systemsConvolution

4 Frequency-domain signal processing

5 Time-frequency signal processing

Michael Mandel (83060 SAU) Intro & DSP 20 / 67

Page 21: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Signals

Signals: Information-bearing function

E.g. sound: air pressure variation at a point as a function of time

DimensionalityI Sound: 1-dimensionalI Greyscale image: 2-dimensional, i(x , y)I Video: 3 x 3-D, r(x , y , t), g(x , y , t), b(x , y , t)

Michael Mandel (83060 SAU) Intro & DSP 21 / 67

Page 22: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Digital signal processing

DSP = signal processing on a computer

Two effects: discrete-time, discrete level

time

xd[n] = Q( xc(nT ) )

Discrete-time sampling limits bandwidth

Discrete-level quantization

limits dynamic range

T

ε

Michael Mandel (83060 SAU) Intro & DSP 22 / 67

Page 23: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

DSP vs analog signal processing

Analog signal processing

Processorp(t) q(t)

Digital signal processing system

Processorp(t) q(t)A/D D/Ap[n] q[n]

Michael Mandel (83060 SAU) Intro & DSP 23 / 67

Page 24: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Digital vs analog

Digital prosI Noise performance – quantized signalI Use general-purpose computer – flexible, upgradeableI Stability/replicability

Digital consI Limitations of analog-digital convertersI Baseline complexity/power consumptionI Latency

Michael Mandel (83060 SAU) Intro & DSP 24 / 67

Page 25: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Digital limitation: sampling sinusoids

Samples of a sinusoid could equally correspond to others

0 1 2 3 4 5 6 7 8 9 10-1

-0.5

0

0.5

1

x1[n] = sin(ω0n)

x2[n] = sin((ω0 + 2πr)n) = sin(ω0n) = x1[n]

Michael Mandel (83060 SAU) Intro & DSP 25 / 67

Page 26: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Aliasing

For example, for cos(ωn), ω = 2πr ± ω0

all (integer) r values appear to the same after sampling

We say that a larger ω appears aliased to a lower frequency

Therefore we generally consider frequencies 0 ≤ ω0 ≤ πi.e., less than 1

2 cycle per sample

Michael Mandel (83060 SAU) Intro & DSP 26 / 67

Page 27: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Sampling and aliasingDiscrete-time signals equal the continuous time signal at discretesampling instants:

xd [n] = xc(nT )

Sampling cannot represent rapid fluctuations

0 1 2 3 4 5 6 7 8 9 10 1

0.5

0

0.5

1

sin

((ΩM +

T

)Tn

)= sin(ΩMTn) ∀n ∈ Z

Nyquist limit (ΩT/2) from periodic spectrum:

ΩΩM−ΩT ΩT −ΩM ΩT - ΩM−ΩT + ΩM

Gp(jΩ)Ga(jΩ) “alias” of “baseband”

signal

Michael Mandel (83060 SAU) Intro & DSP 27 / 67

Page 28: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

The speech signal: time domain

Speech is a sequence of different sound types

-0.2

-0.1

0

0.1

0.2

1.38 1.4 1.42.1

0

.1

1.52 1.54 1.56 1.58-0.1

0

0.1

1.86 1.88 1.921.9-0.05

0

0.05

2.42 2.44 2.46 2.4

-0.02

0

0.02

1.4 1.6 1.8 2 2.2 2.4 2.6 time/s

watch thin as a dimeahas

Vowel: periodic “has”

Fricative: aperiodic “watch”

Glide: smooth transition “watch”

Stop burst: transient “dime”

Michael Mandel (83060 SAU) Intro & DSP 28 / 67

Page 29: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Discrete-time systems

A system converts input to output

x[n] DT System y[n]

y [n] = f (x [n])∀n

For example, a 3-point moving average (M = 3)

y [n] =1

M

M−1∑k=0

x [n − k]

Michael Mandel (83060 SAU) Intro & DSP 29 / 67

Page 30: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

3-point moving average example

y [n] =1

3

2∑k=0

x [n − k]

n

x[n]

-1 1 2 3 4 5 6 7 8 9

A

n

x[n-1]

-1 1 2 3 4 5 6 7 8 9

n

x[n-2]

-1 1 2 3 4 5 6 7 8 9

n

y[n]

-1 1 2 3 4 5 6 7 8 9

Michael Mandel (83060 SAU) Intro & DSP 30 / 67

Page 31: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Moving average smoother

Moving average smooths out rapid variationse.g., “12 month moving average”

Example, 5-point moving average

x [n] = s[n]︸︷︷︸signal

+ d [n]︸︷︷︸noise

y [n] =1

5

4∑k=0

x [n − k]

Michael Mandel (83060 SAU) Intro & DSP 31 / 67

Page 32: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Example 2: accumulator

Accumulator: outputaccumulates all past inputs

y [n] =n∑

`=−∞x [`]

= x [n] +n−1∑`=−∞

x [`]

= x [n] + y [n − 1]

n

x[n]

-1 1 2 3 4 5 6 7 8 9

A

n

y[n-1]

-1 1 2 3 4 5 6 7 8 9

n

y[n]

-1 1 2 3 4 5 6 7 8 9

Michael Mandel (83060 SAU) Intro & DSP 32 / 67

Page 33: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

System property: Linearity

Linear systems obey superposition

x[n] DT System y[n]

If input x1[n]→ output y1[n], x2 → y2

Given a linear combination of inputs, a linear systemproduces the same linear combination of outputs

x [n] = αx1[n] + βx2[n]

→ y [n] = αy1[n] + βy2[n]

for all α, β, x1, x2

Michael Mandel (83060 SAU) Intro & DSP 33 / 67

Page 34: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

System property: Time-invariance

Time-invariant systems don’t depend on an absolute time

Given a time-shifted input, a time-invariant systemproduces a version of the output shifted by the same amount

if x [n]→ y [n], then x [n − n0]→ y [n − n0]

for all n0, x

Michael Mandel (83060 SAU) Intro & DSP 34 / 67

Page 35: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Impulse response

An impulse is a unit sample sequence

δ[n] =

1, n = 00, n 6= 0

Given a system, if x [n] = δ[n] then y [n] , h[n],and h[n] is the impulse response of the system

x[n] DT System y[n]

A linear, time-invariant system is completely specified by h[n]

Michael Mandel (83060 SAU) Intro & DSP 35 / 67

Page 36: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Impulse response example: Moving average

Consider the 3-point moving average filter again

If we put in a standard impulse x [n] = δ[n]we get out the impulse response y [n] = h[n]

n

x[n]

-1 1 2 3 4 5 6 7

1

-2-3

n

y[n]

-1 1 2 3 4 5 6 7-2-3

3/1

Michael Mandel (83060 SAU) Intro & DSP 36 / 67

Page 37: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Convolution

Impulse response means for an LTI system δ[n]→ h[n]

Shift invariance means δ[n − n0]→ h[n − n0]

Linearity means αδ[n− k] + βδ[n− l ]→ αh[n− k] + βh[n− l ]

We can express any sequence as a sum of weighted δs

x [n] = x [0]δ[n] + x [1]δ[n − 1] + x [2]δ[n − 2] + . . .

Therefore, we can compute the output of any system

Michael Mandel (83060 SAU) Intro & DSP 37 / 67

Page 38: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Convolution sum

Another way to write x [n] as a sum of deltas is

x [n] =∞∑

k=−∞x [k]δ[n − k]

Therefore, for an LTI system, we get the convolution sum

y [n] =∞∑

k=−∞x [k]h[n − k] , x [n] ∗ h[n]

Note that this is symmetric in x and h by using ` = n − k

x [n] ∗ h[n] =∞∑

`=−∞x [n − `]h[`] = h[n] ∗ x [n]

Michael Mandel (83060 SAU) Intro & DSP 38 / 67

Page 39: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Interpreting convolution

Passing a signal through an LTI system is equivalent toconvolving it with the system’s impulse response

x[n] h[n] y[n]

Let’s convolve the following two signals in two different ways

x[n]=0 3 1 2 -1

n-1 1 2 3 5 6 7-2-3

→h[n] = 3 2 1

n-1 1 2 3 4 5 6 7-2-3

y [n] =∞∑

k=−∞x [k]h[n − k] =

∞∑`=−∞

x [n − `]h[`]

Michael Mandel (83060 SAU) Intro & DSP 39 / 67

Page 40: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Convolution interpretation 1

y [n] =∞∑

k=−∞x [k]h[n − k]

Time-reverse h, shift by n, take inner product against fixed x

k-1 1 2 3 5 6 7-2-3

x[k]

k-1 1 2 3 4 5 6 7-2-3

h[0-k]

k-1 1 2 3 4 5 6 7-2-3

h[1-k]

k-1 1 2 3 4 5 6 7-2-3

h[2-k]

n-1 1 2 3 5 7-2-3

y[n]0

9 911

2-1

Michael Mandel (83060 SAU) Intro & DSP 40 / 67

Page 41: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Convolution interpretation 2

y [n] =∞∑

`=−∞x [n − `]h[`]

Shifted xs weightedby points in h

Or, weighted delayedversions of h

n-1 1 2 3 5 7-2-3

y[n]0

9 911

2-1

n-1 1 2 3 5 6 7-2-3

x[n]

n-1 1 2 3 4 6 7-2-3

x[n-1]

n-1 1 2 3 4 5 7-2-3

x[n-2]

k-1

1234567

-2-3

h[k]

Michael Mandel (83060 SAU) Intro & DSP 41 / 67

Page 42: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Time-domain summary

Computers analyze signals that are discrete in time and in level

Sampling leads to aliasing, i.e., limited frequency bandwidth

Linear time-invariant systems transform signals by convolvingthem with an impulse response

Michael Mandel (83060 SAU) Intro & DSP 42 / 67

Page 43: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Outline

1 Sound and information

2 Course Structure

3 Time domain signal processing

4 Frequency-domain signal processing

5 Time-frequency signal processing

Michael Mandel (83060 SAU) Intro & DSP 43 / 67

Page 44: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

The Fourier series

The basic observation (in continuous time) is thatA periodic signal can be decomposed intoa sum of weighted sinusoids at integer multiplesof the fundamental frequency

i.e., if x(t) = x(t + T ) for some fundamental frequency T , then

x(t) ≈M∑k=0

ak cos

(2πk

Tt + φk

)

=M∑k=0

a′k cos

(2πk

Tt

)+ bk sin

(2πk

Tt

)

=M∑k=0

ck︸︷︷︸complex

exp

(2πkj

Tt

)where j =

√−1

Michael Mandel (83060 SAU) Intro & DSP 44 / 67

Page 45: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Fourier series example: square wave

Approximated by the following Fourier series coefficients

φk = 0 ak =

(−1)

k−12

1k k = 1, 3, 5, . . .

0 otherwise

i.e. x(t) ≈M∑k=0

ak cos

(2πk

Tt + φk

)= cos

(2π

Tt

)− 1

3cos

(2π

T3t

)+

1

5cos

(2π

T5t

)− . . .

1.5 1 0.5 0 0.5 1 1.51

0.5

0

0.5

1

Michael Mandel (83060 SAU) Intro & DSP 45 / 67

Page 46: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Fourier analysis

Given a signal, how can we find Fourier coefficients?

Let’s use the a′k and bk representation, so

x(t) =M∑k=0

a′k cos

(2πk

Tt

)+ bk sin

(2πk

Tt

)Take the inner product with complex sinusoids

a′k =1

T

∫ T/2

−T/2x(t) cos

(2πk

Tt

)dt

bk =1

T

∫ T/2

−T/2x(t) sin

(2πk

Tt

)dt

Works because these sinusoids are orthogonal to each otheri.e., each sinusoid picks itself out of the signal

Michael Mandel (83060 SAU) Intro & DSP 46 / 67

Page 47: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Complex numbers

Complex numbers are for us a mathematical convenience thatlead to simple expressions for things like the Fourier transform

Add a second “imaginary” dimension (j ,√−1) to all values

Two equivalent representations:I Rectangular form: x = xr + jxi

with magnitude |x | =√x2r + x2

i

and phase θ = tan−1(xi/xr )I Polar form: x = |x |e jθ = |x | cos θ + j |x | sin(θ)

Euler’s formula: e jθ = exp(jθ) = cos θ + j sin θ

Michael Mandel (83060 SAU) Intro & DSP 47 / 67

Page 48: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Operations on complex numbers

x

real

imag

xre xre+yre

xim+yim

yre

x+y

x·y

xim

yim

|x|

|y|

|x|·|y|

φ

θ

θ+φ

When adding, real and imaginaryparts add

(a + jb) + (c + jd) = (a + c) + j(b + d)

When multiplying, magnitudesmultiply and phases add

re jθ · se jφ = rse j(θ+φ)

Michael Mandel (83060 SAU) Intro & DSP 48 / 67

Page 49: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Complex conjugation

Conjugation operation negates the imaginary part

x∗ = xr − jxi

also negates the imaginary phase

x∗ = |x |e j(−θ)

Useful in extracting real quantities from complex expressions

x + x∗ = xr + jxi + xr − jxi = 2xr

x · x∗ = |x |e jθ|x |e−jθ = |x |2

real

imag

|x|

θ

xx+x*= 2xr

x*

Michael Mandel (83060 SAU) Intro & DSP 49 / 67

Page 50: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Fourier analysis using complex exponentials

Inner product with complex sinusoids

a′k =1

T

∫ T/2

−T/2x(t) cos

(2πk

Tt

)dt

bk =1

T

∫ T/2

−T/2x(t) sin

(2πk

Tt

)dt

Can be written more compactly with complex exponentials

ck =1

T

∫ T/2

−T/2x(t) exp

(−j 2πk

Tt

)dt

Then synthesis of original signal

x(t) =M∑k=0

ck exp

(j2πk

Tt

)

Michael Mandel (83060 SAU) Intro & DSP 50 / 67

Page 51: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Fourier transform

The Fourier series (applicable to periodic signals) extendsnaturally to the Fourier Transform for continuous time signals

Fourier transform X (jΩ) =

∫ ∞−∞

x(t) exp(−jΩt)dt

Inverse Fourier transform x(t) =1

∫ ∞−∞

X (jΩ) exp(jΩt)dΩ

The discrete index k changes to a continuous frequency Ω

Michael Mandel (83060 SAU) Intro & DSP 51 / 67

Page 52: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

The Fourier transform is a mappingbetween two continuous functions

2π ambiguity

0 0.002 0.004 0.006 0.008time / sec

level / dB

-0.01

0

0.01

0.02x(t)

0 2000 4000 6000 8000freq / Hz

-80

-60

-40

-20

0|X jΩ( )|

0 2000 4000 6000 8000freq / Hz

0

π/2

-π/2

argX jΩ( )

Michael Mandel (83060 SAU) Intro & DSP 52 / 67

Page 53: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Fourier transforms

Different Fourier analyses possible depending on signals

continuous vs discrete, infinite vs finite (and periodic)

Time Frequency

Fourier series Cont. periodic x(t) Discrete infinite ckFourier transform (FT) Cont. infinite x(t) Cont. infinite X (Ω)

Discrete-time FT Discrete infinite x [n] Cont. periodic X (e jω)Discrete FT Discrete periodic x [n] Discrete periodic X [k]

Michael Mandel (83060 SAU) Intro & DSP 53 / 67

Page 54: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Fourier transforms

Fourier Series (periodic continuous x)

Ω0 =2π

T

x(t) =∑k

ckejkΩ0t

ck =1

2πT

∫ T/2

−T/2x(t)e−jkΩ0tdt k1 2 3 5 6 74

|ck|1.0

1.5 1 0.5 0 0.5 1 1.5 1

0.5

0

0.5

t

x(t)

Fourier Transform (aperiodic continuous x)

x(t) =1

∫X (jΩ)e jΩtdΩ

X (jΩ) =

∫x(t)e−jΩtdt

0 0.002 0.004 0.006 0.008time / sec

level / dB

-0.01

0

0.01

0.02

x(t)

0 2000 4000 6000 8000freq / Hz

-80

-60

-40

-20 |X(jΩ)|

Michael Mandel (83060 SAU) Intro & DSP 54 / 67

Page 55: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Discrete-time Fourier

DT Fourier Transform (aperiodic sampled x)

x [n] =1

∫ π

−πX (e jω)e jωndω

X (e jω) =∑

x [n]e−jωn

n-1 1 2 3 4 5 6 7

0 π

|X(ejω)|

ω2π 3π 4π 5π

1

2

3

x [n]

Discrete Fourier Transform (N-point x)

the only one we will use on computers

x [n] =∑k

X [k]e j 2πknN

X [k] =∑n

x [n]e−j2πknN

k

|X(ejω)||X[k]|

k=1...

n1 2 3 4 5 6 7

x [n]

Michael Mandel (83060 SAU) Intro & DSP 55 / 67

Page 56: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Fourier transform and convolution

The discrete-time Fourier transform is defined as

X (e jω) ,∞∑

n=−∞x [n]e−jωn

If x [n] = g [n] ∗ h[n] is the result of a convolution

then in the frequency domain it becomes multiplication

X (e jω) = G (e jω) · H(e jω)

Michael Mandel (83060 SAU) Intro & DSP 56 / 67

Page 57: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Fourier transform and convolution proof

The discrete-time Fourier transform is defined as

X (e jω) ,∞∑

n=−∞x [n]e−jωn

If x [n] = g [n] ∗ h[n] then

X (e jω) =∞∑

n=−∞(g [n] ∗ h[n])e−jωn

=∞∑

n=−∞

( ∞∑k=−∞

g [k]h[n − k]

)e−jωn

=∞∑

k=−∞g [k]e−jωk

∞∑n=−∞

h[n − k]e−jω(n−k)

= G (e jω) · H(e jω)

Michael Mandel (83060 SAU) Intro & DSP 57 / 67

Page 58: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Speech sounds in the Fourier domain

1.52 1.54 1.56 1.58-0.1

0

0.1

2.42 2.44 2.46 2.48

-0.02

0

0.02

0 1000 2000 3000 4000-100

-80

-60

-40

0 1000 2000 3000 4000-100

-80

-60

1.37 1.38 1.39 1.4 1.41 1.42-0.1

0

0.1

0 1000 2000 3000-100

-80

-60

-40

1.86 1.87 1.88 1.89 1.9 1.91-0.05

0

0.05

0 1000 2000 3000 4000-100

-80

-60

Vowel: periodic “has”

Fricative: aperiodic “watch”

Glide: transition “watch”

Stop: transient “dime”

time domain frequency domain

time / s freq / Hz

ener

gy

/ dB

dB = 20 log10(amplitude) = 10 log10(power)Voiced spectrum has pitch + formants

Michael Mandel (83060 SAU) Intro & DSP 58 / 67

Page 59: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Complex exponentials are eigen-functions of LTI systems

Why do we use complex numbers in signal processing?

B/c complex exponentials are eigen-functions of LTI systems

x [n] = α0 exp(jωn)

y [n] =∞∑

`=−∞x [n − `]h[`]

=∞∑

`=−∞α0 exp(jω(n − `))h[`]

= α0 exp(jωn)∞∑

`=−∞exp(−jω`)h[`]

= α1 exp(jωn)

Output same frequency as input, but different (complex) gain

Michael Mandel (83060 SAU) Intro & DSP 59 / 67

Page 60: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Frequency-domain summary

Any signal can be approximated as sum of weighted sinusoids

Weights of the sinusoids computed using Fourier analysis

Convolution in the time domain is equivalent tomultiplication in the frequency domain

Complex exponentials are eigen-functions of LTI systems

Michael Mandel (83060 SAU) Intro & DSP 60 / 67

Page 61: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Outline

1 Sound and information

2 Course Structure

3 Time domain signal processing

4 Frequency-domain signal processing

5 Time-frequency signal processing

Michael Mandel (83060 SAU) Intro & DSP 61 / 67

Page 62: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Short-time Fourier TransformWant to localize energy in time and frequency

break sound into short-time pieces

calculate DFT of each one

2.35 2.4 2.45

0

4000

3000

2000

1000

2.5 2.55 2.6-0.1

0

0.1

time / s

freq

/ H

z

k →

short-timewindow

DFT

m = 0 m = 1 m = 2 m = 3

L 2L 3L

Mathematically,

X [k ,m] =N−1∑n=0

x [n]w [n −mL] exp

(−j 2πk(n −mL)

N

)Michael Mandel (83060 SAU) Intro & DSP 62 / 67

Page 63: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

The Spectrogram

Plot log-amplitude STFT, 20 log10 |X [k ,m]|, as a gray-scale image

time / s

time / s

freq

/ H

zintensity / dB

2.35 2.4 2.45 2.5 2.55 2.60

1000

2000

3000

4000

freq

/ H

z

0

1000

2000

3000

4000

0

0.1

-50

-40

-30

-20

-10

0

10

0 0.5 1 1.5 2 2.5

Michael Mandel (83060 SAU) Intro & DSP 63 / 67

Page 64: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Time-frequency tradeoffLonger window w [n] gains frequency resolutionat cost of time resolution

1.4 1.6 1.8 2 2.2 2.4 2.6

freq

/ H

z

time / s

level/ dB

0

1000

2000

3000

4000

freq

/ H

z

0

1000

2000

3000

4000

0

0.2W

indo

w =

256

pt

ÒNar

row

ban

Win

dow

= 4

8 pt

ÒWid

eban

-50

-40

-30

-20

-10

0

10

Michael Mandel (83060 SAU) Intro & DSP 64 / 67

Page 65: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Speech sounds on the Spectrogram

Most popular speech visualization

freq

/ H

z

0

1000

2000

3000

4000

1.4 1.6 1.8 2 2.2 2.4 2.6 time/s

watch thin as a dimeahas

Vow

el:

perio

dic

“has

Fric

've:

ap

erio

dic

“wat

ch”

Glid

e: tr

ansi

tion

“wat

ch”

Sto

p: tr

ansi

ent

“dim

e”

Wideband (short window) better than narrowband (long window)to see formants

Michael Mandel (83060 SAU) Intro & DSP 65 / 67

Page 66: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

A spectrogram is equivalent to a bank of filters1

The STFT can be interpreted as a bank of filters in two ways

X [k,m] =N−1∑n=0

x [n] w [n −mL] exp

(−j 2πk(n −mL)

N

)︸ ︷︷ ︸

Bandpass filter

X [k,m] =N−1∑n=0

w [n −mL]︸ ︷︷ ︸low-pass filter

x [n] exp

(−j 2πk(n −mL)

N

)︸ ︷︷ ︸

Modulated signal

k (really 2πk/N) is the center frequency of each filter

Only look at the output of the filter every L samples

1See https://www.dsprelated.com/freebooks/sasp/Filter_Bank_

Summation_FBS.htmlMichael Mandel (83060 SAU) Intro & DSP 66 / 67

Page 67: Lecture 1: Introduction & DSP - m.mr-pc.orgm.mr-pc.org/t/csc83060/2019sp/lecture01.pdf · Aliasing For example, for cos(!n);!= 2ˇr ! 0 all (integer) r values appear to the same after

Summary

Information in soundI lots of it, multiple levels of abstraction

Course overviewI survey of audio processing topicsI readings, presentations, project

DSP reviewI digital signals, time domainI Fourier domain, STFT

Michael Mandel (83060 SAU) Intro & DSP 67 / 67