8/15/2019 Spectral Modeling and Signal Processing Intro421
MUS421/EE367B Lecture 1
Introduction and Overview
Julius O. Smith III ([email protected])
Center for Computer Research in Music and Acoustics (CCRMA)
Department of Music, Stanford University
Stanford, California 94305
April 10, 2007
Course Overview
The focus of this course is spectral modeling and signal processing using the short-time Fourier
transform (STFT). Applications include musical sound synthesis and
audio signal processing.
Administrative Information
Units
You may sign up for either 3 or 4 units:
3 units = lectures + assignments + final
4 units adds a final project based on outside reading and/or a software project
Important Pointers
The course schedule and outline [1] (reachable from the class home page [2]) lists the following information:
Assignments
Weekly class schedule
Pointers to all lecture overheads
The 421 home page further contains pointers to
Programming examples
Sound examples
Related items of interest online
The MUS421/EE367B Overview [3] contains this administrative info and more.
[1] http://ccrma.stanford.edu/~jos/intro421/Schedule_Assignments.html
[2] http://ccrma.stanford.edu/CCRMA/Courses/421/
[3] http://ccrma.stanford.edu/~jos/intro421/
Why The Fourier Transform
Natural for visualizing audio signals: the ear performs a kind of Fourier analysis
Spectral models can be very compact and flexible:
MPEG audio coding
Sinusoidal modeling (additive synthesis)
AES talk [4] on history of spectral modeling at CCRMA and elsewhere.
Any Linear Time Invariant (LTI) system can be implemented in the frequency domain by means of the Fourier Transform
Efficient FFT implementations exist which make it possible to implement very large LTI systems in real time, e.g., room impulse-response convolutions of length 10,000 to 100,000
[4] http://ccrma.stanford.edu/~jos/pdf/AES-Heyser.pdf
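The claim that an LTI system can be implemented in the frequency domain can be checked on a tiny example. The following is a minimal sketch, not course code: it uses a naive O(N^2) DFT from the Python standard library in place of a real FFT, and the signal `x` and impulse response `h` are made-up values. It filters by direct time-domain convolution and by multiplying zero-padded DFTs, then compares the two results.

```python
import cmath

def dft(x):
    """Naive O(N^2) DFT; a practical system would use an FFT instead."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """Naive inverse DFT."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

def convolve_direct(x, h):
    """Time-domain (acyclic) convolution of x with impulse response h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def convolve_via_dft(x, h):
    """Same filter, applied as a product of spectra."""
    N = len(x) + len(h) - 1            # zero-pad so cyclic == acyclic convolution
    X = dft(list(x) + [0.0] * (N - len(x)))
    H = dft(list(h) + [0.0] * (N - len(h)))
    y = idft([a * b for a, b in zip(X, H)])
    return [v.real for v in y]         # inputs are real, so drop round-off imag part

x = [1.0, 2.0, 3.0, 4.0]               # made-up input signal
h = [0.5, 0.25, 0.125]                 # made-up FIR impulse response
y_time = convolve_direct(x, h)
y_freq = convolve_via_dft(x, h)
```

The zero-padding to length `len(x) + len(h) - 1` is what keeps the cyclic convolution implied by the DFT from wrapping around.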
Applications of the Short-Time Fourier Transform (STFT)
Frequency-domain display of audio signals
Fast convolution
Time-varying linear filtering
Time-varying nonlinear filtering
Fourier analysis, modification, and resynthesis
Musical sound synthesis via spectral modeling:
Additive synthesis using sinusoids
Sines + Noise modeling
Sines + Noise + Transients modeling
Speech synthesis
Vocoders
Time scaling
Pitch shifting
Pitch detection
Noise reduction
Audio compression
Applications, Cont'd
Transform coders for audio compression, such as
MPEG AAC (10X compression common)
MP3 (MPEG-II, Layer III; about half the quality of AAC)
Dolby AC-2 and AC-3 (6X, fixed)
Philips DCC, Sony Minidisc (4 or 5X)
Music 422 (EE 367C) is an entire CCRMA course devoted to this topic. It is offered only in Winter quarters.
Perfect Reconstruction Filter banks
Computational Auditory Modeling
Filter Design and Implementation
System Identification
Signal Separation (Auditory Scene Analysis)
Non-Parametric Spectral Estimation
Emerging Applications in Audio Coding
MPEG-4 supports an object-oriented approach to compression algorithms:
First transmit objects = synthesizer patches = decoder
Next transmit messages = performance data (like MIDI) = encoded bit stream
Main challenge: to develop classifiers and coders for general purpose audio
Final Project (4th Unit)
Individual or group research project
Must be related to lecture/lab topics (FFT+Audio)
One-page project proposal due by 4th class meeting
Final written report due by the end of finals week
Oral presentation during the last class invited!
Project Types
Programming project and report
Reading and report
Suggested Outside Reading: See http://ccrma.stanford.edu/~jos/refs421/
(also available in Appendix P of the text [5])
Example Project Topics
Windows
New FFT window types
Explore window types not covered in class
Spectrum Analysis
Short-time spectrum analysis of recorded data
Study of statistical spectrum estimation
Matching spectrum analysis parameters to human hearing
Alternative time-frequency representations (Wavelets, Wigner, ...)
Sinusoidal Modeling
Readings in additive synthesis
Implement your own sines+noise analysis/synthesis system
Noise reduction based on sinusoidal modeling
Source separation based on sinusoidal modeling
[5] http://ccrma.stanford.edu/~jos/sasp/
Transient detection (for sines + noise + transients modeling)
Short-Time Fourier Transform (STFT) based Analysis, Modification, and Resynthesis
Software system development
Noise reduction
Pitch detection/tracking
Time compression/expansion
Transform coding
Pitch-synchronous phase vocoder (adapt window to pitch period)
Signal reconstruction from the STFT magnitude only (phase discarded)
Spectral interpolation schemes to compensate for lost frames
Software Development
Modify course Matlab examples to be compatible with Octave (see http://www.octave.org/)
Develop missing components of the Signal Processing Toolbox for Octave (looking only at Matlab help info)
PD FFT-processing patches (see http://www.pure-data.org/)
LADSPA spectral-processing plug-ins (see http://www.ladspa.org/)
Main Pointer
The course schedule and outline [6] (reachable from the class home page [7]) lists the following information:
Assignments
Weekly class schedule
Pointers to all lecture overheads
[6] http://ccrma.stanford.edu/~jos/intro421/Schedule_Assignments.html
[7] http://ccrma.stanford.edu/CCRMA/Courses/421/
Notation
Before we begin, we will review the notation we will use in this class. (We will try to stay consistent.)
Frequency and Time:
$\omega$ denotes continuous radian frequency (rad/sec)
$f$ denotes continuous frequency in Hertz (Hz)
$\omega = 2\pi f$
$\omega_k$ denotes discrete frequency, $\omega_k = 2\pi (k/N) f_s$
$\omega, \omega_k \in \mathbb{R}$ (frequencies are always real)
$T$ = sampling interval (sec)
$f_s$ = sampling rate, $f_s = 1/T$
$t_n = nT$ (discrete time)
$n, k \in \mathbb{Z}$ (integers)
$t, t_n \in \mathbb{R}$ (times are always real)
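As a quick numerical illustration of this notation, the sketch below evaluates the discrete frequency and discrete time formulas. The sampling rate of 44100 Hz and the DFT length N = 1024 are example values assumed here, not values taken from the lecture.

```python
import math

fs = 44100.0          # sampling rate in Hz (assumed example value)
T = 1.0 / fs          # sampling interval in seconds
N = 1024              # DFT length (assumed example value)

def omega_k(k):
    """Discrete radian frequency omega_k = 2*pi*(k/N)*fs, in rad/sec."""
    return 2.0 * math.pi * (k / N) * fs

def t_n(n):
    """Discrete time t_n = n*T, in seconds."""
    return n * T

# Convert bin 10 back to Hz using omega = 2*pi*f, which gives 10*fs/N.
f_bin_10 = omega_k(10) / (2.0 * math.pi)
one_second = t_n(44100)
```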
Introduction to Audio Spectrum Analysis
Spectrum analysis of real-world signals typically occurs in short segments. We are therefore most interested in short-time spectrum analysis:
Spectral content typically varies over time.
The human ear uses less than one second of past sound to form a spectrum.
There is a limit to the length of signal we can analyze at once.
To extract and analyze a sound segment, it is necessary to apply a window function. An unmodified segment extraction corresponds to a rectangular window. Everything we look at will be through a window, hence it is important to realize what the window is doing to our underlying signal.
Applications we'll discuss first:
Spectral Analysis for Display
FIR Filter Design
Example of Windowing
Let's look at a simple example of windowing to demonstrate what happens when we turn an infinite-duration signal into a finite-duration signal through windowing.
Complex Sinusoid:
$$x(n) = e^{j\omega nT}, \quad 0 \le \omega T < \pi$$
Notes:
real part = $\cos(\omega nT)$
The frequencies present in our signal are only positive. A fancy name for this is an analytic signal.
This signal is infinite duration. (It doesn't die out as $n$ increases.) In order to end up with a signal which dies out eventually (so we can use the DFT), we need to multiply our signal by a window (which does die out).
The following is a diagram of a typical window function:
[Figure: Zero-Phase Window. Amplitude versus Time (samples).]
This is loosely called a zero-centered (or zero phase, or even) window function, which means its phase in the frequency domain is either zero or $\pi$, as we will see in detail later. (Recall that a real and even function has a real and even Fourier transform.) The window is also nonnegative, as is typical.
We might also require that our window be zero for negative time. Such a window is said to be causal. Causal windows are necessary for real-time processing:
[Figure: Linear Phase Window (Causal). Amplitude versus Time (samples).]
By shifting the original window in time by half its length, we have turned the original non-causal window into a causal window. The Shift property of the Fourier Transform tells us that we have introduced a linear phase term.
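The effect of the shift can be verified numerically. In the sketch below, the raised-cosine window, the length M = 21, and the test frequency are illustrative assumptions. It evaluates the DTFT of an even window once with zero-centered time indices and once with causal indices, and compares the two through the linear phase factor $e^{-j\omega(M-1)/2}$.

```python
import cmath
import math

M = 21                               # odd window length (assumed example value)
half = (M - 1) // 2

# An even, nonnegative window sampled at n = -half..half. The raised cosine
# is only an illustrative stand-in for "a typical window".
window = [0.5 + 0.5 * math.cos(2.0 * math.pi * n / M)
          for n in range(-half, half + 1)]

def dtft(samples, first_index, omega):
    """DTFT at omega (rad/sample) of samples starting at time first_index."""
    return sum(s * cmath.exp(-1j * omega * (first_index + i))
               for i, s in enumerate(samples))

omega = 0.3                                        # test frequency (rad/sample)
W_zero_phase = dtft(window, -half, omega)          # zero-centered window
W_causal = dtft(window, 0, omega)                  # same samples, shifted by half
linear_phase_term = cmath.exp(-1j * omega * half)  # from the shift property
```

Up to round-off, `W_zero_phase` is purely real (the window is real and even), and `W_causal` equals `linear_phase_term * W_zero_phase`.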
Putting all this together, we get the following:
Our original signal (unwindowed, infinite duration) is
$$x(n) = e^{j\omega_0 nT}, \quad n \in \mathbb{Z}$$
A portion of the real part, $\cos(\omega_0 nT)$, is plotted below:
[Figure: real part of the unwindowed sinusoid. Amplitude versus Time (samples).]
The imaginary part, $\sin(\omega_0 nT)$, is of course identical but for a 90-degree phase-shift to the right.
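The Fourier Transform of this infinite duration signal is a delta function at $\omega_0$: $X(\omega) = \delta(\omega - \omega_0)$
[Figure: the impulse $\delta(\omega - \omega_0)$, located at $\omega = \omega_0$ on the frequency axis.]
The windowed version is: $x_w(n) = w(n)\,e^{j\omega_0 nT}, \quad n \in \mathbb{Z}$
(Note carefully the difference between $w$ and $\omega$.)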
[Figure: plot of the windowed signal. Amplitude versus Time (samples).]
The Convolution Theorem tells us that our multiplication in the time domain results in a convolution in the frequency domain. Hence, in our case, we will obtain the convolution of a delta function at frequency $\omega_0$, and the transform of the window.
The result of convolution with a delta function is the original function, shifted to the location of the delta function. (The delta function is the identity element for convolution.)
[Figure: spectrum of the windowed sinusoid. Amplitude (dB) versus $\omega T$, showing the main lobe centered at $\omega_0$ and the side lobes.]
Summary
Windowing in the time domain resulted in a smearing or smoothing in the frequency domain. We need to be aware of this if we are trying to resolve sinusoids which are close together in frequency.
Windowing also introduced side lobes. This is important when we are trying to resolve low amplitude sinusoids in the presence of higher amplitude signals. When we look at specific windows, we will be looking at this behavior.
A sinusoid at amplitude $A$, frequency $\omega_0$, and phase $\phi$ becomes a window transform shifted out to frequency $\omega_0$, and scaled by $Ae^{j\phi}$.
There are many types of windows which serve various purposes and exhibit various properties.
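The shift-and-scale statement in this summary can be checked directly. The sketch below assumes T = 1 (so frequencies are in rad/sample), a rectangular window of length 21, and made-up sinusoid parameters. It computes the DTFT of the windowed complex sinusoid and compares it with the window transform shifted to $\omega_0$ and scaled by $Ae^{j\phi}$.

```python
import cmath

M = 21                                  # odd window length (assumed example value)
half = (M - 1) // 2
n_values = range(-half, half + 1)       # zero-centered time indices

def w(n):
    """Rectangular window, scaled by 1/M for unity DC gain."""
    return 1.0 / M

def window_transform(omega):
    """DTFT of the window alone at omega (rad/sample)."""
    return sum(w(n) * cmath.exp(-1j * omega * n) for n in n_values)

def windowed_sinusoid_transform(omega, A, omega0, phi):
    """DTFT of w(n) * A * exp(j*(omega0*n + phi)) at omega."""
    return sum(w(n) * A * cmath.exp(1j * (omega0 * n + phi))
               * cmath.exp(-1j * omega * n) for n in n_values)

A, omega0, phi = 0.8, 1.0, 0.5          # hypothetical sinusoid parameters
omega = 1.3                             # frequency at which to compare
lhs = windowed_sinusoid_transform(omega, A, omega0, phi)
rhs = A * cmath.exp(1j * phi) * window_transform(omega - omega0)
```

The two values agree to round-off at any `omega`, which is the delta-function convolution result restated for a single analysis frequency.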
The Rectangular Window
The rectangular window may be defined as:
$$w_R(n) \triangleq \begin{cases} 1, & |n| \le \frac{M-1}{2} \\ 0, & \text{otherwise} \end{cases}$$
[Figure: Zero-Phase Rectangular Window, M = 21. Amplitude versus Time (samples).]
Zero centered definition (even in time domain)
Need M odd in zero-centered case
Scale window by 1/M to obtain unity dc gain
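A minimal sketch of this definition follows. The function name `rect_window` and its `unity_dc_gain` flag are hypothetical names introduced only for this illustration.

```python
def rect_window(M, unity_dc_gain=False):
    """Return (indices, samples) for a zero-centered rectangular window.

    The zero-centered definition needs M odd, so that the window is
    symmetric about n = 0 on integer time indices.
    """
    if M % 2 == 0:
        raise ValueError("zero-centered definition needs M odd")
    half = (M - 1) // 2
    gain = 1.0 / M if unity_dc_gain else 1.0
    indices = list(range(-half, half + 1))
    samples = [gain] * M
    return indices, samples

indices, samples = rect_window(21, unity_dc_gain=True)
dc_gain = sum(samples)   # the DTFT at omega = 0 is the sum of the samples
```

With the 1/M scaling, `dc_gain` is 1 up to round-off. Without it, the DC gain is M.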
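To see what happens in the frequency domain, we need to look at the DTFT of the window:
$$W_R(\omega) = \mathrm{DTFT}_\omega(w_R) = \sum_{n=-\infty}^{\infty} w_R(n)\,e^{-j\omega n} = \sum_{n=-\frac{M-1}{2}}^{\frac{M-1}{2}} e^{-j\omega n} = \frac{e^{j\omega\frac{M-1}{2}} - e^{-j\omega\frac{M+1}{2}}}{1 - e^{-j\omega}}$$
where we used the closed form of a geometric series:
$$\sum_{n=L}^{U} r^n = \frac{r^L - r^{U+1}}{1 - r}$$
We can factor out linear phase terms from the numerator and denominator of the above expression to get
$$W_R(\omega) = \frac{e^{-j\omega\frac{1}{2}}}{e^{-j\omega\frac{1}{2}}} \cdot \frac{e^{j\omega\frac{M}{2}} - e^{-j\omega\frac{M}{2}}}{e^{j\omega\frac{1}{2}} - e^{-j\omega\frac{1}{2}}} = \frac{\sin\left(\omega\frac{M}{2}\right)}{\sin\left(\omega\frac{1}{2}\right)} = M\cdot\mathrm{asinc}_M(\omega)$$
where $\mathrm{asinc}_M(\omega)$ denotes the aliased sinc function:
$$\mathrm{asinc}_M(\omega) \triangleq \frac{\sin(M\omega/2)}{M\sin(\omega/2)}$$
(also called the Dirichlet function)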
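The closed form can be checked against the defining sum. This is a minimal sketch with assumed test values (M = 11 and a few arbitrary frequencies). The helper names `rect_dtft_direct` and `asinc` are introduced here for illustration.

```python
import cmath
import math

def rect_dtft_direct(M, omega):
    """Direct evaluation of sum_{n=-(M-1)/2}^{(M-1)/2} exp(-j*omega*n), M odd."""
    half = (M - 1) // 2
    return sum(cmath.exp(-1j * omega * n) for n in range(-half, half + 1))

def asinc(M, omega):
    """Aliased sinc: sin(M*omega/2) / (M*sin(omega/2)), for odd M."""
    s = math.sin(omega / 2.0)
    if abs(s) < 1e-12:
        return 1.0          # limiting value at multiples of 2*pi when M is odd
    return math.sin(M * omega / 2.0) / (M * s)

M = 11                      # assumed example length
test_frequencies = [0.1, 1.0, 2.5]
errors = [abs(rect_dtft_direct(M, om) - M * asinc(M, om))
          for om in test_frequencies]
```

Each entry of `errors` is at round-off level, and the imaginary part of the direct sum vanishes because the window is real and even.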
Rectangular Window Transform (Cont'd)
Above, we found the rectangular window transform to be the aliased sinc function:
$$W_R(\omega) = M\cdot\mathrm{asinc}_M(\omega) = \frac{\sin(M\omega/2)}{\sin(\omega/2)}$$
[Figure: DFT of a Rectangular Window of length M = 11. Complex Amplitude versus frequency.]
This (real) result is for the zero-centered rectangular window. For the causal case, a linear phase term appears:
$$W_R^c(\omega) = e^{-j\omega\frac{M-1}{2}}\, M\cdot\mathrm{asinc}_M(\omega)$$
As the sampling rate goes to infinity, the aliased sinc function approaches the regular sinc function
$$\mathrm{sinc}(x) \triangleq \frac{\sin(\pi x)}{\pi x}$$
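To see the convergence numerically, one can hold the window duration and the analysis frequency fixed while raising the sampling rate. The sketch below assumes a 10 ms window and a 150 Hz analysis frequency, both arbitrary example values, and compares $\mathrm{asinc}_M(2\pi f/f_s)$ with $\mathrm{sinc}(f \cdot \text{duration})$ for three sampling rates.

```python
import math

def asinc(M, omega):
    """Aliased sinc: sin(M*omega/2) / (M*sin(omega/2)); omega not a multiple of 2*pi."""
    return math.sin(M * omega / 2.0) / (M * math.sin(omega / 2.0))

def sinc(x):
    """Normalized sinc: sin(pi*x) / (pi*x), with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)

duration = 0.01     # window duration in seconds (assumed example value)
f = 150.0           # analysis frequency in Hz (assumed example value)

errors = []
for fs in (1e3, 1e4, 1e5):
    M = int(round(duration * fs))        # samples under the window at this rate
    omega = 2.0 * math.pi * f / fs       # same frequency in rad/sample
    errors.append(abs(asinc(M, omega) - sinc(f * duration)))
```

The entries of `errors` shrink as the sampling rate grows, which is the sense in which the aliased sinc approaches the sinc function.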
More generally, we may plot both the magnitude and phase of the window versus frequency:
[Figure, two panels: Magnitude (linear) of Rectangular Window Transform (M = 11), and Phase of Rectangular Window Transform (M = 11), with the main lobe marked.]
In audio work, we more typically plot the window transform magnitude on a decibel (dB) scale:
[Figure: DFT of a Rectangular Window, M = 11. Magnitude (dB) versus Normalized Frequency (rad/sample), with the main lobe, side lobes, and nulls labeled, and the first side lobe marked as 13 dB down.]
Since the DTFT of the rectangular window approximates the sinc function, it should roll off at approximately 6 dB per octave, as verified in the log-log plot below:
[Figure: DFT of a Rectangular Window, M = 20. Amplitude (dB) versus Normalized Frequency (rad/sample) on log-log axes, showing the partial main lobe and an ideal 6 dB per octave line.]
As the sampling rate approaches infinity, the rectangular window transform (asinc) converges exactly to the sinc function. Therefore, the departure of the roll-off from that of the sinc function can be ascribed to aliasing in the frequency domain, due to sampling in the time domain.
Sidelobe Roll-Off Rate
In general, if the first $n$ derivatives of a continuous function $w(t)$ exist (i.e., they are finite and uniquely defined), then its Fourier Transform magnitude is asymptotically proportional to
$$|W(\omega)| \approx \frac{\text{constant}}{\omega^{n+1}} \quad (\text{as } \omega \to \infty)$$
Proof: Look up roll-off rate in text index.
Thus, we have the following rule-of-thumb:
$n$ derivatives $\leftrightarrow$ $6(n+1)$ dB per octave roll-off rate
(since $20\log_{10}(2) = 6.0205999\ldots$). This is also $20(n+1)$ dB per decade.
To apply this result, we normally only need to look at the window's endpoints. The interior of the window is usually differentiable of all orders.
Examples:
Amplitude discontinuity $\leftrightarrow$ 6 dB/octave roll-off
Slope discontinuity $\leftrightarrow$ 12 dB/octave roll-off
Curvature discontinuity $\leftrightarrow$ 18 dB/octave roll-off
For discrete-time windows, the roll-off rate slows down at high frequencies due to aliasing.
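The arithmetic behind the rule of thumb is short enough to spell out. The sketch below uses hypothetical helper names and takes the $1/\omega^{n+1}$ asymptote as given. It computes the drop in dB when the frequency doubles (one octave) and when it increases tenfold (one decade).

```python
import math

def rolloff_db_per_octave(n):
    """dB drop of constant/omega**(n+1) when omega doubles."""
    return 20.0 * math.log10(2.0 ** (n + 1))

def rolloff_db_per_decade(n):
    """dB drop of constant/omega**(n+1) when omega increases tenfold."""
    return 20.0 * math.log10(10.0 ** (n + 1))

# n = 0: amplitude discontinuity; n = 1: slope discontinuity;
# n = 2: curvature discontinuity.
per_octave = [rolloff_db_per_octave(n) for n in (0, 1, 2)]
per_decade = [rolloff_db_per_decade(n) for n in (0, 1, 2)]
```

Rounded to the nearest dB, `per_octave` gives the 6, 12, and 18 dB/octave figures listed in the examples.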
In summary, the DTFT of the $M$-sample rectangular window is proportional to the aliased sinc function:
$$M\cdot\mathrm{asinc}_M(\omega T) = \frac{\sin(M\omega T/2)}{\sin(\omega T/2)} \approx \frac{\sin(\pi f M T)}{\pi f T} = M\,\mathrm{sinc}(fMT)$$
Some important points:
Zero crossings at nonzero integer multiples of $\Omega_M \triangleq 2\pi/M$, where $\Omega_M$ is the frequency sampling interval used by a length $M$ DFT
Main lobe width is $2\Omega_M = 4\pi/M$
As $M$ gets bigger, the main lobe narrows (better frequency resolution)
$M$ has no effect on the height of the side lobes (same as the Gibbs phenomenon for Fourier series)
First side lobe only 13 dB down from main-lobe peak
Side lobes roll off at approximately 6 dB per octave
A phase term arises when we shift the window to make it causal, while the window transform is real in the zero-centered case (i.e., when the window $w(n)$ is an even function of $n$)
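Two of these points, the zero crossing at $2\pi/M$ and the roughly 13 dB first side lobe regardless of $M$, can be checked with a brute-force search. The sketch below is illustrative only: the grid resolution and the window lengths tried are arbitrary choices.

```python
import math

def asinc_mag(M, omega):
    """|sin(M*omega/2) / (M*sin(omega/2))| for omega not a multiple of 2*pi."""
    return abs(math.sin(M * omega / 2.0) / (M * math.sin(omega / 2.0)))

def first_sidelobe_db(M, steps=2000):
    """Peak level (dB re main-lobe peak) between the first two zero crossings."""
    lo = 2.0 * math.pi / M          # first zero crossing
    hi = 4.0 * math.pi / M          # second zero crossing
    peak = max(asinc_mag(M, lo + (hi - lo) * i / steps) for i in range(1, steps))
    return 20.0 * math.log10(peak)

levels = {M: first_sidelobe_db(M) for M in (11, 21, 101)}
value_at_first_zero = asinc_mag(11, 2.0 * math.pi / 11)
```

The values in `levels` all sit near -13 dB and change only slightly with `M`, consistent with the statement that window length does not control side-lobe height.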
Frequency Resolution
The next series of plots shows the effect that an increased window length has on our ability to resolve 2 sinusoids.
Two Cosines (In-Phase Case)
2 cosines separated by $\Delta\omega = 2\pi/40$
Rectangular windows of lengths: 20, 30, 40, 80 ($\Delta\omega = \frac{1}{2}\Omega_M, \frac{3}{4}\Omega_M, \Omega_M, 2\Omega_M$)
[Figure: four panels (M = 20, 30, 40, 80). Magnitude versus Frequency $\omega T$ (rad/sample).]
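A rough numerical version of the in-phase experiment is sketched below. It departs from the slides in two labeled ways: it uses complex sinusoids rather than cosines (so there are no negative-frequency components to interfere), and it uses a simplistic resolution criterion assumed here, namely that the spectrum magnitude at the midpoint between the two frequencies is lower than at both sinusoid frequencies. The center frequency of $\pi/2$ is also an arbitrary choice.

```python
import cmath
import math

def spectrum_mag(M, w1, w2, omega):
    """|DTFT|/M at omega of a zero-centered length-M rectangular window applied
    to exp(j*w1*n) + exp(j*w2*n); both components are in phase at the center."""
    center = (M - 1) / 2.0
    total = sum((cmath.exp(1j * w1 * (i - center)) + cmath.exp(1j * w2 * (i - center)))
                * cmath.exp(-1j * omega * (i - center)) for i in range(M))
    return abs(total) / M

def shows_dip(M, w1, w2):
    """Assumed criterion: midpoint magnitude below the magnitude at w1 and at w2."""
    mid = spectrum_mag(M, w1, w2, (w1 + w2) / 2.0)
    return mid < min(spectrum_mag(M, w1, w2, w1), spectrum_mag(M, w1, w2, w2))

delta = 2.0 * math.pi / 40.0            # frequency separation from the slides
w1 = math.pi / 2.0 - delta / 2.0        # assumed placement around pi/2
w2 = math.pi / 2.0 + delta / 2.0

results = {M: shows_dip(M, w1, w2) for M in (20, 30, 40, 80)}
```

Under this criterion only the M = 80 window, where the separation is two DFT bin widths, shows a dip between the two in-phase components. The shorter windows merge them into a single peak.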
One Sine and One Cosine (Phase Quadrature Case)
As above, but 1 sine and 1 cosine
Note: least-resolved case appears resolved!
Note: M = 40 case suddenly looks much worse
Only the M = 80 case looks good at all phases
[Figure: four panels (M = 20, 30, 40, 80). Magnitude versus Frequency $\omega T$ (rad/sample).]
One Sine and One Cosine (Phase Quadrature Case)
All Four Resolutions Overlaid
Same plots as on previous page, just overlaid
Peak locations are biased in under-resolved cases, both in amplitude and frequency
[Figure: the M = 20, 30, 40, and 80 spectra overlaid. Magnitude versus Frequency $\omega T$ (rad/sample).]
The preceding figures suggest that, for a rectangular window of length $M$, two sinusoids can be most easily resolved when they are separated in frequency by
$$\Delta\omega \ge 2\Omega_M, \qquad \Omega_M \triangleq \frac{2\pi}{M}$$
This implies there must be at least two full cycles of the difference-frequency under the window. (We'll see later that this is an overly conservative requirement; a more careful study reveals that 1.44 cycles is sufficient for the rectangular window.)
In principle, arbitrarily small frequency separations can be resolved if
there is no noise, and
we are sure we are looking at the sum of two ideal sinusoids under the window.
However, in practice, there is almost always some noise and/or interference, so we prefer to require sinusoidal frequency separation by at least one main-lobe width (of the sinc function in this case, or the window transform more generally) whenever possible.
The rectangular window provides an abrupt transition at its edge. We will later look at some other windows which have a more gradual transition. This is usually done to reduce the height of the side lobes.
Resolution Bandwidth (resolving sinusoids)
Our ability to resolve two closely spaced sinusoids is determined primarily by the main lobe width of the Fourier transform of the window we are using.
Let $B_w$ denote the main lobe width in Hz, with the main lobe width defined as the width between zero crossings:
[Figure: the rectangular window transform $W_R(\omega)$, with the main-lobe width $B_w$ marked between its zero crossings.]
For the Rectangular Window (length $M$), we have
$$W_R(\omega) = M\cdot\mathrm{asinc}_M(\omega T) = \frac{\sin(M\omega T/2)}{\sin(\omega T/2)} = \frac{\sin(M\pi f T)}{\sin(\pi f T)}$$
Main lobe width is two side lobes wide:
$$B_w = 2\,\frac{\Omega_M}{2\pi}\,f_s = 2\,\frac{f_s}{M} \quad \text{(Hz)}$$
Choosing Window Length to Resolve Sinusoids
A conservative requirement for resolving 2 sinusoids (in noisy conditions) with a spacing of $\Delta$ Hz is to choose a window length $M$ long enough so that their main lobes are clearly discernible. For example, we may require that their main lobes meet at the first zero crossings.
[Figure: Closely spaced sinusoids, Rectangular window, M = 21. Amplitude versus $\omega T$ (radians per sample), with the main-lobe width $B_w$ of each sinusoid marked.]
To obtain the separation shown above, we must have $B_w \le \Delta$, where $B_w$ is the main lobe width in Hz, and $\Delta$ is the sinusoidal frequency separation in Hz.
For the rectangular window, $B_w$ can be expressed as
$$B_w = 2\,\frac{f_s}{M}$$
Hence we need:
$$B_w = 2\,\frac{f_s}{M} \le \Delta \quad\Longrightarrow\quad M \ge 2\,\frac{f_s}{\Delta}$$
or
$$M \ge 2\,\frac{f_s}{|f_2 - f_1|}$$
Thus, to resolve the frequencies $f_1$ and $f_2$ under a rectangular window, it is sufficient for the window length $M$ to span at least 2 periods of the difference frequency $f_2 - f_1$, measured in samples, where 2 is the width of the main lobe, measured in side-lobe widths.
A rectangular window of length $2 f_s/|f_2 - f_1|$ samples or greater is said to resolve the sinusoidal frequencies $f_1$ and $f_2$.
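As a worked instance of this bound, the sketch below computes the smallest integer window length that satisfies $M \ge 2 f_s/|f_2 - f_1|$. The function name and the example frequencies are assumptions chosen for illustration.

```python
import math

def min_rect_window_length(fs, f1, f2):
    """Smallest integer M with M >= 2*fs/|f2 - f1| (rectangular window)."""
    if f1 == f2:
        raise ValueError("frequencies must differ")
    return math.ceil(2.0 * fs / abs(f2 - f1))

# Example: two partials 10 Hz apart at a 44.1 kHz sampling rate.
M_audio = min_rect_window_length(44100.0, 440.0, 450.0)

# Example: two tones 100 Hz apart at an 8 kHz sampling rate.
M_speech = min_rect_window_length(8000.0, 1000.0, 1100.0)
```

In the first example the window must span 8820 samples, i.e. 0.2 s, which is two periods of the 10 Hz difference frequency.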