Digital Speech Processing—Lecture 9
Short-Time Fourier Analysis Methods: Introduction
General Discrete-Time Model of Speech Production
Voiced Speech:
• $A_V\,P(z)\,G(z)\,V(z)\,R(z)$
Unvoiced Speech:
• $A_N\,N(z)\,V(z)\,R(z)$
Short-Time Fourier Analysis
• represent signal by sum of sinusoids or complex exponentials, as it leads to convenient solutions to problems (formant estimation, pitch period estimation, analysis-by-synthesis methods) and insight into the signal itself
• such Fourier representations provide
– convenient means to determine response to a sum of sinusoids for linear systems
– clear evidence of signal properties that are obscured in the original signal
Why STFT for Speech Signals
• steady state sounds, like vowels, are produced by periodic excitation of a linear system => speech spectrum is the product of the excitation spectrum and the vocal tract frequency response
• speech is a time-varying signal => need more sophisticated analysis to reflect time varying properties
– changes occur at syllabic rates (~10 times/sec)
– over fixed time intervals of 10-30 msec, properties of most speech signals are relatively constant (when is this not the case?)
Overview of Lecture
• define time-varying Fourier transform (STFT) analysis method
• define synthesis method from time-varying FT (filter-bank summation, overlap addition)
• show how time-varying FT can be viewed in terms of a bank of filters model
• computation methods based on using FFT
• application to vocoders, spectrum displays, formant estimation, pitch period estimation
Frequency Domain Processing
• Coding:
– transform, subband, homomorphic, channel vocoders
• Restoration/Enhancement/Modification:
– noise and reverberation removal, helium restoration, time-scale modifications (speed-up and slow-down of speech)
Frequency and the DTFT
• sinusoids
$x(n) = \cos(\omega_0 n) = \left(e^{j\omega_0 n} + e^{-j\omega_0 n}\right)/2$
where $\omega_0$ is the frequency (in radians) of the sinusoid
• the Discrete-Time Fourier Transform (DTFT)
$X(e^{j\omega}) = \sum_{n=-\infty}^{\infty} x(n)\,e^{-j\omega n} = \mathrm{DTFT}\{x(n)\}$
$x(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} X(e^{j\omega})\,e^{j\omega n}\,d\omega = \mathrm{DTFT}^{-1}\{X(e^{j\omega})\}$
where $\omega$ is the frequency variable of $X(e^{j\omega})$
DTFT and DFT of Speech
The DTFT and the DFT for the infinite duration signal could be calculated (the DTFT) and approximated (the DFT) by the following:
$X(e^{j\omega}) = \sum_{m=-\infty}^{\infty} x(m)\,e^{-j\omega m}$ (DTFT)
$X(k) = \sum_{m=0}^{L-1} x(m)\,w(m)\,e^{-j(2\pi/L)km}, \quad k = 0, 1, \ldots, L-1$ (DFT)
$X(k) = X(e^{j\omega})\big|_{\omega = 2\pi k/L}$
using a value of $L = 25000$ we get the following plot
25000-Point DFT of Speech
[Figure: magnitude and log magnitude (dB) of the 25000-point DFT of speech]
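The statement that the DFT is the DTFT evaluated at $\omega = 2\pi k/L$ can be checked numerically. The following is a minimal sketch using only the Python standard library; the 16-sample cosine segment and the helper names `dtft` and `dft` are illustrative assumptions, not part of the lecture.

```python
import cmath
import math

def dtft(x, omega):
    """DTFT of a finite-length sequence x(m), m = 0..len(x)-1, at radian frequency omega."""
    return sum(xm * cmath.exp(-1j * omega * m) for m, xm in enumerate(x))

def dft(x):
    """Direct O(L^2) DFT: X(k) = sum_m x(m) e^{-j(2 pi / L) k m}."""
    L = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * k * m / L) for m in range(L))
            for k in range(L)]

# Hypothetical segment: L samples of a cosine, i.e. x(m) w(m) with w(m) = 1.
L = 16
segment = [math.cos(0.3 * math.pi * m) for m in range(L)]
X = dft(segment)

# Largest gap between X(k) and the DTFT evaluated at omega = 2 pi k / L.
max_gap = max(abs(X[k] - dtft(segment, 2 * math.pi * k / L)) for k in range(L))
print(max_gap)
```

A practical computation would use an FFT; the direct sums are kept here only so that the two definitions can be compared term by term.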
Short-Time Fourier Transform (STFT)
• speech is not a stationary signal, i.e., it has properties that change with time
• thus a single representation based on all the samples of a speech utterance, for the most part, has no meaning
• instead, we define a time-dependent Fourier transform (TDFT or STFT) of speech that changes periodically as the speech properties change over time
Definition of STFT
$X_{\hat{n}}(e^{j\hat{\omega}}) = \sum_{m=-\infty}^{\infty} x(m)\,w(\hat{n}-m)\,e^{-j\hat{\omega}m}$, both $\hat{n}$ and $\hat{\omega}$ are variables
• $w(\hat{n}-m)$ is a real window which determines the portion of $x(\hat{n})$ that is used in the computation of $X_{\hat{n}}(e^{j\hat{\omega}})$
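The definition can be evaluated directly for one time index and one frequency. This is a minimal sketch, assuming a causal window stored as `w[0..L-1]` (zero elsewhere) and a signal that is zero outside its stored samples; `hamming` and `stft_point` are hypothetical helper names.

```python
import cmath
import math

def hamming(L):
    """L-point Hamming window, w(n) = 0.54 - 0.46 cos(2 pi n / (L - 1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (L - 1)) for n in range(L)]

def stft_point(x, w, n_hat, omega_hat):
    """X_n(e^{jw}) = sum_m x(m) w(n_hat - m) e^{-j omega_hat m}.

    Only m in [n_hat - L + 1, n_hat] contributes, because w is causal with length L.
    """
    L = len(w)
    total = 0j
    for m in range(max(0, n_hat - L + 1), min(len(x) - 1, n_hat) + 1):
        total += x[m] * w[n_hat - m] * cmath.exp(-1j * omega_hat * m)
    return total

# Sanity check with a unit impulse at m = 5: the sum collapses to a single term.
w = hamming(8)
x = [0.0] * 20
x[5] = 1.0
value = stft_point(x, w, n_hat=8, omega_hat=0.7)
expected = w[3] * cmath.exp(-1j * 0.7 * 5)
```

With the impulse input the STFT reduces to $w(\hat{n}-5)\,e^{-j5\hat{\omega}}$, which makes the role of the window as a selector of samples easy to see.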
Short Time Fourier Transform
• STFT is a function of two variables, the time index, $\hat{n}$, which is discrete, and the frequency variable, $\hat{\omega}$, which is continuous
$X_{\hat{n}}(e^{j\hat{\omega}}) = \sum_{m=-\infty}^{\infty} x(m)\,w(\hat{n}-m)\,e^{-j\hat{\omega}m}$
$\qquad = \mathrm{DTFT}\left(x(m)\,w(\hat{n}-m)\right) \Rightarrow \hat{n}$ fixed, $\hat{\omega}$ variable
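Short-Time Fourier Transform
• alternative form of STFT (based on change of variables) is
$X_{\hat{n}}(e^{j\hat{\omega}}) = \sum_{m=-\infty}^{\infty} w(m)\,x(\hat{n}-m)\,e^{-j\hat{\omega}(\hat{n}-m)}$
$\qquad = e^{-j\hat{\omega}\hat{n}} \sum_{m=-\infty}^{\infty} x(\hat{n}-m)\,w(m)\,e^{j\hat{\omega}m}$
• if we define
$\tilde{X}_{\hat{n}}(e^{j\hat{\omega}}) = \sum_{m=-\infty}^{\infty} x(\hat{n}-m)\,w(m)\,e^{j\hat{\omega}m}$
• then $X_{\hat{n}}(e^{j\hat{\omega}})$ can be expressed as (using $m' = -m$)
$X_{\hat{n}}(e^{j\hat{\omega}}) = e^{-j\hat{\omega}\hat{n}}\,\tilde{X}_{\hat{n}}(e^{j\hat{\omega}}) = e^{-j\hat{\omega}\hat{n}}\,\mathrm{DTFT}\left[x(\hat{n}+m)\,w(-m)\right]$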
STFT-Different Time Origins
• the STFT can be viewed as having two different time origins
1. time origin tied to signal $x(n)$
$X_{\hat{n}}(e^{j\hat{\omega}}) = \sum_{m=-\infty}^{\infty} x(m)\,w(\hat{n}-m)\,e^{-j\hat{\omega}m}$
$\qquad = \mathrm{DTFT}\left[x(m)\,w(\hat{n}-m)\right]$, $\hat{n}$ fixed, $\hat{\omega}$ variable
2. time origin tied to window signal $w(-m)$
$X_{\hat{n}}(e^{j\hat{\omega}}) = e^{-j\hat{\omega}\hat{n}} \sum_{m=-\infty}^{\infty} x(\hat{n}+m)\,w(-m)\,e^{-j\hat{\omega}m}$
$\qquad = e^{-j\hat{\omega}\hat{n}}\,\tilde{X}_{\hat{n}}(e^{j\hat{\omega}})$
$\qquad = e^{-j\hat{\omega}\hat{n}}\,\mathrm{DTFT}\left[w(-m)\,x(\hat{n}+m)\right]$, $\hat{n}$ fixed, $\hat{\omega}$ variable
Time Origin for STFT
[Figure: the two time origins. With the time origin tied to the window, the sequence is $w[-m]\,x[\hat{n}+m]$, and $m = -\hat{n}$ corresponds to $x[0]$]
Interpretations of STFT
• there are 2 distinct interpretations of $X_{\hat{n}}(e^{j\hat{\omega}})$
1. assume $\hat{n}$ is fixed, then $X_{\hat{n}}(e^{j\hat{\omega}})$ is simply the normal Fourier transform of the sequence $w(\hat{n}-m)\,x(m)$, $-\infty < m < \infty$ => for fixed $\hat{n}$, $X_{\hat{n}}(e^{j\hat{\omega}})$ has the same properties as a normal Fourier transform
2. consider $X_{\hat{n}}(e^{j\hat{\omega}})$ as a function of the time index $\hat{n}$ with $\hat{\omega}$ fixed. Then $X_{\hat{n}}(e^{j\hat{\omega}})$ is in the form of a convolution of the signal $x(\hat{n})\,e^{-j\hat{\omega}\hat{n}}$ with the window $w(\hat{n})$. This leads to an interpretation in the form of linear filtering of the frequency modulated signal $x(\hat{n})\,e^{-j\hat{\omega}\hat{n}}$ by $w(\hat{n})$.
- we will now consider each of these interpretations of the STFT in a lot more detail
Fourier Transform Interpretation
• consider $X_{\hat{n}}(e^{j\hat{\omega}})$ as the normal Fourier transform of the sequence $w(\hat{n}-m)\,x(m)$, $-\infty < m < \infty$, for fixed $\hat{n}$
• the window $w(\hat{n}-m)$ slides along the sequence $x(m)$ and defines a new STFT for every value of $\hat{n}$
• what are the conditions for the existence of the STFT?
– the sequence $w(\hat{n}-m)\,x(m)$ must be absolutely summable for all values of $\hat{n}$
- since $|x(\hat{n})| \le L$ (32767 for 16-bit sampling)
- since $|w(\hat{n})| \le 1$ (normalized window levels)
- since window duration is usually finite
– $w(\hat{n}-m)\,x(m)$ is absolutely summable for all $\hat{n}$
Frequencies for STFT
• the STFT is periodic in $\hat{\omega}$ with period $2\pi$, i.e.,
$X_{\hat{n}}(e^{j\hat{\omega}}) = X_{\hat{n}}(e^{j(\hat{\omega}+2\pi k)}), \quad \forall k$
• can use any of several frequency variables to express STFT, including
-- $\hat{\omega} = \hat{\Omega}T$ (where $T$ is the sampling period for $x(m)$) to represent analog radian frequency, giving $X_{\hat{n}}(e^{j\hat{\Omega}T})$
-- $\hat{\omega} = 2\pi\hat{f}$ or $\hat{\omega} = 2\pi\hat{F}T$ to represent normalized frequency ($0 \le \hat{f} \le 1$) or analog frequency ($0 \le \hat{F} \le F_s = 1/T$), giving $X_{\hat{n}}(e^{j2\pi\hat{f}})$ or $X_{\hat{n}}(e^{j2\pi\hat{F}T})$
Signal Recovery from STFT
• since for a given value of $\hat{n}$, $X_{\hat{n}}(e^{j\hat{\omega}})$ has the same properties as a normal Fourier transform, we can recover the input sequence exactly
• since $X_{\hat{n}}(e^{j\hat{\omega}})$ is the normal Fourier transform of the windowed sequence $w(\hat{n}-m)\,x(m)$, then
$w(\hat{n}-m)\,x(m) = \frac{1}{2\pi}\int_{-\pi}^{\pi} X_{\hat{n}}(e^{j\hat{\omega}})\,e^{j\hat{\omega}m}\,d\hat{\omega}$
• assuming the window satisfies the property that $w(0) \ne 0$ (a trivial requirement), then by evaluating the inverse Fourier transform when $m = \hat{n}$, we obtain
$x(\hat{n}) = \frac{1}{2\pi w(0)}\int_{-\pi}^{\pi} X_{\hat{n}}(e^{j\hat{\omega}})\,e^{j\hat{\omega}\hat{n}}\,d\hat{\omega}$
Signal Recovery from STFT
$x(\hat{n}) = \frac{1}{2\pi w(0)}\int_{-\pi}^{\pi} X_{\hat{n}}(e^{j\hat{\omega}})\,e^{j\hat{\omega}\hat{n}}\,d\hat{\omega}$
• with the requirement that $w(0) \ne 0$, the sequence $x(\hat{n})$ can be recovered exactly from $X_{\hat{n}}(e^{j\hat{\omega}})$, if $X_{\hat{n}}(e^{j\hat{\omega}})$ is known for all values of $\hat{\omega}$ over one complete period
- sample-by-sample recovery process
- $X_{\hat{n}}(e^{j\hat{\omega}})$ must be known for every value of $\hat{n}$ and for all $\hat{\omega}$
• can also recover sequence $w(\hat{n}-m)\,x(m)$ but can't guarantee that $x(m)$ can be recovered since $w(\hat{n}-m)$ can equal 0
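Properties of STFT
$X_{\hat{n}}(e^{j\hat{\omega}}) = \mathrm{DTFT}\left[w(\hat{n}-m)\,x(m)\right]$, $\hat{n}$ fixed, $\hat{\omega}$ variable
• relation to short-time power density function
$S_{\hat{n}}(e^{j\hat{\omega}}) = |X_{\hat{n}}(e^{j\hat{\omega}})|^2 = X_{\hat{n}}(e^{j\hat{\omega}}) \cdot X^{*}_{\hat{n}}(e^{j\hat{\omega}}) \Leftrightarrow R_{\hat{n}}(k)$, $\hat{n}$ fixed
$R_{\hat{n}}(k) = \sum_{m=-\infty}^{\infty} w(\hat{n}-m)\,x(m)\,w(\hat{n}-m-k)\,x(m+k) \Leftrightarrow S_{\hat{n}}(e^{j\hat{\omega}})$
• Relation to regular $X(e^{j\hat{\omega}})$ (assuming it exists)
$X(e^{j\hat{\omega}}) = \mathrm{DTFT}\left[x(m)\right] = \sum_{m=-\infty}^{\infty} x(m)\,e^{-j\hat{\omega}m}$
$X_{\hat{n}}(e^{j\hat{\omega}}) = \frac{1}{2\pi}\int_{-\pi}^{\pi} W(e^{-j\theta})\,X(e^{j(\hat{\omega}-\theta)})\,e^{-j\theta\hat{n}}\,d\theta$
$\mathrm{DTFT}\left[w(\hat{n}-m)\,x(m)\right] \leftrightarrow \left[W(e^{-j\theta})\,e^{-j\theta\hat{n}}\right] \ast X(e^{j\theta})$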
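Properties of STFT
• assume $X(e^{j\hat{\omega}})$ exists
$X(e^{j\hat{\omega}}) = \mathrm{DTFT}\left[x(m)\right] = \sum_{m=-\infty}^{\infty} x(m)\,e^{-j\hat{\omega}m}$
$X_{\hat{n}}(e^{j\hat{\omega}}) = \frac{1}{2\pi}\int_{-\pi}^{\pi} W(e^{-j\theta})\,X(e^{j(\hat{\omega}-\theta)})\,e^{-j\theta\hat{n}}\,d\theta$
• limiting case
$w(\hat{n}) = 1, \; -\infty < \hat{n} < \infty \Leftrightarrow W(e^{j\hat{\omega}}) = 2\pi\,\delta(\hat{\omega})$
$X_{\hat{n}}(e^{j\hat{\omega}}) = \frac{1}{2\pi}\int_{-\pi}^{\pi} 2\pi\,\delta(-\theta)\,X(e^{j(\hat{\omega}-\theta)})\,e^{-j\theta\hat{n}}\,d\theta = X(e^{j\hat{\omega}})$
i.e., we get the same thing no matter where the window is shifted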
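Alternative Forms of STFT
Alternative forms of $X_{\hat{n}}(e^{j\hat{\omega}})$
1. real and imaginary parts
$X_{\hat{n}}(e^{j\hat{\omega}}) = \mathrm{Re}\left[X_{\hat{n}}(e^{j\hat{\omega}})\right] + j\,\mathrm{Im}\left[X_{\hat{n}}(e^{j\hat{\omega}})\right] = a_{\hat{n}}(\hat{\omega}) - j\,b_{\hat{n}}(\hat{\omega})$
$a_{\hat{n}}(\hat{\omega}) = \mathrm{Re}\left[X_{\hat{n}}(e^{j\hat{\omega}})\right]$
$b_{\hat{n}}(\hat{\omega}) = -\mathrm{Im}\left[X_{\hat{n}}(e^{j\hat{\omega}})\right]$
• when $x(m)$ and $w(\hat{n}-m)$ are both real (usually the case), can show that $a_{\hat{n}}(\hat{\omega})$ is symmetric in $\hat{\omega}$, and $b_{\hat{n}}(\hat{\omega})$ is anti-symmetric in $\hat{\omega}$
2. magnitude and phase
$X_{\hat{n}}(e^{j\hat{\omega}}) = |X_{\hat{n}}(e^{j\hat{\omega}})|\,e^{j\theta_{\hat{n}}(\hat{\omega})}$
• can relate $|X_{\hat{n}}(e^{j\hat{\omega}})|$ and $\theta_{\hat{n}}(\hat{\omega})$ to $a_{\hat{n}}(\hat{\omega})$ and $b_{\hat{n}}(\hat{\omega})$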
Role of Window in STFT
• The window $w(\hat{n}-m)$ does the following:
1. chooses portion of $x(m)$ to be analyzed
2. window shape determines the nature of $X_{\hat{n}}(e^{j\hat{\omega}})$
• Since $X_{\hat{n}}(e^{j\hat{\omega}})$ (for fixed $\hat{n}$) is the normal FT of $w(\hat{n}-m)\,x(m)$, then if we consider the normal FT's of both $x(n)$ and $w(n)$ individually, we get
$X(e^{j\hat{\omega}}) = \sum_{m=-\infty}^{\infty} x(m)\,e^{-j\hat{\omega}m}$
$W(e^{j\hat{\omega}}) = \sum_{m=-\infty}^{\infty} w(m)\,e^{-j\hat{\omega}m}$
Role of Window in STFT
• then for fixed $\hat{n}$, the normal Fourier transform of the product $w(\hat{n}-m)\,x(m)$ is the convolution of the transforms of $w(\hat{n}-m)$ and $x(m)$
• for fixed $\hat{n}$, the FT of $w(\hat{n}-m)$ is $W(e^{-j\hat{\omega}})\,e^{-j\hat{\omega}\hat{n}}$, thus
$X_{\hat{n}}(e^{j\hat{\omega}}) = \frac{1}{2\pi}\int_{-\pi}^{\pi} W(e^{-j\theta})\,e^{-j\theta\hat{n}}\,X(e^{j(\hat{\omega}-\theta)})\,d\theta$
• and replacing $\theta$ by $-\theta$ gives
$X_{\hat{n}}(e^{j\hat{\omega}}) = \frac{1}{2\pi}\int_{-\pi}^{\pi} W(e^{j\theta})\,e^{j\theta\hat{n}}\,X(e^{j(\hat{\omega}+\theta)})\,d\theta$
Interpretation of Role of Window
• $X_{\hat{n}}(e^{j\hat{\omega}})$ is the convolution of $X(e^{j\hat{\omega}})$ with the FT of the shifted window sequence, $W(e^{-j\hat{\omega}})\,e^{-j\hat{\omega}\hat{n}}$
• $X(e^{j\hat{\omega}})$ really doesn't have meaning since $x(\hat{n})$ varies with time; consider $x(\hat{n})$ defined for window duration and extended for all time to have the same properties => then $X(e^{j\hat{\omega}})$ does exist with properties that reflect the sound within the window (can also consider $x(\hat{n}) = 0$ outside the window and define $X(e^{j\hat{\omega}})$ appropriately, but this is another case)
Bottom Line: $X_{\hat{n}}(e^{j\hat{\omega}})$ is a smoothed version of the FT of the part of $x(\hat{n})$ that is within the window $w$.
Windows in STFT
• for $X_{\hat{n}}(e^{j\hat{\omega}})$ to represent the short-time spectral properties of $x(\hat{n})$ inside the window => $W(e^{j\theta})$ should be much narrower in frequency than significant spectral regions of $X(e^{j\hat{\omega}})$, i.e., almost an impulse in frequency
• consider rectangular and Hamming windows, where width of the main spectral lobe is inversely proportional to window length, and side lobe levels are essentially independent of window length
Rectangular Window: flat window of length L samples; first zero in frequency response occurs at FS/L, with peak sidelobe level of about -13 dB
Hamming Window: raised cosine window of length L samples; first zero in frequency response occurs at 2FS/L, with sidelobe levels of -40 dB or lower
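The main-lobe widths and side-lobe levels quoted for the two windows can be estimated numerically. The sketch below evaluates $|W(e^{j\omega})|$ on a dense frequency grid, walks down the main lobe to the first null, and reports the highest level beyond it. The window length $L = 51$, the grid size, and the function names are arbitrary choices for illustration.

```python
import cmath
import math

def rectangular(L):
    return [1.0] * L

def hamming(L):
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (L - 1)) for n in range(L)]

def spectrum_db(w, num_points=2048):
    """|W(e^{jw})| in dB relative to W(e^{j0}), on a uniform grid over 0 <= w <= pi."""
    w0 = abs(sum(w))
    out = []
    for i in range(num_points + 1):
        omega = math.pi * i / num_points
        mag = abs(sum(wn * cmath.exp(-1j * omega * n) for n, wn in enumerate(w)))
        out.append(20 * math.log10(max(mag, 1e-12) / w0))
    return out

def main_lobe_and_sidelobe(w, num_points=2048):
    """Return (first-null frequency in radians, peak side-lobe level in dB)."""
    db = spectrum_db(w, num_points)
    i = 0
    while i < num_points and db[i + 1] < db[i]:   # descend the main lobe
        i += 1
    return math.pi * i / num_points, max(db[i:])

L = 51
null_rect, side_rect = main_lobe_and_sidelobe(rectangular(L))
null_hamm, side_hamm = main_lobe_and_sidelobe(hamming(L))
print(null_rect, side_rect)
print(null_hamm, side_hamm)
```

In radians, the first null of the rectangular window falls near $2\pi/L$ (that is, FS/L in Hz) and that of the Hamming window near twice that value, so the printed numbers can be compared directly against the figures quoted above.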
Windows
[Figure: L = 2M+1-point Hamming window and its corresponding DTFT]
Frequency Responses of Windows
[Figures: Effect of Window Length, Hamming window (HW) and rectangular window (RW)]
Relation to Short-Time Autocorrelation
$X_{\hat{n}}(e^{j\hat{\omega}})$ is the discrete-time Fourier transform of $w[\hat{n}-m]\,x[m]$ for each value of $\hat{n}$; then it is seen that
$S_{\hat{n}}(e^{j\hat{\omega}}) = |X_{\hat{n}}(e^{j\hat{\omega}})|^2 = X_{\hat{n}}(e^{j\hat{\omega}})\,X^{*}_{\hat{n}}(e^{j\hat{\omega}})$
is the Fourier transform of
$R_{\hat{n}}(l) = \sum_{m=-\infty}^{\infty} w[\hat{n}-m]\,x[m]\,w[\hat{n}-l-m]\,x[m+l]$
which is the short-time autocorrelation function of the previous chapter. Thus the above equations relate the short-time spectrum to the short-time autocorrelation.
Short-Time Autocorrelation and STFT
[Figure: short-time autocorrelation and STFT]
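The transform pair between the short-time autocorrelation and the squared STFT magnitude can be verified for one frame. This is a minimal sketch with a hypothetical test signal and helper names; it assumes a real signal, a real causal window of length $L$, and a frame that lies fully inside the stored samples.

```python
import cmath
import math

def hamming(L):
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (L - 1)) for n in range(L)]

def short_time_segment(x, w, n_hat):
    """y(m) = w(n_hat - m) x(m) over the window support m = n_hat-L+1 .. n_hat."""
    L = len(w)
    return [x[m] * w[n_hat - m] for m in range(n_hat - L + 1, n_hat + 1)]

def autocorrelation(y):
    """R(l) = sum_m y(m) y(m + l), for lags l = -(L-1) .. (L-1)."""
    L = len(y)
    return {l: sum(y[i] * y[i + l] for i in range(L) if 0 <= i + l < L)
            for l in range(-(L - 1), L)}

def power_spectrum(y, omega):
    """S(e^{jw}) = |sum_m y(m) e^{-j w m}|^2 (a time shift of y does not change it)."""
    return abs(sum(v * cmath.exp(-1j * omega * i) for i, v in enumerate(y))) ** 2

def dtft_of_autocorrelation(R, omega):
    return sum(r * cmath.exp(-1j * omega * l) for l, r in R.items()).real

x = [math.sin(0.35 * m) + 0.3 * math.cos(1.1 * m) for m in range(40)]
y = short_time_segment(x, hamming(16), n_hat=30)
R = autocorrelation(y)
omegas = [0.0, 0.4, 1.3, 2.9]
gaps = [abs(power_spectrum(y, om) - dtft_of_autocorrelation(R, om)) for om in omegas]
```

Because $R(l)$ is even for a real frame, its transform is real, which is why only the real part is kept.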
Summary of FT view of STFT
• interpret $X_{\hat{n}}(e^{j\hat{\omega}})$ as the normal Fourier transform of the sequence $w(\hat{n}-m)\,x(m)$, $-\infty < m < \infty$
• properties of this Fourier transform depend on the window
o frequency resolution of $X_{\hat{n}}(e^{j\hat{\omega}})$ varies inversely with the length of the window => want long windows for high resolution
o want $x(n)$ to be relatively stationary (non-time-varying) during duration of window for most stable spectrum => want short windows
• as usual in speech processing, there needs to be a compromise between good temporal resolution (short windows) and good frequency resolution (long windows)
Linear Filtering Interpretation of STFT
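Linear Filtering Interpretation
1. modulation-lowpass filter form ($n$ rather than $\hat{n}$)
$X_n(e^{j\hat{\omega}}) = \sum_{m=-\infty}^{\infty} x(m)\,e^{-j\hat{\omega}m}\,w(n-m)$
$\qquad = w(n) \ast \left(x(n)\,e^{-j\hat{\omega}n}\right)$, $n$ variable, $\hat{\omega}$ fixed
$\qquad = \frac{1}{2\pi}\int_{-\pi}^{\pi} W(e^{j\theta})\,X(e^{j(\theta+\hat{\omega})})\,e^{j\theta n}\,d\theta$
2. bandpass filter-demodulation
$X_n(e^{j\hat{\omega}}) = \sum_{m=-\infty}^{\infty} w(m)\,x(n-m)\,e^{-j\hat{\omega}(n-m)}$
$\qquad = e^{-j\hat{\omega}n} \sum_{m=-\infty}^{\infty} \left(w(m)\,e^{j\hat{\omega}m}\right) x(n-m)$
$\qquad = e^{-j\hat{\omega}n}\left[\left(w(n)\,e^{j\hat{\omega}n}\right) \ast x(n)\right]$, $n$ variable, $\hat{\omega}$ fixed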
Linear Filtering Interpretation
1. modulation-lowpass filter form:
$X_n(e^{j\hat{\omega}}) = \sum_{m=-\infty}^{\infty} x(m)\,e^{-j\hat{\omega}m}\,w(n-m)$, $n$ variable, $\hat{\omega}$ fixed
$\qquad = \left(x(n)\,e^{-j\hat{\omega}n}\right) \ast w(n)$
$\qquad = \left(x(n)\cos(\hat{\omega}n)\right) \ast w(n) - j\left(x(n)\sin(\hat{\omega}n)\right) \ast w(n)$
$\qquad = a_n(\hat{\omega}) - j\,b_n(\hat{\omega})$
Linear Filtering Interpretation
2. bandpass filter-demodulation form
$X_n(e^{j\hat{\omega}}) = e^{-j\hat{\omega}n}\left[\left(w(n)\,e^{j\hat{\omega}n}\right) \ast x(n)\right]$, $n$ variable, $\hat{\omega}$ fixed
• complex bandpass filter output modulated by signal $e^{-j\hat{\omega}n}$
• if $W(e^{j\theta})$ is lowpass, then filter is bandpass around $\theta = \hat{\omega}$
• all real computation for lower half structure
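The two linear-filtering forms compute the same quantity, which can be confirmed sample by sample. The sketch below is an illustration under stated assumptions: a causal FIR window, a signal that is zero before $n = 0$, and hypothetical names (`conv_at`, `lowpass_form`, `bandpass_form`).

```python
import cmath
import math

def conv_at(a, b, n):
    """(a * b)(n) = sum_m a(m) b(n - m) for causal finite-length sequences a and b."""
    return sum(a[m] * b[n - m] for m in range(len(a)) if 0 <= n - m < len(b))

def lowpass_form(x, w, omega_hat, n):
    """Modulate x down by e^{-j omega_hat n}, then filter with the lowpass window w."""
    x_mod = [xm * cmath.exp(-1j * omega_hat * m) for m, xm in enumerate(x)]
    return conv_at(x_mod, w, n)

def bandpass_form(x, w, omega_hat, n):
    """Filter x with h(n) = w(n) e^{j omega_hat n}, then demodulate by e^{-j omega_hat n}."""
    h = [wm * cmath.exp(1j * omega_hat * m) for m, wm in enumerate(w)]
    return cmath.exp(-1j * omega_hat * n) * conv_at(h, x, n)

w = [0.54 - 0.46 * math.cos(2 * math.pi * n / 11) for n in range(12)]  # 12-point Hamming
x = [math.cos(0.5 * n) + 0.2 * math.sin(1.7 * n) for n in range(30)]
omega_hat = 0.5
max_gap = max(abs(lowpass_form(x, w, omega_hat, n) - bandpass_form(x, w, omega_hat, n))
              for n in range(len(x)))
```

The real and imaginary parts of `lowpass_form` correspond to $a_n(\hat{\omega})$ and $-b_n(\hat{\omega})$ in the notation of the modulation-lowpass slide.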
Linear Filtering Interpretation
[Figure: lowpass filter frequency response and bandpass filter frequency response]
Linear Filtering Interpretation
• assume normal FT of $x(n)$ exists
$x(n) \leftrightarrow X(e^{j\theta})$
$x(n)\,e^{-j\hat{\omega}n} \leftrightarrow X(e^{j(\theta+\hat{\omega})})$ (recall that $\hat{\omega}$ is a particular frequency)
=> spectrum of $x(n)$ at frequency $\hat{\omega}$ is shifted to zero frequency
• since the STFT is a convolution, the FT of the STFT is the product of the individual FT's, i.e., $X(e^{j(\theta+\hat{\omega})}) \cdot W(e^{j\theta})$
• if $W(e^{j\theta})$ resembles a narrow band lowpass filter, i.e., $W(e^{j\theta}) = 1$ for small $\theta$ and is 0 otherwise, then $X(e^{j(\theta+\hat{\omega})}) \cdot W(e^{j\theta}) \approx X(e^{j\hat{\omega}})$
Summary-STFT
Short-Time Fourier Transform (STFT)
$X_{\hat{n}}(e^{j\hat{\omega}}) = \sum_{m=-\infty}^{\infty} x[m]\,w[\hat{n}-m]\,e^{-j\hat{\omega}m}, \quad -\infty < \hat{n} < \infty, \; 0 \le \hat{\omega} < 2\pi$
Fixed value of $\hat{n}$, varying $\hat{\omega}$ -- DFT Interpretation
Fixed value of $\hat{\omega}$, varying $\hat{n}$ -- Filter Bank Interpretation
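Summary
Short-Time Fourier Transform (STFT)
$X_{\hat{n}}(e^{j\hat{\omega}}) = \sum_{m=-\infty}^{\infty} x[m]\,w[\hat{n}-m]\,e^{-j\hat{\omega}m}, \quad -\infty < \hat{n} < \infty, \; 0 \le \hat{\omega} < 2\pi$
[Figure: the ($\hat{n}$, $\hat{\omega}$) plane sampled in time at $\hat{n} = 0, R, 2R, 3R, \ldots$ for $0 \le \hat{\omega} < 2\pi$]
DFT: $X_{\hat{n}}(e^{j\hat{\omega}}) = \sum_{m=\hat{n}-L+1}^{\hat{n}} x[m]\,w[\hat{n}-m]\,e^{-j\hat{\omega}m}$
$X_{\hat{n}}(e^{j\hat{\omega}}) = \mathrm{DFT}\left(x[m]\,w[\hat{n}-m]\right), \quad 0 \le \hat{\omega} < 2\pi, \; \hat{n} = 0, R, 2R, \ldots$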
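Summary – Modulation/Lowpass Filter
Short-Time Fourier Transform (STFT)
$X_{\hat{n}}(e^{j\hat{\omega}}) = \sum_{m=-\infty}^{\infty} x[m]\,w[\hat{n}-m]\,e^{-j\hat{\omega}m}, \quad -\infty < \hat{n} < \infty, \; 0 \le \hat{\omega} < 2\pi$
[Figure: the ($n$, $\hat{\omega}$) plane sampled in frequency at $\hat{\omega}_0, \hat{\omega}_1, \hat{\omega}_2, \ldots, \hat{\omega}_{L-1}$]
Filter Bank: $X_n(e^{j\hat{\omega}}) = \sum_{m=n-L+1}^{n} \left(x[m]\,e^{-j\hat{\omega}m}\right) w[n-m]$
$X_n(e^{j\hat{\omega}}) = \left(x[n]\,e^{-j\hat{\omega}n}\right) \ast w[n]$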
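Summary – Bandpass Filter/Demodulation
Short-Time Fourier Transform (STFT)
$X_{\hat{n}}(e^{j\hat{\omega}}) = \sum_{m=-\infty}^{\infty} x[m]\,w[\hat{n}-m]\,e^{-j\hat{\omega}m}, \quad -\infty < \hat{n} < \infty, \; 0 \le \hat{\omega} < 2\pi$
[Figure: the ($n$, $\hat{\omega}$) plane sampled in frequency at $\hat{\omega}_0, \hat{\omega}_1, \hat{\omega}_2, \ldots, \hat{\omega}_{L-1}$]
Filter Bank: $X_n(e^{j\hat{\omega}}) = \sum_{m=-\infty}^{\infty} x[n-m]\,e^{-j\hat{\omega}(n-m)}\,w[m]$
$X_n(e^{j\hat{\omega}}) = e^{-j\hat{\omega}n}\left[\left(w[n]\,e^{j\hat{\omega}n}\right) \ast x[n]\right]$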
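Summary – Modulation
Modulation:
$x[n]\,e^{j\hat{\omega}n} \leftrightarrow X(e^{j\omega}) \ast \mathrm{FT}\left(e^{j\hat{\omega}n}\right)$
$\qquad = X(e^{j\omega}) \ast \delta(\omega - \hat{\omega})$
$\qquad = X(e^{j(\omega-\hat{\omega})})$
[Figure: spectrum $X(e^{j\omega})$ occupying $-W$ to $W$, and the shifted spectrum $X(e^{j(\omega-\hat{\omega})})$ occupying $\hat{\omega}-W$ to $\hat{\omega}+W$]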
STFT Magnitude Only
• for many applications you only need the magnitude of the STFT (not the phase)
• in such cases, the bandpass filter implementation is less complex, since
$|X_n(e^{j\hat{\omega}})| = \left[a_n^2(\hat{\omega}) + b_n^2(\hat{\omega})\right]^{1/2}$
$\qquad = |\tilde{X}_n(e^{j\hat{\omega}})| = \left[\tilde{a}_n^2(\hat{\omega}) + \tilde{b}_n^2(\hat{\omega})\right]^{1/2}$
Sampling Rates of STFT
• need to sample STFT in both time and frequency to produce an unaliased representation from which x(n) can be exactly recovered
• sampling rates lower than the theoretical minimum rate can be used, in either time or frequency, and x(n) can still be exactly recovered from the aliased (under-sampled) short-time transform
– this is useful for spectral estimation, pitch estimation, formant estimation, speech spectrograms, vocoders
– for applications where the signal is modified, e.g., speech enhancement, cannot undersample STFT and still recover modified signal exactly
Sampling Rate in Time
• to determine the sampling rate in time, we take a linear filtering view
1. $X_n(e^{j\hat{\omega}})$ is the output of a filter with impulse response $w(n)$
2. $W(e^{j\omega})$ is a lowpass response with effective bandwidth of $B$ Hertz
• thus the effective bandwidth of $X_n(e^{j\hat{\omega}})$ is $B$ Hertz => $X_n(e^{j\hat{\omega}})$ has to be sampled at a rate of $2B$ samples/second to avoid aliasing
Example: Hamming Window
$w(n) = 0.54 - 0.46\cos\left(2\pi n/(L-1)\right), \quad 0 \le n \le L-1$
$w(n) = 0$ otherwise
=> $B \approx 2F_S/L$ (Hz); for $L = 400$, $F_S = 10{,}000$ Hz => $B = 50$ Hz => need rate of 100/sec (every 100 samples) for sampling rate in time
Sampling Rate in Frequency
• since $X_{\hat{n}}(e^{j\hat{\omega}})$ is periodic in $\hat{\omega}$ with period $2\pi$, it is only necessary to sample over an interval of length $2\pi$
• need to determine an appropriate finite set of frequencies, $\hat{\omega}_k = 2\pi k/N$, $k = 0, 1, \ldots, N-1$, at which $X_{\hat{n}}(e^{j\hat{\omega}})$ must be specified to exactly recover $x(n)$
• use the Fourier transform interpretation of $X_{\hat{n}}(e^{j\hat{\omega}})$
1. if the window $w(n)$ is time-limited, then the inverse transform of $X_{\hat{n}}(e^{j\hat{\omega}})$ is time-limited
2. the sampling theorem requires that we sample $X_{\hat{n}}(e^{j\hat{\omega}})$ in the frequency dimension at a rate of at least twice its ('symmetric') "time width"
3. since the inverse Fourier transform of $X_{\hat{n}}(e^{j\hat{\omega}})$ is the signal $x(m)\,w(\hat{n}-m)$ and this signal is of duration $L$ samples (the duration of $w(n)$), then according to the sampling theorem $X_{\hat{n}}(e^{j\hat{\omega}})$ must be sampled (in frequency) at the set of frequencies
$\hat{\omega}_k = 2\pi k/L, \quad k = 0, 1, \ldots, L-1$ (where $L/2$ is the effective width of the window)
in order to exactly recover $x(n)$ from $X_{\hat{n}}(e^{j\hat{\omega}_k})$
• thus for a Hamming window of duration L=400 samples, we require that the STFT be evaluated at 400 or more uniformly spaced frequencies around the unit circle
“Total” Sampling Rate of STFT
• the “total” sampling rate for the STFT is the product of the sampling rates in time and frequency, i.e.,
SR = SR(time) x SR(frequency) = 2B x L samples/sec
B = frequency bandwidth of window (Hz)
L = time width of window (samples)
• for most windows of interest, B is a multiple of FS/L, i.e.,
B = C FS/L (Hz), with C=1 for Rectangular Window and C=2 for Hamming Window
SR = 2C FS samples/second
• can define an ‘oversampling rate’ of
SR/FS = 2C = oversampling rate of STFT as compared to conventional sampling representation of x(n)
– for RW, 2C=2; for HW, 2C=4 => range of oversampling is 2-4
– this oversampling gives a very flexible representation of the speech signal
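The sampling-rate arithmetic of the last three slides can be collected in one small calculation. This is a sketch only; the function name and the returned dictionary keys are hypothetical, and the window constant C (1 for rectangular, 2 for Hamming) is passed in by the caller.

```python
def stft_sampling_budget(fs_hz, window_len, c):
    """Sampling rates for the STFT given FS (Hz), window length L (samples) and C."""
    bandwidth_hz = c * fs_hz / window_len      # B = C * FS / L
    time_rate = 2 * bandwidth_hz               # 2B STFT samples per second in time
    hop_samples = fs_hz / time_rate            # input samples between analysis times
    freq_samples = window_len                  # at least L frequencies per analysis time
    total_rate = time_rate * freq_samples      # SR = 2B x L = 2C * FS
    return {
        "bandwidth_hz": bandwidth_hz,
        "time_rate_per_sec": time_rate,
        "hop_samples": hop_samples,
        "freq_samples": freq_samples,
        "total_rate": total_rate,
        "oversampling": total_rate / fs_hz,    # SR / FS = 2C
    }

# Hamming-window example from the slides: L = 400, FS = 10,000 Hz, C = 2.
hamming_budget = stft_sampling_budget(fs_hz=10_000, window_len=400, c=2)
rect_budget = stft_sampling_budget(fs_hz=10_000, window_len=400, c=1)
print(hamming_budget)
```

For the Hamming case this reproduces B = 50 Hz, a time rate of 100 per second (a shift of 100 samples), and an oversampling factor of 4.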
Mathematical Basis for Sampling the STFT
• assume sample in time at $\hat{n}_r = rR$, $-\infty < r < \infty$, and in frequency at $\hat{\omega}_k = 2\pi k/N$, $k = 0, 1, \ldots, N-1$
• sample values
$X_{rR}(e^{j\frac{2\pi}{N}k}) = \sum_{m=-\infty}^{\infty} w[rR-m]\,x[m]\,e^{-j\frac{2\pi}{N}km}$
$\qquad = e^{-j\frac{2\pi}{N}krR}\,\tilde{X}_{rR}(e^{j\frac{2\pi}{N}k})$
$\tilde{X}_{rR}(e^{j\frac{2\pi}{N}k}) = \sum_{m=-\infty}^{\infty} x[rR+m]\,w(-m)\,e^{-j\frac{2\pi}{N}km}$ (set $m' = m - rR$, then rename $m'$ as $m$)
• define DFT-type notation
$X_r(k) = X_{rR}(e^{j\frac{2\pi}{N}k}) = e^{-j\frac{2\pi}{N}krR}\,\tilde{X}_r(k)$
Sampling the STFT
• DFT Notation
$X_r[k] = X_{rR}(e^{j\frac{2\pi}{N}k}) = e^{-j\frac{2\pi}{N}krR}\,\tilde{X}_r[k]$
• let $w[-m] \ne 0$ for $0 \le m \le L-1$ (finite duration window with no zero-valued samples)
$\tilde{X}_r[k] = \sum_{m=0}^{L-1} x[rR+m]\,w[-m]\,e^{-j\frac{2\pi}{N}km}$ ($r$ fixed, $0 \le k \le N-1$)
• if $L \le N$ then (DFT defined with no aliasing => can recover sequence exactly using inverse DFT)
$x[rR+m]\,w[-m] = \frac{1}{N}\sum_{k=0}^{N-1} \tilde{X}_r[k]\,e^{j\frac{2\pi}{N}km}$ ($r$ fixed, $0 \le m \le N-1$)
• if $R \le L$ (IDFT defined with no aliasing), then all samples can be recovered from $X_r[k]$ ($R > L$ => gaps in sequence)
What We Have Learned So Far
1. $X_{\hat{n}}(e^{j\hat{\omega}}) = \sum_{m=-\infty}^{\infty} x(m)\,w(\hat{n}-m)\,e^{-j\hat{\omega}m}$
– function of $\hat{n}$ for sampled $\hat{\omega}$ (looks like a time sequence)
– function of $\hat{\omega}$ for sampled $\hat{n}$ (looks like a transform)
– $X_{\hat{n}}(e^{j\hat{\omega}})$ (no sampling rate reduction) defined for $\hat{n} = 1, 2, 3, \ldots$; $0 \le \hat{\omega} \le \pi$
2. $X_{\hat{n}}(e^{j\hat{\omega}}) = \mathrm{DTFT}\left[x(m)\,w(\hat{n}-m)\right]$ => $\hat{n}$ fixed, $\hat{\omega}$ variable, with time origin tied to $x(\hat{n})$
$X_{\hat{n}}(e^{j\hat{\omega}}) = e^{-j\hat{\omega}\hat{n}}\,\mathrm{DTFT}\left[x(\hat{n}+m)\,w(-m)\right]$ => $\hat{n}$ fixed, $\hat{\omega}$ variable, with time origin tied to $w(-m)$
3. Interpretations of $X_{\hat{n}}(e^{j\hat{\omega}})$
1. $\hat{n}$ fixed, $\hat{\omega}$ variable; $X_{\hat{n}}(e^{j\hat{\omega}}) = \mathrm{DTFT}\left[x(m)\,w(\hat{n}-m)\right]$ => DFT View
2. $\hat{n} = n$ variable, $\hat{\omega}$ fixed; $X_n(e^{j\hat{\omega}}) = \left(x(n)\,e^{-j\hat{\omega}n}\right) \ast w(n)$ => Linear Filtering view => filter bank implementation
What We Have Learned So Far
4. Signal Recovery from STFT
$x(m)\,w(\hat{n}-m) = \frac{1}{2\pi}\int_{-\pi}^{\pi} X_{\hat{n}}(e^{j\hat{\omega}})\,e^{j\hat{\omega}m}\,d\hat{\omega}$
$x(\hat{n}) = \frac{1}{2\pi w(0)}\int_{-\pi}^{\pi} X_{\hat{n}}(e^{j\hat{\omega}})\,e^{j\hat{\omega}\hat{n}}\,d\hat{\omega}$
5. Linear Filtering Interpretation
1. modulation-lowpass filter => $X_n(e^{j\hat{\omega}}) = w(n) \ast \left[x(n)\,e^{-j\hat{\omega}n}\right]$, $n$ variable, $\hat{\omega}$ fixed
$X_n(e^{j\hat{\omega}}) = \frac{1}{2\pi}\int_{-\pi}^{\pi} W(e^{j\theta})\,X(e^{j(\theta+\hat{\omega})})\,e^{j\theta n}\,d\theta$
2. bandpass filter-demodulation => $X_n(e^{j\hat{\omega}}) = e^{-j\hat{\omega}n}\left[\left(w(n)\,e^{j\hat{\omega}n}\right) \ast x(n)\right]$, $n$ variable, $\hat{\omega}$ fixed
What We Have Learned So Far
6. Sampling Rates in Time and Frequency
1. time: $W(e^{j\omega})$ has bandwidth of $B$ Hertz => $2B$ samples/sec rate
Hamming Window: $B = 2F_S/L$ (Hz)
2. frequency: $w(n)$ is time limited to $L$ samples => inverse of $X_n(e^{j\omega})$ is also time limited => need to sample in frequency at twice the (effective) time width of the time-limited sequence => $L$ frequency samples
3. total Sampling Rate: $2B \cdot L$ samples/sec
$B$ = frequency bandwidth of the window (Hz)
$L$ = effective time width of the window (samples)
$B = C \cdot F_S/L$ (Hz) => Sampling Rate $= 2B \cdot L = 2CF_S$ samples/second
- for Rectangular Window, $C = 1$
- for Hamming Window, $C = 2$
Spectrographic Displays
• Sound Spectrograph: one of the earliest embodiments of the time-dependent spectrum analysis techniques
– 2-second utterance repeatedly modulates a variable frequency oscillator, then bandpass filtered, and the average energy at a given time and frequency is measured and used as a crude measure of the STFT
– the energy is recorded by an ingenious electro-mechanical system on special electrostatic paper called teledeltos paper
– result is a two-dimensional representation of the time-dependent spectrum, with the vertical dimension showing spectrum level at a given frequency, the horizontal dimension showing spectral level at a given time, and spectrum magnitude represented by the darkness of the marking
– wide bandpass filters (300 Hz bandwidth) provide good temporal resolution and poor frequency resolution (resolve pitch pulses in time but not in frequency); this is called a wideband spectrogram
– narrow bandpass filters (45 Hz bandwidth) provide good frequency resolution and poor time resolution (resolve pitch pulses in frequency, but not in time); this is called a narrowband spectrogram
[Figure: Conventional Spectrogram (Every salt breeze comes from the sea)]
Digital Speech Spectrograms
• wideband spectrogram
– follows broad spectral peaks (formants) over time
– resolves most individual pitch periods as vertical striations since the IR of the analyzing filter is comparable in duration to a pitch period
– what happens for low-pitch males or high-pitch females?
– for unvoiced speech there are no vertical pitch striations
• narrowband spectrogram
– individual harmonics are resolved in voiced regions
– formant frequencies are still in evidence
– usually can see fundamental frequency
– unvoiced regions show no strong structure
Digital Speech Spectrograms
• Speech Parameters (“This is a test”):
– sampling rate: 16 kHz
– speech duration: 1.406 seconds
– speaker: male
• Wideband Spectrogram Parameters:
– analysis window: Hamming window
– analysis window duration: 6 msec (96 samples)
– analysis window shift: 0.625 msec (10 samples)
– FFT size: 512
– dynamic range of spectral log magnitudes: 40 dB
• Narrowband Spectrogram Parameters:
– analysis window: Hamming window
– analysis window duration: 60 msec (960 samples)
– analysis window shift: 6 msec (96 samples)
– FFT size: 1024
– dynamic range of spectral log magnitudes: 40 dB
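The parameter lists above convert directly between milliseconds and samples at the 16 kHz sampling rate. The sketch below does that conversion and also estimates the analysis bandwidth with the Hamming-window rule $B \approx 2F_S/L$ from the sampling-rate slides. The function name is hypothetical, and the frame count assumes frames are taken only where a full window fits (no padding), which the slides do not specify.

```python
def spectrogram_params(fs_hz, window_ms, shift_ms, duration_s):
    """Window length, shift, frame count and approximate Hamming bandwidth."""
    window_len = round(fs_hz * window_ms / 1000)
    shift = round(fs_hz * shift_ms / 1000)
    num_samples = round(fs_hz * duration_s)
    num_frames = 1 + (num_samples - window_len) // shift   # assumption: no padding
    bandwidth_hz = 2 * fs_hz / window_len                  # B ~ 2 FS / L (Hamming)
    return {"window_len": window_len, "shift": shift,
            "num_frames": num_frames, "bandwidth_hz": bandwidth_hz}

wideband = spectrogram_params(16_000, window_ms=6, shift_ms=0.625, duration_s=1.406)
narrowband = spectrogram_params(16_000, window_ms=60, shift_ms=6, duration_s=1.406)
print(wideband)
print(narrowband)
```

The estimated bandwidths differ by a factor of ten, matching the ratio of the two window durations; this is the time-resolution versus frequency-resolution trade that separates the two displays.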
Digital Speech Spectrograms
[Figure: four spectrogram panels]
Top Panel: 3 msec (48 samples) window
Second Panel: 6 msec (96 samples) window
Third Panel: 9 msec (144 samples) window
Fourth Panel: 30 msec (480 samples) window
Spectrogram Comparisons
[Figure: Spectrogram - Male; nfft = 1024, L = 80, Overlap = 75]
[Figure: Spectrogram - Female; nfft = 1024, L = 80, Overlap = 75]
Overlap Addition (OLA) Method
• based on normal FT interpretation of short-time spectrum
$X_{\hat{n}}(e^{j\omega_k}) \;\overset{\mathrm{DFT/IDFT}}{\longleftrightarrow}\; y_{\hat{n}}(m) = x(m)\,w(\hat{n}-m)$
• can reconstruct $x(m)$ by computing IDFT of $X_{\hat{n}}(e^{j\omega_k})$ and dividing out the window (assumed non-zero for all samples)
• this process gives $L$ signal values of $x(m)$ for each window => window can be moved by $L$ samples and the process repeated
• since $X_{\hat{n}}(e^{j\omega_k})$ is "undersampled" in time, it is highly susceptible to aliasing errors => need more robust synthesis procedure
Overlap Addition (OLA) Method
$y(n) = \sum_{m}\left[\sum_{k} X_m(e^{j\omega_k})\,e^{j\omega_k n}\right]$
• summation is for overlapping analysis sections
• for each value of $m$ where $X_m(e^{j\omega_k})$ is measured, do an inverse FT to give
$y_m(n) = L\,x(n)\,w(m-n)$ (where $L$ is the size of the FT)
$y(n) = \sum_{m} y_m(n) = L\,x(n)\sum_{m} w(m-n)$
• a basic property of the window is
$W(e^{j0}) = W(e^{j\omega})\big|_{\omega=0} = \sum_{n=0}^{N-1} w(n)$
• since any set of samples of the window are equivalent (by sampling arguments), then if $w(n)$ is sampled often enough we get (independent of $n$)
$\sum_{m} w(m-n) = W(e^{j0})$
$y(n) = L\,x(n)\,W(e^{j0})$ using overlap-added sections
[Figure: Overlap Addition of Bartlett and Hann Windows]
[Figure: Overlap Addition of Hamming Window, L=128]
Window Spectra
[Figure: DTFT of Bartlett (triangular), Hann and Hamming windows]
Hamming Window Spectra
[Figure: DTFTs of even-length, odd-length and modified odd-to-even length Hamming windows; zeros spaced at 2π/R give perfect reconstruction using OLA (even-length window)]
Overlap Addition (OLA) Method
• w(n) is an L-point Hamming window with R=L/4
• assume x(n)=0 for n<0
• time overlap of 4:1 for HW
• first analysis section begins at n=L/4
Overlap Addition (OLA) Method
• 4 overlapping sections contribute to each interval
• N-point FFT’s done using L speech samples, with N-L zeros padded at end to allow modifications without significant aliasing effects
• for a given value of n
$y(n) = x(n)\,w(R-n) + x(n)\,w(2R-n) + x(n)\,w(3R-n) + x(n)\,w(4R-n)$
$\qquad = x(n)\left[w(R-n) + w(2R-n) + w(3R-n) + w(4R-n)\right]$
$\qquad = x(n)\,W(e^{j0})/R$
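The 4:1 overlap-add identity can be sketched as follows. Two assumptions are made loudly here: the window is the "periodic" Hamming form with denominator L (rather than L-1), for which the shifted copies sum exactly to $W(e^{j0})/R$, and the DFT/IDFT pair is omitted because it is an identity when the short-time spectrum is not modified. The function names are hypothetical.

```python
import math

def periodic_hamming(L):
    """Hamming window with denominator L (assumed variant that overlap-adds to a constant)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / L) for n in range(L)]

def overlap_add(x, w, R):
    """Window sections every R samples, add them, and divide by W(e^{j0}) / R."""
    L = len(w)
    y = [0.0] * len(x)
    starts = list(range(0, len(x) - L + 1, R))
    for s in starts:
        for n in range(L):
            y[s + n] += x[s + n] * w[n]   # a DFT, modification and IDFT would go here
    gain = sum(w) / R                     # W(e^{j0}) / R
    return [v / gain for v in y], starts

L, R = 16, 4                              # R = L/4, i.e. 4:1 overlap
w = periodic_hamming(L)
x = [math.sin(0.4 * n) + 0.3 for n in range(64)]
y, starts = overlap_add(x, w, R)

# Only samples covered by four sections are expected to match x.
lo, hi = L - R, starts[-1] + R
max_err = max(abs(y[n] - x[n]) for n in range(lo, hi))
```

Samples near the two ends are covered by fewer than four sections, so they come out attenuated; that is the edge effect behind the slide's remarks about where the first analysis section begins.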
Filter Bank Summation (FBS)
• the filter bank interpretation of the STFT shows that for any frequency $\omega_k$, $X_n(e^{j\omega_k})$ is a lowpass representation of the signal in a band centered at $\omega_k$ ($\hat{n} = n$ for FBS)
$X_n(e^{j\omega_k}) = e^{-j\omega_k n}\sum_{m=-\infty}^{\infty} x(n-m)\,w_k(m)\,e^{j\omega_k m}$
where $w_k(m)$ is the lowpass window used at frequency $\omega_k$ (we have generalized the structure to allow a different lowpass window at each frequency $\omega_k$).
Filter Bank Summation
• define a bandpass filter and substitute it in the equation to give
$h_k(n) = w_k(n)\,e^{j\omega_k n}$
$X_n(e^{j\omega_k}) = e^{-j\omega_k n}\sum_{m=-\infty}^{\infty} x(n-m)\,h_k(m)$
[Figure: Filter Bank Summation]
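Filter Bank Interpretation of STFT (case: $w_k[n] = w[n]$)
$w[n] \leftrightarrow W(e^{j\omega})$
$h_k[n] = w[n]\,e^{j\omega_k n} \leftrightarrow H_k(e^{j\omega}) = W(e^{j\omega}) \otimes \mathrm{FT}\left(e^{j\omega_k n}\right)$
$\mathrm{FT}\left(e^{j\omega_k n}\right) = \sum_{n=-\infty}^{\infty} e^{j\omega_k n}\,e^{-j\omega n} = \sum_{n=-\infty}^{\infty} e^{-j(\omega-\omega_k)n} = \delta(\omega - \omega_k)$
$H_k(e^{j\omega}) = W(e^{j\omega}) \otimes \delta(\omega - \omega_k) = W(e^{j(\omega-\omega_k)})$
$v_k[n] = X_n(e^{j\omega_k}) = e^{-j\omega_k n} \cdot \left(x[n] \ast h_k[n]\right)$
$V_k(e^{j\omega}) = \left[H_k(e^{j\omega}) \cdot X(e^{j\omega})\right] \otimes \mathrm{FT}\left(e^{-j\omega_k n}\right)$
$\qquad = \left[H_k(e^{j\omega}) \cdot X(e^{j\omega})\right] \otimes \delta(\omega + \omega_k)$
$\qquad = \left[X(e^{j\omega}) \cdot W(e^{j(\omega-\omega_k)})\right] \otimes \delta(\omega + \omega_k)$
$\qquad = X(e^{j(\omega+\omega_k)}) \cdot W(e^{j\omega})$
[Figure: “lowpass” response $W(e^{j\omega})$ occupying $-\omega_0$ to $\omega_0$, and “single-sided, bandpass” response $H_k(e^{j\omega})$ occupying $\omega_k-\omega_0$ to $\omega_k+\omega_0$]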
Filter Bank Summation
• thus $X_n(e^{j\omega_k})$ is obtained by bandpass filtering $x(n)$ followed by modulation with the complex exponential $e^{-j\omega_k n}$. We can express this in the form
$y_k(n) = X_n(e^{j\omega_k})\,e^{j\omega_k n} = \sum_{m=-\infty}^{\infty} x(n-m)\,h_k(m)$
• thus $y_k(n)$ is the output of a bandpass filter with impulse response $h_k(n)$
[Figure: Filter Bank Summation]
Filter Bank Summation
• a practical method for reconstructing $x(n)$ from the STFT is as follows
1. assume we know $X_n(e^{j\omega_k})$ for a set of $N$ frequencies $\{\omega_k\}$, $k = 0, 1, \ldots, N-1$
2. assume we have a set of $N$ bandpass filters with impulse responses
$h_k(n) = w_k(n)\,e^{j\omega_k n}, \quad k = 0, 1, \ldots, N-1$
3. assume $w_k(n)$ is an ideal lowpass filter with cutoff frequency $\omega_{pk}$
- the frequency response of the bandpass filter is
$H_k(e^{j\omega}) = W_k(e^{j(\omega-\omega_k)})$
Filter Bank Summation
• consider a set of $N$ bandpass filters, uniformly spaced, so that the entire frequency band is covered
$\omega_k = \frac{2\pi k}{N}, \quad k = 0, 1, \ldots, N-1$
• also assume window the same for all channels, i.e., $w_k(n) = w(n)$, $k = 0, 1, \ldots, N-1$
• if we add together all the bandpass outputs, the composite response is
$\tilde{H}(e^{j\omega}) = \sum_{k=0}^{N-1} H_k(e^{j\omega}) = \sum_{k=0}^{N-1} W(e^{j(\omega-\omega_k)})$
• if $W(e^{j\omega})$ is properly sampled in frequency ($N \ge L$), where $L$ is the window duration, then it can be shown that
$\frac{1}{N}\sum_{k=0}^{N-1} W(e^{j(\omega-\omega_k)}) = w(0) \quad \forall\,\omega$ (FBS Formula)
[Figure: Filter Bank Summation, with output scaling 1/N]
Filter Bank Summation
• derivation of FBS formula
$w(n) \;\overset{\mathrm{FT/IFT}}{\longleftrightarrow}\; W(e^{j\omega})$
• if $W(e^{j\omega})$ is sampled in frequency at $N$ uniformly spaced points, the inverse discrete Fourier transform of the sampled version of $W(e^{j\omega})$ is (recall that sampling => multiplication <=> convolution => aliasing)
$\frac{1}{N}\sum_{k=0}^{N-1} W(e^{j\omega_k})\,e^{j\omega_k n} = \sum_{r=-\infty}^{\infty} w(n + rN)$
• an aliased version of $w(n)$ is obtained.
Filter Bank Summation
• If $w(n)$ is of duration $L$ samples, then $w(n) = 0$ for $n < 0$ and $n \ge L$, and no aliasing occurs due to sampling in frequency of $W(e^{j\omega})$. In this case if we evaluate the aliased formula for $n = 0$, we get
$\frac{1}{N}\sum_{k=0}^{N-1} W(e^{j\omega_k}) = w(0)$
• the FBS formula is seen to be equivalent to the formula above, since (according to the sampling theorem) any set of $N$ uniformly spaced samples of $W(e^{j\omega})$ is adequate
Filter Bank Summation
• the impulse response of the composite filter bank system is
$\tilde{h}(n) = \sum_{k=0}^{N-1} h_k(n) = \sum_{k=0}^{N-1} w(n)\,e^{j\omega_k n} = N\,w(0)\,\delta(n)$
• thus the composite output is
$y(n) = x(n) \ast \tilde{h}(n) = N\,w(0)\,x(n)$
• thus for FBS method, the reconstructed signal is
$y(n) = \sum_{k=0}^{N-1} y_k(n) = \sum_{k=0}^{N-1} X_n(e^{j\omega_k})\,e^{j\omega_k n} = N\,w(0)\,x(n)$
if $X_n(e^{j\omega_k})$ is sampled properly in frequency, and is independent of the shape of $w(n)$
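The FBS result $y(n) = N\,w(0)\,x(n)$ can be checked by brute force. This sketch sums $X_n(e^{j\omega_k})\,e^{j\omega_k n}$ over $N$ uniformly spaced channels for a causal window of length $L \le N$ and divides by $N\,w(0)$; the names and the test signal are hypothetical.

```python
import cmath
import math

def fbs_reconstruct(x, w, N):
    """y(n) = sum_k X_n(e^{j w_k}) e^{j w_k n}, w_k = 2 pi k / N, scaled by 1 / (N w(0))."""
    L = len(w)
    y = []
    for n in range(len(x)):
        acc = 0j
        for k in range(N):
            omega_k = 2 * math.pi * k / N
            # STFT at time n and frequency omega_k (x taken as zero for m < 0).
            X = sum(x[m] * w[n - m] * cmath.exp(-1j * omega_k * m)
                    for m in range(max(0, n - L + 1), n + 1))
            acc += X * cmath.exp(1j * omega_k * n)
        y.append(acc.real / (N * w[0]))
    return y

L = 8
w = [0.54 - 0.46 * math.cos(2 * math.pi * n / (L - 1)) for n in range(L)]
x = [math.cos(0.45 * n) - 0.2 * math.sin(1.9 * n) for n in range(24)]

y_ok = fbs_reconstruct(x, w, N=8)     # N >= L: only the r = 0 term survives
y_bad = fbs_reconstruct(x, w, N=4)    # N < L and w(4) != 0: extra terms w(rN) x(n - rN)
err_ok = max(abs(a - b) for a, b in zip(x, y_ok))
err_bad = max(abs(a - b) for a, b in zip(x, y_bad))
```

The second call illustrates the undersampled case: with $N = 4 < L$ the window sample $w(4)$ is non-zero, so a delayed copy of the signal leaks into the output.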
Filter Bank Summation
1/N
94
Filter Bank Summation
$$\begin{aligned}
y(n) &= \sum_{k=0}^{N-1} y_k(n) = \sum_{k=0}^{N-1} X_n(e^{j2\pi k/N})\,e^{j2\pi kn/N} \\
&= \sum_{k=0}^{N-1}\left[\sum_{m=-\infty}^{\infty} x(m)\,w(n-m)\,e^{-j2\pi km/N}\right] e^{j2\pi kn/N} \\
&= \sum_{m=-\infty}^{\infty} x(m)\,w(n-m) \sum_{k=0}^{N-1} e^{j2\pi k(n-m)/N} \\
&= \sum_{m=-\infty}^{\infty} x(m)\,w(n-m)\,N \sum_{r=-\infty}^{\infty} \delta(n-m-rN) \\
&= N \sum_{r=-\infty}^{\infty} w(rN)\,x(n-rN)
\end{aligned}$$
• if $w(n) \ne 0$ for $0 \le n \le L-1$, then $N \ge L$ ⇒ need only the $r = 0$ term: $y(n) = N\,w(0)\,x(n)$
• if $N < L$, then in order for $y(n) = x(n)$ (to within the gain $N\,w(0)$) you need the condition $w(rN) = 0$, $r = \pm 1, \pm 2, \ldots$
• 'undersampled' representation can still work--at least in theory
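The result y(n) = N Σ_r w(rN) x(n − rN) can be verified by brute force, computing X_n(e^{jω_k}) from its definition and summing the modulated channels. The sketch below is illustrative only: `fbs_reconstruct`, the random test signal, and the Hamming window are assumptions, and x(n) is taken to be zero outside the stored samples.

```python
import numpy as np

def fbs_reconstruct(x, w, N):
    """y(n) = sum_k X_n(e^{j w_k}) e^{j w_k n}, with
    X_n(e^{j w_k}) = sum_m x(m) w(n - m) e^{-j w_k m} and w_k = 2*pi*k/N."""
    L = len(w)
    omega_k = 2 * np.pi * np.arange(N) / N
    y = np.zeros(len(x), dtype=complex)
    for n in range(len(x)):
        # w(n - m) is nonzero only for n - L + 1 <= m <= n (causal window)
        m = np.arange(max(0, n - L + 1), n + 1)
        seg = x[m] * w[n - m]
        for wk in omega_k:
            X = np.sum(seg * np.exp(-1j * wk * m))  # X_n(e^{j w_k})
            y[n] += X * np.exp(1j * wk * n)
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal(40)
w = np.hamming(16)                     # L = 16

y_full = fbs_reconstruct(x, w, N=16)   # N >= L: expect N*w(0)*x(n)
y_under = fbs_reconstruct(x, w, N=8)   # N < L: aliased term w(8)*x(n-8) appears

print(np.max(np.abs(y_full - 16 * w[0] * x)))
```

In the undersampled case the output contains the extra term N w(8) x(n − 8), so exact recovery would need a window with w(rN) = 0 for r ≠ 0.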
95
Summary of FBS Method
• perfect reconstruction of $x(n)$ from $X_n(e^{j\omega_k})$ is possible using FBS under the following conditions:
1. $w(n)$ is a finite duration filter/window
2. $X_n(e^{j\omega_k})$ is sampled properly in both time and frequency
• perfect reconstruction of $x(n)$ from $X_n(e^{j\omega_k})$ is also possible using FBS under the following condition: $W(e^{j\omega})$ is perfectly bandlimited
• To avoid time aliasing, $X_n(e^{j\omega_k})$ must be evaluated at at least $L$ uniformly spaced frequencies, where $L$ is the window duration
- since a window of length $L$ samples has a frequency bandwidth from $2\pi/L$ (for the rectangular window, RW) to $4\pi/L$ (for the Hamming window, HW), the bandpass filters in FBS overlap in frequency, since the analysis frequencies are $\omega_k = 2\pi k/L$, $k = 0, 1, \ldots, L-1$
• there is a way (at least theoretically) for $X_n(e^{j\omega_k})$ to be evaluated in non-overlapping bands and for which $x(n)$ can still be exactly recovered
96
FBS Reconstruction in Non-Overlapping Bands
• assume window length for all bands is $L$ samples
• assume the same window is used for $N$ equally spaced frequency bands with analysis frequencies
$$\omega_k = \frac{2\pi k}{N}, \quad k = 0, 1, \ldots, N-1$$
where $N$ can be less than $L$
• assume $w(n)$ is an ideal lowpass filter with cutoff frequency
$$\omega_p = \frac{\pi}{N}$$
[Figure: example with N=6 equally spaced ideal filters]
97
FBS Reconstruction in Non-Overlapping Bands
• the composite impulse response for the FBS system is
$$\tilde{h}(n) = \sum_{k=0}^{N-1} w(n)\,e^{j\omega_k n} = w(n)\sum_{k=0}^{N-1} e^{j\omega_k n}$$
• defining a composite of the terms being summed as
$$p(n) = \sum_{k=0}^{N-1} e^{j\omega_k n} = \sum_{k=0}^{N-1} e^{j2\pi kn/N}$$
• we get for $\tilde{h}(n)$
$$\tilde{h}(n) = w(n)\,p(n)$$
• it is easy to show that $p(n)$ is a periodic train of impulses of the form
$$p(n) = N\sum_{r=-\infty}^{\infty} \delta(n - rN)$$
• giving for $\tilde{h}(n)$ the expression
$$\tilde{h}(n) = N\sum_{r=-\infty}^{\infty} w(rN)\,\delta(n - rN)$$
• thus the composite impulse response is the window sequence sampled at intervals of $N$ samples
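The claim that p(n), the sum over k of e^{j2πkn/N}, is a periodic impulse train N Σ_r δ(n − rN) is easy to confirm numerically. In this minimal sketch, N = 6 and the index range are arbitrary illustrative choices.

```python
import numpy as np

N = 6
n = np.arange(-12, 13)
k = np.arange(N)

# p(n) = sum_{k=0}^{N-1} exp(j*2*pi*k*n/N), evaluated for every n in the range
p = np.exp(2j * np.pi * np.outer(n, k) / N).sum(axis=1)

# Expected: N at multiples of N, zero elsewhere
expected = np.where(n % N == 0, N, 0)

print(np.max(np.abs(p - expected)))
```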
98
FBS Reconstruction in Non-Overlapping Bands
• for ideal LPF we have
$$w(n) = \frac{\sin(\pi n/N)}{\pi n}, \qquad w(rN) = \frac{\sin(\pi r)}{\pi r N} = \frac{1}{N}\,\delta(r)$$
giving
$$\tilde{h}(n) = \delta(n)$$
• other cases where perfect reconstruction is obtained
1. $w(n)$ is of finite length $L \le N$ and causal (no images)
2. $w(n)$ has length $> N$ and has the property
$$w(n) = \begin{cases} 1/N, & n = r_0 N \\ 0, & n = rN,\ r \ne r_0 \quad (r = 0, \pm 1, \pm 2, \ldots) \end{cases}$$
giving
$$\tilde{h}(n) = p(n)\,w(n) = \delta(n - r_0 N)$$
$$\tilde{H}(e^{j\omega}) = e^{-j\omega r_0 N} \;\Rightarrow\; y(n) = x(n - r_0 N)$$
[Figure: impulse response of ideal lowpass filter with cutoff frequency π/N]
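For the ideal lowpass case, the zeros of w(n) = sin(πn/N)/(πn) at nonzero multiples of N are what make the composite response collapse to δ(n). The sketch below checks this on a truncated index range; N = 6 and the range are assumptions made for illustration, and `np.sinc(x)` is NumPy's normalized sinc, sin(πx)/(πx).

```python
import numpy as np

N = 6
n = np.arange(-60, 61)

# Ideal lowpass impulse response with cutoff pi/N: sin(pi*n/N)/(pi*n) = sinc(n/N)/N
w = np.sinc(n / N) / N

# p(n) = N * sum_r delta(n - rN): value N at multiples of N, zero elsewhere
p = np.where(n % N == 0, float(N), 0.0)

# Composite impulse response h~(n) = w(n) * p(n)
h = w * p

print(h[n == 0][0])                 # sample at n = 0
print(np.max(np.abs(h[n != 0])))    # largest remaining sample
```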
99
Summary of FBS Reconstruction
• for perfect reconstruction using FBS methods
1. $w(n)$ does not need to be either time-limited or frequency-limited to exactly reconstruct $x(n)$ from $X_n(e^{j\omega_k})$
2. $w(n)$ just needs equally spaced zeros, spaced $N$ samples apart, for theoretically perfect reconstruction
• exact reconstruction of the input is possible with a number of frequency channels less than that required by the sampling theorem
• key issue is how to design digital filters that match these criteria
100
Practical Implementation of FBS
101
FBS and OLA Comparisons
102
FBS and OLA Comparisons
• filter bank summation method $\overset{\text{duals}}{\longleftrightarrow}$ overlap addition method
-- one depends on sampling relation in frequency
-- one depends on sampling relation in time
• FBS requires sampling in frequency be such that the window transform $W(e^{j\omega})$ obeys the relation
$$\frac{1}{N}\sum_{k=0}^{N-1} W\!\left(e^{j(\omega-\omega_k)}\right) = w(0) \quad \text{any } \omega$$
• OLA requires that sampling in time be such that the window obeys the relation
$$\sum_{r=-\infty}^{\infty} w(rR - n) = \frac{W(e^{j0})}{R} \quad \text{any } n$$
• the key to Short-Time Fourier Analysis is the ability to modify the short-time spectrum (via quantization, noise enhancement, signal enhancement, speed-up/slow-down, etc) and recover an "unaliased" modified signal
103
Overlap Addition (OLA) Method
• assume $X_n(e^{j\omega_k})$ sampled with period $R$ samples in time
$$Y_r(e^{j\omega_k}) = X_{rR}(e^{j\omega_k}), \quad r \text{ integer}, \quad 0 \le k \le N-1$$
• the Overlap Add Method is based on the summation
$$y(n) = \sum_{r=-\infty}^{\infty}\left[\frac{1}{N}\sum_{k=0}^{N-1} Y_r(e^{j\omega_k})\,e^{j\omega_k n}\right] \qquad \text{(OLA Method)}$$
• for each value of $r$, compute the inverse transform of $Y_r(e^{j\omega_k})$, giving the sequences
$$y_r(m) = x(m)\,w(rR - m), \quad -\infty < m < \infty$$
• the signal at time $n$ is obtained by summing the values at time $n$ of all the sequences $y_r(m)$ that overlap at time $n$, giving
$$y(n) = \sum_{r=-\infty}^{\infty} y_r(n) = x(n)\sum_{r=-\infty}^{\infty} w(rR - n)$$
• if $w(n)$ has a bandlimited FT and if $X_n(e^{j\omega_k})$ is properly sampled in time (i.e., $R$ small enough to avoid aliasing) then
$$\sum_{r=-\infty}^{\infty} w(rR - n) \approx \frac{W(e^{j0})}{R} \quad \text{-- independent of } n \text{, and}$$
$$y(n) = x(n)\,\frac{W(e^{j0})}{R} \quad \text{-- exact reconstruction of } x(n)$$
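The OLA condition, Σ_r w(rR − n) = W(e^{j0})/R for every n, can be checked for a specific window and hop size. The sketch below is a hedged illustration: the helper `ola_window_sum`, the periodic Hann window of length L = 32, and the hop R = L/4 are assumptions chosen because this combination is known to sum to a constant; other windows generally satisfy the relation only approximately.

```python
import numpy as np

def ola_window_sum(w, R, n_values):
    """Return sum_r w(rR - n) for each n, treating w as zero outside 0..L-1."""
    L = len(w)
    out = []
    for n in n_values:
        # w(rR - n) is nonzero only when 0 <= rR - n <= L - 1
        r_min = int(np.ceil(n / R))
        r_max = int(np.floor((n + L - 1) / R))
        out.append(sum(w[r * R - n] for r in range(r_min, r_max + 1)))
    return np.array(out)

L, R = 32, 8
w = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(L) / L)  # periodic Hann window

n_values = np.arange(0, 100)
s = ola_window_sum(w, R, n_values)
gain = w.sum() / R            # W(e^{j0}) / R

# Overlap-added output y(n) = x(n) * sum_r w(rR - n); rescale to recover x(n)
rng = np.random.default_rng(1)
x = rng.standard_normal(len(n_values))
y = x * s
x_hat = y / gain

print(np.max(np.abs(s - gain)), np.max(np.abs(x_hat - x)))
```

Dividing by the constant W(e^{j0})/R recovers x(n) here because the window sum does not depend on n for this window and hop.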