Digital Speech Processing--Lecture 11

Modifications, Filter Bank Design Methods

Modifications to STFT

• modifications to short-time spectrum can be fixed (non-time varying) or time-varying

• fixed modification (no variability with $n$)

$$\hat{X}_n(e^{j\omega_k}) = X_n(e^{j\omega_k}) \cdot P(e^{j\omega_k})$$

• assume inverse DFT of $P(e^{j\omega_k})$ exists, then

$$p[n] = \frac{1}{N} \sum_{k=0}^{N-1} P(e^{j\omega_k})\, e^{j\omega_k n}$$

• where $N$ is the number of frequencies at which $P(e^{j\omega_k})$ is evaluated

Fixed Modifications using FBS

• using FBS methods we get

$$\hat{y}[n] = \sum_{k=0}^{N-1} X_n(e^{j\omega_k})\, P(e^{j\omega_k})\, e^{j\omega_k n}$$

$$= \sum_{k=0}^{N-1} \left[ \sum_{m=-\infty}^{\infty} w[n-m]\, x[m]\, e^{-j\omega_k m} \right] P(e^{j\omega_k})\, e^{j\omega_k n}$$

$$= \sum_{m=-\infty}^{\infty} w[n-m]\, x[m] \left[ \sum_{k=0}^{N-1} P(e^{j\omega_k})\, e^{j\omega_k (n-m)} \right]$$

$$= \sum_{m=-\infty}^{\infty} w[n-m]\, x[m]\, N\, p[n-m]$$

$$\hat{y}[n] = N\, x[n] * \left( w[n] \cdot p[n] \right)$$

Fixed Modifications using FBS

$$\hat{y}[n] = N\, x[n] * \left( w[n] \cdot p[n] \right)$$

• $\hat{y}[n]$ = $x[n]$ convolved with $w[n] \cdot p[n]$ => equivalent to linear filtering operation on $x[n]$

-- ideally want $\hat{y}[n] = x[n] * p[n]$ ⇒ need duration of $p[n]$ $\ll$ duration of $w[n]$

• $p[n]$ is a periodic sequence of period $N$ (sampled in frequency at $N$ frequencies)

• if $w[n]$ is longer than $N$ => repetitive structure in $w[n] \cdot p[n] = h_p[n]$ => need to avoid long filters (IIR) but instead use a rectangular window (RW) so that

$$h_p[n] = p[n] \cdot w[n] \approx p[n], \quad 0 \le n \le N-1$$
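The identity $\hat{y}[n] = N\,x[n] * (w[n]\,p[n])$ can be checked numerically. The sketch below is only an illustration under assumed values: the channel count `N = 8`, the random test signal, the Hamming window of length `N`, and the random gains `P` are all hypothetical choices, not values from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8                                  # number of analysis frequencies (assumed)
x = rng.standard_normal(40)            # arbitrary test signal
w = np.hamming(N)                      # causal window, zero outside 0..N-1
P = rng.standard_normal(N) + 1j * rng.standard_normal(N)   # arbitrary P(e^{j w_k})
omega = 2 * np.pi * np.arange(N) / N   # uniform analysis frequencies

def stft_at(n):
    """X_n(e^{j w_k}) = sum_m w[n-m] x[m] e^{-j w_k m} for all k."""
    X = np.zeros(N, dtype=complex)
    for m in range(len(x)):
        if 0 <= n - m < len(w):
            X += w[n - m] * x[m] * np.exp(-1j * omega * m)
    return X

def fbs_modified(n):
    """FBS synthesis of the modified spectrum at time n."""
    return np.sum(stft_at(n) * P * np.exp(1j * omega * n))

p = np.fft.ifft(P)                     # p[n] = (1/N) sum_k P e^{j w_k n}
h = N * w * p[:len(w)]                 # equivalent filter N * w[n] * p[n]
y_direct = np.array([fbs_modified(n) for n in range(len(x))])
y_conv = np.convolve(x, h)[:len(x)]
max_err = np.max(np.abs(y_direct - y_conv))
```

Because the window here is no longer than one period of $p[n]$, the product $w[n]\,p[n]$ contains a single copy of the modification's time response.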

Time-Varying Modifications

• represent time-varying modification as

$$\hat{X}_n(e^{j\omega_k}) = X_n(e^{j\omega_k}) \cdot P_n(e^{j\omega_k})$$

• with time-varying IR, $p_n[m]$, defined as

$$p_n[m] = \frac{1}{N} \sum_{k=0}^{N-1} P_n(e^{j\omega_k})\, e^{j\omega_k m}$$

• solve for $\hat{y}[n]$ as

$$\hat{y}[n] = \sum_{k=0}^{N-1} X_n(e^{j\omega_k})\, P_n(e^{j\omega_k})\, e^{j\omega_k n}$$

$$= \sum_{k=0}^{N-1} \left[ e^{-j\omega_k n} \sum_{m=-\infty}^{\infty} x[n-m]\, w[m]\, e^{j\omega_k m} \right] P_n(e^{j\omega_k})\, e^{j\omega_k n}$$

Time-Varying Modifications

$$\hat{y}[n] = \sum_{k=0}^{N-1} \sum_{m=-\infty}^{\infty} e^{-j\omega_k n}\, x[n-m]\, w[m]\, e^{j\omega_k m}\, P_n(e^{j\omega_k})\, e^{j\omega_k n}$$

$$= \sum_{m=-\infty}^{\infty} x[n-m]\, w[m] \sum_{k=0}^{N-1} P_n(e^{j\omega_k})\, e^{j\omega_k m}$$

$$= \sum_{m=-\infty}^{\infty} x[n-m]\, w[m]\, N\, p_n[m]$$

$$\hat{y}[n] = N \sum_{m=-\infty}^{\infty} x[n-m] \left[ w[m]\, p_n[m] \right] = N\, x[n] * \left[ p_n[m]\, w[m] \right]$$

• modified output is the window $w[m]$ weighted by the modification $p_n[m]$ and convolved with the input $x[n]$ ⇒ window 'time limits' effects of modifications and prevents smearing in time

• for FBS, spectral modifications lead to convolving the original signal with a time-limited, window-weighted version of the time response due to the modification

Fixed Modifications--OLA

• for a fixed modification we again have

$$\hat{X}_n(e^{j\omega_k}) = X_n(e^{j\omega_k})\, P(e^{j\omega_k})$$

• the basic OLA method gives (with $Y_r(e^{j\omega_k}) = X_{rR}(e^{j\omega_k})$)

$$y[n] = \sum_{r=-\infty}^{\infty} \left[ \frac{1}{N} \sum_{k=0}^{N-1} Y_r(e^{j\omega_k})\, e^{j\omega_k n} \right]$$

$$\hat{y}[n] = \sum_{r=-\infty}^{\infty} \frac{1}{N} \sum_{k=0}^{N-1} Y_r(e^{j\omega_k})\, P(e^{j\omega_k})\, e^{j\omega_k n}$$

$$= \frac{1}{N} \sum_{r=-\infty}^{\infty} \sum_{k=0}^{N-1} \left[ \sum_{\ell=-\infty}^{\infty} x[\ell]\, w[rR-\ell]\, e^{-j\omega_k \ell} \right] P(e^{j\omega_k})\, e^{j\omega_k n}$$

$$= \sum_{\ell=-\infty}^{\infty} x[\ell] \left[ \frac{1}{N} \sum_{k=0}^{N-1} P(e^{j\omega_k})\, e^{j\omega_k (n-\ell)} \right] \left[ \sum_{r=-\infty}^{\infty} w[rR-\ell] \right]$$

Fixed Modifications--OLA

$$\hat{y}[n] = \sum_{\ell=-\infty}^{\infty} x[\ell] \left[ \frac{1}{N} \sum_{k=0}^{N-1} P(e^{j\omega_k})\, e^{j\omega_k (n-\ell)} \right] \left[ \sum_{r=-\infty}^{\infty} w[rR-\ell] \right]$$

$$\hat{y}[n] = \sum_{\ell=-\infty}^{\infty} x[\ell]\, p[n-\ell] \left( \frac{W(e^{j0})}{R} \right) = \frac{W(e^{j0})}{R} \left[ x[n] * p[n] \right]$$

• $\hat{y}[n]$ is the convolution of $x[n]$ with the time response of the spectral modification ($p[n]$)--with no window modifications

• need to use larger FFT sizes to account for (prevent) aliasing due to duration of $p[n]$ ⇒ $N \ge N_0 + N_1 - 1$, where $N_0$ is the window size and $N_1$ is the duration of $p[n]$
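The FFT-size condition can be illustrated with a small overlap-add experiment. This is a sketch under assumptions, not the lecture's implementation: the periodic Hann window of length 64, hop `R = 16`, the 17-tap random `p`, and the helper name `ola_fixed_modification` are all hypothetical. With an FFT size of at least $N_0 + N_1 - 1$ the output matches $(W(e^{j0})/R)\,(x * p)$; with a smaller FFT the circular wrap-around shows up as error.

```python
import numpy as np

def ola_fixed_modification(x, w, p, R, nfft):
    """OLA synthesis with each frame's spectrum multiplied by P = DFT of p."""
    L = len(w)
    P = np.fft.fft(p, nfft)                              # P(e^{j w_k}) on nfft frequencies
    xp = np.concatenate([np.zeros(L), x, np.zeros(L)])   # pad so edge frames are complete
    y = np.zeros(len(xp) + nfft)
    for start in range(0, len(xp) - L + 1, R):
        frame = xp[start:start + L] * w                  # windowed segment
        y[start:start + nfft] += np.fft.ifft(np.fft.fft(frame, nfft) * P).real
    return xp, y

rng = np.random.default_rng(1)
L, R = 64, 16
w = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(L) / L)     # periodic Hann; shifts by R sum to a constant
x = rng.standard_normal(200)
p = rng.standard_normal(17)                              # N1 = 17 (assumed)

def max_error(nfft):
    """Largest deviation from (W(e^{j0})/R) * (x convolved with p)."""
    xp, y = ola_fixed_modification(x, w, p, R, nfft)
    ref = (w.sum() / R) * np.convolve(xp, p)
    return np.max(np.abs(y[:len(ref)] - ref))

err_ok = max_error(128)    # 128 >= 64 + 17 - 1 = 80: linear convolution per frame
err_bad = max_error(64)    # 64 < 80: time aliasing within each frame
```

Here `w.sum()` plays the role of $W(e^{j0})$, so the reference gain is $32/16 = 2$ for this window and hop.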

Time Varying Modifications--OLA

• for the case of a time-varying modification, using OLA, we obtain

$$\hat{y}[n] = \sum_{r=-\infty}^{\infty} \left[ \frac{1}{N} \sum_{k=0}^{N-1} Y_r(e^{j\omega_k})\, P_r(e^{j\omega_k})\, e^{j\omega_k n} \right]$$

• which after a great deal of manipulation can be put in the form

$$\hat{y}[n] = \sum_{\ell=-\infty}^{\infty} x[\ell] \sum_{r=-\infty}^{\infty} w[rR-\ell] \left[ \frac{1}{N} \sum_{k=0}^{N-1} P_r(e^{j\omega_k})\, e^{j\omega_k (n-\ell)} \right]$$

$$= \sum_{\ell=-\infty}^{\infty} \sum_{r=-\infty}^{\infty} x[\ell]\, w[rR-\ell]\, p_r[n-\ell]$$

Time Varying Modifications--OLA

• if we make the substitutions $q = n - \ell$, or $\ell = n - q$, we get

$$\hat{y}[n] = \sum_{q=-\infty}^{\infty} \sum_{r=-\infty}^{\infty} x[n-q]\, p_r[q]\, w[rR-n+q]$$

• if we let

$$\hat{p}[n-q, q] = \hat{p}[m, q] = \sum_{r=-\infty}^{\infty} p_r[q]\, w[rR-m]$$

then

$$\hat{y}[n] = \sum_{q=-\infty}^{\infty} x[n-q]\, \hat{p}[n-q, q]$$

• $\hat{p}[m, q]$ is the convolution of $p_r[q]$ and $w[r]$ for all $q$ => each coefficient of the time response due to the time-varying modification is smoothed by the window


Modifications with FBS and OLA

• OLA--any modification is first "bandlimited by the window" then acts as a true convolution on the input signal

• FBS--any modification is first "time limited by the window" and could change instantaneously

Additive Modifications--Quantization

• important to understand effects of additive, signal independent modifications to short-time spectrum as might occur for quantization

$$\hat{X}_n(e^{j\omega_k}) = X_n(e^{j\omega_k}) + E_n(e^{j\omega_k})$$

• where the additive sequence is

$$e[n] = \sum_{k=0}^{N-1} E_n(e^{j\omega_k})\, e^{j\omega_k n}$$

• for FBS method we get

$$\hat{y}[n] = \sum_{k=0}^{N-1} \left[ X_n(e^{j\omega_k}) + E_n(e^{j\omega_k}) \right] e^{j\omega_k n} = y[n] + e[n]$$

Additive Modifications--Quantization

=> additive spectral modification yields additive signal

• for OLA method we get

$$\hat{y}[n] = \sum_{r=-\infty}^{\infty} \frac{1}{N} \sum_{k=0}^{N-1} \left[ Y_r(e^{j\omega_k}) + E_r(e^{j\omega_k}) \right] e^{j\omega_k n}$$

$$= y[n] + \sum_{r=-\infty}^{\infty} \left[ \frac{1}{N} \sum_{k=0}^{N-1} E_r(e^{j\omega_k})\, e^{j\omega_k n} \right]$$

$$= y[n] + \sum_{r=-\infty}^{\infty} e_r[n]$$

⇒ larger additive signal because of overlap between analysis frames

• for a Hamming window (HW) there is about 4 times larger additive signal

STFT Summary

1. $X_{\hat{n}}(e^{j\hat{\omega}}) = \sum_{m=-\infty}^{\infty} w[\hat{n}-m]\, x[m]\, e^{-j\hat{\omega} m}$ -- STFT

2. $w[n]$ is the analysis window

3. $X_{\hat{n}}(e^{j\hat{\omega}})$ can be considered output of bandpass filter, translated to 0 frequency, or normal Fourier Transform of the sequence $w[\hat{n}-m]\, x[m]$

4. use the sampling theorem to define sampling rate in time and frequency domain representations of the window => 2-4 times higher sampling rates than for stationary signal

5. two synthesis procedures evolved

FBS => $y[n] = \sum_{k=0}^{N-1} X_n(e^{j\omega_k})\, e^{j\omega_k n}$ -- sum together the bandpass filter outputs

OLA => $y[n] = \sum_{r=-\infty}^{\infty} \frac{1}{N} \sum_{k=0}^{N-1} Y_r(e^{j\omega_k})\, e^{j\omega_k n}$

• with $Y_r(e^{j\omega_k}) = X_{rR}(e^{j\omega_k})$ => windowed segments spaced by $R$ samples in time are overlapped and added together

• FBS and OLA are dual methods

Digital Filter Bank Designs

• composite frequency response approximates flat magnitude and linear phase

• showed ideal conditions for perfect reconstruction
  – choose same w[n] for all channels of filter bank
  – w[n] has zeros equally spaced at N sample intervals
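The zero-spacing condition can be checked numerically. Summing the FBS channels gives $y[n] = N \sum_r w[rN]\, x[n - rN]$, so a window longer than $N$ still reconstructs $x[n]$ when $w[rN] = 0$ for $r \ne 0$. The sketch below assumes `N = 8`, a window spanning $-2N..2N$ built as a Hamming-tapered sinc, and a random test signal; all of these are illustrative choices rather than values from the lecture.

```python
import numpy as np

N = 8                                         # number of channels (assumed)
half = 2 * N                                  # window spans -2N..2N, longer than N
n_w = np.arange(-half, half + 1)
w = np.sinc(n_w / N) * np.hamming(len(n_w))   # w[0] = 1, zeros at n = +-N, +-2N
omega = 2 * np.pi * np.arange(N) / N

rng = np.random.default_rng(2)
x = rng.standard_normal(60)
m = np.arange(len(x))

def fbs_output(n):
    """y[n] = sum_k X_n(e^{j w_k}) e^{j w_k n}, with X_n = sum_m w[n-m] x[m] e^{-j w_k m}."""
    idx = n - m
    inside = np.abs(idx) <= half              # samples covered by the window
    wx = np.zeros(len(x))
    wx[inside] = w[idx[inside] + half] * x[inside]
    X = np.exp(-1j * np.outer(omega, m)) @ wx
    return np.sum(X * np.exp(1j * omega * n))

y = np.array([fbs_output(n) for n in range(len(x))])
max_err = np.max(np.abs(y - N * w[half] * x))  # expect y[n] = N * w[0] * x[n]
```

Replacing the sinc factor with a window that is nonzero at $n = \pm N$ would add scaled, shifted copies of $x[n]$ to the output, which is the echo the condition is meant to rule out.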

Uniform Filter Bank Designs

• simple design procedure for uniform bank of filters
  – choose number of filters (this is determined by the desired frequency spacing)
  – design LPF with desired frequency resolution and with zeros appropriately spaced in time

• problems with this ideal case:
  – may want non-uniform filter spacing (model ear characteristics)
  – certain frequency bands (e.g., 0-100 Hz) often omitted (almost no essential speech information)
  – no design procedure for LPF allows simultaneous constraints on both time and frequency response => it is very difficult to design the desired filters

Modified Filter Bank Structure

• each channel in FB scaled by complex gain $P_k = |P_k|\, e^{j\phi_k}$

$$y[n] = \sum_{k=0}^{N-1} y_k[n] = \sum_{k=0}^{N-1} P_k\, X_n(e^{j\omega_k})\, e^{j\omega_k n}$$

$$h[n] = \sum_{k=0}^{N-1} P_k\, h_k[n] = \sum_{k=0}^{N-1} |P_k|\, w_k[n]\, e^{j(\omega_k n + \phi_k)}$$

• $P_k$ provides adjustment of gain and phase for each channel to make the overall composite filter bank better approximate the ideal filter bank


FB Designs--Step 1

• Step 1--choose set of analysis frequencies, $\omega_k$, $0 \le k \le N-1$

• assume symmetry in frequency => $\omega_{N-k} = 2\pi - \omega_k$; assume symmetry in time => $w_{N-k}[n] = w_k[n]$, giving

$$X_n(e^{j\omega_{N-k}}) = X_n^*(e^{j(2\pi - \omega_{N-k})}) = X_n^*(e^{j\omega_k})$$

• can show that for $N$ even we get

$$h[n] = P_0\, w_0[n] + 2 \sum_{k=1}^{N/2-1} |P_k|\, w_k[n] \cos(\omega_k n + \phi_k) + P_{N/2}\, w_{N/2}[n]\, (-1)^n$$

• and for $N$ odd we get

$$h[n] = P_0\, w_0[n] + 2 \sum_{k=1}^{(N-1)/2} |P_k|\, w_k[n] \cos(\omega_k n + \phi_k)$$

• given set of frequencies $\{\omega_k\}$, $0 \le k \le N-1$, we need to design set of lowpass filters (or analysis windows) $\{w_k[n]\}$ with the desired frequency resolution and the desired composite response
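The even-$N$ and odd-$N$ cosine forms follow from pairing channel $k$ with channel $N-k$. A minimal numerical check is sketched below, assuming uniform frequencies $\omega_k = 2\pi k/N$, a common window value `w` for all channels, and randomly drawn conjugate-symmetric gains; the helper names are hypothetical.

```python
import numpy as np

def symmetric_gains(N, rng):
    """Random complex gains with P[N-k] = conj(P[k]); P[0] (and P[N/2]) real."""
    P = rng.standard_normal(N) + 1j * rng.standard_normal(N)
    P[0] = P[0].real
    for k in range(1, N):
        if k > N - k:
            P[k] = np.conj(P[N - k])
        elif k == N - k:
            P[k] = P[k].real
    return P

def composite_complex(P, w, n):
    """h[n] = sum_k P_k w e^{j w_k n} with w_k = 2 pi k / N and common window value w."""
    N = len(P)
    k = np.arange(N)
    return w * np.sum(P * np.exp(2j * np.pi * k * n / N))

def composite_cosine(P, w, n):
    """Same h[n] written with cosines, as in the N-even / N-odd expressions."""
    N = len(P)
    omega = 2 * np.pi * np.arange(N) / N
    h = P[0].real * w
    K = N // 2 - 1 if N % 2 == 0 else (N - 1) // 2
    for k in range(1, K + 1):
        h += 2 * np.abs(P[k]) * w * np.cos(omega[k] * n + np.angle(P[k]))
    if N % 2 == 0:
        h += P[N // 2].real * w * (-1) ** n     # channel at omega = pi
    return h

rng = np.random.default_rng(3)
worst = 0.0
for N in (7, 8):
    P = symmetric_gains(N, rng)
    for n in range(20):
        diff = abs(composite_complex(P, 0.7, n) - composite_cosine(P, 0.7, n))
        worst = max(worst, diff)
```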

Effect of Omitted Channels

• channels are often omitted because they contain virtually no useful information (e.g., channel around $\omega = 0$ or around $\omega = \pi$)

• to see the effect of omitting channels from the filter bank, assume uniform frequency spacing, $\omega_k = 2\pi k/N$, and identical analysis windows, $w_k[n] = w[n]$

$$h[n] = \sum_{k=0}^{N-1} w[n]\, P_k\, e^{j 2\pi k n/N} = w[n] \sum_{k=0}^{N-1} P_k\, e^{j 2\pi k n/N}$$

• defining $p[n] = \sum_{k=0}^{N-1} P_k\, e^{j 2\pi k n/N}$, we get

$$h[n] = w[n]\, p[n]$$

where $p[n]$ is periodic with period $N$.


• to omit channels, we set $P_k = 0$ for those channels that are omitted, e.g., omit channel 0 ($k = 0$), and all channels above $\omega_M = 2\pi M/N$, i.e., $P_k = 0$ for $k > M$ (together with the symmetric channels below $k = N - M$)

• in this case we get

$$p[n] = \sum_{k=1}^{M} e^{j 2\pi k n/N} + \sum_{k=N-M}^{N-1} e^{j 2\pi k n/N}$$

$$p[n] = \frac{\sin\left[ \frac{\pi}{N} (2M+1)\, n \right]}{\sin\left[ \frac{\pi}{N}\, n \right]} - 1$$


$$p[n] = \frac{\sin\left[ \frac{\pi}{N} (2M+1)\, n \right]}{\sin\left[ \frac{\pi}{N}\, n \right]} - 1$$

• p[n] periodic with period N=15

• pulse amplitudes and widths depend on N and M

• for M=(N-1)/2 - all channels included - p[n] is an impulse train spaced every N=15 samples
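The closed form for $p[n]$ can be compared with the direct sum over the retained channels. The sketch below uses $N = 15$ as on the slide; the value $M = 3$ and the function names are illustrative assumptions. Channel 0 is omitted in both functions, which is where the $-1$ comes from.

```python
import numpy as np

def p_direct(n, N, M):
    """Sum of e^{j 2 pi k n / N} over retained channels k = 1..M and N-M..N-1."""
    k = np.concatenate([np.arange(1, M + 1), np.arange(N - M, N)])
    return np.sum(np.exp(2j * np.pi * np.outer(n, k) / N), axis=1).real

def p_closed(n, N, M):
    """sin(pi (2M+1) n / N) / sin(pi n / N) - 1, with the limit 2M at multiples of N."""
    n = np.asarray(n, dtype=float)
    num = np.sin(np.pi * (2 * M + 1) * n / N)
    den = np.sin(np.pi * n / N)
    out = np.full(n.shape, 2.0 * M)          # value where the ratio is 0/0
    ok = np.abs(den) > 1e-12
    out[ok] = num[ok] / den[ok] - 1.0
    return out

n = np.arange(0, 45)                          # three periods for N = 15
N = 15
agree = np.allclose(p_direct(n, N, 3), p_closed(n, N, 3))
p_all = p_direct(np.array([0, 1, 15]), N, (N - 1) // 2)   # all channels except k = 0
```

With $M = (N-1)/2$ and channel 0 still omitted, `p_all` is $N - 1$ at multiples of $N$ and $-1$ elsewhere; adding channel 0 back removes the $-1$ and leaves the impulse train described on the slide.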

Effects of Imperfect LPF Design

$$h[n] = w[n]\, p[n]$$

• assume we can delay $w[n]$ by arbitrary amount $n_0$, giving $w[n - n_0]$

Case 1: $h[n] = \alpha_2\, \delta[n - N] + \alpha_3\, \delta[n - 2N]$

Case 2: $h[n] = \alpha_1'\, \delta[n - n_0] + \alpha_2'\, \delta[n - N - n_0] + \alpha_3'\, \delta[n - 2N - n_0]$

Effects of Imperfect LPF Design

• to achieve a shift of $n_0$ samples, let

$$P_k = e^{-j 2\pi k n_0/N}, \quad 0 \le k \le N-1$$

$$p[n] = \sum_{k=0}^{N-1} e^{j \frac{2\pi}{N} k (n - n_0)} = N \sum_{r=-\infty}^{\infty} \delta[n - n_0 - rN]$$
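A short numerical check of the shifted impulse train is sketched below. The values `N = 15` and `n0 = 4` are assumed for illustration only.

```python
import numpy as np

N, n0 = 15, 4                                   # period and desired shift (assumed values)
k = np.arange(N)
P = np.exp(-2j * np.pi * k * n0 / N)            # linear-phase channel gains P_k

n = np.arange(-N, 2 * N)                        # three periods, including negative n
p = np.array([np.sum(P * np.exp(2j * np.pi * k * ni / N)) for ni in n])

# Expected: N at n = n0 + rN, zero elsewhere
expected = np.where((n - n0) % N == 0, float(N), 0.0)
max_err = np.max(np.abs(p - expected))
pulse_positions = n[np.abs(p) > 0.5]
```

Since only the phases of $P_k$ are changed, all channel magnitudes stay at 1 and the pulse train simply moves by $n_0$ samples relative to the window.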

IIR Filter Banks

• assume $F_s = 10{,}000$ samples/sec

• assume uniform filter bank with spacing 100 Hz => $N = 10{,}000/100 = 100$, $\omega_k = 2\pi k/N$

• analysis range of 100 Hz to 3000 Hz => $M = 30$ channels, $1 \le k \le M$

• Bessel (maximally flat delay) filters derived from 6th order analog filter

• 30 msec duration impulse response => 300 samples at 10,000 Hz SR

• 60 Hz filter bandwidth

IIR Filter Banks

• for M=30, N=100, we get

$$p[n] = \frac{\sin(0.61\pi n)}{\sin(0.01\pi n)} - 1$$

• broadening of IR from 30 channels

• since IR duration > 2N samples, there are at least 2 strong peaks in $h[n]$

• significant ripple in composite frequency response (4 dB, 25 degrees)

• using delay to equalize amplitudes of first and third pulses, get better frequency response (0.8 dB ripple, 0.6 degrees phase deviation)
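As a quick check of the numbers in this example (a worked restatement of the values quoted above, not additional material):

$$N = \frac{F_s}{\Delta F} = \frac{10{,}000}{100} = 100, \qquad M = \frac{3000}{100} = 30, \qquad \frac{2M+1}{N} = \frac{61}{100} = 0.61$$

$$\text{IR duration} = 0.030\ \text{s} \times 10{,}000\ \text{samples/s} = 300\ \text{samples} > 2N = 200$$

Because $p[n]$ has a pulse every $N = 100$ samples, a 300-sample window response can overlap more than one of those pulses, which is consistent with the multiple strong peaks in $h[n] = w[n]\,p[n]$.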

Summary of FB Design

1. determine filter spacing and number of filters
2. design filter to meet frequency selectivity for each channel
3. evaluate w(n) and choose delay to minimize ripple
4. evaluate composite response and iterate solution if response does not meet specs

FIR Filter Banks

• exact linear phase designs when $w(n) = w(L-1-n)$

• good design methods--windowing, optimal equiripple designs

Window Design Method

• design ideal LPF as

$$W_d(e^{j\omega}) = \begin{cases} e^{-j\omega n_d}, & |\omega| \le \omega_p \\ 0, & \text{otherwise} \end{cases}$$

• where $n_d = (L-1)/2$ for $L$-point window

• can solve for ideal IR as

$$w_d[n] = \frac{1}{2\pi} \int_{-\omega_p}^{\omega_p} e^{-j\omega n_d}\, e^{j\omega n}\, d\omega = \frac{\sin\left[ \omega_p (n - n_d) \right]}{\pi (n - n_d)}$$

• use finite duration window to truncate $w_d(n)$ giving

$$w[n] = w_d[n]\, d[n - n_d], \quad 0 \le n \le 2 n_d = L-1$$

$$W(e^{j\omega}) = W_d(e^{j\omega}) \otimes D(e^{j\omega}) = \frac{1}{2\pi} \int_{-\pi}^{\pi} W_d(e^{j\theta})\, D(e^{j(\omega - \theta)})\, d\theta$$

• this leads to a non-zero transition region between the passband and stopband and ripples in both bands
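The window design method above can be sketched in a few lines. This is a minimal illustration under assumed values: `L = 101`, `omega_p = 0.2π`, and a Hamming window standing in for $d[n - n_d]$ are hypothetical choices, and `window_design_lowpass` is not a name from the lecture.

```python
import numpy as np

def window_design_lowpass(L, omega_p, window):
    """w[n] = w_d[n] * window[n], with w_d[n] = sin(omega_p (n - n_d)) / (pi (n - n_d))."""
    n_d = (L - 1) / 2
    t = np.arange(L) - n_d
    # np.sinc(x) = sin(pi x)/(pi x), so this equals sin(omega_p t)/(pi t), finite at t = 0
    w_d = (omega_p / np.pi) * np.sinc(omega_p * t / np.pi)
    return w_d * window

L, omega_p = 101, 0.2 * np.pi
w = window_design_lowpass(L, omega_p, np.hamming(L))

W = np.abs(np.fft.fft(w, 1024))       # magnitude response on 1024 frequencies
dc_gain = W[0]                         # passband gain at omega = 0
stopband_peak = W[256:513].max()       # 0.5*pi .. pi, well beyond the transition region
is_linear_phase = np.allclose(w, w[::-1])
```

The symmetry check corresponds to the exact linear phase condition $w(n) = w(L-1-n)$; the passband and stopband figures show the ripple that truncation introduces.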

Window Design Properties

1. transition region, $\Delta\omega$, inversely proportional to $L$
2. $W(e^{j\omega})$ antisymmetric around $\omega_p$
3. peak errors in passband and stopband nearly equal
4. approximation error greatest near $\omega_p$

Kaiser Window Designs

• Kaiser window designs close to optimal

$$d[n] = \begin{cases} \dfrac{I_0\left[ \alpha \sqrt{1 - (n/n_d)^2} \right]}{I_0(\alpha)}, & |n| \le n_d \\ 0, & \text{otherwise} \end{cases}$$

• $I_0(\cdot)$ is the zeroth order modified Bessel function of the first kind

• $\alpha$ is a tradeoff between transition width and peak approximation error

$$L = \frac{-20 \log_{10} \delta - 7.95}{14.36\, \Delta f} + 1, \qquad \Delta f = \Delta\omega / (2\pi)$$

Procedure for design:

1. $\Delta f$ and $\delta$ chosen => $L$
2. $\alpha$ computed from formula
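The design procedure can be sketched as follows. The length formula is the one given above; the slide does not reproduce the formula for $\alpha$, so the sketch assumes Kaiser's commonly quoted empirical expression, which is an assumption about which formula is meant. The function names are hypothetical.

```python
import numpy as np

def kaiser_length(delta, delta_f):
    """L from the length formula; delta_f is the transition width divided by the sampling rate."""
    A = -20 * np.log10(delta)                     # attenuation in dB
    return int(np.ceil((A - 7.95) / (14.36 * delta_f) + 1))

def kaiser_alpha(delta):
    """Kaiser's empirical shape parameter (assumed to be the formula the slide refers to)."""
    A = -20 * np.log10(delta)
    if A > 50:
        return 0.1102 * (A - 8.7)
    if A >= 21:
        return 0.5842 * (A - 21) ** 0.4 + 0.07886 * (A - 21)
    return 0.0

def kaiser_window(L, alpha):
    """d[n] = I0(alpha sqrt(1 - (n/n_d)^2)) / I0(alpha), |n| <= n_d, returned on 0..L-1."""
    n_d = (L - 1) / 2
    n = np.arange(L) - n_d
    return np.i0(alpha * np.sqrt(1 - (n / n_d) ** 2)) / np.i0(alpha)

# 60 dB attenuation with a 200 Hz transition band at Fs = 9.6 kHz
L = kaiser_length(1e-3, 200 / 9600)
alpha = kaiser_alpha(1e-3)
d = kaiser_window(L, alpha)
```

For these inputs the sketch gives `L = 175` and `alpha ≈ 5.65`, the same figures quoted in the FIR filter bank example of this lecture; `np.kaiser(L, alpha)` evaluates the same window.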

Non-Uniform Filter Banks

Ideal Non-Uniform Designs

• filter bank bandpass filter response of form

$$h_k[n] = P_k\, w_k[n]\, e^{j\omega_k n}, \quad 0 \le k \le N-1$$

• consider designs using a common window, $d[n]$, for all channels

• the composite frequency response is

$$H(e^{j\omega}) = \sum_{k=0}^{N-1} P_k\, W_k(e^{j(\omega - \omega_k)})$$

• using the same design window for each channel gives

$$W_k(e^{j(\omega - \omega_k)}) = \frac{1}{2\pi} \int_{-\pi}^{\pi} W_{dk}(e^{j(\theta - \omega_k)})\, D(e^{j(\omega - \theta)})\, d\theta$$

Ideal Non-Uniform Designs

• solving for $H(e^{j\omega})$ gives

$$H(e^{j\omega}) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \left[ \sum_{k=0}^{N-1} P_k\, W_{dk}(e^{j(\theta - \omega_k)}) \right] D(e^{j(\omega - \theta)})\, d\theta$$

• letting

$$H_d(e^{j\omega}) = \sum_{k=0}^{N-1} P_k\, W_{dk}(e^{j(\omega - \omega_k)})$$

• gives

$$H(e^{j\omega}) = \frac{1}{2\pi} \int_{-\pi}^{\pi} H_d(e^{j\theta})\, D(e^{j(\omega - \theta)})\, d\theta$$

• where $H_d(e^{j\omega})$ is the desired composite frequency response

Ideal Non-Uniform Designs

• assume $P_k = 1$, $0 \le k \le N-1$, and that the bandwidths and center frequencies of $W_{dk}(e^{j(\omega - \omega_k)})$ are such that the entire frequency range $-\pi \le \omega \le \pi$ is covered, giving

$$H_d(e^{j\omega}) = e^{-j\omega n_d}, \quad -\pi \le \omega \le \pi$$

$$H(e^{j\omega}) = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{-j\theta n_d}\, D(e^{j(\omega - \theta)})\, d\theta = d[n_d]\, e^{-j\omega n_d}$$

• giving a composite impulse response of

$$h[n] = d[n_d]\, \delta[n - n_d]$$

• this says that if the composite desired response is flat with linear phase then the actual composite response of the filter bank, using filters all designed with the same window, is also ideal--independent of how the center frequencies and bandwidths are distributed, and no matter what filter design window is used

• this says that perfect reconstruction is theoretically possible using FIR filters with an arbitrary distribution of center frequencies and bandwidths
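The perfect-reconstruction claim can be illustrated with real bandpass filters whose ideal responses tile $0 \le |\omega| \le \pi$. The sketch below is an assumption-laden example: the band edges, the length-101 Kaiser window, and the helper `bandpass_bank` are hypothetical, and each real filter combines a channel with its mirror image at negative frequencies. The window is applied centered on the filter, so the surviving sample is the window's center value.

```python
import numpy as np

def bandpass_bank(edges, d):
    """Windowed ideal bandpass filters; band k covers edges[k] <= |omega| <= edges[k+1]."""
    L = len(d)
    n_d = (L - 1) // 2                       # L odd, so the delay is an integer
    t = np.arange(L) - n_d

    def lowpass(wc):
        # ideal lowpass IR sin(wc t)/(pi t), delayed by n_d samples
        return (wc / np.pi) * np.sinc(wc * t / np.pi)

    return [(lowpass(b) - lowpass(a)) * d for a, b in zip(edges[:-1], edges[1:])]

# Arbitrary, unequal band edges covering 0..pi (assumed for illustration)
edges = np.array([0.0, 0.05, 0.12, 0.30, 0.55, 1.0]) * np.pi
d = np.kaiser(101, 5.65)                     # one common design window
filters = bandpass_bank(edges, d)

h = np.sum(filters, axis=0)                  # composite impulse response
n_d = (len(d) - 1) // 2
expected = np.zeros(len(d))
expected[n_d] = d[n_d]                       # window center value times a delayed impulse
max_err = np.max(np.abs(h - expected))
```

Changing `edges` or swapping the Kaiser window for another window leaves the composite response a single delayed impulse, as long as the bands still cover the whole range without gaps or overlap.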

FIR Filter Bank Examples

• $F_s = 9.6$ kHz

• want 15 filters to cover range from 200-3200 Hz

• lowpass cutoff frequency is

$$F_p = \frac{\omega_p}{2\pi T} = \frac{3200 - 200}{2 \cdot 15} = 100 \text{ Hz}$$

• center frequencies

$$F_k = \frac{\omega_k}{2\pi T} = 200k + 100 \text{ Hz}, \quad 1 \le k \le 15$$

• desire 60 dB attenuation => $\alpha = 5.65$; with 200 Hz transition bands => $L = 175$


• Fs=9.6 kHz

• want 4 octave band filters => bandwidth doubles with each successive filter => bandwidths of 200, 400, 800 and 1600 Hz, with center frequencies of 300, 600, 1200 and 2400 Hz

• use L=175



• want narrower transition bands

• use larger value of L--301


Applications of STFT

• vocoders => voice coders, code speech at rates much lower than waveform coders
• removal of additive noise
• de-reverberation
• speed-up and slow-down of speech for speed learning, aids for the handicapped

Coding of STFT

• elements of STFT
  1. set of $\{\omega_k\}$ chosen to cover frequency range of interest
  2. $w_k(n)$--set of lowpass analysis windows
  3. $P_k$--set of complex gains to make composite frequency response as close to ideal as possible

⇒ goal is to sample STFT at rates lower than $x(n)$

Coding of STFT

• attenuation > 80 dB above 80 Hz

• System example
  – $F_s = 12195$ samples/sec
  – $N = 128$, $\omega_k = 2\pi k/128$, $F_k = 95.273\,k$ Hz
  – $w_k[n] = w[n]$ -- linear phase FIR filter, length 731 samples
  – $P_0 = 0$, $P_k = 0$ for $28 < k < 100$ -- preserved band up to 2690 Hz

Coding of STFT

• part a is the original

• part b has no quantization, but there is fuzziness in the spectrogram due to echo in the composite IR => the system sounds reverberant

• $P_k$ used to adjust phase and improve overall sound quality

Coding of STFT

• quantizing STFT's
  – BR=sampling rate x bits/sample
  – channel sampling rate depends on bandwidth of lowpass filter, e.g., 80 Hz BW => SR ≥160/sec
  – using lower SR => aliasing in time
  – narrower bandwidth LP => reverberance because of "holes" in spectrum => need to increase number of channels

Coding of STFT

• aliasing in time due to lower SR--160/100/80/60

• narrower bandwidth LP => 80/53/36 Hz => formant distortions

Coding of STFT

• channel coding and quantization

• use ADM with 1 bit/sample

• 28 channels--with real and imaginary outputs

• BR=2 x 28 x SR=56 SR

• for ADM SR=5-10 times Nyquist rate

• SR=500 Hz (part a), 375 Hz (part b), 250 Hz (part c)

Coding of STFT

• non-uniform coding and quantization

• 28 channels

• 100/sec SR (gives small amount of aliasing)

• coding log magnitude and phase using 3 bits for log magnitude and 4 bits for phase for channels 1-10; and 2 bits for log magnitude and 3 bits for phase for channels 11-28

• total rate of 16 Kbps
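The bit-rate figures for the two coding schemes can be reproduced with simple arithmetic. The sketch below only restates the numbers from the slides (28 channels, 1 bit/sample ADM on real and imaginary parts, and the 3+4 / 2+3 bit split at 100 samples/sec); the function names are hypothetical.

```python
def adm_bit_rate(channels, sr_hz, bits_per_sample=1):
    """BR = 2 (real and imaginary parts) x channels x SR x bits/sample."""
    return 2 * channels * sr_hz * bits_per_sample

def log_mag_phase_bit_rate(sr_hz, groups):
    """groups: list of (channel_count, log_magnitude_bits, phase_bits)."""
    bits_per_frame = sum(count * (mag + phase) for count, mag, phase in groups)
    return sr_hz * bits_per_frame

# ADM coding at the three channel sampling rates (parts a, b, c)
adm_rates = [adm_bit_rate(28, sr) for sr in (500, 375, 250)]

# Non-uniform coding: channels 1-10 use 3+4 bits, channels 11-28 use 2+3 bits
nonuniform_rate = log_mag_phase_bit_rate(100, [(10, 3, 4), (18, 2, 3)])
```

This gives 28000, 21000 and 14000 bits/sec for the ADM cases and 16000 bits/sec for the log magnitude and phase scheme, matching the 16 Kbps total quoted above.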

STFT Coding Summary

• sample channels at sufficient rate, quantize to 12 bits/sample => perceptually perfect reproduction of x(n) possible with bit rates of 100 Kbps

• reduced bit rate => quantize more coarsely, reduce sampling rate => 16 Kbps rates possible

• remove redundancy of the speech signal by higher order modeling prior to STFT operations

• cannot use SNR to evaluate STFT since degradations are perceived as modifications of speech quality and intelligibility--not additive noise => spectrograms, subjective quality testing

Phase Vocoder

• used for speed-up and slow-down of speech

[Audio examples: intro phase vocoder; channel vocoder; speeded-up speech; slowed down speech]

Additional Examples of Rate Changes in Speech

• Male Speaker
  – Original rate
  – Speeded up
  – Speeded up more
  – Slowed down
  – Slowed down more

• Female Speaker
  – Original rate
  – Speeded up
  – Speeded up more
  – Slowed down
  – Slowed down more

[Figure: Phase Vocoder, Time Expanded]

[Figure: Phase Vocoder, Time Compressed]

Channel Vocoder

• interpret STFT so that each channel can be thought of as a bandpass filter with center frequency $\omega_k$

• magnitude of STFT can be approximated by envelope detection on the BPF output (FWR and LPF)

• analyzer--bank of channels; need excitation info (the phase component) => V/UV detector, pitch detector

• synthesizer--channel signals control channel amplitude; excitation signals control detailed structure of output for a given channel; V/UV choice of excitation source

=> highly reverberant speech because of total lack of control of composite filter bank response


• 1200-9600 bps

• 600 bps for pitch and V/UV

• easy to modify pitch, timing

Channel Vocoder--2.4 kbps

