Digital Speech Processing, Lecture 11: Modifications, Filter Bank Design Methods


Page 1: Digital Speech Processing

Digital Speech Processing, Lecture 11

Modifications, Filter Bank Design Methods

Page 2: Digital Speech Processing

Modifications to STFT

• modifications to short-time spectrum can be fixed (non-time varying) or time-varying

• fixed modification (no variability with $n$)

$$\hat{X}_n(e^{j\omega_k}) = X_n(e^{j\omega_k}) \cdot P(e^{j\omega_k})$$

• assume inverse DFT of $P(e^{j\omega_k})$ exists, then

$$p[n] = \frac{1}{N}\sum_{k=0}^{N-1} P(e^{j\omega_k})\, e^{j\omega_k n}$$

where $N$ is the number of frequencies at which $P(e^{j\omega_k})$ is evaluated

Page 3: Digital Speech Processing

Fixed Modifications using FBS

• using FBS methods we get

$$\hat{y}[n] = \sum_{k=0}^{N-1} X_n(e^{j\omega_k})\, P(e^{j\omega_k})\, e^{j\omega_k n}$$

$$= \sum_{k=0}^{N-1}\left[\sum_{m=-\infty}^{\infty} w[n-m]\, x[m]\, e^{-j\omega_k m}\right] P(e^{j\omega_k})\, e^{j\omega_k n}$$

$$= \sum_{m=-\infty}^{\infty} w[n-m]\, x[m] \left[\sum_{k=0}^{N-1} P(e^{j\omega_k})\, e^{j\omega_k (n-m)}\right]$$

$$= \sum_{m=-\infty}^{\infty} w[n-m]\, x[m]\, N\, p[n-m]$$

$$\hat{y}[n] = N\, x[n] * \left[w[n] \cdot p[n]\right]$$

Page 4: Digital Speech Processing

Fixed Modifications using FBS

$$\hat{y}[n] = N\, x[n] * \left[w[n] \cdot p[n]\right]$$

• $\hat{y}[n]$ = $x[n]$ convolved with $[w[n] \cdot p[n]]$ => equivalent to linear filtering operation on $x[n]$

 – ideally want $\hat{y}[n] = x[n] * p[n]$ ⇒ need duration of $p[n]$ shorter than duration of $w[n]$

• $p[n]$ is a periodic sequence of period $N$ (sampled in frequency at $N$ frequencies)

• if $w[n]$ is longer than $N$ => repetitive structure in $p[n] \cdot w[n] = h_p[n]$ => need to avoid long filters (IIR) but instead use RW (rectangular window) so that

$$h_p[n] = p[n] \cdot w[n] \approx p[n], \quad 0 \le n \le N-1$$
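The FBS result $\hat{y}[n] = N\,x[n] * [w[n] \cdot p[n]]$ can be checked numerically. The sketch below is only an illustration under assumed values: the signal, the 6-point Hamming window, $N = 8$ uniformly spaced frequencies $\omega_k = 2\pi k/N$ and the random $P(e^{j\omega_k})$ are not from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8                                  # number of analysis frequencies (assumed)
x = rng.standard_normal(40)            # test signal (assumed)
w = np.hamming(6)                      # causal window w[0..5], shorter than N (assumed)
P = rng.standard_normal(N) + 1j * rng.standard_normal(N)   # arbitrary P(e^{j w_k})
omega = 2 * np.pi * np.arange(N) / N   # uniform frequencies w_k = 2 pi k / N

def stft_at(n):
    """X_n(e^{j w_k}) = sum_m w[n-m] x[m] e^{-j w_k m}, for all k at once."""
    X = np.zeros(N, dtype=complex)
    for m in range(len(x)):
        if 0 <= n - m < len(w):
            X += w[n - m] * x[m] * np.exp(-1j * omega * m)
    return X

# FBS synthesis with the fixed modification applied in every channel
y_fbs = np.array([np.sum(stft_at(n) * P * np.exp(1j * omega * n))
                  for n in range(len(x))])

# Time-domain prediction: N * x convolved with (w . p), where p = inverse DFT of P
p = np.fft.ifft(P)                               # p[n] = (1/N) sum_k P_k e^{j w_k n}
h = N * w * p[np.arange(len(w)) % N]             # N * w[n] * p[n] (p is periodic in N)
y_pred = np.convolve(x, h)[:len(x)]

print(np.allclose(y_fbs, y_pred))
```

Because the window here is shorter than $N$, only one period of $p[n]$ falls under it; a window longer than $N$ would pick up the periodic repetitions described above.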

Page 5: Digital Speech Processing

Time-Varying Modifications

• represent time-varying modification as

$$\hat{X}_n(e^{j\omega_k}) = X_n(e^{j\omega_k}) \cdot P_n(e^{j\omega_k})$$

with time-varying IR, $p_n[m]$, defined as

$$p_n[m] = \frac{1}{N}\sum_{k=0}^{N-1} P_n(e^{j\omega_k})\, e^{j\omega_k m}$$

• solve for $\hat{y}[n]$ as

$$\hat{y}[n] = \sum_{k=0}^{N-1} X_n(e^{j\omega_k})\, P_n(e^{j\omega_k})\, e^{j\omega_k n}$$

$$= \sum_{k=0}^{N-1} e^{-j\omega_k n} \sum_{m=-\infty}^{\infty} x[n-m]\, w[m]\, e^{j\omega_k m}\, P_n(e^{j\omega_k})\, e^{j\omega_k n}$$

Page 6: Digital Speech Processing

Time-Varying Modifications

$$\hat{y}[n] = \sum_{k=0}^{N-1} e^{-j\omega_k n} \sum_{m=-\infty}^{\infty} x[n-m]\, w[m]\, e^{j\omega_k m}\, P_n(e^{j\omega_k})\, e^{j\omega_k n}$$

$$= \sum_{m=-\infty}^{\infty} x[n-m]\, w[m] \sum_{k=0}^{N-1} P_n(e^{j\omega_k})\, e^{j\omega_k m}$$

$$= \sum_{m=-\infty}^{\infty} x[n-m]\, w[m]\, N\, p_n[m]$$

$$\hat{y}[n] = N \sum_{m=-\infty}^{\infty} x[n-m] \left[w[m]\, p_n[m]\right] = N\, x[n] * \left[p_n[m]\, w[m]\right]$$

• modified output is the window $w[m]$ weighted by the modification $p_n[m]$ and convolved with the input $x[n]$

• window 'time limits' effects of modifications and prevents smearing in time

• for FBS, spectral modifications lead to convolving the original signal with a time-limited, window weighted version of the time response due to the modification

Page 7: Digital Speech Processing

Fixed Modifications-OLA

• for a fixed modification we again have

$$\hat{X}_n(e^{j\omega_k}) = X_n(e^{j\omega_k}) \cdot P(e^{j\omega_k})$$

• the basic OLA method gives

$$y[n] = \sum_{r=-\infty}^{\infty}\left[\frac{1}{N}\sum_{k=0}^{N-1} Y_r(e^{j\omega_k})\, e^{j\omega_k n}\right]$$

$$\hat{y}[n] = \frac{1}{N}\sum_{r=-\infty}^{\infty}\sum_{k=0}^{N-1} Y_r(e^{j\omega_k})\, P(e^{j\omega_k})\, e^{j\omega_k n}$$

$$= \frac{1}{N}\sum_{r=-\infty}^{\infty}\sum_{k=0}^{N-1}\left[\sum_{\ell=-\infty}^{\infty} x[\ell]\, w[rR-\ell]\, e^{-j\omega_k \ell}\right] P(e^{j\omega_k})\, e^{j\omega_k n}$$

$$= \sum_{\ell=-\infty}^{\infty} x[\ell] \left[\frac{1}{N}\sum_{k=0}^{N-1} P(e^{j\omega_k})\, e^{j\omega_k (n-\ell)}\right] \left[\sum_{r=-\infty}^{\infty} w[rR-\ell]\right]$$

Page 8: Digital Speech Processing

Fixed Modifications-OLA

$$\hat{y}[n] = \sum_{\ell=-\infty}^{\infty} x[\ell] \left[\frac{1}{N}\sum_{k=0}^{N-1} P(e^{j\omega_k})\, e^{j\omega_k (n-\ell)}\right] \left[\sum_{r=-\infty}^{\infty} w[rR-\ell]\right]$$

$$\hat{y}[n] = \sum_{\ell=-\infty}^{\infty} x[\ell]\, p[n-\ell] \left(\frac{W(e^{j0})}{R}\right) = \frac{W(e^{j0})}{R}\left[x[n] * p[n]\right]$$

• $\hat{y}[n]$ is the convolution of $x[n]$ with the time response of the spectral modification ($p[n]$), with no window modifications

• need to use larger FFT sizes to account for (prevent) aliasing due to duration of $p[n]$ ⇒ $N + N_0 - 1$, where $N$ is the window size and $N_0$ is the duration of $p[n]$
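The step that replaces $\sum_r w[rR-\ell]$ by the constant $W(e^{j0})/R$ relies on the shifted windows adding up to a constant. A minimal sketch of that property, assuming a periodic Hann window of length 64 and a hop of one quarter of the window length (both values are illustrative choices, not from the slides):

```python
import numpy as np

L = 64                                       # window length (assumed)
R = L // 4                                   # hop size in samples (assumed)
n = np.arange(L)
w = 0.5 - 0.5 * np.cos(2 * np.pi * n / L)    # periodic Hann window

# Overlap-add shifted copies of the window: sum over r of w[n - rR]
total = np.zeros(10 * L)
for start in range(0, len(total) - L + 1, R):
    total[start:start + L] += w

steady = total[L:-L]                         # drop the partially covered edges
W0 = w.sum()                                 # W(e^{j0}) = sum of the window samples

print(steady.min(), steady.max(), W0 / R)
```

With these choices the overlapped windows sum to exactly $W(e^{j0})/R$ away from the edges, which is why the window drops out of the OLA output for a fixed modification.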

Page 9: Digital Speech Processing

Time Varying Modifications-OLA

• for the case of a time-varying modification, using OLA, we obtain

$$\hat{y}[n] = \sum_{r=-\infty}^{\infty}\left[\frac{1}{N}\sum_{k=0}^{N-1} Y_r(e^{j\omega_k})\, P_r(e^{j\omega_k})\, e^{j\omega_k n}\right]$$

• which after a great deal of manipulation can be put in the form

$$\hat{y}[n] = \sum_{\ell=-\infty}^{\infty} x[\ell] \sum_{r=-\infty}^{\infty} w[rR-\ell] \left[\frac{1}{N}\sum_{k=0}^{N-1} P_r(e^{j\omega_k})\, e^{j\omega_k (n-\ell)}\right]$$

$$= \sum_{\ell=-\infty}^{\infty} x[\ell] \sum_{r=-\infty}^{\infty} w[rR-\ell]\, p_r[n-\ell]$$

Page 10: Digital Speech Processing

Time Varying Modifications-OLA

• if we make the substitutions $q = n - \ell$, or $\ell = n - q$, we get

$$\hat{y}[n] = \sum_{q=-\infty}^{\infty}\sum_{r=-\infty}^{\infty} x[n-q]\, p_r[q]\, w[rR-n+q]$$

• if we let

$$\hat{p}[n-q, q] = \hat{p}[m, q] = \sum_{r=-\infty}^{\infty} p_r[q]\, w[rR-m]$$

then

$$\hat{y}[n] = \sum_{q=-\infty}^{\infty} x[n-q]\, \hat{p}[n-q, q]$$

• $\hat{p}[m, q]$ is the convolution of $p_r[q]$ and $w[r]$ for all $q$ => each coefficient of the time response due to the time-varying modification is smoothed by the window

Page 11: Digital Speech Processing

Time Varying Modifications-OLA

Page 12: Digital Speech Processing

Modifications with FBS and OLA

• OLA: any modification is first "bandlimited by the window", then acts as a true convolution on the input signal

• FBS: any modification is first "time limited by the window" and could change instantaneously

Page 13: Digital Speech Processing

Additive Modifications-Quantization

• important to understand effects of additive, signal independent modifications to short-time spectrum as might occur for quantization

$$\hat{X}_n(e^{j\omega_k}) = X_n(e^{j\omega_k}) + E(e^{j\omega_k})$$

where the additive sequence is

$$e[n] = \sum_{k=0}^{N-1} E(e^{j\omega_k})\, e^{j\omega_k n}$$

• for FBS method we get

$$\hat{y}[n] = \sum_{k=0}^{N-1}\left[X_n(e^{j\omega_k}) + E(e^{j\omega_k})\right] e^{j\omega_k n} = y[n] + e[n]$$

Page 14: Digital Speech Processing

Additive Modifications-Quantization

• additive spectral modification yields additive signal

• for OLA method we get

$$\hat{y}[n] = \sum_{r=-\infty}^{\infty} \frac{1}{N}\sum_{k=0}^{N-1}\left(Y_r(e^{j\omega_k}) + E_r(e^{j\omega_k})\right) e^{j\omega_k n}$$

$$= y[n] + \sum_{r=-\infty}^{\infty}\left[\frac{1}{N}\sum_{k=0}^{N-1} E_r(e^{j\omega_k})\, e^{j\omega_k n}\right]$$

$$= y[n] + \sum_{r=-\infty}^{\infty} e_r[n]$$

⇒ larger additive signal because of overlap between analysis frames

• for a Hamming window (HW) there is about 4 times larger additive signal

Page 15: Digital Speech Processing

STFT Summary

1. STFT:

$$X_{\hat{n}}(e^{j\hat{\omega}}) = \sum_{m=-\infty}^{\infty} w[\hat{n}-m]\, x[m]\, e^{-j\hat{\omega} m}$$

2. $w[n]$ is the analysis window

3. $X_{\hat{n}}(e^{j\hat{\omega}})$ can be considered output of bandpass filter, translated to 0 frequency, or normal Fourier Transform of the sequence $x[m]\, w[\hat{n}-m]$

4. use the sampling theorem to define sampling rate in time and frequency domain representations of the window => 2-4 times higher sampling rates than for stationary signal

5. two synthesis procedures evolved

FBS => sum together the bandpass filter outputs:

$$y[n] = \sum_{k=0}^{N-1} X_n(e^{j\omega_k})\, e^{j\omega_k n}$$

OLA =>

$$y[n] = \sum_{r=-\infty}^{\infty} \frac{1}{N}\sum_{k=0}^{N-1} Y_r(e^{j\omega_k})\, e^{j\omega_k n}$$

• with $Y_r(e^{j\omega_k}) = X_{rR}(e^{j\omega_k})$ => windowed segments spaced by $R$ samples in time are overlapped and added together

• FBS and OLA are dual methods

Page 16: Digital Speech Processing

Digital Filter Bank Designs

Page 17: Digital Speech Processing

Digital Filter Bank Designs

• composite frequency response approximates flat magnitude and linear phase

• showed ideal conditions for perfect reconstruction
 – choose same $w[n]$ for all channels of filter bank
 – $w[n]$ has zeros equally spaced at $N$ sample intervals

Page 18: Digital Speech Processing

Uniform Filter Bank Designs

• simple design procedure for uniform bank of filters
 – choose number of filters (this is determined by the desired frequency spacing)
 – design LPF with desired frequency resolution and with zeros appropriately spaced in time

• problems with this ideal case:
 – may want non-uniform filter spacing (model ear characteristics)
 – certain frequency bands (e.g., 0-100 Hz) often omitted (almost no essential speech information)
 – no design procedure for LPF allows simultaneous constraints on both time and frequency response => it is very difficult to design the desired filters

Page 19: Digital Speech Processing

Modified Filter Bank Structure

• each channel in FB scaled by complex gain $P_k = |P_k|\, e^{j\phi_k}$

$$y[n] = \sum_{k=0}^{N-1} y_k[n] = \sum_{k=0}^{N-1} P_k\, X_n(e^{j\omega_k})\, e^{j\omega_k n}$$

$$h[n] = \sum_{k=0}^{N-1} P_k\, h_k[n] = \sum_{k=0}^{N-1} |P_k|\, w_k[n]\, e^{j(\omega_k n + \phi_k)}$$

• $P_k$ provides adjustment of gain and phase for each channel to make the overall composite filter bank better approximate the ideal filter bank

Page 20: Digital Speech Processing

Modified Filter Bank Structure

Page 21: Digital Speech Processing

FB Designs-Step 1

• Step 1--choose set of analysis frequencies, $\omega_k$, $0 \le k \le N-1$

• assume symmetry in frequency => $\omega_{N-k} = 2\pi - \omega_k$

• assume symmetry in time => $w_{N-k}[n] = w_k[n]$, giving

$$X_n(e^{j\omega_{N-k}}) = X_n(e^{-j\omega_k}) = X_n^*(e^{j\omega_k})$$

• can show that for $N$ even we get

$$h[n] = P_0\, w_0[n] + 2\sum_{k=1}^{N/2-1} |P_k|\, w_k[n] \cos(\omega_k n + \phi_k) + P_{N/2}\, w_{N/2}[n]\, (-1)^n$$

and for $N$ odd we get

$$h[n] = P_0\, w_0[n] + 2\sum_{k=1}^{(N-1)/2} |P_k|\, w_k[n] \cos(\omega_k n + \phi_k)$$

• given set of frequencies $\{\omega_k\}$, $0 \le k \le N-1$, we need to design set of lowpass filters (or analysis windows) $\{w_k[n]\}$ with the desired frequency resolution and the desired composite response
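The real cosine form of the composite response for even $N$ can be compared against the direct complex sum $h[n] = \sum_k P_k w_k[n] e^{j\omega_k n}$. This is a sketch under assumptions that are not in the slides: a common Hann window for every channel, uniform frequencies $\omega_k = 2\pi k/N$ with $N = 8$, and random gains satisfying $P_{N-k} = P_k^*$ with real $P_0$ and $P_{N/2}$.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 8                                    # even number of channels (assumed)
n = np.arange(32)
w = np.hanning(len(n))                   # common window w_k[n] = w[n] (assumed)
omega = 2 * np.pi * np.arange(N) / N     # uniform analysis frequencies (assumed)

# Complex gains with conjugate symmetry P_{N-k} = P_k^*, real P_0 and P_{N/2}
P = np.zeros(N, dtype=complex)
P[0], P[N // 2] = 0.7, -0.4
for k in range(1, N // 2):
    P[k] = rng.standard_normal() + 1j * rng.standard_normal()
    P[N - k] = np.conj(P[k])

# Direct complex sum over all N channels
h_complex = sum(P[k] * w * np.exp(1j * omega[k] * n) for k in range(N))

# Real cosine form for N even
h_real = P[0].real * w + P[N // 2].real * w * (-1.0) ** n
for k in range(1, N // 2):
    h_real += 2 * np.abs(P[k]) * w * np.cos(omega[k] * n + np.angle(P[k]))

print(np.allclose(h_complex.real, h_real), np.max(np.abs(h_complex.imag)))
```

The imaginary part of the complex sum vanishes (up to rounding) because the channels come in conjugate pairs.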

Page 22: Digital Speech Processing

Effect of Omitted Channels

• channels are often omitted because they contain virtually no useful information (e.g., channel around $\omega = 0$ or around $\omega = \pi$)

• to see the effect of omitting channels from the filter bank, assume uniform frequency spacing, $\omega_k = 2\pi k/N$, and identical analysis windows, $w_k[n] = w[n]$

$$h[n] = \sum_{k=0}^{N-1} w[n]\, P_k\, e^{j 2\pi k n/N} = w[n] \sum_{k=0}^{N-1} P_k\, e^{j 2\pi k n/N}$$

• defining $p[n] = \sum_{k=0}^{N-1} P_k\, e^{j 2\pi k n/N}$, we get

$$h[n] = w[n]\, p[n]$$

where $p[n]$ is periodic with period $N$.

Page 23: Digital Speech Processing

Effect of Omitted Channels

• to omit channels, we set $P_k = 0$ for those channels that are omitted, e.g., omit channel 0 ($\omega_0 = 0$), and all channels above $\omega_M = 2\pi M/N$:

$$P_k = 0 \ \text{for} \ k = 0, \qquad P_k = 0 \ \text{for} \ k > M$$

• in this case (with unit gain in the retained channels $1 \le k \le M$ and their mirror images $N-M \le k \le N-1$) we get

$$p[n] = \sum_{k=1}^{M} e^{j 2\pi k n/N} + \sum_{k=N-M}^{N-1} e^{j 2\pi k n/N}$$

$$p[n] = \frac{\sin\left[\frac{\pi}{N}(2M+1)\, n\right]}{\sin\left[\frac{\pi}{N}\, n\right]} - 1$$

Page 24: Digital Speech Processing

Effect of Omitted Channels

$$p[n] = \frac{\sin\left[\frac{\pi}{N}(2M+1)\, n\right]}{\sin\left[\frac{\pi}{N}\, n\right]} - 1$$

• $p[n]$ periodic with period $N = 15$

• pulse amplitudes and widths depend on $N$ and $M$

• for $M = (N-1)/2$, all channels included; $p[n]$ is an impulse train spaced every $N = 15$ samples
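The closed form for $p[n]$ can be checked against the direct sum over the retained channels. The sketch below assumes $N = 15$ (as on the slide) and an illustrative $M = 4$; unit gains in the retained channels are also an assumption.

```python
import numpy as np

N, M = 15, 4                              # M = 4 is an assumed example value
n = np.arange(-N, 2 * N + 1)

# Direct sum over retained channels k = 1..M and their mirrors k = N-M..N-1
k_keep = np.concatenate([np.arange(1, M + 1), np.arange(N - M, N)])
p_direct = np.exp(2j * np.pi * np.outer(n, k_keep) / N).sum(axis=1)

# Closed form; at multiples of N the ratio is replaced by its limit 2M + 1
num = np.sin(np.pi * (2 * M + 1) * n / N)
den = np.sin(np.pi * n / N)
on_period = (n % N == 0)
ratio = np.where(on_period, 2 * M + 1, num / np.where(on_period, 1.0, den))
p_closed = ratio - 1

print(np.allclose(p_direct.real, p_closed))
```

The peaks of height $2M$ repeat every $N$ samples, so a window $w[n]$ longer than $N$ lets more than one pulse of $p[n]$ into $h[n] = w[n]\,p[n]$.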

Page 25: Digital Speech Processing

Effects of Imperfect LPF Design

$$h[n] = w[n]\, p[n]$$

• assume we can delay $w[n]$ by arbitrary amount $n_0$, giving $w[n - n_0]$

Case 1:

$$h[n] = \alpha_2\, \delta[n - N] + \alpha_3\, \delta[n - 2N]$$

Case 2:

$$h[n] = \alpha'_1\, \delta[n - n_0] + \alpha'_2\, \delta[n - N - n_0] + \alpha'_3\, \delta[n - 2N - n_0]$$

Page 26: Digital Speech Processing

Effects of Imperfect LPF Design

• to achieve a shift of $n_0$ samples, let

$$P_k = e^{-j 2\pi k n_0/N}, \quad 0 \le k \le N-1$$

$$p[n] = \sum_{k=0}^{N-1} e^{j \frac{2\pi}{N} k (n - n_0)} = N \sum_{r=-\infty}^{\infty} \delta[n - n_0 - rN]$$
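A short numerical check of the shift property: linear-phase gains $P_k = e^{-j2\pi k n_0/N}$ move the impulse train $p[n]$ by $n_0$ samples. The values $N = 15$ and $n_0 = 4$ below are assumed for illustration only.

```python
import numpy as np

N, n0 = 15, 4                                   # assumed example values
k = np.arange(N)
P = np.exp(-2j * np.pi * k * n0 / N)            # linear-phase channel gains

n = np.arange(0, 3 * N)
# p[n] = sum_k P_k e^{j 2 pi k n / N}
p = (P[None, :] * np.exp(2j * np.pi * np.outer(n, k) / N)).sum(axis=1)

# Expected: N * sum_r delta[n - n0 - rN]
expected = np.where((n - n0) % N == 0, N, 0)

print(np.allclose(p, expected, atol=1e-9))
```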

Page 27: Digital Speech Processing

IIR Filter Banks

• assume $F_s = 10{,}000$ samples/sec

• assume uniform filter bank with spacing 100 Hz => $N = 10{,}000/100 = 100$, $\omega_k = 2\pi k/100$, $0 \le k \le M$

• analysis range of 100 Hz to 3000 Hz => $M = 30$ channels

• Bessel (maximally flat delay) filters derived from 6th order analog filter

• 30 msec duration impulse response => 300 samples at 10,000 Hz SR

• 60 Hz filter bandwidth

Page 28: Digital Speech Processing

IIR Filter Banks

• for $M = 30$, $N = 100$, we get

$$p[n] = \frac{\sin(0.61\pi n)}{\sin(0.01\pi n)} - 1$$

• broadening of IR from 30 channels

• since IR duration > $2N$ samples, there are at least 2 strong peaks in $h[n]$ => significant ripple in composite frequency response (4 dB, 25 degrees)

• using delay to equalize amplitudes of first and third pulses, get better frequency response (0.8 dB ripple, 0.6 degrees phase deviation)

Page 29: Digital Speech Processing

Summary of FB Design

1. determine filter spacing and number of filters
2. design filter to meet frequency selectivity for each channel
3. evaluate w(n) and choose delay to minimize ripple
4. evaluate composite response and iterate solution if response does not meet specs

Page 30: Digital Speech Processing

FIR Filter Banks

• exact linear phase designs when $w(n) = w(L-1-n)$

• good design methods--windowing, optimal equiripple designs

Window Design Method

• design ideal LPF as

$$W_d(e^{j\omega}) = \begin{cases} e^{-j\omega n_d}, & |\omega| \le \omega_p \\ 0, & \text{otherwise} \end{cases}$$

where $n_d = (L-1)/2$ for $L$-point window

• can solve for ideal IR as

$$w_d[n] = \frac{1}{2\pi}\int_{-\omega_p}^{\omega_p} e^{-j\omega n_d}\, e^{j\omega n}\, d\omega = \frac{\sin\left[\omega_p (n - n_d)\right]}{\pi (n - n_d)}$$

• use finite duration window $d[n]$ to truncate $w_d[n]$ giving

$$w[n] = w_d[n]\, d[n - n_d], \quad 0 \le n \le 2 n_d$$

$$W(e^{j\omega}) = W_d(e^{j\omega}) \circledast D(e^{j\omega}) = \frac{1}{2\pi}\int_{-\pi}^{\pi} W_d(e^{j\theta})\, D(e^{j(\omega-\theta)})\, d\theta$$

• this leads to a non-zero transition region between the passband and stopband and ripples in both bands
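The window design method above can be sketched in a few lines. The length $L = 51$, cutoff $\omega_p = 0.2\pi$ and the Hamming window are assumed example choices, not values from the lecture; the frequency response is evaluated on an FFT grid.

```python
import numpy as np

L = 51                          # filter length (assumed; odd, so n_d is an integer)
n_d = (L - 1) / 2
omega_p = 0.2 * np.pi           # lowpass cutoff (assumed)
n = np.arange(L)

# Ideal delayed lowpass IR: sin(omega_p (n - n_d)) / (pi (n - n_d)).
# np.sinc(x) = sin(pi x) / (pi x), which also handles n = n_d.
w_d = (omega_p / np.pi) * np.sinc(omega_p * (n - n_d) / np.pi)

d = np.hamming(L)               # finite-duration window centred at n_d (assumed choice)
w = w_d * d                     # windowed (truncated) design

# Magnitude response on a dense frequency grid
W = np.fft.fft(w, 4096)
freqs = 2 * np.pi * np.arange(4096) / 4096
passband_gain = np.abs(W[freqs <= 0.5 * omega_p])
stopband_gain = np.abs(W[(freqs >= 2 * omega_p) & (freqs <= np.pi)])

print(passband_gain.min(), passband_gain.max(), stopband_gain.max())
```

The symmetry $w[n] = w[L-1-n]$ gives exact linear phase, and the region between the two measured bands is the non-zero transition region mentioned above.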

Page 31: Digital Speech Processing

Window Design Properties

1. transition region, $\Delta\omega$, inversely proportional to $L$
2. $W(e^{j\omega})$ antisymmetric around $\omega_p$
3. peak errors in passband and stopband nearly equal
4. approximation error greatest near $\omega_p$

Page 32: Digital Speech Processing

Kaiser Window Designs

• Kaiser window designs close to optimal

$$d[n] = \begin{cases} \dfrac{I_0\left[\alpha\sqrt{1 - (n/n_d)^2}\right]}{I_0(\alpha)}, & |n| \le n_d \\ 0, & \text{otherwise} \end{cases}$$

• $I_0(\cdot)$ is the zeroth order modified Bessel function of the first kind

• $\alpha$ is a tradeoff between transition width and peak approximation error

$$L = \frac{-20\log_{10}\delta - 7.95}{14.36\, \Delta f} + 1, \qquad \Delta f = \Delta\omega/(2\pi)$$

Procedure for design:
1. $\delta$ and $\Delta f$ chosen => $L$
2. $\alpha$ computed from formula
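A minimal sketch of the Kaiser window formula, evaluated with NumPy's modified Bessel function `np.i0` and compared with NumPy's built-in `np.kaiser`. The length $L = 101$ and $\alpha = 5.0$ are assumed example values.

```python
import numpy as np

L, alpha = 101, 5.0                 # assumed example values
n_d = (L - 1) // 2
n = np.arange(-n_d, n_d + 1)        # window centred at n = 0, |n| <= n_d

# d[n] = I0(alpha * sqrt(1 - (n / n_d)^2)) / I0(alpha)
d = np.i0(alpha * np.sqrt(1 - (n / n_d) ** 2)) / np.i0(alpha)

# np.kaiser uses the same definition with beta in place of alpha
print(np.allclose(d, np.kaiser(L, alpha)))
```

Larger $\alpha$ tapers the window more strongly, which lowers the peak approximation error at the cost of a wider transition region for a given $L$.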

Page 33: Digital Speech Processing

Non-Uniform Filter Banks

Page 34: Digital Speech Processing

Ideal Non-Uniform Designs

• filter bank bandpass filter response of form

$$h_k[n] = P_k\, w_k[n]\, e^{j\omega_k n}, \quad 0 \le k \le N-1$$

• consider designs using a common window, $d[n]$, for all channels

• the composite frequency response is

$$H(e^{j\omega}) = \sum_{k=0}^{N-1} P_k\, W_k(e^{j(\omega - \omega_k)})$$

• using the same design window for each channel gives

$$W_k(e^{j(\omega - \omega_k)}) = \frac{1}{2\pi}\int_{-\pi}^{\pi} W_{dk}(e^{j(\theta - \omega_k)})\, D(e^{j(\omega - \theta)})\, d\theta$$

Page 35: Digital Speech Processing

Ideal Non-Uniform Designs

• solving for $H(e^{j\omega})$ gives

$$H(e^{j\omega}) = \frac{1}{2\pi}\int_{-\pi}^{\pi}\left[\sum_{k=0}^{N-1} P_k\, W_{dk}(e^{j(\theta - \omega_k)})\right] D(e^{j(\omega - \theta)})\, d\theta$$

• letting

$$H_d(e^{j\omega}) = \sum_{k=0}^{N-1} P_k\, W_{dk}(e^{j(\omega - \omega_k)})$$

gives

$$H(e^{j\omega}) = \frac{1}{2\pi}\int_{-\pi}^{\pi} H_d(e^{j\theta})\, D(e^{j(\omega - \theta)})\, d\theta$$

where $H_d(e^{j\omega})$ is the desired composite frequency response

Page 36: Digital Speech Processing

Ideal Non-Uniform Designs

• assume $P_k = 1$, $0 \le k \le N-1$, and that the bandwidths and center frequencies of $W_{dk}(e^{j(\omega - \omega_k)})$ are such that the entire frequency range $-\pi \le \omega \le \pi$ is covered, giving

$$H_d(e^{j\omega}) = e^{-j\omega n_d}, \quad -\pi \le \omega \le \pi$$

$$H(e^{j\omega}) = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{-j\theta n_d}\, D(e^{j(\omega - \theta)})\, d\theta = d[n_d]\, e^{-j\omega n_d}$$

giving a composite impulse response of

$$h[n] = d[n_d]\, \delta[n - n_d]$$

• this says that if the composite desired response is flat with linear phase then the actual composite response of the filter bank, using filters all designed with the same window, is also ideal--independent of how the center frequencies and bandwidths are distributed, and no matter what filter design window is used

• this says that perfect reconstruction is theoretically possible using FIR filters with an arbitrary distribution of center frequencies and bandwidths
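The claim that the composite response stays ideal for any distribution of bands can be illustrated numerically. The sketch below makes several assumptions not taken from the slides: real bandpass channels (each one combining a positive and a negative frequency band), arbitrary band edges that tile $0$ to $\pi$, a common Kaiser window of length 101, and unit gains $P_k = 1$.

```python
import numpy as np

L = 101                                   # common filter length (assumed, odd)
n_d = (L - 1) // 2
m = np.arange(L) - n_d                    # m = n - n_d
d = np.kaiser(L, 5.65)                    # common design window (assumed)

# Arbitrary non-uniform band edges covering 0..pi (assumed values)
edges = np.pi * np.array([0.0, 0.05, 0.12, 0.3, 0.55, 1.0])

def ideal_bandpass(lo, hi):
    """Delayed ideal real bandpass IR: (sin(hi m) - sin(lo m)) / (pi m)."""
    safe_m = np.where(m == 0, 1, m)
    h = (np.sin(hi * m) - np.sin(lo * m)) / (np.pi * safe_m)
    return np.where(m == 0, (hi - lo) / np.pi, h)

# Window every ideal channel with the same d[n], then sum the channels
h = sum(d * ideal_bandpass(lo, hi) for lo, hi in zip(edges[:-1], edges[1:]))

# Expected composite response: d[n_d] * delta[n - n_d]
expected = np.zeros(L)
expected[n_d] = d[n_d]

print(np.allclose(h, expected, atol=1e-12))
```

Changing `edges` or the window parameter leaves the result unchanged, as long as the bands still cover the whole range without gaps or overlap.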

Page 37: Digital Speech Processing

FIR Filter Bank Examples

• $F_s = 9.6$ kHz

• want 15 filters to cover range from 200-3200 Hz

• lowpass cutoff frequency is

$$F_p = \frac{\omega_p}{2\pi T} = \frac{3200 - 200}{2 \cdot 15} = 100 \text{ Hz}$$

• center frequencies

$$F_k = \frac{\omega_k}{2\pi T} = 200k + 100, \quad 1 \le k \le 15$$

• desire 60 dB attenuation => $\alpha = 5.65$; with 200 Hz transition bands => $L = 175$
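The numbers in this example can be reproduced from the Kaiser length formula $L = (-20\log_{10}\delta - 7.95)/(14.36\,\Delta f) + 1$. The expression used for $\alpha$ below, $0.1102\,(A - 8.7)$ for attenuation $A > 50$ dB, is Kaiser's standard empirical formula; it does not appear on the slides and is included here as an assumption about which formula was meant.

```python
Fs = 9600.0                 # sampling rate in Hz
atten_db = 60.0             # desired attenuation, -20 log10(delta)
transition_hz = 200.0       # transition bandwidth in Hz

delta_f = transition_hz / Fs                          # normalized transition width
L_est = (atten_db - 7.95) / (14.36 * delta_f) + 1     # Kaiser length estimate
alpha = 0.1102 * (atten_db - 8.7)                     # assumed empirical formula, A > 50 dB

print(round(L_est), round(alpha, 2))
```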

Page 38: Digital Speech Processing

FIR Filter Bank Examples

• $F_s = 9.6$ kHz

• want 4 octave band filters => bandwidth doubles with each successive filter => bandwidths of 200, 400, 800 and 1600 Hz, with center frequencies of 300, 600, 1200 and 2400 Hz

• use $L = 175$

Page 39: Digital Speech Processing

FIR Filter Bank Examples

• want narrower transition bands

• use larger value of $L$: $L = 301$

Page 40: Digital Speech Processing

Applications of STFT

• vocoders => voice coders, code speech at rates much lower than waveform coders
• removal of additive noise
• de-reverberation
• speed-up and slow-down of speech for speed learning, aids for the handicapped

Page 41: Digital Speech Processing

Coding of STFT

• elements of STFT
1. set of $\{\omega_k\}$ chosen to cover frequency range of interest
2. $w_k(n)$: set of lowpass analysis windows
3. $P_k$: set of complex gains to make composite frequency response as close to ideal as possible

⇒ goal is to sample STFT at rates lower than $x(n)$

Page 42: Digital Speech Processing

Coding of STFT

• attenuation > 80 dB above 80 Hz

• System example
 – $F_s = 12195$ samples/sec
 – $N = 128$
 – $\omega_k = 2\pi k/128$, $F_k = 95.273\,k$ Hz
 – $w_k[n] = w[n]$ -- linear phase FIR filter, length 731 samples
 – $P_0 = 0$, $P_k = 0$ for $28 < k < 100$ -- preserved band up to 2690 Hz

Page 43: Digital Speech Processing

Coding of STFT

• part a is the original

• part b has no quantization, but there is fuzziness in the spectrogram due to echo in the composite IR => the system sounds reverberant

• $P_k$ used to adjust phase and improve overall sound quality

Page 44: Digital Speech Processing

Coding of STFT

• quantizing STFTs
 – BR = sampling rate x bits/sample
 – channel sampling rate depends on bandwidth of lowpass filter, e.g., 80 Hz BW => SR ≥ 160/sec
 – using lower SR => aliasing in time
 – narrower bandwidth LP => reverberance because of "holes" in spectrum => need to increase number of channels

Page 45: Digital Speech Processing

Coding of STFT

• aliasing in time due to lower SR: 160/100/80/60

• narrower bandwidth LP => 80/53/36 Hz => formant distortions

Page 46: Digital Speech Processing

Coding of STFT

• channel coding and quantization

• use ADM with 1 bit/sample

• 28 channels, with real and imaginary outputs

• BR = 2 x 28 x SR = 56 SR

• for ADM SR = 5-10 times Nyquist rate

• SR = 500 Hz (part a), 375 Hz (part b), 250 Hz (part c)
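The bit rates implied by BR = 2 x 28 x SR for the three sampling rates can be tabulated directly. The helper name `adm_bit_rate` is hypothetical and used only for this illustration.

```python
def adm_bit_rate(sr_hz, channels=28, bits_per_sample=1, streams_per_channel=2):
    """Total bit rate: real and imaginary streams per channel, 1-bit ADM."""
    return streams_per_channel * channels * bits_per_sample * sr_hz

for part, sr in [("a", 500), ("b", 375), ("c", 250)]:
    print(f"part {part}: SR = {sr} Hz -> {adm_bit_rate(sr)} bps")
```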

Page 47: Digital Speech Processing

Coding of STFT

• non-uniform coding and quantization

• 28 channels

• 100/sec SR (gives small amount of aliasing)

• coding log magnitude and phase using 3 bits for log magnitude and 4 bits for phase for channels 1-10; and 2 bits for log magnitude and 3 bits for phase for channels 11-28

• total rate of 16 Kbps
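The 16 Kbps total follows from the bit allocation listed above. A small sketch of the arithmetic, with a hypothetical helper `stft_bit_rate`:

```python
def stft_bit_rate(frame_rate, allocation):
    """allocation: list of (num_channels, bits_log_magnitude, bits_phase)."""
    bits_per_frame = sum(nch * (bm + bp) for nch, bm, bp in allocation)
    return frame_rate * bits_per_frame

# Channels 1-10: 3 + 4 bits; channels 11-28: 2 + 3 bits; 100 samples/sec per channel
rate = stft_bit_rate(100, [(10, 3, 4), (18, 2, 3)])
print(rate)
```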

Page 48: Digital Speech Processing

STFT Coding Summary

• sample channels at sufficient rate, quantize to 12 bits/sample => perceptually perfect reproduction of x(n) possible with bit rates of 100 Kbps

• reduced bit rate => quantize more coarsely, reduce sampling rate => 16 Kbps rates possible

• remove redundancy of the speech signal by higher order modeling prior to STFT operations

• cannot use SNR to evaluate STFT since degradations are perceived as modifications of speech quality and intelligibility, not additive noise => spectrograms, subjective quality testing

Page 49: Digital Speech Processing

Phase Vocoder

• used for speed-up and slow-down of speech

Audio examples:
 – intro phase vocoder
 – channel vocoder
 – speeded-up speech
 – slowed down speech

Page 50: Digital Speech Processing

Additional Examples of Rate Changes in Speech

• Male Speaker
 – Original rate
 – Speeded up
 – Speeded up more
 – Slowed down
 – Slowed down more

• Female Speaker
 – Original rate
 – Speeded up
 – Speeded up more
 – Slowed down
 – Slowed down more

Page 51: Digital Speech Processing

Phase Vocoder Time Expanded

Page 52: Digital Speech Processing

Phase Vocoder Time Compressed

Page 53: Digital Speech Processing

Channel Vocoder

• interpret STFT so that each channel can be thought of as a bandpass filter with center frequency $\omega_k$

• magnitude of STFT can be approximated by envelope detection on the BPF output (FWR and LPF)

• analyzer: bank of channels; need excitation info (the phase component) => V/UV detector, pitch detector

• synthesizer: channel signals control channel amplitude; excitation signals control detailed structure of output for a given channel; V/UV choice of excitation source

=> highly reverberant speech because of total lack of control of composite filter bank response
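Envelope detection by full-wave rectification (FWR) followed by a lowpass filter (LPF) can be sketched as follows. Everything here is an assumed stand-in rather than the lecture's system: a single sinusoid at 500 Hz with a slow 5 Hz amplitude variation plays the role of one BPF output, and a 20 ms moving average plays the role of the LPF.

```python
import numpy as np

fs = 8000.0                        # sampling rate in Hz (assumed)
f_k = 500.0                        # channel centre frequency in Hz (assumed)
n = np.arange(4000)

# Slowly varying channel amplitude and a stand-in for one bandpass filter output
envelope_true = 1.0 + 0.5 * np.sin(2 * np.pi * 5.0 * n / fs)
bpf_out = envelope_true * np.cos(2 * np.pi * f_k * n / fs)

rectified = np.abs(bpf_out)        # full-wave rectifier (FWR)

lpf_len = 160                      # 20 ms moving average = 10 carrier periods (simple LPF)
lpf = np.ones(lpf_len) / lpf_len
# Scale by pi/2 because the mean of |cos| over a period is 2/pi
envelope_est = np.convolve(rectified, lpf, mode="same") * (np.pi / 2)

core = slice(200, -200)            # ignore filter edge effects
max_err = np.max(np.abs(envelope_est[core] - envelope_true[core]))
print(max_err)
```

The estimate tracks the channel amplitude only approximately, which is consistent with the slide's statement that the STFT magnitude is approximated, not recovered exactly, by this detector.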

Page 54: Digital Speech Processing

Channel Vocoder

• 1200-9600 bps

• 600 bps for pitch and V/UV

• easy to modify pitch, timing

Channel Vocoder--2.4 kbps

Page 55: Digital Speech Processing

Channel Vocoder