
Digital Speech Processing: Lecture 12

Homomorphic Speech Processing

General Discrete-Time Model of Speech Production

Voiced speech:

\[ p_L[n] = p[n] * h_V[n], \qquad h_V[n] = A_V \cdot g[n] * v[n] * r[n] \]

Unvoiced speech:

\[ p_L[n] = u[n] * h_U[n], \qquad h_U[n] = A_N \cdot v[n] * r[n] \]

Basic Speech Model

• a short segment of speech can be modeled as having been generated by exciting an LTI system either by a quasi-periodic impulse train or by a random noise signal

• speech analysis => estimate the parameters of the speech model and measure their variations with time (and perhaps even their statistical variability, for quantization)

• speech = excitation * system response => want to deconvolve speech into excitation and system response => do this using homomorphic filtering methods

Superposition Principle

[Block diagram: x[n] -> L{ } -> y[n] = L{x[n]}; the input x_1[n] + x_2[n] produces the output L{x_1[n]} + L{x_2[n]}]

\[ x[n] = a\,x_1[n] + b\,x_2[n] \]

\[ y[n] = \mathcal{L}\{x[n]\} = a\,\mathcal{L}\{x_1[n]\} + b\,\mathcal{L}\{x_2[n]\} \]

Generalized Superposition for Convolution

• for LTI systems we have the result

\[ y[n] = x[n] * h[n] = \sum_{k=-\infty}^{\infty} x[k]\,h[n-k] \]

• "generalized" superposition => addition replaced by convolution

\[ x[n] = x_1[n] * x_2[n] \]

\[ y[n] = \mathcal{H}\{x[n]\} = \mathcal{H}\{x_1[n]\} * \mathcal{H}\{x_2[n]\} \]

• a system H{ } with this property is a homomorphic system for convolution

[Block diagram: x[n] -> H{ } -> y[n] = H{x[n]}; the input x_1[n] * x_2[n] produces the output H{x_1[n]} * H{x_2[n]}]

Homomorphic Filter

• homomorphic filter => homomorphic system H[ ] that passes the desired signal unaltered, while removing the undesired signal

\[ x[n] = x_1[n] * x_2[n] \quad \text{with } x_1[n] \text{ the "undesired" signal} \]

\[ \mathcal{H}\{x[n]\} = \mathcal{H}\{x_1[n]\} * \mathcal{H}\{x_2[n]\} \]

\[ \mathcal{H}\{x_1[n]\} \rightarrow \delta[n] \quad \text{(removal of } x_1[n]\text{)} \]

\[ \mathcal{H}\{x_2[n]\} \rightarrow x_2[n] \]

\[ \mathcal{H}\{x[n]\} = \delta[n] * x_2[n] = x_2[n] \]

• for linear systems this is analogous to additive noise removal


Canonic Form for Homomorphic Deconvolution

[Block diagram: x[n] = x_1[n] * x_2[n] -> D_*{ } -> \hat{x}[n] = \hat{x}_1[n] + \hat{x}_2[n] -> L{ } -> \hat{y}[n] = \hat{y}_1[n] + \hat{y}_2[n] -> D_*^{-1}{ } -> y[n] = y_1[n] * y_2[n]]

• any homomorphic system can be represented as a cascade of three systems, e.g., for convolution:
1. a system that takes inputs combined by convolution and transforms them into additive outputs
2. a conventional linear system
3. the inverse of the first system, which takes additive inputs and transforms them into convolutional outputs

Canonic Form for Homomorphic Convolution

[Block diagram: same cascade D_*{ } -> L{ } -> D_*^{-1}{ } as on the previous slide]

\[ x[n] = x_1[n] * x_2[n] \quad \text{(convolutional relation)} \]

\[ \hat{x}[n] = \mathcal{D}_*\{x[n]\} = \hat{x}_1[n] + \hat{x}_2[n] \quad \text{(additive relation)} \]

\[ \hat{y}[n] = \mathcal{L}\{\hat{x}_1[n] + \hat{x}_2[n]\} = \hat{y}_1[n] + \hat{y}_2[n] \quad \text{(conventional linear system)} \]

\[ y[n] = \mathcal{D}_*^{-1}\{\hat{y}_1[n] + \hat{y}_2[n]\} = y_1[n] * y_2[n] \quad \text{(inverse of convolutional relation)} \]

=> the design problem is converted back to the design of a linear system, L

• D_*[ ] is fixed (called the characteristic system for homomorphic deconvolution)

• D_*^{-1}[ ] is fixed (the characteristic system for inverse homomorphic deconvolution)

Properties of Characteristic Systems

\[ \hat{x}[n] = \mathcal{D}_*\{x[n]\} = \mathcal{D}_*\{x_1[n] * x_2[n]\} = \mathcal{D}_*\{x_1[n]\} + \mathcal{D}_*\{x_2[n]\} = \hat{x}_1[n] + \hat{x}_2[n] \]

\[ \mathcal{D}_*^{-1}\{\hat{y}[n]\} = \mathcal{D}_*^{-1}\{\hat{y}_1[n] + \hat{y}_2[n]\} = \mathcal{D}_*^{-1}\{\hat{y}_1[n]\} * \mathcal{D}_*^{-1}\{\hat{y}_2[n]\} = y_1[n] * y_2[n] = y[n] \]

Discrete-Time Fourier Transform Representations

Canonic Form for Deconvolution Using DTFTs

• need to find a system that converts convolution to addition

\[ x[n] = x_1[n] * x_2[n] \]

\[ X(e^{j\omega}) = X_1(e^{j\omega}) \cdot X_2(e^{j\omega}) \]

• since

\[ \mathcal{D}_*\{x[n]\} = \hat{x}_1[n] + \hat{x}_2[n] = \hat{x}[n] \]

\[ \mathcal{D}_*\left[X(e^{j\omega})\right] = \hat{X}_1(e^{j\omega}) + \hat{X}_2(e^{j\omega}) = \hat{X}(e^{j\omega}) \]

=> use the log function, which converts products to sums

\[ \hat{X}(e^{j\omega}) = \log\left[X(e^{j\omega})\right] = \log\left[X_1(e^{j\omega}) \cdot X_2(e^{j\omega})\right] = \log\left[X_1(e^{j\omega})\right] + \log\left[X_2(e^{j\omega})\right] = \hat{X}_1(e^{j\omega}) + \hat{X}_2(e^{j\omega}) \]

\[ \hat{Y}(e^{j\omega}) = \mathcal{L}\left[\hat{X}_1(e^{j\omega}) + \hat{X}_2(e^{j\omega})\right] = \hat{Y}_1(e^{j\omega}) + \hat{Y}_2(e^{j\omega}) \]

\[ Y(e^{j\omega}) = \exp\left[\hat{Y}_1(e^{j\omega}) + \hat{Y}_2(e^{j\omega})\right] = Y_1(e^{j\omega}) \cdot Y_2(e^{j\omega}) \]
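As a numerical sanity check of the product-to-sum property, the following sketch (NumPy; the sequences x1 and x2 and the FFT length N are arbitrary illustrative choices, not taken from the lecture) confirms that the log magnitude of the DFT of a convolution equals the sum of the individual log magnitudes. Only the real part of the log is checked, which sidesteps the phase ambiguity of the complex logarithm.

```python
import numpy as np

# Two arbitrary short sequences (illustrative values only).
x1 = np.array([1.0, 0.5, 0.25])
x2 = np.array([1.0, -0.3, 0.2, 0.1])
x = np.convolve(x1, x2)          # x[n] = x1[n] * x2[n]

N = 64                           # FFT length >= len(x), so the DFT samples X(e^{jw})
log_mag_x = np.log(np.abs(np.fft.fft(x, N)))
log_mag_sum = (np.log(np.abs(np.fft.fft(x1, N)))
               + np.log(np.abs(np.fft.fft(x2, N))))

# Largest deviation between log|X| and log|X1| + log|X2| over all bins.
max_err = np.max(np.abs(log_mag_x - log_mag_sum))
print(max_err)
```

The deviation should sit at floating-point round-off level, since neither test sequence has a zero on the unit circle.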

Characteristic System for Deconvolution Using DTFTs

\[ X(e^{j\omega}) = \sum_{n=-\infty}^{\infty} x[n]\,e^{-j\omega n} \]

\[ \hat{X}(e^{j\omega}) = \log\left[X(e^{j\omega})\right] = \log\left|X(e^{j\omega})\right| + j\arg\left[X(e^{j\omega})\right] \]

\[ \hat{x}[n] = \frac{1}{2\pi}\int_{-\pi}^{\pi} \hat{X}(e^{j\omega})\,e^{j\omega n}\,d\omega \]


Inverse Characteristic System for Deconvolution Using DTFTs

\[ \hat{Y}(e^{j\omega}) = \sum_{n=-\infty}^{\infty} \hat{y}[n]\,e^{-j\omega n} \]

\[ Y(e^{j\omega}) = \exp\left[\hat{Y}(e^{j\omega})\right] \]

\[ y[n] = \frac{1}{2\pi}\int_{-\pi}^{\pi} Y(e^{j\omega})\,e^{j\omega n}\,d\omega \]

Issues with Logarithms

• it is essential that the logarithm obey the equation

\[ \log\left[X_1(e^{j\omega}) \cdot X_2(e^{j\omega})\right] = \log\left[X_1(e^{j\omega})\right] + \log\left[X_2(e^{j\omega})\right] \]

• this is trivial if X_1(e^{j\omega}) and X_2(e^{j\omega}) are real and positive; however, usually X_1(e^{j\omega}) and X_2(e^{j\omega}) are complex

• on the unit circle the complex log can be written in the form:

\[ X(e^{j\omega}) = \left|X(e^{j\omega})\right| e^{j\arg\left[X(e^{j\omega})\right]} \]

\[ \log\left[X(e^{j\omega})\right] = \hat{X}(e^{j\omega}) = \log\left|X(e^{j\omega})\right| + j\arg\left[X(e^{j\omega})\right] \]

• no problems with the log magnitude term; uniqueness problems arise in defining the imaginary part of the log; can show that the imaginary part (the phase angle of the z-transform) needs to be a continuous odd function of ω

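Problems with arg Function

• the principal value ARG (wrapped into (−π, π]) is not additive in general:

\[ \mathrm{ARG}\{X(e^{j\omega})\} \neq \mathrm{ARG}\{X_1(e^{j\omega})\} + \mathrm{ARG}\{X_2(e^{j\omega})\} \]

• the continuous (unwrapped) phase arg is additive:

\[ \arg\{X(e^{j\omega})\} = \arg\{X_1(e^{j\omega})\} + \arg\{X_2(e^{j\omega})\} \]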
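The difference between ARG and arg can be illustrated with a small NumPy sketch. The phase value 2.5 rad and the phase ramp below are made-up test data, and `np.unwrap` is used here as one common way to recover a continuous phase from sampled principal values; it is not necessarily the unwrapping method used in the lecture.

```python
import numpy as np

# Principal-value phase is not additive: each factor has phase 2.5 rad,
# but the product's principal value is 5 - 2*pi, not 5.
z1 = np.exp(1j * 2.5)
arg_of_product = np.angle(z1 * z1)
sum_of_args = np.angle(z1) + np.angle(z1)

# A smooth phase curve that exceeds pi, its wrapped version, and the unwrapped result.
theta = np.linspace(0.0, 6.0 * np.pi, 200)
wrapped = np.angle(np.exp(1j * theta))       # ARG: confined to (-pi, pi]
unwrapped = np.unwrap(wrapped)               # arg: 2*pi jumps removed

print(arg_of_product, sum_of_args)
print(np.allclose(unwrapped, theta))
```

Unwrapping of sampled phase only succeeds when the true phase changes by less than π between adjacent frequency samples, which is one reason a long FFT is normally used.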

Complex Cepstrum Properties

• Given a complex logarithm that satisfies the phase continuity condition, we have:

\[ \hat{x}[n] = \frac{1}{2\pi}\int_{-\pi}^{\pi} \left(\log\left|X(e^{j\omega})\right| + j\arg\{X(e^{j\omega})\}\right) e^{j\omega n}\,d\omega \]

• If x[n] is real, then \log|X(e^{j\omega})| is an even function of ω and \arg\{X(e^{j\omega})\} is an odd function of ω. This means that the real and imaginary parts of the complex log have the appropriate symmetry for \hat{x}[n] to be a real sequence, and \hat{x}[n] can be represented as:

\[ \hat{x}[n] = c[n] + d[n] \]

where c[n] is the inverse DTFT of \log|X(e^{j\omega})| and the even part of \hat{x}[n], and d[n] is the inverse DTFT of j\arg\{X(e^{j\omega})\} and the odd part of \hat{x}[n]:

\[ c[n] = \frac{\hat{x}[n] + \hat{x}[-n]}{2}, \qquad d[n] = \frac{\hat{x}[n] - \hat{x}[-n]}{2} \]

Complex and Real Cepstrum

• define the inverse Fourier transform of \hat{X}(e^{j\omega}) as

\[ \hat{x}[n] = \frac{1}{2\pi}\int_{-\pi}^{\pi} \hat{X}(e^{j\omega})\,e^{j\omega n}\,d\omega \]

where \hat{x}[n] is called the "complex cepstrum", since a complex logarithm is involved in the computation

• can also define a "real cepstrum" using just the real part of the logarithm, giving

\[ c[n] = \frac{1}{2\pi}\int_{-\pi}^{\pi} \mathrm{Re}\left[\hat{X}(e^{j\omega})\right] e^{j\omega n}\,d\omega = \frac{1}{2\pi}\int_{-\pi}^{\pi} \log\left|X(e^{j\omega})\right| e^{j\omega n}\,d\omega \]

• can show that c[n] is the even part of \hat{x}[n]
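The even-part relation between the real and complex cepstrum can be checked numerically. The sketch below is a DFT-based approximation under stated assumptions: the test sequence x = [1, −0.5] and the length N = 1024 are illustrative, and the sequence is chosen so that its phase has no linear-phase component to remove.

```python
import numpy as np

x = np.array([1.0, -0.5])        # minimum-phase test sequence (illustrative)
N = 1024                         # large DFT length keeps cepstral aliasing negligible
X = np.fft.fft(x, N)

log_mag = np.log(np.abs(X))
phase = np.unwrap(np.angle(X))   # continuous phase arg{X}

xhat = np.fft.ifft(log_mag + 1j * phase).real   # complex cepstrum (DFT approximation)
c = np.fft.ifft(log_mag).real                   # real cepstrum

# xhat[-n] with indices taken modulo N: index 0 stays, index n maps to N - n.
xhat_reversed = np.roll(xhat[::-1], 1)
even_part = 0.5 * (xhat + xhat_reversed)

print(np.max(np.abs(c - even_part)))
```

For this sequence the complex cepstrum is −(0.5)^n / n for n > 0, so `xhat[1]` should be close to −0.5.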

Terminology

• Spectrum – Fourier transform of signal autocorrelation
• Cepstrum – inverse Fourier transform of log spectrum
• Analysis – determining the spectrum of a signal
• Alanysis – determining the cepstrum of a signal
• Filtering – linear operation on time signal
• Liftering – linear operation on cepstrum
• Frequency – independent variable of spectrum
• Quefrency – independent variable of cepstrum
• Harmonic – integer multiple of fundamental frequency
• Rahmonic – integer multiple of fundamental quefrency


z-Transform Representation

• The z-transform of the signal

\[ x[n] = x_1[n] * x_2[n] \]

is of the form

\[ X(z) = X_1(z) \cdot X_2(z) \]

• With an appropriate definition of the complex log, we get:

\[ \hat{X}(z) = \log\{X(z)\} = \log\{X_1(z) \cdot X_2(z)\} = \log\{X_1(z)\} + \log\{X_2(z)\} = \hat{X}_1(z) + \hat{X}_2(z) \]

Characteristic System for Deconvolution

\[ X(z) = \sum_{n=-\infty}^{\infty} x[n]\,z^{-n} = |X(z)|\,e^{j\arg\{X(z)\}} \]

\[ \hat{X}(z) = \log\left[X(z)\right] = \log|X(z)| + j\arg\left[X(z)\right] \]

\[ \hat{x}[n] = \frac{1}{2\pi j}\oint \hat{X}(z)\,z^{n-1}\,dz \]

Inverse Characteristic System for Deconvolution

\[ \hat{Y}(z) = \sum_{n=-\infty}^{\infty} \hat{y}[n]\,z^{-n} \]

\[ Y(z) = \exp\left[\hat{Y}(z)\right], \qquad \hat{Y}(z) = \log|Y(z)| + j\arg\left[Y(z)\right] \]

\[ y[n] = \frac{1}{2\pi j}\oint Y(z)\,z^{n-1}\,dz \]

z-Transform Cepstrum Alanysis

• consider digital systems with rational z-transforms of the general type

\[ X(z) = \frac{A \prod_{k=1}^{M_i} (1 - a_k z^{-1}) \prod_{k=1}^{M_o} (1 - b_k^{-1} z^{-1})}{\prod_{k=1}^{N_i} (1 - c_k z^{-1})} \]

• we can express the above equation as:

\[ X(z) = \frac{z^{-M_o} A \prod_{k=1}^{M_o} (-b_k^{-1}) \prod_{k=1}^{M_i} (1 - a_k z^{-1}) \prod_{k=1}^{M_o} (1 - b_k z)}{\prod_{k=1}^{N_i} (1 - c_k z^{-1})} \]

• with all coefficients |a_k|, |b_k|, |c_k| < 1 => all c_k poles and a_k zeros are inside the unit circle; all b_k zeros are outside the unit circle

z-Transform Cepstrum Alanysis

• express X(z) as a product of minimum-phase and maximum-phase signals, i.e.,

\[ X(z) = X_{\min}(z) \cdot z^{-M_o} X_{\max}(z) \]

where

\[ X_{\min}(z) = \frac{A \prod_{k=1}^{M_i} (1 - a_k z^{-1})}{\prod_{k=1}^{N_i} (1 - c_k z^{-1})} \quad \text{(all poles and zeros inside the unit circle)} \]

\[ X_{\max}(z) = \prod_{k=1}^{M_o} (-b_k^{-1}) \prod_{k=1}^{M_o} (1 - b_k z) \quad \text{(all zeros outside the unit circle)} \]

z-Transform Cepstrum Alanysis

• can express x[n] as the convolution:

\[ x[n] = x_{\min}[n] * x_{\max}[n - M_o] \]

• the minimum-phase component is causal:

\[ x_{\min}[n] = 0, \quad n < 0 \]

• the maximum-phase component is anti-causal:

\[ x_{\max}[n] = 0, \quad n > 0 \]

• the factor z^{-M_o} is the shift in the time origin by M_o samples required so that the overall sequence, x[n], be causal


z-Transform Cepstrum Alanysis

• the complex logarithm of X(z) is

\[ \hat{X}(z) = \log\left[X(z)\right] = \log|A| + \sum_{k=1}^{M_o} \log|b_k^{-1}| + \log\left[z^{-M_o}\right] + \sum_{k=1}^{M_i} \log(1 - a_k z^{-1}) + \sum_{k=1}^{M_o} \log(1 - b_k z) - \sum_{k=1}^{N_i} \log(1 - c_k z^{-1}) \]

• evaluating \hat{X}(z) on the unit circle, we can ignore the term related to \log\left[e^{-j\omega M_o}\right] (as this contributes only to the imaginary part and is a linear phase shift)

z-Transform Cepstrum Alanysis

• we can then evaluate the remaining terms, using the power series expansion for the logarithmic terms,

\[ \log(1 - Z) = -\sum_{n=1}^{\infty} \frac{Z^n}{n}, \quad |Z| < 1 \]

and take the inverse transform to give the complex cepstrum:

\[ \hat{x}[n] = \frac{1}{2\pi}\int_{-\pi}^{\pi} \hat{X}(e^{j\omega})\,e^{j\omega n}\,d\omega =
\begin{cases}
\log|A| + \sum_{k=1}^{M_o} \log|b_k^{-1}|, & n = 0 \\[4pt]
\sum_{k=1}^{N_i} \dfrac{c_k^n}{n} - \sum_{k=1}^{M_i} \dfrac{a_k^n}{n}, & n > 0 \\[4pt]
\sum_{k=1}^{M_o} \dfrac{b_k^{-n}}{n}, & n < 0
\end{cases} \]

Cepstrum Properties

1. the complex cepstrum is non-zero and of infinite extent for both positive and negative n, even though x[n] may be causal, or even of finite duration (X(z) has only zeros).

2. the complex cepstrum is a decaying sequence that is bounded by:

\[ |\hat{x}[n]| < \beta\,\frac{\alpha^{|n|}}{|n|} \quad \text{for } |n| \to \infty \]

3. the zero-quefrency value of the complex cepstrum (and the cepstrum) depends on the gain constant and the zeros outside the unit circle. Setting \hat{x}[0] = 0 (and therefore c[0] = 0) is equivalent to normalizing the log magnitude spectrum to a gain constant of:

\[ A \prod_{k=1}^{M_o} (-b_k^{-1}) = 1 \]

4. If X(z) has no zeros outside the unit circle (all b_k = 0), then:

\[ \hat{x}[n] = 0, \quad n < 0 \quad \text{(minimum-phase signals)} \]

5. If X(z) has no poles or zeros inside the unit circle (all a_k, c_k = 0), then:

\[ \hat{x}[n] = 0, \quad n > 0 \quad \text{(maximum-phase signals)} \]

z-Transform Cepstrum Alanysis

• The main z-transform formula for cepstrum alanysis is based on the power series expansion:

\[ \log(1 + x) = \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n}\,x^n, \quad |x| < 1 \]

Example 1: apply this formula to the exponential sequence

\[ x_1[n] = a^n u[n] \iff X_1(z) = \frac{1}{1 - a z^{-1}} \]

\[ \hat{X}_1(z) = \log\left[X_1(z)\right] = -\log(1 - a z^{-1}) = -\sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n}\,(-a z^{-1})^n = \sum_{n=1}^{\infty} \frac{a^n}{n}\,z^{-n} \]

\[ \hat{x}_1[n] = \frac{a^n}{n}\,u[n-1] \iff \hat{X}_1(z) = -\log(1 - a z^{-1}) = \sum_{n=1}^{\infty} \frac{a^n}{n}\,z^{-n} \]
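The closed form of Example 1 can be checked numerically by sampling X_1(z) on the unit circle and inverting the log spectrum with an inverse FFT. This is a sketch under stated assumptions: a = 0.9 and N = 4096 are illustrative choices, and the principal-value log is used directly because the phase of 1/(1 − a e^{−jω}) never leaves (−π/2, π/2) when |a| < 1.

```python
import numpy as np

a = 0.9                           # illustrative pole location, |a| < 1
N = 4096                          # number of unit-circle samples (assumed large)
w = 2.0 * np.pi * np.arange(N) / N
X1 = 1.0 / (1.0 - a * np.exp(-1j * w))    # X1(z) evaluated at z = e^{jw}

# Phase stays inside (-pi/2, pi/2), so the principal log is already continuous.
xhat1 = np.fft.ifft(np.log(X1)).real

n = np.arange(1, 20)
closed_form = a ** n / n                  # a^n / n for n >= 1
max_err = np.max(np.abs(xhat1[1:20] - closed_form))
print(max_err)
```

With N this large, the aliased terms a^(n+N)/(n+N) are far below round-off, so the error should be negligible.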

z-Transform Cepstrum Alanysis

Example 2: consider the case of a digital system with a single zero outside the unit circle (|b| < 1)

\[ x_2[n] = \delta[n] + b\,\delta[n+1] \]

\[ X_2(z) = 1 + b z \quad \text{(zero at } z = -1/b\text{)} \]

\[ \hat{X}_2(z) = \log\left[X_2(z)\right] = \log(1 + b z) = \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n}\,b^n z^n \]

\[ \hat{x}_2[n] = (-1)^{n+1}\,\frac{b^{-n}}{|n|}\,u[-n-1] \]
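A short numerical check of Example 2 follows. It is a sketch with assumed values b = 0.8 and N = 4096; the quantity compared is the coefficient of z^m in the series for log(1 + bz), which is the cepstral value at quefrency n = −m.

```python
import numpy as np

b = 0.8                                   # illustrative value, |b| < 1
N = 4096
w = 2.0 * np.pi * np.arange(N) / N
X2 = 1.0 + b * np.exp(1j * w)             # X2(z) = 1 + b z on the unit circle

# Phase stays inside (-pi/2, pi/2), so the principal log can be used directly.
xhat2 = np.fft.ifft(np.log(X2)).real      # index N - m holds the value at n = -m

m = np.arange(1, 20)                      # m = -n = 1, 2, ...
numeric = xhat2[N - m]
# Series coefficients of log(1 + b z): (-1)^(m+1) * b^m / m
series_coeff = (-1.0) ** (m + 1) * b ** m / m

print(numeric[:3], series_coeff[:3])
```

The cepstrum is confined to negative quefrencies, as expected for a maximum-phase sequence: `xhat2[1]` and the other positive-quefrency samples are essentially zero.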

z-Transform Cepstrum Alanysis for 2 Pulses

Example 3: an input sequence of two pulses of the form

\[ x_3[n] = \delta[n] + \alpha\,\delta[n - N_p], \quad 0 < \alpha < 1 \]

\[ X_3(z) = 1 + \alpha z^{-N_p} \]

\[ \hat{X}_3(z) = \log\left[X_3(z)\right] = \log\left(1 + \alpha z^{-N_p}\right) = \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n}\,\alpha^n z^{-n N_p} \]

\[ \hat{x}_3[n] = \sum_{k=1}^{\infty} (-1)^{k+1}\,\frac{\alpha^k}{k}\,\delta[n - k N_p] \]

• the cepstrum is an impulse train with impulses spaced at N_p samples

[Figure: cepstral impulses at n = N_p, 2N_p, 3N_p, ...]
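The impulse-train structure of the two-pulse cepstrum can be reproduced with a few lines of NumPy. The values α = 0.5, N_p = 16 and N = 1024 are assumptions made for this sketch; N_p is chosen to divide N so that any aliased impulses also fall on multiples of N_p.

```python
import numpy as np

alpha, Np = 0.5, 16               # illustrative echo gain and spacing
N = 1024
x3 = np.zeros(N)
x3[0], x3[Np] = 1.0, alpha        # x3[n] = delta[n] + alpha * delta[n - Np]

# |alpha| < 1 keeps the phase inside (-pi/2, pi/2): the principal log is continuous.
xhat3 = np.fft.ifft(np.log(np.fft.fft(x3))).real

k = np.arange(1, 6)
expected = (-1.0) ** (k + 1) * alpha ** k / k      # impulse heights at n = k * Np
peaks = xhat3[k * Np]
off_grid = np.delete(xhat3, np.arange(0, N, Np))   # samples not at multiples of Np

print(peaks)
print(np.max(np.abs(off_grid)))
```

The impulses alternate in sign and decay as α^k / k, while every sample away from a multiple of N_p is zero to within round-off.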


Cepstrum for Train of Impulses

• an important special case is a train of impulses of the form:

\[ x[n] = \sum_{r=0}^{M} \alpha_r\,\delta[n - r N_p] \]

\[ X(z) = \sum_{r=0}^{M} \alpha_r\,z^{-r N_p} \]

• clearly X(z) is a polynomial in z^{-N_p} rather than z^{-1}; thus X(z) can be expressed as a product of factors of the form (1 - a z^{-N_p}) and (1 - b z^{N_p}), giving a complex cepstrum, \hat{x}[n], that is non-zero only at integer multiples of N_p

z-Transform Cepstrum Alanysis for Convolution of 2 Sequences

Example 4: consider the convolution of sequences 1 and 3, i.e.,

\[ x_4[n] = x_1[n] * x_3[n] = \left[a^n u[n]\right] * \left[\delta[n] + \alpha\,\delta[n - N_p]\right] = a^n u[n] + \alpha\,a^{n - N_p} u[n - N_p] \]

• The complex cepstrum is therefore the sum of the complex cepstra of the two sequences (since convolution in the time domain is converted to addition in the cepstral domain)

\[ \hat{x}_4[n] = \hat{x}_1[n] + \hat{x}_3[n] = \frac{a^n}{n}\,u[n-1] + \sum_{k=1}^{\infty} \frac{(-1)^{k+1} \alpha^k}{k}\,\delta[n - k N_p] \]
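The additivity claimed in Example 4 can be verified with a small DFT-based sketch. The helper name `complex_cepstrum`, the parameter values, and the truncation of a^n u[n] to 400 samples are all assumptions made for illustration; the helper does no linear-phase removal, which is adequate here because neither sequence contains a delay factor.

```python
import numpy as np

def complex_cepstrum(x, N):
    """DFT approximation of the complex cepstrum (hypothetical helper, no linear-phase removal)."""
    X = np.fft.fft(x, N)
    return np.fft.ifft(np.log(np.abs(X)) + 1j * np.unwrap(np.angle(X))).real

a, alpha, Np, N = 0.9, 0.7, 15, 4096
x1 = a ** np.arange(400)                  # a^n u[n], truncated where a^n is negligible
x3 = np.zeros(Np + 1)
x3[0], x3[Np] = 1.0, alpha                # delta[n] + alpha * delta[n - Np]
x4 = np.convolve(x1, x3)                  # x4 = x1 * x3

xhat1 = complex_cepstrum(x1, N)
xhat3 = complex_cepstrum(x3, N)
xhat4 = complex_cepstrum(x4, N)

print(np.max(np.abs(xhat4 - (xhat1 + xhat3))))
print(xhat4[Np], a ** Np / Np + alpha)
```

At n = N_p both components contribute, so the value there is approximately a^{N_p}/N_p + α.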

z-Transform Cepstrum Alanysis for Convolution of 3 Sequences

Example 5: consider the convolution of sequences 1, 2 and 3, i.e.,

\[ x_5[n] = x_1[n] * x_2[n] * x_3[n] = \left[a^n u[n]\right] * \left[\delta[n] + b\,\delta[n+1]\right] * \left[\delta[n] + \alpha\,\delta[n - N_p]\right] \]

\[ x_5[n] = a^n u[n] + \alpha\,a^{n - N_p} u[n - N_p] + b\,a^{n+1} u[n+1] + \alpha b\,a^{n - N_p + 1} u[n - N_p + 1] \]

• The complex cepstrum is therefore the sum of the complex cepstra of the three sequences

\[ \hat{x}_5[n] = \hat{x}_1[n] + \hat{x}_2[n] + \hat{x}_3[n] = \frac{a^n}{n}\,u[n-1] + \sum_{k=1}^{\infty} \frac{(-1)^{k+1} \alpha^k}{k}\,\delta[n - k N_p] + (-1)^{n+1}\,\frac{b^{-n}}{|n|}\,u[-n-1] \]

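Example: a = 0.9, b = 0.8, α = 0.7, N_p = 15

\[ X(z) = \frac{(1 + b z)\left(1 + \alpha z^{-N_p}\right)}{1 - a z^{-1}} \]

[Figure: complex cepstrum of the example, with three labeled components: (-1)^{n+1} b^{-n}/|n| for n < 0; a^n/n for n > 0; and the impulse train \sum_{k=1}^{\infty} (-1)^{k+1} (\alpha^k/k) \delta[n - k N_p]]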

Homomorphic Analysis of Speech Model

• the transfer function for voiced speech is of the form

\[ H_V(z) = A_V \cdot G(z)\,V(z)\,R(z) \]

with effective impulse response for voiced speech

\[ h_V[n] = A_V \cdot g[n] * v[n] * r[n] \]

• similarly, for unvoiced speech we have

\[ H_U(z) = A_U \cdot V(z)\,R(z) \]

with effective impulse response for unvoiced speech

\[ h_U[n] = A_U \cdot v[n] * r[n] \]


Complex Cepstrum for Speech

• the models for the speech components are as follows:

1. vocal tract:

\[ V(z) = \frac{A z^{-M} \prod_{k=1}^{M_i} (1 - a_k z^{-1}) \prod_{k=1}^{M_o} (1 - b_k z)}{\prod_{k=1}^{N_i} (1 - c_k z^{-1})} \]

-- for voiced speech, only poles => a_k = b_k = 0, all k
-- unvoiced speech and nasals need a pole-zero model, but all poles are inside the unit circle => |c_k| < 1
-- all speech has complex poles and zeros that occur in complex conjugate pairs

2. radiation model:

\[ R(z) \approx 1 - z^{-1} \quad \text{(high frequency emphasis)} \]

3. glottal pulse model: finite duration pulse with transform

\[ G(z) = B \prod_{k=1}^{L_i} (1 - \alpha_k z^{-1}) \prod_{k=1}^{L_o} (1 - \beta_k z) \]

with zeros both inside and outside the unit circle

Complex Cepstrum for Voiced Speech

• the combination of vocal tract, glottal pulse and radiation will be non-minimum phase => the complex cepstrum is non-zero for both positive and negative n

• the complex cepstrum will decay rapidly for large |n| (each term in the expansion of the complex cepstrum decays at least as fast as 1/|n|)

• the effect of the voiced source is a pulse train in the cepstrum at multiples of the pitch period

Simplified Speech Model

• short-time speech model

\[ x[n] = w[n] \cdot \left[p[n] * g[n] * v[n] * r[n]\right] \approx p_w[n] * h_v[n] \]

• short-time complex cepstrum

\[ \hat{x}[n] = \hat{p}_w[n] + \hat{g}[n] + \hat{v}[n] + \hat{r}[n] \]

Analysis of Model for Voiced Speech

• Assume a sustained /AE/ vowel with fundamental frequency of 125 Hz

• Use a glottal pulse model of the form:

\[ g[n] = \begin{cases}
0.5\left[1 - \cos\left(\pi (n+1)/N_1\right)\right], & 0 \le n \le N_1 - 1 \\
\cos\left(0.5\pi (n + 1 - N_1)/N_2\right), & N_1 \le n \le N_1 + N_2 - 2 \\
0, & \text{otherwise}
\end{cases} \]

\[ N_1 = 25, \quad N_2 = 10 \;\Rightarrow\; \text{34 sample impulse response, with transform} \]

\[ G(z) = z^{-33} \prod_{k=1}^{33} (-b_k^{-1}) \prod_{k=1}^{33} (1 - b_k z) \;\Rightarrow\; \text{all roots outside the unit circle} \;\Rightarrow\; \text{maximum phase} \]

• Vocal tract system specified by 5 formants (frequencies and bandwidths)

\[ V(z) = \frac{1}{\prod_{k=1}^{5} \left(1 - 2 e^{-2\pi\sigma_k T} \cos(2\pi F_k T)\,z^{-1} + e^{-4\pi\sigma_k T} z^{-2}\right)} \]

\[ \{F_k, \sigma_k\} = [(660, 60), (1720, 100), (2410, 120), (3500, 175), (4500, 250)] \]

• Radiation load is a simple first difference

\[ R(z) = 1 - \gamma z^{-1}, \quad \gamma = 0.96 \]
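The glottal pulse definition above can be turned into samples directly. The sketch below only evaluates the piecewise formula with N_1 = 25 and N_2 = 10 as quoted on the slide; the variable names are illustrative, and no claim is made here about the root locations of the resulting polynomial.

```python
import numpy as np

N1, N2 = 25, 10                               # values quoted on the slide
n_rise = np.arange(0, N1)                     # 0 <= n <= N1 - 1
n_fall = np.arange(N1, N1 + N2 - 1)           # N1 <= n <= N1 + N2 - 2

g = np.concatenate([
    0.5 * (1.0 - np.cos(np.pi * (n_rise + 1) / N1)),   # rising (opening) segment
    np.cos(0.5 * np.pi * (n_fall + 1 - N1) / N2),      # falling (closing) segment
])

print(len(g), g.max())
```

The pulse has N_1 + N_2 − 1 = 34 non-zero samples, consistent with the 34-sample impulse response stated on the slide, and it peaks at the last sample of the rising segment.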

Time Domain Analysis

[Figure slide]

Pole-Zero Analysis of Model Components

[Figure slide]

Spectral Analysis of Model

[Figure slide]

Speech Model Output

[Figure slide]

Complex Cepstrum of Model

• The voiced speech signal is modeled as:

\[ s[n] = A_V \cdot g[n] * v[n] * r[n] * p[n] \]

with complex cepstrum:

\[ \hat{s}[n] = \log|A_V|\,\delta[n] + \hat{g}[n] + \hat{v}[n] + \hat{r}[n] + \hat{p}[n] \]

• glottal pulse is maximum phase

\[ \Rightarrow\; \hat{g}[n] = 0, \quad n > 0 \]

• vocal tract and radiation systems are minimum phase

\[ \Rightarrow\; \hat{v}[n] = 0, \; n < 0; \qquad \hat{r}[n] = 0, \; n < 0 \]

• for the excitation:

\[ \hat{P}(z) = -\log\left(1 - \beta z^{-N_p}\right) = \sum_{k=1}^{\infty} \frac{\beta^k}{k}\,z^{-k N_p} \]

\[ \hat{p}[n] = \sum_{k=1}^{\infty} \frac{\beta^k}{k}\,\delta[n - k N_p] \]

Cepstral Analysis of Model

[Figure slide]

Resulting Complex and Real Cepstra

[Figure slide]

Frequency Domain Representations

[Figure slide]

Frequency-Domain Representation of Complex Cepstrum

[Figure slide]

The Complex Cepstrum: DFT Implementation

\[ X_p[k] = X\left(e^{j 2\pi k / N}\right) = \sum_{n=-\infty}^{\infty} x[n]\,e^{-j 2\pi k n / N}, \quad k = 0, 1, \ldots, N-1 \]

• X_p[k] is the N point DFT corresponding to X(e^{j\omega})

\[ \hat{X}_p[k] = \hat{X}\left(e^{j 2\pi k / N}\right) = \log\{X_p[k]\} = \log|X_p[k]| + j\arg\{X_p[k]\} \]

\[ \tilde{\hat{x}}[n] = \frac{1}{N} \sum_{k=0}^{N-1} \hat{X}_p[k]\,e^{j 2\pi k n / N} = \sum_{r=-\infty}^{\infty} \hat{x}[n + rN], \quad n = 0, 1, \ldots, N-1 \]

• \tilde{\hat{x}}[n] is an aliased version of \hat{x}[n] => use as large a value of N as possible to minimize aliasing

Inverse System: DFT Implementation

[Figure slide]

The Cepstrum: DFT Implementation

\[ c[n] = \frac{1}{2\pi}\int_{-\pi}^{\pi} \log\left|X(e^{j\omega})\right| e^{j\omega n}\,d\omega, \quad -\infty < n < \infty \]

• Approximation to the cepstrum using the DFT:

\[ X_p[k] = X\left(e^{j 2\pi k / N}\right) = \sum_{n=-\infty}^{\infty} x[n]\,e^{-j 2\pi k n / N}, \quad k = 0, 1, \ldots, N-1 \]

\[ \tilde{c}[n] = \frac{1}{N} \sum_{k=0}^{N-1} \log|X_p[k]|\,e^{j 2\pi k n / N}, \quad 0 \le n \le N-1 \]

\[ \tilde{c}[n] = \sum_{r=-\infty}^{\infty} c[n + rN], \quad n = 0, 1, \ldots, N-1 \]

• \tilde{c}[n] is an aliased version of c[n] => use as large a value of N as possible to minimize aliasing

\[ \tilde{c}[n] = \frac{\tilde{\hat{x}}[n] + \tilde{\hat{x}}[-n]}{2} \]

Cepstral Computation Aliasing

[Figure: computed cepstrum for N = 256, N_p = 75, α = 0.8. Circled dots are cepstrum values in their correct locations; all other dots are the results of aliasing due to finite range computations.]
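The aliasing pattern in this figure can be predicted from the closed-form cepstrum of a two-pulse sequence. The sketch below assumes the input is δ[n] + αδ[n − N_p] with the slide's values N = 256, N_p = 75, α = 0.8 (the exact signal behind the figure is not stated in the text), and compares the DFT-computed cepstrum against the true impulses wrapped modulo N.

```python
import numpy as np

N, Np, alpha = 256, 75, 0.8       # values quoted on the slide
x = np.zeros(N)
x[0], x[Np] = 1.0, alpha          # assumed input: delta[n] + alpha * delta[n - Np]

# Phase of 1 + alpha*exp(-j*w*Np) stays inside (-pi/2, pi/2): principal log suffices.
xhat_aliased = np.fft.ifft(np.log(np.fft.fft(x))).real

# True impulses sit at n = k*Np; those with k*Np >= N wrap around to (k*Np) mod N.
k = np.arange(1, 400)
predicted = np.zeros(N)
np.add.at(predicted, (k * Np) % N, (-1.0) ** (k + 1) * alpha ** k / k)

print(xhat_aliased[Np])                   # correctly located first impulse
print(xhat_aliased[(4 * Np) % N])         # impulse from n = 300, aliased to n = 44
```

Only the impulses at n = 75, 150 and 225 lie in their correct locations; the impulse that belongs at n = 300, for example, shows up at n = 44 with height −α^4/4.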

Summary

1. Homomorphic System for Convolution:

[Block diagram: x[n] -> Z( ) -> X(z) -> log( ) -> \hat{X}(z) -> L( ) -> \hat{Y}(z) -> exp( ) -> Y(z) -> Z^{-1}( ) -> y[n]]

[Block diagram of L( ): \hat{X}(z) -> Z^{-1}( ) -> \hat{x}[n] -> Linear System -> \hat{y}[n] -> Z( ) -> \hat{Y}(z)]

2. Practical Case: replace Z( ) by the DFT and Z^{-1}( ) by the IDFT

\[ X(e^{j\omega}) = \left|X(e^{j\omega})\right| e^{j\arg\{X(e^{j\omega})\}} \]

\[ \log\left[X(e^{j\omega})\right] = \log\left|X(e^{j\omega})\right| + j\arg\{X(e^{j\omega})\} \]


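Summary

3. Complex Cepstrum:

\[ \hat{x}[n] = \frac{1}{2\pi}\int_{-\pi}^{\pi} \hat{X}(e^{j\omega})\,e^{j\omega n}\,d\omega \]

4. Cepstrum:

\[ c[n] = \frac{1}{2\pi}\int_{-\pi}^{\pi} \log\left|X(e^{j\omega})\right| e^{j\omega n}\,d\omega \]

\[ c[n] = \frac{\hat{x}[n] + \hat{x}[-n]}{2} = \text{even part of } \hat{x}[n] \]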

Summary

5. Practical Implementation of Complex Cepstrum:

[Block diagram: x[n] -> DFT -> X_p[k] -> Complex Log -> \hat{X}_p[k] -> IDFT -> \tilde{\hat{x}}[n]]

\[ X_p[k] = X\left(e^{j(2\pi/N)k}\right) = \sum_{n=-\infty}^{\infty} x[n]\,e^{-j(2\pi/N)kn} \]

\[ \hat{X}_p[k] = \log\{X_p[k]\} = \log|X_p[k]| + j\arg\{X_p[k]\} \]

\[ \tilde{\hat{x}}[n] = \frac{1}{N}\sum_{k=0}^{N-1} \hat{X}_p[k]\,e^{j(2\pi/N)kn} = \sum_{r=-\infty}^{\infty} \hat{x}[n + rN] \;\Rightarrow\; \text{aliasing} \]

\[ \tilde{\hat{x}}[n] = \text{aliased version of } \hat{x}[n] \]

6. Examples:

\[ X(z) = \frac{1}{1 - a z^{-1}} \iff \hat{x}[n] = \sum_{r=1}^{\infty} \frac{a^r}{r}\,\delta[n - r] \]

\[ X(z) = 1 - b z \iff \hat{x}[n] = -\sum_{r=1}^{\infty} \frac{b^r}{r}\,\delta[n + r] \]

Complex Cepstrum Without Phase Unwrapping

• short-time analysis uses finite-length windowed segments, x[n]

\[ X(z) = \sum_{n=0}^{M} x[n]\,z^{-n}, \quad M^{\text{th}}\text{-order polynomial} \]

• Find the polynomial roots

\[ X(z) = x[0] \prod_{m=1}^{M_i} (1 - a_m z^{-1}) \prod_{m=1}^{M_o} (1 - b_m^{-1} z^{-1}) \]

• the a_m roots are inside the unit circle (minimum-phase part)

• the b_m^{-1} roots are outside the unit circle (maximum-phase part)

• Factor out terms of the form -b_m^{-1} z^{-1}, giving:

\[ X(z) = A z^{-M_o} \prod_{m=1}^{M_i} (1 - a_m z^{-1}) \prod_{m=1}^{M_o} (1 - b_m z) \]

\[ A = x[0]\,(-1)^{M_o} \prod_{m=1}^{M_o} b_m^{-1} \]

• Use a polynomial root finder to find the zeros that lie inside and outside the unit circle and solve directly for \hat{x}[n].
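A minimal sketch of the root-based computation is given below, using `np.roots` as the polynomial root finder and the closed-form cepstrum of the factored form (with the linear-phase factor z^{-M_o} ignored). The test sequence, the function name `xhat`, and the variable names are assumptions made for illustration.

```python
import numpy as np

x = np.array([1.0, 1.5, -1.0])    # X(z) = (1 - 0.5 z^-1)(1 + 2 z^-1), illustrative
roots = np.roots(x)               # zeros of X(z), i.e. roots of z^M X(z)
a = roots[np.abs(roots) < 1.0]          # zeros inside the unit circle
b = 1.0 / roots[np.abs(roots) > 1.0]    # reciprocals of zeros outside, so |b_m| < 1

def xhat(n):
    """Complex cepstrum at integer quefrency n from the factored form."""
    if n > 0:
        return -np.sum(a ** n).real / n          # minimum-phase zeros: -sum a_m^n / n
    if n < 0:
        return np.sum(b ** (-n)).real / n        # maximum-phase zeros: sum b_m^(-n) / n
    A = x[0] * (-1.0) ** len(b) * np.prod(1.0 / b)
    return np.log(np.abs(A))                     # n = 0: log|A|

print([xhat(n) for n in (-2, -1, 0, 1, 2)])
```

No phase unwrapping is needed in this approach, at the cost of rooting a polynomial whose order equals the window length minus one.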

Cepstrum for Minimum Phase Signals

• for minimum phase signals (no poles or zeros outside the unit circle) the complex cepstrum can be completely represented by the real part of its Fourier transform

• this means we can represent the complex cepstrum of minimum phase signals by the log of the magnitude of the FT alone

• since the real part of the FT is the FT of the even part of the sequence

\[ \mathrm{Re}\left[\hat{X}(e^{j\omega})\right] = FT\left[\frac{\hat{x}[n] + \hat{x}[-n]}{2}\right] = FT\left[c[n]\right] = \log\left|X(e^{j\omega})\right| \]

\[ c[n] = \frac{\hat{x}[n] + \hat{x}[-n]}{2} \]

giving

\[ \hat{x}[n] = \begin{cases} 0, & n < 0 \\ c[n], & n = 0 \\ 2c[n], & n > 0 \end{cases} \]

• thus the complex cepstrum (for minimum phase signals) can be computed by computing the cepstrum and using the equation above
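The folding rule for minimum-phase signals can be sketched as follows. The test sequence (zeros at 0.5 and −0.3) and the DFT length are illustrative assumptions; the result is compared against the closed form −Σ a_k^n / n for the two zeros.

```python
import numpy as np

x = np.convolve([1.0, -0.5], [1.0, 0.3])   # minimum phase: zeros at 0.5 and -0.3
N = 2048
c = np.fft.ifft(np.log(np.abs(np.fft.fft(x, N)))).real   # real cepstrum via DFT

# Fold: keep c[0], double the positive quefrencies, zero the negative ones.
xhat_min = np.zeros(N)
xhat_min[0] = c[0]
xhat_min[1:N // 2] = 2.0 * c[1:N // 2]
xhat_min[N // 2] = c[N // 2]      # sample shared by +n and -n on the circular grid

n = np.arange(1, 10)
closed_form = -(0.5 ** n) / n - ((-0.3) ** n) / n

print(np.max(np.abs(xhat_min[1:10] - closed_form)))
```

Only the log magnitude is used, so no phase computation or unwrapping is involved.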

Recursive Relation for Complex Cepstrum for Minimum Phase Signals

• the complex cepstrum for minimum phase signals can be computed recursively from the input signal, x[n], using the relation

\[ \hat{x}[n] = \begin{cases}
0, & n < 0 \\
\log\left[x[0]\right], & n = 0 \\
\dfrac{x[n]}{x[0]} - \sum_{k=0}^{n-1} \left(\dfrac{k}{n}\right) \hat{x}[k]\,\dfrac{x[n-k]}{x[0]}, & n > 0
\end{cases} \]
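The recursion can be written out in a few lines of plain Python. This is a sketch under the assumptions that x is a finite, real, minimum-phase sequence with x[0] > 0 (so the log is real); the function name and the test sequence [1, −0.5], whose cepstrum is −(0.5)^n / n, are illustrative.

```python
import math

def min_phase_cepstrum(x, n_max):
    """Complex cepstrum xhat[0..n_max] of a minimum-phase sequence x with x[0] > 0."""
    xhat = [math.log(x[0])]                   # xhat[0] = log(x[0])
    for n in range(1, n_max + 1):
        xn = x[n] if n < len(x) else 0.0      # x[n] = 0 beyond the end of the sequence
        acc = 0.0
        for k in range(1, n):                 # the k = 0 term vanishes (factor k/n)
            if n - k < len(x):
                acc += (k / n) * xhat[k] * x[n - k]
        xhat.append((xn - acc) / x[0])
    return xhat

xhat = min_phase_cepstrum([1.0, -0.5], 5)
print(xhat)
```

The recursion works entirely in the time domain, so it involves no FFT and therefore no cepstral aliasing.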

Recursive Relation for Complex Cepstrum for Minimum Phase Signals

1. basic z-transform:

\[ x[n] \longleftrightarrow X(z) \]

2. scale by n rule:

\[ n\,x[n] \longleftrightarrow -z\,\frac{dX(z)}{dz} = -z X'(z) \]

3. definition of complex cepstrum:

\[ \hat{x}[n] \longleftrightarrow \hat{X}(z) = \log\left[X(z)\right] \]

4. differentiation of z-transform:

\[ \frac{d\hat{X}(z)}{dz} = \frac{d}{dz}\log\left[X(z)\right] = \frac{X'(z)}{X(z)} \]

5. multiply both sides of the equation by -z X(z):

\[ -z\,\frac{d\hat{X}(z)}{dz}\,X(z) = -z X'(z) \]


Recursive Relation for Complex Cepstrum for Minimum Phase Signals

\[ n\,\hat{x}[n] * x[n] \longleftrightarrow -z\,\frac{d\hat{X}(z)}{dz}\,X(z) = -z X'(z) \longleftrightarrow n\,x[n] \]

\[ n\,x[n] = \sum_{k=-\infty}^{\infty} k\,\hat{x}[k]\,x[n-k] \]

• for minimum phase systems we have \hat{x}[n] = 0 for n < 0 and x[n] = 0 for n < 0, giving:

\[ x[n] = \sum_{k=0}^{n} \left(\frac{k}{n}\right) \hat{x}[k]\,x[n-k] \]

• separating out the term for k = n we get:

\[ x[n] = \sum_{k=0}^{n-1} \left(\frac{k}{n}\right) \hat{x}[k]\,x[n-k] + x[0]\,\hat{x}[n] \]

\[ \hat{x}[n] = \frac{x[n]}{x[0]} - \sum_{k=0}^{n-1} \left(\frac{k}{n}\right) \hat{x}[k]\,\frac{x[n-k]}{x[0]}, \quad n > 0 \]

\[ \hat{x}[0] = \log\left[x[0]\right], \qquad \hat{x}[n] = 0, \quad n < 0 \]

Recursive Relation for Complex Cepstrum for Minimum Phase Signals

• why is \hat{x}[0] = \log[x[0]]?

• assume we have a finite sequence x[n], n = 0, 1, ..., N-1

• we can write x[n] as:

\[ x[n] = x[0]\,\delta[n] + x[1]\,\delta[n-1] + \ldots + x[N-1]\,\delta[n-(N-1)] \]

\[ x[n] = x[0]\left[\delta[n] + \frac{x[1]}{x[0]}\,\delta[n-1] + \ldots + \frac{x[N-1]}{x[0]}\,\delta[n-(N-1)]\right] \]

• taking z-transforms, we get:

\[ X(z) = \sum_{n=0}^{N-1} x[n]\,z^{-n} = G \prod_{k=1}^{NZ_1} (1 - a_k z^{-1}) \prod_{k=1}^{NZ_2} (1 - b_k z^{-1}) \]

where the first term is the gain, G = x[0], and the two product terms are the zeros inside and outside the unit circle.

• for minimum phase systems we have all zeros inside the unit circle, so the second product term is gone, and we have the result that

\[ \hat{x}[0] = \log[G] = \log\left[x[0]\right]; \qquad \hat{x}[n] = 0, \quad n < 0 \]

\[ \hat{x}[n] = -\sum_{k=1}^{NZ_1} \frac{a_k^n}{n}, \quad n > 0 \]

Cepstrum for Maximum Phase Signals

• for maximum phase signals (no poles or zeros inside the unit circle)

\[ c[n] = \frac{\hat{x}[n] + \hat{x}[-n]}{2} \]

giving

\[ \hat{x}[n] = \begin{cases} 0, & n > 0 \\ c[n], & n = 0 \\ 2c[n], & n < 0 \end{cases} \]

• thus the complex cepstrum (for maximum phase signals) can be computed by computing the cepstrum and using the equation above

Recursive Relation for Complex Cepstrum for Maximum Phase Signals

the complex cepstrum for maximum phase signals can be computed recursively from the input signal, $x(n)$, using the relation

$$\hat{x}(n) = \begin{cases} 0, & n > 0 \\ \log[x(0)], & n = 0 \\ \dfrac{x(n)}{x(0)} - \displaystyle\sum_{k=n+1}^{0} \left(\dfrac{k}{n}\right) \hat{x}(k)\,\dfrac{x(n-k)}{x(0)}, & n < 0 \end{cases}$$
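The maximum phase recursion runs over negative time indices, which is awkward to index directly. A minimal sketch follows, assuming NumPy and a hypothetical storage convention in which `x_rev[m]` holds x(-m); with the substitutions m = -n and j = -k the recursion has the same form as the minimum phase one.

```python
import numpy as np

def max_phase_complex_cepstrum(x_rev, n_out):
    """x_rev[m] holds x(-m) for m = 0, 1, ...; returns xhat_rev[m] = xhat(-m).

    Assumes x(0) > 0 so that log(x(0)) is real (an assumption of this sketch).
    """
    x_rev = np.asarray(x_rev, dtype=float)
    xhat_rev = np.zeros(n_out)
    xhat_rev[0] = np.log(x_rev[0])              # xhat(0) = log[x(0)]
    for m in range(1, n_out):                   # m = -n, so n = -m < 0
        xm = x_rev[m] if m < len(x_rev) else 0.0
        acc = 0.0
        for j in range(1, m):                   # j = -k; the k = 0 term is zero
            if m - j < len(x_rev):
                acc += (j / m) * xhat_rev[j] * x_rev[m - j]   # k/n = j/m
        xhat_rev[m] = (xm - acc) / x_rev[0]
    return xhat_rev

# X(z) = 1 - b z with |b| < 1: one zero at z = 1/b, outside the unit circle
b = 0.5
xhat_rev = max_phase_complex_cepstrum([1.0, -b], 5)
expected = np.array([0.0] + [-(b ** m) / m for m in range(1, 5)])
```

The expected values come from the power series $\log(1 - bz) = -\sum_{m \ge 1} b^m z^m / m$, so $\hat{x}(-m) = -b^m/m$.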

Computing Short-Time Cepstrums from Speech Using Polynomial Roots

Cepstrum From Polynomial Roots

Computing Short-Time Cepstrums from Speech Using the DFT

Practical Considerations

• window to define short-time analysis
• window duration (should be several pitch periods long)
• size of FFT (to minimize aliasing)
• elimination of linear phase components (positioning signals within frames)
• cutoff quefrency of lifter
• type of lifter (low/high quefrency)
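These considerations map onto a few lines of DFT code. The sketch below assumes NumPy, an even FFT size, and hypothetical function names; the small floor on the magnitude and the way the linear phase term is estimated (from the unwrapped phase at ω = π) are choices of this sketch, not something the slides specify.

```python
import numpy as np

def real_cepstrum(frame, nfft):
    """Real cepstrum: inverse DFT of the log magnitude spectrum."""
    spectrum = np.fft.fft(frame, nfft)
    log_mag = np.log(np.maximum(np.abs(spectrum), 1e-12))   # floor avoids log(0)
    return np.fft.ifft(log_mag).real

def complex_cepstrum(frame, nfft):
    """Complex cepstrum with phase unwrapping and linear phase removal.

    Returns (xhat, delay), where delay is the integer shift that was removed.
    nfft is assumed even.
    """
    spectrum = np.fft.fft(frame, nfft)
    log_mag = np.log(np.maximum(np.abs(spectrum), 1e-12))
    phase = np.unwrap(np.angle(spectrum))
    center = nfft // 2                                      # DFT bin at omega = pi
    delay = -int(np.round(phase[center] / np.pi))
    phase = phase + np.pi * delay * np.arange(nfft) / center
    xhat = np.fft.ifft(log_mag + 1j * phase).real
    return xhat, delay

# Minimum phase test sequence and the same sequence delayed by two samples
xhat_a, delay_a = complex_cepstrum(np.array([1.0, -0.5]), 1024)
xhat_b, delay_b = complex_cepstrum(np.array([0.0, 0.0, 1.0, -0.5]), 1024)
```

A larger `nfft` reduces quefrency aliasing, and removing the linear phase term makes the result independent of where the signal sits within the frame.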

Computational Considerations

Voiced Speech Example
[Figure: Hamming window, 40 msec duration (section beginning at sample 13000 in file test_16k.wav)]

Voiced Speech Example
[Figure panels: wrapped phase; unwrapped phase]

Characteristic System for Homomorphic Convolution

• still need to define (and design) the L operator part (the linear system component) of the system to completely define the characteristic system for homomorphic convolution for speech
– to do this properly and correctly, need to look at the properties of the complex cepstrum for speech signals

Complex Cepstrum of Speech

• model of speech:
– voiced speech produced by a quasi-periodic pulse train exciting slowly time-varying linear system => p[n] convolved with hv[n]
– unvoiced speech produced by random noise exciting slowly time-varying linear system => u[n] convolved with hu[n]
• time to examine full model and see what the complex cepstrum of speech looks like

Homomorphic Filtering of Voiced Speech

• goal is to separate out the excitation impulses from the remaining components of the complex cepstrum
• use cepstral window, l(n), to separate excitation pulses from combined vocal tract
– l(n)=1 for |n|<n0<Np
– l(n)=0 for |n|≥n0
– this window removes excitation pulses
– l(n)=0 for |n|<n0<Np
– l(n)=1 for |n|≥n0
– this window removes combined vocal tract
• the filtered signal is processed by the inverse characteristic system to recover the combined vocal tract component

$$\hat{y}(n) = l(n) \cdot \hat{x}(n)$$

$$\hat{Y}(e^{j\omega}) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \hat{X}(e^{j\theta})\, L(e^{j(\omega - \theta)})\, d\theta$$
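The two cepstral windows can be sketched for a cepstrum stored in DFT order, where negative quefrencies wrap to the end of the array. This is an illustration assuming NumPy; the function name `lifter` and its `kind` argument are hypothetical.

```python
import numpy as np

def lifter(xhat, n0, kind="low"):
    """Apply l(n) to a length-N cepstrum stored in DFT (circular) order.

    kind="low":  l(n) = 1 for |n| < n0, 0 otherwise (removes excitation pulses)
    kind="high": l(n) = 0 for |n| < n0, 1 otherwise (removes combined vocal tract)
    """
    xhat = np.asarray(xhat, dtype=float)
    size = len(xhat)
    n = np.arange(size)
    abs_n = np.minimum(n, size - n)             # |n| under circular indexing
    low = (abs_n < n0).astype(float)
    window = low if kind == "low" else 1.0 - low
    return window * xhat

demo = np.arange(8.0)
low_part = lifter(demo, 2, "low")               # keeps indices 0, 1 and 7 (n = -1)
high_part = lifter(demo, 2, "high")             # keeps indices 2 through 6
```

Because the two windows are complementary, the low and high quefrency parts add back up to the original cepstrum.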

Voiced Speech Example
[Figure panels: cepstrally smoothed log magnitude, 50 quefrencies cutoff; cepstrally unwrapped phase, 50 quefrencies cutoff]

Voiced Speech Example
[Figure: combined impulse response of glottal pulse, vocal tract system, and radiation system]

Voiced Speech Example
[Figure: high quefrency liftering; cutoff quefrency=50; log magnitude and unwrapped phase]

Voiced Speech Example
[Figure: estimated excitation function for voiced speech (Hamming window weighted)]

Unvoiced Speech Example
[Figure: Hamming window, 40 msec duration (section beginning at sample 3200 in file test_16k.wav)]

Unvoiced Speech Example
[Figure panels: wrapped phase; unwrapped phase]

Unvoiced Speech Example
[Figure panels: cepstrally smoothed log magnitude, 50 quefrencies cutoff; cepstrally unwrapped phase, 50 quefrencies cutoff]

Unvoiced Speech Example
[Figure: estimated excitation source for unvoiced speech section (Hamming window weighted)]

Short-Time Homomorphic Analysis
[Figure label: STFT]

Review of Cepstral Calculation

• 3 potential methods for computing cepstral coefficients, $\hat{x}[n]$, of sequence x[n]
– analytical method; assuming X(z) is a rational function; find poles and zeros and expand using log power series
– recursion method; assuming X(z) is either a minimum phase (all poles and zeros inside unit circle) or maximum phase (all poles and zeros outside unit circle) sequence
– DFT implementation; using windows, with phase unwrapping (for complex cepstra)

Example 1: single pole sequence (computed using all 3 methods)

$$x[n] = a^n\,u[n]$$

$$\hat{x}[n] = \frac{a^n}{n}\,u[n-1]$$
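The three methods can be compared on this single pole sequence. The following is a minimal sketch assuming NumPy; the values a = 0.9, 12 output coefficients, and a 4096-point FFT are hypothetical choices, and the infinite sequence is truncated to the FFT length (the truncation error is negligible here).

```python
import numpy as np

a, n_out, nfft = 0.9, 12, 4096                  # hypothetical parameter choices

# 1. analytical method: xhat[n] = a^n / n for n >= 1, xhat[0] = log(1) = 0
n = np.arange(1, n_out)
analytic = np.concatenate(([0.0], a ** n / n))

# 2. recursion method (minimum phase recursion), using x[n] = a^n u[n]
x = a ** np.arange(nfft)
recursive = np.zeros(n_out)
recursive[0] = np.log(x[0])
for m in range(1, n_out):
    acc = sum((k / m) * recursive[k] * x[m - k] for k in range(1, m))
    recursive[m] = (x[m] - acc) / x[0]

# 3. DFT method: inverse DFT of the complex log of the DFT.
# The phase of X stays within (-pi/2, pi/2) for this sequence, so the principal
# value of the complex log is already continuous and no unwrapping is needed.
spectrum = np.fft.fft(x, nfft)
dft_based = np.fft.ifft(np.log(spectrum)).real[:n_out]
```

The recursion is exact for the coefficients it produces, while the DFT result carries a small quefrency aliasing error that shrinks as the FFT size grows.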

Cepstral Computation Aliasing

• Effect of quefrency aliasing via a simple example

$$x[n] = \delta[n] + \alpha\,\delta[n - N_p]$$

with discrete-time Fourier transform

$$X(e^{j\omega}) = 1 + \alpha\,e^{-j\omega N_p}$$

• We can express the complex logarithm as

$$\hat{X}(e^{j\omega}) = \log\{1 + \alpha\,e^{-j\omega N_p}\} = \sum_{m=1}^{\infty} \left(\frac{(-1)^{m+1}\,\alpha^m}{m}\right) e^{-j\omega m N_p}$$

giving a complex cepstrum in the form

$$\hat{x}[n] = \sum_{m=1}^{\infty} \left(\frac{(-1)^{m+1}\,\alpha^m}{m}\right) \delta[n - m N_p]$$
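The aliasing effect can be reproduced directly: with a DFT of length N, every impulse at quefrency $mN_p$ beyond N wraps around to $mN_p \bmod N$. This sketch assumes NumPy and hypothetical values α = 0.8, $N_p$ = 100, and a deliberately short 256-point DFT.

```python
import numpy as np

alpha, n_p, nfft = 0.8, 100, 256                # hypothetical values

# x[n] = delta[n] + alpha * delta[n - Np]
x = np.zeros(nfft)
x[0] = 1.0
x[n_p] = alpha

# DFT-based complex cepstrum; Re{X} > 0 here, so the principal log is continuous
xhat_dft = np.fft.ifft(np.log(np.fft.fft(x))).real

# Analytical impulses at m*Np, folded modulo the DFT length (time aliasing)
folded = np.zeros(nfft)
for m in range(1, 400):                         # alpha**400 is negligible
    folded[(m * n_p) % nfft] += (-1) ** (m + 1) * alpha ** m / m
```

The m = 3 term, which belongs at quefrency 300, shows up at index 44 of the 256-point result; a longer FFT pushes such aliased terms to higher values of m, where $\alpha^m/m$ is smaller.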

Example 2: voiced speech frame

Example 3: low quefrency liftering

Example 3: high quefrency liftering

Example 4: effects of low quefrency lifter

Example 5: phase unwrapping

Example 6: phase unwrapping

Homomorphic Spectrum Smoothing

Running Cepstrum

Running Cepstrums

Cepstrum Applications

Cepstrum Distance Measures

• The cepstrum forms a natural basis for comparing patterns in speech recognition or vector quantization because of its stable mathematical characterization for speech signals
• A typical "cepstral distance measure" is of the form:

$$D = \sum_{n=1}^{n_{co}} \left(c[n] - \bar{c}[n]\right)^2$$

where $c[n]$ and $\bar{c}[n]$ are cepstral sequences corresponding to frames of signal, and $D$ is the cepstral distance between the pair of sequences.
• Using Parseval's theorem, we can express the cepstral distance in the frequency domain as:

$$D = \frac{1}{2\pi} \int_{-\pi}^{\pi} \left(\log|H(e^{j\omega})| - \log|\bar{H}(e^{j\omega})|\right)^2 d\omega$$

• Thus we see that the cepstral distance is actually a log magnitude spectral distance
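The time-domain form of the distance is a short sum over the low quefrency coefficients. A minimal sketch, assuming NumPy, with a hypothetical function name and cepstra stored so that index n holds the coefficient at quefrency n:

```python
import numpy as np

def cepstral_distance(c1, c2, n_co):
    """Sum of squared differences of cepstral coefficients for n = 1..n_co.

    The n = 0 coefficient (overall gain) is excluded, as in the sum above.
    """
    c1 = np.asarray(c1, dtype=float)
    c2 = np.asarray(c2, dtype=float)
    diff = c1[1:n_co + 1] - c2[1:n_co + 1]
    return float(np.sum(diff ** 2))

# Two toy cepstra that differ in c[0] (ignored) and in c[2]
d = cepstral_distance([0.0, 1.0, 2.0, 3.0], [5.0, 1.0, 0.0, 3.0], n_co=3)
```

Starting the sum at n = 1 makes the measure insensitive to a pure gain difference between the two frames.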

Mel Frequency Cepstral Coefficients

• Basic idea is to compute a frequency analysis based on a filter bank with approximately critical band spacing of the filters and bandwidths. For 4 kHz bandwidth, approximately 20 filters are used.
• First perform a short-time Fourier analysis, giving $X_m[k],\ k = 0, 1, \ldots, NF/2$, where $m$ is the frame number and $k$ is the frequency index (1 to half the size of the FFT)
• Next the DFT values are grouped together in critical bands and weighted by triangular weighting functions.

Mel Frequency Cepstral Coefficients

• The mel-spectrum of the $m^{th}$ frame for the $r^{th}$ filter ($r = 1, 2, \ldots, R$) is defined as:

$$\mathrm{MF}_m[r] = \frac{1}{A_r} \sum_{k=L_r}^{U_r} \left| V_r[k]\,X_m[k] \right|^2$$

where $V_r[k]$ is the weighting function for the $r^{th}$ filter, ranging from DFT index $L_r$ to $U_r$, and

$$A_r = \sum_{k=L_r}^{U_r} \left| V_r[k] \right|^2$$

is the normalizing factor for the $r^{th}$ mel-filter. (Normalization guarantees that if the input spectrum is flat, the mel-spectrum is flat).
• A discrete cosine transform of the log magnitude of the filter outputs is computed to form the function $\mathrm{mfcc}_m[n]$ as:

$$\mathrm{mfcc}_m[n] = \frac{1}{R} \sum_{r=1}^{R} \log\left(\mathrm{MF}_m[r]\right) \cos\left[\frac{2\pi}{R}\left(r + \frac{1}{2}\right) n\right], \quad n = 1, 2, \ldots, N_{\mathrm{mfcc}}$$

• Typically $N_{\mathrm{mfcc}} = 13$ and $R = 24$ for 4 kHz bandwidth speech signals.
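The mel-spectrum and DCT formulas translate almost line for line into array code. The sketch below assumes NumPy and hypothetical function names. The helper `triangular_filters` places triangles uniformly along the DFT index axis purely as a stand-in; a real front end would space them on a mel (critical band) scale, which is not reproduced here.

```python
import numpy as np

def triangular_filters(num_filters, num_bins):
    """Stand-in weights V_r[k]: uniformly spaced triangles (NOT mel spaced)."""
    edges = np.linspace(0, num_bins - 1, num_filters + 2)
    k = np.arange(num_bins)
    weights = np.zeros((num_filters, num_bins))
    for r in range(num_filters):
        lo, mid, hi = edges[r], edges[r + 1], edges[r + 2]
        rising = (k - lo) / (mid - lo)
        falling = (hi - k) / (hi - mid)
        weights[r] = np.clip(np.minimum(rising, falling), 0.0, None)
    return weights

def mel_spectrum(dft_values, weights):
    """MF_m[r] = (1/A_r) * sum_k |V_r[k] X_m[k]|^2 for one frame."""
    num = np.sum(np.abs(weights * dft_values[None, :]) ** 2, axis=1)
    norm = np.sum(np.abs(weights) ** 2, axis=1)             # A_r
    return num / norm

def mfcc_from_mel(mel, n_mfcc):
    """mfcc_m[n] = (1/R) * sum_r log(MF_m[r]) cos[(2 pi / R)(r + 1/2) n]."""
    num_filters = len(mel)
    r = np.arange(1, num_filters + 1)
    n = np.arange(1, n_mfcc + 1)
    basis = np.cos(2.0 * np.pi / num_filters * np.outer(n, r + 0.5))
    return basis @ np.log(mel) / num_filters

# Flat input spectrum: the normalization should give a flat mel-spectrum
V = triangular_filters(24, 129)
flat_mel = mel_spectrum(np.ones(129, dtype=complex), V)
flat_mfcc = mfcc_from_mel(flat_mel, 13)
```

With a flat input spectrum every filter output equals 1, so the log is zero and all 13 coefficients vanish, which matches the stated purpose of the normalizing factor.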

Delta Cepstrum

• The set of mel frequency cepstral coefficients provides perceptually meaningful and smooth estimates of speech spectra, over time
• Since speech is inherently a dynamic signal, it is reasonable to seek a representation that includes some aspect of the dynamic nature of the time derivatives (both first and second order derivatives) of the short-term cepstrum
• The resulting parameter sets are called the delta cepstrum (first derivative) and the delta-delta cepstrum (second derivative).
• The simplest method of computing delta cepstrum parameters is a first difference of cepstral vectors, of the form:

$$\Delta\mathrm{mfcc}_m[n] = \mathrm{mfcc}_m[n] - \mathrm{mfcc}_{m-1}[n]$$

• The simple difference is a poor approximation to the first derivative and is not generally used. Instead a least-squares approximation to the local slope (over a region around the current sample) is used, and is of the form:

$$\Delta\mathrm{mfcc}_m[n] = \frac{\displaystyle\sum_{k=-M}^{M} k\,\mathrm{mfcc}_{m+k}[n]}{\displaystyle\sum_{k=-M}^{M} k^2}$$

where the region is $M$ frames before and after the current frame
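The least-squares slope can be sketched as a weighted sum over neighbouring frames. This illustration assumes NumPy, a feature matrix with one row per frame, and a hypothetical function name; repeating the first and last frames at the edges is a choice of this sketch, since the slides do not say how frame boundaries are handled.

```python
import numpy as np

def delta_cepstrum(features, M=2):
    """Least-squares slope over frames m-M..m+M for each coefficient.

    features has shape (num_frames, num_coefficients).
    """
    features = np.asarray(features, dtype=float)
    num_frames = features.shape[0]
    padded = np.pad(features, ((M, M), (0, 0)), mode="edge")   # edge handling: assumption
    denom = sum(k * k for k in range(-M, M + 1))
    out = np.zeros_like(features)
    for k in range(-M, M + 1):
        out += k * padded[M + k : M + k + num_frames]          # row m holds frame m+k
    return out / denom

# A coefficient that rises by 3 per frame has slope 3 away from the edges
ramp = 3.0 * np.arange(10, dtype=float)[:, None]
deltas = delta_cepstrum(ramp, M=2)
delta_deltas = delta_cepstrum(deltas, M=2)
```

Applying the same operation to the delta features gives one common way to form delta-delta features.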

Homomorphic Vocoder

• time-dependent complex cepstrum retains all the information of the time-dependent Fourier transform => exact representation of speech
• time-dependent real cepstrum loses phase information => not an exact representation of speech
• quantization of cepstral parameters also loses information
• cepstrum gives good estimates of pitch, voicing, formants => can build homomorphic vocoder

Homomorphic Vocoder

1. compute cepstrum every 10-20 msec
2. estimate pitch period and voiced/unvoiced decision
3. quantize and encode low-time cepstral values
4. at synthesizer, get approximation to hv(n) or hu(n) from low-time quantized cepstral values
5. convolve hv(n) or hu(n) with excitation created from pitch, voiced/unvoiced, and amplitude information
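The analysis side (steps 1 to 3, without quantization) can be sketched as follows. This is a hedged illustration assuming NumPy: the function name, the pitch search range of 20 to 200 samples, and the voicing threshold of 0.1 are hypothetical, and picking the largest cepstral peak in the search range is one common way to estimate the pitch period rather than a method the slides prescribe. The 26 low-time values follow the cepstrum window length quoted for the vocoder.

```python
import numpy as np

def analyze_frame(frame, nfft=1024, n_low=26, pitch_range=(20, 200), threshold=0.1):
    """Return (low-time cepstrum, pitch period in samples, voiced flag)."""
    magnitude = np.abs(np.fft.fft(frame, nfft))
    cepstrum = np.fft.ifft(np.log(np.maximum(magnitude, 1e-12))).real
    lo, hi = pitch_range
    period = lo + int(np.argmax(cepstrum[lo:hi]))   # strongest high quefrency peak
    voiced = bool(cepstrum[period] > threshold)     # hypothetical voicing rule
    return cepstrum[:n_low], period, voiced

# Synthetic check: two pulses 80 samples apart exciting a one-pole response
pulses = np.zeros(100)
pulses[0], pulses[80] = 1.0, 0.8
response = 0.9 ** np.arange(200)
frame = np.convolve(pulses, response)
low_time, period, voiced = analyze_frame(frame)
```

For this synthetic frame the cepstral peak falls at the pulse spacing of 80 samples. On real speech the frame would first be windowed, and both the search range and the threshold would need tuning.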

Homomorphic Vocoder
[Figure: homomorphic vocoder]

• l(n) is cepstrum window that selects low-time values and is of length 26 samples

Homomorphic Vocoder Impulse Responses

Summary

• Introduced the concept of the cepstrum of a signal, defined as the inverse Fourier transform of the log of the signal spectrum

$$\hat{x}[n] = F^{-1}\left[\log X(e^{j\omega})\right]$$

• Showed cepstrum reflected properties of both the excitation (high quefrency) and the vocal tract (low quefrency)
– short quefrency window filters out excitation; long quefrency window filters out vocal tract
• Mel-scale cepstral coefficients used as feature set for speech recognition
• Delta and delta-delta cepstral coefficients used as indicators of spectral change over time