Digital Speech Processing, Lecture 12
Homomorphic Speech Processing
General Discrete-Time Model of Speech Production
Voiced speech:
\[ p_L[n] = p[n] * h_V[n], \qquad h_V[n] = A_V \cdot g[n] * v[n] * r[n] \]
Unvoiced speech:
\[ p_L[n] = u[n] * h_U[n], \qquad h_U[n] = A_N \cdot v[n] * r[n] \]
Basic Speech Model
• a short segment of speech can be modeled as having been generated by exciting an LTI system either by a quasi-periodic impulse train or by a random noise signal
• speech analysis => estimate the parameters of the speech model and measure their variations with time (and perhaps even their statistical variabilities, for quantization)
• speech = excitation * system response => want to deconvolve speech into excitation and system response => do this using homomorphic filtering methods
Superposition Principle
[block diagram: x[n] → L{·} → y[n] = L{x[n]}; the input x_1[n] + x_2[n] produces the output L{x_1[n]} + L{x_2[n]}]
\[ x[n] = a\,x_1[n] + b\,x_2[n] \]
\[ y[n] = L\{x[n]\} = a\,L\{x_1[n]\} + b\,L\{x_2[n]\} \]
Generalized Superposition for Convolution
[block diagram: x[n] → H{·} → y[n] = H{x[n]}; the input x_1[n] * x_2[n] produces the output H{x_1[n]} * H{x_2[n]}]
• for LTI systems we have the result
\[ y[n] = x[n] * h[n] = \sum_{k=-\infty}^{\infty} x[k]\,h[n-k] \]
• "generalized" superposition => addition replaced by convolution
\[ x[n] = x_1[n] * x_2[n] \]
\[ y[n] = H\{x[n]\} = H\{x_1[n]\} * H\{x_2[n]\} \]
• homomorphic system for convolution
Homomorphic Filter
• homomorphic filter => homomorphic system H[·] that passes the desired signal unaltered, while removing the undesired signal
\[ x[n] = x_1[n] * x_2[n] \quad \text{with } x_1[n] \text{ the "undesired" signal} \]
\[ H\{x[n]\} = H\{x_1[n]\} * H\{x_2[n]\} \]
\[ H\{x_1[n]\} \rightarrow \delta[n] \quad \text{(removal of } x_1[n]\text{)} \]
\[ H\{x_2[n]\} \rightarrow x_2[n] \]
\[ H\{x[n]\} = \delta[n] * x_2[n] = x_2[n] \]
• for linear systems this is analogous to additive noise removal
Canonic Form for Homomorphic Deconvolution
[block diagram: x[n] → D_*{·} → x̂[n] → L{·} → ŷ[n] → D_*^{-1}{·} → y[n]; the signals at the four points are x_1[n] * x_2[n], x̂_1[n] + x̂_2[n], ŷ_1[n] + ŷ_2[n] and y_1[n] * y_2[n]]
• any homomorphic system can be represented as a cascade of three systems, e.g., for convolution
  1. system takes inputs combined by convolution and transforms them into additive outputs
  2. system is a conventional linear system
  3. inverse of first system -- takes additive inputs and transforms them into convolutional outputs
Canonic Form for Homomorphic Convolution
[block diagram: x[n] → D_*{·} → x̂[n] → L{·} → ŷ[n] → D_*^{-1}{·} → y[n]]
\[ x[n] = x_1[n] * x_2[n] \quad \text{(convolutional relation)} \]
\[ \hat{x}[n] = D_*\{x[n]\} = \hat{x}_1[n] + \hat{x}_2[n] \quad \text{(additive relation)} \]
\[ \hat{y}[n] = L\{\hat{x}_1[n] + \hat{x}_2[n]\} = \hat{y}_1[n] + \hat{y}_2[n] \quad \text{(conventional linear system)} \]
\[ y[n] = D_*^{-1}\{\hat{y}_1[n] + \hat{y}_2[n]\} = y_1[n] * y_2[n] \quad \text{(inverse of convolutional relation)} \]
=> design converted back to linear system, L
• D_*[·] -- fixed (called the characteristic system for homomorphic deconvolution)
• D_*^{-1}[·] -- fixed (characteristic system for inverse homomorphic deconvolution)
Properties of Characteristic Systems
\[ \hat{x}[n] = D_*\{x[n]\} = D_*\{x_1[n] * x_2[n]\} = D_*\{x_1[n]\} + D_*\{x_2[n]\} = \hat{x}_1[n] + \hat{x}_2[n] \]
\[ D_*^{-1}\{\hat{y}[n]\} = D_*^{-1}\{\hat{y}_1[n] + \hat{y}_2[n]\} = D_*^{-1}\{\hat{y}_1[n]\} * D_*^{-1}\{\hat{y}_2[n]\} = y_1[n] * y_2[n] = y[n] \]
Discrete-Time Fourier Transform Representations
Canonic Form for Deconvolution Using DTFTs
• need to find a system that converts convolution to addition
\[ x[n] = x_1[n] * x_2[n] \]
\[ X(e^{j\omega}) = X_1(e^{j\omega}) \cdot X_2(e^{j\omega}) \]
• since
\[ D_*\{x[n]\} = \hat{x}_1[n] + \hat{x}_2[n] = \hat{x}[n] \]
\[ D_*\left[X(e^{j\omega})\right] = \hat{X}_1(e^{j\omega}) + \hat{X}_2(e^{j\omega}) = \hat{X}(e^{j\omega}) \]
=> use log function which converts products to sums
\[ \hat{X}(e^{j\omega}) = \log\left[X(e^{j\omega})\right] = \log\left[X_1(e^{j\omega}) \cdot X_2(e^{j\omega})\right] = \log\left[X_1(e^{j\omega})\right] + \log\left[X_2(e^{j\omega})\right] = \hat{X}_1(e^{j\omega}) + \hat{X}_2(e^{j\omega}) \]
\[ \hat{Y}(e^{j\omega}) = L\left[\hat{X}_1(e^{j\omega}) + \hat{X}_2(e^{j\omega})\right] = \hat{Y}_1(e^{j\omega}) + \hat{Y}_2(e^{j\omega}) \]
\[ Y(e^{j\omega}) = \exp\left[\hat{Y}_1(e^{j\omega}) + \hat{Y}_2(e^{j\omega})\right] = Y_1(e^{j\omega}) \cdot Y_2(e^{j\omega}) \]
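The convolution-to-addition property can be checked numerically. The sketch below is only an illustration: the two sequences and the DFT length are arbitrary choices made for this example, and only the log magnitude is compared, which avoids the phase-ambiguity issue of the complex logarithm.

```python
import numpy as np

# Two short illustrative sequences (arbitrary values, assumed for this sketch).
x1 = np.array([1.0, 0.5, 0.25])
x2 = np.array([1.0, -0.3])
x = np.convolve(x1, x2)             # x[n] = x1[n] * x2[n]

nfft = 64                           # DFT length, longer than len(x) so nothing wraps around
X1 = np.fft.fft(x1, nfft)
X2 = np.fft.fft(x2, nfft)
X = np.fft.fft(x, nfft)

# Convolution in time is a product in frequency ...
product_ok = np.allclose(X, X1 * X2)

# ... and the log magnitude turns that product into a sum.
log_sum_ok = np.allclose(np.log(np.abs(X)),
                         np.log(np.abs(X1)) + np.log(np.abs(X2)))

print(product_ok, log_sum_ok)
```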
Characteristic System for Deconvolution Using DTFTs
\[ X(e^{j\omega}) = \sum_{n=-\infty}^{\infty} x[n]\,e^{-j\omega n} \]
\[ \hat{X}(e^{j\omega}) = \log\left[X(e^{j\omega})\right] = \log\left|X(e^{j\omega})\right| + j\arg\left[X(e^{j\omega})\right] \]
\[ \hat{x}[n] = \frac{1}{2\pi}\int_{-\pi}^{\pi} \hat{X}(e^{j\omega})\,e^{j\omega n}\,d\omega \]
Inverse Characteristic System for Deconvolution Using DTFTs
\[ \hat{Y}(e^{j\omega}) = \sum_{n=-\infty}^{\infty} \hat{y}[n]\,e^{-j\omega n} \]
\[ Y(e^{j\omega}) = \exp\left[\hat{Y}(e^{j\omega})\right] \]
\[ y[n] = \frac{1}{2\pi}\int_{-\pi}^{\pi} Y(e^{j\omega})\,e^{j\omega n}\,d\omega \]
Issues with Logarithms
• it is essential that the logarithm obey the equation
\[ \log\left[X_1(e^{j\omega}) \cdot X_2(e^{j\omega})\right] = \log\left[X_1(e^{j\omega})\right] + \log\left[X_2(e^{j\omega})\right] \]
• this is trivial if X_1(e^{jω}) and X_2(e^{jω}) are real and positive -- however usually X_1(e^{jω}) and X_2(e^{jω}) are complex
• on the unit circle the complex log can be written in the form:
\[ X(e^{j\omega}) = \left|X(e^{j\omega})\right| e^{j\arg[X(e^{j\omega})]} \]
\[ \log\left[X(e^{j\omega})\right] = \hat{X}(e^{j\omega}) = \log\left|X(e^{j\omega})\right| + j\arg\left[X(e^{j\omega})\right] \]
• no problems with log magnitude term; uniqueness problems arise in defining the imaginary part of the log; can show that the imaginary part (the phase angle of the z-transform) needs to be a continuous odd function of ω
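Problems with arg Function
• the principal-value phase ARG{·}, which is confined to (−π, π], is in general not additive:
\[ \mathrm{ARG}\{X(e^{j\omega})\} \neq \mathrm{ARG}\{X_1(e^{j\omega})\} + \mathrm{ARG}\{X_2(e^{j\omega})\} \]
• the continuous (unwrapped) phase arg{·} is additive:
\[ \arg\{X(e^{j\omega})\} = \arg\{X_1(e^{j\omega})\} + \arg\{X_2(e^{j\omega})\} \]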
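A small numerical illustration of the difference between the principal value and the unwrapped phase. This is a sketch under stated assumptions: the two delayed sequences and the DFT length are made up for the example, the DFT is assumed dense enough that `np.unwrap` tracks the continuous phase, and both spectra are positive at ω = 0 so the unwrapped phases start from zero.

```python
import numpy as np

nfft = 512
# Illustrative sequences (assumed values); the leading zeros add linear phase.
x1 = np.array([0.0, 0.0, 0.0, 1.0, 0.5])         # delayed by 3 samples
x2 = np.array([0.0, 0.0, 0.0, 0.0, 1.0, -0.4])   # delayed by 4 samples
X1 = np.fft.fft(x1, nfft)
X2 = np.fft.fft(x2, nfft)
X = X1 * X2                                      # spectrum of x1[n] * x2[n]

# ARG: principal value in (-pi, pi]; the sum of two principal values can
# leave that interval, so additivity fails.
ARG = np.angle
wrapped_additive = np.allclose(ARG(X), ARG(X1) + ARG(X2))

# arg: continuous phase obtained by unwrapping along the frequency axis.
def arg(Z):
    return np.unwrap(np.angle(Z))

unwrapped_additive = np.allclose(arg(X), arg(X1) + arg(X2))

print(wrapped_additive, unwrapped_additive)
```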
Complex Cepstrum Properties
\[ \hat{x}[n] = \frac{1}{2\pi}\int_{-\pi}^{\pi} \left(\log\left|X(e^{j\omega})\right| + j\arg\{X(e^{j\omega})\}\right) e^{j\omega n}\,d\omega \]
• Given a complex logarithm that satisfies the phase continuity condition, we have:
• If x[n] is real, then log|X(e^{jω})| is an even function of ω and arg{X(e^{jω})} is an odd function of ω. This means that the real and imaginary parts of the complex log have the appropriate symmetry for x̂[n] to be a real sequence, and x̂[n] can be represented as:
\[ \hat{x}[n] = c[n] + d[n] \]
where c[n] is the inverse DTFT of log|X(e^{jω})| and the even part of x̂[n], and d[n] is the inverse DTFT of j arg{X(e^{jω})} and the odd part of x̂[n]:
\[ c[n] = \frac{\hat{x}[n] + \hat{x}[-n]}{2}; \qquad d[n] = \frac{\hat{x}[n] - \hat{x}[-n]}{2} \]
Complex and Real Cepstrum
• define the inverse Fourier transform of X̂(e^{jω}) as
\[ \hat{x}[n] = \frac{1}{2\pi}\int_{-\pi}^{\pi} \hat{X}(e^{j\omega})\,e^{j\omega n}\,d\omega \]
• where x̂[n] is called the "complex cepstrum" since a complex logarithm is involved in the computation
• can also define a "real cepstrum" using just the real part of the logarithm, giving
\[ c[n] = \frac{1}{2\pi}\int_{-\pi}^{\pi} \mathrm{Re}\left[\hat{X}(e^{j\omega})\right] e^{j\omega n}\,d\omega = \frac{1}{2\pi}\int_{-\pi}^{\pi} \log\left|X(e^{j\omega})\right| e^{j\omega n}\,d\omega \]
• can show that c[n] is the even part of x̂[n]
Terminology
• Spectrum – Fourier transform of signal autocorrelation
• Cepstrum – inverse Fourier transform of log spectrum
• Analysis – determining the spectrum of a signal
• Alanysis – determining the cepstrum of a signal
• Filtering – linear operation on time signal
• Liftering – linear operation on cepstrum
• Frequency – independent variable of spectrum
• Quefrency – independent variable of cepstrum
• Harmonic – integer multiple of fundamental frequency
• Rahmonic – integer multiple of fundamental quefrency
z-Transform Representation
• The z-transform of the signal x[n] = x_1[n] * x_2[n] is of the form:
\[ X(z) = X_1(z) \cdot X_2(z) \]
• With an appropriate definition of the complex log, we get:
\[ \hat{X}(z) = \log\{X(z)\} = \log\{X_1(z) \cdot X_2(z)\} = \log\{X_1(z)\} + \log\{X_2(z)\} = \hat{X}_1(z) + \hat{X}_2(z) \]
Characteristic System for Deconvolution
\[ X(z) = \sum_{n=-\infty}^{\infty} x[n]\,z^{-n} = |X(z)|\,e^{j\arg\{X(z)\}} \]
\[ \hat{X}(z) = \log[X(z)] = \log|X(z)| + j\arg[X(z)] \]
\[ \hat{x}[n] = \frac{1}{2\pi j}\oint \hat{X}(z)\,z^{n-1}\,dz \]
Inverse Characteristic System for Deconvolution
\[ \hat{Y}(z) = \sum_{n=-\infty}^{\infty} \hat{y}[n]\,z^{-n} \]
\[ Y(z) = \exp\left[\hat{Y}(z)\right], \qquad \hat{Y}(z) = \log|Y(z)| + j\arg[Y(z)] \]
\[ y[n] = \frac{1}{2\pi j}\oint Y(z)\,z^{n-1}\,dz \]
z-Transform Cepstrum Alanysis
• consider digital systems with rational z-transforms of the general type
\[ X(z) = \frac{A \prod_{k=1}^{M_i} (1 - a_k z^{-1}) \prod_{k=1}^{M_0} (1 - b_k^{-1} z^{-1})}{\prod_{k=1}^{N_i} (1 - c_k z^{-1})} \]
• we can express the above equation as:
\[ X(z) = \frac{z^{-M_0} A \prod_{k=1}^{M_0} (-b_k^{-1}) \prod_{k=1}^{M_i} (1 - a_k z^{-1}) \prod_{k=1}^{M_0} (1 - b_k z)}{\prod_{k=1}^{N_i} (1 - c_k z^{-1})} \]
• with all coefficients |a_k|, |b_k|, |c_k| < 1 => all poles c_k and zeros a_k are inside the unit circle; all zeros b_k^{-1} are outside the unit circle
z-Transform Cepstrum Alanysis
• express X(z) as product of minimum-phase and maximum-phase signals, i.e.,
\[ X(z) = X_{\min}(z) \cdot z^{-M_0} X_{\max}(z) \]
where
\[ X_{\min}(z) = \frac{A \prod_{k=1}^{M_i} (1 - a_k z^{-1})}{\prod_{k=1}^{N_i} (1 - c_k z^{-1})} \quad \text{(all poles and zeros inside unit circle)} \]
\[ X_{\max}(z) = \prod_{k=1}^{M_0} (-b_k^{-1}) \prod_{k=1}^{M_0} (1 - b_k z) \quad \text{(all zeros outside unit circle)} \]
z-Transform Cepstrum Alanysis
• can express x[n] as the convolution:
\[ x[n] = x_{\min}[n] * x_{\max}[n - M_0] \]
• minimum-phase component is causal
\[ x_{\min}[n] = 0, \quad n < 0 \]
• maximum-phase component is anti-causal
\[ x_{\max}[n] = 0, \quad n > 0 \]
• factor z^{-M_0} is the shift in time origin by M_0 samples required so that the overall sequence, x[n], be causal
z-Transform Cepstrum Alanysis
• the complex logarithm of X(z) is
\[ \hat{X}(z) = \log[X(z)] = \log|A| + \sum_{k=1}^{M_0} \log|b_k^{-1}| + \log[z^{-M_0}] + \sum_{k=1}^{M_i} \log(1 - a_k z^{-1}) + \sum_{k=1}^{M_0} \log(1 - b_k z) - \sum_{k=1}^{N_i} \log(1 - c_k z^{-1}) \]
• evaluating X̂(z) on the unit circle we can ignore the term related to log[e^{-jωM_0}] (as this contributes only to the imaginary part and is a linear phase shift)
z-Transform Cepstrum Alanysis
• we can then evaluate the remaining terms, use power series expansion for logarithmic terms (and take the inverse transform to give the complex cepstrum) giving:
\[ \hat{x}[n] = \frac{1}{2\pi}\int_{-\pi}^{\pi} \hat{X}(e^{j\omega})\,e^{j\omega n}\,d\omega, \qquad \log(1 - Z) = -\sum_{n=1}^{\infty} \frac{Z^n}{n}, \quad |Z| < 1 \]
\[ \hat{x}[n] = \begin{cases} \log|A| + \sum_{k=1}^{M_0} \log|b_k^{-1}|, & n = 0 \\ \sum_{k=1}^{N_i} \frac{c_k^n}{n} - \sum_{k=1}^{M_i} \frac{a_k^n}{n}, & n > 0 \\ \sum_{k=1}^{M_0} \frac{b_k^{-n}}{n}, & n < 0 \end{cases} \]
Cepstrum Properties
1. complex cepstrum is non-zero and of infinite extent for both positive and negative n, even though x[n] may be causal, or even of finite duration (X(z) has only zeros).
2. complex cepstrum is a decaying sequence that is bounded by:
\[ |\hat{x}[n]| < \beta \frac{\alpha^{|n|}}{|n|}, \quad \text{for } |n| \rightarrow \infty \]
3. zero-quefrency value of complex cepstrum (and the cepstrum) depends on the gain constant and the zeros outside the unit circle. Setting x̂[0] = 0 (and therefore c[0] = 0) is equivalent to normalizing the log magnitude spectrum to a gain constant of:
\[ A \prod_{k=1}^{M_0} (-b_k^{-1}) = 1 \]
4. If X(z) has no zeros outside the unit circle (all b_k = 0), then:
\[ \hat{x}[n] = 0, \quad n < 0 \quad \text{(minimum-phase signals)} \]
5. If X(z) has no poles or zeros inside the unit circle (all a_k, c_k = 0), then:
\[ \hat{x}[n] = 0, \quad n > 0 \quad \text{(maximum-phase signals)} \]
z-Transform Cepstrum Alanysis
• The main z-transform formula for cepstrum alanysis is based on the power series expansion:
\[ \log(1 + x) = \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} x^n, \quad |x| < 1 \]
Example 1 -- apply this formula to the exponential sequence
\[ x_1(n) = a^n u(n) \Leftrightarrow X_1(z) = \frac{1}{1 - a z^{-1}} \]
\[ \hat{X}_1(z) = \log[X_1(z)] = -\log(1 - a z^{-1}) = -\sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} (-a)^n z^{-n} \]
\[ \hat{x}_1(n) = \frac{a^n}{n} u(n-1) \Leftrightarrow \hat{X}_1(z) = -\log(1 - a z^{-1}) = \sum_{n=1}^{\infty} \left(\frac{a^n}{n}\right) z^{-n} \]
z-Transform Cepstrum Alanysis
Example 2 -- consider the case of a digital system with a single zero outside the unit circle (|b| < 1)
\[ x_2(n) = \delta(n) + b\,\delta(n+1) \]
\[ X_2(z) = 1 + bz \quad \text{(zero at } z = -1/b\text{)} \]
\[ \hat{X}_2(z) = \log[X_2(z)] = \log(1 + bz) = \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} b^n z^n \]
Since z^n corresponds to a unit sample at time −n, the coefficient at a negative time index n is (−1)^{−n+1} b^{−n}/(−n), so
\[ \hat{x}_2(n) = \frac{(-1)^{n}\, b^{-n}}{n}\, u(-n-1) \]
z-Transform Cepstrum Alanysis for 2 Pulses
Example 3 -- an input sequence of two pulses of the form
\[ x_3(n) = \delta(n) + \alpha\,\delta(n - N_p), \quad 0 < \alpha < 1 \]
\[ X_3(z) = 1 + \alpha z^{-N_p} \]
\[ \hat{X}_3(z) = \log[X_3(z)] = \log(1 + \alpha z^{-N_p}) = \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} \alpha^n z^{-nN_p} \]
\[ \hat{x}_3(n) = \sum_{k=1}^{\infty} (-1)^{k+1} \frac{\alpha^k}{k}\, \delta(n - kN_p) \]
• the cepstrum is an impulse train with impulses spaced at N_p samples
[figure: cepstral impulses at quefrencies N_p, 2N_p, 3N_p]
Cepstrum for Train of Impulses
• an important special case is a train of impulses of the form:
\[ x(n) = \sum_{r=0}^{M} \alpha_r\, \delta(n - rN_p) \]
\[ X(z) = \sum_{r=0}^{M} \alpha_r z^{-rN_p} \]
• clearly X(z) is a polynomial in z^{-N_p} rather than z^{-1}; thus X(z) can be expressed as a product of factors of the form (1 − a z^{-N_p}) and (1 − b z^{N_p}), giving a complex cepstrum, x̂(n), that is non-zero only at integer multiples of N_p
z-Transform Cepstrum Alanysis for Convolution of 2 Sequences
Example 4 -- consider the convolution of sequences 1 and 3, i.e.,
\[ x_4(n) = x_1(n) * x_3(n) = \left[a^n u(n)\right] * \left[\delta(n) + \alpha\,\delta(n - N_p)\right] = a^n u(n) + \alpha\, a^{n-N_p} u(n - N_p) \]
• The complex cepstrum is therefore the sum of the complex cepstra of the two sequences (since convolution in the time domain is converted to addition in the cepstral domain)
\[ \hat{x}_4(n) = \hat{x}_1(n) + \hat{x}_3(n) = \frac{a^n}{n} u(n-1) + \sum_{k=1}^{\infty} \frac{(-1)^{k+1} \alpha^k}{k}\, \delta(n - kN_p) \]
z-Transform Cepstrum Alanysis for Convolution of 3 Sequences
Example 5 -- consider the convolution of sequences 1, 2 and 3, i.e.,
\[ x_5(n) = x_1(n) * x_2(n) * x_3(n) = \left[a^n u(n)\right] * \left[\delta(n) + b\,\delta(n+1)\right] * \left[\delta(n) + \alpha\,\delta(n - N_p)\right] \]
\[ x_5(n) = a^n u(n) + \alpha\, a^{n-N_p} u(n - N_p) + b\, a^{n+1} u(n+1) + \alpha b\, a^{n+1-N_p} u(n + 1 - N_p) \]
• The complex cepstrum is therefore the sum of the complex cepstra of the three sequences
\[ \hat{x}_5(n) = \hat{x}_1(n) + \hat{x}_2(n) + \hat{x}_3(n) = \frac{a^n}{n} u(n-1) + \sum_{k=1}^{\infty} \frac{(-1)^{k+1} \alpha^k}{k}\, \delta(n - kN_p) + \frac{(-1)^{n}\, b^{-n}}{n}\, u(-n-1) \]
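Example: a = 0.9, b = 0.8, α = 0.7, N_p = 15
\[ X(z) = \frac{(1 + bz)(1 + \alpha z^{-N_p})}{1 - a z^{-1}} \]
[figure: complex cepstrum of the example, with its three components labeled: (a^n/n) for n > 0, ((−1)^n b^{−n}/n) for n < 0, and the impulse train Σ_k ((−1)^{k+1} α^k/k) δ[n − kN_p]]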
Homomorphic Analysis of Speech Model
• the transfer function for voiced speech is of the form
\[ H_V(z) = A_V \cdot G(z)\,V(z)\,R(z) \]
with effective impulse response for voiced speech
\[ h_V[n] = A_V \cdot g[n] * v[n] * r[n] \]
• similarly for unvoiced speech we have
\[ H_U(z) = A_U \cdot V(z)\,R(z) \]
with effective impulse response for unvoiced speech
\[ h_U[n] = A_U \cdot v[n] * r[n] \]
Complex Cepstrum for Speech
• the models for the speech components are as follows:
1. vocal tract:
\[ V(z) = \frac{A z^{-M} \prod_{k=1}^{M_i} (1 - a_k z^{-1}) \prod_{k=1}^{M_0} (1 - b_k z)}{\prod_{k=1}^{N_i} (1 - c_k z^{-1})} \]
  -- for voiced speech, only poles => a_k = b_k = 0, all k
  -- unvoiced speech and nasals need a pole-zero model, but all poles are inside the unit circle => |c_k| < 1
  -- all speech has complex poles and zeros that occur in complex conjugate pairs
2. radiation model:
\[ R(z) \approx 1 - z^{-1} \quad \text{(high frequency emphasis)} \]
3. glottal pulse model: finite duration pulse with transform
\[ G(z) = B \prod_{k=1}^{L_i} (1 - \alpha_k z^{-1}) \prod_{k=1}^{L_0} (1 - \beta_k z) \]
with zeros both inside and outside the unit circle
Complex Cepstrum for Voiced Speech
• combination of vocal tract, glottal pulse and radiation will be non-minimum phase => complex cepstrum exists for all values of n
• the complex cepstrum will decay rapidly for large n (due to polynomial terms in expansion of complex cepstrum)
• effect of the voiced source is a periodic pulse train for multiples of the pitch period
Simplified Speech Model
• short-time speech model
\[ x[n] = w[n] \cdot \left[p[n] * g[n] * v[n] * r[n]\right] \approx p_w[n] * h_v[n] \]
• short-time complex cepstrum
\[ \hat{x}[n] = \hat{p}_w[n] + \hat{g}[n] + \hat{v}[n] + \hat{r}[n] \]
Analysis of Model for Voiced Speech
• Assume sustained /AE/ vowel with fundamental frequency of 125 Hz
• Use glottal pulse model of the form:
\[ g[n] = \begin{cases} 0.5\left[1 - \cos(\pi (n+1)/N_1)\right], & 0 \le n \le N_1 - 1 \\ \cos(0.5\pi (n + 1 - N_1)/N_2), & N_1 \le n \le N_1 + N_2 - 2 \\ 0, & \text{otherwise} \end{cases} \]
\[ N_1 = 25, \; N_2 = 10 \;\Rightarrow\; \text{34 sample impulse response, with transform} \]
\[ G(z) = z^{-33} \prod_{k=1}^{33} (-b_k^{-1}) \prod_{k=1}^{33} (1 - b_k z) \;\Rightarrow\; \text{all roots outside unit circle} \;\Rightarrow\; \text{maximum phase} \]
• Vocal tract system specified by 5 formants (frequencies and bandwidths)
\[ V(z) = \frac{1}{\prod_{k=1}^{5} \left(1 - 2 e^{-2\pi\sigma_k T} \cos(2\pi F_k T)\, z^{-1} + e^{-4\pi\sigma_k T} z^{-2}\right)} \]
\[ \{F_k, \sigma_k\} = [(660, 60), (1720, 100), (2410, 120), (3500, 175), (4500, 250)] \]
• Radiation load is simple first difference
\[ R(z) = 1 - \gamma z^{-1}, \quad \gamma = 0.96 \]
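The model components above can be generated directly from their definitions. The following is a minimal sketch, not the code used to produce the lecture figures: the sampling rate `fs` is an assumption (the slide gives only the symbol T), and the variable names are illustrative.

```python
import numpy as np

# Glottal pulse model with N1 = 25, N2 = 10 (values from the slide).
N1, N2 = 25, 10
n_open = np.arange(N1)                       # 0 <= n <= N1 - 1
n_close = np.arange(N1, N1 + N2 - 1)         # N1 <= n <= N1 + N2 - 2
g = np.concatenate([
    0.5 * (1.0 - np.cos(np.pi * (n_open + 1) / N1)),
    np.cos(0.5 * np.pi * (n_close + 1 - N1) / N2),
])

# Vocal tract: cascade of five second-order all-pole sections.
fs = 10000.0                                 # ASSUMED sampling rate in Hz (not stated in the slide)
T = 1.0 / fs
formants = [(660, 60), (1720, 100), (2410, 120), (3500, 175), (4500, 250)]
den = np.array([1.0])                        # denominator of V(z) in powers of z^-1
for F, sigma in formants:
    r = np.exp(-2.0 * np.pi * sigma * T)     # pole radius of this section
    section = np.array([1.0, -2.0 * r * np.cos(2.0 * np.pi * F * T), r * r])
    den = np.convolve(den, section)

# Radiation load: first difference with gamma = 0.96.
radiation = np.array([1.0, -0.96])

pole_radii = np.abs(np.roots(den))
print(len(g), len(den), pole_radii.max())
```

With these arrays, an impulse response of the combined system could be obtained by filtering `np.convolve(g, radiation)` through the all-pole filter defined by `den`.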
Time Domain Analysis [figure]
Pole-Zero Analysis of Model Components [figure]
Spectral Analysis of Model [figure]
Speech Model Output [figure]
Complex Cepstrum of Model
• The voiced speech signal is modeled as:
\[ s[n] = A_V \cdot g[n] * v[n] * r[n] * p[n] \]
with complex cepstrum:
\[ \hat{s}[n] = \log|A_V|\,\delta[n] + \hat{g}[n] + \hat{v}[n] + \hat{r}[n] + \hat{p}[n] \]
• glottal pulse is maximum phase
\[ \Rightarrow \hat{g}[n] = 0, \quad n > 0 \]
• vocal tract and radiation systems are minimum phase
\[ \Rightarrow \hat{v}[n] = 0, \; n < 0, \qquad \hat{r}[n] = 0, \; n < 0 \]
• the excitation contributes
\[ \hat{P}(z) = -\log(1 - \beta z^{-N_p}) = \sum_{k=1}^{\infty} \frac{\beta^k}{k} z^{-kN_p} \]
\[ \hat{p}[n] = \sum_{k=1}^{\infty} \frac{\beta^k}{k}\, \delta[n - kN_p] \]
Cepstral Analysis of Model [figure]
Resulting Complex and Real Cepstra [figure]
Frequency Domain Representations [figure]
Frequency-Domain Representation of Complex Cepstrum [figure]
The Complex Cepstrum -- DFT Implementation
\[ X_p[k] = X(e^{j\frac{2\pi}{N}k}) = \sum_{n=-\infty}^{\infty} x[n]\, e^{-j\frac{2\pi}{N}kn}, \quad k = 0, 1, \ldots, N-1 \]
• X_p[k] is the N point DFT corresponding to X(e^{jω})
\[ \hat{X}_p[k] = \hat{X}(e^{j2\pi k/N}) = \log\{X_p[k]\} = \log|X_p[k]| + j\arg\{X_p[k]\} \]
\[ \tilde{\hat{x}}[n] = \frac{1}{N} \sum_{k=0}^{N-1} \hat{X}_p[k]\, e^{j\frac{2\pi}{N}kn} = \sum_{r=-\infty}^{\infty} \hat{x}[n + rN], \quad n = 0, 1, \ldots, N-1 \]
• \tilde{\hat{x}}[n] is an aliased version of x̂[n] ⇒ use as large a value of N as possible to minimize aliasing
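A minimal sketch of the DFT implementation with phase unwrapping. The function name and the linear-phase removal step are choices made for this illustration (removing the linear phase corresponds to ignoring a pure delay); it assumes an even `nfft`, a spectrum with no zeros on the unit circle, X(e^{j0}) > 0, and a DFT dense enough for `np.unwrap` to follow the continuous phase.

```python
import numpy as np

def complex_cepstrum(x, nfft):
    """Aliased complex cepstrum via the DFT (illustrative sketch).

    Returns the cepstrum in DFT order (negative quefrencies at the end)
    and the integer multiple of pi of linear phase that was removed.
    """
    X = np.fft.fft(x, nfft)
    log_mag = np.log(np.abs(X))
    phase = np.unwrap(np.angle(X))            # continuous phase estimate
    half = nfft // 2
    # For a real sequence the phase at omega = pi is a multiple of pi; that
    # multiple measures the linear-phase (pure delay) component.
    ndelay = int(np.round(phase[half] / np.pi))
    phase = phase - np.pi * ndelay * np.arange(nfft) / half
    xhat = np.fft.ifft(log_mag + 1j * phase).real
    return xhat, ndelay

# Check against Example 1: x[n] = a^n u[n]  ->  xhat[n] = a^n / n for n >= 1.
a = 0.9
x1 = a ** np.arange(200)                      # truncated exponential
xhat1, delay1 = complex_cepstrum(x1, 4096)
print(xhat1[1], xhat1[2], delay1)
```

For a minimum-phase input such as this one, the negative-quefrency values (the end of the returned array) should be close to zero.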
Inverse System -- DFT Implementation [figure]
The Cepstrum -- DFT Implementation
\[ c[n] = \frac{1}{2\pi}\int_{-\pi}^{\pi} \log\left|X(e^{j\omega})\right| e^{j\omega n}\,d\omega, \quad -\infty < n < \infty \]
• Approximation to cepstrum using DFT:
\[ X_p[k] = X(e^{j\frac{2\pi}{N}k}) = \sum_{n=-\infty}^{\infty} x[n]\, e^{-j\frac{2\pi}{N}kn}, \quad k = 0, 1, \ldots, N-1 \]
\[ \tilde{c}[n] = \frac{1}{N} \sum_{k=0}^{N-1} \log|X_p[k]|\, e^{j2\pi kn/N}, \quad 0 \le n \le N-1 \]
\[ \tilde{c}[n] = \sum_{r=-\infty}^{\infty} c[n + rN], \quad n = 0, 1, \ldots, N-1 \]
• \tilde{c}[n] is an aliased version of c[n] ⇒ use as large a value of N as possible to minimize aliasing
\[ \tilde{c}[n] = \frac{\tilde{\hat{x}}[n] + \tilde{\hat{x}}[-n]}{2} \]
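The DFT approximation to the real cepstrum needs no phase unwrapping. A minimal sketch, using a truncated exponential test sequence chosen for this example; since c[n] is the even part of x̂[n] = (a^n/n) u[n−1], its values for n ≥ 1 should be a^n/(2n).

```python
import numpy as np

def real_cepstrum(x, nfft):
    """Aliased real cepstrum: inverse DFT of the log magnitude of the DFT."""
    X = np.fft.fft(x, nfft)
    return np.fft.ifft(np.log(np.abs(X))).real

a = 0.9
x = a ** np.arange(200)          # truncated a^n u[n] (illustrative test signal)
c = real_cepstrum(x, 4096)       # large N keeps the aliasing negligible

print(c[1], a / 2)               # c[1] should be close to a/2
print(c[2], a ** 2 / 4)          # c[2] should be close to a^2/4
```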
Cepstral Computation Aliasing
[figure: cepstrum computed with N = 256, N_p = 75, α = 0.8. Circled dots are cepstrum values in their correct locations; all other dots are the results of aliasing due to finite-range computations.]
Summary
1. Homomorphic System for Convolution:
[block diagram: x[n] → z(·) → X(z) → log(·) → X̂(z) → L(·) → Ŷ(z) → exp(·) → Y(z) → z^{-1}(·) → y[n], where the block L(·) is realized as X̂(z) → z^{-1}(·) → x̂[n] → Linear System → ŷ[n] → z(·) → Ŷ(z)]
2. Practical Case:
\[ z(\cdot) \rightarrow \mathrm{DFT}, \qquad z^{-1}(\cdot) \rightarrow \mathrm{IDFT} \]
\[ X(e^{j\omega}) = \left|X(e^{j\omega})\right| e^{j\arg\{X(e^{j\omega})\}} \]
\[ \log\left[X(e^{j\omega})\right] = \log\left|X(e^{j\omega})\right| + j\arg\{X(e^{j\omega})\} \]
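Summary
3. Complex Cepstrum:
\[ \hat{x}[n] = \frac{1}{2\pi}\int_{-\pi}^{\pi} \hat{X}(e^{j\omega})\,e^{j\omega n}\,d\omega \]
4. Cepstrum:
\[ c[n] = \frac{1}{2\pi}\int_{-\pi}^{\pi} \log\left|X(e^{j\omega})\right| e^{j\omega n}\,d\omega \]
\[ c[n] = \frac{\hat{x}[n] + \hat{x}[-n]}{2} = \text{even part of } \hat{x}[n] \]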
Summary
5. Practical Implementation of Complex Cepstrum:
[block diagram: x(n) → DFT → X(k) → Complex Log → X̂(k) → IDFT → x̂(n)]
\[ X_p[k] = X(e^{j(2\pi/N)k}) = \sum_{n=-\infty}^{\infty} x[n]\, e^{-j(2\pi/N)kn} \]
\[ \hat{X}_p[k] = \log\{X_p[k]\} = \log|X_p[k]| + j\arg\{X_p[k]\} \]
\[ \tilde{\hat{x}}[n] = \frac{1}{N} \sum_{k=0}^{N-1} \hat{X}_p[k]\, e^{j(2\pi/N)kn} = \sum_{r=-\infty}^{\infty} \hat{x}[n + rN] \;\Rightarrow\; \text{aliasing} \]
\[ \tilde{\hat{x}}[n] = \text{aliased version of } \hat{x}[n] \]
6. Examples:
\[ X(z) = \frac{1}{1 - a z^{-1}} \Leftrightarrow \hat{x}[n] = \sum_{r=1}^{\infty} \frac{a^r}{r}\, \delta[n - r] \]
\[ X(z) = 1 - bz \Leftrightarrow \hat{x}[n] = -\sum_{r=1}^{\infty} \frac{b^r}{r}\, \delta[n + r] \]
Complex Cepstrum Without Phase Unwrapping
• short-time analysis uses finite-length windowed segments, x[n]
\[ X(z) = \sum_{n=0}^{M} x[n]\, z^{-n}, \quad M^{\text{th}}\text{-order polynomial} \]
• Find polynomial roots
\[ X(z) = x[0] \prod_{m=1}^{M_i} (1 - a_m z^{-1}) \prod_{m=1}^{M_0} (1 - b_m^{-1} z^{-1}) \]
• a_m roots are inside unit circle (minimum-phase part)
• b_m^{-1} roots are outside unit circle (maximum-phase part)
• Factor out terms of form −b_m^{-1} z^{-1} giving:
\[ X(z) = A z^{-M_0} \prod_{m=1}^{M_i} (1 - a_m z^{-1}) \prod_{m=1}^{M_0} (1 - b_m z) \]
\[ A = x[0]\, (-1)^{M_0} \prod_{m=1}^{M_0} b_m^{-1} \]
• Use polynomial root finder to find the zeros that lie inside and outside the unit circle and solve directly for x̂[n].
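A minimal sketch of the root-finding approach, assuming a real finite-length sequence with x[0] ≠ 0 and no zeros on the unit circle. The function name is illustrative, and the linear-phase term z^{-M_0} is dropped, as in the z-transform analysis earlier in the lecture.

```python
import numpy as np

def cepstrum_from_roots(x, n_max):
    """Complex cepstrum of a finite sequence from its polynomial roots.

    Returns (n_axis, xhat) for n = -n_max, ..., n_max.
    """
    x = np.asarray(x, dtype=float)
    roots = np.roots(x)                        # zeros of X(z)
    a = roots[np.abs(roots) < 1.0]             # zeros inside the unit circle
    b = 1.0 / roots[np.abs(roots) > 1.0]       # reciprocals of outside zeros, |b| < 1
    A = x[0] * np.prod(-1.0 / b)               # A = x[0] (-1)^M0 prod(b_m^-1)
    n = np.arange(1, n_max + 1)
    pos = -np.sum(a[:, None] ** n / n, axis=0).real   # xhat[n]  = -sum a^n / n,  n > 0
    neg = -np.sum(b[:, None] ** n / n, axis=0).real   # xhat[-n] = -sum b^n / n,  n > 0
    xhat = np.concatenate([neg[::-1], [np.log(np.abs(A))], pos])
    return np.arange(-n_max, n_max + 1), xhat

# X(z) = (1 - 0.5 z^-1)(1 + 2 z^-1): one zero inside, one outside the unit circle.
x = np.convolve([1.0, -0.5], [1.0, 2.0])
n_axis, xhat = cepstrum_from_roots(x, 3)
for n_val, v in zip(n_axis, xhat):
    print(n_val, v)
```

Because no logarithm of a complex spectrum is taken, this route avoids both phase unwrapping and quefrency aliasing; its cost is dominated by the root finder.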
Cepstrum for Minimum Phase Signals
• for minimum phase signals (no poles or zeros outside unit circle) the complex cepstrum can be completely represented by the real part of the Fourier transform X̂(e^{jω})
• this means we can represent the complex cepstrum of minimum phase signals by the log of the magnitude of the FT alone
• since the real part of the FT is the FT of the even part of the sequence
\[ \mathrm{Re}\left[\hat{X}(e^{j\omega})\right] = FT\left[\frac{\hat{x}(n) + \hat{x}(-n)}{2}\right] = FT[c(n)] = \log\left|X(e^{j\omega})\right| \]
\[ c(n) = \frac{\hat{x}(n) + \hat{x}(-n)}{2} \]
• giving
\[ \hat{x}(n) = \begin{cases} 0, & n < 0 \\ c(n), & n = 0 \\ 2c(n), & n > 0 \end{cases} \]
• thus the complex cepstrum (for minimum phase signals) can be computed by computing the cepstrum and using the equation above
Recursive Relation for Complex Cepstrum for Minimum Phase Signals
• the complex cepstrum for minimum phase signals can be computed recursively from the input signal, x(n), using the relation
\[ \hat{x}(n) = \begin{cases} 0, & n < 0 \\ \log[x(0)], & n = 0 \\ \dfrac{x(n)}{x(0)} - \sum_{k=0}^{n-1} \left(\dfrac{k}{n}\right) \hat{x}(k)\, \dfrac{x(n-k)}{x(0)}, & n > 0 \end{cases} \]
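The recursion translates almost line for line into code. This is a minimal sketch assuming a real, causal, minimum-phase input with x[0] > 0; the function name and the test signal are illustrative.

```python
import numpy as np

def min_phase_cepstrum(x, n_out):
    """First n_out complex-cepstrum values of a minimum-phase sequence."""
    x = np.asarray(x, dtype=float)
    xhat = np.zeros(n_out)
    xhat[0] = np.log(x[0])                     # requires x[0] > 0
    for n in range(1, n_out):
        xn = x[n] if n < len(x) else 0.0
        acc = 0.0
        for k in range(1, n):                  # the k = 0 term vanishes (factor k/n)
            if n - k < len(x):
                acc += (k / n) * xhat[k] * x[n - k]
        xhat[n] = (xn - acc) / x[0]
    return xhat

# Check against Example 1: x[n] = a^n u[n]  ->  xhat[n] = a^n / n for n >= 1.
a = 0.9
xhat = min_phase_cepstrum(a ** np.arange(20), 10)
expected = a ** np.arange(1, 10) / np.arange(1, 10)
print(np.max(np.abs(xhat[1:] - expected)))
```

Unlike the DFT route, the recursion involves no aliasing, since each x̂(n) depends only on x(0), ..., x(n).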
Recursive Relation for Complex Cepstrum for Minimum Phase Signals
1. basic z-transform:
\[ x(n) \longleftrightarrow X(z) \]
2. scale by n rule:
\[ n\,x(n) \longleftrightarrow -z \frac{dX(z)}{dz} = -z X'(z) \]
3. definition of complex cepstrum:
\[ \hat{x}(n) \longleftrightarrow \hat{X}(z) = \log[X(z)] \]
4. differentiation of z-transform:
\[ \frac{d\hat{X}(z)}{dz} = \frac{d}{dz} \log[X(z)] = \frac{X'(z)}{X(z)} \]
5. multiply both sides of equation by −z X(z):
\[ -z \frac{d\hat{X}(z)}{dz}\, X(z) = -z X'(z) \]
Recursive Relation for Complex Cepstrum for Minimum Phase Signals
\[ n\hat{x}(n) * x(n) \longleftrightarrow -z \frac{d\hat{X}(z)}{dz}\, X(z) = -z X'(z) \longleftrightarrow n\,x(n) \]
\[ n\,x(n) = \sum_{k=-\infty}^{\infty} k\,\hat{x}(k)\, x(n-k) \]
• for minimum phase systems we have x̂(n) = 0 for n < 0 and x(n) = 0 for n < 0, giving:
\[ x(n) = \sum_{k=0}^{n} \left(\frac{k}{n}\right) \hat{x}(k)\, x(n-k) \]
• separating out the term for k = n we get:
\[ x(n) = \sum_{k=0}^{n-1} \left(\frac{k}{n}\right) \hat{x}(k)\, x(n-k) + x(0)\,\hat{x}(n) \]
\[ \hat{x}(n) = \frac{x(n)}{x(0)} - \sum_{k=0}^{n-1} \left(\frac{k}{n}\right) \hat{x}(k)\, \frac{x(n-k)}{x(0)}, \quad n > 0 \]
\[ \hat{x}(0) = \log[x(0)], \qquad \hat{x}(n) = 0, \quad n < 0 \]
Recursive Relation for Complex Cepstrum for Minimum Phase Signals
• why is x̂(0) = log[x(0)]?
• assume we have a finite sequence x(n), n = 0, 1, ..., N−1
• we can write x(n) as:
\[ x(n) = x(0)\delta(n) + x(1)\delta(n-1) + \ldots + x(N-1)\delta(n-N+1) \]
\[ x(n) = x(0)\left[\delta(n) + \frac{x(1)}{x(0)}\delta(n-1) + \ldots + \frac{x(N-1)}{x(0)}\delta(n-N+1)\right] \]
• taking z-transforms, we get:
\[ X(z) = \sum_{n=0}^{N-1} x(n)\, z^{-n} = G \prod_{k=1}^{NZ1} (1 - a_k z^{-1}) \prod_{k=1}^{NZ2} (1 - b_k z^{-1}) \]
where the first term is the gain, G = x(0), and the two product terms are the zeros inside and outside the unit circle.
• for minimum phase systems we have all zeros inside the unit circle so the second product term is gone, and we have the result that
\[ \hat{x}(0) = \log[G] = \log[x(0)]; \qquad \hat{x}(n) = 0, \quad n < 0 \]
\[ \hat{x}(n) = -\sum_{k=1}^{NZ1} \frac{a_k^n}{n}, \quad n > 0 \]
Cepstrum for Maximum Phase Signals
• for maximum phase signals (no poles or zeros inside unit circle)
\[ c(n) = \frac{\hat{x}(n) + \hat{x}(-n)}{2} \]
• giving
\[ \hat{x}(n) = \begin{cases} 0, & n > 0 \\ c(n), & n = 0 \\ 2c(n), & n < 0 \end{cases} \]
• thus the complex cepstrum (for maximum phase signals) can be computed by computing the cepstrum and using the equation above
Recursive Relation for Complex Cepstrum for Maximum Phase Signals
• the complex cepstrum for maximum phase signals can be computed recursively from the input signal, x(n), using the relation
\[ \hat{x}(n) = \begin{cases} 0, & n > 0 \\ \log[x(0)], & n = 0 \\ \dfrac{x(n)}{x(0)} - \sum_{k=n+1}^{0} \left(\dfrac{k}{n}\right) \hat{x}(k)\, \dfrac{x(n-k)}{x(0)}, & n < 0 \end{cases} \]
Computing Short-Time Cepstrums from Speech Using Polynomial Roots
Cepstrum From Polynomial Roots [figures]
Computing Short-Time Cepstrums from Speech Using the DFT
Practical Considerations
• window to define short-time analysis
• window duration (should be several pitch periods long)
• size of FFT (to minimize aliasing)
• elimination of linear phase components (positioning signals within frames)
• cutoff quefrency of lifter
• type of lifter (low/high quefrency)
Computational Considerations [figure]
Voiced Speech Example [figure: Hamming window, 40 msec duration (section beginning at sample 13000 in file test_16k.wav)]
Voiced Speech Example [figure: wrapped phase; unwrapped phase]
Voiced Speech Example [figure]
Characteristic System for Homomorphic Convolution
• still need to define (and design) the L operator part (the linear system component) of the system to completely define the characteristic system for homomorphic convolution for speech
  – to do this properly and correctly, need to look at the properties of the complex cepstrum for speech signals
Complex Cepstrum of Speech
• model of speech:
  – voiced speech produced by a quasi-periodic pulse train exciting slowly time-varying linear system => p[n] convolved with h_V[n]
  – unvoiced speech produced by random noise exciting slowly time-varying linear system => u[n] convolved with h_U[n]
• time to examine full model and see what the complex cepstrum of speech looks like
Homomorphic Filtering of Voiced Speech
• goal is to separate out the excitation impulses from the remaining components of the complex cepstrum
• use cepstral window, l(n), to separate excitation pulses from combined vocal tract
  – l(n) = 1 for |n| < n_0 < N_p, l(n) = 0 for |n| ≥ n_0: this window removes excitation pulses
  – l(n) = 0 for |n| < n_0 < N_p, l(n) = 1 for |n| ≥ n_0: this window removes combined vocal tract
• the filtered signal is processed by the inverse characteristic system to recover the combined vocal tract component
\[ \hat{y}(n) = l(n) \cdot \hat{x}(n) \]
\[ \hat{Y}(e^{j\omega}) = \frac{1}{2\pi}\int_{-\pi}^{\pi} \hat{X}(e^{j\theta})\, L(e^{j(\omega - \theta)})\, d\theta \]
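A minimal sketch of low-quefrency liftering applied to a synthetic signal rather than real speech. Everything here is an assumption made for illustration: the "vocal tract" is a single decaying exponential, the "excitation" is the two-pulse sequence of Example 3, and only the real cepstrum (log magnitude) is liftered.

```python
import numpy as np

def lifter(cep, n0, keep="low"):
    """Apply l(n) to a cepstrum stored in DFT order (negative n at the end)."""
    nfft = len(cep)
    n = np.arange(nfft)
    quef = np.minimum(n, nfft - n)             # |n| on the circular index
    low = quef < n0
    mask = low if keep == "low" else ~low
    return cep * mask

# Synthetic signal: smooth component convolved with two excitation pulses.
a, alpha, Np, nfft, n0 = 0.9, 0.7, 75, 2048, 50
x1 = a ** np.arange(200)                       # stand-in for the vocal tract response
pulses = np.zeros(Np + 1)
pulses[0], pulses[Np] = 1.0, alpha             # delta[n] + alpha * delta[n - Np]
x = np.convolve(x1, pulses)

c = np.fft.ifft(np.log(np.abs(np.fft.fft(x, nfft)))).real   # real cepstrum
smoothed_log_mag = np.fft.fft(lifter(c, n0, "low")).real    # cepstrally smoothed log magnitude
target = np.log(np.abs(np.fft.fft(x1, nfft)))               # log magnitude without the pulses
max_err = np.max(np.abs(smoothed_log_mag - target))
print(max_err)
```

Because the cutoff n_0 = 50 lies below N_p = 75, the low-quefrency lifter discards the excitation impulses and the smoothed log magnitude stays close to that of the smooth component alone; passing `keep="high"` retains the excitation part instead.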
Voiced Speech Example [figure: cepstrally smoothed log magnitude, 50 quefrencies cutoff; cepstrally unwrapped phase, 50 quefrencies cutoff]
Voiced Speech Example [figure: combined impulse response of glottal pulse, vocal tract system, and radiation system]
Voiced Speech Example [figure: high quefrency liftering; cutoff quefrency = 50; log magnitude and unwrapped phase]
Voiced Speech Example [figure: estimated excitation function for voiced speech (Hamming window weighted)]
Unvoiced Speech Example [figure: Hamming window, 40 msec duration (section beginning at sample 3200 in file test_16k.wav)]
Unvoiced Speech Example [figure: wrapped phase; unwrapped phase]
Unvoiced Speech Example [figure]
Unvoiced Speech Example [figure: cepstrally smoothed log magnitude, 50 quefrencies cutoff; cepstrally unwrapped phase, 50 quefrencies cutoff]
Unvoiced Speech Example [figure: estimated excitation source for unvoiced speech section (Hamming window weighted)]
Short-Time Homomorphic Analysis [figure: STFT]
Review of Cepstral Calculation
• 3 potential methods for computing cepstral coefficients, x̂[n], of sequence x[n]
  – analytical method; assuming X(z) is a rational function; find poles and zeros and expand using log power series
  – recursion method; assuming X(z) is either a minimum phase (all poles and zeros inside unit circle) or maximum phase (all poles and zeros outside unit circle) sequence
  – DFT implementation; using windows, with phase unwrapping (for complex cepstra)
Example 1: single pole sequence (computed using all 3 methods)
\[ x[n] = a^n u[n] \]
\[ \hat{x}[n] = \frac{a^n}{n} u[n-1] \]
Cepstral Computation Aliasing
• Effect of quefrency aliasing via a simple example
\[ x[n] = \delta[n] + \alpha\,\delta[n - N_p] \]
with discrete-time Fourier transform
\[ X(e^{j\omega}) = 1 + \alpha e^{-j\omega N_p} \]
• We can express the complex logarithm as
\[ \hat{X}(e^{j\omega}) = \log\{1 + \alpha e^{-j\omega N_p}\} = \sum_{m=1}^{\infty} \left(\frac{(-1)^{m+1} \alpha^m}{m}\right) e^{-j\omega m N_p} \]
giving a complex cepstrum in the form
\[ \hat{x}[n] = \sum_{m=1}^{\infty} \left(\frac{(-1)^{m+1} \alpha^m}{m}\right) \delta[n - mN_p] \]
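The aliasing can be reproduced numerically. The sketch below assumes the parameter values N = 256, N_p = 75, α = 0.8; for this particular signal the phase of 1 + α e^{−jωN_p} stays within (−π/2, π/2), so the principal-value logarithm is already the continuous one and no unwrapping is needed.

```python
import numpy as np

N, Np, alpha = 256, 75, 0.8                    # assumed values for this illustration
x = np.zeros(N)
x[0], x[Np] = 1.0, alpha                       # delta[n] + alpha * delta[n - Np]

X = np.fft.fft(x)
xhat_aliased = np.fft.ifft(np.log(X)).real     # N-point (aliased) complex cepstrum

def true_value(m):
    """Weight of the m-th cepstral impulse, located at n = m * Np."""
    return (-1) ** (m + 1) * alpha ** m / m

# Impulses with m * Np < N sit at their correct quefrencies ...
print(xhat_aliased[Np], true_value(1))
# ... while m = 4 (n = 300) wraps around to index 300 mod 256 = 44.
print(xhat_aliased[(4 * Np) % N], true_value(4))
```

Increasing N moves the wrapped impulses to higher m, where the weights α^m/m are smaller, which is why a large DFT size reduces the visible aliasing.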
Example 2: voiced speech frame [figure]
Example 3: low quefrency liftering [figure]
Example 3: high quefrency liftering [figure]
Example 4: effects of low quefrency lifter [figure]
Example 5: phase unwrapping [figure]
Example 6: phase unwrapping [figure]
Homomorphic Spectrum Smoothing [figure]
Running Cepstrum [figures]
Running Cepstrums [figure]
Cepstrum Applications [figure]
Cepstrum Distance Measures
• The cepstrum forms a natural basis for comparing patterns in speech recognition or vector quantization because of its stable mathematical characterization for speech signals.
• A typical "cepstral distance measure" is of the form:
$$D = \sum_{n=1}^{n_{co}} \left(c[n] - \bar{c}[n]\right)^2$$
where $c[n]$ and $\bar{c}[n]$ are cepstral sequences corresponding to frames of signal, and $D$ is the cepstral distance between the pair of sequences.
• Using Parseval's theorem, we can express the cepstral distance in the frequency domain as:
$$D = \frac{1}{2\pi}\int_{-\pi}^{\pi} \left(\log|H(e^{j\omega})| - \log|\bar{H}(e^{j\omega})|\right)^2 d\omega$$
• Thus we see that the cepstral distance is actually a log magnitude spectral distance.
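The distance measure and its frequency-domain form can be sketched as follows. The two random, Hamming-windowed test frames, the cutoff `n_co = 30`, and the small floor added before the logarithm are assumptions for illustration only. The DFT form of Parseval's theorem holds exactly only when the sum runs over all cepstral indices, so the sketch checks the full sum against the spectral form and reports the truncated distance separately.

```python
import numpy as np

def real_cepstrum_and_log_mag(frame, nfft):
    """Return the real cepstrum c[n] and the log magnitude spectrum it came from."""
    # The 1e-12 floor is an assumption to guard against log(0)
    log_mag = np.log(np.abs(np.fft.fft(frame, nfft)) + 1e-12)
    c = np.real(np.fft.ifft(log_mag))
    return c, log_mag

def cepstral_distance(c1, c2, n_co):
    """D = sum over n = 1..n_co of (c1[n] - c2[n])^2."""
    diff = c1[1:n_co + 1] - c2[1:n_co + 1]
    return float(np.sum(diff ** 2))

# Two synthetic frames stand in for speech frames (illustrative only)
rng = np.random.default_rng(0)
nfft = 512
window = np.hamming(256)
frame_a = window * rng.standard_normal(256)
frame_b = window * rng.standard_normal(256)

c_a, log_a = real_cepstrum_and_log_mag(frame_a, nfft)
c_b, log_b = real_cepstrum_and_log_mag(frame_b, nfft)

d_truncated = cepstral_distance(c_a, c_b, n_co=30)
d_all = float(np.sum((c_a - c_b) ** 2))           # sum over every cepstral index
d_spectral = float(np.mean((log_a - log_b) ** 2))  # DFT analogue of the integral

print("truncated distance:", d_truncated)
print("full cepstral sum :", d_all)
print("log spectral form :", d_spectral)
```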
Mel Frequency Cepstral Coefficients
• Basic idea is to compute a frequency analysis based on a filter bank with approximately critical band spacing of the filters and bandwidths. For 4 kHz bandwidth, approximately 20 filters are used.
• First perform a short-time Fourier analysis, giving $X_m[k],\ k = 0, 1, \ldots, \mathrm{NF}/2$, where $m$ is the frame number and $k$ is the frequency index (running up to half the size of the FFT).
• Next the DFT values are grouped together in critical bands and weighted by triangular weighting functions.
Mel Frequency Cepstral Coefficients
• The mel-spectrum of the $m^{th}$ frame for the $r^{th}$ filter $(r = 1, 2, \ldots, R)$ is defined as:
$$\mathrm{MF}_m[r] = \frac{1}{A_r}\sum_{k=L_r}^{U_r} \left|V_r[k]\, X_m[k]\right|^2$$
where $V_r[k]$ is the weighting function for the $r^{th}$ filter, ranging from DFT index $L_r$ to $U_r$, and
$$A_r = \sum_{k=L_r}^{U_r} \left|V_r[k]\right|^2$$
is the normalizing factor for the $r^{th}$ mel-filter. (Normalization guarantees that if the input spectrum is flat, the mel-spectrum is flat.)
• A discrete cosine transform of the log magnitude of the filter outputs is computed to form the function $\mathrm{mfcc}_m[n]$ as:
$$\mathrm{mfcc}_m[n] = \frac{1}{R}\sum_{r=1}^{R} \log\left(\mathrm{MF}_m[r]\right)\cos\left[\frac{2\pi}{R}\left(r + \frac{1}{2}\right) n\right], \quad n = 1, 2, \ldots, N_{\mathrm{mfcc}}$$
• Typically $N_{\mathrm{mfcc}} = 13$ and $R = 24$ for 4 kHz bandwidth speech signals.
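A minimal sketch of the mel-spectrum and mfcc formulas for a single frame is given below. The slides do not specify how the triangular filters are placed, so the mel mapping `2595 * log10(1 + f/700)`, the even spacing of filter edges on that scale, the 8 kHz sampling rate, and the 512-point FFT are all assumptions. The function names are hypothetical. The cosine term follows the formula above as written.

```python
import numpy as np

def hz_to_mel(f):
    # Common mel approximation (an assumption; the slides do not give one)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def triangular_filters(R, nfft, fs):
    """Rows are V_r[k], r = 1..R, on DFT bins k = 0..nfft/2."""
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(fs / 2.0), R + 2))
    freqs = np.arange(nfft // 2 + 1) * fs / nfft
    V = np.zeros((R, nfft // 2 + 1))
    for r in range(R):
        lo, ctr, hi = edges_hz[r], edges_hz[r + 1], edges_hz[r + 2]
        rising = (freqs - lo) / (ctr - lo)
        falling = (hi - freqs) / (hi - ctr)
        V[r] = np.clip(np.minimum(rising, falling), 0.0, None)
    return V

def mel_spectrum(frame, V):
    """MF_m[r] = (1/A_r) * sum_k |V_r[k] X_m[k]|^2."""
    nfft = 2 * (V.shape[1] - 1)
    X = np.fft.rfft(frame, nfft)
    A = np.sum(V ** 2, axis=1)  # normalizing factor A_r
    return np.sum(np.abs(V * X) ** 2, axis=1) / A

def mfcc_frame(frame, V, n_mfcc=13):
    """mfcc_m[n] = (1/R) * sum_r log(MF_m[r]) * cos((2*pi/R) * (r + 1/2) * n)."""
    R = V.shape[0]
    r = np.arange(1, R + 1)
    n = np.arange(1, n_mfcc + 1)
    basis = np.cos(2.0 * np.pi / R * np.outer(n, r + 0.5))
    return basis @ np.log(mel_spectrum(frame, V)) / R

# Illustrative setup: R = 24 filters, 4 kHz bandwidth (fs = 8 kHz assumed)
V = triangular_filters(R=24, nfft=512, fs=8000.0)

# A unit impulse has a flat spectrum, so the mel-spectrum should be flat too
impulse = np.zeros(256)
impulse[0] = 1.0
MF_flat = mel_spectrum(impulse, V)
mfcc_flat = mfcc_frame(impulse, V, n_mfcc=13)
print("mel-spectrum of flat input:", MF_flat.min(), MF_flat.max())
```

The unit-impulse check is a direct illustration of the normalization remark: with $|X_m[k]| = 1$ for all $k$, every $\mathrm{MF}_m[r]$ equals 1, so the log mel-spectrum and all mfcc values are zero.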
Delta Cepstrum
• The set of mel frequency cepstral coefficients provides perceptually meaningful and smooth estimates of speech spectra over time.
• Since speech is inherently a dynamic signal, it is reasonable to seek a representation that includes some aspect of its dynamic nature, namely the time derivatives (both first and second order) of the short-term cepstrum.
• The resulting parameter sets are called the delta cepstrum (first derivative) and the delta-delta cepstrum (second derivative).
• The simplest method of computing delta cepstrum parameters is a first difference of cepstral vectors, of the form:
$$\Delta\mathrm{mfcc}_m[n] = \mathrm{mfcc}_m[n] - \mathrm{mfcc}_{m-1}[n]$$
• The simple difference is a poor approximation to the first derivative and is not generally used. Instead a least-squares approximation to the local slope (over a region around the current sample) is used, and is of the form:
$$\Delta\mathrm{mfcc}_m[n] = \frac{\sum_{k=-M}^{M} k\,\left(\mathrm{mfcc}_{m+k}[n]\right)}{\sum_{k=-M}^{M} k^2}$$
where the region is $M$ frames before and after the current frame.
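The least-squares slope can be sketched as below for a matrix of cepstral vectors with one row per frame. The function name, the default `M = 2`, and the choice to repeat the first and last frames at the boundaries are assumptions, since the slides do not say how frame edges are handled.

```python
import numpy as np

def delta_cepstrum(mfcc, M=2):
    """Least-squares slope over frames m-M..m+M.

    mfcc has shape (num_frames, num_coeffs). Edge frames are handled by
    repeating the first and last frames (an assumption, not from the slides).
    """
    num_frames = mfcc.shape[0]
    k = np.arange(-M, M + 1)
    padded = np.pad(mfcc, ((M, M), (0, 0)), mode="edge")
    # Row (M + m + kk) of `padded` holds frame m + kk of the original matrix
    numerator = sum(kk * padded[M + kk : M + kk + num_frames] for kk in k)
    return numerator / np.sum(k ** 2)

# A linear ramp in time has a known slope for each coefficient
slopes = np.array([0.5, -1.0, 2.0])
mfcc = np.outer(np.arange(10), slopes)  # 10 frames, 3 coefficients

delta = delta_cepstrum(mfcc, M=2)
delta_delta = delta_cepstrum(delta, M=2)  # second derivative estimate
print(delta[2:-2])
```

Applying the same operator to the delta sequence, as in the last line of the setup, is one common way to obtain delta-delta parameters. It is shown here as an illustration, not as the method used in the lecture.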
Homomorphic Vocoder
• time-dependent complex cepstrum retains all the information of the time-dependent Fourier transform => exact representation of speech
• time-dependent real cepstrum loses phase information => not an exact representation of speech
• quantization of cepstral parameters also loses information
• cepstrum gives good estimates of pitch, voicing, formants => can build homomorphic vocoder
Homomorphic Vocoder
1. compute cepstrum every 10-20 msec
2. estimate pitch period and voiced/unvoiced decision
3. quantize and encode low-time cepstral values
4. at synthesizer, get approximation to hv(n) or hu(n) from low-time quantized cepstral values
5. convolve hv(n) or hu(n) with excitation created from pitch, voiced/unvoiced, and amplitude information
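Steps 4 and 5 can be sketched as follows for the voiced case. The sketch assumes a minimum-phase reconstruction of the impulse response from the low-time real cepstrum, which is only one possible choice. It omits quantization, and it uses a single-pole test response $0.5^n$ so the result can be checked. All function names and parameter values are hypothetical.

```python
import numpy as np

def real_cepstrum(x, nfft):
    """Real cepstrum: inverse DFT of the log magnitude spectrum."""
    return np.real(np.fft.ifft(np.log(np.abs(np.fft.fft(x, nfft)))))

def impulse_response_from_low_time_cepstrum(c, n_low=26):
    """Keep cepstral values n = 0..n_low-1 and build a minimum-phase response.

    For a minimum-phase sequence the complex cepstrum is causal, so it can be
    recovered from the (even) real cepstrum by doubling the n > 0 values.
    """
    h_hat = np.zeros(len(c))
    h_hat[0] = c[0]
    h_hat[1:n_low] = 2.0 * c[1:n_low]
    return np.real(np.fft.ifft(np.exp(np.fft.fft(h_hat))))

def synthesize_voiced(h, pitch_period, num_samples, gain=1.0):
    """Convolve h with an impulse train of the given period (voiced excitation)."""
    excitation = np.zeros(num_samples)
    excitation[::pitch_period] = gain
    return np.convolve(excitation, h)[:num_samples]

# Test case (assumption): a single-pole minimum-phase response
nfft = 512
h_true = 0.5 ** np.arange(nfft)

c = real_cepstrum(h_true, nfft)
h_est = impulse_response_from_low_time_cepstrum(c, n_low=26)
y = synthesize_voiced(h_est[:64], pitch_period=80, num_samples=400)

print("max error in recovered impulse response:", np.max(np.abs(h_est - h_true)))
```

An unvoiced frame would use the same reconstruction with a random-noise excitation in place of the impulse train.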
Homomorphic Vocoder
[Figure: homomorphic vocoder]
• l(n) is the cepstrum window that selects low-time values and is of length 26 samples
Homomorphic Vocoder Impulse Responses
Summary
• Introduced the concept of the cepstrum of a signal, defined as the inverse Fourier transform of the log of the signal spectrum:
$$\hat{x}[n] = F^{-1}\left[\log X(e^{j\omega})\right]$$
• Showed cepstrum reflected properties of both the excitation (high quefrency) and the vocal tract (low quefrency)
– short quefrency window filters out excitation; long quefrency window filters out vocal tract
• Mel-scale cepstral coefficients used as feature set for speech recognition
• Delta and delta-delta cepstral coefficients used as indicators of spectral change over time