1

Speech Recognition Trends and
Features of the Speech Signal for Speech Recognition
Spring 2014
Hanbat National University, Department of Computer Engineering
Yoon-Joong Kim
2

Trend of Speech Recognition

• Nuance
  • 1994 – Nuance spun off from SRI's STAR (SRI International's Speech Technology and Research) Lab
  • Its technology, SI-NLSR (speaker-independent natural-language speech recognition), does not require training for a particular speaker
• ScanSoft
• ViaVoice
• Vlingo
  • Vlingo is an intelligent software assistant and knowledge navigator functioning as a personal assistant application for Symbian, Android, iPhone, BlackBerry, and other smartphones.
3

Trend of Speech Recognition

• Siri
  • Siri is a spin-out from the SRI International Artificial Intelligence Center, and is an offshoot of the DARPA (Defense Advanced Research Projects Agency)-funded CALO (Cognitive Assistant that Learns and Organizes) project.
  • Siri /ˈsɪri/ is an intelligent personal assistant and knowledge navigator which works as an application for Apple's iOS.
  • The application uses a natural language user interface to answer questions, make recommendations, and perform actions by delegating requests to a set of Web services. Apple claims that the software adapts to the user's individual preferences over time and personalizes results, performing tasks such as finding recommendations for nearby restaurants or getting directions.
4

Trend of Speech Recognition

• S Voice
  • S Voice is an intelligent personal assistant and knowledge navigator which is only available as a built-in application for the Samsung Galaxy S III, S III Mini, S4, S II Plus, Note II, Note 10.1, Note 8.0, Stellar, Grand and Camera.
5

Trend of Speech Recognition

• CMU Sphinx
  • CMU Sphinx is the general term for a group of speech recognition systems developed at Carnegie Mellon University. These include a series of speech recognizers (Sphinx 2–4) and an acoustic model trainer (SphinxTrain).
  • In 2000, the Sphinx group at Carnegie Mellon committed to open-sourcing several speech recognizer components, including Sphinx 2 and later Sphinx 3 (in 2001).
  • The speech decoders come with acoustic models and sample applications. The available resources also include software for acoustic model training, language model compilation, and a public-domain pronunciation dictionary, cmudict.
6

An Isolated Word HMM Recognizer

7

An Isolated Word HMM Recognizer

Given an observation sequence O, choose the word v whose model maximizes the likelihood:

v* = argmax_{1 ≤ v ≤ V} P(O | v)

CMS (cepstral mean subtraction)
PLP (Perceptual Linear Prediction) coefficients
Multitaper MFCC and PLP features for speaker verification using i-vectors
8

Features: How to Represent the Speech Signal

Features must (a) provide a good representation of phonemes and (b) be robust to non-phonetic changes in the signal.

Time domain (waveform):
Frequency domain (spectrogram):

[Figure: waveform and spectrogram of the word "Markov", male speaker and female speaker]
9

Features: Windowing

In many cases, the math assumes that the signal is periodic. We always assume that the data is zero outside the window.

When we apply a rectangular window, there are usually discontinuities in the signal at the ends. So we can window the signal with other shapes, making the signal closer to zero at the ends. This attenuates discontinuities.

Hamming window:

h(n) = 0.54 - 0.46 cos(2πn / (N-1)),   0 ≤ n ≤ N-1

[Figure: Hamming window rising from 0.0 toward 1.0 and back, over samples 0 to N-1]

Typical window size is 16 msec, which equals 256 samples for a 16-kHz (microphone) signal and 128 samples for an 8-kHz (telephone) signal. Window size does not have to equal frame size!
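As a quick sketch, the Hamming window formula above can be computed directly in Python (the helper name `hamming` is ours, not from the slides):

```python
import math

def hamming(N):
    """Hamming window h(n) = 0.54 - 0.46*cos(2*pi*n/(N-1)), n = 0..N-1."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

w = hamming(256)          # 16-msec window for a 16-kHz signal
print(round(w[0], 2))     # endpoints are attenuated to 0.08
print(round(max(w), 2))   # peak near 1.0 at the center
```

Multiplying a frame by `w` sample-by-sample tapers the edges toward zero, which is exactly the discontinuity attenuation described above.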
10

Features: Spectrum and Cepstrum

(log power) spectrum:

1. Hamming window
2. Fast Fourier Transform (FFT)
3. Compute 10 log10(r² + i²)
   where r is the real component, i is the imaginary component

[Figure: waveform (amplitude vs. time) and its log power spectrum (energy in dB vs. frequency)]
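The three steps can be sketched in plain Python. Note the slides call for an FFT; this illustration uses a direct DFT instead (much slower, same result) and keeps only bins 0…N/2 of the real signal. All names are our own:

```python
import math

def log_power_spectrum(x):
    """Windowed frame -> DFT -> 10*log10(r^2 + i^2), per the slide's steps."""
    N = len(x)
    spec = []
    for k in range(N // 2 + 1):          # bins 0..N/2 suffice for a real signal
        re = sum(x[n] * math.cos(2 * math.pi * k * n / N) for n in range(N))
        im = -sum(x[n] * math.sin(2 * math.pi * k * n / N) for n in range(N))
        spec.append(10 * math.log10(re * re + im * im + 1e-12))  # floor avoids log(0)
    return spec

# A pure tone completing 4 cycles in 64 samples: energy concentrates in bin 4.
tone = [math.sin(2 * math.pi * 4 * n / 64) for n in range(64)]
spec = log_power_spectrum(tone)
print(spec.index(max(spec)))   # -> 4
```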
11

Features: Spectrum and Cepstrum

cepstrum: treat the spectrum as a signal subject to frequency analysis…

1. Compute the log power spectrum
2. Compute the FFT of the log power spectrum
3. Use only the lower 13 values (cepstral coefficients)
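These steps can also be sketched directly; again we use a plain DFT where the slides say FFT, and all helper names are ours:

```python
import math

def dft_mag2(x):
    """Squared-magnitude DFT of a real sequence."""
    N = len(x)
    out = []
    for k in range(N):
        re = sum(x[n] * math.cos(2 * math.pi * k * n / N) for n in range(N))
        im = -sum(x[n] * math.sin(2 * math.pi * k * n / N) for n in range(N))
        out.append(re * re + im * im)
    return out

def cepstrum(x, n_coef=13):
    """1. log power spectrum  2. transform it again  3. keep the lower n_coef values."""
    log_spec = [10 * math.log10(p + 1e-12) for p in dft_mag2(x)]
    N = len(log_spec)
    ceps = []
    for k in range(n_coef):
        # The log power spectrum is real and symmetric, so the real part suffices.
        re = sum(log_spec[n] * math.cos(2 * math.pi * k * n / N) for n in range(N))
        ceps.append(re / N)
    return ceps

c = cepstrum([math.sin(2 * math.pi * 3 * n / 32) for n in range(32)])
print(len(c))   # -> 13
```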
12
Features: Spectrum and Cepstrum
Why Use Cepstral Features?
• number of features is small (13 vs. 64 or 128 for spectrum)
• models spectral envelope (relevant to phoneme identity), not (irrelevant) pitch
• coefficients tend to not be correlated with each other (useful to assume that non-diagonal elements of covariance matrix are zero… see Lecture 5, slide 30)
• (relatively) easy to compute
Cepstral features are very commonly used. Another type of feature that is commonly used is called Linear Predictive Coding (LPC).
13

Features: Autocorrelation

Autocorrelation: measure of periodicity in signal

R(k) = Σ_m x(m) x(m+k)

R_n(k) = Σ_{m=0}^{N-1-k} x_n(m) w(m) · x_n(m+k) w(m+k)

n = start sample of analysis, m = sample within analysis window 0…N-1

For periodic signals the function attains a maximum at sample lags of 0, ±P, ±2P, etc., where P is the period of the signal.

[Figure: overlap of x(m) with its shifted copies x(m+1), x(m), x(m-T) over the window m = 0 … N-1]
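The periodicity property is easy to check numerically; `autocorr` is our own helper and operates on an already-windowed signal:

```python
def autocorr(y, K):
    """R(k) = sum_m y(m) y(m+k) over the finite window, for lags k = 0..K."""
    N = len(y)
    return [sum(y[m] * y[m + k] for m in range(N - k)) for k in range(K + 1)]

# A perfectly periodic signal with period P = 4: peaks at lags 0, 4, 8, ...
y = [1.0, 0.0, -1.0, 0.0] * 4
R = autocorr(y, 8)
print(R[0], R[4], R[8])   # -> 8.0 6.0 4.0
```

Note how the peak values fall off with increasing lag (8, 6, 4): fewer samples overlap at larger k. This "fall-off" is what the modified autocorrelation function on a later slide eliminates.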
14

Features: Autocorrelation

Autocorrelation: measure of periodicity in signal. From the previous slide:

R_n(k) = Σ_{m=0}^{N-1-k} x_n(m) w(m) · x_n(m+k) w(m+k)

and if we set y_n(m) = x_n(m) w(m), so that y is the windowed signal of x where the window is zero for m < 0 and m > N-1, then:

R_n(k) = Σ_{m=0}^{N-1-k} y_n(m) y_n(m+k),   0 ≤ k ≤ K

where K is the maximum autocorrelation index desired.

Note that R_n(k) = R_n(-k), because when we sum over all values of m that have a non-zero y value (or just change the limits in the summation to m = k to N-1 and use negative k), then:

R_n(k) = Σ_{m=k}^{N-1} y_n(m) y_n(m-k),   0 ≤ k ≤ K
15

Linear Time-Invariant System

A discrete-time system f() is said to be time-invariant if, when an input is delayed (shifted) by n₀, the output is delayed by the same amount:

if x[n] → y[n], then x[n-n₀] → y[n-n₀]

ex1) Is the DTS f() of y[n] = f(x[n]) = x²[n] time-invariant?
w[n] = f(D(x[n])) = f(x[n-n₀]) = x²[n-n₀]
y[n-n₀] = x²[n-n₀] = w[n]
so the DTS f() is time-invariant.

ex2) Is the DTS f() of y[n] = f(x[n]) = x[n] - x[n-1] time-invariant?
w[n] = f(D(x[n])) = f(x[n-n₀]) = x[n-n₀] - x[n-n₀-1]
y[n-n₀] = x[n-n₀] - x[n-n₀-1] = w[n]
so the DTS f() is time-invariant.

[Figure: x[n] → y[n] and the delayed pair x[n-n₀] → y[n-n₀]]
16

Linear System

A discrete-time system f() is said to be linear if:
Additivity: f(x[n] + y[n]) = f(x[n]) + f(y[n])
Homogeneity: f(a x[n]) = a f(x[n])

ex1) Is f(x[n]) = x²[n] linear?
f(x₁[n] + x₂[n]) = (x₁[n] + x₂[n])²
f(x₁[n]) + f(x₂[n]) = x₁²[n] + x₂²[n]
f(x₁[n] + x₂[n]) ≠ f(x₁[n]) + f(x₂[n])
so the DTS f() is not linear.

ex2) Is the DTS f() of f(x[n]) = x[n] linear?
f(a₁x₁[n] + a₂x₂[n]) = a₁x₁[n] + a₂x₂[n] = a₁f(x₁[n]) + a₂f(x₂[n])
So the DTS f() is linear.
17

Is Autocorrelation a Linear Time-Invariant System?

Is R(x(n))(k) = Σ_m x(m) x(m+k) a linear system?

R((a·x)(n))(k) = Σ_m a·x(m) · a·x(m+k) = a² R(x(n))(k) ≠ a R(x(n))(k)

so R(·) is not a linear system.

Is R(x(n))(k) = Σ_m x(m) x(m+k) a time-invariant system?

R(D(x(n)))(k) = R(x(n-n₀))(k) = Σ_m x(m-n₀) x(m-n₀+k)
D(R(x(n))(k)) = Σ_m x(m-n₀) x(m-n₀+k)

R(D(x(n)))(k) = D(R(x(n))(k)), so R(·) is a time-invariant system.
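Both conclusions (not linear, but time-invariant) can be verified numerically. The helper `R` is ours; the delay is implemented by zero-padding the front of the finite sequence, consistent with the zero-outside-the-window assumption:

```python
def R(x, k):
    """Autocorrelation at lag k of a finite sequence x (zero outside the window)."""
    return sum(x[m] * x[m + k] for m in range(len(x) - k))

x = [3.0, 1.0, -2.0, 4.0]

# Homogeneity fails: scaling the input by a scales R by a^2, not by a.
a = 2.0
ax = [a * v for v in x]
print(R(ax, 1) == a * a * R(x, 1))   # -> True (quadratic, hence not linear)

# Time-invariance holds: delaying x leaves R unchanged.
shifted = [0.0, 0.0] + x
print(R(shifted, 1) == R(x, 1))      # -> True
```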
18

Is R_n(k) = R_n(-k)?

R_n(k) = Σ_{m=0}^{N-1-k} y_n(m) y_n(m+k), where y_n(m) = x_n(m) w(m)

y_n(m) = 0 over m < 0 and m > N-1, so
y_n(m+k) = 0 over m < -k and m > N-1-k
and the product y_n(m) y_n(m+k) is non-zero only over m = 0 … N-1-k.

For the negative lag:

R_n(-k) = Σ_m y_n(m) y_n(m-k)

y_n(m) = 0 outside 0 ≤ m ≤ N-1, and y_n(m-k) = 0 outside k ≤ m ≤ N-1+k, so the product is non-zero only over m = k … N-1.

Substituting m' = m - k:

R_n(-k) = Σ_{m'=0}^{N-1-k} y_n(m'+k) y_n(m') = R_n(k)

so R_n(k) = R_n(-k).
19
Features: Autocorrelation
Autocorrelation of speech signals: (from Rabiner & Schafer, p. 143)
Autocorrelation in Speech Signal http://staffwww.dcs.shef.ac.uk/people/M.Cooke/MAD/auto/auto.htm#introduction
20

Features: Autocorrelation

Eliminate "fall-off" by including samples in w2 not in w1:

w1(m) = 1 for 0 ≤ m ≤ N-1, 0 otherwise
w2(m) = 1 for 0 ≤ m ≤ N-1+K, 0 otherwise

R̂_n(k) = Σ_{m=0}^{N-1} x_n(m) w1(m) · x_n(m+k) w2(m+k),   0 ≤ k ≤ K

which, with these rectangular windows, is simply

R̂_n(k) = Σ_{m=0}^{N-1} x_n(m) x_n(m+k),   0 ≤ k ≤ K

= modified autocorrelation function
= cross-correlation function

Note: requires k·N multiplications; can be slow.
21

Features: LPC

Linear Predictive Coding (LPC) provides
• a low-dimension representation of the speech signal at one frame
• a representation of the spectral envelope, not harmonics
• an "analytically tractable" method
• some ability to identify formants

LPC models the speech signal at time point n as an approximate linear combination of the previous p samples:

s(n) ≈ a₁ s(n-1) + a₂ s(n-2) + … + a_p s(n-p)          (1)

where a₁, a₂, … a_p are constant for each frame of speech.

We can make the approximation exact by including a "difference" or "residual" term:

s(n) = Σ_{k=1}^{p} a_k s(n-k) + G u(n)                  (2)

where G is a scalar gain factor, and u(n) is the (normalized) error signal (residual).
22

Features: LPC

LPC can be used to generate speech ŝ(m) from either the error signal (residual) or a sequence of impulses as input e(m):

ŝ(m) = e(m) + a₁ s(m-1) + a₂ s(m-2) + … + a_p s(m-p)

where ŝ is the generated speech, and e(m) is the error signal or a sequence of impulses. However, we use LPC here as a representation of the signal.

The values a₁…a_p (where p is typically 10 to 15) describe the signal over the range of one window of data (typically 128 to 256 samples).

While it's true that 10-15 values are needed to predict (model) only one data point (estimating the value at time m from the previous p points), the same 10-15 values are used to represent all data points in the analysis window. When one frame of speech has more than p values, there is data reduction. For speech, the amount of data reduction is about 10:1. In addition, LPC values model the spectral envelope, not pitch information.
23

Features: LPC

If the error over a segment of speech is defined as

E_n = Σ_{m=M1}^{M2} e_n²(m) = Σ_{m=M1}^{M2} ( s_n(m) - ŝ_n(m) )²                           (3)

    = Σ_{m=M1}^{M2} ( s_n(m) - Σ_{k=1}^{p} a_k s_n(m-k) )²                                 (4)

then we can find a_k by setting ∂E_n/∂a_i = 0 for i = 1, 2, … p, obtaining p equations and p unknowns:

Σ_{m=M1}^{M2} s_n(m-i) s_n(m) = Σ_{k=1}^{p} â_k Σ_{m=M1}^{M2} s_n(m-i) s_n(m-k),   1 ≤ i ≤ p   (5)

(as shown on next slide…)
The error is minimum (not maximum) when the derivative is zero, because as any a_k changes away from its optimum value, the error will increase.
24

Features: LPC

E_n = Σ_{m=M1}^{M2} ( s(m) - Σ_{k=1}^{p} a_k s(m-k) )²

∂E_n/∂a_i = Σ_{m=M1}^{M2} 2 ( s(m) - Σ_{k=1}^{p} a_k s(m-k) ) · ∂/∂a_i ( -a₁ s(m-1) - a₂ s(m-2) - … - a_i s(m-i) - … - a_p s(m-p) )
          = Σ_{m=M1}^{M2} -2 ( s(m) - Σ_{k=1}^{p} a_k s(m-k) ) s(m-i) = 0

so, for 1 ≤ i ≤ p:

Σ_{m=M1}^{M2} s(m) s(m-i) = Σ_{k=1}^{p} a_k Σ_{m=M1}^{M2} s(m-k) s(m-i)

Writing φ(i,k) = Σ_{m=M1}^{M2} s(m-i) s(m-k), this is a set of p linear equations:

φ(i,1) a₁ + φ(i,2) a₂ + … + φ(i,p) a_p = φ(i,0),   1 ≤ i ≤ p

or, in matrix form:

[ φ(1,1)  φ(1,2)  …  φ(1,p) ] [ a₁ ]   [ φ(1,0) ]
[ φ(2,1)  φ(2,2)  …  φ(2,p) ] [ a₂ ] = [ φ(2,0) ]
[   …        …     …    …   ] [ …  ]   [   …    ]
[ φ(p,1)  φ(p,2)  …  φ(p,p) ] [ a_p ]   [ φ(p,0) ]
25

Features: LPC

E_n = Σ_{m=M1}^{M2} ( s(m) - Σ_{k=1}^{p} a_k s(m-k) )²                                      (5-1)

Expanding the square:

E_n = Σ_{m=M1}^{M2} [ s²(m) - 2 s(m) Σ_{k=1}^{p} a_k s(m-k)
                    + ( Σ_{k=1}^{p} a_k s(m-k) ) ( Σ_{r=1}^{p} a_r s(m-r) ) ]               (5-2)

    = Σ_{m=M1}^{M2} [ s²(m) - 2 a₁ s(m-1) s(m) - 2 a₂ s(m-2) s(m) - … - 2 a_p s(m-p) s(m)
                    + ( Σ_{k=1}^{p} a_k s(m-k) ) ( Σ_{r=1}^{p} a_r s(m-r) ) ]               (5-3)

Setting ∂E_n/∂a₁ = 0:

0 = Σ_{m=M1}^{M2} [ -2 s(m) s(m-1)
                  + 2 ( a₁ s(m-1) + a₂ s(m-2) + … + a_p s(m-p) ) s(m-1) ]                   (5-4)

Σ_{m=M1}^{M2} s(m) s(m-1) = Σ_{k=1}^{p} a_k Σ_{m=M1}^{M2} s(m-k) s(m-1)                     (5-5)

which is the i = 1 case of equation (5):

Σ_{m=M1}^{M2} s(m) s(m-i) = Σ_{k=1}^{p} a_k Σ_{m=M1}^{M2} s(m-k) s(m-i),   1 ≤ i ≤ p        (5-6)

repeat (5-4) to (5-6) for a₂, a₃, … a_p
26

Features: LPC Autocorrelation Method

Then, defining

φ_n(i,k) = Σ_{m=M1}^{M2} s_n(m-i) s_n(m-k)                      (6)

we can re-write equation (5) as:

Σ_{k=1}^{p} â_k φ_n(i,k) = φ_n(i,0),   1 ≤ i ≤ p                (7)

We can solve for a_k using several methods. The most common method in speech processing is the "autocorrelation" method:

Force the signal to be zero outside of the interval 0 ≤ m ≤ N-1:

ŝ_n(m) = s_n(m) w(m)                                            (8)

where w(m) is a finite-length window (e.g. Hamming) of length N that is zero when m is less than 0 or greater than N-1, and ŝ is the windowed signal. As a result,

E_n = Σ_{m=0}^{N+p-1} e_n²(m)                                   (9)
27

Features: LPC Autocorrelation Method

How did we get from

E_n = Σ_{m=M1}^{M2} e_n²(m)        (equation (3))

to

E_n = Σ_{m=0}^{N+p-1} e_n²(m)      (equation (9))

with the window from 0 to N-1? Why not

E_n = Σ_{m=0}^{N-1} e_n²(m)  ??

Because the value of e_n(m) may not be zero when m > N-1… for example, when m = N+p-1, then

e_n(N+p-1) = ŝ_n(N+p-1) - Σ_{k=1}^{p} a_k ŝ_n(N+p-1-k)
           = ŝ_n(N+p-1) - a₁ ŝ_n(N+p-2) - … - a_p ŝ_n(N-1)

Here ŝ_n(N+p-1) = 0 and ŝ_n(N+p-2) = 0, but ŝ_n(N-1) is not zero!
28

Features: LPC Autocorrelation Method

Because of setting the signal to zero outside the window, eqn (6) becomes:

φ_n(i,k) = Σ_{m=0}^{N+p-1} ŝ_n(m-i) ŝ_n(m-k),   1 ≤ i ≤ p, 0 ≤ k ≤ p        (10)

and this can be expressed as

φ_n(i,k) = Σ_{m=0}^{N-1-(i-k)} ŝ_n(m) ŝ_n(m+(i-k)),   1 ≤ i ≤ p, 0 ≤ k ≤ p  (11)

and this is identical to the autocorrelation function for |i-k|, because the autocorrelation function is symmetric, R_n(x) = R_n(-x):

φ_n(i,k) = R_n(|i-k|)                                                        (12)

where

R_n(x) = Σ_{m=0}^{N-1-x} ŝ_n(m) ŝ_n(m+x)                                     (13)

so the set of equations for â_k (eqn (7)) can be written as the combination of (7) and (12):

Σ_{k=1}^{p} â_k R_n(|i-k|) = R_n(i),   1 ≤ i ≤ p                             (14)
29

Features: LPC Autocorrelation Method

If R_n(x) = R_n(-x), then is R_n(i-k) = R_n(|i-k|)?

R_n(x) = R_n(x),   x ≥ 0
R_n(x) = R_n(-x),  x < 0
so R_n(x) = R_n(|x|)

Therefore

φ_n(i,k) = Σ_{m=0}^{N-1-(i-k)} ŝ_n(m) ŝ_n(m+(i-k)),   1 ≤ i ≤ p, 0 ≤ k ≤ p

R_n(i-k) = R_n(|i-k|)

φ_n(i,k) = R_n(|i-k|) = Σ_{m=0}^{N-1-|i-k|} ŝ_n(m) ŝ_n(m+|i-k|)
30

Features: LPC Autocorrelation Method

Why can equation (10):

φ_n(i,k) = Σ_{m=0}^{N+p-1} ŝ_n(m-i) ŝ_n(m-k),   1 ≤ i ≤ p, 0 ≤ k ≤ p

be expressed as (11):

φ_n(i,k) = Σ_{m=0}^{N-1-(i-k)} ŝ_n(m) ŝ_n(m+(i-k)),   1 ≤ i ≤ p, 0 ≤ k ≤ p   ???

Starting from the original equation, substitute m' = m - i: add i to the ŝ_n() offsets and subtract i from the summation limits. If m' < 0, ŝ_n(m') is zero, so we can still start the sum at 0:

φ_n(i,k) = Σ_{m'=0}^{N+p-1-i} ŝ_n(m') ŝ_n(m'+i-k)

where ŝ_n(m') = 0 at m' > N-1, and ŝ_n(m'+(i-k)) = 0 at m' > N-1-(i-k).

Then replace the upper limit N+p-1-i by N-1-(i-k), because when m > N-1-(i-k), ŝ_n(m+(i-k)) = 0, and k is always ≤ p:

φ_n(i,k) = Σ_{m=0}^{N-1-(i-k)} ŝ_n(m) ŝ_n(m+(i-k))
31

Features: LPC Autocorrelation Method

In matrix form, equation (14) looks like this:

[ R_n(0)    R_n(1)    R_n(2)    …  R_n(p-1) ] [ â₁ ]   [ R_n(1) ]
[ R_n(1)    R_n(0)    R_n(1)    …  R_n(p-2) ] [ â₂ ]   [ R_n(2) ]
[ R_n(2)    R_n(1)    R_n(0)    …  R_n(p-3) ] [ â₃ ] = [ R_n(3) ]
[   …          …         …      …     …     ] [ …  ]   [   …    ]
[ R_n(p-1)  R_n(p-2)  R_n(p-3)  …  R_n(0)   ] [ â_p ]   [ R_n(p) ]

Σ_{k=1}^{p} â_k R_n(|i-k|) = R_n(i),   1 ≤ i ≤ p

R_n(y) = Σ_{m=0}^{N-1-y} ŝ_n(m) ŝ_n(m+y),   0 ≤ y ≤ p

where ŝ_n(m) = s_n(m) w(m), s_n(m) = x(n+m), and

w(m) = 0.54 - 0.46 cos(2πm / (N-1)),   0 ≤ m ≤ N-1

There is a recursive algorithm to solve this: Durbin's solution.
32

Features: LPC Durbin's Solution

Solve the Toeplitz (symmetric, diagonal elements equal) system Σ_{k=1}^{p} â_k R_n(|i-k|) = R_n(i) for the values of â:

E^(0) = R(0)

For i = 1, 2, …, p:
k_i = [ R(i) - Σ_{j=1}^{i-1} α_j^(i-1) R(i-j) ] / E^(i-1)
α_i^(i) = k_i
α_j^(i) = α_j^(i-1) - k_i α_{i-j}^(i-1),   1 ≤ j ≤ i-1
E^(i) = (1 - k_i²) E^(i-1)

â_j = α_j^(p),   1 ≤ j ≤ p
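Durbin's recursion translates almost line-for-line into Python. This sketch (our own naming) takes R = [R(0), R(1), …, R(p)] and returns the â coefficients and the final residual energy E^(p):

```python
def durbin(R, p):
    """Solve sum_k a_k R(|i-k|) = R(i), i = 1..p, by Durbin's recursion.
    a[j-1] holds alpha_j at each stage (0-indexed)."""
    E = R[0]
    a = []
    for i in range(1, p + 1):
        # reflection coefficient k_i
        k = (R[i] - sum(a[j] * R[i - 1 - j] for j in range(i - 1))) / E
        # alpha_j^(i) = alpha_j^(i-1) - k_i * alpha_{i-j}^(i-1), then alpha_i^(i) = k_i
        a = [a[j] - k * a[i - 2 - j] for j in range(i - 1)] + [k]
        E *= 1.0 - k * k
    return a, E

# The worked example of slides 33-34: R(0)=197442, R(1)=117319, R(2)=-946
a, E = durbin([197442.0, 117319.0, -946.0], 2)
print([round(v, 5) for v in a])
```

With these inputs, â₁ and â₂ come out near 0.92289 and -0.55317, matching the example slides; the residual E comes out close to the total squared error (~88,645) that the example computes later.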
33

Features: LPC Example

For 2nd-order LPC (p = 2), with waveform samples S = {s(m)} = {x(n+m)}, N = 8:

{462 16 -294 -374 -178 98 40 -82}

If we apply a Hamming window h(m) (because we assume the signal is zero outside the window; with a rectangular window there would be large prediction error at the edges of the window), which is

{0.080 0.253 0.642 0.954 0.954 0.642 0.253 0.080}

then the windowed signal ŝ_n(m) = s_n(m) h(m) is

{36.96 4.05 -188.85 -356.96 -169.89 62.95 10.13 -6.56}

and computing R(y) = Σ_{m=0}^{N-1-y} ŝ_n(m) ŝ_n(m+y) for 0 ≤ y ≤ p:

R(0) = 197442   R(1) = 117319   R(2) = -946

The prediction model is

ŝ(m) = e(m) + a₁ s(m-1) + a₂ s(m-2),   e(m) = s(m) - Σ_{k=1}^{p} a_k s(m-k)

Durbin's recursion starts with:

E^(0) = R(0) = 197442
k₁ = R(1)/R(0) = 0.59420
α₁^(1) = k₁ = 0.59420
34

Features: LPC Example

E^(1) = (1 - k₁²) E^(0) = (1 - 0.59420²) · 197442 = 127731

k₂ = [ R(2) - α₁^(1) R(1) ] / E^(1) = [ -946 - 0.59420 · 117319 ] / 127731 = -0.55317

α₂^(2) = k₂ = -0.55317
α₁^(2) = α₁^(1) - k₂ α₁^(1) = 0.59420 + 0.55317 · 0.59420 = 0.92289

â₁ = 0.92289    â₂ = -0.55317

Note: if we divide all R(·) values by R(0), the solution is unchanged, but the error E^(i) is now the "normalized error".
Also: -1 ≤ k_r ≤ 1 for r = 1, 2, …, p
35

Features: LPC Example

We can go back and check our results by using these coefficients to "predict" the windowed waveform:
s(m) = {36.96 4.05 -188.85 -356.96 -169.89 62.95 10.13 -6.56}
and compute the error e(m) = s(m) - a₁ s(m-1) - a₂ s(m-2) from time 0 to N+p-1 (Eqn (9)):

time 0:  0     ×0.92542 + 0      ×-0.5554 = 0      vs. 36.96,   e(0) = 36.96
time 1:  36.96 ×0.92542 + 0      ×-0.5554 = 34.1   vs. 4.05,    e(1) = -30.05
time 2:  4.05  ×0.92542 + 36.96  ×-0.5554 = -16.7  vs. -188.85, e(2) = -172.15
time 3:  -188.9×0.92542 + 4.05   ×-0.5554 = -176.5 vs. -356.96, e(3) = -180.43
time 4:  -357.0×0.92542 + -188.9 ×-0.5554 = -225.0 vs. -169.89, e(4) = 55.07
time 5:  -169.9×0.92542 + -357.0 ×-0.5554 = 40.7   vs. 62.95,   e(5) = 22.28
time 6:  62.95 ×0.92542 + -169.89×-0.5554 = 152.1  vs. 10.13,   e(6) = -141.95
time 7:  10.13 ×0.92542 + 62.95  ×-0.5554 = -25.5  vs. -6.56,   e(7) = 18.92
time 8:  -6.56 ×0.92542 + 10.13  ×-0.5554 = -11.6  vs. 0,       e(8) = 11.65
time 9:  0     ×0.92542 + -6.56  ×-0.5554 = 3.63   vs. 0,       e(9) = -3.63

This gives a total squared error of 88,645, or an error normalized by R(0) of 0.449.

(If p = 0, then we predict nothing, and the total error equals R(0), so we can normalize all error values by dividing by R(0).)
36

Features: LPC Example

If we look at a longer speech sample of the vowel /iy/, do pre-emphasis of 0.97 (see following slides), and perform LPC of various orders, we get:

[Figure: normalized prediction error (total squared error / R(0)), y-axis 0.00 to 0.20, vs. LPC order 0 to 10; the error drops sharply up to order 4 and flattens thereafter]

which implies that order 4 captures most of the important information in the signal (probably corresponding to 2 formants).
37

Features: LPC and Linear Regression

• LPC models the speech at time n as a linear combination of the previous p samples. The term "linear" does not imply that the result is a straight line, e.g. s = ax + b.
• Speech is then modeled as a linear but time-varying system (piecewise linear).
• LPC is a form of linear regression, called multiple linear regression, in which there is more than one variable. In other words, instead of an equation in one variable of the form s = a₁x + a₂x², it is an equation of the form s = a₁x + a₂y + …
• Because the function is linear in its parameters, the solution reduces to a system of linear equations, and other techniques for linear regression (e.g. gradient descent) are not necessary.
38

Features: LPC Spectrum

We can compute the spectral envelope magnitude from the LPC parameters by evaluating the transfer function S(z) for z = e^{jω}:

S(e^{jω}) = G / A(e^{jω}) = G / ( 1 - Σ_{k=1}^{p} a_k e^{-jωk} ),   e^{jω} = cos(ω) + j sin(ω)

because the log power spectrum is:

Re{A} = 1 - Σ_{k=1}^{p} a_k cos(2πkn/N),   Im{A} = Σ_{k=1}^{p} a_k sin(2πkn/N),   0 ≤ n ≤ N/2

S(n) = 10 log10( G² / ( Re{A}² + Im{A}² ) )

Each resonance (complex pole) in the spectrum requires two LPC coefficients; each spectral slope factor (at frequency = 0 or the Nyquist frequency) requires one LPC coefficient.

For 8 kHz speech, 4 formants → LPC order of 9 or 10
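Evaluating the envelope on a frequency grid takes only a few lines. This sketch uses our own naming, assumes G = 1, and feeds in the slide-34 example coefficients purely as an illustration:

```python
import math

def lpc_log_spectrum(a, G=1.0, n_points=64):
    """Log power spectral envelope 10*log10(G^2 / |A(e^jw)|^2) from LPC
    coefficients a_k, where A(e^jw) = 1 - sum_k a_k e^{-jwk},
    evaluated at n_points frequencies from 0 to pi."""
    env = []
    for i in range(n_points):
        w = math.pi * i / (n_points - 1)
        re = 1.0 - sum(ak * math.cos(w * (k + 1)) for k, ak in enumerate(a))
        im = sum(ak * math.sin(w * (k + 1)) for k, ak in enumerate(a))
        env.append(10 * math.log10(G * G / (re * re + im * im)))
    return env

env = lpc_log_spectrum([0.92289, -0.55317])   # example coefficients from slide 34
print(len(env))   # -> 64
```

Peaks of `env` correspond to the resonances (formants) captured by the complex poles of 1/A(z).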
39
Features: LPC Representations
40

Features: LPC Cepstral Features

The LPC values are more correlated than cepstral coefficients. But for a GMM with a diagonal covariance matrix, we want the values to be uncorrelated.

So, we can convert the LPC coefficients into cepstral values:

c₀ = ln G²
c_m = a_m + Σ_{k=1}^{m-1} (k/m) c_k a_{m-k},   1 ≤ m ≤ p
c_m = Σ_{k=m-p}^{m-1} (k/m) c_k a_{m-k},       p < m ≤ Q

The cepstral coefficients, which are the coefficients of the Fourier transform representation of the log magnitude spectrum, have been shown to be a more robust, reliable feature set for speech recognition than the LPC coefficients, the PARCOR coefficients, or the log area ratio coefficients.
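The recursion can be written directly (function name ours; c₀ is omitted since only c₁…c_Q are used as features):

```python
def lpc_to_cepstrum(a, Q):
    """Convert LPC coefficients a_1..a_p to cepstral coefficients c_1..c_Q:
       c_m = a_m + sum_{k=1}^{m-1} (k/m) c_k a_{m-k}     for m <= p
       c_m =       sum_{k=m-p}^{m-1} (k/m) c_k a_{m-k}   for m > p"""
    p = len(a)
    c = []
    for m in range(1, Q + 1):
        acc = a[m - 1] if m <= p else 0.0
        for k in range(max(1, m - p), m):
            acc += (k / m) * c[k - 1] * a[m - k - 1]
        c.append(acc)
    return c

print(lpc_to_cepstrum([0.9, -0.5], 4))   # c_1 = a_1 = 0.9, then recursion takes over
```

Note that Q may exceed p: the recursion keeps generating cepstral coefficients from the p LPC values alone.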
41

Features: LPC History

Wikipedia has an interesting article on the history of LPC:

… The first ideas leading to LPC started in 1966 when S. Saito and F. Itakura of NTT described an approach to automatic phoneme discrimination that involved the first maximum likelihood approach to speech coding. In 1967, John Burg outlined the maximum entropy approach. In 1969 Itakura and Saito introduced partial correlation, Glen Culler proposed real-time speech encoding in May, and B. S. Atal presented an LPC speech coder at the Annual Meeting of the Acoustical Society of America.

In 1972 Bob Kahn of ARPA, with Jim Forgie (Lincoln Laboratory) and Dave Walden (BBN Technologies), started the first developments in packetized speech, which would eventually lead to Voice over IP. In 1976 the first LPC conference took place over the ARPANET using the Network Voice Protocol.

It is [currently] used as a form of voice compression by phone companies, for example in the GSM standard. It is also used for secure wireless, where voice must be digitized, encrypted and sent over a narrow voice channel.

[from http://en.wikipedia.org/wiki/Linear_predictive_coding]
42

Features: Pre-emphasis

The source signal for voiced sounds has a slope of -6 dB/octave:

[Figure: log power spectrum (energy in dB vs. frequency, 0 to 4 kHz) falling at -6 dB/octave]

We want to model only the resonant energies, not the source. But LPC will model both source and resonances.

If we pre-emphasize the signal for voiced sounds, we flatten it in the spectral domain, and the source of speech more closely approximates impulses. LPC can then model only the resonances (the important information) rather than resonances + source.

Pre-emphasis:

s'_n(m) = s_n(m) - k · s_n(m-1),   k = 0.97
43

Features: Pre-emphasis

Adaptive pre-emphasis: a better way to flatten the speech signal

1. LPC of order 1
   = value of spectral slope in dB/octave
   = R(1)/R(0) = first value of the normalized autocorrelation
2. Result = pre-emphasis factor:

s'_n(m) = s_n(m) - (R(1)/R(0)) · s_n(m-1)
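Both the fixed-factor and the adaptive variant are one-liners (function names ours):

```python
def preemphasize(s, alpha=0.97):
    """s'(m) = s(m) - alpha * s(m-1); flattens the -6 dB/octave voiced-source slope."""
    return [s[0]] + [s[m] - alpha * s[m - 1] for m in range(1, len(s))]

def adaptive_alpha(s):
    """Adaptive pre-emphasis factor: R(1)/R(0), the first normalized
    autocorrelation value (equivalently, the order-1 LPC coefficient)."""
    r0 = sum(v * v for v in s)
    r1 = sum(s[m] * s[m + 1] for m in range(len(s) - 1))
    return r1 / r0

x = [1.0, 1.0, 1.0, 1.0]     # constant (DC-like) signal
print(preemphasize(x))        # tail values ≈ 0.03: low-frequency energy largely removed
print(adaptive_alpha(x))      # -> 0.75 for this short toy signal
```

A strongly low-frequency signal like the constant above yields a factor near 1 as the window grows, so the adaptive form applies heavy pre-emphasis exactly when the spectrum is steep.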
44

Features: Cepstral Coefficients

For input speech samples s(n) = {x(n)}:

1. Pre-emphasis, factor k = 0.97:
   ŝ(n) = s(n) - k · s(n-1)

2. Framing (n: sample index within a frame, M: shift rate, M << N):
   x_l(n) = ŝ(lM + n),   n = 0, 1, …, N-1,   l = 0, 1, …, L-1

3. Hamming window:
   w(n) = 0.54 - 0.46 cos(2πn / (N-1)),   0 ≤ n ≤ N-1
   x̂_l(n) = x_l(n) w(n)
45

Features: Cepstral Coefficients

5. Autocorrelation analysis, where x_n(m) = x̂(n+m), 0 ≤ m ≤ N-1:

   R_n(y) = Σ_{m=0}^{N-1-y} x_n(m) x_n(m+y),   0 ≤ y ≤ p

6. LPC analysis: solve Σ_{k=1}^{p} â_k R_n(|i-k|) = R_n(i), 1 ≤ i ≤ p, by Durbin's recursion:

   E^(0) = R(0)
   For i = 1, 2, …, p:
   k_i = [ R(i) - Σ_{j=1}^{i-1} α_j^(i-1) R(i-j) ] / E^(i-1)
   α_i^(i) = k_i
   α_j^(i) = α_j^(i-1) - k_i α_{i-j}^(i-1),   1 ≤ j ≤ i-1
   E^(i) = (1 - k_i²) E^(i-1)
   â_j = α_j^(p),   1 ≤ j ≤ p
46

Features: Cepstral Coefficients

7. LPC parameter conversion to cepstral coefficients:

   c_m = a_m + Σ_{k=1}^{m-1} (k/m) c_k a_{m-k},   1 ≤ m ≤ p
   c_m = Σ_{k=m-p}^{m-1} (k/m) c_k a_{m-k},       p < m ≤ Q
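The numbered steps of this front end can be strung together in a single sketch. The frame size, shift, and orders below are toy values (not the slides' 256-sample windows), and all names are our own:

```python
import math

def lpc_cepstra(signal, N=8, M=4, p=2, Q=4, alpha=0.97):
    """End-to-end sketch of the front end: pre-emphasis -> framing ->
    Hamming window -> autocorrelation -> Durbin's recursion -> cepstra."""
    # 1. pre-emphasis
    s = [signal[0]] + [signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))]
    # 2. framing: x_l(n) = s(lM + n)
    frames = [s[l * M:l * M + N] for l in range((len(s) - N) // M + 1)]
    # 3. Hamming window
    w = [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]
    out = []
    for x in frames:
        xw = [xi * wi for xi, wi in zip(x, w)]
        # 5. autocorrelation R(0..p)
        R = [sum(xw[m] * xw[m + y] for m in range(N - y)) for y in range(p + 1)]
        if R[0] == 0.0:              # silent frame: LPC undefined, emit zeros
            out.append([0.0] * Q)
            continue
        # 6. Durbin's recursion
        E, a = R[0], []
        for i in range(1, p + 1):
            k = (R[i] - sum(a[j] * R[i - 1 - j] for j in range(i - 1))) / E
            a = [a[j] - k * a[i - 2 - j] for j in range(i - 1)] + [k]
            E *= 1.0 - k * k
        # 7. LPC-to-cepstrum conversion, c_1..c_Q
        c = []
        for m in range(1, Q + 1):
            acc = a[m - 1] if m <= p else 0.0
            for j in range(max(1, m - p), m):
                acc += (j / m) * c[j - 1] * a[m - j - 1]
            c.append(acc)
        out.append(c)
    return out

cep = lpc_cepstra([math.sin(0.4 * n) for n in range(32)])
print(len(cep), len(cep[0]))   # (number of frames, Q cepstra per frame)
```

Each frame thus yields Q cepstral coefficients, which would feed the isolated-word HMM recognizer described at the start of these slides.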