
Page 1: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Speech Processing

Analysis and Synthesis of Pole-Zero Speech Models

Page 2: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Introduction

- Deterministic: speech sounds with periodic or impulse sources.
- Stochastic: speech sounds with noise sources.
- The goal is to derive a vocal tract model for each class of sound source. It will be shown that the solution equations for the two classes are similar in structure.
- The solution approach is referred to as linear prediction analysis.
- Linear prediction analysis leads to a method of speech synthesis based on the all-pole model.
- Note that the all-pole model is intimately associated with the concatenated lossless tube model of the previous chapter (Chapter 4).

Page 3: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

All-Pole Modeling of Deterministic Signals

Consider the vocal tract transfer function during a voiced source: an impulse train u_g[n] with period T (the pitch period), scaled by a gain A, drives a glottal model G(z), a vocal tract model V(z), and a radiation model R(z) to produce the speech output s[n]:

  H(z) = A\,G(z)\,V(z)\,R(z)

and, in all-pole form,

  H(z) = \frac{A}{1 - \sum_{k=1}^{p} a_k z^{-k}}.

Page 4: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

All-Pole Modeling of Deterministic Signals

What about the fact that R(z) is a zero model? A single zero can be expressed as an infinite set of poles. Note that

  \sum_{k=0}^{\infty} a^k z^{-k} = \frac{1}{1 - a z^{-1}}, \quad |a| < 1.

From the above expression one can derive

  \underbrace{1 - a z^{-1}}_{\text{simple zero}} = \frac{1}{\underbrace{\sum_{k=0}^{\infty} a^k z^{-k}}_{\text{infinite number of poles}}}.

Page 5: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

All-Pole Modeling of Deterministic Signals

In practice the infinite number of poles is approximated with a finite set of poles, since a^k → 0 as k → ∞.

H(z) can therefore be considered an all-pole representation:
- Representing a zero with a large number of poles is inefficient.
- Estimating zeros directly is a more efficient approach (covered later in this chapter).

Page 6: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Model Estimation

Goal: estimate the filter coefficients {a_1, a_2, ..., a_p} for a particular order p, and the gain A, over a short time span of the speech signal (typically 20 ms) for which the signal is considered quasi-stationary.

Use the linear prediction method: each speech sample is approximated as a linear combination of past speech samples. This leads to a set of analysis techniques for estimating the parameters of the all-pole model.

Page 7: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Model Estimation

Consider the z-transform of the vocal tract model:

  H(z) = \frac{S(z)}{U_g(z)} = \frac{A}{1 - \sum_{k=1}^{p} a_k z^{-k}}

which can be rearranged into

  S(z) = \sum_{k=1}^{p} a_k z^{-k} S(z) + A\,U_g(z).

In the time domain it can be written as

  s[n] = \sum_{k=1}^{p} a_k s[n-k] + A\,u_g[n]

where s[n] is the current sample, the s[n-k] are past samples, the a_k are the scaling factors (linear prediction coefficients), and u_g[n] is the input. This is referred to as an autoregressive (AR) model.
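As a small illustration of the AR difference equation above, the sketch below generates a signal by driving an all-pole model with a periodic impulse source. The helper name, coefficient values, and pitch period are illustrative assumptions, not taken from the slides.

import numpy as np

def ar_synthesize(a, excitation, gain=1.0):
    """s[n] = sum_k a[k-1]*s[n-k] + gain*u[n] (illustrative helper)."""
    p = len(a)
    s = np.zeros(len(excitation))
    for n in range(len(excitation)):
        past = sum(a[k] * s[n - k - 1] for k in range(p) if n - k - 1 >= 0)
        s[n] = past + gain * excitation[n]
    return s

# Example: order-2 AR model driven by an impulse train with period 80 samples.
a = np.array([1.3, -0.9])          # a stable coefficient pair (poles inside unit circle)
u = np.zeros(400); u[::80] = 1.0   # periodic impulse source u_g[n]
s = ar_synthesize(a, u, gain=1.0)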

Page 8: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Model Estimation

The method used to predict the current sample from a linear combination of past samples is called linear prediction analysis.

LPC: quantization of the linear prediction coefficients, or of a transformed version of these coefficients, is called linear prediction coding (Chapter 12).

For u_g[n] = 0,

  s[n] = \sum_{k=1}^{p} a_k s[n-k].

This observation motivates the analysis technique of linear prediction.

Page 9: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Model Estimation: Definitions

A linear predictor of order p is defined by

  \tilde{s}[n] = \sum_{k=1}^{p} \alpha_k s[n-k]

where \tilde{s}[n] is the estimate of s[n] and the \alpha_k are estimates of the a_k. In the z-domain,

  \tilde{S}(z) = P(z) S(z), \qquad P(z) = \sum_{k=1}^{p} \alpha_k z^{-k}.

Page 10: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Model Estimation: Definitions

The prediction error sequence is given as the difference between the original sequence and its prediction:

  e[n] = s[n] - \tilde{s}[n] = s[n] - \sum_{k=1}^{p} \alpha_k s[n-k].

The associated prediction error filter is defined as

  A(z) = 1 - P(z) = 1 - \sum_{k=1}^{p} \alpha_k z^{-k}

so that E(z) = S(z) - \tilde{S}(z) = [1 - P(z)]\,S(z) = A(z)\,S(z).

If {\alpha_k} = {a_k}, then passing s[n] through A(z) recovers the scaled source: e[n] = A\,u_g[n].

Page 11: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Model Estimation: Definitions

Note 1: substituting the AR model into the error definition,

  e[n] = s[n] - \tilde{s}[n] = \sum_{k=1}^{p} a_k s[n-k] + A\,u_g[n] - \sum_{k=1}^{p} \alpha_k s[n-k] = A\,u_g[n] \quad \text{when } \{\alpha_k\} = \{a_k\}.

Recovery of s[n]: passing A\,u_g[n] through the inverse filter 1/A(z) recovers the speech signal,

  s[n] = \frac{1}{A(z)} \left\{ A\,u_g[n] \right\}.

Page 12: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Model Estimation: Definitions

Note 2: If
1. the vocal tract contains a finite number of poles and no zeros, and
2. the prediction order p is correct,

then {\alpha_k} = {a_k}, and e[n] is an impulse train for voiced speech; for impulsive (plosive) speech, e[n] is just a single impulse.

Page 13: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Example 5.1

Consider an exponentially decaying impulse response of the form h[n] = a^n u[n], where u[n] is the unit step. The response to the scaled unit sample A\delta[n] is

  s[n] = A\,h[n] = A\,a^n u[n], \qquad H(z) = \frac{1}{1 - a z^{-1}}.

Consider the prediction of s[n] using a linear predictor of order p = 1; this is a good fit since s[n] = a\,s[n-1] for n \ge 1. The prediction error sequence with \alpha_1 = a is

  e[n] = s[n] - a\,s[n-1] = A\,a^n u[n] - a\,A\,a^{n-1} u[n-1] = A\,\delta[n].

The prediction of the signal is exact except at the time origin.
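A quick numerical check of this example (the values of A, a, and N are arbitrary illustration choices):

import numpy as np

A, a, N = 2.0, 0.8, 50
n = np.arange(N)
s = A * a**n                                   # s[n] for n >= 0 (zero before the origin)
e = s - a * np.concatenate(([0.0], s[:-1]))    # e[n] = s[n] - a*s[n-1]
print(np.allclose(e[1:], 0.0), np.isclose(e[0], A))   # True True: error only at n = 0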

Page 14: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Error Minimization

An important question is how to derive an estimate of the prediction coefficients \alpha_k, for a particular order p, that is optimal in some sense.

Optimality is measured with respect to a criterion; an appropriate measure is the mean-squared error (MSE). The goal is to minimize the mean-squared prediction error E, defined as

  E = \sum_{m} e[m]^2 = \sum_{m} \left( s[m] - \tilde{s}[m] \right)^2.

In reality, the model must be valid over some short-time interval, say M samples on either side of n.

Page 15: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Error Minimization

Thus, in practice, the MSE is time-dependent and is formed over a finite interval, the prediction error interval [n-M, n+M]:

  E_n = \sum_{m=n-M}^{n+M} e_n[m]^2.

Alternatively,

  E_n = \sum_{m} e_n[m]^2, \quad \text{where} \quad
  e_n[m] = \begin{cases} s[m] - \sum_{k=1}^{p} \alpha_k s[m-k], & m \in [n-M,\, n+M] \\ 0, & \text{elsewhere.} \end{cases}

Page 16: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Error Minimization

Determine the {\alpha_k} for which E_n is minimal:

  \frac{\partial E_n}{\partial \alpha_i} = 0, \quad i = 1, 2, 3, \ldots, p.

Differentiating E_n = \sum_m \left( s_n[m] - \sum_{k=1}^{p} \alpha_k s_n[m-k] \right)^2 with respect to \alpha_i and setting the result to zero results in

  \sum_{m} s_n[m]\, s_n[m-i] = \sum_{k=1}^{p} \alpha_k \sum_{m} s_n[m-k]\, s_n[m-i], \quad i = 1, 2, \ldots, p.

Page 17: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Error Minimization

The last equation can be rewritten by defining the function

  \Phi_n[i,k] = \sum_{m} s_n[m-i]\, s_n[m-k]

which gives

  \sum_{k=1}^{p} \alpha_k \Phi_n[i,k] = \Phi_n[i,0], \quad i = 1, 2, 3, \ldots, p.

These are referred to as the normal equations, written in matrix form below as \Phi \alpha = b, with [\Phi]_{ik} = \Phi_n[i,k] and b_i = \Phi_n[i,0].
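As a concrete sketch of these normal equations, the code below builds \Phi_n[i,k] over a segment and solves the resulting p x p system with a general linear solver (this is the covariance-style formulation mentioned later; the function name and window handling are assumptions for illustration):

import numpy as np

def covariance_lpc(s, p):
    """Solve sum_k alpha_k * Phi[i,k] = Phi[i,0] for i = 1..p.
    s: 1-D array holding the analysis segment plus at least p preceding samples."""
    N = len(s)
    # Phi[i, k] = sum_{m=p}^{N-1} s[m-i] * s[m-k], for i, k = 0..p
    Phi = np.array([[np.dot(s[p - i:N - i], s[p - k:N - k]) for k in range(p + 1)]
                    for i in range(p + 1)])
    alpha = np.linalg.solve(Phi[1:, 1:], Phi[1:, 0])   # p x p normal equations
    return alpha

# Usage (illustrative): alpha = covariance_lpc(segment, p=10)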

Page 18: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Error Minimization

The minimum error for the optimal solution can be derived as follows:

  E_n = \sum_{m} s_n[m]^2 - 2 \sum_{k=1}^{p} \alpha_k \sum_{m} s_n[m]\, s_n[m-k]
        + \sum_{m} \left( \sum_{k=1}^{p} \alpha_k s_n[m-k] \right) \left( \sum_{l=1}^{p} \alpha_l s_n[m-l] \right).

The last term in the equation above can be rewritten, using the normal equations, as

  \sum_{k=1}^{p} \sum_{l=1}^{p} \alpha_k \alpha_l \sum_{m} s_n[m-k]\, s_n[m-l]
  = \sum_{l=1}^{p} \alpha_l \sum_{m} s_n[m-l]\, s_n[m].

Page 19: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Error Minimization

Thus the minimum error can be expressed as

  E_n = \sum_{m} s_n[m]^2 - \sum_{k=1}^{p} \alpha_k \sum_{m} s_n[m]\, s_n[m-k]
      = \Phi_n[0,0] - \sum_{k=1}^{p} \alpha_k \Phi_n[0,k].

Page 20: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Error Minimization Remarks:

1. The order p of the actual underlying all-pole transfer function is not known. The order can be estimated by observing that the minimum prediction error of a pth-order predictor in theory equals that of a (p+1)th-order predictor, and that the predictor coefficients for k > p equal zero (or, in practice, are close to zero and model only random noise effects).

2. The prediction error e_n[m] is nonzero only "in the vicinity" of time n, i.e., on [n-M, n+M]. In predicting values of the short-time sequence s_n[m], p values outside of the prediction error interval [n-M, n+M] are required.
   - Covariance method: uses values outside the interval to predict values inside the interval.
   - Autocorrelation method: assumes that the speech samples are zero outside the interval.

Page 21: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Error Minimization

Matrix formulation. Write the prediction error over the interval as e_n = s_n - S_n \alpha, where s_n is the vector of samples s[m] for m \in [n-M, n+M] and the columns of S_n contain the corresponding delayed samples s[m-k], k = 1, \ldots, p.

Projection Theorem: the columns of S_n are basis vectors, and the error vector e_n is orthogonal to each basis vector, S_n^T e_n = 0, where

  e_n[m] = s_n[m] - \sum_{k=1}^{p} \alpha_k s_n[m-k], \quad m \in [n-M,\, n+M].

Orthogonality leads to

  \alpha = \left( S_n^T S_n \right)^{-1} S_n^T s_n.
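A minimal numerical sketch of this projection view, using a least-squares solver in place of the explicit inverse (segment layout and names are illustrative assumptions):

import numpy as np

def lpc_least_squares(s, p, lo, hi):
    """Least-squares predictor over samples lo..hi (inclusive), i.e. the
    projection solution alpha = (S^T S)^{-1} S^T s.
    Assumes s has at least p samples before index lo."""
    m = np.arange(lo, hi + 1)
    S = np.column_stack([s[m - k] for k in range(1, p + 1)])  # delayed-sample columns
    alpha, *_ = np.linalg.lstsq(S, s[m], rcond=None)          # solves S alpha ~= s_n
    return alpha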

Page 22: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Autocorrelation Method

In the previous section we described a general method of linear prediction that uses samples outside the prediction error interval, referred to as the covariance method.

An alternative approach that does not consider samples outside the analysis interval, referred to as the autocorrelation method, is presented next. This method is suboptimal; however, it leads to an efficient and stable solution of the normal equations.

Page 23: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Autocorrelation Method

- Assumes that the samples outside the time interval [n-M, n+M] are all zero, and
- Extends the prediction error interval, i.e., the range over which we minimize the mean-squared error, to ±∞.

Conventions:
- Short-time interval: [n, n+N_w-1], where N_w = 2M+1 (note: it is not centered around sample n as in the previous derivation).
- The segment is shifted to the left by n samples so that the first nonzero sample falls at m = 0. This operation is equivalent to shifting the speech sequence s[m] by n samples to the left and windowing by an N_w-point rectangular window:

  w[m] = 1, \quad m = 0, 1, 2, \ldots, N_w - 1 \quad (\text{and } 0 \text{ otherwise}).

Page 24: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Autocorrelation Method

The windowed sequence can be expressed as

  s_n[m] = s[m+n]\, w[m].

(This operation is depicted in a figure on the original slide.)

Page 25: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Autocorrelation Method

Important observations that are a consequence of zeroing the signal outside the interval:
1. The prediction error is nonzero only in the interval [0, N_w + p - 1], where N_w is the window length and p the predictor order.
2. The prediction error is largest at the left and right ends of the segment. This is due to edge effects caused by the way the prediction is done: predicting from zeros at the left of the window, and predicting zeros at the right of the window.

Page 26: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Autocorrelation Method

To compensate for edge effects, a tapered window (e.g., Hamming) is typically used:
- It removes the possibility that the mean-squared error is dominated by end (edge) effects.
- The data become distorted, however, hence biasing the estimates \alpha_k.

Let the mean-squared prediction error be given by

  E_n = \sum_{m=0}^{N_w + p - 1} e_n[m]^2

where (1) the limits of summation refer to the new time origin, and (2) the prediction error outside this interval is zero.

Page 27: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Autocorrelation Method

The normal equations take the following form (Exercise 5.1):

  \sum_{k=1}^{p} \alpha_k\, \phi_n[i,k] = \phi_n[i,0], \quad i = 1, 2, 3, \ldots, p

where

  \phi_n[i,k] = \sum_{m=0}^{N_w + p - 1} s_n[m-i]\, s_n[m-k], \quad 1 \le i \le p, \; 0 \le k \le p.

Page 28: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Autocorrelation Method

Because of the summation limits (the windowed sequence is nonzero only on [0, N_w - 1]), the function \phi_n[i,k] can be written as

  \phi_n[i,k] = \sum_{m=i}^{k+N_w-1} s_n[m-i]\, s_n[m-k]

recognizing that only samples in the interval [i, k+N_w-1] contribute to the sum. Changing variables m \Rightarrow m - i gives the expression on the next slide.

Page 29: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Autocorrelation Method

After the change of variables,

  \phi_n[i,k] = \sum_{m=0}^{N_w-1-(i-k)} s_n[m]\, s_n[m + i - k], \quad 1 \le i \le p, \; 0 \le k \le p.

Since the above expression is a function only of the difference i - k, we denote it as

  \phi_n[i,k] = r_n[i-k].

Letting \tau = i - k, referred to as the correlation "lag", leads to the short-time autocorrelation function

  r_n[\tau] = \sum_{m=0}^{N_w-1-\tau} s_n[m]\, s_n[m+\tau].
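A minimal sketch of this short-time autocorrelation computation (the helper name and frame parameters are illustrative assumptions):

import numpy as np

def short_time_autocorr(seg, max_lag):
    """r[tau] = sum_m seg[m]*seg[m+tau], tau = 0..max_lag, for a windowed segment."""
    N = len(seg)
    return np.array([np.dot(seg[:N - tau], seg[tau:]) for tau in range(max_lag + 1)])

# Example: a 20 ms Hamming-windowed frame at 16 kHz, lags up to the model order:
# frame = s[n:n+320] * np.hamming(320); r = short_time_autocorr(frame, max_lag=16)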

Page 30: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Autocorrelation Method

  r_n[\tau] = s_n[\tau] * s_n[-\tau]

The autocorrelation method thus leads to computation of the short-time sequence s_n[m] convolved with itself flipped in time.

The autocorrelation function is a measure of the "self-similarity" of the signal at different lags \tau. When r_n[\tau] is large, signal samples spaced by \tau are said to be highly correlated.

Page 31: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Autocorrelation Method

Properties of r_n[\tau]:
1. For an N-point sequence, r_n[\tau] is zero outside the interval [-(N-1), N-1].
2. r_n[\tau] is an even function of \tau.
3. r_n[0] \ge |r_n[\tau]|.
4. r_n[0] is the energy of s_n[m]:  r_n[0] = \sum_{m} s_n[m]^2.
5. If s_n[m] is a segment of a periodic sequence, then r_n[\tau] is periodic-like with the same period. Because s_n[m] is short-time, the overlapping data in the correlation decreases as \tau increases, so the amplitude of r_n[\tau] decreases as \tau increases; with a rectangular window the envelope of r_n[\tau] decreases linearly.
6. If s_n[m] is a random white-noise sequence, then r_n[\tau] is impulse-like, reflecting self-similarity only within a small neighborhood of \tau = 0.

Page 32: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models


Autocorrelation Method

Page 33: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Autocorrelation Method

Letting \phi_n[i,k] = r_n[i-k], the normal equations take the form

  \sum_{k=1}^{p} \alpha_k\, r_n[i-k] = r_n[i], \quad 1 \le i \le p.

The expression represents p linear equations with p unknowns \alpha_k, 1 \le k \le p.

Using the normal-equation solution, it can be shown that the corresponding minimum mean-squared prediction error is given by

  E_n = r_n[0] - \sum_{k=1}^{p} \alpha_k r_n[k].

Matrix-form representation of the normal equations: R_n \alpha = r_n.

Page 34: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Autocorrelation Method

Expanded form:

  \begin{bmatrix}
  r_n[0] & r_n[1] & r_n[2] & \cdots & r_n[p-1] \\
  r_n[1] & r_n[0] & r_n[1] & \cdots & r_n[p-2] \\
  r_n[2] & r_n[1] & r_n[0] & \cdots & r_n[p-3] \\
  \vdots &        &        & \ddots & \vdots   \\
  r_n[p-1] & r_n[p-2] & r_n[p-3] & \cdots & r_n[0]
  \end{bmatrix}
  \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \vdots \\ \alpha_p \end{bmatrix}
  =
  \begin{bmatrix} r_n[1] \\ r_n[2] \\ r_n[3] \\ \vdots \\ r_n[p] \end{bmatrix}

The R_n matrix is Toeplitz:
- Symmetric about the main diagonal,
- All elements along each diagonal are equal,
- The matrix is invertible.

This structure implies an efficient solution.
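A direct solution of this Toeplitz system, sketched with a general solver for clarity (the efficient Levinson recursion of the later slides exploits the Toeplitz structure; names here are illustrative):

import numpy as np

def autocorr_lpc(r):
    """Solve R alpha = r for the autocorrelation method, given r = [r[0], ..., r[p]]."""
    p = len(r) - 1
    R = np.array([[r[abs(i - k)] for k in range(p)] for i in range(p)])  # Toeplitz R_n
    return np.linalg.solve(R, r[1:p + 1])

# Usage (illustrative): alpha = autocorr_lpc(short_time_autocorr(frame, max_lag=p))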

Page 35: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Example 5.3

Consider a system with an exponentially decaying impulse response of the form h[n] = a^n u[n], with u[n] being the unit step function. Estimate a using the autocorrelation method of linear prediction.

The input A\delta[n] produces

  s[n] = A\delta[n] * h[n] = A\,a^n u[n], \qquad S(z) = \frac{A}{1 - a z^{-1}}, \quad |a| < 1.

Page 36: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Example 5.3

Apply an N-point rectangular window [0, N-1] at n = 0 and compute r_0[0] and r_0[1]:

  r_0[0] = \sum_{m=0}^{N-1} s[m]^2 = A^2 \sum_{m=0}^{N-1} a^{2m} = A^2 \frac{1 - a^{2N}}{1 - a^2}

  r_0[1] = \sum_{m=0}^{N-2} s[m]\, s[m+1] = A^2 a \sum_{m=0}^{N-2} a^{2m} = A^2 a\, \frac{1 - a^{2(N-1)}}{1 - a^2}.

Using the normal equations (p = 1):

  \alpha_1 = \frac{r_0[1]}{r_0[0]} = a\, \frac{1 - a^{2(N-1)}}{1 - a^{2N}}
  \;\longrightarrow\; a \quad \text{as } N \to \infty.
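A numerical check of this result (the values of A, a, and N are illustrative assumptions):

import numpy as np

A, a, N = 1.0, 0.9, 1000
s = A * a**np.arange(N)     # windowed exponential s[m], m = 0..N-1
r0 = np.dot(s, s)           # r_0[0]
r1 = np.dot(s[:-1], s[1:])  # r_0[1]
print(r1 / r0)              # alpha_1 -> approaches a = 0.9 as N grows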

Page 37: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Example 5.3

The minimum squared error (from slide 33) is thus (Exercise 5.5)

  E_0 = r_0[0] - \alpha_1 r_0[1]

which approaches A^2 as N \to \infty.

For a 1st-order predictor, as in this example, the prediction error sequence for the true predictor (i.e., \alpha_1 = a) is given by

  e[n] = s[n] - a\,s[n-1] = A\,\delta[n]

(see Example 5.1 presented earlier). Thus the prediction of the signal is exact except at the time origin.

This example illustrates that, with enough data, the autocorrelation method yields a solution close to the true single-pole model for an impulse input.

Page 38: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Limitations of the linear prediction model

When the underlying measured sequence is the impulse response of an arbitrary all-pole system, the autocorrelation method yields the correct result. There are, however, a number of speech sounds for which a true solution cannot be obtained even with an arbitrarily long data sequence.

Consider a periodic sequence simulating a steady voiced sound, formed by convolving a periodic impulse train p[n] with an all-pole impulse response h[n]. The z-transform of h[n] is given by

  H(z) = \frac{1}{1 - \sum_{k=1}^{p} a_k z^{-k}}.

Page 39: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Limitations of the linear prediction model

Thus

  h[n] = \sum_{k=1}^{p} a_k h[n-k] + \delta[n].

The normal equations of this system are given by (see Exercise 5.7)

  \sum_{k=1}^{p} \alpha_k r_h[i-k] = r_h[i], \quad 1 \le i \le p

where the autocorrelation of h[n] is denoted by r_h[\tau] = h[\tau] * h[-\tau].

Suppose now that the system is excited with an impulse train of period P:

  s[n] = \sum_{k} h[n - kP].

Page 40: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Limitations of the linear prediction model

The normal equations associated with s[n] (windowed over multiple pitch periods) for an order-p predictor are given by

  \sum_{k=1}^{p} \alpha_k r_n[i-k] = r_n[i], \quad 1 \le i \le p.

It can be shown that r_n[\tau] is equal to periodically repeated replicas of r_h[\tau],

  r_n[\tau] \approx \sum_{k} r_h[\tau - kP]

but with decreasing amplitude due to the windowing (Exercise 5.7).

Page 41: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Limitations of the linear prediction model

The autocorrelation function r_n[\tau] of the windowed signal s[n] can be thought of as an "aliased" version of r_h[\tau]; the overlap of replicas introduces distortion:
1. When the aliasing is minor, the two solutions are approximately equal.
2. The accuracy of this approximation decreases as the pitch period decreases (e.g., high pitch), due to increased overlap of the autocorrelation replicas repeated every P samples.

Page 42: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Limitations of the linear prediction model

Sources of error:
- Aliasing increases with high-pitched speakers (smaller pitch period P).
- The signal is not truly periodic.
- Speech is not always all-pole.
- The autocorrelation method is a suboptimal solution; the covariance method is capable of giving the optimal solution but is not guaranteed to converge when the underlying signal does not follow an all-pole model.

Page 43: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

The Levinson Recursion of the Autocorrelation Method

Direct inversion (Gaussian elimination),

  \alpha = R_n^{-1} r_n

requires on the order of p^3 multiplies and additions. The Levinson recursion (1947):
- Requires on the order of p^2 multiplies and additions.
- Links directly to the concatenated lossless tube model (Chapter 4), and thus provides a mechanism for estimating the vocal tract area function from an all-pole-model estimate.

Page 44: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

The Levinson Recursion of the Autocorrelation Method

Step 1 (initialization):

  E^{(0)} = r_n[0], \qquad \alpha_0^{(0)} = 0

for i = 1, 2, \ldots, p:

Step 2:

  k_i = \frac{ r_n[i] - \sum_{j=1}^{i-1} \alpha_j^{(i-1)} r_n[i-j] }{ E^{(i-1)} }

Step 3:

  \alpha_i^{(i)} = k_i, \qquad
  \alpha_j^{(i)} = \alpha_j^{(i-1)} - k_i\, \alpha_{i-j}^{(i-1)}, \quad 1 \le j \le i-1

Step 4:

  E^{(i)} = (1 - k_i^2)\, E^{(i-1)}

end

Finally, \alpha_j^{*} = \alpha_j^{(p)} for 1 \le j \le p. The k_i are the partial correlation (PARCOR) coefficients.
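A compact sketch of these four steps in code (input/output conventions are assumptions for illustration):

import numpy as np

def levinson_durbin(r):
    """Levinson recursion for the autocorrelation normal equations.
    r: autocorrelation values [r[0], r[1], ..., r[p]].
    Returns (alpha, k, E): predictor coefficients, PARCOR coefficients,
    and the final minimum prediction error E^(p)."""
    p = len(r) - 1
    alpha = np.zeros(p)
    k = np.zeros(p)
    E = r[0]
    for i in range(1, p + 1):
        acc = r[i] - np.dot(alpha[:i - 1], r[i - 1:0:-1])   # r[i] - sum_j alpha_j r[i-j]
        k[i - 1] = acc / E
        new_alpha = alpha.copy()
        new_alpha[i - 1] = k[i - 1]
        new_alpha[:i - 1] = alpha[:i - 1] - k[i - 1] * alpha[i - 2::-1]
        alpha = new_alpha
        E = (1.0 - k[i - 1] ** 2) * E
    return alpha, k, E

# Usage (illustrative): alpha, k, E = levinson_durbin(short_time_autocorr(frame, p))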

Page 45: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

The Levinson Recursion of the Autocorrelation Method

It can be shown that on each iteration the predictor coefficients \alpha_k can be written solely as functions of the autocorrelation coefficients (Exercise 5.11).

The desired transfer function is given by

  H(z) = \frac{A}{1 - \sum_{k=1}^{p} \alpha_k^{*} z^{-k}}

where the gain A has yet to be determined.

Page 46: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Properties of the Levinson Recursion of the Autocorrelation Method

1. The magnitude of the partial correlation coefficients is less than 1: |k_i| < 1 for all i.
2. The condition under 1 is sufficient for stability: if all |k_i| < 1, then all roots of A(z) are inside the unit circle.
3. The autocorrelation method gives a minimum-phase solution even when the actual system is mixed-phase.

Page 47: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Example 5.4

Consider the discrete-time model of the complete transfer function from the glottis to the lips derived in Chapter 4 (Equation 4.40), but without zero contributions from the radiation and vocal tract:

  H(z) = \frac{A}{(1 - \beta z)^2 \prod_{k=1}^{C_i} (1 - c_k z^{-1})(1 - c_k^{*} z^{-1})}.

Suppose we measure a single impulse response h[n], equal to the inverse z-transform of H(z), and estimate the model with the autocorrelation method, setting the number of poles of \hat{H}(z) correctly, p = 2 + 2C_i, with the prediction error defined over the entire duration of h[n]. This yields a solution of the form

  \hat{H}(z) = \frac{A}{(1 - \beta z^{-1})^2 \prod_{k=1}^{C_i} (1 - c_k z^{-1})(1 - c_k^{*} z^{-1})}

in which the maximum-phase glottal poles are replaced by their minimum-phase counterparts.

Page 48: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models


Experimentation Results

Page 49: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Properties of the Levinson Recursion of the Autocorrelation Method

Formal explanation: suppose s[n] follows an all-pole model and the prediction error function is defined over all time (i.e., no window truncation effects). Write

  S(\omega) = M_s^{\min}(\omega)\, e^{j\phi_s^{\min}(\omega)} \; M_s^{\max}(\omega)\, e^{j\phi_s^{\max}(\omega)}

where \phi_s^{\min}(\omega) and \phi_s^{\max}(\omega) are the Fourier transform phase functions for the minimum- and maximum-phase contributions of S(\omega), respectively.

The autocorrelation solution can then be expressed as (Exercise 5.14)

  \hat{S}(\omega) = M_s^{\min}(\omega)\, M_s^{\max}(\omega)\, e^{j[\phi_s^{\min}(\omega) - \phi_s^{\max}(\omega)]}
                 = S(\omega)\, e^{-j 2 \phi_s^{\max}(\omega)}.

Page 50: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Properties of the Levinson Recursion of the Autocorrelation Method

Exercise 5.14, rationalization of the result: write

  H(\omega) = M_h^{v}(\omega)\, e^{j\phi_v(\omega)} \; M_h^{g}(\omega)\, e^{j\phi_g(\omega)}

where M_h^{v}(\omega) e^{j\phi_v(\omega)} is the minimum-phase contribution due to the vocal tract poles inside the unit circle, and M_h^{g}(\omega) e^{j\phi_g(\omega)} is the maximum-phase contribution due to the glottal poles outside the unit circle. The resulting estimated frequency response can be expressed as

  \hat{H}(\omega) = M_h^{v}(\omega)\, M_h^{g}(\omega)\, e^{j[\phi_v(\omega) - \phi_g(\omega)]}.

The phase distortion of the synthesized speech can have a perceptual consequence: a gradual onset of the glottal flow, and thus of the speech waveform during the open phase of the glottal cycle, is transformed into a "sharp attack", consistent with the energy-concentration property of minimum-phase sequences (Chapter 2).

Page 51: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Properties of the Levinson Recursion of the Autocorrelation Method

4. Reverse Levinson recursion: how to obtain a lower-order model from a higher-order one?

  \alpha_j^{(i-1)} = \frac{ \alpha_j^{(i)} + k_i\, \alpha_{i-j}^{(i)} }{ 1 - k_i^2 }, \quad j = 1, 2, \ldots, i-1.

5. Autocorrelation matching: let r_n[\tau] be the autocorrelation of the windowed speech signal s[n+m]w[m] and r_h[\tau] the autocorrelation of h[n] = \mathcal{Z}^{-1}\{H(z)\}; then

  r_n[\tau] = r_h[\tau] \quad \text{for } |\tau| \le p.

Page 52: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Autocorrelation Method

Gain computation:

  A^2 = E_n = r_n[0] - \sum_{k=1}^{p} \alpha_k r_n[k]

where E_n is the average minimum prediction error for the pth-order predictor. If the energy in the all-pole impulse response h[m] is required to equal the energy in the measurement s_n[m], then the squared gain equals the minimum prediction error.
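A small sketch of this gain computation; it reuses the levinson_durbin helper sketched earlier, which is an assumption of this example rather than a routine from the slides:

import numpy as np

def lpc_gain(r, alpha):
    """A = sqrt(r[0] - sum_k alpha[k-1]*r[k])."""
    return np.sqrt(r[0] - np.dot(alpha, r[1:len(alpha) + 1]))

# Equivalently: alpha, k, E = levinson_durbin(r); A = np.sqrt(E)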

Page 53: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Autocorrelation Method

Relationship to the lossless tube model: recall that for the lossless concatenated tube model, with glottal impedance Z_g(z) = \infty (open circuit), the transfer function is

  V(z) = \frac{A}{D(z)}, \quad \text{where } D(z) = D_N(z)

is obtained recursively from

  D_0(z) = 1, \qquad D_k(z) = D_{k-1}(z) + r_k z^{-k} D_{k-1}(z^{-1}), \quad k = 1, 2, \ldots, N.

N is the number of tubes, and the reflection coefficient r_k is a function of the cross-sectional areas of successive tubes, i.e.,

  r_k = \frac{A_{k+1} - A_k}{A_{k+1} + A_k}.

Page 54: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Relationship to Lossless Tube Model:

Levinson recursion: with

  H(z) = \frac{A}{A(z)}, \quad \text{where } A(z) = A_p(z) = 1 - \sum_{k=1}^{p} \alpha_k z^{-k}

the recursion can be written in the z-domain (see Appendix 5.B) as

  A_i(z) = A_{i-1}(z) - k_i z^{-i} A_{i-1}(z^{-1}), \quad i = 1, 2, \ldots, p.

The starting condition is obtained by mapping \alpha_0^{(0)} = 0 to

  A_0(z) = 1 - \sum_{k=1}^{0} \alpha_k^{(0)} z^{-k} = 1.

The two recursions are identical when r_i = -k_i, which then makes D_i(z) = A_i(z).
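A small sketch of how the PARCOR coefficients map to a relative area function via r_i = -k_i and the reflection-coefficient formula above; the reference area A1 is an arbitrary assumption:

import numpy as np

def areas_from_parcor(k, A1=1.0):
    """Relative vocal tract areas from PARCOR coefficients:
    r_i = -k_i and r_i = (A_{i+1}-A_i)/(A_{i+1}+A_i)  =>  A_{i+1} = A_i*(1+r_i)/(1-r_i)."""
    r = -np.asarray(k)
    areas = [A1]
    for ri in r:
        areas.append(areas[-1] * (1.0 + ri) / (1.0 - ri))
    return np.array(areas)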

Page 55: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Relationship to Lossless Tube Model:

Since the boundary condition was not included in the lossless tube model, V(z) represents the ratio between ideal volume velocities at the glottis and at the lips:

  V(z) = \frac{U_L(z)}{U_g(z)}.

The speech pressure measurement at the lips, however, has embedded within it the glottal shape G(z) as well as the radiation at the lips R(z). Recall that for the voiced case (with no vocal tract zeros)

  H(z) = A\,G(z)\,V(z)\,R(z) = \frac{A\,(1 - z^{-1})}{(1 - \beta z)^2 \prod_{k=1}^{C_i} (1 - c_k z^{-1})(1 - c_k^{*} z^{-1})}.

The presence of the glottal shape G(z) thus introduces poles that are not part of the vocal tract. The net effect of the glottal shape is typically a 6 dB/octave fall-off (see slide 94 of the presentation Acoustic of Speech Production) added to the spectral tilt of V(z). The influence of the glottal flow shape and radiation load can be approximately removed with a pre-emphasis of 6 dB/octave spectral rise.
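One common way to realize such a 6 dB/octave pre-emphasis is a first-order difference filter; the coefficient value below is a typical choice, not one specified on the slides:

import numpy as np

def preemphasize(s, mu=0.97):
    """First-order pre-emphasis y[n] = s[n] - mu*s[n-1] (~ +6 dB/octave spectral rise)."""
    return np.concatenate(([s[0]], s[1:] - mu * s[:-1]))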

Page 56: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Example 5.5

The following figure shows two examples of good matches to measured vocal tract area functions, for the vowels /a/ and /i/, derived from estimates of the partial correlation coefficients.

Page 57: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Frequency Domain Interpretation

Consider an all-pole model of speech production:

  H(z) = \frac{A}{A(z)}, \qquad H(\omega) = \frac{A}{A(\omega)}

where A(\omega) is given by

  A(\omega) = 1 - \sum_{k=1}^{p} \alpha_k e^{-j\omega k}.

Define Q(\omega) as the difference of the log-magnitudes of the measured and modeled spectra:

  Q(\omega) = \log |S(\omega)|^2 - \log |H(\omega)|^2.

Recall that E(z) = A(z) S(z), so

  S(\omega) = \frac{E(\omega)}{A(\omega)} = \frac{E(\omega)\, H(\omega)}{A}.

Page 58: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Frequency Domain Interpretation

Thus we can write Q(\omega) as

  Q(\omega) = \log \left| \frac{E(\omega)}{A} \right|^2.

Hence, as e[n] is minimized, |E(\omega)| is minimized, which in turn means Q(\omega) is minimized, i.e., the spectral difference between the actual measured speech spectrum and the modeled spectrum is minimized.
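A sketch of this frequency-domain view, comparing the short-time spectrum |S(omega)| with the all-pole envelope A/|A(omega)|; it assumes the levinson_durbin helper sketched earlier, and the FFT length is an arbitrary choice:

import numpy as np

def lp_envelope(frame, p, nfft=512):
    """Return (|S(w)|, A/|A(w)|) for a windowed frame and model order p."""
    r = np.array([np.dot(frame[:len(frame) - t], frame[t:]) for t in range(p + 1)])
    alpha, _, E = levinson_durbin(r)               # helper sketched earlier
    a_poly = np.concatenate(([1.0], -alpha))       # A(z) = 1 - sum alpha_k z^-k
    S_mag = np.abs(np.fft.rfft(frame, nfft))
    H_mag = np.sqrt(E) / np.abs(np.fft.rfft(a_poly, nfft))
    return S_mag, H_mag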

Page 59: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Linear Prediction Analysis of Stochastic Speech Sounds

Linear prediction analysis was motivated by the observation that, for a single impulse or a periodic impulse train input to an all-pole vocal tract model, the prediction error is zero "most of the time".

Such analysis appears not to be applicable to speech sounds with fricative or aspirated sources, which are modeled as a stochastic (random) process. However, the autocorrelation method of linear prediction can be formulated for the stochastic case, where a white-noise input takes on the role of the single impulse.

The solution to the corresponding stochastic optimization problem, analogous to the minimization of the mean-squared error function E_n, leads to normal equations that are the stochastic counterparts of our earlier solution. The derivation and interpretation of this stochastic optimization problem is left as an exercise.

Page 60: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

How well does linear prediction describe the speech signal in time and in frequency?

Time domain. Suppose:
- The underlying speech model is an all-pole model of order p, and
- The autocorrelation method is used in the estimation of the coefficients of the predictor polynomial P(z).

If the predictor coefficients are estimated exactly, then the prediction error is:
- A perfect impulse train for voiced speech,
- A single impulse for a plosive,
- White noise for noisy (stochastic) speech.

(Criterion of "goodness": the speech measurement s[n] is passed through the inverse filter A(z) = 1 - P(z) to produce the prediction error e[n].)
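A minimal inverse-filtering sketch of this criterion (convolving with the A(z) coefficients acts as the prediction-error filter; variable names are illustrative):

import numpy as np

def prediction_error(s, alpha):
    """e[n] = s[n] - sum_k alpha[k-1]*s[n-k], i.e. s filtered by A(z) = 1 - P(z)."""
    a_poly = np.concatenate(([1.0], -np.asarray(alpha)))
    return np.convolve(s, a_poly)[:len(s)]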

Page 61: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Time Domain

The autocorrelation method of linear prediction analysis does not yield such idealized outputs when the measurement s[n] is inverse filtered by the estimated system function A(z) (a limitation of the method):
- Even when the vocal tract response follows an all-pole model, the true solution cannot be obtained; the obtained solution approaches the true solution only in the limit of an infinite amount of data.
- In a typical waveform segment, the actual vocal tract impulse response is not all-pole, for a variety of reasons:
  - Presence of zeros due to the radiation load, nasalization, and the back vocal cavity during frication and plosives.
  - The glottal flow shape, even when adequately modeled, is not minimum phase (see Example 5.6).

Page 62: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Prediction Error Residuals

- Autocorrelation method of linear prediction of order 14.
- Estimation performed over 20 ms Hamming-windowed speech segments.

Page 63: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Prediction Error Residuals

Reconstructing the residual from an entire utterance, what one typically hears in the prediction error is not a noisy buzz, as expected from the idealized residual, but rather roughly the speech itself. This means that some of the vocal tract spectrum is passing through the inverse filter.

Page 64: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Frequency Domain

The behavior of linear prediction analysis can alternatively be studied in the frequency domain: how well does the spectrum derived from linear prediction analysis match the spectrum of a sequence that follows (a) an all-pole model and (b) not an all-pole model?

Page 65: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Frequency Domain - Voiced Speech

Recall that for voiced speech, s[n] is driven by the glottal impulse train

  u_g[n] = \sum_{k} \delta[n - kP]

with Fourier transform U_g(\omega), through a vocal tract impulse response with all-pole frequency response H(\omega). The windowed speech s_n[m] is

  s_n[m] = s[n+m]\, w[m]

and the Fourier transform of the windowed speech s_n[m] is

  S_n(\omega) = \frac{1}{P} \sum_{k} H(k\omega_0)\, W(\omega - k\omega_0)

where W(\omega) is the window transform and \omega_0 = 2\pi/P is the fundamental frequency.

Page 66: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Frequency Domain - Unvoiced Speech

Recall that for unvoiced speech (stochastic sounds)

  |S_{N_w}(\omega)|^2 = |H(\omega)|^2\, |U_{N_w}(\omega)|^2

i.e., the spectral envelope times the periodogram of the noise. Linear prediction analysis attempts to estimate |H(\omega)|, the spectral envelope of the harmonic spectrum S(\omega).

Page 67: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models


Schematics of Spectra for Periodic and Stochastic Speech Sounds

Page 68: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Properties:

1. For large p, |H(\omega)| matches the Fourier transform magnitude of the windowed signal, |S(\omega)|.

Page 69: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Properties:

2. Spectral peaks are better matched than spectral valleys.

Page 70: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models


Properties:

Page 71: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Synthesis Based on All-pole Modeling

We are now able to synthesize the waveform from the model parameters estimated using linear prediction analysis: the excitation A\,u[n] is passed through the all-pole filter H(z) = 1/A(z), with A(z) = 1 - P(z).

Synthesized signal:

  s[n] = \sum_{k=1}^{p} \alpha_k s[n-k] + A\,u[n].

Page 72: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Synthesis Based on All-pole Modeling

Important parameters to consider:

- Window duration: 20-30 ms gives a satisfactory time-frequency tradeoff (Exercise 5.20). The duration can be adaptively varied to account for different time-frequency resolution requirements based on pitch, voicing state, and phoneme class.
- Frame interval: a typical rate at which to perform the analysis is 10 ms.
- Model order: there are three components to be considered:
  1. Vocal tract: on average a "resonance density" of one resonance per 1000 Hz; the order of the system is #poles = 2 x #resonances (e.g., for a 5000 Hz bandwidth signal, 2 x 5 = 10 poles).
  2. Glottal flow: a 2-pole maximum-phase model.
  3. Radiation at the lips: 1 zero inside the unit circle, for which 4 poles provide an adequate representation.
  This gives a total of 16 poles.

Remark: the magnitude of the speech frequency response is preserved; the phase response is not preserved.

Page 73: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Synthesis Based on All-pole Modeling

Voiced/unvoiced state and pitch estimation:
- Currently no discrimination is made between, for example, plosive and fricative unvoiced speech sound categories.
- Pitch is estimated during voiced regions of speech only. However, pitch estimation algorithms typically estimate pitch as well as perform voiced/unvoiced classification.
- A degree of voicing may be desired in more complex analysis and synthesis methods, where voicing and turbulence occur simultaneously: voiced fricatives, breathy vowels.

Page 74: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models

Synthesis Based on All-pole Modeling

Synthesis structures (see the sketch below):
- Determine and generate the excitation for each frame:
  - an impulse train during voiced frames (spacing determined by the time-varying pitch contour),
  - white noise during unvoiced frames.
- Compute the gain, either directly by measuring the frame energy or using the autocorrelation method:
  - Voiced speech: the magnitude of the impulse is the square root of the signal energy.
  - Unvoiced speech: the noise variance equals the signal variance.
- Update the filter values on each frame.
- Overlap and add the signal at consecutive frames.
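A compact sketch of such a synthesizer, assuming per-frame parameters (alpha, gain, voiced flag, pitch period) have already been estimated; the frame dictionary layout is an assumption, and scipy.signal.lfilter realizes the all-pole filter 1/A(z):

import numpy as np
from scipy.signal import lfilter

def synthesize(frames, frame_len, hop):
    """Overlap-add LPC synthesis. Each frame dict holds 'alpha', 'gain',
    'voiced', and 'pitch' (period in samples); all names are illustrative."""
    out = np.zeros(hop * len(frames) + frame_len)
    win = np.hanning(frame_len)
    for i, f in enumerate(frames):
        if f['voiced']:
            exc = np.zeros(frame_len)
            exc[::f['pitch']] = 1.0            # impulse train at the pitch period
        else:
            exc = np.random.randn(frame_len)   # white-noise excitation
        a_poly = np.concatenate(([1.0], -np.asarray(f['alpha'])))
        frame = f['gain'] * lfilter([1.0], a_poly, exc)   # 1/A(z) all-pole filter
        out[i * hop: i * hop + frame_len] += win * frame  # overlap-add
    return out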

Page 75: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models


Synthesis structures

Page 76: Speech Processing Analysis and Synthesis of Pole-Zero Speech Models


Alternate Synthesis Structures