22
Digital Speech and Audio Processing Spring 2008 - 2 E. Nemer 1. Autocorrelation Function Deterministic Signal Voiced speech Random Signal Unvoiced Speech 2. Average Magnitude Difference Function 3. Windowing issues. 4. Pitch Estimation

Digital Speech and Audio Processing E. Nemer Spring 2008€¦ · Digital Speech and Audio Processing E. Nemer Spring 2008 -3 Autocorrelation Function • The autocorrelation function

  • Upload
    others

  • View
    6

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Digital Speech and Audio Processing E. Nemer Spring 2008€¦ · Digital Speech and Audio Processing E. Nemer Spring 2008 -3 Autocorrelation Function • The autocorrelation function

Digital Speech and Audio Processing Spring 2008 - 1E. Nemer

Digital Speech and Audio Processing Spring 2008 - 2E. Nemer

1. Autocorrelation Function

• Deterministic Signal

• Voiced speech

• Random Signal

• Unvoiced Speech

2. Average Magnitude Difference Function

3. Windowing issues.

4. Pitch Estimation

Page 2: Digital Speech and Audio Processing E. Nemer Spring 2008€¦ · Digital Speech and Audio Processing E. Nemer Spring 2008 -3 Autocorrelation Function • The autocorrelation function

Digital Speech and Audio Processing Spring 2008 - 3E. Nemer

Autocorrelation Function

• The autocorrelation function of a deterministic signal x(n), is defined by:

∑∞

−∞=

+=m

kmxmxk )()()(φ

∑−−=

=

+=1

0)()()(

kNn

nkmxmxkφ

• In practice, we would window the signal so that we can compute the above sum . Assume a rectangular window of length N, then the simple form of the autocorrelation is :

Digital Speech and Audio Processing Spring 2008 - 4E. Nemer

0 1 2 3 4 5 6 7 8 9 1 0-1

-0 .5

0

0 .5

1

0 1 2 3 4 5 6 7 8 9 1 0-1

-0 .5

0

0 .5

1

0 1 2 3 4 5 6 7 8 9 1 0-1

-0 .5

0

0 .5

1

Original x(m)

Shifted versions x(m-k)

• Consider the simple case of a sinusoidal signal :

⎟⎠⎞

⎜⎝⎛ +⋅= θπ m

FsfAmx 02cos)(

0fFsPperiod =

Autocorrelation Function

Page 3: Digital Speech and Audio Processing E. Nemer Spring 2008€¦ · Digital Speech and Audio Processing E. Nemer Spring 2008 -3 Autocorrelation Function • The autocorrelation function

Digital Speech and Audio Processing Spring 2008 - 5E. Nemer

- 2 5 - 2 0 - 1 5 - 1 0 - 5 0 5 1 0 1 5 2 0 2 5- 6 0

- 4 0

- 2 0

0

2 0

4 0

6 0

0 1 2 3 4 5 6 7 8 9 1 0- 1

- 0 . 5

0

0 . 5

1

0 1 2 3 4 5 6 7 8 9 1 0- 1

- 0 . 5

0

0 . 5

1

Autocorrelation Function

∑∞

−∞=

+=m

kmxmxk )()()(φ

Multiply-addGet one point

Digital Speech and Audio Processing Spring 2008 - 6E. Nemer

)()( Pkk +=φφ for periodic functions (period=P)

)()( kk −=φφ symetry

kk ∀≤ ),0()( φφφ(0) is the energy of x(n)

follows Cauchy’s inequality

Properties of autocorrelation function

Autocorrelation Function

- 2 5 - 2 0 - 1 5 - 1 0 - 5 0 5 1 0 1 5 2 0 2 5- 6 0

- 4 0

- 2 0

0

2 0

4 0

6 0

Page 4: Digital Speech and Audio Processing E. Nemer Spring 2008€¦ · Digital Speech and Audio Processing E. Nemer Spring 2008 -3 Autocorrelation Function • The autocorrelation function

Digital Speech and Audio Processing Spring 2008 - 7E. Nemer

0 50 100 150-5

0

5

10x 108

0 50 100 150 200 250 300 350-4000

-3000

-2000

-1000

0

1000

2000

3000

4000

Periodic peaks

Rn(k)

Example for Voiced Signal

Digital Speech and Audio Processing Spring 2008 - 8E. Nemer

)( )(

Density SpectralPower Function ation Autocorrel

fSR XX τ F

)( )( fSR XX τ 1−F

∫∞

∞−

−= ττ τπ deRfS fjXX

2)()(

∫∞

∞−

= dfefSR fjXX

τπτ 2)()(

[ ] )0()( 2XRtXEP == [ ] ∫

∞−

== dffStXEP X )()( 2

Power Spectral Density

Page 5: Digital Speech and Audio Processing E. Nemer Spring 2008€¦ · Digital Speech and Audio Processing E. Nemer Spring 2008 -3 Autocorrelation Function • The autocorrelation function

Digital Speech and Audio Processing Spring 2008 - 9E. Nemer

• A more reasonable definition of the auto-correlation is expressed using the pair of windowed signals :

∑∞

−∞=

−−+−=m

n mknwkmxmnwmxkR )()()()()(

window for x(m) window for x(m+k)

∑∞

−∞=

−+−−=−=m

nn mknwkmxmnwmxkRkR )()()()()()(

Easy to show

Autocorrelation Function

Digital Speech and Audio Processing Spring 2008 - 10E. Nemer

• Another way to view the windowed autocorrelation is by considering the product of the speech signal with its delayed version, passed thru a filter whose response is :

[ ]∑∞

−∞=

−⋅−=m

kn mnhkmxmxkR )()()()(

)()()( knwnwnhk +=

Delay k

)(nhk)(nx

)( knx −)(kRn

Autocorrelation Function

[ ] [ ]∑∞

−∞=

−+−⋅−=m

n mknwmnwkmxmxkR )()()()()(

Page 6: Digital Speech and Audio Processing E. Nemer Spring 2008€¦ · Digital Speech and Audio Processing E. Nemer Spring 2008 -3 Autocorrelation Function • The autocorrelation function

Digital Speech and Audio Processing Spring 2008 - 11E. Nemer

∑−

+−=

−−+−=kn

Nnmn mknwkmxmnwmxkR

1

)()()()()(

201 n = 300

50,,0 L=k

1+− Nn nm

)( mnw −

Autocorrelation Function

e.g. N = 100n = 300

Digital Speech and Audio Processing Spring 2008 - 12E. Nemer

∑−

=

−−+−=k

mn kmnwkmxmnwmxkR

300

201

)()()()()(

)0()300()1()299(.......)98()202()99()201()1(

wxwxwxwxRn +=

Autocorrelation Function

)0()300()50()250(.......)98()202()99()201()50(

wxwxwxwxRn +=

Total number of products in the sum : N – k

Page 7: Digital Speech and Audio Processing E. Nemer Spring 2008€¦ · Digital Speech and Audio Processing E. Nemer Spring 2008 -3 Autocorrelation Function • The autocorrelation function

Digital Speech and Audio Processing Spring 2008 - 13E. Nemer

∑−−=

=

+−

=1

0)()(1)(

kNm

mun kmxmx

kNkR

∑−−=

=

+=1

0)()(1)(

kNm

mb kmxmx

NkR

)(1)( kRNkkR unbau ⎟⎠⎞

⎜⎝⎛ −=

unbiased

biased

Biased for finite N, but Asymptotically unbiased for

∞→N

Autocorrelation Function

Recall the simple definition of the autocorrelation (not windowed)

Digital Speech and Audio Processing Spring 2008 - 14E. Nemer

Autocorrelation voiced speech

00

1T

F =

0T

0TFemale

Page 8: Digital Speech and Audio Processing E. Nemer Spring 2008€¦ · Digital Speech and Audio Processing E. Nemer Spring 2008 -3 Autocorrelation Function • The autocorrelation function

Digital Speech and Audio Processing Spring 2008 - 15E. Nemer

0T

0T

00 3

13T

F =

Autocorrelation voiced speech Male

Digital Speech and Audio Processing Spring 2008 - 16E. Nemer

Sixteen.wav filtered at 600Hz

sixteen.wav

Page 9: Digital Speech and Audio Processing E. Nemer Spring 2008€¦ · Digital Speech and Audio Processing E. Nemer Spring 2008 -3 Autocorrelation Function • The autocorrelation function

Digital Speech and Audio Processing Spring 2008 - 17E. Nemer

An ensemble of sample functions:

The case of a random process

Digital Speech and Audio Processing Spring 2008 - 18E. Nemer

When X(t) is a stationary random process , then :

The mean of X(t) is

fX(t)(x) : the probability density function (pdf)The autocorrelation function of X(t) is

[ ])()( tXEtX =μ

xxxf tX d )(

)(∫∞

∞−=

[ ]

)(

),(

),(

)()( )(R

12

-

- 2121)((0)21

-

- 2121)()(21

2121

12

21

ttR

dxdxxxfxx

dxdxxxfxx

tXtXE,tt

X

ttXX

tXtX

X

−=

=

=

=

∫ ∫∫ ∫∞

∞ −

Xμ=

The case of a random process

Page 10: Digital Speech and Audio Processing E. Nemer Spring 2008€¦ · Digital Speech and Audio Processing E. Nemer Spring 2008 -3 Autocorrelation Function • The autocorrelation function

Digital Speech and Audio Processing Spring 2008 - 19E. Nemer

For convenience of notation , we redefine

1. The mean-square value

2. The autocorrelation is symetrical :

3. The largest value is at lag 0

[ ] allfor , )()()( tτtXtXERX −=τ

[ ] 0 τ, )((0) 2 == tXERX

τ)()( −= RRX τ

(0))( XX RR ≤τ

The case of a random process

Digital Speech and Audio Processing Spring 2008 - 20E. Nemer

The RX(τ) provides the interdependence information of two random variables obtained from X(t) at timesτ seconds apart

The case of a random process

Page 11: Digital Speech and Audio Processing E. Nemer Spring 2008€¦ · Digital Speech and Audio Processing E. Nemer Spring 2008 -3 Autocorrelation Function • The autocorrelation function

Digital Speech and Audio Processing Spring 2008 - 21E. Nemer

),(~)42cos()( πππ −ΘΘ+= UttX

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1-1

0

1Realizations of Sinusoid of cont. random phase

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1-1

0

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1-1

0

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1-1

0

1

⎪⎩

⎪⎨⎧ ≤≤−=Θ

elsewhere,0

,21

)( πθππf θ

Θf

θπ− π

Random process : sinusoid with random phase

Digital Speech and Audio Processing Spring 2008 - 22E. Nemer

Θ)2cos()( += tπfAtX c

[ ] )2cos(2

)()()(2

tπfAtXτtXER cX =+=τ

Random process : sinusoid with random phase

Page 12: Digital Speech and Audio Processing E. Nemer Spring 2008€¦ · Digital Speech and Audio Processing E. Nemer Spring 2008 -3 Autocorrelation Function • The autocorrelation function

Digital Speech and Audio Processing Spring 2008 - 23E. Nemer

N=1000;X=randn(1,N);

m=50;Rx=Rx_est(X,m);

plot(X)plot([-m:m],Rx)

0 100 200 300 400 500 600 700 800 900 1000-10

-8

-6

-4

-2

0

2

4

6

8

10Gaussian Random Process

Random process : white noise

Digital Speech and Audio Processing Spring 2008 - 24E. Nemer

Power Spectral Density

0 100 200 300 400 500 600 700 800 900 1000-10

-8

-6

-4

-2

0

2

4

6

8

10Gaussian Random Process

for j=1:100,Rx=Rx_est(X,m);Sx=fftshift(abs(fft(Rx)));

end;

Sx_av=Sx_av+Sx;

Sx_av=Sx_av/100;

-50 -40 -30 -20 -10 0 10 20 30 40 50-0.2

0

0.2

0.4

0.6

0.8

1

1.2Autocorrelation function

-0.5 -0.4 -0.3 -0.2 -0.1 0 0.1 0.2 0.3 0.4 0.50

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2

f

Power Spectral Desity

Page 13: Digital Speech and Audio Processing E. Nemer Spring 2008€¦ · Digital Speech and Audio Processing E. Nemer Spring 2008 -3 Autocorrelation Function • The autocorrelation function

Digital Speech and Audio Processing Spring 2008 - 25E. Nemer

Autocorrelation – unvoiced speech

Digital Speech and Audio Processing Spring 2008 - 26E. Nemer

Average Magnitude Difference Function

• Instead of taking average over magnitude, we do it over magnitude difference

• Operator T: T(x) is the concatenation of a linear filter H(z)=1-z-p

and taking absolute value |x|• It will be used in a computationally efficient solution to pitch

estimation

∑+−=

−−=Δn

Nnmn pmxmxpM

1

|)()(|)(

Page 14: Digital Speech and Audio Processing E. Nemer Spring 2008€¦ · Digital Speech and Audio Processing E. Nemer Spring 2008 -3 Autocorrelation Function • The autocorrelation function

Digital Speech and Audio Processing Spring 2008 - 27E. Nemer

AMDF vs. autocorrelation

0 50 100 1500

1

2

3

4

5

6

7x 105

0 50 100 150 200 250 300 350-4000

-3000

-2000

-1000

0

1000

2000

3000

4000

0 5 0 1 0 0 1 5 0-5

0

5

1 0x 1 0 8

Digital Speech and Audio Processing Spring 2008 - 28E. Nemer

Summary

• Short-time energy

• Short-time amplitude

• Short-time average zero-cross (ZC) rate

• Short-time amplitude difference

• Short-time auto-correlation

∑+−=

=n

Nnmn mxE

1

2 )(

∑+−=

=n

Nnmn mxM

1

|)(|

∑+−=

−−=Δn

Nnmn pmxmxpM

1

|)()(|)(

∑∞

−∞=

−−+−=m

n mknwkmxmnwmxkR )()()()()(

∑+−=

−−=n

Nnmn mxmxZ

1|)]1(sgn[)](sgn[|

Page 15: Digital Speech and Audio Processing E. Nemer Spring 2008€¦ · Digital Speech and Audio Processing E. Nemer Spring 2008 -3 Autocorrelation Function • The autocorrelation function

Digital Speech and Audio Processing Spring 2008 - 29E. Nemer

Window Effect

• Obvious advantage: localizing a part of speech for analysis.

• Less obvious disadvantage: how will the windowing affect the analysis results?– Compare the localization property of two extreme cases: N=1 vs. N=∞

» N=1: best in time but worst in frequency» N= ∞: best in frequency but worst in time

• Fundamentally speaking, windowing is the pursuit of a better tradeoff between time and frequency localization

Why do we window the signal :

Digital Speech and Audio Processing Spring 2008 - 30E. Nemer

Window Parameters

• Window length– Not too short: too few samples are not enough to resolve the uncertainty – Not too long: too many samples would introduce more uncertainty

• Window shape– Rectangular window is conceptually the simplest, but suffers from spectral

leakage problem

• It is always a tradeoff: there is no universally optimal window; which window is more effective depends on specific applications

Page 16: Digital Speech and Audio Processing E. Nemer Spring 2008€¦ · Digital Speech and Audio Processing E. Nemer Spring 2008 -3 Autocorrelation Function • The autocorrelation function

Digital Speech and Audio Processing Spring 2008 - 31E. Nemer

Rectangular Window

Digital Speech and Audio Processing Spring 2008 - 32E. Nemer

Various Windows

• Hamming/Hanning window – Also known as raised Cosine window

• Blackman window– Based on two cosine terms

• Bartlett window– Also known as triangular window

• Kaiser-Bessel window– Based on Bessel functions

Page 17: Digital Speech and Audio Processing E. Nemer Spring 2008€¦ · Digital Speech and Audio Processing E. Nemer Spring 2008 -3 Autocorrelation Function • The autocorrelation function

Digital Speech and Audio Processing Spring 2008 - 33E. Nemer

Hanning / Hamming Windows

)2cos(*5.05.0)(NnnW π

+=N-point Hann window

)2cos(*46.054.0)(NnnW π

+=N-point Hamming window

)2cos(*)1()(NnaanW π

−+=N-point raised cosine window

Digital Speech and Audio Processing Spring 2008 - 34E. Nemer

Hamming

Page 18: Digital Speech and Audio Processing E. Nemer Spring 2008€¦ · Digital Speech and Audio Processing E. Nemer Spring 2008 -3 Autocorrelation Function • The autocorrelation function

Digital Speech and Audio Processing Spring 2008 - 35E. Nemer

Blackman Window

)2cos(08.0)cos(5.042.0)(Nn

NnnW ππ ++=

Digital Speech and Audio Processing Spring 2008 - 36E. Nemer

Barlett Window

NNnnW |2/|1)( −

−=

Page 19: Digital Speech and Audio Processing E. Nemer Spring 2008€¦ · Digital Speech and Audio Processing E. Nemer Spring 2008 -3 Autocorrelation Function • The autocorrelation function

Digital Speech and Audio Processing Spring 2008 - 37E. Nemer

Kaiser-Bessel Window

zero order modified Bessel function of the first kind

B determines the tradeoff between the main lobe and sidelobesand is often specified as a half integer multiple of pi

Digital Speech and Audio Processing Spring 2008 - 38E. Nemer

Kaiser-Bessel Window

Page 20: Digital Speech and Audio Processing E. Nemer Spring 2008€¦ · Digital Speech and Audio Processing E. Nemer Spring 2008 -3 Autocorrelation Function • The autocorrelation function

Digital Speech and Audio Processing Spring 2008 - 39E. Nemer

Pitch Period Estimation

P

PP

Digital Speech and Audio Processing Spring 2008 - 40E. Nemer

• How to estimate the period for sinusoids?

– Fourier Transform (will be covered by the next lectures)

• In the time domain, we might consider

– Autocorrelation function – recall its periodic property

– Average Magnitude Difference Function (AMDF) – essentially follow the same motivation but computationally more efficient

Pitch Period Estimation

Page 21: Digital Speech and Audio Processing E. Nemer Spring 2008€¦ · Digital Speech and Audio Processing E. Nemer Spring 2008 -3 Autocorrelation Function • The autocorrelation function

Digital Speech and Audio Processing Spring 2008 - 41E. Nemer

P1 P2P3

P=Average(P1,P2,…)

Pitch Period Estimation

Digital Speech and Audio Processing Spring 2008 - 42E. Nemer

Pitch Estimation using prediction

We assume the signal is periodic (i.e. repeats itself) , and let L be the estimate of the pitch period.

We then ‘predict’ the current period, from the previous one.

And the prediction error, for the estimate L is then

)()( Lnxcnx −⋅=)

Page 22: Digital Speech and Audio Processing E. Nemer Spring 2008€¦ · Digital Speech and Audio Processing E. Nemer Spring 2008 -3 Autocorrelation Function • The autocorrelation function

Digital Speech and Audio Processing Spring 2008 - 43E. Nemer

)()()()()( Lnxcnxnxnxne −⋅−=−= )

[ ] ( )[ ]22 )()()( LnxcnxEneE −⋅−=

[ ] [ ] [ ] [ ])()(2)()()( 2222 LnxnxEcLnxEcnxEneE −⋅−−+=

[ ] [ ] [ ])()(2)(2)( 22 LnxnxELnxcEneEc

−−−=∂∂

[ ] 0)(2 =∂∂ neEc

[ ][ ] ),(

),0()(

)()(2 LLR

LRLnxELnxnxEc

x

x=−−

=

Pitch Estimation using prediction

Energy Of error

MinimizeEnergy Of error

Digital Speech and Audio Processing Spring 2008 - 44E. Nemer

• For a given range of estimates for the pitch L = L1 to L2

• Compute the autocorrelation at lags 0, L

• Compute the optimum gain c, and the prediction error:

• Choose the value of L (pitch) that has the minimum error.

),(),0(LLRLRc

x

x=

),(),0()0,0(

2

LLRLRRMinErr

x

xx −=

Pitch Estimation using prediction