ZCR Based Identification of Voiced Unvoiced and Silent Parts of Speech Signal in Presence of Background Noise


Presented by

Sivaranjan Goswami, B. Tech. 4th Year

Department of Electronics and Communication Engineering

Don Bosco College of Engineering and Technology

Assam Don Bosco University

Guwahati, Assam (India)

Contact: [email protected]



Outline of Presentation

Introduction and a Brief Overview

Speech Signal

Experimental Details

Proposed Algorithms

Experimental Results

Discussion and Bibliography


Introduction (1 of 2)

The identification of voiced, unvoiced and silent parts of a speech signal is an important step in speech processing.

With a quiet background, it can be achieved easily by estimating the short-time zero-crossing rate and the short-time average magnitude.

In the presence of background noise, however, it is a challenging task.


Introduction (2 of 2)

A simple algorithm is designed, based on the short-time zero-crossing rate (ZCR) and the short-time average magnitude, to identify the voiced, unvoiced and silent frames of speech in a quiet background.

The algorithm is then improved to serve the same purpose in the presence of real background noise.

The second algorithm is found to reduce the errors of the first algorithm by approximately 60%.



A Brief Overview

The first algorithm relies entirely on the short-time zero-crossing rate (ZCR) and the short-time average magnitude to identify the voiced, unvoiced and silent frames of speech.

The modified algorithm processes only background noise for the first 1 second and creates a reference of the background noise.

The noise reference is used to separate voiced or unvoiced samples from samples containing only noise.


Speech Signal


Human Speech Production System



Types of Excitation

Voiced: high amplitude, low frequency (low ZCR), quasi-periodic pulses

Unvoiced: random signal with low amplitude and high ZCR

Mixed Plosive

Whisper

Silent

Only voiced and unvoiced excitations are of interest here.


Experimental Details


Calculation of Zero-Crossing Rate (ZCR)

The ZCR of a signal within a short time interval t is found using the equation:

$$\mathrm{ZCR}_{\mathrm{average}} = \frac{N}{2t} \qquad (1)$$

where N is the number of times the polarity of the signal changes during t.
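Equation (1) can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's implementation: the function name `zero_crossing_rate`, the 8 kHz sampling rate, and the choice to treat zero-valued samples as positive are all assumptions made for the example.

```python
import math

def zero_crossing_rate(samples, sample_rate):
    """Return N / (2 t) for one frame, following equation (1)."""
    # N: number of polarity changes between consecutive samples.
    # Assumption: a sample equal to zero counts as positive.
    signs = [s >= 0 for s in samples]
    n_changes = sum(1 for p, q in zip(signs, signs[1:]) if p != q)
    t = len(samples) / sample_rate  # frame duration in seconds
    return n_changes / (2.0 * t)

# Hypothetical test signal: a 100 Hz sine sampled at 8 kHz for 0.1 s.
fs = 8000
frame = [math.sin(2 * math.pi * 100 * n / fs + 0.1) for n in range(800)]
zcr = zero_crossing_rate(frame, fs)
```

A sine completes two polarity changes per cycle, so dividing N by 2t gives back its frequency in Hz. That is why the next slide can treat the average ZCR as a frequency f.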



Decision of Voiced and Unvoiced Speech

For every time frame, the average ZCR, f, is calculated, and the power, x, corresponding to the frequency f is calculated using the Fourier Transform. The result is then subjected to the threshold conditions given in relations 2 and 3:

$$\text{Unvoiced: } f_N \geq a \ \text{and}\ |x_N| \leq b \qquad (2)$$

$$\text{Voiced: } f_N \leq c \ \text{and}\ |x_N| \geq d \qquad (3)$$

where the subscript N denotes a normalized value and a, b, c, d are user-defined threshold values between 0 and 1.
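The two threshold relations can be sketched as a small decision function. The comparison directions are an assumption taken from the excitation descriptions (voiced: low ZCR and high amplitude; unvoiced: high ZCR and low amplitude), and the threshold values shown are hypothetical, chosen only to make the example concrete.

```python
def classify_frame(f_n, x_n, a, b, c, d):
    """Label one frame from its normalized ZCR f_n and normalized power x_n.

    a, b, c, d are the user-defined thresholds in [0, 1].
    """
    # Voiced relation: low ZCR together with high power.
    if f_n <= c and abs(x_n) >= d:
        return "voiced"
    # Unvoiced relation: high ZCR together with low power.
    if f_n >= a and abs(x_n) <= b:
        return "unvoiced"
    # Neither relation holds: treat the frame as silent.
    return "silent"

# Hypothetical thresholds, for illustration only.
thresholds = dict(a=0.5, b=0.3, c=0.3, d=0.5)
label_low_zcr_loud = classify_frame(0.1, 0.9, **thresholds)
label_high_zcr_soft = classify_frame(0.8, 0.1, **thresholds)
label_neither = classify_frame(0.4, 0.4, **thresholds)
```

With these example thresholds the three calls give a voiced, an unvoiced and a silent label respectively.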


Proposed Algorithms


Algorithm for Quiet Background

1. Start.
2. Calculate the ZCR of a 20 ms frame.
3. Calculate the power using the Fourier Transform.
4. Store the ZCR and power in memory.
5. Are all frames considered? If no, return to step 2 with the next frame.
6. Normalize the ZCR and power of a frame.
7. Apply relations 2 and 3 to decide voiced/unvoiced.
8. Mark the frame as silent if it is neither voiced nor unvoiced.
9. Are all frames considered? If no, return to step 6 with the next frame.
10. Display the result.
11. End.
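The two loops of the quiet-background algorithm can be sketched as follows. This is an illustrative reading under stated assumptions, not the presenter's code: normalization is taken to mean division by the maximum over all frames, the power is taken from a single Fourier component evaluated at the frame's ZCR, the comparison directions follow the voiced and unvoiced descriptions, and all function names, thresholds and test signals are hypothetical.

```python
import cmath
import math

def zcr(frame, fs):
    """Equation (1): polarity changes N divided by 2t."""
    signs = [s >= 0 for s in frame]
    n = sum(1 for p, q in zip(signs, signs[1:]) if p != q)
    return n * fs / (2.0 * len(frame))

def power_at(frame, fs, freq):
    """Power of the Fourier component of the frame at freq Hz."""
    acc = sum(s * cmath.exp(-2j * math.pi * freq * k / fs)
              for k, s in enumerate(frame))
    return abs(acc) ** 2 / len(frame)

def normalize(values):
    """Scale to [0, 1] by the maximum (an assumed normalization)."""
    peak = max(values)
    return [v / peak for v in values] if peak > 0 else [0.0] * len(values)

def label_frames(signal, fs, a, b, c, d, frame_ms=20):
    size = int(fs * frame_ms / 1000)
    frames = [signal[i:i + size]
              for i in range(0, len(signal) - size + 1, size)]
    # First loop: measure and store ZCR and power for every frame.
    rates = [zcr(fr, fs) for fr in frames]
    powers = [power_at(fr, fs, r) for fr, r in zip(frames, rates)]
    # Second loop: normalize, then apply the two threshold relations.
    labels = []
    for f_n, x_n in zip(normalize(rates), normalize(powers)):
        if f_n <= c and x_n >= d:
            labels.append("voiced")
        elif f_n >= a and x_n <= b:
            labels.append("unvoiced")
        else:
            labels.append("silent")
    return labels

# Synthetic stand-ins: a strong 200 Hz tone, a weak 3 kHz tone, silence.
fs = 8000
voiced = [math.sin(2 * math.pi * 200 * k / fs + 0.1) for k in range(160)]
unvoiced = [0.05 * math.sin(2 * math.pi * 3000 * k / fs + 0.1)
            for k in range(160)]
silence = [0.0] * 160
labels = label_frames(voiced + unvoiced + silence, fs,
                      a=0.5, b=0.3, c=0.3, d=0.5)
```

Normalization needs the maximum over the whole recording, which is why the measurements are stored first and the decisions are made in a second pass.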



Assumptions for Background Noise

Algorithm-1 is modified for a noisy background under the following assumptions:

1. The first 1 second of the signal contains only background noise.
2. The frequency of the noise source is different from the vocal tract frequency or ZCR.
3. The human voice has the dominating amplitude, since the mouth is closer to the microphone than the noise source.


ZCR of Voiced Speech is Independent of Noise

As shown in the figure, the ZCR of voiced speech is independent of noise under assumption 3.
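A synthetic experiment gives a feel for this claim. In the sketch below a strong 150 Hz sine stands in for voiced speech and weak Gaussian noise stands in for a distant noise source. Both signals, their amplitudes and the random seed are assumptions chosen for illustration and are not taken from the presented experiments.

```python
import math
import random

def zcr(frame, fs):
    """Equation (1): polarity changes N divided by 2t."""
    signs = [s >= 0 for s in frame]
    n = sum(1 for p, q in zip(signs, signs[1:]) if p != q)
    return n * fs / (2.0 * len(frame))

fs = 8000
rng = random.Random(0)  # fixed seed so the run is repeatable

# One second of a dominant tone (amplitude 1.0) ...
clean = [math.sin(2 * math.pi * 150 * k / fs + 0.1) for k in range(fs)]
# ... plus much weaker additive noise (standard deviation 0.01).
noisy = [s + rng.gauss(0.0, 0.01) for s in clean]

clean_zcr = zcr(clean, fs)
noisy_zcr = zcr(noisy, fs)
```

Because the tone moves through zero much faster than the noise can push a sample back across it, the two ZCR values stay within about 1 Hz of each other in this setup. A louder noise source would break assumption 3 and the agreement with it.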


Distinguishing Noise and Unvoiced Speech

When Algorithm-1 is applied to speech with background noise, many of the silent frames are also marked as unvoiced, because noise-only frames and unvoiced frames have similar amplitude and ZCR. The modified algorithm resolves this problem under assumptions 1 and 2.

The first 1 second of the recording is pure background noise. Hence, a noise reference can be created using the ZCR information of the first 1 second of the recorded speech.


Algorithm for Creating the Noise Reference

1. Start.
2. Calculate the ZCR of a 20 ms frame.
3. Store the ZCR in the Noise Reference vector.
4. Are all frames considered? If no, return to step 2 with the next frame.
5. If yes, delete redundant time frames with repeated ZCRs to reduce the size of the Noise Reference vector.
6. End.
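The noise-reference procedure can be sketched as below. The function name `build_noise_reference`, the use of exact equality to detect repeated ZCRs, and the Gaussian test recording are assumptions made for this example.

```python
import random

def zcr(frame, fs):
    """Equation (1): polarity changes N divided by 2t."""
    signs = [s >= 0 for s in frame]
    n = sum(1 for p, q in zip(signs, signs[1:]) if p != q)
    return n * fs / (2.0 * len(frame))

def build_noise_reference(signal, fs, frame_ms=20, lead_in_s=1.0):
    """Collect the distinct ZCRs of the noise-only lead-in."""
    size = int(fs * frame_ms / 1000)
    lead_in = signal[:int(fs * lead_in_s)]
    reference = []
    for start in range(0, len(lead_in) - size + 1, size):
        rate = zcr(lead_in[start:start + size], fs)
        # Skip repeated ZCRs so the reference vector stays small.
        if rate not in reference:
            reference.append(rate)
    return reference

# Hypothetical recording: two seconds of weak Gaussian noise at 8 kHz.
fs = 8000
rng = random.Random(1)
recording = [rng.gauss(0.0, 0.02) for _ in range(2 * fs)]
noise_reference = build_noise_reference(recording, fs)
```

With 20 ms frames, a 1 second lead-in yields at most 50 entries. Dropping repeats keeps the later "is this ZCR close to the reference" check cheap.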



Modified Algorithm for Noisy Background

1. Start.
2. Calculate the ZCR of a 20 ms frame.
3. Calculate the power using the Fourier Transform.
4. Store the ZCR and power in memory.
5. Are all frames considered? If no, return to step 2 with the next frame.
6. Normalize the ZCR and power of a frame.
7. Is the frame marked voiced by the voiced relation (3)? If yes, go to step 11.
8. Is the ZCR close to any ZCR in the noise reference? If yes, mark the frame as silent.
9. If no, apply the unvoiced relation (2) to decide unvoiced/silent.
10. Update the Noise Reference.
11. Are all frames considered? If no, return to step 6 with the next frame.
12. Display the result.
13. End.
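The per-frame decision of the modified algorithm can be sketched as below. Several points are assumptions rather than details given on the slide: the closeness test uses a fixed tolerance `tol`, the frame ZCR and the reference ZCRs are taken to be on the same (normalized) scale, and the comparison directions follow the voiced and unvoiced descriptions. The "Update Noise Reference" step is left out, since the condition that triggers it is not stated.

```python
def is_close_to_reference(rate, reference, tol):
    """True if rate lies within tol of any ZCR stored in the reference."""
    return any(abs(rate - r) <= tol for r in reference)

def classify_with_reference(f_n, x_n, reference, tol, a, b, c, d):
    """Label one frame using the thresholds and the noise reference."""
    # Voiced frames are decided first (low ZCR, high power).
    if f_n <= c and abs(x_n) >= d:
        return "voiced"
    # A ZCR that matches the background noise is treated as silence.
    if is_close_to_reference(f_n, reference, tol):
        return "silent"
    # Otherwise the unvoiced relation separates unvoiced from silent.
    if f_n >= a and abs(x_n) <= b:
        return "unvoiced"
    return "silent"

# Hypothetical values, for illustration only.
params = dict(reference=[0.8], tol=0.05, a=0.5, b=0.3, c=0.3, d=0.5)
label_voiced = classify_with_reference(0.1, 0.9, **params)
label_noise_like = classify_with_reference(0.8, 0.1, **params)
label_unvoiced = classify_with_reference(0.6, 0.1, **params)
```

The second call shows the intended effect: a frame that would pass the unvoiced thresholds is still marked silent because its ZCR matches the stored noise.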



Experimental Results


Case 1: Quiet Background

For a quiet background, the first algorithm and the modified algorithm give similar results.

[Figure: output of the first algorithm and of the modified algorithm]



Case 2: Additive White Gaussian Noise (AWGN)

In this case, the first algorithm gives poor results. The second algorithm improves the results, but the accuracy is still poor, since AWGN has a uniform power spectral density.



Case 3: Real Noise


In this case, the first algorithm gives poor results. The second algorithm improves the results, since most of the assumptions are satisfied.



Comparison of the Two Algorithms

Table: Percentage of Silent Frames Marked Unvoiced

Background       First Algorithm   Modified Algorithm
No Noise         0%                0%
AWGN             80%               30%
Natural Noise    58%               23%
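The error reduction of roughly 60% quoted in the introduction can be checked against the two noisy rows of the table. The short calculation below uses only the tabulated percentages; the helper name `relative_reduction` is introduced here for illustration.

```python
# (first algorithm, modified algorithm): % of silent frames marked unvoiced
silent_marked_unvoiced = {
    "AWGN": (80, 30),
    "Natural Noise": (58, 23),
}

def relative_reduction(first, modified):
    """Percentage by which the modified algorithm lowers the error."""
    return 100.0 * (first - modified) / first

reductions = {
    background: relative_reduction(first, modified)
    for background, (first, modified) in silent_marked_unvoiced.items()
}
```

This gives 62.5% for AWGN and about 60.3% for natural noise, consistent with the approximate figure of 60%.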



Discussion and Bibliography


Discussion

Advantages:

1. Simple to implement.
2. Accuracy is high.
3. The information is found to be useful in speech enhancement.

Drawbacks:

1. The first 1 second must contain only background noise.
2. The algorithm involves two loops, so it needs further modification before it can be implemented in real time.
3. It may not give accurate results if the noise contains human voice, because the noise will then also contain voiced and unvoiced parts.


Bibliography (1 of 2)

1. Bachu R.G., Kopparthi S., Adapa B., Barkana B.D., Separation of Voiced and Unvoiced using Zero crossing rate and Energy of the Speech Signal, Electrical Engineering Department, School of Engineering, University of Bridgeport; available at http://audio-fingerprint.googlecode.com/svn-history/r62/trunk/referencias/ASEE12008 0044 paper.pdf
2. Thierry Dutoit, A (Short) Introduction to Speech Processing; available at http://tcts.fpms.ac.be/cours/1005-07-08/speech/icme2002 intro.pdf
3. John R. Deller, Jr., John H. L. Hansen and John G. Proakis, Discrete-Time Processing of Speech Signals, John Wiley and Sons, Inc., New York.
4. Douglas, S.C., Chapter 18, Introduction to Adaptive Filters, in Digital Signal Processing Handbook, Ed. Vijay K. Madisetti and Douglas B. Williams; Boca Raton: CRC Press LLC, 1999; available at http://www.dsp-book.narod.ru/DSPMW/18.PDF
5. S. Ghaemmaghami, M. Deriche, and B. Boashash, A new approach to pitch and voicing detection through spectrum periodicity measurement, 1997 IEEE TENCON - Speech and Image Technologies for Computing and Telecommunications, pp. 743-746.



To download full paper:

https://gauhati.academia.edu/SivaranjanGoswami