Entropy and Dynamism Criteria for Voice Quality Classification Applications

Authors: Peter D. Kukharchik, Igor E. Kheidorov, Hanna M. Lukashevich, Denis L. Mitrofanov
Belarusian State University, Radiophysics Department, Minsk, Belarus

VOICE QUALITY: FUNCTIONS, ANALYSIS AND SYNTHESIS
GENEVA - AUGUST 27-29, 2003
ISCA Tutorial and Research Workshop, International Speech Communication Association


Voice Quality Classification Applications

• Introduction
• System design
• Experiment
• Conclusion

Introduction

Audio is a large and extremely variable data class.

The range of sounds is large, from music genres to animal cries to synthesizer samples.

Any of the above can and will occur in combination.

Existing Approaches

• Signal Processing Techniques
– Spectrum
– Modulation spectrum
– Temporal information

• Decision Making
– Bayesian Information Criterion (BIC)
– Log Likelihood Ratio
– Hidden Markov Model (HMM)


Block diagram of the proposed system

Input data (wave file) -> Feature vector extraction -> Vectors (Mel cepstra) -> Neural network -> Probability of Russian phonemes -> Entropy & Dynamism -> Entropy and dynamism -> HMM -> Segments

Definitions

Entropy and averaged entropy

Entropy is a measure of the uncertainty or disorder in a given distribution.

$$h_n = -\sum_{k=1}^{K} P(q_k \mid x_n) \log_2 P(q_k \mid x_n)$$

$$H_n = \frac{1}{N} \sum_{t=n-N/2}^{n+N/2} h_t$$

We use N = 40.
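The entropy definitions above can be sketched in a few lines of Python. This is only an illustration: the function names and the toy posteriors are hypothetical. The handling of the signal edges is also an assumption, since the slides do not say how the averaging window behaves there. Here the window is clipped and the sum is divided by the number of frames actually used.

```python
import math


def frame_entropy(posteriors):
    """Entropy in bits of one frame's class posterior distribution."""
    # Terms with zero probability contribute nothing to the sum.
    return -sum(p * math.log2(p) for p in posteriors if p > 0.0)


def averaged_entropy(frames, n, window=40):
    """Mean frame entropy over frames n - window/2 .. n + window/2.

    Assumption: the window is clipped at the signal edges and the sum is
    divided by the number of frames actually inside the window.
    """
    half = window // 2
    lo = max(0, n - half)
    hi = min(len(frames) - 1, n + half)
    values = [frame_entropy(frames[t]) for t in range(lo, hi + 1)]
    return sum(values) / len(values)


# Toy posteriors over K = 4 classes (made-up numbers, not experimental data).
peaked = [0.97, 0.01, 0.01, 0.01]
uniform = [0.25, 0.25, 0.25, 0.25]
print(frame_entropy(peaked), frame_entropy(uniform))
```

A peaked distribution, where the network is confident about one class, gives a low entropy, while a flat distribution gives the maximum of log2(K) bits.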

Definitions

Dynamism and average dynamism

Dynamism is a measure of the rate of change of a quantity

$$d_n = \sum_{k=1}^{K} \left[ P(q_k \mid x_n) - P(q_k \mid x_{n+1}) \right]^2$$

$$D_n = \frac{1}{N} \sum_{t=n-N/2}^{n+N/2} d_t$$
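As a rough illustration of the dynamism definitions above, the sketch below computes the squared change in the class posteriors between neighbouring frames and averages it over a window. The function names and toy data are hypothetical. Comparing each frame with the next one rather than the previous one, and clipping the window at the signal edges, are both assumptions.

```python
def frame_dynamism(current, following):
    """Sum of squared differences between two consecutive posterior vectors."""
    return sum((p - q) ** 2 for p, q in zip(current, following))


def averaged_dynamism(frames, n, window=40):
    """Mean frame dynamism over frames n - window/2 .. n + window/2.

    Assumption: the window is clipped at the signal edges; the last frame
    has no successor, so it never starts a pair.
    """
    half = window // 2
    lo = max(0, n - half)
    hi = min(len(frames) - 2, n + half)
    values = [frame_dynamism(frames[t], frames[t + 1]) for t in range(lo, hi + 1)]
    if not values:
        raise ValueError("need at least one pair of consecutive frames in the window")
    return sum(values) / len(values)


# Toy posteriors over two classes (made-up numbers, not experimental data).
steady = [[0.5, 0.5]] * 4
alternating = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
print(averaged_dynamism(steady, 1), averaged_dynamism(alternating, 1))
```

Posteriors that stay the same from frame to frame give a dynamism of zero, while posteriors that flip between classes give a large value.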


Feature vector extraction

We use 12 mel-cepstral coefficients computed over a 30 ms window with a 10 ms frame shift, on 4-15 min wave files of Russian speech, non-Russian speech, and music.
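To make the framing arithmetic concrete, here is a minimal sketch of cutting a signal into 30 ms windows with a 10 ms shift. The slide does not state the sampling rate, so the 16 kHz used here is an assumption, and `split_into_frames` is a hypothetical helper. The mel-cepstrum computation itself is not shown.

```python
def split_into_frames(samples, sample_rate, window_ms=30, shift_ms=10):
    """Cut a sample sequence into overlapping fixed-length frames."""
    window = sample_rate * window_ms // 1000  # samples per frame
    shift = sample_rate * shift_ms // 1000    # samples between frame starts
    # Only complete frames are kept; a trailing partial frame is dropped.
    return [samples[start:start + window]
            for start in range(0, len(samples) - window + 1, shift)]


SAMPLE_RATE = 16000  # assumed value, not given in the slides
one_second = list(range(SAMPLE_RATE))  # placeholder samples
frames = split_into_frames(one_second, SAMPLE_RATE)
print(len(frames), len(frames[0]))
```

Under these assumptions each frame holds 480 samples and consecutive frames overlap by 320 samples, so each second of audio yields roughly 100 feature vectors.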


[Figure: HMM with states S0 to S6]

• Define an HMM for the signal: one HMM state for every segment we want to find
• Perform a Viterbi search for the optimal path using the probabilities from the previous step
• Determine segment boundaries as the moments when the HMM state changes

Hidden Markov Model
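The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the two states, the per-frame scores, the uniform initial distribution, and the 0.9 probability of staying in the same state are all made up for the example.

```python
import math


def viterbi(frame_scores, stay_prob=0.9):
    """Most likely state sequence; frame_scores[t][s] is the score of state s at frame t.

    Assumptions: at least two states, strictly positive scores, a uniform
    initial distribution, and one shared self-transition probability.
    """
    n_states = len(frame_scores[0])
    log_stay = math.log(stay_prob)
    log_switch = math.log((1.0 - stay_prob) / (n_states - 1))
    best = [math.log(p) for p in frame_scores[0]]
    back = []  # back[t - 1][s] = best predecessor of state s at frame t
    for scores in frame_scores[1:]:
        new_best, pointers = [], []
        for s in range(n_states):
            candidates = [best[r] + (log_stay if r == s else log_switch)
                          for r in range(n_states)]
            r_best = max(range(n_states), key=candidates.__getitem__)
            new_best.append(candidates[r_best] + math.log(scores[s]))
            pointers.append(r_best)
        best = new_best
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=best.__getitem__)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def segment_boundaries(path):
    """Frame indices at which the decoded state changes."""
    return [t for t in range(1, len(path)) if path[t] != path[t - 1]]


# Made-up scores for two hypothetical states over seven frames.
scores = [[0.8, 0.2], [0.9, 0.1], [0.4, 0.6], [0.9, 0.1],
          [0.1, 0.9], [0.2, 0.8], [0.1, 0.9]]
decoded = viterbi(scores)
print(decoded, segment_boundaries(decoded))
```

The high self-transition probability is what keeps the single noisy frame (index 2) from producing a spurious boundary. Only the sustained change from frame 4 onwards is reported.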


Neural network for probability generation: rationale

Neural networks can model probability distributions with high accuracy because they can approximate a large variety of functions.

If training does not get stuck in a local minimum, the outputs can be considered class probabilities.

Neural Network


Neural network for probability generation: structure

• Fully connected multilayer perceptron

– Input layer size equals the feature vector size

– Output layer size equals the number of phonemes (one probability output per phoneme)

– Number and sizes of hidden layers vary

– Hyperbolic tangent activation for hidden neurons

– Softmax activation for output neurons

Multilayer Perceptron
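A forward pass through a network of this shape can be sketched in plain Python. The layer sizes other than the 12 inputs, the random weights, and all function names are hypothetical, and "tangent activation" is read here as the hyperbolic tangent. The sketch shows only how tanh hidden layers and a softmax output produce a vector that sums to one. It says nothing about how the network in the experiments was trained.

```python
import math
import random


def softmax(values):
    """Map arbitrary scores to positive numbers that sum to one."""
    peak = max(values)  # subtracted for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, biases):
    """Fully connected layer: one weight row and one bias per output neuron."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def mlp_forward(inputs, layers):
    """tanh on every hidden layer, softmax on the output layer."""
    for weights, biases in layers[:-1]:
        inputs = [math.tanh(v) for v in dense(inputs, weights, biases)]
    weights, biases = layers[-1]
    return softmax(dense(inputs, weights, biases))


def random_layer(n_in, n_out, rng):
    """Untrained layer with small random weights, for illustration only."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
# 12 inputs (one per cepstral coefficient); 20 hidden units and
# 5 output classes are arbitrary choices for the example.
layers = [random_layer(12, 20, rng), random_layer(20, 5, rng)]
features = [rng.uniform(-1.0, 1.0) for _ in range(12)]
posteriors = mlp_forward(features, layers)
print(len(posteriors), sum(posteriors))
```

Because of the softmax, the outputs are positive and sum to one, which is what allows them to be read as class probabilities and fed into the entropy and dynamism computations.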

Results

Music

Entropy histogram

Results - Russian Speech

Results - Foreign

Results - Russian and Foreign

Blue is Russian, pink is French

Results

Two Russian speakers (blue and brown) and music (others)

Russian speaker (blue) and music (pink)


Results - Pure Russian and “Czech” Russian

There is some difference even between native Russian speech and Russian spoken with a Czech accent.

Results

Entropy histograms of “normal” (brown) and “rough” (blue) French speech

Results

Entropy histograms for “normal” (brown), “rough” (blue) and “lips” (lips) French speech

Conclusion

• Further research
– Parameter vectors, their size, and the number of context frames
– Specialized HMM structures for particular types of speech signals

• Conclusion
– As the experiments show, entropy and dynamism features can be used successfully for automatic signal segmentation. Further research in this area may lead to better practical results.