Text-Independent Speaker Identification System

Page 1: Ala’a Spaih                           Abeer Abu-Hantash          Directed by

Ala’a Spaih, Abeer Abu-Hantash

Directed by Dr. Allam Mousa

Page 2: Outline for Today

1. Speaker Recognition Field

2. System Overview

3. MFCC & VQ

4. Experimental Results

5. Live Demo

Page 3: Speaker Recognition Field

Speaker Recognition

Speaker Verification: Text-Dependent, Text-Independent

Speaker Identification: Text-Independent, Text-Dependent

Page 4: System Overview

Training mode: Speech input -> Feature extraction -> Speaker modeling -> Speaker model database

Testing mode: Speech input -> Feature extraction -> Feature matching (against the speaker model database) -> Decision logic -> Speaker ID

Page 5: Feature Extraction

Feature extraction is a special form of dimensionality reduction.

The aim is to extract the formants.

Page 6: Feature Extraction

The extracted features must have specific characteristics:

Easily measurable, and occurring naturally and frequently in speech.

Stable over time.

Varying as much as possible among speakers, yet consistent for each speaker.

Unaffected by speaker health or background noise.

Many algorithms can extract them: LPC, LPCC, HFCC, MFCC.

We used the Mel Frequency Cepstral Coefficients (MFCC) algorithm.

Page 7: Feature Extraction Using MFCC

Input speech -> Framing and windowing -> Fast Fourier transform -> Absolute value -> Mel-scaled filter bank -> Log -> Discrete cosine transform -> Feature vectors
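The chain on this slide can be sketched in NumPy as below. This is only an illustrative sketch, not the authors' implementation: the 16 kHz sampling rate, 25 ms frame, 10 ms hop, Hamming window, 512-point FFT and triangular filter shape are all assumptions, while the 12 coefficients and 29 filters are the settings quoted later in the results. The function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters (assumed shape) spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, centre):      # rising edge
            fbank[m - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):     # falling edge, peak of 1 at the centre bin
            fbank[m - 1, k] = (right - k) / max(right - centre, 1)
    return fbank

def mfcc(signal, sample_rate=16000, frame_ms=25, hop_ms=10,
         n_filters=29, n_coeffs=12, n_fft=512):
    signal = np.asarray(signal, dtype=float)
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Framing and windowing (Hamming window is an assumption)
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Fast Fourier transform, then absolute value
    magnitude = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))
    # Mel-scaled filter bank, then log (small offset avoids log(0))
    mel_energies = magnitude @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_mel = np.log(mel_energies + 1e-10)
    # Discrete cosine transform (DCT-II), skipping the 0th coefficient
    n = np.arange(n_filters)
    k = np.arange(1, n_coeffs + 1)
    basis = np.cos(np.pi * np.outer(k, 2 * n + 1) / (2 * n_filters))
    return log_mel @ basis.T               # one feature vector per frame

# One second of noise stands in for real speech
rng = np.random.default_rng(0)
features = mfcc(rng.standard_normal(16000))
print(features.shape)  # (98, 12)
```

Each row of the result is one feature vector, which is the form the clustering stage consumes.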

Page 8: Framing and Windowing

[Figure: framed and windowed speech passed through the FFT to give the spectrum; labels: vocal tract, glottal pulse.]

Page 9: Mel-Scaled Filter Bank

Spectrum -> mel(f) = 2595 * log10(1 + f/700) -> Mel spectrum
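As a small illustration of the mel mapping, the sketch below implements the formula from this slide together with its algebraic inverse, and lists filter centre frequencies spaced equally on the mel scale. The helper names and the 8000 Hz upper limit are assumptions made for the example.

```python
import math

def hz_to_mel(f_hz):
    # mel(f) = 2595 * log10(1 + f/700), as on the slide
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    # Algebraic inverse of the formula above
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def filter_centres_hz(n_filters, f_max_hz):
    """Centre frequencies of n_filters filters equally spaced in mel."""
    step = hz_to_mel(f_max_hz) / (n_filters + 1)
    return [mel_to_hz(step * i) for i in range(1, n_filters + 1)]

centres = filter_centres_hz(29, 8000.0)  # 29 filters; 8000 Hz is an assumed limit
print(round(centres[0], 1), round(centres[-1], 1))
```

Equal steps in mel translate into narrow spacing at low frequencies and wide spacing at high frequencies, which is the purpose of the mel-scaled filter bank.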

Page 10: Cepstrum

Mel spectrum -> MFCC coefficients

By taking the DCT of the logarithm of the magnitude spectrum, the glottal pulse and the vocal-tract impulse response can be separated.
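Why the logarithm makes this separation possible can be sketched with the usual source-filter model. The symbols are introduced here for illustration and do not appear on the slides: s[n] is a speech frame, e[n] the glottal excitation, and h[n] the vocal-tract impulse response.

```latex
\begin{align}
s[n] &= e[n] * h[n] \\
|S(f)| &= |E(f)|\,|H(f)| \\
\log|S(f)| &= \log|E(f)| + \log|H(f)|
\end{align}
```

Because the DCT is linear, it turns this sum into a sum of two sets of coefficients. The slowly varying envelope log|H(f)| concentrates in the low-order coefficients and the rapidly varying excitation log|E(f)| in the higher-order ones, so keeping only the first few coefficients retains mostly vocal-tract information.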

Page 11: Classification

Classification means building a unique model for each speaker in the database.

There are two major types of models for classification:

Stochastic models: GMM, HMM, ANN

Template models: VQ, DTW

We used the VQ algorithm.

Page 12: VQ Algorithm

The VQ technique consists of extracting a small number of representative feature vectors.

The first step is to build a speaker database consisting of N codebooks, one for each speaker in the database.

Speaker feature vectors -> clustered into codewords -> speaker model (codebook)

This is done by the K-means clustering algorithm.

Page 13: K-means Clustering

[Flowchart] Start -> choose the number of clusters k -> compute centroids -> compute the distance from objects to centroids -> group objects based on minimum distance -> no change? If yes, end; if no, return to the centroid step.
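The loop in this flowchart can be sketched as follows. It is a minimal illustration, not the authors' code: the function name, the optional `init` argument, the random choice of starting centroids and the iteration cap are all assumptions.

```python
import numpy as np

def kmeans(data, k, init=None, max_iter=100, seed=0):
    """Plain k-means following the flowchart; returns (centroids, labels)."""
    data = np.asarray(data, dtype=float)
    if init is None:
        # Assumed initialisation: k distinct data points chosen at random
        rng = np.random.default_rng(seed)
        centroids = data[rng.choice(len(data), size=k, replace=False)].copy()
    else:
        centroids = np.asarray(init, dtype=float).copy()
    labels = np.full(len(data), -1)
    for _ in range(max_iter):
        # Distance from every object to every centroid
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        # Grouping based on minimum distance
        new_labels = dists.argmin(axis=1)
        # "No change" decision: stop when the grouping is stable
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Recompute each centroid as the mean of its group
        for j in range(k):
            members = data[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return centroids, labels

points = [[0, 0], [0, 1], [10, 10], [10, 11]]
centroids, labels = kmeans(points, k=2, init=[[0, 0], [10, 10]])
print(labels.tolist())     # [0, 0, 1, 1]
print(centroids.tolist())  # [[0.0, 0.5], [10.0, 10.5]]
```

Applied to one speaker's feature vectors, the returned centroids play the role of that speaker's codebook.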

Page 14: VQ Example

Given data points, split into 4 codebook vectors with initial values at (2,2), (4,6), (6,5), (8,8).

Page 15: VQ Example

Once there’s no more change, the feature space will be partitioned into 4 regions. Any input feature can be classified as belonging to one of the 4 regions. The entire codebook can be specified by the 4 centroid points.
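Classifying an input against a 4-entry codebook amounts to finding the nearest centroid. In the sketch below the converged centroids are not known from the slides, so the four initial values from the example are reused as hypothetical stand-ins; the function name is also an assumption.

```python
import numpy as np

# Hypothetical codebook: the example's initial values, used as stand-ins
# for the converged centroids, which the slides do not list
codebook = np.array([[2.0, 2.0], [4.0, 6.0], [6.0, 5.0], [8.0, 8.0]])

def quantize(x, codebook):
    """Return the index of the nearest codeword, i.e. the region containing x."""
    d2 = np.sum((codebook - x) ** 2, axis=1)  # squared Euclidean distances
    return int(np.argmin(d2))

print(quantize(np.array([1.0, 3.0]), codebook))  # 0
print(quantize(np.array([7.0, 7.5]), codebook))  # 3
```

Only the four centroids need to be stored, since the region boundaries are implied by the nearest-centroid rule.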

Page 16: K-means Clustering

If we set the codebook size to 8, then the output of the clustering will be:

[Figure: two scatter plots. Left: MFCCs of a speaker (1000x12). Right, after VQ: speaker codebook (8x12).]

Page 17: Feature Matching

For each codebook, a distortion measure is computed. The speaker with the lowest distortion is chosen. The distortion measure is defined as the Euclidean distance:

$$d^2(x, y) = \sum_{i=1}^{D} (x_i - y_i)^2$$
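The matching step might look like the sketch below. The slides do not say how per-frame distances are combined into one score per codebook, so averaging each frame's distance to its nearest codeword is an assumption, as are the function names and the toy data.

```python
import numpy as np

def average_distortion(features, codebook):
    """Mean squared Euclidean distance from each frame to its nearest codeword
    (averaging over frames is an assumption, not stated on the slides)."""
    d2 = np.sum((features[:, None, :] - codebook[None, :, :]) ** 2, axis=2)
    return float(d2.min(axis=1).mean())

def identify(features, codebooks):
    """Return the speaker whose codebook gives the lowest distortion."""
    scores = {name: average_distortion(features, cb) for name, cb in codebooks.items()}
    return min(scores, key=scores.get), scores

# Toy 2-D data standing in for 12-dimensional MFCC vectors
codebooks = {
    "speaker_a": np.array([[0.0, 0.0], [1.0, 1.0]]),
    "speaker_b": np.array([[5.0, 5.0], [6.0, 6.0]]),
}
test_features = np.array([[0.1, 0.0], [0.9, 1.1]])
best, scores = identify(test_features, codebooks)
print(best)  # speaker_a
```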

Page 18: System Operates in Two Modes

Offline

Online: Monitoring microphone inputs -> MFCC feature extraction -> Calculate VQ distortion -> Make decision and display

Page 19: Applications

Speaker recognition for authentication: banking applications.

Forensic speaker recognition: proving the identity of a recorded voice can help to convict a criminal or discharge an innocent person in court.

Speaker recognition for surveillance: electronic eavesdropping on telephone and radio conversations.

Page 20: Results

The table shows how the system identifies the speaker according to the Euclidean distance calculation.

        Sp 1     Sp 2     Sp 3     Sp 4     Sp 5
Sp 1    10.7492  13.2712  17.8646  14.7885  13.2859
Sp 2    13.2364  10.2740  13.2884  11.7941  14.0461
Sp 3    17.5438  16.1177  11.9029  16.2916  17.7199
Sp 4    16.1360  13.7095  15.5633  11.7528  16.7327
Sp 5    14.9324  15.7028  17.2842  17.8917  12.3504

12 MFCCs, 29 filter banks, codebook size 64 … ELSDSR database.
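The decision rule can be checked directly against the distances reported on this slide. The transcript does not say whether rows are test utterances and columns are codebooks or the reverse, so this sketch takes the minimum along both axes; the variable names are illustrative.

```python
import numpy as np

speakers = ["Sp 1", "Sp 2", "Sp 3", "Sp 4", "Sp 5"]

# Distances copied from the results table
distances = np.array([
    [10.7492, 13.2712, 17.8646, 14.7885, 13.2859],
    [13.2364, 10.2740, 13.2884, 11.7941, 14.0461],
    [17.5438, 16.1177, 11.9029, 16.2916, 17.7199],
    [16.1360, 13.7095, 15.5633, 11.7528, 16.7327],
    [14.9324, 15.7028, 17.2842, 17.8917, 12.3504],
])

row_best = distances.argmin(axis=1)  # lowest distance in each row
col_best = distances.argmin(axis=0)  # lowest distance in each column

for i, name in enumerate(speakers):
    print(name, "-> row minimum:", speakers[row_best[i]],
          "| column minimum:", speakers[col_best[i]])
```

In both readings the smallest distance lies on the diagonal, so every speaker in this table is matched to their own model.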

Page 21: Results

Number of MFCCs vs. ID rate:

No. of MFCCs    ID rate
5               76 %
12              91 %
20              91 %

Frame size vs. ID rate:

Frame size      ID rate
10-30 ms        Good
Above 30 ms     Bad

Page 22: Results

The effect of the codebook size on the ID rate and VQ distortion.

[Figure: two plots against codebook size (0 to 300). Left: ID rate (%), axis from 82 to 100. Right: matching score, axis from 0 to 14.]

Page 23: Results

Number of filter banks vs. ID rate and VQ distortion.

[Figure: two plots against the number of filters in the filter bank (0 to 50). Left: ID rate (%). Right: matching score.]

Page 24: Results

The performance of the system on different test speech lengths.

Test speech length    ID rate
0.2 sec               60 %
2 sec                 85 %
6 sec                 90 %
10 sec                95 %

[Figure: ID rate (%) vs. test speech length (sec).]

Page 25: Summary

Effect of changing some parameters on the MFCC algorithm and the VQ algorithm.

Our system identifies the speaker regardless of the language and the text.

Satisfactory results require the same training and testing environment. Test data needs to be several tens of seconds long.
