
Entropy Analysis in Speaker Recognition

Andreas Nautsch

Hochschule Darmstadt, CASED, da/sec Security Research Group

Berlin, 15.12.2015

Andreas Nautsch Entropy in Speaker Recognition / Berlin, 15.12.2015 1/15


Outline

1. Motivation

2. Overview on Speaker Recognition

3. Biometric Strength of State-of-the-Art Voice Features

4. Conclusion


Introduction

Motivation: Different Approach towards Entropy

◮ Established: goodness of (LLR) scores
  Focus: score values ⇔ expected meaning?

◮ Proposed: metric for the strength of biometric features
  ◮ Collision probability of subjects within feature spaces
  ◮ Metric towards biometric uniqueness
  ◮ Comparability to other modalities at early stages

Face: 56 bit

Fingerprint: 84 bit

Iris: 249 bit

[Buchmann+14] N. Buchmann, C. Rathgeb, H. Baier, C. Busch: Towards electronic identification and trusted services for biometric authenticated transactions in the Single Euro Payments Area, APF’14, 2014.


Overview on Speaker Recognition

◮ Voice as biometric characteristic

◮ Application scenarios and challenges (brief excerpt)
  ◮ Call-center fraud prevention: natural free speech, variable duration and content
  ◮ Mobile devices: random PINs, short duration
  ◮ Forensic: various contents and signal qualities

[Figure: verification pipeline. Feature extraction is applied to the voice reference and to the voice probe; comparing both yields a score, which leads to an accept or reject decision.]


Overview on Speaker Recognition

1. Psycho-acoustic spectral analyses

⇒ 60 Mel-Frequency Cepstral Coefficients (MFCCs)

[Figure: three panels over time showing air pressure, frequency, and MFCCs (scale ticks −50, 0, 50, 100)]

2. MFCC Clustering by Gaussian Mixture Models (GMMs)

⇒ 2048 × 60 = 122 880 free parameters per voice sample

3. Total Variability Analysis: intermediate-sized vectors

⇒ 400-dimensional i-vectors per sample


Overview on Speaker Recognition

4. Linear Discriminant Analysis (LDA)

⇒ 200-dimensional i-vector

5. Noise reduction by projection into spherical space

Probabilistic Discriminative speaker sub-spaces
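
One common reading of step 5 is length normalization, where each i-vector is scaled to unit Euclidean length so that all vectors lie on a sphere. The slides do not name the exact method, so the following is only a sketch under that assumption; the function name `length_normalize` and the toy vector are hypothetical.

```python
import math

def length_normalize(vector):
    """Scale a feature vector to unit Euclidean length (a point on the unit sphere)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]

# Toy 4-dimensional stand-in for a 200-dimensional i-vector (hypothetical values).
ivector = [3.0, 0.0, 4.0, 0.0]
unit = length_normalize(ivector)
print(unit)
```

After this step, only the direction of a vector carries information; its magnitude no longer contributes to comparisons.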


Cross-Entropy in Score-Domain: Cllr

Representing the goodness of log-likelihood ratio (LLR) scores:

◮ Proper scoring rule: costs arise from too low genuine ⇔ too high impostor scores

◮ Generalized cross-entropy: information loss relative to perfect classification

[Figure: logarithmic loss (0 to 10) over LLR score S (−10 to 10) for genuine and impostor scores, with the curves Cost_log(S | H_genuine) and Cost_log(S | H_impostor)]

[Brummer10] N. Brummer: Measuring, refining and calibrating speaker and language information extracted from speech. Ph.D. thesis, University of Stellenbosch, 2010.
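
The logarithmic costs for genuine and impostor scores can be sketched as follows. The slide does not print the formula; the code uses the usual definition of Cllr from the cited literature, the mean of log2(1 + exp(−S)) over genuine scores and of log2(1 + exp(S)) over impostor scores, averaged with equal weight. Natural-log LLRs are assumed, and the function name `cllr` is illustrative.

```python
import math

def cllr(genuine_llrs, impostor_llrs):
    """Cllr in bits for natural-log LLR scores (equal weight on both classes)."""
    def log2_1p_exp(x):
        # Numerically stable log2(1 + exp(x)).
        if x > 0:
            return (x + math.log1p(math.exp(-x))) / math.log(2)
        return math.log1p(math.exp(x)) / math.log(2)

    # Genuine scores are penalized for being too low, impostor scores for being too high.
    genuine_cost = sum(log2_1p_exp(-s) for s in genuine_llrs) / len(genuine_llrs)
    impostor_cost = sum(log2_1p_exp(s) for s in impostor_llrs) / len(impostor_llrs)
    return 0.5 * (genuine_cost + impostor_cost)

# Uninformative scores (LLR = 0 everywhere) cost exactly one bit.
print(cllr([0.0, 0.0], [0.0, 0.0]))
```

Well-separated scores drive the value towards 0 bits, while confidently wrong scores push it above 1 bit.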


Cross-Entropy in Score-Domain: Cllr

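Examining all operating points $\pi$:

◮ $E$: empirical Bayes error, $E = \pi \, \mathrm{FNMR}(\eta) + (1 - \pi) \, \mathrm{FMR}(\eta)$, with $\eta = -\operatorname{logit} \pi$

◮ $C_{\mathrm{llr}} = \int_{-\infty}^{\infty} E \, d\pi$ as application-independent uncertainty

◮ Aiming at minimal information loss: $C_{\mathrm{llr}}^{\min} = \int_{-\infty}^{\infty} E_{\min} \, d\pi$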

[Figure: NBER (0 to 1) over Bayesian thresholds η (−10 to 10), with the curves E, E_default, and E_min]

η ≈ 4.6 ⇒ odds = 1 : 100

[Brummer+11] N. Brummer, E. de Villiers: The BOSARIS Toolkit User Guide: Theory, Algorithms and Code for Binary Classifier Score Processing. Tech. Report, AGNITIO Research, 2011.

[Nautsch14] A. Nautsch: Speaker Verification using i-Vectors. M.Sc. thesis, Hochschule Darmstadt, 2014.
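
The empirical Bayes error at a single operating point can be sketched as follows. This is a minimal illustration, assuming natural-log LLR scores and the convention that a comparison is accepted when its score is at least η; the function names and the toy scores are hypothetical.

```python
import math

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def empirical_bayes_error(genuine_llrs, impostor_llrs, prior):
    """E = prior * FNMR(eta) + (1 - prior) * FMR(eta), with eta = -logit(prior)."""
    eta = -logit(prior)
    # False non-match: a genuine score falls below the threshold.
    fnmr = sum(s < eta for s in genuine_llrs) / len(genuine_llrs)
    # False match: an impostor score reaches the threshold (assumed accept rule).
    fmr = sum(s >= eta for s in impostor_llrs) / len(impostor_llrs)
    return prior * fnmr + (1.0 - prior) * fmr

# Toy scores (hypothetical): one error in each class at prior 0.5.
genuine = [2.0, 1.0, -1.0, 3.0]
impostor = [-3.0, -2.0, 1.0, -1.0]
print(empirical_bayes_error(genuine, impostor, 0.5))

# Prior odds of 1 : 100 correspond to a threshold of ln(100), roughly 4.6.
print(-logit(1.0 / 101.0))
```

Sweeping `prior` over many values and accumulating the resulting errors approximates the integral over operating points described above.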


Entropy in Feature-space Domain

Relative Entropy in Feature Spaces

Can the biometric discrimination potential of i-vector feature spaces be measured?

◮ Measuring biometric uniqueness

◮ Goal: comparability of feature extractors
  ◮ of one biometric modality
  ◮ among modalities

◮ Note: entropy of passwords $H = L \log_2 N$ [NIST06]
  Example: 4-digit PIN, $H \approx 13.3$ bits

[NIST06] NIST Tech. Rep.: Electronic authentication guideline, recommendations of the National Institute of Standards and Technology, information security, 2006.
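
The password entropy note can be checked with a few lines. This is a small sketch of H = L log2 N, where L is the password length and N the alphabet size; the function name is illustrative.

```python
import math

def password_entropy_bits(length, alphabet_size):
    """Entropy in bits of a uniformly random password: H = L * log2(N)."""
    return length * math.log2(alphabet_size)

# 4-digit PIN: L = 4 symbols drawn from N = 10 digits.
pin_bits = password_entropy_bits(4, 10)
print(round(pin_bits, 1))
```

The same function gives 128 bits for a string of 128 random binary digits, which is the password reference point used for comparison later in the talk.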


Estimating relative Entropy

◮ Two-class problem: p same subject, q other subjects

◮ Kullback-Leibler Divergence as lower bound [Adler+06]

[Figure: two-dimensional feature space (feature space dimension 1 vs. dimension 2, both −10 to 10; scale ticks 0 to 0.4) showing the p sub-space, the q1 and q2 sub-spaces, and a Gaussian model for the q space, annotated with 39.5 bits, 73.5 bits, and 47.5 bits]

[Adler+06] A. Adler, R. Youmaran, S. Loyka: Towards a Measure of Biometric Information, IEEE CCECE, 2006.

[Nautsch+15] A. Nautsch, C. Rathgeb, R. Saeidi, C. Busch: Entropy analysis of i-vector feature spaces in duration-sensitive speaker recognition, IEEE ICASSP, 2015.
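
When both the subject distribution p and the population distribution q are modelled as Gaussians, the Kullback-Leibler divergence has a closed form. The sketch below assumes diagonal covariances to stay dependency-free, which is a simplification and not the full-covariance setting of the cited work; the function name and example values are hypothetical.

```python
import math

def kl_diag_gaussians_bits(mean_p, var_p, mean_q, var_q):
    """KL(p || q) in bits for Gaussians with diagonal covariances (assumed simplification)."""
    nats = 0.0
    for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q):
        # Per-dimension closed form: variance ratio, mean shift, and log-variance terms.
        nats += 0.5 * (vp / vq + (mq - mp) ** 2 / vq - 1.0 + math.log(vq / vp))
    return nats / math.log(2)

# A narrow subject distribution p inside a broad population distribution q (toy values).
bits = kl_diag_gaussians_bits([1.0, -2.0], [0.1, 0.2], [0.0, 0.0], [4.0, 4.0])
print(bits)
```

The divergence grows as the subject sub-space becomes narrower or moves further from the population mean, which matches the intuition of a more distinctive subject.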


Necessary Regularizations

Challenge: Which information is significant?

◮ Regularizations according to Adler et al. [Adler+06]:

1. Degenerate features: $U S_q V^t = \mathrm{svd}(\Sigma_q)$
   ⇒ Sub-space with: $[S_q]_{i,j} \geq 10^{-10} [S_q]_{1,1}$
   ⇒ Subject sub-space as: $S_p = U^t \Sigma_p V$

2. Insufficient data: $[\Sigma_p]_{i,j} = 0$ for $i, j \geq N_p$, $i \neq j$

◮ Extension for variable $N_p$ per subject [Nautsch+15]:

3. Ill-disposed regularized models:
   ⇒ iteratively set $[\Sigma_p]_{i,j} = 0$ until $\Sigma_p$ is positive-definite

4. Minimal amount of genuine samples: $N_p \geq 10$
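
Two of these rules are simple threshold checks and can be sketched directly. The code assumes the singular values of Σq are already available in descending order (the decomposition itself is omitted), and the function names are hypothetical.

```python
def retained_dimensions(singular_values, rel_threshold=1e-10):
    """Indices whose singular value is at least rel_threshold times the largest one."""
    largest = singular_values[0]  # assumes descending order, as returned by an SVD
    return [i for i, s in enumerate(singular_values) if s >= rel_threshold * largest]

def has_enough_samples(num_genuine_samples, minimum=10):
    """Rule 4: a subject is only modelled with at least `minimum` genuine samples."""
    return num_genuine_samples >= minimum

# Toy spectrum (hypothetical values): the last two dimensions are degenerate.
print(retained_dimensions([5.0, 1.0, 1e-12, 0.0]))
print(has_enough_samples(9), has_enough_samples(10))
```

Dropping the degenerate dimensions first keeps the later divergence estimate from being dominated by directions in which the population covariance is numerically singular.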


Experimental Evaluation

Experimental Set-up

◮ Database: I4U i-vectors of the NIST Speaker Recognition Evaluation 2012 (SRE’12), comprising SREs 2004 – 2010

◮ 551 female, 425 male subjects

◮ Focus: incomplete probe samples
  ⇒ Duration variations: 5s, 10s, 20s, 40s, full

◮ How do speaker sub-spaces accumulate with increasing sample duration?

◮ Expectation: correlation to system performance


Accumulation of Voice Templates by Duration

[Figure: accumulation of voice templates per subject (Subject IDs 81 to 976), plotted as ratio to full-full (axis ticks 0.25, 0.5, 1, 2) for the duration groups full-5, full-10, full-20, full-40, and full-full]

[Nautsch+15] A. Nautsch, C. Rathgeb, R. Saeidi, C. Busch: Entropy analysis of i-vector feature spaces in duration-sensitive speaker recognition, IEEE ICASSP, 2015.


Comparison: relative Entropy ⇔ System Performance

[Figure: four panels over the duration groups full-5, full-10, full-20, full-40, and full-full: (1) relative entropy in bits (0 to 500; min, mean, max); (2) biometric error rate in % (axis ticks 2 to 60; EER and FMR100); (3) score cross-entropy Cminllr (axis ticks 0.1 to 0.4); (4) relative entropy in bits (0 to 500) compared against 128-bit passwords, face, fingerprint, and iris]

[Nautsch+15] A. Nautsch, C. Rathgeb, R. Saeidi, C. Busch: Entropy analysis of i-vector feature spaces in duration-sensitive speaker recognition, IEEE ICASSP, 2015.


Conclusion

◮ Relative entropy as metric for biometric uniqueness

◮ Accumulation of voice templates by duration depicted

◮ Relative Entropy: from 63.2 up to 421.9 bit, µ > 124.3 bit

◮ Comparability to other biometric modalities

◮ Comparability to password feature spaces (e.g., 128-bit)

◮ Subject collision probability: subject- and sample-dependent, from $5 \times 10^{-39}$ down to $2 \times 10^{-55}$ (empirical data)
