2002 VIU Oct 2007 : Speaker Recognition 1 F. Schiel Florian Schiel Venice International University Oct 2007 Speaker Recognition = Speaker Identification, Speaker Verification


2002

VIU Oct 2007 : Speaker Recognition 1 F. Schiel

Florian Schiel
Venice International University

Oct 2007

Speaker Recognition = Speaker Identification, Speaker Verification


Agenda

• See the Context

• Speech Recognition vs. Speaker Recognition

• Speaker Identification vs. Speaker Verification

• Speaker Recognition: Basics

• Speaker Verification using HMM

• Discussion

• and then ...


General Approach to Authentication

• Three general ways to perform authentication:
  - proof of knowledge (e.g. password)
  - proof of possession (e.g. chip card)
  - proof of property (biometrics)
  and their combinations

• Biometrics: physiologically based vs. behaviourally based

• Biometric features: fingerprint, iris scan, facial scan, hand geometry, signature, voice

from U. Türk 2007


Biometric Features: General Requirements

• universal: can be found in any user
• unique: even for identical twins
• measurable: does not require human evaluation
• robust to short-term and long-term variability
• low dimensionality
• robust to changing environment
• robust to impersonation

from U. Türk 2007

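[Rating marks per requirement (+ / o); their assignment to the requirements was lost in extraction: ++++++ooo+]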


Taxonomy of Speech Processing

[Figure: taxonomy diagram]

• Areas: Natural Language Processing (NLP), Spoken Language Processing (SLP)

• Fields shown: Lexica, Syntax, Parsing, Spellers, Search / Indexing, Semantics, Terminology, Thesaurus, Dialogue systems, Speech Identification, Speech Synthesis, Speaker Recognition, Speech Recognition, Forensics


Speech Recognition

"Decode the spoken content from the acoustic signal"

Speaker Recognition

"Determine the identity of a speaker from acoustic signal"

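[Figure: two block diagrams]

• ASR: acoustic signal + speech models -> text, e.g. "Sehr geehrter .." ("Dear ..")

• SI/SV: acoustic signal + speaker characteristics + claimed identity (SV only) -> ID (SI) or Accepted/Rejected (SV)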


Speaker Recognition

Speaker Verification

• Authentication according to claimed identity
• Result is binary: "accept" / "reject"
• Scaling: effort independent of number of participants
• Accuracy: dependent on size of enrolment data

Speaker Identification

• Identification from a limited number of participants
• Result is the speaker identity
• Scaling: effort increases linearly with number of participants
• Accuracy: dependent on
  + size of enrolment data
  + number of participants

Verification outcomes:

• identity ok, accept: correct
• identity ok, reject: false reject
• identity wrong, reject: correct
• identity wrong, accept: false accept

[Figure: correctness (starting at 100) plotted against number of participants N]
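The difference in scaling between the two tasks can be sketched in a few lines. This is only an illustration, assuming that a match score of the utterance against each enrolled speaker is already available; the function names `verify` and `identify`, the speaker names, and the score values are hypothetical and not taken from the slides.

```python
def verify(scores, claimed_id, threshold):
    """Verification: one comparison against the claimed speaker's score."""
    return "accept" if scores[claimed_id] > threshold else "reject"


def identify(scores):
    """Identification: search all N enrolled speakers for the best score."""
    return max(scores, key=scores.get)


# Hypothetical match scores of one utterance against three enrolled speakers.
scores = {"anna": -41.2, "ben": -37.5, "carla": -52.0}

print(verify(scores, "ben", threshold=-40.0))  # one lookup, independent of N
print(identify(scores))                        # inspects every enrolled speaker
```

Verification touches a single model regardless of how many speakers are enrolled, whereas identification has to compare against all of them, which is why its effort grows with the number of participants.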


Speaker Verification

• Applications:
  – Access control
  – Verification of identity via the phone
  – Automatic teller machines
  – Password resetting
  – Banking: identity for new accounts etc.
  – Protection against theft (cars ...)

Speaker Identification

• Applications:
  – Forensics
  – Police work
  – Automatic user settings
  – Speaker classification: advertising


Speaker Verification: Doddington's Zoo (1)

User = registered speaker, Impostor = non-registered speaker

• Goats : users that are often rejected wrongly (increasing 'false reject' errors)

• Lambs : users that are easily imitated (increasing 'false accept' errors)

• Sheep : users that 'behave' (not goats and not lambs)

• Wolves : particularly successful impostors (increasing 'false accept' errors)

from Doddington 1998


Speaker Verification: Doddington's Zoo (2)

Wolves may perform zero-effort or active impostor attempts to break into an SV system.

Problem: Speaker verification databases do not contain data from active impostor attempts by wolves -> most technical evaluations are based on non-realistic data!


Technical Speech Processing

[Figure: processing chain]

Analog signal -> anti-aliasing filter -> A/D -> digital signal -> highpass -> feature detection -> vectors (m1 ... mN per frame, at t = 0, 10, 20, ...) -> decoder -> symbols

• Symbols: text, action, semantics

• Examples: "Call Richard!", "Radio off!", "216"


Speaker Verification: Basics (1)

[Figure: processing chain]

Anti-aliasing filter -> A/D -> highpass -> feature detection -> verification -> "Accept" / "Reject"

• Claimed identity (via PIN, fingerprint, or ASR) -> select ID -> speaker models -> verification


Speaker Verification: Basics (2)

[Figure: processing chain anti-aliasing filter -> A/D -> highpass -> feature detection -> verification -> "Accept" / "Reject"; filter response over frequency f with cut-off at f_sam / 2]

• Anti-aliasing filter: analog low-pass filter to avoid aliasing effects

• A/D: analog-to-digital converter
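Why the low-pass filter has to cut off at f_sam / 2 can be shown with a small calculation: any tone above half the sampling rate reappears at a lower frequency after sampling. The sketch below is an illustration only; the function name `alias_frequency` and the 8000 Hz sampling rate are assumptions, not values from the slides.

```python
def alias_frequency(f_hz, f_sam_hz):
    """Frequency at which a tone of f_hz appears after sampling at f_sam_hz."""
    folded = f_hz % f_sam_hz               # sampling cannot tell f from f + k * f_sam
    return min(folded, f_sam_hz - folded)  # fold into the band 0 .. f_sam / 2


f_sam = 8000  # assumed sampling rate in Hz
for f in (1000, 4000, 5000, 7000):
    print(f, "Hz ->", alias_frequency(f, f_sam), "Hz")
```

Tones at or below f_sam / 2 are unchanged, while the 5000 Hz and 7000 Hz tones fold back into the speech band, which is exactly what the analog filter in front of the converter is meant to prevent.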


Speaker Verification: Basics (3)

Extraction of speaker characteristics

Features:
• speaker specific
• robust against noise
• partly long term

[Figure: feature computation; one feature vector (m1 ... mN) per analysis window of 25 ms, at t = 0, 10, 20, 30, 40, ...; processing chain anti-aliasing filter -> A/D -> highpass -> feature detection -> verification -> "Accept" / "Reject"]
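The windowing step in front of the feature computation can be sketched as follows. The 25 ms window comes from the slide; the 10 ms shift is an assumption read off the time axis ticks (0, 10, 20, ...), and the function name `frame_signal` and the 8000 Hz sampling rate are hypothetical.

```python
def frame_signal(samples, f_sam_hz, window_ms=25, shift_ms=10):
    """Split a sample sequence into overlapping analysis windows."""
    window = int(f_sam_hz * window_ms / 1000)  # samples per window
    shift = int(f_sam_hz * shift_ms / 1000)    # samples between window starts
    frames = []
    start = 0
    while start + window <= len(samples):
        frames.append(samples[start:start + window])
        start += shift
    return frames


# One second of a dummy signal at an assumed 8000 Hz sampling rate.
signal = [0.0] * 8000
frames = frame_signal(signal, 8000)
print(len(frames), "frames of", len(frames[0]), "samples")
```

Each frame would then be turned into one feature vector (m1 ... mN); the feature computation itself is not shown here.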


Speaker Verification: Basics (4)

[Figure: processing chain anti-aliasing filter -> A/D -> highpass -> feature detection -> verification -> "Accept" / "Reject"]

• Input: vector sequence S (m1 ... mN per frame) and the speaker model of the claimed ID

• Decision:
  - p(S | ID) > threshold: "Accept"
  - p(S | ID) < threshold: "Reject"
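The threshold decision can be illustrated with a deliberately simple stand-in for the speaker model. The sketch below assumes a single diagonal Gaussian per speaker instead of the HMMs discussed later, and it averages the log-likelihood per frame so that the score does not depend on utterance length; both choices, as well as all names and numbers, are assumptions made for the example.

```python
import math


def log_likelihood(vectors, mean, var):
    """Average per-frame log p(vector | model) for a diagonal Gaussian model."""
    total = 0.0
    for vec in vectors:
        for x, m, v in zip(vec, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(vectors)


def decide(vectors, model, threshold):
    """Accept if the score under the claimed speaker's model exceeds the threshold."""
    score = log_likelihood(vectors, model["mean"], model["var"])
    return "Accept" if score > threshold else "Reject"


# Hypothetical model of the claimed ID and two short test sequences.
model = {"mean": [1.0, -0.5], "var": [0.5, 0.5]}
close = [[1.1, -0.4], [0.9, -0.6]]
far = [[3.0, 2.0], [2.8, 2.2]]

print(decide(close, model, threshold=-5.0))
print(decide(far, model, threshold=-5.0))
```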


Speaker Verification: Tuning

• Error types highly dependent on threshold:
  - high security -> false accept low, false reject high
  - user friendly -> false reject low, false accept high

[Figure: false accept and false reject rates plotted against the threshold; the crossing point is the Equal Error Rate]

• Both errors increase by:
  - channel disturbance
  - crosstalk
  - noise
  - room acoustics

• Solution:
  - multiple enrolments
  - adaptive learning
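The trade-off between the two error types, and the Equal Error Rate where they meet, can be estimated from two lists of scores. This is a minimal sketch under the assumption that higher scores mean "more like the claimed speaker"; the function names and the score values are hypothetical.

```python
def error_rates(genuine, impostor, threshold):
    """Return (false_reject_rate, false_accept_rate) at a given threshold."""
    fr = sum(s <= threshold for s in genuine) / len(genuine)
    fa = sum(s > threshold for s in impostor) / len(impostor)
    return fr, fa


def equal_error_rate(genuine, impostor):
    """Scan candidate thresholds; return the one where FR and FA are closest."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        fr, fa = error_rates(genuine, impostor, t)
        if best is None or abs(fr - fa) < best[0]:
            best = (abs(fr - fa), t, (fr + fa) / 2)
    return best[1], best[2]


# Hypothetical scores of true users and of impostors.
genuine = [0.9, 0.8, 0.75, 0.6, 0.4]
impostor = [0.7, 0.5, 0.3, 0.2, 0.1]

threshold, eer = equal_error_rate(genuine, impostor)
print("threshold:", threshold, "EER:", eer)
print("high security:", error_rates(genuine, impostor, 0.7))
```

Raising the threshold above the equal-error point lowers the false accept rate at the price of more false rejects, which is the "high security" setting described above.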


Speaker Verification: Score Normalisation (1)

Problem: How to set the optimal threshold?

HMMs generate likelihoods (class-conditional probabilities), not posterior probabilities:

$$p(O \mid l)$$

$O$: observation = sequence of features
$l$: speaker model

Bayes:

$$P(l \mid O) = \frac{p(O \mid l)\, P(l)}{P(O)}$$

but $P(O)$ is dependent on various factors


Speaker Verification: Score Normalisation (2)

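Solution: Bayesian Decision Rule: accept if

$$C_{FR}\, P(l \mid O) > C_{FA}\, P(\bar{l} \mid O)$$

with Bayes

$$P(l \mid O) = \frac{p(O \mid l)\, P(l)}{P(O)}$$

and log to both sides this leads to:

$$\log p(O \mid l) - \log p(O \mid \bar{l}) > \log \frac{C_{FA}\, P(\bar{l})}{C_{FR}\, P(l)} = \text{threshold}$$

$C_{FR}$, $C_{FA}$: cost functions
$\bar{l}$: any speaker other than $l$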


Speaker Verification: Score Normalisation (3)

Often assumed: costs are equal and speakers occur equally distributed. Accept if

$$\log p(O \mid l) - \log p(O \mid \bar{l}) > \log (N - 1)$$

$N$: number of users and impostors

$p(O \mid \bar{l})$, the likelihood under any speaker other than $l$, is estimated using a world or cohort model:

• world model: speaker model trained on all speakers

• cohort model: speaker model trained on a group of the most competing models (wolves)
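The normalised decision can be sketched numerically: the threshold is the log of the cost-weighted prior ratio, and with equal costs and equally likely speakers it reduces to log(N - 1). The function names, the value of N, and the log-likelihood values below are hypothetical; the competing likelihood is assumed to come from a world model.

```python
import math


def bayes_threshold(cost_fa, cost_fr, prior_target):
    """log( C_FA * P(not l) / (C_FR * P(l)) ), the decision threshold."""
    return math.log(cost_fa * (1.0 - prior_target) / (cost_fr * prior_target))


def accept(loglik_target, loglik_world, threshold):
    """Accept if log p(O|l) - log p(O|world) exceeds the threshold."""
    return (loglik_target - loglik_world) > threshold


n = 100  # assumed number of users and impostors
theta = bayes_threshold(cost_fa=1.0, cost_fr=1.0, prior_target=1.0 / n)
print(theta, math.log(n - 1))  # equal costs and priors give log(N - 1)

# Hypothetical log-likelihoods from the claimed model and a world model.
print(accept(loglik_target=-310.0, loglik_world=-318.0, threshold=theta))
```

Because only the difference of the two log-likelihoods enters the decision, factors that affect both models alike (for example channel or utterance length) largely cancel out.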


Speaker Verification: Enrolment

• Fixed, pre-specified sentence, e.g. "My voice is my password"
  – Enrolment: speak sentence 3 – 5 times
  – Remarks: sentence may be intercepted and played back

• Fixed, selectable sentence, e.g. maiden name of grandmother
  – Enrolment: speak sentence 3 – 5 times
  – Remarks: additional security by content

• Changing number triplets, e.g. fifteen, thirty-nine, seventy-three
  – Enrolment: speak each number 3 – 5 times
  – Remarks: high security by many possible combinations

• System generates a new sentence for each verification
  – Enrolment: speak each phoneme 3 – 5 times
  – Remarks: elaborate enrolment, high processing effort, very high security
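How many combinations the changing-number-triplet method offers depends on the number range, which the slide does not state. The sketch below assumes two-digit numbers from 10 to 99 and draws the prompt with Python's `secrets` module so that it is hard to predict; the function names and the range are hypothetical.

```python
import secrets


def random_triplet(low=10, high=99):
    """Draw three numbers uniformly from low..high (inclusive) as a prompt."""
    span = high - low + 1
    return [low + secrets.randbelow(span) for _ in range(3)]


def count_triplets(low=10, high=99):
    """Number of distinct ordered triplets the system can prompt for."""
    return (high - low + 1) ** 3


print(random_triplet())
print(count_triplets())
```

A recording of one session is of little use to an attacker when the next prompt is drawn from this many possibilities, yet enrolment only has to cover the individual numbers.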


Speaker Verification: HMM types

• Pre-specified sentence: linear HMM

• Recombination of segments taken from enrolment data: piecewise linear HMM

• Modeling without time structure: ergodic HMM

[Table columns "Security" and "Accuracy": ratings lost in extraction except for a single "o"]
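The difference between a linear and an ergodic model lies in which state transitions are allowed. The sketch below builds both transition matrices for illustration; the self-loop probability of 0.6 and the uniform ergodic probabilities are arbitrary assumptions (in practice they would be trained), and the piecewise linear case is not shown.

```python
def linear_transitions(n_states, stay=0.6):
    """Left-to-right model: each state may only repeat or move to the next."""
    matrix = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states - 1):
        matrix[i][i] = stay
        matrix[i][i + 1] = 1.0 - stay
    matrix[-1][-1] = 1.0  # final state absorbs
    return matrix


def ergodic_transitions(n_states):
    """Ergodic model: every state can be reached from every state."""
    return [[1.0 / n_states] * n_states for _ in range(n_states)]


for row in linear_transitions(3):
    print(row)
for row in ergodic_transitions(3):
    print(row)
```

The zeros in the linear matrix enforce the time order of a fixed sentence, while the ergodic matrix imposes no order at all, which matches "modeling without time structure".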


Speaker Verification: Features (1)

Variable signal characteristics

• often required: telephone band 300 – 3300 Hz (higher resonances cut off)
• changing channel characteristics, caused by transmission line, handset, distance to mouth
• static and intermittent noise
• user: health, intoxication, fatigue


Speaker Verification: Features (2)

Candidates determined by physiology:

• fundamental frequency, average
• wave form of vocal folds, shimmer, jitter, irregularities
• formants: average and dynamics
• places of articulation: fricatives, plosives
• nasal cavity resonance
• sub-glottal resonance
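Jitter and shimmer describe cycle-to-cycle irregularity of the vocal fold vibration, in period length and in amplitude respectively. The sketch below uses one common relative definition (mean absolute difference of neighbouring cycles divided by the mean value); other definitions exist, and the measurement values and the function name are hypothetical.

```python
def relative_perturbation(values):
    """Mean absolute difference of neighbours divided by the mean value."""
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))


# Hypothetical measurements from consecutive glottal cycles.
periods_ms = [8.0, 8.2, 7.9, 8.1, 8.0]       # cycle lengths
amplitudes = [0.50, 0.52, 0.49, 0.51, 0.50]  # peak amplitudes

jitter = relative_perturbation(periods_ms)   # cycle-to-cycle period variation
shimmer = relative_perturbation(amplitudes)  # cycle-to-cycle amplitude variation
print(f"jitter: {jitter:.3%}, shimmer: {shimmer:.3%}")
```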


Speaker Verification: Features (3)

Candidates determined by behaviour:

• voiced/unvoiced ratio
• fundamental frequency, dynamics
• syllable rate, pause/speech ratio
• dialectal features: vowel quality

Candidates determined by speech technology:

• Linear Prediction Coefficients (LPC)
• filter bank, Bark filter bank, Mel filter bank
• Cepstrum, Mel-Cepstrum
• (derivatives with respect to time)
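A Mel filter bank places its filters at equal distances on the Mel scale, so they become wider towards higher frequencies. The sketch below uses a widely used analytic approximation of the Mel scale and computes filter centre frequencies for the telephone band; the number of filters and the function names are assumptions, and the filter shapes themselves are not modelled.

```python
import math


def hz_to_mel(f_hz):
    """A widely used analytic approximation of the Mel scale."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_centres(f_low_hz, f_high_hz, n_filters):
    """Centre frequencies (Hz) of filters spaced evenly on the Mel scale."""
    lo, hi = hz_to_mel(f_low_hz), hz_to_mel(f_high_hz)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]


# Telephone band 300 - 3300 Hz with an assumed 8 filters.
for f in mel_centres(300.0, 3300.0, 8):
    print(round(f))
```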


Speaker Verification: Road Map

[Figure: timeline 1990, today, 2010, 2020]

• Access control for security areas
• Authentication via telephone
• Devices "recognise" their user
• Speaker profile on chip cards
• Access control for keyboardless PDAs
• Authentication in the background
• Public speaker profiles
• Automatic alcohol test in the vehicle


Thank You!