Language modeling for speaker recognition

Dan Gillick

January 20, 2004

Outline

• Author identification

• Trying to beat Doddington’s “idiolect” modeling strategy (speaker recognition)

• My next project

Author ID (undergrad. thesis)

Problem:
– train models for each of k authors
– given some test text written by 1 of those authors, identify the correct author

Variations:
– different kinds of models
– different test sample sizes
– different k

Character n-gram models

What?
– 27 tokens: a-z, <space>
– some text generated from such a trigram model:

“you orthad gool of anythilly uncand or prafecaustiont and to hing that put ably”
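
For concreteness, here is a minimal Python sketch of such a character trigram model; the uniform fallback for unseen histories and the sampling details are my own assumptions, not the exact setup from the thesis.

import random
from collections import defaultdict

TOKENS = "abcdefghijklmnopqrstuvwxyz "  # 27 tokens: a-z plus <space>

def train_trigram(text):
    """Count character trigrams over the 27-token alphabet."""
    text = "".join(c for c in text.lower() if c in TOKENS)
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(len(text) - 2):
        counts[text[i:i + 2]][text[i + 2]] += 1
    return counts

def generate(counts, length=80, seed="th"):
    """Sample text from the trigram model, like the example above."""
    out = seed
    for _ in range(length):
        nexts = counts.get(out[-2:])
        if not nexts:                      # unseen history: back off to uniform
            out += random.choice(TOKENS)
            continue
        chars, weights = zip(*nexts.items())
        out += random.choices(chars, weights=weights)[0]
    return out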

Character n-gram models

Why?
– very simple
– data sparseness less troublesome than with word n-grams
– supposed to be state-of-the-art, or at least close to it (Khmelev, D. and Tweedie, F.J., “Using Markov Chains for the Identification of Writers,” Literary and Linguistic Computing, 16(4): 299–307, 2001)

Character n-grams: Setup

• task: pick correct author from 10 possible authors

• training data: 3 novels for each author

• test data: text from a held-out novel

• jack-knifing: 4 novels for each of 20 authors

Character n-grams: Results

• task: picking 1 author from 10 possible authors

• training data size: 3 novels

Character n-gram models

Why does it work?
– captures some word choice information
– picks up word endings (-ing, -tion, -ly, etc.)
– not hurt much by data sparseness issues

Key-list models

Incentive:
– ought to be able to beat character n-grams
– develop a new modeling method more focused on what differentiates between authors (characters and words are both useful for topic recognition, but that doesn’t mean they are best for author recognition)

Key-list models

Idea:
– convert the text stream into a stream of only authorship-relevant symbols (I called these lists of symbols key-lists)
– each symbol is a regular expression to allow for broad definitions (/*tion/ captures any nounification)
– text not accounted for by the key-list is represented by <short>, <med>, or <long> markers
– build n-gram models from these new streams (a minimal sketch follows below)
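
A minimal sketch of this conversion, assuming a toy key-list and made-up length thresholds for the <short>/<med>/<long> markers (the thesis’s actual thresholds aren’t given here):

import re

# Toy key-list (regex -> symbol), in the spirit of the table on the next slide.
KEY_LIST = [
    (re.compile(r","), "<comma>"),
    (re.compile(r"\."), "<period>"),
    (re.compile(r"\b\w+ing\b"), "<ing>"),
    (re.compile(r"\b\w+ly\b"), "<adverb>"),
    (re.compile(r"\b(and|but|or|not|if|then|else)\b"), "<logical>"),
]

def gap_marker(n_chars):
    """Unmatched text becomes a length marker; thresholds are assumptions."""
    if n_chars == 0:
        return None
    return "<short>" if n_chars < 8 else "<med>" if n_chars < 25 else "<long>"

def to_key_stream(text):
    """Convert raw text into a stream of authorship-relevant symbols."""
    hits = sorted((m.start(), m.end(), sym)
                  for rx, sym in KEY_LIST for m in rx.finditer(text))
    stream, pos = [], 0
    for start, end, sym in hits:
        if start < pos:                    # skip overlapping matches
            continue
        gap = gap_marker(start - pos)
        if gap:
            stream.append(gap)
        stream.append(sym)
        pos = end
    gap = gap_marker(len(text) - pos)
    if gap:
        stream.append(gap)
    return stream                          # feed this stream to n-gram training

For example, to_key_stream("He ran quickly, and then stopped.") yields ['<short>', '<adverb>', '<comma>', '<short>', '<logical>', '<med>', '<period>'].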

Key-list models

Sample key-list:

Regular Expression                      Description
(\w)(,)(\s)                             comma
(\w)(\.)(\s)                            period
(\b)(of|for|to|around|after| … )(\b)    common prepositions
(\b)((was|were) \w*ed)(\b)              passive voice
(\b)(is|was|will|are|were|am)(\b)       is conjugations
(\b)(\w*ing)(\b)                        ends in -ing
(\b)(\w*ly)(\b)                         adverb
(\b)(and|but|or|not|if|then|else)(\b)   logical
(\b)(as)(\b)                            as
(\b)(would|should|could)(\b)            modal verbs

Sample trigram: <comma> <short> <period>

Key-list models: Results

• task: picking 1 author from 10 possible authors

• training data size: 3 novels

Key-list models: Results

Some other interesting results:
– key-lists with just punctuation (as well as <short>, <med>, <long>) performed almost as well as the best key-lists
– all key-lists were outperformed by the best n-letter model when test data size < 10,000 chars., but all key-list models eventually surpassed the n-letter models

Key-list models

Things I didn’t do:
– vary the amount of training data
– spend a long time trying different key-lists
– combine key-list results with each other or with the character results
– a lot of other stuff

The thesis is available on the web: http://www.dgillick.com/resource/thesis.pdf

Outline

• Author identification

• Trying to beat Doddington’s “idiolect” modeling strategy (speaker recognition)

• My next project

G. Doddington’s LM strategy

• create LMs with a limited vocabulary of the most commonly occurring 2000 bigrams

• to smooth out zeroes, boost each bigram prob. by 0.001

• score by calculating:

logprob(test|target) – logprob(test|bkg)

• logprobs are joint probabilities: logprob(AB) = logprob(A) + logprob(B|A)
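
A minimal sketch of this scoring rule, assuming the models are stored as dicts mapping each in-vocabulary bigram to its (conditional) probability; dropping out-of-vocabulary bigrams outright is my assumption, not a detail from the slide:

import math

BOOST = 0.001  # flat boost added to every bigram prob. to smooth out zeroes

def logprob(bigrams, model):
    """Joint log prob. of the test stream under a model, summing per-bigram
    log probs: logprob(AB) = logprob(A) + logprob(B|A)."""
    return sum(math.log(model.get(bg, 0.0) + BOOST) for bg in bigrams)

def detection_score(test_bigrams, vocab, target, background):
    """score = logprob(test | target) - logprob(test | bkg),
    restricted to the 2000-bigram vocabulary."""
    in_vocab = [bg for bg in test_bigrams if bg in vocab]
    return logprob(in_vocab, target) - logprob(in_vocab, background)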

G. Doddington’s LM: Setup

Switchboard 1 data:
– collected in the early ’90s from all over the US
– 2,400 (~5 min.) conversations among 543 speakers
– corpus divided into 6 splits and tested using jack-knifing through the splits
– manual transcripts provided by Mississippi State

Task:
– 8 conversation sides used as training data to build models for each target speaker
– 1 conversation side used as test data
– background model built from 3 splits of held-out data
– jack-knifing allowed for almost 10,000 trials

G. Doddington’s LM: Results

Notes:
– these results are my own attempt to replicate the original experiments
– SRI reported EER = 8.65% for this same experiment

Adapted bigram models

Incentive:
– adapting target models from a much larger background model should yield better estimates of the probabilities in the language models

Specifically:
– use the same 2000-bigram vocabulary
– target probabilities are a mixture of training probabilities and background probabilities
– mixture weight is 2:1 target data : background data (see the sketch below)
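
A sketch of the interpolation, reading the 2:1 weighting as a fixed 2/3 : 1/3 per-bigram mixture (whether the original used fixed weights or count-based MAP adaptation is not spelled out on the slide):

def adapt(target_probs, background_probs, w_target=2.0 / 3.0):
    """Adapted target model: per-bigram mixture of the (sparse) target
    estimates and the (well-estimated) background estimates."""
    return {bg: w_target * target_probs.get(bg, 0.0) + (1.0 - w_target) * p_bkg
            for bg, p_bkg in background_probs.items()}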

Adapted bigram models: Results

Notes:
– nearly identical performance
– combination of the 2 systems yields almost no improvement
– why isn’t the adapted version better?

Can anything improve on 8.68%?

Trigrams?
– use the same count threshold to make a list of the top 700 trigrams (“a lot of” and “I don’t know” were among the most common)

Character models?
– worked well for authorship…
– included all character combinations (no limited vocabulary)
– tried bigram and trigram models

Scores and combinations

Individual systems:
  adapt. word bigrams      EER = 8.89%
  adapt. word trigrams     EER = 11.88%
  adapt. char. bigrams     EER = 13.73%
  adapt. char. trigrams    EER = 17.92%

Combinations:
  adapted words (bigrams + trigrams)        EER = 8.46%
  adapted characters (bigrams + trigrams)   EER = 13.24%
  adapted words + adapted characters        EER = 7.89%

Baseline:
  GD bigrams               EER = 8.68%
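
These combined numbers suggest score-level fusion of per-trial detection scores; the equal-weight sum sketched below is an assumed stand-in, since the slide doesn’t say how the systems were actually combined:

def fuse(scores_a, scores_b, w=0.5):
    """Combine two systems' detection scores on the same trials.
    Equal weighting is an assumption, not a detail from the slide."""
    return [w * a + (1.0 - w) * b for a, b in zip(scores_a, scores_b)]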

Final Comparison

What about less training data?

1 conversation-side training:
– character models might provide more of an advantage with less data? not so:
  • GD EER = 22.5%
  • adapted character EER = 30%
  • adapted word EER = 20%
– maybe these character models pick up on the topic of that 1 conversation
– haven’t tried any other training data sizes

Outline

• Author identification

• Trying to beat GD’s result

• My next project

Key-lists for speaker recognition

• key-list n-grams picked up on phrasing (comma and period were valuable tokens)
  – automatic transcripts don’t have punctuation, but they do have pause and duration information

• use reg. exps. and duration info. to capture idiosyncratic speaker phrasing

• capture other speech information in key-lists? (energy, f0, etc.)

Acknowledgements

Thanks to:

Anand and Luciana at SRI for trying to help me replicate their results

Barbara for providing advice

Barry and Kofi for helping with computers and stuff

George