Dr. Gerald Friedland
International Computer Science Institute
Berkeley, CA

Introduction to Speaker Diarization

Monday, May 21, 2012

Speaker Diarization...

➡tries to answer the question: “who spoke when?”

➡using a single microphone or multiple microphone inputs

➡without prior knowledge of anything (number of speakers, language, text, etc.)

Visualization

Estimate “who spoke when” with no prior knowledge of the speakers, the number of speakers, the words, or the language spoken.

[Figure: an audio track with its segmentation and its clustering]

Speaker Diarization is NOT

•Speaker ID (which is supervised and needs prior training)

•Speaker Verification (which is supervised and returns a yes/no answer)

•Beamforming (which requires multiple mics, even though beamforming can be used to support diarization)

Why Diarization?

•An important basic technology for various semantic audio analysis tasks

•Meeting retrieval, video conferencing, speaker-adaptive ASR, video retrieval, etc.

•Let’s take a look at some examples

Application: Meeting Browsing

Application: Semantic Navigation

G. Friedland, L. Gottlieb, A. Janin: “Joke-o-mat: Browsing Sitcoms Punchline by Punchline”, Proceedings of ACM Multimedia, Beijing, China, October 2009.

Application: Video Duplicate Detection

Other Applications

(Speaker) Diarization is often used as underlying support for...

•Beamforming

•Visual Localization

•Video Analysis: Object Detection, Event Detection, Scene Detection

•Behavior-level analysis tasks, such as dominance detection

•Robotics Applications (e.g., addressing people)

•Support for adaptive speech recognition

Main Drive: NIST RT Eval

•Speaker Diarization has been evaluated as part of the NIST Rich Transcription (RT) Evaluation (since about 2002)

•Idea: Create “Rich Transcripts” of broadcast news, and later of meetings

•Evaluated on real-world data

Typical Component Composition for RT

[Figure: the audio signal feeds speaker diarization (“who spoke when”) and speech recognition (“what was said”), which combine into speaker attribution (“who said what”); higher-level analysis builds on top, e.g. relevant web scraping (“what's relevant to this”), summarization (“what are the main points”), indexing, search, and retrieval, and question answering]

Speaker Diarization: General Overview

[Figure: the audio signal goes through feature extraction (MFCC) and a speech/non-speech detector; the speech-only MFCC features enter the diarization engine (segmentation and clustering), which outputs metadata]

Output Format of Diarization

•RTTM files (as defined by NIST)

•Example:

SPEAKER soupnazi 1 40.0 2.5 <NA> <NA> George <NA>
SPEAKER soupnazi 1 42.5 2.5 <NA> <NA> Jerry <NA>
SPEAKER soupnazi 1 45.0 2.5 <NA> <NA> female <NA>

•A large number of tools are available for working with these files.
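As an illustration, the SPEAKER lines in the example can be read with a few lines of Python. This is a sketch under assumptions: it takes the field order from the example (record type, file ID, channel, onset in seconds, duration in seconds, two <NA> placeholders, speaker name, confidence), reads only the fields it needs, and ignores any extra trailing field that other RTTM versions may carry. The `Turn` and `parse_rttm` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One speaker turn from a SPEAKER line (hypothetical helper type)."""
    file_id: str
    channel: int
    start: float     # onset, in seconds
    duration: float  # length, in seconds
    speaker: str

    @property
    def end(self) -> float:
        return self.start + self.duration


def parse_rttm(lines):
    """Collect SPEAKER records; other record types and blank lines are skipped."""
    turns = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0] != "SPEAKER":
            continue
        # 0-based positions as in the example: fields 5 and 6 are <NA>
        # placeholders, field 7 is the speaker name; later fields are ignored.
        turns.append(Turn(file_id=fields[1], channel=int(fields[2]),
                          start=float(fields[3]), duration=float(fields[4]),
                          speaker=fields[7]))
    return turns


example = """SPEAKER soupnazi 1 40.0 2.5 <NA> <NA> George <NA>
SPEAKER soupnazi 1 42.5 2.5 <NA> <NA> Jerry <NA>
SPEAKER soupnazi 1 45.0 2.5 <NA> <NA> female <NA>"""

for turn in parse_rttm(example.splitlines()):
    print(turn.speaker, turn.start, turn.end)
# George 40.0 42.5
# Jerry 42.5 45.0
# female 45.0 47.5
```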

Error Measurement

•The US NIST defines the error metrics and evaluates speaker diarization on a regular basis

•The error metric is called the ‘Diarization Error Rate’ (DER)

•All tools are available as open source

DER = the total time during which a speaker is assigned wrongly, missed, hypothesized when there is none, or hypothesized alone when more than one speaker is talking, relative to the total speaker time in the reference.
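In NIST's formulation, each stretch of audio contributes its duration times max(N_ref, N_sys) - N_correct, where N_ref and N_sys are the numbers of reference and system speakers active and N_correct is the number matched correctly under the best one-to-one speaker mapping; the sum is divided by the total reference speaker time. The sketch below computes this at frame level under simplifying assumptions: one label per frame (no overlapping speech), no forgiveness collar around segment boundaries, and a brute-force search over speaker mappings that is only practical for a handful of speakers. `frame_der` is a hypothetical helper, not part of any NIST tool.

```python
from itertools import permutations


def frame_der(ref, hyp):
    """Frame-level DER sketch.

    ref, hyp: equal-length sequences with one speaker label per frame,
    or None for non-speech. Overlapping speech is not modeled.
    """
    if len(ref) != len(hyp):
        raise ValueError("ref and hyp must cover the same frames")
    ref_spk = sorted({r for r in ref if r is not None})
    hyp_spk = sorted({h for h in hyp if h is not None})

    # Try every one-to-one mapping of system speakers onto reference
    # speakers (None = mapped to nobody) and keep the best match count.
    candidates = ref_spk + [None] * len(hyp_spk)
    best_correct = 0
    for perm in permutations(candidates, len(hyp_spk)):
        mapping = dict(zip(hyp_spk, perm))
        correct = sum(1 for r, h in zip(ref, hyp)
                      if r is not None and h is not None and mapping[h] == r)
        best_correct = max(best_correct, correct)

    missed = sum(1 for r, h in zip(ref, hyp) if r is not None and h is None)
    false_alarm = sum(1 for r, h in zip(ref, hyp) if r is None and h is not None)
    both = sum(1 for r, h in zip(ref, hyp) if r is not None and h is not None)
    confusion = both - best_correct
    ref_total = sum(1 for r in ref if r is not None)
    return (missed + false_alarm + confusion) / ref_total


ref = ["A", "A", "A", "B", "B", None]
hyp = ["x", "x", "y", "y", "y", "y"]
print(frame_der(ref, hyp))  # 0.4: one confused frame + one false alarm, over 5 speech frames
```

Because false alarms are counted in the numerator while the denominator only counts reference speech, DER can exceed 100%.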

Segmentation & Clustering

•Originally: Segment first, cluster later (Chen, S. S. and Gopalakrishnan, P., “Clustering via the Bayesian information criterion with applications in speech recognition,” Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, 1998, Vol. 2, Seattle, USA, pp. 645-648.)

•More efficient: Top-Down and Bottom-Up Approaches

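Segmentation: Secret Sauce

•How do you distinguish speakers?

•The combination of MFCC + GMM + BIC (Mel-frequency cepstral coefficients, Gaussian mixture models, Bayesian information criterion) seems unbeatable!

•It can be generalized to Audio Percepts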

MFCC: Idea

Audio Signal → Pre-emphasis → Windowing → FFT → Mel-Scale Filterbank → Log-Scale → DCT → MFCC

(the result is a power cepstrum of the signal, taken on the mel frequency scale)
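A compact NumPy sketch of this pipeline follows. The parameter values (16 kHz audio, 25 ms Hamming windows every 10 ms, 512-point FFT, 26 mel filters, 13 coefficients, pre-emphasis factor 0.97) are common defaults assumed here, not values from the slides, and the mel mapping is the widely used 2595·log10(1 + f/700) formula. The DCT is an unnormalized DCT-II written out directly to avoid a SciPy dependency, and the signal is assumed to be at least one window long.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale, over rfft bins."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):    # rising edge of triangle i
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):   # falling edge of triangle i
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank


def dct_ii(x, n_coeffs):
    """Unnormalized DCT-II along the last axis, first n_coeffs outputs."""
    n = x.shape[-1]
    k = np.arange(n_coeffs)[:, None]
    m = np.arange(n)[None, :]
    return x @ np.cos(np.pi * k * (2 * m + 1) / (2 * n)).T


def mfcc(signal, sample_rate=16000, win_s=0.025, hop_s=0.010,
         n_fft=512, n_filters=26, n_ceps=13, preemph=0.97):
    signal = np.asarray(signal, dtype=float)
    # Pre-emphasis: y[t] = x[t] - a * x[t-1] boosts high frequencies
    emphasized = np.append(signal[0], signal[1:] - preemph * signal[:-1])
    # Windowing: overlapping frames, each multiplied by a Hamming window
    win = int(round(win_s * sample_rate))
    hop = int(round(hop_s * sample_rate))
    n_frames = 1 + (len(emphasized) - win) // hop
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(win)
    # FFT: power spectrum of each (zero-padded) frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Mel-scale filterbank, then log compression (floored to avoid log(0))
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(np.maximum(energies, 1e-10))
    # DCT decorrelates the log energies; keep the first n_ceps coefficients
    return dct_ii(log_energies, n_ceps)


sr = 16000
t = np.arange(sr) / sr
feats = mfcc(np.sin(2 * np.pi * 440.0 * t), sample_rate=sr)
print(feats.shape)  # (98, 13): 98 frames of 13 coefficients for 1 s of audio
```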

MFCC: Mel Scale
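The plot on this slide did not survive extraction. One widely used analytic form of the mel scale, which may or may not be the exact curve the slide showed, maps a frequency f in Hz to m(f) mel:

```latex
m(f) = 2595 \,\log_{10}\!\left(1 + \frac{f}{700}\right),
\qquad
m(1000\,\mathrm{Hz}) = 2595 \,\log_{10}(2.4286) \approx 1000\ \mathrm{mel}
```

The scale is roughly linear below about 1 kHz and logarithmic above it, which is why a mel filterbank places more, narrower filters at low frequencies.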

MFCC: Result

Gaussian Mixtures
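The figure on this slide is not recoverable. As a reminder of the model under discussion: a Gaussian mixture model with M components describes a feature vector x as a weighted sum of Gaussians. The weight symbol w_i is an assumption of this note, since the slides' own notation did not survive extraction.

```latex
p(x \mid \Theta) = \sum_{i=1}^{M} w_i \, \mathcal{N}(x;\, \mu_i, \Sigma_i),
\qquad w_i \ge 0, \quad \sum_{i=1}^{M} w_i = 1,
\qquad \Theta = \{ w_i, \mu_i, \Sigma_i \}_{i=1}^{M}
```

For a segment X = (x_1, ..., x_N) whose frames are treated as independent, log P(X | Θ) = Σ_n log p(x_n | Θ), which is the kind of likelihood term that BIC scores.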

Training of Mixture Models

Goal: Find ai for

Expectation:

Maximization:
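The Expectation and Maximization formulas on this slide were lost in extraction. As a hedged stand-in, the sketch below implements the textbook EM updates for a one-dimensional GMM: the E-step computes each component's responsibility for each sample, and the M-step re-estimates weights, means, and variances from those soft assignments. Diarization systems fit multi-dimensional (typically diagonal-covariance) GMMs on MFCC frames; the 1-D version, the quantile-based initialization, the fixed iteration count, and the variance floor are simplifications assumed here.

```python
import numpy as np


def gaussian_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def em_gmm_1d(x, n_components, n_iter=100, var_floor=1e-6):
    x = np.asarray(x, dtype=float)
    # Initialization: equal weights, means spread over the data's quantiles
    weights = np.full(n_components, 1.0 / n_components)
    means = np.quantile(x, (np.arange(n_components) + 0.5) / n_components)
    variances = np.full(n_components, np.var(x))
    for _ in range(n_iter):
        # E-step: responsibilities gamma[n, i] = P(component i | x_n)
        dens = weights * gaussian_pdf(x[:, None], means, variances)
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # M-step: maximize the expected log-likelihood given gamma
        n_i = gamma.sum(axis=0)
        weights = n_i / len(x)
        means = (gamma * x[:, None]).sum(axis=0) / n_i
        variances = (gamma * (x[:, None] - means) ** 2).sum(axis=0) / n_i
        variances = np.maximum(variances, var_floor)  # keep components from collapsing
    return weights, means, variances


rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(-5.0, 1.0, 500), rng.normal(5.0, 1.0, 500)])
w, mu, var = em_gmm_1d(data, 2)
print(np.round(mu, 1))  # means close to -5 and 5
```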

Bayesian Information Criterion

$$BIC = \log P(X \mid \Theta) - \frac{\lambda}{2} K \log N$$

where X is the sequence of features for a segment, Θ are the parameters of the statistical model for the segment, K is the number of parameters of the model, N is the number of frames in the segment, and λ is an optimization parameter.
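When the criterion is used to decide whether two segments (or clusters) X_1 and X_2 with N_1 and N_2 frames should be merged, it is typically applied as a difference: the BIC of modeling them with two separate models minus the BIC of modeling their union X (N = N_1 + N_2 frames) with one model. The derivation below follows the single full-covariance Gaussian formulation of Chen and Gopalakrishnan; it assumes each model has K parameters and uses the total frame count N in the penalty. Engines built on GMMs score likelihoods differently, so treat this as an illustration of the principle rather than of any particular engine.

```latex
\Delta\mathrm{BIC}
  = \log P(X_1 \mid \Theta_1) + \log P(X_2 \mid \Theta_2) - \log P(X \mid \Theta)
    - \frac{\lambda}{2} K \log N
\\[4pt]
K = d + \frac{d(d+1)}{2}, \qquad
\Delta\mathrm{BIC}
  = \frac{N}{2}\log\lvert\Sigma\rvert
    - \frac{N_1}{2}\log\lvert\Sigma_1\rvert
    - \frac{N_2}{2}\log\lvert\Sigma_2\rvert
    - \frac{\lambda}{2} K \log N
```

The second line holds for maximum-likelihood Gaussians in d dimensions, whose constant terms cancel. ΔBIC > 0 favors two separate speakers; ΔBIC ≤ 0 favors merging.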

Bayesian Information Criterion: Explanation

•BIC penalizes the complexity of the model (as measured by the number of parameters in the model).

•BIC measures how efficiently the parameterized model predicts the data.

•BIC is therefore used to choose the number of clusters according to the intrinsic complexity present in a particular dataset.

Bayesian Information Criterion: Properties

•BIC is a minimum description length criterion.

•BIC is independent of the prior.

•It is closely related to other penalized likelihood criteria such as the risk inflation criterion (RIC) and the Akaike information criterion (AIC).
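To make the relationship with the AIC concrete, write BIC = log P(X | Θ) - (λ/2) K log N, with the symbols defined on the BIC slide; with λ = 1 this is the textbook criterion. Multiplying by -2 puts it on the same scale as the AIC, where lower is better, and the two then differ only in the penalty per parameter:

```latex
-2\,\mathrm{BIC} = -2\log P(X \mid \Theta) + \lambda K \log N,
\qquad
\mathrm{AIC} = -2\log P(X \mid \Theta) + 2K
```

With λ = 1, BIC charges log N per parameter and AIC charges 2, so BIC is the stricter of the two once N > e² ≈ 7.4 frames, i.e. for essentially any real segment.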

Bottom-Up Algorithm

[Figure: flowchart Initialization → (Re-)Training → (Re-)Alignment → “Merge two Clusters?”; Yes loops back to (Re-)Training, No leads to End. Example label sequences shrink from three clusters (Cluster1, Cluster2, Cluster3) to two (Cluster1, Cluster2) as clusters are merged.]

•Start with too many clusters (initialized randomly)

•Purify clusters by comparing and merging similar clusters

•Resegment and repeat until no more merging is needed
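A deliberately simplified sketch of the compare-and-merge loop follows. Assumptions: each cluster is modeled by a single full-covariance Gaussian and pairs are compared with the single-Gaussian ΔBIC of Chen and Gopalakrishnan (positive means "different speakers"); initialization splits the frames into `n_init` equal chunks rather than random ones, for reproducibility; and the (re-)training of GMMs and the (re-)alignment step are omitted entirely, so clusters are merged but never resegmented. `bottom_up`, `delta_bic`, and `log_det_cov` are hypothetical names, not ICSI's code.

```python
import numpy as np


def log_det_cov(x):
    """Log-determinant of the maximum-likelihood covariance of the rows of x."""
    cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
    return np.linalg.slogdet(cov)[1]


def delta_bic(x1, x2, lam=1.0):
    """BIC(two Gaussians) - BIC(one Gaussian); > 0 suggests different speakers."""
    x = np.vstack([x1, x2])
    n, d = x.shape
    gain = 0.5 * (n * log_det_cov(x)
                  - len(x1) * log_det_cov(x1)
                  - len(x2) * log_det_cov(x2))
    k = d + d * (d + 1) / 2  # parameters of one full-covariance Gaussian
    return gain - 0.5 * lam * k * np.log(n)


def bottom_up(features, n_init=8, lam=1.0):
    """Agglomerative clustering of frames (rows of `features`) by delta-BIC."""
    # Initialization: too many clusters, here n_init equal-length chunks
    labels = np.arange(len(features)) * n_init // len(features)
    while True:
        ids = np.unique(labels)
        if len(ids) < 2:
            break
        # Compare all cluster pairs; the lowest delta-BIC is the most similar pair
        best = None
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                score = delta_bic(features[labels == a], features[labels == b], lam)
                if best is None or score < best[0]:
                    best = (score, a, b)
        score, a, b = best
        if score > 0:            # even the closest pair looks like two speakers
            break
        labels[labels == b] = a  # merge cluster b into cluster a
    return labels


rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(0.0, 1.0, (400, 2)),   # simulated "speaker A" frames
                   rng.normal(5.0, 1.0, (400, 2))])  # simulated "speaker B" frames
labels = bottom_up(feats)
print(np.unique(labels))  # expected: two clusters, one per simulated speaker
```

In a full system the merged clusters would be retrained and the audio realigned before the next comparison, which lets segment boundaries move; this sketch keeps the initial chunk boundaries fixed.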

27

• Speaker Diarization research @ ICSI since 2001

• Various versions of Diarization Engines developed over the years

• Status: Research code but stable for some applications that are error tolerant

ICSI’s Speaker Diarization

Monday, May 21, 12


ICSI's Speaker Diarization Engine Variants

• Basic (single mic, easy installation)
• Fast (single mic, multiple CPU cores)
• Super fast (single mic, multiple GPUs)
• Accurate but slow (multi mic, additional preprocessing)
• Audio/Visual (single and multi mic, for localization)
• Online (single mic, “who is speaking now”)


Basic Speaker Diarization: Facts

• Input: 16 kHz mono audio
• Features: MFCC19 (19 MFCCs), no delta or delta-delta
• Speech/non-speech detector: external
• Runtime: ~realtime (1 h of audio needs about 1 h of processing on a single CPU, excluding speech/non-speech detection)
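The feature front end listed above (19 static MFCCs from 16 kHz mono audio, no delta or delta-delta) can be sketched with NumPy alone. This is an illustrative reimplementation, not ICSI's front end: the 25 ms Hamming window, 10 ms hop, 512-point FFT, 40 mel filters and dropping c0 are all assumptions, and `mfcc19` is a hypothetical name.

```python
import numpy as np

# Hypothetical MFCC19 sketch; window/hop/FFT size/filter count are assumptions.


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mfcc19(audio, sr=16000, win_ms=25, hop_ms=10, n_fft=512, n_mels=40, n_ceps=19):
    """Static MFCCs only (no delta / delta-delta), one row per frame."""
    win, hop = sr * win_ms // 1000, sr * hop_ms // 1000
    n_frames = 1 + (len(audio) - win) // hop
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = audio[idx] * np.hamming(win)
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft

    # Triangular mel filterbank between 0 Hz and the Nyquist frequency.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, left:center] = (np.arange(left, center) - left) / max(center - left, 1)
        fbank[m - 1, center:right] = (right - np.arange(center, right)) / max(right - center, 1)
    log_mel = np.log(power @ fbank.T + 1e-10)

    # DCT-II of the log mel energies; keep c1..c19 (c0, the energy term, is dropped).
    k = np.arange(n_ceps + 1)[:, None]
    n = np.arange(n_mels)[None, :]
    dct = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    return (log_mel @ dct.T)[:, 1:]
```

With these assumed settings, one second of 16 kHz audio yields 98 frames of 19 coefficients each.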


Multi-CPU Speaker Diarization: Facts

• Same as Basic Speaker Diarization
• Runtime: depends on the number of CPU cores used. Example: with 8 cores the engine runs at 14.3 x realtime, i.e. about 14 minutes of audio need 1 minute of processing.
• Runtime bottleneck usually: speech/non-speech detector


GPU Speaker Diarization: Facts

• Same as Basic Speaker Diarization
• Runtime: 250 x realtime, i.e. 1 h of audio is processed in 14.4 s!
• Backend: the current NVIDIA CUDA framework
• Frontend: Python!
• Runtime bottleneck usually: speech/non-speech detector, feature extraction
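The speed figures on these slides are real-time factors: processing time equals audio duration divided by the "x realtime" factor. The small helper below (a hypothetical name, not part of the ICSI code) reproduces the arithmetic for the single-CPU, 8-core and GPU figures.

```python
# Hypothetical helper: processing time from a "x realtime" speed-up factor.


def processing_seconds(audio_seconds, x_realtime):
    """Wall-clock processing time for an engine running at `x_realtime` times real time."""
    return audio_seconds / x_realtime


for engine, factor in [("1 CPU", 1.0), ("8 CPU cores", 14.3), ("GPU", 250.0)]:
    minutes = processing_seconds(3600, factor) / 60
    print(f"{engine}: 1 h of audio takes {minutes:.2f} min")
```

At 14.3 x realtime, 1 h of audio takes about 4.2 minutes; at 250 x realtime it takes 3600 s / 250 = 14.4 s.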

Demo: 1 CPU vs 8 CPUs vs GPU

Most Accurate Speaker Diarization: Overview

[Figure: block diagram. The audio signal(s) are preprocessed with Wiener filtering, beamforming (which also produces delay features) and dynamic range compression. Short-term feature extraction produces MFCCs, a speech/non-speech detector selects speech, and long-term feature extraction produces prosodic features (speech only), which EM clustering turns into initial segments. The diarization engine (segmentation, clustering) combines MFCCs (speech only), delay features and prosodics (speech only) and outputs "who spoke when".]
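The slide does not say how the delay features are computed. Assuming they are time differences of arrival (TDOA) between microphone pairs, a common estimator is GCC-PHAT (generalized cross-correlation with phase transform), sketched below for one pair of channels. `gcc_phat_delay` and `max_delay_s` are hypothetical names; windowing, reference-channel selection and smoothing are left out.

```python
import numpy as np

# Hypothetical GCC-PHAT sketch for one microphone pair (assumed delay-feature method).


def gcc_phat_delay(sig, ref, fs=16000, max_delay_s=None):
    """Delay of `sig` relative to `ref` in seconds (positive = sig lags ref)."""
    n = len(sig) + len(ref)  # zero-pad so the correlation is linear, not circular
    cross = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    cc = np.fft.irfft(cross / (np.abs(cross) + 1e-12), n=n)  # PHAT: keep phase only
    max_shift = n // 2
    if max_delay_s is not None:
        max_shift = min(int(fs * max_delay_s), max_shift)
    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[: max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs
```

In a multi-microphone recording one would apply this per analysis window to each channel against a reference channel, giving one delay value per channel pair and window.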

Audio/Visual Speaker Diarization: Overview

[Figure: block diagram. Audio signal → feature extraction (MFCC) and speech/non-speech detector; MFCCs (speech only) go into the diarization engine (segmentation, clustering), which outputs "who spoke when". Video signal → feature extraction → video activity features (speech regions only), also fed into the diarization engine. Inverting the visual models yields "where the speaker was".]

Video Feature Extraction

[Figure: processing blocks: MPEG-4 video; detect skin blocks; average motion vectors; divide frames into n regions; output: n-dimensional activity vector. Window size: 400 ms.]
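Under assumptions the slide does not fix (block motion vectors already decoded from the MPEG-4 stream, a boolean skin mask per block, the n regions taken as vertical strips, 25 frames per second), the activity vector can be sketched as the average motion magnitude over skin blocks in each region, averaged over 400 ms windows. `activity_vectors` and its parameters are hypothetical.

```python
import numpy as np

# Hypothetical sketch of the n-dimensional video activity vector (assumed inputs).


def activity_vectors(motion, skin, n_regions=4, fps=25, window_ms=400):
    """One n_regions-dimensional activity vector per window.

    motion: (T, rows, cols, 2) block motion vectors; skin: (T, rows, cols) bool mask.
    """
    T = motion.shape[0]
    mag = np.linalg.norm(motion, axis=-1) * skin  # keep motion on skin blocks only
    regions = np.array_split(np.arange(motion.shape[2]), n_regions)  # vertical strips
    per_frame = np.stack(
        [mag[:, :, cols].sum(axis=(1, 2)) / np.maximum(skin[:, :, cols].sum(axis=(1, 2)), 1)
         for cols in regions],
        axis=1,
    )  # (T, n_regions): average skin-block motion per region and frame
    win = max(1, round(fps * window_ms / 1000))  # 400 ms -> 10 frames at 25 fps
    n_win = T // win  # trailing frames that do not fill a window are dropped
    return per_frame[: n_win * win].reshape(n_win, win, n_regions).mean(axis=1)
```

Splitting the frame into strips is one simple choice; with several cameras, the per-camera vectors could be concatenated.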


Audio/Visual Speaker Diarization: Facts

• One engine for audio and video
• Scales with n cameras
• Robust against visual changes such as different clothing, occlusions, etc.

“A voiceprint does not care about somebody dimming the light”

Audio/Visual Diarization: Example Video


In a perfect world...

• There is no overlapped speech
• The signal is clean
• There is no environmental noise
• The number of speakers is limited (4 or so)
• Speakers are well distinguishable by their voices (e.g. male vs. female, young vs. old)
• Speakers are non-emotional
• The recording is at 16 kHz
• The recording is 15-60 minutes long

Current Results using Different Inputs

Error/System | Basic System: 1 Audio Stream | 8 Audio Streams | 1 Audio Stream + 1 Camera | 1 Audio Stream + 4 Cameras
--- | --- | --- | --- | ---
Diarization Error Rate | 32.09% | 27.55% | 27.52% | 24.00%
Relative Improvement | baseline | 14% | 14% | 25%
Core Speed (x realtime) | 1.0 | 2.2 | 1.4 | 1.3

12 meeting recordings from the AMI corpus

Most Accurate Results

Error/System | MFCC only (basic system) | Full System | Full System + One Camera
--- | --- | --- | ---
Diarization Error Rate | 32.09% | 20.33% | 18.98%
Relative Improvement | baseline | 36% | 41%
Core Speed (x realtime) | 1.0 | 2.5 | 2.9

12 meetings from the AMI corpus (“VACE Meetings”)
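The Diarization Error Rate (DER), as scored in the NIST RT evaluations, is the fraction of scored speech time that is missed, falsely detected as speech, or attributed to the wrong speaker. The relative improvements in the results tables are measured against the basic system's DER. The sketch below (hypothetical helper names) recomputes them from the DER values above. Computed this way, the full system's improvement comes out at 36.6%, which the table lists as 36%.

```python
# Hypothetical helpers; DER values are taken from the results tables above.


def diarization_error_rate(missed, false_alarm, speaker_error, scored_speech):
    """DER from durations in seconds: missed speech, false-alarm speech, speaker confusion."""
    return (missed + false_alarm + speaker_error) / scored_speech


def relative_improvement(baseline_der, new_der):
    """Fractional DER reduction relative to the baseline system."""
    return (baseline_der - new_der) / baseline_der


BASELINE_DER = 32.09  # basic system: MFCC only, 1 audio stream
systems = {
    "8 audio streams": 27.55,
    "1 audio stream + 1 camera": 27.52,
    "1 audio stream + 4 cameras": 24.00,
    "Full system": 20.33,
    "Full system + one camera": 18.98,
}
for name, der in systems.items():
    print(f"{name:28s} {100 * relative_improvement(BASELINE_DER, der):5.1f}%")
```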


Top Error Sources

• Overlapped speech
• Short speech segments (<2 s)
• Environmental noise
• Low SNR
• Poor speech/non-speech detector performance due to training data mismatch
• Parameter mismatch, e.g. too few initial clusters


Optimal Performance is achieved when...

• There is no overlapped speech
• The signal is clean
• There is no environmental noise
• The number of speakers is limited (4 or so)
• Speakers are well distinguishable by their voices (e.g. male vs. female, young vs. old)
• Speakers are non-emotional
• The recording is at 16 kHz or higher

Future Work!

Thank You!

Questions?

Some of the presented work was performed together with Mary Knox, Katya Gonina, Adam Janin and others.