
Speaker Diarization and Identification Using Machine Learning

Enhong Deng, REU Student

Graduate Mentor: Abhinav Dixit, Faculty Advisors: Andreas Spanias, Visar Berisha

MOTIVATION

PROBLEM STATEMENT

Sensor Signal and Information Processing Center (http://sensip.asu.edu)

SenSIP Center, School of ECEE, Arizona State University

Department of Speech and Hearing Science, Arizona State University

SenSIP Algorithms and Devices REU

ABSTRACT

REFERENCES

Speaker diarization identifies speakers in long speech recordings.

Form speech segments and remove undesired noise and unvoiced sections.

Form i-vectors from features extracted from speech segments.

Train a machine learning model on the extracted features.

Classify new speech segments according to speaker identity.

Speaker Recognition: track the active speaker in a conversation with multiple speakers.

Audio Indexing: detect speaker changes as a pre-processing step for automatic transcription.

Information Retrieval: examine the contributions of speakers in speech recordings.

MACHINE LEARNING ALGORITHM

Voice-Activity Detection (VAD): identifies non-speech sounds and retains only the actual speech.
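A minimal energy-based VAD can illustrate this step. The frame length, hop size, and threshold below are illustrative assumptions, not the poster's actual parameters, and real systems typically use statistical or model-based detectors.

```python
import numpy as np

def energy_vad(signal, frame_len=400, hop=160, threshold_db=-30.0):
    """Keep only frames whose short-time energy exceeds a threshold
    relative to the loudest frame (simple energy-based VAD sketch)."""
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    frames = np.stack([signal[i * hop : i * hop + frame_len]
                       for i in range(n_frames)])
    energy_db = 10.0 * np.log10(np.sum(frames ** 2, axis=1) + 1e-12)
    mask = energy_db > (energy_db.max() + threshold_db)
    return frames[mask], mask

# Example: one second of silence followed by one second of a tone at 16 kHz
sr = 16000
t = np.arange(sr) / sr
signal = np.concatenate([np.zeros(sr), 0.5 * np.sin(2 * np.pi * 440 * t)])
voiced, mask = energy_vad(signal)
```

Only the frames from the second half (the tone) survive; the silent frames are discarded before feature extraction.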

I-Vectors: identity information is extracted using MFCCs, and low-dimensional i-vectors represent the utterances in the speech.
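The MFCC front end these i-vectors are built on can be sketched as follows; the full i-vector factor analysis (total-variability modeling) is beyond a short example. All parameter values here are illustrative assumptions.

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=26, n_ceps=13):
    """Toy MFCC front end: framing -> power spectrum -> mel filterbank
    -> log -> DCT. Real systems add pre-emphasis, liftering, deltas."""
    # Frame the signal with a Hamming window
    n_frames = 1 + max(0, (len(signal) - n_fft) // hop)
    window = np.hamming(n_fft)
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2

    # Triangular mel filterbank
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c):
            fbank[m - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fbank[m - 1, k] = (r - k) / max(r - c, 1)
    log_mel = np.log(power @ fbank.T + 1e-10)

    # DCT-II to decorrelate; keep the first n_ceps coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_mels)))
    return log_mel @ dct.T

# Example: 13 MFCCs per frame for one second of noise at 16 kHz
feats = mfcc(np.random.default_rng(0).standard_normal(16000))
```

In an i-vector system these per-frame MFCCs are pooled into sufficient statistics under a universal background model before the low-dimensional i-vector is extracted.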

RESULTS

ACKNOWLEDGEMENT

This project was funded in part by the National Science Foundation under Grant No. CNS 1659871, REU site: Sensors, Signal and Information Processing Devices and Algorithms.

METHODS

K-Means Clustering

Perform both supervised and unsupervised speaker diarization in a telephone conversation.

Distinguish between male and female speakers to answer the question "Who speaks when?"

[1] J. H. L. Hansen and T. Hasan, "Speaker Recognition by Machines and Humans: A tutorial review," IEEE Signal Processing Magazine, vol. 32, no. 6, pp. 74-99, Nov. 2015.
[2] H. Song, M. Willi, J. J. Thiagarajan, V. Berisha, and A. Spanias, "Triplet Network with Attention for Speaker Diarization," in Interspeech 2018.
[3] http://multimedia.icsi.berkeley.edu/speaker-diarization/

Unsupervised learning: 98.5% accuracy in clustering. All data is partitioned into three clusters, and each cluster represents a speaker class.

Supervised learning: 97.7% accuracy in classification. 75% of the data is used to train an SVM model, and the remaining 25% is used to test the trained model.

Support Vector Machines (SVM)

Given labeled data, an SVM can be trained to develop a model capable of distinguishing among different classes.

The trained SVM model predicts the identity of the speaker in new speech data.
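A minimal from-scratch sketch of a linear SVM trained with the Pegasos subgradient method; the poster does not specify the SVM implementation or kernel, and the two Gaussian clusters below are a stand-in for per-speaker feature vectors.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Minimal linear SVM trained with the Pegasos subgradient method.
    Labels y must be in {-1, +1}. Returns weights with the bias folded in."""
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((len(X), 1))])   # append a constant bias column
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(Xb)):
            t += 1
            eta = 1.0 / (lam * t)
            if y[i] * (w @ Xb[i]) < 1:          # margin violation: hinge-loss step
                w = (1 - eta * lam) * w + eta * y[i] * Xb[i]
            else:                               # only the regularizer shrinks w
                w = (1 - eta * lam) * w
    return w

def predict(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return np.sign(Xb @ w)

# Toy "speakers": two Gaussian clusters standing in for speaker features
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)
w = train_linear_svm(X, y)
acc = np.mean(predict(w, X) == y)
```

Multi-class speaker identification would extend this with a one-vs-rest scheme or a library SVM; the sketch only shows the binary decision boundary being learned.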

With a known number of groups k, k centroids are randomly chosen.

K-means then clusters the data into k groups.
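The two steps above can be sketched as plain k-means: random centroids drawn from the data, then alternating assignment and update. The three synthetic clusters stand in for speaker features and are an illustrative assumption.

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Plain k-means: pick k random data points as initial centroids,
    then alternate assignment and centroid update until convergence."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned points
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# Toy data: three well-separated clusters standing in for three speakers
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(c, 0.3, (30, 2)) for c in [(0, 0), (5, 0), (0, 5)]])
labels, centroids = kmeans(X, k=3)
```

Because the initial centroids are random, results can vary between runs; production systems typically add a k-means++ style initialization and multiple restarts.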

Each column of the matrix corresponds to the true class, and each row corresponds to the predicted class.

The number of correct predictions is shown in green blocks, and false predictions are shown in red.
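A confusion matrix with this layout (rows = predicted class, columns = true class) can be built as a short sketch; the label values below are illustrative, not the poster's data.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows index the predicted class and columns the true class,
    matching the layout described above."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[p, t] += 1
    return cm

# Illustrative labels for three speaker classes
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, 3)

# Correct predictions lie on the diagonal
accuracy = np.trace(cm) / cm.sum()
```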
