RTTH Summer School on Speech Technology A Deep Learning Perspective July 6th - 9th, 2015, Barcelona, Spain TALP Research Center Signal Theory and Communications Department Universitat Politecnica de Catalunya - BarcelonaTech

Outline

o State-of-the-art in Speaker Recognition

o Deep Learning

o DNNs for Modeling i-Vectors in Speaker Recognition

o Experimental Results

o Conclusions

Deep Neural Networks for Speaker Recognition

State-of-the-art: Background

What is a supervector? What is an i-vector?

[Figure: speech feature vectors (33) → MAP-adapted GMM (512) → supervector of stacked means μ1, μ2, …, μM (33×512) → i-vector w (400)]

An i-vector is a compact representation of a speech signal.

State-of-the-art: Background

How to go from a supervector to an i-vector?

- Using a factor analysis approach: m = m´ + T w, where m is the supervector, T is the total variability matrix, and w is the i-vector

o m is assumed to be normally distributed with mean vector m´ and covariance matrix T T^t

o w is a hidden variable which can be defined by the mean of its posterior distribution conditioned on the Baum-Welch statistics for a given utterance.
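
The posterior mean mentioned above can be sketched as follows. This is a minimal illustration, assuming the standard total-variability formulation with diagonal UBM covariances, in which w = (I + T^t Σ^-1 N T)^-1 T^t Σ^-1 f, where N holds the zero-order statistics and f the centered first-order statistics. The function name, argument layout, and any dimensions used with it are hypothetical and not taken from the slides.

```python
import numpy as np

def extract_ivector(T, sigma, n_counts, f_centered):
    """Posterior mean of w given Baum-Welch statistics (illustrative sketch).

    T          : (C*F, R) total variability matrix
    sigma      : (C*F,)   stacked diagonal covariances of the UBM (assumed diagonal)
    n_counts   : (C,)     zero-order statistics, one per Gaussian component
    f_centered : (C*F,)   first-order statistics centered on the UBM means
    """
    n_components = n_counts.shape[0]
    feat_dim = T.shape[0] // n_components
    rank = T.shape[1]
    # Repeat each component count over its feature dimensions.
    n_expanded = np.repeat(n_counts, feat_dim)
    # T^t Sigma^-1, computed by scaling the columns of T^t.
    Tt_sigma_inv = T.T / sigma
    # Posterior precision: I + T^t Sigma^-1 N T.
    precision = np.eye(rank) + (Tt_sigma_inv * n_expanded) @ T
    # Posterior mean: precision^-1 T^t Sigma^-1 f.
    return np.linalg.solve(precision, Tt_sigma_inv @ f_centered)
```

With no observed frames (all statistics zero) the sketch returns the prior mean, a zero vector; with many frames the estimate approaches the w that generated the statistics.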

Deep Learning: Background

Deep Neural Network :

A feed-forward artificial neural network with multiple layers of hidden units

[Figure: Deep Neural Network (DNN)]

How to train?

Back-propagation algorithm, given input vectors and a class label for each input

How to initialize?

o Small random numbers
o Deep Belief Network (DBN) parameters
o Auto-encoders, …
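
As a rough illustration of back-propagation with small random initialization, the sketch below trains a sigmoid feed-forward network by gradient descent on a cross-entropy loss. The layer sizes, learning rate, loss choice, and all function names are assumptions made for this example; the slides do not specify them.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(layer_sizes, rng, scale=0.01):
    """Small random weights and zero biases for each pair of adjacent layers."""
    return [(scale * rng.standard_normal((n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(params, x):
    """Return the activations of every layer, input included."""
    activations = [x]
    for W, b in params:
        activations.append(sigmoid(activations[-1] @ W + b))
    return activations

def cross_entropy(params, x, y):
    """Cross-entropy summed over outputs and averaged over samples."""
    p = forward(params, x)[-1]
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)) / x.shape[0]

def backprop_step(params, x, y, lr=0.5):
    """One gradient-descent step; returns the updated parameter list."""
    acts = forward(params, x)
    # Gradient of the loss w.r.t. the output pre-activations (sigmoid + cross-entropy).
    delta = (acts[-1] - y) / x.shape[0]
    new_params = []
    for i in reversed(range(len(params))):
        W, b = params[i]
        grad_W = acts[i].T @ delta
        grad_b = delta.sum(axis=0)
        if i > 0:
            # Propagate the error through the old weights and the sigmoid derivative.
            delta = (delta @ W.T) * acts[i] * (1.0 - acts[i])
        new_params.append((W - lr * grad_W, b - lr * grad_b))
    return new_params[::-1]
```

Replacing `init_params` with weights taken from a pre-trained DBN would correspond to the second initialization option listed above.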

Deep Learning: Background

Deep Belief Network :

A probabilistic generative model composed of many layers of hidden units above a visible input layer

[Figure: Deep Belief Network (DBN)]

How to train?

Using Restricted Boltzmann Machines (RBMs), layer by layer

How to initialize?

Small random numbers

Deep Learning: Background

DBN Training - DNN Pre-Training :

o Every two adjacent layers are considered as an RBM

o Outputs of the first RBM are given to the next RBM as inputs

o The process is repeated until the top two layers are reached
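
The layer-by-layer procedure above can be sketched as a loop that trains one RBM, feeds its hidden activations to the next, and repeats. This is only an illustration: it assumes Bernoulli (binary) units with inputs in [0, 1] and a compact full-batch CD-1 trainer, whereas real-valued inputs such as i-vectors would need a different visible-unit model. All names and hyperparameters are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, rng, epochs=10, lr=0.05):
    """Train one Bernoulli-Bernoulli RBM with full-batch CD-1 (assumed setup)."""
    n_samples, n_visible = data.shape
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_vis = np.zeros(n_visible)
    b_hid = np.zeros(n_hidden)
    for _ in range(epochs):
        # Positive phase: hidden probabilities and a binary sample.
        h0 = sigmoid(data @ W + b_hid)
        h0_sample = (rng.random(h0.shape) < h0).astype(float)
        # Negative phase: one reconstruction step.
        v1 = sigmoid(h0_sample @ W.T + b_vis)
        h1 = sigmoid(v1 @ W + b_hid)
        # Approximate gradient: data statistics minus reconstruction statistics.
        W += lr * (data.T @ h0 - v1.T @ h1) / n_samples
        b_vis += lr * (data - v1).mean(axis=0)
        b_hid += lr * (h0 - h1).mean(axis=0)
    return W, b_vis, b_hid

def pretrain_dbn(data, hidden_sizes, rng):
    """Greedy layer-wise training: each RBM's outputs feed the next RBM."""
    layers = []
    layer_input = data
    for n_hidden in hidden_sizes:
        W, _, b_hid = train_rbm(layer_input, n_hidden, rng)
        layers.append((W, b_hid))
        layer_input = sigmoid(layer_input @ W + b_hid)
    return layers
```

The returned list of (weights, hidden biases) pairs is in the form one would copy into a feed-forward network before supervised training.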

Deep Learning: Background

Restricted Boltzmann Machine :

A generative undirected model constructed from two layers of hidden and visible units

How to train?

o Stochastic gradient descent
o The gradient is approximated by the Contrastive Divergence (CD) algorithm

[Figure: RBM training, showing the CD-1 steps and the network parameter update]
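
For reference, the textbook form of CD-1 for an RBM with binary units is given below. It is not necessarily the exact notation used on the slide: the symbols (weights w_ij, visible biases a_i, hidden biases b_j, learning rate ε) are assumptions. One step samples the hidden units from the data, reconstructs the visible units, recomputes the hidden probabilities, and then updates the parameters from the difference between data and reconstruction statistics.

```latex
\begin{aligned}
p(h_j = 1 \mid v) &= \sigma\Big(b_j + \sum_i v_i w_{ij}\Big), \qquad
p(v_i = 1 \mid h) = \sigma\Big(a_i + \sum_j h_j w_{ij}\Big) \\
\Delta w_{ij} &= \epsilon \big( \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{recon}} \big) \\
\Delta a_i &= \epsilon \big( \langle v_i \rangle_{\text{data}} - \langle v_i \rangle_{\text{recon}} \big) \\
\Delta b_j &= \epsilon \big( \langle h_j \rangle_{\text{data}} - \langle h_j \rangle_{\text{recon}} \big)
\end{aligned}
```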

Deep Learning (Summary): Background

[Figure: RBM (RBM training) → DBN (DBN training / DNN pre-training) → DNN]

Deep Learning for Modeling i-Vectors

Goal :

Training a discriminative model for each target speaker

What do we have?

o One i-vector (single session) or several i-vectors (multi session) for each target speaker

o A large number of background i-vectors (impostors)

[Figure: a target i-vector alongside many impostor i-vectors]

Deep Learning for Modeling i-Vectors

Problems:

o Unbalanced data → bias towards the majority class

o Few data → overfitting

Our proposal:

o Balanced training
  - Impostor selection and clustering
  - Distributing impostor and target samples equally among minibatches

o DBN adaptation
  - Take advantage of unsupervised DBN learning on the whole background data; the resulting model is called the Universal DBN (UDBN)
  - Adapt the UDBN to the few data of each speaker

Deep Learning for Modeling i-Vectors

[Figure: block diagram of the proposed system. Step 1 (Balanced Training): impostor selection and clustering of the background i-vectors, followed by minibatch balance with the target i-vectors. Step 2 (Adaptation): DBN adaptation starting from the UDBN trained on the background i-vectors. Step 3: DNN trained with target/impostor labels, yielding the discriminative speaker model.]

Deep Learning for Modeling i-Vectors

Step 1: Balanced Training

Problem:

o A large number of impostor data (negative samples)
o Very few target data (positive samples)

Deep Learning for Modeling i-Vectors

Step 1: Balanced Training

Solutions:

o Global impostor selection
o Clustering using K-means (cosine distance criterion)
o Equally distributing positive and negative samples among minibatches
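
One way to cluster impostor i-vectors with a cosine criterion is spherical K-means: vectors are length-normalized, assigned to the centroid with the highest cosine similarity, and centroids are re-normalized means. The sketch below is an assumption about how such clustering could look, not the authors' implementation; the function name, initialization, and iteration count are hypothetical.

```python
import numpy as np

def cosine_kmeans(vectors, n_clusters, rng, n_iters=20):
    """K-means under cosine similarity; returns unit-norm centroids and labels."""
    # Length-normalize so that a dot product equals cosine similarity.
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    # Initialize centroids from randomly chosen distinct samples.
    centroids = unit[rng.choice(len(unit), size=n_clusters, replace=False)]
    for _ in range(n_iters):
        labels = (unit @ centroids.T).argmax(axis=1)
        for k in range(n_clusters):
            members = unit[labels == k]
            if len(members) > 0:
                mean = members.mean(axis=0)
                norm = np.linalg.norm(mean)
                if norm > 0:
                    centroids[k] = mean / norm
    # Final assignment against the updated centroids.
    labels = (unit @ centroids.T).argmax(axis=1)
    return centroids, labels
```

The resulting centroids can then stand in for the much larger set of selected impostor i-vectors as negative samples.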

Deep Learning for Modeling i-Vectors

Step 2: Adaptation

o Universal DBN (unsupervised learning using background i-vectors)
o Unsupervised adaptation
  - Initialize networks with the UDBN parameters
  - Unsupervised learning using balanced data with few iterations

Deep Learning for Modeling i-Vectors

Step 3: Fine-Tuning

o Supervised learning given impostor and target labels, the adapted DBN, and balanced data

Impostor Selection

Minibatch Balance

In each minibatch, we show the network the same target samples but different impostor centroids.
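
A possible way to build such minibatches is sketched below: every minibatch reuses the same target i-vectors and pairs them with a different slice of impostor centroids. Repeating the target samples so that both classes have equal size is an assumption made here to illustrate "balance"; the function and parameter names are hypothetical.

```python
import numpy as np

def balanced_minibatches(target_ivectors, impostor_centroids, impostors_per_batch):
    """Yield (inputs, labels) with the same targets and different impostors each time."""
    n_targets = target_ivectors.shape[0]
    # Repeat the few target samples until they match the impostor count (assumption).
    repeats = -(-impostors_per_batch // n_targets)  # ceiling division
    targets = np.tile(target_ivectors, (repeats, 1))[:impostors_per_batch]
    last_start = impostor_centroids.shape[0] - impostors_per_batch
    for start in range(0, last_start + 1, impostors_per_batch):
        impostors = impostor_centroids[start:start + impostors_per_batch]
        inputs = np.vstack([targets, impostors])
        # Label 1 for target samples, 0 for impostor centroids.
        labels = np.concatenate([np.ones(len(targets)), np.zeros(len(impostors))])
        yield inputs, labels
```

Any centroids left over after the last full slice are dropped in this sketch.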

DBN Adaptation

DBN adaptation sets speaker-specific initial points for each speaker model

Experimental Setup

o Databases
  - NIST SRE 2006 core test condition (single session): 816 target speakers, 51,068 trials
  - NIST SRE 2006 multi-session task (8 samples per target speaker): 699 target speakers, 31,080 trials

o i-vector size = 400

o Post-processing on i-vectors: mean normalization + whitening

o Hidden layer size = 512
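
The i-vector post-processing listed above can be sketched as follows: the mean and a whitening transform are estimated on background i-vectors and then applied to any i-vector. The use of an eigendecomposition of the sample covariance, the small eigenvalue floor, and the function names are assumptions for this illustration.

```python
import numpy as np

def fit_whitening(background):
    """Estimate the mean and whitening transform from background i-vectors."""
    mean = background.mean(axis=0)
    cov = np.cov(background - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Floor tiny eigenvalues to avoid division by zero.
    eigvals = np.maximum(eigvals, 1e-10)
    # Scale each eigenvector (column) by 1 / sqrt(eigenvalue).
    transform = eigvecs / np.sqrt(eigvals)
    return mean, transform

def apply_whitening(ivectors, mean, transform):
    """Subtract the background mean, then decorrelate and scale to unit variance."""
    return (ivectors - mean) @ transform
```

After this step the background i-vectors have zero mean and an identity sample covariance.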

Experimental Results (Single Session Task)

Baseline: i-Vector + cosine (EER = 7.18, minDCF = 324)
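
For orientation, a cosine-scoring baseline and an equal error rate (EER) estimate can be sketched as below. The EER routine is a simple threshold sweep that picks the point where false-acceptance and false-rejection rates are closest; it is an approximation written for this example, not the evaluation tool used for the reported numbers, and the function names are hypothetical.

```python
import numpy as np

def cosine_score(target_ivector, test_ivector):
    """Cosine similarity between a target i-vector and a test i-vector."""
    denom = np.linalg.norm(target_ivector) * np.linalg.norm(test_ivector)
    return float(target_ivector @ test_ivector / denom)

def equal_error_rate(target_scores, impostor_scores):
    """Approximate EER by sweeping the decision threshold over all observed scores."""
    target_scores = np.asarray(target_scores, dtype=float)
    impostor_scores = np.asarray(impostor_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target_scores, impostor_scores]))
    # False acceptance: impostor trials scoring at or above the threshold.
    far = np.array([(impostor_scores >= t).mean() for t in thresholds])
    # False rejection: target trials scoring below the threshold.
    frr = np.array([(target_scores < t).mean() for t in thresholds])
    idx = np.argmin(np.abs(far - frr))
    return float((far[idx] + frr[idx]) / 2.0)
```

A trial is accepted when its cosine score exceeds a chosen threshold; the EER summarizes performance at the threshold where both error types are balanced.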

Experimental Results (Multi Session Task)

Baseline: i-Vector + cosine (EER = 4.20, minDCF = 191)

Conclusion

o Discriminative modeling of target and impostor i-vectors using DNNs

o Adaptation of the network parameters of each speaker from a background model called the UDBN

o Reduction of the number of impostor i-vectors by the proposed impostor selection method

o The proposed systems outperform the baselines by more than 8% and 17% in the single and multi session tasks, respectively

Q & A

Omid Ghahabi

[email protected]