44
Outline Introduction Speaker subspace model Monaural speech separation Binaural separation Conclusions Underdetermined Source Separation Using Speaker Subspace Models Thesis Defense Ron Weiss May 4, 2009 Ron Weiss Underdetermined Source Separation Using Speaker Subspace Models May 4, 2009 1 / 34

Underdetermined Source Separation Using Speaker Subspace ...ronw/talks/2009-05-04-thesis_defense.pdf · 04/05/2009  · OutlineIntroduction Speaker subspace model Monaural speech

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Underdetermined Source Separation Using Speaker Subspace ...ronw/talks/2009-05-04-thesis_defense.pdf · 04/05/2009  · OutlineIntroduction Speaker subspace model Monaural speech

Outline Introduction Speaker subspace model Monaural speech separation Binaural separation Conclusions

Underdetermined Source SeparationUsing Speaker Subspace Models

Thesis Defense

Ron Weiss

May 4, 2009

Ron Weiss Underdetermined Source Separation Using Speaker Subspace Models May 4, 2009 1 / 34

Page 2: Underdetermined Source Separation Using Speaker Subspace ...ronw/talks/2009-05-04-thesis_defense.pdf · 04/05/2009  · OutlineIntroduction Speaker subspace model Monaural speech

Outline Introduction Speaker subspace model Monaural speech separation Binaural separation Conclusions

1 Introduction

2 Speaker subspace model

3 Monaural speech separation

4 Binaural separation

5 Conclusions

Ron Weiss Underdetermined Source Separation Using Speaker Subspace Models May 4, 2009 2 / 34

Page 3: Underdetermined Source Separation Using Speaker Subspace ...ronw/talks/2009-05-04-thesis_defense.pdf · 04/05/2009  · OutlineIntroduction Speaker subspace model Monaural speech

Outline Introduction Speaker subspace model Monaural speech separation Binaural separation Conclusions

1 Introduction

2 Speaker subspace model

3 Monaural speech separation

4 Binaural separation

5 Conclusions

Ron Weiss Underdetermined Source Separation Using Speaker Subspace Models May 4, 2009 3 / 34

Page 4: Underdetermined Source Separation Using Speaker Subspace ...ronw/talks/2009-05-04-thesis_defense.pdf · 04/05/2009  · OutlineIntroduction Speaker subspace model Monaural speech

Outline Introduction Speaker subspace model Monaural speech separation Binaural separation Conclusions

Audio source separation

Many real world signals contain contributions from multiple sources

E.g. cocktail party

Want to infer the original sources from the mixture

Robust speech recognitionHearing aids

Ron Weiss Underdetermined Source Separation Using Speaker Subspace Models May 4, 2009 4 / 34

Page 5: Underdetermined Source Separation Using Speaker Subspace ...ronw/talks/2009-05-04-thesis_defense.pdf · 04/05/2009  · OutlineIntroduction Speaker subspace model Monaural speech

Outline Introduction Speaker subspace model Monaural speech separation Binaural separation Conclusions

Previous work

Instantaneous mixing systemy1(t)...

yC (t)

=

a11 . . . a1I...

. . ....

aC1 . . . aCI

x1(t)

...xI (t)

Simplest case: more channels than sources (overdetermined)

Perfect separation possible

Use constraints on source signals to guide separation

Independence constraints (e.g. independent component analysis)Spatial constraints (e.g. beamforming)

Ron Weiss Underdetermined Source Separation Using Speaker Subspace Models May 4, 2009 5 / 34

Page 6: Underdetermined Source Separation Using Speaker Subspace ...ronw/talks/2009-05-04-thesis_defense.pdf · 04/05/2009  · OutlineIntroduction Speaker subspace model Monaural speech

Outline Introduction Speaker subspace model Monaural speech separation Binaural separation Conclusions

Underdetermined source separation

More sources than channels, need stronger constraints

CASA: Use perceptual cues similar to human auditory system

Segment STFT into short glimpses of each sourceBy harmonicity, common onset, etc.Sequential grouping heuristicsCreate time-frequency mask for each source

Inference based on prior source models

Ron Weiss Underdetermined Source Separation Using Speaker Subspace Models May 4, 2009 6 / 34

Page 7: Underdetermined Source Separation Using Speaker Subspace ...ronw/talks/2009-05-04-thesis_defense.pdf · 04/05/2009  · OutlineIntroduction Speaker subspace model Monaural speech

Outline Introduction Speaker subspace model Monaural speech separation Binaural separation Conclusions

Time-frequency masking

MixtureF

requ

ency

(kH

z)

0

2

4

6

8Clean source

Fre

quen

cy (

kHz)

0

2

4

6

8

Masks

Time (sec)

Fre

quen

cy (

kHz)

0.5 1 1.5 2 2.5 30

2

4

6

8Reconstructed source (8.2 dB SNR)

Time (sec)

Fre

quen

cy (

kHz)

0.5 1 1.5 2 2.5 30

2

4

6

8

−50

−40

−30

−20

−10

0

Natural sounds tend to be sparse in time and frequency

10% of spectrogram cells contain 78% of energy

And redundant

Still intelligible when 22% of source energy is masked

Ron Weiss Underdetermined Source Separation Using Speaker Subspace Models May 4, 2009 7 / 34

Page 8: Underdetermined Source Separation Using Speaker Subspace ...ronw/talks/2009-05-04-thesis_defense.pdf · 04/05/2009  · OutlineIntroduction Speaker subspace model Monaural speech

Outline Introduction Speaker subspace model Monaural speech separation Binaural separation Conclusions

Model-based separation

Use constraints from prior source models to guide separationLeverage differences in spectral characteristics of different sources

Hidden Markov models, log spectral featuresFactorial model inferencee.g. IBM Iroquois system [Kristjansson et al., 2006]

Speaker-dependent modelsAcoustic dynamics and grammar constraintsSuperhuman performance under some conditions

Ron Weiss Underdetermined Source Separation Using Speaker Subspace Models May 4, 2009 8 / 34

Page 9: Underdetermined Source Separation Using Speaker Subspace ...ronw/talks/2009-05-04-thesis_defense.pdf · 04/05/2009  · OutlineIntroduction Speaker subspace model Monaural speech

Outline Introduction Speaker subspace model Monaural speech separation Binaural separation Conclusions

Model-based separation – Limitations

Rely on speaker-dependent models to disambiguate sources

What if the task isn’t so well defined?

No prior knowledge of speaker identities or grammar

Use speaker-independent (SI) model for all sources

Need strong temporal constraints or sources will permute

“place white by t 4 now” mixed with “lay green with p 9 again”Separated source: “place white by t p 9 again”

Solution: adapt speaker-independent model to compensate

Ron Weiss Underdetermined Source Separation Using Speaker Subspace Models May 4, 2009 9 / 34

Page 10: Underdetermined Source Separation Using Speaker Subspace ...ronw/talks/2009-05-04-thesis_defense.pdf · 04/05/2009  · OutlineIntroduction Speaker subspace model Monaural speech

Outline Introduction Speaker subspace model Monaural speech separation Binaural separation Conclusions

1 Introduction

2 Speaker subspace modelModel adaptationEigenvoices

3 Monaural speech separation

4 Binaural separation

5 Conclusions

Ron Weiss Underdetermined Source Separation Using Speaker Subspace Models May 4, 2009 10 / 34

Page 11: Underdetermined Source Separation Using Speaker Subspace ...ronw/talks/2009-05-04-thesis_defense.pdf · 04/05/2009  · OutlineIntroduction Speaker subspace model Monaural speech

Outline Introduction Speaker subspace model Monaural speech separation Binaural separation Conclusions

Model selection vs. adaptation

Model selection (e.g. [Kristjansson et al., 2006])

Given set of speaker-dependent (SD) models:1 Identify sources in mixture2 Use corresponding models for separation

How to generalize to speakers outside of training set?

Selection – choose closest modelAdaptation – interpolate

Speaker modelsMean voiceSpeaker subspace basesQuantization boundaries

Ron Weiss Underdetermined Source Separation Using Speaker Subspace Models May 4, 2009 11 / 34

Page 12: Underdetermined Source Separation Using Speaker Subspace ...ronw/talks/2009-05-04-thesis_defense.pdf · 04/05/2009  · OutlineIntroduction Speaker subspace model Monaural speech

Outline Introduction Speaker subspace model Monaural speech separation Binaural separation Conclusions

Model adaptation

Adjust model parameters to better matchobservations

Caveats1 Want to adapt to a single utterance, not

enough data for MLLR, MAP

Need adaptation framework with fewparameters

2 Observations are mixture of multiple sources

Iterative separation/adaptation algorithm

Feature 1

Fea

ture

2

Original distributionObservationsAdapted distribution

Ron Weiss Underdetermined Source Separation Using Speaker Subspace Models May 4, 2009 12 / 34

Page 13: Underdetermined Source Separation Using Speaker Subspace ...ronw/talks/2009-05-04-thesis_defense.pdf · 04/05/2009  · OutlineIntroduction Speaker subspace model Monaural speech

Outline Introduction Speaker subspace model Monaural speech separation Binaural separation Conclusions

Eigenvoice adaptation [Kuhn et al., 2000]

Train a set of SD models

Pack params into speaker supervectorSamples from space of speaker variation

Principal component analysis to findorthonormal bases for speaker subspace

Model is linear combination of bases

Speaker modelsSpeaker subspace basesOther models

Eigenvoice adaptation

µ = µ̄ + U w + B hadapted mean eigenvoice weights channel channel

model voice bases bases weights

Ron Weiss Underdetermined Source Separation Using Speaker Subspace Models May 4, 2009 13 / 34

Page 14: Underdetermined Source Separation Using Speaker Subspace ...ronw/talks/2009-05-04-thesis_defense.pdf · 04/05/2009  · OutlineIntroduction Speaker subspace model Monaural speech

Outline Introduction Speaker subspace model Monaural speech separation Binaural separation Conclusions

Eigenvoice bases

Mean voice= speaker-independent model

Eigenvoices shift formantfrequencies, add pitch

Independent bases to capturechannel variation

Fre

quen

cy (

kHz)

Mean Voice

b d g p t k jh ch s z f th v dh m n l r w y iy ih eh ey ae aaaw ay ah aoowuw ax

2

4

6

8

Fre

quen

cy (

kHz)

Eigenvoice dimension 1

b d g p t k jh ch s z f th v dh m n l r w y iy ih eh ey ae aaaw ay ah aoowuw ax

2

4

6

8

Fre

quen

cy (

kHz)

Eigenvoice dimension 2

b d g p t k jh ch s z f th v dh m n l r w y iy ih eh ey ae aaaw ay ah aoowuw ax

2

4

6

8

Fre

quen

cy (

kHz)

Eigenvoice dimension 3

b d g p t k jh ch s z f th v dh m n l r w y iy ih eh ey ae aaaw ay ah aoowuw ax

2

4

6

8

−50

−40

−30

−20

−10

0

2

4

6

8

0

2

4

6

8

0

2

4

6

8

Ron Weiss Underdetermined Source Separation Using Speaker Subspace Models May 4, 2009 14 / 34

Page 15: Underdetermined Source Separation Using Speaker Subspace ...ronw/talks/2009-05-04-thesis_defense.pdf · 04/05/2009  · OutlineIntroduction Speaker subspace model Monaural speech

Outline Introduction Speaker subspace model Monaural speech separation Binaural separation Conclusions

1 Introduction

2 Speaker subspace model

3 Monaural speech separationMixed signal modelAdaptation algorithmExperiments

4 Binaural separation

5 Conclusions

Ron Weiss Underdetermined Source Separation Using Speaker Subspace Models May 4, 2009 15 / 34

Page 16: Underdetermined Source Separation Using Speaker Subspace ...ronw/talks/2009-05-04-thesis_defense.pdf · 04/05/2009  · OutlineIntroduction Speaker subspace model Monaural speech

Outline Introduction Speaker subspace model Monaural speech separation Binaural separation Conclusions

Eigenvoice factorial HMM

Model mixture with combination of source HMMs

Need adaptation parameters wi to estimate source signals xi (t)and vice versa

Ron Weiss Underdetermined Source Separation Using Speaker Subspace Models May 4, 2009 16 / 34

Page 17: Underdetermined Source Separation Using Speaker Subspace ...ronw/talks/2009-05-04-thesis_defense.pdf · 04/05/2009  · OutlineIntroduction Speaker subspace model Monaural speech

Outline Introduction Speaker subspace model Monaural speech separation Binaural separation Conclusions

Adaptation algorithm

Ron Weiss Underdetermined Source Separation Using Speaker Subspace Models May 4, 2009 17 / 34

Page 18: Underdetermined Source Separation Using Speaker Subspace ...ronw/talks/2009-05-04-thesis_defense.pdf · 04/05/2009  · OutlineIntroduction Speaker subspace model Monaural speech

Outline Introduction Speaker subspace model Monaural speech separation Binaural separation Conclusions

Adaptation example

Mixture: t32_swil2a_m18_sbar9n

02468

−40

−20

0

Adaptation iteration 1

02468

−40

−20

0

Fre

quen

cy (

kHz) Adaptation iteration 3

02468

−40

−20

0

Adaptation iteration 5

02468

−40

−20

0

Time (sec)

SD model separation

0 0.5 1 1.502468

−40

−20

0

Ron Weiss Underdetermined Source Separation Using Speaker Subspace Models May 4, 2009 18 / 34

Page 19: Underdetermined Source Separation Using Speaker Subspace ...ronw/talks/2009-05-04-thesis_defense.pdf · 04/05/2009  · OutlineIntroduction Speaker subspace model Monaural speech

Outline Introduction Speaker subspace model Monaural speech separation Binaural separation Conclusions

2006 Speech separation challenge [Cooke and Lee, 2006]

Single channel mixtures of utterances from 34 different speakers

Constrained grammar:command(4) color(4) preposition(4) letter(25) digit(10) adverb(4)

Separation/recognition taskDetermine letter and digit for source that said “white”

Ron Weiss Underdetermined Source Separation Using Speaker Subspace Models May 4, 2009 19 / 34

Page 20: Underdetermined Source Separation Using Speaker Subspace ...ronw/talks/2009-05-04-thesis_defense.pdf · 04/05/2009  · OutlineIntroduction Speaker subspace model Monaural speech

Outline Introduction Speaker subspace model Monaural speech separation Binaural separation Conclusions

Performance – Adapted vs. source-dependent models

−3 dB

Ron Weiss Underdetermined Source Separation Using Speaker Subspace Models May 4, 2009 20 / 34

Page 21: Underdetermined Source Separation Using Speaker Subspace ...ronw/talks/2009-05-04-thesis_defense.pdf · 04/05/2009  · OutlineIntroduction Speaker subspace model Monaural speech

Outline Introduction Speaker subspace model Monaural speech separation Binaural separation Conclusions

Experiments – Switchboard

Mixture

Time (sec)

Fre

quen

cy (

kHz)

1 2 3 4

2

4

6

8

−50

−40

−30

−20

−10

0

What about previously unseen speakers?Switchboard: corpus of conversational telephone speech

200+ hours, 500+ speakersTask significantly more difficult than Speech Separation Challenge

Spontaneous speechLarge vocabularySignificant channel variation across calls

Ron Weiss Underdetermined Source Separation Using Speaker Subspace Models May 4, 2009 21 / 34

Page 22: Underdetermined Source Separation Using Speaker Subspace ...ronw/talks/2009-05-04-thesis_defense.pdf · 04/05/2009  · OutlineIntroduction Speaker subspace model Monaural speech

Outline Introduction Speaker subspace model Monaural speech separation Binaural separation Conclusions

Switchboard – Results

Adaptation outperforms SD model selection

Model selection errors due to channel variation

SD performance drops off under mismatched conditions

SA performance improves as number of training speakers increases

Ron Weiss Underdetermined Source Separation Using Speaker Subspace Models May 4, 2009 22 / 34

Page 23: Underdetermined Source Separation Using Speaker Subspace ...ronw/talks/2009-05-04-thesis_defense.pdf · 04/05/2009  · OutlineIntroduction Speaker subspace model Monaural speech

Outline Introduction Speaker subspace model Monaural speech separation Binaural separation Conclusions

1 Introduction

2 Speaker subspace model

3 Monaural speech separation

4 Binaural separationMixed signal modelParameter estimation and source separationExperiments

5 Conclusions

Ron Weiss Underdetermined Source Separation Using Speaker Subspace Models May 4, 2009 23 / 34

Page 24: Underdetermined Source Separation Using Speaker Subspace ...ronw/talks/2009-05-04-thesis_defense.pdf · 04/05/2009  · OutlineIntroduction Speaker subspace model Monaural speech

Outline Introduction Speaker subspace model Monaural speech separation Binaural separation Conclusions

Binaural audition

y`(t) =∑

i

xi (t − τ `i ) ∗ h`

i (t)

yr (t) =∑

i

xi (t − τ ri ) ∗ hr

i (t)

Given stereo recording of multiple sound sources

Utilize spatial cues to aid separation

Interaural time difference (ITD)Interaural level difference (ILD)

Ron Weiss Underdetermined Source Separation Using Speaker Subspace Models May 4, 2009 24 / 34

Page 25: Underdetermined Source Separation Using Speaker Subspace ...ronw/talks/2009-05-04-thesis_defense.pdf · 04/05/2009  · OutlineIntroduction Speaker subspace model Monaural speech

Outline Introduction Speaker subspace model Monaural speech separation Binaural separation Conclusions

MESSL: Interaural model [Mandel and Ellis, 2007]

Model-based EM Source Separation and LocalizationProbabilistic model of interaural spectrogram

Independent of underlying source signals

Assume each time-frequency cell is dominated by a single source

EM algorithm to learn model parameters for each source

Derive probabilistic time-frequency masks for separation

Ron Weiss Underdetermined Source Separation Using Speaker Subspace Models May 4, 2009 25 / 34

Page 26: Underdetermined Source Separation Using Speaker Subspace ...ronw/talks/2009-05-04-thesis_defense.pdf · 04/05/2009  · OutlineIntroduction Speaker subspace model Monaural speech

Outline Introduction Speaker subspace model Monaural speech separation Binaural separation Conclusions

MESSL-SP: Source prior

Extend MESSL to include prior source model

Pre-trained GMM for speech signals in mixture

Channel model to compensate for HRTF and reverberation

Can incorporate eigenvoice adaptation (MESSL-EV)

Ron Weiss Underdetermined Source Separation Using Speaker Subspace Models May 4, 2009 26 / 34

Page 27: Underdetermined Source Separation Using Speaker Subspace ...ronw/talks/2009-05-04-thesis_defense.pdf · 04/05/2009  · OutlineIntroduction Speaker subspace model Monaural speech

Outline Introduction Speaker subspace model Monaural speech separation Binaural separation Conclusions

Parameter estimation and source separation

Ron Weiss Underdetermined Source Separation Using Speaker Subspace Models May 4, 2009 27 / 34

Page 28: Underdetermined Source Separation Using Speaker Subspace ...ronw/talks/2009-05-04-thesis_defense.pdf · 04/05/2009  · OutlineIntroduction Speaker subspace model Monaural speech

Outline Introduction Speaker subspace model Monaural speech separation Binaural separation Conclusions

Experiments

0

0.2

0.4

0.6

0.8

1

Ground truth (12.04 dB)

Fre

quen

cy (

kHz)

0 0.5 1 1.50

2

4

6

8DUET (3.84 dB)

0 0.5 1 1.50

2

4

6

82D−FD−BSS (5.41 dB)

0 0.5 1 1.50

2

4

6

8

MESSL (5.66 dB)

Fre

quen

cy (

kHz)

Time (sec)0 0.5 1 1.5

0

2

4

6

8MESSL−SP (10.01 dB)

Time (sec)0 0.5 1 1.5

0

2

4

6

8MESSL−EV (10.37 dB)

Time (sec)

0 0.5 1 1.50

2

4

6

8

SSC

TIMIT

Mixtures of 2 and 3 speech sources, anechoic and reverberant

Evaluated on TIMIT and SSC test data

Source models trained on SSC data (32 components)Compare MESSL systems to:

DUET – Clustering using ILD/ITD histogram [Yilmaz and Rickard, 2004]

2S-FD-BSS – Frequency domain ICA [Sawada et al., 2007]

Ron Weiss Underdetermined Source Separation Using Speaker Subspace Models May 4, 2009 28 / 34

Page 29: Underdetermined Source Separation Using Speaker Subspace ...ronw/talks/2009-05-04-thesis_defense.pdf · 04/05/2009  · OutlineIntroduction Speaker subspace model Monaural speech

Outline Introduction Speaker subspace model Monaural speech separation Binaural separation Conclusions

Experiments – Performance as function of distractor angle

0 20 40 60 80

0

5

10

15S

NR

impr

ovem

ent (

dB)

2 sources (anechoic)

Ground truth

MESSL−EV

MESSL−SP

MESSL

2S−FD−BSS

DUET

20 40 60 80

0

5

10

15

2 sources (reverb)

0 20 40 60 80

0

5

10

15

SN

R im

prov

emen

t (dB

)

Separation (degrees)

3 sources (anechoic)

20 40 60 80

0

5

10

15

Separation (degrees)

3 sources (reverb)

Ron Weiss Underdetermined Source Separation Using Speaker Subspace Models May 4, 2009 29 / 34

Page 30: Underdetermined Source Separation Using Speaker Subspace ...ronw/talks/2009-05-04-thesis_defense.pdf · 04/05/2009  · OutlineIntroduction Speaker subspace model Monaural speech

Outline Introduction Speaker subspace model Monaural speech separation Binaural separation Conclusions

Experiments – Matched vs. mismatched

Average0

2

4

6

8

10

12

SN

R Im

prov

emen

t (dB

)

GRID

Average0

2

4

6

8

10

12

SN

R Im

prov

emen

t (dB

)

TIMIT

Ground truth MESSL−EV MESSL−SP MESSL 2S−FD−BSS DUET

SSC – matched train/test speakers

MESSL-EV, MESSL-SP beat MESSL baseline by ∼ 3 dB in reverbMESSL-EV beats MESSL-SP by ∼ 1 dB on anechoic mixtures

TIMIT – mismatched train/test speakers

Small difference between MESSL-EV and MESSL-SP

Ron Weiss Underdetermined Source Separation Using Speaker Subspace Models May 4, 2009 30 / 34

Page 31: Underdetermined Source Separation Using Speaker Subspace ...ronw/talks/2009-05-04-thesis_defense.pdf · 04/05/2009  · OutlineIntroduction Speaker subspace model Monaural speech

Outline Introduction Speaker subspace model Monaural speech separation Binaural separation Conclusions

1 Introduction

2 Speaker subspace model

3 Monaural speech separation

4 Binaural separation

5 Conclusions

Ron Weiss Underdetermined Source Separation Using Speaker Subspace Models May 4, 2009 31 / 34

Page 32: Underdetermined Source Separation Using Speaker Subspace ...ronw/talks/2009-05-04-thesis_defense.pdf · 04/05/2009  · OutlineIntroduction Speaker subspace model Monaural speech

Outline Introduction Speaker subspace model Monaural speech separation Binaural separation Conclusions

Summary

Prior signal models for underdetermined source separation

Subspace model for source adaptation

Adapt Gaussian means and covariances using a single utteranceNatural extension to compensate for source-independent channel effects

Monaural separation

Speaker-dependent > speaker-adapted � speaker-independentAdaptation helps generalize better to held out speakersImproves as number of training speakers increases

Binaural separation

Extend MESSL framework to use source models (joint with M. Mandel)Improved performance by incorporating simple SI modelSmaller improvement with adaptation

Ron Weiss Underdetermined Source Separation Using Speaker Subspace Models May 4, 2009 32 / 34

Page 33: Underdetermined Source Separation Using Speaker Subspace ...ronw/talks/2009-05-04-thesis_defense.pdf · 04/05/2009  · OutlineIntroduction Speaker subspace model Monaural speech

Outline Introduction Speaker subspace model Monaural speech separation Binaural separation Conclusions

Contributions

Model-based source separation making minimal assumptionsusing subspace adaptation

Extend model-based approach to binaural separation

Ellis, D. P. W. and Weiss, R. J. (2006).

Model-based monaural source separation using a vector-quantized phase-vocoder representation.In Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages V–957–960.

Weiss, R. J. and Ellis, D. P. W. (2006).

Estimating single-channel source separation masks: Relevance vector machine classifiers vs. pitch-based masking.In Proc. ISCA Tutorial and Research Workshop on Statistical and Perceptual Audition (SAPA), pages 31–36.

Weiss, R. J. and Ellis, D. P. W. (2007).

Monaural speech separation using source-adapted models.In Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pages 114–117.

Weiss, R. J. and Ellis, D. P. W. (2008).

Speech separation using speaker-adapted eigenvoice speech models.Computer Speech and Language, In Press, Corrected Proof:–.

Weiss, R. J., Mandel, M. I., and Ellis, D. P. W. (2008).

Source separation based on binaural cues and source model constraints.In Proc. Interspeech, pages 419–422.

Weiss, R. J. and Ellis, D. P. W. (2009).

A Variational EM Algorithm for Learning Eigenvoice Parameters in Mixed Signals.In Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP).

Ron Weiss Underdetermined Source Separation Using Speaker Subspace Models May 4, 2009 33 / 34

Page 34: Underdetermined Source Separation Using Speaker Subspace ...ronw/talks/2009-05-04-thesis_defense.pdf · 04/05/2009  · OutlineIntroduction Speaker subspace model Monaural speech

Outline Introduction Speaker subspace model Monaural speech separation Binaural separation Conclusions

References

Cooke, M. and Lee, T.-W. (2006).

The speech separation challenge.

Kristjansson, T., Hershey, J., Olsen, P., Rennie, S., and Gopinath, R. (2006).

Super-human multi-talker speech recognition: The IBM 2006 speech separation challenge system.In Proc. Interspeech, pages 97–100.

Kuhn, R., Junqua, J., Nguyen, P., and Niedzielski, N. (2000).

Rapid speaker adaptation in eigenvoice space.IEEE Transations on Speech and Audio Processing, 8(6):695–707.

Mandel, M. I. and Ellis, D. P. W. (2007).

EM localization and separation using interaural level and phase cues.In Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA).

Sawada, H., Araki, S., and Makino, S. (2007).

A two-stage frequency-domain blind source separation method for underdetermined convolutive mixtures.In Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA).

Yilmaz, O. and Rickard, S. (2004).

Blind separation of speech mixtures via time-frequency masking.IEEE Transactions on Signal Processing, 52(7):1830–1847.

Ron Weiss Underdetermined Source Separation Using Speaker Subspace Models May 4, 2009 34 / 34

Page 35: Underdetermined Source Separation Using Speaker Subspace ...ronw/talks/2009-05-04-thesis_defense.pdf · 04/05/2009  · OutlineIntroduction Speaker subspace model Monaural speech

Extra slides

6 Extra slides

Page 36: Underdetermined Source Separation Using Speaker Subspace ...ronw/talks/2009-05-04-thesis_defense.pdf · 04/05/2009  · OutlineIntroduction Speaker subspace model Monaural speech

Extra slides

Factorial HMM separation

Each source signal ischaracterized by statesequence through its HMM

Viterbi algorithm to findmaximum likelihood paththrough combined factorialHMM

Reconstruct source signalsusing Viterbi path

Aggressively prune unlikelypaths to speed up separation

Page 37: Underdetermined Source Separation Using Speaker Subspace ...ronw/talks/2009-05-04-thesis_defense.pdf · 04/05/2009  · OutlineIntroduction Speaker subspace model Monaural speech

Extra slides

Adaptation algorithm initialization

−1000 0 1000 2000−2000

−1000

0

1000

20001 2

w1

w2

−1000 0 1000 2000

1

2

w1

−1000 0 1000 2000

1

2

w1

MaleFemale

Fast convergence needs good initialization

Want to differentiate source models to get best initial separation

Treat each eigenvoice dimension independentlyCoarsely quantize weightsFind most likely combination in mixture

Page 38: Underdetermined Source Separation Using Speaker Subspace ...ronw/talks/2009-05-04-thesis_defense.pdf · 04/05/2009  · OutlineIntroduction Speaker subspace model Monaural speech

Extra slides

Adaptation performance

1 2 3 4 5 6 7 8 9 10 11 12 13 14 1520

25

30

35

40

45

50

55

60

65

70

Iteration

Avg

Acc

urac

y

Diff Gender

Same Gender

Same Talker

Letter-digit accuracy averaged across all TMRs

Adaptation clearly improves separation

Same talker case hard – source permutations

Page 39: Underdetermined Source Separation Using Speaker Subspace ...ronw/talks/2009-05-04-thesis_defense.pdf · 04/05/2009  · OutlineIntroduction Speaker subspace model Monaural speech

Extra slides

Variational learning

Approximate EM algorithm to estimate adaptation parameters

Treat each source HMM independently

Introduce variational parameters to couple them

Page 40: Underdetermined Source Separation Using Speaker Subspace ...ronw/talks/2009-05-04-thesis_defense.pdf · 04/05/2009  · OutlineIntroduction Speaker subspace model Monaural speech

Extra slides

Performance – Learning algorithm comparison

Adapting Gaussian covariances and means significantly improvesperformance

Hierarchical algorithm outperforms variational EM

But variational algorithm is significantly (∼ 4x) faster

At same speed variational EM performs better

Page 41: Underdetermined Source Separation Using Speaker Subspace ...ronw/talks/2009-05-04-thesis_defense.pdf · 04/05/2009  · OutlineIntroduction Speaker subspace model Monaural speech

Extra slides

Performance – Comparison to other participants

Page 42: Underdetermined Source Separation Using Speaker Subspace ...ronw/talks/2009-05-04-thesis_defense.pdf · 04/05/2009  · OutlineIntroduction Speaker subspace model Monaural speech

Extra slides

MESSL-EV: Putting it all together

Big mixture of Gaussians

Interaural model

ITD: Gaussian for each sourceand time delayILD: Single Gaussian for each source

Source model

Separate channel responses for eachsource at each earBoth channels share eigenvoiceadaptation parameters

Explain each point in spectrogram by a particular source, time delay, andsource model mixture component

Page 43: Underdetermined Source Separation Using Speaker Subspace ...ronw/talks/2009-05-04-thesis_defense.pdf · 04/05/2009  · OutlineIntroduction Speaker subspace model Monaural speech

Extra slides

MESSL-EV example

IPD (0.73 dB)F

requ

ency

(kH

z)

0

2

4

6

8

ILD (8.54 dB)

Fre

quen

cy (

kHz)

0

2

4

6

8

SP (7.93 dB)

Fre

quen

cy (

kHz)

0

2

4

6

8

Full mask (10.37 dB)

Time (sec)

0 0.5 1 1.50

2

4

6

8

0

0.5

1

IPD informative in low frequencies, but not in high frequencies

ILD primarily adds information about high frequenciesSource model introduces correlations across frequency and emphasizesreliable time-frequency regions

Helps resolve ambiguities in interaural parameters from reverberation andspatial aliasing

Page 44: Underdetermined Source Separation Using Speaker Subspace ...ronw/talks/2009-05-04-thesis_defense.pdf · 04/05/2009  · OutlineIntroduction Speaker subspace model Monaural speech

Extra slides

Just for fun...