Improved noise power spectral density tracking by a MAP-based postprocessor

Aleksej Chinaev, Alexander Krueger, Dang Hai Tran Vu, Reinhold Haeb-Umbach

University of Paderborn, Germany

March 28th, 2012

Computer Science, Electrical Engineering and Mathematics

Communications Engineering, Prof. Dr.-Ing. Reinhold Häb-Umbach



Table of Contents

1 Introduction and problem formulation

2 MAP-based noise PSD estimation

3 Experimental framework

4 Performance evaluation

5 Summary and outlook


Introduction


Motivation

• Noise PSD estimation is a key component

◮ to speech enhancement

◮ to robust automatic speech recognition

Basic assumptions

• Many sophisticated algorithms rely on two assumptions:

◮ Noise ’more stationary’ than speech

◮ Noise-only time-frequency bins at regular intervals

Here

• MAP-based (MAP-B) postprocessor

◮ Estimate of noise power even if speech is dominant in time-frequency bin

◮ Initial estimate of the current speech power required


MAP-B as a postprocessor

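[Block diagram of the two-stage system. The noisy signal $y_t$ ($x_t$: clean speech signal, $n_t$: noise signal) is transformed by the STDFT to $Y_{k,l}$ ($k$: frequency bin index, $l$: frame index). First speech enhancement stage: noise estimation ($\sigma^2_{N,k,l}$), a priori SNR, gain, $X^{(I)}_{k,l}$, ISTDFT, $x^{(I)}_t$. Second speech enhancement stage: MAP-B with input $\sigma^2_{X,k,l}$ and output $\sigma^2_{N,k,l}$, a priori SNR, gain, $X^{(II)}_{k,l}$, ISTDFT, $x^{(II)}_t$.]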


Problem formulation

Observed

$Y_l = N_l + X_l$

• We consider $N_l$ as the target process ’corrupted’ by clean speech $X_l$ of known variance $\sigma^2_{X,l}$

Our goal

• Estimation of the noise variance $\sigma^2_{N,l+1}$ at frame $l+1$ from the noisy observation $y_{l+1}$, given

◮ the current speech power $\sigma^2_{X,l+1}$

◮ the a priori PDF $p_{\sigma^2_N}(\sigma^2)$ of the variance $\sigma^2_{N,l}$

Approach

• Maximum A Posteriori (MAP) estimation


MAP-based noise PSD estimation

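Bayesian variance estimation: $Y_l = N_l$

If $\sigma^2_{X,l+1} = 0$ (uncorrupted observation) and $\sigma^2_{N,l+1} = \sigma^2_N$ (stationary), then we have the textbook problem:

• The scaled inverse chi-square distribution

$$p_{\sigma^2_N}(\sigma^2) \propto (\sigma^2)^{-\frac{\nu_l+2}{2}} \cdot e^{-\frac{\nu_l \lambda_l^2}{2\sigma^2}}$$

with the degrees of freedom $\nu_l$ and the scale factor $\lambda_l^2$ is a conjugate prior to the normal observation PDF

$$p_{Y_{l+1}|\sigma^2_N}(y_{l+1}|\sigma^2) = \frac{1}{\pi\sigma^2} \cdot e^{-\frac{|y_{l+1}|^2}{\sigma^2}}$$

• Parameter update: $\nu_{l+1} = \nu_l + 2$ and $\lambda^2_{l+1} = \frac{2}{\nu_l+2}\,|y_{l+1}|^2 + \frac{\nu_l}{\nu_l+2}\,\lambda_l^2$

• MAP estimate of the variance

$$\sigma^2_{N,l+1} = \arg\max_{\sigma^2} \left[ p_{\sigma^2_N|Y_{l+1}}(\sigma^2|y_{l+1}) \right] = \frac{\nu_{l+1}}{\nu_{l+1}+2} \cdot \lambda^2_{l+1}$$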
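The conjugate update and the MAP estimate for this speech-free, stationary case can be sketched in a few lines of Python. This is an illustration only: the function names `update_prior` and `map_variance` and the starting values are hypothetical and not taken from the authors' code.

```python
def update_prior(nu, lam2, y_pow):
    """One conjugate update of the scaled inverse chi-square prior.

    nu    : degrees of freedom nu_l
    lam2  : scale factor lambda_l^2
    y_pow : periodogram value |y_{l+1}|^2 of the new observation
    """
    nu_next = nu + 2.0
    lam2_next = (2.0 * y_pow + nu * lam2) / (nu + 2.0)
    return nu_next, lam2_next


def map_variance(nu, lam2):
    """Mode of a scaled inverse chi-square PDF with parameters (nu, lam2)."""
    return nu / (nu + 2.0) * lam2


# Example: start from a hypothetical prior and absorb three observations.
nu, lam2 = 4.0, 1.0
for y_pow in (0.5, 2.0, 1.5):
    nu, lam2 = update_prior(nu, lam2, y_pow)
estimate = map_variance(nu, lam2)
```

Because the degrees of freedom grow by 2 with every frame, each new observation carries less weight than the previous one, which is why this form only suits stationary noise.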


Extension to non-stationary noise

Still ’uncorrupted’ noise: $Y_l = N_l$

If $\sigma^2_{N,l+1}$ is time-variant then:

• The parameter $\nu_l$ is kept at a constant value $\nu_{l+1} = \nu_l = \nu_0$

• This results in recursive smoothing of the variance estimate

$$\sigma^2_{N,l+1} = (1 - \alpha) \cdot \sigma^2_{N,l} + \alpha \cdot |y_{l+1}|^2, \quad \text{where } \alpha = \frac{2}{\nu_0 + 4}$$
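One way to see where the smoothing constant comes from is sketched below. It assumes that after each frame the prior is re-parameterized with $\nu_0$ degrees of freedom so that its mode equals the previous estimate, i.e. $\sigma^2_{N,l} = \frac{\nu_0}{\nu_0+2}\,\lambda_l^2$; the slides do not spell out this step, so treat it as an interpretation.

```latex
\begin{align}
\sigma^2_{N,l+1}
  &= \frac{\nu_0 + 2}{\nu_0 + 4}
     \left( \frac{2}{\nu_0 + 2}\,|y_{l+1}|^2
          + \frac{\nu_0}{\nu_0 + 2}\,\lambda_l^2 \right) \\
  &= \frac{2}{\nu_0 + 4}\,|y_{l+1}|^2
     + \frac{\nu_0 + 2}{\nu_0 + 4}\,\sigma^2_{N,l} \\
  &= \alpha\,|y_{l+1}|^2 + (1 - \alpha)\,\sigma^2_{N,l},
  \qquad \alpha = \frac{2}{\nu_0 + 4}.
\end{align}
```

The first line is the MAP estimate of the posterior with $\nu_0 + 2$ degrees of freedom; the second line substitutes $\nu_0 \lambda_l^2 = (\nu_0 + 2)\,\sigma^2_{N,l}$.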

Choice of $\nu_0$

• Trade-off between tracking ability and estimation error in stationary noise
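The trade-off can be made concrete with a small sketch. The code is an illustration only: `smoothing_factor` and `track` are hypothetical names, the step-shaped input is made up, and starting from the first periodogram value is an assumption.

```python
def smoothing_factor(nu0):
    """Smoothing constant alpha = 2 / (nu0 + 4) implied by nu0."""
    return 2.0 / (nu0 + 4.0)


def track(periodogram, nu0):
    """Recursively smooth a sequence of periodogram values |y_l|^2."""
    alpha = smoothing_factor(nu0)
    estimate = periodogram[0]  # assumed initialization
    trajectory = [estimate]
    for y_pow in periodogram[1:]:
        estimate = (1.0 - alpha) * estimate + alpha * y_pow
        trajectory.append(estimate)
    return trajectory


# Noise power jumps from 1 to 4 at frame 50 (idealized, fluctuation-free input).
step = [1.0] * 50 + [4.0] * 50
fast = track(step, nu0=4)    # alpha = 0.25: follows the jump quickly
slow = track(step, nu0=40)   # alpha = 2/44: smoother but slower
```

A small $\nu_0$ reacts quickly to the jump but would pass more of the periodogram's fluctuation into the estimate when the noise is stationary; a large $\nu_0$ does the opposite.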


General case

Observation ’corrupted’ by speech: $Y_l = N_l + X_l$

• If $\sigma^2_{X,l+1} \neq 0$ and is known, then the posterior PDF

$$p_{\sigma^2_N|Y_{l+1}}(\sigma^2|y_{l+1}) \propto (\sigma^2_{X,l+1} + \sigma^2)^{-1} \cdot (\sigma^2)^{-\frac{\nu_l+2}{2}} \cdot e^{-\left( \frac{|y_{l+1}|^2}{\sigma^2_{X,l+1}+\sigma^2} + \frac{\nu_l \lambda_l^2}{2\sigma^2} \right)}$$

is no longer a scaled inverse chi-square distribution, i.e. the prior is no longer conjugate to the observation PDF.

In order to maintain an efficient MAP estimation procedure we

• approximate the posterior PDF by a scaled inverse chi-square distribution,

• and match its maximum ($\sigma^2_{N,l+1}$) with the maximum of the posterior PDF,

• which we calculate efficiently using a bisection and Newton approach.
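The maximum search can be sketched as follows. This is not the authors' implementation: it uses plain bisection on the derivative of the log posterior (without the Newton refinement), assumes that the bracket contains a single sign change, and assumes that the degrees of freedom are held at $\nu_0$ when the approximating prior is rebuilt. All function names are hypothetical.

```python
def dlog_posterior(s, y_pow, var_x, nu, lam2):
    """Derivative of the log posterior with respect to the noise variance s."""
    t = var_x + s
    return (-1.0 / t - (nu + 2.0) / (2.0 * s)
            + y_pow / t ** 2 + nu * lam2 / (2.0 * s ** 2))


def map_noise_variance(y_pow, var_x, nu, lam2, iterations=100):
    """Locate a maximum of the posterior by bisection (requires lam2 > 0)."""
    lo = 1e-9 * lam2          # derivative is positive close to zero
    hi = lam2 + y_pow         # enlarge until the derivative is negative
    while dlog_posterior(hi, y_pow, var_x, nu, lam2) > 0.0:
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if dlog_posterior(mid, y_pow, var_x, nu, lam2) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def matched_scale(sigma2_map, nu0):
    """Scale factor whose scaled inverse chi-square mode equals sigma2_map."""
    return (nu0 + 2.0) / nu0 * sigma2_map
```

For `var_x = 0` the search reproduces the closed-form result $(\nu_l \lambda_l^2 + 2|y_{l+1}|^2)/(\nu_l + 4)$ of the speech-free case, and when the speech power dominates the observation the estimate stays close to the mode of the prior.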


Experimental framework

MAP-B as postprocessor

• First noise PSD estimator: Improved Minima Controlled Recursive Averaging (IMCRA) algorithm [Cohen, 2003]

• Gain function: Optimally-Modified Log-Spectral Amplitude (OM-LSA) estimator [Cohen, Berdugo, 2001]

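[Block diagram. First speech enhancement stage: $Y_{k,l+1}$, IMCRA (output $\sigma^2_{N,k,l+1}$), a priori SNR, OM-LSA, $X^{(I)}_{k,l+1}$. Second speech enhancement stage: $\sigma^2_{X,k,l+1}$, MAP-B (output $\sigma^2_{N,k,l+1}$), a priori SNR, OM-LSA, $X^{(II)}_{k,l+1}$.]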


Performance evaluation

Setup

• Clean speech: TIMIT database, sentences concatenated to 3 minutes length

• Artificially added noise from Noisex92 database:

◮ Noise types: ’Stationary WGN’, ’Triangular WGN’, ’Babble’ and ’Factory-1’

◮ SNR values: −5, 0, 5, 10, 15 dB

• MAP-B estimator: we set $\nu_0 = 40$, corresponding to a time constant of ≃ 0.164 s

Reference noise PSD

• Recursive temporal smoothing

$$\sigma^2_{N,k,l} = 0.95 \cdot \sigma^2_{N,k,l-1} + 0.05 \cdot |N_{k,l}|^2,$$

with known noise periodogram $|N_{k,l}|^2$
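As a sketch, the reference PSD of one frequency bin could be computed as below. The function name `reference_psd` is hypothetical, and the slide does not state how the recursion is initialized, so starting from the first periodogram value is an assumption.

```python
def reference_psd(noise_periodogram, beta=0.95):
    """Smooth the known noise periodogram |N_{k,l}|^2 of one bin k over frames l."""
    smoothed = []
    previous = noise_periodogram[0]  # assumed initialization
    for value in noise_periodogram:
        previous = beta * previous + (1.0 - beta) * value
        smoothed.append(previous)
    return smoothed
```

With $\nu_0 = 40$ the MAP-B recursion for speech-free frames uses $\alpha = 2/44 \approx 0.045$, which is close to the weight of 0.05 applied to the periodogram here.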


Sample trajectories of noise variance estimates

• ’Babble’ noise at frequency bin $k = 97$ (3 kHz)

[Plot: noise variance estimates in dB (axis range 50 to 62) versus time in s (4 to 8)]

• MAP-B: continuous update of the noise variance estimate

• ’Triangular WGN’ (averaged over all frequency bins)

[Plot: noise variance estimates in dB (axis range 64 to 68) versus time in s (4 to 8); curves: reference, IMCRA, MAP-B]

• MAP-B: faster response to rising noise power


Quantitative evaluation

• Performance measures adopted from [Taghia et al., 2011]

• Fig. (a): minimum averaged log distance $LE_m$ between the reference and estimated PSD

• MAP-B obtains lower error $LE_m$ for all noise types and SNRs less than or equal to 5 dB or 10 dB

• Fig. (b): variance of the logarithmic difference $LE_v$

• MAP-B yields lower variance $LE_v$ than the IMCRA for all noise types and SNRs

[Bar charts (a) $LE_m$ and (b) $LE_v$, IMCRA versus MAP-B, for the noise types Stationary WGN, Triangular WGN, Babble and Factory-1 at SNRs of −5, 0, 5, 10 and 15 dB]


Speech enhancement

• Gain in perceptual speech quality (PESQ) scores

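$\mathrm{PESQ}_{\mathrm{Gain}} = \mathrm{PESQ}_{\text{MAP-B}} - \mathrm{PESQ}_{\mathrm{IMCRA}}$

[Bar chart: PESQ gain (axis ticks 0.0 and 0.1) for female and male speakers, for the noise types Stationary WGN, Triangular WGN, Babble and Factory-1 at SNRs of −5, 0, 5, 10 and 15 dB]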

• MAP-B has a favourable effect on speech quality for non-stationary noise types


Summary and outlook

Summary

• The proposed MAP-B estimator is able to track the noise statistics even if the speech is dominant

• Low computational complexity

• Single parameter $\nu_0$

• Experimental evaluation: MAP-B obtains

◮ lower estimation error under low SNR conditions

◮ lower fluctuation of the estimated values under all tested environments

◮ slightly improved speech quality for non-stationary noise types

Outlook

• Investigation of how the performance of the MAP-B algorithm depends:

◮ on the first noise PSD estimator

◮ on the degrees of freedom $\nu_0$


Thank you for your attention!

Questions?


Periodograms of clean, noisy and enhanced signals

• Periodograms of the spoken sentence ’Biblical scholars argue history’ for ’Factory-1’ noise at an SNR of 5 dB:

(a) clean speech signal

(b) noisy speech signal

(c) enhanced speech signal based on IMCRA estimates

(d) enhanced speech signal based on MAP-B estimates

• MAP-B has a positive influence on the periodogram of the enhanced signal, seen particularly clearly in the highlighted window

[Four periodogram panels (a) to (d): frequency axis in kHz (1 to 7), time axis (0.1 to 1.2)]


Minimum Statistics (MS) instead of IMCRA

• Performance measures $LE_m$ and $LE_v$ for female speaker signals

[Bar charts (a) $LE_m$ and (b) $LE_v$, MS versus MAP-B, for the noise types Stationary WGN, Triangular WGN, Babble and Factory-1 at SNRs of −5, 0, 5, 10 and 15 dB]

• Fig. (a): MAP-B obtains lower estimation error $LE_m$ than the MS for all noise types and SNRs

• Fig. (b): MAP-B yields lower variance $LE_v$ than the MS for all tested setups


Minimum Statistics (MS) instead of IMCRA

• $\mathrm{PESQ}_{\mathrm{Gain}} = \mathrm{PESQ}_{\text{MAP-B}} - \mathrm{PESQ}_{\mathrm{NoiseEst}}$ with ’NoiseEst’ ∈ {’IMCRA’, ’MS’} for female speaker signals

[Bar chart: PESQ gain (axis ticks 0.0 and 0.1) relative to IMCRA and to MS, for the noise types Stationary WGN, Triangular WGN, Babble and Factory-1 at SNRs of −5, 0, 5, 10 and 15 dB]

• The favourable effect of the MAP-B estimator on speech quality for non-stationary noise types is smaller for MS than for IMCRA
