Multi-microphone noise reduction and dereverberation techniques for speech applications Simon Doclo Dept. of Electrical Engineering, KU Leuven, Belgium


8 July 2003

Overview

• Introduction

• Basic principles

• Robust broadband beamforming

• Multi-microphone optimal filtering

• Acoustic transfer function estimation and dereverberation

• Conclusion and further research

Overview

• Introduction
  ◦ Motivation and applications
  ◦ Problem statement
  ◦ Contributions






Motivation

• Speech communication applications: hands-free mobile telephony, voice-controlled systems, hearing aids
• Speech acquisition in an adverse acoustic environment:
  ◦ Background noise: fan, radio, other speakers; generally unknown
  ◦ Reverberation: reflections of the signal against walls and objects
• Poor signal quality
• Affects speech intelligibility and speech recognition

Objectives

• Signal enhancement techniques:
  ◦ Noise reduction: reduce the amount of background noise without distorting the speech signal
  ◦ Dereverberation: reduce the effect of signal reflections
  ◦ Combined noise reduction and dereverberation
• Acoustic source localisation: video camera or spotlight

Applications

• Hands-free mobile telephony:
  ◦ Most important application from an economic point of view
  ◦ Hands-free car kit mandatory in many countries
  ◦ Most current systems: 1 directional microphone
• Video-conferencing:
  ◦ Microphone array for source localisation:
    – point camera towards active speaker
    – signal enhancement by steering of microphone array

Applications

• Voice-controlled systems:
  ◦ domotic systems, consumer electronics (HiFi, PC software)
  ◦ added value only when the speech recognition system performs reliably under all circumstances
  ◦ signal enhancement as pre-processing step
• Hearing aids and cochlear implants:
  ◦ most hearing-impaired people suffer from perceptual hearing loss: besides amplification, reduction of noise with respect to the useful speech signal is needed
  ◦ multiple microphones + DSP in hearing aid
  ◦ current systems: simple beamforming
  ◦ robustness important due to small inter-microphone distance

Algorithmic requirements

• ‘Blind’ techniques: unknown noise sources and acoustic environment
• Adaptive: time-variant signals and acoustic environment
• Robustness:
  ◦ Microphone characteristics (gain, phase, position)
  ◦ Other deviations from assumed signal model (look direction error, VAD)
• Integration of different enhancement techniques
• Computational complexity

Problem statement

• Problems of existing techniques:
  ◦ Single-microphone techniques: very limited performance → multi-microphone techniques exploit spatial information; multiple microphones are also required for source localisation
  ◦ A-priori assumptions about position of signal sources and microphone array: large sensitivity to deviations → improve robustness (and performance)
  ◦ Assumption of spatio-temporally white noise → extension to coloured noise
• Objective: development of multi-microphone noise reduction and dereverberation techniques with better performance and robustness for coloured noise scenarios

State-of-the-art and contributions

Single-microphone techniques (only spectral information)
– spectral subtraction [Boll 79, Ephraim 85, Xie 96]
  • Signal-independent transformation
  • Residual noise problem
– subspace-based [Dendrinos 91, Ephraim 95, Jensen 95]
  • Signal-dependent transformation
  • Signal + noise subspace

Multi-microphone techniques (a-priori assumptions)
– fixed beamforming [Dolph 46, Cox 86, Ward 95, Elko 00]
  • Fixed directivity pattern
– adaptive beamforming [Frost 72, Griffiths 82, Gannot 01]
  • adapt to different acoustic environments → performance
  • ‘Generalised Sidelobe Canceller’ (GSC)
– inverse, matched filtering [Miyoshi 88, Flanagan 93, Affes 97]

Contributions
1. Robust broadband beamforming (robustness)
2. Multi-microphone optimal filtering (spatial information)
3. Blind transfer function estimation and dereverberation

Overview

• Introduction
• Basic principles
  ◦ Signal model
  ◦ Signal characteristics and acoustic environment





Signal model

• Signal model for the microphone signals in the time domain: filtered version of the clean speech signal + additive coloured noise

$$y_n[k] = x_n[k] + v_n[k] = h_n[k] * s[k] + v_n[k], \qquad n = 0, \ldots, N-1$$

where $s[k]$ is the speech signal, $h_n[k]$ the acoustic impulse response to microphone $n$, $v_n[k]$ the additive noise and $*$ denotes convolution.

Signal model

• Multi-microphone signal enhancement: the microphone signals are filtered with filters $w_n[k]$ and summed

$$z[k] = \sum_{n=0}^{N-1} w_n[k] * y_n[k] = \underbrace{\sum_{n=0}^{N-1} w_n[k] * h_n[k]}_{f[k]} * \, s[k] + \underbrace{\sum_{n=0}^{N-1} w_n[k] * v_n[k]}_{z_v[k]}$$

  ◦ $f[k]$ = total transfer function for speech component
  ◦ $z_v[k]$ = residual noise component
• Techniques differ in calculation of filters:
  ◦ Noise reduction: minimise residual noise $z_v[k]$ and limit speech distortion
  ◦ Dereverberation: $f[k] = \delta[k]$ by estimating acoustic impulse responses $h_n[k]$
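The signal model and the filter-and-sum operation can be checked numerically. The following is a minimal sketch, assuming white Gaussian stand-ins for the speech and noise signals and random impulse responses and filters; all names and sizes (`N`, `K`, `L`, `T`) are illustrative choices, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, L, T = 3, 8, 4, 200   # microphones, impulse-response taps, filter taps, samples (all assumed)

s = rng.standard_normal(T)                      # stand-in for the clean speech signal s[k]
h = rng.standard_normal((N, K))                 # hypothetical acoustic impulse responses h_n[k]
v = 0.1 * rng.standard_normal((N, T + K - 1))   # additive noise v_n[k]
w = rng.standard_normal((N, L))                 # arbitrary enhancement filters w_n[k]

# Microphone signals: y_n[k] = h_n[k] * s[k] + v_n[k]
x = np.stack([np.convolve(h[n], s) for n in range(N)])
y = x + v

# Filter-and-sum output: z[k] = sum_n w_n[k] * y_n[k]
z = sum(np.convolve(w[n], y[n]) for n in range(N))

# Same output split into a speech part f[k] * s[k] and a residual noise part z_v[k]
f = sum(np.convolve(w[n], h[n]) for n in range(N))     # total transfer function f[k]
z_v = sum(np.convolve(w[n], v[n]) for n in range(N))   # residual noise z_v[k]
z_split = np.convolve(f, s) + z_v

print(np.allclose(z, z_split))
```

Noise reduction and dereverberation then amount to different ways of choosing `w`: keeping `z_v` small, or driving `f` towards a unit impulse.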

Signal characteristics

• Speech:
  ◦ Broadband (300-8000 Hz)
  ◦ Non-stationary
  ◦ On/off characteristic → speech detection algorithm (VAD)
  ◦ Linear low-rank model: linear combination of basis functions, $\mathbf{s}[k] = \sum_{i=1}^{R} a_i[k]\,\mathbf{s}_i$ (R = 12…20)

[Figure: speech waveform, amplitude versus time (sec)]

• Noise:
  ◦ unknown signals (no reference available)
  ◦ slowly time-varying (fan) or non-stationary (radio, speech)
  ◦ localised or diffuse noise

Acoustic environment

• Reverberation time T60: global characterisation
  ◦ Car: 70 ms; room: 250 ms; church: 1500 ms
• Acoustic impulse responses:
  ◦ Acoustic filtering between 2 points in a room
  ◦ FIR filter (K = 1000…2000 taps)
  ◦ Non-minimum-phase system → no stable inverse

[Figure: impulse response (PSK, row 9), amplitude versus time (sec)]

• Microphone array:
  ◦ Assumption: point sensors with ideal characteristics
  ◦ Deviations: gain, phase, position
  ◦ Distance speaker – microphone array: far-field or near-field

Overview

• Introduction
• Basic principles
• Robust broadband beamforming
  ◦ Novel design procedures for broadband beamformers
  ◦ Robust beamforming for gain and phase errors




Fixed beamforming

• Speech and noise sources with overlapping spectrum at different positions → exploit spatial diversity by using multiple microphones
• Technique originally developed for radar applications:
  ◦ Narrowband (delay compensation) → broadband
  ◦ Far-field (planar waves) → near-field (spherical waves)
  ◦ Known sensor characteristics → deviations
• Advantages: low complexity; robustness at low signal-to-noise ratio (SNR)
• Disadvantages: a-priori knowledge of microphone array characteristics required; signal-independent
• FIR filter-and-sum structure: arbitrary spatial directivity pattern for arbitrary microphone array configuration
• Suppress noise and reverberation from certain directions

Filter-and-sum configuration

• Objective: calculate filters $w_n[k]$ such that the beamformer performs the desired (fixed) spatial and spectral filtering
• Far-field: planar waves, equal attenuation
• Spatial directivity pattern:

$$H(\omega,\theta) = \frac{Z(\omega,\theta)}{S(\omega)} = \mathbf{w}^T \mathbf{g}(\omega,\theta)$$

• Desired spatial directivity pattern: $D(\omega,\theta)$
• 2D filter design in angle and frequency
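As an illustration of the spatial directivity pattern, the sketch below evaluates $H(\omega,\theta) = \mathbf{w}^T\mathbf{g}(\omega,\theta)$ for a far-field linear array. The function name `directivity`, the sign convention of the delays, the speed of sound `c = 340` m/s and the array geometry are assumptions made for this example; the slides do not specify them.

```python
import numpy as np

def directivity(w, positions, fs, freqs, angles_deg, c=340.0):
    """Far-field pattern H(omega, theta) = w^T g(omega, theta) of a filter-and-sum beamformer.

    w: (N, L) real FIR taps, positions: (N,) microphone positions on a line (metres).
    Returns a complex array of shape (len(freqs), len(angles_deg)).
    """
    N, L = w.shape
    omega = 2 * np.pi * np.asarray(freqs) / fs             # normalised frequency (rad/sample)
    theta = np.deg2rad(np.asarray(angles_deg))
    tau = np.outer(np.cos(theta), positions) * fs / c      # (A, N) propagation delays in samples
    taps = np.arange(L)
    # Steering element for microphone n and tap l: exp(-j * omega * (l + tau_n(theta)))
    phase = omega[:, None, None, None] * (taps[None, None, None, :] + tau[None, :, :, None])
    g = np.exp(-1j * phase)                                # (F, A, N, L)
    return np.einsum('nl,fanl->fa', w, g)

# Delay-and-sum beamformer steered to broadside (90 degrees): one tap of 1/N per microphone
N, L = 5, 20
w = np.zeros((N, L))
w[:, 0] = 1.0 / N
positions = 0.04 * np.arange(N)                            # 4 cm spacing
freqs = np.array([500.0, 1000.0, 2000.0, 3000.0])
angles = np.array([0.0, 45.0, 90.0, 135.0, 180.0])
H = directivity(w, positions, 8000.0, freqs, angles)
print(np.round(20 * np.log10(np.abs(H) + 1e-12), 1))       # pattern in dB
```

For this delay-and-sum choice the response at 90° is exactly 1 at every frequency, while the attenuation in other directions depends strongly on frequency, which is what motivates a genuinely broadband design.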

Design procedures

• Design filter $\mathbf{w}$ such that the spatial directivity pattern $H(\omega,\theta)$ optimally fits $D(\omega,\theta)$ → minimisation of a cost function
  ◦ Broadband problem: no design for separate frequencies $\omega_i$ → design over complete frequency-angle region
  ◦ No approximations of integrals by finite Riemann-sum
  ◦ Microphone configuration not included in optimisation
• Cost functions:
  ◦ Least-squares (amplitude and phase) → quadratic function

$$J_{LS}(\mathbf{w}) = \int_\Theta \int_\Omega F(\omega,\theta)\, \left| H(\omega,\theta) - D(\omega,\theta) \right|^2 d\omega\, d\theta$$

  ◦ Non-linear cost function [Kajala 99] → iterative optimisation = complex!

$$J_{NL}(\mathbf{w}) = \int_\Theta \int_\Omega F(\omega,\theta)\, \left( |H(\omega,\theta)|^2 - |D(\omega,\theta)|^2 \right)^2 d\omega\, d\theta$$

• Double integrals only need to be calculated once
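The remark that the least-squares cost is quadratic can be made explicit. The derivation below is a sketch under the assumptions that the filter vector $\mathbf{w}$ is real, that $H(\omega,\theta) = \mathbf{w}^T\mathbf{g}(\omega,\theta)$, and that the weighting function $F$ is real and non-negative; the symbols $\mathbf{Q}$, $\mathbf{a}$ and $d$ are introduced here for illustration.

```latex
\begin{align*}
J_{LS}(\mathbf{w})
  &= \int_\Theta\!\int_\Omega F(\omega,\theta)\,
     \bigl|\mathbf{w}^T\mathbf{g}(\omega,\theta) - D(\omega,\theta)\bigr|^2\, d\omega\, d\theta
   = \mathbf{w}^T\mathbf{Q}\,\mathbf{w} - 2\,\mathbf{w}^T\mathbf{a} + d, \\
\mathbf{Q} &= \int_\Theta\!\int_\Omega F\,\mathrm{Re}\{\mathbf{g}\,\mathbf{g}^H\}\, d\omega\, d\theta, \qquad
\mathbf{a} = \int_\Theta\!\int_\Omega F\,\mathrm{Re}\{\mathbf{g}\,D^{*}\}\, d\omega\, d\theta, \qquad
d = \int_\Theta\!\int_\Omega F\,|D|^2\, d\omega\, d\theta, \\
\mathbf{w}_{LS} &= \mathbf{Q}^{-1}\mathbf{a} \quad \text{(provided } \mathbf{Q} \text{ is invertible)}.
\end{align*}
```

Since $\mathbf{Q}$, $\mathbf{a}$ and $d$ do not depend on $\mathbf{w}$, the double integrals are evaluated once and the filter follows from a single linear system.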

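Design procedures

• 2 non-iterative cost functions, based on eigenfilters:
  ◦ Eigenfilters: 1D and 2D FIR filter design [Vaidyanathan 87, Pei 01]
  ◦ Extension to design of broadband beamformers
• Novel cost functions:
  ◦ Conventional eigenfilter technique → (G)EVD; reference point $(\omega_c,\theta_c)$ required

$$J_{eig}(\mathbf{w}) = \int_\Theta \int_\Omega F(\omega,\theta) \left| \frac{D(\omega,\theta)}{D(\omega_c,\theta_c)}\, H(\omega_c,\theta_c) - H(\omega,\theta) \right|^2 d\omega\, d\theta$$

  ◦ Eigenfilter based on TLS-criterion → GEVD

$$J_{TLS}(\mathbf{w}) = \int_\Theta \int_\Omega F(\omega,\theta)\, \frac{\left| H(\omega,\theta) - D(\omega,\theta) \right|^2}{\mathbf{w}^T\mathbf{w} + 1}\, d\omega\, d\theta$$

• Conclusion: TLS-eigenfilter is the preferred non-iterative design procedure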

Simulations

• Parameters: N=5, d=4 cm, L=20, fs=8 kHz; passband: 40°-80°; stopband: 0°-30° and 90°-180°

[Figure: directivity patterns (dB versus angle in degrees and frequency in Hz) for the non-linear procedure, the TLS-eigenfilter and the delay-and-sum beamformer]

Near-field configuration

• Near-field: spherical waves + attenuation → take into account distance r between speaker and microphones
• One specific distance: very similar to far-field design (different calculation of double integrals) → deviation for other distances
• Several distances: trivial extension for most cost functions, for TLS-eigenfilter = sum of generalised Rayleigh quotients
  ◦ Finite number (R) of distances, with weights written here as $\alpha_r$:

$$J_{tot}(\mathbf{w}) = \sum_{r=1}^{R} \alpha_r\, J_r(\mathbf{w})$$

  ◦ Trade-off between performance for different distances
• Ultimate goal: design for all distances

$$J_{tot}(\mathbf{w}) = \int_R \int_\Theta \int_\Omega F(\omega,\theta,r) \left| H(\omega,\theta,r) - D(\omega,\theta,r) \right|^2 d\omega\, d\theta\, dr$$

[Figure: far-field pattern and near-field pattern (r=0.2 m), in dB versus angle (deg) and frequency (Hz), for the far-field design and for the mixed near-field far-field design]

• Parameters: N=5, d=4 cm, L=20, fs=8 kHz; passband: 70°-110°; stopband: 0°-60° and 120°-180°

Robust broadband beamforming

• Small deviations from the assumed microphone characteristics (gain, phase, position) → large deviations from the desired directivity pattern, especially for small-size microphone arrays
• In practice microphone characteristics are never exactly known:
  ◦ Measurement or calibration procedure
  ◦ Incorporate specific (random) deviations in design
• Microphone characteristic with gain, phase and position terms:

$$A_n(\omega,\theta) = \underbrace{a_n(\omega,\theta)}_{\text{gain}}\; \underbrace{e^{-j\psi_n(\omega,\theta)}}_{\text{phase}}\; \underbrace{e^{-j\omega\,\delta_n \cos\theta\, f_s / c}}_{\text{position}}$$

• Consider all feasible microphone characteristics and optimise
  ◦ average performance using probability as weight

$$J_{mean}(\mathbf{w}) = \int_{A_0} \cdots \int_{A_{N-1}} J(\mathbf{w}, A_0, \ldots, A_{N-1})\, f_A(A_0) \cdots f_A(A_{N-1})\; dA_0 \cdots dA_{N-1}$$

    – requires knowledge about probability density functions
  ◦ worst-case performance → minimax optimisation problem
    – finite grid of microphone characteristics → high complexity

Robust broadband beamforming: simulations

• Non-linear design procedure
• N=3, positions: [-0.01 0 0.015] m, L=20, fs=8 kHz
• Passband = 0°-60°, 300-4000 Hz (endfire); stopband = 80°-180°, 300-4000 Hz
• Robust design - average performance: uniform pdf, gain (0.85-1.15) and phase (-5°-10°)
• Deviation = [0.9 1.1 1.05] and [5° -2° 5°]

Design         J        Jdev     Jmean    Jmax
Non-robust     0.1585   87.131   275.40   3623.6
Average cost   0.2196   0.2219   0.3371   0.4990
Maximum cost   0.1707   0.1990   0.4114   0.4167

[Figure: directivity patterns (dB versus angle in degrees and frequency in Hz) of the non-robust design and the robust design, without deviations and with gain/phase deviations]

Overview

• Introduction
• Basic principles
• Robust broadband beamforming
• Multi-microphone optimal filtering
  ◦ GSVD-based optimal filtering technique
  ◦ Reduction of computational complexity
  ◦ Simulations



Multi-microphone optimal filtering

• Objective: optimal estimate of the speech components in the microphone signals
• Properties: multi-microphone, signal-dependent, robustness, no a-priori assumptions
• Minimise the MSE $E\{|x_n[k] - z[k]|^2\}$:

$$\min_{\mathbf{W}[k]} E\left\{ \left\| \mathbf{x}[k] - \mathbf{z}[k] \right\|_2^2 \right\} = \min_{\mathbf{W}[k]} E\left\{ \left\| \mathbf{x}[k] - \mathbf{W}^T[k]\,\mathbf{y}[k] \right\|_2^2 \right\}$$

• Multi-channel Wiener filter:

$$\mathbf{W}_{WF}[k] = \mathbf{R}_{yy}^{-1}[k]\, \mathbf{R}_{yx}[k]$$

• Assumptions: speech and noise independent; 2nd-order statistics of the noise stationary → estimate $\mathbf{R}_{vv}[k]$ during noise periods (VAD):

$$\mathbf{W}_{WF}[k] = \mathbf{R}_{yy}^{-1}[k] \left( \mathbf{R}_{yy}[k] - \mathbf{R}_{vv}[k] \right)$$
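The multi-channel Wiener filter formula can be tried on known statistics. The sketch below assumes a synthetic rank-1 speech correlation matrix and a random coloured-noise correlation matrix (the names `a`, `C` and the helper `mse` are illustrative, not from the slides) and evaluates the theoretical MSE for uncorrelated speech and noise.

```python
import numpy as np

rng = np.random.default_rng(1)
M = 4                                    # filter dimension (assumed)
a = rng.standard_normal(M)               # hypothetical speech vector defining a rank-1 R_xx
C = rng.standard_normal((M, M))
R_xx = np.outer(a, a)                    # speech correlation matrix
R_vv = C @ C.T                           # coloured-noise correlation matrix
R_yy = R_xx + R_vv                       # speech and noise independent

# Multi-channel Wiener filter: W_WF = R_yy^{-1} (R_yy - R_vv)
W_wf = np.linalg.solve(R_yy, R_yy - R_vv)

def mse(W):
    """E{||x - W^T y||^2} for y = x + v with the correlation matrices above."""
    D = np.eye(M) - W                    # error: (I - W)^T x - W^T v
    return np.trace(D.T @ R_xx @ D) + np.trace(W.T @ R_vv @ W)

print("no filtering (W = I):", mse(np.eye(M)))
print("Wiener filter       :", mse(W_wf))
```

In a real system `R_yy` and `R_vv` would be estimated from data, with `R_vv` updated only during the noise-only periods flagged by the VAD.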

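Multi-microphone optimal filtering

• Implementation procedure:
  ◦ based on Generalised Eigenvalue Decomposition (GEVD)
    – take into account low-rank model of speech
    – trade-off between noise reduction and speech distortion
  ◦ QRD [Rombouts 2002], subband [Spriet 2001] → lower complexity
• Generalised Eigenvalue Decomposition (GEVD), valid for coloured noise:

$$\mathbf{R}_{yy}[k] = \mathbf{Q}[k]\, \mathbf{\Lambda}_y[k]\, \mathbf{Q}^T[k], \qquad \mathbf{R}_{vv}[k] = \mathbf{Q}[k]\, \mathbf{\Lambda}_v[k]\, \mathbf{Q}^T[k]$$

with $\mathbf{\Lambda}_y[k] = \mathrm{diag}\{\sigma_i^2[k]\}$ and $\mathbf{\Lambda}_v[k] = \mathrm{diag}\{\eta_i^2[k]\}$
• Low-rank model:

$$\sigma_i^2[k] > \eta_i^2[k], \quad i = 1 \ldots R; \qquad \sigma_i^2[k] = \eta_i^2[k], \quad i = R+1 \ldots M$$

• Resulting filter (signal-dependent FIR filterbank):

$$\mathbf{W}_{WF}[k] = \mathbf{Q}^{-T}[k]\; \mathrm{diag}\left\{ 1 - \frac{\eta_i^2[k]}{\sigma_i^2[k]} \right\} \mathbf{Q}^T[k]$$

• Speech detection mechanism is the only a-priori assumption: required for estimation of correlation matrices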

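General class of estimators

• Multi-channel Wiener filter: always a combination of noise reduction and (linear) speech distortion; estimation error:

$$\mathbf{e}[k] = \underbrace{\left( \mathbf{I}_M - \mathbf{W}_{WF}^T[k] \right) \mathbf{x}[k]}_{\text{speech distortion}} - \underbrace{\mathbf{W}_{WF}^T[k]\, \mathbf{v}[k]}_{\text{residual noise}}$$

• General class: trade-off between noise reduction and speech distortion [Ephraim 95], with trade-off parameter written here as $\mu$:

$$\mathbf{W}_{WF}[k] = \mathbf{Q}^{-T}[k]\; \mathrm{diag}\left\{ \frac{\sigma_i^2[k] - \eta_i^2[k]}{\sigma_i^2[k] + (\mu - 1)\, \eta_i^2[k]} \right\} \mathbf{Q}^T[k]$$

    – $\mu = 1$: MMSE (equal importance)
    – $\mu < 1$: less speech distortion, less noise reduction
    – $\mu > 1$: more speech distortion, more noise reduction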
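The GEVD-based form of the filter can be sketched with plain NumPy. The code below assumes a joint diagonalisation obtained by whitening with a Cholesky factor of the noise correlation matrix (one of several possible ways to compute a GEVD, not necessarily the one used in the original work); with this normalisation all $\eta_i^2$ equal 1. The function names `gevd` and `gevd_filter`, the parameter name `mu` and the synthetic matrices are illustrative assumptions.

```python
import numpy as np

def gevd(R_yy, R_vv):
    """Joint diagonalisation R_yy = Q diag(sig2) Q^T and R_vv = Q diag(eta2) Q^T."""
    Lc = np.linalg.cholesky(R_vv)                  # R_vv = Lc Lc^T
    A = np.linalg.solve(Lc, R_yy)                  # Lc^{-1} R_yy
    Cw = np.linalg.solve(Lc, A.T).T                # Lc^{-1} R_yy Lc^{-T} (whitened R_yy)
    Cw = 0.5 * (Cw + Cw.T)                         # remove round-off asymmetry
    sig2, U = np.linalg.eigh(Cw)                   # ascending eigenvalues
    Q = Lc @ U
    eta2 = np.ones_like(sig2)                      # noise eigenvalues are 1 after whitening
    return Q, sig2, eta2

def gevd_filter(R_yy, R_vv, mu=1.0):
    """W = Q^{-T} diag{(sig2 - eta2) / (sig2 + (mu - 1) eta2)} Q^T; mu = 1 gives the Wiener filter."""
    Q, sig2, eta2 = gevd(R_yy, R_vv)
    gain = (sig2 - eta2) / (sig2 + (mu - 1.0) * eta2)
    return np.linalg.solve(Q.T, np.diag(gain) @ Q.T)

rng = np.random.default_rng(4)
M = 4
a = rng.standard_normal(M)                         # hypothetical rank-1 speech model
C = rng.standard_normal((M, M))
R_vv = C @ C.T + 0.1 * np.eye(M)                   # coloured noise, kept well conditioned
R_yy = np.outer(a, a) + R_vv

Q, sig2, eta2 = gevd(R_yy, R_vv)
W = gevd_filter(R_yy, R_vv)                        # mu = 1
W_direct = np.linalg.solve(R_yy, R_yy - R_vv)      # R_yy^{-1} (R_yy - R_vv)
W_mu2 = gevd_filter(R_yy, R_vv, mu=2.0)            # more noise reduction, more distortion

print(np.round(sig2, 6))                           # M-1 values equal to 1, one larger
print(np.allclose(W, W_direct))
```

With a rank-1 speech model only one generalised eigenvalue exceeds the noise level, which is the low-rank structure the filter exploits.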

Frequency-domain analysis

• Decomposition into a spectral and a spatial filtering term:

$$\mathbf{W}_{WF}\,\mathbf{e}_1 = \underbrace{\frac{P_x}{P_x + P_v}}_{\text{spectral filtering (PSD)}}\; \underbrace{\mathbf{\Gamma}_y^{-1}\, \mathbf{\Gamma}_x\, \mathbf{e}_1}_{\text{spatial filtering (coherence)}}$$

• Desired beamforming behaviour for simple scenarios

Complexity reduction

• Recursive version: each time step calculation of GSVD + filter → high computational complexity
• Complexity reduction using:
  ◦ Recursive techniques for recomputing GSVD [Moonen 90]
  ◦ Sub-sampling (stationary acoustic environments)

                   Batch                Recursive    QRD [Rombouts]
Flops per update   16M³ + 3M²(P+Q)      20.5 M²      3.5 M²
sub = 1            7504 Gflops          2.1 Gflops   358 Mflops
sub = 20           375 Gflops           105 Mflops   18 Mflops

(N = 4, L = 20, M = 80, fs = 16 kHz, P = 4000, Q = 20000)

• Real-time implementation possible
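The figures in the complexity table follow from the per-update operation counts multiplied by the update rate fs/sub. The short check below assumes that reading of the formulas (batch: 16M³ + 3M²(P+Q), recursive: 20.5M², QRD: 3.5M²); the function name `gflops` is illustrative.

```python
M, P, Q, fs = 80, 4000, 20000, 16000      # M = N * L = 4 * 20

flops_per_update = {
    "batch": 16 * M**3 + 3 * M**2 * (P + Q),
    "recursive": 20.5 * M**2,
    "qrd": 3.5 * M**2,
}

def gflops(name, sub):
    """Operations per second in Gflops when the filter is updated every `sub` samples."""
    return flops_per_update[name] * fs / sub / 1e9

for sub in (1, 20):
    for name in flops_per_update:
        print(f"sub = {sub:2d}  {name:9s}  {gflops(name, sub):10.3f} Gflops")
```

The batch cost is dominated by the 3M²(P+Q) term, so sub-sampling alone does not bring it near real time; the recursive and QRD-based updates do.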

Complexity reduction

• Incorporation in ‘Generalised Sidelobe Canceller’ (GSC) structure: adaptive beamforming
  ◦ Creation of speech reference and noise reference signals
  ◦ Standard multi-channel adaptive filter (LMS, APA)

[Figure: block diagram: the microphone signals y_0[k] … y_{N-1}[k] are processed by the optimal filter (w_0[k] … w_{N-1}[k]) to give a speech reference and noise reference(s); the noise reference(s) pass through an adaptive filter and are subtracted from the delayed speech reference]

• Increase noise reduction performance
• Complexity reduction by using shorter filters

Simulations

• N=4, SNR=0 dB, 3 noise sources (white, speech, music), fs=16 kHz
• Performance: improvement of signal-to-noise ratio (SNR)

[Figure: unbiased SNR (dB) versus reverberation time (msec) for the delay-and-sum beamformer, GSC (LANC=400, noise ref=Griffiths-Jim), recursive GSVD (L=20, LANC=400, all noise references) and recursive GSVD (L=20, no ANC)]

Simulations

• N=4, SNR=0 dB, 3 noise sources, fs=16 kHz, T60=300 msec
• ‘Power Transfer Functions’ (PTF) for speech and noise component

[Figure: speech and noise PTFs, spectrum (dB) versus frequency (Hz), for recursive GSVD (L=20, no ANC) and recursive GSVD (L=20, LANC=400, all noise references)]

Conclusions

• GSVD-based optimal filtering technique:
  ◦ Multi-microphone extension of single-microphone subspace-based enhancement techniques
  ◦ Signal-dependent low-rank model of speech
  ◦ No a-priori assumptions about position of speaker and microphones
• SNR-improvement higher than GSC for all reverberation times and all considered acoustic scenarios
• More robust to deviations from signal model:
  ◦ Microphone characteristics
  ◦ Position of speaker
  ◦ VAD: only a-priori information!
    – No effect on SNR-improvement
    – Limited effect on speech distortion

Advantages - Disadvantages

                   Fixed beamforming   Adaptive beamforming   Optimal filtering
Signal-dependent   no                  yes                    yes
Noise reduction    +                   ++                     +++
Dereverberation    +                   +                      no
Complexity         low                 average                high
VAD                no                  yes                    yes
Robustness         - (+)               -- (+)                 ++

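Overview

• Introduction
• Basic principles
• Robust broadband beamforming
• Multi-microphone optimal filtering
• Acoustic transfer function estimation and dereverberation
  ◦ Time-domain technique
  ◦ Frequency-domain technique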



Objective

[Figure: block diagram: acoustic impulse responses h_n[k], microphone signals y_0[k] … y_{N-1}[k], filters w_0[k] … w_{N-1}[k], output z[k]]

• Blind estimation of acoustic impulse responses (time-domain, frequency-domain)
• Used for:
  ◦ Dereverberation
  ◦ Noise reduction and dereverberation
  ◦ Source localisation

Time-domain techniques

• Signal model for N=2 and no background noise: $Y_0(z) = H_0(z)\,S(z)$, $Y_1(z) = H_1(z)\,S(z)$
• Subspace-based technique: impulse responses can be computed from the null-space of the speech correlation matrix $\mathbf{R}_{yy}[k]$
  ◦ Eigenvector corresponding to smallest eigenvalue
  ◦ Null-space vector is $\pm\alpha\,[-\mathbf{h}_1;\ \mathbf{h}_0]$: filtering $Y_0(z)$ with $-H_1(z)$ and $Y_1(z)$ with $H_0(z)$ gives $E(z) = 0$
  ◦ Coloured noise: GEVD
  ◦ Problems occurring in time-domain technique:
    – sensitivity to underestimation of impulse response length
    – low-rank model in combination with background noise
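The null-space idea can be illustrated for the noise-free two-microphone case. The sketch below assumes random impulse responses of known length `K` and a white source signal; the helper `conv_matrix` and all variable names are illustrative. It builds the correlation matrix of the stacked microphone data and reads the impulse responses, up to a common scale factor, from the eigenvector of the smallest eigenvalue.

```python
import numpy as np

def conv_matrix(y, K):
    """Rows [y[k], y[k-1], ..., y[k-K+1]] for k = K-1 ... len(y)-1."""
    return np.stack([y[k - K + 1:k + 1][::-1] for k in range(K - 1, len(y))])

rng = np.random.default_rng(2)
K, T = 6, 2000                                   # impulse response length, samples (assumed)
s = rng.standard_normal(T)                       # source signal (unknown to the estimator)
h0, h1 = rng.standard_normal(K), rng.standard_normal(K)
y0 = np.convolve(h0, s)                          # microphone signals, no background noise
y1 = np.convolve(h1, s)

# Cross-relation: h1 * y0 - h0 * y1 = 0, so [h1; -h0] lies in the null-space
Y = np.hstack([conv_matrix(y0, K), conv_matrix(y1, K)])
R_yy = Y.T @ Y / len(Y)                          # (2K x 2K) correlation matrix
eigval, eigvec = np.linalg.eigh(R_yy)            # ascending eigenvalues
u = eigvec[:, 0]                                 # eigenvector of the smallest eigenvalue

h1_est, h0_est = u[:K], -u[K:]                   # estimates up to a common factor +/- alpha
truth = np.concatenate([h1, -h0])
cosine = abs(u @ truth) / np.linalg.norm(truth)
print(eigval[0] / eigval[-1], cosine)
```

The example uses the exact length `K`; as noted above, the technique is sensitive to underestimating this length and to background noise, which this noise-free sketch does not show.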





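Stochastic gradient algorithm

• Batch estimation techniques form the basis for deriving an adaptive stochastic gradient algorithm:

$$\min_{\mathbf{u}}\; \mathbf{u}^T \mathbf{R}_{yy}[k]\, \mathbf{u}, \quad \text{subject to } \mathbf{u}^T \mathbf{R}_{vv}[k]\, \mathbf{u} = 1$$

$$e[k] = \mathbf{u}^T[k]\, \mathbf{y}[k]$$

$$\tilde{\mathbf{u}}[k+1] = \mathbf{u}[k] - \rho\, e[k] \left\{ \mathbf{y}[k] - e[k]\, \mathbf{R}_{vv}[k]\, \mathbf{u}[k] \right\}$$

$$\mathbf{u}[k+1] = \frac{\tilde{\mathbf{u}}[k+1]}{\sqrt{\tilde{\mathbf{u}}^T[k+1]\, \mathbf{R}_{vv}[k]\, \tilde{\mathbf{u}}[k+1]}}$$

with step size written here as $\rho$.
• Usage:
  ◦ Estimation of partial impulse responses → time-delay estimation for acoustic source localisation
  ◦ For source localisation the adaptive GEVD algorithm is more robust than the adaptive EVD algorithm (and prewhitening) in reverberant environments with a large amount of noise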





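Frequency-domain techniques

• Problems of time-domain technique → frequency-domain
• Signal model: rank-1 model

$$\underbrace{\begin{bmatrix} Y_0(\omega) \\ Y_1(\omega) \\ \vdots \\ Y_{N-1}(\omega) \end{bmatrix}}_{\mathbf{Y}(\omega)} = \underbrace{\begin{bmatrix} H_0(\omega) \\ H_1(\omega) \\ \vdots \\ H_{N-1}(\omega) \end{bmatrix}}_{\mathbf{H}(\omega)} S(\omega) + \underbrace{\begin{bmatrix} V_0(\omega) \\ V_1(\omega) \\ \vdots \\ V_{N-1}(\omega) \end{bmatrix}}_{\mathbf{V}(\omega)}$$

• Estimation of acoustic transfer function vector $\mathbf{H}(\omega)$ from GEVD of correlation matrices $\mathbf{R}_{yy}(\omega)$ and $\mathbf{R}_{vv}(\omega)$
  ◦ Corresponding to largest generalised eigenvalue → no stochastic gradient algorithm available (yet)
  ◦ Unknown scaling factor in each frequency bin: can be determined only if the norm $\|\mathbf{H}(\omega)\|$ is known → algorithm only useful when position of source is fixed (e.g. desktop, car)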





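Combined noise reduction and dereverberation

• Filtering operation in frequency domain:

$$Z(\omega) = \mathbf{W}^H(\omega)\, \mathbf{Y}(\omega) = \underbrace{\mathbf{W}^H(\omega)\, \mathbf{H}(\omega)}_{F(\omega)}\, S(\omega) + \underbrace{\mathbf{W}^H(\omega)\, \mathbf{V}(\omega)}_{\text{residual noise}}$$

• Dereverberation: $F(\omega) = 1$ → normalised matched filter

$$\mathbf{W}_d(\omega) = \frac{\mathbf{H}(\omega)}{\|\mathbf{H}(\omega)\|^2}$$

• Combined noise reduction and dereverberation: $Z(\omega)$ is the optimal (MMSE) estimate of $S(\omega)$
  ◦ Optimal estimate of $s[k]$ → integration of multi-channel Wiener filter with normalised matched filter, using $\hat{\mathbf{X}}(\omega) = \mathbf{H}(\omega)\, \hat{S}(\omega)$
  ◦ Trade-off between both objectives
• Implementation: overlap-save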
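A small numerical check of the normalised matched filter, assuming random complex transfer functions for a handful of frequency bins (the array sizes and names are illustrative): the filter gives a total speech transfer function of 1 in every bin, and whatever remains in the output besides the source is filtered noise.

```python
import numpy as np

rng = np.random.default_rng(3)
N, B = 4, 8                                    # microphones, frequency bins (assumed)
H = rng.standard_normal((B, N)) + 1j * rng.standard_normal((B, N))   # hypothetical H(omega)
S = rng.standard_normal(B) + 1j * rng.standard_normal(B)             # source spectrum
V = 0.1 * (rng.standard_normal((B, N)) + 1j * rng.standard_normal((B, N)))
Y = H * S[:, None] + V                         # rank-1 model: Y = H S + V

# Normalised matched filter per bin: W_d = H / ||H||^2
W_d = H / np.sum(np.abs(H) ** 2, axis=1, keepdims=True)

F = np.sum(np.conj(W_d) * H, axis=1)           # F(omega) = W_d^H H
Z = np.sum(np.conj(W_d) * Y, axis=1)           # Z(omega) = W_d^H Y
residual = np.sum(np.conj(W_d) * V, axis=1)    # W_d^H V

print(np.allclose(F, 1.0), np.allclose(Z - S, residual))
```

In practice `H` is only known up to the per-bin scaling factor mentioned above, so the equality `F = 1` holds only when the norm of `H` is available.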





Simulations

• N=4, d=2 cm, fs=16 kHz, SNR=0 dB, T60=400 msec
• FFT-size L=1024, overlap R=16
• Performance criteria:
  ◦ Signal-to-noise ratio (SNR)
  ◦ Dereverberation-index (DI):

$$\mathrm{DI} = \frac{1}{2\pi} \int \left|\, 20 \log_{10} \left| \mathbf{W}^H(\omega)\, \mathbf{H}(\omega) \right| \,\right| d\omega$$

                                               SNR (dB)   DI (dB)
Original microphone signal                     2.88       4.74
Noise reduction                                16.82      4.73
Dereverberation                                2.30       0.86
Combined noise reduction and dereverberation   10.12      1.35











Conclusion

• Low signal quality due to background noise and reverberation → signal enhancement to improve speech intelligibility and ASR performance
• Existing techniques:
  ◦ Single-microphone techniques: spectral information
  ◦ Standard beamforming: a-priori assumptions
• Aims: multi-microphone, signal-dependent, no a-priori assumptions
• Techniques presented:
  ◦ Robust broadband beamforming
  ◦ Multi-microphone optimal filtering
  ◦ Blind transfer function estimation and dereverberation

Contributions

• Robust broadband beamforming:
  ◦ novel cost functions for broadband far-field design (non-linear, eigenfilter-based)
  ◦ extension to near-field and mixed near-field far-field
  ◦ 2 procedures for robust design against gain and phase deviations
• GSVD-based optimal filter technique for multi-microphone noise reduction:
  ◦ extension of single-microphone subspace-based techniques → multiple microphones
  ◦ integration in GSC-structure
  ◦ better performance and robustness than beamforming
• Acoustic transfer function estimation and dereverberation:
  ◦ stochastic gradient algorithm for estimation of time-delay and acoustic source localisation (coloured noise)
  ◦ combined noise reduction and dereverberation in frequency-domain

Further research

• Combination of multi-channel Wiener-filter and fixed beamforming:
  ◦ Low SNR: VAD fails → poor performance of Wiener-filter
  ◦ Combined technique: more robust when VAD fails, better performance than fixed beamformers in other scenarios
• Acoustic transfer function estimation and dereverberation:
  ◦ Time-domain: underlying reason for high sensitivity
  ◦ Frequency-domain: unknown scaling factor → BSS?
  ◦ other blind identification techniques (LP, NL Kalman-filtering)
• Further complexity reduction of multi-channel optimal filtering technique:
  ◦ Stochastic gradient algorithms
  ◦ Subband/frequency-domain

Relevant publications

• S. Doclo and M. Moonen, “GSVD-based optimal filtering for single and multimicrophone speech enhancement,” IEEE Trans. Signal Processing, vol. 50, no. 9, pp. 2230-2244, Sep. 2002.

• S. Doclo and M. Moonen, “Multi-Microphone Noise Reduction Using Recursive GSVD-Based Optimal Filtering with ANC Postprocessing Stage,” Accepted for publication in IEEE Trans. Speech and Audio Processing, 2003.

• S. Doclo and M. Moonen, “Robust adaptive time delay estimation for speaker localisation in noisy and reverberant acoustic environments,” EURASIP Journal on Applied Signal Processing, Sep. 2003.

• S. Doclo and M. Moonen, “Combined frequency-domain dereverberation and noise reduction technique for multi-microphone speech enhancement,” in Proc. Int. Workshop on Acoustic Echo and Noise Control (IWAENC), Darmstadt, Germany, Sep. 2001, pp. 31-34.

• S. Doclo and M. Moonen, “Design of far-field and near-field broadband beamformers using eigenfilters,” Accepted for publication in Signal Processing, 2003.

• S. Doclo and M. Moonen, “Design of robust broadband beamformers for gain and phase errors in the microphone array characteristics,” IEEE Trans. Signal Processing, Oct. 2003.

Available at http://www.esat.kuleuven.ac.be/~doclo/publications.html


