
Proceedings of ICSP '98

Frame-based Subband Kalman Filtering for Speech Enhancement

Wen-Rong Wu and Po-Chen Chen Dept. of Communication Engineering

National Chiao Tung University, Hsinchu, Taiwan, China

Hwai-Tsu Chang and Chun-Hung Kuo Advanced Technology Center, Computer & Communication Research Lab.

Industrial Technology Research Institute, Hsinchu, Taiwan, China

Abstract

Kalman filtering is an effective speech enhancement technique in which speech and noise signals are usually modeled as autoregressive (AR) processes and represented in the state-space domain. Since AR coefficient identification and Kalman filtering require extensive computations, practical implementation of this approach is difficult. This paper proposes a simple and practical scheme that overcomes these problems. Speech signals are first decomposed into subbands. Subband speech signals are then modeled as low-order AR processes, such that low-order Kalman filters can be applied. Enhanced fullband speech signals are finally obtained by combining the enhanced subband speech signals. Using a frame-based algorithm, autocorrelation functions of subband speech are calculated and the Yule-Walker equations are solved to obtain the AR parameters. Simulation results show that Kalman filtering in the subband domain not only greatly reduces the computational complexity, but also achieves better performance than Kalman filtering in the fullband domain.

I. Introduction

The Kalman filtering technique is well known in signal processing. In [1], Paliwal and Basu used a Kalman filter to enhance speech corrupted by white noise. Gibson, Koo, and Gray considered speech enhancement with colored noise in [2]. They modeled both speech and colored noise as AR processes and developed scalar and vector Kalman filtering algorithms. To estimate the AR coefficients, an EM-based algorithm was employed. In [3], Lee, Lee, and Ann proposed a non-Gaussian AR model for speech signals. They modeled the distribution of the driving noise as a Gaussian mixture and applied a decision-directed nonlinear Kalman filter. Again, an EM-based algorithm was used to identify unknown parameters. Niedźwiecki and Cisowski [4] assumed that speech signals are nonstationary AR processes and used a random-walk model for the AR coefficients. An extended Kalman filter was then used to simultaneously estimate speech and AR coefficients.

The aforementioned Kalman filtering algorithms are still not suitable for practical implementation, because a high-order AR model is needed to model a speech signal well. Thus, both the identification of the AR coefficients and the application of the high-order Kalman filter require extensive computations. To overcome these drawbacks, we suggest modeling and filtering speech signals in the subband domain such that lower-order Kalman filters can be applied. To identify the AR parameters of subband speech signals, we propose a frame-based method. We first divide each input subband signal into consecutive frames and then calculate the autocorrelation function of the speech signal. Solving the corresponding Yule-Walker equations, we obtain the desired AR parameters and carry out the subband Kalman filtering.

II. Conventional Kalman Filtering

A. White Noise Filtering

On a short-time basis, a speech sequence $\{z(n)\}$ can be represented as an AR process given by

$$z(n) = \sum_{i=1}^{p} a_i z(n-i) + w(n), \qquad (1)$$

where $w(n)$ is a zero-mean white Gaussian process with variance $\sigma_w^2$. The observed speech signal $s(n)$

0-7803-4325-5/98/$10.00

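$$F = \begin{bmatrix} a_1 & a_2 & \cdots & a_{p-1} & a_p \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix}_{p \times p}, \qquad (5)$$

$$v(n) = \mathbf{h}_v^T \mathbf{v}(n), \qquad (13)$$

where $F_v$, $\mathbf{g}_v$, and $\mathbf{h}_v$ are identical to those in (5) and (6), except that $a_i$, $p$ are replaced by $b_i$, $q$. Combining (12), (13), (3) and (4), we then have

$$\bar{\mathbf{z}}(n) = \bar{F}\,\bar{\mathbf{z}}(n-1) + \bar{G}\,\bar{\mathbf{w}}(n), \qquad (14)$$

$$s(n) = \bar{\mathbf{h}}^T \bar{\mathbf{z}}(n), \qquad (15)$$

where $\bar{\mathbf{h}}^T = [\,\mathbf{h}^T \;\; \mathbf{h}_v^T\,]$. The covariance matrix of $\bar{\mathbf{w}}(n)$ is defined as $Q \triangleq E[\bar{\mathbf{w}}(n)\bar{\mathbf{w}}^T(n)] = \mathrm{diag}(\sigma_w^2, \sigma_u^2)$. The Kalman equations for (14) and (15) are then obtained by setting $\sigma_v^2 = 0$ and replacing $\hat{\mathbf{z}}(n)$, $F$, $\mathbf{h}$, and $\mathbf{g}$ with $\hat{\bar{\mathbf{z}}}(n)$, $\bar{F}$, $\bar{\mathbf{h}}$, and $\bar{G}$ in (7)-(10). The speech estimate is then $\hat{z}(n) = [\,\mathbf{h}^T \;\; \mathbf{0}^T\,]\,\hat{\bar{\mathbf{z}}}(n)$.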
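The augmented model in (14)-(15) can be sketched in Python as follows. This is an illustrative sketch only, not the authors' implementation: the function names are hypothetical, the vectors g and h are assumed to be the usual unit vectors [1, 0, ..., 0]^T, and the recursion is the textbook Kalman predictor/corrector with zero measurement noise, standing in for equations (7)-(10). Both AR orders are assumed to be at least one.

```python
import numpy as np

def companion(coeffs):
    """Companion matrix with the layout of (5) for AR coefficients [c_1, ..., c_k]."""
    k = len(coeffs)
    F = np.zeros((k, k))
    F[0, :] = coeffs
    F[1:, :-1] = np.eye(k - 1)
    return F

def augmented_model(a, b, var_w, var_u):
    """Stack a speech AR(p) model and a noise AR(q) model into one state space."""
    p, q = len(a), len(b)
    F = np.zeros((p + q, p + q))
    F[:p, :p] = companion(a)
    F[p:, p:] = companion(b)
    G = np.zeros((p + q, 2))
    G[0, 0] = 1.0          # assumed g: speech driving noise enters the first speech state
    G[p, 1] = 1.0          # assumed g_v: noise driving noise enters the first noise state
    h = np.zeros(p + q)
    h[0] = 1.0             # assumed h: current speech sample
    h[p] = 1.0             # assumed h_v: current noise sample
    Q = np.diag([var_w, var_u])
    return F, G, h, Q

def kalman_colored(s, a, b, var_w, var_u):
    """Estimate the speech sequence from s(n) = speech + colored noise."""
    F, G, h, Q = augmented_model(a, b, var_w, var_u)
    x = np.zeros(F.shape[0])
    P = np.eye(F.shape[0])
    out = np.empty(len(s))
    for n, obs in enumerate(s):
        x = F @ x                            # state prediction
        P = F @ P @ F.T + G @ Q @ G.T        # covariance prediction
        k = P @ h / (h @ P @ h)              # gain; measurement noise variance is zero
        x = x + k * (obs - h @ x)            # correction with the innovation
        P = P - np.outer(k, h @ P)
        out[n] = x[0]                        # speech estimate: first speech state
    return out
```

The cost per sample grows with the square of the state dimension p + q, which is the motivation for keeping the AR orders low.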

III. Frame-Based Subband Kalman Filtering

A. Formulation

The block diagram of the proposed speech enhancement system is shown in Fig. 1. Noisy speech $s(n)$ is first split into a set of subband signals $s_i(n)$, $i = 1, \ldots, M$, by an M-channel analysis filter bank and M-fold decimators. The subband signal $s_i(n)$ can be expressed as follows:

$$s_i(n) = z_i(n) + v_i(n), \quad i = 1, \ldots, M, \qquad (18)$$

where $z_i(n)$ and $v_i(n)$ are subband signals of $z(n)$ and $v(n)$, respectively. If $v(n)$ is white, we can approximate $v_i(n)$ as white. If $v(n)$ is colored, $v_i(n)$ is also colored, and we model $v_i(n)$ as an AR process. Since subband speech signals have simpler spectra than their fullband counterparts, they can be modeled as lower-order AR signals. The Kalman filtering operations are greatly simplified in these cases. Let AR(k) denote the k-th order AR model. Then, if AR(1) is used, $z_i(n)$ can be expressed as follows:

$$z_i(n) = c_i z_i(n-1) + w_i(n), \qquad (19)$$

where $w_i(n)$ is a zero-mean white Gaussian process with variance $\sigma_{w_i}^2$. Equation (19) is the state equation for subband signals. Combining it with the measurement equation in (18), we can apply a bank of Kalman filters to the subband speech signals. For AR(0), we simply set $c_i = 0$ in (19). For orders higher than one, we obtain equations similar to (3)-(4). The filtered subband signal, denoted as $\hat{z}_i(n)$, is up-sampled by expanders and then processed by an M-channel synthesis filter bank to reconstruct the filtered signal $\hat{z}(n)$.
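For the AR(1) model in (19) with white subband noise, the Kalman filter reduces to a few scalar operations per sample. The sketch below is a hedged illustration under that assumption; the function names, the measurement noise variance argument `var_v`, and the initial values are choices made here, not taken from the paper.

```python
import numpy as np

def kalman_ar1(s, c, var_w, var_v):
    """Scalar Kalman filter for z(n) = c*z(n-1) + w(n), s(n) = z(n) + v(n).

    var_w is the driving noise variance and var_v the (white) subband noise
    variance; both are assumed known here.
    """
    s = np.asarray(s, dtype=float)
    z_hat = 0.0      # assumed initial state estimate
    P = var_w        # assumed initial error variance
    out = np.empty_like(s)
    for n in range(len(s)):
        z_pred = c * z_hat                    # state prediction from (19)
        P_pred = c * c * P + var_w            # predicted error variance
        K = P_pred / (P_pred + var_v)         # Kalman gain
        z_hat = z_pred + K * (s[n] - z_pred)  # correction with the innovation
        P = (1.0 - K) * P_pred
        out[n] = z_hat
    return out

def filter_subbands(subbands, c, var_w, var_v):
    """Run one scalar filter per subband (a bank of Kalman filters)."""
    return [kalman_ar1(s_i, c_i, vw_i, vv_i)
            for s_i, c_i, vw_i, vv_i in zip(subbands, c, var_w, var_v)]
```

With `c = 0` (the AR(0) case) the recursion collapses to a fixed gain `var_w / (var_w + var_v)` applied to each sample, which is why the (0,0) configuration is so cheap.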

B. Parameter Estimation

To use the Kalman filter, the AR parameters of the speech and noise signals must be estimated. It is known that the AR parameters of an AR process can be obtained by solving the corresponding Yule-Walker equations [6]. Let $v_i(n)$ be modeled as a q-th order AR process, $\mathbf{v}_i(n) = [v_i(n), v_i(n-1), \ldots, v_i(n-q+1)]^T$, and

$$R_{vv}^i = E\{\mathbf{v}_i(n)\mathbf{v}_i^T(n)\}, \quad P_v^i = E\{v_i(n+1)\mathbf{v}_i(n)\}. \qquad (20)$$

Then, it has been shown [6] that the AR coefficients of $v_i(n)$, $\mathbf{b}^i = [b_1^i, b_2^i, \ldots, b_q^i]^T$, can be found as

$$\mathbf{b}^i = (R_{vv}^i)^{-1} P_v^i. \qquad (21)$$

The corresponding driving noise variance is

$$\sigma_{u_i}^2 = r_{vv}^i(0) - \sum_{j=1}^{q} b_j^i\, r_{vv}^i(j), \qquad (22)$$

where $r_{vv}^i(\tau)$ is the autocorrelation function of $v_i(n)$. Note that the entries of $R_{vv}^i$ and $P_v^i$ also consist of the autocorrelation function $r_{vv}^i(\tau)$ for $\tau = 0, 1, \ldots, q$.
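As a rough illustration of (21) and (22), the following sketch solves the Yule-Walker equations from a given autocorrelation sequence. The function name and the argument layout (`r[k]` holding the autocorrelation at lag k) are assumptions made for this example.

```python
import numpy as np

def yule_walker(r, order):
    """Return (AR coefficients, driving noise variance) from autocorrelation r.

    r[k] is the autocorrelation at lag k for k = 0, ..., order.
    """
    r = np.asarray(r, dtype=float)
    if order == 0:
        # AR(0): no coefficients, the process itself is the driving noise
        return np.zeros(0), r[0]
    lags = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    R = r[lags]                     # Toeplitz autocorrelation matrix, entries r(|j-k|)
    P = r[1:order + 1]              # cross-correlation vector [r(1), ..., r(order)]
    coeffs = np.linalg.solve(R, P)  # same result as (R)^{-1} P in (21)
    var = r[0] - coeffs @ P         # driving noise variance as in (22)
    return coeffs, var
```

For example, an AR(1) process with coefficient 0.8 and unit driving noise variance has autocorrelation `0.8**k / 0.36`, and the function recovers both parameters from lags 0 and 1. Using `np.linalg.solve` avoids forming the inverse explicitly, although for the low orders considered here either approach is inexpensive.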

As is well known, over a short period of time a speech signal can be regarded as stationary. Thus, we can divide the speech signal into consecutive frames and model the speech signal in each frame as an AR process. As in (21), the AR parameters of the speech can be obtained if its autocorrelation function can be estimated. This is easily achieved if we assume that the speech and noise signals are uncorrelated. Let $r_{ss}^i(\tau)$ and $r_{zz}^i(\tau)$ denote the autocorrelation functions of $s_i(n)$ and $z_i(n)$, respectively. Then,

$$\begin{aligned} r_{ss}^i(\tau) &= E\{s_i(n+\tau)s_i(n)\} \\ &= E\{[z_i(n+\tau) + v_i(n+\tau)][z_i(n) + v_i(n)]\} \\ &= E\{z_i(n+\tau)z_i(n)\} + E\{v_i(n+\tau)v_i(n)\} \\ &= r_{zz}^i(\tau) + r_{vv}^i(\tau). \end{aligned} \qquad (23)$$

Thus, the autocorrelation function of the speech signal can be obtained as

$$r_{zz}^i(\tau) = r_{ss}^i(\tau) - r_{vv}^i(\tau). \qquad (24)$$

Let the AR order of $z_i(n)$ be $p$, $\mathbf{z}_i(n) = [z_i(n), z_i(n-1), \ldots, z_i(n-p+1)]^T$, and

$$R_{zz}^i = E\{\mathbf{z}_i(n)\mathbf{z}_i^T(n)\}, \quad P_z^i = E\{z_i(n+1)\mathbf{z}_i(n)\}. \qquad (25)$$

Similar to (21), the AR parameters for the i-th subband signal, $\mathbf{a}^i = [a_1^i, a_2^i, \ldots, a_p^i]^T$, can be obtained by

$$\mathbf{a}^i = (R_{zz}^i)^{-1} P_z^i. \qquad (26)$$

The corresponding driving noise variance is then

$$\sigma_{w_i}^2 = r_{zz}^i(0) - \sum_{j=1}^{p} a_j^i\, r_{zz}^i(j). \qquad (27)$$
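The chain from the noisy-signal autocorrelation to the speech AR parameters can be sketched as below. This is a minimal sketch assuming the autocorrelation values of the noisy subband signal and of the subband noise are already available as arrays indexed by lag; the function name is hypothetical, and no safeguard is included for frames where the subtraction yields a non-positive zero-lag value.

```python
import numpy as np

def speech_ar_from_noisy(r_ss, r_vv, p):
    """Estimate speech AR coefficients and driving noise variance for one subband.

    r_ss[k], r_vv[k]: autocorrelation of the noisy signal and of the noise at
    lag k, for k = 0, ..., p. Speech and noise are assumed uncorrelated.
    """
    r_ss = np.asarray(r_ss, dtype=float)[:p + 1]
    r_vv = np.asarray(r_vv, dtype=float)[:p + 1]
    r_zz = r_ss - r_vv                      # speech autocorrelation by subtraction
    if p == 0:
        return np.zeros(0), r_zz[0]
    lags = np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    R = r_zz[lags]                          # p-by-p Toeplitz matrix of r_zz(|j-k|)
    P = r_zz[1:p + 1]                       # [r_zz(1), ..., r_zz(p)]
    a = np.linalg.solve(R, P)               # AR coefficients
    var_w = r_zz[0] - a @ P                 # driving noise variance
    return a, var_w
```

In practice the noise autocorrelation would have to be estimated separately, for instance from frames assumed to contain noise only; that choice is outside what the text specifies.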

As we can see, matrix inversions are involved in the parameter estimation. However, if the AR order is low, these operations can be carried out easily. As to the autocorrelation functions, we can simply take the time average to obtain their estimates. For example,

$$\hat{r}_{ss}^i(\tau) = \frac{1}{N}\sum_{m=0}^{N-\tau-1} s_i(m+\tau)\,s_i(m), \qquad (28)$$

where $N$ is the frame size and $m$ is the sequence index inside a particular frame.
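A time-average autocorrelation estimate over one frame might look like the following sketch. The normalization by the frame size N (rather than by the number of overlapping samples) and the non-overlapping framing are assumptions made here for illustration; the helper names are hypothetical.

```python
import numpy as np

def split_frames(x, frame_size):
    """Cut a subband signal into consecutive non-overlapping frames.

    Trailing samples that do not fill a whole frame are dropped.
    """
    x = np.asarray(x, dtype=float)
    n_frames = len(x) // frame_size
    return x[:n_frames * frame_size].reshape(n_frames, frame_size)

def frame_autocorr(frame, max_lag):
    """Time-average autocorrelation estimate for lags 0, ..., max_lag."""
    frame = np.asarray(frame, dtype=float)
    N = len(frame)
    return np.array([frame[tau:] @ frame[:N - tau] / N
                     for tau in range(max_lag + 1)])
```

For a frame `[1, 2, 3, 4]`, the lag-0 estimate is (1 + 4 + 9 + 16) / 4 = 7.5 and the lag-1 estimate is (2 + 6 + 12) / 4 = 5. Only lags up to the AR order are needed, so the cost per frame stays small for low-order models.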

IV. Simulations

In this section, we report simulation results demonstrating the effectiveness of the proposed algorithm. We designed a 5-band cosine-modulated filter bank with a prototype filter of length 20. Real speech uttered by a female speaker was used in the simulation. Both white and colored noise were considered. To understand the filtering characteristics, we used two types of colored noise, namely motorcycle and automobile noise. The input SNR was held at 5 dB, and the SNR improvement (dB) is used as the performance measure. The enhancement results are shown in Table 1. In the table, (i, j) denotes that the AR order of the subband speech is i and that of the subband noise is j. For simplicity, i and j are the same for all subbands. From the table, we find that for white and motorcycle noise, all model orders give similar results. However, for automobile noise, modeling the noise with a higher AR order gives significantly better results. If the total AR order is fixed, it is preferable to assign the higher order to the noise rather than to the speech. The power spectra of the colored noises are plotted in Fig. 2. From the figure, we find that the automobile noise is a narrowband signal while the motorcycle noise is a wideband one. Thus, a higher order is needed to model the automobile noise. This explains the results shown in Table 1.
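The performance measure can be computed as in the sketch below. The paper does not spell out its exact definition, so this assumes the common convention that SNR improvement is the output SNR minus the input SNR, both measured against the clean signal; the function names are hypothetical.

```python
import numpy as np

def snr_db(clean, test):
    """SNR in dB of `test` relative to `clean`, treating (test - clean) as noise."""
    clean = np.asarray(clean, dtype=float)
    test = np.asarray(test, dtype=float)
    noise_power = np.sum((test - clean) ** 2)
    return 10.0 * np.log10(np.sum(clean ** 2) / noise_power)

def snr_improvement_db(clean, noisy, enhanced):
    """Assumed definition: output SNR minus input SNR, in dB."""
    return snr_db(clean, enhanced) - snr_db(clean, noisy)
```

If the residual error after enhancement has one tenth the amplitude of the original noise, the improvement is 20 dB under this definition.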


Table 1: SNR improvement (dB) for each (speech, noise) AR order pair

(i, j)    White    Motorcycle    Automobile
(0,0)     5.39     5.81          3.53
(1,0)     5.50     5.82          3.43
(0,1)     5.40     5.81          5.70
(1,1)     5.49     5.84          6.98
(2,0)     5.38     5.64          2.94
(0,2)     5.40     5.82          7.51
(2,2)     4.45     5.57          9.64

V. Conclusions

Kalman filtering is an effective scheme for speech enhancement, particularly when it is applied in the subband domain. In this way, not only can the performance be enhanced, but the computational complexity can also be reduced. If the noise is wideband, a (0,0) model gives sufficiently good results and yields a filtering scheme with very low computational complexity. If the noise is narrowband, a higher-order model such as (2,2) can give much better performance, at the cost of increased computational complexity. Note that the complexity of the Kalman filter can be reduced using a so-called measurement difference method. Research in this direction is now underway.

References

[1] K. K. Paliwal and A. Basu, “A speech enhancement method based on Kalman filtering,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, Apr. 1987, pp. 177-180.

[2] J. D. Gibson, B. Koo, and S. D. Gray, “Filtering of colored noise for speech enhancement and coding,” IEEE Trans. Signal Processing, vol. 39, no. 8, pp. 1732-1741, Aug. 1991.

[3] B. Lee, K. Y. Lee, and S. Ann, “An EM-based approach for parameter enhancement with an application to speech signals,” Signal Processing, vol. 46, no. 1, pp. 1-14, Sep. 1995.

[4] M. Niedźwiecki and K. Cisowski, “Adaptive scheme for elimination of broadband noise and impulsive disturbance from AR and ARMA signals,” IEEE Trans. Signal Processing, vol. 44, no. 3, pp. 528-537, Mar. 1996.

[5] P. P. Vaidyanathan, Multirate Systems and Filter Banks, Englewood Cliffs, N.J.: Prentice-Hall, 1993.

[6] S. Haykin, Adaptive Filter Theory, Englewood Cliffs, N.J.: Prentice-Hall, 1991.

Figure 1: Subband speech enhancement system (M-channel analysis filter bank, decimators, Kalman filters, expanders, M-channel synthesis filter bank)

Figure 2: PSDs of the colored noises
