LMS IN PROMINENT SYSTEM SUBSPACE FOR FAST SYSTEM IDENTIFICATION
Rongshan Yu, Ying Song, and Susanto Rahardja
Institute for Infocomm Research, A*STAR, Singapore
Email: [ryu, sying, rsusanto]@i2r.a-star.edu.sg
ABSTRACT
In many system identification applications, the unknown system is
characterized by time-varying parameters. Therefore, fast on-line
identification is required in order to keep the system stable and
improve the control performance. In this paper, we show that the
dimensionality of system identification can be dramatically reduced
if the unknown system is sparse, in the sense that its parameter
set has a concise representation when expressed in a proper basis.
In such cases, the system identification can be effectively carried
out in a subspace of reduced dimension. Based on this theory, we
further propose two new least-mean-square (LMS) algorithms,
namely, prominent system subspace LMS (PSS-LMS) and enhanced
PSS-LMS (PSS-LMS+), to exploit this sparsity for fast system
identification. Finally, we conducted experiments to compare the
convergence performances of PSS-LMS, PSS-LMS+, and conventional
LMS using numerical simulation, and the results confirm the
superior performance of the proposed algorithms.
Index Terms— Adaptive filter, least-mean-square (LMS), sys-
tem identification, singular value decomposition.
1. INTRODUCTION
Widrow and Hoff’s least-mean-square (LMS) [1] algorithm and its
variations, such as normalized LMS (NLMS), have been widely
used in system identification due to their simplicity and robustness.
Despite their popularity, however, LMS algorithms in their sim-
plest forms suffer from several important limitations and drawbacks,
namely, unsatisfactory convergence rate and asymptotic perfor-
mance, in particular, for input with an autocovariance matrix that
has large eigenvalue spread. Historically, a significant amount of
research has been devoted to improve the convergence performance
of LMS algorithms. The self-orthogonalization LMS [2] and its low-
complexity variation, the transform-domain LMS [3], use a matrix
convergence factor derived from the inverse of the input autocovari-
ance matrix. It can be shown that the resulting matrix controlling
the convergence speed of self-orthogonalization LMS is the identity
matrix and hence the convergence behavior of LMS is improved.
The consequences of eigenvalue spread can be mitigated by using
a two-stage structure [4][5], whereby a time-domain pre-whitening
filter is used to decorrelate the input signal before LMS adaptation.
Alternatively, in the sparse-LMS algorithm [6], convergence behavior
is improved by constraining the LMS adaptation to only a few
large coefficients interspersed among many negligible ones.
In this paper, we study a different source of sparsity of the un-
known system, namely, sparse in the transform domain, to improve
the convergence behavior of LMS. A typical example of such a
system is an ANC headset [7], in which the electro-acoustic path
from the loudspeaker to the error microphone is relatively fixed due
to form factor constraints of the headset. Therefore, it may be best modeled
by a “nominal” impulse response determined by the form factor of
the headset plus small perturbations resulting from variations in the
manufacturing process, biological features of users, and wearing po-
sition. It can be shown that such a system can be effectively modeled
using a subspace of reduced dimensionality, in which the unknown
system parameters have significant variances. Based on this theory,
we further introduce two new LMS algorithms, namely, PSS-LMS
and PSS-LMS+ to exploit this sparsity and thus improve their system
identification performance. In PSS-LMS, LMS adaptation is con-
strained only in the prominent system subspace to reduce the dimen-
sionality of the problem for faster convergence; while in PSS-LMS+,
the standard LMS algorithm is modified so that a larger adaptation
factor can be used in the prominent system subspace.
The PSS-LMS algorithms introduced in this paper are conceptu-
ally different than reduced-rank adaptive filtering technologies based
on either eigen-decomposition [8][9] or the multistage Wiener
filter [10]. In those prior works, the adaptive filtering operation is
performed on a subspace spanned by either a subset of the eigenvectors
of the covariance matrix of the observed input data, or a Krylov
subspace obtained from a successive orthonormalization process. The
focus was thus to identify a subspace such that the mean-square error
(MSE) of the output process is minimized, and no prior assumption
is made regarding the distribution of the unknown system
parameters. In contrast, PSS-LMS and PSS-LMS+ are developed
based on the assumption that the parameters of the unknown
system are drawn from a non-white process, and the adaptive filter-
ing operation is then performed in a subspace spanned by eigenvec-
tors of the system parameter covariance matrix. The proposed al-
gorithms improve the convergence behavior of LMS even when the
input process is white, which is impossible for traditional fast LMS
methods that rely on correlations of the observed signal.
2. PROMINENT SUBSPACE LMS
2.1. Conventional LMS for System Identification
Given an unknown system h characterized by an Lth-order discrete-
time finite impulse response (FIR) filter and an observed signal x(n),
the output of the system is assumed to be further corrupted by an
additive disturbance u(n), independent of x(n). The desired output
of the unknown system is thus given by:

d(n) = h^T x(n) + u(n), (1)

where h ≜ [h_1, · · · , h_L]^T, x(n) ≜ [x(n), · · · , x(n − L + 1)]^T,
and superscript T denotes transpose. The goal herein is to find a
coefficient vector w = [w1, · · · , wL]T such that the MSE defined
as:
ǫ ≜ E{[e(n)]^2} = E{[d(n) − w^T x(n)]^2}, (2)
is minimized, where E[·] denotes the statistical expectation oper-
ation. The optimal weight vector wopt can be obtained from the
2012 IEEE Statistical Signal Processing Workshop (SSP)
978-1-4673-0183-1/12/$31.00 ©2012 IEEE 209
following Winner-Hopf equation:
wopt = R−1x px (3)
where Rx∆= E[x(n)xT (n)] and px
∆= E[d(n)x(n)]. It can be
seen that wopt = h and the minimum MSE is given by:
ǫmin = E[d(n)2]− pTxR
−1x px. (4)
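As a numerical sanity check of (3), the sketch below estimates R_x and p_x by sample averages and verifies that the Wiener-Hopf solution recovers the unknown FIR system. All values (the system h, signal length, noise level) are illustrative assumptions, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical unknown FIR system of order L (illustrative values).
L = 8
h = rng.standard_normal(L)

# White input and a noisy observation of the system output, as in (1).
n_samples = 50_000
x = rng.standard_normal(n_samples)
d = np.convolve(x, h)[:n_samples] + 0.01 * rng.standard_normal(n_samples)

# Delay-line matrix whose columns are x(n) = [x(n), ..., x(n-L+1)]^T;
# sample estimates of R_x = E[x(n)x^T(n)] and p_x = E[d(n)x(n)].
X = np.stack([np.concatenate([np.zeros(k), x[:n_samples - k]]) for k in range(L)])
R_x = X @ X.T / n_samples
p_x = X @ d / n_samples

# Wiener-Hopf solution (3); for this identification setup w_opt = h.
w_opt = np.linalg.solve(R_x, p_x)
assert np.max(np.abs(w_opt - h)) < 0.1
```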
In order to adaptively find wopt in (3), the standard LMS algo-
rithm is implemented as a steepest descent algorithm as follows:
w(n+ 1) = w(n) + µe(n)x(n) (5)
where µ is the step size controlling the convergence speed.
Defining ν(n) = w(n) − wopt as the weight-error vector, the
expected modeling error is given by [11] :
J = E[‖ν(n)‖^2] = µLǫ_min/2. (6)
On the other hand, the time constants of the LMS algorithm are given
by:
τ_i = 1/(4µλ_i), 1 ≤ i ≤ L, (7)
where {λ_i}_{i=1}^{L} are the eigenvalues of R_x. Therefore, a better
trade-off between convergence speed and modeling accuracy can be
achieved by either reducing the eigenvalue spread of the observed
data, which is done in the usual transform-domain LMS algorithms,
or by reducing the dimensionality of the system identification prob-
lem.
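The conventional LMS recursion (5) can be sketched as a minimal identification loop. The setup below (white input, an arbitrary 8-tap system, µ = 0.01, disturbance standard deviation 0.01) is an illustrative assumption, not a configuration from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative setup (values assumed, not from the paper).
L = 8
h = rng.standard_normal(L)     # unknown system
mu = 0.01                      # step size µ
sigma_u = 0.01                 # additive disturbance level

w = np.zeros(L)                # adaptive weight vector w(n)
xbuf = np.zeros(L)             # delay line [x(n), ..., x(n-L+1)]^T
for n in range(20_000):
    xbuf = np.roll(xbuf, 1)
    xbuf[0] = rng.standard_normal()                   # white input sample
    d = h @ xbuf + sigma_u * rng.standard_normal()    # desired output (1)
    e = d - w @ xbuf                                  # error signal e(n)
    w = w + mu * e * xbuf                             # LMS update (5)

assert np.linalg.norm(w - h) < 0.05
```

With white input, all eigenvalues of R_x are equal, so the time constants (7) coincide; the slow-convergence problem the paper targets appears when the input is colored and the eigenvalue spread is large.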
2.2. Prominent Subspace of a Sparse System
We consider an unknown, variable system h in the Euclidean space
R^L. Eigendecomposition of the system covariance matrix R_h ≜
E[h h^T] leads to

R_h = V Λ_h V^T, (8)

where V ≜ [v_1, . . . , v_L] denotes an orthonormal matrix of eigenvectors
{v_i ∈ R^L}, and Λ_h ≜ diag[λ_h(1), . . . , λ_h(L)] denotes
a diagonal matrix of eigenvalues of R_h. Due to possible electronic
and physical constraints existing in the unknown system, h may not
be uniformly distributed in R^L. As a result, the eigenvalues of the
system covariance matrix are in general not equal. Without loss of
generality, we assume the eigenvalues in (8) are sorted in descending
order, i.e., λ_h(1) ≥ λ_h(2) ≥ . . . ≥ λ_h(L).
The unknown system h can be expanded in the orthonormal basis
{v_i}_{i=1}^{L} as:

h = Vc, (9)

where c is the projection of the coefficient vector h onto the row
space of V, i.e., c ≜ V^T h. Denoting V_N ≜ [v_1, . . . , v_N], the
least-square (LS) model ĥ of h is simply its projection onto the
span of V_N:

ĥ = V_N V_N^T h. (10)

The average modeling error J with respect to the distribution of the
unknown system is thus the variance of h in the orthogonal complement
of span V_N, which is given by:

J = E[‖h − ĥ‖^2] = tr(V̄_N^T R_h V̄_N) = Σ_{i=N+1}^{L} λ_h(i), (11)

where V̄_N ≜ [v_{N+1}, . . . , v_L]. Clearly, for systems that are sparse
in the sense that λ_h(i) ≈ 0, N < i ≤ L, the system identification
can be effectively performed in span V_N without introducing
significant modeling error. We call the subspace Ψ ≜ span V_N the
prominent subspace of the unknown system h.
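The prominent subspace can be computed directly from an ensemble of system realizations. The sketch below uses a synthetic ensemble (a fixed nominal response plus small random perturbations, loosely mimicking the headset example; all values assumed), estimates R_h, eigendecomposes it as in (8), and evaluates the residual modeling error (11):

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic ensemble of system realizations (assumed, for illustration):
# a fixed "nominal" impulse response plus small random perturbations.
L, M = 64, 640
h_nominal = rng.standard_normal(L)
H = h_nominal[:, None] + 0.05 * rng.standard_normal((L, M))

# Sample estimate of R_h = E[h h^T] and its eigendecomposition (8).
R_h = H @ H.T / M
eigvals, V = np.linalg.eigh(R_h)
order = np.argsort(eigvals)[::-1]        # sort eigenvalues in descending order
eigvals, V = eigvals[order], V[:, order]

# Prominent subspace spanned by the first N eigenvectors, and the
# residual modeling error of the LS projection, as in (11).
N = 1
V_N = V[:, :N]
J_residual = eigvals[N:].sum()
assert J_residual / eigvals.sum() < 0.05     # most variance lies in span V_N
```

Note that R_h here is the second-moment matrix E[h h^T], as in (8); a dominant nominal response therefore shows up as a single large eigenvalue.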
2.3. LMS Adaptation in Prominent System Subspace
We now turn our attention to the problem of performing the system
identification task in the prominent system subspace, given the input
x(n) and the observed output signal d(n). The identification is achieved
by adjusting the weight coefficients {w_i}_{i=1}^{N} of the orthonormal
basis {v_i}_{i=1}^{N} such that the MSE of the error signal e(n) is minimized.
Denoting w̄(n) ≜ [w_1(n), . . . , w_N(n)]^T and x̄(n) ≜ V_N^T x(n),
the error signal e(n) is given by:

e(n) = d(n) − x̄^T(n) w̄(n). (12)
The Wiener solution that minimizes the MSE of e(n) is thus:

w̄_opt = R_NN^{-1} p_NN, (13)

where R_NN and p_NN are, respectively, the covariance matrix and
the cross-correlation vector of x̄(n), which are defined as

R_NN ≜ E[x̄(n) x̄^T(n)], (14)

and

p_NN ≜ E[d(n) x̄(n)]. (15)
The Wiener MSE solution w̄_opt is the projection of the unknown
system onto the prominent system subspace Ψ with respect to the
error signal e(n). Therefore, it is in general different from the LS
model of h in Ψ, which is given by c ≜ V_N^T h, due to the coupling
of the input signal between the prominent system subspace Ψ and its
orthogonal complement. More specifically, we note that

d(n) = x^T(n) h + u(n) = x̄^T(n) c + x̃^T(n) c̃ + u(n), (16)

where x̃(n) and c̃ are, respectively, the projections of the input signal
x(n) and the system coefficient vector h onto the row space of V̄_N, i.e.,
x̃(n) ≜ V̄_N^T x(n) and c̃ ≜ V̄_N^T h. Substituting (16) into (15) and
noting that E[u(n) x̄(n)] = 0, we have

p_NN = R_NN c + R_NN̄ c̃, (17)

where R_NN̄ ≜ E[x̄(n) x̃^T(n)] is the cross-correlation matrix of
x̄(n) and x̃(n). It now follows from (13) that

w̄_opt = c + R_NN^{-1} R_NN̄ c̃. (18)

When the unknown system is N-sparse, we have w̄_opt ≈ c since
c̃ ≈ 0, and the MSE of the Wiener solution ǭ_min in this case will be
approximated by that of the LS estimate:

ǭ_min ≈ E{[d(n) − x̄^T(n) c]^2} = σ_u^2 + c̃^T R_N̄N̄ c̃, (19)

which is close to the minimum MSE of the time-domain Wiener filter,
ǫ_min = σ_u^2, when c̃ ≈ 0. Here R_N̄N̄ ≜ E[x̃(n) x̃^T(n)].
Similar to the conventional LMS, the resulting prominent system
subspace LMS (PSS-LMS) is given as follows:

w̄(n + 1) = w̄(n) + µ x̄(n) e(n), (20)

where µ is the step size of the LMS adaptation. Alternatively, by
noting that the time-domain system model is given by w(n) =
V_N w̄(n), the PSS-LMS algorithm in (20) can be equivalently
expressed in the time domain as follows:

w(n + 1) = w(n) + µ V_N V_N^T x(n) e(n). (21)
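The PSS-LMS recursion (20) can be sketched as follows, with the time-domain model recovered via w(n) = V_N w̄(n). The basis, subspace dimension, step size, and noise level below are illustrative assumptions (a random orthonormal V, with the system constructed to lie in span V_N), not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed setup: random orthonormal basis V of R^L; the unknown system
# is constructed to lie in the span of the first N columns (so c-tilde = 0).
L, N = 32, 4
V, _ = np.linalg.qr(rng.standard_normal((L, L)))
V_N = V[:, :N]
h = V_N @ rng.standard_normal(N)

mu = 0.05
w_bar = np.zeros(N)                  # subspace weights
xbuf = np.zeros(L)                   # delay line for x(n)
for n in range(5_000):
    xbuf = np.roll(xbuf, 1)
    xbuf[0] = rng.standard_normal()
    d = h @ xbuf + 0.01 * rng.standard_normal()
    x_bar = V_N.T @ xbuf             # projected input, as in (12)
    e = d - w_bar @ x_bar            # subspace error signal (12)
    w_bar = w_bar + mu * e * x_bar   # PSS-LMS update (20)

w = V_N @ w_bar                      # time-domain model w(n) = V_N w_bar(n)
assert np.linalg.norm(w - h) < 0.05
```

Only N coefficients are adapted, so the effective dimensionality of the identification problem drops from L to N.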
It can be shown that the expected modeling error of PSS-LMS
when applied to system identification is given as follows:

J = E[‖w(n) − h‖^2] = J̄ + J_∆(n), (22)

where

J̄ = E[‖h − V_N w̄_opt‖^2] (23)

is the modeling bias due to the projection onto the prominent system
subspace, and

J_∆(n) = E[‖V_N ν̄(n)‖^2] (24)

is the modeling error due to the gradient noise of the LMS algorithm,
where ν̄(n) = w̄(n) − w̄_opt is the weight-error vector. Substituting
(18) into (23) shows that J̄ is given by:

J̄ = E[‖(V̄_N − V_N R_NN^{-1} R_NN̄) c̃‖^2] = tr[B^T B R_c̃], (25)

where B = V̄_N − V_N R_NN^{-1} R_NN̄, and R_c̃ = E[c̃ c̃^T] is the
covariance of c̃. Furthermore, if the observed data vector is white, it
can be further shown that B^T B = I, and thus we have

J̄ = Σ_{i=N+1}^{L} λ_h(i). (26)

On the other hand, denoting R_ν̄(n) = E[ν̄(n) ν̄^T(n)], it now follows
from standard LMS analysis that, under mild conditions, J_∆(n)
at steady state is given by:

J_∆ = tr[R_ν̄(n)] = µN ǭ_min/2, (27)

which can be further simplified to J_∆ = µNσ_u^2/2 if the observed
data vector is white.
2.4. LMS with Fast Adaptation in Prominent System Subspace
One limitation of PSS-LMS is that it only converges to the projection
of the system transfer function onto the prominent system
subspace, which may not be desirable for identification applications
that are sensitive to modeling error. To overcome this limitation, the
PSS-LMS algorithm can be further modified by including null-space
gradient descent vectors in the LMS adaptation, which leads to the
following enhanced PSS-LMS (PSS-LMS+):

w(n + 1) = w(n) + [µ V_N V_N^T + µ̄ (I − V_N V_N^T)] x(n) e(n), (28)

where µ̄ > 0 controls the convergence of the weight vector in the null
space, and is normally smaller than µ to maintain the steady-state
performance of LMS for systems with large dimensionality.
The LMS adaptation algorithm in (28) can be equivalently written
as

w(n + 1) = w(n) + µ̄ V Λ_κ V^T x(n) e(n), (29)

where Λ_κ = diag[κ, . . . , κ, 1, . . . , 1], with κ = µ/µ̄ occupying the
first N diagonal entries. It can be seen that (29) further simplifies to
the standard LMS adaptation as follows:

w′(n + 1) = w′(n) + µ̄ x′(n) e(n), (30)

where

w′(n) = Λ_κ^{-1/2} V^T w(n), (31)

and

x′(n) = Λ_κ^{1/2} V^T x(n). (32)
Based on the standard LMS formulation (30), it can be seen that
w′(n) will, in general, converge to Λ_κ^{-1/2} V^T w_opt in the mean
sense. Therefore, the PSS-LMS+ algorithm will converge to the full-
dimension Wiener solution w_opt, which is desirable.

Denote the weight-error vector of PSS-LMS+ as ν(n) =
w(n) − w_opt, and the weight-error vector in the transform domain
as ν′(n) = w′(n) − Λ_κ^{-1/2} V^T w_opt. It is straightforward to
show that ν(n) = V Λ_κ^{1/2} ν′(n), and hence the expected modeling
error can be calculated as follows:

J = E[‖ν(n)‖^2] = E[‖V Λ_κ^{1/2} ν′(n)‖^2] = tr[Λ_κ R_ν′(n)], (33)

where R_ν′(n) = E[ν′(n) ν′^T(n)]. It can be shown that in steady state
we have

R_ν′(n) = µ̄ ǫ_min I/2. (34)

Therefore, the average modeling error of PSS-LMS+ is given by:

J = µ̄ ǫ_min tr[Λ_κ]/2 = [(L − N)µ̄ + Nµ] ǫ_min/2. (35)

Clearly, for a sparse system with N ≪ L, it is possible to use a large
adaptation factor µ in the prominent system subspace to speed up
the convergence of the PSS-LMS+ algorithm without significantly
degrading the modeling accuracy, which is not possible if the standard
LMS algorithm is used.
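The PSS-LMS+ update (28) amounts to a matrix step size: a large µ applied to the projector onto the prominent subspace and a small µ̄ applied to its complement. A sketch under an assumed setup similar to the PSS-LMS example (random orthonormal basis, illustrative step sizes, a system mostly but not entirely inside the prominent subspace):

```python
import numpy as np

rng = np.random.default_rng(4)

# Assumed setup (illustrative): the system has a small component outside
# the prominent subspace, so full-rank convergence matters.
L, N = 32, 4
V, _ = np.linalg.qr(rng.standard_normal((L, L)))
V_N = V[:, :N]
P = V_N @ V_N.T                        # projector onto the prominent subspace
h = V_N @ rng.standard_normal(N) + 0.01 * rng.standard_normal(L)

mu, mu_bar = 0.05, 0.005               # large step in-subspace, small in null space
G = mu * P + mu_bar * (np.eye(L) - P)  # matrix step size of (28)

w = np.zeros(L)
xbuf = np.zeros(L)
for n in range(30_000):
    xbuf = np.roll(xbuf, 1)
    xbuf[0] = rng.standard_normal()
    d = h @ xbuf + 0.01 * rng.standard_normal()
    e = d - w @ xbuf
    w = w + e * (G @ xbuf)             # PSS-LMS+ update (28)

assert np.linalg.norm(w - h) < 0.05    # converges to the full-dimension solution
```

The in-subspace component converges with time constant 1/(4µ) while the null-space component converges more slowly with 1/(4µ̄), matching the two-speed behavior described above.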
3. EXPERIMENT RESULTS
In this section, we present experimental results to verify the theory
developed in previous sections. To this end, we compare the perfor-
mances of different LMS algorithms when they are applied to iden-
tify the system transfer function of the electro-acoustic path between
the loudspeaker and microphone of an experimental ANC headset.
In our experiment, we first collected M = 640 sets of FIR filter
coefficients h_m, m = 1, . . . , M, from the electro-acoustic path of the
ANC headset when it was worn by different human subjects at nor-
mal wearing positions and positions that are slightly deviated from
the normal wearing position. The measurements were conducted in
a quiet listening room. The sampling rate is 8 kHz and the length
of the FIR filter is 64. The eigenvalues of the sample covariance
matrix of the observed FIR coefficients are illustrated in Fig. 1,
which clearly demonstrates that the system under investigation is
highly sparse in the transform domain.
We further investigate the relationship between the modeling er-
ror of the PSS-LMS algorithm and the dimension of the prominent
system subspace. The results are given in Fig. 2. To obtain these
results, we assume that both the observed signal x(n) and the noise
signal u(n) are white, with variances σ_x^2 = 1 and σ_u^2 = 0.01,
respectively. It can be observed that, for step size µ = 0.1984,
the average modeling error of PSS-LMS decreases with increasing
PSS dimension N, reaching its minimum at N = 7. Increasing the PSS
dimension beyond that point, the modeling error increases due to the
presence of gradient noise. It can also be observed that a smaller µ
in general leads to a smaller average modeling error and moves the
minimum modeling error point to a higher PSS dimension, since the
magnitude of the gradient noise is reduced for smaller µ.
We now compare the convergence performance of PSS-LMS,
PSS-LMS+, and conventional LMS algorithms. In our experiments,
the unknown system was constructed by cascading two randomly
chosen FIR filters from the M observations. Fig. 3 shows the average
modeling error vs. the number of iterations for the different
algorithms. The LMS step sizes for LMS and PSS-LMS are µ =
0.003 and µ = 0.066, respectively. For PSS-LMS+, two sets of
step sizes, namely, {µ = 0.066, µ̄ = 0.003} and {µ = 0.034,
µ̄ = 0.0016}, are used to generate the curves PSS-LMS+(1)
and PSS-LMS+(2), respectively. The dimension of the prominent
system subspace is set to N = 3. It can be seen that PSS-LMS
converges much faster than LMS, taking only about 100 iterations
Fig. 1. Eigenvalues in descending order.
Fig. 2. PSS-LMS average modeling error [dB] vs. dimension, for step
sizes µ = 0.1984, 0.0992, 0.0661, 0.0496, and 0.0397.
to reach steady state, while the LMS algorithm takes about 1500
iterations. Unfortunately, PSS-LMS has a larger modeling error at
steady state. The PSS-LMS+ algorithm, using the same step sizes,
has initial convergence behavior similar to PSS-LMS, which is much
faster than LMS, and further converges to a full-rank solution with
lower modeling error after the initial convergence stage (PSS-LMS+(1)).
Furthermore, by adjusting the step sizes, the same modeling error
performance as that of the full-rank LMS algorithm can be achieved
(PSS-LMS+(2)).
4. CONCLUSIONS
In this paper, we have proposed two new adaptive algorithms,
namely, PSS-LMS and PSS-LMS+, for fast identification of systems
that are sparse in the transform domain. PSS-LMS constrains
the LMS adaptation to the prominent subspace of the unknown
system, thus effectively improving the convergence speed of the LMS
algorithm at the cost of a larger modeling error in steady state. On the
other hand, the PSS-LMS+ algorithm combines the virtues of both
PSS-LMS and LMS by using a large LMS adaptation step only in the
prominent system subspace, thus improving the convergence speed
of the LMS algorithm without introducing significant modeling
error. The merits of PSS-LMS and PSS-LMS+ over conventional
LMS are confirmed in our experiments using examples from electro-
acoustic path modeling for an ANC headset.
Fig. 3. Comparison of identification performance: average modeling
error [dB] vs. iterations for LMS, PSS-LMS, PSS-LMS+(1), and
PSS-LMS+(2), using the 3 largest-eigenvector filters.
5. REFERENCES
[1] B. Widrow, “Adaptive filters,” in Aspects of Network and Sys-
tem Theory, R. E. Kalman and N. DeClaris, Eds. New York:
Holt, Rinehart and Winston, 1970, pp. 563–587.
[2] R. Gitlin and J. F. Magee, “Self-orthogonalizing adaptive
equalization algorithms,” IEEE Trans. Commun., vol. 25, no. 7,
pp. 666–672, Jul. 1977.
[3] S. S. Narayan and A. M. Peterson, “Frequency domain least-
mean-square algorithm,” Proc. IEEE, vol. 69, no. 1, pp. 124–
126, Jan. 1981.
[4] M. Mboup, M. Bonnet, and N. Bershad, “LMS coupled adap-
tive prediction and system identification: a statistical model
and transient mean analysis,” IEEE Trans. Signal Processing,
vol. 42, no. 10, pp. 2607–2615, Oct 1994.
[5] R. Yu and C. C. Ko, “Lossless compression of digital audio us-
ing cascaded RLS-LMS prediction,” IEEE Trans. Speech and
Audio Processing, vol. 11, no. 6, pp. 532–537, Nov. 2003.
[6] Y. Chen, Y. Gu, and A. Hero, “Sparse LMS for system iden-
tification,” in IEEE International Conference on Acoustics,
Speech and Signal Processing (ICASSP) 2009, April 2009.
[7] Y. Song, Y. Gong, and S. M. Kuo, “A robust hybrid feedback
active noise cancellation headset,” IEEE Trans. Speech and Au-
dio Processing, vol. 13, no. 4, pp. 607–617, July 2005.
[8] W. Gabriel, “Using spectral estimation techniques in adaptive
processing antenna systems,” IEEE Trans. Antennas and Prop-
agation, vol. 34, no. 3, pp. 291–300, Mar. 1986.
[9] L. Scharf and D. Tufts, “Rank reduction for modeling station-
ary signals,” IEEE Trans. Acoustics, Speech and Signal Pro-
cessing, vol. 35, no. 3, pp. 350–355, Mar. 1987.
[10] J. S. Goldstein, I. S. Reed, and L. L. Scharf, “A multistage
representation of the Wiener filter based on orthogonal pro-
jections,” IEEE Trans. Information Theory, vol. 44, no. 7, pp.
2943–2959, Nov. 1998.
[11] B. Widrow, J. M. McCool, M. G. Larimore, and C. R. John-
son, “Stationary and nonstationary learning characteristics of
the LMS adaptive filter,” Proc. IEEE, vol. 64, no. 8, pp. 1151–
1162, Aug. 1976.