LMS IN PROMINENT SYSTEM SUBSPACE FOR FAST SYSTEM IDENTIFICATION
Rongshan Yu, Ying Song, and Susanto Rahardja
Institute for Infocomm Research, A*STAR, Singapore
Email: [ryu, sying, rsusanto]@i2r.a-star.edu.sg
ABSTRACT
In many system identification applications, the unknown system is
characterized by time-varying parameters. Therefore, fast on-line
identification is required in order to keep the system stable and
improve the control performance. In this paper, we show that the
dimensionality of system identification can be dramatically reduced
if the unknown system is sparse, in the sense that its parameter
set has a concise representation when expressed in a proper basis.
In such cases, the system identification can be effectively carried
out in a subspace of reduced dimension. Based on this theory, we
further propose two new least-mean-square (LMS) algorithms,
namely, prominent system subspace LMS (PSS-LMS) and enhanced
PSS-LMS (PSS-LMS+), to exploit this sparsity for fast system
identification. Finally, we conducted experiments to compare the
convergence performances of PSS-LMS, PSS-LMS+, and conventional
LMS using numerical simulation, and the results confirm the
superior performance of the proposed algorithms.
Index Terms— Adaptive filter, least-mean-square (LMS), sys-
tem identification, singular value decomposition.
1. INTRODUCTION
Widrow and Hoff’s least-mean-square (LMS) [1] algorithm and its
variations, such as normalized LMS (NLMS), have been widely
used in system identification due to their simplicity and robustness.
Despite their popularity, however, LMS algorithms in their sim-
plest forms suffer from several important limitations and drawbacks,
namely, unsatisfactory convergence rate and asymptotic perfor-
mance, in particular, for input with an autocovariance matrix that
has large eigenvalue spread. Historically, a significant amount of
research has been devoted to improve the convergence performance
of LMS algorithms. The self-orthogonalization LMS [2] and its low-
complexity variation, the transform-domain LMS [3], use a matrix
convergence factor derived from the inverse of the input autocovari-
ance matrix. It can be shown that the resulting matrix controlling
the convergence speed of self-orthogonalization LMS is the identity
matrix and hence the convergence behavior of LMS is improved.
The consequences of eigenvalue spread can be mitigated by using
a two-stage structure [4][5], whereby a time-domain pre-whitening
filter is used to decorrelate the input signal before LMS adaptation.
Alternatively, in the sparse-LMS algorithm [6], convergence behavior
is improved by constraining the LMS adaptation to only a few
large coefficients interspersed among many negligible ones.
In this paper, we study a different source of sparsity of the un-
known system, namely, sparse in the transform domain, to improve
the convergence behavior of LMS. A typical example of such a
system is an ANC headset [7], in which the electro-acoustic path
from the loudspeaker to the error microphone is relatively fixed due
to form factor constraints of the headset. Therefore, it may be best modeled
by a “nominal” impulse response determined by the form factor of
the headset plus small perturbations resulting from variations in the
manufacturing process, biological features of users, and wearing po-
sition. It can be shown that such a system can be effectively modeled
using a subspace of reduced dimensionality, in which the unknown
system parameters have significant variances. Based on this theory,
we further introduce two new LMS algorithms, namely, PSS-LMS
and PSS-LMS+ to exploit this sparsity and thus improve their system
identification performance. In PSS-LMS, LMS adaptation is con-
strained only in the prominent system subspace to reduce the dimen-
sionality of the problem for faster convergence; while in PSS-LMS+,
the standard LMS algorithm is modified so that a larger adaptation
factor can be used in the prominent system subspace.
The PSS-LMS algorithms introduced in this paper are conceptu-
ally different than reduced-rank adaptive filtering technologies based
on either eigen-decomposition [8][9] or the multistage Wiener
filter [10]. In those prior works, the adaptive filtering operation is
performed on a subspace spanned by either a subset of the eigenvectors
of the covariance matrix of the observed input data, or a Krylov
subspace obtained from a successive orthonormalization process. The
focus was thus to identify a subspace such that the mean-square error
(MSE) of the output process is minimized, and no prior assumption
is made regarding the distribution of the unknown system
parameters. In contrast, PSS-LMS and PSS-LMS+ are developed
based on the assumption that the parameters of the unknown
system are drawn from a non-white process, and the adaptive filter-
ing operation is then performed in a subspace spanned by eigenvec-
tors of the system parameter covariance matrix. The proposed al-
gorithms improve the convergence behavior of LMS even when the
input process is white, which is impossible for traditional fast LMS
methods that rely on correlations of the observed signal.
2. PROMINENT SUBSPACE LMS
2.1. Conventional LMS for System Identification
Given an unknown system h characterized by an Lth-order discrete-
time finite impulse response (FIR) filter and an observed signal x(n),
the output of the system is assumed to be further corrupted by an
additive disturbance u(n), independent of x(n). The desired output
of the unknown system is thus given by:

d(n) = h^T x(n) + u(n), (1)

where h ≜ [h_1, · · · , h_L]^T, x(n) ≜ [x(n), · · · , x(n − L + 1)]^T,
and superscript T denotes transpose. The goal herein is to find a
coefficient vector w = [w1, · · · , wL]T such that the MSE defined
as:
ǫ ≜ E{[e(n)]^2} = E{[d(n) − w^T x(n)]^2}, (2)
is minimized, where E[·] denotes the statistical expectation oper-
ation. The optimal weight vector wopt can be obtained from the
2012 IEEE Statistical Signal Processing Workshop (SSP)
978-1-4673-0183-1/12/$31.00 ©2012 IEEE 209
following Winner-Hopf equation:
wopt = R−1x px (3)
where Rx∆= E[x(n)xT (n)] and px
∆= E[d(n)x(n)]. It can be
seen that wopt = h and the minimum MSE is given by:
ǫmin = E[d(n)2]− pTxR
−1x px. (4)
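As a numerical sanity check of (3), the sketch below estimates R_x and p_x by sample averages and verifies that the Wiener-Hopf solution recovers the unknown FIR system. All values (the system h, signal length, noise level) are illustrative assumptions, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical unknown FIR system of order L (illustrative values).
L = 8
h = rng.standard_normal(L)

# White input and a noisy observation of the system output, as in (1).
n_samples = 50_000
x = rng.standard_normal(n_samples)
d = np.convolve(x, h)[:n_samples] + 0.01 * rng.standard_normal(n_samples)

# Delay-line matrix whose columns are x(n) = [x(n), ..., x(n-L+1)]^T;
# sample estimates of R_x = E[x(n)x^T(n)] and p_x = E[d(n)x(n)].
X = np.stack([np.concatenate([np.zeros(k), x[:n_samples - k]]) for k in range(L)])
R_x = X @ X.T / n_samples
p_x = X @ d / n_samples

# Wiener-Hopf solution (3); for this identification setup w_opt = h.
w_opt = np.linalg.solve(R_x, p_x)
assert np.max(np.abs(w_opt - h)) < 0.1
```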
In order to adaptively find wopt in (3), the standard LMS algo-
rithm is implemented as a steepest descent algorithm as follows:
w(n+ 1) = w(n) + µe(n)x(n) (5)
where µ is the step size controlling the convergence speed.
Defining ν(n) = w(n) − wopt as the weight-error vector, the
expected modeling error is given by [11] :
J = E[‖ν(n)‖^2] = µLǫ_min/2. (6)
On the other hand, the time constants of the LMS algorithm are given
by:
τ_i = 1/(4µλ_i), 1 ≤ i ≤ L, (7)
where {λ_i}_{i=1}^{L} are the eigenvalues of R_x. Therefore, a better
trade-off between convergence speed and modeling accuracy can be
achieved by either reducing the eigenvalue spread of the observed
data, which is done in the usual transform-domain LMS algorithms,
or by reducing the dimensionality of the system identification prob-
lem.
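The conventional LMS recursion (5) can be sketched as a minimal identification loop. The setup below (white input, an arbitrary 8-tap system, µ = 0.01, disturbance standard deviation 0.01) is an illustrative assumption, not a configuration from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative setup (values assumed, not from the paper).
L = 8
h = rng.standard_normal(L)     # unknown system
mu = 0.01                      # step size µ
sigma_u = 0.01                 # additive disturbance level

w = np.zeros(L)                # adaptive weight vector w(n)
xbuf = np.zeros(L)             # delay line [x(n), ..., x(n-L+1)]^T
for n in range(20_000):
    xbuf = np.roll(xbuf, 1)
    xbuf[0] = rng.standard_normal()                   # white input sample
    d = h @ xbuf + sigma_u * rng.standard_normal()    # desired output (1)
    e = d - w @ xbuf                                  # error signal e(n)
    w = w + mu * e * xbuf                             # LMS update (5)

assert np.linalg.norm(w - h) < 0.05
```

With white input, all eigenvalues of R_x are equal, so the time constants (7) coincide; the slow-convergence problem the paper targets appears when the input is colored and the eigenvalue spread is large.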
2.2. Prominent Subspace of a Sparse System
We consider an unknown, variable system h in the Euclidean space
R^L. Eigendecomposition of the system covariance matrix R_h ≜
E[h h^T] leads to

R_h = V Λ_h V^T, (8)

where V ≜ [v_1, . . . , v_L] denotes an orthonormal matrix of eigenvectors
{v_i ∈ R^L}, and Λ_h ≜ diag[λ_h(1), . . . , λ_h(L)] denotes
a diagonal matrix of eigenvalues of R_h. Due to possible electronic
and physical constraints existing in the unknown system, h may not
be uniformly distributed in R^L. As a result, the eigenvalues of the
system covariance matrix are in general not equal. Without loss of
generality, we assume the eigenvalues in (8) are sorted in descending
order, i.e., λ_h(1) ≥ λ_h(2) ≥ . . . ≥ λ_h(L).
The unknown system h can be expanded in the orthonormal basis
{v_i}_{i=1}^{L} as:

h = Vc, (9)

where c is the projection of the coefficient vector h onto the row
space of V, i.e., c ≜ V^T h. Denoting V_N ≜ [v_1, . . . , v_N], the
least-square (LS) model ĥ of h is simply its projection onto the
span of V_N:

ĥ = V_N V_N^T h. (10)

The average modeling error J with respect to the distribution of the
unknown system is thus the variance of h in the orthogonal complement
of span V_N, which is given by:

J = E[‖h − ĥ‖^2] = tr(V̄_N^T R_h V̄_N) = Σ_{i=N+1}^{L} λ_h(i), (11)

where V̄_N ≜ [v_{N+1}, . . . , v_L]. Clearly, for systems that are sparse
in the sense that λ_h(i) ≈ 0, N < i ≤ L, the system identification
can be effectively performed in span V_N without introducing
significant modeling error. We call the subspace Ψ ≜ span V_N the
prominent subspace of the unknown system h.
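The prominent subspace can be computed directly from an ensemble of system realizations. The sketch below uses a synthetic ensemble (a fixed nominal response plus small random perturbations, loosely mimicking the headset example; all values assumed), estimates R_h, eigendecomposes it as in (8), and evaluates the residual modeling error (11):

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic ensemble of system realizations (assumed, for illustration):
# a fixed "nominal" impulse response plus small random perturbations.
L, M = 64, 640
h_nominal = rng.standard_normal(L)
H = h_nominal[:, None] + 0.05 * rng.standard_normal((L, M))

# Sample estimate of R_h = E[h h^T] and its eigendecomposition (8).
R_h = H @ H.T / M
eigvals, V = np.linalg.eigh(R_h)
order = np.argsort(eigvals)[::-1]        # sort eigenvalues in descending order
eigvals, V = eigvals[order], V[:, order]

# Prominent subspace spanned by the first N eigenvectors, and the
# residual modeling error of the LS projection, as in (11).
N = 1
V_N = V[:, :N]
J_residual = eigvals[N:].sum()
assert J_residual / eigvals.sum() < 0.05     # most variance lies in span V_N
```

Note that R_h here is the second-moment matrix E[h h^T], as in (8); a dominant nominal response therefore shows up as a single large eigenvalue.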
2.3. LMS Adaptation in Prominent System Subspace
We now turn our attention to the problem of performing the system
identification task in the prominent system subspace, given the input
x(n) and the observed output signal d(n). The identification is achieved
by adjusting the weight coefficients {w_i}_{i=1}^{N} of the orthonormal
basis {v_i}_{i=1}^{N} such that the MSE of the error signal e(n) is minimized.
Denoting w̄(n) ≜ [w_1(n), . . . , w_N(n)]^T and x̄(n) ≜ V_N^T x(n),
the error signal e(n) is given by:

e(n) = d(n) − x̄^T(n) w̄(n). (12)
The Wiener solution that minimizes the MSE of e(n) is thus:

w̄_opt = R_NN^{-1} p_NN, (13)

where R_NN and p_NN are, respectively, the covariance matrix and
the cross-correlation vector of x̄(n), which are defined as

R_NN ≜ E[x̄(n) x̄^T(n)], (14)

and

p_NN ≜ E[d(n) x̄(n)]. (15)
The Wiener MSE solution w̄_opt is the projection of the unknown
system onto the prominent system subspace Ψ with respect to the
error signal e(n). Therefore, it is in general different from the LS
model of h in Ψ, which is given by c ≜ V_N^T h, due to the coupling
of the input signal between the prominent system subspace Ψ and its
orthogonal complement. More specifically, we note that

d(n) = x^T(n) h + u(n) = x̄^T(n) c + x̃^T(n) c̃ + u(n), (16)

where x̃(n) and c̃ are, respectively, the projections of the input signal
x(n) and the system coefficient vector h onto the row space of V̄_N, i.e.,
x̃(n) ≜ V̄_N^T x(n) and c̃ ≜ V̄_N^T h. Substituting (16) into (15) and
noting that E[u(n) x̄(n)] = 0, we have

p_NN = R_NN c + R_NN̄ c̃, (17)

where R_NN̄ ≜ E[x̄(n) x̃^T(n)] is the cross-correlation matrix of
x̄(n) and x̃(n). It now follows from (13) that

w̄_opt = c + R_NN^{-1} R_NN̄ c̃. (18)

When the unknown system is N-sparse, we have w̄_opt ≈ c since
c̃ ≈ 0, and the MSE of the Wiener solution ǭ_min in this case will be
approximated by that of the LS estimate:

ǭ_min ≈ E{[d(n) − x̄^T(n) c]^2} = σ_u^2 + c̃^T R_N̄N̄ c̃, (19)

which is close to the minimum MSE of the time-domain Wiener filter,
ǫ_min = σ_u^2, when c̃ ≈ 0. Here R_N̄N̄ ≜ E[x̃(n) x̃^T(n)].
Similar to the conventional LMS, the resulting prominent system
subspace LMS (PSS-LMS) is given as follows:

w̄(n + 1) = w̄(n) + µ x̄(n) e(n), (20)

where µ is the step size of the LMS adaptation. Alternatively, by
noting that the time-domain system model is given by w(n) =
V_N w̄(n), the PSS-LMS algorithm in (20) can be equivalently
expressed in the time domain as follows:

w(n + 1) = w(n) + µ V_N V_N^T x(n) e(n). (21)
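The PSS-LMS recursion (20) can be sketched as follows, with the time-domain model recovered via w(n) = V_N w̄(n). The basis, subspace dimension, step size, and noise level below are illustrative assumptions (a random orthonormal V, with the system constructed to lie in span V_N), not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed setup: random orthonormal basis V of R^L; the unknown system
# is constructed to lie in the span of the first N columns (so c-tilde = 0).
L, N = 32, 4
V, _ = np.linalg.qr(rng.standard_normal((L, L)))
V_N = V[:, :N]
h = V_N @ rng.standard_normal(N)

mu = 0.05
w_bar = np.zeros(N)                  # subspace weights
xbuf = np.zeros(L)                   # delay line for x(n)
for n in range(5_000):
    xbuf = np.roll(xbuf, 1)
    xbuf[0] = rng.standard_normal()
    d = h @ xbuf + 0.01 * rng.standard_normal()
    x_bar = V_N.T @ xbuf             # projected input, as in (12)
    e = d - w_bar @ x_bar            # subspace error signal (12)
    w_bar = w_bar + mu * e * x_bar   # PSS-LMS update (20)

w = V_N @ w_bar                      # time-domain model w(n) = V_N w_bar(n)
assert np.linalg.norm(w - h) < 0.05
```

Only N coefficients are adapted, so the effective dimensionality of the identification problem drops from L to N.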
It can be shown that the expected modeling error of PSS-LMS
when applied to system identification is given as follows:

J = E[‖w(n) − h‖^2] = J̄ + J_∆(n), (22)

where

J̄ = E[‖h − V_N w̄_opt‖^2] (23)

is the modeling bias due to the projection onto the prominent system
subspace, and

J_∆(n) = E[‖V_N ν̄(n)‖^2] (24)

is the modeling error due to the gradient noise of the LMS algorithm,
where ν̄(n) = w̄(n) − w̄_opt is the weight-error vector. Substituting
(18) into (23) shows that J̄ is given by:

J̄ = E[‖(V̄_N − V_N R_NN^{-1} R_NN̄) c̃‖^2] = tr[B^T B R_c̃], (25)

where B = V̄_N − V_N R_NN^{-1} R_NN̄, and R_c̃ = E[c̃ c̃^T] is the
covariance of c̃. Furthermore, if the observed data vector is white, it
can be further shown that B^T B = I, and thus we have

J̄ = Σ_{i=N+1}^{L} λ_h(i). (26)

On the other hand, denoting R_ν̄(n) = E[ν̄(n) ν̄^T(n)], it now follows
from standard LMS analysis that, under mild conditions, J_∆(n)
at steady state is given by:

J_∆ = tr[R_ν̄(n)] = µN ǭ_min/2, (27)

which can be further simplified to J_∆ = µNσ_u^2/2 if the observed
data vector is white.
2.4. LMS with Fast Adaptation in Prominent System Subspace
One limitation of PSS-LMS is that it only converges to the projection
of the system transfer function onto the prominent system
subspace, which may not be desirable for identification applications
that are sensitive to modeling error. To overcome this limitation, the
PSS-LMS algorithm can be further modified by including null-space
gradient descent vectors in the LMS adaptation, which leads to the
following enhanced PSS-LMS (PSS-LMS+):

w(n + 1) = w(n) + [µ V_N V_N^T + µ̄ (I − V_N V_N^T)] x(n) e(n), (28)

where µ̄ > 0 controls the convergence of the weight vector in the null
space, and is normally smaller than µ to maintain the steady-state
performance of LMS for systems with large dimensionality.
The LMS adaptation algorithm in (28) can be equivalently written
as

w(n + 1) = w(n) + µ̄ V Λ_κ V^T x(n) e(n), (29)

where Λ_κ = diag[κ, . . . , κ, 1, . . . , 1], with κ = µ/µ̄ occupying the
first N diagonal entries. It can be seen that (29) further simplifies to
the standard LMS adaptation as follows:

w′(n + 1) = w′(n) + µ̄ x′(n) e(n), (30)

where

w′(n) = Λ_κ^{-1/2} V^T w(n), (31)

and

x′(n) = Λ_κ^{1/2} V^T x(n). (32)
Based on the standard LMS formulation (30), it can be seen that
w′(n) will, in general, converge to Λ_κ^{-1/2} V^T w_opt in the mean
sense. Therefore, the PSS-LMS+ algorithm will converge to the full-
dimension Wiener solution w_opt, which is desirable.

Denote the weight-error vector of PSS-LMS+ as ν(n) =
w(n) − w_opt, and the weight-error vector in the transform domain
as ν′(n) = w′(n) − Λ_κ^{-1/2} V^T w_opt. It is straightforward to
show that ν(n) = V Λ_κ^{1/2} ν′(n), and hence the expected modeling
error can be calculated as follows:

J = E[‖ν(n)‖^2] = E[‖V Λ_κ^{1/2} ν′(n)‖^2] = tr[Λ_κ R_ν′(n)], (33)

where R_ν′(n) = E[ν′(n) ν′^T(n)]. It can be shown that in steady state
we have

R_ν′(n) = µ̄ ǫ_min I/2. (34)

Therefore, the average modeling error of PSS-LMS+ is given by:

J = µ̄ ǫ_min tr[Λ_κ]/2 = [(L − N)µ̄ + Nµ] ǫ_min/2. (35)

Clearly, for a sparse system with N ≪ L, it is possible to use a large
adaptation factor µ in the prominent system subspace to speed up
the convergence of the PSS-LMS+ algorithm without significantly
degrading the modeling accuracy, which is not possible if the standard
LMS algorithm is used.
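The PSS-LMS+ update (28) amounts to a matrix step size: a large µ applied to the projector onto the prominent subspace and a small µ̄ applied to its complement. A sketch under an assumed setup similar to the PSS-LMS example (random orthonormal basis, illustrative step sizes, a system mostly but not entirely inside the prominent subspace):

```python
import numpy as np

rng = np.random.default_rng(4)

# Assumed setup (illustrative): the system has a small component outside
# the prominent subspace, so full-rank convergence matters.
L, N = 32, 4
V, _ = np.linalg.qr(rng.standard_normal((L, L)))
V_N = V[:, :N]
P = V_N @ V_N.T                        # projector onto the prominent subspace
h = V_N @ rng.standard_normal(N) + 0.01 * rng.standard_normal(L)

mu, mu_bar = 0.05, 0.005               # large step in-subspace, small in null space
G = mu * P + mu_bar * (np.eye(L) - P)  # matrix step size of (28)

w = np.zeros(L)
xbuf = np.zeros(L)
for n in range(30_000):
    xbuf = np.roll(xbuf, 1)
    xbuf[0] = rng.standard_normal()
    d = h @ xbuf + 0.01 * rng.standard_normal()
    e = d - w @ xbuf
    w = w + e * (G @ xbuf)             # PSS-LMS+ update (28)

assert np.linalg.norm(w - h) < 0.05    # converges to the full-dimension solution
```

The in-subspace component converges with time constant 1/(4µ) while the null-space component converges more slowly with 1/(4µ̄), matching the two-speed behavior described above.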
3. EXPERIMENT RESULTS
In this section, we present experimental results to verify the theory
developed in previous sections. To this end, we compare the perfor-
mances of different LMS algorithms when they are applied to iden-
tify the system transfer function of the electro-acoustic path between
the loudspeaker and microphone of an experimental ANC headset.
In our experiment, we first collected M = 640 sets of FIR filter
coefficients h_m, m = 1, . . . , M, from the electro-acoustic path of the
ANC headset when it was worn by different human subjects at nor-
mal wearing positions and positions that are slightly deviated from
the normal wearing position. The measurements were conducted in
a quiet listening room. The sampling rate is 8 kHz and the length
of the FIR filter is 64. The eigenvalues of the sample covariance
matrix of the observed FIR coefficients are illustrated in Fig. 1,
which clearly demonstrates that the system under investigation is
highly sparse in the transform domain.
We further investigate the relationship between the modeling er-
ror of the PSS-LMS algorithm and the dimension of the prominent
system subspace. The results are given in Fig. 2. To obtain these
results, we assume that both the observed signal x(n) and the noise
signal u(n) are white, with variances σ_x^2 = 1 and σ_u^2 = 0.01,
respectively. It can be observed that, for step size µ = 0.1984,
the average modeling error of PSS-LMS decreases with increasing
PSS dimension N, reaching its minimum at N = 7. Increasing the PSS
dimension beyond that point, the modeling error increases due to the
presence of gradient noise. It can also be observed that a smaller µ
in general leads to a smaller average modeling error and moves the
minimum modeling error point to a higher PSS dimension, since the
magnitude of the gradient noise is reduced for smaller µ.
We now compare the convergence performance of PSS-LMS,
PSS-LMS+, and conventional LMS algorithms. In our experiments,
the unknown system was constructed by cascading two randomly
chosen FIR filters from the M observations. Fig. 3 shows the average
modeling error vs. the number of iterations for the different
algorithms. The LMS step sizes for LMS and PSS-LMS are µ =
0.003 and µ = 0.066, respectively. For PSS-LMS+, two sets of
step sizes, namely, {µ = 0.066, µ̄ = 0.003} and {µ = 0.034,
µ̄ = 0.0016}, are used to generate the curves PSS-LMS+(1)
and PSS-LMS+(2), respectively. The dimension of the prominent
system subspace is set to N = 3. It can be seen that PSS-LMS
converges much faster than LMS, taking only about 100 iterations
Fig. 1. Eigenvalues in descending order.
Fig. 2. PSS-LMS average modeling error [dB] vs. dimension, for step
sizes µ = 0.1984, 0.0992, 0.0661, 0.0496, and 0.0397.
to reach steady state, while the LMS algorithm takes about 1500
iterations. Unfortunately, PSS-LMS has a larger modeling error at
steady state. The PSS-LMS+ algorithm, using the same step sizes,
has initial convergence behavior similar to PSS-LMS, which is much
faster than LMS, and further converges to a full-rank solution with
lower modeling error after the initial convergence stage (PSS-LMS+(1)).
Furthermore, by adjusting the step sizes, the same modeling error
performance as that of the full-rank LMS algorithm can be achieved
(PSS-LMS+(2)).
4. CONCLUSIONS
In this paper, we have proposed two new adaptive algorithms,
namely, PSS-LMS and PSS-LMS+, for fast identification of systems
that are sparse in the transform domain. PSS-LMS constrains
the LMS adaptation to the prominent subspace of the unknown
system, thus effectively improving the convergence speed of the LMS
algorithm at the cost of a larger modeling error in steady state. On the
other hand, the PSS-LMS+ algorithm combines the virtues of both
PSS-LMS and LMS by using a large LMS adaptation step only in the
prominent system subspace, thus improving the convergence speed
of the LMS algorithm without introducing significant modeling
error. The merits of PSS-LMS and PSS-LMS+ over conventional
LMS are confirmed in our experiments using examples from electro-
acoustic path modeling for an ANC headset.
Fig. 3. Comparison of identification performance: average modeling
error [dB] vs. iterations for LMS, PSS-LMS, PSS-LMS+(1), and
PSS-LMS+(2), using the 3 largest-eigenvector filters.
5. REFERENCES
[1] B. Widrow, “Adaptive filters,” in Aspects of Network and Sys-
tem Theory, R. E. Kalman and N. DeClaris, Eds. New York:
Holt, Rinehart and Winston, 1970, pp. 563–587.
[2] R. Gitlin and J. F. Magee, “Self-orthogonalizing adaptive
equalization algorithms,” IEEE Trans. Commun., vol. 25, no. 7,
pp. 666–672, Jul. 1977.
[3] S. S. Narayan and A. M. Peterson, “Frequency domain least-
mean-square algorithm,” Proc. IEEE, vol. 69, no. 1, pp. 124–
126, Jan. 1981.
[4] M. Mboup, M. Bonnet, and N. Bershad, “LMS coupled adap-
tive prediction and system identification: a statistical model
and transient mean analysis,” IEEE Trans. Signal Processing,
vol. 42, no. 10, pp. 2607–2615, Oct 1994.
[5] R. Yu and C. C. Ko, “Lossless compression of digital audio us-
ing cascaded RLS-LMS prediction,” IEEE Trans. Speech and
Audio Processing, vol. 11, no. 6, pp. 532–537, Nov. 2003.
[6] Y. Chen, Y. Gu, and A. Hero, “Sparse LMS for system iden-
tification,” in IEEE International Conference on Acoustics,
Speech and Signal Processing (ICASSP) 2009, April 2009.
[7] Y. Song, Y. Gong, and S. M. Kuo, “A robust hybrid feedback
active noise cancellation headset,” IEEE Trans. Speech and Au-
dio Processing, vol. 13, no. 4, pp. 607–617, July 2005.
[8] W. Gabriel, “Using spectral estimation techniques in adaptive
processing antenna systems,” IEEE Trans. Antennas and Prop-
agation, vol. 34, no. 3, pp. 291–300, Mar. 1986.
[9] L. Scharf and D. Tufts, “Rank reduction for modeling station-
ary signals,” IEEE Trans. Acoustics, Speech and Signal Pro-
cessing, vol. 35, no. 3, pp. 350–355, Mar. 1987.
[10] J. S. Goldstein, I. S. Reed, and L. L. Scharf, “A multistage
representation of the Wiener filter based on orthogonal pro-
jections,” IEEE Trans. Information Theory, vol. 44, no. 7, pp.
2943–2959, Nov. 1998.
[11] B. Widrow, J. M. McCool, M. G. Larimore, and C. R. John-
son, “Stationary and nonstationary learning characteristics of
the LMS adaptive filter,” Proc. IEEE, vol. 64, no. 8, pp. 1151–
1162, Aug. 1976.