An adaptive prediction algorithm based on maximum correntropy criterion
Bu Yun a, Kang Wan-Xin
School of Electrical and Information Engineering, Xihua University, Chengdu, 610039, China
aemail: [email protected]
Keywords: Chaotic time series; adaptive prediction; maximum correntropy criterion
Abstract. The traditional cost function, the minimum mean square prediction error, is a second-order
statistic that rests on the assumptions of Gaussian-distributed errors and linearity. Chaotic
signals are non-Gaussian, so this optimization criterion is not well suited to them. We therefore
propose replacing the popular minimum mean square error criterion with a robust optimization
criterion, maximum correntropy. In simulation, the algorithm shows improved performance when
applied to a common third-order Volterra predictor.
Introduction
Predicting the future trajectory of a chaotic system is very difficult because of its sensitivity to
initial conditions and noise [1]. This barrier has not stopped researchers from exploring methods for
estimating the future values of a chaotic system, because prediction methods have attractive
applications, such as stock price prediction. In principle, any prediction approach constructs a
feasible model that can compute the future value:
$$\hat{x}(n+T) = \sum_i \alpha_i f_i(X(n)) \tag{1}$$
where $f_i(\cdot)$ represents a basis function, $X(n)$ denotes the input signals, and $\alpha_i$ are the model coefficients.
Equation (1) shows that prediction methods differ from one another in their input data, their basis
functions, and the optimization method used to obtain the model coefficients. Some approaches employ
local signals, that is, the closest values or nearest neighbors and their images, as input vectors;
consequently they are called local prediction methods [2]. Local models are successful not only in
computer simulations but also in practice. Their good performance comes from the fact that they are
model free. However, local prediction is suitable only where the data set is large enough and
noiseless, and the predicted values cannot fall outside the range of the known data. To overcome this
shortcoming, global prediction methods were proposed, which use all the known data as input signals
to train the prediction model. Global methods can exploit more information from the known signals and
can show better performance. The second feature that distinguishes methods lies in their basis
functions. Various basis functions appear in published algorithms; popular choices include polynomial
functions, Volterra series [3] [4], radial basis functions, fuzzy functions and neural networks [5] [6].
No single choice is superior to all the others, because each has its own advantages and shortcomings.
For example, neural-network-based approaches have good fitting capacity for different nonlinear
systems, but their complicated structures lead to heavy computation, awkward updating and slow
convergence.
Consequently, few of these methods are used in online systems. Nevertheless, much effort has been
devoted to constructing more sophisticated methods that overcome these obstacles and track more
complex chaotic systems. The last obvious difference among prediction algorithms is the way the
model coefficients are calculated, namely the optimization strategy. Minimizing the mean square
error (MSE) is probably the most popular way to obtain the model coefficients. But this criterion is
suitable only under the assumptions of Gaussian-distributed prediction errors and linearity. As is
well known, chaotic signals are non-Gaussian and their higher-order cumulants are nonzero, so the
MSE, which is based on second-order statistics, is not an advisable choice. Recently, a new cost
function, correntropy, was proposed to deal with non-Gaussian and nonlinear optimization problems,
and it shows better performance than the MSE criterion. In the remainder of this paper, correntropy
is introduced into the prediction algorithm to replace the MSE criterion.
Applied Mechanics and Materials Vols. 380-384 (2013) pp 1310-1313. Online available since 2013/Aug/30 at www.scientific.net. © (2013) Trans Tech Publications, Switzerland. doi:10.4028/www.scientific.net/AMM.380-384.1310
Maximum correntropy criterion (MCC)
Recently, correntropy was presented for measuring how similar two random variables are [7] [8].
It is defined as:
$$V_\sigma(X, Y) = E[\kappa_\sigma(X - Y)] \tag{2}$$
where $X$ and $Y$ are two random variables, $\kappa_\sigma(\cdot)$ is a positive definite kernel of width
$\sigma$, and $E[\cdot]$ denotes the expectation operator.
A popular kernel is the Gaussian function:
$$\kappa_\sigma(|X - Y|) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{|X - Y|^2}{2\sigma^2}\right) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{|e|^2}{2\sigma^2}\right) \tag{3}$$
From (2) and (3), correntropy reaches its maximum value when $e = 0$. Hence the optimization
strategy is named the maximum correntropy criterion (MCC). Correntropy is bounded and very robust,
and its Taylor expansion is:
$$V_\sigma(X, Y) = \frac{1}{\sqrt{2\pi}\,\sigma} \sum_{n=0}^{\infty} \frac{(-1)^n}{2^n\, n!}\, E\left[\frac{(X - Y)^{2n}}{\sigma^{2n}}\right] \tag{4}$$
Eq. (4) shows that correntropy contains all the even moments of the error. From this expansion,
even if the error is not Gaussian distributed and its higher-order moments are nonzero, the
higher-order moments are suppressed when the kernel width is large relative to the error.
Consequently the cost function can exploit more information from the error and does not rely on the
Gaussianity and linearity assumptions. MCC has therefore been successfully employed in nonlinear and
non-Gaussian signal processing. In addition, Chen proved in [16] that the maximum correntropy (MC)
estimate has a unique optimal solution in a concave region if the kernel size is large enough.
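As a rough illustration of (2) and (3), the expectation can be replaced by a sample mean over paired observations. The Python sketch below makes that assumption; the function names and the data are hypothetical and are not taken from the paper. It suggests why correntropy is bounded and robust: a single gross outlier contributes almost nothing to the kernel average, whereas it dominates a squared-error average.

```python
import math

def gaussian_kernel(e, sigma):
    """Gaussian kernel of Eq. (3) evaluated at error e with width sigma."""
    return math.exp(-e * e / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)

def sample_correntropy(x, y, sigma):
    """Sample-mean estimate of Eq. (2) (assumption: expectation ~ average)."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    return sum(gaussian_kernel(a - b, sigma) for a, b in zip(x, y)) / len(x)

# Illustrative data only: the last pair contains one gross outlier.
clean = [0.1, 0.2, 0.3, 0.4]
noisy = [0.1, 0.2, 0.3, 40.0]

v_same = sample_correntropy(clean, clean, sigma=1.0)     # maximum, all errors zero
v_outlier = sample_correntropy(clean, noisy, sigma=1.0)  # loses one of four terms
mse_outlier = sum((a - b) ** 2 for a, b in zip(clean, noisy)) / len(clean)
```

Here the outlier lowers the correntropy estimate by at most one quarter of its maximum, while the mean square error is dominated by that single sample.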
When the model coefficients are computed adaptively, the optimal solution is usually sought by an
iterative gradient ascent method, and for online updating the stochastic gradient is approximated
using the current value:
$$\begin{aligned} \omega(n+1) &= \omega(n) + \eta \frac{\partial \kappa_\sigma(d(n) - y(n))}{\partial \omega(n)} \\ &= \omega(n) + \frac{\eta}{\sqrt{2\pi}\,\sigma^3} \exp\left(-\frac{e^2(n)}{2\sigma^2}\right) e(n) X(n) \end{aligned} \tag{5}$$
where $y(n)$ is the predicted value, $d(n)$ is the actual value, $X(n)$ represents the input vector, $\eta$ is the
step length, and $e(n)$ denotes the prediction error.
In (5), if the kernel width is preset, $\sqrt{2\pi}\,\sigma^3$ is a constant that can be absorbed into the
step length, so the coefficient update can be approximated as:
$$\omega(n+1) = \omega(n) + \eta \exp\left(-\frac{e^2(n)}{2\sigma^2}\right) e(n) X(n) \tag{6}$$
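A single step of the update in (6) can be sketched in Python as follows. This is a minimal sketch that assumes the prediction is linear in the coefficients, that is, y(n) is the inner product of ω(n) and X(n); the function name and the numbers in the example are hypothetical.

```python
import math

def mcc_update(weights, x_vec, desired, eta, sigma):
    """One stochastic gradient-ascent step of Eq. (6).

    Assumes a linear-in-parameters model: y(n) = sum_i w_i * x_i.
    Returns the updated weights and the prediction error e(n).
    """
    y = sum(w * x for w, x in zip(weights, x_vec))   # predicted value y(n)
    e = desired - y                                  # error e(n) = d(n) - y(n)
    # The exponential factor shrinks the step when the error is large.
    gain = eta * math.exp(-e * e / (2.0 * sigma ** 2)) * e
    return [w + gain * x for w, x in zip(weights, x_vec)], e

# A moderate error moves the weights; a gross outlier barely moves them.
w_small, e_small = mcc_update([0.0, 0.0], [1.0, 2.0], desired=1.0, eta=0.1, sigma=1.0)
w_large, e_large = mcc_update([0.0, 0.0], [1.0, 2.0], desired=100.0, eta=0.1, sigma=1.0)
```

Compared with a least-mean-square step, the only difference is the exponential weight, which is what makes the update insensitive to impulsive errors.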
Because the kernel width is a key parameter of MCC optimization, and the suitable width is generally
unknown before training, a feasible approach is to estimate the width adaptively during the
training procedure:
$$\bar{e}(n) = \frac{1}{n} \sum_{k=1}^{n} e(k) \tag{7}$$
$$\sigma^2(n) = \frac{1}{n} \sum_{k=1}^{n} \left(e(k) - \bar{e}(n)\right)^2 \tag{8}$$
Based on (7) and (8), the update rule can be expressed as:
$$\omega(n+1) = \omega(n) + \eta \exp\left(-\frac{(e(n) - \bar{e}(n))^2}{2\sigma^2(n)}\right) e(n) X(n) \tag{9}$$
In this case, $\sqrt{2\pi}\,\sigma^3$ is no longer a constant, but for simplicity it is again ignored. The
kernel width can also be estimated from local errors only, rather than from all of the errors,
because correntropy measures the local similarity of two random variables.
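The adaptive-width rule of (7) to (9) can be sketched in Python as below. The sketch assumes a linear-in-parameters model and keeps the full error history, which is the simplest reading of (7) and (8). The small floor `min_sigma2` is an added assumption, not part of the paper, that avoids division by zero at the first sample, where the variance in (8) is zero. All names are hypothetical.

```python
import math

def adaptive_width(errors):
    """Return the error mean of Eq. (7) and the variance sigma^2(n) of Eq. (8)."""
    n = len(errors)
    e_bar = sum(errors) / n
    sigma2 = sum((e - e_bar) ** 2 for e in errors) / n
    return e_bar, sigma2

def mcc_update_adaptive(weights, x_vec, desired, errors, eta, min_sigma2=1e-12):
    """One step of Eq. (9); appends the new error to `errors` in place."""
    y = sum(w * x for w, x in zip(weights, x_vec))
    e = desired - y
    errors.append(e)
    e_bar, sigma2 = adaptive_width(errors)
    sigma2 = max(sigma2, min_sigma2)   # assumption: guard against sigma^2 = 0
    gain = eta * math.exp(-(e - e_bar) ** 2 / (2.0 * sigma2)) * e
    return [w + gain * x for w, x in zip(weights, x_vec)]

# Worked example: one past error of 1.0, new error of 3.0.
errors = [1.0]
w = mcc_update_adaptive([0.0], [1.0], desired=3.0, errors=errors, eta=0.1)
```

In the worked example the error mean is 2 and the variance is 1, so the weight moves by 0.1 · exp(-0.5) · 3. To use local errors only, `errors` could be truncated to a recent window before calling `adaptive_width`.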
Computer Simulation and Conclusion
To test the MCC algorithm, we apply it to a third-order Volterra predictor:
$$\begin{aligned} \hat{y}(k+1) = a_0 &+ \sum_{i_1=0}^{L-1} a_{i_1} y(k-i_1) + \sum_{i_1 \le i_2} a_{i_1 i_2} y(k-i_1) y(k-i_2) \\ &+ \cdots + \sum_{i_1 \le i_2 \le \cdots \le i_p} a_{i_1 i_2 \cdots i_p} y(k-i_1) y(k-i_2) \cdots y(k-i_p) + \cdots \end{aligned} \tag{10}$$
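The products appearing in (10) can be collected into one input vector X(n), so that the Volterra predictor becomes linear in its coefficients. The Python sketch below builds that vector for a given memory length L and truncation order p; the function name and the ordering of the terms are choices made here for illustration.

```python
from itertools import combinations_with_replacement

def volterra_features(history, memory, order):
    """Build the Volterra input vector of Eq. (10), truncated at `order`.

    history[0] is y(k), history[1] is y(k-1), and so on.
    The index tuples satisfy i1 <= i2 <= ... <= ip, as in Eq. (10).
    """
    if len(history) < memory:
        raise ValueError("history must contain at least `memory` samples")
    lags = history[:memory]
    features = [1.0]                       # constant term multiplying a_0
    for p in range(1, order + 1):
        for idx in combinations_with_replacement(range(memory), p):
            term = 1.0
            for i in idx:
                term *= lags[i]
            features.append(term)
    return features

# Memory L = 2, order p = 3: 1 + 2 + 3 + 4 = 10 terms.
feats = volterra_features([2.0, 3.0], memory=2, order=3)
```

The number of terms grows quickly with the memory length and the order, which is one reason the series is truncated at a low order in practice.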
In the simulation, we used the mean square error (MSE), expressed in decibels, to monitor the power
ratio of the prediction error to the signal:
$$\mathrm{MSE} = 10 \log_{10} \left[ \frac{\sum_{i=1}^{K} e^2(i)}{\sum_{i=1}^{K} x^2(i)} \right] \tag{11}$$
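The quantity in (11) is straightforward to compute. In the Python sketch below, the function name is hypothetical, and the check for a zero power is an added safeguard rather than part of the paper.

```python
import math

def mse_db(errors, signal):
    """Error-to-signal power ratio of Eq. (11), in decibels."""
    num = sum(e * e for e in errors)
    den = sum(x * x for x in signal)
    if num <= 0.0 or den <= 0.0:
        raise ValueError("error and signal power must both be positive")
    return 10.0 * math.log10(num / den)

# An error with one tenth of the signal amplitude gives a power ratio of 0.01.
signal = [1.0, -2.0, 3.0]
errors = [0.1, -0.2, 0.3]
ratio_db = mse_db(errors, signal)   # approximately -20 dB
```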
A complex chaotic map, the coupled Logistic system, was used as the test system:
$$\begin{aligned} x(n+1) &= 4a\,x(n)(1 - x(n)) + r\,x(n)\,y(n), \\ y(n+1) &= 4a\,y(n)(1 - y(n)) + r\,x(n)\,y(n) \end{aligned} \tag{12}$$
where $a = 0.7$ and $r = 0.64$.
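One iteration of the map in (12) can be written in Python as follows. The sketch follows the equation as printed, with a = 0.7 and r = 0.64 as defaults; the function names and the starting point are hypothetical, and the long-run behavior for a particular initial value is not verified here.

```python
def coupled_logistic_step(x, y, a=0.7, r=0.64):
    """One iteration of the coupled Logistic map of Eq. (12)."""
    x_next = 4.0 * a * x * (1.0 - x) + r * x * y
    y_next = 4.0 * a * y * (1.0 - y) + r * x * y
    return x_next, y_next

def generate_series(x0, y0, n):
    """Iterate the map n times and return the x component of each step."""
    xs = []
    x, y = x0, y0
    for _ in range(n):
        x, y = coupled_logistic_step(x, y)
        xs.append(x)
    return xs

# Single step from (0.5, 0.5): 4 * 0.7 * 0.25 + 0.64 * 0.25 = 0.86 for both components.
x1, y1 = coupled_logistic_step(0.5, 0.5)
series = generate_series(0.5, 0.5, 3)
```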
A time series of 5500 points was generated from (12) with a random initial value. The first 5000
points were used to train the prediction model, and the remaining points were used as test data. All
the data were normalized to the range from -1 to 1 in advance:
$$A = \frac{\max(x) + \min(x)}{2}, \qquad B = \frac{\max(x) - \min(x)}{2}, \qquad y = \frac{x - A}{B} \tag{13}$$
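The normalization in (13) and the split into training and test data can be sketched in Python as below. The function names are hypothetical, and the check for a constant series is an added safeguard.

```python
def normalize(series):
    """Map a series to the range [-1, 1] using Eq. (13)."""
    hi, lo = max(series), min(series)
    a = (hi + lo) / 2.0    # midpoint A
    b = (hi - lo) / 2.0    # half-range B
    if b == 0.0:
        raise ValueError("series is constant, so B = 0 and Eq. (13) is undefined")
    return [(x - a) / b for x in series]

def split_train_test(series, n_train):
    """Use the first n_train points for training and the rest for testing."""
    return series[:n_train], series[n_train:]

scaled = normalize([0.0, 5.0, 10.0])                        # [-1.0, 0.0, 1.0]
train, test = split_train_test(list(range(5500)), 5000)     # 5000 and 500 points
```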
The simulation results are shown in Fig. 1 and Fig. 2.
Fig.1. Predicted values and actual values
Fig.2. The prediction error
From Fig. 1, the predicted values are very close to the actual values, and Fig. 2 shows that the
prediction error is small, with most errors below 0.1. The MSE is -16.8 dB, the mean absolute
error is 0.06, and its standard deviation is also close to 0.06. These results show that the
MCC-based adaptive algorithm performs well.
Acknowledgement
This work was supported by the research foundation of the Education Bureau of Sichuan Province
and Xihua University through Grants No. 12ZB132 and No. Z1120944.
References
[1] H. Kantz and T. Schreiber, Nonlinear Time Series Analysis. Cambridge University Press, 2004.
[2] Gong Xiaofeng and C. H. Lai, "Improvement of the local prediction of chaotic time series,"
Phys. Rev. E, vol. 60, no. 5, pp. 5463-5468, Nov. 1999.
[3] Bu Yun, Wen Guang-Jun, Zhou Xiao-Jia, and Zhang Qiang, "A novel adaptive predictor for
chaotic time series," Chin. Phys. Lett., vol. 26, no. 10, 100502, 2009.
[4] Zhang J. S. and Xiao X. C., "Prediction of chaotic time series by using adaptive higher-order
nonlinear Fourier infrared filter," Acta Physica Sinica, vol. 49, no. 7, 1221-07, 2000.
[5] Min Han, Jianhui Xi, Shiguo Xu, and Fu-Liang Yin, "Prediction of chaotic time series based on
the recurrent predictor neural network," IEEE Trans. on Sig. Proc., vol. 52, no. 2, pp. 3409-3416,
Dec. 2004.
[6] Zhang Jia-Shu and Xiao Xian-Ci, "Predicting chaotic time series using recurrent neural
network," Chin. Phys. Lett., vol. 17, no. 2, pp. 88-90, 2000.
[7] Badong Chen and J. C. Principe, "Maximum correntropy estimation is a smoothed MAP
estimation," IEEE Signal Process. Lett., vol. 19, no. 8, pp. 491-494, Aug. 2012.
[8] Weifeng Liu, P. P. Pokharel, and J. C. Principe, "Correntropy: properties and applications in
non-Gaussian signal processing," IEEE Trans. Signal Process., vol. 55, no. 11, pp. 5286-5298, Nov. 2007.