Applied Mechanics and Materials Vols. 380-384 (2013), pp. 1310-1313. Online available since 2013/Aug/30 at www.scientific.net. © (2013) Trans Tech Publications, Switzerland. doi:10.4028/www.scientific.net/AMM.380-384.1310

An Adaptive Prediction Algorithm Based on Maximum Correntropy Criterion


Bu Yun a, Kang Wan-Xin

School of Electrical and Information Engineering, Xihua University, Chengdu, 610039, China

a email: [email protected]

Keywords: Chaotic time series; adaptive prediction; maximum correntropy criterion

Abstract. The traditional cost function, minimization of the mean square prediction error, is a second-order statistic that rests on the assumptions of Gaussian-distributed errors and linearity. Chaotic signals, however, are non-Gaussian, so this optimization criterion is not suitable for them. We therefore propose replacing the popular minimum mean square error criterion with a robust alternative, the maximum correntropy criterion. In simulation, the resulting algorithm outperforms a common third-order Volterra predictor.

Introduction

Predicting the future trajectory of a chaotic system is very difficult because of its sensitivity to initial conditions and noise [1]. This barrier has not stopped people from exploring various methods to estimate the future values of a chaotic system, since prediction methods have attractive applications such as stock price forecasting. In principle, every prediction approach constructs a feasible model that computes the future value:

$\hat{x}(n+T) = \sum_i \alpha_i f_i(X(n))$    (1)

where $f_i(\cdot)$ represents a basis function, $X(n)$ denotes the input signal vector, and $\alpha_i$ are the model coefficients.

Equation (1) shows that different methods are distinguished by their input data, their basis functions, and the optimization method used to obtain the model coefficients. Some approaches employ local signals, i.e., the closest values or the closest neighbors and their images, as input vectors; consequently they are called local predictions [2]. Local models are successful not only in computer simulations but also in practice, and their good performance comes from the fact that they are model-free. Local predictions, however, are only suitable when the data set is large enough and noiseless, and their predicted values cannot fall outside the range of the known data. To overcome this shortcoming, global prediction methods were proposed that use all the known data as input signals to train the prediction model. Global methods can exploit more information from the known signals and can therefore show better performance.

The second feature that distinguishes different methods is their basis functions. Various basis functions can be found in the published algorithms; popular choices include polynomial functions, Volterra series [3][4], radial basis functions, fuzzy functions, and neural networks [5][6]. No single choice performs best in all cases, because each method has its advantages as well as its shortcomings. For example, neural-network-based approaches have good fitting capacity for different nonlinear systems, but their complicated structures lead to heavy computation, awkward updates, and slow convergence. Consequently, few of these methods are used in online systems. Nevertheless, much effort continues to be devoted to constructing more sophisticated methods that can overcome these obstacles and track more complex chaotic systems.

The last obvious difference among prediction algorithms is the way the model coefficients are calculated, namely the optimization strategy. Minimization of the mean square error (MSE) is probably the most popular criterion for obtaining the model coefficients, but it is only appropriate when the prediction errors are Gaussian distributed and the model is linear. As is well known, chaotic signals are non-Gaussian and their higher-order cumulants are nonzero, so the second-order MSE statistic is not an advisable choice. Recently, a new cost function,


correntropy, was proposed to deal with non-Gaussian and nonlinear optimization problems, and it shows better performance than the MSE criterion. In the remainder of this paper, correntropy is introduced into the prediction algorithm to replace the MSE criterion.

Maximum correntropy criterion (MCC)

Recently, correntropy was proposed as a measure of how similar two random variables are [7][8]. It is defined as:

$V_\sigma(X,Y) = E[\kappa_\sigma(X-Y)]$    (2)

where $X$ and $Y$ are two random variables, $\kappa_\sigma(\cdot)$ is a positive-definite kernel with width $\sigma$, and $E[\cdot]$ denotes the expectation operator.

A popular kernel is the Gaussian function:

$\kappa_\sigma(|X-Y|) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{|X-Y|^2}{2\sigma^2}\right) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{|e|^2}{2\sigma^2}\right)$    (3)
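In practice, the expectation in (2) is replaced by a sample mean over paired observations. A minimal sketch of this sample estimator with the Gaussian kernel of (3) (the function name and NumPy usage are illustrative, not from the paper):

```python
import numpy as np

def correntropy(x, y, sigma=1.0):
    """Sample estimate of V_sigma(X, Y) from Eq. (2),
    using the Gaussian kernel of Eq. (3)."""
    e = np.asarray(x, float) - np.asarray(y, float)
    # Gaussian kernel evaluated at each error sample
    k = np.exp(-e**2 / (2.0 * sigma**2)) / (np.sqrt(2.0 * np.pi) * sigma)
    return k.mean()  # expectation replaced by the sample mean
```

Identical signals give the kernel's peak value, and the estimate decreases as the two signals diverge, which is what makes it a similarity measure.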

From (2) and (3), correntropy reaches its maximum value when $e = 0$; hence the optimization strategy is named the maximum correntropy criterion (MCC). Correntropy is bounded and very robust, and its Taylor expansion is:

$\kappa_\sigma(X-Y) = \frac{1}{\sqrt{2\pi}\,\sigma}\sum_{n=0}^{\infty}\frac{(-1)^n}{2^n\,n!}\,E\!\left[\frac{(X-Y)^{2n}}{\sigma^{2n}}\right]$    (4)

Eq. (4) shows that correntropy is composed of all the even moments of the error. From this expansion we see that even if the error is not Gaussian distributed and has nonzero higher-order moments, those higher-order moments are suppressed when the kernel width is larger than the largest even moment of the error. Consequently the cost function can utilize more information from the error and does not rely on the Gaussianity and linearity assumptions, which is why MCC has been successfully employed in nonlinear and non-Gaussian signal processing. Moreover, in [7] Chen proved that the maximum correntropy estimate has a unique optimal solution within a concave region if the kernel size is large enough.

In adaptively calculating the model coefficients, the optimal solution is usually sought by an iterative gradient-ascent method, and for online updating the stochastic gradient is approximated using the current sample:

$\omega(n+1) = \omega(n) + \eta\,\frac{\partial \kappa_\sigma(d(n)-y(n))}{\partial \omega(n)} = \omega(n) + \frac{\eta}{\sqrt{2\pi}\,\sigma^3}\exp\!\left(-\frac{e^2(n)}{2\sigma^2}\right)e(n)X(n)$    (5)

where $y(n)$ is the predicted value, $d(n)$ is the actual value, $X(n)$ is the input vector, $\eta$ is the step size, and $e(n) = d(n) - y(n)$ denotes the prediction error.

In (5), if the kernel width is preset, $\sqrt{2\pi}\,\sigma^3$ is a constant that can be absorbed into the step size, so the coefficient increment can be approximated as:

$\omega(n+1) = \omega(n) + \eta\exp\!\left(-\frac{e^2(n)}{2\sigma^2}\right)e(n)X(n)$    (6)
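The update in (6) differs from the standard LMS rule only by the Gaussian factor, which shrinks the step when the instantaneous error is large and thus suppresses outliers. A minimal sketch, assuming a model linear in the parameters, $y(n) = \omega \cdot X(n)$ (the function name is illustrative):

```python
import numpy as np

def mcc_update(w, x, d, eta=0.1, sigma=1.0):
    """One MCC stochastic-gradient step, Eq. (6):
    w <- w + eta * exp(-e^2 / (2 sigma^2)) * e * x,
    where e = d - w.x; the constant sqrt(2*pi)*sigma^3 of
    Eq. (5) is absorbed into eta."""
    e = d - float(np.dot(w, x))
    w = w + eta * np.exp(-e**2 / (2.0 * sigma**2)) * e * x
    return w, e
```

For small errors the exponential is close to 1 and the step behaves like LMS; for an outlier error the exponential is nearly 0 and the sample is effectively ignored.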

Because the kernel width is a key parameter of MCC optimization and a suitable width is generally unknown before training, a feasible approach is to estimate the width adaptively during the training procedure:

$\bar{e}(n) = \frac{1}{n}\sum_{k=1}^{n} e(k)$    (7)


$\sigma^2(n) = \frac{1}{n}\sum_{k=1}^{n}\left(e(k) - \bar{e}(n)\right)^2$    (8)

Based on (7) and (8), the update rule can be expressed as:

$\omega(n+1) = \omega(n) + \eta\exp\!\left(-\frac{(e(n)-\bar{e}(n))^2}{2\sigma^2(n)}\right)e(n)X(n)$    (9)

In this case $\sqrt{2\pi}\,\sigma^3$ is no longer constant, but for simplicity it is again ignored. Note that the kernel width can also be estimated from only the recent (local) errors rather than the whole error history, since correntropy measures the local similarity of two random variables.
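Equations (7)-(9) combine into a small online filter that tracks the running error mean and variance and uses them as the kernel statistics. A sketch under the same linear-model assumption; the class name, the variance floor, and keeping the full error history (rather than a local window) are implementation choices, not from the paper:

```python
import numpy as np

class AdaptiveMCC:
    """MCC adaptive filter with the kernel width estimated online:
    running error mean (Eq. 7), variance (Eq. 8), update rule (Eq. 9)."""

    def __init__(self, dim, eta=0.1):
        self.w = np.zeros(dim)
        self.eta = eta
        self.errors = []  # full error history, as in Eqs. (7)-(8)

    def step(self, x, d):
        x = np.asarray(x, float)
        e = d - float(np.dot(self.w, x))
        self.errors.append(e)
        hist = np.asarray(self.errors)
        e_bar = float(hist.mean())                    # Eq. (7)
        var = float(((hist - e_bar) ** 2).mean())     # Eq. (8)
        var = max(var, 1e-12)  # guard: the variance is zero at n = 1
        self.w = self.w + self.eta * np.exp(
            -(e - e_bar) ** 2 / (2.0 * var)) * e * x  # Eq. (9)
        return e
```

On a noiseless linear target the filter behaves like a robust LMS: the prediction error shrinks as the coefficients converge, while the adaptive width keeps the exponential factor in a useful range.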

Computer Simulation and Conclusion

To evaluate the MCC algorithm, we use it to train a third-order Volterra predictor:

$\hat{y}(k+1) = a_0 + \sum_{i_1=0}^{L-1} a_{i_1}\,y(k-i_1) + \sum_{i_1 \le i_2} a_{i_1 i_2}\,y(k-i_1)\,y(k-i_2) + \cdots + \sum_{i_1 \le \cdots \le i_p} a_{i_1 \cdots i_p}\,y(k-i_1)\,y(k-i_2)\cdots y(k-i_p) + \cdots$    (10)
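A truncated version of (10) is linear in its coefficients once the product terms are collected into a single regressor vector, so the MCC updates (6) and (9) apply directly. A sketch of building that regressor for memory length L and a given order (the function name and term ordering are illustrative):

```python
from itertools import combinations_with_replacement
import numpy as np

def volterra_features(y_past, order=3):
    """Regressor for the truncated Volterra series of Eq. (10): a constant
    term (for a_0) plus every product y(k-i1)*...*y(k-im) with
    i1 <= ... <= im, for m = 1..order, over the memory
    y_past = [y(k), y(k-1), ..., y(k-L+1)]."""
    feats = [1.0]
    for m in range(1, order + 1):
        for idx in combinations_with_replacement(range(len(y_past)), m):
            feats.append(float(np.prod([y_past[i] for i in idx])))
    return np.array(feats)
```

With memory L = 3 and order 3, the regressor has 1 + 3 + 6 + 10 = 20 terms, and the prediction is simply the dot product of the coefficient vector with this regressor.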

In the simulation, we used the mean square error (MSE), here defined as the power ratio of the prediction error to the signal, to monitor performance:

$\mathrm{MSE} = 10\log_{10}\!\left[\frac{\sum_{i=1}^{K} e^2(i)}{\sum_{i=1}^{K} x^2(i)}\right]$    (11)
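Eq. (11) is an error-to-signal power ratio expressed in decibels. A one-line sketch (the function name is illustrative):

```python
import numpy as np

def mse_db(e, x):
    """Error-to-signal power ratio of Eq. (11), in dB."""
    e, x = np.asarray(e, float), np.asarray(x, float)
    return 10.0 * np.log10(np.sum(e**2) / np.sum(x**2))
```

For example, an error that is everywhere 10% of the signal gives a power ratio of 0.01, i.e., -20 dB.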

A complex chaotic map, the coupled logistic system, was used as the test system:

$x(n+1) = 4a\,x(n)(1-x(n)) + r\,x(n)\,y(n),$
$y(n+1) = 4a\,y(n)(1-y(n)) + r\,x(n)\,y(n)$    (12)

where $a = 0.7$ and $r = 0.64$.
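The map (12) can be iterated directly; note that both equations must be evaluated from the same old $(x, y)$ pair before either is overwritten. A sketch returning the $x$ component (the fixed initial values x0, y0 are illustrative; the paper uses a random initial value):

```python
import numpy as np

def coupled_logistic(n, a=0.7, r=0.64, x0=0.1, y0=0.2):
    """Iterate the coupled logistic map of Eq. (12); return the x series."""
    xs = np.empty(n)
    x, y = x0, y0
    for i in range(n):
        xs[i] = x
        # tuple assignment evaluates both right-hand sides with the old x, y
        x, y = (4.0 * a * x * (1.0 - x) + r * x * y,
                4.0 * a * y * (1.0 - y) + r * x * y)
    return xs
```

Using separate statements instead of the tuple assignment would feed the already-updated x into the y equation and simulate a different map.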

A time series of 5500 points was produced by Eq. (12) from a random initial value; the first 5000 points were used to train the prediction model, and the remaining points were employed as test data. All data were normalized to the range $[-1, 1]$ in advance:

$A = (\max(x) + \min(x))/2$
$B = (\max(x) - \min(x))/2$
$y = (x - A)/B$    (13)
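The normalization (13) is an affine map that places the series' extremes exactly at -1 and 1. A sketch (the function name is illustrative):

```python
import numpy as np

def normalize(x):
    """Map x into [-1, 1] by the affine transform of Eq. (13)."""
    x = np.asarray(x, float)
    A = (x.max() + x.min()) / 2.0  # midpoint of the data range
    B = (x.max() - x.min()) / 2.0  # half-range
    return (x - A) / B
```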

The simulation results are shown in Fig. 1 and Fig. 2.

Fig.1. Predicted values and actual values


Fig.2. The prediction error

From Fig. 1 we can see that the predicted values are very close to the actual values, and Fig. 2 shows that the prediction errors are very small, most of them below 0.1. The MSE is -16.8 dB, the mean absolute error is 0.06, and its standard deviation is also near 0.06. All these results show that the MCC-based adaptive algorithm performs well.

Acknowledgement

This work was supported by the research foundation of the Education Bureau of Sichuan Province and Xihua University through Grants No. 12ZB132 and No. Z1120944.

References

[1] H. Kantz and T. Schreiber, Nonlinear Time Series Analysis, Cambridge University Press, 2004.

[2] Gong Xiaofeng and C. H. Lai, "Improvement of the local prediction of chaotic time series," Phys. Rev. E, vol. 60, no. 5, pp. 5463-5468, Nov. 1999.

[3] Bu Yun, Wen Guang-Jun, Zhou Xiao-Jia, and Zhang Qiang, "A novel adaptive predictor for chaotic time series," Chin. Phys. Lett., vol. 26, no. 10, 100502, 2009.

[4] J. S. Zhang and X. C. Xiao, "Prediction of chaotic time series by using adaptive higher-order nonlinear Fourier infrared filter," Acta Physica Sinica, vol. 49, no. 7, 1221-07, 2000.

[5] Min Han, Jianhui Xi, Shiguo Xu, and Fu-Liang Yin, "Prediction of chaotic time series based on the recurrent predictor neural network," IEEE Trans. Signal Process., vol. 52, no. 12, pp. 3409-3416, Dec. 2004.

[6] Zhang Jia-Shu and Xiao Xian-Ci, "Predicting chaotic time series using recurrent neural network," Chin. Phys. Lett., vol. 17, no. 2, pp. 88-90, 2000.

[7] Badong Chen and J. C. Principe, "Maximum correntropy estimation is a smoothed MAP estimation," IEEE Signal Process. Lett., vol. 19, no. 8, pp. 491-494, Aug. 2012.

[8] Weifeng Liu, P. P. Pokharel, and J. C. Principe, "Correntropy: properties and applications in non-Gaussian signal processing," IEEE Trans. Signal Process., vol. 55, no. 11, pp. 5286-5298, Nov. 2007.
