[Lecture Notes in Computer Science] Knowledge-Based Intelligent Information and Engineering Systems Volume 3682 || APForecast: An Adaptive Forecasting Method for Data Streams


APForecast: An Adaptive Forecasting Method for Data Streams

Yong-li Wang1,2, Hong-bing Xu1, Yi-sheng Dong1, Xue-jun Liu1, and Jiang-bo Qian1

1 Department of Computer Science and Engineering, Southeast University, SiPaiLou No.2, Nanjing, 210096, China

wyl [email protected], {hbxu,ysdong,xuliu,jbqian}@seu.edu.cn

2 Department of Common Computer Teaching, Jiamusi University, Jiamusi 154007, P.R. China

Abstract. This research investigates continuous forecasting queries with an alterable forecasting-step over data streams. A novel Adaptive Precision forecasting method for forecasting a single attribute value of an item in a single stream (stream-value), called AForecast, is proposed. The concepts of dual slide windows and a forecasting-steps conduction operator are introduced. AForecast determines multiple forecasting-steps based on the change ratio of the stream-value, and forecasts the randomly varying stream-value on top of relatively precise predictions of the deterministic components of data streams. Based on the theory of interpolating wavelets and optimal linear Kalman filtering, the method can approximately achieve optimal forecasting precision. Experimental results on actual power load data show that the method provides accurate online prediction of stream-values.

1 Introduction

In many modern applications, such as industrial control and networking, as well as in new and emerging applications like sensor networks and pervasive computing, data are commonly viewed as infinite, possibly ordered sequences rather than as a finite data set stored on disk. Such (potentially) infinite ordered sequences of data are referred to as data streams [1]. The characteristics of data stream processing, such as online continuous computation, one-pass scanning and limited time and space resources, pose new challenges for traditional forecasting methods. For example, a fundamental objective of a power-system operating and control scheme is to maintain a match between the system's overall real-power load and generation. The Automatic Generation Control (AGC) loop addresses this objective by using system load and electrical frequency samples to periodically update the set-point power of key "swing" generators, with a control sample rate ranging from 1 to 10 min. To improve performance, emerging AGC strategies employ a look-ahead control algorithm that requires real-time estimates of the system's future load out to several samples, using a one- to ten-minute sample period (a typical traditional horizon of 30 to 120 min). An alterable forecasting-step strategy can provide flexible control tracking that adapts to the time-varying real-power load.

R. Khosla et al. (Eds.): KES 2005, LNAI 3682, pp. 957–963, 2005. © Springer-Verlag Berlin Heidelberg 2005


Artificial-intelligence methods are well suited to forecasting periodic components: they offer higher precision at the price of higher computational complexity. Extrapolation methods run fast but have difficulty forecasting the randomly varying non-linear component, so their forecasting precision is low. If either kind of method is applied to online forecasting on its own, the results will not be satisfactory.

We believe that a mixed method, combining the precision of complex forecasting methods with the speed of linear extrapolation methods, is a good choice. In this paper we propose a forecasting method over data streams, named the Adaptive Forecasting model (AForecast for short), that balances speed and precision through an adaptive strategy. It uses fewer forecasting points when the stream is relatively smooth and more forecasting points when the stream-value changes relatively intensely. Experiments show that it achieves a higher average forecasting precision with low computational complexity.

2 Related Works

At present, existing theories and methods for trend analysis on data streams mainly focus on analyzing the similarity, abnormality and differences of patterns [2], and there is very little literature on forecasting stream-values. A linear regression and interpolation method that forecasts the instantaneous stream-value over a future window of size w is proposed in [3]. Data streams have also been treated as time series, with ideas borrowed from control theory for resource management: [4] selects the Kalman filter as a general and adaptive filtering solution for conserving resources. In the power load forecasting domain, [5] compares and discusses three practical techniques for 30 min load forecasting, namely Fuzzy Logic (FL), Neural Networks (NN) and the Auto-Regressive model (AR), and concludes that FL and NN are good candidates for predicting very short-term load trends online.

However, none of the methods mentioned above makes use of the relatively stable long-horizon forecasting information (forecasting-step ≥ 1 h). Moreover, to the best of our knowledge, almost all methods forecast future stream-values with a fixed forecasting-step. Fixed forecasting-step methods perform poorly on data streams whose change periods are random.

3 Continuous Forecasting Query Model and Definition

A stream is an ordered sequence of items that resemble tuples in relational databases. In a data stream context, stream-value denotes a single attribute value of an item in a single stream, and forecasting-step denotes the interval between the actual measured stream-value and the forecast stream-value. In power systems research, stream-value denotes the load value.

In actual applications, a discrete data stream $s(t)$ is composed of deterministic components $s_d(t)$ and stochastic components $s_r(t)$, i.e. $s(t) = s_d(t) + s_r(t)$. The deterministic components of data streams are primarily the result of cause-effect dependence on measurable inputs such as time of day, day of week and temperature. The stochastic components are primarily driven by random variations in the acquisition environment. Since $s_d(t)$ is comparatively easy to forecast accurately, the remaining task in forecasting the stream-value is to forecast $s_r(t)$.

To implement continuous forecasting of the two components of data streams, we introduce the dual slide window. A window of length I, called the I interval window, is the base unit for updating the deterministic components. A window of alterable length, called the alterable interval window, is the base unit for updating the stochastic components. The architecture of the AForecast model is illustrated in Fig. 1.
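The dual slide window can be pictured as two sliding-window averages over the same stream. The Python sketch below is a minimal illustration under our own assumptions, not the authors' implementation: window lengths are counted in samples of Δt, each window keeps a running sum so that every update costs O(1), and the class names `MovingAverageWindow` and `DualSlideWindow` are hypothetical. The same structure can also serve as the Moving Average Filter listed as component 1 below.

```python
from collections import deque
from typing import Deque, Tuple


class MovingAverageWindow:
    """Mean of the most recent `length` samples, updated in O(1) with a running sum."""

    def __init__(self, length: int) -> None:
        if length < 1:
            raise ValueError("window length must be at least 1 sample")
        self.length = length
        self.buf: Deque[float] = deque()
        self.total = 0.0

    def _trim(self) -> None:
        # Drop the oldest samples until the window fits its current length.
        while len(self.buf) > self.length:
            self.total -= self.buf.popleft()

    def push(self, x: float) -> float:
        """Add one sample and return the current window average."""
        self.buf.append(x)
        self.total += x
        self._trim()
        return self.total / len(self.buf)

    def resize(self, length: int) -> None:
        """Change the window length (used by the alterable interval window)."""
        if length < 1:
            raise ValueError("window length must be at least 1 sample")
        self.length = length
        self._trim()


class DualSlideWindow:
    """Hypothetical pairing of the fixed I interval window and the alterable window.

    Lengths are counted in samples of Δt (e.g. I / Δt samples for the fixed window).
    """

    def __init__(self, i_samples: int, alt_samples: int) -> None:
        self.fixed = MovingAverageWindow(i_samples)
        self.alterable = MovingAverageWindow(alt_samples)

    def push(self, x: float) -> Tuple[float, float]:
        """Feed one sample to both windows; return (I interval average, alterable average)."""
        return self.fixed.push(x), self.alterable.push(x)
```

Calling `resize` on the alterable window is one way to let its length follow the current forecasting-step. On very long streams, occasionally recomputing `total` from the buffer would limit floating-point drift in the running sum.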

[Fig. 1 diagram blocks: Data stream s; Sampling at intervals of I, kΔt and Δt; I interval and Alterable Interval Moving Average Filters; I interval Forecaster; Tracer for Stream-value Change Ratio; Interpolating Wavelet Processor; I interval and Alterable Interval Error Predictors. Signals: f0, f1, f2, s1, s2, e1, e2, p1, p2, p3, r.]

Fig. 1. Architecture of AForecast model

The AForecast model comprises five components:

1) Moving Average Filter filters out noise in the original data stream. It comes in two variants, I interval and alterable interval. It computes the average stream-value in a sliding window and treats that average as the new measured stream-value.

2) I Interval Forecaster forecasts the deterministic component of the data stream. Two consecutive deterministic stream-value forecasts are separated by an interval I (I > k).

3) Tracer for Stream-value Change Ratio maps stream-value changes to different forecasting-steps. We define the stream-value change ratio $\delta(t) = |s(t) - s(t-\Delta t)|/\Delta t$ as a quantification of stream-value changes, where $\delta(t) \in [0, (\max s(t) - \min s(t))/\Delta t]$.

4) Interpolating Wavelet Processor generates more refined forecast stream-values using a binary-scale interpolating wavelet method with multiple interpolating resolutions. The interpolating resolution $r$ satisfies $2^r \le \lfloor I/\Delta t \rfloor$, so the maximum binary-scale interpolating resolution is $r_{\max} = \lfloor \log_2(I/\Delta t) \rfloor$. To map changes of stream-value to interpolating resolution, we define the forecasting-steps conduction operator $D_t$ as $D_t(\delta) = \delta(t) \cdot r_{\max} / (\max s(t) - \min s(t))$.

5) Error Predictor consists of an I interval error predictor, which estimates the error of the deterministic component of the stream-value in a slow-moving manner, and an alterable interval error predictor, which estimates the error of the stochastic component in a quick-moving manner. Its objective is to estimate the future value of the random error optimally.
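Components 3 and 4 combine into a short computation that turns each new sample into an interpolating resolution. The sketch below assumes that time is measured in units of Δt (so Δt = 1, which keeps $D_t$ within $[0, r_{\max}]$), that the caller tracks max s(t) and min s(t) over the observed history, and that $D_t$ is converted to an integer resolution by rounding up and clipping. That rounding rule and all function names are hypothetical, since the paper does not state them.

```python
import math


def max_resolution(i_samples: int) -> int:
    """r_max = floor(log2(I / Δt)), where i_samples = I / Δt samples per I interval."""
    if i_samples < 1:
        raise ValueError("I must span at least one sample")
    return i_samples.bit_length() - 1  # exact integer floor(log2) for positive ints


def change_ratio(s_now: float, s_prev: float) -> float:
    """δ(t) = |s(t) - s(t - Δt)| / Δt, with time measured in units of Δt (Δt = 1)."""
    return abs(s_now - s_prev)


def conduction(delta: float, r_max: int, s_min: float, s_max: float) -> float:
    """D_t(δ) = δ(t) * r_max / (max s - min s); lies in [0, r_max] when Δt = 1."""
    span = s_max - s_min
    if span <= 0:
        return 0.0  # flat history: no change to track
    return delta * r_max / span


def resolution(d_t: float, r_max: int) -> int:
    """Hypothetical rule: round D_t up to an integer resolution r, clipped to [0, r_max]."""
    return min(r_max, max(0, math.ceil(d_t)))
```

For instance, with I = 1 hour and a hypothetical Δt of 1 minute, I/Δt = 60 and r_max = 5, so at most 2^5 = 32 sub-intervals are interpolated within one I interval; a small change maps to a low resolution (few forecasting points) and a change as large as the whole historical range maps to r_max.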

The processing procedure is as follows. First, the I interval forecaster provides two forecast stream-values f0 and f0′ at intervals of I. When the current timestamp passes I, e1 = s1 − f0 (where s1 denotes the I interval average stream-value) is fed to the I interval error predictor, which generates the error estimate p1; the improved I interval forecast stream-value is then f1 = p1 + f0′. Next, the interpolating wavelet processor generates interpolated forecast values at alterable steps between s1 and f1, under the control of the resolution r produced by the tracer for stream-value change ratio. Then e2 = s2 − f2′ (where s2 denotes the alterable interval average stream-value and f2′ denotes the previous forecast stream-value p3) is fed to the alterable interval error predictor, which generates the alterable-step error estimate p2. Finally, the final forecast stream-value is p3 = p2 + f2.
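The arithmetic of this procedure is sketched below in Python. It is a hypothetical illustration of how the signals combine, not the authors' code: the two error predictors are passed in as plain functions, and linear interpolation between s1 and f1 stands in for the interpolating wavelet processor so that the data flow stays visible.

```python
from dataclasses import dataclass
from typing import Callable

ErrorPredictor = Callable[[float], float]


@dataclass
class StepResult:
    f1: float  # improved I interval forecast, f1 = p1 + f0'
    f2: float  # interpolated forecast at the current alterable step
    p3: float  # final forecast, p3 = p2 + f2


def aforecast_step(
    f0: float,        # I interval forecast for the interval that just ended
    f0_next: float,   # f0': I interval forecast for the next interval
    s1: float,        # I interval average stream-value
    s2: float,        # alterable interval average stream-value
    p3_prev: float,   # f2': the previous final forecast p3
    frac: float,      # position of the current step between s1 and f1, in [0, 1]
    predict_i: ErrorPredictor,
    predict_alt: ErrorPredictor,
) -> StepResult:
    e1 = s1 - f0                 # I interval forecasting error
    p1 = predict_i(e1)           # estimated I interval error
    f1 = p1 + f0_next            # improved I interval forecast
    # Stand-in for the interpolating wavelet processor: linear interpolation s1 -> f1.
    f2 = s1 + frac * (f1 - s1)
    e2 = s2 - p3_prev            # alterable interval error
    p2 = predict_alt(e2)         # estimated alterable-step error
    return StepResult(f1=f1, f2=f2, p3=p2 + f2)
```

For a quick check, `predict_i = predict_alt = lambda e: e` (persistence: the next error equals the last one) can be passed in; the paper's error predictors are Kalman-filter based instead.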

4 Adaptive Interpolating Algorithm

Instead of equal-interval spline interpolation, we use interpolating wavelets driven by a dynamic forecasting-steps conduction operator that follows the change ratio of the stream-value. This gives the method adaptive precision when forecasting stream-values. To satisfy the requirements of online processing and one-pass computation, we design an improved Kalman filtering algorithm with an amnesia (forgetting) factor to estimate the forecasting error. For reasons of space, we omit the implementation details and describe only the key interpolating algorithm.
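Since the paper omits its filter details, the following is only a generic sketch of a scalar Kalman filter with a forgetting (fading-memory) factor, built on assumptions of our own: the forecasting error is modelled as a random walk observed with noise, and the factor `lam` in (0, 1] divides the prior variance so that older observations are discounted. The class name and parameter values are illustrative, not taken from the paper.

```python
class FadingKalman:
    """Scalar Kalman filter with a forgetting factor, used to predict a forecasting error.

    Assumed model (illustrative, not from the paper):
        e[k] = e[k-1] + w[k],  w ~ N(0, q)    random-walk error
        z[k] = e[k] + v[k],    v ~ N(0, r)    observed error (e.g. e1 or e2)
    """

    def __init__(self, q: float = 1e-3, r: float = 1e-2, lam: float = 0.95,
                 x0: float = 0.0, p0: float = 1.0) -> None:
        if not 0.0 < lam <= 1.0:
            raise ValueError("forgetting factor lam must lie in (0, 1]")
        self.q, self.r, self.lam = q, r, lam
        self.x, self.p = x0, p0  # error estimate and its variance

    def update(self, z: float) -> float:
        """Absorb one observed error z; return the one-step-ahead error estimate."""
        # Predict: dividing by lam < 1 inflates the prior variance, discounting old data.
        x_prior = self.x
        p_prior = self.p / self.lam + self.q
        # Correct with the new observation.
        gain = p_prior / (p_prior + self.r)
        self.x = x_prior + gain * (z - x_prior)
        self.p = (1.0 - gain) * p_prior
        # Under the random-walk model the predicted next error equals the filtered estimate.
        return self.x
```

Each update is O(1) in time and memory, which fits one-pass processing. A smaller `lam` makes the estimate follow changes in the error faster at the cost of more noise; `lam = 1` recovers the ordinary Kalman filter.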

Definition 1 (Interpolating wavelet). For $f \notin V_j$, we define a projection operator onto $V_j$ that interpolates the stream-value samples $f(2^j k)$ in a binary-scale manner. The interpolating equation [7] is:

$$P_{V_j} f(t) = \sum_{k=-\infty}^{+\infty} f(2^j k)\, \phi_j(t - 2^j k). \qquad (1)$$

where $P_{V_j} f$ is an interpolating polynomial of order $P-1$, $\phi$ is an interpolating function that produces new interpolated values at different sampling intervals, $\{\phi_j(t - 2^j k)\}_{k \in \mathbb{Z}}$ is a Riesz basis of the space generated by $\phi$, and $j$ depends on the forecasting-steps conduction operator $D_t(\delta)$. When the interpolating resolution is halved, a more refined interpolated value of the projection $P_{V_j} f(t)$ is obtained by adding "detail"; this detail corrects the difference between $P_{V_j} f(2^j(k+1/2))$ and the sampled midpoint $f(2^j(k+1/2))$. $\psi_{j,k} = \phi_{j-1,2k+1}$ is the interpolating wavelet under resolution conduction.

Multiresolution interpolating algorithm is shown in Fig. 2.

Fig. 2. Multiresolution Interpolating Wavelet Algorithm
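Since the paper does not specify the interpolating function φ, the sketch below assumes the 4-point Deslauriers-Dubuc scheme (P = 4, which reproduces cubic polynomials), a standard interpolating-wavelet choice. Each refinement halves the sampling interval by predicting every midpoint from its four nearest samples; the detail is the actual midpoint value minus that prediction, matching the role of the detail term in Definition 1. The fixed weight tuple mirrors the precomputed coefficient list described next, the linear fallback at the two ends is our own simplification, and all names are hypothetical.

```python
from typing import List, Sequence

# Hypothetical choice: 4-point Deslauriers-Dubuc weights (cubic Lagrange midpoint rule).
# Kept as a precomputed constant so each interpolated point costs O(1).
DD4 = (-1.0 / 16, 9.0 / 16, 9.0 / 16, -1.0 / 16)


def refine(values: Sequence[float]) -> List[float]:
    """Halve the sampling interval by inserting one predicted point between each pair."""
    n = len(values)
    if n < 2:
        return list(values)
    out: List[float] = []
    for k in range(n - 1):
        out.append(values[k])
        if 1 <= k <= n - 3:
            # Interior: predict the midpoint from the four nearest coarse samples.
            mid = sum(w * v for w, v in zip(DD4, values[k - 1 : k + 3]))
        else:
            # Ends: fall back to linear interpolation (a simplification).
            mid = 0.5 * (values[k] + values[k + 1])
        out.append(mid)
    out.append(values[-1])
    return out


def interpolate(values: Sequence[float], r: int) -> List[float]:
    """Apply r dyadic refinements: n samples become (n - 1) * 2**r + 1 samples."""
    out = list(values)
    for _ in range(r):
        out = refine(out)
    return out


def details(fine: Sequence[float]) -> List[float]:
    """Wavelet details: each odd-indexed sample minus its prediction from the even samples."""
    if len(fine) % 2 == 0:
        raise ValueError("expected an odd number of samples (2m + 1)")
    predicted = refine(fine[::2])
    return [fine[i] - predicted[i] for i in range(1, len(fine), 2)]
```

Here `interpolate([s1, f1], r)` would place 2**r - 1 intermediate forecasts between s1 and f1; with only two endpoints the scheme reduces to linear interpolation, so in practice one would also pass neighbouring samples to let the cubic rule act.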

To reduce the time cost of online forecasting, we implement the otherwise expensive interpolating wavelet algorithm with a lookup-list method: every interpolating coefficient is stored in a linear list in advance, so the right coefficient can be located by its index. The time complexity of generating each transitional interpolated point is O(1), and computing the forecasting-step from δ(t) is also O(1). Since the maximum level of forecasting-step (i.e. interpolating resolution) is ⌊log2(I/Δt)⌋, the resulting time complexity of generating each interpolated value is O(log2(I/Δt)).

5 Experiment Evaluation

To validate the performance of AForecast, we use actual load data from a power system in Nanjing. AForecast reads one tuple from the data set at a fixed interval (10 milliseconds) and sends it to the model to simulate an AGC system. The I interval forecaster is implemented by a weather-sensitive BP neural network, and I is set to 1 hour. We also implement a 15 min level forecasting method based on linear extrapolation (called 15mForecast) and a 5 min level forecasting method based on a mixed forecasting technique [6] (called 5mForecast), and carry out comparative experiments on average forecasting precision and average runtime. The experiments are conducted on a 2.66 GHz Pentium machine with 256 MB main memory and an 80 GB hard disk. We implement the main part of the algorithm in Visual C++ 6.0 on Windows 2000 Server and use MATLAB toolbox functions to implement the remaining parts of the comparison algorithms.

Experiment 1: testing the overall average forecasting precision of AForecast. We use 36 hours of load data from 2002/3/15 to 2002/3/16. Generating one small-interval forecast load value takes 46.3 ms on average. The average forecasting precision

$$\bar{p} = 1 - \sqrt{\frac{1}{n}\sum_{i=1}^{n} p_i^2}, \qquad p_i = \frac{v_i^f - v_i^a}{v_i^a},$$

where $v_i^f$ is the forecast stream-value and $v_i^a$ is the actual measured stream-value, is 95.496 percent.
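The precision measure is 1 minus the root-mean-square relative error, so a perfect forecast scores 1.0. A minimal Python version follows; the function name is ours.

```python
import math
from typing import Sequence


def average_precision(forecast: Sequence[float], actual: Sequence[float]) -> float:
    """p_bar = 1 - sqrt(mean(p_i**2)), with relative error p_i = (v_i^f - v_i^a) / v_i^a."""
    if len(forecast) != len(actual) or not forecast:
        raise ValueError("need two non-empty sequences of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("actual values must be non-zero (relative error)")
    rel = [(f - a) / a for f, a in zip(forecast, actual)]
    return 1.0 - math.sqrt(sum(e * e for e in rel) / len(rel))
```

For example, forecasts of 110 and 90 against actual values of 100 and 100 give relative errors of +0.1 and -0.1, an RMS of 0.1, and a precision of 0.9.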

[Fig. 3 plot: x-axis Time/hours (0 to 40); y-axis Load (Stream-value)/MW (×10^4, 0.9 to 1.4); curves: Alterable interval forecast, 1 hour interval forecast, Actual measured load.]

Fig. 3. 36-hour curves and discrete points of the forecast load and the actual measured load

Fig. 3 shows the actual measured load curve, the alterable forecasting-step load forecast curve and the discrete points of the 1 hour load forecast. As expected, forecasting precision does not degrade noticeably during periods of intense load variation, because the alterable forecasting-step forecaster adapts to changes in the stream-value; this validates the improvement brought by the multiresolution interpolating strategy. The highest forecasting precision occurs just after a new I interval (1 hour) forecast stream-value is generated, because at that point the I interval forecast stream-value has a relatively weak effect on the small-interval forecast stream-values. The forecasting precision then decreases gradually until the I interval (1 hour) forecast stream-value is updated. In short, the closer a forecast is to the position of the most recent actual measured stream-value, the higher its forecasting precision.

Experiment 2: testing the average forecasting precision of AForecast at different forecasting-steps. We compare it with the other two algorithms at the 5 min and 15 min levels. The average forecasting precision of AForecast at different forecasting-steps is computed adaptively, except at the specified positions 300 s and 900 s. At the 5 min level, the average forecasting precision of AForecast is lower than that of 5mForecast but higher than that of 15mForecast. At the 15 min level, the average forecasting precision of AForecast is higher than that of 15mForecast.

Based on the experimental results we conclude that: 1) the 36-hour forecasting curve shows that our method adapts well both at inflection points and during periods when the stream-value fluctuates quickly; 2) the overall average forecasting precision of AForecast is the highest among the three algorithms.

6 Conclusions

In this paper we investigate the problem of stream-value forecasting in a typical data stream application. The proposed AForecast method combines the accuracy of non-linear forecasting methods with the speed of linear forecasting methods. Its key idea is to adaptively set up multiple forecasting-steps according to how the stream-value varies in different periods, in a one-pass scanning manner. Compared with conventional forecasting methods, our method has two advantages: much higher average precision and a wider choice of forecasting-steps. Experiments show that the proposed method is a feasible solution for online forecasting of time-varying data streams.

References

1. Babcock, B., Babu, S., Datar, M., Motwani, R. and Widom, J.: Models and Issues in Data Streams. In: Proc. ACM Symp. on Principles of Database Systems (2002) 1–16

2. Nasraoui, O., Rojas, C. and Cardona, C.: Single Pass Mining of Evolving Trends in Web Data with Explicit Retrieval Similarity Measures. In: Proc. International Web Dynamics Workshop, International World Wide Web Conference, New York, NY (May 2004)

3. Faloutsos, C.: Stream and Sensor Data Mining. Tutorial at the 9th International Conference on Extending Database Technology (EDBT 2004), Heraklion, Greece (2004) 25–27

4. Jain, A., Chang, E. Y. and Wang, Y. F.: Adaptive Stream Resource Management Using Kalman Filters. In: SIGMOD Conference (2004) 11–22

5. Liu, K., Subbarayan, S., Shoults, R. R., Manry, M. T., Kwan, C., Lewis, F. L. and Naccarino, J.: Comparison of Very Short-Term Load Forecasting Techniques. IEEE Transactions on Power Systems, Vol. 11(2) (1996)

6. Trudnowski, D. J., McReynolds, W. L. and Johnson, J. M.: Real-Time Very Short-Term Load Prediction for Power-System Automatic Generation Control. IEEE Transactions on Control Systems Technology, Vol. 9(2) (Mar 2001) 254–260

7. Mallat, S.: A Wavelet Tour of Signal Processing, 2nd edn. Academic Press (1999) 221–226