21
LETTER Communicated by John Platt Accurate On-line Support Vector Regression Junshui Ma [email protected] Aureon Biosciences Corp., 28 Wells St., Yonkers, NY 10701, U.S.A. James Theiler [email protected] Simon Perkins [email protected] NIS-2, Los Alamos National Laboratory, Los Alamos, NM 87545, U.S.A. Batch implementations of support vector regression (SVR) are inef cient when used in an on-line setting because they must be retrained from scratch every time the training set is modi ed. Following an incremen- tal support vector classi cation algorithm introduced by Cauwenberghs and Poggio (2001), we have developed an accurate on-line support vec- tor regression (AOSVR) that ef ciently updates a trained SVR function whenever a sample is added to or removed from the training set. The updated SVR function is identical to that produced by a batch algorithm. Applications of AOSVR in both on-line and cross-validation scenarios are presented. In both scenarios, numerical experiments indicate that AOSVR is faster than batch SVR algorithms with both cold and warm start. 1 Introduction Support vector regression (SVR) ts a continuous-valued function to data in a way that shares many of the advantages of support vector machine (SVM) classi cation. Most algorithms for SVR (Smola & Sch¨ olkopf, 1998; Chang & Lin, 2002) require that training samples be delivered in a single batch. For applications such as on-line time-series prediction or leave-one-out cross- validation, a new model is desired each time a new sample is added to (or removed from) the training set. Retraining from scratch for each new data point can be very expensive. Approximate on-line training algorithms have previously been proposed for SVMs (Syed, Liu, & Sung, 1999; Csato & Opper, 2001; Gentile, 2001; Graepel, Herbrich, & Williamson, 2001; Herb- ster, 2001; Li & Long, 1999; Kivinen, Smola, & Williamson, 2002; Ralaivola & d’Alche-Buc, 2001). We propose an accurate on-line support vector re- gression (AOSVR) algorithm that follows the approach of Cauwenberghs and Poggio (2001) for incremental SVM classi cation. Neural Computation 15, 2683–2703 (2003) c ° 2003 Massachusetts Institute of Technology

Accurate On-line Support Vector Regression · 2006. 8. 28. · Accurate On-line Support Vector Regression 2689 In the special case whenSset is empty, according to equations 3.1 and

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Accurate On-line Support Vector Regression · 2006. 8. 28. · Accurate On-line Support Vector Regression 2689 In the special case whenSset is empty, according to equations 3.1 and

LETTER Communicated by John Platt

Accurate On-line Support Vector Regression

Junshui MajunshuimayahoocomAureon Biosciences Corp 28 Wells St Yonkers NY 10701 USA

James TheilerjtlanlgovSimon PerkinssperkinslanlgovNIS-2 Los Alamos National Laboratory Los Alamos NM 87545 USA

Batch implementations of support vector regression (SVR) are inefcientwhen used in an on-line setting because they must be retrained fromscratch every time the training set is modied Following an incremen-tal support vector classication algorithm introduced by Cauwenberghsand Poggio (2001) we have developed an accurate on-line support vec-tor regression (AOSVR) that efciently updates a trained SVR functionwhenever a sample is added to or removed from the training set Theupdated SVR function is identical to that produced by a batch algorithmApplications of AOSVRin both on-line and cross-validation scenarios arepresented In both scenarios numerical experiments indicate that AOSVRis faster than batch SVR algorithms with both cold and warm start

1 Introduction

Support vector regression (SVR) ts a continuous-valued function to data ina way that shares many of the advantages of support vector machine (SVM)classication Most algorithms for SVR (Smola amp Scholkopf 1998 Chang ampLin 2002) require that training samples be delivered in a single batch Forapplications such as on-line time-series prediction or leave-one-out cross-validation a new model is desired each time a new sample is added to(or removed from) the training set Retraining from scratch for each newdata point can be very expensive Approximate on-line training algorithmshave previously been proposed for SVMs (Syed Liu amp Sung 1999 Csatoamp Opper 2001 Gentile 2001 Graepel Herbrich amp Williamson 2001 Herb-ster 2001 Li amp Long 1999 Kivinen Smola amp Williamson 2002 Ralaivolaamp drsquoAlche-Buc 2001) We propose an accurate on-line support vector re-gression (AOSVR) algorithm that follows the approach of Cauwenberghsand Poggio (2001) for incremental SVM classication

Neural Computation 15 2683ndash2703 (2003) cdeg 2003 Massachusetts Institute of Technology

2684 J Ma J Theiler and S Perkins

This article is organized as follows The formulation of the SVR problemand the development of the Karush-Kuhn- Tucker (KKT) conditions thatits solution must satisfy are presented in section 2 The incremental SVRalgorithm is derived in section 3 and a decremental version is describedin section 4 Two applications of the AOSVR algorithm are presented insection 5 along with a comparison to batch algorithms uing both cold startand warm start

2 Support Vector Regression and the Karush-Kuhn Tucker Conditions

A more detailed version of the following presentation of SVR theory can befound in Smola and Scholkopf (1998)

Given a training set T D fxi yi i D 1 cent cent cent lg where xisup2RN and yisup2R weconstruct a linear regression function

f x D WT8x C b (21)

on a feature space F Here W is a vector in F and 8(x) maps the input xto a vector in F The W and b in equation 21 are obtained by solving anoptimization problem

minWb

P D 12

WTW C ClX

iD1

raquoi C raquocurreni

st yi iexcl WT8x C b middot C raquoi (22)

WT8x C b iexcl yi middot C raquo curreni

raquoi raquocurreni cedil 0 i D 1 cent cent cent l

The optimization criterion penalizes data points whose y-values differ fromf (x) by more than The slack variables raquo and raquocurren correspond to the size ofthis excess deviation for positive and negative deviations respectively asshown in Figure 1

Introducing Lagrange multipliers reg regcurren acute and acutecurren we can write the cor-responding Lagrangian as

LP D12

WTW C ClX

iD1

raquoi C raquocurreni iexcl

lX

iD1

acuteiraquoi C acutecurreni raquocurren

i

iexcllX

iD1

regi C raquoi C yi iexcl WT8xi iexcl b

iexcllX

iD1

regcurreni C raquocurren

i iexcl yi C WT8xi C b

st regi regcurreni acutei acutecurren

i cedil 0

Accurate On-line Support Vector Regression 2685

f(x)

f(x)-e

f(x)+e

x

x

x

y

Figure 1 The -insensitive loss function and the role of the slack variables raquo andraquo curren

This in turn leads to the dual optimization problem

minregregcurren

D D 12

lX

iD1

lX

jD1

Qijregi iexcl regcurreni regj iexcl regcurren

j C lX

iD1

regi C regcurreni

iexcllX

iD1

yiregi iexcl regcurreni

st 0 middot regi regcurreni middot C i D 1 l (23)

lX

iD1

regi iexcl regcurreni D 0

where Qij D copyxiTcopyxj D Kxi xj Here Kxi xj is a kernel function

(Smolaamp Scholkopf 1998) Given the solution of equation 23 the regressionfunction (21) can be written as

f x DlX

iD1

regi iexcl regcurreni Kxi x C b (24)

2686 J Ma J Theiler and S Perkins

The Lagrange formulation of equation 23 can be represented as

LD D12

lX

iD1

regi iexcl regcurreni regj iexcl regcurren

j C

lX

iD1

regi C regcurreni iexcl

lX

iD1

yiregi iexcl regcurreni

iexcllX

iD1

plusmniregi C plusmncurreni regcurren

i ClX

iD1

[uiregi iexcl C C ucurreni regcurren

i iexcl C]

C sup3

lX

iD1

regi iexcl regcurreni (25)

where plusmncurreni ucurren

i and sup3 are the Lagrange multipliers Optimizing this La-grangian leads to the Karush-Kuhn-Tucker (KKT) conditions

LD

regiD

lX

jD1

Qijregj iexcl regcurrenj C iexcl yi C sup3 iexcl plusmni C ui D 0

LD

regcurreni

D iexcllX

jD1

Qijregj iexcl regcurrenj C C yi iexcl sup3 iexcl plusmncurren

i C ucurreni D 0 (26)

plusmncurreni cedil 0 plusmncurren

i regcurreni D 0

ucurreni cedil 0 ucurren

i regcurreni iexcl C D 0

Note that sup3 in equation 26 is equal to b in equations 21 and 24 at optimality(Chang amp Lin 2002)

According to the KKT conditions 26 at most one of regi and regcurreni will be

nonzero and both are nonnegative Therefore we can dene a coefcientdifference microi as

microi D regi iexcl regcurreni (27)

and note that microi determines both regi and regcurreni

Dene a margin function hxi for the ith sample xi as

hxi acute f xi iexcl yi DlX

jD1

Qijmicroj iexcl yi C b (28)

Combining equations 26 27 and 28 we can obtain

8gtgtgtgtgtlt

gtgtgtgtgt

hxi cedil microi D iexclChxi D iexclC lt microi lt 0

iexcl middot hxi middot microi D 0

hxi D iexcl 0 lt microi lt Chxi middot iexcl microi D C

(29)

Accurate On-line Support Vector Regression 2687

There areve conditions in equation 29 compared to the three conditionsin support vector classication (see equation 2 in Cauwenberghs amp Poggio2001) but like the conditions in support vector classication they can beidentied with three subsets into which the samples in training set T can beclassied The difference is that two of the subsets (E and S) are themselvescomposed of two disconnected components depending on the sign of theerror f xi iexcl yi

The E Set Error support vectors E D fi j jmicroij D Cg (210)

The S Set Margin support vectors S D fi j 0 lt jmicroij lt Cg (211)

The R Set Remaining samples R D fi j microi D 0g (212)

3 Incremental Algorithm

The incremental algorithm updates the trained SVR function whenever anew sample xc is added to the training set T The basic idea is to changethe coefcient microc corresponding to the new sample xc in a nite number ofdiscrete steps until it meets the KKT conditions while ensuring that theexisting samples in T continue to satisfy the KKT conditions at each stepIn this section we rst derive the relation between the change of microc or 1microcand the change of other coefcients under the KKT conditions and thenpropose a method to determine the largest allowed 1microc for each step Apseudocode description of this algorithm is provided in the appendix

31 Derivation of the Incremental Relations Let xc be a new trainingsample that is added to T We initially set microc D 0 and then gradually change(increase or decrease) the value of microc under the KKT conditions equation 29

According to equations 26 27 and 29 the incremental relation between1hxi 1microi and 1b is given by

1hxi D Qic1microc ClX

iD1

Qij1microj C 1b (31)

From the equality condition in equation 23 we have

microc ClX

iD1

microi D 0 (32)

Combining equations 29 through 212 and 31 and 32 we obtainX

j2S

Qij1microj C 1b D iexclQic1microc where i 2 S

X

j2S

1microj D iexcl1microc (33)

2688 J Ma J Theiler and S Perkins

If we dene the index of the samples in the S set as

S D fs1 s2 sls g (34)

equation 33 can be represented in matrix form as

2

6664

0 1 cent cent cent 11 Qs1s1 cent cent cent Qs1sls

1 Qsls s1 cent cent cent Qsls sls

3

7775

2

6664

1b1micros1

1microsls

3

7775 D iexcl

2

6664

1Qs1c

Qsls c

3

7775 1microc (35)

that is2

6664

1b1micros1

1microsls

3

7775 D macr1microc (36)

where

macr D

2

6664

macr

macrs1

macrsls

3

7775 D iexclR

2

6664

1Qs1c

Qsls c

3

7775

where R D

2

6664

0 1 cent cent cent 11 Qs1s1 cent cent cent Qs1sls

1 Qsls s1 cent cent cent Qsls sls

3

7775

iexcl1

(37)

Dene a non-S or N set as N D E [ R D fn1 n2 nlng Combiningequations 29 through 212 31 and 36 we obtain

2

6664

1hxn1

1hxn2

1hxnln

3

7775D deg 1microc (38)

where

deg D

2

6664

Qn1cQn2c

Qnln c

3

7775C

2

6664

1 Qn1s1 cent cent cent Qn1sls

1 Qn2s1 cent cent cent Qn2sls

1 Qnln s1 cent cent cent Qnln sls

3

7775 macr (39)

Accurate On-line Support Vector Regression 2689

In the special case when S set is empty according to equations 31 and 32equation 39 simplies to 1hxn D 1b for all n 2 E [ R

Given 1microc we can update microi i 2 S and b according to equation 36 andupdate hxi i 2 N according to equation 38 Moreover equation 29 sug-gests that microi i 2 N and hxi i 2 S are constant if the S set stays unchangedTherefore the results presented in this section enable us to update all the microiand hxi given 1microc In the next section we address the question of how tond an appropriate 1microc

32 AOSVR Bookkeeping Procedure Equations 36 and 38 hold onlywhen the samples in the S set do not change membership Therefore 1microc ischosen to be the largest value that either can maintain the S set unchangedor lead to the termination of the incremental algorithm

The rst step is to determine whether the change 1microc should be positiveor negative According to equation 29

sign1microc D signyc iexcl f xc D signiexclhxc (310)

The next step is to determine a bound on 1microc imposed by each sample inthe training set To simplify exposition we consider only the case 1microc gt 0and remark that the case 1microc lt 0 is similarFor the new sample xc there are two cases

Case 1 hxc changes from hxc lt iexcl to hxc D iexcl and the newsample xc is added to the S set and the algorithm terminates

Case 2 If microc increases from microc lt C to microc D C the new sample xc is addedto the E set and the algorithml terminates

For each sample xi in the set S

Case 3 If microi changes from 0 lt jmicroij lt C to jmicroij D C sample xi changesfrom the S set to the E set If microi changes to microi D 0 sample xi changesfrom the S set to the R set

For each sample xi in the set E

Case 4 If hxi changes from jhxij gt to jhxij D xi is moved fromthe E set to the S set

For each sample xi in the set R

Case 5 If hxi changes from jhxij lt to jhxij D xi is moved fromthe R set to the S set

The bookkeeping procedure is to trace each sample in the training setT against these ve cases and determine the allowed 1microc for each sampleaccording to equation 36 or 38 The nal 1microc is dened as the one withminimum absolute value among all the possible 1microc

2690 J Ma J Theiler and S Perkins

33 Efciently Updating the R Matrix The matrix R that is used inequation 37

R D

2

6664

0 1 cent cent cent 11 Qs1s1 cent cent cent Qs1sls

1 Qsls s1 cent cent cent Qsls sls

3

7775

iexcl1

(311)

must be updated whenever the S set is changed Following Cauwenberghsand Poggio (2001) we can efciently update R without explicitlycomputingthe matrix inverse When the kth sample xsk in the S set is removed from theS set the new R can be obtained as follows

Rnew D RII iexclRIkRkI

Rkk where

I D [1 cent cent cent k k C 2 cent cent cent Sls C 1] (312)

When a new sample is added to S set the new R can be updated as follows

Rnew D

2

6664

0

R

00 cent cent cent 0 0

3

7775 C1degi

micromacr1

para poundmacrT 1

curren (313)

where macr is dened as

macr D iexclR

2

6664

1Qs1 i

Qsls i

3

7775

and degi is dened as

degi D Qii C

2

6664

1Qs1i

Qsls i

3

7775 macr

when the sample xi was moved from E set to R set In contrast when thesample xc is the sample added to S set macr is can be obtained according toequation 37 and degi is the last element of deg dened in equation 39

34 Initialization of the Incremental Algorithm An initial SVR solu-tion can be obtained from a batch SVR solution and in most cases that is themost efcient approach But it is sometimes convenient to use AOSVR to

Accurate On-line Support Vector Regression 2691

produce a full solution from scratch An efcient starting point is the two-sample solution Given a training set T D fx1 y1 x2 y2g with y1 cedil y2the solution of equation 23 is

micro1 D maxsup3

0 minsup3

Cy1 iexcl y2 iexcl 2

2K11 iexcl K12

acuteacute

micro2 D iexclmicro1

b D y1 D y2=2 (314)

The sets E S and R are initialized from these two points based on equa-tions 210 through 212 If the set S is nonempty the matrix R can be ini-tialized from equation 311 As long as S is empty the matrix R will not beused

4 Decremental Algorithm

The decremental (or ldquounlearningrdquo) algorithm is employed when an existingsample is removed from the training set If a sample xc is in the R set then itdoes not contribute to the SVR solution and removing it from the trainingset is trivial no adjustments are needed If on the other hand xc has anonzero coefcient then the idea is to gradually reduce the value of thecoefcient to zero while ensuring all the other samples in the training setcontinue to satisfy the KKT conditions

The decremental algorithm follows the incremental algorithm with a fewsmall adjustments

sup2 The direction of the change of microc is

sign1microc D sign f xc iexcl yc D signhxc (41)

sup2 There is no case 1 because the removed xc need not satisfy KKT con-ditions

sup2 The condition in case 2 becomes microc changing from jmicrocj gt 0 to microc D 0

5 Applications and Comparison with Batch Algorithms

The accurate on-line SVR (AOSVR) learning algorithm produces exactlythe same SVR as the conventional batch SVR learning algorithm and canbe applied in all scenarios where batch SVR is currently employed But foron-line time-series prediction and leave-one-out cross-validation (LOOCV)the AOSVR algorithm is particularly well suited In this section we demon-strate AOSVR for both of these applications and compare its performance toexisting batch SVR algorithms These comparisons are based on direct tim-ing of runs using Matlab implementations such timings should be treatedwith some caution as they can be sensitive to details of implementation

2692 J Ma J Theiler and S Perkins

51 AOSVR versus Batch SVR Algorithms with Warm Start Mostbatch algorithms for SVR are implemented as ldquocold startrdquo This is appro-priate when a t is desired to a batch of data that has not been seen beforeHowever in recent years there has been a growing interest in ldquowarm-startrdquoalgorithms that can save time by starting from an appropriate solution andquite a few articles have addressed this issue in the generic context of nu-meric programming (Gondzio 1998 Gondzio amp Grothey 2001 Yildirim ampWright 2002 Fliege amp Heseler 2002) The warm-start algorithms are usefulfor incremental (or decremental) learning because the solution with N iexcl 1(or N C 1) data points provides a natural starting point for nding the so-lution with N data points In this sense AOSVR is a kind of warm-startalgorithm for the QP problem equation 23 which is specially designedfor the incremental or decremental scenario This specialty allows AOSVRto achieve more efciency when handling SVR incremental or decrementallearning as demonstrated in our subsequent experiments

In the machine learning community three algorithms for batch SVRtrain-ing are widely recognized Gunn (1998) solved SVR training as a genericQP optimization we call this implementation QPSVMR Shevade KeerthiBhattacharyya and Murthy (1999) proposed an algorithm specially de-signed for SVR training and it is an improved version of the sequential min-imal optimization for SVM regression (SMOR) and Chang and Lin (2001)proposed another algorithm specially designed for SVR training which wecall LibSVMR since it is implemented as part of the LibSVM software pack-age We implemented all these algorithms so that they can run in both acold-start and a warm-start mode SMOR and LibSVMR are implementedin Matlab and both algorithms allow a straightforward warm-start realiza-tion Because QPSVMR is based on a generic QP algorithm it is much lessefcient than SMOR or LibSVMR To make our subsequent experimentsfeasible we had to implement the QPSVMR core in C (Smola 1998) Smolaessentially employs the interior point QP code of LOQO (Vanderbei 1999)The warm start of QPSVMR directly adopts the warm-start method embed-ded in Smolarsquos (1998) implementation

52 On-line Time-Series Prediction In recent years the use of SVR fortime-series prediction has attracted increased attention (Muller et al 1997Fernandez 1999 Tay amp Cao 2001) In an on-line scenario one updates amodel from incoming data and at the same time makes predictions basedon that model This arises for instance in market forecasting scenariosAnother potential application is the (near) real-time prediction of electrondensity around a satellite in the magnetosphere high charge densities candamage satellite equipment (Friedel Reeves and Obara 2002) and if timesof high charge can be predicted the most sensitive components can beturned off before they are damaged

In time-series prediction the prediction origin denoted O is the timefrom which the prediction is generated The time between the prediction

Accurate On-line Support Vector Regression 2693

origin and the predicted data point is the prediction horizon which forsimplicity we take as one time step

A typical on-line time-series prediction scenario can be represented asfollows (Tashman 2000)

1 Given a time series fxt t D 1 2 3 g and prediction origin Oconstruct a set of training samples AOB from the segment of timeseries fxt t D 1 Og as AOB D fXt yt t D B 0 iexcl 1gwhere Xt D [xt xt iexcl B C 1]T yt D xt C 1 and B is theembedding dimension of the training set AOB

2 Train a predictor PAOBI X from the training set AOB

3 Predict xO C 1 using OxO C 1 D PAOBI XO

4 When x0 C 1 becomes available update the prediction origin O DO C 1 Then go to step 1 and repeat the procedure

Note that the training set AOB keeps growing as O increases so thetraining of the predictor in step 2 becomes increasingly expensive There-fore many SVR-based time-series predictionsare implemented in a compro-mised way (Tay amp Cao 2001) After the predictor is obtained it stays xedand is not updated as new data arrive In contrast an on-line predictionalgorithm can take advantage of the fact that the training set is augmentedone sample at a time and continues to update and improve the model asmore data arrive

521 Experiments Two experiments were performed to compare theAOSVR algorithm with the batch SVR algorithm We were careful to usethe same algorithm parameters for on-line and batch SVR but since ourpurpose is to compare computational performance we did not attempt tooptimize these parameters for each data set In these experiments the kernelfunction is a gaussian radial basis function expiexcldeg kXiiexclXjk2 where deg D 1the regularization coefcient C and the insensitivity parameter in equation21 are set to 10 and 01 respectively the embedding dimension B of thetraining AOB is 5 Also we scale all the time-series to [iexcl11]

Three widely used benchmark time series are employed in both exper-iments (1) the Santa Fe Institute Competition time series A (Weigend ampGershenfeld1994) (2) the Mackey-Glass equation with iquest D 17 (Mackeyamp Glass 1977) and (3) the yearly average sunspot numbers recorded from1700 to 1995 Some basic information about these time series is listed in Ta-ble 1 The SV ratio is the number of support vectors divided by the numberof training samples This is based on a prediction of the last data point usingall previous data for training In general a higher SV ratio suggests that theunderlying problem is harder (Vapnik 1998)

The rst experiment demonstrates that using a xed predictor producesless accurate predictions than using a predictor that is updated as new data

2694 J Ma J Theiler and S Perkins

Table 1 Information Regarding Experimental Time Series

Number of Data Points SV Ratio

Santa Fe Institute 1000 452Mackey-Glass 1500 154Yearly Sunspot 292 4181

become available Two measurements are used to quantify the predictionperformance mean squared error (MSE) and mean absolute error (MAE)The predictors are initially trained on the rst half of the data in the timeseries In the xed case the same predictor is used to predict the second halfof the time series In the on-line case the predictor is updated whenever anew data point is available The performance measurements for both casesare calculated from the predicted and actual values of the second half of thedata in the time series As shown in Table 2 the on-line predictor outper-forms the xed predictor in every case We also note that the errors for thethree time series in Table 2 coincide with the estimated prediction difcultyin Table 1 based on the SV ratio

The second experiment compares AOSVR with batch implementationsusing both cold start and warm start in the on-line prediction scenarioFor each benchmark time series an initial SVR predictor is trained on therst two data points using the batch SVR algorithms For AOSVR we usedequation 314 Afterward both AOSVR and batch SVR algorithms are em-ployed in the on-line prediction mode for the remaining data points in thetime series AOSVR and the batch SVR algorithms produce exactly the sameprediction errors in this experiment so the comparison is only of predic-tion speed All six batch SVR algorithms are compared with AOSVR on thesunspot time series and the experimental results are plotted in Figure 2 Thex-axis of this plot is the number of data points to which the on-line predic-tion model is applied Note that the core of QPSVMR is implemented in CBecause the cold start and warm start of LibSVMR clearly outperform thoseof both SMOR and QPSVMR only the comparison between LibSVMR andAOSVR is carried out in our subsequent experiments The experimental re-

Table 2 Performance Comparison for On-line and Fixed Predictors

On-line Fixed

Santa Fe Institute MSE 00072 00097MAE 00588 00665

Mackey-Glass MSE 00034 00036MAE 00506 00522

Yearly Sunspot MSE 00263 00369MAE 01204 01365

Accurate On-line Support Vector Regression 2695

50 100 150 200 250

10shy 1

100

101

102

103

104

Number of Points

Tim

e (i

n s

ec)

SMOR (Cold Start)SMOR (Warm Start)QPSVMR (Cold Start)QPSVMR (Warm Start)LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

Figure 2 Real-time prediction time of yearly sunspot time series

sults of both Santa Fe Institute and Mackey-Glass time series are presentedin Figures 3 and 4 respectively

These experimental results demonstrate that AOSVR algorithm is gen-erally much faster than the batch SVR algorithms when applied to on-lineprediction Comparison of Figures 2 and 4 furthermore suggests that morespeed improvement is achieved on the sunspot data than on the Mackey-Glass We speculate that this is because the sunspot problem is ldquoharderrdquothan the Mackey-Glass (it has a higher support vector ratio) and that theperformance of the AOSVR algorithm is less sensitive to problem difculty

To test this hypothesis we compared the performance of AOSVR to Lib-SVMR on a single data set (the sunspots) whose difculty was adjusted bychanging the value of A smaller leads to a higher support vector ratioand a more difcult problem Both the AOSVR and LibSVMR algorithmswere employed for on line prediction of the full time series The overallprediction times are plotted against in Figure 5 Where AOSVR perfor-mance varied by a factor of less than 10 over the range of the LibSVMRperformance varied by a factor of about 100

522 Limited-Memory Version of the On-line Time-Series Prediction ScenarioOne problem with on-line time-series prediction in general is that the longerthe prediction goes on the bigger the training set AOB will become and the

2696 J Ma J Theiler and S Perkins

100 200 300 400 500 600 700 800 900

10shy 1

100

101

102

103

104

Number of Points

Tim

e (i

n s

ec)

LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

Figure 3 Real-time prediction time of Santa Fe Institute time series

200 400 600 800 1000 1200 1400

10shy 1

100

101

102

103

104

Number of Points

Tim

e (i

n s

ec)

LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

Figure 4 Real-time prediction time of Mackey-Glass time series

Accurate On-line Support Vector Regression 2697

005 01 015 02 025 03 035 04 045 05

102

Ove

rall

Pre

dic

tio

n T

ime LibSVMR (Cold Start)

LibSVMR (Warm Start)AOSVR

005 01 015 02 025 03 035 04 045 05

200

400

600

800

1000

e

Ove

rall

Pre

dic

tio

n T

ime LibSVMR (Cold Start)

LibSVMR (Warm Start)AOSVR

Figure 5 Semilog and linear plots of prediction time of yearly sunspot timeseries

more SVs will be involved in SVR predictor A complicated SVR predictorimposes both memory and computation stress on the prediction systemOne way to deal with this problem is to impose a ldquoforgettingrdquo time WWhen training set AOB grows to this maximum W then the decrementalalgorithm is used to remove the oldest sample before the next new sampleis added to the training set

We note that this variant of the on-line prediction scenario is also poten-tially suitable for nonstationary time series as it can be updated in real timeto t the most recent behavior of the time series More rigorous investigationin this direction will be a future effort

53 Leave-One-Out Cross-Validation Cross-validation is a useful toolfor assessing the generalization ability of a machine learning algorithm Theidea is to train on one subset of the data and then to test the accuracy ofthe predictor on a separate disjoint subset In leave-one-out cross-validation(LOOCV) only a single sample is used for testing and all the rest are usedfor training Generally this is repeated for every sample in the data setWhen the batch SVR is employed LOOCV can be very expensive sincea fullretraining is done for each sample One compromise approach is to estimateLOOCV with related but less expensive approximations such as the xi-

2698 J Ma J Theiler and S Perkins

alpha bound (Joachims 2000) and approximate span bound (Vapnik ampChapelle 1999) Although Lee and Lin (2001) proposed a numerical solutionto reduce the computation for directly implementing LOOCV the amount ofcomputation required is still considerable Also the accuracy of the LOOCVresult obtained using this method can be potentially compromised becausea different parameter set is employed in LOOCV and in the nal training

The decremental algorithm of AOSVR provides an efcient implemen-tation of LOOCV for SVR

1 Given a data set D construct the SVR function f x from the wholedata set D using batch SVR learning algorithm

2 For each nonsupport vector xi in the data set D calculate error eicorresponding to xi as ei D yi iexcl f xi where yi is the target valuecorresponding to xi

3 For each support vector xi involved in the SVR function f x

(a) Unlearn xi from the SVR function f x using the decrementalalgorithm to obtain the SVR function fix which would beconstructed from the data set Di D D=fxig

(b) Calculate error ei corresponding to support vector xi as ei Dyi iexcl fixi where yi is the target value corresponding to xi

4 Knowing the error for each sample xi in D it is possible to constructa variety of overall measures a simple choice is the MSE

MSELOOCVD D1N

NX

ie2i (51)

where N is the number of samples in data set D Other choices oferror metric such as MAE can be obtained by altering equation 51appropriately

531 Experiment The algorithm parameters in this experiment are setthe same as those in the experiments in section 511 Two famous regressiondata sets the Auto-MPG and Boston Housing data sets are chosen from theUCI Machine-Learning Repository Some basic information on these datasets is listed in Table 3

Table 3 Information Regarding Experimental Regression Data Sets

Number of Attributes Number of Samples SV Ratio

Auto-MPG 7 392 4107Boston Housing 13 506 3636

Accurate On-line Support Vector Regression 2699

200 220 240 260 280 300 320 340 360 380

102

LOOCV Time Comparison of Autoshy MPG Dataset

Samples in Training Set

LO

OC

V T

ime

(sec

on

d)

LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

300 350 400 450 500

101

102

LOOCV Time Comparison of Boston Housing Dataset

Samples in Training Set

LO

OC

V T

ime

(sec

on

d)

Figure 6 Semilog plots of LOOCV time of Auto-MPG and Boston Housing dataset

The experimental results of both data sets are presented in Figure 6 Thex-axis is the size of the training set upon which the LOOCV is implementedThese plots show that AOSVR-based LOOCV is much faster than its Lib-SVMR counterpart

6 Conclusion

We have developed and implemented an accurate on-line support vectorregression (AOSVR) algorithm that permits efcient retraining when a newsample is added to or an existing sample is removed from the training setAOSVR is applied to on-line time-series prediction and to leave-one-outcross-validation and the experimental results demonstrate that the AOSVRalgorithm is more efcient than conventional batch SVR in these scenariosMoreover AOSVR appears less sensitive than batch SVR to the difculty ofthe underlying problem

After this manuscript was prepared we were made aware of a similar on-line SVR algorithm which was independently presented in Martin (2002)

2700 J Ma J Theiler and S Perkins

Appendix Pseudocode for Incrementing AOSVR with a New DataSample

Inputs

sup2 Training set T D fxi yi i D 1 lgsup2 Coefcients fmicroi i D 1 lg and bias b

sup2 Partition of samples into sets S E and R

sup2 Matrix R dened in equation 311

sup2 New sample xc yc

Outputs

sup2 Updated coefcients fmicroi i D 1 l C 1g and bias b

sup2 Updated Matrix R

sup2 Updated partition of samples into sets S E and R

AOSVR Incremental Algorithm

sup2 Initialize microc D 0

sup2 Compute f xc DX

i2E[S

microiQic C b

sup2 Compute hxc D f xc iexcl yc

sup2 I jhxcj middot then assign xc to R and terminate

sup2 Let q D signiexclhxc be the sign that 1microc will take

sup2 Do until the new sample xc meets the KKT condition

mdash Update macr deg according to equations 37 and 39mdash Start bookkeeping procedure

Check the new sample xcmdashLc1 D iexclhxc iexcl q=degc (Case 1)mdashLc2 D qC iexcl microc (Case 2)Check each sample xi in the set S (Case 3)mdashIf q i gt 0 and C gt microi cedil 0 LS

i D C iexcl microi=macri

mdashIf q i gt 0 and 0 gt microi cedil iexclC LSi D iexclmicroi=macri

mdashIf q i lt 0 and C cedil microi gt 0 LSi D iexclmicroi=macri

mdashIf q i lt 0 and 0 cedil microi gt iexclC LSi D iexclC iexcl microi=macri

Check each sample xi in the set E (Case 4)mdashLE

i D iexclhxi iexcl signqmacri=macri

Check each sample xi in the set R (Case 5)mdashLR

i D iexclhxi iexcl signqmacri=macri

Accurate On-line Support Vector Regression 2701

Set 1microc D qminjLc1j jLc2j jLSj jLEj jLRj whereLS D fLS

i i 2 Sg LE D fLEi i 2 Eg and LR D fLR

i i 2 RgLet Flag be the case number that determines 1micro

Let xl be the particular sample in T that determines 1microc

mdash End Bookkeeping Proceduremdash Update microc b and microi i 2 S according to equation 36mdash Update hxi i 2 E [ R according to equation 38mdash Switch Flag

(Flag=1)Add new sample xc to set S update matrix R accordingto equation 313(Flag=2)Add new sample xc to set E(Flag=3)If microl D 0 move xl to set R update R according to equa-tion 312If jmicrolj D C move xl to set E update R according toequation 312(Flag=4)Move xl to set S update R according to equation 313(Flag=5)Move xl to set S update R according to equation 313

mdash End Switch Flagmdash If Flagmiddot 2 terminate otherwise continue the Do-Loop

sup2 Terminate incremental algorithm ready for the next sample

Acknowledgments

We thank Chih-Jen Lin in National University of Taiwan for useful sug-gestions on some implementation issues We also thank the anonymousreviewers for pointing us to the work of Martin (2002) and for suggestingthat we compare AOSVR to the warm-start variants of batch algorithmsThis work is supported by the NASA project NRA-00-01-AISR-088 and bythe Los Alamos Laboratory Directed Research and Development (LDRD)program

References

Cauwenberghs G amp Poggio T (2001) Incremental and decremental supportvector machine learning In T K Leen T G Dietterich amp V Tresp (Eds)

2702 J Ma J Theiler and S Perkins

Advances in neural informationprocessingsystems 13 (pp409ndash123)CambridgeMA MIT Press

Chang C-C amp Lin C-J (2001) LIBSVM a library for support vector machinesSoftware Available on-line at httpwwwcsientuedutwraquocjlinlibsvm

Chang C-C amp Lin C-J (2002) Training ordm-support vector regression Theoryand algorithms Neural Computation 14 1959ndash1977

CsatoL amp Opper M (2001)Sparse representation for gaussian process modelsIn T K Leen T G Dietterich amp V Tresp (Eds) Advances in neural informationprocessing systems 13 (pp 444ndash450) Cambridge MA MIT Press

Fernandez R (1999) Predicting time series with a local support vector regres-sion machine InAdvancedCourse onArticial Intelligence (ACAI rsquo99) Availableon-line at httpwwwiitdemokritosgrskeleetnacai99

Fliege J amp Heseler A (2002) Constructing approximations to the ef-cient set of convex quadratic multiobjective problems Dortmund GermanyFachbereich Mathematik Universitat Dortmund Available on-line athttpwwwoptimization-onlineorgDB HTML200201426html

Friedel RH Reeves G D amp Obara T (2002) Relativistic electron dynamics inthe inner magnetospheremdasha review Journal of Atmosphericand Solar-TerrestrialPhysics 64 265ndash282

Gentile C (2001)A new approximate maximal margin classication algorithmJournal of Machine Learning Research 2 213ndash242

Gondzio J (1998)Warm start of the primal-dual method applied in the cuttingplane scheme Mathematical Programming 83 125ndash143

Gondzio J amp Grothey A (2001) Reoptimization with the primal-dual interiorpoint method SIAM Journal on Optimization 13 842ndash864

Graepel T Herbrich R amp Williamson R C (2001) From margin to sparsityIn T K Leen T G Dietterich amp V Tresp (Eds) Advances in neural informationprocessing systems 13 (pp 210ndash216) Cambridge MA MIT Press

Gunn S (1998) Matlab SVM toolbox Software Available on-line at httpwwwisisecssotonacukresourcessvminfo

Herbster M (2001) Learning additive models online with fast evaluating ker-nels In D P Helmbold amp B Williamson (Eds) Proceedings of the 14th An-nual Conference on Computational Learning Theory (pp 444ndash460) New YorkSpringer-Verlag

Joachims T (2000) Estimating the generalization performance of an SVM ef-ciently In P Langley (Ed) Proceedings of the Seventeenth International Confer-ence on Machine Learning (pp 431ndash438) San Mateo Morgan Kaufmann

Kivinen J Smola A Jamp Williamson R C (2002)Online learning with kernelsIn T G Dietterich S Becker amp Z Ghahramani (Eds) Advances in neuralinformation processing systems 14 (pp 785ndash792) Cambridge MA MIT Press

Lee J-H amp Lin C-J (2001)Automatic model selection for support vector machines(Tech Rep) Taipei Taiwan Department of Computer Science and Informa-tion Engineering National Taiwan University

Li Y amp Long P M (1999) The relaxed online maximum margin algorithm InS A Solla T K Leen amp K-R Muller (Eds) Advances in neural informationprocessing systems 12 (pp 498ndash504) Cambridge MA MIT Press

Accurate On-line Support Vector Regression 2703

Mackey M C amp Glass L (1977)Oscillation and chaos in physiological controlsystems Science 197 287ndash289

Martin M (2002)On-line supportvectormachines for functionapproximation(TechRep LSI-02-11-R)Catalunya Spain Software Department Universitat Po-litecnica de Catalunya

Muller K-R Smola A J Ratsch G Scholkopf B Kohlmorgen J amp VapnikV (1997)Predicting time series with support vector machines In W Gerstner(Ed) Articial Neural NetworksmdashICANN rsquo97 (pp 999ndash1004) Berlin Springer-Verlag

Ralaivola L amp drsquoAlche-Buc F (2001) Incremental support vector machinelearning A local approach In G Dorffner H Bischof amp K Hornik (Eds) Ar-ticial Neural NetworksmdashICANN 2001 (pp 322ndash330) Berlin Springer-Verlag

Shevade S K Keerthi S S Bhattacharyya C amp Murthy K R K (1999)Improvements to SMO algorithm for SVM regression (Tech Rep No CD-99-16)Singapore National University of Singapore

Smola A (1998) Interior point optimizer for SVM pattern recognition SoftwareAvailable on-line at httpwwwkernel-machinesorgcodeprloqotargz

Smola A J amp Scholkopf B (1998) A tutorial on support vector regression (Neu-roCOLT Tech Rep No NC-TR-98-030) London Royal Holloway CollegeUniversity of London

Syed N A Liu H amp Sung K K (1999) Incremental learning with supportvector machines In Proceedings of the Workshop on Support Vector Machines atthe International Joint Conference on Articial IntelligencemdashIJCAI-99 San MateoMorgan Kaufmann

Tashman L J (2000) Out-of-sample tests of forecasting accuracy An analysisand review International Journal of Forecasting 16 437ndash450

Tay F E H amp Cao L (2001)Application of support vectormachines in nancialtime series forcasting Omega 29 309ndash317

Vanderbei R J (1999) LOQO An interior point code for quadratic program-ming Optimization Methods and Software 11 451ndash484

Vapnik V (1998) Statistical learning theory New York WileyVapnik V amp Chapelle O (1999)Bounds on error expectation for support vector

machine In A Smola P Bartlett B Scholkopf amp D Schuurmans (Eds)Advances in large margin classiers (pp 261ndash280) Cambridge MA MIT Press

Weigend A S amp Gershenfeld N A (1994)Time-series predictionForecasting thefuture and understanding the past Reading MA Addison-Wesley

Yildirim E A amp Wright S J (2002) Warm-start strategies in interior-pointmethods for linear programming SIAM Journal on Optimization 12 782ndash810

Received September 12 2002 accepted May 1 2003

Page 2: Accurate On-line Support Vector Regression · 2006. 8. 28. · Accurate On-line Support Vector Regression 2689 In the special case whenSset is empty, according to equations 3.1 and

2684 J Ma J Theiler and S Perkins

This article is organized as follows The formulation of the SVR problemand the development of the Karush-Kuhn- Tucker (KKT) conditions thatits solution must satisfy are presented in section 2 The incremental SVRalgorithm is derived in section 3 and a decremental version is describedin section 4 Two applications of the AOSVR algorithm are presented insection 5 along with a comparison to batch algorithms uing both cold startand warm start

2 Support Vector Regression and the Karush-Kuhn Tucker Conditions

A more detailed version of the following presentation of SVR theory can befound in Smola and Scholkopf (1998)

Given a training set T D fxi yi i D 1 cent cent cent lg where xisup2RN and yisup2R weconstruct a linear regression function

f x D WT8x C b (21)

on a feature space F Here W is a vector in F and 8(x) maps the input xto a vector in F The W and b in equation 21 are obtained by solving anoptimization problem

minWb

P D 12

WTW C ClX

iD1

raquoi C raquocurreni

st yi iexcl WT8x C b middot C raquoi (22)

WT8x C b iexcl yi middot C raquo curreni

raquoi raquocurreni cedil 0 i D 1 cent cent cent l

The optimization criterion penalizes data points whose y-values differ fromf (x) by more than The slack variables raquo and raquocurren correspond to the size ofthis excess deviation for positive and negative deviations respectively asshown in Figure 1

Introducing Lagrange multipliers reg regcurren acute and acutecurren we can write the cor-responding Lagrangian as

LP D12

WTW C ClX

iD1

raquoi C raquocurreni iexcl

lX

iD1

acuteiraquoi C acutecurreni raquocurren

i

iexcllX

iD1

regi C raquoi C yi iexcl WT8xi iexcl b

iexcllX

iD1

regcurreni C raquocurren

i iexcl yi C WT8xi C b

st regi regcurreni acutei acutecurren

i cedil 0

Accurate On-line Support Vector Regression 2685

f(x)

f(x)-e

f(x)+e

x

x

x

y

Figure 1 The -insensitive loss function and the role of the slack variables raquo andraquo curren

This in turn leads to the dual optimization problem

minregregcurren

D D 12

lX

iD1

lX

jD1

Qijregi iexcl regcurreni regj iexcl regcurren

j C lX

iD1

regi C regcurreni

iexcllX

iD1

yiregi iexcl regcurreni

st 0 middot regi regcurreni middot C i D 1 l (23)

lX

iD1

regi iexcl regcurreni D 0

where Qij D copyxiTcopyxj D Kxi xj Here Kxi xj is a kernel function

(Smolaamp Scholkopf 1998) Given the solution of equation 23 the regressionfunction (21) can be written as

f x DlX

iD1

regi iexcl regcurreni Kxi x C b (24)

2686 J Ma J Theiler and S Perkins

The Lagrange formulation of equation 23 can be represented as

LD D12

lX

iD1

regi iexcl regcurreni regj iexcl regcurren

j C

lX

iD1

regi C regcurreni iexcl

lX

iD1

yiregi iexcl regcurreni

iexcllX

iD1

plusmniregi C plusmncurreni regcurren

i ClX

iD1

[uiregi iexcl C C ucurreni regcurren

i iexcl C]

C sup3

lX

iD1

regi iexcl regcurreni (25)

where plusmncurreni ucurren

i and sup3 are the Lagrange multipliers Optimizing this La-grangian leads to the Karush-Kuhn-Tucker (KKT) conditions

LD

regiD

lX

jD1

Qijregj iexcl regcurrenj C iexcl yi C sup3 iexcl plusmni C ui D 0

LD

regcurreni

D iexcllX

jD1

Qijregj iexcl regcurrenj C C yi iexcl sup3 iexcl plusmncurren

i C ucurreni D 0 (26)

plusmncurreni cedil 0 plusmncurren

i regcurreni D 0

ucurreni cedil 0 ucurren

i regcurreni iexcl C D 0

Note that sup3 in equation 26 is equal to b in equations 21 and 24 at optimality(Chang amp Lin 2002)

According to the KKT conditions 26 at most one of regi and regcurreni will be

nonzero and both are nonnegative Therefore we can dene a coefcientdifference microi as

microi D regi iexcl regcurreni (27)

and note that microi determines both regi and regcurreni

Dene a margin function hxi for the ith sample xi as

hxi acute f xi iexcl yi DlX

jD1

Qijmicroj iexcl yi C b (28)

Combining equations 26 27 and 28 we can obtain

8gtgtgtgtgtlt

gtgtgtgtgt

hxi cedil microi D iexclChxi D iexclC lt microi lt 0

iexcl middot hxi middot microi D 0

hxi D iexcl 0 lt microi lt Chxi middot iexcl microi D C

(29)

Accurate On-line Support Vector Regression 2687

There areve conditions in equation 29 compared to the three conditionsin support vector classication (see equation 2 in Cauwenberghs amp Poggio2001) but like the conditions in support vector classication they can beidentied with three subsets into which the samples in training set T can beclassied The difference is that two of the subsets (E and S) are themselvescomposed of two disconnected components depending on the sign of theerror f xi iexcl yi

The E Set Error support vectors E D fi j jmicroij D Cg (210)

The S Set Margin support vectors S D fi j 0 lt jmicroij lt Cg (211)

The R Set Remaining samples R D fi j microi D 0g (212)

3 Incremental Algorithm

The incremental algorithm updates the trained SVR function whenever anew sample xc is added to the training set T The basic idea is to changethe coefcient microc corresponding to the new sample xc in a nite number ofdiscrete steps until it meets the KKT conditions while ensuring that theexisting samples in T continue to satisfy the KKT conditions at each stepIn this section we rst derive the relation between the change of microc or 1microcand the change of other coefcients under the KKT conditions and thenpropose a method to determine the largest allowed 1microc for each step Apseudocode description of this algorithm is provided in the appendix

31 Derivation of the Incremental Relations Let xc be a new trainingsample that is added to T We initially set microc D 0 and then gradually change(increase or decrease) the value of microc under the KKT conditions equation 29

According to equations 26 27 and 29 the incremental relation between1hxi 1microi and 1b is given by

1hxi D Qic1microc ClX

iD1

Qij1microj C 1b (31)

From the equality condition in equation 23 we have

microc ClX

iD1

microi D 0 (32)

Combining equations 29 through 212 and 31 and 32 we obtainX

j2S

Qij1microj C 1b D iexclQic1microc where i 2 S

X

j2S

1microj D iexcl1microc (33)

2688 J Ma J Theiler and S Perkins

If we dene the index of the samples in the S set as

S D fs1 s2 sls g (34)

equation 33 can be represented in matrix form as

2

6664

0 1 cent cent cent 11 Qs1s1 cent cent cent Qs1sls

1 Qsls s1 cent cent cent Qsls sls

3

7775

2

6664

1b1micros1

1microsls

3

7775 D iexcl

2

6664

1Qs1c

Qsls c

3

7775 1microc (35)

that is2

6664

1b1micros1

1microsls

3

7775 D macr1microc (36)

where

macr D

2

6664

macr

macrs1

macrsls

3

7775 D iexclR

2

6664

1Qs1c

Qsls c

3

7775

where R D

2

6664

0 1 cent cent cent 11 Qs1s1 cent cent cent Qs1sls

1 Qsls s1 cent cent cent Qsls sls

3

7775

iexcl1

(37)

Dene a non-S or N set as N D E [ R D fn1 n2 nlng Combiningequations 29 through 212 31 and 36 we obtain

2

6664

1hxn1

1hxn2

1hxnln

3

7775D deg 1microc (38)

where

deg D

2

6664

Qn1cQn2c

Qnln c

3

7775C

2

6664

1 Qn1s1 cent cent cent Qn1sls

1 Qn2s1 cent cent cent Qn2sls

1 Qnln s1 cent cent cent Qnln sls

3

7775 macr (39)

Accurate On-line Support Vector Regression 2689

In the special case when S set is empty according to equations 31 and 32equation 39 simplies to 1hxn D 1b for all n 2 E [ R

Given 1microc we can update microi i 2 S and b according to equation 36 andupdate hxi i 2 N according to equation 38 Moreover equation 29 sug-gests that microi i 2 N and hxi i 2 S are constant if the S set stays unchangedTherefore the results presented in this section enable us to update all the microiand hxi given 1microc In the next section we address the question of how tond an appropriate 1microc

32 AOSVR Bookkeeping Procedure Equations 36 and 38 hold onlywhen the samples in the S set do not change membership Therefore 1microc ischosen to be the largest value that either can maintain the S set unchangedor lead to the termination of the incremental algorithm

The rst step is to determine whether the change 1microc should be positiveor negative According to equation 29

sign1microc D signyc iexcl f xc D signiexclhxc (310)

The next step is to determine a bound on 1microc imposed by each sample inthe training set To simplify exposition we consider only the case 1microc gt 0and remark that the case 1microc lt 0 is similarFor the new sample xc there are two cases

Case 1 hxc changes from hxc lt iexcl to hxc D iexcl and the newsample xc is added to the S set and the algorithm terminates

Case 2 If microc increases from microc lt C to microc D C the new sample xc is addedto the E set and the algorithml terminates

For each sample xi in the set S

Case 3 If microi changes from 0 lt jmicroij lt C to jmicroij D C sample xi changesfrom the S set to the E set If microi changes to microi D 0 sample xi changesfrom the S set to the R set

For each sample xi in the set E

Case 4 If hxi changes from jhxij gt to jhxij D xi is moved fromthe E set to the S set

For each sample xi in the set R

Case 5 If hxi changes from jhxij lt to jhxij D xi is moved fromthe R set to the S set

The bookkeeping procedure is to trace each sample in the training setT against these ve cases and determine the allowed 1microc for each sampleaccording to equation 36 or 38 The nal 1microc is dened as the one withminimum absolute value among all the possible 1microc

2690 J Ma J Theiler and S Perkins

33 Efciently Updating the R Matrix The matrix R that is used inequation 37

R D

2

6664

0 1 cent cent cent 11 Qs1s1 cent cent cent Qs1sls

1 Qsls s1 cent cent cent Qsls sls

3

7775

iexcl1

(311)

must be updated whenever the S set is changed Following Cauwenberghsand Poggio (2001) we can efciently update R without explicitlycomputingthe matrix inverse When the kth sample xsk in the S set is removed from theS set the new R can be obtained as follows

Rnew D RII iexclRIkRkI

Rkk where

I D [1 cent cent cent k k C 2 cent cent cent Sls C 1] (312)

When a new sample is added to S set the new R can be updated as follows

Rnew D

2

6664

0

R

00 cent cent cent 0 0

3

7775 C1degi

micromacr1

para poundmacrT 1

curren (313)

where macr is dened as

macr D iexclR

2

6664

1Qs1 i

Qsls i

3

7775

and degi is dened as

degi D Qii C

2

6664

1Qs1i

Qsls i

3

7775 macr

when the sample xi was moved from E set to R set In contrast when thesample xc is the sample added to S set macr is can be obtained according toequation 37 and degi is the last element of deg dened in equation 39

34 Initialization of the Incremental Algorithm An initial SVR solu-tion can be obtained from a batch SVR solution and in most cases that is themost efcient approach But it is sometimes convenient to use AOSVR to

Accurate On-line Support Vector Regression 2691

produce a full solution from scratch An efcient starting point is the two-sample solution Given a training set T D fx1 y1 x2 y2g with y1 cedil y2the solution of equation 23 is

micro1 D maxsup3

0 minsup3

Cy1 iexcl y2 iexcl 2

2K11 iexcl K12

acuteacute

micro2 D iexclmicro1

b D y1 D y2=2 (314)

The sets E S and R are initialized from these two points based on equa-tions 210 through 212 If the set S is nonempty the matrix R can be ini-tialized from equation 311 As long as S is empty the matrix R will not beused

4 Decremental Algorithm

The decremental (or ldquounlearningrdquo) algorithm is employed when an existingsample is removed from the training set If a sample xc is in the R set then itdoes not contribute to the SVR solution and removing it from the trainingset is trivial no adjustments are needed If on the other hand xc has anonzero coefcient then the idea is to gradually reduce the value of thecoefcient to zero while ensuring all the other samples in the training setcontinue to satisfy the KKT conditions

The decremental algorithm follows the incremental algorithm with a fewsmall adjustments

sup2 The direction of the change of microc is

sign1microc D sign f xc iexcl yc D signhxc (41)

sup2 There is no case 1 because the removed xc need not satisfy KKT con-ditions

sup2 The condition in case 2 becomes microc changing from jmicrocj gt 0 to microc D 0

5 Applications and Comparison with Batch Algorithms

The accurate on-line SVR (AOSVR) learning algorithm produces exactlythe same SVR as the conventional batch SVR learning algorithm and canbe applied in all scenarios where batch SVR is currently employed But foron-line time-series prediction and leave-one-out cross-validation (LOOCV)the AOSVR algorithm is particularly well suited In this section we demon-strate AOSVR for both of these applications and compare its performance toexisting batch SVR algorithms These comparisons are based on direct tim-ing of runs using Matlab implementations such timings should be treatedwith some caution as they can be sensitive to details of implementation

2692 J Ma J Theiler and S Perkins

51 AOSVR versus Batch SVR Algorithms with Warm Start Mostbatch algorithms for SVR are implemented as ldquocold startrdquo This is appro-priate when a t is desired to a batch of data that has not been seen beforeHowever in recent years there has been a growing interest in ldquowarm-startrdquoalgorithms that can save time by starting from an appropriate solution andquite a few articles have addressed this issue in the generic context of nu-meric programming (Gondzio 1998 Gondzio amp Grothey 2001 Yildirim ampWright 2002 Fliege amp Heseler 2002) The warm-start algorithms are usefulfor incremental (or decremental) learning because the solution with N iexcl 1(or N C 1) data points provides a natural starting point for nding the so-lution with N data points In this sense AOSVR is a kind of warm-startalgorithm for the QP problem equation 23 which is specially designedfor the incremental or decremental scenario This specialty allows AOSVRto achieve more efciency when handling SVR incremental or decrementallearning as demonstrated in our subsequent experiments

In the machine learning community three algorithms for batch SVRtrain-ing are widely recognized Gunn (1998) solved SVR training as a genericQP optimization we call this implementation QPSVMR Shevade KeerthiBhattacharyya and Murthy (1999) proposed an algorithm specially de-signed for SVR training and it is an improved version of the sequential min-imal optimization for SVM regression (SMOR) and Chang and Lin (2001)proposed another algorithm specially designed for SVR training which wecall LibSVMR since it is implemented as part of the LibSVM software pack-age We implemented all these algorithms so that they can run in both acold-start and a warm-start mode SMOR and LibSVMR are implementedin Matlab and both algorithms allow a straightforward warm-start realiza-tion Because QPSVMR is based on a generic QP algorithm it is much lessefcient than SMOR or LibSVMR To make our subsequent experimentsfeasible we had to implement the QPSVMR core in C (Smola 1998) Smolaessentially employs the interior point QP code of LOQO (Vanderbei 1999)The warm start of QPSVMR directly adopts the warm-start method embed-ded in Smolarsquos (1998) implementation

52 On-line Time-Series Prediction In recent years the use of SVR fortime-series prediction has attracted increased attention (Muller et al 1997Fernandez 1999 Tay amp Cao 2001) In an on-line scenario one updates amodel from incoming data and at the same time makes predictions basedon that model This arises for instance in market forecasting scenariosAnother potential application is the (near) real-time prediction of electrondensity around a satellite in the magnetosphere high charge densities candamage satellite equipment (Friedel Reeves and Obara 2002) and if timesof high charge can be predicted the most sensitive components can beturned off before they are damaged

In time-series prediction the prediction origin denoted O is the timefrom which the prediction is generated The time between the prediction

Accurate On-line Support Vector Regression 2693

origin and the predicted data point is the prediction horizon which forsimplicity we take as one time step

A typical on-line time-series prediction scenario can be represented asfollows (Tashman 2000)

1 Given a time series fxt t D 1 2 3 g and prediction origin Oconstruct a set of training samples AOB from the segment of timeseries fxt t D 1 Og as AOB D fXt yt t D B 0 iexcl 1gwhere Xt D [xt xt iexcl B C 1]T yt D xt C 1 and B is theembedding dimension of the training set AOB

2 Train a predictor PAOBI X from the training set AOB

3 Predict xO C 1 using OxO C 1 D PAOBI XO

4 When x0 C 1 becomes available update the prediction origin O DO C 1 Then go to step 1 and repeat the procedure

Note that the training set AOB keeps growing as O increases so thetraining of the predictor in step 2 becomes increasingly expensive There-fore many SVR-based time-series predictionsare implemented in a compro-mised way (Tay amp Cao 2001) After the predictor is obtained it stays xedand is not updated as new data arrive In contrast an on-line predictionalgorithm can take advantage of the fact that the training set is augmentedone sample at a time and continues to update and improve the model asmore data arrive

521 Experiments Two experiments were performed to compare theAOSVR algorithm with the batch SVR algorithm We were careful to usethe same algorithm parameters for on-line and batch SVR but since ourpurpose is to compare computational performance we did not attempt tooptimize these parameters for each data set In these experiments the kernelfunction is a gaussian radial basis function expiexcldeg kXiiexclXjk2 where deg D 1the regularization coefcient C and the insensitivity parameter in equation21 are set to 10 and 01 respectively the embedding dimension B of thetraining AOB is 5 Also we scale all the time-series to [iexcl11]

Three widely used benchmark time series are employed in both exper-iments (1) the Santa Fe Institute Competition time series A (Weigend ampGershenfeld1994) (2) the Mackey-Glass equation with iquest D 17 (Mackeyamp Glass 1977) and (3) the yearly average sunspot numbers recorded from1700 to 1995 Some basic information about these time series is listed in Ta-ble 1 The SV ratio is the number of support vectors divided by the numberof training samples This is based on a prediction of the last data point usingall previous data for training In general a higher SV ratio suggests that theunderlying problem is harder (Vapnik 1998)

The rst experiment demonstrates that using a xed predictor producesless accurate predictions than using a predictor that is updated as new data

2694 J Ma J Theiler and S Perkins

Table 1 Information Regarding Experimental Time Series

Number of Data Points SV Ratio

Santa Fe Institute 1000 452Mackey-Glass 1500 154Yearly Sunspot 292 4181

become available Two measurements are used to quantify the predictionperformance mean squared error (MSE) and mean absolute error (MAE)The predictors are initially trained on the rst half of the data in the timeseries In the xed case the same predictor is used to predict the second halfof the time series In the on-line case the predictor is updated whenever anew data point is available The performance measurements for both casesare calculated from the predicted and actual values of the second half of thedata in the time series As shown in Table 2 the on-line predictor outper-forms the xed predictor in every case We also note that the errors for thethree time series in Table 2 coincide with the estimated prediction difcultyin Table 1 based on the SV ratio

The second experiment compares AOSVR with batch implementationsusing both cold start and warm start in the on-line prediction scenarioFor each benchmark time series an initial SVR predictor is trained on therst two data points using the batch SVR algorithms For AOSVR we usedequation 314 Afterward both AOSVR and batch SVR algorithms are em-ployed in the on-line prediction mode for the remaining data points in thetime series AOSVR and the batch SVR algorithms produce exactly the sameprediction errors in this experiment so the comparison is only of predic-tion speed All six batch SVR algorithms are compared with AOSVR on thesunspot time series and the experimental results are plotted in Figure 2 Thex-axis of this plot is the number of data points to which the on-line predic-tion model is applied Note that the core of QPSVMR is implemented in CBecause the cold start and warm start of LibSVMR clearly outperform thoseof both SMOR and QPSVMR only the comparison between LibSVMR andAOSVR is carried out in our subsequent experiments The experimental re-

Table 2 Performance Comparison for On-line and Fixed Predictors

On-line Fixed

Santa Fe Institute MSE 00072 00097MAE 00588 00665

Mackey-Glass MSE 00034 00036MAE 00506 00522

Yearly Sunspot MSE 00263 00369MAE 01204 01365

Accurate On-line Support Vector Regression 2695

50 100 150 200 250

10shy 1

100

101

102

103

104

Number of Points

Tim

e (i

n s

ec)

SMOR (Cold Start)SMOR (Warm Start)QPSVMR (Cold Start)QPSVMR (Warm Start)LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

Figure 2 Real-time prediction time of yearly sunspot time series

sults of both Santa Fe Institute and Mackey-Glass time series are presentedin Figures 3 and 4 respectively

These experimental results demonstrate that AOSVR algorithm is gen-erally much faster than the batch SVR algorithms when applied to on-lineprediction Comparison of Figures 2 and 4 furthermore suggests that morespeed improvement is achieved on the sunspot data than on the Mackey-Glass We speculate that this is because the sunspot problem is ldquoharderrdquothan the Mackey-Glass (it has a higher support vector ratio) and that theperformance of the AOSVR algorithm is less sensitive to problem difculty

To test this hypothesis we compared the performance of AOSVR to Lib-SVMR on a single data set (the sunspots) whose difculty was adjusted bychanging the value of A smaller leads to a higher support vector ratioand a more difcult problem Both the AOSVR and LibSVMR algorithmswere employed for on line prediction of the full time series The overallprediction times are plotted against in Figure 5 Where AOSVR perfor-mance varied by a factor of less than 10 over the range of the LibSVMRperformance varied by a factor of about 100

522 Limited-Memory Version of the On-line Time-Series Prediction ScenarioOne problem with on-line time-series prediction in general is that the longerthe prediction goes on the bigger the training set AOB will become and the

2696 J Ma J Theiler and S Perkins

100 200 300 400 500 600 700 800 900

10shy 1

100

101

102

103

104

Number of Points

Tim

e (i

n s

ec)

LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

Figure 3 Real-time prediction time of Santa Fe Institute time series

200 400 600 800 1000 1200 1400

10shy 1

100

101

102

103

104

Number of Points

Tim

e (i

n s

ec)

LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

Figure 4 Real-time prediction time of Mackey-Glass time series

Accurate On-line Support Vector Regression 2697

005 01 015 02 025 03 035 04 045 05

102

Ove

rall

Pre

dic

tio

n T

ime LibSVMR (Cold Start)

LibSVMR (Warm Start)AOSVR

005 01 015 02 025 03 035 04 045 05

200

400

600

800

1000

e

Ove

rall

Pre

dic

tio

n T

ime LibSVMR (Cold Start)

LibSVMR (Warm Start)AOSVR

Figure 5 Semilog and linear plots of prediction time of yearly sunspot timeseries

more SVs will be involved in SVR predictor A complicated SVR predictorimposes both memory and computation stress on the prediction systemOne way to deal with this problem is to impose a ldquoforgettingrdquo time WWhen training set AOB grows to this maximum W then the decrementalalgorithm is used to remove the oldest sample before the next new sampleis added to the training set

We note that this variant of the on-line prediction scenario is also poten-tially suitable for nonstationary time series as it can be updated in real timeto t the most recent behavior of the time series More rigorous investigationin this direction will be a future effort

53 Leave-One-Out Cross-Validation Cross-validation is a useful toolfor assessing the generalization ability of a machine learning algorithm Theidea is to train on one subset of the data and then to test the accuracy ofthe predictor on a separate disjoint subset In leave-one-out cross-validation(LOOCV) only a single sample is used for testing and all the rest are usedfor training Generally this is repeated for every sample in the data setWhen the batch SVR is employed LOOCV can be very expensive sincea fullretraining is done for each sample One compromise approach is to estimateLOOCV with related but less expensive approximations such as the xi-

2698 J Ma J Theiler and S Perkins

alpha bound (Joachims 2000) and approximate span bound (Vapnik ampChapelle 1999) Although Lee and Lin (2001) proposed a numerical solutionto reduce the computation for directly implementing LOOCV the amount ofcomputation required is still considerable Also the accuracy of the LOOCVresult obtained using this method can be potentially compromised becausea different parameter set is employed in LOOCV and in the nal training

The decremental algorithm of AOSVR provides an efcient implemen-tation of LOOCV for SVR

1 Given a data set D construct the SVR function f x from the wholedata set D using batch SVR learning algorithm

2 For each nonsupport vector xi in the data set D calculate error eicorresponding to xi as ei D yi iexcl f xi where yi is the target valuecorresponding to xi

3 For each support vector xi involved in the SVR function f x

(a) Unlearn xi from the SVR function f x using the decrementalalgorithm to obtain the SVR function fix which would beconstructed from the data set Di D D=fxig

(b) Calculate error ei corresponding to support vector xi as ei Dyi iexcl fixi where yi is the target value corresponding to xi

4 Knowing the error for each sample xi in D it is possible to constructa variety of overall measures a simple choice is the MSE

MSELOOCVD D1N

NX

ie2i (51)

where N is the number of samples in data set D Other choices oferror metric such as MAE can be obtained by altering equation 51appropriately

531 Experiment The algorithm parameters in this experiment are setthe same as those in the experiments in section 511 Two famous regressiondata sets the Auto-MPG and Boston Housing data sets are chosen from theUCI Machine-Learning Repository Some basic information on these datasets is listed in Table 3

Table 3 Information Regarding Experimental Regression Data Sets

Number of Attributes Number of Samples SV Ratio

Auto-MPG 7 392 4107Boston Housing 13 506 3636

Accurate On-line Support Vector Regression 2699

200 220 240 260 280 300 320 340 360 380

102

LOOCV Time Comparison of Autoshy MPG Dataset

Samples in Training Set

LO

OC

V T

ime

(sec

on

d)

LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

300 350 400 450 500

101

102

LOOCV Time Comparison of Boston Housing Dataset

Samples in Training Set

LO

OC

V T

ime

(sec

on

d)

Figure 6 Semilog plots of LOOCV time of Auto-MPG and Boston Housing dataset

The experimental results of both data sets are presented in Figure 6 Thex-axis is the size of the training set upon which the LOOCV is implementedThese plots show that AOSVR-based LOOCV is much faster than its Lib-SVMR counterpart

6 Conclusion

We have developed and implemented an accurate on-line support vectorregression (AOSVR) algorithm that permits efcient retraining when a newsample is added to or an existing sample is removed from the training setAOSVR is applied to on-line time-series prediction and to leave-one-outcross-validation and the experimental results demonstrate that the AOSVRalgorithm is more efcient than conventional batch SVR in these scenariosMoreover AOSVR appears less sensitive than batch SVR to the difculty ofthe underlying problem

After this manuscript was prepared we were made aware of a similar on-line SVR algorithm which was independently presented in Martin (2002)

2700 J Ma J Theiler and S Perkins

Appendix Pseudocode for Incrementing AOSVR with a New DataSample

Inputs

sup2 Training set T D fxi yi i D 1 lgsup2 Coefcients fmicroi i D 1 lg and bias b

sup2 Partition of samples into sets S E and R

sup2 Matrix R dened in equation 311

sup2 New sample xc yc

Outputs

sup2 Updated coefcients fmicroi i D 1 l C 1g and bias b

sup2 Updated Matrix R

sup2 Updated partition of samples into sets S E and R

AOSVR Incremental Algorithm

sup2 Initialize microc D 0

sup2 Compute f xc DX

i2E[S

microiQic C b

sup2 Compute hxc D f xc iexcl yc

sup2 I jhxcj middot then assign xc to R and terminate

sup2 Let q D signiexclhxc be the sign that 1microc will take

sup2 Do until the new sample xc meets the KKT condition

mdash Update macr deg according to equations 37 and 39mdash Start bookkeeping procedure

Check the new sample xcmdashLc1 D iexclhxc iexcl q=degc (Case 1)mdashLc2 D qC iexcl microc (Case 2)Check each sample xi in the set S (Case 3)mdashIf q i gt 0 and C gt microi cedil 0 LS

i D C iexcl microi=macri

mdashIf q i gt 0 and 0 gt microi cedil iexclC LSi D iexclmicroi=macri

mdashIf q i lt 0 and C cedil microi gt 0 LSi D iexclmicroi=macri

mdashIf q i lt 0 and 0 cedil microi gt iexclC LSi D iexclC iexcl microi=macri

Check each sample xi in the set E (Case 4)mdashLE

i D iexclhxi iexcl signqmacri=macri

Check each sample xi in the set R (Case 5)mdashLR

i D iexclhxi iexcl signqmacri=macri

Accurate On-line Support Vector Regression 2701

Set 1microc D qminjLc1j jLc2j jLSj jLEj jLRj whereLS D fLS

i i 2 Sg LE D fLEi i 2 Eg and LR D fLR

i i 2 RgLet Flag be the case number that determines 1micro

Let xl be the particular sample in T that determines 1microc

mdash End Bookkeeping Proceduremdash Update microc b and microi i 2 S according to equation 36mdash Update hxi i 2 E [ R according to equation 38mdash Switch Flag

(Flag=1)Add new sample xc to set S update matrix R accordingto equation 313(Flag=2)Add new sample xc to set E(Flag=3)If microl D 0 move xl to set R update R according to equa-tion 312If jmicrolj D C move xl to set E update R according toequation 312(Flag=4)Move xl to set S update R according to equation 313(Flag=5)Move xl to set S update R according to equation 313

mdash End Switch Flagmdash If Flagmiddot 2 terminate otherwise continue the Do-Loop

sup2 Terminate incremental algorithm ready for the next sample

Acknowledgments

We thank Chih-Jen Lin in National University of Taiwan for useful sug-gestions on some implementation issues We also thank the anonymousreviewers for pointing us to the work of Martin (2002) and for suggestingthat we compare AOSVR to the warm-start variants of batch algorithmsThis work is supported by the NASA project NRA-00-01-AISR-088 and bythe Los Alamos Laboratory Directed Research and Development (LDRD)program

References

Cauwenberghs G amp Poggio T (2001) Incremental and decremental supportvector machine learning In T K Leen T G Dietterich amp V Tresp (Eds)

2702 J Ma J Theiler and S Perkins

Advances in neural informationprocessingsystems 13 (pp409ndash123)CambridgeMA MIT Press

Chang C-C amp Lin C-J (2001) LIBSVM a library for support vector machinesSoftware Available on-line at httpwwwcsientuedutwraquocjlinlibsvm

Chang C-C amp Lin C-J (2002) Training ordm-support vector regression Theoryand algorithms Neural Computation 14 1959ndash1977

CsatoL amp Opper M (2001)Sparse representation for gaussian process modelsIn T K Leen T G Dietterich amp V Tresp (Eds) Advances in neural informationprocessing systems 13 (pp 444ndash450) Cambridge MA MIT Press

Fernandez R (1999) Predicting time series with a local support vector regres-sion machine InAdvancedCourse onArticial Intelligence (ACAI rsquo99) Availableon-line at httpwwwiitdemokritosgrskeleetnacai99

Fliege J amp Heseler A (2002) Constructing approximations to the ef-cient set of convex quadratic multiobjective problems Dortmund GermanyFachbereich Mathematik Universitat Dortmund Available on-line athttpwwwoptimization-onlineorgDB HTML200201426html

Friedel RH Reeves G D amp Obara T (2002) Relativistic electron dynamics inthe inner magnetospheremdasha review Journal of Atmosphericand Solar-TerrestrialPhysics 64 265ndash282

Gentile C (2001)A new approximate maximal margin classication algorithmJournal of Machine Learning Research 2 213ndash242

Gondzio J (1998)Warm start of the primal-dual method applied in the cuttingplane scheme Mathematical Programming 83 125ndash143

Gondzio J amp Grothey A (2001) Reoptimization with the primal-dual interiorpoint method SIAM Journal on Optimization 13 842ndash864

Graepel T Herbrich R amp Williamson R C (2001) From margin to sparsityIn T K Leen T G Dietterich amp V Tresp (Eds) Advances in neural informationprocessing systems 13 (pp 210ndash216) Cambridge MA MIT Press

Gunn S (1998) Matlab SVM toolbox Software Available on-line at httpwwwisisecssotonacukresourcessvminfo

Herbster M (2001) Learning additive models online with fast evaluating ker-nels In D P Helmbold amp B Williamson (Eds) Proceedings of the 14th An-nual Conference on Computational Learning Theory (pp 444ndash460) New YorkSpringer-Verlag

Joachims T (2000) Estimating the generalization performance of an SVM ef-ciently In P Langley (Ed) Proceedings of the Seventeenth International Confer-ence on Machine Learning (pp 431ndash438) San Mateo Morgan Kaufmann

Kivinen J Smola A Jamp Williamson R C (2002)Online learning with kernelsIn T G Dietterich S Becker amp Z Ghahramani (Eds) Advances in neuralinformation processing systems 14 (pp 785ndash792) Cambridge MA MIT Press

Lee J-H amp Lin C-J (2001)Automatic model selection for support vector machines(Tech Rep) Taipei Taiwan Department of Computer Science and Informa-tion Engineering National Taiwan University

Li Y amp Long P M (1999) The relaxed online maximum margin algorithm InS A Solla T K Leen amp K-R Muller (Eds) Advances in neural informationprocessing systems 12 (pp 498ndash504) Cambridge MA MIT Press

Accurate On-line Support Vector Regression 2703

Mackey M C amp Glass L (1977)Oscillation and chaos in physiological controlsystems Science 197 287ndash289

Martin M (2002)On-line supportvectormachines for functionapproximation(TechRep LSI-02-11-R)Catalunya Spain Software Department Universitat Po-litecnica de Catalunya

Muller K-R Smola A J Ratsch G Scholkopf B Kohlmorgen J amp VapnikV (1997)Predicting time series with support vector machines In W Gerstner(Ed) Articial Neural NetworksmdashICANN rsquo97 (pp 999ndash1004) Berlin Springer-Verlag

Ralaivola L amp drsquoAlche-Buc F (2001) Incremental support vector machinelearning A local approach In G Dorffner H Bischof amp K Hornik (Eds) Ar-ticial Neural NetworksmdashICANN 2001 (pp 322ndash330) Berlin Springer-Verlag

Shevade S K Keerthi S S Bhattacharyya C amp Murthy K R K (1999)Improvements to SMO algorithm for SVM regression (Tech Rep No CD-99-16)Singapore National University of Singapore

Smola A (1998) Interior point optimizer for SVM pattern recognition SoftwareAvailable on-line at httpwwwkernel-machinesorgcodeprloqotargz

Smola A J amp Scholkopf B (1998) A tutorial on support vector regression (Neu-roCOLT Tech Rep No NC-TR-98-030) London Royal Holloway CollegeUniversity of London

Syed N A Liu H amp Sung K K (1999) Incremental learning with supportvector machines In Proceedings of the Workshop on Support Vector Machines atthe International Joint Conference on Articial IntelligencemdashIJCAI-99 San MateoMorgan Kaufmann

Tashman L J (2000) Out-of-sample tests of forecasting accuracy An analysisand review International Journal of Forecasting 16 437ndash450

Tay F E H amp Cao L (2001)Application of support vectormachines in nancialtime series forcasting Omega 29 309ndash317

Vanderbei R J (1999) LOQO An interior point code for quadratic program-ming Optimization Methods and Software 11 451ndash484

Vapnik V (1998) Statistical learning theory New York WileyVapnik V amp Chapelle O (1999)Bounds on error expectation for support vector

machine In A Smola P Bartlett B Scholkopf amp D Schuurmans (Eds)Advances in large margin classiers (pp 261ndash280) Cambridge MA MIT Press

Weigend A S amp Gershenfeld N A (1994)Time-series predictionForecasting thefuture and understanding the past Reading MA Addison-Wesley

Yildirim E A amp Wright S J (2002) Warm-start strategies in interior-pointmethods for linear programming SIAM Journal on Optimization 12 782ndash810

Received September 12 2002 accepted May 1 2003

Page 3: Accurate On-line Support Vector Regression · 2006. 8. 28. · Accurate On-line Support Vector Regression 2689 In the special case whenSset is empty, according to equations 3.1 and

Accurate On-line Support Vector Regression 2685

f(x)

f(x)-e

f(x)+e

x

x

x

y

Figure 1 The -insensitive loss function and the role of the slack variables raquo andraquo curren

This in turn leads to the dual optimization problem

minregregcurren

D D 12

lX

iD1

lX

jD1

Qijregi iexcl regcurreni regj iexcl regcurren

j C lX

iD1

regi C regcurreni

iexcllX

iD1

yiregi iexcl regcurreni

st 0 middot regi regcurreni middot C i D 1 l (23)

lX

iD1

regi iexcl regcurreni D 0

where Qij D copyxiTcopyxj D Kxi xj Here Kxi xj is a kernel function

(Smolaamp Scholkopf 1998) Given the solution of equation 23 the regressionfunction (21) can be written as

f x DlX

iD1

regi iexcl regcurreni Kxi x C b (24)

2686 J Ma J Theiler and S Perkins

The Lagrange formulation of equation 23 can be represented as

LD D12

lX

iD1

regi iexcl regcurreni regj iexcl regcurren

j C

lX

iD1

regi C regcurreni iexcl

lX

iD1

yiregi iexcl regcurreni

iexcllX

iD1

plusmniregi C plusmncurreni regcurren

i ClX

iD1

[uiregi iexcl C C ucurreni regcurren

i iexcl C]

C sup3

lX

iD1

regi iexcl regcurreni (25)

where plusmncurreni ucurren

i and sup3 are the Lagrange multipliers Optimizing this La-grangian leads to the Karush-Kuhn-Tucker (KKT) conditions

LD

regiD

lX

jD1

Qijregj iexcl regcurrenj C iexcl yi C sup3 iexcl plusmni C ui D 0

LD

regcurreni

D iexcllX

jD1

Qijregj iexcl regcurrenj C C yi iexcl sup3 iexcl plusmncurren

i C ucurreni D 0 (26)

plusmncurreni cedil 0 plusmncurren

i regcurreni D 0

ucurreni cedil 0 ucurren

i regcurreni iexcl C D 0

Note that sup3 in equation 26 is equal to b in equations 21 and 24 at optimality(Chang amp Lin 2002)

According to the KKT conditions 26 at most one of regi and regcurreni will be

nonzero and both are nonnegative Therefore we can dene a coefcientdifference microi as

microi D regi iexcl regcurreni (27)

and note that microi determines both regi and regcurreni

Dene a margin function hxi for the ith sample xi as

hxi acute f xi iexcl yi DlX

jD1

Qijmicroj iexcl yi C b (28)

Combining equations 26 27 and 28 we can obtain

8gtgtgtgtgtlt

gtgtgtgtgt

hxi cedil microi D iexclChxi D iexclC lt microi lt 0

iexcl middot hxi middot microi D 0

hxi D iexcl 0 lt microi lt Chxi middot iexcl microi D C

(29)

Accurate On-line Support Vector Regression 2687

There areve conditions in equation 29 compared to the three conditionsin support vector classication (see equation 2 in Cauwenberghs amp Poggio2001) but like the conditions in support vector classication they can beidentied with three subsets into which the samples in training set T can beclassied The difference is that two of the subsets (E and S) are themselvescomposed of two disconnected components depending on the sign of theerror f xi iexcl yi

The E Set Error support vectors E D fi j jmicroij D Cg (210)

The S Set Margin support vectors S D fi j 0 lt jmicroij lt Cg (211)

The R Set Remaining samples R D fi j microi D 0g (212)

3 Incremental Algorithm

The incremental algorithm updates the trained SVR function whenever anew sample xc is added to the training set T The basic idea is to changethe coefcient microc corresponding to the new sample xc in a nite number ofdiscrete steps until it meets the KKT conditions while ensuring that theexisting samples in T continue to satisfy the KKT conditions at each stepIn this section we rst derive the relation between the change of microc or 1microcand the change of other coefcients under the KKT conditions and thenpropose a method to determine the largest allowed 1microc for each step Apseudocode description of this algorithm is provided in the appendix

31 Derivation of the Incremental Relations Let xc be a new trainingsample that is added to T We initially set microc D 0 and then gradually change(increase or decrease) the value of microc under the KKT conditions equation 29

According to equations 26 27 and 29 the incremental relation between1hxi 1microi and 1b is given by

1hxi D Qic1microc ClX

iD1

Qij1microj C 1b (31)

From the equality condition in equation 23 we have

microc ClX

iD1

microi D 0 (32)

Combining equations 29 through 212 and 31 and 32 we obtainX

j2S

Qij1microj C 1b D iexclQic1microc where i 2 S

X

j2S

1microj D iexcl1microc (33)

2688 J Ma J Theiler and S Perkins

If we dene the index of the samples in the S set as

S D fs1 s2 sls g (34)

equation 33 can be represented in matrix form as

2

6664

0 1 cent cent cent 11 Qs1s1 cent cent cent Qs1sls

1 Qsls s1 cent cent cent Qsls sls

3

7775

2

6664

1b1micros1

1microsls

3

7775 D iexcl

2

6664

1Qs1c

Qsls c

3

7775 1microc (35)

that is2

6664

1b1micros1

1microsls

3

7775 D macr1microc (36)

where

macr D

2

6664

macr

macrs1

macrsls

3

7775 D iexclR

2

6664

1Qs1c

Qsls c

3

7775

where R D

2

6664

0 1 cent cent cent 11 Qs1s1 cent cent cent Qs1sls

1 Qsls s1 cent cent cent Qsls sls

3

7775

iexcl1

(37)

Dene a non-S or N set as N D E [ R D fn1 n2 nlng Combiningequations 29 through 212 31 and 36 we obtain

2

6664

1hxn1

1hxn2

1hxnln

3

7775D deg 1microc (38)

where

deg D

2

6664

Qn1cQn2c

Qnln c

3

7775C

2

6664

1 Qn1s1 cent cent cent Qn1sls

1 Qn2s1 cent cent cent Qn2sls

1 Qnln s1 cent cent cent Qnln sls

3

7775 macr (39)

Accurate On-line Support Vector Regression 2689

In the special case when S set is empty according to equations 31 and 32equation 39 simplies to 1hxn D 1b for all n 2 E [ R

Given 1microc we can update microi i 2 S and b according to equation 36 andupdate hxi i 2 N according to equation 38 Moreover equation 29 sug-gests that microi i 2 N and hxi i 2 S are constant if the S set stays unchangedTherefore the results presented in this section enable us to update all the microiand hxi given 1microc In the next section we address the question of how tond an appropriate 1microc

32 AOSVR Bookkeeping Procedure Equations 36 and 38 hold onlywhen the samples in the S set do not change membership Therefore 1microc ischosen to be the largest value that either can maintain the S set unchangedor lead to the termination of the incremental algorithm

The rst step is to determine whether the change 1microc should be positiveor negative According to equation 29

sign1microc D signyc iexcl f xc D signiexclhxc (310)

The next step is to determine a bound on 1microc imposed by each sample inthe training set To simplify exposition we consider only the case 1microc gt 0and remark that the case 1microc lt 0 is similarFor the new sample xc there are two cases

Case 1 hxc changes from hxc lt iexcl to hxc D iexcl and the newsample xc is added to the S set and the algorithm terminates

Case 2 If microc increases from microc lt C to microc D C the new sample xc is addedto the E set and the algorithml terminates

For each sample xi in the set S

Case 3 If microi changes from 0 lt jmicroij lt C to jmicroij D C sample xi changesfrom the S set to the E set If microi changes to microi D 0 sample xi changesfrom the S set to the R set

For each sample xi in the set E

Case 4 If hxi changes from jhxij gt to jhxij D xi is moved fromthe E set to the S set

For each sample xi in the set R

Case 5 If hxi changes from jhxij lt to jhxij D xi is moved fromthe R set to the S set

The bookkeeping procedure is to trace each sample in the training setT against these ve cases and determine the allowed 1microc for each sampleaccording to equation 36 or 38 The nal 1microc is dened as the one withminimum absolute value among all the possible 1microc

2690 J Ma J Theiler and S Perkins

33 Efciently Updating the R Matrix The matrix R that is used inequation 37

R D

2

6664

0 1 cent cent cent 11 Qs1s1 cent cent cent Qs1sls

1 Qsls s1 cent cent cent Qsls sls

3

7775

iexcl1

(311)

must be updated whenever the S set is changed Following Cauwenberghsand Poggio (2001) we can efciently update R without explicitlycomputingthe matrix inverse When the kth sample xsk in the S set is removed from theS set the new R can be obtained as follows

Rnew D RII iexclRIkRkI

Rkk where

I D [1 cent cent cent k k C 2 cent cent cent Sls C 1] (312)

When a new sample is added to S set the new R can be updated as follows

Rnew D

2

6664

0

R

00 cent cent cent 0 0

3

7775 C1degi

micromacr1

para poundmacrT 1

curren (313)

where macr is dened as

macr D iexclR

2

6664

1Qs1 i

Qsls i

3

7775

and degi is dened as

degi D Qii C

2

6664

1Qs1i

Qsls i

3

7775 macr

when the sample xi was moved from E set to R set In contrast when thesample xc is the sample added to S set macr is can be obtained according toequation 37 and degi is the last element of deg dened in equation 39

34 Initialization of the Incremental Algorithm An initial SVR solu-tion can be obtained from a batch SVR solution and in most cases that is themost efcient approach But it is sometimes convenient to use AOSVR to

Accurate On-line Support Vector Regression 2691

produce a full solution from scratch An efcient starting point is the two-sample solution Given a training set T D fx1 y1 x2 y2g with y1 cedil y2the solution of equation 23 is

micro1 D maxsup3

0 minsup3

Cy1 iexcl y2 iexcl 2

2K11 iexcl K12

acuteacute

micro2 D iexclmicro1

b D y1 D y2=2 (314)

The sets E S and R are initialized from these two points based on equa-tions 210 through 212 If the set S is nonempty the matrix R can be ini-tialized from equation 311 As long as S is empty the matrix R will not beused

4 Decremental Algorithm

The decremental (or ldquounlearningrdquo) algorithm is employed when an existingsample is removed from the training set If a sample xc is in the R set then itdoes not contribute to the SVR solution and removing it from the trainingset is trivial no adjustments are needed If on the other hand xc has anonzero coefcient then the idea is to gradually reduce the value of thecoefcient to zero while ensuring all the other samples in the training setcontinue to satisfy the KKT conditions

The decremental algorithm follows the incremental algorithm with a fewsmall adjustments

sup2 The direction of the change of microc is

sign1microc D sign f xc iexcl yc D signhxc (41)

sup2 There is no case 1 because the removed xc need not satisfy KKT con-ditions

sup2 The condition in case 2 becomes microc changing from jmicrocj gt 0 to microc D 0

5 Applications and Comparison with Batch Algorithms

The accurate on-line SVR (AOSVR) learning algorithm produces exactlythe same SVR as the conventional batch SVR learning algorithm and canbe applied in all scenarios where batch SVR is currently employed But foron-line time-series prediction and leave-one-out cross-validation (LOOCV)the AOSVR algorithm is particularly well suited In this section we demon-strate AOSVR for both of these applications and compare its performance toexisting batch SVR algorithms These comparisons are based on direct tim-ing of runs using Matlab implementations such timings should be treatedwith some caution as they can be sensitive to details of implementation

2692 J Ma J Theiler and S Perkins

51 AOSVR versus Batch SVR Algorithms with Warm Start Mostbatch algorithms for SVR are implemented as ldquocold startrdquo This is appro-priate when a t is desired to a batch of data that has not been seen beforeHowever in recent years there has been a growing interest in ldquowarm-startrdquoalgorithms that can save time by starting from an appropriate solution andquite a few articles have addressed this issue in the generic context of nu-meric programming (Gondzio 1998 Gondzio amp Grothey 2001 Yildirim ampWright 2002 Fliege amp Heseler 2002) The warm-start algorithms are usefulfor incremental (or decremental) learning because the solution with N iexcl 1(or N C 1) data points provides a natural starting point for nding the so-lution with N data points In this sense AOSVR is a kind of warm-startalgorithm for the QP problem equation 23 which is specially designedfor the incremental or decremental scenario This specialty allows AOSVRto achieve more efciency when handling SVR incremental or decrementallearning as demonstrated in our subsequent experiments

In the machine learning community three algorithms for batch SVRtrain-ing are widely recognized Gunn (1998) solved SVR training as a genericQP optimization we call this implementation QPSVMR Shevade KeerthiBhattacharyya and Murthy (1999) proposed an algorithm specially de-signed for SVR training and it is an improved version of the sequential min-imal optimization for SVM regression (SMOR) and Chang and Lin (2001)proposed another algorithm specially designed for SVR training which wecall LibSVMR since it is implemented as part of the LibSVM software pack-age We implemented all these algorithms so that they can run in both acold-start and a warm-start mode SMOR and LibSVMR are implementedin Matlab and both algorithms allow a straightforward warm-start realiza-tion Because QPSVMR is based on a generic QP algorithm it is much lessefcient than SMOR or LibSVMR To make our subsequent experimentsfeasible we had to implement the QPSVMR core in C (Smola 1998) Smolaessentially employs the interior point QP code of LOQO (Vanderbei 1999)The warm start of QPSVMR directly adopts the warm-start method embed-ded in Smolarsquos (1998) implementation

52 On-line Time-Series Prediction In recent years the use of SVR fortime-series prediction has attracted increased attention (Muller et al 1997Fernandez 1999 Tay amp Cao 2001) In an on-line scenario one updates amodel from incoming data and at the same time makes predictions basedon that model This arises for instance in market forecasting scenariosAnother potential application is the (near) real-time prediction of electrondensity around a satellite in the magnetosphere high charge densities candamage satellite equipment (Friedel Reeves and Obara 2002) and if timesof high charge can be predicted the most sensitive components can beturned off before they are damaged

In time-series prediction the prediction origin denoted O is the timefrom which the prediction is generated The time between the prediction

Accurate On-line Support Vector Regression 2693

origin and the predicted data point is the prediction horizon which forsimplicity we take as one time step

A typical on-line time-series prediction scenario can be represented asfollows (Tashman 2000)

1 Given a time series fxt t D 1 2 3 g and prediction origin Oconstruct a set of training samples AOB from the segment of timeseries fxt t D 1 Og as AOB D fXt yt t D B 0 iexcl 1gwhere Xt D [xt xt iexcl B C 1]T yt D xt C 1 and B is theembedding dimension of the training set AOB

2 Train a predictor PAOBI X from the training set AOB

3 Predict xO C 1 using OxO C 1 D PAOBI XO

4 When x0 C 1 becomes available update the prediction origin O DO C 1 Then go to step 1 and repeat the procedure

Note that the training set AOB keeps growing as O increases so thetraining of the predictor in step 2 becomes increasingly expensive There-fore many SVR-based time-series predictionsare implemented in a compro-mised way (Tay amp Cao 2001) After the predictor is obtained it stays xedand is not updated as new data arrive In contrast an on-line predictionalgorithm can take advantage of the fact that the training set is augmentedone sample at a time and continues to update and improve the model asmore data arrive

521 Experiments Two experiments were performed to compare theAOSVR algorithm with the batch SVR algorithm We were careful to usethe same algorithm parameters for on-line and batch SVR but since ourpurpose is to compare computational performance we did not attempt tooptimize these parameters for each data set In these experiments the kernelfunction is a gaussian radial basis function expiexcldeg kXiiexclXjk2 where deg D 1the regularization coefcient C and the insensitivity parameter in equation21 are set to 10 and 01 respectively the embedding dimension B of thetraining AOB is 5 Also we scale all the time-series to [iexcl11]

Three widely used benchmark time series are employed in both exper-iments (1) the Santa Fe Institute Competition time series A (Weigend ampGershenfeld1994) (2) the Mackey-Glass equation with iquest D 17 (Mackeyamp Glass 1977) and (3) the yearly average sunspot numbers recorded from1700 to 1995 Some basic information about these time series is listed in Ta-ble 1 The SV ratio is the number of support vectors divided by the numberof training samples This is based on a prediction of the last data point usingall previous data for training In general a higher SV ratio suggests that theunderlying problem is harder (Vapnik 1998)

The rst experiment demonstrates that using a xed predictor producesless accurate predictions than using a predictor that is updated as new data

2694 J Ma J Theiler and S Perkins

Table 1 Information Regarding Experimental Time Series

Number of Data Points SV Ratio

Santa Fe Institute 1000 452Mackey-Glass 1500 154Yearly Sunspot 292 4181

become available Two measurements are used to quantify the predictionperformance mean squared error (MSE) and mean absolute error (MAE)The predictors are initially trained on the rst half of the data in the timeseries In the xed case the same predictor is used to predict the second halfof the time series In the on-line case the predictor is updated whenever anew data point is available The performance measurements for both casesare calculated from the predicted and actual values of the second half of thedata in the time series As shown in Table 2 the on-line predictor outper-forms the xed predictor in every case We also note that the errors for thethree time series in Table 2 coincide with the estimated prediction difcultyin Table 1 based on the SV ratio

The second experiment compares AOSVR with batch implementationsusing both cold start and warm start in the on-line prediction scenarioFor each benchmark time series an initial SVR predictor is trained on therst two data points using the batch SVR algorithms For AOSVR we usedequation 314 Afterward both AOSVR and batch SVR algorithms are em-ployed in the on-line prediction mode for the remaining data points in thetime series AOSVR and the batch SVR algorithms produce exactly the sameprediction errors in this experiment so the comparison is only of predic-tion speed All six batch SVR algorithms are compared with AOSVR on thesunspot time series and the experimental results are plotted in Figure 2 Thex-axis of this plot is the number of data points to which the on-line predic-tion model is applied Note that the core of QPSVMR is implemented in CBecause the cold start and warm start of LibSVMR clearly outperform thoseof both SMOR and QPSVMR only the comparison between LibSVMR andAOSVR is carried out in our subsequent experiments The experimental re-

Table 2 Performance Comparison for On-line and Fixed Predictors

On-line Fixed

Santa Fe Institute MSE 00072 00097MAE 00588 00665

Mackey-Glass MSE 00034 00036MAE 00506 00522

Yearly Sunspot MSE 00263 00369MAE 01204 01365

Accurate On-line Support Vector Regression 2695

50 100 150 200 250

10shy 1

100

101

102

103

104

Number of Points

Tim

e (i

n s

ec)

SMOR (Cold Start)SMOR (Warm Start)QPSVMR (Cold Start)QPSVMR (Warm Start)LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

Figure 2 Real-time prediction time of yearly sunspot time series

sults of both Santa Fe Institute and Mackey-Glass time series are presentedin Figures 3 and 4 respectively

These experimental results demonstrate that AOSVR algorithm is gen-erally much faster than the batch SVR algorithms when applied to on-lineprediction Comparison of Figures 2 and 4 furthermore suggests that morespeed improvement is achieved on the sunspot data than on the Mackey-Glass We speculate that this is because the sunspot problem is ldquoharderrdquothan the Mackey-Glass (it has a higher support vector ratio) and that theperformance of the AOSVR algorithm is less sensitive to problem difculty

To test this hypothesis we compared the performance of AOSVR to Lib-SVMR on a single data set (the sunspots) whose difculty was adjusted bychanging the value of A smaller leads to a higher support vector ratioand a more difcult problem Both the AOSVR and LibSVMR algorithmswere employed for on line prediction of the full time series The overallprediction times are plotted against in Figure 5 Where AOSVR perfor-mance varied by a factor of less than 10 over the range of the LibSVMRperformance varied by a factor of about 100

522 Limited-Memory Version of the On-line Time-Series Prediction ScenarioOne problem with on-line time-series prediction in general is that the longerthe prediction goes on the bigger the training set AOB will become and the

2696 J Ma J Theiler and S Perkins

100 200 300 400 500 600 700 800 900

10shy 1

100

101

102

103

104

Number of Points

Tim

e (i

n s

ec)

LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

Figure 3 Real-time prediction time of Santa Fe Institute time series

200 400 600 800 1000 1200 1400

10shy 1

100

101

102

103

104

Number of Points

Tim

e (i

n s

ec)

LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

Figure 4 Real-time prediction time of Mackey-Glass time series

Accurate On-line Support Vector Regression 2697

005 01 015 02 025 03 035 04 045 05

102

Ove

rall

Pre

dic

tio

n T

ime LibSVMR (Cold Start)

LibSVMR (Warm Start)AOSVR

005 01 015 02 025 03 035 04 045 05

200

400

600

800

1000

e

Ove

rall

Pre

dic

tio

n T

ime LibSVMR (Cold Start)

LibSVMR (Warm Start)AOSVR

Figure 5 Semilog and linear plots of prediction time of yearly sunspot timeseries

more SVs will be involved in SVR predictor A complicated SVR predictorimposes both memory and computation stress on the prediction systemOne way to deal with this problem is to impose a ldquoforgettingrdquo time WWhen training set AOB grows to this maximum W then the decrementalalgorithm is used to remove the oldest sample before the next new sampleis added to the training set

We note that this variant of the on-line prediction scenario is also poten-tially suitable for nonstationary time series as it can be updated in real timeto t the most recent behavior of the time series More rigorous investigationin this direction will be a future effort

53 Leave-One-Out Cross-Validation Cross-validation is a useful toolfor assessing the generalization ability of a machine learning algorithm Theidea is to train on one subset of the data and then to test the accuracy ofthe predictor on a separate disjoint subset In leave-one-out cross-validation(LOOCV) only a single sample is used for testing and all the rest are usedfor training Generally this is repeated for every sample in the data setWhen the batch SVR is employed LOOCV can be very expensive sincea fullretraining is done for each sample One compromise approach is to estimateLOOCV with related but less expensive approximations such as the xi-

2698 J Ma J Theiler and S Perkins

alpha bound (Joachims 2000) and approximate span bound (Vapnik ampChapelle 1999) Although Lee and Lin (2001) proposed a numerical solutionto reduce the computation for directly implementing LOOCV the amount ofcomputation required is still considerable Also the accuracy of the LOOCVresult obtained using this method can be potentially compromised becausea different parameter set is employed in LOOCV and in the nal training

The decremental algorithm of AOSVR provides an efcient implemen-tation of LOOCV for SVR

1 Given a data set D construct the SVR function f x from the wholedata set D using batch SVR learning algorithm

2 For each nonsupport vector xi in the data set D calculate error eicorresponding to xi as ei D yi iexcl f xi where yi is the target valuecorresponding to xi

3 For each support vector xi involved in the SVR function f x

(a) Unlearn xi from the SVR function f x using the decrementalalgorithm to obtain the SVR function fix which would beconstructed from the data set Di D D=fxig

(b) Calculate error ei corresponding to support vector xi as ei Dyi iexcl fixi where yi is the target value corresponding to xi

4 Knowing the error for each sample xi in D it is possible to constructa variety of overall measures a simple choice is the MSE

MSELOOCVD D1N

NX

ie2i (51)

where N is the number of samples in data set D Other choices oferror metric such as MAE can be obtained by altering equation 51appropriately

531 Experiment The algorithm parameters in this experiment are setthe same as those in the experiments in section 511 Two famous regressiondata sets the Auto-MPG and Boston Housing data sets are chosen from theUCI Machine-Learning Repository Some basic information on these datasets is listed in Table 3

Table 3 Information Regarding Experimental Regression Data Sets

Number of Attributes Number of Samples SV Ratio

Auto-MPG 7 392 4107Boston Housing 13 506 3636

Accurate On-line Support Vector Regression 2699

200 220 240 260 280 300 320 340 360 380

102

LOOCV Time Comparison of Autoshy MPG Dataset

Samples in Training Set

LO

OC

V T

ime

(sec

on

d)

LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

300 350 400 450 500

101

102

LOOCV Time Comparison of Boston Housing Dataset

Samples in Training Set

LO

OC

V T

ime

(sec

on

d)

Figure 6 Semilog plots of LOOCV time of Auto-MPG and Boston Housing dataset

The experimental results of both data sets are presented in Figure 6 Thex-axis is the size of the training set upon which the LOOCV is implementedThese plots show that AOSVR-based LOOCV is much faster than its Lib-SVMR counterpart

6 Conclusion

We have developed and implemented an accurate on-line support vectorregression (AOSVR) algorithm that permits efcient retraining when a newsample is added to or an existing sample is removed from the training setAOSVR is applied to on-line time-series prediction and to leave-one-outcross-validation and the experimental results demonstrate that the AOSVRalgorithm is more efcient than conventional batch SVR in these scenariosMoreover AOSVR appears less sensitive than batch SVR to the difculty ofthe underlying problem

After this manuscript was prepared we were made aware of a similar on-line SVR algorithm which was independently presented in Martin (2002)

2700 J Ma J Theiler and S Perkins

Appendix Pseudocode for Incrementing AOSVR with a New DataSample

Inputs

sup2 Training set T D fxi yi i D 1 lgsup2 Coefcients fmicroi i D 1 lg and bias b

sup2 Partition of samples into sets S E and R

sup2 Matrix R dened in equation 311

sup2 New sample xc yc

Outputs

sup2 Updated coefcients fmicroi i D 1 l C 1g and bias b

sup2 Updated Matrix R

sup2 Updated partition of samples into sets S E and R

AOSVR Incremental Algorithm

sup2 Initialize microc D 0

sup2 Compute f xc DX

i2E[S

microiQic C b

sup2 Compute hxc D f xc iexcl yc

sup2 I jhxcj middot then assign xc to R and terminate

sup2 Let q D signiexclhxc be the sign that 1microc will take

sup2 Do until the new sample xc meets the KKT condition

mdash Update macr deg according to equations 37 and 39mdash Start bookkeeping procedure

Check the new sample xcmdashLc1 D iexclhxc iexcl q=degc (Case 1)mdashLc2 D qC iexcl microc (Case 2)Check each sample xi in the set S (Case 3)mdashIf q i gt 0 and C gt microi cedil 0 LS

i D C iexcl microi=macri

mdashIf q i gt 0 and 0 gt microi cedil iexclC LSi D iexclmicroi=macri

mdashIf q i lt 0 and C cedil microi gt 0 LSi D iexclmicroi=macri

mdashIf q i lt 0 and 0 cedil microi gt iexclC LSi D iexclC iexcl microi=macri

Check each sample xi in the set E (Case 4)mdashLE

i D iexclhxi iexcl signqmacri=macri

Check each sample xi in the set R (Case 5)mdashLR

i D iexclhxi iexcl signqmacri=macri

Accurate On-line Support Vector Regression 2701

Set 1microc D qminjLc1j jLc2j jLSj jLEj jLRj whereLS D fLS

i i 2 Sg LE D fLEi i 2 Eg and LR D fLR

i i 2 RgLet Flag be the case number that determines 1micro

Let xl be the particular sample in T that determines 1microc

mdash End Bookkeeping Proceduremdash Update microc b and microi i 2 S according to equation 36mdash Update hxi i 2 E [ R according to equation 38mdash Switch Flag

(Flag=1)Add new sample xc to set S update matrix R accordingto equation 313(Flag=2)Add new sample xc to set E(Flag=3)If microl D 0 move xl to set R update R according to equa-tion 312If jmicrolj D C move xl to set E update R according toequation 312(Flag=4)Move xl to set S update R according to equation 313(Flag=5)Move xl to set S update R according to equation 313

mdash End Switch Flagmdash If Flagmiddot 2 terminate otherwise continue the Do-Loop

sup2 Terminate incremental algorithm ready for the next sample

Acknowledgments

We thank Chih-Jen Lin in National University of Taiwan for useful sug-gestions on some implementation issues We also thank the anonymousreviewers for pointing us to the work of Martin (2002) and for suggestingthat we compare AOSVR to the warm-start variants of batch algorithmsThis work is supported by the NASA project NRA-00-01-AISR-088 and bythe Los Alamos Laboratory Directed Research and Development (LDRD)program

References

Cauwenberghs G amp Poggio T (2001) Incremental and decremental supportvector machine learning In T K Leen T G Dietterich amp V Tresp (Eds)

2702 J Ma J Theiler and S Perkins

Advances in neural informationprocessingsystems 13 (pp409ndash123)CambridgeMA MIT Press

Chang C-C amp Lin C-J (2001) LIBSVM a library for support vector machinesSoftware Available on-line at httpwwwcsientuedutwraquocjlinlibsvm

Chang C-C amp Lin C-J (2002) Training ordm-support vector regression Theoryand algorithms Neural Computation 14 1959ndash1977

CsatoL amp Opper M (2001)Sparse representation for gaussian process modelsIn T K Leen T G Dietterich amp V Tresp (Eds) Advances in neural informationprocessing systems 13 (pp 444ndash450) Cambridge MA MIT Press

Fernandez R (1999) Predicting time series with a local support vector regres-sion machine InAdvancedCourse onArticial Intelligence (ACAI rsquo99) Availableon-line at httpwwwiitdemokritosgrskeleetnacai99

Fliege J amp Heseler A (2002) Constructing approximations to the ef-cient set of convex quadratic multiobjective problems Dortmund GermanyFachbereich Mathematik Universitat Dortmund Available on-line athttpwwwoptimization-onlineorgDB HTML200201426html

Friedel RH Reeves G D amp Obara T (2002) Relativistic electron dynamics inthe inner magnetospheremdasha review Journal of Atmosphericand Solar-TerrestrialPhysics 64 265ndash282

Gentile C (2001)A new approximate maximal margin classication algorithmJournal of Machine Learning Research 2 213ndash242

Gondzio J (1998)Warm start of the primal-dual method applied in the cuttingplane scheme Mathematical Programming 83 125ndash143

Gondzio J amp Grothey A (2001) Reoptimization with the primal-dual interiorpoint method SIAM Journal on Optimization 13 842ndash864

Graepel T Herbrich R amp Williamson R C (2001) From margin to sparsityIn T K Leen T G Dietterich amp V Tresp (Eds) Advances in neural informationprocessing systems 13 (pp 210ndash216) Cambridge MA MIT Press

Gunn S (1998) Matlab SVM toolbox Software Available on-line at httpwwwisisecssotonacukresourcessvminfo

Herbster M (2001) Learning additive models online with fast evaluating ker-nels In D P Helmbold amp B Williamson (Eds) Proceedings of the 14th An-nual Conference on Computational Learning Theory (pp 444ndash460) New YorkSpringer-Verlag

Joachims T (2000) Estimating the generalization performance of an SVM ef-ciently In P Langley (Ed) Proceedings of the Seventeenth International Confer-ence on Machine Learning (pp 431ndash438) San Mateo Morgan Kaufmann

Kivinen J Smola A Jamp Williamson R C (2002)Online learning with kernelsIn T G Dietterich S Becker amp Z Ghahramani (Eds) Advances in neuralinformation processing systems 14 (pp 785ndash792) Cambridge MA MIT Press

Lee J-H amp Lin C-J (2001)Automatic model selection for support vector machines(Tech Rep) Taipei Taiwan Department of Computer Science and Informa-tion Engineering National Taiwan University

Li Y amp Long P M (1999) The relaxed online maximum margin algorithm InS A Solla T K Leen amp K-R Muller (Eds) Advances in neural informationprocessing systems 12 (pp 498ndash504) Cambridge MA MIT Press

Accurate On-line Support Vector Regression 2703

Mackey M C amp Glass L (1977)Oscillation and chaos in physiological controlsystems Science 197 287ndash289

Martin M (2002)On-line supportvectormachines for functionapproximation(TechRep LSI-02-11-R)Catalunya Spain Software Department Universitat Po-litecnica de Catalunya

Muller K-R Smola A J Ratsch G Scholkopf B Kohlmorgen J amp VapnikV (1997)Predicting time series with support vector machines In W Gerstner(Ed) Articial Neural NetworksmdashICANN rsquo97 (pp 999ndash1004) Berlin Springer-Verlag

Ralaivola L amp drsquoAlche-Buc F (2001) Incremental support vector machinelearning A local approach In G Dorffner H Bischof amp K Hornik (Eds) Ar-ticial Neural NetworksmdashICANN 2001 (pp 322ndash330) Berlin Springer-Verlag

Shevade S K Keerthi S S Bhattacharyya C amp Murthy K R K (1999)Improvements to SMO algorithm for SVM regression (Tech Rep No CD-99-16)Singapore National University of Singapore

Smola A (1998) Interior point optimizer for SVM pattern recognition SoftwareAvailable on-line at httpwwwkernel-machinesorgcodeprloqotargz

Smola A J amp Scholkopf B (1998) A tutorial on support vector regression (Neu-roCOLT Tech Rep No NC-TR-98-030) London Royal Holloway CollegeUniversity of London

Syed N A Liu H amp Sung K K (1999) Incremental learning with supportvector machines In Proceedings of the Workshop on Support Vector Machines atthe International Joint Conference on Articial IntelligencemdashIJCAI-99 San MateoMorgan Kaufmann

Tashman L J (2000) Out-of-sample tests of forecasting accuracy An analysisand review International Journal of Forecasting 16 437ndash450

Tay F E H amp Cao L (2001)Application of support vectormachines in nancialtime series forcasting Omega 29 309ndash317

Vanderbei R J (1999) LOQO An interior point code for quadratic program-ming Optimization Methods and Software 11 451ndash484

Vapnik V (1998) Statistical learning theory New York WileyVapnik V amp Chapelle O (1999)Bounds on error expectation for support vector

machine In A Smola P Bartlett B Scholkopf amp D Schuurmans (Eds)Advances in large margin classiers (pp 261ndash280) Cambridge MA MIT Press

Weigend A S amp Gershenfeld N A (1994)Time-series predictionForecasting thefuture and understanding the past Reading MA Addison-Wesley

Yildirim E A amp Wright S J (2002) Warm-start strategies in interior-pointmethods for linear programming SIAM Journal on Optimization 12 782ndash810

Received September 12 2002 accepted May 1 2003

Page 4: Accurate On-line Support Vector Regression · 2006. 8. 28. · Accurate On-line Support Vector Regression 2689 In the special case whenSset is empty, according to equations 3.1 and

2686 J Ma J Theiler and S Perkins

The Lagrange formulation of equation 23 can be represented as

LD D12

lX

iD1

regi iexcl regcurreni regj iexcl regcurren

j C

lX

iD1

regi C regcurreni iexcl

lX

iD1

yiregi iexcl regcurreni

iexcllX

iD1

plusmniregi C plusmncurreni regcurren

i ClX

iD1

[uiregi iexcl C C ucurreni regcurren

i iexcl C]

C sup3

lX

iD1

regi iexcl regcurreni (25)

where plusmncurreni ucurren

i and sup3 are the Lagrange multipliers Optimizing this La-grangian leads to the Karush-Kuhn-Tucker (KKT) conditions

LD

regiD

lX

jD1

Qijregj iexcl regcurrenj C iexcl yi C sup3 iexcl plusmni C ui D 0

LD

regcurreni

D iexcllX

jD1

Qijregj iexcl regcurrenj C C yi iexcl sup3 iexcl plusmncurren

i C ucurreni D 0 (26)

plusmncurreni cedil 0 plusmncurren

i regcurreni D 0

ucurreni cedil 0 ucurren

i regcurreni iexcl C D 0

Note that sup3 in equation 26 is equal to b in equations 21 and 24 at optimality(Chang amp Lin 2002)

According to the KKT conditions 26 at most one of regi and regcurreni will be

nonzero and both are nonnegative Therefore we can dene a coefcientdifference microi as

microi D regi iexcl regcurreni (27)

and note that microi determines both regi and regcurreni

Dene a margin function hxi for the ith sample xi as

hxi acute f xi iexcl yi DlX

jD1

Qijmicroj iexcl yi C b (28)

Combining equations 26 27 and 28 we can obtain

8gtgtgtgtgtlt

gtgtgtgtgt

hxi cedil microi D iexclChxi D iexclC lt microi lt 0

iexcl middot hxi middot microi D 0

hxi D iexcl 0 lt microi lt Chxi middot iexcl microi D C

(29)

Accurate On-line Support Vector Regression 2687

There areve conditions in equation 29 compared to the three conditionsin support vector classication (see equation 2 in Cauwenberghs amp Poggio2001) but like the conditions in support vector classication they can beidentied with three subsets into which the samples in training set T can beclassied The difference is that two of the subsets (E and S) are themselvescomposed of two disconnected components depending on the sign of theerror f xi iexcl yi

The E Set Error support vectors E D fi j jmicroij D Cg (210)

The S Set Margin support vectors S D fi j 0 lt jmicroij lt Cg (211)

The R Set Remaining samples R D fi j microi D 0g (212)

3 Incremental Algorithm

The incremental algorithm updates the trained SVR function whenever anew sample xc is added to the training set T The basic idea is to changethe coefcient microc corresponding to the new sample xc in a nite number ofdiscrete steps until it meets the KKT conditions while ensuring that theexisting samples in T continue to satisfy the KKT conditions at each stepIn this section we rst derive the relation between the change of microc or 1microcand the change of other coefcients under the KKT conditions and thenpropose a method to determine the largest allowed 1microc for each step Apseudocode description of this algorithm is provided in the appendix

31 Derivation of the Incremental Relations Let xc be a new trainingsample that is added to T We initially set microc D 0 and then gradually change(increase or decrease) the value of microc under the KKT conditions equation 29

According to equations 26 27 and 29 the incremental relation between1hxi 1microi and 1b is given by

1hxi D Qic1microc ClX

iD1

Qij1microj C 1b (31)

From the equality condition in equation 23 we have

microc ClX

iD1

microi D 0 (32)

Combining equations 29 through 212 and 31 and 32 we obtainX

j2S

Qij1microj C 1b D iexclQic1microc where i 2 S

X

j2S

1microj D iexcl1microc (33)

2688 J Ma J Theiler and S Perkins

If we dene the index of the samples in the S set as

S D fs1 s2 sls g (34)

equation 33 can be represented in matrix form as

2

6664

0 1 cent cent cent 11 Qs1s1 cent cent cent Qs1sls

1 Qsls s1 cent cent cent Qsls sls

3

7775

2

6664

1b1micros1

1microsls

3

7775 D iexcl

2

6664

1Qs1c

Qsls c

3

7775 1microc (35)

that is2

6664

1b1micros1

1microsls

3

7775 D macr1microc (36)

where

macr D

2

6664

macr

macrs1

macrsls

3

7775 D iexclR

2

6664

1Qs1c

Qsls c

3

7775

where R D

2

6664

0 1 cent cent cent 11 Qs1s1 cent cent cent Qs1sls

1 Qsls s1 cent cent cent Qsls sls

3

7775

iexcl1

(37)

Dene a non-S or N set as N D E [ R D fn1 n2 nlng Combiningequations 29 through 212 31 and 36 we obtain

2

6664

1hxn1

1hxn2

1hxnln

3

7775D deg 1microc (38)

where

deg D

2

6664

Qn1cQn2c

Qnln c

3

7775C

2

6664

1 Qn1s1 cent cent cent Qn1sls

1 Qn2s1 cent cent cent Qn2sls

1 Qnln s1 cent cent cent Qnln sls

3

7775 macr (39)

Accurate On-line Support Vector Regression 2689

In the special case when S set is empty according to equations 31 and 32equation 39 simplies to 1hxn D 1b for all n 2 E [ R

Given 1microc we can update microi i 2 S and b according to equation 36 andupdate hxi i 2 N according to equation 38 Moreover equation 29 sug-gests that microi i 2 N and hxi i 2 S are constant if the S set stays unchangedTherefore the results presented in this section enable us to update all the microiand hxi given 1microc In the next section we address the question of how tond an appropriate 1microc

32 AOSVR Bookkeeping Procedure Equations 36 and 38 hold onlywhen the samples in the S set do not change membership Therefore 1microc ischosen to be the largest value that either can maintain the S set unchangedor lead to the termination of the incremental algorithm

The rst step is to determine whether the change 1microc should be positiveor negative According to equation 29

sign1microc D signyc iexcl f xc D signiexclhxc (310)

The next step is to determine a bound on 1microc imposed by each sample inthe training set To simplify exposition we consider only the case 1microc gt 0and remark that the case 1microc lt 0 is similarFor the new sample xc there are two cases

Case 1 hxc changes from hxc lt iexcl to hxc D iexcl and the newsample xc is added to the S set and the algorithm terminates

Case 2 If microc increases from microc lt C to microc D C the new sample xc is addedto the E set and the algorithml terminates

For each sample xi in the set S

Case 3 If microi changes from 0 lt jmicroij lt C to jmicroij D C sample xi changesfrom the S set to the E set If microi changes to microi D 0 sample xi changesfrom the S set to the R set

For each sample xi in the set E

Case 4 If hxi changes from jhxij gt to jhxij D xi is moved fromthe E set to the S set

For each sample xi in the set R

Case 5 If hxi changes from jhxij lt to jhxij D xi is moved fromthe R set to the S set

The bookkeeping procedure is to trace each sample in the training setT against these ve cases and determine the allowed 1microc for each sampleaccording to equation 36 or 38 The nal 1microc is dened as the one withminimum absolute value among all the possible 1microc

2690 J Ma J Theiler and S Perkins

33 Efciently Updating the R Matrix The matrix R that is used inequation 37

R D

2

6664

0 1 cent cent cent 11 Qs1s1 cent cent cent Qs1sls

1 Qsls s1 cent cent cent Qsls sls

3

7775

iexcl1

(311)

must be updated whenever the S set is changed Following Cauwenberghsand Poggio (2001) we can efciently update R without explicitlycomputingthe matrix inverse When the kth sample xsk in the S set is removed from theS set the new R can be obtained as follows

Rnew D RII iexclRIkRkI

Rkk where

I D [1 cent cent cent k k C 2 cent cent cent Sls C 1] (312)

When a new sample is added to S set the new R can be updated as follows

Rnew D

2

6664

0

R

00 cent cent cent 0 0

3

7775 C1degi

micromacr1

para poundmacrT 1

curren (313)

where macr is dened as

macr D iexclR

2

6664

1Qs1 i

Qsls i

3

7775

and degi is dened as

degi D Qii C

2

6664

1Qs1i

Qsls i

3

7775 macr

when the sample xi was moved from E set to R set In contrast when thesample xc is the sample added to S set macr is can be obtained according toequation 37 and degi is the last element of deg dened in equation 39

34 Initialization of the Incremental Algorithm An initial SVR solu-tion can be obtained from a batch SVR solution and in most cases that is themost efcient approach But it is sometimes convenient to use AOSVR to

Accurate On-line Support Vector Regression 2691

produce a full solution from scratch An efcient starting point is the two-sample solution Given a training set T D fx1 y1 x2 y2g with y1 cedil y2the solution of equation 23 is

micro1 D maxsup3

0 minsup3

Cy1 iexcl y2 iexcl 2

2K11 iexcl K12

acuteacute

micro2 D iexclmicro1

b D y1 D y2=2 (314)

The sets E S and R are initialized from these two points based on equa-tions 210 through 212 If the set S is nonempty the matrix R can be ini-tialized from equation 311 As long as S is empty the matrix R will not beused

4 Decremental Algorithm

The decremental (or ldquounlearningrdquo) algorithm is employed when an existingsample is removed from the training set If a sample xc is in the R set then itdoes not contribute to the SVR solution and removing it from the trainingset is trivial no adjustments are needed If on the other hand xc has anonzero coefcient then the idea is to gradually reduce the value of thecoefcient to zero while ensuring all the other samples in the training setcontinue to satisfy the KKT conditions

The decremental algorithm follows the incremental algorithm with a fewsmall adjustments

sup2 The direction of the change of microc is

sign1microc D sign f xc iexcl yc D signhxc (41)

sup2 There is no case 1 because the removed xc need not satisfy KKT con-ditions

sup2 The condition in case 2 becomes microc changing from jmicrocj gt 0 to microc D 0

5 Applications and Comparison with Batch Algorithms

The accurate on-line SVR (AOSVR) learning algorithm produces exactlythe same SVR as the conventional batch SVR learning algorithm and canbe applied in all scenarios where batch SVR is currently employed But foron-line time-series prediction and leave-one-out cross-validation (LOOCV)the AOSVR algorithm is particularly well suited In this section we demon-strate AOSVR for both of these applications and compare its performance toexisting batch SVR algorithms These comparisons are based on direct tim-ing of runs using Matlab implementations such timings should be treatedwith some caution as they can be sensitive to details of implementation

2692 J Ma J Theiler and S Perkins

51 AOSVR versus Batch SVR Algorithms with Warm Start Mostbatch algorithms for SVR are implemented as ldquocold startrdquo This is appro-priate when a t is desired to a batch of data that has not been seen beforeHowever in recent years there has been a growing interest in ldquowarm-startrdquoalgorithms that can save time by starting from an appropriate solution andquite a few articles have addressed this issue in the generic context of nu-meric programming (Gondzio 1998 Gondzio amp Grothey 2001 Yildirim ampWright 2002 Fliege amp Heseler 2002) The warm-start algorithms are usefulfor incremental (or decremental) learning because the solution with N iexcl 1(or N C 1) data points provides a natural starting point for nding the so-lution with N data points In this sense AOSVR is a kind of warm-startalgorithm for the QP problem equation 23 which is specially designedfor the incremental or decremental scenario This specialty allows AOSVRto achieve more efciency when handling SVR incremental or decrementallearning as demonstrated in our subsequent experiments

In the machine learning community three algorithms for batch SVRtrain-ing are widely recognized Gunn (1998) solved SVR training as a genericQP optimization we call this implementation QPSVMR Shevade KeerthiBhattacharyya and Murthy (1999) proposed an algorithm specially de-signed for SVR training and it is an improved version of the sequential min-imal optimization for SVM regression (SMOR) and Chang and Lin (2001)proposed another algorithm specially designed for SVR training which wecall LibSVMR since it is implemented as part of the LibSVM software pack-age We implemented all these algorithms so that they can run in both acold-start and a warm-start mode SMOR and LibSVMR are implementedin Matlab and both algorithms allow a straightforward warm-start realiza-tion Because QPSVMR is based on a generic QP algorithm it is much lessefcient than SMOR or LibSVMR To make our subsequent experimentsfeasible we had to implement the QPSVMR core in C (Smola 1998) Smolaessentially employs the interior point QP code of LOQO (Vanderbei 1999)The warm start of QPSVMR directly adopts the warm-start method embed-ded in Smolarsquos (1998) implementation

52 On-line Time-Series Prediction In recent years the use of SVR fortime-series prediction has attracted increased attention (Muller et al 1997Fernandez 1999 Tay amp Cao 2001) In an on-line scenario one updates amodel from incoming data and at the same time makes predictions basedon that model This arises for instance in market forecasting scenariosAnother potential application is the (near) real-time prediction of electrondensity around a satellite in the magnetosphere high charge densities candamage satellite equipment (Friedel Reeves and Obara 2002) and if timesof high charge can be predicted the most sensitive components can beturned off before they are damaged

In time-series prediction the prediction origin denoted O is the timefrom which the prediction is generated The time between the prediction

Accurate On-line Support Vector Regression 2693

origin and the predicted data point is the prediction horizon which forsimplicity we take as one time step

A typical on-line time-series prediction scenario can be represented asfollows (Tashman 2000)

1 Given a time series fxt t D 1 2 3 g and prediction origin Oconstruct a set of training samples AOB from the segment of timeseries fxt t D 1 Og as AOB D fXt yt t D B 0 iexcl 1gwhere Xt D [xt xt iexcl B C 1]T yt D xt C 1 and B is theembedding dimension of the training set AOB

2 Train a predictor PAOBI X from the training set AOB

3 Predict xO C 1 using OxO C 1 D PAOBI XO

4 When x0 C 1 becomes available update the prediction origin O DO C 1 Then go to step 1 and repeat the procedure

Note that the training set AOB keeps growing as O increases so thetraining of the predictor in step 2 becomes increasingly expensive There-fore many SVR-based time-series predictionsare implemented in a compro-mised way (Tay amp Cao 2001) After the predictor is obtained it stays xedand is not updated as new data arrive In contrast an on-line predictionalgorithm can take advantage of the fact that the training set is augmentedone sample at a time and continues to update and improve the model asmore data arrive

521 Experiments Two experiments were performed to compare theAOSVR algorithm with the batch SVR algorithm We were careful to usethe same algorithm parameters for on-line and batch SVR but since ourpurpose is to compare computational performance we did not attempt tooptimize these parameters for each data set In these experiments the kernelfunction is a gaussian radial basis function expiexcldeg kXiiexclXjk2 where deg D 1the regularization coefcient C and the insensitivity parameter in equation21 are set to 10 and 01 respectively the embedding dimension B of thetraining AOB is 5 Also we scale all the time-series to [iexcl11]

Three widely used benchmark time series are employed in both exper-iments (1) the Santa Fe Institute Competition time series A (Weigend ampGershenfeld1994) (2) the Mackey-Glass equation with iquest D 17 (Mackeyamp Glass 1977) and (3) the yearly average sunspot numbers recorded from1700 to 1995 Some basic information about these time series is listed in Ta-ble 1 The SV ratio is the number of support vectors divided by the numberof training samples This is based on a prediction of the last data point usingall previous data for training In general a higher SV ratio suggests that theunderlying problem is harder (Vapnik 1998)

The rst experiment demonstrates that using a xed predictor producesless accurate predictions than using a predictor that is updated as new data

2694 J Ma J Theiler and S Perkins

Table 1 Information Regarding Experimental Time Series

Number of Data Points SV Ratio

Santa Fe Institute 1000 452Mackey-Glass 1500 154Yearly Sunspot 292 4181

become available Two measurements are used to quantify the predictionperformance mean squared error (MSE) and mean absolute error (MAE)The predictors are initially trained on the rst half of the data in the timeseries In the xed case the same predictor is used to predict the second halfof the time series In the on-line case the predictor is updated whenever anew data point is available The performance measurements for both casesare calculated from the predicted and actual values of the second half of thedata in the time series As shown in Table 2 the on-line predictor outper-forms the xed predictor in every case We also note that the errors for thethree time series in Table 2 coincide with the estimated prediction difcultyin Table 1 based on the SV ratio

The second experiment compares AOSVR with batch implementationsusing both cold start and warm start in the on-line prediction scenarioFor each benchmark time series an initial SVR predictor is trained on therst two data points using the batch SVR algorithms For AOSVR we usedequation 314 Afterward both AOSVR and batch SVR algorithms are em-ployed in the on-line prediction mode for the remaining data points in thetime series AOSVR and the batch SVR algorithms produce exactly the sameprediction errors in this experiment so the comparison is only of predic-tion speed All six batch SVR algorithms are compared with AOSVR on thesunspot time series and the experimental results are plotted in Figure 2 Thex-axis of this plot is the number of data points to which the on-line predic-tion model is applied Note that the core of QPSVMR is implemented in CBecause the cold start and warm start of LibSVMR clearly outperform thoseof both SMOR and QPSVMR only the comparison between LibSVMR andAOSVR is carried out in our subsequent experiments The experimental re-

Table 2 Performance Comparison for On-line and Fixed Predictors

On-line Fixed

Santa Fe Institute MSE 00072 00097MAE 00588 00665

Mackey-Glass MSE 00034 00036MAE 00506 00522

Yearly Sunspot MSE 00263 00369MAE 01204 01365

Accurate On-line Support Vector Regression 2695

50 100 150 200 250

10shy 1

100

101

102

103

104

Number of Points

Tim

e (i

n s

ec)

SMOR (Cold Start)SMOR (Warm Start)QPSVMR (Cold Start)QPSVMR (Warm Start)LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

Figure 2 Real-time prediction time of yearly sunspot time series

sults of both Santa Fe Institute and Mackey-Glass time series are presentedin Figures 3 and 4 respectively

These experimental results demonstrate that AOSVR algorithm is gen-erally much faster than the batch SVR algorithms when applied to on-lineprediction Comparison of Figures 2 and 4 furthermore suggests that morespeed improvement is achieved on the sunspot data than on the Mackey-Glass We speculate that this is because the sunspot problem is ldquoharderrdquothan the Mackey-Glass (it has a higher support vector ratio) and that theperformance of the AOSVR algorithm is less sensitive to problem difculty

To test this hypothesis we compared the performance of AOSVR to Lib-SVMR on a single data set (the sunspots) whose difculty was adjusted bychanging the value of A smaller leads to a higher support vector ratioand a more difcult problem Both the AOSVR and LibSVMR algorithmswere employed for on line prediction of the full time series The overallprediction times are plotted against in Figure 5 Where AOSVR perfor-mance varied by a factor of less than 10 over the range of the LibSVMRperformance varied by a factor of about 100

522 Limited-Memory Version of the On-line Time-Series Prediction ScenarioOne problem with on-line time-series prediction in general is that the longerthe prediction goes on the bigger the training set AOB will become and the

2696 J Ma J Theiler and S Perkins

100 200 300 400 500 600 700 800 900

10shy 1

100

101

102

103

104

Number of Points

Tim

e (i

n s

ec)

LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

Figure 3 Real-time prediction time of Santa Fe Institute time series

200 400 600 800 1000 1200 1400

10shy 1

100

101

102

103

104

Number of Points

Tim

e (i

n s

ec)

LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

Figure 4 Real-time prediction time of Mackey-Glass time series

Accurate On-line Support Vector Regression 2697

005 01 015 02 025 03 035 04 045 05

102

Ove

rall

Pre

dic

tio

n T

ime LibSVMR (Cold Start)

LibSVMR (Warm Start)AOSVR

005 01 015 02 025 03 035 04 045 05

200

400

600

800

1000

e

Ove

rall

Pre

dic

tio

n T

ime LibSVMR (Cold Start)

LibSVMR (Warm Start)AOSVR

Figure 5 Semilog and linear plots of prediction time of yearly sunspot timeseries

more SVs will be involved in SVR predictor A complicated SVR predictorimposes both memory and computation stress on the prediction systemOne way to deal with this problem is to impose a ldquoforgettingrdquo time WWhen training set AOB grows to this maximum W then the decrementalalgorithm is used to remove the oldest sample before the next new sampleis added to the training set

We note that this variant of the on-line prediction scenario is also poten-tially suitable for nonstationary time series as it can be updated in real timeto t the most recent behavior of the time series More rigorous investigationin this direction will be a future effort

53 Leave-One-Out Cross-Validation Cross-validation is a useful toolfor assessing the generalization ability of a machine learning algorithm Theidea is to train on one subset of the data and then to test the accuracy ofthe predictor on a separate disjoint subset In leave-one-out cross-validation(LOOCV) only a single sample is used for testing and all the rest are usedfor training Generally this is repeated for every sample in the data setWhen the batch SVR is employed LOOCV can be very expensive sincea fullretraining is done for each sample One compromise approach is to estimateLOOCV with related but less expensive approximations such as the xi-

2698 J Ma J Theiler and S Perkins

alpha bound (Joachims 2000) and approximate span bound (Vapnik ampChapelle 1999) Although Lee and Lin (2001) proposed a numerical solutionto reduce the computation for directly implementing LOOCV the amount ofcomputation required is still considerable Also the accuracy of the LOOCVresult obtained using this method can be potentially compromised becausea different parameter set is employed in LOOCV and in the nal training

The decremental algorithm of AOSVR provides an efcient implemen-tation of LOOCV for SVR

1 Given a data set D construct the SVR function f x from the wholedata set D using batch SVR learning algorithm

2 For each nonsupport vector xi in the data set D calculate error eicorresponding to xi as ei D yi iexcl f xi where yi is the target valuecorresponding to xi

3 For each support vector xi involved in the SVR function f x

(a) Unlearn xi from the SVR function f x using the decrementalalgorithm to obtain the SVR function fix which would beconstructed from the data set Di D D=fxig

(b) Calculate error ei corresponding to support vector xi as ei Dyi iexcl fixi where yi is the target value corresponding to xi

4 Knowing the error for each sample xi in D it is possible to constructa variety of overall measures a simple choice is the MSE

MSELOOCVD D1N

NX

ie2i (51)

where N is the number of samples in data set D Other choices oferror metric such as MAE can be obtained by altering equation 51appropriately

531 Experiment The algorithm parameters in this experiment are setthe same as those in the experiments in section 511 Two famous regressiondata sets the Auto-MPG and Boston Housing data sets are chosen from theUCI Machine-Learning Repository Some basic information on these datasets is listed in Table 3

Table 3 Information Regarding Experimental Regression Data Sets

Number of Attributes Number of Samples SV Ratio

Auto-MPG 7 392 4107Boston Housing 13 506 3636

Accurate On-line Support Vector Regression 2699

200 220 240 260 280 300 320 340 360 380

102

LOOCV Time Comparison of Autoshy MPG Dataset

Samples in Training Set

LO

OC

V T

ime

(sec

on

d)

LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

300 350 400 450 500

101

102

LOOCV Time Comparison of Boston Housing Dataset

Samples in Training Set

LO

OC

V T

ime

(sec

on

d)

Figure 6 Semilog plots of LOOCV time of Auto-MPG and Boston Housing dataset

The experimental results of both data sets are presented in Figure 6 Thex-axis is the size of the training set upon which the LOOCV is implementedThese plots show that AOSVR-based LOOCV is much faster than its Lib-SVMR counterpart

6 Conclusion

We have developed and implemented an accurate on-line support vectorregression (AOSVR) algorithm that permits efcient retraining when a newsample is added to or an existing sample is removed from the training setAOSVR is applied to on-line time-series prediction and to leave-one-outcross-validation and the experimental results demonstrate that the AOSVRalgorithm is more efcient than conventional batch SVR in these scenariosMoreover AOSVR appears less sensitive than batch SVR to the difculty ofthe underlying problem

After this manuscript was prepared we were made aware of a similar on-line SVR algorithm which was independently presented in Martin (2002)

2700 J Ma J Theiler and S Perkins

Appendix Pseudocode for Incrementing AOSVR with a New DataSample

Inputs

sup2 Training set T D fxi yi i D 1 lgsup2 Coefcients fmicroi i D 1 lg and bias b

sup2 Partition of samples into sets S E and R

sup2 Matrix R dened in equation 311

sup2 New sample xc yc

Outputs

sup2 Updated coefcients fmicroi i D 1 l C 1g and bias b

sup2 Updated Matrix R

sup2 Updated partition of samples into sets S E and R

AOSVR Incremental Algorithm

sup2 Initialize microc D 0

sup2 Compute f xc DX

i2E[S

microiQic C b

sup2 Compute hxc D f xc iexcl yc

sup2 I jhxcj middot then assign xc to R and terminate

sup2 Let q D signiexclhxc be the sign that 1microc will take

sup2 Do until the new sample xc meets the KKT condition

mdash Update macr deg according to equations 37 and 39mdash Start bookkeeping procedure

Check the new sample xcmdashLc1 D iexclhxc iexcl q=degc (Case 1)mdashLc2 D qC iexcl microc (Case 2)Check each sample xi in the set S (Case 3)mdashIf q i gt 0 and C gt microi cedil 0 LS

i D C iexcl microi=macri

mdashIf q i gt 0 and 0 gt microi cedil iexclC LSi D iexclmicroi=macri

mdashIf q i lt 0 and C cedil microi gt 0 LSi D iexclmicroi=macri

mdashIf q i lt 0 and 0 cedil microi gt iexclC LSi D iexclC iexcl microi=macri

Check each sample xi in the set E (Case 4)mdashLE

i D iexclhxi iexcl signqmacri=macri

Check each sample xi in the set R (Case 5)mdashLR

i D iexclhxi iexcl signqmacri=macri

Accurate On-line Support Vector Regression 2701

Set 1microc D qminjLc1j jLc2j jLSj jLEj jLRj whereLS D fLS

i i 2 Sg LE D fLEi i 2 Eg and LR D fLR

i i 2 RgLet Flag be the case number that determines 1micro

Let xl be the particular sample in T that determines 1microc

mdash End Bookkeeping Proceduremdash Update microc b and microi i 2 S according to equation 36mdash Update hxi i 2 E [ R according to equation 38mdash Switch Flag

(Flag=1)Add new sample xc to set S update matrix R accordingto equation 313(Flag=2)Add new sample xc to set E(Flag=3)If microl D 0 move xl to set R update R according to equa-tion 312If jmicrolj D C move xl to set E update R according toequation 312(Flag=4)Move xl to set S update R according to equation 313(Flag=5)Move xl to set S update R according to equation 313

mdash End Switch Flagmdash If Flagmiddot 2 terminate otherwise continue the Do-Loop

sup2 Terminate incremental algorithm ready for the next sample

Acknowledgments

We thank Chih-Jen Lin in National University of Taiwan for useful sug-gestions on some implementation issues We also thank the anonymousreviewers for pointing us to the work of Martin (2002) and for suggestingthat we compare AOSVR to the warm-start variants of batch algorithmsThis work is supported by the NASA project NRA-00-01-AISR-088 and bythe Los Alamos Laboratory Directed Research and Development (LDRD)program

References

Cauwenberghs G amp Poggio T (2001) Incremental and decremental supportvector machine learning In T K Leen T G Dietterich amp V Tresp (Eds)

2702 J Ma J Theiler and S Perkins

Advances in neural informationprocessingsystems 13 (pp409ndash123)CambridgeMA MIT Press

Chang C-C amp Lin C-J (2001) LIBSVM a library for support vector machinesSoftware Available on-line at httpwwwcsientuedutwraquocjlinlibsvm

Chang C-C amp Lin C-J (2002) Training ordm-support vector regression Theoryand algorithms Neural Computation 14 1959ndash1977

CsatoL amp Opper M (2001)Sparse representation for gaussian process modelsIn T K Leen T G Dietterich amp V Tresp (Eds) Advances in neural informationprocessing systems 13 (pp 444ndash450) Cambridge MA MIT Press

Fernandez R (1999) Predicting time series with a local support vector regres-sion machine InAdvancedCourse onArticial Intelligence (ACAI rsquo99) Availableon-line at httpwwwiitdemokritosgrskeleetnacai99

Fliege J amp Heseler A (2002) Constructing approximations to the ef-cient set of convex quadratic multiobjective problems Dortmund GermanyFachbereich Mathematik Universitat Dortmund Available on-line athttpwwwoptimization-onlineorgDB HTML200201426html

Friedel RH Reeves G D amp Obara T (2002) Relativistic electron dynamics inthe inner magnetospheremdasha review Journal of Atmosphericand Solar-TerrestrialPhysics 64 265ndash282

Gentile C (2001)A new approximate maximal margin classication algorithmJournal of Machine Learning Research 2 213ndash242

Gondzio J (1998)Warm start of the primal-dual method applied in the cuttingplane scheme Mathematical Programming 83 125ndash143

Gondzio J amp Grothey A (2001) Reoptimization with the primal-dual interiorpoint method SIAM Journal on Optimization 13 842ndash864

Graepel T Herbrich R amp Williamson R C (2001) From margin to sparsityIn T K Leen T G Dietterich amp V Tresp (Eds) Advances in neural informationprocessing systems 13 (pp 210ndash216) Cambridge MA MIT Press

Gunn S (1998) Matlab SVM toolbox Software Available on-line at httpwwwisisecssotonacukresourcessvminfo

Herbster M (2001) Learning additive models online with fast evaluating ker-nels In D P Helmbold amp B Williamson (Eds) Proceedings of the 14th An-nual Conference on Computational Learning Theory (pp 444ndash460) New YorkSpringer-Verlag

Joachims T (2000) Estimating the generalization performance of an SVM ef-ciently In P Langley (Ed) Proceedings of the Seventeenth International Confer-ence on Machine Learning (pp 431ndash438) San Mateo Morgan Kaufmann

Kivinen J Smola A Jamp Williamson R C (2002)Online learning with kernelsIn T G Dietterich S Becker amp Z Ghahramani (Eds) Advances in neuralinformation processing systems 14 (pp 785ndash792) Cambridge MA MIT Press

Lee J-H amp Lin C-J (2001)Automatic model selection for support vector machines(Tech Rep) Taipei Taiwan Department of Computer Science and Informa-tion Engineering National Taiwan University

Li Y amp Long P M (1999) The relaxed online maximum margin algorithm InS A Solla T K Leen amp K-R Muller (Eds) Advances in neural informationprocessing systems 12 (pp 498ndash504) Cambridge MA MIT Press

Accurate On-line Support Vector Regression 2703

Mackey M C amp Glass L (1977)Oscillation and chaos in physiological controlsystems Science 197 287ndash289

Martin M (2002)On-line supportvectormachines for functionapproximation(TechRep LSI-02-11-R)Catalunya Spain Software Department Universitat Po-litecnica de Catalunya

Muller K-R Smola A J Ratsch G Scholkopf B Kohlmorgen J amp VapnikV (1997)Predicting time series with support vector machines In W Gerstner(Ed) Articial Neural NetworksmdashICANN rsquo97 (pp 999ndash1004) Berlin Springer-Verlag

Ralaivola L amp drsquoAlche-Buc F (2001) Incremental support vector machinelearning A local approach In G Dorffner H Bischof amp K Hornik (Eds) Ar-ticial Neural NetworksmdashICANN 2001 (pp 322ndash330) Berlin Springer-Verlag

Shevade S K Keerthi S S Bhattacharyya C amp Murthy K R K (1999)Improvements to SMO algorithm for SVM regression (Tech Rep No CD-99-16)Singapore National University of Singapore

Smola A (1998) Interior point optimizer for SVM pattern recognition SoftwareAvailable on-line at httpwwwkernel-machinesorgcodeprloqotargz

Smola A J amp Scholkopf B (1998) A tutorial on support vector regression (Neu-roCOLT Tech Rep No NC-TR-98-030) London Royal Holloway CollegeUniversity of London

Syed N A Liu H amp Sung K K (1999) Incremental learning with supportvector machines In Proceedings of the Workshop on Support Vector Machines atthe International Joint Conference on Articial IntelligencemdashIJCAI-99 San MateoMorgan Kaufmann

Tashman L J (2000) Out-of-sample tests of forecasting accuracy An analysisand review International Journal of Forecasting 16 437ndash450

Tay F E H amp Cao L (2001)Application of support vectormachines in nancialtime series forcasting Omega 29 309ndash317

Vanderbei R J (1999) LOQO An interior point code for quadratic program-ming Optimization Methods and Software 11 451ndash484

Vapnik V (1998) Statistical learning theory New York WileyVapnik V amp Chapelle O (1999)Bounds on error expectation for support vector

machine In A Smola P Bartlett B Scholkopf amp D Schuurmans (Eds)Advances in large margin classiers (pp 261ndash280) Cambridge MA MIT Press

Weigend A S amp Gershenfeld N A (1994)Time-series predictionForecasting thefuture and understanding the past Reading MA Addison-Wesley

Yildirim E A amp Wright S J (2002) Warm-start strategies in interior-pointmethods for linear programming SIAM Journal on Optimization 12 782ndash810

Received September 12 2002 accepted May 1 2003

Page 5: Accurate On-line Support Vector Regression · 2006. 8. 28. · Accurate On-line Support Vector Regression 2689 In the special case whenSset is empty, according to equations 3.1 and

Accurate On-line Support Vector Regression 2687

There areve conditions in equation 29 compared to the three conditionsin support vector classication (see equation 2 in Cauwenberghs amp Poggio2001) but like the conditions in support vector classication they can beidentied with three subsets into which the samples in training set T can beclassied The difference is that two of the subsets (E and S) are themselvescomposed of two disconnected components depending on the sign of theerror f xi iexcl yi

The E Set Error support vectors E D fi j jmicroij D Cg (210)

The S Set Margin support vectors S D fi j 0 lt jmicroij lt Cg (211)

The R Set Remaining samples R D fi j microi D 0g (212)

3 Incremental Algorithm

The incremental algorithm updates the trained SVR function whenever anew sample xc is added to the training set T The basic idea is to changethe coefcient microc corresponding to the new sample xc in a nite number ofdiscrete steps until it meets the KKT conditions while ensuring that theexisting samples in T continue to satisfy the KKT conditions at each stepIn this section we rst derive the relation between the change of microc or 1microcand the change of other coefcients under the KKT conditions and thenpropose a method to determine the largest allowed 1microc for each step Apseudocode description of this algorithm is provided in the appendix

31 Derivation of the Incremental Relations Let xc be a new trainingsample that is added to T We initially set microc D 0 and then gradually change(increase or decrease) the value of microc under the KKT conditions equation 29

According to equations 26 27 and 29 the incremental relation between1hxi 1microi and 1b is given by

1hxi D Qic1microc ClX

iD1

Qij1microj C 1b (31)

From the equality condition in equation 23 we have

microc ClX

iD1

microi D 0 (32)

Combining equations 29 through 212 and 31 and 32 we obtainX

j2S

Qij1microj C 1b D iexclQic1microc where i 2 S

X

j2S

1microj D iexcl1microc (33)

2688 J Ma J Theiler and S Perkins

If we dene the index of the samples in the S set as

S D fs1 s2 sls g (34)

equation 33 can be represented in matrix form as

2

6664

0 1 cent cent cent 11 Qs1s1 cent cent cent Qs1sls

1 Qsls s1 cent cent cent Qsls sls

3

7775

2

6664

1b1micros1

1microsls

3

7775 D iexcl

2

6664

1Qs1c

Qsls c

3

7775 1microc (35)

that is2

6664

1b1micros1

1microsls

3

7775 D macr1microc (36)

where

macr D

2

6664

macr

macrs1

macrsls

3

7775 D iexclR

2

6664

1Qs1c

Qsls c

3

7775

where R D

2

6664

0 1 cent cent cent 11 Qs1s1 cent cent cent Qs1sls

1 Qsls s1 cent cent cent Qsls sls

3

7775

iexcl1

(37)

Dene a non-S or N set as N D E [ R D fn1 n2 nlng Combiningequations 29 through 212 31 and 36 we obtain

2

6664

1hxn1

1hxn2

1hxnln

3

7775D deg 1microc (38)

where

deg D

2

6664

Qn1cQn2c

Qnln c

3

7775C

2

6664

1 Qn1s1 cent cent cent Qn1sls

1 Qn2s1 cent cent cent Qn2sls

1 Qnln s1 cent cent cent Qnln sls

3

7775 macr (39)

Accurate On-line Support Vector Regression 2689

In the special case when S set is empty according to equations 31 and 32equation 39 simplies to 1hxn D 1b for all n 2 E [ R

Given 1microc we can update microi i 2 S and b according to equation 36 andupdate hxi i 2 N according to equation 38 Moreover equation 29 sug-gests that microi i 2 N and hxi i 2 S are constant if the S set stays unchangedTherefore the results presented in this section enable us to update all the microiand hxi given 1microc In the next section we address the question of how tond an appropriate 1microc

32 AOSVR Bookkeeping Procedure Equations 36 and 38 hold onlywhen the samples in the S set do not change membership Therefore 1microc ischosen to be the largest value that either can maintain the S set unchangedor lead to the termination of the incremental algorithm

The rst step is to determine whether the change 1microc should be positiveor negative According to equation 29

sign1microc D signyc iexcl f xc D signiexclhxc (310)

The next step is to determine a bound on 1microc imposed by each sample inthe training set To simplify exposition we consider only the case 1microc gt 0and remark that the case 1microc lt 0 is similarFor the new sample xc there are two cases

Case 1 hxc changes from hxc lt iexcl to hxc D iexcl and the newsample xc is added to the S set and the algorithm terminates

Case 2 If microc increases from microc lt C to microc D C the new sample xc is addedto the E set and the algorithml terminates

For each sample xi in the set S

Case 3 If microi changes from 0 lt jmicroij lt C to jmicroij D C sample xi changesfrom the S set to the E set If microi changes to microi D 0 sample xi changesfrom the S set to the R set

For each sample xi in the set E

Case 4 If hxi changes from jhxij gt to jhxij D xi is moved fromthe E set to the S set

For each sample xi in the set R

Case 5 If hxi changes from jhxij lt to jhxij D xi is moved fromthe R set to the S set

The bookkeeping procedure is to trace each sample in the training setT against these ve cases and determine the allowed 1microc for each sampleaccording to equation 36 or 38 The nal 1microc is dened as the one withminimum absolute value among all the possible 1microc

2690 J Ma J Theiler and S Perkins

33 Efciently Updating the R Matrix The matrix R that is used inequation 37

R D

2

6664

0 1 cent cent cent 11 Qs1s1 cent cent cent Qs1sls

1 Qsls s1 cent cent cent Qsls sls

3

7775

iexcl1

(311)

must be updated whenever the S set is changed Following Cauwenberghsand Poggio (2001) we can efciently update R without explicitlycomputingthe matrix inverse When the kth sample xsk in the S set is removed from theS set the new R can be obtained as follows

Rnew D RII iexclRIkRkI

Rkk where

I D [1 cent cent cent k k C 2 cent cent cent Sls C 1] (312)

When a new sample is added to S set the new R can be updated as follows

Rnew D

2

6664

0

R

00 cent cent cent 0 0

3

7775 C1degi

micromacr1

para poundmacrT 1

curren (313)

where macr is dened as

macr D iexclR

2

6664

1Qs1 i

Qsls i

3

7775

and degi is dened as

degi D Qii C

2

6664

1Qs1i

Qsls i

3

7775 macr

when the sample xi was moved from E set to R set In contrast when thesample xc is the sample added to S set macr is can be obtained according toequation 37 and degi is the last element of deg dened in equation 39

34 Initialization of the Incremental Algorithm An initial SVR solu-tion can be obtained from a batch SVR solution and in most cases that is themost efcient approach But it is sometimes convenient to use AOSVR to

Accurate On-line Support Vector Regression 2691

produce a full solution from scratch An efcient starting point is the two-sample solution Given a training set T D fx1 y1 x2 y2g with y1 cedil y2the solution of equation 23 is

micro1 D maxsup3

0 minsup3

Cy1 iexcl y2 iexcl 2

2K11 iexcl K12

acuteacute

micro2 D iexclmicro1

b D y1 D y2=2 (314)

The sets E S and R are initialized from these two points based on equa-tions 210 through 212 If the set S is nonempty the matrix R can be ini-tialized from equation 311 As long as S is empty the matrix R will not beused

4 Decremental Algorithm

The decremental (or ldquounlearningrdquo) algorithm is employed when an existingsample is removed from the training set If a sample xc is in the R set then itdoes not contribute to the SVR solution and removing it from the trainingset is trivial no adjustments are needed If on the other hand xc has anonzero coefcient then the idea is to gradually reduce the value of thecoefcient to zero while ensuring all the other samples in the training setcontinue to satisfy the KKT conditions

The decremental algorithm follows the incremental algorithm with a fewsmall adjustments

sup2 The direction of the change of microc is

sign1microc D sign f xc iexcl yc D signhxc (41)

sup2 There is no case 1 because the removed xc need not satisfy KKT con-ditions

sup2 The condition in case 2 becomes microc changing from jmicrocj gt 0 to microc D 0

5 Applications and Comparison with Batch Algorithms

The accurate on-line SVR (AOSVR) learning algorithm produces exactlythe same SVR as the conventional batch SVR learning algorithm and canbe applied in all scenarios where batch SVR is currently employed But foron-line time-series prediction and leave-one-out cross-validation (LOOCV)the AOSVR algorithm is particularly well suited In this section we demon-strate AOSVR for both of these applications and compare its performance toexisting batch SVR algorithms These comparisons are based on direct tim-ing of runs using Matlab implementations such timings should be treatedwith some caution as they can be sensitive to details of implementation

2692 J Ma J Theiler and S Perkins

51 AOSVR versus Batch SVR Algorithms with Warm Start Mostbatch algorithms for SVR are implemented as ldquocold startrdquo This is appro-priate when a t is desired to a batch of data that has not been seen beforeHowever in recent years there has been a growing interest in ldquowarm-startrdquoalgorithms that can save time by starting from an appropriate solution andquite a few articles have addressed this issue in the generic context of nu-meric programming (Gondzio 1998 Gondzio amp Grothey 2001 Yildirim ampWright 2002 Fliege amp Heseler 2002) The warm-start algorithms are usefulfor incremental (or decremental) learning because the solution with N iexcl 1(or N C 1) data points provides a natural starting point for nding the so-lution with N data points In this sense AOSVR is a kind of warm-startalgorithm for the QP problem equation 23 which is specially designedfor the incremental or decremental scenario This specialty allows AOSVRto achieve more efciency when handling SVR incremental or decrementallearning as demonstrated in our subsequent experiments

In the machine learning community three algorithms for batch SVRtrain-ing are widely recognized Gunn (1998) solved SVR training as a genericQP optimization we call this implementation QPSVMR Shevade KeerthiBhattacharyya and Murthy (1999) proposed an algorithm specially de-signed for SVR training and it is an improved version of the sequential min-imal optimization for SVM regression (SMOR) and Chang and Lin (2001)proposed another algorithm specially designed for SVR training which wecall LibSVMR since it is implemented as part of the LibSVM software pack-age We implemented all these algorithms so that they can run in both acold-start and a warm-start mode SMOR and LibSVMR are implementedin Matlab and both algorithms allow a straightforward warm-start realiza-tion Because QPSVMR is based on a generic QP algorithm it is much lessefcient than SMOR or LibSVMR To make our subsequent experimentsfeasible we had to implement the QPSVMR core in C (Smola 1998) Smolaessentially employs the interior point QP code of LOQO (Vanderbei 1999)The warm start of QPSVMR directly adopts the warm-start method embed-ded in Smolarsquos (1998) implementation

52 On-line Time-Series Prediction In recent years the use of SVR fortime-series prediction has attracted increased attention (Muller et al 1997Fernandez 1999 Tay amp Cao 2001) In an on-line scenario one updates amodel from incoming data and at the same time makes predictions basedon that model This arises for instance in market forecasting scenariosAnother potential application is the (near) real-time prediction of electrondensity around a satellite in the magnetosphere high charge densities candamage satellite equipment (Friedel Reeves and Obara 2002) and if timesof high charge can be predicted the most sensitive components can beturned off before they are damaged

In time-series prediction the prediction origin denoted O is the timefrom which the prediction is generated The time between the prediction

Accurate On-line Support Vector Regression 2693

origin and the predicted data point is the prediction horizon which forsimplicity we take as one time step

A typical on-line time-series prediction scenario can be represented asfollows (Tashman 2000)

1 Given a time series fxt t D 1 2 3 g and prediction origin Oconstruct a set of training samples AOB from the segment of timeseries fxt t D 1 Og as AOB D fXt yt t D B 0 iexcl 1gwhere Xt D [xt xt iexcl B C 1]T yt D xt C 1 and B is theembedding dimension of the training set AOB

2 Train a predictor PAOBI X from the training set AOB

3 Predict xO C 1 using OxO C 1 D PAOBI XO

4 When x0 C 1 becomes available update the prediction origin O DO C 1 Then go to step 1 and repeat the procedure

Note that the training set AOB keeps growing as O increases so thetraining of the predictor in step 2 becomes increasingly expensive There-fore many SVR-based time-series predictionsare implemented in a compro-mised way (Tay amp Cao 2001) After the predictor is obtained it stays xedand is not updated as new data arrive In contrast an on-line predictionalgorithm can take advantage of the fact that the training set is augmentedone sample at a time and continues to update and improve the model asmore data arrive

521 Experiments Two experiments were performed to compare theAOSVR algorithm with the batch SVR algorithm We were careful to usethe same algorithm parameters for on-line and batch SVR but since ourpurpose is to compare computational performance we did not attempt tooptimize these parameters for each data set In these experiments the kernelfunction is a gaussian radial basis function expiexcldeg kXiiexclXjk2 where deg D 1the regularization coefcient C and the insensitivity parameter in equation21 are set to 10 and 01 respectively the embedding dimension B of thetraining AOB is 5 Also we scale all the time-series to [iexcl11]

Three widely used benchmark time series are employed in both exper-iments (1) the Santa Fe Institute Competition time series A (Weigend ampGershenfeld1994) (2) the Mackey-Glass equation with iquest D 17 (Mackeyamp Glass 1977) and (3) the yearly average sunspot numbers recorded from1700 to 1995 Some basic information about these time series is listed in Ta-ble 1 The SV ratio is the number of support vectors divided by the numberof training samples This is based on a prediction of the last data point usingall previous data for training In general a higher SV ratio suggests that theunderlying problem is harder (Vapnik 1998)

The rst experiment demonstrates that using a xed predictor producesless accurate predictions than using a predictor that is updated as new data

2694 J Ma J Theiler and S Perkins

Table 1 Information Regarding Experimental Time Series

Number of Data Points SV Ratio

Santa Fe Institute 1000 452Mackey-Glass 1500 154Yearly Sunspot 292 4181

become available Two measurements are used to quantify the predictionperformance mean squared error (MSE) and mean absolute error (MAE)The predictors are initially trained on the rst half of the data in the timeseries In the xed case the same predictor is used to predict the second halfof the time series In the on-line case the predictor is updated whenever anew data point is available The performance measurements for both casesare calculated from the predicted and actual values of the second half of thedata in the time series As shown in Table 2 the on-line predictor outper-forms the xed predictor in every case We also note that the errors for thethree time series in Table 2 coincide with the estimated prediction difcultyin Table 1 based on the SV ratio

The second experiment compares AOSVR with batch implementationsusing both cold start and warm start in the on-line prediction scenarioFor each benchmark time series an initial SVR predictor is trained on therst two data points using the batch SVR algorithms For AOSVR we usedequation 314 Afterward both AOSVR and batch SVR algorithms are em-ployed in the on-line prediction mode for the remaining data points in thetime series AOSVR and the batch SVR algorithms produce exactly the sameprediction errors in this experiment so the comparison is only of predic-tion speed All six batch SVR algorithms are compared with AOSVR on thesunspot time series and the experimental results are plotted in Figure 2 Thex-axis of this plot is the number of data points to which the on-line predic-tion model is applied Note that the core of QPSVMR is implemented in CBecause the cold start and warm start of LibSVMR clearly outperform thoseof both SMOR and QPSVMR only the comparison between LibSVMR andAOSVR is carried out in our subsequent experiments The experimental re-

Table 2 Performance Comparison for On-line and Fixed Predictors

On-line Fixed

Santa Fe Institute MSE 00072 00097MAE 00588 00665

Mackey-Glass MSE 00034 00036MAE 00506 00522

Yearly Sunspot MSE 00263 00369MAE 01204 01365

Accurate On-line Support Vector Regression 2695

50 100 150 200 250

10shy 1

100

101

102

103

104

Number of Points

Tim

e (i

n s

ec)

SMOR (Cold Start)SMOR (Warm Start)QPSVMR (Cold Start)QPSVMR (Warm Start)LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

Figure 2 Real-time prediction time of yearly sunspot time series

sults of both Santa Fe Institute and Mackey-Glass time series are presentedin Figures 3 and 4 respectively

These experimental results demonstrate that AOSVR algorithm is gen-erally much faster than the batch SVR algorithms when applied to on-lineprediction Comparison of Figures 2 and 4 furthermore suggests that morespeed improvement is achieved on the sunspot data than on the Mackey-Glass We speculate that this is because the sunspot problem is ldquoharderrdquothan the Mackey-Glass (it has a higher support vector ratio) and that theperformance of the AOSVR algorithm is less sensitive to problem difculty

To test this hypothesis we compared the performance of AOSVR to Lib-SVMR on a single data set (the sunspots) whose difculty was adjusted bychanging the value of A smaller leads to a higher support vector ratioand a more difcult problem Both the AOSVR and LibSVMR algorithmswere employed for on line prediction of the full time series The overallprediction times are plotted against in Figure 5 Where AOSVR perfor-mance varied by a factor of less than 10 over the range of the LibSVMRperformance varied by a factor of about 100

522 Limited-Memory Version of the On-line Time-Series Prediction ScenarioOne problem with on-line time-series prediction in general is that the longerthe prediction goes on the bigger the training set AOB will become and the

2696 J Ma J Theiler and S Perkins

100 200 300 400 500 600 700 800 900

10shy 1

100

101

102

103

104

Number of Points

Tim

e (i

n s

ec)

LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

Figure 3 Real-time prediction time of Santa Fe Institute time series

200 400 600 800 1000 1200 1400

10shy 1

100

101

102

103

104

Number of Points

Tim

e (i

n s

ec)

LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

Figure 4 Real-time prediction time of Mackey-Glass time series

Accurate On-line Support Vector Regression 2697

005 01 015 02 025 03 035 04 045 05

102

Ove

rall

Pre

dic

tio

n T

ime LibSVMR (Cold Start)

LibSVMR (Warm Start)AOSVR

005 01 015 02 025 03 035 04 045 05

200

400

600

800

1000

e

Ove

rall

Pre

dic

tio

n T

ime LibSVMR (Cold Start)

LibSVMR (Warm Start)AOSVR

Figure 5 Semilog and linear plots of prediction time of yearly sunspot timeseries

more SVs will be involved in SVR predictor A complicated SVR predictorimposes both memory and computation stress on the prediction systemOne way to deal with this problem is to impose a ldquoforgettingrdquo time WWhen training set AOB grows to this maximum W then the decrementalalgorithm is used to remove the oldest sample before the next new sampleis added to the training set

We note that this variant of the on-line prediction scenario is also poten-tially suitable for nonstationary time series as it can be updated in real timeto t the most recent behavior of the time series More rigorous investigationin this direction will be a future effort

53 Leave-One-Out Cross-Validation Cross-validation is a useful toolfor assessing the generalization ability of a machine learning algorithm Theidea is to train on one subset of the data and then to test the accuracy ofthe predictor on a separate disjoint subset In leave-one-out cross-validation(LOOCV) only a single sample is used for testing and all the rest are usedfor training Generally this is repeated for every sample in the data setWhen the batch SVR is employed LOOCV can be very expensive sincea fullretraining is done for each sample One compromise approach is to estimateLOOCV with related but less expensive approximations such as the xi-

2698 J Ma J Theiler and S Perkins

alpha bound (Joachims 2000) and approximate span bound (Vapnik ampChapelle 1999) Although Lee and Lin (2001) proposed a numerical solutionto reduce the computation for directly implementing LOOCV the amount ofcomputation required is still considerable Also the accuracy of the LOOCVresult obtained using this method can be potentially compromised becausea different parameter set is employed in LOOCV and in the nal training

The decremental algorithm of AOSVR provides an efcient implemen-tation of LOOCV for SVR

1 Given a data set D construct the SVR function f x from the wholedata set D using batch SVR learning algorithm

2 For each nonsupport vector xi in the data set D calculate error eicorresponding to xi as ei D yi iexcl f xi where yi is the target valuecorresponding to xi

3 For each support vector xi involved in the SVR function f x

(a) Unlearn xi from the SVR function f x using the decrementalalgorithm to obtain the SVR function fix which would beconstructed from the data set Di D D=fxig

(b) Calculate error ei corresponding to support vector xi as ei Dyi iexcl fixi where yi is the target value corresponding to xi

4 Knowing the error for each sample xi in D it is possible to constructa variety of overall measures a simple choice is the MSE

MSELOOCVD D1N

NX

ie2i (51)

where N is the number of samples in data set D Other choices oferror metric such as MAE can be obtained by altering equation 51appropriately

531 Experiment The algorithm parameters in this experiment are setthe same as those in the experiments in section 511 Two famous regressiondata sets the Auto-MPG and Boston Housing data sets are chosen from theUCI Machine-Learning Repository Some basic information on these datasets is listed in Table 3

Table 3 Information Regarding Experimental Regression Data Sets

Number of Attributes Number of Samples SV Ratio

Auto-MPG 7 392 4107Boston Housing 13 506 3636

Accurate On-line Support Vector Regression 2699

200 220 240 260 280 300 320 340 360 380

102

LOOCV Time Comparison of Autoshy MPG Dataset

Samples in Training Set

LO

OC

V T

ime

(sec

on

d)

LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

300 350 400 450 500

101

102

LOOCV Time Comparison of Boston Housing Dataset

Samples in Training Set

LO

OC

V T

ime

(sec

on

d)

Figure 6 Semilog plots of LOOCV time of Auto-MPG and Boston Housing dataset

The experimental results of both data sets are presented in Figure 6 Thex-axis is the size of the training set upon which the LOOCV is implementedThese plots show that AOSVR-based LOOCV is much faster than its Lib-SVMR counterpart

6 Conclusion

We have developed and implemented an accurate on-line support vectorregression (AOSVR) algorithm that permits efcient retraining when a newsample is added to or an existing sample is removed from the training setAOSVR is applied to on-line time-series prediction and to leave-one-outcross-validation and the experimental results demonstrate that the AOSVRalgorithm is more efcient than conventional batch SVR in these scenariosMoreover AOSVR appears less sensitive than batch SVR to the difculty ofthe underlying problem

After this manuscript was prepared we were made aware of a similar on-line SVR algorithm which was independently presented in Martin (2002)

2700 J Ma J Theiler and S Perkins

Appendix Pseudocode for Incrementing AOSVR with a New DataSample

Inputs

sup2 Training set T D fxi yi i D 1 lgsup2 Coefcients fmicroi i D 1 lg and bias b

sup2 Partition of samples into sets S E and R

sup2 Matrix R dened in equation 311

sup2 New sample xc yc

Outputs

sup2 Updated coefcients fmicroi i D 1 l C 1g and bias b

sup2 Updated Matrix R

sup2 Updated partition of samples into sets S E and R

AOSVR Incremental Algorithm

sup2 Initialize microc D 0

sup2 Compute f xc DX

i2E[S

microiQic C b

sup2 Compute hxc D f xc iexcl yc

sup2 I jhxcj middot then assign xc to R and terminate

sup2 Let q D signiexclhxc be the sign that 1microc will take

sup2 Do until the new sample xc meets the KKT condition

mdash Update macr deg according to equations 37 and 39mdash Start bookkeeping procedure

Check the new sample xcmdashLc1 D iexclhxc iexcl q=degc (Case 1)mdashLc2 D qC iexcl microc (Case 2)Check each sample xi in the set S (Case 3)mdashIf q i gt 0 and C gt microi cedil 0 LS

i D C iexcl microi=macri

mdashIf q i gt 0 and 0 gt microi cedil iexclC LSi D iexclmicroi=macri

mdashIf q i lt 0 and C cedil microi gt 0 LSi D iexclmicroi=macri

mdashIf q i lt 0 and 0 cedil microi gt iexclC LSi D iexclC iexcl microi=macri

Check each sample xi in the set E (Case 4)mdashLE

i D iexclhxi iexcl signqmacri=macri

Check each sample xi in the set R (Case 5)mdashLR

i D iexclhxi iexcl signqmacri=macri

Accurate On-line Support Vector Regression 2701

Set 1microc D qminjLc1j jLc2j jLSj jLEj jLRj whereLS D fLS

i i 2 Sg LE D fLEi i 2 Eg and LR D fLR

i i 2 RgLet Flag be the case number that determines 1micro

Let xl be the particular sample in T that determines 1microc

mdash End Bookkeeping Proceduremdash Update microc b and microi i 2 S according to equation 36mdash Update hxi i 2 E [ R according to equation 38mdash Switch Flag

(Flag=1)Add new sample xc to set S update matrix R accordingto equation 313(Flag=2)Add new sample xc to set E(Flag=3)If microl D 0 move xl to set R update R according to equa-tion 312If jmicrolj D C move xl to set E update R according toequation 312(Flag=4)Move xl to set S update R according to equation 313(Flag=5)Move xl to set S update R according to equation 313

mdash End Switch Flagmdash If Flagmiddot 2 terminate otherwise continue the Do-Loop

sup2 Terminate incremental algorithm ready for the next sample

Acknowledgments

We thank Chih-Jen Lin in National University of Taiwan for useful sug-gestions on some implementation issues We also thank the anonymousreviewers for pointing us to the work of Martin (2002) and for suggestingthat we compare AOSVR to the warm-start variants of batch algorithmsThis work is supported by the NASA project NRA-00-01-AISR-088 and bythe Los Alamos Laboratory Directed Research and Development (LDRD)program

References

Cauwenberghs G amp Poggio T (2001) Incremental and decremental supportvector machine learning In T K Leen T G Dietterich amp V Tresp (Eds)

2702 J Ma J Theiler and S Perkins

Advances in neural informationprocessingsystems 13 (pp409ndash123)CambridgeMA MIT Press

Chang C-C amp Lin C-J (2001) LIBSVM a library for support vector machinesSoftware Available on-line at httpwwwcsientuedutwraquocjlinlibsvm

Chang C-C amp Lin C-J (2002) Training ordm-support vector regression Theoryand algorithms Neural Computation 14 1959ndash1977

CsatoL amp Opper M (2001)Sparse representation for gaussian process modelsIn T K Leen T G Dietterich amp V Tresp (Eds) Advances in neural informationprocessing systems 13 (pp 444ndash450) Cambridge MA MIT Press

Fernandez R (1999) Predicting time series with a local support vector regres-sion machine InAdvancedCourse onArticial Intelligence (ACAI rsquo99) Availableon-line at httpwwwiitdemokritosgrskeleetnacai99

Fliege J amp Heseler A (2002) Constructing approximations to the ef-cient set of convex quadratic multiobjective problems Dortmund GermanyFachbereich Mathematik Universitat Dortmund Available on-line athttpwwwoptimization-onlineorgDB HTML200201426html

Friedel RH Reeves G D amp Obara T (2002) Relativistic electron dynamics inthe inner magnetospheremdasha review Journal of Atmosphericand Solar-TerrestrialPhysics 64 265ndash282

Gentile C (2001)A new approximate maximal margin classication algorithmJournal of Machine Learning Research 2 213ndash242

Gondzio J (1998)Warm start of the primal-dual method applied in the cuttingplane scheme Mathematical Programming 83 125ndash143

Gondzio J amp Grothey A (2001) Reoptimization with the primal-dual interiorpoint method SIAM Journal on Optimization 13 842ndash864

Graepel T Herbrich R amp Williamson R C (2001) From margin to sparsityIn T K Leen T G Dietterich amp V Tresp (Eds) Advances in neural informationprocessing systems 13 (pp 210ndash216) Cambridge MA MIT Press

Gunn S (1998) Matlab SVM toolbox Software Available on-line at httpwwwisisecssotonacukresourcessvminfo

Herbster M (2001) Learning additive models online with fast evaluating ker-nels In D P Helmbold amp B Williamson (Eds) Proceedings of the 14th An-nual Conference on Computational Learning Theory (pp 444ndash460) New YorkSpringer-Verlag

Joachims T (2000) Estimating the generalization performance of an SVM ef-ciently In P Langley (Ed) Proceedings of the Seventeenth International Confer-ence on Machine Learning (pp 431ndash438) San Mateo Morgan Kaufmann

Kivinen J Smola A Jamp Williamson R C (2002)Online learning with kernelsIn T G Dietterich S Becker amp Z Ghahramani (Eds) Advances in neuralinformation processing systems 14 (pp 785ndash792) Cambridge MA MIT Press

Lee J-H amp Lin C-J (2001)Automatic model selection for support vector machines(Tech Rep) Taipei Taiwan Department of Computer Science and Informa-tion Engineering National Taiwan University

Li Y amp Long P M (1999) The relaxed online maximum margin algorithm InS A Solla T K Leen amp K-R Muller (Eds) Advances in neural informationprocessing systems 12 (pp 498ndash504) Cambridge MA MIT Press

Accurate On-line Support Vector Regression 2703

Mackey M C amp Glass L (1977)Oscillation and chaos in physiological controlsystems Science 197 287ndash289

Martin M (2002)On-line supportvectormachines for functionapproximation(TechRep LSI-02-11-R)Catalunya Spain Software Department Universitat Po-litecnica de Catalunya

Muller K-R Smola A J Ratsch G Scholkopf B Kohlmorgen J amp VapnikV (1997)Predicting time series with support vector machines In W Gerstner(Ed) Articial Neural NetworksmdashICANN rsquo97 (pp 999ndash1004) Berlin Springer-Verlag

Ralaivola L amp drsquoAlche-Buc F (2001) Incremental support vector machinelearning A local approach In G Dorffner H Bischof amp K Hornik (Eds) Ar-ticial Neural NetworksmdashICANN 2001 (pp 322ndash330) Berlin Springer-Verlag

Shevade S K Keerthi S S Bhattacharyya C amp Murthy K R K (1999)Improvements to SMO algorithm for SVM regression (Tech Rep No CD-99-16)Singapore National University of Singapore

Smola A (1998) Interior point optimizer for SVM pattern recognition SoftwareAvailable on-line at httpwwwkernel-machinesorgcodeprloqotargz

Smola A J amp Scholkopf B (1998) A tutorial on support vector regression (Neu-roCOLT Tech Rep No NC-TR-98-030) London Royal Holloway CollegeUniversity of London

Syed N A Liu H amp Sung K K (1999) Incremental learning with supportvector machines In Proceedings of the Workshop on Support Vector Machines atthe International Joint Conference on Articial IntelligencemdashIJCAI-99 San MateoMorgan Kaufmann

Tashman L J (2000) Out-of-sample tests of forecasting accuracy An analysisand review International Journal of Forecasting 16 437ndash450

Tay F E H amp Cao L (2001)Application of support vectormachines in nancialtime series forcasting Omega 29 309ndash317

Vanderbei R J (1999) LOQO An interior point code for quadratic program-ming Optimization Methods and Software 11 451ndash484

Vapnik V (1998) Statistical learning theory New York WileyVapnik V amp Chapelle O (1999)Bounds on error expectation for support vector

machine In A Smola P Bartlett B Scholkopf amp D Schuurmans (Eds)Advances in large margin classiers (pp 261ndash280) Cambridge MA MIT Press

Weigend A S amp Gershenfeld N A (1994)Time-series predictionForecasting thefuture and understanding the past Reading MA Addison-Wesley

Yildirim E A amp Wright S J (2002) Warm-start strategies in interior-pointmethods for linear programming SIAM Journal on Optimization 12 782ndash810

Received September 12 2002 accepted May 1 2003

Page 6: Accurate On-line Support Vector Regression · 2006. 8. 28. · Accurate On-line Support Vector Regression 2689 In the special case whenSset is empty, according to equations 3.1 and

2688 J Ma J Theiler and S Perkins

If we dene the index of the samples in the S set as

S D fs1 s2 sls g (34)

equation 33 can be represented in matrix form as

2

6664

0 1 cent cent cent 11 Qs1s1 cent cent cent Qs1sls

1 Qsls s1 cent cent cent Qsls sls

3

7775

2

6664

1b1micros1

1microsls

3

7775 D iexcl

2

6664

1Qs1c

Qsls c

3

7775 1microc (35)

that is2

6664

1b1micros1

1microsls

3

7775 D macr1microc (36)

where

macr D

2

6664

macr

macrs1

macrsls

3

7775 D iexclR

2

6664

1Qs1c

Qsls c

3

7775

where R D

2

6664

0 1 cent cent cent 11 Qs1s1 cent cent cent Qs1sls

1 Qsls s1 cent cent cent Qsls sls

3

7775

iexcl1

(37)

Dene a non-S or N set as N D E [ R D fn1 n2 nlng Combiningequations 29 through 212 31 and 36 we obtain

2

6664

1hxn1

1hxn2

1hxnln

3

7775D deg 1microc (38)

where

deg D

2

6664

Qn1cQn2c

Qnln c

3

7775C

2

6664

1 Qn1s1 cent cent cent Qn1sls

1 Qn2s1 cent cent cent Qn2sls

1 Qnln s1 cent cent cent Qnln sls

3

7775 macr (39)

Accurate On-line Support Vector Regression 2689

In the special case when S set is empty according to equations 31 and 32equation 39 simplies to 1hxn D 1b for all n 2 E [ R

Given 1microc we can update microi i 2 S and b according to equation 36 andupdate hxi i 2 N according to equation 38 Moreover equation 29 sug-gests that microi i 2 N and hxi i 2 S are constant if the S set stays unchangedTherefore the results presented in this section enable us to update all the microiand hxi given 1microc In the next section we address the question of how tond an appropriate 1microc

32 AOSVR Bookkeeping Procedure Equations 36 and 38 hold onlywhen the samples in the S set do not change membership Therefore 1microc ischosen to be the largest value that either can maintain the S set unchangedor lead to the termination of the incremental algorithm

The rst step is to determine whether the change 1microc should be positiveor negative According to equation 29

sign1microc D signyc iexcl f xc D signiexclhxc (310)

The next step is to determine a bound on 1microc imposed by each sample inthe training set To simplify exposition we consider only the case 1microc gt 0and remark that the case 1microc lt 0 is similarFor the new sample xc there are two cases

Case 1 hxc changes from hxc lt iexcl to hxc D iexcl and the newsample xc is added to the S set and the algorithm terminates

Case 2 If microc increases from microc lt C to microc D C the new sample xc is addedto the E set and the algorithml terminates

For each sample xi in the set S

Case 3 If microi changes from 0 lt jmicroij lt C to jmicroij D C sample xi changesfrom the S set to the E set If microi changes to microi D 0 sample xi changesfrom the S set to the R set

For each sample xi in the set E

Case 4 If hxi changes from jhxij gt to jhxij D xi is moved fromthe E set to the S set

For each sample xi in the set R

Case 5 If hxi changes from jhxij lt to jhxij D xi is moved fromthe R set to the S set

The bookkeeping procedure is to trace each sample in the training setT against these ve cases and determine the allowed 1microc for each sampleaccording to equation 36 or 38 The nal 1microc is dened as the one withminimum absolute value among all the possible 1microc

2690 J Ma J Theiler and S Perkins

33 Efciently Updating the R Matrix The matrix R that is used inequation 37

R D

2

6664

0 1 cent cent cent 11 Qs1s1 cent cent cent Qs1sls

1 Qsls s1 cent cent cent Qsls sls

3

7775

iexcl1

(311)

must be updated whenever the S set is changed Following Cauwenberghsand Poggio (2001) we can efciently update R without explicitlycomputingthe matrix inverse When the kth sample xsk in the S set is removed from theS set the new R can be obtained as follows

Rnew D RII iexclRIkRkI

Rkk where

I D [1 cent cent cent k k C 2 cent cent cent Sls C 1] (312)

When a new sample is added to S set the new R can be updated as follows

Rnew D

2

6664

0

R

00 cent cent cent 0 0

3

7775 C1degi

micromacr1

para poundmacrT 1

curren (313)

where macr is dened as

macr D iexclR

2

6664

1Qs1 i

Qsls i

3

7775

and degi is dened as

degi D Qii C

2

6664

1Qs1i

Qsls i

3

7775 macr

when the sample xi was moved from E set to R set In contrast when thesample xc is the sample added to S set macr is can be obtained according toequation 37 and degi is the last element of deg dened in equation 39

34 Initialization of the Incremental Algorithm An initial SVR solu-tion can be obtained from a batch SVR solution and in most cases that is themost efcient approach But it is sometimes convenient to use AOSVR to

Accurate On-line Support Vector Regression 2691

produce a full solution from scratch An efcient starting point is the two-sample solution Given a training set T D fx1 y1 x2 y2g with y1 cedil y2the solution of equation 23 is

micro1 D maxsup3

0 minsup3

Cy1 iexcl y2 iexcl 2

2K11 iexcl K12

acuteacute

micro2 D iexclmicro1

b D y1 D y2=2 (314)

The sets E S and R are initialized from these two points based on equa-tions 210 through 212 If the set S is nonempty the matrix R can be ini-tialized from equation 311 As long as S is empty the matrix R will not beused

4 Decremental Algorithm

The decremental (or ldquounlearningrdquo) algorithm is employed when an existingsample is removed from the training set If a sample xc is in the R set then itdoes not contribute to the SVR solution and removing it from the trainingset is trivial no adjustments are needed If on the other hand xc has anonzero coefcient then the idea is to gradually reduce the value of thecoefcient to zero while ensuring all the other samples in the training setcontinue to satisfy the KKT conditions

The decremental algorithm follows the incremental algorithm with a fewsmall adjustments

sup2 The direction of the change of microc is

sign1microc D sign f xc iexcl yc D signhxc (41)

sup2 There is no case 1 because the removed xc need not satisfy KKT con-ditions

sup2 The condition in case 2 becomes microc changing from jmicrocj gt 0 to microc D 0

5 Applications and Comparison with Batch Algorithms

The accurate on-line SVR (AOSVR) learning algorithm produces exactlythe same SVR as the conventional batch SVR learning algorithm and canbe applied in all scenarios where batch SVR is currently employed But foron-line time-series prediction and leave-one-out cross-validation (LOOCV)the AOSVR algorithm is particularly well suited In this section we demon-strate AOSVR for both of these applications and compare its performance toexisting batch SVR algorithms These comparisons are based on direct tim-ing of runs using Matlab implementations such timings should be treatedwith some caution as they can be sensitive to details of implementation

2692 J Ma J Theiler and S Perkins

51 AOSVR versus Batch SVR Algorithms with Warm Start Mostbatch algorithms for SVR are implemented as ldquocold startrdquo This is appro-priate when a t is desired to a batch of data that has not been seen beforeHowever in recent years there has been a growing interest in ldquowarm-startrdquoalgorithms that can save time by starting from an appropriate solution andquite a few articles have addressed this issue in the generic context of nu-meric programming (Gondzio 1998 Gondzio amp Grothey 2001 Yildirim ampWright 2002 Fliege amp Heseler 2002) The warm-start algorithms are usefulfor incremental (or decremental) learning because the solution with N iexcl 1(or N C 1) data points provides a natural starting point for nding the so-lution with N data points In this sense AOSVR is a kind of warm-startalgorithm for the QP problem equation 23 which is specially designedfor the incremental or decremental scenario This specialty allows AOSVRto achieve more efciency when handling SVR incremental or decrementallearning as demonstrated in our subsequent experiments

In the machine learning community three algorithms for batch SVRtrain-ing are widely recognized Gunn (1998) solved SVR training as a genericQP optimization we call this implementation QPSVMR Shevade KeerthiBhattacharyya and Murthy (1999) proposed an algorithm specially de-signed for SVR training and it is an improved version of the sequential min-imal optimization for SVM regression (SMOR) and Chang and Lin (2001)proposed another algorithm specially designed for SVR training which wecall LibSVMR since it is implemented as part of the LibSVM software pack-age We implemented all these algorithms so that they can run in both acold-start and a warm-start mode SMOR and LibSVMR are implementedin Matlab and both algorithms allow a straightforward warm-start realiza-tion Because QPSVMR is based on a generic QP algorithm it is much lessefcient than SMOR or LibSVMR To make our subsequent experimentsfeasible we had to implement the QPSVMR core in C (Smola 1998) Smolaessentially employs the interior point QP code of LOQO (Vanderbei 1999)The warm start of QPSVMR directly adopts the warm-start method embed-ded in Smolarsquos (1998) implementation

52 On-line Time-Series Prediction In recent years the use of SVR fortime-series prediction has attracted increased attention (Muller et al 1997Fernandez 1999 Tay amp Cao 2001) In an on-line scenario one updates amodel from incoming data and at the same time makes predictions basedon that model This arises for instance in market forecasting scenariosAnother potential application is the (near) real-time prediction of electrondensity around a satellite in the magnetosphere high charge densities candamage satellite equipment (Friedel Reeves and Obara 2002) and if timesof high charge can be predicted the most sensitive components can beturned off before they are damaged

In time-series prediction the prediction origin denoted O is the timefrom which the prediction is generated The time between the prediction

Accurate On-line Support Vector Regression 2693

origin and the predicted data point is the prediction horizon which forsimplicity we take as one time step

A typical on-line time-series prediction scenario can be represented asfollows (Tashman 2000)

1 Given a time series fxt t D 1 2 3 g and prediction origin Oconstruct a set of training samples AOB from the segment of timeseries fxt t D 1 Og as AOB D fXt yt t D B 0 iexcl 1gwhere Xt D [xt xt iexcl B C 1]T yt D xt C 1 and B is theembedding dimension of the training set AOB

2 Train a predictor PAOBI X from the training set AOB

3 Predict xO C 1 using OxO C 1 D PAOBI XO

4 When x0 C 1 becomes available update the prediction origin O DO C 1 Then go to step 1 and repeat the procedure

Note that the training set AOB keeps growing as O increases so thetraining of the predictor in step 2 becomes increasingly expensive There-fore many SVR-based time-series predictionsare implemented in a compro-mised way (Tay amp Cao 2001) After the predictor is obtained it stays xedand is not updated as new data arrive In contrast an on-line predictionalgorithm can take advantage of the fact that the training set is augmentedone sample at a time and continues to update and improve the model asmore data arrive

521 Experiments Two experiments were performed to compare theAOSVR algorithm with the batch SVR algorithm We were careful to usethe same algorithm parameters for on-line and batch SVR but since ourpurpose is to compare computational performance we did not attempt tooptimize these parameters for each data set In these experiments the kernelfunction is a gaussian radial basis function expiexcldeg kXiiexclXjk2 where deg D 1the regularization coefcient C and the insensitivity parameter in equation21 are set to 10 and 01 respectively the embedding dimension B of thetraining AOB is 5 Also we scale all the time-series to [iexcl11]

Three widely used benchmark time series are employed in both exper-iments (1) the Santa Fe Institute Competition time series A (Weigend ampGershenfeld1994) (2) the Mackey-Glass equation with iquest D 17 (Mackeyamp Glass 1977) and (3) the yearly average sunspot numbers recorded from1700 to 1995 Some basic information about these time series is listed in Ta-ble 1 The SV ratio is the number of support vectors divided by the numberof training samples This is based on a prediction of the last data point usingall previous data for training In general a higher SV ratio suggests that theunderlying problem is harder (Vapnik 1998)

The rst experiment demonstrates that using a xed predictor producesless accurate predictions than using a predictor that is updated as new data

2694 J Ma J Theiler and S Perkins

Table 1 Information Regarding Experimental Time Series

Number of Data Points SV Ratio

Santa Fe Institute 1000 452Mackey-Glass 1500 154Yearly Sunspot 292 4181

become available Two measurements are used to quantify the predictionperformance mean squared error (MSE) and mean absolute error (MAE)The predictors are initially trained on the rst half of the data in the timeseries In the xed case the same predictor is used to predict the second halfof the time series In the on-line case the predictor is updated whenever anew data point is available The performance measurements for both casesare calculated from the predicted and actual values of the second half of thedata in the time series As shown in Table 2 the on-line predictor outper-forms the xed predictor in every case We also note that the errors for thethree time series in Table 2 coincide with the estimated prediction difcultyin Table 1 based on the SV ratio

The second experiment compares AOSVR with batch implementationsusing both cold start and warm start in the on-line prediction scenarioFor each benchmark time series an initial SVR predictor is trained on therst two data points using the batch SVR algorithms For AOSVR we usedequation 314 Afterward both AOSVR and batch SVR algorithms are em-ployed in the on-line prediction mode for the remaining data points in thetime series AOSVR and the batch SVR algorithms produce exactly the sameprediction errors in this experiment so the comparison is only of predic-tion speed All six batch SVR algorithms are compared with AOSVR on thesunspot time series and the experimental results are plotted in Figure 2 Thex-axis of this plot is the number of data points to which the on-line predic-tion model is applied Note that the core of QPSVMR is implemented in CBecause the cold start and warm start of LibSVMR clearly outperform thoseof both SMOR and QPSVMR only the comparison between LibSVMR andAOSVR is carried out in our subsequent experiments The experimental re-

Table 2 Performance Comparison for On-line and Fixed Predictors

On-line Fixed

Santa Fe Institute MSE 00072 00097MAE 00588 00665

Mackey-Glass MSE 00034 00036MAE 00506 00522

Yearly Sunspot MSE 00263 00369MAE 01204 01365

Accurate On-line Support Vector Regression 2695

50 100 150 200 250

10shy 1

100

101

102

103

104

Number of Points

Tim

e (i

n s

ec)

SMOR (Cold Start)SMOR (Warm Start)QPSVMR (Cold Start)QPSVMR (Warm Start)LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

Figure 2 Real-time prediction time of yearly sunspot time series

sults of both Santa Fe Institute and Mackey-Glass time series are presentedin Figures 3 and 4 respectively

These experimental results demonstrate that AOSVR algorithm is gen-erally much faster than the batch SVR algorithms when applied to on-lineprediction Comparison of Figures 2 and 4 furthermore suggests that morespeed improvement is achieved on the sunspot data than on the Mackey-Glass We speculate that this is because the sunspot problem is ldquoharderrdquothan the Mackey-Glass (it has a higher support vector ratio) and that theperformance of the AOSVR algorithm is less sensitive to problem difculty

To test this hypothesis we compared the performance of AOSVR to Lib-SVMR on a single data set (the sunspots) whose difculty was adjusted bychanging the value of A smaller leads to a higher support vector ratioand a more difcult problem Both the AOSVR and LibSVMR algorithmswere employed for on line prediction of the full time series The overallprediction times are plotted against in Figure 5 Where AOSVR perfor-mance varied by a factor of less than 10 over the range of the LibSVMRperformance varied by a factor of about 100

522 Limited-Memory Version of the On-line Time-Series Prediction ScenarioOne problem with on-line time-series prediction in general is that the longerthe prediction goes on the bigger the training set AOB will become and the

2696 J Ma J Theiler and S Perkins

100 200 300 400 500 600 700 800 900

10shy 1

100

101

102

103

104

Number of Points

Tim

e (i

n s

ec)

LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

Figure 3 Real-time prediction time of Santa Fe Institute time series

200 400 600 800 1000 1200 1400

10shy 1

100

101

102

103

104

Number of Points

Tim

e (i

n s

ec)

LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

Figure 4 Real-time prediction time of Mackey-Glass time series

Accurate On-line Support Vector Regression 2697

005 01 015 02 025 03 035 04 045 05

102

Ove

rall

Pre

dic

tio

n T

ime LibSVMR (Cold Start)

LibSVMR (Warm Start)AOSVR

005 01 015 02 025 03 035 04 045 05

200

400

600

800

1000

e

Ove

rall

Pre

dic

tio

n T

ime LibSVMR (Cold Start)

LibSVMR (Warm Start)AOSVR

Figure 5 Semilog and linear plots of prediction time of yearly sunspot timeseries

more SVs will be involved in SVR predictor A complicated SVR predictorimposes both memory and computation stress on the prediction systemOne way to deal with this problem is to impose a ldquoforgettingrdquo time WWhen training set AOB grows to this maximum W then the decrementalalgorithm is used to remove the oldest sample before the next new sampleis added to the training set

We note that this variant of the on-line prediction scenario is also poten-tially suitable for nonstationary time series as it can be updated in real timeto t the most recent behavior of the time series More rigorous investigationin this direction will be a future effort

53 Leave-One-Out Cross-Validation Cross-validation is a useful toolfor assessing the generalization ability of a machine learning algorithm Theidea is to train on one subset of the data and then to test the accuracy ofthe predictor on a separate disjoint subset In leave-one-out cross-validation(LOOCV) only a single sample is used for testing and all the rest are usedfor training Generally this is repeated for every sample in the data setWhen the batch SVR is employed LOOCV can be very expensive sincea fullretraining is done for each sample One compromise approach is to estimateLOOCV with related but less expensive approximations such as the xi-

2698 J Ma J Theiler and S Perkins

alpha bound (Joachims 2000) and approximate span bound (Vapnik ampChapelle 1999) Although Lee and Lin (2001) proposed a numerical solutionto reduce the computation for directly implementing LOOCV the amount ofcomputation required is still considerable Also the accuracy of the LOOCVresult obtained using this method can be potentially compromised becausea different parameter set is employed in LOOCV and in the nal training

The decremental algorithm of AOSVR provides an efcient implemen-tation of LOOCV for SVR

1 Given a data set D construct the SVR function f x from the wholedata set D using batch SVR learning algorithm

2 For each nonsupport vector xi in the data set D calculate error eicorresponding to xi as ei D yi iexcl f xi where yi is the target valuecorresponding to xi

3 For each support vector xi involved in the SVR function f x

(a) Unlearn xi from the SVR function f x using the decrementalalgorithm to obtain the SVR function fix which would beconstructed from the data set Di D D=fxig

(b) Calculate error ei corresponding to support vector xi as ei Dyi iexcl fixi where yi is the target value corresponding to xi

4 Knowing the error for each sample xi in D it is possible to constructa variety of overall measures a simple choice is the MSE

MSELOOCVD D1N

NX

ie2i (51)

where N is the number of samples in data set D Other choices oferror metric such as MAE can be obtained by altering equation 51appropriately

531 Experiment The algorithm parameters in this experiment are setthe same as those in the experiments in section 511 Two famous regressiondata sets the Auto-MPG and Boston Housing data sets are chosen from theUCI Machine-Learning Repository Some basic information on these datasets is listed in Table 3

Table 3 Information Regarding Experimental Regression Data Sets

Number of Attributes Number of Samples SV Ratio

Auto-MPG 7 392 4107Boston Housing 13 506 3636

Accurate On-line Support Vector Regression 2699

200 220 240 260 280 300 320 340 360 380

102

LOOCV Time Comparison of Autoshy MPG Dataset

Samples in Training Set

LO

OC

V T

ime

(sec

on

d)

LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

300 350 400 450 500

101

102

LOOCV Time Comparison of Boston Housing Dataset

Samples in Training Set

LO

OC

V T

ime

(sec

on

d)

Figure 6 Semilog plots of LOOCV time of Auto-MPG and Boston Housing dataset

The experimental results of both data sets are presented in Figure 6 Thex-axis is the size of the training set upon which the LOOCV is implementedThese plots show that AOSVR-based LOOCV is much faster than its Lib-SVMR counterpart

6 Conclusion

We have developed and implemented an accurate on-line support vectorregression (AOSVR) algorithm that permits efcient retraining when a newsample is added to or an existing sample is removed from the training setAOSVR is applied to on-line time-series prediction and to leave-one-outcross-validation and the experimental results demonstrate that the AOSVRalgorithm is more efcient than conventional batch SVR in these scenariosMoreover AOSVR appears less sensitive than batch SVR to the difculty ofthe underlying problem

After this manuscript was prepared we were made aware of a similar on-line SVR algorithm which was independently presented in Martin (2002)

2700 J Ma J Theiler and S Perkins

Appendix Pseudocode for Incrementing AOSVR with a New DataSample

Inputs

sup2 Training set T D fxi yi i D 1 lgsup2 Coefcients fmicroi i D 1 lg and bias b

sup2 Partition of samples into sets S E and R

sup2 Matrix R dened in equation 311

sup2 New sample xc yc

Outputs

sup2 Updated coefcients fmicroi i D 1 l C 1g and bias b

sup2 Updated Matrix R

sup2 Updated partition of samples into sets S E and R

AOSVR Incremental Algorithm

sup2 Initialize microc D 0

sup2 Compute f xc DX

i2E[S

microiQic C b

sup2 Compute hxc D f xc iexcl yc

sup2 I jhxcj middot then assign xc to R and terminate

sup2 Let q D signiexclhxc be the sign that 1microc will take

sup2 Do until the new sample xc meets the KKT condition

mdash Update macr deg according to equations 37 and 39mdash Start bookkeeping procedure

Check the new sample xcmdashLc1 D iexclhxc iexcl q=degc (Case 1)mdashLc2 D qC iexcl microc (Case 2)Check each sample xi in the set S (Case 3)mdashIf q i gt 0 and C gt microi cedil 0 LS

i D C iexcl microi=macri

mdashIf q i gt 0 and 0 gt microi cedil iexclC LSi D iexclmicroi=macri

mdashIf q i lt 0 and C cedil microi gt 0 LSi D iexclmicroi=macri

mdashIf q i lt 0 and 0 cedil microi gt iexclC LSi D iexclC iexcl microi=macri

Check each sample xi in the set E (Case 4)mdashLE

i D iexclhxi iexcl signqmacri=macri

Check each sample xi in the set R (Case 5)mdashLR

i D iexclhxi iexcl signqmacri=macri

Accurate On-line Support Vector Regression 2701

Set 1microc D qminjLc1j jLc2j jLSj jLEj jLRj whereLS D fLS

i i 2 Sg LE D fLEi i 2 Eg and LR D fLR

i i 2 RgLet Flag be the case number that determines 1micro

Let xl be the particular sample in T that determines 1microc

mdash End Bookkeeping Proceduremdash Update microc b and microi i 2 S according to equation 36mdash Update hxi i 2 E [ R according to equation 38mdash Switch Flag

(Flag=1)Add new sample xc to set S update matrix R accordingto equation 313(Flag=2)Add new sample xc to set E(Flag=3)If microl D 0 move xl to set R update R according to equa-tion 312If jmicrolj D C move xl to set E update R according toequation 312(Flag=4)Move xl to set S update R according to equation 313(Flag=5)Move xl to set S update R according to equation 313

mdash End Switch Flagmdash If Flagmiddot 2 terminate otherwise continue the Do-Loop

sup2 Terminate incremental algorithm ready for the next sample

Acknowledgments

We thank Chih-Jen Lin in National University of Taiwan for useful sug-gestions on some implementation issues We also thank the anonymousreviewers for pointing us to the work of Martin (2002) and for suggestingthat we compare AOSVR to the warm-start variants of batch algorithmsThis work is supported by the NASA project NRA-00-01-AISR-088 and bythe Los Alamos Laboratory Directed Research and Development (LDRD)program

References

Cauwenberghs G amp Poggio T (2001) Incremental and decremental supportvector machine learning In T K Leen T G Dietterich amp V Tresp (Eds)

2702 J Ma J Theiler and S Perkins

Advances in neural informationprocessingsystems 13 (pp409ndash123)CambridgeMA MIT Press

Chang C-C amp Lin C-J (2001) LIBSVM a library for support vector machinesSoftware Available on-line at httpwwwcsientuedutwraquocjlinlibsvm

Chang C-C amp Lin C-J (2002) Training ordm-support vector regression Theoryand algorithms Neural Computation 14 1959ndash1977

CsatoL amp Opper M (2001)Sparse representation for gaussian process modelsIn T K Leen T G Dietterich amp V Tresp (Eds) Advances in neural informationprocessing systems 13 (pp 444ndash450) Cambridge MA MIT Press

Fernandez R (1999) Predicting time series with a local support vector regres-sion machine InAdvancedCourse onArticial Intelligence (ACAI rsquo99) Availableon-line at httpwwwiitdemokritosgrskeleetnacai99

Fliege J amp Heseler A (2002) Constructing approximations to the ef-cient set of convex quadratic multiobjective problems Dortmund GermanyFachbereich Mathematik Universitat Dortmund Available on-line athttpwwwoptimization-onlineorgDB HTML200201426html

Friedel RH Reeves G D amp Obara T (2002) Relativistic electron dynamics inthe inner magnetospheremdasha review Journal of Atmosphericand Solar-TerrestrialPhysics 64 265ndash282

Gentile C (2001)A new approximate maximal margin classication algorithmJournal of Machine Learning Research 2 213ndash242

Gondzio J (1998)Warm start of the primal-dual method applied in the cuttingplane scheme Mathematical Programming 83 125ndash143

Gondzio J amp Grothey A (2001) Reoptimization with the primal-dual interiorpoint method SIAM Journal on Optimization 13 842ndash864

Graepel T Herbrich R amp Williamson R C (2001) From margin to sparsityIn T K Leen T G Dietterich amp V Tresp (Eds) Advances in neural informationprocessing systems 13 (pp 210ndash216) Cambridge MA MIT Press

Gunn S (1998) Matlab SVM toolbox Software Available on-line at httpwwwisisecssotonacukresourcessvminfo

Herbster M (2001) Learning additive models online with fast evaluating ker-nels In D P Helmbold amp B Williamson (Eds) Proceedings of the 14th An-nual Conference on Computational Learning Theory (pp 444ndash460) New YorkSpringer-Verlag

Joachims T (2000) Estimating the generalization performance of an SVM ef-ciently In P Langley (Ed) Proceedings of the Seventeenth International Confer-ence on Machine Learning (pp 431ndash438) San Mateo Morgan Kaufmann

Kivinen J Smola A Jamp Williamson R C (2002)Online learning with kernelsIn T G Dietterich S Becker amp Z Ghahramani (Eds) Advances in neuralinformation processing systems 14 (pp 785ndash792) Cambridge MA MIT Press

Lee J-H amp Lin C-J (2001)Automatic model selection for support vector machines(Tech Rep) Taipei Taiwan Department of Computer Science and Informa-tion Engineering National Taiwan University

Li Y amp Long P M (1999) The relaxed online maximum margin algorithm InS A Solla T K Leen amp K-R Muller (Eds) Advances in neural informationprocessing systems 12 (pp 498ndash504) Cambridge MA MIT Press

Accurate On-line Support Vector Regression 2703

Mackey M C amp Glass L (1977)Oscillation and chaos in physiological controlsystems Science 197 287ndash289

Martin M (2002)On-line supportvectormachines for functionapproximation(TechRep LSI-02-11-R)Catalunya Spain Software Department Universitat Po-litecnica de Catalunya

Muller K-R Smola A J Ratsch G Scholkopf B Kohlmorgen J amp VapnikV (1997)Predicting time series with support vector machines In W Gerstner(Ed) Articial Neural NetworksmdashICANN rsquo97 (pp 999ndash1004) Berlin Springer-Verlag

Ralaivola L amp drsquoAlche-Buc F (2001) Incremental support vector machinelearning A local approach In G Dorffner H Bischof amp K Hornik (Eds) Ar-ticial Neural NetworksmdashICANN 2001 (pp 322ndash330) Berlin Springer-Verlag

Shevade S K Keerthi S S Bhattacharyya C amp Murthy K R K (1999)Improvements to SMO algorithm for SVM regression (Tech Rep No CD-99-16)Singapore National University of Singapore

Smola A (1998) Interior point optimizer for SVM pattern recognition SoftwareAvailable on-line at httpwwwkernel-machinesorgcodeprloqotargz

Smola A J amp Scholkopf B (1998) A tutorial on support vector regression (Neu-roCOLT Tech Rep No NC-TR-98-030) London Royal Holloway CollegeUniversity of London

Syed N A Liu H amp Sung K K (1999) Incremental learning with supportvector machines In Proceedings of the Workshop on Support Vector Machines atthe International Joint Conference on Articial IntelligencemdashIJCAI-99 San MateoMorgan Kaufmann

Tashman L J (2000) Out-of-sample tests of forecasting accuracy An analysisand review International Journal of Forecasting 16 437ndash450

Tay F E H amp Cao L (2001)Application of support vectormachines in nancialtime series forcasting Omega 29 309ndash317

Vanderbei R J (1999) LOQO An interior point code for quadratic program-ming Optimization Methods and Software 11 451ndash484

Vapnik V (1998) Statistical learning theory New York WileyVapnik V amp Chapelle O (1999)Bounds on error expectation for support vector

machine In A Smola P Bartlett B Scholkopf amp D Schuurmans (Eds)Advances in large margin classiers (pp 261ndash280) Cambridge MA MIT Press

Weigend A S amp Gershenfeld N A (1994)Time-series predictionForecasting thefuture and understanding the past Reading MA Addison-Wesley

Yildirim E A amp Wright S J (2002) Warm-start strategies in interior-pointmethods for linear programming SIAM Journal on Optimization 12 782ndash810

Received September 12 2002 accepted May 1 2003

Page 7: Accurate On-line Support Vector Regression · 2006. 8. 28. · Accurate On-line Support Vector Regression 2689 In the special case whenSset is empty, according to equations 3.1 and

Accurate On-line Support Vector Regression 2689

In the special case when S set is empty according to equations 31 and 32equation 39 simplies to 1hxn D 1b for all n 2 E [ R

Given 1microc we can update microi i 2 S and b according to equation 36 andupdate hxi i 2 N according to equation 38 Moreover equation 29 sug-gests that microi i 2 N and hxi i 2 S are constant if the S set stays unchangedTherefore the results presented in this section enable us to update all the microiand hxi given 1microc In the next section we address the question of how tond an appropriate 1microc

32 AOSVR Bookkeeping Procedure Equations 36 and 38 hold onlywhen the samples in the S set do not change membership Therefore 1microc ischosen to be the largest value that either can maintain the S set unchangedor lead to the termination of the incremental algorithm

The rst step is to determine whether the change 1microc should be positiveor negative According to equation 29

sign1microc D signyc iexcl f xc D signiexclhxc (310)

The next step is to determine a bound on 1microc imposed by each sample inthe training set To simplify exposition we consider only the case 1microc gt 0and remark that the case 1microc lt 0 is similarFor the new sample xc there are two cases

Case 1 hxc changes from hxc lt iexcl to hxc D iexcl and the newsample xc is added to the S set and the algorithm terminates

Case 2 If microc increases from microc lt C to microc D C the new sample xc is addedto the E set and the algorithml terminates

For each sample xi in the set S

Case 3 If microi changes from 0 lt jmicroij lt C to jmicroij D C sample xi changesfrom the S set to the E set If microi changes to microi D 0 sample xi changesfrom the S set to the R set

For each sample xi in the set E

Case 4 If hxi changes from jhxij gt to jhxij D xi is moved fromthe E set to the S set

For each sample xi in the set R

Case 5 If hxi changes from jhxij lt to jhxij D xi is moved fromthe R set to the S set

The bookkeeping procedure is to trace each sample in the training setT against these ve cases and determine the allowed 1microc for each sampleaccording to equation 36 or 38 The nal 1microc is dened as the one withminimum absolute value among all the possible 1microc

2690 J Ma J Theiler and S Perkins

33 Efciently Updating the R Matrix The matrix R that is used inequation 37

R D

2

6664

0 1 cent cent cent 11 Qs1s1 cent cent cent Qs1sls

1 Qsls s1 cent cent cent Qsls sls

3

7775

iexcl1

(311)

must be updated whenever the S set is changed Following Cauwenberghsand Poggio (2001) we can efciently update R without explicitlycomputingthe matrix inverse When the kth sample xsk in the S set is removed from theS set the new R can be obtained as follows

Rnew D RII iexclRIkRkI

Rkk where

I D [1 cent cent cent k k C 2 cent cent cent Sls C 1] (312)

When a new sample is added to S set the new R can be updated as follows

Rnew D

2

6664

0

R

00 cent cent cent 0 0

3

7775 C1degi

micromacr1

para poundmacrT 1

curren (313)

where macr is dened as

macr D iexclR

2

6664

1Qs1 i

Qsls i

3

7775

and degi is dened as

degi D Qii C

2

6664

1Qs1i

Qsls i

3

7775 macr

when the sample xi was moved from E set to R set In contrast when thesample xc is the sample added to S set macr is can be obtained according toequation 37 and degi is the last element of deg dened in equation 39

34 Initialization of the Incremental Algorithm An initial SVR solu-tion can be obtained from a batch SVR solution and in most cases that is themost efcient approach But it is sometimes convenient to use AOSVR to

Accurate On-line Support Vector Regression 2691

produce a full solution from scratch An efcient starting point is the two-sample solution Given a training set T D fx1 y1 x2 y2g with y1 cedil y2the solution of equation 23 is

micro1 D maxsup3

0 minsup3

Cy1 iexcl y2 iexcl 2

2K11 iexcl K12

acuteacute

micro2 D iexclmicro1

b D y1 D y2=2 (314)

The sets E S and R are initialized from these two points based on equa-tions 210 through 212 If the set S is nonempty the matrix R can be ini-tialized from equation 311 As long as S is empty the matrix R will not beused

4 Decremental Algorithm

The decremental (or ldquounlearningrdquo) algorithm is employed when an existingsample is removed from the training set If a sample xc is in the R set then itdoes not contribute to the SVR solution and removing it from the trainingset is trivial no adjustments are needed If on the other hand xc has anonzero coefcient then the idea is to gradually reduce the value of thecoefcient to zero while ensuring all the other samples in the training setcontinue to satisfy the KKT conditions

The decremental algorithm follows the incremental algorithm with a fewsmall adjustments

sup2 The direction of the change of microc is

sign1microc D sign f xc iexcl yc D signhxc (41)

sup2 There is no case 1 because the removed xc need not satisfy KKT con-ditions

sup2 The condition in case 2 becomes microc changing from jmicrocj gt 0 to microc D 0

5 Applications and Comparison with Batch Algorithms

The accurate on-line SVR (AOSVR) learning algorithm produces exactlythe same SVR as the conventional batch SVR learning algorithm and canbe applied in all scenarios where batch SVR is currently employed But foron-line time-series prediction and leave-one-out cross-validation (LOOCV)the AOSVR algorithm is particularly well suited In this section we demon-strate AOSVR for both of these applications and compare its performance toexisting batch SVR algorithms These comparisons are based on direct tim-ing of runs using Matlab implementations such timings should be treatedwith some caution as they can be sensitive to details of implementation

2692 J Ma J Theiler and S Perkins

51 AOSVR versus Batch SVR Algorithms with Warm Start Mostbatch algorithms for SVR are implemented as ldquocold startrdquo This is appro-priate when a t is desired to a batch of data that has not been seen beforeHowever in recent years there has been a growing interest in ldquowarm-startrdquoalgorithms that can save time by starting from an appropriate solution andquite a few articles have addressed this issue in the generic context of nu-meric programming (Gondzio 1998 Gondzio amp Grothey 2001 Yildirim ampWright 2002 Fliege amp Heseler 2002) The warm-start algorithms are usefulfor incremental (or decremental) learning because the solution with N iexcl 1(or N C 1) data points provides a natural starting point for nding the so-lution with N data points In this sense AOSVR is a kind of warm-startalgorithm for the QP problem equation 23 which is specially designedfor the incremental or decremental scenario This specialty allows AOSVRto achieve more efciency when handling SVR incremental or decrementallearning as demonstrated in our subsequent experiments

In the machine learning community three algorithms for batch SVRtrain-ing are widely recognized Gunn (1998) solved SVR training as a genericQP optimization we call this implementation QPSVMR Shevade KeerthiBhattacharyya and Murthy (1999) proposed an algorithm specially de-signed for SVR training and it is an improved version of the sequential min-imal optimization for SVM regression (SMOR) and Chang and Lin (2001)proposed another algorithm specially designed for SVR training which wecall LibSVMR since it is implemented as part of the LibSVM software pack-age We implemented all these algorithms so that they can run in both acold-start and a warm-start mode SMOR and LibSVMR are implementedin Matlab and both algorithms allow a straightforward warm-start realiza-tion Because QPSVMR is based on a generic QP algorithm it is much lessefcient than SMOR or LibSVMR To make our subsequent experimentsfeasible we had to implement the QPSVMR core in C (Smola 1998) Smolaessentially employs the interior point QP code of LOQO (Vanderbei 1999)The warm start of QPSVMR directly adopts the warm-start method embed-ded in Smolarsquos (1998) implementation

52 On-line Time-Series Prediction In recent years the use of SVR fortime-series prediction has attracted increased attention (Muller et al 1997Fernandez 1999 Tay amp Cao 2001) In an on-line scenario one updates amodel from incoming data and at the same time makes predictions basedon that model This arises for instance in market forecasting scenariosAnother potential application is the (near) real-time prediction of electrondensity around a satellite in the magnetosphere high charge densities candamage satellite equipment (Friedel Reeves and Obara 2002) and if timesof high charge can be predicted the most sensitive components can beturned off before they are damaged

In time-series prediction the prediction origin denoted O is the timefrom which the prediction is generated The time between the prediction

Accurate On-line Support Vector Regression 2693

origin and the predicted data point is the prediction horizon which forsimplicity we take as one time step

A typical on-line time-series prediction scenario can be represented asfollows (Tashman 2000)

1 Given a time series fxt t D 1 2 3 g and prediction origin Oconstruct a set of training samples AOB from the segment of timeseries fxt t D 1 Og as AOB D fXt yt t D B 0 iexcl 1gwhere Xt D [xt xt iexcl B C 1]T yt D xt C 1 and B is theembedding dimension of the training set AOB

2 Train a predictor PAOBI X from the training set AOB

3 Predict xO C 1 using OxO C 1 D PAOBI XO

4 When x0 C 1 becomes available update the prediction origin O DO C 1 Then go to step 1 and repeat the procedure

Note that the training set AOB keeps growing as O increases so thetraining of the predictor in step 2 becomes increasingly expensive There-fore many SVR-based time-series predictionsare implemented in a compro-mised way (Tay amp Cao 2001) After the predictor is obtained it stays xedand is not updated as new data arrive In contrast an on-line predictionalgorithm can take advantage of the fact that the training set is augmentedone sample at a time and continues to update and improve the model asmore data arrive

521 Experiments Two experiments were performed to compare theAOSVR algorithm with the batch SVR algorithm We were careful to usethe same algorithm parameters for on-line and batch SVR but since ourpurpose is to compare computational performance we did not attempt tooptimize these parameters for each data set In these experiments the kernelfunction is a gaussian radial basis function expiexcldeg kXiiexclXjk2 where deg D 1the regularization coefcient C and the insensitivity parameter in equation21 are set to 10 and 01 respectively the embedding dimension B of thetraining AOB is 5 Also we scale all the time-series to [iexcl11]

Three widely used benchmark time series are employed in both exper-iments (1) the Santa Fe Institute Competition time series A (Weigend ampGershenfeld1994) (2) the Mackey-Glass equation with iquest D 17 (Mackeyamp Glass 1977) and (3) the yearly average sunspot numbers recorded from1700 to 1995 Some basic information about these time series is listed in Ta-ble 1 The SV ratio is the number of support vectors divided by the numberof training samples This is based on a prediction of the last data point usingall previous data for training In general a higher SV ratio suggests that theunderlying problem is harder (Vapnik 1998)

The rst experiment demonstrates that using a xed predictor producesless accurate predictions than using a predictor that is updated as new data

2694 J Ma J Theiler and S Perkins

Table 1 Information Regarding Experimental Time Series

Number of Data Points SV Ratio

Santa Fe Institute 1000 452Mackey-Glass 1500 154Yearly Sunspot 292 4181

become available Two measurements are used to quantify the predictionperformance mean squared error (MSE) and mean absolute error (MAE)The predictors are initially trained on the rst half of the data in the timeseries In the xed case the same predictor is used to predict the second halfof the time series In the on-line case the predictor is updated whenever anew data point is available The performance measurements for both casesare calculated from the predicted and actual values of the second half of thedata in the time series As shown in Table 2 the on-line predictor outper-forms the xed predictor in every case We also note that the errors for thethree time series in Table 2 coincide with the estimated prediction difcultyin Table 1 based on the SV ratio

The second experiment compares AOSVR with batch implementationsusing both cold start and warm start in the on-line prediction scenarioFor each benchmark time series an initial SVR predictor is trained on therst two data points using the batch SVR algorithms For AOSVR we usedequation 314 Afterward both AOSVR and batch SVR algorithms are em-ployed in the on-line prediction mode for the remaining data points in thetime series AOSVR and the batch SVR algorithms produce exactly the sameprediction errors in this experiment so the comparison is only of predic-tion speed All six batch SVR algorithms are compared with AOSVR on thesunspot time series and the experimental results are plotted in Figure 2 Thex-axis of this plot is the number of data points to which the on-line predic-tion model is applied Note that the core of QPSVMR is implemented in CBecause the cold start and warm start of LibSVMR clearly outperform thoseof both SMOR and QPSVMR only the comparison between LibSVMR andAOSVR is carried out in our subsequent experiments The experimental re-

Table 2 Performance Comparison for On-line and Fixed Predictors

On-line Fixed

Santa Fe Institute MSE 00072 00097MAE 00588 00665

Mackey-Glass MSE 00034 00036MAE 00506 00522

Yearly Sunspot MSE 00263 00369MAE 01204 01365

Accurate On-line Support Vector Regression 2695

50 100 150 200 250

10shy 1

100

101

102

103

104

Number of Points

Tim

e (i

n s

ec)

SMOR (Cold Start)SMOR (Warm Start)QPSVMR (Cold Start)QPSVMR (Warm Start)LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

Figure 2 Real-time prediction time of yearly sunspot time series

sults of both Santa Fe Institute and Mackey-Glass time series are presentedin Figures 3 and 4 respectively

These experimental results demonstrate that AOSVR algorithm is gen-erally much faster than the batch SVR algorithms when applied to on-lineprediction Comparison of Figures 2 and 4 furthermore suggests that morespeed improvement is achieved on the sunspot data than on the Mackey-Glass We speculate that this is because the sunspot problem is ldquoharderrdquothan the Mackey-Glass (it has a higher support vector ratio) and that theperformance of the AOSVR algorithm is less sensitive to problem difculty

To test this hypothesis we compared the performance of AOSVR to Lib-SVMR on a single data set (the sunspots) whose difculty was adjusted bychanging the value of A smaller leads to a higher support vector ratioand a more difcult problem Both the AOSVR and LibSVMR algorithmswere employed for on line prediction of the full time series The overallprediction times are plotted against in Figure 5 Where AOSVR perfor-mance varied by a factor of less than 10 over the range of the LibSVMRperformance varied by a factor of about 100

522 Limited-Memory Version of the On-line Time-Series Prediction ScenarioOne problem with on-line time-series prediction in general is that the longerthe prediction goes on the bigger the training set AOB will become and the

2696 J Ma J Theiler and S Perkins

100 200 300 400 500 600 700 800 900

10shy 1

100

101

102

103

104

Number of Points

Tim

e (i

n s

ec)

LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

Figure 3 Real-time prediction time of Santa Fe Institute time series

200 400 600 800 1000 1200 1400

10shy 1

100

101

102

103

104

Number of Points

Tim

e (i

n s

ec)

LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

Figure 4 Real-time prediction time of Mackey-Glass time series

Accurate On-line Support Vector Regression 2697

005 01 015 02 025 03 035 04 045 05

102

Ove

rall

Pre

dic

tio

n T

ime LibSVMR (Cold Start)

LibSVMR (Warm Start)AOSVR

005 01 015 02 025 03 035 04 045 05

200

400

600

800

1000

e

Ove

rall

Pre

dic

tio

n T

ime LibSVMR (Cold Start)

LibSVMR (Warm Start)AOSVR

Figure 5 Semilog and linear plots of prediction time of yearly sunspot timeseries

more SVs will be involved in SVR predictor A complicated SVR predictorimposes both memory and computation stress on the prediction systemOne way to deal with this problem is to impose a ldquoforgettingrdquo time WWhen training set AOB grows to this maximum W then the decrementalalgorithm is used to remove the oldest sample before the next new sampleis added to the training set

We note that this variant of the on-line prediction scenario is also poten-tially suitable for nonstationary time series as it can be updated in real timeto t the most recent behavior of the time series More rigorous investigationin this direction will be a future effort

53 Leave-One-Out Cross-Validation Cross-validation is a useful toolfor assessing the generalization ability of a machine learning algorithm Theidea is to train on one subset of the data and then to test the accuracy ofthe predictor on a separate disjoint subset In leave-one-out cross-validation(LOOCV) only a single sample is used for testing and all the rest are usedfor training Generally this is repeated for every sample in the data setWhen the batch SVR is employed LOOCV can be very expensive sincea fullretraining is done for each sample One compromise approach is to estimateLOOCV with related but less expensive approximations such as the xi-

2698 J Ma J Theiler and S Perkins

alpha bound (Joachims 2000) and approximate span bound (Vapnik ampChapelle 1999) Although Lee and Lin (2001) proposed a numerical solutionto reduce the computation for directly implementing LOOCV the amount ofcomputation required is still considerable Also the accuracy of the LOOCVresult obtained using this method can be potentially compromised becausea different parameter set is employed in LOOCV and in the nal training

The decremental algorithm of AOSVR provides an efcient implemen-tation of LOOCV for SVR

1 Given a data set D construct the SVR function f x from the wholedata set D using batch SVR learning algorithm

2 For each nonsupport vector xi in the data set D calculate error eicorresponding to xi as ei D yi iexcl f xi where yi is the target valuecorresponding to xi

3 For each support vector xi involved in the SVR function f x

(a) Unlearn xi from the SVR function f x using the decrementalalgorithm to obtain the SVR function fix which would beconstructed from the data set Di D D=fxig

(b) Calculate error ei corresponding to support vector xi as ei Dyi iexcl fixi where yi is the target value corresponding to xi

4 Knowing the error for each sample xi in D it is possible to constructa variety of overall measures a simple choice is the MSE

MSELOOCVD D1N

NX

ie2i (51)

where N is the number of samples in data set D Other choices oferror metric such as MAE can be obtained by altering equation 51appropriately

531 Experiment The algorithm parameters in this experiment are setthe same as those in the experiments in section 511 Two famous regressiondata sets the Auto-MPG and Boston Housing data sets are chosen from theUCI Machine-Learning Repository Some basic information on these datasets is listed in Table 3

Table 3 Information Regarding Experimental Regression Data Sets

Number of Attributes Number of Samples SV Ratio

Auto-MPG 7 392 4107Boston Housing 13 506 3636

Accurate On-line Support Vector Regression 2699

200 220 240 260 280 300 320 340 360 380

102

LOOCV Time Comparison of Autoshy MPG Dataset

Samples in Training Set

LO

OC

V T

ime

(sec

on

d)

LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

300 350 400 450 500

101

102

LOOCV Time Comparison of Boston Housing Dataset

Samples in Training Set

LO

OC

V T

ime

(sec

on

d)

Figure 6 Semilog plots of LOOCV time of Auto-MPG and Boston Housing dataset

The experimental results of both data sets are presented in Figure 6 Thex-axis is the size of the training set upon which the LOOCV is implementedThese plots show that AOSVR-based LOOCV is much faster than its Lib-SVMR counterpart

6 Conclusion

We have developed and implemented an accurate on-line support vectorregression (AOSVR) algorithm that permits efcient retraining when a newsample is added to or an existing sample is removed from the training setAOSVR is applied to on-line time-series prediction and to leave-one-outcross-validation and the experimental results demonstrate that the AOSVRalgorithm is more efcient than conventional batch SVR in these scenariosMoreover AOSVR appears less sensitive than batch SVR to the difculty ofthe underlying problem

After this manuscript was prepared we were made aware of a similar on-line SVR algorithm which was independently presented in Martin (2002)

2700 J Ma J Theiler and S Perkins

Appendix Pseudocode for Incrementing AOSVR with a New DataSample

Inputs

sup2 Training set T D fxi yi i D 1 lgsup2 Coefcients fmicroi i D 1 lg and bias b

sup2 Partition of samples into sets S E and R

sup2 Matrix R dened in equation 311

sup2 New sample xc yc

Outputs

sup2 Updated coefcients fmicroi i D 1 l C 1g and bias b

sup2 Updated Matrix R

sup2 Updated partition of samples into sets S E and R

AOSVR Incremental Algorithm

sup2 Initialize microc D 0

sup2 Compute f xc DX

i2E[S

microiQic C b

sup2 Compute hxc D f xc iexcl yc

sup2 I jhxcj middot then assign xc to R and terminate

sup2 Let q D signiexclhxc be the sign that 1microc will take

sup2 Do until the new sample xc meets the KKT condition

mdash Update macr deg according to equations 37 and 39mdash Start bookkeeping procedure

Check the new sample xcmdashLc1 D iexclhxc iexcl q=degc (Case 1)mdashLc2 D qC iexcl microc (Case 2)Check each sample xi in the set S (Case 3)mdashIf q i gt 0 and C gt microi cedil 0 LS

i D C iexcl microi=macri

mdashIf q i gt 0 and 0 gt microi cedil iexclC LSi D iexclmicroi=macri

mdashIf q i lt 0 and C cedil microi gt 0 LSi D iexclmicroi=macri

mdashIf q i lt 0 and 0 cedil microi gt iexclC LSi D iexclC iexcl microi=macri

Check each sample xi in the set E (Case 4)mdashLE

i D iexclhxi iexcl signqmacri=macri

Check each sample xi in the set R (Case 5)mdashLR

i D iexclhxi iexcl signqmacri=macri

Accurate On-line Support Vector Regression 2701

Set 1microc D qminjLc1j jLc2j jLSj jLEj jLRj whereLS D fLS

i i 2 Sg LE D fLEi i 2 Eg and LR D fLR

i i 2 RgLet Flag be the case number that determines 1micro

Let xl be the particular sample in T that determines 1microc

mdash End Bookkeeping Proceduremdash Update microc b and microi i 2 S according to equation 36mdash Update hxi i 2 E [ R according to equation 38mdash Switch Flag

(Flag=1)Add new sample xc to set S update matrix R accordingto equation 313(Flag=2)Add new sample xc to set E(Flag=3)If microl D 0 move xl to set R update R according to equa-tion 312If jmicrolj D C move xl to set E update R according toequation 312(Flag=4)Move xl to set S update R according to equation 313(Flag=5)Move xl to set S update R according to equation 313

mdash End Switch Flagmdash If Flagmiddot 2 terminate otherwise continue the Do-Loop

sup2 Terminate incremental algorithm ready for the next sample

Acknowledgments

We thank Chih-Jen Lin in National University of Taiwan for useful sug-gestions on some implementation issues We also thank the anonymousreviewers for pointing us to the work of Martin (2002) and for suggestingthat we compare AOSVR to the warm-start variants of batch algorithmsThis work is supported by the NASA project NRA-00-01-AISR-088 and bythe Los Alamos Laboratory Directed Research and Development (LDRD)program

References

Cauwenberghs G amp Poggio T (2001) Incremental and decremental supportvector machine learning In T K Leen T G Dietterich amp V Tresp (Eds)

2702 J Ma J Theiler and S Perkins

Advances in neural informationprocessingsystems 13 (pp409ndash123)CambridgeMA MIT Press

Chang C-C amp Lin C-J (2001) LIBSVM a library for support vector machinesSoftware Available on-line at httpwwwcsientuedutwraquocjlinlibsvm

Chang C-C amp Lin C-J (2002) Training ordm-support vector regression Theoryand algorithms Neural Computation 14 1959ndash1977

CsatoL amp Opper M (2001)Sparse representation for gaussian process modelsIn T K Leen T G Dietterich amp V Tresp (Eds) Advances in neural informationprocessing systems 13 (pp 444ndash450) Cambridge MA MIT Press

Fernandez R (1999) Predicting time series with a local support vector regres-sion machine InAdvancedCourse onArticial Intelligence (ACAI rsquo99) Availableon-line at httpwwwiitdemokritosgrskeleetnacai99

Fliege J amp Heseler A (2002) Constructing approximations to the ef-cient set of convex quadratic multiobjective problems Dortmund GermanyFachbereich Mathematik Universitat Dortmund Available on-line athttpwwwoptimization-onlineorgDB HTML200201426html

Friedel RH Reeves G D amp Obara T (2002) Relativistic electron dynamics inthe inner magnetospheremdasha review Journal of Atmosphericand Solar-TerrestrialPhysics 64 265ndash282

Gentile C (2001)A new approximate maximal margin classication algorithmJournal of Machine Learning Research 2 213ndash242

Gondzio J (1998)Warm start of the primal-dual method applied in the cuttingplane scheme Mathematical Programming 83 125ndash143

Gondzio J amp Grothey A (2001) Reoptimization with the primal-dual interiorpoint method SIAM Journal on Optimization 13 842ndash864

Graepel T Herbrich R amp Williamson R C (2001) From margin to sparsityIn T K Leen T G Dietterich amp V Tresp (Eds) Advances in neural informationprocessing systems 13 (pp 210ndash216) Cambridge MA MIT Press

Gunn S (1998) Matlab SVM toolbox Software Available on-line at httpwwwisisecssotonacukresourcessvminfo

Herbster M (2001) Learning additive models online with fast evaluating ker-nels In D P Helmbold amp B Williamson (Eds) Proceedings of the 14th An-nual Conference on Computational Learning Theory (pp 444ndash460) New YorkSpringer-Verlag

Joachims T (2000) Estimating the generalization performance of an SVM ef-ciently In P Langley (Ed) Proceedings of the Seventeenth International Confer-ence on Machine Learning (pp 431ndash438) San Mateo Morgan Kaufmann

Kivinen J Smola A Jamp Williamson R C (2002)Online learning with kernelsIn T G Dietterich S Becker amp Z Ghahramani (Eds) Advances in neuralinformation processing systems 14 (pp 785ndash792) Cambridge MA MIT Press

Lee J-H amp Lin C-J (2001)Automatic model selection for support vector machines(Tech Rep) Taipei Taiwan Department of Computer Science and Informa-tion Engineering National Taiwan University

Li Y amp Long P M (1999) The relaxed online maximum margin algorithm InS A Solla T K Leen amp K-R Muller (Eds) Advances in neural informationprocessing systems 12 (pp 498ndash504) Cambridge MA MIT Press

Accurate On-line Support Vector Regression 2703

Mackey M C amp Glass L (1977)Oscillation and chaos in physiological controlsystems Science 197 287ndash289

Martin M (2002)On-line supportvectormachines for functionapproximation(TechRep LSI-02-11-R)Catalunya Spain Software Department Universitat Po-litecnica de Catalunya

Muller K-R Smola A J Ratsch G Scholkopf B Kohlmorgen J amp VapnikV (1997)Predicting time series with support vector machines In W Gerstner(Ed) Articial Neural NetworksmdashICANN rsquo97 (pp 999ndash1004) Berlin Springer-Verlag

Ralaivola L amp drsquoAlche-Buc F (2001) Incremental support vector machinelearning A local approach In G Dorffner H Bischof amp K Hornik (Eds) Ar-ticial Neural NetworksmdashICANN 2001 (pp 322ndash330) Berlin Springer-Verlag

Shevade S K Keerthi S S Bhattacharyya C amp Murthy K R K (1999)Improvements to SMO algorithm for SVM regression (Tech Rep No CD-99-16)Singapore National University of Singapore

Smola A (1998) Interior point optimizer for SVM pattern recognition SoftwareAvailable on-line at httpwwwkernel-machinesorgcodeprloqotargz

Smola A J amp Scholkopf B (1998) A tutorial on support vector regression (Neu-roCOLT Tech Rep No NC-TR-98-030) London Royal Holloway CollegeUniversity of London

Syed N A Liu H amp Sung K K (1999) Incremental learning with supportvector machines In Proceedings of the Workshop on Support Vector Machines atthe International Joint Conference on Articial IntelligencemdashIJCAI-99 San MateoMorgan Kaufmann

Tashman L J (2000) Out-of-sample tests of forecasting accuracy An analysisand review International Journal of Forecasting 16 437ndash450

Tay F E H amp Cao L (2001)Application of support vectormachines in nancialtime series forcasting Omega 29 309ndash317

Vanderbei R J (1999) LOQO An interior point code for quadratic program-ming Optimization Methods and Software 11 451ndash484

Vapnik V (1998) Statistical learning theory New York WileyVapnik V amp Chapelle O (1999)Bounds on error expectation for support vector

machine In A Smola P Bartlett B Scholkopf amp D Schuurmans (Eds)Advances in large margin classiers (pp 261ndash280) Cambridge MA MIT Press

Weigend A S amp Gershenfeld N A (1994)Time-series predictionForecasting thefuture and understanding the past Reading MA Addison-Wesley

Yildirim E A amp Wright S J (2002) Warm-start strategies in interior-pointmethods for linear programming SIAM Journal on Optimization 12 782ndash810

Received September 12 2002 accepted May 1 2003

Page 8: Accurate On-line Support Vector Regression · 2006. 8. 28. · Accurate On-line Support Vector Regression 2689 In the special case whenSset is empty, according to equations 3.1 and

2690 J Ma J Theiler and S Perkins

33 Efciently Updating the R Matrix The matrix R that is used inequation 37

R D

2

6664

0 1 cent cent cent 11 Qs1s1 cent cent cent Qs1sls

1 Qsls s1 cent cent cent Qsls sls

3

7775

iexcl1

(311)

must be updated whenever the S set is changed Following Cauwenberghsand Poggio (2001) we can efciently update R without explicitlycomputingthe matrix inverse When the kth sample xsk in the S set is removed from theS set the new R can be obtained as follows

Rnew D RII iexclRIkRkI

Rkk where

I D [1 cent cent cent k k C 2 cent cent cent Sls C 1] (312)

When a new sample is added to S set the new R can be updated as follows

Rnew D

2

6664

0

R

00 cent cent cent 0 0

3

7775 C1degi

micromacr1

para poundmacrT 1

curren (313)

where macr is dened as

macr D iexclR

2

6664

1Qs1 i

Qsls i

3

7775

and degi is dened as

degi D Qii C

2

6664

1Qs1i

Qsls i

3

7775 macr

when the sample xi was moved from E set to R set In contrast when thesample xc is the sample added to S set macr is can be obtained according toequation 37 and degi is the last element of deg dened in equation 39

34 Initialization of the Incremental Algorithm An initial SVR solu-tion can be obtained from a batch SVR solution and in most cases that is themost efcient approach But it is sometimes convenient to use AOSVR to

Accurate On-line Support Vector Regression 2691

produce a full solution from scratch An efcient starting point is the two-sample solution Given a training set T D fx1 y1 x2 y2g with y1 cedil y2the solution of equation 23 is

micro1 D maxsup3

0 minsup3

Cy1 iexcl y2 iexcl 2

2K11 iexcl K12

acuteacute

micro2 D iexclmicro1

b D y1 D y2=2 (314)

The sets E S and R are initialized from these two points based on equa-tions 210 through 212 If the set S is nonempty the matrix R can be ini-tialized from equation 311 As long as S is empty the matrix R will not beused

4 Decremental Algorithm

The decremental (or ldquounlearningrdquo) algorithm is employed when an existingsample is removed from the training set If a sample xc is in the R set then itdoes not contribute to the SVR solution and removing it from the trainingset is trivial no adjustments are needed If on the other hand xc has anonzero coefcient then the idea is to gradually reduce the value of thecoefcient to zero while ensuring all the other samples in the training setcontinue to satisfy the KKT conditions

The decremental algorithm follows the incremental algorithm with a fewsmall adjustments

sup2 The direction of the change of microc is

sign1microc D sign f xc iexcl yc D signhxc (41)

sup2 There is no case 1 because the removed xc need not satisfy KKT con-ditions

sup2 The condition in case 2 becomes microc changing from jmicrocj gt 0 to microc D 0

5 Applications and Comparison with Batch Algorithms

The accurate on-line SVR (AOSVR) learning algorithm produces exactlythe same SVR as the conventional batch SVR learning algorithm and canbe applied in all scenarios where batch SVR is currently employed But foron-line time-series prediction and leave-one-out cross-validation (LOOCV)the AOSVR algorithm is particularly well suited In this section we demon-strate AOSVR for both of these applications and compare its performance toexisting batch SVR algorithms These comparisons are based on direct tim-ing of runs using Matlab implementations such timings should be treatedwith some caution as they can be sensitive to details of implementation

2692 J Ma J Theiler and S Perkins

51 AOSVR versus Batch SVR Algorithms with Warm Start Mostbatch algorithms for SVR are implemented as ldquocold startrdquo This is appro-priate when a t is desired to a batch of data that has not been seen beforeHowever in recent years there has been a growing interest in ldquowarm-startrdquoalgorithms that can save time by starting from an appropriate solution andquite a few articles have addressed this issue in the generic context of nu-meric programming (Gondzio 1998 Gondzio amp Grothey 2001 Yildirim ampWright 2002 Fliege amp Heseler 2002) The warm-start algorithms are usefulfor incremental (or decremental) learning because the solution with N iexcl 1(or N C 1) data points provides a natural starting point for nding the so-lution with N data points In this sense AOSVR is a kind of warm-startalgorithm for the QP problem equation 23 which is specially designedfor the incremental or decremental scenario This specialty allows AOSVRto achieve more efciency when handling SVR incremental or decrementallearning as demonstrated in our subsequent experiments

In the machine learning community three algorithms for batch SVRtrain-ing are widely recognized Gunn (1998) solved SVR training as a genericQP optimization we call this implementation QPSVMR Shevade KeerthiBhattacharyya and Murthy (1999) proposed an algorithm specially de-signed for SVR training and it is an improved version of the sequential min-imal optimization for SVM regression (SMOR) and Chang and Lin (2001)proposed another algorithm specially designed for SVR training which wecall LibSVMR since it is implemented as part of the LibSVM software pack-age We implemented all these algorithms so that they can run in both acold-start and a warm-start mode SMOR and LibSVMR are implementedin Matlab and both algorithms allow a straightforward warm-start realiza-tion Because QPSVMR is based on a generic QP algorithm it is much lessefcient than SMOR or LibSVMR To make our subsequent experimentsfeasible we had to implement the QPSVMR core in C (Smola 1998) Smolaessentially employs the interior point QP code of LOQO (Vanderbei 1999)The warm start of QPSVMR directly adopts the warm-start method embed-ded in Smolarsquos (1998) implementation

52 On-line Time-Series Prediction In recent years the use of SVR fortime-series prediction has attracted increased attention (Muller et al 1997Fernandez 1999 Tay amp Cao 2001) In an on-line scenario one updates amodel from incoming data and at the same time makes predictions basedon that model This arises for instance in market forecasting scenariosAnother potential application is the (near) real-time prediction of electrondensity around a satellite in the magnetosphere high charge densities candamage satellite equipment (Friedel Reeves and Obara 2002) and if timesof high charge can be predicted the most sensitive components can beturned off before they are damaged

In time-series prediction the prediction origin denoted O is the timefrom which the prediction is generated The time between the prediction

Accurate On-line Support Vector Regression 2693

origin and the predicted data point is the prediction horizon which forsimplicity we take as one time step

A typical on-line time-series prediction scenario can be represented asfollows (Tashman 2000)

1 Given a time series fxt t D 1 2 3 g and prediction origin Oconstruct a set of training samples AOB from the segment of timeseries fxt t D 1 Og as AOB D fXt yt t D B 0 iexcl 1gwhere Xt D [xt xt iexcl B C 1]T yt D xt C 1 and B is theembedding dimension of the training set AOB

2 Train a predictor PAOBI X from the training set AOB

3 Predict xO C 1 using OxO C 1 D PAOBI XO

4 When x0 C 1 becomes available update the prediction origin O DO C 1 Then go to step 1 and repeat the procedure

Note that the training set AOB keeps growing as O increases so thetraining of the predictor in step 2 becomes increasingly expensive There-fore many SVR-based time-series predictionsare implemented in a compro-mised way (Tay amp Cao 2001) After the predictor is obtained it stays xedand is not updated as new data arrive In contrast an on-line predictionalgorithm can take advantage of the fact that the training set is augmentedone sample at a time and continues to update and improve the model asmore data arrive

521 Experiments Two experiments were performed to compare theAOSVR algorithm with the batch SVR algorithm We were careful to usethe same algorithm parameters for on-line and batch SVR but since ourpurpose is to compare computational performance we did not attempt tooptimize these parameters for each data set In these experiments the kernelfunction is a gaussian radial basis function expiexcldeg kXiiexclXjk2 where deg D 1the regularization coefcient C and the insensitivity parameter in equation21 are set to 10 and 01 respectively the embedding dimension B of thetraining AOB is 5 Also we scale all the time-series to [iexcl11]

Three widely used benchmark time series are employed in both exper-iments (1) the Santa Fe Institute Competition time series A (Weigend ampGershenfeld1994) (2) the Mackey-Glass equation with iquest D 17 (Mackeyamp Glass 1977) and (3) the yearly average sunspot numbers recorded from1700 to 1995 Some basic information about these time series is listed in Ta-ble 1 The SV ratio is the number of support vectors divided by the numberof training samples This is based on a prediction of the last data point usingall previous data for training In general a higher SV ratio suggests that theunderlying problem is harder (Vapnik 1998)

The rst experiment demonstrates that using a xed predictor producesless accurate predictions than using a predictor that is updated as new data

2694 J Ma J Theiler and S Perkins

Table 1 Information Regarding Experimental Time Series

Number of Data Points SV Ratio

Santa Fe Institute 1000 452Mackey-Glass 1500 154Yearly Sunspot 292 4181

become available Two measurements are used to quantify the predictionperformance mean squared error (MSE) and mean absolute error (MAE)The predictors are initially trained on the rst half of the data in the timeseries In the xed case the same predictor is used to predict the second halfof the time series In the on-line case the predictor is updated whenever anew data point is available The performance measurements for both casesare calculated from the predicted and actual values of the second half of thedata in the time series As shown in Table 2 the on-line predictor outper-forms the xed predictor in every case We also note that the errors for thethree time series in Table 2 coincide with the estimated prediction difcultyin Table 1 based on the SV ratio

The second experiment compares AOSVR with batch implementationsusing both cold start and warm start in the on-line prediction scenarioFor each benchmark time series an initial SVR predictor is trained on therst two data points using the batch SVR algorithms For AOSVR we usedequation 314 Afterward both AOSVR and batch SVR algorithms are em-ployed in the on-line prediction mode for the remaining data points in thetime series AOSVR and the batch SVR algorithms produce exactly the sameprediction errors in this experiment so the comparison is only of predic-tion speed All six batch SVR algorithms are compared with AOSVR on thesunspot time series and the experimental results are plotted in Figure 2 Thex-axis of this plot is the number of data points to which the on-line predic-tion model is applied Note that the core of QPSVMR is implemented in CBecause the cold start and warm start of LibSVMR clearly outperform thoseof both SMOR and QPSVMR only the comparison between LibSVMR andAOSVR is carried out in our subsequent experiments The experimental re-

Table 2 Performance Comparison for On-line and Fixed Predictors

On-line Fixed

Santa Fe Institute MSE 00072 00097MAE 00588 00665

Mackey-Glass MSE 00034 00036MAE 00506 00522

Yearly Sunspot MSE 00263 00369MAE 01204 01365

Accurate On-line Support Vector Regression 2695

50 100 150 200 250

10shy 1

100

101

102

103

104

Number of Points

Tim

e (i

n s

ec)

SMOR (Cold Start)SMOR (Warm Start)QPSVMR (Cold Start)QPSVMR (Warm Start)LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

Figure 2 Real-time prediction time of yearly sunspot time series

sults of both Santa Fe Institute and Mackey-Glass time series are presentedin Figures 3 and 4 respectively

These experimental results demonstrate that AOSVR algorithm is gen-erally much faster than the batch SVR algorithms when applied to on-lineprediction Comparison of Figures 2 and 4 furthermore suggests that morespeed improvement is achieved on the sunspot data than on the Mackey-Glass We speculate that this is because the sunspot problem is ldquoharderrdquothan the Mackey-Glass (it has a higher support vector ratio) and that theperformance of the AOSVR algorithm is less sensitive to problem difculty

To test this hypothesis we compared the performance of AOSVR to Lib-SVMR on a single data set (the sunspots) whose difculty was adjusted bychanging the value of A smaller leads to a higher support vector ratioand a more difcult problem Both the AOSVR and LibSVMR algorithmswere employed for on line prediction of the full time series The overallprediction times are plotted against in Figure 5 Where AOSVR perfor-mance varied by a factor of less than 10 over the range of the LibSVMRperformance varied by a factor of about 100

522 Limited-Memory Version of the On-line Time-Series Prediction ScenarioOne problem with on-line time-series prediction in general is that the longerthe prediction goes on the bigger the training set AOB will become and the

2696 J Ma J Theiler and S Perkins

100 200 300 400 500 600 700 800 900

10shy 1

100

101

102

103

104

Number of Points

Tim

e (i

n s

ec)

LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

Figure 3 Real-time prediction time of Santa Fe Institute time series

200 400 600 800 1000 1200 1400

10shy 1

100

101

102

103

104

Number of Points

Tim

e (i

n s

ec)

LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

Figure 4 Real-time prediction time of Mackey-Glass time series

Accurate On-line Support Vector Regression 2697

005 01 015 02 025 03 035 04 045 05

102

Ove

rall

Pre

dic

tio

n T

ime LibSVMR (Cold Start)

LibSVMR (Warm Start)AOSVR

005 01 015 02 025 03 035 04 045 05

200

400

600

800

1000

e

Ove

rall

Pre

dic

tio

n T

ime LibSVMR (Cold Start)

LibSVMR (Warm Start)AOSVR

Figure 5 Semilog and linear plots of prediction time of yearly sunspot timeseries

more SVs will be involved in SVR predictor A complicated SVR predictorimposes both memory and computation stress on the prediction systemOne way to deal with this problem is to impose a ldquoforgettingrdquo time WWhen training set AOB grows to this maximum W then the decrementalalgorithm is used to remove the oldest sample before the next new sampleis added to the training set

We note that this variant of the on-line prediction scenario is also poten-tially suitable for nonstationary time series as it can be updated in real timeto t the most recent behavior of the time series More rigorous investigationin this direction will be a future effort

53 Leave-One-Out Cross-Validation Cross-validation is a useful toolfor assessing the generalization ability of a machine learning algorithm Theidea is to train on one subset of the data and then to test the accuracy ofthe predictor on a separate disjoint subset In leave-one-out cross-validation(LOOCV) only a single sample is used for testing and all the rest are usedfor training Generally this is repeated for every sample in the data setWhen the batch SVR is employed LOOCV can be very expensive sincea fullretraining is done for each sample One compromise approach is to estimateLOOCV with related but less expensive approximations such as the xi-

2698 J Ma J Theiler and S Perkins

alpha bound (Joachims 2000) and approximate span bound (Vapnik ampChapelle 1999) Although Lee and Lin (2001) proposed a numerical solutionto reduce the computation for directly implementing LOOCV the amount ofcomputation required is still considerable Also the accuracy of the LOOCVresult obtained using this method can be potentially compromised becausea different parameter set is employed in LOOCV and in the nal training

The decremental algorithm of AOSVR provides an efcient implemen-tation of LOOCV for SVR

1 Given a data set D construct the SVR function f x from the wholedata set D using batch SVR learning algorithm

2 For each nonsupport vector xi in the data set D calculate error eicorresponding to xi as ei D yi iexcl f xi where yi is the target valuecorresponding to xi

3 For each support vector xi involved in the SVR function f x

(a) Unlearn xi from the SVR function f x using the decrementalalgorithm to obtain the SVR function fix which would beconstructed from the data set Di D D=fxig

(b) Calculate error ei corresponding to support vector xi as ei Dyi iexcl fixi where yi is the target value corresponding to xi

4 Knowing the error for each sample xi in D it is possible to constructa variety of overall measures a simple choice is the MSE

MSELOOCVD D1N

NX

ie2i (51)

where N is the number of samples in data set D Other choices oferror metric such as MAE can be obtained by altering equation 51appropriately

531 Experiment The algorithm parameters in this experiment are setthe same as those in the experiments in section 511 Two famous regressiondata sets the Auto-MPG and Boston Housing data sets are chosen from theUCI Machine-Learning Repository Some basic information on these datasets is listed in Table 3

Table 3 Information Regarding Experimental Regression Data Sets

Number of Attributes Number of Samples SV Ratio

Auto-MPG 7 392 4107Boston Housing 13 506 3636

Accurate On-line Support Vector Regression 2699

200 220 240 260 280 300 320 340 360 380

102

LOOCV Time Comparison of Autoshy MPG Dataset

Samples in Training Set

LO

OC

V T

ime

(sec

on

d)

LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

300 350 400 450 500

101

102

LOOCV Time Comparison of Boston Housing Dataset

Samples in Training Set

LO

OC

V T

ime

(sec

on

d)

Figure 6 Semilog plots of LOOCV time of Auto-MPG and Boston Housing dataset

The experimental results of both data sets are presented in Figure 6 Thex-axis is the size of the training set upon which the LOOCV is implementedThese plots show that AOSVR-based LOOCV is much faster than its Lib-SVMR counterpart

6 Conclusion

We have developed and implemented an accurate on-line support vectorregression (AOSVR) algorithm that permits efcient retraining when a newsample is added to or an existing sample is removed from the training setAOSVR is applied to on-line time-series prediction and to leave-one-outcross-validation and the experimental results demonstrate that the AOSVRalgorithm is more efcient than conventional batch SVR in these scenariosMoreover AOSVR appears less sensitive than batch SVR to the difculty ofthe underlying problem

After this manuscript was prepared we were made aware of a similar on-line SVR algorithm which was independently presented in Martin (2002)

2700 J Ma J Theiler and S Perkins

Appendix Pseudocode for Incrementing AOSVR with a New DataSample

Inputs

sup2 Training set T D fxi yi i D 1 lgsup2 Coefcients fmicroi i D 1 lg and bias b

sup2 Partition of samples into sets S E and R

sup2 Matrix R dened in equation 311

sup2 New sample xc yc

Outputs

sup2 Updated coefcients fmicroi i D 1 l C 1g and bias b

sup2 Updated Matrix R

sup2 Updated partition of samples into sets S E and R

AOSVR Incremental Algorithm

sup2 Initialize microc D 0

sup2 Compute f xc DX

i2E[S

microiQic C b

sup2 Compute hxc D f xc iexcl yc

sup2 I jhxcj middot then assign xc to R and terminate

sup2 Let q D signiexclhxc be the sign that 1microc will take

sup2 Do until the new sample xc meets the KKT condition

mdash Update macr deg according to equations 37 and 39mdash Start bookkeeping procedure

Check the new sample xcmdashLc1 D iexclhxc iexcl q=degc (Case 1)mdashLc2 D qC iexcl microc (Case 2)Check each sample xi in the set S (Case 3)mdashIf q i gt 0 and C gt microi cedil 0 LS

i D C iexcl microi=macri

mdashIf q i gt 0 and 0 gt microi cedil iexclC LSi D iexclmicroi=macri

mdashIf q i lt 0 and C cedil microi gt 0 LSi D iexclmicroi=macri

mdashIf q i lt 0 and 0 cedil microi gt iexclC LSi D iexclC iexcl microi=macri

Check each sample xi in the set E (Case 4)mdashLE

i D iexclhxi iexcl signqmacri=macri

Check each sample xi in the set R (Case 5)mdashLR

i D iexclhxi iexcl signqmacri=macri

Accurate On-line Support Vector Regression 2701

Set 1microc D qminjLc1j jLc2j jLSj jLEj jLRj whereLS D fLS

i i 2 Sg LE D fLEi i 2 Eg and LR D fLR

i i 2 RgLet Flag be the case number that determines 1micro

Let xl be the particular sample in T that determines 1microc

mdash End Bookkeeping Proceduremdash Update microc b and microi i 2 S according to equation 36mdash Update hxi i 2 E [ R according to equation 38mdash Switch Flag

(Flag=1)Add new sample xc to set S update matrix R accordingto equation 313(Flag=2)Add new sample xc to set E(Flag=3)If microl D 0 move xl to set R update R according to equa-tion 312If jmicrolj D C move xl to set E update R according toequation 312(Flag=4)Move xl to set S update R according to equation 313(Flag=5)Move xl to set S update R according to equation 313

mdash End Switch Flagmdash If Flagmiddot 2 terminate otherwise continue the Do-Loop

sup2 Terminate incremental algorithm ready for the next sample

Acknowledgments

We thank Chih-Jen Lin in National University of Taiwan for useful sug-gestions on some implementation issues We also thank the anonymousreviewers for pointing us to the work of Martin (2002) and for suggestingthat we compare AOSVR to the warm-start variants of batch algorithmsThis work is supported by the NASA project NRA-00-01-AISR-088 and bythe Los Alamos Laboratory Directed Research and Development (LDRD)program

References

Cauwenberghs G amp Poggio T (2001) Incremental and decremental supportvector machine learning In T K Leen T G Dietterich amp V Tresp (Eds)

2702 J Ma J Theiler and S Perkins

Advances in neural informationprocessingsystems 13 (pp409ndash123)CambridgeMA MIT Press

Chang C-C amp Lin C-J (2001) LIBSVM a library for support vector machinesSoftware Available on-line at httpwwwcsientuedutwraquocjlinlibsvm

Chang C-C amp Lin C-J (2002) Training ordm-support vector regression Theoryand algorithms Neural Computation 14 1959ndash1977

CsatoL amp Opper M (2001)Sparse representation for gaussian process modelsIn T K Leen T G Dietterich amp V Tresp (Eds) Advances in neural informationprocessing systems 13 (pp 444ndash450) Cambridge MA MIT Press

Fernandez R (1999) Predicting time series with a local support vector regres-sion machine InAdvancedCourse onArticial Intelligence (ACAI rsquo99) Availableon-line at httpwwwiitdemokritosgrskeleetnacai99

Fliege J amp Heseler A (2002) Constructing approximations to the ef-cient set of convex quadratic multiobjective problems Dortmund GermanyFachbereich Mathematik Universitat Dortmund Available on-line athttpwwwoptimization-onlineorgDB HTML200201426html

Friedel RH Reeves G D amp Obara T (2002) Relativistic electron dynamics inthe inner magnetospheremdasha review Journal of Atmosphericand Solar-TerrestrialPhysics 64 265ndash282

Gentile C (2001)A new approximate maximal margin classication algorithmJournal of Machine Learning Research 2 213ndash242

Gondzio J (1998)Warm start of the primal-dual method applied in the cuttingplane scheme Mathematical Programming 83 125ndash143

Gondzio J amp Grothey A (2001) Reoptimization with the primal-dual interiorpoint method SIAM Journal on Optimization 13 842ndash864

Graepel T Herbrich R amp Williamson R C (2001) From margin to sparsityIn T K Leen T G Dietterich amp V Tresp (Eds) Advances in neural informationprocessing systems 13 (pp 210ndash216) Cambridge MA MIT Press

Gunn S (1998) Matlab SVM toolbox Software Available on-line at httpwwwisisecssotonacukresourcessvminfo

Herbster M (2001) Learning additive models online with fast evaluating ker-nels In D P Helmbold amp B Williamson (Eds) Proceedings of the 14th An-nual Conference on Computational Learning Theory (pp 444ndash460) New YorkSpringer-Verlag

Joachims T (2000) Estimating the generalization performance of an SVM ef-ciently In P Langley (Ed) Proceedings of the Seventeenth International Confer-ence on Machine Learning (pp 431ndash438) San Mateo Morgan Kaufmann

Kivinen J Smola A Jamp Williamson R C (2002)Online learning with kernelsIn T G Dietterich S Becker amp Z Ghahramani (Eds) Advances in neuralinformation processing systems 14 (pp 785ndash792) Cambridge MA MIT Press

Lee J-H amp Lin C-J (2001)Automatic model selection for support vector machines(Tech Rep) Taipei Taiwan Department of Computer Science and Informa-tion Engineering National Taiwan University

Li Y amp Long P M (1999) The relaxed online maximum margin algorithm InS A Solla T K Leen amp K-R Muller (Eds) Advances in neural informationprocessing systems 12 (pp 498ndash504) Cambridge MA MIT Press

Accurate On-line Support Vector Regression 2703

Mackey M C amp Glass L (1977)Oscillation and chaos in physiological controlsystems Science 197 287ndash289

Martin M (2002)On-line supportvectormachines for functionapproximation(TechRep LSI-02-11-R)Catalunya Spain Software Department Universitat Po-litecnica de Catalunya

Muller K-R Smola A J Ratsch G Scholkopf B Kohlmorgen J amp VapnikV (1997)Predicting time series with support vector machines In W Gerstner(Ed) Articial Neural NetworksmdashICANN rsquo97 (pp 999ndash1004) Berlin Springer-Verlag

Ralaivola L amp drsquoAlche-Buc F (2001) Incremental support vector machinelearning A local approach In G Dorffner H Bischof amp K Hornik (Eds) Ar-ticial Neural NetworksmdashICANN 2001 (pp 322ndash330) Berlin Springer-Verlag

Shevade S K Keerthi S S Bhattacharyya C amp Murthy K R K (1999)Improvements to SMO algorithm for SVM regression (Tech Rep No CD-99-16)Singapore National University of Singapore

Smola A (1998) Interior point optimizer for SVM pattern recognition SoftwareAvailable on-line at httpwwwkernel-machinesorgcodeprloqotargz

Smola A J amp Scholkopf B (1998) A tutorial on support vector regression (Neu-roCOLT Tech Rep No NC-TR-98-030) London Royal Holloway CollegeUniversity of London

Syed N A Liu H amp Sung K K (1999) Incremental learning with supportvector machines In Proceedings of the Workshop on Support Vector Machines atthe International Joint Conference on Articial IntelligencemdashIJCAI-99 San MateoMorgan Kaufmann

Tashman L J (2000) Out-of-sample tests of forecasting accuracy An analysisand review International Journal of Forecasting 16 437ndash450

Tay F E H amp Cao L (2001)Application of support vectormachines in nancialtime series forcasting Omega 29 309ndash317

Vanderbei R J (1999) LOQO An interior point code for quadratic program-ming Optimization Methods and Software 11 451ndash484

Vapnik V (1998) Statistical learning theory New York WileyVapnik V amp Chapelle O (1999)Bounds on error expectation for support vector

machine In A Smola P Bartlett B Scholkopf amp D Schuurmans (Eds)Advances in large margin classiers (pp 261ndash280) Cambridge MA MIT Press

Weigend A S amp Gershenfeld N A (1994)Time-series predictionForecasting thefuture and understanding the past Reading MA Addison-Wesley

Yildirim E A amp Wright S J (2002) Warm-start strategies in interior-pointmethods for linear programming SIAM Journal on Optimization 12 782ndash810

Received September 12 2002 accepted May 1 2003

Page 9: Accurate On-line Support Vector Regression · 2006. 8. 28. · Accurate On-line Support Vector Regression 2689 In the special case whenSset is empty, according to equations 3.1 and

Accurate On-line Support Vector Regression 2691

produce a full solution from scratch An efcient starting point is the two-sample solution Given a training set T D fx1 y1 x2 y2g with y1 cedil y2the solution of equation 23 is

micro1 D maxsup3

0 minsup3

Cy1 iexcl y2 iexcl 2

2K11 iexcl K12

acuteacute

micro2 D iexclmicro1

b D y1 D y2=2 (314)

The sets E S and R are initialized from these two points based on equa-tions 210 through 212 If the set S is nonempty the matrix R can be ini-tialized from equation 311 As long as S is empty the matrix R will not beused

4 Decremental Algorithm

The decremental (or ldquounlearningrdquo) algorithm is employed when an existingsample is removed from the training set If a sample xc is in the R set then itdoes not contribute to the SVR solution and removing it from the trainingset is trivial no adjustments are needed If on the other hand xc has anonzero coefcient then the idea is to gradually reduce the value of thecoefcient to zero while ensuring all the other samples in the training setcontinue to satisfy the KKT conditions

The decremental algorithm follows the incremental algorithm with a fewsmall adjustments

sup2 The direction of the change of microc is

sign1microc D sign f xc iexcl yc D signhxc (41)

sup2 There is no case 1 because the removed xc need not satisfy KKT con-ditions

sup2 The condition in case 2 becomes microc changing from jmicrocj gt 0 to microc D 0

5 Applications and Comparison with Batch Algorithms

The accurate on-line SVR (AOSVR) learning algorithm produces exactlythe same SVR as the conventional batch SVR learning algorithm and canbe applied in all scenarios where batch SVR is currently employed But foron-line time-series prediction and leave-one-out cross-validation (LOOCV)the AOSVR algorithm is particularly well suited In this section we demon-strate AOSVR for both of these applications and compare its performance toexisting batch SVR algorithms These comparisons are based on direct tim-ing of runs using Matlab implementations such timings should be treatedwith some caution as they can be sensitive to details of implementation

2692 J Ma J Theiler and S Perkins

51 AOSVR versus Batch SVR Algorithms with Warm Start Mostbatch algorithms for SVR are implemented as ldquocold startrdquo This is appro-priate when a t is desired to a batch of data that has not been seen beforeHowever in recent years there has been a growing interest in ldquowarm-startrdquoalgorithms that can save time by starting from an appropriate solution andquite a few articles have addressed this issue in the generic context of nu-meric programming (Gondzio 1998 Gondzio amp Grothey 2001 Yildirim ampWright 2002 Fliege amp Heseler 2002) The warm-start algorithms are usefulfor incremental (or decremental) learning because the solution with N iexcl 1(or N C 1) data points provides a natural starting point for nding the so-lution with N data points In this sense AOSVR is a kind of warm-startalgorithm for the QP problem equation 23 which is specially designedfor the incremental or decremental scenario This specialty allows AOSVRto achieve more efciency when handling SVR incremental or decrementallearning as demonstrated in our subsequent experiments

In the machine learning community three algorithms for batch SVRtrain-ing are widely recognized Gunn (1998) solved SVR training as a genericQP optimization we call this implementation QPSVMR Shevade KeerthiBhattacharyya and Murthy (1999) proposed an algorithm specially de-signed for SVR training and it is an improved version of the sequential min-imal optimization for SVM regression (SMOR) and Chang and Lin (2001)proposed another algorithm specially designed for SVR training which wecall LibSVMR since it is implemented as part of the LibSVM software pack-age We implemented all these algorithms so that they can run in both acold-start and a warm-start mode SMOR and LibSVMR are implementedin Matlab and both algorithms allow a straightforward warm-start realiza-tion Because QPSVMR is based on a generic QP algorithm it is much lessefcient than SMOR or LibSVMR To make our subsequent experimentsfeasible we had to implement the QPSVMR core in C (Smola 1998) Smolaessentially employs the interior point QP code of LOQO (Vanderbei 1999)The warm start of QPSVMR directly adopts the warm-start method embed-ded in Smolarsquos (1998) implementation

52 On-line Time-Series Prediction In recent years the use of SVR fortime-series prediction has attracted increased attention (Muller et al 1997Fernandez 1999 Tay amp Cao 2001) In an on-line scenario one updates amodel from incoming data and at the same time makes predictions basedon that model This arises for instance in market forecasting scenariosAnother potential application is the (near) real-time prediction of electrondensity around a satellite in the magnetosphere high charge densities candamage satellite equipment (Friedel Reeves and Obara 2002) and if timesof high charge can be predicted the most sensitive components can beturned off before they are damaged

In time-series prediction the prediction origin denoted O is the timefrom which the prediction is generated The time between the prediction

Accurate On-line Support Vector Regression 2693

origin and the predicted data point is the prediction horizon which forsimplicity we take as one time step

A typical on-line time-series prediction scenario can be represented asfollows (Tashman 2000)

1 Given a time series fxt t D 1 2 3 g and prediction origin Oconstruct a set of training samples AOB from the segment of timeseries fxt t D 1 Og as AOB D fXt yt t D B 0 iexcl 1gwhere Xt D [xt xt iexcl B C 1]T yt D xt C 1 and B is theembedding dimension of the training set AOB

2 Train a predictor PAOBI X from the training set AOB

3 Predict xO C 1 using OxO C 1 D PAOBI XO

4 When x0 C 1 becomes available update the prediction origin O DO C 1 Then go to step 1 and repeat the procedure

Note that the training set AOB keeps growing as O increases so thetraining of the predictor in step 2 becomes increasingly expensive There-fore many SVR-based time-series predictionsare implemented in a compro-mised way (Tay amp Cao 2001) After the predictor is obtained it stays xedand is not updated as new data arrive In contrast an on-line predictionalgorithm can take advantage of the fact that the training set is augmentedone sample at a time and continues to update and improve the model asmore data arrive

521 Experiments Two experiments were performed to compare theAOSVR algorithm with the batch SVR algorithm We were careful to usethe same algorithm parameters for on-line and batch SVR but since ourpurpose is to compare computational performance we did not attempt tooptimize these parameters for each data set In these experiments the kernelfunction is a gaussian radial basis function expiexcldeg kXiiexclXjk2 where deg D 1the regularization coefcient C and the insensitivity parameter in equation21 are set to 10 and 01 respectively the embedding dimension B of thetraining AOB is 5 Also we scale all the time-series to [iexcl11]

Three widely used benchmark time series are employed in both exper-iments (1) the Santa Fe Institute Competition time series A (Weigend ampGershenfeld1994) (2) the Mackey-Glass equation with iquest D 17 (Mackeyamp Glass 1977) and (3) the yearly average sunspot numbers recorded from1700 to 1995 Some basic information about these time series is listed in Ta-ble 1 The SV ratio is the number of support vectors divided by the numberof training samples This is based on a prediction of the last data point usingall previous data for training In general a higher SV ratio suggests that theunderlying problem is harder (Vapnik 1998)

The rst experiment demonstrates that using a xed predictor producesless accurate predictions than using a predictor that is updated as new data

2694 J Ma J Theiler and S Perkins

Table 1 Information Regarding Experimental Time Series

Number of Data Points SV Ratio

Santa Fe Institute 1000 452Mackey-Glass 1500 154Yearly Sunspot 292 4181

become available Two measurements are used to quantify the predictionperformance mean squared error (MSE) and mean absolute error (MAE)The predictors are initially trained on the rst half of the data in the timeseries In the xed case the same predictor is used to predict the second halfof the time series In the on-line case the predictor is updated whenever anew data point is available The performance measurements for both casesare calculated from the predicted and actual values of the second half of thedata in the time series As shown in Table 2 the on-line predictor outper-forms the xed predictor in every case We also note that the errors for thethree time series in Table 2 coincide with the estimated prediction difcultyin Table 1 based on the SV ratio

The second experiment compares AOSVR with batch implementationsusing both cold start and warm start in the on-line prediction scenarioFor each benchmark time series an initial SVR predictor is trained on therst two data points using the batch SVR algorithms For AOSVR we usedequation 314 Afterward both AOSVR and batch SVR algorithms are em-ployed in the on-line prediction mode for the remaining data points in thetime series AOSVR and the batch SVR algorithms produce exactly the sameprediction errors in this experiment so the comparison is only of predic-tion speed All six batch SVR algorithms are compared with AOSVR on thesunspot time series and the experimental results are plotted in Figure 2 Thex-axis of this plot is the number of data points to which the on-line predic-tion model is applied Note that the core of QPSVMR is implemented in CBecause the cold start and warm start of LibSVMR clearly outperform thoseof both SMOR and QPSVMR only the comparison between LibSVMR andAOSVR is carried out in our subsequent experiments The experimental re-

Table 2 Performance Comparison for On-line and Fixed Predictors

On-line Fixed

Santa Fe Institute MSE 00072 00097MAE 00588 00665

Mackey-Glass MSE 00034 00036MAE 00506 00522

Yearly Sunspot MSE 00263 00369MAE 01204 01365

Accurate On-line Support Vector Regression 2695

50 100 150 200 250

10shy 1

100

101

102

103

104

Number of Points

Tim

e (i

n s

ec)

SMOR (Cold Start)SMOR (Warm Start)QPSVMR (Cold Start)QPSVMR (Warm Start)LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

Figure 2 Real-time prediction time of yearly sunspot time series

sults of both Santa Fe Institute and Mackey-Glass time series are presentedin Figures 3 and 4 respectively

These experimental results demonstrate that AOSVR algorithm is gen-erally much faster than the batch SVR algorithms when applied to on-lineprediction Comparison of Figures 2 and 4 furthermore suggests that morespeed improvement is achieved on the sunspot data than on the Mackey-Glass We speculate that this is because the sunspot problem is ldquoharderrdquothan the Mackey-Glass (it has a higher support vector ratio) and that theperformance of the AOSVR algorithm is less sensitive to problem difculty

To test this hypothesis we compared the performance of AOSVR to Lib-SVMR on a single data set (the sunspots) whose difculty was adjusted bychanging the value of A smaller leads to a higher support vector ratioand a more difcult problem Both the AOSVR and LibSVMR algorithmswere employed for on line prediction of the full time series The overallprediction times are plotted against in Figure 5 Where AOSVR perfor-mance varied by a factor of less than 10 over the range of the LibSVMRperformance varied by a factor of about 100

522 Limited-Memory Version of the On-line Time-Series Prediction ScenarioOne problem with on-line time-series prediction in general is that the longerthe prediction goes on the bigger the training set AOB will become and the

2696 J Ma J Theiler and S Perkins

100 200 300 400 500 600 700 800 900

10shy 1

100

101

102

103

104

Number of Points

Tim

e (i

n s

ec)

LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

Figure 3 Real-time prediction time of Santa Fe Institute time series

200 400 600 800 1000 1200 1400

10shy 1

100

101

102

103

104

Number of Points

Tim

e (i

n s

ec)

LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

Figure 4 Real-time prediction time of Mackey-Glass time series

Accurate On-line Support Vector Regression 2697

005 01 015 02 025 03 035 04 045 05

102

Ove

rall

Pre

dic

tio

n T

ime LibSVMR (Cold Start)

LibSVMR (Warm Start)AOSVR

005 01 015 02 025 03 035 04 045 05

200

400

600

800

1000

e

Ove

rall

Pre

dic

tio

n T

ime LibSVMR (Cold Start)

LibSVMR (Warm Start)AOSVR

Figure 5 Semilog and linear plots of prediction time of yearly sunspot timeseries

more SVs will be involved in SVR predictor A complicated SVR predictorimposes both memory and computation stress on the prediction systemOne way to deal with this problem is to impose a ldquoforgettingrdquo time WWhen training set AOB grows to this maximum W then the decrementalalgorithm is used to remove the oldest sample before the next new sampleis added to the training set

We note that this variant of the on-line prediction scenario is also poten-tially suitable for nonstationary time series as it can be updated in real timeto t the most recent behavior of the time series More rigorous investigationin this direction will be a future effort

53 Leave-One-Out Cross-Validation Cross-validation is a useful toolfor assessing the generalization ability of a machine learning algorithm Theidea is to train on one subset of the data and then to test the accuracy ofthe predictor on a separate disjoint subset In leave-one-out cross-validation(LOOCV) only a single sample is used for testing and all the rest are usedfor training Generally this is repeated for every sample in the data setWhen the batch SVR is employed LOOCV can be very expensive sincea fullretraining is done for each sample One compromise approach is to estimateLOOCV with related but less expensive approximations such as the xi-

2698 J Ma J Theiler and S Perkins

alpha bound (Joachims 2000) and approximate span bound (Vapnik ampChapelle 1999) Although Lee and Lin (2001) proposed a numerical solutionto reduce the computation for directly implementing LOOCV the amount ofcomputation required is still considerable Also the accuracy of the LOOCVresult obtained using this method can be potentially compromised becausea different parameter set is employed in LOOCV and in the nal training

The decremental algorithm of AOSVR provides an efcient implemen-tation of LOOCV for SVR

1 Given a data set D construct the SVR function f x from the wholedata set D using batch SVR learning algorithm

2 For each nonsupport vector xi in the data set D calculate error eicorresponding to xi as ei D yi iexcl f xi where yi is the target valuecorresponding to xi

3 For each support vector xi involved in the SVR function f x

(a) Unlearn xi from the SVR function f x using the decrementalalgorithm to obtain the SVR function fix which would beconstructed from the data set Di D D=fxig

(b) Calculate error ei corresponding to support vector xi as ei Dyi iexcl fixi where yi is the target value corresponding to xi

4 Knowing the error for each sample xi in D it is possible to constructa variety of overall measures a simple choice is the MSE

MSELOOCVD D1N

NX

ie2i (51)

where N is the number of samples in data set D Other choices oferror metric such as MAE can be obtained by altering equation 51appropriately

531 Experiment The algorithm parameters in this experiment are setthe same as those in the experiments in section 511 Two famous regressiondata sets the Auto-MPG and Boston Housing data sets are chosen from theUCI Machine-Learning Repository Some basic information on these datasets is listed in Table 3

Table 3 Information Regarding Experimental Regression Data Sets

Number of Attributes Number of Samples SV Ratio

Auto-MPG 7 392 4107Boston Housing 13 506 3636

Accurate On-line Support Vector Regression 2699

200 220 240 260 280 300 320 340 360 380

102

LOOCV Time Comparison of Autoshy MPG Dataset

Samples in Training Set

LO

OC

V T

ime

(sec

on

d)

LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

300 350 400 450 500

101

102

LOOCV Time Comparison of Boston Housing Dataset

Samples in Training Set

LO

OC

V T

ime

(sec

on

d)

Figure 6 Semilog plots of LOOCV time of Auto-MPG and Boston Housing dataset

The experimental results of both data sets are presented in Figure 6 Thex-axis is the size of the training set upon which the LOOCV is implementedThese plots show that AOSVR-based LOOCV is much faster than its Lib-SVMR counterpart

6 Conclusion

We have developed and implemented an accurate on-line support vectorregression (AOSVR) algorithm that permits efcient retraining when a newsample is added to or an existing sample is removed from the training setAOSVR is applied to on-line time-series prediction and to leave-one-outcross-validation and the experimental results demonstrate that the AOSVRalgorithm is more efcient than conventional batch SVR in these scenariosMoreover AOSVR appears less sensitive than batch SVR to the difculty ofthe underlying problem

After this manuscript was prepared we were made aware of a similar on-line SVR algorithm which was independently presented in Martin (2002)

2700 J Ma J Theiler and S Perkins

Appendix Pseudocode for Incrementing AOSVR with a New DataSample

Inputs

sup2 Training set T D fxi yi i D 1 lgsup2 Coefcients fmicroi i D 1 lg and bias b

sup2 Partition of samples into sets S E and R

sup2 Matrix R dened in equation 311

sup2 New sample xc yc

Outputs

sup2 Updated coefcients fmicroi i D 1 l C 1g and bias b

sup2 Updated Matrix R

sup2 Updated partition of samples into sets S E and R

AOSVR Incremental Algorithm

sup2 Initialize microc D 0

sup2 Compute f xc DX

i2E[S

microiQic C b

sup2 Compute hxc D f xc iexcl yc

sup2 I jhxcj middot then assign xc to R and terminate

sup2 Let q D signiexclhxc be the sign that 1microc will take

sup2 Do until the new sample xc meets the KKT condition

mdash Update macr deg according to equations 37 and 39mdash Start bookkeeping procedure

Check the new sample xcmdashLc1 D iexclhxc iexcl q=degc (Case 1)mdashLc2 D qC iexcl microc (Case 2)Check each sample xi in the set S (Case 3)mdashIf q i gt 0 and C gt microi cedil 0 LS

i D C iexcl microi=macri

mdashIf q i gt 0 and 0 gt microi cedil iexclC LSi D iexclmicroi=macri

mdashIf q i lt 0 and C cedil microi gt 0 LSi D iexclmicroi=macri

mdashIf q i lt 0 and 0 cedil microi gt iexclC LSi D iexclC iexcl microi=macri

Check each sample xi in the set E (Case 4)mdashLE

i D iexclhxi iexcl signqmacri=macri

Check each sample xi in the set R (Case 5)mdashLR

i D iexclhxi iexcl signqmacri=macri

Accurate On-line Support Vector Regression 2701

Set 1microc D qminjLc1j jLc2j jLSj jLEj jLRj whereLS D fLS

i i 2 Sg LE D fLEi i 2 Eg and LR D fLR

i i 2 RgLet Flag be the case number that determines 1micro

Let xl be the particular sample in T that determines 1microc

mdash End Bookkeeping Proceduremdash Update microc b and microi i 2 S according to equation 36mdash Update hxi i 2 E [ R according to equation 38mdash Switch Flag

(Flag=1)Add new sample xc to set S update matrix R accordingto equation 313(Flag=2)Add new sample xc to set E(Flag=3)If microl D 0 move xl to set R update R according to equa-tion 312If jmicrolj D C move xl to set E update R according toequation 312(Flag=4)Move xl to set S update R according to equation 313(Flag=5)Move xl to set S update R according to equation 313

mdash End Switch Flagmdash If Flagmiddot 2 terminate otherwise continue the Do-Loop

sup2 Terminate incremental algorithm ready for the next sample

Acknowledgments

We thank Chih-Jen Lin in National University of Taiwan for useful sug-gestions on some implementation issues We also thank the anonymousreviewers for pointing us to the work of Martin (2002) and for suggestingthat we compare AOSVR to the warm-start variants of batch algorithmsThis work is supported by the NASA project NRA-00-01-AISR-088 and bythe Los Alamos Laboratory Directed Research and Development (LDRD)program

References

Cauwenberghs G amp Poggio T (2001) Incremental and decremental supportvector machine learning In T K Leen T G Dietterich amp V Tresp (Eds)

2702 J Ma J Theiler and S Perkins

Advances in neural informationprocessingsystems 13 (pp409ndash123)CambridgeMA MIT Press

Chang C-C amp Lin C-J (2001) LIBSVM a library for support vector machinesSoftware Available on-line at httpwwwcsientuedutwraquocjlinlibsvm

Chang C-C amp Lin C-J (2002) Training ordm-support vector regression Theoryand algorithms Neural Computation 14 1959ndash1977

CsatoL amp Opper M (2001)Sparse representation for gaussian process modelsIn T K Leen T G Dietterich amp V Tresp (Eds) Advances in neural informationprocessing systems 13 (pp 444ndash450) Cambridge MA MIT Press

Fernandez R (1999) Predicting time series with a local support vector regres-sion machine InAdvancedCourse onArticial Intelligence (ACAI rsquo99) Availableon-line at httpwwwiitdemokritosgrskeleetnacai99

Fliege J amp Heseler A (2002) Constructing approximations to the ef-cient set of convex quadratic multiobjective problems Dortmund GermanyFachbereich Mathematik Universitat Dortmund Available on-line athttpwwwoptimization-onlineorgDB HTML200201426html

Friedel RH Reeves G D amp Obara T (2002) Relativistic electron dynamics inthe inner magnetospheremdasha review Journal of Atmosphericand Solar-TerrestrialPhysics 64 265ndash282

Gentile C (2001)A new approximate maximal margin classication algorithmJournal of Machine Learning Research 2 213ndash242

Gondzio J (1998)Warm start of the primal-dual method applied in the cuttingplane scheme Mathematical Programming 83 125ndash143

Gondzio J amp Grothey A (2001) Reoptimization with the primal-dual interiorpoint method SIAM Journal on Optimization 13 842ndash864

Graepel T Herbrich R amp Williamson R C (2001) From margin to sparsityIn T K Leen T G Dietterich amp V Tresp (Eds) Advances in neural informationprocessing systems 13 (pp 210ndash216) Cambridge MA MIT Press

Gunn S (1998) Matlab SVM toolbox Software Available on-line at httpwwwisisecssotonacukresourcessvminfo

Herbster M (2001) Learning additive models online with fast evaluating ker-nels In D P Helmbold amp B Williamson (Eds) Proceedings of the 14th An-nual Conference on Computational Learning Theory (pp 444ndash460) New YorkSpringer-Verlag

Joachims T (2000) Estimating the generalization performance of an SVM ef-ciently In P Langley (Ed) Proceedings of the Seventeenth International Confer-ence on Machine Learning (pp 431ndash438) San Mateo Morgan Kaufmann

Kivinen J Smola A Jamp Williamson R C (2002)Online learning with kernelsIn T G Dietterich S Becker amp Z Ghahramani (Eds) Advances in neuralinformation processing systems 14 (pp 785ndash792) Cambridge MA MIT Press

Lee J-H amp Lin C-J (2001)Automatic model selection for support vector machines(Tech Rep) Taipei Taiwan Department of Computer Science and Informa-tion Engineering National Taiwan University

Li Y amp Long P M (1999) The relaxed online maximum margin algorithm InS A Solla T K Leen amp K-R Muller (Eds) Advances in neural informationprocessing systems 12 (pp 498ndash504) Cambridge MA MIT Press

Accurate On-line Support Vector Regression 2703

Mackey M C amp Glass L (1977)Oscillation and chaos in physiological controlsystems Science 197 287ndash289

Martin M (2002)On-line supportvectormachines for functionapproximation(TechRep LSI-02-11-R)Catalunya Spain Software Department Universitat Po-litecnica de Catalunya

Muller K-R Smola A J Ratsch G Scholkopf B Kohlmorgen J amp VapnikV (1997)Predicting time series with support vector machines In W Gerstner(Ed) Articial Neural NetworksmdashICANN rsquo97 (pp 999ndash1004) Berlin Springer-Verlag

Ralaivola L amp drsquoAlche-Buc F (2001) Incremental support vector machinelearning A local approach In G Dorffner H Bischof amp K Hornik (Eds) Ar-ticial Neural NetworksmdashICANN 2001 (pp 322ndash330) Berlin Springer-Verlag

Shevade S K Keerthi S S Bhattacharyya C amp Murthy K R K (1999)Improvements to SMO algorithm for SVM regression (Tech Rep No CD-99-16)Singapore National University of Singapore

Smola A (1998) Interior point optimizer for SVM pattern recognition SoftwareAvailable on-line at httpwwwkernel-machinesorgcodeprloqotargz

Smola A J amp Scholkopf B (1998) A tutorial on support vector regression (Neu-roCOLT Tech Rep No NC-TR-98-030) London Royal Holloway CollegeUniversity of London

Syed N A Liu H amp Sung K K (1999) Incremental learning with supportvector machines In Proceedings of the Workshop on Support Vector Machines atthe International Joint Conference on Articial IntelligencemdashIJCAI-99 San MateoMorgan Kaufmann

Tashman L J (2000) Out-of-sample tests of forecasting accuracy An analysisand review International Journal of Forecasting 16 437ndash450

Tay F E H amp Cao L (2001)Application of support vectormachines in nancialtime series forcasting Omega 29 309ndash317

Vanderbei R J (1999) LOQO An interior point code for quadratic program-ming Optimization Methods and Software 11 451ndash484

Vapnik V (1998) Statistical learning theory New York WileyVapnik V amp Chapelle O (1999)Bounds on error expectation for support vector

machine In A Smola P Bartlett B Scholkopf amp D Schuurmans (Eds)Advances in large margin classiers (pp 261ndash280) Cambridge MA MIT Press

Weigend A S amp Gershenfeld N A (1994)Time-series predictionForecasting thefuture and understanding the past Reading MA Addison-Wesley

Yildirim E A amp Wright S J (2002) Warm-start strategies in interior-pointmethods for linear programming SIAM Journal on Optimization 12 782ndash810

Received September 12 2002 accepted May 1 2003

Page 10: Accurate On-line Support Vector Regression · 2006. 8. 28. · Accurate On-line Support Vector Regression 2689 In the special case whenSset is empty, according to equations 3.1 and

2692 J Ma J Theiler and S Perkins

51 AOSVR versus Batch SVR Algorithms with Warm Start Mostbatch algorithms for SVR are implemented as ldquocold startrdquo This is appro-priate when a t is desired to a batch of data that has not been seen beforeHowever in recent years there has been a growing interest in ldquowarm-startrdquoalgorithms that can save time by starting from an appropriate solution andquite a few articles have addressed this issue in the generic context of nu-meric programming (Gondzio 1998 Gondzio amp Grothey 2001 Yildirim ampWright 2002 Fliege amp Heseler 2002) The warm-start algorithms are usefulfor incremental (or decremental) learning because the solution with N iexcl 1(or N C 1) data points provides a natural starting point for nding the so-lution with N data points In this sense AOSVR is a kind of warm-startalgorithm for the QP problem equation 23 which is specially designedfor the incremental or decremental scenario This specialty allows AOSVRto achieve more efciency when handling SVR incremental or decrementallearning as demonstrated in our subsequent experiments

In the machine learning community three algorithms for batch SVRtrain-ing are widely recognized Gunn (1998) solved SVR training as a genericQP optimization we call this implementation QPSVMR Shevade KeerthiBhattacharyya and Murthy (1999) proposed an algorithm specially de-signed for SVR training and it is an improved version of the sequential min-imal optimization for SVM regression (SMOR) and Chang and Lin (2001)proposed another algorithm specially designed for SVR training which wecall LibSVMR since it is implemented as part of the LibSVM software pack-age We implemented all these algorithms so that they can run in both acold-start and a warm-start mode SMOR and LibSVMR are implementedin Matlab and both algorithms allow a straightforward warm-start realiza-tion Because QPSVMR is based on a generic QP algorithm it is much lessefcient than SMOR or LibSVMR To make our subsequent experimentsfeasible we had to implement the QPSVMR core in C (Smola 1998) Smolaessentially employs the interior point QP code of LOQO (Vanderbei 1999)The warm start of QPSVMR directly adopts the warm-start method embed-ded in Smolarsquos (1998) implementation

52 On-line Time-Series Prediction In recent years the use of SVR fortime-series prediction has attracted increased attention (Muller et al 1997Fernandez 1999 Tay amp Cao 2001) In an on-line scenario one updates amodel from incoming data and at the same time makes predictions basedon that model This arises for instance in market forecasting scenariosAnother potential application is the (near) real-time prediction of electrondensity around a satellite in the magnetosphere high charge densities candamage satellite equipment (Friedel Reeves and Obara 2002) and if timesof high charge can be predicted the most sensitive components can beturned off before they are damaged

In time-series prediction the prediction origin denoted O is the timefrom which the prediction is generated The time between the prediction

Accurate On-line Support Vector Regression 2693

origin and the predicted data point is the prediction horizon which forsimplicity we take as one time step

A typical on-line time-series prediction scenario can be represented asfollows (Tashman 2000)

1 Given a time series fxt t D 1 2 3 g and prediction origin Oconstruct a set of training samples AOB from the segment of timeseries fxt t D 1 Og as AOB D fXt yt t D B 0 iexcl 1gwhere Xt D [xt xt iexcl B C 1]T yt D xt C 1 and B is theembedding dimension of the training set AOB

2 Train a predictor PAOBI X from the training set AOB

3 Predict xO C 1 using OxO C 1 D PAOBI XO

4 When x0 C 1 becomes available update the prediction origin O DO C 1 Then go to step 1 and repeat the procedure

Note that the training set AOB keeps growing as O increases so thetraining of the predictor in step 2 becomes increasingly expensive There-fore many SVR-based time-series predictionsare implemented in a compro-mised way (Tay amp Cao 2001) After the predictor is obtained it stays xedand is not updated as new data arrive In contrast an on-line predictionalgorithm can take advantage of the fact that the training set is augmentedone sample at a time and continues to update and improve the model asmore data arrive

521 Experiments Two experiments were performed to compare theAOSVR algorithm with the batch SVR algorithm We were careful to usethe same algorithm parameters for on-line and batch SVR but since ourpurpose is to compare computational performance we did not attempt tooptimize these parameters for each data set In these experiments the kernelfunction is a gaussian radial basis function expiexcldeg kXiiexclXjk2 where deg D 1the regularization coefcient C and the insensitivity parameter in equation21 are set to 10 and 01 respectively the embedding dimension B of thetraining AOB is 5 Also we scale all the time-series to [iexcl11]

Three widely used benchmark time series are employed in both exper-iments (1) the Santa Fe Institute Competition time series A (Weigend ampGershenfeld1994) (2) the Mackey-Glass equation with iquest D 17 (Mackeyamp Glass 1977) and (3) the yearly average sunspot numbers recorded from1700 to 1995 Some basic information about these time series is listed in Ta-ble 1 The SV ratio is the number of support vectors divided by the numberof training samples This is based on a prediction of the last data point usingall previous data for training In general a higher SV ratio suggests that theunderlying problem is harder (Vapnik 1998)

The rst experiment demonstrates that using a xed predictor producesless accurate predictions than using a predictor that is updated as new data

2694 J Ma J Theiler and S Perkins

Table 1 Information Regarding Experimental Time Series

Number of Data Points SV Ratio

Santa Fe Institute 1000 452Mackey-Glass 1500 154Yearly Sunspot 292 4181

become available Two measurements are used to quantify the predictionperformance mean squared error (MSE) and mean absolute error (MAE)The predictors are initially trained on the rst half of the data in the timeseries In the xed case the same predictor is used to predict the second halfof the time series In the on-line case the predictor is updated whenever anew data point is available The performance measurements for both casesare calculated from the predicted and actual values of the second half of thedata in the time series As shown in Table 2 the on-line predictor outper-forms the xed predictor in every case We also note that the errors for thethree time series in Table 2 coincide with the estimated prediction difcultyin Table 1 based on the SV ratio

The second experiment compares AOSVR with batch implementationsusing both cold start and warm start in the on-line prediction scenarioFor each benchmark time series an initial SVR predictor is trained on therst two data points using the batch SVR algorithms For AOSVR we usedequation 314 Afterward both AOSVR and batch SVR algorithms are em-ployed in the on-line prediction mode for the remaining data points in thetime series AOSVR and the batch SVR algorithms produce exactly the sameprediction errors in this experiment so the comparison is only of predic-tion speed All six batch SVR algorithms are compared with AOSVR on thesunspot time series and the experimental results are plotted in Figure 2 Thex-axis of this plot is the number of data points to which the on-line predic-tion model is applied Note that the core of QPSVMR is implemented in CBecause the cold start and warm start of LibSVMR clearly outperform thoseof both SMOR and QPSVMR only the comparison between LibSVMR andAOSVR is carried out in our subsequent experiments The experimental re-

Table 2 Performance Comparison for On-line and Fixed Predictors

On-line Fixed

Santa Fe Institute MSE 00072 00097MAE 00588 00665

Mackey-Glass MSE 00034 00036MAE 00506 00522

Yearly Sunspot MSE 00263 00369MAE 01204 01365

Accurate On-line Support Vector Regression 2695

50 100 150 200 250

10shy 1

100

101

102

103

104

Number of Points

Tim

e (i

n s

ec)

SMOR (Cold Start)SMOR (Warm Start)QPSVMR (Cold Start)QPSVMR (Warm Start)LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

Figure 2 Real-time prediction time of yearly sunspot time series

sults of both Santa Fe Institute and Mackey-Glass time series are presentedin Figures 3 and 4 respectively

These experimental results demonstrate that AOSVR algorithm is gen-erally much faster than the batch SVR algorithms when applied to on-lineprediction Comparison of Figures 2 and 4 furthermore suggests that morespeed improvement is achieved on the sunspot data than on the Mackey-Glass We speculate that this is because the sunspot problem is ldquoharderrdquothan the Mackey-Glass (it has a higher support vector ratio) and that theperformance of the AOSVR algorithm is less sensitive to problem difculty

To test this hypothesis we compared the performance of AOSVR to Lib-SVMR on a single data set (the sunspots) whose difculty was adjusted bychanging the value of A smaller leads to a higher support vector ratioand a more difcult problem Both the AOSVR and LibSVMR algorithmswere employed for on line prediction of the full time series The overallprediction times are plotted against in Figure 5 Where AOSVR perfor-mance varied by a factor of less than 10 over the range of the LibSVMRperformance varied by a factor of about 100

522 Limited-Memory Version of the On-line Time-Series Prediction ScenarioOne problem with on-line time-series prediction in general is that the longerthe prediction goes on the bigger the training set AOB will become and the

2696 J Ma J Theiler and S Perkins

100 200 300 400 500 600 700 800 900

10shy 1

100

101

102

103

104

Number of Points

Tim

e (i

n s

ec)

LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

Figure 3 Real-time prediction time of Santa Fe Institute time series

200 400 600 800 1000 1200 1400

10shy 1

100

101

102

103

104

Number of Points

Tim

e (i

n s

ec)

LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

Figure 4 Real-time prediction time of Mackey-Glass time series

Accurate On-line Support Vector Regression 2697

005 01 015 02 025 03 035 04 045 05

102

Ove

rall

Pre

dic

tio

n T

ime LibSVMR (Cold Start)

LibSVMR (Warm Start)AOSVR

005 01 015 02 025 03 035 04 045 05

200

400

600

800

1000

e

Ove

rall

Pre

dic

tio

n T

ime LibSVMR (Cold Start)

LibSVMR (Warm Start)AOSVR

Figure 5 Semilog and linear plots of prediction time of yearly sunspot timeseries

more SVs will be involved in SVR predictor A complicated SVR predictorimposes both memory and computation stress on the prediction systemOne way to deal with this problem is to impose a ldquoforgettingrdquo time WWhen training set AOB grows to this maximum W then the decrementalalgorithm is used to remove the oldest sample before the next new sampleis added to the training set

We note that this variant of the on-line prediction scenario is also poten-tially suitable for nonstationary time series as it can be updated in real timeto t the most recent behavior of the time series More rigorous investigationin this direction will be a future effort

53 Leave-One-Out Cross-Validation Cross-validation is a useful toolfor assessing the generalization ability of a machine learning algorithm Theidea is to train on one subset of the data and then to test the accuracy ofthe predictor on a separate disjoint subset In leave-one-out cross-validation(LOOCV) only a single sample is used for testing and all the rest are usedfor training Generally this is repeated for every sample in the data setWhen the batch SVR is employed LOOCV can be very expensive sincea fullretraining is done for each sample One compromise approach is to estimateLOOCV with related but less expensive approximations such as the xi-

2698 J Ma J Theiler and S Perkins

alpha bound (Joachims 2000) and approximate span bound (Vapnik ampChapelle 1999) Although Lee and Lin (2001) proposed a numerical solutionto reduce the computation for directly implementing LOOCV the amount ofcomputation required is still considerable Also the accuracy of the LOOCVresult obtained using this method can be potentially compromised becausea different parameter set is employed in LOOCV and in the nal training

The decremental algorithm of AOSVR provides an efcient implemen-tation of LOOCV for SVR

1 Given a data set D construct the SVR function f x from the wholedata set D using batch SVR learning algorithm

2 For each nonsupport vector xi in the data set D calculate error eicorresponding to xi as ei D yi iexcl f xi where yi is the target valuecorresponding to xi

3 For each support vector xi involved in the SVR function f x

(a) Unlearn xi from the SVR function f x using the decrementalalgorithm to obtain the SVR function fix which would beconstructed from the data set Di D D=fxig

(b) Calculate error ei corresponding to support vector xi as ei Dyi iexcl fixi where yi is the target value corresponding to xi

4 Knowing the error for each sample xi in D it is possible to constructa variety of overall measures a simple choice is the MSE

MSELOOCVD D1N

NX

ie2i (51)

where N is the number of samples in data set D Other choices oferror metric such as MAE can be obtained by altering equation 51appropriately

531 Experiment The algorithm parameters in this experiment are setthe same as those in the experiments in section 511 Two famous regressiondata sets the Auto-MPG and Boston Housing data sets are chosen from theUCI Machine-Learning Repository Some basic information on these datasets is listed in Table 3

Table 3 Information Regarding Experimental Regression Data Sets

Number of Attributes Number of Samples SV Ratio

Auto-MPG 7 392 4107Boston Housing 13 506 3636

Accurate On-line Support Vector Regression 2699

200 220 240 260 280 300 320 340 360 380

102

LOOCV Time Comparison of Autoshy MPG Dataset

Samples in Training Set

LO

OC

V T

ime

(sec

on

d)

LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

300 350 400 450 500

101

102

LOOCV Time Comparison of Boston Housing Dataset

Samples in Training Set

LO

OC

V T

ime

(sec

on

d)

Figure 6 Semilog plots of LOOCV time of Auto-MPG and Boston Housing dataset

The experimental results of both data sets are presented in Figure 6 Thex-axis is the size of the training set upon which the LOOCV is implementedThese plots show that AOSVR-based LOOCV is much faster than its Lib-SVMR counterpart

6 Conclusion

We have developed and implemented an accurate on-line support vectorregression (AOSVR) algorithm that permits efcient retraining when a newsample is added to or an existing sample is removed from the training setAOSVR is applied to on-line time-series prediction and to leave-one-outcross-validation and the experimental results demonstrate that the AOSVRalgorithm is more efcient than conventional batch SVR in these scenariosMoreover AOSVR appears less sensitive than batch SVR to the difculty ofthe underlying problem

After this manuscript was prepared we were made aware of a similar on-line SVR algorithm which was independently presented in Martin (2002)

2700 J Ma J Theiler and S Perkins

Appendix Pseudocode for Incrementing AOSVR with a New DataSample

Inputs

sup2 Training set T D fxi yi i D 1 lgsup2 Coefcients fmicroi i D 1 lg and bias b

sup2 Partition of samples into sets S E and R

sup2 Matrix R dened in equation 311

sup2 New sample xc yc

Outputs

sup2 Updated coefcients fmicroi i D 1 l C 1g and bias b

sup2 Updated Matrix R

sup2 Updated partition of samples into sets S E and R

AOSVR Incremental Algorithm

sup2 Initialize microc D 0

sup2 Compute f xc DX

i2E[S

microiQic C b

sup2 Compute hxc D f xc iexcl yc

sup2 I jhxcj middot then assign xc to R and terminate

sup2 Let q D signiexclhxc be the sign that 1microc will take

sup2 Do until the new sample xc meets the KKT condition

mdash Update macr deg according to equations 37 and 39mdash Start bookkeeping procedure

Check the new sample xcmdashLc1 D iexclhxc iexcl q=degc (Case 1)mdashLc2 D qC iexcl microc (Case 2)Check each sample xi in the set S (Case 3)mdashIf q i gt 0 and C gt microi cedil 0 LS

i D C iexcl microi=macri

mdashIf q i gt 0 and 0 gt microi cedil iexclC LSi D iexclmicroi=macri

mdashIf q i lt 0 and C cedil microi gt 0 LSi D iexclmicroi=macri

mdashIf q i lt 0 and 0 cedil microi gt iexclC LSi D iexclC iexcl microi=macri

Check each sample xi in the set E (Case 4)mdashLE

i D iexclhxi iexcl signqmacri=macri

Check each sample xi in the set R (Case 5)mdashLR

i D iexclhxi iexcl signqmacri=macri

Accurate On-line Support Vector Regression 2701

Set 1microc D qminjLc1j jLc2j jLSj jLEj jLRj whereLS D fLS

i i 2 Sg LE D fLEi i 2 Eg and LR D fLR

i i 2 RgLet Flag be the case number that determines 1micro

Let xl be the particular sample in T that determines 1microc

mdash End Bookkeeping Proceduremdash Update microc b and microi i 2 S according to equation 36mdash Update hxi i 2 E [ R according to equation 38mdash Switch Flag

(Flag=1)Add new sample xc to set S update matrix R accordingto equation 313(Flag=2)Add new sample xc to set E(Flag=3)If microl D 0 move xl to set R update R according to equa-tion 312If jmicrolj D C move xl to set E update R according toequation 312(Flag=4)Move xl to set S update R according to equation 313(Flag=5)Move xl to set S update R according to equation 313

mdash End Switch Flagmdash If Flagmiddot 2 terminate otherwise continue the Do-Loop

sup2 Terminate incremental algorithm ready for the next sample

Acknowledgments

We thank Chih-Jen Lin in National University of Taiwan for useful sug-gestions on some implementation issues We also thank the anonymousreviewers for pointing us to the work of Martin (2002) and for suggestingthat we compare AOSVR to the warm-start variants of batch algorithmsThis work is supported by the NASA project NRA-00-01-AISR-088 and bythe Los Alamos Laboratory Directed Research and Development (LDRD)program

References

Cauwenberghs G amp Poggio T (2001) Incremental and decremental supportvector machine learning In T K Leen T G Dietterich amp V Tresp (Eds)

2702 J Ma J Theiler and S Perkins

Advances in neural informationprocessingsystems 13 (pp409ndash123)CambridgeMA MIT Press

Chang C-C amp Lin C-J (2001) LIBSVM a library for support vector machinesSoftware Available on-line at httpwwwcsientuedutwraquocjlinlibsvm

Chang C-C amp Lin C-J (2002) Training ordm-support vector regression Theoryand algorithms Neural Computation 14 1959ndash1977

CsatoL amp Opper M (2001)Sparse representation for gaussian process modelsIn T K Leen T G Dietterich amp V Tresp (Eds) Advances in neural informationprocessing systems 13 (pp 444ndash450) Cambridge MA MIT Press

Fernandez R (1999) Predicting time series with a local support vector regres-sion machine InAdvancedCourse onArticial Intelligence (ACAI rsquo99) Availableon-line at httpwwwiitdemokritosgrskeleetnacai99

Fliege J amp Heseler A (2002) Constructing approximations to the ef-cient set of convex quadratic multiobjective problems Dortmund GermanyFachbereich Mathematik Universitat Dortmund Available on-line athttpwwwoptimization-onlineorgDB HTML200201426html

Friedel RH Reeves G D amp Obara T (2002) Relativistic electron dynamics inthe inner magnetospheremdasha review Journal of Atmosphericand Solar-TerrestrialPhysics 64 265ndash282

Gentile C (2001)A new approximate maximal margin classication algorithmJournal of Machine Learning Research 2 213ndash242

Gondzio J (1998)Warm start of the primal-dual method applied in the cuttingplane scheme Mathematical Programming 83 125ndash143

Gondzio J amp Grothey A (2001) Reoptimization with the primal-dual interiorpoint method SIAM Journal on Optimization 13 842ndash864

Graepel T Herbrich R amp Williamson R C (2001) From margin to sparsityIn T K Leen T G Dietterich amp V Tresp (Eds) Advances in neural informationprocessing systems 13 (pp 210ndash216) Cambridge MA MIT Press

Gunn S (1998) Matlab SVM toolbox Software Available on-line at httpwwwisisecssotonacukresourcessvminfo

Herbster M (2001) Learning additive models online with fast evaluating ker-nels In D P Helmbold amp B Williamson (Eds) Proceedings of the 14th An-nual Conference on Computational Learning Theory (pp 444ndash460) New YorkSpringer-Verlag

Joachims T (2000) Estimating the generalization performance of an SVM ef-ciently In P Langley (Ed) Proceedings of the Seventeenth International Confer-ence on Machine Learning (pp 431ndash438) San Mateo Morgan Kaufmann

Kivinen J Smola A Jamp Williamson R C (2002)Online learning with kernelsIn T G Dietterich S Becker amp Z Ghahramani (Eds) Advances in neuralinformation processing systems 14 (pp 785ndash792) Cambridge MA MIT Press

Lee J-H amp Lin C-J (2001)Automatic model selection for support vector machines(Tech Rep) Taipei Taiwan Department of Computer Science and Informa-tion Engineering National Taiwan University

Li Y amp Long P M (1999) The relaxed online maximum margin algorithm InS A Solla T K Leen amp K-R Muller (Eds) Advances in neural informationprocessing systems 12 (pp 498ndash504) Cambridge MA MIT Press

Accurate On-line Support Vector Regression 2703

Mackey M C amp Glass L (1977)Oscillation and chaos in physiological controlsystems Science 197 287ndash289

Martin M (2002)On-line supportvectormachines for functionapproximation(TechRep LSI-02-11-R)Catalunya Spain Software Department Universitat Po-litecnica de Catalunya

Muller K-R Smola A J Ratsch G Scholkopf B Kohlmorgen J amp VapnikV (1997)Predicting time series with support vector machines In W Gerstner(Ed) Articial Neural NetworksmdashICANN rsquo97 (pp 999ndash1004) Berlin Springer-Verlag

Ralaivola L amp drsquoAlche-Buc F (2001) Incremental support vector machinelearning A local approach In G Dorffner H Bischof amp K Hornik (Eds) Ar-ticial Neural NetworksmdashICANN 2001 (pp 322ndash330) Berlin Springer-Verlag

Shevade S K Keerthi S S Bhattacharyya C amp Murthy K R K (1999)Improvements to SMO algorithm for SVM regression (Tech Rep No CD-99-16)Singapore National University of Singapore

Smola A (1998) Interior point optimizer for SVM pattern recognition SoftwareAvailable on-line at httpwwwkernel-machinesorgcodeprloqotargz

Smola A J amp Scholkopf B (1998) A tutorial on support vector regression (Neu-roCOLT Tech Rep No NC-TR-98-030) London Royal Holloway CollegeUniversity of London

Syed N A Liu H amp Sung K K (1999) Incremental learning with supportvector machines In Proceedings of the Workshop on Support Vector Machines atthe International Joint Conference on Articial IntelligencemdashIJCAI-99 San MateoMorgan Kaufmann

Tashman L J (2000) Out-of-sample tests of forecasting accuracy An analysisand review International Journal of Forecasting 16 437ndash450

Tay F E H amp Cao L (2001)Application of support vectormachines in nancialtime series forcasting Omega 29 309ndash317

Vanderbei R J (1999) LOQO An interior point code for quadratic program-ming Optimization Methods and Software 11 451ndash484

Vapnik V (1998) Statistical learning theory New York WileyVapnik V amp Chapelle O (1999)Bounds on error expectation for support vector

machine In A Smola P Bartlett B Scholkopf amp D Schuurmans (Eds)Advances in large margin classiers (pp 261ndash280) Cambridge MA MIT Press

Weigend A S amp Gershenfeld N A (1994)Time-series predictionForecasting thefuture and understanding the past Reading MA Addison-Wesley

Yildirim E A amp Wright S J (2002) Warm-start strategies in interior-pointmethods for linear programming SIAM Journal on Optimization 12 782ndash810

Received September 12 2002 accepted May 1 2003

Page 11: Accurate On-line Support Vector Regression · 2006. 8. 28. · Accurate On-line Support Vector Regression 2689 In the special case whenSset is empty, according to equations 3.1 and

Accurate On-line Support Vector Regression 2693

origin and the predicted data point is the prediction horizon which forsimplicity we take as one time step

A typical on-line time-series prediction scenario can be represented asfollows (Tashman 2000)

1 Given a time series fxt t D 1 2 3 g and prediction origin Oconstruct a set of training samples AOB from the segment of timeseries fxt t D 1 Og as AOB D fXt yt t D B 0 iexcl 1gwhere Xt D [xt xt iexcl B C 1]T yt D xt C 1 and B is theembedding dimension of the training set AOB

2 Train a predictor PAOBI X from the training set AOB

3 Predict xO C 1 using OxO C 1 D PAOBI XO

4 When x0 C 1 becomes available update the prediction origin O DO C 1 Then go to step 1 and repeat the procedure

Note that the training set AOB keeps growing as O increases so thetraining of the predictor in step 2 becomes increasingly expensive There-fore many SVR-based time-series predictionsare implemented in a compro-mised way (Tay amp Cao 2001) After the predictor is obtained it stays xedand is not updated as new data arrive In contrast an on-line predictionalgorithm can take advantage of the fact that the training set is augmentedone sample at a time and continues to update and improve the model asmore data arrive

521 Experiments Two experiments were performed to compare theAOSVR algorithm with the batch SVR algorithm We were careful to usethe same algorithm parameters for on-line and batch SVR but since ourpurpose is to compare computational performance we did not attempt tooptimize these parameters for each data set In these experiments the kernelfunction is a gaussian radial basis function expiexcldeg kXiiexclXjk2 where deg D 1the regularization coefcient C and the insensitivity parameter in equation21 are set to 10 and 01 respectively the embedding dimension B of thetraining AOB is 5 Also we scale all the time-series to [iexcl11]

Three widely used benchmark time series are employed in both exper-iments (1) the Santa Fe Institute Competition time series A (Weigend ampGershenfeld1994) (2) the Mackey-Glass equation with iquest D 17 (Mackeyamp Glass 1977) and (3) the yearly average sunspot numbers recorded from1700 to 1995 Some basic information about these time series is listed in Ta-ble 1 The SV ratio is the number of support vectors divided by the numberof training samples This is based on a prediction of the last data point usingall previous data for training In general a higher SV ratio suggests that theunderlying problem is harder (Vapnik 1998)

The rst experiment demonstrates that using a xed predictor producesless accurate predictions than using a predictor that is updated as new data

2694 J Ma J Theiler and S Perkins

Table 1 Information Regarding Experimental Time Series

Number of Data Points SV Ratio

Santa Fe Institute 1000 452Mackey-Glass 1500 154Yearly Sunspot 292 4181

become available Two measurements are used to quantify the predictionperformance mean squared error (MSE) and mean absolute error (MAE)The predictors are initially trained on the rst half of the data in the timeseries In the xed case the same predictor is used to predict the second halfof the time series In the on-line case the predictor is updated whenever anew data point is available The performance measurements for both casesare calculated from the predicted and actual values of the second half of thedata in the time series As shown in Table 2 the on-line predictor outper-forms the xed predictor in every case We also note that the errors for thethree time series in Table 2 coincide with the estimated prediction difcultyin Table 1 based on the SV ratio

The second experiment compares AOSVR with batch implementationsusing both cold start and warm start in the on-line prediction scenarioFor each benchmark time series an initial SVR predictor is trained on therst two data points using the batch SVR algorithms For AOSVR we usedequation 314 Afterward both AOSVR and batch SVR algorithms are em-ployed in the on-line prediction mode for the remaining data points in thetime series AOSVR and the batch SVR algorithms produce exactly the sameprediction errors in this experiment so the comparison is only of predic-tion speed All six batch SVR algorithms are compared with AOSVR on thesunspot time series and the experimental results are plotted in Figure 2 Thex-axis of this plot is the number of data points to which the on-line predic-tion model is applied Note that the core of QPSVMR is implemented in CBecause the cold start and warm start of LibSVMR clearly outperform thoseof both SMOR and QPSVMR only the comparison between LibSVMR andAOSVR is carried out in our subsequent experiments The experimental re-

Table 2 Performance Comparison for On-line and Fixed Predictors

On-line Fixed

Santa Fe Institute MSE 00072 00097MAE 00588 00665

Mackey-Glass MSE 00034 00036MAE 00506 00522

Yearly Sunspot MSE 00263 00369MAE 01204 01365

Accurate On-line Support Vector Regression 2695

50 100 150 200 250

10shy 1

100

101

102

103

104

Number of Points

Tim

e (i

n s

ec)

SMOR (Cold Start)SMOR (Warm Start)QPSVMR (Cold Start)QPSVMR (Warm Start)LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

Figure 2 Real-time prediction time of yearly sunspot time series

sults of both Santa Fe Institute and Mackey-Glass time series are presentedin Figures 3 and 4 respectively

These experimental results demonstrate that AOSVR algorithm is gen-erally much faster than the batch SVR algorithms when applied to on-lineprediction Comparison of Figures 2 and 4 furthermore suggests that morespeed improvement is achieved on the sunspot data than on the Mackey-Glass We speculate that this is because the sunspot problem is ldquoharderrdquothan the Mackey-Glass (it has a higher support vector ratio) and that theperformance of the AOSVR algorithm is less sensitive to problem difculty

To test this hypothesis we compared the performance of AOSVR to Lib-SVMR on a single data set (the sunspots) whose difculty was adjusted bychanging the value of A smaller leads to a higher support vector ratioand a more difcult problem Both the AOSVR and LibSVMR algorithmswere employed for on line prediction of the full time series The overallprediction times are plotted against in Figure 5 Where AOSVR perfor-mance varied by a factor of less than 10 over the range of the LibSVMRperformance varied by a factor of about 100

522 Limited-Memory Version of the On-line Time-Series Prediction ScenarioOne problem with on-line time-series prediction in general is that the longerthe prediction goes on the bigger the training set AOB will become and the

2696 J Ma J Theiler and S Perkins

100 200 300 400 500 600 700 800 900

10shy 1

100

101

102

103

104

Number of Points

Tim

e (i

n s

ec)

LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

Figure 3 Real-time prediction time of Santa Fe Institute time series

200 400 600 800 1000 1200 1400

10shy 1

100

101

102

103

104

Number of Points

Tim

e (i

n s

ec)

LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

Figure 4 Real-time prediction time of Mackey-Glass time series

Accurate On-line Support Vector Regression 2697

005 01 015 02 025 03 035 04 045 05

102

Ove

rall

Pre

dic

tio

n T

ime LibSVMR (Cold Start)

LibSVMR (Warm Start)AOSVR

005 01 015 02 025 03 035 04 045 05

200

400

600

800

1000

e

Ove

rall

Pre

dic

tio

n T

ime LibSVMR (Cold Start)

LibSVMR (Warm Start)AOSVR

Figure 5 Semilog and linear plots of prediction time of yearly sunspot timeseries

more SVs will be involved in SVR predictor A complicated SVR predictorimposes both memory and computation stress on the prediction systemOne way to deal with this problem is to impose a ldquoforgettingrdquo time WWhen training set AOB grows to this maximum W then the decrementalalgorithm is used to remove the oldest sample before the next new sampleis added to the training set

We note that this variant of the on-line prediction scenario is also poten-tially suitable for nonstationary time series as it can be updated in real timeto t the most recent behavior of the time series More rigorous investigationin this direction will be a future effort

53 Leave-One-Out Cross-Validation Cross-validation is a useful toolfor assessing the generalization ability of a machine learning algorithm Theidea is to train on one subset of the data and then to test the accuracy ofthe predictor on a separate disjoint subset In leave-one-out cross-validation(LOOCV) only a single sample is used for testing and all the rest are usedfor training Generally this is repeated for every sample in the data setWhen the batch SVR is employed LOOCV can be very expensive sincea fullretraining is done for each sample One compromise approach is to estimateLOOCV with related but less expensive approximations such as the xi-

2698 J Ma J Theiler and S Perkins

alpha bound (Joachims 2000) and approximate span bound (Vapnik ampChapelle 1999) Although Lee and Lin (2001) proposed a numerical solutionto reduce the computation for directly implementing LOOCV the amount ofcomputation required is still considerable Also the accuracy of the LOOCVresult obtained using this method can be potentially compromised becausea different parameter set is employed in LOOCV and in the nal training

The decremental algorithm of AOSVR provides an efcient implemen-tation of LOOCV for SVR

1 Given a data set D construct the SVR function f x from the wholedata set D using batch SVR learning algorithm

2 For each nonsupport vector xi in the data set D calculate error eicorresponding to xi as ei D yi iexcl f xi where yi is the target valuecorresponding to xi

3 For each support vector xi involved in the SVR function f x

(a) Unlearn xi from the SVR function f x using the decrementalalgorithm to obtain the SVR function fix which would beconstructed from the data set Di D D=fxig

(b) Calculate error ei corresponding to support vector xi as ei Dyi iexcl fixi where yi is the target value corresponding to xi

4 Knowing the error for each sample xi in D it is possible to constructa variety of overall measures a simple choice is the MSE

MSELOOCVD D1N

NX

ie2i (51)

where N is the number of samples in data set D Other choices oferror metric such as MAE can be obtained by altering equation 51appropriately

531 Experiment The algorithm parameters in this experiment are setthe same as those in the experiments in section 511 Two famous regressiondata sets the Auto-MPG and Boston Housing data sets are chosen from theUCI Machine-Learning Repository Some basic information on these datasets is listed in Table 3

Table 3 Information Regarding Experimental Regression Data Sets

Number of Attributes Number of Samples SV Ratio

Auto-MPG 7 392 4107Boston Housing 13 506 3636

Accurate On-line Support Vector Regression 2699

200 220 240 260 280 300 320 340 360 380

102

LOOCV Time Comparison of Autoshy MPG Dataset

Samples in Training Set

LO

OC

V T

ime

(sec

on

d)

LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

300 350 400 450 500

101

102

LOOCV Time Comparison of Boston Housing Dataset

Samples in Training Set

LO

OC

V T

ime

(sec

on

d)

Figure 6 Semilog plots of LOOCV time of Auto-MPG and Boston Housing dataset

The experimental results of both data sets are presented in Figure 6 Thex-axis is the size of the training set upon which the LOOCV is implementedThese plots show that AOSVR-based LOOCV is much faster than its Lib-SVMR counterpart

6 Conclusion

We have developed and implemented an accurate on-line support vectorregression (AOSVR) algorithm that permits efcient retraining when a newsample is added to or an existing sample is removed from the training setAOSVR is applied to on-line time-series prediction and to leave-one-outcross-validation and the experimental results demonstrate that the AOSVRalgorithm is more efcient than conventional batch SVR in these scenariosMoreover AOSVR appears less sensitive than batch SVR to the difculty ofthe underlying problem

After this manuscript was prepared we were made aware of a similar on-line SVR algorithm which was independently presented in Martin (2002)

2700 J Ma J Theiler and S Perkins

Appendix Pseudocode for Incrementing AOSVR with a New DataSample

Inputs

sup2 Training set T D fxi yi i D 1 lgsup2 Coefcients fmicroi i D 1 lg and bias b

sup2 Partition of samples into sets S E and R

sup2 Matrix R dened in equation 311

sup2 New sample xc yc

Outputs

sup2 Updated coefcients fmicroi i D 1 l C 1g and bias b

sup2 Updated Matrix R

sup2 Updated partition of samples into sets S E and R

AOSVR Incremental Algorithm

sup2 Initialize microc D 0

sup2 Compute f xc DX

i2E[S

microiQic C b

sup2 Compute hxc D f xc iexcl yc

sup2 I jhxcj middot then assign xc to R and terminate

sup2 Let q D signiexclhxc be the sign that 1microc will take

sup2 Do until the new sample xc meets the KKT condition

mdash Update macr deg according to equations 37 and 39mdash Start bookkeeping procedure

Check the new sample xcmdashLc1 D iexclhxc iexcl q=degc (Case 1)mdashLc2 D qC iexcl microc (Case 2)Check each sample xi in the set S (Case 3)mdashIf q i gt 0 and C gt microi cedil 0 LS

i D C iexcl microi=macri

mdashIf q i gt 0 and 0 gt microi cedil iexclC LSi D iexclmicroi=macri

mdashIf q i lt 0 and C cedil microi gt 0 LSi D iexclmicroi=macri

mdashIf q i lt 0 and 0 cedil microi gt iexclC LSi D iexclC iexcl microi=macri

Check each sample xi in the set E (Case 4)mdashLE

i D iexclhxi iexcl signqmacri=macri

Check each sample xi in the set R (Case 5)mdashLR

i D iexclhxi iexcl signqmacri=macri

Accurate On-line Support Vector Regression 2701

Set 1microc D qminjLc1j jLc2j jLSj jLEj jLRj whereLS D fLS

i i 2 Sg LE D fLEi i 2 Eg and LR D fLR

i i 2 RgLet Flag be the case number that determines 1micro

Let xl be the particular sample in T that determines 1microc

mdash End Bookkeeping Proceduremdash Update microc b and microi i 2 S according to equation 36mdash Update hxi i 2 E [ R according to equation 38mdash Switch Flag

(Flag=1)Add new sample xc to set S update matrix R accordingto equation 313(Flag=2)Add new sample xc to set E(Flag=3)If microl D 0 move xl to set R update R according to equa-tion 312If jmicrolj D C move xl to set E update R according toequation 312(Flag=4)Move xl to set S update R according to equation 313(Flag=5)Move xl to set S update R according to equation 313

mdash End Switch Flagmdash If Flagmiddot 2 terminate otherwise continue the Do-Loop

sup2 Terminate incremental algorithm ready for the next sample

Acknowledgments

We thank Chih-Jen Lin in National University of Taiwan for useful sug-gestions on some implementation issues We also thank the anonymousreviewers for pointing us to the work of Martin (2002) and for suggestingthat we compare AOSVR to the warm-start variants of batch algorithmsThis work is supported by the NASA project NRA-00-01-AISR-088 and bythe Los Alamos Laboratory Directed Research and Development (LDRD)program

References

Cauwenberghs G amp Poggio T (2001) Incremental and decremental supportvector machine learning In T K Leen T G Dietterich amp V Tresp (Eds)

2702 J Ma J Theiler and S Perkins

Advances in neural informationprocessingsystems 13 (pp409ndash123)CambridgeMA MIT Press

Chang C-C amp Lin C-J (2001) LIBSVM a library for support vector machinesSoftware Available on-line at httpwwwcsientuedutwraquocjlinlibsvm

Chang C-C amp Lin C-J (2002) Training ordm-support vector regression Theoryand algorithms Neural Computation 14 1959ndash1977

CsatoL amp Opper M (2001)Sparse representation for gaussian process modelsIn T K Leen T G Dietterich amp V Tresp (Eds) Advances in neural informationprocessing systems 13 (pp 444ndash450) Cambridge MA MIT Press

Fernandez R (1999) Predicting time series with a local support vector regres-sion machine InAdvancedCourse onArticial Intelligence (ACAI rsquo99) Availableon-line at httpwwwiitdemokritosgrskeleetnacai99

Fliege J amp Heseler A (2002) Constructing approximations to the ef-cient set of convex quadratic multiobjective problems Dortmund GermanyFachbereich Mathematik Universitat Dortmund Available on-line athttpwwwoptimization-onlineorgDB HTML200201426html

Friedel RH Reeves G D amp Obara T (2002) Relativistic electron dynamics inthe inner magnetospheremdasha review Journal of Atmosphericand Solar-TerrestrialPhysics 64 265ndash282

Gentile C (2001)A new approximate maximal margin classication algorithmJournal of Machine Learning Research 2 213ndash242

Gondzio J (1998)Warm start of the primal-dual method applied in the cuttingplane scheme Mathematical Programming 83 125ndash143

Gondzio J amp Grothey A (2001) Reoptimization with the primal-dual interiorpoint method SIAM Journal on Optimization 13 842ndash864

Graepel T Herbrich R amp Williamson R C (2001) From margin to sparsityIn T K Leen T G Dietterich amp V Tresp (Eds) Advances in neural informationprocessing systems 13 (pp 210ndash216) Cambridge MA MIT Press

Gunn S (1998) Matlab SVM toolbox Software Available on-line at httpwwwisisecssotonacukresourcessvminfo

Herbster M (2001) Learning additive models online with fast evaluating ker-nels In D P Helmbold amp B Williamson (Eds) Proceedings of the 14th An-nual Conference on Computational Learning Theory (pp 444ndash460) New YorkSpringer-Verlag

Joachims T (2000) Estimating the generalization performance of an SVM ef-ciently In P Langley (Ed) Proceedings of the Seventeenth International Confer-ence on Machine Learning (pp 431ndash438) San Mateo Morgan Kaufmann

Kivinen J Smola A Jamp Williamson R C (2002)Online learning with kernelsIn T G Dietterich S Becker amp Z Ghahramani (Eds) Advances in neuralinformation processing systems 14 (pp 785ndash792) Cambridge MA MIT Press

Lee J-H amp Lin C-J (2001)Automatic model selection for support vector machines(Tech Rep) Taipei Taiwan Department of Computer Science and Informa-tion Engineering National Taiwan University

Li Y amp Long P M (1999) The relaxed online maximum margin algorithm InS A Solla T K Leen amp K-R Muller (Eds) Advances in neural informationprocessing systems 12 (pp 498ndash504) Cambridge MA MIT Press

Accurate On-line Support Vector Regression 2703

Mackey M C amp Glass L (1977)Oscillation and chaos in physiological controlsystems Science 197 287ndash289

Martin M (2002)On-line supportvectormachines for functionapproximation(TechRep LSI-02-11-R)Catalunya Spain Software Department Universitat Po-litecnica de Catalunya

Muller K-R Smola A J Ratsch G Scholkopf B Kohlmorgen J amp VapnikV (1997)Predicting time series with support vector machines In W Gerstner(Ed) Articial Neural NetworksmdashICANN rsquo97 (pp 999ndash1004) Berlin Springer-Verlag

Ralaivola L amp drsquoAlche-Buc F (2001) Incremental support vector machinelearning A local approach In G Dorffner H Bischof amp K Hornik (Eds) Ar-ticial Neural NetworksmdashICANN 2001 (pp 322ndash330) Berlin Springer-Verlag

Shevade S K Keerthi S S Bhattacharyya C amp Murthy K R K (1999)Improvements to SMO algorithm for SVM regression (Tech Rep No CD-99-16)Singapore National University of Singapore

Smola A (1998) Interior point optimizer for SVM pattern recognition SoftwareAvailable on-line at httpwwwkernel-machinesorgcodeprloqotargz

Smola A J amp Scholkopf B (1998) A tutorial on support vector regression (Neu-roCOLT Tech Rep No NC-TR-98-030) London Royal Holloway CollegeUniversity of London

Syed N A Liu H amp Sung K K (1999) Incremental learning with supportvector machines In Proceedings of the Workshop on Support Vector Machines atthe International Joint Conference on Articial IntelligencemdashIJCAI-99 San MateoMorgan Kaufmann

Tashman L J (2000) Out-of-sample tests of forecasting accuracy An analysisand review International Journal of Forecasting 16 437ndash450

Tay F E H amp Cao L (2001)Application of support vectormachines in nancialtime series forcasting Omega 29 309ndash317

Vanderbei R J (1999) LOQO An interior point code for quadratic program-ming Optimization Methods and Software 11 451ndash484

Vapnik V (1998) Statistical learning theory New York WileyVapnik V amp Chapelle O (1999)Bounds on error expectation for support vector

machine In A Smola P Bartlett B Scholkopf amp D Schuurmans (Eds)Advances in large margin classiers (pp 261ndash280) Cambridge MA MIT Press

Weigend A S amp Gershenfeld N A (1994)Time-series predictionForecasting thefuture and understanding the past Reading MA Addison-Wesley

Yildirim E A amp Wright S J (2002) Warm-start strategies in interior-pointmethods for linear programming SIAM Journal on Optimization 12 782ndash810

Received September 12 2002 accepted May 1 2003

Page 12: Accurate On-line Support Vector Regression · 2006. 8. 28. · Accurate On-line Support Vector Regression 2689 In the special case whenSset is empty, according to equations 3.1 and

2694 J Ma J Theiler and S Perkins

Table 1 Information Regarding Experimental Time Series

Number of Data Points SV Ratio

Santa Fe Institute 1000 452Mackey-Glass 1500 154Yearly Sunspot 292 4181

become available Two measurements are used to quantify the predictionperformance mean squared error (MSE) and mean absolute error (MAE)The predictors are initially trained on the rst half of the data in the timeseries In the xed case the same predictor is used to predict the second halfof the time series In the on-line case the predictor is updated whenever anew data point is available The performance measurements for both casesare calculated from the predicted and actual values of the second half of thedata in the time series As shown in Table 2 the on-line predictor outper-forms the xed predictor in every case We also note that the errors for thethree time series in Table 2 coincide with the estimated prediction difcultyin Table 1 based on the SV ratio

The second experiment compares AOSVR with batch implementationsusing both cold start and warm start in the on-line prediction scenarioFor each benchmark time series an initial SVR predictor is trained on therst two data points using the batch SVR algorithms For AOSVR we usedequation 314 Afterward both AOSVR and batch SVR algorithms are em-ployed in the on-line prediction mode for the remaining data points in thetime series AOSVR and the batch SVR algorithms produce exactly the sameprediction errors in this experiment so the comparison is only of predic-tion speed All six batch SVR algorithms are compared with AOSVR on thesunspot time series and the experimental results are plotted in Figure 2 Thex-axis of this plot is the number of data points to which the on-line predic-tion model is applied Note that the core of QPSVMR is implemented in CBecause the cold start and warm start of LibSVMR clearly outperform thoseof both SMOR and QPSVMR only the comparison between LibSVMR andAOSVR is carried out in our subsequent experiments The experimental re-

Table 2 Performance Comparison for On-line and Fixed Predictors

On-line Fixed

Santa Fe Institute MSE 00072 00097MAE 00588 00665

Mackey-Glass MSE 00034 00036MAE 00506 00522

Yearly Sunspot MSE 00263 00369MAE 01204 01365

Accurate On-line Support Vector Regression 2695

50 100 150 200 250

10shy 1

100

101

102

103

104

Number of Points

Tim

e (i

n s

ec)

SMOR (Cold Start)SMOR (Warm Start)QPSVMR (Cold Start)QPSVMR (Warm Start)LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

Figure 2 Real-time prediction time of yearly sunspot time series

sults of both Santa Fe Institute and Mackey-Glass time series are presentedin Figures 3 and 4 respectively

These experimental results demonstrate that AOSVR algorithm is gen-erally much faster than the batch SVR algorithms when applied to on-lineprediction Comparison of Figures 2 and 4 furthermore suggests that morespeed improvement is achieved on the sunspot data than on the Mackey-Glass We speculate that this is because the sunspot problem is ldquoharderrdquothan the Mackey-Glass (it has a higher support vector ratio) and that theperformance of the AOSVR algorithm is less sensitive to problem difculty

To test this hypothesis we compared the performance of AOSVR to Lib-SVMR on a single data set (the sunspots) whose difculty was adjusted bychanging the value of A smaller leads to a higher support vector ratioand a more difcult problem Both the AOSVR and LibSVMR algorithmswere employed for on line prediction of the full time series The overallprediction times are plotted against in Figure 5 Where AOSVR perfor-mance varied by a factor of less than 10 over the range of the LibSVMRperformance varied by a factor of about 100

522 Limited-Memory Version of the On-line Time-Series Prediction ScenarioOne problem with on-line time-series prediction in general is that the longerthe prediction goes on the bigger the training set AOB will become and the

2696 J Ma J Theiler and S Perkins

100 200 300 400 500 600 700 800 900

10shy 1

100

101

102

103

104

Number of Points

Tim

e (i

n s

ec)

LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

Figure 3 Real-time prediction time of Santa Fe Institute time series

200 400 600 800 1000 1200 1400

10shy 1

100

101

102

103

104

Number of Points

Tim

e (i

n s

ec)

LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

Figure 4 Real-time prediction time of Mackey-Glass time series

Accurate On-line Support Vector Regression 2697

005 01 015 02 025 03 035 04 045 05

102

Ove

rall

Pre

dic

tio

n T

ime LibSVMR (Cold Start)

LibSVMR (Warm Start)AOSVR

005 01 015 02 025 03 035 04 045 05

200

400

600

800

1000

e

Ove

rall

Pre

dic

tio

n T

ime LibSVMR (Cold Start)

LibSVMR (Warm Start)AOSVR

Figure 5 Semilog and linear plots of prediction time of yearly sunspot timeseries

more SVs will be involved in SVR predictor A complicated SVR predictorimposes both memory and computation stress on the prediction systemOne way to deal with this problem is to impose a ldquoforgettingrdquo time WWhen training set AOB grows to this maximum W then the decrementalalgorithm is used to remove the oldest sample before the next new sampleis added to the training set

We note that this variant of the on-line prediction scenario is also poten-tially suitable for nonstationary time series as it can be updated in real timeto t the most recent behavior of the time series More rigorous investigationin this direction will be a future effort

53 Leave-One-Out Cross-Validation Cross-validation is a useful toolfor assessing the generalization ability of a machine learning algorithm Theidea is to train on one subset of the data and then to test the accuracy ofthe predictor on a separate disjoint subset In leave-one-out cross-validation(LOOCV) only a single sample is used for testing and all the rest are usedfor training Generally this is repeated for every sample in the data setWhen the batch SVR is employed LOOCV can be very expensive sincea fullretraining is done for each sample One compromise approach is to estimateLOOCV with related but less expensive approximations such as the xi-

2698 J Ma J Theiler and S Perkins

alpha bound (Joachims 2000) and approximate span bound (Vapnik ampChapelle 1999) Although Lee and Lin (2001) proposed a numerical solutionto reduce the computation for directly implementing LOOCV the amount ofcomputation required is still considerable Also the accuracy of the LOOCVresult obtained using this method can be potentially compromised becausea different parameter set is employed in LOOCV and in the nal training

The decremental algorithm of AOSVR provides an efcient implemen-tation of LOOCV for SVR

1 Given a data set D construct the SVR function f x from the wholedata set D using batch SVR learning algorithm

2 For each nonsupport vector xi in the data set D calculate error eicorresponding to xi as ei D yi iexcl f xi where yi is the target valuecorresponding to xi

3 For each support vector xi involved in the SVR function f x

(a) Unlearn xi from the SVR function f x using the decrementalalgorithm to obtain the SVR function fix which would beconstructed from the data set Di D D=fxig

(b) Calculate error ei corresponding to support vector xi as ei Dyi iexcl fixi where yi is the target value corresponding to xi

4 Knowing the error for each sample xi in D it is possible to constructa variety of overall measures a simple choice is the MSE

MSELOOCVD D1N

NX

ie2i (51)

where N is the number of samples in data set D Other choices oferror metric such as MAE can be obtained by altering equation 51appropriately

531 Experiment The algorithm parameters in this experiment are setthe same as those in the experiments in section 511 Two famous regressiondata sets the Auto-MPG and Boston Housing data sets are chosen from theUCI Machine-Learning Repository Some basic information on these datasets is listed in Table 3

Table 3 Information Regarding Experimental Regression Data Sets

Number of Attributes Number of Samples SV Ratio

Auto-MPG 7 392 4107Boston Housing 13 506 3636

Accurate On-line Support Vector Regression 2699

200 220 240 260 280 300 320 340 360 380

102

LOOCV Time Comparison of Autoshy MPG Dataset

Samples in Training Set

LO

OC

V T

ime

(sec

on

d)

LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

300 350 400 450 500

101

102

LOOCV Time Comparison of Boston Housing Dataset

Samples in Training Set

LO

OC

V T

ime

(sec

on

d)

Figure 6 Semilog plots of LOOCV time of Auto-MPG and Boston Housing dataset

The experimental results of both data sets are presented in Figure 6 Thex-axis is the size of the training set upon which the LOOCV is implementedThese plots show that AOSVR-based LOOCV is much faster than its Lib-SVMR counterpart

6 Conclusion

We have developed and implemented an accurate on-line support vectorregression (AOSVR) algorithm that permits efcient retraining when a newsample is added to or an existing sample is removed from the training setAOSVR is applied to on-line time-series prediction and to leave-one-outcross-validation and the experimental results demonstrate that the AOSVRalgorithm is more efcient than conventional batch SVR in these scenariosMoreover AOSVR appears less sensitive than batch SVR to the difculty ofthe underlying problem

After this manuscript was prepared we were made aware of a similar on-line SVR algorithm which was independently presented in Martin (2002)

2700 J Ma J Theiler and S Perkins

Appendix Pseudocode for Incrementing AOSVR with a New DataSample

Inputs

sup2 Training set T D fxi yi i D 1 lgsup2 Coefcients fmicroi i D 1 lg and bias b

sup2 Partition of samples into sets S E and R

sup2 Matrix R dened in equation 311

sup2 New sample xc yc

Outputs

sup2 Updated coefcients fmicroi i D 1 l C 1g and bias b

sup2 Updated Matrix R

sup2 Updated partition of samples into sets S E and R

AOSVR Incremental Algorithm

sup2 Initialize microc D 0

sup2 Compute f xc DX

i2E[S

microiQic C b

sup2 Compute hxc D f xc iexcl yc

sup2 I jhxcj middot then assign xc to R and terminate

sup2 Let q D signiexclhxc be the sign that 1microc will take

sup2 Do until the new sample xc meets the KKT condition

mdash Update macr deg according to equations 37 and 39mdash Start bookkeeping procedure

Check the new sample xcmdashLc1 D iexclhxc iexcl q=degc (Case 1)mdashLc2 D qC iexcl microc (Case 2)Check each sample xi in the set S (Case 3)mdashIf q i gt 0 and C gt microi cedil 0 LS

i D C iexcl microi=macri

mdashIf q i gt 0 and 0 gt microi cedil iexclC LSi D iexclmicroi=macri

mdashIf q i lt 0 and C cedil microi gt 0 LSi D iexclmicroi=macri

mdashIf q i lt 0 and 0 cedil microi gt iexclC LSi D iexclC iexcl microi=macri

Check each sample xi in the set E (Case 4)mdashLE

i D iexclhxi iexcl signqmacri=macri

Check each sample xi in the set R (Case 5)mdashLR

i D iexclhxi iexcl signqmacri=macri

Accurate On-line Support Vector Regression 2701

Set 1microc D qminjLc1j jLc2j jLSj jLEj jLRj whereLS D fLS

i i 2 Sg LE D fLEi i 2 Eg and LR D fLR

i i 2 RgLet Flag be the case number that determines 1micro

Let xl be the particular sample in T that determines 1microc

mdash End Bookkeeping Proceduremdash Update microc b and microi i 2 S according to equation 36mdash Update hxi i 2 E [ R according to equation 38mdash Switch Flag

(Flag=1)Add new sample xc to set S update matrix R accordingto equation 313(Flag=2)Add new sample xc to set E(Flag=3)If microl D 0 move xl to set R update R according to equa-tion 312If jmicrolj D C move xl to set E update R according toequation 312(Flag=4)Move xl to set S update R according to equation 313(Flag=5)Move xl to set S update R according to equation 313

mdash End Switch Flagmdash If Flagmiddot 2 terminate otherwise continue the Do-Loop

sup2 Terminate incremental algorithm ready for the next sample

Acknowledgments

We thank Chih-Jen Lin in National University of Taiwan for useful sug-gestions on some implementation issues We also thank the anonymousreviewers for pointing us to the work of Martin (2002) and for suggestingthat we compare AOSVR to the warm-start variants of batch algorithmsThis work is supported by the NASA project NRA-00-01-AISR-088 and bythe Los Alamos Laboratory Directed Research and Development (LDRD)program

References

Cauwenberghs G amp Poggio T (2001) Incremental and decremental supportvector machine learning In T K Leen T G Dietterich amp V Tresp (Eds)

2702 J Ma J Theiler and S Perkins

Advances in neural informationprocessingsystems 13 (pp409ndash123)CambridgeMA MIT Press

Chang C-C amp Lin C-J (2001) LIBSVM a library for support vector machinesSoftware Available on-line at httpwwwcsientuedutwraquocjlinlibsvm

Chang C-C amp Lin C-J (2002) Training ordm-support vector regression Theoryand algorithms Neural Computation 14 1959ndash1977

CsatoL amp Opper M (2001)Sparse representation for gaussian process modelsIn T K Leen T G Dietterich amp V Tresp (Eds) Advances in neural informationprocessing systems 13 (pp 444ndash450) Cambridge MA MIT Press

Fernandez R (1999) Predicting time series with a local support vector regres-sion machine InAdvancedCourse onArticial Intelligence (ACAI rsquo99) Availableon-line at httpwwwiitdemokritosgrskeleetnacai99

Fliege J amp Heseler A (2002) Constructing approximations to the ef-cient set of convex quadratic multiobjective problems Dortmund GermanyFachbereich Mathematik Universitat Dortmund Available on-line athttpwwwoptimization-onlineorgDB HTML200201426html

Friedel RH Reeves G D amp Obara T (2002) Relativistic electron dynamics inthe inner magnetospheremdasha review Journal of Atmosphericand Solar-TerrestrialPhysics 64 265ndash282

Gentile C (2001)A new approximate maximal margin classication algorithmJournal of Machine Learning Research 2 213ndash242

Gondzio J (1998)Warm start of the primal-dual method applied in the cuttingplane scheme Mathematical Programming 83 125ndash143

Gondzio J amp Grothey A (2001) Reoptimization with the primal-dual interiorpoint method SIAM Journal on Optimization 13 842ndash864

Graepel T Herbrich R amp Williamson R C (2001) From margin to sparsityIn T K Leen T G Dietterich amp V Tresp (Eds) Advances in neural informationprocessing systems 13 (pp 210ndash216) Cambridge MA MIT Press

Gunn S (1998) Matlab SVM toolbox Software Available on-line at httpwwwisisecssotonacukresourcessvminfo

Herbster M (2001) Learning additive models online with fast evaluating ker-nels In D P Helmbold amp B Williamson (Eds) Proceedings of the 14th An-nual Conference on Computational Learning Theory (pp 444ndash460) New YorkSpringer-Verlag

Joachims T (2000) Estimating the generalization performance of an SVM ef-ciently In P Langley (Ed) Proceedings of the Seventeenth International Confer-ence on Machine Learning (pp 431ndash438) San Mateo Morgan Kaufmann

Kivinen J Smola A Jamp Williamson R C (2002)Online learning with kernelsIn T G Dietterich S Becker amp Z Ghahramani (Eds) Advances in neuralinformation processing systems 14 (pp 785ndash792) Cambridge MA MIT Press

Lee J-H amp Lin C-J (2001)Automatic model selection for support vector machines(Tech Rep) Taipei Taiwan Department of Computer Science and Informa-tion Engineering National Taiwan University

Li Y amp Long P M (1999) The relaxed online maximum margin algorithm InS A Solla T K Leen amp K-R Muller (Eds) Advances in neural informationprocessing systems 12 (pp 498ndash504) Cambridge MA MIT Press

Accurate On-line Support Vector Regression 2703

Mackey M C amp Glass L (1977)Oscillation and chaos in physiological controlsystems Science 197 287ndash289

Martin M (2002)On-line supportvectormachines for functionapproximation(TechRep LSI-02-11-R)Catalunya Spain Software Department Universitat Po-litecnica de Catalunya

Muller K-R Smola A J Ratsch G Scholkopf B Kohlmorgen J amp VapnikV (1997)Predicting time series with support vector machines In W Gerstner(Ed) Articial Neural NetworksmdashICANN rsquo97 (pp 999ndash1004) Berlin Springer-Verlag

Ralaivola L amp drsquoAlche-Buc F (2001) Incremental support vector machinelearning A local approach In G Dorffner H Bischof amp K Hornik (Eds) Ar-ticial Neural NetworksmdashICANN 2001 (pp 322ndash330) Berlin Springer-Verlag

Shevade S K Keerthi S S Bhattacharyya C amp Murthy K R K (1999)Improvements to SMO algorithm for SVM regression (Tech Rep No CD-99-16)Singapore National University of Singapore

Smola A (1998) Interior point optimizer for SVM pattern recognition SoftwareAvailable on-line at httpwwwkernel-machinesorgcodeprloqotargz

Smola A J amp Scholkopf B (1998) A tutorial on support vector regression (Neu-roCOLT Tech Rep No NC-TR-98-030) London Royal Holloway CollegeUniversity of London

Syed N A Liu H amp Sung K K (1999) Incremental learning with supportvector machines In Proceedings of the Workshop on Support Vector Machines atthe International Joint Conference on Articial IntelligencemdashIJCAI-99 San MateoMorgan Kaufmann

Tashman L J (2000) Out-of-sample tests of forecasting accuracy An analysisand review International Journal of Forecasting 16 437ndash450

Tay F E H amp Cao L (2001)Application of support vectormachines in nancialtime series forcasting Omega 29 309ndash317

Vanderbei R J (1999) LOQO An interior point code for quadratic program-ming Optimization Methods and Software 11 451ndash484

Vapnik V (1998) Statistical learning theory New York WileyVapnik V amp Chapelle O (1999)Bounds on error expectation for support vector

machine In A Smola P Bartlett B Scholkopf amp D Schuurmans (Eds)Advances in large margin classiers (pp 261ndash280) Cambridge MA MIT Press

Weigend A S amp Gershenfeld N A (1994)Time-series predictionForecasting thefuture and understanding the past Reading MA Addison-Wesley

Yildirim E A amp Wright S J (2002) Warm-start strategies in interior-pointmethods for linear programming SIAM Journal on Optimization 12 782ndash810

Received September 12 2002 accepted May 1 2003

Page 13: Accurate On-line Support Vector Regression · 2006. 8. 28. · Accurate On-line Support Vector Regression 2689 In the special case whenSset is empty, according to equations 3.1 and

Accurate On-line Support Vector Regression 2695

50 100 150 200 250

10shy 1

100

101

102

103

104

Number of Points

Tim

e (i

n s

ec)

SMOR (Cold Start)SMOR (Warm Start)QPSVMR (Cold Start)QPSVMR (Warm Start)LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

Figure 2 Real-time prediction time of yearly sunspot time series

sults of both Santa Fe Institute and Mackey-Glass time series are presentedin Figures 3 and 4 respectively

These experimental results demonstrate that AOSVR algorithm is gen-erally much faster than the batch SVR algorithms when applied to on-lineprediction Comparison of Figures 2 and 4 furthermore suggests that morespeed improvement is achieved on the sunspot data than on the Mackey-Glass We speculate that this is because the sunspot problem is ldquoharderrdquothan the Mackey-Glass (it has a higher support vector ratio) and that theperformance of the AOSVR algorithm is less sensitive to problem difculty

To test this hypothesis we compared the performance of AOSVR to Lib-SVMR on a single data set (the sunspots) whose difculty was adjusted bychanging the value of A smaller leads to a higher support vector ratioand a more difcult problem Both the AOSVR and LibSVMR algorithmswere employed for on line prediction of the full time series The overallprediction times are plotted against in Figure 5 Where AOSVR perfor-mance varied by a factor of less than 10 over the range of the LibSVMRperformance varied by a factor of about 100

522 Limited-Memory Version of the On-line Time-Series Prediction ScenarioOne problem with on-line time-series prediction in general is that the longerthe prediction goes on the bigger the training set AOB will become and the

2696 J Ma J Theiler and S Perkins

100 200 300 400 500 600 700 800 900

10shy 1

100

101

102

103

104

Number of Points

Tim

e (i

n s

ec)

LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

Figure 3 Real-time prediction time of Santa Fe Institute time series

200 400 600 800 1000 1200 1400

10shy 1

100

101

102

103

104

Number of Points

Tim

e (i

n s

ec)

LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

Figure 4 Real-time prediction time of Mackey-Glass time series

Accurate On-line Support Vector Regression 2697

005 01 015 02 025 03 035 04 045 05

102

Ove

rall

Pre

dic

tio

n T

ime LibSVMR (Cold Start)

LibSVMR (Warm Start)AOSVR

005 01 015 02 025 03 035 04 045 05

200

400

600

800

1000

e

Ove

rall

Pre

dic

tio

n T

ime LibSVMR (Cold Start)

LibSVMR (Warm Start)AOSVR

Figure 5 Semilog and linear plots of prediction time of yearly sunspot timeseries

more SVs will be involved in SVR predictor A complicated SVR predictorimposes both memory and computation stress on the prediction systemOne way to deal with this problem is to impose a ldquoforgettingrdquo time WWhen training set AOB grows to this maximum W then the decrementalalgorithm is used to remove the oldest sample before the next new sampleis added to the training set

We note that this variant of the on-line prediction scenario is also poten-tially suitable for nonstationary time series as it can be updated in real timeto t the most recent behavior of the time series More rigorous investigationin this direction will be a future effort

53 Leave-One-Out Cross-Validation Cross-validation is a useful toolfor assessing the generalization ability of a machine learning algorithm Theidea is to train on one subset of the data and then to test the accuracy ofthe predictor on a separate disjoint subset In leave-one-out cross-validation(LOOCV) only a single sample is used for testing and all the rest are usedfor training Generally this is repeated for every sample in the data setWhen the batch SVR is employed LOOCV can be very expensive sincea fullretraining is done for each sample One compromise approach is to estimateLOOCV with related but less expensive approximations such as the xi-

2698 J Ma J Theiler and S Perkins

alpha bound (Joachims 2000) and approximate span bound (Vapnik ampChapelle 1999) Although Lee and Lin (2001) proposed a numerical solutionto reduce the computation for directly implementing LOOCV the amount ofcomputation required is still considerable Also the accuracy of the LOOCVresult obtained using this method can be potentially compromised becausea different parameter set is employed in LOOCV and in the nal training

The decremental algorithm of AOSVR provides an efcient implemen-tation of LOOCV for SVR

1 Given a data set D construct the SVR function f x from the wholedata set D using batch SVR learning algorithm

2 For each nonsupport vector xi in the data set D calculate error eicorresponding to xi as ei D yi iexcl f xi where yi is the target valuecorresponding to xi

3 For each support vector xi involved in the SVR function f x

(a) Unlearn xi from the SVR function f x using the decrementalalgorithm to obtain the SVR function fix which would beconstructed from the data set Di D D=fxig

(b) Calculate error ei corresponding to support vector xi as ei Dyi iexcl fixi where yi is the target value corresponding to xi

4 Knowing the error for each sample xi in D it is possible to constructa variety of overall measures a simple choice is the MSE

MSELOOCVD D1N

NX

ie2i (51)

where N is the number of samples in data set D Other choices oferror metric such as MAE can be obtained by altering equation 51appropriately

531 Experiment The algorithm parameters in this experiment are setthe same as those in the experiments in section 511 Two famous regressiondata sets the Auto-MPG and Boston Housing data sets are chosen from theUCI Machine-Learning Repository Some basic information on these datasets is listed in Table 3

Table 3 Information Regarding Experimental Regression Data Sets

Number of Attributes Number of Samples SV Ratio

Auto-MPG 7 392 4107Boston Housing 13 506 3636

Accurate On-line Support Vector Regression 2699

200 220 240 260 280 300 320 340 360 380

102

LOOCV Time Comparison of Autoshy MPG Dataset

Samples in Training Set

LO

OC

V T

ime

(sec

on

d)

LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

300 350 400 450 500

101

102

LOOCV Time Comparison of Boston Housing Dataset

Samples in Training Set

LO

OC

V T

ime

(sec

on

d)

Figure 6 Semilog plots of LOOCV time of Auto-MPG and Boston Housing dataset

The experimental results of both data sets are presented in Figure 6 Thex-axis is the size of the training set upon which the LOOCV is implementedThese plots show that AOSVR-based LOOCV is much faster than its Lib-SVMR counterpart

6 Conclusion

We have developed and implemented an accurate on-line support vectorregression (AOSVR) algorithm that permits efcient retraining when a newsample is added to or an existing sample is removed from the training setAOSVR is applied to on-line time-series prediction and to leave-one-outcross-validation and the experimental results demonstrate that the AOSVRalgorithm is more efcient than conventional batch SVR in these scenariosMoreover AOSVR appears less sensitive than batch SVR to the difculty ofthe underlying problem

After this manuscript was prepared we were made aware of a similar on-line SVR algorithm which was independently presented in Martin (2002)

2700 J Ma J Theiler and S Perkins

Appendix Pseudocode for Incrementing AOSVR with a New DataSample

Inputs

sup2 Training set T D fxi yi i D 1 lgsup2 Coefcients fmicroi i D 1 lg and bias b

sup2 Partition of samples into sets S E and R

sup2 Matrix R dened in equation 311

sup2 New sample xc yc

Outputs

sup2 Updated coefcients fmicroi i D 1 l C 1g and bias b

sup2 Updated Matrix R

sup2 Updated partition of samples into sets S E and R

AOSVR Incremental Algorithm

sup2 Initialize microc D 0

sup2 Compute f xc DX

i2E[S

microiQic C b

sup2 Compute hxc D f xc iexcl yc

sup2 I jhxcj middot then assign xc to R and terminate

sup2 Let q D signiexclhxc be the sign that 1microc will take

sup2 Do until the new sample xc meets the KKT condition

mdash Update macr deg according to equations 37 and 39mdash Start bookkeeping procedure

Check the new sample xcmdashLc1 D iexclhxc iexcl q=degc (Case 1)mdashLc2 D qC iexcl microc (Case 2)Check each sample xi in the set S (Case 3)mdashIf q i gt 0 and C gt microi cedil 0 LS

i D C iexcl microi=macri

mdashIf q i gt 0 and 0 gt microi cedil iexclC LSi D iexclmicroi=macri

mdashIf q i lt 0 and C cedil microi gt 0 LSi D iexclmicroi=macri

mdashIf q i lt 0 and 0 cedil microi gt iexclC LSi D iexclC iexcl microi=macri

Check each sample xi in the set E (Case 4)mdashLE

i D iexclhxi iexcl signqmacri=macri

Check each sample xi in the set R (Case 5)mdashLR

i D iexclhxi iexcl signqmacri=macri

Accurate On-line Support Vector Regression 2701

Set 1microc D qminjLc1j jLc2j jLSj jLEj jLRj whereLS D fLS

i i 2 Sg LE D fLEi i 2 Eg and LR D fLR

i i 2 RgLet Flag be the case number that determines 1micro

Let xl be the particular sample in T that determines 1microc

mdash End Bookkeeping Proceduremdash Update microc b and microi i 2 S according to equation 36mdash Update hxi i 2 E [ R according to equation 38mdash Switch Flag

(Flag=1)Add new sample xc to set S update matrix R accordingto equation 313(Flag=2)Add new sample xc to set E(Flag=3)If microl D 0 move xl to set R update R according to equa-tion 312If jmicrolj D C move xl to set E update R according toequation 312(Flag=4)Move xl to set S update R according to equation 313(Flag=5)Move xl to set S update R according to equation 313

mdash End Switch Flagmdash If Flagmiddot 2 terminate otherwise continue the Do-Loop

sup2 Terminate incremental algorithm ready for the next sample

Acknowledgments

We thank Chih-Jen Lin in National University of Taiwan for useful sug-gestions on some implementation issues We also thank the anonymousreviewers for pointing us to the work of Martin (2002) and for suggestingthat we compare AOSVR to the warm-start variants of batch algorithmsThis work is supported by the NASA project NRA-00-01-AISR-088 and bythe Los Alamos Laboratory Directed Research and Development (LDRD)program

References

Cauwenberghs G amp Poggio T (2001) Incremental and decremental supportvector machine learning In T K Leen T G Dietterich amp V Tresp (Eds)

2702 J Ma J Theiler and S Perkins

Advances in neural informationprocessingsystems 13 (pp409ndash123)CambridgeMA MIT Press

Chang C-C amp Lin C-J (2001) LIBSVM a library for support vector machinesSoftware Available on-line at httpwwwcsientuedutwraquocjlinlibsvm

Chang C-C amp Lin C-J (2002) Training ordm-support vector regression Theoryand algorithms Neural Computation 14 1959ndash1977

CsatoL amp Opper M (2001)Sparse representation for gaussian process modelsIn T K Leen T G Dietterich amp V Tresp (Eds) Advances in neural informationprocessing systems 13 (pp 444ndash450) Cambridge MA MIT Press

Fernandez R (1999) Predicting time series with a local support vector regres-sion machine InAdvancedCourse onArticial Intelligence (ACAI rsquo99) Availableon-line at httpwwwiitdemokritosgrskeleetnacai99

Fliege J amp Heseler A (2002) Constructing approximations to the ef-cient set of convex quadratic multiobjective problems Dortmund GermanyFachbereich Mathematik Universitat Dortmund Available on-line athttpwwwoptimization-onlineorgDB HTML200201426html

Friedel RH Reeves G D amp Obara T (2002) Relativistic electron dynamics inthe inner magnetospheremdasha review Journal of Atmosphericand Solar-TerrestrialPhysics 64 265ndash282

Gentile C (2001)A new approximate maximal margin classication algorithmJournal of Machine Learning Research 2 213ndash242

Gondzio J (1998)Warm start of the primal-dual method applied in the cuttingplane scheme Mathematical Programming 83 125ndash143

Gondzio J amp Grothey A (2001) Reoptimization with the primal-dual interiorpoint method SIAM Journal on Optimization 13 842ndash864

Graepel T Herbrich R amp Williamson R C (2001) From margin to sparsityIn T K Leen T G Dietterich amp V Tresp (Eds) Advances in neural informationprocessing systems 13 (pp 210ndash216) Cambridge MA MIT Press

Gunn S (1998) Matlab SVM toolbox Software Available on-line at httpwwwisisecssotonacukresourcessvminfo

Herbster M (2001) Learning additive models online with fast evaluating ker-nels In D P Helmbold amp B Williamson (Eds) Proceedings of the 14th An-nual Conference on Computational Learning Theory (pp 444ndash460) New YorkSpringer-Verlag

Joachims T (2000) Estimating the generalization performance of an SVM ef-ciently In P Langley (Ed) Proceedings of the Seventeenth International Confer-ence on Machine Learning (pp 431ndash438) San Mateo Morgan Kaufmann

Kivinen J Smola A Jamp Williamson R C (2002)Online learning with kernelsIn T G Dietterich S Becker amp Z Ghahramani (Eds) Advances in neuralinformation processing systems 14 (pp 785ndash792) Cambridge MA MIT Press

Lee J-H amp Lin C-J (2001)Automatic model selection for support vector machines(Tech Rep) Taipei Taiwan Department of Computer Science and Informa-tion Engineering National Taiwan University

Li Y amp Long P M (1999) The relaxed online maximum margin algorithm InS A Solla T K Leen amp K-R Muller (Eds) Advances in neural informationprocessing systems 12 (pp 498ndash504) Cambridge MA MIT Press

Accurate On-line Support Vector Regression 2703

Mackey M C amp Glass L (1977)Oscillation and chaos in physiological controlsystems Science 197 287ndash289

Martin M (2002)On-line supportvectormachines for functionapproximation(TechRep LSI-02-11-R)Catalunya Spain Software Department Universitat Po-litecnica de Catalunya

Muller K-R Smola A J Ratsch G Scholkopf B Kohlmorgen J amp VapnikV (1997)Predicting time series with support vector machines In W Gerstner(Ed) Articial Neural NetworksmdashICANN rsquo97 (pp 999ndash1004) Berlin Springer-Verlag

Ralaivola L amp drsquoAlche-Buc F (2001) Incremental support vector machinelearning A local approach In G Dorffner H Bischof amp K Hornik (Eds) Ar-ticial Neural NetworksmdashICANN 2001 (pp 322ndash330) Berlin Springer-Verlag

Shevade S K Keerthi S S Bhattacharyya C amp Murthy K R K (1999)Improvements to SMO algorithm for SVM regression (Tech Rep No CD-99-16)Singapore National University of Singapore

Smola A (1998) Interior point optimizer for SVM pattern recognition SoftwareAvailable on-line at httpwwwkernel-machinesorgcodeprloqotargz

Smola A J amp Scholkopf B (1998) A tutorial on support vector regression (Neu-roCOLT Tech Rep No NC-TR-98-030) London Royal Holloway CollegeUniversity of London

Syed N A Liu H amp Sung K K (1999) Incremental learning with supportvector machines In Proceedings of the Workshop on Support Vector Machines atthe International Joint Conference on Articial IntelligencemdashIJCAI-99 San MateoMorgan Kaufmann

Tashman L J (2000) Out-of-sample tests of forecasting accuracy An analysisand review International Journal of Forecasting 16 437ndash450

Tay F E H amp Cao L (2001)Application of support vectormachines in nancialtime series forcasting Omega 29 309ndash317

Vanderbei R J (1999) LOQO An interior point code for quadratic program-ming Optimization Methods and Software 11 451ndash484

Vapnik V (1998) Statistical learning theory New York WileyVapnik V amp Chapelle O (1999)Bounds on error expectation for support vector

machine In A Smola P Bartlett B Scholkopf amp D Schuurmans (Eds)Advances in large margin classiers (pp 261ndash280) Cambridge MA MIT Press

Weigend A S amp Gershenfeld N A (1994)Time-series predictionForecasting thefuture and understanding the past Reading MA Addison-Wesley

Yildirim E A amp Wright S J (2002) Warm-start strategies in interior-pointmethods for linear programming SIAM Journal on Optimization 12 782ndash810

Received September 12 2002 accepted May 1 2003

Page 14: Accurate On-line Support Vector Regression · 2006. 8. 28. · Accurate On-line Support Vector Regression 2689 In the special case whenSset is empty, according to equations 3.1 and

2696 J Ma J Theiler and S Perkins

100 200 300 400 500 600 700 800 900

10shy 1

100

101

102

103

104

Number of Points

Tim

e (i

n s

ec)

LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

Figure 3 Real-time prediction time of Santa Fe Institute time series

200 400 600 800 1000 1200 1400

10shy 1

100

101

102

103

104

Number of Points

Tim

e (i

n s

ec)

LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

Figure 4 Real-time prediction time of Mackey-Glass time series

Accurate On-line Support Vector Regression 2697

005 01 015 02 025 03 035 04 045 05

102

Ove

rall

Pre

dic

tio

n T

ime LibSVMR (Cold Start)

LibSVMR (Warm Start)AOSVR

005 01 015 02 025 03 035 04 045 05

200

400

600

800

1000

e

Ove

rall

Pre

dic

tio

n T

ime LibSVMR (Cold Start)

LibSVMR (Warm Start)AOSVR

Figure 5 Semilog and linear plots of prediction time of yearly sunspot timeseries

more SVs will be involved in SVR predictor A complicated SVR predictorimposes both memory and computation stress on the prediction systemOne way to deal with this problem is to impose a ldquoforgettingrdquo time WWhen training set AOB grows to this maximum W then the decrementalalgorithm is used to remove the oldest sample before the next new sampleis added to the training set

We note that this variant of the on-line prediction scenario is also poten-tially suitable for nonstationary time series as it can be updated in real timeto t the most recent behavior of the time series More rigorous investigationin this direction will be a future effort

53 Leave-One-Out Cross-Validation Cross-validation is a useful toolfor assessing the generalization ability of a machine learning algorithm Theidea is to train on one subset of the data and then to test the accuracy ofthe predictor on a separate disjoint subset In leave-one-out cross-validation(LOOCV) only a single sample is used for testing and all the rest are usedfor training Generally this is repeated for every sample in the data setWhen the batch SVR is employed LOOCV can be very expensive sincea fullretraining is done for each sample One compromise approach is to estimateLOOCV with related but less expensive approximations such as the xi-

2698 J Ma J Theiler and S Perkins

alpha bound (Joachims 2000) and approximate span bound (Vapnik ampChapelle 1999) Although Lee and Lin (2001) proposed a numerical solutionto reduce the computation for directly implementing LOOCV the amount ofcomputation required is still considerable Also the accuracy of the LOOCVresult obtained using this method can be potentially compromised becausea different parameter set is employed in LOOCV and in the nal training

The decremental algorithm of AOSVR provides an efcient implemen-tation of LOOCV for SVR

1 Given a data set D construct the SVR function f x from the wholedata set D using batch SVR learning algorithm

2 For each nonsupport vector xi in the data set D calculate error eicorresponding to xi as ei D yi iexcl f xi where yi is the target valuecorresponding to xi

3 For each support vector xi involved in the SVR function f x

(a) Unlearn xi from the SVR function f x using the decrementalalgorithm to obtain the SVR function fix which would beconstructed from the data set Di D D=fxig

(b) Calculate error ei corresponding to support vector xi as ei Dyi iexcl fixi where yi is the target value corresponding to xi

4 Knowing the error for each sample xi in D it is possible to constructa variety of overall measures a simple choice is the MSE

MSELOOCVD D1N

NX

ie2i (51)

where N is the number of samples in data set D Other choices oferror metric such as MAE can be obtained by altering equation 51appropriately

531 Experiment The algorithm parameters in this experiment are setthe same as those in the experiments in section 511 Two famous regressiondata sets the Auto-MPG and Boston Housing data sets are chosen from theUCI Machine-Learning Repository Some basic information on these datasets is listed in Table 3

Table 3 Information Regarding Experimental Regression Data Sets

Number of Attributes Number of Samples SV Ratio

Auto-MPG 7 392 4107Boston Housing 13 506 3636

Accurate On-line Support Vector Regression 2699

200 220 240 260 280 300 320 340 360 380

102

LOOCV Time Comparison of Autoshy MPG Dataset

Samples in Training Set

LO

OC

V T

ime

(sec

on

d)

LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

300 350 400 450 500

101

102

LOOCV Time Comparison of Boston Housing Dataset

Samples in Training Set

LO

OC

V T

ime

(sec

on

d)

Figure 6 Semilog plots of LOOCV time of Auto-MPG and Boston Housing dataset

The experimental results of both data sets are presented in Figure 6 Thex-axis is the size of the training set upon which the LOOCV is implementedThese plots show that AOSVR-based LOOCV is much faster than its Lib-SVMR counterpart

6 Conclusion

We have developed and implemented an accurate on-line support vectorregression (AOSVR) algorithm that permits efcient retraining when a newsample is added to or an existing sample is removed from the training setAOSVR is applied to on-line time-series prediction and to leave-one-outcross-validation and the experimental results demonstrate that the AOSVRalgorithm is more efcient than conventional batch SVR in these scenariosMoreover AOSVR appears less sensitive than batch SVR to the difculty ofthe underlying problem

After this manuscript was prepared we were made aware of a similar on-line SVR algorithm which was independently presented in Martin (2002)

2700 J Ma J Theiler and S Perkins

Appendix Pseudocode for Incrementing AOSVR with a New DataSample

Inputs

sup2 Training set T D fxi yi i D 1 lgsup2 Coefcients fmicroi i D 1 lg and bias b

sup2 Partition of samples into sets S E and R

sup2 Matrix R dened in equation 311

sup2 New sample xc yc

Outputs

sup2 Updated coefcients fmicroi i D 1 l C 1g and bias b

sup2 Updated Matrix R

sup2 Updated partition of samples into sets S E and R

AOSVR Incremental Algorithm

sup2 Initialize microc D 0

sup2 Compute f xc DX

i2E[S

microiQic C b

sup2 Compute hxc D f xc iexcl yc

sup2 I jhxcj middot then assign xc to R and terminate

sup2 Let q D signiexclhxc be the sign that 1microc will take

sup2 Do until the new sample xc meets the KKT condition

mdash Update macr deg according to equations 37 and 39mdash Start bookkeeping procedure

Check the new sample xcmdashLc1 D iexclhxc iexcl q=degc (Case 1)mdashLc2 D qC iexcl microc (Case 2)Check each sample xi in the set S (Case 3)mdashIf q i gt 0 and C gt microi cedil 0 LS

i D C iexcl microi=macri

mdashIf q i gt 0 and 0 gt microi cedil iexclC LSi D iexclmicroi=macri

mdashIf q i lt 0 and C cedil microi gt 0 LSi D iexclmicroi=macri

mdashIf q i lt 0 and 0 cedil microi gt iexclC LSi D iexclC iexcl microi=macri

Check each sample xi in the set E (Case 4)mdashLE

i D iexclhxi iexcl signqmacri=macri

Check each sample xi in the set R (Case 5)mdashLR

i D iexclhxi iexcl signqmacri=macri

Accurate On-line Support Vector Regression 2701

Set 1microc D qminjLc1j jLc2j jLSj jLEj jLRj whereLS D fLS

i i 2 Sg LE D fLEi i 2 Eg and LR D fLR

i i 2 RgLet Flag be the case number that determines 1micro

Let xl be the particular sample in T that determines 1microc

mdash End Bookkeeping Proceduremdash Update microc b and microi i 2 S according to equation 36mdash Update hxi i 2 E [ R according to equation 38mdash Switch Flag

(Flag=1)Add new sample xc to set S update matrix R accordingto equation 313(Flag=2)Add new sample xc to set E(Flag=3)If microl D 0 move xl to set R update R according to equa-tion 312If jmicrolj D C move xl to set E update R according toequation 312(Flag=4)Move xl to set S update R according to equation 313(Flag=5)Move xl to set S update R according to equation 313

mdash End Switch Flagmdash If Flagmiddot 2 terminate otherwise continue the Do-Loop

sup2 Terminate incremental algorithm ready for the next sample

Acknowledgments

We thank Chih-Jen Lin in National University of Taiwan for useful sug-gestions on some implementation issues We also thank the anonymousreviewers for pointing us to the work of Martin (2002) and for suggestingthat we compare AOSVR to the warm-start variants of batch algorithmsThis work is supported by the NASA project NRA-00-01-AISR-088 and bythe Los Alamos Laboratory Directed Research and Development (LDRD)program

References

Cauwenberghs G amp Poggio T (2001) Incremental and decremental supportvector machine learning In T K Leen T G Dietterich amp V Tresp (Eds)

2702 J Ma J Theiler and S Perkins

Advances in neural informationprocessingsystems 13 (pp409ndash123)CambridgeMA MIT Press

Chang C-C amp Lin C-J (2001) LIBSVM a library for support vector machinesSoftware Available on-line at httpwwwcsientuedutwraquocjlinlibsvm

Chang C-C amp Lin C-J (2002) Training ordm-support vector regression Theoryand algorithms Neural Computation 14 1959ndash1977

CsatoL amp Opper M (2001)Sparse representation for gaussian process modelsIn T K Leen T G Dietterich amp V Tresp (Eds) Advances in neural informationprocessing systems 13 (pp 444ndash450) Cambridge MA MIT Press

Fernandez R (1999) Predicting time series with a local support vector regres-sion machine InAdvancedCourse onArticial Intelligence (ACAI rsquo99) Availableon-line at httpwwwiitdemokritosgrskeleetnacai99

Fliege J amp Heseler A (2002) Constructing approximations to the ef-cient set of convex quadratic multiobjective problems Dortmund GermanyFachbereich Mathematik Universitat Dortmund Available on-line athttpwwwoptimization-onlineorgDB HTML200201426html

Friedel RH Reeves G D amp Obara T (2002) Relativistic electron dynamics inthe inner magnetospheremdasha review Journal of Atmosphericand Solar-TerrestrialPhysics 64 265ndash282

Gentile C (2001)A new approximate maximal margin classication algorithmJournal of Machine Learning Research 2 213ndash242

Gondzio J (1998)Warm start of the primal-dual method applied in the cuttingplane scheme Mathematical Programming 83 125ndash143

Gondzio J amp Grothey A (2001) Reoptimization with the primal-dual interiorpoint method SIAM Journal on Optimization 13 842ndash864

Graepel T Herbrich R amp Williamson R C (2001) From margin to sparsityIn T K Leen T G Dietterich amp V Tresp (Eds) Advances in neural informationprocessing systems 13 (pp 210ndash216) Cambridge MA MIT Press

Gunn S (1998) Matlab SVM toolbox Software Available on-line at httpwwwisisecssotonacukresourcessvminfo

Herbster M (2001) Learning additive models online with fast evaluating ker-nels In D P Helmbold amp B Williamson (Eds) Proceedings of the 14th An-nual Conference on Computational Learning Theory (pp 444ndash460) New YorkSpringer-Verlag

Joachims T (2000) Estimating the generalization performance of an SVM ef-ciently In P Langley (Ed) Proceedings of the Seventeenth International Confer-ence on Machine Learning (pp 431ndash438) San Mateo Morgan Kaufmann

Kivinen J Smola A Jamp Williamson R C (2002)Online learning with kernelsIn T G Dietterich S Becker amp Z Ghahramani (Eds) Advances in neuralinformation processing systems 14 (pp 785ndash792) Cambridge MA MIT Press

Lee J-H amp Lin C-J (2001)Automatic model selection for support vector machines(Tech Rep) Taipei Taiwan Department of Computer Science and Informa-tion Engineering National Taiwan University

Li Y amp Long P M (1999) The relaxed online maximum margin algorithm InS A Solla T K Leen amp K-R Muller (Eds) Advances in neural informationprocessing systems 12 (pp 498ndash504) Cambridge MA MIT Press

Accurate On-line Support Vector Regression 2703

Mackey M C amp Glass L (1977)Oscillation and chaos in physiological controlsystems Science 197 287ndash289

Martin M (2002)On-line supportvectormachines for functionapproximation(TechRep LSI-02-11-R)Catalunya Spain Software Department Universitat Po-litecnica de Catalunya

Muller K-R Smola A J Ratsch G Scholkopf B Kohlmorgen J amp VapnikV (1997)Predicting time series with support vector machines In W Gerstner(Ed) Articial Neural NetworksmdashICANN rsquo97 (pp 999ndash1004) Berlin Springer-Verlag

Ralaivola L amp drsquoAlche-Buc F (2001) Incremental support vector machinelearning A local approach In G Dorffner H Bischof amp K Hornik (Eds) Ar-ticial Neural NetworksmdashICANN 2001 (pp 322ndash330) Berlin Springer-Verlag

Shevade S K Keerthi S S Bhattacharyya C amp Murthy K R K (1999)Improvements to SMO algorithm for SVM regression (Tech Rep No CD-99-16)Singapore National University of Singapore

Smola A (1998) Interior point optimizer for SVM pattern recognition SoftwareAvailable on-line at httpwwwkernel-machinesorgcodeprloqotargz

Smola A J amp Scholkopf B (1998) A tutorial on support vector regression (Neu-roCOLT Tech Rep No NC-TR-98-030) London Royal Holloway CollegeUniversity of London

Syed N A Liu H amp Sung K K (1999) Incremental learning with supportvector machines In Proceedings of the Workshop on Support Vector Machines atthe International Joint Conference on Articial IntelligencemdashIJCAI-99 San MateoMorgan Kaufmann

Tashman L J (2000) Out-of-sample tests of forecasting accuracy An analysisand review International Journal of Forecasting 16 437ndash450

Tay F E H amp Cao L (2001)Application of support vectormachines in nancialtime series forcasting Omega 29 309ndash317

Vanderbei R J (1999) LOQO An interior point code for quadratic program-ming Optimization Methods and Software 11 451ndash484

Vapnik V (1998) Statistical learning theory New York WileyVapnik V amp Chapelle O (1999)Bounds on error expectation for support vector

machine In A Smola P Bartlett B Scholkopf amp D Schuurmans (Eds)Advances in large margin classiers (pp 261ndash280) Cambridge MA MIT Press

Weigend A S amp Gershenfeld N A (1994)Time-series predictionForecasting thefuture and understanding the past Reading MA Addison-Wesley

Yildirim E A amp Wright S J (2002) Warm-start strategies in interior-pointmethods for linear programming SIAM Journal on Optimization 12 782ndash810

Received September 12 2002 accepted May 1 2003

Page 15: Accurate On-line Support Vector Regression · 2006. 8. 28. · Accurate On-line Support Vector Regression 2689 In the special case whenSset is empty, according to equations 3.1 and

Accurate On-line Support Vector Regression 2697

005 01 015 02 025 03 035 04 045 05

102

Ove

rall

Pre

dic

tio

n T

ime LibSVMR (Cold Start)

LibSVMR (Warm Start)AOSVR

005 01 015 02 025 03 035 04 045 05

200

400

600

800

1000

e

Ove

rall

Pre

dic

tio

n T

ime LibSVMR (Cold Start)

LibSVMR (Warm Start)AOSVR

Figure 5 Semilog and linear plots of prediction time of yearly sunspot timeseries

more SVs will be involved in SVR predictor A complicated SVR predictorimposes both memory and computation stress on the prediction systemOne way to deal with this problem is to impose a ldquoforgettingrdquo time WWhen training set AOB grows to this maximum W then the decrementalalgorithm is used to remove the oldest sample before the next new sampleis added to the training set

We note that this variant of the on-line prediction scenario is also poten-tially suitable for nonstationary time series as it can be updated in real timeto t the most recent behavior of the time series More rigorous investigationin this direction will be a future effort

53 Leave-One-Out Cross-Validation Cross-validation is a useful toolfor assessing the generalization ability of a machine learning algorithm Theidea is to train on one subset of the data and then to test the accuracy ofthe predictor on a separate disjoint subset In leave-one-out cross-validation(LOOCV) only a single sample is used for testing and all the rest are usedfor training Generally this is repeated for every sample in the data setWhen the batch SVR is employed LOOCV can be very expensive sincea fullretraining is done for each sample One compromise approach is to estimateLOOCV with related but less expensive approximations such as the xi-

2698 J Ma J Theiler and S Perkins

alpha bound (Joachims 2000) and approximate span bound (Vapnik ampChapelle 1999) Although Lee and Lin (2001) proposed a numerical solutionto reduce the computation for directly implementing LOOCV the amount ofcomputation required is still considerable Also the accuracy of the LOOCVresult obtained using this method can be potentially compromised becausea different parameter set is employed in LOOCV and in the nal training

The decremental algorithm of AOSVR provides an efcient implemen-tation of LOOCV for SVR

1 Given a data set D construct the SVR function f x from the wholedata set D using batch SVR learning algorithm

2 For each nonsupport vector xi in the data set D calculate error eicorresponding to xi as ei D yi iexcl f xi where yi is the target valuecorresponding to xi

3 For each support vector xi involved in the SVR function f x

(a) Unlearn xi from the SVR function f x using the decrementalalgorithm to obtain the SVR function fix which would beconstructed from the data set Di D D=fxig

(b) Calculate error ei corresponding to support vector xi as ei Dyi iexcl fixi where yi is the target value corresponding to xi

4 Knowing the error for each sample xi in D it is possible to constructa variety of overall measures a simple choice is the MSE

MSELOOCVD D1N

NX

ie2i (51)

where N is the number of samples in data set D Other choices oferror metric such as MAE can be obtained by altering equation 51appropriately

531 Experiment The algorithm parameters in this experiment are setthe same as those in the experiments in section 511 Two famous regressiondata sets the Auto-MPG and Boston Housing data sets are chosen from theUCI Machine-Learning Repository Some basic information on these datasets is listed in Table 3

Table 3 Information Regarding Experimental Regression Data Sets

Number of Attributes Number of Samples SV Ratio

Auto-MPG 7 392 4107Boston Housing 13 506 3636

Accurate On-line Support Vector Regression 2699

200 220 240 260 280 300 320 340 360 380

102

LOOCV Time Comparison of Autoshy MPG Dataset

Samples in Training Set

LO

OC

V T

ime

(sec

on

d)

LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

300 350 400 450 500

101

102

LOOCV Time Comparison of Boston Housing Dataset

Samples in Training Set

LO

OC

V T

ime

(sec

on

d)

Figure 6 Semilog plots of LOOCV time of Auto-MPG and Boston Housing dataset

The experimental results of both data sets are presented in Figure 6 Thex-axis is the size of the training set upon which the LOOCV is implementedThese plots show that AOSVR-based LOOCV is much faster than its Lib-SVMR counterpart

6 Conclusion

We have developed and implemented an accurate on-line support vectorregression (AOSVR) algorithm that permits efcient retraining when a newsample is added to or an existing sample is removed from the training setAOSVR is applied to on-line time-series prediction and to leave-one-outcross-validation and the experimental results demonstrate that the AOSVRalgorithm is more efcient than conventional batch SVR in these scenariosMoreover AOSVR appears less sensitive than batch SVR to the difculty ofthe underlying problem

After this manuscript was prepared we were made aware of a similar on-line SVR algorithm which was independently presented in Martin (2002)

2700 J Ma J Theiler and S Perkins

Appendix Pseudocode for Incrementing AOSVR with a New DataSample

Inputs

sup2 Training set T D fxi yi i D 1 lgsup2 Coefcients fmicroi i D 1 lg and bias b

sup2 Partition of samples into sets S E and R

sup2 Matrix R dened in equation 311

sup2 New sample xc yc

Outputs

sup2 Updated coefcients fmicroi i D 1 l C 1g and bias b

sup2 Updated Matrix R

sup2 Updated partition of samples into sets S E and R

AOSVR Incremental Algorithm

sup2 Initialize microc D 0

sup2 Compute f xc DX

i2E[S

microiQic C b

sup2 Compute hxc D f xc iexcl yc

sup2 I jhxcj middot then assign xc to R and terminate

sup2 Let q D signiexclhxc be the sign that 1microc will take

sup2 Do until the new sample xc meets the KKT condition

mdash Update macr deg according to equations 37 and 39mdash Start bookkeeping procedure

Check the new sample xcmdashLc1 D iexclhxc iexcl q=degc (Case 1)mdashLc2 D qC iexcl microc (Case 2)Check each sample xi in the set S (Case 3)mdashIf q i gt 0 and C gt microi cedil 0 LS

i D C iexcl microi=macri

mdashIf q i gt 0 and 0 gt microi cedil iexclC LSi D iexclmicroi=macri

mdashIf q i lt 0 and C cedil microi gt 0 LSi D iexclmicroi=macri

mdashIf q i lt 0 and 0 cedil microi gt iexclC LSi D iexclC iexcl microi=macri

Check each sample xi in the set E (Case 4)mdashLE

i D iexclhxi iexcl signqmacri=macri

Check each sample xi in the set R (Case 5)mdashLR

i D iexclhxi iexcl signqmacri=macri

Accurate On-line Support Vector Regression 2701

Set 1microc D qminjLc1j jLc2j jLSj jLEj jLRj whereLS D fLS

i i 2 Sg LE D fLEi i 2 Eg and LR D fLR

i i 2 RgLet Flag be the case number that determines 1micro

Let xl be the particular sample in T that determines 1microc

mdash End Bookkeeping Proceduremdash Update microc b and microi i 2 S according to equation 36mdash Update hxi i 2 E [ R according to equation 38mdash Switch Flag

(Flag=1)Add new sample xc to set S update matrix R accordingto equation 313(Flag=2)Add new sample xc to set E(Flag=3)If microl D 0 move xl to set R update R according to equa-tion 312If jmicrolj D C move xl to set E update R according toequation 312(Flag=4)Move xl to set S update R according to equation 313(Flag=5)Move xl to set S update R according to equation 313

mdash End Switch Flagmdash If Flagmiddot 2 terminate otherwise continue the Do-Loop

sup2 Terminate incremental algorithm ready for the next sample

Acknowledgments

We thank Chih-Jen Lin in National University of Taiwan for useful sug-gestions on some implementation issues We also thank the anonymousreviewers for pointing us to the work of Martin (2002) and for suggestingthat we compare AOSVR to the warm-start variants of batch algorithmsThis work is supported by the NASA project NRA-00-01-AISR-088 and bythe Los Alamos Laboratory Directed Research and Development (LDRD)program

References

Cauwenberghs G amp Poggio T (2001) Incremental and decremental supportvector machine learning In T K Leen T G Dietterich amp V Tresp (Eds)

2702 J Ma J Theiler and S Perkins

Advances in neural informationprocessingsystems 13 (pp409ndash123)CambridgeMA MIT Press

Chang C-C amp Lin C-J (2001) LIBSVM a library for support vector machinesSoftware Available on-line at httpwwwcsientuedutwraquocjlinlibsvm

Chang C-C amp Lin C-J (2002) Training ordm-support vector regression Theoryand algorithms Neural Computation 14 1959ndash1977

CsatoL amp Opper M (2001)Sparse representation for gaussian process modelsIn T K Leen T G Dietterich amp V Tresp (Eds) Advances in neural informationprocessing systems 13 (pp 444ndash450) Cambridge MA MIT Press

Fernandez R (1999) Predicting time series with a local support vector regres-sion machine InAdvancedCourse onArticial Intelligence (ACAI rsquo99) Availableon-line at httpwwwiitdemokritosgrskeleetnacai99

Fliege J amp Heseler A (2002) Constructing approximations to the ef-cient set of convex quadratic multiobjective problems Dortmund GermanyFachbereich Mathematik Universitat Dortmund Available on-line athttpwwwoptimization-onlineorgDB HTML200201426html

Friedel RH Reeves G D amp Obara T (2002) Relativistic electron dynamics inthe inner magnetospheremdasha review Journal of Atmosphericand Solar-TerrestrialPhysics 64 265ndash282

Gentile C (2001)A new approximate maximal margin classication algorithmJournal of Machine Learning Research 2 213ndash242

Gondzio J (1998)Warm start of the primal-dual method applied in the cuttingplane scheme Mathematical Programming 83 125ndash143

Gondzio J amp Grothey A (2001) Reoptimization with the primal-dual interiorpoint method SIAM Journal on Optimization 13 842ndash864

Graepel T Herbrich R amp Williamson R C (2001) From margin to sparsityIn T K Leen T G Dietterich amp V Tresp (Eds) Advances in neural informationprocessing systems 13 (pp 210ndash216) Cambridge MA MIT Press

Gunn S (1998) Matlab SVM toolbox Software Available on-line at httpwwwisisecssotonacukresourcessvminfo

Herbster M (2001) Learning additive models online with fast evaluating ker-nels In D P Helmbold amp B Williamson (Eds) Proceedings of the 14th An-nual Conference on Computational Learning Theory (pp 444ndash460) New YorkSpringer-Verlag

Joachims T (2000) Estimating the generalization performance of an SVM ef-ciently In P Langley (Ed) Proceedings of the Seventeenth International Confer-ence on Machine Learning (pp 431ndash438) San Mateo Morgan Kaufmann

Kivinen J Smola A Jamp Williamson R C (2002)Online learning with kernelsIn T G Dietterich S Becker amp Z Ghahramani (Eds) Advances in neuralinformation processing systems 14 (pp 785ndash792) Cambridge MA MIT Press

Lee J-H amp Lin C-J (2001)Automatic model selection for support vector machines(Tech Rep) Taipei Taiwan Department of Computer Science and Informa-tion Engineering National Taiwan University

Li Y amp Long P M (1999) The relaxed online maximum margin algorithm InS A Solla T K Leen amp K-R Muller (Eds) Advances in neural informationprocessing systems 12 (pp 498ndash504) Cambridge MA MIT Press

Accurate On-line Support Vector Regression 2703

Mackey M C amp Glass L (1977)Oscillation and chaos in physiological controlsystems Science 197 287ndash289

Martin M (2002)On-line supportvectormachines for functionapproximation(TechRep LSI-02-11-R)Catalunya Spain Software Department Universitat Po-litecnica de Catalunya

Muller K-R Smola A J Ratsch G Scholkopf B Kohlmorgen J amp VapnikV (1997)Predicting time series with support vector machines In W Gerstner(Ed) Articial Neural NetworksmdashICANN rsquo97 (pp 999ndash1004) Berlin Springer-Verlag

Ralaivola L amp drsquoAlche-Buc F (2001) Incremental support vector machinelearning A local approach In G Dorffner H Bischof amp K Hornik (Eds) Ar-ticial Neural NetworksmdashICANN 2001 (pp 322ndash330) Berlin Springer-Verlag

Shevade S K Keerthi S S Bhattacharyya C amp Murthy K R K (1999)Improvements to SMO algorithm for SVM regression (Tech Rep No CD-99-16)Singapore National University of Singapore

Smola A (1998) Interior point optimizer for SVM pattern recognition SoftwareAvailable on-line at httpwwwkernel-machinesorgcodeprloqotargz

Smola A J amp Scholkopf B (1998) A tutorial on support vector regression (Neu-roCOLT Tech Rep No NC-TR-98-030) London Royal Holloway CollegeUniversity of London

Syed N A Liu H amp Sung K K (1999) Incremental learning with supportvector machines In Proceedings of the Workshop on Support Vector Machines atthe International Joint Conference on Articial IntelligencemdashIJCAI-99 San MateoMorgan Kaufmann

Tashman L J (2000) Out-of-sample tests of forecasting accuracy An analysisand review International Journal of Forecasting 16 437ndash450

Tay F E H amp Cao L (2001)Application of support vectormachines in nancialtime series forcasting Omega 29 309ndash317

Vanderbei R J (1999) LOQO An interior point code for quadratic program-ming Optimization Methods and Software 11 451ndash484

Vapnik V (1998) Statistical learning theory New York WileyVapnik V amp Chapelle O (1999)Bounds on error expectation for support vector

machine In A Smola P Bartlett B Scholkopf amp D Schuurmans (Eds)Advances in large margin classiers (pp 261ndash280) Cambridge MA MIT Press

Weigend A S amp Gershenfeld N A (1994)Time-series predictionForecasting thefuture and understanding the past Reading MA Addison-Wesley

Yildirim E A amp Wright S J (2002) Warm-start strategies in interior-pointmethods for linear programming SIAM Journal on Optimization 12 782ndash810

Received September 12 2002 accepted May 1 2003

Page 16: Accurate On-line Support Vector Regression · 2006. 8. 28. · Accurate On-line Support Vector Regression 2689 In the special case whenSset is empty, according to equations 3.1 and

2698 J Ma J Theiler and S Perkins

alpha bound (Joachims 2000) and approximate span bound (Vapnik ampChapelle 1999) Although Lee and Lin (2001) proposed a numerical solutionto reduce the computation for directly implementing LOOCV the amount ofcomputation required is still considerable Also the accuracy of the LOOCVresult obtained using this method can be potentially compromised becausea different parameter set is employed in LOOCV and in the nal training

The decremental algorithm of AOSVR provides an efcient implemen-tation of LOOCV for SVR

1 Given a data set D construct the SVR function f x from the wholedata set D using batch SVR learning algorithm

2 For each nonsupport vector xi in the data set D calculate error eicorresponding to xi as ei D yi iexcl f xi where yi is the target valuecorresponding to xi

3 For each support vector xi involved in the SVR function f x

(a) Unlearn xi from the SVR function f x using the decrementalalgorithm to obtain the SVR function fix which would beconstructed from the data set Di D D=fxig

(b) Calculate error ei corresponding to support vector xi as ei Dyi iexcl fixi where yi is the target value corresponding to xi

4 Knowing the error for each sample xi in D it is possible to constructa variety of overall measures a simple choice is the MSE

MSELOOCVD D1N

NX

ie2i (51)

where N is the number of samples in data set D Other choices oferror metric such as MAE can be obtained by altering equation 51appropriately

531 Experiment The algorithm parameters in this experiment are setthe same as those in the experiments in section 511 Two famous regressiondata sets the Auto-MPG and Boston Housing data sets are chosen from theUCI Machine-Learning Repository Some basic information on these datasets is listed in Table 3

Table 3 Information Regarding Experimental Regression Data Sets

Number of Attributes Number of Samples SV Ratio

Auto-MPG 7 392 4107Boston Housing 13 506 3636

Accurate On-line Support Vector Regression 2699

200 220 240 260 280 300 320 340 360 380

102

LOOCV Time Comparison of Autoshy MPG Dataset

Samples in Training Set

LO

OC

V T

ime

(sec

on

d)

LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

300 350 400 450 500

101

102

LOOCV Time Comparison of Boston Housing Dataset

Samples in Training Set

LO

OC

V T

ime

(sec

on

d)

Figure 6 Semilog plots of LOOCV time of Auto-MPG and Boston Housing dataset

The experimental results of both data sets are presented in Figure 6 Thex-axis is the size of the training set upon which the LOOCV is implementedThese plots show that AOSVR-based LOOCV is much faster than its Lib-SVMR counterpart

6 Conclusion

We have developed and implemented an accurate on-line support vectorregression (AOSVR) algorithm that permits efcient retraining when a newsample is added to or an existing sample is removed from the training setAOSVR is applied to on-line time-series prediction and to leave-one-outcross-validation and the experimental results demonstrate that the AOSVRalgorithm is more efcient than conventional batch SVR in these scenariosMoreover AOSVR appears less sensitive than batch SVR to the difculty ofthe underlying problem

After this manuscript was prepared we were made aware of a similar on-line SVR algorithm which was independently presented in Martin (2002)

2700 J Ma J Theiler and S Perkins

Appendix Pseudocode for Incrementing AOSVR with a New DataSample

Inputs

sup2 Training set T D fxi yi i D 1 lgsup2 Coefcients fmicroi i D 1 lg and bias b

sup2 Partition of samples into sets S E and R

sup2 Matrix R dened in equation 311

sup2 New sample xc yc

Outputs

sup2 Updated coefcients fmicroi i D 1 l C 1g and bias b

sup2 Updated Matrix R

sup2 Updated partition of samples into sets S E and R

AOSVR Incremental Algorithm

sup2 Initialize microc D 0

sup2 Compute f xc DX

i2E[S

microiQic C b

sup2 Compute hxc D f xc iexcl yc

sup2 I jhxcj middot then assign xc to R and terminate

sup2 Let q D signiexclhxc be the sign that 1microc will take

sup2 Do until the new sample xc meets the KKT condition

mdash Update macr deg according to equations 37 and 39mdash Start bookkeeping procedure

Check the new sample xcmdashLc1 D iexclhxc iexcl q=degc (Case 1)mdashLc2 D qC iexcl microc (Case 2)Check each sample xi in the set S (Case 3)mdashIf q i gt 0 and C gt microi cedil 0 LS

i D C iexcl microi=macri

mdashIf q i gt 0 and 0 gt microi cedil iexclC LSi D iexclmicroi=macri

mdashIf q i lt 0 and C cedil microi gt 0 LSi D iexclmicroi=macri

mdashIf q i lt 0 and 0 cedil microi gt iexclC LSi D iexclC iexcl microi=macri

Check each sample xi in the set E (Case 4)mdashLE

i D iexclhxi iexcl signqmacri=macri

Check each sample xi in the set R (Case 5)mdashLR

i D iexclhxi iexcl signqmacri=macri

Accurate On-line Support Vector Regression 2701

Set 1microc D qminjLc1j jLc2j jLSj jLEj jLRj whereLS D fLS

i i 2 Sg LE D fLEi i 2 Eg and LR D fLR

i i 2 RgLet Flag be the case number that determines 1micro

Let xl be the particular sample in T that determines 1microc

mdash End Bookkeeping Proceduremdash Update microc b and microi i 2 S according to equation 36mdash Update hxi i 2 E [ R according to equation 38mdash Switch Flag

(Flag=1)Add new sample xc to set S update matrix R accordingto equation 313(Flag=2)Add new sample xc to set E(Flag=3)If microl D 0 move xl to set R update R according to equa-tion 312If jmicrolj D C move xl to set E update R according toequation 312(Flag=4)Move xl to set S update R according to equation 313(Flag=5)Move xl to set S update R according to equation 313

mdash End Switch Flagmdash If Flagmiddot 2 terminate otherwise continue the Do-Loop

sup2 Terminate incremental algorithm ready for the next sample

Acknowledgments

We thank Chih-Jen Lin in National University of Taiwan for useful sug-gestions on some implementation issues We also thank the anonymousreviewers for pointing us to the work of Martin (2002) and for suggestingthat we compare AOSVR to the warm-start variants of batch algorithmsThis work is supported by the NASA project NRA-00-01-AISR-088 and bythe Los Alamos Laboratory Directed Research and Development (LDRD)program

References

Cauwenberghs G amp Poggio T (2001) Incremental and decremental supportvector machine learning In T K Leen T G Dietterich amp V Tresp (Eds)

2702 J Ma J Theiler and S Perkins

Advances in neural informationprocessingsystems 13 (pp409ndash123)CambridgeMA MIT Press

Chang C-C amp Lin C-J (2001) LIBSVM a library for support vector machinesSoftware Available on-line at httpwwwcsientuedutwraquocjlinlibsvm

Chang C-C amp Lin C-J (2002) Training ordm-support vector regression Theoryand algorithms Neural Computation 14 1959ndash1977

CsatoL amp Opper M (2001)Sparse representation for gaussian process modelsIn T K Leen T G Dietterich amp V Tresp (Eds) Advances in neural informationprocessing systems 13 (pp 444ndash450) Cambridge MA MIT Press

Fernandez R (1999) Predicting time series with a local support vector regres-sion machine InAdvancedCourse onArticial Intelligence (ACAI rsquo99) Availableon-line at httpwwwiitdemokritosgrskeleetnacai99

Fliege J amp Heseler A (2002) Constructing approximations to the ef-cient set of convex quadratic multiobjective problems Dortmund GermanyFachbereich Mathematik Universitat Dortmund Available on-line athttpwwwoptimization-onlineorgDB HTML200201426html

Friedel RH Reeves G D amp Obara T (2002) Relativistic electron dynamics inthe inner magnetospheremdasha review Journal of Atmosphericand Solar-TerrestrialPhysics 64 265ndash282

Gentile C (2001)A new approximate maximal margin classication algorithmJournal of Machine Learning Research 2 213ndash242

Gondzio J (1998)Warm start of the primal-dual method applied in the cuttingplane scheme Mathematical Programming 83 125ndash143

Gondzio J amp Grothey A (2001) Reoptimization with the primal-dual interiorpoint method SIAM Journal on Optimization 13 842ndash864

Graepel T Herbrich R amp Williamson R C (2001) From margin to sparsityIn T K Leen T G Dietterich amp V Tresp (Eds) Advances in neural informationprocessing systems 13 (pp 210ndash216) Cambridge MA MIT Press

Gunn S (1998) Matlab SVM toolbox Software Available on-line at httpwwwisisecssotonacukresourcessvminfo

Herbster M (2001) Learning additive models online with fast evaluating ker-nels In D P Helmbold amp B Williamson (Eds) Proceedings of the 14th An-nual Conference on Computational Learning Theory (pp 444ndash460) New YorkSpringer-Verlag

Joachims T (2000) Estimating the generalization performance of an SVM ef-ciently In P Langley (Ed) Proceedings of the Seventeenth International Confer-ence on Machine Learning (pp 431ndash438) San Mateo Morgan Kaufmann

Kivinen J Smola A Jamp Williamson R C (2002)Online learning with kernelsIn T G Dietterich S Becker amp Z Ghahramani (Eds) Advances in neuralinformation processing systems 14 (pp 785ndash792) Cambridge MA MIT Press

Lee J-H amp Lin C-J (2001)Automatic model selection for support vector machines(Tech Rep) Taipei Taiwan Department of Computer Science and Informa-tion Engineering National Taiwan University

Li Y amp Long P M (1999) The relaxed online maximum margin algorithm InS A Solla T K Leen amp K-R Muller (Eds) Advances in neural informationprocessing systems 12 (pp 498ndash504) Cambridge MA MIT Press

Accurate On-line Support Vector Regression 2703

Mackey M C amp Glass L (1977)Oscillation and chaos in physiological controlsystems Science 197 287ndash289

Martin M (2002)On-line supportvectormachines for functionapproximation(TechRep LSI-02-11-R)Catalunya Spain Software Department Universitat Po-litecnica de Catalunya

Muller K-R Smola A J Ratsch G Scholkopf B Kohlmorgen J amp VapnikV (1997)Predicting time series with support vector machines In W Gerstner(Ed) Articial Neural NetworksmdashICANN rsquo97 (pp 999ndash1004) Berlin Springer-Verlag

Ralaivola L amp drsquoAlche-Buc F (2001) Incremental support vector machinelearning A local approach In G Dorffner H Bischof amp K Hornik (Eds) Ar-ticial Neural NetworksmdashICANN 2001 (pp 322ndash330) Berlin Springer-Verlag

Shevade S K Keerthi S S Bhattacharyya C amp Murthy K R K (1999)Improvements to SMO algorithm for SVM regression (Tech Rep No CD-99-16)Singapore National University of Singapore

Smola A (1998) Interior point optimizer for SVM pattern recognition SoftwareAvailable on-line at httpwwwkernel-machinesorgcodeprloqotargz

Smola A J amp Scholkopf B (1998) A tutorial on support vector regression (Neu-roCOLT Tech Rep No NC-TR-98-030) London Royal Holloway CollegeUniversity of London

Syed N A Liu H amp Sung K K (1999) Incremental learning with supportvector machines In Proceedings of the Workshop on Support Vector Machines atthe International Joint Conference on Articial IntelligencemdashIJCAI-99 San MateoMorgan Kaufmann

Tashman L J (2000) Out-of-sample tests of forecasting accuracy An analysisand review International Journal of Forecasting 16 437ndash450

Tay F E H amp Cao L (2001)Application of support vectormachines in nancialtime series forcasting Omega 29 309ndash317

Vanderbei R J (1999) LOQO An interior point code for quadratic program-ming Optimization Methods and Software 11 451ndash484

Vapnik V (1998) Statistical learning theory New York WileyVapnik V amp Chapelle O (1999)Bounds on error expectation for support vector

machine In A Smola P Bartlett B Scholkopf amp D Schuurmans (Eds)Advances in large margin classiers (pp 261ndash280) Cambridge MA MIT Press

Weigend A S amp Gershenfeld N A (1994)Time-series predictionForecasting thefuture and understanding the past Reading MA Addison-Wesley

Yildirim E A amp Wright S J (2002) Warm-start strategies in interior-pointmethods for linear programming SIAM Journal on Optimization 12 782ndash810

Received September 12 2002 accepted May 1 2003

Page 17: Accurate On-line Support Vector Regression · 2006. 8. 28. · Accurate On-line Support Vector Regression 2689 In the special case whenSset is empty, according to equations 3.1 and

Accurate On-line Support Vector Regression 2699

200 220 240 260 280 300 320 340 360 380

102

LOOCV Time Comparison of Autoshy MPG Dataset

Samples in Training Set

LO

OC

V T

ime

(sec

on

d)

LibSVMR (Cold Start)LibSVMR (Warm Start)AOSVR

300 350 400 450 500

101

102

LOOCV Time Comparison of Boston Housing Dataset

Samples in Training Set

LO

OC

V T

ime

(sec

on

d)

Figure 6 Semilog plots of LOOCV time of Auto-MPG and Boston Housing dataset

The experimental results of both data sets are presented in Figure 6 Thex-axis is the size of the training set upon which the LOOCV is implementedThese plots show that AOSVR-based LOOCV is much faster than its Lib-SVMR counterpart

6 Conclusion

We have developed and implemented an accurate on-line support vectorregression (AOSVR) algorithm that permits efcient retraining when a newsample is added to or an existing sample is removed from the training setAOSVR is applied to on-line time-series prediction and to leave-one-outcross-validation and the experimental results demonstrate that the AOSVRalgorithm is more efcient than conventional batch SVR in these scenariosMoreover AOSVR appears less sensitive than batch SVR to the difculty ofthe underlying problem

After this manuscript was prepared we were made aware of a similar on-line SVR algorithm which was independently presented in Martin (2002)

2700 J Ma J Theiler and S Perkins

Appendix Pseudocode for Incrementing AOSVR with a New DataSample

Inputs

sup2 Training set T D fxi yi i D 1 lgsup2 Coefcients fmicroi i D 1 lg and bias b

sup2 Partition of samples into sets S E and R

sup2 Matrix R dened in equation 311

sup2 New sample xc yc

Outputs

sup2 Updated coefcients fmicroi i D 1 l C 1g and bias b

sup2 Updated Matrix R

sup2 Updated partition of samples into sets S E and R

AOSVR Incremental Algorithm

sup2 Initialize microc D 0

sup2 Compute f xc DX

i2E[S

microiQic C b

sup2 Compute hxc D f xc iexcl yc

sup2 I jhxcj middot then assign xc to R and terminate

sup2 Let q D signiexclhxc be the sign that 1microc will take

sup2 Do until the new sample xc meets the KKT condition

mdash Update macr deg according to equations 37 and 39mdash Start bookkeeping procedure

Check the new sample xcmdashLc1 D iexclhxc iexcl q=degc (Case 1)mdashLc2 D qC iexcl microc (Case 2)Check each sample xi in the set S (Case 3)mdashIf q i gt 0 and C gt microi cedil 0 LS

i D C iexcl microi=macri

mdashIf q i gt 0 and 0 gt microi cedil iexclC LSi D iexclmicroi=macri

mdashIf q i lt 0 and C cedil microi gt 0 LSi D iexclmicroi=macri

mdashIf q i lt 0 and 0 cedil microi gt iexclC LSi D iexclC iexcl microi=macri

Check each sample xi in the set E (Case 4)mdashLE

i D iexclhxi iexcl signqmacri=macri

Check each sample xi in the set R (Case 5)mdashLR

i D iexclhxi iexcl signqmacri=macri

Accurate On-line Support Vector Regression 2701

Set 1microc D qminjLc1j jLc2j jLSj jLEj jLRj whereLS D fLS

i i 2 Sg LE D fLEi i 2 Eg and LR D fLR

i i 2 RgLet Flag be the case number that determines 1micro

Let xl be the particular sample in T that determines 1microc

mdash End Bookkeeping Proceduremdash Update microc b and microi i 2 S according to equation 36mdash Update hxi i 2 E [ R according to equation 38mdash Switch Flag

(Flag=1)Add new sample xc to set S update matrix R accordingto equation 313(Flag=2)Add new sample xc to set E(Flag=3)If microl D 0 move xl to set R update R according to equa-tion 312If jmicrolj D C move xl to set E update R according toequation 312(Flag=4)Move xl to set S update R according to equation 313(Flag=5)Move xl to set S update R according to equation 313

mdash End Switch Flagmdash If Flagmiddot 2 terminate otherwise continue the Do-Loop

sup2 Terminate incremental algorithm ready for the next sample

Acknowledgments

We thank Chih-Jen Lin in National University of Taiwan for useful sug-gestions on some implementation issues We also thank the anonymousreviewers for pointing us to the work of Martin (2002) and for suggestingthat we compare AOSVR to the warm-start variants of batch algorithmsThis work is supported by the NASA project NRA-00-01-AISR-088 and bythe Los Alamos Laboratory Directed Research and Development (LDRD)program

References

Cauwenberghs G amp Poggio T (2001) Incremental and decremental supportvector machine learning In T K Leen T G Dietterich amp V Tresp (Eds)

2702 J Ma J Theiler and S Perkins

Advances in neural informationprocessingsystems 13 (pp409ndash123)CambridgeMA MIT Press

Chang C-C amp Lin C-J (2001) LIBSVM a library for support vector machinesSoftware Available on-line at httpwwwcsientuedutwraquocjlinlibsvm

Chang C-C amp Lin C-J (2002) Training ordm-support vector regression Theoryand algorithms Neural Computation 14 1959ndash1977

CsatoL amp Opper M (2001)Sparse representation for gaussian process modelsIn T K Leen T G Dietterich amp V Tresp (Eds) Advances in neural informationprocessing systems 13 (pp 444ndash450) Cambridge MA MIT Press

Fernandez R (1999) Predicting time series with a local support vector regres-sion machine InAdvancedCourse onArticial Intelligence (ACAI rsquo99) Availableon-line at httpwwwiitdemokritosgrskeleetnacai99

Fliege J amp Heseler A (2002) Constructing approximations to the ef-cient set of convex quadratic multiobjective problems Dortmund GermanyFachbereich Mathematik Universitat Dortmund Available on-line athttpwwwoptimization-onlineorgDB HTML200201426html

Friedel RH Reeves G D amp Obara T (2002) Relativistic electron dynamics inthe inner magnetospheremdasha review Journal of Atmosphericand Solar-TerrestrialPhysics 64 265ndash282

Gentile C (2001)A new approximate maximal margin classication algorithmJournal of Machine Learning Research 2 213ndash242

Gondzio J (1998)Warm start of the primal-dual method applied in the cuttingplane scheme Mathematical Programming 83 125ndash143

Gondzio J amp Grothey A (2001) Reoptimization with the primal-dual interiorpoint method SIAM Journal on Optimization 13 842ndash864

Graepel T Herbrich R amp Williamson R C (2001) From margin to sparsityIn T K Leen T G Dietterich amp V Tresp (Eds) Advances in neural informationprocessing systems 13 (pp 210ndash216) Cambridge MA MIT Press

Gunn S (1998) Matlab SVM toolbox Software Available on-line at httpwwwisisecssotonacukresourcessvminfo

Herbster M (2001) Learning additive models online with fast evaluating ker-nels In D P Helmbold amp B Williamson (Eds) Proceedings of the 14th An-nual Conference on Computational Learning Theory (pp 444ndash460) New YorkSpringer-Verlag

Joachims T (2000) Estimating the generalization performance of an SVM ef-ciently In P Langley (Ed) Proceedings of the Seventeenth International Confer-ence on Machine Learning (pp 431ndash438) San Mateo Morgan Kaufmann

Kivinen J Smola A Jamp Williamson R C (2002)Online learning with kernelsIn T G Dietterich S Becker amp Z Ghahramani (Eds) Advances in neuralinformation processing systems 14 (pp 785ndash792) Cambridge MA MIT Press

Lee J-H amp Lin C-J (2001)Automatic model selection for support vector machines(Tech Rep) Taipei Taiwan Department of Computer Science and Informa-tion Engineering National Taiwan University

Li Y amp Long P M (1999) The relaxed online maximum margin algorithm InS A Solla T K Leen amp K-R Muller (Eds) Advances in neural informationprocessing systems 12 (pp 498ndash504) Cambridge MA MIT Press

Accurate On-line Support Vector Regression 2703

Mackey M C amp Glass L (1977)Oscillation and chaos in physiological controlsystems Science 197 287ndash289

Martin M (2002)On-line supportvectormachines for functionapproximation(TechRep LSI-02-11-R)Catalunya Spain Software Department Universitat Po-litecnica de Catalunya

Muller K-R Smola A J Ratsch G Scholkopf B Kohlmorgen J amp VapnikV (1997)Predicting time series with support vector machines In W Gerstner(Ed) Articial Neural NetworksmdashICANN rsquo97 (pp 999ndash1004) Berlin Springer-Verlag

Ralaivola L amp drsquoAlche-Buc F (2001) Incremental support vector machinelearning A local approach In G Dorffner H Bischof amp K Hornik (Eds) Ar-ticial Neural NetworksmdashICANN 2001 (pp 322ndash330) Berlin Springer-Verlag

Shevade S K Keerthi S S Bhattacharyya C amp Murthy K R K (1999)Improvements to SMO algorithm for SVM regression (Tech Rep No CD-99-16)Singapore National University of Singapore

Smola A (1998) Interior point optimizer for SVM pattern recognition SoftwareAvailable on-line at httpwwwkernel-machinesorgcodeprloqotargz

Smola A J amp Scholkopf B (1998) A tutorial on support vector regression (Neu-roCOLT Tech Rep No NC-TR-98-030) London Royal Holloway CollegeUniversity of London

Syed N A Liu H amp Sung K K (1999) Incremental learning with supportvector machines In Proceedings of the Workshop on Support Vector Machines atthe International Joint Conference on Articial IntelligencemdashIJCAI-99 San MateoMorgan Kaufmann

Tashman L J (2000) Out-of-sample tests of forecasting accuracy An analysisand review International Journal of Forecasting 16 437ndash450

Tay F E H amp Cao L (2001)Application of support vectormachines in nancialtime series forcasting Omega 29 309ndash317

Vanderbei R J (1999) LOQO An interior point code for quadratic program-ming Optimization Methods and Software 11 451ndash484

Vapnik V (1998) Statistical learning theory New York WileyVapnik V amp Chapelle O (1999)Bounds on error expectation for support vector

machine In A Smola P Bartlett B Scholkopf amp D Schuurmans (Eds)Advances in large margin classiers (pp 261ndash280) Cambridge MA MIT Press

Weigend A S amp Gershenfeld N A (1994)Time-series predictionForecasting thefuture and understanding the past Reading MA Addison-Wesley

Yildirim E A amp Wright S J (2002) Warm-start strategies in interior-pointmethods for linear programming SIAM Journal on Optimization 12 782ndash810

Received September 12 2002 accepted May 1 2003

Page 18: Accurate On-line Support Vector Regression · 2006. 8. 28. · Accurate On-line Support Vector Regression 2689 In the special case whenSset is empty, according to equations 3.1 and

2700 J Ma J Theiler and S Perkins

Appendix Pseudocode for Incrementing AOSVR with a New DataSample

Inputs

sup2 Training set T D fxi yi i D 1 lgsup2 Coefcients fmicroi i D 1 lg and bias b

sup2 Partition of samples into sets S E and R

sup2 Matrix R dened in equation 311

sup2 New sample xc yc

Outputs

sup2 Updated coefcients fmicroi i D 1 l C 1g and bias b

sup2 Updated Matrix R

sup2 Updated partition of samples into sets S E and R

AOSVR Incremental Algorithm

sup2 Initialize microc D 0

sup2 Compute f xc DX

i2E[S

microiQic C b

sup2 Compute hxc D f xc iexcl yc

sup2 I jhxcj middot then assign xc to R and terminate

sup2 Let q D signiexclhxc be the sign that 1microc will take

sup2 Do until the new sample xc meets the KKT condition

mdash Update macr deg according to equations 37 and 39mdash Start bookkeeping procedure

Check the new sample xcmdashLc1 D iexclhxc iexcl q=degc (Case 1)mdashLc2 D qC iexcl microc (Case 2)Check each sample xi in the set S (Case 3)mdashIf q i gt 0 and C gt microi cedil 0 LS

i D C iexcl microi=macri

mdashIf q i gt 0 and 0 gt microi cedil iexclC LSi D iexclmicroi=macri

mdashIf q i lt 0 and C cedil microi gt 0 LSi D iexclmicroi=macri

mdashIf q i lt 0 and 0 cedil microi gt iexclC LSi D iexclC iexcl microi=macri

Check each sample xi in the set E (Case 4)mdashLE

i D iexclhxi iexcl signqmacri=macri

Check each sample xi in the set R (Case 5)mdashLR

i D iexclhxi iexcl signqmacri=macri

Accurate On-line Support Vector Regression 2701

Set 1microc D qminjLc1j jLc2j jLSj jLEj jLRj whereLS D fLS

i i 2 Sg LE D fLEi i 2 Eg and LR D fLR

i i 2 RgLet Flag be the case number that determines 1micro

Let xl be the particular sample in T that determines 1microc

mdash End Bookkeeping Proceduremdash Update microc b and microi i 2 S according to equation 36mdash Update hxi i 2 E [ R according to equation 38mdash Switch Flag

(Flag=1)Add new sample xc to set S update matrix R accordingto equation 313(Flag=2)Add new sample xc to set E(Flag=3)If microl D 0 move xl to set R update R according to equa-tion 312If jmicrolj D C move xl to set E update R according toequation 312(Flag=4)Move xl to set S update R according to equation 313(Flag=5)Move xl to set S update R according to equation 313

mdash End Switch Flagmdash If Flagmiddot 2 terminate otherwise continue the Do-Loop

sup2 Terminate incremental algorithm ready for the next sample

Acknowledgments

We thank Chih-Jen Lin in National University of Taiwan for useful sug-gestions on some implementation issues We also thank the anonymousreviewers for pointing us to the work of Martin (2002) and for suggestingthat we compare AOSVR to the warm-start variants of batch algorithmsThis work is supported by the NASA project NRA-00-01-AISR-088 and bythe Los Alamos Laboratory Directed Research and Development (LDRD)program

References

Cauwenberghs G amp Poggio T (2001) Incremental and decremental supportvector machine learning In T K Leen T G Dietterich amp V Tresp (Eds)

2702 J Ma J Theiler and S Perkins

Advances in neural informationprocessingsystems 13 (pp409ndash123)CambridgeMA MIT Press

Chang C-C amp Lin C-J (2001) LIBSVM a library for support vector machinesSoftware Available on-line at httpwwwcsientuedutwraquocjlinlibsvm

Chang C-C amp Lin C-J (2002) Training ordm-support vector regression Theoryand algorithms Neural Computation 14 1959ndash1977

CsatoL amp Opper M (2001)Sparse representation for gaussian process modelsIn T K Leen T G Dietterich amp V Tresp (Eds) Advances in neural informationprocessing systems 13 (pp 444ndash450) Cambridge MA MIT Press

Fernandez R (1999) Predicting time series with a local support vector regres-sion machine InAdvancedCourse onArticial Intelligence (ACAI rsquo99) Availableon-line at httpwwwiitdemokritosgrskeleetnacai99

Fliege J amp Heseler A (2002) Constructing approximations to the ef-cient set of convex quadratic multiobjective problems Dortmund GermanyFachbereich Mathematik Universitat Dortmund Available on-line athttpwwwoptimization-onlineorgDB HTML200201426html

Friedel RH Reeves G D amp Obara T (2002) Relativistic electron dynamics inthe inner magnetospheremdasha review Journal of Atmosphericand Solar-TerrestrialPhysics 64 265ndash282

Gentile C (2001)A new approximate maximal margin classication algorithmJournal of Machine Learning Research 2 213ndash242

Gondzio J (1998)Warm start of the primal-dual method applied in the cuttingplane scheme Mathematical Programming 83 125ndash143

Gondzio J amp Grothey A (2001) Reoptimization with the primal-dual interiorpoint method SIAM Journal on Optimization 13 842ndash864

Graepel T Herbrich R amp Williamson R C (2001) From margin to sparsityIn T K Leen T G Dietterich amp V Tresp (Eds) Advances in neural informationprocessing systems 13 (pp 210ndash216) Cambridge MA MIT Press

Gunn S (1998) Matlab SVM toolbox Software Available on-line at httpwwwisisecssotonacukresourcessvminfo

Herbster M (2001) Learning additive models online with fast evaluating ker-nels In D P Helmbold amp B Williamson (Eds) Proceedings of the 14th An-nual Conference on Computational Learning Theory (pp 444ndash460) New YorkSpringer-Verlag

Joachims T (2000) Estimating the generalization performance of an SVM ef-ciently In P Langley (Ed) Proceedings of the Seventeenth International Confer-ence on Machine Learning (pp 431ndash438) San Mateo Morgan Kaufmann

Kivinen J Smola A Jamp Williamson R C (2002)Online learning with kernelsIn T G Dietterich S Becker amp Z Ghahramani (Eds) Advances in neuralinformation processing systems 14 (pp 785ndash792) Cambridge MA MIT Press

Lee J-H amp Lin C-J (2001)Automatic model selection for support vector machines(Tech Rep) Taipei Taiwan Department of Computer Science and Informa-tion Engineering National Taiwan University

Li Y amp Long P M (1999) The relaxed online maximum margin algorithm InS A Solla T K Leen amp K-R Muller (Eds) Advances in neural informationprocessing systems 12 (pp 498ndash504) Cambridge MA MIT Press

Accurate On-line Support Vector Regression 2703

Mackey M C amp Glass L (1977)Oscillation and chaos in physiological controlsystems Science 197 287ndash289

Martin M (2002)On-line supportvectormachines for functionapproximation(TechRep LSI-02-11-R)Catalunya Spain Software Department Universitat Po-litecnica de Catalunya

Muller K-R Smola A J Ratsch G Scholkopf B Kohlmorgen J amp VapnikV (1997)Predicting time series with support vector machines In W Gerstner(Ed) Articial Neural NetworksmdashICANN rsquo97 (pp 999ndash1004) Berlin Springer-Verlag

Ralaivola L amp drsquoAlche-Buc F (2001) Incremental support vector machinelearning A local approach In G Dorffner H Bischof amp K Hornik (Eds) Ar-ticial Neural NetworksmdashICANN 2001 (pp 322ndash330) Berlin Springer-Verlag

Shevade S K Keerthi S S Bhattacharyya C amp Murthy K R K (1999)Improvements to SMO algorithm for SVM regression (Tech Rep No CD-99-16)Singapore National University of Singapore

Smola A (1998) Interior point optimizer for SVM pattern recognition SoftwareAvailable on-line at httpwwwkernel-machinesorgcodeprloqotargz

Smola A J amp Scholkopf B (1998) A tutorial on support vector regression (Neu-roCOLT Tech Rep No NC-TR-98-030) London Royal Holloway CollegeUniversity of London

Syed N A Liu H amp Sung K K (1999) Incremental learning with supportvector machines In Proceedings of the Workshop on Support Vector Machines atthe International Joint Conference on Articial IntelligencemdashIJCAI-99 San MateoMorgan Kaufmann

Tashman L J (2000) Out-of-sample tests of forecasting accuracy An analysisand review International Journal of Forecasting 16 437ndash450

Tay F E H amp Cao L (2001)Application of support vectormachines in nancialtime series forcasting Omega 29 309ndash317

Vanderbei R J (1999) LOQO An interior point code for quadratic program-ming Optimization Methods and Software 11 451ndash484

Vapnik V (1998) Statistical learning theory New York WileyVapnik V amp Chapelle O (1999)Bounds on error expectation for support vector

machine In A Smola P Bartlett B Scholkopf amp D Schuurmans (Eds)Advances in large margin classiers (pp 261ndash280) Cambridge MA MIT Press

Weigend A S amp Gershenfeld N A (1994)Time-series predictionForecasting thefuture and understanding the past Reading MA Addison-Wesley

Yildirim E A amp Wright S J (2002) Warm-start strategies in interior-pointmethods for linear programming SIAM Journal on Optimization 12 782ndash810

Received September 12 2002 accepted May 1 2003

Page 19: Accurate On-line Support Vector Regression · 2006. 8. 28. · Accurate On-line Support Vector Regression 2689 In the special case whenSset is empty, according to equations 3.1 and

Accurate On-line Support Vector Regression 2701

Set 1microc D qminjLc1j jLc2j jLSj jLEj jLRj whereLS D fLS

i i 2 Sg LE D fLEi i 2 Eg and LR D fLR

i i 2 RgLet Flag be the case number that determines 1micro

Let xl be the particular sample in T that determines 1microc

mdash End Bookkeeping Proceduremdash Update microc b and microi i 2 S according to equation 36mdash Update hxi i 2 E [ R according to equation 38mdash Switch Flag

(Flag=1)Add new sample xc to set S update matrix R accordingto equation 313(Flag=2)Add new sample xc to set E(Flag=3)If microl D 0 move xl to set R update R according to equa-tion 312If jmicrolj D C move xl to set E update R according toequation 312(Flag=4)Move xl to set S update R according to equation 313(Flag=5)Move xl to set S update R according to equation 313

mdash End Switch Flagmdash If Flagmiddot 2 terminate otherwise continue the Do-Loop

sup2 Terminate incremental algorithm ready for the next sample

Acknowledgments

We thank Chih-Jen Lin in National University of Taiwan for useful sug-gestions on some implementation issues We also thank the anonymousreviewers for pointing us to the work of Martin (2002) and for suggestingthat we compare AOSVR to the warm-start variants of batch algorithmsThis work is supported by the NASA project NRA-00-01-AISR-088 and bythe Los Alamos Laboratory Directed Research and Development (LDRD)program

References

Cauwenberghs G amp Poggio T (2001) Incremental and decremental supportvector machine learning In T K Leen T G Dietterich amp V Tresp (Eds)

2702 J Ma J Theiler and S Perkins

Advances in neural informationprocessingsystems 13 (pp409ndash123)CambridgeMA MIT Press

Chang C-C amp Lin C-J (2001) LIBSVM a library for support vector machinesSoftware Available on-line at httpwwwcsientuedutwraquocjlinlibsvm

Chang C-C amp Lin C-J (2002) Training ordm-support vector regression Theoryand algorithms Neural Computation 14 1959ndash1977

CsatoL amp Opper M (2001)Sparse representation for gaussian process modelsIn T K Leen T G Dietterich amp V Tresp (Eds) Advances in neural informationprocessing systems 13 (pp 444ndash450) Cambridge MA MIT Press

Fernandez R (1999) Predicting time series with a local support vector regres-sion machine InAdvancedCourse onArticial Intelligence (ACAI rsquo99) Availableon-line at httpwwwiitdemokritosgrskeleetnacai99

Fliege J amp Heseler A (2002) Constructing approximations to the ef-cient set of convex quadratic multiobjective problems Dortmund GermanyFachbereich Mathematik Universitat Dortmund Available on-line athttpwwwoptimization-onlineorgDB HTML200201426html

Friedel RH Reeves G D amp Obara T (2002) Relativistic electron dynamics inthe inner magnetospheremdasha review Journal of Atmosphericand Solar-TerrestrialPhysics 64 265ndash282

Gentile C (2001)A new approximate maximal margin classication algorithmJournal of Machine Learning Research 2 213ndash242

Gondzio J (1998)Warm start of the primal-dual method applied in the cuttingplane scheme Mathematical Programming 83 125ndash143

Gondzio J amp Grothey A (2001) Reoptimization with the primal-dual interiorpoint method SIAM Journal on Optimization 13 842ndash864

Graepel T Herbrich R amp Williamson R C (2001) From margin to sparsityIn T K Leen T G Dietterich amp V Tresp (Eds) Advances in neural informationprocessing systems 13 (pp 210ndash216) Cambridge MA MIT Press

Gunn S (1998) Matlab SVM toolbox Software Available on-line at httpwwwisisecssotonacukresourcessvminfo

Herbster M (2001) Learning additive models online with fast evaluating ker-nels In D P Helmbold amp B Williamson (Eds) Proceedings of the 14th An-nual Conference on Computational Learning Theory (pp 444ndash460) New YorkSpringer-Verlag

Joachims T (2000) Estimating the generalization performance of an SVM ef-ciently In P Langley (Ed) Proceedings of the Seventeenth International Confer-ence on Machine Learning (pp 431ndash438) San Mateo Morgan Kaufmann

Kivinen J Smola A Jamp Williamson R C (2002)Online learning with kernelsIn T G Dietterich S Becker amp Z Ghahramani (Eds) Advances in neuralinformation processing systems 14 (pp 785ndash792) Cambridge MA MIT Press

Lee J-H amp Lin C-J (2001)Automatic model selection for support vector machines(Tech Rep) Taipei Taiwan Department of Computer Science and Informa-tion Engineering National Taiwan University

Li Y amp Long P M (1999) The relaxed online maximum margin algorithm InS A Solla T K Leen amp K-R Muller (Eds) Advances in neural informationprocessing systems 12 (pp 498ndash504) Cambridge MA MIT Press

Accurate On-line Support Vector Regression 2703

Mackey M C amp Glass L (1977)Oscillation and chaos in physiological controlsystems Science 197 287ndash289

Martin M (2002)On-line supportvectormachines for functionapproximation(TechRep LSI-02-11-R)Catalunya Spain Software Department Universitat Po-litecnica de Catalunya

Muller K-R Smola A J Ratsch G Scholkopf B Kohlmorgen J amp VapnikV (1997)Predicting time series with support vector machines In W Gerstner(Ed) Articial Neural NetworksmdashICANN rsquo97 (pp 999ndash1004) Berlin Springer-Verlag

Ralaivola L amp drsquoAlche-Buc F (2001) Incremental support vector machinelearning A local approach In G Dorffner H Bischof amp K Hornik (Eds) Ar-ticial Neural NetworksmdashICANN 2001 (pp 322ndash330) Berlin Springer-Verlag

Shevade S K Keerthi S S Bhattacharyya C amp Murthy K R K (1999)Improvements to SMO algorithm for SVM regression (Tech Rep No CD-99-16)Singapore National University of Singapore

Smola A (1998) Interior point optimizer for SVM pattern recognition SoftwareAvailable on-line at httpwwwkernel-machinesorgcodeprloqotargz

Smola A J amp Scholkopf B (1998) A tutorial on support vector regression (Neu-roCOLT Tech Rep No NC-TR-98-030) London Royal Holloway CollegeUniversity of London

Syed N A Liu H amp Sung K K (1999) Incremental learning with supportvector machines In Proceedings of the Workshop on Support Vector Machines atthe International Joint Conference on Articial IntelligencemdashIJCAI-99 San MateoMorgan Kaufmann

Tashman L J (2000) Out-of-sample tests of forecasting accuracy An analysisand review International Journal of Forecasting 16 437ndash450

Tay F E H amp Cao L (2001)Application of support vectormachines in nancialtime series forcasting Omega 29 309ndash317

Vanderbei R J (1999) LOQO An interior point code for quadratic program-ming Optimization Methods and Software 11 451ndash484

Vapnik V (1998) Statistical learning theory New York WileyVapnik V amp Chapelle O (1999)Bounds on error expectation for support vector

machine In A Smola P Bartlett B Scholkopf amp D Schuurmans (Eds)Advances in large margin classiers (pp 261ndash280) Cambridge MA MIT Press

Weigend A S amp Gershenfeld N A (1994)Time-series predictionForecasting thefuture and understanding the past Reading MA Addison-Wesley

Yildirim E A amp Wright S J (2002) Warm-start strategies in interior-pointmethods for linear programming SIAM Journal on Optimization 12 782ndash810

Received September 12 2002 accepted May 1 2003

Page 20: Accurate On-line Support Vector Regression · 2006. 8. 28. · Accurate On-line Support Vector Regression 2689 In the special case whenSset is empty, according to equations 3.1 and

2702 J Ma J Theiler and S Perkins

Advances in neural informationprocessingsystems 13 (pp409ndash123)CambridgeMA MIT Press

Chang C-C amp Lin C-J (2001) LIBSVM a library for support vector machinesSoftware Available on-line at httpwwwcsientuedutwraquocjlinlibsvm

Chang C-C amp Lin C-J (2002) Training ordm-support vector regression Theoryand algorithms Neural Computation 14 1959ndash1977

CsatoL amp Opper M (2001)Sparse representation for gaussian process modelsIn T K Leen T G Dietterich amp V Tresp (Eds) Advances in neural informationprocessing systems 13 (pp 444ndash450) Cambridge MA MIT Press

Fernandez R (1999) Predicting time series with a local support vector regres-sion machine InAdvancedCourse onArticial Intelligence (ACAI rsquo99) Availableon-line at httpwwwiitdemokritosgrskeleetnacai99

Fliege J amp Heseler A (2002) Constructing approximations to the ef-cient set of convex quadratic multiobjective problems Dortmund GermanyFachbereich Mathematik Universitat Dortmund Available on-line athttpwwwoptimization-onlineorgDB HTML200201426html

Friedel RH Reeves G D amp Obara T (2002) Relativistic electron dynamics inthe inner magnetospheremdasha review Journal of Atmosphericand Solar-TerrestrialPhysics 64 265ndash282

Gentile C (2001)A new approximate maximal margin classication algorithmJournal of Machine Learning Research 2 213ndash242

Gondzio J (1998)Warm start of the primal-dual method applied in the cuttingplane scheme Mathematical Programming 83 125ndash143

Gondzio J amp Grothey A (2001) Reoptimization with the primal-dual interiorpoint method SIAM Journal on Optimization 13 842ndash864

Graepel T Herbrich R amp Williamson R C (2001) From margin to sparsityIn T K Leen T G Dietterich amp V Tresp (Eds) Advances in neural informationprocessing systems 13 (pp 210ndash216) Cambridge MA MIT Press

Gunn S (1998) Matlab SVM toolbox Software Available on-line at httpwwwisisecssotonacukresourcessvminfo

Herbster M (2001) Learning additive models online with fast evaluating ker-nels In D P Helmbold amp B Williamson (Eds) Proceedings of the 14th An-nual Conference on Computational Learning Theory (pp 444ndash460) New YorkSpringer-Verlag

Joachims T (2000) Estimating the generalization performance of an SVM ef-ciently In P Langley (Ed) Proceedings of the Seventeenth International Confer-ence on Machine Learning (pp 431ndash438) San Mateo Morgan Kaufmann

Kivinen J Smola A Jamp Williamson R C (2002)Online learning with kernelsIn T G Dietterich S Becker amp Z Ghahramani (Eds) Advances in neuralinformation processing systems 14 (pp 785ndash792) Cambridge MA MIT Press

Lee J-H amp Lin C-J (2001)Automatic model selection for support vector machines(Tech Rep) Taipei Taiwan Department of Computer Science and Informa-tion Engineering National Taiwan University

Li Y amp Long P M (1999) The relaxed online maximum margin algorithm InS A Solla T K Leen amp K-R Muller (Eds) Advances in neural informationprocessing systems 12 (pp 498ndash504) Cambridge MA MIT Press

Accurate On-line Support Vector Regression 2703

Mackey M C amp Glass L (1977)Oscillation and chaos in physiological controlsystems Science 197 287ndash289

Martin M (2002)On-line supportvectormachines for functionapproximation(TechRep LSI-02-11-R)Catalunya Spain Software Department Universitat Po-litecnica de Catalunya

Muller K-R Smola A J Ratsch G Scholkopf B Kohlmorgen J amp VapnikV (1997)Predicting time series with support vector machines In W Gerstner(Ed) Articial Neural NetworksmdashICANN rsquo97 (pp 999ndash1004) Berlin Springer-Verlag

Ralaivola L amp drsquoAlche-Buc F (2001) Incremental support vector machinelearning A local approach In G Dorffner H Bischof amp K Hornik (Eds) Ar-ticial Neural NetworksmdashICANN 2001 (pp 322ndash330) Berlin Springer-Verlag

Shevade S K Keerthi S S Bhattacharyya C amp Murthy K R K (1999)Improvements to SMO algorithm for SVM regression (Tech Rep No CD-99-16)Singapore National University of Singapore

Smola A (1998) Interior point optimizer for SVM pattern recognition SoftwareAvailable on-line at httpwwwkernel-machinesorgcodeprloqotargz

Smola A J amp Scholkopf B (1998) A tutorial on support vector regression (Neu-roCOLT Tech Rep No NC-TR-98-030) London Royal Holloway CollegeUniversity of London

Syed N A Liu H amp Sung K K (1999) Incremental learning with supportvector machines In Proceedings of the Workshop on Support Vector Machines atthe International Joint Conference on Articial IntelligencemdashIJCAI-99 San MateoMorgan Kaufmann

Tashman L J (2000) Out-of-sample tests of forecasting accuracy An analysisand review International Journal of Forecasting 16 437ndash450

Tay F E H amp Cao L (2001)Application of support vectormachines in nancialtime series forcasting Omega 29 309ndash317

Vanderbei R J (1999) LOQO An interior point code for quadratic program-ming Optimization Methods and Software 11 451ndash484

Vapnik V (1998) Statistical learning theory New York WileyVapnik V amp Chapelle O (1999)Bounds on error expectation for support vector

machine In A Smola P Bartlett B Scholkopf amp D Schuurmans (Eds)Advances in large margin classiers (pp 261ndash280) Cambridge MA MIT Press

Weigend A S amp Gershenfeld N A (1994)Time-series predictionForecasting thefuture and understanding the past Reading MA Addison-Wesley

Yildirim E A amp Wright S J (2002) Warm-start strategies in interior-pointmethods for linear programming SIAM Journal on Optimization 12 782ndash810

Received September 12 2002 accepted May 1 2003

Page 21: Accurate On-line Support Vector Regression · 2006. 8. 28. · Accurate On-line Support Vector Regression 2689 In the special case whenSset is empty, according to equations 3.1 and

Accurate On-line Support Vector Regression 2703

Mackey M C amp Glass L (1977)Oscillation and chaos in physiological controlsystems Science 197 287ndash289

Martin M (2002)On-line supportvectormachines for functionapproximation(TechRep LSI-02-11-R)Catalunya Spain Software Department Universitat Po-litecnica de Catalunya

Muller K-R Smola A J Ratsch G Scholkopf B Kohlmorgen J amp VapnikV (1997)Predicting time series with support vector machines In W Gerstner(Ed) Articial Neural NetworksmdashICANN rsquo97 (pp 999ndash1004) Berlin Springer-Verlag

Ralaivola L amp drsquoAlche-Buc F (2001) Incremental support vector machinelearning A local approach In G Dorffner H Bischof amp K Hornik (Eds) Ar-ticial Neural NetworksmdashICANN 2001 (pp 322ndash330) Berlin Springer-Verlag

Shevade S K Keerthi S S Bhattacharyya C amp Murthy K R K (1999)Improvements to SMO algorithm for SVM regression (Tech Rep No CD-99-16)Singapore National University of Singapore

Smola A (1998) Interior point optimizer for SVM pattern recognition SoftwareAvailable on-line at httpwwwkernel-machinesorgcodeprloqotargz

Smola A J amp Scholkopf B (1998) A tutorial on support vector regression (Neu-roCOLT Tech Rep No NC-TR-98-030) London Royal Holloway CollegeUniversity of London

Syed N A Liu H amp Sung K K (1999) Incremental learning with supportvector machines In Proceedings of the Workshop on Support Vector Machines atthe International Joint Conference on Articial IntelligencemdashIJCAI-99 San MateoMorgan Kaufmann

Tashman L J (2000) Out-of-sample tests of forecasting accuracy An analysisand review International Journal of Forecasting 16 437ndash450

Tay F E H amp Cao L (2001)Application of support vectormachines in nancialtime series forcasting Omega 29 309ndash317

Vanderbei R J (1999) LOQO An interior point code for quadratic program-ming Optimization Methods and Software 11 451ndash484

Vapnik V (1998) Statistical learning theory New York WileyVapnik V amp Chapelle O (1999)Bounds on error expectation for support vector

machine In A Smola P Bartlett B Scholkopf amp D Schuurmans (Eds)Advances in large margin classiers (pp 261ndash280) Cambridge MA MIT Press

Weigend A S amp Gershenfeld N A (1994)Time-series predictionForecasting thefuture and understanding the past Reading MA Addison-Wesley

Yildirim E A amp Wright S J (2002) Warm-start strategies in interior-pointmethods for linear programming SIAM Journal on Optimization 12 782ndash810

Received September 12 2002 accepted May 1 2003