Sales Rate and Cumulative Sales Forecasting Using Kalman Filtering Techniques


Michael Munroe, Intel Corporation

Ted Hench, Arizona State University, 480 614 1496

Matthew Timmons, Arizona State University, 480 861 3618

March 19, 2009


Abstract

The authors describe the use of the Robbins-Monro stochastic estimator (RM-se) combined with

the continuous-discrete Extended Kalman Filter (c/d-EKF) for estimating unknown values in

product diffusion models. The c/d-EKF extends the discrete Kalman Filter to non-linear

systems, while the RM-se allows the c/d-EKF to effectively estimate model parameters, despite

corruptive noise in available data. With reasonable accuracy, this modified filter predicts total

sales over the lifetime of the product and time of peak sales. For purposes of demonstration, this

paper assumes a Bass model, though the filters described are not uniquely linked to this model,

and extension or application to other models is briefly discussed.

KEYWORDS: Kalman Filter, Forecasting, Modeling


Sales Forecasting Using Kalman Filtering

Accurately estimating future demand for a product is an essential part of a fiscally sound

business strategy. Overestimating future demand results in increased costs associated with the

storage and possible disposal of excess inventory resulting from over-production.

Underestimating future demand results in inventory shortages and the loss of sales opportunities

and customer good-will.

One model commonly used to fit sales data is the Bass Model (Bass 1969). While the Bass

Model can be used to generate a relatively accurate fit given enough data points, it is sensitive to

noise in the data and often cannot predict the time of peak sales prior to that event. A search of

the available literature indicated a need for a noise-resistant demand forecast model that

generates an accurate and timely prediction of the time of peak sales and the total market

depth within the constraints of typically available sales data. Specifically, the model would

allow identification of the time of peak sales prior to the

event, allowing manufacturers to plan for the post-peak decline in demand and the eventual

obsolescence of the product.

In an effort to identify or generate a demand forecast model that meets these criteria, the

authors initially evaluated the Simple Kalman Filter (IEEE, 1974). The Simple Kalman Filter

was originally introduced to mechanical engineering in 1959 by Rudolf Kalman. This tool

combines state and measurement models with actual (noisy) measurements to estimate

underlying noiseless measurements and to infer the value of hidden model parameters. The

authors determined that the forecasted demand generated by the Simple Kalman Filter did not

meet the specified criteria and proceeded to evaluate derivatives of the Simple Kalman Filter.


During this process, the continuous-discrete Extended Kalman Filter (c/d-EKF) (Xie et al.

1997) was identified as a market-typical tool that generated timely and accurate predictions of

demand when matched with idealized sales data. After the ideal-case performance had been

evaluated and characterized, the c/d-EKF was evaluated using more realistic sales data. It was

discovered that, although this tool worked in ideal circumstances, even small errors in

measurement uncertainty estimates made its results diverge. An extension to the c/d-EKF, the

Robbins-Monro stochastic estimator (RM-se), was then evaluated using market-realistic sales

data. This model generated promising results, but also left significant room for improvement. It

is important to note that there exist numerous alternatives to the Bass Model that can be

substituted into the described RM-se method with relative ease, both as single and as mixture

models. These alternatives are not evaluated here for the purposes of simplicity and clarity.


Preliminaries

The Bass Model

The Model presented by Bass in his original paper (1969) consists of the following:

(1) $S(T) = pm + (q - p)\,Y(T) - \dfrac{q}{m}\,Y(T)^{2}$

(2) $Y(T) = \int_{0}^{T} S(t)\,dt$

Y represents the cumulative sales and S represents the sales at any instant of time (T). The

constant parameters p, q and m are the coefficients of innovation and imitation and the total

market size, respectively.

Although there are three parameters in the Bass-model, only two are quantities of business

interest: time of peak sales, and total market. The total market informs the forecaster of the

projected quantity of the product that needs to be manufactured while the time of peak sales

indicates how quickly production needs to take place. The equation for the time of peak sales is:

(3) $t^{*} = \dfrac{\ln(q/p)}{q + p}$
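As a rough check of Equations 1 and 3, the following minimal sketch evaluates the peak time and the maximum sales rate for the parameter values quoted later in the Simulation section. The function names are illustrative only, and the closed-form peak rate $m(p+q)^{2}/(4q)$ is obtained here by maximizing Equation 1 over Y; it is not stated in the text.

```python
import math

def bass_sales_rate(Y, p, q, m):
    """Sales rate S as a function of cumulative sales Y (Equation 1)."""
    return p * m + (q - p) * Y - (q / m) * Y ** 2

def bass_peak_time(p, q):
    """Time of peak sales t* (Equation 3)."""
    return math.log(q / p) / (p + q)

def bass_peak_rate(p, q, m):
    """Maximum of Equation 1 over Y, reached at Y = m (q - p) / (2 q)."""
    return m * (p + q) ** 2 / (4.0 * q)

# Parameter values taken from the Simulation section of this paper.
p, q, m = 0.018119, 0.30145, 40001.0

t_star = bass_peak_time(p, q)
s_peak = bass_peak_rate(p, q, m)
y_at_peak = m * (q - p) / (2.0 * q)

print(f"t* = {t_star:.4f}, peak sales rate = {s_peak:.1f}")
```

Under these assumptions the computed values can be compared with the peak time and maximum sales rate reported in the Simulation section.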

Measurement Strategy

Ensemble testing was performed to validate the results of the models implemented. Each

model was run many times, each time adding newly generated random noise to the

deterministic simulated data. After the completion of each set of test runs, the resulting data

ensemble was analyzed to determine the ensemble mean and the ensemble standard deviation

of the prediction errors, as these were determined to be good indicators of the overall

performance of the model.

During this process, it was essential to construct a consistent set of metrics in order to

evaluate the predictive performance of each model. The functions used to generate the

performance metrics were error in prediction of peak time

(5) $t^{*} = \log_{10}\left|\dfrac{t_{\mathrm{peak}}^{\mathrm{est}}}{t_{\mathrm{peak}}^{\mathrm{exact}}} - 1\right|$

and error in prediction of total market

(6) $M^{*} = \log_{10}\left|\dfrac{m_{\mathrm{peak}}^{\mathrm{est}}}{m_{\mathrm{peak}}^{\mathrm{exact}}} - 1\right|$.

The metrics generated were log base ten of the absolute value of these two errors.
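A minimal sketch of how such a metric might be computed is given here. The function name and the example estimate of 8.9 are hypothetical; only the form of the metric (log base ten of the absolute relative error) comes from Equations 5 and 6.

```python
import math

def log10_relative_error(estimate, exact):
    """log10 of |estimate / exact - 1|, the form used in Equations 5 and 6."""
    rel = abs(estimate / exact - 1.0)
    if rel == 0.0:
        # A perfect estimate has no finite log error.
        return float("-inf")
    return math.log10(rel)

# Hypothetical peak-time estimate compared with the exact simulated peak time.
peak_time_metric = log10_relative_error(8.9, 8.7982)
print(f"peak-time metric: {peak_time_metric:.3f}")
```

A value of -2 corresponds to a 1% relative error and a value of -1 to a 10% relative error, so more negative values indicate better predictions.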

Simulation

The simulation data was generated using a Runge-Kutta numeric integrator on the Bass

equations with known model parameters. The column medians were taken from the original data

in the Bass paper (1969): m=40001, p=.018119 and q=.30145. The initial value for sales was

set to 15. This yielded a maximum sales rate of approximately 3387 and a peak time of 8.7982.

The mean sales from initial to peak time were approximately 2142. That data was then corrupted

with Gaussian noise that was scaled by the product of the noise factor multiplied by the mean of

total sales. The additive noise scaling factors ranged logarithmically from $10^{-4.5}$ to $10^{-0.8}$ and had

12 elements. These elements corresponded to noise scales that ranged from 1.3 through

approximately 6339 units of error.
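The data generation described in this section can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: it uses a classical fourth-order Runge-Kutta step, starts from Y(0) = 0, and adds the noise to the sales rate. The quoted end points of the noise scale (1.3 and about 6339) are reproduced when the factors are multiplied by m = 40001, so that scaling is assumed here.

```python
import numpy as np

P_COEF, Q_COEF, M_SIZE = 0.018119, 0.30145, 40001.0

def bass_rate(Y):
    """Bass sales rate as a function of cumulative sales (Equation 1)."""
    return P_COEF * M_SIZE + (Q_COEF - P_COEF) * Y - (Q_COEF / M_SIZE) * Y ** 2

def rk4_cumulative_sales(t_end, n_steps, y0=0.0):
    """Integrate dY/dt = S(Y) with classical fourth-order Runge-Kutta."""
    h = t_end / n_steps
    Y = np.empty(n_steps + 1)
    Y[0] = y0
    for k in range(n_steps):
        k1 = bass_rate(Y[k])
        k2 = bass_rate(Y[k] + 0.5 * h * k1)
        k3 = bass_rate(Y[k] + 0.5 * h * k2)
        k4 = bass_rate(Y[k] + h * k3)
        Y[k + 1] = Y[k] + h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return Y

# Simulate from the initial time to the time of peak sales with 280 steps.
t_peak = np.log(Q_COEF / P_COEF) / (P_COEF + Q_COEF)
Y_exact = rk4_cumulative_sales(t_peak, 280)
S_exact = bass_rate(Y_exact)

# Twelve logarithmically spaced noise factors and the resulting noise scales.
noise_factors = np.logspace(-4.5, -0.8, 12)
noise_scales = noise_factors * M_SIZE  # assumed scaling, see lead-in

# One noisy realisation for a mid-range noise factor (fixed seed for repeatability).
rng = np.random.default_rng(0)
S_noisy = S_exact + noise_scales[6] * rng.standard_normal(S_exact.shape)
```

Repeating the last step with fresh random draws yields the ensemble members used in the tests.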

Kalman and Ensemble Criteria

The Kalman Filter requires the noise in the sales data to be non-correlated and to have a zero

mean in order for the filter to function in an ideal manner. The Gaussian random number

generator used to generate the noise was carefully evaluated to ensure the random numbers were

non-correlated and, as a result of this evaluation, the sample size was set to a minimum of 280

data points. This resulted in the added bias being at or below 5% for 90% of the simulation time.

The simulation time was then extended from the initial time to peak-sales time. A second case,

using only 52 samples to represent the number of weeks in a business year, was used to more

accurately simulate a business forecasting environment.

A similar evaluation was required for the uniform random number generator used to

randomly perturb the initial model parameters as part of the ensemble testing. Based on this

evaluation, it was determined that a minimum acceptable ensemble size of 100 would provide a

relative estimation error of 0.1% in approximately 90% of the iterations of the model. Although

they were only evaluated for single-element data, these values for sample count and ensemble

size were used for all tests.

Filters

Algorithm for the Simple Kalman Filter

In “An Introduction to the Kalman Filter,” Welch and Bishop (2006) provided a clear tutorial

with good explanations and examples for use of the Simple Kalman Filter. The notation and

conventions of Welch and Bishop were used in this paper to provide clear transitions for those

less familiar with these numeric techniques.

Welch and Bishop described the Simple Kalman Filter as a predictor-corrector. First, a linear

state-transition matrix predicts the updated state (a priori) estimate for the next time-step. At the next

time-step, the Kalman gain, computed from the covariances, is applied in conjunction with the

measurement to correct the prediction and obtain an updated (a posteriori) estimate for the state.

The general equations governing the system are:

(8) $x_k = A x_{k-1} + B u_{k-1} + w_{k-1}$

(9) $z_k = H x_k + v_k$

In Equation 8, x is the state vector, A is the discrete state transition matrix, u is the control

input vector, B is the control input transition matrix and w is the state transition additive noise

vector. In Equation 9, z is the measured value, H is the measurement transform on the state, and

v is the measurement additive noise term. It is assumed that the additive noises are zero-mean,

uncorrelated noises. The covariances are defined as follows:

(10) $Q = \mathrm{cov}(w)$

(11) $R = \mathrm{cov}(v)$

(12) $P = E\left[(\hat{x}_k - x_k)(\hat{x}_k - x_k)^{T}\right]$

These noises are essential to the operation of the Simple Kalman Filter and are used to determine

the best balance point between the predicted model and measurement for the posterior estimate

of the state.


The Simple Kalman Filter functions as follows:

Step 1) Initialize the a priori estimate and covariances

(13) $\hat{x}_1^{(-)} = E(x_1)$

P, Q and R are typically found using heuristics and what Krane (2005) calls “voodoo”.

Step 2) Predict the state and state covariance using state update transforms

(14) $\hat{x}_k^{(-)} = A\hat{x}_{k-1}^{(+)} + Bu_{k-1}$

(15) $P_k^{(-)} = A P_{k-1} A^{T} + Q$

Step 3) Compute the Kalman Gain and a posteriori state covariance

(16) $K_k = P_k^{(-)} H^{T} \left(H P_k^{(-)} H^{T} + R\right)^{-1}$

(17) $P_k = \left(I - K_k H\right) P_k^{(-)}$

Step 4) Correct the current state estimate using measurement and Kalman gain

(18) $\hat{x}_k^{(+)} = \hat{x}_k^{(-)} + K_k\left(z_k - H\hat{x}_k^{(-)}\right)$

Step 5) Iterate steps 2-4 forward to the end of measured data
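Steps 1 through 5 can be sketched in a few lines of NumPy. This is a generic linear Kalman filter written for illustration, not the authors' implementation; the function name, the constant-signal test case, and all numerical values in the example are assumptions. The supplied x0 and P0 are treated as the estimate preceding the first measurement.

```python
import numpy as np

def kalman_filter(z, A, B, u, H, Q, R, x0, P0):
    """Run the predict/correct cycle over measurements z (n_steps x n_meas)."""
    x_post, P_post = x0.copy(), P0.copy()
    identity = np.eye(len(x0))
    estimates = []
    for k in range(len(z)):
        # Step 2: predict state and covariance (Equations 14 and 15).
        x_prior = A @ x_post + B @ u[k]
        P_prior = A @ P_post @ A.T + Q
        # Step 3: Kalman gain and a posteriori covariance (Equations 16 and 17).
        K = P_prior @ H.T @ np.linalg.inv(H @ P_prior @ H.T + R)
        P_post = (identity - K @ H) @ P_prior
        # Step 4: correct the state with the measurement (Equation 18).
        x_post = x_prior + K @ (z[k] - H @ x_prior)
        estimates.append(x_post)
    return np.array(estimates)

# Illustrative case: a constant value of 5.0 observed through Gaussian noise.
rng = np.random.default_rng(1)
n = 200
truth = 5.0
z = truth + 0.5 * rng.standard_normal((n, 1))
A, B, H = np.eye(1), np.zeros((1, 1)), np.eye(1)
u = np.zeros((n, 1))
Q = np.array([[1e-8]])
R = np.array([[0.25]])

est = kalman_filter(z, A, B, u, H, Q, R, x0=np.zeros(1), P0=np.array([[100.0]]))
print(f"final estimate: {est[-1, 0]:.3f}")
```

The large initial covariance P0 tells the filter to trust the first measurements far more than the initial guess of zero, which is why the estimate moves quickly toward the true value.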

The Simple Kalman Filter is an optimal estimator only if the system is linear and reasonable

methods are available to accurately determine the covariances P, Q and R. The Bass model

describes a differential equation whose discrete form introduces what Xie et al. (1997) call

“time-interval bias” error. This error causes under-prediction of sales rate before peak time, and

over-prediction after. Some useful diffusion models cannot be readily converted to a discrete


form, and thus cannot be implemented for use in a Simple Kalman Filter. This style of Kalman

Filter requires empirically derived, and thus hard to troubleshoot, modification in order to work

properly. It was for these reasons that a more extensible, robust, and unbiased derivative of the

Simple Kalman Filter was explored.

Algorithm for the Continuous-Discrete Extended Kalman Filter

A more recent optimal estimation method is the continuous-discrete Extended Kalman Filter

(c/d-EKF) described by Xie et al. (1997). This filter addresses many of the issues brought forth

by the weaknesses of the Simple Kalman Filter. Its state and state covariance update equations

are applied to a differential form of the state equation using an “off-the-shelf” numerical

integrator. Although this form has no “time-interval” bias, it still requires an a priori

understanding of the covariances.

Since the test data was generated using a numerical simulation, good estimates of the exact

values of all of the covariances were known. The error Q is the tolerance of the numerical

integration. In this case, it was estimated as a square matrix where the diagonal is the integrator

relative tolerance setting (default = 0.0001) multiplied by the value of the predicted state

multiplied by the number of times the integrator was executed per measurement, in this case 10

(discussed later), yielding an overall factor of 0.001 times the state. Most numerical integrators

have both relative and absolute termination criteria. Because of the scales of the explored

values (tens of thousands for the total market), the only termination criterion activated was the

relative tolerance. The integrator stopped when the integral had converged to the relative

tolerance multiplied by the value of the integral. This covariance changed with the estimated

state at each iteration and thereby included measurement information in the estimate of the state


covariance. The measurement error R is the square of the additive noise, in this case the noise

factor multiplied by the square of peak sales. The initial state covariance P was determined by

creating a uniform cloud of initial states centered on the initial state, integrating forward in time,

and then calculating the covariance of the cloud. This filter effectively achieved its purpose of

utilizing a less effective initial model to derive a more accurate final model. To that end, the

initial state was equal to the exact state plus a uniform-random perturbation with magnitude

within 25% of the exact.
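The construction of the initial state covariance from a cloud of perturbed states might look like the sketch below. It is a simplification: the forward integration of the cloud described above is omitted, so the covariance is taken directly from the perturbed initial states. The function name, the cloud size of 1000, and the choice of the parameter vector (p, q, m) as the state are assumptions.

```python
import numpy as np

def initial_covariance(x_exact, rel_width=0.25, n_cloud=1000, seed=0):
    """Sample covariance of a uniform cloud within +/- rel_width of x_exact."""
    rng = np.random.default_rng(seed)
    # Each row is one perturbed state; each component is scaled independently.
    factors = 1.0 + rel_width * rng.uniform(-1.0, 1.0, size=(n_cloud, len(x_exact)))
    cloud = factors * x_exact
    # np.cov uses the 1 / (N - 1) normalisation by default.
    return np.cov(cloud, rowvar=False)

x_exact = np.array([0.018119, 0.30145, 40001.0])  # p, q, m
P0 = initial_covariance(x_exact)
print(np.diag(P0))
```

Because the components differ by many orders of magnitude, the diagonal of the resulting matrix spans a very wide range, which is relevant to the conditioning of the covariance discussed in the results.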

The c/d-EKF estimator for the Bass model works as follows

Step 1) Initialize the state and covariances

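(19) $\hat{x}_0 = x_0^{\mathrm{exact}} \pm 25\%\, x_0^{\mathrm{exact}}$

(20) $P = \dfrac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})^{T}$

(21) $R = \sigma^{2}$

(22) $q_j = \text{executions} \times \text{integrator tolerance}_j \times \text{parameter vector}_j$

(23) $Q = \mathrm{diag}(q_1, \ldots, q_m)^{2}$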

Step 2) Predict the state and covariance

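(24) $\left(t_{k,1\ldots n}^{\mathrm{mid}},\ \hat{x}_{k,1\ldots n}^{\mathrm{mid}}\right) = \mathrm{rk}\left(S(p,q,m),\ [t_{k-1}, t_k],\ \hat{x}_{k-1}\right)$

(25) $\hat{x}_k^{(-)}(t_k) = \hat{x}_{k,\mathrm{end}}^{\mathrm{mid}}$

(26) $P_k^{(-)} = \mathrm{rk}\left(\dfrac{dP}{dt},\ t_{k,1\ldots n}^{\mathrm{mid}},\ \hat{x}_{k,1\ldots n}^{\mathrm{mid}}\right)$

(27) $\dfrac{dP}{dt} = F(x,t)\,P^{T} + P\,F(x,t)^{T} + Q$

(28) $F(x,t) = \left.\dfrac{\partial S(y(t),u(t),t)}{\partial x}\right|_{y(t)=\hat{y}}$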

Step 3) The equations for steps 3-5 are identical to the simple Kalman Filter and are

similarly iterated through the dataset.
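The prediction and correction steps of a continuous-discrete EKF for the Bass model can be sketched as follows. Several choices here are assumptions made for illustration and are not confirmed by the text: the state is taken as [Y, p, q, m], the measurement is taken to be cumulative sales (so H selects Y), the state and covariance are integrated jointly with a midpoint (second-order Runge-Kutta) rule, and the numerical values of P0, R and the measurement are hypothetical.

```python
import numpy as np

def f(x):
    """State derivative: dY/dt = S (Equation 1); parameters are constant."""
    Y, p, q, m = x
    return np.array([p * m + (q - p) * Y - (q / m) * Y ** 2, 0.0, 0.0, 0.0])

def jacobian(x):
    """F = df/dx evaluated at x; only the first row is non-zero."""
    Y, p, q, m = x
    F = np.zeros((4, 4))
    F[0] = [(q - p) - 2.0 * q * Y / m, m - Y, Y - Y ** 2 / m, p + q * Y ** 2 / m ** 2]
    return F

def p_dot(x, P, Q):
    """Kalman-Bucy covariance derivative, dP/dt = F P + P F^T + Q."""
    F = jacobian(x)
    return F @ P + P @ F.T + Q

def predict(x, P, Q, dt, n_sub=10):
    """Integrate state and covariance over one measurement interval."""
    h = dt / n_sub
    for _ in range(n_sub):
        x_mid = x + 0.5 * h * f(x)
        P_mid = P + 0.5 * h * p_dot(x, P, Q)
        P = P + h * p_dot(x_mid, P_mid, Q)
        x = x + h * f(x_mid)
    return x, P

def update(x, P, z, H, R):
    """Discrete correction, identical in form to the Simple Kalman Filter."""
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    x_new = x + K @ (z - H @ x)
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new

x0 = np.array([0.0, 0.018119, 0.30145, 40001.0])  # [Y, p, q, m]
P0 = np.diag([1.0, 1e-5, 1e-3, 1e7])              # hypothetical initial covariance
Q = np.diag((10 * 1e-4 * x0) ** 2)                # Equations 22 and 23
H = np.array([[1.0, 0.0, 0.0, 0.0]])              # assumed: cumulative sales measured
R = np.array([[100.0 ** 2]])                      # hypothetical sigma of 100 units

x_pred, P_pred = predict(x0, P0, Q, dt=1.0)
z = np.array([850.0])                             # hypothetical measurement
x_post, P_post = update(x_pred, P_pred, z, H, R)
```

Carrying the parameters p, q and m in the state vector is what lets the correction step adjust them: the off-diagonal terms of P that build up during prediction link an error in Y to errors in the parameters.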

The solver used in the filter was a second-order Runge-Kutta method, which was chosen because

of its speed. The differential form of the state covariance is taken from the Kalman-Bucy filter

(the continuous analog to the discrete simple Kalman filter) which allows the state covariance to

be wrapped in a numerical integrator. The number of time-steps per numerical integration per

measurement was set to ten and thus a good estimate of the error is ten times the integration

tolerance. It is essential that the prior state be used in the covariance update.

It is worth noting that this is essentially a first order method which integrates the covariance

forward in time. A second order method would integrate backwards from $t_{k,n}^{\mathrm{mid}}$, average the

covariances, and then integrate forward. This Heun-like second order method has a higher

computation-time and memory cost.

Although this method addresses some of the shortcomings of the simple Kalman filter, it still

requires precise knowledge of the covariances in order to work effectively. It is not a non-linear

estimator for state or state covariance, and it still presumes Gaussian covariances. A particle

filter, while able to overcome those shortcomings, is a significantly more complex undertaking.


Results for the c/d-EKF

The c/d-EKF was ensemble tested by computing the peak-time and total-market errors for

each of the 15 steps in the error vector at each of the 280 time-steps between the initial state and

peak sales. This gave an ensemble data size of 100x280x15 – 100 ensemble elements, 280 time

steps, and 15 error factors.

Several statistics including mean, median, standard deviation and the 1st and 99th percentiles

of the ensemble volume are included in Table 1. The obtained values for mean and median were

not surprising since the distributions were centered. However, the sizes of the percentiles and

the standard deviation indicated that the system tended to converge over each application of the

filter to simulated data. About 98% of all peak prediction errors were less than 1.11% and total

market errors were less than 1.93%. From this information, it can be inferred that this tool is

displaying consistent convergent behavior during the simulation time, over the ensemble and for

a wide variety of noise scaling factors.

Figures 1 and 2 show the ensemble standard deviation as a function of time and

input error for the peak sales time and total market, respectively. Note that for noise factors

under 0.1, the model error reduces significantly within the first two time-periods, and a second

convergence occurs in the second half of the dataset. Figure 3 shows a sample exact run with a

noise figure of approximately 0.5%. The associated plot of error in prediction of peak time is

shown in Figure 4. After time 0.28, the error was not greater than 4%, a relevant point

considering the apparent difference between signal and noise shown in Figure 3.

When the initial state covariance was randomly perturbed by a Gaussian noise scaled to the

magnitude of the covariance, the results were much more problematic (Figs. 5 and 6). Observe

that although the filter produces an improvement in measured sales values, the peak time

estimation error diverges and is at almost 100% error at the end of the process. There were terms

in the state covariance on the order of $10^{7}$ and others on the order of $10^{-6}$, resulting in a

condition number for the state covariance matrix of $1.7 \times 10^{13}$.
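To illustrate why such a spread of magnitudes is problematic, the short sketch below computes the condition number of a hypothetical diagonal matrix whose entries span the same orders of magnitude. The matrix is an example only and is not the covariance from the experiment.

```python
import numpy as np

# Hypothetical covariance with entries spanning 1e7 down to 1e-6.
P_example = np.diag([1.0e7, 1.0, 1.0e-6])

# 2-norm condition number: ratio of largest to smallest singular value.
cond = np.linalg.cond(P_example)

# Rough count of decimal digits that can be lost when solving with this matrix.
digits_lost = np.log10(cond)

print(f"condition number: {cond:.3e}, digits at risk: {digits_lost:.1f}")
```

Double-precision arithmetic carries roughly 16 significant decimal digits, so a condition number near $10^{13}$ leaves only a few reliable digits in any quantity obtained by inverting or solving with the matrix, such as the Kalman gain.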

There are a number of issues which prevent this model from meeting our stated criteria.

Although the exact model with perfect information works very well, it requires knowledge of the

measurement correlation, which is not typically available a priori for the measurement

covariance R. R is not only unknown, but also often impossible to measure directly, although it

can be estimated using a Robbins-Monro method.

The Kalman assumptions about noise are not consistent with real-world noise. Real-world

noise terms would probably not be the zero mean, non-correlated noise assumed in the c/d-EKF.

As we have seen, even moderate noise in the sales data causes the c/d-EKF to diverge from the

curve underlying the noisy sales data.

In practice, sales data is sampled monthly and product lifetimes are on the order of 1 to 2

years – not the 23.3 years it would take to get 280 data points before the peak. Daily reporting

would be required to generate this number of data points, instead of the weekly or monthly

reporting common in business environments.

Finally, there are no predictive indicators of changes in the fundamental nature of a business

and, since these changes often occur quickly, an assumption of constant parameters, and

therefore linearity, is false. Under these circumstances, the Bass model that a business started

with can no longer be used as the final diffusion model for the product and a single Bass model

for the lifespan of the product cannot be assumed.


An effective tool for use in the real world must work without a measurement covariance

estimate. It must work with sample sizes on the order of 12 (monthly) to 52 (weekly) data points

associated with realistic data-sampling periods. It must be able to indicate and account for

changes in the fundamental nature of the market, and still predict the time of peak sales and total

market with acceptable accuracy. It should also be able to incorporate diffusion models other

than the Bass Model.

Model for c/d-EKF with RM-se

Wan and Nelson (2001) describe the use of a Robbins-Monro stochastic estimator applied to

a Kalman filter to iteratively estimate the covariance matrices, and Bishop (1995), in his section

on sequential parameter estimation, gives the specific details required to successfully implement

the estimator. Essentially, as a form of iterative averaging tool, this estimator allowed a

perturbed estimate of R to converge toward the exact value of R over successive iterations. Note

that the estimates for p, q, m and R resulting from one transit of the dataset were used as initial

values for a second iteration. This allowed continued successive convergence of the approximate

tool toward the performance of the exact.

Algorithm for the c/d-EKF with RM-se

The data generation method, initial values for parameters and ensemble evaluation methods

described above were used to test the c/d-EKF with RM-se with the following exceptions:

The initial measurement covariance was uniform-randomly perturbed by ±25% from

the exact value, and then the Robbins-Monro estimator was used to refine it.

The noise factor was varied from $10^{-3.5}$ to $10^{-0.5}$ using 35 logarithmically spaced

sample points.

The Robbins-Monro estimator was set up as follows:

$R_k^{e} = (1 - \alpha_k)\,R_{k-1}^{e} + \alpha_k\,K_k\,(y_k - H\hat{x}_k)(y_k - H\hat{x}_k)^{T}\,K_k^{T}$

ak

1

k, round

1

52k 1

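Where:

$\sum_{i=1}^{\infty} \alpha_i = \infty, \qquad \sum_{i=1}^{\infty} \alpha_i^{2} < \infty, \qquad \lim_{i \to \infty} \alpha_i = 0$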

Note that the superscript e indicates an estimate, $K_k$ is the Kalman gain, the error term

$(y_k - H\hat{x}_k)$ is evaluated on the most current data, and $\alpha_k$ is the learning rate parameter.
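A single Robbins-Monro update of the covariance estimate can be sketched as below. The function name is illustrative, and the learning rate $\alpha_k = 1/k$ is an assumption: it is one schedule that satisfies the three conditions above and is not necessarily the schedule used by the authors. The example is scalar, so the state and measurement dimensions coincide and the gain is set to the identity purely to make the averaging behaviour visible.

```python
import numpy as np

def rm_update(R_prev, K, innovation, alpha):
    """One step: R = (1 - alpha) R_prev + alpha (K e)(K e)^T, with e = y - H x."""
    v = K @ innovation
    return (1.0 - alpha) * R_prev + alpha * np.outer(v, v)

# Scalar illustration with synthetic innovations of standard deviation 2.
rng = np.random.default_rng(0)
innovations = rng.normal(0.0, 2.0, size=(500, 1))
K = np.eye(1)                      # identity gain, for illustration only
R_est = np.array([[10.0]])         # deliberately poor starting estimate

for k, e in enumerate(innovations, start=1):
    R_est = rm_update(R_est, K, e, alpha=1.0 / k)

print(f"estimated variance: {R_est[0, 0]:.3f}")
```

With $\alpha_k = 1/k$ the recursion reduces to a running mean of the outer products, which is why the estimator behaves as an iterative averaging tool and forgets the poor starting value after the first step.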

Results for the c/d-EKF with RM-se

For all noise values, and under plausible business constraints, the standard deviation of the

ensemble decreased as the algorithm progressed, as can be seen in Figures 7 and 8. Without

knowledge of the measurement covariance, the estimates converged toward less error for all

cases, as shown by the average along the noise axis in Figure 9. In Figure 10, it can be seen that

the time-region of greatest rate of convergence in peak time estimate lies between t=2 and t=7

for most cases. For the highest noise terms, this region lies between t=4 and t=7, or between

40% and 70% of the time until the peak sales rate is reached. For the total market estimate, the

region of greatest convergence rate is more parabolic in noise factor, as seen in Figure 11.

In practice, this estimator should be repeatedly evaluated over the dataset with the final

parameter estimates serving as the input values for the next cycle. When the mean change in the

norm of the estimated covariance R fell below a certain threshold between cycles, the process

was considered convergent. Some convergence was achieved on each cycle, but reasonable

convergence was not achieved until after several cycles, often fewer than 20, even with some

extreme initial errors in state estimates and in measurement covariance.
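The cycling procedure can be sketched as a simple outer loop. Everything named here is hypothetical: `run_pass` stands in for one transit of the filter over the dataset, `toy_pass` is a placeholder that merely moves R halfway toward a fixed target, and the tolerance and cycle limit are arbitrary. Only the stopping rule, a small change in the norm of R between cycles, follows the description above.

```python
import numpy as np

def iterate_until_converged(run_pass, params, R, tol=1e-3, max_cycles=50):
    """Repeat passes over the data until the norm of R stops changing."""
    for cycle in range(1, max_cycles + 1):
        new_params, new_R = run_pass(params, R)
        change = abs(np.linalg.norm(new_R) - np.linalg.norm(R))
        # Final estimates of this cycle become the inputs of the next one.
        params, R = new_params, new_R
        if change < tol:
            return params, R, cycle
    return params, R, max_cycles

def toy_pass(params, R, R_true=np.array([[4.0]])):
    """Placeholder for a filter pass: moves R halfway toward a known target."""
    return params, R + 0.5 * (R_true - R)

params, R_final, cycles = iterate_until_converged(
    toy_pass, {"p": 0.02, "q": 0.3, "m": 4.0e4}, np.array([[10.0]])
)
print(f"stopped after {cycles} cycles with R = {R_final[0, 0]:.4f}")
```

In a real application `run_pass` would wrap the filter and the Robbins-Monro update, returning the refined parameter estimates along with the refined covariance.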

Conclusions

The c/d-EKF applied to the Bass Model was evaluated. Given accurate covariance values

and acceptable sample rates, this method was able to rapidly and effectively predict peak sales

and total market. Given slightly perturbed covariance values, this performance is lost.

A Robbins-Monro extension to the c/d-EKF applied to the Bass Model was presented and

shown to be able to predict peak sales and total market with real-world constraints of unknown

covariance, poor initial conditions and limited sampling rates.

Some directions for further research include: using Kiefer-Wolfowitz estimators instead of

Robbins-Monro; examining how different learning-rate estimators for measurement covariance

perform as indicators of fundamental change in the nature of the market; applying the method

to other diffusion models, such as those indicated in the paper by Meade and Islam (1998);

using a dual EKF with Rauch-Tung-Striebel smoothing; and using Particle Filters in place of the

EKF. Also of interest is the exploration of the use of windowed datasets to determine

time-varying market uncertainty.


References

Bass, Frank (1969), “A new product growth model for consumer durables,” Management

Science, 15 (January), 215-27.

Bishop, Christopher M. (1995), Neural Networks for Pattern Recognition. New York: Oxford

University Press.

De Castro, Leandro Nunes (2006), Fundamentals of Natural Computing: Basic Concepts,

Algorithms, and Applications. Boca Raton: Chapman & Hall/CRC Press.

Evensen, Geir (2004), “Sampling strategies and square root analysis schemes for the EnKF,”

Ocean Dynamics, 54 (December), 539-60.

Hagan, Martin, Howard Demuth, and Mark Beale (1996), Neural Network Design. Boston: PWS

Publishing Co.

Wan, Eric and Alex Nelson (2001), “Dual Extended Kalman Filter Methods,” in Kalman

Filtering and Neural Networks, Simon S. Haykin ed. New York: John Wiley & Sons, 123-73.


Institute of Electrical and Electronics Engineers (1974), IEEE Annual Banquet Booklet,

(accessed 30, 2008), [available at

http://www.ieee.org/web/aboutus/history_center/biography/kalman.html]

Kalman, Rudolf E. (1960), “A new approach to linear filtering and prediction problems,”

Journal of Basic Engineering, 82 (March), 35-45.

Krane, John (2005), “A Physicist on Wall Street,” (accessed January 2, 2009), [available at

http://www.fnal.gov/orgs/fermilab_users_org/Krane.ppt]

Meade, Nigel and Towhidul Islam (1998), “Technological Forecasting – Model Selection, Model

Stability, and Combining Models,” Management Science, 44 (August), 1115-30.

Welch, Greg and Gary Bishop (2006), “An Introduction to the Kalman Filter,” Technical Report

95-041, Department of Computer Science, University of North Carolina at Chapel Hill.

Xie, Jinhong, X. Michael Song, Marvin Sirbu, and Qiong Wang (1997), “Kalman Filter

Estimation of New Product Diffusion Models,” Journal of Marketing Research, 34 (August),

378-93.


Table


Table 1 – Statistics of c/d EKF filtering ensemble.


Figures

Figure 1 – Surface of Ensemble Standard Deviation of Peak Time Prediction Error as a function

of time (peak at 8.79) and log10 of noise factor.


Figure 2 – Surface of Ensemble Standard Deviation of Total Market prediction Error as a

function of time (peak at 8.79) and log10 of noise factor.

Figure 3 – Sample run of c/d-EKF with noise factor of .5% and 280 samples with Exact,

corrupted, and filtered data above and the filter error below.


Figure 4 – Sample run of c/d-EKF with noise factor of .5% and 280 samples showing log10 of

peak time prediction error.


Figure 5 – Sample run of c/d-EKF with noise factor of .5% and 280 samples, but with “Q”

perturbed, showing sales data filtering.

Figure 6 – Sample run of c/d-EKF with noise factor of .5% and 280 samples, but with “Q”

perturbed, showing log10 of peak time prediction error divergence.


Figure 7 – Surface of Ensemble Standard Deviation of Peak Time Prediction Error as a function

of time (peak at 8.79) and log10 of noise factor for c/d-EKF RM-se.

Figure 8 – Surface of Ensemble Standard Deviation of Total Market Prediction Error as a

function of time (peak at 8.79) and log10 of noise factor for c/d-EKF RM-se.


Figure 9 – Mean over Surface of Ensemble Standard Deviation of Prediction Error as a function

of time (peak at 8.79) for c/d-EKF RM-se.

Figure 10 – Gradient over Surface of Ensemble Standard Deviation of Peak Time Prediction


Error as a function of time (peak at 8.79) for c/d-EKF RM-se.

Figure 11 – Gradient over Surface of Ensemble Standard Deviation of Market Prediction Error

as a function of time (peak at 8.79) for c/d-EKF RM-se.
