Upload
others
View
10
Download
0
Embed Size (px)
Citation preview
1
Sales Rate and Cumulative Sales Forecasting Using Kalman Filtering Techniques
Michael Munroe, Intel Corporation
Ted Hench, Arizona State University, 480 614 1496, [email protected] Matthew Timmons, Arizona State University, 480 861 3618, [email protected]
March 19, 2009
2
Sales Rate and Cumulative Sales Forecasting Using Kalman
Filtering Techniques
Abstract
The authors describe the use of the Robbins-Monro stochastic estimator (RM-se) combined with
the continuous-discrete Extended Kalman Filter (c/d-EKF) for estimating unknown values in
product diffusion models. The c/d-EKF extends the discrete Kalman Filter to non-linear
systems, while the RM-se allows the c/d-EKF to effectively estimate model parameters, despite
corruptive noise in available data. With reasonable accuracy, this modified filter predicts total
sales over the lifetime of the product and time of peak sales. For purposes of demonstration, this
paper assumes a Bass model, though the filters described are not uniquely linked to this model,
and extension or application to other models is briefly discussed.
KEYWORDS: Kalman Filter, Forecasting, Modeling
3
Sales Forecasting Using Kalman Filtering
Accurately estimating future demand for a product is an essential part of a fiscally sound
business strategy. Overestimating future demand results in increased costs associated with the
storage and possible disposal of excess inventory resulting from over-production.
Underestimating future demand results in inventory shortages and the loss of sales opportunities
and customer good-will.
One model commonly used to fit sales data is the Bass Model (Bass 1969). While the Bass
Model can be used to generate a relatively accurate fit given enough data points, it is sensitive to
noise in the data and often can not predict the time of peak sales prior to that event. A search of
the available literature on the subject resulted in the determination that there is a need for a
noise-resistant demand forecast model that will generate an accurate and timely prediction of the
time of peak sales and the total market depth within the constraints of typically available sales
data. Specifically, the model would allow identification of the time of peak sales prior to the
event, allowing manufacturers to plan for the post-peak decline in demand and the eventual
obsolescence of the product.
In an effort to identify or generate a demand forecast model that meets these criteria, the
authors initially evaluated the Simple Kalman Filter (IEEE, 1974). The Simple Kalman Filter
was originally introduced to mechanical engineering in 1959 by Rudolf Kalman. This tool
combines state and measurement models with actual (noisy) measurements to estimate
underlying noiseless measurements and to infer the value of hidden model parameters. The
authors determined that the forecasted demand generated by the Simple Kalman Filter did not
meet the specified criteria and proceeded to evaluate derivatives of the Simple Kalman Filter.
4
During this process, the continuous-discrete Extended Kalman Filter (c/d-EKF) (Xie et al.
1996) was identified as a market-typical tool that generated timely and accurate predictions of
demand when matched with idealized sales data. After the ideal-case performance had been
evaluated and characterized, the c/d-EKF was evaluated using more realistic sales data. It was
discovered that, although this tool worked in ideal circumstances, even small errors in
measurement uncertainty estimates made its results diverge. An extension to the c/d-EKF, the
Robbins-Monro stochastic estimator (RM-se), was then evaluated using market-realistic sales
data. This model generated promising results, but also left significant room for improvement. It
is important to note that there exist numerous alternatives to the Bass Model that can be
substituted into the described RM-se method with relative ease, both as single and as mixture
models. These alternatives are not evaluated here for the purposes of simplicity and clarity.
5
Preliminaries
The Bass Model
The Model presented by Bass in his original paper (1969) consists of the following:
(1) S T pm q p Y Tq
mY T
2
(2) Y T S t0
T
dt
Y represents the cumulative sales and S represents the sales at any instant of time (T). The
constant parameters p, q and m are the coefficients of innovation and imitation and the total
market size, respectively.
Although there are three parameters in the Bass-model, only two are quantities of business
interest: time of peak sales, and total market. The total market informs the forecaster of the
projected quantity of the product that needs to be manufactured while the time of peak sales
indicates how quickly production needs to take place. The equation for the time of peak sales is:
(3) t* ln q p
q p
Measurement Strategy
Ensemble testing was performed to validate the results of the models implemented. Each
model was run a significant number of times using newly generated random data added to the
robustly generated non-random data. After the completion of each test run, the resulting data
6
ensemble was analyzed to determine the ensemble mean and the ensemble standard deviation
performance, as these were determined to be good indicators of the overall performance of the
model.
During this process, it was essential to construct a consistent set of metrics in order to
evaluate the predictive performance of each model. The functions used to generate the
performance metrics were error in prediction of peak time
(5) t* log10 abstpeak
est
tpeak
exact1
and error in prediction of total market
(6) M* log10 absmpeak
est
mpeak
exact1 .
The metrics generated were log base ten of the absolute value of these two errors.
Simulation
The simulation data was generated using a Runga-Kutta numeric integrator on the Bass
equations with known model parameters. The column medians were taken from the original data
in the Bass paper (1969): m=40001, p=.018119 and q=.30145. The initial value for sales was
set to 15. This yielded a maximum sales rate of approximately 3387 and a peak time of 8.7982.
The mean sales from initial to peak time were approximately 2142. That data was then corrupted
with Gaussian noise that was scaled by the product of the noise factor multiplied by the mean of
total sales. The additive noise scaling factors ranged logarithmically from 10-4.5
to 10-0.8
and had
12 elements. These elements corresponded to noise scales that ranged from 1.3 through
7
approximately 6339 units of error.
Kalman and Ensemble Criteria
The Kalman Filter requires the noise in the sales data to be non-correlated and to have a zero
mean in order for the filter to function in an ideal manner. The Gaussian random number
generator used to generate the noise was carefully evaluated to ensure the random numbers were
non-correlated and, as a result of this evaluation, the sample size was set to a minimum of 280
data points. This resulted in the added bias being at or below 5% for 90% of the simulation time.
The simulation time was then extended from the initial time to peak-sales time. A second case,
using only 52 samples to represent the number of weeks in a business year, was used to more
accurately simulate a business forecasting environment.
A similar evaluation was required for the uniform random number generator used to
randomly perturb the initial model parameters as part of the ensemble testing. Based on this
evaluation, it was determined that a minimum acceptable ensemble size of 100 would provide a
relative estimation error of .1% in approximately 90% of the iterations of the model. Although
they were only evaluated for single-element data, these values for sample count and ensemble
size were used for all tests.
Filters
Algorithm for the Simple Kalman Filter
In “An Introduction to the Kalman Filter,” Welch (2006) provided a clear tutorial with good
explanations and examples for use of the Simple Kalman Filter. The notation and conventions of
Welch were used in this paper to provide clear transitions for those less familiar with these
8
numeric techniques.
Welch described the Simple Kalman Filter as a predictor-corrector. First a linear state-
transition matrix predicts the updated state (a priori) estimate for the next time-step. At the next
time-step, the Kalman gain, computed from the covariances, is applied in conjunction with the
measurement to correct the prediction and obtain an updated (a posteriori) estimate for the state.
The general equations governing the system are:
(8) xk Axk 1 Buk 1 wk 1
(9) zk Hxk vk
In Equation 8, x is the state vector, A is the discrete state transition matrix, u is the control
input vector, B is the control input transition matrix and w is the state transition additive noise
vector. In Equation 9, z is the measured value, H is the measurement transform on the state, and
v is the measurement additive noise term. It is assumed that the additive noises are zero-mean,
uncorrelated noises. The covariances are defined as follows:
(10) Q cov w
(11) R cov
(12) P E ˆ x k xkˆ x k xk
T
These noises are essential to the operation of the Simple Kalman Filter and are used to determine
the best balance point between the predicted model and measurement for the posterior estimate
of the state.
9
The Simple Kalman Filter functions as follows:
Step 1) Initialize the a priori estimate and covariances
(13) x1
( ) E(x1)
P, Q and R are typically found using heuristics and what Krane (2005) calls “voodoo”.
Step 2) Predict the state and state covariance using state update transforms
(14) xk
( ) Axk 1
( ) Buk 1
(15) Pk
( ) APk 1AT Q
Step 3) Compute the Kalman Gain and a posteriori state covariance
(16) Kk Pk HT HPk HT R1
(17) Pk I KkH Pk
Step 4) Correct the current state estimate using measurement and Kalman gain
(18) i
kkkkk xHzKxx ˆˆˆ
Step 5) Iterate steps 2-4 forward to the end of measured data
The Simple Kalman Filter is an optimal estimator only if the system is linear and reasonable
methods are available to accurately determine the covariances P, Q and R. The Bass model
describes a differential equation whose discrete form introduces what Xie et al. call “time-
interval bias” error (1996). This error causes under-prediction of sales rate before peak time, and
over-prediction after. Some useful diffusion models cannot be readily converted to a discrete
10
form, and thus cannot be implemented for use in a Simple Kalman Filter. This style of Kalman
Filter requires empirically derived, and thus hard to troubleshoot, modification in order to work
properly. It was for these reasons that a more extensible, robust, and unbiased derivative of the
Simple Kalman Filter was explored.
Algorithm for the Continuous-Discrete Extended Kalman Filter
A more recent optimal estimation method is the continuous-discrete Extended Kalman Filter
(c/d-EKF) described by Xie et al. (1996). This filter addresses many of the issues brought forth
by the weaknesses of the Simple Kalman Filter. Its state and state covariance update equations
are applied to a differential form of the state equation using an “off-the-shelf” numerical
integrator. Although this form has no “time-interval” bias, it still requires an a priori
understanding of the covariances.
Since the test data was generated using a numerical simulation, good estimates of the exact
values of all of the covariances were known. The error Q is the tolerance of the numerical
integration. In this case, it was estimated as a square matrix where the diagonal is the integrator
relative tolerance setting (default = .0001) multiplied by the value of the predicted state
multiplied by the number of times the integrator was executed per measurement, in this case 10
(discussed later), yielding an overall factor of .001 times the state. Most numerical integrators
have both a relative and absolute termination criteria. Because of the scales of the explored
values (tens of thousands for the total market), the only termination criterion activated was the
relative tolerance. The integrator stopped when the integral had converged to the relative
tolerance multiplied by the value of the integral. This covariance changed with the estimated
state at each iteration and thereby included measurement information in the estimate of the state
11
covariance. The measurement error R is the square of the additive noise, in this case the noise
factor multiplied by the square of peak sales. The initial state covariance P was determined by
creating a uniform cloud of initial states centered on the initial state, integrating forward in time,
and then calculating the covariance of the cloud. This filter effectively achieved its purpose of
utilizing a less effective initial model to derive a more accurate final model. To that end, the
initial state was equal to the exact state plus a uniform-random perturbation with magnitude
within 25% of the exact.
The c/d-EKF estimator for the Bass model works as follows
Step 1) Initialize the state and covariances
(19) ˆ x 0 x0
exact 25% x0
exact
(20) P1
N 1xi x i
T
i 1
N
(21) R =2
(22) q j executions integrator tolerancej
parameter vectorj
(23) Q
q1
qm
2
Step 2) Predict the state and covariance
(24) tk,1...n
mid , ˆ x k,1...n
mid rk S p,q,m , tk 1,tk , ˆ x k 1
(25) ˆ x k, tk
ˆ x k,end
mid
12
(26) Pk rkdP
dt,tk,1...n
mid , ˆ x k,1...n
mid
(27) dP
dtF(x,t)PT PF(x,t) Q
(28) F(x,t)S y(t),u(t), t
xy(t ) y
Step 3) The equations for steps 3-5 are identical to the simple Kalman Filter and are
similarly iterated through the dataset.
The solver used in the filter was a 2nd
order Runga-Kutta method which was chosen because
of its speed. The differential form of the state covariance is taken from the Kalman-Bucy filter
(the continuous analog to the discrete simple Kalman filter) which allows the state covariance to
be wrapped in a numerical integrator. The number of time-steps per numerical integration per
measurement was set to ten and thus a good estimate of the error is ten times the integration
tolerance. It is essential that the prior state be used in the covariance update.
It is worth noting that this is essentially a first order method which integrates the covariance
forward in time. A second order method would integrate backwards from tk,n
mid , average the
covariances, and then integrate forward. This Heun-like second order method has a higher
computation-time and memory cost.
Although this method addresses some of the shortcomings of the simple Kalman filter, it still
requires precise knowledge of the covariances in order to work effectively. It is not a non-linear
estimator for state or state covariance, and it still presumes Gaussian covariances. A particle
filter, while able to overcome those shortcomings, is a significantly more complex undertaking.
13
Results for the c/d-EKF
The c/d-EKF was ensemble tested by computing the peak-time and total-market errors for
each of the 15 steps in the error vector at each of the 280 time-steps between the initial state and
peak sales. This gave an ensemble data size of 100x280x15 – 100 ensemble elements, 280 time
steps, and 15 error factors.
Several statistics including mean, median, standard deviation and the 1st and 99
th percentiles
of the ensemble volume are included in Table 1. The obtained values for mean and median were
not surprising since the distributions were centered. However, the sizes of the percentiles and
the standard deviation indicated that the system tended to converge over each application of the
filter to simulated data. About 98% of all peak prediction errors were less than 1.11% and total
market errors were less than 1.93%. From this information, it can be inferred that this tool is
displaying consistent convergent behavior during the simulation time, over the ensemble and for
a wide variety of noise scaling factors.
Figures 1 and 2 show the standard ensemble standard deviation as a function of time and
input error for the peak sales time and total market respectively. Note that for noise factors
under .1, the model error reduces significantly within the first two time-periods, and a second
convergence occurs in the second half of the dataset. Figure 3 shows a sample exact run with a
noise figure of approximately .5%. The associated plot of error in prediction of peak time is
shown in Figure 4. After time .28, the error was not greater than 4% - a relevant point
considering the apparent difference between signal and noise shown in Figure 3.
When the initial state covariance was randomly perturbed by a Gaussian noise scaled to the
magnitude of the covariance, the results were much more problematic (Figs. 5 and 6). Observe
14
that although the filter produces an improvement in measured sales values, the peak time
estimation error diverges and is at almost 100% error at the end of the process. There were terms
in the state covariance that are on the order of 107 and others that were 10
-6 resulting in a
condition number for the state covariance matrix of 1.7x1013
.
There are a number of issues which prevent this model from meeting our stated criteria.
Although the exact model with perfect information works very well, it requires knowledge of the
measurement correlation, which is not typically available a priori for the measurement
covariance R. R is not only unknown, but also often impossible to measure directly, although it
can be estimated using a Robbins-Monro method.
The Kalman assumptions about noise are not consistent with real-world noise. Real-world
noise terms would probably not be the zero mean, non-correlated noise assumed in the c/d-EKF.
As we have seen, even moderate noise in the sales data cause the c/d-EKF to diverge from the
curve underlying the noisy sales data.
In practice, sales data is sampled monthly and product lifetimes are on the order of 1 to 2
years – not the 23.3 years it would take to get 280 data points before the peak. Daily reporting
would be required to generate this number of data points, instead of the weekly or monthly
reporting common in business environments.
Finally, there are no predictive indicators of changes in the fundamental nature of a business
and, since these changes often occur quickly, an assumption of constant parameters, and
therefore linearity, is false. Under these circumstances, the Bass model that a business started
with can no longer be used as the final diffusion model for the product and a single Bass model
for the lifespan of the product cannot be assumed.
15
An effective tool for use in the real world must work without a measurement covariance
estimate. It must work with sample sizes on the order of 12 (monthly) to 52 (weekly) data points
associated with realistic data-sampling periods. It must be able to indicate and account for
changes in the fundamental nature of the market, and still predict the time of peak sales and total
market with acceptable accuracy. It should also be able to incorporate diffusion models other
than the Bass Model.
Model for c/d-EKF with RM-se
Wan and Nelson (2001) and Bishop (1995), in his section on sequential parameter
estimation, describe the use of a Robbins-Monro stochastic estimator applied to a Kalman filter
to iteratively estimate the covariance matrices and the specific details required to successfully
implement the estimator, respectively. Essentially, as a form of an iterative averaging tool, this
estimator allowed a perturbed estimate of R to be made to converge toward the exact value of R.
This method was found to converge toward the exact values over each iteration. Note that the
estimates for p, q, m and R resulting from one transit of the dataset were used as initial values for
a second iteration. This allowed continued successive convergence of the approximate tool
toward performance of the exact.
Describe Algorithm: c/d-EKF with RM-se
The data generation method, initial values for parameters and ensemble evaluation methods
described above were used to test the c/d-EKF with RM-se with the following exceptions:
The initial measurement covariance was uniform-randomly perturbed by ± 25% from
the exact value, and then the Robbins-Monro estimator was used to refine it.
16
The noise factor was varied from 10-3.5
to 10-0.5
using 35 logarithmically spaced
sample points.
The Robbins-Monro estimator was set up as follows:
Rk
e 1 k Rk 1
e
k KK yk Hˆ x k yk Hˆ x kT
KK
T
ak
1
k, round
1
52k 1
Where:
i
i 1
i
2
i 1
limi
0
Note that the superscript e indicates estimator. Kk is the Kalman gain, the error term
(y - Hx) is evaluated on the most current data and is the learning rate parameter.
Describe Results: c/d-EKF with RM-se
For use with all noise values and being limited to plausible business constraints, the standard
deviation of the ensemble decreased as the algorithm progressed as can be seen in Figures 7 and
8. Without knowledge of the measurement covariance, the estimates converged toward less error
for all cases, as shown by the average along the noise axis in Figure 9. In Figure 10, it can be
seen that the time-region of greatest rate of convergence in peak time estimate lies between t=2
and t=7 for most cases, with the highest noise terms this region is between t=4 and t=7, or
between 40% and 70% of the time until the peak sales rate is reached. For the total market
17
estimate the region of greatest convergence rate is more parabolic in noise factor, as seen in
Figure 11.
In practice, this estimator should be repeatedly evaluated over the dataset with the final
parameter estimates serving as the input values for the next cycle. When the mean change in the
norm of the estimated covariance R fell below a certain threshold between cycles, the process
was considered convergent. Some convergence is achieved on each cycle, but reasonable
convergence was not achieved until after several cycles, often less than 20, even with some
extreme initial errors in state estimates and in measurement covariance.
Conclusions
The c/d-EKF applied to the Bass Model was evaluated. Given accurate covariance values
and acceptable sample rates, this method was able to rapidly and effectively predict peak sales
and total market. Given slightly perturbed covariance values, this performance is lost.
A Robbins-Monro extension to the c/d-EKF applied to the Bass Model was presented and
shown to be able to predict peak sales and total market with real-world constraints of unknown
covariance, poor initial conditions and limited sampling rates.
Some directions of further research include: using the Kiefer - Wolfowitz estimators instead
of Robbins-Monroe, examining how different learning rates estimators for measurement
covariance perform as indicators for fundamental change in the nature of the market, application
to other diffusion models, such as those indicated in the paper by Meade and Islam (1998), and
using dual-EKF with Rauch-Tung-Streibel smoothing, and using Particle Filters in place of the
EKF. Also of interest is the exploration of the use of windowed datasets to determine time-
18
varying market uncertainty.
19
References
Bass, Frank (1969), “A new product growth model for consumer durables,” Management
Science, 15 (January), 215-27.
Bishop, Christopher M. (1995), Neural Networks for Pattern Recognition. New York: Oxford
University Press.
De Castro, Leandro Nunes (2006), Fundamentals of Natural Computing: Basic Concepts,
Algorithms, and Applications. Boca Raton: Chapman & Hall/CRC Press.
Evensen, Geir (2004), “Sampling strategies and square root analysis schemes for the EnKF,”
Ocean Dynamics, 54 (December), 539-60.
Hagan, Martin, Howard Demuth, and Mark Beale (1996), Neural Network Design. Boston: PWS
Publishing Co.
Wan, Eric and Alex Nelson (2001), “Dual Extended Kalman Filter Methods,” in Kalman
Filtering and Neural Networks, Simon S. Haykin ed. New York: John Wiley & Sons, 123-73.
20
Institute of Electrical and Electronics Engineers (1974), IEEE Annual Banquet Booklet,
(accessed 30, 2008), [available at
http://www.ieee.org/web/aboutus/history_center/biography/kalman.html]
Kalman, Rudolf R. (1960), “A new approach to linear filtering and prediction problems,”
Journal of Basic Engineering, 34 (March), 35-45.
Krane, John (2005), “A Physicist on Wall Street,” (accessed January 2, 2009), [available at
http://www.fnal.gov/orgs/fermilab_users_org/Krane.ppt]
Meade, Nigel and Towhidul Islam (1998), “Technological Forecasting – Model Selection, Model
Stability, and Combining Models,” Management Science, 44 (August), 1115-30.
Welch, Greg and Gary Bishop (2006), “An Introduction to the Kalman Filter,” Technical Report
95-041, Department of Computer Science, University of North Carolina at Chapel Hill.
Xie, Jinhong, X. Michael Song, Marvin Sirbu, and Qiong Wang (1997), “Kalman Filter
Estimation of New Product Diffusion Models,” Journal of Marketing Research, 34 (August),
378-93.
21
Table
0
Table 1 – Statistics of c/d EKF filtering ensemble.
22
Figures
Figure 1 – Surface of Ensemble Standard Deviation of Peak Time Prediction Error as a function
of time (peak at 8.79) and log10 of noise factor.
23
Figure 2 – Surface of Ensemble Standard Deviation of Total Market prediction Error as a
function of time (peak at 8.79) and log10 of noise factor.
Figure 3 – Sample run of c/d-EKF with noise factor of .5% and 280 samples with Exact,
corrupted, and filtered data above and the filter error below.
24
Figure 4 – Sample run of c/d-EKF with noise factor of .5% and 280 samples showing log10 of
peak time prediction error.
25
Figure 5 – Sample run of c/d-EKF with noise factor of .5% and 280 samples, but with “Q”
perturbed, showing sales data filtering.
Figure 6 – Sample run of c/d-EKF with noise factor of .5% and 280 samples, but with “Q”
perturbed, showing log10 of peak time prediction error divergence.
26
Figure 7 – Surface of Ensemble Standard Deviation of Peak Time Prediction Error as a function
of time (peak at 8.79) and log10 of noise factor for c/d-EKF RM-se.
Figure 8 – Surface of Ensemble Standard Deviation of Total Market Prediction Error as a
function of time (peak at 8.79) and log10 of noise factor for c/d-EKF RM-se.
27
Figure 9 – Mean over Surface of Ensemble Standard Deviation of Prediction Error as a function
of time (peak at 8.79) for c/d-EKF RM-se.
Figure 10 – Gradient over Surface of Ensemble Standard Deviation of Peak Time Prediction
28
Error as a function of time (peak at 8.79) for c/d-EKF RM-se.
Figure 11 – Gradient over Surface of Ensemble Standard Deviation of Market Prediction Error
as a function of time (peak at 8.79) for c/d-EKF RM-se.