Rob J Hyndman
Automatic algorithms for time series forecasting
Outline
1 Motivation
2 Forecasting competitions
3 Exponential smoothing
4 ARIMA modelling
5 Automatic nonlinear forecasting?
6 Time series with complex seasonality
7 Hierarchical and grouped time series
8 Recent developments
Motivation

1 Common in business to have over 1000 products that need forecasting at least monthly.
2 Forecasts are often required by people who are untrained in time series analysis.

Specifications
Automatic forecasting algorithms must:
- determine an appropriate time series model;
- estimate the parameters;
- compute the forecasts with prediction intervals.
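In R, the ets() and forecast() functions in the forecast package (used throughout these slides) cover these three steps; a minimal sketch, using the built-in AirPassengers series purely for illustration:

library(forecast)
fit <- ets(AirPassengers)     # 1. choose an appropriate ETS model automatically
summary(fit)                  # 2. inspect the estimated parameters
fc <- forecast(fit, h = 24)   # 3. forecasts with prediction intervals
plot(fc)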
Example: Asian sheep

[Figure: Numbers of sheep in Asia, 1960-2010; y-axis in millions of sheep.]

[Figure: Automatic ETS forecasts for the same series.]
Example: Corticosteroid sales

[Figure: Monthly corticosteroid drug sales in Australia, 1995-2010; y-axis: total scripts (millions).]

[Figure: Forecasts from ARIMA(3,1,3)(0,1,1)[12] for the same series.]
Forecasting competitions
Makridakis and Hibon (1979)

This was the first large-scale empirical evaluation of time series forecasting methods. Highly controversial at the time.

Difficulties:
- How to measure forecast accuracy?
- How to apply methods consistently and objectively?
- How to explain unexpected results?

Common thinking was that the more sophisticated mathematical models (ARIMA models at the time) were necessarily better. If results showed ARIMA models were not best, it must be because the analyst was unskilled.
Makridakis and Hibon (1979)

"It is amazing to me, however, that after all this exercise in identifying models, transforming and so on, that the autoregressive moving averages come out so badly. I wonder whether it might be partly due to the authors not using the backwards forecasting approach to obtain the initial errors."
-- W.G. Gilchrist

"I find it hard to believe that Box-Jenkins, if properly applied, can actually be worse than so many of the simple methods . . . these authors are more at home with simple procedures than with Box-Jenkins."
-- C. Chatfield
Consequences of M&H (1979)

As a result of this paper, researchers started to:
- consider how to automate forecasting methods;
- study what methods give the best forecasts;
- be aware of the dangers of over-fitting;
- treat forecasting as a different problem from time series analysis.

Makridakis & Hibon followed up with a new competition in 1982:
- 1001 series
- Anyone could submit forecasts (avoiding the charge of incompetence)
- Multiple forecast measures used.
M-competition

Main findings (taken from Makridakis & Hibon, 2000)
1 Statistically sophisticated or complex methods do not necessarily provide more accurate forecasts than simpler ones.
2 The relative ranking of the performance of the various methods varies according to the accuracy measure being used.
3 The accuracy when various methods are being combined outperforms, on average, the individual methods being combined and does very well in comparison to other methods.
4 The accuracy of the various methods depends upon the length of the forecasting horizon involved.
M3 competition

Makridakis and Hibon (2000)
"The M3-Competition is a final attempt by the authors to settle the accuracy issue of various time series methods . . . The extension involves the inclusion of more methods/researchers (in particular in the areas of neural networks and expert systems) and more series."

- 3003 series
- All data from business, demography, finance and economics.
- Series length between 14 and 126.
- Either non-seasonal, monthly or quarterly.
- All time series positive.
- M&H claimed that the M3-competition supported the findings of their earlier work.
- However, the best performing methods were far from "simple".
Makridakis and Hibon (2000): Best methods

Theta
- A very confusing explanation.
- Shown by Hyndman and Billah (2003) to be the average of linear regression and simple exponential smoothing with drift, applied to seasonally adjusted data.
- Later, the original authors claimed that their explanation was incorrect.

Forecast Pro
- A commercial software package with an unknown algorithm.
- Known to fit either exponential smoothing or ARIMA models using BIC.
M3 results (recalculated)

Method          MAPE   sMAPE   MASE
Theta           17.42  12.76   1.39
ForecastPro     18.00  13.06   1.47
ForecastX       17.35  13.09   1.42
Automatic ANN   17.18  13.98   1.53
B-J automatic   19.13  13.72   1.54

- Calculations do not match the published paper.
- Some contestants apparently submitted multiple entries but only the best ones were published.
Exponential smoothing
Exponential smoothing methods

                             Seasonal Component
Trend Component              N (None)   A (Additive)   M (Multiplicative)
N  (None)                    N,N        N,A            N,M
A  (Additive)                A,N        A,A            A,M
Ad (Additive damped)         Ad,N       Ad,A           Ad,M
M  (Multiplicative)          M,N        M,A            M,M
Md (Multiplicative damped)   Md,N       Md,A           Md,M

N,N:  Simple exponential smoothing
A,N:  Holt's linear method
Ad,N: Additive damped trend method
M,N:  Exponential trend method
Md,N: Multiplicative damped trend method
A,A:  Additive Holt-Winters' method
A,M:  Multiplicative Holt-Winters' method

- There are 15 separate exp. smoothing methods.
- Each can have an additive or multiplicative error, giving 30 separate models.
- Only 19 models are numerically stable.
- Multiplicative trend models give poor forecasts, leaving 15 models.

General notation ETS(Error, Trend, Seasonal): ExponenTial Smoothing

Examples:
A,N,N: Simple exponential smoothing with additive errors
A,A,N: Holt's linear method with additive errors
M,A,M: Multiplicative Holt-Winters' method with multiplicative errors
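As a sketch of how this notation is used in software, the ets() function in R's forecast package accepts the same three-letter specification; the series choice here is illustrative:

library(forecast)
# Force a specific model from the taxonomy rather than selecting automatically:
fit <- ets(AirPassengers, model = "MAM")   # M,A,M: multiplicative Holt-Winters
summary(fit)                               #        with multiplicative errors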
Innovations state space models

- All ETS models can be written in innovations state space form (IJF, 2002).
- Additive and multiplicative versions give the same point forecasts but different prediction intervals.

ETS state space model
[Diagram: the state x_t = (level, slope, seasonal) evolves recursively; each error ε_t combines with x_{t-1} to produce the observation y_t and the updated state x_t, and so on for y_{t+1}, y_{t+2}, ...]

Estimation
- Compute the likelihood L from ε_1, ε_2, ..., ε_T.
- Optimize L with respect to the model parameters.
Innovations state space models

Let $x_t = (\ell_t, b_t, s_t, s_{t-1}, \ldots, s_{t-m+1})$ and $\varepsilon_t \overset{\text{iid}}{\sim} \text{N}(0, \sigma^2)$.

$y_t = \underbrace{h(x_{t-1})}_{\mu_t} + \underbrace{k(x_{t-1})\,\varepsilon_t}_{e_t}$   (observation equation)

$x_t = f(x_{t-1}) + g(x_{t-1})\,\varepsilon_t$   (state equation)

Additive errors: $k(x_{t-1}) = 1$, so $y_t = \mu_t + \varepsilon_t$.
Multiplicative errors: $k(x_{t-1}) = \mu_t$, so $y_t = \mu_t(1 + \varepsilon_t)$, and $\varepsilon_t = (y_t - \mu_t)/\mu_t$ is a relative error.
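To make the two equations concrete, here is a minimal R sketch simulating from the simplest case, ETS(A,N,N), for which h(x) = ℓ, k = 1, f(ℓ) = ℓ and g = α; the parameter values are illustrative assumptions:

set.seed(1)
n <- 100; alpha <- 0.3
level <- 10                      # initial state l_0
y <- numeric(n)
for (t in 1:n) {
  e <- rnorm(1, sd = 0.5)        # innovation eps_t
  y[t] <- level + e              # observation equation: y_t = l_{t-1} + eps_t
  level <- level + alpha * e     # state equation: l_t = l_{t-1} + alpha * eps_t
}
plot(ts(y))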
Innovations state space models

- All models can be written in state space form.
- Additive and multiplicative versions give the same point forecasts but different prediction intervals.

Estimation

$L^*(\theta, x_0) = n \log\!\left( \sum_{t=1}^{n} \varepsilon_t^2 / k^2(x_{t-1}) \right) + 2 \sum_{t=1}^{n} \log |k(x_{t-1})| = -2\log(\text{Likelihood}) + \text{constant}$

Minimize with respect to $\theta = (\alpha, \beta, \gamma, \phi)$ and initial states $x_0 = (\ell_0, b_0, s_0, s_{-1}, \ldots, s_{-m+1})$.
Q: How to choose between the 15 useful ETS models?

Traditional evaluation
[Diagram: a single split of the series into training data followed by test data.]

Standard cross-validation
[Diagram: test observations selected at random throughout the series, the rest used for training.]

Time series cross-validation
[Diagram: the forecast origin rolls forward; each training set consists of the observations up to the origin, and the following observation is the test case.]
Also known as "evaluation on a rolling forecast origin".
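A minimal base-R sketch of evaluation on a rolling forecast origin with one-step ETS forecasts; the series and the minimum training length are illustrative assumptions:

library(forecast)
y <- AirPassengers
k <- 48                                   # minimum training length
n <- length(y)
e <- rep(NA, n)                           # one-step forecast errors
for (i in k:(n - 1)) {
  train <- window(y, end = time(y)[i])    # observations 1..i
  fc <- forecast(ets(train), h = 1)
  e[i + 1] <- y[i + 1] - fc$mean[1]
}
sqrt(mean(e^2, na.rm = TRUE))             # CV one-step RMSE

(Recent versions of the forecast package also provide tsCV() to automate this loop.)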
Akaike's Information Criterion

$\text{AIC} = -2\log(L) + 2k$

where $L$ is the likelihood and $k$ is the number of estimated parameters in the model.

- This is a penalized likelihood approach.
- If $L$ is Gaussian, then $\text{AIC} \approx c + T\log(\text{MSE}) + 2k$, where $c$ is a constant, MSE is from one-step forecasts on the training set, and $T$ is the length of the series.
- Minimizing the Gaussian AIC is asymptotically equivalent (as $T \to \infty$) to minimizing the MSE from one-step forecasts on the test set via time series cross-validation.
Akaike's Information Criterion

$\text{AIC} = -2\log(L) + 2k$

Corrected AIC
For small $T$, the AIC tends to over-fit. The bias-corrected version is
$\text{AIC}_\text{C} = \text{AIC} + \frac{2(k+1)(k+2)}{T-k}$

Bayesian Information Criterion

$\text{BIC} = \text{AIC} + k[\log(T) - 2]$

- BIC penalizes terms more heavily than AIC.
- Minimizing BIC is consistent if there is a true model.
What to use?

Choice: AIC, AICc, BIC, CV-MSE

- CV-MSE is too time consuming for most automatic forecasting purposes. It also requires large T.
- As T → ∞, BIC selects the true model if there is one. But that is never true!
- AICc focuses on forecasting performance, can be used on small samples and is very fast to compute.
- Empirical studies in forecasting show AIC is better than BIC for forecast accuracy.
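In practice the comparison is one line; a sketch using the AICc values stored on fitted ets objects (model strings as in the taxonomy above; the series is illustrative):

library(forecast)
fit1 <- ets(AirPassengers, model = "AAA")  # additive Holt-Winters, additive errors
fit2 <- ets(AirPassengers, model = "MAM")  # multiplicative Holt-Winters
c(AAA = fit1$aicc, MAM = fit2$aicc)        # prefer the smaller AICc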
ets algorithm in R

Based on Hyndman, Koehler, Snyder & Grose (IJF 2002):

- Apply each of the 15 models that are appropriate to the data. Optimize parameters and initial values using MLE.
- Select the best method using AICc.
- Produce forecasts using the best method.
- Obtain prediction intervals using the underlying state space model.
Exponential smoothing

fit <- ets(livestock)
fcast <- forecast(fit)
plot(fcast)

[Figure: Forecasts from ETS(M,A,N) for the Asian sheep series (millions of sheep), 1960-2010.]
Exponential smoothing

fit <- ets(h02)
fcast <- forecast(fit)
plot(fcast)

[Figure: Forecasts from ETS(M,N,M) for monthly corticosteroid drug sales (total scripts, millions), 1995-2010.]
Exponential smoothing

> fit
ETS(M,N,M)

Smoothing parameters:
  alpha = 0.4597
  gamma = 1e-04

Initial states:
  l = 0.4501
  s = 0.8628 0.8193 0.7648 0.7675 0.6946 1.2921
      1.3327 1.1833 1.1617 1.0899 1.0377 0.9937

sigma: 0.0675

       AIC       AICc        BIC
-115.69960 -113.47738  -69.24592
M3 comparisons

Method          MAPE   sMAPE   MASE
Theta           17.42  12.76   1.39
ForecastPro     18.00  13.06   1.47
ForecastX       17.35  13.09   1.42
Automatic ANN   17.18  13.98   1.53
B-J automatic   19.13  13.72   1.54
ETS             17.38  13.13   1.43
Exponential smoothing

Further details: www.OTexts.org/fpp
ARIMA modelling
ARIMA models

[Diagram: inputs y_{t-1}, y_{t-2}, y_{t-3} feeding the output y_t gives an autoregression (AR) model; adding lagged error inputs ε_t, ε_{t-1}, ε_{t-2} gives an autoregression moving average (ARMA) model.]

Estimation
- Compute the likelihood L from ε_1, ε_2, ..., ε_T.
- Use an optimization algorithm to maximize L.

ARIMA model
An autoregression moving average (ARMA) model applied to differences.
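A base-R sketch of that estimation step: simulate an ARMA-family process, then recover it by maximizing the likelihood (the AR coefficients are arbitrary illustrative values):

set.seed(42)
y <- arima.sim(model = list(ar = c(0.6, -0.3)), n = 200)  # simulate an AR(2)
fit <- arima(y, order = c(2, 0, 0))  # ML estimation maximizes the likelihood
fit$coef                             # ar1, ar2 should be near 0.6 and -0.3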
Auto ARIMA

fit <- auto.arima(livestock)
fcast <- forecast(fit)
plot(fcast)

[Figure: Forecasts from ARIMA(0,1,0) with drift for the Asian sheep series (millions of sheep), 1960-2010.]
Auto ARIMA

fit <- auto.arima(h02)
fcast <- forecast(fit)
plot(fcast)

[Figure: Forecasts from ARIMA(3,1,3)(0,1,1)[12] for monthly corticosteroid drug sales (total scripts, millions), 1995-2010.]
Auto ARIMA

> fit
Series: h02
ARIMA(3,1,3)(0,1,1)[12]

Coefficients:
          ar1      ar2     ar3      ma1     ma2     ma3     sma1
      -0.3648  -0.0636  0.3568  -0.4850  0.0479  -0.353  -0.5931
s.e.   0.2198   0.3293  0.1268   0.2227  0.2755   0.212   0.0651

sigma^2 estimated as 0.002706: log likelihood=290.25
AIC=-564.5  AICc=-563.71  BIC=-538.48
How does auto.arima() work?

A non-seasonal ARIMA process:
$\phi(B)(1-B)^d y_t = c + \theta(B)\varepsilon_t$

Need to select appropriate orders $p, q, d$, and whether to include $c$.

A seasonal ARIMA process:
$\Phi(B^m)\,\phi(B)\,(1-B)^d(1-B^m)^D y_t = c + \Theta(B^m)\,\theta(B)\,\varepsilon_t$

Need to select appropriate orders $p, q, d, P, Q, D$, and whether to include $c$.

Hyndman & Khandakar (JSS, 2008) algorithm:
- Select the number of differences d via the KPSS unit root test.
- Select D using the OCSB unit root test.
- Select p, q, P, Q, c by minimising the AICc.
- Use a stepwise search to traverse the model space, starting with a simple model and considering nearby variants.

Algorithm choices driven by forecast accuracy.
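A sketch of invoking the search; stepwise = FALSE and approximation = FALSE are auto.arima arguments that trade speed for a more thorough search (the series is assumed loaded as in the earlier examples):

library(forecast)
fit1 <- auto.arima(h02)                    # default stepwise search
fit2 <- auto.arima(h02, stepwise = FALSE,  # search over a much larger set of
                   approximation = FALSE)  #   orders; considerably slower
c(fit1$aicc, fit2$aicc)                    # compare the selected models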
M3 comparisons

Method          MAPE   sMAPE   MASE
Theta           17.42  12.76   1.39
ForecastPro     18.00  13.06   1.47
B-J automatic   19.13  13.72   1.54
ETS             17.38  13.13   1.43
AutoARIMA       19.12  13.85   1.47
Automatic nonlinear forecasting?
Automatic nonlinear forecasting

- Automatic ANN in the M3 competition did poorly.
- Linear methods did best in the NN3 competition!
- Very few machine learning methods get published in the IJF because authors cannot demonstrate that their methods give better forecasts than linear benchmark methods, even on supposedly nonlinear data.
- Some good recent work by Kourentzes and Crone on automated ANN for time series.
- Watch this space!
Time series with complex seasonality
Examples

[Figure: US finished motor gasoline products, weekly, 1992-2004 (thousands of barrels per day).]

[Figure: Number of calls to a large American bank (7am-9pm), 5-minute intervals, 3 March to 12 May.]

[Figure: Turkish electricity demand, daily, 2000-2008 (GW).]
TBATS model

TBATS:
- Trigonometric terms for seasonality
- Box-Cox transformations for heterogeneity
- ARMA errors for short-term dynamics
- Trend (possibly damped)
- Seasonal (including multiple and non-integer periods)

The automatic algorithm is described in AM De Livera, RJ Hyndman and RD Snyder (2011). "Forecasting time series with complex seasonal patterns using exponential smoothing". Journal of the American Statistical Association 106(496), 1513-1527.
TBATS model

$y_t$ = observation at time $t$

Box-Cox transformation:
$y_t^{(\omega)} = \begin{cases} (y_t^{\omega} - 1)/\omega & \text{if } \omega \neq 0; \\ \log y_t & \text{if } \omega = 0. \end{cases}$

Measurement equation, with $M$ seasonal periods:
$y_t^{(\omega)} = \ell_{t-1} + \phi b_{t-1} + \sum_{i=1}^{M} s_{t-m_i}^{(i)} + d_t$

Global and local trend:
$\ell_t = \ell_{t-1} + \phi b_{t-1} + \alpha d_t$
$b_t = (1-\phi)b + \phi b_{t-1} + \beta d_t$

ARMA error:
$d_t = \sum_{i=1}^{p} \phi_i d_{t-i} + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j} + \varepsilon_t$

Fourier-like seasonal terms:
$s_t^{(i)} = \sum_{j=1}^{k_i} s_{j,t}^{(i)}$
$s_{j,t}^{(i)} = s_{j,t-1}^{(i)} \cos \lambda_j^{(i)} + s_{j,t-1}^{*(i)} \sin \lambda_j^{(i)} + \gamma_1^{(i)} d_t$
$s_{j,t}^{*(i)} = -s_{j,t-1}^{(i)} \sin \lambda_j^{(i)} + s_{j,t-1}^{*(i)} \cos \lambda_j^{(i)} + \gamma_2^{(i)} d_t$

(TBATS: Trigonometric, Box-Cox, ARMA, Trend, Seasonal.)
Examples

fit <- tbats(gasoline)
fcast <- forecast(fit)
plot(fcast)

[Figure: Forecasts from TBATS(0.999, {2,2}, 1, {<52.1785714285714,8>}) for US finished motor gasoline products.]

fit <- tbats(callcentre)
fcast <- forecast(fit)
plot(fcast)

[Figure: Forecasts from TBATS(1, {3,1}, 0.987, {<169,5>, <845,3>}) for the call centre arrivals.]

fit <- tbats(turk)
fcast <- forecast(fit)
plot(fcast)

[Figure: Forecasts from TBATS(0, {5,3}, 0.997, {<7,3>, <354.37,12>, <365.25,4>}) for Turkish electricity demand.]
Hierarchical and grouped time series
Hierarchical time series

A hierarchical time series is a collection of several time series that are linked together in a hierarchical structure.

[Diagram: Total disaggregates into A, B and C; A into AA, AB, AC; B into BA, BB, BC; C into CA, CB, CC.]

Examples
- Net labour turnover
- Tourism by state and region
Hierarchical time series

[Diagram: Total disaggregates into A, B and C.]

- $Y_t$: observed aggregate of all series at time $t$.
- $Y_{X,t}$: observation on series $X$ at time $t$.
- $b_t$: vector of all series at the bottom level in time $t$.

$y_t = [Y_t, Y_{A,t}, Y_{B,t}, Y_{C,t}]' = \underbrace{\begin{bmatrix} 1 & 1 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}}_{S} \underbrace{\begin{bmatrix} Y_{A,t} \\ Y_{B,t} \\ Y_{C,t} \end{bmatrix}}_{b_t}$

so $y_t = S b_t$.
Hierarchical time series

[Diagram: Total disaggregates into A, B and C; A into AX, AY, AZ; B into BX, BY, BZ; C into CX, CY, CZ.]

$y_t = [Y_t, Y_{A,t}, Y_{B,t}, Y_{C,t}, Y_{AX,t}, Y_{AY,t}, Y_{AZ,t}, Y_{BX,t}, Y_{BY,t}, Y_{BZ,t}, Y_{CX,t}, Y_{CY,t}, Y_{CZ,t}]'$

$y_t = \underbrace{\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}}_{S}
\underbrace{\begin{bmatrix} Y_{AX,t} \\ Y_{AY,t} \\ \vdots \\ Y_{CZ,t} \end{bmatrix}}_{b_t}$

yt = Sbt
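A sketch of building and forecasting such a hierarchy with the hts package (assumed installed; the simulated bottom-level data are purely illustrative):

library(hts)
set.seed(7)
bts <- ts(matrix(rnorm(108, mean = 10), ncol = 9),
          frequency = 4)                    # 9 bottom-level series: AX ... CZ
y <- hts(bts, nodes = list(3, c(3, 3, 3)))  # Total -> A,B,C -> three children each
allts(y)                                    # all series, aggregated via S
fc <- forecast(y, h = 8, fmethod = "ets")   # base ETS forecasts, then reconciled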
Forecasting notation

Let $\hat{y}_n(h)$ be the vector of initial ("base") h-step-ahead forecasts, made at time n and stacked in the same order as $y_t$. (They may not add up.)

Reconciled forecasts are of the form
$$\tilde{y}_n(h) = S P \hat{y}_n(h)$$
for some matrix P.

- P extracts and combines the base forecasts $\hat{y}_n(h)$ to get bottom-level forecasts (see the sketch below);
- S adds them up.
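As a concrete special case, bottom-up forecasting takes P = [0 | I], which simply picks out the bottom-level base forecasts. A small illustration (the forecast numbers are invented), using the three-series hierarchy from earlier:

```r
# Bottom-up as a special case of the reconciliation y~_n(h) = S P y^_n(h):
# P = [0 | I] discards the Total and keeps the bottom-level base forecasts.
S <- rbind(c(1, 1, 1), diag(3))  # hierarchy: Total over A, B, C
P <- cbind(0, diag(3))           # drop the aggregate, keep A, B, C
yhat <- c(102, 30, 33, 35)       # base forecasts -- note 30 + 33 + 35 != 102
S %*% P %*% yhat                 # reconciled: Total is now 98 = 30 + 33 + 35
all(S %*% P %*% S == S)          # the unbiasedness condition SPS = S holds
```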
General properties

$$\tilde{y}_n(h) = S P \hat{y}_n(h)$$

Forecast bias: assuming the base forecasts $\hat{y}_n(h)$ are unbiased, the revised forecasts are unbiased iff $SPS = S$.

Forecast variance: for any given P satisfying $SPS = S$, the covariance matrix of the h-step-ahead reconciled forecast errors is
$$\operatorname{Var}[y_{n+h} - \tilde{y}_n(h)] = S P W_h P' S',$$
where $W_h$ is the covariance matrix of the h-step-ahead base forecast errors.
BLUF via trace minimization

Theorem. Among all matrices P satisfying $SPS = S$,
$$\min_P \; \operatorname{trace}\left[S P W_h P' S'\right]$$
has solution $P = (S' W_h^\dagger S)^{-1} S' W_h^\dagger$, where $W_h^\dagger$ is a generalized inverse of $W_h$.

Hence the revised forecasts are obtained from the base forecasts via
$$\tilde{y}_n(h) = S (S' W_h^\dagger S)^{-1} S' W_h^\dagger \, \hat{y}_n(h).$$

This is equivalent to the GLS estimate of the regression $\hat{y}_n(h) = S \beta_n(h) + \varepsilon_h$, where $\varepsilon_h \sim N(0, W_h)$.

Problem: $W_h$ is hard to estimate.
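To make the algebra concrete, here is a minimal hand-rolled sketch of the formula. A toy $W_h$, assumed known, stands in for the hard-to-estimate covariance matrix, and MASS::ginv plays the role of the generalized inverse:

```r
# Trace-minimising (GLS) reconciliation, computed directly:
#   y~_n(h) = S (S' W† S)^{-1} S' W† y^_n(h)
library(MASS)  # for ginv()

S <- rbind(c(1, 1, 1), diag(3))  # Total over A, B, C
yhat <- c(102, 30, 33, 35)       # base forecasts that do not add up
W <- diag(c(4, 1, 1, 1))         # toy stand-in for W_h, assumed known here

Winv <- ginv(W)                  # generalized inverse W†
P <- solve(t(S) %*% Winv %*% S) %*% t(S) %*% Winv
drop(S %*% P %*% yhat)           # reconciled forecasts now satisfy y = S b
```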
Optimal combination forecasts

$$\tilde{y}_n(h) = S (S' W_h^\dagger S)^{-1} S' W_h^\dagger \, \hat{y}_n(h)$$
(revised forecasts on the left, base forecasts on the right).

Solution 1: OLS. Replace $W_h$ with the identity:
$$\tilde{y}_n(h) = S (S'S)^{-1} S' \hat{y}_n(h).$$

Solution 2: WLS. Approximate $W_1$ by its diagonal and assume $W_h = k_h W_1$. This is easy to estimate, and places weight where we have the best one-step forecasts:
$$\tilde{y}_n(h) = S (S' \Lambda S)^{-1} S' \Lambda \, \hat{y}_n(h),$$
where $\Lambda$ carries the inverse one-step forecast error variances on its diagonal.
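Continuing the earlier hand-rolled sketch (same S and yhat), the two practical solutions just change the weight matrix; the diagonal entries of Lambda below are invented for illustration:

```r
# OLS: act as if W_h were the identity.
P_ols <- solve(t(S) %*% S) %*% t(S)

# WLS: Lambda carries inverse one-step forecast error variances.
Lambda <- diag(1 / c(4, 1, 1, 1))  # assumed variances, for illustration only
P_wls <- solve(t(S) %*% Lambda %*% S) %*% t(S) %*% Lambda

drop(S %*% P_ols %*% yhat)  # OLS-reconciled forecasts
drop(S %*% P_wls %*% yhat)  # WLS-reconciled forecasts
```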
Challenges

$$\tilde{y}_n(h) = S (S' \Lambda S)^{-1} S' \Lambda \, \hat{y}_n(h)$$

- Computational difficulties in big hierarchies, due to the size of the S matrix and the singular behavior of $(S' \Lambda S)$.
- Loss of information from ignoring the covariance matrix when computing point forecasts.
- The covariance matrix must still be estimated to produce prediction intervals.
Australian tourism

[Figure]

Hierarchy:
- States (7)
- Zones (27)
- Regions (82)

Base forecasts: ETS (exponential smoothing) models.
Base forecasts

[Figures: domestic tourism base forecasts (visitor nights vs year, 1998-2008) for Total, NSW, VIC, Nth.Coast.NSW, Metro.QLD, Sth.WA, X201.Melbourne, X402.Murraylands and X809.Daly]
Reconciled forecasts

[Figures: reconciled domestic tourism forecasts (2000-2010) for the Total; for NSW, VIC, QLD and Other; and for Sydney, Other NSW, Melbourne, Other VIC, GC and Brisbane, Other QLD, Capital cities and Other]
Forecast evaluation

- Select models using all observations.
- Re-estimate models using the first 12 observations and generate 1- to 8-step-ahead forecasts.
- Increase the sample size one observation at a time, re-estimate the models and generate forecasts, until the end of the sample.
- In total this yields 24 one-step-ahead forecasts, 23 two-steps-ahead, and so on down to 17 eight-steps-ahead for forecast evaluation (see the sketch below).
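A minimal sketch of this rolling-origin scheme for a single series (ETS as the base method, simulated data; with n = 36 the counts match those above, from 24 one-step-ahead errors down to 17 eight-steps-ahead):

```r
# Rolling-origin evaluation: expand the sample one observation at a time,
# re-estimate, and collect 1- to 8-step-ahead forecast errors.
library(forecast)

y <- ts(100 + rnorm(36), frequency = 4)  # stand-in series, n = 36
h_max <- 8
errors <- matrix(NA, nrow = 24, ncol = h_max)

for (i in 12:35) {
  fit <- ets(window(y, end = time(y)[i]))  # re-estimate on first i obs
  h <- min(h_max, 36 - i)
  fc <- forecast(fit, h = h)
  errors[i - 11, 1:h] <- y[(i + 1):(i + h)] - fc$mean
}

colMeans(abs(errors), na.rm = TRUE)  # accuracy by forecast horizon
```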
Hierarchy: states, zones, regions

MAPE                   h=1     h=2     h=4     h=6     h=8     Average
Top Level: Australia
  Bottom-up            3.79    3.58    4.01    4.55    4.24    4.06
  OLS                  3.83    3.66    3.88    4.19    4.25    3.94
  WLS                  3.68    3.56    3.97    4.57    4.25    4.04
Level: States
  Bottom-up           10.70   10.52   10.85   11.46   11.27   11.03
  OLS                 11.07   10.58   11.13   11.62   12.21   11.35
  WLS                 10.44   10.17   10.47   10.97   10.98   10.67
Level: Zones
  Bottom-up           14.99   14.97   14.98   15.69   15.65   15.32
  OLS                 15.16   15.06   15.27   15.74   16.15   15.48
  WLS                 14.63   14.62   14.68   15.17   15.25   14.94
Bottom Level: Regions
  Bottom-up           33.12   32.54   32.26   33.74   33.96   33.18
  OLS                 35.89   33.86   34.26   36.06   37.49   35.43
  WLS                 31.68   31.22   31.08   32.41   32.77   31.89
hts package for R

hts: Hierarchical and grouped time series
Methods for analysing and forecasting hierarchical and grouped time series.

Version: 4.5
Depends: forecast (≥ 5.0), SparseM
Imports: parallel, utils
Published: 2014-12-09
Author: Rob J Hyndman, Earo Wang and Alan Lee
Maintainer: Rob J Hyndman <Rob.Hyndman at monash.edu>
BugReports: https://github.com/robjhyndman/hts/issues
License: GPL (≥ 2)
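A hedged usage sketch (argument names as in hts 4.5; "comb" selects the optimal-combination approach discussed above, with ETS base forecasts):

```r
library(hts)

# Reconciled forecasts for the hts object `y` built earlier: ETS base
# forecasts at every node, combined via the optimal-combination method.
fc <- forecast(y, h = 8, method = "comb", fmethod = "ets")

aggts(fc)                   # reconciled forecasts at all levels
plot(fc, levels = c(0, 1))  # Total and first-level forecasts
```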
Outline
1 Motivation
2 Forecasting competitions
3 Exponential smoothing
4 ARIMA modelling
5 Automatic nonlinear forecasting?
6 Time series with complex seasonality
7 Hierarchical and grouped time series
8 Recent developments
Further competitions

1 2011 tourism forecasting competition.
2 Kaggle and other forecasting platforms.
3 GEFCom 2012: point forecasting of electricity load and wind power.
4 GEFCom 2014: probabilistic forecasting of electricity load, electricity price, wind energy and solar energy.
Forecasts about forecasting

1 Automatic algorithms will become more general, handling a wide variety of time series.
2 Model selection methods will take account of multi-step forecast accuracy as well as one-step forecast accuracy.
3 Automatic forecasting algorithms for multivariate time series will be developed.
4 Automatic forecasting algorithms that include covariate information will be developed.
For further information

robjhyndman.com
- Slides and references for this talk.
- Links to all papers and books.
- Links to R packages.
- A blog about forecasting research.