Forecasting

Page 1: Forecasting

Forecasting

• Purpose is to forecast, not to explain the historical pattern

• Models for forecasting may not make sense as a description of the "physical" behaviour of the time series

• Common sense and mathematics in good combination produce "optimal" forecasts

• With time series regression models, forecasting (prediction) is a natural step and forecasting limits (intervals) can be constructed

• With classical decomposition, forecasting may be done, but accuracy estimation is lacking and no forecasting limits are produced

• Classical decomposition is usually combined with exponential smoothing methods

Page 2: Forecasting

Exponential smoothing

• Use the historical data to forecast the future

• Let different parts of the history have different impact on the forecasts

• The forecast model is not developed from any statistical theory

Page 3: Forecasting

Single exponential smoothing

• Given are historical values $y_1, y_2, \ldots, y_T$

• Assume the data contain no trend

Page 4: Forecasting

Algorithm for forecasting:

$$\ell_t = \alpha \cdot y_t + (1 - \alpha) \cdot \ell_{t-1}\,; \qquad t = t_0 + 1, \ldots, T$$

$$\hat{y}_{T+\tau} = \ell_T\,; \qquad \tau = 1, 2, \ldots \quad \text{(constant forecasts!)}$$

where $\alpha$ is a smoothing parameter with value between 0 and 1

• The forecast procedure is a recursion formula

• How shall we choose $\alpha$?

• Where should we start, i.e. which is the initial value $\ell_{t_0}$?
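A minimal sketch of this recursion in Python (the function name `ses` and its arguments are mine, not from the slides):

```python
def ses(y, alpha, level0):
    """Single exponential smoothing.

    y      : observations y_(t0+1), ..., y_T
    alpha  : smoothing parameter, 0 < alpha < 1
    level0 : initial level l_(t0)

    Returns the list of smoothed levels l_t; the last one, l_T,
    is the (constant) forecast for every future period.
    """
    level = level0
    levels = []
    for obs in y:
        level = alpha * obs + (1 - alpha) * level  # l_t = a*y_t + (1-a)*l_(t-1)
        levels.append(level)
    return levels
```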

Page 5: Forecasting

For long time series:

Use a part (usually the first half) of the historical data, $y_1, \ldots, y_{t_0}$, and calculate their average:

$$\bar{y}_{hist} = \frac{1}{t_0} \sum_{t=1}^{t_0} y_t$$

Set $\ell_{t_0} = \bar{y}_{hist}$

Update with the rest of the historical data, $y_{t_0+1}, \ldots, y_T$, using the recursion formula.

Page 6: Forecasting

Example: Sales of everyday commodities

Year   Sales value
1985   151
1986   151
1987   147
1988   149
1989   146
1990   142
1991   143
1992   145
1993   141
1994   143
1995   145
1996   138
1997   147
1998   151
1999   148
2000   148

Note! This time series is short but we use it for illustration purposes!

Page 7: Forecasting

Calculate the average of the first 8 observations of the series:

$$\bar{y}_{hist} = (151 + 151 + \cdots + 145)/8 = 146.75$$

Set $\ell_8 = \bar{y}_{hist} = 146.75$

Assume first that the sales are very stable, i.e. that the background mean value does not change during the period.

Set $\alpha$ to be relatively small. This means that the latest observation plays a smaller role than the history in the forecasts. Rule of thumb: $0.05 < \alpha < 0.3$

E.g. set $\alpha = 0.1$

Update using the next 8 values of the historical data

Page 8: Forecasting

$\ell_9 = 0.1 \cdot y_9 + 0.9 \cdot \ell_8 = 0.1 \cdot 141 + 0.9 \cdot 146.75 = 146.175$

$\ell_{10} = 0.1 \cdot y_{10} + 0.9 \cdot \ell_9 = 0.1 \cdot 143 + 0.9 \cdot 146.175 = 145.8575$

$\ell_{11} = 0.1 \cdot y_{11} + 0.9 \cdot \ell_{10} = 0.1 \cdot 145 + 0.9 \cdot 145.8575 = 145.772$

$\ell_{12} = 0.1 \cdot y_{12} + 0.9 \cdot \ell_{11} = 0.1 \cdot 138 + 0.9 \cdot 145.772 = 144.995$

$\ell_{13} = 0.1 \cdot y_{13} + 0.9 \cdot \ell_{12} = 0.1 \cdot 147 + 0.9 \cdot 144.995 = 145.1955$

$\ell_{14} = 0.1 \cdot y_{14} + 0.9 \cdot \ell_{13} = 0.1 \cdot 151 + 0.9 \cdot 145.1955 = 145.776$

$\ell_{15} = 0.1 \cdot y_{15} + 0.9 \cdot \ell_{14} = 0.1 \cdot 148 + 0.9 \cdot 145.776 = 145.998$

Page 9: Forecasting

Forecasts:

$\ell_{16} = 0.1 \cdot y_{16} + 0.9 \cdot \ell_{15} = 0.1 \cdot 148 + 0.9 \cdot 145.998 = 146.2$

$\hat{y}_{17} = 146.2$

$\hat{y}_{18} = 146.2$

$\hat{y}_{19} = 146.2$

etc.
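The hand computation above can be reproduced with the hypothetical `ses` helper sketched earlier:

```python
sales = [151, 151, 147, 149, 146, 142, 143, 145,   # 1985-1992
         141, 143, 145, 138, 147, 151, 148, 148]   # 1993-2000

level0 = sum(sales[:8]) / 8                 # 146.75, average of the first 8 values
levels = ses(sales[8:], alpha=0.1, level0=level0)
print(round(levels[-1], 1))                 # 146.2, the forecast for periods 17, 18, ...
```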

Page 10: Forecasting

For short time series:

Calculate the average of all historical data, i.e.

$$\bar{y}_{hist} = \frac{1}{T} \sum_{t=1}^{T} y_t$$

Set $\ell_0 = \bar{y}_{hist}$

Update from the beginning of the time series:

$$\ell_t = \alpha \cdot y_t + (1 - \alpha) \cdot \ell_{t-1}\,; \qquad t = 1, 2, \ldots, T$$

There are a lot of alternatives:
• Average of all data, update from the middle of the series
• Average of the first half, update from the beginning
• etc.

Page 11: Forecasting
Page 12: Forecasting

Analysis of example data with MINITAB

Page 13: Forecasting
Page 14: Forecasting

MTB > Name c3 "FORE1" c4 "UPPE1" c5 "LOWE1"

MTB > SES 'Sales values';

SUBC> Weight 0.1;

SUBC> Initial 8;

SUBC> Forecasts 3;

SUBC> Fstore 'FORE1';

SUBC> Upper 'UPPE1';

SUBC> Lower 'LOWE1';

SUBC> Title "SES alpha=0.1".

Single Exponential Smoothing for Sales values

Data Sales values

Length 16

Smoothing Constant

Alpha 0.1

Page 15: Forecasting

Accuracy Measures

MAPE 2.2378

MAD 3.2447

MSD 14.4781

Forecasts

Period Forecast Lower Upper

17 146.043 138.094 153.992

18 146.043 138.094 153.992

19 146.043 138.094 153.992

Page 16: Forecasting

MINITAB uses smoothing from the 1st value!

Page 17: Forecasting

Assume now that the sales are less stable, i.e. during the period the background mean value is possibly changing.

(Note that a change means an occasional “level shift”, not a systematic trend)

Set α to be relatively large. This means that the latest observation becomes more important in the forecasts.

E.g. Set α=0.5 (A bit exaggerated)

Page 18: Forecasting

Single Exponential Smoothing for Sales values

Data Sales values

Length 16

Smoothing Constant

Alpha 0.5

Accuracy Measures

MAPE 1.9924

MAD 2.8992

MSD 13.0928

Forecasts

Period Forecast Lower Upper

17 147.873 140.770 154.976

18 147.873 140.770 154.976

19 147.873 140.770 154.976

Page 19: Forecasting

Slightly narrower prediction intervals

Page 20: Forecasting

We can also use some adaptive procedure to continuously evaluate the forecasting ability and possibly change the smoothing parameter over time.

Alternatively, we can run the procedure with different values of α and choose the one that performs best. This can be done with the MINITAB procedure.
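A sketch of the second alternative in Python, reusing the `sales` list from the earlier sketch: a plain grid search over α minimizing the sum of squared one-step-ahead forecast errors. (That MINITAB's internal optimizer works exactly this way is an assumption; the idea is the same.)

```python
def sse_one_step(y, alpha):
    """Sum of squared one-step forecast errors, smoothing from the 1st value."""
    level = y[0]
    sse = 0.0
    for obs in y[1:]:
        sse += (obs - level) ** 2                 # forecast for obs is the previous level
        level = alpha * obs + (1 - alpha) * level
    return sse

# try alpha = 0.01, 0.02, ..., 0.99 and keep the best one
best_alpha = min((a / 100 for a in range(1, 100)),
                 key=lambda a: sse_one_step(sales, a))
print(best_alpha)
```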

Page 21: Forecasting
Page 22: Forecasting

Single Exponential Smoothing for Sales values

---

Smoothing Constant

Alpha 0.567101

Accuracy Measures

MAPE 1.7914

MAD 2.5940

MSD 12.1632

Forecasts

Period Forecast Lower Upper

17 148.013 141.658 154.369

18 148.013 141.658 154.369

19 148.013 141.658 154.369

[Time series plot “SES optimal alpha”: Sales values vs. index, showing actual values, fits, forecasts and 95% PI. Smoothing constant: Alpha 0.567101. Accuracy measures: MAPE 1.7914, MAD 2.5940, MSD 12.1632]

Yet narrower prediction intervals

Page 23: Forecasting

Exponential smoothing for times series with trend and/or seasonal variation

• Double exponential smoothing (one smoothing parameter) for trend

• Holt’s method (two smoothing parameters) for trend

• Multiplicative Winters’ method (three smoothing parameters) for seasonal variation (and trend)

• Additive Winters’ method (three smoothing parameters) for seasonal variation (and trend)
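For reference, a sketch of how Holt's and Winters' methods can be run with Python's statsmodels, reusing the `sales` list from earlier (parameter choices are illustrative, and `monthly` is a hypothetical seasonal series):

```python
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Holt's method: level + additive trend
holt_fit = ExponentialSmoothing(sales, trend="add").fit()
print(holt_fit.forecast(3))

# Multiplicative Winters' method: level + trend + multiplicative seasonality
# (requires a seasonal series, e.g. monthly data with period 12)
# winters_fit = ExponentialSmoothing(monthly, trend="add", seasonal="mul",
#                                    seasonal_periods=12).fit()
```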

Page 24: Forecasting

Modern methods

The classical approach:

Method: Time series regression

Pros:
• Easy to implement
• Fairly easy to interpret
• Covariates may be added (normalization)
• Inference is possible (though sometimes questionable)

Cons:
• Static
• Normal-based inference not generally reliable
• Cyclic component hard to estimate

Method: Decomposition

Pros:
• Easy to interpret
• Possible to have dynamic seasonal effects
• Cyclic components can be estimated

Cons:
• Descriptive (no inference by definition)
• Static in trend

Page 25: Forecasting

Explanation to the static behaviour:

The classical approach assumes all components except the irregular ones (i.e. $\varepsilon_t$ and $IR_t$) to be deterministic, i.e. fixed functions or constants.

To overcome this problem, all components should be allowed to be stochastic, i.e. be random variables.

A time series yt should from a statistical point of view be treated as a stochastic process.

We will interchangeably use the terms time series and process depending on the situation.

Page 26: Forecasting

Stationary and non-stationary time series

[Two example plots: a stationary series fluctuating around a constant level, and a non-stationary series with a trend]

Characteristics for a stationary time series:

• Constant mean

• Constant variance

A time series with trend is non-stationary!

Page 27: Forecasting

Box-Jenkins models (ARIMA models):

AutoRegressive, Integrated, Moving Average

A stationary time series can be modelled on the basis of the serial correlations in it.

A non-stationary time series can be transformed into a stationary time series, modelled, and back-transformed to the original scale (e.g. for purposes of forecasting).

The AR and MA parts are modelled on a stationary series; the I (Integrated) part has to do with the transformation.

Page 28: Forecasting

Different types of transformation

1. From a series with linear trend to a series with no trend:

First-order differences: $z_t = y_t - y_{t-1}$

MTB > diff c1 c2

Page 29: Forecasting

Note that the differenced series varies around zero.

[Plot of the two series “linear trend” and “no trend”: the original series and its first-order differences]

Page 30: Forecasting

2. From a series with quadratic trend to a series with no trend:

Second-order differences:

$$w_t = z_t - z_{t-1} = (y_t - y_{t-1}) - (y_{t-1} - y_{t-2}) = y_t - 2y_{t-1} + y_{t-2}$$

MTB > diff 2 c3 c4
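Both differences are one-liners in e.g. numpy, mirroring the MTB `diff` commands above (variable names are mine; `sales` from the earlier sketch is reused just to have a runnable series):

```python
import numpy as np

y = np.array(sales, dtype=float)
z = np.diff(y)         # first-order differences:  z_t = y_t - y_(t-1)
w = np.diff(y, n=2)    # second-order differences: w_t = y_t - 2*y_(t-1) + y_(t-2)
```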

Page 31: Forecasting

[Plot of the two series “quadratic trend” and “no trend 2”: the original series and its second-order differences]

Page 32: Forecasting

3. From a series with non-constant variance (heteroscedastic) to a series with constant variance (homoscedastic):

Box-Cox transformations (as defined in 1964):

$$g(y_t) = \begin{cases} \dfrac{(y_t + \lambda_2)^{\lambda_1} - 1}{\lambda_1} & \text{for } \lambda_1 \neq 0 \\[4pt] \ln(y_t + \lambda_2) & \text{for } \lambda_1 = 0 \end{cases}$$

In practice, $\lambda_2$ is chosen so that $y_t + \lambda_2$ is always > 0.

Simpler form: if we know that $y_t$ is always > 0 (as is the usual case for measurements),

$$g(y_t) = \begin{cases} \dfrac{y_t^{\lambda} - 1}{\lambda} & \text{for } \lambda \neq 0 \text{ and } y_t > 0 \\[4pt] \ln y_t & \text{for } \lambda = 0 \text{ and } y_t > 0 \end{cases}$$

In terms of simple transforms:

$$g(y_t) = \begin{cases} \sqrt{y_t} & \text{if modest heteroscedasticity} \\ \sqrt[4]{y_t} & \text{-- " --} \\ \ln y_t & \text{if pronounced heteroscedasticity} \\ 1/\sqrt{y_t} & \text{if heavy heteroscedasticity} \\ 1/y_t & \text{if extreme heteroscedasticity} \end{cases}$$
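A sketch of these transforms in Python; `scipy.stats.boxcox` can also estimate λ by maximum likelihood. The series must be strictly positive, and `sales` is reused just as an example:

```python
import numpy as np
from scipy import stats

y = np.array(sales, dtype=float)   # must be > 0 for these transforms

y_root = np.sqrt(y)                # lambda = 1/2
y_log  = np.log(y)                 # lambda = 0
y_bc, lam = stats.boxcox(y)        # lambda estimated by maximum likelihood
print(lam)
```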

Page 33: Forecasting

The log transform (ln yt ) usually also makes the data ”more” normally distributed

Example: Application of root ($\sqrt{y_t}$) and log ($\ln y_t$) transforms

[Plot of the three series “original”, “root” and “log”]

Page 34: Forecasting

AR-models (for stationary time series)

Consider the model

$$y_t = \delta + \varphi \cdot y_{t-1} + a_t$$

with $\{a_t\}$ i.i.d. with zero mean and constant variance $\sigma^2$,

and where $\delta$ (delta) and $\varphi$ (phi) are (unknown) parameters

Set $\delta = 0$ for the sake of simplicity $\Rightarrow E(y_t) = 0$

Let $R(k) = \mathrm{Cov}(y_t, y_{t-k}) = \mathrm{Cov}(y_t, y_{t+k}) = E(y_t \cdot y_{t-k}) = E(y_t \cdot y_{t+k})$

$R(0) = \mathrm{Var}(y_t)$, assumed to be constant
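A short simulation sketch of this AR(1) model with δ = 0 (function and variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_ar1(phi, sigma, n, burn=100):
    """Simulate y_t = phi * y_(t-1) + a_t with a_t ~ N(0, sigma^2)."""
    a = rng.normal(0.0, sigma, n + burn)
    y = np.zeros(n + burn)
    for t in range(1, n + burn):
        y[t] = phi * y[t - 1] + a[t]
    return y[burn:]          # drop the burn-in so the start-up value is forgotten

y = simulate_ar1(phi=0.8, sigma=1.0, n=500)
```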

Page 35: Forecasting

Now:

$R(0) = E(y_t \cdot y_t) = E(y_t \cdot (\varphi \cdot y_{t-1} + a_t)) = \varphi \cdot E(y_t \cdot y_{t-1}) + E(y_t \cdot a_t) =$

$= \varphi \cdot R(1) + E((\varphi \cdot y_{t-1} + a_t) \cdot a_t) = \varphi \cdot R(1) + \varphi \cdot E(y_{t-1} \cdot a_t) + E(a_t \cdot a_t) =$

$= \varphi \cdot R(1) + 0 + \sigma^2$ (since $a_t$ is independent of $y_{t-1}$)

$R(1) = E(y_t \cdot y_{t+1}) = E(y_t \cdot (\varphi \cdot y_t + a_{t+1})) = \varphi \cdot E(y_t \cdot y_t) + E(y_t \cdot a_{t+1}) =$

$= \varphi \cdot R(0) + 0$ (since $a_{t+1}$ is independent of $y_t$)

$R(2) = E(y_t \cdot y_{t+2}) = E(y_t \cdot (\varphi \cdot y_{t+1} + a_{t+2})) = \varphi \cdot E(y_t \cdot y_{t+1}) + E(y_t \cdot a_{t+2}) =$

$= \varphi \cdot R(1) + 0$ (since $a_{t+2}$ is independent of $y_t$)

Page 36: Forecasting

$R(0) = \varphi \cdot R(1) + \sigma^2$

$R(1) = \varphi \cdot R(0)$

$R(2) = \varphi \cdot R(1)$ (the Yule-Walker equations)

$\vdots$

$R(k) = \varphi \cdot R(k-1) = \cdots = \varphi^k \cdot R(0)$

$\Rightarrow R(0) = \varphi^2 \cdot R(0) + \sigma^2 \;\Rightarrow\; R(0) = \dfrac{\sigma^2}{1 - \varphi^2}$
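The closed form for R(0) can be checked numerically against the hypothetical `simulate_ar1` sketch from above:

```python
phi, sigma = 0.8, 1.0
y_long = simulate_ar1(phi, sigma, n=100_000)

print(y_long.var())              # sample variance of the simulated process
print(sigma**2 / (1 - phi**2))   # R(0) = sigma^2 / (1 - phi^2) ~ 2.78
```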

Page 37: Forecasting

Note that for R(0) to be positive and finite (which we require of a variance), the following must hold:

$$\varphi^2 < 1 \quad \Leftrightarrow \quad -1 < \varphi < 1$$

This is in effect the condition for an AR(1)-process to be weakly stationary.

Now, note that

$$\rho_k = \mathrm{Corr}(y_t, y_{t-k}) = \frac{\mathrm{Cov}(y_t, y_{t-k})}{\sqrt{\mathrm{Var}(y_t) \cdot \mathrm{Var}(y_{t-k})}} = \frac{R(k)}{\sqrt{R(0) \cdot R(0)}} = \frac{R(k)}{R(0)} = \frac{\varphi^k \cdot R(0)}{R(0)} = \varphi^k$$

Page 38: Forecasting

ρk is called the Autocorrelation function (ACF) of yt

”Auto” because it gives correlations within the same time series.

For pairs of different time series one can define the Cross correlation function which gives correlations at different lags between series.

By studying the ACF it might be possible to identify the approximate magnitude of $\varphi$.

Page 39: Forecasting

Examples:

[Bar charts of the theoretical ACF $\rho_k = \varphi^k$ for $k = 1, \ldots, 15$: AR(1) with $\varphi = 0.1$ and AR(1) with $\varphi = 0.3$]

Page 40: Forecasting

[ACF bar charts for AR(1) with $\varphi = 0.5$, $\varphi = 0.8$ and $\varphi = 0.99$]

Page 41: Forecasting

[ACF bar charts for AR(1) with $\varphi = -0.1$, $\varphi = -0.5$ and $\varphi = -0.8$: the bars alternate in sign]

Page 42: Forecasting

The look of an ACF can be similar for different kinds of time series; e.g. the ACF for an AR(1) with $\varphi = 0.3$ could be approximately the same as the ACF for an autoregressive time series of order higher than 1 (we will discuss higher-order AR-models later).

To do a less ambiguous identification we need another statistic:

The Partial Autocorrelation function (PACF):

$$\upsilon_k = \mathrm{Corr}(y_t, y_{t-k} \mid y_{t-k+1}, y_{t-k+2}, \ldots, y_{t-1})$$

i.e. the conditional correlation between $y_t$ and $y_{t-k}$ given all observations in between.

Note that $-1 \leq \upsilon_k \leq 1$.
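In practice the ACF and PACF are computed with library routines; e.g. statsmodels provides both (a sketch, applied to the simulated series `y` from the earlier AR(1) example):

```python
from statsmodels.tsa.stattools import acf, pacf

acf_vals  = acf(y, nlags=15)    # sample autocorrelation function
pacf_vals = pacf(y, nlags=15)   # sample partial autocorrelation function
```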

Page 43: Forecasting

A concept sometimes hard to interpret, but it can be shown that

for AR(1)-models with positive $\varphi$ the look of the PACF is

[bar chart: a single positive spike at lag 1, zero for k > 1]

and for AR(1)-models with negative $\varphi$ the look of the PACF is

[bar chart: a single negative spike at lag 1, zero for k > 1]

Page 44: Forecasting

Assume now that we have a sample y1, y2,…, yn from a time series assumed to follow an AR(1)-model.

Example:

Monthly exchange rates DKK/USD 1991-1998

[Time series plot of the monthly DKK/USD exchange rate]

Page 45: Forecasting

The ACF and the PACF can be estimated from data by their sample counterparts:

Sample Autocorrelation function (SAC):

$$r_k = \frac{\sum_{t=1}^{n-k} (y_t - \bar{y})(y_{t+k} - \bar{y})}{\sum_{t=1}^{n} (y_t - \bar{y})^2}$$

if n is large; otherwise a scaling might be needed.

Sample Partial Autocorrelation function (SPAC):

Complicated structure, so not shown here.

Page 46: Forecasting

The variance function of these two estimators can also be estimated

Opportunity to test

$H_0\colon \rho_k = 0$ vs. $H_a\colon \rho_k \neq 0$

or

$H_0\colon \upsilon_k = 0$ vs. $H_a\colon \upsilon_k \neq 0$

for a particular value of k.

Estimated sample functions are usually plotted together with critical limits based on estimated variances.
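A direct implementation of the SAC formula, together with the usual large-sample approximate 95% limits ±1.96/√n under the null hypothesis (this limit formula is the standard approximation, not stated on the slide; `y` is the simulated series from earlier):

```python
import numpy as np

def sample_acf(y, max_lag):
    """r_k = sum_(t=1..n-k) (y_t - ybar)(y_(t+k) - ybar) / sum_(t=1..n) (y_t - ybar)^2"""
    d = np.asarray(y, dtype=float) - np.mean(y)
    denom = np.sum(d**2)
    return np.array([np.sum(d[:-k] * d[k:]) / denom for k in range(1, max_lag + 1)])

r = sample_acf(y, max_lag=15)
limit = 1.96 / np.sqrt(len(y))    # approximate critical limits under rho_k = 0
print(np.abs(r) > limit)          # which bars stick out of the limits?
```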

Page 47: Forecasting

Example (cont.): DKK/USD exchange rate

SAC: [bar chart with critical limits]

SPAC: [bar chart with critical limits]

Page 48: Forecasting

Ignoring all bars within the red limits, we would identify the series as an AR(1) with positive $\varphi$.

The value of $\varphi$ is approximately 0.9 (the ordinate of the first bar in the SAC plot and in the SPAC plot).

Page 49: Forecasting

Higher-order AR-models

AR(2): $y_t = \varphi_1 y_{t-1} + \varphi_2 y_{t-2} + a_t$  or  $y_t = \varphi_2 y_{t-2} + a_t$, i.e. $\varphi_2 y_{t-2}$ must be present

AR(3): $y_t = \varphi_1 y_{t-1} + \varphi_2 y_{t-2} + \varphi_3 y_{t-3} + a_t$  or other combinations with $\varphi_3 y_{t-3}$

AR(p): $y_t = \varphi_1 y_{t-1} + \cdots + \varphi_p y_{t-p} + a_t$, i.e. different combinations with $\varphi_p y_{t-p}$

Page 50: Forecasting

Stationarity conditions:

For p > 2, difficult to express in closed form.

For p = 2, with $y_t = \varphi_1 y_{t-1} + \varphi_2 y_{t-2} + a_t$:

The values of $\varphi_1$ and $\varphi_2$ must lie within the blue triangle in the figure below:

[Figure: the triangular region given by $\varphi_1 + \varphi_2 < 1$, $\varphi_2 - \varphi_1 < 1$ and $|\varphi_2| < 1$]
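The triangle can be written as three inequalities; a sketch of a check in Python (these standard conditions are well known, though the slide only shows them as a figure):

```python
def ar2_is_stationary(phi1, phi2):
    """Stationarity region for y_t = phi1*y_(t-1) + phi2*y_(t-2) + a_t."""
    return (phi1 + phi2 < 1) and (phi2 - phi1 < 1) and (abs(phi2) < 1)

print(ar2_is_stationary(0.5, 0.3))   # True:  inside the triangle
print(ar2_is_stationary(0.8, 0.5))   # False: phi1 + phi2 >= 1
```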

Page 51: Forecasting

Typical patterns of the ACF and PACF for higher-order stationary AR-models (AR(p)):

ACF: Similar pattern as for AR(1), i.e. (exponentially) decreasing bars, (most often) positive for $\varphi_1$ positive and alternating for $\varphi_1$ negative.

PACF: The first p values of $\upsilon_k$ are non-zero with decreasing magnitude; the rest are all zero (cut-off point at p). (Most often) all positive if $\varphi_1$ is positive and alternating if $\varphi_1$ is negative.

Page 52: Forecasting

Examples:

AR(2), $\varphi_1$ positive:

[ACF: decreasing positive bars; PACF: non-zero spikes at lags 1-2, cut-off after lag 2]

AR(5), $\varphi_1$ negative:

[ACF: alternating bars; PACF: non-zero spikes at lags 1-5, cut-off after lag 5]