Forecasting: principles and practice
Rob J Hyndman
2 ARIMA models
Outline
1 Stationarity and differencing
2 Backshift notation
3 Autoregressive models
4 Moving Average models
5 Non-seasonal ARIMA models
6 Seasonal ARIMA models
7 Lab Session 3
Stationarity

Definition: If {y_t} is a stationary time series, then for all s, the distribution of (y_t, …, y_{t+s}) does not depend on t.

A stationary series is:
- roughly horizontal
- constant variance
- no patterns predictable in the long-term

Transformations (e.g., logs) can help to stabilize the variance. Differences can help to stabilize the mean.
Stationary?

dj %>% autoplot() +
  ylab("Dow Jones Index") + xlab("Day")

[Plot: Dow Jones Index by Day]
Stationary?

dj %>% diff() %>% autoplot() +
  ylab("Change in Dow Jones Index") + xlab("Day")

[Plot: daily changes in the Dow Jones Index]
Stationary?

hsales %>% autoplot() +
  xlab("Year") + ylab("Total sales") +
  ggtitle("Sales of new one-family houses, USA")

[Plot: Sales of new one-family houses, USA]
Stationary?

hsales %>% diff(lag=12) %>% autoplot() +
  xlab("Year") + ylab("Total sales") +
  ggtitle("Seasonal differences of sales of new one-family houses, USA")

[Plot: seasonal differences of sales of new one-family houses, USA]
Stationary?

hsales %>% diff(lag=12) %>% diff(lag=1) %>% autoplot() +
  xlab("Year") + ylab("Total sales") +
  ggtitle("Seasonal plus first differences of sales of new one-family houses, USA")

[Plot: seasonal plus first differences of sales of new one-family houses, USA]
Electricity production

usmelec %>% autoplot()

[Plot: usmelec, 1980–2010]
Electricity production

usmelec %>% log() %>% autoplot()

[Plot: log(usmelec), 1980–2010]
Electricity production

usmelec %>% log() %>% diff(lag=12) %>%
  autoplot()

[Plot: seasonally differenced log(usmelec), 1980–2010]
Electricity production

usmelec %>% log() %>% diff(lag=12) %>%
  diff(lag=1) %>% autoplot()

[Plot: seasonally plus first differenced log(usmelec), 1980–2010]
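The same stabilising pipeline can be tried on a series that ships with base R; AirPassengers is used here purely as a stand-in for usmelec (an assumption, since usmelec comes from a package), with the native |> pipe so no packages are needed:

```r
# Illustrative sketch: log to stabilise the variance, then a lag-12
# seasonal difference and a lag-1 first difference to stabilise the mean.
# AirPassengers (base R) stands in for usmelec.
z <- AirPassengers |> log() |> diff(lag = 12) |> diff(lag = 1)
# Each difference shortens the series: 12 + 1 observations are lost.
length(AirPassengers) - length(z)
```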
Backshift notation

Backward shift operator:
- Shift back one period: B y_t = y_{t−1}
- Shift back two periods: B(B y_t) = B^2 y_t = y_{t−2}
- Shift back 12 periods: B^12 y_t = y_{t−12}
Backshift notation

- First differences: y'_t = y_t − y_{t−1} = y_t − B y_t = (1 − B) y_t
- Second-order differences (i.e., first differences of first differences): y''_t = (1 − B)^2 y_t
- dth-order differences: (1 − B)^d y_t
- Seasonal difference followed by first difference: (1 − B)(1 − B^m) y_t
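These identities can be checked numerically; this sketch uses only base R and an arbitrary simulated series:

```r
# Numerical check of the backshift identities, base R only.
set.seed(1)
y <- cumsum(rnorm(36))   # an arbitrary series
m <- 12                  # seasonal period (monthly, say)

d1 <- y[-1] - y[-length(y)]          # (1 - B) y_t, computed by hand
stopifnot(all.equal(d1, diff(y)))    # matches diff()

d2 <- diff(diff(y))                  # (1 - B)^2 y_t
stopifnot(all.equal(d2, diff(y, differences = 2)))

db <- diff(diff(y, lag = m))         # (1 - B)(1 - B^m) y_t
stopifnot(length(db) == length(y) - m - 1)
```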
Autoregressive models

Autoregressive (AR) models:

y_t = c + φ_1 y_{t−1} + φ_2 y_{t−2} + ⋯ + φ_p y_{t−p} + ε_t
(1 − φ_1 B − ⋯ − φ_p B^p) y_t = c + ε_t

where ε_t is white noise. This is a multiple regression with lagged values of y_t as predictors.

[Plots: simulated AR(1) and AR(2) series]
Stationarity conditions

We normally restrict autoregressive models to stationary data, and then some constraints on the values of the parameters are required.

General condition for stationarity: the complex roots of 1 − φ_1 z − φ_2 z^2 − ⋯ − φ_p z^p lie outside the unit circle on the complex plane.

- For p = 1: −1 < φ_1 < 1.
- For p = 2: −1 < φ_2 < 1, φ_2 + φ_1 < 1, φ_2 − φ_1 < 1.
- More complicated conditions hold for p ≥ 3.
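To see why the constraint matters, here is a small base R sketch (the φ values 0.5 and 0.99 are illustrative) comparing an AR(1) well inside the stationarity region with one near the unit root:

```r
# AR(1) paths inside |phi_1| < 1 mean-revert; paths near the boundary
# wander widely. Same innovations used for both, for a fair comparison.
set.seed(42)
e <- rnorm(500)
ar1_sim <- function(phi, e) {
  y <- numeric(length(e))
  for (t in 2:length(e)) y[t] <- phi * y[t - 1] + e[t]
  y
}
y_mid  <- ar1_sim(0.5, e)    # well inside the region
y_edge <- ar1_sim(0.99, e)   # close to a unit root
var(y_edge) > var(y_mid)     # the near-unit-root path is far more variable
```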
Moving Average (MA) models

Moving Average (MA) models:

y_t = c + ε_t + θ_1 ε_{t−1} + θ_2 ε_{t−2} + ⋯ + θ_q ε_{t−q}
y_t = c + (1 + θ_1 B + ⋯ + θ_q B^q) ε_t

where ε_t is white noise. This is a multiple regression with past errors as predictors.

[Plots: simulated MA(1) and MA(2) series]
Invertibility

Invertible models have the property that the distant past has negligible effect on forecasts. This requires constraints on the MA parameters.

General condition for invertibility: the complex roots of 1 + θ_1 z + θ_2 z^2 + ⋯ + θ_q z^q lie outside the unit circle on the complex plane.

- For q = 1: −1 < θ_1 < 1.
- For q = 2: −1 < θ_2 < 1, θ_2 + θ_1 > −1, θ_1 − θ_2 < 1.
- More complicated conditions hold for q ≥ 3.
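An MA process can be simulated directly from white noise; this base R sketch (with illustrative θ values) uses stats::filter rather than any add-on package:

```r
# Simulating an MA(2) series: y_t = e_t + theta1*e_{t-1} + theta2*e_{t-2}.
set.seed(123)
theta <- c(0.6, 0.3)   # illustrative, inside the invertibility region
e <- rnorm(300)
y <- stats::filter(e, c(1, theta), method = "convolution", sides = 1)
y <- as.numeric(y)[-(1:2)]   # drop the two leading NAs
# Theoretical variance is (1 + theta1^2 + theta2^2) * sigma^2 = 1.45,
# so the sample variance should be in that neighbourhood.
```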
ARIMA models

Autoregressive Moving Average (ARMA) models:

y_t = c + φ_1 y_{t−1} + ⋯ + φ_p y_{t−p} + θ_1 ε_{t−1} + ⋯ + θ_q ε_{t−q} + ε_t
φ_p(B) y_t = c + θ_q(B) ε_t

- Predictors include both lagged values of y_t and lagged errors.
- φ_p(B) is a pth-order polynomial in B.
- θ_q(B) is a qth-order polynomial in B.

Autoregressive Integrated Moving Average (ARIMA) models combine an ARMA model with differencing: (1 − B)^d y_t follows an ARMA model.
ARIMA models

ARIMA(p, d, q) model:
- AR: p = order of the autoregressive part
- I: d = degree of first differencing involved
- MA: q = order of the moving average part

Special cases:
- White noise: ARIMA(0,0,0)
- Random walk: ARIMA(0,1,0) with no constant
- Random walk with drift: ARIMA(0,1,0) with constant
- AR(p): ARIMA(p,0,0)
- MA(q): ARIMA(0,0,q)
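The random-walk case can be verified in a couple of lines of base R: building y_t by cumulative summation of white noise and then first-differencing recovers the noise.

```r
# ARIMA(0,1,0) with no constant is a random walk: (1 - B) y_t = e_t,
# i.e. y_t = y_{t-1} + e_t.
set.seed(7)
e <- rnorm(100)
y <- cumsum(e)                          # random walk built from white noise
stopifnot(all.equal(diff(y), e[-1]))    # differencing gives back e_t, t >= 2
```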
Backshift notation for ARIMA

ARIMA(p, 0, q) model:

y_t = c + φ_1 y_{t−1} + ⋯ + φ_p y_{t−p} + θ_1 ε_{t−1} + ⋯ + θ_q ε_{t−q} + ε_t
y_t = c + φ_1 B y_t + ⋯ + φ_p B^p y_t + ε_t + θ_1 B ε_t + ⋯ + θ_q B^q ε_t

or (1 − φ_1 B − ⋯ − φ_p B^p) y_t = c + (1 + θ_1 B + ⋯ + θ_q B^q) ε_t

ARIMA(1,1,1) model:

(1 − φ_1 B)(1 − B) y_t = c + (1 + θ_1 B) ε_t

where (1 − φ_1 B) is the AR(1) factor, (1 − B) the first difference, and (1 + θ_1 B) the MA(1) factor.

Written out:

y_t = c + (1 + φ_1) y_{t−1} − φ_1 y_{t−2} + θ_1 ε_{t−1} + ε_t
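The expansion of (1 − φ_1 B)(1 − B) can be checked mechanically by polynomial multiplication in base R (φ_1 = 0.4 is an arbitrary illustrative value):

```r
# Multiply (1 - phi1 B)(1 - B) via convolution of coefficient vectors,
# in ascending powers of B.
phi1 <- 0.4
pmul <- function(a, b) convolve(a, rev(b), type = "open")
coefs <- pmul(c(1, -phi1), c(1, -1))
# Expect 1, -(1 + phi1), phi1 -- matching the written-out model above.
stopifnot(isTRUE(all.equal(coefs, c(1, -(1 + phi1), phi1))))
```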
R model

Intercept form:

(1 − φ_1 B − ⋯ − φ_p B^p)(1 − B)^d y_t = c + (1 + θ_1 B + ⋯ + θ_q B^q) ε_t

Mean form:

(1 − φ_1 B − ⋯ − φ_p B^p)(1 − B)^d (y_t − μ t^d / d!) = (1 + θ_1 B + ⋯ + θ_q B^q) ε_t

- μ is the mean of (1 − B)^d y_t.
- c = μ (1 − φ_1 − ⋯ − φ_p).
- R uses the mean form.
- Including c is equivalent to y_t having a dth-order polynomial trend.
US personal consumption

autoplot(uschange[,"Consumption"]) +
  xlab("Year") + ylab("Quarterly percentage change") +
  ggtitle("US consumption")

[Plot: quarterly percentage change in US consumption from 1970]
US personal consumption

(fit <- auto.arima(uschange[,"Consumption"]))

## Series: uschange[, "Consumption"]
## ARIMA(2,0,2) with non-zero mean
##
## Coefficients:
##         ar1     ar2     ma1    ma2   mean
##       1.391  -0.581  -1.180  0.558  0.746
## s.e.  0.255   0.208   0.238  0.140  0.084
##
## sigma^2 estimated as 0.351: log likelihood=-165.1
## AIC=342.3 AICc=342.8 BIC=361.7

ARIMA(2,0,2) model:

y_t = c + 1.391 y_{t−1} − 0.581 y_{t−2} − 1.180 ε_{t−1} + 0.558 ε_{t−2} + ε_t,

where c = 0.746 × (1 − 1.391 + 0.581) = 0.142 and ε_t ∼ N(0, 0.351).
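The quoted constant can be reproduced from the reported mean using the relationship c = μ(1 − φ_1 − ⋯ − φ_p):

```r
# Values taken from the auto.arima() output above.
mu <- 0.746; phi1 <- 1.391; phi2 <- -0.581
c0 <- mu * (1 - phi1 - phi2)
round(c0, 3)   # 0.142, as quoted
```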
US personal consumption

fit %>% forecast(h=10) %>% autoplot(include=80)

[Plot: Forecasts from ARIMA(2,0,2) with non-zero mean, with 80% and 95% intervals]
Information criteria

Akaike's Information Criterion (AIC):

AIC = −2 log(L) + 2(p + q + k + 1),

where L is the likelihood of the data, k = 1 if c ≠ 0 and k = 0 if c = 0.

Corrected AIC:

AICc = AIC + [2(p + q + k + 1)(p + q + k + 2)] / (T − p − q − k − 2).

Good models are obtained by minimizing the AICc.
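Plugging the US consumption fit into these formulas reproduces the reported values up to rounding of the log likelihood (T = 187 is assumed for the uschange series):

```r
# AIC/AICc recomputed for the ARIMA(2,0,2) consumption model reported earlier.
logL <- -165.1; p <- 2; q <- 2; k <- 1
T_obs <- 187   # assumed sample size of uschange
AIC  <- -2 * logL + 2 * (p + q + k + 1)
AICc <- AIC + 2 * (p + q + k + 1) * (p + q + k + 2) / (T_obs - p - q - k - 2)
c(AIC, AICc)   # about 342.2 and 342.7; the output shows 342.3 and 342.8
```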
How does auto.arima() work?

A non-seasonal ARIMA process:

φ(B)(1 − B)^d y_t = c + θ(B) ε_t

Need to select appropriate orders p, d, q.

Hyndman and Khandakar (JSS, 2008) algorithm:
- Select the number of differences d and D via the KPSS test and a seasonal strength measure.
- Select p, q by minimising the AICc.
- Use a stepwise search to traverse the model space.
How does auto.arima() work?

Step 1: Select values of d and D.

Step 2: Select the current model (with smallest AICc) from:
- ARIMA(2, d, 2)
- ARIMA(0, d, 0)
- ARIMA(1, d, 0)
- ARIMA(0, d, 1)

Step 3: Consider variations of the current model:
- vary one of p, q from the current model by ±1;
- p, q both vary from the current model by ±1;
- include/exclude c from the current model.

The model with the lowest AICc becomes the current model. Repeat Step 3 until no lower AICc can be found.
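The Step 3 neighbourhood can be sketched as follows; this illustrates the search pattern only and is not the forecast package's actual code (the function name neighbours is made up):

```r
# Candidate (p, q) variations around a current model, keeping orders >= 0.
neighbours <- function(p, q) {
  cand <- rbind(
    c(p + 1, q), c(p - 1, q),         # vary p by 1
    c(p, q + 1), c(p, q - 1),         # vary q by 1
    c(p + 1, q + 1), c(p - 1, q - 1), # vary both by 1
    c(p + 1, q - 1), c(p - 1, q + 1)
  )
  cand[cand[, 1] >= 0 & cand[, 2] >= 0, , drop = FALSE]
}
nrow(neighbours(2, 2))   # 8 candidate (p, q) pairs around ARIMA(2, d, 2)
```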
Choosing an ARIMA model

autoplot(internet)

[Plot: internet usage series]
Choosing an ARIMA model

(fit <- auto.arima(internet))

## Series: internet
## ARIMA(1,1,1)
##
## Coefficients:
##         ar1    ma1
##       0.650  0.526
## s.e.  0.084  0.090
##
## sigma^2 estimated as 10: log likelihood=-254.2
## AIC=514.3 AICc=514.5 BIC=522.1
Choosing an ARIMA model

(fit <- auto.arima(internet, stepwise=FALSE,
  approximation=FALSE))

## Series: internet
## ARIMA(3,1,0)
##
## Coefficients:
##         ar1     ar2    ar3
##       1.151  -0.661  0.341
## s.e.  0.095   0.135  0.094
##
## sigma^2 estimated as 9.66: log likelihood=-252
## AIC=512 AICc=512.4 BIC=522.4
Choosing an ARIMA model

checkresiduals(fit, plot=TRUE)

[Plots: residuals, ACF, and residual histogram for ARIMA(3,1,0)]

##
## Ljung-Box test
##
## data: Residuals from ARIMA(3,1,0)
## Q* = 4.5, df = 7, p-value = 0.7
##
## Model df: 3. Total lags used: 10
Choosing an ARIMA model

fit %>% forecast() %>% autoplot()

[Plot: Forecasts from ARIMA(3,1,0), with 80% and 95% intervals]
Seasonal ARIMA models

ARIMA(p, d, q)(P, D, Q)_m

where (p, d, q) is the non-seasonal part of the model, (P, D, Q) is the seasonal part, and m = number of observations per year.
Seasonal ARIMA models

E.g., ARIMA(1, 1, 1)(1, 1, 1)_4 model (without constant):

(1 − φ_1 B)(1 − Φ_1 B^4)(1 − B)(1 − B^4) y_t = (1 + θ_1 B)(1 + Θ_1 B^4) ε_t,

where (1 − φ_1 B) is the non-seasonal AR(1) factor, (1 − Φ_1 B^4) the seasonal AR(1) factor, (1 − B) the non-seasonal difference, (1 − B^4) the seasonal difference, (1 + θ_1 B) the non-seasonal MA(1) factor, and (1 + Θ_1 B^4) the seasonal MA(1) factor.
Seasonal ARIMA models

E.g., ARIMA(1, 1, 1)(1, 1, 1)_4 model (without constant):

(1 − φ_1 B)(1 − Φ_1 B^4)(1 − B)(1 − B^4) y_t = (1 + θ_1 B)(1 + Θ_1 B^4) ε_t.

All the factors can be multiplied out and the general model written as follows:

y_t = (1 + φ_1) y_{t−1} − φ_1 y_{t−2} + (1 + Φ_1) y_{t−4}
    − (1 + φ_1 + Φ_1 + φ_1 Φ_1) y_{t−5} + (φ_1 + φ_1 Φ_1) y_{t−6}
    − Φ_1 y_{t−8} + (Φ_1 + φ_1 Φ_1) y_{t−9} − φ_1 Φ_1 y_{t−10}
    + ε_t + θ_1 ε_{t−1} + Θ_1 ε_{t−4} + θ_1 Θ_1 ε_{t−5}.
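The multiplied-out coefficients can be verified mechanically by convolving the four AR-side polynomials in base R (the φ_1, Φ_1 values are illustrative):

```r
# Multiply (1 - phi1 B)(1 - Phi1 B^4)(1 - B)(1 - B^4) by convolving
# coefficient vectors (ascending powers of B).
phi1 <- 0.3; Phi1 <- 0.5
pmul <- function(a, b) convolve(a, rev(b), type = "open")
ar_poly <- Reduce(pmul, list(
  c(1, -phi1),              # (1 - phi1 B)
  c(1, 0, 0, 0, -Phi1),     # (1 - Phi1 B^4)
  c(1, -1),                 # (1 - B)
  c(1, 0, 0, 0, -1)         # (1 - B^4)
))
# Coefficient of B^5 should be (1 + phi1 + Phi1 + phi1*Phi1), matching
# the y_{t-5} term above (sign flips when moved to the other side).
stopifnot(isTRUE(all.equal(ar_poly[6], 1 + phi1 + Phi1 + phi1 * Phi1)))
```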
European quarterly retail trade

autoplot(euretail) +
  xlab("Year") + ylab("Retail index")

[Plot: European quarterly retail index from 2000]
European quarterly retail trade

(fit <- auto.arima(euretail))

## Series: euretail
## ARIMA(1,1,2)(0,1,1)[4]
##
## Coefficients:
##         ar1     ma1    ma2    sma1
##       0.736  -0.466  0.216  -0.843
## s.e.  0.224   0.199  0.210   0.188
##
## sigma^2 estimated as 0.159: log likelihood=-29.62
## AIC=69.24 AICc=70.38 BIC=79.63
European quarterly retail trade

(fit <- auto.arima(euretail, stepwise=TRUE,
  approximation=FALSE))

## Series: euretail
## ARIMA(1,1,2)(0,1,1)[4]
##
## Coefficients:
##         ar1     ma1    ma2    sma1
##       0.736  -0.466  0.216  -0.843
## s.e.  0.224   0.199  0.210   0.188
##
## sigma^2 estimated as 0.159: log likelihood=-29.62
## AIC=69.24 AICc=70.38 BIC=79.63
European quarterly retail trade

checkresiduals(fit, test=FALSE)

[Plots: residuals, ACF, and residual histogram for ARIMA(1,1,2)(0,1,1)[4]]
European quarterly retail trade

forecast(fit, h=36) %>% autoplot()

[Plot: Forecasts from ARIMA(1,1,2)(0,1,1)[4], with 80% and 95% intervals]
Corticosteroid drug sales

[Plots: H02 sales (million scripts) and log H02 sales, 1995–2008]
Corticosteroid drug sales

(fit <- auto.arima(h02, lambda=0, max.order=9,
  stepwise=FALSE, approximation=FALSE))

## Series: h02
## ARIMA(4,1,1)(2,1,2)[12]
## Box Cox transformation: lambda= 0
##
## Coefficients:
##          ar1    ar2    ar3     ar4     ma1   sar1
##       -0.042  0.210  0.202  -0.227  -0.742  0.621
## s.e.   0.217  0.181  0.114   0.081   0.207  0.242
##         sar2    sma1   sma2
##       -0.383  -1.202  0.496
## s.e.   0.118   0.249  0.214
##
## sigma^2 estimated as 0.00405: log likelihood=254.3
## AIC=-488.6 AICc=-487.4 BIC=-456.1
Corticosteroid drug sales

checkresiduals(fit)

[Plots: residuals, ACF, and residual histogram for ARIMA(4,1,1)(2,1,2)[12]]

##
## Ljung-Box test
##
## data: Residuals from ARIMA(4,1,1)(2,1,2)[12]
## Q* = 16, df = 15, p-value = 0.4
##
## Model df: 9. Total lags used: 24
Understanding ARIMA models

Long-term forecasts:

                        c = 0          c ≠ 0
  zero                  d + D = 0
  non-zero constant     d + D = 1      d + D = 0
  linear                d + D = 2      d + D = 1
  quadratic             d + D = 3      d + D = 2

Forecast variance and d + D:
- The higher the value of d + D, the more rapidly the prediction intervals increase in size.
- For d + D = 0, the long-term forecast standard deviation will go to the standard deviation of the historical data.
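For the simplest case with d + D = 1 (a random walk, ARIMA(0,1,0)), the h-step forecast variance is h σ², so interval widths grow like √h:

```r
# 95% interval widths for a random walk with innovation variance sigma2.
sigma2 <- 1
h <- 1:20
width <- 2 * 1.96 * sqrt(h * sigma2)
stopifnot(all(diff(width) > 0))                   # widths grow with horizon
stopifnot(abs(width[4] / width[1] - 2) < 1e-12)   # 4x horizon => 2x width
```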
Prediction intervals

- Prediction intervals increase in size with the forecast horizon.
- Calculations assume the residuals are uncorrelated and normally distributed.
- Prediction intervals tend to be too narrow because:
  - the uncertainty in the parameter estimates has not been accounted for;
  - the ARIMA model assumes historical patterns will not change during the forecast period;
  - the ARIMA model assumes uncorrelated future errors.
Lab Session 3