
Page 1:

BABS 502
Regression Based Forecasting
March 4, 2014
(c) Martin L. Puterman

Page 2:

Simple and Multiple Regression

• A widely used set of statistical tools that are useful for:
  – forecasting
  – data summary
  – adjustment for uncontrolled factors
• The basic idea is to fit an equation of the following form, relating a dependent variable to one or more independent variables (a minimal sketch follows at the end of this slide):

  y = β0 + β1x1 + β2x2 + β3x3 + …

• Its power is that by choosing y and the xi's in different ways, a wide range of effects can be taken into account.
• The theoretical model assumes that each observation is subject to an additive error that is normally distributed with mean zero and the same variance for every observation, so that one observes the signal and noise components in aggregate.
• In forecasting, the signal part provides the point forecast and the random part provides an accuracy measure.
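A minimal sketch of fitting this form with lm(); the data frame d and all of its values are hypothetical:

  # Fit y = b0 + b1*x1 + b2*x2 to a small hypothetical data set
  d <- data.frame(x1 = c(1, 2, 3, 4, 5, 6),
                  x2 = c(2, 1, 4, 3, 6, 5),
                  y  = c(3.1, 3.9, 7.2, 7.8, 11.1, 11.9))
  fit <- lm(y ~ x1 + x2, data = d)
  summary(fit)   # coefficient estimates plus the residual standard error (the noise part)
  predict(fit, newdata = data.frame(x1 = 7, x2 = 7))   # the signal part: a point forecast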

Page 3:

Regression in forecasting – trend extrapolation

• Fit a trend to historical data:
  – linear: Yt = a + bt
  – quadratic: Yt = a + bt + ct²
  – exponential: Yt = ae^bt, or equivalently ln(Yt) = ln(a) + bt
• The assumption is that the same trend occurred throughout the past and that it will persist into the future.
• Fit using lm or tslm in R (a short sketch follows this list):
  – quadratic fit: tslm(y~poly(trend,2,raw=TRUE))
• Extensive regression theory is available to guide use.
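A minimal sketch of the three trend fits using the forecast package; the annual series y is hypothetical:

  # Fit linear, quadratic, and exponential trends and extrapolate one of them
  library(forecast)
  y <- ts(c(112, 118, 132, 129, 121, 135, 148, 148, 136, 149, 170, 171), start = 2001)
  fit.lin  <- tslm(y ~ trend)                        # linear: Yt = a + bt
  fit.quad <- tslm(y ~ poly(trend, 2, raw = TRUE))   # quadratic: Yt = a + bt + ct^2
  fit.exp  <- tslm(log(y) ~ trend)                   # exponential trend, fit on the log scale
  plot(forecast(fit.quad, h = 5, level = 95))        # extrapolate the quadratic trend 5 periods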

Page 4:

Trend Regression – Births Data

[Figure: linear, quadratic, and cubic trend fits to the births series]

Page 5:

Cubic regression forecast of barley yields in BC


Coefficients:
                              Estimate    Std. Error  t value  Pr(>|t|)
(Intercept)                  2.094e+00    1.498e-01   13.983   < 2e-16  ***
poly(trend, 3, raw = TRUE)1 -3.348e-02    1.229e-02   -2.724   0.00762  **
poly(trend, 3, raw = TRUE)2  7.752e-04    2.713e-04    2.857   0.00520  **
poly(trend, 3, raw = TRUE)3 -4.273e-06    1.699e-06   -2.515   0.01350  *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3681 on 100 degrees of freedom
Multiple R-squared: 0.2517, Adjusted R-squared: 0.2293
F-statistic: 11.21 on 3 and 100 DF, p-value: 2.096e-06

Page 6:

Some R commands for regression

# Packages: forecast provides tslm() and forecast(); lmtest provides dwtest()
library(forecast)
library(lmtest)

# births has the year in column 1 and the birth counts in column 2
birthsts <- ts(births[,2], start = 1946)
b  <- births[,2]
t  <- births[,1]
t2 <- t^2

plot(b)
plot(t, b, type = "l")
lines(lm(b ~ t)$fit, col = 2, lwd = 2)        # linear trend fit
lines(lm(b ~ t + t2)$fit, col = 3, lwd = 2)   # quadratic trend fit

# residuals
r <- residuals(lm(b ~ t))
plot(t, r)
acf(r)
print(acf(r))
dwtest(b ~ t)

summary(lm(b ~ poly(t, 3, raw = TRUE)))
dwtest(b ~ poly(t, 3, raw = TRUE))

# http://www.r-bloggers.com/polynomial-regression-techniques/
plot(t, b, type = "l")
fit2 <- lm(b ~ poly(t, 2, raw = TRUE))
lines(t, predict(fit2), col = 2)

# Using ts regression commands to get fits and plots
tslm(birthsts ~ trend)
fitq <- tslm(birthsts ~ trend + I(trend^2))
fq <- forecast(fitq, h = 5, level = 95)
summary(fq)
plot(fq)
lines(fitted(fitq), col = 2)

# Fitting a cubic trend with forecast intervals
fitc <- tslm(birthsts ~ poly(trend, 3, raw = TRUE))
fc <- forecast(fitc, h = 8, level = 95)
plot(fc)
lines(fitted(fitc), col = 3)

dwtest(b ~ poly(t, 3, raw = TRUE))


Page 7:

Dummy Variables

• Dummy variables are independent variables in regression that take the value 0 or 1.
  – A value of 1 means a condition is present; a value of 0 means it is not.
  – When an observation corresponds to the condition being present, its predicted value is decreased or increased by a constant amount equal to the coefficient of the dummy variable in the regression.
• Suppose a condition has three possible values, say "high", "medium", or "low". We encode its value with two dummy variables: the first, High, equals 1 if the condition is "high" and 0 otherwise; the second, Medium, equals 1 if the condition is "medium" and 0 otherwise. When the condition is "low", both are 0. The baseline condition "low" is reflected in the constant of the regression equation. (A small sketch follows.)
• In time series regression, we use dummy variables for seasons: S − 1 dummy variables if there are S seasons. We are free to choose the baseline season from which all others are measured.
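A minimal sketch of the three-level encoding; the factor and response values are hypothetical, and R's lm() builds the two dummies automatically with "low" as baseline:

  # Dummy coding for a three-level condition
  condition <- factor(c("low", "medium", "high", "medium", "low", "high"),
                      levels = c("low", "medium", "high"))
  y <- c(10, 14, 19, 15, 9, 20)
  model.matrix(~ condition)   # intercept plus two 0/1 dummy columns (Medium, High)
  summary(lm(y ~ condition))  # coefficients measure shifts from the "low" baseline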

Page 8:

Trend Regression with Seasonality

• My experience suggests that a quadratic trend regression plus (additive) seasonality is useful for forecasting.
• Uses "dummy variables" for seasons.
• Must be fit with regression software.
• Equation with linear trend and additive monthly seasonality:

  Yt = a + bt + dt² + c2Febt + c3Mart + … + c12Dect

• Also enables multiple levels of seasonality, such as weekly and monthly.

Page 9:

Trend Regression with Seasonality

• In the previous equation, Febt, Mart, … are dummy variables:
  – they equal 1 if observation Yt is from the indicated month and 0 otherwise
  – note that there is no dummy variable for January
    • January is the baseline for comparison
• Examples:
  Yt = a + bt          observation t in January
  Yt = a + bt + c2     observation t in February
  Yt = a + bt + c3     observation t in March
• In R, tslm automatically generates seasonal dummies for a ts object (a short sketch follows this list):
  – tslm(y~trend+season)
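A minimal sketch of seasonal dummies via tslm; the monthly series and its seasonal effects are simulated, not real data:

  # Trend plus monthly seasonal dummies with tslm
  library(forecast)
  set.seed(1)
  seas <- c(0, -24, -20, -35, -25, -9, 12, 12, -17, -26, -28, -8)  # hypothetical monthly effects
  y <- ts(190 + 0.36 * (1:48) + rep(seas, 4) + rnorm(48, sd = 3),
          start = c(2010, 1), frequency = 12)
  fit <- tslm(y ~ trend + season)   # season expands to 11 dummies; January is the baseline
  summary(fit)                      # season2 ... season12 are shifts measured from January
  plot(forecast(fit, h = 12, level = 95))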

Page 10:

Trend Regression with Seasonality – Example

Fitted coefficients:
  Intercept  189.88
  Trend        0.36
  I(Feb)     -23.85
  I(Mar)     -19.96
  I(Apr)     -34.87
  I(May)     -25.17
  I(Jun)      -8.98
  I(Jul)      11.88
  I(Aug)      11.72
  I(Sep)     -17.19
  I(Oct)     -26.08
  I(Nov)     -28.46
  I(Dec)      -8.46

Some forecasts:
  Jan: F156(1) = 189.88 + 0.36*157 = 246.40
  Feb: F156(2) = 189.88 + 0.36*158 - 23.85 = 222.91
  Mar: F156(3) = 189.88 + 0.36*159 - 19.96 = 227.16

Page 11:

Regression Example: Forecast Updating During the Season

• Goal: improve total sales forecasts using interim sales data.
• Data: early forecast, interim sales, and total sales for a wide range of products.
• Fitted model:
  Total Sales = 120 + .6 Interim Sales + .3 Early Forecast
• Example: early forecast of total sales = 3000; interim sales = 1400.
• Revised total sales forecast (a small sketch follows):
  Total Sales = 120 + .6*1400 + .3*3000 = 1860
• The forecast standard deviation is the regression RMSE.
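A minimal sketch of the updating regression; the product-level data frame and its values are hypothetical:

  # Regress total sales on interim sales and the early forecast
  sales <- data.frame(
    interim = c(1450, 1900, 1000, 1700, 1250, 2100),
    early   = c(2900, 3600, 2300, 3300, 2700, 4100),
    total   = c(1990, 2450, 1520, 2260, 1800, 2750))
  fit <- lm(total ~ interim + early, data = sales)
  predict(fit, newdata = data.frame(interim = 1400, early = 3000))  # revised forecast
  sigma(fit)   # regression RMSE, used as the forecast standard deviation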

Page 12:

Regression Example: Impact of Advertising

• Goal: take into account the effect of advertising expenditures on sales.
• Data: sales, plus advertising expenditures in the previous quarter.
• Fitted model:
  Salest = 15 + 10 Quartert + .8 Salest-1 + .4 (Advertisingt-1)^1/2
• Example: sales in the last quarter = 2000 and advertising in the previous quarter = 10,000, so (Advertisingt-1)^1/2 = 100.
• Total sales forecast (omitting the quarterly term):
  Sales = 15 + .8*2000 + .4*100 = 1655
• The forecast standard deviation is the regression RMSE.

Page 13:

Some special concerns when using regression with time series data

• Often the usual regression assumption of uncorrelated errors is violated.
  – This means that the residuals contain information.
  – Case A: this is usually due to model mis-specification, i.e., omission of important variables.
  – Case B: but sometimes we have what we think is a good model and there is nothing obvious to add.
• Difficulty:
  – Standard errors are underestimated, so the model seems better than it really is.
  – Concept: since the observations are not independent, there is less information in the data than you would think.
  – We reject H0: βj = 0 when we shouldn't.
• Detection (a short sketch follows this list):
  – some systematic pattern in the plot of residuals vs. time
  – the Durbin-Watson test (see next slide)
  – (best approach) the ACF of the residuals
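A minimal sketch of the three detection approaches; the series is simulated with AR(1) noise, so the diagnostics will flag autocorrelation:

  # Detecting autocorrelated residuals in a trend regression
  library(lmtest)   # provides dwtest
  set.seed(42)
  tt <- 1:60
  yy <- 5 + 0.3 * tt + as.numeric(arima.sim(list(ar = 0.7), n = 60))
  fit <- lm(yy ~ tt)
  r <- residuals(fit)
  plot(tt, r, type = "l")   # look for a systematic pattern over time
  acf(r)                    # (best approach) spikes outside the bands flag autocorrelation
  dwtest(yy ~ tt)           # Durbin-Watson test; here D will be well below 2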

Page 14:

Durbin-Watson Test: Comments

• The Durbin-Watson test is a poor alternative to using the ACF of the residuals, but it is widely used, probably for historical reasons.
• It is based on the Durbin-Watson test statistic D, shown below.
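For reference, D is the standard quantity computed from the residuals e_1, …, e_n:

  D = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2} \approx 2(1 - r_1)

where r_1 is the lag-1 autocorrelation of the residuals, so D near 2 corresponds to no first-order autocorrelation.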

• It tests only for first-order autocorrelation in the errors.
  – Formally, it tests H0: ρ = 0 vs. Ha: ρ ≠ 0.
  – The test rejects H0, concluding that there is autocorrelation in the residuals, if D is well below 2 or well above 2; I suggest being imprecise here. I would worry about values less than 1.4 or greater than 2.6.
• In economic data, when ρ is not zero, it is usually positive.

Page 15:

Regression with (auto)correlated residuals

Approaches for obtaining more reliable estimates:
– Add variables, such as trend squared, or use the lagged dependent variable as an explanatory variable. (See the sales and advertising example on the previous slide; Salest-1 is a lagged variable.)
– Use time series regression models, which, except for a special case (AR(1) errors), require advanced software such as SAS or R.
  • The orcutt package in R obtains estimates for models with AR(1) errors.
  • See section 9.1 of the Hyndman text for more on this.
– Difference the data if the lag-one autocorrelation is large (and positive) and software such as the above is not available.

Page 16:

Regression with auto-correlated errors

Model:
  yt = β0 + β1x1t + ••• + βmxmt + εt
where
  εt = ρεt-1 + νt
and the νt ~ N(0, σ²) are independent.

The quantity ρ is called the first-order autocorrelation or serial correlation parameter and lies between -1 and +1.

The Cochrane-Orcutt procedure, implemented in the R package orcutt, estimates the regression coefficients and ρ for this model. The help manual describes the algorithm in some detail. (A small sketch follows.)

Note that the regression coefficients usually will not change much from ordinary regression, but their standard errors will be larger.
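A minimal sketch of a Cochrane-Orcutt fit via the orcutt package; the data are simulated with AR(1) errors:

  # Cochrane-Orcutt estimation applied to an ordinary lm fit
  library(orcutt)
  set.seed(7)
  x <- 1:40
  y <- 3 + 0.5 * x + as.numeric(arima.sim(list(ar = 0.8), n = 40))
  ols <- lm(y ~ x)
  co  <- cochrane.orcutt(ols)   # iteratively estimates rho and re-fits the regression
  summary(co)                   # compare the standard errors with summary(ols)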

Page 17:

Example – BC Incorporations: Trend Regression

Regression Equation Section
  Independent  Regression         Standard      T-Value to test  Prob    Reject H0  Power of Test
  Variable     Coefficient b(i)   Error Sb(i)   H0:B(i)=0        Level   at 5%?     at 5%
  Intercept    18672.3162         1865.4549     10.010           0.0000  Yes        1.0000
  trend          584.1152          182.0498      3.209           0.0059  Yes        0.8503

Serial Correlation of Residuals Section
  Lag  Correlation    Lag  Correlation    Lag  Correlation
   1     0.7473        9    -0.2205       17     0.0000
   2     0.3632       10    -0.0183       18     0.0000
   3    -0.0190       11     0.1487       19     0.0000
   4    -0.2897       12     0.2349       20     0.0000
   5    -0.4198       13     0.0000       21     0.0000
   6    -0.4632       14     0.0000       22     0.0000
   7    -0.4478       15     0.0000       23     0.0000
   8    -0.3732       16     0.0000       24     0.0000
The serial correlations above are significant if their absolute values are greater than 0.485071.

Durbin-Watson Test For Serial Correlation
  Parameter                                  Value    Did the Test Reject H0: Rho(1) = 0?
  Durbin-Watson Value                        0.3573
  Prob. Level: Positive Serial Correlation   0.0000   Yes
  Prob. Level: Negative Serial Correlation   1.0000   No

[Figure: BC vs Year – BC incorporations plotted against year, 1990–2010, with fitted trend]

Page 18:

Same data using the serial correlation routine

Run Summary Section
  Parameter                 Value       Parameter                  Value
  Dependent Variable        BC          Rows Processed             17
  Number Ind. Variables     1           Rows Filtered Out          0
  Weight Variable           None        Rows with X's Missing      0
  R2                        0.1506      Rows with Weight Missing   0
  Adj R2                    0.0899      Rows with Y Missing        0
  Coefficient of Variation  0.4879      Rows Used in Estimation    17
  Mean Square Error         4628200     Sum of Weights             16.000
  Square Root of MSE        2151.325    Completion Status          Normal Completion
  Ave Abs Pct Error         17.034      Autocorrelation (Rho)      0.8523

Regression Equation Section
  Independent  Regression         Standard       T-Value to test  Prob    Reject H0
  Variable     Coefficient b(i)   Error Sb(i)    H0:B(i)=0        Level   at 5%?
  Intercept    10850.3237         12603.3258     0.861            0.4038  No
  trend         1244.8410           790.0657     1.576            0.1374  No

Page 19:

Same data but adding extra variables

• In the above, dummy = 1 for year > 2001, and dummyXyear allows for a shift in the trend (a small sketch of this construction follows the table).
• Note that some autocorrelation is still present: the lag-1 serial autocorrelation equals .32 (which is insignificant), and the Durbin-Watson test is significant, but much less so than without the extra variables.
• The purpose of this example is to show that autocorrelation can result from the omission of independent variables.

Regression Equation Section
  Independent  Regression         Standard      T-Value to test  Prob    Reject H0  Power of Test
  Variable     Coefficient b(i)   Error Sb(i)   H0:B(i)=0        Level   at 5%?     at 5%
  Intercept     22290.5091        1303.6382     17.099           0.0000  Yes        1.0000
  dummy        -37509.5091        7155.5803     -5.242           0.0002  Yes        0.9979
  dummyXyear     3036.6909         518.8170      5.853           0.0001  Yes        0.9997
  year            -73.6909         192.2110     -0.383           0.7076  No         0.0646

Page 20:

What if seasonality is multiplicative and we want to use regression?

• Problem: the model on the nominal scale assumes an additive effect of the seasonal dummy variables.
• Solution: do the regression on the logarithmic scale. That is, transform the dependent variable by taking logarithms (base 10 or base e) and then do the regression. (A short sketch follows at the end of this slide.)
• Why does this work? Multiplicative seasonality is additive on the log scale!
• Thus we can produce forecasts using the model on the log scale and then transform back to the original scale by exponentiating the forecast on the log scale.
  – Example: if the forecast on the log10 scale is 3.4, then the forecast on the nominal (original) scale is 10^3.4 = 2511.9 units.
• But predictions based on these transformations are often biased.
• Alternative ad hoc approach: deseasonalize the data, fit the model to the deseasonalized data, and then multiply back by the seasonal factors to get forecasts. This is how time series decomposition works.
• Also, trends and dummies on the natural-log scale have nice interpretations. Consider the model for 4 seasons:

  ln(yt) = 2 + .014t + .19 Season2t - .13 Season3t + .04 Season4t

• Then the value of the series is increasing by about 1.4% per period, since e^.014 ≈ 1.014.
• The value in Season 2 is about 19% above what the trend alone predicts for that season (more precisely, e^.19 - 1 ≈ 21%).
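A minimal sketch of regression on the log scale with back-transformation; the quarterly series is simulated with multiplicative seasonality:

  # Fit on the natural-log scale, forecast, then exponentiate back
  library(forecast)
  set.seed(9)
  y <- ts(exp(2 + 0.014 * (1:40) + rep(c(0, 0.19, -0.13, 0.04), 10) + rnorm(40, sd = 0.02)),
          start = c(2000, 1), frequency = 4)
  fit <- tslm(log(y) ~ trend + season)   # multiplicative seasonality becomes additive
  fc  <- forecast(fit, h = 4, level = 95)
  exp(fc$mean)                           # back-transform the point forecasts (may be biased)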