Source: math.usu.edu/~jrstevens/stat5100/12.d.ARIMA.pdf

Stat 5100 Handout #12.d – SAS: ARIMA Dependence Structures (Unit 7)

Example 1: 57 daily ‘overshorts’ from an underground gasoline tank in Colorado; see Brockwell and Davis, Introduction to Time Series and Forecasting, Example 3.2.8. The overshort for day t is

Zt = (amount of fuel at end of day t)
   - (amount of fuel at end of day t-1)
   - (amount of fuel delivered during day t)
   + (amount of fuel sold during day t)

(With no measurement error and no tank leaks, Zt = 0.)
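To make the definition concrete, here is a worked example with made-up readings. All four numbers are hypothetical, chosen only for illustration, and are not taken from the data.

```latex
Z_t = \underbrace{7150}_{\text{end of day } t}
    - \underbrace{5000}_{\text{end of day } t-1}
    - \underbrace{3000}_{\text{delivered}}
    + \underbrace{900}_{\text{sold}}
    = 50
```

A positive overshort like this one means the tank holds more fuel than the bookkeeping predicts (5000 + 3000 - 900 = 7100); a negative value means it holds less.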

/* Define options */

ods html image_dpi=300 style=journal;

data Oshort; input X @@; cards;
78 -58 53 -65 13 -6 -16 -14 3 -72 89 -48 -14 32 56 -86 -66 50 26
59 -47 -83 2 -1 124 -106 113 -76 -47 -32 39 -30 6 -73 18 2 -24 23
-38 91 -56 -58 1 14 -4 77 -127 97 10 -28 -17 23 -2 48 -131 65 -17
;

data Oshort; set Oshort;

day = _n_;

run;

/* Graphical representation; check stationarity */

proc sgplot data=Oshort noautolegend;

scatter y=X x=day /

markerattrs=(symbol=CIRCLEFILLED size=2pt);

series y=X x=day / lineattrs=(pattern=solid);

xaxis label='Day';

yaxis label='Overshort';

title1 'Daily Overshorts';

run;

/* Investigate potential dependence structures */

proc arima data=Oshort;

identify var=X nlag=10;

title1 'Look at SAC: MA(1)';

run;

Look at SAC: MA(1)

The ARIMA Procedure

Name of Variable = X

Mean of Working Series -4.03509

Standard Deviation 58.44414

Number of Observations 57

Autocorrelation Check for White Noise

To Lag Chi-Square DF Pr > ChiSq Autocorrelations

6 20.25 6 0.0025 -0.504 0.122 -0.212 0.080 0.019 0.116
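One way to read the autocorrelations above: under white noise, sample autocorrelations from n observations have an approximate standard error of 1/sqrt(n), so a rough two-standard-error bound can be worked out by hand. This is only a rule-of-thumb sketch; the limits drawn on the PROC ARIMA plots may be computed differently, and the symbol r_k for the lag-k sample autocorrelation is notation introduced here.

```latex
\frac{2}{\sqrt{n}} = \frac{2}{\sqrt{57}} \approx 0.265,
\qquad
|r_1| = 0.504 > 0.265,
\qquad
|r_k| \le 0.212 < 0.265 \quad (k = 2, \dots, 6)
```

Only the lag-1 autocorrelation stands out, which is the cut-off pattern usually associated with an MA(1) structure.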

/* Now fit specifically as an MA(1) model */

proc arima data=Oshort;

identify var=X nlag=10;

estimate q=1 method=uls plot;

title1 'MA(1) model fit to Overshort data';

run;

MA(1) model fit to Overshort data

Unconditional Least Squares Estimation

Parameter   Estimate   Standard Error   t Value   Approx Pr > |t|   Lag
MU          -5.12443          0.35073    -14.61            <.0001     0
MA1,1        0.99999          0.26992      3.70            0.0005     1

Constant Estimate -5.12443

Variance Estimate 1996.541

Std Error Estimate 44.68267

Autocorrelation Check of Residuals

To Lag Chi-Square DF Pr > ChiSq Autocorrelations

6 4.82 5 0.4379 0.119 0.131 -0.054 0.102 0.130 0.123

12 13.18 11 0.2817 -0.090 0.079 -0.210 -0.161 -0.178 -0.041

18 29.47 17 0.0304 0.098 -0.141 -0.273 -0.173 -0.207 -0.151

24 32.94 23 0.0821 -0.084 -0.071 0.068 -0.057 -0.086 0.095

Model for variable X

Estimated Mean -5.12443

Moving Average Factors

Factor 1: 1 - 0.99999 B**(1)
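Written out, the fit above appears to correspond to the model below. This assumes the PROC ARIMA convention that the moving average factor is applied as 1 - θB; the symbols a_t (random shock) and ρ_1 (lag-1 autocorrelation) are notation introduced here, not part of the output.

```latex
X_t = -5.12443 + a_t - 0.99999\, a_{t-1},
\qquad
\rho_1 = \frac{-\theta}{1 + \theta^2}
       = \frac{-0.99999}{1 + 0.99999^2} \approx -0.500
```

The implied lag-1 autocorrelation is close to the sample value of -0.504 from the IDENTIFY step. The estimate sits essentially on the invertibility boundary (|θ| < 1), so the reported standard errors may deserve some caution.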

Example 2: General Electric’s gross investment (in millions of dollars) for years 1935 – 1954. Originally presented in Grunfeld, Y. (1958), "The Determinants of Corporate Investment," Ph.D. dissertation, University of Chicago; discussed in Boot, J.C.G. (1960), "Investment Demand: An Empirical Contribution to the Aggregation Problem," International Economic Review, 1, 3-30. See also Damodar N. Gujarati, Basic Econometrics, Third Edition, McGraw-Hill, 1995, pp. 522-525.

data GE; input year GEinv @@; cards;
1935 33.1 1936 45.0 1937 77.2 1938 44.6 1939 48.1
1940 74.4 1941 113.0 1942 91.9 1943 61.3 1944 56.8
1945 93.6 1946 159.9 1947 147.2 1948 146.3 1949 98.3
1950 93.5 1951 135.2 1952 157.3 1953 179.5 1954 189.6
;

proc sgplot data=GE noautolegend;

scatter y=GEinv x=year /

markerattrs=(symbol=CIRCLEFILLED size=2pt);

series y=GEinv x=year / lineattrs=(pattern=solid);

xaxis label='Year';

yaxis label='GE gross investment (millions)';

title1 'GE gross investment';

run;

/* Make stationary */

proc reg data=GE noprint;

model GEinv=year; output out=a1 r=resid;

title1 'simple regression on time';

proc sgplot data=a1 noautolegend;

scatter y=resid x=year /

markerattrs=(symbol=CIRCLEFILLED size=2pt);

series y=resid x=year / lineattrs=(pattern=solid);

xaxis label='Year'; yaxis label='Residual';

title1 'GE gross investment after accounting for time';

run;

data GE; set GE; logGEinv=log(GEinv);

proc reg data=GE noprint;

model logGEinv=year; output out=a2 r=resid;

title1 'simple regression on time, using log';

proc sgplot data=a2 noautolegend;

scatter y=resid x=year /

markerattrs=(symbol=CIRCLEFILLED size=2pt);

series y=resid x=year / lineattrs=(pattern=solid);

xaxis label='Year'; yaxis label='Residual';

title1 'GE gross investment after accounting for time, using log';

run;

/* Investigate potential dependence structures */

data newuse; set a2;

Z = resid;

proc arima data=newuse;

identify var=Z nlag=12 ;

title1 'Look at SPAC: AR(2)';

run;

Look at SPAC: AR(2)

The ARIMA Procedure

Autocorrelation Check for White Noise

To Lag Chi-Square DF Pr > ChiSq Autocorrelations

6 20.42 6 0.0023 0.290 -0.517 -0.535 -0.070 0.310 0.225

12 21.33 12 0.0457 -0.030 -0.127 -0.040 -0.016 0.040 -0.049

/* Now fit specifically as an AR(2) model */

proc arima data=newuse;

identify var=logGEinv crosscorr=(year) nlag=12;

estimate p=2 input=(year) method=uls plot;

title1 'AR(2) model fit to log of GE data';

run;

AR(2) model fit to log of GE data

Unconditional Least Squares Estimation

Parameter     Estimate   Standard Error   t Value   Approx Pr > |t|   Lag   Variable   Shift
MU          -135.17006         14.84188     -9.11            <.0001     0   logGEinv       0
AR1,1          0.51014          0.18639      2.74            0.0146     1   logGEinv       0
AR1,2         -0.71635          0.17516     -4.09            0.0009     2   logGEinv       0
NUM1           0.07183        0.0076327      9.41            <.0001     0   year           0

Constant Estimate -163.042

Variance Estimate 0.044281

Std Error Estimate 0.210431

Autocorrelation Check of Residuals

To Lag Chi-Square DF Pr > ChiSq Autocorrelations

6 3.11 4 0.5395 -0.176 -0.019 -0.086 -0.269 0.018 0.078

12 9.44 10 0.4910 0.122 -0.032 0.094 -0.343 0.065 -0.026

18 14.23 16 0.5815 -0.140 0.189 -0.037 0.005 0.074 -0.004

Autoregressive Factors

Factor 1: 1 - 0.51014 B**(1) + 0.71635 B**(2)
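As a reading aid, the estimates above can be assembled into a regression with AR(2) errors. This is a sketch under the assumption that PROC ARIMA applies the autoregressive factor to the noise term N_t that remains after the mean and the input variable are removed; N_t and a_t are notation introduced here.

```latex
\begin{aligned}
\log(\mathrm{GEinv}_t) &= -135.17006 + 0.07183\,\mathrm{year}_t + N_t \\
N_t &= 0.51014\, N_{t-1} - 0.71635\, N_{t-2} + a_t \\
\text{Constant} = \mu\,(1 - \phi_1 - \phi_2)
  &= -135.17006 \times (1 - 0.51014 + 0.71635) \approx -163.04
\end{aligned}
```

The last line reproduces the reported Constant Estimate of -163.042 up to rounding of the displayed coefficients.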

Example 3: Gas price data – annual averages 1976 – 2014 (the 2014 value is a partial-year figure; see the note after the data). Variable ‘price’ is U.S. annual average cents per gallon for unleaded (source: U.S. Energy Information Administration); variable ‘infl76’ is inflation (relative buying power) with 1976 as base (source: U.S. Bureau of Labor Statistics).

data gas; input year price infl76 @@; cards;

1976 61.4 100 1977 65.6 107 1978 67.0 115 1979 90.3 128

1980 124.5 145 1981 137.8 160 1982 129.6 170 1983 124.1 175

1984 121.2 183 1985 120.2 189 1986 92.7 193 1987 94.8 200

1988 94.6 208 1989 102.1 218 1990 116.4 230 1991 114.0 239

1992 112.7 247 1993 110.8 254 1994 111.2 260 1995 114.7 268

1996 123.1 276 1997 123.4 282 1998 105.9 286 1999 116.5 293

2000 151.0 303 2001 146.1 311 2002 135.8 316 2003 159.1 323

2004 188.0 332 2005 229.5 343 2006 258.9 354 2007 280.1 364

2008 326.6 378 2009 235.0 377 2010 278.2 383 2011 352.1 395

2012 361.8 404 2013 350.1 409 2014 335.0 411

; /* Note: 2014 data current through 03.03.14 */

/* Look at plots */

proc sgplot data=gas;

series x=year y=price / lineattrs=(pattern=solid);

series x=year y=infl76 / lineattrs=(pattern=dash) y2axis;

xaxis label='Year';

yaxis label='Annual Average Price of Unleaded';

y2axis label='Buying Power of 1976 Dollar';

title1 'Gas Price Data';

run;

/* Just out of curiosity, look at prices,
   adjusted for inflation (to 2014 dollars) */

data gas; set gas;

priceNow = price / infl76 * 411 / 100;

proc sgplot data=gas;

series y=priceNow x=year / lineattrs=(pattern=solid);

yaxis label='Price of Unleaded (2014 Dollars)';

xaxis label='Year';

refline 2007 / axis=x;

title1 'Gas Price Data, adjusted for inflation';

run;
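The adjustment in the data step above rescales each price by the ratio of the 2014 index value (411) to that year's index, then divides by 100 to move from cents to dollars. A quick hand check using rows from the data (the subscripted priceNow is just notation for this sketch):

```latex
\begin{aligned}
\text{priceNow}_{1976} &= \frac{61.4}{100} \times \frac{411}{100} \approx 2.52 \\
\text{priceNow}_{2008} &= \frac{326.6}{378} \times \frac{411}{100} \approx 3.55 \\
\text{priceNow}_{2014} &= \frac{335.0}{411} \times \frac{411}{100} = 3.35
\end{aligned}
```

The 2014 row is unchanged apart from the cents-to-dollars conversion, as expected when 2014 is the reference year.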

/* For demonstration purposes only,

restrict attention to pre-2007 */

data gas7; set gas;

if year < 2007;

run;

/* Make stationary */

proc reg data=gas7 noprint;

model price=infl76;

output out=out1 r=resid;

run;

/* Check stationarity and potential dependence structure */

proc arima data=out1;

identify var=resid nlag=10;

title1 'Look at SAC and SPAC plots';

run;

Look at SAC and SPAC plots

Autocorrelation Check for White Noise

To Lag Chi-Square DF Pr > ChiSq Autocorrelations

6 23.80 6 0.0006 0.706 0.377 0.185 0.080 0.032 -0.079

/* Try some models */

proc arima data=out1;

identify var=resid nlag=10;

estimate p=1 method=uls plot;

title1 'AR(1) model fit to gas data';

run;

AR(1) model fit to gas data

Unconditional Least Squares Estimation

Parameter   Estimate   Standard Error   t Value   Approx Pr > |t|   Lag
MU          25.72666         89.82249      0.29            0.7766     0
AR1,1        0.96923          0.11054      8.77            <.0001     1

Constant Estimate 0.791484

Variance Estimate 241.7949

Std Error Estimate 15.54976

Autocorrelation Check of Residuals

To Lag Chi-Square DF Pr > ChiSq Autocorrelations

6 6.14 5 0.2932 0.416 -0.027 -0.002 0.036 0.061 -0.022

12 8.59 11 0.6592 -0.143 -0.112 0.023 0.116 -0.031 -0.074
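For reference, the AR(1) fit above can be written out and its residual check compared with a rough two-standard-error bound. This sketch assumes 31 annual observations (1976 to 2006) and uses the same 2/sqrt(n) rule of thumb, which need not match the limits PROC ARIMA plots; a_t is notation introduced here.

```latex
\begin{aligned}
(\text{resid}_t - 25.72666) &= 0.96923\,(\text{resid}_{t-1} - 25.72666) + a_t \\
\text{Constant} = \mu\,(1 - \phi_1) &= 25.72666 \times (1 - 0.96923) \approx 0.79 \\
\frac{2}{\sqrt{31}} &\approx 0.359 < 0.416 = \text{lag-1 residual autocorrelation}
\end{aligned}
```

So although the chi-square checks do not reject, the lag-1 residual autocorrelation is on the large side, which may be one reason to try other models.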

proc arima data=out1;

identify var=resid nlag=10;

estimate q=1 method=uls plot;

title1 'MA(1) model fit to gas data';

run;

MA(1) model fit to gas data

Unconditional Least Squares Estimation

Parameter   Estimate   Standard Error   t Value   Approx Pr > |t|   Lag
MU           0.75142          6.17848      0.12            0.9040     0
MA1,1       -0.99999          0.37396     -2.67            0.0122     1

Constant Estimate 0.751422

Variance Estimate 305.3926

Std Error Estimate 17.47549

Autocorrelation Check of Residuals

To Lag Chi-Square DF Pr > ChiSq Autocorrelations

6 13.57 5 0.0186 0.446 0.402 0.038 0.134 -0.086 0.044

12 22.07 11 0.0239 -0.247 -0.119 -0.184 -0.030 -0.231 -0.130

18 37.36 17 0.0030 -0.246 -0.159 -0.188 -0.110 -0.237 -0.200

24 50.28 23 0.0008 -0.147 -0.047 0.049 0.097 0.153 0.212

proc arima data=out1;

identify var=resid nlag=10;

estimate p=1 q=1 method=uls plot;

title1 'ARMA(1,1) model fit to gas data';

run;

ARMA(1,1) model fit to gas data

Unconditional Least Squares Estimation

Parameter   Estimate   Standard Error   t Value   Approx Pr > |t|   Lag
MU          11.41352         27.31787      0.42            0.6793     0
MA1,1       -0.65140          0.15763     -4.13            0.0003     1
AR1,1        0.85942          0.15923      5.40            <.0001     1

Constant Estimate 1.604542

Variance Estimate 179.4121

Std Error Estimate 13.39448

Autocorrelation Check of Residuals

To Lag Chi-Square DF Pr > ChiSq Autocorrelations

6 0.18 4 0.9961 0.024 0.052 -0.005 0.032 0.027 0.003

12 2.30 10 0.9935 -0.099 -0.082 -0.038 0.122 -0.097 -0.048

18 8.11 16 0.9457 -0.044 -0.183 0.054 0.092 -0.148 -0.129

24 11.84 22 0.9607 -0.138 0.033 -0.064 -0.094 0.001 0.063
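Putting the ARMA(1,1) estimates into backshift form gives the sketch below. It assumes PROC ARIMA applies the moving average factor as 1 - θB, so the negative MA1,1 estimate shows up with a plus sign; B is the backshift operator and a_t is notation introduced here.

```latex
\begin{aligned}
(1 - 0.85942\,B)\,(\text{resid}_t - 11.41352) &= (1 + 0.65140\,B)\,a_t \\
\text{Constant} = \mu\,(1 - \phi_1) &= 11.41352 \times (1 - 0.85942) \approx 1.6045
\end{aligned}
```

Among the three candidate fits, this one has the smallest variance estimate (179.4121, versus 241.7949 for AR(1) and 305.3926 for MA(1)) and no large residual autocorrelations in the table above.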

/* Now fit specifically as an ARMA(1,1) model */

proc arima data=gas7;

identify var=price crosscorr=(infl76) nlag=10;

estimate p=1 q=1 input=(infl76) method=uls plot;

title1 'ARMA(1,1) model fit to gas data';

title2 '(with inflation as predictor)';

run;

ARMA(1,1) model fit to gas data

(with inflation as predictor)

Unconditional Least Squares Estimation

Parameter    Estimate   Standard Error   t Value   Approx Pr > |t|   Lag   Variable   Shift
MU          -88.09278        116.63110     -0.76            0.4566     0   price          0
MA1,1        -0.56741          0.16480     -3.44            0.0019     1   price          0
AR1,1         0.95756          0.07686     12.46            <.0001     1   price          0
NUM1          1.02118          0.42680      2.39            0.0239     0   infl76         0

Example 3, continued: Gas price data – now look at differencing and forecasting

/* Still restricting attention to pre-2007 */

/* Need to add rows for 'future' predictor variables */

data temp; input year @@; cards;

2007 2008 2009 2010 2011 2012 2013 2014

;

data gas7; set gas7 temp;

run;

/* Look at differencing */

data gas7; set gas7;

Z = price - lag(price);

proc sgplot data=gas7;

series y=Z x=year;

title1 'Gas prices: first differences';

run;
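The first difference computed above with price - lag(price) can also be obtained with the DIF function, and PROC ARIMA can difference internally when the variable is written as price(1) in the IDENTIFY statement. A minimal sketch for comparing the two data step versions side by side; the data set name gas7_check and the variable Z_dif are hypothetical names introduced here only for this check.

```sas
/* Sketch: DIF(price) returns price - LAG(price), so Z_dif should match Z */
data gas7_check; set gas7;
   Z_dif = dif(price);   /* first difference via the DIF function */
run;

/* Print both versions next to each other for a visual comparison */
proc print data=gas7_check;
   var year price Z Z_dif;
run;
```

Both columns are missing for 1976 (no previous year) and for the appended future rows, where price itself is missing.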

/* Add year and year^2 terms to remove curvilinearity */

data gas7; set gas7;

year1 = year-1975;

year2 = (year-1975)**2;

proc reg data=gas7 noprint;

model Z = year1 year2;

output out=out1 r=resid;

proc sgplot data=out1;

series y=resid x=year;

title1 'Gas prices after differencing and time effects';

run;

/* Check stationarity and potential dependence structure */

proc arima data=out1;

identify var=resid nlag=12;

title1 'Look at SAC and SPAC plots';

title2 '(after differencing and time effects)';

run;

Look at SAC and SPAC plots

(after differencing and time effects)

Autocorrelation Check for White Noise

To Lag Chi-Square DF Pr > ChiSq Autocorrelations

6 10.21 6 0.1159 0.161 -0.425 -0.197 0.083 0.053 -0.188

12 13.19 12 0.3554 -0.170 -0.024 0.109 0.144 0.004 -0.062

/* Fit tentative model: ARIMA(2,1,0) with two covariates */

proc arima data = gas7;

identify var = price (1) crosscorr = (year1 year2)

nlag=12 ;

estimate p = 2 q = 0 input = (year1 year2)

method = uls plot;

forecast lead=6 alpha=.10 noprint out=fout1;

title1 'Tentative model: ARIMA(2,1,0)';

run;

Tentative model: ARIMA(2,1,0)

Unconditional Least Squares Estimation

Parameter   Estimate   Standard Error   t Value   Approx Pr > |t|   Lag   Variable   Shift
MU          24.60820          6.99701      3.52            0.0017     0   price          0
AR1,1        0.23270          0.17614      1.32            0.1984     1   price          0
AR1,2       -0.52555          0.17708     -2.97            0.0065     2   price          0
NUM1        -3.70280          0.96942     -3.82            0.0008     0   year1          0
NUM2         0.12424          0.02869      4.33            0.0002     0   year2          0

Constant Estimate 31.81473

Variance Estimate 148.1016

Std Error Estimate 12.1697

Autocorrelation Check of Residuals

To Lag Chi-Square DF Pr > ChiSq Autocorrelations

6 3.28 4 0.5128 -0.023 -0.018 -0.012 -0.214 -0.084 -0.179

12 5.31 10 0.8696 -0.109 -0.068 0.058 0.140 -0.043 -0.054

18 10.73 16 0.8258 0.067 -0.120 0.162 0.173 -0.082 0.031

24 16.66 22 0.7819 -0.114 0.080 -0.079 -0.135 -0.012 0.091
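Collecting the estimates, the tentative model can be sketched as a regression for the differenced price with AR(2) errors. This assumes that only price is differenced (year1 and year2 enter undifferenced, since no differencing was requested for them in CROSSCORR=); N_t and a_t are notation introduced here.

```latex
\begin{aligned}
\text{price}_t - \text{price}_{t-1}
  &= 24.60820 - 3.70280\,\text{year1}_t + 0.12424\,\text{year2}_t + N_t \\
N_t &= 0.23270\, N_{t-1} - 0.52555\, N_{t-2} + a_t \\
\text{Constant} = \mu\,(1 - \phi_1 - \phi_2)
  &= 24.60820 \times (1 - 0.23270 + 0.52555) \approx 31.815
\end{aligned}
```

The last line agrees with the reported Constant Estimate of 31.81473.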

/* Note that since year wasn't a predictor,

it won't appear in fout1;

so we'll need to add it 'by hand' before making the plot

*/

data fout1; set fout1;

year = _n_ + 1975;

if year=2007 then price=280.1;

if year=2008 then price=326.6;

if year=2009 then price=235.0;

if year=2010 then price=278.2;

if year=2011 then price=352.1;

if year=2012 then price=361.8;

if year=2013 then price=350.1;

if year=2014 then price=335.0;

run;

proc sgplot data=fout1;

series x=year y=price /

lineattrs=(pattern=solid thickness=5);

series x=year y=forecast / lineattrs=(pattern=solid);

series x=year y=l90 / lineattrs=(pattern=dash);

series x=year y=u90 / lineattrs=(pattern=dash);

refline 2007 / axis=x;

xaxis label='Year';

yaxis label='Gas Price';

title1 'Pre-2007 gas data: ARIMA(2,1,0)';

title2 'Forecast with 90 percent confidence intervals';

run;