SAS Introduction to Time Series Forecasting-libre


  • 8/19/2019 SAS Introduction to Time Series Forecasting-libre


    Quick Review about How to Use SAS to

     Analyze Time Series Data

    1. Get to know SAS

    How to Start SAS? 

     

    If you use a computer in this laboratory, please start SAS from the Desktop or from Start/Programs.

    You can also use the SAS software in the laboratory of the university's Computer Center, or on the university server if you have permission.

    You can get a temporary license for the SAS software by contacting our computer assistant.

    Five main windows

    Program Editor – Edit SAS programs

    Log – Records the messages produced while a SAS session runs, which is very helpful for debugging programs

    Output – Displays the output from SAS procedures

    Explorer – Manage SAS data sets or create new libraries

    Results – Shows a tree-like summary of the Output window

    Several important shortcuts 

    Open a new Program Editor window

    Open a previously saved SAS program

    Save your program as an external file

    Create a new library

    Open the Explorer window to manage SAS data sets

    Submit the whole program, or only a few selected lines, to the SAS System

    2. How to use SAS

    Two important concepts

     

    SAS library – A folder in which SAS data sets are stored. You can create a new library with the LIBNAME statement or with the toolbar shortcut.

    SAS data set – A SAS data set is either temporary or permanent.


     

    Structure of SAS program

     

    DATA step – Processes a SAS data set, or converts raw data into a SAS data set that the SAS System can recognize and that a PROC step can process

    =====================================

    DATA dataset name; 

    INPUT variable; 

    CARDS;

    …………………..data line

    ;

    =====================================

    The dataset name may contain only letters (a, b, …), digits (0, 1, …), and underscores (_), and it must begin with a letter or an underscore. SAS Version 6 limited names to 8 characters; later versions allow up to 32.

     

    PROC step – Analyzes a SAS data set and outputs the results of the analysis

    =====================================

    PROC procedure name DATA= dataset name;

    RUN;

    =====================================
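
    As a minimal illustration of a PROC step, the following prints the first few observations of sashelp.air, a data set that ships with SAS and is used later in this document. The OBS=5 data set option is added here only to keep the listing short.

```sas
/* Print the first 5 observations of a built-in data set */
proc print data=sashelp.air(obs=5);
run;
```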


    The procedure name is the name of a SAS procedure, such as PRINT, PLOT, GPLOT, or INSIGHT.

    3. Change raw data into SAS dataset

    Create a new library

      Using SAS program.

    Libname lib1 'D:\example';

      Using shortcut.

    Library Name: Lib1    Physical Path: D:\example

    SAS data set name

    library_name.dataset_name

      For example, lib1.blood means that the data set blood is saved in the library lib1.

      The library_name can be sashelp, sasuser, maps, work, or lib1. The dataset_name is up to you, such as blood.

      When library_name is work, the data set work.dataset_name is a temporary SAS data set, which is deleted automatically when you shut down the SAS software. In this case, the prefix work can be omitted. For example, you can use either blood or work.blood as the name of the data set.

    Three methods to deal with data through DATA Step

      The size of raw data is small.

    DATA dataset name;

    INPUT variable;

    CARDS;

    ………………. (data lines)

    ;
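
    A small sketch of this first method, reading a few data lines directly in the program. The folder D:\example, the variable names, and the data values are all made up for illustration; replace them with your own.

```sas
/* Assign the library; the folder must already exist (assumed path) */
libname lib1 'D:\example';

/* Permanent data set: stored in the lib1 folder */
data lib1.blood;
  input id age pressure;   /* hypothetical variables */
  cards;
1 34 118
2 51 135
3 47 127
;
run;

/* Temporary data set: work.blood, deleted when SAS closes */
data blood;
  set lib1.blood;
run;

proc print data=lib1.blood;
run;
```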


    OUTLIER options; 

    FORECAST options; 

    RUN;

    QUIT;

    BY

      A BY statement can be used in the ARIMA procedure to process a data set in groups of

    observations defined by the BY variables. Note that all IDENTIFY, ESTIMATE, and FORECAST

    statements specified are applied to all BY groups.

    IDENTIFY

     

    ALPHA= significance-level: The ALPHA= option specifies the significance level for tests in the

    IDENTIFY statement. The default is 0.05.

     

    ESACF: computes the extended sample autocorrelation function and uses these estimates to

    tentatively identify the autoregressive and moving average orders of mixed models. The ESACF option generates two tables. The first table displays extended sample

    autocorrelation estimates, and the second table displays probability values that can be used to

    test the significance of these estimates. The P= (pmin: pmax) and Q= (qmin: qmax) options

    determine the size of the table.

     

    NLAG= number: indicates the number of lags to consider in computing the autocorrelations and

    cross-correlations.

     

    STATIONARITY=(ADF= AR orders DLAG= s) or STATIONARITY=(DICKEY= AR orders DLAG= s):

    performs augmented Dickey-Fuller tests. If the DLAG=s option specified with s is greater than

    one, seasonal Dickey-Fuller tests are performed. The maximum allowable value of s is 12. The

    default value of s is one.

      VAR= variable ( d1, d2, ..., dk ) : names the variable containing the time series to analyze. The

    VAR= option is required. A list of differencing lags can be placed in parentheses after the

    variable name to request that the series be differenced at these lags. For example, VAR=X(1)

    takes the first differences of X. VAR=X(1,1) requests that X be differenced twice, both times with

    lag 1, producing a second difference series, which is $(X_t - X_{t-1}) - (X_{t-1} - X_{t-2}) = X_t - 2X_{t-1} + X_{t-2}$.

    VAR=X(2) differences X once at lag two, $X_t - X_{t-2}$. If differencing is specified, it is the

    differenced series that is processed by any subsequent ESTIMATE statement.
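
    As a worked illustration of the differencing lists, each VAR= specification can be written as a product of difference factors. The backshift operator $B$, defined by $B X_t = X_{t-1}$, is notation introduced here and is not part of the SAS syntax.

```latex
\begin{align*}
\text{VAR=X(1)}    &:\quad (1-B)X_t = X_t - X_{t-1} \\
\text{VAR=X(1,1)}  &:\quad (1-B)^2 X_t = X_t - 2X_{t-1} + X_{t-2} \\
\text{VAR=X(2)}    &:\quad (1-B^2)X_t = X_t - X_{t-2} \\
\text{VAR=X(1,12)} &:\quad (1-B)(1-B^{12})X_t = X_t - X_{t-1} - X_{t-12} + X_{t-13}
\end{align*}
```

    The last line is the combination of ordinary and seasonal differencing applied to the airline data later in this document.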

    ESTIMATE

     

    METHOD=ML/ULS /CLS: specifies the estimation method to use. METHOD=ML specifies the

    maximum likelihood method. METHOD=ULS specifies the unconditional least-squares method.

    METHOD=CLS specifies the conditional least-squares method. METHOD=CLS is the default.

      P= order: specifies the autoregressive part of the model. By default, no autoregressive

    parameters are fit. P=(l1, l2, ..., lk) defines a model with autoregressive parameters at the


    specified lags. P= order is equivalent to P=(1, 2, ..., order). A concatenation of parenthesized lists

    specifies a factored model. For example, P=(1,2,5)(6,12) specifies the autoregressive model $(1 - \phi_{1,1}B - \phi_{1,2}B^2 - \phi_{1,3}B^5)(1 - \phi_{2,1}B^6 - \phi_{2,2}B^{12})$.

      Q= order: specifies the moving average part of the model.

      NOCONSTANT/NOINT: suppresses the fitting of a constant (or intercept) parameter in the

    model. (That is, the parameter μ is omitted.)

     

    PLOT: plots the residual autocorrelation functions. The sample autocorrelation, the sample

    inverse autocorrelation, and the sample partial autocorrelation functions of the model residuals

    are plotted.
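
    To see what a factored model means for the P= and Q= options above, compare a hypothetical seasonal specification written both ways. The symbols $\phi_1$ and $\phi_2$ are illustrative names for the autoregressive parameters, and $B$ is the backshift operator ($B X_t = X_{t-1}$).

```latex
\begin{align*}
\text{P=(1)(12)} &:\quad (1-\phi_1 B)(1-\phi_2 B^{12})
                  = 1 - \phi_1 B - \phi_2 B^{12} + \phi_1\phi_2 B^{13} \\
\text{P=(1,12)}  &:\quad 1 - \phi_1 B - \phi_2 B^{12}
\end{align*}
```

    The factored form implies a lag-13 term whose coefficient is tied to the other two, while the single-list form has no lag-13 term at all.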

    FORECAST

      ALPHA= n: sets the size of the forecast confidence limits. The ALPHA= value must be between 0

    and 1. When you specify ALPHA=n, the upper and lower confidence limits will have a 1 - n confidence

    level. The default is ALPHA=.05, which produces 95% confidence intervals. ALPHA values are

    rounded to the nearest hundredth.

     

    ID= variable: names a variable in the input data set that identifies the time periods associated

    with the observations.

     

    INTERVAL= interval | n: specifies the time interval between observations.

     

    LEAD= n: specifies the number of multistep forecast values to compute.

     

    OUT= SAS-data-set: writes the forecast (and other values) to an output data set.
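
    As a sketch of how ALPHA= relates to the limits, assuming normally distributed forecast errors, the limits at lead $l$ take the form below. Here $\hat{Z}_t(l)$ is the forecast, $\hat{\sigma}(l)$ its standard error, and $z_{1-\alpha/2}$ the standard normal quantile; these symbols are introduced only for illustration.

```latex
\[
\hat{Z}_t(l) \;\pm\; z_{1-\alpha/2}\,\hat{\sigma}(l),
\qquad \alpha = 0.05 \;\Rightarrow\; z_{0.975} \approx 1.96 .
\]
```

    With the default ALPHA=.05, these quantities appear in the OUT= data set as the variables FORECAST, STD, L95, and U95.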


    Fitting the ARIMA Model to a Simulated Time Series

    0. Simulate an AR(2) time series data

      The model: Z(t)=0.5*Z(t-1)+0.4Z(t-2)+a(t)
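
    A quick check, not part of the original handout, suggests that this AR(2) model is stationary. Writing the model as $\phi(B) Z_t = a_t$ with $\phi(B) = 1 - 0.5B - 0.4B^2$ (where $B$ is the backshift operator), the roots of $\phi(B) = 0$ must lie outside the unit circle:

```latex
\begin{gather*}
0.4B^2 + 0.5B - 1 = 0
\;\Longrightarrow\;
B = \frac{-0.5 \pm \sqrt{0.25 + 1.6}}{0.8} \approx 1.075 \ \text{or}\ -2.325 \\
\phi_1 + \phi_2 = 0.9 < 1, \qquad
\phi_2 - \phi_1 = -0.1 < 1, \qquad
|\phi_2| = 0.4 < 1
\end{gather*}
```

    Both roots lie outside the unit circle, so the process is stationary. One root is close to 1, however, so a simulated series of this length may look nearly non-stationary.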

     

    The SAS program:

    /* Create a new library */
    libname ts 'D:/TimeSeries';

    /* Simulate an AR(2) process */
    data ts.ar;
      z1=0; z2=0;
      do t = -50 to 200;
        a = rannor( 32565 );
        z = z1*0.5 + z2*0.4 + a;
        if t > 0 then output;
        z2=z1; z1=z;
      end;
      keep z t;
    run;

      Simulate an MA(2):

    /* Simulate an MA(2) process */
    data ts.ma;
      a1=0; a2=0;
      do t = -50 to 200;
        a = rannor( 32565 );
        z = a + a1*0.2+a2*0.5;
        if t > 0 then output;
        a2=a1; a1=a;
      end;
      keep z t;
    run;
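
    For reference, the MA(2) process above and the ARMA(1,1) process simulated below can be written as equations, read directly from the DATA steps. One point worth noting as an addition to the handout: PROC ARIMA writes the moving average polynomial with minus signs, $\theta(B) = 1 - \theta_1 B - \dots$, so the moving average estimates it reports for these series should have the opposite sign to the coefficients in the simulation code.

```latex
\begin{align*}
\text{MA(2)}:     &\quad Z_t = a_t + 0.2\,a_{t-1} + 0.5\,a_{t-2}
                  &&\Rightarrow\ \theta_1 = -0.2,\ \theta_2 = -0.5 \\
\text{ARMA(1,1)}: &\quad Z_t = 0.5\,Z_{t-1} + a_t + 0.3\,a_{t-1}
                  &&\Rightarrow\ \phi_1 = 0.5,\ \theta_1 = -0.3
\end{align*}
```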


     

      Simulate an ARMA(1,1):

    /* Simulate an ARMA(1,1) process */
    data ts.arma;
      z1=0; a1=0;
      do t = -50 to 200;
        a = rannor( 32565 );
        z = z1*0.5 + a + a1*0.3;
        if t > 0 then output;
        a1=a; z1=z;
      end;
      keep z t;
    run;

    1. Draw the time plot

      The SAS program:

    /* Draw the time plot */
    symbol i=join v=none;
    proc gplot data=ts.ar;
      plot z*t;
    run;
    quit;

      The result:

    [Figure: Simulated AR(2) Time Series]


     

    2. Identify some suitable models

      The SAS program:

    /* Identify some suitable models with minimum requirement */
    proc arima data=ts.ar;
      identify alpha=0.05 var=z nlag=20;
    run;
    /* Use EACF to identify the orders of ARMA models */
      identify alpha=0.05 var=z nlag=20 esacf p=(0:6) q=(0:8);
    run;
    /* Use Dickey-Fuller unit root tests to check the stationarity */
      identify alpha=0.05 var=z nlag=20 stationarity=(dickey=(1, 2, 4));
    run;
    /* Take differencing on the data and analyze again */
      identify alpha=0.05 var=z(1) nlag=20 stationarity=(dickey=5);
    run;
    quit;

      The summary of the output:


     

    The detailed output without differencing:


    [Figures: Series Correlation Panel; unit root test table annotated with "3 deterministic trends", "different values of k", and "3 different tests"]
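
    The annotations on the unit root table can be read against the regression behind the augmented Dickey-Fuller test. The following is the standard textbook form, given here only as a sketch; the symbols are illustrative, and the exact parameterization used in the SAS output may differ.

```latex
\[
\Delta Z_t = \mu + \beta t + \gamma Z_{t-1}
           + \sum_{i=1}^{k} \delta_i\, \Delta Z_{t-i} + a_t,
\qquad H_0:\ \gamma = 0 \ \text{(unit root)} .
\]
```

    The three deterministic trends correspond to $\mu = \beta = 0$ (Zero Mean), $\beta = 0$ (Single Mean), and both terms present (Trend). The value $k$ is the number of lagged differences, set by the DICKEY= list (1, 2, and 4 in the program above), and the three tests are the ones SAS labels Rho, Tau, and F.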


     

    The detailed output after first differencing:

    [Figure: Series Correlation Panel]


     

    We may reach three possible models:

    ARIMA(3,0,0); ARIMA(0,1,1); and ARIMA(2,1,0).

    3. Estimate the models

     

    Candidate models: AR(3), ARMA(3,1) with AR coefficient at lag 2 suppressed and ARIMA(2,1,0)

    without intercept. 

     

    The SAS program:

    /* Identify some suitable models with minimum requirement */
    proc arima data=ts.ar;
      identify alpha=0.05 var=z nlag=20;
    run;
    /* Use EACF to identify the orders of ARMA models */
      identify alpha=0.05 var=z nlag=20 esacf p=(0:6) q=(0:8);
    run;
    /* Use Dickey-Fuller unit root tests to check the stationarity */
      identify alpha=0.05 var=z nlag=20 stationarity=(dickey=(1, 2, 4));
    run;
    /* Take differencing on the data and analyze again */
      identify alpha=0.05 var=z(1) nlag=20 stationarity=(dickey=5);
    run;
    /* Use CLS method to estimate the AR(3) model */
      identify var=z;
    run;
      estimate method=cls p=3 plot;
    run;
    /* Use ULS method to estimate the ARMA(3,1) model */
    /* with the AR coefficient at lag 2 suppressed */
      estimate method=uls p=(1,3) q=1 plot;
    run;
    /* Use ML method to estimate the ARIMA(2,1,0) model without intercept */
      identify var=z(1);
    run;
      estimate method=ml p=2 noint plot;
    run;
    quit;


     

    The summary of the output:

      The estimated AR(3) model:


     

      The important outputs for the fitted AR(3) model:

    [Output table annotated: Mean; Estimated parameters; P values of significance; Intercept; Variance of the white noise; Standard deviation of the white noise]


     

    Outputs for ARMA(3,1) with AR coefficient at lag 2 suppressed:


     

     

    Outputs for ARIMA(2,1,0) without intercept:


    4. Diagnostic checking for the fitted ARIMA(2,1,0)

     

    The SAS program:

    /* Diagnostic checking for the fitted ARIMA(2,1,0) */
    proc arima data=ts.ar;
      identify var=z(1);
    run;
      estimate method=ml p=2 noint plot;
    run;
      forecast out=ts.dc lead=0 id=t;
    run;
    quit;
    /* Draw the time plot */
    symbol i=join v=none;
    proc gplot data=ts.dc;
      plot residual*t;
    run;
    quit;
    /* Perform the normality test */
    proc univariate data=ts.dc normal plot;
      var residual;
    run;

    The summary of the output:


     

     

    The time plot:

     

    A normality test:


     

    Distribution plot and Q-Q plot for normality:

      Sample autocorrelation function (ACF) of the residuals and Sample partial ACF of the residuals:


     

    Ljung-Box test:
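
    For reference, the Ljung-Box statistic is usually defined as follows. This is the standard form, with symbols introduced here: $n$ is the number of residuals, $r_k$ the lag-$k$ residual autocorrelation, $m$ the number of lags tested, and $p + q$ the number of estimated ARMA parameters.

```latex
\[
Q(m) = n(n+2) \sum_{k=1}^{m} \frac{r_k^2}{n-k}
\;\sim\; \chi^2_{\,m-p-q}
\quad \text{approximately, if the model is adequate.}
\]
```

    For the ARIMA(2,1,0) fit, $p + q = 2$, so the test to lag 6 would have 4 degrees of freedom. Small p-values suggest that the residuals are still autocorrelated.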

    [Output table annotated: Test statistic; Degree of freedom; P-values]

      Analysis of over-parameterized models:

    o  The SAS program:

    /* Analysis of over-parameterized models */
    proc arima data=ts.ar;
      identify var=z(1) nlag=20;
    run;
      estimate method=ml p=2 noint plot;
    run;
      estimate method=ml p=(1,2)(6) noint plot;
    run;
      estimate method=ml p=2 q=(6) noint plot;
    run;
    quit;

    o  The first over-parameterized model based on the sample partial ACF:


     

    o  The second over-parameterized model based on the sample ACF:

    Three fitted models:


     

    Conclusion is that the fitted ARIMA(2,1,0) is not adequate!


     

    5. Do forecasting with the fitted ARIMA(2,1,0) model

     

    The SAS program:

    /* Do forecasting by using the fitted ARIMA(2,1,0) model */
    proc arima data=ts.ar;
      identify var=z(1) nlag=20;
    run;
      estimate method=ml p=2 noint plot;
    run;
      forecast out=ts.out lead=50 id=t;
    run;
    quit;
    /* Draw the time plot */
    symbol i=join v=none;
    proc gplot data=ts.out;
      plot z*t=1 forecast*t=2 l95*t=3 u95*t=3/overlay;
    run;
    quit;

    The results:


    Fitting the Seasonal ARIMA Model to

    The Airline Passenger Data

    0. The data

     

    The airline passenger data records the number of passengers traveling by air per month from

    January, 1949 to December, 1960.

     

    It is given as Series G in Box and Jenkins (1976), and has been used in time series analysis

    literature as a standard example of a non-stationary seasonal time series.

    1. Draw the time plot

     

    The SAS program:

    /* Create a new library */
    libname ts 'D:/TimeSeries';
    /* Draw the time plot */
    symbol i=join v=none;
    proc gplot data=sashelp.air;
      plot air*date;
    run;
    quit;

      The time plot:


     

    Taking log transformation and drawing the time plot again.

    /* Take log transformation */
    data ts.lair;
      set sashelp.air;
      lair=log(air);
    run;
    /* Draw the time plot */
    symbol i=join v=none;
    proc gplot data=ts.lair;
      plot lair*date;
    run;
    quit;

      The time plot:

    2. Identify some suitable models

    The SAS program:

    /* Identify some suitable models */
    proc arima data=ts.lair;
      identify alpha=0.05 var=lair;
    run;
    /* Take differencing since the sample ACF decays slowly */
      identify alpha=0.05 var=lair(1);
    run;
    /* Take seasonal differencing since the sample ACF decays slowly
       especially after periods */
      identify alpha=0.05 var=lair(1,12);
    run;


     

    The sample ACF of original sequence:

      The sample ACF of the sequence after common differencing:

     

    The sample ACF of the sequence after both common differencing and seasonal differencing:


    3. Estimate the seasonal ARIMA(0,1,1)X(0,1,1)12 model

     

    The SAS program:

    proc arima data=ts.lair;
      identify alpha=0.05 var=lair(1,12);
    run;
    /* Estimate the ARIMA(0,1,1)X(0,1,1)12 model to the data */
      estimate method=ml q=(1)(12) plot;
    run;

    The estimated model:
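
    Written out with the backshift operator $B$ (where $B X_t = X_{t-1}$), the statement q=(1)(12) applied to lair(1,12) corresponds to the model below. The symbols $\theta_1$ and $\Theta_1$ are illustrative names for the two moving average parameters. Because NOINT is not specified, a constant term $\mu$ is also estimated.

```latex
\[
(1-B)(1-B^{12})\,\mathrm{lair}_t
  = \mu + (1-\theta_1 B)(1-\Theta_1 B^{12})\,a_t ,
\qquad \mathrm{lair}_t = \log(\mathrm{air}_t).
\]
```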


    4. Diagnostic checking of the fitted seasonal ARIMA(0,1,1)X(0,1,1)12 model

      The SAS program:

    proc arima data=ts.lair;
      identify alpha=0.05 var=lair(1,12);
    run;
    /* Estimate the ARIMA(0,1,1)X(0,1,1)12 model to the data */
      estimate method=ml q=(1)(12) plot;
    run;
    /* Diagnostic checking by overfit AR part */
      estimate method=ml p=(9) q=(1)(12) plot;
    run;
    /* Diagnostic checking by overfit MA part */
      estimate method=ml q=(1)(12)(23) plot;
    run;
    /* Export the data to do further diagnostic checking */
    /* (FORECAST uses the most recent ESTIMATE statement) */
      forecast out=ts.out lead=0 id=date;
    run;
    quit;
    /* Draw the time plot */
    symbol i=join v=none;
    proc gplot data=ts.out;
      plot residual*date;
    run;
    quit;
    /* Perform the normality test */
    proc univariate data=ts.out normal plot;
      var residual;
    run;

      The sample ACF of residuals:


     

    The sample PACF of residuals:

      Ljung-Box test:

      Diagnostic checking by overfitting the AR part and the MA part:


    [Output tables annotated: Compare the estimated coefficients; Compare the model criteria]


     

    The time plot of the residuals:

     

    Normality tests:

     

    Distribution plot and Q-Q plot for normality:


    5. Do forecasting with the fitted seasonal ARIMA(0,1,1)X(0,1,1)12 model

     

    The SAS program:

    /* Do forecasting with the fitted seasonal ARIMA(0,1,1)X(0,1,1)12 model */ 

     proc arima data=ts.lair;

    identify alpha=0.05 var=lair(1,12);

    run;

    estimate method=ml q=(1)(12) plot;

    run;

    forecast out=ts.out lead=24 id=date interval=month;

    run;

    quit;

    /* Draw the time plot */ 

    symbol i=join v=none;

     proc gplot data=ts.out;

    plot lair*date=1 forecast*date=2 l95*date=3 u95*date=3/overlay;

    run;

    quit; 
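
    Because the model was fitted to log(air), the forecasts in ts.out are on the log scale. One possible way to return to the original scale is sketched below; the data set name ts.fcst and the new variable names are hypothetical. The term std*std/2 is the usual lognormal mean correction, which assumes normally distributed errors on the log scale.

```sas
/* Back-transform the log-scale forecasts (sketch, not from the handout) */
data ts.fcst;                            /* hypothetical data set name */
  set ts.out;
  air     = exp(lair);                   /* observed series, original scale */
  air_fc  = exp(forecast + std*std/2);   /* forecast with lognormal correction */
  air_l95 = exp(l95);                    /* lower 95% limit */
  air_u95 = exp(u95);                    /* upper 95% limit */
run;

/* Plot on the original scale */
symbol i=join v=none;
proc gplot data=ts.fcst;
  plot air*date=1 air_fc*date=2 air_l95*date=3 air_u95*date=3/overlay;
run;
quit;
```

    For the 24 forecast periods lair is missing, so air is missing there as well and SAS writes a note about missing values to the Log window.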


     

    The result: