Time Series Laboratory Exercise II


Asaad, Al-Ahmadgaid B.

August 24, 2012

al in the cloud:
email: [email protected]
website: alstat.weebly.com
blog: alstatr.blogspot.com

Given the original data, find the following:

1. Historical Plot

Correlogram (ACF and PACF)

ADF Test - Stationary or non-stationary? If non-stationary, transform the data using differencing. After differencing, repeat the historical plot, correlogram, and ADF test until the series becomes stationary.

2. Seasonal - Seasonal Differencing

3. Modelling of the stationary data.

What are the prospective models? Compare them by AIC or BIC, generate the residuals, show their correlogram and historical plot, and test for serial correlation.

Answers for Southern Oscillation Index Data:

Historical Plot

There is no trend or seasonality in the plot of the SOI data; the series clearly shows erratic variation. This suggests that it is a stationary process. The figure was plotted in R; the codes follow the figure.

[Figure 1 plot: Southern Oscillation Index; x-axis: Time (1880 to 2000), y-axis: SOI]

Figure 1: Historical Plot of the Southern Oscillation Index. Plotted in R using the following codes:



R codes for Figure 1

Correlogram (ACF and PACF)

The autocorrelations of the Southern Oscillation Index data are insignificant at lags 12 and 13, and are consistently insignificant, at small negative values, from lag 27 onward.

Figure 2: The Autocorrelation Function Plot of the Southern Oscillation Index Data.

Lag       AC     PACF
  1    0.634    0.634
  2    0.532    0.217
  3    0.458    0.099
  4    0.395    0.046
  5    0.358    0.051
  6    0.309    0.008
  7    0.251   -0.028
  8    0.197   -0.036
  9    0.166   -0.002
 10    0.101   -0.067
 11    0.056   -0.043
 12    0.022   -0.027
 13   -0.030   -0.060
 14   -0.099   -0.102
 15   -0.116   -0.015
 16   -0.106    0.041
 17   -0.104    0.022
 18   -0.122   -0.026
 19   -0.138   -0.020
 20   -0.123    0.029
 21   -0.114    0.009
 22   -0.110   -0.007
 23   -0.100    0.007
 24   -0.126   -0.064
 25   -0.085    0.044
 26   -0.077   -0.002
 27   -0.034    0.055
 28   -0.021   -0.004
 29   -0.032   -0.046
 30   -0.041   -0.035
 31   -0.043   -0.015
 32   -0.042   -0.023
 33   -0.015    0.031
 34   -0.029   -0.038
 35   -0.024    0.007
 36   -0.024   -0.005

[Figure 2 plot: SOI Autocorrelation Function Plot; x-axis: Lag, y-axis: ACF]


The values of the ACF generated in Eviews are shown in the margin. As observed, the ACF is insignificant from lag 27 all the way down to lag 36. The autocorrelation function is the set of autocorrelation coefficients arranged as a function of separation in time; it measures the correlation between observations at different times. Positive autocorrelation can therefore be considered a specific form of "persistence", a tendency for a system to remain in the same state from one observation to the next. For example, the likelihood of a positive SOI value next month is greater if the SOI of the current month is positive than if it is negative. However, that is only a likelihood: as seen in the historical plot, there are cases where a negative SOI value is followed by a positive value in the next month. Such cases may be explained by the negative values of the ACF.
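To make the idea of the sample ACF and its significance concrete, here is a minimal sketch in Python (the exercise itself uses R and Eviews). The function names and the toy series are assumptions made for illustration; the bound 1.96/sqrt(n) is the usual large-sample approximation for a single lag under white noise.

```python
import math

def sample_acf(series, max_lag):
    """Return the sample autocorrelations r_1, ..., r_max_lag of a sequence."""
    n = len(series)
    mean = sum(series) / n
    deviations = [value - mean for value in series]
    c0 = sum(d * d for d in deviations)  # lag-0 sum of squared deviations
    acf = []
    for lag in range(1, max_lag + 1):
        ck = sum(deviations[t] * deviations[t + lag] for t in range(n - lag))
        acf.append(ck / c0)
    return acf

def significance_bound(n):
    """Approximate 95% bound for one autocorrelation of white noise."""
    return 1.96 / math.sqrt(n)

# Hypothetical toy series (not the SOI data).
toy = [1.0, 2.0, 3.0, 4.0, 5.0, 4.0, 3.0, 2.0, 1.0, 2.0, 3.0, 4.0]
r = sample_acf(toy, 3)
bound = significance_bound(len(toy))
for lag, value in enumerate(r, start=1):
    flag = "significant" if abs(value) > bound else "insignificant"
    print(f"lag {lag}: r = {value:.3f} ({flag}, bound = {bound:.3f})")
```

If the SOI series has no missing months, January 1876 to July 2012 gives n = 1639, so `significance_bound(1639)` is roughly 0.048. Against that bound, the tabulated values at lags 12 and 13 (0.022 and -0.030) and from lag 27 onward are insignificant, which agrees with the reading of the correlogram above.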

The partial autocorrelation function plot shows statistical significance at lags 1, 2, and 3. The next few lags are at the borderline of statistical significance, but lag 14 is still statistically significant.

[Figure 3 plot: SOI Partial Autocorrelation Function Plot; x-axis: Lag, y-axis: Partial ACF]

Figure 3: The Partial Autocorrelation Function Plot of the Southern Oscillation Index Data.

R codes for Figure 3

The PACF is useful in identifying the order of an autoregressive model, AR(p). Thus, the appropriate order for the AR model should be 3. But using R to confirm the order of the AR, we have

For some reason R chooses 14 as the AR order. We do not know why, but we will stick with order 3.
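The link between the ACF and the PACF can be checked by hand with the Durbin-Levinson recursion. The sketch below is in Python rather than the R used in the exercise; the function name is an assumption, and the input values are the first three autocorrelations from the Eviews table.

```python
def pacf_from_acf(acf):
    """Durbin-Levinson recursion.

    acf[k - 1] holds r_k; the result holds phi_kk for k = 1, ..., len(acf).
    """
    pacf = []
    prev = []  # coefficients phi_{k-1,1}, ..., phi_{k-1,k-1}
    for k in range(1, len(acf) + 1):
        if k == 1:
            phi_kk = acf[0]
            current = [phi_kk]
        else:
            num = acf[k - 1] - sum(prev[j] * acf[k - 2 - j] for j in range(k - 1))
            den = 1.0 - sum(prev[j] * acf[j] for j in range(k - 1))
            phi_kk = num / den
            # Update the lower-order coefficients, then append phi_kk.
            current = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)]
            current.append(phi_kk)
        pacf.append(phi_kk)
        prev = current
    return pacf

# First three autocorrelations of the SOI data, as tabulated.
soi_acf = [0.634, 0.532, 0.458]
print([round(value, 3) for value in pacf_from_acf(soi_acf)])
```

Because the inputs are rounded to three decimals, the results should be read as approximate; they come out close to the tabulated PACF values 0.634, 0.217, and 0.099.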

Augmented Dickey-Fuller (ADF) Test

H0: The data needs to be differenced to make it stationary.
H1: The data is stationary and does not need to be differenced.

Performing this in R using the following codes, we have

k = 2 implies that the lag order is 2. You can set the lag to any order, but here we choose 2 to obtain the same output as the augmented Dickey-Fuller test calculated in Eviews, which uses 2 as the lag length. The Eviews output presented later shows why 2 was chosen as the lag order.

Now, the computed test statistic of the ADF test is -11.8617 with p-value = 0.01. Hence, at the default 0.05 level of significance, the null hypothesis is rejected, implying that the SOI data follows a stationary process.

Moreover, adf.test() in R by default fits the test equation with an intercept and a trend. There is no option to change this setting, because the usage of the adf.test() function is

adf.test() usage

In this case, we cannot rely on the ADF value obtained in R, since our series does not have a trend. Still, there is something interesting about R's default option; we will get back to this later.
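For reference, the three standard forms of the ADF test regression with k lagged differences can be written as below. The symbols $\alpha$, $\beta$, $\gamma$, $\delta_i$, and $\varepsilon_t$ are notation introduced here, not taken from the exercise; in each form the null hypothesis of a unit root is $\gamma = 0$, tested against $\gamma < 0$.

```latex
\begin{align*}
\text{no intercept, no trend:} \quad
  & \Delta Z_t = \gamma Z_{t-1} + \sum_{i=1}^{k} \delta_i \, \Delta Z_{t-i} + \varepsilon_t \\
\text{intercept only:} \quad
  & \Delta Z_t = \alpha + \gamma Z_{t-1} + \sum_{i=1}^{k} \delta_i \, \Delta Z_{t-i} + \varepsilon_t \\
\text{intercept and trend:} \quad
  & \Delta Z_t = \alpha + \beta t + \gamma Z_{t-1} + \sum_{i=1}^{k} \delta_i \, \Delta Z_{t-i} + \varepsilon_t
\end{align*}
```

As described in the text, the R run corresponds to the third form and the Eviews run to the first, both with k = 2, which would account for the small difference between the two test statistics.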




Note that there is a warning message in the output of the R codes, which says that the p-value is smaller than the printed p-value. This was corroborated by the Eviews output.

Eviews output

If R uses an on its output, Eviews uses

as a guidance in decision making. There are actually two tables generated by Eviews, but we are concerned only with the ADF test statistic, and thus do not include the second table. Here, we obtain a value of -11.86419, which is slightly different from the ADF test statistic in the R output. This is because we modified the option in Eviews, setting the test equation to have no intercept. Also, the p-value is very small compared to the one in R. That is nothing to worry about, since this is exactly why R printed a warning that the p-value is smaller than the printed one. In addition, Eviews chooses lag length 2 based on the Schwarz Information Criterion (SIC). So, which one should we consider, the ADF statistic calculated by R or by Eviews? We choose the ADF test of Eviews, since we set its test equation to have no trend and no intercept.

Why is the ADF test calculated by R interesting? Because R includes a trend and an intercept in the test equation. What does that give us? It is useful to allow our stationary series to have a trend: when the null hypothesis is rejected even then, we can be confident that the test with no trend and no intercept would also indicate stationarity.

Since the augmented Dickey-Fuller test assures us that the Southern Oscillation Index data follows a stationary process, we can proceed with the modelling.

Modelling Stationary Data (Southern Oscillation Index)

What are the prospective models? Compare them by AIC or BIC, generate the residuals, show their correlogram and historical plot, and test for serial correlation.

We use the following steps to identify a tentative model.

Step 1. Plot the time series and choose proper transformations.
This step was already performed in the first part of this exercise, the historical plot. We saw that no transformation is needed, since there is no trend or seasonality in the data.

Step 2. Compute and examine the sample ACF and the sample PACF of the original series to further confirm the necessary degree of differencing.
We already performed this, and we found that differencing is not needed, since the ACF shows no sign of slow decay; if it did, differencing would be needed.

Step 3. Compute and examine the sample ACF and PACF of the properly transformed and differenced series to identify the orders of p and q.
We are not concerned with this step, since we did not perform differencing.

Step 4. Test the deterministic trend term $\theta_0$ when d > 0.
We are not concerned with this step, since the data do not have a trend and no differencing was done.

After performing the steps, we can now identify the model for our data. Recall that the series of the Southern Oscillation Index data shows no trend or seasonality. This indicates a stationary process with constant mean and variance. The fact that the ACF decays exponentially, that the PACF has a third spike that is significant ($\alpha = 0.05$) at lag 3, and that no differencing was done indicates that the series is likely to be generated by an AR(3) process,

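$$(1 - \phi_1 B - \phi_2 B^2 - \phi_3 B^3)(Z_t - \mu) = a_t$$

$$(1 - \phi_1 B - \phi_2 B^2 - \phi_3 B^3)\dot{Z}_t = a_t \qquad (1)$$

where $\dot{Z}_t = Z_t - \mu$.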

The above model is tentative. To choose the best one, we need candidate models. Each model should of course have an AR term. Since our tentative model is an AR(3), the candidates can combine the AR(3) part with an MA part. Below are our candidate models.

ARIMA(3,0,0) - Tentative Model

ARIMA(3,0,1)

ARIMA(3,0,2)

ARIMA(3,0,3)

The above models will be diagnosed: we are going to check whether their residuals are already white noise. That is, we are going to investigate the historical plot, the correlogram, and the serial correlation of the residuals.

Diagnostic

Round 1: Historical Plot

[Figure 4 plot: Residuals of ARMA(3,0) of SOI Data; x-axis: Time (1880 to 2000), y-axis: Residuals]

Figure 4: Residual Historical Plot of the ARMA(3,0)

[Three further residual plots with the same axes (Time, Residuals), one for each of the remaining candidate models]

The four figures show the plots of the residuals of the four candidate models. All of them are almost the same, and they are generated by the following R codes.

Candidate models for ARIMA(p,0,q)

Since we cannot tell the models apart from the historical plots of their residuals, let us consider their correlograms.

Round 2: Correlogram

[Figure 8 plot: ARMA(3,0) Residuals ACF Plot of SOI data; x-axis: Lag, y-axis: ACF]

Figure 8: Residual Autocorrelation Function Plot of ARMA(3,0).

[A further residual ACF plot with the same axes (Lag, ACF)]

A correlation of a variable with itself at different times is known as autocorrelation or serial correlation. The autocorrelation function plots of the residuals of the candidate models are also almost the same. It can be observed that the first few lags, except for lag 0, show an insignificant ACF. However, there are still spikes that extend significantly. If $\rho_k = 0$ (i.e. the ACF is 0 at lag k), the sampling distribution of $r_k$ (the estimated ACF) is approximately normal, with a mean of $-1/n$ and a variance of $1/n$. Hence, if $r_k$ falls outside the dashed blue lines in the ACF plot, we have evidence against the null hypothesis that $\rho_k = 0$ at the 5% level. However, we should be careful about interpreting multiple hypothesis tests. Firstly, if $\rho_k$ does equal 0 at all lags k, we still expect 5% of the estimates, $r_k$, to fall outside the lines. Secondly, the $r_k$ are correlated, so if one falls outside the lines, the neighbouring ones are more likely to be statistically significant. We cannot tell at this point whether our residuals are white noise or not: most spikes are insignificant, but there are still a few spikes that count against it. Nevertheless, since the blue lines are by default at the 5% level of significance, we can expect 5% of the estimates to fall outside the lines. All four residual ACF plots have four significant spikes, which makes it difficult to decide which model is closest to white noise. Anyway, we will proceed with the next round, that is, testing the normality of the residuals.

[Two further residual ACF plots with the same axes (Lag, ACF), for the remaining candidate models]

Round 4: Normality

To test the normality of the residuals, we can make use of the Shapiro-Wilk test and the histogram.

Shapiro-Wilk test

The above output shows that the residuals of the ARMA(3,0) model are not normally distributed, since the p-value is less than 0.05, which rejects our null hypothesis that the residuals are normally distributed.

The residuals of ARMA(3,1) are also not normally distributed, since the p-value 1.039e-05 is less than 0.05.

The residuals of ARMA(3,2) are also not normally distributed, since the p-value 1.091e-05 is less than 0.05.

Yes, nothing is normal, not even the residuals of ARMA(3,3), the last candidate model, since the p-value 1.178e-05 is less than 0.05.

Big Problem!

Since none of the residual series is normal, we can at least look for the model that is closest to normal by investigating the skewness and kurtosis of its residuals. We are now interested in a skewness close to zero and a kurtosis close to 3. Below is the calculation from R.

Skewness

From the four outputs, we find that ARMA(3,0) has the skewness closest to zero. What about the kurtosis?

Kurtosis

The output shows that ARMA(3,3) has the kurtosis closest to 3, but the other models are also close. Among the four models, we find that ARMA(3,0), ARMA(3,1), and ARMA(3,2) are closer to normality in terms of skewness and kurtosis taken together. Thus, we eliminate the last candidate model, ARMA(3,3).
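The skewness and kurtosis comparison can be sketched as follows, in Python rather than R. This uses the plain moment-based definitions (normal data give skewness 0 and kurtosis 3); the R functions used in the exercise are not shown in the text and may apply a different small-sample correction. The helper `distance_from_normal` and the sample residuals are hypothetical.

```python
def skewness_kurtosis(values):
    """Moment-based sample skewness and (non-excess) kurtosis."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return skew, kurt

def distance_from_normal(skew, kurt):
    """Hypothetical score: distance of (skew, kurt) from the normal values (0, 3)."""
    return abs(skew) + abs(kurt - 3.0)

# Hypothetical residuals for illustration only (not the SOI residuals).
residuals = [-2.0, -1.0, 0.0, 1.0, 2.0]
skew, kurt = skewness_kurtosis(residuals)
print(skew, kurt, distance_from_normal(skew, kurt))
```

Applied to the residuals of each candidate model, the model with the smallest score would be the one closest to normality on these two measures; how to weigh skewness against kurtosis is a judgment call, and the equal weighting here is only one option.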

Round 5: Test for Serial Correlation

To save time, we will not use R for now. Let us consider the output of the test from Eviews. Figure 12 is a sample output of the Eviews Serial Correlation LM Test. We are concerned only with the computed Durbin-Watson test statistic. The null hypothesis is that

H0 : The errors are uncorrelated.

Due to the large number of observations, we have the following criterion for the test:

Durbin-Watson Test Statistic    Interpretation
< 2                             (+) Serial Corr.
= 2                             (NO) Serial Corr.
> 2                             (-) Serial Corr.

Thus, ARMA(3,0), with a Durbin-Watson test statistic of 1.905668, has a positive serial correlation. For ARMA(3,1) the statistic is 1.990580, which is approximately 2 and implies no serial correlation. For ARMA(3,2) it is 2.010376, also close to 2. Thus, at this point we eliminate ARMA(3,0) as a candidate model. We are left with two models whose errors could be the closest to white noise.
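For readers who want to see where the Durbin-Watson number comes from, here is a minimal Python sketch of the statistic and of the rule of thumb used above. The `interpret` helper and its tolerance of 0.05 around 2 are assumptions chosen only to mirror the decisions in the text; they are not part of the Eviews output.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squares."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den

def interpret(d, tolerance=0.05):
    """Hypothetical reading of the statistic; the tolerance is an assumption."""
    if abs(d - 2.0) <= tolerance:
        return "no serial correlation"
    if d < 2.0:
        return "positive serial correlation"
    return "negative serial correlation"

# Statistics reported in the text for the three remaining models.
for model, d in [("ARMA(3,0)", 1.905668),
                 ("ARMA(3,1)", 1.990580),
                 ("ARMA(3,2)", 2.010376)]:
    print(model, d, interpret(d))
```

The statistic is approximately 2(1 - r_1), where r_1 is the lag-1 autocorrelation of the residuals, which is why values near 2 indicate little first-order serial correlation.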

Figure 12: Serial Correlation LM Test in Eviews

Now we compute the Akaike Information Criterion (AIC) for the two models. We have the following calculations in R.

To begin with, consider the general model ARIMA(p, d, q). We will investigate the AIC of the two remaining models and choose the model with the smallest AIC.

AIC for ARIMA(p,0,q)

From the output, model302, that is ARMA(3,2), is the better model, since it has the smaller AIC of the two.
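As a reminder of what is being compared, the usual definitions of the two criteria named in the exercise are given below. The symbols $\hat{L}$ (maximized likelihood), $m$ (number of estimated parameters), and $n$ (number of observations) are notation introduced here, not taken from the exercise.

```latex
\begin{align*}
\mathrm{AIC} &= -2 \ln \hat{L} + 2m \\
\mathrm{BIC} &= -2 \ln \hat{L} + m \ln n
\end{align*}
```

Both reward a better fit through the likelihood term and penalize extra parameters, so a smaller value is preferred; software packages may scale these quantities differently, so values should only be compared within the same package.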

Best Model obtained using Akaike Information Criterion (AIC):

$$(1 - \phi_1 B - \phi_2 B^2 - \phi_3 B^3)\dot{Z}_t = (1 - \theta_1 B - \theta_2 B^2) a_t$$

where $\dot{Z}_t = Z_t - \mu$.

The coefficients of our model are

Let us try forecasting 12 months ahead using our model,

Figure 13 is the plot of the series with the forecasted values highlighted in orange-yellow.

[Figure 13 plot: Forecasts from ARIMA(3,0,2) with nonzero mean; x-axis: 1880 to 2000]

Figure 13: Forecasted values of the Southern Oscillation Index from the ARMA(3,2) model


Answers for Gross Domestic Product of the Philippines Data

Historical Plot

In Figure 14, a regular pattern of high points or peaks during the second and fourth quarters and low points or troughs during the first and third quarters is evident. The plot also shows a gradual change in the seasonal behavior: the second-quarter peak has disappeared in the most recent years, while the fourth-quarter peak has become more pronounced.

[Figure 14 plot: Gross Domestic Product of the Philippines; x-axis: Time (1980 to 2010), y-axis: GDP]

Figure 14: Gross Domestic Product of the Philippines from January 1981 to July 2010, with periodicity equal to 4.

R codes for Figure 14

From the historical plot, we get the idea that the data is nonstationary, since the mean is not constant and the variance also does not seem to be constant. We will therefore apply differencing in the later steps.

Correlogram (ACF and PACF)

The autocorrelation function plot of the Philippines GDP (Figure 15) shows a slow decay.

R codes for Figure 15

[Figure 15 plot: GDP Autocorrelation Function Plot; x-axis: Lag, y-axis: ACF]

Figure 15: Gross Domestic Product of the Philippines Autocorrelation Function Plot.

[Figure 16 plot: GDP Partial Autocorrelation Function Plot; x-axis: Lag, y-axis: Partial ACF]

Figure 16: Gross Domestic Product of the Philippines Partial Autocorrelation Function Plot.

The partial autocorrelation function cuts off at lag 2, but also has significant spikes at lags 4, 5, and 9. Since the ACF decays slowly as the lag increases, the series is said to have a trend, and that is what we actually have. Before applying differencing, it is better to perform the augmented Dickey-Fuller (ADF) test to check the stationarity of the data.

Augmented Dickey-Fuller (ADF) Test

H0: The data needs to be differenced to make it stationary.
H1: The data is stationary and does not need to be differenced.

We will use R to perform the test, and we will check the answer against the Eviews calculation.

And BOOM! As expected, the series is nonstationary, since the null hypothesis is not rejected. Thus, the data needs to be differenced. The Eviews output is shown next; in this case we obtain the same result. This is expected, since the data has a trend and we included a trend term in the Eviews options.

Eviews output

Nonstationarity in the series calls for differencing, so we proceed with the next step.

Seasonal Differencing

At this point we use Eviews for the seasonal differencing; the same can be done in R with diff() and its lag argument. Figure 17 is the first seasonal difference of the logarithmic transformation of the GDP data.

Figure 17: First Difference of the Logarithmic Philippines Gross Domestic Product Data

As observed, the plot no longer shows a trend, which suggests that the differenced series is stationary.
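The differencing of the log series can be sketched in a few lines of Python (the exercise itself used Eviews for this step). The function name and the quarterly values below are made up for illustration and are not the actual GDP figures; a lag of 4 gives the seasonal difference for quarterly data and a lag of 1 the regular first difference.

```python
import math

def difference(series, lag=1):
    """Return series[t] - series[t - lag] for t = lag, ..., n - 1."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# Hypothetical quarterly values: 10% yearly growth with a fourth-quarter peak.
gdp = [100.0, 104.0, 102.0, 120.0,
       110.0, 114.4, 112.2, 132.0,
       121.0, 125.84, 123.42, 145.2]
log_gdp = [math.log(value) for value in gdp]

seasonal_diff = difference(log_gdp, lag=4)  # (1 - B^4) log Z_t
regular_diff = difference(log_gdp, lag=1)   # (1 - B) log Z_t

print([round(value, 4) for value in seasonal_diff])
print([round(value, 4) for value in regular_diff])
```

In this toy series every seasonal difference of the logs equals log(1.1), so both the trend and the quarterly pattern are removed, while the regular first difference still carries the quarterly pattern. Which of the two was applied to produce Figure 17 should be checked against the Eviews settings, since the text and the caption describe it differently.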

Correlogram (ACF and PACF)

The correlogram of the first-differenced log of GDP has no significant spikes in either the ACF or the PACF. In this case we cannot determine our tentative model. We will not proceed with second differencing; instead, we test the stationarity of the first-order difference using the ADF test.

From the ADF test table, we find that the first-order difference of the log of GDP is stationary, since the test statistic -11.18407 is less than all three critical values. Thus, we proceed with diagnostic checking.

Modelling

After performing the previous step, our candidate models are combinations of AR and MA terms with first-order differencing. Thus, we can use the following:




ARIMA(1,1,0)

ARIMA(0,1,1)

ARIMA(1,1,1)

ARIMA(2,1,1)

ARIMA(1,1,2)

ARIMA(2,1,2)

Diagnostic

We will not investigate the historical and ACF plots of the candidate models, since it is difficult to see the differences between them. Only normality and serial correlation are our concern now, and that is enough to check for white-noise errors.

Test for Serial Correlation and Normality

At this point we use Eviews. Below is the sample Eviews output for ARIMA(1,1,0).

Figure 18: Sample Eviews Output of ARIMA(1,1,0)

From the table, we are concerned with the Durbin-Watson stat for our serial correlation test; the criterion is the same as the one used for the Southern Oscillation Index data, in which we look for a test statistic close to 2. Notice that ARIMA(1,1,0) has a test statistic of 1.997505, which is slightly below 2 and so, by that criterion, points at most to a very weak positive serial correlation. Below are the Durbin-Watson stat, Jarque-Bera, and Akaike Information Criterion obtained for ARIMA(1,1,0) and for ARIMA(0,1,1), ARIMA(1,1,1), ARIMA(2,1,1), ARIMA(1,1,2), and ARIMA(2,1,2).


Models          Durbin-Watson    AIC          Jarque-Bera
ARIMA(1,1,0)    1.997505         -4.834859    4.249897
ARIMA(0,1,1)    2.000940         -4.841970    4.313999
ARIMA(1,1,1)    1.980834         -4.866778    1.713571
ARIMA(2,1,1)    1.986285         -4.827582    4.326431
ARIMA(1,1,2)    1.988571         -4.833490    4.733510
ARIMA(2,1,2)    2.034422         -4.906359    0.848613

Among all the candidate models, the best one is ARIMA(2,1,2). It has the smallest Jarque-Bera value of all the models, which makes its residuals the closest to normality. Its Durbin-Watson statistic, 2.034422, is also close to 2, implying no serial correlation. In addition, it has the smallest AIC. Thus ARIMA(2,1,2) is the best model. We stop at AR(2) and MA(2) because we prefer lower orders of AR and MA: the higher the order, the more complicated the equations of the model.
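The selection can be double-checked mechanically from the tabulated values. The short Python sketch below copies the rows of the table and ranks the candidates on each criterion; the variable names are only illustrative.

```python
# Rows copied from the table: (model, Durbin-Watson, AIC, Jarque-Bera).
candidates = [
    ("ARIMA(1,1,0)", 1.997505, -4.834859, 4.249897),
    ("ARIMA(0,1,1)", 2.000940, -4.841970, 4.313999),
    ("ARIMA(1,1,1)", 1.980834, -4.866778, 1.713571),
    ("ARIMA(2,1,1)", 1.986285, -4.827582, 4.326431),
    ("ARIMA(1,1,2)", 1.988571, -4.833490, 4.733510),
    ("ARIMA(2,1,2)", 2.034422, -4.906359, 0.848613),
]

best_aic = min(candidates, key=lambda row: row[2])
best_jb = min(candidates, key=lambda row: row[3])
closest_dw = min(candidates, key=lambda row: abs(row[1] - 2.0))
max_dw_gap = max(abs(row[1] - 2.0) for row in candidates)

print("smallest AIC:", best_aic[0])
print("smallest Jarque-Bera:", best_jb[0])
print("Durbin-Watson closest to 2:", closest_dw[0])
print("largest |DW - 2|:", round(max_dw_gap, 6))
```

AIC and Jarque-Bera both point to ARIMA(2,1,2). On Durbin-Watson alone ARIMA(0,1,1) is the closest to 2, but every candidate is within about 0.035 of 2, so this criterion does not really separate the models.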

Obtained Best Model:

$$(1 - \phi_1 B - \phi_2 B^2)(1 - B)Z_t = (1 - \theta_1 B - \theta_2 B^2) a_t$$

The roots of the model and the significance of its coefficients are shown below.

Data Description

The Southern Oscillation Index gives an indication of the development and intensity of El Niño or La Niña events in the Pacific Ocean. The SOI is calculated using the pressure differences between Tahiti and Darwin. Sustained negative values of the SOI greater than -8 often indicate El Niño episodes.

These negative values are usually accompanied by sustained warming of the central and eastern tropical Pacific Ocean, a decrease in the strength of the Pacific Trade Winds, and a reduction in winter and spring rainfall over much of eastern Australia and the Top End. Sustained positive values of the SOI greater than +8 are typical of a La Niña episode. They are associated with stronger Pacific trade winds and warmer sea temperatures to the north of Australia. Waters in the central and eastern tropical Pacific Ocean become cooler during this time.

The SOI data collected is from January of year 1876 to July of year 2012.

Data Source: www.bom.gov.au/

Gross Domestic Product is the monetary value of all the finished goods and services produced within a country's borders in a specific time period, though GDP is usually calculated on an annual basis. It includes all of private and public consumption, government outlays, investments and exports less imports that occur within a defined territory.

GDP is commonly used as an indicator of the economic health of a country, as well as to gauge a country's standard of living. Critics of using GDP as an economic measure say the statistic does not take into account the underground economy - transactions that, for whatever reason, are not reported to the government. Others say that GDP is not intended to gauge material well-being, but serves as a measure of a nation's productivity, which is unrelated.

The GDP (Philippines) data collected is from the first quarter of year 1981 to the 4th quarter of year 2010.

Data Source: http://www.nscb.gov.ph/

References:

Wei, W. W.S. (1990). Time Series Analysis. Canada. Addison-Wesley.

Crawley, M. J. (2007). The R Book. England. Wiley.
