Econ 623 Econometrics II Topic 5: Non-stationary …Econ 623 Econometrics II Topic 5: Non-stationary Time Series 1 Types of non-stationary time series models often found in economics

Econ 623 Econometrics II

Topic 5: Non-stationary Time Series

1 Types of non-stationary time series models oftenfound in economics

� Deterministic trend (trend stationary):

Xt = f(t) + "t;

where t is the time trend and "t represents a stationary error term (hence both the

mean and the variance of "t are constant. Usually we assume "t � ARMA(p; q)

with mean 0 and variance �2)

� f(t) is a deterministic function of time and hence the model is called the

deterministic trend model.

�Why non-stationary? E(Xt) = f(t)(depends on t, hence not a constant);

V ar(Xt) = �2(constant)

� Since "t is stationary, Xt is �uctuating around f(t), the deterministic trend.

Xt has no obvious tendency for the amplitude of the �uctuations to increase

or decrease since V ar(Xt) is a constant. (Xt could increase or decrease,

however.)

�Xt� f(t) is stationary. For this reason a deterministic trend model is some-

times called a trend stationary model.

�Assume "t = �(L)et; et � iid(0; �2e): If we label et the innovation or the shock

to the system, the innovation must have a transient, diminishing e¤ect on

X. Why?

1

0 20 40 60 80 100

46

812

16

0 20 40 60 80 100

020

6010

0

0 20 40 60 80 100

040

8012

0

2

0 50 100 150 200

02

46

810

12

0 50 100 150 200

01

23

4

3

0 200 400 600 800 1000

20

10

010

random walk

0 200 400 600 800 1000

020

4060

80

4

Example: Xt = �0 + �1t + �(L)et; with �(L) = 1 + �L + �2L2 + � � � and

j�j < 1

So Xt = �0 + �1t+ "t; "t = �"t�1 + et

"t = �(L)et measures the deviation of the series from trend in period t. We

wish to examine the e¤ect of an innovation et on "t; "t+1; :::

"t = et + �et�1 + �2et�2 + :::

which gives @"t+s@et

= �s (impulse response)

Therefore, when there is a shock to the economy, a deterministic trend model

implies the shock has a transient e¤ect. Sooner or later the system will be

back to the deterministic trend.

� If f(t) = c1 + c2t, we have a linear trend model. It is widely used.

� If f(t) = Aert, we have an exponential growth curve

� If f(t) = c1 + c2t+ c3t2, we have a quadratic trend model

� If f(t) = 1k+abt

; we have a logistic curve

5

� Stochastic trend (unit root or di¤erence stationarity):

Xt = �+Xt�1 + "t

where "t is a stationary process (hence both the mean and the variance of "t are

constant. Usually we assume "t � ARMA(p; q) with mean 0 and variance �2)

�Another representation: (1� L)Xt = �+ "t

� Solving 1 � L = 0 for L; we have L = 1, justifying the terminology �unit

root�.

�Pure random walk: Xt = Xt�1 + et; etiid

~N(0; �2e)

1. Xt =Pt�1

j=0 et�j if we assume X0 = 0 with probability 1

2. E(Xt) = 0; V ar(Xt) = t�2e !1

3. Non-stationarity. Xt is wandering around and can be anywhere.

6

�Random walk with a drift: Xt = �+Xt�1 + et; etiid� N(0; �2e)

1. Xt = t�+Pt�1

j=0 et�j if we assume X0 = 0 with probability 1

2. E(Xt) = t�!1; V ar(Xt) = t�2e !1

3. Non-stationarity.

4. As E(Xt) = t�, a stochastic trend (or unit root) model could behave

similar to a model with a linear deterministic trend.

� In a random walk model, Xt = t�+Pt

j=0 et�j. If we label et�j the innovation

or the shock to the system, the innovation has a permanent e¤ect on Xt

because@Xt

@et�j= 1; 8 j > 0:

This is true for all models with a unit root. Therefore, when there is a shock

to the economy, a unit root model implies that the shock has a permanent

e¤ect. The system will begin with a new level every time.

� In a trend-stationary model

Xt = �0 + �1t+ �Xt�1 + et; j�j < 1;

we have@Xt

@et�j= �j ! 0:

So the innovation has a transient e¤ect on Xt

7

�A deterministic trend model and a stochastic trend (or unit root) model

could behave similar to each other. However, knowing whether non-stationarity

in the data is due to a deterministic trend or a stochastic trend (or unit

root) would seem to be a very important question in economics. For ex-

ample, macroeconomists are very interested in knowing whether economic

recessions have permanent consequences for the level of future GDP, or in-

stead represent temporary downturns with the lost output eventually made

up during the recovery.

� If (1� L)Xt = "t � ARMA(p; q); we say Xt � ARIMA(p; 1; q) or Xt is an

I(1) process.

� If (1�L)dXt = "t � ARMA(p; q); we say Xt � ARIMA(p; d; q) or Xt is an

I(d) process.

8

� Why linear time trends and unit roots?

�Why linear trends? Indeed most GDP series, such as many economic and

�nancial series seem to involve an exponential trend rather than a linear

trend (apart from a stochastic trend). Suppose this is true, then we have

Yt = exp(�t)

However, if we take the natural log of this exponential trend function, we

will have,

lnYt = �t

Thus, it is common to take logs of the data before attempting to describe

them. After we take logs, most economic and �nancial time series exhibit a

linear trend. This is why researchers often take the logarithmic transforma-

tion before doing analysis.

9

�Why unit roots? After taking logs, Suppose a time series follows a unit root

rather than a linear trend, we have (1�L) lnYt = "t; where "t is a stationary

process, such as ARMA(p,q). However, (1�L) lnYt = ln(Yt=Yt�1) = ln(1+Yt�Yt�1Yt�1

) � Yt�Yt�1Yt�1

: So the rate of growth of the series Yt is stationary. For

example, if Yt represents CPI and lnYt is a I(1) process, then (1 � L) lnYt

represent in�ation rate and is a stationary process.

� Other forms of nonstationarity: structural break in mean, structural break in

variance

10

2 Unit Root Tests and A Stationarity Test

� The theory of ARMA method relies on the assumption of stationarity.

� The assumption of stationarity is too strong for many macroeconomic time series.

For example, many macroeconomic time series involve trend (both deterministic

trends and stochastic trends �unit root).

� Tests for a unit root. Various test statistics are often used in practice: Augmented

Dickey-Fuller (ADF) test, Phillips and Perron (PP) test.

� Test for stationarity: Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test.

� Three possible alternative hypotheses for the unit root tests

1. no constant and no trend: Xt = �Xt�1 + et; j�j < 1

2. constant but no trend: Xt = �+ �Xt�1 + et; j�j < 1

3. constant and trend: Xt = �+ �Xt�1 + �t+ et; j�j < 1

11

� First consider the AR(1) process, yt = �yt�1 + et; ut � iidN(0; �2)

� Stationarity requires j�j < 1

� If j�j = 1; yt is nonstationary and has a unit root

� Test H0 : � = 1 against H1 : j�j < 1

� How to estimate �? OLS estimation, ie, b� = Pytyt�1Py2t�1

= �+Pyt�1etPy2t�1

. Stock (1994)

shows that the OLS estimator of � is superconsistent:

� What is the distribution or asymptotic distribution of b�� ?

� Recall from the standard model, y = X� + e

� If X is non-stochastic, b� � � � N(0; �2(X0X)�1)

� If X is stochastic and stationary with some restrictions on autocovariance,pT (b� � �)

d! N(0; �2Q�1) where Q = p lim X0XT

12

2.1 Classical Asymptotic Theory for Covariance StationaryProcess

� Consider the AR(1) process, yt = �yt�1 + et; et � iidN(0; �2) with j�j < 1

� For this model E(yt) = 0, V ar(yt) = �2

1��2 � �2y, j = �jV ar(yt).

� What is the asymptotic property of y = 1T

PTt=1 yt?

� We haveE(y) = 0, V ar(y) = E(y2) = V ar(yt)T

�1 + 2T�1

T�+ 2T�2

T�2 + � � �+ 2 1

T�T�1

�!

0. So y r2! 0) yp! 0

� Also, TV ar(y)!P1

j=�1 j, which is called the long-run variance.

� The consistency is generally true for any covariance stationary process with con-

ditionP1

j=0 j jj <1.

� Martingale Di¤erence Sequence (MDS): E(ytjyt�1; :::; y1) = 0 for all t

� MDS is serially uncorrelated but not necessarily independent.

� MDS does not need to be stationary.

13

� Theorem (White, 1984): Let fytg be a MDS with yT = 1T

PTt=1 yt Suppose that

(a) E(y2t ) = �2t > 0 with 1T

PTt=1 �

2t ! �2 > 0, (b) Ejytjr < 1 for some r > 2

and all t, and (c) 1T

PTt=1 y

2t

p! �2. ThenpTyT

d! N(0; �2).

� Theorem (Andersen, 1971): Let

yt =P1

t=0 j"t�j

where f"tg is a sequence of iid variables with E("2t ) < 1 andP1

j=0 j jj < 1.

Let j be the jth order autocovariance. Then

pTyT

d! N(0;P1

j=�1 j)

� To know more about asymptotic theory for linear processes, read Phillips and

Solo (1991).

14

2.2 Dickey-Fuller Test

� Under H0, however, we cannot claimpT (b�� )

d! N(0; �2Q�1) because yt�1 is

not stationary.

� If yt = �yt�1 + et; and H0 : � = 1; we have b�� 1 = Pyt�1etPy2t�1

� Consider y2t = (yt�1 + et)2 =) yt�1et =

12(y2t � y2t�1 � e2t ) =)

Pyt�1et =

12(y2T � y20 �

Pe2t )

� Let y0 = 0;Pyt�1et =

12(y2T �

Pe2t )

� et � iid(0; �2) =) e2t � iid with E(e2t ) = �2: By LLN, 1T

Pe2t

p! �2

� yT = y0 +PT

t=1 et =Pet

a� N(0; T�2) =) yTpT�

d! N(0; 1) =) y2TT�2

d! �2(1)

� 1T�2

Pyt�1et =

12(y2TT�2

�Pe2t

T�2)d! 1

2(�2(1) � 1)

� What is the limit behavior ofPy2t�1?We need to introduce the Brownian Motion.

15

2.2.1 Brownian Motion

� Consider yt = yt�1 + et; y0 = 0; et � iidN(0; 1) =) yt =Pt

j=1 ej � N(0; t)

� 8t > s; yt � ys =Pt

j=1 ej �Ps

j=1 ej = es+1 + :::+ et � N(0; (t� s))

� Also note that, yt � ys is indept of yr � yq if t > s > r > q

� What happen if we go from the discrete case to the continuous case? We have

the standard Brownian Motion (BM). It is denoted by B(t)

� A stochastic process Bt over time is a standard Brownian motion if for small

time interval �, the change in Bt;�Bt(� Bt+��Bt), follows the following prop-

erties:

1. �Bt = ep� where e � N(0; 1)

2. �Bt and �Bs for changes over any non-overlapping short intervals are in-

dependent

� The above two properties imply the following:

1. E(�Bt) = 0

2. Var(�Bt) = �

3. �(�Bt) =p�

16

� In general we have

BT �B0 =

NXi=1

eip�

� Due to the above, we have

E(BT �B0) = 0

Var(BT �B0) = T

�(BT �B0) =pT

17

� Therefore, for the Brownian motion, results for the mean and variance in small

intervals also apply to large intervals.

� When �! 0, we write the Brownian motion as:

dBt = ep�

where dBt should be interpreted as the change in the Brownian motion over an

arbitrarily small time interval.

18

� Sample paths of the Brownian motion are continuous everywhere but not smooth

anywhere.

� Derivative of the Brownian motion does not exist anywhere and the process is

in�nitely �kinky�.

� A Brownian motion is nonstationary as Var(BT ) drifts up with t. The transition

density is

BT+�jBT � N(BT ;�):

� A Brownian motion is also called a Wiener process. It has no drift. Now we

extend it to the generalized Wiener process.

� We can de�ne et = e1t + e2t; e1t; e2t � N(0; 1=2); yt+ 12= yt + e1t

� De�nition of the BM: W (t) = �B(t)

19

� Note:

1. B(0) = 0

2. B(t) � N(0; t) or B(t) is a Gaussian process

3. E(B(t+ h)�B(t)) = 0; E(B(t+ h)�B(t))2 = jhj

4. E(B(t)B(s)) = minft; sg

5. If t > s > r > q;B(t) � B(s) is indept of B(r) � B(q); ie, B(t) has indept

increment

6. Since B(t) is a Gaussian process, it is completely speci�ed by its covariance

matrix

7. If B(t) is a BM, then so is B(t+ a)�B(a); ��1B(�2t);�B(t)

8. The sample path of BM is a continuous function of time with prob 1

9. No point di¤erentiable

20

� Return to the unit root test. yt = y0 +Pt

j=1 ej: De�ne pt =Pt

j=1 ej = pt�1 + et

� Change the time index from T = f1; :::; Tg to t with 0 � t � 1

� De�ne YT (t) = 1�pTp[Tt]; if

j�1T� t � j

T; j = 1; :::; T: [Tt] = integer part of Tt

� For instance, YT (1) = 1�pTpT

� YT (t) =

8<:0 if 0 � t � 1

Tp1�pT

if 1T� t � 2

Tpj�1�pT

if j�1T� t � j

T

� Two important theorems

1. Functional CLT: YT (t)d! B(t)

2. Continuous mapping theorem: if f is a continuous function, f(YT (t))d!

f(B(t))

�PT

t=1 yt =PT

j=1 pj�1 +PT

j=1 ej if y0 = 0.

�R j=T(j�1)=T YT (t)dt =

pj�1�TpT=)

Ppj�1 = �T

pTPT

j=1

R j=T(j�1)=T YT (t)dt = �T

pTR 10YT (t)dt

�PT

t=1 yt = �TpTR 10YT (t)dt+

pTPejpT=)

PTt=1 yt

TpT= �

R 10YT (t)dt+

1T

PejpT

� By FCLT and CMT,PTt=1 yt

TpT

d! �R 10B(t)dt

21

�PT

t=1 y2t�1 = y20 + y

21 + :::+ y

2T�1 = y21 + :::+ y

2T�1 =

PTt=1 y

2t � y2T =

PTt=1 p

2t � p2T

� Recall p2j�1 = �2T 2R j=Tj�1=T Y

2T (t)dt =)

Pp2j�1 = �2T 2

R 10Y 2T (t)dt and p2T =

�2TY 2T (1)

� Therefore,PTt=1 y

2t�1

T 2d! �2

R 10B2(t)dt

� Under H0; T (b�� )d! �2

(1)�1

2R 10 B

2(t)dt

� Phillips (1987) proved the above result under very general conditions.

� Note:

1. This limiting distribution is non-standard

2. The numerator and denominator are not independent.

3. It is called the Dickey-Fuller distribution since Dickey and Fuller (1976) uses

Monte-Carlo simulation to �nd the critical values of�2(1)�1

2R 10 B

2(t)dtand tabulate

them.

22

2.2.2 Case 1 (no drift or trend in the regression):

� In case 1, we test�H0 : Xt = Xt�1 + etH1 : Xt = �Xt�1 + et with j�j < 1

� This is equivalent to�H0 : �Xt = etH1 : �Xt = �Xt�1 + et with � < 0

� We estimate the following model using the OLS method:

�Xt = �Xt�1 + et; et � iid(0; �2e)

23

� The OLS estimator of � is de�ned by b�:The OLS estimator of � is de�ned by b�:They are biased but consistent.

� The test statistic (known as the DF Z test or coe¢ cient statistic) is de�ned as

T b�� Under H0, the sampling distribution is not a normal distribution, but the DF

distribution. See Table B.5 for the critical values of this distribution for di¤erent

sample sizes.

� Alternatively, we can use t-statistic b�bse(b�) : Although it is called the Dickey-Fuller t-statistic, the sampling distribution is no longer a t distribution, but

�2(1)�1

2(R 10 B

2(t)dt)1=2 .

See Table B.6 for the critical values of this distribution for di¤erent sample sizes.

24

2.2.3 Case 2 (constant but no trend in the regression):

� We test�H0 : �Xt = etH1 : �Xt = �+ �Xt�1 + et with � < 0

� In case 2, we estimate the following model using the OLS method:

�Xt = �+ �Xt�1 + et; et � iidN(0; �2e)

� The DF Z test statistic is T b�: Under H0, the sampling distribution is not a

normal distribution, but1=2(�2

(1)�1)�B(1)

R 10 B(t)dtR 1

0 B2(t)dt�(

R 10 B(t)dt)

2 . See Table B.5 for the critical

values of this distribution for di¤erent sample sizes.

� Alternatively, we can use the DF t-statistic b�bse(b�) : The sampling distribution is nolonger a t distribution, but

1=2(�2(1)�1)�B(1)

R 10 B(t)dt�R 1

0 B2(t)dt�(

R 10 B(t)dt)

2�1=2 . See Table B.6 for the critical

values of this distribution for di¤erent sample sizes.

25

2.2.4 Case 3 (constant and trend in the regression):

� We test�H0 : �Xt = etH1 : �Xt = �+ �Xt�1 + �t+ et with � < 0

or�H0 : �Xt = �+ etH1 : �Xt = �+ �Xt�1 + �t+ et with � < 0

� In this case, we estimate the following model using the OLS method:

�Xt = �+ �Xt�1 + �t+ et; et � iidN(0; �2e)

� The Dickey-Fuller Z test statistic is T b�: Under H0, the sampling distribution is

not a normal distribution, but another new distribution (it is also di¤erent from

the ones in Case 1 and 2). See Table B.5 for the critical values of this distribution

for di¤erent sample sizes.

� Alternatively, we can use the Dickey-Fuller t-statistic b�bse(b�) : The sampling dis-tribution is no longer a t distribution, but another new distribution (it is also

di¤erent from ones in Case 1 and 2). See Table B.6 for the critical values of this

distribution for di¤erent sample sizes.

Test Statistic 1% 2.5% 5% 10%Case 1 -2.56 -2.34 -1.94 -1.62Case 2 -3.43 -3.12 -2.86 -2.57Case 3 -3.96 -3.66 -3.41 -3.13

When the sample size is 100, the critical values for the Dickey-Fuller t-statistic

are given in the table below

Test Statistic 1% 2.5% 5% 10%Case 1 -2.60 -2.24 -1.95 -1.61Case 2 -3.51 -3.17 -2.89 -2.58Case 3 -4.04 -3.73 -3.45 -3.15

26

2.2.5 Which case to use in practice?

� Use Case 3 test for a series with an obvious trend, such as GDP

� Use Case 2 test for a series without an obvious trend, such as interest rates. For

variables such as interest rate we should use Case 2 rather than Case 1 since if

the data were to be described by a stationary process, surely the process would

have a positive mean. However, if you strongly believe a series has a zero mean

when it is stationary, use Case 1.

27

2.3 Augmented DF (ADF) Test

� The model in the null hypothesis considered in the DF test is a class of highly

restrictive unit root models.

� Suppose the model in the null hypothesis is Xt = Xt�1 + ut; ut = �ut�1 + et.

Unless � = 0, the DF test is not applicable.

� The above model implies the following AR(2) model for Xt,

Xt = (1 + �)Xt�1 � �Xt�2 + et = Xt�1 + ��Xt�1 + et

� In general, if

Xt = �1Xt�1 + �2Xt�2 + :::+ �pXt�p + et; et � iid(0; �2e); (2.1)

then the DF test is not applicable.

� Model (2.1) can be represented by

Xt = �Xt�1 + �1�Xt�1 + :::+ �p�1�Xt�p+1 + et; et � iid(0; �2e);

where � =Pp

i=1 �i

� So if � = 1; fXtg is a unit root process; if j�j < 1; fXtg is a stationary process.

28

� In Case 2, the ADF test is based on the OLS estimator of � from the following

regression model,

�Xt = �+ �Xt�1 + �1�Xt�1 + :::+ �p�1�Xt�p+1 + et; et � iid(0; �2e)

� In Case 3, the ADF test is based on the OLS estimator of � from the following

regression model,

�Xt = �+ �t+ �Xt�1 + �1�Xt�1 + :::+ �p�1�Xt�p+1 + et; et � iid(0; �2e)

� Based on the OLS estimator of �; the ADF test statistic, b�pdV ar(b�) , can be used.It has the same sampling distribution as the DF t statistic.

29

2.4 Phillips-Perron (PP) Test

� Xt = Xt�1 + ut;�(L)ut = �(L)et; et � iidN(0; �2e); so ut is an ARMA process.

Phillips and Perron (1988) proposed a nonparametric method of controlling for

possible serial correlation in ut.

� There are two PP test statistics. One of them (so-called PP Z test) is the

analogue of T b�: The other (so-called PP t test) is the analogue ofb�pdV ar(b�) . PP

Z test has the same sampling distribution as T b� under all three cases in theprevious section. PP t test has the same sampling distribution as

b�pdV ar(b�) underall three cases in the previous section.

� In Case 2, both PP tests are based on the OLS estimator of � from the following

regression model,

�Xt = �+ �Xt�1 + "t; where "t � ARMA

� In this case, the PP Z test statistic is de�ned by T b� � 12(T 2b�2b� � s2T )(�

2 � 0);

where s2T = (T � 2)�1P(Xt � b�� b�Xt�1)

2; � = ��(1); 0 = E(u2t )

� In Case 3, both PP tests are based on the OLS estimator of � from the following

regression model,

�Xt = �+ �t+ �Xt�1 + "t; where "t � ARMA

30

2.5 Kwiatkowski, Phillips, Schmidt, and Shin (KPSS, 1992)Test

� Unlike the tests we have thus far introduced, KPSS test is a test for stationarity or

trend-stationarity, that is, it tests for the null of stationarity or trend stationarity

against the alternative of a unit root.

� It is assumed that one can decompose Xt into the sum of a deterministic trend,

a random walk and a stationary error

Xt = �t+ rt + ut (2.2)

rt = rt�1 + et (2.3)

where et � iidN(0; �2e). The initial value r0 is assumed to be �xed.

� If �2e = 0, Xt is trend-stationary. If �2e > 0, Xt is a non-stationary.

31

� KPSS test is a one-sided Lagrange Multiplier (LM) test

� The KPSS statistic is based on the residuals from the OLS regression of Xt on

the exogenous variables Zt = 1 or (1; t):

Xt = �0Zt + �t

� The LM statistic is be de�ned as: PTt=1 S(t)

2

T 2f0;

where f0 is an estimator of the long-run variance and S(t) is a cumulative residual

function:

S(t) =tXs=1

�s

� Critical values for the KPSS test statistic are:

Level of Signi�cance 10% 5% 2.5% 1%Crit value (case 2) 0.347 0.463 0.574 0.739Crit value (case 3) 0.119 0.146 0.176 0.216

32

� If the LM test statistic is larger than the critical value, we have to reject the null

of stationarity (or trend-stationarity).

� In general �t is not a white noise. As a result, the long run variance is di¤erent

from the short run variance. One way to estimate the long run variance is to

nonparametrically estimate the spectral density at frequency zero, that is, com-

pute a weighted sum of the autocovariances, with the weights being de�ned by a

kernel function. For example, a nonparametric estimate based on Bartlett kernel

is

f0 =T�1X

h=�(T�1)

(h)K(h=l);

where l is a bandwidth parameter (which acts as a truncation lag in the covariance

weighting), and (h) is the h-th sample autocovariance of the residuals �, de�ned

as,

(h) =TX

t=h+1

�t�t�h=T;

and K is the Barlett kernel function de�ned by

K(x) =

� 1� jxj if jxj � 1

0 otherwise

33

3 Revisit to Box-Jenkins

� If the null hypothesis cannot be rejected, di¤erence the data (ie, Xt�Xt�1 = ut)

to achieve stationarity.

� In the case where both a stochastic trend and a linear deterministic trend are

involved, the �rst di¤erence transformation leads to a stationary process.

� Test for a unit root in the di¤erenced data. If the null hypothesis cannot be

rejected, di¤erence the di¤erenced data until stationarity is achieved.

� ARMA(p; q)+one unit root =ARIMA(p; 1; q)

� ARMA(p; q)+d unit roots =ARIMA(p; d; q)

4 Limitations of Box-Jenkins

� The Data Generating Process has to be time-invariant. This assumption can be

too restrictive for an economy undergoing a period of transition and for data

covering a long sample period.

� What if the data is explosive, ie � > 1.

� What if a nonlinear deterministic trend is involved.

� What if the data follow a trend stationary process.

� Unit root tests and model selection are done in two separate steps.

34

5 Model Selection

5.1 Why use model selection criteria

� Select p; q in the ARMA model.

� Select lag order, the trend order and other regressors.

5.2 Conventional model selection criteria

� 1�R2

� Minimize 1� �R2

� Minimize AIC (Akaike information criterion, Akaike, 1969). AIC= �2lT+ 2k

T,

where k is the number of parameters, l is the log-likelihood value, and T is the

sample size

� Minimize BIC (Bayesian information criterion, Schwarz, 1978): BIC= �2lT+

kTlnT

35

5.3 Model selection criteria recently developed

� In AIC/BIC, the penalty depends only on the number of regressors while in PIC

it depends not only on the number of regressors but also on the type of regressors

� In the regression context with the normality assumption (ie, Y = X� + "),

PIC= �2lT+ln(jX0X=�2K j)

T, where �2K is a consistent estimate of the residual variance

of a reference model. Usually the reference model is chosen to be the biggest

plausible model.

36

Number of Parameters

Crit

eria

5 10 15

0.4

0.5

0.6

0.7

0.8

1Adj R SquareAICBICPIC

1Adj R SquareAICBICPIC

5.4 Which criterion to use in practice

� Comparison of these criteria

37

� BIC and PIC are consistent. This means in large samples they will select the

true model

� AIC in not consistent. In large samples it tends to choose a model with too many

lags

� PIC is applicable to ARIMA+Trend processes

� AIC and BIC are provided by most packages when a regression is run.

� A Gauss library COINT developed by Ouliaris and Phillips calculates PIC

� In the context of stationary ARMA models, Phillips and Ploberger (1994) found

that PIC is superior to BIC using simulated data.

� PIC is found in Schi¤ and Phillips (2000) to be superior to BIC in a class of

AR+Trend models for forecasting New Zealand GDP

38

6 Cointegration

� De�nition: two variables (Yt; Xt) is said to be cointegrated if each of them taken

individually is I(1) ( ie non-stationary with a unit root), while some linear com-

binations of them (say Yt � �Xt) is stationary (ie I(0)).

� If (Yt; Xt) is cointegrated, the regression model Yt = � + �Xt + et is called a

cointegration relation, where et is I(0).

� (1;��) is called the cointegrating vector. It is easy to see any multiple of it is

also a cointegrating vector. So normalisation is important.

� Cointegration is a long run equilibrium relationship, that is the manner in which

the two variables drift upward together.

39

� Since et is I(0), any deviation from the equilibrium will come back to the equi-

librium in the long run.

� Cointegration and common trend:

� cointegrated variables share common trends

� Suppose both Yt and Xt have a stochastic trend

�Yt = �Y t + "Y t; where �Y t is I(1) ie �Y t = �Y t�1 + !Y t and "Y t is stationary

Xt = �Xt+"Xt; where �Xt is I(1) ie �Xt = �Xt�1+!Xt and "Y t is stationary

�Now if Yt and Xt are cointegrated, then there must exist a parameter � such

that Yt � �Xt is stationary, ie, �Y t + "Y t � �(�Xt + "Xt) = �Y t � ��Xt +

"Y t � �"Xt is stationary. Since "Y t � �"Xt is stationary, for Yt � �Xt to be

stationary, �Y t � ��Xt must be 0. Therefore �Y t = ��Xt

�The equality thus necessarily requires that Yt and Xt must have a common

stochastic trend

40

� The concept of cointegration is originally due to Granger (1981) but popularised

by Engle and Granger (1987, Econometrica)

� Estimation of cointegration systems:

1. OLS estimator is not only consistent but also �super-consistent�. This

means the estimator converges to its limiting distribution at a faster rate

than the estimator in the stationary case.

2. Super-consistency is also true in the cointegration model involved the auto-

correlation.

41

� Test for cointegration (between Yt and Xt)

Basic Idea:

1. Test Yt and Xt for I(1)

2. Obtain the �tted residual bet from a regression of Yt on Xt (di¤erent cointe-

grating regression models lead to di¤erent cases)

3. Use the ADF or PP test to test for a unit root in bet. Since the test is basedon bet, not et, we cannot use the Dickey-Fuller tables as before. We must usemodi�ed critical values.

4. If H0 is rejected Yt and Xt are cointegrated. If H0 cannot be rejected, the

regression of Yt on Xt is spurious.

42

� Case 2 (constant and no trend in the regression):

�We estimate the following model (with the constant term but without trend)

using the OLS method

Yt = �+X 0t� + et

�The ADF t-statistic or the PP t-statistic can be used. The critical values

for the ADF t-statistic or the PP t-statistic are given in the table below.

� Case 3 (constant and trend in the regression):

�We estimate the following model (with the constant term and the trend)

using the OLS method:

Yt = �+X 0t� + �t+ et

�The ADF t-statistic or the PP t-statistic can be used. The critical values

for the ADF t-statistic or the PP t-statistic are given in the table below.

� Which case to use in practice?

�Use Case 3 test if either Yt or Xt or both involve a possible trend, such as

GDP

�Use Case 2 test if neither Yt nor Xt involve a possible trend, such as interest

rates and exchange rates.

k 2 3 4 5 6case 2 3 2 3 2 3 2 3 2 31% -3.90 -4.32 -4.29 -4.66 -4.64 -4.97 -4.96 -5.25 -5.25 -5.525% -3.34 -3.78 -3.74 -4.12 -4.10 -4.43 -4.42 -4.72 -4.71 -4.9810% -3.04 -3.50 -3.45 -3.84 -3.81 -4.15 -4.13 -4.43 -4.42 -4.70

43

� Error Correction Model (ECM):

1. Since the trends of cointegrated variables are linked, the dynamic paths of

such variables must bear some relation to the current deviation from the

equilibrium relationship. This connection between the change in a variable

and the deviation from equilibrium is examined via the error correction

representation.

2. The set of cointegrated variables is said to be in equilibrium if Yt��Xt =

0

3. Thus the deviation from the long run equilibrium is et = Yt � �� Xt and

this error must be stationary.

4. If the system is to return to equilibrium the movements of at least some of

the variables must respond to the magnitude of the disequilibrium.

5. ECM: �Yt = �0 + �(Yt�1 � �� Xt�1) + �Xt + "t; where (�; �) captures

the long-run equilibrium relationship. � captures the short-run dynamics

and can be interpreted as the speed of adjustment of the system towards

the long run equilibrium.

6. Since all the variables in the ECM are I(0), the classical inferences follow.

7. Estimation of ECM.

(a) NLS. The ECM is a non-linear model and can be estimated by the NLS

method, ie, minf�0;�;�;�; g "2t

(b) OLS. Rewrite the ECM as �Yt = �0+�Yt�1��Xt�1+ �Xt+"t,

where �� = ��; �� = ��: The new model is a linear model. However, �

and � cannot be directly estimated.

44

7 Spurious Regression

� Yt = a+ bZt + ut

� Classical approach requires that both Yt and Zt and hence ut are stationary

� Now consider Yt = Yt�1 + "1t, Zt = Zt�1 + "2t; where "jt � iid(0; �2j )

� Let Y0 = Z0 = 0; we have Yt =Pt

j=1 "1j, Zt =Pt

j=1 "2t

� If "1t and "2t are indept, the regression is a spurious regression.

� Why spurious?

� Using Monte Carlo simulations Granger and Newbold (1974) �nd that t statistic

rejects the null hypothesis b = 0 far more often than it should and tends to reject

it more and more frequently the larger is the sample size.

� Using asymptotic theory Phillips (1986) �nds that t p! +1, hence it will reject

the null hypothesis b = 0 all the time asymptotically. Also he shows R2p! 1

� Another type of spurious regression: Yt = a+ bt+ ut and Yt = Yt�1 + "t

� Note:

1. If Yt and Zt are stationary, the classical theory applies

2. If Yt and Zt are integrated with di¤erent orders, the regression is meaningless

3. If Yt and Zt are integrated with the same order, but ut is integrated with

the same order, we have a spurious regression

4. If Yt and Zt are integrated with the same order, but ut is integrated with a

lower order, we have a cointegration

45

Documents

Econ 623 Econometrics II Topic 5: Non-stationary …Econ 623 Econometrics II Topic 5: Non-stationary Time Series 1 Types of non-stationary time series models often found in economics