Econ 623 Econometrics II
Topic 5: Non-stationary Time Series
1 Types of non-stationary time series models often found in economics
• Deterministic trend (trend stationary):
  $X_t = f(t) + \varepsilon_t$,
  where $t$ is the time trend and $\varepsilon_t$ is a stationary error term (hence both the mean and the variance of $\varepsilon_t$ are constant). Usually we assume $\varepsilon_t \sim ARMA(p, q)$ with mean 0 and variance $\sigma^2$.
  - $f(t)$ is a deterministic function of time, hence the model is called the deterministic trend model.
  - Why non-stationary? $E(X_t) = f(t)$ (depends on $t$, hence not constant), while $Var(X_t) = \sigma^2$ (constant).
  - Since $\varepsilon_t$ is stationary, $X_t$ fluctuates around the deterministic trend $f(t)$. The amplitude of the fluctuations has no obvious tendency to increase or decrease, since $Var(X_t)$ is constant. (The level of $X_t$ could increase or decrease, however.)
  - $X_t - f(t)$ is stationary. For this reason a deterministic trend model is sometimes called a trend stationary model.
  - Assume $\varepsilon_t = \psi(L) e_t$, $e_t \sim iid(0, \sigma_e^2)$. If we label $e_t$ the innovation or the shock to the system, the innovation must have a transient, diminishing effect on $X$. Why?
[Figures: time series plots spread over several slides; one panel is titled "random walk".]
Example: $X_t = \alpha_0 + \alpha_1 t + \psi(L) e_t$, with $\psi(L) = 1 + \phi L + \phi^2 L^2 + \cdots$ and $|\phi| < 1$.
So $X_t = \alpha_0 + \alpha_1 t + \varepsilon_t$, $\varepsilon_t = \phi \varepsilon_{t-1} + e_t$.
$\varepsilon_t = \psi(L) e_t$ measures the deviation of the series from trend in period $t$. We wish to examine the effect of an innovation $e_t$ on $\varepsilon_t, \varepsilon_{t+1}, \ldots$:
$\varepsilon_t = e_t + \phi e_{t-1} + \phi^2 e_{t-2} + \cdots$,
which gives $\partial \varepsilon_{t+s} / \partial e_t = \phi^s$ (the impulse response), and $\phi^s \to 0$ as $s \to \infty$ because $|\phi| < 1$.
Therefore, when there is a shock to the economy, a deterministic trend model implies the shock has a transient effect. Sooner or later the system will be back on the deterministic trend.
  - If $f(t) = c_1 + c_2 t$, we have a linear trend model. It is widely used.
  - If $f(t) = A e^{rt}$, we have an exponential growth curve.
  - If $f(t) = c_1 + c_2 t + c_3 t^2$, we have a quadratic trend model.
  - If $f(t) = \frac{1}{k + a b^t}$, we have a logistic curve.
• Stochastic trend (unit root or difference stationarity):
  $X_t = \mu + X_{t-1} + \varepsilon_t$,
  where $\varepsilon_t$ is a stationary process (hence both the mean and the variance of $\varepsilon_t$ are constant). Usually we assume $\varepsilon_t \sim ARMA(p, q)$ with mean 0 and variance $\sigma^2$.
  - Another representation: $(1 - L) X_t = \mu + \varepsilon_t$.
  - Solving $1 - L = 0$ for $L$, we have $L = 1$, justifying the terminology "unit root".
  - Pure random walk: $X_t = X_{t-1} + e_t$, $e_t \overset{iid}{\sim} N(0, \sigma_e^2)$.
    1. $X_t = \sum_{j=0}^{t-1} e_{t-j}$ if we assume $X_0 = 0$ with probability 1.
    2. $E(X_t) = 0$, $Var(X_t) = t \sigma_e^2 \to \infty$.
    3. Non-stationarity: $X_t$ wanders around and can be anywhere.
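The variance result for the pure random walk can be checked by simulation. The following is a minimal sketch, not part of the original notes: the function names, the seed, the number of paths (2000) and $\sigma_e = 1$ are all illustrative assumptions.

```python
import random

def simulate_random_walk(n_steps, sigma_e, rng):
    """Return [X_1, ..., X_n] for X_t = X_{t-1} + e_t with X_0 = 0."""
    path, level = [], 0.0
    for _ in range(n_steps):
        level += rng.gauss(0.0, sigma_e)   # e_t ~ N(0, sigma_e^2)
        path.append(level)
    return path

def variance_across_paths(paths, t):
    """Sample variance of X_t (1-based t) across independent paths."""
    values = [p[t - 1] for p in paths]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

rng = random.Random(0)                      # fixed seed, chosen arbitrarily
paths = [simulate_random_walk(100, 1.0, rng) for _ in range(2000)]
var_10 = variance_across_paths(paths, 10)
var_100 = variance_across_paths(paths, 100)
print(f"Var(X_10)  is about {var_10:.1f} (theory: 10)")
print(f"Var(X_100) is about {var_100:.1f} (theory: 100)")
```

The cross-sectional variance should grow roughly linearly in $t$, in line with $Var(X_t) = t\sigma_e^2$.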
  - Random walk with a drift: $X_t = \mu + X_{t-1} + e_t$, $e_t \overset{iid}{\sim} N(0, \sigma_e^2)$.
    1. $X_t = t\mu + \sum_{j=0}^{t-1} e_{t-j}$ if we assume $X_0 = 0$ with probability 1.
    2. $E(X_t) = t\mu$, which diverges as $t \to \infty$ when $\mu \neq 0$; $Var(X_t) = t\sigma_e^2 \to \infty$.
    3. Non-stationarity.
    4. As $E(X_t) = t\mu$, a stochastic trend (or unit root) model could behave similarly to a model with a linear deterministic trend.
  - In a random walk model, $X_t = t\mu + \sum_{j=0}^{t-1} e_{t-j}$. If we label $e_{t-j}$ the innovation or the shock to the system, the innovation has a permanent effect on $X_t$ because
    $\partial X_t / \partial e_{t-j} = 1$ for all $j = 0, 1, \ldots, t-1$.
    This is true for all models with a unit root. Therefore, when there is a shock to the economy, a unit root model implies that the shock has a permanent effect: after every shock the system starts from a new level.
  - In a trend-stationary model
    $X_t = \alpha_0 + \alpha_1 t + \phi X_{t-1} + e_t$, $|\phi| < 1$,
    we have $\partial X_t / \partial e_{t-j} = \phi^j \to 0$ as $j \to \infty$.
    So the innovation has a transient effect on $X_t$.
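The contrast between a permanent and a transient effect can be made concrete by feeding the same unit shock through both models and differencing against a no-shock baseline. This is only a sketch: the helper names and the coefficient values ($\alpha_0 = 0.5$, $\alpha_1 = 0.1$, $\phi = 0.5$) are assumptions made for illustration. The deterministic terms cancel in the difference, so only $\phi$ matters.

```python
def simulate(phi, shocks, alpha0=0.5, alpha1=0.1):
    """X_t = alpha0 + alpha1*t + phi*X_{t-1} + e_t, starting from X_0 = 0."""
    path, prev = [], 0.0
    for t, e in enumerate(shocks, start=1):
        prev = alpha0 + alpha1 * t + phi * prev + e
        path.append(prev)
    return path

def impulse_response(phi, horizon):
    """Change in X_{1+s}, s = 0..horizon, caused by a unit shock e_1 = 1."""
    baseline = simulate(phi, [0.0] * (horizon + 1))
    shocked = simulate(phi, [1.0] + [0.0] * horizon)
    return [s - b for s, b in zip(shocked, baseline)]

transient = impulse_response(0.5, 6)   # |phi| < 1: trend-stationary case
permanent = impulse_response(1.0, 6)   # phi = 1: unit root case
print([round(v, 4) for v in transient])
print([round(v, 4) for v in permanent])
```

With $|\phi| < 1$ the response decays like $\phi^s$, while with $\phi = 1$ it stays at 1 at every horizon.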
  - A deterministic trend model and a stochastic trend (or unit root) model could behave similarly to each other. However, knowing whether non-stationarity in the data is due to a deterministic trend or a stochastic trend (unit root) is a very important question in economics. For example, macroeconomists are very interested in knowing whether economic recessions have permanent consequences for the level of future GDP, or instead represent temporary downturns with the lost output eventually made up during the recovery.
  - If $(1 - L) X_t = \varepsilon_t \sim ARMA(p, q)$, we say $X_t \sim ARIMA(p, 1, q)$, or $X_t$ is an $I(1)$ process.
  - If $(1 - L)^d X_t = \varepsilon_t \sim ARMA(p, q)$, we say $X_t \sim ARIMA(p, d, q)$, or $X_t$ is an $I(d)$ process.
• Why linear time trends and unit roots?
  - Why linear trends? Indeed, most GDP series, like many economic and financial series, seem to involve an exponential trend rather than a linear trend (apart from a stochastic trend). Suppose this is true; then we have
    $Y_t = \exp(\delta t)$.
    However, if we take the natural log of this exponential trend function, we have
    $\ln Y_t = \delta t$.
    Thus, it is common to take logs of the data before attempting to describe them. After taking logs, most economic and financial time series exhibit a linear trend. This is why researchers often apply the logarithmic transformation before doing analysis.
  - Why unit roots? After taking logs, suppose a time series follows a unit root rather than a linear trend. We have $(1 - L) \ln Y_t = \varepsilon_t$, where $\varepsilon_t$ is a stationary process, such as ARMA(p, q). However,
    $(1 - L) \ln Y_t = \ln(Y_t / Y_{t-1}) = \ln\left(1 + \frac{Y_t - Y_{t-1}}{Y_{t-1}}\right) \approx \frac{Y_t - Y_{t-1}}{Y_{t-1}}$.
    So the rate of growth of the series $Y_t$ is stationary. For example, if $Y_t$ represents the CPI and $\ln Y_t$ is an $I(1)$ process, then $(1 - L) \ln Y_t$ represents the inflation rate and is a stationary process.
• Other forms of non-stationarity: structural break in mean, structural break in variance.
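The approximation $(1 - L)\ln Y_t \approx (Y_t - Y_{t-1})/Y_{t-1}$ used above is easy to verify numerically. The sketch below uses a made-up price index; the numbers and function names are hypothetical and serve only to show how close the two quantities are for small growth rates.

```python
import math

def growth_rates(levels):
    """Simple growth rates (Y_t - Y_{t-1}) / Y_{t-1}."""
    return [(curr - prev) / prev for prev, curr in zip(levels, levels[1:])]

def log_differences(levels):
    """Log differences ln(Y_t) - ln(Y_{t-1})."""
    return [math.log(curr) - math.log(prev) for prev, curr in zip(levels, levels[1:])]

cpi = [100.0, 102.0, 104.5, 106.0, 109.2]   # hypothetical index levels
gaps = []
for g, d in zip(growth_rates(cpi), log_differences(cpi)):
    gaps.append(g - d)
    print(f"growth = {g:.5f}, log difference = {d:.5f}, gap = {g - d:.5f}")
```

Because $\ln(1 + x) \le x$, the log difference never exceeds the simple growth rate, and the gap is of order $x^2/2$.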
2 Unit Root Tests and A Stationarity Test
• The theory of the ARMA method relies on the assumption of stationarity.
• The assumption of stationarity is too strong for many macroeconomic time series. For example, many macroeconomic time series involve trends (both deterministic trends and stochastic trends, i.e. unit roots).
• Tests for a unit root. Various test statistics are often used in practice: the Augmented Dickey-Fuller (ADF) test and the Phillips and Perron (PP) test.
• Test for stationarity: the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test.
• Three possible alternative hypotheses for the unit root tests:
  1. no constant and no trend: $X_t = \rho X_{t-1} + e_t$, $|\rho| < 1$
  2. constant but no trend: $X_t = \alpha + \rho X_{t-1} + e_t$, $|\rho| < 1$
  3. constant and trend: $X_t = \alpha + \rho X_{t-1} + \delta t + e_t$, $|\rho| < 1$
• First consider the AR(1) process, $y_t = \rho y_{t-1} + e_t$, $e_t \sim iid\,N(0, \sigma^2)$.
• Stationarity requires $|\rho| < 1$.
• If $|\rho| = 1$, $y_t$ is nonstationary and has a unit root.
• Test $H_0: \rho = 1$ against $H_1: |\rho| < 1$.
• How to estimate $\rho$? OLS estimation, i.e., $\hat\rho = \frac{\sum y_t y_{t-1}}{\sum y_{t-1}^2} = \rho + \frac{\sum y_{t-1} e_t}{\sum y_{t-1}^2}$. Stock (1994) shows that the OLS estimator of $\rho$ is superconsistent.
• What is the distribution or asymptotic distribution of $\hat\rho - \rho$?
• Recall from the standard model, $y = X\beta + e$:
  - If $X$ is non-stochastic, $\hat\beta - \beta \sim N(0, \sigma^2 (X'X)^{-1})$.
  - If $X$ is stochastic and stationary with some restrictions on the autocovariances, $\sqrt{T}(\hat\beta - \beta) \xrightarrow{d} N(0, \sigma^2 Q^{-1})$, where $Q = plim\, X'X / T$.
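2.1 Classical Asymptotic Theory for Covariance Stationary Process
• Consider the AR(1) process, $y_t = \rho y_{t-1} + e_t$, $e_t \sim iid\,N(0, \sigma^2)$ with $|\rho| < 1$.
• For this model $E(y_t) = 0$, $Var(y_t) = \frac{\sigma^2}{1 - \rho^2} \equiv \sigma_y^2$, $\gamma_j = \rho^j Var(y_t)$.
• What is the asymptotic property of $\bar y = \frac{1}{T} \sum_{t=1}^{T} y_t$?
• We have $E(\bar y) = 0$ and
  $Var(\bar y) = E(\bar y^2) = \frac{Var(y_t)}{T} \left(1 + 2\frac{T-1}{T}\rho + 2\frac{T-2}{T}\rho^2 + \cdots + 2\frac{1}{T}\rho^{T-1}\right) \to 0$.
  So $\bar y \to 0$ in mean square, which implies $\bar y \xrightarrow{p} 0$.
• Also, $T\,Var(\bar y) \to \sum_{j=-\infty}^{\infty} \gamma_j$, which is called the long-run variance.
• The consistency is generally true for any covariance stationary process satisfying the condition $\sum_{j=0}^{\infty} |\gamma_j| < \infty$.
• Martingale Difference Sequence (MDS): $E(y_t | y_{t-1}, \ldots, y_1) = 0$ for all $t$.
• An MDS is serially uncorrelated but not necessarily independent.
• An MDS does not need to be stationary.
• Theorem (White, 1984): Let $\{y_t\}$ be an MDS with $\bar y_T = \frac{1}{T} \sum_{t=1}^{T} y_t$. Suppose that (a) $E(y_t^2) = \sigma_t^2 > 0$ with $\frac{1}{T} \sum_{t=1}^{T} \sigma_t^2 \to \sigma^2 > 0$, (b) $E|y_t|^r < \infty$ for some $r > 2$ and all $t$, and (c) $\frac{1}{T} \sum_{t=1}^{T} y_t^2 \xrightarrow{p} \sigma^2$. Then $\sqrt{T}\, \bar y_T \xrightarrow{d} N(0, \sigma^2)$.
• Theorem (Anderson, 1971): Let
  $y_t = \sum_{j=0}^{\infty} \psi_j \varepsilon_{t-j}$,
  where $\{\varepsilon_t\}$ is a sequence of iid variables with $E(\varepsilon_t^2) < \infty$ and $\sum_{j=0}^{\infty} |\psi_j| < \infty$. Let $\gamma_j$ be the $j$th order autocovariance. Then
  $\sqrt{T}\, \bar y_T \xrightarrow{d} N\left(0, \sum_{j=-\infty}^{\infty} \gamma_j\right)$.
• To know more about asymptotic theory for linear processes, read Phillips and Solo (1991).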
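For the AR(1) case the long-run variance has a closed form: summing $\gamma_j = \rho^{|j|}\sigma_y^2$ over all $j$ gives $\sigma_y^2 (1+\rho)/(1-\rho) = \sigma^2/(1-\rho)^2$. The sketch below evaluates the finite-$T$ expression for $T\,Var(\bar y)$ stated in this section and compares it with that limit. The function names and the values $\rho = 0.5$, $\sigma^2 = 1$ are illustrative assumptions.

```python
def scaled_variance_of_mean(rho, sigma2, T):
    """T * Var(ybar) for a stationary AR(1), from the finite-T formula."""
    gamma0 = sigma2 / (1.0 - rho ** 2)          # Var(y_t)
    bracket = 1.0
    for j in range(1, T):
        bracket += 2.0 * (T - j) / T * rho ** j
    return gamma0 * bracket

def long_run_variance(rho, sigma2):
    """Sum of gamma_j over all integers j for a stationary AR(1)."""
    return sigma2 / (1.0 - rho) ** 2

rho, sigma2 = 0.5, 1.0
lrv = long_run_variance(rho, sigma2)
for T in (10, 100, 1000):
    value = scaled_variance_of_mean(rho, sigma2, T)
    print(f"T = {T:5d}: T*Var(ybar) = {value:.4f}, long-run variance = {lrv:.4f}")
```

Note that the long-run variance (4 here) differs from $Var(y_t) = 4/3$, which is why the normalisation in the central limit theorem for dependent data uses $\sum_j \gamma_j$ rather than $\gamma_0$.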
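2.2 Dickey-Fuller Test
• Under $H_0$, however, we cannot claim $\sqrt{T}(\hat\rho - \rho) \xrightarrow{d} N(0, \sigma^2 Q^{-1})$ because $y_{t-1}$ is not stationary.
• If $y_t = \rho y_{t-1} + e_t$ and $H_0: \rho = 1$, we have $\hat\rho - 1 = \frac{\sum y_{t-1} e_t}{\sum y_{t-1}^2}$.
• Consider $y_t^2 = (y_{t-1} + e_t)^2 \Rightarrow y_{t-1} e_t = \frac{1}{2}(y_t^2 - y_{t-1}^2 - e_t^2) \Rightarrow \sum y_{t-1} e_t = \frac{1}{2}\left(y_T^2 - y_0^2 - \sum e_t^2\right)$.
• Let $y_0 = 0$; then $\sum y_{t-1} e_t = \frac{1}{2}\left(y_T^2 - \sum e_t^2\right)$.
• $e_t \sim iid(0, \sigma^2) \Rightarrow e_t^2$ is iid with $E(e_t^2) = \sigma^2$. By the LLN, $\frac{1}{T} \sum e_t^2 \xrightarrow{p} \sigma^2$.
• $y_T = y_0 + \sum_{t=1}^{T} e_t = \sum e_t \overset{a}{\sim} N(0, T\sigma^2) \Rightarrow \frac{y_T}{\sqrt{T}\sigma} \xrightarrow{d} N(0, 1) \Rightarrow \frac{y_T^2}{T\sigma^2} \xrightarrow{d} \chi^2(1)$.
• $\frac{1}{T\sigma^2} \sum y_{t-1} e_t = \frac{1}{2}\left(\frac{y_T^2}{T\sigma^2} - \frac{\sum e_t^2}{T\sigma^2}\right) \xrightarrow{d} \frac{1}{2}\left(\chi^2(1) - 1\right)$.
• What is the limit behavior of $\sum y_{t-1}^2$? We need to introduce the Brownian motion.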
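The numerator result can be explored numerically. The sketch below first checks the algebraic identity $\sum y_{t-1} e_t = \frac{1}{2}(y_T^2 - \sum e_t^2)$ on one simulated path, and then looks at the distribution of $\frac{1}{T\sigma^2}\sum y_{t-1} e_t$ across replications. Since $\frac{1}{2}(\chi^2(1) - 1)$ has mean 0 and is negative whenever $\chi^2(1) < 1$ (probability about 0.68), the simulated draws should reflect both features. The seed, $T = 200$ and 3000 replications are arbitrary choices, and the function name is hypothetical.

```python
import random

def numerator_terms(T, sigma, rng):
    """Simulate y_t = y_{t-1} + e_t (y_0 = 0); return (sum y_{t-1} e_t, y_T, sum e_t^2)."""
    y_prev, cross, sum_e2 = 0.0, 0.0, 0.0
    for _ in range(T):
        e = rng.gauss(0.0, sigma)
        cross += y_prev * e
        sum_e2 += e * e
        y_prev += e
    return cross, y_prev, sum_e2

rng = random.Random(1)
T, sigma = 200, 1.0

# 1) The identity holds path by path (up to floating-point rounding).
cross, y_T, sum_e2 = numerator_terms(T, sigma, rng)
identity_gap = cross - 0.5 * (y_T ** 2 - sum_e2)

# 2) Distribution of the scaled numerator across replications.
draws = [numerator_terms(T, sigma, rng)[0] / (T * sigma ** 2) for _ in range(3000)]
mean_draw = sum(draws) / len(draws)
share_negative = sum(d < 0 for d in draws) / len(draws)
print(f"identity gap = {identity_gap:.2e}")
print(f"mean = {mean_draw:.3f}, share of negative draws = {share_negative:.3f}")
```

The skew towards negative values is one reason why the distribution of $\hat\rho - 1$ under the null is not symmetric around zero.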
2.2.1 Brownian Motion
• Consider $y_t = y_{t-1} + e_t$, $y_0 = 0$, $e_t \sim iid\,N(0, 1) \Rightarrow y_t = \sum_{j=1}^{t} e_j \sim N(0, t)$.
• For all $t > s$, $y_t - y_s = \sum_{j=1}^{t} e_j - \sum_{j=1}^{s} e_j = e_{s+1} + \cdots + e_t \sim N(0, t - s)$.
• Also note that $y_t - y_s$ is independent of $y_r - y_q$ if $t > s > r > q$.
• What happens if we go from the discrete case to the continuous case? We have the standard Brownian motion (BM). It is denoted by $B(t)$.
• A stochastic process $B_t$ over time is a standard Brownian motion if, for a small time interval $\Delta$, the change in $B_t$, $\Delta B_t$ ($\equiv B_{t+\Delta} - B_t$), has the following properties:
  1. $\Delta B_t = e\sqrt{\Delta}$, where $e \sim N(0, 1)$.
  2. $\Delta B_t$ and $\Delta B_s$ for changes over any non-overlapping short intervals are independent.
• The above two properties imply the following:
  1. $E(\Delta B_t) = 0$
  2. $Var(\Delta B_t) = \Delta$
  3. $\sigma(\Delta B_t) = \sqrt{\Delta}$
• In general we have
  $B_T - B_0 = \sum_{i=1}^{N} e_i \sqrt{\Delta}$, where $N = T/\Delta$ is the number of intervals.
• Due to the above, we have
  $E(B_T - B_0) = 0$
  $Var(B_T - B_0) = T$
  $\sigma(B_T - B_0) = \sqrt{T}$
• Therefore, for the Brownian motion, results for the mean and variance in small intervals also apply to large intervals.
• When $\Delta \to 0$, we write the Brownian motion as
  $dB_t = e\sqrt{\Delta}$,
  where $dB_t$ should be interpreted as the change in the Brownian motion over an arbitrarily small time interval.
• Sample paths of the Brownian motion are continuous everywhere but not smooth anywhere.
• The derivative of the Brownian motion does not exist anywhere and the process is infinitely "kinky".
• A Brownian motion is nonstationary as $Var(B_T)$ drifts up with $T$. The transition density is
  $B_{T+\Delta} | B_T \sim N(B_T, \Delta)$.
• A Brownian motion is also called a Wiener process. It has no drift. Now we extend it to the generalized Wiener process.
• We can define $e_t = e_{1t} + e_{2t}$, with $e_{1t}, e_{2t} \sim N(0, 1/2)$ independent, and $y_{t+1/2} = y_t + e_{1t}$.
• Definition of the BM with variance $\sigma^2$: $W(t) = \sigma B(t)$.
• Note:
  1. $B(0) = 0$
  2. $B(t) \sim N(0, t)$, or $B(t)$ is a Gaussian process
  3. $E(B(t+h) - B(t)) = 0$, $E[(B(t+h) - B(t))^2] = |h|$
  4. $E(B(t)B(s)) = \min\{t, s\}$
  5. If $t > s > r > q$, $B(t) - B(s)$ is independent of $B(r) - B(q)$, i.e., $B(t)$ has independent increments
  6. Since $B(t)$ is a Gaussian process, it is completely specified by its mean and covariance function
  7. If $B(t)$ is a BM, then so are $B(t+a) - B(a)$, $\lambda^{-1} B(\lambda^2 t)$ and $-B(t)$
  8. The sample path of a BM is a continuous function of time with probability 1
  9. The sample path is nowhere differentiable
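• Return to the unit root test. $y_t = y_0 + \sum_{j=1}^{t} e_j$. Define $p_t = \sum_{j=1}^{t} e_j = p_{t-1} + e_t$.
• Change the time index from $\{1, \ldots, T\}$ to $t$ with $0 \le t \le 1$.
• Define $Y_T(t) = \frac{1}{\sigma\sqrt{T}}\, p_{[Tt]}$ if $\frac{j-1}{T} \le t < \frac{j}{T}$, $j = 1, \ldots, T$, where $[Tt]$ is the integer part of $Tt$.
• For instance, $Y_T(1) = \frac{1}{\sigma\sqrt{T}}\, p_T$.
• $Y_T(t) = \begin{cases} 0 & \text{if } 0 \le t < \frac{1}{T} \\ \frac{p_1}{\sigma\sqrt{T}} & \text{if } \frac{1}{T} \le t < \frac{2}{T} \\ \vdots & \\ \frac{p_{j-1}}{\sigma\sqrt{T}} & \text{if } \frac{j-1}{T} \le t < \frac{j}{T} \end{cases}$
• Two important theorems:
  1. Functional CLT: $Y_T(t) \xrightarrow{d} B(t)$
  2. Continuous mapping theorem: if $f$ is a continuous function, $f(Y_T(t)) \xrightarrow{d} f(B(t))$
• $\sum_{t=1}^{T} y_t = \sum_{j=1}^{T} p_{j-1} + \sum_{j=1}^{T} e_j$ if $y_0 = 0$.
• $\int_{(j-1)/T}^{j/T} Y_T(t)\,dt = \frac{p_{j-1}}{\sigma T \sqrt{T}} \Rightarrow \sum_{j=1}^{T} p_{j-1} = \sigma T\sqrt{T} \sum_{j=1}^{T} \int_{(j-1)/T}^{j/T} Y_T(t)\,dt = \sigma T\sqrt{T} \int_0^1 Y_T(t)\,dt$.
• $\sum_{t=1}^{T} y_t = \sigma T\sqrt{T} \int_0^1 Y_T(t)\,dt + \sqrt{T}\, \frac{\sum e_j}{\sqrt{T}} \Rightarrow \frac{\sum_{t=1}^{T} y_t}{T\sqrt{T}} = \sigma \int_0^1 Y_T(t)\,dt + \frac{1}{T} \frac{\sum e_j}{\sqrt{T}}$.
• By the FCLT and CMT, $\frac{\sum_{t=1}^{T} y_t}{T\sqrt{T}} \xrightarrow{d} \sigma \int_0^1 B(t)\,dt$.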
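The last limit can be illustrated by simulation. A standard fact, not stated in these notes, is that $\int_0^1 B(t)\,dt \sim N(0, 1/3)$, so $T^{-3/2}\sum y_t$ should have mean close to 0 and variance close to $\sigma^2/3$ for large $T$. The sketch below checks this under assumed settings ($T = 200$, $\sigma = 1$, 2000 replications, arbitrary seed); the function name is hypothetical.

```python
import random

def scaled_sum_of_random_walk(T, sigma, rng):
    """Return sum_{t=1}^T y_t / (T * sqrt(T)) for y_t = y_{t-1} + e_t, y_0 = 0."""
    y, total = 0.0, 0.0
    for _ in range(T):
        y += rng.gauss(0.0, sigma)
        total += y
    return total / (T * T ** 0.5)

rng = random.Random(42)
T, sigma, n_rep = 200, 1.0, 2000
draws = [scaled_sum_of_random_walk(T, sigma, rng) for _ in range(n_rep)]
mean_draw = sum(draws) / n_rep
var_draw = sum((d - mean_draw) ** 2 for d in draws) / n_rep
print(f"mean = {mean_draw:.3f} (limit 0), variance = {var_draw:.3f} (limit {sigma**2 / 3:.3f})")
```

Note the normalisation by $T^{3/2}$ rather than $T^{1/2}$: sums of an integrated series grow much faster than sums of a stationary one.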
• $\sum_{t=1}^{T} y_{t-1}^2 = y_0^2 + y_1^2 + \cdots + y_{T-1}^2 = y_1^2 + \cdots + y_{T-1}^2 = \sum_{t=1}^{T} y_t^2 - y_T^2 = \sum_{t=1}^{T} p_t^2 - p_T^2$ (using $y_0 = 0$).
• Recall $p_{j-1}^2 = \sigma^2 T^2 \int_{(j-1)/T}^{j/T} Y_T^2(t)\,dt \Rightarrow \sum_{j=1}^{T} p_{j-1}^2 = \sigma^2 T^2 \int_0^1 Y_T^2(t)\,dt$, and $p_T^2 = \sigma^2 T\, Y_T^2(1)$.
• Therefore, $\frac{\sum_{t=1}^{T} y_{t-1}^2}{T^2} \xrightarrow{d} \sigma^2 \int_0^1 B^2(t)\,dt$.
• Under $H_0$, $T(\hat\rho - \rho) \xrightarrow{d} \frac{\chi^2(1) - 1}{2 \int_0^1 B^2(t)\,dt}$.
• Phillips (1987) proved the above result under very general conditions.
• Note:
  1. This limiting distribution is non-standard.
  2. The numerator and denominator are not independent.
  3. It is called the Dickey-Fuller distribution since Dickey and Fuller (1976) used Monte Carlo simulation to find the critical values of $\frac{\chi^2(1) - 1}{2 \int_0^1 B^2(t)\,dt}$ and tabulated them.
2.2.2 Case 1 (no drift or trend in the regression):
• In case 1, we test
  $H_0: X_t = X_{t-1} + e_t$
  $H_1: X_t = \rho X_{t-1} + e_t$ with $|\rho| < 1$
• This is equivalent to
  $H_0: \Delta X_t = e_t$
  $H_1: \Delta X_t = \beta X_{t-1} + e_t$ with $\beta < 0$
• We estimate the following model using the OLS method:
  $\Delta X_t = \beta X_{t-1} + e_t$, $e_t \sim iid(0, \sigma_e^2)$
• The OLS estimator of $\rho$ is denoted by $\hat\rho$, and the OLS estimator of $\beta = \rho - 1$ by $\hat\beta$. They are biased but consistent.
• The test statistic (known as the DF Z test or coefficient statistic) is defined as $T\hat\beta$. Under $H_0$, the sampling distribution is not a normal distribution, but the DF distribution. See Table B.5 for the critical values of this distribution for different sample sizes.
• Alternatively, we can use the t-statistic $\hat\beta / \widehat{se}(\hat\beta)$. Although it is called the Dickey-Fuller t-statistic, its sampling distribution is no longer a t distribution, but
  $\frac{\chi^2(1) - 1}{2\left(\int_0^1 B^2(t)\,dt\right)^{1/2}}$.
  See Table B.6 for the critical values of this distribution for different sample sizes.
2.2.3 Case 2 (constant but no trend in the regression):
• We test
  $H_0: \Delta X_t = e_t$
  $H_1: \Delta X_t = \alpha + \beta X_{t-1} + e_t$ with $\beta < 0$
• In case 2, we estimate the following model using the OLS method:
  $\Delta X_t = \alpha + \beta X_{t-1} + e_t$, $e_t \sim iid\,N(0, \sigma_e^2)$
• The DF Z test statistic is $T\hat\beta$. Under $H_0$, the sampling distribution is not a normal distribution, but
  $\frac{\frac{1}{2}\left(\chi^2(1) - 1\right) - B(1) \int_0^1 B(t)\,dt}{\int_0^1 B^2(t)\,dt - \left(\int_0^1 B(t)\,dt\right)^2}$.
  See Table B.5 for the critical values of this distribution for different sample sizes.
• Alternatively, we can use the DF t-statistic $\hat\beta / \widehat{se}(\hat\beta)$. The sampling distribution is no longer a t distribution, but
  $\frac{\frac{1}{2}\left(\chi^2(1) - 1\right) - B(1) \int_0^1 B(t)\,dt}{\left[\int_0^1 B^2(t)\,dt - \left(\int_0^1 B(t)\,dt\right)^2\right]^{1/2}}$.
  See Table B.6 for the critical values of this distribution for different sample sizes.
2.2.4 Case 3 (constant and trend in the regression):
• We test
  $H_0: \Delta X_t = e_t$
  $H_1: \Delta X_t = \alpha + \beta X_{t-1} + \delta t + e_t$ with $\beta < 0$
  or
  $H_0: \Delta X_t = \alpha + e_t$
  $H_1: \Delta X_t = \alpha + \beta X_{t-1} + \delta t + e_t$ with $\beta < 0$
• In this case, we estimate the following model using the OLS method:
  $\Delta X_t = \alpha + \beta X_{t-1} + \delta t + e_t$, $e_t \sim iid\,N(0, \sigma_e^2)$
• The Dickey-Fuller Z test statistic is $T\hat\beta$. Under $H_0$, the sampling distribution is not a normal distribution, but another new distribution (it is also different from the ones in Cases 1 and 2). See Table B.5 for the critical values of this distribution for different sample sizes.
• Alternatively, we can use the Dickey-Fuller t-statistic $\hat\beta / \widehat{se}(\hat\beta)$. The sampling distribution is no longer a t distribution, but another new distribution (it is also different from the ones in Cases 1 and 2). See Table B.6 for the critical values of this distribution for different sample sizes.

Test Statistic    1%      2.5%    5%      10%
Case 1           -2.56   -2.34   -1.94   -1.62
Case 2           -3.43   -3.12   -2.86   -2.57
Case 3           -3.96   -3.66   -3.41   -3.13

When the sample size is 100, the critical values for the Dickey-Fuller t-statistic are given in the table below.

Test Statistic    1%      2.5%    5%      10%
Case 1           -2.60   -2.24   -1.95   -1.61
Case 2           -3.51   -3.17   -2.89   -2.58
Case 3           -4.04   -3.73   -3.45   -3.15
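Critical values like these come from simulation, and the Case 1 entry for a sample size of 100 can be roughly reproduced with a small Monte Carlo experiment. The sketch below is an illustration under assumed settings (4000 replications, arbitrary seed, hypothetical function names), not the procedure used to build the published tables. It simulates random walks, computes the Case 1 DF t-statistic from the regression of $\Delta X_t$ on $X_{t-1}$, and reads off the empirical 5% quantile.

```python
import random

def df_t_statistic(x):
    """Case 1 DF t-statistic: regress dX_t on X_{t-1} without a constant."""
    lagged = x[:-1]
    diffs = [b - a for a, b in zip(x[:-1], x[1:])]
    sxx = sum(v * v for v in lagged)
    beta_hat = sum(l * d for l, d in zip(lagged, diffs)) / sxx
    ssr = sum((d - beta_hat * l) ** 2 for l, d in zip(lagged, diffs))
    s2 = ssr / (len(diffs) - 1)                 # residual variance, one estimated parameter
    return beta_hat / (s2 / sxx) ** 0.5

def simulate_random_walk(T, rng):
    """Return [X_0, ..., X_T] with X_0 = 0 and X_t = X_{t-1} + e_t, e_t ~ N(0, 1)."""
    x = [0.0]
    for _ in range(T):
        x.append(x[-1] + rng.gauss(0.0, 1.0))
    return x

rng = random.Random(2024)
T, n_rep = 100, 4000
stats = sorted(df_t_statistic(simulate_random_walk(T, rng)) for _ in range(n_rep))
q05 = stats[int(0.05 * n_rep)]                  # empirical 5% quantile under H0
size_at_table_value = sum(s < -1.95 for s in stats) / n_rep
print(f"simulated 5% critical value: {q05:.2f} (table: -1.95)")
print(f"rejection rate using -1.95: {size_at_table_value:.3f}")
```

The simulated quantile sits well above the standard normal value of $-1.645$, which is why using normal or t tables for this statistic would over-reject the unit root null.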
2.2.5 Which case to use in practice?
• Use the Case 3 test for a series with an obvious trend, such as GDP.
• Use the Case 2 test for a series without an obvious trend, such as interest rates. For variables such as interest rates we should use Case 2 rather than Case 1 since, if the data were described by a stationary process, the process would surely have a positive mean. However, if you strongly believe a series has a zero mean when it is stationary, use Case 1.
2.3 Augmented DF (ADF) Test
• The model in the null hypothesis considered in the DF test belongs to a highly restrictive class of unit root models.
• Suppose the model in the null hypothesis is $X_t = X_{t-1} + u_t$, $u_t = \phi u_{t-1} + e_t$. Unless $\phi = 0$, the DF test is not applicable.
• The above model implies the following AR(2) model for $X_t$:
  $X_t = (1 + \phi) X_{t-1} - \phi X_{t-2} + e_t = X_{t-1} + \phi \Delta X_{t-1} + e_t$
• In general, if
  $X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \cdots + \phi_p X_{t-p} + e_t$, $e_t \sim iid(0, \sigma_e^2)$, (2.1)
  then the DF test is not applicable.
• Model (2.1) can be represented by
  $X_t = \rho X_{t-1} + \zeta_1 \Delta X_{t-1} + \cdots + \zeta_{p-1} \Delta X_{t-p+1} + e_t$, $e_t \sim iid(0, \sigma_e^2)$,
  where $\rho = \sum_{i=1}^{p} \phi_i$.
• So if $\rho = 1$, $\{X_t\}$ is a unit root process; if $|\rho| < 1$, $\{X_t\}$ is a stationary process.
• In Case 2, the ADF test is based on the OLS estimator of $\beta$ from the following regression model:
  $\Delta X_t = \alpha + \beta X_{t-1} + \zeta_1 \Delta X_{t-1} + \cdots + \zeta_{p-1} \Delta X_{t-p+1} + e_t$, $e_t \sim iid(0, \sigma_e^2)$
• In Case 3, the ADF test is based on the OLS estimator of $\beta$ from the following regression model:
  $\Delta X_t = \alpha + \delta t + \beta X_{t-1} + \zeta_1 \Delta X_{t-1} + \cdots + \zeta_{p-1} \Delta X_{t-p+1} + e_t$, $e_t \sim iid(0, \sigma_e^2)$
• Based on the OLS estimator of $\beta$, the ADF test statistic $\hat\beta / \sqrt{\widehat{Var}(\hat\beta)}$ can be used. It has the same sampling distribution as the DF t-statistic.
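The Case 2 ADF regression can be sketched in plain Python. This is an illustrative implementation under stated assumptions, not a reference one: `solve` and `adf_t_statistic` are hypothetical helper names, the number of lagged differences (`n_lags`, corresponding to $p - 1$) is fixed at 1, and the simulated series are made up. The value $-2.89$ is the 5% Case 2 entry from the table for a sample size of 100, used here only as a rough benchmark for a longer sample.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def adf_t_statistic(x, n_lags=1):
    """Case 2 ADF t-statistic: regress dX_t on (1, X_{t-1}, dX_{t-1}, ..., dX_{t-n_lags})."""
    dx = [b - a for a, b in zip(x[:-1], x[1:])]        # dx[i] = x[i+1] - x[i]
    rows, targets = [], []
    for i in range(n_lags, len(dx)):
        rows.append([1.0, x[i]] + [dx[i - k] for k in range(1, n_lags + 1)])
        targets.append(dx[i])
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * y for r, y in zip(rows, targets)) for a in range(k)]
    coef = solve(xtx, xty)                             # OLS coefficients
    resid = [y - sum(c * v for c, v in zip(coef, r)) for r, y in zip(rows, targets)]
    s2 = sum(e * e for e in resid) / (len(rows) - k)
    unit = [0.0] * k
    unit[1] = 1.0
    var_beta = s2 * solve(xtx, unit)[1]                # s^2 times the (beta, beta) entry of (X'X)^{-1}
    return coef[1] / var_beta ** 0.5

rng = random.Random(7)
stationary, walk = [0.0], [0.0]
for _ in range(500):
    stationary.append(1.0 + 0.5 * stationary[-1] + rng.gauss(0.0, 1.0))  # AR(1), rho = 0.5
    walk.append(walk[-1] + rng.gauss(0.0, 1.0))                          # unit root
t_stationary = adf_t_statistic(stationary)
t_walk = adf_t_statistic(walk)
print(f"stationary AR(1): t = {t_stationary:.2f}; random walk: t = {t_walk:.2f}; 5% benchmark = -2.89")
```

For the strongly mean-reverting series the statistic should fall far below the benchmark, so the unit root null is rejected; for the random walk it typically does not.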
2.4 Phillips-Perron (PP) Test
• $X_t = X_{t-1} + u_t$, $\phi(L) u_t = \theta(L) e_t$, $e_t \sim iid\,N(0, \sigma_e^2)$, so $u_t$ is an ARMA process. Phillips and Perron (1988) proposed a nonparametric method of controlling for possible serial correlation in $u_t$.
• There are two PP test statistics. One of them (the so-called PP Z test) is the analogue of $T\hat\beta$. The other (the so-called PP t test) is the analogue of $\hat\beta / \sqrt{\widehat{Var}(\hat\beta)}$. The PP Z test has the same sampling distribution as $T\hat\beta$ under all three cases in the previous section. The PP t test has the same sampling distribution as $\hat\beta / \sqrt{\widehat{Var}(\hat\beta)}$ under all three cases in the previous section.
• In Case 2, both PP tests are based on the OLS estimator of $\beta$ from the following regression model:
  $\Delta X_t = \alpha + \beta X_{t-1} + \varepsilon_t$, where $\varepsilon_t \sim ARMA$
• In this case, the PP Z test statistic is defined by
  $T\hat\beta - \frac{1}{2}\left(\frac{T^2 \hat\sigma_{\hat\beta}^2}{s_T^2}\right)(\lambda^2 - \gamma_0)$,
  where $s_T^2 = (T - 2)^{-1} \sum (X_t - \hat\alpha - \hat\rho X_{t-1})^2$ with $\hat\rho = 1 + \hat\beta$, $\lambda = \sigma\psi(1)$, and $\gamma_0 = E(u_t^2)$.
• In Case 3, both PP tests are based on the OLS estimator of $\beta$ from the following regression model:
  $\Delta X_t = \alpha + \delta t + \beta X_{t-1} + \varepsilon_t$, where $\varepsilon_t \sim ARMA$
2.5 Kwiatkowski, Phillips, Schmidt, and Shin (KPSS, 1992) Test
• Unlike the tests introduced so far, the KPSS test is a test for stationarity or trend-stationarity; that is, it tests the null of stationarity or trend stationarity against the alternative of a unit root.
• It is assumed that one can decompose $X_t$ into the sum of a deterministic trend, a random walk and a stationary error:
  $X_t = \delta t + r_t + u_t$ (2.2)
  $r_t = r_{t-1} + e_t$ (2.3)
  where $e_t \sim iid\,N(0, \sigma_e^2)$. The initial value $r_0$ is assumed to be fixed.
• If $\sigma_e^2 = 0$, $X_t$ is trend-stationary. If $\sigma_e^2 > 0$, $X_t$ is non-stationary.
• The KPSS test is a one-sided Lagrange Multiplier (LM) test.
• The KPSS statistic is based on the residuals from the OLS regression of $X_t$ on the exogenous variables $Z_t = 1$ or $Z_t = (1, t)'$:
  $X_t = \beta' Z_t + v_t$
• The LM statistic is defined as
  $\frac{\sum_{t=1}^{T} S(t)^2}{T^2 f_0}$,
  where $f_0$ is an estimator of the long-run variance and $S(t)$ is a cumulative residual function:
  $S(t) = \sum_{s=1}^{t} \hat v_s$
• Critical values for the KPSS test statistic are:

Level of Significance    10%     5%      2.5%    1%
Crit value (case 2)      0.347   0.463   0.574   0.739
Crit value (case 3)      0.119   0.146   0.176   0.216

• If the LM test statistic is larger than the critical value, we reject the null of stationarity (or trend-stationarity).
• In general $v_t$ is not white noise. As a result, the long-run variance is different from the short-run variance. One way to estimate the long-run variance is to nonparametrically estimate the spectral density at frequency zero, that is, to compute a weighted sum of the autocovariances, with the weights defined by a kernel function. For example, a nonparametric estimate based on the Bartlett kernel is
  $f_0 = \sum_{h=-(T-1)}^{T-1} \hat\gamma(h) K(h/l)$,
  where $l$ is a bandwidth parameter (which acts as a truncation lag in the covariance weighting), and $\hat\gamma(h)$ is the $h$-th sample autocovariance of the residuals $\hat v$, defined as
  $\hat\gamma(h) = \sum_{t=h+1}^{T} \hat v_t \hat v_{t-h} / T$ (with $\hat\gamma(-h) = \hat\gamma(h)$),
  and $K$ is the Bartlett kernel function defined by
  $K(x) = \begin{cases} 1 - |x| & \text{if } |x| \le 1 \\ 0 & \text{otherwise} \end{cases}$
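The pieces of the KPSS statistic fit together as in the sketch below, which covers the level-stationarity case ($Z_t = 1$, so the residuals are deviations from the sample mean) with the Bartlett kernel. The function name, the bandwidth $l = 4$, the seed and the two simulated series are assumptions chosen for illustration. The value 0.739 is the 1% case 2 critical value from the table above.

```python
import random

def kpss_level_statistic(x, bandwidth):
    """KPSS LM statistic for level stationarity (Z_t = 1) with a Bartlett kernel."""
    T = len(x)
    mean = sum(x) / T
    resid = [v - mean for v in x]               # OLS residuals from a regression on a constant
    partial, sum_s2 = 0.0, 0.0
    for r in resid:
        partial += r                            # S(t): cumulative sum of residuals
        sum_s2 += partial * partial

    def gamma(h):
        """h-th sample autocovariance of the residuals (divisor T)."""
        return sum(resid[t] * resid[t - h] for t in range(h, T)) / T

    f0 = gamma(0)
    for h in range(1, T):
        weight = 1.0 - h / bandwidth            # Bartlett kernel K(h / l)
        if weight <= 0.0:
            break
        f0 += 2.0 * weight * gamma(h)           # uses gamma(-h) = gamma(h)
    return sum_s2 / (T * T * f0)

rng = random.Random(5)
noise = [rng.gauss(0.0, 1.0) for _ in range(300)]     # stationary around a constant
walk, level = [], 0.0
for _ in range(500):
    level += rng.gauss(0.0, 1.0)                      # unit root series
    walk.append(level)
stat_noise = kpss_level_statistic(noise, bandwidth=4)
stat_walk = kpss_level_statistic(walk, bandwidth=4)
print(f"white noise: {stat_noise:.3f}; random walk: {stat_walk:.3f}; 1% critical value: 0.739")
```

For a unit root series the cumulative residuals $S(t)$ grow much faster than $T$, so the statistic becomes large and the null of stationarity is rejected. The bandwidth choice matters in practice: a larger $l$ raises $f_0$ for persistent series and lowers the statistic.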
3 Revisit to Box-Jenkins
• If the null hypothesis of a unit root cannot be rejected, difference the data (i.e., $X_t - X_{t-1} = u_t$) to achieve stationarity.
• In the case where both a stochastic trend and a linear deterministic trend are involved, the first difference transformation leads to a stationary process.
• Test for a unit root in the differenced data. If the null hypothesis cannot be rejected, difference the differenced data again, until stationarity is achieved.
• ARMA(p, q) + one unit root = ARIMA(p, 1, q)
• ARMA(p, q) + d unit roots = ARIMA(p, d, q)
4 Limitations of Box-Jenkins
• The data generating process has to be time-invariant. This assumption can be too restrictive for an economy undergoing a period of transition and for data covering a long sample period.
• What if the data are explosive, i.e. $\rho > 1$?
• What if a nonlinear deterministic trend is involved?
• What if the data follow a trend stationary process?
• Unit root tests and model selection are done in two separate steps.
5 Model Selection
5.1 Why use model selection criteria
• Select $p, q$ in the ARMA model.
• Select the lag order, the trend order and other regressors.
5.2 Conventional model selection criteria
• Minimize $1 - R^2$
• Minimize $1 - \bar R^2$
• Minimize AIC (Akaike information criterion, Akaike, 1969): $AIC = -\frac{2l}{T} + \frac{2k}{T}$, where $k$ is the number of parameters, $l$ is the log-likelihood value, and $T$ is the sample size.
• Minimize BIC (Bayesian information criterion, Schwarz, 1978): $BIC = -\frac{2l}{T} + \frac{k}{T} \ln T$
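The two formulas differ only in the penalty term, $2k/T$ versus $k \ln T / T$, so BIC penalises extra parameters more heavily whenever $\ln T > 2$. The sketch below applies both to two made-up models; the log-likelihood values, parameter counts and sample size are hypothetical numbers chosen so that the two criteria disagree.

```python
import math

def aic(loglik, k, T):
    """AIC = -2 l / T + 2 k / T, in the per-observation form used in these notes."""
    return -2.0 * loglik / T + 2.0 * k / T

def bic(loglik, k, T):
    """BIC = -2 l / T + k ln(T) / T."""
    return -2.0 * loglik / T + k * math.log(T) / T

T = 100
small = {"loglik": -120.0, "k": 3}   # hypothetical small model
large = {"loglik": -117.5, "k": 5}   # hypothetical larger model with a better fit

for name, model in (("small", small), ("large", large)):
    print(f"{name}: AIC = {aic(model['loglik'], model['k'], T):.4f}, "
          f"BIC = {bic(model['loglik'], model['k'], T):.4f}")
```

In this constructed example AIC prefers the larger model while BIC prefers the smaller one, which illustrates the tendency of AIC to select more parameters.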
5.3 Model selection criteria recently developed
• In AIC/BIC, the penalty depends only on the number of regressors, while in PIC it depends not only on the number of regressors but also on the type of regressors.
• In the regression context with the normality assumption (i.e., $Y = X\beta + \varepsilon$), $PIC = -\frac{2l}{T} + \frac{\ln\left(|X'X / \hat\sigma_K^2|\right)}{T}$, where $\hat\sigma_K^2$ is a consistent estimate of the residual variance of a reference model. Usually the reference model is chosen to be the largest plausible model.
[Figure: criteria plotted against the number of parameters; legend: 1 - Adj R Square, AIC, BIC, PIC.]
5.4 Which criterion to use in practice
• Comparison of these criteria:
• BIC and PIC are consistent. This means that in large samples they will select the true model.
• AIC is not consistent. In large samples it tends to choose a model with too many lags.
• PIC is applicable to ARIMA+Trend processes.
• AIC and BIC are provided by most packages when a regression is run.
• A Gauss library, COINT, developed by Ouliaris and Phillips, calculates PIC.
• In the context of stationary ARMA models, Phillips and Ploberger (1994) found that PIC is superior to BIC using simulated data.
• PIC is found in Schiff and Phillips (2000) to be superior to BIC in a class of AR+Trend models for forecasting New Zealand GDP.
6 Cointegration
• Definition: two variables $(Y_t, X_t)$ are said to be cointegrated if each of them taken individually is $I(1)$ (i.e. non-stationary with a unit root), while some linear combination of them (say $Y_t - \beta X_t$) is stationary (i.e. $I(0)$).
• If $(Y_t, X_t)$ are cointegrated, the regression model $Y_t = \alpha + \beta X_t + e_t$ is called a cointegration relation, where $e_t$ is $I(0)$.
• $(1, -\beta)$ is called the cointegrating vector. It is easy to see that any multiple of it is also a cointegrating vector, so normalisation is important.
• Cointegration is a long-run equilibrium relationship, that is, the manner in which the two variables drift upward together.
• Since $e_t$ is $I(0)$, any deviation from the equilibrium will come back to the equilibrium in the long run.
• Cointegration and common trend:
  - Cointegrated variables share common trends.
  - Suppose both $Y_t$ and $X_t$ have a stochastic trend:
    $Y_t = \mu_{Yt} + \varepsilon_{Yt}$, where $\mu_{Yt}$ is $I(1)$, i.e. $\mu_{Yt} = \mu_{Y,t-1} + \omega_{Yt}$, and $\varepsilon_{Yt}$ is stationary;
    $X_t = \mu_{Xt} + \varepsilon_{Xt}$, where $\mu_{Xt}$ is $I(1)$, i.e. $\mu_{Xt} = \mu_{X,t-1} + \omega_{Xt}$, and $\varepsilon_{Xt}$ is stationary.
  - Now if $Y_t$ and $X_t$ are cointegrated, then there must exist a parameter $\beta$ such that $Y_t - \beta X_t$ is stationary, i.e., $\mu_{Yt} + \varepsilon_{Yt} - \beta(\mu_{Xt} + \varepsilon_{Xt}) = \mu_{Yt} - \beta\mu_{Xt} + \varepsilon_{Yt} - \beta\varepsilon_{Xt}$ is stationary. Since $\varepsilon_{Yt} - \beta\varepsilon_{Xt}$ is stationary, for $Y_t - \beta X_t$ to be stationary, $\mu_{Yt} - \beta\mu_{Xt}$ must be 0. Therefore $\mu_{Yt} = \beta\mu_{Xt}$.
  - The equality thus necessarily requires that $Y_t$ and $X_t$ have a common stochastic trend.
• The concept of cointegration is originally due to Granger (1981) but was popularised by Engle and Granger (1987, Econometrica).
• Estimation of cointegration systems:
  1. The OLS estimator is not only consistent but also "super-consistent". This means the estimator converges at a faster rate than the estimator in the stationary case.
  2. Super-consistency also holds when the cointegration model involves autocorrelation.
• Test for cointegration (between $Y_t$ and $X_t$)
  Basic idea:
  1. Test $Y_t$ and $X_t$ for $I(1)$.
  2. Obtain the fitted residual $\hat e_t$ from a regression of $Y_t$ on $X_t$ (different cointegrating regression models lead to different cases).
  3. Use the ADF or PP test to test for a unit root in $\hat e_t$. Since the test is based on $\hat e_t$, not $e_t$, we cannot use the Dickey-Fuller tables as before. We must use modified critical values.
  4. If $H_0$ is rejected, $Y_t$ and $X_t$ are cointegrated. If $H_0$ cannot be rejected, the regression of $Y_t$ on $X_t$ is spurious.
• Case 2 (constant and no trend in the regression):
  - We estimate the following model (with the constant term but without trend) using the OLS method:
    $Y_t = \alpha + X_t'\beta + e_t$
  - The ADF t-statistic or the PP t-statistic can be used. The critical values for the ADF t-statistic or the PP t-statistic are given in the table below.
• Case 3 (constant and trend in the regression):
  - We estimate the following model (with the constant term and the trend) using the OLS method:
    $Y_t = \alpha + X_t'\beta + \delta t + e_t$
  - The ADF t-statistic or the PP t-statistic can be used. The critical values for the ADF t-statistic or the PP t-statistic are given in the table below.
• Which case to use in practice?
  - Use the Case 3 test if either $Y_t$ or $X_t$ or both involve a possible trend, such as GDP.
  - Use the Case 2 test if neither $Y_t$ nor $X_t$ involves a possible trend, such as interest rates and exchange rates.

k           2               3               4               5               6
case     2       3       2       3       2       3       2       3       2       3
1%     -3.90   -4.32   -4.29   -4.66   -4.64   -4.97   -4.96   -5.25   -5.25   -5.52
5%     -3.34   -3.78   -3.74   -4.12   -4.10   -4.43   -4.42   -4.72   -4.71   -4.98
10%    -3.04   -3.50   -3.45   -3.84   -3.81   -4.15   -4.13   -4.43   -4.42   -4.70
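The residual-based test can be sketched end to end on simulated data. Everything in the example is an assumption made for illustration: the data generating process ($X_t$ a pure random walk, $Y_t = 2 + 0.5 X_t$ plus white noise), the seed, the helper names, and the use of a DF regression with no lagged differences in the second step. The benchmark $-3.34$ is the 5% entry for $k = 2$, case 2, in the table above, on the assumption that $k$ counts the variables in the system.

```python
import random

def ols_with_constant(y, x):
    """OLS of y on (1, x); returns (alpha_hat, beta_hat)."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    beta = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    return mean_y - beta * mean_x, beta

def df_t_statistic(e):
    """t-statistic on b in the regression  d e_t = b e_{t-1} + error  (no constant)."""
    lagged = e[:-1]
    diffs = [b - a for a, b in zip(e[:-1], e[1:])]
    sxx = sum(v * v for v in lagged)
    b_hat = sum(l * d for l, d in zip(lagged, diffs)) / sxx
    ssr = sum((d - b_hat * l) ** 2 for l, d in zip(lagged, diffs))
    s2 = ssr / (len(diffs) - 1)
    return b_hat / (s2 / sxx) ** 0.5

rng = random.Random(3)
T, alpha, beta = 300, 2.0, 0.5              # hypothetical design values
x, level = [], 0.0
for _ in range(T):
    level += rng.gauss(0.0, 1.0)            # X_t is a pure random walk, hence I(1)
    x.append(level)
y = [alpha + beta * v + rng.gauss(0.0, 1.0) for v in x]   # stationary equilibrium error

# Step 2: cointegrating regression and fitted residuals.
alpha_hat, beta_hat = ols_with_constant(y, x)
residuals = [b - alpha_hat - beta_hat * a for a, b in zip(x, y)]

# Step 3: unit root test on the residuals, compared with the modified critical value.
t_stat = df_t_statistic(residuals)
CRITICAL_5PCT = -3.34                       # table entry: k = 2, case 2, 5%
print(f"beta_hat = {beta_hat:.3f}, residual DF t = {t_stat:.2f}, reject H0: {t_stat < CRITICAL_5PCT}")
```

Because the equilibrium error here is white noise, the residual test rejects the unit root null decisively, and the slope estimate lands close to the true value, consistent with super-consistency.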
• Error Correction Model (ECM):
  1. Since the trends of cointegrated variables are linked, the dynamic paths of such variables must bear some relation to the current deviation from the equilibrium relationship. This connection between the change in a variable and the deviation from equilibrium is examined via the error correction representation.
  2. The set of cointegrated variables is said to be in equilibrium if $Y_t - \alpha - \beta X_t = 0$.
  3. Thus the deviation from the long-run equilibrium is $e_t = Y_t - \alpha - \beta X_t$, and this error must be stationary.
  4. If the system is to return to equilibrium, the movements of at least some of the variables must respond to the magnitude of the disequilibrium.
  5. ECM: $\Delta Y_t = \gamma_0 + \delta(Y_{t-1} - \alpha - \beta X_{t-1}) + \gamma \Delta X_t + \varepsilon_t$, where $(\alpha, \beta)$ captures the long-run equilibrium relationship. $\delta$ captures the short-run dynamics and can be interpreted as the speed of adjustment of the system towards the long-run equilibrium.
  6. Since all the variables in the ECM are $I(0)$, the classical inferences follow.
  7. Estimation of ECM.
     (a) NLS. The ECM is a non-linear model and can be estimated by the NLS method, i.e., $\min_{\{\gamma_0, \delta, \alpha, \beta, \gamma\}} \sum \varepsilon_t^2$.
     (b) OLS. Rewrite the ECM as $\Delta Y_t = \gamma_0 + \delta Y_{t-1} - \alpha^* - \beta^* X_{t-1} + \gamma \Delta X_t + \varepsilon_t$, where $\alpha^* = \delta\alpha$ and $\beta^* = \delta\beta$. The new model is a linear model. However, $\alpha$ and $\beta$ cannot be directly estimated.
7 Spurious Regression
• $Y_t = a + bZ_t + u_t$
• The classical approach requires that both $Y_t$ and $Z_t$, and hence $u_t$, are stationary.
• Now consider $Y_t = Y_{t-1} + \varepsilon_{1t}$, $Z_t = Z_{t-1} + \varepsilon_{2t}$, where $\varepsilon_{jt} \sim iid(0, \sigma_j^2)$.
• Let $Y_0 = Z_0 = 0$; we have $Y_t = \sum_{j=1}^{t} \varepsilon_{1j}$, $Z_t = \sum_{j=1}^{t} \varepsilon_{2j}$.
• If $\varepsilon_{1t}$ and $\varepsilon_{2t}$ are independent, the regression is a spurious regression.
• Why spurious?
• Using Monte Carlo simulations, Granger and Newbold (1974) find that the t statistic rejects the null hypothesis $b = 0$ far more often than it should, and tends to reject it more and more frequently the larger the sample size.
• Using asymptotic theory, Phillips (1986) finds that $|t| \xrightarrow{p} +\infty$, hence the test will reject the null hypothesis $b = 0$ all the time asymptotically. He also shows $R^2 \xrightarrow{p} 1$.
• Another type of spurious regression: $Y_t = a + bt + u_t$ and $Y_t = Y_{t-1} + \varepsilon_t$.
• Note:
  1. If $Y_t$ and $Z_t$ are stationary, the classical theory applies.
  2. If $Y_t$ and $Z_t$ are integrated of different orders, the regression is meaningless.
  3. If $Y_t$ and $Z_t$ are integrated of the same order, and $u_t$ is integrated of the same order, we have a spurious regression.
  4. If $Y_t$ and $Z_t$ are integrated of the same order, but $u_t$ is integrated of a lower order, we have cointegration.
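A small Monte Carlo experiment in the spirit of Granger and Newbold (1974) shows the problem directly. The sketch below regresses one random walk on another, independent, random walk and records how often the conventional rule $|t| > 1.96$ rejects $b = 0$. The sample sizes (50 and 200), the 1000 replications, the seed and the function names are illustrative assumptions and are not taken from the original study.

```python
import random

def slope_t_statistic(y, z):
    """Conventional OLS t-statistic for b in  y_t = a + b z_t + u_t."""
    n = len(y)
    mean_z, mean_y = sum(z) / n, sum(y) / n
    szz = sum((v - mean_z) ** 2 for v in z)
    b = sum((v - mean_z) * (w - mean_y) for v, w in zip(z, y)) / szz
    a = mean_y - b * mean_z
    ssr = sum((w - a - b * v) ** 2 for v, w in zip(z, y))
    s2 = ssr / (n - 2)
    return b / (s2 / szz) ** 0.5

def random_walk(T, rng):
    """Return [X_1, ..., X_T] for a driftless Gaussian random walk started at 0."""
    path, level = [], 0.0
    for _ in range(T):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path

def rejection_rate(T, n_rep, rng, critical=1.96):
    """Share of replications in which |t| exceeds the nominal 5% critical value."""
    rejections = 0
    for _ in range(n_rep):
        y, z = random_walk(T, rng), random_walk(T, rng)   # independent by construction
        if abs(slope_t_statistic(y, z)) > critical:
            rejections += 1
    return rejections / n_rep

rng = random.Random(11)
rate_50 = rejection_rate(50, 1000, rng)
rate_200 = rejection_rate(200, 1000, rng)
print(f"T = 50:  rejection rate = {rate_50:.2f} (nominal 0.05)")
print(f"T = 200: rejection rate = {rate_200:.2f} (nominal 0.05)")
```

Although the two series are unrelated, the nominal 5% test rejects in the majority of replications, and more often in the longer sample, which matches the pattern described above.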