

Vector autoregressions

Based on the book 'New Introduction to Multiple Time Series Analysis' by Helmut Lütkepohl

Robert M. Kunst ([email protected])

University of Vienna

and

Institute for Advanced Studies Vienna

November 3, 2011



Outline

Introduction

Stable VAR Processes
  - Basic assumptions and properties
  - Forecasting
  - Structural VAR analysis


Objectives of analyzing multiple time series

Main objectives of time series analysis may be:

1. Forecasting: prediction of the unknown future by looking at the known past:

$$\hat{y}_{T+h} = f(y_T, y_{T-1}, \ldots)$$

denotes the h-step prediction for the variable y;

2. Quantifying the dynamic response to an unexpected shock to a variable by the same variable h periods later and also by other related variables: impulse-response analysis;

3. Control: how to set a variable in order to achieve a given time path in another variable; description of system dynamics without further purpose.


Some basics: stochastic process

Assume a probability space $(\Omega, \mathcal{F}, \Pr)$. A (discrete) stochastic process is a real-valued function

$$y : \mathcal{Z} \times \Omega \to \mathbb{R},$$

such that, for each fixed $t \in \mathcal{Z}$, $y(t, \omega)$ is a random variable. $\mathcal{Z}$ is a useful index set that represents time, for example $\mathcal{Z} = \mathbb{Z}$ or $\mathcal{Z} = \mathbb{N}$.


Some basics: multivariate stochastic process

A (discrete) K-dimensional vector stochastic process is a function

$$y : \mathcal{Z} \times \Omega \to \mathbb{R}^K,$$

such that, for each fixed $t \in \mathcal{Z}$, $y(t, \omega)$ is a K-dimensional random vector.

A realization is a sequence of vectors $y_t(\omega)$, $t \in \mathcal{Z}$, for a fixed $\omega$. It is a function $\mathcal{Z} \to \mathbb{R}^K$. A multiple time series is assumed to be a finite portion of a realization.

Given such a realization, the underlying stochastic process is called the data generation process (DGP).


Vector autoregressive processes

Let $y_t = (y_{1t}, \ldots, y_{Kt})'$, $\nu = (\nu_1, \ldots, \nu_K)'$, and

$$A_j = \begin{pmatrix} \alpha_{11,j} & \cdots & \alpha_{1K,j} \\ \vdots & \ddots & \vdots \\ \alpha_{K1,j} & \cdots & \alpha_{KK,j} \end{pmatrix}.$$

Then, a vector autoregressive process (VAR) satisfies the equation

$$y_t = \nu + A_1 y_{t-1} + \ldots + A_p y_{t-p} + u_t,$$

with $u_t$ a sequence of independently and identically distributed random K-vectors with zero mean (conditions relaxed later).
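The following minimal numpy sketch simulates such a process; the function name and the coefficient values are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def simulate_var(nu, A, Sigma_u, T, burn_in=200, seed=0):
    """Simulate y_t = nu + A_1 y_{t-1} + ... + A_p y_{t-p} + u_t
    with Gaussian white noise u_t ~ N(0, Sigma_u)."""
    rng = np.random.default_rng(seed)
    p, K = len(A), len(nu)
    n = T + burn_in + p
    y = np.zeros((n, K))                          # zero starting values
    u = rng.multivariate_normal(np.zeros(K), Sigma_u, size=n)
    for t in range(p, n):
        y[t] = nu + u[t]
        for j, Aj in enumerate(A):
            y[t] += Aj @ y[t - 1 - j]
    return y[-T:]                                 # discard the burn-in

nu = np.array([1.0, 0.5])
A = [np.array([[0.5, 0.1], [0.2, 0.3]])]          # a stable bivariate VAR(1)
Sigma_u = np.array([[1.0, 0.3], [0.3, 1.0]])
y = simulate_var(nu, A, Sigma_u, T=500)
```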



Forecasting using a VAR

Assume $y_t$ follows a VAR(p). Then, the forecast $\hat{y}_{T+1}$ is given by

$$\hat{y}_{T+1} = \nu + A_1 y_T + \ldots + A_p y_{T-p+1},$$

i.e. the systematic part of the defining equation. Note that this also defines a forecast for each component of $y_{T+1}$.
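A direct translation into code, under the same illustrative assumptions as the simulation sketch above:

```python
import numpy as np

def forecast_one_step(nu, A, y_hist):
    """hat{y}_{T+1} = nu + A_1 y_T + ... + A_p y_{T-p+1};
    rows of y_hist are observations, y_hist[-1] = y_T."""
    y_hist = np.asarray(y_hist, dtype=float)
    yhat = nu.astype(float).copy()
    for j, Aj in enumerate(A):
        yhat += Aj @ y_hist[-1 - j]
    return yhat

nu = np.array([1.0, 0.5])
A = [np.array([[0.5, 0.1], [0.2, 0.3]])]
print(forecast_one_step(nu, A, [[2.0, 1.0]]))   # hypothetical last observation
```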



A flowchart for VAR analysis

[Flowchart: specification and estimation of the VAR model, followed by model checking; if the model is rejected, return to specification and estimation; if the model is accepted, proceed to forecasting and structural analysis.]


Basic assumptions and properties

The VAR(p) model

The object of interest is the vector autoregressive process of order p that satisfies the equation

$$y_t = \nu + A_1 y_{t-1} + \ldots + A_p y_{t-p} + u_t, \quad t = 0, \pm 1, \pm 2, \ldots,$$

with $u_t$ assumed to be K-dimensional white noise, i.e. $E u_t = 0$, $E u_s u_t' = 0$ for $s \neq t$, and $E u_t u_t' = \Sigma_u$ with nonsingular $\Sigma_u$ (conditions relaxed).

First we concentrate on the VAR(1) model

$$y_t = \nu + A_1 y_{t-1} + u_t.$$


Substituting in the VAR(1)

Repeated substitution in the VAR(1) model yields

$$y_1 = \nu + A_1 y_0 + u_1,$$
$$y_2 = (I_K + A_1)\nu + A_1^2 y_0 + A_1 u_1 + u_2,$$
$$\vdots$$
$$y_t = (I_K + A_1 + \ldots + A_1^{t-1})\nu + A_1^t y_0 + \sum_{j=0}^{t-1} A_1^j u_{t-j},$$

such that $y_1, \ldots, y_t$ can be represented as functions of $y_0, u_1, \ldots, u_t$. All $y_t$, $t \geq 0$, are a function of just one starting value and the errors.


The Wold representation of the VAR(1)

If all eigenvalues of $A_1$ have modulus less than one, substitution can be continued using the $y_j$, $j < 0$, and the limit exists:

$$y_t = (I_K - A_1)^{-1}\nu + \sum_{j=0}^{\infty} A_1^j u_{t-j}, \quad t = 0, \pm 1, \pm 2, \ldots,$$

and the constant portion can be denoted by $\mu$.

The matrix sequence converges according to linear algebra results. The random vector converges in mean square due to an important statistical lemma.
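A small numerical check of this representation, with illustrative coefficients: iterating the recursion from far in the past and evaluating the truncated MA sum should give (numerically) the same $y_t$.

```python
import numpy as np

K, J = 2, 200                                  # dimension, truncation point
rng = np.random.default_rng(1)
nu = np.array([1.0, 0.5])
A1 = np.array([[0.5, 0.1], [0.2, 0.3]])        # eigenvalues inside the unit circle
mu = np.linalg.solve(np.eye(K) - A1, nu)       # (I_K - A1)^{-1} nu

u = rng.standard_normal((J + 1, K))            # shocks u_{t-J}, ..., u_t
y = mu.copy()                                  # start the recursion at the mean
for t in range(J + 1):
    y = nu + A1 @ y + u[t]

ma = mu + sum(np.linalg.matrix_power(A1, j) @ u[J - j] for j in range(J + 1))
print(np.max(np.abs(y - ma)))                  # ~0: truncated MA matches recursion
```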



Convergence of sums of stochastically bounded processes

Theorem. Suppose $(A_j)$ is an absolutely summable sequence of real $(K \times K)$ matrices and $(z_t)$ is a sequence of K-dimensional random variables that are bounded by a common $c \in \mathbb{R}$ in the sense of

$$E(z_t' z_t) \leq c, \quad t = 0, \pm 1, \pm 2, \ldots.$$

Then there exists a sequence of random variables $(y_t)$, such that

$$\sum_{j=-n}^{n} A_j z_{t-j} \to y_t,$$

as $n \to \infty$, in quadratic mean. $(y_t)$ is uniquely defined except on a set of probability 0.


Aspects of the convergent sum

The matrices $A_1^j$ converge geometrically to zero, hence they are absolutely summable, and the theorem applies. The limit in the 'Wold' representation is well defined.

This is called a Wold representation, as Wold's Theorem provides an infinite-order moving-average representation for all univariate covariance-stationary processes.

Note that the white-noise property was not used. The sum would even converge for time-dependent $u_t$.


Expectation of the stationary VAR(1)

The Wold-type representation implies

$$E(y_t) = (I_K - A_1)^{-1}\nu = \mu.$$

This is due to the fact that $E u_t = 0$ for the white-noise terms and a statistical theorem that permits exchanging the limit and expectation operations under the conditions of the lemma. Note that the white-noise property (uncorrelated sequence) is not used.


Second moments of the stationary VAR(1)

Lütkepohl presents a derivation of the cross-covariance function

$$\Gamma_y(h) = E(y_t - \mu)(y_{t-h} - \mu)' = \lim_{n \to \infty} \sum_{i=0}^{n} \sum_{j=0}^{n} A_1^i E(u_{t-i} u_{t-h-j}')(A_1^j)' = \lim_{n \to \infty} \sum_{i=0}^{n} A_1^{h+i} \Sigma_u (A_1^i)' = \sum_{i=0}^{\infty} A_1^{h+i} \Sigma_u (A_1^i)',$$

which uses $E(u_t u_s') = 0$ for $s \neq t$, $E(u_t u_t') = \Sigma_u$, and a corollary to the lemma that permits evaluation of second moments under the same conditions. Here, the white-noise property of $u_t$ is used.
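A sketch that evaluates this sum by truncation; the coefficient values are illustrative, and the truncation point is an assumption justified by the geometric decay of $A_1^j$.

```python
import numpy as np

def acov_var1(A1, Sigma_u, h, n_terms=500):
    """Gamma_y(h) = sum_{i>=0} A1^{h+i} Sigma_u (A1^i)', truncated at n_terms."""
    K = A1.shape[0]
    G = np.zeros((K, K))
    Ahi = np.linalg.matrix_power(A1, h)        # running A1^{h+i}
    Ai = np.eye(K)                             # running A1^{i}
    for _ in range(n_terms):
        G += Ahi @ Sigma_u @ Ai.T
        Ahi = A1 @ Ahi
        Ai = A1 @ Ai
    return G

A1 = np.array([[0.5, 0.1], [0.2, 0.3]])
Sigma_u = np.array([[1.0, 0.3], [0.3, 1.0]])
G0, G1 = acov_var1(A1, Sigma_u, 0), acov_var1(A1, Sigma_u, 1)
print(np.allclose(G1, A1 @ G0))                # True: the Yule-Walker relation below
```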



The definition of a stable VAR(1)

Definition. A VAR(1) is called stable iff all eigenvalues of $A_1$ have modulus less than one. By a mathematical lemma, this condition is equivalent to

$$\det(I_K - A_1 z) \neq 0 \quad \text{for } |z| \leq 1.$$

No roots lie within or on the unit circle. Note that this definition differs from stability as defined by other authors. Stability is not equivalent to stationarity: a stable process started in t = 1 is not stationary; a backward-directed, entirely unstable process is stationary.


Representation of VAR(p) as VAR(1)

All VAR(p) models of the form

$$y_t = \nu + A_1 y_{t-1} + \ldots + A_p y_{t-p} + u_t$$

can be written as VAR(1) models

$$Y_t = \nu^{\dagger} + A Y_{t-1} + U_t,$$

with

$$A = \begin{pmatrix} A_1 & A_2 & \cdots & A_{p-1} & A_p \\ I_K & 0 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & I_K & 0 \end{pmatrix}.$$


More on the state-space VAR(1) form

In the VAR(1) representation of a VAR(p), the vectors $Y_t$, $\nu^{\dagger}$, and $U_t$ have length Kp:

$$Y_t = \begin{pmatrix} y_t \\ y_{t-1} \\ \vdots \\ y_{t-p+1} \end{pmatrix}, \quad \nu^{\dagger} = \begin{pmatrix} \nu \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad U_t = \begin{pmatrix} u_t \\ 0 \\ \vdots \\ 0 \end{pmatrix}.$$

The big matrix A has dimension $Kp \times Kp$. This state-space form permits using all results from VAR(1) for the general VAR(p).


Stability of the VAR(p)

Definition. A VAR(p) is called stable iff all eigenvalues of A have modulus less than one. By a mathematical lemma, this condition is equivalent to

$$\det(I_{Kp} - A z) \neq 0 \quad \text{for } |z| \leq 1.$$

This condition is equivalent to the stability condition

$$\det(I_K - A_1 z - \ldots - A_p z^p) \neq 0 \quad \text{for } |z| \leq 1,$$

which is usually more efficient to check. Equivalence follows from the determinant properties of partitioned matrices.
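A possible implementation of the companion matrix and the eigenvalue check; the helper names are made up for this sketch.

```python
import numpy as np

def companion(A):
    """Stack VAR(p) coefficient matrices A = [A_1, ..., A_p] into the
    (Kp x Kp) companion matrix of the VAR(1) representation."""
    p, K = len(A), A[0].shape[0]
    top = np.hstack(A)                               # [A_1 ... A_p]
    if p == 1:
        return top
    bottom = np.eye(K * (p - 1), K * p)              # [I : 0] shift block
    return np.vstack([top, bottom])

def is_stable(A):
    """Stable iff all eigenvalues of the companion matrix lie strictly
    inside the unit circle."""
    return np.max(np.abs(np.linalg.eigvals(companion(A)))) < 1.0

A = [np.array([[0.5, 0.1], [0.2, 0.3]]),
     np.array([[0.0, 0.2], [0.1, 0.1]])]             # illustrative VAR(2)
print(is_stable(A))                                  # True for these values
```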



The infinite-order MA representation of the VAR(p)

The stationary stable VAR(p) can be represented in the convergent infinite-order MA form

$$Y_t = \mu^{\dagger} + \sum_{j=0}^{\infty} A^j U_{t-j}.$$

This is, however, still an inconvenient process of dimension Kp. Formally, the first K entries of the vector $Y_t$ are obtained via the $(K \times Kp)$ matrix

$$J = [I_K : 0 : \ldots : 0]$$

as $y_t = J Y_t$.


The Wold representation of the VAR(p)

Using J, it follows that

$$y_t = J\mu^{\dagger} + J \sum_{j=0}^{\infty} A^j U_{t-j} = \mu + \sum_{j=0}^{\infty} J A^j J' J U_{t-j} = \mu + \sum_{j=0}^{\infty} \Phi_j u_{t-j}$$

for the stable and stationary VAR(p), a Wold representation with $\Phi_j = J A^j J'$. This is the canonical or fundamental or prediction-error representation.
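A sketch that reads off $\Phi_j = J A^j J'$ as the upper-left $K \times K$ block of $A^j$; the coefficients are illustrative.

```python
import numpy as np

def ma_coefficients(A, n_lags):
    """Wold MA matrices Phi_j = J A^j J' from the companion matrix of a
    VAR(p); J = [I_K : 0 : ... : 0] picks the first K rows and columns."""
    p, K = len(A), A[0].shape[0]
    top = np.hstack(A)
    comp = np.vstack([top, np.eye(K * (p - 1), K * p)]) if p > 1 else top
    Phi, Aj = [], np.eye(K * p)
    for _ in range(n_lags + 1):
        Phi.append(Aj[:K, :K].copy())   # J A^j J' = upper-left K x K block
        Aj = comp @ Aj
    return Phi

A = [np.array([[0.5, 0.1], [0.2, 0.3]]),
     np.array([[0.0, 0.2], [0.1, 0.1]])]
Phi = ma_coefficients(A, 4)
print(Phi[0])   # identity
print(Phi[1])   # equals A_1
```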



First and second moments of the VAR(p)

Applying the lemma to the MA representation yields $E(y_t) = \mu$ and

$$\Gamma_y(h) = E(y_t - \mu)(y_{t-h} - \mu)' = E\left(\sum_{i=0}^{h-1} \Phi_i u_{t-i} + \sum_{i=0}^{\infty} \Phi_{h+i} u_{t-h-i}\right)\left(\sum_{j=0}^{\infty} \Phi_j u_{t-h-j}\right)' = \sum_{i=0}^{\infty} \Phi_{h+i} \Sigma_u \Phi_i'.$$


The Wold-type representation with lag operators

Using the operator L defined by $L y_t = y_{t-1}$ permits writing the VAR(p) model as

$$y_t = \nu + (A_1 L + \ldots + A_p L^p) y_t + u_t$$

or, with $A(L) = I_K - A_1 L - \ldots - A_p L^p$,

$$A(L) y_t = \nu + u_t.$$

Then, one may write $\Phi(L) = \sum_{j=0}^{\infty} \Phi_j L^j$ and

$$y_t = \mu + \Phi(L) u_t = A^{-1}(L)(\nu + u_t),$$

thus formally $A(L)\Phi(L) = I$ or $\Phi(L) = A^{-1}(L)$. Note that $A(L)$ is a polynomial and $\Phi(L)$ is a power series.


Remarks on the lag operator representation

The property $\Phi(L) A(L) = I$ allows one to determine $\Phi_j$ iteratively by comparing coefficient matrices (see the sketch after this list);

Note that $\mu = A^{-1}(L)\nu = A^{-1}(1)\nu$ and that $A(1) = I_K - A_1 - \ldots - A_p$;

It is possible that $A^{-1}(L)$ is a finite-order polynomial, while this is impossible for scalar processes;

The MA representation exists iff the VAR(p) is stable, i.e. iff all zeros of $\det(A(z))$ are outside the unit circle: $A(L)$ is called invertible.
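The coefficient comparison mentioned in the first remark can be sketched as follows: from $\Phi(L)A(L) = I$, matching powers of L gives $\Phi_0 = I_K$ and $\Phi_i = \sum_{j=1}^{\min(i,p)} \Phi_{i-j} A_j$ (illustrative coefficient values).

```python
import numpy as np

def ma_from_recursion(A, n_lags):
    """Phi_0 = I_K; Phi_i = sum_{j=1}^{min(i,p)} Phi_{i-j} A_j,
    obtained by comparing coefficient matrices in Phi(L)A(L) = I."""
    p, K = len(A), A[0].shape[0]
    Phi = [np.eye(K)]
    for i in range(1, n_lags + 1):
        Phi_i = np.zeros((K, K))
        for j in range(1, min(i, p) + 1):
            Phi_i += Phi[i - j] @ A[j - 1]
        Phi.append(Phi_i)
    return Phi

A = [np.array([[0.5, 0.1], [0.2, 0.3]]),
     np.array([[0.0, 0.2], [0.1, 0.1]])]
Phi = ma_from_recursion(A, 2)
print(np.allclose(Phi[2], A[0] @ A[0] + A[1]))   # Phi_2 = A_1^2 + A_2: True
```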



Remarks on stationarity

Formally, covariance stationarity of K-variate processes is defined by constancy of the first moments, $E y_t = \mu \ \forall t$, and of the second moments,

$$E(y_t - \mu)(y_{t-h} - \mu)' = \Gamma_y(h) = \Gamma_y(-h)' \quad \forall t, \; h = 0, 1, 2, \ldots$$

Strict stationarity is defined by time invariance of all finite-dimensional joint distributions. Here, 'stationarity' refers to covariance stationarity, for example in the proposition:

Proposition. A stable VAR(p) process $y_t$, $t \in \mathbb{Z}$, is stationary.


Yule-Walker equations for VAR(1) processes

Assume the VAR(1) is stable and stationary. The equation

$$y_t - \mu = A_1(y_{t-1} - \mu) + u_t$$

can be multiplied by $(y_{t-h} - \mu)'$ from the right. Application of the expectation operator yields

$$E(y_t - \mu)(y_{t-h} - \mu)' = A_1 E(y_{t-1} - \mu)(y_{t-h} - \mu)' + E u_t (y_{t-h} - \mu)'$$

or

$$\Gamma_y(h) = A_1 \Gamma_y(h-1)$$

for $h \geq 1$.


The system of Yule-Walker equations for VAR(1)

For the case h = 0, the last term is not 0:

$$E(y_t - \mu)(y_t - \mu)' = A_1 E(y_{t-1} - \mu)(y_t - \mu)' + E u_t (y_t - \mu)'$$

or

$$\Gamma_y(0) = A_1 \Gamma_y(-1) + \Sigma_u = A_1 \Gamma_y(1)' + \Sigma_u,$$

which by substitution from the equation for h = 1 yields

$$\Gamma_y(0) = A_1 \Gamma_y(0) A_1' + \Sigma_u,$$

which can be transformed to

$$\mathrm{vec}\, \Gamma_y(0) = (I_{K^2} - A_1 \otimes A_1)^{-1} \mathrm{vec}\, \Sigma_u,$$

an explicit formula to obtain the process variance from a given coefficient matrix and error variance.
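A direct implementation of the vec formula; np.kron gives the Kronecker product, and column-major flattening matches the vec operator (coefficient values illustrative).

```python
import numpy as np

def var1_variance(A1, Sigma_u):
    """Solve vec Gamma_y(0) = (I - A1 (x) A1)^{-1} vec Sigma_u,
    where (x) is the Kronecker product and vec stacks columns."""
    K = A1.shape[0]
    vec_G = np.linalg.solve(np.eye(K**2) - np.kron(A1, A1),
                            Sigma_u.flatten(order="F"))   # column-major vec
    return vec_G.reshape((K, K), order="F")

A1 = np.array([[0.5, 0.1], [0.2, 0.3]])
Sigma_u = np.array([[1.0, 0.3], [0.3, 1.0]])
G0 = var1_variance(A1, Sigma_u)
print(np.allclose(G0, A1 @ G0 @ A1.T + Sigma_u))          # True
```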



How to use the Yule-Walker equations for VAR(1)

For synthetic purposes, first evaluate $\Gamma_y(0)$ from given $A_1$ and $\Sigma_u$.

Then, the entire ACF is obtained from $\Gamma_y(h) = A_1^h \Gamma_y(0)$.

The big matrix in the h = 0 equation must be invertible, as the eigenvalues of $A_1 \otimes A_1$ are the pairwise products of the eigenvalues of $A_1$, which all have modulus less than one.

Sure, the same trick works for VAR(p), as they have a VAR(1) representation, but one has to invert $((Kp)^2 \times (Kp)^2)$ matrices.

For analytic purposes, $A_1 = \Gamma_y(1) \Gamma_y(0)^{-1}$ can be used to estimate $A_1$ from the correlogram (see the sketch below).
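A moment-estimator sketch along these lines; the data are simulated with illustrative coefficients, so the printed estimate should land near the true $A_1$.

```python
import numpy as np

def yule_walker_var1(y):
    """A1_hat = Gamma_hat(1) Gamma_hat(0)^{-1} from sample autocovariances
    of a (T x K) data array y."""
    T = len(y)
    yc = y - y.mean(axis=0)
    G0 = yc.T @ yc / T
    G1 = yc[1:].T @ yc[:-1] / T                # E(y_t - mu)(y_{t-1} - mu)'
    return G1 @ np.linalg.inv(G0)

rng = np.random.default_rng(2)
A1 = np.array([[0.5, 0.1], [0.2, 0.3]])
y = np.zeros((5000, 2))
for t in range(1, 5000):
    y[t] = A1 @ y[t - 1] + rng.standard_normal(2)
print(yule_walker_var1(y))                     # close to A1
```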



Autocorrelations of stable VAR processes

Autocorrelations are often preferred to autocovariances. Formally, they are defined via

$$\rho_{ij}(h) = \frac{\gamma_{ij}(h)}{\sqrt{\gamma_{ii}(0)}\sqrt{\gamma_{jj}(0)}}$$

from the autocovariances for $i, j = 1, \ldots, K$ and $h \in \mathbb{Z}$. The matrix formula

$$R_y(h) = D^{-1} \Gamma_y(h) D^{-1}$$

with $D = \mathrm{diag}(\gamma_{11}(0)^{1/2}, \ldots, \gamma_{KK}(0)^{1/2})$ is given for completeness.
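For completeness, the matrix formula in code (input matrices are illustrative):

```python
import numpy as np

def autocorr_matrix(Gamma_h, Gamma_0):
    """R_y(h) = D^{-1} Gamma_y(h) D^{-1} with D = diag of sqrt variances."""
    d = np.sqrt(np.diag(Gamma_0))
    return Gamma_h / np.outer(d, d)     # divides entry (i, j) by d_i d_j

Gamma_0 = np.array([[2.0, 0.6], [0.6, 1.5]])
Gamma_1 = np.array([[1.0, 0.4], [0.3, 0.5]])
print(autocorr_matrix(Gamma_1, Gamma_0))
```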



Forecasting

The forecasting problem

Based on an information set $\Omega_t \supseteq \{y_s, s \leq t\}$ available at t, the forecaster searches for an approximation $y_t(h)$ to the 'unknown' $y_{t+h}$ that minimizes some expected loss or cost

$$E\{g(y_{t+h} - y_t(h)) \mid \Omega_t\}.$$

The most common loss function, $g(x) = x^2$, minimizes the forecast mean squared error (MSE). Here t is the forecast origin, h is the forecast horizon, and $y_t(h)$ is an h-step predictor.


Conditional expectation

Proposition. The h-step predictor that minimizes the forecast MSE is the conditional expectation

$$y_t(h) = E(y_{t+h} \mid y_s, s \leq t).$$

Often, the casual notation $E_t(y_{t+h})$ is used.

This property (proof constructive) also applies to vector processes and to VARs, where the MSE is defined by

$$\mathrm{MSE}(y_t(h)) = E\{y_{t+h} - y_t(h)\}\{y_{t+h} - y_t(h)\}'.$$


Conditional expectation in a VAR

Assume $u_t$ is independent white noise (a martingale difference sequence with $E(u_{t+1} \mid u_s, s \leq t) = 0$ suffices); then, for a VAR(p),

$$E_t(y_{t+1}) = \nu + A_1 y_t + A_2 y_{t-1} + \ldots + A_p y_{t-p+1},$$

and, recursively,

$$E_t(y_{t+2}) = \nu + A_1 E_t(y_{t+1}) + A_2 y_t + \ldots + A_p y_{t-p+2},$$

etc., which allows the iterative evaluation for all horizons.
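A sketch of this recursion (names and values illustrative); future values on the right-hand side are replaced by their own forecasts as the horizon grows.

```python
import numpy as np

def var_forecast(nu, A, y_hist, h):
    """Iterate E_t(y_{t+j}) = nu + sum_i A_i E_t(y_{t+j-i}); y_hist must
    contain at least p observations, newest last. Returns horizons 1..h."""
    p = len(A)
    path = [np.asarray(r, dtype=float) for r in np.asarray(y_hist, float)[-p:]]
    for _ in range(h):
        nxt = nu.astype(float).copy()
        for i, Ai in enumerate(A):
            nxt += Ai @ path[-1 - i]
        path.append(nxt)
    return np.array(path[p:])

nu = np.array([1.0, 0.5])
A = [np.array([[0.5, 0.1], [0.2, 0.3]])]
print(var_forecast(nu, A, [[2.0, 1.0]], h=3))   # hypothetical last observation
```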



Larger horizons for a VAR(1)

By repeated insertion, the following formula is easily obtained:

$$E_t(y_{t+h}) = (I_K + A_1 + \ldots + A_1^{h-1})\nu + A_1^h y_t,$$

which implies that the forecast tends to become trivial as h increases, given the geometric convergence in the last term.


Forecast MSE for VAR(1)

The MA representation $y_t = \mu + \sum_{j=0}^{\infty} A_1^j u_{t-j}$ clearly decomposes $y_{t+h}$ into the predictor known in t and the remaining error, such that

$$y_{t+h} - y_t(h) = \sum_{j=0}^{h-1} A_1^j u_{t+h-j},$$

and

$$\Sigma_y(h) = \mathrm{MSE}(y_t(h)) = E\left(\sum_{j=0}^{h-1} A_1^j u_{t+h-j}\right)\left(\sum_{j=0}^{h-1} A_1^j u_{t+h-j}\right)' = \sum_{j=0}^{h-1} A_1^j \Sigma_u (A_1^j)' = \mathrm{MSE}(y_t(h-1)) + A_1^{h-1} \Sigma_u (A_1^{h-1})',$$

such that the MSE increases in h.


Forecast MSE for general VAR(p)

Using the Wold-type MA representation $y_t = \mu + \sum_{j=0}^{\infty} \Phi_j u_{t-j}$, a scheme analogous to p = 1 works for VAR(p) with p > 1, using J. The forecast error variance is

$$\Sigma_y(h) = \mathrm{MSE}(y_t(h)) = \sum_{j=0}^{h-1} \Phi_j \Sigma_u \Phi_j',$$

which converges to $\Sigma_y = \Gamma_y(0)$ for $h \to \infty$.

These MSE formulae can also be used to determine interval forecasts (confidence intervals).
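A sketch combining both formulae; the 95% bounds assume Gaussian forecast errors, an assumption the slides do not require, and the point forecast and coefficients are made up for the example.

```python
import numpy as np

def forecast_mse(Phi, Sigma_u, h):
    """Sigma_y(h) = sum_{j=0}^{h-1} Phi_j Sigma_u Phi_j'."""
    return sum(Phi[j] @ Sigma_u @ Phi[j].T for j in range(h))

A1 = np.array([[0.5, 0.1], [0.2, 0.3]])
Sigma_u = np.array([[1.0, 0.3], [0.3, 1.0]])
Phi = [np.linalg.matrix_power(A1, j) for j in range(10)]  # VAR(1): Phi_j = A1^j

point = np.array([1.8, 1.1])        # hypothetical h-step point forecast
h = 4
se = np.sqrt(np.diag(forecast_mse(Phi, Sigma_u, h)))
lower, upper = point - 1.96 * se, point + 1.96 * se       # 95%, Gaussian errors
print(lower, upper)
```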



Structural VAR analysis

There are three (interdependent) approaches to the interpretation of VAR models:

1. Granger causality

2. Impulse response analysis

3. Forecast error variance decomposition (FEVD)



Granger causality

Assume two M- and N-dimensional sub-processes x and z of a K-dimensional process y, such that $y = (z', x')'$.

Definition. The process $x_t$ is said to cause $z_t$ in Granger's sense iff

$$\Sigma_z(h \mid \Omega_t) < \Sigma_z(h \mid \Omega_t \setminus \{x_s, s \leq t\})$$

for some t and h.

The set $\Omega_t$ is an information set containing $\{y_s, s \leq t\}$; the matrix inequality < is defined via positive definiteness of the difference; the correct interpretation of the $\setminus$ operator is doubtful.

The property is not antisymmetric: x may cause z and z may also cause x: feedback.


Instantaneous Granger causality

Again, assume two M- and N-dimensional sub-processes x and z of a K-dimensional process y.

Definition. There is instantaneous causality between the processes $x_t$ and $z_t$ in Granger's sense iff

$$\Sigma_z(1 \mid \Omega_t \cup \{x_{t+1}\}) < \Sigma_z(1 \mid \Omega_t).$$

The property is symmetric: x and z can be exchanged in the definition: instantaneous causality knows no direction.


Granger causality in a MA model

Assume the representation

$$y_t = \begin{pmatrix} z_t \\ x_t \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} + \begin{pmatrix} \Phi_{11}(L) & \Phi_{12}(L) \\ \Phi_{21}(L) & \Phi_{22}(L) \end{pmatrix} \begin{pmatrix} u_{1t} \\ u_{2t} \end{pmatrix}.$$

It is easily motivated that x does not cause z iff $\Phi_{12,j} = 0$ for all j.


Granger causality in a VAR

A stationary stable VAR has an MA representation, so Granger causality can be checked on that one. Alternatively, consider the partitioned VAR

$$y_t = \begin{pmatrix} z_t \\ x_t \end{pmatrix} = \begin{pmatrix} \nu_1 \\ \nu_2 \end{pmatrix} + \sum_{j=1}^{p} \begin{pmatrix} A_{11,j} & A_{12,j} \\ A_{21,j} & A_{22,j} \end{pmatrix} \begin{pmatrix} z_{t-j} \\ x_{t-j} \end{pmatrix} + \begin{pmatrix} u_{1t} \\ u_{2t} \end{pmatrix}.$$

It is easily shown that x does not cause z iff $A_{12,j} = 0$, $j = 1, \ldots, p$ (block inverse of matrix).
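As an illustration, the VAR coefficients can be estimated by multivariate least squares and the $\hat{A}_{12,j}$ blocks inspected; a formal Wald or F test of the zero restrictions is omitted, so this is only a sketch with stand-in data.

```python
import numpy as np

def var_ols(y, p):
    """Multivariate least squares for a VAR(p): regress y_t on a constant
    and p lags; returns nu_hat and [A_1_hat, ..., A_p_hat]."""
    T, K = y.shape
    Z = np.hstack([np.ones((T - p, 1))] +
                  [y[p - j - 1:T - j - 1] for j in range(p)])  # [1, y_{t-1}, ...]
    B = np.linalg.lstsq(Z, y[p:], rcond=None)[0].T             # K x (1 + Kp)
    nu_hat = B[:, 0]
    A_hat = [B[:, 1 + j * K: 1 + (j + 1) * K] for j in range(p)]
    return nu_hat, A_hat

rng = np.random.default_rng(3)
y = rng.standard_normal((300, 2))      # stand-in data; z = y[:, :1], x = y[:, 1:]
nu_hat, A_hat = var_ols(y, p=2)
print([Aj[0, 1] for Aj in A_hat])      # the A_12,j entries; near zero here
```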



Remarks on testing for Granger causality in a VAR

The property that $\Phi_{12,j} = 0$ characterizes non-causality is not restricted to VARs: it works for any process with a Wold-type MA representation.

The property may also be generalized to x and z being two sub-vectors of y with M + N < K. Some extensions, however, do not work properly for the VAR representation, just for the MA representation.

The definition is sometimes modified to h-step causality, meaning $x_t$ does not improve $z_t(j)$, $j < h$, but does improve $z_t(h)$: complications arise in naive testing for the VAR form, though not for the MA form.


Characterization of instantaneous causality

Proposition. Let $y_t$ be a VAR with nonsingular innovation variance matrix $\Sigma_u$. There is no instantaneous causality between $x_t$ and $z_t$ iff

$$E(u_{1t} u_{2t}') = 0.$$

This condition is certainly symmetric.


Instantaneous causality and the non-unique MA representation

Consider the Cholesky factorization $\Sigma_u = PP'$, with P lower triangular. Then, it holds that

$$y_t = \mu + \sum_{j=0}^{\infty} \Phi_j P P^{-1} u_{t-j} = \mu + \sum_{j=0}^{\infty} \Theta_j w_{t-j},$$

with $\Theta_j = \Phi_j P$ and $w = P^{-1} u$, and

$$\Sigma_w = P^{-1} \Sigma_u (P^{-1})' = I_K.$$

In this form, instantaneous causality corresponds to $\Theta_{21,0} \neq 0$, which looks asymmetric. An analogous form and condition is achieved by exchanging x and z.
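A sketch of the orthogonalization; np.linalg.cholesky returns the lower-triangular factor P, and the coefficient values are illustrative.

```python
import numpy as np

def orthogonalized_ma(Phi, Sigma_u):
    """Theta_j = Phi_j P with Sigma_u = P P' (lower-triangular Cholesky);
    the implied shocks w = P^{-1} u have identity covariance."""
    P = np.linalg.cholesky(Sigma_u)
    return [Phi_j @ P for Phi_j in Phi]

A1 = np.array([[0.5, 0.1], [0.2, 0.3]])
Sigma_u = np.array([[1.0, 0.3], [0.3, 1.0]])
Phi = [np.linalg.matrix_power(A1, j) for j in range(5)]   # VAR(1) example
Theta = orthogonalized_ma(Phi, Sigma_u)
print(Theta[0])   # equals P: lower triangular, Theta_21,0 != 0 here
```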



Impulse response analysis: the idea

The researcher wishes to add detail to the Granger-causality analysis and to quantify the effect of an impulse in a component variable $y_{j,t}$ on another component variable $y_{k,t}$.

The 'derivative' $\partial y_{k,t+h} / \partial y_{j,t}$ cannot be determined from the VAR model. The 'derivative'

$$\frac{\partial y_{k,t+h}}{\partial u_{j,t}}$$

corresponds to the (k, j) entry in the matrix $\Phi_h$ of the MA representation. It is not uniquely determined. The matrix of graphs of $\Phi_{kj,h}$ versus h is called the impulse response function (IRF).


Impulse response analysis: general properties

If $y_j$ does not Granger-cause $y_k$, the corresponding impulse response in (k, j) is constant zero.

If the first p(K − 1) values of an impulse response are 0, then all values are 0.

If the VAR is stable, all impulse response functions must converge to 0 as $h \to \infty$.

It is customary to scale the impulse responses by the standard deviation of the response variable, $\sqrt{\sigma_{kk}}$.

The impulse response based on the canonical MA representation $\Phi_h$, $h \in \mathbb{N}$, ignores the correlation across the components of u in $\Sigma_u$ and may not correspond to the true reaction.


Orthogonal impulse response

Re-consider the alternative MA representation based on

$$\Sigma_u = PP', \quad \Theta_j = \Phi_j P, \quad w = P^{-1} u,$$

that is,

$$y_t = \mu + \sum_{j=0}^{\infty} \Theta_j w_{t-j}.$$

Because of $\Sigma_w = I_K$, the 'shocks' are orthogonal. Note that $w_j$ is a linear function of $u_k$, $k \leq j$. The resulting matrix of graphs of $\Theta_{kj,h}$ versus h is an orthogonal impulse response function (OIRF).


Orthogonal impulse response: properties

Because of the multiplication by the matrix P, diagonal entries in $\Theta_0$ will not be ones. This problem can be remedied simply via a diagonal matrix, such that $\Theta_0$ has diagonal ones and $\Sigma_w$ is diagonal.

The OIRF can be quite different from the IRF based on $\Phi_h$. If there is no instantaneous causality, both will coincide.

The orthogonal IRF based on a re-ordering of the variable components will differ from the correspondingly re-ordered OIRF. In addition to the permutations, a continuum of OIRF versions may be considered.


Ways out of the arbitrariness dilemma

Some researchers suggest arranging the vector of components such that the a priori 'most exogenous' variable appears first, and so on.

The generalized impulse response function (GIRF) according to Pesaran summarizes the OIRF for each response variable suffering the maximum response (coming last in the vector). It is not an internally consistent IRF.

So-called structural VARs attempt to identify the 'shocks' from economic theory. They often use an additional matrix $A_0$ that permits an immediate reaction of a component $y_k$ to another $y_j$, plus various identification restrictions. They may also be 'over-identified' and restrict the basic VAR(p) model.


Decomposition of the h–step error

Starting from an orthogonal MA representation with $\Sigma_w = I_K$,

$$y_t = \mu + \sum_{i=0}^{\infty} \Theta_i w_{t-i},$$

the error of an h-step forecast is

$$y_{t+h} - y_t(h) = \sum_{i=0}^{h-1} \Theta_i w_{t+h-i},$$

and for the j-th component

$$y_{j,t+h} - y_{j,t}(h) = \sum_{i=0}^{h-1} \sum_{k=1}^{K} \theta_{jk,i} w_{k,t+h-i}.$$

All hK terms are orthogonal, and this error can be decomposed into the K contributions from the component errors.


Forecast error variance decomposition

Consider the variance of the j-th forecast component

$$\mathrm{MSE}(y_{j,t}(h)) = \sum_{i=0}^{h-1} \sum_{k=1}^{K} \theta_{jk,i}^2.$$

The share that is due to the k-th component error,

$$\omega_{jk,h} = \frac{\sum_{i=0}^{h-1} \theta_{jk,i}^2}{\mathrm{MSE}(y_{j,t}(h))},$$

defines the forecast error variance decomposition (FEVD) and is often tabulated or plotted versus h for $j, k = 1, \ldots, K$.
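A sketch of the FEVD from given orthogonalized MA matrices (illustrative coefficients); each row of the output holds the shares $\omega_{jk,h}$ for one response variable and sums to one.

```python
import numpy as np

def fevd(Theta, h):
    """omega_{jk,h}: share of the h-step forecast error variance of
    component j attributable to orthogonal shock k."""
    contrib = sum(Theta[i] ** 2 for i in range(h))     # elementwise theta^2
    return contrib / contrib.sum(axis=1, keepdims=True)

A1 = np.array([[0.5, 0.1], [0.2, 0.3]])
Sigma_u = np.array([[1.0, 0.3], [0.3, 1.0]])
P = np.linalg.cholesky(Sigma_u)
Theta = [np.linalg.matrix_power(A1, j) @ P for j in range(10)]
print(fevd(Theta, h=4))   # rows sum to one
```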



Invariants and others in structural analysis

1. Granger causality is independent of the choice of Wold-type MA representation. It is there or it is not;

2. Impulse response functions depend on the chosen representation. OIRFs may differ for distinct orderings of the component variables;

3. Forecast error variance decomposition inherits the problems of IRF analysis: unique only in the absence of instantaneous causality.