
DEPARTMENT OF ECONOMICS

EUI Working Papers

ECO 2011/30 DEPARTMENT OF ECONOMICS

VECTOR AUTOREGRESSIVE MODELS

Helmut Luetkepohl


EUROPEAN UNIVERSITY INSTITUTE, FLORENCE

DEPARTMENT OF ECONOMICS

Vector Autoregressive Models

HELMUT LUETKEPOHL

EUI Working Paper ECO 2011/30


This text may be downloaded for personal research purposes only. Any additional reproduction for other purposes, whether in hard copy or electronically, requires the consent of the author(s), editor(s). If cited or quoted, reference should be made to the full name of the author(s), editor(s), the title, the working paper or other series, the year, and the publisher.

ISSN 1725-6704

© 2011 Helmut Luetkepohl

Printed in Italy
European University Institute
Badia Fiesolana
I – 50014 San Domenico di Fiesole (FI)
Italy
www.eui.eu
cadmus.eui.eu


October 13, 2011

Vector Autoregressive Models

Helmut Lutkepohl1

European University Institute, Florence

Contents

1 Introduction
  1.1 Structure of the Chapter
  1.2 Terminology, Notation and General Assumptions

2 VAR Processes
  2.1 The Reduced Form
  2.2 Structural Forms

3 Estimation of VAR Models
  3.1 Classical Estimation of Reduced Form VARs
  3.2 Bayesian Estimation of Reduced Form VARs
  3.3 Estimation of Structural VARs

4 Model Specification

5 Model Checking
  5.1 Tests for Residual Autocorrelation
  5.2 Other Popular Tests for Model Adequacy

6 Forecasting
  6.1 Forecasting Known VAR Processes
  6.2 Forecasting Estimated VAR Processes

7 Granger-Causality Analysis

8 Structural Analysis
  8.1 Impulse Response Analysis
  8.2 Forecast Error Variance Decompositions
  8.3 Historical Decomposition of Time Series
  8.4 Analysis of Forecast Scenarios

9 Conclusions and Extensions

1 Prepared for the Handbook of Research Methods and Applications on Empirical Macroeconomics. Helpful comments by Lutz Kilian are gratefully acknowledged.


1 Introduction

Multivariate simultaneous equations models were used extensively for macroeconometric analysis when Sims (1980) advocated vector autoregressive (VAR) models as alternatives. At that time longer and more frequently observed macroeconomic time series called for models which described the dynamic structure of the variables. VAR models lend themselves to this purpose. They typically treat all variables as a priori endogenous. Thereby they account for Sims' critique that the exogeneity assumptions for some of the variables in simultaneous equations models are ad hoc and often not backed by fully developed theories. Restrictions, including exogeneity of some of the variables, may be imposed on VAR models based on statistical procedures.

VAR models are natural tools for forecasting. Their setup is such that current values of a set of variables are partly explained by past values of the variables involved. They can also be used for economic analysis, however, because they describe the joint generation mechanism of the variables involved. Structural VAR analysis attempts to investigate structural economic hypotheses with the help of VAR models. Impulse response analysis, forecast error variance decompositions, historical decompositions and the analysis of forecast scenarios are the tools which have been proposed for disentangling the relations between the variables in a VAR model.

Traditionally VAR models are designed for stationary variables without time trends. Trending behavior can be captured by including deterministic polynomial terms. In the 1980s the discovery of the importance of stochastic trends in economic variables and the development of the concept of cointegration by Granger (1981), Engle and Granger (1987), Johansen (1995) and others have shown that stochastic trends can also be captured by VAR models. If there are trends in some of the variables it may be desirable to separate the long-run relations from the short-run dynamics of the generation process of a set of variables. Vector error correction models offer a convenient framework for separating long-run and short-run components of the data generation process (DGP). In the present chapter levels VAR models are considered where cointegration relations are not modelled explicitly although they may be present. Specific issues related to trending variables will be mentioned occasionally throughout the chapter. The advantage of levels VAR models over vector error correction models is that they can also be used when the cointegration structure is unknown. Cointegration analysis and error correction models are discussed specifically in the next chapter.

1.1 Structure of the Chapter

Typically a VAR analysis proceeds by first specifying and estimating a reduced form model for the DGP and then checking its adequacy. Model deficiencies detected at the latter stage are resolved by modifying the model. If the reduced form model passes the checking stage, it may be used for forecasting, Granger-causality or structural analysis. The main steps of this modelling approach are depicted in Figure 1. The basic VAR model will be introduced in Section 2. Estimation and model specification issues are discussed in Sections 3 and 4, respectively, and model checking is considered in Section 5. Sections 6, 7 and 8 address forecasting, Granger-causality analysis and structural modelling including impulse response analysis, forecast error variance decomposition, historical decomposition of time series and analysis of forecast scenarios. Section 9 concludes and discusses extensions.


Figure 1: VAR analysis. [Flow diagram: specification and estimation of the reduced form VAR model is followed by model checking; if the model is rejected, the specification is revised; if it is not rejected, the model is used for forecasting, Granger-causality analysis and structural analysis, where structural analysis comprises impulse response analysis, forecast error variance decomposition, historical decomposition and analysis of forecast scenarios.]

A number of textbooks and review articles deal with VAR models. Examples of books are Hamilton (1994), Johansen (1995), Hatanaka (1996), Lutkepohl and Kratzig (2004) and in particular Lutkepohl (2005). More formal and more detailed treatments of some of the issues discussed in the present chapter can be found in these references. The present chapter draws heavily on Lutkepohl and Kratzig (2004), Lutkepohl (2005) and earlier survey articles by Lutkepohl (2006b, 2009).

1.2 Terminology, Notation and General Assumptions

Given the importance of stochastic trends it is useful to have a special terminology for dealing with them. A time series variable yt is called integrated of order d (I(d)) if stochastic trends can be removed by differencing the variable d times and a stochastic trend still remains after differencing only d−1 times. Defining the differencing operator ∆ such that ∆yt = yt − yt−1, the variable yt is I(d) if ∆^d yt is stationary while ∆^{d−1} yt still has a stochastic trend. A more formal definition of an integrated variable or process can be found in Johansen (1995). In this chapter all variables are assumed to be either I(0) (i.e., they do not have a stochastic trend) or I(1) (if there are stochastic trends) if not explicitly stated otherwise. A K-dimensional vector of time series variables yt = (y1t, . . . , yKt)′ is called I(d), in short yt ∼ I(d), if at least one of its components is I(d). Using this terminology, it is possible that some components of yt may be I(0) individually if yt ∼ I(1). A set of I(d) variables is called cointegrated if a linear combination exists which is of lower integration order. In that case the variables have a common trend component.
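As a minimal numerical illustration of this terminology (the series is simulated, not taken from the paper), a driftless random walk is I(1): its level accumulates shocks into a stochastic trend, while its first difference recovers the white noise input and is therefore stationary.

```python
import numpy as np

# Illustrative sketch: a driftless random walk is I(1).  Its level carries a
# stochastic trend, while the first difference is the white noise itself.
rng = np.random.default_rng(0)
u = rng.normal(size=500)   # white noise u_t
y = np.cumsum(u)           # y_t = y_{t-1} + u_t: an I(1) variable
dy = np.diff(y)            # Delta y_t = u_t: an I(0) variable

# The sample variance of the level reflects the accumulated trend and is much
# larger than that of the difference, which stays near the noise variance 1.
print(round(float(y.var()), 2), round(float(dy.var()), 2))
```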

The I(d) terminology refers only to the stochastic properties of the variables. There can also be deterministic terms. For simplicity I assume that deterministic components will usually be at most linear trends of the form E(yt) = µt = µ0 + µ1t. If µ1 = 0 there is just a constant or intercept term in the process. To further simplify matters it is occasionally assumed that there is no deterministic term so that µt = 0. Other deterministic terms which are important in practice are seasonal dummies and other dummy variables. Including them in VAR models is a straightforward extension which is not considered explicitly in this chapter.

The following matrix notation is used. The transpose, inverse, trace, determinant and rank of the matrix A are denoted by A′, A⁻¹, tr(A), det(A) and rk(A), respectively. For matrices A (n×m) and B (n×k), [A : B] or (A, B) denotes the (n×(m+k)) matrix which has A as its first m columns and B as its last k columns. For an (n×m) matrix A of full column rank (n > m), an orthogonal complement is denoted by A⊥, that is, A⊥′A = 0 and [A : A⊥] is a nonsingular square matrix. The zero matrix is the orthogonal complement of a nonsingular square matrix and an identity matrix of suitable dimension is the orthogonal complement of a zero matrix. The symbol vec denotes the column vectorization operator, ⊗ signifies the Kronecker product and In is the (n×n) identity matrix.

The sets of all integers, positive integers and complex numbers are denoted by Z, N and C, respectively. The lag operator L shifts the time index backward by one period, that is, for a time series variable or vector yt, Lyt = yt−1. Using this notation, the previously defined differencing operator may be written as ∆ = 1 − L. For a number x, |x| denotes the absolute value or modulus. A sum is defined to be zero if the lower bound of the summation index exceeds the upper bound.

The following conventions are used with respect to distributions and stochastic processes. The symbol '∼ (µ, Σ)' abbreviates 'has a distribution with mean (vector) µ and (co)variance (matrix) Σ' and N(µ, Σ) denotes a (multivariate) normal distribution with mean (vector) µ and (co)variance (matrix) Σ. Convergence in distribution is denoted as →d and plim stands for the probability limit. Independently, identically distributed is abbreviated as iid. A stochastic process ut with t ∈ Z or t ∈ N is called white noise if the ut's are iid with mean zero, E(ut) = 0, and positive definite covariance matrix Σu = E(utu′t).

The following abbreviations are used: DGP, VAR, SVAR and MA for data generation process, vector autoregression, structural vector autoregression and moving average, respectively; ML, OLS, GLS, LM, LR and MSE for maximum likelihood, ordinary least squares, generalized least squares, Lagrange multiplier, likelihood ratio and mean squared error, respectively. The natural logarithm is abbreviated as log.


2 VAR Processes

2.1 The Reduced Form

Suppose the investigator is interested in a set of K related time series variables collected in yt = (y1t, . . . , yKt)′. Given the importance of distinguishing between stochastic and deterministic components of the DGPs of economic variables, it is convenient to separate the two components by assuming that

yt = µt + xt, (2.1)

where µt is the deterministic part and xt is a purely stochastic process with zero mean. The deterministic term µt is at most a linear trend (µt = µ0 + µ1t) and may also be zero (µt = 0) or just a constant (µt = µ0) for simplicity. Deterministic trend terms have implausible implications in the context of forecasting. Hence, they are not recommended in applied VAR analysis. The issue will be discussed further in Section 6.1. The purely stochastic part, xt, may be I(1) and, hence, may include stochastic trends and cointegration relations. It has mean zero and a VAR representation. The properties of the observable process yt are determined by those of µt and xt. In particular, the order of integration and the cointegration relations are determined by xt.

Suppose the stochastic part xt is a VAR process of order p (VAR(p)) of the form

xt = A1xt−1 + · · ·+ Apxt−p + ut, (2.2)

where the Ai (i = 1, . . . , p) are (K×K) parameter matrices and the error process ut = (u1t, . . . , uKt)′ is a K-dimensional zero mean white noise process with covariance matrix E(utu′t) = Σu, that is, ut ∼ (0, Σu). Using the lag operator and defining the matrix polynomial in the lag operator A(L) = IK − A1L − · · · − ApL^p, the process (2.2) can be equivalently written as

A(L)xt = ut. (2.3)

The VAR process (2.2)/(2.3) is stable if

det A(z) = det(IK − A1z − · · · − Apz^p) ≠ 0 for z ∈ C, |z| ≤ 1. (2.4)

In other words, xt is stable if all roots of the determinantal polynomial are outside the complex unit circle. In that case xt is I(0). Under usual assumptions a stable process xt has time invariant means, variances and covariance structure and is, hence, stationary. If, however, det A(z) = 0 for z = 1 (i.e., the process has a unit root) and all other roots of the determinantal polynomial are outside the complex unit circle, then some or all of the variables are integrated, the process is, hence, nonstationary, and the variables may be cointegrated. Recall that all variables are either I(0) or I(1) by default.
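Condition (2.4) is easy to check numerically: a VAR(p) can be stacked into companion form, and the roots of det A(z) lie outside the complex unit circle exactly when all eigenvalues of the companion matrix lie strictly inside it. A sketch for a bivariate VAR(2) with illustrative coefficients (hypothetical values, not from the paper):

```python
import numpy as np

# Check the stability condition (2.4) via the companion matrix of a
# bivariate VAR(2) with made-up coefficient matrices A1, A2.
A1 = np.array([[0.5, 0.1],
               [0.4, 0.5]])
A2 = np.array([[0.0, 0.0],
               [0.25, 0.0]])
K = 2

# Companion matrix of the VAR(2): [[A1, A2], [I, 0]]
companion = np.vstack([np.hstack([A1, A2]),
                       np.hstack([np.eye(K), np.zeros((K, K))])])

moduli = np.abs(np.linalg.eigvals(companion))
stable = bool(moduli.max() < 1)
print(stable, round(float(moduli.max()), 3))
```

The eigenvalue moduli are the reciprocals of the moduli of the roots of det A(z), so a maximal modulus below one is equivalent to condition (2.4).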

Also, recall that xt is the (typically unobserved) stochastic part whereas yt is the vector of observed variables. Pre-multiplying (2.1) by A(L), that is, considering A(L)yt = A(L)µt + ut, shows that yt inherits the VAR(p) representation from xt. In other words, if µt = µ0 + µ1t, A(L)yt = ν0 + ν1t + ut or

yt = ν0 + ν1t+ A1yt−1 + · · ·+ Apyt−p + ut, (2.5)


where ν0 = (IK − ∑_{j=1}^p Aj)µ0 + (∑_{j=1}^p jAj)µ1 and ν1 = (IK − ∑_{j=1}^p Aj)µ1. Since all variables appear in levels, this form is known as the levels form of the VAR process. Alternatively, some or all variables may appear in first differences if the variables are I(1) and not cointegrated.

If the parameters νi, i = 0, 1, are unrestricted in (2.5), the variables may have quadratic trends if yt ∼ I(1). Thus, the additive model setup (2.1) imposes restrictions on the deterministic parameters in (2.5). Generally the additive setup makes it necessary to think about the deterministic terms at the beginning of the analysis and to allow for the appropriate polynomial order. Sometimes trend-adjustments are performed prior to a VAR analysis. The reason is that the stochastic part of the variables is often of main interest in econometric analysis because it is viewed as describing the behavioral relations. In that case there may be no deterministic term in the levels VAR form (2.5).

Using terminology from the simultaneous equations literature, the model (2.5) is in reduced form because all right-hand side variables are lagged or predetermined. The instantaneous relations between the variables are summarized in the residual covariance matrix. In economic analysis it is often desirable to model the contemporaneous relations between the variables directly. This may be done by setting up a structural form, which is discussed next.

2.2 Structural Forms

In structural form models contemporaneous variables may appear as explanatory variables in some equations. For example,

Ayt = ν0* + ν1*t + A1*yt−1 + · · · + Ap*yt−p + vt, (2.6)

is a structural form. Here the (K×K) matrix A reflects the instantaneous relations, νi* = Aνi (i = 0, 1) and Aj* = AAj (j = 1, . . . , p). The structural form error term vt = Aut is iid white noise with covariance matrix Σv = AΣuA′. The matrix A usually has ones on its main diagonal so that the set of equations in (2.6) can be written such that each of the variables appears on the left-hand side of one of the equations and may depend on contemporaneous values of some or all of the other variables. Moreover, A is typically chosen such that Σv is a diagonal matrix. Structural VAR models are discussed in more detail in Chapter 24 of this volume (Kilian (2011)). Therefore they are only sketched briefly here. Other expository treatments are Amisano and Giannini (1997), Watson (1994), Breitung, Bruggemann and Lutkepohl (2004) and Lutkepohl (2005).

Multiplying (2.6) by any nonsingular matrix results in a representation of the form (2.6). This shows that the parameters of the structural form (2.6) are not identified without further restrictions. Imposing restrictions on A and Σv to identify the structural form is a main focus of structural VAR (SVAR) analysis (see Chapter 24, Kilian (2011)). Often zero restrictions are placed on A directly. In other words, some variables are not allowed to have an instantaneous impact on some other variables. For example, A may be lower-triangular if there is a recursive relation between the variables.

Alternatively, in SVAR analyses researchers sometimes think of specific shocks hitting the system. A suitable structural model setup for that case is obtained by pre-multiplying (2.6) by B = A⁻¹ and considering

yt = ν0 + ν1t + A1yt−1 + · · · + Apyt−p + Bvt. (2.7)

This setup makes it easy to specify that a certain structural shock vit does not have an instantaneous effect on one of the observed variables by restricting the corresponding element of B = A⁻¹ to be zero. In other words, zero restrictions are placed on B = A⁻¹.
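For the just-identified recursive case, the lower-triangular B with unit-variance shocks is simply the Cholesky factor of Σu. A small sketch with a made-up covariance matrix (the numbers are illustrative, not estimates from the paper):

```python
import numpy as np

# Sketch of a just-identified recursive B-model.  With Sigma_v = I_K and B
# restricted to be lower triangular, Sigma_u = B B' determines B as the
# Cholesky factor of the reduced form covariance matrix.
Sigma_u = np.array([[1.0, 0.5],
                    [0.5, 2.0]])   # made-up example value

B = np.linalg.cholesky(Sigma_u)    # lower triangular with B B' = Sigma_u

# The zero in position (1,2) of B means the second structural shock has no
# instantaneous effect on the first variable.
print(B)
```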


Other popular identifying restrictions are placed on the accumulated long-run effects of shocks. For example, if some variables represent rates of change of some underlying quantity, one may postulate that a shock has no long-run effect on the level of a variable by enforcing that the accumulated changes in the variable induced by the shock add to zero. For instance, in a seminal article Blanchard and Quah (1989) consider a bivariate model consisting of output growth rates (y1t) and an unemployment rate (y2t). They assume that demand shocks have no long-run effects on output. In other words, the accumulated effects of a demand shock on the output growth rates are assumed to be zero. Such restrictions are effectively restrictions on A and/or B.

The SVAR models (2.6) and (2.7) are sometimes referred to as A- and B-models, respectively (see Lutkepohl (2005)). They can also be combined into an AB-model of the form

Ayt = ν0* + ν1*t + A1*yt−1 + · · · + Ap*yt−p + Bvt, (2.8)

which makes it easy to impose restrictions on the instantaneous effects of changes in observed variables and unobserved shocks. On the other hand, it involves many more parameters in A and B and, hence, requires more identifying restrictions. In the B- and AB-models, the residuals are usually assumed to be standardized to have identity covariance matrix, that is, Σv = IK. In that case the reduced form covariance matrix is Σu = BB′ for the B-model and Σu = A⁻¹BB′(A⁻¹)′ for the AB-model.

As mentioned earlier, identifying the structural relations between the variables or identifying the structural shocks is a main concern of SVAR analysis. Other types of information and restrictions for identification than those mentioned previously have also been proposed. For instance, sign restrictions, information from higher-frequency data or heteroskedasticity may be used (see Chapter 24, Kilian (2011) for details).

Prior to a structural analysis, a reduced form model is usually constructed as a valid description of the DGP. The stages of reduced form VAR model construction are discussed in the following. Before model specification is considered, estimation of VAR models will be discussed because estimation is typically needed at the specification stage.

3 Estimation of VAR Models

Reduced form VAR models can be estimated with standard methods. Classical least squares and maximum likelihood (ML) methods are discussed in Section 3.1 and Bayesian estimation is considered in Section 3.2. Estimation of structural models is treated in Section 3.3.

3.1 Classical Estimation of Reduced Form VARs

Consider the levels VAR(p) model (2.5) written in the more compact form

yt = [ν0, ν1, A1, . . . , Ap]Zt−1 + ut, (3.1)

where Zt−1 = (1, t, y′t−1, . . . , y′t−p)′. The deterministic terms may be adjusted accordingly if there is just a constant in the model or no deterministic component at all. Given a sample of size T, y1, . . . , yT, and p presample vectors, y−p+1, . . . , y0, the parameters can be estimated efficiently by ordinary least squares (OLS) for each equation separately. The estimator is easily seen to be

[ν̂0, ν̂1, Â1, . . . , Âp] = (∑_{t=1}^T ytZ′t−1)(∑_{t=1}^T Zt−1Z′t−1)⁻¹. (3.2)

This estimator is identical to the generalized least squares (GLS) estimator if no restrictions are imposed on the parameters. For a normally distributed (Gaussian) process yt, where ut ∼ N(0, Σu), this estimator is also identical to the ML estimator, conditional on the initial presample values. Thus, the estimator has the usual desirable asymptotic properties of standard estimators. It is asymptotically normally distributed with the smallest possible asymptotic covariance matrix and the usual inference procedures are available if the process is stable. In other words, in this case t-statistics can be used for testing individual coefficients and for setting up confidence intervals. Moreover, F-tests can be used for testing statistical hypotheses for sets of parameters. Of course, in the present framework these procedures are only valid asymptotically and not in small samples.
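The multivariate LS formula (3.2) can be sketched in a few lines; the bivariate VAR(1), its coefficients and the sample size below are hypothetical choices for illustration, not taken from the paper.

```python
import numpy as np

# Sketch of equationwise OLS via the multivariate LS formula (3.2) for a
# bivariate VAR(1) with intercept, estimated on simulated data.
rng = np.random.default_rng(1)
K, T = 2, 2000
A1 = np.array([[0.5, 0.1],
               [0.4, 0.5]])
nu0 = np.array([0.2, -0.1])

y = np.zeros((T + 1, K))
for t in range(1, T + 1):
    y[t] = nu0 + A1 @ y[t - 1] + rng.normal(size=K)

Y = y[1:].T                            # columns y_1, ..., y_T
Z = np.vstack([np.ones(T), y[:-1].T])  # columns Z_{t-1} = (1, y'_{t-1})'

# [nu0_hat, A1_hat] = (sum y_t Z'_{t-1}) (sum Z_{t-1} Z'_{t-1})^{-1}
coef = (Y @ Z.T) @ np.linalg.inv(Z @ Z.T)
nu0_hat, A1_hat = coef[:, 0], coef[:, 1:]
print(np.round(A1_hat, 2))
```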

If there are integrated variables so that yt ∼ I(1), the process is not stable and the variables may be cointegrated. In that case the OLS/ML estimator can still be used and it is still asymptotically normal under general conditions (see Park and Phillips (1988, 1989), Sims, Stock and Watson (1990), Lutkepohl (2005, Chapter 7)). However, in that case the covariance matrix of the asymptotic distribution is singular because some estimated parameters or linear combinations of them converge at a faster rate than the usual √T rate when the sample size goes to infinity. This result implies that t-, χ²- and F-tests for inference regarding the VAR parameters may be invalid asymptotically (Toda and Phillips (1993)). Although these properties require caution in doing inference for integrated processes, there are many situations where standard inference still holds (see Toda and Yamamoto (1995), Dolado and Lutkepohl (1996), Inoue and Kilian (2002a)). In particular, asymptotic inference on impulse responses as discussed in Section 8.1 remains valid if the order of the VAR process is greater than one.

If restrictions are imposed on the parameters, OLS estimation may be inefficient. In that case GLS estimation may be beneficial. Let α = vec[ν0, ν1, A1, . . . , Ap] and suppose that there are linear restrictions on the parameters, such as zero restrictions which exclude some of the lagged variables from some of the equations. Linear restrictions can often be written in the form

α = Rγ, (3.3)

where R is a suitable, known ((K²p + 2K) × M) restriction matrix with rank M which typically consists of zeros and ones, and γ is the (M × 1) vector of unrestricted parameters. The GLS estimator for γ is then

γ̂ = [R′(∑_{t=1}^T Zt−1Z′t−1 ⊗ Σu⁻¹)R]⁻¹ R′ vec(Σu⁻¹ ∑_{t=1}^T ytZ′t−1). (3.4)

The estimator γ̂ has standard asymptotic properties if yt ∼ I(0), that is, the GLS estimator is consistent and asymptotically normally distributed and the usual methods for inference are valid asymptotically.

In practice, the white noise covariance matrix is usually unknown and has to be replaced by an estimator based on an unrestricted estimation of the model. The resulting feasible GLS estimator, say γ̂̂, has the same asymptotic properties as the GLS estimator under general conditions. The corresponding feasible GLS estimator of α, α̂̂ = Rγ̂̂, is also consistent and asymptotically normal and allows for standard asymptotic inference. For Gaussian white noise ut, ML estimation may be used alternatively. Its asymptotic properties are the same as those of the GLS estimator under standard assumptions.

For I(1) processes, a specific analysis of the integration and cointegration properties of the left-hand and right-hand side variables of the individual equations is necessary to determine the asymptotic properties of the estimators and the associated inference procedures.
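A feasible GLS step along the lines of (3.3)-(3.4) can be sketched as follows; the DGP and the particular zero restriction are hypothetical illustrations, not taken from the paper.

```python
import numpy as np
from numpy.linalg import inv

# Sketch of feasible GLS under a zero restriction, following (3.3)-(3.4).
# The DGP is a made-up bivariate VAR(1) without deterministic term in which
# y2 does not enter the first equation (element (1,2) of A1 is zero).
rng = np.random.default_rng(2)
K, T = 2, 1000
A1 = np.array([[0.5, 0.0],
               [0.4, 0.5]])

y = np.zeros((T + 1, K))
for t in range(1, T + 1):
    y[t] = A1 @ y[t - 1] + rng.normal(size=K)
Y, Z = y[1:].T, y[:-1].T          # here Z_{t-1} = y_{t-1}

# Step 1: unrestricted OLS and the residual covariance estimate
A_ols = (Y @ Z.T) @ inv(Z @ Z.T)
U = Y - A_ols @ Z
Sig_u = U @ U.T / T

# alpha = vec(A1) stacks columns; element (1,2) of A1 sits at position 2
# (0-based, column-major), so R drops that column of I_4
R = np.delete(np.eye(K * K), 2, axis=1)

# Step 2: the GLS formula (3.4) with the estimated covariance matrix
lhs = R.T @ np.kron(Z @ Z.T, inv(Sig_u)) @ R
rhs = R.T @ (inv(Sig_u) @ Y @ Z.T).reshape(-1, order="F")
gamma = inv(lhs) @ rhs
A_gls = (R @ gamma).reshape(K, K, order="F")
print(np.round(A_gls, 2))
```

The restricted coefficient is exactly zero by construction, and the remaining coefficients are close to the DGP values.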

3.2 Bayesian Estimation of Reduced Form VARs

Standard Bayesian methods for estimating linear regression models can be applied to estimating the parameters of reduced form VAR models. They are not discussed here in detail because they are considered elsewhere in this volume. In the VAR literature specific priors have been used, however, which may be worth noting at this point. Assuming a normal distribution for the residuals and, hence, for the observed yt, together with a normal-Wishart prior distribution for the VAR coefficients, results in a normal-Wishart posterior distribution. Such a setup is rather common in the SVAR literature (see Uhlig (2005, Appendix B)). The so-called Minnesota prior is a specific example of a prior which has been used quite often in practice (see Doan, Litterman and Sims (1984), Litterman (1986)). It shrinks the VAR towards a random walk for each of the variables. Extensions and alternatives were proposed by Kadiyala and Karlsson (1997), Villani (2005), Sims, Waggoner and Zha (2008), Giannone, Lenza and Primiceri (2010) and others. Other, more recent proposals include shrinking towards some dynamic stochastic general equilibrium model (e.g., Ingram and Whiteman (1994) and Del Negro and Schorfheide (2004)). A more detailed exposition of Bayesian methods in VAR analysis may be found in Canova (2007, Chapters 9-11).
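A stylized sketch of this shrinkage idea follows. It is much simpler than the actual Minnesota prior: a single tightness parameter lam, Σu treated as known, and a normal prior for vec(A1) centered on the identity matrix; all numbers are hypothetical illustrations.

```python
import numpy as np
from numpy.linalg import inv

# Stylized shrinkage towards a random walk (NOT the full Minnesota prior):
# conjugate normal updating of vec(A1) with known Sigma_u.
rng = np.random.default_rng(3)
K, T, lam = 2, 50, 0.2
A1 = np.array([[0.9, 0.0],
               [0.0, 0.9]])        # persistent illustrative DGP
Sig_u = np.eye(K)

y = np.zeros((T + 1, K))
for t in range(1, T + 1):
    y[t] = A1 @ y[t - 1] + rng.normal(size=K)
Y, Z = y[1:].T, y[:-1].T

prior_mean = np.eye(K).reshape(-1, order="F")   # random walk: A1 = I_K
prior_prec = np.eye(K * K) / lam ** 2           # tighter for smaller lam

# Posterior precision and mean of vec(A1) under the normal likelihood
post_prec = prior_prec + np.kron(Z @ Z.T, inv(Sig_u))
post_mean = inv(post_prec) @ (prior_prec @ prior_mean
                              + (inv(Sig_u) @ Y @ Z.T).reshape(-1, order="F"))
A1_bayes = post_mean.reshape(K, K, order="F")
A1_ols = (Y @ Z.T) @ inv(Z @ Z.T)
print(np.round(A1_bayes, 2))
```

By construction the posterior mean is a precision-weighted average of the OLS estimate and the random walk prior mean, so it lies closer to the identity matrix than the OLS estimate does.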

3.3 Estimation of Structural VARs

Properly identified structural form VAR models are also usually estimated by least squares, ML or Bayesian methods. The specific estimation algorithm depends to some extent on the type of restrictions used for identification. For example, if a just-identified A-model is used with ones on the main diagonal and a diagonal residual covariance matrix Σv, equationwise OLS can be used for estimation. For the B-model (2.7) without restrictions on ν0, ν1, A1, . . . , Ap, the latter parameters can be concentrated out of the likelihood function by replacing them with their OLS estimators, using Σu = BB′ and estimating B by maximizing the concentrated Gaussian log-likelihood

l(B) = constant − (T/2) log det(B)² − (T/2) tr(B′⁻¹B⁻¹Σ̂u), (3.5)

where Σ̂u = T⁻¹ ∑_{t=1}^T ûtû′t is the estimator of Σu based on the OLS residuals (cf. Breitung et al. (2004)). If the actual distribution of yt (and, hence, of ut) is not normal, the resulting estimators are quasi- or pseudo-ML estimators. They still allow for standard asymptotic inference under general conditions.
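The shape of the concentrated log-likelihood (3.5) can be verified numerically: any B satisfying BB′ = Σ̂u, such as the Cholesky factor of the recursive model, attains the maximum. A sketch with a made-up residual covariance matrix:

```python
import numpy as np

# Numerical sketch of the concentrated log-likelihood (3.5).  Sig_u stands
# in for the OLS residual covariance estimate (made-up example values).
T = 100
Sig_u = np.array([[1.0, 0.3],
                  [0.3, 0.8]])

def loglik(B):
    # l(B) = const - (T/2) log det(B)^2 - (T/2) tr(B'^{-1} B^{-1} Sig_u)
    Binv = np.linalg.inv(B)
    return (-T / 2 * np.log(np.linalg.det(B) ** 2)
            - T / 2 * np.trace(Binv.T @ Binv @ Sig_u))

B_chol = np.linalg.cholesky(Sig_u)       # satisfies B B' = Sig_u
B_pert = B_chol + np.diag([0.1, -0.1])   # a B with B B' != Sig_u

print(round(loglik(B_chol), 2), round(loglik(B_pert), 2))
```

At any B with BB′ = Σ̂u the trace term equals K, so the maximized value is −(T/2)(log det Σ̂u + K), and perturbing B away from that set lowers the likelihood.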

In the AB-model the concentrated log-likelihood function in terms of A and B is

l(A, B) = constant + (T/2) log det(A)² − (T/2) log det(B)² − (T/2) tr(A′B′⁻¹B⁻¹AΣ̂u). (3.6)


Numerical methods can be used for optimizing the functions in (3.5) and (3.6) with respect to the free parameters in B or in A and B. The resulting estimators have the usual asymptotic properties of ML estimators (see, e.g., Lutkepohl (2005, Chapter 9) for details). Hence, asymptotic inference proceeds in the usual way. Alternatively, one may use Bayesian estimation methods (see, e.g., Sims et al. (2008)). The estimates will be of importance in the structural VAR analysis discussed in Section 8 and Chapter 24 (Kilian (2011)).

4 Model Specification

Model specification in the present context involves selecting the VAR order and possibly imposing restrictions on the VAR parameters. Notably, zero restrictions on the parameter matrices may be desirable because the number of parameters in a VAR model increases with the square of the number of variables. Lag order specification is considered next, and some comments on placing zero restrictions on the parameters are provided at the end of this section.

The VAR order is typically chosen by sequential testing procedures or by model selection criteria. Sequential testing proceeds by specifying a maximum reasonable lag order, say pmax, and then testing the following sequence of null hypotheses: H0: A_pmax = 0, H0: A_{pmax−1} = 0, etc. The procedure stops when the null hypothesis is rejected for the first time. The order is then chosen accordingly. For stationary processes the usual Wald or LR χ² tests for parameter restrictions can be used in this procedure. If there are I(1) variables these tests are also asymptotically valid as long as the null hypothesis H0: A1 = 0 is not tested. Unfortunately, the small sample distributions of the tests may be quite different from their asymptotic counterparts, in particular for systems with more than a couple of variables (e.g., Lutkepohl (2005, Section 4.3.4)). Therefore it may be useful to consider small sample adjustments, possibly based on bootstrap methods (e.g., Li and Maddala (1996), Berkowitz and Kilian (2000)).

Alternatively, model selection criteria can be used. Some of them have the general form

C(m) = log det(Σm) + cTϕ(m), (4.1)

where Σm = T^{−1} ∑_{t=1}^{T} ût û′t is the OLS residual covariance matrix estimator for a reduced form VAR model of order m, ϕ(m) is a function of the order m which penalizes large VAR orders, and cT is a sequence which may depend on the sample size and identifies the specific criterion. Popular examples are Akaike's information criterion (Akaike (1973, 1974)),

AIC(m) = log det(Σm) + (2/T) mK²,

where cT = 2/T, the Hannan-Quinn criterion (Hannan and Quinn (1979), Quinn (1980)),

HQ(m) = log det(Σm) + (2 log log T/T) mK²,

with cT = 2 log log T/T, and the Schwarz (or Rissanen) criterion (Schwarz (1978), Rissanen (1978)),

SC(m) = log det(Σm) + (log T/T) mK²,


with cT = log T/T. In all these criteria, ϕ(m) = mK² is the number of VAR parameters in a model with order m. The VAR order is chosen such that the respective criterion is minimized over the possible orders m = 0, . . . , pmax. Among these three criteria, AIC always suggests the largest order, SC chooses the smallest order and HQ is in between (Lütkepohl (2005, Chapters 4 and 8)). Of course, the criteria may all suggest the same lag order. The HQ and SC criteria are both consistent, that is, under general conditions the order estimated with these criteria converges in probability or almost surely to the true VAR order p if pmax is at least as large as the true lag order. AIC tends to overestimate the order asymptotically with a small probability. These results hold for both I(0) and I(1) processes (Paulsen (1984)).
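For concreteness, the three criteria can be computed by multivariate OLS. The following sketch (illustrative code, not from the paper; function names are ad hoc) evaluates AIC, HQ and SC over m = 0, . . . , pmax on a common effective sample:

```python
import numpy as np

def fit_var(y, p):
    """Multivariate OLS for a VAR(p) with intercept; returns coefficients and
    the residual covariance estimator Sigma_m = T^{-1} sum u_t u_t'."""
    T, K = y.shape
    Y = y[p:]
    Z = np.hstack([np.ones((T - p, 1))] + [y[p - i:T - i] for i in range(1, p + 1)])
    B = np.linalg.lstsq(Z, Y, rcond=None)[0]
    U = Y - Z @ B
    return B, U.T @ U / len(U)

def select_order(y, pmax):
    """Minimize AIC, HQ and SC over m = 0,...,pmax on the same effective sample."""
    K = y.shape[1]
    T = y.shape[0] - pmax                      # effective sample size
    crit = {"AIC": [], "HQ": [], "SC": []}
    for m in range(pmax + 1):
        _, Sigma = fit_var(y[pmax - m:], m)    # same T for every order m
        ld = np.log(np.linalg.det(Sigma))
        penalty = m * K**2                     # phi(m) = m K^2
        crit["AIC"].append(ld + (2 / T) * penalty)
        crit["HQ"].append(ld + (2 * np.log(np.log(T)) / T) * penalty)
        crit["SC"].append(ld + (np.log(T) / T) * penalty)
    return {name: int(np.argmin(v)) for name, v in crit.items()}

# illustration on a simulated bivariate VAR(1)
rng = np.random.default_rng(0)
A1 = np.array([[0.5, 0.1], [0.0, 0.4]])
y = np.zeros((500, 2))
for t in range(1, 500):
    y[t] = A1 @ y[t - 1] + rng.standard_normal(2)
orders = select_order(y, pmax=4)
```

Because the penalty terms are ordered for all but very small T, the selected orders satisfy p̂(SC) ≤ p̂(HQ) ≤ p̂(AIC), which the simulated example reproduces.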

The lag order obtained with sequential testing or model selection criteria depends to some extent on the choice of pmax. If pmax is chosen too small, an appropriate model may not be in the set of possibilities, while a very large pmax may result in a spuriously large order estimate. At an early stage of the analysis, using a moderate value for pmax appears to be a sensible strategy. An inadequate choice should be detected at the model checking stage (see Section 5).

Once the model order is determined, zero restrictions may be imposed on the VAR coefficient matrices to reduce the number of parameters. Standard testing procedures can be used for that purpose. The number of possible restrictions may be very large, however, and searching over all possibilities may result in excessive computations. Therefore a number of shortcuts have been proposed in the related literature under the name of subset model selection procedures (see Lütkepohl (2005, Section 5.2.8)).

If a model is selected by some testing or model selection procedure, that model is typically treated as representing the true DGP in the subsequent statistical analysis. Recent research is devoted to the problems and possible errors associated with such an approach (e.g., Leeb and Pötscher (2005)). This literature points out that the actual distribution, which does not condition on the model selected by some statistical procedure, may be quite different from the conditional one. Suppose, for example, that the VAR order is selected by AIC and the order chosen by this criterion is p̂. Then a typical approach in practice is to treat a VAR(p̂) model as the true DGP and perform all subsequent analysis under this assumption. Such a conditional analysis can be misleading even if the true order coincides with p̂ because the properties of the estimators for the VAR coefficients are affected by the post-model-selection step. Conditioning on p̂ ignores that this quantity is also a random variable based on the same data as the estimators of the VAR parameters. Since no general procedures exist for correcting the error resulting from this simplification, there is little to recommend for improving applied work in this respect.

5 Model Checking

Procedures for checking whether the VAR model represents the DGP of the variables adequately range from formal tests of the underlying assumptions to informal procedures such as inspecting plots of residuals and autocorrelations. Since a reduced form underlies every structural form, model checking usually focuses on reduced form models. If a specific reduced form model is not an adequate representation of the DGP, any structural form based on it cannot represent the DGP well. Formal tests for residual autocorrelation, nonnormality and conditional heteroskedasticity for reduced form VARs are briefly summarized in the following. For other procedures see, e.g., Lütkepohl (2004).


5.1 Tests for Residual Autocorrelation

Portmanteau and Breusch-Godfrey LM tests are standard tools for checking residual autocorrelation in VAR models. The null hypothesis of the portmanteau test is that all residual autocovariances are zero, that is, H0 : E(ut u′t−i) = 0 (i = 1, 2, . . . ). The alternative is that at least one autocovariance and, hence, one autocorrelation is nonzero. The test statistic is based on the residual autocovariances, Ĉj = T^{−1} ∑_{t=j+1}^{T} ût û′t−j, where the ût are the mean-adjusted estimated residuals. The portmanteau statistic is given by

Qh = T ∑_{j=1}^{h} tr(Ĉ′j Ĉ0^{−1} Ĉj Ĉ0^{−1}),   (5.1)

or the modified version

Q*h = T² ∑_{j=1}^{h} (T − j)^{−1} tr(Ĉ′j Ĉ0^{−1} Ĉj Ĉ0^{−1})

may be used. The two statistics have the same asymptotic properties. For an unrestricted stationary VAR(p) process their null distributions can be approximated by a χ²(K²(h − p)) distribution if T and h approach infinity such that h/T → 0. For VAR models with parameter restrictions, the degrees of freedom of the approximate χ² distribution are obtained as the difference between the number of (non-instantaneous) autocovariances included in the statistic (K²h) and the number of estimated VAR parameters (e.g., Ahn (1988), Hosking (1980, 1981a, 1981b), Li and McLeod (1981) or Lütkepohl (2005, Section 4.4)). Brüggemann, Lütkepohl and Saikkonen (2006) show that this approximation is unsatisfactory for integrated and cointegrated processes. For such processes the degrees of freedom also depend on the cointegrating rank. Thus, portmanteau tests are not recommended for levels VAR processes with unknown cointegrating rank.

The choice of h is crucial for the small sample properties of the test. If h is chosen too small, the χ² approximation to the null distribution may be very poor, while a large h reduces the power of the test. Using a number of different h values is not uncommon in practice.
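For concreteness, Qh and Q*h can be computed directly from the residual autocovariance matrices; the sketch below (illustrative code, not from the paper) follows (5.1):

```python
import numpy as np

def portmanteau(u, h):
    """Q_h and the adjusted Q*_h from residuals u (T x K). Under H0 both are
    approximately chi^2(K^2(h - p)) for an unrestricted stationary VAR(p)."""
    u = u - u.mean(axis=0)                      # mean-adjusted residuals
    T = len(u)
    C0_inv = np.linalg.inv(u.T @ u / T)         # C_0^{-1}
    q, q_star = 0.0, 0.0
    for j in range(1, h + 1):
        Cj = u[j:].T @ u[:-j] / T               # C_j = T^{-1} sum u_t u_{t-j}'
        term = np.trace(Cj.T @ C0_inv @ Cj @ C0_inv)
        q += term
        q_star += term / (T - j)
    return T * q, T**2 * q_star

# white-noise residuals: both statistics should be moderate relative to
# the chi-squared degrees of freedom K^2 h = 40
rng = np.random.default_rng(1)
u = rng.standard_normal((500, 2))
q, q_star = portmanteau(u, h=10)
```

Note that Q*h ≥ Qh always holds, since each trace term is nonnegative and T²/(T − j) > T for j ≥ 1.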

The portmanteau test should be applied primarily to test for autocorrelation of high order. For low order autocorrelation the Breusch-Godfrey LM test is more suitable. It may be viewed as a test for zero coefficient matrices in a VAR model for the residuals,

ut = B1ut−1 + · · ·+Bhut−h + et.

The quantity et denotes a white noise error term. Thus, a test of

H0 : B1 = · · · = Bh = 0 versus H1 : Bi ≠ 0 for at least one i ∈ {1, . . . , h}

may be used for checking that ut is white noise. The precise form of the statistic can be found, e.g., in Lütkepohl (2005, Section 4.4.4). It has an asymptotic χ²(hK²) distribution under the null hypothesis for both I(0) and I(1) systems (Brüggemann et al. (2006)). As a consequence, the LM test is applicable for levels VAR processes with unknown cointegrating rank.


5.2 Other Popular Tests for Model Adequacy

Nonnormality tests are often used for model checking, although normality is not a necessary condition for the validity of many of the statistical procedures related to VAR models. However, nonnormality of the residuals may indicate other model deficiencies such as nonlinearities or structural change. Multivariate normality tests are often applied to the residual vector of the VAR model, and univariate versions are used to check normality of the errors of the individual equations. The standard tests check whether the third and fourth moments of the residuals are in line with a normal distribution, as proposed by Lomnicki (1961) and Jarque and Bera (1987) for univariate models. For details see Lütkepohl (2005, Section 4.5), and for small sample corrections see Kilian and Demiroglu (2000).

Conditional heteroskedasticity is often a concern for models based on data with monthly or higher frequency. Therefore suitable univariate and multivariate tests are available to check for such features in the residuals of VAR models. Again, much of the analysis can be done even if there is conditional heteroskedasticity. Notice that the VAR model represents the conditional mean of the variables, which is often of primary interest. Still, it may be useful to check for conditional heteroskedasticity to better understand the properties of the underlying data and to improve inference. Also, heteroskedastic residuals can indicate structural changes. If conditional heteroskedasticity is found in the residuals, modelling it by multivariate GARCH models or using heteroskedasticity-robust inference procedures may be useful to avoid distortions in the estimators of the conditional mean parameters. For a proposal to robustify inference against conditional heteroskedasticity see Gonçalves and Kilian (2004).

There are a number of tests for structural stability which check whether there are changes in the VAR parameters or the residual covariances throughout the sample period. Prominent examples are so-called Chow tests. They consider the null hypothesis of time-invariant parameters throughout the sample period against the possibility of a change in the parameter values in some period TB, say. One possible test version compares the likelihood maximum of the constant parameter model to the one with different parameter values before and after period TB. If the model is time invariant, the resulting LR statistic has an asymptotic χ² distribution under standard assumptions. See Lütkepohl (2005, Section 4.6) for details and other tests for structural stability of VARs.

Stability tests are sometimes performed for a range of potential break points TB. Using the maximum of the test statistics, that is, rejecting stability if one of the test statistics exceeds some critical value, the test is no longer asymptotically χ² but has a different asymptotic distribution (see Andrews (1993), Andrews and Ploberger (1994) and Hansen (1997)).

If a reduced form VAR model has passed the adequacy tests, it can be used for forecasting and structural analysis, which are treated next.

6 Forecasting

Since reduced form VAR models represent the conditional mean of a stochastic process, they lend themselves to forecasting. For simplicity, forecasting with known VAR processes will be discussed first, and then extensions for estimated processes will be considered.


6.1 Forecasting Known VAR Processes

If yt is generated by a VAR(p) process (2.5), the conditional expectation of yT+h given yt, t ≤ T, is

yT+h|T = E(yT+h | yT , yT−1, . . . ) = ν0 + ν1(T + h) + A1 yT+h−1|T + · · · + Ap yT+h−p|T ,   (6.1)

where yT+j|T = yT+j for j ≤ 0. If the white noise process ut is iid, yT+h|T is the optimal, minimum mean squared error (MSE) h-step ahead forecast in period T. The forecasts can easily be computed recursively for h = 1, 2, . . . . The forecast error associated with an h-step forecast is

yT+h − yT+h|T = uT+h + Φ1uT+h−1 + · · ·+ Φh−1uT+1, (6.2)

where the Φi matrices may be obtained recursively as

Φi = ∑_{j=1}^{i} Φi−j Aj ,   i = 1, 2, . . . ,   (6.3)

with Φ0 = IK and Aj = 0 for j > p (e.g., Lütkepohl (2005, Chapter 2)). In other words, the Φi are the coefficient matrices of the infinite order polynomial in the lag operator A(L)−1 = ∑_{j=0}^{∞} Φj L^j. Obviously, the reduced form VAR residual ut is the forecast error for a 1-step forecast in period t − 1. The forecasts are unbiased, that is, the errors have mean zero, and the forecast error covariance or MSE matrix is

Σy(h) = E[(yT+h − yT+h|T)(yT+h − yT+h|T)′] = ∑_{j=0}^{h−1} Φj Σu Φ′j ,   (6.4)

that is, yT+h − yT+h|T ∼ (0, Σy(h)). In fact, the conditional expectation in (6.1) is obtained whenever the conditional expectation of uT+h is zero or, in other words, if ut is a martingale difference sequence. Even if the ut's are just uncorrelated and do not have conditional mean zero, the forecasts obtained recursively from (6.1) are still best linear forecasts, but they may not be minimum MSE forecasts in a larger class which includes nonlinear forecasts.

These results are valid even if the VAR process has I(1) components. However, if yt is I(0) (stationary), the forecast MSEs are bounded as the horizon h goes to infinity. In contrast, for I(1) processes the forecast MSE matrices are unbounded and, hence, forecast uncertainty increases without bound as the forecast horizon grows.
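The recursions above are straightforward to implement. The following sketch (illustrative code, ignoring deterministic terms) computes the Φi from (6.3), recursive point forecasts in the spirit of (6.1), and the MSE matrix (6.4) for a known VAR:

```python
import numpy as np

def ma_coefficients(A_list, n):
    """Phi_0,...,Phi_{n-1} from Phi_i = sum_{j=1}^i Phi_{i-j} A_j (eq. 6.3)."""
    K = A_list[0].shape[0]
    phis = [np.eye(K)]
    for i in range(1, n):
        phi = np.zeros((K, K))
        for j in range(1, min(i, len(A_list)) + 1):   # A_j = 0 for j > p
            phi += phis[i - j] @ A_list[j - 1]
        phis.append(phi)
    return phis

def forecast(y, A_list, h):
    """Recursive forecasts y_{T+1|T},...,y_{T+h|T}, no deterministic terms."""
    p = len(A_list)
    hist = [y[-i] for i in range(1, p + 1)]           # y_T, y_{T-1}, ..., y_{T-p+1}
    out = []
    for _ in range(h):
        f = sum(A @ hist[j] for j, A in enumerate(A_list))
        out.append(f)
        hist = [f] + hist[:p - 1]
    return np.array(out)

def forecast_mse(A_list, Sigma_u, h):
    """MSE matrix Sigma_y(h) = sum_{j=0}^{h-1} Phi_j Sigma_u Phi_j' (eq. 6.4)."""
    return sum(P @ Sigma_u @ P.T for P in ma_coefficients(A_list, h))

A1 = np.array([[0.5, 0.1], [0.0, 0.4]])
y = np.array([[0.3, -0.2], [1.0, 2.0]])               # ..., y_{T-1}, y_T
fc = forecast(y, [A1], h=3)
mse2 = forecast_mse([A1], np.eye(2), h=2)
```

For a VAR(1) the recursions collapse to yT+h|T = A1^h yT and Σy(h) = ∑_{j<h} A1^j Σu A1′^j, which provides a simple check.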

Notice the major difference between deterministic and stochastic trends in a VAR model. The deterministic time trend in (6.1) does not add to the inaccuracy of the forecasts in this framework, where no estimation uncertainty is present, while stochastic trends have a substantial impact on the forecast uncertainty. Many researchers find it implausible that trending behavior is not reflected in the uncertainty of long-term forecasts. Therefore deterministic trend components should be used with caution. In particular, higher order polynomial trends or even linear trends should be avoided unless there are very good reasons for them. Using them just to improve the fit of a VAR model can be counterproductive from a forecasting point of view.

For Gaussian VAR processes yt with ut ∼ iid N(0, Σu), the forecast errors are also multivariate normal, yT+h − yT+h|T ∼ N(0, Σy(h)), and forecast intervals can be set up in


the usual way. For non-Gaussian processes yt with unknown distribution, other methods for setting up forecast intervals are called for; for instance, bootstrap methods may be considered (see, e.g., Findley (1986), Masarotto (1990), Grigoletto (1998), Kabaila (1993), Kim (1999) and Pascual, Romo and Ruiz (2004)).

6.2 Forecasting Estimated VAR Processes

If the DGP is unknown and, hence, the VAR model only approximates the true DGP, the previously discussed forecasts will not be available. Let ŷT+h|T denote a forecast based on a VAR model which is specified and estimated based on the available data. Then the forecast error is

yT+h − ŷT+h|T = (yT+h − yT+h|T) + (yT+h|T − ŷT+h|T).   (6.5)

If the true DGP is a VAR process, the first term on the right-hand side is ∑_{j=0}^{h−1} Φj uT+h−j. It includes residuals ut with t > T only, whereas the second term involves just yT , yT−1, . . . , if only variables up to time T have been used for model specification and estimation. Consequently, the two terms are independent or at least uncorrelated, so that the MSE matrix has the form

Σŷ(h) = E[(yT+h − ŷT+h|T)(yT+h − ŷT+h|T)′] = Σy(h) + MSE(yT+h|T − ŷT+h|T).   (6.6)

If the VAR model specified for yt properly represents the DGP, the last term on the right-hand side approaches zero as the sample size gets large because the difference yT+h|T − ŷT+h|T vanishes asymptotically in probability under standard assumptions. Thus, if the theoretical model fully captures the DGP, specification and estimation uncertainty is not important asymptotically. On the other hand, in finite samples the precision of the forecasts depends on the precision of the estimators. Suitable correction factors for MSEs and forecast intervals for stationary processes are given by Baillie (1979), Reinsel (1980), Samaranayake and Hasza (1988) and Lütkepohl (2005, Chapter 3). A discussion of extensions with a number of further references may be found in Lütkepohl (2009).

7 Granger-Causality Analysis

Because VAR models describe the joint generation process of a number of variables, they can be used for investigating relations between the variables. A specific type of relation was pointed out by Granger (1969) and is known as Granger-causality. Granger called a variable y2t causal for a variable y1t if the information in past and present values of y2t is helpful for improving the forecasts of y1t. This concept is especially easy to implement in a VAR framework. Suppose that y1t and y2t are generated by a bivariate VAR(p) process,

[ y1t ]     p  [ α11,i  α12,i ] [ y1,t−i ]
[ y2t ]  =  ∑  [ α21,i  α22,i ] [ y2,t−i ]  + ut .
           i=1

Then y2t is not Granger-causal for y1t if and only if α12,i = 0, i = 1, 2, . . . , p. In other words, y2t is not Granger-causal for y1t if the former variable does not appear in the y1t equation of the model. This result holds for both stationary and integrated processes.


Because Granger-noncausality is characterized by zero restrictions on the levels VAR representation of the DGP, testing for it becomes straightforward. Standard Wald χ²- or F-tests can be applied. If yt contains integrated and possibly cointegrated variables, these tests may not have standard asymptotic properties, however (Toda and Phillips (1993)). For the presently considered case there is a simple fix: the problem of a nonstandard asymptotic distribution of Wald tests for zero restrictions can be resolved by adding an extra redundant lag when estimating the parameters of the process and testing the relevant null hypothesis on the matrices A1, . . . , Ap only (see Toda and Yamamoto (1995) and Dolado and Lütkepohl (1996)). Since a VAR(p + 1) is an appropriate model with Ap+1 = 0 if the true VAR order is p, the procedure is sound. It will not be fully efficient, however, due to the redundant VAR lag.
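In the stationary bivariate case, the Wald test amounts to a joint zero restriction on the p coefficients of the lags of y2t in the y1t equation. A single-equation OLS sketch (illustrative code; it omits the lag augmentation needed for integrated variables):

```python
import numpy as np

def granger_wald(y, p, caused=0, causing=1):
    """Wald statistic for H0: all p lag coefficients of y[:, causing] are zero
    in the equation for y[:, caused]; approximately chi^2(p) under H0 for a
    stationary VAR(p)."""
    T, K = y.shape
    Y = y[p:, caused]
    Z = np.hstack([np.ones((T - p, 1))] + [y[p - i:T - i] for i in range(1, p + 1)])
    beta, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    resid = Y - Z @ beta
    sigma2 = resid @ resid / (len(Y) - Z.shape[1])
    V = sigma2 * np.linalg.inv(Z.T @ Z)               # estimated Cov(beta)
    idx = [1 + (i - 1) * K + causing for i in range(1, p + 1)]
    b = beta[idx]
    return float(b @ np.linalg.solve(V[np.ix_(idx, idx)], b))

# simulate: y2 Granger-causes y1 (alpha12 = 0.5), but not vice versa
rng = np.random.default_rng(2)
A1 = np.array([[0.5, 0.5], [0.0, 0.4]])
y = np.zeros((500, 2))
for t in range(1, 500):
    y[t] = A1 @ y[t - 1] + rng.standard_normal(2)
w_21 = granger_wald(y, p=1, caused=0, causing=1)      # expected to be large
w_12 = granger_wald(y, p=1, caused=1, causing=0)      # expected to be small
```

Equation-by-equation OLS is appropriate here because all equations of the unrestricted VAR share the same regressors.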

If there are more than two variables, the conditions for noncausality or causality become more complicated even if the DGP is a VAR process (see, e.g., Lütkepohl (1993) and Dufour and Renault (1998)). In practice, Granger-causality is therefore often investigated for bivariate processes. It should be clear, however, that Granger-causality depends on the information set considered. In other words, even if a variable is Granger-causal in a bivariate model, it may not be Granger-causal in a larger model involving more variables. For instance, there may be a variable driving both variables of a bivariate process. When that variable is added to the model, a bivariate causal structure may disappear. In turn, it is also possible that a variable is noncausal for another one in a bivariate model and becomes causal if the information set is extended to include other variables as well. There are also a number of other limitations of the concept of Granger-causality which have stimulated an extensive discussion of the concept and have prompted alternative definitions. For further discussion and references see Lütkepohl (2005, Section 2.3.1), and for extensions to testing for Granger-causality in infinite order VAR processes see Lütkepohl and Poskitt (1996) and Saikkonen and Lütkepohl (1996).

8 Structural Analysis

Traditionally the interaction between economic variables is studied by considering the effects of changes in one variable on the other variables of interest. In VAR models, changes in the variables are induced by nonzero residuals, that is, by shocks which may have a structural interpretation if identifying structural restrictions have been placed accordingly. Hence, to study the relations between the variables, the effects of nonzero residuals or shocks are traced through the system. This kind of analysis is known as impulse response analysis. It will be discussed in Section 8.1. Related tools are forecast error variance decompositions and historical decompositions of time series of interest in terms of the contributions attributable to the different structural shocks. Moreover, forecasts conditional on a specific path of a variable or set of variables may be considered. These tools are discussed in Sections 8.2, 8.3 and 8.4, respectively.

8.1 Impulse Response Analysis

In the reduced form VAR model (2.5), impulses, innovations or shocks enter through the residual vector ut = (u1t, . . . , uKt)′. A nonzero component of ut corresponds to an equivalent change in the associated left-hand side variable, which in turn will induce further changes


in the other variables of the system in the next periods. The marginal effect of a single nonzero element in ut can be studied conveniently by inverting the VAR representation and considering the corresponding moving average (MA) representation. Ignoring deterministic terms, because they are not important for impulse response analysis, gives

yt = A(L)−1 ut = Φ(L) ut = ∑_{j=0}^{∞} Φj ut−j ,   (8.1)

where Φ(L) = ∑_{j=0}^{∞} Φj L^j = A(L)−1. The (K × K) coefficient matrices Φj are precisely those given in (6.3). The marginal response of yn,t+j to a unit impulse umt is given by the (n, m)th elements of the matrices Φj, viewed as a function of j. Hence, the elements of Φj represent responses to ut innovations. Because the ut are just the 1-step forecast errors, these impulse responses are sometimes called forecast error impulse responses (Lütkepohl (2005, Section 2.3.2)), and the corresponding MA representation is called the Wold MA representation.

The existence of the representation (8.1) is ensured if the VAR process is stable and, hence, yt consists of stationary (I(0)) variables. In that case Φj → 0 as j → ∞ and the effect of an impulse is transitory. If yt has I(1) components, the Wold MA representation (8.1) does not exist. However, for any finite j, Φj can be computed as in the stationary case, using the formula (6.3). Thus, impulse responses can also be computed for I(1) processes. For such processes the marginal effects of a single shock may lead to permanent changes in some or all of the variables.

Because the residual covariance matrix Σu is generally not diagonal, the components of ut may be contemporaneously correlated. Consequently, the ujt shocks are not likely to occur in isolation in practice. Therefore tracing such shocks may not reflect what actually happens in the system if a shock hits. In other words, forecast error shocks may not be the right ones to consider if one is interested in understanding the interactions within the system under consideration. Therefore researchers typically try to determine structural shocks and trace their effects. A main task in structural VAR analysis is in fact the specification of the shocks of interest.

If an identified structural form such as (2.8) is available, the corresponding residuals are the structural shocks. For a stationary process their corresponding impulse responses can again be obtained by inverting the VAR representation,

yt = (A − A∗1 L − · · · − A∗p L^p)−1 B vt = ∑_{j=0}^{∞} Φj A−1 B vt−j = ∑_{j=0}^{∞} Ψj vt−j ,   (8.2)

where the Ψj = Φj A−1 B contain the structural impulse responses. The latter formulas can also be used for computing structural impulse responses for I(1) processes even if the representation (8.2) does not exist.
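Computationally, the structural responses only require the Φi recursion and a choice of B. The sketch below (illustrative code) uses A = IK and a lower-triangular Cholesky factor of Σu as B, i.e. a recursive identification, which is one common but by no means the only identification choice:

```python
import numpy as np

def impulse_responses(A_list, Sigma_u, horizon, B=None):
    """Forecast-error IRFs Phi_j and structural IRFs Psi_j = Phi_j B,
    j = 0,...,horizon. With B = chol(Sigma_u) this corresponds to the
    AB-model with A = I_K (recursive identification)."""
    K = A_list[0].shape[0]
    if B is None:
        B = np.linalg.cholesky(Sigma_u)        # lower triangular, B B' = Sigma_u
    phis = [np.eye(K)]
    for i in range(1, horizon + 1):            # recursion (6.3), A_j = 0 for j > p
        phi = np.zeros((K, K))
        for j in range(1, min(i, len(A_list)) + 1):
            phi += phis[i - j] @ A_list[j - 1]
        phis.append(phi)
    psis = [P @ B for P in phis]
    return phis, psis

A1 = np.array([[0.5, 0.1], [0.0, 0.4]])
Sigma_u = np.array([[1.0, 0.3], [0.3, 1.0]])
phis, psis = impulse_responses([A1], Sigma_u, horizon=8)
```

Element (n, m) of phis[j] traces the response of variable n, j periods after a unit forecast-error impulse in variable m; psis[j] does the same for the orthogonalized shocks.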

Estimation of impulse responses is straightforward by substituting estimated reduced form or structural form parameters in the formulas for computing them. Suppose the structural form VAR parameters are collected in the vector α and denote its estimator by α̂. Moreover, let ψ be the vector of impulse response coefficients of interest. This vector is a (nonlinear) function of α, ψ = ψ(α), which can be estimated as ψ̂ = ψ(α̂). Using the delta method, it is easy to see that ψ̂ = ψ(α̂) is asymptotically normal if α̂ has this property. More precisely,

√T (α̂ − α) →d N(0, Σα̂)


implies

√T (ψ̂ − ψ) →d N(0, Σψ̂),   (8.3)

where

Σψ̂ = (∂ψ/∂α′) Σα̂ (∂ψ′/∂α),

provided the matrix of partial derivatives ∂ψ/∂α′ is such that none of the variances is zero and, in particular, ∂ψ/∂α′ ≠ 0. If ∂ψ/∂α′ does not have full row rank, the asymptotic covariance matrix Σψ̂ is singular. This problem will arise at specific points in the parameter space in the present situation because the function ψ(α) consists of sums of products of elements of α. Also, Σα̂ is generally singular if yt is I(1), which in turn may imply singularity of Σψ̂ even if ∂ψ/∂α′ has full row rank. In the present case, both problems may occur jointly. A singular asymptotic covariance matrix may give rise to misleading inference for impulse responses. For further discussion see Benkwitz, Lütkepohl and Neumann (2000).

Even in those parts of the parameter space where standard asymptotic theory works, it is known that the actual small sample distributions of impulse responses may be quite different from their asymptotic counterparts. In particular, the accuracy of the confidence intervals tends to be low for large-dimensional VARs at longer horizons if the data are highly persistent, that is, if the process has roots close to the unit circle (see Kilian and Chang (2000)). Therefore attempts have been made to use local-to-unity asymptotics for improving inference in this situation. Earlier attempts in this context are Stock (1991), Wright (2000) and Gospodinov (2004), and more recent articles using that approach are Pesavento and Rossi (2006) and Mikusheva (2011).

In practice, bootstrap methods are often used in applied work to construct impulse response confidence intervals (e.g., Kilian (1998), Benkwitz, Lütkepohl and Wolters (2001)). Although they have the advantage that complicated analytical expressions for the asymptotic variances are not needed, it is not clear that they lead to substantially improved inference. In particular, they are also justified by asymptotic theory. In general the bootstrap does not overcome the problems due to a singularity in the asymptotic distribution. Consequently, bootstrap confidence intervals may have a coverage which does not correspond to the nominal level and may, hence, be unreliable (see Benkwitz et al. (2000)). Using subset VAR techniques to impose as many zero restrictions on the parameters as possible and estimating only the remaining nonzero parameters offers a possible solution to this problem.
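A minimal residual-bootstrap sketch for a VAR(1) without deterministic terms illustrates the idea (percentile intervals only; applied work often adds bias and skewness corrections along the lines of Kilian (1998)):

```python
import numpy as np

def ols_var1(y):
    """OLS for a VAR(1) without intercept; returns A1 and residuals."""
    Y, Z = y[1:], y[:-1]
    A1 = np.linalg.lstsq(Z, Y, rcond=None)[0].T
    return A1, Y - Z @ A1.T

def irf_var1(A1, horizon):
    """Forecast-error IRFs Phi_j = A1^j, j = 0,...,horizon."""
    return np.stack([np.linalg.matrix_power(A1, j) for j in range(horizon + 1)])

def bootstrap_irf_bands(y, horizon=5, n_boot=200, seed=0):
    """Percentile bootstrap intervals for the IRFs of an estimated VAR(1)."""
    rng = np.random.default_rng(seed)
    A1_hat, u = ols_var1(y)
    u = u - u.mean(axis=0)                     # recenter residuals
    draws = np.empty((n_boot, horizon + 1) + A1_hat.shape)
    for b in range(n_boot):
        idx = rng.integers(0, len(u), size=len(y) - 1)
        ystar = np.empty_like(y)
        ystar[0] = y[0]
        for t in range(1, len(y)):             # rebuild series from resampled residuals
            ystar[t] = A1_hat @ ystar[t - 1] + u[idx[t - 1]]
        A1_b, _ = ols_var1(ystar)
        draws[b] = irf_var1(A1_b, horizon)
    lo, hi = np.percentile(draws, [5, 95], axis=0)
    return irf_var1(A1_hat, horizon), lo, hi

rng = np.random.default_rng(3)
A1 = np.array([[0.5, 0.1], [0.0, 0.4]])
y = np.zeros((300, 2))
for t in range(1, 300):
    y[t] = A1 @ y[t - 1] + rng.standard_normal(2)
point, lo, hi = bootstrap_irf_bands(y, horizon=5, n_boot=200)
```

At horizon 0 the IRF is the identity matrix by construction, so the bands are degenerate there; at longer horizons they reflect the estimation uncertainty in A1.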

Bayesian methods provide another possible solution (e.g., Sims and Zha (1999)). If an a posteriori distribution is available for α, it can be used to simulate the distribution of ψ = ψ(α) using standard Bayesian simulation techniques. That distribution can then be used for setting up confidence intervals or for inference on ψ. As Bayesian inference does not rely on asymptotic arguments, the singularity problem is not relevant. This does not mean that Bayesian estimation is necessarily more reliable. It requires extensive computations and is based on distributional assumptions which may be questionable.

8.2 Forecast Error Variance Decompositions

As mentioned earlier, forecast error variance decompositions are another tool for investigating the impacts of shocks in VAR models. In terms of the structural residuals, the h-step forecast


error (6.2) can be represented as

yT+h − yT+h|T = Ψ0vT+h +Ψ1vT+h−1 + · · ·+Ψh−1vT+1.

Using Σv = IK, the forecast error variance of the kth component of yT+h can be shown to be

σ²k(h) = ∑_{j=0}^{h−1} (ψ²k1,j + · · · + ψ²kK,j) = ∑_{j=1}^{K} (ψ²kj,0 + · · · + ψ²kj,h−1),

where ψnm,j denotes the (n, m)th element of Ψj. The quantity (ψ²kj,0 + · · · + ψ²kj,h−1) represents the contribution of the jth shock to the h-step forecast error variance of variable k. In practice, the relative contributions (ψ²kj,0 + · · · + ψ²kj,h−1)/σ²k(h) are often reported and interpreted for various variables and forecast horizons. A meaningful interpretation of these quantities requires that the shocks considered in the decomposition are economically meaningful.

The quantities of interest here can again be estimated easily by replacing unknown parameters by their estimators. Inference is complicated, however, by the fact that the relative variance shares may be zero or one and, hence, may assume boundary values. In such cases both classical asymptotic and bootstrap methods have problems.
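Given the structural impulse response matrices Ψi, the variance shares are a few lines of code. The example below (illustrative) builds the Ψi for a bivariate VAR(1) with a Cholesky factor as B:

```python
import numpy as np

def fevd(psis, h):
    """Share of each structural shock (Sigma_v = I_K) in the h-step forecast
    error variance of each variable; each row sums to one."""
    P = np.stack(psis[:h])                     # (h, K, K): psi_{kj,i}
    contrib = (P**2).sum(axis=0)               # sum_i psi_{kj,i}^2
    return contrib / contrib.sum(axis=1, keepdims=True)

A1 = np.array([[0.5, 0.1], [0.0, 0.4]])
B = np.linalg.cholesky(np.array([[1.0, 0.3], [0.3, 1.0]]))
psis = [np.linalg.matrix_power(A1, j) @ B for j in range(10)]
shares = fevd(psis, h=4)
```

Row k of `shares` gives the fraction of the h-step forecast error variance of variable k attributed to each of the K shocks.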

8.3 Historical Decomposition of Time Series

Another way of looking at the contributions of the structural shocks to the observed series is opened up by decomposing the series as proposed by Burbidge and Harrison (1985). Neglecting deterministic terms and considering the structural MA representation (8.2), the jth variable can be represented as

yjt = ∑_{i=0}^{∞} (ψj1,i v1,t−i + · · · + ψjK,i vK,t−i),

where ψjk,i is the (j, k)th element of the structural MA matrix Ψi, as before. Thus,

y(k)jt = ∑_{i=0}^{∞} ψjk,i vk,t−i

is the contribution of the kth structural shock to the jth variable yjt. Ideally one would like to plot the y(k)jt for k = 1, . . . , K, throughout the sample period, that is, for t = 1, . . . , T, and interpret the relative contributions of the different structural shocks to the jth variable.

In practice, such a historical decomposition is, of course, not feasible because the structural shocks are not available. However, we can estimate the shocks associated with the sample period and use an estimated historical decomposition by noting that, by successive substitution, the VAR process (2.5) can be written as

yt = ∑_{i=0}^{t−1} Φi ut−i + A(t)1 y0 + · · · + A(t)p y−p+1

   = ∑_{i=0}^{t−1} Ψi vt−i + A(t)1 y0 + · · · + A(t)p y−p+1,   (8.4)


where the Φi and Ψi are the MA coefficient matrices defined earlier and the A(t)i are such that [A(t)1, . . . , A(t)p] consists of the first K rows of the (pK × pK) matrix A^t, where

A =  [ A1   A2   · · ·   Ap−1   Ap ]
     [ IK   0    · · ·   0      0  ]
     [ 0    IK   · · ·   0      0  ]
     [ ⋮               ⋱         ⋮ ]
     [ 0    0    · · ·   IK     0  ]

(see Lütkepohl (2005, Sec. 2.1)). Hence, the A(t)i go to zero as t becomes large for stationary VARs, so that the contribution of the initial state becomes negligible for stationary processes as t → ∞. On the other hand, for I(1) processes the contribution of the initial values y0, . . . , y−p+1 will remain important. In any case, yjt may be decomposed as

y(k)jt = ∑_{i=0}^{t−1} ψjk,i vk,t−i + α(t)j1 y0 + · · · + α(t)jp y−p+1,

where ψjk,i is the (j, k)th element of Ψi and α(t)ji is the jth row of A(t)i. The series y(k)jt represents the contribution of the kth structural shock to the jth component series of yt, given y0, . . . , y−p+1. In practice all unknown parameters have to be replaced by estimators.

The corresponding series ŷ(k)jt, k = 1, . . . , K, represent a historical decomposition of yjt. They are typically plotted to assess the contributions of the structural shocks to the jth series. Obviously, one may start the decomposition at any point in the sample and not necessarily at t = 0. In fact, for t close to the starting point of the decomposition, the initial values may have a substantial impact even for stationary processes. So one may want to consider the decomposition only for periods some distance away from the starting point.
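The decomposition (8.4) can be sketched via the companion matrix as follows (illustrative code; B is user-supplied, A = IK is assumed, and deterministic terms are omitted). By construction the shock contributions and the initial-value term add up exactly to the observed series:

```python
import numpy as np

def companion(A_list):
    """Stack A_1,...,A_p into the (pK x pK) companion matrix."""
    K, p = A_list[0].shape[0], len(A_list)
    A = np.zeros((K * p, K * p))
    A[:K] = np.hstack(A_list)
    if p > 1:
        A[K:, :-K] = np.eye(K * (p - 1))
    return A

def historical_decomposition(y, A_list, B):
    """Contribution of each structural shock to y_t, t = p,...,T-1, plus the
    initial-value term, given the presample values y_0,...,y_{p-1}."""
    K, p = y.shape[1], len(A_list)
    u = np.array([y[t] - sum(A_list[i] @ y[t - 1 - i] for i in range(p))
                  for t in range(p, len(y))])   # VAR residuals
    v = u @ np.linalg.inv(B).T                  # structural shocks v_t = B^{-1} u_t
    A = companion(A_list)
    state0 = y[p - 1::-1].reshape(-1)           # (y_{p-1}', ..., y_0')'
    contrib = np.zeros((len(u), K, K))          # (t, variable j, shock k)
    init = np.zeros((len(u), K))
    for t in range(len(u)):
        init[t] = np.linalg.matrix_power(A, t + 1)[:K] @ state0
        for i in range(t + 1):
            Psi = np.linalg.matrix_power(A, i)[:K, :K] @ B   # Psi_i
            contrib[t] += Psi * v[t - i]        # column k scaled by v_{k,t-i}
    return contrib, init

rng = np.random.default_rng(4)
A1 = np.array([[0.5, 0.1], [0.0, 0.4]])
y = np.zeros((40, 2))
for t in range(1, 40):
    y[t] = A1 @ y[t - 1] + rng.standard_normal(2)
contrib, init = historical_decomposition(y, [A1], B=np.eye(2))
```

The identity contrib.sum over shocks plus init equals the observed series, which is a useful sanity check for any implementation.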

8.4 Analysis of Forecast Scenarios

SVAR models have also been used for analyzing different forecast scenarios, that is, conditional forecasts given restrictions on the future values of some of the variables. For example, in monetary policy analysis one may be interested in the future development of the system under consideration for a given path of the interest rate, or under the condition that the interest rate remains within a given range. Clearly, in a model where all variables are endogenous, fixing the future values of one or more variables may be problematic, and one has to evaluate carefully how far the model can be stretched without being invalidated. In other words, SVAR models cannot be expected to reflect the changes induced in the future paths of the variables for arbitrary forecast scenarios (for applications see, e.g., Waggoner and Zha (1999) and Baumeister and Kilian (2011)).

For describing the approach, an SVAR representation similar to (8.4) is particularly suitable,

$$y_{T+h} = \sum_{i=0}^{h-1} \Psi_i v_{T+h-i} + A_1^{(h)} y_T + \cdots + A_p^{(h)} y_{T-p+1}, \qquad h = 1, 2, \dots, \qquad (8.5)$$

where deterministic terms are again ignored for simplicity and all symbols are defined as in (8.4). The standard reduced-form forecast $y_{T+h|T}$ discussed in Section 6.1 is obtained


from this expression by replacing all structural residuals in the first term on the right-hand side by zero. A forecast scenario different from this baseline forecast may be obtained by assigning other values to the structural shocks. For instance, a scenario where the $j$th variable takes the future values $y_{j,T+h}^*$ for $h = 1, \dots, H$ amounts to choosing structural shocks $v_{T+h}^*$, $h = 1, \dots, H$, such that

$$\sum_{i=0}^{h-1} \sum_{k=1}^{K} \psi_{jk,i} v_{k,T+h-i}^* = y_{j,T+h}^* - \alpha_{j1}^{(h)} y_T - \cdots - \alpha_{jp}^{(h)} y_{T-p+1}, \qquad h = 1, \dots, H, \qquad (8.6)$$

for periods $T+1, \dots, T+H$ or, more generally, in matrix notation,

$$R_H v_{T,T+H}^* = r_H, \qquad (8.7)$$

where $v_{T,T+H} = (v_{T+1}', \dots, v_{T+H}')'$ is a $(KH \times 1)$ vector of stacked future structural shocks, $R_H$ is a suitable $(Q \times KH)$ restriction matrix representing the left-hand side relations in (8.6) with $Q \le KH$, and $r_H$ is a $(Q \times 1)$ vector containing the right-hand side of (8.6). The forecasts conditional on the $v_{T+h}^*$ shocks are then computed as

$$y_{T+h|T}^{\mathrm{cond}} = \sum_{i=0}^{h-1} \Psi_i v_{T+h-i}^* + A_1^{(h)} y_T + \cdots + A_p^{(h)} y_{T-p+1}, \qquad h = 1, \dots, H. \qquad (8.8)$$

For concreteness consider, for instance, the case of a B-model with lower-triangular initial effects matrix $B = \Psi_0$. In that case, if the path of the first variable is prespecified as $y_{1,T+h}^*$, $h = 1, \dots, H$, this amounts to choosing the first residual as

$$v_{1,T+h}^* = y_{1,T+h}^* - \alpha_{11}^{(1)} y_{T+h-1|T}^{\mathrm{cond}} - \cdots - \alpha_{1p}^{(1)} y_{T+h-p|T}^{\mathrm{cond}}, \qquad h = 1, \dots, H.$$

In general, the values of the $v_{T+h}^*$ will not be uniquely determined by the restrictions. In that case Doan et al. (1984) suggest using

$$v_{T,T+H}^* = R_H' (R_H R_H')^{-1} r_H,$$

which is the least squares solution obtained by minimizing $\sum_{h=1}^{H} v_{T+h}' v_{T+h} = v_{T,T+H}' v_{T,T+H}$ subject to the restrictions (8.7).

Obviously, the conditional forecast $y_{T+h|T}^{\mathrm{cond}}$ differs from the unconditional forecast $y_{T+h|T}$ of Section 6.1 by the first term on the right-hand side of equation (8.8). Of course, other forecast scenarios may be of interest. For example, one may not want to condition on a particular path of a variable but on its values being in a given range. Such a scenario can be investigated by using the above formulas for computing forecast ranges accordingly.
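The computations in (8.6)–(8.8) can be sketched in a few lines of numpy. The example below conditions on a prespecified path for one variable and solves for the minimum-norm shocks $v^* = R_H'(R_H R_H')^{-1} r_H$ suggested by Doan et al. (1984). All inputs (`Psi`, `baseline`, the variable index) are hypothetical estimated quantities; this is a sketch of the mechanics under those assumptions, not the paper's implementation.

```python
import numpy as np

def conditional_forecast(Psi, baseline, j, y_star):
    """Conditional forecasts (8.8) when the path of variable j is fixed at
    y_star over h = 1, ..., H, using the minimum-norm shocks
    v* = R'(RR')^{-1} r of Doan, Litterman and Sims (1984).

    Psi      : list of H (K x K) structural MA matrices Psi_0, ..., Psi_{H-1}
    baseline : (H x K) unconditional forecasts y_{T+h|T} (all shocks zero)
    j        : index of the conditioned variable
    y_star   : (H,) prespecified path y*_{j,T+1}, ..., y*_{j,T+H}
    """
    H, K = baseline.shape

    # Build R_H (H x KH) and r_H (H,) from (8.6): one restriction per horizon.
    # The stacked shock vector is v = (v_{T+1}', ..., v_{T+H}')'.
    R = np.zeros((H, K * H))
    for h in range(1, H + 1):
        for i in range(h):              # shock dated T+h-i sits in block h-i-1
            R[h - 1, (h - i - 1) * K:(h - i) * K] = Psi[i][j, :]
    r = y_star - baseline[:, j]         # required deviation from the baseline

    # Minimum-norm solution v* = R'(RR')^{-1} r
    v = R.T @ np.linalg.solve(R @ R.T, r)
    v = v.reshape(H, K)                 # row h-1 holds v*_{T+h}

    # Conditional forecasts: baseline plus the effect of the chosen shocks
    cond = baseline.copy()
    for h in range(1, H + 1):
        for i in range(h):
            cond[h - 1] += Psi[i] @ v[h - i - 1]
    return cond, v
```

Because $R_H v^* = r_H$ holds exactly when $R_H$ has full row rank, the conditioned component of `cond` reproduces `y_star`; `np.linalg.lstsq(R, r, rcond=None)` would return the same minimum-norm solution for this underdetermined system.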

For practical purposes the unknown parameters have to be replaced by estimates, asusual. Waggoner and Zha (1999) present a Bayesian method for taking into account therelated estimation uncertainty in that case.

A critical question in evaluating forecast scenarios with SVAR models is whether such models are suitable for that purpose or whether they are stretched too much by restricting the future developments of some variables. The problem is that all variables are regarded as endogenous and, hence, their values should be generated endogenously by the model rather than imposed exogenously. Of course, there could be cases where one or more of the variables are really exogenous and not affected by feedback relations within the system under consideration. In such cases a conditional analysis as described in the foregoing is plausible. In other cases users should be cautious in interpreting the results and carefully evaluate whether the model can be expected to be informative about the questions of interest.


9 Conclusions and Extensions

This chapter reviews VAR analysis. Specification, estimation, forecasting, causality and structural analysis are discussed. Finite order VAR models are popular for economic analysis because they are easy to use. Several software products can be used for performing a VAR analysis (see, e.g., PcGive (Doornik and Hendry (1997)), EViews (EViews (2000)) and JMulTi (Krätzig (2004))).

In many situations of practical interest the VAR models discussed in this chapter are too limited, however. For example, the assumption that the VAR order is finite is rather restrictive because theoretically omitted variables or linear transformations of the variables may lead to infinite order VAR processes. Hence, it may be appealing to extend the model class to infinite order VAR processes. Such an extension may be achieved by considering VAR models with MA error terms or by studying the implications of fitting approximate finite order VAR models to series which are generated by infinite order processes. This issue is dealt with in Lewis and Reinsel (1985), Lütkepohl and Saikkonen (1997), Inoue and Kilian (2002b) and Lütkepohl (2005, Part IV). An authoritative discussion of the theory of VARMA models is available in Hannan and Deistler (1988), and a recent survey of the related literature is given by Lütkepohl (2006a).

As already mentioned in the introduction, if some variables are integrated and perhaps cointegrated, vector error correction models are suitable tools for modelling the cointegration relations in detail. Such models are presented in Chapter 9. A possible advantage of the levels VAR models considered in the present chapter is that they are robust to cointegration of unknown form. They can be used even if the number of cointegration relations is unknown, let alone their precise form. Of course, statistical tools are available for analyzing the number and type of cointegration relations. However, such pretesting procedures have their limitations as well, in particular if some roots are only near unity, as pointed out, for instance, by Elliott (1998).

Other possible extensions are the inclusion of nonlinear components (e.g., Granger (2001), Teräsvirta, Tjøstheim and Granger (2010)) or allowing the VAR coefficients to be time-varying (e.g., Primiceri (2005)). Moreover, seasonal macroeconomic data are now available for many countries. Hence, accounting specifically for seasonal fluctuations may be necessary (e.g., Ghysels and Osborn (2001)). Furthermore, heteroskedasticity or conditional heteroskedasticity in the residuals of VAR models may be of importance in practice, in particular when higher frequency data are considered (see Lütkepohl (2005, Chapter 16) for a textbook treatment and Bauwens, Laurent and Rombouts (2006) or Silvennoinen and Teräsvirta (2009) for recent surveys). For some economic variables restricting the order of integration to zero or one may be unrealistic. Extensions to higher order integration have been considered by Boswijk (2000), Johansen (1997, 2006) and others.

References

Ahn, S. K. (1988). Distribution for residual autocovariances in multivariate autoregressive models with structured parameterization, Biometrika 75: 590–593.

Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle, in B. Petrov and F. Csaki (eds), 2nd International Symposium on Information Theory, Academiai Kiado, Budapest, pp. 267–281.


Akaike, H. (1974). A new look at the statistical model identification, IEEE Transactions on Automatic Control AC-19: 716–723.

Amisano, G. and Giannini, C. (1997). Topics in Structural VAR Econometrics, 2nd edn, Springer, Berlin.

Andrews, D. W. K. (1993). Tests for parameter instability and structural change with unknown change point, Econometrica 61: 821–856.

Andrews, D. W. K. and Ploberger, W. (1994). Optimal tests when a nuisance parameter is present only under the alternative, Econometrica 62: 1383–1414.

Baillie, R. T. (1979). Asymptotic prediction mean squared error for vector autoregressive models, Biometrika 66: 675–678.

Baumeister, C. and Kilian, L. (2011). Real-time forecasts of the real price of oil, mimeo, University of Michigan.

Bauwens, L., Laurent, S. and Rombouts, J. V. K. (2006). Multivariate GARCH models: A survey, Journal of Applied Econometrics 21: 79–109.

Benkwitz, A., Lütkepohl, H. and Neumann, M. (2000). Problems related to bootstrapping impulse responses of autoregressive processes, Econometric Reviews 19: 69–103.

Benkwitz, A., Lütkepohl, H. and Wolters, J. (2001). Comparison of bootstrap confidence intervals for impulse responses of German monetary systems, Macroeconomic Dynamics 5: 81–100.

Berkowitz, J. and Kilian, L. (2000). Recent developments in bootstrapping time series, Econometric Reviews 19: 1–48.

Blanchard, O. and Quah, D. (1989). The dynamic effects of aggregate demand and supply disturbances, American Economic Review 79: 655–673.

Boswijk, H. P. (2000). Mixed normality and ancillarity in I(2) systems, Econometric Theory 16: 878–904.

Breitung, J., Brüggemann, R. and Lütkepohl, H. (2004). Structural vector autoregressive modeling and impulse responses, in H. Lütkepohl and M. Krätzig (eds), Applied Time Series Econometrics, Cambridge University Press, Cambridge, pp. 159–196.

Brüggemann, R., Lütkepohl, H. and Saikkonen, P. (2006). Residual autocorrelation testing for vector error correction models, Journal of Econometrics 134: 579–604.

Burbidge, J. and Harrison, A. (1985). A historical decomposition of the great depression to determine the role of money, Journal of Monetary Economics 16: 45–54.

Canova, F. (2007). Methods for Applied Macroeconomic Research, Princeton University Press, Princeton.

Del Negro, M. and Schorfheide, F. (2004). Priors from general equilibrium models for VARs, International Economic Review 45: 643–673.

Doan, T., Litterman, R. B. and Sims, C. A. (1984). Forecasting and conditional projection using realistic prior distributions, Econometric Reviews 3: 1–144.

Dolado, J. J. and Lütkepohl, H. (1996). Making Wald tests work for cointegrated VAR systems, Econometric Reviews 15: 369–386.


Doornik, J. A. and Hendry, D. F. (1997). Modelling Dynamic Systems Using PcFiml 9.0 for Windows, International Thomson Business Press, London.

Dufour, J.-M. and Renault, E. (1998). Short run and long run causality in time series: Theory, Econometrica 66: 1099–1125.

Elliott, G. (1998). On the robustness of cointegration methods when regressors almost have unit roots, Econometrica 66: 149–158.

Engle, R. F. and Granger, C. W. J. (1987). Cointegration and error correction: Representation, estimation and testing, Econometrica 55: 251–276.

EViews (2000). EViews 4.0 User's Guide, Quantitative Micro Software, Irvine, CA.

Findley, D. F. (1986). On bootstrap estimates of forecast mean square errors for autoregressive processes, in D. M. Allen (ed.), Computer Science and Statistics: The Interface, North-Holland, Amsterdam, pp. 11–17.

Ghysels, E. and Osborn, D. R. (2001). The Econometric Analysis of Seasonal Time Series, Cambridge University Press, Cambridge.

Giannone, D., Lenza, M. and Primiceri, G. (2010). Prior selection for vector autoregressions, mimeo, Department of Economics, Free University of Brussels.

Gonçalves, S. and Kilian, L. (2004). Bootstrapping autoregressions with conditional heteroskedasticity of unknown form, Journal of Econometrics 123: 89–120.

Gospodinov, N. (2004). Asymptotic confidence intervals for impulse responses of near-integrated processes, Econometrics Journal 7: 505–527.

Granger, C. W. J. (1969). Investigating causal relations by econometric models and cross-spectral methods, Econometrica 37: 424–438.

Granger, C. W. J. (1981). Some properties of time series data and their use in econometric model specification, Journal of Econometrics 16: 121–130.

Granger, C. W. J. (2001). Overview of nonlinear macroeconometric empirical models, Macroeconomic Dynamics 5: 466–481.

Grigoletto, M. (1998). Bootstrap prediction intervals for autoregressions: Some alternatives, International Journal of Forecasting 14: 447–456.

Hamilton, J. D. (1994). Time Series Analysis, Princeton University Press, Princeton, New Jersey.

Hannan, E. J. and Deistler, M. (1988). The Statistical Theory of Linear Systems, Wiley, New York.

Hannan, E. J. and Quinn, B. G. (1979). The determination of the order of an autoregression, Journal of the Royal Statistical Society B41: 190–195.

Hansen, B. E. (1997). Approximate asymptotic p values for structural-change tests, Journal of Business & Economic Statistics 15: 60–67.

Hatanaka, M. (1996). Time-Series-Based Econometrics: Unit Roots and Co-Integration, Oxford University Press, Oxford.

Hosking, J. R. M. (1980). The multivariate portmanteau statistic, Journal of the American Statistical Association 75: 602–608.


Hosking, J. R. M. (1981a). Equivalent forms of the multivariate portmanteau statistic, Journal of the Royal Statistical Society B43: 261–262.

Hosking, J. R. M. (1981b). Lagrange-multiplier tests of multivariate time series models, Journal of the Royal Statistical Society B43: 219–230.

Ingram, B. F. and Whiteman, C. H. (1994). Supplanting the 'Minnesota' prior: Forecasting macroeconomic time series using real business cycle model priors, Journal of Monetary Economics 34: 497–510.

Inoue, A. and Kilian, L. (2002a). Bootstrapping autoregressive processes with possible unit roots, Econometrica 70: 377–391.

Inoue, A. and Kilian, L. (2002b). Bootstrapping smooth functions of slope parameters and innovation variances in VAR(∞) models, International Economic Review 43: 309–332.

Jarque, C. M. and Bera, A. K. (1987). A test for normality of observations and regression residuals, International Statistical Review 55: 163–172.

Johansen, S. (1995). Likelihood-based Inference in Cointegrated Vector Autoregressive Models, Oxford University Press, Oxford.

Johansen, S. (1997). Likelihood analysis of the I(2) model, Scandinavian Journal of Statistics 24: 433–462.

Johansen, S. (2006). Statistical analysis of hypotheses on the cointegrating relations in the I(2) model, Journal of Econometrics 132: 81–115.

Kabaila, P. (1993). On bootstrap predictive inference for autoregressive processes, Journal of Time Series Analysis 14: 473–484.

Kadiyala, K. R. and Karlsson, S. (1997). Numerical methods for estimation and inference in Bayesian VAR-models, Journal of Applied Econometrics 12: 99–132.

Kilian, L. (1998). Small-sample confidence intervals for impulse response functions, Review of Economics and Statistics 80: 218–230.

Kilian, L. (2011). Structural vector autoregressions, Handbook of Research Methods and Applications on Empirical Macroeconomics, Edward Elgar.

Kilian, L. and Chang, P.-L. (2000). How accurate are confidence intervals for impulse responses in large VAR models?, Economics Letters 69: 299–307.

Kilian, L. and Demiroglu, U. (2000). Residual-based tests for normality in autoregressions: Asymptotic theory and simulation evidence, Journal of Business & Economic Statistics 18: 40–50.

Kim, J. H. (1999). Asymptotic and bootstrap prediction regions for vector autoregression, International Journal of Forecasting 15: 393–403.

Krätzig, M. (2004). The software JMulTi, in H. Lütkepohl and M. Krätzig (eds), Applied Time Series Econometrics, Cambridge University Press, Cambridge, pp. 289–299.

Leeb, H. and Pötscher, B. M. (2005). Model selection and inference: Facts and fiction, Econometric Theory 21: 21–59.

Lewis, R. and Reinsel, G. C. (1985). Prediction of multivariate time series by autoregressive model fitting, Journal of Multivariate Analysis 16: 393–411.


Li, H. and Maddala, G. S. (1996). Bootstrapping time series models, Econometric Reviews 15: 115–158.

Li, W. K. and McLeod, A. I. (1981). Distribution of the residual autocorrelations in multivariate ARMA time series models, Journal of the Royal Statistical Society B43: 231–239.

Litterman, R. B. (1986). Forecasting with Bayesian vector autoregressions - five years of experience, Journal of Business & Economic Statistics 4: 25–38.

Lomnicki, Z. A. (1961). Tests for departure from normality in the case of linear stochastic processes, Metrika 4: 37–62.

Lütkepohl, H. (1993). Testing for causation between two variables in higher dimensional VAR models, in H. Schneeweiss and K. F. Zimmermann (eds), Studies in Applied Econometrics, Physica-Verlag, Heidelberg, pp. 75–91.

Lütkepohl, H. (2004). Vector autoregressive and vector error correction models, in H. Lütkepohl and M. Krätzig (eds), Applied Time Series Econometrics, Cambridge University Press, Cambridge, pp. 86–158.

Lütkepohl, H. (2005). New Introduction to Multiple Time Series Analysis, Springer-Verlag, Berlin.

Lütkepohl, H. (2006a). Forecasting with VARMA models, in G. Elliott, C. W. J. Granger and A. Timmermann (eds), Handbook of Economic Forecasting, Volume 1, Elsevier, Amsterdam, pp. 287–325.

Lütkepohl, H. (2006b). Vector autoregressive models, in T. C. Mills and K. Patterson (eds), Palgrave Handbook of Econometrics, Volume 1, Econometric Theory, Palgrave Macmillan, Houndmills, Basingstoke, pp. 477–510.

Lütkepohl, H. (2009). Econometric analysis with vector autoregressive models, in D. A. Belsley and E. J. Kontoghiorghes (eds), Handbook of Computational Econometrics, Wiley, Chichester, pp. 281–319.

Lütkepohl, H. and Krätzig, M. (eds) (2004). Applied Time Series Econometrics, Cambridge University Press, Cambridge.

Lütkepohl, H. and Poskitt, D. S. (1996). Testing for causation using infinite order vector autoregressive processes, Econometric Theory 12: 61–87.

Lütkepohl, H. and Saikkonen, P. (1997). Impulse response analysis in infinite order cointegrated vector autoregressive processes, Journal of Econometrics 81: 127–157.

Masarotto, G. (1990). Bootstrap prediction intervals for autoregressions, International Journal of Forecasting 6: 229–239.

Mikusheva, A. (2011). One-dimensional inference in autoregressive models with the potential presence of a unit root, mimeo, Department of Economics, M.I.T.

Park, J. Y. and Phillips, P. C. B. (1988). Statistical inference in regressions with integrated processes: Part 1, Econometric Theory 4: 468–497.

Park, J. Y. and Phillips, P. C. B. (1989). Statistical inference in regressions with integrated processes: Part 2, Econometric Theory 5: 95–131.

Pascual, L., Romo, J. and Ruiz, E. (2004). Bootstrap predictive inference for ARIMA processes, Journal of Time Series Analysis 25: 449–465.


Paulsen, J. (1984). Order determination of multivariate autoregressive time series with unit roots, Journal of Time Series Analysis 5: 115–127.

Pesavento, E. and Rossi, B. (2006). Small-sample confidence intervals for multivariate impulse response functions at long horizons, Journal of Applied Econometrics 21: 1135–1155.

Primiceri, G. E. (2005). Time varying structural vector autoregressions and monetary policy, Review of Economic Studies 72: 821–852.

Quinn, B. G. (1980). Order determination for a multivariate autoregression, Journal of the Royal Statistical Society B42: 182–185.

Reinsel, G. (1980). Asymptotic properties of prediction errors for the multivariate autoregressive model using estimated parameters, Journal of the Royal Statistical Society B42: 328–333.

Rissanen, J. (1978). Modeling by shortest data description, Automatica 14: 465–471.

Saikkonen, P. and Lütkepohl, H. (1996). Infinite order cointegrated vector autoregressive processes: Estimation and inference, Econometric Theory 12: 814–844.

Samaranayake, V. A. and Hasza, D. P. (1988). Properties of predictors for multivariate autoregressive models with estimated parameters, Journal of Time Series Analysis 9: 361–383.

Schwarz, G. (1978). Estimating the dimension of a model, Annals of Statistics 6: 461–464.

Silvennoinen, A. and Teräsvirta, T. (2009). Multivariate GARCH models, in T. G. Andersen, R. A. Davis, J.-P. Kreis and T. Mikosch (eds), Handbook of Financial Time Series, Springer, New York, pp. 201–228.

Sims, C. A. (1980). Macroeconomics and reality, Econometrica 48: 1–48.

Sims, C. A., Stock, J. H. and Watson, M. W. (1990). Inference in linear time series models with some unit roots, Econometrica 58: 113–144.

Sims, C. A., Waggoner, D. F. and Zha, T. (2008). Methods for inference in large multiple-equation Markov-switching models, Journal of Econometrics 146: 255–274.

Sims, C. A. and Zha, T. (1999). Error bands for impulse responses, Econometrica 67: 1113–1155.

Stock, J. H. (1991). Confidence intervals for the largest autoregressive root in U.S. economic time series, Journal of Monetary Economics 28: 435–460.

Teräsvirta, T., Tjøstheim, D. and Granger, C. W. J. (2010). Modelling Nonlinear Economic Time Series, Oxford University Press, Oxford.

Toda, H. Y. and Phillips, P. C. B. (1993). Vector autoregressions and causality, Econometrica 61: 1367–1393.

Toda, H. Y. and Yamamoto, T. (1995). Statistical inference in vector autoregressions with possibly integrated processes, Journal of Econometrics 66: 225–250.

Uhlig, H. (2005). What are the effects of monetary policy on output? Results from an agnostic identification procedure, Journal of Monetary Economics 52: 381–419.

Villani, M. (2005). Bayesian reference analysis of cointegration, Econometric Theory 21: 326–357.

Waggoner, D. F. and Zha, T. (1999). Conditional forecasts in dynamic multivariate models, Review of Economics and Statistics 81: 639–651.


Watson, M. W. (1994). Vector autoregressions and cointegration, in R. F. Engle and D. L. McFadden (eds), Handbook of Econometrics, Vol. IV, Elsevier, New York.

Wright, J. H. (2000). Confidence intervals for univariate impulse responses with a near unit root, Journal of Business & Economic Statistics 18: 368–373.
