
¹ We are grateful to Ivan Pastine for helpful comments and to M.J. Hinich and B. LeBaron for sharing their computer codes with us. The authors may be contacted at [email protected] and [email protected], respectively. This paper is available as Economics Department Working Paper E98-06 at http://ashleymac.econ.vt.edu/working_papers/E9806.pdf; an MSDOS program implementing the calculations is available at http://ashleymac.econ.vt.edu/working_papers/toolzipd.exe.

Preliminary: Do not quote from this version.

NONLINEAR MODEL SPECIFICATION/DIAGNOSTICS:

INSIGHTS FROM A BATTERY OF NONLINEARITY TESTS¹

Richard A. Ashley

Department of Economics

Virginia Tech

Douglas M. Patterson

Department of Finance

Virginia Tech

July, 1998

Abstract

We present a comprehensive analysis of the most popular statistical tests used to detect

nonlinear dependence in time series data, including the BDS, Engle Lagrange Multiplier (LM),

McLeod-Li, Tsay, Hinich bicovariance, and Hinich bispectrum tests. The size of each test is

evaluated using serially i.i.d. data drawn from the gaussian, exponential, Student’s t, and

symmetric stable Paretian distributions. The power of each test is evaluated using serially

dependent data generated from Generalized Autoregressive Conditional Heteroskedastic (GARCH),

Self-Exciting Threshold Autoregression (SETAR), Markov switching, quadratic, and cubic

processes. The results presented here are unique because of the wide variety of null and

alternative processes considered, because of the breadth of the study in terms of the tests

considered, and because each test is implemented using both the usual asymptotic theory and the

bootstrap.

The simulations using serially i.i.d. data drawn from the exponential distribution allow us to

directly examine the sensitivity of each test’s actual size to asymmetry in the sample data;

similarly, using data drawn from the Student’s t and Paretian distributions allows us to examine

the impact of leptokurtosis and moment failure in the data. Simulating the tests with the serially

dependent data allows us to quantify the relative power of each test across the various alternative

generating processes. We find that the differential relative power of the tests across the

alternatives is substantial. Therefore we conclude that the application of the full battery of tests to

sample data is potentially informative in terms of identifying the form of the underlying process.

As an example, we apply the tests to data on U.S. real GNP and to data simulated from

several estimated models for real GNP. Our battery of test results on the actual data confirms that

the generating mechanism for real GNP is nonlinear. The resulting pattern of test results

constitutes a new “stylized fact” about real GNP which any putative model for real GNP ought to

reproduce. By simulating data from each of several estimated models for real GNP in the

literature, we are able to estimate the probability that each of these models could generate data

exhibiting the pattern of nonlinearity test results observed with the actual data. In this way, we

find that it is very unlikely that the observed nonlinearity in U.S. real GNP is generated by either a

SETAR model or a Markov switching mechanism.


1. Introduction

Satisfactory methods for detecting linear serial dependence in time series and for

specifying statistically adequate models for such dependence, if detected, have been available for a

long time. The same cannot be said regarding nonlinear serial dependence, however.

This limited progress is certainly not for want of potential applications. Numerous

theoretical macroeconomic models are highly nonlinear, from Hicks’ (1950) elaboration of the

Samuelson multiplier-accelerator theory, to Grandmont’s (1985) overlapping generations model,

to labor hoarding models such as Hall (1990), and to recent models, such as Palm and Pfann

(1997), which are based on an explicit treatment of asymmetric adjustment costs. The

nonlinearity in these models is intrinsic to the macroeconomic hypotheses embodied therein and

essential to the derivation of observed macroeconomic properties, such as asymmetric business

cycles.

Nor is there any dearth of empirical support for such intrinsic nonlinearity, in large part

because a good deal of work has been done on the detection of nonlinear serial dependence.

Numerous tests for nonlinear serial dependence have been proposed and applied. Granger and

Andersen (1978), for example, suggested an examination of the sample correlogram of the

squared time series data, $\mathrm{corr}(X_t^2, X_{t-k}^2)$, leading to the McLeod and Li (1983) test, to the

Engle (1982) LM test, and to many applications (e.g., Bollerslev (1986)) examining financial and

macroeconomic time series for ARCH and/or GARCH effects. In a separate line of research,

Subba Rao (1980) and Hinich (1982) developed tests for nonlinearity based on the observation

that the bispectrum (the double Fourier transform of the third order moments, $E(X_t X_{t-j} X_{t-k})$)

is flat across all frequency pairs if $X_t$’s generating mechanism is linear. Ashley, Patterson, and

Hinich (1986) demonstrated that Hinich’s bispectral test has substantial power to detect the kinds

of nonlinearity generated by common statistical models; Ashley and Patterson (1989) showed that

the bispectral test can detect the kinds of nonlinearity intrinsic to simple theoretical

macroeconomic models (e.g., a stochastic Hicks economy) and, further, that it can detect

nonlinearity in the actual macroeconomy, using monthly data on the U.S. Index of Industrial

Production. Similarly, Hinich and Patterson (1985) used the bispectral test to uncover widespread

nonlinearity in firm-level stock return data. These and other tests are described below.

Thus, a number of tests have been developed and nonlinear generating mechanisms have

been thereby detected in a number of important settings. However, these detections have

provided little guidance as to the form of the underlying nonlinear generating mechanism.²

² Altug, Ashley and Patterson (1997) provides a partial exception to this conclusion; they use a sequence of nonlinearity tests to demonstrate that the nonlinearity in real GNP is generated in the labor markets rather than in the capital markets or via exogenous technological shocks.

In this paper we present the results of a comprehensive comparison of the major tests for

nonlinearity in time series. Our results are unique because of the wide variety of null and

alternative processes considered, because of the breadth of the study in terms of the tests

considered, and because each test is implemented using both the usual asymptotic theory and the

bootstrap. Simulating the tests with serially i.i.d. data drawn from the exponential distribution

allows us to examine the sensitivity of each test’s empirical size to skewness in the data;

simulating the tests with serially i.i.d. data drawn from the Student’s t(df) distribution allows us to

examine the sensitivity of each test’s empirical size to leptokurtosis. Since stock return data has

been posited to follow a symmetric stable Paretian distribution, simulating the tests with serially

i.i.d. data drawn from this distribution allows us to examine the sensitivity of each test’s empirical


size in an empirically relevant circumstance where the data’s higher order moments do not exist.

Simulating the tests with serially dependent data generated by various processes allows us

to quantify the relative power of each test across the various alternatives. We find that the

differential relative power of the tests across the alternatives is substantial. Therefore we

conclude that the application of the full battery of tests to sample data is potentially informative in

terms of identifying the form of the underlying process.

The remainder of this paper is organized as follows. Section 2 describes the statistical

tests considered; Section 3 describes the models used to generate the simulated data. In Section 4

we present our results on the empirical sizes of the tests; here we draw conclusions as to the

extent to which the actual size of each test is sensitive to characteristics such as asymmetry,

leptokurtosis, or moment failure in the underlying distribution of the data. In Section 5 we

present results on the power of the tests against various alternatives. In this Section we draw

conclusions as to which of the tests is most broadly powerful at detecting nonlinearities of the

forms considered and as to what the various tests can tell us about the form of the nonlinear

generating mechanism for the data.

Finally, we apply the tests to U.S. real GNP data in Section 6. Here we find, as expected,

that the generating mechanism for real GNP is nonlinear. However, the pattern of results across

the different nonlinearity tests is notably different from the patterns observed in the simulated

data. We conclude that this pattern of nonlinearity test results itself constitutes a new “stylized

fact” about U.S. real GNP and investigate whether or not existing estimated models for real GNP

can reproduce this stylized fact. Our results cast doubt on the commonly held notion that real

output is generated by some sort of switching process.

³ In our implementation p is chosen to minimize the Schwarz (SC) criterion. In contrast to alternative choices (e.g., AIC or FPE) the Schwarz criterion is known to be consistent for AR(p) order determination under the null hypothesis of a linear generating mechanism; see Judge, et al. (1985, p. 246).

⁴ Ashley, Patterson, and Hinich (1986) have shown that the test statistic for the Hinich bispectral test is invariant to linear filtering of the data, so the adequacy of the prewhitening model is irrelevant to the validity of this test.

2. Testing for Nonlinearities

In this section, we provide a brief description of the statistical tests implemented below. These

include a test for ARCH effects due to McLeod and Li (1983), the Engle (1982) LM test for

ARCH effects, the BDS test proposed by Brock, Dechert, and Scheinkman (1996), the Tsay

(1986) test for quadratic serial dependence, the bicovariance test due to Hinich (1996) and Hinich

and Patterson (1995), and the Hinich bispectral test proposed in Hinich (1982) and studied in

Ashley, Patterson, and Hinich (1986) and in Ashley and Patterson (1989).

Except for the Hinich bispectral test, these tests all share the same premise: once any

linear serial dependence is removed from the data via a prewhitening model, any remaining serial

dependence must be due to a nonlinear generating mechanism. Thus, each of these procedures is

actually a test of serial independence applied to the (by construction) serially uncorrelated fitting

errors of an AR(p) model for the sample data.³ This fitting error series, standardized to zero

mean and unit variance, is denoted by $\{x_t\}$ below.⁴

McLeod-Li Test

This test for ARCH effects was proposed by McLeod and Li (1983) based on a suggestion in Granger and Andersen (1978). It looks at the autocorrelation function of the squares of the prewhitened data and tests whether $\mathrm{corr}(x_t^2, x_{t-k}^2)$ is non-zero for some k. The autocorrelation function for the squared residuals $\{x_t^2\}$ is estimated by:

$$\hat r(k) = \frac{\sum_{t=k+1}^{T}\left(x_t^2 - \hat\sigma^2\right)\left(x_{t-k}^2 - \hat\sigma^2\right)}{\sum_{t=1}^{T}\left(x_t^2 - \hat\sigma^2\right)^2}$$

where

$$\hat\sigma^2 = \frac{1}{T}\sum_{t=1}^{T} x_t^2 .$$

Under the null hypothesis that $x_t$ is an i.i.d. process (and assuming that $E(x_t^8)$ exists) McLeod and Li (1983) show that, for fixed L:

$$\sqrt{T}\,\hat r = \sqrt{T}\,\bigl(\hat r(1), \ldots, \hat r(L)\bigr)$$

is asymptotically a multivariate unit normal. Consequently the usual Box-Ljung statistic

$$Q = T(T+2)\sum_{k=1}^{L}\frac{\hat r^2(k)}{T-k}$$

is asymptotically $\chi^2(L)$ under the null hypothesis of a linear generating mechanism for the data.
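The McLeod-Li calculation described above can be sketched in a few lines. This is only an illustrative sketch, not the authors' MSDOS program: the function name `mcleod_li` is hypothetical, the input is assumed to be an already prewhitened and standardized series, and NumPy and SciPy are assumed to be available.

```python
import numpy as np
from scipy.stats import chi2


def mcleod_li(x, L=24):
    """Return (Q, asymptotic p-value) for the McLeod-Li statistic.

    x is assumed to be the prewhitened series; L is the number of lags.
    """
    x = np.asarray(x, dtype=float)
    T = x.size
    sq = x ** 2
    d = sq - sq.mean()                      # x_t^2 - sigma_hat^2
    denom = np.sum(d ** 2)
    lags = np.arange(1, L + 1)
    # r_hat(k): sample autocorrelation of the squared series at lag k
    r = np.array([np.sum(d[k:] * d[:-k]) / denom for k in lags])
    # Box-Ljung form applied to the autocorrelations of the squares
    Q = T * (T + 2) * np.sum(r ** 2 / (T - lags))
    return Q, chi2.sf(Q, df=L)
```

The p-value returned here is the asymptotic one; a bootstrap significance level would instead compare Q with its values on resampled data.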

Engle LM Test

This test was proposed by Engle (1982) to detect ARCH disturbances; as Bollerslev (1985) suggests, it should also have power against GARCH alternatives. As with most Lagrange Multiplier tests, the test statistic itself is based on the $R^2$ of an auxiliary regression, in this case:

$$x_t^2 = \alpha_0 + \sum_{i=1}^{p}\alpha_i\,x_{t-i}^2 + \nu_t .$$

Under the null hypothesis of a linear generating mechanism for $x_t$, $TR^2$ for this regression is asymptotically distributed $\chi^2(p)$.

BDS Test

The BDS test is a nonparametric test for serial independence based on the correlation integral of the scalar series, $\{x_t\}$. For embedding dimension m, let $\{x_t^m\}$ denote the sequence of m-histories generated by $\{x_t\}$:

$$x_t^m = (x_t, \ldots, x_{t+m-1}).$$

Then the correlation integral $C_{m,T}(\epsilon)$ for a realization of length T is given by:

$$C_{m,T}(\epsilon) = \sum_{t<s} I_\epsilon\!\left(x_t^m, x_s^m\right)\frac{2}{T_m\,(T_m - 1)}$$

where $T_m = T - (m-1)$ and $I_\epsilon(x_t^m, x_s^m)$ is an indicator function which equals one if the sup norm $\|x_t^m - x_s^m\| < \epsilon$ and equals 0 otherwise. Brock, Dechert, and Scheinkman (1996) exploit the asymptotic normality of $C_{m,T}(\epsilon)$ under the null hypothesis that $\{x_t\}$ is an i.i.d. process to obtain a test statistic which asymptotically converges to a unit normal.

Tsay Test

The Tsay (1986) test explicitly looks for quadratic serial dependence.

Let $\hat z_t$ denote the projection of $z_t$ on the subspace orthogonal to $x_{t-1}, \ldots, x_{t-k}$, i.e., the residuals from a regression of $z_t$ on $x_{t-1}, \ldots, x_{t-k}$. And let the K = k(k-1)/2 column vectors $V_1 \ldots V_K$ contain all of the possible crossproducts of the form $x_{t-i}\,x_{t-j}$. Thus, $v_{t,1} = x_{t-1}^2$; $v_{t,2} = x_{t-1}x_{t-2}$; $v_{t,3} = x_{t-1}x_{t-3}$; $v_{t,k+1} = x_{t-2}x_{t-3}$; $v_{t,k+2} = x_{t-2}x_{t-4}$, and so forth.

Then estimate $\gamma_1 \ldots \gamma_K$ by applying OLS to the regression equation

$$x_t = \gamma_0 + \sum_{i=1}^{K}\gamma_i\,\hat v_{t,i} + \eta_t .$$

The Tsay test statistic is then just the usual F statistic for testing the null hypothesis that $\gamma_1 \ldots \gamma_K$ are all zero.
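Of the statistics described so far, the correlation integral behind the BDS test is perhaps the least familiar computationally, so a small sketch may help. It computes only the correlation integral for m-histories under the sup norm, not the full standardized BDS statistic; the function name `correlation_integral` is hypothetical, and the brute-force pairwise comparison is an assumption chosen for clarity rather than speed.

```python
import numpy as np


def correlation_integral(x, m, eps):
    """Fraction of pairs of m-histories (t < s) closer than eps in sup norm."""
    x = np.asarray(x, dtype=float)
    Tm = x.size - (m - 1)
    # Row t holds the m-history (x_t, ..., x_{t+m-1})
    hist = np.column_stack([x[i:i + Tm] for i in range(m)])
    # Sup-norm distance between every pair of m-histories
    dist = np.max(np.abs(hist[:, None, :] - hist[None, :, :]), axis=2)
    upper = np.triu_indices(Tm, k=1)        # index pairs with t < s
    return 2.0 * np.sum(dist[upper] < eps) / (Tm * (Tm - 1))
```

Under serial independence the correlation integral at dimension m should be close to the m-th power of its value at dimension 1, which is the comparison the BDS statistic standardizes.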

Hinich Bicovariance Test

This test assumes that $\{x_t\}$ is a realization from a third-order stationary stochastic process and tests for serial independence using the sample bicovariances of the data. The (r,s) sample bicovariance is defined as:

$$C_3(r,s) = (T-s)^{-1}\sum_{t=1}^{T-s} x_t\,x_{t+r}\,x_{t+s} .$$

Under the null hypothesis that $\{x_t\}$ is an i.i.d. process, Hinich and Patterson (1995) show that, for $R < T^{.5}$,

$$X_3 = \sum_{s=2}^{R}\sum_{r=1}^{s-1}\left[(T-s)^{.5}\,C_3(r,s)\right]^2$$

is asymptotically distributed $\chi^2(R[R-1]/2)$, one degree of freedom for each (r,s) pair in the double sum; they recommend using $R = T^{.4}$ since they find that the power of the test declines for smaller values of R.
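The bicovariance statistic can be sketched directly from its definition. This is an illustrative sketch under stated assumptions: the function name `bicovariance_test` is hypothetical, R defaults to $T^{.4}$ rounded down (the rounding rule is an assumption), and the degrees of freedom are taken to be the number of (r,s) pairs in the double sum, R(R-1)/2.

```python
import numpy as np
from scipy.stats import chi2


def bicovariance_test(x, R=None):
    """Return (statistic, asymptotic p-value) for the bicovariance test."""
    x = np.asarray(x, dtype=float)
    T = x.size
    if R is None:
        R = int(T ** 0.4)                   # e.g. T = 200 gives R = 8
    stat = 0.0
    for s in range(2, R + 1):
        for r in range(1, s):
            # C_3(r, s): average of x_t * x_{t+r} * x_{t+s} over t = 1..T-s
            c3 = np.sum(x[:T - s] * x[r:T - s + r] * x[s:]) / (T - s)
            stat += (T - s) * c3 ** 2
    df = R * (R - 1) // 2                   # number of (r, s) pairs summed
    return stat, chi2.sf(stat, df)
```

For T = 200 the default gives R = 8, which matches the value of R reported in the size and power tables.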

Hinich Bispectral Test

Suppose that {y(t)}, the series of interest, is a third-order stationary time series with, for expositional convenience, E[y(t)] = 0. The series {y(t)} might be serially correlated, in which case it is distinct from the prewhitened fitting error series denoted {x(t)} above. Letting $c_{yyy}(r,s)$ denote the third order cumulant function for {y(t)},

$$c_{yyy}(r,s) = E\left[\,y(t)\,y(t+r)\,y(t+s)\,\right],$$

the bispectrum of {y(t)} at frequency pair $(f_1, f_2)$ is its (double) Fourier transform:

$$B_y(f_1, f_2) = \sum_{r=-\infty}^{\infty}\sum_{s=-\infty}^{\infty} c_{yyy}(r,s)\,\exp\left[-i2\pi\,(f_1 r + f_2 s)\right].$$

$B_y(f_1, f_2)$ is a spatially periodic function of $(f_1, f_2)$, whose principal domain is the triangular set $S = \{0 < f_1 < ½,\ f_2 < f_1,\ 2f_1 + f_2 < 1\}$.⁵

The generating mechanism for {y(t)} is linear if and only if it can be expressed as

$$y(t) = \sum_{n=0}^{\infty} a(n)\,u(t-n),$$

where {u(t)} is a serially i.i.d. process and the weights {a(n)} are fixed. Letting $S_y(f)$ denote the spectrum of {y(t)} at frequency f and assuming that

$$\sum_{n=0}^{\infty} |a(n)| < \infty ,$$

Ashley, Patterson, and Hinich (1986) show that the squared skewness function,

$$\Psi^2(f_1, f_2) \equiv \frac{|B_y(f_1, f_2)|^2}{S_y(f_1)\,S_y(f_2)\,S_y(f_1 + f_2)},$$

is a constant for all frequency pairs $(f_1, f_2)$ in S whenever the process generating {y(t)} is linear.

Consequently, under the null hypothesis of a linear generating mechanism, sample estimates of $\Psi^2(f_1, f_2)$ for different frequency pairs will differ from one another no more than one would expect due to sampling error; this is the basis for the Hinich bispectral test.

Using an N-sample of data {y(0), y(1), ... y(N-1)} and letting Y(j/N) denote

$$Y(j/N) = \sum_{t=0}^{N-1} y(t)\,\exp\left[-\frac{i2\pi jt}{N}\right],$$

$F_x(j,k)$ provides an unbiased estimate of $B_y(2\pi j/N, 2\pi k/N)$, where

$$F_x(j,k) = Y(j/N)\,Y(k/N)\,Y^*\bigl((j+k)/N\bigr).$$

⁵ See Brillinger and Rosenblatt (1967) for a rigorous treatment of the bispectrum.

⁶ See Hinich (1982) and Ashley, Patterson, and Hinich (1986) for details. Based on simulation results in the latter paper, M is set to the integer closest to $N^{.55}$ in the calculations reported below.

However, just as the sample periodogram must be smoothed (averaged over its values at adjacent frequencies) in order to provide a consistent estimator of the spectrum, $S_y(f)$, $F_x(j,k)$ must be smoothed to obtain a consistent estimator of $B_y(2\pi j/N, 2\pi k/N)$. Hinich (1982) shows that, properly averaged over a square of $M^2$ adjacent values of $F_x(j,k)$, this smoothed estimator of the bispectrum yields an estimator of $\Psi^2(f_1, f_2)$ which is asymptotically distributed as a noncentral chi square variate with 2 degrees of freedom and a noncentrality parameter proportional to $\Psi^2(f_1, f_2)$. Under the null hypothesis of a linear generating mechanism, this estimator of $\Psi^2(f_1, f_2)$ should have a dispersion consistent with this noncentral chi squared distribution; this proposition is tested using standard results (e.g., David (1970)) on the asymptotic distribution of the interquartile range of a sample drawn from a given distribution.⁶
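The first step of the bispectral calculation, the raw (unsmoothed) triple product of discrete Fourier transforms, can be sketched with an FFT. This covers only that first step: the smoothing over $M^2$ adjacent values and the interquartile-range test are not implemented, and the function name `raw_bispectrum` is hypothetical.

```python
import numpy as np


def raw_bispectrum(y):
    """Return the N x N array of products Y(j/N) Y(k/N) conj(Y((j+k)/N))."""
    y = np.asarray(y, dtype=float)
    N = y.size
    # np.fft.fft uses the convention Y[j] = sum_t y(t) exp(-i 2 pi j t / N)
    Y = np.fft.fft(y)
    idx = np.arange(N)
    jj, kk = np.meshgrid(idx, idx, indexing="ij")
    # Y is periodic in its index, so (j + k) is reduced modulo N
    return Y[jj] * Y[kk] * np.conj(Y[(jj + kk) % N])
```

The array is symmetric in (j, k), which is one reason only a triangular principal domain of frequency pairs needs to be examined.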

⁷ Uniform and normal deviates were obtained using the RAN1 and GASDEV routines given by Press, et al. (1986), respectively. The adequacy of these pseudorandom deviates for the present purpose was confirmed by comparing our results to those obtained using IMSL subroutines and also by observing that our results converge to those obtained from asymptotic theory when the sample size is large. Student’s t deviates were generated from its definition, using normal deviates to generate a $\chi^2$ deviate, etc.

3. Data Generation Models

Each of the data generation models was simulated using serially i.i.d innovations. These

data were generated from the unit normal distribution and from three additional distributions.

Data was generated from the Student’s t distribution with 5 degrees of freedom to simulate the

effects of a leptokurtic (fat-tailed) distribution; data was generated from the exponential

distribution to simulate the effects of an asymmetric innovation distribution. And, to simulate the

effect of an innovation distribution for which finite second (and higher) moments do not exist, we

also used the exact algorithm of Kanter and Steiger (1974) to generate symmetric stable Paretian

innovations.⁷
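Generating Student's t deviates from the definition, as the footnote on deviate generation describes, amounts to dividing a normal deviate by the square root of an independent chi-squared deviate over its degrees of freedom. A minimal sketch follows, assuming NumPy's generator in place of the RAN1 and GASDEV routines; the function name `student_t_from_definition` is hypothetical.

```python
import numpy as np


def student_t_from_definition(n, df, rng):
    """Draw n Student's t(df) deviates as z / sqrt(chi_sq / df)."""
    z = rng.standard_normal(n)
    # A chi-squared(df) deviate is a sum of df squared unit normal deviates
    chi_sq = np.sum(rng.standard_normal((n, df)) ** 2, axis=1)
    return z / np.sqrt(chi_sq / df)
```

With df = 5 the deviates have variance 5/3, so they would need rescaling if unit-variance innovations are wanted.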

In a (perhaps quixotic) attempt to span the space of nonlinear processes considered in the

literature, the seven models given in Table 1 were simulated. The first nonlinear process listed is

the “pure GARCH” process. Although many nonlinear processes display conditional

heteroskedasticity, the GARCH(1,1) model simulated here (like all ARCH/GARCH processes) is

“pure” in that it is a martingale difference: by construction, only its variance is serially dependent.

The next process listed is a generic switching regression. A number of switching models have

been considered in the literature, e.g., Tong and Lim (1980), Hamilton (1989), and Teräsvirta

and Anderson (1992). Here, we have simulated a typical threshold autoregressive (SETAR)

model from Tong and Lim (1980) and a simple Markov switching model in the spirit of Hamilton

(1989).⁸ Finally, we consider several models suggested by the usual Volterra expansion. Two

quadratic models are considered, so as to include both a forecastable process and a martingale

process. Two cubic models were also considered. The “pure cubic” is included because of its

asymmetry; the “bicubic” is included so as to examine the sensitivity of the tests to more general

third order terms.

⁸ Several more sophisticated Markov switching models, estimated by Lam (1997) using U.S. real GNP data, are considered in Section 6.

Table 1. Summary of Data Generating Processes Considered⁹

Serially i.i.d. noise model:  $x_t = \epsilon_t$

Pure GARCH(1,1) model:  $x_t = (h_t)^{.5}\,\epsilon_t$,  $h_t = .011 + .12\,(x_{t-1})^2 + .85\,h_{t-1}$

Switching Models:

  SETAR:  $x_t = -.5\,x_{t-1} + \epsilon_t$ if $x_{t-1} < 1$;  $x_t = .4\,x_{t-1} + \epsilon_t$ otherwise

  Two State Markov:  $x_t = -.5\,x_{t-1} + \epsilon_t$ if in state 1;  $x_t = .4\,x_{t-1} + \epsilon_t$ if in state 2  (Remain in state with probability .90)

Quadratic models:

  martingale:  $x_t = \epsilon_t + .6\,\epsilon_t\,[\epsilon_{t-1} + .6\,\epsilon_{t-2} + .6^2\,\epsilon_{t-3} + .6^3\,\epsilon_{t-4} + .6^4\,\epsilon_{t-5}]$

  non-martingale:  $x_t = \epsilon_t + .6\,\epsilon_{t-1}\,[\epsilon_{t-2} + .6\,\epsilon_{t-3} + .6^2\,\epsilon_{t-4} + .6^3\,\epsilon_{t-5} + .6^4\,\epsilon_{t-6}]$

Cubic models:

  pure cubic:  $x_t = \epsilon_t + .2\,[\epsilon_{t-1}]^3$

  bicubic:  $x_t = \epsilon_t + .6\,\epsilon_{t-1}\,[(\epsilon_{t-2})^2 + .8\,(\epsilon_{t-3})^2 + .8^2\,(\epsilon_{t-4})^2 + .8^3\,(\epsilon_{t-5})^2]$

⁹ $\epsilon_t$ is iid(0,1) in all models. As noted in the text, the distributions for $\epsilon_t$ considered were: gaussian, Student’s t, exponential, and symmetric Paretian.
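Two of the generating processes in Table 1, the pure GARCH(1,1) model and the SETAR model, can be simulated as follows with gaussian innovations. This is a sketch under stated assumptions: the function names are hypothetical, the burn-in period of 200 observations and the starting values are choices made here and are not stated in the paper, and NumPy's generator stands in for the original random number routines.

```python
import numpy as np


def simulate_garch(n, rng, burn=200):
    """x_t = sqrt(h_t) e_t, h_t = .011 + .12 x_{t-1}^2 + .85 h_{t-1}."""
    total = n + burn
    eps = rng.standard_normal(total)
    x = np.zeros(total)
    h = np.zeros(total)
    h[0] = 0.011 / (1.0 - 0.12 - 0.85)      # unconditional variance as start
    x[0] = np.sqrt(h[0]) * eps[0]
    for t in range(1, total):
        h[t] = 0.011 + 0.12 * x[t - 1] ** 2 + 0.85 * h[t - 1]
        x[t] = np.sqrt(h[t]) * eps[t]
    return x[burn:]                          # drop the burn-in observations


def simulate_setar(n, rng, burn=200):
    """x_t = -.5 x_{t-1} + e_t if x_{t-1} < 1, else x_t = .4 x_{t-1} + e_t."""
    total = n + burn
    eps = rng.standard_normal(total)
    x = np.zeros(total)
    for t in range(1, total):
        coef = -0.5 if x[t - 1] < 1 else 0.4
        x[t] = coef * x[t - 1] + eps[t]
    return x[burn:]
```

A call such as `simulate_setar(200, np.random.default_rng(0))` returns one sample of the length used in the simulations reported below.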


4. An Examination of the Sensitivity of Empirical Test Size to the Distribution of the Data

Like most econometric procedures, the tests described above are only asymptotically

justified. Particular concern has been expressed about the validity of the BDS test for reasonable

sample sizes (e.g., Ramsey and Yuan (1987)) and, to some degree, addressed in Brock, et al.

(1991). More recently, de Lima (1997) has considered the behavior of a number of nonlinearity

tests where the moment restriction assumptions underlying the asymptotic distributions of these

tests are not satisfied, finding particular problems in situations involving leptokurtic (heavy-tailed)

data.

Because we share these concerns, we routinely bootstrap the significance levels of all the

tests used here, as well as computing significance levels based on asymptotic theory. This is very

straightforward. After pre-whitening, so that the data is serially i.i.d. under the null hypothesis of

a linear generating mechanism, we draw 1000 N-samples at random from the empirical

distribution of the observed N-sample of data. The bootstrap significance level for a given test is

then just the fraction of these 1000 “new” N-samples for which the test statistic exceeds that

observed in the sample data. It is simple enough to confirm that 1000 bootstrap replications is

sufficient by merely observing that the results are invariant to increasing this number; it is

distinctly less clear that N itself is sufficiently large: after all, the pre-whitening procedure and

the bootstrap are themselves only asymptotically justified.
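The bootstrap significance level described above can be sketched as follows. The function name `bootstrap_pvalue` and its `statistic` argument (any function mapping a sample to a test statistic for which larger values count against the null) are hypothetical; the input is assumed to be already prewhitened.

```python
import numpy as np


def bootstrap_pvalue(x, statistic, n_boot=1000, rng=None):
    """Fraction of resampled N-samples whose statistic exceeds the observed one."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    observed = statistic(x)
    exceed = 0
    for _ in range(n_boot):
        # Draw an N-sample with replacement from the empirical distribution;
        # resampling destroys any serial dependence, as the null requires
        resample = rng.choice(x, size=x.size, replace=True)
        if statistic(resample) > observed:
            exceed += 1
    return exceed / n_boot
```

Raising `n_boot` and checking that the result is stable is the invariance check mentioned in the text.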

Consequently, we examined the actual size of each test (using both asymptotic theory and

the bootstrap) using samples of 200 serially i.i.d. variates generated from each of four

distributions: gaussian, exponential, Student’s t with 5 degrees of freedom, and the symmetric

stable Paretian distribution with $\alpha$ = 1.93.¹⁰ The exponential distribution is quite asymmetric.

Both of the latter two distributions are heavy-tailed to the point where the symmetric stable

Paretian distribution with this index value has infinite variance.

¹⁰ Symmetric stable Paretian variates were simulated using the exact algorithm given by Kanter and Steiger (1974). The de Lima (1997) article arbitrarily considers $\alpha$ = 1.50; we chose $\alpha$ = 1.93 because this is the value Fama (1965) estimates for U.S. stock data.

¹¹ The fact that one of McLeod-Li bootstrap size estimates lies outside the 95% confidence interval around .05 is inconsequential in view of the number of estimates made. The bootstrap itself is only asymptotically justified; apparently the bispectral test is so ill-behaved with exponential or pareto data that samples larger than N = 200 are necessary.

The results of these calculations for N = 200 are given in Table 2. We observe that the

concerns about the small-sample validity of the tests (in particular, the BDS test) are justified, at

least at this sample length. In contrast, the bootstrap results appear to be satisfactory for all of the

tests, except for the Hinich bispectral test with the heavy-tailed distributions.¹¹ We conclude that

it is reasonable to proceed using the bootstrapped tests for samples of roughly this length or larger

without further concern about the form of the data’s distribution.

¹² Results significantly different from .05 are marked with an asterisk. All figures quoted are based on 1000 generated samples. The 5% critical region for each test was obtained using 1000 bootstrap replications. Under the null hypothesis that the actual size is .05, an (asymptotic) 95% confidence interval for these estimates is (.036, .064). The parameters L, p, m, k, R, and M are defined in Section 2, where each test is discussed.

Table 2. Empirical Size of 5% Tests¹²
Serially i.i.d. Data, 200 Observations

                     McLeod-Li  Engle LM  BDS     BDS     BDS     Tsay    Bicov.  Bispectral
                     L = 24     p = 5     m = 2   m = 3   m = 4   k = 5   R = 8   M = 11

Bootstrap
  Gaussian           .050       .059      .042    .049    .062    .064    .056    .054
  Student’s t(5)     .054       .056      .052    .050    .053    .044    .052    .063
  Exponential        .040       .053      .055    .053    .055    .055    .050    .006*
  Paretian α = 1.93  .039       .052      .051    .052    .048    .050    .035*   .027*

Asymptotic Theory
  Gaussian           .050       .052      .053    .053    .072*   .087*   .099*   .102*
  Student’s t(5)     .060       .040      .055    .089*   .077*   .085*   .032*   .122*
  Exponential        .050       .057      .061    .065*   .065*   .066*   .088*   .630*
  Paretian α = 1.93  .039       .036      .052    .054    .070*   .074*   .078*   .236*
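The confidence interval quoted in the note to Table 2 can be recovered from the usual binomial standard error of a rejection frequency estimated from 1000 generated samples. As a worked check under the normal approximation (the multiplier 1.96 is the standard two-sided 95% value, not a figure taken from the paper):

$$.05 \pm 1.96\sqrt{\frac{(.05)(.95)}{1000}} \approx .05 \pm .0135 ,$$

that is, roughly (.0365, .0635), which agrees with the quoted interval (.036, .064) to rounding.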


5. The Differential Power of the Tests Across the Alternatives:

Implications for Model Identification

In this Section we discuss our estimates of the power of each test against the various

alternative data generating processes discussed in Section 3. Our goal is to answer the following

questions:

1. Do one or more of the tests have high power against all of the alternative processes?

2. Is the pattern of power estimates across the alternative processes similar for all of the tests?

3. For a given generating process, are the results of all of the tests highly correlated across the

simulations?

If one of the tests dominates all the rest in terms of power, then this test is the one to use

as a “nonlinearity screening test” to, for example, routinely check the fitting errors of a proposed

model for a time series. On the other hand, since such a test has relatively high power against all

of the alternatives, it conveys very little information as to which kind of nonlinear model is

appropriate.

Paradoxically, such identifying information is only obtainable from tests whose

performance is uneven across the alternatives. In this context, a test conveys identifying

information to the extent that it is either particularly powerful or particularly unpowerful against a

limited subset of the alternatives.

Finally, we sought to examine the correlations between the results of the tests for a given

alternative. Here, again paradoxically, what is useful is a lack of consistency. For example, we

generated 250 samples of 200 observations from the SETAR model described in Section 3. If test


#1 rejects the null hypothesis of linearity at the 5% level for, say, 200 of these samples, then it has

higher power than test #2 which only rejects the null for 150 samples. But were most of these

150 samples among the 200 samples for which test #1 rejected, or not? If both tests reject over

basically the same samples, then test #2 is simply an inferior alternative to test #1. In contrast, if

this “rejection overlap” is small, then test #2 is sensitive to a different aspect of the data set than

test #1 and provides separately useful information as to whether or not to reject the null

hypothesis; in this case it is worthwhile to do both tests and perhaps combine them into a

portmanteau test.
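The "rejection overlap" between two tests can be tabulated from their significance levels across replications. A minimal sketch, in which the function name `rejection_overlap` is hypothetical and a test is taken to reject when its significance level is strictly below the nominal level (an assumption about the tie-breaking convention):

```python
import numpy as np


def rejection_overlap(p1, p2, level=0.05):
    """Cross-tabulate rejections of two tests over the same replications."""
    rej1 = np.asarray(p1) < level
    rej2 = np.asarray(p2) < level
    return {
        "both": float(np.mean(rej1 & rej2)),
        "only_first": float(np.mean(rej1 & ~rej2)),
        "only_second": float(np.mean(~rej1 & rej2)),
        "neither": float(np.mean(~rej1 & ~rej2)),
    }
```

A large "both" fraction relative to the weaker test's power indicates that the weaker test adds little information; sizeable "only" fractions indicate complementarity.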

Our estimated power results are given in Table 3 and in Tables 6 to 11 below. Table 3

summarizes the results for all seven generating processes discussed in Section 3, all driven by

gaussian innovations. Each of Tables 6 to 11 focuses on one generating process and compares

the power of the tests across the four innovation distributions considered.

No single test dominates all the others across all seven alternative generating processes.

However, the BDS test clearly stands out in terms of overall power against a variety of

alternatives: it has distinctly the highest power for the bicubic and quadratic processes and is a

close second for the GARCH and pure cubic processes. For the SETAR process the Tsay test

stands out, but even there the BDS test still exhibits reasonable power. We conclude that the BDS

test is the best test of this group for use as a “nonlinearity screening test.”

On the other hand, this same consistently high power across the alternatives also implies

that the BDS test conveys very little information as to what kind of nonlinear process generated

the data. Here it is inconsistent power against the alternatives that is useful. In this context the

Tsay test stands out as a possible marker for SETAR models in particular and for switching

models generally. The results given in Table 5 indicate that this result holds up across all four

innovation distributions but, obviously, this result needs to be confirmed across a variety of

different SETAR and other switching models.¹³

¹³ This result is confirmed using simulated data from the Potter (1995) SETAR model for U.S. real GNP; see Section 6 below.

Next we turn to the third question raised at the beginning of this Section. Generally

speaking, there appear to be few complementarities among the tests: when a particular test’s

power exceeds that of another against a given alternative, we typically find that the less powerful

test is rejecting the null hypothesis over basically the same sample replications in which the more

powerful test rejects the null also. However, the quadratic models with gaussian innovations

provided some exceptions to this result for the Bicovariance, Tsay, and Engle LM tests. For

example, a crosstabulation of the Bicovariance vs. Tsay test results for the non-martingale

quadratic process is given in Table 4 below. Clearly, neither test is particularly effective in this

instance compared to, say, the BDS test, but the Bicovariance test is rejecting on 27 of the 110

replications that the Tsay test “misses” and the Tsay test is rejecting on 47 of the 130 replications

that the Bicovariance test “misses.” However, Figure 1, a crossplot of all 250 generated

significance levels for these tests, shows that the results of these two tests are still rather highly

correlated. Consequently, we conclude that construction of portmanteau tests is probably not

worth pursuing.

¹⁴ All figures quoted are based on 250 generated samples. The 5% critical region for each test was obtained using 1000 bootstrap replications. Data generated from the cubic models were serially correlated, so the bootstrap was, in these cases, applied to the residuals from a prewhitening model. The parameters L, p, m, R, k, and M are defined in Section 2, where each test is discussed. BDS test results were calculated for $\epsilon$ equal to .5, 1, and 2 standard deviations; for brevity (and without much loss of information) results are quoted only for $\epsilon$ = 1. The generating models (GARCH, SETAR, etc.) are discussed in Section 3.

Table 3. Power Estimates of 5% Tests¹⁴
Gaussian Innovations, 200 Observations

                     McLeod-Li  Engle LM  BDS     BDS     BDS     Tsay    Bicov.  Bispectral
                     L = 24     p = 5     m = 2   m = 3   m = 4   k = 5   R = 8   M = 11

GARCH                .75        .72       .63     .76     .82     .43     .74     .32

Switching Models
  SETAR              .13        .16       .71     .66     .60     .85     .14     .23
  Markov             .16        .33       .57     .54     .51     .07     .15     .07

Quadratic
  martingale         .24        .40       .86     .90     .90     .34     .40     .25
  non-martingale     .51        .68       .73     .81     .82     .85     .84     .26

Cubic
  pure cubic         .44        .77       .77     .76     .71     .26     .36     .06
  bicubic            .65        .82       .96     1.00    1.00    .66     .76     .24


Table 4

Bicovariance Test Versus Tsay Test Crosstabulation

(Quadratic Non-Martingale Model with Gaussian Innovations)

Power for 5% Bicovariance Test .48

Power for 5%Tsay Test .56

Fraction of replications both tests reject null .37

Fraction Bicovariance test alone rejects .11

Fraction Tsay test alone rejects .19

Fraction neither test rejects .33
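The arithmetic linking Table 4 to the replication counts quoted in the text can be checked with a few lines. The sketch below uses only the fractions in Table 4 and the 250 replications; the variable names are illustrative. Because the tabulated fractions are rounded to two decimals, the implied "alone rejects" counts (27.5 and 47.5) agree with the quoted 27 and 47 only up to rounding.

```python
from fractions import Fraction

N_REPLICATIONS = 250

# Cell fractions from Table 4 (rounded to two decimals in the source).
both = Fraction(37, 100)
bicov_only = Fraction(11, 100)
tsay_only = Fraction(19, 100)
neither = Fraction(33, 100)

# The four cells of the crosstabulation must exhaust the replications.
assert both + bicov_only + tsay_only + neither == 1

# Marginal powers are row and column sums of the crosstabulation.
power_bicov = both + bicov_only
power_tsay = both + tsay_only

# Replications each test "misses", as quoted in the text.
tsay_misses = (1 - power_tsay) * N_REPLICATIONS
bicov_misses = (1 - power_bicov) * N_REPLICATIONS

# Fraction of replications in which at least one test rejects.
either = 1 - neither

print(float(power_bicov), float(power_tsay))
print(int(tsay_misses), int(bicov_misses), float(either))
```

The "either rejects" fraction is not the power of a valid 5% test: a rule that rejects when either component test rejects has size above 5% unless its critical regions are adjusted.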

Figure 1 [15]

Transformed Significance Levels, Quadratic Non-Martingale Data Generation Model

[Crossplot. Horizontal axis: Tsay Test Sig. Level. Vertical axis: Bicovariance Test Sig. Level.]

[15] The test significance levels in this figure are transformed with an inverse gaussian c.d.f. to spread the data out and make the graph more interpretable; e.g., the results in Table 4 correspond to the points in this figure for which one or both transformed significance levels exceeds two. The “bandedness” discernable at the upper and rightmost margins of this figure is an artifact due to the finite number (1000) of bootstrap replications done for each test.
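The transformation described in the note to Figure 1 can be sketched as follows. The orientation (applying the inverse Gaussian c.d.f. to one minus the significance level, so that strong rejections plot as large positive values) and the clipping of levels at half a bootstrap step are assumptions made for illustration; the source does not spell out either detail.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def transform(sig_level, n_boot=1000):
    """Map a bootstrap significance level to a Gaussian quantile.

    Levels are clipped away from 0 and 1 (an assumption) so that the
    inverse c.d.f. stays finite.
    """
    floor = 0.5 / n_boot
    clipped = min(max(sig_level, floor), 1.0 - floor)
    return STANDARD_NORMAL.inv_cdf(1.0 - clipped)


# Bootstrap significance levels are multiples of 1/n_boot, so the
# transformed values form a discrete grid.
tail_gap = transform(0.001) - transform(0.002)
middle_gap = transform(0.050) - transform(0.051)
print(f"gap near the margin: {tail_gap:.3f}")
print(f"gap near the 5% level: {middle_gap:.3f}")
```

Adjacent grid points are much farther apart in the extreme tail than near the 5% level, which is one way to see why bands become visible at the margins of the plot when each test uses a finite number of bootstrap replications.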


Table 5

Empirical Power of 5% Tests [16]

Data generated from GARCH(1,1) model, 200 Observations [17]

                       L = 24    p = 5    m = 2    m = 3    m = 4    k = 5    R = 5   M = 11
Gaussian                  .75      .72      .63      .76      .82      .43      .74      .32
Student’s t(5)            .47      .60      .57      .68      .77      .23      .46
Exponential               .78      .71      .93      .97      .98      .79      .93
Paretian α = 1.93         .91      .94      .97      .98      .98      .97      .98

[16] ... bootstrap replications. The parameters L, p, m, R, k, and M are defined in Section 2, where each test is discussed.

[17] See Section 3, Table 1.




Data generated from SETAR model, 200 Observations [19]

                       L = 24    p = 5    m = 2    m = 3    m = 4    k = 5    R = 8   M = 11
Gaussian                  .13      .16      .71      .66      .60      .85      .14      .23
Student’s t(5)            .07      .16      .67      .61      .59      .85      .08      .19
Exponential               .08      .14      .92      .97      .98      .83      .15      .07
Paretian α = 1.93         .06      .16      .57      .49      .41      .66      .16      .12




Data generated from Two State Markov model, 200 Observations [21]

                       L = 24    p = 5    m = 2    m = 3    m = 4    k = 5    R = 8   M = 11
Gaussian                  .16      .33      .57      .54      .51      .07      .15      .07
Student’s t(5)            .17      .40      .58      .56      .50      .10      .16
Exponential               .24      .44      .89      .92      .89      .30      .36
Paretian α = 1.93         .15      .32      .70      .65      .60      .14      .24




Data generated from Quadratic Martingale model, 200 Observations [23]

                       L = 24    p = 5    m = 2    m = 3    m = 4    k = 5    R = 8   M = 11
Gaussian                  .24      .40      .86      .90      .90      .34      .40      .25
Student’s t(5)
Exponential
Paretian α = 1.93




Data generated from Quadratic Non-Martingale model, 200 Observations [25]

                       L = 24    p = 5    m = 2    m = 3    m = 4    k = 5    R = 8   M = 11
Gaussian                  .51      .68      .73      .81      .82      .85      .84      .26
Student’s t(5)            .36      .56      .73      .81      .83      .79      .72      .22
Exponential               .18      .34      .44      .57      .63      .54      .44      .03
Paretian α = 1.93




Data generated from Pure Cubic model, 200 Observations [27]

                       L = 24    p = 5    m = 2    m = 3    m = 4    k = 5    R = 8   M = 11
Gaussian                  .44      .77      .77      .76      .71      .26      .36      .06
Student’s t(5)            .46      .80      .78      .78      .69      .30      .44
Exponential               .06      .13      .92      .90      .88      .96      .35
Paretian α = 1.93         .10      .18      .74      .74      .74      .50      .20




Data generated from Bicubic model, 200 Observations [29]

                       L = 24    p = 5    m = 2    m = 3    m = 4    k = 5    R = 8   M = 11
Gaussian                  .65      .82      .96     1.00     1.00      .66      .76      .24
Student’s t(5)            .73      .86      .96      .99     1.00      .66      .78
Exponential               .61      .80      .93      .98      .98      .90      .89
Paretian α = 1.93         .60      .78     1.00     1.00     1.00      .77      .82


6. Analysis of U.S. Real GNP

The battery of tests analyzed above was applied to the logarithmic growth rate of U.S. real

GNP over a sample of 163 quarters from 1953I to 1993III. These data are plotted in Figure 2

below; they appear to be reasonably stationary over this time period. The test results themselves

are given in Table 12.

As expected from results in Ashley and Patterson (1989) on the U.S. Index of Industrial

Production and results in Altug et al. (1995) and in Potter (1995) on real GNP itself, the null

hypothesis of a linear generating mechanism for this time series can be rejected at the 5% level. In

fact, this null hypothesis can be rejected at the 2% level using the Hinich bicovariance test.

The pattern of test results for this time series is quite interesting, however. For one thing,

the strongest rejection is provided by the bicovariance test, a test whose power is fairly low

relative to that of the other tests in most of our simulations. Note also that the BDS test rejects

the null hypothesis at the 3-5% level of significance for m = 3 and m = 4, but does not reject at all

for m = 2. This pattern also appears nowhere in our simulations.

Nowadays, real output is commonly modelled as a two-state switching process of one sort

or another. And the Tsay test, which has high relative power against the simple SETAR

alternative in Table 1, does reject the null hypothesis at the 3% level for the real data. But the

pattern of power results in Table 3 is quite different from the results in Table 12: for the simple

SETAR model, the Hinich bicovariance test has quite low power and the BDS test has relatively

high power for all three values of m. But perhaps data simulated from a SETAR model estimated

to fit U.S. real GNP data will yield a pattern of test results more in keeping with what is observed

from applying the tests to the data directly.

To examine this hypothesis we estimated the power of all six tests using 250 data sets

each of length N = 163 simulated from the SETAR model for U.S. real GNP estimated by

Potter (1995). These power estimates are quoted in the first row of Table 13. The pattern of

these power estimates is different from that obtained from the simple SETAR model; however,

this pattern is also quite different from what one would expect if this SETAR model were the

generating mechanism for real GNP. In particular, note that the McLeod-Li and Engle LM tests

have high power against this SETAR alternative, but do not reject the linear null hypothesis on

the actual real GNP data. And the BDS test has high power across all three embedding

dimensions for this SETAR alternative, whereas the BDS test rejects the null hypothesis only for

m = 3 and m = 4 using the actual data. [30] To assess the statistical significance of this discrepancy, we

applied the three BDS tests to 1000 data sets simulated from Potter’s estimated SETAR model:

only .2% of these data sets yielded BDS test results matching or exceeding the pattern of results

obtained with the actual data. (See Table 14 for details.) Evidently, the process generating U.S.

real GNP is nonlinear, but not a SETAR process.

[30] The Potter (1995) SETAR model is estimated over real GNP data from 1948III - 1990IV. In contrast, the results quoted in Table 12 follow the usual practice of truncating the sample to eliminate the Korean War period. (This practice arises because time plots indicate that it yields sample data which are more plausibly covariance stationary for most macroeconomic time series, including real GNP.) In any case, running the tests over the same sample period Potter used produces materially similar results, except that the Hinich bicovariance test’s significance level rises to .15, yielding a pattern even less consistent with the test power results obtained using data generated from Potter’s estimated model.

We next considered the possibility that the nonlinear process generating U.S. real GNP is

a Markov switching process. As with the SETAR alternative, the simple Markov switching

process considered in Sections 3 and 5 yields a pattern of estimated powers for the six tests that is

totally at odds with the pattern of significance levels obtained when the tests are applied to the

real GNP data itself, but this does not eliminate the possibility that a Markov switching model

estimated to fit these data might yield a good match.

The second and third rows of Table 13 give estimates of the power of each test using data

simulated from each of two Markov switching models for U.S. real GNP estimated by Lam

(1997) over the sample period 1952II - 1996IV. Except for the updated sample period, the first

of these models is identical to that of Hamilton (1989): the economy switches probabilistically

back and forth between a low growth rate state and a high growth rate state, with a fixed matrix

of state transition probabilities. The second model generalizes the Markov switching framework

to allow the mean growth rate and the matrix of state transition probabilities to depend on the

length of time the economy has been in its current state. [31]

[31] See Lam (1997) for details. The transition probabilities are also allowed to depend on the duration of the previous state.
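The constant-transition-probability switching mechanism can be illustrated with a minimal simulation. This is only a sketch of the switching idea, not the estimated Hamilton or Lam model: the autoregressive dynamics of the actual model are omitted, and every parameter value below is hypothetical, chosen for illustration rather than taken from either paper.

```python
import random


def simulate_markov_switching(n, mu_low, mu_high, p_stay_low,
                              p_stay_high, sigma, seed=0):
    """Simulate growth rates whose mean switches between two states.

    State 1 is the high growth state and state 0 the low growth state.
    The probability of remaining in each state is fixed over time.
    """
    rng = random.Random(seed)
    state = 1  # start in the high growth state (arbitrary choice)
    growth, states = [], []
    for _ in range(n):
        p_stay = p_stay_high if state == 1 else p_stay_low
        if rng.random() > p_stay:
            state = 1 - state  # switch to the other state
        mean = mu_high if state == 1 else mu_low
        growth.append(mean + rng.gauss(0.0, sigma))
        states.append(state)
    return growth, states


# Hypothetical parameter values, for illustration only.
growth, states = simulate_markov_switching(
    n=163, mu_low=-0.004, mu_high=0.010,
    p_stay_low=0.75, p_stay_high=0.90, sigma=0.008)
print(f"share of quarters in the low growth state: "
      f"{states.count(0) / len(states):.2f}")
```

A duration-dependent variant would replace the fixed `p_stay_low` and `p_stay_high` with functions of the number of periods spent in the current state.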

The pattern of test results obtained using data generated from these two estimated

Markov switching models is even more dissimilar to that observed using actual GNP data than

was the pattern obtained using data generated from the Potter SETAR model. The results in

Table 3 hinted that the nonlinearity tests might have some difficulty in detecting these kinds of

processes; evidently, these more realistic models exacerbate the problem. In any case, we see

that, if real GNP were in fact generated by one of these Markov switching models, with power

this low it would hardly be likely that the BDS, Tsay, and Hinich bicovariance tests would be

rejecting the linearity null hypothesis on the actual data. Indeed, the results collected in Table 14

indicate that less than .1% of 1000 data sets simulated from these two estimated models yield

BDS test results matching or exceeding those observed with the actual data.

These results demonstrate that the commonly held notion that real output is generated by

a two-state switching model of some sort is seriously in error. Indeed, our results indicate that

the true generating mechanism for U.S. real GNP is more complicated than (or, at least, different

from) all of the alternative generating mechanisms considered here.

Figure 2

U.S. Real GNP Versus Time, growth rate 1953I - 1993III

[Time plot. Vertical axis from -0.03 to 0.03; horizontal axis from 0 to 180.]

Table 12

Significance Levels for Nonlinearity Tests on U.S. Real GNP [32]

L = 24 p = 5 m = 2 m = 3 m = 4 k = 5 R = 7 M = 10

N = 163 (53I - 93III) .22 .53 .36 .19.05 .03 .03 .02

[32] Test results given in bold are rejections at the 5% level of the null hypothesis of a linear generating mechanism. The 5% critical region for each test was obtained using 1000 bootstrap replications. The parameters L, p, m, R, k, and M are defined in Section 2, where each test is discussed.



Table 13

Data generated from Estimated Models for U.S. Real GNP, 163 Observations

                       L = 24    p = 5    m = 2    m = 3    m = 4    k = 5    R = 7   M = 10
Simulated data from Potter (1995) SETAR model for U.S. Real GNP
                          .85      .92      .83      .88      .90      .96      .92      .20
Simulated data from Lam (1997) re-estimation of Hamilton Markov switching model for U.S. Real GNP
                          .04      .07      .09      .11      .11      .07      .06      .05
Simulated data from Lam (1997) estimated Markov switching model for U.S. Real GNP with duration dependent switching probabilities
                          .09      .11      .10      .10      .12      .12      .12      .10

Table 14

Percentage of 1000 Simulated Data Sets Yielding BDS Test Results at m = 2, 3, and 4

More Extreme than Those Observed with the Actual Real GNP Data [34]

Potter (1995) SETAR model                                        .2%
Lam (1997) re-estimate of Hamilton Markov switching model
  (constant transition probabilities)                          < .1%
Lam (1997) Markov switching model with duration dependent
  transition probabilities                                     < .1%

[34] Each generated data set is 163 observations in length to match the sample length of the actual data. Let s(m) denote the significance level at which the BDS test for embedding dimension m rejects linearity. Then “more extreme” in this context means s(2) ≥ .36, s(3) ≤ .05, and s(4) ≤ .03.
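The "more extreme" criterion in the note to Table 14 amounts to a simple counting rule, sketched below. The function names and the small list of significance-level triples are hypothetical, included only to show the rule at work; they are not simulation output from any of the estimated models.

```python
def more_extreme(s2, s3, s4):
    """Criterion from the note to Table 14: no rejection at m = 2 at
    least as weak as observed, and rejections at m = 3 and m = 4 at
    least as strong as observed."""
    return s2 >= 0.36 and s3 <= 0.05 and s4 <= 0.03


def fraction_more_extreme(results):
    """Share of simulated data sets whose BDS significance levels
    (s(2), s(3), s(4)) satisfy the criterion."""
    hits = sum(1 for s2, s3, s4 in results if more_extreme(s2, s3, s4))
    return hits / len(results)


# Hypothetical significance-level triples, for illustration only.
example = [
    (0.36, 0.05, 0.03),  # exactly matches the observed pattern
    (0.10, 0.01, 0.01),  # rejects at m = 2 as well, so not a match
    (0.50, 0.04, 0.02),  # more extreme in all three dimensions
    (0.40, 0.06, 0.01),  # too weak at m = 3
]
print(fraction_more_extreme(example))
```

With 1000 simulated data sets, a reported .2% corresponds to two data sets satisfying the criterion, and a reported value below .1% corresponds to none.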

7. Conclusions

The size and power of the McLeod-Li, Engle LM, BDS, Tsay, and Hinich

bicovariance/bispectral tests are examined above over a wide variety of data generating

mechanisms for samples of length 200. [35] We conclude that:

(1) At this sample length, bootstrapping is necessary (and sufficient) in order for the tests to be

properly sized.

(2) Of the tests considered, the BDS test has relatively high power against all of the alternatives,

making it a reasonable choice as a “nonlinearity screening test” for routine use.

(3) The test results appear to be quite highly correlated with one another: based on these results

we see little potential benefit in attempting to combine them into a portmanteau test.

(4) Excluding the BDS test, the remaining tests are quite inconsistent in their power across the

various alternatives considered. Some of the tests (e.g., McLeod-Li) are simply erratic.

Notably, however, the Tsay test appears to have relatively high power against SETAR

alternatives. More simulations need to be done to confirm this, but we tentatively

conclude that observation of a noticeably stronger rejection by the Tsay test than by the

BDS tests should be taken as evidence in favor of a SETAR generating mechanism.

[35] Additional sample lengths (e.g., 100 and 400) will be examined as this project develops.

Applying the battery of tests to actual data on real U.S. GNP, we find persuasive evidence that

the generating mechanism for this time series is nonlinear. The pattern of the test results is quite

unlike anything we observe in our simulations, however. In particular, on the actual data, we find

that the Hinich bicovariance test rejects the null hypothesis more strongly than any of the other

tests. [36] And we find that the BDS test rejects the null hypothesis only at embedding dimensions

greater than 2 on the actual data.

Taking the observed pattern of nonlinearity test results as a new “stylized fact” about U.S.

real GNP, we examine whether estimated SETAR and Markov switching models in the literature

are consistent with this pattern of test results obtained from the data itself. We find that data

generated from these SETAR and Markov switching models estimated to fit U.S. real GNP data

yield patterns of test results which are significantly different from the pattern observed in applying

the tests to the sample data itself. [37] We conclude that the commonly held belief that some sort of

regime switching process is an adequate representation of the true generating process for U.S.

real GNP is most likely seriously in error.

[36] This result suggests that the Hinich bicovariance test may be substantially more useful in practice than our results on simulated data in Table 3 indicate.

[37] We have not yet considered data generated from a STAR model (as in Teräsvirta and Anderson (1992)), but it will be surprising if those calculations materially affect this conclusion.


References

Altug, S., Ashley, R., and Patterson, D. M. (1995) "Are Technology Shocks Nonlinear?"

Virginia Tech Economics Department Working Paper Number E95-55.

Ashley, R. and Patterson, D. M. (1989). “Linear Versus Nonlinear Macroeconomies”

International Economic Review 30, 685-704.

Ashley, R., Patterson, D. M. and Hinich, M. (1986). “A Diagnostic Test for Nonlinear Serial

Dependence in Time Series Fitting Errors” Journal of Time Series Analysis 7, 165-78.

Bollerslev, Tim (1986) “Generalized Autoregressive Conditional Heteroskedasticity” Journal of

Econometrics 31, 307-27.

Brillinger, D. and M. Rosenblatt (1967) “Asymptotic Theory of kth Order Spectra” in Spectral

Analysis of Time Series, (B. Harris, ed.) Wiley: New York, pp. 153-88.

Brock, W. A., Hsieh, D. A., and LeBaron, B.D. (1991) A Test of Nonlinear Dynamics, Chaos,

and Instability: Theory and Evidence MIT Press: Cambridge.

Brock, W. A., Dechert W., and Scheinkman J. (1996) “A Test for Independence Based on the

Correlation Dimension” Econometric Reviews 15, 197-235.

David, H. A. (1970) Order Statistics Wiley: New York.

Engle, Robert F. (1982) “Autoregressive Conditional Heteroskedasticity with Estimates of the

Variance of United Kingdom Inflation” Econometrica 50, 987-1007.

Fama, E. F. (1965) “The Behavior of Stock Market Prices” Journal of Business 38, 34-105.

Grandmont, J. M. (1985) “On Endogenous Competitive Business Cycles” Econometrica 53,

995-1045.

Granger, C. W. J. and Andersen, A. A. (1978) An Introduction to Bilinear Time Series Models

Vandenhoeck and Ruprecht: Gottingen.

Hall, R. (1990) “Invariance Properties of Solow’s Productivity Residual” in P. Diamond (ed.)

Growth/Productivity/Employment MIT Press: Cambridge.

Hamilton, James (1989) “A New Approach to the Economic Analysis of Non-Stationary Time

Series and the Business Cycle” Econometrica 57, 357-84.

Hicks, J. R. (1950) A Contribution to the Theory of the Trade Cycle Oxford University Press:

Oxford.

Hinich, M. (1982) “Testing for Gaussianity and Linearity of a Stationary Time Series” Journal of

Time Series Analysis 3, 169-76.

Hinich, M. (1996) “Testing for Dependence in the Input to a Linear Time Series Model” Journal

of Nonparametric Statistics 6, 205-221.

Hinich, M. and Patterson D. M. (1985) “Evidence of Nonlinearity in Daily Stock Returns”

Journal of Business and Economic Statistics 3, 69-77.

Hinich, M. and Patterson D. M. (1995) “Detecting Epochs of Transient Dependence in White

Noise,” unpublished manuscript.

Judge, G. G., Griffiths, W. E., Hill, R. C., Lütkepohl, H., and Lee, T. C. (1985) The Theory and Practice

of Econometrics John Wiley and Sons: New York.

Kanter, M. and Steiger W. L. (1974) “Regression and Autoregression with Infinite Variance”

Advances in Applied Probability 6, 768-83.


Lam, P. (1997) “A Markov Switching Model of GNP Growth With Duration Dependence”

(unpublished manuscript).

de Lima, P. J. F. (1997) “On the Robustness of Nonlinearity Tests to Moment Condition Failure”

Journal of Econometrics 76, 251-80.

McLeod, A. I. and Li, W. K. (1983) “Diagnostic Checking ARMA Time Series Models Using

Squared-Residual Autocorrelations” Journal of Time Series Analysis 4, 269-73.

Palm, F. C. and Pfann, G. A. (1997) “Sources of Asymmetry in Production Factor Dynamics”

Journal of Econometrics 82, 361-92.

Potter, S. (1995) “A Nonlinear Approach to U.S. GNP” Journal of Applied Econometrics 10,

109-125.

Press, W. H., Flannery, B. P., Teukolsky, S. A., and Vetterling, W. T. (1986) Numerical

Recipes: The Art of Scientific Computing. Cambridge University Press: Cambridge.

Ramsey, J. B. and Yuan, H. J. (1987) “The Statistical Properties of Dimension Calculations

Using Small Data Sets” New York University Economic Research Reports: RR 87-20,

53-63.

Subba Rao, T. and Gabr, M. (1980) “A Test for Linearity of Stationary Time Series”

Journal of Time Series Analysis 1, 145-58.

Teräsvirta, T. and Anderson, H. (1992) “Modelling Nonlinearities in Business Cycles Using

Smooth Transition Autoregressive Models.” Journal of Applied Econometrics 7, 119-36.

Tsay, Ruey S. (1986) “Nonlinearity Tests for Time Series” Biometrika 73, 461-6.
