Testing for parameter constancy in non-Gaussian time series

Lu Han^a and Brendan McCabe^{a,*,†}

This paper investigates testing for parameter constancy in models for non-Gaussian time series. Models for discrete valued count time series are investigated as well as more general models with autoregressive conditional expectations. Both sup-tests and CUSUM procedures are suggested depending on the complexity of the model being used. The asymptotic distribution of the CUSUM test is derived for a general class of conditional autoregressive models.

Keywords: Non-Gaussian Time Series; Discrete Valued Count Time Series; sup-Test; CUSUM Test.

1. INTRODUCTION

This paper is concerned with testing for parameter constancy in models for nonlinear and non-Gaussian time series. One of the motivations of the paper is that there exist difficulties in applying some of the more popular techniques, like the likelihood based sup-tests introduced by Andrews (1993), to nonlinear non-Gaussian models. This is essentially because the nonlinearity and the possibility of large numbers of parameters (often associated with lags) make computation of the required critical values quite difficult. Accordingly, we seek a simplified approach that may be applied generally, even in the absence of a likelihood.

The paper is organised as follows. Section 2 discusses a set of nonlinear non-Gaussian models, the integer autoregressive (INAR) class, that are useful in modelling dependence in discrete count data, and a maximum score statistic (Andrews, 1993) is suggested to test for a structural break. It turns out that it is impossible to compute critical values for sup-tests unless the chosen model in the INAR class is comparatively simple. This suggests the need for a general technique to test for parameter stability that may be applied not only to discrete valued time series but to a general class of nonlinear and non-Gaussian processes. Section 3 uses a generalised conditional autoregressive (CAR) framework which extends the linear setup of Grunwald et al. (2000) (GHTT). According to GHTT, even the simple conditional linear structure accounts for over 30 different nonlinear non-Gaussian time series models published in the literature. Included are models where $Y_t$ is continuously valued and defined on the real line or on subsets thereof. Also included are models for discrete $Y_t$ and, in particular, the INAR class. We advocate the use of the two-sided CUSUM test for parameter constancy, proposed by Brown et al. (1975) (BDE), in the CAR class. Section 4 describes the CUSUM test and its asymptotic distribution is established in Section 5. These results allow the CAR class to be checked for parameter stability in a very simple way. A Monte Carlo study is conducted in Section 6 to assess the performance of the CUSUM test when the data is generated by two different discrete valued INAR models from the CLAR class, the Poisson autoregression (INAR-P) and Negative-Binomial autoregression (INAR-NB) models respectively. Of course, there is the potential for the CUSUM test based on the conditional mean to suffer a loss of power relative to a sup-test for a model with a fully specified likelihood. A further Monte Carlo simulation compares the performance of the CUSUM and sup-tests in the INAR-P model, as it is feasible to compute critical values in this case. In contrast, due to computational difficulties, critical values for the INAR-NB model are not presented. A dataset consisting of 144 count observations on motor vehicle thefts is analysed in Section 7, where a structural break is detected and a comparison is made between forecasts derived from a naive use of the full data and those computed from the post break segment alone. The two sets of forecasts differ appreciably and this is true both for point forecasts and probability estimates of the forecast distribution. Thus, ignoring a structural break may have a substantial bearing on the forecast outcomes for count data series. Section 8 concludes.

2. MODELS FOR COUNT TIME SERIES

The general INAR class is a set of models for discrete valued time series that has the form

$$Y_t = \phi_1 \circ Y_{t-1} + \cdots + \phi_p \circ Y_{t-p} + \varepsilon_t, \tag{1}$$

^a University of Liverpool.
^* Correspondence to: Brendan McCabe, Management School, University of Liverpool, Chatham Building, Chatham Street, Liverpool, L69 7ZH, UK.
^† E-mail: [email protected]

Original Article. J. Time Ser. Anal. 2012. © 2012 Blackwell Publishing Ltd.

First version received November 2011. Published online in Wiley Online Library: 2012.

where, conditional on $Y_{t-k}$, the $\phi_k \circ Y_{t-k}$ are thinning operators (see Section 2.1) and $\varepsilon_t$ is a disturbance. The disturbance may follow any discrete valued process. It is possible, asymptotically, to estimate the infinite dimensional distribution of $\varepsilon_t$ non-parametrically (see Drost et al. (2009) and McCabe et al. (2011)). However, it is clearly very difficult to simulate critical values for a sup-test in this context; as a result, a parametric model for the disturbances is required. Here, we list the basic properties of two of the most common parametric discrete valued models in the case where there is a single lag. The Poisson model is discussed in Section 2.1 while Section 2.2 deals with the Negative-Binomial case.

2.1. The INAR(1)-P model

The INAR(1)-P model was originally proposed by McKenzie (1985) and Al-Osh and Alzaid (1987) and is specified by the equation

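$$Y_t = \mathrm{Bin}(Y_{t-1}, \alpha) + \xi_t(\lambda), \tag{2}$$

where $\mathrm{Bin}(Y_{t-1}, \alpha)$ is, conditional on $Y_{t-1}$, a Binomial distribution with probability of success $\alpha$ and the $\xi_t(\lambda)$ are independently and identically distributed (i.i.d.) Poisson arrivals with parameter $\lambda$, denoted $\mathrm{Pois}(\lambda)$. This model is often written in the thinning notation

$$Y_t = \alpha \circ Y_{t-1} + \xi_t(\lambda),$$

where

$$\alpha \circ Y_{t-1} = \sum_{j=1}^{Y_{t-1}} B_{jt}$$

and each collection $\{B_{jt}\}_{j=1}^{Y_{t-1}}$ is a set of independent Bernoulli random variables with parameter $\alpha$. The mass function of $\mathrm{Bin}(Y_{t-1}, \alpha)$ is given by

$$f(s \mid Y_{t-1}, \alpha) = \binom{Y_{t-1}}{s} \alpha^{s} (1-\alpha)^{Y_{t-1}-s}$$

and that of $\xi_t(\lambda)$ is written

$$g(s; \lambda) = \frac{e^{-\lambda}\lambda^{s}}{s!}.$$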
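The thinning representation lends itself to direct simulation. The following is a minimal sketch, not the authors' code: the function names, the burn-in length and the Poisson sampler (Knuth's multiplication method, adequate for small $\lambda$) are illustrative assumptions, and only the Python standard library is used.

```python
import math
import random

def poisson_draw(lam, rng):
    """One Poisson(lam) draw by Knuth's multiplication method (assumed sampler)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def thin(y, alpha, rng):
    """Binomial thinning alpha o y: a sum of y independent Bernoulli(alpha) draws."""
    return sum(1 for _ in range(y) if rng.random() < alpha)

def simulate_inar1_poisson(T, alpha, lam, y0=0, burn_in=200, seed=0):
    """Simulate Y_t = alpha o Y_{t-1} + xi_t(lam); burn_in is a hypothetical choice."""
    rng = random.Random(seed)
    y, path = y0, []
    for t in range(T + burn_in):
        y = thin(y, alpha, rng) + poisson_draw(lam, rng)
        if t >= burn_in:
            path.append(y)
    return path

path = simulate_inar1_poisson(T=2000, alpha=0.5, lam=1.0)
sample_mean = sum(path) / len(path)  # should be near lam / (1 - alpha) = 2
```

Taking expectations in eqn (2) under stationarity gives a mean of $\lambda/(1-\alpha)$, which offers a quick sanity check on the simulated path.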

By the usual convolution arguments

$$p(Y_t \mid Y_{t-1}) = \sum_{s=0}^{\min(Y_t,\, Y_{t-1})} \binom{Y_{t-1}}{s} \alpha^{s} (1-\alpha)^{Y_{t-1}-s}\, \frac{e^{-\lambda}\lambda^{Y_t-s}}{(Y_t-s)!} \tag{3}$$

from which the likelihood, conditional on the initial observation, can be constructed.

The notation $(\alpha_t, \lambda_t)$ is used to indicate parameter instability when $\alpha$ and $\lambda$ are allowed to vary. In the case of a single structural break at a time $T_B$, we set

$$\begin{cases} \alpha_t = \alpha \ \text{and}\ \lambda_t = \lambda & \text{for } t = 1, \dots, T_B, \\ \alpha_t = \alpha + \Delta\alpha \ \text{and}\ \lambda_t = \lambda + \Delta\lambda & \text{for } t = T_B + 1, \dots, T. \end{cases} \tag{4}$$

The break point $T_B$ is assumed to lie in an interval, $T_B \in (T_L, T_U)$, and often the interval $(T_L, T_U)$ is expressed as a fraction of the sample size and denoted $\Pi$. A minor modification of the argument leading to eqn (3) allows for the likelihood to be constructed under a structural break. Specifically, the log likelihood is given by

$$\ell = \log \prod_{t=2}^{T} \sum_{s=0}^{\min(Y_t,\, Y_{t-1})} \binom{Y_{t-1}}{s} \alpha_t^{s} (1-\alpha_t)^{Y_{t-1}-s}\, \frac{e^{-\lambda_t}\lambda_t^{Y_t-s}}{(Y_t-s)!}. \tag{5}$$
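As an illustration of eqns (3) to (5), the sketch below evaluates the transition probability and the conditional log likelihood with an optional single break. It is a hedged example rather than the implementation used in the paper; the function names and the convention that `t_break` is a 1-based observation index are assumptions made here.

```python
import math

def transition_prob(y_t, y_prev, alpha, lam):
    """p(Y_t | Y_{t-1}) of eqn (3): Binomial survivors convolved with Poisson arrivals."""
    total = 0.0
    for s in range(min(y_t, y_prev) + 1):
        survivors = math.comb(y_prev, s) * alpha**s * (1.0 - alpha)**(y_prev - s)
        arrivals = math.exp(-lam) * lam**(y_t - s) / math.factorial(y_t - s)
        total += survivors * arrivals
    return total

def log_likelihood(y, alpha, lam, d_alpha=0.0, d_lam=0.0, t_break=None):
    """Log likelihood (5), conditional on y[0]; parameters shift for t > t_break as in (4)."""
    if t_break is None:
        t_break = len(y)  # no break inside the sample
    ll = 0.0
    for idx in range(1, len(y)):
        t = idx + 1  # 1-based observation number
        shifted = t > t_break
        a_t = alpha + d_alpha if shifted else alpha
        lam_t = lam + d_lam if shifted else lam
        ll += math.log(transition_prob(y[idx], y[idx - 1], a_t, lam_t))
    return ll

counts = [2, 1, 3, 0, 2, 4]  # made-up illustrative data
ll_null = log_likelihood(counts, 0.3, 1.5)
ll_break = log_likelihood(counts, 0.3, 1.5, d_alpha=0.1, d_lam=0.5, t_break=3)
```

Setting `d_alpha` and `d_lam` to zero recovers the likelihood under the null, which is the form needed to obtain the restricted estimates.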

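The null hypothesis is $H_0: (\Delta\alpha, \Delta\lambda) = 0$ and the alternative, $H_1$, is that, after a fixed point $T_B$ (fraction $\pi$), $(\Delta\alpha, \Delta\lambda) \neq 0$. The score function for the parameter vector $\theta = [\Delta\alpha, \Delta\lambda, \alpha, \lambda]'$ is

$$\dot\ell_\theta = [\dot\ell_{\Delta\alpha},\ \dot\ell_{\Delta\lambda},\ \dot\ell_{\alpha},\ \dot\ell_{\lambda}]',$$

where $\dot\ell_\theta$ denotes differentiation of the log likelihood given in eqn (5) with respect to the vector $\theta$, whose elements are specified by eqn (4). The symmetric information matrix is written

$$\ddot\ell(\theta) = - \begin{bmatrix} \ddot\ell_{\Delta\alpha\Delta\alpha} & \ddot\ell_{\Delta\alpha\Delta\lambda} & \ddot\ell_{\Delta\alpha\alpha} & \ddot\ell_{\Delta\alpha\lambda} \\ & \ddot\ell_{\Delta\lambda\Delta\lambda} & \ddot\ell_{\Delta\lambda\alpha} & \ddot\ell_{\Delta\lambda\lambda} \\ & & \ddot\ell_{\alpha\alpha} & \ddot\ell_{\alpha\lambda} \\ & & & \ddot\ell_{\lambda\lambda} \end{bmatrix}$$

with each element of $\ddot\ell(\theta)$, $\ddot\ell_{(\cdot)(\cdot)}$, denoting a second derivative with respect to the elements of $\theta$. The restricted parameter vector under the null, $\theta_R$, is estimated by $\hat\theta_R = [0, 0, \hat\alpha, \hat\lambda]'$. The estimators $\hat\alpha$ and $\hat\lambda$ are obtained by maximising the log likelihood (5) under $(\Delta\alpha, \Delta\lambda)' = 0$. The score statistic $S_{\Delta\alpha\Delta\lambda}(\pi)$ is evaluated as

$$S_{\Delta\alpha\Delta\lambda}(\pi) = \dot\ell_\theta(\hat\theta_R)'\, \ddot\ell(\hat\theta_R)^{-1}\, \dot\ell_\theta(\hat\theta_R).$$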

By constructing such a score statistic for every $T_B \in (T_L, T_U)$ we compute the sup-test as the maximum of that set, written $\sup S(\pi) = \sup_{\pi} S_{\Delta\alpha\Delta\lambda}(\pi)$. Obvious special cases occur when $\alpha$ only or $\lambda$ only is thought to change. Explicit expressions for the derivatives involved are available in Han (2011). For the INAR(1)-P model it is feasible to construct asymptotic critical values for the sup-test by simulation because, despite the nonlinearity, there are only two parameters involved under the null.

Once $\Pi$ and the dimension of the shift $\Delta$ are determined, the critical values of the distribution of $\sup S(\pi)$ can be evaluated by Monte Carlo experiments. We set $\Pi = [0.1, 0.9]$ and $T = 3000$. In addition, 7000 replications are used to determine critical values at the 0.01, 0.05 and 0.1 significance levels. The simulated critical values for $\sup S(\pi)$ are given in the first half of Table 1 for three cases: where $\alpha$ changes while $\lambda$ is held constant, where $\lambda$ changes while $\alpha$ is held constant and, finally, where both $\alpha$ and $\lambda$ are thought likely to change. We computed confidence intervals for the quantiles of the distribution of $\sup S(\pi)$ using asymptotic normality and a nonparametric estimate of the unknown density as in Serfling (1980), Section 2.6.

2.2. The INAR(1)-NB model

The specification of the INAR(1)-NB model is given by

$$Z_t = \beta\text{-Bin}(Z_{t-1}, m, n) + \zeta_t(\mu, \beta), \tag{6}$$

where, conditional on $Z_{t-1}$, $\beta\text{-Bin}(Z_{t-1}, m, n)$ has a Beta-Binomial distribution. The Beta-Binomial mass function, on $s = 0, 1, \dots, Z_{t-1}$, is

$$\beta\text{-Bin}(s \mid Z_{t-1}, m, n) = \binom{Z_{t-1}}{s} \frac{B(m+s,\ n+Z_{t-1}-s)}{B(m, n)},$$

where $B(\cdot, \cdot)$ is the Beta function. The Negative-Binomial arrivals $\zeta_t(\mu, \beta)$ are i.i.d. and have mass function $N(s)$, on $s = 0, 1, 2, \dots$, given by

$$N(s) = \frac{\Gamma(s+\mu)}{\Gamma(\mu)\,\Gamma(s+1)}\, \beta^{s} (1-\beta)^{\mu},$$

where $\Gamma(\cdot)$ is the Gamma function. It is also possible to think of the Negative-Binomial model as an INAR-P model where $\alpha$ and $\lambda$ are random with Beta and Gamma distributions respectively. Thus, the Negative-Binomial model may be thought of as a very natural overdispersed generalisation of the Poisson. As before, the likelihood under a structural break may be computed and the relevant score and information matrix are available in Han (2011). Unfortunately, it is not feasible to readily compute simulated critical values for the maximum value of the score statistic for this model because of the number of parameters and the nonlinearities involved.

3. THE CAR CLASS OF MODELS

Due to the potential difficulty in computing critical values for sup-statistics and the need for tests of parameter stability for nonlinear non-Gaussian models in general, this section adopts a conditional autoregressive approach which encompasses many existing models in the literature. In Section 3.1, the scope of the CAR class is outlined and its basic properties are established. Then, Section 3.2 reviews the Conditional Least Squares (CLS) framework and the assumptions of Klimko and Nelson (1978) (KN), which allow the asymptotic distribution theory of CLS estimators of the CAR class to be established.

3.1. Definition and Basic Properties

Let $Y_t$ be a process in discrete time with a continuous or discrete sample (state) space $\mathcal{Y} \subseteq \mathbb{R}$, the real line. The CAR structure is defined by the relation

$$E[Y_t \mid \mathcal{F}_{t-1}] = g(\theta; \mathcal{F}_{t-1}), \tag{7}$$

where $\mathcal{F}_{t-1}$ is a sigma field at time $t-1$ and $\theta$ is a $P$-dimensional vector of parameters. The function $g(\cdot)$ is assumed to be known and this is an important consideration in applications where such prior knowledge is assumed.

Table 1. Estimated critical values for the maximum score test, $\sup S(\pi)$, with confidence intervals in parentheses (replications = 7000), and critical values for the two-sided CUSUM test

| Size | $\sup S(\pi)$: $\Delta\alpha$ | $\sup S(\pi)$: $\Delta\lambda$ | $\sup S(\pi)$: $(\Delta\alpha, \Delta\lambda)$ | Two-sided CUSUM |
|---|---|---|---|---|
| 0.01 | 12.5888 (11.3303, 13.8472) | 12.5344 (11.3868, 13.6820) | 16.4437 (15.5823, 17.3052) | 1.6276221 |
| 0.05 | 9.1364 (8.8657, 9.4071) | 9.1025 (8.8301, 9.3749) | 12.4814 (12.1665, 12.7965) | 1.3581015 |
| 0.10 | 7.5561 (7.3994, 7.7126) | 7.5695 (7.3958, 7.7432) | 10.6294 (10.4511, 10.8078) | 1.2238734 |

The conditional linear special case, the CLAR($p$) model, has

$$g(\phi; \mathcal{F}_{t-1}) = \phi_0 + \phi_1 Y_{t-1} + \cdots + \phi_p Y_{t-p} = \phi_0 + y_{t-1}'\phi. \tag{8}$$

GHTT studied the 1-dimensional linear case, that is eqn (8) with $p = 1$. The CLAR($p$) class of models itself is very broad and, as GHTT have pointed out, contains many of the nonlinear and non-Gaussian models that have been proposed in the literature. For instance, continuously valued autoregressive (AR) models of the form

$$Y_t = \phi_1 Y_{t-1} + \cdots + \phi_p Y_{t-p} + \varepsilon_t(\phi_0), \tag{9}$$

where the $\varepsilon_t(\phi_0)$ are i.i.d. with finite mean $\phi_0$, are included. Any continuous random variable may serve as the disturbance; for example, $\varepsilon_t(\phi_0)$ may be Gaussian on $\mathbb{R}$ or exponentially distributed on $\mathcal{Y} = [0, \infty)$. Switching models are allowed, so that $\varepsilon_t(\phi_0)$ could be zero with probability $p$ and exponential, say, with probability $(1-p)$. One immediate consequence of the CLAR($p$) definition (8) is that the correlation structure of the CLAR($p$) class is the same as that of the AR($p$) model (9). This generalises the $p = 1$ result of GHTT. The CLAR($p$) class includes an extension of eqn (9) that allows for random coefficients, that is

$$Y_t = \phi_{1,t} Y_{t-1} + \cdots + \phi_{p,t} Y_{t-p} + \varepsilon_t(\phi_{0,t}), \tag{10}$$

where the $\phi_{i,t}$ are random variables with $E[\phi_{i,t}] = \phi_i$. For example, in the case where $p = 1$, we can let $\phi_{1,t}$ have a suitably chosen Beta distribution and $\varepsilon_t(\phi_0)$ a Gamma distribution, resulting in the marginal distribution of $Y_t$ being Gamma. Such models are useful in modelling duration data which are dependent.

The general INAR class as given in eqn (1) is also included as a special case of the CLAR($p$) process with $\mathcal{Y} = \{0, 1, 2, \dots\}$. The INAR-P model of Section 2.1 uses a thinning function built on the Binomial distribution with the arrivals process, $\varepsilon_t(\phi_0)$, being Poisson. The conditional expectation has the form

$$E[Y_t \mid \mathcal{F}_{t-1}] = \alpha Y_{t-1} + \lambda.$$

The INAR-NB model of Section 2.2 uses Beta-Binomial thinning and Negative-Binomial arrivals. The conditional expectation is

$$E[Z_t \mid \mathcal{F}_{t-1}] = \frac{m}{m+n} Z_{t-1} + \frac{\mu\beta}{1-\beta} \tag{11}$$

and is linear in $Z_{t-1}$ with $\phi_1 = m/(m+n)$ and $\phi_0 = \mu\beta/(1-\beta)$. The parameters $\phi_k$ in eqn (1) are also permitted to be random variables, an example of which, given in Section 2.2, was used to motivate the INAR-NB model.
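Eqn (11) can be checked numerically from the two mass functions of Section 2.2. The sketch below is illustrative only: the function names, the value of `z_prev` and the truncation of the Negative-Binomial sum at 400 terms are assumptions, while the parameter values are the INAR(1)-NB estimates reported in Table 6.

```python
import math

def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_binomial_pmf(s, z, m, n):
    """Beta-Binomial mass function on s = 0, ..., z."""
    log_comb = math.lgamma(z + 1) - math.lgamma(s + 1) - math.lgamma(z - s + 1)
    return math.exp(log_comb + log_beta(m + s, n + z - s) - log_beta(m, n))

def negative_binomial_pmf(s, mu, beta):
    """Negative-Binomial mass function N(s) on s = 0, 1, 2, ..."""
    log_p = (math.lgamma(s + mu) - math.lgamma(mu) - math.lgamma(s + 1)
             + s * math.log(beta) + mu * math.log(1.0 - beta))
    return math.exp(log_p)

m, n, mu, beta = 3.0414, 7.9016, 8.2466, 0.1897  # Table 6 estimates
z_prev = 5                                        # hypothetical lagged count

thinning_mean = sum(s * beta_binomial_pmf(s, z_prev, m, n) for s in range(z_prev + 1))
arrival_mean = sum(s * negative_binomial_pmf(s, mu, beta) for s in range(400))
conditional_mean = thinning_mean + arrival_mean
formula = m / (m + n) * z_prev + mu * beta / (1.0 - beta)  # right-hand side of (11)
```

With these estimates, $m/(m+n)$ and $\mu\beta/(1-\beta)$ reproduce the expectations 0.2779 and 1.9306 listed in the last two columns of Table 6.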

Parameter driven models (see Cox (1981)) fall into the CAR class with

$$g(\phi; \mathcal{F}_{t-1}) = g(y_{t-1}'\phi + \phi_0).$$

For example, an alternative to eqn (2) when modelling dependent counts would be to assume that

$$Y_t \mid \mathcal{F}_{t-1} \overset{d}{=} \mathrm{Pois}(\lambda_t)$$

with $\lambda_t = e^{(y_{t-1}'\phi + \phi_0)}$, thus allowing for $Y_t \mid \mathcal{F}_{t-1}$ to be Poisson distributed (see, for example, Heinen and Rengifo (2007)). Here $\overset{d}{=}$ means equal in distribution. A Gamma distribution whose mean is driven by $g(y_{t-1}'\phi + \phi_0)$ could serve as an alternative model for dependent durations (see, for example, Engle and Russell (1998)). Section 3.2 addresses the question of inference in the CAR class.

3.2. Estimation of the CAR Class

Specifying detailed regularity conditions for inference in CAR models is quite lengthy, so we utilise the Conditional Least Squares (CLS) framework of KN, specifically their Thms 2.1 and 2.2. CLS minimises, with respect to $\theta$, the sum of the squared deviations

$$Q(\theta) = \sum_{t=1}^{T} \left( Y_t - E[Y_t \mid \mathcal{F}_{t-1}] \right)^2. \tag{12}$$

For notational convenience we start summations at $t = 1$. Under the KN conditions, $\hat\theta$, a solution to the equations

$$\frac{\partial Q(\theta)}{\partial \theta} = 0, \tag{13}$$

is a consistent estimate of $\theta$ and $T^{1/2}(\hat\theta - \theta)$ is asymptotically normal. Unfortunately, verifying these high level conditions for the various models that may be used in practice is quite difficult, and KN (Thms 3.1 and 3.2) provide details for the case when $Y_t$ is stationary and ergodic. However, Tjostheim (1986) (TJ) has made the important point that finding nonlinear models which satisfy stationary and ergodic conditions is far from trivial and he provides additional consistency and asymptotic results for nonstationary processes.
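In the CLAR(1) case the CLS problem (12) and (13) reduces to an ordinary least squares regression of $Y_t$ on a constant and $Y_{t-1}$. A minimal sketch under that assumption follows; the function name is hypothetical and the sum is started at the second observation because the first serves only as a conditioning value.

```python
def cls_clar1(y):
    """CLS for E[Y_t | F_{t-1}] = phi0 + phi1 * Y_{t-1}; returns (phi0, phi1, residuals)."""
    x = y[:-1]   # lagged values Y_{t-1}
    z = y[1:]    # current values Y_t
    n = len(z)
    x_bar = sum(x) / n
    z_bar = sum(z) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    phi1 = sxz / sxx              # slope solving the normal equations (13)
    phi0 = z_bar - phi1 * x_bar   # intercept
    residuals = [zi - phi0 - phi1 * xi for xi, zi in zip(x, z)]
    return phi0, phi1, residuals

# Noise-free illustration: y_t = 1 + 0.5 * y_{t-1} exactly, so CLS recovers (1, 0.5).
phi0_hat, phi1_hat, resid = cls_clar1([0.0, 1.0, 1.5, 1.75, 1.875])
```

For the INAR(1)-P model the fitted intercept and slope estimate $\lambda$ and $\alpha$ respectively, and the residuals are the inputs to the CUSUM statistic of Section 4.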

Noting that $r_t = Y_t - E[Y_t \mid \mathcal{F}_{t-1}]$ is a martingale difference sequence, we may utilise the fact that Laws of Large Numbers (LLN) and Central Limit Theorems (CLT) are readily available for such processes. Stationarity and ergodicity simplify matters considerably.

Since a stationary process may or may not have started in the infinite past and since we require martingale arguments, whose adapted sigma fields are required to be increasing, it is necessary to define the (sub) sigma field, for some finite integer $m$,

$$\mathcal{F}_{t-1}(m) = \sigma\{ Y_s,\ t-m \le s \le t-1 \}$$

and to assume that (a.s.)

$$E[Y_t \mid \mathcal{F}_{t-1}] = E[Y_t \mid \mathcal{F}_{t-1}(m)], \qquad E[r_t^2 \mid \mathcal{F}_{t-1}] = E[r_t^2 \mid \mathcal{F}_{t-1}(m)] \tag{14}$$

(see eqns (3.3) of TJ). Transparent sufficient conditions for asymptotic normality of $T^{1/2}(\hat\theta - \theta)$, where $\hat\theta$ is the solution to eqn (13), are

ASSUMPTION KN
(1) The sequence $Y_t$ is stationary and ergodic with finite 4th moments.
(2) The function $g(\theta; \mathcal{F}_{t-1})$ and its partial derivatives are continuous and uniformly bounded in $\theta$ for all $i, j, k$, i.e.

$$|g| \le H_0, \quad |dg/d\theta_i| \le H_{1i}, \quad |d^2 g/d\theta_i\, d\theta_j| \le H_{2ij}, \quad |d^3 g/d\theta_i\, d\theta_j\, d\theta_k| \le H_{3ijk},$$

where the $H$-functions are independent of $\theta$, may depend on $Y_{t-1}, \dots, Y_0$ and are square integrable.
(3) The matrix defined by $E[\,dg/d\theta_i \cdot dg/d\theta_j\,]$ is non singular and $dg(\theta; \mathcal{F}_{t-1})/d\theta_i$ has finite 4th moments.

In the linear CLAR($p$) case, Assumption KN reduces to the sequence $y_t$ being stationary and ergodic with finite 4th moments and the matrix of the lagged $y_{t-1}$ having rank $P$; these conditions are sufficient for $T^{1/2}(\hat\theta - \theta)$ to be asymptotically normally distributed.

In the CLAR(1) case, conditions for stationarity and ergodicity were given in GHTT and while these conditions are not very restrictive, as noted above, TJ makes the point that the simplicity of the linear situation does not transfer to nonlinear models, for example the bilinear models of Granger and Andersen (1978). Thus conditions for CAR estimators to be asymptotically normal are required for nonstationary processes. Theorems 6.1 and 6.2 of TJ provide conditions for $\hat\theta$ to be consistent and for $T^{1/2}(\hat\theta - \theta)$ to be asymptotically normal for a reasonably broad class of nonstationary processes which allows for heterogeneity and processes which do not have an initial stationary distribution; in addition, a multivariate generalisation is provided allowing $Y_t$ to be a vector.

4. STRUCTURAL BREAKS IN CAR MODELS

For a recent survey of the general area of structural breaks, we refer the reader to Perron (2005). Non constancy of the parameters in the data generating process (DGP) of $Y_t$ will typically induce instability in the parameter vector $\theta$ and the conditional expectation will have time varying parameters, that is $E[Y_t \mid \mathcal{F}_{t-1}] = g(\theta_t; \mathcal{F}_{t-1})$. When parameter instability presents itself as a structural break we set

$$\theta_t = \begin{cases} \theta & t = 1, \dots, T_B, \\ \theta + \Delta\theta & t = T_B + 1, \dots, T, \end{cases}$$

where the break point $T_B$ and the magnitude of the shift vector $\Delta\theta$ are unknown. The null hypothesis is

$$H_0 : \Delta\theta = 0$$

and the alternative is that $H_0$ is not true.

The Negative-Binomial model of Section 2.2 shows that there may be a price to pay for adopting the simple CAR approach. From eqn (11), it is apparent that simultaneous changes in parameters may cancel out, in which case the CAR approach will have no power. To see how this might occur, set the total differential of $\phi = m/(m+n)$ to zero to obtain

$$\Delta\phi = \frac{\partial}{\partial m}\left( \frac{m}{m+n} \right) \Delta m + \frac{\partial}{\partial n}\left( \frac{m}{m+n} \right) \Delta n = 0,$$

which implies that there will be no change in $\phi$ if

$$\frac{\Delta m}{m} = \frac{\Delta n}{n},$$

i.e. if $m$ and $n$ change by the same percentage amount. Of course, there is no a priori reason to believe that such cancellations are likely to occur in practice. Since the model is estimated under the null, the CUSUM test may always be performed; however, it may have little power in this type of situation.
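The step from the total differential to the equal-percentage condition can be made explicit. The following short derivation uses only the definition $\phi = m/(m+n)$; the numerical values in the final line are illustrative and not taken from the paper.

$$\frac{\partial}{\partial m}\left(\frac{m}{m+n}\right) = \frac{n}{(m+n)^2}, \qquad \frac{\partial}{\partial n}\left(\frac{m}{m+n}\right) = -\frac{m}{(m+n)^2},$$

$$\Delta\phi = \frac{n\,\Delta m - m\,\Delta n}{(m+n)^2} = 0 \iff n\,\Delta m = m\,\Delta n \iff \frac{\Delta m}{m} = \frac{\Delta n}{n}.$$

For instance, with $m = 3$ and $n = 8$ we have $\phi = 3/11$, and raising both by 10% gives $3.3/(3.3 + 8.8) = 3/11$ again. Because $\phi$ depends on $m$ and $n$ only through their ratio, the cancellation is exact for any common proportional change, not merely to first order.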

Using eqn (13) define

$$r_t = Y_t - g(\hat\theta; \mathcal{F}_{t-1}) \tag{15}$$

and

$$\bar r = T^{-1} \sum_{t=1}^{T} r_t; \tag{16}$$

then the two-sided statistic based on the CUSUM process is given by

$$\max_{j=1,\dots,T}\ T^{-1/2} s_r^{-1} \left| \sum_{t=1}^{j} (r_t - \bar r) \right|,$$

where

$$s_r^2 = T^{-1} \sum_{t=1}^{T} (r_t - \bar r)^2. \tag{17}$$

If there were additional information on the possible position of the break, for example that it occurred late in the sample, a weighted version of the CUSUM may be considered; see Olmo and Pouliot (2011).
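The statistic built from definitions (15) to (17) takes only a few lines to compute once the residuals are available. The sketch below is a hedged illustration: the function name is hypothetical, and returning the maximising index $j$ as a candidate break location is an assumption of this example rather than a procedure stated above.

```python
import math

def cusum_statistic(residuals):
    """Two-sided CUSUM: max_j T^{-1/2} s_r^{-1} |sum_{t<=j} (r_t - r_bar)|."""
    T = len(residuals)
    r_bar = sum(residuals) / T                                        # eqn (16)
    s_r = math.sqrt(sum((r - r_bar) ** 2 for r in residuals) / T)     # eqn (17)
    partial, largest, argmax = 0.0, 0.0, 0
    for j, r in enumerate(residuals, start=1):
        partial += r - r_bar
        if abs(partial) > largest:
            largest, argmax = abs(partial), j
    return largest / (math.sqrt(T) * s_r), argmax

# Artificial residuals with a mean shift half way through the sample.
stat, j_star = cusum_statistic([-1.0] * 50 + [1.0] * 50)
```

In this artificial example the centred partial sums reach $-50$ at $j = 50$, so the statistic equals $50/(\sqrt{100}\cdot 1) = 5$, far above the 5% critical value of about 1.358 given in Table 1.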

5. NULL DISTRIBUTION OF CUSUM TEST

In view of the discussion of the distribution theory for CAR estimators we shall consider the stationary and nonstationary cases separately. First, we deal with the stationary CAR model and this will allow the discrete valued models of Section 2 to be tested for structural breaks, as the Poisson and Negative-Binomial thinning models are stationary and ergodic. Thus, the following Theorem 1, whose proof is given in the Appendix, circumvents the difficulty of finding critical values for the sup-test in the INAR-NB model.

THEOREM 1. Let $Y_t$ satisfy the conditions of Assumption KN. For $\hat\theta$, a solution to (13), and using definitions (15), (16) and (17),

$$\max_{j=1,\dots,T}\ T^{-1/2} s_r^{-1} \left| \sum_{t=1}^{j} (r_t - \bar r) \right| \Rightarrow \sup_{s \in [0,1]} |B(s)|,$$

where $B(s)$ is a Brownian Bridge.

Explicit expressions for the distribution of the supremum of the absolute value of the Brownian Bridge (see Pitman and Yor (1999)) are available and critical values are given in the second half of Table 1.
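The limiting law of $\sup_{s}|B(s)|$ is the Kolmogorov distribution, with $P(\sup|B| \le x) = 1 - 2\sum_{k \ge 1} (-1)^{k-1} e^{-2k^2x^2}$, so critical values can be obtained by root finding. The sketch below is illustrative; the function names, the series truncation at 100 terms and the bisection bracket are assumptions.

```python
import math

def sup_abs_bridge_cdf(x, terms=100):
    """P(sup_s |B(s)| <= x) via the Kolmogorov series, truncated at `terms`."""
    if x <= 0.0:
        return 0.0
    tail = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x)
               for k in range(1, terms + 1))
    return 1.0 - 2.0 * tail

def critical_value(size, lo=0.3, hi=3.0, tol=1e-10):
    """Solve cdf(x) = 1 - size by bisection on the assumed bracket [lo, hi]."""
    target = 1.0 - size
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sup_abs_bridge_cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

crit_01 = critical_value(0.01)
crit_05 = critical_value(0.05)
crit_10 = critical_value(0.10)
```

The resulting values are approximately 1.628, 1.358 and 1.224, in line with the two-sided CUSUM column of Table 1.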

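Theorem 1 hinges on (C1) $T^{1/2}(\hat\theta - \theta_0)$ being $O_P(1)$, (C2) $T^{-1} \sup_{s \in [0,1]} \left\| \sum_{t=1}^{\lfloor sT \rfloor} (h_t - \bar h) \right\| \to_p 0$ and (C3) $r_t$ satisfying a stationary martingale FCLT. These steps need to be reformulated for the non-stationary case. Define the moments $\sigma^2 = \mathrm{Var}\left[\sum_{t=1}^{T} r_t\right]$, $\Sigma_h = \mathrm{Var}\left[\sum_{t=1}^{T} h_t\right]$ and $\Sigma_u = \sum_{t=1}^{T} E\left[h_t h_t' r_t^2\right]$.

THEOREM 2. Augment the conditions of Thms 6.1 and 6.2 of Tjostheim (1986) by

$$\sum_{t=1}^{T} h_t h_t' = O_p\left( \Sigma_u^{1/2} \Sigma_h^{1/2} \right), \tag{18}$$

$$\Sigma_h^{-1/2} \sum_{t=1}^{\lfloor sT \rfloor} h_t \tag{19}$$

is tight and

$$\sigma^{-1} \sum_{t=1}^{\lfloor sT \rfloor} E\left[r_t^2\right] \to s, \quad s \in [0,1]; \tag{20}$$

then

$$\left[ \sum_{t=1}^{T} (r_t - \bar r)^2 \right]^{-1/2} \max_{j=1,\dots,T} \left| \sum_{t=1}^{j} (r_t - \bar r) \right| \Rightarrow \sup_{s \in [0,1]} |B(s)|.$$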

Sufficient conditions to ensure eqns (18) and (19) would be to assume that $Y_t$ satisfies a mixing property (or is perhaps near epoch dependent on a mixing process); then, using eqn (14), $h_t$ would also be mixing with the same mixing numbers and so a FCLT could be established. Theorems 1 and 2 show that any process $Y_t$ of interest to an investigator may be subjected to a test for parameter instability in a simple way as long as it satisfies the CAR property and the associated regularity conditions. Obviously, in any particular application one has to check that the CLAR family member does satisfy the conditions. An interesting case is the random coefficient AR model of Section 3.1. Using Thm 7.2 and (4.8) of TJ, it follows that eqn (10) satisfies the regularity conditions when the $\phi_{i,t}$ are i.i.d., are independent of the disturbance and satisfy some additional moment conditions. On the other hand, should $\phi_{i,t}$ follow a stochastic process (e.g. an ARMA model), only some special cases have been considered as in, for example, Thm 7.3 of TJ.

6. MONTE CARLO EXPERIMENTS FOR THE CUSUM AND SUP S($\pi$) TESTS

Here, the sizes and powers of the two-sided CUSUM test when the data generating process (DGP) is Poisson or Negative-Binomial are investigated. We also compare the CUSUM and $\sup S(\pi)$ tests to determine the loss of power that may occur by using information on the conditional mean only. A significance level of 5% is employed in each experiment, which consists of 5000 replications for the CUSUM test and 1000 for the $\sup S(\pi)$ test. The results show that the CUSUM test can suffer from substantial power loss when compared with the $\sup S(\pi)$ test. However, the $\sup S(\pi)$ test presupposes the true model is known and correctly specified but this is often not the case in practice.

6.1. Size

Table 2 gives sizes of the CUSUM test for both the Poisson and Negative-Binomial models with the sample size $T$ ranging from 100 to 1000. For the Poisson model we consider cases where $\lambda$ is fixed at 1 and the values of $\alpha$ vary from 0.1 to 0.9 in steps of 0.2. Also considered is the situation where $\alpha$ is fixed at 0.5 and $\lambda$ ranges from 1 to 3 with an increment of 0.5. For the Negative-Binomial model, the means of the Beta thinning and Gamma arrivals distributions have the same magnitudes as those in the Poisson cases.

The CUSUM test is seen to be undersized when $T$ is small but as $T$ gets large the size approaches the required 5%. In contrast, the $\sup S(\pi)$ test for the Poisson model has good size for all values of $T$, as seen in Table 3.

6.2. Power

Extensive power results are available from the authors on request but the overall picture is as follows. Generally, the power of the CUSUM test is higher in the Poisson case than it is in the Negative-Binomial model. When either the thinning or innovation parameters change, the power surfaces are symmetric around $\Delta\alpha = \Delta\lambda = 0$ and the power approaches 1 as the absolute value of the change increases. As expected, the power is maximised near the break fraction Fr = 50% and tails off as the break fraction approaches the extremes of $\Pi$; again see Olmo and Pouliot (2011). The CUSUM test has significantly more power against changes in $\lambda$ than against changes in $\alpha$. When both $\alpha$ and $\lambda$ change simultaneously in opposite directions the power may drop significantly. This is because an increase in the number of arrivals may be offset by a decrease in the thinning parameter resulting in fewer survivors, thus leaving the observed values little changed.

In contrast to the CUSUM statistic, the power of the $\sup S(\pi)$ test is much less sensitive to the break fraction being at the extremes of $\Pi$. It is also less sensitive to possible offsetting effects when $\alpha$ and $\lambda$ change simultaneously. In comparing the CUSUM and $\sup S(\pi)$ tests, for changes in $\alpha$ only, the $\sup S(\pi)$ test is clearly superior, while for changes in $\lambda$ only, the CUSUM test is competitive except at the extremes of $\Pi$. When both $\alpha$ and $\lambda$ change the CUSUM test can be more powerful except in those regions where offsetting may occur. Further results on power can be found in Han (2011).

7. APPLICATION TO MOTOR VEHICLE THEFT DATA

The data analysed in this section consists of $T = 144$ monthly counts of motor vehicle theft in district 101 of Pittsburgh, PA, recorded from January 1990 to December 2001. This dataset, denoted by MVTHEFT hereafter, is originally from the Crime Data section of the Forecasting Principles website (see http://www.forecastingprinciples.com). Prior to testing for a structural break in Section 7.1, we plot an overall picture of the data in Figure 1 and present descriptive statistics in Table 4. The data is clearly overdispersed but the

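Table 2. Sizes of the two-sided CUSUM test for INAR(1)-P and INAR(1)-NB models (replications = 5000)

INAR(1)-P model (two-sided CUSUM tests). Columns headed $\alpha$ hold $\lambda = 1$ fixed; columns headed $\lambda$ hold $\alpha = 0.5$ fixed.

| T | α = 0.1 | α = 0.3 | α = 0.5 | α = 0.7 | α = 0.9 | λ = 1 | λ = 1.5 | λ = 2 | λ = 2.5 | λ = 3 |
|---|---|---|---|---|---|---|---|---|---|---|
| 100 | 0.0292 | 0.0306 | 0.0254 | 0.0160 | 0.0242 | 0.0232 | 0.0256 | 0.0260 | 0.0274 | 0.0284 |
| 200 | 0.0376 | 0.0362 | 0.0340 | 0.0288 | 0.0170 | 0.0352 | 0.0314 | 0.0340 | 0.0350 | 0.0352 |
| 300 | 0.0432 | 0.0422 | 0.0422 | 0.0372 | 0.0266 | 0.0392 | 0.0370 | 0.0388 | 0.0338 | 0.0404 |
| 500 | 0.0422 | 0.0438 | 0.0396 | 0.0368 | 0.0280 | 0.0414 | 0.0398 | 0.0386 | 0.0412 | 0.0448 |
| 1000 | 0.0474 | 0.0501 | 0.0468 | 0.0476 | 0.0410 | 0.0456 | 0.0420 | 0.0422 | 0.0432 | 0.0440 |

INAR(1)-NB model (two-sided CUSUM tests). Columns headed E[α] hold E[λ] = 1 fixed; columns headed E[λ] hold E[α] = 0.5 fixed.

| T | E[α] = 0.1 | E[α] = 0.3 | E[α] = 0.5 | E[α] = 0.7 | E[α] = 0.9 | E[λ] = 1 | E[λ] = 1.5 | E[λ] = 2 | E[λ] = 2.5 | E[λ] = 3 |
|---|---|---|---|---|---|---|---|---|---|---|
| 100 | 0.0240 | 0.0290 | 0.0216 | 0.0194 | 0.0238 | 0.0242 | 0.0246 | 0.0244 | 0.0286 | 0.0218 |
| 200 | 0.0362 | 0.0360 | 0.0314 | 0.0348 | 0.0222 | 0.0336 | 0.0348 | 0.0356 | 0.0366 | 0.0332 |
| 300 | 0.0402 | 0.0362 | 0.0398 | 0.0312 | 0.0286 | 0.0392 | 0.0342 | 0.0408 | 0.0376 | 0.0362 |
| 500 | 0.0424 | 0.0454 | 0.0382 | 0.0370 | 0.0296 | 0.0390 | 0.0388 | 0.0410 | 0.0398 | 0.0416 |
| 1000 | 0.0462 | 0.0406 | 0.0438 | 0.0436 | 0.0410 | 0.0478 | 0.0448 | 0.0418 | 0.0416 | 0.0442 |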

figure also illustrates that after the first 40 or so observations there seems to be no strong evidence of overdispersion. There is also a hint of some seasonality. Further, the sample ACF, plotted in Figure 2, exhibits strong evidence of dependence. Because of the overdispersion, the INAR-P sup test does not seem to be appropriate and the CUSUM approach is used. In Section 7.1, both INAR(1)-P and INAR(1)-NB models are fitted to the post break only data and model selection is implemented via the Information Matrix (IM) test. Section 7.2 investigates the difference in forecasting performance between a naive use of the full data set and using the post break segment only. It reveals that break testing is important since the two sets of forecasts are quite different.

7.1. Test and estimation

Here, we apply the CUSUM procedure to test for the presence of a structural break and to identify the break position. The first task is to select the order of the CLAR model to be fitted. Since there is the possibility of a break in the data, it is hard to identify the order by the sample ACF or partial autocorrelation function (PACF). Thus, a first order CLAR(1) model is fitted but the CUSUM procedure is modified by replacing the usual studentisation by a long run variance (LRV) estimator with the lag length chosen by $q = 12(T/100)^{1/2}$. We computed the test for the Parzen, Bartlett, QS and Truncated kernels and all agree that the estimated break point is an auspicious $T_B = 42$. In this example, the lag length evaluates to $q = 14$ and we also recomputed the tests for a sequence ranging from $q = 10$ to $q = 18$ to assess the robustness of the original choice. The estimated break point remained at $T_B = 42$.
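To make the lag-length rule and the long run variance concrete, the following sketch computes $q$ and a Bartlett-kernel LRV estimate from a residual series. It is an illustration under stated assumptions: the function names are hypothetical, truncation to an integer is assumed because it reproduces $q = 14$ for $T = 144$, and the Bartlett weights $1 - k/(q+1)$ are the usual Newey-West form rather than a detail confirmed by the text.

```python
import math

def lag_length(T):
    """q = 12 (T/100)^(1/2), truncated to an integer (assumed rounding rule)."""
    return int(12.0 * math.sqrt(T / 100.0))

def bartlett_lrv(residuals, q):
    """Bartlett-kernel long run variance with lag length q."""
    T = len(residuals)
    r_bar = sum(residuals) / T
    d = [r - r_bar for r in residuals]

    def autocov(k):
        # Sample autocovariance at lag k, normalised by T.
        return sum(d[t] * d[t - k] for t in range(k, T)) / T

    lrv = autocov(0)
    for k in range(1, q + 1):
        lrv += 2.0 * (1.0 - k / (q + 1.0)) * autocov(k)
    return lrv

q_mvtheft = lag_length(144)  # 12 * 1.2 = 14.4, truncated to 14
lrv_demo = bartlett_lrv([1.0, -1.0, 1.0, -1.0], 1)
```

In the modified CUSUM procedure the square root of such an estimate would take the place of $s_r$ in the studentisation; with $q = 0$ the function returns the ordinary sample variance $s_r^2$.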

Although the MVTHEFT data is not necessarily considered to be the sum of thinning and innovation processes, short term dependence among crime counts may nevertheless be modelled by an INAR model and, since the data appear to exhibit a structural break at $T_B = 42$, only the 102 post break observations are used. The descriptive statistics for the post break data are given in Table 5 and the degree of overdispersion is much reduced.

The post break sample PACF is plotted in Figure 3 and it appears that a model with a single lag is sufficient. The parameters of the INAR(1)-P model are estimated by CLS and these values are themselves used as starting values for the MLEs. An INAR(1)-NB model is also fitted by maximum likelihood. The estimation results for both models are provided in Table 6, where it can be seen that the

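Table 3. Sizes of the maximum score test for the INAR(1)-P model (replications = 1000)

INAR(1)-P model (maximum score tests). Columns headed $\alpha$ hold $\lambda = 1$ fixed; columns headed $\lambda$ hold $\alpha = 0.5$ fixed.

| T | α = 0.1 | α = 0.3 | α = 0.5 | α = 0.7 | α = 0.9 | λ = 1 | λ = 1.5 | λ = 2 | λ = 2.5 | λ = 3 |
|---|---|---|---|---|---|---|---|---|---|---|
| 100 | 0.0516 | 0.0302 | 0.0269 | 0.0394 | 0.1818 | 0.0721 | 0.0697 | 0.0539 | 0.0536 | 0.0463 |
| 200 | 0.0672 | 0.0473 | 0.0365 | 0.0378 | 0.0841 | 0.0682 | 0.0564 | 0.0433 | 0.0459 | 0.0458 |
| 300 | 0.0547 | 0.0471 | 0.0398 | 0.0521 | 0.0671 | 0.0589 | 0.0480 | 0.0499 | 0.0629 | 0.0480 |
| 500 | 0.0521 | 0.0508 | 0.0470 | 0.0431 | 0.0442 | 0.0502 | 0.0514 | 0.0579 | 0.0482 | 0.0443 |
| 1000 | 0.0582 | 0.0490 | 0.0532 | 0.0531 | 0.0471 | 0.0518 | 0.0549 | 0.0537 | 0.0586 | 0.0531 |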

Figure 1. Time series plots of MVTHEFT Data

Table 4. Descriptive statistics of MVTHEFT data

| Maximum | Minimum | Mean | Median | Mode | Variance |
|---|---|---|---|---|---|
| 17 | 0 | 4.3333 | 3 | 2 | 14.4895 |

estimated expectations of the thinning, $\phi$, and innovation, $\mu$, processes in the INAR(1)-NB model are very close to those estimates obtained by CLS and ML when the INAR(1)-P is fitted. Thus both models appear to be reasonable representations for the post break data.

The choice between the two models is made by means of the Information Matrix (IM) specification test as suggested in Freeland and McCabe (2004). The results for the IM specification test for the INAR(1)-P and INAR(1)-NB models are given in Table 7 and the Negative-Binomial is rejected in favour of the Poisson specification, as the P-value of the component of the IM test associated with $\beta$ is 0.0000.

Figure 2. Correlogram of MVTHEFT Data

Table 5. Descriptive statistics of post break MVTHEFT data

| Maximum | Minimum | Mean | Median | Mode | Variance |
|---|---|---|---|---|---|
| 9 | 0 | 2.7647 | 2 | 2 | 3.8054 |

Figure 3. The Correlograms of Post Break Data of MVTHEFT

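Table 6. Estimations of post break MVTHEFT data

| Model | Method | Estimates |
|---|---|---|
| INAR(1)-P | CLSE | α = 0.3052, λ = 1.8569 |
| INAR(1)-P | MLE | α = 0.2496, λ = 2.0111 |
| INAR(1)-NB | MLE | m = 3.0414, n = 7.9016, μ = 8.2466, β = 0.1897; expectation of thinning φ = 0.2779, expectation of innovation μ = 1.9306 |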

7.2. Probabilistic forecasting

Drawing on the results of the previous section, the post break MVTHEFT data is modelled by an INAR(1)-P process, which is then used to forecast. However, it is interesting to compare the performance of forecasting using the post break data with that of a naive forecaster who, while catering for overdispersion by fitting an INAR(1)-NB, unwittingly ignores the possibility of a structural break. Table 8 gives the h-step-ahead conditional mean, median and mode forecasts for the INAR(1)-P post break and the INAR(1)-NB full data models. Both the mean and median forecasts are quite different but the modal forecasts are exactly the same for both models. This latter finding is, however, potentially very misleading. For example, the steady state (6-step-ahead) forecast for the probability of observing the modal value 2, $f^{(6)}_2$, differs substantially depending on whether the INAR-P post break or the full INAR-NB model is used: $f^{(6)}_2$ is calculated to be 0.2462 and 0.1549 for the post break and full data sets respectively, so the post break estimate is almost 60% larger (0.2462/0.1549 ≈ 1.59). In addition, when 95% confidence intervals are constructed for these estimated conditional probabilities, we obtain (0.2200, 0.2724) and (0.1249, 0.1850) respectively for the post break and full data sets. These intervals do not overlap and hence provide very different predictions about $f^{(6)}_2$. Also, the allocation of probability across the support by the two models is quite different, with the Negative-Binomial allocating a lot more weight to the right hand tail and thus overestimating the probabilities of larger numbers of thefts. For example, in steady state, the conditional probability that the number of thefts exceeds 4 is 0.1337 for the post break data while it is 0.3218 for the full data set.
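The INAR(1)-P probability forecasts can be illustrated with a short sketch. It assumes the standard closed form for a Poisson INAR(1) with binomial thinning, in which the h-step-ahead distribution given $X_T = x$ is the convolution of a Binomial($x$, $\alpha^h$) count of survivors and a Poisson count of arrivals with mean $\lambda(1-\alpha^h)/(1-\alpha)$; whether the forecasts reported here were computed by exactly this route is an assumption. The function and variable names are hypothetical, and the parameter values are the ML estimates reported in Table 6.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """P(Bin(n, p) = k)."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def poisson_pmf(k, mean):
    """P(Poisson(mean) = k)."""
    return exp(-mean) * mean**k / factorial(k)


def inar1_poisson_forecast(x_T, h, alpha, lam, max_count=60):
    """h-step-ahead predictive pmf of a Poisson INAR(1) given X_T = x_T.

    Survivors of the x_T current counts are Bin(x_T, alpha**h) and the
    accumulated arrivals are Poisson with mean lam*(1 - alpha**h)/(1 - alpha).
    The returned list holds the convolution on 0..max_count (truncated support).
    """
    a_h = alpha**h
    arrivals_mean = lam * (1.0 - a_h) / (1.0 - alpha)
    pmf = []
    for y in range(max_count + 1):
        total = 0.0
        for survivors in range(min(x_T, y) + 1):
            total += binomial_pmf(survivors, x_T, a_h) * poisson_pmf(
                y - survivors, arrivals_mean
            )
        pmf.append(total)
    return pmf


# ML estimates for the post break INAR(1)-P model, taken from Table 6.
alpha_hat, lambda_hat = 0.2496, 2.0111

one_step = inar1_poisson_forecast(2, 1, alpha_hat, lambda_hat)
steady = inar1_poisson_forecast(2, 6, alpha_hat, lambda_hat)

mean_1 = sum(y * p for y, p in enumerate(one_step))  # 1-step conditional mean
tail_gt4 = 1.0 - sum(steady[:5])                     # P(X > 4) at h = 6
```

With these inputs the one-step probabilities should agree closely with the INAR(1)-P rows of Table 8, up to rounding of the parameter estimates.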

8. CONCLUSIONS

This paper makes the following contributions. First, we develop a two-sided CUSUM test that may be used quite generally for nonlinear and non-Gaussian models as long as they have the CAR structure and satisfy some regularity conditions. The asymptotic distribution of the CUSUM test for the CAR class is shown to be that of the supremum of the absolute value of a Brownian Bridge.

Table 7. Estimations of post break MVTHEFT data

Model       INAR(1)-P           INAR(1)-NB
Parameter   α        λ          m        n        l        b
IM Test     0.9062   1.5964     0.0057   0.0000   0.1350   27.1035
P-value     0.3411   0.2064     0.9393   0.9989   0.7132   0.0000

Table 8. Mean, median, mode and probability forecasts for post break (INAR(1)-P) and full data (INAR(1)-NB) models

h-step-ahead forecasts given X_T = 2

Quantity     Model        h = 1    h = 2    h = 3    h = 4    h = 5    h = 6
Mean         INAR(1)-P    2.5098   2.6366   2.6682   2.6761   2.6781   2.6786
             INAR(1)-NB   2.9077   3.4058   3.6792   3.8292   3.9115   3.9567
Median       INAR(1)-P    2        2        2        3        3        3
             INAR(1)-NB   3        3        3        3        3        3
Mode         INAR(1)-P    2        2        2        2        2        2
             INAR(1)-NB   2        2        2        2        2        2
p_h(0|2)     INAR(1)-P    0.0753   0.0712   0.0692   0.0687   0.0686   0.0685
             INAR(1)-NB   0.0817   0.0881   0.0843   0.0811   0.0792   0.0781
p_h(1|2)     INAR(1)-P    0.2017   0.1884   0.1849   0.1840   0.1838   0.1837
             INAR(1)-NB   0.1661   0.1546   0.1447   0.1392   0.1362   0.1346
p_h(2|2)     INAR(1)-P    0.2615   0.2490   0.2468   0.2463   0.2462   0.2462
             INAR(1)-NB   0.2459   0.1793   0.1649   0.1592   0.1564   0.1549
p_h(3|2)     INAR(1)-P    0.2203   0.2191   0.2197   0.2198   0.2199   0.2199
             INAR(1)-NB   0.1932   0.1654   0.1552   0.1512   0.1493   0.1483
p_h(4|2)     INAR(1)-P    0.1361   0.1444   0.1466   0.1471   0.1473   0.1473
             INAR(1)-NB   0.1285   0.1329   0.1301   0.1286   0.1279   0.1275
p_h(5|2)     INAR(1)-P    0.0661   0.0760   0.0782   0.0788   0.0789   0.0789
             INAR(1)-NB   0.0791   0.0976   0.1007   0.1015   0.1018   0.1020
p_h(6|2)     INAR(1)-P    0.0263   0.0333   0.0348   0.0351   0.0352   0.0352
             INAR(1)-NB   0.0464   0.0672   0.0736   0.0759   0.0769   0.0774
p_h(7|2)     INAR(1)-P    0.0089   0.0125   0.0132   0.0134   0.0134   0.0135
             INAR(1)-NB   0.0264   0.0442   0.0514   0.0544   0.0558   0.0565
p_h(8|2)     INAR(1)-P    0.0026   0.0041   0.0044   0.0045   0.0045   0.0045
             INAR(1)-NB   0.0147   0.0280   0.0346   0.0377   0.0392   0.0399
p_h(9|2)     INAR(1)-P    0.0007   0.0112   0.0013   0.0013   0.0013   0.0013
             INAR(1)-NB   0.0081   0.0173   0.0227   0.0254   0.0268   0.0275
p_h(10|2)    INAR(1)-P    0.0002   0.0003   0.0003   0.0003   0.0003   0.0003
             INAR(1)-NB   0.0043   0.0104   0.0145   0.0167   0.0179   0.0185




Second, we suggest use of the maximum score statistic for testing parameter constancy in the general INAR process and tabulate estimated critical values for the equi-dispersed Poisson case. Many models in the INAR class cannot be tested by sup-test techniques due to the difficulties in finding critical values, whereas the CUSUM statistic allows tests for the stability of the conditional mean of the model to be conducted. Third, we carry out Monte Carlo experiments to evaluate the performance of the CUSUM and maximum score tests when the DGP is INAR-P or INAR-NB. It is found that the CUSUM test seems to be undersized when the sample size is small but approaches the nominal 5% for large T. In contrast, the maximum score test has good size for all sample sizes. It is found that the two tests are consistent and the largest power occurs when the break fraction is 50%. In the Poisson model we find that the sup S(p) test is the more competitive in most cases, provided the complete specification of the model is known. Finally, an empirical example shows that neglecting a structural break may lead to biased probabilistic forecasts for counts.

APPENDIX

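PROOF OF THEOREM 1. Define the terms

$$h_t = \frac{\partial g(\theta; \mathcal{F}_{t-1})}{\partial \theta}, \qquad g_t(\theta) = g(\theta; \mathcal{F}_{t-1}).$$

Use a Taylor series about the true parameter $\theta_0$ to obtain

$$\hat r_t = Y_t - g_t(\hat\theta) = Y_t - \left\{ g_t(\theta_0) + (\hat\theta - \theta_0)' h_t + R_t \right\} = r_t - (\hat\theta - \theta_0)' h_t - R_t$$

where $r_t = Y_t - g_t(\theta_0)$ and the remainder is of the form

$$R_t = \tfrac{1}{2} (\hat\theta - \theta_0)' \, \dot h_t(\theta^*) \, (\hat\theta - \theta_0),$$

$\dot h_t$ is the derivative of $h_t$ and $\theta^*$ is an intermediate point between $\hat\theta$ and $\theta_0$. Next, in an obvious notation, construct the mean

$$\bar{\hat r} = \bar r - (\hat\theta - \theta_0)' \bar h - \bar R$$

giving

$$\hat r_t - \bar{\hat r} = (r_t - \bar r) - (\hat\theta - \theta_0)' (h_t - \bar h) - (R_t - \bar R).$$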

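For $s \in [0,1]$ write

$$T^{-1/2} \sum_{t=1}^{\lfloor sT \rfloor} (\hat r_t - \bar{\hat r}) = T^{-1/2} \sum_{t=1}^{\lfloor sT \rfloor} (r_t - \bar r) - T^{1/2} (\hat\theta - \theta_0)' \, T^{-1} \sum_{t=1}^{\lfloor sT \rfloor} (h_t - \bar h) - T^{-1/2} \sum_{t=1}^{\lfloor sT \rfloor} (R_t - \bar R) \tag{21}$$

where

$$T^{-1/2} \sum_{t=1}^{\lfloor sT \rfloor} (R_t - \bar R) = \tfrac{1}{2} \, T^{1/2} (\hat\theta - \theta_0)' \; T^{-3/2} \sum_{t=1}^{\lfloor sT \rfloor} \left[ \dot h_t(\theta^*) - \bar{\dot h}(\theta^*) \right] \; T^{1/2} (\hat\theta - \theta_0).$$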

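Defining $S_T(s) = T^{-1/2} \sigma_r^{-1} \sum_{t=1}^{\lfloor sT \rfloor} r_t$ with $\sigma_r^2$ the variance of $r_t$, write

$$T^{-1/2} \sigma_r^{-1} \sum_{t=1}^{\lfloor sT \rfloor} (\hat r_t - \bar{\hat r}) = \left[ S_T(s) - s S_T(1) \right] - T^{1/2} (\hat\theta - \theta_0)' \, T^{-1} \sigma_r^{-1} \sum_{t=1}^{\lfloor sT \rfloor} (h_t - \bar h) - T^{-1/2} \sigma_r^{-1} \sum_{t=1}^{\lfloor sT \rfloor} (R_t - \bar R). \tag{22}$$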

By Thm 3.2 of KN or TJ, it follows that $T^{1/2} (\hat\theta - \theta_0)$ is asymptotically normal and hence is $O_P(1)$. Taking deviations from the true means, $\mu_h$, we see that

$$h_t - \bar h = (h_t - \mu_h) - (\bar h - \mu_h) \tag{23}$$

and

$$\bar h - \mu_h \to_p 0 \tag{24}$$

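by stationarity and ergodicity. Consider the ith element of $h_t$, $h_{i,t}$; this sequence is also stationary and ergodic. Using a maximal inequality (see Cor 3 of Maxwell and Woodroofe (2000), for example), for each lag, i, and any b > 1, there exists a constant K such that

$$P\left[ \sup_{s \in [0,1]} \left| T^{-1} \sum_{t=1}^{\lfloor sT \rfloor} (h_{i,t} - \mu_h) \right| > \varepsilon \right] = P\left[ \max_{j=1,\dots,T} \left| \sum_{t=1}^{j} (h_{i,t} - \mu_h) \right| > T\varepsilon \right] \le \frac{K T^{b}}{T^{2} \varepsilon^{2}} \to 0 \qquad (1 < b < 2)$$

by stationarity and finite second moments. Thus,

$$T^{-1} \sup_{s \in [0,1]} \left| \sum_{t=1}^{\lfloor sT \rfloor} (h_t - \bar h) \right| \to_p 0 \tag{25}$$

and hence

$$T^{1/2} (\hat\theta - \theta_0)' \, T^{-1} \sum_{t=1}^{\lfloor sT \rfloor} (h_t - \bar h) \to_p 0.$$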
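The passage from the maximal inequality to (25) rests on (23), (24) and the triangle inequality. Written out as a sketch, with $|\cdot|$ read elementwise or as any vector norm (an assumption of this illustration), the bound is:

```latex
\begin{aligned}
T^{-1}\sup_{s\in[0,1]}\Bigl|\sum_{t=1}^{\lfloor sT\rfloor}(h_t-\bar h)\Bigr|
&\le T^{-1}\sup_{s\in[0,1]}\Bigl|\sum_{t=1}^{\lfloor sT\rfloor}(h_t-\mu_h)\Bigr|
   + \sup_{s\in[0,1]}\frac{\lfloor sT\rfloor}{T}\,\bigl|\bar h-\mu_h\bigr| \\
&\le T^{-1}\max_{j=1,\dots,T}\Bigl|\sum_{t=1}^{j}(h_t-\mu_h)\Bigr|
   + \bigl|\bar h-\mu_h\bigr|
\;\to_p\; 0 .
\end{aligned}
```

The first term vanishes by the maximal inequality and the second by (24). Since $T^{1/2}(\hat\theta - \theta_0)$ is $O_P(1)$, multiplying it by a term that is $o_P(1)$ uniformly in s gives the limit stated just above.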

The remainder term, $T^{-3/2} \sum_{t=1}^{\lfloor sT \rfloor} \left[ \dot h_t(\theta^*) - \bar{\dot h}(\theta^*) \right]$, is dealt with in the same way as (23), (24) and (25) using H2ij of Assumption KN (2) to uniformly bound the elements of the matrix $\dot h_t(\theta^*)$ over $\theta^*$. Consequently, the remainder term in (22) is $o_p(1)$ and the asymptotic distribution of the partial sum process of the $\hat r_t - \bar{\hat r}$ is the same as that of $[S_T(s) - s S_T(1)]$.

By Assumption KN (1) and (3), $Y_t$ is a stationary ergodic process and hence $r_t$ is a stationary and ergodic martingale difference sequence with finite 4th moments. Thus, a martingale functional CLT applies (see Hall and Heyde (1980)) and it follows that the partial sum converges weakly, i.e.

$$S_T(s) = T^{-1/2} \sigma_r^{-1} \sum_{t=1}^{\lfloor sT \rfloor} r_t \Rightarrow W(s)$$

where $W(s)$ is a Brownian motion process. The continuous mapping theorem shows that

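$$\max_{j=1,\dots,T} \left| T^{-1/2} \sigma_r^{-1} \sum_{t=1}^{j} (\hat r_t - \bar{\hat r}) \right| \Rightarrow \sup_{s} |B(s)|$$

where $B(s) = W(s) - s W(1)$, i.e. a Brownian Bridge. By similar sorts of argument to those above and using ergodicity again,

$$s_r^2 = T^{-1} \sum_{t=1}^{T} (\hat r_t - \bar{\hat r})^2 \to_p \sigma_r^2$$

and hence by the continuous mapping theorem it is seen that

$$\max_{j=1,\dots,T} \left| T^{-1/2} s_r^{-1} \sum_{t=1}^{j} (\hat r_t - \bar{\hat r}) \right| \Rightarrow \sup_{s \in [0,1]} |B(s)|.$$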
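The feasible statistic in the last display is simple to compute from a vector of residuals. The sketch below is one possible implementation under stated assumptions: the residuals are supplied by the user (for example from a conditional least squares fit), the function names are hypothetical, and asymptotic p-values are taken from the classical series for the supremum of the absolute value of a Brownian Bridge (the Kolmogorov distribution), $P(\sup_s |B(s)| > x) = 2 \sum_{k \ge 1} (-1)^{k+1} e^{-2 k^2 x^2}$, whose 5% critical value is about 1.36.

```python
from math import exp, sqrt


def cusum_statistic(residuals):
    """Two-sided CUSUM: max_j | T^{-1/2} s_r^{-1} sum_{t<=j} (r_t - rbar) |."""
    T = len(residuals)
    mean = sum(residuals) / T
    centred = [r - mean for r in residuals]
    # s_r^2 = T^{-1} * sum of squared centred residuals, as in the proof.
    s_r = sqrt(sum(c * c for c in centred) / T)
    if s_r == 0.0:
        raise ValueError("residuals are constant; the statistic is undefined")
    partial, largest = 0.0, 0.0
    for c in centred:
        partial += c
        largest = max(largest, abs(partial))
    return largest / (s_r * sqrt(T))


def sup_abs_bridge_sf(x, terms=100):
    """Asymptotic P(sup_s |B(s)| > x) from the Kolmogorov series.

    The truncated series is accurate for x that is not very close to zero;
    the result is clipped to [0, 1].
    """
    if x <= 0.0:
        return 1.0
    total = sum(
        (-1) ** (k + 1) * exp(-2.0 * k * k * x * x) for k in range(1, terms + 1)
    )
    return min(1.0, max(0.0, 2.0 * total))


# Toy residuals (illustrative only, not data from the paper).
example_residuals = [1.0, -1.0, 1.0, -1.0]
statistic = cusum_statistic(example_residuals)
p_value = sup_abs_bridge_sf(statistic)
```

Parameter constancy would be rejected at the 5% level when `sup_abs_bridge_sf(statistic)` falls below 0.05, which corresponds to a statistic above roughly 1.36.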

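PROOF OF THEOREM 2. Following the proof of Theorem 1 we need to rederive (21) under nonstationarity. Proceeding along the lines of TJ we obtain

$$
\begin{aligned}
\sigma^{-1} \sum_{t=1}^{\lfloor sT \rfloor} (\hat r_t - \bar{\hat r}) &= \sigma^{-1} \sum_{t=1}^{\lfloor sT \rfloor} (r_t - \bar r) && \text{(26)} \\
&\quad - \sigma^{-1} \left\{ \Sigma_u^{-1/2} H (\hat\theta - \theta_0) \right\}' \, \Sigma_u^{1/2} H^{-1} \Sigma_h^{1/2} \cdot \Sigma_h^{-1/2} \sum_{t=1}^{\lfloor sT \rfloor} (h_t - \bar h) && \text{(27)} \\
&\quad - \sigma^{-1} \sum_{t=1}^{\lfloor sT \rfloor} (R_t - \bar R) && \text{(28)}
\end{aligned}
$$

where $H = \sum_{t=1}^{T} h_t h_t'$ and the remainder term is standardised as in eqn (27). By Thms 6.1 and 6.2 of TJ, there exists a consistent sequence $\hat\theta$ such that $\Sigma_u^{-1/2} H (\hat\theta - \theta_0)$ is $O_P(1)$ and under eqns (18) and (19) $\Sigma_u^{1/2} H^{-1} \Sigma_h^{1/2} \cdot \Sigma_h^{-1/2} \sum_{t=1}^{\lfloor sT \rfloor} (h_t - \bar h)$ is also bounded in probability. Thus, the presence of $\sigma^{-1}$ in (27) and bounding the remainder term (28) as before ensures

$$\sigma^{-1} \sum_{t=1}^{\lfloor sT \rfloor} (\hat r_t - \bar{\hat r}) = \sigma^{-1} \sum_{t=1}^{\lfloor sT \rfloor} (r_t - \bar r) + o_P(1).$$

Theorem 6.2 of TJ effectively proves a martingale CLT for $r_t$ and so the conditions of TJ along with (20) ensure that $\sigma^{-1} \sum_{t=1}^{\lfloor sT \rfloor} r_t$ satisfies a FCLT (Thm 27.14 of Davidson (1994), for example). In practice, $\sigma$ needs to be replaced by an estimated quantity and arguing along the same lines as before we may deduce that

$$\left[ \sum_{t=1}^{T} (\hat r_t - \bar{\hat r})^2 \right]^{-1/2} \max_{j=1,\dots,T} \left| \sum_{t=1}^{j} (\hat r_t - \bar{\hat r}) \right| \Rightarrow \sup_{s \in [0,1]} |B(s)|.$$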

Acknowledgements

We are grateful to the referee for constructive comments on an earlier draft of this paper.

NOTES

1. In CLAR applications, it is straightforward to take account of covariates by regressing $Y_t$ on its own lags and the covariates and then computing residuals to which the CUSUM test is applied. We fitted some seasonal dummies but found there was no substantial difference in the overall conclusions of the study.

REFERENCES

Al-Osh, M. A. and Alzaid, A. A. (1987) First-order integer valued autoregressive (INAR(1)) process. Journal of Time Series Analysis 8, 261–75.

Andrews, D. W. K. (1993) Tests for parameter instability and structural change with unknown change point. Econometrica 61, 821–56. (Corrigendum, 71, 395–397)

Brown, R. L., Durbin, J. and Evans, J. M. (1975) Techniques for testing the constancy of regression relationships over time. Journal of the Royal Statistical Society B 37, 149–63.

Cox, D. R. (1981) Statistical analysis of time series: some recent developments. Scandinavian Journal of Statistics 8, 93–115.

Davidson, J. (1994) Stochastic Limit Theory. Oxford: OUP.

Drost, F. C., Van den Akker, R. and Werker, B. J. M. (2009) Efficient estimation of autoregression parameters and innovation distributions for semiparametric integer-valued AR(p) models. Journal of the Royal Statistical Society B 71, 467–85.

Engle, R. F. and Russell, J. R. (1998) Autoregressive conditional duration: a new approach for irregularly spaced transaction data. Econometrica 66, 987–1007.

Freeland, R. K. and McCabe, B. P. M. (2004) Forecasting discrete valued low count time series. International Journal of Forecasting 20(3), 427–34.

Grunwald, G. K., Hyndman, R. J., Tedesco, L. and Tweedie, R. L. (2000) Non-Gaussian conditional linear AR(1) models. Australian & New Zealand Journal of Statistics 42(4), 479–95.

Granger, C. W. J. and Andersen, A. P. (1978) An Introduction to Bilinear Time Series Models. Göttingen: Vandenhoeck and Ruprecht.

Hall, P. and Heyde, C. C. (1980) Martingale Limit Theory and Its Applications. New York: Academic Press.

Han, L. (2011) Statistical Analysis of Structural Breaks in Discrete Valued Time Series Processes. Ph.D Thesis, University of Liverpool, UK.

Heinen, A. and Rengifo, E. (2007) Multivariate autoregressive modeling of time series count data using copulas. Journal of Empirical Finance 14, 564–83.

Klimko, L. A. and Nelson, P. I. (1978) On conditional least squares estimation for stochastic processes. The Annals of Statistics 6, 629–42.

Maxwell, M. and Woodroofe, M. (2000) Central limit theorems for additive functionals of Markov chains. The Annals of Probability 28(2), 713–24.

McCabe, B. P. M., Martin, G. M. and Harris, D. (2011) Efficient probabilistic forecasts for counts. Journal of the Royal Statistical Society B 73, 253–72.

McKenzie, E. (1985) Some simple models for discrete variate time series. Water Resources Bulletin 21, 645–50.

Olmo, J. and Pouliot, W. (2011) Early detection techniques for market risk failure. Studies in Nonlinear Dynamics & Econometrics 15(3), 1–54.

Perron, P. (2007) Dealing with structural breaks. In Palgrave Handbook of Econometrics, Vol. 1: Econometric Theory (eds T. C. Mills and K. Patterson). Palgrave Macmillan, pp. 7–20.

Pitman, J. and Yor, M. (1999) Path decompositions of a Brownian Bridge related to the ratio of its maximum and amplitude. Studia Scientiarum Mathematicarum Hungarica 35, 457–74.

Serfling, R. J. (1980) Approximation Theorems of Mathematical Statistics. NY: John Wiley.

Tjostheim, D. (1986) Estimation in nonlinear time series models. Stochastic Processes and their Applications 21, 251–73.


