Testing for parameter constancy in non-Gaussian time series
Lu Han^a and Brendan McCabe^{a,*,†}
This paper investigates testing for parameter constancy in models for non-Gaussian time series. Models for discrete valued count time series are investigated as well as more general models with autoregressive conditional expectations. Both sup-tests and CUSUM procedures are suggested depending on the complexity of the model being used. The asymptotic distribution of the CUSUM test is derived for a general class of conditional autoregressive models.
Keywords: Non-Gaussian Time Series; Discrete Valued Count Time Series; sup-Test; CUSUM Test.
1. INTRODUCTION
This paper is concerned with testing for parameter constancy in models for nonlinear and non-Gaussian time series. One of the motivations of the paper is that there exist difficulties in applying some of the more popular techniques, like the likelihood based sup-tests introduced by Andrews (1993), to nonlinear non-Gaussian models. This is essentially because the nonlinearity and the possibility of large numbers of parameters (often associated with lags) make computation of the required critical values quite difficult. Accordingly we seek a simplified approach that may be applied generally even in the absence of a likelihood.
The paper is organised as follows. Section 2 discusses a set of nonlinear non-Gaussian models, the integer autoregressive class (INAR), that are useful in modelling dependence in discrete count data and a maximum score statistic (Andrews, 1993) is suggested to test for a structural break. It turns out that it is impossible to compute critical values for sup-tests unless the chosen model in the INAR class is comparatively simple. This suggests the need for a general technique to test for parameter stability that may be applied not only to discrete valued time series but to a general class of nonlinear and non-Gaussian processes. Section 3 uses a generalised conditional autoregressive (CAR) framework which extends the linear setup of Grunwald et al. (2000) (GHTT). According to GHTT even the simple conditional linear structure accounts for over 30 different nonlinear non-Gaussian time series models published in the literature. Included are models where $Y_t$ is continuously valued and defined on the real line or on subsets thereof. Also included are models for discrete $Y_t$ and, in particular, the INAR class. We advocate the use of the two-sided CUSUM test for parameter constancy, proposed by Brown et al. (1975) (BDE), in the CAR class. Section 4 describes the CUSUM test and its asymptotic distribution is established in Section 5. These results allow the CAR class to be checked for parameter stability in a very simple way. A Monte Carlo study is conducted in Section 6 to assess the performance of the CUSUM test when the data is generated by two different discrete valued INAR models from the CLAR class, the Poisson autoregression (INAR-P) and Negative-Binomial autoregression (INAR-NB) models respectively. Of course, there is the potential for the CUSUM test based on the conditional mean to suffer a loss of power relative to a sup-test for a model with a fully specified likelihood. A further Monte Carlo simulation compares the performance of the CUSUM and sup-tests in the INAR-P model, as it is feasible to compute critical values in this case. In contrast, due to computational difficulties, critical values for the INAR-NB model are not presented. A dataset consisting of 144 count observations on motor vehicle thefts is analysed in Section 7, where a structural break is detected and a comparison made between forecasts derived from a naive use of the full data and those computed from the post break segment alone. The two sets of forecasts differ appreciably and this is true both for point forecasts and probability estimates of the forecast distribution. Thus, ignoring a structural break may have a substantial bearing on the forecast outcomes for count data series. Section 8 concludes.
2. MODELS FOR COUNT TIME SERIES
The general INAR class is a set of models for discrete valued time series that has the form
$$Y_t = \phi_1 \circ Y_{t-1} + \cdots + \phi_p \circ Y_{t-p} + \varepsilon_t, \qquad (1)$$
^a University of Liverpool
* Correspondence to: Brendan McCabe, Management School, University of Liverpool, Chatham Building, Chatham Street, Liverpool, L69 7ZH, UK.
† E-mail: [email protected]
J. Time Ser. Anal. 2012 © 2012 Blackwell Publishing Ltd.
Original Article
First version received November 2011 Published online in Wiley Online Library: 2012
where, conditional on $Y_{t-k}$, the $\phi_k \circ Y_{t-k}$ are thinning operators (see Section 2.1) and $\varepsilon_t$ is a disturbance. The disturbance may follow any discrete valued process. It is possible, asymptotically, to estimate the infinite dimensional distribution of $\varepsilon_t$ non-parametrically (see Drost et al. (2009) and McCabe et al. (2011)). However, it is clearly very difficult to simulate critical values for a sup-test in this context; as a result, a parametric model for the disturbances is required. Here, we list the basic properties of two of the most common parametric discrete valued models in the case where there is a single lag. The Poisson model is discussed in Section 2.1 while Section 2.2 deals with the Negative-Binomial case.
2.1. The INAR(1)-P model
The INAR(1)-P model was originally proposed by McKenzie (1985) and Al-Osh and Alzaid (1987) and is specified by the equation
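$$Y_t = \mathrm{Bin}(Y_{t-1}, \alpha) + \xi_t(\lambda), \qquad (2)$$
where $\mathrm{Bin}(Y_{t-1}, \alpha)$ is, conditional on $Y_{t-1}$, a Binomial distribution with probability of success $\alpha$ and the $\xi_t(\lambda)$ are independently and identically distributed (i.i.d.) Poisson arrivals with parameter $\lambda$, denoted $\mathrm{Pois}(\lambda)$. This model is often written in the thinning notation
$$Y_t = \alpha \circ Y_{t-1} + \xi_t(\lambda),$$
where
$$\alpha \circ Y_{t-1} = \sum_{j=1}^{Y_{t-1}} B_{jt}$$
and each collection $\{B_{jt}\}_{j=1}^{Y_{t-1}}$ is a set of independent Bernoulli random variables with parameter $\alpha$. The mass function of $\mathrm{Bin}(Y_{t-1}, \alpha)$ is given by
$$f(s \mid Y_{t-1}; \alpha) = \binom{Y_{t-1}}{s} \alpha^{s} (1-\alpha)^{Y_{t-1}-s}$$
and that of $\xi_t(\lambda)$ is written
$$g(s; \lambda) = \frac{e^{-\lambda}\lambda^{s}}{s!}.$$
By the usual convolution arguments
$$p(Y_t \mid Y_{t-1}) = \sum_{s=0}^{\min(Y_t,\, Y_{t-1})} \binom{Y_{t-1}}{s} \alpha^{s} (1-\alpha)^{Y_{t-1}-s} \, \frac{e^{-\lambda}\lambda^{Y_t-s}}{(Y_t-s)!} \qquad (3)$$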
from which the likelihood, conditional on the initial observation, can be constructed.
The notation $(\alpha_t, \lambda_t)$ is used to indicate parameter instability when $\alpha$ and $\lambda$ are allowed to vary. In the case of a single structural break at a time $T_B$, we set
$$(\alpha_t, \lambda_t) = \begin{cases} (\alpha, \lambda) & \text{for } t = 1, \ldots, T_B, \\ (\alpha + \Delta\alpha, \lambda + \Delta\lambda) & \text{for } t = T_B + 1, \ldots, T. \end{cases} \qquad (4)$$
The break point $T_B$ is assumed to lie in an interval, $T_B \in (T_L, T_U)$, and the interval $(T_L, T_U)$ is often expressed as a fraction of the sample size and denoted $\Pi$. A minor modification of the argument leading to eqn (3) allows the likelihood to be constructed under a structural break. Specifically, the log likelihood is given by
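$$\ell = \log \prod_{t=2}^{T} \sum_{s=0}^{\min(Y_t,\, Y_{t-1})} \binom{Y_{t-1}}{s} \alpha_t^{s} (1-\alpha_t)^{Y_{t-1}-s} \, \frac{e^{-\lambda_t}\lambda_t^{Y_t-s}}{(Y_t-s)!}. \qquad (5)$$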
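The transition probability (3) and the break log likelihood (5) can be sketched in a few lines. This is a minimal illustration only: the function names and argument order are hypothetical and are not taken from the paper or from Han (2011), and no attempt is made at numerical efficiency.

```python
import math


def inar1p_transition(y, y_prev, alpha, lam):
    """p(Y_t = y | Y_{t-1} = y_prev) as in eqn (3)."""
    total = 0.0
    for s in range(min(y, y_prev) + 1):
        # s survivors out of y_prev under Binomial thinning
        survive = math.comb(y_prev, s) * alpha**s * (1.0 - alpha) ** (y_prev - s)
        # y - s new Poisson arrivals
        arrive = math.exp(-lam) * lam ** (y - s) / math.factorial(y - s)
        total += survive * arrive
    return total


def break_loglik(ys, alpha, lam, d_alpha, d_lam, t_break):
    """Log likelihood (5), conditional on the first observation ys[0] = Y_1.

    Observations with time index t <= t_break use (alpha, lam);
    later observations use the shifted pair, as in eqn (4).
    """
    loglik = 0.0
    for i in range(1, len(ys)):
        t = i + 1  # 1-based time index of ys[i]
        if t <= t_break:
            a_t, lam_t = alpha, lam
        else:
            a_t, lam_t = alpha + d_alpha, lam + d_lam
        loglik += math.log(inar1p_transition(ys[i], ys[i - 1], a_t, lam_t))
    return loglik
```

Setting `d_alpha = d_lam = 0` recovers the restricted likelihood used under the null, whatever value of `t_break` is supplied.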
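The null hypothesis is $H_0: (\Delta\alpha, \Delta\lambda) = 0$ and the alternative, $H_1$, is that, after a fixed point $T_B$ (fraction $\pi$), $(\Delta\alpha, \Delta\lambda) \neq 0$. The score function for the parameter vector $\theta = [\Delta\alpha, \Delta\lambda, \alpha, \lambda]'$ is
$$\dot{\ell}_\theta = \left(\dot{\ell}_{\Delta\alpha}, \dot{\ell}_{\Delta\lambda}, \dot{\ell}_{\alpha}, \dot{\ell}_{\lambda}\right)',$$
where $\dot{\ell}_\theta$ denotes differentiation of the log likelihood given in eqn (5) with respect to the vector $\theta$, whose elements are specified by eqn (4). The symmetric information matrix is written
$$\ddot{\ell}(\theta) = -\begin{bmatrix} \ddot{\ell}_{\Delta\alpha\Delta\alpha} & \ddot{\ell}_{\Delta\alpha\Delta\lambda} & \ddot{\ell}_{\Delta\alpha\alpha} & \ddot{\ell}_{\Delta\alpha\lambda} \\ & \ddot{\ell}_{\Delta\lambda\Delta\lambda} & \ddot{\ell}_{\Delta\lambda\alpha} & \ddot{\ell}_{\Delta\lambda\lambda} \\ & & \ddot{\ell}_{\alpha\alpha} & \ddot{\ell}_{\alpha\lambda} \\ & & & \ddot{\ell}_{\lambda\lambda} \end{bmatrix}$$
with each element of $\ddot{\ell}(\theta)$, $\ddot{\ell}_{(\cdot)(\cdot)}$, denoting a second derivative with respect to the elements of $\theta$. The restricted parameter vector under the null, $\theta_R$, is estimated by $\hat{\theta}_R = (0, 0, \hat{\alpha}, \hat{\lambda})'$. The estimators $\hat{\alpha}$ and $\hat{\lambda}$ are obtained by maximising the log likelihood (5) under $(\Delta\alpha, \Delta\lambda)' = 0$. The score statistic $S_{\Delta\alpha\Delta\lambda}(\pi)$ is evaluated as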
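$$S_{\Delta\alpha\Delta\lambda}(\pi) = \dot{\ell}_\theta(\hat{\theta}_R)' \, \ddot{\ell}(\hat{\theta}_R)^{-1} \, \dot{\ell}_\theta(\hat{\theta}_R).$$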
By constructing such a score statistic for every $T_B \in (T_L, T_U)$ we compute the sup-test as the maximum of that set, written $\sup S(\pi) = \sup_{\pi} S_{\Delta\alpha\Delta\lambda}(\pi)$. Obvious special cases occur when $\alpha$ only or $\lambda$ only is thought to change. Explicit expressions for the derivatives involved are available in Han (2011). For the INAR(1)-P model it is feasible to construct asymptotic critical values for the sup-test by simulation because, despite the nonlinearity, there are only two parameters involved under the null.
Once $\Pi$ and the dimension of the shift $\Delta$ are determined, the critical values of the distribution of $\sup S(\pi)$ can be evaluated by Monte Carlo experiments. We set $\Pi = [0.1, 0.9]$ and $T = 3000$. In addition, 7000 replications are used to determine critical values at the 0.01, 0.05 and 0.1 significance levels. The simulated critical values for $\sup S(\pi)$ are given in the first half of Table 1, for the cases where $\alpha$ changes while $\lambda$ is held constant, where $\lambda$ changes while $\alpha$ is constant and, finally, where both $\alpha$ and $\lambda$ are thought likely to change. We computed confidence intervals for the quantiles of the distribution of $\sup S(\pi)$ using asymptotic normality and a nonparametric estimate of the unknown density as in Serfling (1980), Section 2.6.
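The confidence intervals for simulated quantiles rest on the standard result that a sample $p$-quantile is asymptotically normal with variance $p(1-p)/\{n f(q_p)^2\}$, where $f$ is the density at the quantile. The sketch below illustrates that calculation under assumptions of our own: a Gaussian kernel density estimate with a rule-of-thumb bandwidth stands in for the nonparametric density estimate (the paper does not state which estimator was used), and the function name is hypothetical.

```python
import math
import random
import statistics


def quantile_ci(draws, p, z=1.96):
    """Sample p-quantile with an asymptotic-normal confidence interval.

    The density at the quantile is estimated with a Gaussian kernel and a
    rule-of-thumb bandwidth; both choices are assumptions of this sketch.
    """
    x = sorted(draws)
    n = len(x)
    q_hat = x[max(0, math.ceil(n * p) - 1)]  # order statistic for the p-quantile
    h = 1.06 * statistics.stdev(x) * n ** (-0.2)  # bandwidth
    f_hat = sum(math.exp(-0.5 * ((q_hat - xi) / h) ** 2) for xi in x)
    f_hat /= n * h * math.sqrt(2.0 * math.pi)
    se = math.sqrt(p * (1.0 - p) / n) / f_hat
    return q_hat, (q_hat - z * se, q_hat + z * se)


if __name__ == "__main__":
    # Stand-in for simulated test statistics: standard normal draws.
    rng = random.Random(0)
    sims = [rng.gauss(0.0, 1.0) for _ in range(5000)]
    print(quantile_ci(sims, 0.95))
```

For a critical value at significance level 0.05, `p` would be 0.95 and `draws` would hold the simulated values of the sup-statistic.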
2.2. The INAR(1)-NB model
The specification of the INAR(1)-NB model is given by
$$Z_t = \beta\text{-Bin}(Z_{t-1}, m, n) + \zeta_t(\mu, \beta), \qquad (6)$$
where, conditional on $Z_{t-1}$, $\beta\text{-Bin}(Z_{t-1}, m, n)$ has a Beta-Binomial distribution. The Beta-Binomial mass function, on $s = 0, 1, \ldots, Z_{t-1}$, is
$$\beta\text{-Bin}(s \mid Z_{t-1}; m, n) = \binom{Z_{t-1}}{s} \frac{B(m + s,\, n + Z_{t-1} - s)}{B(m, n)}$$
where $B(\cdot, \cdot)$ is the Beta function. The Negative-Binomial arrivals $\zeta_t(\mu, \beta)$ are i.i.d. and have mass function $N(s)$, on $s = 0, 1, 2, \ldots$, given by
$$N(s) = \frac{\Gamma(s + \mu)}{\Gamma(\mu)\,\Gamma(s + 1)} \, \beta^{s} (1 - \beta)^{\mu},$$
where $\Gamma(\cdot)$ is the Gamma function. It is also possible to think of the Negative-Binomial model as an INAR-P model where $\alpha$ and $\lambda$ are random with Beta and Gamma distributions respectively. Thus, the Negative-Binomial model may be thought of as a very natural overdispersed generalisation of the Poisson. As before, the likelihood under a structural break may be computed and the relevant score and information matrix are available in Han (2011). Unfortunately, it is not feasible to readily compute simulated critical values for the maximum value of the score statistic for this model because of the number of parameters and the nonlinearities involved.
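The mixture interpretation suggests a direct way to simulate the model: in each period draw a thinning probability from a Beta distribution and an arrival rate from a Gamma distribution, then proceed as in the Poisson case. The sketch below does this under the assumption that the Gamma distribution has shape $\mu$ and scale $\beta/(1-\beta)$, which reproduces the mass function $N(s)$ above; function names and defaults are hypothetical.

```python
import math
import random


def poisson_draw(rng, lam):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_inar1_nb(T, m, n, mu, beta, z0=0, seed=0):
    """Simulate Z_1..Z_T from eqn (6) via the Beta/Gamma mixture representation."""
    rng = random.Random(seed)
    z, path = z0, []
    for _ in range(T):
        alpha_t = rng.betavariate(m, n)  # random thinning probability
        lam_t = rng.gammavariate(mu, beta / (1.0 - beta))  # random arrival rate
        survivors = sum(rng.random() < alpha_t for _ in range(z))  # Beta-Binomial
        z = survivors + poisson_draw(rng, lam_t)  # Negative-Binomial arrivals
        path.append(z)
    return path


def conditional_mean(z_prev, m, n, mu, beta):
    """E(Z_t | Z_{t-1} = z_prev): Beta-Binomial mean plus Negative-Binomial mean."""
    return m / (m + n) * z_prev + mu * beta / (1.0 - beta)
```

A long simulated path should have a sample mean close to the stationary mean implied by `conditional_mean`, namely the arrival mean divided by one minus the thinning mean.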
3. THE CAR CLASS OF MODELS
Due to the potential difficulty in computing critical values for sup-statistics and the need for tests of parameter stability for nonlinear non-Gaussian models in general, this section adopts a conditional autoregressive approach which encompasses many existing models in the literature. In Section 3.1, the scope of the CAR class is outlined and its basic properties are established. Then, Section 3.2 reviews the Conditional Least Squares (CLS) framework and the assumptions of Klimko and Nelson (1978) (KN), which allow the asymptotic distribution theory of CLS estimators for the CAR class to be established.
3.1. Definition and Basic Properties
Let $Y_t$ be a process in discrete time with a continuous or discrete sample (state) space $\mathcal{Y} \subseteq \mathbb{R}$, the real line. The CAR structure is defined by the relation
$$E(Y_t \mid \mathcal{F}_{t-1}) = g(\theta; \mathcal{F}_{t-1}), \qquad (7)$$
where $\mathcal{F}_{t-1}$ is a sigma field at time $t-1$ and $\theta$ is a $P$-dimensional vector of parameters. The function $g(\cdot)$ is assumed to be known and this is an important consideration in applications where such prior knowledge is assumed.
Table 1. Estimated critical values for the maximum score test, with confidence intervals in parentheses (7000 replications), and critical values for the two-sided CUSUM test

| Size | Sup S(π): Δα | Sup S(π): Δλ | Sup S(π): (Δα, Δλ) | Two-sided CUSUM |
|---|---|---|---|---|
| 0.01 | 12.5888 (11.3303, 13.8472) | 12.5344 (11.3868, 13.6820) | 16.4437 (15.5823, 17.3052) | 1.6276221 |
| 0.05 | 9.1364 (8.8657, 9.4071) | 9.1025 (8.8301, 9.3749) | 12.4814 (12.1665, 12.7965) | 1.3581015 |
| 0.10 | 7.5561 (7.3994, 7.7126) | 7.5695 (7.3958, 7.7432) | 10.6294 (10.4511, 10.8078) | 1.2238734 |
The conditional linear special case, the CLAR(p) model, has
$$g(\phi; \mathcal{F}_{t-1}) = \phi_0 + \phi_1 Y_{t-1} + \cdots + \phi_p Y_{t-p} = \phi_0 + y_{t-1}'\phi. \qquad (8)$$
GHTT studied the 1-dimensional linear case, that is, eqn (8) with $p = 1$. The CLAR(p) class of models itself is very broad and, as GHTT pointed out, contains many of the nonlinear and non-Gaussian models that have been proposed in the literature. For instance, continuously valued autoregressive (AR) models of the form
$$Y_t = \phi_1 Y_{t-1} + \cdots + \phi_p Y_{t-p} + \varepsilon_t(\phi_0), \qquad (9)$$
where the $\varepsilon_t(\phi_0)$ are i.i.d. with finite mean $\phi_0$, are included. Any continuous random variable may serve as the disturbance; for example, $\varepsilon_t(\phi_0)$ may be Gaussian on $\mathbb{R}$ or exponentially distributed on $\mathcal{Y} = [0, \infty)$. Switching models are allowed so that $\varepsilon_t(\phi_0)$ could be zero with probability $p$ and exponential, say, with probability $(1 - p)$. One immediate consequence of the CLAR(p) definition (8) is that the correlation structure of the CLAR(p) class is the same as that of the AR(p) model (9). This generalises the $p = 1$ result of GHTT. The CLAR(p) class includes an extension of eqn (9) that allows for random coefficients, that is
$$Y_t = \phi_{1,t} Y_{t-1} + \cdots + \phi_{p,t} Y_{t-p} + \varepsilon_t(\phi_{0,t}), \qquad (10)$$
where the $\phi_{i,t}$ are random variables with $E(\phi_{i,t}) = \phi_i$. For example, in the case where $p = 1$, we can let $\phi_{1,t}$ have a suitably chosen Beta distribution and $\varepsilon_t(\phi_0)$ a Gamma distribution, resulting in the marginal distribution of $Y_t$ being Gamma. Such models are useful in modelling duration data which are dependent.
The general INAR class as given in eqn (1) is also included as a special case of the CLAR(p) process with $\mathcal{Y} = \{0, 1, 2, \ldots\}$. The INAR-P model of Section 2.1 uses a thinning function built on the Binomial distribution with the arrivals process, $\varepsilon_t(\phi_0)$, being Poisson. The conditional expectation has the form
$$E(Y_t \mid \mathcal{F}_{t-1}) = \alpha Y_{t-1} + \lambda.$$
The INAR-NB model of Section 2.2 uses Beta-Binomial thinning and Negative-Binomial arrivals. The conditional expectation is
$$E(Z_t \mid \mathcal{F}_{t-1}) = \frac{m}{m+n} Z_{t-1} + \frac{\mu\beta}{1-\beta} \qquad (11)$$
and is linear in $Z_{t-1}$ with $\phi_1 = m/(m+n)$ and $\phi_0 = \mu\beta/(1-\beta)$. The parameters $\phi_k$ in eqn (1) are also permitted to be random variables, an example of which, given in Section 2.2, was used to motivate the INAR-NB model.
Parameter driven models (see Cox (1981)) fall into the CAR class with
$$g(\phi; \mathcal{F}_{t-1}) = g(y_{t-1}'\phi + \phi_0).$$
For example, an alternative to eqn (2) when modelling dependent counts would be to assume that
$$Y_t \mid \mathcal{F}_{t-1} \overset{d}{=} \mathrm{Pois}(\lambda_t)$$
with $\lambda_t = e^{(y_{t-1}'\phi + \phi_0)}$, thus allowing for $Y_t \mid \mathcal{F}_{t-1}$ to be Poisson distributed (see, for example, Heinen and Rengifo (2007)). Here $\overset{d}{=}$ means equal in distribution. A Gamma distribution whose mean is driven by $g(y_{t-1}'\phi + \phi_0)$ could serve as an alternative model for dependent durations (see, for example, Engle and Russell (1998)). Section 3.2 addresses the question of inference in the CAR class.
3.2. Estimation of the CAR Class
Specifying detailed regularity conditions for inference in CAR models is quite lengthy, so we utilise the Conditional Least Squares (CLS) framework of KN, specifically their Thms 2.1 and 2.2. CLS minimises, with respect to $\theta$, the sum of the squared deviations
$$Q(\theta) = \sum_{t=1}^{T} \left[ Y_t - E(Y_t \mid \mathcal{F}_{t-1}) \right]^2. \qquad (12)$$
For notational convenience we start summations at $t = 1$. Under the KN conditions, $\hat{\theta}$, a solution to the equations
$$\frac{\partial Q(\theta)}{\partial \theta} = 0, \qquad (13)$$
is a consistent estimate of $\theta$ and $T^{1/2}(\hat{\theta} - \theta)$ is asymptotically normal. Unfortunately, verifying these high level conditions for the various models that may be used in practice is quite difficult and KN (Thms 3.1 and 3.2) provide details for the case when $Y_t$ is stationary and ergodic. However, Tjostheim (1986) (TJ) has made the important point that finding nonlinear models which satisfy stationary and ergodic conditions is far from trivial and he provides additional consistency and asymptotic results for nonstationary processes.
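For the linear CLAR(1) case the CLS problem (12) and (13) has a closed-form solution, since it is an ordinary least squares regression of $Y_t$ on a constant and $Y_{t-1}$. The following sketch, with a hypothetical function name, conditions on the first observation rather than starting the sum at $t = 1$; for the INAR(1)-P model the two estimates correspond to $\lambda$ and $\alpha$ respectively.

```python
def cls_clar1(y):
    """CLS estimates (phi0, phi1) minimising sum_t (Y_t - phi0 - phi1 * Y_{t-1})^2."""
    lagged, current = y[:-1], y[1:]
    n = len(lagged)
    mean_lag = sum(lagged) / n
    mean_cur = sum(current) / n
    # Normal equations of the least squares problem, solved in closed form.
    sxx = sum((a - mean_lag) ** 2 for a in lagged)
    sxy = sum((a - mean_lag) * (b - mean_cur) for a, b in zip(lagged, current))
    phi1 = sxy / sxx
    phi0 = mean_cur - phi1 * mean_lag
    return phi0, phi1


def cls_residuals(y, phi0, phi1):
    """Residuals Y_t - phi0 - phi1 * Y_{t-1}, the inputs to a CUSUM-type test."""
    return [b - phi0 - phi1 * a for a, b in zip(y[:-1], y[1:])]
```

Nonlinear choices of $g$ would instead require a numerical minimiser for $Q(\theta)$.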
Noting that $r_t = Y_t - E(Y_t \mid \mathcal{F}_{t-1})$ is a martingale difference sequence, we may utilise the fact that Laws of Large Numbers (LLN) and Central Limit Theorems (CLT) are readily available for such processes. Stationarity and ergodicity simplify matters considerably.
Since a stationary process may or may not have started in the infinite past and since we require martingale arguments, whose adapted sigma fields are required to be increasing, it is necessary to define the (sub) sigma field, for some finite integer $m$,
$$\mathcal{F}_{t-1}(m) = \sigma\{Y_s,\; t - m \le s \le t - 1\}$$
and to assume that (a.s.)
$$E(Y_t \mid \mathcal{F}_{t-1}) = E(Y_t \mid \mathcal{F}_{t-1}(m)), \qquad E(r_t^2 \mid \mathcal{F}_{t-1}) = E(r_t^2 \mid \mathcal{F}_{t-1}(m)) \qquad (14)$$
(see eqns (3.3) of TJ). Transparent sufficient conditions for asymptotic normality of $T^{1/2}(\hat{\theta} - \theta)$, where $\hat{\theta}$ is the solution to eqn (13), are

ASSUMPTION KN
(1) the sequence $Y_t$ is stationary and ergodic with finite 4th moments;
(2) the function $g(\theta; \mathcal{F}_{t-1})$ and its partial derivatives are continuous and uniformly bounded in $\theta$ for all $i, j, k$, i.e.
$$|g| \le H_0, \quad |\partial g / \partial\theta_i| \le H_{1i}, \quad |\partial^2 g / \partial\theta_i \partial\theta_j| \le H_{2ij}, \quad |\partial^3 g / \partial\theta_i \partial\theta_j \partial\theta_k| \le H_{3ijk},$$
where the $H$-functions are independent of $\theta$, may depend on $Y_{t-1}, \ldots, Y_0$ and are square integrable;
(3) the matrix defined by $E(\partial g/\partial\theta_i \cdot \partial g/\partial\theta_j)$ is non singular and $\partial g(\theta; \mathcal{F}_{t-1})/\partial\theta_i$ has finite 4th moments.
In the linear CLAR(p) case, Assumption KN reduces to the sequence $y_t$ being stationary and ergodic with finite 4th moments and the matrix of the lagged $y_{t-1}$ having rank $P$; these conditions are sufficient for $T^{1/2}(\hat{\theta} - \theta)$ to be asymptotically normally distributed.
In the CLAR(1) case, conditions for stationarity and ergodicity were given in GHTT and while these conditions are not very restrictive, as noted above, TJ makes the point that the simplicity of the linear situation does not transfer to nonlinear models, for example the bilinear models of Granger and Andersen (1978). Thus conditions for CAR estimators to be asymptotically normal are required for nonstationary processes. Theorems 6.1 and 6.2 of TJ provide conditions for $\hat{\theta}$ to be consistent and for $T^{1/2}(\hat{\theta} - \theta)$ to be asymptotically normal for a reasonably broad class of nonstationary processes which allows for heterogeneity and processes which do not have an initial stationary distribution; in addition a multivariate generalisation is provided allowing $Y_t$ to be a vector.
4. STRUCTURAL BREAKS IN CAR MODELS
For a recent survey of the general area of structural breaks, we refer the reader to Perron (2005). Non constancy of the parameters in the data generating process (DGP) of $Y_t$ will typically induce instability in the parameter vector $\theta$ and the conditional expectation will have time varying parameters, that is, $E(Y_t \mid \mathcal{F}_{t-1}) = g(\theta_t; \mathcal{F}_{t-1})$. When parameter instability presents itself as a structural break we set
$$\theta_t = \begin{cases} \theta & t = 1, \ldots, T_B, \\ \theta + \Delta\theta & t = T_B + 1, \ldots, T, \end{cases}$$
where the break point $T_B$ and the magnitude of the shift vector $\Delta\theta$ are unknown. The null hypothesis is
$$H_0 : \Delta\theta = 0$$
and the alternative is that $H_0$ is not true.
The Negative-Binomial model of Section 2.2 shows that there may be a price to pay for adopting the simple CAR approach. From eqn (11), it is apparent that simultaneous changes in parameters may cancel out and the CAR approach will have no power. To see how this might occur, set the total differential of $\phi = m/(m + n)$ to zero to obtain
$$\Delta\phi = \frac{\partial}{\partial m}\left(\frac{m}{m+n}\right)\Delta m + \frac{\partial}{\partial n}\left(\frac{m}{m+n}\right)\Delta n = 0$$
which implies that there will be no change in $\phi$ if
$$\frac{\Delta m}{m} = \frac{\Delta n}{n},$$
i.e. if $m$ and $n$ change by the same percentage amount. Of course, there is no a priori reason to believe that such cancellations are likely to occur in practice. Since the model is estimated under the null, a CUSUM test may always be performed; however, it may have little power in this type of situation.
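The step from the total differential of $\phi = m/(m+n)$ to the equal-percentage condition can be spelled out as follows. The two partial derivatives are
$$\frac{\partial}{\partial m}\left(\frac{m}{m+n}\right) = \frac{n}{(m+n)^2}, \qquad \frac{\partial}{\partial n}\left(\frac{m}{m+n}\right) = -\frac{m}{(m+n)^2},$$
so that
$$\Delta\phi = \frac{n\,\Delta m - m\,\Delta n}{(m+n)^2},$$
which is zero exactly when $n\,\Delta m = m\,\Delta n$, that is, when $\Delta m / m = \Delta n / n$. As a purely illustrative case, scaling both parameters by a common factor $c > 0$ gives $cm/(cm + cn) = m/(m+n)$, so the thinning mean, and hence the slope of the conditional expectation (11), is unchanged.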
Using eqn (13) define
$$r_t = Y_t - g(\hat{\theta}; \mathcal{F}_{t-1}) \qquad (15)$$
and
$$\bar{r} = T^{-1} \sum_{t=1}^{T} r_t; \qquad (16)$$
then the two-sided statistic based on the CUSUM process is given by
$$\max_{j=1,\ldots,T} \; T^{-1/2} s_r^{-1} \left| \sum_{t=1}^{j} (r_t - \bar{r}) \right|$$
where
$$s_r^2 = T^{-1} \sum_{t=1}^{T} (r_t - \bar{r})^2. \qquad (17)$$
If there were additional information on the possible position of the break, for example that it occurred late in the sample, a weighted version of the CUSUM may be considered; see Olmo and Pouliot (2011).
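Definitions (15) to (17) translate directly into code. The sketch below takes a list of residuals and returns the two-sided CUSUM statistic; it also returns the index at which the maximum is attained, which is one common way to locate a candidate break but is an assumption of this sketch rather than a rule stated in the text. The function name is hypothetical.

```python
import math


def cusum_statistic(residuals):
    """Two-sided CUSUM statistic: max_j T^{-1/2} s_r^{-1} |sum_{t<=j} (r_t - r_bar)|.

    Returns (statistic, j_max), where j_max is the 1-based index at which the
    absolute partial sum is largest.
    """
    T = len(residuals)
    r_bar = sum(residuals) / T
    s_r = math.sqrt(sum((r - r_bar) ** 2 for r in residuals) / T)  # eqn (17)
    partial, max_abs, j_max = 0.0, 0.0, 0
    for j, r in enumerate(residuals, start=1):
        partial += r - r_bar  # running sum of centred residuals
        if abs(partial) > max_abs:
            max_abs, j_max = abs(partial), j
    return max_abs / (math.sqrt(T) * s_r), j_max
```

The null of parameter constancy would be rejected when the statistic exceeds the relevant two-sided CUSUM critical value in Table 1.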
5. NULL DISTRIBUTION OF CUSUM TEST
In view of the discussion of the distribution theory for CAR estimators we shall consider the stationary and nonstationary cases separately. First, we deal with the stationary CAR model and this will allow the discrete valued models of Section 2 to be tested for structural breaks as the Poisson and Negative-Binomial thinning models are stationary and ergodic. Thus, the following Theorem 1, whose proof is given in the Appendix, circumvents the difficulty of finding critical values for the sup-test in the INAR-NB model.
THEOREM 1. Let $Y_t$ satisfy the conditions of Assumption KN. For $\hat{\theta}$, a solution to (13), and using definitions (15), (16) and (17),
$$\max_{j=1,\ldots,T} \; T^{-1/2} s_r^{-1} \left| \sum_{t=1}^{j} (r_t - \bar{r}) \right| \Rightarrow \sup_{s \in [0,1]} |B(s)|$$
where $B(s)$ is a Brownian Bridge.
Explicit expressions for the distribution of the Brownian Bridge (see Pitman and Yor (1999)) are available and critical values are given in the second half of Table 1.
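The supremum of the absolute value of a Brownian Bridge follows the Kolmogorov distribution, whose tail probability is $P(\sup_s |B(s)| > x) = 2\sum_{k \ge 1} (-1)^{k+1} e^{-2k^2x^2}$. The sketch below, with hypothetical function names and an arbitrary truncation of the series, inverts this tail by bisection; the resulting values agree with the CUSUM column of Table 1 to roughly four decimal places.

```python
import math


def sup_bridge_tail(x, terms=100):
    """P(sup_{0<=s<=1} |B(s)| > x) for a Brownian Bridge, via the Kolmogorov series."""
    return 2.0 * sum(
        (-1) ** (k + 1) * math.exp(-2.0 * k * k * x * x) for k in range(1, terms + 1)
    )


def critical_value(size, lo=0.3, hi=3.0, tol=1e-10):
    """Solve sup_bridge_tail(x) = size by bisection; the tail is decreasing in x."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sup_bridge_tail(mid) > size:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for size in (0.01, 0.05, 0.10):
        print(size, round(critical_value(size), 4))
```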
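Theorem 1 hinges on (C1) $T^{1/2}(\hat{\theta} - \theta_0)$ being $O_P(1)$, (C2) $T^{-1} \sup_{s \in [0,1]} \left\| \sum_{t=1}^{\lfloor sT \rfloor} (h_t - \bar{h}) \right\| \to_p 0$ and (C3) $r_t$ satisfying a stationary martingale FCLT. These steps need to be reformulated for the non-stationary case. Define moments $\sigma^2 = \mathrm{Var}\left[\sum_{t=1}^{T} r_t\right]$, $\Sigma_h = \mathrm{Var}\left[\sum_{t=1}^{T} h_t\right]$ and $\Sigma_u = \sum_{t=1}^{T} E\left(h_t h_t' r_t^2\right)$.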
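THEOREM 2. Augment the conditions of Thms 6.1 and 6.2 of Tjostheim (1986) by
$$\sum_{t=1}^{T} h_t h_t' = O_p\left(\Sigma_u^{1/2} \Sigma_h^{1/2}\right), \qquad (18)$$
$$\Sigma_h^{-1/2} \sum_{t=1}^{\lfloor sT \rfloor} h_t \qquad (19)$$
is tight and
$$\sigma^{-1} \sum_{t=1}^{\lfloor sT \rfloor} E\left(r_t^2\right) \to s, \quad s \in [0, 1]; \qquad (20)$$
then
$$\left[ \sum_{t=1}^{T} (r_t - \bar{r})^2 \right]^{-1/2} \max_{j=1,\ldots,T} \left| \sum_{t=1}^{j} (r_t - \bar{r}) \right| \Rightarrow \sup_{s \in [0,1]} |B(s)|.$$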
Sufficient conditions to ensure eqns (18) and (19) would be to assume that $Y_t$ satisfied a mixing property (or perhaps near epoch dependent on a mixing process) then, using eqn (14), $h_t$ would also be mixing with the same mixing numbers and so a FCLT could be established. Theorems 1 and 2 show that any process $Y_t$ of interest to an investigator may be subjected to a test for parameter instability in a simple way as long as it satisfies the CAR property and the associated regularity conditions. Obviously, in any particular application one has to check that the CLAR family member does satisfy the conditions. An interesting case is the random coefficient AR model of Section 3.1. Using Thm 7.2 and (4.8) of TJ, it follows that eqn (10) satisfies the regularity conditions when the $\phi_{i,t}$ are i.i.d., are independent of the disturbance and satisfy some additional moment conditions. On the other hand, should $\phi_{i,t}$ follow a stochastic process (e.g. an ARMA model) only some special cases have been considered as in, for example, Thm 7.3 of TJ.
6. MONTE CARLO EXPERIMENTS FOR THE CUSUM AND SUP S(π) TESTS
Here, the sizes and powers of the two-sided CUSUM test when the data generating process (DGP) is Poisson or Negative-Binomial are investigated. We also compare the CUSUM and $\sup S(\pi)$ tests to determine the loss of power that may occur by using information on the conditional mean only. A significance level of 5% is employed in each experiment, which consists of 5000 replications for the CUSUM test and 1000 for the $\sup S(\pi)$ test. The results show that the CUSUM test can suffer from substantial power loss when compared with the $\sup S(\pi)$ test. However, the $\sup S(\pi)$ test presupposes that the true model is known and correctly specified, which is often not the case in practice.
6.1. Size
Table 2 gives sizes of the CUSUM test for both the Poisson and Negative-Binomial models with the sample size $T$ ranging from 100 to 1000. For the Poisson model we consider cases where $\lambda$ is fixed at 1 and the values of $\alpha$ vary from 0.1 to 0.9 in steps of 0.2. Also considered is the situation where $\alpha$ is fixed at 0.5 and $\lambda$ ranges from 1 to 3 with an increment of 0.5. For the Negative-Binomial model, the means of the Beta thinning and Gamma arrivals distributions have the same magnitudes as those in the Poisson cases.
The CUSUM test is seen to be undersized when $T$ is small but as $T$ gets large the size approaches the required 5%. In contrast, the $\sup S(\pi)$ test for the Poisson model has good size for all values of $T$ as seen in Table 3.
6.2. Power
Extensive power results are available from the authors on request but the overall picture is as follows. Generally, the power of the CUSUM test is higher in the Poisson case than it is in the Negative-Binomial model. When either the thinning or innovation parameters change, the power surfaces are symmetric around $\Delta\alpha = \Delta\lambda = 0$ and the power approaches 1 as the absolute value of the change increases. As expected, the power is maximised near the break fraction Fr = 50% and tails off as the break fraction approaches the extremes of $\Pi$; again see Olmo and Pouliot (2011). The CUSUM test has significantly more power against changes in $\lambda$ than against changes in $\alpha$. When both $\alpha$ and $\lambda$ change simultaneously in opposite directions the power may drop significantly. This is because an increase in the number of arrivals may be offset by a decrease in the thinning parameter resulting in fewer survivors, thus leaving the observed values little changed.
In contrast to the CUSUM statistic, the power of the $\sup S(\pi)$ test is much less sensitive to the break fraction being at the extremes of $\Pi$. It is also less sensitive to possible offsetting effects when $\alpha$ and $\lambda$ change simultaneously. In comparing the CUSUM and $\sup S(\pi)$ tests, for changes in $\alpha$ only, the $\sup S(\pi)$ test is clearly superior while for changes in $\lambda$ only, the CUSUM test is competitive except at the extremes of $\Pi$. When both $\alpha$ and $\lambda$ change the CUSUM test can be more powerful except in those regions where offsetting may occur. Further results on power can be found in Han (2011).
7. APPLICATION TO MOTOR VEHICLE THEFT DATA
The data analysed in this section consists of $T = 144$ monthly counts of motor vehicle theft in district 101 of Pittsburgh, PA, recorded from January 1990 to December 2001. This dataset, denoted by MVTHEFT hereafter, is originally from the Crime Data section of the Forecasting Principles website (see http://www.forecastingprinciples.com). Prior to testing for a structural break in Section 7.1, we plot an overall picture of the data in Figure 1 and present descriptive statistics in Table 4. The data is clearly overdispersed but the
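Table 2. Sizes of the two-sided CUSUM test for the INAR(1)-P and INAR(1)-NB models (5000 replications)

INAR(1)-P model, λ = 1, varying α

| T | α = 0.1 | α = 0.3 | α = 0.5 | α = 0.7 | α = 0.9 |
|---|---|---|---|---|---|
| 100 | 0.0292 | 0.0306 | 0.0254 | 0.0160 | 0.0242 |
| 200 | 0.0376 | 0.0362 | 0.0340 | 0.0288 | 0.0170 |
| 300 | 0.0432 | 0.0422 | 0.0422 | 0.0372 | 0.0266 |
| 500 | 0.0422 | 0.0438 | 0.0396 | 0.0368 | 0.0280 |
| 1000 | 0.0474 | 0.0501 | 0.0468 | 0.0476 | 0.0410 |

INAR(1)-P model, α = 0.5, varying λ

| T | λ = 1 | λ = 1.5 | λ = 2 | λ = 2.5 | λ = 3 |
|---|---|---|---|---|---|
| 100 | 0.0232 | 0.0256 | 0.0260 | 0.0274 | 0.0284 |
| 200 | 0.0352 | 0.0314 | 0.0340 | 0.0350 | 0.0352 |
| 300 | 0.0392 | 0.0370 | 0.0388 | 0.0338 | 0.0404 |
| 500 | 0.0414 | 0.0398 | 0.0386 | 0.0412 | 0.0448 |
| 1000 | 0.0456 | 0.0420 | 0.0422 | 0.0432 | 0.0440 |

INAR(1)-NB model, E[λ] = 1, varying E[α]

| T | E[α] = 0.1 | E[α] = 0.3 | E[α] = 0.5 | E[α] = 0.7 | E[α] = 0.9 |
|---|---|---|---|---|---|
| 100 | 0.0240 | 0.0290 | 0.0216 | 0.0194 | 0.0238 |
| 200 | 0.0362 | 0.0360 | 0.0314 | 0.0348 | 0.0222 |
| 300 | 0.0402 | 0.0362 | 0.0398 | 0.0312 | 0.0286 |
| 500 | 0.0424 | 0.0454 | 0.0382 | 0.0370 | 0.0296 |
| 1000 | 0.0462 | 0.0406 | 0.0438 | 0.0436 | 0.0410 |

INAR(1)-NB model, E[α] = 0.5, varying E[λ]

| T | E[λ] = 1 | E[λ] = 1.5 | E[λ] = 2 | E[λ] = 2.5 | E[λ] = 3 |
|---|---|---|---|---|---|
| 100 | 0.0242 | 0.0246 | 0.0244 | 0.0286 | 0.0218 |
| 200 | 0.0336 | 0.0348 | 0.0356 | 0.0366 | 0.0332 |
| 300 | 0.0392 | 0.0342 | 0.0408 | 0.0376 | 0.0362 |
| 500 | 0.0390 | 0.0388 | 0.0410 | 0.0398 | 0.0416 |
| 1000 | 0.0478 | 0.0448 | 0.0418 | 0.0416 | 0.0442 |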
figure also illustrates that after the first 40 or so observations there seems to be no strong evidence of overdispersion. There is also a hint of some seasonality. Further, the sample ACF, plotted in Figure 2, exhibits strong evidence of dependence. Because of the overdispersion, the INAR-P sup test does not seem to be appropriate and the CUSUM approach is used. In Section 7.1, both INAR(1)-P and INAR(1)-NB models are fitted to the post break data only and model selection is implemented via the Information Matrix (IM) test. Section 7.2 investigates the difference in forecasting performance between a naive use of the full data set and using the post break segment only. It reveals that break testing is important since the two sets of forecasts are quite different.
7.1. Test and estimation
Here, we apply the CUSUM procedure to test for the presence of a structural break and to identify the break position. The first task is to select the order of the CLAR model to be fitted. Since there is the possibility of a break in the data, it is hard to identify the order from the sample ACF or partial autocorrelation function (PACF). Thus, a first order CLAR(1) model is fitted but the CUSUM procedure is modified by replacing the usual studentisation by a long run variance (LRV) estimator with the lag length chosen by $q = 12(T/100)^{1/2}$. We computed the test for the Parzen, Bartlett, QS and Truncated kernels and all agree that the estimated break point is an auspicious $T_B = 42$. In this example, the lag length evaluates to $q = 14$ and we also recomputed the tests for a sequence ranging from $q = 10$ to $q = 18$ to assess the robustness of the original choice. The estimated break point remained at $T_B = 42$.
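The lag length rule and the long run variance correction can be illustrated as follows. With $T = 144$ the rule gives $12 \times (1.44)^{1/2} = 14.4$, and the sketch assumes this is truncated to 14, which matches the value reported above. The LRV function shows the Bartlett kernel only, in its usual textbook form; the paper's exact implementation is not given, so the function names and weighting convention here are assumptions.

```python
import math


def lag_length(T):
    """q = 12 (T/100)^(1/2), truncated to an integer (truncation is an assumption)."""
    return int(12 * math.sqrt(T / 100))


def bartlett_lrv(residuals, q):
    """Long run variance estimate with Bartlett weights 1 - k/(q+1), k = 1..q."""
    T = len(residuals)
    r_bar = sum(residuals) / T
    d = [r - r_bar for r in residuals]

    def autocov(k):
        # Sample autocovariance of the centred residuals at lag k.
        return sum(d[t] * d[t - k] for t in range(k, T)) / T

    lrv = autocov(0)
    for k in range(1, q + 1):
        lrv += 2.0 * (1.0 - k / (q + 1)) * autocov(k)
    return lrv
```

In the modified CUSUM statistic, the square root of this estimate would take the place of $s_r$ in the studentisation; with `q = 0` the estimate reduces to $s_r^2$.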
Although the MVTHEFT data is not necessarily considered to be the sum of thinning and innovation processes, short term dependence among crime counts may nevertheless be modelled by an INAR model and since the data appear to exhibit a structural break at $T_B = 42$, only the 102 post break observations are used. The descriptive statistics for the post break data are given in Table 5 and the degree of overdispersion is much reduced.
The post break sample PACF is plotted in Figure 3 and it appears that a model with a single lag is sufficient. The parameters of the INAR(1)-P model are estimated by CLS and these values are themselves used as starting values for the MLEs. An INAR(1)-NB model is also fitted by maximum likelihood. The estimation results for both models are provided in Table 6, where it can be seen that the
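Table 3. Sizes of the maximum score test for the INAR(1)-P model (1000 replications)

INAR(1)-P model, λ = 1, varying α

| T | α = 0.1 | α = 0.3 | α = 0.5 | α = 0.7 | α = 0.9 |
|---|---|---|---|---|---|
| 100 | 0.0516 | 0.0302 | 0.0269 | 0.0394 | 0.1818 |
| 200 | 0.0672 | 0.0473 | 0.0365 | 0.0378 | 0.0841 |
| 300 | 0.0547 | 0.0471 | 0.0398 | 0.0521 | 0.0671 |
| 500 | 0.0521 | 0.0508 | 0.0470 | 0.0431 | 0.0442 |
| 1000 | 0.0582 | 0.0490 | 0.0532 | 0.0531 | 0.0471 |

INAR(1)-P model, α = 0.5, varying λ

| T | λ = 1 | λ = 1.5 | λ = 2 | λ = 2.5 | λ = 3 |
|---|---|---|---|---|---|
| 100 | 0.0721 | 0.0697 | 0.0539 | 0.0536 | 0.0463 |
| 200 | 0.0682 | 0.0564 | 0.0433 | 0.0459 | 0.0458 |
| 300 | 0.0589 | 0.0480 | 0.0499 | 0.0629 | 0.0480 |
| 500 | 0.0502 | 0.0514 | 0.0579 | 0.0482 | 0.0443 |
| 1000 | 0.0518 | 0.0549 | 0.0537 | 0.0586 | 0.0531 |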
Figure 1. Time series plots of MVTHEFT Data
Table 4. Descriptive statistics of MVTHEFT data

| Maximum | Minimum | Mean | Median | Mode | Variance |
|---|---|---|---|---|---|
| 17 | 0 | 4.3333 | 3 | 2 | 14.4895 |
estimated expectations of the thinning, $\phi$, and innovation, $\mu$, processes in the INAR(1)-NB model are very close to those estimates obtained by CLS and ML when the INAR(1)-P is fitted. Thus both models appear to be reasonable representations for the post break data.
The choice between the two models is made by means of the Information Matrix (IM) specification test as suggested in Freeland and McCabe (2004). The results for the IM specification test for the INAR(1)-P and INAR(1)-NB models are given in Table 7 and the Negative-Binomial is rejected in favour of the Poisson specification as the p-value of the component of the IM test associated with $\beta$ is 0.0000.
Figure 2. Correlogram of MVTHEFT Data
Table 5. Descriptive statistics of post break MVTHEFT data

| Maximum | Minimum | Mean | Median | Mode | Variance |
|---|---|---|---|---|---|
| 9 | 0 | 2.7647 | 2 | 2 | 3.8054 |
Figure 3. The Correlograms of Post Break Data of MVTHEFT
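Table 6. Estimates for the post break MVTHEFT data

| Model | Method | Parameter | Estimate |
|---|---|---|---|
| INAR(1)-P | CLSE | α | 0.3052 |
| INAR(1)-P | CLSE | λ | 1.8569 |
| INAR(1)-P | MLE | α | 0.2496 |
| INAR(1)-P | MLE | λ | 2.0111 |
| INAR(1)-NB | MLE | m | 3.0414 |
| INAR(1)-NB | MLE | n | 7.9016 |
| INAR(1)-NB | MLE | μ | 8.2466 |
| INAR(1)-NB | MLE | β | 0.1897 |
| INAR(1)-NB | MLE | φ (expected thinning) | 0.2779 |
| INAR(1)-NB | MLE | μ (expected innovation) | 1.9306 |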
7.2. Probabilistic forecasting
Drawing on the results of the previous section, the post break MVTHEFT data is modelled by an INAR(1)-P process and is used to forecast. However, it is interesting to compare the performance of forecasting using the post break data with that of a naive forecaster who, while catering for overdispersion by fitting an INAR(1)-NB, unwittingly ignores the possibility of a structural break. Table 8 gives the h-step-ahead conditional mean, median and mode forecasts for the INAR(1)-P post break and the INAR(1)-NB full data models. Both the mean and median forecasts are quite different but the modal forecasts are exactly the same for both models. This latter finding is, however, potentially very misleading. For example, the steady state (6-step ahead) forecast for the probability of observing the modal value 2, $f^{(6)}_2$, differs substantially depending on whether the INAR-P post break or the full INAR-NB model is used; that is, $f^{(6)}_2$ is calculated to be 0.2462 and 0.1549 for the post break and full data sets respectively, so the post break estimate is almost 60% larger (a ratio of about 1.59). In addition, when 95% confidence intervals are constructed for these estimated conditional probabilities, we obtain (0.2200, 0.2724) and (0.1249, 0.1850) respectively for the post break and full data sets. These intervals do not overlap and hence provide very different predictions about $f^{(6)}_2$. Also, the allocation of probability across the support by the two models is quite different, with the Negative-Binomial allocating a lot more weight to the right hand tail and thus overestimating the probabilities of larger numbers of thefts. For example, in steady state, the conditional probability that the number of thefts exceeds 4 is 0.1337 for the post break data while it is 0.3218 for the full data set.
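The INAR(1)-P forecast probabilities discussed above can be sketched directly. For a Poisson INAR(1) with binomial thinning, the h-step-ahead distribution given X_T is the convolution of a Binomial(X_T, α^h) survivor count and a Poisson arrival count with mean λ(1 - α^h)/(1 - α). The function name and the truncation point `max_count` are assumptions made for illustration; the parameter values are the ML estimates from Table 6.

```python
from math import comb, exp, factorial


def inar1_poisson_forecast(x_T, h, alpha, lam, max_count=30):
    """pmf of X_{T+h} given X_T = x_T for a Poisson INAR(1), on 0..max_count."""
    a_h = alpha ** h                              # h-step survival probability
    lam_h = lam * (1 - a_h) / (1 - alpha)         # mean of accumulated arrivals

    def survivors(k):                             # Binomial(x_T, a_h) pmf
        return comb(x_T, k) * a_h ** k * (1 - a_h) ** (x_T - k)

    def arrivals(j):                              # Poisson(lam_h) pmf
        return exp(-lam_h) * lam_h ** j / factorial(j)

    return [
        sum(survivors(k) * arrivals(x - k) for k in range(min(x, x_T) + 1))
        for x in range(max_count + 1)
    ]


# One-step and six-step forecasts given X_T = 2, using the Table 6 MLEs.
pmf_1 = inar1_poisson_forecast(x_T=2, h=1, alpha=0.2496, lam=2.0111)
pmf_6 = inar1_poisson_forecast(x_T=2, h=6, alpha=0.2496, lam=2.0111)
print([round(p, 4) for p in pmf_1[:4]])
print(round(pmf_6[2], 4))
```

Because the estimates are only quoted to four decimals, the output should be expected to agree with the INAR(1)-P rows of Table 8 only up to small rounding differences.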
8. CONCLUSIONS
This paper makes the following contributions. First, we develop a two-sided CUSUM test that may be used quite generally for nonlinear and non-Gaussian models as long as they have the CAR structure and satisfy some regularity conditions. The asymptotic distribution of the CUSUM test for the CAR class is shown to be that of the supremum of the absolute value of a Brownian Bridge.
Table 7. Estimations of post break MVTHEFT data

               INAR(1)-P                   INAR(1)-NB
              α        λ         m        n        l        b
 IM Test    0.9062   1.5964    0.0057   0.0000   0.1350   27.1035
 P-value    0.3411   0.2064    0.9393   0.9989   0.7132    0.0000
Table 8. Mean, median, mode and probability forecasts for post break (INAR(1)-P) and full data (INAR(1)-NB) models

h-step-ahead forecasts given X_T = 2

 Quantity    Model         h = 1    h = 2    h = 3    h = 4    h = 5    h = 6
 Mean        INAR(1)-P     2.5098   2.6366   2.6682   2.6761   2.6781   2.6786
             INAR(1)-NB    2.9077   3.4058   3.6792   3.8292   3.9115   3.9567
 Median      INAR(1)-P     2        2        2        3        3        3
             INAR(1)-NB    3        3        3        3        3        3
 Mode        INAR(1)-P     2        2        2        2        2        2
             INAR(1)-NB    2        2        2        2        2        2
 ph(0|2)     INAR(1)-P     0.0753   0.0712   0.0692   0.0687   0.0686   0.0685
             INAR(1)-NB    0.0817   0.0881   0.0843   0.0811   0.0792   0.0781
 ph(1|2)     INAR(1)-P     0.2017   0.1884   0.1849   0.1840   0.1838   0.1837
             INAR(1)-NB    0.1661   0.1546   0.1447   0.1392   0.1362   0.1346
 ph(2|2)     INAR(1)-P     0.2615   0.2490   0.2468   0.2463   0.2462   0.2462
             INAR(1)-NB    0.2459   0.1793   0.1649   0.1592   0.1564   0.1549
 ph(3|2)     INAR(1)-P     0.2203   0.2191   0.2197   0.2198   0.2199   0.2199
             INAR(1)-NB    0.1932   0.1654   0.1552   0.1512   0.1493   0.1483
 ph(4|2)     INAR(1)-P     0.1361   0.1444   0.1466   0.1471   0.1473   0.1473
             INAR(1)-NB    0.1285   0.1329   0.1301   0.1286   0.1279   0.1275
 ph(5|2)     INAR(1)-P     0.0661   0.0760   0.0782   0.0788   0.0789   0.0789
             INAR(1)-NB    0.0791   0.0976   0.1007   0.1015   0.1018   0.1020
 ph(6|2)     INAR(1)-P     0.0263   0.0333   0.0348   0.0351   0.0352   0.0352
             INAR(1)-NB    0.0464   0.0672   0.0736   0.0759   0.0769   0.0774
 ph(7|2)     INAR(1)-P     0.0089   0.0125   0.0132   0.0134   0.0134   0.0135
             INAR(1)-NB    0.0264   0.0442   0.0514   0.0544   0.0558   0.0565
 ph(8|2)     INAR(1)-P     0.0026   0.0041   0.0044   0.0045   0.0045   0.0045
             INAR(1)-NB    0.0147   0.0280   0.0346   0.0377   0.0392   0.0399
 ph(9|2)     INAR(1)-P     0.0007   0.0112   0.0013   0.0013   0.0013   0.0013
             INAR(1)-NB    0.0081   0.0173   0.0227   0.0254   0.0268   0.0275
 ph(10|2)    INAR(1)-P     0.0002   0.0003   0.0003   0.0003   0.0003   0.0003
             INAR(1)-NB    0.0043   0.0104   0.0145   0.0167   0.0179   0.0185
Second, we suggest use of the maximum score statistic for testing parameter constancy in the general INAR process and tabulate estimated critical values for the equi-dispersed Poisson case. Many models in the INAR class cannot be tested by sup-test techniques due to the difficulties in finding critical values, whereas the CUSUM statistic allows tests for the stability of the conditional mean of the model to be conducted. Third, we carry out Monte Carlo experiments to evaluate the performance of the CUSUM and maximum score tests when the DGP is INAR-P or INAR-NB. The CUSUM test appears to be undersized when the sample size is small, but its size approaches the nominal 5% for large T. In contrast, the maximum score test has good size for all sample sizes. Both tests are found to be consistent and the largest power occurs when the break fraction is 50%. In the Poisson model we find that the sup S(p) test is the more competitive in most cases, provided the complete specification of the model is known. Finally, an empirical example shows that neglecting a structural break may lead to biased probabilistic forecasts for counts.
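For readers who want to experiment with the kind of data generating process used in the Monte Carlo study, the following is a minimal sketch of a Poisson INAR(1) simulator with an optional break in the innovation mean. It is not the authors' simulation design: the function names, the break in λ only, the default break fraction and the seed handling are all assumptions made for illustration.

```python
import random
from math import exp


def poisson_draw(rng, lam):
    """Poisson variate by Knuth's multiplication method (adequate for small lam)."""
    limit, count, prod = exp(-lam), 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count


def simulate_inar1_poisson(T, alpha, lam_pre, lam_post=None,
                           break_fraction=0.5, seed=0):
    """Simulate X_t = alpha o X_{t-1} + e_t with e_t ~ Poisson(lam).

    The innovation mean switches from lam_pre to lam_post after
    int(break_fraction * T) observations (hypothetical break design).
    """
    rng = random.Random(seed)
    lam_post = lam_pre if lam_post is None else lam_post
    break_point = int(break_fraction * T)
    # Start from the stationary Poisson(lam / (1 - alpha)) marginal.
    x = poisson_draw(rng, lam_pre / (1 - alpha))
    path = []
    for t in range(T):
        lam = lam_pre if t < break_point else lam_post
        survivors = sum(rng.random() < alpha for _ in range(x))  # binomial thinning
        x = survivors + poisson_draw(rng, lam)
        path.append(x)
    return path


# Example: a mid-sample break in the innovation mean.
path = simulate_inar1_poisson(T=400, alpha=0.5, lam_pre=1.0, lam_post=5.0, seed=1)
print(sum(path[:200]) / 200, sum(path[200:]) / 200)
```

Repeating such draws with and without a break, and applying a test to each simulated path, gives empirical size and power figures of the type summarised above.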
APPENDIX
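PROOF OF THEOREM 1. Define the terms

$$ h_t = \frac{\partial g(\theta; F_{t-1})}{\partial \theta}, \qquad g_t(\hat\theta) = g(\hat\theta; F_{t-1}). $$

Use a Taylor series about the true parameter $\theta_0$ to obtain

$$
\begin{aligned}
\hat r_t &= Y_t - g_t(\hat\theta) \\
&= Y_t - \left\{ g_t(\theta_0) + (\hat\theta - \theta_0)' h_t + R_t \right\} \\
&= r_t - (\hat\theta - \theta_0)' h_t - R_t
\end{aligned}
$$

where the remainder is of the form

$$ R_t = \tfrac{1}{2} (\hat\theta - \theta_0)' \, \dot h_t(\theta^*) \, (\hat\theta - \theta_0), $$

$\dot h_t$ is the derivative of $h_t$ and $\theta^*$ is an intermediate point between $\hat\theta$ and $\theta_0$. Next, in an obvious notation, construct the mean

$$ \bar{\hat r} = \bar r - (\hat\theta - \theta_0)' \bar h - \bar R $$

giving

$$ \hat r_t - \bar{\hat r} = (r_t - \bar r) - (\hat\theta - \theta_0)' (h_t - \bar h) - (R_t - \bar R). $$

For $s \in [0,1]$ write

$$
T^{-1/2} \sum_{t=1}^{\lfloor sT \rfloor} (\hat r_t - \bar{\hat r})
= T^{-1/2} \sum_{t=1}^{\lfloor sT \rfloor} (r_t - \bar r)
- T^{1/2} (\hat\theta - \theta_0)' \, T^{-1} \sum_{t=1}^{\lfloor sT \rfloor} (h_t - \bar h)
- T^{-1/2} \sum_{t=1}^{\lfloor sT \rfloor} (R_t - \bar R)
\tag{21}
$$

where

$$
T^{-1/2} \sum_{t=1}^{\lfloor sT \rfloor} (R_t - \bar R)
= \tfrac{1}{2} \, T^{1/2} (\hat\theta - \theta_0)' \; T^{-3/2} \sum_{t=1}^{\lfloor sT \rfloor} \left[ \dot h_t(\theta^*) - \bar{\dot h}(\theta^*) \right] \; T^{1/2} (\hat\theta - \theta_0).
$$

Defining $S_T(s) = T^{-1/2} \sigma_r^{-1} \sum_{t=1}^{\lfloor sT \rfloor} r_t$ with $\sigma_r^2$ the variance of $r_t$, write

$$
T^{-1/2} \sigma_r^{-1} \sum_{t=1}^{\lfloor sT \rfloor} (\hat r_t - \bar{\hat r})
= \left[ S_T(s) - s S_T(1) \right]
- T^{1/2} (\hat\theta - \theta_0)' \, T^{-1} \sigma_r^{-1} \sum_{t=1}^{\lfloor sT \rfloor} (h_t - \bar h)
- T^{-1/2} \sigma_r^{-1} \sum_{t=1}^{\lfloor sT \rfloor} (R_t - \bar R).
\tag{22}
$$

By Thm 3.2 of KN or TJ, it follows that $T^{1/2} (\hat\theta - \theta_0)$ is asymptotically normal and hence is $O_P(1)$. Taking deviations from the true means, $\mu_h$, we see that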
$$ (h_t - \bar h) = (h_t - \mu_h) - (\bar h - \mu_h) \tag{23} $$

and

$$ (\bar h - \mu_h) \to_p 0 \tag{24} $$

by stationarity and ergodicity. Consider the ith element of $h_t$, $h_{i,t}$; this sequence is also stationary and ergodic. Using a maximal inequality (see Cor 3 of Maxwell and Woodroofe (2000), for example) for each lag, i, and any $b > 1$, there exists a constant K such that

$$
P\left[ \sup_{s \in [0,1]} \left| T^{-1} \sum_{t=1}^{\lfloor sT \rfloor} (h_{i,t} - \mu_h) \right| > \epsilon \right]
= P\left[ \max_{j=1,\dots,T} \left| \sum_{t=1}^{j} (h_{i,t} - \mu_h) \right| > T\epsilon \right]
\le \frac{K T^{b}}{T^{2} \epsilon^{2}} \to 0
$$

by stationarity and finite second moments. Thus,

$$ T^{-1} \sup_{s \in [0,1]} \left| \sum_{t=1}^{\lfloor sT \rfloor} (h_t - \bar h) \right| \to_p 0 \tag{25} $$

and hence

$$ T^{1/2} (\hat\theta - \theta_0)' \, T^{-1} \sum_{t=1}^{\lfloor sT \rfloor} (h_t - \bar h) \to_p 0. $$

The remainder term, $T^{-3/2} \sum_{t=1}^{\lfloor sT \rfloor} \left[ \dot h_t(\theta^*) - \bar{\dot h}(\theta^*) \right]$, is dealt with in the same way as (23), (24) and (25) using H2ij of Assumption KN (2) to uniformly bound the elements of the matrix $\dot h_t(\theta^*)$ over $\theta^*$. Consequently, the remainder term in (22) is $o_p(1)$ and the asymptotic distribution of the partial sum process of the $\hat r_t - \bar{\hat r}$ is the same as that of $[S_T(s) - s S_T(1)]$.

By Assumption KN (1) and (3), $Y_t$ is a stationary ergodic process and hence $r_t$ is a stationary and ergodic martingale difference sequence with finite 4th moments. Thus, a martingale functional CLT applies (see Hall and Heyde (1980)) and it follows that the partial sum converges weakly, i.e.

$$ S_T(s) = T^{-1/2} \sigma_r^{-1} \sum_{t=1}^{\lfloor sT \rfloor} r_t \Rightarrow W(s) $$

where $W(s)$ is a Brownian motion process. The continuous mapping theorem shows that

$$ \max_{j=1,\dots,T} \left| T^{-1/2} \sigma_r^{-1} \sum_{t=1}^{j} (\hat r_t - \bar{\hat r}) \right| \Rightarrow \sup_{s} |B(s)| $$

where $B(s) = W(s) - s W(1)$, i.e. a Brownian Bridge. By similar sorts of argument to those above and using ergodicity again,

$$ s_r^2 = T^{-1} \sum_{t=1}^{T} (\hat r_t - \bar{\hat r})^2 \to_p \sigma_r^2 $$

and hence by the continuous mapping theorem it is seen that

$$ \max_{j=1,\dots,T} \left| T^{-1/2} s_r^{-1} \sum_{t=1}^{j} (\hat r_t - \bar{\hat r}) \right| \Rightarrow \sup_{s \in [0,1]} |B(s)|. $$
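The limiting result above translates into a short computation. The sketch below forms the statistic $\max_j | T^{-1/2} s_r^{-1} \sum_{t \le j} (\hat r_t - \bar{\hat r}) |$ from a list of residuals and converts it to an asymptotic p-value using the standard series for the supremum of the absolute value of a Brownian Bridge, $P(\sup_s |B(s)| > x) = 2 \sum_{k \ge 1} (-1)^{k-1} e^{-2k^2x^2}$. The function names and the truncation of the series at 100 terms are assumptions made for illustration.

```python
from math import exp, sqrt


def cusum_statistic(residuals):
    """max_j |T^{-1/2} s_r^{-1} sum_{t<=j} (r_t - rbar)| for a list of residuals."""
    T = len(residuals)
    rbar = sum(residuals) / T
    centred = [r - rbar for r in residuals]
    s_r = sqrt(sum(c * c for c in centred) / T)   # divisor T, as in the proof
    partial, peak = 0.0, 0.0
    for c in centred:
        partial += c
        peak = max(peak, abs(partial))
    return peak / (s_r * sqrt(T))


def sup_abs_bridge_pvalue(x, terms=100):
    """Asymptotic P(sup |B(s)| > x), truncated alternating series."""
    if x <= 0:
        return 1.0
    total = 2.0 * sum((-1) ** (k - 1) * exp(-2.0 * k * k * x * x)
                      for k in range(1, terms + 1))
    return min(1.0, max(0.0, total))


# A stylised residual series with a level shift half way through.
shifted = [-1.0] * 50 + [1.0] * 50
stat = cusum_statistic(shifted)
print(stat, sup_abs_bridge_pvalue(stat))
```

At the 5% level the statistic is compared with a critical value of about 1.358, the point at which the series above is approximately 0.05.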
PROOF OF THEOREM 2. Following the proof of Theorem 1 we need to rederive (21) under nonstationarity. Proceeding along the lines of TJ we obtain

$$
\begin{aligned}
\sigma^{-1} \sum_{t=1}^{\lfloor sT \rfloor} (\hat r_t - \bar{\hat r})
&= \sigma^{-1} \sum_{t=1}^{\lfloor sT \rfloor} (r_t - \bar r) && (26) \\
&\quad - \sigma^{-1} \left[ \Sigma_u^{-1/2} H (\hat\theta - \theta_0) \right]' \Sigma_u^{1/2} H^{-1} \Sigma_h^{1/2} \cdot \Sigma_h^{-1/2} \sum_{t=1}^{\lfloor sT \rfloor} (h_t - \bar h) && (27) \\
&\quad - \sigma^{-1} \sum_{t=1}^{\lfloor sT \rfloor} (R_t - \bar R) && (28)
\end{aligned}
$$

where $H = \sum_{t=1}^{T} h_t h_t'$ and the remainder term is standardised as in eqn (27). By Thms 6.1 and 6.2 of TJ, there exists a consistent sequence $\hat\theta$ such that $\Sigma_u^{-1/2} H (\hat\theta - \theta_0)$ is $O_P(1)$, and under eqns (18) and (19) $\Sigma_u^{1/2} H^{-1} \Sigma_h^{1/2} \cdot \Sigma_h^{-1/2} \sum_{t=1}^{\lfloor sT \rfloor} (h_t - \bar h)$ is also bounded in probability. Thus, the presence of $\sigma^{-1}$ in (27) and bounding the remainder term (28) as before ensures

$$ \sigma^{-1} \sum_{t=1}^{\lfloor sT \rfloor} (\hat r_t - \bar{\hat r}) = \sigma^{-1} \sum_{t=1}^{\lfloor sT \rfloor} (r_t - \bar r) + o_P(1). $$

Theorem 6.2 of TJ effectively proves a martingale CLT for $r_t$, and so the conditions of TJ along with (20) ensure that $\sigma^{-1} \sum_{t=1}^{\lfloor sT \rfloor} r_t$ satisfies a FCLT (Thm 27.14 of Davidson (1994), for example). In practice, $\sigma$ needs to be replaced by an estimated quantity and, arguing along the same lines as before, we may deduce that

$$ \left[ \sum_{t=1}^{T} (\hat r_t - \bar{\hat r})^2 \right]^{-1/2} \max_{j=1,\dots,T} \left| \sum_{t=1}^{j} (\hat r_t - \bar{\hat r}) \right| \Rightarrow \sup_{s \in [0,1]} |B(s)|. $$
Acknowledgements
We are grateful to the referee for constructive comments on an earlier draft of this paper.
NOTES
1. In CLAR applications, it is straightforward to take account of covariates by regressing Yt on its own lags and the covariates and then computing residuals to which the CUSUM test is applied. We fitted some seasonal dummies but found there was no substantial difference in the overall conclusions of the study.
REFERENCES
Al-Osh, M. A. and Alzaid, A. A. (1987) First-order integer valued autoregressive (INAR(1)) process. Journal of Time Series Analysis 8, 261–75.
Andrews, D. W. K. (1993) Tests for parameter instability and structural change with unknown change point. Econometrica 61, 821–56. (Corrigendum, 71, 395–397.)
Brown, R. L., Durbin, J. and Evans, J. M. (1975) Techniques for testing the constancy of regression relationships over time. Journal of the Royal Statistical Society B 37, 149–63.
Cox, D. R. (1981) Statistical analysis of time series: some recent developments. Scandinavian Journal of Statistics 8, 93–115.
Davidson, J. (1994) Stochastic Limit Theory. Oxford: OUP.
Drost, F. C., Van den Akker, R. and Werker, B. J. M. (2009) Efficient estimation of autoregression parameters and innovation distributions for semiparametric integer-valued AR(p) models. Journal of the Royal Statistical Society B 71, 467–85.
Engle, R. F. and Russell, J. R. (1998) Autoregressive conditional duration: a new approach for irregularly spaced transaction data. Econometrica 66, 987–1007.
Freeland, R. K. and McCabe, B. P. M. (2004) Forecasting discrete valued low count time series. International Journal of Forecasting 20(3), 427–34.
Granger, C. W. J. and Andersen, A. P. (1978) An Introduction to Bilinear Time Series Models. Göttingen: Vandenhoeck and Ruprecht.
Grunwald, G. K., Hyndman, R. J., Tedesco, L. and Tweedie, R. L. (2000) Non-Gaussian conditional linear AR(1) models. Australian & New Zealand Journal of Statistics 42(4), 479–95.
Hall, P. and Heyde, C. C. (1980) Martingale Limit Theory and Its Applications. New York: Academic Press.
Han, L. (2011) Statistical Analysis of Structural Breaks in Discrete Valued Time Series Processes. Ph.D. Thesis, University of Liverpool, UK.
Heinen, A. and Rengifo, E. (2007) Multivariate autoregressive modeling of time series count data using copulas. Journal of Empirical Finance 14, 564–83.
Klimko, L. A. and Nelson, P. I. (1978) On conditional least squares estimation for stochastic process. The Annals of Statistics 6, 629–42.
Maxwell, M. and Woodroofe, M. (2000) Central limit theorems for additive functionals of Markov chains. The Annals of Probability 28(2), 713–24.
McCabe, B. P. M., Martin, G. M. and Harris, D. (2011) Efficient probabilistic forecasts for counts. Journal of the Royal Statistical Society B 73, 253–72.
McKenzie, E. (1985) Some simple models for discrete variate time series. Water Resources Bulletin 21, 645–50.
Olmo, J. and Pouliot, W. (2011) Early detection techniques for market risk failure. Studies in Nonlinear Dynamics & Econometrics 15(3), 1–54.
Perron, P. (2007) Dealing with structural breaks. In Palgrave Handbook of Econometrics, Vol. 1: Econometric Theory (eds T. C. Mills and K. Patterson). Palgrave: Macmillan, pp. 7–20.
Pitman, J. and Yor, M. (1999) Path decompositions of a Brownian Bridge related to the ratio of its maximum and amplitude. Studia Scientiarum Mathematicarum Hungarica 35, 457–74.
Serfling, R. J. (1980) Approximation Theorems of Mathematical Statistics. NY: John Wiley.
Tjostheim, D. (1986) Estimation in nonlinear time series models. Stochastic Processes and their Applications 21, 251–73.