TAR model

Embed Size (px)

Citation preview

  • 8/12/2019 TAR model

    1/11

  • 8/12/2019 TAR model

    2/11

    Testing and Modeling ThresholdAutoregressive ProcessesRUEY S. TSAY

    Th e threshold autoregressive model is one of the nonlinear time series models available in the litera ture. I t was first proposedby Tong (1978) and discussed in detail by Tong and Lim (1980) and Tong (1983). The major features of this class of modelsare limit cycles, amplitude depea dent frequencies, an d jump phen omen a. Much of the original motivation q f the model isconcerned with limit cycles of a cyclical time series, and indeed the model is capable of producing asymmetric limit cycles. Thethreshold autoregressive model, ho wever, has not received m uch attention in application. This is due to (a) the lack of a suitablemodeling procedure an d (b) the inability to identify the threshold variable and estimate the threshold values. Th e primary goalof this article, therefore, is to suggest a simple yet widely applicable model-building procedure for threshold autoregressivemodels. Based on some predictive residuals, a simple statistic is proposed to test for threshold nonlinearity and specify thethreshold variable. Some supplementary graphic devices are then used to identify the number and locations of the potentialthresholds. Finally, these statistics are used to build a threshold model. The test statistic and its properties are derived bysimple linear regression. Its performance in the finite-sample case is evaluated by simulation and real-world data analysis. Thestatistic performs well as compared with an alternative test available in the literature. Fu rther applications of threshold auto-regressive models are also suggested, including handling heterogeneous time series ah d modeling rand om processes with periodicvariances whose periodicity is not fixed. T he latter phen omen on is commonly encoun tered in practice, especially in econometricsand biological sciences.KEY WORDS: Arranged autoregression; Nonlinear time series; Nonlinearity test; Predictive residual; Sunspot.

    1 INTRODUCTION Nevertheless, th e TAR model has not been widelv usedA ti me series Y, is a self-exciting threshold auto regr es-sive (TAR ) process if it follows the m odel

    r j - ~ Y,-d < r,, (1)where j = 1, k and d is a positive integer. Thethresholds are - m = r < r1 < < rk = w ; for each j,{aj )) is a seque nce of m artingale difference s satisfying

    sup E(laji)ld F,-l) < w a.s. for some 6 > 2, (2)with F,-I the ield generated by i = 1 , 2,j = 1 , k). Such a process partitions t he one-dim en-sional Euclidean space into k regimes and follows a linearAR model in each regime. The overall process Y s non-linear when there are at least two regimes with differentlinear models. This nonlinear time series model was pro-posed by T ong (1978, 1983) and T ong and Lim (1980) asan alternative model for describing periodic time series.The model has certain features, such as limit cycles, am-plitude dependent frequencies, and jump phenomena, thatcannot be captured by a linear time series model. Forinstance, Tong an d Lim (1980) showed that t he thresholdmodel is capable of producing asymmetric, periodic be-havior exhibited in the annua l Wolf s su nspot and Ca na-dian lynx data.

    Ruey S. Tsay is Associate Professor, Depa rtment of Statistics, Car-negie-Mellon U niversity, Pittsburgh, PA 15213. Th e first draft of thisarticle was completed while the author was on sabbatical leave at theGra dua te School of Business, University of Chicago. Th e author wishesto thank H . Tong and J. I. Pefia for useful comments. H e also acknowl-edges the financial support of the Aluminum Company of America Re-search Foundat ion and N at iona l Sc ience Foundat ion Grant DM S-8702854.

    in applications, primarily because (a) it is hard in practiceto identify the threshold variable and estimate the asso-ciated threshold values, and b) ther e is no simple model-ing procedu re available. The procedur e proposed by Tongand Lim (1980) is complex. It involves several computing-intensive stages, and there were no diagnostic statisticsavailable to assess the need for a threshold model for agiven data set. The goal of this article, therefore, is topropose a proce dure for testing threshold nonlinearity andbuilding, if necessary, a TA R m odel. Th e proposed test issimple because it uses only familiar linear regressiontechniques. T he modeling procedu re consists of four steps,each informative. The steps can also be used iterativelywhen the number of regimes k is large or the degree ofnonlinearity is weak.Th e article is organized as follows. In Section 2, I discusssome sampling properties of least squares estimates of aTAR model. L east squares estimates are used throughout.Section 3 deals with testing the threshold nonlinearity. A narranged autoregression provides predictive residuals thatare used in nonlinearity testing and threshold specification.Asymptotic distribution of the proposed test statistic isgiven, and the finite-sample performance is evaluated bysimulation and analysis of several real data sets. Section4 suggests some graphics that are informative in locatingthe values of the thresholds and thus useful in specifyingthe threshold regimes. Section 5 gives the proposed model-ing procedure, and three illustrative examples (includingthe sunspot data and the Canadian lynx series) are givenin Section 6. It is hoped that the article will broaden theuse and stimulate further investigation of th e TAR model.For convenience, 1 refer to Model (1) as a TAR(k; p,d ) mode l, where k is the num ber of regimes separated by989 American Statistical AssociationJournal of the American Statistical AssociationMarch 1989 Vol. 84 No. 405 Theory and Methods

  • 8/12/2019 TAR model

    3/11

    232 Journal of the American Statistical Association March 1989k 1 nontrivial thresholds rj, p denotes the A R orde r,and d is the threshold lag (called the delay parameter byTong). Th e interval r,-I Yt-d < rj is the jth regime ofY,. Note that (a) the A R orde r p may differ from regimeto regime, (b) the TAR model becomes a nonhomoge-neous linear AR model if only the noise variances a =var(a,(j))are different for different regimes, and (c) theTAR model reduces to a random-level shift model if onlythe constan t terms are different for different j. Theselast two features are of special interest in various appli-cations and are related to outliers and model changes inlinear time series analysis. The fact that the TAR modelsencompass these special cases suggests that the modelsmay have further applications beyond nonlinear time se-ries analysis.2 CONSISTENCY OF LEAST SQUARES ESTIMATES

    Since the T AR m odel is a locally linear m odel, ordinaryleast squares techniques are useful in studying the process.I give a brief d iscussion of some usefu l results. For a givenTA R(k ; p , d ) model of ( I) , denote by n, the number ofobserva tions of Y, tha t ar e in th e jth regim e r,-l Yt-d< r j Assume that

    njln -t cj in probability, (3)for all = 1 , . . . k, wh ere n is the total sample size andc, is a positive fraction such that xi=, = 1. Next, foreach regime j, form the ordinary least squares autoregres-sion of ord er p an d de note th e estimate of @ I ) by I 1 ) andthe associated X'X matrix by XIX(j) .Furthermore, as-sume that for each jas n a , where and are the minimum andmaximum eigenvalues of X'X( j) based on sample size n.

    Theorem 2 1 Supp ose that Y, follows the TA R M odel(1) with a, ', n and XI X (j ) atisfying (2)-(4), respectively.Then , for given k, d , and the threshold values r,, the or-dinary least squa res estimates ? converge to @ ) almostsurely.Under Conditions (1) and (3), Y, is a linear autoregres-sion in every regime, with increasing sample size as thetotal sample size n goes to infinity. he or em 2.1 thenfollows directly from the results of Lai and Wei (1982),where the almost sure convergence of least squares esti-

    mates was shown under Conditions (2) and (4) for generallinear stochastic regressions. Condition (4) is gene ral, andis satisfied, for instance, w hen Y, is ergodic.3 A T ST FOR THRESHOLD NONLINEARITY

    In this section I consider testing threshold nonlinearity.The proposed test is related to the portmanteau test ofnonlinearity of Petruccelli and Davies (1986), in that italso is based on arranged autoregression and predictiveresiduals. Nevertheless, the two tests are different in waysin which the special features of the predictive residuals are

    exploited. Roughly speaking, the proposed test is a com-bined version of the nonlinearity tests of Keenan (1985),Tsay (1986), and Petruccelli and ~ a v i e s1986). It is ex-tremely simple an d widely applicable. Its asym ptotic dis-tribution unde r the linear m odel assumption is nothing butthe usual distribution.3 1 Arranged Autoregression and

    Predictive ResidualsWrite an A R (p ) regression with n observations as Y, =(1, Y ,-l, . . . , Yt-P)P + a, for t = p 1 , . . . , n, wherep is the ( p + 1)-dim ensiona l vector of coefficien ts and a,is the noise. I refer to (Y,, 1, Y,-l, . . . Y,-,) as a caseof data for the AR(p) model. Then, an arranged auto-regression is an autoregression with cases rearranged,based on the values of a particular regressor. For th e TARmodel (I ), arran ged autoregression becomes useful if it isarranged acc ording to th e threshold variable. To see this,consider case k = 2. That is, consider the situation of anontrivial threshold rl. For a given TAR(2 ; p , d ) modelwith n observations, the threshold variable Y,-d may as-sume values {Yh, . . . , Yn-d), where h = max{l, p + 1d) . Let xi be the time index of the ith smallest obser-vation of {Yh,. . . We rewrite the model as

    where s satisfies Yns < r1 Yns+,. his is an arrangedautoregression with the first s cases in the first regime an dthe rest in the second regime. It is useful for the TARmodel because it effectively separates the two regimes.Mo re specifically, the arranged autoregression p rovides ameans by which the d ata points a re groupe d so that all ofthe observations in a group follow the same linear ARmodel. Note th at the separa tion does not require knowingthe precise value of rl. Only the number of observationsin each group depends on rl.To illustrate the potential use of arranged autoregres-sion in studying TAR models, I give the motivation of thepropose d test. Consider Mo del (5). If o ne knew the thresh-old value rl, then consistent estimates of the parameterscould easily be obta ined. Since the threshold value is un-known, howeve r, on e must proceed sequentially. The leastsquares estimates 6; ) re consistent for DL1) if there aresufficiently large num bers of observ ations in the first re-gime, that is, many i s. In this case, the predictiveresiduals are white noise asymptotically and orthogon al tothe regressors {Yni+d-u = 1 , . . . , p). On the otherhand, when i arrives at o r exceeds s the p redictive residualfor the observation with time index n,+l + d is biasedbecause of the model change at time x S + ~ d He re, it iseasy to see tha t th e predictive residual is a function of theregressors {Yni+d-u = 1 , . . . p). Consequently, theorthogonality between the predictive residuals and the re-gressors is destroyed once the recursive autoregression

  • 8/12/2019 TAR model

    4/11

    Tsay: Testing a nd Mo de ling Threshold Autoregressive Processes 33goes on to the observations whose threshold value exceedsrl Notice that here the actual value of rl is not required;all tha t is needed is the ex istence of a nontrivial thres hold .Based on the aforementioned consideration, one way totest for threshold nonlinearity is to regress the predictiveresiduals of the arranged autoregression (5) on the re-gressors {Yn,+,_, v = 1 , . . . p) and use the F statisticof the resulting regression.For th e arranged autoregression (5), let bm e the vectorof least squares estimates based on the first m cases, Pmthe associated X X inverse matrix, and xm + ,he vector ofregressors of th e next observation to ente r the autoregres-sion, namely Yd+,,+,. (No te that the positions of d andnm +, n the subscript of Y are interchanged t o clarify thatm + s a subscript of n.) T hen , recursive least squaresestimates can be computed efficiently by

    D m + l = D m Km+l[Yd+nm+l ~h+lDm ],

    and

    (see Ertel and Fowlkes 1976; Goodw in and Payne 1977),and the predictive and standardized predictive residualsby

    dd+n,+, Yd+n,+, ~ h l b (6)and

    The predictive residuals can also be used to locate thethreshold values once the need for a TAR model is de-tecte d, by using various scatterplots designed to show spe-cific features of the TAR model. Details are given inSection 4. Note tha t the problem considered here is relatedto t he change-point o r switching-regression problem , forwhich voluminous references are available in the litera-ture. For example, see Quandt (1960), Shaban (1980),Pole and Smith (1985), and Siegmund (1988). A key dif-ference, however, is that here the data are serially cor-related.3 2 A Non linearity TestI now give details of the proposed nonlinearity test. Forfixed p and d, the effective number of observations inarranged autoregressions is n d h 1, with h definedjust before (5). A ssume tha t the recursive autoregressionsbegin with b observations so that there a re n d bh + 1predictive residuals available. Do the least squaresregression

    for i = b 1 , . . . n d h 1, and compute the

    associated F statistic

    where th e summ ations are over all of th e observations in(8) and 2 is the least squares residual of (8). The argument( p , d) of f iis used to signify the dependence of the F ratioon p and dTheorem3 1 Suppose that Y, is a linear stationary A Rprocess of or de r p . Th at is, Y, follows Mo del (1) with k

    = 1.Then, for large n the statistic fi(p, d) defined in (9)follows approximately an F distribution with p + 1 andn d b p h df. Furtherm ore, ( p ~ ) f i ( ~ ,)is asymptotically a chi-squared random variable withp + 1 df.

    This theorem can be proved by using the same tech-niques as Tsay (1986, theorem I and Keenan (1985,lemma 3.1). It uses the consistency property of leastsquares estimates of a linear A R m odel and a martingalecentral limit theorem of Billingsley (1961). (Details areomitted .) Note th at the asymptotic distribution of thed ) statistic continues to hold if one replaces the standa rd-ized predictive residuals by the ordinary predictive re-siduals d, of (6). Nevertheless, the standardized predictiveresiduals appear to be preferable when the sample size issmall. For a large sample, ordinary predictive residualsmay save some computation.Since the number and locations of the thresholds areunknown, there exists no (global) most powerful test forthreshold nonlinearity. Relative power, feasibility, andsimplicity are the major considerations in proposing thefi(p, d) statistic. The test can easily be implemented be-cause it requires only a sorting routine and the linearregression method.3 3 Power of the Test and Comparison

    I study (via simulation and analysis of some well-knowndata sets) the power of the fi(p, d statistic in detectingthe threshold nonlinearity. I also compare it with the Pe-truccelli and Davies (1986) portmanteau test. Variousthreshold lags are used for each real data set. I choose b= (n110) + p , with n the sample size and p the fitted ARorder. To com pute the portmanteau test, the standardizedresidual of (7) is further normalized bywhere s is an estimate of the residual variance a2 om-puted recursively by

    where RSS denotes residual sum of squares. Asymptoti-cally, z is standard Gaussian, so the P value of the port-manteau test can be evaluated by using an invariantprinciple (see Petruccelli and Davies 1986).Table 1 summarizes the test results for some real data

  • 8/12/2019 TAR model

    5/11

    Journal of the American Statistical Association March 989Table 1. Nonlinean'ty Tests of Some Real Data

    Threshold lags, dTest 1 2 3 4 5 6 7 8 9 10 11

    Series C: p = 2, n = 226

    Series A: p 7, n = 197

    Logged lynx data: p = 9, n = 114

    Original lynx data: p = 1 1, n = 1 14Fip,B91.41 1.46 3.59 2.08 2.90 1.58 1.68 2.22 1.54 1.17 1.71P 000 000 004 270 267 910 950 257 100 049 451

    Sunspot series, 1700-1979: p = 11, n = 280F12,21s 3.10 10.55 3.89 1.85 1.98 2.73 2.20 2.08 1.93 .91 1.84P 000 607 573 399 195 031 790 .069 .217 316 171NOTE: F , , denotes the proposed F statistic with u and df and P is the P value of the Petruccelii and Davies (1966) ortmanteautest. Series A and C are from Box and Jenkins (1976).

    sets consisting of Series A and C of Box and Jenkins(1976), and the Canadian lynx and Wolf's sunspot data.Series A a nd C ar e known to be linear, whereas the othersare believed to be nonlinear. The AR orders used arethose commonly employed in the literature. From the ta-ble, I m ake the following observations. (a) Both th e pro-posed @ p , d) statistic and the portmanteau te~t~clearlydeclare Series A and C to be linear. (b) The F(p, d)statistic suggests that the lynx series (both t he original andlogged) and the sunspot data are nonlinear. (c) On theother han d, the results of the portmanteau test are mixed,especially for th e sunspot series. T hat result d epends heav-ily on the threshold lag (or the delay parameter) used.This observation agrees with the simulation results of Pe-truccelli and Davies (1986).Table 2 gives the empirical frequencies of rejecting alinear process based on 1,000 realizations and 1 and 5critical values. The model used in the simulation is aTAR(2; 1, I), with parameters (ail), @(, ), @f , r,, a:,a: (1.0, .5, 1.0, 1.0, 1.0, 1.0) and @i2 given in thetable. The sample sizes used are 50 and 100. For eachrealization of sample size n in the simulation, n + 200observations w ere generated and th e first 200 values were

    discarded, to reduce any effect of the starting value (0) ingenerating a TAR model. In the test, p and dwere used. Again, b (nI10) + p. From the table, it isclear that the proposed F statistic is more powerful thanthe portmanteau test in detecting threshold nonlinearity,except for the case @i2 O For the linear models, thatis, @ :) = .5, the F statistic does not result in large TypeI error.Table shows results corresponding to those of Table2 but with threshold value rl O and constant terms@ii) O for j 1, 2 both in the data-generating andtesting. Since the constant term is related to the level ofa process, it is important to see its effect on the testing.Th e statistic again seems to be more powerful than theportmanteau test except for -2.0. Based on theresults of real examples and simulations, the p roposed Fstatistic generally outperform s the p ortmanteau test in de-tecting threshold nonlinearity. It is relatively insensitiveto the change in the threshold lag and is often more pow-erful than the portmanteau test. The portmanteau test,however, is not universally dominated by the F statistic.This is in agreement with the nonexistence of a globaloptimal test.

    Table 2. The Empirical Frequencies of Rejecting a Linear Model Table 3. The Empirical Frequencies of Rejecting a Linear ModelBased on 1,000 Replications of a TAR(2; 1, 1) Model Based ori 1 ,000 Replications of a TAR (2; 1, 1) Model1 5 1 5 1 5 1 5

    @i2) F P F P @@' F P F P @ ? ) F P F P @ ? ' F P F Pn 100,p = 1, d 1 n = 5 0 , p 1 , d 1 n = 1 0 0 , p 1 , d = 1 n = 5 6 , p = l , d = 1

    NOTE: The parameters used are (@tl,\I1, @f , l a: 08) = (1.0,5,.0, 1.0, .0, .0), NOTE: The parameters used are (@fl,\I , @f , l a: 0 ) (.0, 5, O, O, .0, .0). heand F and P denote the F statistic and the portmanteau est. constant term is omitted. F and P denote the F statistic and the portmanteau test.

  • 8/12/2019 TAR model

    6/11

    Tsay: Testing and Modeling Threshold Autoregressive Processes4 SPECIFYING THE THRESHOLD VARIABLE

    4 1 Selecting the Delay ParameterA ma jor difficulty in modeling TA R m odels is the spec-ification of the threshold variable, w hich plays a key rolein the nonlinear nature of the model. For Model I) , thespecification amounts to selection of the delay parameterd. Tong and Lim 1980) used the Akaike information cri-

    terion Aka ike 1974) to select d after choosing all of theother parameters. I propose a different procedure thatselects d before locating the threshold values. The pro-posed method is motivated by the performance of the Fstatistic in analyzing real data. It assumes that the ARorder p is given. For a given TAR process and an ARorder p, one selects an estimate of th e delay parameter,say dp, such thatR P, dp) = maxES @ p, u)), 12)

    where R p, u) is the F statistic of 9), the subscript psignifies that th e estimate of d may depend on p , and S isa set of prespecified positive integers, that is, a collectionof possible values of d . For simplicity, assume that all ofthe test statistics R p, u) of 12) have the same degreesof freedom . This can be achieved by a proper selection ofthe starting point b of the recursion [see 8)]. When thedegrees of freedom are different, one may compute the Pvalues of the F statistics and select dp based on the mini-mum of the resulting P values.No te that the choice of dp in 12) is somewhat heuristic.It is based on the idea that if TAR models are needed,then o ne might start with a delay parameter that gives themost significant result in testing for threshold nonlinearity.A m ore cautious da ta analyst may wish to try several valuesof d, such as those corresponding to the maximum andthe second maximum of R p, d ) in 12).Table 1 provides some examples for the proposedmethod with = 1, p) . For the sunspot seriesdl, = 2, and for the Canadian lynx data d, = 2. It isinteresting to note that for the sunspot series dp = 2 foreach p from 2 to 15, suggesting that the selection couldbe stable with respect to the AR order p. In general, dpmight vary with p , which is usually unknown. In this case,one may start with a reasonable A R o rder p , as suggestedby some identification statistics such as the partial auto-correlation function of Y,, and refine the order later ifnecessary. Details are given in the next section.4 2 Locating the Values of Thresholds

    For a T AR model, special care is needed in estimatingthe thresho ld r;s. To see this , assume that k = 2 and thetrue value of r, satisfies Yns r, YnS +].hen, any valuein the interval [Yns,Yns+,]s as good as the other in pro-viding an estimate of r,, because they all give the samefitting results for a specified TA R model. T herefore , howto select an estim ate of r, with nice finite-sample propertiesfrom infinitely many possible values remains an op en prob-lem. In general, one may provide an interval estimate foreach of the threshold values or use sample percentiles as

    point estimates. I use the latter. That is, adopt the ap-proach of Tong and Lim 1980) by considering the em -pirical percentiles as candidates for the threshold values.But instead of prespecifying a set of finite numbers ofsample percentiles to work w ith, I search through the per-centiles to locate th e threshold values. T he only limitationis that a threshold is not too close to the 0th or 100thpercentile. For these extreme points there are not enoughobservations to provide an efficient estimate.The m ethods proposed to locate the thresholds, hencethe partitions of the Euclidean space, are scatterplots ofvarious statistics versus the specified threshold variable.The statistics used show the special features of the TARmodel. A lthough the graphics are not formal testing andestimating statistics, they d o provide useful information inlocating the thresholds. The plots used are a) the scat-terplot of the standardized predictive residuals of 7) orthe ordinary predictive residuals of 6) versus Yt-dp, ndb) the scatterplot of ratios of recursive estimates of anAR coefficient versus Yl-dp.The rationale of each of theplots is discussed in th e following, whereas illustrative ex-amples are deferred to the applications section.In the framework of arranged autoregression, the TA Rmodel con sists of various model changes that occu r at eachthreshold value rj. T herefore, th e predictive residuals arebiased at the threshold values. A scatterplot of the stan-dardized predictive residuals versus the threshold variablethus may reveal th e locations of th e threshold values of aTAR model. On the other hand, for a linear time seriesthe plot is random, except for the beginning of the recur-sion. This scatterplot is closely related to the traditionalon-line residual plot for quality control. I use the scatter-plot because it tells the locations of the threshold valuesdirectly. In practice, I have found the plot informative inTA R modeling, especially for TA R models in which theonly differences between different regimes are the vari-ances a;To motivate the use of a scatterplot of recursive ratiosof an AR coefficient versus the threshold variable, it isbest to begin with a linear time series. In this case, theratios have two functions: a) they show the significanceof that particular AR coefficient, and b) when the coef-ficient is significant the ratios gradually and smoothlyconverge to a fixed value as the recursion continues. Next,consider the simple TAR model

    where @I1 and @i2 re different. Let 4 be the recursiveestim ate of th e lag-1 A R coefficient in an arra nged au-toregression as in 5). By Theo rem 2.1, the ratios of4 behave exactly as those of a linear time series beforethe recursion reaches the threshold value r,. Once rl isreached, the estimate 6 starts to change and the ratiobegins to deviate. The pattern of gradual convergence ofratios is destroy ed. In effect, the ratio starts to turnand, perhaps, changes direction at the threshold value.For Model 13), 4 begins to change w hen Yt-d reaches

  • 8/12/2019 TAR model

    7/11

    36 Journal of the American Statistical Association March1989r1 and eventually is a comprom ise between and @ :I. simple as com pared with those outlined by Tong and LimThis behavior also app ears in the associated t ratios show- (1980). It is hoped that this procedure may help exploiting information on the value of rl. In ge neral, it is easy to the poten tial of TA R models in application. Th e proceduresee that the change in t ratio is substantial when the two is as follows.AR coefficients aie substantially different.To gain insight into the t ratio plots under the co nditionof no model changes in a series, a simulation study wasconducted. One thousand real izat ions of a Gauss ianAR (1) model Y, .7Y,-, a, were genera ted. As before,300 points of Y, were gen erate d with Yo 0 for eachrealization, but only the last 100 observations were usedas data points. Th e arranged au toregression was then fittedwi thp 1 and d 1 Again, use b (nI10) p 11data points to initiate a recursion. Denote the t ratio of6 at the time point s by T and define the percentagechang e in the t ratio as c, Ts +l T,I/Ts x 100 . LetC( a) be the empirical percenta ge that c, a . C(15) andC(10) were then counted. Since the t ratio is relativelyunstable at the beginning of the recursion, the percentagechanges were tabulated in three different time intervals,namely (13, loo), (36, loo), and (51, 100). The followingresults were obtained: (a) for the time interval (13, loo),C(15) 14.0 and C(10) 21.9 ; (b) for the interval(36, loo), C(15) 2.7 and C(10) 6.5 ; (c) for theinterval (51, loo ), C(15) .7 and C(10) 2.4 . Fromthese results, it is seen that the t ratios are stable exceptfor the beginning of the recursion. Next, to check thepossibility that the t ratio may change its direction, definethat there is an up-turn at time s if T,-I > T, and T,

  • 8/12/2019 TAR model

    8/11

    Tsay Testing and Modeling Threshold Autoregressive Processes 237metric cyclic behavior. Various linear and nonlinearmodels have been proposed for this process, and some ofthem appear to be reasonable. For instance, the TARmodels suggested by Tong and Lim (1980) can produceasymm etric limit cycles similar to tho se actually observ ed.In gene ral, for this series it seems that different da ta spanswould suggest different models. I use all of the 280 ob-servations in model building.

    As shown in Sections 3 and 4, based on p 11 theproposed F statistic confirms that the p rocess is nonlinearand selects Yt-2 as the threshold variable. T here fore , Istart with Step 3. Figure gives the scatterplot of theratios of the lag-2 AR coefficient versus ordered Yr-2.From th e plot, it is clear that the ratio is significant andchanges its direction twice: once near Y,-, 35 and againnear Y,-, 72, suggesting that ther e are two nontrivialthresholds. T hat is, there are three regimes for the process.A n exam ination of the actual values suggests that th e pos-sible estim ates of r, a re (34.0 , 34.5, 34 .8, 35 .O, 35.4, 35 .6,36.0) and th ose of r2 are (70.0, 70 .9, 73.0, 74.0). This ste psubstantially simplifies the complexity in modeling theTAR model because it effectively identifies the numberand locations of th e thresholds. N otice that the possibilityof more than one threshold in this series was noted byHon g (1983, pp. 256-257). Finally, in Step 4 I use AI Cto refine the threshold values and AR orders. The finalthreshold values are r, 34.8 and r2 70.9. Th e A Rorders are 11 ,10 , and 10, and the numbers of observationsare 116, 91, and 62. Details of the model are given inTable 4. Th e table also gives the AC F of the standardizedresiduals of th e mod el, as well as the PA CF of the squaredstandardized residuals. Both ACF and PACF fail to in-dicate any mod el inadequacy. I n Table 4, many of the A Rcoefficients are sm all as comp ared with the correspondingstandard errors, especially for those in the first regime.Nevertheless, the small coefficients are not 0, based onA IC . This shows that the m ajor difficulty in analyzing thesunspot data com es from th e first regime. F urther analysisof this regime might be useful.Using the data from 1700 to 1920, Tong (1983, p. 241)specified a TA R model: for th e sunspo t se ries. Tong s

    AR 2 oefficient Plot

    Figure 1 . Scatterplot of Recursive t Ratios of the Lag 2 AR CoefficientVersus Ordered Y t _ for the Sunspot Data. The X axis is Yt-2 .

    model u ses Y1-3as the threshold variable w ith a threshold36.6. As a rough com parison, I refit both Ton g s and thepreviously specified model to this shorter data span. ForTong s mod el, I obtained close results, with an overall A IC1,083.8. For my mod el, the overall AI C is 1,064.1, whichis substantially smaller. In addition, the PACF of thesquare d standard ized residuals of Tong s model has a value.16 at lag 2, which is significant. Bu t the two TA R m odelshave a sim ilar first threshold that (from Fig. 1) is the m ostsignificant. This demonstrates that the proposed proce-dure can handle multiple thresholds in a direct manner.There is no need to assume knowledge of the number ofthresholds.

    Example 2 In this example I give a TAR model forthe logged Canadian lynx data. Again, this data set hasbeen extensively analyzed. See Lim (1987) for a summaryand discussion. Since there are only 114 observations, Istar t withp 3 and S {1 ,2 ,3 ) at Step of the proposedmodeling procedure. The F statistics of the nonlinearitytest are 4.70,6 .13, and 4.46, respectively. Thus p 3 andd 2 were tentatively entertaine d. To specify the thresh-olds at Ste p 3, the recursive t ratios of the lag-2 A R co ef-ficient are not helpful because the t ratios stay betweenand 2 throughou t the range of Y,-,, except for a fewpoints at th e end . Two othe r scatterplots are useful, how-ever. Figure 2 shows the t-ratio plot of th e lag-l A R coef-ficient, from which a thresho ld with r, 2.4 is clearlyseen. Th e plot also shows a large jump around Y,-, 2.6.But this point is not trea ted as a threshold for two reasons.First, the jump is basically due to three points. Second,there are only few observations with Y,-2 between 2.4 and2.6, making a separation difficult to estimate. Figure 3gives the sc atterplot of th e predictive residuals of (6) versusthe ordered Y,-,. From the plot, it can be seen that thepredictive residuals start to deviate around Y,-, 3.1,suggesting yet ano ther threshold value. [The deviation be-comes much more clear when a horizontal line of O isdrawn across the plot. Also , Tsay s (1988) technique ofdetecting a step change in residual variances could be usedto help read the plot.] Consequently, there are two pos-sible thresholds for the process: One is about 2.4 and theother about 3.1. Again, I use AI C in Step 4 to refine themodel and obtain two thresholds (r , 2.373 and r,3.154). The AR orders are 1 7, and 2, whereas the num-bers of observations are 21, 42, and 45. The final TARmodel is

    The residual variances are .015, .025, and .053, respec-tively. In model checking, the ACF and PACF of thestandardized residuals and squared standardized residuals

  • 8/12/2019 TAR model

    9/11

    Journal of theAmerlcan Statistical Association, March1989Table 4. A TAR Model for the Annual Sun spot Series 1700-1979

    LagsRegimes 0 2 3 4 5 6 7 8 9 1 0 1 1 1 2

    AR Coefficients1 3.14 1.86 -1.36 .06 .04 .06 -.04 1 1 .04 .06 -.01 -.032 11.4 1.06 -.03 -.70 351 -.I2 -.02 .14 -.I9 .02 .273 1.01 .63 .14 -.A2 -.01 -.I4 .10 .31 -.46 .21 .I9

    ACF of standardized residualsPACF of squared standardized residuals

    .06 .05 .02 -.04 -.02 .OO .02 .04 -.04 .14 -.02 .06NOTE: The threshold variable is Yt 2 with thresholds rr = 34.8 and rz 70.9. The AR orders are 11 10 and 10. The numbers ofobselvations are 116 91 and 62 and the residual variances are 173.3 123.7 a nd 84.5. The overall AIC 1 379.4.

    of the model all fail to suggest any model inadequacy.Nevertheless, there is a slight deviation from symme try inthe histogram of the standardized residuals.A T AR model for th e lynx data was given by Tong (1983,p. 190) that has AI C = 37.6, whereas the A IC of theaforem entioned model is 47.7. In comparing the twomodels, it is interesting to note that the single thresholdobtained by Tong is 3.116, which is very close to the secondthreshold obtained here. In fact, there is only one obser-vation, 3.142, between 3.116 and 3.154. Thus the two a p-proaches arrive at a similar conclusion. On the othe r hand ,the analysis of Haggan et al. (1984) of the logged lynxdata indicates that the re is a possible threshold aro und 2.2(of Y,-I rathe r than Y,-2). This appears to agree with myresults. Therefore, for the lynx data the proposed ap-proach is able to capture features previously noticed inthe literature.

    vations to reduce the sizes of three relatively large resid-uals. The adjustments are from 102.5, 84.1, and 91.5 to97.9, 81.0, and 88 .0 at t = 20, 110, and 112, respectively.A linear AR(9) was fitted to the data, and the residualACF and PACF appear to be clean. Nevertheless, thePAC F of th e squa red residuals assumes values .22 and .15at lags 1 and 3, and the residual plot shows some hetero-scedasticity. The latter feature is understandable in lightof the periodic nature of outside temperature that appar-ently influenced the attic temperature. In view of this, aperiodic autoregression or a seasonal time series modelmay be useful. But since the periodicity is not fixedthroughout the process, further analysis is needed.Following the p rocedure of Section 5, select p = 9 andS = (1, 9) and perform a nonlinearity test. Usingthe predictive residuals of (6), the F statistics are 5.92,5.59, 5.18, and 4.69, respectively, for d = 1, 2, 3, and 4.

    Example 3. Finally, I analyze a process of hourly attic As compared with an F distrib dtion with 10 and 198 df,temperatures, consisting of 251 observations. The data these results a re highly significant. Thus I tentatively spec-were obtaine d from the Twin Rivers Project conducted by ify a TAR model with p = 9 and d = 1. Figure 4 showsthe Princeton University Center for Environmental Stud- the t-ratio plot of the constant term in an arranged AR (9)ies, beginning May 26, 1976. A preliminary analysis shows regression with d = 1, that is, the t ratio of con stant termthat there are several potential outlying observations in versus Y,-,. From th e plot, th e t ratios are significant andthe process. For simplicity, I have adjusted three obser- contain two major changes occuring approximately atY,-, = 70 and 82. The second change appears to be rel-

    t ra t io of AR-1 Coe3 ~ ~ ~ ~ ~ ~ ~ ~ j

    f f ic ient

    Figure2.Scatterplot of Recursive t Ratios of the Lag 1 AR CoefficientVersus Ordered Yt - for the Logged Canadian Lynx Data. The X axisis Y t _> Figure 3. Scatterplot of Predictive Residuals Versus Ordered Yt - forthe Logged Canadian Lynx Data. The X axis is Yt -z .

    0 . 6Predictive Residuals

  • 8/12/2019 TAR model

    10/11

    Tsay Testing and Modeling Threshold Autoregressive Processes 239t ratio of AR 0 Coefficient

    Figure 4 Scatterplot of Recursive t Ratios of the Constant TermVersus Orde red Yt-, for Example 3. The axis is Yt-r.atively small and I come back to it later. Finally, aftersome refinement, we arrive at a TAR model with twothreshold values (rl = 69.3 and r2 = 83.0). The A R ordersare 7, 4, 6; the numbers of observations are 95, 75, 74;and the residual variances are .913, 2.386, 2.984. Detailsof the TAR model are

    Th e overall AIC of the model is 177.5. In model check ing,the problems that appear in a linear AR (9) fit are no longerapparen t. The A CF and PAC F of the standardized resid-Predictive Residuals

    Figure 5. Scatterplot of Standardized Predictive Residuals VersusOrdered Y, , for Example 3 After Omitting Observations With Y,-,69.3. The axis is Yt- .

    uals are all clean. The first four PACF's of the squaredstandardized residuals are .l o , .2, l , and .05, respec-tively. Note that the residual variance of regime s muchsmaller than those of the o ther two regim es. This partiallyexplains the heteroscedasticity observed in the linearmodel.As mentioned earlier, the second threshold is less pro-nounced in Figure 4; this sometimes happens in applica-tion. An iterative procedure might be useful. I use thislast example to illustrate the iteration. After locating thefirst threshold valu e, on e may dro p those cases of d ata inthe first regime and carry out the recursive estimation ofarranged autoregression, using the remaining cases to testfor the need of a second threshold. For the attic temper-ature, p(9, 1) = 2.32 after removing those cases of datawith Y,-, 69.3. Com pared with an distribution with10 and 135 df, the test is significant at the 5% level butnot at the 1% level. Similarly, one may also confirm thethreshold location by using the reduced data set. Figure 5gives the scatterplot of the standardized predictive resid-uals of the reduced data set of Example 3, from whichr2 = 82 seems reasonable because there is an apparentdifference in variance.

    7 CONCLUDING REMARKSI proposed a procedure for testing and building TARmodels. The procedure is simple and requires no pre-specification of the number of regimes of a thresholdmodel. I applied the procedure to three real data sets andobtained ade quate models. In particular, for the Canadianlynx data of Example 2 the specified model gives rise toan asym metric limit cycle similar to tha t of th e dat a.Note th at th e TA R m odel (1) has a striking feature of

    discontinuity at the threshold Y,-, = rj when the coeffi-cients @li depend on j This feature can capture jumpphenomena observed in practice such as in the vibrationstudy. Nevertheless, it appears to be somewhat cou nter-intuitive. To check this feature, Chan and Tong (1986)considered a class of smooth T AR models that put certaincontinuity constraints on the m odel. Much remains to beinvestigated for this continuity problem, however.[Received May 1987. Revised June 1988.1

    REFERENCESAka ike, H. (1974), A New Look at Statistical Model Identification,IEEE Transactions on Automatic Control 19, 716-722.Box, G . E. P., and Jenkins, G . M. (19761, Time Series Analysis: Fore-casting and Control San Francisco: Holden-Day.Billingsley, P. (1961), The Lindeberg-Levy Theo rem for Martingales,Proceedings of the American Mathematical Society 12, 788-792.Chan, K. S ., and Tong, H. (1986), On Estimating Thresholds in Au-toregressive Models, Journal of Time Series Analysis 7, 179-190.Ertel, J. E ., and Fowlkes, E . B. (1976), Some Algorithms for LinearSpline and Piecewise Multiple Linear Regression, Journal of theAmerican Statistical Association 71, 640-648.Goodwin, G. C ., and Payne, R . L. (1977), Dynamic System Identifica-tion: Experiment Design and Data Analysis New York: AcademicPress.Haggan, V. Hera vi, S. M., and Priestley, M. B. (1984), A S tudy ofthe Application of State-Dependent Models in Nonlinear Time SeriesAnalysis, Journal of Time Series Analysis 5, 69-102.Kee nan, D . M. (1985), A Tukey Nonadditivity-Type Test for TimeSeries Nonlinearity, Biometrika 72, 39-44.

  • 8/12/2019 TAR model

    11/11

    24 Journal of the American Statlstlcal Association March1989Lai, T. L., and Wei, C. Z (1982), Least Squ ares Estima tes in StochasticRegression Models With Applications to Identification and Control ofDynamic Systems, The Annals of Statistics 10, 154-166.Lim, K. S. (1987), A Comparative Study of Various Univariate TimeSeries Models for Canadian Lynx D ata, Journal of Time Series Anal-ysis 8, 161-176.Petruccelli, J. and Davies, N. (1986), A Portmanteau Test for Self-Exciting Threshold Autoregressive-Type Nonlinearity in Time Series,Biometrika 73, 687-694.Pole, A. M ., and Smith, A. F M. (1985), A Bay esian Analysis of Som eThreshold Switching Models, Journal of Econometrics 29, 97-119.Priestley, M. B. (1980), State-Dependent Models: A General Approachto Nonlinear Time Series Analysis, Journal of Time Series Analysis1, 47-71.Quandt , R. E (1960), Tests of the Hypothesis That a Linea r RegressionSystem Obeys Two Sep arate Regimes, Journal of the American Sta-tistical Association 55, 324-330.

    Shab an, S. A. (1980), Change-Point Problem and Two-Phase Regres-sion: An Annotated Bibliography, International Statistical Review48, 83-93.Siegmund, D. (1988), Confidence S ets in Change-Point Problems,International Statistical Review 56, 31-48.Tong, H. (1978), On a Threshold Model in Pattern Recognition andSignal Processing ed. C. H Chen , Amsterdam: Sijhoff Noord-hoff. (1983), Threshold Mod els in Nonlinear Time Series Analysis (Lec-ture Notes in Statistics No. 21), New York: Springer-Verlag.Tong, H., and Lim, K. S. (1980), Threshold Autoregression, LimitCycles and Cyclical Data (with discussion), Journal of the RoyalStatistical Society Ser. B, 42, 245-292.Tsay, R. S . (1986), Nonlinearity Tests For Time Series, Biometrika73, 461-466.1988), Outliers, Level Shifts, and Variance Changes in TimeSeries, Journal of Forecasting 7, 1-20.