
Modeling dynamic diurnal patterns in high frequency financial data

Preliminary draft

Ryoko Ito∗

University of Cambridge

January 10, 2013

Abstract

We extend the dynamic conditional score (DCS) model of Andres and Harvey (2012) and develop an intra-day DCS model with a dynamic cubic spline to be fitted to high-frequency observations with periodic behavior. The spline can allow for periodic behavior and seasonal effects that may evolve over time, and it is based on the work of Harvey and Koopman (1993). The periodic component can be estimated simultaneously with all other components of the model by the method of maximum likelihood. We fit our spline-DCS model to ultra-high frequency observations of IBM trade volume. We estimate the dynamics of the conditional scale of the data and illustrate the impressive in-sample and predictive performance of our model.

1 Introduction

Countless studies in finance have been devoted to modeling and forecasting financial variables using variants of multiplicative error models, whose dynamics are driven by a martingale process constructed from the moment conditions of observations. Prototypes of such models are the autoregressive conditional heteroscedasticity (ARCH) model of Engle (1982), the generalized ARCH (GARCH) model of Bollerslev (1986) and Taylor (1986), and the autoregressive conditional duration (ACD) model of Engle and Russell (1998). These classes of observation-driven models are considered to be the most popular for forecasting volatility of returns, trade duration, and intensity in finance. Their success can be attributed to their simplicity, practicality, and the ease of making statistical inference. Because the predictive filters of these models are essentially linear functions of past observations (or squared observations in the context of return volatility), the filters can respond too much to extreme observations, whose effects can be too persistent, causing strong serial correlation in the estimation residuals that is difficult to remove. From a statistical perspective, whether an extreme observation should be treated as an outlier depends on the shape of the distribution against which the data are tested. If we should anticipate occasional extreme observations with non-negligible probability, it may be pertinent to capture them as part of the tail(s) of the underlying distribution and subdue their effects on the dynamics of the conditional distribution, instead of reacting sensitively to them via persistent movements in the time-varying parameter of the distribution.

∗ Faculty of Economics, University of Cambridge. Email: [email protected]. I would like to thank my supervisor Professor Andrew Harvey for his helpful guidance throughout the year. I would also like to thank Philipp Andres and Stephen Thiele for providing thoughtful comments on an early version of this paper.


Harvey and Chakravarty (2008) take a step in the direction called for above by introducing a new type of observation-driven model called Beta-t-EGARCH. Beta-t-EGARCH falls in the class of dynamic conditional score (DCS) models formally defined and studied by Harvey (2010), Harvey (2013), and Andres and Harvey (2012). The advantages of DCS are fourfold. First, the parameters of the model can be estimated easily by the method of maximum likelihood. The consistency and asymptotic normality of the maximum likelihood estimators are established by Harvey (2010) and Andres and Harvey (2012). Second, the analytical expressions of the multi-step optimal predictors¹ as well as their mean square errors are available. Third, these analytic and asymptotic properties of the model extend to a number of heavy- or long-tailed distributions against which the distribution of observations may be tested. Fourth, because the predictive filter of the model is driven by the conditional score,² the effects of news on the filtered dynamics are bounded even in the presence of extreme observations.

The predictive DCS filter can be applied either to the scale parameter or the location parameter of a given distribution.³ The basic DCS model applied to scale is defined as follows. Given a sequence of $T$ observations $(y_t)_{t=0}^T$, we denote its underlying random data generating process as $(Y_t)_{t=0}^T$. We assume that this process is defined on the probability space $(\Omega, \mathcal{F}, \mathbb{P})$ equipped with a filtration $(\mathcal{F}_t)_{t=0}^T$, and that $\mathcal{F}_0$ is trivial so that $Y_0$ is almost surely constant. Each $Y_t$ follows some unknown cumulative distribution function (cdf) $F_t$ for $t = 1,\dots,T$. To model the observations $(y_t)_{t=0}^T$, the scale DCS model assumes that there exist a constant location factor $c \in \mathbb{R}$ and a standard cumulative distribution function (cdf) $F$ such that

$$ (Y_t - c)/\alpha_{t|t-1} \,\big|\, \mathcal{F}_{t-1} \sim \text{iid } F, \qquad t = 1,\dots,T, \qquad (1) $$

for a sequence of time-varying scale factors $(\alpha_{t|t-1})_{t=1}^T$, where $\alpha_{t|t-1} > 0$ for all $t = 1,\dots,T$. We write $\cdot_{t|t-1}$ to reflect the conditionality of a variable at time $t$ on $\mathcal{F}_{t-1}$. $F$ is a shorthand notation for the standard cdf $F(\cdot\,;\theta)$ re-centered around the origin with a constant vector $\theta$ of distribution parameters that does not include $c$ and $\alpha_{t|t-1}$. The probability density/mass function (pdf) of $F$ is denoted $f$. The location DCS model of Harvey and Luati (2012) assumes that the location parameter $c$ is time-varying and can be estimated by the DCS filter. The first-order DCS filter for scale is defined as

$$ \alpha_{t|t-1} = \exp(\lambda_{t|t-1}), \qquad \lambda_{t|t-1} = \delta + \phi\lambda_{t-1|t-2} + \kappa u_{t-1}, \qquad t = 1,\dots,T, \qquad (2) $$

where $u_t$ is the conditional score

$$ u_t = \frac{\partial \log\big[ e^{-\lambda_{t|t-1}} f(y_t e^{-\lambda_{t|t-1}};\,\theta) \big]}{\partial \lambda_{t|t-1}} $$

for $t = 1,\dots,T$. We also have $\delta \in \mathbb{R}$, $\kappa > 0$, and $|\phi| < 1$ if $(\lambda_{t|t-1})_{t=0}^T$ is stationary. This scale DCS filters the dynamics of $\alpha_{t|t-1}$ via the exponential link function with the canonical link parameter $\lambda_{t|t-1} \in \mathbb{R}$ for $t = 1,\dots,T$. The exponential link is useful as it eliminates the need to impose parameter restrictions in order to ensure that the scale parameter $\alpha_{t|t-1}$ remains positive for all $t$. However, the choice of link function should always be consistent with the parametric choice of $F$ for the asymptotic properties of DCS to go through. For example, the logit link may be used if $F$ is binomial. See Harvey (2013). The choice of $F$ should always depend on the main characteristics and empirical distribution of the data.⁴ Andres and Harvey (2012) study a special case where $y_t$ is non-negative for all $t$, and consider a number of non-negative distributions for $F$ including generalized gamma (GG), generalized beta (GB), and log-normal. In contrast, Beta-t-EGARCH of Harvey and Chakravarty (2008) is another special case of DCS where $F$ is characterized by the Student's t-distribution in the context of return volatility. If $F$ is characterized by the GG distribution, $u_t$ is a linear function of a gamma-distributed random variable. In contrast, if $F$ is characterized by the GB distribution, $u_t$ is a linear function of a beta-distributed random variable. In both cases, $u_t$ is iid given $\mathcal{F}_{t-1}$ for $t = 1,\dots,T$. See Appendix A. Appendix B describes the features of the DCS model in greater detail.

¹ This optimality is in terms of minimum mean square error (MMSE).
² It is the first derivative of the log-likelihood of a single observation, where the derivative is taken with respect to the dynamic parameter of the model.
³ See Harvey and Chakravarty (2008) and Andres and Harvey (2012) for the DCS model applied to scale, and Harvey and Luati (2012) for an application to location.
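To make the recursion in (2) concrete, the following minimal Python sketch runs the first-order scale DCS filter for a user-supplied score function. It is an illustrative sketch rather than the author's estimation code; the exponential-distribution score at the end is a hypothetical, particularly simple member of the GG family included only as an example.

```python
import numpy as np

def dcs_scale_filter(y, delta, phi, kappa, score, lam0=0.0):
    """Run the first-order scale DCS filter in (2):
    lambda_{t|t-1} = delta + phi*lambda_{t-1|t-2} + kappa*u_{t-1},
    with alpha_{t|t-1} = exp(lambda_{t|t-1})."""
    lam = np.empty(len(y))
    lam[0] = lam0
    for t in range(1, len(y)):
        u_prev = score(y[t - 1], lam[t - 1])          # conditional score u_{t-1}
        lam[t] = delta + phi * lam[t - 1] + kappa * u_prev
    return np.exp(lam), lam                            # scale path and link-parameter path

# Example score for an exponential F (a simple member of the GG family, used only
# for illustration): log f(y) = -lambda - y*exp(-lambda), so u = y*exp(-lambda) - 1.
def exponential_score(y_t, lam_t):
    return y_t * np.exp(-lam_t) - 1.0
```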

DCS is a relatively new research frontier and, as such, its practicality in empirical financial analysis is still to be explored in a variety of applications. This paper contributes to the DCS literature by constructing a variant of the DCS model to be fitted to ultra high-frequency observations of financial variables. Our empirical results illustrate that DCS becomes a powerful tool for approximating the local dynamics and conditional distribution of high-frequency data, provided that we thoroughly understand the main characteristics of the data and the parametric assumptions are formulated to reflect them.

There are a number of stylized characteristics of high-frequency financial data. One of them is a concentration of zero-valued observations. See Hautsch (2012) for recent evidence. Our model deals with a non-negligible number of zero observations by decomposing the conditional distribution of the data at zero and applying the DCS filter only to non-zero observations. This decomposition method is a standard technique in economics and finance, and it is adopted by many. See, for instance, Amemiya (1973), Heckman (1974), McCulloch and Tsay (2001), Rydberg and Shephard (2003), and Hautsch et al. (2010).

Moreover, intra-day returns and trading activity often exhibit diurnal patterns as well as weekly, monthly, and quarterly seasonal effects. (See, for example, Taylor (2005), Hautsch (2012), and Lo and Wang (2010), among many others.) Diurnal patterns have been modeled in the present literature mainly by applying the Fourier representation, using a deterministic spline, or computing sample moments for each bin of intra-day trading hours.⁵ These methods generally assume that the diurnal pattern is a deterministic function of time and does not evolve over time. They may also require the diurnal component and other stochastic components to be estimated in isolation by a two-step procedure in order to "diurnally adjust" the data. This can render the asymptotic properties of statistical tests invalid. We take a novel approach to this issue by integrating the DCS model with the dynamic cubic spline model of Harvey and Koopman (1993). There are three main benefits to this approach. First, it allows for the possibility that the diurnal U-shaped patterns may evolve over time. Second, the diurnal patterns can be estimated very easily and simultaneously with all other components of the model by the method of maximum likelihood. Third, the presence of seasonal effects can be reflected in the shape of the diurnal patterns and can be formally tested by a classical likelihood-based test. In this paper, we use a special case of the spline, termed the weekly spline, to check whether intra-weekly seasonality (the day-of-the-week effect) is operative in the data by testing whether there is a statistically significant difference in the diurnal shape of different weekdays. This method of testing for the day-of-the-week effect contrasts with the standard testing procedure using seasonal dummies, which essentially capture seasonality by allowing for a level shift in the variable being estimated on each weekday. See, for instance, Andersen and Bollerslev (1998) and Lo and Wang (2010) for examples of the use of day-of-the-week dummies. The weekly spline may be preferred over seasonal dummies if seasonality manifests in the data in a non-linear fashion via differences in the shape of intra-day patterns. Harvey and Koopman (1993) originally developed the dynamic cubic spline to estimate the intra-weekly patterns of hourly electricity demand. It was later employed by Harvey et al. (1997) to model a changing seasonal component of weekly money supply in the U.K., and also by Bowsher and Meeks (2008) to forecast zero-coupon yield curves. Bowsher and Meeks (2008) interpret the spline as a special type of "dynamic factor model", where the knots of the spline are the factors and the factor loadings are treated as given and specified according to the requirement that the model has to be a cubic spline.

⁴ If we assume that $F$ is the standard normal distribution, $c = E[Y_t]$ and $\alpha^2 = \mathrm{Var}(Y_t)$. In this case, as the first and second moments completely describe the shape of the distribution, there are no parameters in $\theta$ of $F$ to be estimated. If we assume that $F$ is Burr, $c$ may be better described as the location of the pole (or sometimes the vertical asymptote) of the distribution. Obviously and yet importantly, $c$ and $\alpha$ are not the same as the first or the second moment of the distribution in this case. If $F$ is highly non-Gaussian, $\theta$ is often non-empty and includes parameters that determine the shape of $F$ such as skewness, excess kurtosis, and the heaviness of the tail(s). For highly non-Gaussian distributions, the first and second moments (even when they exist) may carry very little information about the shape of the distribution.
⁵ See, for instance, Andersen and Bollerslev (1998), Engle and Russell (1998), Zhang et al. (2001), Campbell and Diebold (2005), Engle and Rangel (2008), Brownlees et al. (2011), and Engle and Sokalska (2012).

Our proposed spline-DCS model will be fitted to intra-day observations of the trade volume of IBM stock traded on the New York Stock Exchange (NYSE). Trade volume is a measure of the intensity of trading activity. There are a variety of volume measures used in the literature, including the number of shares traded, dollar volume, number of transactions, turnover (shares traded divided by shares outstanding), and dollar turnover. Given that the purpose of this paper is to develop the spline-DCS model and illustrate its practicality, we arbitrarily choose trade volume as measured by the number of shares traded. While this paper deals with non-negative time series, the spline-DCS model can also be applied to variables with support over the entire real line, in which case our model should be viewed as an extension of Beta-t-EGARCH of Harvey and Chakravarty (2008) and can be compared with studies of high-frequency asset returns, including the analysis of the deutsche mark-dollar exchange rate by Andersen and Bollerslev (1998).

We do not include terms to explicitly capture the overnight effect in our model, and assume that the periodic surge in market activity near the market open is a natural part of the cyclical dynamics of the conditional distribution of the data. This contrasts with the standard methods for capturing overnight effects, such as treating extreme observations as outliers using dummy variables, differentiating day and overnight jumps, and treating events that occur in a specified period immediately after the open as censored. (See, for instance, Rydberg and Shephard (2003), Gerhard and Hautsch (2007), and Boes et al. (2007).) We choose to deviate from these standard methods for two reasons: it may take time for the overnight effect to diminish completely during the day; and it is generally difficult to identify which observations are due to overnight information if new information arrives after the market opens. Our model captures the overnight effect via periodic hikes in the time-varying scale parameter, reflecting the effects of overnight information as a cyclical elevation in the probability of extreme events near the market open. This feature of our model also highlights one of the advantages of the exponential link function: volatile and dynamic movements in the scale parameter can be induced by relatively more subtle movements in the link parameter to which the DCS filter is applied. These results are reported in Section 5.6.1.

The structure and main findings of this paper are as follows. Section 2 gives an initial investigation of our trade volume data and motivates the construction of our model. We illustrate that the data exhibit diurnal U-shaped patterns, and that the empirical distribution and autocorrelation structure change with the length of the aggregation interval. Section 3 outlines our modeling assumptions and formally constructs the intra-day DCS model with dynamic cubic spline. Sections 4 and 5 discuss methods for estimation and model selection. The in-sample and out-of-sample estimation results are reported in Sections 5 and 6. We find that the estimation results are mostly robust to marginal changes in the aggregation interval. A statistical test for intra-weekly seasonality finds no evidence of the weekend effect. However, we find indications that a marginal change in the aggregation interval can affect whether seasonal effects manifest in the data (Section 5.4.1). We find that the Burr distribution, which is a special case of the GB distribution, achieves an impressive fit to our data. In particular, the probability integral transform (PIT) of the estimation residuals turns out to be remarkably close to being iid standard uniformly distributed. In contrast, the log-normal distribution gives an inferior fit due to the departure of the log-transformed data from normality, especially around the tail regions. Burr fits better than log-normal presumably because the shape of the standard log-normal distribution is determined by only one parameter, while the standard Burr distribution has two shape parameters, making the shape of Burr more flexible than that of log-normal (Section 5.2). We also find evidence that long memory is inherent in our data (Section 5.3). The out-of-sample performance tests show that the in-sample estimation results are stable, and that our model is able to provide good one-step and multi-step ahead forecasts over prediction horizons of several hundred or thousand steps (Section 6).

2 Data characteristics

Before proceeding to detailed modeling and forecasting results, it is useful to get an overall feel for the trade volume data. This section provides an initial investigation of our high-frequency trade volume series that serves to motivate our formal model in Section 3.

We analyze the trade volume (in number of shares) of IBM stock traded on NYSE between Monday 28 February and Friday 31 March 2000, which includes 25 trading days and no public holidays. Our raw data set is in tick format and consists of the record of every trade in the sequence of occurrence. The tick-data is irregularly spaced and often has multiple transactions in one second. The tick-data can be aggregated over equally spaced time intervals. In this section, we aggregate the raw data in tick format over different aggregation intervals and describe the notable features of the data. For convenience, we refer to the aggregated series as IBM1s if the aggregation interval is 1 second, IBM1m if it is 1 minute, and so on.
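As an illustration of this aggregation step, the sketch below builds equally spaced series from a hypothetical tick-format record using pandas; the column name `volume` and the toy timestamps are assumptions introduced for the example, not part of the original data set.

```python
import pandas as pd

# Hypothetical tick-format record: one row per trade, irregularly spaced,
# sometimes with several trades within the same second.
ticks = pd.DataFrame(
    {"volume": [300, 100, 1500, 200, 700]},
    index=pd.to_datetime([
        "2000-03-20 09:30:01.2", "2000-03-20 09:30:01.7",
        "2000-03-20 09:30:04.0", "2000-03-20 09:31:15.5",
        "2000-03-20 09:31:15.9",
    ]),
)

# Sum trade volume over equally spaced intervals; empty intervals become zero-volume bins.
ibm1s = ticks["volume"].resample("1s").sum()     # the "IBM1s"-style series
ibm1m = ticks["volume"].resample("1min").sum()   # the "IBM1m"-style series

# Keep only NYSE trading hours (9.30am-4pm, New York local time).
ibm1m = ibm1m.between_time("09:30", "16:00")
```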

2.1 Diurnal U-shaped patterns

The top panel of Figure 1 shows the time series plot of IBM1s during the market open hours of NYSE (9.30am-4pm in local time) over a given trading week of March 2000. At the aggregation interval of 1 second, there are 23,400 observations per trading day.

We observe several recurrent spikes in trade volume either immediately after the market opening or before closure. These extreme observations make the upper tail of the empirical distribution very long. (See Table 1 and Figure 3.)

The right column of Figure 1 shows IBM1s smoothed by the simple moving average of the nearest 500 observations, which corresponds to just over eight minutes of data at the 1-second aggregation interval. The smoothed series reflects that there are diurnal U-shaped patterns in trading activity. In particular, trade volume tends to be high soon after the market opening, falls during the quiet lunch hours around 1pm, and picks up again before closure. One may naturally suspect that the shape of the diurnal patterns is slowly changing over time. These qualitative features are also present at the aggregation level of 1 minute. Other measures of trading activity based on durations (such as trade durations, midquote change durations, and volume durations) also exhibit similar diurnal U-shaped patterns. See, for example, Hautsch (2012, p. 41) for recent evidence. Extreme movements in trading activity in the first hour of the trading day can be caused by news transmitted overnight. Trading activity slows down towards lunch time as overnight information is processed, but picks up again in the afternoon as traders re-balance their positions before market closure.



Figure 1 – IBM1s (left column) and the same series smoothed by the simple moving average of the nearest 500 observations (right column). Time on the x-axis. Top panel: Monday 20 - Friday 24 March 2000. Each day covers trading hours between 9.30am-4pm (New York local time). Bottom panel: Wednesday 22 March 2000, covering 9.30am-4pm (New York local time).

2.2 Aggregation effect on a discrete random variable

The IBM stock is traded in fixed increments of 100 shares over our sampling period. This feature has the following two important implications for our analysis:

  • rounding effect: traders often round the number of shares to trade up or down to round numbers such as 500, 1000, 5000, 10000, and 15000. See Figure 2. The empirical frequency of the tick-data takes discrete jumps at the multiples of 500 (denoted $500\mathbb{N}_{>0}$). The size of the jumps decays as volume increases. Thus, the empirical cdf of the tick-data exhibits step-increases at $500\mathbb{N}_{>0}$. These jumps in empirical frequency at $500\mathbb{N}_{>0}$ moderate as the aggregation interval widens. The bottom panel of Figure 2 shows that, at the aggregation level of 10 seconds, the jumps are much smaller than those of the tick-data.

  • discreteness: at any aggregation level, the volume series increments in multiples of 100. Thus our series should be viewed as a realization of a sequence of discrete random variables or a discretely distributed continuous-time process. However, the empirical distribution becomes closer to that of a continuous random variable as the aggregation interval widens.

2.3 Aggregation effect on empirical distribution and autocorrelation

Figure 3 shows the frequency distribution, empirical cumulative distribution function (cdf), and sample autocorrelation of IBM1s, IBM30s, and IBM1m. We observe a number of important features, including:

  • right skewness, long tail: the left and middle columns of Figure 3 show that our series are heavily right-skewed and have an extremely long upper tail. The length of the upper tail can also be seen in the distance between the maximum and the 99.9% sample quantile in Table 1. The length of the upper tail and the skewness are most extreme for IBM1s. These extreme features moderate as the aggregation interval widens. Note that the sample skewness statistics in Table 1 must be interpreted with care, as the theoretical skewness may not exist.

  • autocorrelation: the right column of Figure 3 shows that data aggregation affects autocorrelation. IBM1s shows statistically significant but weak autocorrelation that is very slow to decay. As the aggregation interval widens, the series become more strongly and statistically significantly autocorrelated. The autocorrelation of IBM1m is faster to decay than that of IBM1s.⁶

  • zero-observations: the frequency of zero volume decreases as the aggregation interval widens. (See the bottom row of Table 1.) The frequency of zero-volume observations is approximately 80% for IBM1s, which compares with 0.06% for IBM1m.

⁶ These effects of data aggregation are interesting, as there may be a level of aggregation at which the degree of autocorrelation is maximized in some sense. This feature is to be studied in a separate paper.

Figure 2 – Frequency distribution (left) and empirical cdf (right) of non-zero trade volume. Aggregation interval: tick-data (top row) and 10 seconds (bottom row). Sampling period: 28 February - 31 March 2000.

2.4 Choice of aggregation interval for modeling

For modeling and estimation, the rest of our analysis will focus on IBM30s and IBM1m and ignore all other aggregation intervals. This choice of aggregation interval is controversial and deserves a formal explanation.

As we illustrated in Section 2.2, the raw tick-data has a non-negative discrete distribution that is extremely right-skewed and has a long upper tail. It is difficult to determine what type of discrete distribution the data might follow, as there are a number of jumps in the empirical frequency due to the aforementioned rounding effect. We can remove the rounding effect by widening the aggregation interval until there are no obvious jumps in the empirical distribution. Widening the aggregation interval also moderates the extreme skewness and the length of the upper tail. Once the empirical distribution becomes sufficiently smooth, the distribution of the aggregated series can be approximated by continuous distributions, especially when the sample size is large. Thus data aggregation also widens the parametric choice of distributions we can use for modeling. However, we cannot widen the aggregation interval arbitrarily, as doing so risks introducing stronger and more statistically significant autocorrelation in the aggregated series, as we illustrated in Section 2.3. Strong autocorrelation in the data can leave strong autocorrelation in the estimation residuals that we may find difficult to remove.

Series                   IBM1s       IBM30s      IBM1m
Observations (total)     585,000     19,500      9,750
Mean                     351         10,539      21,297
Median                   0           6,100       14,000
Maximum                  1,560,000   1,652,100   1,652,100
Minimum                  0           0           0
Std. Dev.                4,402       26,071      39,114
Skewness                 169         29          18
99.9% sample quantile    25,000      293,654     604,295
Max - 99.9% quantile     1,535,000   1,358,446   1,047,805
Frequency of zero-vol.   79.13%      0.47%       0.06%

Table 1 – Sample statistics of the IBM trade volume. Sampling period: Monday 28 February - Friday 31 March 2000 (25 trading days with 6.5 trading hours per trading day).

As the empirical distribution becomes sufficiently smooth at the aggregation intervals of 30 seconds and one minute, we will focus mainly on IBM1m in the rest of the analysis and use continuous distributions with non-negative support to characterize the distribution of our series. In order to explore the effects of changes in the aggregation interval on our inference, we will also report results for IBM30s. Although our estimation results are robust to marginal changes in the aggregation interval, we find indications that the aggregation interval may affect statistical tests for seasonality. See Section 5.4.1.

3 Intra-day DCS with dynamic spline

The discussion thus far suggests that a periodic component capturing the diurnal U-shaped patterns will be important in a model fitted to the trade volume series. Moreover, this component should accommodate the possibility that the shape of the diurnal patterns may change slowly over time. One may naturally suspect that non-periodic factors may coexist with the periodic component in the data. One such factor is a highly persistent low-frequency component governing movements around the periodic U-shaped patterns. The empirical autocorrelation structure suggests that there may be long memory inherent in the data. This can be captured by a combination of autoregressive components. The presence of a non-negligible number of zero-volume observations can be explained by another component governing the binary outcome of whether the next trade volume is zero or non-zero. Given the nature of our data, we find that fixing the location parameter ($c$) at zero is sufficient in our application.

Thus we stipulate that the trade volume process is driven by the simultaneous interaction of numerous latent components: one determining zero or non-zero volume, one associated with diurnal periodic movements, one with persistent low-frequency movements, and one governing the long-memory feature of the data. We allow for conditional scale dynamics with the canonical link parameter having an unobserved-component structure, where contributions come from both periodic and non-periodic components. The rest of this section gives the formal definitions of each of these components.

Figure 3 – Frequency distribution (left column), empirical cdf (middle column), and sample autocorrelation (right column). IBM1s (top row), IBM30s (middle row), and IBM1m (bottom row). Sampling period: 28 February - 31 March 2000. The axes of the cdf and autocorrelation charts are fixed across the rows so that they are easily comparable. The 200th lag corresponds approximately to 3 minutes prior for IBM1s, 1.5 hours prior for IBM30s, and 3 hours prior for IBM1m.

3.1 Redefining time indices

Henceforth, we redefine the time axis by $\tau = 0,\dots,I$ to denote intra-day time and $t = 1,\dots,T$ to denote trading days up to $T$. $\tau = 0$ and $\tau = I$ are the moments of market opening and closure on each trading day. We use the subscript $\cdot_{t,\tau}$ to denote the intra-day time $\tau$ on the $t$-th trading day. We assume that $\mathcal{F}_{1,0}$ is trivial, and that the first (aggregated) trading volume of each trading day is observed at $\tau = 1$. Moreover, we assume that $\cdot_{t+1,0} = \cdot_{t,I}$ for all $t = 1,\dots,T-1$. Thus the total sample size is $I \times T$. In order to avoid superfluous subscripts, we suppress the conditionality of variables, so that we merely write $\cdot_{t,\tau}$ instead of $\cdot_{t,\tau|t,\tau-1}$ even when the variable is conditional on $\mathcal{F}_{t,\tau-1}$. Finally, we introduce the following notation:

$$ \Psi_{T,I} = \big\{(t,\tau) \in \{1,2,\dots,T\}\times\{1,2,\dots,I\}\big\}, \qquad \Psi_{T,I}^{>0} = \big\{(t,\tau) \in \{1,2,\dots,T\}\times\{1,2,\dots,I\} : y_{t,\tau} > 0\big\}. $$


Figure 4 – Nested diagram of some of the useful non-negative distributions (generalized gamma, gamma, Weibull, exponential, generalized beta, Burr (Pareto IV), log-logistic (Pareto III), Pareto II, F). Scale factor is assumed to be one in all cases.

3.2 A point-mass probability at the origin

The presence of a non-negligible number of zero-volume observations cannot be explained by conventional continuous distributions, as the probability of observing any particular value is zero by definition. Although one can eliminate zero-volume observations by widening the aggregation interval, a better approach is to define a distribution such that:

• its strictly positive support is captured by a conventional continuous distribution; and

• the origin has a discrete point-mass probability.

Formally, we characterize the cdf $F : \mathbb{R}_{\geq 0} \to [0,1]$ of a standard random variable $X \sim F$ by

$$ P_F(X = 0) = p, \qquad p \in [0,1], $$
$$ P_F(X > 0) = 1 - p, \qquad\qquad (3) $$
$$ P_F(X \leq x \,|\, X > 0) = F^*(x), \qquad x > 0, $$

where $F^* : \mathbb{R}_{>0} \to [0,1]$ is the cdf of a conventional standard continuous random variable with the time-invariant parameter vector $\theta^*$.⁷ Thus we have $\theta = (p, \theta^*)$ for $F$. The properties of this type of distribution are studied in Hautsch et al. (2010). The idea of decomposing the distribution of a random variable using thresholds is not new, as it is essentially the same as the famous censored regression model (c.f. Amemiya (1973) and Heckman (1974)) and the decomposition models of McCulloch and Tsay (2001) or Rydberg and Shephard (2003). For non-negative series, $F^*$ can be characterized by a number of distributions including Weibull, Gamma, Burr, and log-normal, many of which are special cases of the GG and GB distributions. See Figure 4 for the relationship between these distributions.

Under the definition of $F$ in (3), we apply the predictive DCS filter only to positive observations, so that $u_{t,\tau}$ becomes the conditional score of $F^*$ and $\lambda_{t,\tau}$ responds to $u_{t,\tau-1}$ only when $y_{t,\tau-1} > 0$. The idea is that we treat observations at zero as belonging to a tranquil period of no news, so that the link parameter $\lambda_{t,\tau}$ does not respond to exogenous shocks. This is essentially the same as treating zero volumes as points of missing data in the DCS filter. It is natural to suspect that the probability $p$ of having a zero volume should change over time: $p$ must be low immediately after market opening when trading activity is intense, but must be high during the quiet lunch time. That is, $p$ must be dynamic and negatively correlated with the intensity of trading activity. Rydberg and Shephard (2003) and Hautsch et al. (2010) independently study decomposition models defining the conditional dynamics of $p$ via the logit link in the context of high-frequency finance. Thus an interesting extension of our model is a hybrid cubic-spline DCS model for the conditional dynamics of $p$ as well as the scale parameter $\alpha$. However, in this paper, we keep $p$ constant for simplicity and leave this extension for future research. We claim that this restrictive assumption on $p$ is inconsequential in our application, as the number of zero volumes in IBM1m and IBM30s is small.

⁷ The unconditional $n$-th moment of $X$ is well-defined as long as it is well-defined for $F^*$, because
$$ E_F[X^n] = p\,E_F[X^n \,|\, X = 0] + (1-p)\,E_F[X^n \,|\, X > 0] = (1-p)\,E_{F^*}[X^n]. $$

The joint likelihood function based on $F$ for the set of observations $(y_{t,\tau})_{(t,\tau)\in\Psi_{T,I}}$ is

$$ L\big((y_{t,\tau})_{(t,\tau)\in\Psi_{T,I}};\theta\big) = \prod_{(t,\tau)\in\Psi_{T,I}^{>0}} (1-p)\exp(-\lambda_{t,\tau})f^*(\varepsilon_{t,\tau};\theta^*) \prod_{(t,\tau)\in\Psi_{T,I}\setminus\Psi_{T,I}^{>0}} p = (1-p)^A\, p^{\,T\times I - A} \prod_{(t,\tau)\in\Psi_{T,I}^{>0}} \exp(-\lambda_{t,\tau})f^*(\varepsilon_{t,\tau};\theta^*), $$

where $f^*$ is the pdf of $F^*$ and $A = |\Psi_{T,I}^{>0}|$. The log-likelihood is

$$ \log L = A\log(1-p) + (T\times I - A)\log p + \sum_{(t,\tau)\in\Psi_{T,I}^{>0}} \log\big(\exp(-\lambda_{t,\tau})f^*(\varepsilon_{t,\tau};\theta^*)\big). $$

It is easy to check that the maximum likelihood estimator of $p$ is $\hat{p} = (T\times I - A)/(T\times I)$.
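A minimal sketch of this decomposed log-likelihood is given below, assuming the link parameters $\lambda_{t,\tau}$ and the log-density of $F^*$ are supplied by the user (the function names are hypothetical). Note that $\hat{p}$ is simply the sample fraction of zero observations, consistent with the closed form above.

```python
import numpy as np

def zero_adjusted_loglik(y, lam, logpdf_star):
    """Log-likelihood of the zero-augmented model in Section 3.2.

    y, lam      : flattened arrays of observations y_{t,tau} and link parameters lambda_{t,tau}
    logpdf_star : log-density of F* evaluated at the standardized residual eps = y * exp(-lam)
    """
    y, lam = np.asarray(y, float), np.asarray(lam, float)
    pos = y > 0
    A, n = int(pos.sum()), y.size                # A = |Psi^{>0}|, n = T x I
    p_hat = (n - A) / n                          # ML estimator of the point mass at zero
    eps = y[pos] * np.exp(-lam[pos])
    loglik = A * np.log(1.0 - p_hat) if A else 0.0
    loglik += (n - A) * np.log(p_hat) if n > A else 0.0
    loglik += np.sum(-lam[pos] + logpdf_star(eps))
    return loglik, p_hat
```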

3.3 Unobserved components of scale

We noted earlier that our series exhibit diurnal U-shaped patterns and strong autocorrelation that decays slowly. We assume that these features are due to periodicity and autocorrelation in the conditional scale factor $\alpha_{t,\tau}$. The standardized series $\varepsilon_{t,\tau} \equiv y_{t,\tau}/\alpha_{t,\tau}$ given $\mathcal{F}_{t,\tau-1}$ is assumed to be iid and free of periodic behavior. As we will estimate $\alpha_{t,\tau}$ via the link parameter $\lambda_{t,\tau}$, this means that $\lambda_{t,\tau}$ exhibits periodicity and autocorrelation. Before proceeding further, we note that, by the stationarity property of $\lambda_{t,\tau}$, we do not refer to the true stationarity property of the underlying process, as the spirit of our model is merely to obtain a good local approximation of the true data generating process. Our analysis is local in the sense that it works with discrete-time observations of what is actually a continuous-time process, collected over a short sampling period. Given these assumptions, we specify $\lambda_{t,\tau}$ as a sum of:

  • the periodic component $s_{t,\tau}$, which explains the diurnal U-shaped patterns and allows for the shape of the diurnal patterns to change over time;

  • the random-walk component $\mu_{t,\tau}$, capturing the highly persistent low-frequency dynamics around the periodic component; and

  • the stationary (autoregressive) component $\eta_{t,\tau}$, which is a mixture of several autoregressive components and explains the long-memory characteristics of the data.

Formally, our intra-day DCS model becomes:

$$ y_{t,\tau} = \varepsilon_{t,\tau}\exp(\lambda_{t,\tau}), \qquad \lambda_{t,\tau} = \delta + \mu_{t,\tau} + \eta_{t,\tau} + s_{t,\tau}, \qquad \varepsilon_{t,\tau}\,|\,\mathcal{F}_{t,\tau-1} \sim \text{iid } F $$


for $\tau = 1,\dots,I$ and $t = 1,\dots,T$. $\mu_{t,\tau}$ is defined as

$$ \mu_{t,\tau} = \mu_{t,\tau-1} + \kappa_\mu u_{t,\tau-1}\mathbb{1}_{\{y_{t,\tau-1}>0\}}, \qquad \forall\,(t,\tau)\in\Psi_{T,I}, $$

where $u_{t,\tau}$ is the score of $F^*$. The estimation results in Section 5 suggest that the inclusion of this random-walk component does a good job in capturing the low-frequency dynamics of the data.⁸ $\eta_{t,\tau}$ is the stationary component defined as

$$ \eta_{t,\tau} = \sum_{j=1}^{J}\eta^{(j)}_{t,\tau}, \qquad \forall\,(t,\tau)\in\Psi_{T,I}, $$
$$ \eta^{(j)}_{t,\tau} = \phi^{(j)}_1\eta^{(j)}_{t,\tau-1} + \phi^{(j)}_2\eta^{(j)}_{t,\tau-2} + \cdots + \phi^{(j)}_{m^{(j)}}\eta^{(j)}_{t,\tau-m^{(j)}} + \kappa^{(j)}_\eta u_{t,\tau-1}\mathbb{1}_{\{y_{t,\tau-1}>0\}} $$

for some $J \in \mathbb{N}_{>0}$. We assume that $m^{(j)} \in \mathbb{N}_{>0}$ and $\eta^{(j)}_{t,\tau}$ is stationary for all $j = 1,\dots,J$. The stationarity requirements on the autoregressive coefficients are the same as for ARMA models. By making $\eta_{t,\tau}$ a mixture of autoregressive components, we can capture any long-memory type behavior that $\lambda_{t,\tau}$ may exhibit, given the statistically significant and slowly decaying autocorrelation we observed in the data. The significance of the parameters of these components is tested by standard methods of model selection in Section 5.4. Although we defined $\eta_{t,\tau}$ in the most general form of $J$ autoregressive components, we report in Section 5.4 that $J = 2$ works well empirically in our application. Moreover, we illustrate in Section 5.3 that $\eta_{t,\tau}$ with $J = 2$ fitted to our data exhibits long memory.

For the identifiability of each component, we assume that $\mu_{t,\tau}$ is less sensitive to changes in $u_{t,\tau-1}$ than $\eta^{(1)}_{t,\tau}$, which is in turn less sensitive than $\eta^{(2)}_{t,\tau}$. Thus we impose the parameter restrictions $\kappa_\mu < \kappa^{(1)}_\eta < \kappa^{(2)}_\eta$. Moreover, the scale of trade volume should increase in the wake of positive news. Thus we impose $\kappa_\mu > 0$.

3.4 Dynamic cubic spline $s_{t,\tau}$

The periodic component $s_{t,\tau}$ captures the diurnal U-shaped patterns. Diurnal patterns have been modeled in the financial literature mainly by the use of either the Fourier representation, or deterministic splines, or sample moments for each $\tau = 1,\dots,I$. See, for instance, Andersen and Bollerslev (1998), Campbell and Diebold (2005), Engle and Rangel (2008), Brownlees et al. (2011), and Engle and Sokalska (2012), among many others. We pay attention to the following disadvantages of the existing methods for estimating diurnal patterns. First, they generally treat diurnal patterns as deterministic and time-invariant, which can be problematic if the patterns change randomly over time as markets evolve. Second, they usually require the diurnal U-shaped component and other stochastic components to be estimated in isolation by a two-step procedure, which may affect the asymptotic properties of statistical tests.

In specifying $s_{t,\tau}$, we overcome these issues by applying the dynamic cubic spline model of Harvey and Koopman (1993). The main benefits of using this cubic spline are twofold. First, it allows for the possibility that the diurnal U-shaped patterns may evolve over time. Second, the diurnal patterns can be estimated very easily and simultaneously with all other components of the model by the method of maximum likelihood.

Harvey and Koopman (1993) develop the spline as a contemporaneous filter whose dynamics are driven by a contemporaneous Gaussian noise. In this paper, we make a subtle difference by letting it be a predictive filter driven by the lagged score $u_{t,\tau-1}$. Thus the identification restrictions of our filter are also different. In order to formally define the dynamic cubic spline $s_{t,\tau}$, we first give the definition of a special case, termed a static spline, in which the pattern of periodicity does not change over time. Some of the technical details are omitted in the following sections; we give the complete mathematical construction in Appendix C.

⁸ In other applications, this low-frequency component $\mu_{t,\tau}$ can be an integrated random walk. See Harvey (2013, p. 92).

3.4.1 Static daily spline

The cubic spline is termed a daily spline if its periodicity is complete over one trading day. The daily spline is a continuous piecewise function of time connected at $k+1$ knots for some $k \in \mathbb{N}_{>0}$ such that $k < I$. The coordinates of the knots along the time axis are denoted $\tau_0 < \tau_1 < \cdots < \tau_k$, where $\tau_0 = 0$, $\tau_k = I$, and $\tau_j \in \{1,\dots,I-1\}$ for $j = 1,\dots,k-1$. The set of knots is also called the mesh. The y-coordinates (heights) of the knots are denoted $\gamma = (\gamma_0,\gamma_1,\dots,\gamma_k)^\top$. As the spline is periodic, we impose $\gamma_0 = \gamma_k$ so that $\gamma_0$ is redundant in estimation. Using the notation $\gamma^\dagger = (\gamma^\dagger_1,\dots,\gamma^\dagger_k)^\top = (\gamma_1,\dots,\gamma_k)^\top$, the static daily spline (i.e. $s_{t,\tau} = s_\tau$) is defined as

$$ s_\tau = \sum_{j=1}^{k} \mathbb{1}_{\{\tau\in[\tau_{j-1},\tau_j]\}}\, z_j(\tau)\cdot\gamma^\dagger, \qquad \tau = 1,\dots,I, \qquad (4) $$

where $z_j : [\tau_{j-1},\tau_j] \to \mathbb{R}^k$ for $j = 1,\dots,k$ is a $k$-dimensional vector of deterministic functions that conveys all information about the polynomial order, continuity, periodicity, and zero-sum conditions of our static spline. See Appendix C for the derivation of this variable. The zero-sum condition ensures that the parameters in $\gamma^\dagger$ are identified. To impose the zero-sum condition, we also need to set

$$ \gamma^\dagger_k = -\sum_{i=1}^{k-1} w^*_i\gamma^\dagger_i \big/ w^*_k, $$

where $w^* = (w^*_1,\dots,w^*_k)^\top$ is defined in Appendix C.

For a daily spline, we always fix $\tau_0 = 0$ at the market opening and $\tau_k = I$ at closure. As NYSE is open between 9.30am and 4.00pm, this means that we fix $\tau_0$ at 9.30am and $\tau_k$ at 4pm. The locations of $\tau_1,\dots,\tau_{k-1}$ and the size of $k$ depend on a number of factors such as the empirical features of the diurnal pattern, the number of intra-day observations, and estimation efficiency. Increasing the number of knots can sometimes improve the fit of the model, but using too many knots deteriorates estimation efficiency. In our subsequent analysis, we find that using knots located at

{9.30am, 11am, 12.30pm, 2.30pm, 4pm}

works well in terms of estimation efficiency and goodness-of-fit. The shape of the spline up to 12.30pm captures the busy trading hours in the morning, the shape between 12.30pm and 2.30pm captures the quiet lunch hours, and the shape after 2.30pm captures any acceleration in trading activity before closure. By the periodicity condition of the spline, we set the coordinate $(\tau_k, \gamma^\dagger_k) = (\tau_k, \gamma_k)$ at 4pm to be the same as $(\tau_0, \gamma_0)$ at 9.30am. In this case, we have $k = \dim(\gamma^\dagger) = 4$. The estimation efficiency deteriorates quickly, while there is little or no improvement in goodness-of-fit, when the number of knots per day is increased beyond this specification.
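The loading functions $z_j(\tau)$ that make (4) a proper cubic spline are derived in Appendix C. Purely to illustrate what the resulting periodic curve looks like over one trading day, the sketch below interpolates a set of hypothetical knot heights on the mesh above using scipy's periodic cubic spline; this stand-in does not impose the zero-sum identification condition of the model and is not the Harvey-Koopman construction itself.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Knot times in minutes after 9.30am for the mesh {9.30am, 11am, 12.30pm, 2.30pm, 4pm}.
knot_times = np.array([0.0, 90.0, 180.0, 300.0, 390.0])
# Hypothetical knot heights; periodicity requires the first and last heights to match.
knot_heights = np.array([0.8, 0.1, -0.5, -0.2, 0.8])

spline = CubicSpline(knot_times, knot_heights, bc_type="periodic")
s_tau = spline(np.arange(1, 391))   # the periodic component evaluated minute by minute
```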

3.4.2 Weekly spline

The cubic spline becomes a weekly spline if we set the periodicity to be complete over one trading week instead of one day. In this case, we redefine $\tau_0, \tau_1, \dots, \tau_k$ as follows. We let $\tilde\tau_0 < \tilde\tau_1 < \cdots < \tilde\tau_{k'}$ denote the coordinates along the time axis of the intra-day mesh, where $k' < I$, $\tilde\tau_0 = 0$, $\tilde\tau_{k'} = I$, and $\tilde\tau_j \in \{1,\dots,I-1\}$ for $j = 1,\dots,k'-1$. Then the coordinates $\tau_0, \tau_1, \dots, \tau_k$ along the time axis of the total mesh for the whole trading week are defined by $\tau_0 = \tilde\tau_0$ and $\tau_{ik'+j} = \tilde\tau_j$ for $i = 0,\dots,4$ and $j = 1,\dots,k'$. This means that $(\tau_j)_{j=1}^k$ is no longer an increasing sequence. Thus we have $k = 5k'$ over one trading week. By the periodicity condition of the spline, we set the coordinate $(\tau_k, \gamma_k)$ at 4pm on Friday to be the same as $(\tau_0, \gamma_0)$ at 9.30am on Monday.

The weekly spline allows diurnal patterns to differ between weekdays and gives scope to formally test whether the day-of-the-week effect is operative in the data. The day-of-the-week effect is a type of seasonality that may or may not manifest itself in a given set of financial data. Andersen and Bollerslev (1998) test for the presence of such an effect in high-frequency returns of the deutsche mark-dollar exchange rate (DM-D) using day-of-the-week dummies. Lo and Wang (2010) also use dummy variables to test for the day-of-the-week effect in weighted daily turnover⁹ and returns of indexes of NYSE and AMEX¹⁰ ordinary common shares. In essence, seasonal dummies capture the day-of-the-week effect by allowing for a level shift on each day in the variable being estimated. In contrast, our model captures seasonality by allowing the coordinates (and hence the shape) of the cubic spline to differ between weekdays. Thus we check for the day-of-the-week effect by testing whether the diurnal U-shaped patterns are significantly different between weekdays. More formally, we can test for the day-of-the-week effect by a likelihood ratio test under the null hypothesis

$$ H_0 : (\gamma_1, \gamma_2, \dots, \gamma_{k'})^\top = (\gamma_{k'+1}, \gamma_{k'+2}, \dots, \gamma_{2k'})^\top = \cdots = (\gamma_{4k'+1}, \gamma_{4k'+2}, \dots, \gamma_{5k'})^\top. $$

The alternative hypothesis replaces $=$ by $\neq$. The likelihood ratio statistic is asymptotically chi-square distributed with $4k'$ degrees of freedom under the null. The way in which our weekly spline captures the day-of-the-week effect appears more intuitive than the use of dummy variables, as seasonality is embedded in the diurnal U-shaped patterns. However, a major disadvantage of the weekly spline is that the total number of knots for one week increases quickly (fivefold) with the number of intra-day knots. If we use 5 knots per trading day ($k' = 4$), as in the daily spline, the total number of knots for the week is 25 ($k = 20$). Estimating this many knots can be computationally highly inefficient, especially when the sampling frequency is high.

⁹ Turnover is another measure of trading activity. It is trading volume divided by the total number of shares outstanding.
¹⁰ Abbreviation of the American Stock Exchange.
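A small sketch of this test, assuming the two maximized log-likelihoods have already been computed (the function name is hypothetical):

```python
from scipy.stats import chi2

def day_of_week_lr_test(loglik_weekly, loglik_restricted, k_prime):
    """Likelihood ratio test of H0 (identical knot heights on all weekdays).

    loglik_weekly     : maximized log-likelihood of the unrestricted weekly spline
    loglik_restricted : maximized log-likelihood under H0 (one common daily pattern)
    k_prime           : number of free intra-day knots, so the test has 4*k_prime df
    """
    lr_stat = 2.0 * (loglik_weekly - loglik_restricted)
    p_value = chi2.sf(lr_stat, df=4 * k_prime)
    return lr_stat, p_value
```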

3.4.3 Restricted weekly spline

Instead of letting the coordinates of the mesh be free for all weekdays, we can restrict them to be the same on selected days, so that the diurnal pattern is allowed to differ only on certain days of the week. We call this the restricted weekly spline. In this paper, we restrict the diurnal pattern to be the same on mid-weekdays (Tuesday-Thursday) and let the pattern be different on Monday and Friday. This restricted weekly spline captures the weekend effect, which, in the context of this paper, is the tendency of trading volume on Fridays and Mondays to display distinct patterns compared to mid-weekdays. More specifically, one may suspect that trading volume tends to accelerate more on Friday afternoons and begin higher on Monday mornings than on other weekdays, because the amount of news disseminated on Saturday and Sunday may be greater than during any overnight period within the week. The weekend effect can be viewed as a special case of the day-of-the-week effect.

It appears that the evidence for the day-of-the-week effect in financial data is mixed or generally weak, and largely depends on the estimation method and the variable being tested. For instance, Andersen and Bollerslev (1998) find that the day-of-the-week effect is insignificant in the DM-D returns once the calendar effect (e.g. daylight saving and public holidays) and the effects of major macroeconomic announcements are taken into account. However, they find indications of a weak (but clear) seasonality on Monday mornings and Friday afternoons. Lo and Wang (2010) find that returns exhibit a strong day-of-the-week effect. They also find that turnover is roughly constant across days, although Mondays and Fridays tend to have slightly lower turnover than mid-weekdays. The weekend effect is well-documented in many other studies (mainly in the context of asset returns)¹¹ and appears to be more pronounced than the more general day-of-the-week effect. Thus, given the computational cost of the full-blown weekly spline, estimating the restricted weekly spline may be adequate if the ultimate goal is to establish a good forecasting model. Formally, we define (the static version of) the restricted weekly spline as:

$$ s_\tau = \sum_{j=1}^{k}\mathbb{1}_{\{\tau\in[\tau_{j-1},\tau_j]\}}\, z_j(\tau)\cdot S\gamma^\dagger \qquad (5) $$

where $\gamma^\dagger = (\gamma^{\dagger\top}_1, \gamma^{\dagger\top}_2, \gamma^{\dagger\top}_3)^\top$ and $\gamma^\dagger_i$ for $i = 1, 2, 3$ are $k'\times 1$ vectors of mesh heights for Monday, mid-weekdays (Tuesday-Thursday), and Friday, respectively. $S$ is the following $5k'\times 3k'$ matrix with entries zeros and ones:

$$ S = \begin{pmatrix} I_{k'\times k'} & 0 & 0 \\ 0 & I_{k'\times k'} & 0 \\ \vdots & \vdots & \vdots \\ 0 & I_{k'\times k'} & 0 \\ 0 & 0 & I_{k'\times k'} \end{pmatrix}. $$

We can rewrite (5) as

$$ s_\tau = \sum_{j=1}^{k}\mathbb{1}_{\{\tau\in[\tau_{j-1},\tau_j]\}}\, \tilde{z}_j(\tau)\cdot\gamma^\dagger \qquad (6) $$

where $\tilde{z}_j(\tau) = S^\top z_j(\tau)$.
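The selection matrix $S$ can be built directly from its definition; the sketch below (with a hypothetical helper name) constructs it with numpy and notes how it maps $z_j(\tau)$ into $\tilde{z}_j(\tau)$.

```python
import numpy as np

def restricted_weekly_selection(k_prime):
    """Build the 5k' x 3k' selection matrix S of Section 3.4.3: Monday loads on the
    first block of knot heights, Tuesday-Thursday on the second, Friday on the third."""
    I = np.eye(k_prime)
    Z = np.zeros((k_prime, k_prime))
    return np.block([
        [I, Z, Z],   # Monday
        [Z, I, Z],   # Tuesday
        [Z, I, Z],   # Wednesday
        [Z, I, Z],   # Thursday
        [Z, Z, I],   # Friday
    ])

S = restricted_weekly_selection(4)   # k' = 4, as with the 5-knot daily mesh
# For a spline loading z_j(tau) of length 5k', the restricted loading is
# z_tilde_j(tau) = S.T @ z_j(tau), which multiplies the 3k' free knot heights.
```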

Finally, we can let $\gamma^\dagger$ be time-varying according to the dynamics in (7) with appropriate adjustments to the zero-sum conditions. In this case, we denote $\kappa^* = (\kappa^{*\top}_1, \kappa^{*\top}_2, \kappa^{*\top}_3)^\top$ and $\gamma^\dagger_{t,\tau} = (\gamma^{\dagger\top}_{1;t,\tau}, \gamma^{\dagger\top}_{2;t,\tau}, \gamma^{\dagger\top}_{3;t,\tau})^\top$ for $t = 1,\dots,T$ and $\tau = 0,\dots,I$, where $\dim(\kappa^*_j) = \dim(\gamma^\dagger_{j;t,\tau}) = k'$ for $j = 1, 2, 3$. If we place 5 knots per trading day, as we specified for the daily spline, we have $k' = 4$, giving $k = 12$.

3.4.4 Dynamic spline

The static daily spline (4) or the restricted weekly spline (6) becomes dynamic by letting $\gamma^\dagger$ be time-varying as

$$ s_{t,\tau} = \sum_{j=1}^{k}\mathbb{1}_{\{\tau\in[\tau_{j-1},\tau_j]\}}\, z_j(\tau)\cdot\gamma^\dagger_{t,\tau}, \qquad \gamma^\dagger_{t,\tau} = \gamma^\dagger_{t,\tau-1} + \kappa^*\, u_{t,\tau-1}\mathbb{1}_{\{y_{t,\tau-1}>0\}} \qquad (7) $$

for $\tau = 1,\dots,I$ and $t = 1,\dots,T$, where $\kappa^* = (\kappa^*_1,\dots,\kappa^*_k)^\top$ and $z_j(\tau)$ is to be replaced by $\tilde{z}_j(\tau)$ for the restricted weekly spline. Harvey and Koopman (1993) use a set of contemporaneous disturbances to drive the dynamics of $\gamma^\dagger_{t,\tau}$ instead of the lagged score. In terms of parameter identification, our dynamic spline in (7) still satisfies the zero-sum condition over one complete period due to the construction of $z_j(\tau)$, but we also need to impose the parameter restrictions

$$ \gamma^\dagger_{k;1,0} = -\frac{1}{w^*_k}\sum_{i=1}^{k-1} w^*_i\gamma^\dagger_{i;1,0}, \qquad \kappa^*_k = -\frac{1}{w^*_k}\sum_{i=1}^{k-1} w^*_i\kappa^*_i, \qquad (8) $$

where $\gamma^\dagger_{i;1,0}$ denotes the $i$-th element of $\gamma^\dagger_{1,0}$. See Appendix C.

¹¹ See Cross (1973), French (1980), Gibbons and Hess (1981), Harris (1986), Jaffe and Westerfield (1985), Keim and Stambaugh (1984), Lakonishok and Levi (1982), and Lakonishok and Smidt (1988).
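A sketch of the knot-height update in (7) together with the zero-sum completion in (8), assuming the weight vector $w^*$ from Appendix C is available (the function names are hypothetical):

```python
import numpy as np

def update_knot_heights(gamma_prev, kappa_star, u_prev, y_prev):
    """One step of (7): gamma_{t,tau} = gamma_{t,tau-1} + kappa* u_{t,tau-1} 1{y_{t,tau-1} > 0}."""
    return gamma_prev + kappa_star * u_prev * float(y_prev > 0)

def complete_zero_sum(free_values, w_star):
    """Append the k-th element implied by the zero-sum restrictions in (8), given the
    first k-1 free elements of gamma_{1,0} (or of kappa*) and the weight vector w*."""
    last = -np.dot(w_star[:-1], free_values) / w_star[-1]
    return np.append(free_values, last)
```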

3.4.5 Public holidays

Public holidays can be treated in the same way as overnight periods or weekends, where data is missing due to market closure. Whenever the sampling period includes public holidays, we leave $\lambda_{t,\tau}$ unchanged by setting

$$ \mu_{t,\tau} = \mu_{t,\tau-1}, \qquad \eta^{(1)}_{t,\tau} = \eta^{(1)}_{t,\tau-1}, \qquad \eta^{(2)}_{t,\tau} = \eta^{(2)}_{t,\tau-1}, \qquad \gamma^\dagger_{t,\tau} = \gamma^\dagger_{t,\tau-1} $$

for all $\tau = 1,\dots,I$ whenever $t \in H \equiv \{t \in \mathbb{N}_{>0} : t \text{ is a public holiday}\}$. The joint log-likelihood is defined only for days $t \in H^c$.

3.5 Summary of the model specification

To summarize, our intra-day DCS model with dynamic spline is

$$
\begin{aligned}
y_{t,\tau} &= \varepsilon_{t,\tau}\exp(\lambda_{t,\tau}), \qquad \varepsilon_{t,\tau}\,|\,\mathcal{F}_{t,\tau-1} \sim \text{iid } F \\
\lambda_{t,\tau} &= \delta + \mu_{t,\tau} + \eta_{t,\tau} + s_{t,\tau} \\
\mu_{t,\tau} &= \mu_{t,\tau-1} + \kappa_\mu u_{t,\tau-1}\mathbb{1}_{\{y_{t,\tau-1}>0\}} \\
\eta_{t,\tau} &= \sum_{j=1}^{J}\eta^{(j)}_{t,\tau} \qquad\qquad (9) \\
\eta^{(j)}_{t,\tau} &= \phi^{(j)}_1\eta^{(j)}_{t,\tau-1} + \phi^{(j)}_2\eta^{(j)}_{t,\tau-2} + \cdots + \phi^{(j)}_{m^{(j)}}\eta^{(j)}_{t,\tau-m^{(j)}} + \kappa^{(j)}_\eta u_{t,\tau-1}\mathbb{1}_{\{y_{t,\tau-1}>0\}}, \qquad j = 1, 2 \\
s_{t,\tau} &= \sum_{j=1}^{k}\mathbb{1}_{\{\tau\in[\tau_{j-1},\tau_j]\}}\, z_j(\tau)\cdot\gamma^\dagger_{t,\tau} \\
\gamma^\dagger_{t,\tau} &= \gamma^\dagger_{t,\tau-1} + \kappa^*\, u_{t,\tau-1}\mathbb{1}_{\{y_{t,\tau-1}>0\}}
\end{aligned}
$$

for $\tau = 1,\dots,I$ and $t = 1,\dots,T$, where $F$ is as defined in Section 3.2, $s_{t,\tau}$ is either a daily or weekly spline, $u_{t,\tau}$ is the score of the conditional distribution $F^*$, and $z_j(\tau)$ is to be replaced by $\tilde{z}_j(\tau)$ for the restricted weekly spline. Parameters must also satisfy the identifiability and stationarity restrictions specified above.
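Putting the pieces together, the following sketch advances the link-parameter recursion in (9) by one intra-day step. The spline loadings $z(\tau)$ and the score function are taken as given (they depend on Appendix C and the choice of $F^*$), the two $\eta$ components are simplified to first order, and all function and argument names are illustrative rather than the author's code.

```python
import numpy as np

def one_step(state, y_prev, z_tau, score, params):
    """Advance lambda_{t,tau} in (9) by one step, using J = 2 first-order eta components
    for simplicity.  `state` holds (mu, eta1, eta2, gamma, lam_prev); `z_tau` is the
    spline loading vector at the current intra-day time; `score(y, lam)` returns u."""
    mu, eta1, eta2, gamma, lam_prev = state
    # Zero observations are treated as missing: the score impulse is switched off.
    u_prev = score(y_prev, lam_prev) if y_prev > 0 else 0.0
    mu = mu + params["kappa_mu"] * u_prev                  # random-walk component
    eta1 = params["phi1"] * eta1 + params["kappa1"] * u_prev
    eta2 = params["phi2"] * eta2 + params["kappa2"] * u_prev
    gamma = gamma + params["kappa_star"] * u_prev          # dynamic spline knot heights, eq. (7)
    s = float(np.dot(z_tau, gamma))                        # periodic component s_{t,tau}
    lam = params["delta"] + mu + eta1 + eta2 + s
    return (mu, eta1, eta2, gamma, lam), np.exp(lam)       # new state and scale alpha_{t,tau}
```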

4 Maximum likelihood estimation

All parameters of the model can be estimated by the method of maximum likelihood. Andres and Harvey (2012) show the asymptotic consistency and normality of the maximum likelihood estimator for the basic DCS model (2), where the scale parameter $\alpha_{t|t-1}$ (or $\lambda_{t|t-1}$) is stationary. In this paper, we do not formally verify that their asymptotic results also extend to our particular specification with a nonstationary scale parameter. This is left as an interesting topic for future research.


4.1 Initialization of maximum likelihood

We choose the initial values of the components of $\lambda_{1,0}$ as follows.

  • $\eta_{1,0}$: as we have $E[\eta^{(1)}_{t,\tau}] = E[\eta^{(2)}_{t,\tau}] = 0$ by the definition of $\eta_{t,\tau}$, we set $\eta_{1,0} = 0$.

  • $\mu_{1,0}$: as $\mu_{t,\tau}$ is assumed to be a random walk, we have $E[\mu_{t,\tau}] = \mu_{1,0}$. So we impose the constraint $\mu_{1,0} = 0$ for $\delta$ to be identified.

  • $s_{1,0}$ or $\gamma^\dagger_{1,0}$: we treat the elements of $\gamma^\dagger_{1,0}$ as unknown constant parameters to be estimated simultaneously with all other parameters of the model. We impose the zero-sum constraint $w^*\cdot\gamma^\dagger_{1,0} = 0$ as specified in (8).

As a preview of what we are going to see later, Figure 5 compares the estimated daily and restricted weekly splines with static coordinates $\gamma^\dagger$ and with the dynamic coordinates $\gamma^\dagger_{t,\tau}$. We find that $s_{t,\tau}$ successfully captures the diurnal U-shaped patterns on every trading day in all cases, and that

  • the daily spline with static coordinates ($\gamma^\dagger$) restricts diurnal patterns to be the same for all trading days (the top left panel);

  • the restricted weekly spline with static coordinates ($\gamma^\dagger$) allows diurnal patterns to vary during the week but restricts the weekly pattern to be fixed for all trading weeks (the bottom left panel); and

  • the dynamic splines with time-varying coordinates ($\gamma^\dagger_{t,\tau}$) are more flexible in capturing changing diurnal patterns over different trading days than the static splines. The restricted weekly spline appears more dynamic than the daily spline (the right panels).


Figure 5 – s_{t,τ}: static daily spline (top left), dynamic daily spline (top right), static restricted weekly spline (bottom left), and dynamic restricted weekly spline (bottom right). Obtained by fitting variants of Model 2 of Table 2 to IBM30s. 28 February - 31 March 2000. Shaded regions distinguish different weeks of the sample period.



Figure 6 – Empirical cdf of ε_{t,τ} > 0 against the cdf of Burr(ν, ζ) (left). Empirical cdf of the PIT of ε_{t,τ} > 0 computed under F∗(·; θ∗) ∼ Burr(ν, ζ) (right). IBM1m over 28 February - 31 March 2000. Using Model 2 as specified in Table 2.

5 Estimation results

This section reports the estimation results of fitting our model to IBM30s and IBM1m. As before, the sampling period is Monday 28 February to Friday 31 March 2000, which covers five consecutive trading weeks and no public holidays.

5.1 Choice of F ∗: general-to-specific approach

Given the shape of the empirical distributions of the data, distributions in the GG, GB, and Pareto classes are among our candidate F∗ for both IBM1m and IBM30s. Note that the GB distribution is closely related to the Pareto class of distributions. See Appendix A and Figure 4 for the formal definitions of these distributions and the relationship between them. Taking a general-to-specific approach, we sequentially estimate GB, Burr, and then log-logistic from the GB class of distributions. Within the GG class of distributions, we sequentially estimate GG, and then Gamma and Weibull.

We find that the GB distribution is difficult to estimate as it has three parameters in θ∗. Burr fits both IBM30s and IBM1m very well. As the GG distribution is an asymptotic distribution of GB with large ζ, GG can be considered a special class of GB. However, the goodness of fit of the GG class of distributions is generally found to be inferior to that of the GB class. The inferior fit of GG is consistent with the estimated size of the GB parameter ζ, which is far from being large for both IBM1m and IBM30s.

Figure 6 illustrates the impressive fit of Burr for IBM1m. The quality of fit for IBM30s is equally impressive. The empirical cdf of non-zero ε_{t,τ} appears to overlap the cdf of Burr(ν, ζ) almost everywhere12 so that these two lines are visually indistinguishable. The empirical cdf of the PIT values of non-zero ε_{t,τ} computed under F∗(·; θ∗) ∼ Burr(ν, ζ) lies along the diagonal, indicating that the distribution of the PIT is very close to the standard uniform distribution (denoted U[0, 1]). As an experiment, we have also fitted many other distributions including F, inverse-Gamma, and log-Cauchy. However, the estimation results are inferior to the results of Burr because either the residual PIT values of these distributions become far from uniform or we cannot remove serial dependence of the score.
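As a hedged illustration of how such a PIT check can be computed, the sketch below evaluates the standard Burr cdf (the ξ = 1 case of the GB distribution of Appendix A) at positive residuals and measures the largest gap between the empirical cdf of the PIT values and the U[0, 1] cdf. The residuals here are simulated; in the application they would be the positive ε_{t,τ} with (ν, ζ) set to the maximum likelihood estimates.

```python
import numpy as np

def burr_cdf(x, nu, zeta):
    """CDF of the standard Burr(nu, zeta): F(x) = 1 - (1 + x**nu)**(-zeta)."""
    return 1.0 - (1.0 + x**nu) ** (-zeta)

def pit_uniformity_gap(eps_pos, nu, zeta):
    """Largest gap between the empirical CDF of the PIT values and the U[0,1] CDF
    (the kind of comparison plotted in the right panel of Figure 6)."""
    pit = np.sort(burr_cdf(eps_pos, nu, zeta))
    n = pit.size
    ecdf_hi = np.arange(1, n + 1) / n
    ecdf_lo = np.arange(0, n) / n
    return max(np.abs(ecdf_hi - pit).max(), np.abs(pit - ecdf_lo).max())

# Illustration with simulated Burr(2.2, 1.1) residuals; a small gap indicates a good fit.
rng = np.random.default_rng(1)
u = rng.uniform(size=10_000)
eps_pos = ((1.0 - u) ** (-1.0 / 1.1) - 1.0) ** (1.0 / 2.2)
print(pit_uniformity_gap(eps_pos, nu=2.2, zeta=1.1))
```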

5.2 Comparing with log-normality

A standard approach to non-negative time series in the financial literature is to fit the log-normal distribution to data whenever the logarithm of the observations roughly resembles normality. In such cases, the degree of efficiency and bias in the estimated parameters depends on how far

12In a non-mathematical sense.



Figure 7 – The frequency distributions of log(IBM1m) (left) and log(IBM30s) (right) at non-zero observations. The series are re-centered around the mean and standardized by one standard deviation.


Figure 8 – Empirical cdf of the PIT values of log(IBM1m) (left) and log(IBM30s) (right) at non-zero observations computed under the standard normal assumption. The series are re-centered around the mean and standardized by one standard deviation.


Figure 9 – The QQ-plots of log(IBM1m) (left) and log(IBM30s) (right) at non-zero observations. The series are re-centered around the mean and standardized by one standard deviation.

the distribution of the log-variable is from normality. See, for example, Alizadeh et al. (2002) for a discussion and an empirical example. In our case, the estimation results of F∗ ∼ log-normal are inferior to the results of Burr for both IBM1m and IBM30s as the PIT values of the residuals turn out to be far from uniform.

Figure 7 shows the frequency distributions of the logarithm of non-zero IBM1m and IBM30s re-centered around the mean and standardized by one standard deviation (hereafter denoted log(IBM1m) and log(IBM30s)). While the frequency distribution of log(IBM1m) roughly resembles normality, log(IBM30s) looks far from normal. Figure 8 shows the empirical cdfs of the PIT values of log(IBM1m) and log(IBM30s) under the normality assumption. As the empirical cdfs meander around the diagonal but do not lie on it, the two series cannot be normal. The QQ-plots of log(IBM1m) and log(IBM30s) in Figure 9 show that the normal distribution appears to put too much weight on the lower tail and too little weight on the upper tail. Thus the failure of F∗ ∼ log-normal can be ascribed to the departure of the empirical distributions of log(IBM1m) and log(IBM30s) from normality, particularly around the tail regions. The superior performance of Burr over log-normal is presumably because the shape of log-normal is



Figure 10 – Comparing exp-Burr(ν, ζ) against normal with parameter σ (denoted N(0, σ)). ν and ζ are obtained by fitting Model 2 to IBM30s (left) and IBM1m (right). See Footnote 14 for the pdf of exp-Burr.

determined by only one parameter (σ) while Burr has two shape parameters (ν and ζ). This makes the shape of Burr more flexible than log-normal.13

To compare the shape of Burr and log-normal, Figure 10 shows the pdf of the exponential-Burr distribution (denoted exp-Burr)14 against the normal distribution at different values of σ.15 For IBM30s, we find that the normal distribution is dissimilar to exp-Burr at any value of σ. For IBM1m, although normal at σ = 0.7 appears close to exp-Burr, the symmetric shape of the normal distribution contrasts with the asymmetry of exp-Burr.
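A small numerical sketch of this comparison, assuming the exp-Burr pdf of Footnote 14 and shape values close to the IBM1m estimates in Table 3:

```python
import numpy as np

def exp_burr_pdf(x, nu, zeta):
    """pdf of log(X) when X follows the standard Burr(nu, zeta); cf. Footnote 14."""
    return zeta * nu * np.exp(x * nu) * (np.exp(x * nu) + 1.0) ** (-1.0 - zeta)

def normal_pdf(x, sigma):
    return np.exp(-0.5 * (x / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

x = np.linspace(-3.0, 3.0, 601)                        # the range shown in Figure 10
gap = np.abs(exp_burr_pdf(x, 2.2, 1.1) - normal_pdf(x, 0.7)).max()
print(gap)   # how far N(0, 0.7) is from exp-Burr with IBM1m-like shape values
```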

5.3 Long memory

We find that empirically a two-component specification for η_{t,τ} is the most effective in capturing the autocorrelation structure of the data for both IBM1m and IBM30s. Including autoregressive lags of order two for η^{(1)}_{t,τ} and one for η^{(2)}_{t,τ} was found to be appropriate for both IBM1m and IBM30s. (Henceforth, we denote this specification of η_{t,τ} as η_{t,τ} ∼ AR(2)+AR(1).) That is, the following η_{t,τ} specification

  η_{t,τ} = η^{(1)}_{t,τ} + η^{(2)}_{t,τ},  (t, τ) ∈ Ψ_{T,I}
  η^{(1)}_{t,τ} = φ^{(1)}_1 η^{(1)}_{t,τ−1} + φ^{(1)}_2 η^{(1)}_{t,τ−2} + κ^{(1)}_η u_{t,τ−1} 1l{y_{t,τ−1} > 0}
  η^{(2)}_{t,τ} = φ^{(2)}_1 η^{(2)}_{t,τ−1} + κ^{(2)}_η u_{t,τ−1} 1l{y_{t,τ−1} > 0}

is our preferred specification, because it completely removes autocorrelation from the estimation residuals ε_{t,τ} and the score u_{t,τ}.

Figure 11 shows that the strong and slowly decaying autocorrelation in IBM1m and IBM30s is captured successfully by the model, so that the estimation residuals ε_{t,τ} and the score u_{t,τ} show no obvious signs of serial correlation. The absence of autocorrelation is also illustrated more formally by the Box-Ljung statistics in Table 2, which summarizes statistics for assessing

13For the formal definitions of Burr and log-normal, see Appendix A.
14If a random variable X follows the standard Burr distribution with parameters ν and ζ, we say that log(X) follows the exponential-Burr distribution with the pdf

  f(x; ν, ζ) = ν exp(xν)(exp(xν) + 1)^{−1−ζ} / B(1, ζ),  x ∈ R, and ν, ζ > 0,

where B(·, ·) denotes the Beta function.
15If a random variable X follows the standard log-normal distribution with parameter σ, log(X) follows the normal distribution with mean 0 and standard deviation σ.


the goodness-of-fit of our model fitted to IBM1m and IBM30s.16 Whenever we change the specification from η_{t,τ} ∼ AR(1) to η_{t,τ} ∼ AR(2)+AR(1), AIC and SIC17 fall and autocorrelation in u_{t,τ} is removed. Note that u_{t,τ} exhibits stronger serial correlation than ε_{t,τ}. This is because the score down-weighs the effects of extreme observations (and is thus robust to them).


Figure 11 – Sample autocorrelation of trade volume (left), of ε_{t,τ} (middle), and of u_{t,τ} (right). IBM1m (top row) and IBM30s (bottom row). Using Model 2 as specified in Table 2. The 95% confidence interval is computed at ±2 standard errors. Sample period: 28 February - 31 March 2000.

16In Table 2, Q(K) is the Box-Ljung statistic based on the first K autocorrelations computed under the null hypothesis of no autocorrelation up to order K. Q(K) is compared against the chi-square distribution with K − q degrees of freedom (denoted χ²_{K−q}), where q is the number of dynamic parameters of the model (e.g. the basic DCS in (1) has two dynamic parameters φ and κ). We report the results for K ≈ √(T × I) and some deviations above it following Harvey (1993, pp. 76). * and ** denote significance at the 5% and 1% levels, respectively.
17AIC and SIC stand for the Akaike and Schwarz information criteria, respectively.


|  | IBM1m Model 1 | IBM1m Model 2 | IBM1m Model 3 | IBM1m Model 4 | IBM30s Model 1 | IBM30s Model 2 | IBM30s Model 3 | IBM30s Model 4 |
|---|---|---|---|---|---|---|---|---|
| Type of spline | Daily | Daily | Weekly | Weekly | Daily | Daily | Weekly | Weekly |
| η_{t,τ} | AR(1) | AR(2)+AR(1) | AR(1) | AR(2)+AR(1) | AR(1) | AR(2)+AR(1) | AR(1) | AR(2)+AR(1) |
| Number of dynamic coef. (q) | 6 | 9 | 14 | 17 | 6 | 9 | 14 | 17 |
| Maximum log-like | -104,158 | -104,136 | -104,139 | -104,123 | -195,540 | -195,509 | -195,524 | -195,498 |
| AIC | 21.314 | 21.310 | 21.313 | 21.310 | 20.031 | 20.028 | 20.031 | 20.029 |
| SIC | 21.323 | 21.322 | 21.334 | 21.334 | 20.036 | 20.035 | 20.043 | 20.042 |
| KS stat on ε_{t,τ} (a) | 1.242 | 0.968 | 1.051 | 1.086 | 0.903 | 0.924 | 0.868 | 0.870 |
| KS stat: p-value | 0.001* | 0.004* | 0.001* | 0.000* | 0.011 | 0.009* | 0.016 | 0.015 |
| Autocorrelation of ε_{t,τ}: Q(100) (b) | 96.3 | 92.8 | 98.5 | 92.8 |  |  |  |  |
| Q(150) | 154.0 | 152.9 | 156.5 | 150.5 | 90.3 | 88.9 | 90.7 | 87.6 |
| Q(200) | 197.1 | 197.6 | 202.3 | 197.2 | 116.0 | 114.0 | 117.5 | 113.5 |
| Autocorrelation of u_{t,τ}: Q(100) | 128.8** | 90.3 | 130.0** | 94.4 |  |  |  |  |
| Q(150) | 163.6 | 126.2 | 164.6* | 130.1 | 196.2** | 142.3 | 201.0** | 143.9 |
| Q(200) | 208.8 | 169.1 | 208.1 | 173.1 | 239.7* | 183.4 | 244.1** | 185.2 |

(a) Kolmogorov-Smirnov test statistic with finite sample adjustment by √(T × I) computed under F∗. Selected critical values obtained by simulation are tabulated in Appendix E. The simulation assumes three parameters α, ν, and ζ are estimated from data. * indicates significance at the 1% level.
(b) Box-Ljung statistic Q(K) based on the first K autocorrelations. H0: no autocorrelation up to order K. Compared against χ²_{K−q}, where the degrees of freedom are adjusted for the number of dynamic coefficients q. * and ** denote significance at the 5% and 1% levels, respectively.

Table 2 – Goodness-of-fit statistics. F∗ ∼ Burr and all specifications use a dynamic spline with 5 knots per day. "Weekly" refers to the restricted weekly spline.



Figure 12 – Autocorrelation function of ηt,τ of IBM1m.

We find that the autocorrelation function of η_{t,τ} decays slowly. The rate of decay compares with a hyperbolic decay as shown in Figure 12. This suggests that this component may be picking up the presence of long memory in the data. The estimate of the degree of fractional integration (d) obtained by fitting η_{t,τ} to a fractionally integrated filter is d = 0.173 (s.e. 0.079) for IBM1m and d = 0.140 (s.e. 0.057) for IBM30s, confirming that there is long memory operative in η_{t,τ} and hence in the trade volume series. Appendix D explains the properties of a fractionally integrated model and how we obtained d.
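Appendix D describes how d was obtained in this paper. As one common (and not necessarily identical) route, the log-periodogram regression of Geweke and Porter-Hudak (1983) can be sketched as follows, applied to the filtered η component; the bandwidth choice is an assumption of the sketch.

```python
import numpy as np

def gph_estimate(x, m=None):
    """Log-periodogram (GPH) estimate of the fractional integration order d.
    Regresses log I(w_j) on -log(4 sin^2(w_j / 2)) over the first m Fourier
    frequencies; the slope of the regression is the estimate of d."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if m is None:
        m = int(np.sqrt(n))                       # a common bandwidth choice
    w = 2.0 * np.pi * np.arange(1, m + 1) / n     # Fourier frequencies
    dft = np.fft.fft(x - x.mean())[1:m + 1]
    periodogram = (np.abs(dft) ** 2) / (2.0 * np.pi * n)
    regressor = -np.log(4.0 * np.sin(w / 2.0) ** 2)
    X = np.column_stack([np.ones(m), regressor])
    beta, *_ = np.linalg.lstsq(X, np.log(periodogram), rcond=None)
    return beta[1]                                # estimated d

# Usage: gph_estimate(eta_hat) applied to the filtered eta series.
```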

5.4 Estimated coefficients and diagnostics

Table 3 shows the estimated coefficients of Model 2 as specified in Table 2.18 In the rest of the paper, we refer to model specifications by the model numbers at the top of Table 2. For both IBM1m and IBM30s, we have:

• κ^{(2)}_η > κ^{(1)}_η > κ_µ > 0 as expected, which means that η^{(2)}_{t,τ} is more sensitive to changes in u_{t,τ−1} than η^{(1)}_{t,τ}, and that η^{(1)}_{t,τ} is more sensitive to u_{t,τ−1} than µ_{t,τ}.19

• 0 < φ^{(2)}_1 < 1, so η^{(2)}_{t,τ} ∼ AR(1) is stationary. The stationarity of η^{(1)}_{t,τ} ∼ AR(2) requires:20

  φ^{(1)}_1 + φ^{(1)}_2 < 1,  −φ^{(1)}_1 + φ^{(1)}_2 < 1,  and  φ^{(1)}_2 > −1.   (10)

It is easy to check that these conditions are satisfied by φ^{(1)}_1 and φ^{(1)}_2, so that η^{(1)}_{t,τ} is also stationary (a numerical check is sketched after this list).

• the estimate of the point-mass probability at zero is p = 0.0006 for IBM1m and p = 0.0047 for IBM30s. These estimates are consistent with the sample statistics in Table 1.

• νζ ≈ 2.5 for IBM1m and νζ ≈ 2.4 for IBM30s. This implies that only the first and second moments exist under F∗ ∼ Burr. (See Appendix A.1.) Thus the theoretical skewness does not exist for either IBM1m or IBM30s.

18The coefficients for all other model specifications are tabulated in Appendix F.
19In this application, these coefficient estimates are obtained without imposing parameter restrictions. In different applications, one may need to explicitly impose the identifying parameter restrictions.
20See, for example, Harvey (1993, pp. 19).
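The numerical check referred to in the list above is a direct evaluation of (10) and of the moment-existence condition m < νζ, using the IBM1m estimates of Table 3 (shown here for illustration only):

```python
# Check of the AR(2) stationarity conditions (10) and of moment existence under Burr.
phi1, phi2 = 0.417, 0.513            # AR(2) coefficients of eta^(1), IBM1m (Table 3)
nu, zeta = 2.229, 1.134              # Burr shape parameters, IBM1m (Table 3)

ar2_stationary = (phi1 + phi2 < 1) and (-phi1 + phi2 < 1) and (phi2 > -1)
print(ar2_stationary)                # True: eta^(1) is stationary

# The m-th moment of a Burr variable exists iff m < nu * zeta (Appendix A.1).
print(nu * zeta)                     # about 2.53, so only m = 1, 2 exist
```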


| Parameter | IBM1m | IBM30s | Parameter | IBM1m | IBM30s |
|---|---|---|---|---|---|
| κ_µ | 0.008 (-0.002) | 0.007 (0.002) | γ†_{1;1,0} | 0.094 (0.070) | 0.122 (0.070) |
| φ^{(1)}_1 | 0.417 (0.097) | 0.561 (0.129) | γ†_{2;1,0} | -0.464 (0.076) | -0.485 (0.079) |
| φ^{(1)}_2 | 0.513 (0.101) | 0.400 (0.129) | γ†_{3;1,0} | -0.233 (0.058) | -0.229 (0.058) |
| κ^{(1)}_η | 0.051 (0.010) | 0.053 (0.008) | δ | 9.841 (0.157) | 9.254 (0.181) |
| φ^{(2)}_1 | 0.619 (0.067) | 0.676 (0.046) | ν | 2.229 (0.033) | 1.635 (0.016) |
| κ^{(2)}_η | 0.065 (0.010) | 0.091 (0.009) | ζ | 1.134 (0.042) | 1.467 (0.042) |
| κ∗_1 | 0.000 (0.002) | 0.000 (0.001) | p | 0.0006 (0.0003) | 0.0047 (0.0005) |
| κ∗_2 | -0.003 (0.001) | -0.002 (0.001) |  |  |  |
| κ∗_3 | 0.000 (0.001) | 0.000 (0.001) |  |  |  |

Table 3 – Estimated coefficients of Model 2 for IBM1m and IBM30s over 28 February - 31 March 2000. Standard errors in parentheses are computed numerically.

5.4.1 Test for the weekend effect

In Table 2, we observe that SIC always increases (while the results are mixed for AIC) when we change the specification from the daily to the weekly spline. This is partly due to the inclusion of eight extra parameters in the restricted weekly spline, which is penalized severely by SIC. We can test whether the restricted weekly spline is preferred over the daily spline by a likelihood ratio test. The null hypothesis21 of the test is

  H0 : κ∗_1 = κ∗_2 = κ∗_3  and  γ†_{1;1,0} = γ†_{2;1,0} = γ†_{3;1,0}.

The alternative hypothesis replaces = by ≠. As dim(γ†_{j;1,0}) = dim(κ∗_j) = 4 for j = 1, 2, 3, the likelihood ratio statistics are compared against the chi-square distribution with sixteen degrees of freedom. If the null hypothesis is rejected, it means that there is statistical evidence of the weekend effect over our sampling period; that is, the diurnal patterns of Mondays and Fridays are statistically significantly different from the patterns of mid-weekdays.

Table 4 shows the test results for IBM1m and IBM30s. The null cannot be rejected at the 5% level for either series. However, it is important to note that the test statistic for IBM1m is only marginally insignificant, while the test statistic for IBM30s is comfortably outside the rejection region. This feature is interesting as it indicates that our inference about seasonal effects can be influenced by the aggregation interval. These results are roughly consistent with Lo and Wang (2010), whose estimation results indicate that the average turnover is roughly constant for all weekdays, in spite of being slightly lower on Mondays and Fridays. As the test results may depend not only on the sampling frequency but also on the sampling period, it would be interesting to apply this test to different sets of data collected over a variety of sampling periods.
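For concreteness, the likelihood ratio computation behind Table 4 can be sketched as follows, using the reported IBM1m log-likelihoods; scipy is assumed to be available for the chi-square tail probability.

```python
from scipy.stats import chi2

# Likelihood ratio test of the daily spline (restricted) against the restricted
# weekly spline (unrestricted), using the IBM1m log-likelihoods in Table 4.
loglik_restricted = -104_136.0
loglik_unrestricted = -104_123.0
df = 16

lr_stat = 2.0 * (loglik_unrestricted - loglik_restricted)
p_value = chi2.sf(lr_stat, df)
print(lr_stat, round(p_value, 3))    # 26.0 and roughly 0.054, as in Table 4
```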

Figure 13 shows the estimated daily spline s_{t,τ} of Model 2. We find that s_{t,τ} successfully captures the tendency of trade volume to be high in the morning, fall during the quiet lunch hours around 1pm, and pick up again in the afternoon. As it is a dynamic spline, the shape of the diurnal U-shaped pattern varies over time. The shape also appears similar for all weekdays, largely because the daily spline assumes that the diurnal patterns are not significantly different between weekdays.

21We can also perform the test using the static spline, in which case the null and alternative hypotheses of the test become

  H0 : γ†_1 = γ†_2 = γ†_3 (daily spline),   H1 : γ†_1 ≠ γ†_2 ≠ γ†_3 (restricted weekly spline).


|  | IBM1m | IBM30s |
|---|---|---|
| Model specification | Models 2 & 4 | Models 2 & 4 |
| Time-variation in spline | Dynamic (γ†_{t,τ}) | Dynamic (γ†_{t,τ}) |
| Degrees of freedom (df) | 16 | 16 |
| Loglike, restricted | -104,136 | -195,509 |
| Loglike, unrestricted | -104,123 | -195,498 |
| Likelihood ratio stat | 26.0 | 20.4 |
| χ²_df, p-value | 0.054 | 0.203 |

Table 4 – Likelihood ratio test for the weekend effect. Sample period: 28 February - 31 March 2000. Using static spline. Model specifications are in Table 2.

We can contrast the daily spline with the restricted weekly spline by comparing Figure 13 with Figure 14, which shows s_{t,τ} of Model 4. The left panel illustrates that the restricted weekly spline is more dynamic, especially on Mondays and Fridays, than the daily spline. Diurnal patterns are similar on mid-weekdays, largely because the restricted weekly spline assumes that the patterns are significantly different only on Mondays and Fridays. As illustrated in the right panel, the restricted weekly spline captures the tendency of trade volume to accelerate more before the Friday close and begin higher shortly after the Monday open compared to any other day, in spite of the weekend effect being insignificant (only marginally so for IBM1m) by the likelihood ratio test.


Figure 13 – s_{t,τ} of Model 2 for IBM1m over 6 - 31 March 2000 (top left). s_{t,τ} of a typical day, Tuesday 14 March, from market open to close (top right). s_{t,τ} of different weeks over 6 - 31 March 2000 superimposed (bottom left). The weekly average of s_{t,τ} over 6 - 31 March 2000 (bottom right). Time along the x-axes.



Figure 14 – s_{t,τ} of Model 4 fitted to IBM1m. s_{t,τ} of different weeks over 6 - 31 March 2000 superimposed (left). The weekly average of s_{t,τ} over 6 - 31 March 2000 (right). Time along the x-axes.

|  | IBM1m Model 2 | IBM1m Model 4 | IBM30s Model 2 | IBM30s Model 4 |
|---|---|---|---|---|
| Type of spline | Daily | Weekly | Daily | Weekly |
| Degrees of freedom (df) | 1 | 1 | 1 | 1 |
| Loglike, restricted | -104,141 | -104,128 | -195,601 | -195,591 |
| Loglike, unrestricted | -104,136 | -104,123 | -195,509 | -195,498 |
| Likelihood ratio stat | 10.8 | 10.8 | 185.2 | 185.8 |
| χ²_df, p-value | 0.001 | 0.001 | 0.000 | 0.000 |

Table 5 – Likelihood ratio statistics for the test of log-logistic against Burr. Sample period: 28 February - 31 March 2000. Model specifications are in Table 2.

5.4.2 General versus specific distribution

For both IBM1m and IBM30s, the estimates of the Burr parameter ζ are close to one. If ζ = 1, Burr becomes log-logistic. However, a likelihood ratio test of the null hypothesis H0 : ζ = 1 (restricted) against the alternative H1 : ζ ≠ 1 (unrestricted) suggests that the estimated Burr distribution is not log-logistic. (See Table 5.)

5.4.3 Dynamic versus static spline

Some of the elements in κ∗ appear statistically insignificant.22 We can assess whether the dynamic spline is preferred over the static spline by comparing AIC and SIC. They are tabulated in Table 6. We find that both AIC and SIC are slightly larger for the dynamic spline than for the static spline, partly due to the penalty for the inclusion of κ∗.23

22The statistical significance is based on numerical standard errors.
23It may be possible to test the joint significance of κ∗ formally using some of the classical likelihood-based tests. However, the asymptotic distribution of the test statistics may not be the usual chi-square distribution under the null hypothesis (e.g. it may be a mixture of chi-squares) as κ∗ lies on the boundary of the parameter space. (See, for example, Harvey (1991, pp. 236) for a discussion of this issue.) Verifying the properties of the likelihood-based test statistics in this context is left for further research.


|  | IBM1m static | IBM1m dynamic | IBM30s static | IBM30s dynamic |
|---|---|---|---|---|
| Model specification | —— | Model 2 | —— | Model 2 |
| Total number of coef. | 13 | 16 | 13 | 16 |
| Variation in spline | Static (γ†) | Dynamic (γ†_{t,τ}) | Static (γ†) | Dynamic (γ†_{t,τ}) |
| AIC | 21.308 | 21.310 | 20.0280 | 20.0281 |
| SIC | 21.318 | 21.322 | 20.033 | 20.035 |

Table 6 – Static versus dynamic splines. The criteria for the static spline are computed by setting κ∗ = 0 in Model 2. Model 2 is as specified in Table 2. F∗ ∼ Burr.

5.5 Remark on the Kolmogorov-Smirnov test

It is interesting to contrast the quality of the fit of our model illustrated above with the Kolmogorov-Smirnov (KS) goodness-of-fit test24 in Table 2.

Despite the impressive fit of Burr demonstrated by the residual PITs in Figure 6, the KS test always rejects Burr at the 1% level for IBM1m. This partly reflects the fact that the size of the KS test at a given critical value tends to be smaller when the distribution being tested is generalized to include more unknown parameters in θ. For IBM30s, the KS statistics of Models 1, 3, and 4 are insignificant at the 1% level, so that the null hypothesis of F∗ ∼ Burr cannot be rejected at the 1% level. This quality of fit for IBM30s is quite impressive. Note that Models 1 and 3 cannot be the best specification as u_{t,τ} is significantly serially correlated. In all cases, Burr is rejected at the more conventional significance level of 5% for both IBM1m and IBM30s.

Our application illustrates that, when parameters are estimated from data, the KS test can be found to be too stringent in the sense that it rejects H0 for negligible deviations of the empirical distribution of ε_{t,τ} from F∗, especially when the sample size is very large. The test becomes even more stringent at a given sample size when the dimension of θ∗ increases. We find that the KS test is generally not a practical mode of model selection in this type of application.
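The critical values used in Table 2 are described in Appendix E. The sketch below outlines one generic parametric-bootstrap (Lilliefors-type) way of simulating such critical values when the Burr parameters are re-estimated from each simulated sample; it relies on scipy's burr12 distribution, whose shape parameters play the role of (ν, ζ), and it omits the finite-sample scaling and the treatment of the zero point mass used in the paper.

```python
import numpy as np
from scipy import stats

def simulated_ks_critical_value(n, nu, zeta, n_rep=500, level=0.01, seed=0):
    """Parametric-bootstrap critical value of the KS statistic when the Burr
    parameters are re-estimated from each simulated sample."""
    rng = np.random.default_rng(seed)
    ks_stats = np.empty(n_rep)
    for r in range(n_rep):
        sample = stats.burr12.rvs(nu, zeta, size=n, random_state=rng)
        c_hat, d_hat, loc_hat, scale_hat = stats.burr12.fit(sample, floc=0)
        fitted = stats.burr12(c_hat, d_hat, loc_hat, scale_hat)
        ks_stats[r] = stats.kstest(sample, fitted.cdf).statistic
    return np.quantile(ks_stats, 1.0 - level)

# Example: 1% critical value for a sample of 1,000 positive residuals.
# print(simulated_ks_critical_value(1000, nu=2.2, zeta=1.1, n_rep=200))
```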

5.6 Estimated components

Figure 15 shows ε_{t,τ}, u_{t,τ} and the estimated components of λ_{t,τ} of Model 2 for IBM1m.25 All series are displayed over the entire sample period between 28 February and 31 March 2000. While the time series plot of IBM1m clearly exhibits periodic patterns, ε_{t,τ} appears free of periodicity. The diurnal U-shaped patterns are captured by λ_{t,τ} through the spline s_{t,τ}. µ_{t,τ} and η_{t,τ} appear to satisfy their structural assumptions, as µ_{t,τ} resembles a random walk while η^{(1)}_{t,τ} and η^{(2)}_{t,τ} resemble stationary AR(2) and AR(1) processes, respectively. u_t is bounded between −ν ≈ −2.2 and νζ ≈ 2.5 as expected.26

24The Kolmogorov-Smirnov (KS) goodness-of-fit statistics in Table 2 take into account the finite sample adjustment by √(T × I), and are computed under the null hypothesis

  H0 : ε_{t,τ} | F_{t,τ−1}, ε_{t,τ} > 0 ∼ iid F∗,   (11)

where F∗ ∼ Burr in this case. The alternative hypothesis replaces ∼ by ≁ in (11). Appendix E discusses the KS statistics in detail. As the unknown parameters in θ∗ of F∗ are estimated from data, we compute the critical values by simulation. Some of the simulated critical values at the 10%, 5%, and 1% levels are tabulated in Appendix E.
25The results for IBM30s are very similar to Figure 15 and thus they are omitted.
26This is due to the property of the beta distribution. See (18) at the end of Appendix B.



Figure 15 – IBM1m (top left), ε_{t,τ} (top middle), α_{t,τ} (top right), λ_{t,τ} (second-row left), µ_{t,τ} (second-row middle), s_{t,τ} (second-row right), η^{(1)}_{t,τ} (bottom left), η^{(2)}_{t,τ} (bottom middle), and u_{t,τ} (bottom right) of Model 2 fitted to IBM1m over 28 February - 31 March 2000. Time along the x-axes.

5.6.1 Overnight effect

In Section 2.1, we observed that market activity tends to surge shortly after market opening and before closure, and this leads to extreme spikes in trade volume that recurrently occur after open or before close. These extreme observations make the upper tail of the empirical distribution very long. Standard approaches to such an overnight effect include treating extreme observations as outliers using dummy variables, differentiating day and overnight jumps, and treating events that occur within a specified period immediately after open as censored.27 Such an approach is often employed in some of the observation-driven models (based on moments of data) termed autoregressive conditional parameter (ACP) models in Koopman et al. (2012). We do not explicitly model the overnight effect using any jumps, dummy variables, or time thresholds because it may take time for the overnight effect to diminish completely during the day, and because it is generally difficult to identify which observations are due to overnight information when new information also arrives. Our model assumes that periodic extreme events are a natural part of the cyclical dynamics of the conditional distribution of the data. Figure 16 illustrates that our model captures the overnight effect by the periodic spikes in α_{t,τ} = exp(λ_{t,τ}), reflecting the cyclical surge in market activity due to the overnight effect. These results are intuitive as overnight information can be interpreted simply as an elevated probability of extreme events. Note that α_{t,τ} is far more volatile than λ_{t,τ}. This highlights one of the advantages of the exponential link function: dynamic movements in the scale of the conditional distribution can be induced by subtle movements in the link parameter λ_{t,τ} to which the filter is applied.

27See, for instance, Rydberg and Shephard (2003), Gerhard and Hautsch (2007), and Boes et al. (2007).



Figure 16 – Capturing periodic spikes in data: IBM1m (left) and α_{t,τ} = exp(λ_{t,τ}) (right). Sampling period: 28 February - 31 March. Time along the x-axes. Using Model 2 as specified in Table 2.

6 Out-of-sample performance

6.1 Model stability: one-step ahead forecasts

We use the predictive cdf to assess the stability of the estimated distribution and parameters as well as the ability of our model to produce good one-step ahead forecasts over a given out-of-sample prediction period. The procedure is as follows. Henceforth, we use the following notation

  Ψ_h = {T + 1, . . . , T + h} × {1, . . . , I}
  Ψ_{h,>0} = {(t, τ) ∈ {T + 1, . . . , T + h} × {1, . . . , I} : y_{t,τ} > 0}

Given a set of in-sample observations up to trading day T, we compute the predictive cdf for the next h trading days at each positive observation as

  F∗(ε_{t,τ}; θ∗),  ε_{t,τ} = y_{t,τ}/α_{t,τ},  ∀(t, τ) ∈ Ψ_{h,>0},   (12)

where all parameters are the maximum likelihood estimates computed using the observations up to day T. We take the estimation results from the previous sections to compute (12). The predictive cdf in (12) simply gives the PIT values of future observations. If our model is a good forecasting model, the predictive cdf in (12) should be distributed as U[0, 1], at least approximately, for each (t, τ). We assess the closeness of the PIT values to uniformity only qualitatively by inspecting the empirical cdf of (12).28 Figure 17 shows the empirical cdf of the PIT values for IBM1m over forecast horizons up to h = 20 days ahead. In this paper, we do not display the results for IBM30s as they are very similar to IBM1m. As we have 390 observations per day for IBM1m and 780 observations for IBM30s, h = 20 corresponds to 7,800 steps ahead for IBM1m and 15,600 steps ahead for IBM30s. We find that

• for both IBM1m and IBM30s, the distribution of the PIT values is roughly U[0, 1] for much of the 20 days of forecast horizon, although there is non-negligible deterioration in the quality of fit;

28One can formally test this using the predictive likelihood methods discussed in Mitchell and Wallis (2011). In our application, the predictive (log-)likelihood of a single observation is

  log l_{t,τ} = 1l{y_{t,τ} > 0} log(1 − p) + (1 − 1l{y_{t,τ} > 0}) log p + 1l{y_{t,τ} > 0} log(α^{−1}_{t,τ} f∗(ε_{t,τ}; θ∗))

for all (t, τ) ∈ Ψ_h.



Figure 17 – Empirical cdf of the PIT values (predictive cdf) for one-step ahead forecasts. Forecast horizons: 1 day ahead (top left), 5 days ahead (top right), 10 days ahead (bottom left), 20 days ahead (bottom right). IBM1m for forecast horizons between 3 - 23 April 2000. Using Model 2 as specified in Table 2.

• for both IBM1m and IBM30s, the empirical cdf of the PIT values is close to uniformity around the upper-middle quantile (roughly between the 50% and 70% quantiles) for all of the horizons; and

• for IBM1m, the empirical cdf of the PIT values is particularly close to uniformity when h = 5 and h = 20. For IBM30s, it is closest when h = 5.

In summary, our estimated distributions and parameter values appear to be fairly stable and able to provide reasonable one-step-ahead forecasts of the conditional distribution of our series up to 20 days of prediction horizon.
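As a complement to the qualitative PIT inspection, the predictive log-likelihood of footnote 28 can be evaluated directly. The sketch below does this for a single observation under a zero-augmented Burr; the numerical inputs are made-up values standing in for a future observation, its forecast scale, and the estimated parameters.

```python
import numpy as np

def burr_pdf(x, nu, zeta):
    """pdf of the standard Burr(nu, zeta) (the xi = 1 case of the GB pdf in Appendix A)."""
    return zeta * nu * x**(nu - 1.0) * (1.0 + x**nu) ** (-1.0 - zeta)

def predictive_loglik(y, alpha, p, nu, zeta):
    """Zero-augmented predictive log-likelihood of footnote 28 for one observation:
    log(p) if y == 0, and log(1 - p) + log(f*(y / alpha) / alpha) otherwise."""
    if y == 0.0:
        return np.log(p)
    eps = y / alpha
    return np.log(1.0 - p) + np.log(burr_pdf(eps, nu, zeta) / alpha)

# Example with hypothetical values: a positive volume observation and its forecast scale.
print(predictive_loglik(y=12_000.0, alpha=15_000.0, p=0.0006, nu=2.2, zeta=1.1))
```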

6.2 Multi-step ahead forecasts

We now examine multi-step forecasts, which are of greater interest than one-step forecasts as they give us an ultimate assessment of our model’s predictive ability. We produce multi-step ahead density forecasts over a long forecast horizon using the estimation results obtained in Section 5. The procedure is as follows. We make an optimal forecast of α_{t,τ} = exp(λ_{t,τ}) for (t, τ) ∈ Ψ_h conditional on F_{T,I}. This optimality is in terms of MMSE, so that the prediction is E[exp(λ_{t,τ}) | F_{T,I}] ≡ α_{t,τ} for (t, τ) ∈ Ψ_h.29 We then standardize the actual future observations y_{t,τ} by α_{t,τ} for all (t, τ) ∈ Ψ_h. If our model is a good forecasting model, the standardized future observations ε_{t,τ} ≡ y_{t,τ}/α_{t,τ} should be conditionally distributed as Burr(ν, ζ) at least approximately. Then F∗(ε_{t,τ}; θ∗) for (t, τ) ∈ Ψ_{h,>0} with F∗ ∼ Burr gives the PIT values of future observations, and they should be distributed as U[0, 1]. The PIT values should also

29See (16) and (17) in Appendix B for how this quantity is computed. The difference between this MMSE forecast and the one-step predictive filter used in Section 6.1 is that the computation of the latter does not require the use of conditional moment conditions.



8 days ahead

Figure 18 – Empirical cdf of the PIT of multi-step forecasts. Forecast horizons: 1 day ahead (left), 5days ahead (middle), 8 days ahead (right). IBM1m for forecast horizons between 3 - 23 April 2000. UsingModel 2 as specified in Table 2.


Figure 19 – Autocorrelation of the PIT of multi-step forecasts. Forecast horizons: 1.5 hours ahead (left), one day ahead (middle), five days ahead (right). IBM1m for forecast horizons between 3 - 23 April 2000. Using Model 2 as specified in Table 2.

be free of autocorrelation if the predictions are good. We make predictions for up to h = 8 future trading days. As we have 390 observations per day for IBM1m and 780 observations for IBM30s, h = 8 corresponds to 3,120 steps ahead for IBM1m and 6,240 steps ahead for IBM30s.

Figure 18 shows the empirical cdf of the PIT values for IBM1m. As before, we do not display the results for IBM30s as they are very similar to IBM1m. We find that the distribution of the PIT values is very close to U[0, 1] for the first day of the forecast horizon for both IBM1m and IBM30s. This means that the density prediction produced by our model is very good for the first 390 steps for IBM1m and 780 steps for IBM30s. Figure 19 shows the sample autocorrelation function of the PIT values for IBM1m. A Box-Ljung test indicates that the PIT values for the first quarter of the first forecast day, equivalent to 100 steps ahead for IBM1m, are not serially correlated. By the end of the first forecast day, the PIT becomes autocorrelated, although the degree of autocorrelation is weak. If we consider this predictive performance in terms of the number of steps, these results are impressive as the PIT values are remarkably close to being iid U[0, 1] for several hundred steps of the forecast horizon. Beyond the first forecast day, the quality of density forecasts deteriorates and the degree of autocorrelation increases with the length of the forecast horizon.

Any deterioration in the goodness-of-fit over a given out-of-sample period can be attributed to changes in fundamental factors such as the nature of periodicity, autocorrelation, and the shape of the underlying distribution. In order to reflect these changes, the model should be re-estimated frequently as more data become available. When updating the model, one may want to keep the length of the sampling period roughly fixed and roll it forward instead of arbitrarily increasing the sample size because

• the estimation efficiency deteriorates relatively quickly as the number of sampling days T increases in a high-frequency environment with large I; and

• our model is only a local approximation, as we emphasized earlier.


7 Concluding remarks and further directions

The main contribution of this paper is the development of an intra-day DCS model with dynamic cubic spline to be fitted to ultra-high frequency observations with periodic behavior. We illustrated the excellent performance of our model in a financial application. The object of empirical analysis is trade volume as measured by the number of shares traded, and, as such, this study also contributes to the literature dedicated to the analysis of market activity and intensity. This paper is a new addition to countless studies devoted to modeling and forecasting the dynamics of price and quantity in isolation. By the guiding principle of economic analysis, market prices are determined by the interaction of supply and demand, and thus such one-dimensional studies are ultimately not satisfactory. Thus, the next step towards a more complete analysis is to construct a multivariate intra-day DCS model of price and volume jointly. One can also extend our model to study the correlation of scale with other variables, or construct a multivariate model for panel data using composite likelihood.

We illustrated that our spline-DCS model deals with recurrent extreme observations by capturing them as part of the periodic movements in the dynamic conditional scale α_{t,τ}. We also noted that this contrasts with the standard methods for dealing with the overnight effect using dummy variables, jumps, or censoring. It would be interesting to compare the performance of the spline-DCS model against that of some of the ACP models, such as the ACD or autoregressive conditional intensity (ACI) models of Engle and Russell (1998), and some of the multiplicative error models surveyed in Hautsch (2012, pp. 99) in a variety of applications.

References

Alizadeh, S., Brandt, M. W. and Diebold, F. X. (2002). Range-based estimation of stochastic volatility models, Journal of Finance 57(3): 1047–1091.

Amemiya, T. (1973). Regression analysis when the dependent variable is truncated normal, Econometrica 41(6): 997–1016.

Andersen, T. G. and Bollerslev, T. (1998). Deutsche mark-dollar volatility: intraday activity patterns, macroeconomic announcements, and longer run dependencies, Journal of Finance 53(1): 219–265.

Andres, P. and Harvey, A. C. (2012). The dynamic location/scale model: with applications to intra-day financial data, Cambridge Working Papers in Economics CWPE1240, University of Cambridge.

Baillie, R. T. (1996). Long memory processes and fractional integration in econometrics, Journal of Econometrics 73(1): 5–59.

Boes, M.-J., Drost, F. C. and Werker, B. J. M. (2007). The impact of overnight periods on option pricing, Journal of Financial and Quantitative Analysis 42(2): 517–533.

Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity, Journal of Econometrics 31(3): 307–327.

Bollerslev, T. and Mikkelsen, H. O. (1996). Modeling and pricing long memory in stock market volatility, Journal of Econometrics 73(1): 151–184.

Bowsher, C. G. and Meeks, R. (2008). The dynamics of economic functions: Modelling and forecasting the yield curve, Economics Papers 2008-W05, Economics Group, Nuffield College, University of Oxford.


Brownlees, C. T., Cipollini, F. and Gallo, G. M. (2011). Intra-daily volume modelling and prediction for algorithmic trading, Journal of Financial Econometrics 9(3): 489–518.

Campbell, S. D. and Diebold, F. X. (2005). Weather forecasting for weather derivatives, Journal of the American Statistical Association 100(469): 6–16.

Cross, F. (1973). The behavior of stock prices on Fridays and Mondays, Financial Analysts Journal 29(6): 67–69.

Diebold, F. X. and Rudebusch, G. D. (1989). Long memory and persistence in aggregate output, Journal of Monetary Economics 24(2): 189–209.

Engle, R. F. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation, Econometrica 50(4): 987–1007.

Engle, R. F. and Rangel, J. G. (2008). The spline-GARCH model for low-frequency volatility and its global macroeconomic causes, Review of Financial Studies 21(3): 1187–1222.

Engle, R. F. and Russell, J. R. (1998). Autoregressive conditional duration: A new model for irregularly spaced transaction data, Econometrica 66(5): 1127–1162.

Engle, R. F. and Sokalska, M. E. (2012). Forecasting intraday volatility in the US equity market: multiplicative component GARCH, Journal of Financial Econometrics 10(1): 54–83.

French, K. (1980). Stock returns and the weekend effect, Journal of Financial Economics 8(1): 55–69.

Gerhard, F. and Hautsch, N. (2007). A dynamic semiparametric proportional hazard model, Studies in Nonlinear Dynamics and Econometrics 11(2).

Geweke, J. and Porter-Hudak, S. (1983). The estimation and application of long memory time series, Journal of Time Series Analysis 4(4): 221–238.

Gibbons, M. R. and Hess, P. (1981). Day of the week effects and asset returns, Journal of Business 54(4): 579–596.

Granger, C. W. J. and Joyeux, R. (1980). An introduction to long-memory time series models and fractional differencing, Journal of Time Series Analysis 1(1): 15–29.

Harris, L. (1986). A transaction data study of weekly and intradaily patterns in stock returns, Journal of Financial Economics 16(1): 99–117.

Harvey, A. C. (1991). Forecasting, Structural Time Series Models and the Kalman Filter, Cambridge University Press.

Harvey, A. C. (1993). Time Series Models, 2nd edn, Harvester Wheatsheaf.

Harvey, A. C. (2010). Exponential conditional volatility models, Cambridge Working Papers in Economics CWPE1040, University of Cambridge.

Harvey, A. C. (2013). Dynamic models for volatility and heavy tails: with applications to financial and economic time series, Econometric Society monograph, Cambridge University Press (forthcoming).

Harvey, A. C. and Chakravarty, T. (2008). Beta-t-(E)GARCH, Cambridge Working Papers in Economics CWPE0840, University of Cambridge.


Harvey, A. C. and Koopman, S. J. (1993). Forecasting hourly electricity demand using time-varying splines, Journal of the American Statistical Association 88(424): 1228–1236.

Harvey, A. C., Koopman, S. J. and Riani, M. (1997). The modeling and seasonal adjustment of weekly observations, Journal of Business & Economic Statistics 15(3): 354–68.

Harvey, A. C. and Luati, A. (2012). Filtering with heavy tails, Cambridge Working Papers in Economics CWPE1255, University of Cambridge.

Hautsch, N. (2012). Econometrics of financial high-frequency data, Springer, Berlin.

Hautsch, N., Malec, P. and Schienle, M. (2010). Capturing the zero: a new class of zero-augmented distributions and multiplicative error processes, SFB 649 Discussion Papers SFB649DP2010-055, Sonderforschungsbereich 649, Humboldt University, Berlin, Germany.

Heckman, J. J. (1974). Shadow prices, market wages, and labor supply, Econometrica 42(4): 679–94.

Hosking, J. R. M. (1981). Fractional differencing, Biometrika 68(1): 165–176.

Jaffe, J. and Westerfield, R. (1985). The week-end effect in common stock returns: the international evidence, Journal of Finance 40(2): 433–454.

Keim, D. B. and Stambaugh, R. F. (1984). A further investigation of the weekend effect in stock returns, Journal of Finance 39(3): 819–835.

Koopman, S. J., Lucas, A. and Scharth, M. (2012). Predicting time-varying parameters with parameter-driven and observation-driven models, Tinbergen Institute Discussion Papers 12-020/4, Tinbergen Institute.

Lakonishok, J. and Levi, M. (1982). Weekend effects on stock returns: a note, Journal of Finance 37(3): 883–889.

Lakonishok, J. and Smidt, S. (1988). Are seasonal anomalies real? A ninety-year perspective, Review of Financial Studies 1(4): 403–425.

Lilliefors, H. W. (1967). On the Kolmogorov-Smirnov test for normality with mean and variance unknown, Journal of the American Statistical Association 62(318): 399–402.

Lo, A. W. and Wang, J. (2010). Stock market trading volume, in Y. Aït-Sahalia and L. Hansen (eds), Handbook of financial econometrics, Vol. 2, North-Holland, New York.

McCulloch, R. E. and Tsay, R. S. (2001). Nonlinearity in high-frequency financial data and hierarchical models, Studies in Nonlinear Dynamics and Econometrics 5(1): 1–17.

Mitchell, J. and Wallis, K. F. (2011). Evaluating density forecasts: forecast combinations, model mixtures, calibration and sharpness, Journal of Applied Econometrics 26(6): 1023–1040.

Robinson, P. M. (1990). Time series with strong dependence, Advances in Econometrics: 6th world congress, Cambridge University Press, Cambridge.

Rydberg, T. H. and Shephard, N. (2003). Dynamics of trade-by-trade price movements: decomposition and models, Journal of Financial Econometrics 1(1): 2–25.

Taylor, S. (1986). Modelling financial time series, Wiley, Chichester, West Sussex; New York.


Taylor, S. (2005). Asset price dynamics, volatility, and prediction, Princeton University Press, Princeton, N.J.

Zhang, M. Y., Russell, J. R. and Tsay, R. S. (2001). A nonlinear autoregressive conditional duration model with applications to financial transaction data, Journal of Econometrics 104(1): 179–207.

A List of distributions and their properties

A.1 Generalized beta distribution

The (standard) generalized beta (GB) distribution (also known as the generalized beta distribution of the second kind) has the probability density function (pdf):

  f(x; ν, ξ, ζ) = ν x^{νξ−1}(x^ν + 1)^{−ξ−ζ} / B(ξ, ζ),  x > 0, and ν, ξ, ζ > 0,

where B(·, ·) denotes the Beta function. GB becomes the Burr distribution when ξ = 1 and the log-logistic distribution when ξ = ζ = 1. The Burr distribution is also called the Pareto Type IV distribution (denoted Pareto IV). Log-logistic is also called Pareto III. Burr becomes Pareto II when ν = 1. Burr becomes Weibull (defined in Appendix A.2) when ζ → ∞. GB with ν = 1 and ξ = ζ is a special case of the F distribution with the degrees of freedom ν1 = ν2 = 2ξ.

If a non-standardized random variable Y follows the GB distribution, its pdf fY : R_{>0} → R with the scale parameter α > 0 is

  fY(y; α, ν, ξ, ζ) = f(y/α; ν, ξ, ζ)/α,  y > 0.

The m-th moment of Y is

  E[y^m] = α^m Γ(ξ + m/ν)Γ(ζ − m/ν) / [Γ(ξ)Γ(ζ)],  m < νζ.   (13)

For a set of iid observations y1, . . . , yT where each follows the non-standardized GB distribution, the log-likelihood function of a single observation yt can be written using the exponential link function α = exp(λ) with the link parameter λ ∈ R as:

  log fY(yt) = log(ν) − νξλ + (νξ − 1) log(yt) − log B(ξ, ζ) − (ξ + ζ) log[(yt e^{−λ})^ν + 1].

The score ut of the non-standardized GB computed at yt is

  ut ≡ ∂ log fY(yt)/∂λ = ν(ξ + ζ)(yt e^{−λ})^ν / [(yt e^{−λ})^ν + 1] − νξ = ν(ξ + ζ) bt(ξ, ζ) − νξ,

where we used the notation

  bt(ξ, ζ) ≡ (yt e^{−λ})^ν / [(yt e^{−λ})^ν + 1].

By the property of the GB distribution, we know that bt(ξ, ζ) follows the beta distribution with parameters ξ and ζ, which is characterized by the moment generating function (mgf)

  Mb(z; ξ, ζ) ≡ E[e^{bz}] = 1 + Σ_{k=1}^{∞} [ Π_{r=0}^{k−1} (ξ + r)/(ξ + ζ + r) ] z^k / k!.


It is easy to check that E[ut] = 0.
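A quick Monte Carlo sketch of this property: draws from the non-standardized GB are generated through the Beta representation of bt(ξ, ζ), and the sample mean of the score is checked to be near zero. All parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
nu, xi, zeta, lam = 2.0, 1.0, 1.2, 0.5     # hypothetical parameter values

# Draw from the non-standardized GB via the Beta representation used above:
# b = (y exp(-lam))**nu / ((y exp(-lam))**nu + 1) follows Beta(xi, zeta).
b = rng.beta(xi, zeta, size=200_000)
y = (b / (1.0 - b)) ** (1.0 / nu) * np.exp(lam)

# Conditional score with respect to the link parameter lambda.
w = (y * np.exp(-lam)) ** nu
u = nu * (xi + zeta) * w / (w + 1.0) - nu * xi

print(u.mean())          # close to zero, consistent with E[u_t] = 0
print(u.min(), u.max())  # bounded between -nu*xi and nu*zeta, cf. (18)
```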

A.2 Generalized gamma distribution

The (standard) generalized gamma (GG) distribution has the pdf:

  f(x; γ, ν) = [ν/Γ(γ)] x^{νγ−1} exp(−x^ν),  x > 0, and γ, ν > 0,

where Γ(·) is the gamma function. The GG distribution becomes the gamma distribution when ν = 1, the Weibull distribution when γ = 1, and the exponential distribution when ν = γ = 1. If a non-standardized random variable Y follows the GG distribution, its pdf fY : R_{>0} → R with the scale parameter α > 0 is

  fY(y; α, γ, ν) = f(y/α; γ, ν)/α,  y > 0.

The m-th moment of Y is

  E[y^m] = α^m Γ(γ + m/ν)/Γ(γ).

For a set of iid observations y1, . . . , yT where each follows the non-standardized GG distribution, the log-likelihood function of a single observation yt can be written using the exponential link function α = exp(λ) as:

  log fY(yt) = log(ν) − λ + (νγ − 1) log(yt e^{−λ}) − (yt e^{−λ})^ν − log Γ(γ).

The score ut of the non-standardized GG computed at yt is

  ut ≡ ∂ log fY(yt)/∂λ = ν(yt e^{−λ})^ν − νγ = ν gt(ν) − νγ,

where we used the notation gt(ν) ≡ (yt e^{−λ})^ν. By the property of the GG distribution, we know that gt(ν) follows the (standard) gamma distribution with parameter γ, which is characterized by the mgf E[e^{gz}] = (1 − z)^{−γ} for z < 1. We also have E[ut] = 0.

A.3 Log-normal distribution

The (non-standardized) log-normal distribution has the pdf:

  fY(y; α, σ) = [1/(y √(2πσ²))] exp( −½ [(log(y) − log(α))/σ]² ),  y > 0, and α, σ > 0.

For a log-normally distributed random variable Y, the moments of all orders can be obtained easily using the mgf of the normal distribution as E[Y^m] = E[e^{m log(Y)}] for all m ∈ N_{>0}. For a set of iid observations y1, . . . , yT, where each follows the log-normal distribution, the log-likelihood function of a single observation yt can be written using the exponential link function α = exp(λ) as:

  log fY(yt) = −log(yt) − ½ log(2π) − ½ log(σ²) − ½ [(log(yt) − λ)/σ]².

The score ut of the log-normal computed at yt is

  ut ≡ ∂ log fY(yt)/∂λ = [log(yt) − λ]/σ².


Thus, ut is a normally distributed random variable. We also have E[ut] = 0 as λ is the first moment of log(yt).

B The DCS model of Andres and Harvey (2012)

Andres and Harvey (2012) introduce the DCS model for estimating the sequence of positive scaling factors (α_{t|t−1})_{t=1}^{T} of a given sequence of observations (y_t)_{t=1}^{T} so that the standardized series y_t/α_{t|t−1} ≡ ε_t satisfy

  ε_t | F_{t−1} ∼ iid F,  t = 1, . . . , T,

for some standard distribution F with a vector of parameters θ and the pdf f. The DCS model estimates α_{t|t−1} via an appropriately chosen canonical link function and the link parameter λ_{t|t−1} ∈ R. The choice of the link function depends on F. Using the exponential link function, the one-factor DCS model of Andres and Harvey is defined as

  y_t = ε_t exp(λ_{t|t−1}),  λ_{t|t−1} = δ + φ λ_{t−1|t−2} + κ u_{t−1},  ε_t | F_{t−1} ∼ iid F,  t = 1, . . . , T,   (14)

where δ, φ ∈ R and κ ≥ 0 are the parameters of the model and u_t is the conditional score:

  u_t ≡ ∂ log fY(y_t; λ_{t|t−1}, θ)/∂λ_{t|t−1} = ∂[log f(y_t e^{−λ_{t|t−1}}; θ) − λ_{t|t−1}]/∂λ_{t|t−1},  t = 1, . . . , T.

We used fY to denote the pdf of the conditional distribution of y_t for each t. λ_{t|t−1} is stationary if |φ| < 1. If λ_{t|t−1} is stationary, its unconditional first moment is ω = δ/(1 − φ). Andres and Harvey study the case where y_t is non-negative for all t, and consider a number of non-negative distributions for F including GG, GB, and log-normal. However, the application of DCS need not be limited to non-negative observations. The Beta-t-EGARCH model of Harvey and Chakravarty (2008) considers the case where y_t|F_{t−1} follows the Student's t-distribution in the context of return volatility. The notable features of the DCS model in (14) include:

• the asymptotic properties of the maximum likelihood estimators (MLEs) of the model. Andres and Harvey show the consistency and asymptotic normality of MLEs for a number of useful distributions.

• the conditional score variable u_t, whose specification depends on the conditional distribution of y_t. If F is characterized by the GG distribution, u_t is a linear function of a gamma distributed random variable. In contrast, if F is characterized by the GB distribution, u_t is a linear function of a beta distributed random variable. (See Appendix A.) In both cases, u_t is iid given F_{t−1}. The sensitivity of the dynamics of λ_{t|t−1} to the effects of extreme observations is determined by F as we will see later.

• the use of the exponential link function, which means that we do not need to impose parameter restrictions on δ, φ, and κ to ensure that α_{t|t−1} is positive.

• the availability of analytical expressions for the unconditional moments and optimal multi-step forecasts of y_t whenever the corresponding (conditional) moments of ε_t and the mgf of u_t exist.


The analytical expression for the m-th moment of y_t is readily available in terms of the m-th moment of ε_t and the mgf of u_t (whenever they are well defined) as:

  E[y^m_t] = E[ε^m_t] E[e^{m λ_{t|t−1}}] = E[ε^m_t] e^{mω} Π_{j=1}^{∞} E[e^{m ψ_j u_{t−j}}],   (15)

where the second equality used the MA(∞) representation of λ_{t|t−1} with ψ_j = κ φ^{j−1} for each j ∈ N_{>0}. (Note that if u_t is a function of a gamma distributed random variable, the mgf of u_t exists only for a restricted parameter space.) We can also derive the expression for the autocorrelation function of y_t using (15).

Given the observations up to time T, we can derive the expression for the optimal30 l-step predictor E[y_{T+l}|F_T] for l ∈ N_{>0} by writing λ_{T+l|T+l−1} as

  λ_{T+l|T+l−1} = ω + Σ_{k=1}^{l−1} ψ_k u_{T+l−k} + Σ_{k=l}^{∞} ψ_k u_{T+l−k} = λ_{T+l|T} + Σ_{k=1}^{l−1} ψ_k u_{T+l−k},   (16)

where λ_{T+l|T} = ω + Σ_{k=l}^{∞} ψ_k u_{T+l−k} is the optimal l-step forecast of λ_{T+l|T+l−1}. (16) gives

  E[e^{λ_{T+l|T+l−1}} | F_T] = e^{λ_{T+l|T}} Π_{k=1}^{l−1} E_T[e^{ψ_k u_{T+l−k}}].   (17)

Thus we can derive the expression for E[y_{T+l}|F_T] using (17) and the expression for E[ε_{T+l}|F_T] (whenever these moments exist under F). Moreover, the mean squared error (MSE) of E[y_{T+l}|F_T] can be similarly derived using

  Var[e^{λ_{T+l|T+l−1}} | F_T] = E[e^{2λ_{T+l|T+l−1}} | F_T] − E[e^{λ_{T+l|T+l−1}} | F_T]²
                              = e^{2λ_{T+l|T}} [ Π_{j=1}^{l−1} E(e^{2ψ_j u_{T+l−j}} | F_T) − Π_{j=1}^{l−1} E[e^{ψ_j u_{T+l−j}} | F_T]² ].

The existence of the mgf of u_t is essential in these expressions. Simulating the l-step predictive distribution for the DCS model is easy because, by (16), it is the distribution of

  y_{T+l} = ε_{T+l} e^{λ_{T+l|T}} Π_{j=1}^{l−1} e^{ψ_j u_{T+l−j}}.

Finally, when F is characterized by the GB distribution, the variable bt(ξ, ζ) we defined in Appendix A is bounded between 0 and 1 by the property of the beta distribution. This means that we have

  −νξ ≤ ut ≤ νζ.   (18)

These bounds on ut determine the sensitivity of the dynamics of λ_{t|t−1} to the effects of extreme observations. If F is characterized by the GG distribution, −νγ < ut and ut is unbounded above by the property of the gamma distribution.

^{30} This optimality is in terms of the minimum mean square error (MMSE).


C Time-varying cubic spline of Harvey and Koopman (1993)

In this section, we use the time indices defined in Section 3.3 and formally explain the construction of the daily cubic spline of Section 3.4.1. Most of our notation is consistent with Harvey and Koopman. As before, we use the increasing sequence of times 0 = τ_0 < τ_1 < · · · < τ_k = I for some k < I to denote the knots (or mesh) along the time axis of our spline, where τ_j ∈ {1, . . . , I − 1} for each j = 1, . . . , k − 1. The y-coordinates (heights) of the knots are denoted γ = (γ_0, γ_1, . . . , γ_k). We denote the distance between knots along the time axis by h_j = τ_j − τ_{j−1}

for j = 1, . . . , k. Our cubic spline function g : [τ_0, τ_k] → R is a piecewise function of the form

g(τ) = ∑_{j=1}^{k} g_j(τ) 1l_{τ∈[τ_{j−1},τ_j]},   ∀ τ ∈ [τ_0, τ_k],

where each function g_j : [τ_{j−1}, τ_j] → R, j = 1, . . . , k, is a polynomial of degree at most three. The g_j for j = 1, . . . , k must satisfy the following three conditions.

1. Continuity: the function g must pass through each knot (τ_j, γ_j), so that g_j(τ_j) = γ_j and g_j(τ_{j−1}) = γ_{j−1} for all j = 1, . . . , k, and g and its first derivative must be continuous at the knots. This means that we have, for j = 1, . . . , k,

   g_j(τ_{j−1}) = g_{j−1}(τ_{j−1})   and   g′_j(τ_{j−1}) = g′_{j−1}(τ_{j−1}).        (19)

2. Order: g″_j(·) is a linear function on [τ_{j−1}, τ_j] for j = 1, . . . , k. This means that we have

   g″_j(τ) = a_{j−1} + [(τ − τ_{j−1})/h_j](a_j − a_{j−1}) = [(τ_j − τ)/h_j] a_{j−1} + [(τ − τ_{j−1})/h_j] a_j,        (20)

   for τ ∈ [τ_{j−1}, τ_j] and j = 1, . . . , k, where a_0 = g″_1(τ_0) and a_j = g″_j(τ_j) for j = 1, . . . , k.

3. Periodicity: to make g a periodic function of time, g_1 and g_k (at the beginning and the end of one period) must satisfy

   γ_0 = γ_k,   g′_1(τ_0) = g′_k(τ_k),   and   g″_1(τ_0) = g″_k(τ_k),

   so that a_0 = a_k.

We integrate (20) with respect to τ to find the expressions for g′_j and g_j for j = 1, . . . , k. That is, we evaluate

g′_j(τ) = ∫ g″_j(τ) dτ   and   g_j(τ) = ∫∫ g″_j(τ) dτ dτ

for each j = 1, . . . , k, where we recover the integration constants using the continuity condition (19). From this, we obtain for j = 1, . . . , k

g′_j(τ) = −[ (τ_j − τ)²/(2h_j) − h_j/6 ] a_{j−1} + [ (τ − τ_{j−1})²/(2h_j) − h_j/6 ] a_j        (21)

g_j(τ) = (τ_j − τ) [ ((τ_j − τ)² − h_j²)/(6h_j) · a_{j−1} + γ_{j−1}/h_j ] + (τ − τ_{j−1}) [ ((τ − τ_{j−1})² − h_j²)/(6h_j) · a_j + γ_j/h_j ]
       = r_j(τ) · γ + s_j(τ) · a        (22)


for τ ∈ [τ_{j−1}, τ_j], where we used the notation a = (a_0, a_1, . . . , a_k)⊤, and r_j(τ) and s_j(τ) are (k+1)×1 vectors defined as

r_j(τ) = ( 0, . . . , 0, (τ_j − τ)/h_j, (τ − τ_{j−1})/h_j, 0, . . . , 0 )⊤,

s_j(τ) = ( 0, . . . , 0, (τ_j − τ)[(τ_j − τ)² − h_j²]/(6h_j), (τ − τ_{j−1})[(τ − τ_{j−1})² − h_j²]/(6h_j), 0, . . . , 0 )⊤.        (23)

The non-zero elements of the two vectors in (23) are at the j-th and (j + 1)-th entries. By the periodicity condition, we have γ_0 = γ_k and a_0 = a_k, so that γ_0 and a_0 become redundant during estimation. Moreover, the conditions for g′_j in (19) and (21) give

[h_j/(h_j + h_{j+1})] a_{j−1} + 2a_j + [h_{j+1}/(h_j + h_{j+1})] a_{j+1} = 6γ_{j−1}/[h_j(h_j + h_{j+1})] − 6γ_j/(h_j h_{j+1}) + 6γ_{j+1}/[h_{j+1}(h_j + h_{j+1})]

for j = 2, . . . , k − 1, and

[h_1/(h_1 + h_2)] a_k + 2a_1 + [h_2/(h_1 + h_2)] a_2 = 6γ_k/[h_1(h_1 + h_2)] − 6γ_1/(h_1 h_2) + 6γ_2/[h_2(h_1 + h_2)]   (j = 1),

[h_k/(h_k + h_1)] a_{k−1} + 2a_k + [h_1/(h_k + h_1)] a_1 = 6γ_{k−1}/[h_k(h_k + h_1)] − 6γ_k/(h_k h_1) + 6γ_1/[h_1(h_k + h_1)]   (j = k).

From these, we obtain k equations for the k "unknowns" a_1, . . . , a_k. Using the notation γ† = (γ†_1, . . . , γ†_k)⊤ = (γ_1, . . . , γ_k)⊤ and a† = (a†_1, . . . , a†_k)⊤ = (a_1, . . . , a_k)⊤, we write this system of equations in matrix form as Pa† = Qγ†, where

P = [ 2                 h_2/(h_1+h_2)     0                 0                 ⋯   0                 h_1/(h_1+h_2)     ]
    [ h_2/(h_2+h_3)     2                 h_3/(h_2+h_3)     0                 ⋯   0                 0                 ]
    [ 0                 h_3/(h_3+h_4)     2                 h_4/(h_3+h_4)     ⋯   0                 0                 ]
    [ 0                 0                 h_4/(h_4+h_5)     2                 ⋯   0                 0                 ]
    [ ⋮                 ⋮                 ⋮                 ⋮                 ⋱   ⋮                 ⋮                 ]
    [ 0                 0                 0                 0                 ⋯   2                 h_k/(h_{k−1}+h_k) ]
    [ h_1/(h_1+h_k)     0                 0                 0                 ⋯   h_k/(h_1+h_k)     2                 ],

Q = [ −6/(h_1h_2)          6/[h_2(h_1+h_2)]     0                    ⋯   0                    6/[h_1(h_1+h_2)]     ]
    [ 6/[h_2(h_2+h_3)]     −6/(h_2h_3)          6/[h_3(h_2+h_3)]     ⋯   0                    0                    ]
    [ 0                    6/[h_3(h_3+h_4)]     −6/(h_3h_4)          ⋯   0                    0                    ]
    [ 0                    0                    6/[h_4(h_4+h_5)]     ⋯   0                    0                    ]
    [ ⋮                    ⋮                    ⋮                    ⋱   ⋮                    ⋮                    ]
    [ 0                    0                    0                    ⋯   −6/(h_{k−1}h_k)      6/[h_k(h_{k−1}+h_k)] ]
    [ 6/[h_1(h_1+h_k)]     0                    0                    ⋯   6/[h_k(h_1+h_k)]     −6/(h_1h_k)          ].

For a non-singular P, we have a† = P^{−1}Qγ†. Then (22) becomes

g_j(τ) = w_j(τ) · γ†,   w_j(τ)⊤ = r_j(τ)⊤ + s_j(τ)⊤ P^{−1}Q        (24)


for τ ∈ [τ_{j−1}, τ_j], where r_j(τ) and s_j(τ) are now k×1 vectors by the periodicity condition, with

r_1(τ) = ( (τ − τ_0)/h_1, 0, . . . , 0, (τ_1 − τ)/h_1 )⊤,

s_1(τ) = ( (τ − τ_0)[(τ − τ_0)² − h_1²]/(6h_1), 0, . . . , 0, (τ_1 − τ)[(τ_1 − τ)² − h_1²]/(6h_1) )⊤,

and r_j(τ) and s_j(τ) for j = 2, . . . , k are as defined in (23) but with the non-zero elements shifted to the (j − 1)-th and j-th entries. Finally, using the expression in (24), we obtain our cubic spline as

s_τ = g(τ) = ∑_{j=1}^{k} 1l_{τ∈[τ_{j−1},τ_j]} w_j(τ) · γ†   ∀ τ ∈ [τ_0, τ_k].        (25)

The elements of γ† are parameters of the model to be estimated. For the parameters to be identified, we impose a zero-sum constraint on the elements of γ†. This constraint is

∑_{τ∈[τ_0,τ_k]} s_τ = ∑_{τ∈[τ_0,τ_k]} ∑_{j=1}^{k} 1l_{τ∈[τ_{j−1},τ_j]} w_j(τ) · γ† = w_* · γ† = 0,

where we used the notation

w_* = (w_{*1}, . . . , w_{*k})⊤ = ∑_{τ∈[τ_0,τ_k]} ∑_{j=1}^{k} 1l_{τ∈[τ_{j−1},τ_j]} w_j(τ).

We can impose this condition by setting

γ†_k = −∑_{i=1}^{k−1} w_{*i} γ†_i / w_{*k}.

Then (25) becomes

s_τ = ∑_{j=1}^{k} 1l_{τ∈[τ_{j−1},τ_j]} ∑_{i=1}^{k−1} ( w_{ji}(τ) − w_{jk}(τ) w_{*i}/w_{*k} ) γ†_i = ∑_{j=1}^{k} 1l_{τ∈[τ_{j−1},τ_j]} z_j(τ) · γ†

for all τ ∈ [τ_0, τ_k], where w_{ji}(τ) denotes the i-th element of w_j(τ), and the i-th element of z_j(τ) is

z_{ji}(τ) = { w_{ji}(τ) − w_{jk}(τ) w_{*i}/w_{*k}   if i ≠ k;    0   if i = k }        (26)

for τ ∈ [τ_{j−1}, τ_j] and i, j = 1, . . . , k. When estimating the model, it is convenient to compute w_* using the equation

w_*⊤ = r_*⊤ + s_*⊤ P^{−1}Q,

where r_* and s_* are k×1 vectors defined as

r_* = ( (τ_2 − τ_0)/2, . . . , (τ_k − τ_{k−2})/2, (τ_1 − τ_0 + τ_k − τ_{k−1})/2 )⊤,

s_* = ( (τ_2 − τ_0 − h_2³ − h_1³)/24, . . . , (τ_k − τ_{k−2} − h_k³ − h_{k−1}³)/24, (h_1(1 − h_1²) + h_k(1 − h_k²))/24 )⊤.
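The construction above translates directly into code. The sketch below builds P and Q from the knot spacings, forms the weights w_j(τ) of (24) on an integer grid, and obtains w_* by summing the weights over the grid rather than through the closed-form r_* and s_* expressions; the zero-sum weights z_j(τ) of (26) then follow. The grid convention (integer τ from τ_0 to τ_k) is an assumption made for the sketch.

    import numpy as np

    def spline_weights(tau_knots):
        """Periodic cubic spline weights w_j(tau) of (24) and zero-sum weights
        z_j(tau) of (26), evaluated on the integer grid tau = tau_0, ..., tau_k."""
        tau_knots = np.asarray(tau_knots, dtype=float)
        k = len(tau_knots) - 1
        h = np.diff(tau_knots)                              # h_j = tau_j - tau_{j-1}
        P = 2.0 * np.eye(k)
        Q = np.zeros((k, k))
        for j in range(1, k + 1):                           # j-th row of P a = Q gamma
            hj, hj1 = h[j - 1], h[j % k]                    # h_j and h_{j+1} (wrapping at j = k)
            prev, nxt = (j - 2) % k, j % k                  # columns of a_{j-1} and a_{j+1}
            P[j - 1, prev] += hj / (hj + hj1)
            P[j - 1, nxt] += hj1 / (hj + hj1)
            Q[j - 1, prev] += 6.0 / (hj * (hj + hj1))
            Q[j - 1, j - 1] += -6.0 / (hj * hj1)
            Q[j - 1, nxt] += 6.0 / (hj1 * (hj + hj1))
        PinvQ = np.linalg.solve(P, Q)                       # P^{-1} Q
        grid = np.arange(int(tau_knots[0]), int(tau_knots[-1]) + 1)
        W = np.zeros((len(grid), k))                        # row i holds w_j(tau_i)'
        for i, tau in enumerate(grid):
            j = int(np.clip(np.searchsorted(tau_knots, tau), 1, k))  # active interval
            hj = h[j - 1]
            a, b = tau_knots[j] - tau, tau - tau_knots[j - 1]
            r, s = np.zeros(k), np.zeros(k)
            lo, hi = (j - 2) % k, j - 1                     # entries for gamma_{j-1}, gamma_j
            r[lo], r[hi] = a / hj, b / hj
            s[lo], s[hi] = a * (a**2 - hj**2) / (6 * hj), b * (b**2 - hj**2) / (6 * hj)
            W[i] = r + s @ PinvQ
        w_star = W.sum(axis=0)                              # w_* by direct summation over the grid
        Z = W - np.outer(W[:, -1], w_star / w_star[-1])     # z_{ji}; the k-th column becomes zero
        return W, Z, w_star

For instance, spline_weights([0, 60, 150, 270, 390]) (hypothetical knot positions) returns weight matrices with k = 4 columns, one per free knot height.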


[Figure 20 – Decay rate of the (1 − L)^d coefficients: |a_k| plotted against k.]

The static spline in (25) becomes dynamic by letting γ† be time-varying according to

s_{t,τ} = ∑_{j=1}^{k} 1l_{τ∈[τ_{j−1},τ_j]} w_j(τ) · γ†_{t,τ},   γ†_{t,τ} = γ†_{t,τ−1} + κ_* u_{t,τ−1} 1l_{y_{t,τ−1}>0}        (27)

for τ = 1, . . . , I and t = 1, . . . , T, where κ_* = (κ_{*1}, . . . , κ_{*k})⊤ is a vector of parameters. Harvey and Koopman use a set of contemporaneous disturbances instead of the lagged score. The dynamic spline (27) still needs to sum to zero over one complete period for the parameters to be identified. That is, (27) must satisfy

∑_{τ=1}^{I} s_{t,τ} = w_* · γ†_{t,τ} = 0,   t = 1, . . . , T.

We can ensure that this constraint holds by setting

w_* · γ†_{1,0} = 0   and   w_* · κ_* = 0.

These conditions, in turn, become zero-sum constraints on the elements of γ†_{1,0} and κ_*. We can set

γ†_{k;1,0} = −(1/w_{*k}) ∑_{i=1}^{k−1} w_{*i} γ†_{i;1,0}   and   κ_{*k} = −(1/w_{*k}) ∑_{i=1}^{k−1} w_{*i} κ_{*i},

where γ†_{i;1,0} denotes the i-th element of γ†_{1,0}.

Finally, using the definition of z_j(τ) for j = 1, . . . , k in (26), we can write (27) as

s_{t,τ} = ∑_{j=1}^{k} 1l_{τ∈[τ_{j−1},τ_j]} z_j(τ) · γ†_{t,τ},   γ†_{t,τ} = γ†_{t,τ−1} + κ_* u_{t,τ−1} 1l_{y_{t,τ−1}>0},

for τ = 1, . . . , I and t = 1, . . . , T.
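A minimal sketch of this daily recursion, taking as given a matrix Z of zero-sum weights (one row per intra-day bin, e.g. built along the lines of the earlier sketch) and a κ_* that already satisfies w_*·κ_* = 0; the score array u again depends on the chosen F and is passed in by the user.

    import numpy as np

    def dynamic_spline(Z, gamma_10, kappa_star, u, y):
        """Recursion (27): s_{t,tau} = z(tau)' gamma_{t,tau}, with
        gamma_{t,tau} = gamma_{t,tau-1} + kappa_star * u_{t,tau-1} * 1{y_{t,tau-1} > 0}.
        Z is (I, k); u and y are (T, I) arrays of scores and observations."""
        T, I = y.shape
        gamma = np.array(gamma_10, dtype=float)      # gamma_dagger at (t, tau) = (1, 0)
        s = np.empty((T, I))
        for t in range(T):
            for tau in range(I):
                s[t, tau] = Z[tau] @ gamma           # spline value before seeing y_{t,tau}
                if y[t, tau] > 0:                    # lagged-score update for the next bin
                    gamma = gamma + kappa_star * u[t, tau]
        return s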

D Fractional integration

Consider the following fractionally integrated process

(1 − L)^d y_t = ε_t,   t = 1, . . . , T,        (28)


where d is a fraction, L is the lag operator, and (ε_t)_{t=1}^T is a stationary and invertible process. This process is stationary and invertible if |d| < 0.5 (Granger and Joyeux (1980) and Hosking (1981)).^{31} Using the binomial formula, we can expand (1 − L)^d as

(1 − L)^d = ∑_{k=0}^{∞} [Γ(k − d)/(Γ(−d)Γ(k + 1))] L^k = 1 + ∑_{k=1}^{∞} (1/k!) ( ∏_{j=0}^{k−1}(j − d) ) L^k = 1 + ∑_{k=1}^{∞} a_k L^k        (29)

where we have used a_k ≡ (1/k!) ∏_{j=0}^{k−1}(j − d) in the last equality. Although |a_k| → 0 as k → ∞, the convergence is very slow, as shown in Figure 20. These slowly decaying coefficients a_k implied by (1 − L)^d induce long memory in y_t. Harvey (1993) and Taylor (2005) suggest the method for estimating d in the frequency domain developed by Geweke and Porter-Hudak (1983). The procedure is as follows. First, we rewrite (28) as y_t = (1 − L)^{−d} ε_t. Denote the power spectrum of (ε_t)_{t=1}^T by f_ε(x), where x ∈ [−π, π] is a frequency in radians. The power spectrum of y_t, denoted f_y(x), is

f_y(x) = |1 − e^{−ix}|^{−2d} f_ε(x) = 2^{−d}(1 − cos x)^{−d} f_ε(x) = 4^{−d} (sin²(x/2))^{−d} f_ε(x),   x ∈ [−π, π].

Take the logarithm of this equation and rearrange, using the sample spectral density^{32} I(x_j) computed at the frequencies x_j = 2πj/T for j = 0, . . . , m and m ≪ T, to get

log I(x_j) = log f_ε(0) − d log[4 sin²(x_j/2)] + log( f_ε(x_j)/f_ε(0) ) + log[ I(x_j)/f_y(x_j) ].

The last term is asymptotically iid with a constant mean and variance π²/6. This term can be treated as the disturbance term. The third term is negligible for frequencies near zero. Thus we have a regression model:

log I(x_j) = β_0 + β_1 log[4 sin²(x_j/2)] + v_j,

where v_j is assumed to be iid with zero mean and variance π²/6. Regressing log I(x_j) on log[4 sin²(x_j/2)] and taking minus the estimated slope coefficient yields the estimate d̂ = −β̂_1. Consistency and asymptotic normality of d̂ when d < 0 are shown by Geweke and Porter-Hudak (1983), and consistency for 0 < d < 0.5 is shown by Robinson (1990). If m is chosen appropriately, d̂ has the limiting distribution

(d̂ − d)/√Var(d̂)  →_d  N(0, 1),

where Var(d̂) is obtained from the regression. (See Geweke and Porter-Hudak (1983) or Baillie (1996).) As noted in Taylor (2005) and Baillie (1996), one main issue is the choice of m, as this is associated with a bias-accuracy trade-off.

^{31} Nonetheless, as emphasized in Bollerslev and Mikkelsen (1996), shocks to optimal forecasts over an out-of-sample period will dissipate for all values of d < 1. For d > 0.5, Bollerslev and Mikkelsen (1996) suggest that one could simulate pre-sample values instead of fixing the pre-sample values at their unconditional counterparts. However, they note that employing a non-parametric estimation procedure of this kind to determine the pre-sample values does not necessarily lead to notably different estimation results from the ones obtained by setting them at their unconditional counterparts.

^{32} The sample spectral density is defined as

I(x) = (1/(2π)) [ c(0) + 2 ∑_{τ=1}^{T−1} c(τ) cos(xτ) ],   x ∈ [−π, π],

where c(τ) denotes the sample autocovariance at lag τ.


Taylor (2005) suggests that a popular choice is to set m = T^θ with θ between 0.5 and 0.8. Other methods for choosing m are suggested in Baillie (1996). For this study, we chose m = ⌊T^{0.5}⌋, which was also used by Diebold and Rudebusch (1989).^{33}
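A compact sketch of the procedure just described: compute the sample spectral density at the first m Fourier frequencies with m = ⌊T^θ⌋, regress log I(x_j) on log[4 sin²(x_j/2)], and take d̂ = −β̂_1. The periodogram is built directly from the sample autocovariances, so only numpy is needed; this is an illustrative implementation rather than the code used in the paper.

    import numpy as np

    def gph_estimate(y, theta=0.5):
        """Log-periodogram (Geweke-Porter-Hudak) estimate of d."""
        y = np.asarray(y, dtype=float)
        T = len(y)
        m = int(np.floor(T ** theta))                  # number of low frequencies used
        x = 2.0 * np.pi * np.arange(1, m + 1) / T      # Fourier frequencies x_j = 2*pi*j/T
        yc = y - y.mean()
        # sample autocovariances c(tau) and spectral density I(x) as in footnote 32
        c = np.array([yc[: T - tau] @ yc[tau:] / T for tau in range(T)])
        I_x = (c[0] + 2.0 * np.cos(np.outer(x, np.arange(1, T))) @ c[1:]) / (2.0 * np.pi)
        X = np.column_stack([np.ones(m), np.log(4.0 * np.sin(x / 2.0) ** 2)])
        beta, *_ = np.linalg.lstsq(X, np.log(I_x), rcond=None)
        return -beta[1]                                # d_hat = -beta_1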

E Kolmogorov-Smirnov: simulated critical values

Given a set of n iid observations x_1, . . . , x_n, we can compute the Kolmogorov-Smirnov statistic using a cumulative distribution function F_0 : S_X → [0, 1] defined over the support S_X with the vector of parameters β (which may or may not include the location and scale factors) as

D_n = sup_{x∈S_X} |F_n(x) − F_0(x)|        (30)

where F_n : R → [0, 1] is the empirical cumulative distribution function of x_1, . . . , x_n defined as

F_n(x) = (1/n) ∑_{i=1}^{n} 1l_{x_i≤x},   x ∈ R.

To perform the KS test, we use KS_n = √n D_n, which adjusts (30) for the finite sample size.
Given the iid observations x_1, . . . , x_n, if we want to test the null hypothesis H_0 : x_1 ∼ F_0 against the alternative H_1 : x_1 ≁ F_0 without specifying the parameter values in β, the asymptotic distribution of KS_n needs to be simulated. We tabulate some of the simulated critical values of the KS statistics (D_n and KS_n) for the two cases, F_0 ∼ log-logistic and F_0 ∼ Burr, in Table 7. For the log-logistic case, we assume that ν and the scale parameter α are estimated from the data using the method of maximum likelihood. For the Burr case, we assume that ν, ζ, and α are estimated from the data. In both cases, we simulate the set of n observations 5,000 times and find the sample quantiles. The critical values in these tables may be compared with the simulated critical values for the Gaussian case in Lilliefors (1967).
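The simulation can be reproduced along the following lines (a sketch, not the code used in the paper). scipy's fisk distribution is the log-logistic; mapping its shape c and scale to the paper's (ν, α) is an assumption made here, and the parameters are re-estimated by maximum likelihood on every replication before the KS statistic is computed.

    import numpy as np
    from scipy import stats

    def simulate_ks_critical_values(n, nu, alpha, n_rep=5000,
                                    levels=(0.90, 0.95, 0.99), seed=0):
        """Simulated critical values of D_n and KS_n = sqrt(n) D_n under a
        log-logistic null whose parameters are re-fitted on each replication."""
        rng = np.random.default_rng(seed)
        d_stats = np.empty(n_rep)
        for r in range(n_rep):
            x = stats.fisk.rvs(c=nu, scale=alpha, size=n, random_state=rng)
            c_hat, _, scale_hat = stats.fisk.fit(x, floc=0)   # ML fit, location fixed at 0
            d_stats[r] = stats.kstest(x, stats.fisk(c=c_hat, scale=scale_hat).cdf).statistic
        d_crit = np.quantile(d_stats, levels)                 # critical values of D_n
        return d_crit, np.sqrt(n) * d_crit                    # and of KS_n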

^{33} In this appendix, T denotes the total sample size. Recall that, in the body of this paper before the appendices, the total sample size is denoted T × I, where T denotes the total number of sampling days and I is the sampling (aggregation) frequency.


Log-logistic:
                           D_n                            KS_n
                   Level of significance         Level of significance
Sample size (n)     0.10    0.05    0.01          0.10    0.05    0.01
       4            0.31    0.33    0.37          0.62    0.66    0.75
       5            0.29    0.31    0.34          0.64    0.69    0.75
       6            0.27    0.28    0.32          0.65    0.70    0.78
       7            0.25    0.27    0.30          0.66    0.71    0.80
       8            0.23    0.25    0.28          0.66    0.71    0.81
       9            0.22    0.24    0.27          0.67    0.72    0.82
      10            0.21    0.23    0.26          0.68    0.73    0.82
      11            0.21    0.22    0.25          0.68    0.73    0.83
      12            0.20    0.21    0.24          0.68    0.73    0.83
      13            0.19    0.20    0.24          0.68    0.74    0.85
      14            0.18    0.20    0.23          0.69    0.74    0.85
      15            0.18    0.19    0.22          0.70    0.75    0.86
      16            0.17    0.18    0.21          0.69    0.74    0.84
      17            0.17    0.18    0.21          0.69    0.75    0.87
      18            0.16    0.18    0.20          0.70    0.75    0.85
      19            0.16    0.17    0.20          0.70    0.75    0.87
      20            0.16    0.17    0.19          0.70    0.76    0.85
      25            0.14    0.15    0.18          0.70    0.77    0.88
      30            0.13    0.14    0.16          0.71    0.78    0.88
    1000            0.02    0.03    0.03          0.75    0.81    0.91
    5000            0.01    0.01    0.01          0.76    0.83    0.92
   10000            0.01    0.01    0.01          0.75    0.81    0.93
   15000            0.01    0.01    0.01          0.74    0.80    0.94
   20000            0.01    0.01    0.01          0.75    0.81    0.92

Burr:
                           D_n                            KS_n
                   Level of significance         Level of significance
Sample size (n)     0.10    0.05    0.01          0.10    0.05    0.01
    1000            0.02    0.02    0.03          0.73    0.79    0.89
    5000            0.01    0.01    0.01          0.72    0.78    0.89
   10000            0.01    0.01    0.01          0.73    0.79    0.91
   15000            0.01    0.01    0.01          0.73    0.79    0.90
   20000            0.01    0.01    0.01          0.73    0.79    0.91

Table 7 – Simulated critical values of the Kolmogorov-Smirnov statistics under the log-logistic (top) and Burr (bottom) distributions. The statistics are simulated 5,000 times. Unknown parameters estimated by the method of maximum likelihood are ν and α for the log-logistic and ν, ζ, and α for the Burr distribution.


F Estimated coefficients of other Models in Table 2

                         IBM1m                                              IBM30s
               Model 1          Model 3          Model 4           Model 1          Model 3          Model 4
               Coef.    S.E.    Coef.    S.E.    Coef.    S.E.     Coef.    S.E.    Coef.    S.E.    Coef.    S.E.
κ_µ            0.015    0.002   0.015    0.002   0.011    0.002    0.021    0.002   0.017    0.002   0.009    0.002
φ_1^(1)        0.757    0.022   0.733    0.024   0.396    0.109    0.795    0.016   0.801    0.015   0.566    0.134
φ_2^(1)        —        —       —        —       0.499    0.122    —        —       —        —       0.382    0.134
κ_η^(1)        0.099    0.005   0.098    0.005   0.050    0.014    0.119    0.006   0.121    0.005   0.054    0.009
φ_1^(2)        —        —       —        —       0.606    0.076    —        —       —        —       0.664    0.053
κ_η^(2)        —        —       —        —       0.062    0.014    —        —       —        —       0.088    0.010
κ*_1           0.001    0.002   0.003    0.009   0.003    0.008    0.001    0.003   0.001    0.008   0.002    0.006
κ*_2          -0.004    0.002  -0.011    0.004  -0.008    0.004   -0.007    0.002  -0.013    0.004  -0.006    0.003
κ*_3          -0.240    0.044  -0.008    0.004  -0.006    0.004   -0.001    0.002  -0.005    0.005  -0.003    0.004
κ*_4           —        —       0.013    0.008   0.008    0.007    —        —       0.015    0.009   0.005    0.005
κ*_5           —        —      -0.003    0.003  -0.002    0.002    —        —      -0.002    0.003  -0.001    0.002
κ*_6           —        —      -0.005    0.002  -0.004    0.002    —        —      -0.005    0.002  -0.003    0.002
κ*_7           —        —      -0.004    0.002  -0.002    0.002    —        —      -0.004    0.002  -0.001    0.002
κ*_8           —        —       0.001    0.003   0.001    0.002    —        —       0.002    0.003   0.001    0.002
κ*_9           —        —      -0.002    0.005  -0.001    0.004    —        —       0.003    0.006   0.001    0.004
κ*_10          —        —       0.002    0.005   0.002    0.005    —        —      -0.002    0.005   0.000    0.004
κ*_11          —        —       0.013    0.007   0.009    0.006    —        —       0.011    0.006   0.005    0.004
γ†_{1;1,0}     0.124    0.058   0.045    0.178   0.066    0.187    0.150    0.067   0.051    0.181   0.087    0.177
γ†_{2;1,0}    -0.455    0.066  -0.521    0.123  -0.508    0.132   -0.518    0.080  -0.570    0.157  -0.527    0.151
γ†_{3;1,0}    -0.240    0.044  -0.223    0.117  -0.218    0.125   -0.232    0.050  -0.193    0.128  -0.203    0.133
γ†_{4;1,0}     —        —       0.838    0.237   0.807    0.207    —        —       0.868    0.288   0.796    0.194
γ†_{5;1,0}     —        —      -0.011    0.064  -0.004    0.068    —        —       0.008    0.072   0.027    0.074
γ†_{6;1,0}     —        —      -0.513    0.058  -0.502    0.065    —        —      -0.539    0.066  -0.532    0.075
γ†_{7;1,0}     —        —      -0.274    0.058  -0.250    0.061    —        —      -0.275    0.065  -0.233    0.065
γ†_{8;1,0}     —        —       0.632    0.082   0.645    0.083    —        —       0.660    0.099   0.679    0.094
γ†_{9;1,0}     —        —       0.371    0.135   0.348    0.134    —        —       0.445    0.182   0.382    0.160
γ†_{10;1,0}    —        —      -0.156    0.142  -0.145    0.144    —        —      -0.239    0.146  -0.181    0.142
γ†_{11;1,0}    —        —      -0.100    0.190  -0.184    0.183    —        —      -0.072    0.202  -0.203    0.190
δ              9.849    0.180   9.786    0.139   9.757    0.133    9.340    0.214   9.219    0.188   9.177    0.174
ν              2.223    0.033   2.228    0.033   2.232    0.033    1.630    0.016   1.631    0.016   1.635    0.016
ζ              1.135    0.042   1.135    0.042   1.135    0.042    1.473    0.042   1.475    0.042   1.470    0.042
p              0.0006   0.0003  0.0006   0.0003  0.0006   0.0003   0.0047   0.0005  0.0047   0.0005  0.0047   0.0005

IBM30s, — (13 knots)
Param.      Coef.   S.E.     Param.    Coef.   S.E.     Param.        Coef.   S.E.     Param.         Coef.   S.E.     Param.         Coef.   S.E.
κ_µ         0.014   0.001    κ*_13     0.024   0.014    κ*_31        -0.007   0.006    γ†_{11;1,0}    0.284   0.132    γ†_{29;1,0}    0.477   0.220
φ_1^(1)     0.761   0.018    κ*_14    -0.003   0.004    κ*_32         0.004   0.007    γ†_{12;1,0}    0.086   0.335    γ†_{30;1,0}    0.359   0.137
φ_2^(1)     —       —        κ*_15     0.006   0.004    κ*_33         0.000   0.006    γ†_{13;1,0}    0.733   0.460    γ†_{31;1,0}   -0.013   0.165
κ_η^(1)     0.122   0.006    κ*_16    -0.001   0.003    κ*_34        -0.003   0.006    γ†_{14;1,0}    0.619   0.087    γ†_{32;1,0}   -0.491   0.168
φ_1^(2)     —       —        κ*_17     0.002   0.003    κ*_35         0.007   0.006    γ†_{15;1,0}    0.435   0.149    γ†_{33;1,0}   -0.341   0.138
κ_η^(2)     —       —        κ*_18    -0.006   0.003    κ*_36         0.000   0.006    γ†_{16;1,0}   -0.034   0.074    γ†_{34;1,0}   -0.591   0.149
κ*_1       -0.003   0.007    κ*_19    -0.002   0.003    κ*_37         0.007   0.007    γ†_{17;1,0}   -0.106   0.098    γ†_{35;1,0}   -0.238   0.199
κ*_2        0.002   0.007    κ*_20     0.000   0.003    κ*_38        -0.005   0.005    γ†_{18;1,0}   -0.354   0.099    γ†_{36;1,0}   -0.419   0.161
κ*_3        0.002   0.007    κ*_21     0.000   0.003    γ†_{1;1,0}    0.715   0.189    γ†_{19;1,0}   -0.439   0.069    γ†_{37;1,0}    0.048   0.194
κ*_4       -0.006   0.005    κ*_22    -0.006   0.003    γ†_{2;1,0}    0.434   0.176    γ†_{20;1,0}   -0.544   0.072    γ†_{38;1,0}   -0.023   0.171
κ*_5       -0.004   0.006    κ*_23     0.000   0.003    γ†_{3;1,0}    0.054   0.171    γ†_{21;1,0}   -0.551   0.073    δ              9.062   0.220
κ*_6       -0.017   0.005    κ*_24     0.003   0.003    γ†_{4;1,0}   -0.380   0.165    γ†_{22;1,0}   -0.388   0.110    ν              1.636   0.016
κ*_7       -0.007   0.005    κ*_25     0.003   0.004    γ†_{5;1,0}   -0.213   0.149    γ†_{23;1,0}   -0.177   0.073    ζ              1.477   0.043
κ*_8       -0.008   0.005    κ*_26     0.003   0.004    γ†_{6;1,0}   -0.460   0.273    γ†_{24;1,0}    0.123   0.105    p              0.005   0.000
κ*_9       -0.008   0.005    κ*_27     0.008   0.009    γ†_{7;1,0}   -0.587   0.157    γ†_{25;1,0}    0.397   0.105
κ*_10      -0.001   0.006    κ*_28    -0.007   0.006    γ†_{8;1,0}   -0.530   0.166    γ†_{26;1,0}    0.858   0.105
κ*_11      -0.003   0.007    κ*_29     0.008   0.008    γ†_{9;1,0}   -0.391   0.168    γ†_{27;1,0}    0.538   0.221
κ*_12       0.016   0.012    κ*_30    -0.002   0.006    γ†_{10;1,0}  -0.184   0.133    γ†_{28;1,0}    0.264   0.170

Table 8 – Estimated coefficients of other Models in Table 2. Standard errors are computed numerically. See Table 2 for the model specifications.
