
Robust Estimation and Inference for Jumps in Noisy

High Frequency Data: A Local-to-Continuity Theory

for the Pre-averaging Method∗

Jia Li

Department of Economics

Duke University†

This Version: September 15, 2012

Abstract

We develop an asymptotic theory for the pre-averaging estimator when asset price

jumps are weakly identified, here modeled as local to zero. The theory unifies the con-

ventional asymptotic theory for continuous and discontinuous semimartingales as two

polar cases with a continuum of local asymptotics, and explains the breakdown of the

conventional procedures under weak identification. We propose simple bias-corrected

estimators for jump power variations, and construct robust confidence sets with valid

asymptotic size in a uniform sense. The method is also robust to microstructure noise.

Keywords: Confidence set; high frequency data; jump power variation; market

microstructure noise; pre-averaging; semimartingale; uniformity.

JEL Codes: C22.

∗ This paper is a revised version of part of my Ph.D. dissertation at the department of economics, Princeton University. I am very grateful to my advisors Yacine Aït-Sahalia, Ulrich Müller and Mark Watson, as well as Jean Jacod for their guidance. I am also grateful for comments from Tim Bollerslev, Valentina Corradi, Nour Maddahi, Andrew Patton, George Tauchen and Viktor Todorov on various versions of this paper. Comments from three referees and the co-editor have vastly improved the paper. The work is partially supported by NSF Grant SES-1227448. All errors are mine.

† Durham, NC 27708. E-mail: [email protected].


1 Introduction.

This note proposes a robust method for the estimation and inference of power variations

of asset price jumps. We model the asset price as a continuous-time semimartingale and

the pth power variation of its jumps (henceforth the jump power variation) over some time

interval [0, T] is defined as $\sum_{0 \le s \le T} |\Delta J_s|^p$, where $\Delta J_s$ is the jump at time s. The jump

power variation is a pathwise analogue of the absolute moment of jumps. It naturally serves

as a measure for the jump risk, and can be used for estimating parameters governing the

jump process (Aït-Sahalia (2004), Todorov and Bollerslev (2010)), as well as constructing

nonparametric specification tests related to jumps (Aït-Sahalia and Jacod (2011)). Dis-

entangling the jump power variation, or functionals of the jump component in general, is

nontrivial because jumps are convoluted with the drift and the diffusive parts of the price

process. This task is further confounded by the presence of microstructure noise (Andersen

et al. (2006)). To the best of our knowledge, the pre-averaging method proposed by Jacod

et al. (2010), hereafter denoted JPV, is the only method available in the current literature

for the estimation and inference of jump characteristics that is robust to noise. However,

the asymptotic theory of JPV does not provide a satisfactory finite-sample approximation

in the presence of jumps, as documented by Aït-Sahalia et al. (2012), henceforth AJL.

In this note, we examine the asymptotic properties of the pre-averaging estimator when

jumps are weakly identified, or “small”, here modeled as local to zero. As hinted in the title,

we label this local asymptotic setting as “local-to-continuity”. While the standard theory of

JPV describes very distinct asymptotic behaviors of the pre-averaging estimator depending

on whether jumps are present or not, our results provide a continuum of local asymptotics

which bridges their results as two polar cases. Our theory explains the breakdown of the

standard method when jumps are weakly identified. Constructively, we propose a simple

bias correction for the pre-averaging estimator; we also propose robust confidence sets (CS)

for jump power variations, which have valid asymptotic coverage uniformly against “possibly

small jumps”. The results are nonparametric in nature, are valid for almost unrestricted

semimartingales, and are robust to microstructure noise. Monte Carlo evidence strongly

supports our theoretical findings.

Our contribution is twofold. Firstly, our local asymptotic theory for the pre-averaging

estimator is novel. Secondly, to the best of our knowledge, the robust CS and the associated

uniformity result is the first example of this kind for discretely sampled semimartingales.

More generally, we believe that the local-to-continuity approach can be extended to many


other applications for studying asset price jumps based on high frequency data.

We now discuss the related literature. We analyze the estimators of JPV and AJL

under the local-to-continuity setting and propose a robustification for these estimators. The

inference problem considered here, i.e. constructing CS’s for jump power variations, is more

general than the testing problem of AJL. Prior works on noise-robust estimation for high

frequency data, see e.g. Barndorff-Nielsen et al. (2008) and references therein, typically

assume away jumps or treat jumps as a nuisance, and hence have a quite different focus

than here. Jumps are now known to be prevalent in financial data and have been actively

studied in financial econometrics, see Aït-Sahalia and Jacod (2011) for a recent survey.

The insight that local asymptotics often provides a deeper understanding of the finite-

sample behavior of statistical procedures is now well recognized in econometrics. Main

examples include the local-to-unity literature, see e.g. Phillips (1987), as well as the weak

identification literature, see Staiger and Stock (1997) and Stock and Wright (2000). Furthermore,
local asymptotics have been shown to play a crucial role for constructing uniformly

valid inference procedures, see Mikusheva (2007), Andrews and Cheng (Forthcoming) and

references therein. Our approach here is clearly inspired by the above literature, but it is

distinct from prior works because of the nonstandard nature of the fill-in asymptotics for

semimartingale models.

The note is organized as follows. Section 2 presents the model and the pre-averaging

estimator. Section 3 presents the theory. Section 4 concludes. Technical details are collected

in the Appendix, where we present the regularity conditions and construct an estimator for

the asymptotic variance. The web supplement of this note contains all proofs and simulation

results.

2 The setting.

2.1 The underlying process.

The underlying process is a one-dimensional Itô semimartingale on a filtered space

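$(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \ge 0}, \mathbb{P})$ with the form

$$X^{\eta}_t = X_0 + \int_0^t b_s\,ds + \int_0^t \sigma_s\,dW_s + J_t, \quad \text{where} \tag{1}$$

$$J_t = \eta \left( \int_0^t\!\int_E \delta(s,z)\,1_{\{|\delta(s,z)| \le 1\}}\,\widetilde{\mu}(ds,dz) + \int_0^t\!\int_E \delta(s,z)\,1_{\{|\delta(s,z)| > 1\}}\,\mu(ds,dz) \right),$$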


X0 is an F0-measurable random variable, W is a Brownian motion, σ is the stochastic

volatility process taking values in (0,∞) almost surely, η ∈ [0, 1] is a constant, δ is a

predictable function, µ is a Poisson random measure on $\mathbb{R}_+ \times E$ and its compensator is
$\nu(dt, dz) = dt \otimes \lambda(dz)$, where $(E, \mathcal{E})$ is an auxiliary space and λ is a σ-finite measure, and
$\widetilde{\mu} = \mu - \nu$. In typical financial econometrics applications, the underlying process represents
the logarithm of an asset price sampled at regularly spaced discrete times $i\Delta_n$, $i \ge 0$, over

a fixed time interval [0, T ], with the time lag ∆n → 0 asymptotically.

On the right-hand side of (1), the component $X_0 + \int_0^t b_s\,ds + \int_0^t \sigma_s\,dW_s$ is a continuous Itô

semimartingale, and Jt is a purely discontinuous process which can be completely charac-

terized by its jumps. In the sequel, we refer to these two components as the continuous part

and the jump part, respectively. We use the parameter η to control the scale of the jumps.

This parameter plays an important role in our asymptotic theory. To separate η from other

modeling components in (1), we introduce an auxiliary process X by setting $X_t = X^1_t$.¹ In

particular, we have

∆J = η∆X, (2)

where for any càdlàg (i.e., right-continuous with left limits) process Y, the process ∆Y is

defined as ∆Yt = Yt − Yt−, t ≥ 0. The fixed jump process ∆X can be thought of as the

“direction” in which ∆J deviates from zero, and η quantifies this deviation.

We stress that we are interested in the jumps of the underlying process, i.e. ∆J , rather

than η and ∆X separately. Indeed, while ∆J is identifiable upon observing the underlying

process in continuous time, η and ∆X cannot be identified separately because of (2). We
hence keep the dependence of J on η implicit in our notation.² Nevertheless, introducing

the scaling parameter η is useful for considering local asymptotics under a drifting sequence

of data generating processes. In Section 3, we derive asymptotic properties of the pre-averaging
estimator for a drifting sequence ηn while keeping other coefficients (i.e. b, σ, δ,
µ) fixed. When ηn → 0, the jump process ∆J = ηn∆X converges to zero asymptotically,
capturing the idea that jumps are “small” in an asymptotic sense. Moreover, the rate at
which ηn vanishes describes how “small” the jumps are, and not surprisingly, it plays

an important role in the limiting theorems. Generally speaking, one could think of letting

¹ We note that X is the standard model for high frequency data and has been widely studied in the literature, see e.g. Jacod (2008), Aït-Sahalia and Jacod (2011), Bollerslev and Todorov (2011), Todorov and Tauchen (2012) and in particular JPV.

² One may find it notationally more consistent to write $J^{\eta}$ in place of J in order to emphasize its dependence on η; we suppress the superscript here to avoid the somewhat awkward notation $\sum_{s \le T} |\Delta J^{\eta}_s|^p$ for the pth jump power variation.


J drift to zero as an element in the space of stochastic processes, instead of being governed by
the scaling sequence ηn. Our formulation here is only a special case, adopted largely due to
conceptual simplicity and technical convenience.³

To make the exposition as simple as possible, we present and discuss the regularity

conditions on the underlying process in Appendix A. Here, we only point out that the

assumptions are fairly standard and unrestrictive, allowing for stochastic volatility, jumps

of finite or infinite activity, and all manners of dependence between the characteristics of

the underlying process.

2.2 The noise.

We suppose that the underlying process can only be observed with an error: instead of $X^{\eta}_t$,
we observe

$$Z_t = X^{\eta}_t + \chi_t. \tag{3}$$

The error term χt is typically referred to as the “market microstructure noise”, which mainly

includes, but is not limited to, the bid-ask spread. We denote the conditional volatility of

the noise by $\alpha_t = \sqrt{\mathbb{E}[\chi_t^2 \mid \mathcal{F}^{(0)}_t]}$, where $(\mathcal{F}^{(0)}_t)_{t \ge 0}$ is a filtration to which all processes in (1)

are adapted. The formal construction of the noise model and regularity conditions on the

noise are given in Appendix A.
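To make the observation scheme in (1) to (3) concrete, the following is a minimal simulation sketch. It is only an illustration under strong simplifying assumptions that are not part of the paper: zero drift, constant volatility, compound-Poisson jumps with Gaussian sizes (approximated by at most one jump per sampling interval), and i.i.d. Gaussian noise. The function name and all parameter values are hypothetical.

```python
import math
import random

def simulate_noisy_path(n, eta=1.0, T=1.0, sigma=0.2, jump_rate=5.0,
                        jump_sd=0.5, noise_sd=0.005, seed=0):
    """Return (x, z): a latent path X^eta and noisy observations Z on the grid i*T/n.

    All modeling choices here (constant sigma, Gaussian jump sizes, Gaussian
    noise) are illustrative assumptions, not the general setting of the paper.
    """
    rng = random.Random(seed)
    dt = T / n
    x = [0.0]
    for _ in range(n):
        # Diffusive increment sigma * dW with zero drift.
        dx = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        # At most one jump per interval, with probability jump_rate * dt.
        if rng.random() < jump_rate * dt:
            # The jump of the auxiliary process is scaled by eta, as in (2).
            dx += eta * rng.gauss(0.0, jump_sd)
        x.append(x[-1] + dx)
    # Additive noise as in (3); i.i.d. Gaussian is one special case.
    z = [xi + noise_sd * rng.gauss(0.0, 1.0) for xi in x]
    return x, z
```

Because the random draws do not depend on `eta`, paths generated with the same seed share the same continuous part, so the jump part scales linearly in `eta`, mirroring ∆J = η∆X.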

The additive noise model considered here is standard in the literature, although the

specific assumption on the noise varies.⁴ Following Jacod et al. (2009), JPV and AJL, our

key assumption on the noise is that conditionally on the underlying process, the noise is

serially independent with mean zero. This assumption excludes some empirical features of

the noise that have been considered in the literature: we rule out the correlation between the

noise and the underlying process and the serial correlation of the noise process itself (Hansen

and Lunde (2006)), as well as the pure rounding case considered by Li and Mykland (2007).

This being said, we note that our assumption on noise is, to the best of our knowledge, the

most general setup known in the literature for studying the inference problem on jumps. In

³ I wish to thank an anonymous referee for interesting comments on alternative formulations of the local asymptotic embedding.

⁴ For various assumptions considered in the literature, see Zhou (1996), Zhang et al. (2005), Zhang (2006), Bandi and Russell (2006), Kalnina and Linton (2008), Barndorff-Nielsen et al. (2008, 2011), Aït-Sahalia et al. (2011). All these papers consider the case of estimating the quadratic variation and covariation, mostly under the setting without jumps. Hence, there is only minor overlap between these papers and the current paper, which is concerned with a general setting with jumps and functionals beyond the quadratic variation.


particular, we do allow the noise process to have stochastic conditional heteroskedasticity ($\alpha_t$
is a stochastic process), unconditional serial dependence and dependence on the underlying
process through higher moments. We also allow the situation with “smooth rounding” as

discussed in Jacod et al. (2009).

2.3 Pre-averaging.

We now introduce the pre-averaging estimator proposed by JPV. The pre-averaging esti-

mator can be used to estimate jump power variations or integrated volatility functionals

depending on whether jumps are present or not. In order to define the pre-averaging window,
we choose a sequence of integers $k_n$ satisfying $k_n \sqrt{\Delta_n} = \theta + o(\Delta_n^{1/4})$ for some θ > 0.
Moreover, pre-averaging involves weighting the observations in the pre-averaging window.
In this paper, a function $g : \mathbb{R} \to \mathbb{R}_+$ is called a weight function if it is continuous, piecewise
$C^1$ with a piecewise Lipschitz derivative $g'$, supported on [0, 1] and satisfies $\int g(s)^2\,ds > 0$.
For q > 0, we denote $\bar g(q) = \int |g(s)|^q\,ds$ and $\bar g'(q) = \int |g'(s)|^q\,ds$.

With any process Y = (Yt)t≥0 and weight function g, we associate the following variables.

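For each integer i, we denote $g^n_i = g(i/k_n)$, $g'^{n}_i = g^n_i - g^n_{i-1}$, and $\Delta^n_i Y = Y_{i\Delta_n} - Y_{(i-1)\Delta_n}$.⁵
We then set

$$\bar Y(g)^n_i = \sum_{j=1}^{k_n - 1} g^n_j\,\Delta^n_{i+j} Y, \qquad \hat Y(g)^n_i = \sum_{j=1}^{k_n} \left( g'^{n}_j\,\Delta^n_{i+j} Y \right)^2.$$

We also define

$$V(Y, g, q, l)^n_t = \sum_{i=0}^{\lfloor t/\Delta_n \rfloor - k_n} \left| \bar Y(g)^n_i \right|^q \left| \hat Y(g)^n_i \right|^l \tag{4}$$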

with $\lfloor \cdot \rfloor$ denoting the floor function (the largest integer not exceeding its argument).
Let p ≥ 2 be an even integer throughout the rest of this paper. We define $(\rho(p)_j)_{j=0,\ldots,p/2}$
as the unique numbers solving the following triangular system of linear equations:

$$\rho(p)_0 = 1, \qquad \sum_{l=0}^{j} 2^l\, m_{2j-2l}\, C^{p-2j}_{p-2l}\, \rho(p)_l = 0, \quad j = 1, 2, \ldots, p/2,$$

where $m_q$ denotes the qth absolute moment of the law N(0, 1), and for integers x and y,
$C^y_x = x!/(y!\,(x-y)!)$.
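Since the system above is triangular, the constants can be obtained by forward substitution. The sketch below is one possible way to do this in exact rational arithmetic; the function names are hypothetical, and the only facts used beyond the text are that $m_0 = 1$ and $m_q = (q-1)!!$ for even q.

```python
from fractions import Fraction
from math import comb

def abs_moment(q):
    """q-th absolute moment of N(0, 1) for even q >= 0, i.e. (q - 1)!! with m_0 = 1."""
    if q < 0 or q % 2 != 0:
        raise ValueError("q must be a nonnegative even integer")
    result = 1
    for k in range(1, q, 2):
        result *= k
    return result

def rho_coefficients(p):
    """Solve rho_0 = 1 and sum_{l<=j} 2^l m_{2j-2l} C(p-2l, p-2j) rho_l = 0, j = 1..p/2."""
    if p < 2 or p % 2 != 0:
        raise ValueError("p must be an even integer >= 2")
    rho = [Fraction(1)]
    for j in range(1, p // 2 + 1):
        # Contribution of the already known rho_0, ..., rho_{j-1}.
        partial = sum(
            Fraction(2 ** l * abs_moment(2 * j - 2 * l) * comb(p - 2 * l, p - 2 * j)) * rho[l]
            for l in range(j)
        )
        # The coefficient of rho_j is 2^j * m_0 * C(p-2j, p-2j) = 2^j.
        rho.append(-partial / 2 ** j)
    return rho
```

For instance, `rho_coefficients(2)` gives 1 and −1/2, and `rho_coefficients(4)` gives 1, −3 and 3/4.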

⁵ The notation $\Delta^n_i Y$ indicates that the increments form a triangular array; it is not meant to denote the nth difference.


Finally, for any process Y , the pre-averaging estimator is defined to be

$$V(Y, g, p)^n_t = \sum_{l=0}^{p/2} \rho(p)_l\, V(Y, g, p - 2l, l)^n_t.$$

In the above display, the term $V(Y, g, p, 0)^n_t$ serves as the leading term. The other terms
$V(Y, g, p - 2l, l)^n_t$, l = 1, . . . , p/2, when weighted by the constants $\rho(p)_l$, correct the bias
arising from noise in the leading term.
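The definitions in this subsection can be sketched in a few lines of code. The function below is a direct, unoptimized transcription of the weighted sums and of (4), assuming the observations are supplied as a plain list on the sampling grid and that the constants ρ(p)_l are passed in by the caller; the function and argument names are hypothetical.

```python
def preaveraging_estimator(y, kn, g, rho):
    """Pre-averaging estimator from observations y[0..N] (a sketch, not optimized).

    y   : observations Y_0, Y_{Delta_n}, ..., Y_{N * Delta_n}
    kn  : pre-averaging window length k_n
    g   : weight function on [0, 1]
    rho : constants rho(p)_0, ..., rho(p)_{p/2}, so that p = 2 * (len(rho) - 1)
    """
    p = 2 * (len(rho) - 1)
    n_obs = len(y) - 1
    inc = [y[m] - y[m - 1] for m in range(1, n_obs + 1)]   # inc[m-1] is the m-th increment
    w = [g(j / kn) for j in range(kn + 1)]                 # w[j] = g(j / k_n)
    dw = [w[j] - w[j - 1] for j in range(1, kn + 1)]       # dw[j-1] = g(j/k_n) - g((j-1)/k_n)
    total = 0.0
    for i in range(n_obs - kn + 1):                        # i = 0, ..., N - k_n
        # Weighted sum of increments over j = 1, ..., k_n - 1.
        ybar = sum(w[j] * inc[i + j - 1] for j in range(1, kn))
        # Sum of squared weighted increments over j = 1, ..., k_n.
        yhat = sum((dw[j - 1] * inc[i + j - 1]) ** 2 for j in range(1, kn + 1))
        # Combine the terms |ybar|^(p-2l) * yhat^l with weights rho[l].
        total += sum(rho[l] * abs(ybar) ** (p - 2 * l) * yhat ** l
                     for l in range(len(rho)))
    return total
```

A production implementation would typically vectorize the inner sums; the explicit loops are kept here so that each line can be matched against the displayed formulas.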

3 The local-to-continuity asymptotics.

3.1 The law of large numbers.

We start with the law of large numbers (LLN) of the pre-averaging estimator $V(Z, g, p)^n_t$
under a drifting sequence of models governed by $\eta_n$; we use $\overset{\mathbb{P},\eta_n}{\longrightarrow}$ to indicate the convergence
in probability under such a sequence. The normalizing factor in the LLN is given by

$$d_n = \frac{\Delta_n^{1 - p/4}}{1 + (\Delta_n^{-r^*} \eta_n)^p}, \quad \text{where } r^* = \frac{p - 2}{4p}.$$

It is chosen to ensure that the probability limit is nondegenerate in all scenarios. We also

set

$$V(g, p)_t = m_p\,(\theta\,\bar g(2))^{p/2} \int_0^t \sigma_s^p\,ds, \qquad U(g, p)_t = \theta\,\bar g(p) \sum_{s \le t} |\Delta X_s|^p; \tag{5}$$

it is helpful to recall that the price jump ∆J is related to the auxiliary process ∆X by (2).

Before stating the LLN, we remind the reader that the exact statements of the assump-

tions are collected in Appendix A, and all proofs are in the web supplement of this paper.

In the sequel, for any function $f : \mathbb{R}_+ \to \mathbb{R}$, $f(\infty)$ should be understood as $\lim_{h \to \infty} f(h)$
provided that the limit exists; for any strictly positive sequences of real numbers $x_n$ and $y_n$,
we denote $x_n \sim y_n$ iff $\lim_{n \to \infty} x_n / y_n = 1$.

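Theorem 1 (LLN) Suppose that Assumptions (H-2) and (N) hold. Let $(\eta_n)_{n \ge 1} \subset [0, 1]$ be
a sequence satisfying $\Delta_n^{-r^*} \eta_n \to h$ for some $h \in [0, \infty]$. Then

$$d_n\, V(Z, g, p)^n_t \overset{\mathbb{P},\eta_n}{\longrightarrow} \frac{1}{1 + h^p}\, V(g, p)_t + \frac{h^p}{1 + h^p}\, U(g, p)_t. \tag{6}$$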

Comments. (i) First consider the special case in which ηn vanishes at a polynomial rate,

i.e., $\eta_n = \Delta_n^r$ for some r ≥ 0. If r = r∗, the condition of Theorem 1 is satisfied with h = 1.
The normalizing factor is $d_n = \Delta_n^{1 - p/4}/2$; the limiting variable is $(V(g, p)_t + U(g, p)_t)/2$,
the average between the contribution from the continuous part and the contribution from
the jump part, measured by $V(g, p)_t$ and $U(g, p)_t$ respectively. When r > r∗ (resp. r < r∗),
the theorem can be applied with h = 0 (resp. h = ∞), and the limit is $V(g, p)_t$ (resp.
$U(g, p)_t$). In other words, when jumps are “small” (resp. “large”), the continuous part
(resp. the jump part) dominates the first-order asymptotic behavior of the estimator. The
constant r∗ is precisely the critical vanishing rate which balances the contributions of the
continuous part and the jump part in the LLN. The theorem also allows ηn to drift at non-polynomial
rates. For example, if $\eta_n = \Delta_n^{r^*} \log(1/\Delta_n)$, the theorem can be applied with
h = ∞.

(ii) The normalizing factor depends on ηn in the following way: when $h \in [0, \infty)$,
$d_n \sim \Delta_n^{1 - p/4}/(1 + h^p)$; when h = ∞, $d_n \sim \Delta_n^{1/2} \eta_n^{-p} \sim \theta k_n^{-1} \eta_n^{-p}$.

(iii) The standard asymptotic theory of JPV under the non-local model can be recovered

by properly setting the sequence ηn. We first consider the case with jumps by taking ηn ≡ 1,

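so h = 1 when p = 2 and h = ∞ when p ≥ 4. In view of comment (ii), (6) can be written as

$$\begin{cases} \Delta_n^{1/2}\, V(Z, g, p)^n_t \overset{\mathbb{P}}{\longrightarrow} V(g, p)_t + U(g, p)_t & \text{when } p = 2, \\ \Delta_n^{1/2}\, V(Z, g, p)^n_t \overset{\mathbb{P}}{\longrightarrow} U(g, p)_t & \text{when } p \ge 4. \end{cases} \tag{7}$$

Up to a multiplicative constant, (7) coincides with Theorem 3.4(a) of JPV.⁶ In the absence
of jumps, we can take ηn ≡ 0 and simplify (6) as

$$\Delta_n^{1 - p/4}\, V(Z, g, p)^n_t \overset{\mathbb{P}}{\longrightarrow} V(g, p)_t.$$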

This recovers Theorem 3.4(b) of JPV.

⁶ To be precise, Theorem 3.4(a) in JPV allows X to be an arbitrary semimartingale and does not require p to be an even integer when p > 2. Hence, we only recover special cases of their results. However, the loss of generality is inevitable here, because of our purpose of unifying the limiting theory for both continuous and discontinuous cases. Indeed, in the continuous case (Theorem 3.4(b)), JPV also require p to be an even integer.


(iv) The parameter h is a reparametrization of the scaling sequence ηn, which measures

the strength of the jump signal. As h increases from 0 to ∞, the limiting variable in (6)
shifts continuously from the limit in the standard continuous case to the limit in the

standard jump case. In other words, Theorem 1 bridges the standard asymptotic results as

polar cases with a continuum of local asymptotics.

(v) Finally, we discuss the role of the smoothing parameter θ. Fix some h ∈ (0,∞).

In this case, both the continuous part and the jump part have nonnegligible contributions

in the LLN, and their relative contribution can be measured by $V(g, p)_t / U(g, p)_t$. Other
things being equal, this ratio is proportional to $\theta^{p/2 - 1}$. When p > 2, $V(g, p)_t / U(g, p)_t$
increases in θ, suggesting that the relative strength of the jump signal decreases when we

over-smooth the data (i.e. θ and kn are large). When p = 2, this intuition is no longer true,

because the relative strength is invariant to θ.
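The rate calculations in comments (i) to (iv) are easy to check numerically. The following small sketch evaluates the critical rate r∗, the normalizing factor d_n and the two mixing weights on the right-hand side of (6); the function names are hypothetical and the code only restates the displayed formulas.

```python
import math

def r_star(p):
    """Critical vanishing rate r* = (p - 2) / (4p) of the LLN."""
    return (p - 2) / (4 * p)

def lln_normalizer(delta_n, eta_n, p):
    """Normalizing factor d_n = Delta_n^(1 - p/4) / (1 + (Delta_n^(-r*) eta_n)^p)."""
    return delta_n ** (1 - p / 4) / (1 + (delta_n ** (-r_star(p)) * eta_n) ** p)

def lln_limit_weights(h, p):
    """Weights placed on V(g, p)_t and U(g, p)_t in the limit (6); h may be math.inf."""
    if math.isinf(h):
        return 0.0, 1.0
    return 1 / (1 + h ** p), h ** p / (1 + h ** p)
```

With ηn ≡ 1 and p = 4, the ratio of `lln_normalizer(delta_n, 1.0, 4)` to `math.sqrt(delta_n)` approaches one as `delta_n` shrinks, in line with comment (ii); with ηn ≡ 0 the factor reduces to `delta_n ** (1 - p / 4)`.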

Theorem 1 has an important implication for estimating jump power variations
$\sum_{s \le t} |\Delta J_s|^p$ with p ≥ 4. It suggests that the pre-averaging estimator tends to overestimate

the jump power variation due to the continuous part in the price process; this source of over-

estimation is invisible in the standard asymptotics, see (7), but emerges naturally in the

local asymptotics. The theorem also suggests a simple procedure for correcting the higher-order
bias.⁷ Observing from (5) that $V(g, p)_t$ and $U(g, p)_t$ depend on the weight function g
in distinct ways when p > 2, we propose correcting the bias via multiple weight functions.
We consider d ≥ 2 weight functions $(g_i)_{1 \le i \le d}$ and a constant d-vector $\kappa = (\kappa_i)_{1 \le i \le d}$ such
that

$$\sum_{i=1}^{d} \kappa_i\, \bar g_i(2)^{p/2} = 0, \qquad \theta \sum_{i=1}^{d} \kappa_i\, \bar g_i(p) = 1. \tag{8}$$

The bias-corrected estimator is given by

$$H^n_t = \Delta_n^{1/2} \sum_{i=1}^{d} \kappa_i\, V(Z, g_i, p)^n_t. \tag{9}$$

Below, we compare the bias-corrected and the uncorrected estimators for the jump power

variation. For any nonrandom sequence $b_n > 0$, we denote by $o_{p,\eta_n}(b_n)$ a generic sequence
of variables $\xi_n$ which satisfies $\xi_n / b_n \overset{\mathbb{P},\eta_n}{\longrightarrow} 0$.

⁷ As pointed out by an anonymous referee, the term “bias” here is somewhat imprecisely used. Nevertheless, for the ease of discussion, we abuse the terminology slightly by using “bias” to refer to the first term on the right-hand side of (6). This term is nonzero only if h < ∞, implying ηn → 0 when p > 2. Hence, our discussion on “bias” does not directly speak to asymptotic results in the standard nonlocal setting.


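Corollary 1 Suppose that the same conditions as in Theorem 1 hold for some $h \in (0, \infty)$.
Let p ≥ 4 be an even integer and $\widetilde H^n_t = \Delta_n^{1/2} (\theta\,\bar g(p))^{-1}\, V(Z, g, p)^n_t$ for some weight function
g. Then we have

$$H^n_t = \sum_{s \le t} |\Delta J_s|^p + o_{p,\eta_n}(\eta_n^p),$$

$$\widetilde H^n_t = \sum_{s \le t} |\Delta J_s|^p + \frac{\eta_n^p\, V(g, p)_t}{h^p\, \theta\, \bar g(p)} + o_{p,\eta_n}(\eta_n^p).$$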

Comments. (i) Corollary 1 shows that in the borderline case (0 < h < ∞), $H^n_t$ is a valid
estimator for the jump power variation. It is valid in the sense that the estimation error
is asymptotically negligible relative to the estimand, even though the estimand vanishes
asymptotically; recall from (2) that $\sum_{s \le t} |\Delta J_s|^p = \eta_n^p \sum_{s \le t} |\Delta X_s|^p$, where the auxiliary
process ∆X is fixed. In contrast, the uncorrected estimator $\widetilde H^n_t$ is invalid in the same sense,
because it carries a positive estimation error which has the same order of magnitude as the
estimand.

(ii) It can be shown that the assertion of Corollary 1 also holds for h = ∞, with the
term $\eta_n^p V(g, p)_t / (h^p \theta\, \bar g(p))$ in the assertion understood as $o_{p,\eta_n}(\eta_n^p)$. Hence, when h = ∞,
$H^n_t$ and $\widetilde H^n_t$ are both “valid” estimators of the jump power variation in the same sense as
in comment (i). In particular, under the standard asymptotics (so ηn ≡ 1), both estimators
are consistent.

(iii) A convenient choice of the weight functions and the constants κi is the following.

For any weight function $g_1$, if we take $g_2(x) = g_1(kx)$ for some k > 1, then $\bar g_1(q) = k\, \bar g_2(q)$
for any q > 0, so (8) is satisfied with $\kappa_1 = -1/(\theta\, \bar g_1(p)(k^{p/2 - 1} - 1))$ and $\kappa_2 = -k^{p/2} \kappa_1$.
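The two-function choice in comment (iii) can be verified numerically. The sketch below assumes, purely for illustration, the triangular weight function g₁(x) = min(x, 1 − x) on [0, 1] (set to zero elsewhere) and approximates the integrals of |g|^q by a midpoint rule; the helper names and the grid size are hypothetical.

```python
def tent(x):
    """Illustrative weight function: min(x, 1 - x) on [0, 1], zero elsewhere."""
    return max(0.0, min(x, 1.0 - x))

def integral_abs_power(g, q, n=20000):
    """Midpoint-rule approximation of the integral of |g(s)|^q over [0, 1]."""
    return sum(abs(g((i + 0.5) / n)) ** q for i in range(n)) / n

def two_function_weights(g1, p, theta, k=2.0):
    """Return g2(x) = g1(k x) and (kappa_1, kappa_2) as given in comment (iii)."""
    def g2(x):
        return g1(k * x)
    kappa1 = -1.0 / (theta * integral_abs_power(g1, p) * (k ** (p / 2 - 1) - 1))
    kappa2 = -(k ** (p / 2)) * kappa1
    return g2, (kappa1, kappa2)
```

For p = 4, θ = 0.5 and k = 2 this returns approximately κ₁ = −160 and κ₂ = 640, and plugging these values into the two constraints in (8) gives 0 and 1 up to quadrature error.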

3.2 The central limit theorem.

We now describe the central limit theorem (CLT) of the pre-averaging estimator under the

local-to-continuity asymptotics. For a sequence of random variables $\xi_n$ defined on the probability
space $(\Omega, \mathcal{F}, \mathbb{P})$, we write $\xi_n \overset{\mathcal{L}\text{-}s,\eta_n}{\longrightarrow} \mathcal{MN}(0, \Sigma_\xi)$ if $\xi_n$ converges stably in law⁸ under the
drifting sequence $\eta_n$ to a random variable defined on an extension of the original probability
space and which, conditionally on $\mathcal{F}$, has an $N(0, \Sigma_\xi)$ distribution. If $\Sigma_\xi$ is nonrandom, we

⁸ Stable convergence in law is slightly stronger than the usual notion of weak convergence. We need this stronger mode of convergence for inferential purposes, because the asymptotic variance here is random. See Barndorff-Nielsen et al. (2008) or Jacod and Shiryaev (2003) for detailed discussions.


write N (0,Σξ) in place of MN (0,Σξ). For applications, it is useful to consider the joint

convergence of estimators associated with multiple weight functions (gi)1≤i≤d for d ≥ 1,

which is our goal below.

We start by specifying the asymptotic variance. Consider two independent Brownian motions
$W^1$ and $W^2$, given on another auxiliary filtered probability space $(\Omega', \mathcal{F}', (\mathcal{F}'_t)_{t \ge 0}, \mathbb{P}')$.
For generic weight functions g and h, we define the following Wiener integral processes

$$L(g)_t = \int g(s - t)\,dW^1_s, \qquad L'(g)_t = \int g'(s - t)\,dW^2_s,$$

and L(h) and L′(h) are defined likewise with h instead of g, with the same $W^1$ and $W^2$.
The four dimensional process (L(g), L′(g), L(h), L′(h)) is continuous stationary centered
Gaussian. We then set for x, y ∈ R and q, q′ even integers:

$$m_q(g; x, y) = \mathbb{E}'\left[ (x L(g)_1 + y L'(g)_1)^q \right],$$

$$m_{q,q'}(g, h; x, y)_t = \mathbb{E}'\left[ (x L(g)_1 + y L'(g)_1)^q\, (x L(h)_t + y L'(h)_t)^{q'} \right],$$

$$\mu(g, h; x, y) = \sum_{r, r' = 0}^{p/2} \rho(p)_r\, \rho(p)_{r'}\, (2 y^2 \bar g'(2))^{r}\, (2 y^2 \bar h'(2))^{r'} \int_0^2 \left( m_{p - 2r, p - 2r'}(g, h; x, y)_t - m_{p - 2r}(g; x, y)\, m_{p - 2r'}(h; x, y) \right) dt.$$

The covariance matrix associated with the continuous part is a d × d positive semi-definite
matrix $\Sigma_C$ with entries

$$\Sigma_C^{ij} = \theta^{1 - p} \int_0^t \mu(g_i, g_j; \theta \sigma_s, \alpha_s)\,ds.$$

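We also consider four d × d positive semi-definite matrices $\Psi_-$, $\Psi_+$, $\Psi'_-$, $\Psi'_+$ with entries

$$\Psi^{ij}_{\pm} = \int_0^1 \Gamma(\pm, g_i)_t\, \Gamma(\pm, g_j)_t\,dt, \qquad \Psi'^{ij}_{\pm} = \int_0^1 \Gamma'(\pm, g_i)_t\, \Gamma'(\pm, g_j)_t\,dt,$$

where for any weight function g,

$$\Gamma(-, g)_t = \int_t^1 g(s)^{p-1} g(s - t)\,ds, \qquad \Gamma'(-, g)_t = \int_t^1 g(s)^{p-1} g'(s - t)\,ds,$$

$$\Gamma(+, g)_t = \int_0^{1-t} g(s)^{p-1} g(s + t)\,ds, \qquad \Gamma'(+, g)_t = \int_0^{1-t} g(s)^{p-1} g'(s + t)\,ds.$$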

The covariance matrix associated with the jump part is the d× d matrix given by

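$$\Sigma_J = \theta^2 p^2 \sum_{s \le t} |\Delta X_s|^{2p - 2} \left( \theta \sigma_{s-}^2 \Psi_- + \theta \sigma_s^2 \Psi_+ + \frac{\alpha_{s-}^2}{\theta} \Psi'_- + \frac{\alpha_s^2}{\theta} \Psi'_+ \right).$$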


For any sequence $(\eta_n)_{n \ge 1} \subset [0, 1]$, the normalizing factor of the CLT is given by:

$$a_n = \frac{\Delta_n^{3/4 - p/4}}{1 + (\Delta_n^{-r} \eta_n)^{p-1}}, \quad \text{where } r = \frac{p - 2}{4(p - 1)}.$$

The normalizing factor depends on ηn and is chosen to avoid degenerate limits. Below, we
describe the joint stable convergence in law of the centered and scaled variables

$$V(g_i, p)^n_t = a_n \left( V(Z, g_i, p)^n_t - \Delta_n^{p/4 - 1}\, V(g_i, p)_t - \Delta_n^{-1/2}\, \theta\, \bar g_i(p) \sum_{s \le t} |\Delta J_s|^p \right), \quad 1 \le i \le d,$$

where the centering variable is motivated by Theorem 1.

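Theorem 2 (CLT) Suppose that Assumptions (H-1), (K) and (N) hold. Let $(\eta_n)_{n \ge 1} \subset [0, 1]$
be a sequence such that $\Delta_n^{-r} \eta_n \to h$ for some $h \in [0, \infty]$. Then

$$\left( V(g_i, p)^n_t \right)_{1 \le i \le d} \overset{\mathcal{L}\text{-}s,\eta_n}{\longrightarrow} \mathcal{MN}(0, \Sigma),$$

where

$$\Sigma = \frac{1}{(1 + h^{p-1})^2}\, \Sigma_C + \left( \frac{h^{p-1}}{1 + h^{p-1}} \right)^2 \Sigma_J.$$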

Comments. (i) The constant r is the critical vanishing rate at which the continuous

part and the jump part are balanced in the CLT. When $\eta_n \sim \Delta_n^r$, both the continuous
part and the jump part have nonnegligible contributions to the asymptotic variance. When
$\Delta_n^{-r} \eta_n \to h = 0$ (resp. h = ∞), the continuous part (resp. jump part) dominates in the
asymptotic variance.

(ii) If the underlying process is assumed to be continuous, we can set ηn ≡ 0 and simplify

the statement of Theorem 2 as

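$$\Delta_n^{-1/4} \left( \Delta_n^{1 - p/4}\, V(Z, g_i, p)^n_t - V(g_i, p)_t \right)_{1 \le i \le d} \overset{\mathcal{L}\text{-}s}{\longrightarrow} \mathcal{MN}(0, \Sigma_C).$$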

This result recovers Theorem 4.1 of JPV and justifies the interpretation that ΣC is con-

tributed by the continuous part.

(iii) The standard asymptotic result of JPV in the jump case can be recovered by setting

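ηn ≡ 1. In particular, when p ≥ 4 (so r > 0 and h = ∞), the result of the theorem can be
simplified as

$$\Delta_n^{-1/4} \left( \Delta_n^{1/2}\, V(Z, g_i, p)^n_t - \theta\, \bar g_i(p) \sum_{s \le t} |\Delta J_s|^p \right)_{1 \le i \le d} \overset{\mathcal{L}\text{-}s}{\longrightarrow} \mathcal{MN}(0, \Sigma_J).$$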

Hence, the standard asymptotic theory not only ignores the higher-order bias in the LLN,

but also understates the sampling variability of the pre-averaging estimator, as it ignores

the contribution from the continuous part in the asymptotic variance. Therefore, if one

constructs a CS for the jump power variation based on the standard asymptotics, the CS

tends to be biased upward in location, and biased downward in scale, which may lead to

substantial size distortion. The size distortion in a testing context has been reported in the

simulation study of AJL.

(iv) Finally, it is interesting to observe that r > r∗ when p ≥ 4. Therefore, in view

of Theorem 1, the continuous part and the jump part cannot be balanced in both the

first- and the second-order asymptotics for the same drifting sequence ηn. This observation

confirms the necessity of treating ηn as unknown and considering a general specification of

this drifting sequence, instead of imposing that ηn vanishes at some ad hoc rate.
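The interplay between the two critical rates in comment (iv) can be illustrated with a short sketch that evaluates r∗ from Theorem 1, r from Theorem 2, the normalizing factor a_n and the weights on Σ_C and Σ_J in Theorem 2. The function names are hypothetical; the code only restates the displayed formulas.

```python
import math

def r_lln(p):
    """Critical rate r* = (p - 2) / (4p) from Theorem 1."""
    return (p - 2) / (4 * p)

def r_clt(p):
    """Critical rate r = (p - 2) / (4(p - 1)) from Theorem 2."""
    return (p - 2) / (4 * (p - 1))

def clt_normalizer(delta_n, eta_n, p):
    """Normalizing factor a_n = Delta_n^(3/4 - p/4) / (1 + (Delta_n^(-r) eta_n)^(p-1))."""
    return delta_n ** (0.75 - p / 4) / (1 + (delta_n ** (-r_clt(p)) * eta_n) ** (p - 1))

def clt_variance_weights(h, p):
    """Weights on Sigma_C and Sigma_J in Theorem 2; h may be math.inf."""
    if math.isinf(h):
        return 0.0, 1.0
    w = h ** (p - 1)
    return 1 / (1 + w) ** 2, (w / (1 + w)) ** 2
```

For p = 4 the two rates are 1/8 and 1/6. A polynomial sequence $\eta_n = \Delta_n^{\rho}$ with 1/8 < ρ < 1/6 therefore has $\Delta_n^{-r^*}\eta_n \to 0$ but $\Delta_n^{-r}\eta_n \to \infty$, so the continuous part dominates the LLN limit while the jump part dominates the asymptotic variance.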

The CLT for the bias-corrected estimator Hnt is given by the following corollary of

Theorem 2. Its proof is obvious and thus omitted.

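Corollary 2 Let p ≥ 4 be an even integer and $H^n_t$ be defined by (9) with weight functions
$(g_i)_{1 \le i \le d}$ and constants $\kappa = (\kappa_i)_{1 \le i \le d}$. Under the same settings as in Theorem 2, we have

$$\frac{a_n}{\Delta_n^{1/2}} \left( H^n_t - \sum_{s \le t} |\Delta J_s|^p \right) \overset{\mathcal{L}\text{-}s,\eta_n}{\longrightarrow} \mathcal{MN}\left( 0, \kappa^{\top} \Sigma \kappa \right).$$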

3.3 Robust inference of jump power variations.

In this section, we construct robust CS’s for the pth jump power variation, i.e. $\sum_{s \le t} |\Delta J_s|^p$,
for p ≥ 4. More precisely, for any constant c ∈ (0, 1), we construct a sequence of set-valued
statistics $CS^n_{1-c}$ such that

$$\lim_{n \to \infty} \inf_{\eta \in [0, 1]} \mathbb{P}^{\eta} \left( \sum_{s \le t} |\Delta J_s|^p \in CS^n_{1-c} \right) = 1 - c, \tag{10}$$


where the notation $\mathbb{P}^{\eta}$ emphasizes the dependence of the data generating process on the
scaling parameter η; recall that ∆J depends on η, see (2). By taking the infimum before
sending n → ∞, (10) requires that $CS^n_{1-c}$ has valid asymptotic coverage uniformly over
η ∈ [0, 1]. This uniformity requirement formalizes the notion of robustness against “possibly

small jumps”. For more discussions on the importance of uniformity, see Andrews and

Guggenberger (2009) and references therein.

Our construction relies on the bias-corrected estimator $H^n_t$. To gauge its sampling variability,
we need an estimator of the asymptotic variance Σ, which is associated with weight

functions (gi)1≤i≤d as described in Theorem 2. To streamline the discussion, we suppose

that there exists a sequence of estimators Σn which verifies Assumption (V) below. We

postpone the (somewhat complicated) construction of Σn until Appendix B for the sake of

readability.

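Assumption (V). For any sequence $(\eta_n)_{n \ge 1} \subset [0, 1]$ with $\Delta_n^{-r} \eta_n \to h \in [0, \infty]$, $a_n^2 \Sigma_n \overset{\mathbb{P},\eta_n}{\longrightarrow} \Sigma$.

We are now ready to define the robust CS. For any c ∈ (0, 1), let $S_{1-c}$ be a subset of $\mathbb{R}$
such that $\mathbb{P}(\xi \in S_{1-c}) = 1 - c$ for ξ ∼ N(0, 1). For any x, y ∈ R and S ⊆ R, we denote
$x + yS \equiv \{x + yz : z \in S\}$. We then set

$$CS^n_{1-c} = H^n_t + \Delta_n^{1/2} \sqrt{\kappa^{\top} \Sigma_n \kappa}\; S_{1-c}. \tag{11}$$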

The idea behind the construction of $CS^n_{1-c}$ is straightforward. By the properties of stable
convergence, Corollary 2 and Assumption (V) imply

$$\frac{H^n_t - \sum_{s \le t} |\Delta J_s|^p}{\Delta_n^{1/2} \sqrt{\kappa^{\top} \Sigma_n \kappa}} \overset{\mathcal{L}\text{-}s,\eta_n}{\longrightarrow} N(0, 1) \tag{12}$$

for all ηn satisfying $\Delta_n^{-r} \eta_n \to h \in [0, \infty]$. Importantly, the variable on the left-hand side of
the above display is asymptotically pivotal; in particular, the limiting distribution does not
depend on the strength of the jump signal, measured by the nuisance parameter h. Based
on (12), $CS^n_{1-c}$ is the natural choice of CS with nominal level 1 − c.

The main result of this section is the following theorem, which summarizes the asymptotic
properties of $CS^n_{1-c}$.
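As an illustration of how (11) might be evaluated in practice, the sketch below computes a two-sided interval by taking $S_{1-c}$ to be the symmetric interval $[-z_{1-c/2}, z_{1-c/2}]$, which is only one admissible choice of $S_{1-c}$. The inputs are assumed to be available already: `h_nt` stands for the bias-corrected estimate from (9) and `sigma_n` for a variance estimate satisfying Assumption (V); both names, like the function itself, are hypothetical.

```python
import math
from statistics import NormalDist

def robust_confidence_interval(h_nt, delta_n, kappa, sigma_n, c=0.05):
    """Two-sided version of (11) with a symmetric normal critical set.

    h_nt    : bias-corrected estimate of the jump power variation
    delta_n : sampling interval
    kappa   : list of the d constants kappa_i
    sigma_n : d x d variance estimate (list of lists), assumed to satisfy (V)
    c       : one minus the nominal coverage level
    """
    d = len(kappa)
    # Quadratic form kappa' * Sigma_n * kappa.
    quad = sum(kappa[i] * sigma_n[i][j] * kappa[j]
               for i in range(d) for j in range(d))
    if quad < 0:
        raise ValueError("kappa' Sigma_n kappa must be nonnegative")
    z = NormalDist().inv_cdf(1 - c / 2)   # standard normal quantile
    half_width = math.sqrt(delta_n) * math.sqrt(quad) * z
    return h_nt - half_width, h_nt + half_width
```

One-sided sets correspond to other choices of $S_{1-c}$ and would replace the symmetric quantile by a one-sided one.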


Theorem 3 (Uniformity) Suppose that Assumptions (H-1) and (N) hold. Let p ≥ 4 be
an even integer and $\Sigma_n$ be a sequence of estimators satisfying Assumption (V). For every
c ∈ (0, 1), we have

$$\lim_{n \to \infty} \inf_{\eta \in [0, 1]} \mathbb{P}^{\eta} \left( \sum_{s \le t} |\Delta J_s|^p \in CS^n_{1-c} \right) = 1 - c. \tag{13}$$

Moreover, (13) still holds if we replace “inf” with “sup”.

Comment. Theorem 3 shows that $CS^n_{1-c}$ has asymptotically valid coverage uniformly

over the scaling parameter η and the CS is not conservative; the existence of the limit is

part of the result. In addition, the second part of this theorem shows that the CS is also

asymptotically similar with respect to η.

As an interesting special case of (11), we consider one-sided confidence intervals. With

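$S_{0.5}$ being [0, ∞) or (−∞, 0] in (11), the corresponding CS takes the form $[H^n_t, \infty)$ or
$(-\infty, H^n_t]$ respectively. Theorem 3 implies that

$$\lim_{n \to \infty} \inf_{\eta \in [0, 1]} \mathbb{P}^{\eta} \left( \sum_{s \le t} |\Delta J_s|^p \ge H^n_t \right) = 0.5, \qquad \lim_{n \to \infty} \inf_{\eta \in [0, 1]} \mathbb{P}^{\eta} \left( \sum_{s \le t} |\Delta J_s|^p \le H^n_t \right) = 0.5,$$

and the same results hold if we replace “inf” by “sup”. In other words, $H^n_t$ is a uniformly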

(relative to η) asymptotically median unbiased estimator of the jump power variation. In

addition to Corollary 1, this result gives an alternative sense of robustness to the bias-corrected
estimator $H^n_t$. In our opinion, this alternative argument is conceptually more

appealing because it relies on the uniformity principle, rather than some ad hoc asymptotic

embedding of the drifting sequence ηn.

4 Concluding remarks.

Motivated by the intuition that jumps in asset prices may be weakly identified due to the

presence of microstructure noise, we propose a local-to-continuity asymptotic theory for

the pre-averaging estimator. In a noisy setting, our theory unifies the standard asymp-

totic results for continuous and discontinuous Itô semimartingales as two polar cases with

a continuum of local asymptotics. The theory explains the higher-order bias in the standard

estimator and the size distortion of the standard CS. More importantly, the theory

is constructive for designing methods that are robust to possibly weakly identified jumps.

Simulation evidence in the web supplement of this paper supports the theoretical claim: the

robust method generally outperforms the standard method, and the findings are robust to

various jump behaviors, microstructure noise, bouncebacks, rounding, and moderate per-

turbations on tuning parameters.

Appendix A Assumptions.

In this appendix, we present the assumptions mentioned in the main text. Let $(\Omega^{(0)}, \mathcal{F}^{(0)}, (\mathcal{F}^{(0)}_t)_{t\ge 0}, \mathbb{P}^{(0)})$ be a filtered probability space on which the random quantities in (1) are defined. We assume, for r = 1 or 2,

Assumption (H-r): (a) the process $(b_t)$ is optional and locally bounded;

(b) the process $(\sigma_t)$ is càdlàg and adapted;

(c) the function $\delta$ is predictable, and there is a bounded nonnegative measurable function $\gamma$ on $(E, \mathcal{E})$ such that $\int_E \gamma(z)^r \lambda(dz) < \infty$ and the process $\sup_{z\in E}\big(|\delta(\omega^{(0)}, t, z)| \wedge 1\big)/\gamma(z)$ is locally bounded;

(d) we have almost surely $\int_0^t \sigma_s^2\,ds > 0$ for all $t > 0$.

Assumption (K): We have Assumption (H-2) and $\sigma_t$ is also an Itô semimartingale which can be written as

\[
\sigma_t = \sigma_0 + \int_0^t b_s\,ds + \int_0^t \sigma_s\,dW_s + M_t + \sum_{s\le t} \Delta\sigma_s\, 1_{\{|\Delta\sigma_s| > v\}},
\]

where $M$ is a local martingale orthogonal to $W$ and with bounded jumps and $\langle M, M\rangle_t = \int_0^t a_s\,ds$, and the compensator of $\sum_{s\le t} 1_{\{|\Delta\sigma_s| > v\}}$ is $\int_0^t a'_s\,ds$, and where $b_t$, $a_t$, $a'_t$ are optional locally bounded processes, and the processes $b_t$ and $\sigma_t$ are optional and càglàd (left-continuous with right limits).

Conditions (a) and (b) of Assumption (H-r) impose very mild measurability and sample-

path regularity. Condition (c) is more restrictive when r = 1, in which case the price jumps

have finite variation; when r = 2, this condition is quite mild because the jumps of every

semimartingale are square-summable. The stronger version of this assumption (i.e., r = 1) is

needed for the central limit theorem because it is more difficult to disentangle the continuous

part from “small” jumps when we consider higher-order asymptotics; similar restrictions

have been adopted by Jacod (2008) and Todorov and Tauchen (2012). Condition (d) is

needed to avoid degenerate limit theorems. Assumption (K) allows the volatility to be

stochastic with finitely or infinitely active jumps, and to depend on the price process in an

arbitrary manner. Overall, these assumptions are fairly unrestrictive and satisfied by most models

in finance. Of course, they do exclude some examples such as fractional Brownian motion

or models without a continuous martingale part in price.

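We now turn to the noise. Following JPV, we formalize the noise model as follows. For each $t \ge 0$, we have a transition probability $Q_t(\omega^{(0)}, dz)$ from $(\Omega^{(0)}, \mathcal{F}^{(0)}_t)$ into $\mathbb{R}$. The space $\Omega^{(1)} = \mathbb{R}^{[0,\infty)}$ is endowed with the product Borel σ-field $\mathcal{F}^{(1)}$, the “canonical process” $(\chi_t : t \ge 0)$, and the probability $Q(\omega^{(0)}, d\omega^{(1)})$, which is the product $\otimes_{t\ge 0} Q_t(\omega^{(0)}, \cdot)$. The filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t\ge 0}, \mathbb{P})$ in the main text is then defined as follows:

\[
\Omega = \Omega^{(0)} \times \Omega^{(1)}, \qquad \mathcal{F} = \mathcal{F}^{(0)} \otimes \mathcal{F}^{(1)},
\]

\[
\mathcal{F}_t = \mathcal{F}^{(0)}_t \otimes \sigma(\chi_s : s \in [0,t)),
\]

\[
\mathbb{P}(d\omega^{(0)}, d\omega^{(1)}) = \mathbb{P}^{(0)}(d\omega^{(0)})\, Q(\omega^{(0)}, d\omega^{(1)}).
\]

Any variable or process which is defined on either $\Omega^{(0)}$ or $\Omega^{(1)}$ can be considered in the usual way as a variable or a process on $\Omega$.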

The assumption on the noise is the following:

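Assumption (N): For each $q > 0$, there is a sequence of $(\mathcal{F}^{(0)}_t)$-stopping times $(T_{q,n})_{n\ge 1}$ increasing to $\infty$, such that $\int Q_t(\omega^{(0)}, dz)\,|z|^q \le n$ whenever $t < T_{q,n}(\omega^{(0)})$. We write

\[
\beta(q)_t(\omega^{(0)}) = \int Q_t(\omega^{(0)}, dz)\, z^q, \qquad \alpha_t = \sqrt{\beta(2)_t},
\]

and we assume that the processes $\alpha$ and $\beta(3)$ are càdlàg, and that $\beta(1) \equiv 0$.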

For the results in this paper, we actually only need the moments of $\chi_t$ to be finite up to a certain order, where the “minimal” order needed varies with the power index p in a nontrivial way. To simplify the presentation, we adopt the stronger assumption that $\chi_t$ has finite moments of all orders. This is nevertheless a very mild restriction in financial applications, as the noise is typically bounded within a few tick sizes. As mentioned in the main text, the key requirement in this assumption is $\beta(1) \equiv 0$ in conjunction with the conditional independence of the noise at different times.
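To make the two key requirements concrete, the following minimal Python sketch simulates one very simple noise scheme that satisfies them: noise that is independent across observation times, has mean zero, and is bounded by one tick. The function name `add_tick_noise`, the three-point distribution and the tick size are illustrative assumptions, not the noise model used in the paper; independence from the price is only a special case of the conditional independence allowed by Assumption (N).

```python
import random


def add_tick_noise(prices, tick=0.01, seed=0):
    """Return noisy observations Z_i = X_i + chi_i.

    Assumption (illustrative only): chi_i is drawn independently at each
    observation time, uniformly from {-tick, 0, +tick}, so it has mean
    zero and is bounded by one tick.
    """
    rng = random.Random(seed)
    return [x + tick * rng.choice((-1, 0, 1)) for x in prices]


# Hypothetical flat efficient price, used only to isolate the noise.
efficient = [100.0] * 20000
observed = add_tick_noise(efficient)
noise = [z - x for z, x in zip(observed, efficient)]
sample_mean = sum(noise) / len(noise)
```

With a bounded distribution like this one, all moments of the noise are trivially finite, which is the simplifying condition adopted in Assumption (N).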


Appendix B Estimation of the asymptotic variance.

In this appendix, we construct an estimator $\Sigma^n$ which verifies Assumption (V). We fix

$p \ge 4$ and weight functions $(g_i)_{1\le i\le d}$ in the background. The main result is Corollary 3.

To clarify the idea behind our construction, we also provide auxiliary results (Theorems 4

and 5) which discuss estimators for the continuous component ΣC and the jump component

ΣJ separately. Estimators proposed here are similar to those proposed by AJL. The key

modification is on the estimation of ΣJ as we need to correct higher-order biases arising

from both the continuous part and the noise part under the local-to-continuity asymptotics;

such modification is necessary due to the strong requirement in Assumption (V). Technically

speaking, we derive the convergence of these estimators under the local-to-continuity setting,

which is beyond the scope of AJL.

We start with the estimation of $\Sigma_C$. We choose a sequence of truncation levels $u_n$ as follows:

\[
u_n = \alpha \Delta_n^{\varpi}, \qquad \text{where } \alpha > 0, \quad \frac{p-1}{2(2p-1)} < \varpi < \frac{1}{4}.
\]
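As a quick illustration of how narrow the admissible range for the truncation exponent is, the sketch below computes the interval ((p − 1)/(2(2p − 1)), 1/4) exactly and evaluates the truncation level for a given choice. The function names and the example values of α and Δn are hypothetical and only serve the illustration.

```python
from fractions import Fraction


def varpi_bounds(p):
    """Open interval (lower, upper) of admissible truncation exponents."""
    lower = Fraction(p - 1, 2 * (2 * p - 1))
    upper = Fraction(1, 4)
    return lower, upper


def truncation_level(alpha, delta_n, varpi, p):
    """Compute u_n = alpha * delta_n ** varpi after checking the constraints."""
    lower, upper = varpi_bounds(p)
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if not (lower < varpi < upper):
        raise ValueError(f"varpi must lie strictly between {lower} and {upper}")
    return alpha * delta_n ** varpi


# For p = 4 the lower bound is 3/14 (about 0.214), so 0.23 is admissible.
u_n = truncation_level(alpha=1.0, delta_n=1.0 / 23400, varpi=0.23, p=4)
```

Since (p − 1)/(2(2p − 1)) increases toward 1/4 as p grows, the interval is always nonempty but shrinks for larger powers.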

We then complete the notation (4) with a truncated version: for any weight function $\phi$,

\[
V^*(Y, \phi, q, l)^n_t = \sum_{i=0}^{\lfloor t/\Delta_n\rfloor - k_n} |Y(\phi)^n_i|^q\, 1_{\{|Y(\phi)^n_i| \le u_n\}}\, |Y(\phi)^n_i|^l.
\]

We also need to define a number of constants associated with the weight functions. We set, for any functions $g$ and $h$ and any integers $w \ge 1$ and $w' \in \{0, \dots, 2w\}$:

\[
a(g,h)_t = \int_{1\vee t}^{1+1\wedge t} g(u-1)\, h(u-t)\, du,
\]

\[
a'(g,h;w,w')_t = \sum_{r=0}^{\lfloor w'/2\rfloor} C^{2r}_{w'}\, m_{2r}\, m_{2w-2r}\, a(g,g)_1^{\,w-w'}\, a(g,h)_t^{\,w'-2r}\, \big(a(g,g)_1\, a(h,h)_1 - a(g,h)_t^2\big)^r.
\]
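The constant a(g, h)_t is a one-dimensional integral and is easy to evaluate numerically. The sketch below uses a midpoint rule; the triangular weight g(x) = min(x, 1 − x) is only an assumed example of a weight function, and the names `triangular_weight` and `a_coefficient` are hypothetical.

```python
def triangular_weight(x):
    """Assumed example weight: g(x) = min(x, 1 - x) on [0, 1], zero elsewhere."""
    return min(x, 1.0 - x) if 0.0 <= x <= 1.0 else 0.0


def a_coefficient(g, h, t, n_steps=20000):
    """Midpoint-rule approximation of
    a(g, h)_t = integral over [max(1, t), 1 + min(1, t)] of g(u - 1) h(u - t) du.
    """
    lo = max(1.0, t)
    hi = 1.0 + min(1.0, t)
    if hi <= lo:
        return 0.0  # empty integration range, e.g. t = 0 or t = 2
    step = (hi - lo) / n_steps
    total = 0.0
    for k in range(n_steps):
        u = lo + (k + 0.5) * step
        total += g(u - 1.0) * h(u - t)
    return total * step


# At t = 1 the integral reduces to the integral of g^2 over [0, 1],
# which is 1/12 for the triangular weight.
a_at_one = a_coefficient(triangular_weight, triangular_weight, 1.0)
```

For the triangular weight the integrand is piecewise quadratic, so the midpoint rule converges quickly; a different quadrature may be preferable for less regular weight functions.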

Below, $g$ and $h$ may take values as weight functions or their derivatives. We then write, for $w \in \mathbb{N}$ and generic weight functions $g$ and $h$,

\[
A(g,h;w)_t = \sum_{\substack{l,l' \in \{0,\dots,p/2\} \\ l+l' \le p-w}}\ \sum_{w'=(2w-p+2l)^+}^{(2w)\wedge(p-2l')} \rho(p)_l\, \rho(p)_{l'}\, C^{2w-w'}_{p-2l}\, C^{w'}_{p-2l'}\, (2g'(2))^{l}\, (2h'(2))^{l'}\, a'(g,h;w,w')_t\, a'(g',h';\, p-l-l'-w,\, p-2l'-w')_t,
\]

\[
A'(g,h;w) = \int_0^2 A(g,h;w)_t\, dt - 2\, m_p^2\, g(2)^{p/2}\, h(2)^{p/2}\, 1_{\{w=p\}}.
\]


We choose any weight function $\phi$ and associate with it a sequence of statistics $\Sigma^n_C$, taking values in the space of $d \times d$ matrices, with entries

\[
\Sigma^{n,ij}_C = \Delta_n^{-1/2} \sum_{w=0}^{p} \frac{\theta\, A'(g_i, g_j; w)}{m_{2w}\, 2^{p-w}\, \phi(2)^{w}\, \phi'(2)^{p-w}} \sum_{l=0}^{w} \rho(2w)_l\, V^*(Z, \phi, 2w-2l, p+l-w)^n_t. \tag{B.1}
\]

The asymptotic property of $\Sigma^n_C$ is described in the following theorem.

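Theorem 4 Suppose that Assumptions (H-1) and (N) hold. Let $p \ge 4$ and $(\eta_n)_{n\ge 1} \subset [0,1]$ be a sequence such that $\Delta_n^{-r}\eta_n \to h$ for some $h \in [0,\infty]$. Then

\[
a_n^2\, \Sigma^n_C \xrightarrow{\ \mathbb{P},\,\eta_n\ } \frac{1}{(1+h^{p-1})^2}\, \Sigma_C.
\]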

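We now turn to the estimation of $\Sigma_J$. Setting

\[
N(0,-)_t = \theta \sum_{s\le t} |\Delta X_s|^{2p-2}\, \sigma_{s-}^2, \qquad N(0,+)_t = \theta \sum_{s\le t} |\Delta X_s|^{2p-2}\, \sigma_{s}^2,
\]

\[
N(1,-)_t = \frac{1}{\theta} \sum_{s\le t} |\Delta X_s|^{2p-2}\, \alpha_{s-}^2, \qquad N(1,+)_t = \frac{1}{\theta} \sum_{s\le t} |\Delta X_s|^{2p-2}\, \alpha_{s}^2,
\]

we can rewrite

\[
\Sigma_J = \theta^2 p^2 \big(\Psi_-\, N(0,-)_t + \Psi_+\, N(0,+)_t + \Psi'_-\, N(1,-)_t + \Psi'_+\, N(1,+)_t\big). \tag{B.2}
\]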

To construct estimators for $N(m,\pm)_t$, m = 0 or 1, we choose another sequence $k'_n$ of integers satisfying

\[
k'_n / k_n \to \infty, \qquad k'_n \Delta_n \to 0.
\]
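One simple way to meet both rate conditions can be sketched as follows. The sketch assumes, as is common for pre-averaging but not restated in this appendix, that kn grows like θ/√Δn; under that assumption, taking k'n of order Δn^(−κ) with 1/2 < κ < 1 gives k'n/kn → ∞ and k'nΔn → 0. The function names and the default κ = 0.75 are hypothetical.

```python
import math


def window_sizes(delta_n, theta=1.0, kappa=0.75):
    """Return (k_n, k_prime_n).

    Assumptions (illustrative): k_n = floor(theta / sqrt(delta_n)) and
    k_prime_n = floor(delta_n ** -kappa) with 1/2 < kappa < 1.
    """
    if not (0.5 < kappa < 1.0):
        raise ValueError("kappa must lie strictly between 1/2 and 1")
    k_n = max(1, math.floor(theta / math.sqrt(delta_n)))
    k_prime_n = max(1, math.floor(delta_n ** (-kappa)))
    return k_n, k_prime_n


def rate_diagnostics(delta_n, theta=1.0, kappa=0.75):
    """Return (k_prime_n / k_n, k_prime_n * delta_n) for a given mesh."""
    k_n, k_prime_n = window_sizes(delta_n, theta, kappa)
    return k_prime_n / k_n, k_prime_n * delta_n


# The first ratio should grow and the second should shrink as delta_n -> 0.
diagnostics = [rate_diagnostics(d) for d in (1e-2, 1e-4, 1e-6)]
```

In finite samples the two requirements pull in opposite directions, so the choice of κ is a tuning decision rather than something the rate conditions pin down.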

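For any weight functions $\phi$ and $\psi$, and any process $Y$, we consider the variables

\[
\xi(Y,\phi,0)^n_i = \frac{1}{\phi(2)\, k_n\, k'_n\, \Delta_n} \sum_{j=1}^{k'_n} \Big( \big(Y(\phi)^n_{i+j}\big)^2 - \tfrac{1}{2}\, Y(\phi)^n_{i+j} \Big)\, 1_{\{|Y(\phi)^n_{i+j}| \le u_n\}},
\]

\[
\xi(Y,\phi,1)^n_i = \frac{1}{2\,\phi'(2)\, k_n\, k'_n\, \Delta_n} \sum_{j=1}^{k'_n} Y(\phi)^n_{i+j}\, 1_{\{|Y(\phi)^n_{i+j}| \le u_n\}},
\]

\[
v(Y,\psi)^n_i = \sum_{l=0}^{p-1} \rho(2p-2)_l\, |Y(\psi)^n_i|^{2p-2-2l}\, |Y(\psi)^n_i|^{l}.
\]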

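We define four processes as follows: for m = 0 or 1,

\[
N(Y,\phi,\psi,m,-)^n_t = \sum_{i=k_n+k'_n}^{\lfloor t/\Delta_n\rfloor - k_n} v(Y,\psi)^n_i\ \xi(Y,\phi,m)^n_{i-k_n-k'_n},
\]

\[
N(Y,\phi,\psi,m,+)^n_t = \sum_{i=0}^{\lfloor t/\Delta_n\rfloor - 2k_n - k'_n + 1} v(Y,\psi)^n_i\ \xi(Y,\phi,m)^n_{i+k_n-1}.
\]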


To further the discussion, we now describe the asymptotic behaviors of $N(Z,\phi,\psi,m,\pm)^n_t$. To this end, we denote

\[
Q(0)_t = m_{2p-2}\, \theta^{p-1} \int_0^t \sigma_s^{2p}\, ds,
\]

\[
Q(1)_t = m_{2p-2}\, \theta^{p-1} \int_0^t \sigma_s^{2p-2}\, \alpha_s^2\, ds.
\]

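Theorem 5 Suppose that Assumptions (H-1) and (N) hold. Let $p \ge 4$ and $(\eta_n)_{n\ge 1} \subset [0,1]$ be a sequence such that $\Delta_n^{-r}\eta_n \to h$ for some $h \in [0,\infty]$. Then for m = 0, 1, we have

\[
a_n^2\, N(Z,\phi,\psi,m,\pm)^n_t \xrightarrow{\ \mathbb{P},\,\eta_n\ } \Big(\frac{1}{1+h^{p-1}}\Big)^2 \psi(2)^{p-1}\, Q(m)_t + \Big(\frac{h^{p-1}}{1+h^{p-1}}\Big)^2 \psi(2p-2)\, N(m,\pm)_t.
\]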
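The two scalar factors in the limit of Theorem 5 show how the local parameter h interpolates between the polar cases. The short sketch below evaluates them; the function name `limit_weights` is hypothetical, and the value at h = ∞ is taken as the limiting value (0, 1), which is an assumed convention for reading the formula.

```python
import math


def limit_weights(h, p):
    """Return (w_cont, w_jump) with
    w_cont = (1 / (1 + h**(p-1)))**2 and w_jump = (h**(p-1) / (1 + h**(p-1)))**2.

    Convention (assumed): h = infinity is read as the limit, giving (0, 1).
    """
    if h < 0:
        raise ValueError("h must be nonnegative")
    if math.isinf(h):
        return 0.0, 1.0
    try:
        x = float(h) ** (p - 1)
    except OverflowError:
        return 0.0, 1.0  # h so large that h**(p-1) overflows
    return (1.0 / (1.0 + x)) ** 2, (x / (1.0 + x)) ** 2


continuous_case = limit_weights(0.0, 4)   # only the Q(m) term survives
balanced_case = limit_weights(1.0, 4)     # both terms carry weight 1/4
jump_case = limit_weights(math.inf, 4)    # only the N(m, +/-) term survives
```

At h = 0 the limit contains only the term driven by the continuous part, and at h = ∞ only the jump term, which mirrors the two polar cases the local-to-continuity theory is meant to connect.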

Theorem 5 shows that, when properly normalized, $N(Z,\phi,\psi,m,\pm)^n_t$ converges to $N(m,\pm)_t$ plus a bias term, i.e., the term involving $Q(m)_t$. The bias is contributed by the continuous part of the underlying price process. Again, we correct the bias by using multiple weight functions. When $p \ge 4$, we can pick two weight functions $\psi_1$ and $\psi_2$ and real numbers $\lambda_1$ and $\lambda_2$ such that

\[
\sum_{j=1}^{2} \lambda_j\, \psi_j(2)^{p-1} = 0, \qquad \sum_{j=1}^{2} \lambda_j\, \psi_j(2p-2) = 1, \tag{B.3}
\]

and set $N(m,\pm)^n_t = \sum_{j=1}^{2} \lambda_j\, N(Z,\phi,\psi_j,m,\pm)^n_t$. By Theorem 5 and (B.3),

\[
a_n^2\, N(m,\pm)^n_t \xrightarrow{\ \mathbb{P},\,\eta_n\ } \Big(\frac{h^{p-1}}{1+h^{p-1}}\Big)^2 N(m,\pm)_t.
\]
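The pair of conditions in (B.3) is a 2×2 linear system in (λ1, λ2) and can be solved in closed form. The sketch below takes the numerical values of ψj(2) and ψj(2p − 2) as inputs, however they are computed from the chosen weight functions; the function name and the example values are hypothetical.

```python
def bias_cancelling_weights(psi1_2, psi1_top, psi2_2, psi2_top, p):
    """Solve (B.3) for (lambda_1, lambda_2).

    psi{j}_2   : value of psi_j(2)
    psi{j}_top : value of psi_j(2p - 2)
    The system is
        lambda_1 * c1 + lambda_2 * c2 = 0,   with c_j = psi_j(2) ** (p - 1),
        lambda_1 * d1 + lambda_2 * d2 = 1,   with d_j = psi_j(2p - 2).
    """
    c1, c2 = psi1_2 ** (p - 1), psi2_2 ** (p - 1)
    d1, d2 = psi1_top, psi2_top
    det = c1 * d2 - c2 * d1
    if det == 0:
        raise ValueError("the two weight functions do not identify lambda")
    return -c2 / det, c1 / det


# Made-up input values, chosen only so that the system is non-singular.
lam1, lam2 = bias_cancelling_weights(1.0, 1.0, 2.0, 3.0, p=4)
bias_coefficient = lam1 * 1.0 ** 3 + lam2 * 2.0 ** 3   # should vanish
jump_coefficient = lam1 * 1.0 + lam2 * 3.0             # should equal 1
```

The determinant check makes explicit that the two weight functions must differ in the ratio ψj(2)^(p−1)/ψj(2p − 2); otherwise no combination can remove the bias while keeping the jump term.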

Finally, in view of (B.2), we set

\[
\Sigma^n_J = \theta^2 p^2 \big(\Psi_-\, N(0,-)^n_t + \Psi_+\, N(0,+)^n_t + \Psi'_-\, N(1,-)^n_t + \Psi'_+\, N(1,+)^n_t\big). \tag{B.4}
\]

The following corollary summarizes the results above. Its proof is elementary and thus

omitted.

Corollary 3 Suppose that Assumptions (H-1) and (N) hold. Let $p \ge 4$ and $\Sigma^n = \Sigma^n_C + \Sigma^n_J$, where $\Sigma^n_C$ and $\Sigma^n_J$ are given by (B.1) and (B.4), respectively. Then $\Sigma^n$ satisfies Assumption (V).

Comment. Corollary 3 shows that imposing Assumption (V) in Theorem 3 is free, as

it is implied by Assumptions (H-1) and (N).

References

Aït-Sahalia, Y., 2004. Disentangling diffusion from jumps. Journal of Financial Economics 74, 487–528.

Aït-Sahalia, Y., Jacod, J., 2011. Analyzing the spectrum of asset returns: Jump and volatility components in high frequency data. Journal of Economic Literature, Forthcoming.

Aït-Sahalia, Y., Jacod, J., Li, J., 2012. Testing for jumps in noisy high frequency data. Journal of Econometrics 168, 207–222.

Aït-Sahalia, Y., Mykland, P. A., Zhang, L., 2011. Ultra high frequency volatility estimation with dependent microstructure noise. Journal of Econometrics 160, 190–203.

Andersen, T. G., Bollerslev, T., Frederiksen, P. H., Nielsen, M. Ø., 2006. Comment on realized variance and market microstructure noise. Journal of Business and Economic Statistics 24, 173–179.

Andrews, D. W. K., Cheng, X., Forthcoming. Estimation and inference with weak, semi-strong, and strong identification. Econometrica.

Andrews, D. W. K., Guggenberger, P., 2009. Validity of subsampling and “plug-in asymptotic” inference for parameters defined by moment inequalities. Econometric Theory 25, 669–709.

Bandi, F. M., Russell, J. R., 2006. Separating microstructure noise from volatility. Journal of Financial Economics 79, 655–692.

Barndorff-Nielsen, O. E., Hansen, P. R., Lunde, A., Shephard, N., 2008. Designing realized kernels to measure ex-post variation of equity prices in the presence of noise. Econometrica 76, 1481–1536.

Barndorff-Nielsen, O. E., Hansen, P. R., Lunde, A., Shephard, N., 2011. Multivariate realised kernels: Consistent positive semi-definite estimators of the covariation of equity prices with noise and non-synchronous trading. Journal of Econometrics 162, 149–169.

Bollerslev, T., Todorov, V., 2011. Estimation of jump tails. Econometrica 79, 1727–1783.

Hansen, P. R., Lunde, A., 2006. Realized variance and market microstructure noise. Journal of Business and Economic Statistics 24, 127–161.

Jacod, J., 2008. Asymptotic properties of realized power variations and related functionals of semimartingales. Stochastic Processes and their Applications 118, 517–559.

Jacod, J., Li, Y., Mykland, P. A., Podolskij, M., Vetter, M., 2009. Microstructure noise in the continuous case: The pre-averaging approach. Stochastic Processes and their Applications 119, 2249–2276.

Jacod, J., Podolskij, M., Vetter, M., 2010. Limit theorems for moving averages of discretized processes plus noise. Annals of Statistics 38, 1478–1545.

Jacod, J., Shiryaev, A. N., 2003. Limit Theorems for Stochastic Processes, 2nd Edition. Springer-Verlag, New York.

Kalnina, I., Linton, O., 2008. Estimating quadratic variation consistently in the presence of endogenous and diurnal measurement error. Journal of Econometrics 147, 47–59.

Li, Y., Mykland, P. A., 2007. Are volatility estimators robust with respect to modeling assumptions? Bernoulli 13, 601–622.

Mikusheva, A., 2007. Uniform inference in autoregressive models. Econometrica 75, 1411–1452.

Phillips, P. C. B., 1987. Towards a unified asymptotic theory for autoregression. Biometrika 74, 535–547.

Staiger, D., Stock, J. H., 1997. Instrumental variables regression with weak instruments. Econometrica 65, 557–586.

Stock, J. H., Wright, J. H., 2000. GMM with weak identification. Econometrica 68, 1055–1096.

Todorov, V., Bollerslev, T., 2010. Jumps and betas: A new framework for disentangling and estimating systematic risks. Journal of Econometrics 157, 220–235.

Todorov, V., Tauchen, G., 2012. The realized Laplace transform of volatility. Econometrica 80, 1105–1127.

Zhang, L., 2006. Efficient estimation of stochastic volatility using noisy observations: A multi-scale approach. Bernoulli 12, 1019–1043.

Zhang, L., Mykland, P. A., Aït-Sahalia, Y., 2005. A tale of two time scales: Determining integrated volatility with noisy high-frequency data. Journal of the American Statistical Association 100, 1394–1411.

Zhou, B., 1996. High-frequency data and volatility in foreign-exchange rates. Journal of Business & Economic Statistics 14, 45–52.
