Testing conditional symmetry without smoothing


Journal of Nonparametric Statistics, 2013
http://dx.doi.org/10.1080/10485252.2012.752083


Tao Chen^a and Gautam Tripathi^b*

^a Department of Economics, University of Waterloo, Waterloo, Ontario, Canada; ^b Faculty of Law, Economics and Finance, University of Luxembourg, 1511 Luxembourg, Luxembourg

*Corresponding author. Email: gautam.tripathi@uni.lu

© American Statistical Association and Taylor & Francis 2013

(Received 15 June 2012; final version received 14 November 2012)

We test the assumption of conditional symmetry used to identify and estimate parameters in regression models with endogenous regressors, without making any distributional assumptions. The Kolmogorov–Smirnov-type statistic we propose is consistent, computationally tractable because it does not require optimisation over an uncountable set, free of any kind of nonparametric smoothing, and can detect n^{1/2}-deviations from the null. Results from a simulation experiment suggest that our test can work very well in moderately sized samples.

Keywords: testing conditional symmetry; endogenous regressors

1. Introduction

1.1. Motivation

Let (Y, X)_{(1+dim(X))×1} be an observed random vector such that Y = μ(X, θ0) + ε for some θ0 ∈ Θ ⊂ R^{dim(Θ)}, where μ is a real-valued function of X known up to θ0. The latent random variable ε, assumed to be continuously distributed with full support on R, represents the aggregate effect of all variables affecting the outcome Y that are not part of the model. In observational data models, it is therefore not unlikely for ε, interpreted as an unobserved regressor, to be correlated with some or all of the observed regressors. In other words, some or all coordinates of X, the vector of explanatory variables, may be endogenous.

The usual approach to identify and estimate θ0 in the presence of endogenous regressors is via the method of instrumental variables (IV), whereby one assumes the existence of a vector of random variables W_{dim(W)×1}, called the IV (or just instruments) for X, which are exogenous with respect to ε in a certain sense. For example, it is common to assume that ε is mean independent of W, that is, E[ε|W] = 0 with probability one (w.p.1). However, when conditional mean restrictions may not be plausible, such as when ε is believed to have thick tails, or when behavioural assumptions imply other restrictions on the distribution of ε|W (cf. Example 1.1), then other assumptions asserting exogeneity have to be made. These include assuming ε to be stochastically independent or (more weakly) quantile independent of W, or assuming that the distribution of ε is symmetric conditional on W. Compare Powell (1994, Section 2) for an extensive


discussion on the various stochastic restrictions used to identify and estimate semiparametric models.

In this paper, we focus on conditional symmetry and assume that the unknown conditional distribution of ε given W is symmetric about the origin; that is, the null hypothesis we would like to test is that

H0 : Law(ε|W) = Law(−ε|W). (1)

Note that the instruments W and the regressors X can have elements in common because the exogenous coordinates of X act as their own instruments. If all regressors are exogenous, then, of course, W = X. For notational ease, the vector containing the distinct coordinates of X and W is written simply as (X, W).

The main objective of this paper is to propose a smoothing-free test for Equation (1) against the alternative that it is false, that is, the alternative hypothesis is that

H1 : ∀θ ∈ Θ, Law(ε(θ)|W) ≠ Law(−ε(θ)|W), (2)

where ε(θ) := Y − μ(X, θ). (Of course, ε(θ0) = ε.)

Symmetry is a powerful shape restriction because it unambiguously fixes the location under minimal assumptions; for example, ε need not possess any moments. Moreover, it yields additional information about the underlying distribution in the form of an infinite number of moment conditions, namely, that the mean of every odd function vanishes, which can be exploited to increase the efficiency of estimators. These fundamental properties are essentially the reason why symmetry is often imposed on statistical models to identify parameters and estimate them more efficiently (cf., e.g., Bickel 1982; Manski 1984; Powell 1986; Newey and Powell 1987; Newey 1988a, 1988b). Symmetry considerations can also play important roles in economic modelling. For instance, since constraints in the nominal wage adjustment process can affect the smooth functioning of labour markets, much attention has been paid in empirical labour economics to determining whether changes in nominal wages are symmetrically distributed, in order to test the hypothesis that nominal wages are downwards rigid (cf., e.g., Card and Hyslop 1996; Christofides and Stengos 2001; Stengos and Wu 2004). In empirical finance, the pricing of risky assets depends upon the skewness of the distribution of their returns; that is, assets that make the portfolio returns more left (right) skewed command higher (lower) expected returns (cf. Harvey and Siddique 2000 and references therein). Additional examples of symmetry restrictions in empirical macroeconomics and finance can be found in Bai and Ng (2001).

The following example illustrates how conditional symmetry can result from more primitive assumptions.

Example 1.1 Consider a paired sibling study where Yi1 := Ci + X′i1 θ0 + εi1 and Yi2 := Ci + X′i2 θ0 + εi2 represent observed market outcomes for the ith pair of twins, Ci denotes unobserved traits that for genetic reasons are common to the sibling pair, Xi1 and Xi2 are observed characteristics that vary across the siblings, and εi1 and εi2 denote idiosyncratic shocks, say, produced by 'luck', to the respective outcomes that may be correlated with Ci and some, or all, of the observed regressors. Assume the existence of IV Wi, for example, Wi could be an appropriately chosen subset of Xi1 and Xi2, such that, given (Ci, Wi), the idiosyncratic shocks εi1 and εi2 are independently and identically drawn from the same distribution. (A weaker alternative is to assume that εi1 and εi2 are exchangeable conditional on (Ci, Wi).) It is then straightforward to verify that the conditional distribution of εi := εi1 − εi2 given Wi is symmetric about the origin. Therefore, θ0 can be estimated from the model Yi1 − Yi2 = (Xi1 − Xi2)′θ0 + εi, where Law(εi|Wi) = Law(−εi|Wi). Note that conditional symmetry here was not directly imposed on the estimable model, but was


a consequence of the primitive assumptions made regarding the error terms in the structural equations.
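The mechanism in Example 1.1 is easy to check by simulation. The following sketch is ours, not from the paper; the normal common trait, the chi-squared shocks, and all parameter values are purely illustrative assumptions. It shows that although each sibling shock is conditionally asymmetric, the differenced error εi = εi1 − εi2 is symmetric about the origin.

import numpy as np

rng = np.random.default_rng(0)
n = 200_000
W = rng.normal(size=n)                  # instrument (hypothetical choice)
C = rng.normal(size=n) + 0.5 * W        # unobserved trait, correlated with W
scale = np.exp(0.3 * C)                 # shocks may depend on the common trait
# eps1, eps2: i.i.d. given (C, W), but with an asymmetric (chi-squared) law
eps1 = scale * (rng.chisquare(3, size=n) - 3)
eps2 = scale * (rng.chisquare(3, size=n) - 3)
eps = eps1 - eps2                       # differenced error from the twin model

skew = lambda e: np.mean((e - e.mean()) ** 3) / e.std() ** 3
print(skew(eps1))                       # clearly positive: eps1 is asymmetric
print(skew(eps))                        # approximately 0: eps is symmetric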

1.2. Brief literature review

Since symmetry restrictions are widely used, it is not surprising that nonparametric tests to determine whether the assumption of symmetry is supported by the data have been extensively studied. For instance, there is a large statistical literature on testing symmetry in univariate models (cf. Randles and Wolfe 1979, Section 3.5 and references therein). For multivariate models, Fan and Gencay (1995), Ahmad and Li (1997), Zheng (1998), and Hyndman and Yao (2002) use kernel smoothing, Bai and Ng (2001) employ the martingale transformation proposed by Khmaladze (1993), and Neumeyer, Dette, and Nagel (2005), Delgado and Escanciano (2007), and Neumeyer and Dette (2007) rely on empirical process theory. While the smoothing approach leads to tests that are consistent against fixed alternatives, these tests possess zero power against n^{1/2}-deviations from the null, and moreover they depend upon smoothing parameters which are not always easy to choose. Khmaladze's transformation leads to distribution-free tests which have power against n^{1/2}-alternatives, although it too requires nonparametric smoothing. The empirical process-based tests require no smoothing but are not distribution free, so that critical values for implementing the test cannot be tabulated in advance; instead, they are obtained by simulation.

1.3. Our contribution

Of the works cited earlier, the ones most closely related to our paper are Neumeyer et al. (2005) and Delgado and Escanciano (2007), although our paper differs from these two papers in some important ways. First, unlike these papers, we allow the regressors in our model to be endogenous. (In contrast, the aforementioned papers assume ε to be independent (resp. mean independent) of X, so that endogenous regressors are ruled out by assumption.) This is a nontrivial extension because endogeneity of regressors does affect the behaviour of the test statistic in both large and small samples (cf. Example 3.1 and Section 6.1). Second, the Kolmogorov–Smirnov (KS)-type statistic we propose is computationally more tractable than the one proposed in these papers because it only requires a search over a finite number of points (cf. Section 2). By contrast, the KS statistics in Delgado and Escanciano (2007) and Neumeyer et al. (2005) are implemented by searching over the uncountable set R^d, where d depends upon the dimension of the variables involved. (Delgado and Escanciano also propose a Cramér–von Mises-type statistic that is computationally less demanding than their KS statistic.) Third, we expand the literature by showing that the use of simulated critical values for specification tests can be extended to handle models with endogenous regressors (cf. Section 4). Finally, unlike the earlier papers that only use the empirical distribution function to construct the test statistic, our technical treatment is very general and encompasses a large class of test functions, allowing us to determine the asymptotic properties of a large class of test statistics in a unified manner.

1.4. Organisation

The remainder of the paper is organised as follows. Section 2 motivates the test statistic, and its large sample behaviour under the null hypothesis is described in Section 3. As the limiting distribution of the test statistic turns out to be nonpivotal, Section 4 shows how to simulate the critical values. In Section 5, we demonstrate that our test is consistent against fixed alternatives and possesses nontrivial power against sequences of alternatives that are n^{1/2}-distant from the


null. Small sample properties of the test are examined in Section 6, and Section 7 concludes the paper. All proofs are given in the appendices.

2. The test statistic

Our test for conditional symmetry is easy to motivate and is based on the almost obvious fact (whose proof, for the sake of completeness, is provided in the appendix) that

Law(ε|W) = Law(−ε|W) ⇐⇒ (ε, W) =d (−ε, W), (3)

where '=d' is shorthand for 'equal in distribution'. This suggests that a test for Equation (1) can be obtained by comparing the empirical distributions of Z := (ε, W)_{(1+dim(W))×1} and Z^r := (−ε, W), where Z^r denotes Z with its first coordinate reflected about the origin. Since Z and Z^r are unobserved as they depend upon ε, we compare the empirical distributions of their feasible versions Ẑ := (ε̂, W) and Ẑ^r := (−ε̂, W) instead, where ε̂ := Y − μ(X, θ̂) and θ̂ is an estimator of θ0, for example, the IV estimator, obtained using data (Y1, X1, W1), . . . , (Yn, Xn, Wn).

Given z ∈ R^{1+dim(W)}, let (−∞, z] := (−∞, z^{(1)}] × · · · × (−∞, z^{(1+dim(W))}] denote the closed lower orthant in R^{1+dim(W)}. Then, letting 1_S denote the indicator function of a set S, one possible statistic for testing Equation (1) is

R := sup_{z∈R^{1+dim(W)}} |n^{−1} Σ_{j=1}^n 1_{(−∞,z]}(Ẑj) − n^{−1} Σ_{j=1}^n 1_{(−∞,z]}(Ẑ^r_j)|,

which resembles the usual KS statistic for testing whether two samples come from the same distribution. The null hypothesis is rejected if the observed value of R is large enough.

The computational cost of implementing R can be reduced by restricting the search to the finite subset Ẑ_n := {Ẑ1, . . . , Ẑn}, which in a certain sense becomes dense in supp(Z), the support of Z, as the sample size increases. Therefore, motivated by Andrews (1997), we focus our attention on

Rmax := max_{z∈Ẑ_n} |n^{−1} Σ_{j=1}^n 1_{(−∞,z]}(Ẑj) − n^{−1} Σ_{j=1}^n 1_{(−∞,z]}(Ẑ^r_j)|. (4)

It will be shown later that R and Rmax behave similarly in large samples. (Note that since Ẑ_n consists of estimated observations, unlike the statistic in Andrews's paper, which is maximised over the 'true' observations, the effect of estimating θ0 has to be taken into account when showing that Ẑ_n is dense in supp(Z) as n → ∞; cf. Lemma A.4 for exact details.)
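To fix ideas, here is a minimal computational sketch of the statistic in Equation (4). It is our illustration rather than code from the paper: given residuals ε̂1, . . . , ε̂n and instruments W1, . . . , Wn, it evaluates the maximal contrast between the two empirical distributions by brute force in O(n²) time.

import numpy as np

def rmax_statistic(eps_hat, W):
    # Returns n^{1/2} * Rmax, where the search runs over the n estimated points.
    n = eps_hat.shape[0]
    W = np.reshape(W, (n, -1))
    Z = np.column_stack([eps_hat, W])    # feasible observations Zhat_j
    Zr = np.column_stack([-eps_hat, W])  # reflected observations Zhat_j^r
    stat = 0.0
    for z in Z:
        Fz = np.all(Z <= z, axis=1).mean()    # empirical cdf of Zhat at z
        Frz = np.all(Zr <= z, axis=1).mean()  # empirical cdf of Zhat^r at z
        stat = max(stat, abs(Fz - Frz))
    return np.sqrt(n) * stat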

As far as checking the equality of two probability measures is concerned, it is worth keeping in mind that the lower orthants used in the construction of R and Rmax can be substituted by any measure determining class of sets. For instance, letting S^{dim(W)} denote the unit sphere in R^{1+dim(W)}, Equation (1) could also be tested with

H := sup_{(z,t)∈S^{dim(W)}×R} |n^{−1} Σ_{j=1}^n 1_{H(z,t)}(Ẑj) − n^{−1} Σ_{j=1}^n 1_{H(z,t)}(Ẑ^r_j)|,

which employs closed half-spaces H(z, t) := {s ∈ R^{1+dim(W)} : s′z ≤ t} instead of orthants. Analogous statistics when closed/open lower orthants are replaced by closed/open upper orthants, or when closed half-spaces are replaced by open half-spaces, etc., follow mutatis mutandis.

It is useful to consider the statistics described above as special cases of a general statistic so that the asymptotic properties of a large class of test statistics can be investigated in a unified manner. To accomplish this goal, we introduce some compact notation. Henceforth, let P_U be the distribution of a random vector U and P̂_U denote the empirical measure induced by n observations on U, that is, P̂_U := n^{−1} Σ_{j=1}^n δ_{Uj}, where δ_{Uj} is the Dirac (i.e. point) measure at Uj; for example, P̂_Ẑ := n^{−1} Σ_{j=1}^n δ_{Ẑj} and P̂_{Ẑ^r} := n^{−1} Σ_{j=1}^n δ_{Ẑ^r_j} are the empirical measures induced by Ẑ1, . . . , Ẑn and


Ẑ^r_1, . . . , Ẑ^r_n, respectively. Let F be a collection of 'test' functions from R^{1+dim(W)} → R and ℓ∞(F) denote the set of bounded functions from F → R. We often use linear functional notation to write the integral of f ∈ F with respect to P_Z as P_Z f := ∫ f(z) P_Z(dz) = ∫ f(u, w) P_{ε,W}(du, dw). (If the region of integration is omitted, it means that integration is over the support of Z.)

A general KS statistic for testing Equation (1) can now be defined as

KS_F := sup_{f∈F} |P̂_Ẑ f − P̂_{Ẑ^r} f| = sup_{f∈F} |(P̂_Ẑ − P̂_{Ẑ^r})f|. (5)

The statistics described earlier are encompassed by KS_F for appropriately chosen F. For instance, if F1 := {1_{(−∞,z]} : z ∈ R^{1+dim(W)}} is the set of indicator functions of lower orthants in R^{1+dim(W)}, then KS_{F1} = R. Similarly, if F2 := {1_{H(z,t)} : (z, t) ∈ S^{dim(W)} × R} is the collection of indicators of closed half-spaces in R^{1+dim(W)}, then KS_{F2} = H. Although Rmax can also be written as KS_{F̂1} with F̂1 := {1_{(−∞,z]} : z ∈ Ẑ_n}, for technical reasons, we prefer that F not be random. Hence, we will deal with Rmax separately when showing that it has the same asymptotic properties as R.

Remarks (i) It is important at this point to emphasise that the statistic we recommend to practitioners for applied work is n^{1/2}Rmax as defined in Equation (4), because it is easy to implement and works well in small samples (cf. Section 6). The general KS statistic KS_F, though not suited for practical implementation, is useful nonetheless, because it allows us to derive the large sample behaviour of n^{1/2}Rmax and related statistics in a unified manner. Finally, although it may be of interest to investigate the choice of F that leads to an 'optimal' KS_F statistic, this task is beyond the scope of our paper and we leave it for future research.

(ii) The expression for KS_F can be further simplified. In particular, instead of expressing KS_F as a contrast between P̂_Ẑ and P̂_{Ẑ^r} as we do in Equation (5), there is an equivalent representation in terms of a single measure. To see this, let f^r(Z) := f(Z^r). Then, KS_F = sup_{f∈F} |P̂_Ẑ(f − f^r)|. Besides reducing the computational cost, as only one empirical measure has to be calculated instead of two, this equivalence also simplifies some technical arguments, for example, showing the asymptotic tightness of KS_F, because we only have to deal with one measure instead of two. Note that since f − f^r is antisymmetric in its first coordinate, KS_F will be small, ideally zero, if the null hypothesis is true; hence, the simpler form also has the correct interpretation.

(iii) Even though Equation (3) ⇐⇒ (ε, −W) =d (−ε, −W), because a sign change is a one-to-one transformation and conditional expectations are invariant to one-to-one transformations of the conditioning variables, the Rmax statistic based on instruments (W1, . . . , Wn) will, in general, be different from one based on (−W1, . . . , −Wn). This lack of invariance with respect to sign changes of the data is a generic feature of KS statistics (cf. Andrews 1997, p. 1100, who also describes a standard way to construct a sign-invariant version). Following his approach, an Rmax statistic that is invariant to sign changes of the coordinates of W can be constructed as follows for use in applications where it is desirable to have such invariance. (Say, for instance, in applications where the instruments are obtained by differencing other variables.) First, for each possible sign permutation π of the coordinates of W, construct the statistic Rmax(π) as defined in Equation (4). (There are 2^{dim(W)} sign permutations, denoted by Π, to consider.) Next, define the sign-invariant statistic Smax to be the maximum of these statistics, that is, Smax := ‖Rmax‖∞ := max_{π∈Π} Rmax(π). Smax is clearly invariant to sign changes in the coordinates of W, although, depending upon the dimension of W, it might be computationally more burdensome to calculate. The asymptotic properties of n^{1/2}Smax remain analogous to those of n^{1/2}Rmax. (Briefly, under the null hypothesis or under the sequence of local alternatives specified in Section 5 (cf. Lemmas 3.3 and 5.3), each n^{1/2}Rmax(π) converges in distribution to a functional of a Gaussian process indexed by π; hence, by the continuous mapping theorem, n^{1/2}Smax converges in distribution to the maximum (over


π ∈ Π) of this functional.) Therefore, as in Andrews's paper, we only provide a formal description of the properties of n^{1/2}Rmax.
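For concreteness, the sign-invariant statistic just described can be computed by looping over the 2^{dim(W)} sign permutations. The sketch below is ours and reuses the hypothetical rmax_statistic helper from the earlier sketch; it returns n^{1/2}Smax.

from itertools import product
import numpy as np

def smax_statistic(eps_hat, W):
    n = len(eps_hat)
    W = np.reshape(W, (n, -1))
    # The set Pi of all 2^{dim(W)} sign permutations of the coordinates of W
    stats = [rmax_statistic(eps_hat, W * np.array(signs))
             for signs in product([-1.0, 1.0], repeat=W.shape[1])]
    return max(stats)   # n^{1/2} Smax = max over pi of n^{1/2} Rmax(pi)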

Additional notation. The following additional notation is used for the remainder of the paper. A(θ) := (ε(θ), X, W) is the random vector containing ε(θ) and the distinct coordinates of X and W; hence, if all regressors are exogenous, that is, W = X, then A(θ) = (ε(θ), X). We write A0 := A(θ0) for notational convenience. Keeping in mind that ε(θ) is continuously distributed, whereas X and W may have discrete components, p_{A(θ)}(u, x, w) du κ(dx, dw) := P_{A(θ)}(du, dx, dw) denotes the density of P_{A(θ)}, where the dominating measure κ is a mixture of the Lebesgue and counting measures. Given a random vector U, L2(P_U) is the set of real-valued functions of U that are square integrable with respect to P_U. The L2(P_U) inner product and norm are ⟨a1, a2⟩_{P_U} := ∫ a1(u)a2(u) P_U(du) and ‖a‖_{2,P_U} := ⟨a, a⟩^{1/2}_{P_U}. The Euclidean norm is ‖ · ‖, and B(θ, ε) is the open ball of radius ε centred at θ; its ‖ · ‖-closure is B̄(θ, ε).

Throughout the paper we maintain the assumption that the observations (Yj, Xj, Wj), j = 1, . . . , n, are independent and identically distributed (i.i.d.). Unless stated otherwise, all limits are taken as the sample size n → ∞.

3. Large sample results under the null hypothesis

In this section, we derive the asymptotic distribution of KS_F when the null hypothesis is true. To do so, we make use of the notion of convergence in distribution of stochastic processes taking values in ℓ∞(F) (cf. Chapter 1.5 of van der Vaart and Wellner 1996, henceforth referred to as V&W, for details regarding convergence in distribution in ℓ∞(F)).

If θ0 is known and H0 is true, then (P̂_Z − P̂_{Z^r})f = (P̂_Z − P_Z)(f − f^r), f ∈ F. Hence, under standard conditions on F, it can be shown that the process {n^{1/2}(P̂_Z − P̂_{Z^r})f : f ∈ F} converges in distribution in ℓ∞(F) to a mean zero Gaussian process {G0 f : f ∈ F} with covariance function E G0f1 G0f2 := E[(f1 − f^r_1)(Z)(f2 − f^r_2)(Z)], f1, f2 ∈ F. Consequently, n^{1/2}KS_F converges in distribution to the random variable sup_{f∈F} |G0 f|. Of course, in practice, θ0 is unknown and has to be estimated. As we now show, the estimation error from using θ̂ instead of θ0 shifts the limiting distribution of {n^{1/2}(P̂_Ẑ − P̂_{Ẑ^r})f : f ∈ F} and, hence, n^{1/2}KS_F.

For maximum generality, we try to impose as few restrictions as possible on our model. For instance, we do not require μ to be smooth in θ; instead, we only assume that it is mean-square differentiable at θ0, that is,

Assumption 3.1 There exists a neighbourhood of θ0 such that for all θ in this neighbourhood, μ(x, θ) = μ(x, θ0) + μ̇′(x, θ0)(θ − θ0) + ρ(x, θ, θ0), where μ̇ : R^{dim(X)} × R^{dim(Θ)} → R^{dim(Θ)} satisfies ∫ ‖μ̇(x, θ0)‖² P_X(dx) < ∞ and the remainder term ρ : R^{dim(X)} × Θ² → R is such that ∫ sup_{θ∈B(θ0,δ)} ρ²(x, θ, θ0) P_X(dx) = o(δ²) as δ → 0.

Since mean-square differentiability is weaker than pointwise differentiability, our setup allows for kinks in the functional forms for μ. Note that ρ is identically zero if μ is linear in parameters.

Let ∂1 denote partial differentiation with respect to the first argument. The next assumption stipulates that

Assumption 3.2 (i) u ↦ p_{ε|X=x,W=w}(u) is differentiable a.e. on R and the conditional second moment of ∂1 log p_{ε|X=x,W=w}, denoted by v_{A0}(x, w) := ∫ (∂1 log p_{ε|X=x,W=w})²(u) p_{ε|X=x,W=w}(u) du, is uniformly bounded, that is, ‖v_{A0}‖∞ := sup_{(x,w)∈supp(X)×supp(W)} |v_{A0}(x, w)| < ∞.


(ii) The conditional distribution function of ε|X, W is Lipschitz, that is, there exists a nonnegative ζ ∈ L2(P_{X,W}) such that |∫_{(−∞,t1]} p_{ε|X=x,W=w}(u) du − ∫_{(−∞,t2]} p_{ε|X=x,W=w}(u) du| ≤ ζ(x, w)|t1 − t2|, t1, t2 ∈ R.

(i) implies (cf. Hájek 1972, Lemma A.3) that the square-root density p^{1/2}_{ε|X=x,W=w} is mean-square differentiable, that is, given δ ∈ L2(P_{X,W}),

[p^{1/2}_{ε|X=x,W=w}(u + δ(x, w)) − p^{1/2}_{ε|X=x,W=w}(u)] / p^{1/2}_{ε|X=x,W=w}(u)
= (1/2) [∂1 p_{ε|X=x,W=w}(u) / p_{ε|X=x,W=w}(u)] δ(x, w) + r(δ(x, w), u, x, w), (6)

where, for each ε > 0, ∫ r²(δ(x, w), u, x, w) P_{ε|X=x,W=w}(du) ≤ ε δ²(x, w). As a consequence, ‖r(δ, ·, ·, ·)‖_{2,P_{A0}} = o(‖δ‖_{2,P_{X,W}}) as ‖δ‖_{2,P_{X,W}} → 0. This fact is used to bound the remainder terms for Equation (A.3) in the proof of Lemma 3.1. (ii) implies that F1 and F2 satisfy Assumption 3.3(v).

The requirements on the class of test functions are as follows.

Assumption 3.3 (i) F separates probability measures on R^{1+dim(W)}, that is, if P1 ≠ P2 are probability measures on R^{1+dim(W)}, then P1 f ≠ P2 f for some f ∈ F.
(ii) F has a bounded envelope, that is, M_F := sup_{(f,z)∈F×R^{1+dim(W)}} |f(z)| < ∞.
(iii) F is P_{A0}-Donsker.
(iv) Elements of F are stable with respect to perturbations in their first argument in the sense that f ∈ F ⟹ f(· + t, ·) ∈ F for all t ∈ R.
(v) There exists a continuous function q : [0, ∞) → [0, ∞) such that q(0) = 0, sup_{f∈F} ‖f(· − Δ(X, θ, θ0), ·) − f(·, ·)‖_{2,P_{A0}} ≤ q(‖θ − θ0‖), and sup_{f∈F} ‖f^r(· − Δ(X, θ, θ0), ·) − f^r(·, ·)‖_{2,P_{A0}} ≤ q(‖θ − θ0‖), where Δ(X, θa, θb) := μ(X, θa) − μ(X, θb).

(i) is necessary because we employ elements of F to distinguish between the distributions of Z and Z^r (cf. the proof of Theorem 5.1). F1 and F2 satisfy (i) due to the fact that orthants and half-spaces separate probability measures. (ii), clearly satisfied by indicator functions, helps to uniformly (in F) bound remainder terms in Equation (A.3) in the proof of Lemma 3.1. In (iii), F is required to be Donsker with respect to the 'bigger' measure P_{A0} rather than P_Z in order to control the estimation uncertainty (from estimating θ0) associated with the endogenous components of X. (iii), which implies that F^r := {f^r : f ∈ F} is also P_{A0}-Donsker because F and F^r are covered by the same number of L2(P_{A0})-brackets, is used in the proof of Lemma 3.2 to show that n^{1/2}(P̂_Ẑ − P̂_{Ẑ^r}) converges in distribution in ℓ∞(F). Following V&W (Example 2.5.4), F1 (hence F^r_1) is P_{A0}-Donsker because its L2(P_{A0})-bracketing entropy is finite; the same argument shows that F2 (hence F^r_2) is P_{A0}-Donsker. (iv) implies that test functions with estimation error as an argument are still test functions. To see this, let f ∈ F1 so that f = 1_{(−∞,τ]×(−∞,v]} for some (τ, v) ∈ R × R^{dim(W)}. Then, f(Y − μ(X, θ̂), W) = 1_{(−∞,τ]×(−∞,v]}(ε − Δ(X, θ̂, θ0), W) = 1_{(−∞,τ+Δ(X,θ̂,θ0)]×(−∞,v]}(ε, W), implying that f(Y − μ(X, θ̂), W) ∈ F1; the same argument works for F2 as well. (v) is similar to Equation (2.7) of Khmaladze and Koul (2004) and Equation (3) of van der Vaart and Wellner (2007). Under Assumption 3.2(ii), F1 and F2 satisfy (v) with q(t) ∝ t^{1/2} (cf. Lemma A.3). (iii)–(v) plus an asymptotic equicontinuity argument help uniformly (in F) bound Equation (A.2) in the proof of Lemma 3.1.

We also assume that θ̂ is a consistent estimator of θ0, that is,

Assumption 3.4 plim(θ̂) = θ0.

For our results to hold, it is sufficient for θ̂ to be consistent under implications of conditional symmetry that are strictly weaker than Equation (3). For example, it is enough for θ̂ to be a


consistent generalised method of moments (GMM) or IV estimator of θ0 under E[ε|W] = 0 w.p.1 or E[Wε] = 0 (Newey and McFadden 1994), or even under the conditional median restriction median(ε|W) = 0 w.p.1 (Sakata 2007).

We are now ready to describe how the estimation of θ0 affects the empirical distributions of Ẑ and Ẑ^r; this will help us derive the limiting distribution of KS_F. Henceforth, μ̇0(·) := μ̇(·, θ0), and the symbol o_{Pr◦} indicates asymptotic negligibility in outer probability (Pr◦) to take care of measurability issues that may arise when taking the supremum over an uncountable set.

Lemma 3.1 Let Assumptions 3.1–3.4 hold. Then,

sup_{f∈F} |(P̂_Ẑ − P̂_Z)f − ⟨f, (∂1 log p_{A0})μ̇′0⟩_{P_{A0}}(θ̂ − θ0)| = o_{Pr◦}(n^{−1/2}) + o_{Pr}(‖θ̂ − θ0‖), (7)

sup_{f∈F} |(P̂_{Ẑ^r} − P̂_{Z^r})f − ⟨f^r, (∂1 log p_{A0})μ̇′0⟩_{P_{A0}}(θ̂ − θ0)| = o_{Pr◦}(n^{−1/2}) + o_{Pr}(‖θ̂ − θ0‖). (8)

Lemma 3.1 is a linearisation result. It shows how the processes {P̂_Ẑ f : f ∈ F} and {P̂_{Ẑ^r} f : f ∈ F} can be expanded about θ0 up to first order. The term ⟨·, (∂1 log p_{A0})μ̇′0⟩_{P_{A0}}(θ̂ − θ0), which vanishes if θ̂ = θ0, captures the uncertainty arising from estimating θ0. It is useful to note that conditional symmetry of ε|W was not used to derive Equations (7) and (8). Lemma 3.1 is thus of independent interest and can be used in other situations as well; for example, the fact that Equations (7) and (8) hold whether the null hypothesis (1) is true or not, provided θ0 is suitably redefined (cf. Section 4), is used to show the consistency of n^{1/2}KS_F.

Next, we derive the limiting distribution of n^{1/2}KS_F. Begin by observing that since n^{1/2}KS_F is a functional of the stochastic process {n^{1/2}(P̂_Ẑ − P̂_{Ẑ^r})f : f ∈ F}, its distribution is determined by the distribution of this process, which in turn can be identified from the joint distribution of its marginals. Towards this end, fix f ∈ F and write (P̂_Ẑ − P̂_{Ẑ^r})f = (P̂_Ẑ − P̂_Z)f − (P̂_{Ẑ^r} − P̂_{Z^r})f + (P̂_Z − P̂_{Z^r})f. Hence, since P_Z = P_{Z^r} by Equation (1), under the null hypothesis we have that

(P̂_Ẑ − P̂_{Ẑ^r})f = ((P̂_Ẑ − P̂_Z)f − ⟨f, (∂1 log p_{A0})μ̇′0⟩_{P_{A0}}(θ̂ − θ0))
 − ((P̂_{Ẑ^r} − P̂_{Z^r})f − ⟨f^r, (∂1 log p_{A0})μ̇′0⟩_{P_{A0}}(θ̂ − θ0))
 + (P̂_Z − P_Z)f − (P̂_{Z^r} − P_{Z^r})f + ⟨f − f^r, (∂1 log p_{A0})μ̇′0⟩_{P_{A0}}(θ̂ − θ0).

Then, by Lemma 3.1 and the fact that (P̂_Z − P_Z)f − (P̂_{Z^r} − P_{Z^r})f = (P̂_Z − P_Z)(f − f^r),

n^{1/2}(P̂_Ẑ − P̂_{Ẑ^r})f = X̂0(f) + o_{Pr◦}(1) + o_{Pr}(‖n^{1/2}(θ̂ − θ0)‖) unif. in f ∈ F, (9)

where X̂0(f) := n^{1/2}(P̂_Z − P_Z)(f − f^r) + ⟨f − f^r, (∂1 log p_{A0})μ̇′0⟩_{P_{A0}} n^{1/2}(θ̂ − θ0). To identify the marginal distribution of X̂0, we need some additional information about θ̂. In particular, we assume that

Assumption 3.5 θ̂ is asymptotically linear with influence function ϕ, that is, n^{1/2}(θ̂ − θ0) = n^{−1/2} Σ_{j=1}^n ϕ(Yj, Xj, Wj, θ0) + o_{Pr}(1), where Eϕ(Y, X, W, θ0) = 0 and E‖ϕ(Y, X, W, θ0)‖² < ∞.

This assumption implies that n^{1/2}(θ̂ − θ0) is asymptotically normal. In practical terms, this means that θ̂, centred at θ0, is approximately Gaussian in finite samples. However, it is well known by now that IV estimators may not be normally distributed in small or large samples if the parameters they are estimating are poorly identified, as, for instance, may occur if the instruments


are weak and the degree of endogeneity is high (cf., e.g., Phillips 1989; Nelson and Startz 1990; Choi and Phillips 1992; Maddala and Jeong 1992). In Section 6.1, we examine how the strength of the instruments and the degree of endogeneity of the regressors affect the finite sample performance of our test.

By Assumption 3.5 and the central limit theorem (CLT) for i.i.d. random vectors, (n^{1/2}(P̂_Z − P_Z)(f − f^r), n^{1/2}(θ̂ − θ0)) is asymptotically multivariate normal for each f ∈ F. In particular, n^{1/2}(θ̂ − θ0) converges in distribution in R^{dim(Θ)} to N_{ϕ0} =d N(0_{dim(Θ)×1}, Eϕ0ϕ′0), where ϕ0 := ϕ(Y, X, W, θ0). Hence, for all f ∈ F,

X̂0(f) →d X0(f) in R, (10)

where X0(f) := G0 f + ⟨f − f^r, (∂1 log p_{A0})μ̇′0⟩_{P_{A0}} N_{ϕ0} and G0 is the Gaussian process defined earlier. The limit X0 is a Gaussian process because all finite-dimensional distributions of X̂0 are asymptotically Gaussian (due to the previously stated fact that (n^{1/2}(P̂_Z − P_Z)(f − f^r), n^{1/2}(θ̂ − θ0)) is asymptotically multivariate normal for each f ∈ F). Since the remainder terms in Equation (9) are asymptotically negligible in probability uniformly in f ∈ F, the process {n^{1/2}(P̂_Ẑ − P̂_{Ẑ^r})f : f ∈ F} will converge in distribution in ℓ∞(F) to the limiting process {X0(f) : f ∈ F} provided that {X̂0(f) : f ∈ F} is asymptotically tight. As the next result shows, this is indeed the case.

Lemma 3.2 Let Assumptions 3.1–3.5 hold and Equation (1) be true. Then, {n^{1/2}(P̂_Ẑ − P̂_{Ẑ^r})f : f ∈ F} →d {X0(f) : f ∈ F} in ℓ∞(F).

Since x ↦ sup_{f∈F} |x(f)| (being a norm) is continuous on ℓ∞(F) and the limiting process in Lemma 3.2 takes values in ℓ∞(F), an application of the continuous mapping theorem (V&W, Theorem 1.3.6) yields the limiting distribution of n^{1/2}KS_F := sup_{f∈F} |n^{1/2}(P̂_Ẑ − P̂_{Ẑ^r})f|.

Theorem 3.1 Let Assumptions 3.1–3.5 hold and Equation (1) be true. Then, the random variable n^{1/2}KS_F converges in distribution to the random variable sup_{f∈F} |X0(f)|.

This leads immediately to the asymptotic distribution of n^{1/2}R = n^{1/2}KS_{F1}.

Corollary 3.1 Let the conditions of Theorem 3.1 hold. Then, n^{1/2}R converges in distribution to the random variable R0 := sup_{z∈R^{1+dim(W)}} |G0 1_{(−∞,z]} + ⟨1_{(−∞,z]} − 1^r_{(−∞,z]}, (∂1 log p_{A0})μ̇′0⟩_{P_{A0}} N_{ϕ0}|.

Nϕ0 |.Example 3.1 (No endogenous regressors) The limiting distribution of the test statistics simpli-fies if there are no endogenous regressors. To see this, suppose that all regressors are exogenous,that is, W = X, so that A0 = Z . Since 2f = (f + f r) + (f − f r) and Equation (1) implies that ∂1pZ

is antisymmetric in its first coordinate, it follows from Corollary 3.1 that n1/2R converges in dis-tribution to the random variable supz∈R1+dim(X) |G01(−∞,z] + 2〈1(−∞,z], (∂1 log pZ)μ′

0〉PZ Nϕ0 |. �

We also expect n^{1/2}Rmax to converge in distribution to R0 under the null hypothesis because F̂1 is dense in F1 with probability approaching one (w.p.a.1) (cf. Lemma A.4). The next result confirms this intuition (cf. Andrews 1997, p. 1105).

Lemma 3.3 Let the conditions of Theorem 3.1 hold. Then, n^{1/2}Rmax and n^{1/2}R both converge in distribution to the same random variable, that is, R0.

Now that we know how Rmax behaves in large samples, a size-α test, α ∈ (0, 1), for Equation (1) based on Rmax can be formalised as follows: reject H0 if n^{1/2}Rmax ≥ cα, where cα is the 1 − α


quantile of R0. However, cα cannot be obtained from a table because the distribution of R0 depends upon θ0 (via N_{ϕ0}) and P_{A0}; in other words, n^{1/2}Rmax is not an asymptotically pivotal statistic. Instead, quantiles of R0 can be simulated as described in Section 4.

4. The test using simulated critical values

Simulated critical values for specification tests have been used earlier in the literature, cf., e.g., Su and Wei (1991), Hansen (1996), Neumeyer et al. (2005), Delgado, Domínguez, and Lavergne (2006), and Delgado and Escanciano (2007). In this section, we enlarge this literature by showing that the use of simulated critical values for specification tests can be extended to handle models with endogenous regressors.

Intuitively, the basic idea behind simulating critical values for Rmax is to introduce additional randomness into the data and construct an artificial random variable (say R∗max) that has the same distribution as Rmax under the null hypothesis. Repeated draws of R∗max then yield a random sample from the distribution of Rmax under the null, which can be used to estimate the quantiles of Rmax to any desired level of accuracy, since the number of draws is under the control of the researcher.

The test using the simulated critical values is implemented as follows (a schematic implementation is sketched after the list):

(i) Use θ̂ and the residuals ε̂j := Yj − μ(Xj, θ̂), j = 1, . . . , n, to calculate n^{1/2}Rmax.

(ii) Independent of the observed data Dn := {(Yj, Xj, Wj) : 1 ≤ j ≤ n}, use a random number generator to generate R1, . . . , Rn i.i.d. ∼ Rademacher and define Y∗j := μ(Xj, θ̂) + Rj ε̂j, j = 1, . . . , n. Since Law(Rε̂|W1, . . . , Wn) = Law(−Rε̂|W1, . . . , Wn) by construction, the simulated sample {(Rj ε̂j, Wj) : 1 ≤ j ≤ n} satisfies the null hypothesis. (Recall that a random variable R is said to have the Rademacher, or symmetric Bernoulli, distribution if P_R := (δ_{−1} + δ_1)/2, i.e., R takes the values −1 and 1 with equal probability.)

(iii) Re-estimate θ0 with (Y∗j, Xj, Wj), j = 1, . . . , n, using the same procedure used earlier to obtain θ̂, so as to make the estimation error in the simulated sample resemble the estimation error in the data. Denote the resulting estimator by θ̂∗ and let ε̂∗j := Y∗j − μ(Xj, θ̂∗), j = 1, . . . , n, be the corresponding residuals.

(iv) Next, with Ẑ∗ := (ε̂∗, W), Ẑ∗r := (−ε̂∗, W), and Ẑ∗_n := {Ẑ∗_1, . . . , Ẑ∗_n}, calculate n^{1/2}R∗max := n^{1/2} max_{z∈Ẑ∗_n} |n^{−1} Σ_{j=1}^n 1_{(−∞,z]}(Ẑ∗_j) − n^{−1} Σ_{j=1}^n 1_{(−∞,z]}(Ẑ∗r_j)|. As shown subsequently, n^{1/2}R∗max has the same limiting distribution as n^{1/2}Rmax when the null hypothesis is true.

(v) Repeat (ii)–(iv) B times to obtain B random draws from the distribution of n^{1/2}Rmax under the null hypothesis. Calculate the 1 − α sample quantile (c∗_{α,B}) of these draws.

(vi) The decision rule 'Reject H0 if the value of n^{1/2}Rmax observed in (i) exceeds c∗_{α,B}' then leads to a size-α test for Equation (1). Alternatively, the p-value can be obtained by calculating the fraction of draws in (v) that exceed the observed value of n^{1/2}Rmax.

We now show that (vi) is justified asymptotically. To summarise our approach, we first prove that n^{1/2}KS∗_F := n^{1/2} sup_{f∈F} |(P̂_{Ẑ∗} − P̂_{Ẑ∗r})f| has a well-defined limiting distribution irrespective of whether the null hypothesis is true or not. From this, it follows that n^{1/2}R∗ := n^{1/2} sup_{z∈R^{1+dim(W)}} |n^{−1} Σ_{j=1}^n 1_{(−∞,z]}(Ẑ∗_j) − n^{−1} Σ_{j=1}^n 1_{(−∞,z]}(Ẑ∗r_j)| has the same limiting distribution as n^{1/2}R under the null hypothesis and is bounded in probability otherwise. The demonstration ends by showing that n^{1/2}R∗max has the same limiting distribution as n^{1/2}R∗ whether the null hypothesis is true or not, implying in particular that n^{1/2}R∗max (hence its quantiles) is bounded in probability even when the null hypothesis is false. Therefore, the test using critical values from


the simulated distribution of n^{1/2}R∗max is consistent, that is, it rejects a false null w.p.a.1, because n^{1/2}Rmax → ∞ w.p.a.1 when the null hypothesis is false (cf. Section 5).

We begin by assuming that θ̂ and θ̂∗ have a well-defined limit even when the null hypothesis is false, i.e., irrespective of whether (1) is true or not,

Assumption 4.1 plim((θ̂∗, θ̂) − (θ1, θ1)) = 0 for some θ1 ∈ Θ.

The parameter θ1 is called the 'pseudo-true value' and exists under very general conditions (cf., e.g., Hall and Inoue 2003). (Of course, if the null hypothesis is true, then θ1 = θ0.) Let Z(θ) := (ε(θ), W), Z^r(θ) := (−ε(θ), W), A1 := A(θ1), and let R denote the operator that 'Rademacherizes' the first component of its argument, for example, R(Z) := (Rε, W) and R(A1) := (Rε(θ1), X, W). Given f ∈ F, we can write (cf. the remark at the end of this section)

(P̂_{Ẑ∗} − P̂_{Ẑ∗r})f = (P̂_{R(Z(θ1))} − P̂_{R(Z^r(θ1))})f + (P̂_{Ẑ∗} − P̂_{R(Z(θ1))})f − (P̂_{Ẑ∗r} − P̂_{R(Z^r(θ1))})f
 = (P̂_{R(Z(θ1))} − P_{R(Z(θ1))})(f − f^r) + (P̂_{Ẑ∗} − P̂_{R(Z(θ1))})f − (P̂_{Ẑ∗r} − P̂_{R(Z^r(θ1))})f. (11)

To get some intuition behind why it makes sense to centre the last two terms about P̂_{R(Z(θ1))} and P̂_{R(Z^r(θ1))}, respectively, note that ε̂∗ = Rε̂ − Δ(X, θ̂∗, θ̂) = Rε(θ1) − Δ(X, θ̂∗, θ̂) − RΔ(X, θ̂, θ1). Hence, P̂_{Ẑ∗} − P̂_{R(Z(θ1))} captures the estimation error from re-estimating θ0 using the simulated sample; a similar interpretation holds for P̂_{Ẑ∗r} − P̂_{R(Z^r(θ1))}.

To study the properties of P̂_{Ẑ∗} and P̂_{Ẑ∗r} without assuming the null hypothesis to be true, we strengthen Assumptions 3.1–3.3 as follows.

Assumption 4.2 (i) There exists a neighbourhood of θ1 such that for each θ in this neighbourhood, μ(x, θ∗) = μ(x, θ) + μ̇′(x, θ)(θ∗ − θ) + ρ(x, θ∗, θ) with ‖sup_{θ∗∈B(θ,δ)} ρ(·, θ∗, θ)‖_{2,P_X} = o(δ) as δ → 0, ‖μ̇(·, θ1)‖ ∈ L2(P_X), and ‖μ̇(·, θ̂) − μ̇(·, θ1)‖_{2,P_X} = o_{Pr}(1). (ii) p_{ε(θ1)|X,W} satisfies Assumption 3.2. (iii) Assumption 3.3 holds with P_{A0} replaced by P_{R(A1)} and Δ(X, θ, θ0) replaced by Δ̄(−1, X, θ∗, θ, θ1) and Δ̄(1, X, θ∗, θ, θ1), where Δ̄(R, X, θa, θb, θ1) := Δ(X, θa, θb) + RΔ(X, θb, θ1).

Since plim(θ̂) = θ1 by Assumption 4.1, (i) implies that, w.p.a.1, μ(x, θ̂∗) = μ(x, θ̂) + μ̇′(x, θ̂)(θ̂∗ − θ̂) + ρ(x, θ̂∗, θ̂) and ‖sup_{θ∈B(θ̂,‖θ̂∗−θ̂‖)} ρ(·, θ, θ̂)‖_{2,P_X} = o(‖θ̂∗ − θ̂‖). Under (iii), ‖f_{θ̂∗,θ̂} − f_{θ1,θ1}‖_{2,P_{R(A1)}} ≤ q(‖(θ̂∗, θ̂) − (θ1, θ1)‖). These facts help prove the first result in this section, which shows how the re-estimation step affects the empirical distributions of Ẑ∗ and Ẑ∗r (cf. Lemma 3.1). Henceforth, let μ̇1(·) := μ̇(·, θ1) and, for f ∈ F, f_{θa,θb}(Rε(θ1), X, W) := f(Rε(θ1) − Δ̄(R, X, θa, θb, θ1), W) and f^r_{θa,θb}(Rε(θ1), X, W) := f^r(Rε(θ1) − Δ̄(R, X, θa, θb, θ1), W).

Lemma 4.1 Let Assumptions 4.1 and 4.2 hold. Then, whether or not the null hypothesis is true,

sup_{f∈F} |(P̂_{Ẑ∗} − P̂_{R(Z(θ1))})f − P_{R(A1)}(f_{θ̂∗,θ̂} − f_{θ1,θ1}) − 0.5⟨f − f^r, (∂1 log p_{A1})μ̇′1⟩_{P_{A1}}(θ̂∗ − θ̂)| = o_{Pr◦}(n^{−1/2}) + o_{Pr}(‖θ̂∗ − θ̂‖)

and

sup_{f∈F} |(P̂_{Ẑ∗r} − P̂_{R(Z^r(θ1))})f − P_{R(A1)}(f^r_{θ̂∗,θ̂} − f^r_{θ1,θ1}) − 0.5⟨f^r − f, (∂1 log p_{A1})μ̇′1⟩_{P_{A1}}(θ̂∗ − θ̂)| = o_{Pr◦}(n^{−1/2}) + o_{Pr}(‖θ̂∗ − θ̂‖).


As shown in the appendix,

P_{R(A1)}(f_{θ̂∗,θ̂} − f_{θ1,θ1}) = P_{R(A1)}(f^r_{θ̂∗,θ̂} − f^r_{θ1,θ1}). (12)

Therefore, by Equation (11) and Lemma 4.1, and irrespective of whether the null hypothesis is true or not,

n^{1/2}(P̂_{Ẑ∗} − P̂_{Ẑ∗r})f = X̂∗1(f) + o_{Pr◦}(1) + o_{Pr}(‖n^{1/2}(θ̂∗ − θ̂)‖) unif. in f ∈ F,

where

X̂∗1(f) := n^{1/2}(P̂_{R(Z(θ1))} − P_{R(Z(θ1))})(f − f^r) + ⟨f − f^r, (∂1 log p_{A1})μ̇′1⟩_{P_{A1}} n^{1/2}(θ̂∗ − θ̂). (13)

The empirical process {n^{1/2}(P̂_{R(Z(θ1))} − P_{R(Z(θ1))})(f − f^r) : f ∈ F} converges in distribution in ℓ∞(F) to G1, a mean zero Gaussian process with covariance function E G1f1 G1f2 := E[(f1 − f^r_1)(Z(θ1))(f2 − f^r_2)(Z(θ1))], f1, f2 ∈ F (cf. the proof of Lemma 4.2). X̂∗1 is thus a simulated version of X̂0 that remains well defined even if the null hypothesis is false. This also illustrates the importance of re-estimating θ0 using the simulated sample: if θ0 was not re-estimated, that is, if we set θ̂∗ = θ̂ in Equation (13), then X̂∗1 would not mimic X̂0 under the null hypothesis and there would be no reason to believe that n^{1/2}KS∗_F would possess the same limiting distribution as n^{1/2}KS_F when the null hypothesis is true.

Let Pr∗ and E∗ stand for probability and expectation conditional on Dn, that is, integrals with respect to the Rademacher distribution, because given Dn, the only source of randomness in (Y∗, X, W, θ̂) is R. Stochastic order symbols under Pr∗ are written as o_{Pr∗} and O_{Pr∗}.

To derive the distribution of {X̂∗1(f) : f ∈ F}, we assume that

Assumption 4.3 (i) Conditional on Dn, θ̂∗ is asymptotically linear with influence function ϕ∗, that is, n^{1/2}(θ̂∗ − θ̂) = n^{−1/2} Σ_{j=1}^n ϕ∗(Y∗j, Xj, Wj, θ̂) + o_{Pr∗}(1), where E∗ϕ∗(Y∗, X, W, θ̂) = 0 and E∗‖ϕ∗(Y∗, X, W, θ̂)‖² < ∞. (ii) Lindeberg's condition is satisfied, that is, for all ε > 0 and λ ∈ R^{dim(Θ)}, E∗[|λ′ϕ∗(Y∗, X, W, θ̂)|² 1(|λ′ϕ∗(Y∗, X, W, θ̂)| > εn^{1/2}σ∗_λ)] = o(σ∗²_λ), where σ∗²_λ := E∗[n^{−1/2} Σ_{j=1}^n λ′ϕ∗(Y∗j, Xj, Wj, θ̂)]². (iii) σ∗²_λ = E[λ′ϕ1]² + o_{Pr}(1), where ϕ1 := ϕ(Y, X, W, θ1) and ϕ is the influence function defined earlier in Assumption 3.5.

This assumption is straightforward to verify if μ is linear in parameters (cf. Neumeyer et al. 2005, p. 705). For instance, suppose that μ(X, θ) = X′θ and dim(W) = dim(X). Then, letting θ̂ := (Σ_{j=1}^n WjX′j)^{−1} Σ_{j=1}^n WjYj be the IV estimator of θ0, so that θ̂∗ := (Σ_{j=1}^n WjX′j)^{−1} Σ_{j=1}^n WjY∗j, it is easy to verify that

n^{1/2}(θ̂∗ − θ̂) = (n^{−1} Σ_{j=1}^n WjX′j)^{−1} (n^{−1/2} Σ_{j=1}^n Wj εj Rj)
 − (n^{−1} Σ_{j=1}^n WjX′j)^{−1} (n^{−1} Σ_{j=1}^n WjX′j Rj) n^{1/2}(θ̂ − θ0). (14)

Since n^{−1} Σ_{j=1}^n WjX′j Rj = o_{Pr∗}(1), the second term is asymptotically negligible in probability, conditional on Dn.
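Equation (14) is exact algebra, so it can be verified numerically. The following sketch is ours; the data-generating process is an illustrative assumption (with exogenous regressors purely for simplicity), and both sides of (14) agree up to floating-point error.

import numpy as np

rng = np.random.default_rng(1)
n, theta0 = 500, np.array([1.0, -2.0])
W = rng.normal(size=(n, 2))
X = W + rng.normal(size=(n, 2))
eps = rng.normal(size=n)
Y = X @ theta0 + eps

theta = np.linalg.solve(W.T @ X, W.T @ Y)          # theta_hat
eps_hat = Y - X @ theta
R = rng.choice([-1.0, 1.0], size=n)
Y_star = X @ theta + R * eps_hat
theta_star = np.linalg.solve(W.T @ X, W.T @ Y_star)

lhs = np.sqrt(n) * (theta_star - theta)            # left-hand side of (14)
A = np.linalg.inv(W.T @ X / n)
rhs = (A @ (W.T @ (eps * R)) / np.sqrt(n)
       - A @ ((W * R[:, None]).T @ X / n) @ (np.sqrt(n) * (theta - theta0)))
print(np.allclose(lhs, rhs))                       # True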

A necessary condition for the critical values to be accurately simulated is that θ̂∗ mimic the behaviour of θ̂. In particular, since θ̂ centred at θ0 is asymptotically linear by Assumption 3.5, it is


required that θ̂∗ centred at θ̂ also be asymptotically linear, with its influence function resembling the influence function of θ̂. (That, in fact, is why θ̂∗ is re-estimated using the same IV procedure used to obtain θ̂, even though, conditional on the observed data, the simulated residuals (R1ε̂1, . . . , Rnε̂n) are independent of the regressors (X1, . . . , Xn).)

Recall that when θ0 is poorly identified, as is the case when the instruments are weak and the regressors are highly endogenous, θ̂ − θ0 may not be Gaussian in small or large samples and, hence, not asymptotically linear. In contrast, θ̂∗, by virtue of being centred at θ̂ instead of θ0, is likely to be asymptotically linear even under these circumstances. (This is easily seen from Equation (14), where the only requirement on θ̂ for the second term to be asymptotically negligible in probability is that n^{1/2}(θ̂ − θ0) = O_{Pr}(1); that is, in particular, asymptotic normality of n^{1/2}(θ̂ − θ0) is not required.) The practical implication is that if the instruments are weak and the degree of endogeneity is high, then θ̂∗ may not mimic θ̂ well enough in finite samples to guarantee that the critical values are accurately simulated. We investigate this issue in Section 6.1.

Conditional on Dn, n^{1/2}λ′(θ̂∗ − θ̂)/σ∗_λ converges in distribution to a standard Gaussian random variable by Assumption 4.3(i), (ii), and Lindeberg's CLT. Hence, by Assumption 4.3(iii), n^{1/2}(θ̂∗ − θ̂) converges in distribution, conditional on Dn, to N_{ϕ1}, a mean zero Gaussian random vector with variance E[ϕ1ϕ′1]. A dominated convergence argument (Andrews 1997, p. 1101, Footnote 2) then implies that n^{1/2}(θ̂∗ − θ̂) also converges in distribution unconditionally to N_{ϕ1}. This leads to the following result about the limiting distribution of P̂_{Ẑ∗} − P̂_{Ẑ∗r}.

Lemma 4.2 Let Assumptions 4.1–4.3 hold. Then, irrespective of whether the null hypothesis is true or not, {n^{1/2}(P̂_{Ẑ∗} − P̂_{Ẑ∗r})f : f ∈ F} →d {X1(f) : f ∈ F} in ℓ∞(F), conditionally on Dn, hence unconditionally, where X1(f) := G1 f + ⟨f − f^r, (∂1 log p_{A1})μ̇′1⟩_{P_{A1}} N_{ϕ1}.

It follows by the continuous mapping theorem that

Corollary 4.1 Under the assumptions maintained in Lemma 4.2, n^{1/2}KS∗_F converges in distribution to sup_{f∈F} |X1(f)| whether the null hypothesis is true or not. Consequently, letting F = F1, the same holds for n^{1/2}R∗; that is, it converges in distribution to the random variable R1 := sup_{z∈R^{1+dim(W)}} |G1 1_{(−∞,z]} + ⟨1_{(−∞,z]} − 1^r_{(−∞,z]}, (∂1 log p_{A1})μ̇′1⟩_{P_{A1}} N_{ϕ1}|.

Therefore, n^{1/2}R∗ has the same limiting distribution as n^{1/2}R under the null hypothesis (because then θ1 = θ0) and is bounded in probability otherwise. Finally, we have that

Lemma 4.3 Under Assumptions 4.1–4.3, n^{1/2}R∗max and n^{1/2}R∗ have the same limiting distribution whether or not the null hypothesis is true.

Since n^{1/2}Rmax and n^{1/2}R have the same asymptotic distribution under the null hypothesis (cf. Lemma 3.3), it follows by Corollary 4.1 and Lemma 4.3 that n^{1/2}R∗max has the same limiting distribution as n^{1/2}Rmax when the null hypothesis is true and is bounded in probability otherwise. Therefore, under the null hypothesis, the simulated critical value c∗_{α,B} converges in probability (conditional on Dn) to cα, the 1 − α quantile of R0, provided B → ∞ as n → ∞ (cf. Andrews 1997, p. 1108). This completes our argument justifying the use of simulated critical values.


Remark By Rademacherization, (Rε(θ1), W) =d (−Rε(θ1), W). Thus, for f ∈ F,

P_{R(Z(θ1))} f = Ef(R(Z(θ1))) = Ef(Rε(θ1), W) = Ef(−Rε(θ1), W) = Ef^r(Rε(θ1), W) = P_{R(Z(θ1))} f^r.

Hence, P_{R(Z(θ1))}(f − f^r) = 0. Moreover, since f(R(Z^r(θ1))) = f(−Rε(θ1), W) = f^r(Rε(θ1), W) = f^r(R(Z(θ1))) implies that P̂_{R(Z^r(θ1))} f = n^{−1} Σ_{j=1}^n f(R(Z^r_j(θ1))) = n^{−1} Σ_{j=1}^n f^r(R(Zj(θ1))) = P̂_{R(Z(θ1))} f^r, it follows that (P̂_{R(Z(θ1))} − P̂_{R(Z^r(θ1))})f = (P̂_{R(Z(θ1))} − P_{R(Z(θ1))})(f − f^r).

5. Large sample results under the alternative

We begin by showing that our test is consistent, that is, it rejects any deviation from conditional symmetry w.p.a.1.

Theorem 5.1 Let Equation (2) be true and Assumptions 3.1–3.4 hold with (θ0, ε) replaced by (θ1, ε(θ1)). Then, lim_{n→∞} n^{1/2}KS_F = ∞ w.p.a.1.

Hence, n^{1/2}KS_F will reject Equation (1) w.p.a.1 as n → ∞ because the simulated critical values are bounded in probability (cf. Section 4). Letting F = F1 in Theorem 5.1, we have that n^{1/2}R is consistent. With a little additional effort, we can also show that

Theorem 5.2 lim_{n→∞} n^{1/2}Rmax = ∞ w.p.a.1 under the conditions of Theorem 5.1.

Hence, n^{1/2}Rmax is consistent against fixed alternatives as well. Next, we derive the power of n^{1/2}KS_F against a sequence of alternatives that lie in a n^{1/2}-neighbourhood of the null. To create the local alternatives of interest, begin by assuming that the null hypothesis is true, that is, P_Z = P_{Z^r}. Next, let (θn) be a sequence in Θ such that lim_{n→∞} θn = θ0 and assume that An := (ε(θn), X, W) is drawn from the perturbed measure P_{An} := P_{A0}(1 + n^{−1/2}h), where h : R^{1+dim(W)} → R is such that ‖h‖∞ := sup_{R×supp(W)} |h(u, w)| < ∞ and ∫ h dP_Z = 0 (these conditions ensure that P_{An} is a probability measure for n ≥ ‖h‖²∞). Reflecting the first coordinate about the origin, this leads to a sequence of distributions for Z(θn) := (ε(θn), W) and Z^r(θn) := (−ε(θn), W) given by

H1n : P_{Z(θn)} := P_Z(1 + n^{−1/2}h) and P_{Z^r(θn)} := P_Z(1 + n^{−1/2}h^r). (15)

Henceforth, let h^r ≠ h. Then, for f ∈ F,

(15) ⟹ (P_{Z(θn)} − P_{Z^r(θn)})f = n^{−1/2} P_Z((h − h^r)f) =: n^{−1/2} Γh(f),

and sup_{f∈F} |Γh(f)| > 0 because F is measure determining and h is not symmetric in its first coordinate. Thus, Equation (15) defines a sequence of local alternatives for Equation (3).

Since observations under H1n are independent but (for different n) not identically distributed because the underlying measures P_{Z(θn)} and P_{Z^r(θn)} depend upon n, the assumptions introduced in Section 3 have to be strengthened in order to derive the distribution of n^{1/2}KS_F under H1n. We begin with Assumption 3.1.

Assumption 5.1 There exists a neighbourhood of θ0 such that for each θn in this neighbourhood, μ(x, ·) is mean-square differentiable at θn, that is, μ(x, θ) = μ(x, θn) + μ̇′(x, θn)(θ − θn) + ρ(x, θ, θn) with ‖sup_{θ∈B(θn,δ)} ρ(·, θ, θn)‖_{2,P_X} = o(δ) as δ → 0, ‖μ̇(·, θ0)‖ ∈ L2(P_X), and ‖μ̇(·, θn) − μ̇(·, θ0)‖_{2,P_X} = o(1).


Next, we strengthen Assumption 3.2.

Assumption 5.2 (i) There exists a neighbourhood of θ0 such that for each θn in this neighbourhood, p_{ε(θn)|X=x,W=w} is a.e. differentiable on R and the conditional second moment of the derivative is uniformly bounded on N × supp(X) × supp(W). (ii) For each θn in the aforementioned neighbourhood, the conditional distribution function of ε(θn) given (X, W) is Lipschitz and the nonnegative Lipschitz constants (ζn) are such that sup_{n∈N} ζn ∈ L2(P_{X,W}).

Assumption 3.3 is modified as follows:

Assumption 5.3 (i), (ii), and (iv) are the same as (i), (ii), and (iv) of Assumption 3.3. (iii) The bracketing integral satisfies ∫_0^∞ sup_{n∈N} √(log N_{[ ]}(ε, F, L2(P_{An}))) dε < ∞. (v) Same as Assumption 3.3(v) but with (P_{A0}, θ0) replaced by (P_{An}, θn).

F1 and F2 satisfy (iii) because indicators of orthants and half-spaces are VC, hence, universally Donsker (V&W, Example 2.5.4 and Problem 2.6.14). The argument in Lemma A.3 shows that F1 and F2 also satisfy (v) under Assumption 5.2(ii). By V&W (Theorem 2.8.4), (ii) and (iii) imply that F is Donsker and pre-Gaussian uniformly in (P_{An}). Since F and −F^r are covered by the same number of (pointwise) L2(P_{An})-brackets,

N_{[ ]}(ε, F − F^r, L2(P_{An})) ≤ N_{[ ]}(ε, F, L2(P_{An})) × N_{[ ]}(ε, −F^r, L2(P_{An})) = N²_{[ ]}(ε, F, L2(P_{An})).

Hence, by (iii),

∫_0^∞ sup_{n∈N} √(log N_{[ ]}(ε, F − F^r, L2(P_{An}))) dε ≤ ∫_0^∞ sup_{n∈N} √(2 log N_{[ ]}(ε, F, L2(P_{An}))) dε < ∞.

Therefore, by V&W (Theorem 2.8.4), F − F^r is Donsker and pre-Gaussian uniformly in (P_{An}). This fact is used in the proof of Lemma 5.2.

Finally, Assumption 3.4 becomes

Assumption 5.4 plim(θ̂ − θn) = 0.

Under these conditions, it is straightforward to show that Lemma 3.1 remains valid with θ0 replaced by θn (the proof of the following result is virtually identical to the proofs of Lemmas 3.1 and 4.1 and is therefore omitted), that is,

Lemma 5.1 Let Assumptions 5.1–5.4 hold. Then, under H1n,

sup_{f∈F} |(P̂_Ẑ − P̂_{Z(θn)})f − ⟨f, (∂1 log p_{An})μ̇′0⟩_{P_{An}}(θ̂ − θn)| = o_{Pr◦}(n^{−1/2}) + o_{Pr}(‖θ̂ − θn‖),

sup_{f∈F} |(P̂_{Ẑ^r} − P̂_{Z^r(θn)})f − ⟨f^r, (∂1 log p_{An})μ̇′0⟩_{P_{An}}(θ̂ − θn)| = o_{Pr◦}(n^{−1/2}) + o_{Pr}(‖θ̂ − θn‖).

As in Section 3, we use Lemma 5.1 to derive the distribution of n^{1/2}KS_F under H1n. Begin by observing that P̂_Ẑ − P̂_{Ẑ^r} = (P̂_Ẑ − P̂_{Z(θn)}) − (P̂_{Ẑ^r} − P̂_{Z^r(θn)}) + (P̂_{Z(θn)} − P_{Z(θn)}) − (P̂_{Z^r(θn)} − P_{Z^r(θn)}) + (P_{Z(θn)} − P_{Z^r(θn)}). Hence, by Equation (15) and Lemma 5.1,

n^{1/2}(P̂_Ẑ − P̂_{Ẑ^r})f = X̂_{θn}(f) + Γh(f) + o_{Pr◦}(1) + o_{Pr}(‖n^{1/2}(θ̂ − θn)‖) unif. in f ∈ F,


where X̂_{θn}(f) := n^{1/2}(P̂_{Z(θn)} − P_{Z(θn)})(f − f^r) + ⟨f − f^r, (∂1 log p_{An})μ̇′0⟩_{P_{An}} n^{1/2}(θ̂ − θn). To identify the marginal distribution of X̂_{θn}, assume that (cf. Andrews 1997, Assumption E2(i)):

Assumption 5.5 (i) Assumption 3.5 holds with θ0 replaced by θn. (ii) The Lindeberg condition E[|λ′ϕn|² 1(|λ′ϕn| > εn^{1/2}σ_{λ,n})] = o(σ²_{λ,n}) holds for all ε > 0 and λ ∈ R^{dim(Θ)}, where ϕn := ϕ(Y, X, W, θn) and σ²_{λ,n} := E[λ′ϕn]² → E[λ′ϕ0]².

By Assumption 5.5, n^{1/2}(θ̂ − θn) converges in distribution to N_{ϕ0} by Lindeberg's CLT for i.n.i.d. random vectors. Let ∂1h exist and ‖∂1h‖∞ < ∞. Since log p_{An} = log p_{A0} + log(1 + n^{−1/2}h) (the second term is well defined for n ≥ ‖h‖²∞),

⟨f − f^r, (∂1 log p_{An})μ̇0⟩_{P_{An}} = ⟨f − f^r, (∂1 log p_{A0})μ̇0⟩_{P_{An}} + n^{−1/2}⟨f − f^r, (∂1h)μ̇0⟩_{P_{A0}}

so that, by Cauchy–Schwarz,

‖⟨f − f^r, (∂1 log p_{An})μ̇0⟩_{P_{An}} − ⟨f − f^r, (∂1 log p_{A0})μ̇0⟩_{P_{A0}}‖
 ≤ n^{−1/2}⟨f − f^r, (h + ∂1h)(∂1 log p_{A0})μ̇0⟩_{P_{A0}}
 ≤ n^{−1/2}‖f − f^r‖_{2,P_{A0}} ‖h + ∂1h‖∞ ‖v_{A0}‖∞ ‖(μ̇′0 μ̇0)^{1/2}‖_{2,P_X}.

Hence, for each f ∈ F, X̂_{θn}(f) converges in distribution to X0(f) by Lindeberg's CLT. Therefore, n^{1/2}(P̂_Ẑ − P̂_{Ẑ^r})F converges in distribution in ℓ∞(F) because (X̂_{θn} + Γh)F is asymptotically tight.

Lemma 5.2 Let Assumptions 5.1–5.5 hold. Then, under H1n, {n^{1/2}(P̂_Ẑ − P̂_{Ẑ^r})f : f ∈ F} →d {X0(f) + Γh(f) : f ∈ F} in ℓ∞(F).

Consequently, by the continuous mapping theorem, we have the limiting distribution of n^{1/2}KS_F under H1n.

Theorem 5.3 Let Assumptions 5.1–5.5 hold. Then, under H1n, the random variable n^{1/2}KS_F converges in distribution to the random variable sup_{f∈F} |X0(f) + Γh(f)|.

Letting F = F1 in Theorem 5.3, we immediately obtain that

Corollary 5.1 If Assumptions 5.1–5.5 hold, then, under H1n, n^{1/2}R converges in distribution to the random variable sup_{z∈R^{1+dim(W)}} |G0 1_{(−∞,z]} + ⟨1_{(−∞,z]} − 1^r_{(−∞,z]}, (∂1 log p_{A0})μ̇′0⟩_{P_{A0}} N_{ϕ0} + Γh(1_{(−∞,z]})|.

Given that we know that n^{1/2}Rmax and n^{1/2}R behave similarly under the null hypothesis, it is also not surprising that

Lemma 5.3 Under the conditions of Theorem 5.3, n1/2Rmax and n1/2R have the same limitingdistribution under H1n.

Finally, as in Andrews (1997, p. 1114), it can be shown that n^{1/2}KS_F is asymptotically locally unbiased (the same argument works for n^{1/2}Rmax as well). Indeed, since (F, ‖ · ‖_{2,P_{A0}}) is totally bounded (cf. the proof of Lemma 3.2), there exists a sequence of increasing finite sets (Fj) whose limit ∪_{j=1}^∞ Fj is dense (in the ‖ · ‖_{2,P_{A0}} norm) in F. Hence, sup_{f∈F} |(X0 + Γh)f| = sup_{f∈cl(∪_{j=1}^∞ Fj)} |(X0 + Γh)f| = sup_{f∈∪_{j=1}^∞ Fj} |(X0 + Γh)f|, where the second equality follows because


X0 + Γh is ‖ · ‖_{2,P_{A0}}-continuous on F w.p.1. (Almost all sample paths of X0(F) are uniformly ‖ · ‖_{2,P_{A0}}-continuous by V&W (Addendum 1.5.8), and uniform ‖ · ‖_{2,P_{A0}}-continuity of Γh follows because |Γh(f)| ≤ 2‖h‖∞ ‖f‖_{2,P_{A0}} for all f ∈ F, that is, Γh is a bounded linear functional on F.) Therefore, if B → ∞ as n → ∞,

lim_{n→∞} Pr_{H1n}(n^{1/2}KS_F > c∗_{α,B}) = Pr(sup_{f∈F} |(X0 + Γh)f| > cα)
 = lim_{j→∞} Pr(sup_{f∈Fj} |(X0 + Γh)f| > cα)   (continuity of prob. meas.)
 ≥ lim_{j→∞} Pr(sup_{f∈Fj} |X0(f)| > cα)   (Anderson's lemma)
 = Pr(sup_{f∈∪_{j=1}^∞ Fj} |X0(f)| > cα) = α   (continuity of prob. meas.).

Hence, our test has nontrivial power against the local alternatives.

6. Monte Carlo study

We now investigate the small sample properties of n^{1/2}Rmax for a linear IV model. Results are reported in Tables 1–5 for three different sample sizes, n = 50, 100, 200, and two sets of designs: the first with one instrument and the second with two instruments. (Parameters in the former are

Table 1. Empirical size of $n^{1/2}\hat R_{\max}$ for the linear IV model with one instrument.

                                                                          Nominal size (α)
Design                                   n    corr(X,U)  R²_{X,W}  F_{X,W}  0.01           0.05           0.10
----------------------------------------------------------------------------------------------------------------
High endogeneity, weak instrument        50   0.8        0.03      1.6      0.005 (0.003)  0.021 (0.007)  0.040 (0.009)
  r = 1, π1 = √(1/99), σ1 = 1,           100  0.8        0.02      2.2      0.003 (0.003)  0.016 (0.007)  0.038 (0.009)
  σ2 = √2, σV = 1, γ = 0.99              200  0.8        0.02      3.1      0.000 (0.003)  0.005 (0.007)  0.019 (0.009)
High endogeneity, strong instrument      50   0.6        0.61      79       0.014 (0.003)  0.060 (0.007)  0.106 (0.009)
  r = 0.5, π1 = 3, σ1 = √0.3,            100  0.6        0.60      154      0.010 (0.003)  0.043 (0.007)  0.094 (0.009)
  σ2 = √0.31, σV = √1.5, γ = 0.65        200  0.6        0.60      305      0.014 (0.003)  0.041 (0.007)  0.087 (0.009)
Low endogeneity, weak instrument         50   0.06       0.03      1.6      0.009 (0.003)  0.037 (0.007)  0.091 (0.009)
  r = 1, π1 = 0.1, σ1 = 1,               100  0.06       0.02      2.2      0.014 (0.003)  0.048 (0.007)  0.088 (0.009)
  σ2 = √2, σV = 1, γ = 0.08              200  0.06       0.02      3.2      0.011 (0.003)  0.046 (0.007)  0.084 (0.009)
Low endogeneity, strong instrument       50   0.07       0.60      82       0.014 (0.003)  0.043 (0.007)  0.083 (0.009)
  r = 9√11, π1 = 0.04, σ1 = 1,           100  0.07       0.60      161      0.016 (0.003)  0.044 (0.007)  0.095 (0.009)
  σ2 = √2, σV = 1, γ = 0.14              200  0.07       0.60      317      0.007 (0.003)  0.051 (0.007)  0.097 (0.009)

Notes: The last three columns report the fraction of simulations for which $n^{1/2}\hat R_{\max} > c^*_{\alpha,B}$. Monte Carlo standard errors are given in parentheses.


Table 2. Empirical power of $n^{1/2}\hat R_{\max}$ for the linear IV model with one instrument.

                                                                          Nominal size (α)
Design                                   n    corr(X,U)  R²_{X,W}  F_{X,W}  0.01   0.05   0.10
------------------------------------------------------------------------------------------------
High endogeneity, weak instrument        50   0.98       0.03      1.5      0.048  0.095  0.156
  π1 = 1/n, γ = 0.99                     100  0.98       0.01      1.3      0.035  0.101  0.181
                                         200  0.98       0.01      1.1      0.038  0.111  0.188
High endogeneity, strong instrument      50   0.51       0.62      93       0.132  0.319  0.442
  π1 = 2.4, γ = 0.95                     100  0.56       0.60      164      0.315  0.561  0.678
                                         200  0.59       0.58      291      0.716  0.903  0.950
Low endogeneity, weak instrument         50   0.07       0.02      1.1      0.020  0.093  0.153
  π1 = 0.2, γ = 0.1                      100  0.06       0.01      1.5      0.080  0.237  0.307
                                         200  0.07       0.01      2.3      0.310  0.450  0.573
Low endogeneity, strong instrument       50   0.04       0.66      108      0.106  0.300  0.431
  π1 = 2.6, γ = 0.18                     100  0.06       0.63      189      0.308  0.569  0.702
                                         200  0.07       0.60      343      0.706  0.894  0.958

Note: The last three columns report the fraction of simulations for which $n^{1/2}\hat R_{\max} > c^*_{\alpha,B}$.

Table 3. Empirical size of $n^{1/2}\hat R_{\max}$ for the linear IV model with two instruments.

                                                                               Nominal size (α)
Design                                        n    corr(X,U)  R²_{X,W}  F_{X,W}  0.01           0.05           0.10
----------------------------------------------------------------------------------------------------------------------
High endogeneity, collectively weak           50   0.8        0.04      1.1      0.013 (0.003)  0.032 (0.007)  0.059 (0.009)
 instruments                                  100  0.8        0.02      1.0      0.009 (0.003)  0.023 (0.007)  0.052 (0.009)
  r = 1, π1 = 1/n, π2 = 1/n,
  σ1 = √(W^{(2)}+0.5), σ2 = √10,
  σV = √10, γ = 9.99                          200  0.8        0.01      1.0      0.005 (0.003)  0.016 (0.007)  0.040 (0.009)
High endogeneity, collectively strong         50   0.55       0.61      39       0.014 (0.003)  0.055 (0.007)  0.099 (0.009)
 instruments                                  100  0.55       0.60      76       0.009 (0.003)  0.049 (0.007)  0.103 (0.009)
  r = 0.5, π1 = √(2/3), π2 = √(1/3),
  σ1 = √0.3, σ2 = √(0.31(W^{(2)}+0.5)),
  σV = √1.5, γ = 0.65                         200  0.55       0.60      150      0.016 (0.003)  0.051 (0.007)  0.109 (0.009)
Low endogeneity, collectively weak            50   0.06       0.05      1.3      0.019 (0.003)  0.055 (0.007)  0.099 (0.009)
 instruments                                  100  0.06       0.03      1.5      0.012 (0.003)  0.056 (0.007)  0.099 (0.009)
  r = 1, π1 = 0.1√(2/3), π2 = 0.1√(1/3),
  σ1 = 1, σ2 = √(2(W^{(2)}+0.5)),
  σV = 1, γ = 0.08                            200  0.05       0.02      1.9      0.009 (0.003)  0.047 (0.007)  0.082 (0.009)
Low endogeneity, collectively strong          50   0.03       0.60      41       0.012 (0.003)  0.051 (0.007)  0.101 (0.009)
 instruments                                  100  0.03       0.60      79       0.012 (0.003)  0.056 (0.007)  0.099 (0.009)
  r = 9√11, π1 = 0.03, π2 = 0.02,
  σ1 = √(W^{(2)}+0.5), σ2 = √2,
  σV = 1, γ = 0.14                            200  0.02       0.60      156      0.018 (0.003)  0.057 (0.007)  0.116 (0.009)

Notes: The last three columns report the fraction of simulations for which $n^{1/2}\hat R_{\max} > c^*_{\alpha,B}$. Monte Carlo standard errors are given in parentheses.


Table 4. Empirical power of $n^{1/2}\hat R_{\max}$ for the linear IV model with two instruments.

                                                                               Nominal size (α)
Design                                        n    corr(X,U)  R²_{X,W}  F_{X,W}  0.01   0.05   0.10
-----------------------------------------------------------------------------------------------------
High endogeneity, collectively weak           50   0.98       0.05      1.2      0.036  0.105  0.163
 instruments                                  100  0.99       0.02      1.2      0.035  0.121  0.194
  π1 = (1/n)√(2/3), π2 = (1/n)√(1/3),
  γ = 0.99                                    200  0.99       0.01      1.1      0.052  0.153  0.225
High endogeneity, collectively strong         50   0.60       0.50      30       0.084  0.238  0.383
 instruments                                  100  0.60       0.50      55       0.252  0.544  0.707
  π1 = 2.4√(2/3), π2 = 2.4√(1/3),
  γ = 0.95                                    200  0.65       0.50      99       0.725  0.917  0.966
Low endogeneity, collectively weak            50   0.10       0.05      1.2      0.056  0.133  0.227
 instruments                                  100  0.09       0.05      1.1      0.117  0.255  0.365
  π1 = 0.2√(2/3), π2 = 0.2√(1/3),
  γ = 0.1                                     200  0.09       0.01      1.5      0.371  0.526  0.624
Low endogeneity, collectively strong          50   0.07       0.58      37       0.068  0.211  0.331
 instruments                                  100  0.08       0.55      64       0.237  0.493  0.644
  π1 = 2.6√(2/3), π2 = 2.6√(1/3),
  γ = 0.18                                    200  0.09       0.53      116      0.699  0.896  0.949

Note: The last three columns report the fraction of simulations for which $n^{1/2}\hat R_{\max} > c^*_{\alpha,B}$.

(Parameters in the former are estimated using a set of just-identified moment conditions; those in the latter are estimated from a set of over-identified moment conditions.) To keep the running time manageable, these results are based on 1000 simulations and 500 replications per simulation for simulating the critical values. Code for the simulation experiment is written in R and is available from the authors.
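As a rough illustration of the computations involved, the sketch below (ours, not the authors' released code) computes the statistic for a scalar instrument and simulates a critical value by flipping residual signs with Rademacher draws, in the spirit of the symmetrisation scheme of Section 4; the helper names iv_fit, Rmax_stat, and crit_val, and the exact resampling details, are our assumptions.

    ## Minimal sketch for a scalar instrument W (our names and conventions).
    iv_fit <- function(Y, X, W) {
      # just-identified IV estimator of (theta0, theta1) from EU = 0, E[WU] = 0
      Wm <- cbind(1, W); Xm <- cbind(1, X)
      solve(crossprod(Wm, Xm), crossprod(Wm, Y))
    }

    Rmax_stat <- function(e, W) {
      # sup over the estimated observations of |P_Z - P^r_Z| evaluated at
      # indicator functions 1_{(-inf, z]}, with P^r_Z built from -e
      n <- length(e)
      d <- vapply(seq_len(n), function(j)
        mean(e <= e[j] & W <= W[j]) - mean(-e <= e[j] & W <= W[j]), numeric(1))
      max(abs(d))
    }

    crit_val <- function(Y, X, W, alpha = 0.05, B = 500) {
      # critical value simulated by Rademacher sign flips of the residuals,
      # re-estimating theta from each symmetrised pseudo-sample
      th <- iv_fit(Y, X, W)
      e  <- drop(Y - cbind(1, X) %*% th)
      n  <- length(Y)
      stats <- replicate(B, {
        s   <- sample(c(-1, 1), n, replace = TRUE)
        Ys  <- drop(cbind(1, X) %*% th) + s * e
        ths <- iv_fit(Ys, X, W)
        es  <- drop(Ys - cbind(1, X) %*% ths)
        sqrt(n) * Rmax_stat(es, W)
      })
      unname(quantile(stats, 1 - alpha))
    }

Given simulated data, the test then simply compares sqrt(n) * Rmax_stat(...) with crit_val(...).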

6.1. Linear IV model with one instrument

We consider the simple linear regression model $Y := \theta_0 + \theta_1 X + U$, where the regressor $X$ is correlated with $U$. The reduced form for the endogenous regressor is given by $X := \pi_0 + \pi_1 W + V$, with $W$ the single instrument. We fix $\theta_0 = \theta_1 = \pi_0 = 1$; the choice of $\pi_1$ is described subsequently. The random variables $U$ and $W$ are generated such that the conditional distribution of $U$ given $W$ is symmetric about the origin without $U$ being independent of $W$. This is done as follows. First, for a pre-selected positive number $r$ specified in Table 1, we draw $W$ from the two-point distribution $(\delta_r + \delta_{-r})/2$. Next, given $W$, we generate $(U, V)$ bivariate normal with zero mean and covariance $\gamma$; the choice of $\gamma$ is described subsequently. The variance of $U$, denoted by $\sigma^2_U$, is allowed to depend upon $W$ by choosing $\sigma^2_U := \sigma^2_1 1(W = r) + \sigma^2_2 1(W = -r)$, with $\sigma_1 \ne \sigma_2$; the variance of $V$, denoted by $\sigma^2_V$, is constant. The parameters $(\theta_0, \theta_1)$ are estimated from the just-identified moment conditions $EU = 0$ and $E[WU] = 0$ using the simulated data $(Y_1, X_1, W_1), \ldots, (Y_n, X_n, W_n)$. That is, the estimator of $(\theta_0, \theta_1)$ is given by $\hat\theta := (\hat\theta_0, \hat\theta_1)_{2\times 1} := (\sum_{j=1}^n W_j X_j')^{-1}\sum_{j=1}^n W_j Y_j$, where $W := (1, W)_{2\times 1}$ and $X := (1, X)_{2\times 1}$.
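For concreteness, one draw from a null design can be generated as follows; this is a minimal sketch under our reading of the design above (gen_null and its argument names are ours):

    gen_null <- function(n, r, pi1, s1, s2, sV, gam) {
      W  <- sample(c(r, -r), n, replace = TRUE)     # two-point instrument
      sU <- ifelse(W == r, s1, s2)                  # sd of U depends on W
      Z1 <- rnorm(n); Z2 <- rnorm(n)
      V  <- sV * Z1
      # cov(U, V) = gam; requires sU^2 >= gam^2/sV^2 (true for Table 1 designs)
      U  <- (gam / sV) * Z1 + sqrt(sU^2 - gam^2 / sV^2) * Z2
      X  <- 1 + pi1 * W + V                         # reduced form, pi0 = 1
      Y  <- 1 + X + U                               # structural eq., theta0 = theta1 = 1
      data.frame(Y = Y, X = X, W = W)
    }

By construction, $U \mid W$ is mean-zero normal, hence symmetric about the origin, while its variance (and thus its distribution) depends on $W$.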

As mentioned earlier in Sections 3 and 4, the finite sample performance of our test may be affected if the instruments are weak and the regressors are highly endogenous. To study the impact that the degree of endogeneity of $X$ and the strength of the instrument $W$ may have on the small sample properties of our test statistic, the empirical size and power of $n^{1/2}\hat R_{\max}$ are reported in Tables 1 and 2, respectively, for the $2\times 2$ contingency table {high endogeneity, low endogeneity} × {weak instrument, strong instrument}.


Table 5. Empirical size and power of $n^{1/2}\hat R_{\max}$ for the linear IV model with one strong ($W^{(1)}$) and one weak ($W^{(2)}$) instrument.

                                                                               Nominal size (α)
Design                                        n    corr(X,U)  R²_{X,W}  F_{X,W}  0.01           0.05           0.10
----------------------------------------------------------------------------------------------------------------------
High endogeneity (null hypothesis is true)    50   0.55       0.61      39       0.014 (0.003)  0.045 (0.007)  0.078 (0.009)
  r = 0.5, π1 = √0.99, π2 = √0.01,            100  0.55       0.60      76       0.011 (0.003)  0.054 (0.007)  0.098 (0.009)
  σ1 = √0.3, σ2 = √(0.31(W^{(2)}+0.5)),
  σV = √1.5, γ = 0.65                         200  0.55       0.60      150      0.013 (0.003)  0.050 (0.007)  0.107 (0.009)
High endogeneity (alternative hypothesis      50   0.54       0.60      44       0.082          0.210          0.357
 is true)                                     100  0.56       0.60      79       0.239          0.533          0.693
  π1 = 2.4√0.99, π2 = 2.4√0.01,
  γ = 0.95                                    200  0.60       0.60      141      0.721          0.910          0.959
Low endogeneity (null hypothesis is true)     50   0.03       0.60      41       0.006 (0.003)  0.045 (0.007)  0.091 (0.009)
  r = 0.5, π1 = √0.99, π2 = √0.01,            100  0.03       0.60      80       0.015 (0.003)  0.048 (0.007)  0.090 (0.009)
  σ1 = √0.3, σ2 = √(0.31(W^{(2)}+0.5)),
  σV = √1.5, γ = 0.65                         200  0.02       0.60      156      0.015 (0.003)  0.062 (0.007)  0.117 (0.009)
Low endogeneity (alternative hypothesis       50   0.06       0.67      53       0.068          0.195          0.323
 is true)                                     100  0.07       0.63      92       0.224          0.488          0.646
  π1 = 2.6√0.99, π2 = 2.6√0.01,
  γ = 0.18                                    200  0.08       0.61      166      0.681          0.889          0.951

Notes: The last three columns report the fraction of simulations for which $n^{1/2}\hat R_{\max} > c^*_{\alpha,B}$. Monte Carlo standard errors, where shown, are given in parentheses.

By a strong (resp. weak) instrument we mean here that $R^2_{X,W} \ge 0.6$ (resp. $R^2_{X,W} \le 0.03$), where $R^2_{X,W}$ is the $R^2$ statistic for the reduced form regression. Tables 1 and 2 report $R^2_{X,W}$ and the first-stage $F$-statistic $F_{X,W}$ for testing $\pi_1 = 0$, averaged over the simulations. (The strength of the instrument can also be stated in terms of the sample 'concentration parameter' by using the fact that, since $EW = 0$, the population concentration parameter is given by $R^2_0/(1 - R^2_0)$, where $R^2_0$ is the population $R^2$ for the reduced form regression; that is, $R^2_{X,W}$ is the sample analogue of $R^2_0$.) Similarly, high (resp. low) endogeneity means that $|\mathrm{corr}(X,U)| \ge 0.8$ (resp. $|\mathrm{corr}(X,U)| \le 0.08$), where corr denotes the sample correlation. A point to note: since the maximum value that $\mathrm{corr}^2(X,W) \wedge |\mathrm{corr}(X,U)|$ can take is $(\sqrt5 - 1)/2$ (cf. Lemma A.1), for the high endogeneity and strong instrument design we generate the data so that $R^2_{X,W} \approx \mathrm{corr}(X,U) \approx 0.6$.

The values of $\pi_1$ and $\gamma$ for the first design in Tables 1 and 2 are chosen to create a combination of a highly endogenous $X$, a weak instrument, and high correlation between $U$ and $V$. As noted by Maddala and Jeong (1992), this trifecta causes $\hat\theta$ to be nonnormal in small samples, so that our test is not likely to perform well for this design (recall the discussion after Assumption 4.3). Values of $\pi_1$ and $\gamma$ for the remaining three designs were chosen so that $\hat\theta$ is approximately normal in finite samples. (This was verified by looking at the quantile plots of $\hat\theta$ across the simulations. To save space, these plots are not reported in the paper.)

The results in Table 1 suggest that our test is working as expected. When $X$ is highly endogenous, $W$ is a weak instrument, and $U$ and $V$ are highly correlated, the empirical size of $n^{1/2}\hat R_{\max}$ is not


close to its nominal value. Except for this worst-case scenario, the empirical size is not statistically different from its nominal value at the 5% level of significance. This makes intuitive sense when the instrument is strong, because $\hat\theta$ then behaves as expected irrespective of the degree of endogeneity. However, as Table 1 reveals, an interesting finding is that the empirical size can be close to the nominal size even when the instrument is weak, provided the regressor is not very endogenous, that is, the third design. This is because $\hat\theta$ is approximately Gaussian for this design, so that the critical values are accurately simulated.

We also carried out a small simulation to examine the power of $n^{1/2}\hat R_{\max}$ in finite samples. Now, $\theta_0 = \theta_1 = \pi_0 = 1$ and $W \stackrel{d}{=} (\delta_1 + \delta_{-1})/2$. Independent of $W$, we generate $(\tilde U, \tilde V)$ bivariate normal with mean zero, variance one, and covariance $\gamma$, and let $U := W(e^{\tilde U} - e^{0.5})$ and $V := W(e^{\tilde V} - e^{0.5})$; that is, $U$ and $V$ are symmetrised shifted lognormals. Note that $U$ is conditionally asymmetric but marginally symmetric. Moreover, $EU = 0$ and $E[WU] = 0$, so that the structural parameters $(\theta_0, \theta_1)$ remain identified, and $\hat\theta$ remains consistent for $(\theta_0, \theta_1)$, under the alternative hypothesis.
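A sketch of one draw from this alternative design, under the same caveats as before (gen_alt is our name; U0 and V0 stand in for $(\tilde U, \tilde V)$):

    gen_alt <- function(n, pi1, gam) {
      W  <- sample(c(1, -1), n, replace = TRUE)
      Z1 <- rnorm(n); Z2 <- rnorm(n)
      U0 <- Z1
      V0 <- gam * Z1 + sqrt(1 - gam^2) * Z2     # corr(U0, V0) = gam
      # symmetrised shifted lognormals: E[exp(U0)] = exp(0.5), so EU = 0
      U  <- W * (exp(U0) - exp(0.5))
      V  <- W * (exp(V0) - exp(0.5))
      X  <- 1 + pi1 * W + V
      Y  <- 1 + X + U
      data.frame(Y = Y, X = X, W = W)
    }

Since $W = \pm 1$ with equal probability independently of $\tilde U$, the marginal law of $U$ is symmetric even though $U \mid W$ is skewed.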

The results in Table 2 again validate our intuition. In particular, as expected, the test exhibits poor power for the first design (where $X$ is highly endogenous, $W$ is a weak instrument, and $U$ and $V$ are highly correlated), and the power does not seem to improve as the sample size gets larger. The performance improves markedly for the weak instrument case when the degree of endogeneity gets smaller: power increases with sample size and, by the time $n = 200$, is several times higher than for the first design. The test, of course, works best in terms of power when the instrument is strong; in particular, its power then appears to be robust to the degree of endogeneity.

Next, we enlarge the set of instruments to see how that affects the finite sample size and power of $n^{1/2}\hat R_{\max}$. We consider a model with two instruments so that, in addition to the cases considered earlier, we can now also examine the finite sample properties of $n^{1/2}\hat R_{\max}$ in a setting where one instrument is strong and the other is weak. (Intuitively, we expect the strong instrument to compensate for the weak one.)

6.2. Linear IV model with two instruments

As in Section 6.1, $Y := \theta_0 + \theta_1 X + U$, where $X$ is correlated with $U$. However, we now have two instruments $W := (W^{(1)}, W^{(2)})_{2\times 1}$ such that, under the null hypothesis, the conditional distribution of $U$ given $W$ is symmetric about the origin. The reduced form model for the endogenous regressor is given by $X := \pi_0 + \pi_1 W^{(1)} + \pi_2 W^{(2)} + V$. We set $\theta_0 = \theta_1 = \pi_0 = 1$ as before. The choice of $\pi_1$ and $\pi_2$ is described in Tables 3–5. The first instrument $W^{(1)}$ is identical to the instrument in Section 6.1 and is generated as described earlier, under the null hypothesis as well as the alternative hypothesis. The second instrument $W^{(2)}$ is stochastically independent of $W^{(1)}$ and is generated as follows. Under the null hypothesis, $W^{(2)} \stackrel{d}{=} r\sqrt{12}\,\mathrm{Uniform}(0,1)$ is a rescaled uniform random variable, where $r$ is the pre-selected positive number used to generate $W^{(1)}$. Under the alternative hypothesis, $W^{(2)} \stackrel{d}{=} \mathrm{Uniform}(0,1)$. The error terms $U$ and $V$ are generated as follows. Under the null hypothesis, $(U, V)$ is drawn from a bivariate normal distribution with mean zero and covariance $\gamma$. The variance of $U$ is $\sigma^2_U := \sigma^2_1 1(W^{(1)} = r) + \sigma^2_2 1(W^{(1)} = -r)$, with $\sigma_1$ or $\sigma_2$ depending upon $W^{(2)}$. Thus, under the null hypothesis, $U$ is heteroscedastic with respect to both $W^{(1)}$ and $W^{(2)}$. The variance of $V$ is constant. Under the alternative hypothesis, $U := W^{(1)}(e^{\tilde U} - e^{1/2} + W^{(2)} - 0.5)$ and $V := W^{(1)}(e^{\tilde V} - e^{1/2} + W^{(2)} - 0.5)$ with $(\tilde U, \tilde V)$ as defined in Section 6.1. The values of $\sigma_1, \sigma_2, \sigma_V, \gamma$ for the various designs under the null and the alternative hypotheses are given in Tables 3–5.
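As a quick check of the rescaling, note that the $\sqrt{12}$ factor makes the variance of $W^{(2)}$ under the null equal to that of $W^{(1)}$; a one-line sketch (our variable names, with n and r as above):

    W2 <- r * sqrt(12) * runif(n)   # var(W2) = r^2 * 12 * (1/12) = r^2 = var(W1)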

The distribution of $(U, W^{(1)}, W^{(2)})$ is chosen such that the moment conditions $EU = 0$, $E[W^{(1)}U] = 0$, and $E[W^{(2)}U] = 0$ hold under the null as well as the alternative hypotheses. Since these moment conditions over-identify $(\theta_0, \theta_1)$ under the null and the alternative


hypotheses, $(\theta_0, \theta_1)$ is consistently estimated with the two-step optimal GMM estimator

$$\hat\theta := \Big[\Big(\sum_{j=1}^n X_j W_j'\Big)\hat\Omega^{-1}(\tilde\theta)\Big(\sum_{j=1}^n W_j X_j'\Big)\Big]^{-1}\Big(\sum_{j=1}^n X_j W_j'\Big)\hat\Omega^{-1}(\tilde\theta)\Big(\sum_{j=1}^n W_j Y_j\Big),$$

where $W := (1, W^{(1)}, W^{(2)})_{3\times 1}$,

$$\tilde\theta := \Big[\Big(\sum_{j=1}^n X_j W_j'\Big)\Big(\sum_{j=1}^n W_j X_j'\Big)\Big]^{-1}\Big(\sum_{j=1}^n X_j W_j'\Big)\Big(\sum_{j=1}^n W_j Y_j\Big)$$

is a preliminary estimator of $(\theta_0, \theta_1)$, and $\hat\Omega(\tilde\theta) := \sum_{j=1}^n W_j W_j'(Y_j - X_j'\tilde\theta)^2/n$ is an estimator of the efficient weighting matrix using $\tilde\theta$. The size and power of $n^{1/2}\hat R_{\max}$ when the instruments are collectively weak or collectively strong are given in Tables 3 and 4. Following Section 6.1, we say that the instruments are collectively weak, which necessarily implies that the instruments are individually weak, if the $R^2$ of the reduced form is less than or equal to 0.03. Similarly, the instruments are called collectively strong if the $R^2$ of the reduced form is greater than or equal to 0.60. This does not rule out the case, handled separately, in which one of the instruments is strong while the other is weak.
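A compact sketch of this two-step estimator, following the display above (gmm_fit is our name):

    gmm_fit <- function(Y, X, W1, W2) {
      n  <- length(Y)
      Wm <- cbind(1, W1, W2); Xm <- cbind(1, X)
      A  <- crossprod(Xm, Wm)                 # sum_j X_j W_j'
      B  <- crossprod(Wm, Xm)                 # sum_j W_j X_j'
      # preliminary estimator
      th0 <- solve(A %*% B, A %*% crossprod(Wm, Y))
      # efficient weighting matrix from preliminary residuals
      e  <- drop(Y - Xm %*% th0)
      Om <- crossprod(Wm * e) / n             # sum_j W_j W_j' e_j^2 / n
      Oi <- solve(Om)
      # two-step optimal GMM estimator
      solve(A %*% Oi %*% B, A %*% Oi %*% crossprod(Wm, Y))
    }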

Table 5 contains results for the case when $W^{(1)}$ is strong and $W^{(2)}$ is weak. By this we mean that $W^{(1)}$ explains about 99% of the $R^2$ for the reduced form and $W^{(2)}$ the remaining 1%. (To be consistent with the definition of collectively strong instruments, the $R^2$ of the reduced form in Table 5 is set roughly equal to 0.6.) Since $W^{(1)}$ and $W^{(2)}$ are independent with equal variance, their contribution to the population $R^2$ is determined by $\pi_1$ and $\pi_2$, respectively. Consequently, setting $\pi_1/\pi_2 = \sqrt{99}$ makes $W^{(1)}$ strong and $W^{(2)}$ weak. Other parameter values are as in Tables 3 and 4.

The results in Tables 3–5 are qualitatively very similar to those in Tables 1 and 2. Namely, under the null hypothesis, the empirical size of $n^{1/2}\hat R_{\max}$ is not close to its nominal value only when the regressor is highly endogenous, both instruments are weak, and $U$ and $V$ are highly correlated. In all other cases, the empirical size is not statistically different from its nominal value at the 5% level of significance. Similarly, under the alternative hypothesis, $n^{1/2}\hat R_{\max}$ has poor power for the design in which the regressor is highly endogenous, both instruments are weak, and $U$ and $V$ are highly correlated. However, the power improves rapidly as the regressor becomes less endogenous. These findings are in accordance with our intuition that, in general, regardless of the degree of endogeneity, a strong instrument can compensate for a weak one, so that the finite sample properties of $n^{1/2}\hat R_{\max}$ are not much affected by a weak instrument, provided that a strong enough instrument is also present.

Overall, the simulation results in Tables 1–5 suggest that a combination of weak instruments and highly endogenous regressors can significantly affect the performance of $n^{1/2}\hat R_{\max}$. Apart from this worst-case scenario, however, our test appears to perform well in terms of both size and power in moderately sized samples.

7. Conclusion

We have shown how to consistently test for conditional symmetry in the presence of endogenous regressors without making any distributional assumptions and without doing any nonparametric smoothing. The KS-type statistic we propose is easy to implement because it does not require optimisation over an uncountable set, and it can detect $n^{1/2}$-deviations from the null. Simulation results suggest that our test can work very well in moderately sized samples.

Acknowledgements

We thank an anonymous referee for helpful comments.

References

Ahmad, I.A., and Li, Q. (1997), 'Testing Symmetry of an Unknown Density Function by Kernel Method', Journal of Nonparametric Statistics, 7, 279–293.
Andrews, D.W.K. (1997), 'A Conditional Kolmogorov Test', Econometrica, 65, 1097–1128.
Bai, J., and Ng, S. (2001), 'A Consistent Test for Conditional Symmetry of Time Series', Journal of Econometrics, 103, 225–258.
Bickel, P.J. (1982), 'On Adaptive Estimation', Annals of Statistics, 10, 647–671.
Card, D., and Hyslop, D. (1996), 'Does Inflation "Grease the Wheels of the Labor Market"?', NBER Working Paper 5538.
Choi, I., and Phillips, P.C.B. (1992), 'Asymptotic and Finite Sample Distribution Theory for IV Estimators and Tests in Partially Identified Structural Equations', Journal of Econometrics, 51, 113–150.
Christofides, L.N., and Stengos, T. (2001), 'A Non-parametric Test of the Symmetry of PSID Wage-Change Distributions', Economics Letters, 71, 363–368.
Davydov, Y.A., Lifshits, M.A., and Smorodina, N.V. (1998), Local Properties of Distributions of Stochastic Functionals, Providence, RI: American Mathematical Society.
Delgado, M.A., Domínguez, M.A., and Lavergne, P. (2006), 'Consistent Tests of Conditional Moment Restrictions', Annales d'Economie et de Statistique, 81, 33–67.
Delgado, M.A., and Escanciano, J.C. (2007), 'Nonparametric Tests for Conditional Symmetry in Dynamic Models', Journal of Econometrics, 141, 652–682.
Fan, Y., and Gencay, R. (1995), 'A Consistent Nonparametric Test of Symmetry in Linear Regression Models', Journal of the American Statistical Association, 90, 551–557.
Hájek, J. (1972), 'Local Asymptotic Minimax and Admissibility in Estimation', in Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability (Vol. 1), Berkeley, CA: University of California Press, pp. 175–194.
Hall, A.R., and Inoue, A. (2003), 'The Large Sample Behaviour of the Generalized Method of Moments Estimator in Misspecified Models', Journal of Econometrics, 114, 361–394.
Hansen, B.E. (1996), 'Inference When a Nuisance Parameter Is Not Identified under the Null Hypothesis', Econometrica, 64, 413–430.
Harvey, C.R., and Siddique, A. (2000), 'Conditional Skewness in Asset Pricing Tests', Journal of Finance, 55, 1263–1295.
Hyndman, R.J., and Yao, Q. (2002), 'Nonparametric Estimation and Symmetry Tests for Conditional Density Functions', Journal of Nonparametric Statistics, 14, 259–278.
Khmaladze, E.V. (1993), 'Goodness of Fit Problem and Scanning Innovation Martingales', Annals of Statistics, 21(2), 798–829.
Khmaladze, E.V., and Koul, H. (2004), 'Martingale Transforms Goodness-of-Fit Tests in Regression Models', Annals of Statistics, 32(3), 995–1034.
Maddala, G.S., and Jeong, J. (1992), 'On the Exact Small Sample Distribution of the Instrumental Variable Estimator', Econometrica, 60, 181–183.
Manski, C. (1984), 'Adaptive Estimation of Nonlinear Regression Models', Econometric Reviews, 3, 145–194.
Nelson, C.R., and Startz, R. (1990), 'Some Further Results on the Exact Small Sample Properties of the Instrumental Variables Estimator', Econometrica, 58, 967–976.
Neumeyer, N., and Dette, H. (2007), 'Testing for Symmetric Error Distribution in Nonparametric Regression Models', Statistica Sinica, 17, 775–795.
Neumeyer, N., Dette, H., and Nagel, E.-R. (2005), 'A Note on Testing Symmetry of the Error Distribution in Linear Regression Models', Journal of Nonparametric Statistics, 17, 697–715.
Newey, W.K. (1988a), 'Adaptive Estimation of Regression Models via Moment Restrictions', Journal of Econometrics, 38, 301–339.
Newey, W.K. (1988b), 'Efficient Estimation of Tobit Models Under Conditional Symmetry', in Nonparametric and Semiparametric Methods in Econometrics and Statistics: Proceedings of the Fifth International Symposium in Economic Theory and Econometrics, eds. W.A. Barnett, J. Powell, and G. Tauchen, New York: Cambridge University Press, pp. 291–336.
Newey, W.K., and McFadden, D. (1994), 'Large Sample Estimation and Hypothesis Testing', in Handbook of Econometrics (Vol. IV), eds. R. Engle and D. McFadden, Amsterdam: Elsevier Science B.V., pp. 2111–2245.
Newey, W.K., and Powell, J.L. (1987), 'Asymmetric Least Squares Estimation and Testing', Econometrica, 55, 819–847.
Phillips, P.C.B. (1989), 'Partially Identified Econometric Models', Econometric Theory, 5, 181–240.
Powell, J.L. (1986), 'Symmetrically Trimmed Least Squares Estimation for Tobit Models', Econometrica, 54, 1435–1460.
Powell, J.L. (1994), 'Estimation of Semiparametric Models', in Handbook of Econometrics (Vol. IV), eds. R. Engle and D. McFadden, Amsterdam: Elsevier Science B.V., pp. 2443–2521.
Randles, R.H., and Wolfe, D.A. (1979), Introduction to the Theory of Nonparametric Statistics, New York: John Wiley & Sons.
Sakata, S. (2007), 'Instrumental Variable Estimation Based on Conditional Median Restriction', Journal of Econometrics, 141, 350–382.
Stengos, T., and Wu, X. (2004), 'Information-Theoretic Distribution Tests with Application to Symmetry and Normality', http://ssrn.com/abstract=512902.
Su, J.Q., and Wei, L. (1991), 'A Lack-of-Fit Test for the Mean Function in a Generalized Linear Model', Journal of the American Statistical Association, 86, 420–426.
van der Vaart, A.W. (1998), Asymptotic Statistics, New York: Cambridge University Press.
van der Vaart, A.W., and Wellner, J.A. (1996), Weak Convergence and Empirical Processes: With Applications to Statistics, New York: Springer-Verlag.
van der Vaart, A.W., and Wellner, J.A. (2007), 'Empirical Processes Indexed by Estimated Functions', in Asymptotics: Particles, Processes and Inverse Problems, eds. E.A. Cator, G. Jongbloed, C. Kraaikamp, H.P. Lopuhaä, and J.A. Wellner, Vol. 55 of IMS Lecture Notes–Monograph Series, Beachwood, OH: Institute of Mathematical Statistics, pp. 234–252.
Zheng, J.X. (1998), 'Consistent Specification Testing for Conditional Symmetry', Econometric Theory, 14, 139–149.

Appendix 1. Proofs for Section 2

Proof of Equation (3)  Let $p_{\varepsilon,W}$ denote the density of $(\varepsilon, W)$ with respect to an appropriate dominating measure. Then,

$$(\varepsilon, W) \stackrel{d}{=} (-\varepsilon, W) \iff p_{\varepsilon,W}(u,w) = p_{-\varepsilon,W}(u,w) \;\forall (u,w)\in\mathbb{R}\times\mathrm{supp}(W)$$
$$\iff p_{\varepsilon|W=w}(u)\,p_W(w) = p_{-\varepsilon|W=w}(u)\,p_W(w) \;\forall (u,w)\in\mathbb{R}\times\mathrm{supp}(W)$$
$$\iff p_{\varepsilon|W=w}(u) = p_{-\varepsilon|W=w}(u) \;\forall (u,w)\in\mathbb{R}\times\mathrm{supp}(W) \iff \varepsilon|W \stackrel{d}{=} -\varepsilon|W. \qquad\square$$

Appendix 2. Proofs for Section 3

Proof of Lemma 3.1  Let $m_{A_0} := (\partial_1\log p_{A_0})\mu_0$. We only show Equation (7), that is,

$$\sup_{f\in\mathcal{F}}|(\hat P_Z - P_Z)f - \langle f, m_{A_0}'\rangle_{P_{A_0}}(\hat\theta-\theta_0)| = o_{\Pr^\circ}(n^{-1/2}) + o_{\Pr}(\|\hat\theta-\theta_0\|). \quad (A.1)$$

Since $(\hat P^r_Z - P^r_Z)f = (\hat P_Z - P_Z)f^r$, Equation (8) follows by replacing $f$ with $f^r$ in Equation (A.1). So, let $f\in\mathcal{F}$ and recall that $\Delta(X,\theta,\theta_0) := \mu(X,\theta) - \mu(X,\theta_0)$. Since

$$f(Y - \mu(X,\hat\theta), W) = f(\varepsilon - \Delta(X,\hat\theta,\theta_0), W) =: f^{\hat\theta}(\varepsilon, X, W),$$

we have

$$(\hat P_Z - P_Z)f = n^{-1}\sum_{j=1}^n [f(\varepsilon_j - \Delta(X_j,\hat\theta,\theta_0), W_j) - f(\varepsilon_j - \Delta(X_j,\theta_0,\theta_0), W_j)] = n^{-1}\sum_{j=1}^n [f^{\hat\theta}(\varepsilon_j, X_j, W_j) - f^{\theta_0}(\varepsilon_j, X_j, W_j)] = \hat P_{A_0}(f^{\hat\theta} - f^{\theta_0}).$$

Hence, we can write $(\hat P_Z - P_Z)f = (\hat P_{A_0} - P_{A_0})(f^{\hat\theta} - f^{\theta_0}) + P_{A_0}(f^{\hat\theta} - f^{\theta_0})$. Consequently, to prove Equation (A.1) it suffices to show that

$$\sup_{f\in\mathcal{F}}|n^{1/2}(\hat P_{A_0} - P_{A_0})(f^{\hat\theta} - f^{\theta_0})| = o_{\Pr^\circ}(1), \quad (A.2)$$
$$\sup_{f\in\mathcal{F}}|P_{A_0}(f^{\hat\theta} - f^{\theta_0}) - \langle f, m_{A_0}'\rangle_{P_{A_0}}(\hat\theta-\theta_0)| = o_{\Pr}(\|\hat\theta-\theta_0\|). \quad (A.3)$$

We will use equicontinuity of the empirical process $n^{1/2}(\hat P_{A_0} - P_{A_0})$ to demonstrate Equation (A.2) and mean-square differentiability of $p_{A_0}^{1/2}$ to show that Equation (A.3) holds. (Compare the proof of Proposition 2.2 in Khmaladze and Koul (2004) for a similar approach.)

We begin with Equation (A.2). Given $f\in\mathcal{F}$, we know that $f^{\hat\theta}, f^{\theta_0}\in\mathcal{F}$ by Assumption 3.3(iv) and $\|f^{\hat\theta} - f^{\theta_0}\|_{2,P_{A_0}} \le q(\|\hat\theta-\theta_0\|)$ by Assumption 3.3(v), where $q$ is continuous and passes through the origin, that is, $q(0) = 0$. Hence, as $|(\hat P_{A_0} - P_{A_0})(f^{\hat\theta} - f^{\theta_0})|$ is bounded above by $\sup_{g,h\in\mathcal{F}:\|g-h\|_{2,P_{A_0}}\le q(\|\hat\theta-\theta_0\|)}|(\hat P_{A_0} - P_{A_0})(g-h)|$ and the right-hand side does not depend upon $f$,

$$\sup_{f\in\mathcal{F}}|(\hat P_{A_0} - P_{A_0})(f^{\hat\theta} - f^{\theta_0})| \le \sup_{g,h\in\mathcal{F}:\|g-h\|_{2,P_{A_0}}\le q(\|\hat\theta-\theta_0\|)}|(\hat P_{A_0} - P_{A_0})(g-h)|. \quad (A.4)$$

Since $\mathcal{F}$ is $P_{A_0}$-Donsker by Assumption 3.3(iii), the empirical process $\{n^{1/2}(\hat P_{A_0} - P_{A_0})f : f\in\mathcal{F}\}$ is asymptotically equicontinuous (V&W, Section 2.1.2), that is, for all $\varepsilon>0$,

$$\lim_{\theta\to\theta_0}\limsup_{n\to\infty}\Pr^\circ\Big(\sup_{g,h\in\mathcal{F}:\|g-h\|_{2,P_{A_0}}\le q(\|\theta-\theta_0\|)}|n^{1/2}(\hat P_{A_0} - P_{A_0})(g-h)| > \varepsilon\Big) = 0. \quad (A.5)$$

Therefore, Equation (A.2) follows by Equations (A.4) and (A.5) because $\hat\theta$ is a consistent estimator of $\theta_0$ (Assumption 3.4). Next, we show Equation (A.3). Begin by observing that, for $\theta\in\Theta$,

$$P_{A_0}f^\theta = \int f(u-\Delta(x,\theta,\theta_0), w)\,P_{A_0}(du,dx,dw) = \int f(u-\Delta(x,\theta,\theta_0), w)\,p_{A_0}(u,x,w)\,du\,\kappa(dx,dw) = \int f(t,w)\,p_{A_0}(t+\Delta(x,\theta,\theta_0), x, w)\,dt\,\kappa(dx,dw)$$

by the translation invariance of Lebesgue measure. Hence,

$$P_{A_0}(f^\theta - f^{\theta_0}) = \int f(t,w)\Big[\frac{p_{A_0}(t+\Delta(x,\theta,\theta_0), x, w)}{p_{A_0}(t,x,w)} - 1\Big]\,P_{A_0}(dt,dx,dw). \quad (A.6)$$

Since

$$\frac{p^{1/2}_{A_0}(u+\Delta(x,\theta,\theta_0), x, w) - p^{1/2}_{A_0}(u,x,w)}{p^{1/2}_{A_0}(u,x,w)} \stackrel{(6)}{=} \frac{1}{2}\frac{\partial_1 p_{A_0}(u,x,w)}{p_{A_0}(u,x,w)}\Delta(x,\theta,\theta_0) + r(\Delta(x,\theta,\theta_0), u, x, w),$$

it follows that

$$\frac{p_{A_0}(t+\Delta(x,\theta,\theta_0), x, w)}{p_{A_0}(t,x,w)} - 1 = (\partial_1\log p_{A_0}(t,x,w))\Delta(x,\theta,\theta_0) + 0.25(\partial_1\log p_{A_0}(t,x,w))^2\Delta^2(x,\theta,\theta_0)$$
$$+\, r^2(\Delta(x,\theta,\theta_0), t, x, w) + 2r(\Delta(x,\theta,\theta_0), t, x, w) + r(\Delta(x,\theta,\theta_0), t, x, w)(\partial_1\log p_{A_0}(t,x,w))\Delta(x,\theta,\theta_0). \quad (A.7)$$

We use the decomposition in Equation (A.7) to handle Equation (A.6). By Assumption 3.1,

$$\int f(t,w)(\partial_1\log p_{A_0}(t,x,w))\Delta(x,\hat\theta,\theta_0)\,P_{A_0}(dt,dx,dw) = \int f(t,w)(\partial_1\log p_{A_0}(t,x,w))\mu_0'(x)(\hat\theta-\theta_0)\,P_{A_0}(dt,dx,dw) + \int f(t,w)(\partial_1\log p_{A_0}(t,x,w))\rho(x,\hat\theta,\theta_0)\,P_{A_0}(dt,dx,dw).$$

By Assumptions 3.2 and 3.3(ii), $\|v_{A_0}\|_\infty \vee M_{\mathcal{F}} < \infty$. Hence, by Jensen's inequality and Assumption 3.1,

$$\Big|\int f(t,w)(\partial_1\log p_{A_0}(t,x,w))\rho(x,\hat\theta,\theta_0)\,P_{A_0}(dt,dx,dw)\Big|^2 \le M^2_{\mathcal{F}}\int(\partial_1\log p_{A_0}(t,x,w))^2\Big(\sup_{\theta\in B(\theta_0,\|\hat\theta-\theta_0\|)}|\rho(x,\theta,\theta_0)|\Big)^2 P_{A_0}(dt,dx,dw)$$
$$= M^2_{\mathcal{F}}\int v_{A_0}(x,w)\sup_{\theta\in B(\theta_0,\|\hat\theta-\theta_0\|)}\rho^2(x,\theta,\theta_0)\,P_{X,W}(dx,dw) \le M^2_{\mathcal{F}}\|v_{A_0}\|_\infty\int\sup_{\theta\in B(\theta_0,\|\hat\theta-\theta_0\|)}\rho^2(x,\theta,\theta_0)\,P_X(dx) = o(\|\hat\theta-\theta_0\|^2). \quad (A.8)$$

Therefore,

$$\sup_{f\in\mathcal{F}}\Big|\int f(t,w)(\partial_1\log p_{A_0}(t,x,w))\Delta(x,\hat\theta,\theta_0)\,P_{A_0}(dt,dx,dw) - \langle f, m_{A_0}'\rangle_{P_{A_0}}(\hat\theta-\theta_0)\Big| = o(\|\hat\theta-\theta_0\|).$$

Next, by Assumption 3.1,

$$\Big|\int f(t,w)(\partial_1\log p_{A_0}(t,x,w))^2\Delta^2(x,\hat\theta,\theta_0)\,P_{A_0}(dt,dx,dw)\Big| \le 2M_{\mathcal{F}}\int(\partial_1\log p_{A_0}(t,x,w))^2\Big(\|\mu_0(x)\|^2\|\hat\theta-\theta_0\|^2 + \Big(\sup_{\theta\in B(\theta_0,\|\hat\theta-\theta_0\|)}|\rho(x,\theta,\theta_0)|\Big)^2\Big)P_{A_0}(dt,dx,dw)$$
$$= 2M_{\mathcal{F}}\int v_{A_0}(x,w)\Big(\|\mu_0(x)\|^2\|\hat\theta-\theta_0\|^2 + \sup_{\theta\in B(\theta_0,\|\hat\theta-\theta_0\|)}\rho^2(x,\theta,\theta_0)\Big)P_{X,W}(dx,dw)$$
$$\le 2M_{\mathcal{F}}\|v_{A_0}\|_\infty\int\Big(\|\mu_0(x)\|^2\|\hat\theta-\theta_0\|^2 + \sup_{\theta\in B(\theta_0,\|\hat\theta-\theta_0\|)}\rho^2(x,\theta,\theta_0)\Big)P_X(dx) \le 2M_{\mathcal{F}}\|v_{A_0}\|_\infty\Big(\int\|\mu_0(x)\|^2 P_X(dx) + o(1)\Big)\|\hat\theta-\theta_0\|^2.$$

Therefore,

$$\sup_{f\in\mathcal{F}}\Big|\int f(t,w)(\partial_1\log p_{A_0}(t,x,w))^2\Delta^2(x,\hat\theta,\theta_0)\,P_{A_0}(dt,dx,dw)\Big| = O(\|\hat\theta-\theta_0\|^2) = o_{\Pr}(\|\hat\theta-\theta_0\|) \quad (A.9)$$

because $\mathrm{plim}(\hat\theta) = \theta_0$. Next, let $\varepsilon>0$. Then, by Assumption 3.1 and Equation (6),

$$\Big|\int f(t,w)r^2(\Delta(x,\hat\theta,\theta_0), t, x, w)\,P_{A_0}(dt,dx,dw)\Big| \le M_{\mathcal{F}}\int r^2(\Delta(x,\hat\theta,\theta_0), t, x, w)\,P_{A_0}(dt,dx,dw) \le M_{\mathcal{F}}\,\varepsilon\int\Delta^2(x,\hat\theta,\theta_0)\,P_X(dx)$$
$$\le 2M_{\mathcal{F}}\,\varepsilon\int\Big(\|\mu_0(x)\|^2\|\hat\theta-\theta_0\|^2 + \sup_{\theta\in B(\theta_0,\|\hat\theta-\theta_0\|)}\rho^2(x,\theta,\theta_0)\Big)P_X(dx) \le 2M_{\mathcal{F}}\,\varepsilon\Big(\int\|\mu_0(x)\|^2 P_X(dx) + o(1)\Big)\|\hat\theta-\theta_0\|^2.$$

Therefore, since $\varepsilon$ was arbitrary,

$$\sup_{f\in\mathcal{F}}\Big|\int f(t,w)r^2(\Delta(x,\hat\theta,\theta_0), t, x, w)\,P_{A_0}(dt,dx,dw)\Big| = o(\|\hat\theta-\theta_0\|^2). \quad (A.10)$$

A similar argument using Jensen's inequality reveals that

$$\sup_{f\in\mathcal{F}}\Big|\int f(t,w)r(\Delta(x,\hat\theta,\theta_0), t, x, w)\,P_{A_0}(dt,dx,dw)\Big| = o(\|\hat\theta-\theta_0\|).$$

Finally, since

$$\Big|\int f(t,w)r(\Delta(x,\hat\theta,\theta_0), t, x, w)\,\partial_1\log p_{A_0}(t,x,w)\,\Delta(x,\hat\theta,\theta_0)\,P_{A_0}(dt,dx,dw)\Big|$$
$$\le \Big(\int|f(t,w)|\,r^2(\Delta(x,\hat\theta,\theta_0), t, x, w)\,P_{A_0}(dt,dx,dw)\Big)^{1/2}\Big(\int|f(t,w)|(\partial_1\log p_{A_0}(t,x,w))^2\Delta^2(x,\hat\theta,\theta_0)\,P_{A_0}(dt,dx,dw)\Big)^{1/2}$$

by Cauchy–Schwarz, the arguments leading to Equations (A.9) and (A.10) show that

$$\sup_{f\in\mathcal{F}}\Big|\int f(t,w)r(\Delta(x,\hat\theta,\theta_0), t, x, w)(\partial_1\log p_{A_0}(t,x,w))\Delta(x,\hat\theta,\theta_0)\,P_{A_0}(dt,dx,dw)\Big| = o(\|\hat\theta-\theta_0\|^2). \quad (A.11)$$

Therefore, Equation (A.3) follows by Equations (A.6)–(A.11). $\square$

Proof of Lemma 3.2  Recall that

$$X_{\theta_0}(f) := n^{1/2}(\hat P_Z - P_Z)(f - f^r) + \langle f - f^r, m_{A_0}'\rangle_{P_{A_0}}\, n^{1/2}(\hat\theta-\theta_0), \quad f\in\mathcal{F}.$$

Assumption 3.3(ii) implies that the sample paths of $X_{\theta_0}$ are bounded functions on $\mathcal{F}$. Hence, by V&W (Theorem 1.5.4), $\{X_{\theta_0}(f) : f\in\mathcal{F}\}$ converges in distribution in $\ell^\infty(\mathcal{F})$ to a tight limit process $\{X_0(f) : f\in\mathcal{F}\}\subset\ell^\infty(\mathcal{F})$ if $\{X_{\theta_0}(f) : f\in\mathcal{F}\}$ is asymptotically tight and its marginals $X_{\theta_0}(f_1),\ldots,X_{\theta_0}(f_k)$ converge in distribution in $\mathbb{R}^k$ to the marginals $X_0(f_1),\ldots,X_0(f_k)$ for every finite subset $f_1,\ldots,f_k$ of $\mathcal{F}$. From Equation (10), we know that the limiting process, if it exists, is given by $X_0(f) := G_0 f + \langle f - f^r, m_{A_0}'\rangle_{P_{A_0}} N_{\varphi_0}$. To verify its existence, it only remains to show that $\{X_{\theta_0}(f) : f\in\mathcal{F}\}$ is asymptotically tight. We proceed as follows. First, for each $f\in\mathcal{F}$, $X_{\theta_0}(f)$ is asymptotically tight in $\mathbb{R}$ by Equation (10). Next, since $\mathcal{F}$ is $P_{A_0}$-Donsker (Assumption 3.3(iii)), $(\mathcal{F}, \|\cdot\|_{2,P_{A_0}})$ is totally bounded by Assumption 3.3(ii) and V&W (Problem 2.1.1). Hence, by V&W (Theorem 1.5.7), $\{X_{\theta_0}(f) : f\in\mathcal{F}\}$ is asymptotically tight if it is asymptotically uniformly $\|\cdot\|_{2,P_{A_0}}$-equicontinuous in probability, that is, for all $\varepsilon>0$,

$$\lim_{\delta\to 0}\limsup_{n\to\infty}\Pr^\circ\Big(\sup_{f,g\in\mathcal{F}_{\delta,P_{A_0}}}|X_{\theta_0}(f-g)| > \varepsilon\Big) = 0, \quad (A.12)$$

where $\mathcal{F}_{\delta,P_{A_0}} := \{f,g\in\mathcal{F} : \|f-g\|_{2,P_{A_0}}\le\delta\}$. Thus, we are done if we can show Equation (A.12). Since $(f-g)^r = f^r - g^r$, by the definition of $X_{\theta_0}$ and the triangle inequality, $\Pr^\circ(\sup_{f,g\in\mathcal{F}_{\delta,P_{A_0}}}|X_{\theta_0}(f-g)| > 2\varepsilon) \le a_{\delta,n} + b_{\delta,n}$, where $a_{\delta,n} := \Pr^\circ(\sup_{f,g\in\mathcal{F}_{\delta,P_{A_0}}}|n^{1/2}(\hat P_Z - P_Z)((f-f^r)-(g-g^r))| > \varepsilon)$ and

$$b_{\delta,n} := \Pr^\circ\Big(\sup_{f,g\in\mathcal{F}_{\delta,P_{A_0}}}|\langle (f-f^r)-(g-g^r), m_{A_0}'\rangle_{P_{A_0}}\, n^{1/2}(\hat\theta-\theta_0)| > \varepsilon\Big).$$

Now, $\mathcal{F}^r$ is $P_{A_0}$-Donsker because this is equivalent to $\mathcal{F}$ being $P_{A_0}$-Donsker (Assumption 3.3(iii)). Consequently, $\mathcal{F} - \mathcal{F}^r := \{f - g^r : f,g\in\mathcal{F}\}$ is $P_{A_0}$-Donsker by V&W (Theorem 2.10.2) because the difference map is Lipschitz. Hence, since the empirical process $n^{1/2}(\hat P_Z - P_Z)(\mathcal{F} - \mathcal{F}^r)$ is asymptotically tight under $\|\cdot\|_{2,P_{A_0}}$ (V&W, 2.1.8),

$$\lim_{\delta\to 0}\limsup_{n\to\infty} a_{\delta,n} \le \lim_{\delta\to 0}\limsup_{n\to\infty}\Pr^\circ\Big(\sup_{f,g\in(\mathcal{F}-\mathcal{F}^r)_{\delta,P_{A_0}}}|n^{1/2}(\hat P_Z - P_Z)(f-g)| > \varepsilon\Big) = 0.$$

Next, by repeated applications of Cauchy–Schwarz,

$$|\langle (f-f^r)-(g-g^r), m_{A_0}'\rangle_{P_{A_0}}(\hat\theta-\theta_0)| \le \|\langle (f-g)-(f^r-g^r), m_{A_0}\rangle_{P_{A_0}}\|\,\|\hat\theta-\theta_0\| \le 2\|f-g\|_{2,P_{A_0}}\|v_{A_0}\|_\infty\|(\mu_0'\mu_0)^{1/2}\|_{2,P_X}\|\hat\theta-\theta_0\|$$

because $\|f^r - g^r\|_{2,P_{A_0}} = \|f-g\|_{2,P_{A_0}}$ for all $f,g\in\mathcal{F}$ under the null hypothesis. Indeed,

$$\|f^r-g^r\|^2_{2,P_{A_0}} = \int (f-g)^2(-u,w)\,P_{\varepsilon,X,W}(du,dx,dw) = \int (f-g)^2(-u,w)\,P_{\varepsilon,W}(du,dw) \quad(\text{marginal integration})$$
$$= \int (f-g)^2(u,w)\,P_{\varepsilon,W}(du,dw) \quad(\text{change of variables and Equation (2.1)}) = \int (f-g)^2(u,w)\,P_{\varepsilon,X,W}(du,dx,dw) \quad(\text{marginal integration}) = \|f-g\|^2_{2,P_{A_0}}.$$

Hence,

$$\lim_{\delta\to 0}\limsup_{n\to\infty} b_{\delta,n} \le \lim_{\delta\to 0}\limsup_{n\to\infty}\Pr^\circ(\varepsilon < 2\delta\|v_{A_0}\|_\infty\|(\mu_0'\mu_0)^{1/2}\|_{2,P_X}\|n^{1/2}(\hat\theta-\theta_0)\|) = 0$$

because $\hat\theta$ is an $n^{1/2}$-consistent estimator of $\theta_0$ (Assumption 3.5). Therefore, Equation (A.12) holds. $\square$


Proof of Lemma 3.3  We follow the approach taken by Andrews (1997) in proving his Theorem A.2 while accounting for the fact that (unlike Andrews) our statistic is optimised over the estimated set of observations $\hat Z$. We prove the stated result by showing that the cumulative distribution function (cdf) of the nonnegative random variable $n^{1/2}\hat R_{\max}$ converges pointwise to the cdf of $R_0$. In particular, we show that for all $c\in[0,\infty)$,

$$\limsup_{n\to\infty}\Pr(n^{1/2}\hat R_{\max} > c) \le \Pr(R_0 > c) \le \liminf_{n\to\infty}\Pr(n^{1/2}\hat R_{\max} > c). \quad (A.13)$$

The first inequality is easy: since $\hat R_{\max} \le \hat R$, it follows by Corollary 3.1 that

$$\limsup_{n\to\infty}\Pr(n^{1/2}\hat R_{\max} > c) \le \limsup_{n\to\infty}\Pr(n^{1/2}\hat R > c) = \Pr(R_0 > c). \quad (A.14)$$

The second inequality requires a bit more effort. Let $\mathbb{Y} := n^{1/2}(\hat P_Z - \hat P^r_Z)$ and for $g\in\mathcal{F}_1$ define $B_{P_{A_0}}(g,\varepsilon) := \{h\in\mathcal{F}_1 : \|g-h\|_{2,P_{A_0}} < \varepsilon\}$. By Lemma A.4, given $f\in\mathcal{F}_1$ and $r>0$, there exists $\hat f\in\hat{\mathcal{F}}_1\cap B_{P_{A_0}}(f,r)$ w.p.a.1. Consequently, w.p.a.1,

$$\inf_{g\in B_{P_{A_0}}(f,r)}|\mathbb{Y}(g)| \le |\mathbb{Y}(\hat f)| \le \max_{h\in\hat{\mathcal{F}}_1}|\mathbb{Y}(h)| = n^{1/2}\hat R_{\max}.$$

In fact, since the upper bound does not depend upon $f$, it follows that given $T\in\mathbb{N}$ and $f_1,\ldots,f_T\in\mathcal{F}_1$, the event $E_n := \{\max_{t\le T}\inf_{g\in B_{P_{A_0}}(f_t,r)}|\mathbb{Y}(g)| \le n^{1/2}\hat R_{\max}\}$ holds w.p.a.1. Now,

$$\Pr\Big(\max_{t\le T}\inf_{g\in B_{P_{A_0}}(f_t,r)}|\mathbb{Y}(g)| > c,\; E_n\Big) \le \Pr(n^{1/2}\hat R_{\max} > c,\; E_n),$$

or equivalently,

$$\Pr\Big(\max_{t\le T}\inf_{g\in B_{P_{A_0}}(f_t,r)}|\mathbb{Y}(g)| > c\Big) - \Pr\Big(\max_{t\le T}\inf_{g\in B_{P_{A_0}}(f_t,r)}|\mathbb{Y}(g)| > c,\; E_n^\complement\Big) \le \Pr(n^{1/2}\hat R_{\max} > c) - \Pr(n^{1/2}\hat R_{\max} > c,\; E_n^\complement).$$

Hence, since $\lim_{n\to\infty}\Pr(E_n^\complement) = 0$,

$$\liminf_{n\to\infty}\Pr(n^{1/2}\hat R_{\max} > c) \ge \liminf_{n\to\infty}\Pr\Big(\max_{t\le T}\inf_{g\in B_{P_{A_0}}(f_t,r)}|\mathbb{Y}(g)| > c\Big) = \Pr\Big(\max_{t\le T}\inf_{g\in B_{P_{A_0}}(f_t,r)}|X_0(g)| > c\Big) \quad (A.15)$$
$$\ge \sup_{T\in\mathbb{N}}\sup_{r>0}\Pr\Big(\max_{t\le T}\inf_{g\in B_{P_{A_0}}(f_t,r)}|X_0(g)| > c\Big), \quad (A.16)$$

where Equation (A.15) follows by Lemma 3.2 applied to $\mathcal{F}_1$ plus the continuous mapping theorem, and Equation (A.16) is the tightest lower bound. From the proof of Lemma 3.2, we know that $(\mathcal{F}_1, \|\cdot\|_{2,P_{A_0}})$ is totally bounded; that is, given $r>0$, there exists $T_r<\infty$ such that $\mathcal{F}_1\subset\cup_{t=1}^{T_r}B_{P_{A_0}}(f_t,r)$ with $f_1,\ldots,f_{T_r}\in\mathcal{F}_1$. Moreover, from V&W (Addendum 1.5.8), it follows that almost all sample paths of $\{X_0(f) : f\in\mathcal{F}_1\}$ are uniformly $\|\cdot\|_{2,P_{A_0}}$-continuous, so that for all $\eta>0$ there exists $r_\eta>0$ such that the probability of the event $U_\eta := \{\sup_{g,h\in\mathcal{F}_1:\|g-h\|_{2,P_{A_0}}<2r_\eta}|X_0(g)-X_0(h)| < \eta\}$ approaches one as $\eta\to 0$. Therefore, since, being a convex functional of a Gaussian process, the random variable $R_0 := \sup_{f\in\mathcal{F}_1}|X_0(f)|$ is continuously distributed on $(0,\infty)$ (cf. Davydov, Lifshits, and Smorodina 1998, Theorem 11.1),

$$\Pr(R_0 > c) = \lim_{\eta\to 0}\Pr\Big(\sup_{f\in\mathcal{F}_1}|X_0(f)| > c+\eta\Big)$$
$$\le \lim_{\eta\to 0}\Pr\Big(\max_{t\le T_{r_\eta}}\sup_{f\in B_{P_{A_0}}(f_t,r_\eta)}|X_0(f)| > c+\eta\Big) = \lim_{\eta\to 0}\Pr\Big(\max_{t\le T_{r_\eta}}\sup_{f\in B_{P_{A_0}}(f_t,r_\eta)}|X_0(f)| > c+\eta,\; U_\eta\Big)$$
$$\le \lim_{\eta\to 0}\Pr\Big(\sup_{f\in B_{P_{A_0}}(f_t,r_\eta)}|X_0(f)| > c+\eta \text{ for some } t\le T_{r_\eta},\; U_\eta\Big) \le \lim_{\eta\to 0}\Pr\Big(\inf_{f\in B_{P_{A_0}}(f_t,r_\eta)}|X_0(f)| > c \text{ for some } t\le T_{r_\eta}\Big)$$
$$\le \lim_{\eta\to 0}\Pr\Big(\max_{t\le T_{r_\eta}}\inf_{f\in B_{P_{A_0}}(f_t,r_\eta)}|X_0(f)| > c\Big) \le \sup_{T\in\mathbb{N}}\sup_{r>0}\Pr\Big(\max_{t\le T}\inf_{f\in B_{P_{A_0}}(f_t,r)}|X_0(f)| > c\Big). \quad (A.17)$$

Hence, Equation (A.13) follows by Equations (A.14), (A.16), and (A.17). $\square$

Appendix 3. Proofs for Section 4

Proof of Lemma 4.1  For $f\in\mathcal{F}$,

$$(\hat P^*_Z - \hat P_{R(Z(\theta_1))})f = n^{-1}\sum_{j=1}^n [f(\varepsilon^*_j, W_j) - f(R_j\varepsilon_j(\theta_1), W_j)]$$
$$= n^{-1}\sum_{j=1}^n [f(R_j\varepsilon_j(\theta_1) - \Delta(R_j, X_j, \theta^*, \hat\theta, \theta_1), W_j) - f(R_j\varepsilon_j(\theta_1) - \Delta(R_j, X_j, \theta_1, \theta_1, \theta_1), W_j)]$$
$$= \hat P_{R(A_1)}(f^{\theta^*,\hat\theta} - f^{\theta_1,\theta_1}) = (\hat P_{R(A_1)} - P_{R(A_1)})(f^{\theta^*,\hat\theta} - f^{\theta_1,\theta_1}) + P_{R(A_1)}(f^{\theta^*,\hat\theta} - f^{\theta_1,\theta_1}).$$

Therefore,

$$(\hat P^*_Z - \hat P_{R(Z(\theta_1))})f - P_{R(A_1)}(f^{\hat\theta,\hat\theta} - f^{\theta_1,\theta_1}) = (\hat P_{R(A_1)} - P_{R(A_1)})(f^{\theta^*,\hat\theta} - f^{\theta_1,\theta_1}) + P_{R(A_1)}(f^{\theta^*,\hat\theta} - f^{\hat\theta,\hat\theta}).$$

To prove the first result (we only show the first result, as the proof of the second one is similar), it thus suffices to show that

$$\sup_{f\in\mathcal{F}}|n^{1/2}(\hat P_{R(A_1)} - P_{R(A_1)})(f^{\theta^*,\hat\theta} - f^{\theta_1,\theta_1})| = o_{\Pr^\circ}(1), \quad (A.18)$$
$$\sup_{f\in\mathcal{F}}|P_{R(A_1)}(f^{\theta^*,\hat\theta} - f^{\hat\theta,\hat\theta}) - 0.5\langle f - f^r, (\partial_1\log p_{A_1})\mu_1'\rangle_{P_{A_1}}(\theta^* - \hat\theta)| = o_{\Pr}(\|\theta^* - \hat\theta\|). \quad (A.19)$$

As in the proof of Lemma 3.1, we use equicontinuity of the $n^{1/2}(\hat P_{R(A_1)} - P_{R(A_1)})$ process to demonstrate Equation (A.18) and mean-square differentiability of $p_{A_1}^{1/2}$ to show Equation (A.19). We begin with the first claim. Given $f\in\mathcal{F}$, we know that $f^{\theta^*,\hat\theta}, f^{\theta_1,\theta_1}\in\mathcal{F}$ and

$$\|f^{\theta^*,\hat\theta} - f^{\theta_1,\theta_1}\|_{2,P_{R(A_1)}} \le q(\|(\theta^*,\hat\theta) - (\theta_1,\theta_1)\|) =: \hat q$$

by Assumption 4.2(iii). Hence, as $|(\hat P_{R(A_1)} - P_{R(A_1)})(f^{\theta^*,\hat\theta} - f^{\theta_1,\theta_1})|$ is bounded from above by $\sup_{g,h\in\mathcal{F}:\|g-h\|_{2,P_{R(A_1)}}\le\hat q}|(\hat P_{R(A_1)} - P_{R(A_1)})(g-h)|$ and the bound does not depend upon $f$,

$$\sup_{f\in\mathcal{F}}|n^{1/2}(\hat P_{R(A_1)} - P_{R(A_1)})(f^{\theta^*,\hat\theta} - f^{\theta_1,\theta_1})| \le \sup_{g,h\in\mathcal{F}:\|g-h\|_{2,P_{R(A_1)}}\le\hat q}|n^{1/2}(\hat P_{R(A_1)} - P_{R(A_1)})(g-h)|.$$

Then, Equation (A.18) follows because the empirical process $\{n^{1/2}(\hat P_{R(A_1)} - P_{R(A_1)})f : f\in\mathcal{F}\}$ is asymptotically equicontinuous (due to the fact that $\mathcal{F}$ is $P_{R(A_1)}$-Donsker by Assumption 4.2(iii)) and $\mathrm{plim}(\hat q) = 0$ by Assumption 4.1 and the properties of $q$.

Next, we show Equation (A.19). Begin by observing that

$$P_{R(A_1)}f^{\theta_a,\theta_b} = \int f(su - \Delta(s,x,\theta_a,\theta_b,\theta_1), w)\,P_R(ds)\,P_{A_1}(du,dx,dw)$$
$$= 0.5\int f(u - \Delta(1,x,\theta_a,\theta_b,\theta_1), w)\,p_{A_1}(u,x,w)\,du\,\kappa(dx,dw) + 0.5\int f(-u - \Delta(-1,x,\theta_a,\theta_b,\theta_1), w)\,p_{A_1}(u,x,w)\,du\,\kappa(dx,dw)$$
$$= 0.5\int f(t,w)\,p_{A_1}(t + \Delta(1,x,\theta_a,\theta_b,\theta_1), x, w)\,dt\,\kappa(dx,dw) + 0.5\int f(t,w)\,p_{A_1}(-t - \Delta(-1,x,\theta_a,\theta_b,\theta_1), x, w)\,dt\,\kappa(dx,dw)$$
$$= \int f(t,w)\,p_{A_1}(st + s\Delta(s,x,\theta_a,\theta_b,\theta_1), x, w)\,P_R(ds)\,dt\,\kappa(dx,dw), \quad (A.20)$$

where the third equality follows by the translation invariance of the Lebesgue measure. Hence,

$$P_{R(A_1)}(f^{\theta^*,\hat\theta} - f^{\hat\theta,\hat\theta}) = \int f(t,w)\big(p_{A_1}(st + s\Delta(s,x,\theta^*,\hat\theta,\theta_1), x, w) - p_{A_1}(st + \Delta(x,\hat\theta,\theta_1), x, w)\big)\,P_R(ds)\,dt\,\kappa(dx,dw). \quad (A.21)$$

Under Assumption 4.2(ii), $p_{A_1}^{1/2}$ is mean-square differentiable. Hence, expanding as in Equation (A.7) while keeping $st + \Delta(x,\hat\theta,\theta_1)$ fixed,

$$p_{A_1}(st + s\Delta(s,x,\theta^*,\hat\theta,\theta_1), x, w) - p_{A_1}(st + \Delta(x,\hat\theta,\theta_1), x, w) = p_{A_1}(st + \Delta(x,\hat\theta,\theta_1) + s\Delta(x,\theta^*,\hat\theta), x, w) - p_{A_1}(st + \Delta(x,\hat\theta,\theta_1), x, w)$$
$$= (\partial_1 p_{A_1}(st + \Delta(x,\hat\theta,\theta_1), x, w))\,s\Delta(x,\theta^*,\hat\theta) + 0.25(\partial_1 p_{A_1}(st + \Delta(x,\hat\theta,\theta_1), x, w))^2\,p_{A_1}^{-1}(st + \Delta(x,\hat\theta,\theta_1), x, w)\,(s\Delta(x,\theta^*,\hat\theta))^2$$
$$+\, r^2(s\Delta(x,\theta^*,\hat\theta),\, st + \Delta(x,\hat\theta,\theta_1), x, w)\,p_{A_1}(st + \Delta(x,\hat\theta,\theta_1), x, w) + 2r(s\Delta(x,\theta^*,\hat\theta),\, st + \Delta(x,\hat\theta,\theta_1), x, w)\,p_{A_1}(st + \Delta(x,\hat\theta,\theta_1), x, w)$$
$$+\, r(s\Delta(x,\theta^*,\hat\theta),\, st + \Delta(x,\hat\theta,\theta_1), x, w)(\partial_1 p_{A_1}(st + \Delta(x,\hat\theta,\theta_1), x, w))\,s\Delta(x,\theta^*,\hat\theta).$$

We use this decomposition to handle Equation (A.21). First, by Assumption 4.2(i),

$$\int f(t,w)(\partial_1 p_{A_1}(st + \Delta(x,\hat\theta,\theta_1), x, w))\,s\Delta(x,\theta^*,\hat\theta)\,P_R(ds)\,dt\,\kappa(dx,dw)$$
$$= \int f(t,w)(\partial_1 p_{A_1}(st + \Delta(x,\hat\theta,\theta_1), x, w))\,s\mu'(x,\hat\theta)(\theta^* - \hat\theta)\,P_R(ds)\,dt\,\kappa(dx,dw) + \int f(t,w)(\partial_1 p_{A_1}(st + \Delta(x,\hat\theta,\theta_1), x, w))\,s\rho(x,\theta^*,\hat\theta)\,P_R(ds)\,dt\,\kappa(dx,dw).$$

By a change of variable,

$$\int f(t,w)(\partial_1 p_{A_1}(st + \Delta(x,\hat\theta,\theta_1), x, w))\,s\rho(x,\theta^*,\hat\theta)\,P_R(ds)\,dt\,\kappa(dx,dw)$$
$$= 0.5\int f(t,w)(\partial_1 p_{A_1}(t + \Delta(x,\hat\theta,\theta_1), x, w))\rho(x,\theta^*,\hat\theta)\,dt\,\kappa(dx,dw) - 0.5\int f(t,w)(\partial_1 p_{A_1}(-t + \Delta(x,\hat\theta,\theta_1), x, w))\rho(x,\theta^*,\hat\theta)\,dt\,\kappa(dx,dw)$$
$$= 0.5\int f(u - \Delta(x,\hat\theta,\theta_1), w)(\partial_1 p_{A_1}(u,x,w))\rho(x,\theta^*,\hat\theta)\,du\,\kappa(dx,dw) - 0.5\int f(-(u - \Delta(x,\hat\theta,\theta_1)), w)(\partial_1 p_{A_1}(u,x,w))\rho(x,\theta^*,\hat\theta)\,du\,\kappa(dx,dw)$$
$$= 0.5\int (f - f^r)(u - \Delta(x,\hat\theta,\theta_1), w)(\partial_1\log p_{A_1}(u,x,w))\rho(x,\theta^*,\hat\theta)\,P_{A_1}(du,dx,dw). \quad (A.22)$$

Hence, following the argument leading to Equation (A.8),

$$\Big|\int f(t,w)(\partial_1 p_{A_1}(st + \Delta(x,\hat\theta,\theta_1), x, w))\,s\rho(x,\theta^*,\hat\theta)\,P_R(ds)\,dt\,\kappa(dx,dw)\Big|^2 \le M^2_{\mathcal{F}}\|v_{A_1}\|_\infty\int\sup_{\theta\in B(\hat\theta,\|\theta^*-\hat\theta\|)}\rho^2(x,\theta,\hat\theta)\,P_X(dx) = o_{\Pr}(\|\theta^*-\hat\theta\|^2) \quad (A.23)$$

by Assumption 4.2(i). Arguing similarly,

$$\Big|\int f(t,w)(\partial_1 p_{A_1}(st + \Delta(x,\hat\theta,\theta_1), x, w))\,s(\mu(x,\hat\theta) - \mu(x,\theta_1))'(\theta^* - \hat\theta)\,P_R(ds)\,dt\,\kappa(dx,dw)\Big| \le M_{\mathcal{F}}\|v_{A_1}\|_\infty\|\mu(\cdot,\hat\theta) - \mu(\cdot,\theta_1)\|_{2,P_X}\|\theta^* - \hat\theta\| = o_{\Pr}(\|\theta^* - \hat\theta\|) \quad (A.24)$$

by Assumption 4.2(i). Therefore, by Equations (A.23) and (A.24),

$$\Big|\int f(t,w)(\partial_1 p_{A_1}(st + \Delta(x,\hat\theta,\theta_1), x, w))\,s\Delta(x,\theta^*,\hat\theta)\,P_R(ds)\,dt\,\kappa(dx,dw) - \int f(t,w)(\partial_1 p_{A_1}(st + \Delta(x,\hat\theta,\theta_1), x, w))\,s\mu'(x,\theta_1)(\theta^* - \hat\theta)\,P_R(ds)\,dt\,\kappa(dx,dw)\Big| = o_{\Pr}(\|\theta^* - \hat\theta\|). \quad (A.25)$$

Now, the argument leading to Equation (A.22) shows that

$$\int f(t,w)(\partial_1 p_{A_1}(st + \Delta(x,\hat\theta,\theta_1), x, w))\,s\mu'(x,\theta_1)(\theta^* - \hat\theta)\,P_R(ds)\,dt\,\kappa(dx,dw) = 0.5\int (f - f^r)(u - \Delta(x,\hat\theta,\theta_1), w)(\partial_1\log p_{A_1}(u,x,w))\mu'(x,\theta_1)(\theta^* - \hat\theta)\,P_{A_1}(du,dx,dw).$$

Thus, by Assumption 4.2(iii) and Cauchy–Schwarz,

$$\Big|\int f(t,w)(\partial_1 p_{A_1}(st + \Delta(x,\hat\theta,\theta_1), x, w) - \partial_1 p_{A_1}(st, x, w))\,s\mu'(x,\theta_1)(\theta^* - \hat\theta)\,P_R(ds)\,dt\,\kappa(dx,dw)\Big| \le q(\|\hat\theta - \theta_1\|)\|v_{A_1}\|_\infty\|(\mu_1'\mu_1)^{1/2}\|_{2,P_X}\|\theta^* - \hat\theta\| = o_{\Pr}(\|\theta^* - \hat\theta\|) \quad (A.26)$$

because $\mathrm{plim}(\hat\theta) = \theta_1$, $q$ is continuous, and $q(0) = 0$. Therefore, by Equations (A.25) and (A.26),

$$\Big|\int f(t,w)(\partial_1 p_{A_1}(st + \Delta(x,\hat\theta,\theta_1), x, w))\,s\Delta(x,\theta^*,\hat\theta)\,P_R(ds)\,dt\,\kappa(dx,dw) - \int f(t,w)(\partial_1 p_{A_1}(st, x, w))\,s\mu'(x,\theta_1)(\theta^* - \hat\theta)\,P_R(ds)\,dt\,\kappa(dx,dw)\Big| = o_{\Pr}(\|\theta^* - \hat\theta\|).$$

But as in Equation (A.22),

$$\int f(t,w)(\partial_1 p_{A_1}(st, x, w))\,s\mu(x,\theta_1)\,P_R(ds)\,dt\,\kappa(dx,dw) = 0.5\langle f - f^r, (\partial_1\log p_{A_1})\mu_1\rangle_{P_{A_1}}.$$

Therefore,

$$\sup_{f\in\mathcal{F}}\Big|\int f(t,w)(\partial_1 p_{A_1}(st + \Delta(x,\hat\theta,\theta_1), x, w))\,s\Delta(x,\theta^*,\hat\theta)\,P_R(ds)\,dt\,\kappa(dx,dw) - 0.5\langle f - f^r, (\partial_1\log p_{A_1})\mu_1'\rangle_{P_{A_1}}(\theta^* - \hat\theta)\Big| = o_{\Pr}(\|\theta^* - \hat\theta\|). \quad (A.27)$$

Next, since $R^2 = 1$, the argument leading to Equation (A.22) reveals that

$$\int f(t,w)(\partial_1 p_{A_1}(st + \Delta(x,\hat\theta,\theta_1), x, w))^2\,p_{A_1}^{-1}(st + \Delta(x,\hat\theta,\theta_1), x, w)\,(s\Delta(x,\theta^*,\hat\theta))^2\,P_R(ds)\,dt\,\kappa(dx,dw)$$
$$= 0.5\int (f + f^r)(u - \Delta(x,\hat\theta,\theta_1), w)(\partial_1\log p_{A_1}(u,x,w))^2\,\Delta^2(x,\theta^*,\hat\theta)\,P_{A_1}(du,dx,dw).$$

Hence, by Assumption 4.2(ii),

$$\sup_{f\in\mathcal{F}}\Big|\int f(t,w)(\partial_1 p_{A_1}(st + \Delta(x,\hat\theta,\theta_1), x, w))^2\,p_{A_1}^{-1}(st + \Delta(x,\hat\theta,\theta_1), x, w)\,(s\Delta(x,\theta^*,\hat\theta))^2\,P_R(ds)\,dt\,\kappa(dx,dw)\Big|$$
$$\le M_{\mathcal{F}}\|v_{A_1}\|_\infty\int\Big(\|\mu(x,\hat\theta)\|^2\|\theta^* - \hat\theta\|^2 + \sup_{\theta\in B(\hat\theta,\|\theta^*-\hat\theta\|)}\rho^2(x,\theta,\hat\theta)\Big)P_X(dx)$$
$$\le M_{\mathcal{F}}\|v_{A_1}\|_\infty\Big((\|\mu(\cdot,\hat\theta) - \mu(\cdot,\theta_1)\|^2_{2,P_X} + \|\mu(\cdot,\theta_1)\|^2_{2,P_X})\|\theta^* - \hat\theta\|^2 + \int\sup_{\theta\in B(\hat\theta,\|\theta^*-\hat\theta\|)}\rho^2(x,\theta,\hat\theta)\,P_X(dx)\Big) = O_{\Pr}(\|\theta^* - \hat\theta\|^2). \quad (A.28)$$

Next, since

$$\int f(t,w)\,r^2(s\Delta(x,\theta^*,\hat\theta),\, st + \Delta(x,\hat\theta,\theta_1), x, w)\,p_{A_1}(st + \Delta(x,\hat\theta,\theta_1), x, w)\,P_R(ds)\,dt\,\kappa(dx,dw)$$
$$= 0.5\int\big(f(u - \Delta(x,\hat\theta,\theta_1), w)\,r^2(\Delta(x,\theta^*,\hat\theta), u, x, w) + f^r(u - \Delta(x,\hat\theta,\theta_1), w)\,r^2(-\Delta(x,\theta^*,\hat\theta), u, x, w)\big)\,P_{A_1}(du,dx,dw),$$

following the argument leading to Equation (A.28), we obtain that

$$\sup_{f\in\mathcal{F}}\Big|\int f(t,w)\,r^2(s\Delta(x,\theta^*,\hat\theta),\, st + \Delta(x,\hat\theta,\theta_1), x, w)\,p_{A_1}(st + \Delta(x,\hat\theta,\theta_1), x, w)\,P_R(ds)\,dt\,\kappa(dx,dw)\Big| \le M_{\mathcal{F}}\, o(\|\Delta(\cdot,\theta^*,\hat\theta)\|^2_{2,P_X}) = o(\|\theta^* - \hat\theta\|^2).$$

In a similar manner, it can be shown that

$$\sup_{f\in\mathcal{F}}\Big|\int f(t,w)\,r(s\Delta(x,\theta^*,\hat\theta),\, st + \Delta(x,\hat\theta,\theta_1), x, w)\,p_{A_1}(st + \Delta(x,\hat\theta,\theta_1), x, w)\,P_R(ds)\,dt\,\kappa(dx,dw)\Big| = o(\|\theta^* - \hat\theta\|^2)$$

and

$$\sup_{f\in\mathcal{F}}\Big|\int f(t,w)\,r(s\Delta(x,\theta^*,\hat\theta),\, st + \Delta(x,\hat\theta,\theta_1), x, w)(\partial_1 p_{A_1}(st + \Delta(x,\hat\theta,\theta_1), x, w))\,s\Delta(x,\theta^*,\hat\theta)\,P_R(ds)\,dt\,\kappa(dx,dw)\Big| = o(\|\theta^* - \hat\theta\|^2). \quad (A.29)$$

The desired result follows from Equations (A.27)–(A.29). $\square$

Proof of Equation (12)  By Equation (A.20) and the fact that $R^2 = 1$,

$$P_{R(A_1)}f^{\hat\theta,\hat\theta} = \int f(t,w)\,p_{A_1}(st + \Delta(x,\hat\theta,\theta_1), x, w)\,P_R(ds)\,dt\,\kappa(dx,dw),$$
$$P_{R(A_1)}f^{\theta_1,\theta_1} = \int f(t,w)\,p_{A_1}(st, x, w)\,P_R(ds)\,dt\,\kappa(dx,dw).$$

Similarly, by a change of variable and the fact that $\mathrm{supp}(R) = \{-1, 1\}$,

$$P_{R(A_1)}(f^r)^{\hat\theta,\hat\theta} = \int f^r(t,w)\,p_{A_1}(st + \Delta(x,\hat\theta,\theta_1), x, w)\,P_R(ds)\,dt\,\kappa(dx,dw) = \int f(-t,w)\,p_{A_1}(st + \Delta(x,\hat\theta,\theta_1), x, w)\,P_R(ds)\,dt\,\kappa(dx,dw)$$
$$= \int f(u,w)\,p_{A_1}(-su + \Delta(x,\hat\theta,\theta_1), x, w)\,P_R(ds)\,du\,\kappa(dx,dw) = \int f(u,w)\,p_{A_1}(ru + \Delta(x,\hat\theta,\theta_1), x, w)\,P_R(dr)\,du\,\kappa(dx,dw) = P_{R(A_1)}f^{\hat\theta,\hat\theta}.$$

The desired result follows because $P_{R(A_1)}(f^r)^{\theta_1,\theta_1} = P_{R(A_1)}f^{\theta_1,\theta_1}$ by setting $\Delta$ identically equal to zero in the preceding equation. $\square$

Proof of Lemma 4.2  Since $\mathcal{F}$ is $P_{R(A_1)}$-Donsker by Assumption 4.2(iii), tightness of $X_1^*$ can be established as in the proof of Lemma 3.2. It only remains to prove that the empirical process $\{n^{1/2}(\hat P_{R(Z(\theta_1))} - P_{R(Z(\theta_1))})(f - f^r) : f\in\mathcal{F}\}$ has the same limiting marginal distribution and covariance function as $G_1$. So, let $f\in\mathcal{F}$. Then, since $(f - f^r)(Z^r(\theta_1)) = -(f - f^r)(Z(\theta_1))$ and $1_{\{1\}}(R) - 1_{\{-1\}}(R) = R$,

$$\hat P_{R(Z(\theta_1))}(f - f^r) = n^{-1}\sum_{j=1}^n (f - f^r)(R(Z_j(\theta_1))) = n^{-1}\sum_{j=1}^n [(f - f^r)(Z_j(\theta_1))1_{\{1\}}(R_j) + (f - f^r)(Z^r_j(\theta_1))1_{\{-1\}}(R_j)]$$
$$= n^{-1}\sum_{j=1}^n (1_{\{1\}}(R_j) - 1_{\{-1\}}(R_j))(f - f^r)(Z_j(\theta_1)) = n^{-1}\sum_{j=1}^n R_j(f - f^r)(Z_j(\theta_1)).$$

Hence, as $P_{R(Z(\theta_1))}(f - f^r) = 0$ (cf. the remark at the end of Section 4), the CLT for i.i.d. random variables shows that the empirical process $\{n^{1/2}(\hat P_{R(Z(\theta_1))} - P_{R(Z(\theta_1))})(f - f^r) : f\in\mathcal{F}\}$ has the same limiting marginal distribution and covariance function as $G_1$. $\square$


Proof of Lemma 4.3  From Corollary 4.1, we know that $n^{1/2}\hat R^*$ converges in distribution to $R_1$ whether the null hypothesis is true or not. Therefore, it suffices to prove the same for $n^{1/2}\hat R^*_{\max}$. In particular, we show that for all $c\in[0,\infty)$,

$$\limsup_{n\to\infty}\Pr(n^{1/2}\hat R^*_{\max} > c) \le \Pr(R_1 > c) \le \liminf_{n\to\infty}\Pr(n^{1/2}\hat R^*_{\max} > c).$$

This follows exactly as in the proof of Lemma 3.3 provided $\hat{\mathcal{F}}^*_1 := \{1_{(-\infty,z]} : z\in\hat Z^*\}$ is dense in $\mathcal{F}_1$ in the sense that, for each $f\in\mathcal{F}_1$ and $\varepsilon>0$,

$$\lim_{n\to\infty}\Pr(\exists \hat f\in\hat{\mathcal{F}}^*_1 : \|f - \hat f\|_{2,P_{R(A_1)}} \vee \|f^r - \hat f^r\|_{2,P_{R(A_1)}} < \varepsilon) = 1.$$

But the denseness result above follows as in the proof of Lemma A.4 upon replacing $\varepsilon$ by $\varepsilon^* := R\varepsilon(\theta_1)$, $Z$ by $Z^* := (\varepsilon^*, W)$, $\hat Z$ by $\hat Z^* := (\hat\varepsilon^*, W)$ with $\hat\varepsilon^*$ as defined in Section 4, $\Delta(X,\theta,\theta_0)$ by $\Delta(R, X, \theta^*, \hat\theta, \theta_1)$ (because $\hat\varepsilon^* - \varepsilon^* = -\Delta(R, X, \theta^*, \hat\theta, \theta_1)$), and using the fact that $\mathrm{plim}(\theta^*, \hat\theta) = (\theta_1, \theta_1)$ by Assumption 4.1. $\square$

Appendix 4. Proofs for Section 5

Proof of Theorem 5.1  Under the alternative, $P_{Z(\theta)} \ne P^r_{Z(\theta)}$ for each $\theta\in\Theta$; in particular, $P_{Z(\theta_1)} \ne P^r_{Z(\theta_1)}$. Now write

$$\sup_{f\in\mathcal{F}}|(\hat P_Z - \hat P^r_Z)f| = \sup_{f\in\mathcal{F}}|(P_{Z(\theta_1)} - P^r_{Z(\theta_1)})f| + \Lambda_1(\mathcal{F}) + \Lambda_2(\mathcal{F}),$$

where $\Lambda_1(\mathcal{F}) := \sup_{f\in\mathcal{F}}|(\hat P_Z - \hat P^r_Z)f| - \sup_{f\in\mathcal{F}}|(\hat P_{Z(\theta_1)} - \hat P^r_{Z(\theta_1)})f|$ and

$$\Lambda_2(\mathcal{F}) := \sup_{f\in\mathcal{F}}|(\hat P_{Z(\theta_1)} - \hat P^r_{Z(\theta_1)})f| - \sup_{f\in\mathcal{F}}|(P_{Z(\theta_1)} - P^r_{Z(\theta_1)})f|.$$

By the reverse triangle inequality and Lemma A.2,

$$|\Lambda_1(\mathcal{F})| \le \sup_{f\in\mathcal{F}}\big||(\hat P_Z - \hat P^r_Z)f| - |(\hat P_{Z(\theta_1)} - \hat P^r_{Z(\theta_1)})f|\big| \le \sup_{f\in\mathcal{F}}|(\hat P_Z - \hat P^r_Z)f - (\hat P_{Z(\theta_1)} - \hat P^r_{Z(\theta_1)})f|$$
$$\le \sup_{f\in\mathcal{F}}|(\hat P_Z - \hat P_{Z(\theta_1)})f| + \sup_{f\in\mathcal{F}}|(\hat P^r_Z - \hat P^r_{Z(\theta_1)})f| = o_{\Pr^\circ}(1). \quad (A.30)$$

Next, since $\mathcal{F}$ is $P_{A(\theta_1)}$-Donsker $\Longrightarrow$ $\mathcal{F}$ and $\mathcal{F}^r$ are $P_{A(\theta_1)}$-Glivenko–Cantelli $\Longrightarrow$ $\mathcal{F}$ and $\mathcal{F}^r$ are $P_{Z(\theta_1)}$-Glivenko–Cantelli,

$$|\Lambda_2(\mathcal{F})| \le \sup_{f\in\mathcal{F}}\big||(\hat P_{Z(\theta_1)} - \hat P^r_{Z(\theta_1)})f| - |(P_{Z(\theta_1)} - P^r_{Z(\theta_1)})f|\big| \le \sup_{f\in\mathcal{F}}|(\hat P_{Z(\theta_1)} - \hat P^r_{Z(\theta_1)})f - (P_{Z(\theta_1)} - P^r_{Z(\theta_1)})f|$$
$$\le \sup_{f\in\mathcal{F}}|(\hat P_{Z(\theta_1)} - P_{Z(\theta_1)})f| + \sup_{f\in\mathcal{F}}|(\hat P_{Z(\theta_1)} - P_{Z(\theta_1)})f^r| = \sup_{f\in\mathcal{F}}|(\hat P_{Z(\theta_1)} - P_{Z(\theta_1)})f| + \sup_{f\in\mathcal{F}^r}|(\hat P_{Z(\theta_1)} - P_{Z(\theta_1)})f| = o_{\Pr^\circ}(1). \quad (A.31)$$

Hence,

$$\mathrm{KS}_{\mathcal{F}} := \sup_{f\in\mathcal{F}}|(\hat P_Z - \hat P^r_Z)f| = \sup_{f\in\mathcal{F}}|(P_{Z(\theta_1)} - P^r_{Z(\theta_1)})f| + o_{\Pr^\circ}(1). \quad (A.32)$$

Since $\mathcal{F}$ is measure determining, $\sup_{f\in\mathcal{F}}|(P_{Z(\theta_1)} - P^r_{Z(\theta_1)})f| = 0 \Longrightarrow P_{Z(\theta_1)} = P^r_{Z(\theta_1)}$. Therefore, under the alternative hypothesis, $\sup_{f\in\mathcal{F}}|(P_{Z(\theta_1)} - P^r_{Z(\theta_1)})f| > 0$. It follows by Equation (A.32) that, under the alternative, $n^{1/2}\mathrm{KS}_{\mathcal{F}}$ is unbounded w.p.a.1. $\square$


Proof of Theorem 5.2  Let $\Gamma(f) := |(P_{Z(\theta_1)} - P^r_{Z(\theta_1)})f|$ for notational convenience. Then, with $\Lambda_1$ and $\Lambda_2$ as defined in the proof of Theorem 5.1,

$$\hat R_{\max} := \sup_{f\in\hat{\mathcal{F}}_1}|(\hat P_Z - \hat P^r_Z)f| = \sup_{f\in\hat{\mathcal{F}}_1}\Gamma(f) + \Lambda_1(\hat{\mathcal{F}}_1) + \Lambda_2(\hat{\mathcal{F}}_1).$$

The argument leading to Equation (A.30) shows that

$$|\Lambda_1(\hat{\mathcal{F}}_1)| \le \sup_{f\in\hat{\mathcal{F}}_1}|(\hat P_Z - \hat P_{Z(\theta_1)})f| + \sup_{f\in\hat{\mathcal{F}}_1}|(\hat P^r_Z - \hat P^r_{Z(\theta_1)})f| \le \sup_{f\in\mathcal{F}_1}|(\hat P_Z - \hat P_{Z(\theta_1)})f| + \sup_{f\in\mathcal{F}_1}|(\hat P^r_Z - \hat P^r_{Z(\theta_1)})f| \quad (\hat{\mathcal{F}}_1\subset\mathcal{F}_1)$$
$$= o_{\Pr^\circ}(1).$$

Similarly, the argument leading to Equation (A.31) shows that

$$|\Lambda_2(\hat{\mathcal{F}}_1)| \le \sup_{f\in\mathcal{F}_1}|(\hat P_{Z(\theta_1)} - P_{Z(\theta_1)})f| + \sup_{f\in\mathcal{F}^r_1}|(\hat P_{Z(\theta_1)} - P_{Z(\theta_1)})f| = o_{\Pr^\circ}(1).$$

Therefore, letting $\Lambda_3 := \sup_{f\in\hat{\mathcal{F}}_1}\Gamma(f) - \sup_{f\in\mathcal{F}_1}\Gamma(f)$,

$$\hat R_{\max} = \sup_{f\in\hat{\mathcal{F}}_1}\Gamma(f) + o_{\Pr^\circ}(1) = \sup_{f\in\mathcal{F}_1}\Gamma(f) + \Lambda_3 + o_{\Pr^\circ}(1).$$

Since $\sup_{f\in\mathcal{F}_1}\Gamma(f) > 0$ under the alternative hypothesis, $n^{1/2}\hat R_{\max}$ will be unbounded under the alternative (hence the test is consistent) provided $\Lambda_3 = o_{\Pr}(1)$. To show the latter, let $\varepsilon>0$ and observe that $\Lambda_3 \le \varepsilon$ is always true because $\hat{\mathcal{F}}_1\subset\mathcal{F}_1$. Next, by the definition of the supremum, there exists $g_\varepsilon\in\mathcal{F}_1$ such that $\sup_{f\in\mathcal{F}_1}\Gamma(f) - \varepsilon < \Gamma(g_\varepsilon)$. For $\eta>0$, consider the event

$$E_\eta := \{\exists h_\eta\in\hat{\mathcal{F}}_1 : \|h_\eta - g_\varepsilon\|_{2,P_{A(\theta_1)}} \vee \|h^r_\eta - g^r_\varepsilon\|_{2,P_{A(\theta_1)}} < \eta\}.$$

If $E_\eta$ is true, then there exists $h_\eta\in\hat{\mathcal{F}}_1$ such that $\|h_\eta - g_\varepsilon\|_{2,P_{A(\theta_1)}} + \|h^r_\eta - g^r_\varepsilon\|_{2,P_{A(\theta_1)}} < 2\eta$. But

$$\sup_{f\in\mathcal{F}_1}\Gamma(f) - \varepsilon < \Gamma(g_\varepsilon) \le |\Gamma(g_\varepsilon) - \Gamma(h_\eta)| + \Gamma(h_\eta) \le |\Gamma(g_\varepsilon) - \Gamma(h_\eta)| + \sup_{f\in\hat{\mathcal{F}}_1}\Gamma(f)$$

and

$$|\Gamma(g_\varepsilon) - \Gamma(h_\eta)| \le |(P_{Z(\theta_1)} - P^r_{Z(\theta_1)})(g_\varepsilon - h_\eta)| \quad(\text{reverse triangle inequality})$$
$$\le |P_{Z(\theta_1)}(g_\varepsilon - h_\eta)| + |P_{Z(\theta_1)}(g^r_\varepsilon - h^r_\eta)| \le \|g_\varepsilon - h_\eta\|_{2,P_{A(\theta_1)}} + \|g^r_\varepsilon - h^r_\eta\|_{2,P_{A(\theta_1)}} \quad(\text{Jensen}).$$

Hence, choosing $\eta\le\varepsilon$,

$$E_\eta \Longrightarrow \sup_{f\in\mathcal{F}_1}\Gamma(f) - \varepsilon < 2\eta + \sup_{f\in\hat{\mathcal{F}}_1}\Gamma(f) \iff \Lambda_3 > -\varepsilon - 2\eta \ge -3\varepsilon \Longrightarrow |\Lambda_3| \le 3\varepsilon.$$

As we proved Equation (A.35), we can show that $\lim_{n\to\infty}\Pr(E_\eta) = 1$ under Assumptions 3.1, 3.2(ii), and 3.4 with $(\theta_0, \varepsilon)$ replaced by $(\theta_1, \varepsilon(\theta_1))$. (Note that Lemma A.4 holds irrespective of whether the null hypothesis is true or not.) It follows that $\lim_{n\to\infty}\Pr(|\Lambda_3| \le 3\varepsilon) = 1$, which is equivalent to $\Lambda_3 = o_{\Pr}(1)$ because $\varepsilon$ was arbitrary. $\square$

Proof of Lemma 5.2  The approach (and notation) here is very similar to the proof of Lemma 3.2, the main difference being that we now use empirical process results that allow the underlying data generating measures to depend upon $n$. Recall that $X_{\theta_n}(f) := n^{1/2}(\hat P_{Z(\theta_n)} - P_{Z(\theta_n)})(f - f^r) + \langle f - f^r, m_{A_n}'\rangle_{P_{A_n}}\, n^{1/2}(\hat\theta - \theta_n)$, where $m_{A_n} := (\partial_1\log p_{A_n})\mu_0$ and $f\in\mathcal{F}$. To prove that $(X_{\theta_n} + \Delta_h)_{\mathcal{F}}$ is asymptotically tight, it suffices to show (V&W, Section 2.8.3) that $\lim_{n\to\infty}\sup_{f,g\in\mathcal{F}}|\|f-g\|_{2,P_{A_n}} - \|f-g\|_{2,P_{A_0}}| = 0$ and, for all $\varepsilon>0$,

$$\lim_{\delta\to 0}\limsup_{n\to\infty}\Pr^\circ\Big(\sup_{f,g\in\mathcal{F}_{\delta,P_{A_n}}}|(X_{\theta_n} + \Delta_h)(f-g)| > \varepsilon\Big) = 0. \quad (A.33)$$

The first condition follows immediately because, for all $f,g\in\mathcal{F}$,

$$(\|f-g\|_{2,P_{A_n}} - \|f-g\|_{2,P_{A_0}})^2 \le |\|f-g\|_{2,P_{A_n}} - \|f-g\|_{2,P_{A_0}}| \times |\|f-g\|_{2,P_{A_n}} + \|f-g\|_{2,P_{A_0}}|$$
$$= |\|f-g\|^2_{2,P_{A_n}} - \|f-g\|^2_{2,P_{A_0}}| = n^{-1/2}\Big|\int (f-g)^2\, h\, dP_{A_0}\Big| \le 2M^2_{\mathcal{F}}\|h\|_\infty n^{-1/2}.$$

To show Equation (A.33), begin by observing that

$$\Pr^\circ\Big(\sup_{f,g\in\mathcal{F}_{\delta,P_{A_n}}}|(X_{\theta_n} + \Delta_h)(f-g)| > 3\varepsilon\Big) \le t_{\delta,n}(1) + t_{\delta,n}(2) + t_{\delta,n}(3),$$

where $t_{\delta,n}(1) := \Pr^\circ(\sup_{f,g\in\mathcal{F}_{\delta,P_{A_n}}}|n^{1/2}(\hat P_{Z(\theta_n)} - P_{Z(\theta_n)})((f-f^r)-(g-g^r))| > \varepsilon)$,

$$t_{\delta,n}(2) := \Pr^\circ\Big(\sup_{f,g\in\mathcal{F}_{\delta,P_{A_n}}}|\langle (f-g)-(f^r-g^r), m_{A_n}'\rangle_{P_{A_n}}\, n^{1/2}(\hat\theta - \theta_n)| > \varepsilon\Big),$$

and $t_{\delta,n}(3) := 1(\sup_{f,g\in\mathcal{F}_{\delta,P_{A_n}}}|\Delta_h(f-g)| > \varepsilon)$ because $\Delta_h(f-g)$ is nonstochastic. Now,

$$\lim_{\delta\to 0}\limsup_{n\to\infty} t_{\delta,n}(1) \le \lim_{\delta\to 0}\limsup_{n\to\infty}\Pr^\circ\Big(\sup_{f,g\in(\mathcal{F}-\mathcal{F}^r)_{\delta,P_{A_n}}}|n^{1/2}(\hat P_{Z(\theta_n)} - P_{Z(\theta_n)})(f-g)| > \varepsilon\Big) = 0$$

because $\mathcal{F} - \mathcal{F}^r$ is asymptotically equicontinuous uniformly in $(P_{A_n})$, which follows because $\mathcal{F} - \mathcal{F}^r$ is Donsker and pre-Gaussian uniformly in $(P_{A_n})$ by Assumptions 5.3(ii) and (iii) (cf. V&W, Section 2.8.3). Next, for the remainder of the proof, let $f,g\in\mathcal{F}_{\delta,P_{A_n}}$. By Assumption 5.2(ii),

$$\|\langle (f-g)-(f^r-g^r), m_{A_n}\rangle_{P_{A_n}}\| \le (\|f-g\|_{2,P_{A_n}} + \|f^r-g^r\|_{2,P_{A_n}})\|v_{A_n}\|_\infty\|(\mu_0'\mu_0)^{1/2}\|_{2,P_X}.$$

Moreover, recalling that $P_Z = P^r_Z$ was assumed to create the local alternatives,

$$\|f^r-g^r\|^2_{2,P_{A_n}} = \int (f-g)^2(-u,w)(1 + n^{-1/2}h(u,w))\,P_{A_0}(du,dx,dw) = \int (f-g)^2(-u,w)(1 + n^{-1/2}h(u,w))\,P_Z(du,dw) \quad(\text{marginal integration})$$
$$= \int (f-g)^2(u,w)(1 + n^{-1/2}h(-u,w))\,P_Z(du,dw) \quad(\text{change of variables}) = \int (f-g)^2(u,w)\,\frac{1 + n^{-1/2}h(-u,w)}{1 + n^{-1/2}h(u,w)}\,P_{A_n}(du,dx,dw) \quad(\text{marginal integration})$$
$$\le \frac{1 + n^{-1/2}\|h\|_\infty}{1 - n^{-1/2}\|h\|_\infty}\,\|f-g\|^2_{2,P_{A_n}} \quad(\text{for } n\ge\|h\|^2_\infty) \le \frac{1 + n^{-1/2}\|h\|_\infty}{1 - n^{-1/2}\|h\|_\infty}\,\delta^2.$$

Hence, $\lim_{\delta\to 0}\limsup_{n\to\infty} t_{\delta,n}(2) = 0$ because $\|n^{1/2}(\hat\theta - \theta_n)\|$ is bounded in probability by Assumption 5.5. Finally, by marginal integration and Jensen's inequality,

$$|\Delta_h(f-g)| \le \int |(f-g)(h-h^r)|\,dP_{A_0} = \int\frac{|(f-g)(h-h^r)|}{1 + n^{-1/2}h}\,dP_{A_n} \le \frac{2\|h\|_\infty}{1 - n^{-1/2}\|h\|_\infty}\int |f-g|\,dP_{A_n} \quad(\text{for } n\ge\|h\|^2_\infty)$$
$$\le \frac{2\|h\|_\infty}{1 - n^{-1/2}\|h\|_\infty}\,\|f-g\|_{2,P_{A_n}} \le \frac{2\|h\|_\infty}{1 - n^{-1/2}\|h\|_\infty}\,\delta.$$

Hence, $\lim_{\delta\to 0}\limsup_{n\to\infty} t_{\delta,n}(3) = 0$ as well. Therefore, Equation (A.33) holds. $\square$

Proof of Lemma 5.3  In the proof of Lemma 3.3, replace references to Corollary 3.1 by Corollary 5.1, Lemma 3.2 by Lemma 5.2, and $X_0$ by $X_0 + \Delta_h$. The argument leading to Equation (A.17) goes through because (a) the denseness result in Lemma A.4 continues to hold under $H_{1n}$ and Assumptions 5.1, 5.2(ii), and 5.4; (b) almost all sample paths of $(X_0 + \Delta_h)_{\mathcal{F}_1}$ are uniformly $\|\cdot\|_{2,P_{A_0}}$-continuous; and (c) $\sup_{f\in\mathcal{F}_1}(X_0 + \Delta_h)f$ is continuously distributed. $\square$

Appendix 5. Proofs for Section 6

The following inequality, relating the strength of an instrument $W$ to the degree of endogeneity of a regressor $X$, may be of independent interest.

Lemma A.1  Let $Y := \theta_0 + \theta_1 X + U$ and $X := \pi_0 + \pi_1 W + V$ with $\mathrm{cov}(W,U) = \mathrm{cov}(W,V) = 0$, let $R^2_0 := \mathrm{corr}^2(X,W)$ denote the population $R^2$ for the second regression, and assume that $\sigma^2_X < \infty$ and $\sigma^2_V > 0$. Then, $R^2_0 \wedge |\mathrm{corr}(X,U)| \le (\sqrt5 - 1)/2$.

Proof of Lemma A.1  Since the result is trivial if $\sigma^2_U = 0$, assume without loss of generality that $\sigma^2_U > 0$. We prove the desired result by contradiction. So, suppose to the contrary that there exists a distribution of $(X, W, U)$ for which $R^2_0 \wedge |\mathrm{corr}(X,U)| \ge c$, where $c\in((\sqrt5 - 1)/2, 1)$. (The case $c = 1$ is ruled out by the assumption that $\sigma^2_X < \infty$ and $\sigma^2_V > 0$; cf. Equation (A.34).) Then, since $\mathrm{cov}(W,V) = 0$,

$$c \le R^2_0 = \frac{\mathrm{cov}^2(X,W)}{\sigma^2_X\sigma^2_W} = \frac{(\pi_1\sigma^2_W)^2}{\sigma^2_X\sigma^2_W} = \frac{\pi_1^2\sigma^2_W}{\sigma^2_X} = \frac{\pi_1^2\sigma^2_W}{\pi_1^2\sigma^2_W + \sigma^2_V} = 1 - \frac{\sigma^2_V}{\pi_1^2\sigma^2_W + \sigma^2_V}.$$

Consequently,

$$\sigma^2_X = \pi_1^2\sigma^2_W + \sigma^2_V \ge \frac{\sigma^2_V}{1 - c}. \quad (A.34)$$

Next, since $\mathrm{cov}(W,U) = 0 \Longrightarrow \mathrm{cov}(X,U) = \mathrm{cov}(V,U)$,

$$c \le |\mathrm{corr}(X,U)| \iff c^2 \le \frac{\mathrm{cov}^2(X,U)}{\sigma^2_X\sigma^2_U} \iff \sigma^2_X\sigma^2_U c^2 \le \mathrm{cov}^2(V,U)$$
$$\Longrightarrow \frac{\sigma^2_V\sigma^2_U c^2}{1 - c} \le \mathrm{cov}^2(V,U) \quad(\text{by Equation (A.34)}) \iff \frac{c^2}{1 - c} \le \mathrm{corr}^2(V,U) \Longrightarrow \frac{c^2}{1 - c} \le 1 \iff \Big(c - \frac{\sqrt5 - 1}{2}\Big)\Big(c + \frac{\sqrt5 + 1}{2}\Big) \le 0.$$

Therefore, the maximum possible value for $c$ is $(\sqrt5 - 1)/2$, which contradicts the supposition that $c > (\sqrt5 - 1)/2$. $\square$
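The bound is easy to probe numerically; the following sketch (ours) draws many one-instrument designs satisfying $\mathrm{cov}(W,U) = \mathrm{cov}(W,V) = 0$ and confirms that $R^2_0 \wedge |\mathrm{corr}(X,U)|$ never exceeds $(\sqrt5 - 1)/2 \approx 0.618$:

    set.seed(1)
    bound <- (sqrt(5) - 1) / 2
    vals <- replicate(1e4, {
      pi1 <- runif(1, 0, 5); rho <- runif(1, -1, 1)   # rho = corr(V, U)
      sW  <- 1; sV <- runif(1, 0.1, 2)
      s2X <- pi1^2 * sW^2 + sV^2                      # var(X), since cov(W, V) = 0
      R20 <- pi1^2 * sW^2 / s2X                       # population R^2
      corXU <- rho * sV / sqrt(s2X)                   # corr(X,U) = cov(V,U)/(sd(X) sd(U))
      min(R20, abs(corXU))
    })
    max(vals) <= bound   # TRUE, as Lemma A.1 predicts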


Appendix 6. Auxiliary results

Lemma A.2  Under the conditions of Theorem 5.1,

$$\sup_{f\in\mathcal{F}}|(\hat P_Z - \hat P_{Z(\theta_1)})f| = o_{\Pr^\circ}(1) \quad\text{and}\quad \sup_{f\in\mathcal{F}}|(\hat P^r_Z - \hat P^r_{Z(\theta_1)})f| = o_{\Pr^\circ}(1).$$

Proof of Lemma A.2  Since the proof of Lemma 3.1 only requires the existence of $\mathrm{plim}(\hat\theta)$, the argument leading to Equation (A.1) can be replicated word for word with $(\theta_0, \varepsilon)$ replaced by $(\theta_1, \varepsilon(\theta_1))$ and $A_0$ replaced by $A(\theta_1)$ to show that

$$\sup_{f\in\mathcal{F}}|(\hat P_Z - \hat P_{Z(\theta_1)})f - \langle f, (\partial_1\log p_{A(\theta_1)})\mu_1'\rangle_{P_{A(\theta_1)}}(\hat\theta - \theta_1)| = o_{\Pr^\circ}(n^{-1/2}) + o_{\Pr}(\|\hat\theta - \theta_1\|).$$

Therefore, $\sup_{f\in\mathcal{F}}|(\hat P_Z - \hat P_{Z(\theta_1)})f| = o_{\Pr^\circ}(1)$. The second result follows from the first upon noting that $(\hat P^r_Z - \hat P^r_{Z(\theta_1)})f = (\hat P_Z - \hat P_{Z(\theta_1)})f^r$. $\square$

Lemma A.3 Under Assumption 3.1 and 3.2(ii), F1 and F2 satisfy Assumption 3.3(v) with q(t) ∝ t1/2.

Proof of Lemma A.3  If f ∈ F_1, then f = 1_{(−∞,τ]×(−∞,v]} for some (τ, v) ∈ R × R^{dim(W)} and

\[
f(u - \Delta(x, \theta, \theta_0), w) = 1_{(-\infty,\tau]\times(-\infty,v]}(u - \Delta(x, \theta, \theta_0), w) = 1_{(-\infty,\tau+\Delta(x,\theta,\theta_0)]}(u)\,1_{(-\infty,v]}(w).
\]

Since |1_{(−∞,a]} − 1_{(−∞,b]}| = 1_{(a∧b, a∨b]} and a∨b − a∧b = |a − b|, it follows that

\[
\begin{aligned}
\int (f(u - \Delta(x,\theta,\theta_0), w) - f(u, w))^2\, P_{\varepsilon|X=x,W=w}(du)
&= \int_{((\tau+\Delta(x,\theta,\theta_0))\wedge\tau,\;(\tau+\Delta(x,\theta,\theta_0))\vee\tau]} P_{\varepsilon|X=x,W=w}(du)\,1_{(-\infty,v]}(w)\\
&\le \zeta(x, w)\,|\Delta(x, \theta, \theta_0)| \quad(\text{by Assumption 3.2(ii)})\\
&\le \zeta(x, w)(\|\mu_0(x)\|\,\|\theta - \theta_0\| + |\rho(x, \theta, \theta_0)|) \quad(\text{by Assumption 3.1})\\
&\le \zeta(x, w)\Big(\|\mu_0(x)\|\,\|\theta - \theta_0\| + \sup_{\bar\theta\in B(\theta_0,\|\theta-\theta_0\|)} |\rho(x, \bar\theta, \theta_0)|\Big).
\end{aligned}
\]

Hence, by Cauchy–Schwarz and Assumption 3.1, for all ε > 0,

\[
\begin{aligned}
\int (f(u - \Delta(x,\theta,\theta_0), w) - f(u, w))^2\, P_{A_0}(du, dx, dw)
&\le \|\zeta\|_{2,P_{X,W}}\Big(\|(\mu_0'\mu_0)^{1/2}\|_{2,P_X}\,\|\theta - \theta_0\| + \Big\|\sup_{\bar\theta\in B(\theta_0,\|\theta-\theta_0\|)} |\rho(\cdot, \bar\theta, \theta_0)|\Big\|_{2,P_X}\Big)\\
&\le \|\zeta\|_{2,P_{X,W}}(\|(\mu_0'\mu_0)^{1/2}\|_{2,P_X} + \varepsilon)\,\|\theta - \theta_0\|.
\end{aligned}
\]

Therefore, since the right-hand side does not depend upon (τ, v),

\[
\sup_{f\in F_1} \|f(\cdot - \Delta(X, \theta, \theta_0), \cdot) - f(\cdot, \cdot)\|_{2,P_{A_0}} \le \|\zeta\|_{2,P_{X,W}}^{1/2}(\|(\mu_0'\mu_0)^{1/2}\|_{2,P_X} + \varepsilon)^{1/2}\,\|\theta - \theta_0\|^{1/2};
\]

that is, F_1 satisfies Assumption 3.3(v) with q(t) ∝ t^{1/2}. Since f ∈ F_1 ⟹ f^r = 1_{[−τ,∞)×(−∞,v]} and P_{ε|X,W}({−τ}) = 0, the above argument shows that Assumption 3.3(v) is also satisfied with f^r and q(t) ∝ t^{1/2}. The same reasoning works for F_2 as well. ∎
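The square-root modulus also has a transparent numerical face: the squared L_2 distance between two half-line indicators equals the probability mass between their endpoints, which is roughly linear in the shift, so the distance itself scales as the square root of the shift. A minimal sketch, assuming NumPy/SciPy and taking ε to be standard normal purely for illustration (the lemma itself imposes no such distribution):

```python
# Illustration of the q(t) ∝ t^{1/2} modulus in Lemma A.3 for a single cell:
# ||1_{(-inf, tau + t]} - 1_{(-inf, tau]}||_{2,P}^2 = F(tau + t) - F(tau),
# which is about f(tau) * t, so the L2 distance behaves like (f(tau) * t)^{1/2}.
import numpy as np
from scipy.stats import norm   # standard normal stands in for P_epsilon

tau = 0.3
for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    l2 = np.sqrt(norm.cdf(tau + t) - norm.cdf(tau))
    print(f"t = {t:7.0e}   L2 distance = {l2:.5f}   ratio to t^(1/2) = {l2 / np.sqrt(t):.4f}")
# The printed ratio stabilises near phi(tau)^{1/2} ≈ 0.6176: the square-root rate.
```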

The following denseness result may be of independent interest.

Lemma A.4  Let Assumptions 3.1, 3.2(ii), and 3.4 hold. Then, F̂_1 is dense in F_1 in the following sense: for each f ∈ F_1 and ε > 0,

\[
\lim_{n\to\infty} \Pr(\exists \hat f \in \hat F_1 : \|f - \hat f\|_{2,P_{A_0}} \vee \|f^r - \hat f^r\|_{2,P_{A_0}} < \varepsilon) = 1. \tag{A.35}
\]

Proof of Lemma A.4  We make use of the following fact (van der Vaart 1998, Theorem 19.1): given ε > 0 and any cdf F, there exists a partition {−∞ =: t_0 < t_1 < ··· < t_K := ∞} of R such that F(t_l−) − F(t_{l−1}) < ε for each l (points at which F jumps more than ε belong to the partition). Hence, letting F_ε and F_{W^{(i)}} denote the marginal cdfs of ε and W^{(i)} and working coordinate by coordinate, we can create a partition of R × R^{dim(W)}, denoted by Z_π := {τ_0 < τ_1 < ··· < τ_K} × ×_{i=1}^{dim(W)} {ϑ^{(i)}_0 < ϑ^{(i)}_1 < ··· < ϑ^{(i)}_{K_i}}, such that F_ε(τ_l) − F_ε(τ_{l−1}) < ε and F_{W^{(i)}}(ϑ^{(i)}_l−) − F_{W^{(i)}}(ϑ^{(i)}_{l−1}) < ε for each l (remember that ε is assumed to be continuously distributed). For the remainder of the proof, let c := 1 + dim(W) and d := dim(W).
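For intuition, when the cdf is continuous the required partition can be built directly from quantiles at levels spaced less than ε apart. The sketch below, assuming NumPy, does this for an empirical cdf; the helper cdf_partition and the sample size are our illustration, not the paper's construction, and jump points of a discontinuous cdf would have to be added by hand as the fact above prescribes:

```python
# Sketch of the partition device (van der Vaart 1998, Theorem 19.1) in the
# continuous case: quantile cuts give cells each carrying cdf mass below eps.
import numpy as np

def cdf_partition(sample, eps):
    """Boundaries -inf = t_0 < ... < t_K = +inf such that each cell
    (t_{l-1}, t_l] carries empirical mass below eps."""
    k = int(np.ceil(1.0 / eps)) + 1               # one extra cell for slack
    levels = np.linspace(0.0, 1.0, k + 1)[1:-1]   # interior quantile levels
    cuts = np.quantile(sample, levels)
    return np.concatenate(([-np.inf], cuts, [np.inf]))

rng = np.random.default_rng(0)
draws = rng.standard_normal(100_000)              # stand-in for F_epsilon
grid = cdf_partition(draws, eps=0.1)
masses = np.diff([np.mean(draws <= t) for t in grid])
print(grid.round(3))
print(masses.max())                               # each cell's mass < 0.1
```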

Fix f ∈ F_1. Then, f = 1_{(−∞,u]×(−∞,v]} for some (u, v) ∈ R × R^d and f^r = 1_{[−u,∞)×(−∞,v]}. We begin by showing that, given Z_π, there exist functions f_π and f̄_π such that

\[
d(f, f_\pi; f^r, \bar f_\pi) := \|f - f_\pi\|_{2,P_{A_0}} \vee \|f^r - \bar f_\pi\|_{2,P_{A_0}} < (c\varepsilon)^{1/2}. \tag{A.36}
\]

Since Z_π partitions R × R^d, we can find z_π := (τ_l, ϑ^{(1)}_{l_1}, …, ϑ^{(d)}_{l_d}) ∈ Z_π such that (u, v) ∈ [τ_l, τ_{l+1}) × ×_{i=1}^d [ϑ^{(i)}_{l_i}, ϑ^{(i)}_{l_i+1}). Similarly, there exists z̄_π := (τ_{l̄}, ϑ^{(1)}_{l_1}, …, ϑ^{(d)}_{l_d}) ∈ Z_π such that (−u, v) ∈ [τ_{l̄}, τ_{l̄+1}) × ×_{i=1}^d [ϑ^{(i)}_{l_i}, ϑ^{(i)}_{l_i+1}). Now, if (A_i)_{i=1}^k and (B_i)_{i=1}^k are subsets of R, then using the fact that (∩_i A_i) △ (∩_i B_i) ⊂ ∪_i (A_i △ B_i), where A_i △ B_i := (A_i \ B_i) ∪ (B_i \ A_i) is the symmetric difference of A_i and B_i, it is straightforward to show that 1_{(×_{i=1}^k A_i) △ (×_{i=1}^k B_i)} ≤ Σ_{i=1}^k 1_{A_i △ B_i}. Hence, letting ϑ := (ϑ^{(1)}_{l_1}, …, ϑ^{(d)}_{l_d}), f_π := 1_{(−∞,τ_l]×(−∞,ϑ]}, and f̄_π := 1_{[τ_{l̄},∞)×(−∞,ϑ]}, we have that

\[
\begin{aligned}
|f - f_\pi|^2 &= |1_{(-\infty,u]\times(-\infty,v]} - 1_{(-\infty,\tau_l]\times(-\infty,\vartheta]}|^2 = 1_{((-\infty,u]\times(-\infty,v])\,\triangle\,((-\infty,\tau_l]\times(-\infty,\vartheta])}\\
&\le 1_{(-\infty,u]\,\triangle\,(-\infty,\tau_l]} + \sum_{i=1}^{d} 1_{(-\infty,v^{(i)}]\,\triangle\,(-\infty,\vartheta^{(i)}_{l_i}]}
= 1_{(u\wedge\tau_l,\;u\vee\tau_l]} + \sum_{i=1}^{d} 1_{(v^{(i)}\wedge\vartheta^{(i)}_{l_i},\;v^{(i)}\vee\vartheta^{(i)}_{l_i}]}\\
&\le 1_{(\tau_l,\tau_{l+1})} + \sum_{i=1}^{d} 1_{(\vartheta^{(i)}_{l_i},\;\vartheta^{(i)}_{l_i+1})}. 
\end{aligned}
\tag{A.37}
\]

Similarly,

\[
\begin{aligned}
|f^r - \bar f_\pi|^2 &= 1_{([-u,\infty)\times(-\infty,v])\,\triangle\,([\tau_{\bar l},\infty)\times(-\infty,\vartheta])} \le 1_{[-u\wedge\tau_{\bar l},\;-u\vee\tau_{\bar l})} + \sum_{i=1}^{d} 1_{(v^{(i)}\wedge\vartheta^{(i)}_{l_i},\;v^{(i)}\vee\vartheta^{(i)}_{l_i}]}\\
&\le 1_{[\tau_{\bar l},\tau_{\bar l+1})} + \sum_{i=1}^{d} 1_{(\vartheta^{(i)}_{l_i},\;\vartheta^{(i)}_{l_i+1})}.
\end{aligned}
\]
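The coordinatewise bound 1_{(×_i A_i) △ (×_i B_i)} ≤ Σ_i 1_{A_i △ B_i} invoked in both displays is easy to confirm by brute force. A small sketch, assuming NumPy, with random half-open cells in R² (the cell endpoints and evaluation points are arbitrary choices of ours):

```python
# Brute-force check that the symmetric difference of two rectangles is covered
# by the union of the coordinatewise symmetric differences:
#   1_{(A1 x A2) △ (B1 x B2)}(z) <= 1_{A1 △ B1}(z1) + 1_{A2 △ B2}(z2).
import numpy as np

rng = np.random.default_rng(1)

def in_cell(x, lo, hi):
    """Membership in the half-open interval (lo, hi]."""
    return (x > lo) & (x <= hi)

for _ in range(1000):
    (a1, b1), (c1, d1) = np.sort(rng.uniform(-2, 2, (2, 2)), axis=1)  # A1, B1
    (a2, b2), (c2, d2) = np.sort(rng.uniform(-2, 2, (2, 2)), axis=1)  # A2, B2
    z = rng.uniform(-3, 3, size=(500, 2))
    in_A = in_cell(z[:, 0], a1, b1) & in_cell(z[:, 1], a2, b2)
    in_B = in_cell(z[:, 0], c1, d1) & in_cell(z[:, 1], c2, d2)
    lhs = (in_A ^ in_B).astype(int)                      # 1_{(A1xA2)△(B1xB2)}
    rhs = (in_cell(z[:, 0], a1, b1) ^ in_cell(z[:, 0], c1, d1)).astype(int) \
        + (in_cell(z[:, 1], a2, b2) ^ in_cell(z[:, 1], c2, d2)).astype(int)
    assert np.all(lhs <= rhs)
print("coordinatewise symmetric-difference bound held at all sampled points")
```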

Let supp(W) := ×_{i=1}^d [a^{(i)}, b^{(i)}] with the convention that [−∞, ∞] := (−∞, ∞), [−∞, ·] := (−∞, ·], and [·, ∞] := [·, ∞), so that coordinates of W are allowed to have unbounded support under this notation. By Equation (A.37),

\[
\begin{aligned}
\|f - f_\pi\|_{2,P_{A_0}}^2 &\le \int_{\tau_l}^{\tau_{l+1}} dP_\varepsilon + \sum_{i=1}^{d} \int 1_{(\vartheta^{(i)}_{l_i}\vee a^{(i)},\;\vartheta^{(i)}_{l_i+1}\wedge b^{(i)})}\, dP_{W^{(i)}}\\
&\le F_\varepsilon(\tau_{l+1}) - F_\varepsilon(\tau_l) + \sum_{i=1}^{d} F_{W^{(i)}}(\vartheta^{(i)}_{l_i+1}\wedge b^{(i)}-) - F_{W^{(i)}}(\vartheta^{(i)}_{l_i}\vee a^{(i)}) < c\varepsilon. 
\end{aligned}
\tag{A.38}
\]

Similarly, using the fact that F_ε(τ_{l̄+1}) − F_ε(τ_{l̄}) < ε,

\[
\|f^r - \bar f_\pi\|_{2,P_{A_0}}^2 < c\varepsilon. \tag{A.39}
\]

Therefore, Equation (A.36) is proved. Next, note that since ϑ^{(i)}_{l_i} and ϑ^{(i)}_{l_i+1} enter Equation (A.38) only via ϑ^{(i)}_{l_i} ∨ a^{(i)} and ϑ^{(i)}_{l_i+1} ∧ b^{(i)}, we can assume without loss of generality that each v^{(i)} ∈ [ϑ^{(i)}_{l_i} ∨ a^{(i)}, ϑ^{(i)}_{l_i+1} ∧ b^{(i)}) so that (u, v) ∈ E_π × W_π ⊂ R × supp(W) and (−u, v) ∈ Ē_π × W_π with E_π := [τ_l, τ_{l+1}), Ē_π := [τ_{l̄}, τ_{l̄+1}), W_π := ×_{i=1}^d [ϑ^{(i)}_{l_i} ∨ a^{(i)}, ϑ^{(i)}_{l_i+1} ∧ b^{(i)}), P_{ε,W}(E_π × W_π) > 0, and P_{ε,W}(Ē_π × W_π) > 0. Next, replacing (u, v) by Z_j := (ε_j, W_j) and (−u, v) by Z^r_j := (−ε_j, W_j)


in the argument leading to Equation (A.39), we obtain that for f_j := 1_{(−∞,Z_j]} = 1_{(−∞,ε_j]×(−∞,W_j]},

\[
Z_j \in E_\pi\times W_\pi =: S_\pi \ \text{and}\ Z^r_j \in \bar E_\pi\times W_\pi =: \bar S_\pi \implies d(f_j, f_\pi; f^r_j, \bar f_\pi) < (c\varepsilon)^{1/2}. \tag{A.40}
\]

Recall that Ẑ_j := (ε̂_j, W_j) and Ẑ^r_j := (−ε̂_j, W_j), and let f̂_j := 1_{(−∞,Ẑ_j]} = 1_{(−∞,ε̂_j]×(−∞,W_j]}. To prove Equation (A.35), it suffices to show that

\[
\lim_{n\to\infty} \Pr\big(\cup_{j=1}^n \{d(\hat f_j, f_j; \hat f^r_j, f^r_j) \le (c\varepsilon)^{1/2} \text{ and } (Z_j, Z^r_j) \in S_\pi\times\bar S_\pi\}\big) = 1. \tag{A.41}
\]

This is because Equations (A.40) and (A.41) and the triangle inequality imply that the event

\[
K_n := \{d(\hat f_j, f_\pi; \hat f^r_j, \bar f_\pi) < 2(c\varepsilon)^{1/2} \text{ for some } j \le n\}
\]

holds w.p.a.1. Moreover,

\[
\begin{aligned}
d(\hat f_j, f; \hat f^r_j, f^r) &\le d(\hat f_j, f_\pi; \hat f^r_j, \bar f_\pi) + (c\varepsilon)^{1/2} \quad(\text{by Equation (A.36)})\\
&< 3(c\varepsilon)^{1/2} \text{ for some } j \le n \quad(\text{if } K_n \text{ is true}).
\end{aligned}
\]

Therefore, since K_n holds w.p.a.1,

\[
1 = \lim_{n\to\infty} \Pr(K_n) \le \lim_{n\to\infty} \Pr(d(\hat f_j, f; \hat f^r_j, f^r) < 3(c\varepsilon)^{1/2} \text{ for some } j \le n)
\]

and Equation (A.35) follows because ε was arbitrary.

So we now prove Equation (A.41). Since ε̂_j = ε_j − Δ(X_j, θ̂, θ_0) and

\[
|\hat f_j - f_j| = |1_{(-\infty,\hat Z_j]} - 1_{(-\infty,Z_j]}| \le |1_{(-\infty,\hat\varepsilon_j]} - 1_{(-\infty,\varepsilon_j]}| = 1_{(\hat\varepsilon_j\wedge\varepsilon_j,\;\hat\varepsilon_j\vee\varepsilon_j]},
\]

we have that

\[
\begin{aligned}
\int |\hat f_j - f_j|^2\, dP_{\varepsilon|X=x,W=w} &\le \int_{\hat\varepsilon_j\wedge\varepsilon_j}^{\hat\varepsilon_j\vee\varepsilon_j} dP_{\varepsilon|X=x,W=w}\\
&\le \zeta(x, w)\,|\Delta(X_j, \hat\theta, \theta_0)| \quad(\text{by Assumption 3.2(ii)})\\
&\le \zeta(x, w)(\|\mu_0(X_j)\|\,\|\hat\theta - \theta_0\| + |\rho(X_j, \hat\theta, \theta_0)|) \quad(\text{by Assumption 3.1})\\
&\le \zeta(x, w)\Big(\|\mu_0(X_j)\|\,\|\hat\theta - \theta_0\| + \sup_{\theta\in B(\theta_0,\|\hat\theta-\theta_0\|)} |\rho(X_j, \theta, \theta_0)|\Big).
\end{aligned}
\]

Similarly,

\[
|\hat f^r_j - f^r_j| = |1_{[-\hat\varepsilon_j,\infty)\times(-\infty,W_j]} - 1_{[-\varepsilon_j,\infty)\times(-\infty,W_j]}| \le |1_{[-\hat\varepsilon_j,\infty)} - 1_{[-\varepsilon_j,\infty)}| = 1_{[-\hat\varepsilon_j\wedge-\varepsilon_j,\;-\hat\varepsilon_j\vee-\varepsilon_j)}
\]

implies that

\[
\int |\hat f^r_j - f^r_j|^2\, dP_{\varepsilon|X=x,W=w} \le \zeta(x, w)\Big(\|\mu_0(X_j)\|\,\|\hat\theta - \theta_0\| + \sup_{\theta\in B(\theta_0,\|\hat\theta-\theta_0\|)} |\rho(X_j, \theta, \theta_0)|\Big).
\]

Hence,

\[
d^2(\hat f_j, f_j; \hat f^r_j, f^r_j) \le \|\zeta\|_{2,P_{X,W}}\Big(\|\mu_0(X_j)\|\,\|\hat\theta - \theta_0\| + \sup_{\theta\in B(\theta_0,\|\hat\theta-\theta_0\|)} |\rho(X_j, \theta, \theta_0)|\Big). \tag{A.42}
\]

Consequently, for each r, η > 0,

\[
\{d^2(\hat f_j, f_j; \hat f^r_j, f^r_j) > r\} \cap \{\|\hat\theta - \theta_0\| < \eta\} \subset \Big\{\|\zeta\|_{2,P_{X,W}}\Big(\|\mu_0(X_j)\|\,\eta + \sup_{\theta\in B(\theta_0,\eta)} |\rho(X_j, \theta, \theta_0)|\Big) > r\Big\},
\]

which, by Equation (A.42), implies that for each j,

\[
\begin{aligned}
&(\{d^2(\hat f_j, f_j; \hat f^r_j, f^r_j) > r\} \cup \{(Z_j, Z^r_j) \notin S_\pi\times\bar S_\pi\}) \cap \{\|\hat\theta - \theta_0\| < \eta\}\\
&\quad\subset \Big\{\|\zeta\|_{2,P_{X,W}}\Big(\|\mu_0(X_j)\|\,\eta + \sup_{\theta\in B(\theta_0,\eta)} |\rho(X_j, \theta, \theta_0)|\Big) > r\Big\} \cup \{(Z_j, Z^r_j) \notin S_\pi\times\bar S_\pi\}.
\end{aligned}
\]


Hence, we have that

\[
\begin{aligned}
&\cap_{j=1}^n (\{d^2(\hat f_j, f_j; \hat f^r_j, f^r_j) > r\} \cup \{(Z_j, Z^r_j) \notin S_\pi\times\bar S_\pi\}) \cap \{\|\hat\theta - \theta_0\| < \eta\}\\
&\quad\subset \cap_{j=1}^n \Big[\Big\{\|\zeta\|_{2,P_{X,W}}\Big(\|\mu_0(X_j)\|\,\eta + \sup_{\theta\in B(\theta_0,\eta)} |\rho(X_j, \theta, \theta_0)|\Big) > r\Big\} \cup \{(Z_j, Z^r_j) \notin S_\pi\times\bar S_\pi\}\Big].
\end{aligned}
\]

Therefore, since the observations are i.i.d. and p := Pr((Z_j, Z^r_j) ∈ S_π × S̄_π) > 0 for all j,

\[
\begin{aligned}
&\Pr\big(\cap_{j=1}^n (\{d^2(\hat f_j, f_j; \hat f^r_j, f^r_j) > r\} \cup \{(Z_j, Z^r_j) \notin S_\pi\times\bar S_\pi\}) \cap \{\|\hat\theta - \theta_0\| < \eta\}\big)\\
&\quad\le \Big(\Pr\Big(\|\zeta\|_{2,P_{X,W}}\Big(\|\mu_0(X_1)\|\,\eta + \sup_{\theta\in B(\theta_0,\eta)} |\rho(X_1, \theta, \theta_0)|\Big) > r\Big) + 1 - p\Big)^n\\
&\quad\le \Big(r^{-1}\|\zeta\|_{2,P_{X,W}}\Big(\mathrm E\|\mu_0(X_1)\|\,\eta + \mathrm E\sup_{\theta\in B(\theta_0,\eta)} |\rho(X_1, \theta, \theta_0)|\Big) + 1 - p\Big)^n\\
&\quad\le (r^{-1}\|\zeta\|_{2,P_{X,W}}(\|(\mu_0'\mu_0)^{1/2}\|_{2,P_X} + 1)\eta + 1 - p)^n,
\end{aligned}
\]

where the second inequality is by Chebyshev and the third by Jensen and Assumption 3.1. Let r := cε and choose η small enough so that r^{−1}‖ζ‖_{2,P_{X,W}}(‖(μ_0'μ_0)^{1/2}‖_{2,P_X} + 1)η + 1 − p < 1. Then,

\[
\sum_{n=1}^\infty \Pr\big(\cap_{j=1}^n (\{d^2(\hat f_j, f_j; \hat f^r_j, f^r_j) > c\varepsilon\} \cup \{(Z_j, Z^r_j) \notin S_\pi\times\bar S_\pi\}) \cap \{\|\hat\theta - \theta_0\| < \eta\}\big) < \infty.
\]

Hence, by Borel–Cantelli,

\[
\begin{aligned}
&\Pr\big(\cap_{j=1}^n (\{d^2(\hat f_j, f_j; \hat f^r_j, f^r_j) > c\varepsilon\} \cup \{(Z_j, Z^r_j) \notin S_\pi\times\bar S_\pi\}) \cap \{\|\hat\theta - \theta_0\| < \eta\}\ \text{i.o.}\big) = 0\\
&\iff \Pr\big(\cup_{j=1}^n (\{d^2(\hat f_j, f_j; \hat f^r_j, f^r_j) \le c\varepsilon\} \cap \{(Z_j, Z^r_j) \in S_\pi\times\bar S_\pi\}) \cup \{\|\hat\theta - \theta_0\| \ge \eta\}\ \text{a.b.f.m. } n\big) = 1,
\end{aligned}
\]

where 'a.b.f.m. n' is short for 'all but finitely many n'. Therefore, since lim_{n→∞} Pr(‖θ̂ − θ_0‖ ≥ η) = 0 by Assumption 3.4, it follows that

\[
\lim_{n\to\infty} \Pr\big(\cup_{j=1}^n \{d(\hat f_j, f_j; \hat f^r_j, f^r_j) \le (c\varepsilon)^{1/2} \text{ and } (Z_j, Z^r_j) \in S_\pi\times\bar S_\pi\}\big) = 1. 
\]
∎
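The engine behind the preceding Borel–Cantelli step is elementary and worth seeing in numbers: once each i.i.d. draw lands in the good set with some fixed probability p > 0, the chance that none of the first n draws does so decays geometrically, so the corresponding series is summable. A toy sketch with an arbitrary p (the actual p depends on S_π × S̄_π and is not computed in the paper):

```python
# The counting step behind the Borel-Cantelli argument: with a per-draw
# "success" probability p > 0, Pr(no success among n draws) = (1 - p)^n,
# and sum_n (1 - p)^n = (1 - p) / p < infinity.
p = 0.05   # hypothetical Pr((Z_j, Z_j^r) in S_pi x S_pi-bar); any p > 0 works
for n in (10, 50, 100, 200):
    print(f"n = {n:3d}   Pr(no success) = {(1 - p) ** n:.3e}")
print(f"geometric tail sum = {(1 - p) / p:.1f}  (finite, so failure occurs only finitely often a.s.)")
```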
