Uniform Convergence of Semi-Nonparametric Density Functions and Their Derivatives, with Application to the Semi-Nonparametric Fractional Index Regression Model
Herman J. Bierens∗
Pennsylvania State University
June 10, 2018
Abstract

In this paper I will set forth conditions under which semi-nonparametric (SNP) density functions and their derivatives are uniformly convergent, and derive their rates of uniform convergence. These results will be applied to sieve least squares estimation of the SNP fractional index regression model, with asymptotic normality results based on the approach in Bierens (2014): "Consistency and Asymptotic Normality of Sieve ML Estimators Under Low-Level Conditions", Econometric Theory 30, 1021-1076.
1 Introduction
In Bierens (2014a,b) I have shown that any continuous density $h(u)$ on $(0,1)$ satisfying $h(u) > 0$ on $(0,1)$ can be approximated arbitrarily close by
$$h_n(u) = \frac{\left(1 + \sum_{k=1}^{n}\delta_k\sqrt{2}\cos(k\pi u)\right)^2}{1 + \sum_{m=1}^{n}\delta_m^2}, \qquad (1)$$
*Professor Emeritus of Economics, Pennsylvania State University. Please address correspondence by email only to [email protected], because I no longer have an office at Penn State. This paper was presented at the Academia Sinica, Taiwan.
in the sense that
$$\lim_{n\to\infty}\int_0^1 |h(u) - h_n(u)|\,du = 0, \qquad (2)$$
where
$$\delta_m = \frac{\int_0^1 \sqrt{2}\cos(m\pi u)\sqrt{h(u)}\,du}{\int_0^1 \sqrt{h(u)}\,du}, \quad m \in \mathbb{N}, \qquad (3)$$
with $\sum_{m=1}^{\infty}\delta_m^2 < \infty$. Moreover, in these two papers it has been shown that (2) is equivalent to $\lim_{n\to\infty} h_n(u) = h(u)$ a.e. on $[0,1]$, i.e., $\lim_{n\to\infty} h_n(u) = h(u)$ pointwise in $u$ in a subset of $[0,1]$ with Lebesgue measure 1.

Furthermore, given an a priori chosen absolutely continuous distribution function $G(x)$ with continuous density $g(x)$, support $\mathbb{R}$ and inverse $G^{-1}(u)$, any continuous density $f(x)$ with support $\mathbb{R}$ can be written as $f(x) = h(G(x))g(x)$, where
$$h(u) = f(G^{-1}(u))/g(G^{-1}(u)) \qquad (4)$$
is a continuous density on $(0,1)$ satisfying $h(u) > 0$ on $(0,1)$. Thus, denoting $f_n(x) = h_n(G(x))g(x)$, we have $\lim_{n\to\infty}\int_{-\infty}^{\infty}|f_n(x) - f(x)|\,dx = 0$ and thus $\lim_{n\to\infty} f_n(x) = f(x)$ a.e. on $\mathbb{R}$.

The questions I will address in this paper are:
• Under what conditions on $h(u)$ do we also have
$$\lim_{n\to\infty}\sup_{0\le u\le 1}|h_n(u) - h(u)| = 0,$$
$$\lim_{n\to\infty}\sup_{0\le u\le 1}\left|(d/du)^i h_n(u) - (d/du)^i h(u)\right| = 0,$$
for $i = 1, 2, \ldots, \ell$ and some $\ell \in \mathbb{N}$?

• Under what conditions on $f(x)$ and $g(x)$ are these conditions on $h(u)$ in (4) satisfied, so that
$$\lim_{n\to\infty}\sup_{x\in\mathbb{R}}|f_n(x) - f(x)| = 0,$$
$$\lim_{n\to\infty}\sup_{x\in\mathbb{R}}\left|(d/dx)^i f_n(x) - (d/dx)^i f(x)\right| = 0,$$
for $i = 1, 2, \ldots, \ell$ as well?

• What are the rates of uniform convergence?
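Before turning to these questions, the construction in (1) and (3) can be illustrated numerically. The sketch below is my own illustration, not from the paper; the Beta(2,2) test density is an arbitrary assumption. It computes the $\delta_m$'s of (3) by numerical integration, builds $h_n(u)$ as in (1), and checks that the $L^1$ error in (2) shrinks as $n$ grows.

```python
import numpy as np

def snp_deltas(h, n_max, grid=20001):
    """delta_m in (3): cosine coefficients of sqrt(h), normalized by its integral."""
    u = np.linspace(0.0, 1.0, grid)
    root_h = np.sqrt(h(u))
    trap = lambda y: float(np.sum((y[1:] + y[:-1]) * np.diff(u)) / 2.0)  # trapezoid rule
    denom = trap(root_h)
    return np.array([trap(np.sqrt(2.0) * np.cos(m * np.pi * u) * root_h) / denom
                     for m in range(1, n_max + 1)])

def snp_density(u, deltas):
    """h_n(u) in (1)."""
    m = np.arange(1, len(deltas) + 1)
    num = 1.0 + np.cos(np.pi * np.outer(u, m)) @ (np.sqrt(2.0) * deltas)
    return num**2 / (1.0 + np.sum(deltas**2))

h = lambda u: 6.0 * u * (1.0 - u)          # Beta(2,2) density on (0,1), an assumed example
u = np.linspace(0.0, 1.0, 5001)
deltas = snp_deltas(h, 20)
trap = lambda y: float(np.sum((y[1:] + y[:-1]) * np.diff(u)) / 2.0)
l1_err = lambda n: trap(np.abs(h(u) - snp_density(u, deltas[:n])))
print(l1_err(5), l1_err(20))               # the L1 error decreases with n
```

The normalization in (1) guarantees that $h_n$ integrates to one for any truncation of the $\delta_m$'s, which is what makes this parametrization convenient for sieve estimation.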
In Bierens (2014b) I have imposed these conditions by assuming that the parameters $\delta_m$ in (3) satisfy $\sum_{m=1}^{\infty} m^{\ell}|\delta_m| < \infty$ for some $\ell \in \mathbb{N}$. However, as will be shown, the latter condition can be derived from smoothness and moment conditions on $f$ and $g$.

The answers to these questions play a key role in deriving the consistency and asymptotic normality of sieve estimators of semi-nonparametric (SNP) models, as will be shown for the SNP fractional index regression model. Recall that in general SNP models are models for which the functional form is only partly parametrized and where the non-specified parts are unknown functions which are approximated by series expansions.

To motivate my answers to the above questions, I will give in section 2 a brief review of the key elements of Hilbert spaces of square-integrable functions, and in section 3 I will briefly review the various ways density functions can be approximated by series expansions.

In section 4 I will discuss the core idea of attaining uniform convergence of functions in the Hilbert space $L^2(0,1)$ of square-integrable functions on $[0,1]$. In particular, given a four times differentiable real function $\varphi$ on $(0,1)$ such that $\varphi'''' \in L^2(0,1)$, I will derive the series expansions of $\varphi$, $\varphi'$, $\varphi''$ and $\varphi'''$ in terms of the cosine and sine sequences and show that they converge uniformly on $[0,1]$. It appears that the fastest rates of uniform convergence are attained if the tail conditions
$$\varphi'(0) = \varphi'(1) = 0 \text{ and } \varphi'''(0) = \varphi'''(1) = 0 \qquad (5)$$
hold.

Given that these conditions hold for $\varphi(u) = \sqrt{h(u)}$, where $h(u)$ is a density on $[0,1]$, I will show in section 5 that the series expansions of $h(u)$ and its derivatives $h'(u)$, $h''(u)$ and $h'''(u)$ converge uniformly on $[0,1]$ as well, and that the $\delta_m$'s defined by (3) satisfy $\sum_{m=1}^{\infty} m^3|\delta_m| < \infty$.

In section 6 I will set forth conditions on $f$ and $g$ in (4) such that, with $\varphi(u) = \sqrt{h(u)}$, the conditions $\varphi'''' \in L^2(0,1)$ and (5) hold, and in section 7 I will specialize these results to densities on $\mathbb{R}$ and their derivatives.

The infinite-dimensional parameter space of the $\delta_m$'s in (3) is now
$$\Delta_3 = \left\{\delta = \{\delta_m\}_{m=1}^{\infty} : \sum_{m=1}^{\infty} m^3|\delta_m| < \infty\right\}.$$
Endowing $\Delta_3$ with the norm $||\delta||_3 = \sum_{m=1}^{\infty} m^3|\delta_m|$ and associated metric, it becomes a Banach space. In section 8 it will be shown that for given
$M \in (0,\infty)$ the subspace $\Delta_3(M) = \{\delta = \{\delta_m\}_{m=1}^{\infty} : \sum_{m=1}^{\infty} m^3|\delta_m| \le M\}$ of $\Delta_3$ is compact. This result is very convenient in SNP modeling and inference, as will be demonstrated in section 9.

In section 9 I will apply these results to the SNP fractional index regression model, where the dependent variable $Y$ is a fraction, so that $\Pr[Y \in (0,1)] = 1$, and given a vector $X$ of covariates, $E[Y|X] = F_0(\lambda(X,\beta_0))$ a.s., where $\lambda(x,\beta)$ is a given linear parametric index function¹ and $F_0$ is an unknown absolutely continuous distribution function with support $\mathbb{R}$. It will be shown that $\beta_0$ and $F_0$ can be estimated consistently by sieve nonlinear regression, similar to standard parametric nonlinear regression. Moreover, using my approach in Bierens (2014b), I will show that the sieve estimator of $\beta_0$ is asymptotically zero-mean normal at the usual parametric rate.

In section 10 I will make a few concluding remarks, and the last section 11 is an appendix containing proofs.
2 Hilbert spaces of functions: A brief review

Let $w(x)$ be a given density on $\mathbb{R}$. Consider the space $L^2(w)$ of real functions $f(x)$ on $\mathbb{R}$ satisfying
$$\int f(x)^2 w(x)\,dx < \infty.$$
Endow the space $L^2(w)$ with the inner product
$$\langle f, g\rangle = \int f(x)g(x)w(x)\,dx$$
and associated norm $||f|| = \sqrt{\langle f, f\rangle}$ and metric $||f - g||$. Then $L^2(w)$ is a Hilbert space. Recall that in general

A Hilbert space $H$ is a vector space endowed with an inner product and associated norm and metric such that every Cauchy sequence in $H$ takes a limit in $H$.

In particular, for any sequence $f_n \in L^2(w)$ satisfying
$$\lim_{\min(m,k)\to\infty}||f_m - f_k|| = 0$$

¹I.e., given $x$, $\lambda(x,\beta)$ is linear in $\beta$, and given $\beta$, $\lambda(x,\beta)$ is linear in $x$.
(so that $f_n$ is a Cauchy sequence) it can be shown that there exists a function $f \in L^2(w)$ such that
$$\lim_{n\to\infty}||f_n - f|| = 0.$$
Thus, $L^2(w)$ is a Hilbert space.
2.1 Complete orthonormal sequences

Let $\{\varphi_j(x)\}_{j=0}^{\infty}$ be an orthonormal sequence in $L^2(w)$:
$$\langle\varphi_i, \varphi_j\rangle = \int \varphi_i(x)\varphi_j(x)w(x)\,dx = I(i = j),$$
where $I(.)$ is the indicator function. Let $f_n(x)$ be the linear projection of $f(x)$ on $\{\varphi_j(x)\}_{j=0}^{n}$:
$$f_n(x) = \sum_{j=0}^{n}\gamma_j\varphi_j(x), \text{ where } ||f - f_n||^2 \text{ is minimal.}$$
The coefficients $\gamma_j$ for which this is the case are
$$\gamma_j = \langle f, \varphi_j\rangle = \int f(x)\varphi_j(x)w(x)\,dx, \quad \sum_{j=0}^{\infty}\gamma_j^2 < \infty.$$
The $\gamma_j$'s involved are called the Fourier coefficients of $f(x)$.

An orthonormal sequence $\{\varphi_j(x)\}_{j=0}^{\infty}$ in $L^2(w)$ is called complete if for arbitrary $f \in L^2(w)$,
$$\lim_{n\to\infty}||f - f_n|| = \sqrt{\lim_{n\to\infty}\int(f(x) - f_n(x))^2 w(x)\,dx} = 0, \qquad (6)$$
where
$$f_n(x) = \sum_{j=0}^{n}\gamma_j\varphi_j(x) \text{ with } \gamma_j = \langle f, \varphi_j\rangle.$$
Moreover, since $f_n(x)$ is a projection with residual $r_n(x) = f(x) - f_n(x)$, we must have
$$\langle r_n, f_n\rangle = 0,$$
hence by (6),
$$\int f(x)^2 w(x)\,dx = \int f_n(x)^2 w(x)\,dx + \int r_n(x)^2 w(x)\,dx = \sum_{j=0}^{n}\gamma_j^2 + \int(f(x) - f_n(x))^2 w(x)\,dx = \sum_{j=0}^{\infty}\gamma_j^2,$$
and
$$\int(f(x) - f_n(x))^2 w(x)\,dx = \sum_{j=n+1}^{\infty}\gamma_j^2.$$
Furthermore, note that (6) implies that $f(x) = \lim_{n\to\infty} f_n(x)$ a.e. in the support of $w(x)$, i.e.,
$$f(x) = \lim_{n\to\infty} f_n(x)$$
pointwise in $x$ in a set $X \subset \{x \in \mathbb{R} : w(x) > 0\}$ satisfying $\int_X w(x)\,dx = 1$. See Bierens (2014a, Theorem 9).

Finally, note that the rate of convergence may depend on $x$, i.e., for every $\varepsilon > 0$ and $x \in X$ there exists an $n_0(\varepsilon, x) \in \mathbb{N}$ such that
$$|f(x) - f_n(x)| < \varepsilon \text{ if } n > n_0(\varepsilon, x).$$
2.2 Examples of complete orthonormal sequences

2.2.1 Hermite polynomials

In the case
$$w(x) = \exp(-x^2/2)/\sqrt{2\pi}, \quad x \in \mathbb{R},$$
the Hermite polynomials form a complete orthonormal sequence in the corresponding Hilbert space $L^2(w)$. Note that the Hermite polynomials $\varphi_k(x)$, $k \ge 0$, on $\mathbb{R}$ can be generated recursively by
$$\sqrt{k+1}\,\varphi_{k+1}(x) - x\varphi_k(x) + \sqrt{k}\,\varphi_{k-1}(x) = 0, \quad k \ge 1,$$
starting from $\varphi_0(x) = 1$, $\varphi_1(x) = x$. See, for example, Hamming (1973).
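As a quick sanity check (my own sketch, not from the paper), the recursion can be coded directly; with the stated starting values it reproduces, e.g., $\varphi_2(x) = (x^2-1)/\sqrt{2}$ and $\varphi_3(x) = (x^3-3x)/\sqrt{6}$.

```python
import math

def hermite(k, x):
    """Orthonormal Hermite polynomials via
    sqrt(k+1) phi_{k+1}(x) - x phi_k(x) + sqrt(k) phi_{k-1}(x) = 0,
    starting from phi_0(x) = 1, phi_1(x) = x."""
    if k == 0:
        return 1.0
    prev, cur = 1.0, x
    for j in range(1, k):
        prev, cur = cur, (x * cur - math.sqrt(j) * prev) / math.sqrt(j + 1)
    return cur

# e.g. phi_2(x) = (x^2 - 1)/sqrt(2)
print(hermite(2, 1.7), (1.7**2 - 1) / math.sqrt(2))
```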
2.2.2 Legendre polynomials

In the case that $w(u)$ is the uniform density on $[0,1]$,
$$w(u) = I(0 \le u \le 1),$$
the Hilbert space $L^2(w)$ involved is denoted by $L^2(0,1)$.

The Legendre polynomials form a complete orthonormal sequence in $L^2(0,1)$. These polynomials, $\varphi_k(u)$, say, can be generated recursively by
$$\frac{(k+1)/2}{\sqrt{2k+3}\sqrt{2k+1}}\varphi_{k+1}(u) + (0.5 - u)\varphi_k(u) + \frac{k/2}{\sqrt{2k+1}\sqrt{2k-1}}\varphi_{k-1}(u) = 0, \quad k \ge 1,$$
starting from $\varphi_0(u) = 1$, $\varphi_1(u) = \sqrt{3}(2u - 1)$. See again Hamming (1973).
2.2.3 Trigonometric series

Other complete orthonormal sequences in $L^2(0,1)$ are:

• the cosine series,
$$\varphi_k(u) = \begin{cases} 1 & \text{for } k = 0, \\ \sqrt{2}\cos(k\pi u) & \text{for } k \in \mathbb{N}; \end{cases}$$

• the Fourier series,
$$\varphi_0(u) = 1, \quad \varphi_{2k-1}(u) = \sqrt{2}\sin(2k\pi u), \quad \varphi_{2k}(u) = \sqrt{2}\cos(2k\pi u), \quad k \in \mathbb{N};$$

• the sine series,
$$\varphi_k(u) = \sqrt{2}\sin(k\pi u), \quad k \in \mathbb{N}.$$
2.3 How well (or bad) do the trigonometric series fit?

In the cases of the cosine and Fourier series I will use the test function
$$f(u) = u(4 - 3u), \quad 0 \le u \le 1,$$
whereas in the sine case I will use the derivative of $f(u)$ as test function:
$$f'(u) = 4 - 6u, \quad 0 \le u \le 1.$$
Since $f(u)$ is a polynomial of order 2, we can represent $f(u)$ exactly by a linear combination of the first three Legendre polynomials, i.e.,
$$f(u) \equiv \alpha_0\varphi_0(u) + \alpha_1\varphi_1(u) + \alpha_2\varphi_2(u),$$
where
$$\varphi_0(u) = 1, \quad \varphi_1(u) = \sqrt{3}(2u - 1), \quad \varphi_2(u) = \sqrt{5}(6u^2 - 6u + 1)$$
and
$$\alpha_m = \int_0^1 f(u)\varphi_m(u)\,du, \quad m = 0, 1, 2.$$
However, by nature the trigonometric series "wobble" with increasing frequency, so the question is whether these series are capable of approximating a smooth function like $f(u) = u(4 - 3u)$, using a modest number of leading elements of these series.
2.3.1 The cosine series

The SNP version of $f(u) = u(4 - 3u)$, $0 \le u \le 1$, takes the form
$$f_n(u) = \alpha_0 + \sum_{m=1}^{n}\alpha_m\sqrt{2}\cos(m\pi u), \quad n \in \mathbb{N},$$
where
$$\alpha_0 = \int_0^1 f(u)\,du = 1,$$
$$\alpha_m = \int_0^1 f(u)\sqrt{2}\cos(m\pi u)\,du = \begin{cases} -6\sqrt{2}(m\pi)^{-2} & \text{if } m \ge 2 \text{ is even}, \\ -2\sqrt{2}(m\pi)^{-2} & \text{if } m \ge 1 \text{ is odd}. \end{cases} \qquad (7)$$
Note that in this case
$$\sup_{0\le u\le 1}|f(u) - f_n(u)| \le \frac{6\sqrt{2}}{\pi^2}\sum_{m=n+1}^{\infty} m^{-2} = O(1/n).$$
The function $f(u)$ will be compared with its SNP version $f_n(u)$ for $n = 10$ and $n = 20$, respectively.

Clearly, $f_n(u)$ for $n = 10$ approximates $f(u)$ very well, except in the tails of $f_n(u)$. The reason for the latter is that $f_n'(u) = -\pi\sum_{m=1}^{n}\alpha_m m\sqrt{2}\sin(m\pi u)$, so that $f_n'(0) = f_n'(1) = 0$. As expected, the tail fit becomes better for larger truncation orders $n$.
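The closed form (7) and the $O(1/n)$ bound are easy to check numerically. The following sketch is my own illustration, not the paper's; it compares the truncated cosine series with $f(u) = u(4-3u)$ for $n = 10$ and $n = 20$.

```python
import math

def alpha(m):
    """alpha_m in (7) for f(u) = u(4 - 3u)."""
    if m == 0:
        return 1.0
    c = -6.0 if m % 2 == 0 else -2.0
    return c * math.sqrt(2.0) / (m * math.pi)**2

def f_cos(u, n):
    """Truncated cosine series f_n(u)."""
    return alpha(0) + sum(alpha(m) * math.sqrt(2.0) * math.cos(m * math.pi * u)
                          for m in range(1, n + 1))

f = lambda u: u * (4.0 - 3.0 * u)
grid = [i / 1000.0 for i in range(1001)]
sup_err = lambda n: max(abs(f(u) - f_cos(u, n)) for u in grid)
print(sup_err(10), sup_err(20))  # the (endpoint) fit improves with n
```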
2.3.2 The Fourier series

In the case of the Fourier series the SNP version of $f(u) = u(4 - 3u)$ takes the form:
$$f_n(u) = \gamma_0 + \sum_{k=1}^{n/2}\gamma_{2k}\sqrt{2}\cos(2k\pi u) + \sum_{k=1}^{n/2}\gamma_{2k-1}\sqrt{2}\sin(2k\pi u),$$
for even $n$, where
$$\gamma_0 = \int_0^1 f(u)\,du = 1,$$
$$\gamma_{2k} = \int_0^1 f(u)\sqrt{2}\cos(2k\pi u)\,du = -\left(3/\sqrt{2}\right)(k\pi)^{-2},$$
$$\gamma_{2k-1} = \int_0^1 f(u)\sqrt{2}\sin(2k\pi u)\,du = -\left(\sqrt{2}\,k\pi\right)^{-1}.$$
As we will see, the fit of the Fourier series approximation for $n = 10$ is pretty bad compared with the cosine series approximations, especially in the tails. For $n = 20$ the fit is slightly better, as expected, but still inferior to the cosine case.

This bad fit may be due to the slower rate of convergence to zero of $\gamma_{2k-1}$, i.e.,
$$\gamma_{2k-1} = \int_0^1 f(u)\sqrt{2}\sin(2k\pi u)\,du = O(k^{-1}),$$
compared with
$$\alpha_{2k-1} = \int_0^1 f(u)\sqrt{2}\cos((2k-1)\pi u)\,du = O(k^{-2})$$
in the cosine case, whereas $\gamma_{2k}$ and $\alpha_{2k}$ are both of order $O(k^{-2})$.
2.3.3 The sine series

The SNP version of $f'(u) = 4 - 6u$ is just the derivative of $f_n(u)$ in the cosine case:
$$f_n'(u) = -\pi\sum_{m=1}^{n} m\alpha_m\sqrt{2}\sin(m\pi u),$$
where $\alpha_m$ is the same as in (7), hence
$$\int_0^1 (f'(u) - f_n'(u))^2\,du = \pi^2\sum_{m=n+1}^{\infty} m^2\alpha_m^2 = O\left(\sum_{m=n+1}^{\infty} m^{-2}\right) = O(1/n) = o(1),$$
and thus $\lim_{n\to\infty} f_n'(u) = f'(u)$ pointwise in $u \in A \subset [0,1]$, where the set $A$ has Lebesgue measure 1. Clearly, the set $A$ does not contain the singletons $\{0\}$ and $\{1\}$ because $f'(0) = 4$, $f'(1) = -2$, whereas $f_n'(0) = f_n'(1) = 0$.
To see what happens as $n$ becomes much larger, consider the case $n = 50$. Apparently, the pointwise rate of convergence of $f_n'(u)$ to $f'(u)$ for $u$ close to 0 or 1 seems to slow down.
3 Densities and distribution functions

3.1 Series representation of densities

Given a density $w(x)$ with support $(a,b) \subseteq \mathbb{R}$ and corresponding complete orthonormal sequence $\varphi_j(x)$, every density function $f(x)$ with support $(a,b)$ can be written as
$$f(x) = \lim_{n\to\infty} f_n(x) \text{ a.e. on } (a,b),$$
where
$$f_n(x) = \frac{w(x)\left(\sum_{j=0}^{n}\gamma_j\varphi_j(x)\right)^2}{\sum_{j=0}^{n}\gamma_j^2} \quad \text{with} \quad \sum_{j=0}^{\infty}\gamma_j^2 = \int_a^b f(x)\,dx = 1.$$
However, there are uncountably many sequences $\gamma_j$ for which this is true, due to the square in the expression for $f_n(x)$. In particular, let for an arbitrary $\tau \in (a,b)$,
$$g(x|\tau) = (I(x \le \tau) - I(x > \tau))\sqrt{f(x)/w(x)},$$
where $I(.)$ is the indicator function. Then $f(x) = w(x)g(x|\tau)^2$, hence $g(x|\tau) \in L^2(w)$. Consequently, denoting
$$f_n(x|\tau) = \frac{w(x)\left(\sum_{j=0}^{n}\gamma_j(\tau)\varphi_j(x)\right)^2}{\sum_{j=0}^{n}\gamma_j(\tau)^2},$$
where
$$\gamma_j(\tau) = \int_a^b g(x|\tau)\varphi_j(x)w(x)\,dx = \int_a^{\tau}\varphi_j(x)\sqrt{f(x)}\sqrt{w(x)}\,dx - \int_{\tau}^{b}\varphi_j(x)\sqrt{f(x)}\sqrt{w(x)}\,dx,$$
we have
$$f(x) = \lim_{n\to\infty} f_n(x|\tau) \text{ a.e. on } (a,b)$$
as well.

This result can be used to reparametrize the $\gamma_j$'s more conveniently, provided that $\varphi_0(x) \equiv 1$, which is usually the case. (An exception is the sine series.) Then we can always choose
$$\gamma_0 \in \left(0, \int_a^b \sqrt{f(x)}\sqrt{w(x)}\,dx\right),$$
so that the condition $\sum_{j=0}^{\infty}\gamma_j^2 = 1$ can be implemented by reparametrizing the $\gamma_j$'s as
$$\gamma_0 = \frac{1}{\sqrt{1 + \sum_{m=1}^{\infty}\delta_m^2}}, \quad \gamma_j = \frac{\delta_j}{\sqrt{1 + \sum_{m=1}^{\infty}\delta_m^2}}, \quad j \ge 1,$$
where
$$\sum_{m=1}^{\infty}\delta_m^2 < \infty.$$
This reparametrization is merely a convenient way to implement the condition $\sum_{j=0}^{\infty}\gamma_j^2 = 1$, and does not solve the lack of uniqueness problem. However, if $w(x)$, $f(x)$ and the $\varphi_m(x)$'s are continuous on $(a,b)$ then the sequence $\{\delta_m\}_{m=1}^{\infty}$ is unique:

Theorem 3.1. Given a density $w(x)$ which is continuous on its support $(a,b) \subseteq \mathbb{R}$, and corresponding complete orthonormal sequence $\{\varphi_m(x)\}_{m=0}^{\infty}$ in $L^2(w)$, with $\varphi_0(x) \equiv 1$, such that for each $m \in \mathbb{N}$, $\varphi_m(x)$ is continuous on $(a,b)$,² for every density function $f(x)$ which is continuous on its support $(a,b)$ there exists a unique sequence $\{\delta_m\}_{m=1}^{\infty}$, given by
$$\delta_m = \frac{\int_a^b \varphi_m(x)\sqrt{w(x)}\sqrt{f(x)}\,dx}{\int_a^b \sqrt{w(x)}\sqrt{f(x)}\,dx}, \quad m \in \mathbb{N},$$
and satisfying $\sum_{m=1}^{\infty}\delta_m^2 < \infty$, such that
$$f(x) = \lim_{n\to\infty} f_n(x) \text{ a.e. on } (a,b),$$
or equivalently
$$\lim_{n\to\infty}\int_a^b |f(x) - f_n(x)|\,dx = 0,$$
where
$$f_n(x) = w(x)\frac{\left(1 + \sum_{k=1}^{n}\delta_k\varphi_k(x)\right)^2}{1 + \sum_{m=1}^{n}\delta_m^2}.$$
Proof. See Bierens (2014a, Theorem 21).
Gallant and Nychka (1987) use this approach to generalize the standard normal density to
$$f_n(x) = \frac{\exp(-x^2/2)}{\sqrt{2\pi}} \times \frac{\left(1 + \sum_{k=1}^{n}\delta_k\varphi_k(x)\right)^2}{1 + \sum_{m=1}^{n}\delta_m^2},$$
where the $\varphi_k(x)$'s are the Hermite polynomials. They call these densities semi-nonparametric (SNP) densities. In particular, it follows from Theorem 3.1 that for every continuous density $f(x)$ with support $\mathbb{R}$ there exists a unique sequence $\{\delta_m\}_{m=1}^{\infty}$ satisfying $\sum_{m=1}^{\infty}\delta_m^2 < \infty$ such that $\lim_{n\to\infty}\int_{-\infty}^{\infty}|f(x) - f_n(x)|\,dx = 0$, hence $f_n(x) \to f(x)$ a.e. on $\mathbb{R}$.

²Which is usually the case.
3.2 Density and distribution functions on the unit interval

Let $G(x)$ be an a priori chosen distribution function with density $g(x)$ and support
$$X = \{x \in \mathbb{R} : g(x) > 0\}.$$
Every absolutely continuous distribution function $F(x)$ with support $X$ can be written as
$$F(x) = H(G(x)),$$
where $H(u)$ is an absolutely continuous distribution function on $[0,1]$, namely
$$H(u) = F(G^{-1}(u)),$$
where $G^{-1}(u)$, $u \in [0,1]$, is the inverse of $G(x)$. The density $f(x)$ of $F(x) = H(G(x))$ can be written as
$$f(x) = h(G(x))g(x),$$
where $h(u)$ is the density of $H(u)$:
$$h(u) = f(G^{-1}(u))/g(G^{-1}(u)).$$
Therefore, in modeling general density and distribution functions semi-nonparametrically, it suffices to model the density $h(u)$ and its c.d.f. $H(u)$ semi-nonparametrically.

From now onwards I will use the cosine sequence as the preferred complete orthonormal sequence in $L^2(0,1)$. Then Theorem 3.1 translates as follows.

Theorem 3.2. For every continuous density function $h(u)$ on $(0,1)$ satisfying $h(u) > 0$ on $(0,1)$ there exists a unique sequence $\{\delta_m\}_{m=1}^{\infty}$ defined by (3) such that $h(u) = \lim_{n\to\infty} h_n(u)$ a.e. on $[0,1]$, or equivalently, $\lim_{n\to\infty}\int_0^1 |h(u) - h_n(u)|\,du = 0$, where $h_n(u)$ is the corresponding SNP density, defined by (1).
The advantage of the cosine sequence is that then the SNP distribution function $H_n(u) = \int_0^u h_n(z)\,dz$ has a closed form expression:
$$H_n(u) = u + \frac{1}{1 + \sum_{m=1}^{n}\delta_m^2}\Bigg[2\sqrt{2}\sum_{k=1}^{n}\delta_k\frac{\sin(k\pi u)}{k\pi} + \sum_{m=1}^{n}\delta_m^2\frac{\sin(2m\pi u)}{2m\pi}$$
$$+ 2\sum_{k=2}^{n}\sum_{m=1}^{k-1}\delta_k\delta_m\frac{\sin((k+m)\pi u)}{(k+m)\pi} + 2\sum_{k=2}^{n}\sum_{m=1}^{k-1}\delta_k\delta_m\frac{\sin((k-m)\pi u)}{(k-m)\pi}\Bigg].$$
Of course, a similar closed form expression can be derived for the Fourier sequence. However, as has been demonstrated, the cosine sequence yields the best fit.
4 Uniform convergence by integration

Let $\varphi(u)$ be a real function on $(0,1)$ satisfying $\int_0^1 \varphi(u)^2\,du < \infty$, so that $\varphi(u) \in L^2(0,1)$. Recall that by the completeness of the cosine series we have $\lim_{n\to\infty}\varphi_n(u) = \varphi(u)$ a.e. on $(0,1)$, where $\varphi_n(u) = \gamma_0 + \sum_{m=1}^{n}\gamma_m\sqrt{2}\cos(m\pi u)$ with $\gamma_0 = \int_0^1 \varphi(u)\,du$, $\gamma_m = \int_0^1 \sqrt{2}\cos(m\pi u)\varphi(u)\,du$ for $m \in \mathbb{N}$.

The questions I will now address are: Under what conditions do we have
$$\lim_{n\to\infty}\sup_{0\le u\le 1}|\varphi_n(u) - \varphi(u)| = 0, \qquad (8)$$
$$\lim_{n\to\infty}\sup_{0\le u\le 1}\left|d^i\varphi_n(u)/(du)^i - d^i\varphi(u)/(du)^i\right| = 0 \text{ for } i = 1, 2, \ldots, \ell \text{ and some } \ell \in \mathbb{N}, \qquad (9)$$
and what are the rates of uniform convergence?
4.1 The case ℓ = 1

Suppose that $\varphi(u)$ is twice differentiable on $(0,1)$, and that $\varphi''(u) \in L^2(0,1)$. Denote
$$\alpha_0 = \int_0^1 \varphi''(u)\,du = \varphi'(1) - \varphi'(0),$$
$$\alpha_m = \int_0^1 \varphi''(u)\sqrt{2}\cos(m\pi u)\,du \text{ for } m \in \mathbb{N},$$
$$\varphi_n''(u) = \alpha_0 + \sum_{m=1}^{n}\alpha_m\sqrt{2}\cos(m\pi u) \text{ for } n \in \mathbb{N}.$$
Recall from section 2.1 that
$$\int_0^1 \varphi''(u)^2\,du = \sum_{m=0}^{\infty}\alpha_m^2$$
and
$$\int_0^1 (\varphi_n''(u) - \varphi''(u))^2\,du = \sum_{m=n+1}^{\infty}\alpha_m^2 = o(1).$$
Note that
$$|\varphi'(1) - \varphi'(0)| = \left|\int_0^1 \varphi''(u)\,du\right| \le \int_0^1 |\varphi''(u)|\,du \le \sqrt{\int_0^1 \varphi''(u)^2\,du} < \infty, \qquad (10)$$
and similarly, for any $u \in (0,1)$,
$$|\varphi'(u) - \varphi'(0)| = \left|\int_0^u \varphi''(v)\,dv\right| \le \int_0^1 |\varphi''(v)|\,dv < \infty. \qquad (11)$$
Since $\varphi'(u)$ is differentiable and therefore continuous on $(0,1)$, it is uniformly continuous on any closed interval in $(0,1)$, so that $|\varphi'(u)| < \infty$ for each $u \in (0,1)$. Consequently, it follows from (11) that $|\varphi'(0)| < \infty$, which by (10) implies that $|\varphi'(1)| < \infty$. Thus,
$$\varphi'(0) \in \mathbb{R}, \quad \varphi'(1) \in \mathbb{R},$$
which implies that $\varphi'(u)$ is uniformly continuous on $[0,1]$.

Now the primitive of $\varphi_n''(u)$ takes the general form
$$\varphi_n'(u) = c + (\varphi'(1) - \varphi'(0))u + \sum_{m=1}^{n}\frac{\alpha_m}{\pi m}\sqrt{2}\sin(m\pi u)$$
for some constant $c$. Since $\varphi_n'(1) = \varphi'(1)$ and $\varphi_n'(0) = \varphi'(0)$ if $c = \varphi'(0)$, the latter is a natural choice for $c$, so that
$$\varphi_n'(u) = \varphi'(0) + (\varphi'(1) - \varphi'(0))u + \sum_{m=1}^{n}\frac{\alpha_m}{\pi m}\sqrt{2}\sin(m\pi u) = \varphi'(0) + \int_0^u \varphi_n''(v)\,dv.$$
Since also $\varphi'(u) = \varphi'(0) + \int_0^u \varphi''(v)\,dv$ we have
$$\limsup_{n\to\infty}\sup_{0\le u\le 1}|\varphi'(u) - \varphi_n'(u)| \le \limsup_{n\to\infty}\int_0^1 |\varphi''(v) - \varphi_n''(v)|\,dv \le \limsup_{n\to\infty}\sqrt{\int_0^1 (\varphi''(v) - \varphi_n''(v))^2\,dv} = \sqrt{\lim_{n\to\infty}\sum_{m=n+1}^{\infty}\alpha_m^2} = 0.$$
Consequently, the series expansion
$$\varphi'(u) \equiv \varphi'(0) + (\varphi'(1) - \varphi'(0))u + \sum_{m=1}^{\infty}\frac{\alpha_m}{\pi m}\sqrt{2}\sin(m\pi u) \qquad (12)$$
holds exactly and uniformly on $[0,1]$.

As to the rate of uniform convergence, note that
$$\sup_{0\le u\le 1}|\varphi'(u) - \varphi_n'(u)| \le \sqrt{2}\sum_{m=n+1}^{\infty}\frac{|\alpha_m|}{\pi m} = o(n^{-1/2}),$$
where the latter is due to the following lemma:
Lemma 4.1. $\sum_{m=1}^{\infty}\alpha_m^2 < \infty$ implies that for $c > 1/2$,
$$\sum_{m=n+1}^{\infty} m^{-c}|\alpha_m| = o\left(n^{1/2-c}\right).$$
Proof. See the Appendix.
Next, the primitive $\varphi(u)$ of $\varphi'(u)$ takes the general form
$$\varphi(u) = c + \varphi'(0)u + \frac{1}{2}(\varphi'(1) - \varphi'(0))u^2 - \sum_{m=1}^{\infty}\frac{\alpha_m}{(\pi m)^2}\sqrt{2}\cos(m\pi u)$$
for some constant $c$. But then
$$\int_0^1 \varphi(v)\,dv = c + \frac{1}{2}\varphi'(0) + \frac{1}{6}(\varphi'(1) - \varphi'(0)),$$
so that
$$\varphi(u) \equiv \int_0^1 \varphi(v)\,dv - \frac{1}{2}\varphi'(0) - \frac{1}{6}(\varphi'(1) - \varphi'(0)) + \varphi'(0)u + \frac{1}{2}(\varphi'(1) - \varphi'(0))u^2 - \sum_{m=1}^{\infty}\frac{\alpha_m}{(\pi m)^2}\sqrt{2}\cos(m\pi u)$$
exactly and uniformly on $[0,1]$.

Moreover, it can be shown that for $m \in \mathbb{N}$,
$$\int_0^1 u\sqrt{2}\cos(m\pi u)\,du = \sqrt{2}\,\frac{(-1)^m - 1}{(m\pi)^2}, \quad \int_0^1 u^2\sqrt{2}\cos(m\pi u)\,du = \frac{2\sqrt{2}(-1)^m}{(m\pi)^2},$$
hence, since $\lim_{n\to\infty}\sum_{m=n+1}^{\infty} m^{-2} = 0$, it follows that
$$u \equiv \frac{1}{2} + \sqrt{2}\sum_{m=1}^{\infty}\frac{(-1)^m - 1}{(m\pi)^2}\sqrt{2}\cos(m\pi u),$$
$$u^2 \equiv \frac{1}{3} + 2\sqrt{2}\sum_{m=1}^{\infty}\frac{(-1)^m}{(m\pi)^2}\sqrt{2}\cos(m\pi u),$$
exactly and uniformly on $[0,1]$. Thus,
$$\varphi(u) \equiv \int_0^1 \varphi(v)\,dv + \varphi'(0)\sqrt{2}\sum_{m=1}^{\infty}\frac{(-1)^m - 1}{(m\pi)^2}\sqrt{2}\cos(m\pi u) + (\varphi'(1) - \varphi'(0))\sqrt{2}\sum_{m=1}^{\infty}\frac{(-1)^m}{(m\pi)^2}\sqrt{2}\cos(m\pi u) - \sum_{m=1}^{\infty}\frac{\alpha_m}{(\pi m)^2}\sqrt{2}\cos(m\pi u)$$
exactly and uniformly on $[0,1]$.

However, since $\varphi \in L^2(0,1)$, $\varphi(u)$ also has the cosine series representation
$$\varphi(u) = \gamma_0 + \sum_{m=1}^{\infty}\gamma_m\sqrt{2}\cos(m\pi u) \text{ a.e. on } (0,1),$$
where
$$\gamma_0 = \int_0^1 \varphi(u)\,du, \quad \gamma_m = \int_0^1 \varphi(u)\sqrt{2}\cos(m\pi u)\,du \text{ for } m \in \mathbb{N}.$$
Therefore we must have that for $m \in \mathbb{N}$,
$$\gamma_m = \int_0^1 \varphi(u)\sqrt{2}\cos(m\pi u)\,du = \frac{\sqrt{2}\left(\varphi'(1)(-1)^m - \varphi'(0)\right) - \alpha_m}{(m\pi)^2}.$$
Consequently, $\varphi(u)$ has two equivalent cosine series representations, namely
$$\varphi(u) \equiv \int_0^1 \varphi(v)\,dv - \frac{1}{2}\varphi'(0) - \frac{1}{6}(\varphi'(1) - \varphi'(0)) + \varphi'(0)u + \frac{1}{2}(\varphi'(1) - \varphi'(0))u^2 - \sum_{m=1}^{\infty}\frac{\alpha_m}{(\pi m)^2}\sqrt{2}\cos(m\pi u)$$
$$\equiv \int_0^1 \varphi(v)\,dv + \sum_{m=1}^{\infty}\gamma_m\sqrt{2}\cos(m\pi u),$$
exactly and uniformly on $[0,1]$.

However, there is a difference between these two expressions for $\varphi(u)$, namely their rates of uniform convergence are different! To see this, denote
$$\varphi_n(u) = \int_0^1 \varphi(v)\,dv - \frac{1}{2}\varphi'(0) - \frac{1}{6}(\varphi'(1) - \varphi'(0)) + \varphi'(0)u + \frac{1}{2}(\varphi'(1) - \varphi'(0))u^2 - \sum_{m=1}^{n}\frac{\alpha_m}{(\pi m)^2}\sqrt{2}\cos(m\pi u).$$
Then we have
$$\sup_{0\le u\le 1}|\varphi_n(u) - \varphi(u)| \le \frac{\sqrt{2}}{\pi^2}\sum_{m=n+1}^{\infty}\frac{|\alpha_m|}{m^2} = o\left(n^{-3/2}\right),$$
where the latter is again due to Lemma 4.1. On the other hand, denoting
$$\varphi_n^*(u) = \int_0^1 \varphi(v)\,dv + \sum_{m=1}^{n}\gamma_m\sqrt{2}\cos(m\pi u),$$
we have
$$\sup_{0\le u\le 1}|\varphi_n^*(u) - \varphi(u)| \le \sqrt{2}\sum_{m=n+1}^{\infty}|\gamma_m| \le \frac{2}{\pi^2}(|\varphi'(1)| + |\varphi'(0)|)\sum_{m=n+1}^{\infty} m^{-2} + \frac{\sqrt{2}}{\pi^2}\sum_{m=n+1}^{\infty} m^{-2}|\alpha_m|$$
$$= \begin{cases} o\left(n^{-3/2}\right) & \text{if } \varphi'(0) = \varphi'(1) = 0, \\ O(n^{-1}) & \text{if } \varphi'(0) \ne 0 \text{ or } \varphi'(1) \ne 0. \end{cases}$$
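For the test function $f(u) = u(4-3u)$ of section 2.3 this contrast is extreme (my own sketch, not the paper's): since $\varphi'' \equiv -6$ there, $\alpha_0 = -6$ and $\alpha_m = 0$ for all $m \ge 1$, so $\varphi_n(u)$ is exact at every $n$, while $\varphi_n^*(u)$ is exactly the cosine series of section 2.3.1 with its $O(n^{-1})$ endpoint error.

```python
import math

phi = lambda u: u * (4.0 - 3.0 * u)     # phi'(0) = 4, phi'(1) = -2, phi'' = -6

def phi_poly(u, n):
    """phi_n(u) from the first representation; the alpha_m series vanishes here."""
    d0, d1 = 4.0, -2.0                  # phi'(0) and phi'(1)
    integral = 1.0                      # int_0^1 u(4-3u) du = 1
    return (integral - 0.5*d0 - (d1 - d0)/6.0) + d0*u + 0.5*(d1 - d0)*u**2

def phi_star(u, n):
    """phi_n^*(u): plain cosine series, gamma_m = sqrt(2)(phi'(1)(-1)^m - phi'(0))/(m pi)^2."""
    g = lambda m: math.sqrt(2.0) * (-2.0 * (-1)**m - 4.0) / (m * math.pi)**2
    return 1.0 + sum(g(m) * math.sqrt(2.0) * math.cos(m * math.pi * u)
                     for m in range(1, n + 1))

grid = [i / 500.0 for i in range(501)]
err = lambda approx, n: max(abs(phi(u) - approx(u, n)) for u in grid)
print(err(phi_poly, 10), err(phi_star, 10), err(phi_star, 40))
```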
Remark 4.1. Suppose that the function $\varphi(u)$ is continuous and periodic on $[0,1]$, i.e., $\varphi(u)$ can be extended to a continuous function $\varphi(x)$ on $\mathbb{R}$ by the relation $\varphi(x) = \varphi(x+1)$ for all $x \in \mathbb{R}$. Then with $\varphi_n(u)$ the Fourier series expansion of $\varphi(u)$ it is well-known that $\lim_{n\to\infty}\sup_{0\le u\le 1}|\varphi_n(u) - \varphi(u)| = 0$. See Courant and Hilbert (1953, Chapter II). However, for the purpose of this paper the case that $\varphi(u)$ is periodic on $[0,1]$ is of no interest.
4.2 The case ℓ = 3

Now suppose that $\varphi(u)$ is four times differentiable on $(0,1)$, and that $\varphi''''(u) \in L^2(0,1)$. Denote
$$\alpha_0 = \int_0^1 \varphi''''(u)\,du = \varphi'''(1) - \varphi'''(0),$$
$$\alpha_m = \int_0^1 \sqrt{2}\cos(m\pi u)\varphi''''(u)\,du, \quad m \in \mathbb{N}, \qquad (13)$$
$$\varphi_n''''(u) = \alpha_0 + \sum_{m=1}^{n}\alpha_m\sqrt{2}\cos(m\pi u), \quad n \in \mathbb{N},$$
and recall that $\int_0^1 (\varphi_n''''(u) - \varphi''''(u))^2\,du = \sum_{m=n+1}^{\infty}\alpha_m^2 = o(1)$. Then it is not too hard (but somewhat tedious) to verify that, similar to the case ℓ = 1, the following results hold.
Theorem 4.1. Let $\varphi(u)$ be a four times differentiable real function on $(0,1)$, with fourth derivative $\varphi''''(u) \in L^2(0,1)$, and let $\alpha_m$ be defined by (13). Then $\varphi(u)$ and its derivatives $\varphi'(u)$, $\varphi''(u)$ and $\varphi'''(u)$ are uniformly continuous on $[0,1]$, with exact and uniform cosine-sine series representations
$$\varphi(u) \equiv P_4(u) + \sum_{m=1}^{\infty}\frac{\alpha_m}{(m\pi)^4}\sqrt{2}\cos(m\pi u),$$
$$\varphi'(u) \equiv P_4'(u) - \sum_{m=1}^{\infty}\frac{\alpha_m}{(m\pi)^3}\sqrt{2}\sin(m\pi u),$$
$$\varphi''(u) \equiv P_4''(u) - \sum_{m=1}^{\infty}\frac{\alpha_m}{(m\pi)^2}\sqrt{2}\cos(m\pi u),$$
$$\varphi'''(u) \equiv P_4'''(u) + \sum_{m=1}^{\infty}\frac{\alpha_m}{m\pi}\sqrt{2}\sin(m\pi u)$$
on $[0,1]$, where
$$P_4(u) = \int_0^1 \varphi(v)\,dv - \frac{1}{2}\varphi'(0) - \frac{1}{6}\left(\varphi'(1) - \varphi'(0) - \frac{1}{2}\varphi'''(0) - \frac{1}{6}(\varphi'''(1) - \varphi'''(0))\right)$$
$$- \frac{1}{24}\varphi'''(0) - \frac{1}{120}(\varphi'''(1) - \varphi'''(0)) + \varphi'(0)u$$
$$+ \frac{1}{2}\left(\varphi'(1) - \varphi'(0) - \frac{1}{2}\varphi'''(0) - \frac{1}{6}(\varphi'''(1) - \varphi'''(0))\right)u^2$$
$$+ \frac{1}{6}\varphi'''(0)u^3 + \frac{1}{24}(\varphi'''(1) - \varphi'''(0))u^4.$$
Moreover, denoting
$$\varphi_n(u) = P_4(u) + \sum_{m=1}^{n}\frac{\alpha_m}{(m\pi)^4}\sqrt{2}\cos(m\pi u),$$
$$\varphi_n'(u) = P_4'(u) - \sum_{m=1}^{n}\frac{\alpha_m}{(m\pi)^3}\sqrt{2}\sin(m\pi u),$$
$$\varphi_n''(u) = P_4''(u) - \sum_{m=1}^{n}\frac{\alpha_m}{(m\pi)^2}\sqrt{2}\cos(m\pi u),$$
$$\varphi_n'''(u) = P_4'''(u) + \sum_{m=1}^{n}\frac{\alpha_m}{m\pi}\sqrt{2}\sin(m\pi u),$$
we have by Lemma 4.1 that
$$\sup_{0\le u\le 1}|\varphi(u) - \varphi_n(u)| = o(n^{-7/2}), \quad \sup_{0\le u\le 1}|\varphi'(u) - \varphi_n'(u)| = o(n^{-5/2}),$$
$$\sup_{0\le u\le 1}|\varphi''(u) - \varphi_n''(u)| = o(n^{-3/2}), \quad \sup_{0\le u\le 1}|\varphi'''(u) - \varphi_n'''(u)| = o(n^{-1/2}). \qquad (14)$$
Note that if $\varphi'''(1) = \varphi'''(0) = 0$ and $\varphi'(1) = \varphi'(0) = 0$ then $P_4(u) \equiv \int_0^1 \varphi(v)\,dv$, so that exactly and uniformly on $[0,1]$,
$$\varphi(u) \equiv \int_0^1 \varphi(v)\,dv + \sum_{m=1}^{\infty}\frac{\alpha_m}{(m\pi)^4}\sqrt{2}\cos(m\pi u),$$
$$\varphi'(u) \equiv -\sum_{m=1}^{\infty}\frac{\alpha_m}{(m\pi)^3}\sqrt{2}\sin(m\pi u),$$
$$\varphi''(u) \equiv -\sum_{m=1}^{\infty}\frac{\alpha_m}{(m\pi)^2}\sqrt{2}\cos(m\pi u),$$
$$\varphi'''(u) \equiv \sum_{m=1}^{\infty}\frac{\alpha_m}{m\pi}\sqrt{2}\sin(m\pi u).$$
Denote
$$\gamma_m = \int_0^1 \varphi(u)\sqrt{2}\cos(m\pi u)\,du = \frac{\alpha_m}{(m\pi)^4}, \quad m \in \mathbb{N}.$$
Then
$$\sum_{m=1}^{\infty} m^8\gamma_m^2 = \pi^{-8}\sum_{m=1}^{\infty}\alpha_m^2 < \infty,$$
which by Lemma 4.1, with $\alpha_m$ replaced by $m^4\gamma_m$, implies that
$$\sum_{m=n+1}^{\infty} m^3|\gamma_m| = \sum_{m=n+1}^{\infty} m^{-1}|m^4\gamma_m| = o(n^{-1/2}),$$
$$\sum_{m=n+1}^{\infty} m^2|\gamma_m| = \sum_{m=n+1}^{\infty} m^{-2}|m^4\gamma_m| = o(n^{-3/2}),$$
$$\sum_{m=n+1}^{\infty} m|\gamma_m| = \sum_{m=n+1}^{\infty} m^{-3}|m^4\gamma_m| = o(n^{-5/2}),$$
$$\sum_{m=n+1}^{\infty} |\gamma_m| = \sum_{m=n+1}^{\infty} m^{-4}|m^4\gamma_m| = o(n^{-7/2}). \qquad (15)$$
Note that the first result in (15) implies that $\sum_{m=1}^{\infty} m^3|\gamma_m| < \infty$.
Summarizing, the following results hold.
Theorem 4.2. Let $\varphi(u)$ be a four times differentiable real function on $(0,1)$, with fourth derivative $\varphi'''' \in L^2(0,1)$, satisfying the tail conditions
$$\lim_{u\downarrow 0}\varphi'''(u) = \lim_{u\uparrow 1}\varphi'''(u) = 0, \quad \lim_{u\downarrow 0}\varphi'(u) = \lim_{u\uparrow 1}\varphi'(u) = 0.$$
Then $\varphi(u)$ and its first, second and third derivatives are uniformly continuous on $[0,1]$ and have the exact and uniform cosine-sine series representations
$$\varphi(u) \equiv \int_0^1 \varphi(v)\,dv + \sum_{m=1}^{\infty}\gamma_m\sqrt{2}\cos(m\pi u), \qquad (16)$$
$$\varphi'(u) \equiv -\pi\sum_{m=1}^{\infty} m\gamma_m\sqrt{2}\sin(m\pi u),$$
$$\varphi''(u) \equiv -\pi^2\sum_{m=1}^{\infty} m^2\gamma_m\sqrt{2}\cos(m\pi u),$$
$$\varphi'''(u) \equiv \pi^3\sum_{m=1}^{\infty} m^3\gamma_m\sqrt{2}\sin(m\pi u)$$
on $[0,1]$, where $\gamma_m = \int_0^1 \varphi(u)\sqrt{2}\cos(m\pi u)\,du$ for $m \in \mathbb{N}$, satisfying $\sum_{m=n+1}^{\infty} m^3|\gamma_m| = o(n^{-1/2})$, hence
$$\sum_{m=1}^{\infty} m^3|\gamma_m| < \infty. \qquad (17)$$
Moreover, denoting $\varphi_n(u) = \int_0^1 \varphi(v)\,dv + \sum_{m=1}^{n}\gamma_m\sqrt{2}\cos(m\pi u)$, the uniform convergence rates (14) apply.
5 Uniform convergence of SNP densities on the unit interval and their derivatives

Let $h(u)$ be a density on $[0,1]$ satisfying $h(u) > 0$ on $(0,1)$, and denote $\varphi(u) = \sqrt{h(u)}$. Again, suppose that $\varphi(u)$ is four times continuously differentiable on $(0,1)$, with $\varphi'''' \in L^2(0,1)$. Moreover, suppose that $\varphi'(1) = \varphi'(0) = 0$ and $\varphi'''(1) = \varphi'''(0) = 0$.
As we have seen before, we can write
$$\varphi(u) \equiv \int_0^1 \varphi(v)\,dv + \sum_{m=1}^{\infty}\frac{\alpha_m}{(m\pi)^4}\sqrt{2}\cos(m\pi u) \equiv \gamma_0 + \sum_{m=1}^{\infty}\gamma_m\sqrt{2}\cos(m\pi u), \text{ say,} \equiv \frac{1 + \sum_{m=1}^{\infty}\delta_m\sqrt{2}\cos(m\pi u)}{\sqrt{1 + \sum_{k=1}^{\infty}\delta_k^2}},$$
where now $\sum_{m=0}^{\infty}\gamma_m^2 = 1$ and
$$\delta_m = \frac{\gamma_m}{\int_0^1 \varphi(v)\,dv} = \frac{\alpha_m}{(m\pi)^4\int_0^1 \sqrt{h(v)}\,dv}.$$
Note that
$$\sum_{m=n+1}^{\infty}\delta_m^2 = O\left(\sum_{m=n+1}^{\infty}\frac{\alpha_m^2}{m^8}\right) = o(n^{-7}). \qquad (18)$$
Next, let
$$\varphi_n(u) = \frac{1 + \sum_{m=1}^{n}\delta_m\sqrt{2}\cos(m\pi u)}{\sqrt{1 + \sum_{k=1}^{n}\delta_k^2}}.$$
It is now easy to verify, using (18), that we still have
$$\sup_{0\le u\le 1}|\varphi(u) - \varphi_n(u)| = o(n^{-7/2}), \quad \sup_{0\le u\le 1}|\varphi'(u) - \varphi_n'(u)| = o(n^{-5/2}),$$
$$\sup_{0\le u\le 1}|\varphi''(u) - \varphi_n''(u)| = o(n^{-3/2}), \quad \sup_{0\le u\le 1}|\varphi'''(u) - \varphi_n'''(u)| = o(n^{-1/2}).$$
Using the latter results it follows straightforwardly that the following results hold.

Theorem 5.1. Let $h(u)$ be a positive density on $(0,1)$ such that $\varphi(u) = \sqrt{h(u)}$ satisfies the conditions of Theorem 4.2.³ Then $h(u)$ has exactly and uniformly the cosine series representation
$$h(u) \equiv \frac{\left(1 + \sum_{m=1}^{\infty}\delta_m\sqrt{2}\cos(m\pi u)\right)^2}{1 + \sum_{m=1}^{\infty}\delta_m^2},$$

³So that $h(u)$ is uniformly continuous on $[0,1]$.
where the $\delta_m$'s are uniquely defined by (3) and satisfy
$$\sum_{m=1}^{\infty} m^3|\delta_m| < \infty. \qquad (19)$$
Moreover, with $h_n(u)$ defined by (1), the following uniform convergence rates apply:
$$\sup_{0\le u\le 1}|h(u) - h_n(u)| = o(n^{-7/2}), \quad \sup_{0\le u\le 1}|h'(u) - h_n'(u)| = o(n^{-5/2}),$$
$$\sup_{0\le u\le 1}|h''(u) - h_n''(u)| = o(n^{-3/2}), \quad \sup_{0\le u\le 1}|h'''(u) - h_n'''(u)| = o(n^{-1/2}).$$
Furthermore, with $H$ the c.d.f. of $h$ and $H_n$ the c.d.f. of $h_n$ we have
$$\sup_{0\le u\le 1}|H(u) - H_n(u)| \le \sup_{0\le u\le 1}|h(u) - h_n(u)| = o(n^{-7/2}).$$
These uniform convergence results depend crucially on the conditions that $\varphi''''(u) \in L^2(0,1)$ and $\varphi'(1) = \varphi'(0) = 0$, $\varphi'''(1) = \varphi'''(0) = 0$, where $\varphi(u) = \sqrt{h(u)}$. So the question is how to impose these conditions.
6 Tail conditions

Recall that, with $F(x)$ an absolutely continuous distribution function with density $f(x)$ and support $\mathbb{R}$, we can always write $F(x) = H(G(x))$, $f(x) = h(G(x))g(x)$, where $G(x)$ is an a priori chosen absolutely continuous distribution function with density $g(x)$ and support $\mathbb{R}$, and $H(u)$ is an absolutely continuous distribution function on $[0,1]$ with density $h(u)$, given by
$$H(u) = F(G^{-1}(u)), \quad h(u) = H'(u) = f(G^{-1}(u))/g(G^{-1}(u)),$$
with $G^{-1}(u)$ the inverse of $G(x)$. Thus, with $\varphi(u) = \sqrt{h(u)}$, the question is now:

Under what conditions on $f$ and $g$ do we have
$$\varphi'''' \in L^2(0,1), \quad \varphi'(1) = \varphi'(0) = 0 \text{ and } \varphi'''(1) = \varphi'''(0) = 0\,?$$

First of all, $f$ and $g$ need to be four times continuously differentiable on $\mathbb{R}$. Second, a necessary condition is that $G$ is chosen such that
$$\varphi(u) = \sqrt{h(u)} = \sqrt{f(G^{-1}(u))}/\sqrt{g(G^{-1}(u))}$$
and its derivatives $\varphi'(u)$, $\varphi''(u)$ and $\varphi'''(u)$ are uniformly continuous on $[0,1]$, as otherwise the aforementioned uniform convergence results cannot hold. Regarding $\varphi(u)$ itself, this requirement holds if and only if the tails of $g$ are not fatter than those of $f$, because then $\varphi(u)$ is bounded on $[0,1]$, which by continuity of $\varphi(u)$ on $(0,1)$ implies uniform continuity on $[0,1]$.
6.1 The choice of G

Given that $\int_{-\infty}^{\infty}|x|f(x)\,dx < \infty$, which implies that $\lim_{|x|\to\infty} x^2 f(x) = 0$,⁴ this tail condition holds if we choose for $G(x)$ the c.d.f. of the standard Cauchy distribution, i.e.,
$$G(x) = 0.5 + \pi^{-1}\arctan(x), \quad g(x) = \frac{1}{\pi(1 + x^2)}, \quad G^{-1}(u) = \tan(\pi(u - 0.5)).$$
Then
$$h(u) = \pi\left(1 + \tan^2(\pi(u - 0.5))\right)f(\tan(\pi(u - 0.5))), \qquad (20)$$
which satisfies
$$h(0) = \lim_{u\downarrow 0} h(u) = \lim_{x\to-\infty} x^2 f(x) = 0, \quad h(1) = \lim_{u\uparrow 1} h(u) = \lim_{x\to+\infty} x^2 f(x) = 0.$$
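For a concrete $f$, the transformation (20) is one line of code. The sketch below is my own illustration (the standard normal choice of $f$ is an assumption, not the paper's); it checks that $h$ integrates to 1 on $(0,1)$ and vanishes at the endpoints.

```python
import math

def f(x):
    """Example: standard normal density (an assumed choice of f)."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def h(u):
    """h(u) in (20): the density of G(X) with G the standard Cauchy c.d.f."""
    x = math.tan(math.pi * (u - 0.5))
    return math.pi * (1.0 + x * x) * f(x)

# midpoint rule on (0,1); h is a density, so the total should be ~1
total = sum(h((i + 0.5) / 10000) for i in range(10000)) / 10000
print(total, h(0.001), h(0.999))  # ~1, and h vanishes near the endpoints
```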
6.2 Tail conditions for the derivatives

Now with $G$ the c.d.f. of the standard Cauchy distribution we can write
$$\varphi(u) = \sqrt{h(u)} = \sqrt{\pi}\left(\sqrt{1 + \tan^2(\pi(u - 0.5))}\right)\eta(\tan(\pi(u - 0.5))), \qquad (21)$$
where for notational convenience, $\eta(x) = \sqrt{f(x)}$. Of course, in order that $\varphi(u)$ is four times differentiable on $(0,1)$ we need to require that $\eta(x)$ is four times differentiable on $\mathbb{R}$, hence $f(x)$ needs to be four times differentiable on $\mathbb{R}$.

Next, denote
$$\phi(u|k, c, \psi) = \tan^k(\pi(u - 0.5))\left(1 + \tan^2(\pi(u - 0.5))\right)^c \psi(\tan(\pi(u - 0.5))) \qquad (22)$$
⁴See Lemma 6.1 below.
for some differentiable real function $\psi$ on $\mathbb{R}$, so that
$$\varphi(u) = \sqrt{\pi}\,\phi(u|0, 1/2, \eta). \qquad (23)$$
Using the well-known fact that $d\tan(\pi(u - 0.5))/du = \pi(1 + \tan^2(\pi(u - 0.5)))$, it follows that
$$\phi'(u|k, c, \psi) = \partial\phi(u|k, c, \psi)/\partial u = k\pi\phi(u|k-1, c+1, \psi) + 2c\pi\phi(u|k+1, c, \psi) + \pi\phi(u|k, c+1, \psi'), \qquad (24)$$
hence, plugging in $k = 0$, $c = 0.5$ and $\psi = \eta$, it follows from (24) that
$$\varphi'(u) = \pi\sqrt{\pi}\,\phi(u|1, 0.5, \eta) + \pi\sqrt{\pi}\,\phi(u|0, 1.5, \eta'). \qquad (25)$$
Similarly, it follows from (24) that
$$\varphi''(u) = \pi\sqrt{\pi}\left(\phi'(u|1, 0.5, \eta) + \phi'(u|0, 1.5, \eta')\right) = \pi^2\sqrt{\pi}\left(\phi(u|0, 1.5, \eta) + \phi(u|2, 0.5, \eta) + 4\phi(u|1, 1.5, \eta') + \phi(u|0, 2.5, \eta'')\right), \qquad (26)$$
$$\varphi'''(u) = \pi^2\sqrt{\pi}\left(\phi'(u|0, 1.5, \eta) + \phi'(u|2, 0.5, \eta) + 4\phi'(u|1, 1.5, \eta') + \phi'(u|0, 2.5, \eta'')\right)$$
$$= \pi^3\sqrt{\pi}\left(5\phi(u|1, 1.5, \eta) + \phi(u|3, 0.5, \eta) + 5\phi(u|0, 2.5, \eta') + 13\phi(u|2, 1.5, \eta') + 9\phi(u|1, 2.5, \eta'') + \phi(u|0, 3.5, \eta''')\right), \qquad (27)$$
and
$$\varphi''''(u) = \pi^3\sqrt{\pi}\left(5\phi'(u|1, 1.5, \eta) + \phi'(u|3, 0.5, \eta) + 5\phi'(u|0, 2.5, \eta') + 13\phi'(u|2, 1.5, \eta') + 9\phi'(u|1, 2.5, \eta'') + \phi'(u|0, 3.5, \eta''')\right)$$
$$= \pi^4\sqrt{\pi}\left(5\phi(u|0, 2.5, \eta) + 18\phi(u|2, 1.5, \eta) + \phi(u|4, 0.5, \eta) + 56\phi(u|1, 2.5, \eta') + 40\phi(u|3, 1.5, \eta') + 58\phi(u|2, 2.5, \eta'') + 14\phi(u|0, 3.5, \eta'') + 16\phi(u|1, 3.5, \eta''') + \phi(u|0, 4.5, \eta'''')\right). \qquad (28)$$
Next, observe from (22) that
$$\lim_{u\downarrow 0}\phi(u|k, c, \psi) = \lim_{x\to-\infty} x^{k+2c}\psi(x), \quad \lim_{u\uparrow 1}\phi(u|k, c, \psi) = \lim_{x\to+\infty} x^{k+2c}\psi(x), \qquad (29)$$
hence by (23), (25), (26) and (27),
$$\varphi(0) = \sqrt{\pi}\lim_{x\to-\infty} x\eta(x), \quad \varphi(1) = \sqrt{\pi}\lim_{x\to+\infty} x\eta(x),$$
$$\varphi'(0) = \pi\sqrt{\pi}\left(\lim_{x\to-\infty} x^2\eta(x) + \lim_{x\to-\infty} x^3\eta'(x)\right), \quad \varphi'(1) = \pi\sqrt{\pi}\left(\lim_{x\to+\infty} x^2\eta(x) + \lim_{x\to+\infty} x^3\eta'(x)\right),$$
$$\varphi''(0) = \pi^2\sqrt{\pi}\left(2\lim_{x\to-\infty} x^3\eta(x) + 4\lim_{x\to-\infty} x^4\eta'(x) + \lim_{x\to-\infty} x^5\eta''(x)\right),$$
$$\varphi''(1) = \pi^2\sqrt{\pi}\left(2\lim_{x\to+\infty} x^3\eta(x) + 4\lim_{x\to+\infty} x^4\eta'(x) + \lim_{x\to+\infty} x^5\eta''(x)\right),$$
and
$$\varphi'''(0) = \pi^3\sqrt{\pi}\left(6\lim_{x\to-\infty} x^4\eta(x) + 18\lim_{x\to-\infty} x^5\eta'(x) + 9\lim_{x\to-\infty} x^6\eta''(x) + \lim_{x\to-\infty} x^7\eta'''(x)\right),$$
$$\varphi'''(1) = \pi^3\sqrt{\pi}\left(6\lim_{x\to+\infty} x^4\eta(x) + 18\lim_{x\to+\infty} x^5\eta'(x) + 9\lim_{x\to+\infty} x^6\eta''(x) + \lim_{x\to+\infty} x^7\eta'''(x)\right).$$
Thus,
$$\varphi'''(0) = \varphi'''(1) = \varphi'(0) = \varphi'(1) = 0 \text{ and } \varphi''(0) = \varphi''(1) = \varphi(0) = \varphi(1) = 0 \qquad (30)$$
if
$$\lim_{|x|\to\infty} x^4\eta(x) = 0, \quad \lim_{|x|\to\infty} x^5\eta'(x) = 0, \quad \lim_{|x|\to\infty} x^6\eta''(x) = 0, \quad \lim_{|x|\to\infty} x^7\eta'''(x) = 0. \qquad (31)$$
Now assume:

Assumption 6.1. Given a density $f(x)$ with support $\mathbb{R}$, the following conditions hold:
(a) $f(x)$ is four times continuously differentiable on $\mathbb{R}$;
(b) $\int_{-\infty}^{\infty}|x|^3\sqrt{f(x)}\,dx < \infty$;
(c) Denoting $\eta(x) = \sqrt{f(x)}$, the set $\{x \in \mathbb{R} : \eta'(x) = 0 \text{ or } \eta''(x) = 0 \text{ or } \eta'''(x) = 0\}$ is finite.

Assumption 6.1 holds for most densities considered in parametric econometric models, in particular the densities of the normal and logistic distributions. Moreover, consider the following easy lemma.

Lemma 6.1. Let $\psi(x)$ be a continuous real function on $\mathbb{R}$ such that the set $\{x \in \mathbb{R} : \psi(x) = 0\}$ is either finite or empty, and $\int_{-\infty}^{\infty}|\psi(x)|\,dx < \infty$. Then $\lim_{|x|\to\infty} x\psi(x) = 0$.
Proof. See the Appendix.
Then $\lim_{|x|\to\infty} x^4\eta(x) = 0$ follows from condition (b) and Lemma 6.1. Due to condition (c) and the fact that $\lim_{|x|\to\infty}\eta(x) = 0$ there exists an $a > 0$ such that $\eta'(x) \le 0$ for all $x \ge a$ and $\eta'(x) \ge 0$ for all $x \le -a$. Then for $b > a$, using integration by parts,
$$\int_a^b x^4\eta'(x)\,dx = b^4\eta(b) - a^4\eta(a) - 4\int_a^b x^3\eta(x)\,dx.$$
Letting $b \to \infty$ it follows from $\lim_{|x|\to\infty} x^4\eta(x) = 0$ and condition (b) that
$$\int_a^{\infty} x^4|\eta'(x)|\,dx = a^4\eta(a) + 4\int_a^{\infty} x^3\eta(x)\,dx < \infty,$$
and similarly
$$\int_{-\infty}^{-a} x^4|\eta'(x)|\,dx = a^4\eta(-a) + 4\int_{-\infty}^{-a}|x|^3\eta(x)\,dx < \infty.$$
Hence, $\int_{-\infty}^{\infty} x^4|\eta'(x)|\,dx < \infty$, which by Lemma 6.1 implies that $\lim_{|x|\to\infty} x^5\eta'(x) = 0$. Along similar lines it can be shown that
Lemma 6.2. Assumption 6.1 implies that the limits (31) hold, hence thetail conditions (30) hold.
Finally, we have to set forth conditions such that ϕ0000(u) ∈ L2(0, 1), whichis the case if
R 10ϕ0000(u)2du <∞. A sufficient condition for the latter is that
max
µlimu↓0|ϕ0000(u)|, lim
u↑1|ϕ0000(u)|
¶<∞, (32)
because then ϕ0000(u) is uniformly continuous [0, 1].Now observe from (29) and (28) that
$$\varphi''''(0)=\lim_{u\downarrow 0}\varphi''''(u)=\pi^{4}\sqrt{\pi}\Big(24\lim_{x\to-\infty}x^{5}\eta(x)+96\lim_{x\to-\infty}x^{6}\eta'(x)+72\lim_{x\to-\infty}x^{7}\eta''(x)+16\lim_{x\to-\infty}x^{8}\eta'''(x)+\lim_{x\to-\infty}x^{9}\eta''''(x)\Big),$$
$$\varphi''''(1)=\lim_{u\uparrow 1}\varphi''''(u)=\pi^{4}\sqrt{\pi}\Big(24\lim_{x\to+\infty}x^{5}\eta(x)+96\lim_{x\to+\infty}x^{6}\eta'(x)+72\lim_{x\to+\infty}x^{7}\eta''(x)+16\lim_{x\to+\infty}x^{8}\eta'''(x)+\lim_{x\to+\infty}x^{9}\eta''''(x)\Big),$$
hence (32) holds if
$$\lim_{|x|\to\infty}|x|^{5}\eta(x)<\infty,\quad\lim_{|x|\to\infty}x^{6}|\eta'(x)|<\infty,\quad\lim_{|x|\to\infty}|x^{7}\eta''(x)|<\infty,\quad\lim_{|x|\to\infty}x^{8}|\eta'''(x)|<\infty,\quad\lim_{|x|\to\infty}|x^{9}\eta''''(x)|<\infty. \qquad (33)$$
However, it is difficult, if not impossible, to derive more primitive general conditions for (33). On the other hand, suppose that:
Assumption 6.2. In addition to the conditions in Assumption 6.1, the following conditions hold:
(1) $\int_{-\infty}^{\infty}x^{4}\sqrt{f(x)}\,dx<\infty$;
(2) The set $\{x\in\mathbb{R}:\eta''''(x)=0\}$ is finite, where $\eta(x)=\sqrt{f(x)}$.
Then it follows, similarly to Lemma 6.2, that
$$\lim_{|x|\to\infty}|x|^{5}\eta(x)=0,\quad\lim_{|x|\to\infty}x^{6}|\eta'(x)|=0,\quad\lim_{|x|\to\infty}|x^{7}\eta''(x)|=0,\quad\lim_{|x|\to\infty}x^{8}|\eta'''(x)|=0,\quad\lim_{|x|\to\infty}|x^{9}\eta''''(x)|=0,$$
hence
$$\varphi''''(0)=\varphi''''(1)=0. \qquad (34)$$
Part (2) of Assumption 6.2 is not a big deal, but admittedly, part (1) is too strong a condition for $\int_{0}^{1}\varphi''''(u)^{2}\,du<\infty$. Nevertheless, I will adopt Assumption 6.2, as it holds for most densities considered in parametric econometric models. Summarizing, the following results hold.
Theorem 6.1. Under Assumptions 6.1 and 6.2 the function $\varphi(u)=\sqrt{h(u)}$ defined in (21) satisfies the tail conditions (30) and (34). Consequently, the results in Theorem 5.1 carry over to the density $h(u)$ defined by (20).
7 Densities on R and their derivatives

As has been shown, every density $f(x)$ with support $\mathbb{R}$ satisfying the conditions in Assumptions 6.1 and 6.2 can be written as
$$f(x)=\frac{h(0.5+\pi^{-1}\arctan(x))}{\pi(1+x^{2})}, \qquad (35)$$
with
$$h(u)=\pi\big(1+\tan^{2}(\pi(u-0.5))\big)f\big(\tan(\pi(u-0.5))\big)=\frac{\big(1+\sum_{m=1}^{\infty}\delta_{m}\sqrt{2}\cos(m\pi u)\big)^{2}}{1+\sum_{m=1}^{\infty}\delta_{m}^{2}}=h(u|\delta),\text{ say,} \qquad (36)$$
where $\delta=\{\delta_{m}\}_{m=1}^{\infty}$, with the $\delta_{m}$'s defined by (3), is unique and satisfies the condition $\sum_{m=1}^{\infty}m^{3}|\delta_{m}|<\infty$. Hence, we can write $f(x)=f(x|\delta)$, where
$$f(x|\delta)\equiv\frac{h(0.5+\pi^{-1}\arctan(x)|\delta)}{\pi(1+x^{2})}\equiv\frac{\big(1+\sum_{m=1}^{\infty}\delta_{m}\sqrt{2}\cos(m\pi/2+m\arctan(x))\big)^{2}}{\pi(1+x^{2})\big(1+\sum_{k=1}^{\infty}\delta_{k}^{2}\big)}. \qquad (37)$$
The SNP version $h_{n}(u)$ of $h(u)$ can now be written as $h_{n}(u)=h(u|\pi_{n}\delta)$, where $\pi_{n}$ is the truncation operator, i.e.,

Definition 7.1. The operator $\pi_{n}$ applied to $\delta=\{\delta_{m}\}_{m=1}^{\infty}$ as $\pi_{n}\delta$ sets all the $\delta_{m}$'s for $m>n$ to zero.
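On a finite coefficient vector standing in for the infinite sequence, $\pi_{n}$ amounts to zeroing out the tail; a one-line Python sketch (the names are mine):

```python
import numpy as np

def truncate(delta, n):
    """pi_n of Definition 7.1: keep delta_1..delta_n, set delta_m = 0 for m > n."""
    out = np.array(delta, dtype=float)
    out[n:] = 0.0
    return out
```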
Similarly, the SNP version $f_{n}(x)$ of $f(x)$ can be written as $f_{n}(x)=f(x|\pi_{n}\delta)$. It follows now, similarly to Theorems 5.1 and 6.1, that the following results hold.
Theorem 7.1. Every density $f(x)$ with support $\mathbb{R}$ satisfying Assumptions 6.1 and 6.2 has the exact and uniform cosine series representation $f(x)\equiv f(x|\delta)$, with $f(x|\delta)$ defined by (37), where $\delta=\{\delta_{m}\}_{m=1}^{\infty}$ is unique and is contained in the infinite-dimensional parameter space
$$\Delta_{3}=\Big\{\delta=\{\delta_{m}\}_{m=1}^{\infty}:\sum_{m=1}^{\infty}m^{3}|\delta_{m}|<\infty\Big\}. \qquad (38)$$
Moreover, with $\pi_{n}$ the truncation operator, we have
$$\sup_{x\in\mathbb{R}}|f(x|\delta)-f(x|\pi_{n}\delta)|=o(n^{-7/2}), \qquad (39)$$
$$\sup_{x\in\mathbb{R}}|f'(x|\delta)-f'(x|\pi_{n}\delta)|=o(n^{-5/2}),$$
$$\sup_{x\in\mathbb{R}}|f''(x|\delta)-f''(x|\pi_{n}\delta)|=o(n^{-3/2}),$$
$$\sup_{x\in\mathbb{R}}|f'''(x|\delta)-f'''(x|\pi_{n}\delta)|=o(n^{-1/2}),$$
where the derivatives are with respect to $x$. Consequently, $f'(x)\equiv f'(x|\delta)$, $f''(x)\equiv f''(x|\delta)$ and $f'''(x)\equiv f'''(x|\delta)$, respectively, exactly and uniformly on $\mathbb{R}$. Furthermore, denoting the c.d.f. of $f(x|\delta)$ by $F(x|\delta)$, it follows from (39) that
$$\sup_{x\in\mathbb{R}}|F(x|\delta)-F(x|\pi_{n}\delta)|=o(n^{-7/2}).$$
Remark 7.1. It can be shown by (3), (21), the change of variable $x=\tan(\pi(u-0.5))$, and the well-known sine-cosine formulas that for $m\in\mathbb{N}$,
$$\delta_{2m}=\sqrt{2}(-1)^{m}\,\frac{\int_{-\infty}^{\infty}\cos(2m\arctan(x))\frac{\sqrt{f(x)}}{\sqrt{1+x^{2}}}\,dx}{\int_{-\infty}^{\infty}\frac{\sqrt{f(x)}}{\sqrt{1+x^{2}}}\,dx},$$
$$\delta_{2m-1}=\sqrt{2}(-1)^{m}\,\frac{\int_{-\infty}^{\infty}\sin((2m-1)\arctan(x))\frac{\sqrt{f(x)}}{\sqrt{1+x^{2}}}\,dx}{\int_{-\infty}^{\infty}\frac{\sqrt{f(x)}}{\sqrt{1+x^{2}}}\,dx}.$$
Hence, if $f(x)$ is symmetric around zero then $\delta_{2m-1}=0$.
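The coefficient formulas in Remark 7.1 can be evaluated by ordinary quadrature. The sketch below (function names, grid, and trapezoid helper are mine; accuracy depends on the grid and on the tails of $f$) computes $\delta_{1},\ldots,\delta_{n}$ for a given density. For the standard Cauchy density $h(u)\equiv 1$, so all coefficients should come out (numerically) zero, and the odd-indexed ones vanish exactly on a symmetric grid.

```python
import numpy as np

def _trap(y, x):
    # composite trapezoid rule
    return float(np.sum((y[1:] + y[:-1]) * 0.5 * np.diff(x)))

def cosine_coeffs(f, n, x=np.linspace(-1000.0, 1000.0, 400001)):
    """delta_1..delta_n via the integral formulas of Remark 7.1.
    A numerical sketch, not the paper's code; f is a vectorized density."""
    w = np.sqrt(f(x)) / np.sqrt(1.0 + x**2)
    denom = _trap(w, x)
    a = np.arctan(x)
    out = []
    for m in range(1, n + 1):
        if m % 2 == 0:   # even index 2k: cos(2k arctan x), sign (-1)^k
            num, sign = _trap(np.cos(m * a) * w, x), (-1.0) ** (m // 2)
        else:            # odd index 2k-1: sin((2k-1) arctan x), sign (-1)^k
            num, sign = _trap(np.sin(m * a) * w, x), (-1.0) ** ((m + 1) // 2)
        out.append(np.sqrt(2.0) * sign * num / denom)
    return np.array(out)
```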
Remark 7.2. A natural norm on the space $\Delta_{3}$ in (38) is
$$||\delta||_{3}=\sum_{m=1}^{\infty}m^{3}|\delta_{m}| \qquad (40)$$
for $\delta=\{\delta_{m}\}_{m=1}^{\infty}$. Endowed with this norm and the associated metric $||\delta_{1}-\delta_{2}||_{3}$, the space $\Delta_{3}$ becomes a Banach space,$^{5}$ because it is not hard to verify that every Cauchy sequence in $\Delta_{3}$ converges to a limit in $\Delta_{3}$.
The results in Theorem 7.1 play a key role in deriving asymptotic normality results for the sieve estimator of the Euclidean parameter vector in SNP models in which these Euclidean parameters enter as the argument of an unknown distribution or density function in the form of a linear combination of covariates, as is the case for the SNP discrete choice model considered in Bierens (2014b). Similarly to Bierens (2014b), we also need to establish the continuity of $f(x|\delta)$, $f'(x|\delta)$ and $f''(x|\delta)$ in $\delta$, as well as the continuity in $\delta$ of their first and second partial derivatives with respect to the elements of $\delta=\{\delta_{m}\}_{m=1}^{\infty}$. Due to the relation between $h(u|\delta)$ and $f(x|\delta)$ in (37), it suffices to verify these continuity properties for $h(u|\delta)$, which I have already done in Bierens (2014b, Lemma 5.1), restated here as Lemma 7.1.
$^{5}$A Banach space is a vector space endowed with a norm and associated metric such that every Cauchy sequence in this space has a limit in this space. The difference with a Hilbert space is that this norm is defined directly rather than on the basis of an inner product.
Lemma 7.1. Let $\delta=\{\delta_{m}\}_{m=1}^{\infty}$ and $\delta_{*}=\{\delta_{*,m}\}_{m=1}^{\infty}$ be sequences satisfying $||\delta||_{\ell}=\sum_{m=1}^{\infty}m^{\ell}|\delta_{m}|<\infty$ and $||\delta_{*}-\delta||_{\ell}<1$ for some $\ell\in\mathbb{N}$, and denote $h^{(k)}(u|\delta)=d^{k}h(u|\delta)/(du)^{k}$, $h^{(0)}(u|\delta)=h(u|\delta)$, where the latter is defined by (36). Moreover, denote
$$\nabla_{m}h^{(k)}(u|\delta_{*})=\partial h^{(k)}(u|\delta)/\partial\delta_{m}\big|_{\delta=\delta_{*}}, \qquad (41)$$
$$\nabla_{m_{1},m_{2}}h^{(k)}(u|\delta_{*})=\partial^{2}h^{(k)}(u|\delta)/(\partial\delta_{m_{1}}\partial\delta_{m_{2}})\big|_{\delta=\delta_{*}}. \qquad (42)$$
Then for $k=0,1,\ldots,\ell$ and $m,m_{1},m_{2}\in\mathbb{N}$,
$$\sup_{0\le u\le 1}\big|h^{(k)}(u|\delta)\big|<C_{k},$$
$$\sup_{0\le u\le 1}\big|\nabla_{m}h^{(k)}(u|\delta)\big|<C_{k}\,m^{k},$$
$$\sup_{0\le u\le 1}\big|\nabla_{m_{1},m_{2}}h^{(k)}(u|\delta)\big|<C_{k}\,m_{1}^{k}m_{2}^{k},$$
$$\sup_{0\le u\le 1}\big|h^{(k)}(u|\delta_{*})-h^{(k)}(u|\delta)\big|<C_{k}\,||\delta_{*}-\delta||_{k},$$
$$\sup_{0\le u\le 1}\big|\nabla_{m}h^{(k)}(u|\delta_{*})-\nabla_{m}h^{(k)}(u|\delta)\big|<C_{k}\,m^{k}\,||\delta_{*}-\delta||_{k},$$
$$\sup_{0\le u\le 1}\big|\nabla_{m_{1},m_{2}}h^{(k)}(u|\delta_{*})-\nabla_{m_{1},m_{2}}h^{(k)}(u|\delta)\big|<C_{k}\,m_{1}^{k}m_{2}^{k}\,||\delta_{*}-\delta||_{k},$$
where the $C_{k}$'s are constants depending only on $k$ and the norms $||\delta||_{k}=\sum_{m=1}^{\infty}m^{k}|\delta_{m}|$ for $k=0,1,\ldots,\ell$.
Proof. See Lemma 5.1 in Bierens (2014b). The proof of this lemma is given in a separate appendix to Bierens (2014b), which is reprinted in Bierens (2017, Chapter 10).
As said before, the results in Lemma 7.1 carry over to $f(x|\delta)$ defined by (37), for $\ell=3$, simply by replacing $h(u|\delta)$ with $f(x|\delta)$ and "$\sup_{0\le u\le 1}$" with "$\sup_{x\in\mathbb{R}}$". In particular, Theorem 7.1 can now be augmented with the following results.
Theorem 7.2. Under the conditions of Theorem 7.1, and with the derivative operators $\nabla_{m}$ and $\nabla_{m_{1},m_{2}}$ defined similarly to (41) and (42), respectively, we have for any pair $\delta,\delta_{*}\in\Delta_{3}$ satisfying $||\delta-\delta_{*}||_{3}<1$,
$$\sup_{x\in\mathbb{R}}|f(x|\delta_{*})-f(x|\delta)|<C_{0}\,||\delta_{*}-\delta||_{0},$$
$$\sup_{x\in\mathbb{R}}|f'(x|\delta_{*})-f'(x|\delta)|<C_{1}\,||\delta_{*}-\delta||_{1},$$
$$\sup_{x\in\mathbb{R}}|f''(x|\delta_{*})-f''(x|\delta)|<C_{2}\,||\delta_{*}-\delta||_{2},$$
$$\sup_{x\in\mathbb{R}}|f'''(x|\delta_{*})-f'''(x|\delta)|<C_{3}\,||\delta_{*}-\delta||_{3},$$
$$\sup_{x\in\mathbb{R}}|\nabla_{m}f(x|\delta_{*})-\nabla_{m}f(x|\delta)|<C_{0}\,||\delta_{*}-\delta||_{0},$$
$$\sup_{x\in\mathbb{R}}|\nabla_{m}f'(x|\delta_{*})-\nabla_{m}f'(x|\delta)|<C_{1}\,m\,||\delta_{*}-\delta||_{1},$$
$$\sup_{x\in\mathbb{R}}|\nabla_{m}f''(x|\delta_{*})-\nabla_{m}f''(x|\delta)|<C_{2}\,m^{2}\,||\delta_{*}-\delta||_{2},$$
$$\sup_{x\in\mathbb{R}}|\nabla_{m}f'''(x|\delta_{*})-\nabla_{m}f'''(x|\delta)|<C_{3}\,m^{3}\,||\delta_{*}-\delta||_{3},$$
and
$$\sup_{x\in\mathbb{R}}|\nabla_{m_{1},m_{2}}f(x|\delta_{*})-\nabla_{m_{1},m_{2}}f(x|\delta)|<C_{0}\,||\delta_{*}-\delta||_{0},$$
$$\sup_{x\in\mathbb{R}}|\nabla_{m_{1},m_{2}}f'(x|\delta_{*})-\nabla_{m_{1},m_{2}}f'(x|\delta)|<C_{1}\,m_{1}m_{2}\,||\delta_{*}-\delta||_{1},$$
$$\sup_{x\in\mathbb{R}}|\nabla_{m_{1},m_{2}}f''(x|\delta_{*})-\nabla_{m_{1},m_{2}}f''(x|\delta)|<C_{2}\,m_{1}^{2}m_{2}^{2}\,||\delta_{*}-\delta||_{2},$$
$$\sup_{x\in\mathbb{R}}|\nabla_{m_{1},m_{2}}f'''(x|\delta_{*})-\nabla_{m_{1},m_{2}}f'''(x|\delta)|<C_{3}\,m_{1}^{3}m_{2}^{3}\,||\delta_{*}-\delta||_{3},$$
where the $C_{k}$'s for $k=0,1,2,3$ are constants depending on $||\delta||_{3}$.
8 A compact infinite-dimensional parameter space

A standard assumption for nonlinear parametric models is that the true parameter vector involved is contained in the interior of a given compact set. This assumption is crucial for deriving the consistency and asymptotic normality of the parameter estimators involved. In SNP models the parameters involved are infinite-dimensional, and in general infinite-dimensional compact sets are extremely small. Therefore, in the SNP literature [see for example Chen (2007)] compactness assumptions are usually avoided.
However, in the case (38) it is possible to construct a compact subset of $\Delta_{3}$, namely
$$\Delta_{3}(M)=\Big\{\delta=\{\delta_{m}\}_{m=1}^{\infty}:\sum_{m=1}^{\infty}m^{3}|\delta_{m}|\le M\Big\} \qquad (43)$$
for a given constant $M>0$, endowed with the same norm $||\delta||_{3}$ and associated metric $||\delta_{1}-\delta_{2}||_{3}$ as before. Admittedly, $\Delta_{3}(M)$ is also small relative to $\Delta_{3}$, but as long as $M$ is chosen larger than the norm $||\delta_{0}||_{3}$ of the true parameter $\delta_{0}$, so that $\delta_{0}$ is contained in the interior of $\Delta_{3}(M)$, this space suffices as the infinite-dimensional parameter space. Thus, the problem of how to choose $M$ in this case is similar to the problem of how to choose the compact parameter space in the case of nonlinear parametric models. The compactness of (43) follows from the following more general theorem.
Theorem 8.1. For $\ell>0.5$ and $M>0$, denote
$$\Delta_{\ell}(M)=\Big\{\delta=\{\delta_{m}\}_{m=1}^{\infty}:\sum_{m=1}^{\infty}m^{\ell}|\delta_{m}|\le M\Big\}. \qquad (44)$$
(a) Endow this space with the norm $||\delta||_{\ell}=\sum_{m=1}^{\infty}m^{\ell}|\delta_{m}|$ and associated metric $||\delta_{1}-\delta_{2}||_{\ell}$. Then $\Delta_{\ell}(M)$ is compact with respect to the latter metric.
(b) Moreover, denoting the pseudo-Euclidean norm by $||\delta||=\sqrt{\sum_{m=1}^{\infty}\delta_{m}^{2}}$, with associated pseudo-Euclidean metric $||\delta_{1}-\delta_{2}||$, the space $\Delta_{\ell}(M)$ is compact with respect to the pseudo-Euclidean metric as well.
This result is in essence a combination of Lemmas 3.1 and 3.2 in Bierens (2017, pp. 631-632), where part (b) corresponds to Lemma 3.1 and part (a) corresponds to Lemma 3.2. However, the proof of the latter lemma was incorrect. The correct proof will be given in the Appendix.
9 An application to the SNP fractional index regression model

9.1 The model

Let $Y$ be a fractional dependent variable, i.e., $\Pr[Y\in(0,1)]=1$, and let $X\in\mathbb{R}^{d}$ be a vector of covariates. The SNP fractional index regression model
assumes that for some parameter vector $\theta_{0}$ and some unknown distribution function $F_{0}(x)$ on $\mathbb{R}$,
$$E[Y|X]=E[Y|\theta_{0}'X]=F_{0}(\theta_{0}'X).$$
Note, however, that $\theta_{0}$ and $F_{0}$ are not unique, because for any constant $c\neq 0$, $E[Y|\theta_{0}'X]=E[Y|c\,\theta_{0}'X]$ a.s., so that without loss of generality we may normalize $\theta_{0}$ to $||\theta_{0}||=1$, for example, or set one of the components of $\theta_{0}$ equal to $1$ or $-1$. Alternatively, one can achieve identification by imposing quantile conditions on $F_{0}$, as in Bierens (2014b), but it is easier to restrict $\theta_{0}$ rather than $F_{0}$.
Similarly, we cannot allow a constant component in $X$, because for any constant $c\neq 0$, $E[Y|\theta_{0}'X]=E[Y|c+\theta_{0}'X]$ a.s. Moreover, if all the components of $X$ are discrete then there exist multiple distinct $\theta_{0}$'s, and possibly uncountably many $\theta_{0}$'s, such that $E[Y|X]=E[Y|\theta_{0}'X]$ a.s., even after normalization. See Bierens and Hartog (1988).
Thus, at least one component of $X$ needs to be continuously distributed in order for the model involved to be identifiable, and such a component needs to have a nonzero coefficient and support $\mathbb{R}$.
As to the normalization of $\theta_{0}$, I will choose the following options. In the case $d\ge 2$, partition $X$ as $X=(X_{1},X_{2}')'$ and normalize $\theta_{0}$ as $\theta_{0}=(1,\beta_{0}')'$, where conditional on $X_{2}$, $X_{1}$ is absolutely continuously distributed with support $\mathbb{R}$, whereas in the case $d=1$, $\theta_{0}=1$ and $X$ itself has an absolutely continuous distribution with support $\mathbb{R}$.
Summarizing, the SNP fractional index regression model takes the form
$\Pr[Y\in(0,1)]=1$, with
$$E[Y|X]=\begin{cases}F_{0}(X_{1}+\beta_{0}'X_{2})\ \text{a.s.}&\text{if }X=(X_{1},X_{2}')'\in\mathbb{R}^{d},\ d\ge 2,\\ F_{0}(X)\ \text{a.s.}&\text{if }X\in\mathbb{R},\end{cases} \qquad (45)$$
where in the case $d\ge 2$, $\beta_{0}\in\mathbb{R}^{d-1}$ is the Euclidean parameter, and the absolutely continuous distribution function $F_{0}$ with support $\mathbb{R}$ acts as the non-Euclidean parameter.
The model involved assumes that $X_{1}$ has a positive effect on $E[Y|X]$. If not, simply replace $X_{1}$ by $-X_{1}$, and similarly in the case $X=X_{1}$. Next, suppose that:
Assumption 9.1. The density $f_{0}$ of $F_{0}$ satisfies the conditions of Theorem 7.1.
Then there exists a unique infinite-dimensional parameter $\delta_{0}=\{\delta_{0,m}\}_{m=1}^{\infty}\in\Delta_{3}$ such that
$$F_{0}(x)=F(x|\delta_{0})=\int_{-\infty}^{x}f(z|\delta_{0})\,dz,$$
with $f(x|\delta)$ defined by (37). Thus, we can write the SNP fractional regression model in the case $d\ge 2$ as
$$Y=F(X_{1}+\beta_{0}'X_{2}|\delta_{0})+U,\ \text{where }E[U|X_{1},X_{2}]=0\ \text{a.s.} \qquad (46)$$
9.2 Semi-nonparametric identification

In SNP modeling, the first question that needs to be answered is: is the model involved identified? In the present case the answer is yes, provided that the following conditions hold.

Assumption 9.2. In the case $d\ge 2$, the conditional distribution of $X_{1}$ given $X_{2}$ in the partition $X=(X_{1},X_{2}')'$ is absolutely continuous with support $\mathbb{R}$. Moreover, $E[||X_{2}||^{2}]<\infty$ and $\det(\mathrm{Var}(X_{2}))>0$. In the case $d=1$, $X$ itself has an absolutely continuous distribution with support $\mathbb{R}$.
It follows now, similarly to Lemma 2.1 in Bierens (2014b) and its proof [see Bierens (2017, pp. 585-586)], that if in the case $d\ge 2$ there exist an alternative parameter vector $\beta\in\mathbb{R}^{d-1}$ and/or an alternative absolutely continuous c.d.f. $F$ with support $\mathbb{R}$ such that
$$F(X_{1}+\beta'X_{2})=F_{0}(X_{1}+\beta_{0}'X_{2})\ \text{a.s.}, \qquad (47)$$
then under Assumption 9.2, (47) implies that $\beta=\beta_{0}$ and $F(x)=F_{0}(x)$ for all $x\in\mathbb{R}$. In other words:
Theorem 9.1. Under Assumption 9.2 the SNP fractional index regression model (45) is semi-nonparametrically identified. Consequently, the parameters $\beta_{0}$ and $\delta_{0}$ in the SNP fractional index regression model (46) are then identified as the unique least squares solution
$$(\beta_{0},\delta_{0})=\arg\min_{\beta\in\mathbb{R}^{d-1},\,\delta\in\Delta_{3}}Q(\beta,\delta),$$
where
$$Q(\beta,\delta)=E\big[(Y-F(X_{1}+\beta'X_{2}|\delta))^{2}\big]. \qquad (48)$$
9.3 Sieve least squares estimation: Strong consistency

From now on I will only consider the case $d\ge 2$. Given that

Assumption 9.3. We observe a random sample $\{(Y_{j},X_{1,j},X_{2,j})\}_{j=1}^{N}$ from $(Y,X_{1},X_{2})$, with $(X_{1},X_{2}')'\in\mathbb{R}\times\mathbb{R}^{d-1}$, $d\ge 2$,

the empirical counterpart of the expectation in (48) is
$$\widehat{Q}_{N}(\beta,\delta)=\frac{1}{N}\sum_{j=1}^{N}(Y_{j}-F(X_{1,j}+\beta'X_{2,j}|\delta))^{2}. \qquad (49)$$
Moreover, as motivated in section 8, I will confine $(\beta,\delta)$ to $B\times\Delta_{3}(M)$, where

Assumption 9.4. $B$ is a given compact subset of $\mathbb{R}^{d-1}$ with respect to the Euclidean metric $||\beta_{1}-\beta_{2}||$, containing $\beta_{0}$ in its interior, and $\Delta_{3}(M)$ is the compact set with respect to the metric $||\delta_{1}-\delta_{2}||_{3}$ defined in (43), where $M$ is chosen so large that $M>||\delta_{0}||_{3}$.

At first one may think of mimicking the standard nonlinear least squares approach by using
$$(\widehat{\beta},\widehat{\delta})=\arg\min_{(\beta,\delta)\in B\times\Delta_{3}(M)}\widehat{Q}_{N}(\beta,\delta)$$
as an estimator of $(\beta_{0},\delta_{0})$. However, due to the fact that $\Delta_{3}(M)$ is infinite-dimensional, this solution is not unique, and in general none of these solutions is consistent. The cure for this problem is sieve estimation, as follows. Denote for $n\in\mathbb{N}$,
$$\Delta_{3,n}(M)=\Big\{\delta=\{\delta_{m}\}_{m=1}^{\infty}:\sum_{m=1}^{n}m^{3}|\delta_{m}|\le M,\ \delta_{m}=0\ \text{for }m>n\Big\},$$
which is called a sieve space, and the collection $\{\Delta_{3,n}(M)\}_{n=1}^{\infty}$ is called the sieve, satisfying $\cup_{n=1}^{\infty}\Delta_{3,n}(M)=\Delta_{3}(M)$. The idea of sieve estimation, proposed by Grenander (1981), is in the present case to use
$$(\widehat{\beta}_{n_{N}},\widehat{\delta}_{n_{N}})=\arg\min_{(\beta,\delta)\in B\times\Delta_{3,n_{N}}(M)}\widehat{Q}_{N}(\beta,\delta) \qquad (50)$$
as an estimator of $(\beta_{0},\delta_{0})$, where the truncation order $n_{N}$ depends on the sample size $N$ and satisfies
$$\lim_{N\to\infty}n_{N}=\infty,\quad\lim_{N\to\infty}n_{N}/N=0.$$
Then the following strong consistency results hold.
Theorem 9.2. Under Assumptions 9.1-9.4, $\widehat{\beta}_{n_{N}}\overset{a.s.}{\to}\beta_{0}$ and $||\widehat{\delta}_{n_{N}}-\delta_{0}||_{3}\overset{a.s.}{\to}0$, hence
$$\sup_{x\in\mathbb{R}}\big|F_{0}(x)-F(x|\widehat{\delta}_{n_{N}})\big|\overset{a.s.}{\to}0,$$
$$\sup_{x\in\mathbb{R}}\big|f_{0}(x)-f(x|\widehat{\delta}_{n_{N}})\big|\overset{a.s.}{\to}0,$$
$$\sup_{x\in\mathbb{R}}\big|f_{0}'(x)-f'(x|\widehat{\delta}_{n_{N}})\big|\overset{a.s.}{\to}0.$$
Proof. See the Appendix.
9.4 Sieve least squares estimation: Asymptotic normality

The next step is to set forth conditions such that $\sqrt{N}(\widehat{\beta}_{n_{N}}-\beta_{0})\overset{d}{\to}N_{d-1}(0,\Sigma)$, using the approach in Bierens (2014b). This approach mimics the standard finite-dimensional approach on the basis of the mean value theorem for the first-order conditions, except that instead of inverting the matrix of second derivatives and then selecting the upper-left $(d-1)\times(d-1)$ part as the estimate of the asymptotic variance matrix of $\sqrt{N}(\widehat{\beta}_{n_{N}}-\beta_{0})$, the latter is singled out via projections.
First note that under the conditions of Theorem 9.2, and by (50), $(\widehat{\beta}_{n_{N}},\widehat{\delta}_{n_{N}})$ is an interior point of $B\times\Delta_{3,n_{N}}(M)$ with probability converging to 1 as $N\to\infty$. Consequently, it follows from the first-order conditions for (50) that, with probability converging to 1,
$$\sqrt{N}\,\partial\widehat{Q}_{N}(\beta,\delta)/\partial\beta_{i}\big|_{(\beta,\delta)=(\widehat{\beta}_{n_{N}},\widehat{\delta}_{n_{N}})}=0\ \text{for }i=1,2,\ldots,d-1,$$
$$\sqrt{N}\,\partial\widehat{Q}_{N}(\beta,\delta)/\partial\delta_{m}\big|_{(\beta,\delta)=(\widehat{\beta}_{n_{N}},\widehat{\delta}_{n_{N}})}=0\ \text{for }m=1,2,\ldots,n_{N},$$
jointly, where $\beta_{i}$ is component $i$ of $\beta=(\beta_{1},\ldots,\beta_{d-1})'$.
Then by the mean value expansion around $(\beta_{0},\pi_{n_{N}}\delta_{0})$ we have, for $i=1,2,\ldots,d-1$,
$$o_{p}(1)=\sqrt{N}\,\partial\widehat{Q}_{N}(\beta,\delta)/\partial\beta_{i}\big|_{(\beta,\delta)=(\widehat{\beta}_{n_{N}},\widehat{\delta}_{n_{N}})}$$
$$=\sqrt{N}\,\partial\widehat{Q}_{N}(\beta,\delta)/\partial\beta_{i}\big|_{(\beta,\delta)=(\beta_{0},\pi_{n_{N}}\delta_{0})}$$
$$+\sum_{s=1}^{d-1}\Big(\partial^{2}\widehat{Q}_{N}(\beta,\delta)/(\partial\beta_{i}\partial\beta_{s})\big|_{(\beta,\delta)=(\widetilde{\beta}_{n_{N}}^{(i)},\widetilde{\delta}_{n_{N}}^{(i)})}\Big)\sqrt{N}(\widehat{\beta}_{s,n_{N}}-\beta_{s,0})$$
$$+\sum_{m=1}^{n_{N}}\Big(\partial^{2}\widehat{Q}_{N}(\beta,\delta)/(\partial\beta_{i}\partial\delta_{m})\big|_{(\beta,\delta)=(\widetilde{\beta}_{n_{N}}^{(i)},\widetilde{\delta}_{n_{N}}^{(i)})}\Big)\sqrt{N}(\widehat{\delta}_{m,n_{N}}-\delta_{0,m}), \qquad (51)$$
where $(\widetilde{\beta}_{n_{N}}^{(i)},\widetilde{\delta}_{n_{N}}^{(i)})$ is a mean value, i.e., for some $\lambda_{i,N}\in[0,1]$,
$$(\widetilde{\beta}_{n_{N}}^{(i)},\widetilde{\delta}_{n_{N}}^{(i)})=(\beta_{0},\pi_{n_{N}}\delta_{0})+\lambda_{i,N}(\widehat{\beta}_{n_{N}}-\beta_{0},\widehat{\delta}_{n_{N}}-\pi_{n_{N}}\delta_{0}),$$
so that
$$||\widetilde{\beta}_{n_{N}}^{(i)}-\beta_{0}||\le||\widehat{\beta}_{n_{N}}-\beta_{0}||$$
and
$$||\widetilde{\delta}_{n_{N}}^{(i)}-\pi_{n_{N}}\delta_{0}||_{3}\le||\widehat{\delta}_{n_{N}}-\pi_{n_{N}}\delta_{0}||_{3}.$$
Moreover, $\widehat{\beta}_{s,n_{N}}$ and $\beta_{s,0}$ are components $s$ of $\widehat{\beta}_{n_{N}}$ and $\beta_{0}$, respectively, and $\widehat{\delta}_{m,n_{N}}$ and $\delta_{0,m}$ are components $m$ of $\widehat{\delta}_{n_{N}}$ and $\delta_{0}$, respectively. Similarly, for $k=1,2,\ldots,n_{N}$,
$$o_{p}(1)=\sqrt{N}\,\partial\widehat{Q}_{N}(\beta,\delta)/\partial\delta_{k}\big|_{(\beta,\delta)=(\widehat{\beta}_{n_{N}},\widehat{\delta}_{n_{N}})}$$
$$=\sqrt{N}\,\partial\widehat{Q}_{N}(\beta,\delta)/\partial\delta_{k}\big|_{(\beta,\delta)=(\beta_{0},\pi_{n_{N}}\delta_{0})}$$
$$+\sum_{s=1}^{d-1}\Big(\partial^{2}\widehat{Q}_{N}(\beta,\delta)/(\partial\delta_{k}\partial\beta_{s})\big|_{(\beta,\delta)=(\widehat{\beta}_{n_{N}}^{(k)},\widehat{\delta}_{n_{N}}^{(k)})}\Big)\sqrt{N}(\widehat{\beta}_{s,n_{N}}-\beta_{s,0})$$
$$+\sum_{m=1}^{n_{N}}\Big(\partial^{2}\widehat{Q}_{N}(\beta,\delta)/(\partial\delta_{k}\partial\delta_{m})\big|_{(\beta,\delta)=(\widehat{\beta}_{n_{N}}^{(k)},\widehat{\delta}_{n_{N}}^{(k)})}\Big)\sqrt{N}(\widehat{\delta}_{m,n_{N}}-\delta_{0,m}), \qquad (52)$$
where the $(\widehat{\beta}_{n_{N}}^{(k)},\widehat{\delta}_{n_{N}}^{(k)})$'s are again mean values, satisfying
$$||\widehat{\beta}_{n_{N}}^{(k)}-\beta_{0}||\le||\widehat{\beta}_{n_{N}}-\beta_{0}||$$
and
$$||\widehat{\delta}_{n_{N}}^{(k)}-\pi_{n_{N}}\delta_{0}||_{3}\le||\widehat{\delta}_{n_{N}}-\pi_{n_{N}}\delta_{0}||_{3}.$$
In order to derive the asymptotic normality of $\sqrt{N}(\widehat{\beta}_{n_{N}}-\beta_{0})$ we have to get rid of the terms involving $\sqrt{N}(\widehat{\delta}_{m,n_{N}}-\delta_{0,m})$ in (51) and (52). As said before, the standard finite-dimensional approach of inverting the $(d-1+n_{N})\times(d-1+n_{N})$ matrix of the second derivatives involved is not applicable because $n_{N}\to\infty$. Therefore, in Bierens (2014b) I proposed the following trick to get around this problem. Convert the $d-1+n_{N}$ mean value equations to a single system of mean value equations in function form, using an orthogonal sequence of weight functions $\eta_{k}(u)$ for $k=1,2,\ldots,d-1+n_{N}$ on $[0,1]$. In Bierens (2014b) I have proposed to choose
$$\eta_{k}(u)=2^{-k}\sqrt{2}\cos(k\pi u),\quad k\in\mathbb{N}, \qquad (53)$$
which I will adopt here as well. Then the mean value equations can be written as
$$\widehat{W}_{N}(u)=\widehat{V}_{N}(u)-\widehat{Z}_{N}(u)+\sum_{s=1}^{d-1}\widehat{b}_{s,N}(u)\sqrt{N}(\widehat{\beta}_{s,n_{N}}-\beta_{s,0})+\sum_{m=1}^{n_{N}}\widehat{c}_{m,N}(u)\sqrt{N}(\widehat{\delta}_{m,n_{N}}-\delta_{0,m}), \qquad (54)$$
where
$$\widehat{W}_{N}(u)=\sqrt{N}\sum_{i=1}^{d-1}\Big(\partial\widehat{Q}_{N}(\beta,\delta)/\partial\beta_{i}\big|_{(\beta,\delta)=(\widehat{\beta}_{n_{N}},\widehat{\delta}_{n_{N}})}\Big)\eta_{i}(u)+\sqrt{N}\sum_{k=1}^{n_{N}}\Big(\partial\widehat{Q}_{N}(\beta,\delta)/\partial\delta_{k}\big|_{(\beta,\delta)=(\widehat{\beta}_{n_{N}},\widehat{\delta}_{n_{N}})}\Big)\eta_{d-1+k}(u),$$
$$\widehat{V}_{N}(u)=\sqrt{N}\sum_{i=1}^{d-1}\Big(\partial\widehat{Q}_{N}(\beta,\delta)/\partial\beta_{i}\big|_{(\beta,\delta)=(\beta_{0},\pi_{n_{N}}\delta_{0})}-\partial\widehat{Q}_{N}(\beta,\delta)/\partial\beta_{i}\big|_{(\beta,\delta)=(\beta_{0},\delta_{0})}\Big)\eta_{i}(u)$$
$$+\sqrt{N}\sum_{k=1}^{n_{N}}\Big(\partial\widehat{Q}_{N}(\beta,\delta)/\partial\delta_{k}\big|_{(\beta,\delta)=(\beta_{0},\pi_{n_{N}}\delta_{0})}-\partial\widehat{Q}_{N}(\beta,\delta)/\partial\delta_{k}\big|_{(\beta,\delta)=(\beta_{0},\delta_{0})}\Big)\eta_{d-1+k}(u),$$
$$\widehat{Z}_{N}(u)=-\sqrt{N}\sum_{i=1}^{d-1}\Big(\partial\widehat{Q}_{N}(\beta,\delta)/\partial\beta_{i}\big|_{(\beta,\delta)=(\beta_{0},\delta_{0})}\Big)\eta_{i}(u)-\sqrt{N}\sum_{k=1}^{n_{N}}\Big(\partial\widehat{Q}_{N}(\beta,\delta)/\partial\delta_{k}\big|_{(\beta,\delta)=(\beta_{0},\delta_{0})}\Big)\eta_{d-1+k}(u),$$
$$\widehat{b}_{s,N}(u)=\sum_{i=1}^{d-1}\Big(\partial^{2}\widehat{Q}_{N}(\beta,\delta)/(\partial\beta_{i}\partial\beta_{s})\big|_{(\beta,\delta)=(\widetilde{\beta}_{n_{N}}^{(i)},\widetilde{\delta}_{n_{N}}^{(i)})}\Big)\eta_{i}(u)+\sum_{k=1}^{n_{N}}\Big(\partial^{2}\widehat{Q}_{N}(\beta,\delta)/(\partial\delta_{k}\partial\beta_{s})\big|_{(\beta,\delta)=(\widehat{\beta}_{n_{N}}^{(k)},\widehat{\delta}_{n_{N}}^{(k)})}\Big)\eta_{d-1+k}(u),\quad s=1,2,\ldots,d-1,$$
$$\widehat{c}_{m,N}(u)=\sum_{i=1}^{d-1}\Big(\partial^{2}\widehat{Q}_{N}(\beta,\delta)/(\partial\beta_{i}\partial\delta_{m})\big|_{(\beta,\delta)=(\widetilde{\beta}_{n_{N}}^{(i)},\widetilde{\delta}_{n_{N}}^{(i)})}\Big)\eta_{i}(u)+\sum_{k=1}^{n_{N}}\Big(\partial^{2}\widehat{Q}_{N}(\beta,\delta)/(\partial\delta_{k}\partial\delta_{m})\big|_{(\beta,\delta)=(\widehat{\beta}_{n_{N}}^{(k)},\widehat{\delta}_{n_{N}}^{(k)})}\Big)\eta_{d-1+k}(u),\quad m=1,2,\ldots,n_{N}.$$
It follows straightforwardly from (51) and (52) that
$$\widehat{W}_{N}(u)=o_{p}(1)$$
uniformly in $u\in[0,1]$. Moreover, in the Appendix it will be shown that:

Lemma 9.1. With $n_{N}\propto N^{1/6}$ or faster, $\widehat{V}_{N}(u)=o_{p}(1)$ uniformly in $u\in[0,1]$,

where here and in the lemmas below in this section it is assumed that the conditions of Theorem 9.2 hold and the weight functions $\eta_{k}$ are defined by
(53). Then (54) reads
$$o_{p}(1)=-\widehat{Z}_{N}(u)+\sum_{s=1}^{d-1}\widehat{b}_{s,N}(u)\sqrt{N}(\widehat{\beta}_{s,n_{N}}-\beta_{s,0})+\sum_{m=1}^{n_{N}}\widehat{c}_{m,N}(u)\sqrt{N}(\widehat{\delta}_{m,n_{N}}-\delta_{0,m})$$
$$=-\widehat{Z}_{N}(u)+\widehat{b}_{N}(u)'\sqrt{N}(\widehat{\beta}_{n_{N}}-\beta_{0})+\sum_{m=1}^{n_{N}}\widehat{c}_{m,N}(u)\sqrt{N}(\widehat{\delta}_{m,n_{N}}-\delta_{0,m}), \qquad (55)$$
where $\widehat{b}_{N}(u)'=(\widehat{b}_{1,N}(u),\ldots,\widehat{b}_{d-1,N}(u))$. In order to get rid of the $\widehat{c}_{m,N}(u)$'s in (55), project each $\widehat{b}_{s,N}(u)$ on the space spanned by $\{\widehat{c}_{m,N}(u)\}_{m=1}^{n_{N}}$, and denote the projection residuals involved by $\widehat{a}_{s,N}(u)$. Then $\widehat{a}_{N}(u)=(\widehat{a}_{1,N}(u),\ldots,\widehat{a}_{d-1,N}(u))'$ is the vector of residuals of the projection of $\widehat{b}_{N}(u)$ on the Hilbert space spanned by $\{\widehat{c}_{m,N}(u)\}_{m=1}^{n_{N}}$, so that
$$\int_{0}^{1}\widehat{a}_{N}(u)\widehat{c}_{m,N}(u)\,du=0,$$
$$\int_{0}^{1}\widehat{a}_{N}(u)\widehat{b}_{N}(u)'\,du=\int_{0}^{1}\widehat{a}_{N}(u)\widehat{a}_{N}(u)'\,du,$$
$$\int_{0}^{1}\widehat{a}_{N}(u)'\widehat{a}_{N}(u)\,du\le\int_{0}^{1}\widehat{b}_{N}(u)'\widehat{b}_{N}(u)\,du,$$
hence
$$\int_{0}^{1}\widehat{a}_{N}(u)\,du\times o_{p}(1)=\int_{0}^{1}\widehat{a}_{N}(u)\widehat{Z}_{N}(u)\,du+\Big(\int_{0}^{1}\widehat{a}_{N}(u)\widehat{a}_{N}(u)'\,du\Big)\sqrt{N}(\widehat{\beta}_{n_{N}}-\beta_{0}). \qquad (56)$$
Next, it will be shown in the Appendix that, similarly to Bierens (2014b), the following results hold.

Lemma 9.2. For $s=1,2,\ldots,d-1$,
$$\text{plim}_{N\to\infty}\sup_{0\le u\le 1}|\widehat{b}_{s,N}(u)-b_{s}(u)|=0,$$
where
$$b_{s}(u)=\sum_{i=1}^{d-1}\Big(\partial^{2}Q(\beta,\delta)/(\partial\beta_{i}\partial\beta_{s})\big|_{\beta=\beta_{0},\delta=\delta_{0}}\Big)\eta_{i}(u)+\sum_{k=1}^{\infty}\Big(\partial^{2}Q(\beta,\delta)/(\partial\delta_{k}\partial\beta_{s})\big|_{\beta=\beta_{0},\delta=\delta_{0}}\Big)\eta_{d-1+k}(u),$$
and for $m\in\mathbb{N}$,
$$\text{plim}_{N\to\infty}\sup_{0\le u\le 1}|\widehat{c}_{m,N}(u)-c_{m}(u)|=0,$$
where
$$c_{m}(u)=\sum_{i=1}^{d-1}\Big(\partial^{2}Q(\beta,\delta)/(\partial\beta_{i}\partial\delta_{m})\big|_{\beta=\beta_{0},\delta=\delta_{0}}\Big)\eta_{i}(u)+\sum_{k=1}^{\infty}\Big(\partial^{2}Q(\beta,\delta)/(\partial\delta_{k}\partial\delta_{m})\big|_{\beta=\beta_{0},\delta=\delta_{0}}\Big)\eta_{d-1+k}(u).$$
Hence, the residual $\widehat{a}_{s,N}(u)$ of the projection of $\widehat{b}_{s,N}(u)$ on the space spanned by $\{\widehat{c}_{m,N}(u)\}_{m=1}^{n_{N}}$ satisfies
$$\text{plim}_{N\to\infty}\int_{0}^{1}(\widehat{a}_{s,N}(u)-a_{s}(u))^{2}\,du=0, \qquad (57)$$
where $a_{s}(u)$ is the residual of the projection of $b_{s}(u)$ on the space spanned by $\{c_{m}(u)\}_{m=1}^{\infty}$.
Since $a_{s}(u)\in L^{2}(0,1)$, hence $\int_{0}^{1}a(u)'a(u)\,du<\infty$, where
$$a(u)=(a_{1}(u),\ldots,a_{d-1}(u))',$$
it follows that $\int_{0}^{1}\widehat{a}_{N}(u)\,du=O_{p}(1)$, hence $\int_{0}^{1}\widehat{a}_{N}(u)\,du\times o_{p}(1)=o_{p}(1)$ in (56). Moreover, given that

Assumption 9.5. $\det\Big(\int_{0}^{1}a(u)a(u)'\,du\Big)>0$,

it follows that
$$\text{plim}_{N\to\infty}\Big(\int_{0}^{1}\widehat{a}_{N}(u)\widehat{a}_{N}(u)'\,du\Big)^{-1}=\Big(\int_{0}^{1}a(u)a(u)'\,du\Big)^{-1}.$$
Furthermore, the following result holds.

Lemma 9.3. $\widehat{Z}_{N}(u)$ converges weakly to a zero-mean Gaussian process $Z(u)$, with covariance function
$$\Gamma(u_{1},u_{2})=4\sum_{k=1}^{\infty}\sum_{m=1}^{\infty}E[U^{2}\rho_{k}(X)\rho_{m}(X)]\eta_{k}(u_{1})\eta_{m}(u_{2}),$$
where
$$\rho_{i}(X)=f(X_{1}+\beta_{0}'X_{2}|\delta_{0})X_{i,2},\quad i=1,2,\ldots,d-1,$$
$$\rho_{d-1+k}(X)=\nabla_{k}F(X_{1}+\beta_{0}'X_{2}|\delta_{0}),\quad k\in\mathbb{N}. \qquad (58)$$
Then (56) now reads
$$\sqrt{N}(\widehat{\beta}_{n_{N}}-\beta_{0})=\Big(\int_{0}^{1}a(u)a(u)'\,du\Big)^{-1}\int_{0}^{1}a(u)Z(u)\,du+o_{p}(1).$$
Hence, since $\int_{0}^{1}a(u)Z(u)\,du$ is zero-mean normal, the following result holds.

Theorem 9.3. Let the conditions of Theorem 9.2 hold. Choose $n_{N}\propto N^{1/6}$ or faster and choose the weight functions $\eta_{k}$ as in (53). Then under Assumption 9.5,
$$\sqrt{N}(\widehat{\beta}_{n_{N}}-\beta_{0})\overset{d}{\to}N_{d-1}(0,\Sigma),$$
where
$$\Sigma=\Big(\int_{0}^{1}a(u)a(u)'\,du\Big)^{-1}\int_{0}^{1}\int_{0}^{1}a(u_{1})\Gamma(u_{1},u_{2})a(u_{2})'\,du_{1}du_{2}\times\Big(\int_{0}^{1}a(u)a(u)'\,du\Big)^{-1}.$$
This result depends crucially on Assumption 9.5, i.e.,
$$\det\Big(\int_{0}^{1}a(u)a(u)'\,du\Big)>0.$$
If the latter does not hold then for some nonzero vector $\gamma\in\mathbb{R}^{d-1}$, $\gamma'b(u)\in\overline{\text{span}}(\{c_{k}(u)\}_{k=1}^{\infty})$, the latter being the Hilbert space spanned by the sequence $\{c_{k}(u)\}_{k=1}^{\infty}$, which is a subspace of the Hilbert space $L^{2}(0,1)$.
Let $a_{n}(u)$ be the residual of the projection of $b(u)$ on $\overline{\text{span}}(\{c_{k}(u)\}_{k=1}^{n})$. Then by a well-known projection result in $L^{2}(0,1)$ we have
$$\lim_{n\to\infty}\int_{0}^{1}(a_{n}(u)-a(u))'(a_{n}(u)-a(u))\,du=0,$$
hence
$$\lim_{n\to\infty}\int_{0}^{1}a_{n}(u)a_{n}(u)'\,du=\int_{0}^{1}a(u)a(u)'\,du.$$
Therefore, a necessary condition for Assumption 9.5 is that for all $n\in\mathbb{N}$, $\det\Big(\int_{0}^{1}a_{n}(u)a_{n}(u)'\,du\Big)>0$. As shown in Bierens (2014b), the latter condition holds if and only if for all $n\in\mathbb{N}$ the $(d-1+n)\times(d-1+n)$ matrix of second derivatives with respect to $\beta$ and $\delta(n)=(\delta_{1},\ldots,\delta_{n})'$ of $Q(\beta,\delta)$ in $(\beta_{0},\delta_{0})$ is nonsingular. Thus, denoting
$$B_{d-1+n}=\begin{pmatrix}\frac{\partial^{2}Q(\beta,\delta)}{\partial\beta\partial\beta'}&\frac{\partial^{2}Q(\beta,\delta)}{\partial\beta\partial\delta(n)'}\\[4pt]\frac{\partial^{2}Q(\beta,\delta)}{\partial\delta(n)\partial\beta'}&\frac{\partial^{2}Q(\beta,\delta)}{\partial\delta(n)\partial\delta(n)'}\end{pmatrix}\Bigg|_{(\beta,\delta)=(\beta_{0},\delta_{0})},$$
it follows that:

Lemma 9.4. Assumption 9.5 implies that $\det(B_{d-1+n})>0$ for all $n\in\mathbb{N}$.
Finally, it follows now from Bierens (2014b, Theorem 6.2) that:

Theorem 9.4. The variance matrix $\Sigma$ in Theorem 9.3 takes the form $\Sigma=\lim_{n\to\infty}\Sigma_{n}$, where
$$\Sigma_{n}=(I_{d-1},O_{d-1,n})B_{d-1+n}^{-1}C_{d-1+n}B_{d-1+n}^{-1}\begin{pmatrix}I_{d-1}\\O_{n,d-1}\end{pmatrix},$$
with
$$C_{d-1+n}=4E\Bigg[U^{2}\begin{pmatrix}\rho_{1}(X)\\\vdots\\\rho_{d-1+n}(X)\end{pmatrix}(\rho_{1}(X),\ldots,\rho_{d-1+n}(X))\Bigg],$$
where the $\rho_{k}(X)$'s are defined in (58).
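Computationally, $\Sigma_{n}$ is just the upper-left $(d-1)\times(d-1)$ block of $B_{d-1+n}^{-1}C_{d-1+n}B_{d-1+n}^{-1}$; pre- and post-multiplying by the $(I,O)$ selection matrices is equivalent to plain slicing. A sketch (names are mine):

```python
import numpy as np

def sigma_n(B, C, d):
    """Sigma_n of Theorem 9.4: upper-left (d-1)x(d-1) block of B^{-1} C B^{-1}."""
    Binv = np.linalg.inv(B)
    S = Binv @ C @ Binv
    return S[:d - 1, :d - 1]
```

Equivalently, `J @ S @ J.T` with `J = np.hstack([np.eye(d - 1), np.zeros((d - 1, n))])`, which is the literal translation of the selection matrices above.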
Note that $C_{d-1+n}$ is the variance matrix of the $(d-1+n)$-dimensional score vector
$$\sqrt{N}\begin{pmatrix}\partial\widehat{Q}_{N}(\beta,\delta)/\partial\beta'\big|_{(\beta,\delta)=(\beta_{0},\delta_{0})}\\ \partial\widehat{Q}_{N}(\beta,\delta)/\partial\delta_{1}\big|_{(\beta,\delta)=(\beta_{0},\delta_{0})}\\ \vdots\\ \partial\widehat{Q}_{N}(\beta,\delta)/\partial\delta_{n}\big|_{(\beta,\delta)=(\beta_{0},\delta_{0})}\end{pmatrix}.$$
Moreover, as I have shown in Bierens (2014b), the existence of the limit $\Sigma=\lim_{n\to\infty}\Sigma_{n}$ is implied by Assumption 9.5, which is plausible but difficult, if not impossible, to verify. Finally, in order to use the results in Theorems 9.3 and 9.4 for inference on $\beta_{0}$ we need a consistent estimator of $\Sigma$, which can be derived as follows. Suppose at first that $\delta_{0}$ is finite-dimensional, so that $\delta_{0}=\pi_{n}\delta_{0}$ for some fixed $n\in\mathbb{N}$. Then the nonlinear least squares (NLLS) problem involved becomes fully parametric, hence under standard NLLS conditions,$^{6}$
$$\sqrt{N}(\widehat{\beta}_{n}-\beta_{0})\overset{d}{\to}N_{d-1}(0,\Sigma_{n}),$$
where $\Sigma_{n}$ is the same as in Theorem 9.4. In this case $\Sigma_{n}$ can be consistently estimated by
$$\widehat{\Sigma}_{n}=(I_{d-1},O_{d-1,n})\widehat{B}_{d-1+n}^{-1}\widehat{C}_{d-1+n}\widehat{B}_{d-1+n}^{-1}\begin{pmatrix}I_{d-1}\\O_{n,d-1}\end{pmatrix},$$
where $\widehat{B}_{d-1+n}$ and $\widehat{C}_{d-1+n}$ are the usual consistent estimators of $B_{d-1+n}$ and $C_{d-1+n}$, respectively, in the fixed-$n$ case. Then it is not hard to verify that, under the conditions of Theorems 9.3 and 9.4,
$$\text{plim}_{N\to\infty}\widehat{\Sigma}_{n_{N}}=\Sigma.$$
10 Concluding remarks
In this paper I have set forth conditions under which SNP densities on $[0,1]$ and $\mathbb{R}$ based on the cosine sequence, and their first, second and third derivatives, are uniformly convergent. Moreover, I have shown that without much loss
$^{6}$See for example Jennrich (1969).
of generality the infinite-dimensional parameter involved can be confined to a compact infinite-dimensional parameter space. The usefulness of these results has been demonstrated by an application to the SNP fractional index regression model.
Note that these results are not directly applicable to SNP maximum likelihood models, like the SNP discrete choice model considered in Bierens (2014b). In the latter case it was assumed that, with $G$ the logistic distribution function with density $g$, and $f_{0}$ the density of the c.d.f. $F_{0}$ in the SNP discrete choice model $\Pr[Y=1|X]=F_{0}(X'\theta_{0})$, the density $h_{0}(u)=f_{0}(G^{-1}(u))/g(G^{-1}(u))$ satisfies $h_{0}(0)>0$ and $h_{0}(1)>0$. The latter conditions enabled me to derive the asymptotic normality of the sieve estimator of $\theta_{0}$, similarly to Theorems 9.3 and 9.4. However, under the uniform convergence conditions in the present paper, with $G$ the c.d.f. of the standard Cauchy distribution, we have $h_{0}(0)=h_{0}(1)=0$, which would make this asymptotic normality proof very complicated. See Bierens (2014b, Remark C). On the other hand, the discrete choice model involved can be estimated by sieve nonlinear least squares under similar conditions as in section 9, and with similar results as in Theorems 9.3 and 9.4, but at the expense of a loss of efficiency.
11 Appendix: Proofs
11.1 Lemma 4.1
For $K\in\mathbb{N}$ and $c>1/2$ we have, by Lyapunov's inequality,
$$\frac{\sum_{m=n+1}^{n+K}m^{-c}|\alpha_{m}|}{\sum_{k=n+1}^{n+K}k^{-2c}}=\sum_{m=n+1}^{n+K}\Bigg(\frac{m^{-2c}}{\sum_{k=n+1}^{n+K}k^{-2c}}\Bigg)m^{c}|\alpha_{m}|\le\sqrt{\sum_{m=n+1}^{n+K}\Bigg(\frac{m^{-2c}}{\sum_{k=n+1}^{n+K}k^{-2c}}\Bigg)m^{2c}\alpha_{m}^{2}}=\frac{\sqrt{\sum_{m=n+1}^{n+K}\alpha_{m}^{2}}}{\sqrt{\sum_{k=n+1}^{n+K}k^{-2c}}}.$$
Letting $K\to\infty$ it follows that
$$\sum_{m=n+1}^{\infty}m^{-c}|\alpha_{m}|\le\sqrt{\sum_{m=n+1}^{\infty}\alpha_{m}^{2}}\sqrt{\sum_{k=n+1}^{\infty}k^{-2c}}.$$
Since $\sum_{k=n+1}^{\infty}k^{-2c}\le\sum_{k=n+1}^{\infty}\int_{k-1}^{k}x^{-2c}\,dx=\int_{n}^{\infty}x^{-2c}\,dx=\frac{1}{2c-1}n^{1-2c}$ and $\sum_{m=n+1}^{\infty}\alpha_{m}^{2}=o(1)$, Lemma 4.1 follows.
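The resulting tail bound $\sum_{m>n}m^{-c}|\alpha_{m}|\le\sqrt{\sum_{m>n}\alpha_{m}^{2}}\cdot\sqrt{\sum_{k>n}k^{-2c}}$ is a Cauchy-Schwarz-type inequality, and it is easy to sanity-check numerically on a finite truncation. The following sketch (with arbitrary test values; it is an illustration, not part of the proof) does exactly that:

```python
import numpy as np

def tail_bound_holds(alpha, n, c, K=100000):
    """Check the displayed inequality on the finite range m = n+1,...,n+K,
    where alpha[0] stands for alpha_1."""
    a = np.asarray(alpha, dtype=float)[n:n + K]
    m = np.arange(n + 1, n + 1 + len(a), dtype=float)
    lhs = np.sum(m**(-c) * np.abs(a))
    rhs = np.sqrt(np.sum(a**2)) * np.sqrt(np.sum(m**(-2.0 * c)))
    return bool(lhs <= rhs + 1e-12)
```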
11.2 Lemma 6.1
There exists an $x_{0}>0$ such that either $\psi(x)>0$ for all $x>x_{0}$ or $\psi(x)<0$ for all $x>x_{0}$. Without loss of generality we may assume that the former case applies. Thus, $\psi(x)>0$ for all $x>x_{0}$. Now suppose that for some $M\in(0,\infty)$, $\lim_{y\to\infty}\sup_{x\ge y}x\psi(x)\ge M$. Then there exists a $y_{0}>x_{0}$ such that for all $x\ge y_{0}$, $x\psi(x)>M/2$, which implies that $\int_{y_{0}}^{\infty}\psi(x)\,dx\ge(M/2)\int_{y_{0}}^{\infty}x^{-1}\,dx=\infty$. However, the latter contradicts $\int_{-\infty}^{\infty}|\psi(x)|\,dx<\infty$. Thus, $\lim_{y\to\infty}\sup_{x\ge y}x\psi(x)=0$, which implies that $\lim_{x\to\infty}x\psi(x)=0$. By a similar argument it can be shown that $\lim_{x\to-\infty}x\psi(x)=0$.
11.3 Theorem 8.1
The result of Lemma 3.1 in Bierens (2017, p. 631), which is in essence part (b) of Theorem 8.1, hinges on Lemma A.1 in Bierens (2008), reprinted in Bierens (2017, Chapter 7, p. 455). In particular, note that
$$\Delta_{\ell}(M)\subset\mathop{\times}_{m=1}^{\infty}[-M\,m^{-\ell},M\,m^{-\ell}]=\Delta_{\ell}^{\times}(M), \qquad (59)$$
say, where by Lemma A.1 in Bierens (2008) the latter set is compact with respect to the pseudo-Euclidean metric $||\delta_{1}-\delta_{2}||$, provided that $\sum_{m=1}^{\infty}m^{-2\ell}<\infty$. The latter condition holds if and only if $\ell>0.5$.
In Lemma 3.1 in Bierens (2017, p. 631) I have used the fact that a set is compact if and only if it is totally bounded and complete. See Royden (1968, Proposition 15, p. 164). Since $\Delta_{\ell}^{\times}(M)$ is compact and therefore totally bounded, it follows trivially from (59) that $\Delta_{\ell}(M)$ is totally bounded. Completeness of $\Delta_{\ell}(M)$ means that every Cauchy sequence in $\Delta_{\ell}(M)$ has a limit in $\Delta_{\ell}(M)$. This completeness proof is easy. See Bierens (2017, p. 632). Hence, $\Delta_{\ell}(M)$ is compact with respect to the pseudo-Euclidean metric $||\delta_{1}-\delta_{2}||$, which proves part (b) of Theorem 8.1.
To prove part (a), I will employ another criterion for compactness, namely that a set is compact if and only if it is sequentially compact. See Royden (1968, Corollary 14, p. 163). Sequential compactness means that every sequence in the set involved has a convergent subsequence. Since by part (b) of Theorem 8.1 $\Delta_{\ell}(M)$ is compact with respect to the pseudo-Euclidean metric $||\delta_{1}-\delta_{2}||$, it follows therefore that for an arbitrary sequence $\delta_{n}=\{\delta_{n,m}\}_{m=1}^{\infty}\in\Delta_{\ell}(M)$ there exist a subsequence $n_{k}$ and an element $\delta=\{\delta_{m}\}_{m=1}^{\infty}\in\Delta_{\ell}(M)$ such that $\lim_{k\to\infty}||\delta_{n_{k}}-\delta||=0$, which trivially implies that
$$\lim_{k\to\infty}\sup_{m\in\mathbb{N}}|\delta_{n_{k},m}-\delta_{m}|=0.$$
To show that $\lim_{k\to\infty}||\delta_{n_{k}}-\delta||_{\ell}=0$ as well, note that for any $L\in\mathbb{N}$ we have
$$\limsup_{k\to\infty}\sum_{m=1}^{\infty}m^{\ell}|\delta_{n_{k},m}-\delta_{m}|\le\limsup_{k\to\infty}\sum_{m=1}^{L}m^{\ell}|\delta_{n_{k},m}-\delta_{m}|+\limsup_{k\to\infty}\sum_{m=1+L}^{\infty}m^{\ell}|\delta_{n_{k},m}-\delta_{m}|=\limsup_{k\to\infty}\sum_{m=1+L}^{\infty}m^{\ell}|\delta_{n_{k},m}-\delta_{m}|.$$
Clearly, the latter "lim sup" is invariant to $L$. Next, denote $S_{k}(L)=\sum_{m=1+L}^{\infty}m^{\ell}|\delta_{n_{k},m}-\delta_{m}|$ and let $\limsup_{k\to\infty}S_{k}(L)=\eta$, which does not depend on $L$. Then by the definition of "lim sup",
$$\eta=\inf_{L\in\mathbb{N}}\Big(\limsup_{k\to\infty}S_{k}(L)\Big)=\inf_{L\in\mathbb{N}}\inf_{s\in\mathbb{N}}\sup_{k\ge s}S_{k}(L)\le\inf_{L\in\mathbb{N}}\sup_{k\ge s}S_{k}(L)\ \text{for all }s\in\mathbb{N}.$$
Since $\sup_{k\ge s}S_{k}(L)$ is decreasing in $L$ we have
$$\inf_{L\in\mathbb{N}}\sup_{k\ge s}S_{k}(L)=\sup_{k\ge s}S_{k}(\infty)=\sup_{k\ge s}\lim_{L\to\infty}S_{k}(L),$$
hence, for an arbitrary $s\in\mathbb{N}$,
$$\eta\le\sup_{k\ge s}\lim_{L\to\infty}S_{k}(L)=\sup_{k\ge s}\Big(\lim_{L\to\infty}\sum_{m=1+L}^{\infty}m^{\ell}|\delta_{n_{k},m}-\delta_{m}|\Big)=0.$$
The latter follows from the fact that $\sum_{m=1}^{\infty}m^{\ell}|\delta_{n_{k},m}-\delta_{m}|\le 2M$. Consequently, $\lim_{k\to\infty}||\delta_{n_{k}}-\delta||_{\ell}=0$ as well. This completes the proof of part (a) of Theorem 8.1, including the correction of the proof of Lemma 3.2 in Bierens (2017, pp. 632-633).
11.4 Theorem 9.2
First note that, by Theorem 7.2, for each $\beta\in B$, $F(X_{1}+\beta'X_{2}|\delta)$ is a.s. continuous in $\delta\in\Delta_{3}(M)$ with respect to the metric $||\delta_{1}-\delta_{2}||_{3}$, whereas, trivially, for each $\delta\in\Delta_{3}(M)$, $F(X_{1}+\beta'X_{2}|\delta)$ is a.s. continuous in $\beta$ with respect to the Euclidean metric $||\beta_{1}-\beta_{2}||$. Consequently, $F(X_{1}+\beta'X_{2}|\delta)$ is a.s. continuous in $(\beta,\delta)\in B\times\Delta_{3}(M)$ with respect to the combined metric $||\beta_{1}-\beta_{2}||+||\delta_{1}-\delta_{2}||_{3}$, for example, and so is $(Y-F(X_{1}+\beta'X_{2}|\delta))^{2}$. It follows now straightforwardly from Jennrich's (1969) uniform strong law of large numbers that $\sup_{(\beta,\delta)\in B\times\Delta_{3}(M)}\big|\widehat{Q}_{N}(\beta,\delta)-Q(\beta,\delta)\big|\overset{a.s.}{\to}0$ as $N\to\infty$, and therefore
$$\sup_{(\beta,\delta)\in B\times\Delta_{3,n_{N}}(M)}\big|\widehat{Q}_{N}(\beta,\delta)-Q(\beta,\delta)\big|\overset{a.s.}{\to}0 \qquad (60)$$
as well. Moreover, by the continuity of $Q(\beta,\delta)$ we have
$$\lim_{N\to\infty}Q(\beta_{0},\pi_{n_{N}}\delta_{0})=Q(\beta_{0},\delta_{0}). \qquad (61)$$
Next, observe that
$$0\le Q(\widehat{\beta}_{n_{N}},\widehat{\delta}_{n_{N}})-Q(\beta_{0},\delta_{0})=Q(\widehat{\beta}_{n_{N}},\widehat{\delta}_{n_{N}})-\widehat{Q}_{N}(\widehat{\beta}_{n_{N}},\widehat{\delta}_{n_{N}})+\widehat{Q}_{N}(\widehat{\beta}_{n_{N}},\widehat{\delta}_{n_{N}})-Q(\beta_{0},\delta_{0})$$
$$\le\sup_{(\beta,\delta)\in B\times\Delta_{3,n_{N}}(M)}\big|\widehat{Q}_{N}(\beta,\delta)-Q(\beta,\delta)\big|+\widehat{Q}_{N}(\beta_{0},\pi_{n_{N}}\delta_{0})-Q(\beta_{0},\delta_{0})$$
$$\le 2\sup_{(\beta,\delta)\in B\times\Delta_{3,n_{N}}(M)}\big|\widehat{Q}_{N}(\beta,\delta)-Q(\beta,\delta)\big|+Q(\beta_{0},\pi_{n_{N}}\delta_{0})-Q(\beta_{0},\delta_{0})\overset{a.s.}{\to}0,$$
where the last result follows from (60) and (61), hence
$$Q(\widehat{\beta}_{n_{N}},\widehat{\delta}_{n_{N}})\overset{a.s.}{\to}Q(\beta_{0},\delta_{0}). \qquad (62)$$
Again, by the continuity of $Q(\beta,\delta)$ and the uniqueness of $(\beta_{0},\delta_{0})$, (62) implies, similarly to standard nonlinear regression, that $\widehat{\beta}_{n_{N}}\overset{a.s.}{\to}\beta_{0}$ and $||\widehat{\delta}_{n_{N}}-\delta_{0}||_{3}\overset{a.s.}{\to}0$. The uniform a.s. convergence results for $F_{0}$, $f_{0}$ and $f_{0}'$ follow now from Theorem 7.2.
11.5 Lemma 9.1
Note that
$$\partial\widehat{Q}_{N}(\beta,\delta)/\partial\beta_{i}=-2\frac{1}{N}\sum_{j=1}^{N}(Y_{j}-F(X_{1,j}+\beta'X_{2,j}|\delta))f(X_{1,j}+\beta'X_{2,j}|\delta)X_{i,2,j},\quad i=1,2,\ldots,d-1, \qquad (63)$$
where $X_{i,2,j}$ is component $i$ of $X_{2,j}$, and, using the notation $\nabla_{k}=\partial/\partial\delta_{k}$ as in (41),
$$\partial\widehat{Q}_{N}(\beta,\delta)/\partial\delta_{k}=-2\frac{1}{N}\sum_{j=1}^{N}(Y_{j}-F(X_{1,j}+\beta'X_{2,j}|\delta))\nabla_{k}F(X_{1,j}+\beta'X_{2,j}|\delta),\quad k=1,2,\ldots,n_{N}. \qquad (64)$$
Hence, by (53),
$$E\Big[\sup_{0\le u\le 1}|\widehat{V}_{N}(u)|\Big]\le 2\sqrt{2}\Big(\sum_{i=1}^{d-1}2^{-i}\Big)\sqrt{N}\times E\big[\big|(Y-F(X_{1}+\beta_{0}'X_{2}|\pi_{n_{N}}\delta_{0}))f(X_{1}+\beta_{0}'X_{2}|\pi_{n_{N}}\delta_{0})-(Y-F(X_{1}+\beta_{0}'X_{2}|\delta_{0}))f(X_{1}+\beta_{0}'X_{2}|\delta_{0})\big|\cdot||X_{2}||\big]$$
$$+2\sqrt{2}\,2^{-d+1}\sqrt{N}\sum_{k=1}^{n_{N}}2^{-k}\times E\big[\big|(Y-F(X_{1}+\beta_{0}'X_{2}|\pi_{n_{N}}\delta_{0}))\nabla_{k}F(X_{1}+\beta_{0}'X_{2}|\pi_{n_{N}}\delta_{0})-(Y-F(X_{1}+\beta_{0}'X_{2}|\delta_{0}))\nabla_{k}F(X_{1}+\beta_{0}'X_{2}|\delta_{0})\big|\big].$$
Moreover,
$$E\big[\big|(Y-F(X_{1}+\beta_{0}'X_{2}|\pi_{n_{N}}\delta_{0}))f(X_{1}+\beta_{0}'X_{2}|\pi_{n_{N}}\delta_{0})-(Y-F(X_{1}+\beta_{0}'X_{2}|\delta_{0}))f(X_{1}+\beta_{0}'X_{2}|\delta_{0})\big|\cdot||X_{2}||\big]$$
$$\le 2E[||X_{2}||]\cdot\sup_{x\in\mathbb{R}}\big|f(x|\pi_{n_{N}}\delta_{0})-f(x|\delta_{0})\big|+E[||X_{2}||]\sup_{x\in\mathbb{R}}f(x|\delta_{0})\cdot\sup_{x\in\mathbb{R}}\big|F(x|\delta_{0})-F(x|\pi_{n_{N}}\delta_{0})\big|$$
$$\le E[||X_{2}||]\Big(2+\sup_{x\in\mathbb{R}}f(x|\delta_{0})\Big)\sup_{x\in\mathbb{R}}\big|f(x|\pi_{n_{N}}\delta_{0})-f(x|\delta_{0})\big|=o(n_{N}^{-7/2}),$$
where the latter follows from part (39) of Theorem 7.1. Similarly,
$$E\big[\big|(Y-F(X_{1}+\beta_{0}'X_{2}|\pi_{n_{N}}\delta_{0}))\nabla_{k}F(X_{1}+\beta_{0}'X_{2}|\pi_{n_{N}}\delta_{0})-(Y-F(X_{1}+\beta_{0}'X_{2}|\delta_{0}))\nabla_{k}F(X_{1}+\beta_{0}'X_{2}|\delta_{0})\big|\big]$$
$$\le 2\sup_{x\in\mathbb{R}}\big|\nabla_{k}F(x|\pi_{n_{N}}\delta_{0})-\nabla_{k}F(x|\delta_{0})\big|+\sup_{x\in\mathbb{R}}\big|\nabla_{k}F(x|\delta_{0})\big|\cdot\sup_{x\in\mathbb{R}}\big|F(x|\delta_{0})-F(x|\pi_{n_{N}}\delta_{0})\big|$$
$$\le 2\sup_{x\in\mathbb{R}}\big|\nabla_{k}f(x|\pi_{n_{N}}\delta_{0})-\nabla_{k}f(x|\delta_{0})\big|+\sup_{x\in\mathbb{R}}\big|\nabla_{k}f(x|\delta_{0})\big|\cdot\sup_{x\in\mathbb{R}}\big|f(x|\delta_{0})-f(x|\pi_{n_{N}}\delta_{0})\big|$$
$$\le 2C_{0}\,||\delta_{0}-\pi_{n_{N}}\delta_{0}||_{0}+o(n_{N}^{-7/2}),$$
where the latter follows from Theorems 7.1 and 7.2. Hence,
$$E\Big[\sup_{0\le u\le 1}|\widehat{V}_{N}(u)|\Big]=o(n_{N}^{-7/2}\sqrt{N})+O\big(\sqrt{N}\,||\delta_{0}-\pi_{n_{N}}\delta_{0}||_{0}\big)=o(n_{N}^{-3}\sqrt{N}),$$
where the last equality follows from
$$||\delta_{0}-\pi_{n}\delta_{0}||_{0}=\sum_{m=n+1}^{\infty}|\delta_{0,m}|\le n^{-3}\sum_{m=n+1}^{\infty}m^{3}|\delta_{0,m}|=o(n^{-3}).$$
Thus, $\lim_{N\to\infty}E\big[\sup_{0\le u\le 1}|\widehat{V}_{N}(u)|\big]=0$ if $n_{N}\propto N^{1/6}$ or faster.
11.6 Lemma 9.2
Recall that for s = 1, ..., d− 1,
bbs,N(u) =d−1Xi=1
µ∂2 bQN(β, δ)/(∂βi∂βs)¯̄̄
(β,δ)=(eβ(i)nN ,eδ(i)nN )¶2−i√2 cos(iπu)
+
nNXk=1
µ∂2 bQN(β, δ)/(∂δk∂βs)¯̄̄
(β,δ)=(bβ(k)nN,bδ(k)nN
)
¶2−d+1−k
×√2 cos((d+ 1 + k)πu)where
∂2 bQN(β, δ)/∂βi∂βs= −2 1
N
NXj=1
(Yj − F (X1,j + β0X2,j|δ))f 0(X1,j + β0X2,j|δ)Xi,2,jXs,2,j
+21
N
NXj=1
(f(X1,j + β0X2,j|δ)))2Xi,2,jXs,2,j, i = 1, 2, ..., d− 1.
Hence, by the uniform strong law of large numbers of Jennrich (1969),
$$
\sup_{(\beta,\delta)\in B\times\Delta_3(M)}\left|\partial^2\widehat{Q}_N(\beta,\delta)/(\partial\beta_i\partial\beta_s)-E\left[\partial^2\widehat{Q}_N(\beta,\delta)/(\partial\beta_i\partial\beta_s)\right]\right|\stackrel{a.s.}{\to}0,
$$
and thus
$$
\partial^2\widehat{Q}_N(\beta,\delta)/(\partial\beta_i\partial\beta_s)\Big|_{(\beta,\delta)=(\widetilde{\beta}^{(i)}_{n_N},\widetilde{\delta}^{(i)}_{n_N})}\stackrel{a.s.}{\to}E\left[\partial^2\widehat{Q}_N(\beta,\delta)/(\partial\beta_i\partial\beta_s)\right]\Big|_{\beta=\beta_0,\delta=\delta^0},
$$
where
$$
\begin{aligned}
E\left[\partial^2\widehat{Q}_N(\beta,\delta)/(\partial\beta_i\partial\beta_s)\right]\Big|_{\beta=\beta_0,\delta=\delta^0}
&=-2E[U f'(X_1+\beta_0'X_2|\delta^0)X_{i,2}X_{s,2}]+2E\left[\left(f(X_1+\beta_0'X_2|\delta^0)\right)^2X_{i,2}X_{s,2}\right]\\
&=2E\left[\left(f(X_1+\beta_0'X_2|\delta^0)\right)^2X_{i,2}X_{s,2}\right]\\
&=\partial^2Q(\beta,\delta)/(\partial\beta_i\partial\beta_s)\big|_{\beta=\beta_0,\delta=\delta^0}.
\end{aligned}
$$
Note that the second equality follows from $E[U|X]=0$ a.s. and the last equality follows by dominated convergence.
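The step where $E[U|X]=0$ a.s. kills the cross term can be illustrated by simulation. The sketch below uses a hypothetical Bernoulli design with a logistic conditional mean (not the paper's SNP specification) and an arbitrary function $g(x)=x\cos(x)$; the sample analogue of $E[U\,g(X)]$ is then close to zero.

```python
# Monte Carlo illustration (hypothetical DGP, not the paper's model):
# if U = Y - E[Y|X], then E[U.g(X)] = 0 for any integrable g(X), which
# is why the term -2E[U f'(.)X_i X_s] vanishes at the true parameters.

import math
import random

random.seed(42)

n = 200_000
mean_ug = 0.0
for _ in range(n):
    x = random.gauss(0.0, 1.0)
    p = 1.0 / (1.0 + math.exp(-x))   # E[Y|X = x] for a Bernoulli outcome
    y = 1.0 if random.random() < p else 0.0
    u = y - p                        # conditional-mean-zero error
    mean_ug += u * x * math.cos(x) / n

# The sample analogue of E[U.g(X)] with g(x) = x cos(x) is near zero
# (its Monte Carlo standard error here is well below 0.01).
assert abs(mean_ug) < 0.01
```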
Similarly,
$$
\partial^2\widehat{Q}_N(\beta,\delta)/(\partial\delta_k\partial\beta_s)\Big|_{(\beta,\delta)=(\widehat{\beta}^{(k)}_{n_N},\widehat{\delta}^{(k)}_{n_N})}\stackrel{a.s.}{\to}E\left[\partial^2\widehat{Q}_N(\beta,\delta)/(\partial\delta_k\partial\beta_s)\right]\Big|_{\beta=\beta_0,\delta=\delta^0},
$$
where
$$
E\left[\partial^2\widehat{Q}_N(\beta,\delta)/(\partial\delta_k\partial\beta_s)\right]\Big|_{\beta=\beta_0,\delta=\delta^0}
=2E\left[(\nabla_kF(X_1+\beta_0'X_2|\delta^0))f(X_1+\beta_0'X_2|\delta^0)X_{s,2}\right]
=\partial^2Q(\beta,\delta)/(\partial\delta_k\partial\beta_s)\big|_{\beta=\beta_0,\delta=\delta^0},
$$
where the differentiation operator $\nabla_k$ is defined similarly to (41). Finally, it is easy to verify that, uniformly in $k\in\mathbb{N}$ and $s=1,2,\ldots,d-1$,
$$
\partial^2\widehat{Q}_N(\beta,\delta)/(\partial\delta_k\partial\beta_s)\Big|_{(\beta,\delta)=(\widehat{\beta}^{(k)}_{n_N},\widehat{\delta}^{(k)}_{n_N})}=O_p(1),
$$
hence
$$
\sum_{k=n+1}^{\infty}\left(\partial^2\widehat{Q}_N(\beta,\delta)/(\partial\delta_k\partial\beta_s)\Big|_{(\beta,\delta)=(\widehat{\beta}^{(k)}_{n_N},\widehat{\delta}^{(k)}_{n_N})}\right)2^{-d+1-k}=O_p\left(\sum_{k=n+1}^{\infty}2^{-k}\right)=O_p(2^{-n})=o_p(1).
$$
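The geometric-tail step above rests on the identity $\sum_{k=n+1}^{\infty}2^{-k}=2^{-n}$; a minimal numerical check:

```python
# Check of the geometric-tail identity behind the O_p(2^{-n}) bound:
# sum_{k=n+1}^infty 2^{-k} = 2^{-n}, so uniformly O_p(1) terms weighted
# by 2^{-d+1-k} leave an O_p(2^{-n}) remainder.

def geometric_tail(n, K=200):
    """Truncated tail sum sum_{k=n+1}^{K} 2^{-k} (the rest is < 2^{-K})."""
    return sum(2.0 ** -k for k in range(n + 1, K + 1))

for n in (1, 5, 20):
    assert abs(geometric_tail(n) - 2.0 ** -n) < 1e-12
```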
It follows now straightforwardly that for $s=1,2,\ldots,d-1$,
$$
\operatorname*{plim}_{N\to\infty}\sup_{0\le u\le1}\big|\widehat{b}_{s,N}(u)-b_s(u)\big|=0,
$$
where
$$
\begin{aligned}
b_s(u)={}&\sum_{i=1}^{d-1}\left(\partial^2Q(\beta,\delta)/(\partial\beta_i\partial\beta_s)\big|_{\beta=\beta_0,\delta=\delta^0}\right)2^{-i}\sqrt{2}\cos(i\pi u)\\
&+\sum_{k=1}^{\infty}\left(\partial^2Q(\beta,\delta)/(\partial\delta_k\partial\beta_s)\big|_{\beta=\beta_0,\delta=\delta^0}\right)2^{-d+1-k}\sqrt{2}\cos((d-1+k)\pi u).
\end{aligned}
$$
Along similar lines it can be shown that for $m\in\mathbb{N}$,
$$
\operatorname*{plim}_{N\to\infty}\sup_{0\le u\le1}\big|\widehat{c}_{m,N}(u)-c_m(u)\big|=0,
$$
where
$$
\begin{aligned}
c_m(u)={}&\sum_{i=1}^{d-1}\left(\partial^2Q(\beta,\delta)/(\partial\beta_i\partial\delta_m)\big|_{\beta=\beta_0,\delta=\delta^0}\right)2^{-i}\sqrt{2}\cos(i\pi u)\\
&+\sum_{k=1}^{\infty}\left(\partial^2Q(\beta,\delta)/(\partial\delta_k\partial\delta_m)\big|_{\beta=\beta_0,\delta=\delta^0}\right)2^{-d+1-k}\sqrt{2}\cos((d-1+k)\pi u).
\end{aligned}
$$
The result (57) now follows from Lemma B.1 in Bierens (2014b).
11.7 Lemma 9.3
Recall that
$$
\begin{aligned}
\widehat{Z}_N(u)={}&-\sqrt{N}\sum_{i=1}^{d-1}\left(\partial\widehat{Q}_N(\beta,\delta)/\partial\beta_i\Big|_{(\beta,\delta)=(\beta_0,\delta^0)}\right)2^{-i}\sqrt{2}\cos(i\pi u)\\
&-\sqrt{N}\sum_{k=1}^{n_N}\left(\partial\widehat{Q}_N(\beta,\delta)/\partial\delta_k\Big|_{(\beta,\delta)=(\beta_0,\delta^0)}\right)2^{-d+1-k}\sqrt{2}\cos((d-1+k)\pi u).
\end{aligned}
$$
By (63) and (64) we have
$$
\sqrt{N}\,\partial\widehat{Q}_N(\beta,\delta)/\partial\beta_i\Big|_{(\beta,\delta)=(\beta_0,\delta^0)}=-2\frac{1}{\sqrt{N}}\sum_{j=1}^{N}U_jf(X_{1,j}+\beta_0'X_{2,j}|\delta^0)X_{i,2,j},\quad i=1,2,\ldots,d-1,
$$
$$
\sqrt{N}\,\partial\widehat{Q}_N(\beta,\delta)/\partial\delta_k\Big|_{(\beta,\delta)=(\beta_0,\delta^0)}=-2\frac{1}{\sqrt{N}}\sum_{j=1}^{N}U_j\nabla_kF(X_{1,j}+\beta_0'X_{2,j}|\delta^0),\quad k=1,2,\ldots,n_N,
$$
where $U_j=Y_j-F(X_{1,j}+\beta_0'X_{2,j}|\delta^0)$. Recall that $E[U_j|X_j]=0$ a.s., where $X_j=(X_{1,j},X_{2,j}')'$.
Hence, denoting
$$
\rho_i(X_j)=f(X_{1,j}+\beta_0'X_{2,j}|\delta^0)X_{i,2,j},\quad i=1,2,\ldots,d-1,\qquad
\rho_{d-1+k}(X_j)=\nabla_kF(X_{1,j}+\beta_0'X_{2,j}|\delta^0),\quad k\in\mathbb{N},
$$
we can write
$$
\widehat{Z}_N(u)=\frac{2}{\sqrt{N}}\sum_{j=1}^{N}U_j\left(\sum_{k=1}^{d-1+n_N}\rho_k(X_j)2^{-k}\sqrt{2}\cos(k\pi u)\right).
$$
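The reindexing that merges the $\beta$-block and $\delta$-block into a single cosine sum can be verified with toy numbers (the $\rho_k$ values below are arbitrary placeholders, not model quantities): splitting the combined sum at $k=d-1$ must reproduce the two blocks, with the $\delta$-block carrying weights $2^{-d+1-k}=2^{-(d-1+k)}$ and frequencies $(d-1+k)\pi$.

```python
# Check of the reindexing (toy numbers): splitting the combined sum
# sum_{k=1}^{d-1+n} r_k 2^{-k} sqrt(2) cos(k pi u) at k = d-1 reproduces
# the beta block (i = 1..d-1) and the delta block (weights 2^{-d+1-k},
# frequencies (d-1+k) pi, k = 1..n).

import math

d, n = 4, 6
r = [0.0] + [math.sin(j) + 2.0 for j in range(1, d + n)]  # r[1..d-1+n], arbitrary

def combined(u):
    return sum(r[k] * 2.0 ** -k * math.sqrt(2) * math.cos(k * math.pi * u)
               for k in range(1, d - 1 + n + 1))

def split(u):
    beta_block = sum(r[i] * 2.0 ** -i * math.sqrt(2) * math.cos(i * math.pi * u)
                     for i in range(1, d))
    delta_block = sum(r[d - 1 + k] * 2.0 ** (-d + 1 - k) * math.sqrt(2)
                      * math.cos((d - 1 + k) * math.pi * u)
                      for k in range(1, n + 1))
    return beta_block + delta_block

for u in (0.0, 0.25, 0.7, 1.0):
    assert abs(combined(u) - split(u)) < 1e-12
```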
Next, let
$$
\widetilde{Z}_N(u)=\frac{2}{\sqrt{N}}\sum_{j=1}^{N}U_j\left(\sum_{k=1}^{\infty}\rho_k(X_j)2^{-k}\sqrt{2}\cos(k\pi u)\right).
$$
Then
$$
\begin{aligned}
E\left[\int_0^1\left(\widetilde{Z}_N(u)-\widehat{Z}_N(u)\right)^2du\right]
&=\frac{4}{N}\sum_{j=1}^{N}E\left[U_j^2\int_0^1\left(\sum_{k=d+n_N}^{\infty}\rho_k(X_j)2^{-k}\sqrt{2}\cos(k\pi u)\right)^2du\right]\\
&=\frac{4}{N}\sum_{j=1}^{N}E\left[U_j^2\sum_{k=d+n_N}^{\infty}\rho_k(X_j)^22^{-2k}\right]\\
&=4\sum_{k=d+n_N}^{\infty}E\left[U^2\rho_k(X)^2\right]2^{-2k}\to0\ \text{as}\ N\to\infty,
\end{aligned}
$$
hence $\widehat{Z}_N(u)\Rightarrow Z(u)$ if $\widetilde{Z}_N(u)\Rightarrow Z(u)$, where $\Rightarrow$ indicates weak convergence. But $\widetilde{Z}_N(u)\Rightarrow Z(u)$ follows from Billingsley (1968, Theorem 8.2), where, by the standard central limit theorem, $Z(u)$ is a zero-mean Gaussian process on $[0,1]$ with covariance function
$$
\Gamma(u_1,u_2)=E[Z(u_1)Z(u_2)]=E\left[\widetilde{Z}_N(u_1)\widetilde{Z}_N(u_2)\right]
=4\sum_{k=1}^{\infty}\sum_{m=1}^{\infty}E[U^2\rho_k(X)\rho_m(X)]2^{-k-m+1}\cos(k\pi u_1)\cos(m\pi u_2).
$$
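Both the $L^2$ truncation computation and the weight $2^{-k-m+1}=2\cdot2^{-k}\cdot2^{-m}$ in $\Gamma(u_1,u_2)$ rest on the orthonormality of $\{\sqrt{2}\cos(k\pi u)\}_{k\ge1}$ in $L^2[0,1]$. The following sketch checks this numerically by simple trapezoid quadrature, together with the resulting Parseval identity for a finite cosine sum.

```python
# Numerical check that {sqrt(2) cos(k pi u)}_{k>=1} is orthonormal on
# [0,1], which gives int_0^1 (sum_k a_k sqrt(2) cos(k pi u))^2 du
# = sum_k a_k^2 -- the step used for E[int (Ztilde - Zhat)^2 du] -- and
# the factor 2^{-k-m+1} = 2 * 2^{-k} * 2^{-m} in Gamma(u1, u2).

import math

def inner(k, m, grid=20000):
    """Trapezoid approximation of int_0^1 2 cos(k pi u) cos(m pi u) du."""
    h = 1.0 / grid
    s = 0.0
    for j in range(grid + 1):
        w = 0.5 if j in (0, grid) else 1.0
        s += w * 2.0 * math.cos(k * math.pi * j * h) * math.cos(m * math.pi * j * h)
    return s * h

for k in range(1, 5):
    for m in range(1, 5):
        target = 1.0 if k == m else 0.0
        assert abs(inner(k, m) - target) < 1e-6

# Parseval for a finite cosine sum with arbitrary coefficients a_k:
a = [0.7, -1.2, 0.3]
lhs = sum(a[k] * a[m] * inner(k + 1, m + 1)
          for k in range(3) for m in range(3))
assert abs(lhs - sum(x * x for x in a)) < 1e-5
```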
References

Bierens, H. J. (2008): "Semi-Nonparametric Interval-Censored Mixed Proportional Hazard Models: Identification and Consistency Results", Econometric Theory 24, 749-794.

Bierens, H. J. (2014a): "The Hilbert Space Theoretical Foundation of Semi-Nonparametric Modeling". Chapter 1 in: J. Racine, L. Su and A. Ullah (eds.), The Oxford Handbook of Applied Nonparametric and Semiparametric Econometrics and Statistics, Oxford University Press.

Bierens, H. J. (2014b): "Consistency and Asymptotic Normality of Sieve ML Estimators Under Low-Level Conditions", Econometric Theory 30, 1021-1076.

Bierens, H. J. (2017): Econometric Model Specification: Consistent Model Specification Tests and Semi-Nonparametric Modeling and Inference. World Scientific Publishers.

Bierens, H. J., and J. Hartog (1988): "Non-linear Regression with Discrete Explanatory Variables, with an Application to the Earnings Function", Journal of Econometrics 38, 269-299.

Billingsley, P. (1968): Convergence of Probability Measures. Wiley.

Chen, X. (2007): "Large Sample Sieve Estimation of Semi-Nonparametric Models". Chapter 76 in: J. J. Heckman and E. Leamer (eds.), Handbook of Econometrics, Vol. 6. Elsevier.

Courant, R., and D. Hilbert (1953): Methods of Mathematical Physics, Vol. I. Interscience Publishers.

Gallant, A. R., and D. W. Nychka (1987): "Semi-Nonparametric Maximum Likelihood Estimation", Econometrica 55, 363-390.

Grenander, U. (1981): Abstract Inference. Wiley.

Hamming, R. W. (1973): Numerical Methods for Scientists and Engineers. Dover Publications.

Jennrich, R. I. (1969): "Asymptotic Properties of Nonlinear Least Squares Estimators", Annals of Mathematical Statistics 40, 633-643.

Royden, H. L. (1968): Real Analysis. Macmillan.