Uniform Convergence of Semi-Nonparametric Density Functions and Their Derivatives, with Application to the Semi-Nonparametric Fractional Index Regression Model
Herman J. Bierens∗
Pennsylvania State University
June 10, 2018
Abstract

In this paper I will set forth conditions under which semi-nonparametric (SNP) density functions and their derivatives are uniformly convergent, and derive their rates of uniform convergence. These results will be applied to sieve least squares estimation of the SNP fractional index regression model, with asymptotic normality results based on the approach in Bierens (2014): "Consistency and Asymptotic Normality of Sieve ML Estimators Under Low-Level Conditions", Econometric Theory 30, 1021-1076.
1 Introduction
In Bierens (2014a,b) I have shown that any continuous density $h(u)$ on $(0,1)$ satisfying $h(u) > 0$ on $(0,1)$ can be approximated arbitrarily close by
$$h_n(u) = \frac{\left(1 + \sum_{k=1}^{n}\delta_k\sqrt{2}\cos(k\pi u)\right)^2}{1 + \sum_{m=1}^{n}\delta_m^2}, \qquad (1)$$
*Professor Emeritus of Economics, Pennsylvania State University. Please address correspondence by email only to [email protected], because I no longer have an office at Penn State. This paper was presented at the Academia Sinica, Taiwan.
in the sense that
$$\lim_{n\to\infty}\int_0^1 |h(u) - h_n(u)|\,du = 0, \qquad (2)$$
where
$$\delta_m = \frac{\int_0^1 \sqrt{2}\cos(m\pi u)\sqrt{h(u)}\,du}{\int_0^1 \sqrt{h(u)}\,du}, \quad m \in \mathbb{N}, \qquad (3)$$
with $\sum_{m=1}^{\infty}\delta_m^2 < \infty$. Moreover, in these two papers it has been shown that (2) is equivalent to $\lim_{n\to\infty} h_n(u) = h(u)$ a.e. on $[0,1]$, i.e., $\lim_{n\to\infty} h_n(u) = h(u)$ pointwise in $u$ in a subset of $[0,1]$ with Lebesgue measure 1.

Furthermore, given an a priori chosen absolutely continuous distribution function $G(x)$ with continuous density $g(x)$, support $\mathbb{R}$ and inverse $G^{-1}(u)$, any continuous density $f(x)$ with support $\mathbb{R}$ can be written as $f(x) = h(G(x))g(x)$, where
$$h(u) = f(G^{-1}(u))/g(G^{-1}(u)) \qquad (4)$$
is a continuous density on $(0,1)$ satisfying $h(u) > 0$ on $(0,1)$. Thus, denoting $f_n(x) = h_n(G(x))g(x)$, we have $\lim_{n\to\infty}\int_{-\infty}^{\infty}|f_n(x) - f(x)|\,dx = 0$ and thus $\lim_{n\to\infty} f_n(x) = f(x)$ a.e. on $\mathbb{R}$.

The questions I will address in this paper are:
• Under what conditions on $h(u)$ do we also have
$$\lim_{n\to\infty}\sup_{0\le u\le 1}|h_n(u) - h(u)| = 0,$$
$$\lim_{n\to\infty}\sup_{0\le u\le 1}\left|(d/du)^i h_n(u) - (d/du)^i h(u)\right| = 0,$$
for $i = 1, 2, \ldots, \ell$ and some $\ell \in \mathbb{N}$?

• Under what conditions on $f(x)$ and $g(x)$ are these conditions on $h(u)$ in (4) satisfied, so that
$$\lim_{n\to\infty}\sup_{x\in\mathbb{R}}|f_n(x) - f(x)| = 0,$$
$$\lim_{n\to\infty}\sup_{x\in\mathbb{R}}\left|(d/dx)^i f_n(x) - (d/dx)^i f(x)\right| = 0,$$
for $i = 1, 2, \ldots, \ell$ as well?

• What are the rates of uniform convergence?
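Before turning to these questions, the construction in (1) and (3) can be illustrated numerically. The sketch below is my own illustration, not from the paper; the Beta(2,2) test density is an arbitrary assumption. It computes the $\delta_m$'s of (3) by numerical integration, builds $h_n(u)$ as in (1), and checks that the $L^1$ error in (2) shrinks as $n$ grows.

```python
import numpy as np

def snp_deltas(h, n_max, grid=20001):
    """delta_m in (3): cosine coefficients of sqrt(h), normalized by its integral."""
    u = np.linspace(0.0, 1.0, grid)
    root_h = np.sqrt(h(u))
    trap = lambda y: float(np.sum((y[1:] + y[:-1]) * np.diff(u)) / 2.0)  # trapezoid rule
    denom = trap(root_h)
    return np.array([trap(np.sqrt(2.0) * np.cos(m * np.pi * u) * root_h) / denom
                     for m in range(1, n_max + 1)])

def snp_density(u, deltas):
    """h_n(u) in (1)."""
    m = np.arange(1, len(deltas) + 1)
    num = 1.0 + np.cos(np.pi * np.outer(u, m)) @ (np.sqrt(2.0) * deltas)
    return num**2 / (1.0 + np.sum(deltas**2))

h = lambda u: 6.0 * u * (1.0 - u)          # Beta(2,2) density on (0,1), an assumed example
u = np.linspace(0.0, 1.0, 5001)
deltas = snp_deltas(h, 20)
trap = lambda y: float(np.sum((y[1:] + y[:-1]) * np.diff(u)) / 2.0)
l1_err = lambda n: trap(np.abs(h(u) - snp_density(u, deltas[:n])))
print(l1_err(5), l1_err(20))               # the L1 error decreases with n
```

The normalization in (1) guarantees that $h_n$ integrates to one for any truncation of the $\delta_m$'s, which is what makes this parametrization convenient for sieve estimation.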
In Bierens (2014b) I have imposed these conditions by assuming that the parameters $\delta_m$ in (3) satisfy $\sum_{m=1}^{\infty} m^{\ell}|\delta_m| < \infty$ for some $\ell \in \mathbb{N}$. However, as will be shown, the latter condition can be derived from smoothness and moment conditions on $f$ and $g$.

The answers to these questions play a key role in deriving the consistency and asymptotic normality of sieve estimators of semi-nonparametric (SNP) models, as will be shown for the SNP fractional index regression model. Recall that in general SNP models are models for which the functional form is only partly parametrized and where the non-specified parts are unknown functions which are approximated by series expansions.

To motivate my answers to the above questions, I will give in section 2 a brief review of the key elements of Hilbert spaces of square-integrable functions, and in section 3 I will briefly review the various ways density functions can be approximated by series expansions.

In section 4 I will discuss the core idea of attaining uniform convergence of functions in the Hilbert space $L^2(0,1)$ of square-integrable functions on $[0,1]$. In particular, given a four times differentiable real function $\varphi$ on $(0,1)$ such that $\varphi'''' \in L^2(0,1)$, I will derive the series expansions of $\varphi$, $\varphi'$, $\varphi''$ and $\varphi'''$ in terms of the cosine and sine sequences and show that they converge uniformly on $[0,1]$. It appears that the fastest rates of uniform convergence are attained if the tail conditions
$$\varphi'(0) = \varphi'(1) = 0 \text{ and } \varphi'''(0) = \varphi'''(1) = 0 \qquad (5)$$
hold.

Given that these conditions hold for $\varphi(u) = \sqrt{h(u)}$, where $h(u)$ is a density on $[0,1]$, I will show in section 5 that the series expansions of $h(u)$ and its derivatives $h'(u)$, $h''(u)$ and $h'''(u)$ converge uniformly on $[0,1]$ as well, and that the $\delta_m$'s defined by (3) satisfy $\sum_{m=1}^{\infty} m^3|\delta_m| < \infty$.

In section 6 I will set forth conditions on $f$ and $g$ in (4) such that, with $\varphi(u) = \sqrt{h(u)}$, the conditions $\varphi'''' \in L^2(0,1)$ and (5) hold, and in section 7 I will specialize these results to densities on $\mathbb{R}$ and their derivatives.

The infinite-dimensional parameter space of the $\delta_m$'s in (3) is now
$$\Delta_3 = \left\{\delta = \{\delta_m\}_{m=1}^{\infty} : \sum_{m=1}^{\infty} m^3|\delta_m| < \infty\right\}.$$
Endowing $\Delta_3$ with the norm $||\delta||_3 = \sum_{m=1}^{\infty} m^3|\delta_m|$ and associated metric, it becomes a Banach space. In section 8 it will be shown that for given
$M \in (0,\infty)$ the subspace $\Delta_3(M) = \{\delta = \{\delta_m\}_{m=1}^{\infty} : \sum_{m=1}^{\infty} m^3|\delta_m| \le M\}$ of $\Delta_3$ is compact. This result is very convenient in SNP modeling and inference, as will be demonstrated in section 9.

In section 9 I will apply these results to the SNP fractional index regression model, where the dependent variable $Y$ is a fraction, so that $\Pr[Y \in (0,1)] = 1$, and given a vector $X$ of covariates, $E[Y|X] = F_0(\lambda(X,\beta_0))$ a.s., where $\lambda(x,\beta)$ is a given linear parametric index function¹ and $F_0$ is an unknown absolutely continuous distribution function with support $\mathbb{R}$. It will be shown that $\beta_0$ and $F_0$ can be estimated consistently by sieve nonlinear regression, similar to standard parametric nonlinear regression. Moreover, using my approach in Bierens (2014b), I will show that the sieve estimator of $\beta_0$ is asymptotically zero-mean normal at the usual parametric rate.

In section 10 I will make a few concluding remarks, and the last section 11 is an appendix containing proofs.
2 Hilbert spaces of functions: A brief review

Let $w(x)$ be a given density on $\mathbb{R}$. Consider the space $L^2(w)$ of real functions $f(x)$ on $\mathbb{R}$ satisfying
$$\int f(x)^2 w(x)\,dx < \infty.$$
Endow the space $L^2(w)$ with the inner product
$$\langle f, g\rangle = \int f(x)g(x)w(x)\,dx$$
and associated norm $||f|| = \sqrt{\langle f, f\rangle}$ and metric $||f - g||$. Then $L^2(w)$ is a Hilbert space. Recall that in general

A Hilbert space $H$ is a vector space endowed with an inner product and associated norm and metric such that every Cauchy sequence in $H$ takes a limit in $H$.

In particular, for any sequence $f_n \in L^2(w)$ satisfying
$$\lim_{\min(m,k)\to\infty}||f_m - f_k|| = 0$$

¹I.e., given $x$, $\lambda(x,\beta)$ is linear in $\beta$, and given $\beta$, $\lambda(x,\beta)$ is linear in $x$.
(so that $f_n$ is a Cauchy sequence) it can be shown that there exists a function $f \in L^2(w)$ such that
$$\lim_{n\to\infty}||f_n - f|| = 0.$$
Thus, $L^2(w)$ is a Hilbert space.
2.1 Complete orthonormal sequences

Let $\{\varphi_j(x)\}_{j=0}^{\infty}$ be an orthonormal sequence in $L^2(w)$:
$$\langle\varphi_i, \varphi_j\rangle = \int \varphi_i(x)\varphi_j(x)w(x)\,dx = I(i = j),$$
where $I(.)$ is the indicator function. Let $f_n(x)$ be the linear projection of $f(x)$ on $\{\varphi_j(x)\}_{j=0}^{n}$:
$$f_n(x) = \sum_{j=0}^{n}\gamma_j\varphi_j(x), \text{ where } ||f - f_n||^2 \text{ is minimal.}$$
The coefficients $\gamma_j$ for which this is the case are
$$\gamma_j = \langle f, \varphi_j\rangle = \int f(x)\varphi_j(x)w(x)\,dx, \quad \sum_{j=0}^{\infty}\gamma_j^2 < \infty.$$
The $\gamma_j$'s involved are called the Fourier coefficients of $f(x)$.

An orthonormal sequence $\{\varphi_j(x)\}_{j=0}^{\infty}$ in $L^2(w)$ is called complete if for arbitrary $f \in L^2(w)$,
$$\lim_{n\to\infty}||f - f_n|| = \sqrt{\lim_{n\to\infty}\int(f(x) - f_n(x))^2 w(x)\,dx} = 0, \qquad (6)$$
where
$$f_n(x) = \sum_{j=0}^{n}\gamma_j\varphi_j(x) \text{ with } \gamma_j = \langle f, \varphi_j\rangle.$$
Moreover, since $f_n(x)$ is a projection with residual $r_n(x) = f(x) - f_n(x)$, we must have
$$\langle r_n, f_n\rangle = 0,$$
hence by (6),
$$\int f(x)^2 w(x)\,dx = \int f_n(x)^2 w(x)\,dx + \int r_n(x)^2 w(x)\,dx = \sum_{j=0}^{n}\gamma_j^2 + \int(f(x) - f_n(x))^2 w(x)\,dx = \sum_{j=0}^{\infty}\gamma_j^2,$$
and
$$\int(f(x) - f_n(x))^2 w(x)\,dx = \sum_{j=n+1}^{\infty}\gamma_j^2.$$
Furthermore, note that (6) implies that $f(x) = \lim_{n\to\infty} f_n(x)$ a.e. in the support of $w(x)$, i.e.,
$$f(x) = \lim_{n\to\infty} f_n(x)$$
pointwise in $x$ in a set $X \subset \{x \in \mathbb{R} : w(x) > 0\}$ satisfying $\int_X w(x)\,dx = 1$. See Bierens (2014a, Theorem 9).

Finally, note that the rate of convergence may depend on $x$, i.e., for every $\varepsilon > 0$ and $x \in X$ there exists an $n_0(\varepsilon, x) \in \mathbb{N}$ such that
$$|f(x) - f_n(x)| < \varepsilon \text{ if } n > n_0(\varepsilon, x).$$
2.2 Examples of complete orthonormal sequences

2.2.1 Hermite polynomials

In the case
$$w(x) = \exp(-x^2/2)/\sqrt{2\pi}, \quad x \in \mathbb{R},$$
the Hermite polynomials form a complete orthonormal sequence in the corresponding Hilbert space $L^2(w)$. Note that the Hermite polynomials $\varphi_k(x)$, $k \ge 0$, on $\mathbb{R}$ can be generated recursively by
$$\sqrt{k+1}\,\varphi_{k+1}(x) - x\varphi_k(x) + \sqrt{k}\,\varphi_{k-1}(x) = 0, \quad k \ge 1,$$
starting from $\varphi_0(x) = 1$, $\varphi_1(x) = x$. See, for example, Hamming (1973).
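As a quick sanity check (my own sketch, not from the paper), the recursion can be coded directly; with the stated starting values it reproduces, e.g., $\varphi_2(x) = (x^2-1)/\sqrt{2}$ and $\varphi_3(x) = (x^3-3x)/\sqrt{6}$.

```python
import math

def hermite(k, x):
    """Orthonormal Hermite polynomials via
    sqrt(k+1) phi_{k+1}(x) - x phi_k(x) + sqrt(k) phi_{k-1}(x) = 0,
    starting from phi_0(x) = 1, phi_1(x) = x."""
    if k == 0:
        return 1.0
    prev, cur = 1.0, x
    for j in range(1, k):
        prev, cur = cur, (x * cur - math.sqrt(j) * prev) / math.sqrt(j + 1)
    return cur

# e.g. phi_2(x) = (x^2 - 1)/sqrt(2)
print(hermite(2, 1.7), (1.7**2 - 1) / math.sqrt(2))
```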
2.2.2 Legendre polynomials

In the case that $w(u)$ is the uniform density on $[0,1]$,
$$w(u) = I(0 \le u \le 1),$$
the Hilbert space $L^2(w)$ involved is denoted by $L^2(0,1)$.

The Legendre polynomials form a complete orthonormal sequence in $L^2(0,1)$. These polynomials, $\varphi_k(u)$, say, can be generated recursively by
$$\frac{(k+1)/2}{\sqrt{2k+3}\sqrt{2k+1}}\varphi_{k+1}(u) + (0.5 - u)\varphi_k(u) + \frac{k/2}{\sqrt{2k+1}\sqrt{2k-1}}\varphi_{k-1}(u) = 0, \quad k \ge 1,$$
starting from $\varphi_0(u) = 1$, $\varphi_1(u) = \sqrt{3}(2u - 1)$. See again Hamming (1973).
2.2.3 Trigonometric series

Other complete orthonormal sequences in $L^2(0,1)$ are:

• the cosine series,
$$\varphi_k(u) = \begin{cases} 1 & \text{for } k = 0, \\ \sqrt{2}\cos(k\pi u) & \text{for } k \in \mathbb{N}; \end{cases}$$

• the Fourier series,
$$\varphi_0(u) = 1, \quad \varphi_{2k-1}(u) = \sqrt{2}\sin(2k\pi u), \quad \varphi_{2k}(u) = \sqrt{2}\cos(2k\pi u), \quad k \in \mathbb{N};$$

• the sine series,
$$\varphi_k(u) = \sqrt{2}\sin(k\pi u), \quad k \in \mathbb{N}.$$
2.3 How well (or bad) do the trigonometric series fit?

In the cases of the cosine and Fourier series I will use the test function
$$f(u) = u(4 - 3u), \quad 0 \le u \le 1,$$
whereas in the sine case I will use the derivative of $f(u)$ as test function:
$$f'(u) = 4 - 6u, \quad 0 \le u \le 1.$$
Since $f(u)$ is a polynomial of order 2, we can represent $f(u)$ exactly by a linear combination of the first three Legendre polynomials, i.e.,
$$f(u) \equiv \alpha_0\varphi_0(u) + \alpha_1\varphi_1(u) + \alpha_2\varphi_2(u),$$
where
$$\varphi_0(u) = 1, \quad \varphi_1(u) = \sqrt{3}(2u - 1), \quad \varphi_2(u) = \sqrt{5}(6u^2 - 6u + 1)$$
and
$$\alpha_m = \int_0^1 f(u)\varphi_m(u)\,du, \quad m = 0, 1, 2.$$
However, by nature the trigonometric series "wobble" with increasing frequency, so the question is whether these series are capable of approximating a smooth function like $f(u) = u(4 - 3u)$, using a modest number of leading elements of these series.
2.3.1 The cosine series

The SNP version of $f(u) = u(4 - 3u)$, $0 \le u \le 1$, takes the form
$$f_n(u) = \alpha_0 + \sum_{m=1}^{n}\alpha_m\sqrt{2}\cos(m\pi u), \quad n \in \mathbb{N},$$
where
$$\alpha_0 = \int_0^1 f(u)\,du = 1,$$
$$\alpha_m = \int_0^1 f(u)\sqrt{2}\cos(m\pi u)\,du = \begin{cases} -6\sqrt{2}(m\pi)^{-2} & \text{if } m \ge 2 \text{ is even}, \\ -2\sqrt{2}(m\pi)^{-2} & \text{if } m \ge 1 \text{ is odd}. \end{cases} \qquad (7)$$
Note that in this case
$$\sup_{0\le u\le 1}|f(u) - f_n(u)| \le \frac{6\sqrt{2}}{\pi^2}\sum_{m=n+1}^{\infty} m^{-2} = O(1/n).$$
The function $f(u)$ will be compared with its SNP version $f_n(u)$ for $n = 10$ and $n = 20$, respectively.

Clearly, $f_n(u)$ for $n = 10$ approximates $f(u)$ very well, except in the tails of $f_n(u)$. The reason for the latter is that $f_n'(u) = -\pi\sum_{m=1}^{n}\alpha_m m\sqrt{2}\sin(m\pi u)$, so that $f_n'(0) = f_n'(1) = 0$. As expected, the tail fit becomes better for larger truncation orders $n$.
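The closed form (7) and the $O(1/n)$ bound are easy to check numerically. The following sketch is my own illustration, not the paper's; it compares the truncated cosine series with $f(u) = u(4-3u)$ for $n = 10$ and $n = 20$.

```python
import math

def alpha(m):
    """alpha_m in (7) for f(u) = u(4 - 3u)."""
    if m == 0:
        return 1.0
    c = -6.0 if m % 2 == 0 else -2.0
    return c * math.sqrt(2.0) / (m * math.pi)**2

def f_cos(u, n):
    """Truncated cosine series f_n(u)."""
    return alpha(0) + sum(alpha(m) * math.sqrt(2.0) * math.cos(m * math.pi * u)
                          for m in range(1, n + 1))

f = lambda u: u * (4.0 - 3.0 * u)
grid = [i / 1000.0 for i in range(1001)]
sup_err = lambda n: max(abs(f(u) - f_cos(u, n)) for u in grid)
print(sup_err(10), sup_err(20))  # the (endpoint) fit improves with n
```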
2.3.2 The Fourier series

In the case of the Fourier series the SNP version of $f(u) = u(4 - 3u)$ takes the form:
$$f_n(u) = \gamma_0 + \sum_{k=1}^{n/2}\gamma_{2k}\sqrt{2}\cos(2k\pi u) + \sum_{k=1}^{n/2}\gamma_{2k-1}\sqrt{2}\sin(2k\pi u),$$
for even $n$, where
$$\gamma_0 = \int_0^1 f(u)\,du = 1,$$
$$\gamma_{2k} = \int_0^1 f(u)\sqrt{2}\cos(2k\pi u)\,du = -\left(3/\sqrt{2}\right)(k\pi)^{-2},$$
$$\gamma_{2k-1} = \int_0^1 f(u)\sqrt{2}\sin(2k\pi u)\,du = -\left(\sqrt{2}\,k\pi\right)^{-1}.$$
As we will see, the fit of the Fourier series approximation for $n = 10$ is pretty bad compared with the cosine series approximations, especially in the tails. For $n = 20$ the fit is slightly better, as expected, but still inferior to the cosine case.

This bad fit may be due to the slower rate of convergence to zero of $\gamma_{2k-1}$, i.e.,
$$\gamma_{2k-1} = \int_0^1 f(u)\sqrt{2}\sin(2k\pi u)\,du = O(k^{-1}),$$
compared with
$$\alpha_{2k-1} = \int_0^1 f(u)\sqrt{2}\cos((2k-1)\pi u)\,du = O(k^{-2})$$
in the cosine case, whereas $\gamma_{2k}$ and $\alpha_{2k}$ are both of order $O(k^{-2})$.
2.3.3 The sine series

The SNP version of $f'(u) = 4 - 6u$ is just the derivative of $f_n(u)$ in the cosine case:
$$f_n'(u) = -\pi\sum_{m=1}^{n} m\alpha_m\sqrt{2}\sin(m\pi u),$$
where $\alpha_m$ is the same as in (7), hence
$$\int_0^1 (f'(u) - f_n'(u))^2\,du = \pi^2\sum_{m=n+1}^{\infty} m^2\alpha_m^2 = O\left(\sum_{m=n+1}^{\infty} m^{-2}\right) = O(1/n) = o(1),$$
and thus $\lim_{n\to\infty} f_n'(u) = f'(u)$ pointwise in $u \in A \subset [0,1]$, where the set $A$ has Lebesgue measure 1. Clearly, the set $A$ does not contain the singletons $\{0\}$ and $\{1\}$ because $f'(0) = 4$, $f'(1) = -2$, whereas $f_n'(0) = f_n'(1) = 0$.
To see what happens as $n$ becomes much larger, consider the case $n = 50$. Apparently, the pointwise rate of convergence of $f_n'(u)$ to $f'(u)$ for $u$ close to 0 or 1 seems to slow down.
3 Densities and distribution functions

3.1 Series representation of densities

Given a density $w(x)$ with support $(a,b) \subseteq \mathbb{R}$ and corresponding complete orthonormal sequence $\varphi_j(x)$, every density function $f(x)$ with support $(a,b)$ can be written as
$$f(x) = \lim_{n\to\infty} f_n(x) \text{ a.e. on } (a,b),$$
where
$$f_n(x) = \frac{w(x)\left(\sum_{j=0}^{n}\gamma_j\varphi_j(x)\right)^2}{\sum_{j=0}^{n}\gamma_j^2} \quad \text{with} \quad \sum_{j=0}^{\infty}\gamma_j^2 = \int_a^b f(x)\,dx = 1.$$
However, there are uncountably many sequences $\gamma_j$ for which this is true, due to the square in the expression for $f_n(x)$. In particular, let for an arbitrary $\tau \in (a,b)$,
$$g(x|\tau) = (I(x \le \tau) - I(x > \tau))\sqrt{f(x)/w(x)},$$
where $I(.)$ is the indicator function. Then $f(x) = w(x)g(x|\tau)^2$, hence $g(x|\tau) \in L^2(w)$. Consequently, denoting
$$f_n(x|\tau) = \frac{w(x)\left(\sum_{j=0}^{n}\gamma_j(\tau)\varphi_j(x)\right)^2}{\sum_{j=0}^{n}\gamma_j(\tau)^2},$$
where
$$\gamma_j(\tau) = \int_a^b g(x|\tau)\varphi_j(x)w(x)\,dx = \int_a^{\tau}\varphi_j(x)\sqrt{f(x)}\sqrt{w(x)}\,dx - \int_{\tau}^{b}\varphi_j(x)\sqrt{f(x)}\sqrt{w(x)}\,dx,$$
we have
$$f(x) = \lim_{n\to\infty} f_n(x|\tau) \text{ a.e. on } (a,b)$$
as well.

This result can be used to reparametrize the $\gamma_j$'s more conveniently, provided that $\varphi_0(x) \equiv 1$, which is usually the case. (An exception is the sine series.) Then we can always choose
$$\gamma_0 \in \left(0, \int_a^b \sqrt{f(x)}\sqrt{w(x)}\,dx\right),$$
so that the condition $\sum_{j=0}^{\infty}\gamma_j^2 = 1$ can be implemented by reparametrizing the $\gamma_j$'s as
$$\gamma_0 = \frac{1}{\sqrt{1 + \sum_{m=1}^{\infty}\delta_m^2}}, \quad \gamma_j = \frac{\delta_j}{\sqrt{1 + \sum_{m=1}^{\infty}\delta_m^2}}, \quad j \ge 1,$$
where
$$\sum_{m=1}^{\infty}\delta_m^2 < \infty.$$
This reparametrization is merely a convenient way to implement the condition $\sum_{j=0}^{\infty}\gamma_j^2 = 1$, and does not solve the lack of uniqueness problem. However, if $w(x)$, $f(x)$ and the $\varphi_m(x)$'s are continuous on $(a,b)$ then the sequence $\{\delta_m\}_{m=1}^{\infty}$ is unique:

Theorem 3.1. Given a density $w(x)$ which is continuous on its support $(a,b) \subseteq \mathbb{R}$, and corresponding complete orthonormal sequence $\{\varphi_m(x)\}_{m=0}^{\infty}$ in $L^2(w)$, with $\varphi_0(x) \equiv 1$, such that for each $m \in \mathbb{N}$, $\varphi_m(x)$ is continuous on $(a,b)$,² for every density function $f(x)$ which is continuous on its support $(a,b)$ there exists a unique sequence $\{\delta_m\}_{m=1}^{\infty}$, given by
$$\delta_m = \frac{\int_a^b \varphi_m(x)\sqrt{w(x)}\sqrt{f(x)}\,dx}{\int_a^b \sqrt{w(x)}\sqrt{f(x)}\,dx}, \quad m \in \mathbb{N},$$
and satisfying $\sum_{m=1}^{\infty}\delta_m^2 < \infty$, such that
$$f(x) = \lim_{n\to\infty} f_n(x) \text{ a.e. on } (a,b),$$
or equivalently
$$\lim_{n\to\infty}\int_a^b |f(x) - f_n(x)|\,dx = 0,$$
where
$$f_n(x) = w(x)\frac{\left(1 + \sum_{k=1}^{n}\delta_k\varphi_k(x)\right)^2}{1 + \sum_{m=1}^{n}\delta_m^2}.$$
Proof. See Bierens (2014a, Theorem 21).
Gallant and Nychka (1987) use this approach to generalize the standard normal density to
$$f_n(x) = \frac{\exp(-x^2/2)}{\sqrt{2\pi}} \times \frac{\left(1 + \sum_{k=1}^{n}\delta_k\varphi_k(x)\right)^2}{1 + \sum_{m=1}^{n}\delta_m^2},$$
where the $\varphi_k(x)$'s are the Hermite polynomials. They call these densities semi-nonparametric (SNP) densities. In particular, it follows from Theorem 3.1 that for every continuous density $f(x)$ with support $\mathbb{R}$ there exists a unique sequence $\{\delta_m\}_{m=1}^{\infty}$ satisfying $\sum_{m=1}^{\infty}\delta_m^2 < \infty$ such that $\lim_{n\to\infty}\int_{-\infty}^{\infty}|f(x) - f_n(x)|\,dx = 0$, hence $f_n(x) \to f(x)$ a.e. on $\mathbb{R}$.

²Which is usually the case.
3.2 Density and distribution functions on the unit interval

Let $G(x)$ be an a priori chosen distribution function with density $g(x)$ and support
$$X = \{x \in \mathbb{R} : g(x) > 0\}.$$
Every absolutely continuous distribution function $F(x)$ with support $X$ can be written as
$$F(x) = H(G(x)),$$
where $H(u)$ is an absolutely continuous distribution function on $[0,1]$, namely
$$H(u) = F(G^{-1}(u)),$$
where $G^{-1}(u)$, $u \in [0,1]$, is the inverse of $G(x)$. The density $f(x)$ of $F(x) = H(G(x))$ can be written as
$$f(x) = h(G(x))g(x),$$
where $h(u)$ is the density of $H(u)$:
$$h(u) = f(G^{-1}(u))/g(G^{-1}(u)).$$
Therefore, in modeling general density and distribution functions semi-nonparametrically, it suffices to model the density $h(u)$ and its c.d.f. $H(u)$ semi-nonparametrically.

From now onwards I will use the cosine sequence as the preferred complete orthonormal sequence in $L^2(0,1)$. Then Theorem 3.1 translates as follows.

Theorem 3.2. For every continuous density function $h(u)$ on $(0,1)$ satisfying $h(u) > 0$ on $(0,1)$ there exists a unique sequence $\{\delta_m\}_{m=1}^{\infty}$ defined by (3) such that $h(u) = \lim_{n\to\infty} h_n(u)$ a.e. on $[0,1]$, or equivalently, $\lim_{n\to\infty}\int_0^1 |h(u) - h_n(u)|\,du = 0$, where $h_n(u)$ is the corresponding SNP density, defined by (1).
The advantage of the cosine sequence is that then the SNP distribution function $H_n(u) = \int_0^u h_n(z)\,dz$ has a closed form expression:
$$H_n(u) = u + \frac{1}{1 + \sum_{m=1}^{n}\delta_m^2}\Bigg[2\sqrt{2}\sum_{k=1}^{n}\delta_k\frac{\sin(k\pi u)}{k\pi} + \sum_{m=1}^{n}\delta_m^2\frac{\sin(2m\pi u)}{2m\pi}$$
$$+ 2\sum_{k=2}^{n}\sum_{m=1}^{k-1}\delta_k\delta_m\frac{\sin((k+m)\pi u)}{(k+m)\pi} + 2\sum_{k=2}^{n}\sum_{m=1}^{k-1}\delta_k\delta_m\frac{\sin((k-m)\pi u)}{(k-m)\pi}\Bigg].$$
Of course, a similar closed form expression can be derived for the Fourier sequence. However, as has been demonstrated, the cosine sequence yields the best fit.
4 Uniform convergence by integration

Let $\varphi(u)$ be a real function on $(0,1)$ satisfying $\int_0^1 \varphi(u)^2\,du < \infty$, so that $\varphi(u) \in L^2(0,1)$. Recall that by the completeness of the cosine series we have $\lim_{n\to\infty}\varphi_n(u) = \varphi(u)$ a.e. on $(0,1)$, where $\varphi_n(u) = \gamma_0 + \sum_{m=1}^{n}\gamma_m\sqrt{2}\cos(m\pi u)$ with $\gamma_0 = \int_0^1 \varphi(u)\,du$, $\gamma_m = \int_0^1 \sqrt{2}\cos(m\pi u)\varphi(u)\,du$ for $m \in \mathbb{N}$.

The questions I will now address are: Under what conditions do we have
$$\lim_{n\to\infty}\sup_{0\le u\le 1}|\varphi_n(u) - \varphi(u)| = 0, \qquad (8)$$
$$\lim_{n\to\infty}\sup_{0\le u\le 1}\left|d^i\varphi_n(u)/(du)^i - d^i\varphi(u)/(du)^i\right| = 0 \text{ for } i = 1, 2, \ldots, \ell \text{ and some } \ell \in \mathbb{N}, \qquad (9)$$
and what are the rates of uniform convergence?
4.1 The case ℓ = 1

Suppose that $\varphi(u)$ is twice differentiable on $(0,1)$, and that $\varphi''(u) \in L^2(0,1)$. Denote
$$\alpha_0 = \int_0^1 \varphi''(u)\,du = \varphi'(1) - \varphi'(0),$$
$$\alpha_m = \int_0^1 \varphi''(u)\sqrt{2}\cos(m\pi u)\,du \text{ for } m \in \mathbb{N},$$
$$\varphi_n''(u) = \alpha_0 + \sum_{m=1}^{n}\alpha_m\sqrt{2}\cos(m\pi u) \text{ for } n \in \mathbb{N}.$$
Recall from section 2.1 that
$$\int_0^1 \varphi''(u)^2\,du = \sum_{m=0}^{\infty}\alpha_m^2$$
and
$$\int_0^1 (\varphi_n''(u) - \varphi''(u))^2\,du = \sum_{m=n+1}^{\infty}\alpha_m^2 = o(1).$$
Note that
$$|\varphi'(1) - \varphi'(0)| = \left|\int_0^1 \varphi''(u)\,du\right| \le \int_0^1 |\varphi''(u)|\,du \le \sqrt{\int_0^1 \varphi''(u)^2\,du} < \infty, \qquad (10)$$
and similarly, for any $u \in (0,1)$,
$$|\varphi'(u) - \varphi'(0)| = \left|\int_0^u \varphi''(v)\,dv\right| \le \int_0^1 |\varphi''(v)|\,dv < \infty. \qquad (11)$$
Since $\varphi'(u)$ is differentiable and therefore continuous on $(0,1)$, it is uniformly continuous on any closed interval in $(0,1)$, so that $|\varphi'(u)| < \infty$ for each $u \in (0,1)$. Consequently, it follows from (11) that $|\varphi'(0)| < \infty$, which by (10) implies that $|\varphi'(1)| < \infty$. Thus,
$$\varphi'(0) \in \mathbb{R}, \quad \varphi'(1) \in \mathbb{R},$$
which implies that $\varphi'(u)$ is uniformly continuous on $[0,1]$.

Now the primitive of $\varphi_n''(u)$ takes the general form
$$\varphi_n'(u) = c + (\varphi'(1) - \varphi'(0))u + \sum_{m=1}^{n}\frac{\alpha_m}{\pi m}\sqrt{2}\sin(m\pi u)$$
for some constant $c$. Since $\varphi_n'(1) = \varphi'(1)$ and $\varphi_n'(0) = \varphi'(0)$ if $c = \varphi'(0)$, the latter is a natural choice for $c$, so that
$$\varphi_n'(u) = \varphi'(0) + (\varphi'(1) - \varphi'(0))u + \sum_{m=1}^{n}\frac{\alpha_m}{\pi m}\sqrt{2}\sin(m\pi u) = \varphi'(0) + \int_0^u \varphi_n''(v)\,dv.$$
Since also $\varphi'(u) = \varphi'(0) + \int_0^u \varphi''(v)\,dv$ we have
$$\limsup_{n\to\infty}\sup_{0\le u\le 1}|\varphi'(u) - \varphi_n'(u)| \le \limsup_{n\to\infty}\int_0^1 |\varphi''(v) - \varphi_n''(v)|\,dv \le \limsup_{n\to\infty}\sqrt{\int_0^1 (\varphi''(v) - \varphi_n''(v))^2\,dv} = \sqrt{\lim_{n\to\infty}\sum_{m=n+1}^{\infty}\alpha_m^2} = 0.$$
Consequently, the series expansion
$$\varphi'(u) \equiv \varphi'(0) + (\varphi'(1) - \varphi'(0))u + \sum_{m=1}^{\infty}\frac{\alpha_m}{\pi m}\sqrt{2}\sin(m\pi u) \qquad (12)$$
holds exactly and uniformly on $[0,1]$.

As to the rate of uniform convergence, note that
$$\sup_{0\le u\le 1}|\varphi'(u) - \varphi_n'(u)| \le \sqrt{2}\sum_{m=n+1}^{\infty}\frac{|\alpha_m|}{\pi m} = o(n^{-1/2}),$$
where the latter is due to the following lemma:
Lemma 4.1. $\sum_{m=1}^{\infty}\alpha_m^2 < \infty$ implies that for $c > 1/2$,
$$\sum_{m=n+1}^{\infty} m^{-c}|\alpha_m| = o\left(n^{1/2-c}\right).$$
Proof. See the Appendix.
Next, the primitive $\varphi(u)$ of $\varphi'(u)$ takes the general form
$$\varphi(u) = c + \varphi'(0)u + \frac{1}{2}(\varphi'(1) - \varphi'(0))u^2 - \sum_{m=1}^{\infty}\frac{\alpha_m}{(\pi m)^2}\sqrt{2}\cos(m\pi u)$$
for some constant $c$. But then
$$\int_0^1 \varphi(v)\,dv = c + \frac{1}{2}\varphi'(0) + \frac{1}{6}(\varphi'(1) - \varphi'(0)),$$
so that
$$\varphi(u) \equiv \int_0^1 \varphi(v)\,dv - \frac{1}{2}\varphi'(0) - \frac{1}{6}(\varphi'(1) - \varphi'(0)) + \varphi'(0)u + \frac{1}{2}(\varphi'(1) - \varphi'(0))u^2 - \sum_{m=1}^{\infty}\frac{\alpha_m}{(\pi m)^2}\sqrt{2}\cos(m\pi u)$$
exactly and uniformly on $[0,1]$.

Moreover, it can be shown that for $m \in \mathbb{N}$,
$$\int_0^1 u\sqrt{2}\cos(m\pi u)\,du = \sqrt{2}\,\frac{(-1)^m - 1}{(m\pi)^2}, \quad \int_0^1 u^2\sqrt{2}\cos(m\pi u)\,du = \frac{2\sqrt{2}(-1)^m}{(m\pi)^2},$$
hence, since $\lim_{n\to\infty}\sum_{m=n+1}^{\infty} m^{-2} = 0$, it follows that
$$u \equiv \frac{1}{2} + \sqrt{2}\sum_{m=1}^{\infty}\frac{(-1)^m - 1}{(m\pi)^2}\sqrt{2}\cos(m\pi u),$$
$$u^2 \equiv \frac{1}{3} + 2\sqrt{2}\sum_{m=1}^{\infty}\frac{(-1)^m}{(m\pi)^2}\sqrt{2}\cos(m\pi u),$$
exactly and uniformly on $[0,1]$. Thus,
$$\varphi(u) \equiv \int_0^1 \varphi(v)\,dv + \varphi'(0)\sqrt{2}\sum_{m=1}^{\infty}\frac{(-1)^m - 1}{(m\pi)^2}\sqrt{2}\cos(m\pi u) + (\varphi'(1) - \varphi'(0))\sqrt{2}\sum_{m=1}^{\infty}\frac{(-1)^m}{(m\pi)^2}\sqrt{2}\cos(m\pi u) - \sum_{m=1}^{\infty}\frac{\alpha_m}{(\pi m)^2}\sqrt{2}\cos(m\pi u)$$
exactly and uniformly on $[0,1]$.

However, since $\varphi \in L^2(0,1)$, $\varphi(u)$ also has the cosine series representation
$$\varphi(u) = \gamma_0 + \sum_{m=1}^{\infty}\gamma_m\sqrt{2}\cos(m\pi u) \text{ a.e. on } (0,1),$$
where
$$\gamma_0 = \int_0^1 \varphi(u)\,du, \quad \gamma_m = \int_0^1 \varphi(u)\sqrt{2}\cos(m\pi u)\,du \text{ for } m \in \mathbb{N}.$$
Therefore we must have that for $m \in \mathbb{N}$,
$$\gamma_m = \int_0^1 \varphi(u)\sqrt{2}\cos(m\pi u)\,du = \frac{\sqrt{2}\left(\varphi'(1)(-1)^m - \varphi'(0)\right) - \alpha_m}{(m\pi)^2}.$$
Consequently, $\varphi(u)$ has two equivalent cosine series representations, namely
$$\varphi(u) \equiv \int_0^1 \varphi(v)\,dv - \frac{1}{2}\varphi'(0) - \frac{1}{6}(\varphi'(1) - \varphi'(0)) + \varphi'(0)u + \frac{1}{2}(\varphi'(1) - \varphi'(0))u^2 - \sum_{m=1}^{\infty}\frac{\alpha_m}{(\pi m)^2}\sqrt{2}\cos(m\pi u)$$
$$\equiv \int_0^1 \varphi(v)\,dv + \sum_{m=1}^{\infty}\gamma_m\sqrt{2}\cos(m\pi u),$$
exactly and uniformly on $[0,1]$.

However, there is a difference between these two expressions for $\varphi(u)$, namely their rates of uniform convergence are different! To see this, denote
$$\varphi_n(u) = \int_0^1 \varphi(v)\,dv - \frac{1}{2}\varphi'(0) - \frac{1}{6}(\varphi'(1) - \varphi'(0)) + \varphi'(0)u + \frac{1}{2}(\varphi'(1) - \varphi'(0))u^2 - \sum_{m=1}^{n}\frac{\alpha_m}{(\pi m)^2}\sqrt{2}\cos(m\pi u).$$
Then we have
$$\sup_{0\le u\le 1}|\varphi_n(u) - \varphi(u)| \le \frac{\sqrt{2}}{\pi^2}\sum_{m=n+1}^{\infty}\frac{|\alpha_m|}{m^2} = o\left(n^{-3/2}\right),$$
where the latter is again due to Lemma 4.1. On the other hand, denoting
$$\varphi_n^*(u) = \int_0^1 \varphi(v)\,dv + \sum_{m=1}^{n}\gamma_m\sqrt{2}\cos(m\pi u),$$
we have
$$\sup_{0\le u\le 1}|\varphi_n^*(u) - \varphi(u)| \le \sqrt{2}\sum_{m=n+1}^{\infty}|\gamma_m| \le \frac{2}{\pi^2}(|\varphi'(1)| + |\varphi'(0)|)\sum_{m=n+1}^{\infty} m^{-2} + \frac{\sqrt{2}}{\pi^2}\sum_{m=n+1}^{\infty} m^{-2}|\alpha_m|$$
$$= \begin{cases} o\left(n^{-3/2}\right) & \text{if } \varphi'(0) = \varphi'(1) = 0, \\ O(n^{-1}) & \text{if } \varphi'(0) \ne 0 \text{ or } \varphi'(1) \ne 0. \end{cases}$$
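For the test function $f(u) = u(4-3u)$ of section 2.3 this contrast is extreme (my own sketch, not the paper's): since $\varphi'' \equiv -6$ there, $\alpha_0 = -6$ and $\alpha_m = 0$ for all $m \ge 1$, so $\varphi_n(u)$ is exact at every $n$, while $\varphi_n^*(u)$ is exactly the cosine series of section 2.3.1 with its $O(n^{-1})$ endpoint error.

```python
import math

phi = lambda u: u * (4.0 - 3.0 * u)     # phi'(0) = 4, phi'(1) = -2, phi'' = -6

def phi_poly(u, n):
    """phi_n(u) from the first representation; the alpha_m series vanishes here."""
    d0, d1 = 4.0, -2.0                  # phi'(0) and phi'(1)
    integral = 1.0                      # int_0^1 u(4-3u) du = 1
    return (integral - 0.5*d0 - (d1 - d0)/6.0) + d0*u + 0.5*(d1 - d0)*u**2

def phi_star(u, n):
    """phi_n^*(u): plain cosine series, gamma_m = sqrt(2)(phi'(1)(-1)^m - phi'(0))/(m pi)^2."""
    g = lambda m: math.sqrt(2.0) * (-2.0 * (-1)**m - 4.0) / (m * math.pi)**2
    return 1.0 + sum(g(m) * math.sqrt(2.0) * math.cos(m * math.pi * u)
                     for m in range(1, n + 1))

grid = [i / 500.0 for i in range(501)]
err = lambda approx, n: max(abs(phi(u) - approx(u, n)) for u in grid)
print(err(phi_poly, 10), err(phi_star, 10), err(phi_star, 40))
```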
Remark 4.1. Suppose that the function $\varphi(u)$ is continuous and periodic on $[0,1]$, i.e., $\varphi(u)$ can be extended to a continuous function $\varphi(x)$ on $\mathbb{R}$ by the relation $\varphi(x) = \varphi(x+1)$ for all $x \in \mathbb{R}$. Then with $\varphi_n(u)$ the Fourier series expansion of $\varphi(u)$ it is well-known that $\lim_{n\to\infty}\sup_{0\le u\le 1}|\varphi_n(u) - \varphi(u)| = 0$. See Courant and Hilbert (1953, Chapter II). However, for the purpose of this paper the case that $\varphi(u)$ is periodic on $[0,1]$ is of no interest.
4.2 The case ℓ = 3

Now suppose that $\varphi(u)$ is four times differentiable on $(0,1)$, and that $\varphi''''(u) \in L^2(0,1)$. Denote
$$\alpha_0 = \int_0^1 \varphi''''(u)\,du = \varphi'''(1) - \varphi'''(0),$$
$$\alpha_m = \int_0^1 \sqrt{2}\cos(m\pi u)\varphi''''(u)\,du, \quad m \in \mathbb{N}, \qquad (13)$$
$$\varphi_n''''(u) = \alpha_0 + \sum_{m=1}^{n}\alpha_m\sqrt{2}\cos(m\pi u), \quad n \in \mathbb{N},$$
and recall that $\int_0^1 (\varphi_n''''(u) - \varphi''''(u))^2\,du = \sum_{m=n+1}^{\infty}\alpha_m^2 = o(1)$. Then it is not too hard (but somewhat tedious) to verify that, similar to the case ℓ = 1, the following results hold.
Theorem 4.1. Let $\varphi(u)$ be a four times differentiable real function on $(0,1)$, with fourth derivative $\varphi''''(u) \in L^2(0,1)$, and let $\alpha_m$ be defined by (13). Then $\varphi(u)$ and its derivatives $\varphi'(u)$, $\varphi''(u)$ and $\varphi'''(u)$ are uniformly continuous on $[0,1]$, with exact and uniform cosine-sine series representations
$$\varphi(u) \equiv P_4(u) + \sum_{m=1}^{\infty}\frac{\alpha_m}{(m\pi)^4}\sqrt{2}\cos(m\pi u),$$
$$\varphi'(u) \equiv P_4'(u) - \sum_{m=1}^{\infty}\frac{\alpha_m}{(m\pi)^3}\sqrt{2}\sin(m\pi u),$$
$$\varphi''(u) \equiv P_4''(u) - \sum_{m=1}^{\infty}\frac{\alpha_m}{(m\pi)^2}\sqrt{2}\cos(m\pi u),$$
$$\varphi'''(u) \equiv P_4'''(u) + \sum_{m=1}^{\infty}\frac{\alpha_m}{m\pi}\sqrt{2}\sin(m\pi u)$$
on $[0,1]$, where
$$P_4(u) = \int_0^1 \varphi(v)\,dv - \frac{1}{2}\varphi'(0) - \frac{1}{6}\left(\varphi'(1) - \varphi'(0) - \frac{1}{2}\varphi'''(0) - \frac{1}{6}(\varphi'''(1) - \varphi'''(0))\right)$$
$$- \frac{1}{24}\varphi'''(0) - \frac{1}{120}(\varphi'''(1) - \varphi'''(0)) + \varphi'(0)u$$
$$+ \frac{1}{2}\left(\varphi'(1) - \varphi'(0) - \frac{1}{2}\varphi'''(0) - \frac{1}{6}(\varphi'''(1) - \varphi'''(0))\right)u^2$$
$$+ \frac{1}{6}\varphi'''(0)u^3 + \frac{1}{24}(\varphi'''(1) - \varphi'''(0))u^4.$$
Moreover, denoting
$$\varphi_n(u) = P_4(u) + \sum_{m=1}^{n}\frac{\alpha_m}{(m\pi)^4}\sqrt{2}\cos(m\pi u),$$
$$\varphi_n'(u) = P_4'(u) - \sum_{m=1}^{n}\frac{\alpha_m}{(m\pi)^3}\sqrt{2}\sin(m\pi u),$$
$$\varphi_n''(u) = P_4''(u) - \sum_{m=1}^{n}\frac{\alpha_m}{(m\pi)^2}\sqrt{2}\cos(m\pi u),$$
$$\varphi_n'''(u) = P_4'''(u) + \sum_{m=1}^{n}\frac{\alpha_m}{m\pi}\sqrt{2}\sin(m\pi u),$$
we have by Lemma 4.1 that
$$\sup_{0\le u\le 1}|\varphi(u) - \varphi_n(u)| = o(n^{-7/2}), \quad \sup_{0\le u\le 1}|\varphi'(u) - \varphi_n'(u)| = o(n^{-5/2}),$$
$$\sup_{0\le u\le 1}|\varphi''(u) - \varphi_n''(u)| = o(n^{-3/2}), \quad \sup_{0\le u\le 1}|\varphi'''(u) - \varphi_n'''(u)| = o(n^{-1/2}). \qquad (14)$$
Note that if $\varphi'''(1) = \varphi'''(0) = 0$ and $\varphi'(1) = \varphi'(0) = 0$ then $P_4(u) \equiv \int_0^1 \varphi(v)\,dv$, so that exactly and uniformly on $[0,1]$,
$$\varphi(u) \equiv \int_0^1 \varphi(v)\,dv + \sum_{m=1}^{\infty}\frac{\alpha_m}{(m\pi)^4}\sqrt{2}\cos(m\pi u),$$
$$\varphi'(u) \equiv -\sum_{m=1}^{\infty}\frac{\alpha_m}{(m\pi)^3}\sqrt{2}\sin(m\pi u),$$
$$\varphi''(u) \equiv -\sum_{m=1}^{\infty}\frac{\alpha_m}{(m\pi)^2}\sqrt{2}\cos(m\pi u),$$
$$\varphi'''(u) \equiv \sum_{m=1}^{\infty}\frac{\alpha_m}{m\pi}\sqrt{2}\sin(m\pi u).$$
Denote
$$\gamma_m = \int_0^1 \varphi(u)\sqrt{2}\cos(m\pi u)\,du = \frac{\alpha_m}{(m\pi)^4}, \quad m \in \mathbb{N}.$$
Then
$$\sum_{m=1}^{\infty} m^8\gamma_m^2 = \pi^{-8}\sum_{m=1}^{\infty}\alpha_m^2 < \infty,$$
which by Lemma 4.1, with $\alpha_m$ replaced by $m^4\gamma_m$, implies that
$$\sum_{m=n+1}^{\infty} m^3|\gamma_m| = \sum_{m=n+1}^{\infty} m^{-1}|m^4\gamma_m| = o(n^{-1/2}),$$
$$\sum_{m=n+1}^{\infty} m^2|\gamma_m| = \sum_{m=n+1}^{\infty} m^{-2}|m^4\gamma_m| = o(n^{-3/2}),$$
$$\sum_{m=n+1}^{\infty} m|\gamma_m| = \sum_{m=n+1}^{\infty} m^{-3}|m^4\gamma_m| = o(n^{-5/2}),$$
$$\sum_{m=n+1}^{\infty} |\gamma_m| = \sum_{m=n+1}^{\infty} m^{-4}|m^4\gamma_m| = o(n^{-7/2}). \qquad (15)$$
Note that the first result in (15) implies that $\sum_{m=1}^{\infty} m^3|\gamma_m| < \infty$.
Summarizing, the following results hold.
Theorem 4.2. Let $\varphi(u)$ be a four times differentiable real function on $(0,1)$, with fourth derivative $\varphi'''' \in L^2(0,1)$, satisfying the tail conditions
$$\lim_{u\downarrow 0}\varphi'''(u) = \lim_{u\uparrow 1}\varphi'''(u) = 0, \quad \lim_{u\downarrow 0}\varphi'(u) = \lim_{u\uparrow 1}\varphi'(u) = 0.$$
Then $\varphi(u)$ and its first, second and third derivatives are uniformly continuous on $[0,1]$ and have the exact and uniform cosine-sine series representations
$$\varphi(u) \equiv \int_0^1 \varphi(v)\,dv + \sum_{m=1}^{\infty}\gamma_m\sqrt{2}\cos(m\pi u), \qquad (16)$$
$$\varphi'(u) \equiv -\pi\sum_{m=1}^{\infty} m\gamma_m\sqrt{2}\sin(m\pi u),$$
$$\varphi''(u) \equiv -\pi^2\sum_{m=1}^{\infty} m^2\gamma_m\sqrt{2}\cos(m\pi u),$$
$$\varphi'''(u) \equiv \pi^3\sum_{m=1}^{\infty} m^3\gamma_m\sqrt{2}\sin(m\pi u)$$
on $[0,1]$, where $\gamma_m = \int_0^1 \varphi(u)\sqrt{2}\cos(m\pi u)\,du$ for $m \in \mathbb{N}$, satisfying $\sum_{m=n+1}^{\infty} m^3|\gamma_m| = o(n^{-1/2})$, hence
$$\sum_{m=1}^{\infty} m^3|\gamma_m| < \infty. \qquad (17)$$
Moreover, denoting $\varphi_n(u) = \int_0^1 \varphi(v)\,dv + \sum_{m=1}^{n}\gamma_m\sqrt{2}\cos(m\pi u)$, the uniform convergence rates (14) apply.
5 Uniform convergence of SNP densities on the unit interval and their derivatives

Let $h(u)$ be a density on $[0,1]$ satisfying $h(u) > 0$ on $(0,1)$, and denote $\varphi(u) = \sqrt{h(u)}$. Again, suppose that $\varphi(u)$ is four times continuously differentiable on $(0,1)$, with $\varphi'''' \in L^2(0,1)$. Moreover, suppose that $\varphi'(1) = \varphi'(0) = 0$ and $\varphi'''(1) = \varphi'''(0) = 0$.
As we have seen before, we can write
$$\varphi(u) \equiv \int_0^1 \varphi(v)\,dv + \sum_{m=1}^{\infty}\frac{\alpha_m}{(m\pi)^4}\sqrt{2}\cos(m\pi u) \equiv \gamma_0 + \sum_{m=1}^{\infty}\gamma_m\sqrt{2}\cos(m\pi u), \text{ say,} \equiv \frac{1 + \sum_{m=1}^{\infty}\delta_m\sqrt{2}\cos(m\pi u)}{\sqrt{1 + \sum_{k=1}^{\infty}\delta_k^2}},$$
where now $\sum_{m=0}^{\infty}\gamma_m^2 = 1$ and
$$\delta_m = \frac{\gamma_m}{\int_0^1 \varphi(v)\,dv} = \frac{\alpha_m}{(m\pi)^4\int_0^1 \sqrt{h(v)}\,dv}.$$
Note that
$$\sum_{m=n+1}^{\infty}\delta_m^2 = O\left(\sum_{m=n+1}^{\infty}\frac{\alpha_m^2}{m^8}\right) = o(n^{-7}). \qquad (18)$$
Next, let
$$\varphi_n(u) = \frac{1 + \sum_{m=1}^{n}\delta_m\sqrt{2}\cos(m\pi u)}{\sqrt{1 + \sum_{k=1}^{n}\delta_k^2}}.$$
It is now easy to verify, using (18), that we still have
$$\sup_{0\le u\le 1}|\varphi(u) - \varphi_n(u)| = o(n^{-7/2}), \quad \sup_{0\le u\le 1}|\varphi'(u) - \varphi_n'(u)| = o(n^{-5/2}),$$
$$\sup_{0\le u\le 1}|\varphi''(u) - \varphi_n''(u)| = o(n^{-3/2}), \quad \sup_{0\le u\le 1}|\varphi'''(u) - \varphi_n'''(u)| = o(n^{-1/2}).$$
Using the latter results it follows straightforwardly that the following results hold.

Theorem 5.1. Let $h(u)$ be a positive density on $(0,1)$ such that $\varphi(u) = \sqrt{h(u)}$ satisfies the conditions of Theorem 4.2.³ Then $h(u)$ has exactly and uniformly the cosine series representation
$$h(u) \equiv \frac{\left(1 + \sum_{m=1}^{\infty}\delta_m\sqrt{2}\cos(m\pi u)\right)^2}{1 + \sum_{m=1}^{\infty}\delta_m^2},$$

³So that $h(u)$ is uniformly continuous on $[0,1]$.
where the $\delta_m$'s are uniquely defined by (3) and satisfy
$$\sum_{m=1}^{\infty} m^3|\delta_m| < \infty. \qquad (19)$$
Moreover, with $h_n(u)$ defined by (1), the following uniform convergence rates apply:
$$\sup_{0\le u\le 1}|h(u) - h_n(u)| = o(n^{-7/2}), \quad \sup_{0\le u\le 1}|h'(u) - h_n'(u)| = o(n^{-5/2}),$$
$$\sup_{0\le u\le 1}|h''(u) - h_n''(u)| = o(n^{-3/2}), \quad \sup_{0\le u\le 1}|h'''(u) - h_n'''(u)| = o(n^{-1/2}).$$
Furthermore, with $H$ the c.d.f. of $h$ and $H_n$ the c.d.f. of $h_n$ we have
$$\sup_{0\le u\le 1}|H(u) - H_n(u)| \le \sup_{0\le u\le 1}|h(u) - h_n(u)| = o(n^{-7/2}).$$
These uniform convergence results depend crucially on the conditions that $\varphi''''(u) \in L^2(0,1)$ and $\varphi'(1) = \varphi'(0) = 0$, $\varphi'''(1) = \varphi'''(0) = 0$, where $\varphi(u) = \sqrt{h(u)}$. So the question is how to impose these conditions.
6 Tail conditions

Recall that, with $F(x)$ an absolutely continuous distribution function with density $f(x)$ and support $\mathbb{R}$, we can always write $F(x) = H(G(x))$, $f(x) = h(G(x))g(x)$, where $G(x)$ is an a priori chosen absolutely continuous distribution function with density $g(x)$ and support $\mathbb{R}$, and $H(u)$ is an absolutely continuous distribution function on $[0,1]$ with density $h(u)$, given by
$$H(u) = F(G^{-1}(u)), \quad h(u) = H'(u) = f(G^{-1}(u))/g(G^{-1}(u)),$$
with $G^{-1}(u)$ the inverse of $G(x)$. Thus, with $\varphi(u) = \sqrt{h(u)}$, the question is now:

Under what conditions on $f$ and $g$ do we have
$$\varphi'''' \in L^2(0,1), \quad \varphi'(1) = \varphi'(0) = 0 \text{ and } \varphi'''(1) = \varphi'''(0) = 0\,?$$

First of all, $f$ and $g$ need to be four times continuously differentiable on $\mathbb{R}$. Second, a necessary condition is that $G$ is chosen such that
$$\varphi(u) = \sqrt{h(u)} = \sqrt{f(G^{-1}(u))}/\sqrt{g(G^{-1}(u))}$$
and its derivatives $\varphi'(u)$, $\varphi''(u)$ and $\varphi'''(u)$ are uniformly continuous on $[0,1]$, as otherwise the aforementioned uniform convergence results cannot hold. Regarding $\varphi(u)$ itself, this requirement holds if and only if the tails of $g$ are not fatter than those of $f$, because then $\varphi(u)$ is bounded on $[0,1]$, which by continuity of $\varphi(u)$ on $(0,1)$ implies uniform continuity on $[0,1]$.
6.1 The choice of G

Given that $\int_{-\infty}^{\infty}|x|f(x)\,dx < \infty$, which implies that $\lim_{|x|\to\infty} x^2 f(x) = 0$,⁴ this tail condition holds if we choose for $G(x)$ the c.d.f. of the standard Cauchy distribution, i.e.,
$$G(x) = 0.5 + \pi^{-1}\arctan(x), \quad g(x) = \frac{1}{\pi(1 + x^2)}, \quad G^{-1}(u) = \tan(\pi(u - 0.5)).$$
Then
$$h(u) = \pi\left(1 + \tan^2(\pi(u - 0.5))\right)f(\tan(\pi(u - 0.5))), \qquad (20)$$
which satisfies
$$h(0) = \lim_{u\downarrow 0} h(u) = \lim_{x\to-\infty} x^2 f(x) = 0, \quad h(1) = \lim_{u\uparrow 1} h(u) = \lim_{x\to+\infty} x^2 f(x) = 0.$$
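For a concrete $f$, the transformation (20) is one line of code. The sketch below is my own illustration (the standard normal choice of $f$ is an assumption, not the paper's); it checks that $h$ integrates to 1 on $(0,1)$ and vanishes at the endpoints.

```python
import math

def f(x):
    """Example: standard normal density (an assumed choice of f)."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def h(u):
    """h(u) in (20): the density of G(X) with G the standard Cauchy c.d.f."""
    x = math.tan(math.pi * (u - 0.5))
    return math.pi * (1.0 + x * x) * f(x)

# midpoint rule on (0,1); h is a density, so the total should be ~1
total = sum(h((i + 0.5) / 10000) for i in range(10000)) / 10000
print(total, h(0.001), h(0.999))  # ~1, and h vanishes near the endpoints
```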
6.2 Tail conditions for the derivatives

Now with $G$ the c.d.f. of the standard Cauchy distribution we can write
$$\varphi(u) = \sqrt{h(u)} = \sqrt{\pi}\left(\sqrt{1 + \tan^2(\pi(u - 0.5))}\right)\eta(\tan(\pi(u - 0.5))), \qquad (21)$$
where for notational convenience, $\eta(x) = \sqrt{f(x)}$. Of course, in order that $\varphi(u)$ is four times differentiable on $(0,1)$ we need to require that $\eta(x)$ is four times differentiable on $\mathbb{R}$, hence $f(x)$ needs to be four times differentiable on $\mathbb{R}$.

Next, denote
$$\phi(u|k, c, \psi) = \tan^k(\pi(u - 0.5))\left(1 + \tan^2(\pi(u - 0.5))\right)^c \psi(\tan(\pi(u - 0.5))) \qquad (22)$$
⁴See Lemma 6.1 below.
for some differentiable real function $\psi$ on $\mathbb{R}$, so that
$$\varphi(u) = \sqrt{\pi}\,\phi(u|0, 1/2, \eta). \qquad (23)$$
Using the well-known fact that $d\tan(\pi(u - 0.5))/du = \pi(1 + \tan^2(\pi(u - 0.5)))$, it follows that
$$\phi'(u|k, c, \psi) = \partial\phi(u|k, c, \psi)/\partial u = k\pi\phi(u|k-1, c+1, \psi) + 2c\pi\phi(u|k+1, c, \psi) + \pi\phi(u|k, c+1, \psi'), \qquad (24)$$
hence, plugging in $k = 0$, $c = 0.5$ and $\psi = \eta$, it follows from (24) that
$$\varphi'(u) = \pi\sqrt{\pi}\,\phi(u|1, 0.5, \eta) + \pi\sqrt{\pi}\,\phi(u|0, 1.5, \eta'). \qquad (25)$$
Similarly, it follows from (24) that
$$\varphi''(u) = \pi\sqrt{\pi}\left(\phi'(u|1, 0.5, \eta) + \phi'(u|0, 1.5, \eta')\right) = \pi^2\sqrt{\pi}\left(\phi(u|0, 1.5, \eta) + \phi(u|2, 0.5, \eta) + 4\phi(u|1, 1.5, \eta') + \phi(u|0, 2.5, \eta'')\right), \qquad (26)$$
$$\varphi'''(u) = \pi^2\sqrt{\pi}\left(\phi'(u|0, 1.5, \eta) + \phi'(u|2, 0.5, \eta) + 4\phi'(u|1, 1.5, \eta') + \phi'(u|0, 2.5, \eta'')\right)$$
$$= \pi^3\sqrt{\pi}\left(5\phi(u|1, 1.5, \eta) + \phi(u|3, 0.5, \eta) + 5\phi(u|0, 2.5, \eta') + 13\phi(u|2, 1.5, \eta') + 9\phi(u|1, 2.5, \eta'') + \phi(u|0, 3.5, \eta''')\right), \qquad (27)$$
and
$$\varphi''''(u) = \pi^3\sqrt{\pi}\left(5\phi'(u|1, 1.5, \eta) + \phi'(u|3, 0.5, \eta) + 5\phi'(u|0, 2.5, \eta') + 13\phi'(u|2, 1.5, \eta') + 9\phi'(u|1, 2.5, \eta'') + \phi'(u|0, 3.5, \eta''')\right)$$
$$= \pi^4\sqrt{\pi}\left(5\phi(u|0, 2.5, \eta) + 18\phi(u|2, 1.5, \eta) + \phi(u|4, 0.5, \eta) + 56\phi(u|1, 2.5, \eta') + 40\phi(u|3, 1.5, \eta') + 58\phi(u|2, 2.5, \eta'') + 14\phi(u|0, 3.5, \eta'') + 16\phi(u|1, 3.5, \eta''') + \phi(u|0, 4.5, \eta'''')\right). \qquad (28)$$
Next, observe from (22) that
$$\lim_{u\downarrow 0}\phi(u|k, c, \psi) = \lim_{x\to-\infty} x^{k+2c}\psi(x), \quad \lim_{u\uparrow 1}\phi(u|k, c, \psi) = \lim_{x\to+\infty} x^{k+2c}\psi(x), \qquad (29)$$
hence by (23), (25), (26) and (27),
$$\varphi(0) = \sqrt{\pi}\lim_{x\to-\infty} x\eta(x), \quad \varphi(1) = \sqrt{\pi}\lim_{x\to+\infty} x\eta(x),$$
$$\varphi'(0) = \pi\sqrt{\pi}\left(\lim_{x\to-\infty} x^2\eta(x) + \lim_{x\to-\infty} x^3\eta'(x)\right), \quad \varphi'(1) = \pi\sqrt{\pi}\left(\lim_{x\to+\infty} x^2\eta(x) + \lim_{x\to+\infty} x^3\eta'(x)\right),$$
$$\varphi''(0) = \pi^2\sqrt{\pi}\left(2\lim_{x\to-\infty} x^3\eta(x) + 4\lim_{x\to-\infty} x^4\eta'(x) + \lim_{x\to-\infty} x^5\eta''(x)\right),$$
$$\varphi''(1) = \pi^2\sqrt{\pi}\left(2\lim_{x\to+\infty} x^3\eta(x) + 4\lim_{x\to+\infty} x^4\eta'(x) + \lim_{x\to+\infty} x^5\eta''(x)\right),$$
and
$$\varphi'''(0) = \pi^3\sqrt{\pi}\left(6\lim_{x\to-\infty} x^4\eta(x) + 18\lim_{x\to-\infty} x^5\eta'(x) + 9\lim_{x\to-\infty} x^6\eta''(x) + \lim_{x\to-\infty} x^7\eta'''(x)\right),$$
$$\varphi'''(1) = \pi^3\sqrt{\pi}\left(6\lim_{x\to+\infty} x^4\eta(x) + 18\lim_{x\to+\infty} x^5\eta'(x) + 9\lim_{x\to+\infty} x^6\eta''(x) + \lim_{x\to+\infty} x^7\eta'''(x)\right).$$
Thus,
$$\varphi'''(0) = \varphi'''(1) = \varphi'(0) = \varphi'(1) = 0 \text{ and } \varphi''(0) = \varphi''(1) = \varphi(0) = \varphi(1) = 0 \qquad (30)$$
if
$$\lim_{|x|\to\infty} x^4\eta(x) = 0, \quad \lim_{|x|\to\infty} x^5\eta'(x) = 0, \quad \lim_{|x|\to\infty} x^6\eta''(x) = 0, \quad \lim_{|x|\to\infty} x^7\eta'''(x) = 0. \qquad (31)$$
Now assume:

Assumption 6.1. Given a density $f(x)$ with support $\mathbb{R}$, the following conditions hold:
(a) $f(x)$ is four times continuously differentiable on $\mathbb{R}$;
(b) $\int_{-\infty}^{\infty}|x|^3\sqrt{f(x)}\,dx < \infty$;
(c) Denoting $\eta(x) = \sqrt{f(x)}$, the set $\{x \in \mathbb{R} : \eta'(x) = 0 \text{ or } \eta''(x) = 0 \text{ or } \eta'''(x) = 0\}$ is finite.

Assumption 6.1 holds for most densities considered in parametric econometric models, in particular the densities of the normal and logistic distributions. Moreover, consider the following easy lemma.

Lemma 6.1. Let $\psi(x)$ be a continuous real function on $\mathbb{R}$ such that the set $\{x \in \mathbb{R} : \psi(x) = 0\}$ is either finite or empty, and $\int_{-\infty}^{\infty}|\psi(x)|\,dx < \infty$. Then $\lim_{|x|\to\infty} x\psi(x) = 0$.
Proof. See the Appendix.
Then $\lim_{|x|\to\infty} x^4\eta(x) = 0$ follows from condition (b) and Lemma 6.1. Due to condition (c) and the fact that $\lim_{|x|\to\infty}\eta(x) = 0$ there exists an $a > 0$ such that $\eta'(x) \le 0$ for all $x \ge a$ and $\eta'(x) \ge 0$ for all $x \le -a$. Then for $b > a$, using integration by parts,
$$\int_a^b x^4\eta'(x)\,dx = b^4\eta(b) - a^4\eta(a) - 4\int_a^b x^3\eta(x)\,dx.$$
Letting $b \to \infty$ it follows from $\lim_{|x|\to\infty} x^4\eta(x) = 0$ and condition (b) that
$$\int_a^{\infty} x^4|\eta'(x)|\,dx = a^4\eta(a) + 4\int_a^{\infty} x^3\eta(x)\,dx < \infty,$$
and similarly
$$\int_{-\infty}^{-a} x^4|\eta'(x)|\,dx = a^4\eta(-a) + 4\int_{-\infty}^{-a}|x|^3\eta(x)\,dx < \infty.$$
Hence, $\int_{-\infty}^{\infty} x^4|\eta'(x)|\,dx < \infty$, which by Lemma 6.1 implies that $\lim_{|x|\to\infty} x^5\eta'(x) = 0$. Along similar lines it can be shown that
Lemma 6.2. Assumption 6.1 implies that the limits (31) hold, hence thetail conditions (30) hold.
Finally, we have to set forth conditions such that ϕ0000(u) ∈ L2(0, 1), whichis the case if
R 10ϕ0000(u)2du <∞. A sufficient condition for the latter is that
max
µlimu↓0|ϕ0000(u)|, lim
u↑1|ϕ0000(u)|
¶<∞, (32)
because then ϕ0000(u) is uniformly continuous [0, 1].Now observe from (29) and (28) that
$$\varphi''''(0)=\lim_{u\downarrow 0}\varphi''''(u)=\pi^{4}\sqrt{\pi}\Big(24\lim_{x\to-\infty}x^{5}\eta(x)+96\lim_{x\to-\infty}x^{6}\eta'(x)+72\lim_{x\to-\infty}x^{7}\eta''(x)+16\lim_{x\to-\infty}x^{8}\eta'''(x)+\lim_{x\to-\infty}x^{9}\eta''''(x)\Big),$$
$$\varphi''''(1)=\lim_{u\uparrow 1}\varphi''''(u)=\pi^{4}\sqrt{\pi}\Big(24\lim_{x\to+\infty}x^{5}\eta(x)+96\lim_{x\to+\infty}x^{6}\eta'(x)+72\lim_{x\to+\infty}x^{7}\eta''(x)+16\lim_{x\to+\infty}x^{8}\eta'''(x)+\lim_{x\to+\infty}x^{9}\eta''''(x)\Big),$$
hence (32) holds if
$$\lim_{|x|\to\infty}|x|^{5}\eta(x)<\infty,\quad\lim_{|x|\to\infty}x^{6}|\eta'(x)|<\infty,\quad\lim_{|x|\to\infty}|x^{7}\eta''(x)|<\infty,\quad\lim_{|x|\to\infty}x^{8}|\eta'''(x)|<\infty,\quad\lim_{|x|\to\infty}|x^{9}\eta''''(x)|<\infty. \qquad (33)$$
However, it is difficult, if not impossible, to derive more primitive general conditions for (33). On the other hand, suppose that:
Assumption 6.2. In addition to the conditions in Assumption 6.1, the following conditions hold:
(1) $\int_{-\infty}^{\infty}x^{4}\sqrt{f(x)}\,dx<\infty$;
(2) The set $\{x\in\mathbb{R}:\eta''''(x)=0\}$ is finite, where $\eta(x)=\sqrt{f(x)}$.
Then it follows, similarly to Lemma 6.2, that
$$\lim_{|x|\to\infty}|x|^{5}\eta(x)=0,\quad\lim_{|x|\to\infty}x^{6}|\eta'(x)|=0,\quad\lim_{|x|\to\infty}|x^{7}\eta''(x)|=0,\quad\lim_{|x|\to\infty}x^{8}|\eta'''(x)|=0,\quad\lim_{|x|\to\infty}|x^{9}\eta''''(x)|=0,$$
hence
$$\varphi''''(0)=\varphi''''(1)=0. \qquad (34)$$
Part (2) of Assumption 6.2 is not a big deal, but admittedly, part (1) is too strong a condition for $\int_{0}^{1}\varphi''''(u)^{2}\,du<\infty$. Nevertheless, I will adopt Assumption 6.2, as it holds for most densities considered in parametric econometric models. Summarizing, the following results hold.
Theorem 6.1. Under Assumptions 6.1 and 6.2 the function $\varphi(u)=\sqrt{h(u)}$ defined in (21) satisfies the tail conditions (30) and (34). Consequently, the results in Theorem 5.1 carry over to the density $h(u)$ defined by (20).
7 Densities on R and their derivatives

As has been shown, every density $f(x)$ with support $\mathbb{R}$ satisfying the conditions in Assumptions 6.1 and 6.2 can be written as
$$f(x)=\frac{h(0.5+\pi^{-1}\arctan(x))}{\pi(1+x^{2})}, \qquad (35)$$
with
$$h(u)=\pi\big(1+\tan^{2}(\pi(u-0.5))\big)f\big(\tan(\pi(u-0.5))\big)=\frac{\big(1+\sum_{m=1}^{\infty}\delta_{m}\sqrt{2}\cos(m\pi u)\big)^{2}}{1+\sum_{m=1}^{\infty}\delta_{m}^{2}}=h(u|\delta),\text{ say,} \qquad (36)$$
where $\delta=\{\delta_{m}\}_{m=1}^{\infty}$, with the $\delta_{m}$'s defined by (3), is unique and satisfies the condition $\sum_{m=1}^{\infty}m^{3}|\delta_{m}|<\infty$. Hence, we can write $f(x)=f(x|\delta)$, where
$$f(x|\delta)\equiv\frac{h(0.5+\pi^{-1}\arctan(x)|\delta)}{\pi(1+x^{2})}\equiv\frac{\big(1+\sum_{m=1}^{\infty}\delta_{m}\sqrt{2}\cos(m\pi/2+m\arctan(x))\big)^{2}}{\pi(1+x^{2})\big(1+\sum_{k=1}^{\infty}\delta_{k}^{2}\big)}. \qquad (37)$$
The SNP version $h_{n}(u)$ of $h(u)$ can now be written as $h_{n}(u)=h(u|\pi_{n}\delta)$, where $\pi_{n}$ is the truncation operator, i.e.,

Definition 7.1. The operator $\pi_{n}$ applied to $\delta=\{\delta_{m}\}_{m=1}^{\infty}$ as $\pi_{n}\delta$ sets all the $\delta_{m}$'s for $m>n$ to zero.
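On a finite coefficient vector standing in for the infinite sequence, $\pi_{n}$ amounts to zeroing out the tail; a one-line Python sketch (the names are mine):

```python
import numpy as np

def truncate(delta, n):
    """pi_n of Definition 7.1: keep delta_1..delta_n, set delta_m = 0 for m > n."""
    out = np.array(delta, dtype=float)
    out[n:] = 0.0
    return out
```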
Similarly, the SNP version $f_{n}(x)$ of $f(x)$ can be written as $f_{n}(x)=f(x|\pi_{n}\delta)$. It follows now, similarly to Theorems 5.1 and 6.1, that the following results hold.
Theorem 7.1. Every density $f(x)$ with support $\mathbb{R}$ satisfying Assumptions 6.1 and 6.2 has the exact and uniform cosine series representation $f(x)\equiv f(x|\delta)$, with $f(x|\delta)$ defined by (37), where $\delta=\{\delta_{m}\}_{m=1}^{\infty}$ is unique and is contained in the infinite-dimensional parameter space
$$\Delta_{3}=\Big\{\delta=\{\delta_{m}\}_{m=1}^{\infty}:\sum_{m=1}^{\infty}m^{3}|\delta_{m}|<\infty\Big\}. \qquad (38)$$
Moreover, with $\pi_{n}$ the truncation operator, we have
$$\sup_{x\in\mathbb{R}}|f(x|\delta)-f(x|\pi_{n}\delta)|=o(n^{-7/2}), \qquad (39)$$
$$\sup_{x\in\mathbb{R}}|f'(x|\delta)-f'(x|\pi_{n}\delta)|=o(n^{-5/2}),$$
$$\sup_{x\in\mathbb{R}}|f''(x|\delta)-f''(x|\pi_{n}\delta)|=o(n^{-3/2}),$$
$$\sup_{x\in\mathbb{R}}|f'''(x|\delta)-f'''(x|\pi_{n}\delta)|=o(n^{-1/2}),$$
where the derivatives are with respect to $x$. Consequently, $f'(x)\equiv f'(x|\delta)$, $f''(x)\equiv f''(x|\delta)$ and $f'''(x)\equiv f'''(x|\delta)$, respectively, exactly and uniformly on $\mathbb{R}$. Furthermore, denoting the c.d.f. of $f(x|\delta)$ by $F(x|\delta)$, it follows from (39) that
$$\sup_{x\in\mathbb{R}}|F(x|\delta)-F(x|\pi_{n}\delta)|=o(n^{-7/2}).$$
Remark 7.1. It can be shown by (3), (21), the change of variable $x=\tan(\pi(u-0.5))$, and the well-known sine-cosine formulas that for $m\in\mathbb{N}$,
$$\delta_{2m}=\sqrt{2}(-1)^{m}\,\frac{\int_{-\infty}^{\infty}\cos(2m\arctan(x))\frac{\sqrt{f(x)}}{\sqrt{1+x^{2}}}\,dx}{\int_{-\infty}^{\infty}\frac{\sqrt{f(x)}}{\sqrt{1+x^{2}}}\,dx},$$
$$\delta_{2m-1}=\sqrt{2}(-1)^{m}\,\frac{\int_{-\infty}^{\infty}\sin((2m-1)\arctan(x))\frac{\sqrt{f(x)}}{\sqrt{1+x^{2}}}\,dx}{\int_{-\infty}^{\infty}\frac{\sqrt{f(x)}}{\sqrt{1+x^{2}}}\,dx}.$$
Hence, if $f(x)$ is symmetric around zero then $\delta_{2m-1}=0$.
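The coefficient formulas in Remark 7.1 can be evaluated by ordinary quadrature. The sketch below (function names, grid, and trapezoid helper are mine; accuracy depends on the grid and on the tails of $f$) computes $\delta_{1},\ldots,\delta_{n}$ for a given density. For the standard Cauchy density $h(u)\equiv 1$, so all coefficients should come out (numerically) zero, and the odd-indexed ones vanish exactly on a symmetric grid.

```python
import numpy as np

def _trap(y, x):
    # composite trapezoid rule
    return float(np.sum((y[1:] + y[:-1]) * 0.5 * np.diff(x)))

def cosine_coeffs(f, n, x=np.linspace(-1000.0, 1000.0, 400001)):
    """delta_1..delta_n via the integral formulas of Remark 7.1.
    A numerical sketch, not the paper's code; f is a vectorized density."""
    w = np.sqrt(f(x)) / np.sqrt(1.0 + x**2)
    denom = _trap(w, x)
    a = np.arctan(x)
    out = []
    for m in range(1, n + 1):
        if m % 2 == 0:   # even index 2k: cos(2k arctan x), sign (-1)^k
            num, sign = _trap(np.cos(m * a) * w, x), (-1.0) ** (m // 2)
        else:            # odd index 2k-1: sin((2k-1) arctan x), sign (-1)^k
            num, sign = _trap(np.sin(m * a) * w, x), (-1.0) ** ((m + 1) // 2)
        out.append(np.sqrt(2.0) * sign * num / denom)
    return np.array(out)
```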
Remark 7.2. A natural norm on the space $\Delta_{3}$ in (38) is
$$||\delta||_{3}=\sum_{m=1}^{\infty}m^{3}|\delta_{m}| \qquad (40)$$
for $\delta=\{\delta_{m}\}_{m=1}^{\infty}$. Endowed with this norm and the associated metric $||\delta_{1}-\delta_{2}||_{3}$, the space $\Delta_{3}$ becomes a Banach space,$^{5}$ because it is not hard to verify that every Cauchy sequence in $\Delta_{3}$ converges to a limit in $\Delta_{3}$.
The results in Theorem 7.1 play a key role in deriving asymptotic normality results for the sieve estimator of the Euclidean parameter vector in SNP models in which these Euclidean parameters enter as the argument of an unknown distribution or density function in the form of a linear combination of covariates, as is the case for the SNP discrete choice model considered in Bierens (2014b). Similarly to Bierens (2014b), we also need to establish the continuity of $f(x|\delta)$, $f'(x|\delta)$ and $f''(x|\delta)$ in $\delta$, as well as the continuity in $\delta$ of their first and second partial derivatives with respect to the elements of $\delta=\{\delta_{m}\}_{m=1}^{\infty}$. Due to the relation between $h(u|\delta)$ and $f(x|\delta)$ in (37), it suffices to verify these continuity properties for $h(u|\delta)$, which I have already done in Bierens (2014b, Lemma 5.1), restated here as Lemma 7.1.
$^{5}$A Banach space is a vector space endowed with a norm and associated metric such that every Cauchy sequence in this space has a limit in this space. The difference with a Hilbert space is that this norm is defined directly rather than on the basis of an inner product.
Lemma 7.1. Let $\delta=\{\delta_{m}\}_{m=1}^{\infty}$ and $\delta_{*}=\{\delta_{*,m}\}_{m=1}^{\infty}$ be sequences satisfying $||\delta||_{\ell}=\sum_{m=1}^{\infty}m^{\ell}|\delta_{m}|<\infty$ and $||\delta_{*}-\delta||_{\ell}<1$ for some $\ell\in\mathbb{N}$, and denote $h^{(k)}(u|\delta)=d^{k}h(u|\delta)/(du)^{k}$, $h^{(0)}(u|\delta)=h(u|\delta)$, where the latter is defined by (36). Moreover, denote
$$\nabla_{m}h^{(k)}(u|\delta_{*})=\partial h^{(k)}(u|\delta)/\partial\delta_{m}\big|_{\delta=\delta_{*}}, \qquad (41)$$
$$\nabla_{m_{1},m_{2}}h^{(k)}(u|\delta_{*})=\partial^{2}h^{(k)}(u|\delta)/(\partial\delta_{m_{1}}\partial\delta_{m_{2}})\big|_{\delta=\delta_{*}}. \qquad (42)$$
Then for $k=0,1,\ldots,\ell$ and $m,m_{1},m_{2}\in\mathbb{N}$,
$$\sup_{0\le u\le 1}\big|h^{(k)}(u|\delta)\big|<C_{k},$$
$$\sup_{0\le u\le 1}\big|\nabla_{m}h^{(k)}(u|\delta)\big|<C_{k}\,m^{k},$$
$$\sup_{0\le u\le 1}\big|\nabla_{m_{1},m_{2}}h^{(k)}(u|\delta)\big|<C_{k}\,m_{1}^{k}m_{2}^{k},$$
$$\sup_{0\le u\le 1}\big|h^{(k)}(u|\delta_{*})-h^{(k)}(u|\delta)\big|<C_{k}\,||\delta_{*}-\delta||_{k},$$
$$\sup_{0\le u\le 1}\big|\nabla_{m}h^{(k)}(u|\delta_{*})-\nabla_{m}h^{(k)}(u|\delta)\big|<C_{k}\,m^{k}\,||\delta_{*}-\delta||_{k},$$
$$\sup_{0\le u\le 1}\big|\nabla_{m_{1},m_{2}}h^{(k)}(u|\delta_{*})-\nabla_{m_{1},m_{2}}h^{(k)}(u|\delta)\big|<C_{k}\,m_{1}^{k}m_{2}^{k}\,||\delta_{*}-\delta||_{k},$$
where the $C_{k}$'s are constants depending only on $k$ and the norms $||\delta||_{k}=\sum_{m=1}^{\infty}m^{k}|\delta_{m}|$ for $k=0,1,\ldots,\ell$.
Proof. See Lemma 5.1 in Bierens (2014b). The proof of this lemma is given in a separate appendix to Bierens (2014b), which is reprinted in Bierens (2017, Chapter 10).
As said before, the results in Lemma 7.1 carry over to $f(x|\delta)$ defined by (37), for $\ell=3$, simply by replacing $h(u|\delta)$ with $f(x|\delta)$ and "$\sup_{0\le u\le 1}$" with "$\sup_{x\in\mathbb{R}}$". In particular, Theorem 7.1 can now be augmented with the following results.
Theorem 7.2. Under the conditions of Theorem 7.1, and with the derivative operators $\nabla_{m}$ and $\nabla_{m_{1},m_{2}}$ defined similarly to (41) and (42), respectively, we have for any pair $\delta,\delta_{*}\in\Delta_{3}$ satisfying $||\delta-\delta_{*}||_{3}<1$,
$$\sup_{x\in\mathbb{R}}|f(x|\delta_{*})-f(x|\delta)|<C_{0}\,||\delta_{*}-\delta||_{0},$$
$$\sup_{x\in\mathbb{R}}|f'(x|\delta_{*})-f'(x|\delta)|<C_{1}\,||\delta_{*}-\delta||_{1},$$
$$\sup_{x\in\mathbb{R}}|f''(x|\delta_{*})-f''(x|\delta)|<C_{2}\,||\delta_{*}-\delta||_{2},$$
$$\sup_{x\in\mathbb{R}}|f'''(x|\delta_{*})-f'''(x|\delta)|<C_{3}\,||\delta_{*}-\delta||_{3},$$
$$\sup_{x\in\mathbb{R}}|\nabla_{m}f(x|\delta_{*})-\nabla_{m}f(x|\delta)|<C_{0}\,||\delta_{*}-\delta||_{0},$$
$$\sup_{x\in\mathbb{R}}|\nabla_{m}f'(x|\delta_{*})-\nabla_{m}f'(x|\delta)|<C_{1}\,m\,||\delta_{*}-\delta||_{1},$$
$$\sup_{x\in\mathbb{R}}|\nabla_{m}f''(x|\delta_{*})-\nabla_{m}f''(x|\delta)|<C_{2}\,m^{2}\,||\delta_{*}-\delta||_{2},$$
$$\sup_{x\in\mathbb{R}}|\nabla_{m}f'''(x|\delta_{*})-\nabla_{m}f'''(x|\delta)|<C_{3}\,m^{3}\,||\delta_{*}-\delta||_{3},$$
and
$$\sup_{x\in\mathbb{R}}|\nabla_{m_{1},m_{2}}f(x|\delta_{*})-\nabla_{m_{1},m_{2}}f(x|\delta)|<C_{0}\,||\delta_{*}-\delta||_{0},$$
$$\sup_{x\in\mathbb{R}}|\nabla_{m_{1},m_{2}}f'(x|\delta_{*})-\nabla_{m_{1},m_{2}}f'(x|\delta)|<C_{1}\,m_{1}m_{2}\,||\delta_{*}-\delta||_{1},$$
$$\sup_{x\in\mathbb{R}}|\nabla_{m_{1},m_{2}}f''(x|\delta_{*})-\nabla_{m_{1},m_{2}}f''(x|\delta)|<C_{2}\,m_{1}^{2}m_{2}^{2}\,||\delta_{*}-\delta||_{2},$$
$$\sup_{x\in\mathbb{R}}|\nabla_{m_{1},m_{2}}f'''(x|\delta_{*})-\nabla_{m_{1},m_{2}}f'''(x|\delta)|<C_{3}\,m_{1}^{3}m_{2}^{3}\,||\delta_{*}-\delta||_{3},$$
where the $C_{k}$'s for $k=0,1,2,3$ are constants depending on $||\delta||_{3}$.
8 A compact infinite-dimensional parameter space

A standard assumption for nonlinear parametric models is that the true parameter vector involved is contained in the interior of a given compact set. This assumption is crucial for deriving the consistency and asymptotic normality of the parameter estimators involved. In SNP models the parameters involved are infinite-dimensional, and in general infinite-dimensional compact sets are extremely small. Therefore, in the SNP literature [see for example Chen (2007)] compactness assumptions are usually avoided.
However, in the case (38) it is possible to construct a compact subset of $\Delta_{3}$, namely
$$\Delta_{3}(M)=\Big\{\delta=\{\delta_{m}\}_{m=1}^{\infty}:\sum_{m=1}^{\infty}m^{3}|\delta_{m}|\le M\Big\} \qquad (43)$$
for a given constant $M>0$, endowed with the same norm $||\delta||_{3}$ and associated metric $||\delta_{1}-\delta_{2}||_{3}$ as before. Admittedly, $\Delta_{3}(M)$ is also small relative to $\Delta_{3}$, but as long as $M$ is chosen larger than the norm $||\delta_{0}||_{3}$ of the true parameter $\delta_{0}$, so that $\delta_{0}$ is contained in the interior of $\Delta_{3}(M)$, this space suffices as the infinite-dimensional parameter space. Thus, the problem of how to choose $M$ in this case is similar to the problem of how to choose the compact parameter space in the case of nonlinear parametric models. The compactness of (43) follows from the following more general theorem.
Theorem 8.1. For $\ell>0.5$ and $M>0$, denote
$$\Delta_{\ell}(M)=\Big\{\delta=\{\delta_{m}\}_{m=1}^{\infty}:\sum_{m=1}^{\infty}m^{\ell}|\delta_{m}|\le M\Big\}. \qquad (44)$$
(a) Endow this space with the norm $||\delta||_{\ell}=\sum_{m=1}^{\infty}m^{\ell}|\delta_{m}|$ and associated metric $||\delta_{1}-\delta_{2}||_{\ell}$. Then $\Delta_{\ell}(M)$ is compact with respect to the latter metric.
(b) Moreover, denoting the pseudo-Euclidean norm by $||\delta||=\sqrt{\sum_{m=1}^{\infty}\delta_{m}^{2}}$, with associated pseudo-Euclidean metric $||\delta_{1}-\delta_{2}||$, the space $\Delta_{\ell}(M)$ is compact with respect to the pseudo-Euclidean metric as well.
This result is in essence a combination of Lemmas 3.1 and 3.2 in Bierens (2017, pp. 631-632), where part (b) corresponds to Lemma 3.1 and part (a) corresponds to Lemma 3.2. However, the proof of the latter lemma was incorrect. The correct proof will be given in the Appendix.
9 An application to the SNP fractional index regression model

9.1 The model

Let $Y$ be a fractional dependent variable, i.e., $\Pr[Y\in(0,1)]=1$, and let $X\in\mathbb{R}^{d}$ be a vector of covariates. The SNP fractional index regression model
assumes that for some parameter vector $\theta_{0}$ and some unknown distribution function $F_{0}(x)$ on $\mathbb{R}$,
$$E[Y|X]=E[Y|\theta_{0}'X]=F_{0}(\theta_{0}'X).$$
Note, however, that $\theta_{0}$ and $F_{0}$ are not unique, because for any constant $c\neq 0$, $E[Y|\theta_{0}'X]=E[Y|c\,\theta_{0}'X]$ a.s., so that without loss of generality we may normalize $\theta_{0}$ to $||\theta_{0}||=1$, for example, or set one of the components of $\theta_{0}$ equal to $1$ or $-1$. Alternatively, one can achieve identification by imposing quantile conditions on $F_{0}$, as in Bierens (2014b), but it is easier to restrict $\theta_{0}$ rather than $F_{0}$.
Similarly, we cannot allow a constant component in $X$, because for any constant $c\neq 0$, $E[Y|\theta_{0}'X]=E[Y|c+\theta_{0}'X]$ a.s. Moreover, if all the components of $X$ are discrete then there exist multiple distinct $\theta_{0}$'s, and possibly uncountably many $\theta_{0}$'s, such that $E[Y|X]=E[Y|\theta_{0}'X]$ a.s., even after normalization. See Bierens and Hartog (1988).
Thus, at least one component of $X$ needs to be continuously distributed in order for the model involved to be identifiable, and such a component needs to have a nonzero coefficient and support $\mathbb{R}$.
As to the normalization of $\theta_{0}$, I will choose the following options. In the case $d\ge 2$, partition $X$ as $X=(X_{1},X_{2}')'$ and normalize $\theta_{0}$ as $\theta_{0}=(1,\beta_{0}')'$, where conditional on $X_{2}$, $X_{1}$ is absolutely continuously distributed with support $\mathbb{R}$, whereas in the case $d=1$, $\theta_{0}=1$ and $X$ itself has an absolutely continuous distribution with support $\mathbb{R}$.
Summarizing, the SNP fractional index regression model takes the form
$\Pr[Y\in(0,1)]=1$, with
$$E[Y|X]=\begin{cases}F_{0}(X_{1}+\beta_{0}'X_{2})\ \text{a.s.}&\text{if }X=(X_{1},X_{2}')'\in\mathbb{R}^{d},\ d\ge 2,\\ F_{0}(X)\ \text{a.s.}&\text{if }X\in\mathbb{R},\end{cases} \qquad (45)$$
where in the case $d\ge 2$, $\beta_{0}\in\mathbb{R}^{d-1}$ is the Euclidean parameter, and the absolutely continuous distribution function $F_{0}$ with support $\mathbb{R}$ acts as the non-Euclidean parameter.
The model involved assumes that $X_{1}$ has a positive effect on $E[Y|X]$. If not, simply replace $X_{1}$ by $-X_{1}$, and similarly in the case $X=X_{1}$. Next, suppose that:
Assumption 9.1. The density $f_{0}$ of $F_{0}$ satisfies the conditions of Theorem 7.1.
Then there exists a unique infinite-dimensional parameter $\delta_{0}=\{\delta_{0,m}\}_{m=1}^{\infty}\in\Delta_{3}$ such that
$$F_{0}(x)=F(x|\delta_{0})=\int_{-\infty}^{x}f(z|\delta_{0})\,dz,$$
with $f(x|\delta)$ defined by (37). Thus, we can write the SNP fractional regression model in the case $d\ge 2$ as
$$Y=F(X_{1}+\beta_{0}'X_{2}|\delta_{0})+U,\ \text{where }E[U|X_{1},X_{2}]=0\ \text{a.s.} \qquad (46)$$
9.2 Semi-nonparametric identification

In SNP modeling, the first question that needs to be answered is: is the model involved identified? In the present case the answer is yes, provided that the following conditions hold.

Assumption 9.2. In the case $d\ge 2$, the conditional distribution of $X_{1}$ given $X_{2}$ in the partition $X=(X_{1},X_{2}')'$ is absolutely continuous with support $\mathbb{R}$. Moreover, $E[||X_{2}||^{2}]<\infty$ and $\det(\mathrm{Var}(X_{2}))>0$. In the case $d=1$, $X$ itself has an absolutely continuous distribution with support $\mathbb{R}$.
It follows now, similarly to Lemma 2.1 in Bierens (2014b) and its proof [see Bierens (2017, pp. 585-586)], that if in the case $d\ge 2$ there exist an alternative parameter vector $\beta\in\mathbb{R}^{d-1}$ and/or an alternative absolutely continuous c.d.f. $F$ with support $\mathbb{R}$ such that
$$F(X_{1}+\beta'X_{2})=F_{0}(X_{1}+\beta_{0}'X_{2})\ \text{a.s.}, \qquad (47)$$
then under Assumption 9.2, (47) implies that $\beta=\beta_{0}$ and $F(x)=F_{0}(x)$ for all $x\in\mathbb{R}$. In other words:
Theorem 9.1. Under Assumption 9.2 the SNP fractional index regression model (45) is semi-nonparametrically identified. Consequently, the parameters $\beta_{0}$ and $\delta_{0}$ in the SNP fractional index regression model (46) are then identified as the unique least squares solution
$$(\beta_{0},\delta_{0})=\arg\min_{\beta\in\mathbb{R}^{d-1},\,\delta\in\Delta_{3}}Q(\beta,\delta),$$
where
$$Q(\beta,\delta)=E\big[(Y-F(X_{1}+\beta'X_{2}|\delta))^{2}\big]. \qquad (48)$$
9.3 Sieve least squares estimation: Strong consistency

From now on I will only consider the case $d\ge 2$. Given that

Assumption 9.3. We observe a random sample $\{(Y_{j},X_{1,j},X_{2,j})\}_{j=1}^{N}$ from $(Y,X_{1},X_{2})$, with $(X_{1},X_{2}')'\in\mathbb{R}\times\mathbb{R}^{d-1}$, $d\ge 2$,

the empirical counterpart of the expectation in (48) is
$$\widehat{Q}_{N}(\beta,\delta)=\frac{1}{N}\sum_{j=1}^{N}(Y_{j}-F(X_{1,j}+\beta'X_{2,j}|\delta))^{2}. \qquad (49)$$
Moreover, as motivated in section 8, I will confine $(\beta,\delta)$ to $B\times\Delta_{3}(M)$, where

Assumption 9.4. $B$ is a given compact subset of $\mathbb{R}^{d-1}$ with respect to the Euclidean metric $||\beta_{1}-\beta_{2}||$, containing $\beta_{0}$ in its interior, and $\Delta_{3}(M)$ is the compact set with respect to the metric $||\delta_{1}-\delta_{2}||_{3}$ defined in (43), where $M$ is chosen so large that $M>||\delta_{0}||_{3}$.

At first one may think of mimicking the standard nonlinear least squares approach by using
$$(\widehat{\beta},\widehat{\delta})=\arg\min_{(\beta,\delta)\in B\times\Delta_{3}(M)}\widehat{Q}_{N}(\beta,\delta)$$
as an estimator of $(\beta_{0},\delta_{0})$. However, due to the fact that $\Delta_{3}(M)$ is infinite-dimensional, this solution is not unique, and in general none of these solutions is consistent. The cure for this problem is sieve estimation, as follows. Denote for $n\in\mathbb{N}$,
$$\Delta_{3,n}(M)=\Big\{\delta=\{\delta_{m}\}_{m=1}^{\infty}:\sum_{m=1}^{n}m^{3}|\delta_{m}|\le M,\ \delta_{m}=0\ \text{for }m>n\Big\},$$
which is called a sieve space, and the collection $\{\Delta_{3,n}(M)\}_{n=1}^{\infty}$ is called the sieve, satisfying $\cup_{n=1}^{\infty}\Delta_{3,n}(M)=\Delta_{3}(M)$. The idea of sieve estimation, proposed by Grenander (1981), is in the present case to use
$$(\widehat{\beta}_{n_{N}},\widehat{\delta}_{n_{N}})=\arg\min_{(\beta,\delta)\in B\times\Delta_{3,n_{N}}(M)}\widehat{Q}_{N}(\beta,\delta) \qquad (50)$$
as an estimator of $(\beta_{0},\delta_{0})$, where the truncation order $n_{N}$ depends on the sample size $N$ and satisfies
$$\lim_{N\to\infty}n_{N}=\infty,\quad\lim_{N\to\infty}n_{N}/N=0.$$
Then the following strong consistency results hold.
Theorem 9.2. Under Assumptions 9.1-9.4, $\widehat{\beta}_{n_{N}}\overset{a.s.}{\to}\beta_{0}$ and $||\widehat{\delta}_{n_{N}}-\delta_{0}||_{3}\overset{a.s.}{\to}0$, hence
$$\sup_{x\in\mathbb{R}}\big|F_{0}(x)-F(x|\widehat{\delta}_{n_{N}})\big|\overset{a.s.}{\to}0,$$
$$\sup_{x\in\mathbb{R}}\big|f_{0}(x)-f(x|\widehat{\delta}_{n_{N}})\big|\overset{a.s.}{\to}0,$$
$$\sup_{x\in\mathbb{R}}\big|f_{0}'(x)-f'(x|\widehat{\delta}_{n_{N}})\big|\overset{a.s.}{\to}0.$$
Proof. See the Appendix.
9.4 Sieve least squares estimation: Asymptotic normality

The next step is to set forth conditions such that $\sqrt{N}(\widehat{\beta}_{n_{N}}-\beta_{0})\overset{d}{\to}N_{d-1}(0,\Sigma)$, using the approach in Bierens (2014b). This approach mimics the standard finite-dimensional approach on the basis of the mean value theorem for the first-order conditions, except that instead of inverting the matrix of second derivatives and then selecting the upper-left $(d-1)\times(d-1)$ part as the estimate of the asymptotic variance matrix of $\sqrt{N}(\widehat{\beta}_{n_{N}}-\beta_{0})$, the latter is singled out via projections.
First note that under the conditions of Theorem 9.2, and by (50), $(\widehat{\beta}_{n_{N}},\widehat{\delta}_{n_{N}})$ is an interior point of $B\times\Delta_{3,n_{N}}(M)$ with probability converging to 1 as $N\to\infty$. Consequently, it follows from the first-order conditions for (50) that, with probability converging to 1,
$$\sqrt{N}\,\partial\widehat{Q}_{N}(\beta,\delta)/\partial\beta_{i}\big|_{(\beta,\delta)=(\widehat{\beta}_{n_{N}},\widehat{\delta}_{n_{N}})}=0\ \text{for }i=1,2,\ldots,d-1,$$
$$\sqrt{N}\,\partial\widehat{Q}_{N}(\beta,\delta)/\partial\delta_{m}\big|_{(\beta,\delta)=(\widehat{\beta}_{n_{N}},\widehat{\delta}_{n_{N}})}=0\ \text{for }m=1,2,\ldots,n_{N},$$
jointly, where $\beta_{i}$ is component $i$ of $\beta=(\beta_{1},\ldots,\beta_{d-1})'$.
Then by the mean value expansion around $(\beta_{0},\pi_{n_{N}}\delta_{0})$ we have, for $i=1,2,\ldots,d-1$,
$$o_{p}(1)=\sqrt{N}\,\partial\widehat{Q}_{N}(\beta,\delta)/\partial\beta_{i}\big|_{(\beta,\delta)=(\widehat{\beta}_{n_{N}},\widehat{\delta}_{n_{N}})}$$
$$=\sqrt{N}\,\partial\widehat{Q}_{N}(\beta,\delta)/\partial\beta_{i}\big|_{(\beta,\delta)=(\beta_{0},\pi_{n_{N}}\delta_{0})}$$
$$+\sum_{s=1}^{d-1}\Big(\partial^{2}\widehat{Q}_{N}(\beta,\delta)/(\partial\beta_{i}\partial\beta_{s})\big|_{(\beta,\delta)=(\widetilde{\beta}_{n_{N}}^{(i)},\widetilde{\delta}_{n_{N}}^{(i)})}\Big)\sqrt{N}(\widehat{\beta}_{s,n_{N}}-\beta_{s,0})$$
$$+\sum_{m=1}^{n_{N}}\Big(\partial^{2}\widehat{Q}_{N}(\beta,\delta)/(\partial\beta_{i}\partial\delta_{m})\big|_{(\beta,\delta)=(\widetilde{\beta}_{n_{N}}^{(i)},\widetilde{\delta}_{n_{N}}^{(i)})}\Big)\sqrt{N}(\widehat{\delta}_{m,n_{N}}-\delta_{0,m}), \qquad (51)$$
where $(\widetilde{\beta}_{n_{N}}^{(i)},\widetilde{\delta}_{n_{N}}^{(i)})$ is a mean value, i.e., for some $\lambda_{i,N}\in[0,1]$,
$$(\widetilde{\beta}_{n_{N}}^{(i)},\widetilde{\delta}_{n_{N}}^{(i)})=(\beta_{0},\pi_{n_{N}}\delta_{0})+\lambda_{i,N}(\widehat{\beta}_{n_{N}}-\beta_{0},\widehat{\delta}_{n_{N}}-\pi_{n_{N}}\delta_{0}),$$
so that
$$||\widetilde{\beta}_{n_{N}}^{(i)}-\beta_{0}||\le||\widehat{\beta}_{n_{N}}-\beta_{0}||$$
and
$$||\widetilde{\delta}_{n_{N}}^{(i)}-\pi_{n_{N}}\delta_{0}||_{3}\le||\widehat{\delta}_{n_{N}}-\pi_{n_{N}}\delta_{0}||_{3}.$$
Moreover, $\widehat{\beta}_{s,n_{N}}$ and $\beta_{s,0}$ are components $s$ of $\widehat{\beta}_{n_{N}}$ and $\beta_{0}$, respectively, and $\widehat{\delta}_{m,n_{N}}$ and $\delta_{0,m}$ are components $m$ of $\widehat{\delta}_{n_{N}}$ and $\delta_{0}$, respectively. Similarly, for $k=1,2,\ldots,n_{N}$,
$$o_{p}(1)=\sqrt{N}\,\partial\widehat{Q}_{N}(\beta,\delta)/\partial\delta_{k}\big|_{(\beta,\delta)=(\widehat{\beta}_{n_{N}},\widehat{\delta}_{n_{N}})}$$
$$=\sqrt{N}\,\partial\widehat{Q}_{N}(\beta,\delta)/\partial\delta_{k}\big|_{(\beta,\delta)=(\beta_{0},\pi_{n_{N}}\delta_{0})}$$
$$+\sum_{s=1}^{d-1}\Big(\partial^{2}\widehat{Q}_{N}(\beta,\delta)/(\partial\delta_{k}\partial\beta_{s})\big|_{(\beta,\delta)=(\widehat{\beta}_{n_{N}}^{(k)},\widehat{\delta}_{n_{N}}^{(k)})}\Big)\sqrt{N}(\widehat{\beta}_{s,n_{N}}-\beta_{s,0})$$
$$+\sum_{m=1}^{n_{N}}\Big(\partial^{2}\widehat{Q}_{N}(\beta,\delta)/(\partial\delta_{k}\partial\delta_{m})\big|_{(\beta,\delta)=(\widehat{\beta}_{n_{N}}^{(k)},\widehat{\delta}_{n_{N}}^{(k)})}\Big)\sqrt{N}(\widehat{\delta}_{m,n_{N}}-\delta_{0,m}), \qquad (52)$$
where the $(\widehat{\beta}_{n_{N}}^{(k)},\widehat{\delta}_{n_{N}}^{(k)})$'s are again mean values, satisfying
$$||\widehat{\beta}_{n_{N}}^{(k)}-\beta_{0}||\le||\widehat{\beta}_{n_{N}}-\beta_{0}||$$
and
$$||\widehat{\delta}_{n_{N}}^{(k)}-\pi_{n_{N}}\delta_{0}||_{3}\le||\widehat{\delta}_{n_{N}}-\pi_{n_{N}}\delta_{0}||_{3}.$$
In order to derive the asymptotic normality of $\sqrt{N}(\widehat{\beta}_{n_{N}}-\beta_{0})$ we have to get rid of the terms involving $\sqrt{N}(\widehat{\delta}_{m,n_{N}}-\delta_{0,m})$ in (51) and (52). As said before, the standard finite-dimensional approach of inverting the $(d-1+n_{N})\times(d-1+n_{N})$ matrix of the second derivatives involved is not applicable because $n_{N}\to\infty$. Therefore, in Bierens (2014b) I proposed the following trick to get around this problem. Convert the $d-1+n_{N}$ mean value equations to a single system of mean value equations in function form, using an orthogonal sequence of weight functions $\eta_{k}(u)$ for $k=1,2,\ldots,d-1+n_{N}$ on $[0,1]$. In Bierens (2014b) I have proposed to choose
$$\eta_{k}(u)=2^{-k}\sqrt{2}\cos(k\pi u),\quad k\in\mathbb{N}, \qquad (53)$$
which I will adopt here as well. Then the mean value equations can be written as
$$\widehat{W}_{N}(u)=\widehat{V}_{N}(u)-\widehat{Z}_{N}(u)+\sum_{s=1}^{d-1}\widehat{b}_{s,N}(u)\sqrt{N}(\widehat{\beta}_{s,n_{N}}-\beta_{s,0})+\sum_{m=1}^{n_{N}}\widehat{c}_{m,N}(u)\sqrt{N}(\widehat{\delta}_{m,n_{N}}-\delta_{0,m}), \qquad (54)$$
where
$$\widehat{W}_{N}(u)=\sqrt{N}\sum_{i=1}^{d-1}\Big(\partial\widehat{Q}_{N}(\beta,\delta)/\partial\beta_{i}\big|_{(\beta,\delta)=(\widehat{\beta}_{n_{N}},\widehat{\delta}_{n_{N}})}\Big)\eta_{i}(u)+\sqrt{N}\sum_{k=1}^{n_{N}}\Big(\partial\widehat{Q}_{N}(\beta,\delta)/\partial\delta_{k}\big|_{(\beta,\delta)=(\widehat{\beta}_{n_{N}},\widehat{\delta}_{n_{N}})}\Big)\eta_{d-1+k}(u),$$
$$\widehat{V}_{N}(u)=\sqrt{N}\sum_{i=1}^{d-1}\Big(\partial\widehat{Q}_{N}(\beta,\delta)/\partial\beta_{i}\big|_{(\beta,\delta)=(\beta_{0},\pi_{n_{N}}\delta_{0})}-\partial\widehat{Q}_{N}(\beta,\delta)/\partial\beta_{i}\big|_{(\beta,\delta)=(\beta_{0},\delta_{0})}\Big)\eta_{i}(u)$$
$$+\sqrt{N}\sum_{k=1}^{n_{N}}\Big(\partial\widehat{Q}_{N}(\beta,\delta)/\partial\delta_{k}\big|_{(\beta,\delta)=(\beta_{0},\pi_{n_{N}}\delta_{0})}-\partial\widehat{Q}_{N}(\beta,\delta)/\partial\delta_{k}\big|_{(\beta,\delta)=(\beta_{0},\delta_{0})}\Big)\eta_{d-1+k}(u),$$
$$\widehat{Z}_{N}(u)=-\sqrt{N}\sum_{i=1}^{d-1}\Big(\partial\widehat{Q}_{N}(\beta,\delta)/\partial\beta_{i}\big|_{(\beta,\delta)=(\beta_{0},\delta_{0})}\Big)\eta_{i}(u)-\sqrt{N}\sum_{k=1}^{n_{N}}\Big(\partial\widehat{Q}_{N}(\beta,\delta)/\partial\delta_{k}\big|_{(\beta,\delta)=(\beta_{0},\delta_{0})}\Big)\eta_{d-1+k}(u),$$
$$\widehat{b}_{s,N}(u)=\sum_{i=1}^{d-1}\Big(\partial^{2}\widehat{Q}_{N}(\beta,\delta)/(\partial\beta_{i}\partial\beta_{s})\big|_{(\beta,\delta)=(\widetilde{\beta}_{n_{N}}^{(i)},\widetilde{\delta}_{n_{N}}^{(i)})}\Big)\eta_{i}(u)+\sum_{k=1}^{n_{N}}\Big(\partial^{2}\widehat{Q}_{N}(\beta,\delta)/(\partial\delta_{k}\partial\beta_{s})\big|_{(\beta,\delta)=(\widehat{\beta}_{n_{N}}^{(k)},\widehat{\delta}_{n_{N}}^{(k)})}\Big)\eta_{d-1+k}(u),\quad s=1,2,\ldots,d-1,$$
$$\widehat{c}_{m,N}(u)=\sum_{i=1}^{d-1}\Big(\partial^{2}\widehat{Q}_{N}(\beta,\delta)/(\partial\beta_{i}\partial\delta_{m})\big|_{(\beta,\delta)=(\widetilde{\beta}_{n_{N}}^{(i)},\widetilde{\delta}_{n_{N}}^{(i)})}\Big)\eta_{i}(u)+\sum_{k=1}^{n_{N}}\Big(\partial^{2}\widehat{Q}_{N}(\beta,\delta)/(\partial\delta_{k}\partial\delta_{m})\big|_{(\beta,\delta)=(\widehat{\beta}_{n_{N}}^{(k)},\widehat{\delta}_{n_{N}}^{(k)})}\Big)\eta_{d-1+k}(u),\quad m=1,2,\ldots,n_{N}.$$
It follows straightforwardly from (51) and (52) that
$$\widehat{W}_{N}(u)=o_{p}(1)$$
uniformly in $u\in[0,1]$. Moreover, in the Appendix it will be shown that:

Lemma 9.1. With $n_{N}\propto N^{1/6}$ or faster, $\widehat{V}_{N}(u)=o_{p}(1)$ uniformly in $u\in[0,1]$,

where here and in the lemmas below in this section it is assumed that the conditions of Theorem 9.2 hold and the weight functions $\eta_{k}$ are defined by
(53). Then (54) reads
$$o_{p}(1)=-\widehat{Z}_{N}(u)+\sum_{s=1}^{d-1}\widehat{b}_{s,N}(u)\sqrt{N}(\widehat{\beta}_{s,n_{N}}-\beta_{s,0})+\sum_{m=1}^{n_{N}}\widehat{c}_{m,N}(u)\sqrt{N}(\widehat{\delta}_{m,n_{N}}-\delta_{0,m})$$
$$=-\widehat{Z}_{N}(u)+\widehat{b}_{N}(u)'\sqrt{N}(\widehat{\beta}_{n_{N}}-\beta_{0})+\sum_{m=1}^{n_{N}}\widehat{c}_{m,N}(u)\sqrt{N}(\widehat{\delta}_{m,n_{N}}-\delta_{0,m}), \qquad (55)$$
where $\widehat{b}_{N}(u)'=(\widehat{b}_{1,N}(u),\ldots,\widehat{b}_{d-1,N}(u))$. In order to get rid of the $\widehat{c}_{m,N}(u)$'s in (55), project each $\widehat{b}_{s,N}(u)$ on the space spanned by $\{\widehat{c}_{m,N}(u)\}_{m=1}^{n_{N}}$, and denote the projection residuals involved by $\widehat{a}_{s,N}(u)$. Then $\widehat{a}_{N}(u)=(\widehat{a}_{1,N}(u),\ldots,\widehat{a}_{d-1,N}(u))'$ is the vector of residuals of the projection of $\widehat{b}_{N}(u)$ on the Hilbert space spanned by $\{\widehat{c}_{m,N}(u)\}_{m=1}^{n_{N}}$, so that
$$\int_{0}^{1}\widehat{a}_{N}(u)\widehat{c}_{m,N}(u)\,du=0,$$
$$\int_{0}^{1}\widehat{a}_{N}(u)\widehat{b}_{N}(u)'\,du=\int_{0}^{1}\widehat{a}_{N}(u)\widehat{a}_{N}(u)'\,du,$$
$$\int_{0}^{1}\widehat{a}_{N}(u)'\widehat{a}_{N}(u)\,du\le\int_{0}^{1}\widehat{b}_{N}(u)'\widehat{b}_{N}(u)\,du,$$
hence
$$\int_{0}^{1}\widehat{a}_{N}(u)\,du\times o_{p}(1)=\int_{0}^{1}\widehat{a}_{N}(u)\widehat{Z}_{N}(u)\,du+\Big(\int_{0}^{1}\widehat{a}_{N}(u)\widehat{a}_{N}(u)'\,du\Big)\sqrt{N}(\widehat{\beta}_{n_{N}}-\beta_{0}). \qquad (56)$$
Next, it will be shown in the Appendix that, similarly to Bierens (2014b), the following results hold.

Lemma 9.2. For $s=1,2,\ldots,d-1$,
$$\text{plim}_{N\to\infty}\sup_{0\le u\le 1}|\widehat{b}_{s,N}(u)-b_{s}(u)|=0,$$
where
$$b_{s}(u)=\sum_{i=1}^{d-1}\Big(\partial^{2}Q(\beta,\delta)/(\partial\beta_{i}\partial\beta_{s})\big|_{\beta=\beta_{0},\delta=\delta_{0}}\Big)\eta_{i}(u)+\sum_{k=1}^{\infty}\Big(\partial^{2}Q(\beta,\delta)/(\partial\delta_{k}\partial\beta_{s})\big|_{\beta=\beta_{0},\delta=\delta_{0}}\Big)\eta_{d-1+k}(u),$$
and for $m\in\mathbb{N}$,
$$\text{plim}_{N\to\infty}\sup_{0\le u\le 1}|\widehat{c}_{m,N}(u)-c_{m}(u)|=0,$$
where
$$c_{m}(u)=\sum_{i=1}^{d-1}\Big(\partial^{2}Q(\beta,\delta)/(\partial\beta_{i}\partial\delta_{m})\big|_{\beta=\beta_{0},\delta=\delta_{0}}\Big)\eta_{i}(u)+\sum_{k=1}^{\infty}\Big(\partial^{2}Q(\beta,\delta)/(\partial\delta_{k}\partial\delta_{m})\big|_{\beta=\beta_{0},\delta=\delta_{0}}\Big)\eta_{d-1+k}(u).$$
Hence, the residual $\widehat{a}_{s,N}(u)$ of the projection of $\widehat{b}_{s,N}(u)$ on the space spanned by $\{\widehat{c}_{m,N}(u)\}_{m=1}^{n_{N}}$ satisfies
$$\text{plim}_{N\to\infty}\int_{0}^{1}(\widehat{a}_{s,N}(u)-a_{s}(u))^{2}\,du=0, \qquad (57)$$
where $a_{s}(u)$ is the residual of the projection of $b_{s}(u)$ on the space spanned by $\{c_{m}(u)\}_{m=1}^{\infty}$.
Since $a_{s}(u)\in L^{2}(0,1)$, hence $\int_{0}^{1}a(u)'a(u)\,du<\infty$, where
$$a(u)=(a_{1}(u),\ldots,a_{d-1}(u))',$$
it follows that $\int_{0}^{1}\widehat{a}_{N}(u)\,du=O_{p}(1)$, hence $\int_{0}^{1}\widehat{a}_{N}(u)\,du\times o_{p}(1)=o_{p}(1)$ in (56). Moreover, given that

Assumption 9.5. $\det\Big(\int_{0}^{1}a(u)a(u)'\,du\Big)>0$,

it follows that
$$\text{plim}_{N\to\infty}\Big(\int_{0}^{1}\widehat{a}_{N}(u)\widehat{a}_{N}(u)'\,du\Big)^{-1}=\Big(\int_{0}^{1}a(u)a(u)'\,du\Big)^{-1}.$$
Furthermore, the following result holds.

Lemma 9.3. $\widehat{Z}_{N}(u)$ converges weakly to a zero-mean Gaussian process $Z(u)$, with covariance function
$$\Gamma(u_{1},u_{2})=4\sum_{k=1}^{\infty}\sum_{m=1}^{\infty}E[U^{2}\rho_{k}(X)\rho_{m}(X)]\eta_{k}(u_{1})\eta_{m}(u_{2}),$$
where
$$\rho_{i}(X)=f(X_{1}+\beta_{0}'X_{2}|\delta_{0})X_{i,2},\quad i=1,2,\ldots,d-1,$$
$$\rho_{d-1+k}(X)=\nabla_{k}F(X_{1}+\beta_{0}'X_{2}|\delta_{0}),\quad k\in\mathbb{N}. \qquad (58)$$
Then (56) now reads
$$\sqrt{N}(\widehat{\beta}_{n_{N}}-\beta_{0})=\Big(\int_{0}^{1}a(u)a(u)'\,du\Big)^{-1}\int_{0}^{1}a(u)Z(u)\,du+o_{p}(1).$$
Hence, since $\int_{0}^{1}a(u)Z(u)\,du$ is zero-mean normal, the following result holds.

Theorem 9.3. Let the conditions of Theorem 9.2 hold. Choose $n_{N}\propto N^{1/6}$ or faster and choose the weight functions $\eta_{k}$ as in (53). Then under Assumption 9.5,
$$\sqrt{N}(\widehat{\beta}_{n_{N}}-\beta_{0})\overset{d}{\to}N_{d-1}(0,\Sigma),$$
where
$$\Sigma=\Big(\int_{0}^{1}a(u)a(u)'\,du\Big)^{-1}\int_{0}^{1}\int_{0}^{1}a(u_{1})\Gamma(u_{1},u_{2})a(u_{2})'\,du_{1}du_{2}\times\Big(\int_{0}^{1}a(u)a(u)'\,du\Big)^{-1}.$$
This result depends crucially on Assumption 9.5, i.e.,
$$\det\Big(\int_{0}^{1}a(u)a(u)'\,du\Big)>0.$$
If the latter does not hold then for some nonzero vector $\gamma\in\mathbb{R}^{d-1}$, $\gamma'b(u)\in\overline{\text{span}}(\{c_{k}(u)\}_{k=1}^{\infty})$, the latter being the Hilbert space spanned by the sequence $\{c_{k}(u)\}_{k=1}^{\infty}$, which is a subspace of the Hilbert space $L^{2}(0,1)$.
Let $a_{n}(u)$ be the residual of the projection of $b(u)$ on $\overline{\text{span}}(\{c_{k}(u)\}_{k=1}^{n})$. Then by a well-known projection result in $L^{2}(0,1)$ we have
$$\lim_{n\to\infty}\int_{0}^{1}(a_{n}(u)-a(u))'(a_{n}(u)-a(u))\,du=0,$$
hence
$$\lim_{n\to\infty}\int_{0}^{1}a_{n}(u)a_{n}(u)'\,du=\int_{0}^{1}a(u)a(u)'\,du.$$
Therefore, a necessary condition for Assumption 9.5 is that for all $n\in\mathbb{N}$, $\det\Big(\int_{0}^{1}a_{n}(u)a_{n}(u)'\,du\Big)>0$. As shown in Bierens (2014b), the latter condition holds if and only if for all $n\in\mathbb{N}$ the $(d-1+n)\times(d-1+n)$ matrix of second derivatives with respect to $\beta$ and $\delta(n)=(\delta_{1},\ldots,\delta_{n})'$ of $Q(\beta,\delta)$ in $(\beta_{0},\delta_{0})$ is nonsingular. Thus, denoting
$$B_{d-1+n}=\begin{pmatrix}\frac{\partial^{2}Q(\beta,\delta)}{\partial\beta\partial\beta'}&\frac{\partial^{2}Q(\beta,\delta)}{\partial\beta\partial\delta(n)'}\\[4pt]\frac{\partial^{2}Q(\beta,\delta)}{\partial\delta(n)\partial\beta'}&\frac{\partial^{2}Q(\beta,\delta)}{\partial\delta(n)\partial\delta(n)'}\end{pmatrix}\Bigg|_{(\beta,\delta)=(\beta_{0},\delta_{0})},$$
it follows that:

Lemma 9.4. Assumption 9.5 implies that $\det(B_{d-1+n})>0$ for all $n\in\mathbb{N}$.
Finally, it follows now from Bierens (2014b, Theorem 6.2) that:

Theorem 9.4. The variance matrix $\Sigma$ in Theorem 9.3 takes the form $\Sigma=\lim_{n\to\infty}\Sigma_{n}$, where
$$\Sigma_{n}=(I_{d-1},O_{d-1,n})B_{d-1+n}^{-1}C_{d-1+n}B_{d-1+n}^{-1}\begin{pmatrix}I_{d-1}\\O_{n,d-1}\end{pmatrix},$$
with
$$C_{d-1+n}=4E\Bigg[U^{2}\begin{pmatrix}\rho_{1}(X)\\\vdots\\\rho_{d-1+n}(X)\end{pmatrix}(\rho_{1}(X),\ldots,\rho_{d-1+n}(X))\Bigg],$$
where the $\rho_{k}(X)$'s are defined in (58).
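Computationally, $\Sigma_{n}$ is just the upper-left $(d-1)\times(d-1)$ block of $B_{d-1+n}^{-1}C_{d-1+n}B_{d-1+n}^{-1}$; pre- and post-multiplying by the $(I,O)$ selection matrices is equivalent to plain slicing. A sketch (names are mine):

```python
import numpy as np

def sigma_n(B, C, d):
    """Sigma_n of Theorem 9.4: upper-left (d-1)x(d-1) block of B^{-1} C B^{-1}."""
    Binv = np.linalg.inv(B)
    S = Binv @ C @ Binv
    return S[:d - 1, :d - 1]
```

Equivalently, `J @ S @ J.T` with `J = np.hstack([np.eye(d - 1), np.zeros((d - 1, n))])`, which is the literal translation of the selection matrices above.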
Note that $C_{d-1+n}$ is the variance matrix of the $(d-1+n)$-dimensional score vector
$$\sqrt{N}\begin{pmatrix}\partial\widehat{Q}_{N}(\beta,\delta)/\partial\beta'\big|_{(\beta,\delta)=(\beta_{0},\delta_{0})}\\ \partial\widehat{Q}_{N}(\beta,\delta)/\partial\delta_{1}\big|_{(\beta,\delta)=(\beta_{0},\delta_{0})}\\ \vdots\\ \partial\widehat{Q}_{N}(\beta,\delta)/\partial\delta_{n}\big|_{(\beta,\delta)=(\beta_{0},\delta_{0})}\end{pmatrix}.$$
Moreover, as I have shown in Bierens (2014b), the existence of the limit $\Sigma=\lim_{n\to\infty}\Sigma_{n}$ is implied by Assumption 9.5, which is plausible but difficult, if not impossible, to verify. Finally, in order to use the results in Theorems 9.3 and 9.4 for inference on $\beta_{0}$ we need a consistent estimator of $\Sigma$, which can be derived as follows. Suppose at first that $\delta_{0}$ is finite-dimensional, so that $\delta_{0}=\pi_{n}\delta_{0}$ for some fixed $n\in\mathbb{N}$. Then the nonlinear least squares (NLLS) problem involved becomes fully parametric, hence under standard NLLS conditions,$^{6}$
$$\sqrt{N}(\widehat{\beta}_{n}-\beta_{0})\overset{d}{\to}N_{d-1}(0,\Sigma_{n}),$$
where $\Sigma_{n}$ is the same as in Theorem 9.4. In this case $\Sigma_{n}$ can be consistently estimated by
$$\widehat{\Sigma}_{n}=(I_{d-1},O_{d-1,n})\widehat{B}_{d-1+n}^{-1}\widehat{C}_{d-1+n}\widehat{B}_{d-1+n}^{-1}\begin{pmatrix}I_{d-1}\\O_{n,d-1}\end{pmatrix},$$
where $\widehat{B}_{d-1+n}$ and $\widehat{C}_{d-1+n}$ are the usual consistent estimators of $B_{d-1+n}$ and $C_{d-1+n}$, respectively, in the fixed-$n$ case. Then it is not hard to verify that, under the conditions of Theorems 9.3 and 9.4,
$$\text{plim}_{N\to\infty}\widehat{\Sigma}_{n_{N}}=\Sigma.$$
10 Concluding remarks
In this paper I have set forth conditions under which SNP densities on $[0,1]$ and $\mathbb{R}$ based on the cosine sequence, and their first, second and third derivatives, are uniformly convergent. Moreover, I have shown that without much loss
$^{6}$See for example Jennrich (1969).
of generality the infinite-dimensional parameter involved can be confined to a compact infinite-dimensional parameter space. The usefulness of these results has been demonstrated by an application to the SNP fractional index regression model.
Note that these results are not directly applicable to SNP maximum likelihood models, like the SNP discrete choice model considered in Bierens (2014b). In the latter case it was assumed that, with $G$ the logistic distribution function with density $g$, and $f_{0}$ the density of the c.d.f. $F_{0}$ in the SNP discrete choice model $\Pr[Y=1|X]=F_{0}(X'\theta_{0})$, the density $h_{0}(u)=f_{0}(G^{-1}(u))/g(G^{-1}(u))$ satisfies $h_{0}(0)>0$ and $h_{0}(1)>0$. The latter conditions enabled me to derive the asymptotic normality of the sieve estimator of $\theta_{0}$, similarly to Theorems 9.3 and 9.4. However, under the uniform convergence conditions in the present paper, with $G$ the c.d.f. of the standard Cauchy distribution, we have $h_{0}(0)=h_{0}(1)=0$, which would make this asymptotic normality proof very complicated. See Bierens (2014b, Remark C). On the other hand, the discrete choice model involved can be estimated by sieve nonlinear least squares under similar conditions as in section 9, and with similar results as in Theorems 9.3 and 9.4, but at the expense of a loss of efficiency.
11 Appendix: Proofs
11.1 Lemma 4.1
For $K\in\mathbb{N}$ and $c>1/2$ we have, by Lyapunov's inequality,
$$\frac{\sum_{m=n+1}^{n+K}m^{-c}|\alpha_{m}|}{\sum_{k=n+1}^{n+K}k^{-2c}}=\sum_{m=n+1}^{n+K}\Bigg(\frac{m^{-2c}}{\sum_{k=n+1}^{n+K}k^{-2c}}\Bigg)m^{c}|\alpha_{m}|\le\sqrt{\sum_{m=n+1}^{n+K}\Bigg(\frac{m^{-2c}}{\sum_{k=n+1}^{n+K}k^{-2c}}\Bigg)m^{2c}\alpha_{m}^{2}}=\frac{\sqrt{\sum_{m=n+1}^{n+K}\alpha_{m}^{2}}}{\sqrt{\sum_{k=n+1}^{n+K}k^{-2c}}}.$$
Letting $K\to\infty$ it follows that
$$\sum_{m=n+1}^{\infty}m^{-c}|\alpha_{m}|\le\sqrt{\sum_{m=n+1}^{\infty}\alpha_{m}^{2}}\sqrt{\sum_{k=n+1}^{\infty}k^{-2c}}.$$
Since $\sum_{k=n+1}^{\infty}k^{-2c}\le\sum_{k=n+1}^{\infty}\int_{k-1}^{k}x^{-2c}\,dx=\int_{n}^{\infty}x^{-2c}\,dx=\frac{1}{2c-1}n^{1-2c}$ and $\sum_{m=n+1}^{\infty}\alpha_{m}^{2}=o(1)$, Lemma 4.1 follows.
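The resulting tail bound $\sum_{m>n}m^{-c}|\alpha_{m}|\le\sqrt{\sum_{m>n}\alpha_{m}^{2}}\cdot\sqrt{\sum_{k>n}k^{-2c}}$ is a Cauchy-Schwarz-type inequality, and it is easy to sanity-check numerically on a finite truncation. The following sketch (with arbitrary test values; it is an illustration, not part of the proof) does exactly that:

```python
import numpy as np

def tail_bound_holds(alpha, n, c, K=100000):
    """Check the displayed inequality on the finite range m = n+1,...,n+K,
    where alpha[0] stands for alpha_1."""
    a = np.asarray(alpha, dtype=float)[n:n + K]
    m = np.arange(n + 1, n + 1 + len(a), dtype=float)
    lhs = np.sum(m**(-c) * np.abs(a))
    rhs = np.sqrt(np.sum(a**2)) * np.sqrt(np.sum(m**(-2.0 * c)))
    return bool(lhs <= rhs + 1e-12)
```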
11.2 Lemma 6.1
There exists an $x_{0}>0$ such that either $\psi(x)>0$ for all $x>x_{0}$ or $\psi(x)<0$ for all $x>x_{0}$. Without loss of generality we may assume that the former case applies. Thus, $\psi(x)>0$ for all $x>x_{0}$. Now suppose that for some $M\in(0,\infty)$, $\lim_{y\to\infty}\sup_{x\ge y}x\psi(x)\ge M$. Then there exists a $y_{0}>x_{0}$ such that for all $x\ge y_{0}$, $x\psi(x)>M/2$, which implies that $\int_{y_{0}}^{\infty}\psi(x)\,dx\ge(M/2)\int_{y_{0}}^{\infty}x^{-1}\,dx=\infty$. However, the latter contradicts $\int_{-\infty}^{\infty}|\psi(x)|\,dx<\infty$. Thus, $\lim_{y\to\infty}\sup_{x\ge y}x\psi(x)=0$, which implies that $\lim_{x\to\infty}x\psi(x)=0$. By a similar argument it can be shown that $\lim_{x\to-\infty}x\psi(x)=0$.
11.3 Theorem 8.1
The result of Lemma 3.1 in Bierens (2017, p. 631), which is in essence part (b) of Theorem 8.1, hinges on Lemma A.1 in Bierens (2008), reprinted in Bierens (2017, Chapter 7, p. 455). In particular, note that
$$\Delta_{\ell}(M)\subset\mathop{\times}_{m=1}^{\infty}[-M\,m^{-\ell},M\,m^{-\ell}]=\Delta_{\ell}^{\times}(M), \qquad (59)$$
say, where by Lemma A.1 in Bierens (2008) the latter set is compact with respect to the pseudo-Euclidean metric $||\delta_{1}-\delta_{2}||$, provided that $\sum_{m=1}^{\infty}m^{-2\ell}<\infty$. The latter condition holds if and only if $\ell>0.5$.
In Lemma 3.1 in Bierens (2017, p. 631) I have used the fact that a set is compact if and only if it is totally bounded and complete. See Royden (1968, Proposition 15, p. 164). Since $\Delta_{\ell}^{\times}(M)$ is compact and therefore totally bounded, it follows trivially from (59) that $\Delta_{\ell}(M)$ is totally bounded. Completeness of $\Delta_{\ell}(M)$ means that every Cauchy sequence in $\Delta_{\ell}(M)$ has a limit in $\Delta_{\ell}(M)$. This completeness proof is easy. See Bierens (2017, p. 632). Hence, $\Delta_{\ell}(M)$ is compact with respect to the pseudo-Euclidean metric $||\delta_{1}-\delta_{2}||$, which proves part (b) of Theorem 8.1.
To prove part (a), I will employ another criterion for compactness, namely that a set is compact if and only if it is sequentially compact. See Royden (1968, Corollary 14, p. 163). Sequential compactness means that every sequence in the set involved has a convergent subsequence. Since by part (b) of Theorem 8.1 $\Delta_{\ell}(M)$ is compact with respect to the pseudo-Euclidean metric $||\delta_{1}-\delta_{2}||$, it follows therefore that for an arbitrary sequence $\delta_{n}=\{\delta_{n,m}\}_{m=1}^{\infty}\in\Delta_{\ell}(M)$ there exist a subsequence $n_{k}$ and an element $\delta=\{\delta_{m}\}_{m=1}^{\infty}\in\Delta_{\ell}(M)$ such that $\lim_{k\to\infty}||\delta_{n_{k}}-\delta||=0$, which trivially implies that
$$\lim_{k\to\infty}\sup_{m\in\mathbb{N}}|\delta_{n_{k},m}-\delta_{m}|=0.$$
To show that $\lim_{k\to\infty}||\delta_{n_{k}}-\delta||_{\ell}=0$ as well, note that for any $L\in\mathbb{N}$ we have
$$\limsup_{k\to\infty}\sum_{m=1}^{\infty}m^{\ell}|\delta_{n_{k},m}-\delta_{m}|\le\limsup_{k\to\infty}\sum_{m=1}^{L}m^{\ell}|\delta_{n_{k},m}-\delta_{m}|+\limsup_{k\to\infty}\sum_{m=1+L}^{\infty}m^{\ell}|\delta_{n_{k},m}-\delta_{m}|=\limsup_{k\to\infty}\sum_{m=1+L}^{\infty}m^{\ell}|\delta_{n_{k},m}-\delta_{m}|.$$
Clearly, the latter "lim sup" is invariant to $L$. Next, denote $S_{k}(L)=\sum_{m=1+L}^{\infty}m^{\ell}|\delta_{n_{k},m}-\delta_{m}|$ and let $\limsup_{k\to\infty}S_{k}(L)=\eta$, which does not depend on $L$. Then by the definition of "lim sup",
$$\eta=\inf_{L\in\mathbb{N}}\Big(\limsup_{k\to\infty}S_{k}(L)\Big)=\inf_{L\in\mathbb{N}}\inf_{s\in\mathbb{N}}\sup_{k\ge s}S_{k}(L)\le\inf_{L\in\mathbb{N}}\sup_{k\ge s}S_{k}(L)\ \text{for all }s\in\mathbb{N}.$$
Since $\sup_{k\ge s}S_{k}(L)$ is decreasing in $L$ we have
$$\inf_{L\in\mathbb{N}}\sup_{k\ge s}S_{k}(L)=\sup_{k\ge s}S_{k}(\infty)=\sup_{k\ge s}\lim_{L\to\infty}S_{k}(L),$$
hence, for an arbitrary $s\in\mathbb{N}$,
$$\eta\le\sup_{k\ge s}\lim_{L\to\infty}S_{k}(L)=\sup_{k\ge s}\Big(\lim_{L\to\infty}\sum_{m=1+L}^{\infty}m^{\ell}|\delta_{n_{k},m}-\delta_{m}|\Big)=0.$$
The latter follows from the fact that $\sum_{m=1}^{\infty}m^{\ell}|\delta_{n_{k},m}-\delta_{m}|\le 2M$. Consequently, $\lim_{k\to\infty}||\delta_{n_{k}}-\delta||_{\ell}=0$ as well. This completes the proof of part (a) of Theorem 8.1, including the correction of the proof of Lemma 3.2 in Bierens (2017, pp. 632-633).
11.4 Theorem 9.2
First note that, by Theorem 7.2, for each $\beta\in B$, $F(X_{1}+\beta'X_{2}|\delta)$ is a.s. continuous in $\delta\in\Delta_{3}(M)$ with respect to the metric $||\delta_{1}-\delta_{2}||_{3}$, whereas, trivially, for each $\delta\in\Delta_{3}(M)$, $F(X_{1}+\beta'X_{2}|\delta)$ is a.s. continuous in $\beta$ with respect to the Euclidean metric $||\beta_{1}-\beta_{2}||$. Consequently, $F(X_{1}+\beta'X_{2}|\delta)$ is a.s. continuous in $(\beta,\delta)\in B\times\Delta_{3}(M)$ with respect to the combined metric $||\beta_{1}-\beta_{2}||+||\delta_{1}-\delta_{2}||_{3}$, for example, and so is $(Y-F(X_{1}+\beta'X_{2}|\delta))^{2}$. It follows now straightforwardly from Jennrich's (1969) uniform strong law of large numbers that $\sup_{(\beta,\delta)\in B\times\Delta_{3}(M)}\big|\widehat{Q}_{N}(\beta,\delta)-Q(\beta,\delta)\big|\overset{a.s.}{\to}0$ as $N\to\infty$, and therefore
$$\sup_{(\beta,\delta)\in B\times\Delta_{3,n_{N}}(M)}\big|\widehat{Q}_{N}(\beta,\delta)-Q(\beta,\delta)\big|\overset{a.s.}{\to}0 \qquad (60)$$
as well. Moreover, by the continuity of $Q(\beta,\delta)$ we have
$$\lim_{N\to\infty}Q(\beta_{0},\pi_{n_{N}}\delta_{0})=Q(\beta_{0},\delta_{0}). \qquad (61)$$
Next, observe that
$$0\le Q(\widehat{\beta}_{n_{N}},\widehat{\delta}_{n_{N}})-Q(\beta_{0},\delta_{0})=Q(\widehat{\beta}_{n_{N}},\widehat{\delta}_{n_{N}})-\widehat{Q}_{N}(\widehat{\beta}_{n_{N}},\widehat{\delta}_{n_{N}})+\widehat{Q}_{N}(\widehat{\beta}_{n_{N}},\widehat{\delta}_{n_{N}})-Q(\beta_{0},\delta_{0})$$
$$\le\sup_{(\beta,\delta)\in B\times\Delta_{3,n_{N}}(M)}\big|\widehat{Q}_{N}(\beta,\delta)-Q(\beta,\delta)\big|+\widehat{Q}_{N}(\beta_{0},\pi_{n_{N}}\delta_{0})-Q(\beta_{0},\delta_{0})$$
$$\le 2\sup_{(\beta,\delta)\in B\times\Delta_{3,n_{N}}(M)}\big|\widehat{Q}_{N}(\beta,\delta)-Q(\beta,\delta)\big|+Q(\beta_{0},\pi_{n_{N}}\delta_{0})-Q(\beta_{0},\delta_{0})\overset{a.s.}{\to}0,$$
where the last result follows from (60) and (61), hence
$$Q(\widehat{\beta}_{n_{N}},\widehat{\delta}_{n_{N}})\overset{a.s.}{\to}Q(\beta_{0},\delta_{0}). \qquad (62)$$
Again, by the continuity of $Q(\beta,\delta)$ and the uniqueness of $(\beta_{0},\delta_{0})$, (62) implies, similarly to standard nonlinear regression, that $\widehat{\beta}_{n_{N}}\overset{a.s.}{\to}\beta_{0}$ and $||\widehat{\delta}_{n_{N}}-\delta_{0}||_{3}\overset{a.s.}{\to}0$. The uniform a.s. convergence results for $F_{0}$, $f_{0}$ and $f_{0}'$ follow now from Theorem 7.2.
11.5 Lemma 9.1
Note that
$$\partial\widehat{Q}_{N}(\beta,\delta)/\partial\beta_{i}=-2\frac{1}{N}\sum_{j=1}^{N}(Y_{j}-F(X_{1,j}+\beta'X_{2,j}|\delta))f(X_{1,j}+\beta'X_{2,j}|\delta)X_{i,2,j},\quad i=1,2,\ldots,d-1, \qquad (63)$$
where $X_{i,2,j}$ is component $i$ of $X_{2,j}$, and, using the notation $\nabla_{k}=\partial/\partial\delta_{k}$ as in (41),
$$\partial\widehat{Q}_{N}(\beta,\delta)/\partial\delta_{k}=-2\frac{1}{N}\sum_{j=1}^{N}(Y_{j}-F(X_{1,j}+\beta'X_{2,j}|\delta))\nabla_{k}F(X_{1,j}+\beta'X_{2,j}|\delta),\quad k=1,2,\ldots,n_{N}. \qquad (64)$$
Hence, by (53),
$$E\Big[\sup_{0\le u\le 1}|\widehat{V}_{N}(u)|\Big]\le 2\sqrt{2}\Big(\sum_{i=1}^{d-1}2^{-i}\Big)\sqrt{N}\times E\big[\big|(Y-F(X_{1}+\beta_{0}'X_{2}|\pi_{n_{N}}\delta_{0}))f(X_{1}+\beta_{0}'X_{2}|\pi_{n_{N}}\delta_{0})-(Y-F(X_{1}+\beta_{0}'X_{2}|\delta_{0}))f(X_{1}+\beta_{0}'X_{2}|\delta_{0})\big|\cdot||X_{2}||\big]$$
$$+2\sqrt{2}\,2^{-d+1}\sqrt{N}\sum_{k=1}^{n_{N}}2^{-k}\times E\big[\big|(Y-F(X_{1}+\beta_{0}'X_{2}|\pi_{n_{N}}\delta_{0}))\nabla_{k}F(X_{1}+\beta_{0}'X_{2}|\pi_{n_{N}}\delta_{0})-(Y-F(X_{1}+\beta_{0}'X_{2}|\delta_{0}))\nabla_{k}F(X_{1}+\beta_{0}'X_{2}|\delta_{0})\big|\big].$$
Moreover,
$$E\big[\big|(Y-F(X_{1}+\beta_{0}'X_{2}|\pi_{n_{N}}\delta_{0}))f(X_{1}+\beta_{0}'X_{2}|\pi_{n_{N}}\delta_{0})-(Y-F(X_{1}+\beta_{0}'X_{2}|\delta_{0}))f(X_{1}+\beta_{0}'X_{2}|\delta_{0})\big|\cdot||X_{2}||\big]$$
$$\le 2E[||X_{2}||]\cdot\sup_{x\in\mathbb{R}}\big|f(x|\pi_{n_{N}}\delta_{0})-f(x|\delta_{0})\big|+E[||X_{2}||]\sup_{x\in\mathbb{R}}f(x|\delta_{0})\cdot\sup_{x\in\mathbb{R}}\big|F(x|\delta_{0})-F(x|\pi_{n_{N}}\delta_{0})\big|$$
$$\le E[||X_{2}||]\Big(2+\sup_{x\in\mathbb{R}}f(x|\delta_{0})\Big)\sup_{x\in\mathbb{R}}\big|f(x|\pi_{n_{N}}\delta_{0})-f(x|\delta_{0})\big|=o(n_{N}^{-7/2}),$$
where the latter follows from part (39) of Theorem 7.1. Similarly,
$$E\big[\big|(Y-F(X_{1}+\beta_{0}'X_{2}|\pi_{n_{N}}\delta_{0}))\nabla_{k}F(X_{1}+\beta_{0}'X_{2}|\pi_{n_{N}}\delta_{0})-(Y-F(X_{1}+\beta_{0}'X_{2}|\delta_{0}))\nabla_{k}F(X_{1}+\beta_{0}'X_{2}|\delta_{0})\big|\big]$$
$$\le 2\sup_{x\in\mathbb{R}}\big|\nabla_{k}F(x|\pi_{n_{N}}\delta_{0})-\nabla_{k}F(x|\delta_{0})\big|+\sup_{x\in\mathbb{R}}\big|\nabla_{k}F(x|\delta_{0})\big|\cdot\sup_{x\in\mathbb{R}}\big|F(x|\delta_{0})-F(x|\pi_{n_{N}}\delta_{0})\big|$$
$$\le 2\sup_{x\in\mathbb{R}}\big|\nabla_{k}f(x|\pi_{n_{N}}\delta_{0})-\nabla_{k}f(x|\delta_{0})\big|+\sup_{x\in\mathbb{R}}\big|\nabla_{k}f(x|\delta_{0})\big|\cdot\sup_{x\in\mathbb{R}}\big|f(x|\delta_{0})-f(x|\pi_{n_{N}}\delta_{0})\big|$$
$$\le 2C_{0}\,||\delta_{0}-\pi_{n_{N}}\delta_{0}||_{0}+o(n_{N}^{-7/2}),$$
where the latter follows from Theorems 7.1 and 7.2. Hence,
$$E\Big[\sup_{0\le u\le 1}|\widehat{V}_{N}(u)|\Big]=o(n_{N}^{-7/2}\sqrt{N})+O\big(\sqrt{N}\,||\delta_{0}-\pi_{n_{N}}\delta_{0}||_{0}\big)=o(n_{N}^{-3}\sqrt{N}),$$
where the last equality follows from
$$||\delta_{0}-\pi_{n}\delta_{0}||_{0}=\sum_{m=n+1}^{\infty}|\delta_{0,m}|\le n^{-3}\sum_{m=n+1}^{\infty}m^{3}|\delta_{0,m}|=o(n^{-3}).$$
Thus, $\lim_{N\to\infty}E\big[\sup_{0\le u\le 1}|\widehat{V}_{N}(u)|\big]=0$ if $n_{N}\propto N^{1/6}$ or faster.
11.6 Lemma 9.2
Recall that for s = 1, ..., d− 1,
bbs,N(u) =d−1Xi=1
µ∂2 bQN(β, δ)/(∂βi∂βs)¯̄̄
(β,δ)=(eβ(i)nN ,eδ(i)nN )¶2−i√2 cos(iπu)
+
nNXk=1
µ∂2 bQN(β, δ)/(∂δk∂βs)¯̄̄
(β,δ)=(bβ(k)nN,bδ(k)nN
)
¶2−d+1−k
×√2 cos((d+ 1 + k)πu)where
∂2 bQN(β, δ)/∂βi∂βs= −2 1
N
NXj=1
(Yj − F (X1,j + β0X2,j|δ))f 0(X1,j + β0X2,j|δ)Xi,2,jXs,2,j
+21
N
NXj=1
(f(X1,j + β0X2,j|δ)))2Xi,2,jXs,2,j, i = 1, 2, ..., d− 1.
Hence, by the uniform strong law of large numbers of Jennrich (1969),
$$
\sup_{(\beta,\delta)\in B\times\Delta_3(M)}\left|\partial^2\widehat{Q}_N(\beta,\delta)/(\partial\beta_i\partial\beta_s)-E\left[\partial^2\widehat{Q}_N(\beta,\delta)/(\partial\beta_i\partial\beta_s)\right]\right|\stackrel{a.s.}{\to}0,
$$
and thus
$$
\partial^2\widehat{Q}_N(\beta,\delta)/(\partial\beta_i\partial\beta_s)\Big|_{(\beta,\delta)=(\widetilde{\beta}^{(i)}_{n_N},\widetilde{\delta}^{(i)}_{n_N})}\stackrel{a.s.}{\to}E\left[\partial^2\widehat{Q}_N(\beta,\delta)/(\partial\beta_i\partial\beta_s)\right]\Big|_{\beta=\beta_0,\delta=\delta^0},
$$
where
$$
\begin{aligned}
E\left[\partial^2\widehat{Q}_N(\beta,\delta)/(\partial\beta_i\partial\beta_s)\right]\Big|_{\beta=\beta_0,\delta=\delta^0}
&=-2E[U f'(X_1+\beta_0'X_2|\delta^0)X_{i,2}X_{s,2}]+2E\left[\left(f(X_1+\beta_0'X_2|\delta^0)\right)^2X_{i,2}X_{s,2}\right]\\
&=2E\left[\left(f(X_1+\beta_0'X_2|\delta^0)\right)^2X_{i,2}X_{s,2}\right]\\
&=\partial^2Q(\beta,\delta)/(\partial\beta_i\partial\beta_s)\big|_{\beta=\beta_0,\delta=\delta^0}.
\end{aligned}
$$
Note that the second equality follows from $E[U|X]=0$ a.s. and the last equality follows by dominated convergence.
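The step where $E[U|X]=0$ a.s. kills the cross term can be illustrated by simulation. The sketch below uses a hypothetical Bernoulli design with a logistic conditional mean (not the paper's SNP specification) and an arbitrary function $g(x)=x\cos(x)$; the sample analogue of $E[U\,g(X)]$ is then close to zero.

```python
# Monte Carlo illustration (hypothetical DGP, not the paper's model):
# if U = Y - E[Y|X], then E[U.g(X)] = 0 for any integrable g(X), which
# is why the term -2E[U f'(.)X_i X_s] vanishes at the true parameters.

import math
import random

random.seed(42)

n = 200_000
mean_ug = 0.0
for _ in range(n):
    x = random.gauss(0.0, 1.0)
    p = 1.0 / (1.0 + math.exp(-x))   # E[Y|X = x] for a Bernoulli outcome
    y = 1.0 if random.random() < p else 0.0
    u = y - p                        # conditional-mean-zero error
    mean_ug += u * x * math.cos(x) / n

# The sample analogue of E[U.g(X)] with g(x) = x cos(x) is near zero
# (its Monte Carlo standard error here is well below 0.01).
assert abs(mean_ug) < 0.01
```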
Similarly,
$$
\partial^2\widehat{Q}_N(\beta,\delta)/(\partial\delta_k\partial\beta_s)\Big|_{(\beta,\delta)=(\widehat{\beta}^{(k)}_{n_N},\widehat{\delta}^{(k)}_{n_N})}\stackrel{a.s.}{\to}E\left[\partial^2\widehat{Q}_N(\beta,\delta)/(\partial\delta_k\partial\beta_s)\right]\Big|_{\beta=\beta_0,\delta=\delta^0},
$$
where
$$
E\left[\partial^2\widehat{Q}_N(\beta,\delta)/(\partial\delta_k\partial\beta_s)\right]\Big|_{\beta=\beta_0,\delta=\delta^0}
=2E\left[(\nabla_kF(X_1+\beta_0'X_2|\delta^0))f(X_1+\beta_0'X_2|\delta^0)X_{s,2}\right]
=\partial^2Q(\beta,\delta)/(\partial\delta_k\partial\beta_s)\big|_{\beta=\beta_0,\delta=\delta^0},
$$
where the differentiation operator $\nabla_k$ is defined similarly to (41). Finally, it is easy to verify that, uniformly in $k\in\mathbb{N}$ and $s=1,2,\ldots,d-1$,
$$
\partial^2\widehat{Q}_N(\beta,\delta)/(\partial\delta_k\partial\beta_s)\Big|_{(\beta,\delta)=(\widehat{\beta}^{(k)}_{n_N},\widehat{\delta}^{(k)}_{n_N})}=O_p(1),
$$
hence
$$
\sum_{k=n+1}^{\infty}\left(\partial^2\widehat{Q}_N(\beta,\delta)/(\partial\delta_k\partial\beta_s)\Big|_{(\beta,\delta)=(\widehat{\beta}^{(k)}_{n_N},\widehat{\delta}^{(k)}_{n_N})}\right)2^{-d+1-k}=O_p\left(\sum_{k=n+1}^{\infty}2^{-k}\right)=O_p(2^{-n})=o_p(1).
$$
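The geometric-tail step above rests on the identity $\sum_{k=n+1}^{\infty}2^{-k}=2^{-n}$; a minimal numerical check:

```python
# Check of the geometric-tail identity behind the O_p(2^{-n}) bound:
# sum_{k=n+1}^infty 2^{-k} = 2^{-n}, so uniformly O_p(1) terms weighted
# by 2^{-d+1-k} leave an O_p(2^{-n}) remainder.

def geometric_tail(n, K=200):
    """Truncated tail sum sum_{k=n+1}^{K} 2^{-k} (the rest is < 2^{-K})."""
    return sum(2.0 ** -k for k in range(n + 1, K + 1))

for n in (1, 5, 20):
    assert abs(geometric_tail(n) - 2.0 ** -n) < 1e-12
```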
It follows now straightforwardly that for $s=1,2,\ldots,d-1$,
$$
\operatorname*{plim}_{N\to\infty}\sup_{0\le u\le1}\big|\widehat{b}_{s,N}(u)-b_s(u)\big|=0,
$$
where
$$
\begin{aligned}
b_s(u)={}&\sum_{i=1}^{d-1}\left(\partial^2Q(\beta,\delta)/(\partial\beta_i\partial\beta_s)\big|_{\beta=\beta_0,\delta=\delta^0}\right)2^{-i}\sqrt{2}\cos(i\pi u)\\
&+\sum_{k=1}^{\infty}\left(\partial^2Q(\beta,\delta)/(\partial\delta_k\partial\beta_s)\big|_{\beta=\beta_0,\delta=\delta^0}\right)2^{-d+1-k}\sqrt{2}\cos((d-1+k)\pi u).
\end{aligned}
$$
Along similar lines it can be shown that for $m\in\mathbb{N}$,
$$
\operatorname*{plim}_{N\to\infty}\sup_{0\le u\le1}\big|\widehat{c}_{m,N}(u)-c_m(u)\big|=0,
$$
where
$$
\begin{aligned}
c_m(u)={}&\sum_{i=1}^{d-1}\left(\partial^2Q(\beta,\delta)/(\partial\beta_i\partial\delta_m)\big|_{\beta=\beta_0,\delta=\delta^0}\right)2^{-i}\sqrt{2}\cos(i\pi u)\\
&+\sum_{k=1}^{\infty}\left(\partial^2Q(\beta,\delta)/(\partial\delta_k\partial\delta_m)\big|_{\beta=\beta_0,\delta=\delta^0}\right)2^{-d+1-k}\sqrt{2}\cos((d-1+k)\pi u).
\end{aligned}
$$
The result (57) now follows from Lemma B.1 in Bierens (2014b).
11.7 Lemma 9.3
Recall that
$$
\begin{aligned}
\widehat{Z}_N(u)={}&-\sqrt{N}\sum_{i=1}^{d-1}\left(\partial\widehat{Q}_N(\beta,\delta)/\partial\beta_i\Big|_{(\beta,\delta)=(\beta_0,\delta^0)}\right)2^{-i}\sqrt{2}\cos(i\pi u)\\
&-\sqrt{N}\sum_{k=1}^{n_N}\left(\partial\widehat{Q}_N(\beta,\delta)/\partial\delta_k\Big|_{(\beta,\delta)=(\beta_0,\delta^0)}\right)2^{-d+1-k}\sqrt{2}\cos((d-1+k)\pi u).
\end{aligned}
$$
By (63) and (64) we have
$$
\sqrt{N}\,\partial\widehat{Q}_N(\beta,\delta)/\partial\beta_i\Big|_{(\beta,\delta)=(\beta_0,\delta^0)}=-2\frac{1}{\sqrt{N}}\sum_{j=1}^{N}U_jf(X_{1,j}+\beta_0'X_{2,j}|\delta^0)X_{i,2,j},\quad i=1,2,\ldots,d-1,
$$
$$
\sqrt{N}\,\partial\widehat{Q}_N(\beta,\delta)/\partial\delta_k\Big|_{(\beta,\delta)=(\beta_0,\delta^0)}=-2\frac{1}{\sqrt{N}}\sum_{j=1}^{N}U_j\nabla_kF(X_{1,j}+\beta_0'X_{2,j}|\delta^0),\quad k=1,2,\ldots,n_N,
$$
where $U_j=Y_j-F(X_{1,j}+\beta_0'X_{2,j}|\delta^0)$. Recall that $E[U_j|X_j]=0$ a.s., where $X_j=(X_{1,j},X_{2,j}')'$.
Hence, denoting
$$
\rho_i(X_j)=f(X_{1,j}+\beta_0'X_{2,j}|\delta^0)X_{i,2,j},\quad i=1,2,\ldots,d-1,\qquad
\rho_{d-1+k}(X_j)=\nabla_kF(X_{1,j}+\beta_0'X_{2,j}|\delta^0),\quad k\in\mathbb{N},
$$
we can write
$$
\widehat{Z}_N(u)=\frac{2}{\sqrt{N}}\sum_{j=1}^{N}U_j\left(\sum_{k=1}^{d-1+n_N}\rho_k(X_j)2^{-k}\sqrt{2}\cos(k\pi u)\right).
$$
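The reindexing that merges the $\beta$-block and $\delta$-block into a single cosine sum can be verified with toy numbers (the $\rho_k$ values below are arbitrary placeholders, not model quantities): splitting the combined sum at $k=d-1$ must reproduce the two blocks, with the $\delta$-block carrying weights $2^{-d+1-k}=2^{-(d-1+k)}$ and frequencies $(d-1+k)\pi$.

```python
# Check of the reindexing (toy numbers): splitting the combined sum
# sum_{k=1}^{d-1+n} r_k 2^{-k} sqrt(2) cos(k pi u) at k = d-1 reproduces
# the beta block (i = 1..d-1) and the delta block (weights 2^{-d+1-k},
# frequencies (d-1+k) pi, k = 1..n).

import math

d, n = 4, 6
r = [0.0] + [math.sin(j) + 2.0 for j in range(1, d + n)]  # r[1..d-1+n], arbitrary

def combined(u):
    return sum(r[k] * 2.0 ** -k * math.sqrt(2) * math.cos(k * math.pi * u)
               for k in range(1, d - 1 + n + 1))

def split(u):
    beta_block = sum(r[i] * 2.0 ** -i * math.sqrt(2) * math.cos(i * math.pi * u)
                     for i in range(1, d))
    delta_block = sum(r[d - 1 + k] * 2.0 ** (-d + 1 - k) * math.sqrt(2)
                      * math.cos((d - 1 + k) * math.pi * u)
                      for k in range(1, n + 1))
    return beta_block + delta_block

for u in (0.0, 0.25, 0.7, 1.0):
    assert abs(combined(u) - split(u)) < 1e-12
```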
Next, let
$$
\widetilde{Z}_N(u)=\frac{2}{\sqrt{N}}\sum_{j=1}^{N}U_j\left(\sum_{k=1}^{\infty}\rho_k(X_j)2^{-k}\sqrt{2}\cos(k\pi u)\right).
$$
Then
$$
\begin{aligned}
E\left[\int_0^1\left(\widetilde{Z}_N(u)-\widehat{Z}_N(u)\right)^2du\right]
&=\frac{4}{N}\sum_{j=1}^{N}E\left[U_j^2\int_0^1\left(\sum_{k=d+n_N}^{\infty}\rho_k(X_j)2^{-k}\sqrt{2}\cos(k\pi u)\right)^2du\right]\\
&=\frac{4}{N}\sum_{j=1}^{N}E\left[U_j^2\sum_{k=d+n_N}^{\infty}\rho_k(X_j)^22^{-2k}\right]\\
&=4\sum_{k=d+n_N}^{\infty}E\left[U^2\rho_k(X)^2\right]2^{-2k}\to0\ \text{as}\ N\to\infty,
\end{aligned}
$$
hence $\widehat{Z}_N(u)\Rightarrow Z(u)$ if $\widetilde{Z}_N(u)\Rightarrow Z(u)$, where $\Rightarrow$ indicates weak convergence. But $\widetilde{Z}_N(u)\Rightarrow Z(u)$ follows from Billingsley (1968, Theorem 8.2), where, by the standard central limit theorem, $Z(u)$ is a zero-mean Gaussian process on $[0,1]$ with covariance function
$$
\Gamma(u_1,u_2)=E[Z(u_1)Z(u_2)]=E\left[\widetilde{Z}_N(u_1)\widetilde{Z}_N(u_2)\right]
=4\sum_{k=1}^{\infty}\sum_{m=1}^{\infty}E[U^2\rho_k(X)\rho_m(X)]2^{-k-m+1}\cos(k\pi u_1)\cos(m\pi u_2).
$$
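Both the $L^2$ truncation computation and the weight $2^{-k-m+1}=2\cdot2^{-k}\cdot2^{-m}$ in $\Gamma(u_1,u_2)$ rest on the orthonormality of $\{\sqrt{2}\cos(k\pi u)\}_{k\ge1}$ in $L^2[0,1]$. The following sketch checks this numerically by simple trapezoid quadrature, together with the resulting Parseval identity for a finite cosine sum.

```python
# Numerical check that {sqrt(2) cos(k pi u)}_{k>=1} is orthonormal on
# [0,1], which gives int_0^1 (sum_k a_k sqrt(2) cos(k pi u))^2 du
# = sum_k a_k^2 -- the step used for E[int (Ztilde - Zhat)^2 du] -- and
# the factor 2^{-k-m+1} = 2 * 2^{-k} * 2^{-m} in Gamma(u1, u2).

import math

def inner(k, m, grid=20000):
    """Trapezoid approximation of int_0^1 2 cos(k pi u) cos(m pi u) du."""
    h = 1.0 / grid
    s = 0.0
    for j in range(grid + 1):
        w = 0.5 if j in (0, grid) else 1.0
        s += w * 2.0 * math.cos(k * math.pi * j * h) * math.cos(m * math.pi * j * h)
    return s * h

for k in range(1, 5):
    for m in range(1, 5):
        target = 1.0 if k == m else 0.0
        assert abs(inner(k, m) - target) < 1e-6

# Parseval for a finite cosine sum with arbitrary coefficients a_k:
a = [0.7, -1.2, 0.3]
lhs = sum(a[k] * a[m] * inner(k + 1, m + 1)
          for k in range(3) for m in range(3))
assert abs(lhs - sum(x * x for x in a)) < 1e-5
```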
References

Bierens, H. J. (2008): "Semi-Nonparametric Interval-Censored Mixed Proportional Hazard Models: Identification and Consistency Results", Econometric Theory 24, 749-794.

Bierens, H. J. (2014a): "The Hilbert Space Theoretical Foundation of Semi-Nonparametric Modeling". Chapter 1 in: J. Racine, L. Su and A. Ullah (eds.), The Oxford Handbook of Applied Nonparametric and Semiparametric Econometrics and Statistics, Oxford University Press.

Bierens, H. J. (2014b): "Consistency and Asymptotic Normality of Sieve ML Estimators Under Low-Level Conditions", Econometric Theory 30, 1021-1076.

Bierens, H. J. (2017): Econometric Model Specification: Consistent Model Specification Tests and Semi-Nonparametric Modeling and Inference. World Scientific Publishers.

Bierens, H. J., and J. Hartog (1988): "Non-linear Regression with Discrete Explanatory Variables, with an Application to the Earnings Function", Journal of Econometrics 38, 269-299.

Billingsley, P. (1968): Convergence of Probability Measures. Wiley.

Chen, X. (2007): "Large Sample Sieve Estimation of Semi-Nonparametric Models". Chapter 76 in: J. J. Heckman and E. Leamer (eds.), Handbook of Econometrics, Vol. 6. Elsevier.

Courant, R., and D. Hilbert (1953): Methods of Mathematical Physics, Vol. I. Interscience Publishers.

Gallant, A. R., and D. W. Nychka (1987): "Semi-Nonparametric Maximum Likelihood Estimation", Econometrica 55, 363-390.

Grenander, U. (1981): Abstract Inference. Wiley.

Hamming, R. W. (1973): Numerical Methods for Scientists and Engineers. Dover Publications.

Jennrich, R. I. (1969): "Asymptotic Properties of Nonlinear Least Squares Estimators", Annals of Mathematical Statistics 40, 633-643.

Royden, H. L. (1968): Real Analysis. Macmillan.