Nonparametric Estimation of Scalar

Embed Size (px)

Citation preview

  • 7/28/2019 Nonparametric Estimation of Scalar

    1/31

    The Annals of Statistics

    2004, Vol. 32, No. 5, 22232253DOI 10.1214/009053604000000797 Institute of Mathematical Statistics, 2004

    NONPARAMETRIC ESTIMATION OF SCALAR DIFFUSIONS

    BASED ON LOW FREQUENCY DATA1

    BY EMMANUEL GOBET, MARC HOFFMANN AND MARKUS REI

    Ecole Polytechnique, Universit Paris 7 and Humboldt-Universitt zu Berlin

    We study the problem of estimating the coefficients of a diffusion(Xt, t 0); the estimation is based on discrete data Xn, n = 0, 1, . . . , N .The sampling frequency 1 is constant, and asymptotics are taken as thenumber N of observations tends to infinity. We prove that the problemof estimating both the diffusion coefficient (the volatility) and the driftin a nonparametric setting is ill-posed: the minimax rates of convergence

    for Sobolev constraints and squared-error loss coincide with that of a,respectively, first- and second-order linear inverse problem. To ensureergodicity and limit technical difficulties we restrict ourselves to scalardiffusions living on a compact interval with reflecting boundary conditions.

    Our approach is based on the spectral analysis of the associated Markovsemigroup. A rate-optimal estimation of the coefficients is obtained viathe nonparametric estimation of an eigenvalueeigenfunction pair of thetransition operator of the discrete time Markov chain (Xn, n = 0, 1, . . . , N )in a suitable Sobolev norm, together with an estimation of its invariantdensity.

    1. Introduction.

    1.1. Overview. Since Fellers celebrated classification, stationary scalar diffu-sions have served as a representative model for homogeneous Markov processesin continuous time. Historically, diffusion processes were probably first seen asapproximation models for discrete Markov chains, up to an appropriate rescal-ing in time and space. More recently, the development of financial mathematicshas argued in favor of genuinely continuous time models, with simple dynamicsgoverned by a local mean (drift) b(

    ) and local variance (diffusion coefficient, or

    volatility) () on the state space S= R or SR with appropriate boundary con-dition. The dynamics are usually described by an It-type stochastic differentialequation in the interior ofS, which in the time-homogeneous case reads like

    dXt = b(Xt) dt+ (Xt) dWt, t 0,

    Received July 2002; revised June 2003.1Supported in part by the European network Dynstoch and the DFG-Sonderforschungsbereich

    373.AMS 2000 subject classifications. 62G99, 62M05, 62M15.Key words and phrases. Diffusion processes, nonparametric estimation, discrete sampling, low

    frequency data, spectral approximation, ill-posed problems.

    2223

  • 7/28/2019 Nonparametric Estimation of Scalar

    2/31

    2224 E. GOBET, M. HOFFMANN AND M. REI

    where the driving process (Wt, t 0) is standard Brownian motion. The growingimportance of diffusion models progressively raised among the community ofstatisticians a vast research program, from both quantitative and theoretical angles.

    We outline the main achievements of this program in Section 1.2.In the late 1970s a statistician was able to characterize qualitatively theproperties of a parametric ergodic diffusion model based on the continuousobservation of a sample path

    XT := (Xt, 0 t T )of the trajectory, as T , that is, as the time length of the experiment grows toinfinity, a necessary assumption to assure the growing of information thanks to therecurrence of the sample path. The 1980s explored various discretization schemesof the continuous time model: the data XT could progressively be replaced by the

    more realistic observationX(N,N) := XnN, n = 0, 1, . . . , N ,

    with asymptotics taken as N . The discretization techniques used at that timerequired the high frequency sampling assumption N 0 whereas N N in order to guarantee the closeness of X(N,N) and XT, with T = N N. Soon,a similar nonparametric program was achieved for both continuous time and highfrequency data.

    By the early to mid-1990s, the frontier remained the fixed case, that is,the case of low frequency data. This is the topic of the present paper. First, one

    must understand the importance and flexibility gained by being able to relax theassumption that the sampling time between two data points is small: indeed,one can hardly deny that, in practice, it may well happen that sampling witharbitrarily small is simply not feasible. Put differently, the asymptotic statisticaltheory is a mathematical construct to assess the quality of an estimator basedon discrete observations and it must be decided which asymptotics are adequatefor the data at hand. Second, the statistical nature of the problem drasticallychanges when passing from high to low frequency sampling: the approximationproperties of the sample path XN N by X(N,N) are not valid anymore; the

    observation(X

    0, X

    , . . . , X

    N )

    becomes a genuine Markov chain, and inferenceabout the underlying coefficients of the diffusion process must be sought via theidentification of the law of the observation X(N,N). In the time-homogeneouscase the mathematical properties of the random vector X(N,N) are embodied inthe transition operator

    Pf(x) := E[f (X)|X0 = x],defined on appropriate test functions f. Under suitable assumptions, the opera-tor P is associated with a Feller semigroup (Pt, t 0) with a densely definedinfinitesimal generator L on the space of continuous functions given by

    Lf (x) = L,b f(x) := 2(x)2

    f(x) + b(x)f(x).

  • 7/28/2019 Nonparametric Estimation of Scalar

    3/31

    NONPARAMETRIC ESTIMATION OF SCALAR DIFFUSIONS 2225

    The second-order term () is the diffusion coefficient, and the first-order term b()is the drift coefficient. Postulating the existence of an invariant density () =,b(), the operator L is unbounded, but self-adjoint negative on L2() :={f| |f|2 < }, and the functional calculus gives the correspondence

    P = exp(L)(1.1)in the operator sense. Therefore, a consistent statistical program can be presumedto start from the observed Markov chain X(N,), estimate its transition operator Pand infer about the pair (b(), ()), via the correspondence (1.1), in other wordsvia the spectral properties of the operator P. Expressed in a diagram, we obtainthe following line:

    data

    =X(N,)

    (E)

    P

    (I )

    L

    b(), ()= parameter.(1.2)The efficiency of a given statistical estimation procedure will be measured by theproficiency in combining the estimation part (E) and the identification part (I ) ofthe model.

    The works of Hansen, Scheinkman and Touzi (1998) and Chen, Hansen andScheinkman (1997) paved the way: they formulated a precise and thorough pro-gram, proposing and discussing several methods for identifying scalar diffusionsvia their spectral properties. Simultaneously, the Danish school, given on impulseby the works of Kessler and Srensen (1999), systematically studied the para-metric efficiency of spectral methods in the fixed setting described above. By

    constructing estimating functions based on eigenfunctions of the operator L, theycould construct

    N-consistent estimators and obtained precise asymptotic prop-

    erties.However, a quantitative study of nonparametric estimation in the fixed

    context remained out of reach for some time, for both technical and conceptualreasons. The purpose of the present paper is to fill in this gap, by trying tounderstand and explain why the nonparametric case significantly differs fromits parametric analogue, as well as from the high frequency data framework innonparametrics.

    We are going to establish minimax rates of convergence over various smooth-ness classes, characterizing upper and lower bounds for estimating b() and ()based on the obervation ofX0, X, . . . , XN, with asymptotics taken as N .The minimax rate of convergence is an index of both accuracy of estimation andcomplexity of the model. We will show that in the nonparametric case the com-plexity of the problems of estimating b() and () is related to ill-posed inverseproblems. Although we mainly focus on the theoretical aspects of the statisticalmodel, the estimators we propose are based on feasible nonparametric smoothingmethods: they can be implemented in practice, allowing for adaptivity and finitesample optimization. Some simulation results were performed by Rei (2003).

    The estimation problem is exactly formulated in Section 2, where also themain theoretical results are stated. The spectral estimation method we adopt is

  • 7/28/2019 Nonparametric Estimation of Scalar

    4/31

    2226 E. GOBET, M. HOFFMANN AND M. REI

    explained in Section 3, which includes a discussion of related problems andpossible extensions. The proofs of the upper bound for our estimator and itsoptimality in a minimax sense are given in Sections 4 and 5, respectively. Results

    of rather technical nature are deferred to Section 6.

    1.2. Statistical estimation for diffusions: an outlook. We give a brief andselective summary of the evolution of the area over the last two decades. Thenonparametric identification of diffusion processes from continuous data wasprobably first addressed in the reference paper of Banon (1978). More preciseestimation results can be listed as follows:

    1.2.1. Continuous or high frequency data: the parametric case. Estimation ofa finite-dimensional parameter from XT

    =(Xt, 0

    t

    T ) with asymptotics as

    T when X is a diffusion of the formdXt = b(Xt) dt+ (Xt) dWt(1.3)

    is classical [Brown and Hewitt (1975) and Kutoyants (1975)]. Here (Wt, t 0) isa standard Wiener process. The diffusion coefficient is perfectly identified fromthe data by means of the quadratic variation of X. By assuming the process Xto be ergodic (positively recurrent), a sufficiently regular parametrization b() implies the local asymptotic normality (LAN) property for the underlyingstatistical model, therefore ensuring the

    T-consistency and efficiency of the

    ML-estimator [see Liptser and Shiryaev (2001)].In the case of discrete data XnN, n = 0, 1, . . . , N , with high frequencysampling 1N , but long range observation N N as N , variousdiscretization schemes and estimating procedures had been proposed [Yoshida(1992) and Kessler (1997)] until Gobet (2002) eventually proved the LAN propertyfor ergodic diffusions of the form

    dXt = b1 (Xt) dt+ 2 (Xt) dWt(1.4)in a general setting, by means of the Malliavin calculus: under suitable regularityconditions, the finite-dimensional parameter 1 in the drift term can be estimatedwith optimal rate N N, whereas the finite-dimensional parameter 2 in thediffusion coefficient is estimated with the optimal rate

    N.

    1.2.2. Continuous or high frequency data: the nonparametric case. A similarprogram was progressively obtained in nonparametrics: If the drift function b()is globally unknown in the model given by (1.3), but belongs to a Sobolev ballS(s,L) (of smoothness order s > 0 and radius L) over a given compact interval I,a certain kernel estimator bT() achieves the following upper bound in L2(I) andin a root-mean-squared sense:

    supbS(s,L)

    EbT b2L2(I)1/2 Ts/(2s+1).

  • 7/28/2019 Nonparametric Estimation of Scalar

    5/31

    NONPARAMETRIC ESTIMATION OF SCALAR DIFFUSIONS 2227

    This already indicates a formal analogy with the model of nonparametricregression or signal + white noise where the same rate holds. (Here and inthe sequel, the symbol means up to constants, possibly depending on the

    parameters of the problem, but that are continuous in their arguments.) SeeKutoyants (1984) for precise mathematical results.

    Similar extensions to the discrete case with high frequency data samplingfor the model driven by (1.4) were given in Hoffmann (1999), where the rates(N N)

    s/(2s+1) for the drift function b() and Ns/(2s+1) for the diffusioncoefficient () have been obtained and proved to be optimal. See also thepioneering paper of Pham (1981). Methods of this kind have been successfullyapplied to financial data [At-Sahalia (1996), Stanton (1997), Chapman andPearson (2000) and Fan and Zhang (2003)]. In particular, it is investigated whether

    the usual parametric model assumptions are compatible with the data, and the useof nonparametric methods is advocated.

    1.2.3. From high to low frequency data. As soon as the sampling frequency1N = 1 is not large anymore, the problem of estimating a parameter in thedrift or diffusion coefficient becomes significantly more difficult: the trajectoryproperties that can be recovered from the data when N is small are lost. Inparticular, there is no evident approximating scheme that can efficiently computeor mimic the continuous ML-estimator in parametric estimation.

    Likewise, the usual nonparametric kernel estimators, based on differencing, donot provide consistent estimation of the drift b() or the diffusion coefficient ().As a concrete example, consider the standard NadarayaWatson estimator b(x) ofthe drift b(x) in the point x R:

    b(x) := (N)1 N1

    n=0 Kh(x Xn)(X(n+1) Xn)N1

    N1n=0 Kh(x Xn)

    with a kernel function K() and Kh(x) := h1K(h1x) for h > 0. If we letN and h 0, then by the ratio ergodic theorem and by kernel propertieswe obtain almost surely the limit

    E[1(X X0)|X0 = x] = 1

    0Ptb(x)dt.

    Hence, this estimator is not consistent. It merely yields a highly blurred versionofb(x), which of course tends to b(x) in the high frequency limit 0. Note thatthe transition operators Pt involved depend on the unknown functions b() and ()as a whole. The situation for estimators of () based on the approximation of thequadratic variation is even worse, because the drift b() enters directly into thelimit expression.

  • 7/28/2019 Nonparametric Estimation of Scalar

    6/31

    2228 E. GOBET, M. HOFFMANN AND M. REI

    1.2.4. Spectral methods for parametric estimation. Kessler and Srensen(1999) suggested the use of eigenvalues and eigenvectors () of theparametrized infinitesimal generator

    Lf(x) =2 (x)

    2f(x) + b(x)f(x),

    that is, such that L(x) = (x). Indeed, since the pair (, ) also satisfiesP(Xn) = E

    X(n+1)

    Xn= exp()(Xn),whenever it is easy to compute, the knowledge of a pair (, ) can be translatedinto a set of conditional moment conditions to be used in estimating functions.With their method, Kessler and Srensen can construct

    N-consistent estimators

    that are nearly efficient. See also the paper of Hansen, Scheinkman and Touzi(1998) that we already mentioned.

    In a sense, in this idea also lies the essence of our method. However, thestrategy of Kessler and Srensen is not easily extendable to nonparametrics: thereis no straightforward way to pass from a finite-dimensional parametrization of thegenerator L with explicit eigenpairs (, ) to a full nonparametric space withsatisfactory approximation properties. Besides, there would be no evident routeto treat the variance of such speculative nonparametric estimators either, becausethe behavior of the parametric Fisher information matrix for a growing number ofparameters is too complex to be easily controlled. We will see in Section 3 how to

    pass over these objections by estimating directly an eigenpair nonparametrically.

    1.2.5. Prospectives. A quick summary yields Table 1 for optimal rates ofconvergence.

    Table 1 can be interpreted as follows: the difficulty of the estimation problemis increasing from top to bottom and from left to right. A blank line separatesthe continuoushigh-frequency (HF) data domain from the low-frequency (LF)data domain. The breach for LF data opened by Kessler and Srensen as well asby Hansen, Scheinkman and Touzi shows that

    N-consistent estimators usually

    exist in the parametric case. The remaining case are the rates of convergence forLF data in the nonparametric case uN for the drift b() and vN for the diffusioncoefficient (), for which we are aiming.

    TABLE 1

    Parametric Nonparametric

    b b

    Continuous T1/2 known Ts/(2s+1) knownHF data (N N)1/2 N1/2 (NN)s/(2s+1) Ns/(2s+1)

    LF data N1/2 N1/2 uN vN

  • 7/28/2019 Nonparametric Estimation of Scalar

    7/31

    NONPARAMETRIC ESTIMATION OF SCALAR DIFFUSIONS 2229

    2. Main results.

    2.1. A diffusion model with boundary reflections. We shall restrict ourselves

    to reflecting diffusions on a one-dimensional interval to avoid highly nontrivialtechnical issues; see the discussion in Section 3.3.

    Choosing for convenience the interval [0, 1], we suppose the following.

    ASSUMPTION 2.1. The function b : [0, 1] R is measurable and bounded,the function : [0, 1] (0, ) is continuous and positive and the function : [0, 1] R satisfies (0) = 1, (1) = 1.

    We consider the stochastic differential equation

    dXt = b(Xt) dt+ (Xt) dWt + (Xt) dLt(X),(2.1)

    X0 = x0 and Xt [0, 1] t 0.The process (Wt, t 0) is a standard Brownian motion and (Lt(X),t 0) isa nonanticipative continuous nondecreasing process that increases only whenXt {0, 1}. The boundedness ofb() and the ellipticity of () ensure the existenceof a weak solution; see for instance Stroock and Varadhan (1971). Note that theprocess L(X) is part of the solution and is given by a difference of local times ofX

    at the boundary points of[0, 1].Due to the compactness of [0, 1] and the reflecting boundary conditions,

    the Markov process X has a spectral gap, which implies geometric ergodicity;compare with Lemmas 6.1 and 6.2. In particular, a unique invariant measure exists and the one-dimensional distributions ofXt converge exponentially fast to as t so that the assumption of stationarity can be made without loss ofgenerality for asymptotic results.

    We denote byP,b the law of the associated stationary diffusion on the canonicalspace = C(R+, [0, 1]) of continuous functions over the positive axis with valuesin [0, 1], equipped with the topology of uniform convergence and endowed with itsBorel -fieldF. We denote byE,b the corresponding expectation operator. GivenN 1 and > 0, we observe the canonical process (Xt, t 0) at equidistanttimes n for n = 0, 1, . . . , N . Let FN denote the -field generated by {Xn|n =0, . . . , N }.

    DEFINITION 2.2. An estimator of the pair ( (),b()) is an FN-measurablefunction on with values in L2([0, 1]) L2([0, 1]).

    To assess the L2-risk in a minimax framework, we introduce the nonparametricset s , which consists of pairs of functions of regularity s and s 1, respectively.

  • 7/28/2019 Nonparametric Estimation of Scalar

    8/31

    2230 E. GOBET, M. HOFFMANN AND M. REI

    DEFINITION 2.3. For s > 1 and given constants C c > 0, we consider theclass s := (s,C,c) defined by

    (,b) Hs ([0, 1]) Hs1([0, 1])Hs C, bHs1 C, infx (x) c,where Hs denotes the L2-Sobolev space of order s.

    Note that all ( (),b()) s satisfy Assumption 2.1.

    2.2. Minimax rates of convergence. We are now in position to state the maintheorems. By (3.12) and (3.13) in the next section we define estimators 2 and busing a spectral estimation method based on the observation (X0, X, . . . , XN ).These estimators, which of course depend on the number N of observations, satisfy

    the following uniform asymptotic upper bounds.

    THEOREM 2.4. For all s > 1, C c > 0 and0 < a < b < 1 we havesup

    (,b)sE,b

    2 22L2([a,b])

    1/2Ns/(2s+3),

    sup(,b)s

    E,bb b2

    L2([a,b])1/2

    N(s1)/(2s+3).

    Recall that A B means that A can be bounded by a constant multiple of B,

    where the constant depends continuously on other parameters involved. Similarly,A B is equivalent to B A and AB holds if both relations A B and A Bare true.

    As the following lower bounds prove, the rates of convergence of our estimatorsare optimal in a minimax sense over s .

    THEOREM 2.5. Let EN denote the set of all estimators according toDefinition 2.2. Then for all 0 a < b 1 ands > 1 the following hold:

    inf

    2

    EN

    sup(,b)

    s

    E,b

    2 22

    L2([a,b])

    1/2Ns/(2s+3),(2.2)

    infbEN

    sup(,b)s

    E,bb b2

    L2([a,b])1/2

    N(s1)/(2s+3).(2.3)

    If we set s1 = s 1 and s2 = s, then the drift b() s with regularity s1 can beestimated with the minimax rate of convergence uN = Ns1/(2s1+5), whereas thediffusion coefficient () s has regularity s2 and the corresponding minimaxrate of convergence is vN = Ns2/(2s2+3). Hence, Table 1 in Section 1.2.5 can befilled with two rather unexpected rates of convergence uN and vN. In Section 3.3the rates are explained in the terminology of ill-posed problems and reasons are

    given why the tight connection between the regularity assumptions on b() and ()is needed.

  • 7/28/2019 Nonparametric Estimation of Scalar

    9/31

    NONPARAMETRIC ESTIMATION OF SCALAR DIFFUSIONS 2231

    3. Spectral estimation method.

    3.1. The basic idea. We shall base our estimator of the diffusion coeffi-

    cient () and of the drift coefficient b() on spectral methods for passing fromthe transition operator P, which is approximately known to us, to the infinitesi-mal generator L, which more explicitly encodes the functions () and b(). In thesequel, we shall rely on classical results for scalar diffusions; for example, consultBass [(1998), Chapter 4]. We use the specific form of the invariant density

    (x) = 2C02(x) expx

    02b(y)2(y)dy

    (3.1)

    and the function S() = 1/s (), derived from the scale function s(),

    S(x) = 12 2(x)1(x) = C0 exp x0

    2b(y)2(y)dy,(3.2)with the normalizing constant C0 > 0 depending on () and b(). The action ofthe generator in divergence form is given by

    Lf (x) = L,b f(x) =1

    22(x)f(x) + b(x)f(x) = 1

    (x)

    S(x)f(x)

    ,(3.3)

    where the domain of this unbounded operator on L2() is given by the subspaceof the L2-Sobolev space H2 with Neumann boundary conditions

    dom(L) = {f H2([0, 1])|f(0) = f(1) = 0}.The generator L is a self-adjoint elliptic operator on L2() with compact resolventso that it has nonpositive point spectrum only. If 1 denotes the largest negativeeigenvalue ofL with eigenfunction u1, then due to the reflecting boundary of[0, 1]the Neumann boundary conditions u1(0) = u1(1) = 0 hold and thus

    Lu1 = 1(Su1) = 1u1 S(x)u1(x) = 1x

    0u1(y)(y)dy.(3.4)

    From (3.2) we can derive an explicit expression for the diffusion coefficient:

    2(x) = 21x

    0 u1(y)(y)dy

    u1(x)(x).(3.5)

    The corresponding expression for the drift coefficient is

    b(x) = 1u1(x)u

    1(x)(x) u1(x)

    x0 u1(y)(y)dy

    u1(x)2(x).(3.6)

    Hence, if we knew the invariant measure , the eigenvalue 1 and the eigenfunc-tion u1 (including its first two derivatives), we could exactly determine the drift and

    diffusion coefficient. Of course, these identities are valid for any eigenfunction ukwith eigenvalue k , but for better numerical stability we shall use only the largest

  • 7/28/2019 Nonparametric Estimation of Scalar

    10/31

    2232 E. GOBET, M. HOFFMANN AND M. REI

    nondegenerate eigenvalue 1. Moreover, it is known that only the eigenfunction u1does not have a vanishing derivative in the interior of the interval (cf. Proposi-tion 6.5) so that by this choice indeterminacy at interior points is avoided.

    Using semigroup theory [Engel and Nagel (2000), Theorem IV.3.7] we knowthat u1 is also an eigenfunction ofP with eigenvalue 1 = e1 . Our procedureconsists of determining estimators of and P of P, to calculate thecorresponding eigenpair (1, u1) and to use (3.5) and (3.6) to build a plug-inestimator of () and b().

    3.2. Construction of the estimators. We use projection methods, takingadvantage of approximating properties of abstract operators by finite-dimensionalmatrices, for which the spectrum is easy to calculate numerically. A similarapproach was already suggested by Chen, Hansen and Scheinkman (1997). Morespecifically, we make use of wavelets on the interval [0, 1]. For the construction ofwavelet bases and their properties we refer to Cohen (2000).

    DEFINITION 3.1. Let () with multiindices = (j,k) be a compactly sup-ported L2-orthonormal wavelet basis ofL2([0, 1]). The approximation spaces (VJ)are defined as L2-closed linear spans of the wavelets up to the frequency level J,

    VJ := span{||| J} where |(j,k)| := j.The L2-orthogonal projection onto VJ is called J; the L2()-orthogonal

    projection onto VJ is called

    J .In the sequel we shall regularly use the Jackson and Bernstein inequalities with

    respect to the L2-Sobolev spaces Hs ([0, 1]) of regularity s:(Id J)fHt 2J (st )fHs , 0 t s,

    vJ VJ, vJHs 2J (st )vJHt, 0 t s.The canonical projection estimate of based on (Xn)0nN is given by

    := ||J with :=1

    N+ 1N

    n=0

    (Xn).(3.7)

    By the ergodicity of X it follows that is a consistent estimate of, forN . To estimate the action of the transition operator on the wavelet basis(PJ), := P, , we introduce the symmetrized matrix estimator Pwith entries

    (P), :=1

    2N

    Nn=1

    X(n1)

    (Xn) +

    X(n1)

    (Xn)

    .(3.8)

    This yields an approximation of P , , that is, of the action of thetransition operator on VJ with respect to the unknown scalar product ,

  • 7/28/2019 Nonparametric Estimation of Scalar

    11/31

    NONPARAMETRIC ESTIMATION OF SCALAR DIFFUSIONS 2233

    in L2(). We therefore introduce a third statistic G, which approximates thedim(VJ) dim(VJ)-dimensional Gram matrix G with entries G, = , ,and which is given by

    G, :=1

    N

    1

    2(X0)(X0)

    + 12

    (XN )(XN) +N1n=1

    (Xn)(Xn)

    .

    (3.9)

    The particular treatment of the boundary terms will be explained later. If weput = (wn(Xn))||J, nN with w0 = wN = 12 and wn = 1 otherwise, we

    have G = N1

    T

    , T

    being the transpose of. Our construction can thus beregarded as a least squares type estimator, as in a usual regression setting; see theargument developed in Section 3.3.1.

    We combine the last two estimators in order to determine estimates for theeigenvalue 1 and the eigenfunction u1 of P. As will be made precise inProposition 4.5, the operators P and

    J P are close for large values ofJ. Note

    that all eigenvectors of J P lie in VJ, the range ofJ P. The eigenfunction u

    J1

    corresponding to the second largest eigenvalue J1 ofJ P is characterized by

    PuJ

    1 , = J

    1 uJ

    1 , || J.(3.10)We pass to vector-matrix notation and use from now on bold letters to define fora function v VJ the corresponding coefficient column vector v = (v, )||J.Observe carefully the different L2-scalar products used; here they are with respectto the Lebesgue measure. Thus, we can rewrite (3.10) as

    PJuJ1 = J1 GuJ1 .(3.11)

    As vTGv=

    v, v

    > 0 holds for v

    VJ

    \ {0}, the matrix G is invertible and

    (J1 , uJ1 ) is an eigenpair ofG

    1PJ. This matrix is self-adjoint with respect to thescalar product induced by G:

    G1PJv, wG := (G1PJv)TGw= vTPJw = v, G1PJwG.

    Similarly, vTGv = N1(Tv)TTv 0 holds and the matrix G can be shownto be even strictly positive definite with high probability (see Lemma 4.12). In

    this case, we similarly infer that G1P is self-adjoint with respect to the G-scalarproduct. The CauchySchwarz inequality and the inequality between geometric

  • 7/28/2019 Nonparametric Estimation of Scalar

    12/31

    2234 E. GOBET, M. HOFFMANN AND M. REI

    and arithmetic mean yield the estimate

    G1Pv, v

    G

    =

    1

    N

    N

    n=1 vX(n1)v(Xn) 1

    N

    N1n=0

    v(Xn)2

    1/2 Nn=1

    v(Xn)2

    1/2

    1N

    1

    2v(X0)

    2 + 12

    v(XN)2 +

    N1n=1

    v(Xn)2

    = v, v

    G.

    We infer that all eigenvalues of G1

    P are real and not larger than 1. Hence,the second largest eigenvalue 1 of G1P is well defined, which is why wedownweighted the boundary terms of G. The eigenvalue 1 of G1P and itscorresponding eigenvector u1 yield estimators ofJ1 and u

    J1 .

    Plugging the estimator as well as 1 and u1 into (3.5) and (3.6), we obtain ourestimators of2() and b():

    2(x) := 21 log(1)

    x0 u1(y)(y)dy

    u1(x)(x),(3.12)

    b(x) := 1 log(1) u1(x)u1(x)(x) u1(x) x0 u1(y)(y)dyu1(x)2(x)

    .(3.13)

    To avoid indeterminacy, the denominators of the estimators are forced to remainabove a certain minimal level, which depends on the subinterval [a, b] [0, 1] forwhich the loss function is taken. See (4.5) for the exact formulation in the caseof2() and proceed analogously for b().

    3.3. Discussion.

    3.3.1. Least squares approach. The estimator matrix G1

    P is built as inthe least squares approach for projection methods in classical regression. Toestimate P0 (x) = E,b[0 (X)|X0 = x], the least squares method consistsof minimizing

    Nn=1

    0 (Xn) ||J 0X(n1)

    2

    min!(3.14)

    over all real coefficients (0), leading to the normal equations

    Nn=1

    ||J

    0X(n1)X(n1)= Nn=1

    X(n1)0 (Xn)

  • 7/28/2019 Nonparametric Estimation of Scalar

    13/31

    NONPARAMETRIC ESTIMATION OF SCALAR DIFFUSIONS 2235

    for all || J. Up to the special treatment of the boundary terms, we thus obtainthe vector (0) as the column with index 0 in G

    1P.

    3.3.2. Other than wavelet methods. For our projection estimates to work,we merely need approximation spaces satisfying the Jackson and Bernsteininequalities. Hence, other finite element bases could serve as well.

    The invariant density and the transition density could also be estimated usingkernel methods, but the numerical calculation of the eigenpair (1, u1) would theninvolve an additional discretization step.

    3.3.3. Diffusions over the real line. Using a wavelet basis ofL2(R), it is stillpossible to estimate and P over the real line; in particular the eigenvaluecharacterization (3.10) extends to this case. Hansen, Scheinkman and Touzi (1998)derive the same formulae as (3.5) and (3.6) under ergodicity and boundaryconditions so that a plug-in approach is feasible. However, a theoretical studyseems to require much more demanding theoretical tools. If the uniform separationof the spectral value 1 and a polynomial growth bound for the eigenfunctionu1() are ensured, we expect that the same minimax results hold with respect toan L2()-loss function, where the invariant density () is of course parameter-dependent. However, all spectral approximation results have to be reconsideredwith extra care, in particular because the L2()-norms are in general not equivalentfor different parameters.

    3.3.4. Adaptation to unknown smoothness. The knowledge of the smooth-ness s that is needed for the construction of our estimators is not realistic inpractice. An adaptive estimation of the eigenpair (u1(), 1) and () that yieldsadaptive estimators for ( (),b()) could be obtained by the following modifica-tions: First, the adaptive estimation of () in a classical mixing framework isfairly well known [e.g., Tribouley and Viennet (1998)]. Second, taking advan-tage of the multiresolution structure provided by wavelets, the adaptive estima-tion of P could be obtained by introducing an appropriate thresholding in theestimated matrices on a large approximation space.

    3.3.5. Interpretation as an ill-posed problem. One can make the link with ill-posed inverse problems by saying that estimation of() is well-posed (i.e., withachievable rate Ns/(2s+1)), but for S() we need an estimate of the derivative u1()yielding an ill-posedness degree of 1 (Ns/(2s+3)). Observe that the regularityconditions Hs and b Hs1 are translated into Hs , S Hs . Thetransformation of(,S) to 2() = 2S()/() is stable [L2-continuous for S() s0 > 0], whereas in b() = S()/() another ill-posed operation (differentiation)occurs with degree 1.

    A brief stepwise explanation reads as follows. Step 1, the natural parametriza-tion (,P) is well-posed (for P in the strong operator norm sense). Step 2, the

  • 7/28/2019 Nonparametric Estimation of Scalar

    14/31

    2236 E. GOBET, M. HOFFMANN AND M. REI

    calculation of the spectral pair (1, u1) is well-posed. Step 3, the differentiationof u1 that determines S has an ill-posedness of degree 1. Step 4, the calculationof 2 from (,S) is well-posed. Step 5, the calculation of b from (,S) is ill-

    posed of degree 1.

    3.3.6. Regularity restrictions on b() and (). It is noteworthy that in thecontinuous time or high frequency observation case, the parameter b() does notinfluence the asymptotic behavior of the estimator of () and vice versa. Theestimation problems are separated. In our low frequency regime we had to supposetight regularity connections between () and b(). This stems from the fact thatfor the underlying Markov chain X(N,) the parameters () and S() are morenatural and the regularity of these functions depends on the regularity both ofb()

    and of ().At a different level, in nonparametric regression, different smoothness con-straints are needed between the mean and the variance function. Recommendedreferences are Mller and Stadtmller (1987) and Fan and Yao (1998).

    Finally, although we ask for the tight connection s1 = s2 1 for the regularitys1 of the drift b() and s2 of the diffusion coefficient (), our results readily carryover to the milder constraint s1 s2 1.

    3.3.7. Estimation when one parameter is known. If () is known, anestimate

    of the invariant density yields an estimate ofb(

    ), since

    b(x) = (2(x)(x))

    2(x), x [0, 1].

    Estimation of Hs , s > 1,in H1-norm can be achieved with rate N(s1)/(2s+1)and this rate is thus also valid for estimating b() in L2-norm. Given the driftcoefficient b(), we find

    2(x) = 2x

    0 b(y)(y)dy + C(x)

    , x [0, 1],

    where C is a suitable constant. If we knew 2(0), we would obtain the rateNs/(2s+1) for Hs .

    Using a preliminary nonparametric estimate 2C depending on the parameter Cand then fitting a parametric model for C, we are likely to find the same rate. Inany case, the assumption of knowing one parameter seems rather artificial and nofurther investigations have been performed.

    3.3.8. Estimation at the boundary. Our plug-in estimators can only be definedon the open subinterval (0, 1). Estimation at the boundary points leads to a

    risk increase due to S1(0) = 1u1(0)(0)/u1(0) by de lHspitals rule appliedto (3.4). Thus, estimating (0) and b(0) involves taking the second and third

  • 7/28/2019 Nonparametric Estimation of Scalar

    15/31

    NONPARAMETRIC ESTIMATION OF SCALAR DIFFUSIONS 2237

    derivative, respectively, when using plug-in estimators. A pointwise lower boundresultalong the lines of the L2-lower bound proofshows that this deteriorationcannot be avoided.

    4. Proof of the upper bound.

    4.1. Convergence of . First, we recall the proof for the risk bound inestimating the invariant measure:

    PROPOSITION 4.1. With the choice 2J N1/(2s+1) the following uniform riskestimate holds for based on N observations:

    sup(,b)s

    E,b 2L21/2Ns/(2s+1).

    PROOF. The explicit formula (3.1) for shows that Hs is uniformlybounded over s . This implies that the bias term satisfies

    JL2 2J sHs Ns/(2s+1),uniformly over s . Since is an unbiased estimator of, , we can apply thevariance estimates of Lemma 6.2 to obtain

    E,b J2L2= ||J Var,b[ ] 2

    JN1,

    whichin combination with the uniformity of the constants involvedgives theannounced upper bound.

    4.2. Spectral approximation. We shall rely on the spectral approximationresults given in Chatelin (1983); compare also Kato (1995). Since for () wehave to estimate not only the eigenvalue u1, but also its derivative u1, we willbe working in the L2-Sobolev space H1. The general idea is that the error in theeigenvalue and in the eigenfunction can be controlled by the error of the operatoron the eigenspace, once the overall error measured in the operator norm is small.Let R(T,z) = (T z Id)1 denote the resolvent map of the operator T, ( T ) itsspectrum and B(x,r) the closed ball of radius r around x.

    PROPOSITION 4.2. Suppose a bounded linear operatorT on a Hilbert spacehas a simple eigenvalue such that ( T ) B(,) = {} holds for some > 0. Let T be a second linear operator with T T < R2 , where R :=(supzB(,) R(T,z))1. Then the operator T has a simple eigenvalue inB(,) and there are eigenvectors u andu with T u

    =u, Tu

    =u satisfying

    u u 8R1(T T )u.(4.1)

  • 7/28/2019 Nonparametric Estimation of Scalar

    16/31

    2238 E. GOBET, M. HOFFMANN AND M. REI

    PROOF. We use the resolvent identity and the Cauchy integral representationof the spectral projection P on the eigenspace of T contained in B(,) [seeChatelin (1983), Lemma 6.4]. By the usual Neumann series argument we find

    formally for an eigenvector u corresponding to ,

    u Pu =1

    2

    B(,)

    R(T, z)

    z dz(T T )u

    12

    2 supzB(,)

    R(T, z)1(T T )u

    supzB(,)

    R(T,z)1 R(T,z)T T

    (T T )u

    = (R T T)1(T T )u.Hence, for T T < R2 this calculation is a posteriori justified and yieldsu Pu < 2R1(T T )u. Applying (T T )u < R2 u once again, wesee that the projection P cannot be zero. Consequently there must be a part ofthe spectrum ofT in B(,). By the argument in the proof of Theorem 5.22 inChatelin (1983) this part consists of a simple eigenvalue .

    It remains to find eigenvectors that are close, too. Observe that, for arbitraryHilbert space elements g, h with g = h = 1 and g, h 0,

    g h2 = 2 2g, h 2(1 + g, h)(1 g, h) = 2g g, hh2holds. We substitute for g and h the normalized eigenvectors u and u withu, u 0; note that oblique projections only enlarge the right-hand side and thusinfer (4.1).

    COROLLARY 4.3. Under the conditions of Proposition 4.2 there is a constantC = C(R,T) such that| | C(T T )u.

    PROOF. The inverse triangle inequality yields

    | |= | Tu T u |T(u u) + (T T )u (T + T T)u u + (T T )u

    T + R2

    8R1 + 1

    (T T )u,

    where the last line follows from Proposition 4.2.

    4.3. Bias estimates. In a first estimation step, we bound the deterministic error

    due to the finite-dimensional projection J P of P. We start with a lemma

    stating that J and J have similar approximation properties.

  • 7/28/2019 Nonparametric Estimation of Scalar

    17/31

    NONPARAMETRIC ESTIMATION OF SCALAR DIFFUSIONS 2239

    LEMMA 4.4. Let m : [0, 1] [m0, m1] be a measurable function with m1 m0 > 0. Denote by mJ the L

    2(m)-orthogonal projection onto the multiresolution

    space VJ. Then there is a constantC

    =C(m0, m1) such that

    (Id mJ )fH1 C(Id J)fH1 f H1([0, 1]).

    PROOF. The norm equivalence m0gL2 gm m1gL2 implies mJ L2L2 m1m10 mJ L2(m)L2(m) = m1m10 .

    On the other hand, the Bernstein inequality in VJ and the Jackson inequality forId J in H1 and L2 yield, for f H1,

    (Id

    J )f

    H1

    = (Id

    J )(Id

    J)f

    H1

    (Id J)fH1 + J (Id J)fH1 (Id J)fH1 + 2J J (Id J)fL2 (Id J)fH1 + J L2L2(Id J)fH1 ,

    where the constants depend only on the approximation spaces.

    PROPOSITION 4.5. Uniformly over s we have J P PH1H1 2J s .

    PROOF. The transition density p is the kernel of the operator P. Hence,from Lemma 6.7 it follows that P : H1 Hs+1 is continuous with a uniformnorm bound over s . Lemma 4.4 yields

    (P J P)fH1 Id JHs+1H1fH1 .The Jackson inequality in H1 gives the result.

    COROLLARY 4.6. LetJ

    1be the largest eigenvalue smaller than 1 of

    J

    with

    eigenfunction uJ1 . Then uniformly overs the following estimate holds:

    |J1 1| + uJ1 u1H1 2J s .

    PROOF. We are going to apply Proposition 4.2 on the space H1 and itsCorollary 4.3. In view of Proposition 4.5, it remains to establish the existenceof uniformly strictly positive values for and R over s . The uniform separationof1 from the rest of the spectrum is the content of Proposition 6.5.

    For the choice of in Proposition 6.5 we search a uniform bound R. If

    we regard P on L2(), then P is self-adjoint and satisfies R(P, z) =dist(z,(P))1 [see Chatelin (1983), Proposition 2.32].

  • 7/28/2019 Nonparametric Estimation of Scalar

    18/31

    2240 E. GOBET, M. HOFFMANN AND M. REI

    By Lemma 6.3 and the commutativity between P and L we conclude

    R(P,z)fH1 (Id L)1/2R(P,z)f R(P, z)(Id L)1/2f dist

    z,(P)

    1fH1 .Hence, R(P, z)H1H1 1 holds uniformly over z B(,) and(,b) s .

    REMARK 4.7. The KatoTemple inequality [Chatelin (1983), Theorem 6.21]on L2() even establishes the so-called superconvergence |J1 1| 22J s .

    4.4. Variance estimates. To bound the stochastic error on the finite-dimensional space VJ, we return to vector-matrix notation and look for a boundon the error involved in the estimators G and P. The Euclidean norm is denotedby l2 .

    LEMMA 4.8. For any vectorv R|VJ| we have, uniformly overs ,E,b

    (G G)v2l2

    v2

    l2N12J.

    PROOF. We obtain, by (6.1) in Lemma 6.2,

    E,b(G G)v2

    l2

    =

    ||J

    E,b

    1

    N2

    E,b[(X0)v(X0)]

    12

    (X0)v(X0)

    12

    (XN )v(XN) N1n=1

    (Xn)v(Xn)

    2

    ||J

    N1E,b(X0)v(X0)2N1

    ||J 2v2L2 N12Jv2l2 ,

    as asserted.

    LEMMA 4.9. For any vectorv we have, uniformly overs ,

    E,b(P P)v2l2 v2l2 N12J.

  • 7/28/2019 Nonparametric Estimation of Scalar

    19/31

    NONPARAMETRIC ESTIMATION OF SCALAR DIFFUSIONS 2241

    PROOF. We obtain, by (6.2) in Lemma 6.2,

    E,b

    (P P)v2l2=

    ||J

    E,b

    1N

    Nn=1

    X(n1)v(Xn) E,b[(X0)v(X)]

    2

    ||J

    N1E,b

    (X0)v(X1)2

    N1

    ||J2L2v2L2p N12Jv2l2 ,

    as asserted.

    DEFINITION 4.10. We introduce the random set

    R=RJ,N :=G G 12G11.

    REMARK 4.11. Since G is invertible, so is G on R with G1 2G1by the usual Neumann series argument.

    LEMMA 4.12. Uniformly overs we have P,b (

    \R)N122J.

    PROOF. By the classical HilbertSchmidt norm inequality,

    G G2l2l2

    ||J

    (G G)e2l2l2

    holds with unit vectors (e). Then Lemma 4.8 gives uniformly

    E,bG G2

    l2l2N122J.

    Since the spaces L2([0, 1

    ]) and L2() are isomorphic with uniform isomorphism

    constants, G1 1 holds uniformly over s and the assertion follows fromChebyshevs inequality.

    PROPOSITION 4.13. For any > 0 we have, uniformly overs ,

    P,b(R {G1P G1P })N122J2.

    PROOF. First, we separate the different error terms:

    G

    1

    P

    G

    1P

    = G

    1(

    P

    P

    )+

    (

    G

    1

    G

    1)P

    = G1(P P) + (G G)G1P.

  • 7/28/2019 Nonparametric Estimation of Scalar

    20/31

    2242 E. GOBET, M. HOFFMANN AND M. REI

    On the set R we obtain, by Remark 4.11,

    G1P G1P G1(P P + G GG1P) 2G1(P P + G GG1P) P P + G G.

    By Lemmas 4.8 and 4.9 and the HilbertSchmidt norm estimate (cf. the proof ofLemma 4.12) we obtain the uniform norm bound over s ,

    E,b[G1P G1P21R]N122J.It remains to apply Chebyshevs inequality.

    Having established the weak consistency of the estimators in matrix norm, wenow bound the error on the eigenspace.

    PROPOSITION 4.14. Let uJ1 be the vector associated with the normalizedeigenfunction uJ1 of

    J P with eigenvalue

    J1 . Then uniformly over s the

    following risk bound holds:

    E,b(G1P G1PJ)uJ1 2l2 1RN12J.

    PROOF. By the same separation of the error terms on R as in the precedingproof and by Lemmas 4.8 and 4.9 we find

    E,b(G1P G1PJ)uJ1 2l2 1R 8G12E(P PJ)uJ1 2l2+E(G G)J1 uJ1 2l2N12J.

    The uniformity over s follows from the respective statements in the lemmas.

    COROLLARY 4.15. Let 1 be the second largest eigenvalue of the matrix

    G1

    P with eigenvector u1. If G is not invertible or if u1l2 2sups u1L2holds, put u1 := 0, 1 := 0. If N122J 0 holds, then uniformly over s thefollowing bounds hold forN , J :

    E,b|1 J1 |2 + u1 uJ1 2l21RN12J,(4.2)

    E,bu1 uJ1 2H1N123J.(4.3)

    PROOF. For the proof of (4.2) we apply Proposition 4.2 using the EuclideanHilbert space RVJ and Corollary 4.3. Then Proposition 4.14 in connection with

    Proposition 4.13 (using < R/2 and N122J 0) yields the correct asymptoticrate on the event R. For the uniform choice of and R for G1PJ in

  • 7/28/2019 Nonparametric Estimation of Scalar

    21/31

    NONPARAMETRIC ESTIMATION OF SCALAR DIFFUSIONS 2243

    Proposition 4.2 just use the corresponding result for P and the convergence J P P 0.

    The precaution taken for undefined or too large

    u1 is necessary for the event

    \R. Since the estimators 1 and u1 are now kept artificially bounded, the rateP,b( \R) N122J established in Lemma 4.12 suffices to bound the risk on\R. Hence, the second estimate (4.3) is a consequence of (4.2) and the Bernsteininequality u1 uJ1 H1 2Ju1 uJ1 L2 .

    REMARK 4.16. The main result of this section, namely (4.3), can be extendedto pth moments for all p (1, ):

    E,b

    u1 uJ1 pH1

    1/pN1/223J /2.(4.4)

    Indeed, tracing back the steps, it suffices to obtain bounds on the moments oforder p in Lemmas 4.8 and 4.9, which on their part rely on the mixing statementin Lemma 6.2. By arguments based on the Riesz convexity theorem this lastlemma generalizes to the corresponding bounds for pth moments, as derived inSection VII.4 of Rosenblatt (1971). For the sake of clarity we have restrictedourselves to the case p = 2 here.

    4.5. Upper bound for (). By Corollary 4.15 and our choice of J, 2J N1/(2s+3),

    sup(,b)s

    E,b|1 J1 |2 + u1 uJ1 2H1N123JN2s/(2s+3)holds. Using this estimate and the estimate for in Proposition 4.1, the risk ofthe plug-in estimator 2() in (3.12) is bounded as asserted in the theorem. Weonly have to ensure that the stochastic error does not increase from the plug-in andthat the denominator is uniformly bounded away from zero. Using the CauchySchwarz inequality and Remark 4.16 on the higher moments of our estimators, weencounter no problem in the first case. The second issue is dealt with by using thelower bound ca,b > 0 in Proposition 6.5 so that an improvement of the estimate forthe denominator by using

    u1 := max(u1, ca,b )(4.5)instead ofu1 guarantees the uniform lower bound away from zero.

    4.6. Upper bound forb(). Since b() = S()/() holds, it suffices to discusshow to estimate S(), which amounts to estimating the eigenfunction u1 inH2-norm; compare with (3.6). Substituting H2 for H1 in Proposition 4.5 and itsproof, we obtain the bound

    J P PH2H2 2J (s1),

  • 7/28/2019 Nonparametric Estimation of Scalar

    22/31

    2244 E. GOBET, M. HOFFMANN AND M. REI

    because Id JHs+1H2 is of this order. As in Corollary 4.6 this is also therate for the bias estimate. The only fine point is the uniform norm equivalence

    f

    H2

    (Id

    L)f

    for f

    dom(L), which follows by the methodology of

    perturbation and similarity arguments given in Section VI.4b of Engel and Nagel(2000). We omit further details.

    The variance estimate is exactly the same. From (4.2) we infer, by Bernsteinsinequality for H2 and the estimate ofP,b( \R),

    E,bu1 uJ1 2H2N125J.

    Therefore balancing the bias and variance part of the risk by the choice 2J N1/(2s+3)as beforeyields the asserted rate of convergence N(s1)/(2s+3).

    5. Proof of the lower bounds. First, the usual Bayes prior technique isapplied for the reduction to a problem of bounding certain likelihood ratios. Thenthe problem is reduced to that of bounding the L2-distance between the transitionprobabilities, which is finally accomplished using HilbertSchmidt norm estimatesand the explicit form of the inverse of the generator.

    5.1. The least favorable prior. The idea is to perturb the drift and diffusioncoefficients of the reflected Brownian motion in such a way that the invariantmeasure remains unchanged. Let us assume that is a compactly supported

    wavelet in Hs with one vanishing moment. Without loss of generality we supposeC > 1 > c > 0 such that (1, 0) s holds. We put j k = 2j/2(2j k) anddenote by Kj Z a maximal set of indices k such that supp(j k ) [a, b] andsupp(j k) supp(j k) = holds for all k, k Kj, k = k. Furthermore, we set 2j (s+1/2) such that, for all = (j) {1, +1}|Kj|,

    2S, (S)

    s with S(x) := S(j,x) :=

    2 +

    kKjk j k(x)

    1.

    We consider the corresponding diffusions with generator

    LS f(x) := (Sf)(x) := S(x)f(x) + S(x)f(x), f dom(L).Hence, the invariant measure is the Lebesgue measure on [0, 1].

    The usual Assouad cube techniques [e.g., see Korostelev and Tsybakov (1993)]give, for any estimator () and for NN, > 0, the lower bounds

    sup(,b)s

    E,b2 22

    L2([a,b]) 2j2

    2e p0,(5.1)

    sup(,b)s

    E,bb b2L2([a,b]) 2j2b e p0,(5.2)

  • 7/28/2019 Nonparametric Estimation of Scalar

    23/31

    NONPARAMETRIC ESTIMATION OF SCALAR DIFFUSIONS 2245

    where, for all , with l1 = 2 and with P := P2S,S ,

    2

    2S

    2S

    L2 , b

    S

    S

    L2 , p0

    P

    dP

    dP FN > e.We choose 2 since, for x supp(j k) with k = k ,

    S(x) S(x) = 2 j k(x)S(x)S(x)

    and S, S 12 holds uniformly so that the L2-norm is indeed of order .Equivalently, we find b 2j. Due to 2j (s+1/2), the proof of the theoremis accomplished, once we have shown that a strictly positive p0 can be chosen forfixed > 0 and the asymptotics 2jN1/(2s+3).

    5.2. Reduction to the convergence of transition densities. If we denote thetransition probability densities P(X dy |X0 = x) by p(x,y)dy and thetransition density of reflected Brownian motion by pBM, then we infer, fromProposition 6.4,

    limj

    sup{1,+1}Kj

    p pBM = 0

    due to S 1

    2C1 23j /2

    0 for s > 1. We are now going to use theestimate log(1 + x) x2 x, which is valid for all x 12 . For j so largethat 1 p

    p 12 holds, the KullbackLeibler distance can be bounded from

    above (note that the invariant measure is Lebesgue measure):

    E

    log

    dP

    dP

    FN

    = N

    k=1E logp (Xk

    1, Xk )

    p(Xk1, Xk) = N

    10

    10

    log

    p(x,y)

    p(x,y)

    p(x, y)dydx

    N1

    0

    10

    (p (x,y) p(x,y))2p(x,y)

    p(x,y) p(x,y) d y d x= N

    1

    0 1

    0

    (p (x,y) p(x,y))2p(x,y)

    d y d x

    Np1 p p2L2([0,1]2).

  • 7/28/2019 Nonparametric Estimation of Scalar

    24/31

    2246 E. GOBET, M. HOFFMANN AND M. REI

    The square root of the KullbackLeibler distance bounds the total variationdistance in order, which by the Chebyshev inequality yields

    PdPdP FN > e = 1 PdP

    dP FN 1 e 1

    1 EdPdP

    FN

    1(1 e )1

    = 1 (1 e )1(P P)|FNTV 1 CN1/2p pL2([0,1]2),

    where C > 0 is some constant independent of , N, and j. Summarizing, weneed the estimate

    limsupN,j

    N1/2p pL2([0,1]2) < C1 for 2jN1/(2s+3).(5.3)

    5.3. Convergence of the transition densities. Observe first that p pL2([0,1]2) is exactly the HilbertSchmidt norm distance P

    PHS between

    the transition operators derived from LS and LS acting on the Hilbert spaceL2([0, 1]). If we introduce

    V := f L2([0, 1]) f = 0 and V := {f L2([0, 1])|f constant},then the transition operators coincide on V and leave the space V invariant sothat P PHS = (P

    P)|VHS.

    We take advantage of the key result that for Lipschitz functions f with Lipschitzconstant on the union of the spectra of two self-adjoint bounded operatorsT1 and T2 the continuous functional calculus satisfies

    f (T1) f (T2)HS T1 T2HS;(5.4)

    see Kittaneh (1985). We proceed by bounding the HilbertSchmidt norm of thedifference of the inverses of the generators and by then transferring this bound tothe transition operators via (5.4). By the functional calculus for operators on V, thefunction f (z) = exp((z1)) sends (L|V)1 to P|V. Moreover, f is uniformlyLipschitz continuous on (, 0) due to := supz

  • 7/28/2019 Nonparametric Estimation of Scalar

    25/31

    NONPARAMETRIC ESTIMATION OF SCALAR DIFFUSIONS 2247

    Using |S1 S1 | = 2 j k for some k Kj and denoting by the primitiveof with compact support, we obtain

    (L |V)1

    (L|V)1

    2HS

    =1

    0

    10

    1y

    2 j k(v)

    v 1[x,1](v)

    dv

    2d x d y

    = 422j1

    0

    10

    (2jy k)y

    1

    y(2jv k)dv +

    2j(x y) k

    2

    d x d y

    22j(2j)2L2 222j.

    Consequently, p p2L2 2j (2s+3) holds with an arbitrarily small constantif2j (s+1/2) is chosen sufficiently small. Hence, the estimate (5.3) is valid for thischoice and the asymptotics N2j (2s+3) 1, which remained to be proved.

    6. Technical results. We shall need several technical results, mainly to de-scribe the dependence of certain quantities on the underlying diffusion parameters.

    The following result is in close analogy with Section IV.5 in Bass (1998).

    LEMMA 6.1. The second largest eigenvalue 1 of the infinitesimal generatorL,b can be bounded away from zero:

    1 infx[0,1]

    S(x) =: s0.

    This eigenvalue is simple and the corresponding eigenfunction f1 is monotone.

    PROOF. The variational characterization of 1 [Davies (1995), Section 4.5]and partial integration yield

    1 = supf=1f,1=0

    Lf, f = inff=1f,1=0

    10

    S(x)f(x)2 dx.

    Given the derivative f, the function f dom(L) with f, 1 = 0 is uniquelydetermined. Setting M(x) := ([0, x]), this function f satisfies

    0 = 10

    f (0) + x0

    f(y)dy(x)dx = f (0) + 10

    f(y)1 M(y)dy .

  • 7/28/2019 Nonparametric Estimation of Scalar

    26/31

    2248 E. GOBET, M. HOFFMANN AND M. REI

    For f, g L2() with f, 1 = g, 1 = 0 we find

    f, g = 1

    0 f (0) +

    x

    0f(y)dy

    g(0) +

    x

    0g(z)dz

    (x)dx

    = f (0)g(0) f (0)1

    0g(z)

    1 M(z)dz

    g(0)1

    0f(y)

    1 M(y)dy

    +1

    0

    10

    f(y)g(z)1 M(y z)d y d z

    = 1

    0 1

    0 M(y z) M(y)M(z)f(y)g(z)dydz=:

    10

    10

    m(y,z)f(y)g(z) dy dz.

    The kernel m(y,z) is positive on (0, 1)2 and bounded by 1, whence we obtain, byregarding u = f,

    1 = infu

    10 S(x)u(x)

    2 dx101

    0 m(y, z)u(y)u(z) dy dz s0u

    2L2

    u2L1

    s0.

    If the derivative of an eigenfunction f1 changed sign, we could write f1 =

    u+

    u

    with two nonnegative functions u+, u that are nontrivial. However, this wouldentail that the antiderivative f0 off0 := u++u satisfies Lf0, f0 = Lf1, f1,while f0 would be strictly greater than f1 due to the positivity ofm(u,v).This contradicts the variational characterization of 1 so that all eigenfunctionscorresponding to 1 are monotone. Consequently, for any two eigenfunctionsf1 and g1 the integrand in

    f1, g1 =1

    0

    10

    m(y,z)f1(y)g1(z)dydz

    does not change sign and the whole integral does not vanish. We infer thatthe eigenspace of 1 cannot contain two orthogonal functions and is thus one-dimensional.

    LEMMA 6.2. For H1, H2 L2([0, 1]) we have the following two uniformvariance estimates overs :

    Var,b

    1

    N

    Nn=1

    H1(Xn)

    N1E,b[H1(X0)2],(6.1)

    Var,b 1NN

    n=1H1X(n1)H2(Xn)N1E,b[H1(X0)2H2(X1)2].(6.2)

  • 7/28/2019 Nonparametric Estimation of Scalar

    27/31

    NONPARAMETRIC ESTIMATION OF SCALAR DIFFUSIONS 2249

    PROOF. Due to the uniform spectral gap s0 over s (Proposition 6.1),P satisfies Pf f with := e|s0| < 1 for all f L2() with

    f, 1

    =0.

    We obtain the first estimate by considering the centered random variablesf1(Xk) := H1(Xk ) E,b[H1(Xk )], k N:

    Var,b

    N

    n=1H1(Xn)

    =

    Nm,n=1

    E,b[f1(Xm)f1(Xn)] =N

    m,n=1

    f1, P

    |mn| f1

    N

    m,n=1|mn|f12 NE,b[H1(X0)2].

    The second estimate follows along the same lines. Merely observe that form > n, by the projection property of conditional expectations

    E,bH1

    X(n1)

    H2(Xn)H1

    X(m1)

    H2(Xm)

    = H1 (PH2), Pmn1 H1 (PH2)

    holds, where is the usual multiplication operator.

    LEMMA 6.3. Uniformly overs the following norm equivalence holds:

    fH1 (Id L)1/2

    f for all f H1

    .

    PROOF. The invariant measure and the function S are uniformly boundedaway from zero and infinity so that we obtain, with uniform constants for f dom(L),

    f2H1

    = f, f + f, f f, f + Sf, f= (Id L)f,f = (Id L)1/2f2.

    By an approximation argument this extends to allf

    H1

    = dom(L1/2)

    .

    PROPOSITION 6.4. Suppose ((n(), bn()) s , n 0, andlim

    nn 0 = 0, limnbn b0 = 0.

    Then the corresponding transition probabilities (p(n)t ) converge uniformly:

    limn

    p(n)t (, ) p(0)t (, )

    = 0.

    PROOF. An application of the results by Stroock and Varadhan (1971) yieldsthat the corresponding diffusion processes X(n) converge weakly to X(0) for any

  • 7/28/2019 Nonparametric Estimation of Scalar

    28/31

    2250 E. GOBET, M. HOFFMANN AND M. REI

    fixed initial value X(n)(0) = x. This implies in particular

    limn

    1

    0

    p(n)t (x,y)(y)dy

    = 1

    0

    p(0)t (x,y)(y)dy

    for all test functions L([0, 1]) and all x [0, 1].On the other hand, the functions (p(n)t )n form a relatively compact subset of

    C([0, 1]2) by Proposition 6.7 and Sobolev embeddings. Any point of accumulationof(p(n)t )n in C([0, 1]2) must equal p(0)t , which follows from testing with suitablefunctions L([0, 1]). Consequently, (p(n)t )n is a relatively compact sequencewith only one point of accumulation and thus converges.

    PROPOSITION 6.5. For the class s there is a constant > 0 such that for all

    parameters ( (),b()) the eigenvalue 1 = 1(,b) ofP is uniformly separated: (P) B(1, 2) = {1}.Furthermore, for all 0 < a < b < 1 there is a uniform constant ca,b > 0 such

    that the associated first eigenfunction u1 = u1(,b) satisfies, for all (,b) s ,min

    x[a,b]|u1(x)| ca,b .

    PROOF. Proceeding indirectly, assume that there is a sequence (n, bn) ssuch that the corresponding eigenvalues satisfy (n)1 1 (or (n)1 (n)2 0,

    resp.). By the compactness of the Sobolev embedding of Hs

    into C0

    we canpass to a uniformly converging subsequence. Hence, Proposition 6.4 yieldsthat the corresponding transition densities converge uniformly, which impliesthat the transition operators P(n) converge in operator norm on L

    2([0, 1]). ByProposition 5.6 and Theorem 5.20 in Chatelin (1983), this entails the convergenceof their eigenvalues with preservation of the multiplicities. Since the limitingoperator is again associated with an elliptic reflected diffusion, the fact that theeigenvalue 1 = e1 is always simple (Lemma 6.1) gives the contradiction.

    By the same indirect arguments, we construct transition operators P(n) on

    the spaceC

    ([0, 1]) and infer that the eigenfunctions u(n)

    1 [Chatelin (1983),Theorem 5.10], the invariant measures (n) [see (3.1)] and the inverses of thefunctions S(n) [see (3.2)] converge in supremum norm. Therefore (u(n)1 )

    =

    (n)1 (S

    (n))1

    u(n)1

    (n) also converges in supremum norm. Due to u1|[a,b] = 0(Lemma 6.1) this implies that (u(n)1 )

    cannot converge to zero on [a, b].

    LEMMA 6.6. The L2()-normalized eigenfunction uk of the generator Lcorresponding to the (k + 1)st largest eigenvalue k satisfies

    uk

    Hs

    +1

    C(s,s0,

    S1

    s ,

    s

    1)|k

    |s,

    where C is a continuous function of its arguments.

  • 7/28/2019 Nonparametric Estimation of Scalar

    29/31

    NONPARAMETRIC ESTIMATION OF SCALAR DIFFUSIONS 2251

    PROOF. We know that 1(Suk) = k uk and uk(0) = 0 holds, which imply

    uk(x) = kS1(x)x

    0uk (u)(u) du.

    Suppose uk Hr+1 with r [0, s]. Then the function uk is in Hr(s1) dueto uk Cr (Sobolev embeddings). Hence, the antiderivative is in H(r+1)s .As S1 Hs holds, the right-hand side is an element of Hs(r+1). We concludethat the regularity r ofuk is larger by 1, which implies that uk is in Hs+1.

    In quantitative terms we obtain for r [1, s], where we use the seminorm|f|s := f(s)L2 ,

    |uk|r |k |C(r)|S1|r

    0uk

    + S1

    0uk

    r

    |k |C(r)S1rukL1() + |uk|r1 |k |C(r)S1s (1 + |uk|r1 + ukr1) |k |C(r)S1s (1 + 2ukr1s1).

    By applying this estimate ukr+1 |k |S1s (1 + ukr1s1) succes-sively for r = 1, 2, . . . ,s and finally, for r = s 1, the estimate follows.

    PROPOSITION 6.7. For (,b) s the corresponding transition probabilitydensity p = p,,b satisfies

    sup(,b)s

    pHs+1Hs < .

    PROOF. The spectral decomposition ofP : L2() L2() yields

    p(x,y) = (y)

    k=0ekuk(x)uk (y), x, y [0, 1].

    Due to the uniform ellipticity and uniform boundedness of the coefficients, wehave k [C1k2, C2k2] with uniform constants C1, C2 > 0 on s [see Davies(1995), adapting Example 4.6.1, page 93, to our situation]. From the precedingLemma 6.6 and the Sobolev embedding Hs+1 Cs we infer

    pHs+1Hs

    k=0eC2k

    2uks+1uks

    k=0C(s,s0, S1s , s1)2eC2k

    2(C1k

    2)s+1s ,

    which gives the desired uniform estimate.

    Acknowledgment. The helpful comments of two anonymous referees aregratefully acknowledged.

  • 7/28/2019 Nonparametric Estimation of Scalar

    30/31

    2252 E. GOBET, M. HOFFMANN AND M. REI

    REFERENCES

    AT-S AHALIA, Y. (1996). Nonparametric pricing of interest rate derivative securities. Econometrica64 527560.

    BANON, G. (1978). Nonparametric identification for diffusion processes. SIAM J. Control Optim. 16380395.BAS S, R. F. (1998). Diffusions and Elliptic Operators. Springer, New York.BROWN, B. M. and HEWITT, J. I. (1975). Asymptotic likelihood theory for diffusion processes.

    J. Appl. Probability 12 228238.CHAPMAN, D. A. and PEARSON, N. D. (2000). Is the short rate drift actually nonlinear? J. Finance

    55 355388.CHATELIN, F. (1983). Spectral Approximation of Linear Operators. Academic Press, New York.CHE N, X., HANSEN, L. P. and SCHEINKMAN, J. A. (1997). Shape preserving spectral approximation

    of diffusions. Working paper. (Last version, November 2000.)COHEN, A. (2000). Wavelet methods in numerical analysis. In Handbook of Numerical Analysis 7

    (P. G. Ciarlet, ed.) 417711. North-Holland, Amsterdam.DAVIES, E. B. (1995). Spectral Theory and Differential Operators. Cambridge Univ. Press.ENGEL, K.-J. and NAGEL, R. (2000). One-Parameter Semigroups for Linear Evolution Equations.

    Springer, Berlin.FAN, J. and YAO, Q. (1998). Efficient estimation of conditional variance functions in stochastic

    regression. Biometrika 85 645660.FAN, J. and ZHANG, C. (2003). A re-examination of diffusion estimators with applications to

    financial model validation. J. Amer. Statist. Assoc. 98 118134.GOBET, E. (2002). LAN property for ergodic diffusions with discrete observations. Ann. Inst.

    H. Poincar Probab. Statist. 38 711737.HANSEN, L. P., SCHEINKMAN, J. A. and TOUZI, N. (1998). Spectral methods for identifying scalar

    diffusions. J. Econometrics 86 132.HOFFMANN, M. (1999). Adaptive estimation in diffusion processes. Stochastic Process. Appl. 79

    135163.KATO, T. (1995). Perturbation Theory for Linear Operators. Springer, Berlin. [Reprint of the

    corrected printing of the 2nd ed. (1980).]KESSLER, M. (1997). Estimation of an ergodic diffusion from discrete observations. Scand. J. Statist.

    24 211229.KESSLER, M. and SRENSEN, M. (1999). Estimating equations based on eigenfunctions for a

    discretely observed diffusion process. Bernoulli 5 299314.KITTANEH, F. (1985). On Lipschitz functions of normal operators. Proc. Amer. Math. Soc. 94

    416418.KOROSTELEV, A. P. and TSYBAKOV, A. B. (1993). Minimax Theory of Image Reconstruction.

    Lecture Notes in Statist. 82. Springer, Berlin.KUTOYANTS, Y. A. (1975). Local asymptotic normality for processes of diffusion type. Izv. Akad.

    Nauk Armyan. SSR Ser. Mat. 10 103112.KUTOYANTS, Y. A. (1984). On nonparametric estimation of trend coefficient in a diffusion process.

    In Statistics and Control of Stochastic Processes 230250. Nauka, Moscow.LIPTSER, R. S. and SHIRYAEV, A. N. (2001). Statistics of Random Processes 1. General Theory,

    2nd ed. Springer, Berlin.MLLER, H.-G. and STADTMLLER, U. (1987). Estimation of heteroscedasticity in regression

    analysis. Ann. Statist. 15 610625.PHAM, D. T. (1981). Nonparametric estimation of the drift coefficient in the diffusion equation.

    Math. Operationsforsch. Statist. Ser. Statist. 12 6173.

    REI, M. (2003). Simulation results for estimating the diffusion coefficient from discrete timeobservation. Available at www.mathematik.hu-berlin.de/reiss/sim-diff-est.pdf.

  • 7/28/2019 Nonparametric Estimation of Scalar

    31/31

    NONPARAMETRIC ESTIMATION OF SCALAR DIFFUSIONS 2253

    ROSENBLATT, M. (1971). Markov Processes. Structure and Asymptotic Behavior. Springer, Berlin.STANTON, R. (1997). A nonparametric model of term structure dynamics and the market price of

    interest rate risk. J. Finance 52 19732002.STROOCK, D. W. and VARADHAN, S. R. S. (1971). Diffusion processes with boundary conditions.

    Comm. Pure Appl. Math. 24 147225.STROOCK, D. W. and VARADHAN, S. R. S. (1997). Multidimensional Diffusion Processes. Springer,

    Berlin.TRIBOULEY, K. and VIENNET, G. (1998). Lp adaptive density estimation in a -mixing framework.

    Ann. Inst. H. Poincar Probab. Statist. 34 179208.YOSHIDA, N. (1992). Estimation for diffusion processes from discrete observations. J. Multivariate

    Anal. 41 220242.

    E. GOBETCENTRE DE MATHMATIQUES APPLIQUESECOLE POLYTECHNIQUE

    91128 PALAISEAU CEDEXPARISFRANCE

    M. HOFFMANNLABORATOIRE DE PROBABILITS

    ET MODLES ALATOIRES

    UNIVERSIT PARIS 6 ET 7UFR MATHMATIQUES, CASE 7012 TAGE2 P LACE JUSSIEU75251 PARIS CEDEX 05FRANCEE- MAIL: [email protected]

    M. REIINSTITUT FR MATHEMATIKHUMBOLDT-U NIVERSITT ZU BERLINUNTER DEN LINDEN 610099 BERLINGERMANY