
Journal of Machine Learning Research 1 (2017) 1-48    Submitted 20/6/2017; Published

Robust Estimation of Derivatives Using Locally Weighted Least Absolute Deviation Regression

WenWu Wang [email protected]
Ping Yu [email protected]
School of Economics and Finance
The University of Hong Kong
Pokfulam Road, Hong Kong

Lu Lin [email protected]
Zhongtai Securities Institute for Financial Studies
Shandong University
Jinan, 250100, China

Tiejun Tong [email protected]
Department of Mathematics
Hong Kong Baptist University
Kowloon Tong, Hong Kong

Editor:

    Abstract

In nonparametric regression, the derivative estimation has attracted much attention in recent years due to its wide applications. In this paper, we propose a new method for the derivative estimation using the locally weighted least absolute deviation regression. Different from the local polynomial regression, the proposed method does not require a finite variance for the error term and so is robust to the presence of heavy-tailed errors. Meanwhile, it does not require a zero median or a positive density at zero for the error term in comparison with the local median regression. We further show that the proposed estimator with random difference is asymptotically equivalent to the (infinitely) composite quantile regression estimator. The proposed method is also extended to estimate the derivatives at the boundaries and to estimate the higher-order derivatives. For the equidistant design, we derive theoretical results for the proposed estimators, including the asymptotic bias and variance, consistency, and asymptotic normality. Finally, we conduct simulation studies to demonstrate that the proposed method has better performance than the existing methods in the presence of outliers and heavy-tailed errors.

Keywords: Composite quantile regression, Differenced method, LowLAD, Outlier and heavy-tailed error, Robust nonparametric derivative estimation

    1. Introduction

The derivative estimation is an important problem in nonparametric regression and it has applications in a wide range of fields. For instance, when analyzing human growth data (Müller, 1988; Ramsay and Silverman, 2002) or maneuvering target tracking data (Li and Jilkov, 2003, 2010), the first- and second-order derivatives of the height as a function of time are two important parameters, with the first-order derivative representing the speed and the second-order derivative representing the acceleration. The derivative estimates are also needed in change-point problems, e.g., for exploring the structures of curves (Chaudhuri and Marron, 1999; Gijbels and Goderniaux, 2005), for detecting the extremum of derivatives (Newell et al., 2005), for characterizing submicroscopic nanoparticles from scattering data (Charnigo et al., 2007, 2011a), for comparing regression curves (Park and Kang, 2008), for detecting abrupt climate changes (Matyasovszky, 2011), and for inferring the cell growth rates (Swain et al., 2017).

In the existing literature, one usually obtains the derivative estimates as a by-product by taking the derivative of a nonparametric fit of the regression function. There are three main approaches for the derivative estimation: smoothing spline, local polynomial regression, and differenced estimation. For smoothing spline, the derivatives are estimated by taking derivatives of the spline estimation of the regression function (Stone, 1985; Zhou and Wolfe, 2000). For local polynomial regression, a polynomial using the Taylor expansion is fitted locally by the kernel method (Ruppert and Wand, 1994; Fan and Gijbels, 1996; Delecroix and Rosa, 1996). These two methods both require an estimate of the regression function. And as pointed out in Wang and Lin (2015), the convergence rates of their regression function estimator and their derivative estimators are often different from each other. In particular, when the regression function estimator achieves the optimal rate of convergence, the corresponding derivative estimators may not be able to achieve it. In other words, minimizing the mean square error of the regression function estimator does not necessarily guarantee that the derivatives are optimally estimated (Wahba and Wang, 1990; Charnigo et al., 2011b).

For the differenced estimation, Müller et al. (1987) and Härdle (1990) proposed a cross-validation method to estimate the first-order derivative without estimating the regression function. Unfortunately, their method may not perform well in practice as the variance of their estimator is proportional to $n^2$ when the design points are equally spaced. Inspired by this, Charnigo et al. (2011b) and De Brabanter et al. (2013) proposed a variance-reducing estimator for the derivative function called the empirical derivative that is essentially a linear combination of the symmetric difference quotients. They further derived the order of the asymptotic bias and variance, and established the consistency of the empirical derivative. Wang and Lin (2015) represented the empirical derivative as a local constant estimator in locally weighted least squares regression (LowLSR), and proposed a new estimator for the derivative function to reduce the estimation bias in both valleys and peaks of the true derivative functions. Dai et al. (2016) further proposed a method to minimize the mean square error of the derivative estimation.

The aforementioned differenced derivative estimators are all based on the least squares method. Although very elegant, the least squares method is not robust and may often be sensitive to outliers (Huber and Ronchetti, 2009). To overcome the problem, various robust methods have been proposed in the literature to improve the estimation of the regression function, see, for example, kernel M-smoother (Härdle and Gasser, 1984), local least absolute deviation (Fan and Hall, 1994; Wang and Scott, 1994), and locally weighted least squares (Cleveland, 1979; Ruppert and Wand, 1994). In contrast, little attention has been paid to improving the derivative estimation except for the parallel developments of the above remedies (Härdle and Gasser, 1985; Welsh, 1996; Boente and Rodriguez, 2006). In particular, the convergence rates of the regression function estimator and the derivative estimators remain different and so call for a better solution.

In this paper, we propose a locally weighted least absolute deviation (LowLAD) method by combining the differenced method and the L1 regression systematically. Over a neighborhood centered at a fixed point, we first obtain a sequence of linear regression representations in which the derivative is estimated as the intercept term. We then estimate the derivative in a linear model by minimizing the sum of weighted absolute errors. The proposed method does not require a finite variance for the error term and so is robust to the presence of heavy-tailed errors. Meanwhile, it does not require a zero median or a positive density at zero for the error term, since the differenced errors are guaranteed to have a zero median and a positive symmetric density in the neighborhood of zero, regardless of the density of the original errors. By repeating this local fitting over a grid of points, we can obtain the derivative estimates on a discrete set of points. Finally, the entire derivative function is obtained by applying the local polynomial regression or the cubic spline interpolation.

The rest of the paper is organized as follows. Section 2 presents the motivation, the first-order derivative estimator and its theoretical properties, including the asymptotic bias and variance, consistency, and asymptotic normality. Section 3 studies the relation between the LowLAD estimator and the existing estimators. In particular, we show that the LowLAD estimator with random difference is asymptotically equivalent to the (infinitely) composite quantile regression estimator. Section 4 derives the first-order derivative estimation at the boundaries of the domain, and Section 5 generalizes the proposed method to estimate the higher-order derivatives. We then conduct extensive simulation studies to assess the finite-sample performance of the proposed estimators and compare them with the existing competitors in Section 6. Finally, we conclude the paper with some discussions in Section 7, and provide the proofs of the theoretical results in the Appendices.

    2. First-Order Derivative Estimation

Combining the differenced method and the L1 regression, we propose the LowLAD regression to estimate the first-order derivative. The new method inherits the advantage of the differenced method and also the robustness of the L1 method.

    2.1 Motivation

    Consider the nonparametric regression model

$$Y_i = m(x_i) + \epsilon_i, \quad 1 \le i \le n, \qquad (1)$$

where $x_i = i/n$ is the design point, $Y_i$ is the response variable, $m(\cdot)$ is an unknown regression function, and $\epsilon_i$ are independent and identically distributed (iid) random errors with a continuous density $f(\cdot)$.

First-order symmetric (about $i$) difference quotients (Charnigo et al., 2011b; De Brabanter et al., 2013) are defined as

$$Y_{ij}^{(1)} = \frac{Y_{i+j} - Y_{i-j}}{x_{i+j} - x_{i-j}}, \quad 1 \le j \le k, \qquad (2)$$

where $k$ is a positive integer. We decompose $Y_{ij}^{(1)}$ into two parts as

$$Y_{ij}^{(1)} = \frac{m(x_{i+j}) - m(x_{i-j})}{2j/n} + \frac{\epsilon_{i+j} - \epsilon_{i-j}}{2j/n}, \quad 1 \le j \le k. \qquad (3)$$

On the right hand side of (3), the first term contains the bias information of the true derivative, and the second term contains the variance information. By Wang and Lin (2015), the first-order derivative estimation based on the third-order Taylor expansion usually outperforms the estimation based on the first-order Taylor expansion due to bias correction. For the same reason, we assume that $m(\cdot)$ is three times continuously differentiable on $[0, 1]$. By the Taylor expansion, we obtain

$$\frac{m(x_{i+j}) - m(x_{i-j})}{2j/n} = m^{(1)}(x_i) + \frac{m^{(3)}(x_i)}{6}\frac{j^2}{n^2} + o\!\left(\frac{j^2}{n^2}\right), \qquad (4)$$

    where the estimation bias is contained in the remainder term of the Taylor expansion.

    By (3) and (4), we have

$$Y_{ij}^{(1)} = m^{(1)}(x_i) + \frac{m^{(3)}(x_i)}{6}\frac{j^2}{n^2} + o\!\left(\frac{j^2}{n^2}\right) + \frac{\epsilon_{i+j} - \epsilon_{i-j}}{2j/n}. \qquad (5)$$

In Proposition 10 (see Appendix A), we show that the median of $\epsilon_{i+j} - \epsilon_{i-j}$ is always zero, no matter whether the median of $\epsilon_i$ is zero or not. As a result, for any fixed $k = o(n)$, we have

$$\mathrm{Median}[Y_{ij}^{(1)}] \approx m^{(1)}(x_i) + \frac{m^{(3)}(x_i)}{6} d_j^2, \quad 1 \le j \le k, \qquad (6)$$

where $d_j = j/n$. We treat (6) as a linear regression with $d_j^2$ and $Y_{ij}^{(1)}$ as the independent and dependent variables, respectively. In the presence of heavy-tailed errors, we propose to estimate $m^{(1)}(x_i)$ as the intercept of the linear regression using the LowLAD method.

    2.2 Estimation Methodology

In order to derive the estimation bias, we further assume that $m(\cdot)$ is five times continuously differentiable, that is, the regression function is two degrees smoother than our postulated model due to the equidistant design. Following the paradigm of Draper and Smith (1981) and Wang and Scott (1994), we discard the higher-order terms of $m(\cdot)$ and assume locally that the true model is

$$Y_{ij}^{(1)} \approx \beta_{i1} + \beta_{i3} d_j^2 + \beta_{i5} d_j^4 + \zeta_{ij},$$

where $\beta_i = (\beta_{i1}, \beta_{i3}, \beta_{i5})^T = (m^{(1)}(x_i), m^{(3)}(x_i)/6, m^{(5)}(x_i)/120)^T$ are the unknown coefficients of the true underlying quintic function, and $\zeta_{ij} = (\epsilon_{i+j} - \epsilon_{i-j})/(2j/n)$ with $\mathrm{Median}[\zeta_{ij}] = 0$. Under the assumption of the true model, $\beta_{i1}$ can be estimated as

$$\min_{b_i} \sum_{j=1}^{k} w_j \bigl|Y_{ij}^{(1)} - (b_{i1} + b_{i3} d_j^2 + b_{i5} d_j^4)\bigr| = \min_{b_i} \sum_{j=1}^{k} \bigl|\widetilde{Y}_{ij}^{(1)} - (b_{i1} d_j + b_{i3} d_j^3 + b_{i5} d_j^5)\bigr|,$$


where $w_j = d_j$ are the weights, $b_i = (b_{i1}, b_{i3}, b_{i5})^T$, and $\widetilde{Y}_{ij}^{(1)} = (Y_{i+j} - Y_{i-j})/2$. Accordingly,

    the true model can be rewritten as

$$\widetilde{Y}_{ij}^{(1)} \approx \beta_{i1} d_j + \beta_{i3} d_j^3 + \beta_{i5} d_j^5 + \widetilde{\zeta}_{ij},$$

where $\widetilde{\zeta}_{ij} = (\epsilon_{i+j} - \epsilon_{i-j})/2$ are iid random errors with $\mathrm{Median}[\widetilde{\zeta}_{ij}] = 0$ and a continuous, symmetric density $g(\cdot)$ which is positive in a neighborhood of zero (see Appendix A).

Rather than the best L1 quintic fitting, we search for the best L1 cubic fitting to $\widetilde{Y}_{ij}^{(1)}$.

    Specifically, we estimate the model by LowLAD:

$$(\hat{\beta}_{i1}, \hat{\beta}_{i3}) = \arg\min_{b_{i1}, b_{i3}} \sum_{j=1}^{k} \bigl|\widetilde{Y}_{ij}^{(1)} - b_{i1} d_j - b_{i3} d_j^3\bigr|,$$

and define the LowLAD estimator of $m^{(1)}(x_i)$ as

$$\hat{m}^{(1)}_{\mathrm{LowLAD}}(x_i) = \hat{\beta}_{i1}. \qquad (7)$$
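The minimization above is an L1 (median) regression of the half-differences $\widetilde{Y}_{ij}^{(1)}$ on $(d_j, d_j^3)$ without an intercept. The paper's own computations use the R package 'L1pack'; purely as an illustrative sketch (the function and variable names below are ours, not the authors'), the estimator can be obtained from any standard quantile-regression routine:

```python
import numpy as np
from statsmodels.regression.quantile_regression import QuantReg

def lowlad_first_derivative(Y, i, k, spacing):
    """LowLAD estimate of m'(x_i) at an interior point (illustrative sketch).

    Y       : responses on an equidistant design (spacing = x_{i+1} - x_i).
    i       : 0-based index of the estimation point, assumed k <= i <= len(Y)-1-k.
    k       : half-width of the local window.
    """
    j = np.arange(1, k + 1)
    d = j * spacing                              # d_j = x_{i+j} - x_i
    Ytilde = (Y[i + j] - Y[i - j]) / 2.0         # symmetric half-differences
    X = np.column_stack([d, d ** 3])             # regressors (d_j, d_j^3), no intercept
    fit = QuantReg(Ytilde, X).fit(q=0.5)         # median (L1) regression
    return fit.params[0]                         # beta_hat_{i1}, the m'(x_i) estimate
```

The weighted form on the left-hand side of the preceding display (with weights $w_j = d_j$) has the same minimizer, so the unweighted regression on the half-differences is what is actually solved.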

The following theorem reveals the asymptotic behavior of $\hat{\beta}_{i1}$.

Theorem 1  Assume that $\epsilon_i$ are iid random errors with a continuous bounded density $f(\cdot)$. Then as $k \to \infty$ and $k/n \to 0$, $\hat{\beta}_{i1}$ in (7) is asymptotically normally distributed with

$$\mathrm{Bias}[\hat{\beta}_{i1}] \approx -\frac{m^{(5)}(x_i)}{504}\frac{k^4}{n^4}, \qquad \mathrm{Var}[\hat{\beta}_{i1}] \approx \frac{75}{16 g(0)^2}\frac{n^2}{k^3},$$

where $g(0) = 2\int_{-\infty}^{\infty} f^2(x)\,dx$. The optimal $k$ that minimizes the asymptotic mean square error (AMSE) is

$$k_{\mathrm{opt}} \approx 3.26 \left(\frac{1}{g(0)^2 m^{(5)}(x_i)^2}\right)^{1/11} n^{10/11},$$

and, consequently, the minimum AMSE is given by

$$\mathrm{AMSE}[\hat{\beta}_{i1}] \approx 0.19 \left(\frac{m^{(5)}(x_i)^6}{g(0)^{16}}\right)^{1/11} n^{-8/11}.$$

In the local median regression (see Section 3.1 below for its definition), the density $f(\cdot)$ is usually assumed to have a zero median and a positive $f(0)$ value. While in Theorem 1, we only require a continuity condition on the density $f(\cdot)$. In addition, the variance of the LowLAD estimator depends on $g(0) = 2\int_{-\infty}^{\infty} f^2(x)\,dx$, which is always positive, while the variance of the local median estimator relies on the single value $f(0)$ only. In this sense, the LowLAD estimator is more robust than the local median estimator.

We now compare the LowLAD and LowLSR estimators using a simulated data set, where the LowLSR estimator of Wang and Lin (2015) is defined in Section 3.1. Figure 1 displays the LowLAD estimator (using the R package 'L1pack' of Osorio (2015)) and the LowLSR estimator with $k \in \{6, 12, 25, 30, 50\}$. The data set of size 300 is generated from model (1) with $x_i \in [0.25, 1]$, $m(x) = \sqrt{x(1-x)}\sin\bigl(2.1\pi/(x+0.05)\bigr)$, and $\epsilon_i \stackrel{iid}{\sim} N(0, 0.1^2)$.
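As an illustration of this design (a minimal sketch under the same settings, reusing the hypothetical lowlad_first_derivative helper sketched above; the random seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300
x = np.linspace(0.25, 1.0, n)                               # equidistant design on [0.25, 1]
m = np.sqrt(x * (1 - x)) * np.sin(2.1 * np.pi / (x + 0.05))
Y = m + rng.normal(scale=0.1, size=n)                       # iid N(0, 0.1^2) errors

k = 25
spacing = x[1] - x[0]
interior = np.arange(k, n - k)
deriv_hat = [lowlad_first_derivative(Y, i, k, spacing) for i in interior]
```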


Figure 1: (a) Simulated data set of size 300 from model (1) with equidistant $x_i \in [0.25, 1]$, $m(x) = \sqrt{x(1-x)}\sin\bigl(2.1\pi/(x+0.05)\bigr)$, $\epsilon_i \stackrel{iid}{\sim} N(0, 0.1^2)$, and the true regression function (bold line). (b)-(f) The first-order LowLAD derivative estimators (green points) and the first-order LowLSR derivative estimators (red dashed line) for $k \in \{6, 9, 12, 25, 30, 50\}$. As a reference, the true first-order derivative function is also plotted (bold line).

We note that both estimators have the same variation trend, whereas our estimator has a slightly larger oscillation compared to the LowLSR estimator. When $k$ is small (see Figure 1(b) and (c)), both estimators are noise-corrupted versions of the true first-order derivative. As $k$ becomes larger (see Figure 1(d)-(f)), our estimator provides a similar performance as the LowLSR estimator. Furthermore, by combining the left part of Figure 1(d), the middle part of (e) and the right part of (f), more accurate derivative estimators can be obtained for practical use.

In what follows, we compare the LowLAD and LowLSR estimators from a theoretical point of view. Both estimators have the same bias $-\frac{m^{(5)}(x_i)}{504}\frac{k^4}{n^4}$, while their variances are $\frac{75}{16 g(0)^2}\frac{n^2}{k^3}$ and $\frac{75\sigma^2}{8}\frac{n^2}{k^3}$, respectively. When $\epsilon \sim N(0, \sigma^2)$, we have $g(0) = 1/(\sqrt{\pi}\sigma)$ and

$$\frac{75}{16 g(0)^2}\frac{n^2}{k^3} = \frac{\pi}{2}\cdot\frac{75\sigma^2}{8}\frac{n^2}{k^3} > \frac{75\sigma^2}{8}\frac{n^2}{k^3}.$$

Figure 1 coincides with these theoretical results: the common variation trend is mainly due to the identical bias, while the slightly larger oscillation indicates that the LowLAD estimator has a slightly larger variance.

To provide more quantitative evidence on the relative performance of these two estimators, we consider five distributions for the random errors: $N(0, 1^2)$, $0.9N(0, 1^2)+0.1N(0, 3^2)$, $0.9N(0, 1^2)+0.1N(0, 10^2)$, the Laplace distribution, and the Cauchy distribution, which were adopted in the robust location estimators by Koenker and Bassett (1978). Table 1 lists the ratios of variances between the LowLAD and LowLSR estimators for the five cases, where Ratio = Var(LowLSR)/Var(LowLAD). The table shows that the variance of the LowLAD estimator is smaller than that of the LowLSR estimator in the presence of sharp-peak and heavy-tailed errors.

          N(0,1)   0.9N(0,1)+0.1N(0,3^2)   0.9N(0,1)+0.1N(0,10^2)   Laplace   Cauchy
Ratio      0.64            0.92                     4.85               1         ∞

Table 1: The ratio of variances between the LowLSR and LowLAD derivative estimators.
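By Theorem 1 and Corollary 3 below, Ratio = Var(LowLSR)/Var(LowLAD) = $\bigl(\tfrac{75\sigma^2}{8}\tfrac{n^2}{k^3}\bigr)/\bigl(\tfrac{75}{16g(0)^2}\tfrac{n^2}{k^3}\bigr) = 2\sigma^2 g(0)^2$ with $g(0) = 2\int f^2$. The finite-variance entries of Table 1 can be checked numerically; a minimal sketch (not the authors' code):

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

def variance_ratio(pdf, var):
    """Var(LowLSR)/Var(LowLAD) = 2 * sigma^2 * g(0)^2, with g(0) = 2 * int f^2."""
    g0 = 2 * quad(lambda x: pdf(x) ** 2, -np.inf, np.inf)[0]
    return 2 * var * g0 ** 2

print(variance_ratio(stats.norm.pdf, 1.0))                      # ~0.64  (= 2/pi)

mix = lambda x: 0.9 * stats.norm.pdf(x) + 0.1 * stats.norm.pdf(x, scale=3)
print(variance_ratio(mix, 0.9 * 1 + 0.1 * 9))                   # ~0.92

print(variance_ratio(stats.laplace.pdf, 2.0))                   # ~1.0 (standard Laplace, variance 2)
```

The Cauchy entry is infinite because the LowLSR variance does not exist, while the LowLAD variance remains finite.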

For further illustration of the trade-off between the sharp-peak and heavy-tailed errors, we consider the mixed normal errors $\epsilon_i \sim (1-\alpha)N(0, \sigma_0^2) + \alpha N(0, \sigma_1^2)$ with $\sigma_1^2 > \sigma_0^2$ and $0 < \alpha \le 1/2$. Letting $\lambda = \sigma_1/\sigma_0$, we list in Table 2 the critical values of $\lambda$ such that the LowLAD and LowLSR estimators have the same variance. The overall comparison is provided in Appendix E.

α   0.001  0.002  0.003  0.004  0.005  0.006  0.007  0.008  0.009
λ   24.04  17.10  14.04  12.22  10.99  10.09   9.38   8.82   8.36

α   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
λ   7.96   5.88   4.99   4.47   4.12   3.87   3.69   3.54   3.42

α   0.10   0.15   0.20   0.25   0.30   0.35   0.40   0.45   0.50
λ   3.32   3.01   2.86   2.79   2.77   2.78   2.83   2.92   3.05

Table 2: The critical values of λ that equate the variances of the LowLAD and LowLSR derivative estimators with different contamination.
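Each critical value solves $2\sigma^2 g(0)^2 = 1$ for the contaminated normal with $\sigma_0 = 1$ and $\sigma_1 = \lambda$. A sketch of the root-finding step (illustrative only), using the closed form of $\int f^2$ for a two-component normal mixture:

```python
import numpy as np
from scipy.optimize import brentq

SQRT_PI = np.sqrt(np.pi)

def ratio(lam, alpha):
    """Var(LowLSR)/Var(LowLAD) = 2*sigma^2*g(0)^2 for (1-a)N(0,1) + a N(0,lam^2)."""
    var = (1 - alpha) + alpha * lam ** 2
    int_f2 = ((1 - alpha) ** 2 / (2 * SQRT_PI)
              + alpha ** 2 / (2 * lam * SQRT_PI)
              + 2 * alpha * (1 - alpha) / np.sqrt(2 * np.pi * (1 + lam ** 2)))
    return 2 * var * (2 * int_f2) ** 2

for alpha in (0.01, 0.10, 0.50):
    lam = brentq(lambda l: ratio(l, alpha) - 1.0, 1.01, 100.0)
    print(alpha, round(lam, 2))          # ~7.96, 3.32, 3.05, matching Table 2
```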

    3. Comparison to Existing First-Order Derivative Estimators

We first review some existing methods for the first-order derivative estimation, based on either the least squares estimation or the quantile regression. Both methods can be used to estimate the first-order derivative of $m(\cdot)$ due to the structure of model (1). Note that $E[Y_i|x_i] = m(x_i) + E[\epsilon_i]$, and $Q_\tau(Y_i|x_i) = m(x_i) + Q_\tau(\epsilon_i)$ because $\epsilon_i$ is independent of $x_i$, where $Q_\tau(\epsilon_i)$ is the $\tau$th unconditional quantile of $\epsilon_i$. Although $E[Y_i|x_i]$ may not equal $Q_\tau(Y_i|x_i)$, their derivatives must be the same as the corresponding derivatives of $m(x_i)$. To make the existing estimators comparable to our LowLAD estimator, we use the uniform kernel throughout this section.

    3.1 LS, LowLSR and LAD Estimators

    In Fan and Gijbels (1996), the first-order derivative is estimated by the least squares method:

$$(\hat{\alpha}^{LS}_{i0}, \hat{\alpha}^{LS}_{i1}, \hat{\alpha}^{LS}_{i2}, \hat{\alpha}^{LS}_{i3}, \hat{\alpha}^{LS}_{i4}) = \arg\min_{\alpha} \sum_{j=-k}^{k} \bigl(Y_{i+j} - \alpha_{i0} - \alpha_{i1} d_j - \alpha_{i2} d_j^2 - \alpha_{i3} d_j^3 - \alpha_{i4} d_j^4\bigr)^2,$$

where $\alpha = (\alpha_{i0}, \alpha_{i1}, \alpha_{i2}, \alpha_{i3}, \alpha_{i4})^T$. For ease of comparison, we exclude the point $j = 0$ so that the same $Y_i$'s are used as in the LowLAD estimation. Define the least squares (LS) estimator as

$$\hat{m}^{(1)}_{LS}(x_i) = \hat{\alpha}^{LS}_{i1}. \qquad (8)$$

Corollary 2  Under the assumptions of Theorem 1, the bias and variance of the LS estimator in (8) are, respectively,

$$\mathrm{Bias}[\hat{\alpha}^{LS}_{i1}] \approx -\frac{m^{(5)}(x_i)}{504}\frac{k^4}{n^4}, \qquad \mathrm{Var}[\hat{\alpha}^{LS}_{i1}] \approx \frac{75\sigma^2}{8}\frac{n^2}{k^3}.$$

To derive the optimal convergence rate of the derivative estimation, Wang and Lin (2015) proposed the LowLSR estimator:

$$\hat{m}^{(1)}_{\mathrm{LowLSR}}(x_i) = \hat{\alpha}^{\mathrm{LowLSR}}_{i1}, \qquad (9)$$

where

$$(\hat{\alpha}^{\mathrm{LowLSR}}_{i1}, \hat{\alpha}^{\mathrm{LowLSR}}_{i3}) = \arg\min_{\alpha_{i1}, \alpha_{i3}} \sum_{j=1}^{k} \bigl(\widetilde{Y}_{ij}^{(1)} - \alpha_{i1} d_j - \alpha_{i3} d_j^3\bigr)^2.$$
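Unlike the L1 fit, this least squares problem has a closed form. A minimal sketch mirroring the earlier LowLAD helper (names are ours, for illustration only):

```python
import numpy as np

def lowlsr_first_derivative(Y, i, k, spacing):
    """LowLSR estimate of m'(x_i): ordinary least squares of the symmetric
    half-differences on (d_j, d_j^3) with no intercept (illustrative sketch)."""
    j = np.arange(1, k + 1)
    d = j * spacing
    Ytilde = (Y[i + j] - Y[i - j]) / 2.0
    X = np.column_stack([d, d ** 3])
    coef, *_ = np.linalg.lstsq(X, Ytilde, rcond=None)
    return coef[0]
```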

Corollary 3  Under the assumptions of Theorem 1, the bias and variance of the LowLSR estimator in (9) are, respectively,

$$\mathrm{Bias}[\hat{\alpha}^{\mathrm{LowLSR}}_{i1}] \approx -\frac{m^{(5)}(x_i)}{504}\frac{k^4}{n^4}, \qquad \mathrm{Var}[\hat{\alpha}^{\mathrm{LowLSR}}_{i1}] \approx \frac{75\sigma^2}{8}\frac{n^2}{k^3}.$$

Wang and Scott (1994) proposed the local polynomial least absolute deviation (LAD) estimator:

$$\hat{m}^{(1)}_{LAD}(x_i) = \hat{\beta}^{LAD}_{i1}, \qquad (10)$$

where $\beta = (\beta_{i0}, \beta_{i1}, \beta_{i2}, \beta_{i3}, \beta_{i4})^T$ and

$$(\hat{\beta}^{LAD}_{i0}, \hat{\beta}^{LAD}_{i1}, \hat{\beta}^{LAD}_{i2}, \hat{\beta}^{LAD}_{i3}, \hat{\beta}^{LAD}_{i4}) = \arg\min_{\beta} \sum_{j=-k}^{k} \bigl|Y_{i+j} - \beta_{i0} - \beta_{i1} d_j - \beta_{i2} d_j^2 - \beta_{i3} d_j^3 - \beta_{i4} d_j^4\bigr|.$$


Corollary 4  Under the assumptions of Theorem 1, the bias and variance of the LAD estimator in (10) are, respectively,

$$\mathrm{Bias}[\hat{\beta}^{LAD}_{i1}] \approx -\frac{m^{(5)}(x_i)}{504}\frac{k^4}{n^4}, \qquad \mathrm{Var}[\hat{\beta}^{LAD}_{i1}] \approx \frac{75}{32 f(0)^2}\frac{n^2}{k^3}.$$

Now we can see one key difference between the LS method and the LAD method. For the LS method, the LS estimator and the LowLSR estimator are asymptotically equivalent, while for the LAD method, the asymptotic variances of the LAD estimator and the LowLAD estimator are very different, although their asymptotic biases are the same. To understand this discrepancy, we show here that the objective functions of the LS and LowLSR estimation are asymptotically equivalent, while those of the LAD and LowLAD estimation are not. Note that the objective function of the LowLSR estimation is equivalent to

$$4\sum_{j=1}^{k}\bigl(\widetilde{Y}_{ij}^{(1)} - \alpha_{i1} d_j - \alpha_{i3} d_j^3\bigr)^2 = \sum_{j=1}^{k}\bigl(Y_{i+j} - Y_{i-j} - 2\alpha_{i1} d_j - 2\alpha_{i3} d_j^3\bigr)^2$$
$$= \sum_{j=1}^{k}\bigl(Y_{i+j} - Y_{i-j} - (\alpha_{i0} - \alpha_{i0}) - \alpha_{i1}(d_j - d_{-j}) - \alpha_{i2}(d_j^2 - d_{-j}^2) - \alpha_{i3}(d_j^3 - d_{-j}^3) - \alpha_{i4}(d_j^4 - d_{-j}^4)\bigr)^2$$
$$= \sum_{j=1}^{k}\bigl(Y_{i+j} - \alpha_{i0} - \alpha_{i1} d_j - \alpha_{i2} d_j^2 - \alpha_{i3} d_j^3 - \alpha_{i4} d_j^4\bigr)^2 + \sum_{j=1}^{k}\bigl(Y_{i-j} - \alpha_{i0} - \alpha_{i1} d_{-j} - \alpha_{i2} d_{-j}^2 - \alpha_{i3} d_{-j}^3 - \alpha_{i4} d_{-j}^4\bigr)^2$$
$$\quad - 2\sum_{j=1}^{k}\bigl(Y_{i+j} - \alpha_{i0} - \alpha_{i1} d_j - \alpha_{i2} d_j^2 - \alpha_{i3} d_j^3 - \alpha_{i4} d_j^4\bigr)\bigl(Y_{i-j} - \alpha_{i0} - \alpha_{i1} d_{-j} - \alpha_{i2} d_{-j}^2 - \alpha_{i3} d_{-j}^3 - \alpha_{i4} d_{-j}^4\bigr),$$

where $d_{-j} = -j/n$. It can be shown that the cross term in the last equality is a higher-order term and is negligible, and the first two terms constitute exactly the objective function of the least squares estimation. More specifically, we can show that

$$\hat{m}^{(1)}_{LS}(x_i) - m^{(1)}(x_i) - \mathrm{bias} \approx (0,1,0,0,0)\Bigl(\sum_{j=-k}^{k} X_j X_j'\Bigr)^{-1}\sum_{j=-k}^{k} X_j \epsilon_{i+j} = \sum_{j=-k}^{k} D_j \epsilon_{i+j}$$

with $X_j = (1, d_j, d_j^2, d_j^3, d_j^4)^T$ and

$$D_j = n\,\frac{j(75k^4 + 150k^3 - 75k + 25) - j^3(105k^2 + 105k - 35)}{k(8k^6 + 28k^5 + 14k^4 - 35k^3 - 28k^2 + 7k + 6)} = -D_{-j},$$

and

$$\hat{m}^{(1)}_{\mathrm{LowLSR}}(x_i) - m^{(1)}(x_i) - \mathrm{bias} \approx (1,0)\left[\begin{pmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{pmatrix}\Bigl(\sum_{j=1}^{k} X_j X_j'\Bigr)\begin{pmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{pmatrix}^T\right]^{-1}\begin{pmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{pmatrix}\sum_{j=1}^{k} X_j \frac{\epsilon_{i+j} - \epsilon_{i-j}}{2}$$
$$= \sum_{j=1}^{k} 2 D_j \frac{\epsilon_{i+j} - \epsilon_{i-j}}{2} = \sum_{j=-k}^{k} D_j \epsilon_{i+j}.$$

The influence functions of $\hat{m}^{(1)}_{LS}(x_i)$ and $\hat{m}^{(1)}_{\mathrm{LowLSR}}(x_i)$ are exactly the same, so the contribution of the cross term is null.

    On the contrary, the objective function of the LowLAD estimator is equivalent to

$$2\sum_{j=1}^{k}\bigl|\widetilde{Y}_{ij}^{(1)} - \beta_{i1} d_j - \beta_{i3} d_j^3\bigr| = \sum_{j=1}^{k}\bigl|Y_{i+j} - Y_{i-j} - (\beta_{i0} - \beta_{i0}) - \beta_{i1}(d_j - d_{-j}) - \beta_{i2}(d_j^2 - d_{-j}^2) - \beta_{i3}(d_j^3 - d_{-j}^3) - \beta_{i4}(d_j^4 - d_{-j}^4)\bigr|$$
$$= \sum_{j=1}^{k}\bigl|Y_{i+j} - \beta_{i0} - \beta_{i1} d_j - \beta_{i2} d_j^2 - \beta_{i3} d_j^3 - \beta_{i4} d_j^4\bigr| + \sum_{j=1}^{k}\bigl|Y_{i-j} - \beta_{i0} - \beta_{i1} d_{-j} - \beta_{i2} d_{-j}^2 - \beta_{i3} d_{-j}^3 - \beta_{i4} d_{-j}^4\bigr| + \text{extra term},$$

    where the extra term is equal to

$$-2\max\Bigl(\mathrm{sign}\bigl(Y_{i+j} - \beta_{i0} - \beta_{i1} d_j - \beta_{i2} d_j^2 - \beta_{i3} d_j^3 - \beta_{i4} d_j^4\bigr)\cdot\mathrm{sign}\bigl(Y_{i-j} - \beta_{i0} - \beta_{i1} d_{-j} - \beta_{i2} d_{-j}^2 - \beta_{i3} d_{-j}^3 - \beta_{i4} d_{-j}^4\bigr),\, 0\Bigr)$$
$$\cdot\min\Bigl(\bigl|Y_{i+j} - \beta_{i0} - \beta_{i1} d_j - \beta_{i2} d_j^2 - \beta_{i3} d_j^3 - \beta_{i4} d_j^4\bigr|,\, \bigl|Y_{i-j} - \beta_{i0} - \beta_{i1} d_{-j} - \beta_{i2} d_{-j}^2 - \beta_{i3} d_{-j}^3 - \beta_{i4} d_{-j}^4\bigr|\Bigr),$$

and cannot be neglected, where $\mathrm{sign}(x) = 1$ if $x > 0$ and $\mathrm{sign}(x) = -1$ if $x < 0$. More specifically, we can show that

$$\hat{m}^{(1)}_{LAD}(x_i) - m^{(1)}(x_i) - \mathrm{bias} \approx (0,1,0,0,0)\Bigl(2f(0)\sum_{j=-k}^{k} X_j X_j'\Bigr)^{-1}\sum_{j=-k}^{k} X_j\,\mathrm{sign}(\epsilon_{i+j})$$
$$= \frac{1}{2f(0)}\sum_{j=-k}^{k} D_j\,\mathrm{sign}(\epsilon_{i+j}) = \frac{1}{f(0)}\sum_{j=1}^{k} D_j\,\frac{\mathrm{sign}(\epsilon_{i+j}) - \mathrm{sign}(\epsilon_{i-j})}{2},$$

    and

$$\hat{m}^{(1)}_{\mathrm{LowLAD}}(x_i) - m^{(1)}(x_i) - \mathrm{bias} \approx (1,0)\left[2g(0)\begin{pmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{pmatrix}\Bigl(\sum_{j=1}^{k} X_j X_j'\Bigr)\begin{pmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{pmatrix}^T\right]^{-1}\begin{pmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{pmatrix}\sum_{j=1}^{k} X_j\,\mathrm{sign}\Bigl(\frac{\epsilon_{i+j} - \epsilon_{i-j}}{2}\Bigr)$$
$$= \frac{1}{g(0)}\sum_{j=1}^{k} D_j\,\mathrm{sign}\Bigl(\frac{\epsilon_{i+j} - \epsilon_{i-j}}{2}\Bigr).$$

    So the contribution of the extra term to the influence function is

$$D_j\left[\frac{\mathrm{sign}\bigl(\tfrac{\epsilon_{i+j} - \epsilon_{i-j}}{2}\bigr)}{g(0)} - \frac{\mathrm{sign}(\epsilon_{i+j}) - \mathrm{sign}(\epsilon_{i-j})}{2f(0)}\right] = \frac{D_j}{g(0)}\begin{cases} \mathrm{sign}(\epsilon_{i+j} - \epsilon_{i-j}), & \text{if } \epsilon_{i+j}\epsilon_{i-j} > 0, \\ \bigl(1 - \tfrac{g(0)}{f(0)}\bigr)\mathrm{sign}(\epsilon_{i+j}), & \text{if } \epsilon_{i+j}\epsilon_{i-j} < 0, \end{cases}$$

which is not null, where we neglect the event that $\epsilon_{i+j}\epsilon_{i-j} = 0$ because the probability of such an event is zero.


    3.2 Quantile Regression Estimators

Quantile regression (Koenker and Bassett, 1978; Koenker, 2005) exploits the distribution information of the error term to improve the estimation efficiency. Following the composite quantile regression (CQR) in Zou and Yuan (2008), Kai et al. (2010) proposed the local polynomial CQR estimator. In general, the local polynomial CQR estimator of $m^{(1)}(x_i)$ is defined as

$$\hat{m}^{(1)}_{CQR}(x_i) = \hat{\gamma}^{CQR}_{i1}, \qquad (11)$$

where

$$\bigl(\{\hat{\gamma}^{CQR}_{i0h}\}_{h=1}^{q}, \hat{\gamma}^{CQR}_{i1}, \hat{\gamma}^{CQR}_{i2}, \hat{\gamma}^{CQR}_{i3}, \hat{\gamma}^{CQR}_{i4}\bigr) = \arg\min_{\gamma} \sum_{h=1}^{q}\left[\sum_{j=-k}^{k} \rho_{\tau_h}\bigl(Y_{i+j} - \gamma_{i0h} - \gamma_{i1} d_j - \gamma_{i2} d_j^2 - \gamma_{i3} d_j^3 - \gamma_{i4} d_j^4\bigr)\right],$$

with $\gamma = (\{\gamma_{i0h}\}_{h=1}^{q}, \gamma_{i1}, \gamma_{i2}, \gamma_{i3}, \gamma_{i4})^T$, $\rho_\tau(x) = \tau x - xI(x < 0)$ the check function, and $\tau_h = h/(q+1)$.

Corollary 5  Under the assumptions of Theorem 1, the bias and variance of the CQR estimator in (11) are, respectively,

$$\mathrm{Bias}[\hat{\gamma}^{CQR}_{i1}] \approx -\frac{m^{(5)}(x_i)}{504}\frac{k^4}{n^4}, \qquad \mathrm{Var}[\hat{\gamma}^{CQR}_{i1}] \approx \frac{75 R_1(q)}{8}\frac{n^2}{k^3},$$

where $R_1(q) = \sum_{l=1}^{q}\sum_{l'=1}^{q} \tau_{ll'} \big/ \{\sum_{l=1}^{q} f(c_l)\}^2$, $c_l = F^{-1}(\tau_l)$, and $\tau_{ll'} = \min\{\tau_l, \tau_{l'}\} - \tau_l \tau_{l'}$. As $q \to \infty$,

$$R_1(q) \to \frac{1}{12(E[f(\epsilon)])^2} = \frac{1}{3 g(0)^2}, \qquad \mathrm{Var}[\hat{\gamma}^{CQR}_{i1}] \approx \frac{75}{24 g(0)^2}\frac{n^2}{k^3}.$$
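For standard normal errors, $R_1(q)$ can be evaluated directly and compared with the limit $1/\{3g(0)^2\} = \pi/3 \approx 1.047$ (since $g(0) = 1/\sqrt{\pi}$). A small numerical check (a sketch, with illustrative names):

```python
import numpy as np
from scipy import stats

def R1(q, dist=stats.norm):
    """R1(q) = sum_{l,l'} tau_{ll'} / (sum_l f(c_l))^2 for tau_l = l/(q+1)."""
    tau = np.arange(1, q + 1) / (q + 1)
    c = dist.ppf(tau)                                   # c_l = F^{-1}(tau_l)
    tau_ll = np.minimum.outer(tau, tau) - np.outer(tau, tau)
    return tau_ll.sum() / dist.pdf(c).sum() ** 2

for q in (5, 9, 19, 99):
    print(q, R1(q))        # approaches pi/3 ~ 1.047 as q grows
```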

Zhao and Xiao (2014) proposed the weighted quantile average (WQA) estimator for the regression function in nonparametric regression, an idea originated from Koenker (1984). We now extend the WQA method to estimate $m^{(1)}(x_i)$ using the local polynomial quantile regression. Specifically, we define

$$\hat{m}^{(1)}_{WQA}(x_i) = \sum_{h=1}^{q} w_h \hat{\gamma}^{WQA}_{i1h}, \qquad (12)$$

where $\sum_{h=1}^{q} w_h = 1$, and

$$(\hat{\gamma}^{WQA}_{i0h}, \hat{\gamma}^{WQA}_{i1h}, \hat{\gamma}^{WQA}_{i2h}, \hat{\gamma}^{WQA}_{i3h}, \hat{\gamma}^{WQA}_{i4h}) = \arg\min_{\gamma_h} \sum_{j=-k}^{k} \rho_{\tau_h}\bigl(Y_{i+j} - \gamma_{i0h} - \gamma_{i1h} d_j - \gamma_{i2h} d_j^2 - \gamma_{i3h} d_j^3 - \gamma_{i4h} d_j^4\bigr)$$

with $\gamma_h = (\gamma_{i0h}, \gamma_{i1h}, \gamma_{i2h}, \gamma_{i3h}, \gamma_{i4h})^T$.


Corollary 6  Under the assumptions of Theorem 1, the bias and variance of the WQA estimator in (12) are, respectively,

$$\mathrm{Bias}[\hat{m}^{(1)}_{WQA}(x_i)] \approx -\frac{m^{(5)}(x_i)}{504}\frac{k^4}{n^4}, \qquad \mathrm{Var}[\hat{m}^{(1)}_{WQA}(x_i)] \approx \frac{75 R_2(q|w)}{8}\frac{n^2}{k^3},$$

where $R_2(q|w) = w^T H w$ with $w = (w_1, \dots, w_q)^T$, and $H = \bigl\{\frac{\tau_{ll'}}{f(c_l) f(c_{l'})}\bigr\}_{1 \le l, l' \le q}$. The optimal weights are given by $w^* = \frac{H^{-1} e_q}{e_q^T H^{-1} e_q}$, where $e_q = (1, \dots, 1)^T_{q \times 1}$, and $R_2(q|w^*) = (e_q^T H^{-1} e_q)^{-1}$. As $q \to \infty$, under the regularity assumption in Theorem 6.2 of Zhao and Xiao (2014), the variance of the optimal WQA is

$$\mathrm{Var}[\hat{m}^{(1)}_{WQA}(x_i)] \approx \frac{75 I(f)^{-1}}{8}\frac{n^2}{k^3},$$

where $I(f)$ is the Fisher information of $f$.

                  LS      LowLSR   LAD          LowLAD        CQR          WQA         ML
Asymptotic s^2    σ^2     σ^2      1/(4f(0)^2)  1/(2g(0)^2)   1/(3g(0)^2)  I(f)^{-1}   I(f)^{-1}
s^2               σ^2     σ^2      1/(4f(0)^2)  1/(2g(0)^2)   R_1(q)       R_2(q)      I(f)^{-1}

Table 3: The variances $s^2\,\frac{75n^2}{8k^3}$ of the existing first-order derivative estimators.

For the LS, LowLSR, LAD, CQR, WQA and LowLAD estimators with the same $k$, their asymptotic biases are all the same. In contrast, their asymptotic variances are $\frac{75n^2}{8k^3}s^2$ with $s^2$ being $\sigma^2$, $\sigma^2$, $\frac{1}{4f(0)^2}$, $R_1(q)$, $R_2(q)$ and $\frac{1}{2g(0)^2}$, respectively. From the kernel interpretation of the differenced estimator by Wang and Yu (2017), we expect an equivalence between the LS and LowLSR estimators. As $q \to \infty$, we have $R_1(q) \to 1/\{3g(0)^2\}$ and $R_2(q) \to I(f)^{-1}$, and hence the WQA estimator is the most efficient estimator as $q$ becomes large. Nevertheless, it requires the error density function to be known in advance to carry out the most efficient WQA estimation; otherwise, a two-step procedure is needed, where the first step estimates the error density. For a fixed $q$, the three quantile-based estimators (i.e., LAD, CQR, WQA) depend only on the density values of $f(\cdot)$ at finitely many quantile points, whose behaviors are uncertain and may not be reliable. In contrast, our new estimator relies on $g(0) = 2E[f(\epsilon)]$, which incorporates all information on the density $f(\cdot)$ and hence is more robust.

    3.3 Equivalence to the Infinitely CQR Estimator

From the asymptotic variances, we can see that LowLAD has a close relationship with CQR. We now show that the LowLAD estimator with random difference is asymptotically equivalent to the infinitely CQR estimator. First, define the first-order random difference sequence as

$$Y_{ijl} = Y_{i+j} - Y_{i+l}, \quad -k \le j, l \le k,$$

where we implicitly exclude $j$ and/or $l$ being 0 to be comparable with the LowLAD estimator. Second, define the LowLAD estimator with random difference, referred to as the


RLowLAD estimator of $m^{(1)}(x_i)$, as

$$\hat{m}^{(1)}_{\mathrm{RLowLAD}}(x_i) = \hat{\beta}^{\mathrm{RLowLAD}}_{i1}, \qquad (13)$$

where

$$(\hat{\beta}^{\mathrm{RLowLAD}}_{i1}, \hat{\beta}^{\mathrm{RLowLAD}}_{i2}, \hat{\beta}^{\mathrm{RLowLAD}}_{i3}, \hat{\beta}^{\mathrm{RLowLAD}}_{i4}) = \arg\min_{b} \sum_{j=-k}^{k} \sum_{l=-k,\, l \neq j}^{k}$$


where $\hat{\gamma}^{WQA}_{i1} = (\hat{\gamma}^{WQA}_{i11}, \cdots, \hat{\gamma}^{WQA}_{i1q})^T$, $W$ is a symmetric weight matrix, and $w = \frac{W e_q}{e_q^T W e_q}$.

When $W = I_q$, the $q \times q$ identity matrix, we get the equally-weighted WQA estimator; when $W = H^{-1}$ we get the optimally weighted WQA estimator; and when

$$W = a I_q + (1 - a) H^{-1} \neq I_q,$$

we get an estimator that is asymptotically equivalent to the CQR estimator, where

$$a = \frac{-(q - B)(1 - BC) + \sqrt{(q^2 - AB)(1 - BC)}}{A - (2q - B)(1 - BC) - q^2 C} \neq 1$$

with $A = e_q^T H e_q = q^2 R_2(q|w_E)$, $B = R_2(q|w^*)^{-1}$ and $C = R_1(q)$. For example, if $\epsilon \sim N(0, 1)$, then when $q = 5$, $a = -0.367$; when $q = 9$, $a = -0.165$; when $q = 19$, $a = -0.067$; and when $q = 99$, $a = -0.011$. This is why imposing constraints directly on the objective function (i.e., the CQR estimator) or on the resulting estimators (i.e., $e_q^T \hat{\gamma}^{WQA}_{i1}$) would generate different estimators. The RLowLAD estimator and the CQR estimator have the same asymptotic variance because both of them impose constraints directly on the objective function.

    4. Derivative Estimation at Boundaries

At the left boundary with $2 \le i \le k$, the bias and variance of the LowLAD estimator are $-m^{(5)}(x_i)(i-1)^4/(504 n^4)$ and $75 n^2/(16 g(0)^2 (i-1)^3)$, respectively. At the endpoint $i = 1$, the LowLAD estimator does not exist. Similar results hold for the estimation at the right boundary with $n - k + 1 \le i \le n - 1$. In the presence of outliers and heavy-tailed errors, the estimation variance at the boundaries may be even larger. In this section, we propose an asymmetrical LowLAD method (As-LowLAD) to further reduce the estimation variance as well as to improve the finite-sample performance at the boundaries.

Assume that $m(\cdot)$ is twice continuously differentiable on $[0, 1]$. For $1 \le i \le k$, we define the asymmetric lag-$j$ first-order difference quotients as

$$\bar{Y}_{ij} = \frac{Y_{i+j} - Y_i}{x_{i+j} - x_i}, \quad -(i-1) \le j \le k, \; j \neq 0.$$

By decomposing $\bar{Y}_{ij}$ into two parts, we have

$$\bar{Y}_{ij} = \frac{m(x_{i+j}) - m(x_i)}{j/n} + \frac{\epsilon_{i+j} - \epsilon_i}{j/n} \approx m^{(1)}(x_i) + (-\epsilon_i) d_j^{-1} + \frac{\epsilon_{i+j}}{j/n} + \frac{m^{(2)}(x_i)}{2!}\frac{j}{n},$$

where $\epsilon_i$ is fixed as $j$ increases. Thus, $\mathrm{Median}(\bar{Y}_{ij}\,|\,\epsilon_i) = m^{(1)}(x_i) + (-\epsilon_i) d_j^{-1} + \frac{m^{(2)}(x_i)}{2!}\frac{j}{n}$. Ignoring the last term, we can rewrite the above model as

$$\bar{Y}_{ij} \approx \beta_{i1} + \beta_{i0} d_j^{-1} + \delta_{ij}, \quad -(i-1) \le j \le k, \; j \neq 0,$$

where $(\beta_{i0}, \beta_{i1})^T = (-\epsilon_i, m^{(1)}(x_i))^T$. By the LowLAD method, the regression coefficients are estimated by


$$(\hat{\beta}_{i0}, \hat{\beta}_{i1}) = \arg\min_{\beta_{i0}, \beta_{i1}} \sum_{j=-(i-1)}^{k} \bigl|\bar{Y}_{ij} - \beta_{i1} - \beta_{i0} d_j^{-1}\bigr|\, w_j = \arg\min_{\beta_{i0}, \beta_{i1}} \sum_{j=-(i-1)}^{k} \bigl|\widetilde{Y}_{ij} - \beta_{i0} - \beta_{i1} d_j\bigr|, \qquad (14)$$

where $w_j = d_j$ and $\widetilde{Y}_{ij} = Y_{i+j} - Y_i$. The As-LowLAD estimator of $m^{(1)}(x_i)$ is $\hat{\beta}_{i1}$.
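A minimal sketch of this boundary fit, mirroring the interior helper above (illustrative names; indices are 0-based, and the weighted form is implemented through the second, unweighted representation in (14)):

```python
import numpy as np
from statsmodels.regression.quantile_regression import QuantReg

def as_lowlad_first_derivative(Y, i, k, spacing):
    """As-LowLAD estimate of m'(x_i) near the left boundary (illustrative sketch).

    Uses the nonzero offsets -i, ..., -1, 1, ..., k (the 0-based analogue of
    -(i-1) <= j <= k, j != 0) and an LAD regression of Y_{i+j} - Y_i on (1, d_j);
    the slope on d_j estimates m'(x_i), the intercept estimates -eps_i."""
    j = np.r_[np.arange(-i, 0), np.arange(1, k + 1)]
    d = j * spacing
    Ytilde = Y[i + j] - Y[i]
    X = np.column_stack([np.ones_like(d), d])
    fit = QuantReg(Ytilde, X).fit(q=0.5)
    return fit.params[1]
```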

Similarly to Theorem 1, we can prove the asymptotic normality for $\hat{\beta}_{i1}$. The following theorem derives its asymptotic bias and variance.

Theorem 8  Assume that $\epsilon_i$ are iid random errors with median 0 and a continuous, positive density $f(\cdot)$ in a neighborhood of zero. Furthermore, assume that $m(\cdot)$ is twice continuously differentiable on $[0, 1]$. Then for each $1 \le i \le k$, the bias and variance of $\hat{\beta}_{i1}$ in (14) are, respectively,

$$\mathrm{Bias}[\hat{\beta}_{i1}] \approx \frac{m^{(2)}(x_i)}{2}\,\frac{k^4 + 2k^3 i - 2k i^3 - i^4}{n(k^3 + 3k^2 i + 3k i^2 + i^3)}, \qquad \mathrm{Var}[\hat{\beta}_{i1}] \approx \frac{3}{f(0)^2}\,\frac{n^2}{k^3 + 3k^2 i + 3k i^2 + i^3}.$$

For the estimation at the boundaries, Wang and Lin (2015) proposed a one-side LowLSR (OS-LowLSR) estimator of the first-order derivative, with the bias and variance being $m^{(2)}(x_i)k/(2n)$ and $12\sigma^2 n^2/k^3$, respectively. In contrast, if we consider the one-side LowLAD (OS-LowLAD) estimator of the first-order derivative, its bias and variance are $m^{(2)}(x_i)k/(2n)$ and $3n^2/(f(0)^2 k^3)$, respectively. Note that the two biases are the same, while the variances satisfy the same proportional relationship as the LowLSR and LowLAD estimators at an interior point:

$$\frac{3}{f(0)^2}\frac{n^2}{k^3} = \frac{\pi}{2}\cdot\frac{12\sigma^2 n^2}{k^3} > \frac{12\sigma^2 n^2}{k^3}$$

when $\epsilon_i \sim N(0, \sigma^2)$.

Theorem 8 shows that our estimator has a smaller bias than the OS-LowLSR and OS-LowLAD estimators. In the special case $i = k$, the bias disappears (in fact it reduces to the higher-order term $O(k^2/n^2)$). With normal errors, when $1 < i < \lfloor 0.163k \rfloor$, the order of variances of the three estimators is $\mathrm{Var}(\hat{m}^{(1)}_{\mathrm{OS-LAD}}(x_i)) > \mathrm{Var}(\hat{m}^{(1)}_{\mathrm{AS-LAD}}(x_i)) > \mathrm{Var}(\hat{m}^{(1)}_{\mathrm{OS-LSR}}(x_i))$, where for a real number $x$, $\lfloor x \rfloor$ denotes the largest integer less than $x$; when $\lfloor 0.163k \rfloor < i < k$, the order of variances of the three estimators is $\mathrm{Var}(\hat{m}^{(1)}_{\mathrm{AS-LAD}}(x_i)) < \mathrm{Var}(\hat{m}^{(1)}_{\mathrm{OS-LAD}}(x_i)) < \mathrm{Var}(\hat{m}^{(1)}_{\mathrm{OS-LSR}}(x_i))$. As $i$ approaches $k$, the variance of the As-LowLAD estimator is reduced to one-eighth of the variance of the OS-LowLAD estimator, and is much smaller than the variance of the OS-LowLSR estimator.

Up to now, we have the first-order derivative estimators $\{\hat{m}^{(1)}(x_i)\}_{i=1}^{n}$ on the discrete points $\{x_i\}_{i=1}^{n}$. To estimate the first-order derivative function, we suggest two strategies for different noise levels of the derivative data: the cubic spline interpolation in Knott (2000) for 'good' derivative estimators, and the local polynomial regression in Brown and Levine (2007) and De Brabanter et al. (2013) for 'bad' derivative estimators. Here, the terms 'good' and 'bad' indicate small and large estimation variances of the derivative estimator, respectively.

    5. Second- and Higher-Order Derivative Estimation

In this section, we generalize the robust method for the first-order derivative estimation to the second- and higher-order derivatives estimation.

    5.1 Second-Order Derivative Estimation

Define the second-order difference quotients as

$$Y_{ij}^{(2)} = \frac{Y_{i-j} - 2Y_i + Y_{i+j}}{j^2/n^2}, \quad 1 \le j \le k. \qquad (15)$$

Assume that $m(\cdot)$ is six times continuously differentiable. We decompose (15) into two parts and simplify it by the Taylor expansion as

$$Y_{ij}^{(2)} = \frac{m(x_{i-j}) - 2m(x_i) + m(x_{i+j})}{j^2/n^2} + \frac{\epsilon_{i-j} - 2\epsilon_i + \epsilon_{i+j}}{j^2/n^2} \approx m^{(2)}(x_i) + \frac{m^{(4)}(x_i)}{12}\frac{j^2}{n^2} + \frac{m^{(6)}(x_i)}{360}\frac{j^4}{n^4} + \frac{\epsilon_{i-j} - 2\epsilon_i + \epsilon_{i+j}}{j^2/n^2}.$$

Since $i$ is fixed as $j$ varies, the conditional expectation of $Y_{ij}^{(2)}$ given $\epsilon_i$ is

$$E\bigl[Y_{ij}^{(2)}\,\big|\,\epsilon_i\bigr] \approx m^{(2)}(x_i) + \frac{m^{(4)}(x_i)}{12}\frac{j^2}{n^2} + \frac{m^{(6)}(x_i)}{360}\frac{j^4}{n^4} + (-2\epsilon_i)\frac{n^2}{j^2}.$$

This results in the true regression model as

$$Y_{ij}^{(2)} \approx \alpha_{i2} + \alpha_{i4} d_j^2 + \alpha_{i6} d_j^4 + \alpha_{i0} d_j^{-2} + \delta_{ij}, \quad 1 \le j \le k,$$

where $\alpha_i = (\alpha_{i0}, \alpha_{i2}, \alpha_{i4}, \alpha_{i6})^T = (-2\epsilon_i, m^{(2)}(x_i), m^{(4)}(x_i)/12, m^{(6)}(x_i)/360)^T$, and the errors $\delta_{ij} = \frac{\epsilon_{i+j} + \epsilon_{i-j}}{j^2/n^2}$. If $\epsilon_i$ has a symmetric density about zero, then $\widetilde{\delta}_{ij} = \frac{j^2}{n^2}\delta_{ij} = \epsilon_{i+j} + \epsilon_{i-j}$ has median 0 (see Appendix D). Following the similar estimation as in the first-order derivative estimation, our LowLAD estimator of $m^{(2)}(x_i)$ is defined as

$$\hat{m}^{(2)}_{\mathrm{LowLAD}}(x_i) = \hat{\alpha}_{i2}, \qquad (16)$$

where

$$(\hat{\alpha}_{i0}, \hat{\alpha}_{i2}, \hat{\alpha}_{i4})^T = \arg\min_{\alpha_{i0}, \alpha_{i2}, \alpha_{i4}} \sum_{j=1}^{k} \bigl|Y_{ij}^{(2)} - (\alpha_{i2} + \alpha_{i4} d_j^2 + \alpha_{i0} d_j^{-2})\bigr|\, W_j = \arg\min_{\alpha_{i0}, \alpha_{i2}, \alpha_{i4}} \sum_{j=1}^{k} \bigl|\widetilde{Y}_{ij}^{(2)} - (\alpha_{i0} + \alpha_{i2} d_j^2 + \alpha_{i4} d_j^4)\bigr|$$

with $W_j = d_j^2$ and $\widetilde{Y}_{ij}^{(2)} = Y_{i-j} - 2Y_i + Y_{i+j}$. The following theorem shows that $\hat{m}^{(2)}_{\mathrm{LowLAD}}(x_i)$ behaves similarly as $\hat{m}^{(1)}_{\mathrm{LowLAD}}(x_i)$.
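A minimal sketch of the second-order fit, in the same style as the first-order helper (illustrative names, not the authors' code):

```python
import numpy as np
from statsmodels.regression.quantile_regression import QuantReg

def lowlad_second_derivative(Y, i, k, spacing):
    """LowLAD estimate of m''(x_i) at an interior point (illustrative sketch)."""
    j = np.arange(1, k + 1)
    d = j * spacing
    Ytilde2 = Y[i - j] - 2 * Y[i] + Y[i + j]                 # second-order differences
    X = np.column_stack([np.ones_like(d), d ** 2, d ** 4])   # alpha_{i0}, alpha_{i2}, alpha_{i4}
    fit = QuantReg(Ytilde2, X).fit(q=0.5)                    # median (L1) regression
    return fit.params[1]                                     # alpha_hat_{i2} = m''(x_i) estimate
```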

Theorem 9  Assume that $\epsilon_i$ are iid random errors whose density $f(\cdot)$ is continuous and symmetric about zero. Then as $k \to \infty$ and $k/n \to 0$, $\hat{\alpha}_{i2}$ in (16) is asymptotically normally distributed with

$$\mathrm{Bias}[\hat{\alpha}_{i2}] \approx -\frac{m^{(6)}(x_i)}{792}\frac{k^4}{n^4}, \qquad \mathrm{Var}[\hat{\alpha}_{i2}] \approx \frac{2205}{16 h(0)^2}\frac{n^4}{k^5},$$

where $h(0) = \int_{-\infty}^{\infty} f^2(x)\,dx$. The optimal $k$ that minimizes the AMSE is

$$k_{\mathrm{opt}} \approx 3.93 \left(\frac{1}{h(0)^2 m^{(6)}(x_i)^2}\right)^{1/13} n^{12/13},$$

and, consequently, the minimum AMSE is given by

$$\mathrm{AMSE}[\hat{\alpha}_{i2}] \approx 0.24 \left(\frac{m^{(6)}(x_i)^{10}}{h(0)^{16}}\right)^{1/13} n^{-8/13}.$$

Following the simulation settings in Figure 1, Figure 2 displays the LowLAD and LowLSR estimators for the second-order derivative. The finite-sample performance of the two estimators is parallel to that in the first-order derivative estimation. For the theoretical results, the LowLAD and LowLSR estimators have the same asymptotic bias $-\frac{m^{(6)}(x_i)}{792}\frac{k^4}{n^4}$, while their asymptotic variances are $\frac{2205}{16 h(0)^2}\frac{n^4}{k^5}$ and $\frac{2205\sigma^2}{8}\frac{n^4}{k^5}$, respectively. When $\epsilon \sim N(0, \sigma^2)$, we have $h(0) = 1/(2\sqrt{\pi}\sigma)$, so

$$\frac{2205}{64 h(0)^2}\frac{n^4}{k^5} = \frac{\pi}{2}\cdot\frac{2205\sigma^2}{8}\frac{n^4}{k^5} > \frac{2205\sigma^2}{8}\frac{n^4}{k^5},$$

which matches the simulation results in Figure 2.

    5.2 Higher-Order Derivative Estimation

We now propose the robust method for estimating the higher-order derivatives $m^{(l)}(x_i)$ with $l > 2$ via a two-step procedure. In the first step, we construct a sequence of symmetric difference quotients in which the higher-order derivative is the intercept of the linear regression derived by the Taylor expansion, and in the second step, we estimate the higher-order derivative using the LowLAD method.

When $l$ is odd, let $d = (l+1)/2$. We linearly combine $m(x_{i\pm j})$ subject to

$$\sum_{h=1}^{d} \bigl[a_{jd+h}\, m(x_{i+jd+h}) + a_{-(jd+h)}\, m(x_{i-(jd+h)})\bigr] = m^{(l)}(x_i) + O\!\left(\frac{j}{n}\right), \quad 0 \le j \le k,$$

where $k$ is a positive integer. We can derive a total of $2d$ equations through the Taylor expansion to solve the $2d$ unknown parameters. Define

$$Y_{ij}^{(l)} = \sum_{h=1}^{d} \bigl[a_{jd+h}\, Y_{i+jd+h} + a_{-(jd+h)}\, Y_{i-(jd+h)}\bigr],$$


Figure 2: (a) Simulated data set of size 300 as in Figure 1. (b)-(d) The second-order LowLAD derivative estimator (green points) and the second-order LowLSR derivative estimator (red dashed line) for $k \in \{25, 35, 60\}$. As a reference, the true second-order derivative curve is also plotted (bold line).

and consider the linear regression

$$Y_{ij}^{(l)} = m^{(l)}(x_i) + \delta_{ij}, \quad 1 \le j \le k,$$

where $\delta_{ij} = \sum_{h=1}^{d} \bigl[a_{i,jd+h}\,\epsilon_{i+jd+h} + a_{i,-(jd+h)}\,\epsilon_{i-(jd+h)}\bigr] + O(j/n)$.

When $l$ is even, let $d = l/2$. We linearly combine $m(x_{i\pm j})$ subject to

$$b_j\, m(x_i) + \sum_{h=1}^{d} \bigl[a_{jd+h}\, m(x_{i+jd+h}) + a_{-(jd+h)}\, m(x_{i-(jd+h)})\bigr] = m^{(l)}(x_i) + O\!\left(\frac{j}{n}\right), \quad 0 \le j \le k,$$


where $k$ is a positive integer. We can derive a total of $2d+1$ equations through the Taylor expansion to solve the $2d+1$ unknown parameters. Define

$$Y_{ij}^{(l)} = b_j Y_i + \sum_{h=1}^{d} \bigl[a_{jd+h}\, Y_{i+jd+h} + a_{-(jd+h)}\, Y_{i-(jd+h)}\bigr],$$

and consider the linear regression

$$Y_{ij}^{(l)} = m^{(l)}(x_i) + b_j \epsilon_i + \delta_{ij}, \quad 1 \le j \le k,$$

where $\delta_{ij} = \sum_{h=1}^{d} \bigl[a_{i,jd+h}\,\epsilon_{i+jd+h} + a_{i,-(jd+h)}\,\epsilon_{i-(jd+h)}\bigr] + O(j/n)$.

When $k$ is large, we suggest keeping the $j^2/n^2$ term as in (6) to reduce the estimation bias. If $\sum_{h=1}^{d} \bigl[a_{i,jd+h}\,\epsilon_{i+jd+h} + a_{i,-(jd+h)}\,\epsilon_{i-(jd+h)}\bigr]$ has median zero, then we can obtain the higher-order derivative estimators by the LowLAD method and deduce their asymptotic properties by similar arguments as in the previous sections. To save space, we omit the technical details in this paper. A sketch of the coefficient construction is given below.
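The combination coefficients above are obtained by matching Taylor coefficients. As an illustration only (not the paper's exact construction), one can solve a Vandermonde-type linear system for weights on a chosen set of offsets so that the weighted combination of responses reproduces $m^{(l)}(x_i)$ up to higher-order terms; for odd $l$ the offsets are symmetric and exclude 0, matching the $2d$ unknowns described above:

```python
import numpy as np
from math import factorial

def derivative_weights(offsets, l, spacing):
    """Weights w_s such that sum_s w_s * m(x + s*spacing) ~ m^(l)(x) + higher-order terms.

    Solves sum_s w_s (s*spacing)^p / p! = 1{p == l} for p = 0, ..., len(offsets) - 1,
    which matches the Taylor expansion of m at x (requires l < len(offsets))."""
    offsets = np.asarray(offsets, dtype=float)
    p = np.arange(len(offsets))
    fact = np.array([factorial(int(q)) for q in p], dtype=float)
    A = (offsets[None, :] * spacing) ** p[:, None] / fact[:, None]
    e = np.zeros(len(offsets))
    e[l] = 1.0
    return np.linalg.solve(A, e)

# Example: weights for the third derivative from the offsets {-2, -1, 1, 2}
w = derivative_weights([-2, -1, 1, 2], l=3, spacing=0.01)
```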

    6. Simulation Studies

In this section, we conduct more simulations to further evaluate the finite-sample performance of the first- and second-order derivative estimators and compare them with three existing methods.

    6.1 First-Order Derivative Estimators

    We first consider the following two regression functions:

$$m_1(x) = \sin(2\pi x) + \cos(2\pi x) + \log(4/3 + x), \quad x \in [-1, 1],$$
$$m_2(x) = 32 e^{-8(1-2x)^2}(1 - 2x), \quad x \in [0, 1].$$

These two functions were also considered in Hall (2010) and De Brabanter et al. (2013). We simulate the data set from model (1) with $n = 500$ and the errors generated as follows: 90% of the errors come from $\epsilon \sim N(0, \sigma^2)$ with $\sigma = 0.1$, and the remaining 10% come from $\epsilon \sim N(0, \sigma_1^2)$ with $\sigma_1 = 1$ or 10, corresponding to the low or high contamination level. Figures 3 and 4 present the finite-sample performance of the first-order derivative estimators for the regression functions $m_1$ and $m_2$, respectively. They show that our estimated curves of the first-order derivative fit the true curves more accurately than the LowLSR estimator in the presence of sharp-peak and heavy-tailed errors. The heavier the tail, the more significant the improvement.

We also compute the mean absolute errors to further assess the performance of different methods. Since the oscillation of the periodic function depends on the frequency and amplitude, we consider the sine function as the regression function,

$$m_3(x) = A\sin(2\pi f x), \quad x \in [0, 1].$$

The errors are generated in the same way as in the previous simulation study. We consider three sample sizes: $n = 50$, 200 and 1000, two standard deviations: $\sigma = 0.1$ and 0.5, two frequencies: $f = 0.5$ and 1, and two amplitudes: $A = 1$ and 10. We consider the following adjusted mean absolute error (AMAE) as the criterion for simplicity and robustness:

$$\mathrm{AMAE}(k) = \frac{1}{n - 2k}\sum_{i=k+1}^{n-k} \bigl|\hat{m}^{(1)}(x_i) - m^{(1)}(x_i)\bigr|.$$

Based on the condition $k < n/4$ and the AMAE, we choose the optimal $k$ value as

$$\hat{k} = \min\Bigl\{\arg\min_{k} \mathrm{AMAE}(k),\ \frac{n}{4}\Bigr\}.$$
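A minimal sketch of this criterion and the selection of $k$ (simulation-only, since the true derivative is required; the 'estimator' argument can be any of the helpers sketched earlier):

```python
import numpy as np

def amae(Y, m_deriv_true, k, spacing, estimator):
    """AMAE(k): mean absolute error of derivative estimates over the interior points."""
    n = len(Y)
    idx = np.arange(k, n - k)                       # 0-based analogue of i = k+1, ..., n-k
    est = np.array([estimator(Y, i, k, spacing) for i in idx])
    return np.mean(np.abs(est - m_deriv_true[idx]))

def select_k(Y, m_deriv_true, spacing, estimator):
    """k_hat = min{argmin_k AMAE(k), n/4}, searching over k < n/4."""
    n = len(Y)
    k_grid = np.arange(2, n // 4)
    scores = [amae(Y, m_deriv_true, k, spacing, estimator) for k in k_grid]
    return min(int(k_grid[int(np.argmin(scores))]), n // 4)
```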

Table 4 reports the simulation results based on 1000 repetitions. The numbers outside and inside the brackets represent the mean and standard deviation of the AMAE, respectively. It is evident that our robust estimator performs better than the LowLSR estimator for the first-order derivative in most cases, except for the case with $\sigma_1 = 1$ and $\sigma = 0.5$. Also, for the cases with $\sigma_1 = 2\sigma$, the contamination is very light and thus the simulation result is similar to the cases with normal errors in Section 2.2. In the other cases, the contamination always dominates the variance.

                      LowLAD      LowLSR      LowLAD      LowLSR      LowLAD      LowLSR
A    f    σ           n = 50      n = 50      n = 200     n = 200     n = 1000    n = 1000

σ1 = 1
1    0.5  0.1         0.61(0.19)  1.13(0.53)  0.29(0.06)  0.62(0.20)  0.13(0.02)  0.28(0.09)
          0.5         2.45(0.56)  2.07(0.62)  1.27(0.25)  1.08(0.31)  0.58(0.11)  0.49(0.15)
     1    0.1         0.61(0.18)  1.12(0.53)  0.29(0.06)  0.61(0.21)  0.13(0.02)  0.28(0.08)
          0.5         2.45(0.54)  2.08(0.62)  1.28(0.25)  1.08(0.31)  0.58(0.11)  0.49(0.13)
10   0.5  0.1         0.45(0.14)  0.84(0.43)  0.29(0.06)  0.61(0.20)  0.13(0.02)  0.28(0.82)
          0.5         1.87(0.50)  1.58(0.56)  1.27(0.24)  1.07(0.31)  0.58(0.11)  0.49(0.13)
     1    0.1         0.62(0.15)  0.95(0.41)  0.33(0.06)  0.65(0.21)  0.21(0.03)  0.32(0.08)
          0.5         1.89(0.46)  1.60(0.51)  1.28(0.24)  1.08(0.30)  0.60(0.11)  0.51(0.14)

σ1 = 10
1    0.5  0.1         0.67(0.55)  7.82(4.50)  0.30(0.06)  5.95(2.10)  0.13(0.03)  2.69(0.78)
          0.5         2.43(0.82)  8.07(4.10)  1.47(0.29)  5.82(1.97)  0.66(0.12)  2.75(0.84)
     1    0.1         0.65(0.47)  7.85(4.42)  0.30(0.06)  5.74(1.97)  0.13(0.02)  2.69(0.78)
          0.5         2.43(0.80)  7.95(4.24)  1.47(0.30)  5.97(2.03)  0.66(0.13)  2.76(0.78)
10   0.5  0.1         0.67(0.62)  7.84(4.36)  0.30(0.06)  5.98(2.10)  0.13(0.02)  2.71(0.78)
          0.5         2.39(0.82)  7.97(4.28)  1.47(0.30)  5.92(2.05)  0.66(0.13)  2.75(0.79)
     1    0.1         0.78(0.47)  7.87(4.56)  0.34(0.06)  5.91(2.03)  0.21(0.03)  2.74(0.80)
          0.5         2.50(0.95)  8.04(4.42)  1.48(0.29)  5.78(1.97)  0.67(0.12)  2.64(0.75)

Table 4: AMAE for the first-order derivative estimation of the function m3.

    6.2 Second-Order Derivative Estimators

To assess the finite-sample performance of the second-order derivative estimators, we consider the same regression functions as in Section 6.1. Figures 5 and 6 present the estimated curves and the true curves of the second-order derivatives of $m_1$ and $m_2$, respectively. They show that our LowLAD estimator fits the true curves more accurately than the LowLSR estimator in all settings.

We further compare our method with two well-known methods: the local polynomial regression with $p = 5$ (using the R package 'locpol' of Cabrera (2012)) and the penalized smoothing splines with norder = 6 and method = 4 (using the R package 'pspline' of Ramsay and Ripley (2013)). For simplicity, we consider the simple version of $m_3$ with $A = 5$:

$$m_4(x) = 5\sin(2\pi x), \quad x \in [0, 1].$$

We let $n = 500$, and the errors are generated in the same way as in Section 6.1. With 1000 repetitions, the simulation results in Figures 7 and 8 indicate that our robust estimator is superior to the existing methods in the presence of sharp-peak and heavy-tailed errors.

    7. Conclusion and Extensions

In this paper, we propose a robust differenced method for estimating the first- and higher-order derivatives of the regression function in nonparametric models. The new method consists of two main steps: first construct a sequence of symmetric (or random) difference quotients, and then estimate the derivatives using the LowLAD regression. The new derivative estimator does not require a zero median or a positive density at zero for the error term. Also, the modified version of the new estimator with random difference is asymptotically equivalent to the (infinitely) composite quantile regression estimator. We further extend the robust differenced method to the derivative estimation at boundary points and the higher-order derivatives estimation.

In this paper we focus on the derivative estimation with fixed designs and iid errors. With minor technical extension, the proposed method can be extended to random designs with heteroskedastic errors. Further extensions to semiparametric estimation and change-point estimation are also possible. Following the discussions at the end of Section 3.3, it is also interesting to study which kinds of weights should be imposed on the objective functions in CQR and on the $Y_{ijl}$ terms in RLowLAD to generate the same efficiency as the optimal WQA estimator. These questions will be investigated in a separate paper.

8. Acknowledgments

    The research was supported by NNSF projects (11571204 and 11231005) of China.

    References

G. Boente and D. Rodriguez. Robust estimators of high order derivatives of regression functions. Statistics & Probability Letters, 76(13):1335–1344, 2006.

L.D. Brown and M. Levine. Variance estimation in nonparametric regression via the difference sequence method. The Annals of Statistics, 35(5):2219–2232, 2007.

J.L.O. Cabrera. locpol: Kernel local polynomial regression. R package version 0.6-0, 2012. URL http://mirrors.ustc.edu.cn/CRAN/web/packages/locpol/index.html.

R. Charnigo, M. Francoeur, M.P. Mengüç, A. Brock, M. Leichter, and C. Srinivasan. Derivatives of scattering profiles: tools for nanoparticle characterization. Journal of the Optical Society of America, 24(9):2578–2589, 2007.

R. Charnigo, M. Francoeur, M.P. Mengüç, B. Hall, and C. Srinivasan. Estimating quantitative features of nanoparticles using multiple derivatives of scattering profiles. Journal of Quantitative Spectroscopy and Radiative Transfer, 112:1369–1382, 2011a.

R. Charnigo, B. Hall, and C. Srinivasan. A generalized Cp criterion for derivative estimation. Technometrics, 53(3):238–253, 2011b.

P. Chaudhuri and J.S. Marron. SiZer for exploration of structures in curves. Journal of the American Statistical Association, 94(447):807–823, 1999.

W.S. Cleveland. Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association, 74(368):829–836, 1979.

W. Dai, T. Tong, and M.G. Genton. Optimal estimation of derivatives in nonparametric regression. Journal of Machine Learning Research, 17(164):1–25, 2016.

K. De Brabanter, J. De Brabanter, B. De Moor, and I. Gijbels. Derivative estimation with local polynomial fitting. Journal of Machine Learning Research, 14(1):281–301, 2013.

M. Delecroix and A.C. Rosa. Nonparametric estimation of a regression function and its derivatives under an ergodic hypothesis. Journal of Nonparametric Statistics, 6(4):367–382, 1996.

N.R. Draper and H. Smith. Applied Regression Analysis. Wiley and Sons, New York, 2nd edition, 1981.

J. Fan and I. Gijbels. Local Polynomial Modelling and Its Applications. Chapman & Hall, London, 1996.

J. Fan and P. Hall. On curve estimation by minimizing mean absolute deviation and its implications. The Annals of Statistics, 22(2):867–885, 1994.

S. Ghosal, A. Sen, and A.W. van der Vaart. Testing monotonicity of regression. The Annals of Statistics, 28(4):1054–1082, 2000.

I. Gijbels and A.C. Goderniaux. Data-driven discontinuity detection in derivatives of a regression function. Communications in Statistics-Theory and Methods, 33(4):851–871, 2005.

B. Hall. Nonparametric Estimation of Derivatives with Applications. PhD thesis, University of Kentucky, Lexington, Kentucky, 2010.

W. Härdle. Applied Nonparametric Regression. Cambridge University Press, Cambridge, 1990.

W. Härdle and T. Gasser. Robust non-parametric function fitting. Journal of the Royal Statistical Society, Series B, 46(1):42–51, 1984.

W. Härdle and T. Gasser. On robust kernel estimation of derivatives of regression functions. Scandinavian Journal of Statistics, 12(3):233–240, 1985.

P.J. Huber and E.M. Ronchetti. Robust Statistics. John Wiley & Sons, Inc., 2009.

B. Kai, R. Li, and H. Zou. Local composite quantile regression smoothing: an efficient and safe alternative to local polynomial regression. Journal of the Royal Statistical Society, Series B, 72(1):49–69, 2010.

G.D. Knott. Interpolating Cubic Splines. Springer, 1st edition, 2000.

R. Koenker. A note on L-estimation for linear models. Statistics & Probability Letters, 2:323–325, 1984.

R. Koenker. Quantile Regression. Cambridge University Press, New York, 2005.

R. Koenker and G. Bassett. Regression quantiles. Econometrica, 46:33–50, 1978.

X.R. Li and V.P. Jilkov. Survey of maneuvering target tracking. Part I: Dynamic models. IEEE Transactions on Aerospace and Electronic Systems, 39(4):1333–1364, 2003.

X.R. Li and V.P. Jilkov. Survey of maneuvering target tracking. Part II: Motion models of ballistic and space targets. IEEE Transactions on Aerospace and Electronic Systems, 46(1):96–119, 2010.

I. Matyasovszky. Detecting abrupt climate changes on different time scales. Theoretical and Applied Climatology, 105:445–454, 2011.

H.G. Müller. Nonparametric Regression Analysis of Longitudinal Data. Springer, New York, 1988.

H.G. Müller, U. Stadtmüller, and T. Schmitt. Bandwidth choice and confidence intervals for derivatives of noisy data. Biometrika, 74(4):743–749, 1987.

J. Newell, J. Einbeck, N. Madden, and K. McMillan. Model free endurance markers based on the second derivative of blood lactate curves. In Proceedings of the 20th International Workshop on Statistical Modelling, pages 357–364, Sydney, 2005.

F. Osorio. L1pack: Routines for L1 estimation. R package version 0.3, 2015. URL http://www.ies.ucv.cl/l1pack/.

A. Pakes and D. Pollard. Simulation and the asymptotics of optimization estimators. Econometrica, 57(5):1027–1057, 1989.

C. Park and K.H. Kang. SiZer analysis for the comparison of regression curves. Computational Statistics & Data Analysis, 52(8):3954–3970, 2008.

J. Porter and P. Yu. Regression discontinuity designs with unknown discontinuity points: Testing and estimation. Journal of Econometrics, 189:132–147, 2015.

J. Ramsay and B. Ripley. pspline: Penalized smoothing splines. R package version 1.0-16, 2013. URL http://mirrors.ustc.edu.cn/CRAN/web/packages/pspline/index.html.

J.O. Ramsay and B.W. Silverman. Applied Functional Data Analysis: Methods and Case Studies. Springer, New York, 2002.

D. Ruppert and M.P. Wand. Multivariate locally weighted least squares regression. The Annals of Statistics, 22(3):1346–1370, 1994.

C.J. Stone. Additive regression and other nonparametric models. The Annals of Statistics, 13(2):689–705, 1985.

P.S. Swain, K. Stevenson, A. Leary, L.F. Montano-Gutierrez, I.B.N. Clark, J. Vogel, and T. Pilizota. Inferring time derivatives including cell growth rates using Gaussian processes. Nature Communications, in press, 2017.

G. Wahba and Y. Wang. When is the optimal regularization parameter insensitive to the choice of the loss function? Communications in Statistics-Theory and Methods, 19:1685–1700, 1990.

F.T. Wang and D.W. Scott. The L1 method for robust nonparametric regression. Journal of the American Statistical Association, 89(425):65–76, 1994.

W.W. Wang and L. Lin. Derivative estimation based on difference sequence via locally weighted least squares regression. Journal of Machine Learning Research, 16:2617–2641, 2015.

W.W. Wang and P. Yu. Asymptotically optimal differenced estimators of error variance in nonparametric regression. Computational Statistics & Data Analysis, 105:125–143, 2017.

A.H. Welsh. Robust estimation of smooth regression and spread functions and their derivatives. Statistica Sinica, 6:347–366, 1996.

Z. Zhao and Z. Xiao. Efficient regression via optimally combining quantile information. Econometric Theory, 30(6):1272–1314, 2014.

S. Zhou and D.A. Wolfe. On derivative estimation in spline regression. Statistica Sinica, 10(1):93–108, 2000.

H. Zou and M. Yuan. Composite quantile regression and the oracle model selection theory. The Annals of Statistics, 36(3):1108–1126, 2008.


    Appendix A. Proof of Theorem 1

Proposition 10  If $\epsilon_i$ are iid with a continuous, positive density $f(\cdot)$ in a neighborhood of the median, then $\widetilde{\zeta}_{ij} = (\epsilon_{i+j} - \epsilon_{i-j})/2$ $(j = 1, \dots, k)$ are iid with median 0 and a continuous, positive, symmetric density $g(\cdot)$, where

$$g(x) = 2\int_{-\infty}^{\infty} f(2x + \epsilon) f(\epsilon)\, d\epsilon.$$

Proof of Proposition 10  The distribution of $\widetilde{\zeta}_{ij} = (\epsilon_{i+j} - \epsilon_{i-j})/2$ is

$$F_{\widetilde{\zeta}_{ij}}(x) = P\bigl((\epsilon_{i+j} - \epsilon_{i-j})/2 \le x\bigr) = \iint_{\epsilon_{i+j} \le 2x + \epsilon_{i-j}} f(\epsilon_{i+j}) f(\epsilon_{i-j})\, d\epsilon_{i+j}\, d\epsilon_{i-j}$$
$$= \int_{-\infty}^{\infty}\Bigl\{\int_{-\infty}^{2x + \epsilon_{i-j}} f(\epsilon_{i+j})\, d\epsilon_{i+j}\Bigr\} f(\epsilon_{i-j})\, d\epsilon_{i-j} = \int_{-\infty}^{\infty} F(2x + \epsilon_{i-j}) f(\epsilon_{i-j})\, d\epsilon_{i-j},$$

where $F(\cdot)$ denotes the distribution function of $\epsilon_i$.

    Then the density of ζ̃ij is

    g(x) ,dFζ̃ij (x)

    dx= 2

    ∫ ∞−∞

    f(2x+ �i−j)f(�i−j)d�i−j .

    The density g(·) is symmetric due to

    g(−x) = 2∫ ∞−∞

    f(−2x+ �i−j)f(�i−j)d�i−j

    = 2

    ∫ ∞−∞

    f(�i−j)f(�i−j + 2x)d�i−j

    = g(x).

    Therefore, we have

    Fζ̃ij (0) =

    ∫ ∞−∞

    f(�i−j)f(�i−j)d�i−j =1

    2f2(�i−j) |∞−∞=

    1

    2,

    g(0) = 2

    ∫ ∞−∞

    f2(�i−j)d�i−j .
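To make Proposition 10 concrete, the following R sketch (an illustration added for exposition, not part of the proof) simulates the half-differences $\tilde{\zeta}_{ij} = (\epsilon_{i+j} - \epsilon_{i-j})/2$ under an assumed standard normal error distribution and compares a kernel estimate of their density at zero with the closed form $g(0) = 2\int f^2(\epsilon)\,d\epsilon = 1/\sqrt{\pi}$.

\begin{verbatim}
## Illustrative check of Proposition 10 (assumed N(0,1) errors; not the authors' code).
set.seed(1)
n    <- 1e6
eps1 <- rnorm(n)                 # plays the role of epsilon_{i+j}
eps2 <- rnorm(n)                 # plays the role of epsilon_{i-j}
zeta <- (eps1 - eps2) / 2        # half-differences, with density g(.)
g0_closed <- 2 * integrate(function(e) dnorm(e)^2, -Inf, Inf)$value  # = 1/sqrt(pi)
d <- density(zeta)
g0_kde <- approx(d$x, d$y, xout = 0)$y                               # kernel estimate of g(0)
c(closed_form = g0_closed, kernel_estimate = g0_kde, target = 1 / sqrt(pi))
\end{verbatim}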

Proof of Theorem 1 Rewrite the objective function as
$$S_n(b) = \frac{1}{n} \sum_j f_n\big(\tilde{Y}_{ij} \mid b\big),$$


where
$$f_n\big(\tilde{Y}_{ij} \mid b\big) = \Big|\tilde{Y}^{(1)}_{ij} - b_{i1} d_j - b_{i3} d_j^3\Big|\, \frac{1}{h}\, 1(0 < d_j \le h)$$
with $b = (b_{i1}, b_{i3})^T$ and $h = k/n$. Define $X_j = (d_j, d_j^3)^T$ and $H = \mathrm{diag}\{h, h^3\}$. Note that $\arg\min_b S_n(b) = \arg\min_b [S_n(b) - S_n(\beta)]$, where $\beta = (m^{(1)}(x_i), m^{(3)}(x_i)/6)^T$.

We first show that $H(\hat{\beta} - \beta) = o_p(1)$, where $\hat{\beta} = (\hat{\beta}_{i1}, \hat{\beta}_{i3})^T$. We use Lemma 4 of Porter and Yu (2015) to show the consistency. Essentially, we need to show that

(i) $\sup_{b \in B} |S_n(b) - S_n(\beta) - E[S_n(b) - S_n(\beta)]| \stackrel{p}{\longrightarrow} 0$,

(ii) $\inf_{\|H(b - \beta)\| > \delta} E[S_n(b) - S_n(\beta)] > \varepsilon$ for $n$ large enough, where $B$ is a compact parameter space for $\beta$, and $\delta$ and $\varepsilon$ are fixed positive small numbers.

We use Lemma 2.8 of Pakes and Pollard (1989) to show (i), where
$$\mathcal{F}_n = \big\{ f_n(\tilde{Y} \mid b) - f_n(\tilde{Y} \mid \beta) : b \in B \big\}.$$
Note that $\mathcal{F}_n$ is Euclidean (see, e.g., Definition 2.7 of Pakes and Pollard (1989) for the definition of a Euclidean class of functions) by applying Lemma 2.13 of Pakes and Pollard (1989), where $\alpha = 1$, $f(\cdot, t_0) = 0$, $\phi(\cdot) = \|X_j\| \frac{1}{h} 1(0 < d_j \le h)$ and the envelope function is $F_n(\cdot) = M\phi(\cdot)$ for some finite constant $M$. Since $E[F_n] = E\big[\|X_j\| \frac{1}{h} 1(0 < d_j \le h)\big] = O(h)$, (i) follows. As to $\inf_{\|H(b-\beta)\| > \delta} E[S_n(b) - S_n(\beta)]$, by Proposition 1 of Wang and Scott (1994),
$$\begin{aligned}
E[S_n(b) - S_n(\beta)]
&\approx \frac{1}{n} \sum_j g(0) \big[X_j^T H^{-1} H(b - \beta)\big]^2 \frac{1}{h} 1(0 < d_j \le h) \\
&\quad - \frac{1}{n} \sum_j 2 g(0) \Big[\frac{m(d_{i+j}) - m(d_{i-j})}{2} - X_j^T \beta\Big] \big[X_j^T H^{-1} H(b - \beta)\big] \frac{1}{h} 1(0 < d_j \le h) \\
&\gtrsim \delta^2 - h^5 \delta,
\end{aligned}$$
where $\gtrsim$ means the left side is bounded below by a constant times the right side.

We then derive the asymptotic distribution of $\sqrt{nh}\, H(\hat{\beta} - \beta)$ by applying the empirical process technique. First, the first order conditions can be written as
$$\frac{1}{n} \sum_j \mathrm{sign}\big(\tilde{Y}^{(1)}_{ij} - Z_j^T H \hat{\beta}\big)\, Z_j\, \frac{\sqrt{h}}{h}\, 1(0 < d_j \le h) = o_p(1),$$


which is denoted as
$$\frac{1}{n} \sum_j f_n'(\tilde{Y}^{(1)}_{ij} \mid \hat{\beta}) \triangleq S_n'(\hat{\beta}) = o_p(1),$$
where $Z_j = H^{-1} X_j$. By Example 2.9 of Pakes and Pollard (1989), $\mathcal{F}_n'$ forms a Euclidean class of functions with envelope $F_n' = \|Z_j\| \frac{\sqrt{h}}{h} 1(0 < d_j \le h)$, where $\mathcal{F}_n' = \big\{ f_n'(\tilde{Y}^{(1)}_{ij} \mid b) : b \in B \big\}$, and $E[F_n'^2]$ ...


and the variance is
$$\mathrm{Var}[\hat{\beta}_{i1}] \approx \frac{1}{4 g(0)^2} [1, 0] \Big(\sum_j X_j X_j^T\Big)^{-1} \begin{bmatrix} 1 \\ 0 \end{bmatrix} \approx \frac{75}{16 g(0)^2} \frac{n^2}{k^3}.$$

Combining the squared bias and the variance, we obtain the AMSE
$$\mathrm{AMSE}[\hat{\beta}_{i1}] = \frac{m^{(5)}(x_i)^2}{504^2} \frac{k^8}{n^8} + \frac{75}{16 g(0)^2} \frac{n^2}{k^3}. \tag{17}$$

To minimize (17) with respect to $k$, we take the first-order derivative of (17), which gives
$$\frac{d\,\mathrm{AMSE}[\hat{\beta}_{i1}]}{dk} = \frac{m^{(5)}(x_i)^2}{31752} \frac{k^7}{n^8} - \frac{225}{16 g(0)^2} \frac{n^2}{k^4}.$$

Our optimization problem is to solve $\frac{d\,\mathrm{AMSE}[\hat{\beta}_{i1}]}{dk} = 0$. So we obtain
$$k_{\mathrm{opt}} = \left(\frac{893025}{2 g(0)^2 m^{(5)}(x_i)^2}\right)^{1/11} n^{10/11} \approx 3.26 \left(\frac{1}{g(0)^2 m^{(5)}(x_i)^2}\right)^{1/11} n^{10/11},$$
and
$$\mathrm{AMSE}[\hat{\beta}_{i1}] \approx 0.19 \big(m^{(5)}(x_i)^6 / g(0)^{16}\big)^{1/11} n^{-8/11}.$$
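As a numerical illustration of the bandwidth choice (our sketch, not the authors' code), one can evaluate the AMSE in (17) on a grid of $k$ and compare the grid minimizer with the closed-form $k_{\mathrm{opt}}$ above; the plugged-in values of $n$, $m^{(5)}(x_i)$, and $g(0)$ below are assumptions chosen only so that the optimum falls in the interior of the grid.

\begin{verbatim}
## Illustrative sketch: numerical vs. closed-form minimizer of the AMSE in (17).
amse17 <- function(k, n, m5, g0) (m5^2 / 504^2) * k^8 / n^8 + 75 / (16 * g0^2) * n^2 / k^3
n  <- 500
m5 <- 1e5                      # assumed value of m^(5)(x_i), for illustration only
g0 <- 1 / (0.1 * sqrt(pi))     # g(0) under N(0, 0.1^2) errors, via Proposition 10
k_grid <- 5:(n %/% 2)
k_star <- k_grid[which.min(amse17(k_grid, n, m5, g0))]
k_opt  <- (893025 / (2 * g0^2 * m5^2))^(1 / 11) * n^(10 / 11)
c(grid_minimizer = k_star, closed_form = k_opt)
\end{verbatim}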

    Appendix B. Proof of Theorem 7

Rewrite the objective function as a U-process,
$$S_n(b) = \sum_l \cdots$$
As in the proof of Theorem 1, we need to show that

(i) $\sup_{b \in B} |U_n(b) - U_n(\beta) - E[U_n(b) - U_n(\beta)]| \stackrel{p}{\longrightarrow} 0$,

(ii) $\inf_{\|H(b-\beta)\| > \delta} E[U_n(b) - U_n(\beta)] > \varepsilon$ for $n$ large enough, where $B$ is a compact parameter space for $\beta$, and $\delta$ and $\varepsilon$ are fixed positive small numbers.

We use Theorem A.2 of Ghosal et al. (2000) to show (i), where
$$\mathcal{F}_n = \{ f_n(Y_{i+j}, Y_{i+l} \mid b) - f_n(Y_{i+j}, Y_{i+l} \mid \beta) : b \in B \}.$$
Note that $\mathcal{F}_n$ forms a Euclidean class of functions by applying Lemma 2.13 of Pakes and Pollard (1989), where $\alpha = 1$, $f(\cdot, t_0) = 0$, $\phi(\cdot) = \|X_{jl}\| \frac{1}{h^2} 1(|d_j| \le h) 1(|d_l| \le h)$ and the envelope function is $F_n(\cdot) = M\phi(\cdot)$ for some finite constant $M$. It follows that
$$N\big(\varepsilon \|F_n\|_{Q,2}, \mathcal{F}_n, L_2(Q)\big) \lesssim \varepsilon^{-C}$$

for any probability measure $Q$ and some positive constant $C$, where $\lesssim$ means the left side is bounded by a constant times the right side. Hence,
$$\frac{1}{n} E\left[\int_0^\infty \log N\big(\varepsilon, \mathcal{F}_n, L_2(U_n^2)\big)\, d\varepsilon\right] \lesssim \frac{1}{n} \sqrt{E[F_n^2]} \int_0^\infty \log\frac{1}{\varepsilon}\, d\varepsilon = O\Big(\frac{1}{n}\Big),
$$

where $U_n^2$ is the random discrete measure putting mass $\frac{1}{n(n-1)}$ on each of the points $(Y_{i+j}, Y_{i+l})$. Next, by Lemma A.2 of Ghosal et al. (2000), the projections
$$\bar{f}_n(Y_{i+j} \mid b) = \int f_n(Y_{i+j}, Y_{i+l} \mid b)\, dF_{Y_{i+l}}(Y_{i+l})$$
satisfy
$$\sup_Q N\big(\varepsilon \|\bar{F}_n\|_{Q,2}, \bar{\mathcal{F}}_n, L_2(Q)\big) \lesssim \varepsilon^{-2C},$$
where $\bar{\mathcal{F}}_n = \{\bar{f}_n(Y_{i+j} \mid b) - \bar{f}_n(Y_{i+j} \mid \beta) : b \in B\}$, and $\bar{F}_n$ is an envelope of $\bar{\mathcal{F}}_n$. Thus

$$\frac{1}{\sqrt{n}} E\left[\int_0^\infty \log N\big(\varepsilon, \bar{\mathcal{F}}_n, L_2(P_n)\big)\, d\varepsilon\right] \lesssim \frac{1}{\sqrt{n}} \sqrt{E[\bar{F}_n^2]} \int_0^\infty \log\frac{1}{\varepsilon}\, d\varepsilon = O\Big(\frac{1}{\sqrt{n}}\Big).$$

By Theorem A.2 and Markov's inequality, $\sup_{b \in B} |U_n(b) - U_n(\beta) - E[U_n(b) - U_n(\beta)]| \stackrel{p}{\longrightarrow} 0$.

As to $\inf_{\|H(b-\beta)\|>\delta} E[U_n(b) - U_n(\beta)]$, by Proposition 1 of Wang and Scott (1994),
$$E[U_n(b) - U_n(\beta)] \approx \frac{2}{n(n-1)} \sum_l \cdots$$


We then derive the asymptotic distribution of $\sqrt{nh}\, H(\hat{\beta} - \beta)$. First, by Theorem A.1 of Ghosal et al. (2000), we approximate the first order conditions by an empirical process. Second, we derive the asymptotic distribution of $\sqrt{nh}\, H(\hat{\beta} - \beta)$ by applying the empirical process technique.

First, the first order conditions can be written as
$$\frac{2}{n(n-1)} \sum_l \cdots$$


with $F_\epsilon(\cdot)$ being the cumulative distribution function of $\epsilon$. In other words,
$$2\,\mathbb{G}_n\big(E_2[f_n'(Y_{i+j}, Y_{i+l} \mid \hat{\beta})]\big) + \sqrt{n}\, E[f_n'(Y_{i+j}, Y_{i+l} \mid \hat{\beta})] = o_p(1).$$

By Lemma 2.13 of Pakes and Pollard (1989), $\mathcal{F}_{1n}' = \{E_2[f_n'(Y_{i+j}, Y_{i+l} \mid b)] : b \in B\}$ is Euclidean with envelope $F_{1n} = \frac{\sqrt{h}}{nh^2} \sum_l \|Z_{jl}\|\, 1(0 < |d_j| \le h) 1(0 < |d_l| \le h)$, so by Lemma 2.17 of Pakes and Pollard (1989) and $H(\hat{\beta} - \beta) = o_p(1)$,
$$\mathbb{G}_n\big(E_2[f_n'(Y_{i+j}, Y_{i+l} \mid \hat{\beta})]\big) = \mathbb{G}_n\big(E_2[f_n'(Y_{i+j}, Y_{i+l} \mid \beta)]\big) + o_p(1).$$

As a result,
$$\begin{aligned}
& 2\mathbb{G}_n\big(E_2[f_n'(Y_{i+j}, Y_{i+l} \mid \beta)]\big) + \sqrt{n}\, E[f_n'(Y_{i+j}, Y_{i+l} \mid \hat{\beta})] \\
&= 2\sqrt{n}\, P_n\big(E_2[f_n'(Y_{i+j}, Y_{i+l} \mid \beta)]\big) - 2\sqrt{n}\, E[f_n'(Y_{i+j}, Y_{i+l} \mid \beta)] \\
&\qquad + \sqrt{n}\big(E[f_n'(Y_{i+j}, Y_{i+l} \mid \hat{\beta})] - E[f_n'(Y_{i+j}, Y_{i+l} \mid \beta)]\big) + \sqrt{n}\, E[f_n'(Y_{i+j}, Y_{i+l} \mid \beta)] \\
&= 2\sqrt{n}\, P_n E_2[f_n'(Y_{i+j}, Y_{i+l})] + 2\sqrt{n}\, P_n\big(E_2[f_n'(Y_{i+j}, Y_{i+l} \mid \beta)] - E_2[f_n'(Y_{i+j}, Y_{i+l})]\big) \\
&\qquad + \sqrt{n}\big(E[f_n'(Y_{i+j}, Y_{i+l} \mid \hat{\beta})] - E[f_n'(Y_{i+j}, Y_{i+l} \mid \beta)]\big) - \sqrt{n}\, E[f_n'(Y_{i+j}, Y_{i+l} \mid \beta)] \\
&= 2\sqrt{n}\, P_n E_2[f_n'(Y_{i+j}, Y_{i+l})] + \sqrt{n}\big(E[f_n'(Y_{i+j}, Y_{i+l} \mid \hat{\beta})] - E[f_n'(Y_{i+j}, Y_{i+l} \mid \beta)]\big) + \sqrt{n}\, E[f_n'(Y_{i+j}, Y_{i+l} \mid \beta)] \\
&= o_p(1),
\end{aligned}$$

where
$$E_2[f_n'(Y_{i+j}, Y_{i+l})] = \frac{\sqrt{h}}{nh^2} \sum_l [2F_\epsilon(\epsilon_{i+j}) - 1]\, Z_{jl}\, 1(0 < |d_j| \le h) 1(0 < |d_l| \le h)$$
satisfies $E\big[E_2[f_n'(Y_{i+j}, Y_{i+l})]\big] = 0$, and the second to last equality is from
$$\sqrt{n}\, P_n\big(E_2[f_n'(Y_{i+j}, Y_{i+l} \mid \beta)] - E_2[f_n'(Y_{i+j}, Y_{i+l})]\big) \approx \sqrt{n}\, E[f_n'(Y_{i+j}, Y_{i+l} \mid \beta)].$$

Since
$$E[f_n'(Y_{i+j}, Y_{i+l} \mid b)] = \frac{\sqrt{h}}{n^2 h^2} \sum_{l,j} Z_{jl}\, 1(0 < |d_j| \le h) 1(0 < |d_l| \le h) \cdot \int \Big[2 F_\epsilon\big(\epsilon + m(d_{i+j}) - m(d_{i+l}) - Z_{jl}^T H b\big) - 1\Big] f(\epsilon)\, d\epsilon,$$

we have
$$\sqrt{n}\big(E[f_n'(Y_{i+j}, Y_{i+l} \mid \hat{\beta})] - E[f_n'(Y_{i+j}, Y_{i+l} \mid \beta)]\big) \approx -\sqrt{nh}\, g(0) \Big(\frac{1}{n^2h^2} \sum_{l,j} Z_{jl} Z_{jl}^T\Big) H(\hat{\beta} - \beta),$$
$$\sqrt{n}\, E[f_n'(Y_{i+j}, Y_{i+l} \mid \beta)] \approx \sqrt{nh}\, g(0)\, \frac{1}{n^2h^2} \sum_{l,j} Z_{jl} \big(d_j^5 - d_l^5\big) \frac{m^{(5)}(x_i)}{5!}.$$


In summary,
$$\sqrt{nh}\left[ H(\hat{\beta} - \beta) - \Big(\frac{1}{n^2h^2}\sum_{l,j} Z_{jl}Z_{jl}^T\Big)^{-1} \frac{1}{n^2h^2}\sum_{l,j} Z_{jl}\big(d_j^5 - d_l^5\big)\frac{m^{(5)}(x_i)}{5!} \right] \approx 2 g(0)^{-1} \Big(\frac{1}{n^2h^2}\sum_{l,j} Z_{jl}Z_{jl}^T\Big)^{-1} \sqrt{n}\, P_n E_2[f_n'(Y_{i+j}, Y_{i+l})].$$
Thus, the asymptotic bias is
$$e^T H^{-1}\Big(\frac{1}{n^2h^2}\sum_{l,j} Z_{jl}Z_{jl}^T\Big)^{-1} \frac{1}{n^2h^2}\sum_{l,j} Z_{jl}\big(d_j^5 - d_l^5\big)\frac{m^{(5)}(x_i)}{5!} = -\frac{m^{(5)}(x_i)}{504}\frac{k^4}{n^4},$$

and the asymptotic variance is
$$\frac{4}{k g(0)^2}\, e^T H^{-1} G^{-1} V G^{-1} H^{-1} e = \frac{75}{24 g(0)^2} \frac{n^2}{k^3},$$
where $e = (1, 0, 0, 0)^T$, $G = \frac{1}{k^2}\sum_{l,j} Z_{jl} Z_{jl}^T$, and $V = \frac{1}{3k}\sum_{j=-k}^{k}\Big(\frac{1}{k}\sum_{l=-k}^{k} Z_{jl}\Big)\Big(\frac{1}{k}\sum_{l=-k}^{k} Z_{jl}\Big)^T$ with $\mathrm{Var}(2F_\epsilon(\epsilon_{i+j}) - 1) = 1/3$.

    Appendix C. Proof of Theorem 8

Proof of Theorem 8 Following the proof of Theorem 1, the asymptotic bias is
$$\mathrm{Bias}[\hat{\beta}_{i11}] \approx \frac{m^{(2)}(x_i)}{2}\, \frac{k^4 + 2k^3 i - 2k i^3 - i^4}{n(k^3 + 3k^2 i + 3k i^2 + i^3)},$$
and the asymptotic variance is
$$\mathrm{Var}[\hat{\beta}_{i11}] \approx \frac{3}{f(0)^2}\, \frac{n^2}{k^3 + 3k^2 i + 3k i^2 + i^3}.$$

    Appendix D. Proof of Theorem 9

Proposition 11 If the errors $\epsilon_i$ are iid with median 0 and a symmetric, continuous, positive density function $f(\cdot)$, then $\tilde{\delta}_{ij} = \epsilon_{i+j} + \epsilon_{i-j}$ ($j = 1, \ldots, k$) are iid with $\mathrm{Median}[\tilde{\delta}_{ij}] = 0$ and a continuous, positive density $h(\cdot)$ in a neighborhood of 0, where
$$h(x) = \int_{-\infty}^{\infty} f(x - \epsilon) f(\epsilon)\, d\epsilon.$$


Proof of Proposition 11 The distribution of $\tilde{\delta}_{ij} = \epsilon_{i+j} + \epsilon_{i-j}$ is
$$\begin{aligned}
F_{\tilde{\delta}_{ij}}(x) &= P(\epsilon_{i+j} + \epsilon_{i-j} \le x) \\
&= \iint_{\epsilon_{i+j} \le x - \epsilon_{i-j}} f(\epsilon_{i+j}) f(\epsilon_{i-j})\, d\epsilon_{i+j}\, d\epsilon_{i-j} \\
&= \int_{-\infty}^{\infty}\Big\{\int_{-\infty}^{x - \epsilon_{i-j}} f(\epsilon_{i+j})\, d\epsilon_{i+j}\Big\} f(\epsilon_{i-j})\, d\epsilon_{i-j} \\
&= \int_{-\infty}^{\infty} F(x - \epsilon_{i-j}) f(\epsilon_{i-j})\, d\epsilon_{i-j},
\end{aligned}$$
where $F(\cdot)$ denotes the cumulative distribution function of $\epsilon_i$. Then the density of $\tilde{\delta}_{ij}$ is
$$h(x) \triangleq \frac{dF_{\tilde{\delta}_{ij}}(x)}{dx} = \int_{-\infty}^{\infty} f(x - \epsilon_{i-j}) f(\epsilon_{i-j})\, d\epsilon_{i-j}.$$
By the symmetry of the density function, $F(-\epsilon_{i-j}) = 1 - F(\epsilon_{i-j})$, and hence
$$F_{\tilde{\delta}_{ij}}(0) = \int_{-\infty}^{\infty} F(-\epsilon_{i-j}) f(\epsilon_{i-j})\, d\epsilon_{i-j} = \int_{-\infty}^{\infty} \big(1 - F(\epsilon_{i-j})\big) f(\epsilon_{i-j})\, d\epsilon_{i-j} = \Big(F(\epsilon_{i-j}) - \frac{1}{2}F^2(\epsilon_{i-j})\Big)\Big|_{-\infty}^{\infty} = \frac{1}{2},$$
$$h(0) = \int_{-\infty}^{\infty} f^2(\epsilon_{i-j})\, d\epsilon_{i-j}.$$

Proof of Theorem 9 Following the proof of Theorem 1, the asymptotic bias is
$$\mathrm{Bias}[\hat{\alpha}_{i2}] \approx -\frac{m^{(6)}(x_i)}{792}\frac{k^4}{n^4},$$
and the asymptotic variance is
$$\mathrm{Var}[\hat{\alpha}_{i2}] \approx \frac{2205}{64 h(0)^2}\frac{n^4}{k^5}.$$

Combining the squared bias and the variance, we obtain the AMSE as
$$\mathrm{AMSE}[\hat{\alpha}_{i2}] = \frac{m^{(6)}(x_i)^2}{792^2}\frac{k^8}{n^8} + \frac{2205}{64 h(0)^2}\frac{n^4}{k^5}. \tag{18}$$

To minimize (18) with respect to $k$, we take the first-order derivative of (18), which gives
$$\frac{d\,\mathrm{AMSE}[\hat{\alpha}_{i2}]}{dk} = \frac{m^{(6)}(x_i)^2}{78408}\frac{k^7}{n^8} - \frac{11025}{64 h(0)^2}\frac{n^4}{k^6}.$$


Now the optimization problem is to solve $\frac{d\,\mathrm{AMSE}[\hat{\alpha}_{i2}]}{dk} = 0$. So we obtain
$$k_{\mathrm{opt}} = \left(\frac{108056025}{8 h(0)^2 m^{(6)}(x_i)^2}\right)^{1/13} n^{12/13} \approx 3.54\left(\frac{1}{h(0)^2 m^{(6)}(x_i)^2}\right)^{1/13} n^{12/13},$$
and
$$\mathrm{AMSE}[\hat{\alpha}_{i2}] \approx 0.29\big(m^{(6)}(x_i)^{10}/h(0)^{16}\big)^{1/13} n^{-8/13}.$$
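The same kind of check applies to the second-derivative bandwidth. The sketch below (ours, with assumed values of $m^{(6)}(x_i)$ and $h(0)$, not estimates from data) minimizes the AMSE in (18) over a grid of $k$ and compares the grid minimizer with the closed-form $k_{\mathrm{opt}}$.

\begin{verbatim}
## Illustrative sketch: numerical vs. closed-form minimizer of the AMSE in (18).
amse18 <- function(k, n, m6, h0) (m6^2 / 792^2) * k^8 / n^8 + 2205 / (64 * h0^2) * n^4 / k^5
n  <- 500
m6 <- 1e6                          # assumed value of m^(6)(x_i), for illustration only
h0 <- 1 / (2 * 0.1 * sqrt(pi))     # h(0) under N(0, 0.1^2) errors, via Proposition 11
k_grid <- 10:(n %/% 2)
k_star <- k_grid[which.min(amse18(k_grid, n, m6, h0))]
k_opt  <- (108056025 / (8 * h0^2 * m6^2))^(1 / 13) * n^(12 / 13)
c(grid_minimizer = k_star, closed_form = k_opt)
\end{verbatim}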

Appendix E. Variances of LowLAD and LowLSR Derivative Estimators with Mixed Normal Distributions

We consider the mixed normal contaminated errors $\epsilon_i \sim (1-\alpha)N(0, \sigma_0^2) + \alpha N(0, \sigma_1^2)$ with $\sigma_1^2 > \sigma_0^2$ and $0 < \alpha < 1/2$. Now the mixed density function is
$$f(x) = (1-\alpha)\frac{1}{\sqrt{2\pi}\sigma_0}e^{-\frac{x^2}{2\sigma_0^2}} + \alpha\frac{1}{\sqrt{2\pi}\sigma_1}e^{-\frac{x^2}{2\sigma_1^2}}.$$

Thus
$$\begin{aligned}
g(0) &= 2\int_{-\infty}^{\infty} f^2(x)\, dx \\
&= 2\Big\{(1-\alpha)^2\int_{-\infty}^{\infty}\frac{1}{2\pi\sigma_0^2}e^{-\frac{x^2}{\sigma_0^2}}\, dx + 2\alpha(1-\alpha)\int_{-\infty}^{\infty}\frac{1}{2\pi\sigma_0\sigma_1}e^{-\big(\frac{x^2}{2\sigma_0^2}+\frac{x^2}{2\sigma_1^2}\big)}\, dx + \alpha^2\int_{-\infty}^{\infty}\frac{1}{2\pi\sigma_1^2}e^{-\frac{x^2}{\sigma_1^2}}\, dx\Big\} \\
&= 2\Big\{(1-\alpha)^2\frac{1}{2\sqrt{\pi}\sigma_0} + 2\alpha(1-\alpha)\frac{1}{\sqrt{2\pi}\sqrt{\sigma_0^2+\sigma_1^2}} + \alpha^2\frac{1}{2\sqrt{\pi}\sigma_1}\Big\},
\end{aligned}$$
$$\mathrm{Var}(\epsilon_i) = (1-\alpha)\sigma_0^2 + \alpha\sigma_1^2 \triangleq \sigma^2.$$

Next, we compare $\sigma^2$ and $1/(2g^2(0))$. With $\lambda = \sigma_1/\sigma_0$ (and normalizing $\sigma_0 = 1$), the difference of these two variances is
$$\begin{aligned}
\mathrm{DV}(\alpha, \lambda) &= \sigma^2 - 1/(2g^2(0)) \\
&= (1-\alpha) + \alpha\lambda^2 - \left(8\Big\{\frac{(1-\alpha)^2}{2\sqrt{\pi}} + \frac{2\alpha(1-\alpha)}{\sqrt{2\pi(1+\lambda^2)}} + \frac{\alpha^2}{2\sqrt{\pi}\lambda}\Big\}^2\right)^{-1}.
\end{aligned}$$
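The following R sketch (ours, not the authors' code) evaluates $g(0)$ and $\mathrm{DV}(\alpha, \lambda)$ for the contaminated normal errors above with $\sigma_0 = 1$ and $\sigma_1 = \lambda$, and checks the closed form for $g(0)$ against numerical integration; the particular values $\alpha = 0.1$ and $\lambda = 10$ are illustrative choices only.

\begin{verbatim}
## Illustrative check of the mixed-normal formulas (sigma_0 = 1, sigma_1 = lambda).
g0_mixed <- function(alpha, lambda)
  2 * ((1 - alpha)^2 / (2 * sqrt(pi)) +
       2 * alpha * (1 - alpha) / sqrt(2 * pi * (1 + lambda^2)) +
       alpha^2 / (2 * sqrt(pi) * lambda))
f_mixed <- function(x, alpha, lambda)
  (1 - alpha) * dnorm(x, 0, 1) + alpha * dnorm(x, 0, lambda)
g0_num <- function(alpha, lambda)        # numerical version of g(0) = 2 * int f^2
  2 * integrate(function(x) f_mixed(x, alpha, lambda)^2, -Inf, Inf)$value
DV <- function(alpha, lambda)            # sigma^2 - 1 / (2 g(0)^2)
  (1 - alpha) + alpha * lambda^2 - 1 / (2 * g0_mixed(alpha, lambda)^2)
c(g0_closed = g0_mixed(0.1, 10), g0_numeric = g0_num(0.1, 10), DV = DV(0.1, 10))
\end{verbatim}

A positive value of $\mathrm{DV}(\alpha, \lambda)$ means the LowLAD variance factor $1/(2g^2(0))$ is smaller than the LowLSR factor $\sigma^2$; the equal-variance curve in Figure 9 corresponds to $\mathrm{DV}(\alpha, \lambda) = 0$.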


[Figure 3 appears here: four panels, (a) LowLAD (k=70), (b) LowLSR (k=70), (c) LowLAD (k=70), (d) LowLSR (k=100); vertical axis: first-order derivative.]

Figure 3: (a-b) The true first-order derivative function (bold line), LowLAD (green line) and LowLSR (red line) estimators. Simulated data set of size 500 from model (1) with equidistant $x_i \in [-1, 1]$, regression function $m_1$, and errors $\epsilon \sim 90\%N(0, 0.1^2) + 10\%N(0, 1^2)$. (c-d) The same designs except $\epsilon \sim 90\%N(0, 0.1^2) + 10\%N(0, 10^2)$.

[Figure 4 appears here: four panels, (a) LowLAD (k=90), (b) LowLSR (k=100), (c) LowLAD (k=90), (d) LowLSR (k=100).]

Figure 4: (a-d) The true first-order derivative function (bold line), LowLAD (green line) and LowLSR (red line) estimators. The same designs as in Figure 3 except the regression function being $m_2$.


[Figure 5 appears here: four panels, (a) LowLAD (k=50), (b) LowLSR (k=60), (c) LowLAD (k=50), (d) LowLSR (k=100).]

Figure 5: (a)-(d) The true second-order derivative function (bold line), LowLAD (green line) and LowLSR (red line) estimators based on the simulated data sets from Figure 3 correspondingly.

[Figure 6 appears here: four panels, (a) LowLAD (k=70), (b) LowLSR (k=80), (c) LowLAD (k=70), (d) LowLSR (k=100); vertical axis: second-order derivative.]

Figure 6: (a)-(d) The true second-order derivative function (bold line), LowLAD (green line) and LowLSR (red line) estimators based on the simulated data sets from Figure 4 correspondingly.


[Figure 7 appears here: boxplots for the LowLAD, LowLSR, locpol, and Pspline estimators.]

Figure 7: Boxplot of four estimators for the function $m_4$ with errors $\epsilon \sim 95\%N(0, 0.1^2) + 5\%N(0, 1^2)$.

[Figure 8 appears here: boxplots for the LowLAD, LowLSR, locpol, and Pspline estimators.]

Figure 8: Boxplot of four estimators for the function $m_4$ with errors $\epsilon \sim 95\%N(0, 0.1^2) + 5\%N(0, 10^2)$.


Figure 9: The curve along which the LowLAD and LowLSR estimators have the same variance, with errors $\epsilon_i \sim (1-\alpha)N(0, \sigma_0^2) + \alpha N(0, \sigma_1^2)$. The two horizontal lines are $y = 1$ (green) and $y = 2.77$ (blue).
