
Nonparametric Density Estimation and Monotone Rearrangement

Robben Riksen

July 16, 2014

Bachelor Thesis

Supervisor: dr. A.J. van Es

Korteweg-de Vries Institute for Mathematics

Faculty of Natural Sciences, Mathematics and Informatics

University of Amsterdam


Abstract

In nonparametric density estimation, the method of kernel estimators is commonly used. However, if extra information about the monotonicity of the target density is available, kernel estimation does not take it into account and will in most cases not give a monotone estimate. A recently introduced method in this field of research is monotone rearrangement, which uses this extra information to improve the density estimate and has some interesting properties. In this thesis, nonparametric density estimation by means of kernel estimators is discussed. The reflection method is introduced to improve kernel estimation when the target density equals zero on the negative real line and has a discontinuity at zero. In the case where information about the monotonicity of the target density is available, monotone rearrangement is used to monotonize the original estimate. The theoretical and asymptotic properties of the reflection method and of monotone rearrangement are discussed briefly, after which simulations are run in Matlab to measure the performance, in terms of MISE, of both methods separately and applied together. It turns out that both the reflection method and monotone rearrangement greatly reduce the MISE of the kernel estimator, but when they are applied together, the MISE remains at the level of the reflection method alone.

Cover image: a plot of an exponential-1 density (dashed-dotted line), a kernel estimate based on a sample of size n = 100 (dashed line) and its rearranged estimate (solid line).

Title: Nonparametric Density Estimation and Monotone Rearrangement
Author: Robben Riksen, [email protected], 10188258
Supervisor: dr. A.J. van Es
Second grader: dr. I.B. van Vulpen
Date: July 16, 2014

Korteweg-de Vries Institute for Mathematics
University of Amsterdam
Science Park 904, 1098 XH Amsterdam
http://www.science.uva.nl/math


Contents

Introduction

1. Theory
   1.1. Kernel Estimation of Smooth Densities
        1.1.1. The Histogram Method
        1.1.2. Kernel Estimators
   1.2. Defining Errors
   1.3. Bandwidth Selection
   1.4. Decreasing Densities
        1.4.1. Reflection Method
   1.5. Monotone Rearrangement
        1.5.1. Introducing Monotone Rearrangement
        1.5.2. Properties of Monotone Rearrangement

2. Results
   2.1. Comparing Methods
   2.2. Simulations

3. Conclusion

4. Discussion and Review of the Progress

5. Populaire Samenvatting (Popular Summary)

References

A. Theorems and Proofs

B. MATLAB code
   B.1. Epanechnikov Kernel Estimator
   B.2. Gaussian Kernel Estimator
   B.3. Monotone Rearrangement
   B.4. Simulations

C. Samples
   C.1. Sample 1
   C.2. Sample 2


Introduction

When I returned from my semester abroad at the University of Edinburgh, I had made up my mind about the master's degree I wanted to pursue after obtaining my bachelor's degree: it had to be the master Stochastics and Financial Mathematics at the University of Amsterdam. I had been interested in statistics for a long time, so writing my bachelor thesis in this field of research was a logical step in the preparation for this master's programme. When approached, Dr. Van Es suggested a thesis about monotone rearrangement, a method for improving density estimates of monotone probability densities. As such a recently developed method in this application, it sparked my interest, and we decided it would be a suitable subject for a bachelor thesis.

A common challenge in statistics is the estimation of an unknown density function from a given data set. For example, this can be valuable to gain information about the occurrence of certain events in a population. Estimating the probability density from which such a data set is sampled can be done in a parametric or a nonparametric way. If we assume or know that a sample is drawn from, say, a $N(\mu, \sigma^2)$ distribution, we apply parametric density estimation to estimate $\mu$ and/or $\sigma$. There are many ways of doing this, but the produced estimate will be normally distributed and its accuracy can only be improved by varying the parameters. If the assumption was right, this parametric way of density estimation is satisfactory. But if we do not know whether the density we want to estimate belongs to a known family of parameterized probability densities, the parametric approach is obviously not satisfactory.

In nonparametric density estimation, these kinds of assumptions are not made. We only assume that the target density exists, and make no assumptions about a parametric family to which the target density might, or might not, belong. Again, there are many ways to obtain a density function from observed data in a nonparametric way, but the most common one is the method of kernel estimators, which we will discuss in this thesis.

The method of kernel estimation is designed to estimate smooth densities, but it can also be applied to densities with discontinuities. We will take a closer look at densities that are equal to zero on the negative real line, and therefore have a discontinuity at zero: the boundary. To improve the kernel estimator in this case, the commonly used reflection method is applied. This method, introduced by Schuster (1985) [1], 'reflects' the data set around zero, that is, it adds the negatives of the data points to the data set and applies the kernel estimation method to this new, larger set of data points. This greatly improves the kernel estimate.

Suppose now there is some extra information about the target density: we know the density we want to estimate is monotonically decreasing, or increasing. Such information is often available. For example, in biometrics, age-height charts, in which the expected height of children is plotted against their age, should be monotonic in age. In econometrics, demand functions are monotonic in price (both examples from [2]). There are many other settings in which a monotone density is expected. In these cases kernel estimation still works fine, but the method of kernel estimators does not take this extra information into account and will in most cases not produce a monotone estimate. Since we want to produce the best possible estimate from our data set, we would like to use all available information to minimize the estimation error.

To monotonize the density found by kernel estimation, the method of monotone rearrangement is introduced. This relatively simple method, essentially rearranging function values in decreasing or increasing order, first appeared in the context of variational analysis in 1952 in the book Inequalities by Hardy, Littlewood and Pólya [3], but was only very recently introduced in the context of density estimation. Monotone rearrangement, however, proved to have many interesting properties in this context, which was the motivation to look into this subject further in this thesis.

The goal of this thesis is to become familiar with the methods of kernel estimation and monotone rearrangement, and to investigate whether a combination of methods designed to improve kernel estimation improves the estimate of a target density even more. To that end, we will apply the reflection method and monotone rearrangement together to estimate a monotone probability density from a generated sample. This leads to the following research question: 'Does the combination of the reflection method and monotone rearrangement significantly improve the estimation of a monotonic density from a given data sample?' In an attempt to answer this question, these methods were implemented in Matlab and simulations were run with different sample sizes.

This thesis starts with an overview of the method of kernel estimators, introducing them in an intuitive way. Then measures of how well an estimator approximates the target density are introduced and discussed, after which several problems with the original kernel estimators are pointed out for the case where they are used to estimate target densities with a discontinuity at zero. A solution is proposed in the form of the reflection method. After that, the method of monotone rearrangement is introduced and some of its properties are discussed. Then simulations are run to answer the research question stated above. The results of the simulations are discussed, and possible improvements to the simulation and estimation methods are proposed.


1. Theory

This chapter starts with introducing the concept of kernel estimation in an intuitive way. The reflection method [1] will be introduced as a method to improve kernel estimation for densities whose support is bounded on one side. An example of such a density is the exponential density, which is zero for all points on the negative real axis and has a discontinuity at 0. Then monotone rearrangement will be introduced and applied, to take into account extra information about the monotonicity of the target density. Throughout this chapter, unless stated otherwise, the same data set of 100 points drawn from a $N(\mu, \sigma^2)$ distribution, with $\mu = 2$ and $\sigma^2 = \frac{1}{2}$, will be used in the examples to illustrate the theory. The values of these data points can be found in Appendix C.1. Furthermore, in general descriptions and theory it will be assumed that a sample of $n$ independent, identically distributed observations $X_1, X_2, \ldots, X_n$ is given, for which the density $f$ is to be estimated.

1.1. Kernel Estimation of Smooth Densities

In this section we will introduce the method of kernel estimation, starting with the histogram method as a density estimator. We then expand this idea to introduce kernel estimators and find a smooth estimate of the target density.

1.1.1. The Histogram Method

Probably the most intuitive way of representing the probability density from which a data set is sampled is by use of a histogram. To do this, a bin width $h$ and a starting point $x_0$ are chosen. The bins are defined as the half-open intervals $[x_0 + (m-1)h, x_0 + mh)$, for $m \in \mathbb{Z}$. Then the histogram density approximation is defined as

$$\hat f(x) = \frac{1}{nh} \cdot \#\{X_i \text{ in the same bin as } x\}.$$

For the data set from Appendix C.1 this definition, with starting point $x_0 = 0$ and bin width $h = 0.1708$, results in the density $\hat f$ depicted in figure 1.1.
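For concreteness, a minimal MATLAB sketch of this estimator follows. It is not part of the thesis's Appendix B code; the function name and argument layout are chosen here purely for illustration, with X a column vector of observations and x a single evaluation point.

% Histogram density estimate at a single point x.
function f = hist_density(x, X, x0, h)
    m  = floor((x - x0) / h);                        % index of the bin containing x
    lo = x0 + m * h;                                 % left edge of that bin
    f  = sum(X >= lo & X < lo + h) / (numel(X) * h); % scaled bin count
end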

Clearly, the histogram method for approximating the density is not at all satisfactory. The approximation is not a continuous function, but the major drawback is that the choice of the starting point $x_0$ has a big influence on the approximated density, as is shown in figure 1.2.

The starting point is shifted to the left by 0.10, to $x_0 = -0.1$, while the bin width stays the same. This results in the peak extending less far to the right, and the structure separated from the rest of the bins that is visible in figure 1.1 has almost disappeared in figure 1.2.

Figure 1.1.: Histogram for the data set from Appendix C.1, with starting point $x_0 = 0$ and bin width $h = 0.1708$.

Figure 1.2.: Histogram for the data set from Appendix C.1, with starting point $x_0 = -0.1$ and bin width $h = 0.1708$.

If we now, instead of placing the data points in bins, place 'bins' around the data points, we get rid of the problem of the starting point. The only variables then are the shape of the 'bin' and the bin width $h$. This idea leads to the definition of kernel estimators, which we introduce below.

1.1.2. Kernel Estimators

A far more satisfactory way to estimate the probability density is given by a family of methods based on kernel estimators. First a kernel function $K$, a function that places weight on (and around) the observations, has to be introduced. This kernel function, or simply kernel, $K$ has to integrate to 1 over the real line. Now the kernel estimator can be defined as follows.

Definition 1.1. Let $X_1, X_2, \ldots, X_n$ be independent, identically distributed observations with density $f$. A kernel estimator $\hat f$ of the target density $f$, with kernel $K$, is defined by

$$\hat f(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right), \qquad (1.1)$$

where $n$ is the number of data points and $h$ is, from now on, called the bandwidth.

A kernel $K$ is said to be of order $m$ if

$$\int x^n K(x)\,dx = 0 \ \text{ for } 0 < n < m, \quad \text{and} \quad \int x^m K(x)\,dx \neq 0. \qquad (1.2)$$

The most commonly used symmetric kernels are of order 2, because kernels of higher order must take negative values somewhere, so the resulting estimate can take negative values in some areas, which is in most cases not desirable when estimating a probability density.

An example of a commonly used kernel function is the standard normal density function. To visualize the working of a kernel estimator using the normal density function as a kernel, imagine small normal densities placed over the data points $X_i$ that add up to the estimated probability density. This is illustrated in figure 1.3, where we used the first 4 data points from sample 1 in Appendix C.1 and bandwidth $h = 0.5$ to construct the density estimate $\hat f$.
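A minimal MATLAB sketch of (1.1) with a gaussian kernel may clarify the construction. It assumes a column vector X of observations and a row vector xs of evaluation points (the subtraction xs - X relies on MATLAB's implicit expansion); the thesis's own, fuller implementation is in Appendix B.2.

% Gaussian kernel estimator (1.1); minimal sketch.
K    = @(t) exp(-0.5 * t.^2) / sqrt(2*pi);                 % standard normal kernel
fhat = @(xs, X, h) sum(K((xs - X) / h), 1) / (numel(X) * h);

% example: X  = 2 + sqrt(0.5) * randn(100, 1);             % N(2, 1/2) sample
%          xs = linspace(-1, 5, 500);
%          plot(xs, fhat(xs, X, 0.5))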

Varying the bandwidth $h$ has big effects on the density estimate $\hat f$. Choosing $h$ too small will result in too much weight being given to single data points in the tails of the distribution, while choosing $h$ too big will result in oversmoothing and might conceal structures in the target density. The selected bandwidth will therefore largely determine the behaviour of the kernel estimator. This is illustrated by figure 1.4, where the bandwidth $h = 0.3$ is chosen.

A non-existent structure around $x = 3$, invisible before, has appeared because the bandwidth was chosen too small.


Figure 1.3.: Kernel estimator for the $N(2, 0.5)$-distribution from 4 data points with bandwidth $h = 0.5$, showing individual kernels.

Figure 1.4.: Kernel estimator for the $N(2, 0.5)$-distribution from 4 data points with bandwidth $h = 0.3$, showing individual kernels.


Another property of a kernel estimator that determines its accuracy is the choice of the kernel. A huge variation in the choice of the kernel used in kernel estimation is possible. A commonly used symmetric kernel is the gaussian kernel we used in figure 1.3:

$$K(t) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}t^2}.$$

Another commonly used kernel is the Epanechnikov kernel $K_e$:

$$K_e(t) = \begin{cases} \dfrac{3}{4\sqrt{5}}\left(1 - \dfrac{1}{5}t^2\right) & \text{if } -\sqrt{5} \leq t \leq \sqrt{5}, \\[1ex] 0 & \text{otherwise.} \end{cases} \qquad (1.3)$$

This kernel, sometimes called the optimal kernel, was introduced by Epanechnikov (1969) [4] and is the most efficient kernel function in terms of minimizing the MISE, although the difference compared to other kernels is relatively small. Its drawback, on the other hand, is that it is not continuously differentiable. The kernels we just discussed are the same for all data points. Variable kernels can also be introduced, in which the shape of the kernel depends on the position of the data point around which it is placed, or on its distance to the other data points. Depending on the information known about the data set, this can greatly increase the accuracy of the estimate.
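Returning to (1.3), the Epanechnikov kernel fits in a single MATLAB line; a sketch (Appendix B.1 contains the thesis's actual Epanechnikov estimator):

% Epanechnikov kernel (1.3): quadratic on [-sqrt(5), sqrt(5)], zero outside.
Ke = @(t) (3 / (4*sqrt(5))) * (1 - t.^2 / 5) .* (abs(t) <= sqrt(5));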

1.2. Defining Errors

To compare different estimation methods and to draw conclusions about which method improves the estimate, it is necessary to define which measures of error will be used to compare the different techniques. The derivations in this section follow the line of Density Estimation for Statistics and Data Analysis by Silverman (1986) [5].

For the error of the estimate at a single point, the 'mean square error' is defined in the usual way.

Definition 1.2. The mean square error is given by

$$\mathrm{MSE}_x(\hat f) := E\left[\hat f(x) - f(x)\right]^2, \qquad (1.4)$$

where $\hat f(x)$ is the estimator of the density $f$.

It is easy to see, using the expression for the variance of $\hat f$, $\mathrm{Var}\,\hat f(x) = E\hat f^2(x) - (E\hat f(x))^2$, that the MSE can be written in the following way:

$$\mathrm{MSE}_x(\hat f) = \left(E\hat f(x) - f(x)\right)^2 + \mathrm{Var}\,\hat f(x). \qquad (1.5)$$

We will now define the bias of $\hat f$.

Definition 1.3. Let $\hat f(x)$ be the estimate of the density $f$. Then

$$b_{\hat f}(x) := E\hat f(x) - f(x) \qquad (1.6)$$

is called the bias of $\hat f$.


So at a single point, the mean square error is given by a bias term, representing the systematic error of $\hat f$, and a variance term. To quantify the error on the whole domain of $f$, the 'mean integrated square error' is introduced.

Definition 1.4. The mean integrated square error is given by

$$\mathrm{MISE}(\hat f) := E \int \left[\hat f(x) - f(x)\right]^2 dx. \qquad (1.7)$$

The MISE is a commonly used way of quantifying the error in an estimate. Because the integrand is non-negative, the expectation and the integral can be interchanged, and the MISE takes the following form [6]:

$$\begin{aligned}
\mathrm{MISE}(\hat f) &= E \int \left[\hat f(x) - f(x)\right]^2 dx \\
&= \int \mathrm{MSE}_x(\hat f)\,dx \\
&= \int \left[E\hat f(x) - f(x)\right]^2 dx + \int \mathrm{Var}\,\hat f(x)\,dx \\
&= \int b_{\hat f}^2(x)\,dx + \int \mathrm{Var}\,\hat f(x)\,dx. \qquad (1.8)
\end{aligned}$$

Thus, the MISE is the sum of the integrated square bias and the integrated variance. If in this definition $\hat f$ is taken as in definition 1.1, its expectation is [7]

$$E\hat f(x) = \int \frac{1}{h} K\!\left(\frac{x-y}{h}\right) f(y)\,dy, \qquad (1.9)$$

and its variance can be found as

$$\begin{aligned}
\mathrm{Var}\,\hat f(x) &= \mathrm{Var}\left[\frac{1}{n}\sum_{i=1}^{n} \frac{1}{h} K\!\left(\frac{x-X_i}{h}\right)\right] \\
&= \frac{1}{n^2}\sum_{i=1}^{n} \mathrm{Var}\left[\frac{1}{h} K\!\left(\frac{x-X_i}{h}\right)\right] \\
&= \frac{1}{n}\left( E\left[\frac{1}{h^2} K^2\!\left(\frac{x-X_i}{h}\right)\right] - \left(E\left[\frac{1}{h} K\!\left(\frac{x-X_i}{h}\right)\right]\right)^2 \right) \\
&= \frac{1}{n}\int \frac{1}{h^2} K^2\!\left(\frac{x-y}{h}\right) f(y)\,dy - \frac{1}{n}\left[\frac{1}{h}\int K\!\left(\frac{x-y}{h}\right) f(y)\,dy\right]^2. \qquad (1.10)
\end{aligned}$$

Written in this form, it becomes clear that the bias term in the MISE does not directly depend on the sample size; this term will not be reduced just by taking larger samples.
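The decomposition (1.8) can be checked numerically by Monte Carlo simulation. The sketch below assumes the fhat estimator sketched in section 1.1.2 and uses the chapter's $N(2, \frac{1}{2})$ example density; the grid and repetition counts are arbitrary choices for illustration.

% Monte Carlo approximation of the pointwise bias-variance decomposition.
f    = @(x) exp(-(x - 2).^2) / sqrt(pi);   % N(2, 1/2) target density
xs   = linspace(-1, 5, 301);               % evaluation grid
nrep = 1000; n = 100; h = 0.5;
F = zeros(nrep, numel(xs));
for r = 1:nrep
    X = 2 + sqrt(0.5) * randn(n, 1);       % sample from the target
    F(r, :) = fhat(xs, X, h);              % kernel estimate, cf. section 1.1.2
end
bias2 = (mean(F, 1) - f(xs)).^2;           % squared bias at each grid point
varf  = var(F, 0, 1);                      % variance at each grid point
mse   = bias2 + varf;                      % pointwise MSE, cf. (1.5)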


From now on, it will be assumed that the unknown density has continuous derivatives of all required orders and that the kernel function $K$ used to define $\hat f$ is a symmetric function with the following properties:

$$\int K(x)\,dx = 1, \qquad (1.11)$$
$$\int x K(x)\,dx = 0, \qquad (1.12)$$
$$\int x^2 K(x)\,dx = k_2 \neq 0. \qquad (1.13)$$

In other words, we are dealing with a symmetric kernel of order 2. These properties are met by most commonly used kernel functions, for in many cases the kernel $K$ is a symmetric probability density with variance $k_2$. The kernel functions used in the simulations later on in this thesis also satisfy these properties.

To calculate the MISE of an estimate, it is useful to derive a more applicable (approximate) form of the bias and variance, so for the rest of this section we take a heuristic approach. We can rewrite the bias in the following way:

$$\begin{aligned}
b_{\hat f}(x) &= E\hat f(x) - f(x) \\
&= \int \frac{1}{h} K\!\left(\frac{x-y}{h}\right) f(y)\,dy - f(x) \\
&\overset{*}{=} \int K(t) f(x-ht)\,dt - f(x) \qquad (1.14) \\
&\overset{**}{=} \int K(t)\left[f(x-ht) - f(x)\right] dt, \qquad (1.15)
\end{aligned}$$

where at $*$ the change of variable $y = x - ht$ is applied and at $**$ it is used that $K$ integrates to unity. If the Taylor expansion

$$f(x-ht) = f(x) - htf'(x) + \frac{h^2 t^2 f''(x)}{2} + \ldots \qquad (1.16)$$

is substituted in expression (1.15), the following expression for the bias is found:

$$\begin{aligned}
b_{\hat f}(x) &= -hf'(x)\int tK(t)\,dt + \frac{1}{2}h^2 f''(x)\int t^2K(t)\,dt + \ldots \\
&= \sum_{p=1}^{\infty} (-1)^p h^p \frac{f^{(p)}(x)}{p!}\int t^p K(t)\,dt \\
&= \sum_{p=1}^{\infty} (-1)^p h^p\, b^{(p)}(x), \qquad (1.17)
\end{aligned}$$

where $b^{(p)}(x) = \frac{f^{(p)}(x)}{p!}\int t^p K(t)\,dt$. Using properties (1.12) and (1.13) of $K$, the first-order term vanishes and the bias can be approximated by


$$b_{\hat f}(x) = h^2 b^{(2)}(x) + O(h^4) \approx h^2 b^{(2)}(x). \qquad (1.18)$$

So, the integrated square bias can be approximated by

$$\int b_{\hat f}^2(x)\,dx \approx \frac{1}{4} h^4 k_2^2 \int \left(f''(x)\right)^2 dx. \qquad (1.19)$$

In a similar manner, an approximation of the variance of $\hat f$ can be found. Substitution of expression (1.9) into (1.10) gives

$$\mathrm{Var}\,\hat f(x) = \frac{1}{n}\int \frac{1}{h^2} K^2\!\left(\frac{x-y}{h}\right) f(y)\,dy - \frac{1}{n}\left[f(x) + b_{\hat f}(x)\right]^2.$$

The substitution $y = x - ht$ in the integral and approximation (1.18) for the bias lead to the following approximation for the variance:

$$\mathrm{Var}\,\hat f(x) \approx \frac{1}{nh}\int K^2(t) f(x-ht)\,dt - \frac{1}{n}\left[f(x) + O(h^2)\right]^2.$$

With the Taylor expansion (1.16) and the assumption that $n$ is large and $h$ is small, this can be rewritten as

$$\begin{aligned}
\mathrm{Var}\,\hat f(x) &\approx \frac{1}{nh}\int K^2(t)\left\{f(x) - htf'(x) + \ldots\right\} dt + O(n^{-1}) \\
&= \frac{1}{nh} f(x) \int K^2(t)\,dt + O(n^{-1}) \\
&\approx \frac{1}{nh} f(x) \int K^2(t)\,dt. \qquad (1.20)
\end{aligned}$$

Because $f$ is a probability density, $\int f(x)\,dx = 1$, so the integral over the variance of $\hat f$ can be approximated by

$$\int \mathrm{Var}\,\hat f(x)\,dx \approx \frac{1}{nh}\int K^2(t)\,dt. \qquad (1.21)$$

This means we can approximate the MISE by [6]

$$\begin{aligned}
\mathrm{MISE} &= \int b_{\hat f}^2(x)\,dx + \int \mathrm{Var}\,\hat f(x)\,dx \\
&\approx \frac{1}{4} h^4 k_2^2 \int \left(f''(x)\right)^2 dx + \frac{1}{nh}\int K^2(t)\,dt. \qquad (1.22)
\end{aligned}$$


The last expression (1.22) is known as the Asymptotic Mean Integrated Square Error, AMISE, and is a useful approximation to the MISE when dealing with large samples; it is much easier to calculate than the MISE in equation (1.8) [8].

It now becomes clear that decreasing the bandwidth $h$, whilst reducing the integrated square bias term in (1.8), increases the integrated variance term. This means that finding the optimal bandwidth will always be a trade-off between random and systematic error. This is known as the variance-bias trade-off. More about bandwidth selection will be said in section 1.3.

Besides the MISE as a measure of how well a density estimator approaches the target density, the asymptotic properties of the estimator are also important. Prakasa Rao (1983) [6] proved that there is no reasonable estimator $\hat f_n(x)$ such that

$$E\left[\hat f_n(x)\right] = f(x), \qquad (1.23)$$

so we are forced to look for asymptotically unbiased estimators. That is, a sequence of density estimators $\hat f_n$ is asymptotically unbiased if, for every density $f$ and every $x$,

$$\lim_{n \to \infty} E_f\left[\hat f_n(x)\right] = f(x). \qquad (1.24)$$

We call a sequence of density estimators $\hat f_n$ weakly consistent if

$$\hat f_n(x) \overset{p}{\to} f(x) \quad \text{as } n \to \infty. \qquad (1.25)$$

Both definitions are valuable when looking for an appropriate density estimator.

1.3. Bandwidth Selection

As stated before, a good choice of the bandwidth $h$ in a kernel estimator is very important, for the bandwidth largely determines the behaviour of the estimator. We define the optimal bandwidth as the bandwidth that minimizes the AMISE (1.22). In this thesis we will not go into detail about the different methods to determine the optimal bandwidth in practice; we just state the theoretically, asymptotically optimal bandwidth.

Parzen (1962) [9] showed that the asymptotically optimal bandwidth for a twice continuously differentiable density $f$ is equal to

$$h_n^{\mathrm{opt}} = k_2^{-\frac{2}{5}} \left(\int K(t)^2\,dt\right)^{\frac{1}{5}} \left(\int f''(x)^2\,dx\right)^{-\frac{1}{5}} n^{-\frac{1}{5}}, \qquad (1.26)$$

where $k_2$ is the same as in equation (1.13). In this thesis, however, we will be dealing with densities with discontinuities. In Van Eeden (1985) [10] the asymptotically optimal bandwidth is derived for non-smooth densities. Let $D$ be the set of discontinuity points of $f$; then the asymptotically optimal bandwidth is given by

$$h_n^{\mathrm{opt}} = \left( \frac{\int K(t)^2\,dt}{\int_0^\infty \left(\int_t^\infty K(u)\,du\right)^2 dt} \right)^{\frac{1}{2}} \left( \sum_{d \in D} \left(f(d+) - f(d-)\right)^2 \right)^{-\frac{1}{2}} n^{-\frac{1}{2}}, \qquad (1.27)$$

where $f(d+) = \lim_{h \downarrow 0} f(d+h)$ and $f(d-) = \lim_{h \downarrow 0} f(d-h)$. So we conclude that for smooth densities the optimal bandwidth is of order $n^{-\frac{1}{5}}$, and for discontinuous densities it is of order $n^{-\frac{1}{2}}$. In practice, for finite samples, the optimal bandwidth has to be approximated. There are many methods of finding a suitable bandwidth; in the simulations later on in this thesis we will use the Sheather and Jones bandwidth selection procedure [11].

Now that we have defined the measures by which we can judge the 'quality' of a density estimator, and know what the theoretically optimal bandwidths are, we can move on to the different methods of improving the estimate found by kernel estimation.

1.4. Decreasing Densities

When a probability density function is monotonically decreasing, the lower bound of the interval on which it is positive has to exist; otherwise the function could never be a probability density, for it would not integrate to unity. So the density function takes positive (non-zero) values on an interval $[a_l, \infty)$ and equals zero on the interval $(-\infty, a_l)$. Taking $a_l = 0$, as we do from now on, this means that a monotonically decreasing density has to have a discontinuity at $x = 0$. However, most kernel estimators will not correct for this discontinuity and will take positive values on the interval where the target density equals zero. Especially at this boundary, the density estimate will be inaccurate and not even consistent. The problems induced by such a boundary are called boundary effects (or boundary constraints); kernel estimators that do not take these boundary constraints into account we will call unconstrained estimators. In this section we will discuss the reflection method, a method that corrects the kernel density estimate for these boundary effects.

1.4.1. Reflection Method

In the previous sections, the kernel density method was introduced as a method to estimate a target density that satisfies certain smoothness criteria. For example, in the derivation of the AMISE (1.22) we assumed the target function $f$ to have a continuous second derivative on the whole real line. In many cases densities are used that do not satisfy these conditions. A common example of such a density is the exponential-$\lambda$ density, $f(x) = \lambda e^{-\lambda x}\mathbf{1}\{x \geq 0\}$, which has an obvious discontinuity at $x = 0$ and takes positive values only on the positive real line. From figure 1.5, in which an exponential-1 density is estimated by a kernel estimator with an Epanechnikov kernel from a sample of $n = 100$ observations (sample 2 in Appendix C.2), it becomes clear that the estimator gives too little weight to the points close to the boundary on the positive x-axis and too much (namely, any) to points on the negative axis.

Figure 1.5.: Kernel density estimate with Epanechnikov kernel with bandwidth $h = 0.7$ (dashed line) of an exponential-1 density (solid line).

So such boundary effects influence the performance of the estimator near the boundary [8], [12]. To quantify this, suppose $f$ is a density with $f(x) = 0$ for $x < 0$ and $f(x) > 0$ for $x \geq 0$, which has a continuous second derivative away from $x = 0$. Let $\hat f$ be a kernel estimator of $f$ based on a kernel $K$ with support $[-1, 1]$ and bandwidth $h$. We express a point $x$ as $x = ch$, $c \geq 0$. With the same change of variable as in equation (1.14) we find for the expectation of $\hat f$

$$E\hat f(x) = \int_{-1}^{c} K(t) f(x-ht)\,dt. \qquad (1.28)$$

Now if $c \geq 1$, that is $x \geq h$, we saw in equation (1.18) that

$$E\hat f(x) = f(x) + h^2 b^{(2)}(x) + O(h^4), \qquad (1.29)$$

for a kernel of order 2. If on the other hand $0 \leq c < 1$, we find after Taylor expansion of $f(x-ht)$ around $x$

$$E\hat f(x) = f(x) \int_{-1}^{c} K(t)\,dt + o(h). \qquad (1.30)$$

Because in general $\int_{-1}^{c} K(t)\,dt \neq 1$, $\hat f$ is not consistent in points close to $x = 0$. By the assumption that the kernel $K$ is symmetric, we find $E\hat f(0) = \frac{1}{2} f(0) + o(h)$ at the boundary $x = 0$. Intuitively, this happens because the kernel estimate is a smooth function on the whole real line, while the target density $f$ is not continuous at $x = 0$.

To create a consistent kernel estimator near the boundary, multiple methods exist. In this section we take a closer look at the reflection method, as introduced by Schuster (1985) [1]. All observations are mirrored in $x = 0$: to the set of observations $X_1, \ldots, X_n$, the observations $-X_1, \ldots, -X_n$ are added. Then the kernel estimator is applied to this new set of observations for $x \in [0, \infty)$. This method can also be written as a kernel estimator on $X_1, \ldots, X_n$ for $x \in [0, \infty)$:

$$\hat f_R(x) = \frac{1}{nh} \sum_{i=1}^{n} \left\{ K\!\left(\frac{x - X_i}{h}\right) + K\!\left(\frac{x + X_i}{h}\right) \right\}. \qquad (1.31)$$

Note that $\hat f_R(x)$ integrates to 1 on the positive real axis, so it is a density. If we apply the reflection method to the same set of observations that was used in figure 1.5, this results in figure 1.6.

At first sight $\hat f_R(x)$ already is a better estimator of the target density $f$ near the boundary, and it no longer gives any weight to negative values of $x$. It also results in an asymptotically unbiased estimate of $f$ at the boundary. Following the same steps as before,

$$\begin{aligned}
E\hat f_R(x) &= E\,\frac{1}{nh} \sum_{i=1}^{n} \left\{ K\!\left(\frac{x - X_i}{h}\right) + K\!\left(\frac{x + X_i}{h}\right) \right\} = E\,\frac{1}{h} K\!\left(\frac{x-X}{h}\right) + E\,\frac{1}{h} K\!\left(\frac{x+X}{h}\right) \\
&= \int \frac{1}{h} K\!\left(\frac{x-y}{h}\right) f(y)\,dy + \int \frac{1}{h} K\!\left(\frac{x+y}{h}\right) f(y)\,dy \\
&= \int K(t) f(x-ht)\,dt + \int K(t) f(-x+ht)\,dt, \qquad (1.32)
\end{aligned}$$

where in the first integral the change of variable $y = x - ht$ is applied, and in the second integral $y = -x + ht$. If we again express a point $x$ close to the boundary as $x = ch$, $c \in [0, 1)$, and consider that $f(x) = 0$ for $x < 0$ and $K(t) = 0$ for $t < -1$ and $t > 1$, we find


Figure 1.6.: Reflection method with Epanechnikov kernel with bandwidth $h = 0.7$ (dashed line) of an exponential-1 density (solid line).

$$\begin{aligned}
E\hat f_R(x) &= \int_{-1}^{c} K(t) f(x-ht)\,dt + \int_{c}^{1} K(t) f(-x+ht)\,dt \\
&= f(x) \int_{-1}^{c} K(t)\,dt + f(-x) \int_{c}^{1} K(t)\,dt + o(h), \qquad (1.33)
\end{aligned}$$

using Taylor expansions of $f(x-ht)$ around $x$ and of $f(-x+ht)$ around $-x$. So for $x = 0$ the expectation becomes $f(0) + o(h)$, and we see that the estimator $\hat f_R(x)$ is asymptotically unbiased at the boundary. To achieve a smaller bias, of order $O(h^2)$ near the boundary, the generalized reflection method was introduced by Karunamuni and Alberts (2006) [13], but this is beyond the scope of this thesis.

1.5. Monotone Rearrangement

In some cases there is additional information about the target density. Suppose it is known that the probability density is a monotone function; it is then desirable to use this extra information to find a density estimate that matches the original density better. Clearly, the kernel estimation method discussed in the previous sections does not take this information into account. In this section monotone rearrangement will be introduced: a method that adapts the estimate obtained by kernel estimation so that it respects the given monotonicity of the target density.

21

Page 22: Nonparametric Density Estimation and Monotone Rearrangement · In nonparametric density estimation, the method of kernel estimators is commonly used. However, if extra information

In this section, unless stated otherwise, $f$ will be the target density and $\hat f$ the not necessarily monotone estimate of $f$, for example generated by a kernel method as discussed previously.

1.5.1. Introducing Monotone Rearrangement

Let $U$ be uniformly distributed on a compact interval $A = [a_l, a_u]$ in $\mathbb{R}$, and $g : A \to \mathbb{R}$ a strictly increasing and differentiable function. Then the distribution function $F_g(y)$ of $g(U)$ is proportional to [14]

$$F_g(y) = \int_{a_l}^{a_u} \mathbf{1}\{g(u) \leq y\}\,du + a_l = g^{-1}(y), \qquad y \in g([a_l, a_u]). \qquad (1.34)$$

In the case where $g$ is not a strictly increasing function, the distribution function is still given by the integral above, and we define the isotone increasing rearrangement as the generalized inverse of this distribution function:

$$g^*(x) = \inf\{y \in \mathbb{R} \mid F_g(y) \geq x\}. \qquad (1.35)$$

In other words, $g^*$ is the quantile function of $F_g$, and $F_g(y)$ is the measure of the set on which $g(u) \leq y$ [3].

In a similar way the isotone decreasing rearrangement can be defined, with distribution function

$$F_g(y) = \int_{a_l}^{a_u} \mathbf{1}\{g(u) \geq y\}\,du + a_l, \qquad y \in g([a_l, a_u]), \qquad (1.36)$$

and decreasing rearrangement

$$g_*(x) = \inf\{y \in \mathbb{R} \mid F_g(y) \leq x\}. \qquad (1.37)$$

This reasoning is valid for every compact interval $A$ in $\mathbb{R}$, so for convenience $A = [0, 1]$ is used from now on.

In section 1.5.2 it is shown that the rearrangement of $\hat f$, as defined in (1.35) and (1.37), always gives a better approximation of the monotone target density when applied to a non-monotone estimate.

To illustrate the principle of monotone rearrangement, a simple example is studied. The method of monotone rearrangement as described above is applied to the function

$$g(x) = (2x-1)^2, \qquad x \in [0, 1]. \qquad (1.38)$$

Clearly, this function is not monotone. Using definitions (1.34) and (1.35), the increasing monotone rearrangement of the function $g$ is given by


$$g^*(x) := \inf\left\{ y \in \mathbb{R} \;\middle|\; \int_0^1 \mathbf{1}\left\{(2u-1)^2 \leq y\right\} du \geq x \right\}. \qquad (1.39)$$

So the rearrangement sorts the values of $g$ on $[0, 1]$ in ascending order. The result is plotted in figure 1.7.

Figure 1.7.: Increasing monotone rearrangement (dashed line) applied to the function $g(x) = (2x-1)^2$ (solid line).
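Numerically, this sorting view (anticipating the implementation used in section 2.2) reproduces figure 1.7 directly; a sketch, with an arbitrarily chosen grid size:

% Increasing rearrangement of g(x) = (2x-1)^2 on [0,1] as a sorting operation.
u     = linspace(0, 1, 1000);     % fine equidistant grid on [0, 1]
gu    = (2*u - 1).^2;             % function values of g
gstar = sort(gu, 'ascend');       % increasing rearrangement (1.35)
% plot(u, gu, u, gstar, '--')     % compare with figure 1.7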

In the same way, using definition (1.37), the decreasing rearrangement is given by

$$g_*(x) = \inf\left\{ y \in \mathbb{R} \;\middle|\; \int_0^1 \mathbf{1}\left\{(2u-1)^2 \geq y\right\} du \leq x \right\}, \qquad (1.40)$$

resulting in figure 1.8.

In section 1.4.1 we used the reflection method to improve the kernel estimator. From figure 1.6 it is clear that the resulting estimate is not a monotone decreasing function, while the target density is. Applying the isotone decreasing rearrangement as defined in (1.37) to the reflected estimate results in figure 1.9.

If we plot the reflection estimate and its monotone rearrangement together and zoom in on the part around $x = 2$ where the reflection estimate is not monotone, we see how the monotone rearrangement method 'rearranges' the function values to find a monotone estimate, as shown in figure 1.10.

Figure 1.8.: Decreasing monotone rearrangement (dashed line) applied to the function $g(x) = (2x-1)^2$ (solid line).

Figure 1.9.: Decreasing monotone rearrangement applied to the reflection method with Epanechnikov kernel with bandwidth $h = 0.7$ (dashed line) of an exponential-1 density (solid line).

As we just saw, the monotone rearrangement method monotonizes the kernel density estimate to achieve a better estimate of the monotone target density. However, the rearrangements defined by equations (1.35) and (1.37) will not always be continuously differentiable functions. For example, if we look at the function $f(x) = x + \frac{1}{4}\sin(4\pi x)$ and its isotone rearrangement, as depicted in figure 1.11 below, we clearly see that the rearrangement is not everywhere continuously differentiable. In some cases, however, it


Figure 1.10.: Decreasing monotone rearrangement (dotted line) applied to the reflection method with Epanechnikov kernel (dashed line) of an exponential-1 density (solid line).

might be necessary to find a differentiable estimator of $f$. To find an everywhere differentiable increasing rearrangement, the indicator function in the distribution function (1.34) can be approximated by a kernel in the following way [14]. Let $K_d$ be a positive kernel of order 2 with compact support $[-1, 1]$ and $h_d$ the corresponding bandwidth. Then

$$F_{g,h_d}(y) = \frac{1}{h_d} \int_0^1 \int_{-\infty}^{y} K_d\!\left(\frac{g(u) - v}{h_d}\right) dv\,du \qquad (1.41)$$

is a smoothed version of the distribution function, and

$$g^*_{h_d}(x) = \inf\{y \in \mathbb{R} \mid F_{g,h_d}(y) \geq x\} \qquad (1.42)$$

is called the smoothed increasing rearrangement of $g$. In the same way as before, the decreasing rearrangement can be defined via the smoothed distribution function

$$F_{g,h_d}(y) = \frac{1}{h_d} \int_0^1 \int_{y}^{\infty} K_d\!\left(\frac{g(u) - v}{h_d}\right) dv\,du, \qquad (1.43)$$

and becomes

$$g_{*,h_d}(x) = \inf\{y \in \mathbb{R} \mid F_{g,h_d}(y) \leq x\}. \qquad (1.44)$$

Figure 1.11.: Increasing isotone rearrangement (dashed line) applied to the function $f(x) = x + \frac{1}{4}\sin(4\pi x)$ (solid line).

For sufficiently small $h_d$, the smoothed rearrangements $g^*_{h_d}$ and $g_{*,h_d}$ of an unconstrained probability density are still probability densities, and they converge pointwise to the isotone rearrangements $g^*$ and $g_*$ respectively, as proved in Birke (2009) [14] (see Appendix A, theorem A.1, for the proof).

As pointed out in Neumeyer (2007) [15], the smoothed and the isotone rearrangement share the same rate of convergence. Depending on the properties required of the estimator, one of these methods can be chosen. The smoothed rearrangement is preferred if one requires a smooth estimator; for the isotone rearrangement, on the other hand, no bandwidth $h_d$ has to be chosen, and flat parts of $g$ are better reflected by $g^*$. In the rest of this thesis and in the simulations, we choose the isotone rearrangement rather than the smoothed rearrangement.

1.5.2. Properties of Monotone Rearrangement

The method of monotone rearrangement has promising properties. Bennett and Sharpley (1988) [16] showed that the monotone rearrangement $f^*$ of a function $f$ has the same $L^p$-norm, $p \in [1, \infty)$, as the function $f$: $\|f^*\|_p = \|f\|_p$. Chernozhukov et al. (2009) [2] proved that monotone rearrangement of an unconstrained estimate $\hat f$ of the target function decreases the estimation error in the $L^p$-norm, that is, $\|\hat f^* - f\|_p \leq \|\hat f - f\|_p$ (see Appendix A, theorem A.3, for a proof). Taking $p = 2$, this implies that the MISE of the rearranged estimate will be smaller than that of the original unconstrained estimate. Neumeyer (2007) [15] stated that the smoothed rearrangement of an estimator $\hat f$ is pointwise consistent if $\hat f$ is pointwise consistent. Furthermore, she showed that the asymptotic behaviour of the rearrangement is the same as that of the original unconstrained estimator.

The above definitions of monotone rearrangement were only introduced for densities with a bounded support. For many purposes, as is the case with our example of the exponential-1 density, the support of the target density is unbounded. For decreasing densities it will often be of the form $A = [a_l, \infty)$ and for increasing densities of the form $A = (-\infty, a_u]$. Dette and Volgushev (2008) [17] showed that in these cases the rearrangements can be defined on an unbounded support and that the asymptotic behaviour of the rearrangement is the same as described above. Hence, we can safely use monotone rearrangement in the estimation of the exponential-1 density.


2. Results

2.1. Comparing Methods

The main point of this thesis is to investigate whether improving the kernel estimator by the reflection method, as introduced in section 1.4.1, before applying the method of monotone rearrangement, will result in a better estimate of a monotone target density.

In the last section we saw that the asymptotic properties of the monotone rearrangement of an estimate are the same as those of the original estimate. So since the reflection method produces a consistent estimate of a density function with a discontinuity at $x = 0$, applying monotone rearrangement to this estimate will also produce a consistent estimate sharing the same asymptotic properties. In practice, however, one does not encounter samples of infinite size. It is therefore valuable to look at the behaviour of the MISE of the methods for samples of smaller sizes. To measure the performance of the methods, simulations have been run in Matlab with different sample sizes. The density we will look at is the exponential-1 density we also used in section 1.4.1: $f(x) = e^{-x}\mathbf{1}_{[0,\infty)}(x)$.

2.2. Simulations

In the simulations run in this thesis, we used a Sheather and Jones bandwidth selector. This selector was originally designed by Sheather and Jones (1991) [11] for smooth densities. In our case, we applied it to the exponential-1 density, which has a discontinuity at $x = 0$. Van Es and Hoogstrate (1997) [18] showed that the Sheather and Jones bandwidth selection method adapts slightly to the non-smoothness of the target density. The bandwidths produced by this method are of smaller order than $n^{-\frac{1}{5}}$ (1.26), the optimal rate for smooth densities, but the optimal rate of $n^{-\frac{1}{2}}$ (1.27) for densities with discontinuities is not reached. However, they state that the Sheather and Jones bandwidth still performs well in smaller sample size applications, which is why we chose this bandwidth selection method. We used a Matlab script published by Dynare [19], 'mh optimal bandwidth', which we adapted slightly to fit our needs (e.g. removed options that were unnecessary in our application, to speed up the selection process and reduce the number of input variables). The kernel we used in the simulations is a gaussian kernel, since the Sheather and Jones bandwidth selection method requires an at least six times differentiable kernel and the Epanechnikov kernel is not continuously differentiable.

Simulations were run on samples of different sizes, $n = 50, 100, 500, 1000$. For each sample size, 1000 samples were generated, to which the kernel estimation methods were applied. The estimated densities were used to approximate the MSE and MISE of each method, where the average of the estimated function values approximates the expectation of the estimator and the sample variance approximates its variance. Furthermore, to calculate the MISE from the MSE, the composite Simpson's rule is used to approximate the integral from the numerical values.
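A minimal sketch of the composite Simpson's rule as used for this last step; the function name is chosen here for illustration, and the grid of MSE values is assumed to be equidistant with an odd number of points:

% Composite Simpson's rule for pointwise values y on an equidistant grid
% with spacing dx; numel(y) must be odd.
function I = simpson(y, dx)
    n = numel(y);
    w = ones(1, n);
    w(2:2:n-1) = 4;                 % Simpson weights 1,4,2,4,...,2,4,1
    w(3:2:n-2) = 2;
    I = dx / 3 * (w * y(:));
end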

From each sample, the density was estimated by a 'pure' kernel estimator as in definition 1.1 with a gaussian kernel (second column of table 2.1), the decreasing isotone rearrangement of this kernel estimate (third column), the reflection method applied to the sample (fourth column) and the decreasing isotone rearrangement applied to the reflection method (fifth column). After the calculation of the estimates, the MSE and the MISE of the estimators were calculated. We note that for the smallest sample size, $n = 50$, the values of the MISE can vary when the simulations are repeated; for the larger sample sizes the mean integrated square errors are stable under repetition of the simulation.

We tried several ways to implement the monotone rearrangement algorithm. An implementation literally following the definition in equation (1.37) turned out to be inaccurate if the simulations had to be run within reasonable calculation time. Another method of computing the rearranged estimate was proposed in Chernozhukov et al. (2009) [2]: using quantiles of the set of numerical function values as the rearrangement. This method proved to be far more accurate in terms of MISE and required significantly less calculation time. The most accurate and efficient method, however, also proposed in the same paper [2] and in Anevski and Fougères (2008) [20], is implementing the rearrangement as a sorting operator. If the function values of $\hat f$, the function we want to apply the rearrangement to, are calculated on a fine enough grid of equidistant points, the decreasing rearrangement can be computed simply by sorting these function values in decreasing order. This is therefore the method we used to calculate the mean integrated square errors of the rearrangements.
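In MATLAB this amounts to a single sort call; a sketch, assuming fvals holds the unconstrained estimate evaluated on a fine equidistant grid:

% Decreasing rearrangement as a sorting operator ([2], [20]).
frearr = sort(fvals, 'descend');   % decreasing monotone rearrangement of the estimate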

For the reproducibility of these results, one can find the full Matlab code used for thesimulations in Appendix B.4.

n      MISE      MISE_rearr   MISE_refl   MISE_refl+rearr
50     0.1235    0.0813       0.0232      0.0231
100    0.1065    0.0655       0.0156      0.0156
500    0.0706    0.0334       0.0052      0.0052
1000   0.0585    0.0243       0.0032      0.0032

Table 2.1.: The MISE for the different methods of improving the kernel estimate, for samples of size n.

The mean square errors of the simulations for sample size $n = 1000$ are shown in figure 2.1, where it becomes clear that all methods have comparable performance away from the boundary $x = 0$. The difference in the performance of these methods lies, as we would expect, close to the boundary. A close-up of this area is shown in the second graph of figure 2.1.

Figure 2.1.: In the estimation of the exponential-1 density, the MSE of the 'pure' kernel estimator (solid line), its monotone rearrangement (dashed line), the reflection method (dotted line) and its monotone rearrangement (dotted-dashed line). Note that the dotted line and the dotted-dashed line are indistinguishable in these plots. The lower graph is a close-up of the area close to the boundary x = 0.


3. Conclusion

As we saw from table 2.1, for each sample size $n$, applying the decreasing rearrangement after the reflection method has no, or only a very small, effect. One could think the MISE is reduced in the simulation with sample size $n = 50$, but because the variation in the MISE for this sample size is relatively big, the small improvement is not significant. What does become clear from table 2.1 is that both the monotone rearrangement method and the reflection method significantly reduce the MISE. It also shows that in terms of reduction of the MISE, the reflection method is preferred over monotone rearrangement. In terms of computation time, however, the monotone rearrangement algorithm performs better than the reflection method. The time necessary for rearranging a found estimate is negligible compared to the computation time necessary for kernel estimation. For example, for a sample of size $n = 1000$, unconstrained kernel estimation plus monotone rearrangement takes roughly 0.5 seconds, while the reflection method takes somewhat more than 1 second to compute. Rearranging the reflected estimate takes negligible time, but on the other hand does not improve the estimate significantly. Therefore, for extremely large samples it might be more desirable to use the monotone rearrangement algorithm than the reflection method.

The conclusion we can draw from the data in table 2.1 is that combining the method of monotone rearrangement with the reflection method does not significantly improve the estimate relative to applying the reflection method alone.


4. Discussion and Review of the Progress

Several remarks can be made about the simulations run in this thesis. First of all, it should be stressed that the sample size $n = 50$ is too small to draw any conclusions from, for the MISE varies a lot when the estimation procedure is repeated on a different set of samples. Secondly, more monotone probability densities should be investigated to confirm the answer to the research question more firmly. And thirdly, the Sheather and Jones bandwidth selector we used to estimate the target density is not optimal for discontinuous densities, as was stated in section 1.3. Other bandwidth selection methods should therefore be investigated to improve the density estimates. For example, for larger sample sizes, one could use the least squares cross-validation bandwidths discussed in Stone (1984) [21]. These bandwidths are asymptotically equivalent to the optimal bandwidths, even in the non-smooth cases [18]. This might be the reason why the accuracy of the unconstrained kernel estimator is lower than in [14], where a different bandwidth selector was used. Since the monotone rearranged estimator is based on this less accurate kernel estimator, the monotone rearrangement will also be less accurate, so the mean integrated square errors are not comparable to those from [14].

In this thesis, we started out with the introduction of kernel estimators and some basic properties of this method. Then we pointed out the boundary effects that occur when a target density with a discontinuity at $x = 0$ is estimated. As a solution to the loss of consistency, the reflection method was discussed. Monotone rearrangement, a method rearranging function values in ascending or descending order, was proposed for the case where a monotonic density is to be estimated. Simulations in Matlab were run to compare the reflection method and the method of monotone rearrangement separately and applied together.

Review of the Progress

Taking on a project extending over such a long period of time was new for me, and I learned a lot in the process. Looking back on the last few months, there are a lot of things I could have done better, but also a lot of things that worked out well. The most important lesson I learned in these months is that I should not hesitate to approach my supervisor. While reading the articles I used for this thesis, there were moments where I was struggling a lot with minor details, which in some cases cost me days. However, a simple explanation by my supervisor, sometimes just a clarification of a definition, was enough to get me back on track.

After the introduction to the subject of nonparametric density estimation and monotone rearrangement by Dr. Van Es, I started familiarizing myself with the subject. There were a lot of books at hand with a proper introduction to kernel estimation, so getting acquainted with this principle was straightforward. However, since the method of monotone rearrangement was only recently introduced in statistics, I had to get all the information on this subject out of articles that are not written for undergraduate students who are new to this field of research. This made it harder to fully understand the method and work with it. It therefore took me more time than I had expected, especially because most of the time while I was working on this project I still had to follow courses, so I had to divide my attention. Only when the courses ended could this project finally get my full attention, and in those weeks the real progress was made. Furthermore, simulating in Matlab was relatively new to me, but fortunately it did not take me long to gain experience in the programming, and I really enjoyed it.

Personally, I found the conclusion rather surprising, for I had expected that monotone rearrangement would perform better in combination with the reflection method. If I had had more time for this project, I would have liked to find out why the combined performance of these methods is relatively poor. Also, I would have liked to take a look at another method of nonparametric density estimation, the method of nonparametric maximum likelihood estimators (NPMLE), and measure its performance relative to the kernel estimators. On top of that, I would have liked to learn more about asymptotic statistics, to find out more about the asymptotic properties of different estimation methods. Luckily, I will get this chance next year during my master's.

To conclude, I would like to thank Dr. Van Es for introducing me to this subject and for his support during the process of writing this thesis.


5. Populaire Samenvatting (Popular Summary)

The subject of this bachelor thesis is the estimation of probability densities, and various methods to improve such an estimate. To fully understand this, one of course needs to know what a probability density is.

Put simply, a probability density is a function that assigns to every event the probability of its occurrence. In the case where there are finitely many possibilities, this is even the definition of a probability density. To clarify these notions, we will explore them further by means of an example. Suppose we would like to be able to predict the length of an eruption of a certain geyser. To do so, we stand next to the geyser with a stopwatch for a month, and every time it erupts we measure how many minutes the eruption lasts. At the end of the month we have made, say, 50 measurements (we will use the first half of the data set from Appendix C.2 to simulate these data). A frequently used way to represent such data is a histogram. We divide time into bars, and whenever the length of an eruption falls within a bar, we make that bar a little higher. In the end we obtain a histogram in which the height of each bar is the relative frequency of that eruption length, as shown in the left panel of figure 5.1.

Figure 5.1.: Left: histogram of the eruption lengths (first half of the data in Appendix C.2). Right: estimate of the probability density based on the histogram.

If we now think away the bars, keep only their tops and regard this as a curve, as is done in the right part of Figure 5.1, we could interpret this as an estimate of the probability density. The area above an interval then represents the probability that the length of an eruption falls within that interval.

Mathematicians, however, are not entirely satisfied with such an estimate of the probability density, for several reasons. For instance, the fact that the density contains all kinds of jumps is unsatisfactory; we would rather obtain a smooth curve. One method to achieve this is the so-called kernel estimator method. Instead of placing the data points in bins, we place a curve over each data point, which assigns a weight to the point and the region around it. Adding up all these bumps then gives us an estimate of the probability density. Applying this to our data set results in Figure 5.2. In the left part we made the estimate based on 4 measurements and displayed the individual curves for the data points; in the right part the estimate is based on all measurements. We have also plotted the true probability density according to which our data are distributed (the exponential-1 density), to see how well it is approximated. We see that this method, unlike the histogram, does yield a nice, smooth curve, which in all cases will lie closer to the true probability density than our histogram estimate. To optimize this estimate, much can still be gained in the choice of the width of the individual curves, but for now this estimate is illustrative enough.

Figure 5.2.: Left: the kernel estimate of the probability density based on 4 measurements. Right: the kernel estimate of the probability density based on all data (solid line), and the true probability density (dashed line).

Now suppose we have the extra information that the probability of short eruptions is always greater than the probability of long eruptions. When we then estimate the probability density of the eruption lengths, we would of course like to use this information to improve our estimate. A relatively simple, yet only recently discovered method in this field essentially amounts to rearranging the function values of the estimate in the right part of Figure 5.2. The method works as follows: we evaluate the kernel estimate at a large number of equidistant points on the x-axis, and then order these function values from high to low. The function obtained in this way is our new estimate. In our case this yields Figure 5.3, where the true probability density is again plotted in the figure.

Figure 5.3.: The monotone rearrangement of the estimate from the right part of Figure 5.2 (solid line), and the true probability density (dashed line).

At first sight this estimate already looks promising, and the method of monotone rearrangement turns out to have promising theoretical properties as well. In this bachelor thesis, besides this method, another method to improve a kernel estimate is treated. It is then investigated whether combining the two methods improves the estimate significantly. After running some simulations, however, we conclude that this is not the case.


Bibliography

[1] E. F. Schuster, Incorporating support constraints into nonparametric estimators of densities, Communications in Statistics - Theory and Methods, 14, 5, 1123-1136, 1985.

[2] V. Chernozhukov, I. Fernandez-Val and A. Galichon, Improving Point and Interval Estimators of Monotone Functions by Rearrangement, Biometrika, 96, 3, 559-575, 2009.

[3] G. H. Hardy, J. E. Littlewood and G. Polya, Inequalities, Cambridge University Press, 1952.

[4] V. Epanechnikov, Nonparametric Estimation of a Multidimensional Probability Density, Theory of Probability and its Applications, 14, 153-158, 1969.

[5] B. W. Silverman, Density Estimation for Statistics and Data Analysis, Chapman and Hall, 1986.

[6] B. L. S. Prakasa Rao, Nonparametric Functional Estimation, Academic Press, New York, 1983.

[7] P. Whittle, On the Smoothing of Probability Density Functions, Journal of the Royal Statistical Society, Series B, 20, 2, 334-343, 1958.

[8] M. P. Wand, M. C. Jones, Kernel Smoothing, Chapman and Hall, 1995.

[9] E. Parzen, On Estimation of a Probability Density Function and Mode, The Annals of Mathematical Statistics, 33, 1065-1076, 1962.

[10] C. van Eeden, Mean Integrated Squared Error of Kernel Estimators when the Density and its Derivatives are not Necessarily Continuous, Annals of the Institute of Statistical Mathematics, 37, A, 461-472, 1985.

[11] S. J. Sheather, M. C. Jones, A Reliable Data-Based Bandwidth Selection Method for Kernel Density Estimation, Journal of the Royal Statistical Society, Series B, 53, 683-690, 1991.

[12] I. Horova, J. Kolacek, J. Zelinka, Kernel Smoothing in MATLAB, World Scientific, 2012.

[13] R. J. Karunamuni, T. Alberts, A Locally Adaptive Transformation Method of Boundary Correction in Kernel Density Estimation, Journal of Statistical Planning and Inference, 136, 5, 2936-2960, 2006.

[14] M. Birke, Shape Constrained Kernel Density Estimation, Journal of Statistical Planning and Inference, 139, 8, 2851-2862, 2009.

[15] N. Neumeyer, A note on uniform consistency of monotone function estimators, Statistics & Probability Letters, 77, 693-703, 2007.

[16] C. Bennett, R. C. Sharpley, Interpolation of Operators, Academic Press, New York, 1988.

[17] H. Dette, S. Volgushev, Non-crossing nonparametric estimates of quantile curves, Journal of the Royal Statistical Society, Series B, 70, 3, 609-627, 2008.

[18] A. J. Van Es, A. J. Hoogstrate, How much do plug-in bandwidth selectors adapt to non-smoothness?, Journal of Nonparametric Statistics, 8, 2, 185-197, 1997.

[19] http://www.dynare.org/dynare-matlab-m2html/matlab/mh_optimal_bandwidth.html

[20] D. Anevski, A. Fougeres, Limit Properties of the Monotone Rearrangement for Density and Regression Function Estimation, arXiv:0710.4617 [math.ST], preprint, 2008.

[21] C. J. Stone, An Asymptotically Optimal Window Selection Rule for Kernel Density Estimates, Annals of Statistics, 12, 4, 1285-1297, 1984.

[22] D. Pollard, A User's Guide to Measure Theoretic Probability, Cambridge University Press, 2002.

[23] A. W. Van Der Vaart, Asymptotic Statistics, Cambridge University Press, 1998.


A. Theorems and Proofs

The following theorem, as stated in Birke (2009) [14], proves that the smoothed rearrangements $g^*_{h_d}$ (increasing) and $g_{*h_d}$ (decreasing) of an unconstrained probability density are still probability densities, and converge pointwise to the isotone rearrangements $g^*$ and $g_*$ respectively. We will here elaborate on the proof from Birke, to make it readable for a third-year bachelor mathematics student. The theorem below states the result for the increasing rearrangement. The proof for the decreasing case is similar, and therefore omitted.

Theorem A.1. Let $D$ be the set of all probability densities on $A$, a compact interval in $\mathbb{R}$. For any unconstrained continuous probability density $g \in D$, the smoothed rearrangement $g^*_{h_d}$ of $g$ converges pointwise to $g^* \in D$, if $h_d \to 0$.

Proof. We will first show that $F_{g,h_d}(t)$ converges to $F_g(t)$ as $h_d \to 0$. Without loss of generality, we choose $A = [0, 1]$. Then, from the definition of $F_{g,h_d}(t)$,
\[
F_{g,h_d}(t) := \frac{1}{h_d} \int_0^1 \int_{-\infty}^{t} K_d\!\left(\frac{g(u) - v}{h_d}\right) dv \, du.
\]
Because the kernel $K_d$ is 0 outside $[-1, 1]$,
\[
K_d\!\left(\frac{g(u) - v}{h_d}\right) = 0 \quad \text{if } v \leq g(u) - h_d,
\]
we can change the lower bound in the second integral to $g(u) - h_d$, and because the second integral is zero if $g(u) > t + h_d$, we get
\begin{align*}
\frac{1}{h_d} \int_0^1 \int_{-\infty}^{t} K_d\!\left(\frac{g(u) - v}{h_d}\right) dv \, du
&= \frac{1}{h_d} \int_0^1 \mathbf{1}\{g(u) \leq t + h_d\} \int_{g(u)-h_d}^{t} K_d\!\left(\frac{g(u) - v}{h_d}\right) dv \, du \\
&= \int_0^1 \mathbf{1}\{g(u) \leq t + h_d\} \int_{(g(u)-t)/h_d}^{1} K_d(z) \, dz \, du, \tag{A.1}
\end{align*}
by the change of variable $v = g(u) - h_d z$. Now, considering that $K_d$ integrates to unity over $[-1, 1]$ and separating the cases $(g(u)-t)/h_d \leq -1$ (the second integral equals 1) and $(g(u)-t)/h_d > -1$, we can write
\begin{align*}
\int_0^1 \mathbf{1}\{g(u) \leq t + h_d\} &\int_{(g(u)-t)/h_d}^{1} K_d(z) \, dz \, du \\
&= \int_0^1 \mathbf{1}\{g(u) \leq t + h_d\}\, \mathbf{1}\{g(u) \leq t - h_d\} \, du + \int_0^1 \mathbf{1}\{t - h_d \leq g(u) \leq t + h_d\} \int_{(g(u)-t)/h_d}^{1} K_d(z) \, dz \, du \\
&= \int_0^1 \mathbf{1}\{g(u) \leq t - h_d\} \, du + \int_0^1 \mathbf{1}\{t - h_d \leq g(u) \leq t + h_d\} \int_{(g(u)-t)/h_d}^{1} K_d(z) \, dz \, du. \tag{A.2}
\end{align*}

Thus, by the definition of $F_g(t)$,
\[
F_g(t) := \int_0^1 \mathbf{1}\{g(u) \leq t\} \, du,
\]
we get
\begin{align*}
|F_g(t) - F_{g,h_d}(t)| &= \left| \int_0^1 \mathbf{1}\{t - h_d \leq g(u) \leq t\} \, du - \int_0^1 \mathbf{1}\{t - h_d \leq g(u) \leq t + h_d\} \int_{(g(u)-t)/h_d}^{1} K_d(z) \, dz \, du \right| \\
&\leq \int_0^1 \mathbf{1}\{t - h_d \leq g(u) \leq t\} \, du + \int_0^1 \mathbf{1}\{t - h_d \leq g(u) \leq t + h_d\} \, du \longrightarrow 0
\end{align*}

as $h_d \to 0$, using the uniform continuity of $g$. Since $g^*$ and $g^*_{h_d}$ are the generalized inverses of $F_g$ and $F_{g,h_d}$ respectively, the continuity of the functional which maps a function onto its inverse at a fixed point $x$ implies
\[
g^*_{h_d}(x) \longrightarrow g^*(x) \quad \text{as } h_d \longrightarrow 0. \tag{A.3}
\]

So we have shown the pointwise convergence. To conclude the proof, it remains to show that $g^*$ is still a density. Since by definition the monotone rearrangement only rearranges the function values in increasing order and $g \geq 0$, we find $g^* \geq 0$. Furthermore, because the monotone rearrangement of a function preserves the $L^p$-norm, as shown by Bennett and Sharpley (1988) [16], taking $p = 1$ we get
\[
\int_A g^*(x) \, dx = \|g^*\|_1 = \|g\|_1 = \int_A g(x) \, dx = 1. \tag{A.4}
\]
Thus we conclude that $g^* \in D$, which proves the theorem.
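As a quick numerical illustration of (A.4) in Matlab (a minimal sketch, not part of the thesis code; the grid and the example density below are arbitrary choices), the decreasing rearrangement of a discretized density is nonnegative and its Riemann sum is unchanged:

% numerical check of (A.4): rearranging the values of a discretized
% density preserves nonnegativity and the (approximate) integral;
% the example density below is an arbitrary choice
l = 1000;
x = linspace(0,5,l+1);
d = 5/l;
g = exp(-x).*(1 + 0.5*sin(5*x));   % nonnegative, non-monotone example
g = g/(sum(g)*d);                  % normalize to integrate to (about) 1
g_rearr = -sort(-g);               % decreasing rearrangement of the values
min(g_rearr)                       % nonnegative
[sum(g)*d, sum(g_rearr)*d]         % both approximately 1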


The following theorem, as described by Chernozhukov et al. (2009) [2], implies that the increasing rearrangement as defined in (1.35) will be a better approximation of the target function in the standard norm on $L^p$: for a measurable function $f: A \to K$,
\[
\|f\|_p = \left( \int_A |f(x)|^p \, dx \right)^{1/p}.
\]
To prove the theorem, we will first need the following definition.

Definition A.2. A function $L: \mathbb{R}^k \to \mathbb{R}$ is called submodular if
\[
L(x \uparrow y) + L(x \downarrow y) \leq L(x) + L(y), \tag{A.5}
\]
for all $x, y \in \mathbb{R}^k$, where $x \uparrow y$ denotes the componentwise maximum and $x \downarrow y$ denotes the componentwise minimum of $x$ and $y$.
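As a quick numerical sanity check of this definition (a minimal sketch, not part of the thesis code), one can verify in Matlab that $L(x, y) = |x - y|^p$, the function used below, satisfies (A.5) on randomly drawn pairs in $\mathbb{R}^2$:

% numerical check that L(x,y) = |x-y|^p is submodular on R^2
p = 1.5;
L = @(z) abs(z(1) - z(2))^p;
violations = 0;
for trial = 1:10000
    x = randn(1,2); y = randn(1,2);
    up = max(x,y); down = min(x,y);    % componentwise max and min
    if L(up) + L(down) > L(x) + L(y) + 1e-12
        violations = violations + 1;
    end
end
violations                             % equals 0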

Below, we formulate the theorem by Chernozhukov et al. and elaborate on the proof. The proof deals with the increasing rearrangement; the decreasing case is similar and therefore omitted.

Theorem A.3. Let $A$ be a compact interval in $\mathbb{R}$ and let the target function $f: A \to K$ be a weakly increasing measurable function in $x$, where $K$ is a bounded subset of $\mathbb{R}$. Let $\hat{f}: A \to K$ be a measurable estimate of the target function.

1. For any $p \in [1, \infty)$, applying monotone rearrangement to $\hat{f}$, to get $\hat{f}^*$, weakly reduces the estimation error:
\[
\|\hat{f}^*(x) - f(x)\|_p \leq \|\hat{f}(x) - f(x)\|_p. \tag{A.6}
\]

2. If there exist regions $A_0$ and $A_1$, each with measure greater than $\delta > 0$, such that for all $x \in A_0$ and $x' \in A_1$: (i) $x' > x$, (ii) $\hat{f}(x) > \hat{f}(x') + \epsilon$, and (iii) $f(x') > f(x) + \epsilon$, for some $\epsilon > 0$, then, for any $p \in [1, \infty)$:
\[
\|\hat{f}^*(x) - f(x)\|_p \leq \left[ \|\hat{f}(x) - f(x)\|_p^p - \delta \eta_p \right]^{1/p}, \tag{A.7}
\]
where $\eta_p = \inf \{ |v - t'|^p + |v' - t|^p - |v - t|^p - |v' - t'|^p \}$ and $\eta_p > 0$ for $p \in (1, \infty)$, with the infimum taken over all $v, v', t, t'$ in the set $K$ such that $v' \geq v + \epsilon$ and $t' \geq t + \epsilon$.

Proof. We will start by proving the first part of the theorem, considering simple functions at first and using the dominated convergence theorem to prove the general case. Suppose both the estimate $\hat{f}(\cdot)$ and the target function $f(\cdot)$ are simple functions that are constant on the intervals $\left( \frac{s-1}{r}, \frac{s}{r} \right]$, $s = 1, \ldots, r$. Each simple function $g(\cdot)$ of this form can be written as the step function $g(\cdot) = \sum_{s=1}^{r} g_s \mathbf{1}_{\left( \frac{s-1}{r}, \frac{s}{r} \right]}(\cdot)$. Now we can define the $r$-vector $g$ as the vector with on position $s$ the value $g_s$ that $g(\cdot)$ takes on the $s$th interval. From this definition we see that each $r$-vector corresponds to a step function.

We now define the sorting operator $S$, acting on the vector $g$, as follows. Let $l$ be an integer in $1, \ldots, r$ such that $g_l > g_m$ for some $m > l$. If such an $l$ exists, set $S(g)$ to be the $r$-vector with $g_m$ on the $l$th position and $g_l$ on the $m$th position, and all other elements equal to the corresponding elements of $g$. If such an $l$ does not exist, set $S(g) = g$.


For any submodular function $L: \mathbb{R}^2 \to \mathbb{R}_+$, because $g_l \geq g_m$ and, by the fact that $f$ is weakly increasing, $f_m \geq f_l$, we get
\[
L(g_m, f_l) + L(g_l, f_m) \leq L(g_l, f_l) + L(g_m, f_m). \tag{A.8}
\]

Therefore, taking into consideration that $L$ applied to simple functions is still a simple function,
\begin{align*}
\int_A L\big(S(\hat{f})(x), f(x)\big) \, dx &= \frac{1}{r} \sum_{s=1}^{r} L\big(S(\hat{f})_s, f_s\big) \tag{A.9} \\
&\leq \frac{1}{r} \sum_{s=1}^{r} L\big(\hat{f}_s, f_s\big) = \int_A L\big(\hat{f}(x), f(x)\big) \, dx, \tag{A.10}
\end{align*}
where the inequality follows from the submodularity of $L$. If we apply the sorting operator $S$ sufficiently many times to $\hat{f}$, we find the rearranged vector $\hat{f}^*$: a vector completely sorted in ascending order. Every application of the sorting operator $S$ reduces the size of the integral, and since we apply it a finite number of times, we get
\[
\int_A L\big(\hat{f}^*(x), f(x)\big) \, dx = \int_A L\big(S \circ \ldots \circ S(\hat{f})(x), f(x)\big) \, dx \leq \int_A L\big(\hat{f}(x), f(x)\big) \, dx. \tag{A.11}
\]

If we now notice that $L(x, y) = |x - y|^p$ is submodular for $p \in [1, \infty)$, we see that this implies $\|\hat{f}^* - f\|_p \leq \|\hat{f} - f\|_p$. Thus, for simple $\hat{f}$ and $f$, the rearranged estimate has an approximation error that is smaller than or equal to the approximation error of the original estimate. We will now extend this inequality to the general case.

Let $\hat{f}(\cdot)$ and $f(\cdot)$ be measurable functions mapping $A$ to $K$. Then there exist sequences of bounded simple functions $\hat{f}^{(r)}(\cdot)$ and $f^{(r)}(\cdot)$, taking values in $K$, converging to $\hat{f}(\cdot)$ and $f(\cdot)$ almost everywhere as $r \to \infty$ [22]. Because of the definition of the rearrangements as quantile functions, the almost everywhere convergence of $\hat{f}^{(r)}(\cdot)$ to $\hat{f}(\cdot)$ and of $f^{(r)}(\cdot)$ to $f(\cdot)$ implies the almost everywhere convergence of their rearrangements: $\hat{f}^{*(r)}(\cdot)$ converges to the rearrangement of the limit, $\hat{f}^*(\cdot)$, and $f^{*(r)}(\cdot)$ to $f^*(\cdot) = f(\cdot)$ [23], [16]. Note that the last equality holds because for the weakly increasing function $f$, the rearrangement $f^*$ equals $f$. Since inequality (A.11) holds along the sequence, the dominated convergence theorem implies that it also holds in the general case. This concludes the proof of part 1 of the theorem.
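Part 1 can be illustrated numerically with a minimal Matlab sketch (arbitrary simulated vectors, not part of the thesis code): for a weakly increasing target vector and any estimate vector on the same grid, sorting the estimate in ascending order never increases the discrete $L^p$ distance to the target:

% Monte Carlo check of (A.6) on discretized functions
r = 100; p = 2; worst = -Inf;
for trial = 1:1000
    f = sort(randn(1,r));            % weakly increasing target
    fhat = f + 0.5*randn(1,r);       % noisy estimate
    e_orig = (sum(abs(fhat - f).^p)/r)^(1/p);
    e_sort = (sum(abs(sort(fhat) - f).^p)/r)^(1/p);
    worst = max(worst, e_sort - e_orig);
end
worst                                % is <= 0 up to rounding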

We now move on to part 2 of the theorem. We start out the same way as in the proof of part 1, with simple functions $\hat{f}$ and $f$. We take these functions to satisfy the conditions of the theorem, that is: there exist regions $A_0$ and $A_1$, each with measure greater than $\delta > 0$, such that for all $x \in A_0$ and $x' \in A_1$: (i) $x' > x$, (ii) $\hat{f}(x) > \hat{f}(x') + \epsilon$, and (iii) $f(x') > f(x) + \epsilon$, for some $\epsilon > 0$. Then, for any strictly submodular function $L: \mathbb{R}^2 \to \mathbb{R}_+$, we find
\[
\eta = \inf \{ L(v', t) + L(v, t') - L(v, t) - L(v', t') \} > 0, \tag{A.12}
\]

with the infimum taken over all $v, v', t, t'$ in the set $K$ such that $v' \geq v + \epsilon$ and $t' \geq t + \epsilon$.

We begin the sorting by exchanging the elements $\hat{f}(x)$, $x \in A_0$, of the $r$-vector $\hat{f}$ with the elements $\hat{f}(x')$, $x' \in A_1$, for as long as this is possible. Each point sorted this way induces a sorting gain of at least $\eta$, by the strict submodularity in equation (A.12), times $1/r$, the length of the interval on which $\hat{f}(x)$ takes the exchanged value. The total mass of points that can be sorted this way is at least $\delta$, since both regions have measure at least $\delta$. So if we sort all these points in this way, the gain is at least $\delta\eta$. If we then continue to sort the remaining points with the sorting operator used in the proof of part 1, we get the inequality
\[
\int_A L\big(\hat{f}^*(x), f(x)\big) \, dx \leq \int_A L\big(\hat{f}(x), f(x)\big) \, dx - \delta\eta. \tag{A.13}
\]
In exactly the same way as in the proof of part 1, we can extend this to the case of general measurable functions. This concludes the proof of the theorem.
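The strict gain in part 2 can also be seen in a small hypothetical Matlab example (not from the thesis): an estimate that inverts the order of an increasing target on two regions is strictly improved by rearrangement:

% illustration of (A.7): the estimate is too high on A0 = {1,...,10}
% and too low on A1 = {91,...,100}; the values chosen are arbitrary
r = 100; p = 2;
f = (1:r)/r;                  % increasing target
fhat = f;
fhat(1:10) = 1;
fhat(91:100) = 0;
e_orig = (sum(abs(fhat - f).^p)/r)^(1/p);
e_sort = (sum(abs(sort(fhat) - f).^p)/r)^(1/p);
[e_orig, e_sort]              % the second value is strictly smaller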


B. MATLAB code

Our target density, the exponential-1 density

function [f] = exppdf(lambda,grid)
% evaluates the exponential(lambda) density in the points of grid
% (note: this shadows the exppdf of the Statistics Toolbox, which is
% parametrized by the mean instead of the rate)
l = length(grid);
f = zeros(1,l);
for i = 1:l
    if grid(i) >= 0
        f(i) = lambda * exp(-lambda*grid(i));
    end
end
end

B.1. Epanechnikov Kernel Estimator

The implementation of the Epanechnikov kernel estimator.

function f = Epan(x)
% the Epanechnikov kernel
f = 0;
if abs(x) <= 1
    f = 0.75*(1 - x^2);
end
end

function y = EpKernel(datapoint,h,grid)
% evaluates the rescaled kernel around datapoint in all grid points
n = length(grid);
y = zeros(1,n);
for i = 1:n
    y(i) = Epan((grid(i)-datapoint)/h);
end
end

function [estimate] = kernestEp(data,grid,h)
% Epanechnikov kernel density estimate with bandwidth h
l = length(grid);
n = length(data);
estimate = zeros(1,l);
for k = 1:n
    estimate = estimate + (EpKernel(data(k),h,grid)/(n*h));
end
end

The reflection method for the Epanechnikov kernel estimator.

function [estimate] = kernestmirrorEp(data,grid,h)
% reflection method: the data are reflected around zero, the kernel sum
% over the 2n points is still divided by n*h (so the estimate integrates
% to one on [0,Inf)), and the estimate is set to zero for negative grid
% points; grid is assumed to be sorted in increasing order
l = length(grid);
n = length(data);
estimate = zeros(1,l);
%reflection of the data
datas = zeros(1,2*n);   % was zeros(2*n), which creates a 2n-by-2n matrix
for j = 1:n
    datas(j) = -data(j);
    datas(n + j) = data(j);
end
for k = 1:2*n
    estimate = estimate + (EpKernel(datas(k),h,grid)/(n*h));
end
for z = 1:l
    if grid(z) < 0
        estimate(z) = 0;
    else
        break
    end
end
end


B.2. Gaussian Kernel Estimator

The implementation of the Gaussian kernel estimator.

function f = npdf(x)
% the standard normal density
f = 1/(sqrt(2*pi))*exp(-(x.^2)/2);
end

function y = NorKernel(datapoint,h,grid)
% evaluates the rescaled Gaussian kernel around datapoint in all grid points
l = length(grid);
y = zeros(1,l);
for i = 1:l
    y(i) = npdf((grid(i)-datapoint)/h);
end
end

function [estimate] = kernestNor(data,grid,h)
% Gaussian kernel density estimate with bandwidth h
l = length(grid);
n = length(data);
estimate = zeros(1,l);
for k = 1:n
    estimate = estimate + (NorKernel(data(k),h,grid)/(n*h));
end
end

The reflection method for the Gaussian kernel estimator.

function [estimate] = kernestmirrorNor(data,grid,h)
% reflection method with the Gaussian kernel (see kernestmirrorEp)
l = length(grid);
n = length(data);
estimate = zeros(1,l);
%reflection of the data
datas = zeros(1,2*n);   % was zeros(2*n), which creates a 2n-by-2n matrix
for j = 1:n
    datas(j) = -data(j);
    datas(n + j) = data(j);
end
for k = 1:2*n
    estimate = estimate + (NorKernel(datas(k),h,grid)/(n*h));
end
for z = 1:l
    if grid(z) < 0
        estimate(z) = 0;
    else
        break
    end
end
end

B.3. Monotone Rearrangement

The implementation of the decreasing monotone rearrangement.

function [rearr] = mon_rearr(fx)
% sorts the function values fx in decreasing order
y = sort(-fx);
rearr = -y;
end
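As a usage sketch (not part of the thesis code; the sample size, grid and fixed bandwidth below are arbitrary illustrative choices, whereas the simulations in B.4 select the bandwidth with the Dynare script [19]), a rearranged kernel estimate can be produced along these lines:

% usage sketch: Gaussian kernel estimate followed by decreasing
% rearrangement; sample, grid and bandwidth are illustrative choices
sample = exprnd(1,1,100);          % exponential-1 sample (exprnd takes the mean)
x = linspace(0,7,601);             % equidistant evaluation grid
h = 0.4;                           % placeholder bandwidth
f_hat = kernestNor(sample,x,h);    % unconstrained kernel estimate
f_mon = mon_rearr(f_hat);          % monotonized (decreasing) estimate
plot(x,f_hat,'--',x,f_mon,'-')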

B.4. Simulations

Simulations and calculations of MSE and MISE.

%set number of grid intervals (even, as required by Simpson's rule below)
l = 600;
%sample size
n = 500;
%number of samples
p = 1000;
%sample parameter
lambda = 1;
%begin interval
a = -2;
%end interval
b = 7;

%setting grid
x = linspace(a,b,l+1);

%creating grid of the positive x
x_0 = [];
for i = 1:l+1
    if x(i) >= 0
        x_0 = [x_0, x(i)];
    end
end

%distance between grid points
d = (b-a)/l;

%target density
f = exppdf(lambda,x);

%pure kernel estimation
estimate = zeros(p,l+1);
%monotone rearrangement
estimate_mon = zeros(p,l+1);
%reflection method
estimate_ref = zeros(p,l+1);
%reflection and monotone rearrangement
estimate_ref_mon = zeros(p,l+1);

for i = 1:p
    %generating sample (note: Matlab's exprnd is parametrized by the
    %mean, which equals 1/lambda; for lambda = 1 the two coincide)
    sample = exprnd(lambda,1,n);
    %bandwidth selection
    h = mh_optimal_bandwidth(sample); %script from Dynare [19]

    %pure kernel estimation
    f_i = kernestNor(sample,x,h);
    estimate(i,:) = f_i;

    %monotone rearrangement: keep the largest values and place them
    %on the nonnegative part of the grid
    fm_i = mon_rearr(f_i);
    fm_i = fm_i(1:length(x_0));
    estimate_mon(i,l+1-length(x_0)+1:end) = fm_i;

    %reflection method
    fr_i = kernestmirrorNor(sample,x,h);
    estimate_ref(i,:) = fr_i;

    %reflection and monotone rearrangement
    frm_i = mon_rearr(fr_i);
    frm_i = frm_i(1:length(x_0));
    estimate_ref_mon(i,l+1-length(x_0)+1:end) = frm_i;
end

%summing rows (note: the variable name cumsum shadows Matlab's
%built-in cumsum function within this script)
cumsum = sum(estimate,1);
cumsum_mon = sum(estimate_mon,1);
cumsum_ref = sum(estimate_ref,1);
cumsum_ref_mon = sum(estimate_ref_mon,1);

%calculating bias
bias = zeros(1,l+1);
bias_mon = zeros(1,l+1);
bias_ref = zeros(1,l+1);
bias_ref_mon = zeros(1,l+1);
for k = 1:l+1
    bias(k) = 1/p * cumsum(k) - f(k);
    bias_mon(k) = 1/p * cumsum_mon(k) - f(k);
    bias_ref(k) = 1/p * cumsum_ref(k) - f(k);
    bias_ref_mon(k) = 1/p * cumsum_ref_mon(k) - f(k);
end

%calculating variance
variance = zeros(1,l+1);
variance_mon = zeros(1,l+1);
variance_ref = zeros(1,l+1);
variance_ref_mon = zeros(1,l+1);
matrix = zeros(p,l+1);
matrix_mon = zeros(p,l+1);
matrix_ref = zeros(p,l+1);
matrix_ref_mon = zeros(p,l+1);
for z = 1:p
    matrix(z,:) = (estimate(z,:) - 1/p * cumsum).^2;
    matrix_mon(z,:) = (estimate_mon(z,:) - 1/p * cumsum_mon).^2;
    matrix_ref(z,:) = (estimate_ref(z,:) - 1/p * cumsum_ref).^2;
    matrix_ref_mon(z,:) = (estimate_ref_mon(z,:) - 1/p * cumsum_ref_mon).^2;
end
cumsum2 = sum(matrix,1);
cumsum2_mon = sum(matrix_mon,1);
cumsum2_ref = sum(matrix_ref,1);
cumsum2_ref_mon = sum(matrix_ref_mon,1);
for j = 1:l+1
    variance(j) = 1/(p-1) * cumsum2(j);
    variance_mon(j) = 1/(p-1) * cumsum2_mon(j);
    variance_ref(j) = 1/(p-1) * cumsum2_ref(j);
    variance_ref_mon(j) = 1/(p-1) * cumsum2_ref_mon(j);
end

%calculating MSE
MSE = bias.^2 + variance;
MSE_mon = bias_mon.^2 + variance_mon;
MSE_ref = bias_ref.^2 + variance_ref;
MSE_ref_mon = bias_ref_mon.^2 + variance_ref_mon;

%applying the composite Simpson's rule to calculate the MISE
MISE = 0;
MISE_mon = 0;
MISE_ref = 0;
MISE_ref_mon = 0;
for m = 1:l/2
    MISE = MISE + d/3 * (MSE(2*m-1) + 4*MSE(2*m) + MSE(2*m+1));
    MISE_mon = MISE_mon + d/3 * (MSE_mon(2*m-1) + 4*MSE_mon(2*m) + MSE_mon(2*m+1));
    MISE_ref = MISE_ref + d/3 * (MSE_ref(2*m-1) + 4*MSE_ref(2*m) + MSE_ref(2*m+1));
    MISE_ref_mon = MISE_ref_mon + d/3 * (MSE_ref_mon(2*m-1) + 4*MSE_ref_mon(2*m) + MSE_ref_mon(2*m+1));
end
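%(note: the loop above implements the composite Simpson approximation
%  int MSE(x) dx ~= sum_{m=1}^{l/2} (d/3)*(MSE(x_{2m-1}) + 4*MSE(x_{2m}) + MSE(x_{2m+1})),
% which is why the number of grid intervals l must be even)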

%--------------------------------------------------------------------------
%plot the pointwise MSE of the four estimators and display the MISE values
plot(x,MSE,x,MSE_mon,x,MSE_ref,x,MSE_ref_mon)
MISE
MISE_mon
MISE_ref
MISE_ref_mon


C. Samples

C.1. Sample 1

The sample from the $N(2, 0.5)$ distribution introduced at the beginning of Chapter 1.

Data set 1

1.859130517 2.150285993 1.329358799 3.033707827 1.938823883
2.92439891 2.18154171 1.315351276 1.226787768 1.997427248
1.63756407 0.867908815 1.763219685 2.060479471 2.003095136
2.015978479 2.034225715 2.467058137 1.903709855 2.520466225
1.572335255 1.416589732 1.648264727 2.221823764 1.138778446
2.511194472 2.210809691 2.74901624 2.077400774 1.951298067
2.565208967 0.837820992 2.136638497 1.596050921 2.670896057
2.351654536 1.512167994 2.232414594 1.959188191 2.441661387
2.300198127 2.107729066 1.768741581 2.806583648 2.062223851
2.409254896 1.99022004 2.56824382 1.753444281 1.597335019
1.520747442 1.781527162 2.117442297 1.947934224 2.389221739
1.777120696 2.075653694 2.682696017 1.424887318 1.981214876
2.2155439 1.688834295 2.183972628 2.640224035 1.669863687
2.069464545 1.792603349 2.314631996 2.385431905 2.210982
1.199309145 1.443658999 2.562451745 1.682006522 1.668159117
1.412233492 1.742094356 1.6875907 1.590964349 2.197268132
1.593629023 1.849007438 2.045220762 2.210027768 2.399481381
1.143021337 1.89398825 2.073727989 2.140081987 2.289163391
1.857464664 1.851461126 2.465186527 1.322685628 2.192555033
2.332546544 1.849450922 2.052489779 1.968057 0.812498218


C.2. Sample 2

The sample from the exponential-1 distribution used from Section 1.4.1 onwards.

Data set 2

0.580752135 0.411710422 0.811186942 0.776799198 0.998719421
0.068715857 0.647574915 0.279827506 0.796547943 0.059943446
0.328027506 1.34748034 0.505346701 0.595765809 4.06444363
0.014873424 0.292888309 0.127758079 2.225197878 0.285425167
0.725590795 0.038747308 0.244283002 0.21641087 0.187467685
0.447802167 0.615808364 2.172165448 0.355461291 0.467461304
0.119192554 3.497592757 0.021669205 0.136695818 0.618510106
1.615774194 0.361953905 0.164171264 2.952823132 0.430002243
0.927942668 0.654472467 2.982885198 1.515577367 0.319338516
0.007855471 2.829700077 0.763136619 0.777307331 2.359276469
0.910428904 0.116493106 1.121922037 0.042350666 0.130594072
0.417249551 1.108049956 0.461709917 0.235664921 4.243159427
0.103863722 1.470975954 1.468376103 0.794350551 1.223146698
0.00462892 2.172007441 0.544925517 1.09832777 1.715271437
0.425928133 1.168210907 0.505578878 2.828603669 0.076563311
2.221591196 1.476515296 0.511027151 0.299882458 2.685597605
3.321073824 0.427715062 0.802007412 0.679649636 0.542844074
0.481119196 2.715677345 3.340380947 1.609810916 0.450748254
0.567141401 1.289416756 0.665892322 0.850518087 0.428832105
0.038777567 1.266485867 0.897149692 1.779690916 0.145462847
