
ESTIMATION OF INTEGRATED SQUARED DENSITY DERIVATIVES

by

Brian Kent Aldershof

A dissertation submitted to the faculty of The University of North Carolina at Chapel Hill in

partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department

of Statistics.

Chapel Hill

1991

Advisor

Reader

Reader


BRIAN KENT ALDERSHOF. Estimation of Integrated Squared Density Derivatives

(under the direction of J. Steven Marron)

ABSTRACT

The dissertation research examines smoothing estimates of integrated squared density

derivatives. The estimators discussed are derived by substituting a kernel density estimate into

the functional being estimated. A basic estimator is derived and then many modifications of it

are explored. Similar bias-reduced estimators based on jackknife techniques, higher-order

kernels, and other modifications are compared. Many of these bias reduction techniques

are shown to be equivalent. A computationally more efficient estimator based on binning is

presented. The proper way to bin is established so that the binned estimator has the same

asymptotic MSE convergence rate as the basic estimator.

Asymptotic results are evaluated by using exact calculations based on Gaussian mixture

densities. It is shown that in some cases the asymptotic results can be quite misleading, while

in others they approximate the truth acceptably well.

A set of estimators is presented that relies on estimating similar functionals of higher

derivatives. It is shown that there is an optimal number of functionals that should be

estimated, but that this number depends on the density and the sample size. In general, the

number of functionals estimated is itself a smoothing parameter. These results are explored

through asymptotic calculations and some simulation studies.


ACKNOWLEDGEMENTS

I am very grateful for the encouragement, support, and guidance of my advisor Dr. J.

Steven Marron. His insights and intuition led to many of the results here. His patient

encouragement helped me through rough times. Thanks, Steve.

I am grateful to the people who supported me and my family throughout my years in

Graduate School. In particular, thanks to my mother who always helped out. Thanks also to

my in-laws for their support.

Most of all, I want to thank my wife and daughter. My family has always been loving

and supportive despite Graduate School poverty and uncertainty. Welcome to the world, Nick.


TABLE OF CONTENTS

Page

LIST OF TABLES vi

LIST OF FIGURES vii

Chapter

I. Introduction and Literature Review

1. Introduction 1

2. Literature Review 3

II. Diagonal Terms

1. Introduction 13

2. Bias Reduction 14

3. Mean Squared Error Reduction 15

4. Computation 23

5. Stepped Estimators 24

III. Bias Reduction

1. Introduction 25

2. Notation 26

3. Higher Order Kernel Estimators 26

4. D-Estimators 27

5. Generalized Jackknife Estimators 28

6. Higher Order Generalized Jackknife Estimators 29

7. Relationships Among Bias Reduction Estimators 30

8. Theorems 32

9. Example 34

10. Proofs 38

IV. Computation

1. Introduction 48

2. Notation 49

3. The Histogram Binned Estimator 50

4. Computation of θ̂m(h, n, K) 53

5. Generalized Bin Estimator 54

6. Proofs 57


V. Asymptotics and Exact Calculations

1. Introduction 69

2. Comparison of Asymptotic and Exact Risks 69

3. Exact MSE Calculations 73

4. Examples 77

5. Proofs 80

VI. Estimability of θm and m

1. Introduction 84

2. Asymptotic Calculations 84

3. Exact MSE Calculations 86

VII. The One-Step Estimator

1. Introduction 92

2. Assumptions and Notation 93

3. Results 94

4. Figures 100

5. Conclusions 101

6. Proofs 106

7. Cm(t) and calculating the skewness of θ̂m(h, n, K) 115

VIII. The K-Step Estimator

1. Introduction 118

2. Assumptions and Notation 120

3. Results 121

4. Simulations 125

5. Conclusions 127

6. Proofs 132

Appendix A viii


Table 2.1:

Table 2.2:

Table 6.1a:

Table 6.1b:

LIST OF TABLES

Exact Asymptotic Values of MSE(θ̂2) 21

"Plug-in" Values of MSE(θ̂2) 22

Values of N1/2(m) for m = 0, ..., 5; Distns 1-8 89

Values of N1/2(m) for m = 0, ..., 5; Distns 9-15 90


Figure 3.1:

Figure 3.2a:

Figure 3.2b:

Figure 3.2c:

Figure 5.1:

Figure 5.2:

Figure 6.1a:

Figure 6.1b:

Figure 7.1a:

Figure 7.1b:

Figure 7.2a:

Figure 7.2b:

Figure 7.3a:

Figure 7.3b:

Figure 7.4a:

Figure 7.4b:

Figure 8.1a:

Figure 8.1b:

Figure 8.2a:

Figure 8.2b:

Figure 8.3a:

Figure 8.3b:

Figure 8.4:

LIST OF FIGURES

Equivalences of Bias Reduction Techniques 31

D-estimator kernels 36

Jackknife kernels 36

Second-order kernel 37

MSE vs log(Bandwidth) (Distn #4; Sample Size = 1000) 79

MSE vs log(Bandwidth) (Distn #11; Sample Size = 1000) 79

Ntol(0) vs tolerance 91

Ntol(1) vs tolerance 91

MSE vs Bandwidth (Distn #2; Sample Size = 250) 103

MSE vs Bandwidth (Distn #2; Sample Size = 1000) 103

MSE vs log(Bandwidth) (Distn #2; Sample Size = 250) 104

MSE vs log(Bandwidth) (Distn #2; Sample Size = 1000) 104

MSE vs Bandwidth (Distn #2; Sample Size = 250) 105

MSE vs Bandwidth (Distn #2; Sample Size = 1000) 105

C2(T) vs T (Distn #1; Sample Size = 250) 117

C2(T) vs T (Distn #1; Sample Size = 1000) 117

Theta-hat densities (Distn #6; Squared 1st Derivative; SS=100; 100 Samples) 128

Bandwidth densities (Distn #6; Squared 1st Derivative; SS=100; 100 Samples) 128

Theta-hat densities (Distn #3; Squared 1st Derivative; SS=100; 100 Samples) 129

Bandwidth densities (Distn #3; Squared 1st Derivative; SS=100; 100 Samples) 129

MSE vs Step (Distn #6; Sample Size = 100) 130

MSE vs Step (Distn #6; Sample Size = 500) 130

MSE vs Step (Distn #3; Sample Size = 100) 131


Chapter I: Introduction and Literature Review

1. Introduction

This research discusses a class of estimators of the functional θm = ∫ (f^(m))^2 for f a probability density function. The estimators discussed in this dissertation are of the form:

(1.1)

for some D which may or may not be a function of the data. The goal of the research is to

discuss the behavior of these estimators in a variety of settings and to provide guidelines for

computing them.
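As a concrete illustration of (1.1), consider the simplest case m = 0, where the target is θ0 = ∫ f^2. Substituting a Gaussian kernel density estimate into the functional and integrating analytically leaves a double sum over pairwise differences. The sketch below is illustrative only: the Gaussian kernel, the bandwidth h = 0.5, and the sample size are assumptions of this example, not recommendations from the dissertation.

```python
import numpy as np

SQRT_PI = np.sqrt(np.pi)

def kk(u):
    # K*K for the Gaussian kernel K: the convolution of two standard
    # normal densities, i.e. the N(0, 2) density.
    return np.exp(-u * u / 4.0) / (2.0 * SQRT_PI)

def theta0_hat(x, h):
    # Plug-in estimator of theta_0 = int f(x)^2 dx obtained by
    # substituting a Gaussian kernel density estimate into the
    # functional:  theta0_hat = (1/(n^2 h)) sum_i sum_j KK((X_i - X_j)/h).
    n = len(x)
    d = (x[:, None] - x[None, :]) / h
    return kk(d).sum() / (n * n * h)

# Illustrative data: for the standard normal, theta_0 = 1/(2 sqrt(pi)).
rng = np.random.default_rng(0)
x = rng.standard_normal(500)
est = theta0_hat(x, h=0.5)
```

Because the Gaussian kernel convolved with itself is again a normal density, no numerical integration is needed; only the pairwise differences enter.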

Chapter II discusses possible choices of D given in (1.1). D can be chosen to reduce bias

simply by including the diagonal terms of the double sum, thereby making θ̂ necessarily positive.

In many settings, this estimator performs better than a "leave-out-the-diagonals" version with

D = 0. A possibly better estimator chosen to reduce MSE is also given, although with reasonable

sample sizes it did not perform as well as hoped.
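The diagonal-in versus leave-out-the-diagonals distinction can be made concrete. In the sketch below (Gaussian kernel, m = 0; the h and n are illustrative assumptions), the i = j terms each contribute the strictly positive constant KK(0), so the diagonal-in estimate is necessarily positive, while the no-diagonals version drops those terms and renormalizes by n(n - 1).

```python
import numpy as np

SQRT_PI = np.sqrt(np.pi)

def kk(u):
    # Gaussian kernel convolved with itself: the N(0, 2) density.
    return np.exp(-u * u / 4.0) / (2.0 * SQRT_PI)

def theta0_diag_in(x, h):
    # Full double sum; the i = j terms add n * KK(0) > 0, so the
    # estimate is necessarily positive.
    n = len(x)
    d = (x[:, None] - x[None, :]) / h
    return kk(d).sum() / (n * n * h)

def theta0_diag_out(x, h):
    # "Leave-out-the-diagonals" version (D = 0): drop the i = j terms
    # and renormalize by n(n - 1).
    n = len(x)
    d = (x[:, None] - x[None, :]) / h
    s = kk(d).sum() - n * kk(0.0)
    return s / (n * (n - 1) * h)

rng = np.random.default_rng(1)
x = rng.standard_normal(500)
a = theta0_diag_in(x, h=0.5)
b = theta0_diag_out(x, h=0.5)
```

For a sample of this size the two versions differ only slightly; the difference matters most for small n and small h, where the diagonal term KK(0)/(nh) is not negligible.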

Chapter III discusses three strategies for reducing bias in θ̂. The "leave-in-the-diagonals" estimator is some improvement over the "no-diagonals" estimator because it reduces

bias with some choice of bandwidth. More sophisticated strategies for reducing bias can improve

the estimator even more (at least with a sufficiently large sample size). The three strategies

explored are jackknifing, higher order kernels, and special choices of D (as in (1.1)). Some

equivalences between these techniques are given.


Chapter IV discusses computation of the estimators using a strategy called "binning".

Binning is used to reduce the number of computationally intensive kernel evaluations. An

optimal binning strategy is given in this chapter.
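The binning idea can be sketched as follows: the data are replaced by bin counts on a grid, and the O(n^2) double sum over observations collapses to a count-weighted double sum over bins. The grid size and bandwidth below are illustrative assumptions; the dissertation establishes how fine the binning must be for the binned estimator to retain the MSE rate of the unbinned one.

```python
import numpy as np

SQRT_PI = np.sqrt(np.pi)

def kk(u):
    # N(0, 2) density: the Gaussian kernel convolved with itself.
    return np.exp(-u * u / 4.0) / (2.0 * SQRT_PI)

def theta0_unbinned(x, h):
    # O(n^2) kernel evaluations over all pairs of observations.
    n = len(x)
    d = (x[:, None] - x[None, :]) / h
    return kk(d).sum() / (n * n * h)

def theta0_binned(x, h, n_bins=400):
    # Replace the data by counts c_g at bin centers t_g; the double sum
    # over observations becomes a count-weighted double sum over bins,
    # costing O(n_bins^2) kernel evaluations instead of O(n^2).
    n = len(x)
    counts, edges = np.histogram(x, bins=n_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    d = (centers[:, None] - centers[None, :]) / h
    w = counts[:, None] * counts[None, :]
    return (w * kk(d)).sum() / (n * n * h)

rng = np.random.default_rng(2)
x = rng.standard_normal(500)
full = theta0_unbinned(x, h=0.5)
binned = theta0_binned(x, h=0.5)
```

With a grid much finer than the bandwidth, the binned and unbinned estimates are nearly identical while the binned one needs far fewer kernel evaluations when n_bins < n.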

Chapter V discusses asymptotic calculations. Many of the guidelines presented in this

dissertation and in many areas of density estimation rely on asymptotic results. This chapter

provides some comparison of these results to exact results. A tool used to do this is exact

calculations based on Gaussian mixtures. Theorems required to do these calculations are

presented here.
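The Gaussian-mixture device rests on the fact that the integral of a product of two normal densities is itself a normal density evaluated at the difference of the means, so θ0 for any finite normal mixture has a closed form. The sketch below uses an arbitrary two-component mixture (not one of the dissertation's numbered distributions) and checks the closed form against direct numerical integration.

```python
import numpy as np

def normal_pdf(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def theta0_mixture(weights, means, variances):
    # int N(x; m_k, v_k) N(x; m_l, v_l) dx = N(m_k; m_l, v_k + v_l),
    # so theta_0 = int f^2 for a normal mixture is a finite double sum.
    total = 0.0
    for wk, mk, vk in zip(weights, means, variances):
        for wl, ml, vl in zip(weights, means, variances):
            total += wk * wl * normal_pdf(mk, ml, vk + vl)
    return total

w, mu, v = [0.5, 0.5], [-1.0, 1.5], [1.0, 0.25]
exact = theta0_mixture(w, mu, v)

# Check against a direct Riemann sum of f^2 on a fine grid.
grid = np.linspace(-12.0, 12.0, 200001)
f = sum(wi * normal_pdf(grid, mi, vi) for wi, mi, vi in zip(w, mu, v))
numeric = float(np.sum(f * f) * (grid[1] - grid[0]))
```

The same product-of-normals identity extends to derivatives, which is what makes exact MSE calculations over Gaussian mixtures tractable.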

Chapter VI provides some guidelines about the relative difficulty of estimating θm

compared to θm+k. Both asymptotic and exact calculations are used to show that it gets much

more difficult to estimate θm for greater m. It is also shown that the asymptotic calculations

become increasingly misleading in this context as m increases.

Chapter VII discusses a "one-step" estimator for choosing the bandwidth of θ̂m. The

estimator is based on estimating θm+1 first and using this estimate to find an "optimal"

bandwidth for θ̂m. The results given here suggest that there may be some advantages to using

the one-step estimator in that it might be easier to choose a near-optimal bandwidth. The

penalty for using this estimator is that it has a greater minimum MSE than the standard

estimator.
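The two-stage logic can be sketched for m = 0: estimate θ1 with a pilot bandwidth, then derive a bandwidth for θ̂0 from it. The bandwidth rule used below, which cancels the diagonal bias term KK(0)/(nh) against the leading smoothing bias -h^2 θ1 of the diagonal-in Gaussian-kernel estimator, is an illustrative assumption and not necessarily the rule derived in Chapter VII; the pilot bandwidth is likewise arbitrary.

```python
import numpy as np

SQRT_PI = np.sqrt(np.pi)

def kk(u):
    # N(0, 2) density (Gaussian kernel convolved with itself).
    return np.exp(-u * u / 4.0) / (2.0 * SQRT_PI)

def kk2(u):
    # Second derivative of kk.
    return kk(u) * (u * u - 2.0) / 4.0

def theta1_hat(x, h):
    # Plug-in estimator of theta_1 = int f'(x)^2 dx:
    #   theta1_hat = -(1/(n^2 h^3)) sum_i sum_j KK''((X_i - X_j)/h).
    n = len(x)
    d = (x[:, None] - x[None, :]) / h
    return -kk2(d).sum() / (n * n * h ** 3)

def one_step_bandwidth(x, pilot_h):
    # Step 1: estimate theta_1 at a pilot bandwidth (pilot_h is an
    # arbitrary illustrative choice here).
    t1 = theta1_hat(x, pilot_h)
    # Step 2: pick h for theta_0-hat so the diagonal bias KK(0)/(n h)
    # cancels the leading smoothing bias -h^2 * theta_1 (an assumed,
    # illustrative rule).
    n = len(x)
    return (kk(0.0) / (n * t1)) ** (1.0 / 3.0)

rng = np.random.default_rng(3)
x = rng.standard_normal(400)
h = one_step_bandwidth(x, pilot_h=0.7)
```

The point of the sketch is only the pipeline: a higher-order functional estimate feeds the bandwidth choice for the lower-order one.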

Chapter VIII discusses a generalization of the one-step estimator called the "k-step"

estimator. In the "k-step" estimator all the functionals (Jm+!' .., (Jm+k are estimated and used

to provide a bandwidth for θ̂m. It is shown there that with a finite sample size and a "plug-in"

bandwidth, MSE(θ̂m) decreases for the first couple of steps and ultimately increases without

bound.


The remai