
  • ESTIMATION OF INTEGRATED SQUARED DENSITY DERIVATIVES

    by

    Brian Kent Aldershof

    A dissertation submitted to the faculty of The University of North Carolina at Chapel Hill in

    partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department

    of Statistics.

    Chapel Hill

    1991

    Advisor

    Reader

    Reader


  • BRIAN KENT ALDERSHOF. Estimation of Integrated Squared Density Derivatives

    (under the direction of J. Steven Marron)

    ABSTRACT

    This dissertation examines smoothing estimates of integrated squared density

    derivatives. The estimators discussed are derived by substituting a kernel density estimate into

    the functional being estimated. A basic estimator is derived and then many modifications of it

    are explored. A set of similar bias-reduced estimators based on jackknife techniques, higher

    order kernels, and other modifications is compared. Many of these bias reduction techniques

    are shown to be equivalent. A computationally more efficient estimator based on binning is

    presented. The proper way to bin is established so that the binned estimator has the same

    asymptotic MSE convergence rate as the basic estimator.

    Asymptotic results are evaluated by using exact calculations based on Gaussian mixture

    densities. It is shown that in some cases the asymptotic results can be quite misleading, while

    in others they approximate the truth acceptably well.

    A set of estimators is presented that relies on estimating similar functionals of higher

    derivatives. It is shown that there is an optimal number of functionals that should be

    estimated, but that this number depends on the density and the sample size. In general, the

    number of functionals estimated is itself a smoothing parameter. These results are explored

    through asymptotic calculations and some simulation studies.


  • ACKNOWLEDGEMENTS

    I am very grateful for the encouragement, support, and guidance of my advisor Dr. J.

    Steven Marron. His insights and intuition led to many of the results here. His patient

    encouragement helped me through rough times. Thanks, Steve.

    I am grateful to the people who supported me and my family throughout my years in

    Graduate School. In particular, thanks to my mother who always helped out. Thanks also to

    my in-laws for their support.

    Most of all, I want to thank my wife and daughter. My family has always been loving

    and supportive despite Graduate School poverty and uncertainty. Welcome to the world, Nick.


  • TABLE OF CONTENTS

    Page

    LIST OF TABLES vi

    LIST OF FIGURES vii

    Chapter

    I. Introduction and Literature Review

    1. Introduction 1

    2. Literature Review 3

    II. Diagonal Terms

    1. Introduction 13

    2. Bias Reduction 14

    3. Mean Squared Error Reduction 15

    4. Computation 23

    5. Stepped Estimators 24

    III. Bias Reduction

    1. Introduction 25

    2. Notation 26

    3. Higher Order Kernel Estimators 26

    4. D-Estimators 27

    5. Generalized Jackknife Estimators 28

    6. Higher Order Generalized Jackknife Estimators 29

    7. Relationships Among Bias Reduction Estimators 30

    8. Theorems 32

    9. Example 34

    10. Proofs 38

    IV. Computation

    1. Introduction 48

    2. Notation 49

    3. The Histogram Binned Estimator 50

    4. Computation of θ̂_m(h, n, K) 53

    5. Generalized Bin Estimator 54

    6. Proofs 57


  • V. Asymptotics and Exact Calculations

    1. Introduction 69

    2. Comparison of Asymptotic and Exact Risks 69

    3. Exact MSE Calculations 73

    4. Examples 77

    5. Proofs 80

    VI. Estimability of θ_m and m

    1. Introduction 84

    2. Asymptotic Calculations 84

    3. Exact MSE Calculations 86

    VII. The One-Step Estimator

    1. Introduction 92

    2. Assumptions and Notation 93

    3. Results 94

    4. Figures 100

    5. Conclusions 101

    6. Proofs 106

    7. C_m(t) and calculating the skewness of θ̂_m(h, n, K) 115

    VIII. The K-Step Estimator

    1. Introduction 118

    2. Assumptions and Notation 120

    3. Results 121

    4. Simulations 125

    5. Conclusions 127

    6. Proofs 132

    Appendix A viii


  • LIST OF TABLES

    Table 2.1: Exact Asymptotic Values of MSE/θ_2 21

    Table 2.2: "Plug-in" Values of MSE/θ_2 22

    Table 6.1a: Values of N1/2(m) for m = 0, ..., 5; Distns 1-8 89

    Table 6.1b: Values of N1/2(m) for m = 0, ..., 5; Distns 9-15 90

  • LIST OF FIGURES

    Figure 3.1: Equivalences of Bias Reduction Techniques 31

    Figure 3.2a: D-estimator kernels 36

    Figure 3.2b: Jackknife kernels 36

    Figure 3.2c: Second-order kernel 37

    Figure 5.1: MSE vs log(Bandwidth) (Distn #4; Sample Size = 1000) 79

    Figure 5.2: MSE vs log(Bandwidth) (Distn #11; Sample Size = 1000) 79

    Figure 6.1a: Ntol(0) vs tolerance 91

    Figure 6.1b: Ntol(1) vs tolerance 91

    Figure 7.1a: MSE vs Bandwidth (Distn #2; Sample Size = 250) 103

    Figure 7.1b: MSE vs Bandwidth (Distn #2; Sample Size = 1000) 103

    Figure 7.2a: MSE vs log(Bandwidth) (Distn #2; Sample Size = 250) 104

    Figure 7.2b: MSE vs log(Bandwidth) (Distn #2; Sample Size = 1000) 104

    Figure 7.3a: MSE vs Bandwidth (Distn #2; Sample Size = 250) 105

    Figure 7.3b: MSE vs Bandwidth (Distn #2; Sample Size = 1000) 105

    Figure 7.4a: C2(T) vs T (Distn #1; Sample Size = 250) 117

    Figure 7.4b: C2(T) vs T (Distn #1; Sample Size = 1000) 117

    Figure 8.1a: Theta-hat densities (Distn #6; Squared 1st Derivative; SS = 100; 100 Samples) 128

    Figure 8.1b: Theta-hat densities (Distn #6; Squared 1st Derivative; SS = 100; 100 Samples) 128

    Figure 8.2a: Theta-hat densities (Distn #3; Squared 1st Derivative; SS = 100; 100 Samples) 129

    Figure 8.2b: Bandwidth densities (Distn #3; Squared 1st Derivative; SS = 100; 100 Samples) 129

    Figure 8.3a: MSE vs Step (Distn #6; Sample Size = 100) 130

    Figure 8.3b: MSE vs Step (Distn #6; Sample Size = 500) 130

    Figure 8.4: MSE vs Step (Distn #3; Sample Size = 100) 131

  • Chapter I: Introduction and Literature Review

    1. Introduction

    This research discusses a class of estimators of the functional θ_m = ∫ (f^(m))², for f a probability density function. The estimators discussed in this dissertation are of the form:

    (1.1)

    for some D which may or may not be a function of the data. The goal of the research is to

    discuss the behavior of these estimators in a variety of settings and to provide guidelines for

    computing them.
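
    To fix ideas, here is a sketch of the plug-in construction that underlies (1.1), written in a generic form (this is a schematic of substituting a kernel density estimate into θ_m, not necessarily the exact display of (1.1)):

        \hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right),
        \qquad
        \hat{\theta}_m(h, n, K) = \int \left( \hat{f}_h^{(m)}(x) \right)^2 dx
        = \frac{1}{n^2 h^{2m+1}} \sum_{i=1}^{n} \sum_{j=1}^{n}
          \kappa_m\!\left(\frac{X_i - X_j}{h}\right),
        \qquad
        \kappa_m(t) = \int K^{(m)}(u)\, K^{(m)}(u - t)\, du.

    In this form the i = j terms of the double sum are the "diagonal terms" discussed in Chapter II, and D in (1.1) corresponds to keeping, dropping, or replacing their contribution.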

    Chapter II discusses possible choices of D given in (1.1). D can be chosen to reduce bias

    simply by including the diagonal terms of the double sum, thereby making θ̂_m necessarily positive.

    In many settings, this estimator performs better than a "leave-out-the-diagonals" version with

    D = 0. A possibly better estimator chosen to reduce MSE is also given, although with reasonable

    sample sizes this did not perform as well as hoped.
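
    As a small numerical illustration of the role of the diagonal terms (my own sketch, not code from the dissertation), the double-sum estimate of θ_1 can be computed with and without the i = j terms. A Gaussian kernel is assumed, for which κ_1 has a simple closed form, and the function names are illustrative only.

import numpy as np

def kappa1(t):
    # Autocorrelation of the Gaussian kernel's first derivative:
    # kappa_1(t) = int phi'(u) phi'(u - t) du = phi_{sqrt(2)}(t) * (2 - t^2) / 4
    return np.exp(-t**2 / 4.0) / np.sqrt(4.0 * np.pi) * (2.0 - t**2) / 4.0

def theta1_hat(x, h, keep_diagonals=True):
    # Basic double-sum plug-in estimate of theta_1 = int (f')^2 with bandwidth h
    n = len(x)
    vals = kappa1((x[:, None] - x[None, :]) / h)
    if not keep_diagonals:
        np.fill_diagonal(vals, 0.0)   # the "leave-out-the-diagonals" version (D = 0)
    return vals.sum() / (n**2 * h**3)

rng = np.random.default_rng(0)
x = rng.standard_normal(250)          # N(0,1) data; true theta_1 = 1/(4*sqrt(pi)) ~ 0.141
print(theta1_hat(x, 0.5, True), theta1_hat(x, 0.5, False))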

    Chapter III discusses three strategies for reducing bias in θ̂_m. The "leave-in-the-

    diagonals" estimator is some improvement over the "no-diagonals" estimator because it reduces

    bias with some choice of bandwidth. More sophisticated strategies for reducing bias can improve

    the estimator even more (at least with a sufficiently large sample size). The three strategies

    explored are jackknifing, higher order kernels, and special choices of D (as in (1.1)). Some

    equivalences between these techniques are given.
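
    For orientation, the generalized jackknife device works roughly as follows (this is the generic form of the technique, not the specific estimators constructed in Chapter III): if two estimators have leading bias terms that are proportional, a suitable linear combination cancels that leading term,

        \hat{\theta}_{GJ} = \frac{\hat{\theta}_{(1)} - R\, \hat{\theta}_{(2)}}{1 - R},
        \qquad
        E\,\hat{\theta}_{(i)} - \theta = b_i\, h^{p} + o(h^{p}),
        \qquad
        R = b_1 / b_2,

    so that the O(h^p) bias contributions of the two components cancel in the combination.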


    Chapter IV discusses computation of the estimators using a strategy called "binning".

    Binning is used to reduce the number of computationally intensive kernel evaluations. An

    optimal binning strategy is given in this chapter.
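
    The idea can be sketched as follows (a minimal illustration of binning in general, not the optimal rule derived in this chapter): the sample is reduced to counts on an equally spaced grid, and the double sum runs over pairs of bin centers weighted by products of counts, so the kernel function is evaluated at bin-center differences rather than at all O(n²) pairwise differences. The snippet below assumes numpy and the kappa1 function and scaling conventions of the earlier sketch.

import numpy as np

def theta1_hat_binned(x, h, num_bins=400):
    # Histogram-binned version of the double-sum estimate of theta_1:
    # each observation is replaced by its bin center, and a pair of bins
    # contributes with weight (count_i * count_j).
    counts, edges = np.histogram(x, bins=num_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    n = len(x)
    diffs = (centers[:, None] - centers[None, :]) / h   # only bin-center differences are needed
    weights = counts[:, None] * counts[None, :]
    return (weights * kappa1(diffs)).sum() / (n**2 * h**3)

    Because the grid is equally spaced, there are only 2B - 1 distinct differences for B bins, so the number of kernel evaluations can be reduced further still.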

    Chapter V discusses asymptotic calculations. Many of the guidelines presented in this

    dissertation and in many areas of density estimation rely on asymptotic results. This chapter

    provides some comparison of these results to exact results. The tool used to do this is exact

    calculation based on Gaussian mixtures. Theorems required to do these calculations are

    presented here.
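
    One reason Gaussian mixtures are convenient is that the functionals θ_m are available in closed form for normal components. As a small independent check (my own illustration, using the standard closed form for a single normal density rather than the mixture theorems of this chapter), the closed form can be compared against numerical integration:

from math import factorial, pi, sqrt
import numpy as np
from numpy.polynomial.hermite_e import hermeval
from scipy.integrate import quad
from scipy.stats import norm

def theta_m_normal(m, sigma=1.0):
    # Closed form for theta_m = int (f^(m))^2 when f is the N(0, sigma^2) density
    return factorial(2 * m) / (2 ** (2 * m) * factorial(m) * 2.0 * sqrt(pi) * sigma ** (2 * m + 1))

def f_deriv(x, m, sigma=1.0):
    # m-th derivative of the N(0, sigma^2) density via probabilists' Hermite polynomials:
    # phi^(m)(z) = (-1)^m He_m(z) phi(z)
    z = np.asarray(x) / sigma
    he_m = hermeval(z, [0.0] * m + [1.0])
    return (-1) ** m * he_m * norm.pdf(z) / sigma ** (m + 1)

m = 2
numeric, _ = quad(lambda x: f_deriv(x, m) ** 2, -8, 8)
print(numeric, theta_m_normal(m))   # both approximately 3 / (8 sqrt(pi)) ~ 0.2116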

    Chapter VI provides some guidelines about the relative difficulty of estimating θ_m

    compared to θ_{m+k}. Both asymptotic and exact calculations are used to show that it gets much

    more difficult to estimate θ_m for greater m. It is also shown that the asymptotic calculations

    become increasingly misleading in this context as m increases.

    Chapter VII discusses a "one-step" estimator for choosing the bandwidth of θ̂_m. The

    estimator is based on estimating θ_{m+1} first and using this estimate to find an "optimal"

    bandwidth for θ̂_m. The results given here suggest that there may be some advantages to using

    the one-step estimator in that it might be easier to choose a near-optimal bandwidth. The

    penalty for using this estimator is that it has a greater minimum MSE than the standard

    estimator.

    Chapter VIII discusses a generalization of the one-step estimator called the "k-step"

    estimator. In the "k-step" estimator all the functionals θ_{m+1}, ..., θ_{m+k} are estimated and used

    to provide a bandwidth for θ̂_m. It is shown there that with a finite sample size and a "plug-in"

    bandwidth, MSE(θ̂_m) decreases for the first couple of steps and ultimately increases without

    bound.
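
    The recursion behind the one-step and k-step estimators can be written schematically as follows. This is structured pseudocode only: theta_hat stands for a plug-in estimator such as the double-sum sketch given earlier, while rule_of_thumb_bandwidth and plug_in_bandwidth are hypothetical placeholders for the starting rule and the "optimal" bandwidth formulas developed in Chapters VII and VIII; they are not defined here.

def k_step_estimate(x, m, k):
    # Schematic k-step recursion (k = 1 gives the one-step estimator):
    # start with a crude bandwidth for theta_{m+k}, then repeatedly plug the
    # current estimate into a bandwidth rule for the next lower-order functional.
    h = rule_of_thumb_bandwidth(x, m + k)            # hypothetical starting rule
    theta = theta_hat(x, m + k, h)                   # estimate of theta_{m+k}
    for j in range(m + k - 1, m - 1, -1):            # j = m+k-1, ..., m
        h = plug_in_bandwidth(theta, j, len(x))      # hypothetical bandwidth rule using theta_{j+1}
        theta = theta_hat(x, j, h)                   # estimate of theta_j
    return theta                                     # final estimate of theta_m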


  • The remai