
On the Estimation of the Conditional L1-Median

    Muna Yaqoub Mohamed Quraiq

    Supervised by

    Prof. Raid B. Salha

    Prof. of Mathematical Statistics

    A thesis submitted in partial fulfillment of the requirements for the degree of

Master of Science in Mathematics

March / 2017

    The Islamic University–Gaza

    Research and Postgraduate Affairs

    Faculty of Science

    Master of Mathematics

    Mathematical Statistics

Declaration (in Arabic)

I, the undersigned, submitter of the thesis entitled:

On the Estimation of the Conditional L1-Median

declare that the work contained in this thesis is the product of my own effort, except where otherwise referenced, and that this thesis as a whole, or any part of it, has not been submitted by anyone else to obtain a degree or an academic or research title at any other educational or research institution.

    Declaration

    I understand the nature of plagiarism, and I am aware of the University’s

    policy on this.

    The work provided in this thesis, unless otherwise referenced, is the

    researcher's own work, and has not been submitted by others elsewhere

    for any other degree or qualification.

Student's name: Muna Yaqoub Mohamed Quraiq

Signature: Muna Quraiq

Date: 4/3/2017

    Abstract

In this thesis, we study the conditional L1-median, which plays an important role in nonparametric prediction.

An estimator of the L1-median based on the Nadaraya-Watson estimator of the conditional distribution function has been studied, and its consistency has been derived under some conditions.

Another estimator, based on the double kernel estimator of the conditional distribution function, has been proposed to obtain a smoother estimator than the one based on the Nadaraya-Watson estimator.

The performance of the two estimators has been tested using two bivariate time series data sets. The comparison indicated that the double kernel estimator gives smooth prediction curves, while the Nadaraya-Watson estimator gives a smaller mean square error.

Abstract (in Arabic)

This research studies the conditional median, which plays an important role in nonparametric prediction. An estimator of the median based on the Nadaraya-Watson estimator of the conditional distribution function is studied, and its consistency is proved under certain conditions. Another estimator is proposed, based on the double kernel estimator of the conditional distribution function, in order to obtain a smoother estimator than the one based on the Nadaraya-Watson estimator. The performance of the two estimators is tested using two bivariate time series. The comparison indicates that the double kernel estimator gives smooth prediction curves, while the Nadaraya-Watson estimator gives estimates with smaller error.


    Dedication

    To My Parents.

    To My husband.

    To My brothers and sisters.

    To My Friends.

    To all Knowledge Seekers.


    Acknowledgment

First, I thank Allah for His many blessings, especially the blessing of success. I would like to thank several people for helping me with this work. I would like to thank my professor, Prof. Raid Salha, for his support, patience, encouragement, and for spending long hours helping me; without his support I could never have accomplished this thesis. I would like to thank the whole staff of the Department of Mathematics, who gave me great support and helped me in my work. I would like to thank all my colleagues and friends at the Department of Mathematics for encouraging me. I want to express my great gratitude to my entire family for their continuous support and encouragement during my whole life and especially during this work. Special thanks go to my father, mother, husband and my children.


    Table of Contents

    Declaration ......................................................................................................................... i

    Abstract in English ............................................................................................................ ii

    Abstract in Arabic ............................................................................................................ iii

    Dedication ........................................................................................................................ iv

    Acknowledgment .............................................................................................................. v

    Table of Contents ............................................................................................................. vi

    List of Tables .................................................................................................................. vii

    List of Figures ................................................................................................................ viii

    List of Abbreviations ....................................................................................................... ix

    List of Symbols ................................................................................................................. x

    Chapter 1 Preliminaries ................................................................................................. 4

    1.1 Basic Definitions and Notations ............................................................................. 4

    1.2 Kernel Density Estimation of the Pdf ................................................................... 10

    1.3 Properties of the Kernel Estimator ........................................................................ 12

    1.4 Optimal Bandwidth ............................................................................................... 15

    Chapter 2 Nadaraya-Watson Estimator of L1-Median ............................................ 19

2.1 Importance of the Median ..................................................................... 20

    2.2 The Conditional L1-Median ................................................................................. 21

    2.3 The Nadaraya-Watson Estimator ......................................................................... 25

    2.4 The Consistency of the Nadaraya-Watson Estimator of the L1-Median ............ 28

2.4.1 Assumption and Main Results……………………………………………29

    2.4.2 Preliminary Lemmas……………………………...……………………...30

    2.4.3 Proof of the Theorem in Section 2.4.1………………………………….…36

    Chapter 3 The Double Kernel Estimator .................................................................. 39

    3.1 The Double Kernel Estimator ............................................................................. 40

    3.2 Consistency of the Double Kernel Estimator ...................................................... 41

    Chapter 4 Applications ................................................................................................. 44

    4.1 Application 1 ......................................................................................................... 45

4.2 Application 2 …...………………………………………………………50

4.3 Discussion and Conclusion ...............................................................56

    The Reference List ........................................................................................................ 58


    List of Tables

Table (4.1): Summary statistics of IBM and SP500 data ................................ 45

Table (4.2): The DK and NW median estimators for the IBM data ............... 48

Table (4.3): The DK and NW median estimators for the SP500 data ............ 49

Table (4.4): MSE of the DK and the NW estimators for Application 1 ......... 50

Table (4.5): Summary statistics of Cisco and Intel data ................................. 51

Table (4.6): The DK and NW median estimators for the Cisco data .............. 54

Table (4.7): The DK and NW median estimators for the Intel data ................ 55

Table (4.8): MSE of the DK and the NW estimators for Application 2 ......... 56



    List of Figures

Figure (1.1): Kernel density estimation based on 7 points ([11]) .................. 11

Figure (1.2): Kernel density estimates of the Ethanol data ([11]) ................. 12

Figure (1.3): Kernel density estimates based on different bandwidths h = 0.25 (solid curve), h = 0.5 (dashed curve), h = 0.75 (dotted curve) ([11]) ............ 18

Figure (4.1): Time plot of the rescaled IBM stock ........................................ 46

Figure (4.2): Time plot of the rescaled SP500 stock ..................................... 46

Figure (4.3): Scatterplot of the rescaled IBM stock versus the rescaled SP500 stock ................................................................................................... 47

Figure (4.4): Scatterplot of the squares of the rescaled IBM stock versus the squares of the rescaled SP500 stock .............................................................. 47

Figure (4.5): Graph of the NW and the DK estimators for IBM ................... 48

Figure (4.6): Graph of the NW and the DK estimators for SP500 ................ 49

Figure (4.7): Time plot of the rescaled Cisco stock ....................................... 52

Figure (4.8): Time plot of the rescaled Intel stock ......................................... 52

Figure (4.9): Scatterplot of the rescaled Cisco stock versus the rescaled Intel stock ................................................................................................... 53

Figure (4.10): Scatterplot of the squares of the rescaled Cisco stock versus the squares of the rescaled Intel stock ........................................................... 53

Figure (4.11): Graph of the NW and the DK estimators for Cisco ................ 54

Figure (4.12): Graph of the NW and the DK estimators for Intel ................. 55


    List of Abbreviations

    arg Argument

    Cdf Cumulative distribution function

    Cov Covariance

    C.I. Confidence Interval

    i.i.d. Independent and identically distributed

Inf Infimum

    Min Minimum

    MSE Mean Square Error

    o Small oh

    O Big oh

    Pdf Probability density function

    Var. Variance

N-W Nadaraya-Watson

DK Double Kernel

w.p. With probability

List of Symbols

Symbol Description

X univariate random variable, X ∈ R.

X multivariate random variable, X ∈ R^d, d ≥ 2.

f(x) probability density function.

F(x) cumulative distribution function.

K_h scaled univariate kernel function.

f̂ kernel estimator of the function f.

P probability set function.

h the bandwidth (smoothing parameter).

µ the mean.

σ² the variance.

E the expectation.

R(K) ∫ K²(x) dx.

I_A the indicator function.

| · | the absolute value function.

R the set of real numbers.

∏ product.

N(0, 1) standard normal (Gaussian) distribution.

µ_j(K) j-th moment of a kernel K.

K(·) the kernel function.

→p convergence in probability.

→d convergence in distribution.

w_i weight function.

‖·‖ the l_p-norm function.

‖·‖_{p,α} the norm-like function.

Preface

The probability density function is a fundamental concept in statistics. Consider any random variable X that has probability density function (pdf) f(x), x ∈ R. When we have observed data, the pdf of the data may be unknown, so it is useful to estimate it. There are many methods for the statistical estimation of the density function; these methods are divided into two kinds, parametric estimation and nonparametric estimation, and it is helpful to distinguish between them. Parametric estimation assumes that the sample under study has a known distribution, such as the Gaussian or Gamma distribution, and then estimates the unknown parameters of the distribution using the method of moments, maximum likelihood estimators, Bayes estimators, chi-square estimators, etc. For example, if the data have a normal distribution with mean µ and variance σ², we can estimate the parameters µ and σ² and substitute them in the normal distribution formula; we then obtain the estimated density function, denoted by f̂(x). On the other hand, nonparametric estimation is a very useful way of dealing with data from an unknown distribution. It is used for estimating the density function in order to choose a suitable model for a given data set. Examples of nonparametric estimators are the histogram estimator, the naive estimator, the kernel estimator, the Nadaraya-Watson estimator, the kernel nearest neighbor (KNN) estimator, etc. For more details see [15], [21] and [25].

In this thesis, we will study the kernel estimation of the conditional L1-median. The sample median is defined as the middle value of a set of ranked data, i.e. the sample median splits the data into two parts with an equal number of data points in each. Usually, the sample median is taken as an estimator of the population median m, a quantity which splits the distribution into two halves in the sense that

\[ P(Y \le m) = P(Y \ge m) = \tfrac{1}{2}. \]

Multivariate time series arise when several time series are observed simultaneously over time. A multivariate time series consists of multiple single series, referred to as components; see [26] and [8].

When the individual series are related to each other, there is a need for jointly analyzing the series rather than treating each one separately. By so doing, one hopes to improve the accuracy of the predictions by utilizing the additional information available from the related series in the predictions of each other. Several extensions of the concept of the univariate median were introduced in the literature, with applications in different fields of statistics. One of them is the so-called L1-median.

A nonparametric estimator has been proposed for estimating the L1-median of a multivariate conditional distribution when the covariates take values in an infinite dimensional space. The multivariate case is more appropriate for predicting the components of a vector of random variables simultaneously rather than predicting each of them separately.

In this thesis, we will study the conditional L1-median, which plays an important role in nonparametric prediction. The main goal of this thesis is to modify the estimator of the L1-median using the double kernel estimator rather than the Nadaraya-Watson estimator. In our study, we will discuss the consistency of the proposed estimator. Also, applications using real data to test the performance of the L1-median estimator will be given.

This thesis consists of four chapters, which are organized as follows:

Chapter 1

This chapter contains notations, some basic definitions, and facts that we need in the thesis. Also, we introduce the kernel density estimation of the pdf and the properties of the kernel estimator.

Chapter 2

In this chapter, we introduce the L1-median, and we use the Nadaraya-Watson (NW) estimator of the conditional cdf to estimate it. We will study the asymptotic consistency properties of the NW estimator.

Chapter 3

In this chapter, we use the double kernel (DK) estimator of the cdf rather than the NW estimator.

Chapter 4

In this chapter, we practically compare the two estimators, the NW and the DK estimators, and give conclusions.


Chapter 1

Preliminaries

This chapter contains notations, some basic definitions, and facts that we need in the remainder of this thesis. It is organized as follows: In Section 1.1, we introduce some basic definitions and notations related to the areas of this thesis. In Section 1.2, we introduce the kernel density estimator of the probability density function (pdf). In the next section, we summarize some properties of the kernel estimator. Finally, in Section 1.4, we present the problem of optimal bandwidth selection.

1.1 Basic Definitions and Notations

In this section, we introduce some basic definitions and theorems that will be helpful in the remainder of this thesis.

Consider any random variable X that has pdf f. Specifying the function f gives a natural description of the distribution of X, and allows probabilities associated with X to be found from the relation

\[ P(a < X < b) = \int_{a}^{b} f(x)\,dx \]

for any real constants a and b with a < b. If the observed data are drawn from a distribution with unknown pdf, then the construction of an estimator of the unknown density function is called density estimation. Density estimation has experienced a wide explosion of interest over the last 40 years. It has been applied in many fields, including archeology, chemistry, banking, climatology, genetics, economics, hydrology and physiology. Density estimates give valuable indications of such features as skewness and multimodality in the data; in some cases they yield conclusions, while in others all they do is point the way to further analysis and data collection.

Definition 1.1.1. (Estimator) [14] An estimator is any statistic from the sample data which is used to give information about an unknown parameter in the population.

For example, the sample mean is an estimator of the population mean. Estimators of population parameters are sometimes distinguished from the true value by using the hat symbol. For example, the normal distribution has two parameters, the mean µ and the standard deviation σ; their estimators are denoted by µ̂ and σ̂.

Definition 1.1.2. [14] Let X be a random variable with pdf depending on a parameter θ. Let X_1, X_2, ..., X_n be a random sample from the distribution of X and let θ̂ denote an estimator of θ. We say θ̂ is an unbiased estimator of θ if

\[ E(\hat{\theta}) = \theta. \]

If θ̂ is not unbiased, we say that θ̂ is a biased estimator of θ.

Example 1.1.1. S² is an unbiased estimator of σ².

Proof.

\begin{align*}
E(S^2) &= E\Big(\frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2\Big) \\
&= \frac{1}{n-1} E\Big(\sum_{i=1}^{n} (X_i - \mu + \mu - \bar{X})^2\Big) \\
&= \frac{1}{n-1} E\Big[\sum_{i=1}^{n} (X_i - \mu)^2 - 2(\bar{X} - \mu)\sum_{i=1}^{n} (X_i - \mu) + n(\bar{X} - \mu)^2\Big] \\
&= \frac{1}{n-1}\big[n E(X_i - \mu)^2 - 2n E(\bar{X} - \mu)^2 + n E(\bar{X} - \mu)^2\big] \\
&= \frac{1}{n-1}\big[n\sigma^2 - n E(\bar{X} - \mu)^2\big] \\
&= \frac{1}{n-1}\Big[n\sigma^2 - n\frac{\sigma^2}{n}\Big] \\
&= \frac{\sigma^2}{n-1}(n-1) = \sigma^2.
\end{align*}

So, S² is an unbiased estimator of σ².
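As a quick sanity check of this derivation, the following minimal simulation (our own illustration, assuming NumPy is available; it is not part of the thesis) estimates E(S²) empirically:

```python
import numpy as np

# Monte Carlo check that S^2 with the 1/(n-1) factor is unbiased for sigma^2.
rng = np.random.default_rng(0)
n, sigma2, reps = 10, 4.0, 200_000

samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
s2 = samples.var(axis=1, ddof=1)   # ddof=1 gives the 1/(n-1) estimator
print(s2.mean())                   # close to sigma2 = 4.0
```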

Definition 1.1.3. [14] If θ̂ is an unbiased estimator of θ and

\[ Var(\hat{\theta}) = \frac{1}{n\,E\Big[\Big(\frac{\partial \ln f(X)}{\partial \theta}\Big)^2\Big]}, \tag{1.1.1} \]

then θ̂ is called a minimum variance unbiased (efficient) estimator of θ.

Example 1.1.2. x̄ is a minimum variance unbiased estimator of µ in a normal population.

Proof.

\begin{align*}
Var(\bar{x}) &= \frac{\sigma^2}{n}, \\
f(x) &= \frac{1}{\sigma\sqrt{2\pi}} \exp\Big(-\frac{1}{2}\Big(\frac{x-\mu}{\sigma}\Big)^2\Big), \\
\ln f(x) &= -\ln \sigma - \frac{1}{2}\ln 2\pi - \frac{1}{2}\Big(\frac{x-\mu}{\sigma}\Big)^2, \\
\frac{\partial \ln f(x)}{\partial \mu} &= -\frac{1}{2}\cdot 2\Big(\frac{x-\mu}{\sigma}\Big)\Big(-\frac{1}{\sigma}\Big) = \frac{x-\mu}{\sigma^2}, \\
E\Big(\frac{x-\mu}{\sigma^2}\Big)^2 &= \frac{1}{\sigma^4}E(x-\mu)^2 = \frac{\sigma^2}{\sigma^4} = \frac{1}{\sigma^2}, \\
\frac{1}{n\,E\big(\frac{\partial}{\partial\mu}\ln f(x)\big)^2} &= \frac{1}{n\frac{1}{\sigma^2}} = \frac{\sigma^2}{n}.
\end{align*}

Since Var(x̄) attains the bound (1.1.1), x̄ is a minimum variance unbiased estimator of µ.

Definition 1.1.4. [14] The statistic θ̂ is a consistent estimator of the parameter θ if and only if for each c > 0,

\[ \lim_{n\to\infty} P(|\hat{\theta} - \theta| < c) = 1. \tag{1.1.2} \]

Example 1.1.3. x̄ is a consistent estimator of µ in a normal population.

Theorem 1.1.1. [14] If θ̂ is an unbiased estimator of θ and Var(θ̂) → 0 as n → ∞, then θ̂ is a consistent estimator of θ.
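Theorem 1.1.1 is a direct consequence of Chebyshev's inequality; the one-line argument, which the text leaves implicit, is

\[ P(|\hat{\theta} - \theta| \ge c) \le \frac{E(\hat{\theta} - \theta)^2}{c^2} = \frac{Var(\hat{\theta})}{c^2} \to 0 \quad \text{as } n \to \infty, \]

where the equality uses the unbiasedness E(θ̂) = θ.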

Definition 1.1.5. [14] The statistic θ̂ is a sufficient estimator of the parameter θ if and only if, for each value t of θ̂, the conditional probability distribution or density of the random sample X_1, X_2, ..., X_n given θ̂ = t is independent of θ.

Example 1.1.4. x̄ is a sufficient estimator of µ in a normal population.

    There are two types of density estimation:

    • Parametric Estimation.

    • Nonparametric Estimation.

    Parametric Estimation.

The parametric approach for estimating f(x) is to assume that f(x) is a member of some parametric family of distributions, e.g. N(x̄, S²), and then to estimate the parameters of the assumed distribution from the data. For example, fitting a normal distribution leads to the estimator

\[ f_n(x) = \frac{1}{\sqrt{2\pi}\,S} \exp\Big(-\frac{(x - \bar{x})^2}{2S^2}\Big), \quad x \in R, \]

where

\[ \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \quad \text{and} \quad S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2. \]

This approach has advantages as long as the distributional assumption is correct, or at least is not seriously wrong. It is easy to apply and it yields (relatively) stable estimates. The main disadvantage of the parametric approach is its lack of flexibility. Each parametric family of distributions imposes restrictions on the shapes that f(x) can have. For example, the density function of the normal distribution is symmetric and bell-shaped, and therefore is unsuitable for representing skewed or bimodal densities.

Nonparametric Estimation.

If the data under study come from an unknown distribution, i.e. the density function f(x) is unknown, then we must estimate the density function without distributional assumptions; this is called nonparametric estimation. There are many nonparametric statistical objects of potential interest, including density functions (univariate and multivariate), density derivatives, conditional density functions, conditional distribution functions, regression functions, median functions, quantile functions, and variance functions. Many nonparametric problems are generalizations of univariate density estimation. There are several methods for obtaining a nonparametric estimate of a pdf:

1. Histogram.

2. The Naive Estimator.

3. Kernel Density Estimation.

Definition 1.1.6. (Indicator function) [23] If A is any set, we define the indicator function I_A of the set A to be the function given by

\[ I_A(x) = \begin{cases} 1 & \text{if } x \in A, \\ 0 & \text{if } x \notin A. \end{cases} \]

Definition 1.1.7. (Convergence in Probability) [16] Let {X_n} be a sequence of random variables and let X be a random variable defined on a sample space. We say X_n converges in probability to X if for all ε > 0 we have

\[ \lim_{n\to\infty} P[|X_n - X| \ge \varepsilon] = 0, \tag{1.1.3} \]

or equivalently,

\[ \lim_{n\to\infty} P[|X_n - X| < \varepsilon] = 1. \tag{1.1.4} \]

If so, we write X_n →p X.

Definition 1.1.8. (Convergence in Distribution) [16] Let {X_n} be a sequence of random variables and let X be a random variable. Let F_{X_n} and F_X be, respectively, the cdfs of X_n and X. Let C(F_X) denote the set of all points where F_X is continuous. We say that X_n converges in distribution to X if

\[ \lim_{n\to\infty} F_{X_n}(x) = F_X(x), \quad \text{for all } x \in C(F_X). \tag{1.1.5} \]

We denote this convergence by X_n →d X.

Definition 1.1.9. (Convergence with Probability 1) [16] Let {X_n}_{n=1}^∞ be a sequence of random variables on (Ω, L, P). We say that X_n converges almost surely to a random variable X (X_n →a.s. X), or converges with probability 1 to X, or X_n converges strongly to X, if and only if

\[ P(\{\omega : X_n(\omega) \to X(\omega) \text{ as } n \to \infty\}) = 1, \]

or equivalently, for all ε > 0,

\[ \lim_{N\to\infty} P(|X_n - X| < \varepsilon \ \text{for all } n \ge N) = 1. \]

Definition 1.1.10. (Order Notation O and o) [27] Let a_n and b_n each be sequences of real numbers. We say that a_n is of big order b_n (a_n is "big oh" b_n) as n → ∞, and write a_n = O(b_n) as n → ∞, if and only if

\[ \limsup_{n\to\infty} \Big|\frac{a_n}{b_n}\Big| < \infty. \]

Similarly, a_n is of small order b_n (a_n is "small oh" b_n), written a_n = o(b_n) as n → ∞, if and only if

\[ \lim_{n\to\infty} \frac{a_n}{b_n} = 0. \]

Theorem 1.1.2. (Taylor's Theorem) [27] Suppose that f is a real-valued function defined on R and let x ∈ R. Assume that f has p continuous derivatives in an interval (x − δ, x + δ) for some δ > 0. Then for any sequence α_n converging to zero,

\[ f(x + \alpha_n) = \sum_{j=0}^{p} \frac{\alpha_n^j}{j!} f^{(j)}(x) + o(\alpha_n^p). \]

1.2 Kernel Density Estimation of the Pdf

We present the kernel density estimation of the pdf and review some important definitions and aspects in this area. In statistics, kernel density estimation (KDE) is a nonparametric way to estimate the pdf of a random variable. Kernel density estimation is a fundamental data smoothing problem where inferences about the population are made based on a finite data sample.

Definition 1.2.1. (Kernel Estimator of a Probability Density Function) [25] Suppose that X_1, ..., X_n is a random sample of data from an unknown continuous distribution with pdf f(x) and cumulative distribution function (cdf) F(x). The kernel estimator of the probability density function is defined as

\[ \hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\Big(\frac{x - X_i}{h}\Big), \tag{1.2.1} \]

where the bandwidth h = h_n is a sequence of positive numbers converging to zero, and K(·) is a kernel function assumed to be symmetric and to satisfy

\[ \int_{-\infty}^{\infty} K(x)\,dx = 1, \]

with moments

\[ \mu_j(K) = \int_{-\infty}^{\infty} y^j K(y)\,dy. \tag{1.2.2} \]

Density estimates derived using such kernels can fail to be probability densities, because they can be negative for some values of x. Typically, K is chosen to be a symmetric pdf. There is a large body of literature on choosing K and h well, where "well" means that the estimate converges asymptotically as rapidly as possible in some suitable norm on pdfs.

A slightly more compact formula for the kernel estimator can be obtained by introducing the rescaling notation K_h(u) = h^{-1} K(u/h). This allows us to write

\[ \hat{f}(x) = n^{-1} \sum_{i=1}^{n} K_h(x - X_i). \]
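To make (1.2.1) concrete, here is a minimal Python sketch (our own illustration with a Gaussian kernel; the function name kde and all parameter choices are ours, not the thesis's):

```python
import numpy as np

def kde(x_grid, data, h):
    """Kernel density estimate (1.2.1) with a Gaussian kernel K."""
    u = (x_grid[:, None] - data[None, :]) / h        # (x - X_i)/h
    K = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)     # kernel ordinates
    return K.mean(axis=1) / h                        # average, scaled by 1/h

rng = np.random.default_rng(1)
data = rng.normal(0.0, 1.0, 200)
f_hat = kde(np.linspace(-4, 4, 101), data, h=0.5)    # approximates the N(0,1) pdf
```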

Figure 1.1: Kernel density estimation based on 7 points ([11]).

From Figure (1.1), we observe:

(1) the shape of each bump is defined by the kernel function;

(2) the spread of each bump is determined by the bandwidth h.

That is, the value of the kernel estimate at the point x is the average of the n kernel ordinates at this point.

Figure 1.2: Kernel density estimates of the Ethanol data ([11]).

Figure 1.2 shows the kernel density estimates of the Ethanol data based on the same bandwidth h_n = 0.2, but using different kernels. The solid curve stands for the triangular kernel, the dashed curve for the uniform kernel, and the dotted curve for the normal kernel.

1.3 Properties of the Kernel Estimator

In this section, we introduce some important properties of the kernel. A kernel is a piecewise continuous, even function, symmetric around zero and integrating to one, i.e.

\[ K(x) = K(-x), \qquad \int_{-\infty}^{\infty} K(x)\,dx = 1. \]

The kernel function need not have bounded support, and in most applications K is a positive pdf.

Definition 1.3.1. [5] A kernel function K is said to be of order p if its first nonzero moment is µ_p, i.e. if

\[ \mu_j(K) = 0, \quad j = 1, 2, \dots, p-1; \qquad \mu_p(K) \ne 0, \]

where

\[ \mu_j(K) = \int_{-\infty}^{\infty} y^j K(y)\,dy. \]

We consider the following conditions:

(i) The unknown density function f(x) has a continuous second derivative f''(x).

(ii) The bandwidth h = h_n satisfies lim_{n→∞} h = 0 and lim_{n→∞} nh = ∞.

(iii) The kernel K is a bounded pdf of order 2, symmetric about the origin, i.e. ∫_{-∞}^{∞} zK(z) dz = 0 and ∫_{-∞}^{∞} z²K(z) dz ≠ 0.

Under these conditions, the bias and the variance of f̂(x) can be derived. Expanding f(x − hz) in a Taylor series about x, we obtain

\[ f(x - hz) = f(x) - hz f'(x) + \tfrac{1}{2} h^2 z^2 f''(x) + o(h^2), \]

so that

\begin{align*}
E(\hat{f}(x)) &= \int K(z)\big[f(x) - hz f'(x) + \tfrac{1}{2} h^2 z^2 f''(x)\big]\,dz + o(h^2) \\
&= f(x)\int K(z)\,dz - h f'(x)\int zK(z)\,dz + \frac{h^2}{2} f''(x)\int z^2 K(z)\,dz + o(h^2).
\end{align*}

This leads to

\[ E\hat{f}(x) = f(x) + \tfrac{1}{2} h^2 f''(x) \int z^2 K(z)\,dz + o(h^2), \tag{1.3.1} \]

where we have used ∫K(z)dz = 1 and ∫zK(z)dz = 0.

For the variance, substituting z = (x − y)/h we get

\[ Var(\hat{f}(x)) = \frac{1}{nh^2}\Big[h\int K^2(z) f(x - zh)\,dz - \Big(h\int K(z) f(x - zh)\,dz\Big)^2\Big]. \tag{1.3.3} \]

Using the Taylor series for f(x − zh) as before, we have

\begin{align*}
Var(\hat{f}(x)) &= \frac{1}{nh}\Big[\int \big(f(x) - hz f'(x)\big) K^2(z)\,dz - o(h)\Big] \\
&= (nh)^{-1} f(x) \int K^2(z)\,dz + o\big((nh)^{-1}\big). \tag{1.3.2}
\end{align*}

By the assumptions, we have the result.

From the above expressions for the bias and the variance, we have the following properties:

1. The bias is of order h², which implies that f̂(x) is an asymptotically unbiased estimator.

2. The bias is large whenever the absolute value of the second derivative |f''(x)| is large. This occurs for several densities at peaks, where the bias is negative, and at valleys, where the bias is positive.

3. The variance is of order (nh)^{-1}, which means that the variance converges to zero by Condition (ii).

1.4 Optimal Bandwidth

The problem of selecting the bandwidth is very important in kernel density estimation. The choice of an appropriate bandwidth is critical to the performance of most nonparametric density estimators. When the bandwidth is very small, the estimate will be very close to the original data; the estimate will be almost unbiased, but it will have large variation under repeated sampling. If the bandwidth is very large, the estimate will be very smooth, lying close to the mean of all the data; such an estimate will have small variance, but it will be highly biased. There are many rules for bandwidth selection, for example normal scale rules, over-smoothed bandwidth selection rules, least squares cross-validation, biased cross-validation, estimation of density functionals, and plug-in bandwidth selection. For more details see [25] and [27].

We shall use two types of error criteria. The mean square error (MSE) is used to measure the error when estimating the density function at a single point. It is defined by

\[ MSE\{f_n(x)\} = E\{f_n(x) - f(x)\}^2. \tag{1.4.1} \]

We can write the MSE as the sum of the squared bias and the variance at x:

\[ MSE(f_n(x)) = \{E f_n(x) - f(x)\}^2 + Var(f_n(x)). \tag{1.4.2} \]
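The decomposition (1.4.2) follows by adding and subtracting E f_n(x) inside the square, a step the text leaves implicit:

\begin{align*}
E\{f_n(x) - f(x)\}^2 &= E\{(f_n(x) - E f_n(x)) + (E f_n(x) - f(x))\}^2 \\
&= Var(f_n(x)) + \{E f_n(x) - f(x)\}^2,
\end{align*}

since the cross term 2\{E f_n(x) - f(x)\}\, E\{f_n(x) - E f_n(x)\} vanishes.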

A second type of criterion measures the error when estimating the density over the whole real line. The best known of this type is the mean integrated square error (MISE) introduced by [22]. The MISE is defined as

\[ MISE(f_n) = E \int_{-\infty}^{\infty} \{f_n(x) - f(x)\}^2\,dx. \tag{1.4.3} \]

By changing the order of integration we have

\[ MISE(f_n) = \int_{-\infty}^{\infty} MSE\{f_n(x)\}\,dx = \int_{-\infty}^{\infty} \{E f_n(x) - f(x)\}^2\,dx + \int_{-\infty}^{\infty} Var(f_n(x))\,dx. \tag{1.4.4} \]

Equation (1.4.4) gives the MISE as a sum of the integrated squared bias and the integrated variance. Substituting (1.3.1) and (1.3.2), we conclude that

\[ MISE(f_n) = AMISE(f_n) + o\{h^4 + (nh)^{-1}\}, \tag{1.4.5} \]

where AMISE is the asymptotic mean integrated squared error, given by

\[ AMISE(f_n) = \frac{1}{4} h^4 \mu_2(K)^2 R(f'') + (nh)^{-1} R(K), \tag{1.4.6} \]

see [27].

The natural way of choosing h is to plot several curves and choose the estimate that best matches one's prior (subjective) ideas. However, this method is not practical in pattern recognition, since we typically have high-dimensional data.

Alternatively, assume a standard density function and find the value of the bandwidth that minimizes the mean integrated square error:

\[ h_{MISE} = \arg\min_h E\Big[\int (f_n(x) - f(x))^2\,dx\Big]. \tag{1.4.7} \]

If we assume that the true distribution is Gaussian and we use a Gaussian kernel, the bandwidth h is computed using the following equation from [25]:

\[ h^* = 1.06\, S\, N^{-1/5}, \tag{1.4.8} \]

where S is the sample standard deviation and N is the number of training examples.
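In code, the rule of thumb (1.4.8) is essentially one line (a sketch of ours, assuming NumPy):

```python
import numpy as np

def normal_reference_bandwidth(data):
    """Rule of thumb (1.4.8): h* = 1.06 * S * N^(-1/5)."""
    data = np.asarray(data)
    return 1.06 * data.std(ddof=1) * data.size ** (-1 / 5)

h = normal_reference_bandwidth(np.random.default_rng(2).normal(size=500))
```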

By differentiating (1.4.6) with respect to h, we can find the optimal bandwidth with respect to the AMISE criterion. This yields the optimal bandwidth

\[ h_{opt} = \Big[\frac{R(K)}{\mu_2(K)^2 R(f'')\, n}\Big]^{1/5}. \tag{1.4.9} \]

Therefore, if we substitute (1.4.9) into (1.4.6), we obtain the smallest value of AMISE for estimating f using the kernel K:

\[ \inf_{h>0} AMISE\{f_n\} = \frac{5}{4}\big\{\mu_2(K)^2 R(K)^4 R(f'')\big\}^{1/5} n^{-4/5}. \]
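For the record, (1.4.9) is obtained by setting the derivative of (1.4.6) to zero:

\[ \frac{\partial}{\partial h} AMISE(f_n) = h^3 \mu_2(K)^2 R(f'') - \frac{R(K)}{n h^2} = 0 \;\Longrightarrow\; h^5 = \frac{R(K)}{\mu_2(K)^2 R(f'')\, n}. \]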

Notice that in (1.4.9) the optimal bandwidth depends on the unknown density being estimated, so we cannot use (1.4.9) directly to find the optimal bandwidth h_opt. Also, from (1.4.9) we can draw the following useful conclusions:

• The optimal bandwidth converges to zero as the sample size increases, but at a very slow rate.

• The optimal bandwidth is inversely proportional to R(f'')^{1/5}. Since R(f'') measures the curvature of f, this means that for a density function with little curvature, the optimal bandwidth will be large. Conversely, if the density function has a large curvature, the optimal bandwidth will be small.


Figure 1.3: Kernel density estimates based on different bandwidths h = 0.25 (solid curve), h = 0.5 (dashed curve), h = 0.75 (dotted curve) ([11]).

Summary

In this chapter, we introduced some basic definitions and theorems that we will need in this thesis. We studied the definition of estimation and its types, gave a brief overview of nonparametric estimation and its common methods, and then presented the kernel density estimation of the pdf and the properties of the kernel estimator. In the next chapter, we will study the Nadaraya-Watson estimator of the L1-median.

Chapter 2

Nadaraya-Watson Estimator of the L1-Median

    Introduction

Conditional distribution functions underlie many popular statistical objects of interest. They are rarely modeled directly in the parametric setting and have perhaps received even less attention in the kernel setting. Nevertheless, as will be seen, they are extremely useful for a range of tasks, whether directly estimating the conditional distribution function [6] or modeling conditional quantiles. The conditional median depends directly on the conditional distribution function. Indeed, estimating the conditional distribution is actually much more informative, since it allows us not only to calculate the expected value E(Y|X) and the variance σ²(Y|X), but also to describe the general shape of the conditional distribution. In this context, several nonparametric methods are applicable for estimating the conditional distribution function based on data (X_1, Y_1), ..., (X_n, Y_n). One class of kernel-type estimators is the Nadaraya-Watson estimator, which is among the most widely known and used estimators of the conditional distribution function. Conditional distribution estimation was introduced by [22]. A bias correction was proposed by [17], and [12] proposed a direct estimator based on local polynomial estimation. The Nadaraya-Watson (NW) estimator was created independently by [28] and [19]. In this chapter, we will investigate a nonparametric method for estimating the conditional L1-median based on the Nadaraya-Watson estimator.

In this chapter we present the NW estimator in order to estimate the conditional L1-median. In Section 2.1, we introduce the importance of the median, with some historical notes. In Section 2.2, we present the conditional L1-median. In Section 2.3, the NW estimator of the conditional L1-median is studied. Then, in Section 2.4, the asymptotic properties of the NW estimator are discussed and derived.

2.1 Importance of the Median

The median is the value separating the higher half of a data sample, a population, or a probability distribution from the lower half. In simple terms, it may be thought of as the "middle" value of a data set. For example, in the data set 1, 3, 3, 6, 7, 8, 9, the median is 6, the fourth number in the sample. The median is a commonly used measure of the properties of a data set in statistics and probability theory. The basic advantage of the median in describing data, compared to the mean (often simply described as the "average"), is that it is not skewed so much by extremely large or small values, and so it may give a better idea of a "typical" value. For example, in understanding statistics like household income or assets, which vary greatly, a mean may be skewed by a small number of extremely high or low values; median income, for example, may be a better way to suggest what a "typical" income is.

Because of this, the median is of central importance in robust statistics, as it is the most resistant statistic, having a breakdown point of 50%: so long as no more than half the data are contaminated, the median will not give an arbitrarily large or small result.

The median is another way to measure the center of a numerical data set. The statistical median is much like the median of an interstate highway: on many highways, the median is in the middle, with an equal number of lanes lying on either side of it. In a numerical data set, the median is the point at which there is an equal number of points whose values lie above and below it; thus the median is truly the middle of the data set.

The mean and the median are the measures of location most used by investigators, because these measures are easy to understand. Most investigators describe the location of the data by the arithmetic mean, except when the data are highly skewed, highly kurtotic, or contaminated with outliers, in which case the median is often used. In the case of asymmetric data, the median is almost always close to the mode, and in many cases a good approach would be to estimate the mode itself. See [20].

Advantages of the Median

1. It is very simple to understand and easy to calculate.

2. It is a special average used for qualitative phenomena, like intelligence or beauty, which are not quantified but ranked.

Disadvantages of the Median

1. It is less representative.

2. It takes a long time to calculate for a very large set of data.

2.2 The Conditional L1-Median

In this section, we study the conditional median. We define the median as the solution to the problem of minimizing a sum of absolute residuals, and we then study the conditional L1-median.

Definition 2.2.1. Let Y_1, Y_2, ..., Y_n be a random sample from a distribution with pdf f(y) and cdf F(y). Then the median of the distribution is defined by

\[ \theta = \inf\{y : F(y) \ge 0.5\}. \]

Example 2.2.1. To characterize the median as a minimization problem, we solve

\[ \arg\min_{\theta\in R} \sum_{i=1}^{n} |Y_i - \theta|. \]

Proof. [7] For the population version, consider

\begin{align*}
E|Y - \theta| &= \int_{-\infty}^{\infty} |y - \theta|\, f_Y(y)\,dy \\
&= \int_{-\infty}^{\theta} -(y - \theta) f(y)\,dy + \int_{\theta}^{\infty} (y - \theta) f(y)\,dy.
\end{align*}

Differentiating with respect to θ and setting the derivative equal to zero,

\[ \frac{d}{d\theta} E|Y - \theta| = \int_{-\infty}^{\theta} f(y)\,dy - \int_{\theta}^{\infty} f(y)\,dy = 0, \]

so that

\[ \int_{-\infty}^{\theta} f(y)\,dy = \int_{\theta}^{\infty} f(y)\,dy. \]

    The symmetry of the piecewise linear absolute value function implies that the min-

    imization of the sum of absolute residuals equates the number of the positive and

    negative residuals.

    The median regression estimates the conditional median of Y given X = x and

    corresponds to the minimization of E(|Y − θ||X = x) over θ. The associated loss

    function is r(u) = |u|. We can take the loss function to be ρ0.5(u) = 0.5|u|. because

    the half positive equal half negative.
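A quick numerical illustration of this minimization property (our own sketch, reusing the data set from Section 2.1; it is not part of the thesis):

```python
import numpy as np

# The sample median minimizes the sum of absolute residuals.
y = np.array([1.0, 3.0, 3.0, 6.0, 7.0, 8.0, 9.0])
thetas = np.linspace(0.0, 10.0, 1001)
loss = np.abs(y[None, :] - thetas[:, None]).sum(axis=1)
print(thetas[loss.argmin()], np.median(y))   # both print (close to) 6.0
```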

Definition 2.2.2. (The Conditional Median) [1] Let (X_1, Y_1), (X_2, Y_2), ..., (X_n, Y_n) be a random sample from a distribution with a conditional cdf F(y|x). Then the conditional median M(x) is defined by

\[ M(x) = \inf\{y : F(y|x) \ge 0.5\}. \]

Definition 2.2.3. Suppose we have a complex vector space V. A norm is a function f : V → R which satisfies:

1. f(x) ≥ 0 for all x ∈ V;

2. f(x + y) ≤ f(x) + f(y) for all x, y ∈ V;

3. f(λx) = |λ| f(x) for all λ ∈ C and x ∈ V;

4. f(x) = 0 if and only if x = 0.

We usually write a norm as ||x||.

Examples of the most important norms are as follows:

• The 2-norm or Euclidean norm: \( \|x\|_2 = \big(\sum_{i=1}^{n} |x_i|^2\big)^{1/2} \).

• The 1-norm: \( \|x\|_1 = \sum_{i=1}^{n} |x_i| \).

• For any integer p ≥ 1, the p-norm: \( \|x\|_p = \big(\sum_{i=1}^{n} |x_i|^p\big)^{1/p} \).

• The ∞-norm, also called the sup-norm: \( \|x\|_\infty = \max_i |x_i| \). This notation is used because \( \|x\|_\infty = \lim_{p\to\infty} \|x\|_p \).
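As a small numerical illustration of these norms (our own sketch, not from the thesis):

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])
print(np.abs(x).sum())               # 1-norm  = 8.0
print(np.sqrt((x**2).sum()))         # 2-norm  = sqrt(26)
print((np.abs(x)**5).sum()**(1/5))   # 5-norm, approaches the sup-norm as p grows
print(np.abs(x).max())               # sup-norm = 4.0
```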

Several extensions of the concept of the univariate median were introduced in the literature, with applications in different fields of statistics. One of them is the so-called L1-median. A nonparametric estimator has been proposed for estimating the L1-median of a multivariate conditional distribution when the covariates take values in an infinite dimensional space. The multivariate case is more appropriate for predicting the components of a vector of random variables simultaneously rather than predicting each of them separately.

Definition 2.2.4. (Convex function) A function f : M → R defined on a nonempty subset M of R^n and taking real values is called convex if:

1. the domain M of the function is convex;

2. for any x, y ∈ M and every λ ∈ [0, 1], one has

\[ f(\lambda x + (1 - \lambda)y) \le \lambda f(x) + (1 - \lambda) f(y). \tag{2.2.1} \]

If the above inequality is strict whenever x ≠ y and 0 < λ < 1, f is called strictly convex.

Examples of convex functions include norms. Recall that a real-valued function ||x|| on R^n is called a norm if it is:

• nonnegative everywhere: for all x ∈ R^n, ||x|| ≥ 0;

• homogeneous: for all x ∈ R^n and a ∈ R, ||ax|| = |a| ||x||;

• satisfies the triangle inequality: ||x + y|| ≤ ||x|| + ||y||;

• ||x|| = 0 if and only if x = 0.

Let ‖·‖ denote any strictly convex norm (‖α + β‖ < ‖α‖ + ‖β‖ whenever α and β are not proportional) on R^d, and let ‖·‖_p : R^d → R, 1 < p < ∞. In the sequel, we restrict attention to the Euclidean norm; for notational simplicity, we write ‖·‖ for ‖·‖_2.

For a fixed x ∈ R^s, define a vector function of α, α ∈ R^d, by

\[ \phi(\alpha, x) = E(\|Y - \alpha\| - \|Y\| \mid X = x) = \int_{R^d} (\|y - \alpha\| - \|y\|)\, F(dy|x), \]

where F(·|x) is the conditional probability measure of Y given X = x. When the norm is strictly convex, the existence and uniqueness of µ ∈ R^d such that

\[ \phi(\mu, x) = \inf_{\alpha\in R^d} \phi(\alpha, x) \]

are guaranteed. The vector µ is called the L1-median of the measure associated with the conditional distribution function [18].

Definition 2.2.5. (The Conditional L1-Median) [2],[3] The L1-median of Y conditionally on X = x is defined by

\[ \mu(x) = \arg\min_{\alpha\in R^d} \phi(\alpha, x), \]

where for α ∈ R^d,

\[ \phi(\alpha, x) = \int_{R^d} (\|y - \alpha\| - \|y\|)\, F(dy|x). \]

In this section, we introduced the median as a minimization problem in both the univariate and the multivariate cases, and we introduced the L1-median. In the next section, we will study the Nadaraya-Watson estimator of the univariate conditional median; its mean and variance will be discussed.

2.3 The Nadaraya-Watson Estimator

In this section, we study some basic facts about the NW estimator for later use. It is one of the popular nonparametric methods for estimating the conditional density function f(y|x). We will consider the kernel estimation of the conditional cumulative distribution function (cdf).

Let (X_1, Y_1), (X_2, Y_2), ..., (X_n, Y_n) be a random sample from a distribution with a conditional probability density function (pdf) f(y|x). Then the conditional cdf F(y|x) is given by

\[ F(y|x) = \int_{-\infty}^{y} f(u|x)\,du, \qquad \text{where} \qquad f(y|x) = \frac{f(x, y)}{f_X(x)}. \]

Now, we introduce the basic equations of kernel conditional density estimation. The kernel function K(u) is assumed to be a Borel symmetric function, h is a sequence of positive numbers converging to zero, called the bandwidth, and K_h(x) = K(x/h)/h.

Standard kernel estimators of f(x, y) and f_X(x) are

\[ \hat{f}(x, y) = \frac{1}{n} \sum_{i=1}^{n} K_h(x - X_i) K_h(y - Y_i), \qquad \hat{f}_X(x) = \frac{1}{n} \sum_{i=1}^{n} K_h(x - X_i), \]

and

\[ \hat{f}(y|x) = \frac{\hat{f}(x, y)}{\hat{f}_X(x)} = \frac{\sum_{i=1}^{n} K_h(x - X_i) K_h(y - Y_i)}{\sum_{i=1}^{n} K_h(x - X_i)}. \]

Now the estimate of the conditional cdf is given by

\[ \hat{F}(y|x) = \int_{-\infty}^{y} \hat{f}(u|x)\,du = \frac{\sum_{i=1}^{n} K_h(x - X_i) \int_{-\infty}^{y} K_h(u - Y_i)\,du}{\sum_{i=1}^{n} K_h(x - X_i)}. \]

Now, there are two ways to estimate the conditional cdf F(y|x). The first replaces the inner integral by the indicator function I(Y_i ≤ y), which gives the Nadaraya-Watson estimator

\[ \hat{F}_{NW}(y|x) = \frac{\sum_{i=1}^{n} I(Y_i \le y)\, K_h(x - X_i)}{\sum_{i=1}^{n} K_h(x - X_i)}; \]

the second, the double kernel estimator, keeps the smoothed integral and is studied in Chapter 3.

Remark 2.3.1.

\[ 0 \le \hat{F}_{NW}(y|x) \le 1. \]

Indeed, if y < Y_i for all i = 1, 2, ..., n, then I(Y_i ≤ y) = 0 for all i, so

\[ \hat{F}_{NW}(y|x) = \frac{\sum_{i=1}^{n} I(Y_i \le y)\, K_h(x - X_i)}{\sum_{i=1}^{n} K_h(x - X_i)} = 0. \]

If y lies between the Y_i's, i.e. some of the Y_i are less than or equal to y but not all, then I(Y_i ≤ y) = 0 for some i and I(Y_i ≤ y) = 1 for the others, so

\[ 0 < \hat{F}_{NW}(y|x) < 1. \]

If y ≥ Y_i for all i, then I(Y_i ≤ y) = 1 for all i, so

\[ \hat{F}_{NW}(y|x) = \frac{\sum_{i=1}^{n} K_h(x - X_i)}{\sum_{i=1}^{n} K_h(x - X_i)} = 1. \]

    Definition 2.3.1. The NW estimator of the conditional median m(x) is defined as

    m̂NW (x) = inf{y ∈ R : F̂NW (y|x) ≥ 0.5}
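To fix ideas, the following minimal sketch (ours, not from the thesis) computes F̂_NW with a Gaussian kernel and the plug-in median m̂_NW; since F̂_NW(·|x) is a step function jumping at the Y_i, the infimum is attained at one of the sorted Y_i:

```python
import numpy as np

def K_h(u, h):
    """Scaled Gaussian kernel K_h(u) = K(u/h)/h."""
    return np.exp(-0.5 * (u / h) ** 2) / (h * np.sqrt(2 * np.pi))

def F_nw(y, x, X, Y, h):
    """Nadaraya-Watson estimate of F(y|x)."""
    w = K_h(x - X, h)
    return np.sum((Y <= y) * w) / np.sum(w)

def median_nw(x, X, Y, h):
    """m_hat(x) = inf{y : F_nw(y|x) >= 0.5}."""
    for y in np.sort(Y):
        if F_nw(y, x, X, Y, h) >= 0.5:
            return y
    return np.max(Y)

rng = np.random.default_rng(3)
X = rng.uniform(0.0, 1.0, 300)
Y = 2.0 * X + rng.normal(0.0, 0.3, 300)   # true conditional median is 2x
print(median_nw(0.5, X, Y, h=0.1))        # approximately 1.0
```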

In the following theorem, we discuss the expectation of the NW estimator.

Theorem 2.3.1. (The Expectation of the NW Estimator) Let Y_1, ..., Y_n be independent random variables. The expectation of the estimator F̂_NW(y|x) is given by

\[ E[\hat{F}_{NW}(y|x)] = \sum_{i=1}^{n} \frac{K_h(x - X_i)}{\sum_{j=1}^{n} K_h(x - X_j)}\, F(y|X_i). \]

Proof.

\begin{align*}
E[\hat{F}_{NW}(y|x)] &= E\Big[\frac{\sum_{i=1}^{n} I(Y_i \le y)\, K_h(x - X_i)}{\sum_{i=1}^{n} K_h(x - X_i)}\Big] \\
&= \frac{\sum_{i=1}^{n} K_h(x - X_i)\, E[I(Y_i \le y)]}{\sum_{i=1}^{n} K_h(x - X_i)} \\
&= \frac{\sum_{i=1}^{n} K_h(x - X_i) \int_{-\infty}^{y} f(t|X_i)\,dt}{\sum_{i=1}^{n} K_h(x - X_i)} \\
&= \frac{\sum_{i=1}^{n} K_h(x - X_i)\, F(y|X_i)}{\sum_{i=1}^{n} K_h(x - X_i)} \\
&= \sum_{i=1}^{n} \frac{K_h(x - X_i)}{\sum_{j=1}^{n} K_h(x - X_j)}\, F(y|X_i),
\end{align*}

where we used E[I(Y_i ≤ y)] = P(Y_i ≤ y | X_i) = ∫_{-∞}^{y} f(t|X_i) dt.

Theorem 2.3.2. (The Variance of the Nadaraya-Watson Estimator) [9] Let Y_1, ..., Y_n be independent random variables. The variance of the estimator F̂_NW(y|x) is given by

\[ Var[\hat{F}_{NW}(y|x)] = \sum_{i=1}^{n} \frac{K^2\big(\frac{x - X_i}{h_n}\big)}{\Big[\sum_{j=1}^{n} K\big(\frac{x - X_j}{h_n}\big)\Big]^2}\, \big[F(y|X_i) - F^2(y|X_i)\big]. \]

Proof. Write F̂_NW(y|x) = Σ_i w_i I(Y_i ≤ y) with weights w_i = K_h(x − X_i)/Σ_j K_h(x − X_j). Since the Y_i are independent, the indicators I(Y_i ≤ y) are independent Bernoulli random variables with success probabilities F(y|X_i), so Var(I(Y_i ≤ y)) = F(y|X_i) − F²(y|X_i). Hence

\[ Var[\hat{F}_{NW}(y|x)] = \sum_{i=1}^{n} w_i^2\, Var(I(Y_i \le y)) = \sum_{i=1}^{n} \frac{K^2\big(\frac{x - X_i}{h_n}\big)}{\Big[\sum_{j=1}^{n} K\big(\frac{x - X_j}{h_n}\big)\Big]^2}\, \big[F(y|X_i) - F^2(y|X_i)\big]. \]

2.4 The Consistency of the Nadaraya-Watson Estimator of the L1-Median

The results of this section are due to [2].

Let φ_n(α, x) be the estimate of φ(α, x) defined by

\[ \phi_n(\alpha, x) = \int_{R^d} (\|y - \alpha\| - \|y\|)\, F_n(dy|x) = \frac{\sum_{i=1}^{n} (\|Y_i - \alpha\| - \|Y_i\|)\, K_h(x - X_i)}{\sum_{i=1}^{n} K_h(x - X_i)}. \]

From the definition of µ(x), it seems natural to estimate it by minimizing the estimate φ_n(α, x).

Definition 2.4.1. (The minimizer µ_n is the L1-median estimator) [2],[3]

\[ \mu_n(x) = \arg\min_{\alpha\in R^d} \phi_n(\alpha, x) = \arg\min_{\alpha\in R^d} \sum_{i=1}^{n} \|Y_i - \alpha\|\, K_h(x - X_i), \]

where the last equality is obtained by removing terms independent of α in the expression of φ_n(α, x).
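The minimizer µ_n(x) has no closed form, but the objective is convex, so it can be computed iteratively. The sketch below is our own illustration (the thesis does not prescribe an algorithm); it uses a kernel-weighted Weiszfeld-type iteration:

```python
import numpy as np

def l1_median_nw(x, X, Y, h, iters=100, eps=1e-8):
    """Minimize sum_i ||Y_i - alpha|| K_h(x - X_i) over alpha in R^d.

    X has shape (n,); Y has shape (n, d)."""
    w = np.exp(-0.5 * ((x - X) / h) ** 2)       # Gaussian kernel weights
    alpha = np.average(Y, axis=0, weights=w)    # start at the weighted mean
    for _ in range(iters):
        dist = np.linalg.norm(Y - alpha, axis=1)
        g = w / np.maximum(dist, eps)           # reweight by 1/||Y_i - alpha||
        alpha = (g[:, None] * Y).sum(axis=0) / g.sum()
    return alpha

rng = np.random.default_rng(4)
X = rng.uniform(0.0, 1.0, 400)
Y = np.column_stack([X + rng.normal(0, 0.2, 400),
                     2 * X + rng.normal(0, 0.2, 400)])
print(l1_median_nw(0.5, X, Y, h=0.1))  # roughly (0.5, 1.0)
```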

    2.4.1 Assumption and Main Results

Let C be a compact subset of R^s on which the marginal density of X, denoted by g, is bounded below by some positive constant. We now state some assumptions, which are required to prove the theoretical results of this section.

(A1) The density g of X is uniformly continuous.

(A2) The function µ(·) satisfies a uniform uniqueness property over C: for all ε > 0 there exists η > 0 such that for all t : C → R^d,

\[ \sup_{x\in C} \|\mu(x) - t(x)\| \ge \varepsilon \implies \sup_{x\in C} |\phi(\mu(x), x) - \phi(t(x), x)| \ge \eta. \]

(A3) The kernel K is a bounded, positive, Hölderian function.

(A4) The sequence (h_n)_{n≥1} satisfies

\[ \lim_{n\to\infty} \frac{n h_n^s}{\log n} = \infty. \]

(A5) For any Borel set V ⊂ R^d and any α ∈ R^d, the functions Q(V|·) and φ(α, ·) are continuous on C.

Then, under the previous conditions, the results in Theorem 2.4.1 can be proved.

Theorem 2.4.1. [2],[3] Assume (A1), (A2), (A3), (A4) and (A5). Then:

1. with probability 1 (w.p.1), one can find an integer N ≥ 1 such that if n ≥ N and x ∈ C, the median µ_n(x) associated with the probability measure Q_n(·|x) exists and is unique; moreover, the function µ_n(·) is continuous on C;

2. the function µ(·) is continuous on C;

3. w.p.1,

\[ \sup_{x\in C} \|\mu_n(x) - \mu(x)\| \to 0 \quad \text{as } n \to \infty. \]

2.4.2 Preliminary Lemmas

We review here some preliminary lemmas and definitions that will be helpful in following the proof of Theorem 2.4.1.

Definition 2.4.2. (Borel set) A Borel set is any set in a topological space that can be formed from open sets (or, equivalently, from closed sets) through the operations of countable union and countable intersection.

Definition 2.4.3. (Continuous, Uniformly Continuous) [13] Let f : I → R and let x₀ ∈ I. We say that f is continuous at x₀ if

\[ \lim_{x\to x_0} f(x) = f(x_0). \]

Using the (ε, δ) definition of the limit, this means the following:

\[ \forall \varepsilon > 0, \exists \delta = \delta(\varepsilon, x_0) > 0, \forall x \in I: |x - x_0| < \delta \implies |f(x) - f(x_0)| < \varepsilon. \]

If f is continuous at x for all x ∈ I, we say f is continuous on I. We say that f is uniformly continuous on I if

\[ \forall \varepsilon > 0, \exists \delta = \delta(\varepsilon) > 0, \forall x, x_0 \in I: |x - x_0| < \delta \implies |f(x) - f(x_0)| < \varepsilon. \]

Lemma 2.4.1. Assume (A5). We have

\[ \lim_{\|\alpha\|\to\infty}\ \sup_{n\ge 1}\ \sup_{x\in C}\ \Big|\frac{\phi_n(\alpha,x)}{\|\alpha\|} - 1\Big| = 0. \]

Proof. Since for all p ≥ 1 the map x ↦ P(‖Y‖ > p | X = x) is a continuous function (by (A5)), one can find x_p ∈ C such that

\[ \sup_{x\in C} P(\|Y\| > p \mid X = x) = P(\|Y\| > p \mid X = x_p). \tag{2.4.1} \]

The sequence (x_p)_{p≥1} being in the compact set C, one can extract a subsequence (x_{p_k})_{k≥1} such that

\[ x_{p_k} \to x_\infty \quad \text{as } k \to \infty. \tag{2.4.2} \]

Then, w.p.1, for any α ∈ R^d − {0}, x ∈ C, n ≥ 1 and k ≥ 1, we have

\[ \Big|\frac{\phi_n(\alpha,x)}{\|\alpha\|} - 1\Big| \le \int_{R^d} \Big|\frac{\|y-\alpha\| - \|y\| - \|\alpha\|}{\|\alpha\|}\Big|\, F_n(dy|x) = \int_{\|y\|\le p_k} \Big|\frac{\|y-\alpha\| - \|y\| - \|\alpha\|}{\|\alpha\|}\Big|\, F_n(dy|x) + \int_{\|y\|>p_k} \Big|\frac{\|y-\alpha\| - \|y\| - \|\alpha\|}{\|\alpha\|}\Big|\, F_n(dy|x). \]

Now, for all α ∈ R^d − {0} and y ∈ R^d, the triangle inequality gives both

\[ \Big|\frac{\|y-\alpha\| - \|y\| - \|\alpha\|}{\|\alpha\|}\Big| \le \frac{\big|\|y-\alpha\| - \|y\|\big| + \|\alpha\|}{\|\alpha\|} \le \frac{\|\alpha\| + \|\alpha\|}{\|\alpha\|} = 2 \]

and

\[ \Big|\frac{\|y-\alpha\| - \|y\| - \|\alpha\|}{\|\alpha\|}\Big| = \frac{\|y\| + \|\alpha\| - \|y-\alpha\|}{\|\alpha\|} \le \frac{\|y\| + \|\alpha\| - (\|\alpha\| - \|y\|)}{\|\alpha\|} = \frac{2\|y\|}{\|\alpha\|}. \]

Thus we get the inequality

\[ \Big|\frac{\phi_n(\alpha,x)}{\|\alpha\|} - 1\Big| \le \int_{\|y\|\le p_k} \frac{2\|y\|}{\|\alpha\|}\, F_n(dy|x) + \int_{\|y\|>p_k} 2\, F_n(dy|x). \]

Letting ‖α‖ tend to infinity, we obtain, w.p.1 and for all k ≥ 1,

\[ \lim_{\|\alpha\|\to\infty}\ \sup_{n\ge 1}\ \sup_{x\in C}\ \Big|\frac{\phi_n(\alpha,x)}{\|\alpha\|} - 1\Big| \le 2 \sup_{n\ge 1}\ \sup_{x\in C} \int_{\|y\|>p_k} F_n(dy|x). \tag{2.4.3} \]

The last upper bound is now proved to tend to zero as k tends to infinity. If k, n ≥ 1 and x ∈ C, let us denote

\[ q_n^x(k) = \int_{\|y\|>p_k} F_n(dy|x) = \frac{\sum_{i=1}^{n} I(\|Y_i\|>p_k)\, K\big(\frac{x-X_i}{h_n}\big)}{\sum_{i=1}^{n} K\big(\frac{x-X_i}{h_n}\big)}. \]

Assuming (A1), (A3), (A4) and (A5), we have, w.p.1, for any k ≥ 1,

\[ \sup_{x\in C} |q_n^x(k) - P(\|Y\|>p_k \mid X=x)| \to 0 \quad \text{as } n \to \infty, \tag{2.4.4} \]

see [4]. The P-null set where the convergence in (2.4.4) fails may be chosen independent of k. Moreover, note that

\[ \Big(\sup_{x\in C} q_n^x(k)\Big)_{k\ge 1} \quad \text{and} \quad \Big(\sup_{x\in C} P(\|Y\|>p_k \mid X=x)\Big)_{k\ge 1} \]

are decreasing sequences of positive numbers (because the sequence (p_k)_{k≥1} is increasing), hence both converge as k → ∞. Consequently, w.p.1, the convergence in (2.4.4) is uniform in k ≥ 1, i.e.

\[ \sup_{k\ge 1}\ \sup_{x\in C} |q_n^x(k) - P(\|Y\|>p_k \mid X=x)| \to 0 \quad \text{as } n \to \infty. \]

Let ε > 0. By the above property, one can find, w.p.1, an integer N ≥ 1 such that if n > N, k ≥ 1 and x ∈ C,

\[ q_n^x(k) \le \varepsilon + P(\|Y\|>p_k \mid X=x). \tag{2.4.5} \]

Now, w.p.1, if k ≥ 1,

\[ \sup_{n\ge 1}\ \sup_{x\in C} q_n^x(k) \le \sup_{n=1,\dots,N}\ \sup_{x\in C} q_n^x(k) + \sup_{n>N}\ \sup_{x\in C} q_n^x(k). \]

On the one hand, by the very definition of q_n^x(k) (a weighted average of the indicators I(‖Y_i‖ > p_k)),

\[ \sup_{n=1,\dots,N}\ \sup_{x\in C} q_n^x(k) \le \max_{1\le i\le N} I(\|Y_i\|>p_k). \]

On the other hand, according to (2.4.5),

\[ \sup_{n>N}\ \sup_{x\in C} q_n^x(k) \le \varepsilon + \sup_{x\in C} P(\|Y\|>p_k \mid X=x). \]

But the Y_i are P-a.s. finite random variables, so the first bound vanishes for k large enough, and, with probability 1,

\[ \limsup_{k\to\infty}\ \sup_{n\ge 1}\ \sup_{x\in C} q_n^x(k) \le \varepsilon + \limsup_{k\to\infty}\ \sup_{x\in C} P(\|Y\|>p_k \mid X=x). \tag{2.4.6} \]

As ε is arbitrary, the proof will be completed by showing that the last term in (2.4.6) is equal to 0. Let k ≥ 1. According to (2.4.1),

\[ \sup_{x\in C} P(\|Y\|>p_k \mid X=x) = P(\|Y\|>p_k \mid X=x_{p_k}). \]

Moreover, x_{p_k} → x_∞ as k → ∞, according to (2.4.2). Now, if k ≥ 1 and p′ ≤ p_k,

\[ P(\|Y\|>p_k \mid X=x_{p_k}) \le P(\|Y\|>p' \mid X=x_{p_k}), \]

so that for fixed p′ ≥ 1,

\[ \limsup_{k\to\infty} P(\|Y\|>p' \mid X=x_{p_k}) = P(\|Y\|>p' \mid X=x_\infty), \]

because P(‖Y‖ > p′ | X = ·) is a continuous function on C by (A5). Letting p′ → ∞, we get

\[ \lim_{k\to\infty}\ \sup_{x\in C} P(\|Y\|>p_k \mid X=x) = 0, \]

because Y is P-a.s. finite.

    Lemma 2.4.2. Let us assume that K is a positive probability density. w.p.1,one

    can find an integer N ≥ 1 such that if n ≥ N and x ∈ C, the L1-median µn(x)

    associate with the probability measure Qn(.|x) ( which is not supported by a straight

    line) exists and is unique.

    33

Proof. Let $x \in C$ and $n \geq 1$. W.p.1, the set of minimizers of $\varphi_n(\cdot|x)$ is non-empty. Let us prove that this set contains only one point. By assumption, for every $x \in \mathbb{R}^s$ and any straight line $D$ in $\mathbb{R}^d$:
$$Q(D|x) = P(Y \in D \mid X = x) < 1.$$
Obviously, this gives
$$P(Y \in D) < 1. \tag{2.4.7}$$
Now, let us denote by $D(y_1, y_2)$ the line which connects two points $y_1, y_2 \in \mathbb{R}^d$ ($y_1 \neq y_2$), and by $P_Y$ the distribution of $Y$. Clearly,
$$P(Y_1, Y_2, Y_3 \text{ on the same line}) = P(Y_3 \in D(Y_1, Y_2)) = \int_{\mathbb{R}^{2d}} P(Y_3 \in D(y_1, y_2))\, P_Y(dy_1)\, P_Y(dy_2),$$
because $Y_1$, $Y_2$ and $Y_3$ are independent and identically distributed random variables. Then, according to (2.4.7), one gets
$$P(Y_1, Y_2, Y_3 \text{ on the same line}) < 1.$$
Let us denote by $\Omega_1$ the subset of $\Omega$ defined by
$$\Omega_1 = \{Y_1, Y_2, \dots \text{ not on the same line}\}.$$
From the previous inequality, we have
$$P(\Omega_1^c) \leq P(Y_1, Y_2, Y_3 \text{ on the same line}) < 1.$$
But $\Omega_1^c$ is a symmetric element of the $\sigma$-field generated by the independent random variables $Y_1, Y_2, \dots$ (permuting finitely many of the $Y_i$'s leaves it unchanged), so the Hewitt-Savage 0-1 law (see [9]) gives $P(\Omega_1^c) \in \{0, 1\}$; since $P(\Omega_1^c) < 1$, it follows that $P(\Omega_1^c) = 0$. Hence $P(\Omega_1) = 1$, and one can work on $\Omega_1$ instead of $\Omega$. For all $\omega \in \Omega_1$, one can find $N(\omega) \geq 1$ such that if $n \geq N(\omega)$, the points $Y_1(\omega), Y_2(\omega), \dots, Y_n(\omega)$ are not on a straight line. Now, let $x \in \mathbb{R}^s$, $\omega \in \Omega_1$ and $n \geq N(\omega)$. Because $K$ is a positive function, the support of the probability measure $Q_n(\cdot|x)(\omega)$ is
$$\{Y_1(\omega), \dots, Y_n(\omega)\}.$$
But $n \geq N(\omega)$, so this support is not included in any straight line. Consequently, according to Theorem 2.17 of [18], the set of $L_1$-medians associated with the probability measure $Q_n(\cdot|x)(\omega)$ contains only one element: $\mu_n(x)(\omega)$.


Lemma 2.4.3. Let us assume that $K$ is a continuous positive kernel. W.p.1, one can find an integer $N \geq 1$ such that if $n \geq N$, $\mu_n(\cdot)$ is continuous on $C$.

Proof. According to Lemma 2.4.2, w.p.1, one can find an integer $N \geq 1$ such that if $n \geq N$ and $x \in C$, the $L_1$-median $\mu_n(x)$ associated with the probability measure $Q_n(\cdot|x)$ exists and is unique. We now prove that, $P$-a.s., if $n \geq N$, $\mu_n(\cdot)$ is continuous on $C$.
Let $x \in C$, and let $(x_p)_{p\geq 1}$ be a sequence in $C$ such that $x_p \to x$ as $p \to \infty$. By the continuity of $K$, one obviously gets that the sequence of probability measures $(Q_n(\cdot|x_p))_{p\geq 1}$ converges weakly to $Q_n(\cdot|x)$. Moreover, $Q_n(\cdot|x)$ is not supported by any straight line, so that, according to Corollary 2.26 of [18], $\mu_n(x_p) \to \mu_n(x)$ as $p \to \infty$.
Hence, w.p.1, if $n \geq N$, $\mu_n(\cdot)$ is a continuous function on $C$.

Lemma 2.4.4. Assume (A1), (A2), (A3), (A4) and (A5). W.p.1, for any $A > 0$:
$$\sup_{\|\alpha\|\leq A}\sup_{x\in C} |\phi_n(\alpha, x) - \phi(\alpha, x)| \longrightarrow 0, \text{ as } n \to \infty.$$

Proof. We clearly have, w.p.1, for all $\alpha \in \mathbb{R}^d$ and $i \geq 1$:
$$\big|\, \|Y_i - \alpha\| - \|Y_i\| \,\big| \leq \|\alpha\|.$$
Consequently, assuming (A1), (A3), (A4) and (A5), we have, for all $\alpha \in \mathbb{R}^d$ and w.p.1:
$$\sup_{x\in C} |\phi_n(\alpha, x) - \phi(\alpha, x)| \longrightarrow 0, \text{ as } n \to \infty, \tag{2.4.8}$$
see [4]. But, w.p.1, if $n \geq 1$, $x \in C$ and $\alpha, \alpha' \in \mathbb{R}^d$:
$$|\phi_n(\alpha, x) - \phi_n(\alpha', x)| \leq \|\alpha - \alpha'\| \tag{2.4.9}$$
and
$$|\phi(\alpha, x) - \phi(\alpha', x)| \leq \|\alpha - \alpha'\|.$$
From this we deduce that the $P$-null set on which (2.4.8) fails may be chosen independent of $\alpha$. Moreover, according to (2.4.9), w.p.1, the sequence of functions $(\phi_n(\cdot, x))_{n\geq 1}$ is equicontinuous, and this property is independent of $x$. Thus, according to (2.4.8) and the Arzelà-Ascoli theorem [9], we get that, w.p.1, for any $A > 0$:
$$\sup_{\|\alpha\|\leq A}\sup_{x\in C} |\phi_n(\alpha, x) - \phi(\alpha, x)| \longrightarrow 0, \text{ as } n \to \infty.$$
This completes the proof of the lemma.

2.4.3 Proof of the Theorem in Section 2.4.1

Assuming (A3), assertion (1) follows from Lemmas 2.4.2 and 2.4.3. Moreover, (2) is a straightforward consequence of (1) and (3). So one only needs to prove (3). The proof is divided into two steps.

Step 1. We want to prove that, w.p.1, one can find $r > 0$ and $N \geq 1$ such that
$$\sup_{n\geq N}\sup_{x\in C} \|\mu_n(x)\| \leq r \quad\text{and}\quad \sup_{x\in C} \|\mu(x)\| \leq r.$$
From Lemma 2.4.1, one can find, w.p.1, $r_1 > 0$ such that, if $\|\alpha\| > r_1$, for all $n \geq 1$ and all $x \in C$:
$$\varphi_n(\alpha, x) \geq \frac{1}{2}\|\alpha\|. \tag{2.4.10}$$
We have already proved in Lemma 2.4.2 that, assuming (A3), w.p.1, there exists $N \geq 1$ such that, if $n \geq N$ and $x \in C$, the $L_1$-median $\mu_n(x)$ associated with the probability measure $Q_n(\cdot|x)$ exists and is unique. Assume now that there exist $n \geq N$ and $x \in C$ such that
$$\|\mu_n(x)\| > r_1.$$
Then, according to (2.4.10),
$$\varphi_n(\mu_n(x), x) \geq \frac{1}{2}\|\mu_n(x)\| > 0.$$
But, by the very definition of $\mu_n(x)$:
$$\varphi_n(\mu_n(x), x) = \inf_{\alpha\in\mathbb{R}^d} \varphi_n(\alpha, x) \leq \varphi_n(0, x) = 0.$$


This is impossible. Hence, w.p.1, for all $n \geq N$:
$$\sup_{x\in C} \|\mu_n(x)\| \leq r_1.$$
Similar arguments lead to the existence of a real number $r_2 > 0$ such that
$$\sup_{x\in C} \|\mu(x)\| \leq r_2.$$
Now, the desired result is obtained with $r = \max(r_1, r_2)$.

Step 2. Conclusion. W.p.1, if $n \geq N$ ($N$ was fixed in Step 1):
$$\sup_{x\in C} |\varphi(\mu(x), x) - \varphi(\mu_n(x), x)| \leq \sup_{x\in C} |\varphi(\mu(x), x) - \varphi_n(\mu_n(x), x)| + \sup_{x\in C} |\varphi_n(\mu_n(x), x) - \varphi(\mu_n(x), x)|.$$
From Step 1:
$$\sup_{n\geq N}\sup_{x\in C} \|\mu_n(x)\| \leq r \quad\text{and}\quad \sup_{x\in C} \|\mu(x)\| \leq r,$$
so that
$$\varphi(\mu(x), x) = \inf_{\alpha\in\mathbb{R}^d} \varphi(\alpha, x) = \inf_{\|\alpha\|\leq r} \varphi(\alpha, x)$$
and
$$\varphi_n(\mu_n(x), x) = \inf_{\alpha\in\mathbb{R}^d} \varphi_n(\alpha, x) = \inf_{\|\alpha\|\leq r} \varphi_n(\alpha, x).$$
Thus, w.p.1, if $n \geq N$:
$$\sup_{x\in C} |\varphi(\mu(x), x) - \varphi(\mu_n(x), x)| \leq \sup_{x\in C} \Big|\inf_{\|\alpha\|\leq r} \varphi(\alpha, x) - \inf_{\|\alpha\|\leq r} \varphi_n(\alpha, x)\Big| + \sup_{\|\alpha\|\leq r}\sup_{x\in C} |\varphi_n(\alpha, x) - \varphi(\alpha, x)|.$$
Assuming (A1), (A3), (A4) and (A5), from Lemma 2.4.4 we get, w.p.1,
$$\sup_{x\in C} |\varphi(\mu(x), x) - \varphi(\mu_n(x), x)| \to 0, \text{ as } n \to \infty.$$
Then, applying (A2), one gets, w.p.1,
$$\sup_{x\in C} \|\mu(x) - \mu_n(x)\| \to 0, \text{ as } n \to \infty.$$
This completes the proof of Theorem 2.4.1.


Our study of the NW estimator shows that a large bias and boundary effects are its main drawbacks. Hence, in the next chapter, we study another estimator, known as the double kernel (DK) estimator.


Chapter 3

The Double Kernel Estimator

Introduction

Let $(X, Y)$ be a two-dimensional random variable with joint distribution function $F(x, y)$. The large bias and boundary effects are considered the most important defects of the NW estimator. The NW estimator was therefore modified in order to obtain a more refined estimator, called the double kernel (DK) estimator; see [10].
Various consistency proofs for the kernel density estimator have been developed over the last few decades. Important milestones are the pointwise consistency and almost sure uniform convergence with a fixed bandwidth on the one hand, and the rate of convergence with a fixed or even a variable bandwidth on the other hand. While considering global properties of the empirical distribution function is sufficient for strong consistency, proofs of exact convergence rates use deeper information about the underlying empirical processes. A unifying feature, however, is that earlier and more recent proofs use bounds on the probability that a sum of random variables deviates from its mean; see [25].
This chapter studies the double kernel estimation of the conditional $L_1$-median of $Y$ for a given value of $X$, based on a random sample from the above distribution. The joint asymptotic consistency of the conditional $L_1$-median estimated at a finite number of distinct points is established under some regularity conditions. The aim of this chapter is to introduce the DK estimator and its aspects, to discuss the properties of the DK estimator of the conditional $L_1$-median, and to study the asymptotic consistency of the DK estimator.
This chapter consists of two sections. In Section 3.1, we introduce the DK estimator of the conditional distribution function, $\hat F_{DK}(y|x)$. In Section 3.2, we investigate the asymptotic consistency of the DK estimator.

3.1 The Double Kernel Estimator

If $f(x, y)$ is the joint pdf of the random variables $X$ and $Y$ at $(x, y)$, and $g(x)$ is the marginal pdf of $X$ at $x$, the conditional pdf of $Y$ given $X = x$ is given by
$$f(y|x) = \frac{f(x, y)}{g(x)}, \qquad g(x) > 0, \tag{3.1.1}$$
for each $y$ within the range of $Y$, and then
$$F(y|x) = \int_{-\infty}^{y} f(u|x)\, du. \tag{3.1.2}$$
Now we introduce the basic equations of kernel conditional distribution function (KCDF) estimation. $K(u)$ is assumed to be a kernel function, and $h_n = h$ is a sequence of positive numbers converging to zero. The standard kernel estimators of $f(x, y)$, $g(x)$ and $f(y|x)$ are
$$\hat f(x, y) = (nh^2)^{-1}\sum_{i=1}^{n} K_h(x - X_i)\, K_h(y - Y_i), \tag{3.1.3}$$
$$\hat g(x) = (nh)^{-1}\sum_{i=1}^{n} K_h(x - X_i), \tag{3.1.4}$$
and
$$\hat f(y|x) = \frac{\hat f(x, y)}{\hat g(x)} = \frac{(nh^2)^{-1}\sum_{i=1}^{n} K_h(x - X_i)\, K_h(y - Y_i)}{(nh)^{-1}\sum_{i=1}^{n} K_h(x - X_i)} = \frac{\sum_{i=1}^{n} K_h(x - X_i)\, K_h(y - Y_i)}{h\sum_{i=1}^{n} K_h(x - X_i)}.$$

Definition 3.1.1. The double kernel estimator of the conditional distribution function $F(y|x)$ is defined as
$$\hat F_{DK}(y|x) = \int_{-\infty}^{y} \hat f(u|x)\, du = \frac{B_n(x, y)}{\hat g(x)},$$

where
$$B_n(x, y) = \frac{1}{nh_n}\sum_{i=1}^{n} K_h(x - X_i)\, \hat K_h(y - Y_i), \qquad \hat K(y) = \int_{-\infty}^{y} K(u)\, du,$$
$K$ is a probability density function, and $h_n$ is a sequence of positive numbers converging to zero.
[29] used a double kernel approach. In this case, the indicator function in the NW estimator is replaced by a continuous distribution function $\Omega\!\left(\frac{y - Y_i}{h_2}\right)$. Then the estimator takes the form
$$\hat F_{DK}(y|x) = \sum_{i=1}^{n} w_i(x)\, \Omega\!\left(\frac{y - Y_i}{h_2}\right),$$
where
$$\Omega(y) = \int_{-\infty}^{y} W(u)\, du$$
is a distribution function with associated density function $W(u)$.

Definition 3.1.2. The double kernel estimator of the conditional median $m(x)$ is defined as
$$\hat m_{DK}(x) = \inf\{y \in \mathbb{R} : \hat F_{DK}(y|x) \geq 0.5\}.$$
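To make the construction concrete, the following minimal Python sketch computes $\hat F_{DK}(y|x)$ and inverts it for $\hat m_{DK}(x)$ in the univariate case. It is an illustration only, not the S-plus code used in this thesis; the Gaussian choices for $K$ and $\Omega$ and the function names are our own assumptions.

```python
import numpy as np
from scipy.stats import norm

def F_dk(y, x, X, Y, h1, h2):
    """Double kernel estimate of F(y|x): Nadaraya-Watson weights in x,
    with the indicator 1{Y_i <= y} replaced by an integrated kernel."""
    w = norm.pdf((x - X) / h1)       # K_h(x - X_i), Gaussian kernel (assumption)
    w = w / w.sum()                  # normalized weights w_i(x)
    return np.sum(w * norm.cdf((y - Y) / h2))   # Omega((y - Y_i)/h2)

def m_dk(x, X, Y, h1, h2):
    """Conditional median: smallest y on the sample grid with F_DK(y|x) >= 0.5."""
    for y in np.sort(Y):
        if F_dk(y, x, X, Y, h1, h2) >= 0.5:
            return y
    return np.max(Y)
```

Since $\hat F_{DK}(\cdot|x)$ is nondecreasing in $y$, scanning the sorted sample values realizes the infimum in Definition 3.1.2 up to the resolution of the grid.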

3.2 Consistency of the Double Kernel Estimator

This is the main section of this chapter. Here, we introduce the DK estimator of the $L_1$-median as a minimization problem; the asymptotic consistency of this estimator is then derived under some regularity conditions.
Let $\phi_n(\alpha, x)$ be the estimate of $\phi(\alpha, x) = \int_{\mathbb{R}^d} (\|y - \alpha\| - \|y\|)\, F(dy|x)$ defined by
$$\phi_n(\alpha, x) = \frac{\sum_{i=1}^{n} (\|Y_i - \alpha\| - \|Y_i\|)\, K_h(x - X_i)\, \hat K_h(y - Y_i)}{\sum_{i=1}^{n} K_h(x - X_i)}.$$
From the definition of $\mu(x)$, it is natural to estimate it by minimizing the estimate $\phi_n(\alpha, x)$ over $\alpha$.

Definition 3.2.1. (The minimizer $\mu_n$ is the $L_1$-median)
$$\mu_n(x) = \arg\min_{\alpha\in\mathbb{R}^d} \phi_n(\alpha, x) = \arg\min_{\alpha\in\mathbb{R}^d} \sum_{i=1}^{n} \|Y_i - \alpha\|\, K_h(x - X_i)\, \hat K_h(y - Y_i),$$
where the last equality is obtained by removing the terms independent of $\alpha$ from the expression of $\phi_n(\alpha, x)$.
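Numerically, the weighted minimization in Definition 3.2.1 can be carried out by a Weiszfeld-type fixed-point iteration for the weighted spatial median. The following Python sketch is one illustrative implementation; the starting point, tolerance and iteration cap are our own assumptions, not part of the thesis.

```python
import numpy as np

def weighted_l1_median(Y, w, tol=1e-8, max_iter=200):
    """Minimize sum_i w_i * ||Y_i - alpha|| over alpha in R^d,
    where w_i are the (nonnegative) kernel weights."""
    alpha = np.average(Y, axis=0, weights=w)     # start from the weighted mean
    for _ in range(max_iter):
        dist = np.linalg.norm(Y - alpha, axis=1)
        dist = np.maximum(dist, tol)             # avoid division by zero
        c = w / dist
        new_alpha = (c[:, None] * Y).sum(axis=0) / c.sum()
        if np.linalg.norm(new_alpha - alpha) < tol:
            return new_alpha
        alpha = new_alpha
    return alpha
```

For the NW estimator one takes $w_i = K_h(x - X_i)$; for the DK estimator, the same weights multiplied by the corresponding integrated-kernel factor.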

Theorem 3.2.1. Assume (A1), (A2), (A3), (A4) and (A5). Then:

1. with probability 1 (w.p.1), one can find an integer $N \geq 1$ such that if $n \geq N$ and $x \in C$, the median $\mu_n(x)$ associated with the probability measure $Q_n(\cdot|x)$ exists and is unique; moreover, the function $\mu_n(\cdot)$ is continuous on $C$;

2. the function $\mu(\cdot)$ is continuous on $C$;

3. w.p.1:
$$\sup_{x\in C} \|\mu_n(x) - \mu(x)\| \to 0, \text{ as } n \to \infty.$$

Lemma 3.2.1. (Bochner Lemma) [24] Suppose that $K$ is a Borel function satisfying the conditions:

1. $\sup_{x\in\mathbb{R}^s} |K(x)| < \infty$

Lemma 3.2.4. Let us assume that $K$ is a positive probability density. W.p.1, one can find an integer $N \geq 1$ such that if $n \geq N$ and $x \in C$, the $L_1$-median $\mu_n(x)$ associated with the probability measure $Q_n(\cdot|x)$ (which is not supported by a straight line) exists and is unique.

Lemma 3.2.5. Let us assume that $K$ is a continuous positive kernel. W.p.1, one can find an integer $N \geq 1$ such that if $n \geq N$, $\mu_n(\cdot)$ is continuous on $C$.

The proofs of the previous lemmas and of the main theorem are obtained using the same techniques as in Chapter 2, after replacing the indicator function $I(Y_i \leq y)$ by $\Omega\!\left(\frac{y - Y_i}{h}\right)$.


Chapter 4

Application

The performance of the estimators of the conditional $L_1$-median is our concern in this chapter. We use the two estimators studied in Chapters 2 and 3 to analyze and forecast two bivariate time series.
The dependence between stock markets in the world of finance indicates that one must deal with multiple time series jointly rather than with each time series alone. The $L_1$-median estimators enable us to predict multivariate medians of different time series. Therefore, we use the NW and DK estimators to analyze bivariate time series.
The S-plus program is used to compute the two estimators, based on their theoretical equations from the previous two chapters.
This chapter consists of three sections. In Section 4.1, the first application is given, using a bivariate time series from [26] on the IBM stock and the SP500 index. Section 4.2 is similar to the first, but it depends on another bivariate time series from [26], on the Intel and Cisco stocks. Finally, we close the chapter with Section 4.3, which contains some conclusions and suggestions drawn from our study in this thesis.


4.1 Application 1

We illustrate the application of the NW and DK estimators of the conditional bivariate median by considering the prediction of a financial position with a bivariate financial time series.

Prediction for the IBM and SP500 series

Consider the bivariate time series of the monthly log returns of the IBM stock and the SP500 index from January 1926 to December 1999, consisting of 888 observations. The source of this data set is [26].

We rescaled the data so that they range from zero to one. Now, let $x_{1,t} = \{IBM\}_t$ and $x_{2,t} = \{SP500\}_t$. Thus $x_t = \{(x_{1,t}, x_{2,t})\}$ is a bivariate time series.
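The rescaling is a simple min-max transformation; a minimal Python sketch (our own illustration, not the thesis's S-plus code):

```python
import numpy as np

def rescale01(x):
    """Min-max rescaling of a series to the interval [0, 1]."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())
```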

The two time series $x_{1,t}$ and $x_{2,t}$ are correlated. Table 4.1 provides some summary statistics of the rescaled series. Figures 4.1 and 4.2 show the time plots of the two series, while Figures 4.3 and 4.4 show the scatterplots of the two series and of their squares, respectively.

Table 4.1: Summary statistics of IBM and SP500 data

Series   Min.   1st Qu.   Mean   Median   3rd Qu.   Max.
IBM      0.00   0.46      0.52   0.52     0.59      1.00
SP500    0.00   0.47      0.51   0.52     0.56      1.00

Details of Calculation

We computed $\mu_n(x)$ by finding the minimum over a finite set of candidate points. For the NW estimator,
$$\mu_n(x) = \arg\min_{\alpha\in\mathbb{R}^d} \phi_n(\alpha, x) = \arg\min_{\alpha\in\mathbb{R}^d} \sum_{i=1}^{n} \|Y_i - \alpha\|\, K_h(x - X_i), \tag{4.1.1}$$
and for the DK estimator,
$$\mu_n(x) = \arg\min_{\alpha\in\mathbb{R}^d} \phi_n(\alpha, x) = \arg\min_{\alpha\in\mathbb{R}^d} \sum_{i=1}^{n} \|Y_i - \alpha\|\, K_h(x - X_i)\, \hat K_h(y - Y_i). \tag{4.1.2}$$


We use the first 880 bivariate observations to predict the last 8 observations of the bivariate time series, using the NW estimator from (4.1.1) and the DK estimator from (4.1.2) for the median. The results for the IBM data are listed in Table 4.2, and Figure 4.5 shows the true observations for IBM together with their predictions using the NW and DK estimators. The results for the SP500 data are listed in Table 4.3, and their graph is shown in Figure 4.6.
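One plausible reading of the computation in (4.1.1), sketched in Python below, approximates the arg min by searching over the observed responses as candidate points; the radial Gaussian kernel and the candidate set are our own assumptions. The DK version (4.1.2) is analogous, with each weight multiplied by its integrated-kernel factor.

```python
import numpy as np
from scipy.stats import norm

def nw_l1_median(x, X, Y, h):
    """Approximate the NW L1-median (4.1.1): minimize
    sum_i ||Y_i - a|| * K_h(x - X_i) over candidates a in {Y_1, ..., Y_n}."""
    w = norm.pdf(np.linalg.norm(X - x, axis=1) / h)   # kernel weights in x
    costs = np.array([(w * np.linalg.norm(Y - a, axis=1)).sum() for a in Y])
    return Y[costs.argmin()]
```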

    Figure 4.1: Time plot of the rescaled IBM stock

    Figure 4.2: Time plot of the rescaled SP500 stock


Figure 4.3: Scatterplot of the rescaled IBM stock versus the rescaled SP500 stock

Figure 4.4: Scatterplot of the squares of the rescaled IBM stock versus the squares of the rescaled SP500 stock


Figure 4.5: Graph of the NW and the DK estimators for IBM

Table 4.2: The DK and NW median estimators for the IBM data

i     DK estimator   True value   NW estimator
880   0.7051316      0.6751316    0.6651316
881   0.7051316      0.6811093    0.6351316
882   0.6851316      0.4560167    0.5851316
883   0.6351316      0.4889528    0.5351316
884   0.5851316      0.4542471    0.5651316
885   0.5951316      0.1577721    0.5351316
886   0.5751316      0.5832439    0.5651316
887   0.6051316      0.5777071    0.5551316


Figure 4.6: Graph of the NW and the DK estimators for SP500

Table 4.3: The DK and NW median estimators for the SP500 data

i     DK estimator   True value   NW estimator
880   0.5068489      0.6751316    0.5068489
881   0.5468489      0.6811093    0.5468489
882   0.5868489      0.4560167    0.5468489
883   0.5968489      0.4889528    0.5168489
884   0.5468489      0.4542471    0.5368489
885   0.5668489      0.1577721    0.5268489
886   0.5468489      0.5832439    0.5268489
887   0.5668489      0.5777071    0.5168489


Table 4.4 reports the values of the MSE for the NW and DK kernel estimators, where
$$MSE = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat y_i)^2,$$
$y_i$ is the true value and $\hat y_i$ its predicted value. As seen from Table 4.4, the NW estimator performs better (smaller MSE) than the DK estimator for both series.

Table 4.4: MSE of the DK and the NW estimators for Application 1

     IBM          SP500
DK   0.03557136   0.004800537
NW   0.02206879   0.003111601
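For completeness, the MSE above is computed per series over the held-out points; a one-function Python sketch (our own illustration):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared prediction error, MSE = (1/n) * sum (y_i - yhat_i)^2."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean((y_true - y_pred) ** 2))
```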

4.2 Application 2

In this application, we use the NW and DK estimators on a bivariate time series consisting of Cisco and Intel data.

Prediction for the Cisco and Intel series

Suppose that we want to predict values of the following two time series: the stocks of Cisco Systems and of the Intel Corporation. We use the daily log returns of the two stocks from January 2, 1991 to December 31, 1999, consisting of 2266 observations. [26] considered this data set and computed the VaR at the end of the data span by building univariate and multivariate volatility models. We rescaled the data so that they range from zero to one. Now, let $x_{1,t} = \{Cisco\}_t$ and $x_{2,t} = \{Intel\}_t$. Thus $x_t = \{(x_{1,t}, x_{2,t})\}$ is a bivariate time series.
The two time series $x_{1,t}$ and $x_{2,t}$ are correlated. Table 4.5 provides some summary statistics of the rescaled series. Figures 4.7 and 4.8 show the time plots of the two series, while Figures 4.9 and 4.10 show the scatterplots of the two series and of their squares, respectively.


Table 4.5: Summary statistics of Cisco and Intel data

Series   Min.   1st Qu.   Mean   Median   3rd Qu.   Max.
Cisco    0.00   0.55      0.59   0.54     0.64      1.00
Intel    0.00   0.48      0.54   0.54     0.60      1.00

We use the first 2258 bivariate observations to predict the last 8 observations of the bivariate time series, using the NW and DK estimators for the median. The results for the Cisco data are listed in Table 4.6, and Figure 4.11 shows the true observations for Cisco together with their predictions using the NW and DK estimators. The results for the Intel data are listed in Table 4.7, and their graph is shown in Figure 4.12.


Figure 4.7: Time plot of the rescaled Cisco stock

Figure 4.8: Time plot of the rescaled Intel stock


Figure 4.9: Scatterplot of the rescaled Cisco stock versus the rescaled Intel stock

Figure 4.10: Scatterplot of the squares of the rescaled Cisco stock versus the squares of the rescaled Intel stock


Figure 4.11: Graph of the NW and the DK estimators for Cisco

Table 4.6: The DK and NW median estimators for the Cisco data

i      DK estimator   True value   NW estimator
2258   0.7280763      0.7280763    0.7380763
2259   0.6880763      0.5314482    0.7080763
2260   0.6380763      0.5408192    0.6580763
2261   0.6580763      0.6439006    0.6380763
2262   0.6280763      0.6509391    0.6380763
2263   0.6580763      0.4613087    0.6380763
2264   0.6280763      0.5078365    0.6480763
2265   0.6580763      0.6794206    0.6280763


Figure 4.12: Graph of the NW and the DK estimators for Intel

Table 4.7: The DK and NW median estimators for the Intel data

i      DK estimator   True value   NW estimator
2258   0.5176012      0.4776012    0.5176012
2259   0.5576012      0.3765592    0.5576012
2260   0.5176012      0.4469881    0.5576012
2261   0.5576012      0.4590812    0.5476012
2262   0.5276012      0.6055674    0.5676012
2263   0.5576012      0.4269851    0.5576012
2264   0.5276012      0.8381123    0.5576012
2265   0.5576012      0.5740417    0.5576012


Table 4.8 summarizes the computed MSE values for the two time series using the NW and DK estimators. The results indicate that the NW estimator performs better than the DK estimator for the Cisco data; on the other hand, the DK estimator performs better than the NW estimator for the Intel data.

Table 4.8: MSE of the DK and the NW estimators for Application 2

     Intel        Cisco
DK   0.0110432    0.02111191
NW   0.01898825   0.01234954

4.3 Discussion and Conclusion

The need to deal with multiple time series jointly, rather than with each time series alone, gave researchers the idea of using the $L_1$-median estimator; for more details see [2].
[2] used the NW estimator for estimating the $L_1$-median. This estimator depends on the indicator function, which makes the resulting prediction curves non-smooth. To get much smoother curves, we have proposed a DK estimator for the $L_1$-median. Looking at Figures 4.5, 4.6, 4.11 and 4.12, we note that the estimated curves using the DK estimator are smoother than those obtained using the NW estimator.
The comparison based on the MSE indicated that the NW estimator was better than the DK estimator for three of the four time series.
From these results, we can conclude the following:

1. To get smooth prediction curves, one should use the DK estimator.

2. In general, the NW estimator gives curves that are much closer to the real data.

3. Modified estimators based on the NW estimator, obtained by using reweighted versions or a variable bandwidth, would give better estimators than those we have used, but they require much more complicated computations and programming.


Bibliography

[1] Al Attal, A. (2016). On the Kernel Estimation of the Conditional Median. The Islamic University of Gaza.

[2] Berlinet, A., Cadre, B. and Gannoun, A. (1998). On the conditional L1-median and its estimation. Nonparametric Statistics.

[3] Berlinet, A., Cadre, B. and Gannoun, A. (2001). On the Conditional L1-Median and its Estimation. University of Montpellier, France.

[4] Bosq, D. and Lecoutre, J. P. (1987). Théorie de l'estimation fonctionnelle. Economica.

[5] Hansen, B. E. (2009). Lecture Notes on Nonparametrics. University of Wisconsin.

[6] Cameron, A. C. and Trivedi, P. K. (1998). Regression Analysis of Count Data. New York: Cambridge University Press.

[7] Casella, G. and Berger, R. (2002). Statistical Inference. USA.

[8] De Gooijer, J. G., Gannoun, A. and Zerom, D. (2004). A Multivariate Quantile Predictor. UvA Econometrics, Discussion Paper 2002/08.

[9] Dudley, R. M. (1989). Real Analysis and Probability. Chapman and Hall.

[10] Fan, J., Hu, T. C. and Truong, Y. K. (1994). Robust nonparametric function estimation. Scandinavian Journal of Statistics, 21, 433-446.

[11] Fan, J. and Yao, Q. (2003). Nonlinear Time Series: Nonparametric and Parametric Methods. Springer-Verlag, USA.

[12] Fan, J., Yao, Q. and Tong, H. (1996). Estimation of conditional densities and sensitivity measures in nonlinear dynamical systems. Biometrika, 83, 189-206.

[13] Florescu, I. (2015). Probability and Stochastic Processes. John Wiley and Sons.

[14] Freund, J. (1992). Mathematical Statistics. Arizona State University.

[15] Casella, G. and Berger, R. L. (1990). Statistical Inference. Cornell University, North Carolina State University.

[16] Hogg, R., McKean, J. and Craig, A. (2005). Introduction to Mathematical Statistics. University of Iowa, Western Michigan University.

[17] Hyndman, R. J. (1996). Estimating and visualizing conditional densities. Journal of Computational and Graphical Statistics, 5, 315-336.

[18] Kemperman, J. H. B. (1987). The median of a finite measure on a Banach space. In: Statistical Data Analysis Based on the L1-Norm and Related Methods, Y. Dodge (ed.), Amsterdam: North-Holland, 217-230.

[19] Nadaraya, E. A. (1964). On estimating regression. Theory of Probability and its Applications, 10, 186-190.

[20] Rider, P. R. (1960). Variance of the median of small samples from several special populations. Journal of the American Statistical Association.

[21] Rao, C. R. (1965). Linear Statistical Inference and Its Applications. Wiley, New York.

[22] Rosenblatt, M. (1969). Conditional Probability Density and Regression Estimator. New York: Academic Press.

[23] Royden, H. L. (1997). Real Analysis. Stanford University.

[24] Salha, R. (2006). Kernel Estimation for the Conditional Mode and Quantiles of Time Series. University of Macedonia of Economic and Social Sciences.

[25] Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis. Chapman and Hall, London.

[26] Tsay, R. S. (2002). Analysis of Financial Time Series. John Wiley and Sons.

[27] Wand, M. P. and Jones, M. C. (1995). Kernel Smoothing. Chapman and Hall.

[28] Watson, G. S. (1964). Smooth regression analysis. Sankhya, Series A, 26, 359-372.

[29] Yu, K. and Jones, M. C. (1998). Local linear quantile regression. Journal of the American Statistical Association, 93(441), 228-237.