
Chapter 7: Estimation

7.5 Maximum Likelihood Estimators

Maximum likelihood estimation chooses as the estimate of θ the value of θ that provides the largest value of the likelihood function.

Definition: Likelihood Function. When the joint p.d.f. or the joint p.f.

$$f_n(\mathbf{x}|\theta) = f(x_1|\theta) \times \cdots \times f(x_n|\theta)$$

of the observations in a random sample is regarded as a function of θ for given values of x_1, ..., x_n, it is called the likelihood function.

Definition: Maximum Likelihood Estimator/Estimate. For each possible observed vector x = (x_1, ..., x_n), let δ(x) ∈ Ω denote a value of θ ∈ Ω for which the likelihood function f_n(x|θ) is a maximum, and let θ̂ = δ(X) be the estimator of θ defined in this way. The estimator θ̂ is called a maximum likelihood estimator of θ. After X = x is observed, the value δ(x) is called a maximum likelihood estimate of θ. The set Ω of all possible values of a parameter (or parameters) is called the parameter space.

Examples of Maximum Likelihood Estimators

Example: Choose a sample of size 3 from the exponential distribution with parameter θ > 0; the observed data are (x_1, x_2, x_3) = (3, 1.5, 2.1).

$$X \sim f(x) = \begin{cases} \theta e^{-\theta x} & x > 0 \\ 0 & x \le 0 \end{cases}$$

The likelihood function is

$$f_3(\mathbf{x}|\theta) = f(x_1)f(x_2)f(x_3) = \theta^3 e^{-\theta(x_1 + x_2 + x_3)} = \theta^3 e^{-6.6\theta}$$

Since log is an increasing function, the value of θ that maximizes the likelihood function f_3(x|θ) will be the same as the value of θ that maximizes log f_3(x|θ):

$$L(\theta) = \log f_3(\mathbf{x}|\theta) = 3 \log \theta - 6.6\theta$$

Taking the derivative, setting the derivative to 0, and solving for θ yields

$$\frac{dL(\theta)}{d\theta} = \frac{3}{\theta} - 6.6, \qquad \frac{d^2 L(\theta)}{d\theta^2} = -\frac{3}{\theta^2} < 0$$

$$\frac{dL(\theta)}{d\theta} = 0 \implies \theta = \frac{3}{6.6} \approx 0.455$$

The maximum likelihood estimate is then θ̂ ≈ 0.455.
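To see this numerically, here is a minimal Python sketch (the data vector and function names are ours, chosen for illustration) that maximizes L(θ) and compares the result with the closed form n/Σx_i:

    import numpy as np
    from scipy.optimize import minimize_scalar

    data = np.array([3.0, 1.5, 2.1])

    def neg_log_lik(theta):
        # L(theta) = n*log(theta) - theta*sum(x_i); we minimize its negative
        return -(len(data) * np.log(theta) - theta * data.sum())

    res = minimize_scalar(neg_log_lik, bounds=(1e-9, 100.0), method="bounded")
    print(res.x)                   # about 0.4545
    print(len(data) / data.sum())  # closed form: 3/6.6 = 0.4545...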

Example: Suppose X has the Bernoulli distribution with parameter θ:

X:  0      1
P:  1 − θ  θ

The p.f. of X can be rewritten as

$$f(x|\theta) = \theta^x (1-\theta)^{1-x}, \qquad x = 0, 1$$

Let the parameter space be Ω = {0.1, 0.9}. If x = 0 (sample size is one) is observed,

$$f(x=0|\theta) = \begin{cases} 0.9 & \text{if } \theta = 0.1 \\ 0.1 & \text{if } \theta = 0.9 \end{cases}$$

Clearly, θ = 0.1 maximizes the likelihood when x = 0 is observed, so the MLE is θ̂ = 0.1 if X = 0.

Question: If x = 1 is observed, what is the MLE of θ? (A direct enumeration is sketched below.)
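Since Ω here is finite, the MLE can be found by comparing the likelihood at each candidate value. A minimal sketch (the helper name is illustrative):

    def bernoulli_pf(x, theta):
        # f(x|theta) = theta^x * (1 - theta)^(1 - x) for x in {0, 1}
        return theta**x * (1 - theta) ** (1 - x)

    omega = [0.1, 0.9]
    for x in (0, 1):
        mle = max(omega, key=lambda theta: bernoulli_pf(x, theta))
        print(f"x = {x}: MLE = {mle}")
    # x = 0: MLE = 0.1
    # x = 1: MLE = 0.9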

Example: Suppose that the random variables X_1, ..., X_n form a random sample from the Bernoulli distribution with parameter θ, which is unknown (0 ≤ θ ≤ 1). For all observed values x_1, ..., x_n, where each x_i is either 0 or 1, the likelihood function is

$$f_n(\mathbf{x}|\theta) = \prod_{i=1}^{n} \theta^{x_i} (1-\theta)^{1-x_i} = \theta^{\sum_{i=1}^{n} x_i} (1-\theta)^{n - \sum_{i=1}^{n} x_i}$$

$$L(\theta) = \log f_n(\mathbf{x}|\theta) = \sum_{i=1}^{n} x_i \log \theta + \Big[ n - \sum_{i=1}^{n} x_i \Big] \log(1-\theta)$$

$$\frac{dL(\theta)}{d\theta} = \frac{\sum_{i=1}^{n} x_i}{\theta} - \frac{n - \sum_{i=1}^{n} x_i}{1-\theta}$$

If Σ_{i=1}^n x_i = 0, then dL(θ)/dθ = −n/(1 − θ) < 0, so L(θ) is a decreasing function of θ, and hence L achieves its maximum at θ = 0 = x̄.

If Σ_{i=1}^n x_i = n, then dL(θ)/dθ = n/θ > 0, so L(θ) is an increasing function of θ, and hence L achieves its maximum at θ = 1 = x̄.

If Σ_{i=1}^n x_i ∉ {0, n}, setting dL(θ)/dθ = 0 gives θ = x̄, and

$$\frac{d^2 L(\theta)}{d\theta^2} = -\frac{\sum_{i=1}^{n} x_i}{\theta^2} - \frac{n - \sum_{i=1}^{n} x_i}{(1-\theta)^2} < 0,$$

so L achieves its maximum at θ = x̄. In all three cases, the M.L.E. of θ is θ̂ = X̄.
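As a check, a short simulation (the sample and seed are ours) confirms that numerically maximizing L(θ) lands on the sample mean:

    import numpy as np
    from scipy.optimize import minimize_scalar

    rng = np.random.default_rng(0)
    x = rng.binomial(1, 0.3, size=200)  # Bernoulli(0.3) sample
    s = x.sum()

    def neg_L(theta):
        return -(s * np.log(theta) + (len(x) - s) * np.log(1 - theta))

    res = minimize_scalar(neg_L, bounds=(1e-9, 1 - 1e-9), method="bounded")
    print(res.x, x.mean())  # the two values agree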

Example: Suppose that X_1, ..., X_n form a random sample from a normal distribution for which the mean μ is unknown and the variance σ² is known. For all observed values x_1, ..., x_n, the likelihood function will be

$$f_n(\mathbf{x}|\mu) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\Big[ -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 \Big]$$

f_n(x|μ) will be maximized by the value of μ that minimizes

$$Q(\mu) = \sum_{i=1}^{n} (x_i - \mu)^2 = \sum_{i=1}^{n} x_i^2 - 2\mu \sum_{i=1}^{n} x_i + n\mu^2$$

We see that Q is a quadratic in μ with a positive coefficient on μ². It follows that Q will be minimized where its derivative is 0:

$$\frac{dQ(\mu)}{d\mu} = -2\sum_{i=1}^{n} x_i + 2n\mu = 0 \implies \mu = \frac{\sum_{i=1}^{n} x_i}{n} = \bar{x}$$

The M.L.E. of μ is μ̂ = X̄.
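A quick check of the quadratic argument on illustrative data (note that σ² plays no role in where the minimum falls):

    import numpy as np

    x = np.array([2.3, 1.7, 3.1, 2.8, 2.2])

    # Evaluate Q(mu) = sum((x_i - mu)^2) on a fine grid and locate its minimum
    mu_grid = np.linspace(x.min(), x.max(), 10001)
    Q = ((x[:, None] - mu_grid[None, :]) ** 2).sum(axis=0)
    print(mu_grid[Q.argmin()], x.mean())  # both about 2.42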

Example: Suppose again that X_1, ..., X_n form a random sample from a normal distribution for which the mean μ is unknown and the variance σ² is also unknown. For all observed values x_1, ..., x_n, the likelihood function will be

$$f_n(\mathbf{x}|\theta) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\Big[ -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 \Big]$$

The parameter is θ = (μ, σ²), where −∞ < μ < ∞ and σ² > 0.

$$L(\theta) = \log f_n(\mathbf{x}|\theta) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log \sigma^2 - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2$$

We shall find the value of θ = (μ, σ²) for which L(θ) is a maximum:

$$\begin{cases} \dfrac{\partial L}{\partial \mu} = \dfrac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu) = 0 \\[2ex] \dfrac{\partial L}{\partial \sigma^2} = -\dfrac{n}{2\sigma^2} + \dfrac{1}{2\sigma^4} \sum_{i=1}^{n} (x_i - \mu)^2 = 0 \end{cases}$$

Solving these two equations, we get μ̂ = x̄ and σ̂² = (1/n) Σ_{i=1}^n (x_i − x̄)². The M.L.E. of θ is

$$\hat{\theta} = (\hat{\mu}, \hat{\sigma}^2) = \Big( \bar{X}, \ \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2 \Big)$$
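A sketch comparing a direct two-parameter numerical maximization with this closed form (the data, seed, and names are ours):

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(1)
    x = rng.normal(loc=5.0, scale=2.0, size=500)
    n = len(x)

    def neg_L(params):
        # Negative log-likelihood of N(mu, sigma2), up to no constants dropped
        mu, sigma2 = params
        return 0.5 * n * np.log(2 * np.pi * sigma2) + ((x - mu) ** 2).sum() / (2 * sigma2)

    res = minimize(neg_L, x0=[0.0, 1.0], bounds=[(None, None), (1e-9, None)])
    print(res.x)                                   # approximately [mu-hat, sigma2-hat]
    print(x.mean(), ((x - x.mean()) ** 2).mean())  # the closed-form MLEs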

Example: Suppose that X_1, ..., X_n form a random sample from the uniform distribution on the interval [0, θ], where θ > 0. The p.d.f. of each observation is

$$X \sim f(x|\theta) = \begin{cases} \frac{1}{\theta} & 0 \le x \le \theta \\ 0 & \text{otherwise} \end{cases}$$

The joint p.d.f. (likelihood function) f_n(x|θ) of X_1, ..., X_n has the form

$$f_n(\mathbf{x}|\theta) = \begin{cases} \frac{1}{\theta^n} & 0 \le x_i \le \theta \ (i = 1, ..., n) \\ 0 & \text{otherwise} \end{cases}$$

The MLE of θ must be a value of θ for which θ ≥ x_i (i = 1, ..., n) and that maximizes 1/θⁿ among all such values. Since 1/θⁿ is a decreasing function of θ, the estimate will be the smallest value of θ such that θ ≥ x_i for i = 1, ..., n. Since this value is θ = max{x_1, ..., x_n}, the MLE of θ is θ̂ = max{X_1, ..., X_n}.
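Note that the likelihood here is not differentiable at its maximum, so the calculus approach of the earlier examples does not apply. A small sketch (with made-up data) shows the shape directly:

    import numpy as np

    x = np.array([0.8, 2.4, 1.1, 3.0, 0.5])  # illustrative sample

    def likelihood(theta):
        # 1/theta^n when theta >= max(x_i), else 0
        return theta ** (-len(x)) if theta >= x.max() else 0.0

    for theta in (2.0, 3.0, 3.5, 5.0):
        print(theta, likelihood(theta))
    # zero below max(x) = 3.0, then strictly decreasing, so the MLE is x.max()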

7.6 Properties of Maximum Likelihood Estimators

Theorem: Invariance Property of M.L.E.s. If θ̂ is the maximum likelihood estimator of θ and if g is a one-to-one function, then g(θ̂) is the maximum likelihood estimator of g(θ).

Example: Suppose that X_1, ..., X_n form a random sample from a normal distribution for which both the mean μ and the variance σ² are unknown. It was found that the MLE of θ = (μ, σ²) is

$$\hat{\theta} = (\hat{\mu}, \hat{\sigma}^2) = \Big( \bar{X}, \ \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2 \Big)$$

From the invariance property, we can conclude that the MLE of σ is

$$\hat{\sigma} = \sqrt{\hat{\sigma}^2} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2},$$

and the MLE of μ² + σ² is

$$\hat{\mu}^2 + \hat{\sigma}^2 = \bar{X}^2 + \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2.$$
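In code, the invariance property amounts to plugging the MLE into g. A brief illustration on simulated data (true values μ = 1.5, σ = 3 are our choice):

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.normal(loc=1.5, scale=3.0, size=10000)

    mu_hat = x.mean()
    sigma2_hat = ((x - mu_hat) ** 2).mean()  # MLE of sigma^2 (the 1/n version)

    print(np.sqrt(sigma2_hat))     # MLE of sigma, approximately 3
    print(mu_hat**2 + sigma2_hat)  # MLE of mu^2 + sigma^2, approximately 11.25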


Consistency

Under some conditions, the maximum likelihood estimator is consistent. Consistency means that, with a sufficiently large number of observations n, it is possible to estimate the value of θ with arbitrary precision: for every ε > 0,

$$\lim_{n \to \infty} P(|\hat{\theta}_n - \theta| < \varepsilon) = 1$$
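A simulation of the earlier exponential example illustrates this: as n grows, the MLE n/Σx_i concentrates around the true θ (the seed and sample sizes are ours):

    import numpy as np

    rng = np.random.default_rng(3)
    theta_true = 0.455

    for n in (10, 100, 1000, 100000):
        x = rng.exponential(scale=1 / theta_true, size=n)
        print(n, n / x.sum())  # the MLE approaches theta_true as n grows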

Method of Moments

Definition: Assume that X_1, ..., X_n form a random sample from a distribution X. The sample moments are

$$m_j = \frac{1}{n} \sum_{i=1}^{n} X_i^j \quad \text{for } j = 1, ..., k,$$

and the population moments are

$$\mu_j(\theta) = E(X^j) = E(X_i^j).$$

For a k-dimensional parameter θ, set up the k equations m_j = μ_j(θ) and solve for θ.

Example: Consider a sample of size n from the gamma distribution with parameters α and β = 1. Since the parameter is one-dimensional, we use one equation: m_1 = (1/n) Σ_{i=1}^n x_i = x̄ and μ_1 = E(X) = α. Setting m_1 = μ_1 gives α̂ = x̄, so the method of moments estimator is α̂ = X̄.
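A brief check by simulation (true shape α = 2 is our choice; numpy's gamma sampler uses scale = 1/β, which is 1 here):

    import numpy as np

    rng = np.random.default_rng(4)
    x = rng.gamma(shape=2.0, scale=1.0, size=10000)  # alpha = 2, beta = 1

    alpha_hat = x.mean()  # method of moments: m_1 = mu_1 = alpha
    print(alpha_hat)      # approximately 2.0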


Definition: A random variable X has the gamma distribution with parameters α > 0 and β > 0 if X has a continuous distribution for which the p.d.f. is

$$f(x) = \begin{cases} \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha - 1} e^{-\beta x} & x > 0 \\ 0 & x \le 0 \end{cases}$$

Theorem: If X ~ Gamma(α, β), then

$$E(X) = \frac{\alpha}{\beta}, \qquad Var(X) = \frac{\alpha}{\beta^2}.$$

Example: Consider a sample of size n from the gamma distribution with unknown parameters α and β. The first two population moments are

$$\mu_1 = \frac{\alpha}{\beta}, \qquad \mu_2 = \frac{\alpha(\alpha + 1)}{\beta^2}.$$

The method of moments says to replace the population moments in these equations by the sample moments and then solve for α and β. Setting μ_1 = m_1 and μ_2 = m_2, we obtain

$$\hat{\alpha} = \frac{m_1^2}{m_2 - m_1^2}, \qquad \hat{\beta} = \frac{m_1}{m_2 - m_1^2}.$$
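A sketch implementing these two formulas and checking them on simulated Gamma(α = 3, β = 2) data (again, numpy parameterizes the gamma by scale = 1/β):

    import numpy as np

    rng = np.random.default_rng(5)
    alpha_true, beta_true = 3.0, 2.0
    x = rng.gamma(shape=alpha_true, scale=1 / beta_true, size=50000)

    m1 = x.mean()       # first sample moment
    m2 = (x**2).mean()  # second sample moment

    alpha_hat = m1**2 / (m2 - m1**2)
    beta_hat = m1 / (m2 - m1**2)
    print(alpha_hat, beta_hat)  # approximately (3.0, 2.0)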

Theorem: The sequence of method of moments estimators based on X_1, ..., X_n is a consistent sequence of estimators of θ.
