
Chapter 7: Estimation

7.5 Maximum Likelihood Estimators

Maximum likelihood estimation chooses as the estimate of θ the value of θ that provides the largest value of the likelihood function.

Definition: Likelihood Function. When the joint p.d.f. or the joint p.f.

$$f_n(\mathbf{x}|\theta) = f(x_1|\theta) \times \cdots \times f(x_n|\theta)$$

of the observations in a random sample is regarded as a function of θ for given values of x_1, ..., x_n, it is called the likelihood function.

Definition: Maximum Likelihood Estimator/Estimate. For each possible observed vector x = (x_1, ..., x_n), let δ(x) ∈ Ω denote a value of θ ∈ Ω for which the likelihood function f_n(x|θ) is a maximum, and let θ̂ = δ(X) be the estimator of θ defined in this way. The estimator θ̂ is called a maximum likelihood estimator of θ. After X = x is observed, the value δ(x) is called a maximum likelihood estimate of θ. The set Ω of all possible values of a parameter (or parameters) is called the parameter space.

Examples of Maximum Likelihood Estimators

Example: Choose a sample of size 3 from the exponential distribution with parameter θ > 0; the observed data are (x_1, x_2, x_3) = (3, 1.5, 2.1).

$$X \sim f(x) = \begin{cases} \theta e^{-\theta x} & x > 0 \\ 0 & x \le 0 \end{cases}$$

The likelihood function is

$$f_3(\mathbf{x}|\theta) = f(x_1)f(x_2)f(x_3) = \theta^3 e^{-\theta(x_1 + x_2 + x_3)} = \theta^3 e^{-6.6\theta}$$

Since log is an increasing function, the value of θ that maximizes the likelihood function f_3(x|θ) will be the same as the value of θ that maximizes log f_3(x|θ):

$$L(\theta) = \log f_3(\mathbf{x}|\theta) = 3 \log \theta - 6.6\theta$$

Taking the derivative, setting the derivative to 0, and solving for θ yields

$$\frac{dL(\theta)}{d\theta} = \frac{3}{\theta} - 6.6, \qquad \frac{d^2 L(\theta)}{d\theta^2} = -\frac{3}{\theta^2} < 0$$

$$\frac{dL(\theta)}{d\theta} = 0 \implies \theta = \frac{3}{6.6} \approx 0.455$$

The maximum likelihood estimate is then θ̂ ≈ 0.455.
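To see this numerically, here is a minimal Python sketch (the data vector and function names are ours, chosen for illustration) that maximizes L(θ) and compares the result with the closed form n/Σx_i:

    import numpy as np
    from scipy.optimize import minimize_scalar

    data = np.array([3.0, 1.5, 2.1])

    def neg_log_lik(theta):
        # L(theta) = n*log(theta) - theta*sum(x_i); we minimize its negative
        return -(len(data) * np.log(theta) - theta * data.sum())

    res = minimize_scalar(neg_log_lik, bounds=(1e-9, 100.0), method="bounded")
    print(res.x)                   # about 0.4545
    print(len(data) / data.sum())  # closed form: 3/6.6 = 0.4545...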

Example: Suppose X has the Bernoulli distribution with parameter θ:

X:  0      1
P:  1 − θ  θ

The p.f. of X can be rewritten as

$$f(x|\theta) = \theta^x (1-\theta)^{1-x}, \qquad x = 0, 1$$

Let the parameter space be Ω = {0.1, 0.9}. If x = 0 (sample size is one) is observed,

$$f(x=0|\theta) = \begin{cases} 0.9 & \text{if } \theta = 0.1 \\ 0.1 & \text{if } \theta = 0.9 \end{cases}$$

Clearly, θ = 0.1 maximizes the likelihood when x = 0 is observed, so the MLE is θ̂ = 0.1 if X = 0.

Question: If x = 1 is observed, what is the MLE of θ? (A direct enumeration is sketched below.)
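Since Ω here is finite, the MLE can be found by comparing the likelihood at each candidate value. A minimal sketch (the helper name is illustrative):

    def bernoulli_pf(x, theta):
        # f(x|theta) = theta^x * (1 - theta)^(1 - x) for x in {0, 1}
        return theta**x * (1 - theta) ** (1 - x)

    omega = [0.1, 0.9]
    for x in (0, 1):
        mle = max(omega, key=lambda theta: bernoulli_pf(x, theta))
        print(f"x = {x}: MLE = {mle}")
    # x = 0: MLE = 0.1
    # x = 1: MLE = 0.9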

Example: Suppose that the random variables X_1, ..., X_n form a random sample from the Bernoulli distribution with parameter θ, which is unknown (0 ≤ θ ≤ 1). For all observed values x_1, ..., x_n, where each x_i is either 0 or 1, the likelihood function is

$$f_n(\mathbf{x}|\theta) = \prod_{i=1}^{n} \theta^{x_i} (1-\theta)^{1-x_i} = \theta^{\sum_{i=1}^{n} x_i} (1-\theta)^{n - \sum_{i=1}^{n} x_i}$$

$$L(\theta) = \log f_n(\mathbf{x}|\theta) = \sum_{i=1}^{n} x_i \log \theta + \Big[ n - \sum_{i=1}^{n} x_i \Big] \log(1-\theta)$$

$$\frac{dL(\theta)}{d\theta} = \frac{\sum_{i=1}^{n} x_i}{\theta} - \frac{n - \sum_{i=1}^{n} x_i}{1-\theta}$$

If Σ_{i=1}^n x_i = 0, then dL(θ)/dθ = −n/(1 − θ) < 0, so L(θ) is a decreasing function of θ, and hence L achieves its maximum at θ = 0 = x̄.

If Σ_{i=1}^n x_i = n, then dL(θ)/dθ = n/θ > 0, so L(θ) is an increasing function of θ, and hence L achieves its maximum at θ = 1 = x̄.

If Σ_{i=1}^n x_i ∉ {0, n}, setting dL(θ)/dθ = 0 gives θ = x̄, and

$$\frac{d^2 L(\theta)}{d\theta^2} = -\frac{\sum_{i=1}^{n} x_i}{\theta^2} - \frac{n - \sum_{i=1}^{n} x_i}{(1-\theta)^2} < 0,$$

so L achieves its maximum at θ = x̄. In all three cases, the M.L.E. of θ is θ̂ = X̄.
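As a check, a short simulation (the sample and seed are ours) confirms that numerically maximizing L(θ) lands on the sample mean:

    import numpy as np
    from scipy.optimize import minimize_scalar

    rng = np.random.default_rng(0)
    x = rng.binomial(1, 0.3, size=200)  # Bernoulli(0.3) sample
    s = x.sum()

    def neg_L(theta):
        return -(s * np.log(theta) + (len(x) - s) * np.log(1 - theta))

    res = minimize_scalar(neg_L, bounds=(1e-9, 1 - 1e-9), method="bounded")
    print(res.x, x.mean())  # the two values agree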

Example: Suppose that X_1, ..., X_n form a random sample from a normal distribution for which the mean μ is unknown and the variance σ² is known. For all observed values x_1, ..., x_n, the likelihood function will be

$$f_n(\mathbf{x}|\mu) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\Big[ -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 \Big]$$

f_n(x|μ) will be maximized by the value of μ that minimizes

$$Q(\mu) = \sum_{i=1}^{n} (x_i - \mu)^2 = \sum_{i=1}^{n} x_i^2 - 2\mu \sum_{i=1}^{n} x_i + n\mu^2$$

We see that Q is a quadratic in μ with a positive coefficient on μ². It follows that Q will be minimized where its derivative is 0:

$$\frac{dQ(\mu)}{d\mu} = -2\sum_{i=1}^{n} x_i + 2n\mu = 0 \implies \mu = \frac{\sum_{i=1}^{n} x_i}{n} = \bar{x}$$

The M.L.E. of μ is μ̂ = X̄.
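A quick check of the quadratic argument on illustrative data (note that σ² plays no role in where the minimum falls):

    import numpy as np

    x = np.array([2.3, 1.7, 3.1, 2.8, 2.2])

    # Evaluate Q(mu) = sum((x_i - mu)^2) on a fine grid and locate its minimum
    mu_grid = np.linspace(x.min(), x.max(), 10001)
    Q = ((x[:, None] - mu_grid[None, :]) ** 2).sum(axis=0)
    print(mu_grid[Q.argmin()], x.mean())  # both about 2.42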

Example: Suppose again that X_1, ..., X_n form a random sample from a normal distribution for which the mean μ is unknown and the variance σ² is also unknown. For all observed values x_1, ..., x_n, the likelihood function will be

$$f_n(\mathbf{x}|\theta) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\Big[ -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 \Big]$$

The parameter is θ = (μ, σ²), where −∞ < μ < ∞ and σ² > 0.

$$L(\theta) = \log f_n(\mathbf{x}|\theta) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log \sigma^2 - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2$$

We shall find the value of θ = (μ, σ²) for which L(θ) is a maximum:

$$\begin{cases} \dfrac{\partial L}{\partial \mu} = \dfrac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu) = 0 \\[2ex] \dfrac{\partial L}{\partial \sigma^2} = -\dfrac{n}{2\sigma^2} + \dfrac{1}{2\sigma^4} \sum_{i=1}^{n} (x_i - \mu)^2 = 0 \end{cases}$$

Solving these two equations, we get μ̂ = x̄ and σ̂² = (1/n) Σ_{i=1}^n (x_i − x̄)². The M.L.E. of θ is

$$\hat{\theta} = (\hat{\mu}, \hat{\sigma}^2) = \Big( \bar{X}, \ \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2 \Big)$$
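A sketch comparing a direct two-parameter numerical maximization with this closed form (the data, seed, and names are ours):

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(1)
    x = rng.normal(loc=5.0, scale=2.0, size=500)
    n = len(x)

    def neg_L(params):
        # Negative log-likelihood of N(mu, sigma2), up to no constants dropped
        mu, sigma2 = params
        return 0.5 * n * np.log(2 * np.pi * sigma2) + ((x - mu) ** 2).sum() / (2 * sigma2)

    res = minimize(neg_L, x0=[0.0, 1.0], bounds=[(None, None), (1e-9, None)])
    print(res.x)                                   # approximately [mu-hat, sigma2-hat]
    print(x.mean(), ((x - x.mean()) ** 2).mean())  # the closed-form MLEs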

Example: Suppose that X_1, ..., X_n form a random sample from the uniform distribution on the interval [0, θ], where θ > 0. The p.d.f. of each observation is

$$X \sim f(x|\theta) = \begin{cases} \frac{1}{\theta} & 0 \le x \le \theta \\ 0 & \text{otherwise} \end{cases}$$

The joint p.d.f. (likelihood function) f_n(x|θ) of X_1, ..., X_n has the form

$$f_n(\mathbf{x}|\theta) = \begin{cases} \frac{1}{\theta^n} & 0 \le x_i \le \theta \ (i = 1, ..., n) \\ 0 & \text{otherwise} \end{cases}$$

The MLE of θ must be a value of θ for which θ ≥ x_i (i = 1, ..., n) and that maximizes 1/θⁿ among all such values. Since 1/θⁿ is a decreasing function of θ, the estimate will be the smallest value of θ such that θ ≥ x_i for i = 1, ..., n. Since this value is θ = max{x_1, ..., x_n}, the MLE of θ is θ̂ = max{X_1, ..., X_n}.
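Note that the likelihood here is not differentiable at its maximum, so the calculus approach of the earlier examples does not apply. A small sketch (with made-up data) shows the shape directly:

    import numpy as np

    x = np.array([0.8, 2.4, 1.1, 3.0, 0.5])  # illustrative sample

    def likelihood(theta):
        # 1/theta^n when theta >= max(x_i), else 0
        return theta ** (-len(x)) if theta >= x.max() else 0.0

    for theta in (2.0, 3.0, 3.5, 5.0):
        print(theta, likelihood(theta))
    # zero below max(x) = 3.0, then strictly decreasing, so the MLE is x.max()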

7.6 Properties of Maximum Likelihood Estimators

Theorem: Invariance Property of M.L.E.s. If θ̂ is the maximum likelihood estimator of θ and if g is a one-to-one function, then g(θ̂) is the maximum likelihood estimator of g(θ).

Example: Suppose that X_1, ..., X_n form a random sample from a normal distribution for which both the mean μ and the variance σ² are unknown. It was found that the MLE of θ = (μ, σ²) is

$$\hat{\theta} = (\hat{\mu}, \hat{\sigma}^2) = \Big( \bar{X}, \ \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2 \Big)$$

From the invariance property, we can conclude that the MLE of σ is

$$\hat{\sigma} = \sqrt{\hat{\sigma}^2} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2},$$

and the MLE of μ² + σ² is

$$\hat{\mu}^2 + \hat{\sigma}^2 = \bar{X}^2 + \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2.$$
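In code, the invariance property amounts to plugging the MLE into g. A brief illustration on simulated data (true values μ = 1.5, σ = 3 are our choice):

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.normal(loc=1.5, scale=3.0, size=10000)

    mu_hat = x.mean()
    sigma2_hat = ((x - mu_hat) ** 2).mean()  # MLE of sigma^2 (the 1/n version)

    print(np.sqrt(sigma2_hat))     # MLE of sigma, approximately 3
    print(mu_hat**2 + sigma2_hat)  # MLE of mu^2 + sigma^2, approximately 11.25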


Consistency

Under some conditions, the maximum likelihood estimator is consistent. Consistency means that, with a sufficiently large number of observations n, it is possible to estimate the value of θ with arbitrary precision: for every ε > 0,

$$\lim_{n \to \infty} P(|\hat{\theta}_n - \theta| < \varepsilon) = 1$$
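A simulation of the earlier exponential example illustrates this: as n grows, the MLE n/Σx_i concentrates around the true θ (the seed and sample sizes are ours):

    import numpy as np

    rng = np.random.default_rng(3)
    theta_true = 0.455

    for n in (10, 100, 1000, 100000):
        x = rng.exponential(scale=1 / theta_true, size=n)
        print(n, n / x.sum())  # the MLE approaches theta_true as n grows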

Method of Moments

Definition: Assume that X_1, ..., X_n form a random sample from a distribution X. The sample moments are

$$m_j = \frac{1}{n} \sum_{i=1}^{n} X_i^j \quad \text{for } j = 1, ..., k,$$

and the population moments are

$$\mu_j(\theta) = E(X^j) = E(X_i^j).$$

For a k-dimensional parameter θ, set up the k equations m_j = μ_j(θ) and solve for θ.

Example: Consider a sample of size n from the gamma distribution with parameters α and β = 1. Since the parameter is one-dimensional, we use one equation: m_1 = (1/n) Σ_{i=1}^n x_i = x̄ and μ_1 = E(X) = α. Setting m_1 = μ_1 gives α̂ = x̄, so the method of moments estimator is α̂ = X̄.
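A brief check by simulation (true shape α = 2 is our choice; numpy's gamma sampler uses scale = 1/β, which is 1 here):

    import numpy as np

    rng = np.random.default_rng(4)
    x = rng.gamma(shape=2.0, scale=1.0, size=10000)  # alpha = 2, beta = 1

    alpha_hat = x.mean()  # method of moments: m_1 = mu_1 = alpha
    print(alpha_hat)      # approximately 2.0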


Definition: A random variable X has the gamma distribution with parameters α > 0 and β > 0 if X has a continuous distribution for which the p.d.f. is

$$f(x) = \begin{cases} \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha - 1} e^{-\beta x} & x > 0 \\ 0 & x \le 0 \end{cases}$$

Theorem: If X ~ Gamma(α, β), then

$$E(X) = \frac{\alpha}{\beta}, \qquad Var(X) = \frac{\alpha}{\beta^2}.$$

Example: Consider a sample of size n from the gamma distribution with unknown parameters α and β. The first two population moments are

$$\mu_1 = \frac{\alpha}{\beta}, \qquad \mu_2 = \frac{\alpha(\alpha + 1)}{\beta^2}.$$

The method of moments says to replace the population moments in these equations by the sample moments and then solve for α and β. Setting μ_1 = m_1 and μ_2 = m_2, we obtain

$$\hat{\alpha} = \frac{m_1^2}{m_2 - m_1^2}, \qquad \hat{\beta} = \frac{m_1}{m_2 - m_1^2}.$$
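A sketch implementing these two formulas and checking them on simulated Gamma(α = 3, β = 2) data (again, numpy parameterizes the gamma by scale = 1/β):

    import numpy as np

    rng = np.random.default_rng(5)
    alpha_true, beta_true = 3.0, 2.0
    x = rng.gamma(shape=alpha_true, scale=1 / beta_true, size=50000)

    m1 = x.mean()       # first sample moment
    m2 = (x**2).mean()  # second sample moment

    alpha_hat = m1**2 / (m2 - m1**2)
    beta_hat = m1 / (m2 - m1**2)
    print(alpha_hat, beta_hat)  # approximately (3.0, 2.0)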

Theorem: The sequence of method of moments estimators based on X_1, ..., X_n is a consistent sequence of estimators of θ.
