Maximum Likelihood Estimation 2 (More introductory explanations)
INSTRUCTOR: DAISUKE NAGAKURA
Maximum Likelihood Estimation
◼ Statistical Models
A statistical model is a mechanism that generates particular (random) variables of interest, usually described in mathematical notation.
For example, a linear regression model

$$Y_i = \beta_0 + \beta_1 X_{1i} + \cdots + \beta_K X_{Ki} + \varepsilon_i, \qquad E(\varepsilon_i) = 0, \quad \operatorname{var}(\varepsilon_i) = \sigma^2,$$

is also a statistical model: it describes how Yi is generated given Xki, k = 1, ..., K.
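As an illustration, the data-generating mechanism above can be simulated directly; the particular values of K, the betas, and σ² below are hypothetical choices, not values from the slides.

```python
# Sketch: simulating data from a linear regression model.
# K, the beta values, and sigma^2 are hypothetical choices for illustration.
import numpy as np

rng = np.random.default_rng(0)
N, K = 200, 2                                     # sample size, number of regressors (assumed)
beta = np.array([1.0, 0.5, -0.3])                 # beta_0, beta_1, beta_2 (assumed)
sigma2 = 0.25                                     # var(eps_i) (assumed)

X = rng.normal(size=(N, K))                       # regressors X_1i, ..., X_Ki
eps = rng.normal(scale=np.sqrt(sigma2), size=N)   # errors: E(eps_i) = 0, var(eps_i) = sigma^2
Y = beta[0] + X @ beta[1:] + eps                  # the model generates each Y_i

print(Y[:3])
```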
Other examples of statistical models include one for a coin toss. Suppose we toss a coin N times, and let Yi = 1 when the flipped coin shows heads in the i-th trial and Yi = 0 when it shows tails. Then Yi may be modeled as

$$Y_i = \begin{cases} 1 & \text{with probability } p, \\ 0 & \text{with probability } 1-p, \end{cases}$$

and Yi, i = 1, ..., N are independent.
This is also a statistical model.
Yet another example is a model for stock returns. For example, we may model a daily stock return rt as an i.i.d. normal random variable, so that

$$r_t \sim N(\mu, \omega^2).$$

This is also an example of a statistical model.
A statistical model typically includes unknown parameters. In the linear regression model, βk, k = 0, ..., K, and σ² are the unknown parameters; for the coin-toss model, it is p, namely the probability of heads; and for the stock return model, they are the mean μ and the variance ω².
◼ Estimating Unknown Parameters
Unknown parameters in a model are not observable. For a model to be fully useful, we need to know the values of its unknown parameters, or at least approximate values.
Statistical estimation methods provide such values from related observations. They are typically functions of the observations, or procedures (rules) that determine certain values as estimates of the unknown parameters.
There are several estimation methods, depending on the model to be estimated.
The ordinary least squares (OLS) method is one example of a statistical estimation method; it is for linear regression models.
Typically, many estimation methods are applicable to a given model. For example, linear regression models can be estimated by methods other than OLS, such as the method of moments (which we do not deal with in this class) or the maximum likelihood method that we will see soon.
◼ Maximum Likelihood Estimation
The maximum likelihood estimation (MLE) method is, along with the OLS method, one of the most frequently used estimation methods in statistics.
The OLS method is very useful for estimating linear regression models; however, it usually cannot be applied to other kinds of models. The MLE method, on the other hand, can be applied to many different models that involve random variables.
◼ Conditions for MLE to be applicable
To apply the MLE, we need to know the joint probability function or joint probability density function of the dependent variable:
a discrete random variable → probability function,
a continuous random variable → density function.
◼ How does the MLE estimate?
The MLE regards the joint probability (density) function as a function of the unknown parameters in the model. This function is called the likelihood function.
The MLE defines the ML estimates as the parameter values that maximize the likelihood function; hence the name maximum likelihood estimation.
Example 1: coin-toss model
Suppose that we observe N results of coin tosses, Y1, …, YN. Also suppose that Yi follows a Bernoulli distribution, that is, Yi = 1 with probability p, and Yi = 0 with probability 1 − p. We estimate the unknown parameter p.
For this model we cannot apply the OLS method, for an obvious reason; however, the MLE can be applied.
To apply the MLE, we need to derive the joint probability function of Y1, Y2, …, YN. It is equal to

Pr(Y1 = j1, Y2 = j2, …, YN = jN) = Pr(Y1 = j1) Pr(Y2 = j2) ⋯ Pr(YN = jN),

where ji represents the actual, observed value of Yi; each ji is 1 or 0. The equality comes from the assumption that Yi, i = 1, …, N are independent.
To express the joint probability function concisely, it is convenient to write Pr(Yi = j) for j = 0, 1 as

$$\Pr(Y_i = j) = p^{j}(1-p)^{1-j}.$$

This expression reduces to Pr(Yi = 1) = p when j = 1, and Pr(Yi = 0) = 1 − p when j = 0. The joint probability function is then concisely written as

$$\prod_{i=1}^{N} p^{j_i}(1-p)^{1-j_i}.$$
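For a concrete feel, the product above is easy to evaluate numerically; the observed values below are a small hypothetical sample.

```python
# Sketch: evaluating the joint probability  prod_i p^{j_i} (1-p)^{1-j_i}
# for hypothetical observed values j_i and a given p.
import numpy as np

j = np.array([1, 0, 1, 1, 0])          # assumed coin-toss results
p = 0.5

joint = np.prod(p**j * (1 - p)**(1 - j))
print(joint)   # every factor equals 0.5, so the product is 0.5**5 = 0.03125
```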
Given the observed values j1, …, jN, we regard this function as the likelihood function of p, denoted by L(p), namely,

$$L(p) = \prod_{i=1}^{N} p^{j_i}(1-p)^{1-j_i}.$$

We obtain the ML estimate of p by searching for the value of p that maximizes L(p).
To find the ML estimate of p, it is computationally more convenient to maximize log L(p) rather than L(p). Because L(p) > L(p′) implies log L(p) > log L(p′), the value of p that maximizes log L(p) is the same as the value that maximizes L(p).
The function log L(p) is called the log likelihood function. In the present case, it is given by

$$\log L(p) = \sum_{i=1}^{N} \log\!\left( p^{j_i}(1-p)^{1-j_i} \right) = \log(p)\sum_{i=1}^{N} j_i + \log(1-p)\sum_{i=1}^{N}(1-j_i).$$
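The monotonicity argument can be checked numerically: on a grid, L(p) and log L(p) peak at the same point. The data j below is a hypothetical sample.

```python
# Sketch: L(p) and log L(p) attain their maximum at the same p,
# because log is strictly increasing. Data j is a hypothetical sample.
import numpy as np

j = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 1])   # assumed observations: 7 heads in 10 tosses
s, n = j.sum(), j.size

grid = np.linspace(0.01, 0.99, 981)            # candidate values of p
L = grid**s * (1 - grid)**(n - s)              # likelihood on the grid
logL = s * np.log(grid) + (n - s) * np.log(1 - grid)

print(grid[np.argmax(L)], grid[np.argmax(logL)])
```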
Finally, we find the ML estimate. This is done by solving the first order condition d log L(p)/dp = 0. Because

$$\frac{d \log L(p)}{dp} = \frac{1}{p}\sum_{i=1}^{N} j_i - \frac{1}{1-p}\sum_{i=1}^{N}(1-j_i),$$

we have

$$\frac{1}{p}\sum_{i=1}^{N} j_i - \frac{1}{1-p}\sum_{i=1}^{N}(1-j_i) = 0$$
$$(1-p)\sum_{i=1}^{N} j_i - p\sum_{i=1}^{N}(1-j_i) = 0$$
$$\sum_{i=1}^{N} j_i - p\sum_{i=1}^{N} j_i - pN + p\sum_{i=1}^{N} j_i = 0$$
$$\sum_{i=1}^{N} j_i - pN = 0 \quad\Longrightarrow\quad p = \frac{1}{N}\sum_{i=1}^{N} j_i.$$
Thus, the ML estimate of p, which we denote by $\hat{p}_{MLE}$, is given by

$$\hat{p}_{MLE} = \frac{1}{N}\sum_{i=1}^{N} j_i.$$

This is just the sample average.
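The closed-form estimate is easy to check on simulated tosses; the true value p = 0.6 below is a hypothetical choice.

```python
# Sketch: with data simulated from the coin-toss model, the ML estimate
# (the sample average) lands near the true p. p = 0.6 is an assumed value.
import numpy as np

rng = np.random.default_rng(2)
N, p_true = 10_000, 0.6
j = (rng.random(N) < p_true).astype(int)   # simulated tosses

p_mle = j.mean()                           # hat{p}_MLE = (1/N) * sum_i j_i
print(p_mle)
```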
Example 2: stock return model
Suppose that we have T observations of daily stock returns, r1, …, rT. We assume that they are i.i.d. and rt ~ N(μ, σ²). We estimate μ and σ² by the MLE method.
Again, we first have to derive the joint density function of rt, t = 1, …, T.
Because the density function of N(μ, σ²) is given by

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(x-\mu)^2}{2\sigma^2} \right),$$

the likelihood function of μ and σ² is given as

$$L(\mu, \sigma^2) = \prod_{t=1}^{T} f(r_t; \mu, \sigma^2),$$

where

$$f(r_t; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(r_t-\mu)^2}{2\sigma^2} \right),$$

and its log likelihood function is
$$\log L(\mu, \sigma^2) = \sum_{t=1}^{T} \log f(r_t; \mu, \sigma^2) = -\frac{T}{2}\log(2\pi) - \frac{T}{2}\log(\sigma^2) - \sum_{t=1}^{T}\frac{(r_t-\mu)^2}{2\sigma^2},$$

because

$$\log f(r_t; \mu, \sigma^2) = -\frac{1}{2}\log(2\pi) - \frac{1}{2}\log(\sigma^2) - \frac{(r_t-\mu)^2}{2\sigma^2}.$$
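The closed-form log likelihood can be verified against a direct term-by-term sum; the returns and parameter values below are hypothetical.

```python
# Sketch: the closed-form normal log likelihood agrees with summing
# log f(r_t) term by term. The returns r and parameters are hypothetical.
import numpy as np

r = np.array([0.01, -0.02, 0.005, 0.0, 0.015])   # assumed daily returns
mu, sigma2 = 0.002, 1e-4                         # assumed parameter values
T = r.size

# per observation: log f(r_t) = -0.5*log(2*pi) - 0.5*log(sigma2) - (r_t-mu)^2/(2*sigma2)
term_sum = np.sum(-0.5*np.log(2*np.pi) - 0.5*np.log(sigma2) - (r - mu)**2 / (2*sigma2))

# closed form from the slide
closed = -T/2*np.log(2*np.pi) - T/2*np.log(sigma2) - np.sum((r - mu)**2) / (2*sigma2)

print(term_sum, closed)
```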
We solve the first order conditions ∂log L(μ, σ²)/∂μ = 0 and ∂log L(μ, σ²)/∂σ² = 0.
First, from the condition ∂log L(μ, σ²)/∂μ = 0, we have

$$\frac{\partial \log L(\mu, \sigma^2)}{\partial \mu} = \frac{1}{\sigma^2}\sum_{t=1}^{T}(r_t - \mu) = 0$$
$$\Longrightarrow\ \sum_{t=1}^{T}(r_t - \mu) = 0 \ \Longrightarrow\ \sum_{t=1}^{T} r_t - T\mu = 0 \ \Longrightarrow\ \mu = \frac{1}{T}\sum_{t=1}^{T} r_t.$$

Thus the ML estimate of μ is the sample mean.
Next, the ML estimate of σ² is obtained by calculating ∂log L(μ, σ²)/∂σ² = 0:
$$\frac{\partial \log L(\mu, \sigma^2)}{\partial \sigma^2} = -\frac{T}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{t=1}^{T}(r_t - \mu)^2 = 0$$
$$\Longrightarrow\ -T\sigma^2 + \sum_{t=1}^{T}(r_t - \mu)^2 = 0 \ \Longrightarrow\ \sigma^2 = \frac{1}{T}\sum_{t=1}^{T}(r_t - \mu)^2.$$

Eventually, we have the ML estimates of μ and σ² as

$$\hat{\mu}_{MLE} = \frac{1}{T}\sum_{t=1}^{T} r_t \qquad \text{and} \qquad \hat{\sigma}^2_{MLE} = \frac{1}{T}\sum_{t=1}^{T}(r_t - \hat{\mu}_{MLE})^2.$$
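These two estimates are easy to compute on simulated returns; the true μ and σ² below are assumed values.

```python
# Sketch: computing the ML estimates on simulated returns. The true mu and
# sigma^2 are assumed values. Note hat{sigma}^2_MLE divides by T, not T-1.
import numpy as np

rng = np.random.default_rng(3)
T, mu_true, sigma2_true = 5_000, 0.001, 0.0004
r = rng.normal(mu_true, np.sqrt(sigma2_true), size=T)

mu_mle = r.mean()                          # (1/T) * sum_t r_t
sigma2_mle = np.mean((r - mu_mle)**2)      # (1/T) * sum_t (r_t - mu_mle)^2

print(mu_mle, sigma2_mle)
```

Note that `sigma2_mle` coincides with `np.var(r)` (whose default `ddof=0` also divides by T), not with the usual sample variance that divides by T − 1.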
◼ General properties of the ML estimator
Here we loosely state typical properties of the ML estimator. The ML estimator has the following properties:
1. It is consistent under mild conditions.
2. It is usually not an unbiased estimator.
3. It has approximately the smallest asymptotic variance among all estimators.
The first and third properties are the reasons why the ML estimator is very useful in practice. Roughly speaking, it is the best estimator.
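Property 2 can be seen in the stock return example: over many replications with a small T, $\hat{\sigma}^2_{MLE}$ averages about ((T−1)/T)σ² rather than σ², i.e. it is biased downward. The simulation settings below are hypothetical.

```python
# Sketch of property 2: hat{sigma}^2_MLE from the normal model is biased.
# With T = 5 and sigma^2 = 1, its average over many simulated samples is
# close to (T-1)/T = 0.8, not 1. All simulation settings are assumed.
import numpy as np

rng = np.random.default_rng(4)
T, reps = 5, 20_000
samples = rng.normal(0.0, 1.0, size=(reps, T))

mu_hat = samples.mean(axis=1, keepdims=True)          # hat{mu}_MLE per replication
sigma2_hat = np.mean((samples - mu_hat)**2, axis=1)   # hat{sigma}^2_MLE, divisor T

print(sigma2_hat.mean())   # close to 0.8
```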
Exercise
Suppose that Yi takes the values Yi = 0, 1, 2, … and that Yi follows a Poisson distribution with parameter λ. Given N observations yi, i = 1, …, N of Yi, estimate the parameter λ by the MLE method.
Hint: A Poisson random variable X is a discrete random variable that takes non-negative integer values; its probability function is given by

$$\Pr(X = k) = \frac{\lambda^{k}\exp(-\lambda)}{k!}, \qquad k = 0, 1, 2, \ldots,$$

where λ > 0.