

Page 1

Maximum Likelihood Estimation 2 (More introductory explanations)

INSTRUCTOR: DAISUKE NAGAKURA

([email protected])


Page 2

Maximum Likelihood Estimation

◼ Statistical Models

A statistical model is a mechanism that generates particular (random) variables of interest, and it is usually described with mathematical notation.

For example, a linear regression model

$$Y_i = \beta_0 + \beta_1 X_{1i} + \cdots + \beta_K X_{Ki} + \varepsilon_i, \qquad E(\varepsilon_i) = 0, \quad \operatorname{var}(\varepsilon_i) = \sigma^2,$$

is also a statistical model: it describes how Yi is generated given Xki, k = 1, …, K.
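To make the "data-generating mechanism" idea concrete, here is a minimal simulation sketch in Python (not part of the original slides); the coefficient values, σ, the choice K = 2, and the sample size are arbitrary illustration values.

```python
import numpy as np

# A sketch of the linear regression model as a data-generating mechanism:
# Y_i = b0 + b1*X_1i + b2*X_2i + e_i, with E(e_i) = 0 and var(e_i) = sigma^2.
# All numbers below are arbitrary illustration values.
rng = np.random.default_rng(0)
N = 200
beta = np.array([1.0, 0.5, -0.3])        # b0, b1, b2
sigma = 2.0

X = rng.normal(size=(N, 2))              # regressors X_1i, X_2i
eps = rng.normal(scale=sigma, size=N)    # errors with mean 0 and variance sigma^2
Y = beta[0] + X @ beta[1:] + eps         # the model "generates" Y_i
```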

Page 3

Maximum Likelihood Estimation

Another example of a statistical model is one for a coin toss. Suppose we toss a coin N times, and let Yi = 1 when the coin shows a head in the i-th trial and Yi = 0 when it shows a tail. Then Yi may be modeled as

$$Y_i = \begin{cases} 1, & \text{with probability } p, \\ 0, & \text{with probability } 1 - p, \end{cases}$$

with Yi, i = 1, …, N, independent. This is also a statistical model.
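The coin-toss model can be simulated in the same spirit; a minimal sketch (p and N below are arbitrary illustration values):

```python
import numpy as np

# Simulate N coin tosses with Pr(Y_i = 1) = p; p and N are arbitrary here.
rng = np.random.default_rng(1)
p, N = 0.6, 100
Y = rng.binomial(1, p, size=N)   # each Y_i is 1 (head) with prob. p, 0 (tail) with prob. 1 - p
```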

Page 4

Maximum Likelihood Estimation

Yet another example is a model for stock returns. For example, we may model a daily stock return rt as an i.i.d. normal random variable, so that

$$r_t \sim N(\mu, \omega^2).$$

This is also an example of a statistical model.

A statistical model typically includes unknown parameters. In the linear regression model, the unknown parameters are βk, k = 0, …, K, and σ2; for the coin-toss model, it is p, the probability of obtaining a head; and for the stock return model, they are the mean μ and the variance ω2.

Page 5

Maximum Likelihood Estimation

◼ Estimating Unknown Parameters

Unknown parameters in a model are not observable.

For a model to be fully useful, we need to know the values of the unknown parameters, or at least their approximate values.

Statistical estimation methods are methods that provide such values from related observations.

They are typically functions of the observations, or procedures (rules) that determine some values as estimates of the unknown parameters.

Page 6

Maximum Likelihood Estimation

There are several estimation methods, depending on the model to be estimated.

The ordinary least squares (OLS) method is one example of a statistical estimation method. It is designed for the linear regression model.

Typically, there are many estimation methods applicable to a given model. For example, linear regression models can be estimated by methods other than OLS, such as the method of moments (which we do not deal with in this class) or the maximum likelihood method that we will see soon.

Page 7


Maximum Likelihood Estimation

◼ Maximum Likelihood Estimation

The maximum likelihood estimation (MLE) method is, like the OLS method, one of the most frequently used estimation methods in statistics.

The OLS method is very useful for estimating linear regression models; however, it usually cannot be applied to other kinds of models.

On the other hand, MLE can be applied to many different models that involve random variables.

Page 8

Maximum Likelihood Estimation

◼ Conditions for MLE to be applicable

To apply the MLE, we need to know the joint probability function or the joint probability density function of the data.

If the dependent variable is:
a discrete random variable → probability function,
a continuous random variable → density function.

Page 9

Maximum Likelihood Estimation

◼ How does the MLE estimate?

MLE regards the joint probability (density) function as a function of the unknown parameters in the model. This function is called the likelihood function.

The MLE defines the ML estimates as the parameter values that maximize the likelihood function. Hence, it is called maximum likelihood estimation.

Page 10

Maximum Likelihood Estimation

Example 1: coin-toss model

Suppose that we observe the results of N coin tosses, Y1, …, YN. Also suppose that Yi follows a Bernoulli distribution, that is, Yi = 1 with probability p, and Yi = 0 with probability 1 − p. We estimate the unknown parameter p.

For this model, we cannot apply the OLS method, for an obvious reason. However, the MLE can be applied.

Page 11

Maximum Likelihood Estimation

To apply the MLE, we need to derive the joint probability function for Y1, Y2, …, YN. It is equal to

Pr(Y1 = j1, Y2 = j2, …, YN = jN) = Pr(Y1 = j1) Pr(Y2 = j2) ⋯ Pr(YN = jN),

where ji represents the actual (observed) value of Yi; each ji takes the value 1 or 0. The equality comes from the assumption that Yi, i = 1, …, N, are independent.

Page 12

Maximum Likelihood Estimation

To express the joint probability function concisely, it is convenient to express Pr(Yi = j) for j = 0, 1 as

$$\Pr(Y_i = j) = p^{j}(1-p)^{1-j}.$$

This expression reduces to Pr(Yi = 1) = p when j = 1, and Pr(Yi = 0) = 1 − p when j = 0.

Then, the joint probability function is concisely written as

$$\prod_{i=1}^{N} p^{j_i}(1-p)^{1-j_i}.$$

Page 13

Maximum Likelihood Estimation

Given observed values j1, …, jN, we regard this function as the likelihood function of p, which is denoted by L(p), namely,

$$L(p) = \prod_{i=1}^{N} p^{j_i}(1-p)^{1-j_i}.$$

We obtain the ML estimate of p by searching for the value of p that maximizes L(p).

Page 14

Maximum Likelihood Estimation

To find the ML estimate of p, it is computationally more convenient to maximize log L(p) rather than L(p).

Because log is an increasing function (if L(p) > L(p′), then log L(p) > log L(p′)), the value of p that maximizes log L(p) is the same as the value that maximizes L(p).

The function log L(p) is called the log likelihood function. In the present case, it is given by

$$\log L(p) = \log\!\left(\prod_{i=1}^{N} p^{j_i}(1-p)^{1-j_i}\right) = \log(p)\sum_{i=1}^{N} j_i + \log(1-p)\sum_{i=1}^{N}(1 - j_i).$$
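As a numerical illustration (not from the slides), the sketch below evaluates L(p) and log L(p) on a grid for some made-up tosses j and confirms that both are maximized at the same value of p:

```python
import numpy as np

# Made-up observed tosses j_1, ..., j_N (7 heads out of 10).
j = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])

def log_likelihood(p):
    # log L(p) = log(p) * sum(j_i) + log(1 - p) * sum(1 - j_i)
    return np.log(p) * j.sum() + np.log(1.0 - p) * (1 - j).sum()

grid = np.linspace(0.01, 0.99, 981)               # candidate values of p
logL = np.array([log_likelihood(p) for p in grid])
L = np.exp(logL)                                  # the likelihood itself

# Both functions peak at the same grid point (approximately 0.7, the sample average).
print(grid[np.argmax(L)], grid[np.argmax(logL)])
```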

Page 15

Maximum Likelihood Estimation

Finally, we will find the ML estimate. This is done by solving the first-order condition d log L(p)/dp = 0.

Because

$$\frac{d \log L(p)}{dp} = \frac{1}{p}\sum_{i=1}^{N} j_i - \frac{1}{1-p}\sum_{i=1}^{N}(1 - j_i),$$

we have

$$\frac{1}{p}\sum_{i=1}^{N} j_i - \frac{1}{1-p}\sum_{i=1}^{N}(1 - j_i) = 0$$
$$\Longleftrightarrow\; (1-p)\sum_{i=1}^{N} j_i - p\sum_{i=1}^{N}(1 - j_i) = 0$$
$$\Longleftrightarrow\; \sum_{i=1}^{N} j_i - p\sum_{i=1}^{N} j_i - Np + p\sum_{i=1}^{N} j_i = 0$$
$$\Longleftrightarrow\; Np = \sum_{i=1}^{N} j_i \;\Longleftrightarrow\; p = \frac{1}{N}\sum_{i=1}^{N} j_i.$$

Page 16

Maximum Likelihood Estimation

Thus, the ML estimate of p, which we denote by $\hat{p}_{MLE}$, is given by

$$\hat{p}_{MLE} = \frac{1}{N}\sum_{i=1}^{N} j_i.$$

This is just the sample average.
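As a sanity check (an illustration, not part of the slides), one can also maximize log L(p) numerically and compare the result with the sample average; here scipy's bounded scalar minimizer is applied to −log L(p) with made-up toss data:

```python
import numpy as np
from scipy.optimize import minimize_scalar

j = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])   # made-up observed tosses

def neg_log_likelihood(p):
    return -(np.log(p) * j.sum() + np.log(1.0 - p) * (1 - j).sum())

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(res.x, j.mean())   # both should be (approximately) 0.7
```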

Page 17

Maximum Likelihood Estimation

Example 2: stock return model

Suppose that we have T observations of daily stock returns, r1, …, rT. We assume that they are i.i.d. and rt ~ N(μ, σ2). We estimate μ and σ2 by the MLE method.

Again, we first have to derive the joint density function of rt, t = 1, …, T.

Page 18

Maximum Likelihood Estimation

Because the density function of N(μ, σ2) is given by

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),$$

the likelihood function of μ, σ2 is given as

$$L(\mu, \sigma^2) = \prod_{t=1}^{T} f(r_t; \mu, \sigma^2),$$

where

$$f(r_t; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\!\left(-\frac{(r_t-\mu)^2}{2\sigma^2}\right),$$

and its log likelihood function is

Page 19

Maximum Likelihood Estimation

$$\log L(\mu, \sigma^2) = \sum_{t=1}^{T} \log f(r_t; \mu, \sigma^2) = -\frac{T}{2}\log(2\pi) - \frac{T}{2}\log(\sigma^2) - \sum_{t=1}^{T}\frac{(r_t-\mu)^2}{2\sigma^2},$$

because

$$\log f(r_t; \mu, \sigma^2) = -\frac{1}{2}\log(2\pi) - \frac{1}{2}\log(\sigma^2) - \frac{(r_t-\mu)^2}{2\sigma^2}.$$
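A quick numerical check of this expression (illustration only; the returns and the trial parameter values below are made up): the log likelihood computed from the formula should match the sum of scipy's normal log densities.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
r = rng.normal(loc=0.05, scale=1.2, size=250)    # made-up "daily returns"
mu, sigma2 = 0.0, 1.0                            # arbitrary trial parameter values
T = r.size

# log L(mu, sigma^2) from the formula on this slide
logL_formula = (-0.5 * T * np.log(2 * np.pi)
                - 0.5 * T * np.log(sigma2)
                - np.sum((r - mu) ** 2) / (2 * sigma2))

# the same quantity via scipy's normal log-density
logL_scipy = norm.logpdf(r, loc=mu, scale=np.sqrt(sigma2)).sum()

print(np.isclose(logL_formula, logL_scipy))      # True
```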

Page 20

Maximum Likelihood Estimation

We solve the first-order conditions:

∂ log L(μ, σ2)/∂μ = 0 and ∂ log L(μ, σ2)/∂σ2 = 0.

First, from the condition ∂ log L(μ, σ2)/∂μ = 0, we have

$$\frac{\partial \log L(\mu, \sigma^2)}{\partial \mu} = \frac{1}{\sigma^2}\sum_{t=1}^{T}(r_t - \mu) = 0$$
$$\Longleftrightarrow\; \sum_{t=1}^{T}(r_t - \mu) = \sum_{t=1}^{T} r_t - T\mu = 0 \;\Longleftrightarrow\; \mu = \frac{1}{T}\sum_{t=1}^{T} r_t.$$

Thus the ML estimate of μ is the sample mean.

Next, the ML estimate of σ2 is obtained by calculating:

Page 21

Maximum Likelihood Estimation

$$\frac{\partial \log L(\mu, \sigma^2)}{\partial \sigma^2} = -\frac{T}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{t=1}^{T}(r_t - \mu)^2 = 0$$
$$\Longleftrightarrow\; -T\sigma^2 + \sum_{t=1}^{T}(r_t - \mu)^2 = 0 \;\Longleftrightarrow\; \sigma^2 = \frac{1}{T}\sum_{t=1}^{T}(r_t - \mu)^2.$$

Eventually, we have the ML estimates for μ and σ2 as

$$\hat{\mu}_{MLE} = \frac{1}{T}\sum_{t=1}^{T} r_t \qquad \text{and} \qquad \hat{\sigma}^2_{MLE} = \frac{1}{T}\sum_{t=1}^{T}\left(r_t - \hat{\mu}_{MLE}\right)^2.$$
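These closed-form estimates can be checked numerically (a sketch with made-up returns): minimizing −log L over (μ, log σ2) with a general-purpose optimizer should reproduce the sample mean and the 1/T variance above.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
r = rng.normal(loc=0.1, scale=2.0, size=500)     # made-up returns
T = r.size

def neg_log_likelihood(theta):
    mu, log_sigma2 = theta                       # optimize log(sigma^2) so that sigma^2 > 0
    sigma2 = np.exp(log_sigma2)
    return (0.5 * T * np.log(2 * np.pi) + 0.5 * T * np.log(sigma2)
            + np.sum((r - mu) ** 2) / (2 * sigma2))

res = minimize(neg_log_likelihood, x0=np.array([0.0, 0.0]), method="Nelder-Mead")
mu_hat, sigma2_hat = res.x[0], np.exp(res.x[1])

print(mu_hat, r.mean())                          # should agree closely
print(sigma2_hat, ((r - r.mean()) ** 2).mean())  # note the 1/T (not 1/(T-1)) divisor
```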

Page 22


Maximum Likelihood Estimation

◼ General properties of the ML estimator

Here we loosely state typical properties of the ML estimator. The ML estimator has the following properties:

1. It is consistent under mild conditions.
2. It is usually not an unbiased estimator.
3. It has approximately the smallest asymptotic variance among all estimators.

The first and third properties are the reasons why the ML estimator is very useful in practice. Roughly speaking, it is the best estimator.
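A small Monte Carlo sketch (with arbitrary true parameter values) can illustrate this for the normal model of Example 2: the average of the σ2 estimates across replications falls short of the true σ2 by roughly the factor (T − 1)/T, illustrating the bias in property 2, and the gap vanishes as T grows, in line with property 1.

```python
import numpy as np

rng = np.random.default_rng(4)
mu_true, sigma2_true = 0.0, 4.0          # arbitrary true values for the illustration

for T in (10, 100, 1000):
    # 2000 Monte Carlo replications of the ML estimate of sigma^2 (the 1/T variance)
    samples = rng.normal(mu_true, np.sqrt(sigma2_true), size=(2000, T))
    sigma2_mle = samples.var(axis=1)     # numpy's default divisor is T, i.e. the MLE
    print(T, sigma2_mle.mean())          # below 4.0 on average; approaches 4.0 as T grows
```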

Page 23

Maximum Likelihood Estimation

Exercise

Suppose that Yi takes values 0, 1, 2, … Suppose also that Yi follows a Poisson distribution with parameter λ. Given N observations yi, i = 1, …, N, of Yi, estimate the parameter λ by the MLE method.

Hint: A Poisson random variable X is a discrete random variable that takes non-negative integer values, and its probability function is given by

$$\Pr(X = k) = \frac{\exp(-\lambda)\,\lambda^{k}}{k!}, \qquad k = 0, 1, 2, \ldots,$$

where λ > 0.
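If you want to verify your analytical answer to the exercise numerically, a sketch like the following (with made-up observations y) maximizes the Poisson log likelihood over λ and prints the maximizer for comparison:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln

y = np.array([2, 0, 3, 1, 1, 4, 2, 0, 1, 2])     # made-up Poisson observations

def neg_log_likelihood(lam):
    # log Pr(Y_i = y_i) = -lam + y_i * log(lam) - log(y_i!)
    return -np.sum(-lam + y * np.log(lam) - gammaln(y + 1))

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 20.0), method="bounded")
print(res.x)   # compare with your analytical ML estimate of lambda
```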