Maximum Likelihood Estimation 2 (More introductory explanations)
INSTRUCTOR: DAISUKE NAGAKURA
Maximum Likelihood Estimation
◼ Statistical Models
A statistical model is a mechanism that generates particular (random) variables of interest, usually described in mathematical notation.
For example, a linear regression model

$$Y_i = \beta_0 + \beta_1 X_{1i} + \cdots + \beta_K X_{Ki} + \varepsilon_i, \qquad E(\varepsilon_i) = 0, \quad \operatorname{var}(\varepsilon_i) = \sigma^2,$$

is also a statistical model: it describes how Yi is generated given Xki, k = 1, ..., K.
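As an illustration, the data-generating mechanism above can be simulated directly; the particular values of K, the betas, and σ² below are hypothetical choices, not values from the slides.

```python
# Sketch: simulating data from a linear regression model.
# K, the beta values, and sigma^2 are hypothetical choices for illustration.
import numpy as np

rng = np.random.default_rng(0)
N, K = 200, 2                                     # sample size, number of regressors (assumed)
beta = np.array([1.0, 0.5, -0.3])                 # beta_0, beta_1, beta_2 (assumed)
sigma2 = 0.25                                     # var(eps_i) (assumed)

X = rng.normal(size=(N, K))                       # regressors X_1i, ..., X_Ki
eps = rng.normal(scale=np.sqrt(sigma2), size=N)   # errors: E(eps_i) = 0, var(eps_i) = sigma^2
Y = beta[0] + X @ beta[1:] + eps                  # the model generates each Y_i

print(Y[:3])
```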
Other examples of statistical models include one for a coin toss. Suppose we toss a coin N times, and let Yi = 1 when the flipped coin shows heads in the i-th trial and Yi = 0 when it shows tails. Then Yi may be modeled as

$$Y_i = \begin{cases} 1 & \text{with probability } p, \\ 0 & \text{with probability } 1-p, \end{cases}$$

and Yi, i = 1, ..., N are independent.
This is also a statistical model.
Yet another example is a model for stock returns. For example, we may model a daily stock return rt as an i.i.d. normal random variable, so that

$$r_t \sim N(\mu, \omega^2).$$

This is also an example of a statistical model.
A statistical model typically includes unknown parameters. In the linear regression model, βk, k = 0, ..., K, and σ² are the unknown parameters; for the coin-toss model, it is p, namely the probability of heads; and for the stock return model, they are the mean μ and the variance ω².
◼ Estimating Unknown Parameters
Unknown parameters in a model are not observable. For a model to be fully useful, we need to know the values of its unknown parameters, or at least approximate values.
Statistical estimation methods provide such values from related observations. They are typically functions of the observations, or procedures (rules) that determine certain values as estimates of the unknown parameters.
There are several estimation methods, depending on the model to be estimated.
The ordinary least squares (OLS) method is one example of a statistical estimation method; it is for linear regression models.
Typically, many estimation methods are applicable to a given model. For example, linear regression models can be estimated by methods other than OLS, such as the method of moments (which we do not deal with in this class) or the maximum likelihood method that we will see soon.
◼ Maximum Likelihood Estimation
The maximum likelihood estimation (MLE) method is, along with the OLS method, one of the most frequently used estimation methods in statistics.
The OLS method is very useful for estimating linear regression models; however, it usually cannot be applied to other kinds of models. The MLE method, on the other hand, can be applied to many different models that involve random variables.
◼ Conditions for MLE to be applicable
To apply the MLE, we need to know the joint probability function or joint probability density function of the dependent variable:
a discrete random variable → probability function,
a continuous random variable → density function.
◼ How does the MLE estimate?
The MLE regards the joint probability (density) function as a function of the unknown parameters in the model. This function is called the likelihood function.
The MLE defines the ML estimates as the parameter values that maximize the likelihood function; hence the name maximum likelihood estimation.
Example 1: coin-toss model
Suppose that we observe N results of coin tosses, Y1, …, YN. Also suppose that Yi follows a Bernoulli distribution, that is, Yi = 1 with probability p, and Yi = 0 with probability 1 − p. We estimate the unknown parameter p.
For this model we cannot apply the OLS method, for an obvious reason; however, the MLE can be applied.
To apply the MLE, we need to derive the joint probability function of Y1, Y2, …, YN. It is equal to

Pr(Y1 = j1, Y2 = j2, …, YN = jN) = Pr(Y1 = j1) Pr(Y2 = j2) ⋯ Pr(YN = jN),

where ji represents the actual, observed value of Yi; each ji is 1 or 0. The equality comes from the assumption that Yi, i = 1, …, N are independent.
To express the joint probability function concisely, it is convenient to write Pr(Yi = j) for j = 0, 1 as

$$\Pr(Y_i = j) = p^{j}(1-p)^{1-j}.$$

This expression reduces to Pr(Yi = 1) = p when j = 1, and Pr(Yi = 0) = 1 − p when j = 0. The joint probability function is then concisely written as

$$\prod_{i=1}^{N} p^{j_i}(1-p)^{1-j_i}.$$
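For a concrete feel, the product above is easy to evaluate numerically; the observed values below are a small hypothetical sample.

```python
# Sketch: evaluating the joint probability  prod_i p^{j_i} (1-p)^{1-j_i}
# for hypothetical observed values j_i and a given p.
import numpy as np

j = np.array([1, 0, 1, 1, 0])          # assumed coin-toss results
p = 0.5

joint = np.prod(p**j * (1 - p)**(1 - j))
print(joint)   # every factor equals 0.5, so the product is 0.5**5 = 0.03125
```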
Given the observed values j1, …, jN, we regard this function as the likelihood function of p, denoted by L(p), namely,

$$L(p) = \prod_{i=1}^{N} p^{j_i}(1-p)^{1-j_i}.$$

We obtain the ML estimate of p by searching for the value of p that maximizes L(p).
To find the ML estimate of p, it is computationally more convenient to maximize log L(p) rather than L(p). Because L(p) > L(p′) implies log L(p) > log L(p′), the value of p that maximizes log L(p) is the same as the value that maximizes L(p).
The function log L(p) is called the log likelihood function. In the present case, it is given by

$$\log L(p) = \sum_{i=1}^{N} \log\!\left( p^{j_i}(1-p)^{1-j_i} \right) = \log(p)\sum_{i=1}^{N} j_i + \log(1-p)\sum_{i=1}^{N}(1-j_i).$$
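The monotonicity argument can be checked numerically: on a grid, L(p) and log L(p) peak at the same point. The data j below is a hypothetical sample.

```python
# Sketch: L(p) and log L(p) attain their maximum at the same p,
# because log is strictly increasing. Data j is a hypothetical sample.
import numpy as np

j = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 1])   # assumed observations: 7 heads in 10 tosses
s, n = j.sum(), j.size

grid = np.linspace(0.01, 0.99, 981)            # candidate values of p
L = grid**s * (1 - grid)**(n - s)              # likelihood on the grid
logL = s * np.log(grid) + (n - s) * np.log(1 - grid)

print(grid[np.argmax(L)], grid[np.argmax(logL)])
```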
Finally, we find the ML estimate. This is done by solving the first order condition d log L(p)/dp = 0. Because

$$\frac{d \log L(p)}{dp} = \frac{1}{p}\sum_{i=1}^{N} j_i - \frac{1}{1-p}\sum_{i=1}^{N}(1-j_i),$$

we have

$$\frac{1}{p}\sum_{i=1}^{N} j_i - \frac{1}{1-p}\sum_{i=1}^{N}(1-j_i) = 0$$
$$(1-p)\sum_{i=1}^{N} j_i - p\sum_{i=1}^{N}(1-j_i) = 0$$
$$\sum_{i=1}^{N} j_i - p\sum_{i=1}^{N} j_i - pN + p\sum_{i=1}^{N} j_i = 0$$
$$\sum_{i=1}^{N} j_i - pN = 0 \quad\Longrightarrow\quad p = \frac{1}{N}\sum_{i=1}^{N} j_i.$$
Thus, the ML estimate of p, which we denote by $\hat{p}_{MLE}$, is given by

$$\hat{p}_{MLE} = \frac{1}{N}\sum_{i=1}^{N} j_i.$$

This is just the sample average.
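The closed-form estimate is easy to check on simulated tosses; the true value p = 0.6 below is a hypothetical choice.

```python
# Sketch: with data simulated from the coin-toss model, the ML estimate
# (the sample average) lands near the true p. p = 0.6 is an assumed value.
import numpy as np

rng = np.random.default_rng(2)
N, p_true = 10_000, 0.6
j = (rng.random(N) < p_true).astype(int)   # simulated tosses

p_mle = j.mean()                           # hat{p}_MLE = (1/N) * sum_i j_i
print(p_mle)
```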
Example 2: stock return model
Suppose that we have T observations of daily stock returns, r1, …, rT. We assume that they are i.i.d. and rt ~ N(μ, σ²). We estimate μ and σ² by the MLE method.
Again, we first have to derive the joint density function of rt, t = 1, …, T.
Because the density function of N(μ, σ²) is given by

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(x-\mu)^2}{2\sigma^2} \right),$$

the likelihood function of μ and σ² is given as

$$L(\mu, \sigma^2) = \prod_{t=1}^{T} f(r_t; \mu, \sigma^2),$$

where

$$f(r_t; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(r_t-\mu)^2}{2\sigma^2} \right),$$

and its log likelihood function is
$$\log L(\mu, \sigma^2) = \sum_{t=1}^{T} \log f(r_t; \mu, \sigma^2) = -\frac{T}{2}\log(2\pi) - \frac{T}{2}\log(\sigma^2) - \sum_{t=1}^{T}\frac{(r_t-\mu)^2}{2\sigma^2},$$

because

$$\log f(r_t; \mu, \sigma^2) = -\frac{1}{2}\log(2\pi) - \frac{1}{2}\log(\sigma^2) - \frac{(r_t-\mu)^2}{2\sigma^2}.$$
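The closed-form log likelihood can be verified against a direct term-by-term sum; the returns and parameter values below are hypothetical.

```python
# Sketch: the closed-form normal log likelihood agrees with summing
# log f(r_t) term by term. The returns r and parameters are hypothetical.
import numpy as np

r = np.array([0.01, -0.02, 0.005, 0.0, 0.015])   # assumed daily returns
mu, sigma2 = 0.002, 1e-4                         # assumed parameter values
T = r.size

# per observation: log f(r_t) = -0.5*log(2*pi) - 0.5*log(sigma2) - (r_t-mu)^2/(2*sigma2)
term_sum = np.sum(-0.5*np.log(2*np.pi) - 0.5*np.log(sigma2) - (r - mu)**2 / (2*sigma2))

# closed form from the slide
closed = -T/2*np.log(2*np.pi) - T/2*np.log(sigma2) - np.sum((r - mu)**2) / (2*sigma2)

print(term_sum, closed)
```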
We solve the first order conditions ∂log L(μ, σ²)/∂μ = 0 and ∂log L(μ, σ²)/∂σ² = 0.
First, from the condition ∂log L(μ, σ²)/∂μ = 0, we have

$$\frac{\partial \log L(\mu, \sigma^2)}{\partial \mu} = \frac{1}{\sigma^2}\sum_{t=1}^{T}(r_t - \mu) = 0$$
$$\Longrightarrow\ \sum_{t=1}^{T}(r_t - \mu) = 0 \ \Longrightarrow\ \sum_{t=1}^{T} r_t - T\mu = 0 \ \Longrightarrow\ \mu = \frac{1}{T}\sum_{t=1}^{T} r_t.$$

Thus the ML estimate of μ is the sample mean.
Next, the ML estimate of σ² is obtained by calculating ∂log L(μ, σ²)/∂σ² = 0:
$$\frac{\partial \log L(\mu, \sigma^2)}{\partial \sigma^2} = -\frac{T}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{t=1}^{T}(r_t - \mu)^2 = 0$$
$$\Longrightarrow\ -T\sigma^2 + \sum_{t=1}^{T}(r_t - \mu)^2 = 0 \ \Longrightarrow\ \sigma^2 = \frac{1}{T}\sum_{t=1}^{T}(r_t - \mu)^2.$$

Eventually, we have the ML estimates of μ and σ² as

$$\hat{\mu}_{MLE} = \frac{1}{T}\sum_{t=1}^{T} r_t \qquad \text{and} \qquad \hat{\sigma}^2_{MLE} = \frac{1}{T}\sum_{t=1}^{T}(r_t - \hat{\mu}_{MLE})^2.$$
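These two estimates are easy to compute on simulated returns; the true μ and σ² below are assumed values.

```python
# Sketch: computing the ML estimates on simulated returns. The true mu and
# sigma^2 are assumed values. Note hat{sigma}^2_MLE divides by T, not T-1.
import numpy as np

rng = np.random.default_rng(3)
T, mu_true, sigma2_true = 5_000, 0.001, 0.0004
r = rng.normal(mu_true, np.sqrt(sigma2_true), size=T)

mu_mle = r.mean()                          # (1/T) * sum_t r_t
sigma2_mle = np.mean((r - mu_mle)**2)      # (1/T) * sum_t (r_t - mu_mle)^2

print(mu_mle, sigma2_mle)
```

Note that `sigma2_mle` coincides with `np.var(r)` (whose default `ddof=0` also divides by T), not with the usual sample variance that divides by T − 1.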
◼ General properties of the ML estimator
Here we loosely state typical properties of the ML estimator. The ML estimator has the following properties:
1. It is consistent under mild conditions.
2. It is usually not an unbiased estimator.
3. It has approximately the smallest asymptotic variance among all estimators.
The first and third properties are the reasons why the ML estimator is very useful in practice. Roughly speaking, it is the best estimator.
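Property 2 can be seen in the stock return example: over many replications with a small T, $\hat{\sigma}^2_{MLE}$ averages about ((T−1)/T)σ² rather than σ², i.e. it is biased downward. The simulation settings below are hypothetical.

```python
# Sketch of property 2: hat{sigma}^2_MLE from the normal model is biased.
# With T = 5 and sigma^2 = 1, its average over many simulated samples is
# close to (T-1)/T = 0.8, not 1. All simulation settings are assumed.
import numpy as np

rng = np.random.default_rng(4)
T, reps = 5, 20_000
samples = rng.normal(0.0, 1.0, size=(reps, T))

mu_hat = samples.mean(axis=1, keepdims=True)          # hat{mu}_MLE per replication
sigma2_hat = np.mean((samples - mu_hat)**2, axis=1)   # hat{sigma}^2_MLE, divisor T

print(sigma2_hat.mean())   # close to 0.8
```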
Exercise
Suppose that Yi takes the values Yi = 0, 1, 2, … and that Yi follows a Poisson distribution with parameter λ. Given N observations yi, i = 1, …, N of Yi, estimate the parameter λ by the MLE method.
Hint: A Poisson random variable X is a discrete random variable that takes non-negative integer values; its probability function is given by

$$\Pr(X = k) = \frac{\lambda^{k}\exp(-\lambda)}{k!}, \qquad k = 0, 1, 2, \ldots,$$

where λ > 0.