Statistical estimation
Statistical modelling: theory and practice
Gilles Guillot
September 3, 2013
1 Introductory example
2 Principles of estimation
3 Likelihood theory
4 Reading
5 Exercises
Introductory example
A batch of 1000 electronic components contains some faulty items. One takes a sample of size 100 with replacement, of which 3 are faulty. What is the proportion of faulty items in the batch?
Arriving in a new city, you see a tram passing in the street with the number 16. How many tram lines are there in this city?
For a set of measurements y1, ..., yn of temperatures at dates t1, ..., tn observed at a certain location, we want to fit a line y = at + b. What are the values a and b that best fit the data?
A common set up
We have some data.
There is a "mechanism" that generates the data.
This mechanism depends on an unknown parameter that we want to estimate.
The Statistics way
We relate the unknown parameter to the data by means of a probability distribution.
Proportion of faulty items: the number of faulty items in the sample can be assumed to follow a binomial distribution B(n, p).
Tram lines: the number observed can be assumed to follow a uniform distribution U{1, ..., N}.
The sketch below illustrates this idea numerically.
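A minimal sketch of this idea in Python (assuming scipy is available; the candidate parameter values are made up for illustration):

```python
from scipy.stats import binom

# Faulty-items example: probability of seeing x = 3 faulty items in a
# sample of n = 100, under a few candidate values of the unknown p.
n, x = 100, 3
for p in (0.01, 0.03, 0.10):
    print(f"p = {p:.2f}: P(X = 3) = {binom.pmf(x, n, p):.4f}")

# Tram example: with N lines numbered 1..N, observing line 16 has
# probability 1/N for any N >= 16 (and probability 0 for N < 16).
for N in (16, 20, 50):
    print(f"N = {N}: P(line 16 observed) = {1 / N:.4f}")
```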
Principles of estimation
Estimator, estimate
Estimator
Denoting generically θ the unknown parameter value, an estimator is a rule (or algorithm) allowing us to "guess" θ from the data. From a mathematical point of view, it is a function

R^n −→ R^d
(x1, ..., xn) ↦ θ̂
d is the dimension of the parameter space (often d = 1 for us)
Since we assume that the data are random, we will often stress this by denoting them (X1, ..., Xn).
The number θ̂ is an estimate of θ. Viewed as a function of the random sample, it is a random variable, sometimes denoted θ̂(X1, ..., Xn).
Bias of an estimator
Definition: bias
The bias of an estimator is the average discrepancy between the estimate and the true parameter value:

Bias(θ̂) = E[θ̂ − θ] = E[θ̂] − θ

An estimator is said to be unbiased if Bias(θ̂) = 0.
Precision and accuracy
Precision and accuracy are two concepts that belong to science and engineering, best explained by the usual target figure: precise shots cluster tightly (possibly off-centre), accurate shots centre on the target. [Figure omitted.]
In statistics, we have two related concepts: variance and mean square error.
Definition: variance of an estimator

V[θ̂] = E[(θ̂ − E[θ̂])²]

V[θ̂] is a measure of how much θ̂ is scattered around its mean (which may differ from the true value θ).

Definition: mean square error of an estimator

MSE[θ̂] = E[(θ̂ − θ)²]

MSE[θ̂] is a measure of how much θ̂ is scattered around the true value θ.

When θ̂ is unbiased, E[θ̂] = θ, hence V[θ̂] = MSE[θ̂].
Confidence interval

Definition: confidence interval
A confidence interval at level (1 − α) ∈ [0, 1] is an interval that contains the true unknown parameter value θ with probability 1 − α.
Example: estimation of a proportion

We have a sample of n objects of which x are faulty. We estimate the unknown proportion p by p̂ = x/n.
Exercise: give the bias and variance of p̂. A simulation sketch for checking your answer follows.
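A Monte Carlo sketch (assuming numpy; the true p is an assumed value) that checks an analytic answer against the definitions of bias, variance, and MSE above:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 0.03           # sample size and (assumed) true proportion
reps = 100_000             # number of simulated samples

x = rng.binomial(n, p, size=reps)   # faulty count in each simulated sample
p_hat = x / n                       # estimator applied to each sample

print(f"bias     ~ {p_hat.mean() - p:+.2e}")
print(f"variance ~ {p_hat.var():.2e}")
print(f"MSE      ~ {np.mean((p_hat - p) ** 2):.2e}")
# For an unbiased estimator, variance and MSE should agree (previous slide).
```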
Likelihood theory
A new look at the probability of the data
We consider again the problem of estimating a proportion with binomial sampling.

The probability of observing a given sample containing x faulty objects is p^x (1 − p)^{n−x}.

If we consider p^x (1 − p)^{n−x} as a function of p, it can be interpreted as the likelihood of the unknown parameter p.

To acknowledge the dependence on p, we denote L(x; p) = p^x (1 − p)^{n−x}, or L(p) for short.
[Figure: the likelihood p^x (1 − p)^{n−x} plotted against p ∈ [0, 1]; x-axis: p, y-axis: probability of data.]
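A sketch (assuming numpy and matplotlib) that reproduces such a likelihood curve, with n = 100 and x = 3 as in the earlier example:

```python
import numpy as np
import matplotlib.pyplot as plt

n, x = 100, 3
p = np.linspace(0.001, 0.999, 500)
L = p**x * (1 - p)**(n - x)   # likelihood; binomial coefficient omitted

plt.plot(p, L)
plt.xlabel("p")
plt.ylabel("Probability of data")
plt.show()
```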
The maximum likelihood principle
The above suggests a method to estimate the unknown parameter p:
p̂ = Argmax_p p^x (1 − p)^{n−x}

p̂ is the parameter value that makes our data most probable.
It is known as the Maximum Likelihood Estimate of p and denoted p̂_ML.
Note that defining p̂ as Argmax_p (n choose x) p^x (1 − p)^{n−x} would lead to the same result, as the binomial coefficient (n choose x) does not depend on p.
Likelihood in a general statistical model
Definition: likelihood function

We consider a dataset consisting of n observations (x1, ..., xn).

We assume that the probability density function or probability mass function of (x1, ..., xn), denoted fθ(x1, ..., xn), is known up to an unknown parameter θ.

The likelihood function L is defined as

L(x1, ..., xn; θ) = fθ(x1, ..., xn)
Examples of likelihood functions
Poisson counts
We observe the number of phone calls at various calling centres over a given period and denote them by (x1, ..., xn).

We assume that the xi are independent realizations of a Poisson random variable Xi with parameter λ, i.e. P(Xi = x) = exp(−λ) λ^x / x!

NB: x ∈ N and λ ∈ R+

E[Xi] = λ and V[Xi] = λ
Likelihood for i.i.d Poisson observations
Remember: "Likelihood = probability of data for a given parameter value "
L(x1, ..., xn; λ) = ∏_{i=1}^n exp(−λ) λ^{x_i} / x_i!
 = exp(−nλ) λ^{∑_i x_i} / ∏_i x_i!
 ∝ exp(−nλ) λ^{∑_i x_i}
Likelihood for i.i.d Normal observations
"Likelihood = probability of data for a given parameter value "Parameter: θ = (µ, σ) ∈ R× R+
L(x1, ..., xn;µ, σ) =
n∏i=1
1
σ√
2πexp[−1
2
(xi − µ)2
σ2]
∝n∏i=1
1
σexp[−1
2
(xi − µ)2
σ2]
General maximum likelihood principle
Maximum likelihood estimator
We consider a dataset consisting of n observations (x1, ..., xn)
We assume that we know the likelihood function L(x1, ..., xn; θ) = fθ(x1, ..., xn).

The maximum likelihood estimator of θ is defined as

θ̂_ML(x1, ..., xn) = Argmax_θ L(x1, ..., xn; θ)
Deriving p̂ explicitly for the previous binomial sampling
We want to maximize L(p) = p^x (1 − p)^{n−x}. We could work on L(p) directly in this case, but let us denote l(p) = ln L(p).

l(p) = ln[p^x (1 − p)^{n−x}] = x ln p + (n − x) ln(1 − p)
l′(p) = x/p − (n − x)/(1 − p)

l′(p) = 0 for p = x/n.

p̂ = x/n is the estimate of p.

For a generic sample with random outcome X, p̂ = X/n is the estimator of p; it is a random variable. A numeric check follows.
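A quick numeric check of this closed-form result (assuming scipy; n and x as in the earlier example):

```python
import numpy as np
from scipy.optimize import minimize_scalar

n, x = 100, 3

def neg_log_lik(p):
    # -l(p) = -[x ln p + (n - x) ln(1 - p)]
    return -(x * np.log(p) + (n - x) * np.log(1 - p))

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(f"numeric MLE: {res.x:.5f}   closed form x/n: {x / n:.5f}")
```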
Maximum likelihood estimator for i.i.d Poisson observations
Omitting the term that does not depend on λ, we have

l(x1, ..., xn; λ) = ln L(x1, ..., xn; λ) = ln[exp(−nλ) λ^{∑_i x_i}]
 = −nλ + (∑_i x_i) ln λ

Hence d/dλ l(x1, ..., xn; λ) = −n + (∑_i x_i)/λ

And d/dλ l(x1, ..., xn; λ) = 0 ⟺ λ = (∑_i x_i)/n

The MLE of λ is λ̂_ML = (∑_i x_i)/n = x̄
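A numeric check on simulated data (assuming numpy and scipy; the true λ is an assumed value):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
lam_true = 4.2
x = rng.poisson(lam_true, size=50)   # simulated call counts

def neg_log_lik(lam):
    # -l(lambda) = -(-n*lambda + sum_i x_i * ln(lambda)), constant dropped
    return -(-x.size * lam + x.sum() * np.log(lam))

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 100.0), method="bounded")
print(f"numeric MLE: {res.x:.4f}   sample mean: {x.mean():.4f}")
```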
Likelihood for i.i.d Normal observations
"Likelihood = probability of data for a given parameter value "Parameter: θ = (µ, σ) ∈ R× R+
L(x1, ..., xn;µ, σ) =
n∏i=1
1
σ√
2πexp[−1
2
(xi − µ)2
σ2]
∝n∏i=1
1
σexp[−1
2
(xi − µ)2
σ2]
l(x1, ..., xn;µ, σ) =
n∑i=1
[− lnσ − 1
2
(xi − µ)2
σ2
]
= −n lnσ − 1
2σ2
n∑i=1
(xi − µ)2
d/dµ l(x1, ..., xn; µ, σ) = (1/σ²) ∑_{i=1}^n (x_i − µ)

d/dσ l(x1, ..., xn; µ, σ) = −n/σ + (1/σ³) ∑_{i=1}^n (x_i − µ)²

d/dµ l = 0 and d/dσ l = 0 give

∑_{i=1}^n (x_i − µ) = 0 and −nσ² + ∑_{i=1}^n (x_i − µ)² = 0

hence

µ̂_ML = (1/n) ∑_{i=1}^n x_i = x̄ and σ̂²_ML = (1/n) ∑_{i=1}^n (x_i − x̄)²

NB: The ML estimator of the variance is biased. This bias tends to zero as n −→ ∞, as the simulation sketch below illustrates.
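A simulation sketch of this bias (assuming numpy; the true µ and σ are assumed values). The exact bias of (1/n) ∑ (x_i − x̄)² is −σ²/n, which indeed vanishes as n grows:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma = 0.0, 1.0      # assumed true parameters
reps = 100_000            # number of simulated samples per n

for n in (5, 20, 100):
    x = rng.normal(mu, sigma, size=(reps, n))
    var_ml = x.var(axis=1, ddof=0)   # (1/n) sum (x_i - x_bar)^2 per sample
    print(f"n = {n:3d}: empirical bias ~ {var_ml.mean() - sigma**2:+.4f}"
          f"   exact -sigma^2/n = {-sigma**2 / n:+.4f}")
```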
Remarks on the MLE
The rule "likelihood = product of marginal densities" applies only when the observations are independent.

Taking the log:
linearizes the product into a sum
greatly simplifies the math when the density is proportional to exp(a x^b) x^c
avoids numerical instabilities (underflow) in numerical computation
Remarks on the MLE (cont'd)

Deriving the MLE in closed form is often impossible in real-life problems. One has to resort to numerical optimization, hence the importance of optimization methods in statistics; see the sketch below.
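A minimal sketch (assuming numpy and scipy) of numerical maximum likelihood for the Normal model above; optimizing ln σ keeps σ positive without explicit constraints:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
x = rng.normal(10.0, 2.0, size=200)   # simulated data, assumed true values

def neg_log_lik(theta):
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)         # optimize ln(sigma) so sigma > 0
    return np.sum(np.log(sigma) + (x - mu) ** 2 / (2 * sigma**2))

start = [x.mean(), np.log(x.std())]   # crude starting point from the data
res = minimize(neg_log_lik, x0=start, method="Nelder-Mead")
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
print(f"mu_hat = {mu_hat:.3f}, sigma_hat = {sigma_hat:.3f}")
```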
If the parameter θ belongs to a discrete set, differentiating l(θ) is meaningless. One has to resort to discrete optimization methods.
The likelihood L(x1, ..., xn; θ) is sometimes denoted L(θ|x1, ..., xn). This is misleading and mathematically completely wrong since, in likelihood theory, θ is not a random variable.
Reading
To go beyond these slides, you can read the first two chapters of In All Likelihood, Yudi Pawitan, Oxford Science Publications, 2001. This book is not in the DTU digital library, but it is available almost in full on Google Books.
Exercises
1 We assume that we have recorded the life durations of n light bulbs, denoted x1, ..., xn. We assume that they are n iid replicates of an exponential E(α) distribution. Derive analytically the expression of the MLE of α.

2 Derive analytically the MLE of a for a dataset consisting of n iid replicates of a U[0, a] distribution. Evaluate the bias of this estimator. What is the limit of the bias when n tends to +∞?

3 For a distribution fθ, the expectation of X under fθ can be expressed as a function φ(θ). The moment method consists in identifying φ(θ) with the empirical mean. Apply this principle to the case above and discuss the resulting estimator in terms of bias, variance, and any other properties you find relevant.