27

Statistical estimation - Statistical modelling: theory and ...€¦ · Likelihood theory A new look at the probability of the data We consider again the problem of estimating a proportion

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Statistical estimation - Statistical modelling: theory and ...€¦ · Likelihood theory A new look at the probability of the data We consider again the problem of estimating a proportion

Statistical estimationStatistical modelling: theory and practice

Gilles Guillot

[email protected]

September 3, 2013

Gilles Guillot ([email protected]) Estimation September 3, 2013 1 / 27

Page 2: Statistical estimation - Statistical modelling: theory and ...€¦ · Likelihood theory A new look at the probability of the data We consider again the problem of estimating a proportion

1 Introductory example

2 Principles of estimation

3 Likelihood theory

4 Reading

5 Exercises

Gilles Guillot ([email protected]) Estimation September 3, 2013 2 / 27

Page 3: Statistical estimation - Statistical modelling: theory and ...€¦ · Likelihood theory A new look at the probability of the data We consider again the problem of estimating a proportion

Introductory example

Introductory example

A batch of 1000 electronic components contains some faulty items.One takes a sample of size 100 with replacement, of which 3 arefaulty. What is the proportion of faulty items in the batch?

Arriving in a new city, you see a tram passing in the street with thenumber 16. How many tram lines are there in this city?

For a set of measurements y1, ..., yn of temperatures at dates t1, .., tnobserved at a certain location, we want to �t a line y = at+ b. Whatare the values a and b that best �t the data?

Gilles Guillot ([email protected]) Estimation September 3, 2013 3 / 27

Page 4: Statistical estimation - Statistical modelling: theory and ...€¦ · Likelihood theory A new look at the probability of the data We consider again the problem of estimating a proportion

Introductory example

A common set up

We have some data

There is a �mechanism� that generates the data

This mechanism depends on a unknown parameter that we want toestimate

Gilles Guillot ([email protected]) Estimation September 3, 2013 4 / 27

Page 5: Statistical estimation - Statistical modelling: theory and ...€¦ · Likelihood theory A new look at the probability of the data We consider again the problem of estimating a proportion

Introductory example

The Statistics way

We relate the unknown parameter to the data by mean of a probabilitydistribution.

Proportion of faulty items: the number of faulty items in te samplecan be assumed to be follow a binomial distribution B(n, p)

Tram lines: the number observed can be assumed to follow a uniformdistribution U{1, ..., N}

Gilles Guillot ([email protected]) Estimation September 3, 2013 5 / 27

Page 6: Statistical estimation - Statistical modelling: theory and ...€¦ · Likelihood theory A new look at the probability of the data We consider again the problem of estimating a proportion

Principles of estimation

Estimator, estimate

Estimator

Denoting generically θ the unknown parameter value, an estimator is a rule(or algorithm) allowing us to �guess� θ, from the data.From a mathematical point of view, it is a function

Rn −→ Rd

(x1, ..., xn) −→ θ̂

Gilles Guillot ([email protected]) Estimation September 3, 2013 6 / 27

Page 7: Statistical estimation - Statistical modelling: theory and ...€¦ · Likelihood theory A new look at the probability of the data We consider again the problem of estimating a proportion

Principles of estimation

d is the dimension of the parameter space (often d = 1 for us)

Since we assume that data are random, we will often stress this bydenoting them (X1, ..., Xn).

The number θ̂ is an estimate of θ. It is a random variable, denotedsometimes θ̂(X1, ..., Xn)

Gilles Guillot ([email protected]) Estimation September 3, 2013 7 / 27

Page 8: Statistical estimation - Statistical modelling: theory and ...€¦ · Likelihood theory A new look at the probability of the data We consider again the problem of estimating a proportion

Principles of estimation

Bias of an estimator

De�nition: bias

The bias of an estimator is the average discrepancy between the estimateand the true parameter value:

Bias(θ̂) = E[θ̂ − θ] = E[θ̂]− θ

An estimator is said to be unbiased if Bias(θ̂) = 0

Gilles Guillot ([email protected]) Estimation September 3, 2013 8 / 27

Page 9: Statistical estimation - Statistical modelling: theory and ...€¦ · Likelihood theory A new look at the probability of the data We consider again the problem of estimating a proportion

Principles of estimation

Precision and accuracy

Precision and accuracy are two concepts that belong to science andengineering best explained by the �gure below:

In statistics, we have two related concepts: variance and mean square error.

Gilles Guillot ([email protected]) Estimation September 3, 2013 9 / 27

Page 10: Statistical estimation - Statistical modelling: theory and ...€¦ · Likelihood theory A new look at the probability of the data We consider again the problem of estimating a proportion

Principles of estimation

De�nition: variance of an estimator

V [θ̂] = E[(θ̂ − E[θ̂])2

]V [θ̂] is a measure of how much θ̂ is scaterred around its mean (which maydi�er from the true value θ).

De�nition: mean square error of an estimator

MSE[θ̂] = E[(θ̂ − θ)2

]is a measure of how much θ̂ is scaterred around the true value θ.

When θ̂ is unbiased, E[θ̂] = θ hence V [θ̂] = MSE[θ̂].

Gilles Guillot ([email protected]) Estimation September 3, 2013 10 / 27

Page 11: Statistical estimation - Statistical modelling: theory and ...€¦ · Likelihood theory A new look at the probability of the data We consider again the problem of estimating a proportion

Principles of estimation

Con�dence interval

De�nition: con�dence interval

A con�dence interval at level (1− α) ∈ [0, 1] is an interval that containsthe true unknown parameter value θ with probbaility 1− α.

Gilles Guillot ([email protected]) Estimation September 3, 2013 11 / 27

Page 12: Statistical estimation - Statistical modelling: theory and ...€¦ · Likelihood theory A new look at the probability of the data We consider again the problem of estimating a proportion

Principles of estimation

Example: estimation of a proportion

We have a sample of n objects of which x are faulty. We estimate theunknown proportion p by p̂ = x/n.Exercise: give the bias and variance of p̂.

Gilles Guillot ([email protected]) Estimation September 3, 2013 12 / 27

Page 13: Statistical estimation - Statistical modelling: theory and ...€¦ · Likelihood theory A new look at the probability of the data We consider again the problem of estimating a proportion

Likelihood theory

A new look at the probability of the data

We consider again the problem of estimating a proportion with binomialsampling.

P (X = x) = px(1− p)n−x is the probability to obtain x faulty objects.

If we consider px(1− p)n−x as a function of p, it can be interpreted asthe likelihood of the unknown parameter p.

To acknowledge the dependence on p, we denote L(x; p) = px(1− p)n−xor for short L(p).

0.0 0.2 0.4 0.6 0.8 1.0

0.00

000.

0005

0.00

100.

0015

0.00

20

p

Pro

babi

lity

of d

ata

Gilles Guillot ([email protected]) Estimation September 3, 2013 13 / 27

Page 14: Statistical estimation - Statistical modelling: theory and ...€¦ · Likelihood theory A new look at the probability of the data We consider again the problem of estimating a proportion

Likelihood theory

The maximum likelihood principle

The above suggests a method to estimate the unknown parameter p:

p̂ = Argmaxp px(1− p)n−x

p̂ is the parameter value that makes our data most probable,

It is known as the Maximum Likelihood Estimate of p.and denoted p̂ML

Note that de�ning p̂ as Argmaxp

(n

x

)px(1− p)n−x would lead to the same

result as(nx

)does not depend on p.

Gilles Guillot ([email protected]) Estimation September 3, 2013 14 / 27

Page 15: Statistical estimation - Statistical modelling: theory and ...€¦ · Likelihood theory A new look at the probability of the data We consider again the problem of estimating a proportion

Likelihood theory

Likelihood in a general statistical model

De�nition: likelihood function

We consider a dataset consisting of n observations (x1, ..., xn)

We assume that the probability density function or probability massfunction of (x1, ..., xn) denoted by fθ(x1, ..., xn) is known up to anunknown parameter θ.

The likelihood function L is de�ned as

L(x1, ..., xn; θ) = fθ(x1, ..., xn)

Gilles Guillot ([email protected]) Estimation September 3, 2013 15 / 27

Page 16: Statistical estimation - Statistical modelling: theory and ...€¦ · Likelihood theory A new look at the probability of the data We consider again the problem of estimating a proportion

Likelihood theory

Examples of likelihood functions

Poisson counts

We observe the number of phone calls at various calling centres over agiven period and denote them by (x1, ..., xn).

We assume that the xi are independent realizations of a Poissonrandom variable Xi with parameter λ, i.e.P (Xi = x) = exp(−λ)λx/x!

NB: x ∈ N and λ ∈ R+

E[Xi] = λ and V [Xi] = λ

Gilles Guillot ([email protected]) Estimation September 3, 2013 16 / 27

Page 17: Statistical estimation - Statistical modelling: theory and ...€¦ · Likelihood theory A new look at the probability of the data We consider again the problem of estimating a proportion

Likelihood theory

Likelihood for i.i.d Poisson observations

Remember: "Likelihood = probability of data for a given parameter value "

L(x1, ..., xn;λ) =

n∏i=1

exp(−λ)λxi/xi!

= exp(−nλ)λ∑

i xi∏i xi!

∝ exp(−nλ)λ∑

i xi

Gilles Guillot ([email protected]) Estimation September 3, 2013 17 / 27

Page 18: Statistical estimation - Statistical modelling: theory and ...€¦ · Likelihood theory A new look at the probability of the data We consider again the problem of estimating a proportion

Likelihood theory

Likelihood for i.i.d Normal observations

"Likelihood = probability of data for a given parameter value "Parameter: θ = (µ, σ) ∈ R× R+

L(x1, ..., xn;µ, σ) =

n∏i=1

1

σ√

2πexp[−1

2

(xi − µ)2

σ2]

∝n∏i=1

1

σexp[−1

2

(xi − µ)2

σ2]

Gilles Guillot ([email protected]) Estimation September 3, 2013 18 / 27

Page 19: Statistical estimation - Statistical modelling: theory and ...€¦ · Likelihood theory A new look at the probability of the data We consider again the problem of estimating a proportion

Likelihood theory

General maximum likelihood principle

Maximum likelihood estimator

We consider a dataset consisting of n observations (x1, ..., xn)

We assume that we know the likelihood functionL(x1, ..., xn; θ) = fθ(x1, ..., xn)

The maximum likelihood estimator of θ is de�ned as

θ̂ML(x1, ..., xn) = ArgmaxθL(x1, ..., xn; θ)

Gilles Guillot ([email protected]) Estimation September 3, 2013 19 / 27

Page 20: Statistical estimation - Statistical modelling: theory and ...€¦ · Likelihood theory A new look at the probability of the data We consider again the problem of estimating a proportion

Likelihood theory

Deriving p̂ explicitly for the previous binomial sampling

We want to maximize L(p) = px(1− p)n−xWe could work on L(p) directly in this case but let us denotel(p) = lnL(p).

l(p) = ln[px(1− p)n−x] = x ln p+ (n− x) ln(1− p)l′(p) = x/p− (n− x)/(1− p) (1)

l′(p) = 0 if p = x/n

p̂ = x/n is the estimate of p

for a generic sample with random outcome X, p̂ = X/n is theestimator or p, it is a random variable

Gilles Guillot ([email protected]) Estimation September 3, 2013 20 / 27

Page 21: Statistical estimation - Statistical modelling: theory and ...€¦ · Likelihood theory A new look at the probability of the data We consider again the problem of estimating a proportion

Likelihood theory

Maximum likelihood estimator for i.i.d Poisson observations

Omitting the term that does not depend on λ, we have

l(x1, ..., xn;λ) = lnL(x1, ..., xn;λ) = ln[exp(−nλ)λ∑

i xi ]

= −nλ+∑i

xi lnλ

Henced

dλl(x1, ..., xn;λ) = −n+

∑i

xi/λ

Andd

dλl(x1, ..., xn;λ) = 0⇐⇒ λ =

∑i

xi/n

The MLE of λ is λ̂ML =∑

i xi/n = x̄

Gilles Guillot ([email protected]) Estimation September 3, 2013 21 / 27

Page 22: Statistical estimation - Statistical modelling: theory and ...€¦ · Likelihood theory A new look at the probability of the data We consider again the problem of estimating a proportion

Likelihood theory

Likelihood for i.i.d Normal observations

"Likelihood = probability of data for a given parameter value "Parameter: θ = (µ, σ) ∈ R× R+

L(x1, ..., xn;µ, σ) =

n∏i=1

1

σ√

2πexp[−1

2

(xi − µ)2

σ2]

∝n∏i=1

1

σexp[−1

2

(xi − µ)2

σ2]

l(x1, ..., xn;µ, σ) =

n∑i=1

[− lnσ − 1

2

(xi − µ)2

σ2

]

= −n lnσ − 1

2σ2

n∑i=1

(xi − µ)2

Gilles Guillot ([email protected]) Estimation September 3, 2013 22 / 27

Page 23: Statistical estimation - Statistical modelling: theory and ...€¦ · Likelihood theory A new look at the probability of the data We consider again the problem of estimating a proportion

Likelihood theory

d

dµl(x1, ..., xn;µ, σ) =

1

2σ2

n∑i=1

(xi − µ)

d

dσl(x1, ..., xn;µ, σ) = −n

σ+

1

σ3

n∑i=1

(xi − µ)2

d

dµl = 0 and

d

dσl = 0 give

n∑i=1

(xi − µ) = 0

and − nσ2 +

n∑i=1

(xi − µ)2 = 0

hence

θ̂ML = (̂µ, σ)ML =

(1

n

n∑i=1

xi,1

n

n∑i=1

(xi − x̄)2

)

NB: The ML estimator of the variance is biased. This bias tends to zero asn −→∞

Gilles Guillot ([email protected]) Estimation September 3, 2013 23 / 27

Page 24: Statistical estimation - Statistical modelling: theory and ...€¦ · Likelihood theory A new look at the probability of the data We consider again the problem of estimating a proportion

Likelihood theory

Remarks on the MLE

The rule �likelihod = product of marginal densities� applies only whenthe observations are independent

Taking the log

linearizes the product into a sum

simpli�es greatly the math expressions in the case density proportional

to exp(axb)xc

avoids numerical instabilities when using numerical computation

Gilles Guillot ([email protected]) Estimation September 3, 2013 24 / 27

Page 25: Statistical estimation - Statistical modelling: theory and ...€¦ · Likelihood theory A new look at the probability of the data We consider again the problem of estimating a proportion

Likelihood theory

Remarks on the MLE (cont')

Deriving the MLE in closed form is often impossible in real-lifeproblems. One has to resort to numerical optimization. Hence theimportance of optimization methods in statistics.

If the parameter θ belongs to a discrete set, di�erentiating l(θ) ismeaningless. One has to resort to discrete optimization methods.

The likelihood L(x1, ..., xn; θ) is sometimes denoted L(θ|x1, ..., xn).�

This is misleading and mathematically completely wrong since inthe likelihood theory, θ is not a random variable.

Gilles Guillot ([email protected]) Estimation September 3, 2013 25 / 27

Page 26: Statistical estimation - Statistical modelling: theory and ...€¦ · Likelihood theory A new look at the probability of the data We consider again the problem of estimating a proportion

Reading

Reading

To go beyond these slides, you can read the �rst two chapters of In All

Likelihood, Yudi Pawitan, Oxford Science Publications, 2001.This book is not in DTU digital library but almost completely on [Googlebooks ]

Gilles Guillot ([email protected]) Estimation September 3, 2013 26 / 27

Page 27: Statistical estimation - Statistical modelling: theory and ...€¦ · Likelihood theory A new look at the probability of the data We consider again the problem of estimating a proportion

Exercises

Exercises

1 We assume that we have recorded the life duration of n light bulbsdenoted x1, ..., xn. We assume that they are n iid replicates of anexponential E(α) distribution.Derive analytically the expression of the MLE of α.

2 Derive analytically the MLE of a for a dataset consisting of n iidreplicates of a U [0, a] distribution. Evaluate the bias of this estimator.What is the limit of the bias when n tends to +∞?

3 For a distribution fθ, the expectation of X under fθ can be expressedas a function φ(θ). The moment method consists in identifying φ(θ)to the empirical mean. Apply this principle to the case above anddiscuss the estimator in terms of bias, variance, other remarks?

Gilles Guillot ([email protected]) Estimation September 3, 2013 27 / 27