
Statistical estimation
Statistical modelling: theory and practice

Gilles Guillot

gigu@dtu.dk

September 3, 2013


1 Introductory example

2 Principles of estimation

3 Likelihood theory

4 Reading

5 Exercises


Introductory example


A batch of 1000 electronic components contains some faulty items. One takes a sample of size 100 with replacement, of which 3 are faulty. What is the proportion of faulty items in the batch?

Arriving in a new city, you see a tram passing in the street with the number 16. How many tram lines are there in this city?

For a set of measurements y1, ..., yn of temperatures at dates t1, ..., tn observed at a certain location, we want to fit a line y = at + b. What are the values of a and b that best fit the data?


A common setup

We have some data.

There is a “mechanism” that generates the data.

This mechanism depends on an unknown parameter that we want to estimate.


The Statistics way

We relate the unknown parameter to the data by means of a probability distribution.

Proportion of faulty items: the number of faulty items in the sample can be assumed to follow a binomial distribution B(n, p).

Tram lines: the number observed can be assumed to follow a uniform distribution U{1, ..., N}.
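A minimal sketch simulating the two "mechanisms" above (the parameter values p = 0.05 and N = 20 are assumptions of the illustration, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)

# Faulty items: the count of faulty items in a sample of n = 100
# is modelled as binomial B(n, p); p = 0.05 is an assumed value.
faulty_count = rng.binomial(n=100, p=0.05)

# Tram lines: the observed line number is modelled as uniform on
# {1, ..., N}; N = 20 is an assumed value. integers() excludes the
# upper endpoint, hence N + 1.
tram_number = rng.integers(1, 20 + 1)

print(faulty_count, tram_number)
```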


Principles of estimation

Estimator, estimate

Estimator

Denoting generically θ the unknown parameter value, an estimator is a rule (or algorithm) allowing us to “guess” θ from the data. From a mathematical point of view, it is a function

θ̂ : R^n −→ R^d

(x1, ..., xn) ↦ θ̂(x1, ..., xn)


d is the dimension of the parameter space (often d = 1 for us)

Since we assume that the data are random, we will often stress this by denoting them (X1, ..., Xn).

The number θ̂ is an estimate of θ. Viewed as a function of the random data, it is a random variable, sometimes denoted θ̂(X1, ..., Xn).
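In code, an estimator is literally such a function: it maps a sample to a number. A minimal sketch for the proportion example (the names are illustrative, not from the slides):

```python
import numpy as np

def proportion_estimator(x: np.ndarray) -> float:
    """An estimator: a rule mapping the data (x1, ..., xn) to a guess of θ."""
    return float(x.mean())

# The estimate for one observed sample: 3 faulty items out of 100.
sample = np.array([1] * 3 + [0] * 97)
theta_hat = proportion_estimator(sample)
print(theta_hat)  # 0.03
```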


Bias of an estimator

Definition: bias

The bias of an estimator is the average discrepancy between the estimate and the true parameter value:

Bias(θ̂) = E[θ̂ − θ] = E[θ̂] − θ

An estimator is said to be unbiased if Bias(θ̂) = 0


Precision and accuracy

Precision and accuracy are two concepts from science and engineering, best explained by the classic target figure (not reproduced here): precise means the shots are tightly clustered; accurate means they are centred on the target.

In statistics, we have two related concepts: variance and mean square error.


Definition: variance of an estimator

V[θ̂] = E[(θ̂ − E[θ̂])²]

V[θ̂] is a measure of how much θ̂ is scattered around its mean (which may differ from the true value θ).

Definition: mean square error of an estimator

MSE[θ̂] = E[(θ̂ − θ)²]

MSE[θ̂] is a measure of how much θ̂ is scattered around the true value θ.

When θ̂ is unbiased, E[θ̂] = θ, hence V[θ̂] = MSE[θ̂].
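A Monte Carlo sketch illustrating these definitions on a deliberately biased estimator p̃ = (x + 1)/(n + 2) (the estimator and parameter values are chosen for illustration, not from the slides); it also checks the standard decomposition MSE = V + Bias²:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 0.03
reps = 200_000

x = rng.binomial(n, p, size=reps)      # many replicated experiments
p_tilde = (x + 1) / (n + 2)            # a deliberately biased estimator

bias = p_tilde.mean() - p
var = p_tilde.var()
mse = np.mean((p_tilde - p) ** 2)
print(bias, var, mse, var + bias**2)   # mse ≈ var + bias²
```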


Confidence interval

Definition: confidence interval

A confidence interval at level (1 − α) ∈ [0, 1] is an interval that contains the true unknown parameter value θ with probability 1 − α.
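The slides do not construct any interval here; as an illustration only, a sketch of the standard normal-approximation (Wald) interval for a proportion, which is approximate and behaves poorly when p̂ is close to 0 or 1:

```python
import numpy as np
from scipy.stats import norm

n, x = 100, 3
p_hat = x / n
alpha = 0.05

# Wald interval: p̂ ± z_{1−α/2} · sqrt(p̂(1−p̂)/n); only approximate.
z = norm.ppf(1 - alpha / 2)
half_width = z * np.sqrt(p_hat * (1 - p_hat) / n)
print(p_hat - half_width, p_hat + half_width)
```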


Example: estimation of a proportion

We have a sample of n objects of which x are faulty. We estimate the unknown proportion p by p̂ = x/n.

Exercise: give the bias and variance of p̂.
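A simulation sketch you can use to check your analytic answer (the true value p = 0.03 is an assumption of the illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
n, p = 100, 0.03

# 100 000 replicated experiments, one estimate p̂ = x/n per experiment.
p_hat = rng.binomial(n, p, size=100_000) / n
print(p_hat.mean() - p)   # empirical bias, compare with your formula
print(p_hat.var())        # empirical variance, compare with your formula
```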


Likelihood theory

A new look at the probability of the data

We consider again the problem of estimating a proportion with binomial sampling.

P(X = x) = p^x (1 − p)^(n−x) is the probability of obtaining x faulty objects.

If we consider p^x (1 − p)^(n−x) as a function of p, it can be interpreted as the likelihood of the unknown parameter p.

To acknowledge the dependence on p, we write L(x; p) = p^x (1 − p)^(n−x), or L(p) for short.

[Figure: the likelihood L(p) = p^x (1 − p)^(n−x) plotted against p ∈ [0, 1]; x-axis: p, y-axis: probability of data.]
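Such a curve can be reproduced with a few lines (a sketch for the running example n = 100, x = 3; matplotlib is assumed available):

```python
import numpy as np
import matplotlib.pyplot as plt

n, x = 100, 3
p = np.linspace(0.001, 0.999, 500)
L = p**x * (1 - p) ** (n - x)      # likelihood, binomial coefficient omitted

plt.plot(p, L)
plt.axvline(x / n, linestyle="--") # the curve peaks at p = x/n = 0.03
plt.xlabel("p")
plt.ylabel("Probability of data")
plt.show()
```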


The maximum likelihood principle

The above suggests a method to estimate the unknown parameter p:

p̂ = Argmax_p p^x (1 − p)^(n−x)

p̂ is the parameter value that makes our data most probable.

It is known as the maximum likelihood estimate of p and denoted p̂_ML.

Note that defining p̂ as Argmax_p C(n, x) p^x (1 − p)^(n−x) would lead to the same result, since the binomial coefficient C(n, x) does not depend on p.


Likelihood in a general statistical model

Definition: likelihood function

We consider a dataset consisting of n observations (x1, ..., xn)

We assume that the probability density function or probability mass function of (x1, ..., xn), denoted fθ(x1, ..., xn), is known up to an unknown parameter θ.

The likelihood function L is defined as

L(x1, ..., xn; θ) = fθ(x1, ..., xn)


Examples of likelihood functions

Poisson counts

We observe the number of phone calls at various calling centres over a given period and denote them by (x1, ..., xn).

We assume that the xi are independent realizations of a Poisson random variable Xi with parameter λ, i.e. P(Xi = x) = exp(−λ) λ^x / x!

NB: x ∈ N and λ ∈ R+

E[Xi] = λ and V [Xi] = λ


Likelihood for i.i.d Poisson observations

Remember: "Likelihood = probability of data for a given parameter value "

L(x1, ..., xn; λ) = ∏_{i=1}^{n} exp(−λ) λ^{xi} / xi!

= exp(−nλ) λ^(∑i xi) / ∏i xi!

∝ exp(−nλ) λ^(∑i xi)


Likelihood for i.i.d Normal observations

"Likelihood = probability of data for a given parameter value "Parameter: θ = (µ, σ) ∈ R× R+

L(x1, ..., xn;µ, σ) =

n∏i=1

1

σ√

2πexp[−1

2

(xi − µ)2

σ2]

∝n∏i=1

1

σexp[−1

2

(xi − µ)2

σ2]


General maximum likelihood principle

Maximum likelihood estimator

We consider a dataset consisting of n observations (x1, ..., xn)

We assume that we know the likelihood function L(x1, ..., xn; θ) = fθ(x1, ..., xn)

The maximum likelihood estimator of θ is defined as

θ̂_ML(x1, ..., xn) = Argmax_θ L(x1, ..., xn; θ)
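In practice the Argmax is usually found numerically; a minimal sketch for the binomial example, minimizing −ln L with scipy (assumed available):

```python
import numpy as np
from scipy.optimize import minimize_scalar

n, x = 100, 3

def neg_log_lik(p):
    # -ln L(p) for the binomial example, constant term omitted
    return -(x * np.log(p) + (n - x) * np.log(1 - p))

res = minimize_scalar(neg_log_lik, bounds=(1e-9, 1 - 1e-9), method="bounded")
print(res.x)   # ≈ 0.03 = x/n, matching the analytic derivation below
```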


Deriving p̂ explicitly for the previous binomial sampling

We want to maximize L(p) = p^x (1 − p)^(n−x). We could work on L(p) directly in this case, but let us denote l(p) = ln L(p).

l(p) = ln[p^x (1 − p)^(n−x)] = x ln p + (n − x) ln(1 − p)

l′(p) = x/p − (n − x)/(1 − p)

l′(p) = 0 if p = x/n

p̂ = x/n is the estimate of p

For a generic sample with random outcome X, p̂ = X/n is the estimator of p; it is a random variable.


Maximum likelihood estimator for i.i.d Poisson observations

Omitting the term that does not depend on λ, we have

l(x1, ..., xn; λ) = ln L(x1, ..., xn; λ) = ln[exp(−nλ) λ^(∑i xi)] = −nλ + (∑i xi) ln λ

Hence

d/dλ l(x1, ..., xn; λ) = −n + ∑i xi / λ

and

d/dλ l(x1, ..., xn; λ) = 0 ⟺ λ = ∑i xi / n

The MLE of λ is λ̂_ML = ∑i xi / n = x̄
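A numerical check of this result on synthetic data (a sketch; the true value λ = 4.2 and the use of scipy are assumptions of the illustration):

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import poisson

rng = np.random.default_rng(3)
x = rng.poisson(lam=4.2, size=50)     # synthetic call counts

def neg_log_lik(lam):
    return -poisson.logpmf(x, lam).sum()

res = minimize_scalar(neg_log_lik, bounds=(1e-9, 100), method="bounded")
print(res.x, x.mean())                # the optimizer recovers the sample mean
```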


Likelihood for i.i.d Normal observations

"Likelihood = probability of data for a given parameter value "Parameter: θ = (µ, σ) ∈ R× R+

L(x1, ..., xn;µ, σ) =

n∏i=1

1

σ√

2πexp[−1

2

(xi − µ)2

σ2]

∝n∏i=1

1

σexp[−1

2

(xi − µ)2

σ2]

l(x1, ..., xn;µ, σ) =

n∑i=1

[− lnσ − 1

2

(xi − µ)2

σ2

]

= −n lnσ − 1

2σ2

n∑i=1

(xi − µ)2


d/dµ l(x1, ..., xn; µ, σ) = (1/σ²) ∑_{i=1}^{n} (xi − µ)

d/dσ l(x1, ..., xn; µ, σ) = −n/σ + (1/σ³) ∑_{i=1}^{n} (xi − µ)²

d/dµ l = 0 and d/dσ l = 0 give

∑_{i=1}^{n} (xi − µ) = 0 and −nσ² + ∑_{i=1}^{n} (xi − µ)² = 0

hence

µ̂_ML = (1/n) ∑_{i=1}^{n} xi = x̄ and σ̂²_ML = (1/n) ∑_{i=1}^{n} (xi − x̄)²

NB: The ML estimator of the variance is biased. This bias tends to zero as n −→ ∞.
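A quick simulation sketch of this bias, comparing the ML estimator (divisor n) with the usual unbiased estimator (divisor n − 1); the parameter values are assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(7)
mu, sigma, n, reps = 0.0, 2.0, 5, 200_000

x = rng.normal(mu, sigma, size=(reps, n))
var_ml = x.var(axis=1, ddof=0)        # (1/n) Σ (xi − x̄)², the MLE
var_unbiased = x.var(axis=1, ddof=1)  # divisor n − 1

print(var_ml.mean())        # ≈ σ²(n−1)/n = 3.2, biased downwards
print(var_unbiased.mean())  # ≈ σ² = 4.0
```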


Remarks on the MLE

The rule “likelihood = product of marginal densities” applies only when the observations are independent.

Taking the log:

linearizes the product into a sum

greatly simplifies the mathematical expressions when the density is proportional to exp(a x^b) x^c

avoids numerical instabilities in numerical computation (see the sketch below)
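The last point is easy to demonstrate (a sketch with standard normal data; scipy assumed): the raw product of many densities underflows long before the sum of logs does.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
x = rng.normal(size=1000)

print(np.prod(norm.pdf(x)))    # 0.0: the product underflows
print(np.sum(norm.logpdf(x)))  # a perfectly ordinary finite number
```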


Remarks on the MLE (cont.)

Deriving the MLE in closed form is often impossible in real-life problems. One has to resort to numerical optimization; hence the importance of optimization methods in statistics.

If the parameter θ belongs to a discrete set, differentiating l(θ) is meaningless. One has to resort to discrete optimization methods.
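For instance, the tram example from the introduction has a discrete parameter N: with a single observation x = 16 from U{1, ..., N}, the likelihood can simply be evaluated on a grid (a sketch; the grid range is an arbitrary choice):

```python
import numpy as np

x = 16                       # the observed tram line number
N_grid = np.arange(1, 201)   # candidate values of N

# Likelihood under U{1, ..., N}: P(X = x) = 1/N if N ≥ x, else 0.
lik = np.where(N_grid >= x, 1.0 / N_grid, 0.0)
N_hat = N_grid[np.argmax(lik)]
print(N_hat)                 # 16: the MLE is the observed number
```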

The likelihood L(x1, ..., xn; θ) is sometimes denoted L(θ|x1, ..., xn).

This notation is misleading and mathematically wrong, since in likelihood theory θ is not a random variable.


Reading


To go beyond these slides, you can read the first two chapters of In All Likelihood, Yudi Pawitan, Oxford Science Publications, 2001. This book is not in the DTU digital library, but it is almost entirely available on Google Books.


Exercises


1 We assume that we have recorded the life durations of n light bulbs, denoted x1, ..., xn. We assume that they are n iid replicates of an exponential E(α) distribution. Derive analytically the expression of the MLE of α.

2 Derive analytically the MLE of a for a dataset consisting of n iid replicates of a U[0, a] distribution. Evaluate the bias of this estimator. What is the limit of the bias when n tends to +∞?

3 For a distribution fθ, the expectation of X under fθ can be expressed as a function φ(θ). The moment method consists in identifying φ(θ) with the empirical mean. Apply this principle to the case above and discuss the resulting estimator: bias, variance, other remarks?
