Statistical estimation
Statistical modelling: theory and practice
Gilles Guillot
September 3, 2013
1 Introductory example
2 Principles of estimation
3 Likelihood theory
4 Reading
5 Exercises
Introductory example
A batch of 1000 electronic components contains some faulty items. One takes a sample of size 100 with replacement, of which 3 are faulty. What is the proportion of faulty items in the batch?
Arriving in a new city, you see a tram passing in the street with the number 16. How many tram lines are there in this city?
For a set of measurements y1, ..., yn of temperatures at dates t1, ..., tn observed at a certain location, we want to fit a line y = at + b. What are the values a and b that best fit the data?
A common set up
We have some data.
There is a "mechanism" that generates the data.
This mechanism depends on an unknown parameter that we want to estimate.
The Statistics way
We relate the unknown parameter to the data by means of a probability distribution.
Proportion of faulty items: the number of faulty items in the sample can be assumed to follow a binomial distribution B(n, p).
Tram lines: the number observed can be assumed to follow a uniform distribution U{1, ..., N}.
The sketch below illustrates this idea numerically.
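A minimal sketch of this idea in Python (assuming scipy is available; the candidate parameter values are made up for illustration):

```python
from scipy.stats import binom

# Faulty-items example: probability of seeing x = 3 faulty items in a
# sample of n = 100, under a few candidate values of the unknown p.
n, x = 100, 3
for p in (0.01, 0.03, 0.10):
    print(f"p = {p:.2f}: P(X = 3) = {binom.pmf(x, n, p):.4f}")

# Tram example: with N lines numbered 1..N, observing line 16 has
# probability 1/N for any N >= 16 (and probability 0 for N < 16).
for N in (16, 20, 50):
    print(f"N = {N}: P(line 16 observed) = {1 / N:.4f}")
```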
Principles of estimation
Estimator, estimate
Estimator
Denoting generically θ the unknown parameter value, an estimator is a rule (or algorithm) allowing us to "guess" θ from the data. From a mathematical point of view, it is a function

R^n −→ R^d
(x1, ..., xn) ↦ θ̂
d is the dimension of the parameter space (often d = 1 for us)
Since we assume that the data are random, we will often stress this by denoting them (X1, ..., Xn).
The number θ̂ is an estimate of θ. Viewed as a function of the random sample, it is a random variable, sometimes denoted θ̂(X1, ..., Xn).
Bias of an estimator
Definition: bias
The bias of an estimator is the average discrepancy between the estimate and the true parameter value:

Bias(θ̂) = E[θ̂ − θ] = E[θ̂] − θ

An estimator is said to be unbiased if Bias(θ̂) = 0.
Precision and accuracy
Precision and accuracy are two concepts that belong to science and engineering, best explained by the usual target figure: precise shots cluster tightly (possibly off-centre), accurate shots centre on the target. [Figure omitted.]
In statistics, we have two related concepts: variance and mean square error.
Definition: variance of an estimator

V[θ̂] = E[(θ̂ − E[θ̂])²]

V[θ̂] is a measure of how much θ̂ is scattered around its mean (which may differ from the true value θ).

Definition: mean square error of an estimator

MSE[θ̂] = E[(θ̂ − θ)²]

MSE[θ̂] is a measure of how much θ̂ is scattered around the true value θ.

When θ̂ is unbiased, E[θ̂] = θ, hence V[θ̂] = MSE[θ̂].
Confidence interval

Definition: confidence interval
A confidence interval at level (1 − α) ∈ [0, 1] is an interval that contains the true unknown parameter value θ with probability 1 − α.
Example: estimation of a proportion

We have a sample of n objects of which x are faulty. We estimate the unknown proportion p by p̂ = x/n.
Exercise: give the bias and variance of p̂. A simulation sketch for checking your answer follows.
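A Monte Carlo sketch (assuming numpy; the true p is an assumed value) that checks an analytic answer against the definitions of bias, variance, and MSE above:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 0.03           # sample size and (assumed) true proportion
reps = 100_000             # number of simulated samples

x = rng.binomial(n, p, size=reps)   # faulty count in each simulated sample
p_hat = x / n                       # estimator applied to each sample

print(f"bias     ~ {p_hat.mean() - p:+.2e}")
print(f"variance ~ {p_hat.var():.2e}")
print(f"MSE      ~ {np.mean((p_hat - p) ** 2):.2e}")
# For an unbiased estimator, variance and MSE should agree (previous slide).
```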
Likelihood theory
A new look at the probability of the data
We consider again the problem of estimating a proportion with binomial sampling.

The probability of observing a given sample containing x faulty objects is p^x (1 − p)^{n−x}.

If we consider p^x (1 − p)^{n−x} as a function of p, it can be interpreted as the likelihood of the unknown parameter p.

To acknowledge the dependence on p, we denote L(x; p) = p^x (1 − p)^{n−x}, or L(p) for short.
[Figure: the likelihood p^x (1 − p)^{n−x} plotted against p ∈ [0, 1]; x-axis: p, y-axis: probability of data.]
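A sketch (assuming numpy and matplotlib) that reproduces such a likelihood curve, with n = 100 and x = 3 as in the earlier example:

```python
import numpy as np
import matplotlib.pyplot as plt

n, x = 100, 3
p = np.linspace(0.001, 0.999, 500)
L = p**x * (1 - p)**(n - x)   # likelihood; binomial coefficient omitted

plt.plot(p, L)
plt.xlabel("p")
plt.ylabel("Probability of data")
plt.show()
```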
The maximum likelihood principle
The above suggests a method to estimate the unknown parameter p:
p̂ = Argmax_p p^x (1 − p)^{n−x}

p̂ is the parameter value that makes our data most probable.
It is known as the Maximum Likelihood Estimate of p and denoted p̂_ML.
Note that defining p̂ as Argmax_p (n choose x) p^x (1 − p)^{n−x} would lead to the same result, as the binomial coefficient (n choose x) does not depend on p.
Likelihood in a general statistical model
Definition: likelihood function

We consider a dataset consisting of n observations (x1, ..., xn).

We assume that the probability density function or probability mass function of (x1, ..., xn), denoted fθ(x1, ..., xn), is known up to an unknown parameter θ.

The likelihood function L is defined as

L(x1, ..., xn; θ) = fθ(x1, ..., xn)
Examples of likelihood functions
Poisson counts
We observe the number of phone calls at various calling centres over a given period and denote them by (x1, ..., xn).

We assume that the xi are independent realizations of a Poisson random variable Xi with parameter λ, i.e. P(Xi = x) = exp(−λ) λ^x / x!

NB: x ∈ N and λ ∈ R+

E[Xi] = λ and V[Xi] = λ
Likelihood for i.i.d Poisson observations
Remember: "Likelihood = probability of data for a given parameter value "
L(x1, ..., xn; λ) = ∏_{i=1}^n exp(−λ) λ^{x_i} / x_i!
 = exp(−nλ) λ^{∑_i x_i} / ∏_i x_i!
 ∝ exp(−nλ) λ^{∑_i x_i}
Likelihood for i.i.d Normal observations
"Likelihood = probability of data for a given parameter value "Parameter: θ = (µ, σ) ∈ R× R+
L(x1, ..., xn;µ, σ) =
n∏i=1
1
σ√
2πexp[−1
2
(xi − µ)2
σ2]
∝n∏i=1
1
σexp[−1
2
(xi − µ)2
σ2]
General maximum likelihood principle
Maximum likelihood estimator
We consider a dataset consisting of n observations (x1, ..., xn)
We assume that we know the likelihood function L(x1, ..., xn; θ) = fθ(x1, ..., xn).

The maximum likelihood estimator of θ is defined as

θ̂_ML(x1, ..., xn) = Argmax_θ L(x1, ..., xn; θ)
Deriving p̂ explicitly for the previous binomial sampling
We want to maximize L(p) = p^x (1 − p)^{n−x}. We could work on L(p) directly in this case, but let us denote l(p) = ln L(p).

l(p) = ln[p^x (1 − p)^{n−x}] = x ln p + (n − x) ln(1 − p)
l′(p) = x/p − (n − x)/(1 − p)

l′(p) = 0 for p = x/n.

p̂ = x/n is the estimate of p.

For a generic sample with random outcome X, p̂ = X/n is the estimator of p; it is a random variable. A numeric check follows.
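A quick numeric check of this closed-form result (assuming scipy; n and x as in the earlier example):

```python
import numpy as np
from scipy.optimize import minimize_scalar

n, x = 100, 3

def neg_log_lik(p):
    # -l(p) = -[x ln p + (n - x) ln(1 - p)]
    return -(x * np.log(p) + (n - x) * np.log(1 - p))

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(f"numeric MLE: {res.x:.5f}   closed form x/n: {x / n:.5f}")
```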
Maximum likelihood estimator for i.i.d Poisson observations
Omitting the term that does not depend on λ, we have

l(x1, ..., xn; λ) = ln L(x1, ..., xn; λ) = ln[exp(−nλ) λ^{∑_i x_i}]
 = −nλ + (∑_i x_i) ln λ

Hence d/dλ l(x1, ..., xn; λ) = −n + (∑_i x_i)/λ

And d/dλ l(x1, ..., xn; λ) = 0 ⟺ λ = (∑_i x_i)/n

The MLE of λ is λ̂_ML = (∑_i x_i)/n = x̄
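A numeric check on simulated data (assuming numpy and scipy; the true λ is an assumed value):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
lam_true = 4.2
x = rng.poisson(lam_true, size=50)   # simulated call counts

def neg_log_lik(lam):
    # -l(lambda) = -(-n*lambda + sum_i x_i * ln(lambda)), constant dropped
    return -(-x.size * lam + x.sum() * np.log(lam))

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 100.0), method="bounded")
print(f"numeric MLE: {res.x:.4f}   sample mean: {x.mean():.4f}")
```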
Likelihood for i.i.d Normal observations
"Likelihood = probability of data for a given parameter value "Parameter: θ = (µ, σ) ∈ R× R+
L(x1, ..., xn;µ, σ) =
n∏i=1
1
σ√
2πexp[−1
2
(xi − µ)2
σ2]
∝n∏i=1
1
σexp[−1
2
(xi − µ)2
σ2]
l(x1, ..., xn;µ, σ) =
n∑i=1
[− lnσ − 1
2
(xi − µ)2
σ2
]
= −n lnσ − 1
2σ2
n∑i=1
(xi − µ)2
d/dµ l(x1, ..., xn; µ, σ) = (1/σ²) ∑_{i=1}^n (x_i − µ)

d/dσ l(x1, ..., xn; µ, σ) = −n/σ + (1/σ³) ∑_{i=1}^n (x_i − µ)²

d/dµ l = 0 and d/dσ l = 0 give

∑_{i=1}^n (x_i − µ) = 0 and −nσ² + ∑_{i=1}^n (x_i − µ)² = 0

hence

µ̂_ML = (1/n) ∑_{i=1}^n x_i = x̄ and σ̂²_ML = (1/n) ∑_{i=1}^n (x_i − x̄)²

NB: The ML estimator of the variance is biased. This bias tends to zero as n −→ ∞, as the simulation sketch below illustrates.
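A simulation sketch of this bias (assuming numpy; the true µ and σ are assumed values). The exact bias of (1/n) ∑ (x_i − x̄)² is −σ²/n, which indeed vanishes as n grows:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma = 0.0, 1.0      # assumed true parameters
reps = 100_000            # number of simulated samples per n

for n in (5, 20, 100):
    x = rng.normal(mu, sigma, size=(reps, n))
    var_ml = x.var(axis=1, ddof=0)   # (1/n) sum (x_i - x_bar)^2 per sample
    print(f"n = {n:3d}: empirical bias ~ {var_ml.mean() - sigma**2:+.4f}"
          f"   exact -sigma^2/n = {-sigma**2 / n:+.4f}")
```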
Remarks on the MLE
The rule "likelihood = product of marginal densities" applies only when the observations are independent.

Taking the log:
linearizes the product into a sum
greatly simplifies the math when the density is proportional to exp(a x^b) x^c
avoids numerical instabilities (underflow) in numerical computation
Remarks on the MLE (cont'd)

Deriving the MLE in closed form is often impossible in real-life problems. One has to resort to numerical optimization, hence the importance of optimization methods in statistics; see the sketch below.
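A minimal sketch (assuming numpy and scipy) of numerical maximum likelihood for the Normal model above; optimizing ln σ keeps σ positive without explicit constraints:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
x = rng.normal(10.0, 2.0, size=200)   # simulated data, assumed true values

def neg_log_lik(theta):
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)         # optimize ln(sigma) so sigma > 0
    return np.sum(np.log(sigma) + (x - mu) ** 2 / (2 * sigma**2))

start = [x.mean(), np.log(x.std())]   # crude starting point from the data
res = minimize(neg_log_lik, x0=start, method="Nelder-Mead")
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
print(f"mu_hat = {mu_hat:.3f}, sigma_hat = {sigma_hat:.3f}")
```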
If the parameter θ belongs to a discrete set, differentiating l(θ) is meaningless. One has to resort to discrete optimization methods.
The likelihood L(x1, ..., xn; θ) is sometimes denoted L(θ|x1, ..., xn). This is misleading and mathematically completely wrong since, in likelihood theory, θ is not a random variable.
Reading
To go beyond these slides, you can read the first two chapters of In All Likelihood, Yudi Pawitan, Oxford Science Publications, 2001. This book is not in the DTU digital library, but it is available almost in full on Google Books.
Exercises
1 We assume that we have recorded the life durations of n light bulbs, denoted x1, ..., xn. We assume that they are n iid replicates of an exponential E(α) distribution. Derive analytically the expression of the MLE of α.

2 Derive analytically the MLE of a for a dataset consisting of n iid replicates of a U[0, a] distribution. Evaluate the bias of this estimator. What is the limit of the bias when n tends to +∞?

3 For a distribution fθ, the expectation of X under fθ can be expressed as a function φ(θ). The moment method consists in identifying φ(θ) with the empirical mean. Apply this principle to the case above and discuss the resulting estimator in terms of bias, variance, and any other properties you find relevant.