Rao-Cramér-Fréchet (RCF) bound on the minimum variance (without proof)
The variance of an estimator θ̂ of a single parameter is bounded from below; the estimator is called "efficient" when the bound is exactly attained.
• When an efficient estimator exists, it is an ML estimator
• All ML estimators are efficient for n → ∞
Example: exponential distribution, f(t;τ) = (1/τ) e^(−t/τ), with an unbiased estimator (b = 0):
6. Max. Likelihood 6.1 Maximum likelihood method
K. Desch – Statistical methods of data analysis SS10
$$V[\hat\theta] \;\ge\; \frac{\left(1 + \frac{\partial b}{\partial\theta}\right)^2}{E\!\left[-\,\frac{\partial^2 \log L}{\partial\theta^2}\right]}$$

For an unbiased, efficient estimator (b = 0):

$$V[\hat\theta] = \frac{1}{E\!\left[-\,\partial^2 \log L/\partial\theta^2\right]}$$

Exponential distribution:

$$\log L(\tau) = \sum_{i=1}^{n}\left(-\log\tau - \frac{t_i}{\tau}\right) = -n\log\tau - \frac{1}{\tau}\sum_{i=1}^{n} t_i,
\qquad
\frac{\partial^2 \log L}{\partial\tau^2} = \frac{n}{\tau^2} - \frac{2}{\tau^3}\sum_{i=1}^{n} t_i$$

$$E\!\left[-\,\frac{\partial^2\log L}{\partial\tau^2}\right] = -\frac{n}{\tau^2} + \frac{2}{\tau^3}\,n\tau = \frac{n}{\tau^2}
\quad\Rightarrow\quad
V[\hat\tau] = \frac{\tau^2}{n}$$

The ML estimator τ̂ = (1/n) Σ tᵢ attains this bound, hence it is efficient.
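This can be checked numerically; a minimal sketch, assuming a hypothetical true lifetime τ = 2 and n = 10000 draws:

```python
import math
import random

random.seed(1)
tau_true = 2.0     # hypothetical true lifetime
n = 10000

# ML estimator for the exponential distribution: tau_hat = mean of the t_i
t = [random.expovariate(1.0 / tau_true) for _ in range(n)]
tau_hat = sum(t) / n

# RCF bound for the efficient estimator: V[tau_hat] = tau^2 / n
sigma_rcf = tau_true / math.sqrt(n)
print(tau_hat, sigma_rcf)
```

The estimate lands within a few multiples of the RCF standard deviation τ/√n of the true value.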
Several parameters :
RCF:
“Fisher information matrix”: V⁻¹ ~ n and V ~ 1/n → statistical errors
decrease in proportion to 1/√n (for efficient estimators)
$$\left(V^{-1}\right)_{ij} = E\!\left[-\,\frac{\partial^2\log L}{\partial\theta_i\,\partial\theta_j}\right]
= \int\!\cdots\!\int \left(-\,\frac{\partial^2\log L}{\partial\theta_i\,\partial\theta_j}\right)\prod_{l=1}^{n} f(x_l;\vec\theta)\;dx_1\cdots dx_n
= n\int \left(-\,\frac{\partial^2\log f(x;\vec\theta)}{\partial\theta_i\,\partial\theta_j}\right) f(x;\vec\theta)\,dx$$
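The 1/√n scaling can be seen empirically; a sketch, assuming an exponential model with τ = 1 and repeated toy samples:

```python
import math
import random
import statistics

random.seed(7)
tau = 1.0   # hypothetical true parameter of an exponential p.d.f.

def tau_hat(n):
    # efficient ML estimator: the sample mean of n exponential draws
    return sum(random.expovariate(1.0 / tau) for _ in range(n)) / n

# spread of the estimator vs. sample size: expect sigma ~ tau / sqrt(n)
spreads = {}
for n in (100, 400, 1600):
    spreads[n] = statistics.stdev([tau_hat(n) for _ in range(500)])
    print(n, round(spreads[n], 4), round(tau / math.sqrt(n), 4))
```

Quadrupling n halves the spread of the estimates, as V ~ 1/n predicts.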
Estimation of the covariance matrix:
The expectation value is often difficult to calculate analytically.
Estimate it instead from the observed curvature of logL at the maximum:
For 1 parameter:
The second derivative can be determined analytically or numerically.
$$\left(V^{-1}\right)_{ij} = E\!\left[-\,\frac{\partial^2\log L}{\partial\theta_i\,\partial\theta_j}\right]
\;\longrightarrow\;
\left(\hat V^{-1}\right)_{ij} = -\left.\frac{\partial^2\log L}{\partial\theta_i\,\partial\theta_j}\right|_{\vec\theta=\hat{\vec\theta}}$$

For one parameter:

$$\hat\sigma^2_{\hat\theta} = \left(-\left.\frac{\partial^2\log L}{\partial\theta^2}\right|_{\theta=\hat\theta}\right)^{-1}$$
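A minimal sketch of the numerical route, assuming a small exponential sample (hypothetical numbers); the analytic result σ̂ = τ̂/√n serves as a cross-check:

```python
import math

# hypothetical data: exponential decay times
t = [0.5, 1.2, 0.3, 2.1, 0.9, 1.7, 0.4, 1.1]
n = len(t)

def logL(tau):
    # log-likelihood of the exponential model f(t; tau) = (1/tau) exp(-t/tau)
    return sum(-math.log(tau) - ti / tau for ti in t)

tau_hat = sum(t) / n          # analytic ML estimate
h = 1e-4                      # step for the numerical second derivative
d2 = (logL(tau_hat + h) - 2 * logL(tau_hat) + logL(tau_hat - h)) / h**2
sigma_hat = math.sqrt(-1.0 / d2)   # sigma^2 = (-d^2 logL/d tau^2)^(-1) at tau_hat
print(tau_hat, sigma_hat)
```

For this model the exact curvature at the maximum is −n/τ̂², so the numerical σ̂ reproduces τ̂/√n.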
Graphical method for the determination of the variance:
Expand logL(θ) in a Taylor series around the ML estimate θ̂:
We know that logL(θ̂) = logL_max and that the first derivative vanishes at θ̂;
from the RCF bound the curvature gives σ̂²_θ̂
(this can be taken as the definition of the statistical error)
For n → ∞ the logL(θ) function becomes a parabola (higher-order terms vanish)
When logL is not parabolic → asymmetric errors σ̂₋, σ̂₊;
[θ̂ − σ̂₋, θ̂ + σ̂₊] is a central confidence interval of ≈ 68%
$$\log L(\theta) = \log L(\hat\theta)
+ \left.\frac{\partial \log L}{\partial\theta}\right|_{\hat\theta}(\theta-\hat\theta)
+ \frac{1}{2}\left.\frac{\partial^2 \log L}{\partial\theta^2}\right|_{\hat\theta}(\theta-\hat\theta)^2 + \ldots$$

With $\log L(\hat\theta) = \log L_{max}$ and $\left.\partial\log L/\partial\theta\right|_{\hat\theta} = 0$:

$$\log L(\theta) \approx \log L_{max} - \frac{(\theta-\hat\theta)^2}{2\hat\sigma^2_{\hat\theta}}$$

$$\log L(\hat\theta \pm \hat\sigma_{\hat\theta}) = \log L_{max} - \frac{1}{2}$$
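The graphical method can be sketched numerically, assuming a small exponential sample (made-up numbers) so that logL is visibly non-parabolic; the crossings of logL_max − 1/2 give asymmetric errors:

```python
import math

# hypothetical sample; small n, so logL is visibly non-parabolic
t = [0.3, 1.9, 0.8, 0.2, 1.1]
n = len(t)
S = sum(t)

def logL(tau):
    return -n * math.log(tau) - S / tau

tau_hat = S / n
target = logL(tau_hat) - 0.5   # logL(tau_hat +- sigma) = logL_max - 1/2

def crossing(lo, hi):
    # bisect for the point where logL(tau) = target on [lo, hi]
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if (logL(mid) - target) * (logL(lo) - target) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

lo_edge = crossing(0.1 * tau_hat, tau_hat)   # left crossing
hi_edge = crossing(tau_hat, 10 * tau_hat)    # right crossing
sig_minus, sig_plus = tau_hat - lo_edge, hi_edge - tau_hat
print(sig_minus, sig_plus)
```

For this sample the positive error comes out clearly larger than the negative one, the typical asymmetric-error situation.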
Extended maximum likelihood (EML)
The model makes predictions on both the normalization and the form of the distribution;
example: a differential cross section (angular distribution).
The absolute normalization n is Poisson-distributed with mean ν; ν itself is a function of θ.
Extended likelihood function:
1. When ν = ν(θ), the logarithm of the EML:
This in general leads to a smaller variance (smaller error) for θ̂, since the estimator exploits the additional information from the normalization.
$$P(n;\nu) = \frac{\nu^{n}}{n!}\,e^{-\nu}$$

$$L(\nu,\theta) = \frac{\nu^{n}}{n!}\,e^{-\nu}\prod_{i=1}^{n} f(x_i;\theta)
= \frac{e^{-\nu}}{n!}\prod_{i=1}^{n} \nu\, f(x_i;\theta)$$

$$\log L(\theta) = n\log\nu(\theta) - \nu(\theta) + \sum_{i=1}^{n}\log f(x_i;\theta)
= -\,\nu(\theta) + \sum_{i=1}^{n}\log\big[\nu(\theta)\, f(x_i;\theta)\big]$$

(θ-independent terms dropped)
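Case 1 can be sketched with a toy model (not from the lecture) in which θ fixes both the rate, ν(θ) = 10θ, and the shape, f(x;θ) = (1/θ)·exp(−x/θ); a grid scan of the extended log-likelihood then uses both pieces of information:

```python
import math

# toy model (assumption): theta sets the expected count nu(theta) = 10*theta
# and the shape f(x; theta) = (1/theta) * exp(-x/theta)
x = [0.8, 1.4, 0.5, 2.2, 1.0, 0.7, 1.9, 1.3]   # hypothetical measurements
n = len(x)

def logL_eml(theta):
    nu = 10.0 * theta
    shape = sum(-math.log(theta) - xi / theta for xi in x)
    return n * math.log(nu) - nu + shape        # Poisson term + shape term

grid = [0.3 + 0.001 * k for k in range(2000)]
theta_hat = max(grid, key=logL_eml)
print(theta_hat, sum(x) / n)   # EML estimate vs. shape-only estimate mean(x)
```

The shape alone would give θ̂ = mean(x); the Poisson normalization term pulls the EML estimate away from it, illustrating that the normalization carries extra information about θ.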
2. ν is a parameter independent of θ:
from ∂logL/∂ν = 0 one obtains ν̂ = n and the same estimators for θ as in ordinary ML.
Contributions to a distribution:
f₁, f₂, f₃, … are known; the relative contributions are not. p.d.f.:
The parameters are not independent, since Σᵢ θᵢ = 1!
One can determine only m−1 parameters by ML.
A symmetric treatment — EML:
$$\frac{\partial\log L}{\partial\nu} = \frac{n}{\nu} - 1 = 0 \;\Rightarrow\; \hat\nu = n$$

$$f(x;\vec\theta) = \sum_{i=1}^{m}\theta_i f_i(x), \qquad \theta_m = 1 - \sum_{i=1}^{m-1}\theta_i$$

$$\log L(\nu,\vec\theta) = -\,\nu + \sum_{i=1}^{n}\log\Big[\nu\sum_{j=1}^{m}\theta_j f_j(x_i)\Big]$$
By defining μᵢ = θᵢ ν (μᵢ = expected number of events of type i),
the μᵢ are no longer subject to a constraint; each μⱼ is a Poisson mean.
Example: signal + background
EML fit of μ_s, μ_b → μ̂_s can become negative.
Setting μ̂_s′ = 0 when μ̂_s < 0 → bias → a combination of many experiments then has a bias!
$$\log L(\vec\mu) = -\sum_{j=1}^{m}\mu_j + \sum_{i=1}^{n}\log\Big[\sum_{j=1}^{m}\mu_j f_j(x_i)\Big]$$

$$\nu f(x) = \mu_s f_s(x) + \mu_b f_b(x), \qquad \nu = \mu_s + \mu_b$$
Maximum likelihood with binned data
For very large data samples the log-likelihood function becomes expensive to compute → fill the measurements into a histogram, yielding a vector of numbers of entries n = (n₁, …, n_N) in N bins. The expectation values
ν = (ν₁, …, ν_N) of the numbers of entries are given by:
where xᵢ^min and xᵢ^max are the bin limits.
The likelihood function is then:
(the term Σᵢ log nᵢ! does not depend on θ and can be dropped)
It is simpler to interpret logL relative to its maximum, attained at νᵢ = nᵢ.
$$\nu_i(\vec\theta) = n_{tot}\int_{x_i^{min}}^{x_i^{max}} f(x;\vec\theta)\,dx$$

$$L(\vec\nu) = \prod_{i=1}^{N}\frac{\nu_i^{\,n_i}}{n_i!}\,e^{-\nu_i},
\qquad
\log L(\vec\theta) = \sum_{i=1}^{N}\big[n_i\log\nu_i(\vec\theta) - \nu_i(\vec\theta)\big] - \sum_{i=1}^{N}\log n_i!$$
then:
(when nᵢ = 0 the bracket [·] is replaced by −νᵢ)
when νᵢ ≈ nᵢ, expanding the logarithm:
(→ method of least squares)
For large n_tot the binned ML method gives only slightly larger errors
than unbinned ML.
Testing goodness-of-fit with maximum likelihood
The ML method does not directly suggest a method of testing goodness-of-fit. After a successful fit, the results must be checked for consistency.
$$\log L(\vec\theta) = \sum_{i=1}^{N}\Big[n_i\log\frac{\nu_i(\vec\theta)}{n_i} + n_i - \nu_i(\vec\theta)\Big] + const$$

For $\nu_i \approx n_i$, using $\log\frac{\nu_i}{n_i} \approx \frac{\nu_i-n_i}{n_i} - \frac{(\nu_i-n_i)^2}{2 n_i^2}$:

$$\log L(\vec\theta) \approx -\frac{1}{2}\sum_{i=1}^{N}\frac{\big(n_i-\nu_i(\vec\theta)\big)^2}{n_i} + const$$
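The two forms can be compared numerically; a sketch with made-up counts and a one-parameter normalization model νᵢ(θ) = θ·wᵢ:

```python
import math

# hypothetical binned data: observed counts n_i and a one-parameter model
# nu_i(theta) = theta * w_i with made-up bin weights w_i (sum w_i = 1)
n_i = [25, 40, 18, 9]
w = [0.28, 0.42, 0.20, 0.10]

def logL(theta):
    # Poisson binned log-likelihood, theta-independent terms dropped
    return sum(n * math.log(theta * wi) - theta * wi for n, wi in zip(n_i, w))

def gauss_approx(theta):
    # least-squares form: -(1/2) sum (n_i - nu_i)^2 / n_i
    return -0.5 * sum((n - theta * wi) ** 2 / n for n, wi in zip(n_i, w))

grid = [60 + 0.01 * k for k in range(6000)]
theta_ml = max(grid, key=logL)          # exact binned ML
theta_ls = max(grid, key=gauss_approx)  # least-squares approximation
print(theta_ml, theta_ls)
```

With bin counts of this size the two estimates agree to a fraction of a percent, as the expansion above suggests.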
Possibilities:
1. MC study of L_max:
a) estimate the distribution of log L_max from many MC experiments
b) histogram log L_max
c) compare with the observed L_max from the real experiment
d) calculate the “P-value” = probability to obtain a log L_max as small as (or smaller than) the one observed
If the model is correct, E[P] = 50%.
[Figure: p.d.f. of log L_max from the MC experiments, with the observed log L_max marked; the P-value is the tail area beyond the observed value]
2. χ²-test:
divide the sample into N bins and build:
(n_tot free (Poisson): N − m degrees of freedom)
or: (n_tot fixed: N − m − 1 degrees of freedom)
For large n_tot the statistics given above follow exactly a χ² distribution,
and the P-value is
$$\chi^2 = \sum_{i=1}^{N}\frac{(n_i - \hat\nu_i)^2}{\hat\nu_i}
\qquad\text{or}\qquad
\chi^2 = \sum_{i=1}^{N}\frac{(n_i - n_{tot}\,\hat p_i)^2}{n_{tot}\,\hat p_i}$$

$$P = \mathrm{Prob}(\chi^2 \ge \chi^2_{obs};\, n_{dof}) = \int_{\chi^2_{obs}}^{\infty} f(\chi^2; n_{dof})\,d\chi^2$$
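A sketch of the P-value computation, with made-up counts and fitted expectations, integrating the χ² p.d.f. tail numerically (no external libraries):

```python
import math

# hypothetical observed counts n_i and fitted expectations nu_hat_i
n_i    = [24, 39, 20, 9]
nu_hat = [25.8, 38.6, 18.4, 9.2]
m = 1                       # number of fitted parameters (assumed)
ndof = len(n_i) - m         # n_tot treated as free (Poisson case)

chi2 = sum((n - v) ** 2 / v for n, v in zip(n_i, nu_hat))

def chi2_pdf(z, k):
    # p.d.f. of the chi-square distribution with k degrees of freedom
    return z ** (k / 2 - 1) * math.exp(-z / 2) / (2 ** (k / 2) * math.gamma(k / 2))

# P-value = Prob(chi^2 >= chi2_obs; ndof): trapezoid rule over the tail,
# truncated at chi2 + 50 where the integrand is negligible
hi, steps = chi2 + 50.0, 20000
h = (hi - chi2) / steps
xs = [chi2 + i * h for i in range(steps + 1)]
p = h * (sum(chi2_pdf(z, ndof) for z in xs)
         - 0.5 * (chi2_pdf(chi2, ndof) + chi2_pdf(hi, ndof)))
print(chi2, p)
```

In practice one would use a library routine (e.g. a χ² survival function) instead of hand-rolled integration; the explicit integral just mirrors the formula above.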
Combining measurements with maximum likelihood
Example: an experiment with n measurements xᵢ, p.d.f. f(x;θ);
a second experiment with m measurements yᵢ, p.d.f. f(y;θ) (same parameter θ)
→ common likelihood function:
or equivalently the sum of the log-likelihoods:
→ a common estimate of θ through the maximum of L(θ)
Suppose the two experiments, based on measurements of x and y, give the estimators θ̂ₓ and θ̂_y. One reports the estimated standard deviations σ̂ₓ and σ̂_y as the errors on θ̂ₓ and θ̂_y.
When θ̂ₓ and θ̂_y are independent, the combination is the weighted mean:
$$L(\theta) = \prod_{i=1}^{n} f(x_i;\theta)\;\prod_{i=1}^{m} f(y_i;\theta) = L_x(\theta)\,L_y(\theta)$$

$$\log L(\theta) = \log L_x(\theta) + \log L_y(\theta)$$

$$\hat\theta = \frac{\hat\theta_x/\hat\sigma_x^2 + \hat\theta_y/\hat\sigma_y^2}{1/\hat\sigma_x^2 + 1/\hat\sigma_y^2},
\qquad
\frac{1}{\hat\sigma^2} = \frac{1}{\hat\sigma_x^2} + \frac{1}{\hat\sigma_y^2}$$
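The weighted combination can be sketched with hypothetical numbers:

```python
# hypothetical results of two independent experiments
theta_x, sigma_x = 1.45, 0.10
theta_y, sigma_y = 1.60, 0.15

wx, wy = 1 / sigma_x ** 2, 1 / sigma_y ** 2
theta = (wx * theta_x + wy * theta_y) / (wx + wy)  # inverse-variance weighted mean
sigma = (wx + wy) ** -0.5                          # 1/sigma^2 = 1/sigma_x^2 + 1/sigma_y^2
print(theta, sigma)
```

The combined error is smaller than either individual error, and the combined value lies closer to the more precise measurement.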