Rao-Cramér-Fréchet (RCF) bound on the minimum variance (without proof)
The variance of an estimator θ̂ of a single parameter is bounded from below; the estimator is called "efficient" when the bound is exactly attained.
• When an efficient estimator exists, it is an ML estimator
• All ML estimators are efficient for n → ∞
Example: exponential distribution, f(t;τ) = (1/τ) e^(−t/τ), with an unbiased estimator (b = 0):
6. Max. Likelihood 6.1 Maximum likelihood method
K. Desch – Statistical methods of data analysis SS10
$$V[\hat\theta] \;\ge\; \frac{\left(1 + \frac{\partial b}{\partial\theta}\right)^2}{E\!\left[-\,\frac{\partial^2 \log L}{\partial\theta^2}\right]}$$

For an unbiased, efficient estimator (b = 0):

$$V[\hat\theta] = \frac{1}{E\!\left[-\,\partial^2 \log L/\partial\theta^2\right]}$$

Exponential distribution:

$$\log L(\tau) = \sum_{i=1}^{n}\left(-\log\tau - \frac{t_i}{\tau}\right) = -n\log\tau - \frac{1}{\tau}\sum_{i=1}^{n} t_i,
\qquad
\frac{\partial^2 \log L}{\partial\tau^2} = \frac{n}{\tau^2} - \frac{2}{\tau^3}\sum_{i=1}^{n} t_i$$

$$E\!\left[-\,\frac{\partial^2\log L}{\partial\tau^2}\right] = -\frac{n}{\tau^2} + \frac{2}{\tau^3}\,n\tau = \frac{n}{\tau^2}
\quad\Rightarrow\quad
V[\hat\tau] = \frac{\tau^2}{n}$$

The ML estimator τ̂ = (1/n) Σ tᵢ attains this bound, hence it is efficient.
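This can be checked numerically; a minimal sketch, assuming a hypothetical true lifetime τ = 2 and n = 10000 draws:

```python
import math
import random

random.seed(1)
tau_true = 2.0     # hypothetical true lifetime
n = 10000

# ML estimator for the exponential distribution: tau_hat = mean of the t_i
t = [random.expovariate(1.0 / tau_true) for _ in range(n)]
tau_hat = sum(t) / n

# RCF bound for the efficient estimator: V[tau_hat] = tau^2 / n
sigma_rcf = tau_true / math.sqrt(n)
print(tau_hat, sigma_rcf)
```

The estimate lands within a few multiples of the RCF standard deviation τ/√n of the true value.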
Several parameters :
RCF:
“Fisher information matrix”: V⁻¹ ~ n and V ~ 1/n → statistical errors
decrease in proportion to 1/√n (for efficient estimators)
$$\left(V^{-1}\right)_{ij} = E\!\left[-\,\frac{\partial^2\log L}{\partial\theta_i\,\partial\theta_j}\right]
= \int\!\cdots\!\int \left(-\,\frac{\partial^2\log L}{\partial\theta_i\,\partial\theta_j}\right)\prod_{l=1}^{n} f(x_l;\vec\theta)\;dx_1\cdots dx_n
= n\int \left(-\,\frac{\partial^2\log f(x;\vec\theta)}{\partial\theta_i\,\partial\theta_j}\right) f(x;\vec\theta)\,dx$$
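The 1/√n scaling can be seen empirically; a sketch, assuming an exponential model with τ = 1 and repeated toy samples:

```python
import math
import random
import statistics

random.seed(7)
tau = 1.0   # hypothetical true parameter of an exponential p.d.f.

def tau_hat(n):
    # efficient ML estimator: the sample mean of n exponential draws
    return sum(random.expovariate(1.0 / tau) for _ in range(n)) / n

# spread of the estimator vs. sample size: expect sigma ~ tau / sqrt(n)
spreads = {}
for n in (100, 400, 1600):
    spreads[n] = statistics.stdev([tau_hat(n) for _ in range(500)])
    print(n, round(spreads[n], 4), round(tau / math.sqrt(n), 4))
```

Quadrupling n halves the spread of the estimates, as V ~ 1/n predicts.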
Estimation of the covariance matrix:
The expectation value is often difficult to calculate analytically.
Estimate it instead from the observed curvature of logL at the maximum:
For 1 parameter:
The second derivative can be determined analytically or numerically.
$$\left(V^{-1}\right)_{ij} = E\!\left[-\,\frac{\partial^2\log L}{\partial\theta_i\,\partial\theta_j}\right]
\;\longrightarrow\;
\left(\hat V^{-1}\right)_{ij} = -\left.\frac{\partial^2\log L}{\partial\theta_i\,\partial\theta_j}\right|_{\vec\theta=\hat{\vec\theta}}$$

For one parameter:

$$\hat\sigma^2_{\hat\theta} = \left(-\left.\frac{\partial^2\log L}{\partial\theta^2}\right|_{\theta=\hat\theta}\right)^{-1}$$
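A minimal sketch of the numerical route, assuming a small exponential sample (hypothetical numbers); the analytic result σ̂ = τ̂/√n serves as a cross-check:

```python
import math

# hypothetical data: exponential decay times
t = [0.5, 1.2, 0.3, 2.1, 0.9, 1.7, 0.4, 1.1]
n = len(t)

def logL(tau):
    # log-likelihood of the exponential model f(t; tau) = (1/tau) exp(-t/tau)
    return sum(-math.log(tau) - ti / tau for ti in t)

tau_hat = sum(t) / n          # analytic ML estimate
h = 1e-4                      # step for the numerical second derivative
d2 = (logL(tau_hat + h) - 2 * logL(tau_hat) + logL(tau_hat - h)) / h**2
sigma_hat = math.sqrt(-1.0 / d2)   # sigma^2 = (-d^2 logL/d tau^2)^(-1) at tau_hat
print(tau_hat, sigma_hat)
```

For this model the exact curvature at the maximum is −n/τ̂², so the numerical σ̂ reproduces τ̂/√n.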
Graphical method for the determination of the variance:
Expand logL(θ) in a Taylor series around the ML estimate θ̂:
We know that logL(θ̂) = logL_max and that the first derivative vanishes at θ̂;
from the RCF bound the curvature gives σ̂²_θ̂
(this can be taken as the definition of the statistical error)
For n → ∞ the logL(θ) function becomes a parabola (higher-order terms vanish)
When logL is not parabolic → asymmetric errors σ̂₋, σ̂₊;
[θ̂ − σ̂₋, θ̂ + σ̂₊] is a central confidence interval of ≈ 68%
$$\log L(\theta) = \log L(\hat\theta)
+ \left.\frac{\partial \log L}{\partial\theta}\right|_{\hat\theta}(\theta-\hat\theta)
+ \frac{1}{2}\left.\frac{\partial^2 \log L}{\partial\theta^2}\right|_{\hat\theta}(\theta-\hat\theta)^2 + \ldots$$

With $\log L(\hat\theta) = \log L_{max}$ and $\left.\partial\log L/\partial\theta\right|_{\hat\theta} = 0$:

$$\log L(\theta) \approx \log L_{max} - \frac{(\theta-\hat\theta)^2}{2\hat\sigma^2_{\hat\theta}}$$

$$\log L(\hat\theta \pm \hat\sigma_{\hat\theta}) = \log L_{max} - \frac{1}{2}$$
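The graphical method can be sketched numerically, assuming a small exponential sample (made-up numbers) so that logL is visibly non-parabolic; the crossings of logL_max − 1/2 give asymmetric errors:

```python
import math

# hypothetical sample; small n, so logL is visibly non-parabolic
t = [0.3, 1.9, 0.8, 0.2, 1.1]
n = len(t)
S = sum(t)

def logL(tau):
    return -n * math.log(tau) - S / tau

tau_hat = S / n
target = logL(tau_hat) - 0.5   # logL(tau_hat +- sigma) = logL_max - 1/2

def crossing(lo, hi):
    # bisect for the point where logL(tau) = target on [lo, hi]
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if (logL(mid) - target) * (logL(lo) - target) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

lo_edge = crossing(0.1 * tau_hat, tau_hat)   # left crossing
hi_edge = crossing(tau_hat, 10 * tau_hat)    # right crossing
sig_minus, sig_plus = tau_hat - lo_edge, hi_edge - tau_hat
print(sig_minus, sig_plus)
```

For this sample the positive error comes out clearly larger than the negative one, the typical asymmetric-error situation.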
Extended maximum likelihood (EML)
The model makes predictions on both the normalization and the form of the distribution;
example: a differential cross section (angular distribution).
The absolute normalization n is Poisson-distributed with mean ν; ν itself is a function of θ.
Extended likelihood function:
1. When ν = ν(θ), the logarithm of the EML:
This in general leads to a smaller variance (smaller error) for θ̂, since the estimator exploits the additional information from the normalization.
$$P(n;\nu) = \frac{\nu^{n}}{n!}\,e^{-\nu}$$

$$L(\nu,\theta) = \frac{\nu^{n}}{n!}\,e^{-\nu}\prod_{i=1}^{n} f(x_i;\theta)
= \frac{e^{-\nu}}{n!}\prod_{i=1}^{n} \nu\, f(x_i;\theta)$$

$$\log L(\theta) = n\log\nu(\theta) - \nu(\theta) + \sum_{i=1}^{n}\log f(x_i;\theta)
= -\,\nu(\theta) + \sum_{i=1}^{n}\log\big[\nu(\theta)\, f(x_i;\theta)\big]$$

(θ-independent terms dropped)
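Case 1 can be sketched with a toy model (not from the lecture) in which θ fixes both the rate, ν(θ) = 10θ, and the shape, f(x;θ) = (1/θ)·exp(−x/θ); a grid scan of the extended log-likelihood then uses both pieces of information:

```python
import math

# toy model (assumption): theta sets the expected count nu(theta) = 10*theta
# and the shape f(x; theta) = (1/theta) * exp(-x/theta)
x = [0.8, 1.4, 0.5, 2.2, 1.0, 0.7, 1.9, 1.3]   # hypothetical measurements
n = len(x)

def logL_eml(theta):
    nu = 10.0 * theta
    shape = sum(-math.log(theta) - xi / theta for xi in x)
    return n * math.log(nu) - nu + shape        # Poisson term + shape term

grid = [0.3 + 0.001 * k for k in range(2000)]
theta_hat = max(grid, key=logL_eml)
print(theta_hat, sum(x) / n)   # EML estimate vs. shape-only estimate mean(x)
```

The shape alone would give θ̂ = mean(x); the Poisson normalization term pulls the EML estimate away from it, illustrating that the normalization carries extra information about θ.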
2. ν is a parameter independent of θ:
from ∂logL/∂ν = 0 one obtains ν̂ = n and the same estimators for θ as in ordinary ML.
Contributions to a distribution:
f₁, f₂, f₃, … are known; the relative contributions are not. p.d.f.:
The parameters are not independent, since Σᵢ θᵢ = 1!
One can determine only m−1 parameters by ML.
A symmetric treatment — EML:
$$\frac{\partial\log L}{\partial\nu} = \frac{n}{\nu} - 1 = 0 \;\Rightarrow\; \hat\nu = n$$

$$f(x;\vec\theta) = \sum_{i=1}^{m}\theta_i f_i(x), \qquad \theta_m = 1 - \sum_{i=1}^{m-1}\theta_i$$

$$\log L(\nu,\vec\theta) = -\,\nu + \sum_{i=1}^{n}\log\Big[\nu\sum_{j=1}^{m}\theta_j f_j(x_i)\Big]$$
By defining μᵢ = θᵢ ν (μᵢ = expected number of events of type i),
the μᵢ are no longer subject to a constraint; each μⱼ is a Poisson mean.
Example: signal + background
EML fit of μ_s, μ_b → μ̂_s can become negative.
Setting μ̂_s′ = 0 when μ̂_s < 0 → bias → a combination of many experiments then has a bias!
$$\log L(\vec\mu) = -\sum_{j=1}^{m}\mu_j + \sum_{i=1}^{n}\log\Big[\sum_{j=1}^{m}\mu_j f_j(x_i)\Big]$$

$$\nu f(x) = \mu_s f_s(x) + \mu_b f_b(x), \qquad \nu = \mu_s + \mu_b$$
Maximum likelihood with binned data
For very large data samples the log-likelihood function becomes expensive to compute → fill the measurements into a histogram, yielding a vector of numbers of entries n = (n₁, …, n_N) in N bins. The expectation values
ν = (ν₁, …, ν_N) of the numbers of entries are given by:
where xᵢ^min and xᵢ^max are the bin limits.
The likelihood function is then:
(the term Σᵢ log nᵢ! does not depend on θ and can be dropped)
It is simpler to interpret logL relative to its maximum, attained at νᵢ = nᵢ.
$$\nu_i(\vec\theta) = n_{tot}\int_{x_i^{min}}^{x_i^{max}} f(x;\vec\theta)\,dx$$

$$L(\vec\nu) = \prod_{i=1}^{N}\frac{\nu_i^{\,n_i}}{n_i!}\,e^{-\nu_i},
\qquad
\log L(\vec\theta) = \sum_{i=1}^{N}\big[n_i\log\nu_i(\vec\theta) - \nu_i(\vec\theta)\big] - \sum_{i=1}^{N}\log n_i!$$
then:
(when nᵢ = 0 the bracket [·] is replaced by −νᵢ)
when νᵢ ≈ nᵢ, expanding the logarithm:
(→ method of least squares)
For large n_tot the binned ML method gives only slightly larger errors
than unbinned ML.
Testing goodness-of-fit with maximum likelihood
The ML method does not directly suggest a method of testing goodness-of-fit. After a successful fit, the results must be checked for consistency.
$$\log L(\vec\theta) = \sum_{i=1}^{N}\Big[n_i\log\frac{\nu_i(\vec\theta)}{n_i} + n_i - \nu_i(\vec\theta)\Big] + const$$

For $\nu_i \approx n_i$, using $\log\frac{\nu_i}{n_i} \approx \frac{\nu_i-n_i}{n_i} - \frac{(\nu_i-n_i)^2}{2 n_i^2}$:

$$\log L(\vec\theta) \approx -\frac{1}{2}\sum_{i=1}^{N}\frac{\big(n_i-\nu_i(\vec\theta)\big)^2}{n_i} + const$$
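The two forms can be compared numerically; a sketch with made-up counts and a one-parameter normalization model νᵢ(θ) = θ·wᵢ:

```python
import math

# hypothetical binned data: observed counts n_i and a one-parameter model
# nu_i(theta) = theta * w_i with made-up bin weights w_i (sum w_i = 1)
n_i = [25, 40, 18, 9]
w = [0.28, 0.42, 0.20, 0.10]

def logL(theta):
    # Poisson binned log-likelihood, theta-independent terms dropped
    return sum(n * math.log(theta * wi) - theta * wi for n, wi in zip(n_i, w))

def gauss_approx(theta):
    # least-squares form: -(1/2) sum (n_i - nu_i)^2 / n_i
    return -0.5 * sum((n - theta * wi) ** 2 / n for n, wi in zip(n_i, w))

grid = [60 + 0.01 * k for k in range(6000)]
theta_ml = max(grid, key=logL)          # exact binned ML
theta_ls = max(grid, key=gauss_approx)  # least-squares approximation
print(theta_ml, theta_ls)
```

With bin counts of this size the two estimates agree to a fraction of a percent, as the expansion above suggests.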
Possibilities:
1. MC study of L_max:
a) estimate the distribution of log L_max from many MC experiments
b) histogram log L_max
c) compare with the observed L_max from the real experiment
d) calculate the “P-value” = probability to obtain a log L_max as small as (or smaller than) the one observed
If the model is correct, E[P] = 50%.
[Figure: p.d.f. of log L_max from the MC experiments, with the observed log L_max marked; the P-value is the tail area beyond the observed value]
2. χ²-test:
divide the sample into N bins and build:
(n_tot free (Poisson): N − m degrees of freedom)
or: (n_tot fixed: N − m − 1 degrees of freedom)
For large n_tot the statistics given above follow exactly a χ² distribution,
and the P-value is
$$\chi^2 = \sum_{i=1}^{N}\frac{(n_i - \hat\nu_i)^2}{\hat\nu_i}
\qquad\text{or}\qquad
\chi^2 = \sum_{i=1}^{N}\frac{(n_i - n_{tot}\,\hat p_i)^2}{n_{tot}\,\hat p_i}$$

$$P = \mathrm{Prob}(\chi^2 \ge \chi^2_{obs};\, n_{dof}) = \int_{\chi^2_{obs}}^{\infty} f(\chi^2; n_{dof})\,d\chi^2$$
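A sketch of the P-value computation, with made-up counts and fitted expectations, integrating the χ² p.d.f. tail numerically (no external libraries):

```python
import math

# hypothetical observed counts n_i and fitted expectations nu_hat_i
n_i    = [24, 39, 20, 9]
nu_hat = [25.8, 38.6, 18.4, 9.2]
m = 1                       # number of fitted parameters (assumed)
ndof = len(n_i) - m         # n_tot treated as free (Poisson case)

chi2 = sum((n - v) ** 2 / v for n, v in zip(n_i, nu_hat))

def chi2_pdf(z, k):
    # p.d.f. of the chi-square distribution with k degrees of freedom
    return z ** (k / 2 - 1) * math.exp(-z / 2) / (2 ** (k / 2) * math.gamma(k / 2))

# P-value = Prob(chi^2 >= chi2_obs; ndof): trapezoid rule over the tail,
# truncated at chi2 + 50 where the integrand is negligible
hi, steps = chi2 + 50.0, 20000
h = (hi - chi2) / steps
xs = [chi2 + i * h for i in range(steps + 1)]
p = h * (sum(chi2_pdf(z, ndof) for z in xs)
         - 0.5 * (chi2_pdf(chi2, ndof) + chi2_pdf(hi, ndof)))
print(chi2, p)
```

In practice one would use a library routine (e.g. a χ² survival function) instead of hand-rolled integration; the explicit integral just mirrors the formula above.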
Combining measurements with maximum likelihood
Example: an experiment with n measurements xᵢ, p.d.f. f(x;θ);
a second experiment with m measurements yᵢ, p.d.f. f(y;θ) (same parameter θ)
→ common likelihood function:
or equivalently the sum of the log-likelihoods:
→ a common estimate of θ through the maximum of L(θ)
Suppose the two experiments, based on measurements of x and y, give the estimators θ̂ₓ and θ̂_y. One reports the estimated standard deviations σ̂ₓ and σ̂_y as the errors on θ̂ₓ and θ̂_y.
When θ̂ₓ and θ̂_y are independent, the combination is the weighted mean:
$$L(\theta) = \prod_{i=1}^{n} f(x_i;\theta)\;\prod_{i=1}^{m} f(y_i;\theta) = L_x(\theta)\,L_y(\theta)$$

$$\log L(\theta) = \log L_x(\theta) + \log L_y(\theta)$$

$$\hat\theta = \frac{\hat\theta_x/\hat\sigma_x^2 + \hat\theta_y/\hat\sigma_y^2}{1/\hat\sigma_x^2 + 1/\hat\sigma_y^2},
\qquad
\frac{1}{\hat\sigma^2} = \frac{1}{\hat\sigma_x^2} + \frac{1}{\hat\sigma_y^2}$$
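The weighted combination can be sketched with hypothetical numbers:

```python
# hypothetical results of two independent experiments
theta_x, sigma_x = 1.45, 0.10
theta_y, sigma_y = 1.60, 0.15

wx, wy = 1 / sigma_x ** 2, 1 / sigma_y ** 2
theta = (wx * theta_x + wy * theta_y) / (wx + wy)  # inverse-variance weighted mean
sigma = (wx + wy) ** -0.5                          # 1/sigma^2 = 1/sigma_x^2 + 1/sigma_y^2
print(theta, sigma)
```

The combined error is smaller than either individual error, and the combined value lies closer to the more precise measurement.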