Parameter estimation, maximum likelihood and least squares techniques
Jorge Andre Swieca School
Campos do Jordão, January 2003
Third lecture
References
• Statistics: A Guide to the Use of Statistical Methods in the Physical Sciences, R. Barlow, J. Wiley & Sons, 1989.
• Statistical Data Analysis, G. Cowan, Oxford, 1998.
• Particle Data Group (PDG), Review of Particle Physics, 2002 electronic edition.
• Data Analysis: Statistical and Computational Methods for Scientists and Engineers, S. Brandt, Third Edition, Springer, 1999.
Likelihood
“Likelihood (…) is very often the whole truth.” Conclusão de Bento – Machado de Assis
“Whoever heard her would take it all for truth, such was the sincere tone, the sweetness of the words and the likelihood of the details.” Quincas Borba – Machado de Assis
Parameter estimation
p.d.f. f(x): the sample space is the set of all possible values of x.
Sample of size n: independent observations $\vec{x} = (x_1, x_2, \ldots, x_n)$.
Joint p.d.f.: $f_{\rm sam}(x_1, \ldots, x_n) = f(x_1)\, f(x_2) \cdots f(x_n)$.
Central problem of statistics: from n measurements of x, infer properties of $f(x;\theta)$, $\theta = (\theta_1, \theta_2, \ldots, \theta_m)$.
A statistic: a function of the observed $\vec{x}$.
To estimate properties of the p.d.f. (mean, variance, …): estimator.
Estimator for $\theta$: $\hat{\theta}$. A consistent estimator satisfies $\lim_{n\to\infty} P(|\hat{\theta}-\theta| > \varepsilon) = 0$ (large sample or asymptotic limit).
Parameter estimation
$\hat{\theta}(x_1, \ldots, x_n)$ is a random variable distributed as $g(\hat{\theta};\theta)$ (sampling distribution):
$E[\hat{\theta}(\vec{x})] = \int \hat{\theta}\, g(\hat{\theta};\theta)\, d\hat{\theta} = \int\!\cdots\!\int \hat{\theta}(\vec{x})\, f(x_1;\theta) \cdots f(x_n;\theta)\, dx_1 \cdots dx_n$
(infinite number of similar experiments of size n).
Bias: $b \equiv E[\hat{\theta}] - \theta$. It depends on:
• the sample size
• the functional form of the estimator
• the true properties of the p.d.f.
If b = 0 independently of n: $\hat{\theta}$ is unbiased.
Important to combine results of two or more experiments.
Parameter estimation
Mean square error: $MSE = E[(\hat{\theta}-\theta)^2] = E[(\hat{\theta}-E[\hat{\theta}])^2] + (E[\hat{\theta}]-\theta)^2$, i.e. $MSE = V[\hat{\theta}] + b^2$.
Classical statistics: there is no unique method for building estimators; given an estimator, one can evaluate its properties.
Sample mean
From $\vec{x} = (x_1, x_2, \ldots, x_n)$, supposed to come from an unknown p.d.f. $f(x)$: estimator for $E[x] = \mu$ (population mean). One possibility:
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Parameter estimation
Important property: the weak law of large numbers. If V[x] exists, $\bar{x}$ is a consistent estimator for $\mu$: as $n\to\infty$, $\bar{x}\to\mu$ in the sense of probability.
$E[\bar{x}] = E\!\left[\frac{1}{n}\sum_{i=1}^{n} x_i\right] = \frac{1}{n}\sum_{i=1}^{n} E[x_i] = \frac{1}{n}\sum_{i=1}^{n}\mu = \mu$,
where $E[x_i] = \int\!\cdots\!\int x_i\, f(x_1)\cdots f(x_n)\, dx_1\cdots dx_n = \mu$.
$\bar{x}$ is an unbiased estimator for the population mean $\mu$.
Parameter estimation
Sample variance:
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2 = \frac{n}{n-1}\left(\overline{x^2}-\bar{x}^{\,2}\right)$, with $E[s^2] = \sigma^2$:
$s^2$ is an unbiased estimator for $V[x]$.
If $\mu$ is known:
$S^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2$
$S^2$ is an unbiased estimator for $\sigma^2$.
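A minimal numerical sketch of these estimators (not part of the original slides), assuming NumPy; the true values and the generated sample are hypothetical:

import numpy as np

rng = np.random.default_rng(1)
mu_true, sigma_true = 2.0, 0.5                 # hypothetical true parameters
x = rng.normal(mu_true, sigma_true, size=100)  # sample of size n = 100
n = len(x)

x_bar = x.sum() / n                            # sample mean: unbiased estimator of mu
s2 = ((x - x_bar) ** 2).sum() / (n - 1)        # sample variance s^2: unbiased estimator of V[x]
S2 = ((x - mu_true) ** 2).sum() / n            # S^2: variance estimator when mu is known

print(f"x_bar = {x_bar:.3f}, s^2 = {s2:.3f}, S^2 = {S2:.3f}")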
Maximum likelihood
Technique for estimating parameters given a finite sample of data $\vec{x} = (x_1, x_2, \ldots, x_n)$.
Suppose the functional form of $f(x;\theta)$ is known.
The probability for $x_1$ to be in $[x_1, x_1+dx_1]$ is $f(x_1;\theta)\,dx_1$; the probability for $x_i$ to be in $[x_i, x_i+dx_i]$ for all i is $\prod_{i=1}^{n} f(x_i;\theta)\,dx_i$.
If the parameters are correct: high probability for the data.
$L(\theta) = \prod_{i=1}^{n} f(x_i;\theta)$: the likelihood function (the joint probability, with $\theta$ treated as the variables and $\vec{x}$ as parameters).
ML estimators for $\theta$: maximize the likelihood function,
$\frac{\partial L}{\partial\theta_i} = 0, \quad i = 1, \ldots, m \;\Rightarrow\; \hat{\theta} = (\hat{\theta}_1, \ldots, \hat{\theta}_m)$
Maximum likelihood
n decay times for unstable particles: $t_1, \ldots, t_n$.
Hypothesis: the distribution is an exponential p.d.f. with mean $\tau$,
$f(t;\tau) = \frac{1}{\tau}\, e^{-t/\tau}$
$\log L(\tau) = \sum_{i=1}^{n} \log f(t_i;\tau) = \sum_{i=1}^{n}\left(\log\frac{1}{\tau} - \frac{t_i}{\tau}\right)$
$\frac{\partial \log L(\tau)}{\partial\tau} = 0 \;\Rightarrow\; \hat{\tau} = \frac{1}{n}\sum_{i=1}^{n} t_i$
$E[\hat{\tau}(t_1,\ldots,t_n)] = \int\!\cdots\!\int \hat{\tau}(t_1,\ldots,t_n)\, f_{\rm joint}(t_1,\ldots,t_n;\tau)\, dt_1\cdots dt_n$
$= \frac{1}{n}\sum_{i=1}^{n}\int\!\cdots\!\int t_i\, \frac{1}{\tau}e^{-t_i/\tau}\prod_{j\neq i}\frac{1}{\tau}e^{-t_j/\tau}\, dt_1\cdots dt_n = \frac{1}{n}\sum_{i=1}^{n}\tau = \tau$
so $\hat{\tau}$ is an unbiased estimator of $\tau$.
Maximum likelihood
[Figure: exponential p.d.f. fitted to a sample of 50 decay times; true value $\tau = 1.0$, ML estimate $\hat{\tau} = 1.062$.]
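A short sketch of this example, assuming NumPy; the seed and the simulated sample are illustrative, so the result will differ from the 1.062 quoted in the figure:

import numpy as np

rng = np.random.default_rng(2)
tau_true = 1.0
t = rng.exponential(tau_true, size=50)   # 50 simulated decay times

tau_hat = t.mean()                       # ML estimate: the sample mean of the decay times
print(f"tau_hat = {tau_hat:.3f}")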
Maximum likelihood
What about $\lambda = 1/\tau$?
For a function $a(\theta)$, given $\hat{\theta}$:
$\frac{\partial L}{\partial a} = \frac{\partial L}{\partial\theta}\,\frac{\partial\theta}{\partial a} = 0 \;\Rightarrow\; \hat{a} = a(\hat{\theta})$
$\hat{\lambda} = \left(\frac{1}{n}\sum_{i=1}^{n} t_i\right)^{-1} = \frac{1}{\hat{\tau}}$
$E[\hat{\lambda}] = \frac{n}{n-1}\,\lambda \neq \lambda$: biased for finite n, but an unbiased estimator for $\lambda$ when $n\to\infty$.
Maximum likelihood
n measurements of x assumed to come from a Gaussian:
$f(x;\mu,\sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$
$\log L(\mu,\sigma^2) = \sum_{i=1}^{n}\log f(x_i;\mu,\sigma^2) = \sum_{i=1}^{n}\left(-\frac{1}{2}\log 2\pi - \frac{1}{2}\log\sigma^2 - \frac{(x_i-\mu)^2}{2\sigma^2}\right)$
$\frac{\partial\log L}{\partial\mu} = 0 \;\Rightarrow\; \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i$, with $E[\hat{\mu}] = \mu$: unbiased.
$\frac{\partial\log L}{\partial\sigma^2} = 0 \;\Rightarrow\; \widehat{\sigma^2} = \frac{1}{n}\sum_{i=1}^{n}(x_i-\hat{\mu})^2$, with $E[\widehat{\sigma^2}] = \frac{n-1}{n}\,\sigma^2$: unbiased only for large n.
Maximum likelihood
We showed that $s^2$ is an unbiased estimator for the variance of any p.d.f., so
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\hat{\mu})^2$
is an unbiased estimator for $\sigma^2$.
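A minimal sketch of the Gaussian case (NumPy assumed; the sample is made up), comparing the biased ML variance estimate with the bias-corrected $s^2$:

import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(10.0, 2.0, size=20)             # hypothetical Gaussian sample, n = 20
n = len(x)

mu_hat = x.mean()                              # ML estimate of mu (unbiased)
sigma2_ml = ((x - mu_hat) ** 2).sum() / n      # ML estimate of sigma^2, biased by a factor (n-1)/n
s2 = ((x - mu_hat) ** 2).sum() / (n - 1)       # bias-corrected estimator s^2

print(f"mu_hat = {mu_hat:.3f}, sigma2_ml = {sigma2_ml:.3f}, s^2 = {s2:.3f}")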
Maximum likelihood
Variance of ML estimators
Many experiments (same n): what is the spread of $\hat{\tau}$? Analytically (exponential), with $\hat{\tau} = \frac{1}{n}\sum_{i=1}^{n} t_i$:
$V[\hat{\tau}] = E[\hat{\tau}^{\,2}] - (E[\hat{\tau}])^2$
$= \int\!\cdots\!\int\left(\frac{1}{n}\sum_{i=1}^{n} t_i\right)^{\!2}\prod_{j=1}^{n}\frac{1}{\tau}e^{-t_j/\tau}\, dt_1\cdots dt_n - \left[\int\!\cdots\!\int\left(\frac{1}{n}\sum_{i=1}^{n} t_i\right)\prod_{j=1}^{n}\frac{1}{\tau}e^{-t_j/\tau}\, dt_1\cdots dt_n\right]^2 = \frac{\tau^2}{n}$
By the transformation invariance of ML estimators, the ML estimate of $\sigma^2_{\hat{\tau}} = \tau^2/n$ is
$\widehat{\sigma^2_{\hat{\tau}}} = \frac{\hat{\tau}^{\,2}}{n}, \qquad \hat{\sigma}_{\hat{\tau}} = \frac{\hat{\tau}}{\sqrt{n}}$
Maximum likelihood
If the experiment were repeated many times (with the same n), the standard deviation of the estimates would be 0.43.
• this is one possible interpretation of the error
• it is not the standard when the distribution is not Gaussian (68% confidence interval, $\pm$ one standard deviation if the p.d.f. for the estimator is Gaussian)
• in the large sample limit, ML estimates are distributed according to a Gaussian p.d.f.
• the two procedures then lead to the same result
Maximum likelihood
Variance: MC method
Cases too difficult to solve analytically: the MC method.
• simulate a large number of experiments
• compute the ML estimate each time
• look at the distribution of the resulting values
$s^2$ is an unbiased estimator for the variance of a p.d.f.; s from the MC experiments gives the statistical error of the parameter estimated from the real measurement.
Asymptotic normality: a general property of ML estimators for large samples.
Maximum likelihood
[Figure: distribution of $\hat{\tau}$ from 1000 MC experiments with 50 observations per experiment; sample standard deviation s = 0.151, to be compared with $\hat{\sigma}_{\hat{\tau}} = \hat{\tau}/\sqrt{n} = 1.062/\sqrt{50} = 0.150$.]
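A sketch of the MC procedure just described (NumPy assumed; the seed is arbitrary, so the result only approximately reproduces s = 0.151):

import numpy as np

rng = np.random.default_rng(4)
tau_true, n_obs, n_exp = 1.0, 50, 1000

# 1000 simulated experiments of 50 decay times each; the ML estimate is the mean of each row
tau_hats = rng.exponential(tau_true, size=(n_exp, n_obs)).mean(axis=1)

s = tau_hats.std(ddof=1)                 # sample standard deviation of the 1000 estimates
print(f"s = {s:.3f}  (compare with tau_true/sqrt(n_obs) = {tau_true/np.sqrt(n_obs):.3f})")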
RCF bound
A way to estimate the variance of any estimator without analytical calculations or MC:
$V[\hat{\theta}] \;\geq\; \left(1 + \frac{\partial b}{\partial\theta}\right)^{\!2} \Big/\; E\!\left[-\frac{\partial^2\log L}{\partial\theta^2}\right]$ (Rao-Cramér-Fréchet inequality)
Equality (minimum variance): the estimator is efficient. If efficient estimators exist for a problem, ML will find them.
ML estimators are always efficient in the large sample limit.
Ex.: exponential, $f(t;\tau) = \frac{1}{\tau}\, e^{-t/\tau}$:
$\frac{\partial^2\log L}{\partial\tau^2} = \frac{n}{\tau^2}\left(1 - \frac{2\hat{\tau}}{\tau}\right) \;\Rightarrow\; E\!\left[-\frac{\partial^2\log L}{\partial\tau^2}\right] = \frac{n}{\tau^2}$
With b = 0:
$V[\hat{\tau}] \;\geq\; \frac{\tau^2}{n}$, equal to the exact result: $\hat{\tau}$ is an efficient estimator.
RCF bound
$\theta = (\theta_1, \ldots, \theta_m)$: assume efficiency and zero bias; with $V_{ij} = {\rm cov}[\hat{\theta}_i, \hat{\theta}_j]$,
$(V^{-1})_{ij} = E\!\left[-\frac{\partial^2\log L}{\partial\theta_i\,\partial\theta_j}\right] = -\int\!\cdots\!\int\left(\sum_{k=1}^{n}\frac{\partial^2\log f(x_k;\theta)}{\partial\theta_i\,\partial\theta_j}\right)\prod_{l=1}^{n} f(x_l;\theta)\, dx_1\cdots dx_n = -\,n\int\frac{\partial^2\log f(x;\theta)}{\partial\theta_i\,\partial\theta_j}\, f(x;\theta)\, dx$
$V \propto \frac{1}{n}$, so the statistical errors scale as $\frac{1}{\sqrt{n}}$.
RCF bound
Large data sample: evaluate the second derivative with the measured data and the ML estimates,
$(\widehat{V^{-1}})_{ij} = -\frac{\partial^2\log L}{\partial\theta_i\,\partial\theta_j}\bigg|_{\theta=\hat{\theta}}$, and for a single parameter $\widehat{\sigma^2_{\hat{\theta}}} = \left(-\frac{\partial^2\log L}{\partial\theta^2}\bigg|_{\hat{\theta}}\right)^{\!-1}$
This is the usual method for estimating the covariance matrix when the likelihood function is maximized numerically.
Ex.: MINUIT (CERN library):
• finite differences
• invert the matrix to get $V_{ij}$
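A hedged sketch of this numerical recipe (not the MINUIT implementation): approximate the second derivative of log L by finite differences at the ML estimate and invert it, here for the one-parameter exponential example; NumPy assumed and the sample simulated:

import numpy as np

rng = np.random.default_rng(5)
t = rng.exponential(1.0, size=50)            # hypothetical decay-time sample

def log_L(tau):
    # log-likelihood of the exponential model: sum_i (-log tau - t_i / tau)
    return np.sum(-np.log(tau) - t / tau)

tau_hat = t.mean()                           # ML estimate (known analytically here)
h = 1e-4 * tau_hat                           # finite-difference step
d2 = (log_L(tau_hat + h) - 2 * log_L(tau_hat) + log_L(tau_hat - h)) / h**2

var_hat = -1.0 / d2                          # variance estimate: inverse of -d^2 log L / d tau^2
print(f"sigma_tau_hat = {np.sqrt(var_hat):.3f}, tau_hat/sqrt(n) = {tau_hat/np.sqrt(len(t)):.3f}")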
Graphical method
Single parameter $\theta$: expand $\log L(\theta)$ about its maximum,
$\log L(\theta) = \log L(\hat{\theta}) + \frac{\partial\log L}{\partial\theta}\bigg|_{\hat{\theta}}(\theta-\hat{\theta}) + \frac{1}{2!}\,\frac{\partial^2\log L}{\partial\theta^2}\bigg|_{\hat{\theta}}(\theta-\hat{\theta})^2 + \ldots$
$\log L(\theta) \approx \log L_{\max} - \frac{(\theta-\hat{\theta})^2}{2\,\hat{\sigma}^2_{\hat{\theta}}}$
$\log L(\hat{\theta}\pm\hat{\sigma}_{\hat{\theta}}) = \log L_{\max} - \frac{1}{2}$
i.e. moving θ one standard deviation away from its ML estimate decreases log L by 1/2 from its maximum value.
Later: $[\hat{\theta}-\hat{\sigma}_{-}, \hat{\theta}+\hat{\sigma}_{+}]$ is a 68.3% central confidence interval.
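A sketch of the graphical method for the same exponential example (NumPy assumed; sample simulated): scan log L(τ) and read off where it has dropped by 1/2 from its maximum.

import numpy as np

rng = np.random.default_rng(6)
t = rng.exponential(1.0, size=50)        # hypothetical decay-time sample
tau_hat = t.mean()

def log_L(tau):
    return np.sum(-np.log(tau) - t / tau)

taus = np.linspace(0.5 * tau_hat, 2.0 * tau_hat, 2001)
delta = np.array([log_L(tau) for tau in taus]) - log_L(tau_hat)

inside = taus[delta >= -0.5]             # values of tau with log L >= log L_max - 1/2
print(f"tau_hat = {tau_hat:.3f}, interval = [{inside.min():.3f}, {inside.max():.3f}]")

The interval found this way is in general asymmetric about $\hat{\tau}$, unlike the symmetric $\pm\hat{\sigma}$ obtained from the second derivative.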
ML with two parameters
Angular distribution for the scattering angle θ (x = cos θ) in a particle reaction:
$f(x;\alpha,\beta) = \frac{1 + \alpha x + \beta x^2}{2 + 2\beta/3}$, normalized for $-1 \le x \le +1$.
Realistic measurements are only possible in $x_{\min} \le x \le x_{\max}$:
$f(x;\alpha,\beta) = \frac{1 + \alpha x + \beta x^2}{(x_{\max}-x_{\min}) + \frac{\alpha}{2}(x_{\max}^2-x_{\min}^2) + \frac{\beta}{3}(x_{\max}^3-x_{\min}^3)}$
ML with two parameters
[Figure: ML fit of the angular distribution to 2000 events; true values $\alpha = 0.5$, $\beta = 0.5$; estimates $\hat{\alpha} = 0.508 \pm 0.052$, $\hat{\beta} = 0.466 \pm 0.108$.]
ML with two parameters
[Figure: distributions of $\hat{\alpha}$ and $\hat{\beta}$ from 500 MC experiments with 2000 events each. Both marginal p.d.f.'s are approximately Gaussian, with means $\langle\hat{\alpha}\rangle = 0.499$, $\langle\hat{\beta}\rangle = 0.498$, standard deviations $s_{\hat{\alpha}} = 0.051$, $s_{\hat{\beta}} = 0.111$, and correlation coefficient r = 0.42.]
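A sketch of such a two-parameter fit by numerically maximizing log L (assuming NumPy and SciPy; the cut values x_min = -0.95, x_max = 0.95, the seed and the generated sample are illustrative assumptions, not the data behind the figures):

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)
alpha_true, beta_true = 0.5, 0.5
x_min, x_max = -0.95, 0.95                    # assumed acceptance region

def f(x, alpha, beta):
    # p.d.f. (1 + alpha*x + beta*x^2), normalized on [x_min, x_max]
    norm = (x_max - x_min) + alpha / 2 * (x_max**2 - x_min**2) + beta / 3 * (x_max**3 - x_min**3)
    return (1 + alpha * x + beta * x**2) / norm

# generate 2000 events by accept-reject; the density is largest at x_max for these parameters
events = []
f_max = f(x_max, alpha_true, beta_true)
while len(events) < 2000:
    x_try = rng.uniform(x_min, x_max)
    if rng.uniform(0.0, f_max) < f(x_try, alpha_true, beta_true):
        events.append(x_try)
events = np.array(events)

def neg_log_L(p):
    alpha, beta = p
    dens = f(events, alpha, beta)
    if np.any(dens <= 0):                     # penalize parameter values giving a negative density
        return 1e10
    return -np.sum(np.log(dens))

res = minimize(neg_log_L, x0=[0.1, 0.1], method="Nelder-Mead")
print("alpha_hat, beta_hat =", res.x)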
Least squares
Measured values $y_i$: Gaussian random variables centered about the quantity's true value $\lambda(x_i;\theta)$, for $(x_i, y_i)$, $i = 1, \ldots, n$:
$g(y_1,\ldots,y_n;\lambda_1,\ldots,\lambda_n,\sigma_1,\ldots,\sigma_n) = \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi\sigma_i^2}}\exp\!\left(-\frac{(y_i-\lambda_i)^2}{2\sigma_i^2}\right)$
To estimate the $\theta$:
$\log L(\theta) = -\frac{1}{2}\sum_{i=1}^{n}\frac{(y_i-\lambda(x_i;\theta))^2}{\sigma_i^2} + \text{terms not depending on } \theta$
L is maximized by the $\theta$ that minimize
$\chi^2(\theta) = \sum_{i=1}^{n}\frac{(y_i-\lambda(x_i;\theta))^2}{\sigma_i^2}$
Least squares
$\chi^2(\theta)$ is used to define the procedure even if the $y_i$ are not Gaussian.
If the measurements are not independent but are described by an n-dimensional Gaussian p.d.f. with known covariance matrix V and unknown mean values:
$\log L(\theta) = -\frac{1}{2}\sum_{i,j=1}^{n}\left(y_i-\lambda(x_i;\theta)\right)(V^{-1})_{ij}\left(y_j-\lambda(x_j;\theta)\right) + \ldots$
$\chi^2(\theta) = \sum_{i,j=1}^{n}\left(y_i-\lambda(x_i;\theta)\right)(V^{-1})_{ij}\left(y_j-\lambda(x_j;\theta)\right)$
LS estimators: $\hat{\theta}_1, \ldots, \hat{\theta}_m$ minimize $\chi^2(\theta)$.
Least squares
$\lambda(x;\theta) = \sum_{j=1}^{m}\theta_j\, a_j(x)$, with the $a_j(x)$ linearly independent.
• the estimators and their variances can be found analytically
• the estimators have zero bias and minimum variance
$\lambda(x_i;\theta) = \sum_{j=1}^{m} a_j(x_i)\,\theta_j = \sum_{j=1}^{m} A_{ij}\,\theta_j$
$\chi^2(\theta) = (\vec{y}-A\theta)^T\, V^{-1}\, (\vec{y}-A\theta)$
Minimum: $\nabla_{\theta}\,\chi^2 = -2\left(A^T V^{-1}\vec{y} - A^T V^{-1}A\,\theta\right) = 0$
$\hat{\theta} = (A^T V^{-1} A)^{-1} A^T V^{-1}\,\vec{y} \equiv B\,\vec{y}$
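A sketch of this matrix solution for a straight-line fit $\lambda(x;\theta) = \theta_1 + \theta_2 x$ (NumPy assumed; the data points and errors are made up for illustration):

import numpy as np

# made-up measurements y_i at points x_i with independent errors sigma_i
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.9, 4.1, 5.8, 8.2, 9.9])
sigma = np.array([0.3, 0.3, 0.4, 0.4, 0.5])

A = np.column_stack([np.ones_like(x), x])    # a_1(x) = 1, a_2(x) = x
Vinv = np.linalg.inv(np.diag(sigma**2))      # inverse covariance matrix of the y_i

U = np.linalg.inv(A.T @ Vinv @ A)            # covariance matrix of the estimators
theta_hat = U @ A.T @ Vinv @ y               # LS estimators (A^T V^-1 A)^-1 A^T V^-1 y

chi2_min = (y - A @ theta_hat) @ Vinv @ (y - A @ theta_hat)
print("theta_hat =", theta_hat)
print("errors    =", np.sqrt(np.diag(U)))
print("chi2_min  =", chi2_min)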
Least squares
Covariance matrix for the estimators, $U_{ij} = {\rm cov}[\hat{\theta}_i, \hat{\theta}_j]$:
$U = B V B^T = (A^T V^{-1} A)^{-1}$
$(U^{-1})_{ij} = \frac{1}{2}\,\frac{\partial^2\chi^2}{\partial\theta_i\,\partial\theta_j}\bigg|_{\hat{\theta}}$
This coincides with the RCF bound for the inverse covariance matrix if the $y_i$ are Gaussian distributed ($\chi^2 = -2\log L + \text{const.}$).
Least squares
If $\lambda(x;\theta)$ is linear in $\theta$, then $\chi^2(\theta)$ is quadratic in $\theta$:
$\chi^2(\theta) = \chi^2(\hat{\theta}) + \frac{1}{2}\sum_{i,j=1}^{m}\frac{\partial^2\chi^2}{\partial\theta_i\,\partial\theta_j}\bigg|_{\hat{\theta}}\,(\theta_i-\hat{\theta}_i)(\theta_j-\hat{\theta}_j)$
To interpret this, take a single parameter $\theta$:
$\chi^2(\theta) = \chi^2(\hat{\theta}) + \frac{(\theta-\hat{\theta})^2}{\widehat{\sigma^2_{\hat{\theta}}}}$
$\chi^2(\hat{\theta}\pm\hat{\sigma}_{\hat{\theta}}) = \chi^2_{\min} + 1$
Chi-squared distribution
$f(z;n) = \frac{1}{2^{n/2}\,\Gamma(n/2)}\, z^{n/2-1}\, e^{-z/2}$, with n = 1, 2, … and $0 \le z < \infty$
n = number of degrees of freedom; $\Gamma(n) = (n-1)!$, $\Gamma(x+1) = x\,\Gamma(x)$.
$E[z] = \int_0^{\infty} z\, f(z;n)\, dz = n$
$V[z] = \int_0^{\infty}(z-n)^2\, f(z;n)\, dz = 2n$
For n independent Gaussian random variables $x_i$ with known $\mu_i$ and $\sigma_i^2$,
$z = \sum_{i=1}^{n}\frac{(x_i-\mu_i)^2}{\sigma_i^2}$ is distributed as a $\chi^2$ for n degrees of freedom.
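A quick MC check of this statement (NumPy assumed; the $\mu_i$, $\sigma_i$ and the seed are arbitrary choices): build z from n independent Gaussians many times and compare its mean and variance with n and 2n:

import numpy as np

rng = np.random.default_rng(8)
n = 5
mu = np.arange(1.0, n + 1)                    # arbitrary known means
sigma = np.linspace(0.5, 1.5, n)              # arbitrary known standard deviations

x = rng.normal(mu, sigma, size=(100000, n))   # 100000 sets of n Gaussian variables
z = (((x - mu) / sigma) ** 2).sum(axis=1)     # z = sum_i (x_i - mu_i)^2 / sigma_i^2

print(f"mean of z = {z.mean():.2f} (expect {n}),  variance of z = {z.var():.2f} (expect {2*n})")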