Parameter estimation, maximum likelihood and least squares techniques
Jorge Andre Swieca School
Campos do Jordão, January 2003
Third lecture
References
• Statistics: A Guide to the Use of Statistical Methods in the Physical Sciences, R. Barlow, J. Wiley & Sons, 1989.
• Statistical Data Analysis, G. Cowan, Oxford, 1998.
• Particle Data Group (PDG), Review of Particle Physics, 2002 electronic edition.
• Data Analysis: Statistical and Computational Methods for Scientists and Engineers, S. Brandt, Third Edition, Springer, 1999.
Likelihood
“Likelihood (…) is very often the whole truth.” Conclusão de Bento – Machado de Assis
“Whoever heard her would take it all for truth, such was the sincere tone, the sweetness of the words and the likelihood of the details.” Quincas Borba – Machado de Assis
Parameter estimation
p.d.f. f(x): the sample space is the set of all possible values of x.
Sample of size n: independent observations $\vec{x} = (x_1, x_2, \ldots, x_n)$.
Joint p.d.f.: $f_{\rm sam}(x_1, \ldots, x_n) = f(x_1)\, f(x_2) \cdots f(x_n)$.
Central problem of statistics: from n measurements of x, infer properties of $f(x;\theta)$, $\theta = (\theta_1, \theta_2, \ldots, \theta_m)$.
A statistic: a function of the observed $\vec{x}$.
To estimate properties of the p.d.f. (mean, variance, …): estimator.
Estimator for $\theta$: $\hat{\theta}$. A consistent estimator satisfies $\lim_{n\to\infty} P(|\hat{\theta}-\theta| > \varepsilon) = 0$ (large sample or asymptotic limit).
Parameter estimation
$\hat{\theta}(x_1, \ldots, x_n)$ is a random variable distributed as $g(\hat{\theta};\theta)$ (sampling distribution):
$E[\hat{\theta}(\vec{x})] = \int \hat{\theta}\, g(\hat{\theta};\theta)\, d\hat{\theta} = \int\!\cdots\!\int \hat{\theta}(\vec{x})\, f(x_1;\theta) \cdots f(x_n;\theta)\, dx_1 \cdots dx_n$
(infinite number of similar experiments of size n).
Bias: $b \equiv E[\hat{\theta}] - \theta$. It depends on:
• the sample size
• the functional form of the estimator
• the true properties of the p.d.f.
If b = 0 independently of n: $\hat{\theta}$ is unbiased.
Important to combine results of two or more experiments.
Parameter estimation
Mean square error: $MSE = E[(\hat{\theta}-\theta)^2] = E[(\hat{\theta}-E[\hat{\theta}])^2] + (E[\hat{\theta}]-\theta)^2$, i.e. $MSE = V[\hat{\theta}] + b^2$.
Classical statistics: there is no unique method for building estimators; given an estimator, one can evaluate its properties.
Sample mean
From $\vec{x} = (x_1, x_2, \ldots, x_n)$, supposed to come from an unknown p.d.f. $f(x)$: estimator for $E[x] = \mu$ (population mean). One possibility:
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Parameter estimation
Important property: the weak law of large numbers. If V[x] exists, $\bar{x}$ is a consistent estimator for $\mu$: as $n\to\infty$, $\bar{x}\to\mu$ in the sense of probability.
$E[\bar{x}] = E\!\left[\frac{1}{n}\sum_{i=1}^{n} x_i\right] = \frac{1}{n}\sum_{i=1}^{n} E[x_i] = \frac{1}{n}\sum_{i=1}^{n}\mu = \mu$,
where $E[x_i] = \int\!\cdots\!\int x_i\, f(x_1)\cdots f(x_n)\, dx_1\cdots dx_n = \mu$.
$\bar{x}$ is an unbiased estimator for the population mean $\mu$.
Parameter estimation
Sample variance:
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2 = \frac{n}{n-1}\left(\overline{x^2}-\bar{x}^{\,2}\right)$, with $E[s^2] = \sigma^2$:
$s^2$ is an unbiased estimator for $V[x]$.
If $\mu$ is known:
$S^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2$
$S^2$ is an unbiased estimator for $\sigma^2$.
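A minimal numerical sketch of these estimators (not part of the original slides), assuming NumPy; the true values and the generated sample are hypothetical:

import numpy as np

rng = np.random.default_rng(1)
mu_true, sigma_true = 2.0, 0.5                 # hypothetical true parameters
x = rng.normal(mu_true, sigma_true, size=100)  # sample of size n = 100
n = len(x)

x_bar = x.sum() / n                            # sample mean: unbiased estimator of mu
s2 = ((x - x_bar) ** 2).sum() / (n - 1)        # sample variance s^2: unbiased estimator of V[x]
S2 = ((x - mu_true) ** 2).sum() / n            # S^2: variance estimator when mu is known

print(f"x_bar = {x_bar:.3f}, s^2 = {s2:.3f}, S^2 = {S2:.3f}")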
Maximum likelihood
Technique for estimating parameters given a finite sample of data $\vec{x} = (x_1, x_2, \ldots, x_n)$.
Suppose the functional form of $f(x;\theta)$ is known.
The probability for $x_1$ to be in $[x_1, x_1+dx_1]$ is $f(x_1;\theta)\,dx_1$; the probability for $x_i$ to be in $[x_i, x_i+dx_i]$ for all i is $\prod_{i=1}^{n} f(x_i;\theta)\,dx_i$.
If the parameters are correct: high probability for the data.
$L(\theta) = \prod_{i=1}^{n} f(x_i;\theta)$: the likelihood function (the joint probability, with $\theta$ treated as the variables and $\vec{x}$ as parameters).
ML estimators for $\theta$: maximize the likelihood function,
$\frac{\partial L}{\partial\theta_i} = 0, \quad i = 1, \ldots, m \;\Rightarrow\; \hat{\theta} = (\hat{\theta}_1, \ldots, \hat{\theta}_m)$
Maximum likelihood
n decay times for unstable particles: $t_1, \ldots, t_n$.
Hypothesis: the distribution is an exponential p.d.f. with mean $\tau$,
$f(t;\tau) = \frac{1}{\tau}\, e^{-t/\tau}$
$\log L(\tau) = \sum_{i=1}^{n} \log f(t_i;\tau) = \sum_{i=1}^{n}\left(\log\frac{1}{\tau} - \frac{t_i}{\tau}\right)$
$\frac{\partial \log L(\tau)}{\partial\tau} = 0 \;\Rightarrow\; \hat{\tau} = \frac{1}{n}\sum_{i=1}^{n} t_i$
$E[\hat{\tau}(t_1,\ldots,t_n)] = \int\!\cdots\!\int \hat{\tau}(t_1,\ldots,t_n)\, f_{\rm joint}(t_1,\ldots,t_n;\tau)\, dt_1\cdots dt_n$
$= \frac{1}{n}\sum_{i=1}^{n}\int\!\cdots\!\int t_i\, \frac{1}{\tau}e^{-t_i/\tau}\prod_{j\neq i}\frac{1}{\tau}e^{-t_j/\tau}\, dt_1\cdots dt_n = \frac{1}{n}\sum_{i=1}^{n}\tau = \tau$
so $\hat{\tau}$ is an unbiased estimator of $\tau$.
Maximum likelihood
[Figure: exponential p.d.f. fitted to a sample of 50 decay times; true value $\tau = 1.0$, ML estimate $\hat{\tau} = 1.062$.]
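A short sketch of this example, assuming NumPy; the seed and the simulated sample are illustrative, so the result will differ from the 1.062 quoted in the figure:

import numpy as np

rng = np.random.default_rng(2)
tau_true = 1.0
t = rng.exponential(tau_true, size=50)   # 50 simulated decay times

tau_hat = t.mean()                       # ML estimate: the sample mean of the decay times
print(f"tau_hat = {tau_hat:.3f}")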
Maximum likelihood
What about $\lambda = 1/\tau$?
For a function $a(\theta)$, given $\hat{\theta}$:
$\frac{\partial L}{\partial a} = \frac{\partial L}{\partial\theta}\,\frac{\partial\theta}{\partial a} = 0 \;\Rightarrow\; \hat{a} = a(\hat{\theta})$
$\hat{\lambda} = \left(\frac{1}{n}\sum_{i=1}^{n} t_i\right)^{-1} = \frac{1}{\hat{\tau}}$
$E[\hat{\lambda}] = \frac{n}{n-1}\,\lambda \neq \lambda$: biased for finite n, but an unbiased estimator for $\lambda$ when $n\to\infty$.
Maximum likelihood
n measurements of x assumed to come from a Gaussian:
$f(x;\mu,\sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$
$\log L(\mu,\sigma^2) = \sum_{i=1}^{n}\log f(x_i;\mu,\sigma^2) = \sum_{i=1}^{n}\left(-\frac{1}{2}\log 2\pi - \frac{1}{2}\log\sigma^2 - \frac{(x_i-\mu)^2}{2\sigma^2}\right)$
$\frac{\partial\log L}{\partial\mu} = 0 \;\Rightarrow\; \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i$, with $E[\hat{\mu}] = \mu$: unbiased.
$\frac{\partial\log L}{\partial\sigma^2} = 0 \;\Rightarrow\; \widehat{\sigma^2} = \frac{1}{n}\sum_{i=1}^{n}(x_i-\hat{\mu})^2$, with $E[\widehat{\sigma^2}] = \frac{n-1}{n}\,\sigma^2$: unbiased only for large n.
Maximum likelihood
We showed that $s^2$ is an unbiased estimator for the variance of any p.d.f., so
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\hat{\mu})^2$
is an unbiased estimator for $\sigma^2$.
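A minimal sketch of the Gaussian case (NumPy assumed; the sample is made up), comparing the biased ML variance estimate with the bias-corrected $s^2$:

import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(10.0, 2.0, size=20)             # hypothetical Gaussian sample, n = 20
n = len(x)

mu_hat = x.mean()                              # ML estimate of mu (unbiased)
sigma2_ml = ((x - mu_hat) ** 2).sum() / n      # ML estimate of sigma^2, biased by a factor (n-1)/n
s2 = ((x - mu_hat) ** 2).sum() / (n - 1)       # bias-corrected estimator s^2

print(f"mu_hat = {mu_hat:.3f}, sigma2_ml = {sigma2_ml:.3f}, s^2 = {s2:.3f}")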
Maximum likelihood
Variance of ML estimators
Many experiments (same n): what is the spread of $\hat{\tau}$? Analytically (exponential), with $\hat{\tau} = \frac{1}{n}\sum_{i=1}^{n} t_i$:
$V[\hat{\tau}] = E[\hat{\tau}^{\,2}] - (E[\hat{\tau}])^2$
$= \int\!\cdots\!\int\left(\frac{1}{n}\sum_{i=1}^{n} t_i\right)^{\!2}\prod_{j=1}^{n}\frac{1}{\tau}e^{-t_j/\tau}\, dt_1\cdots dt_n - \left[\int\!\cdots\!\int\left(\frac{1}{n}\sum_{i=1}^{n} t_i\right)\prod_{j=1}^{n}\frac{1}{\tau}e^{-t_j/\tau}\, dt_1\cdots dt_n\right]^2 = \frac{\tau^2}{n}$
By the transformation invariance of ML estimators, the ML estimate of $\sigma^2_{\hat{\tau}} = \tau^2/n$ is
$\widehat{\sigma^2_{\hat{\tau}}} = \frac{\hat{\tau}^{\,2}}{n}, \qquad \hat{\sigma}_{\hat{\tau}} = \frac{\hat{\tau}}{\sqrt{n}}$
Maximum likelihood
If the experiment were repeated many times (with the same n), the standard deviation of the estimates would be 0.43.
• this is one possible interpretation of the error
• it is not the standard when the distribution is not Gaussian (68% confidence interval, $\pm$ one standard deviation if the p.d.f. for the estimator is Gaussian)
• in the large sample limit, ML estimates are distributed according to a Gaussian p.d.f.
• the two procedures then lead to the same result
Maximum likelihood
Variance: MC method
Cases too difficult to solve analytically: the MC method.
• simulate a large number of experiments
• compute the ML estimate each time
• look at the distribution of the resulting values
$s^2$ is an unbiased estimator for the variance of a p.d.f.; s from the MC experiments gives the statistical error of the parameter estimated from the real measurement.
Asymptotic normality: a general property of ML estimators for large samples.
Maximum likelihood
[Figure: distribution of $\hat{\tau}$ from 1000 MC experiments with 50 observations per experiment; sample standard deviation s = 0.151, to be compared with $\hat{\sigma}_{\hat{\tau}} = \hat{\tau}/\sqrt{n} = 1.062/\sqrt{50} = 0.150$.]
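A sketch of the MC procedure just described (NumPy assumed; the seed is arbitrary, so the result only approximately reproduces s = 0.151):

import numpy as np

rng = np.random.default_rng(4)
tau_true, n_obs, n_exp = 1.0, 50, 1000

# 1000 simulated experiments of 50 decay times each; the ML estimate is the mean of each row
tau_hats = rng.exponential(tau_true, size=(n_exp, n_obs)).mean(axis=1)

s = tau_hats.std(ddof=1)                 # sample standard deviation of the 1000 estimates
print(f"s = {s:.3f}  (compare with tau_true/sqrt(n_obs) = {tau_true/np.sqrt(n_obs):.3f})")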
RCF bound
A way to estimate the variance of any estimator without analytical calculations or MC:
$V[\hat{\theta}] \;\geq\; \left(1 + \frac{\partial b}{\partial\theta}\right)^{\!2} \Big/\; E\!\left[-\frac{\partial^2\log L}{\partial\theta^2}\right]$ (Rao-Cramér-Fréchet inequality)
Equality (minimum variance): the estimator is efficient. If efficient estimators exist for a problem, ML will find them.
ML estimators are always efficient in the large sample limit.
Ex.: exponential, $f(t;\tau) = \frac{1}{\tau}\, e^{-t/\tau}$:
$\frac{\partial^2\log L}{\partial\tau^2} = \frac{n}{\tau^2}\left(1 - \frac{2\hat{\tau}}{\tau}\right) \;\Rightarrow\; E\!\left[-\frac{\partial^2\log L}{\partial\tau^2}\right] = \frac{n}{\tau^2}$
With b = 0:
$V[\hat{\tau}] \;\geq\; \frac{\tau^2}{n}$, equal to the exact result: $\hat{\tau}$ is an efficient estimator.
RCF bound
$\theta = (\theta_1, \ldots, \theta_m)$: assume efficiency and zero bias; with $V_{ij} = {\rm cov}[\hat{\theta}_i, \hat{\theta}_j]$,
$(V^{-1})_{ij} = E\!\left[-\frac{\partial^2\log L}{\partial\theta_i\,\partial\theta_j}\right] = -\int\!\cdots\!\int\left(\sum_{k=1}^{n}\frac{\partial^2\log f(x_k;\theta)}{\partial\theta_i\,\partial\theta_j}\right)\prod_{l=1}^{n} f(x_l;\theta)\, dx_1\cdots dx_n = -\,n\int\frac{\partial^2\log f(x;\theta)}{\partial\theta_i\,\partial\theta_j}\, f(x;\theta)\, dx$
$V \propto \frac{1}{n}$, so the statistical errors scale as $\frac{1}{\sqrt{n}}$.
RCF bound
Large data sample: evaluate the second derivative with the measured data and the ML estimates,
$(\widehat{V^{-1}})_{ij} = -\frac{\partial^2\log L}{\partial\theta_i\,\partial\theta_j}\bigg|_{\theta=\hat{\theta}}$, and for a single parameter $\widehat{\sigma^2_{\hat{\theta}}} = \left(-\frac{\partial^2\log L}{\partial\theta^2}\bigg|_{\hat{\theta}}\right)^{\!-1}$
This is the usual method for estimating the covariance matrix when the likelihood function is maximized numerically.
Ex.: MINUIT (CERN library):
• finite differences
• invert the matrix to get $V_{ij}$
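A hedged sketch of this numerical recipe (not the MINUIT implementation): approximate the second derivative of log L by finite differences at the ML estimate and invert it, here for the one-parameter exponential example; NumPy assumed and the sample simulated:

import numpy as np

rng = np.random.default_rng(5)
t = rng.exponential(1.0, size=50)            # hypothetical decay-time sample

def log_L(tau):
    # log-likelihood of the exponential model: sum_i (-log tau - t_i / tau)
    return np.sum(-np.log(tau) - t / tau)

tau_hat = t.mean()                           # ML estimate (known analytically here)
h = 1e-4 * tau_hat                           # finite-difference step
d2 = (log_L(tau_hat + h) - 2 * log_L(tau_hat) + log_L(tau_hat - h)) / h**2

var_hat = -1.0 / d2                          # variance estimate: inverse of -d^2 log L / d tau^2
print(f"sigma_tau_hat = {np.sqrt(var_hat):.3f}, tau_hat/sqrt(n) = {tau_hat/np.sqrt(len(t)):.3f}")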
Graphical method
Single parameter $\theta$: expand $\log L(\theta)$ about its maximum,
$\log L(\theta) = \log L(\hat{\theta}) + \frac{\partial\log L}{\partial\theta}\bigg|_{\hat{\theta}}(\theta-\hat{\theta}) + \frac{1}{2!}\,\frac{\partial^2\log L}{\partial\theta^2}\bigg|_{\hat{\theta}}(\theta-\hat{\theta})^2 + \ldots$
$\log L(\theta) \approx \log L_{\max} - \frac{(\theta-\hat{\theta})^2}{2\,\hat{\sigma}^2_{\hat{\theta}}}$
$\log L(\hat{\theta}\pm\hat{\sigma}_{\hat{\theta}}) = \log L_{\max} - \frac{1}{2}$
i.e. moving θ one standard deviation away from its ML estimate decreases log L by 1/2 from its maximum value.
Later: $[\hat{\theta}-\hat{\sigma}_{-}, \hat{\theta}+\hat{\sigma}_{+}]$ is a 68.3% central confidence interval.
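A sketch of the graphical method for the same exponential example (NumPy assumed; sample simulated): scan log L(τ) and read off where it has dropped by 1/2 from its maximum.

import numpy as np

rng = np.random.default_rng(6)
t = rng.exponential(1.0, size=50)        # hypothetical decay-time sample
tau_hat = t.mean()

def log_L(tau):
    return np.sum(-np.log(tau) - t / tau)

taus = np.linspace(0.5 * tau_hat, 2.0 * tau_hat, 2001)
delta = np.array([log_L(tau) for tau in taus]) - log_L(tau_hat)

inside = taus[delta >= -0.5]             # values of tau with log L >= log L_max - 1/2
print(f"tau_hat = {tau_hat:.3f}, interval = [{inside.min():.3f}, {inside.max():.3f}]")

The interval found this way is in general asymmetric about $\hat{\tau}$, unlike the symmetric $\pm\hat{\sigma}$ obtained from the second derivative.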
ML with two parameters
Angular distribution for the scattering angle θ (x = cos θ) in a particle reaction:
$f(x;\alpha,\beta) = \frac{1 + \alpha x + \beta x^2}{2 + 2\beta/3}$, normalized for $-1 \le x \le +1$.
Realistic measurements are only possible in $x_{\min} \le x \le x_{\max}$:
$f(x;\alpha,\beta) = \frac{1 + \alpha x + \beta x^2}{(x_{\max}-x_{\min}) + \frac{\alpha}{2}(x_{\max}^2-x_{\min}^2) + \frac{\beta}{3}(x_{\max}^3-x_{\min}^3)}$
ML with two parameters
[Figure: ML fit of the angular distribution to 2000 events; true values $\alpha = 0.5$, $\beta = 0.5$; estimates $\hat{\alpha} = 0.508 \pm 0.052$, $\hat{\beta} = 0.466 \pm 0.108$.]
ML with two parameters
[Figure: distributions of $\hat{\alpha}$ and $\hat{\beta}$ from 500 MC experiments with 2000 events each. Both marginal p.d.f.'s are approximately Gaussian, with means $\langle\hat{\alpha}\rangle = 0.499$, $\langle\hat{\beta}\rangle = 0.498$, standard deviations $s_{\hat{\alpha}} = 0.051$, $s_{\hat{\beta}} = 0.111$, and correlation coefficient r = 0.42.]
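A sketch of such a two-parameter fit by numerically maximizing log L (assuming NumPy and SciPy; the cut values x_min = -0.95, x_max = 0.95, the seed and the generated sample are illustrative assumptions, not the data behind the figures):

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)
alpha_true, beta_true = 0.5, 0.5
x_min, x_max = -0.95, 0.95                    # assumed acceptance region

def f(x, alpha, beta):
    # p.d.f. (1 + alpha*x + beta*x^2), normalized on [x_min, x_max]
    norm = (x_max - x_min) + alpha / 2 * (x_max**2 - x_min**2) + beta / 3 * (x_max**3 - x_min**3)
    return (1 + alpha * x + beta * x**2) / norm

# generate 2000 events by accept-reject; the density is largest at x_max for these parameters
events = []
f_max = f(x_max, alpha_true, beta_true)
while len(events) < 2000:
    x_try = rng.uniform(x_min, x_max)
    if rng.uniform(0.0, f_max) < f(x_try, alpha_true, beta_true):
        events.append(x_try)
events = np.array(events)

def neg_log_L(p):
    alpha, beta = p
    dens = f(events, alpha, beta)
    if np.any(dens <= 0):                     # penalize parameter values giving a negative density
        return 1e10
    return -np.sum(np.log(dens))

res = minimize(neg_log_L, x0=[0.1, 0.1], method="Nelder-Mead")
print("alpha_hat, beta_hat =", res.x)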
Least squares
Measured values $y_i$: Gaussian random variables centered about the quantity's true value $\lambda(x_i;\theta)$, for $(x_i, y_i)$, $i = 1, \ldots, n$:
$g(y_1,\ldots,y_n;\lambda_1,\ldots,\lambda_n,\sigma_1,\ldots,\sigma_n) = \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi\sigma_i^2}}\exp\!\left(-\frac{(y_i-\lambda_i)^2}{2\sigma_i^2}\right)$
To estimate the $\theta$:
$\log L(\theta) = -\frac{1}{2}\sum_{i=1}^{n}\frac{(y_i-\lambda(x_i;\theta))^2}{\sigma_i^2} + \text{terms not depending on } \theta$
L is maximized by the $\theta$ that minimize
$\chi^2(\theta) = \sum_{i=1}^{n}\frac{(y_i-\lambda(x_i;\theta))^2}{\sigma_i^2}$
Least squares
$\chi^2(\theta)$ is used to define the procedure even if the $y_i$ are not Gaussian.
If the measurements are not independent but are described by an n-dimensional Gaussian p.d.f. with known covariance matrix V and unknown mean values:
$\log L(\theta) = -\frac{1}{2}\sum_{i,j=1}^{n}\left(y_i-\lambda(x_i;\theta)\right)(V^{-1})_{ij}\left(y_j-\lambda(x_j;\theta)\right) + \ldots$
$\chi^2(\theta) = \sum_{i,j=1}^{n}\left(y_i-\lambda(x_i;\theta)\right)(V^{-1})_{ij}\left(y_j-\lambda(x_j;\theta)\right)$
LS estimators: $\hat{\theta}_1, \ldots, \hat{\theta}_m$ minimize $\chi^2(\theta)$.
Least squares
$\lambda(x;\theta) = \sum_{j=1}^{m}\theta_j\, a_j(x)$, with the $a_j(x)$ linearly independent.
• the estimators and their variances can be found analytically
• the estimators have zero bias and minimum variance
$\lambda(x_i;\theta) = \sum_{j=1}^{m} a_j(x_i)\,\theta_j = \sum_{j=1}^{m} A_{ij}\,\theta_j$
$\chi^2(\theta) = (\vec{y}-A\theta)^T\, V^{-1}\, (\vec{y}-A\theta)$
Minimum: $\nabla_{\theta}\,\chi^2 = -2\left(A^T V^{-1}\vec{y} - A^T V^{-1}A\,\theta\right) = 0$
$\hat{\theta} = (A^T V^{-1} A)^{-1} A^T V^{-1}\,\vec{y} \equiv B\,\vec{y}$
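A sketch of this matrix solution for a straight-line fit $\lambda(x;\theta) = \theta_1 + \theta_2 x$ (NumPy assumed; the data points and errors are made up for illustration):

import numpy as np

# made-up measurements y_i at points x_i with independent errors sigma_i
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.9, 4.1, 5.8, 8.2, 9.9])
sigma = np.array([0.3, 0.3, 0.4, 0.4, 0.5])

A = np.column_stack([np.ones_like(x), x])    # a_1(x) = 1, a_2(x) = x
Vinv = np.linalg.inv(np.diag(sigma**2))      # inverse covariance matrix of the y_i

U = np.linalg.inv(A.T @ Vinv @ A)            # covariance matrix of the estimators
theta_hat = U @ A.T @ Vinv @ y               # LS estimators (A^T V^-1 A)^-1 A^T V^-1 y

chi2_min = (y - A @ theta_hat) @ Vinv @ (y - A @ theta_hat)
print("theta_hat =", theta_hat)
print("errors    =", np.sqrt(np.diag(U)))
print("chi2_min  =", chi2_min)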
Least squares
Covariance matrix for the estimators, $U_{ij} = {\rm cov}[\hat{\theta}_i, \hat{\theta}_j]$:
$U = B V B^T = (A^T V^{-1} A)^{-1}$
$(U^{-1})_{ij} = \frac{1}{2}\,\frac{\partial^2\chi^2}{\partial\theta_i\,\partial\theta_j}\bigg|_{\hat{\theta}}$
This coincides with the RCF bound for the inverse covariance matrix if the $y_i$ are Gaussian distributed ($\chi^2 = -2\log L + \text{const.}$).
Least squares
If $\lambda(x;\theta)$ is linear in $\theta$, then $\chi^2(\theta)$ is quadratic in $\theta$:
$\chi^2(\theta) = \chi^2(\hat{\theta}) + \frac{1}{2}\sum_{i,j=1}^{m}\frac{\partial^2\chi^2}{\partial\theta_i\,\partial\theta_j}\bigg|_{\hat{\theta}}\,(\theta_i-\hat{\theta}_i)(\theta_j-\hat{\theta}_j)$
To interpret this, take a single parameter $\theta$:
$\chi^2(\theta) = \chi^2(\hat{\theta}) + \frac{(\theta-\hat{\theta})^2}{\widehat{\sigma^2_{\hat{\theta}}}}$
$\chi^2(\hat{\theta}\pm\hat{\sigma}_{\hat{\theta}}) = \chi^2_{\min} + 1$
Chi-squared distribution
$f(z;n) = \frac{1}{2^{n/2}\,\Gamma(n/2)}\, z^{n/2-1}\, e^{-z/2}$, with n = 1, 2, … and $0 \le z < \infty$
n = number of degrees of freedom; $\Gamma(n) = (n-1)!$, $\Gamma(x+1) = x\,\Gamma(x)$.
$E[z] = \int_0^{\infty} z\, f(z;n)\, dz = n$
$V[z] = \int_0^{\infty}(z-n)^2\, f(z;n)\, dz = 2n$
For n independent Gaussian random variables $x_i$ with known $\mu_i$ and $\sigma_i^2$,
$z = \sum_{i=1}^{n}\frac{(x_i-\mu_i)^2}{\sigma_i^2}$ is distributed as a $\chi^2$ for n degrees of freedom.
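A quick MC check of this statement (NumPy assumed; the $\mu_i$, $\sigma_i$ and the seed are arbitrary choices): build z from n independent Gaussians many times and compare its mean and variance with n and 2n:

import numpy as np

rng = np.random.default_rng(8)
n = 5
mu = np.arange(1.0, n + 1)                    # arbitrary known means
sigma = np.linspace(0.5, 1.5, n)              # arbitrary known standard deviations

x = rng.normal(mu, sigma, size=(100000, n))   # 100000 sets of n Gaussian variables
z = (((x - mu) / sigma) ** 2).sum(axis=1)     # z = sum_i (x_i - mu_i)^2 / sigma_i^2

print(f"mean of z = {z.mean():.2f} (expect {n}),  variance of z = {z.var():.2f} (expect {2*n})")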