Applied Statistics Principle of maximum likelihood
“Statistics is merely a quantisation of common sense”
Troels C. Petersen (NBI)
1
Likelihood function
“I shall stick to the principle of likelihood…” [Plato, in Timaeus]
2
Likelihood function
Given a PDF f(x) with parameter(s) θ, what is the chance that, with N observations, each xi falls in the interval [xi, xi + dxi]?

L(\theta) = \prod_i f(x_i, \theta) \, dx_i
3
Given a set of measurements x, and parameter(s) θ, the likelihood function is defined as:
The principle of maximum likelihood for parameter estimation consists of maximising the likelihood of the parameter(s) (here θ) given some data (here x). There is nothing strange about this - it is exactly what we do for the ChiSquare!
The likelihood function plays a central role in statistics, as it can be shown to be:
• Consistent (converges to the right value).
• Asymptotically normal (converges with Gaussian errors).
• Efficient (reaches the Minimum Variance Bound (MVB, Cramér-Rao) for large N).
To some extent, this means that the likelihood function is “optimal” - that is, if it can be applied in practice.
Likelihood function
L(x_1, x_2, \ldots, x_N; \theta) = \prod_i p(x_i, \theta)

V(\hat{a}) \geq \frac{1}{\langle (\mathrm{d} \ln L / \mathrm{d} a)^2 \rangle}

4
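The efficiency claim above can be checked numerically. A minimal sketch (my own toy example, not from the slides), assuming an exponential PDF with lifetime tau, whose ML estimate is simply the sample mean:

```python
import random
import statistics

# Toy check (assumed example): for f(t; tau) = (1/tau) exp(-t/tau), the ML
# estimate of tau is the sample mean, and for large N its variance should
# approach the Cramer-Rao bound tau^2 / N.
random.seed(1)
tau_true, N, n_toys = 2.0, 1000, 500

estimates = []
for _ in range(n_toys):
    sample = [random.expovariate(1.0 / tau_true) for _ in range(N)]
    estimates.append(statistics.fmean(sample))   # analytic MLE: tau_hat = mean(t)

mvb = tau_true ** 2 / N                          # minimum variance bound
var = statistics.variance(estimates)             # spread of the MLE over toys
print(f"MLE variance: {var:.5f}, Cramer-Rao bound: {mvb:.5f}")
```

The variance of the estimator over many toy experiments sits close to the bound, illustrating the "efficient" bullet above.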
Likelihood vs. Chi-Square
For computational reasons, it is often much easier to minimise the logarithm of the likelihood function:
In problems with Gaussian errors, it turns out that the likelihood function boils down to the Chi-Square with a constant offset and a factor -2 in difference.
The likelihood function for fits comes in two versions:
• Binned likelihood (using Poisson) for histograms.
• Unbinned likelihood (using PDF) for single values.
The “trouble” with the likelihood is that, unlike for the Chi-Square, there is NO simple way to obtain a probability of obtaining a certain likelihood value!
\left. \frac{\partial \ln L}{\partial \theta} \right|_{\theta = \bar{\theta}} = 0
5
See Barlow 5.6
ChiSquare
Recall, the ChiSquare is a sum over bins in a histogram:

\chi^2(\theta) = \sum_i^{N_{\mathrm{bins}}} \left( \frac{N_i^{\mathrm{observed}} - N_i^{\mathrm{expected}}}{\sigma(N_i^{\mathrm{observed}})} \right)^2

6
ChiSquare
Recall, the ChiSquare is a sum over bins in a histogram:

\chi^2(\theta) = \sum_i^{N_{\mathrm{bins}}} \frac{(N_i^{\mathrm{observed}} - N_i^{\mathrm{expected}})^2}{N_i^{\mathrm{observed}}}

7
ChiSquare
Recall, the ChiSquare is a sum over bins in a histogram:

\chi^2(\theta) = \sum_i^{N_{\mathrm{bins}}} \frac{(N_i^{\mathrm{observed}} - N_i^{\mathrm{expected}})^2}{N_i^{\mathrm{expected}}}

8
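The two denominator choices above (observed vs. expected counts) are easily compared side by side. A small sketch with made-up bin contents (my own numbers, not from the slides):

```python
# Sketch (assumed numbers): the two common ChiSquare variants differ only
# in the denominator used as the variance estimate for each bin:
# sigma^2 ~ N_expected ("Pearson") or sigma^2 ~ N_observed ("Neyman").
observed = [18, 25, 31, 22, 12]
expected = [20.0, 24.0, 28.0, 20.0, 14.0]

chi2_pearson = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
chi2_neyman  = sum((o - e) ** 2 / o for o, e in zip(observed, expected))
print(chi2_pearson, chi2_neyman)
```

For well-populated bins the two agree closely; they diverge when bins have few entries.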
Binned Likelihood
The binned likelihood is a product over bins in a histogram:

L(\theta)_{\mathrm{binned}} = \prod_i^{N_{\mathrm{bins}}} \mathrm{Poisson}(N_i^{\mathrm{observed}}, N_i^{\mathrm{expected}})

where the Poisson distribution is:

f(n, \lambda) = \frac{\lambda^n}{n!} e^{-\lambda}

9
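In practice one sums the logarithms of the Poisson terms rather than multiplying tiny probabilities. A minimal sketch (my own bin contents, not from the slides):

```python
import math

# Sketch (assumed example): binned log-likelihood, summing ln Poisson(n; nu)
# over histogram bins, with nu the expected bin content from the model.
observed = [18, 25, 31, 22, 12]
expected = [20.0, 24.0, 28.0, 20.0, 14.0]

def ln_poisson(n, nu):
    # ln [ nu^n exp(-nu) / n! ], using lgamma(n+1) = ln(n!)
    return n * math.log(nu) - nu - math.lgamma(n + 1)

lnL_binned = sum(ln_poisson(n, nu) for n, nu in zip(observed, expected))
print(lnL_binned)
```

In a fit, the expected contents depend on the parameters θ, and one maximises this sum (or minimises -lnL) over θ.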
Unbinned Likelihood
The unbinned likelihood is a product over single measurements:

L(\theta)_{\mathrm{unbinned}} = \prod_i^{N_{\mathrm{meas.}}} \mathrm{PDF}(x_i^{\mathrm{observed}})

10
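The same log trick applies to the unbinned case, now summing ln PDF over individual measurements. A sketch (my own example, not from the slides) with an exponential PDF, where the NLL minimum sits at tau equal to the sample mean:

```python
import math
import random

# Sketch (assumed example): unbinned -ln L for an exponential PDF
# f(t; tau) = (1/tau) exp(-t/tau). The analytic MLE is the sample mean.
random.seed(3)
data = [random.expovariate(1.0 / 2.0) for _ in range(500)]

def nll(tau):
    # -sum over measurements of ln f(t_i; tau)
    return -sum(math.log(1.0 / tau) - t / tau for t in data)

tau_hat = sum(data) / len(data)   # sample mean = MLE for the exponential
print(tau_hat, nll(tau_hat))      # scanning nearby tau values confirms the minimum
```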
Notes on the likelihoodFor a large sample, the maximum likelihood (ML) is indeed unbiased and has the minimum variance - that is hard to beat! Also, the binned LLH approaches the unbinned version. However...
For the ML, you have to know your PDF. This is also true for the Chi-Square, but unlike for the Chi-Square, you get no goodness-of-fit measure to check it!
Also, at small statistics, the ML is not necessarily unbiased, but it still fares much better than the ChiSquare! But be careful with small statistics. The way to avoid this problem is using simulation - more to follow.
11
Evaluating the likelihood
As mentioned, it is possible to evaluate a likelihood fit (i.e. get a p-value). This is done by first fitting the data, and then (given the fit parameters) one simulates data according to these parameters, and fits these many times.
Finally, you see what fraction of the fits have a worse LLH. Note the Gaussian shape!
12
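The simulate-and-refit procedure above can be sketched end to end. A toy example (my own, not from the slides) using an exponential fit, whose MLE is analytic so no numerical minimiser is needed:

```python
import math
import random

# Sketch of the procedure: (1) fit the data, (2) generate many toy datasets
# from the fitted model and refit each, (3) the p-value is the fraction of
# toys whose best-fit -ln L is worse (larger) than that of the data.
random.seed(4)

def nll(data, tau):
    # -ln L for f(t; tau) = (1/tau) exp(-t/tau)
    return sum(math.log(tau) + t / tau for t in data)

data = [random.expovariate(1.0 / 2.0) for _ in range(200)]
tau_fit = sum(data) / len(data)          # MLE for the exponential
nll_data = nll(data, tau_fit)

n_toys, worse = 300, 0
for _ in range(n_toys):
    toy = [random.expovariate(1.0 / tau_fit) for _ in range(len(data))]
    tau_toy = sum(toy) / len(toy)        # refit each toy dataset
    if nll(toy, tau_toy) > nll_data:
        worse += 1

p_value = worse / n_toys
print(f"p-value = {p_value:.2f}")
```

Here the data were generated from the model itself, so the p-value should come out unremarkable; a bad model would push it towards 0.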
The likelihood ratio test
Not unlike the Chi-Square, where one can compare χ2 values, the likelihoods of two competing hypotheses can be compared (with the SAME offset constant/factor!).
While their individual LLH values do not say much, their RATIO says everything!
As with the likelihood, one often takes the logarithm and multiplies by -2 to match the Chi-Square, thus the “test statistic” becomes:

D = -2 \ln \frac{L(\mathrm{null})}{L(\mathrm{alternative})}
If the two hypotheses are simple (i.e. no free parameters), then the Neyman-Pearson Lemma states that this is the best possible test one can make.
If the alternative model is not simple but nested (i.e. contains the null hypothesis), this difference approximately follows a Chi-Square distribution with Ndof = Ndof(alternative) - Ndof(null). This is called Wilks’ Theorem.
13
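A minimal sketch of a nested likelihood ratio test (my own toy example, not from the slides): null hypothesis mu = 0 versus an alternative with mu free, one extra parameter, so by Wilks' Theorem the test statistic follows a Chi-Square with 1 degree of freedom.

```python
import math
import random

# Sketch (assumed example): Gaussian data with known sigma.
# Null: mu fixed to 0. Alternative: mu free (MLE = sample mean).
random.seed(5)
sigma = 1.0
x = [random.gauss(0.1, sigma) for _ in range(100)]

def lnL(mu):
    # normalisation constants cancel in the ratio, so they are dropped here
    return sum(-0.5 * ((xi - mu) / sigma) ** 2 for xi in x)

mu_hat = sum(x) / len(x)                 # MLE under the alternative
d = -2.0 * (lnL(0.0) - lnL(mu_hat))     # test statistic, always >= 0

# Chi-Square survival function for 1 dof: P(chi2_1 > d) = erfc(sqrt(d/2))
p_value = math.erfc(math.sqrt(d / 2.0))
print(f"D = {d:.3f}, p = {p_value:.3f}")
```

A small p-value would favour the alternative, i.e. a mean significantly different from zero.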