Applied Statistics Principle of maximum likelihood
“Statistics is merely a quantisation of common sense”
Troels C. Petersen (NBI)
1
Likelihood function
“I shall stick to the principle of likelihood…” [Plato, in Timaeus]
2
Likelihood function
Given a PDF f(x) with parameter(s) θ, what is the chance that, with N observations, each xi falls in the interval [xi, xi + dxi]?

L(\theta) = \prod_i f(x_i, \theta) \, dx_i
3
Given a set of measurements x, and parameter(s) θ, the likelihood function is defined as:
The principle of maximum likelihood for parameter estimation consists of maximising the likelihood of the parameter(s) (here θ) given some data (here x). There is nothing strange about this - it is exactly what we do for the ChiSquare!
The likelihood function plays a central role in statistics, as it can be shown to be:
• Consistent (converges to the right value).
• Asymptotically normal (converges with Gaussian errors).
• Efficient (reaches the Minimum Variance Bound (MVB, Cramér-Rao) for large N).
To some extent, this means that the likelihood function is “optimal” - that is, if it can be applied in practice.
Likelihood function
L(x_1, x_2, \ldots, x_N; \theta) = \prod_i p(x_i, \theta)

V(\hat{a}) \geq \frac{1}{\langle (\mathrm{d} \ln L / \mathrm{d} a)^2 \rangle}

4
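The efficiency claim above can be checked numerically. A minimal sketch (my own toy example, not from the slides), assuming an exponential PDF with lifetime tau, whose ML estimate is simply the sample mean:

```python
import random
import statistics

# Toy check (assumed example): for f(t; tau) = (1/tau) exp(-t/tau), the ML
# estimate of tau is the sample mean, and for large N its variance should
# approach the Cramer-Rao bound tau^2 / N.
random.seed(1)
tau_true, N, n_toys = 2.0, 1000, 500

estimates = []
for _ in range(n_toys):
    sample = [random.expovariate(1.0 / tau_true) for _ in range(N)]
    estimates.append(statistics.fmean(sample))   # analytic MLE: tau_hat = mean(t)

mvb = tau_true ** 2 / N                          # minimum variance bound
var = statistics.variance(estimates)             # spread of the MLE over toys
print(f"MLE variance: {var:.5f}, Cramer-Rao bound: {mvb:.5f}")
```

The variance of the estimator over many toy experiments sits close to the bound, illustrating the "efficient" bullet above.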
Likelihood vs. Chi-Square
For computational reasons, it is often much easier to minimise the logarithm of the likelihood function:
In problems with Gaussian errors, it turns out that the likelihood function boils down to the Chi-Square with a constant offset and a factor -2 in difference.
The likelihood function for fits comes in two versions:
• Binned likelihood (using Poisson) for histograms.
• Unbinned likelihood (using PDF) for single values.
The “trouble” with the likelihood is that, unlike for the Chi-Square, there is NO simple way to obtain a probability of obtaining a certain likelihood value!
\left. \frac{\partial \ln L}{\partial \theta} \right|_{\theta = \bar{\theta}} = 0
5
See Barlow 5.6
ChiSquare
Recall, the ChiSquare is a sum over bins in a histogram:

\chi^2(\theta) = \sum_i^{N_{\mathrm{bins}}} \left( \frac{N_i^{\mathrm{observed}} - N_i^{\mathrm{expected}}}{\sigma(N_i^{\mathrm{observed}})} \right)^2

6
ChiSquare
Recall, the ChiSquare is a sum over bins in a histogram:

\chi^2(\theta) = \sum_i^{N_{\mathrm{bins}}} \frac{(N_i^{\mathrm{observed}} - N_i^{\mathrm{expected}})^2}{N_i^{\mathrm{observed}}}

7
ChiSquare
Recall, the ChiSquare is a sum over bins in a histogram:

\chi^2(\theta) = \sum_i^{N_{\mathrm{bins}}} \frac{(N_i^{\mathrm{observed}} - N_i^{\mathrm{expected}})^2}{N_i^{\mathrm{expected}}}

8
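The two denominator choices above (observed vs. expected counts) are easily compared side by side. A small sketch with made-up bin contents (my own numbers, not from the slides):

```python
# Sketch (assumed numbers): the two common ChiSquare variants differ only
# in the denominator used as the variance estimate for each bin:
# sigma^2 ~ N_expected ("Pearson") or sigma^2 ~ N_observed ("Neyman").
observed = [18, 25, 31, 22, 12]
expected = [20.0, 24.0, 28.0, 20.0, 14.0]

chi2_pearson = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
chi2_neyman  = sum((o - e) ** 2 / o for o, e in zip(observed, expected))
print(chi2_pearson, chi2_neyman)
```

For well-populated bins the two agree closely; they diverge when bins have few entries.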
Binned Likelihood
The binned likelihood is a product over bins in a histogram:

L(\theta)_{\mathrm{binned}} = \prod_i^{N_{\mathrm{bins}}} \mathrm{Poisson}(N_i^{\mathrm{observed}}, N_i^{\mathrm{expected}})

where the Poisson distribution is:

f(n, \lambda) = \frac{\lambda^n}{n!} e^{-\lambda}

9
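In practice one sums the logarithms of the Poisson terms rather than multiplying tiny probabilities. A minimal sketch (my own bin contents, not from the slides):

```python
import math

# Sketch (assumed example): binned log-likelihood, summing ln Poisson(n; nu)
# over histogram bins, with nu the expected bin content from the model.
observed = [18, 25, 31, 22, 12]
expected = [20.0, 24.0, 28.0, 20.0, 14.0]

def ln_poisson(n, nu):
    # ln [ nu^n exp(-nu) / n! ], using lgamma(n+1) = ln(n!)
    return n * math.log(nu) - nu - math.lgamma(n + 1)

lnL_binned = sum(ln_poisson(n, nu) for n, nu in zip(observed, expected))
print(lnL_binned)
```

In a fit, the expected contents depend on the parameters θ, and one maximises this sum (or minimises -lnL) over θ.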
Unbinned Likelihood
The unbinned likelihood is a product over single measurements:

L(\theta)_{\mathrm{unbinned}} = \prod_i^{N_{\mathrm{meas.}}} \mathrm{PDF}(x_i^{\mathrm{observed}})

10
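The same log trick applies to the unbinned case, now summing ln PDF over individual measurements. A sketch (my own example, not from the slides) with an exponential PDF, where the NLL minimum sits at tau equal to the sample mean:

```python
import math
import random

# Sketch (assumed example): unbinned -ln L for an exponential PDF
# f(t; tau) = (1/tau) exp(-t/tau). The analytic MLE is the sample mean.
random.seed(3)
data = [random.expovariate(1.0 / 2.0) for _ in range(500)]

def nll(tau):
    # -sum over measurements of ln f(t_i; tau)
    return -sum(math.log(1.0 / tau) - t / tau for t in data)

tau_hat = sum(data) / len(data)   # sample mean = MLE for the exponential
print(tau_hat, nll(tau_hat))      # scanning nearby tau values confirms the minimum
```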
Notes on the likelihoodFor a large sample, the maximum likelihood (ML) is indeed unbiased and has the minimum variance - that is hard to beat! Also, the binned LLH approaches the unbinned version. However...
For the ML, you have to know your PDF. This is also true for the Chi-Square, but unlike for the Chi-Square, you get no goodness-of-fit measure to check it!
Also, at small statistics, the ML is not necessarily unbiased, but it still fares much better than the ChiSquare! But be careful with small statistics. The way to avoid this problem is using simulation - more to follow.
11
Evaluating the likelihood
As mentioned, it is possible to evaluate a likelihood fit (i.e. get a p-value). This is done by first fitting the data, and then (given the fit parameters) one simulates data according to these parameters, and fits these many times.
Finally, you see what fraction of the fits have a worse LLH. Note the Gaussian shape!
12
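The simulate-and-refit procedure above can be sketched end to end. A toy example (my own, not from the slides) using an exponential fit, whose MLE is analytic so no numerical minimiser is needed:

```python
import math
import random

# Sketch of the procedure: (1) fit the data, (2) generate many toy datasets
# from the fitted model and refit each, (3) the p-value is the fraction of
# toys whose best-fit -ln L is worse (larger) than that of the data.
random.seed(4)

def nll(data, tau):
    # -ln L for f(t; tau) = (1/tau) exp(-t/tau)
    return sum(math.log(tau) + t / tau for t in data)

data = [random.expovariate(1.0 / 2.0) for _ in range(200)]
tau_fit = sum(data) / len(data)          # MLE for the exponential
nll_data = nll(data, tau_fit)

n_toys, worse = 300, 0
for _ in range(n_toys):
    toy = [random.expovariate(1.0 / tau_fit) for _ in range(len(data))]
    tau_toy = sum(toy) / len(toy)        # refit each toy dataset
    if nll(toy, tau_toy) > nll_data:
        worse += 1

p_value = worse / n_toys
print(f"p-value = {p_value:.2f}")
```

Here the data were generated from the model itself, so the p-value should come out unremarkable; a bad model would push it towards 0.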
The likelihood ratio test
Not unlike the Chi-Square, where one can compare χ2 values, the likelihoods of two competing hypotheses can be compared (with the SAME offset constant/factor!).
While their individual LLH values do not say much, their RATIO says everything!
As with the likelihood, one often takes the logarithm and multiplies by -2 to match the Chi-Square, thus the “test statistic” becomes:

D = -2 \ln \frac{L(\mathrm{null})}{L(\mathrm{alternative})}
If the two hypotheses are simple (i.e. no free parameters), then the Neyman-Pearson Lemma states that this is the best possible test one can make.
If the alternative model is not simple but nested (i.e. contains the null hypothesis), this difference approximately follows a Chi-Square distribution with Ndof = Ndof(alternative) - Ndof(null). This is called Wilks’ Theorem.
13
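A minimal sketch of a nested likelihood ratio test (my own toy example, not from the slides): null hypothesis mu = 0 versus an alternative with mu free, one extra parameter, so by Wilks' Theorem the test statistic follows a Chi-Square with 1 degree of freedom.

```python
import math
import random

# Sketch (assumed example): Gaussian data with known sigma.
# Null: mu fixed to 0. Alternative: mu free (MLE = sample mean).
random.seed(5)
sigma = 1.0
x = [random.gauss(0.1, sigma) for _ in range(100)]

def lnL(mu):
    # normalisation constants cancel in the ratio, so they are dropped here
    return sum(-0.5 * ((xi - mu) / sigma) ** 2 for xi in x)

mu_hat = sum(x) / len(x)                 # MLE under the alternative
d = -2.0 * (lnL(0.0) - lnL(mu_hat))     # test statistic, always >= 0

# Chi-Square survival function for 1 dof: P(chi2_1 > d) = erfc(sqrt(d/2))
p_value = math.erfc(math.sqrt(d / 2.0))
print(f"D = {d:.3f}, p = {p_value:.3f}")
```

A small p-value would favour the alternative, i.e. a mean significantly different from zero.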