Machine Learning, Saarland University, SS 2007
Holger Bast [with input from Ingmar Weber]
Max-Planck-Institut für Informatik, Saarbrücken, Germany
Lecture 10, Friday, June 22nd, 2007
(Everything you always wanted to know about statistics … but were afraid to ask)
Overview of this lecture
Maximum likelihood vs. unbiased estimators
– Example: normal distribution
– Example: drawing numbers from a box
Things you keep on reading in the ML literature [example]
– marginal distribution
– prior
– posterior
Statistical tests
– hypothesis testing
– discussion of its (non)sense
Maximum likelihood vs. unbiased estimators
Example: maximum likelihood estimator from Lecture 8, Example 2
– μ(x1,…,xn) = 1/n ∙ Σi xi
– σ²(x1,…,xn) = 1/n ∙ Σi (xi – μ)²
– X1,…,Xn independent identically distributed random variables with mean μ and variance σ²
– E μ(X1,…,Xn) = μ [blackboard]
– E σ²(X1,…,Xn) = (n–1) / n ∙ σ² ≠ σ² [blackboard]
– unbiased variance estimator = 1 / (n–1) ∙ Σi (xi – μ)²
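The bias of the 1/n estimator can be checked empirically. A minimal simulation sketch (not from the slides, with made-up parameters σ² = 4 and n = 5): averaged over many trials, dividing by n lands near (n–1)/n ∙ σ² = 3.2, while dividing by n–1 lands near σ² = 4.

```python
# Sketch: empirically comparing the 1/n (maximum likelihood) variance
# estimator with the unbiased 1/(n-1) version on normal samples.
import random

random.seed(0)
mu_true, sigma2_true, n, trials = 0.0, 4.0, 5, 200_000

biased_sum = unbiased_sum = 0.0
for _ in range(trials):
    xs = [random.gauss(mu_true, sigma2_true ** 0.5) for _ in range(n)]
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)
    biased_sum += ss / n          # maximum likelihood: divide by n
    unbiased_sum += ss / (n - 1)  # unbiased: divide by n - 1

# E[1/n * ss] = (n-1)/n * sigma^2 = 0.8 * 4 = 3.2; 1/(n-1) recovers 4
print(biased_sum / trials, unbiased_sum / trials)
```

Note that the bias matters most for small n; for large n the factor (n–1)/n is close to 1.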
Example: number x drawn from box with numbers 1..n for unknown n
– maximum likelihood estimator: n = x [blackboard]
– unbiased estimator: n = 2x – 1 [blackboard]
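The two estimators for the box example can also be compared by simulation. A sketch (not from the slides, with the assumed true value n = 100): since E[x] = (n+1)/2, the maximum likelihood estimate x is far too small on average, while 2x – 1 is unbiased.

```python
# Sketch: draw x uniformly from {1, ..., n} and compare the average of
# the ML estimate n_hat = x with the unbiased estimate n_hat = 2x - 1.
import random

random.seed(1)
n_true, trials = 100, 200_000
xs = [random.randint(1, n_true) for _ in range(trials)]
avg_ml = sum(xs) / trials                           # E[x] = (n+1)/2 = 50.5
avg_unbiased = sum(2 * x - 1 for x in xs) / trials  # E[2x-1] = n = 100
print(avg_ml, avg_unbiased)
```

Note that unbiasedness concerns the average over many trials; for a single draw, 2x – 1 can still be far from n.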
Marginal distribution

Joint probability distribution, for example
– pick a random MPII staff member
– random variables X = department, Y = gender
– for example, Pr(X = D3, Y = female)

          D1    D2    D3    D4    D5   |  Σ
male     0.24  0.09  0.13  0.25  0.11  | 0.82
female   0.03  0.03  0.04  0.04  0.04  | 0.18
Σ        0.27  0.12  0.17  0.29  0.15  |

– marginal Pr(D3) = 0.17 (column sum)
– marginal Pr(female) = 0.18 (row sum)
Note:
– matrix entries sum to 1
– in general, Pr(X = x, Y = y) ≠ Pr(X = x) ∙ Pr(Y = y)
  [equality holds if and only if X and Y are independent]
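The marginals and the independence check above can be computed directly from the joint table. A sketch using the slide's numbers:

```python
# Sketch: the joint distribution Pr(X = dept, Y = gender) from the table;
# marginals are row/column sums, and independence would require
# Pr(x, y) = Pr(x) * Pr(y) in every cell.
joint = {
    "male":   [0.24, 0.09, 0.13, 0.25, 0.11],  # columns D1..D5
    "female": [0.03, 0.03, 0.04, 0.04, 0.04],
}
pr_gender = {g: sum(row) for g, row in joint.items()}          # marginal over Y
pr_dept = [sum(joint[g][i] for g in joint) for i in range(5)]  # marginal over X

print(pr_gender["female"], pr_dept[2])  # 0.18 and Pr(D3) = 0.17
# independence fails: the D3/female cell is 0.04, but the product of the
# marginals is 0.17 * 0.18 = 0.0306
print(joint["female"][2], pr_dept[2] * pr_gender["female"])
```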
Frequentism vs. Bayesianism
Frequentism
– probability = relative frequency in large number of trials
– associated with random (physical) system
– only applied to well-defined events in well-defined space
for example: probability of a die showing 6
Bayesianism
– probability = degree of belief
– no random process at all needs to be involved
– applied to arbitrary statements
for example: probability that I will like a new movie
Prior / Posterior probability
Prior
– guess about the data, no random experiment behind
– go on computing with the guess like with a probability
– for example: Z1,…,Zn from E-Step of EM algorithm
Posterior
– probability related to an event that has already happened
– for example: all our likelihoods from Lectures 8 and 9
Note: these are not well-defined technical terms
– but they are often used as if they were, which is confusing
– the Bayesian way …
Hypothesis testing
Example: do two samples have the same mean?
– e.g., two groups of patients in a medical experiment, one group with medication and one group without
– for example, 8.6 4.3 3.2 5.1 and 2.1 4.2 7.6 3.2 2.9
Test
– formulate a null hypothesis, e.g., equal means
– compute probability p of the given (or more extreme) data, assuming that the null hypothesis is true [blackboard]
Outcome
– p ≤ α = 0.05: the null hypothesis is rejected at significance level α = 0.05 (95% confidence)
  one says: the difference of the means is statistically significant
– p > α = 0.05: the null hypothesis cannot be rejected
  one says: the difference of the means is statistically insignificant
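One concrete way to compute such a p (matching the slide's definition: the probability of data as or more extreme, assuming the null hypothesis) is a permutation test; the slides' [blackboard] derivation may have used a different test, so this is an illustration, not the lecture's method. A sketch on the example samples:

```python
# Sketch: permutation test for the two samples from the slide.
# Under the null hypothesis (both samples from the same distribution),
# every relabeling of the pooled data is equally likely, so p is the
# fraction of relabelings with a mean difference at least as extreme.
import random

a = [8.6, 4.3, 3.2, 5.1]
b = [2.1, 4.2, 7.6, 3.2, 2.9]
observed = abs(sum(a) / len(a) - sum(b) / len(b))  # |5.3 - 4.0| = 1.3

random.seed(0)
pooled, trials, extreme = a + b, 100_000, 0
for _ in range(trials):
    random.shuffle(pooled)
    pa, pb = pooled[:len(a)], pooled[len(a):]
    if abs(sum(pa) / len(pa) - sum(pb) / len(pb)) >= observed:
        extreme += 1
p = extreme / trials
print(p)  # well above 0.05: the difference is not statistically significant
```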
Hypothesis testing — BEWARE!
What one would ideally like:
– given this data, what is the probability that my hypothesis is true?
– formally: Pr(H | D)
What one gets from hypothesis testing
– given that my hypothesis is true, what is the probability of this (or more extreme) data
– formally: Pr(D | H)
– but Pr(D | H) could be low for other reasons than the hypothesis!! [blackboard example]
Useful at all?
– OK: challenge theory by attempting to reject it
– NO: confirm theory by rejecting corresponding null hypothesis
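The gap between Pr(D | H) and Pr(H | D) can be made concrete with Bayes' rule (the numbers below are made up for illustration, not from the slides' blackboard example): Pr(H | D) also depends on the prior Pr(H) and on how likely the data is under alternatives, both of which a significance test ignores.

```python
# Sketch with assumed numbers: Bayes' rule
#   Pr(H | D) = Pr(D | H) * Pr(H) / Pr(D)
# shows that a high Pr(D | H) need not imply a high Pr(H | D).
pr_h = 0.01             # assumed prior belief in the hypothesis
pr_d_given_h = 0.8      # the data is likely under H ...
pr_d_given_not_h = 0.3  # ... but also fairly likely without H

pr_d = pr_d_given_h * pr_h + pr_d_given_not_h * (1 - pr_h)
pr_h_given_d = pr_d_given_h * pr_h / pr_d
print(pr_h_given_d)  # ≈ 0.026, even though Pr(D | H) = 0.8
```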
Literature
Read the wonderful articles by Jacob Cohen
– Things I have learned (so far), American Psychologist, 45(12):1304–1312, 1990
– The earth is round (p < .05), American Psychologist, 49(12):997–1003, 1994