Estimation, Variation and Uncertainty
Simon French
Aims of Session
• gain a greater understanding of the estimation of parameters and variables.
• gain an appreciation of point estimation.
• gain an appreciation of how to assess the uncertainty and confidence levels in estimates
Cynefin
• Known: the realm of Scientific Knowledge. Cause and effect understood and predictable.
• Knowable: the realm of Scientific Inquiry. Cause and effect can be determined with sufficient data.
• Complex: the realm of Social Systems. Cause and effect may be determined after the event.
• Chaotic: cause and effect not discernible.
Cynefin and statistics
[Figure: Cynefin diagram annotated with statistical approaches. Labels: Repeatable events; Unique events; Events?; Estimation and confirmatory analysis; Exploratory analyses]
Frequentist Statistics
Key point: Probability represents a long run frequency of occurrence
[Figure: Proportion of heads in tosses of a coin, plotted against the number of tosses (1 to 10000)]
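The long-run-frequency idea can be illustrated with a short simulation. This is only a sketch: the fair coin (`p_heads=0.5`), the seed, and the function name are assumptions made for illustration, and the checkpoints simply reuse the toss counts shown on the figure's axis.

```python
import random

def running_proportion_of_heads(checkpoints, p_heads=0.5, seed=0):
    """Toss a simulated coin and record the proportion of heads at each checkpoint."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    wanted = set(checkpoints)
    heads = 0
    proportions = {}
    for n in range(1, max(checkpoints) + 1):
        if rng.random() < p_heads:     # one toss: heads with probability p_heads
            heads += 1
        if n in wanted:
            proportions[n] = heads / n
    return proportions

checkpoints = [1, 2, 5, 10, 20, 50, 100, 1000, 10000]
proportions = running_proportion_of_heads(checkpoints)
for n in checkpoints:
    print(f"{n:>6} tosses: proportion of heads = {proportions[n]:.3f}")
```

Early proportions jump around; by 10000 tosses the proportion should sit close to the assumed probability of heads.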
Frequentist Statistics
• Scientific Method is based upon repeatability of experiments
• Parameters in a (scientific) model or theory are fixed
  ⇒ Cannot talk of the probability of an objective quantity or parameter value
• Data come from repeatable experiments
  ⇒ Can talk of the probability of a data value
Measurement and Variation of Objective Quantities
• Ideally we simply perform an experiment and measure the quantities that interest us
• But variation and experimental error mean that we cannot simply do this
• So we need to make multiple measurements, learn about the variation and estimate the quantity of interest
Estimation
Try to find a function of the data that is tightly distributed about the quantity of interest.
[Figure: two panels. Distribution of data: a data point shown against the quantity of interest, $\theta$. Distribution of mean: the data mean shown against the quantity of interest, $\theta$.]
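A small simulation can show why the data mean is a useful function of the data: it is more tightly distributed about the quantity of interest than a single data point. This is a sketch under assumed values (normal errors, a true value of 10, a standard deviation of 2, samples of 25), none of which come from the slides.

```python
import random
import statistics

def spread_of_data_and_mean(theta=10.0, sigma=2.0, n=25, n_repeats=2000, seed=1):
    """Compare the spread of single data points with the spread of sample means."""
    rng = random.Random(seed)
    single_points = []
    sample_means = []
    for _ in range(n_repeats):
        # One repeat of the experiment: n noisy measurements of theta.
        sample = [rng.gauss(theta, sigma) for _ in range(n)]
        single_points.append(sample[0])
        sample_means.append(statistics.mean(sample))
    return statistics.stdev(single_points), statistics.stdev(sample_means)

sd_data, sd_mean = spread_of_data_and_mean()
print(f"spread of a single data point: {sd_data:.3f}")
print(f"spread of the data mean:       {sd_mean:.3f}")
```

With independent errors the spread of the mean should be close to sigma divided by the square root of n, here about 0.4.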
Confidence intervals
Intervals defined from the data
95% confidence intervals: calculate the interval for each of 100 data sets; about 95 will contain $\theta$.
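The "about 95 out of 100" reading can be checked by simulation. The sketch below assumes normal data with a known standard deviation, so the interval is the sample mean plus or minus 1.96 standard errors; the true value (called `theta` in the code), the sample size, and the seed are illustrative assumptions.

```python
import math
import random
import statistics

def count_intervals_containing_theta(theta=5.0, sigma=1.0, n=20, n_datasets=100, seed=2):
    """Build a 95% interval from each simulated data set and count how many contain theta."""
    rng = random.Random(seed)
    half_width = 1.96 * sigma / math.sqrt(n)   # known-sigma interval (an assumption)
    covered = 0
    for _ in range(n_datasets):
        sample = [rng.gauss(theta, sigma) for _ in range(n)]
        mean = statistics.mean(sample)
        if mean - half_width <= theta <= mean + half_width:
            covered += 1
    return covered

covered = count_intervals_containing_theta()
print(f"{covered} of 100 intervals contain the true value")
```

The count varies from run to run; it is the procedure, not any single interval, that has the 95% property.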
Uncertainty
• But there is more uncertainty in what we do than just variation and experimental error
• We do our calculations in a statistical model.
• But the model is not the real world
• So there is modelling error – which covers a multitude of sins!
Uncertainty
• So a 95% confidence interval may represent a much greater uncertainty!
• Studies have shown that the uncertainty bounds given by scientists (and others!) are often overconfident by a factor of 10.
Estimation of model parameters
• Sometimes the quantities that we wish to estimate do not exist!
• Parameters may only have existence within a model
  – Transfer coefficients
  – Release height in atmospheric dispersion
  – Risk aversion
Why do we want estimates?
• [Remember our exhortations that you should be clear on your research objectives or questions.]
• To measure ‘something out there’
• To find the parameter to use for some purpose in a model
  – Evaluation of systems
  – Prediction of some effect
  – May use estimates of parameters and their uncertainty to predict how a complex system may evolve, e.g. through Monte Carlo methods.
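As a rough illustration of the Monte Carlo idea, the sketch below pushes an uncertain parameter through a model and summarises the resulting predictions. Everything specific here is hypothetical: the exponential-decay model in `predict`, the estimate 0.30 with standard error 0.05, and the normal distribution used to represent the parameter uncertainty.

```python
import math
import random
import statistics

def predict(decay_rate, t=5.0, initial=100.0):
    """Hypothetical model: amount remaining after time t under exponential decay."""
    return initial * math.exp(-decay_rate * t)

def monte_carlo_prediction(k_estimate=0.30, k_std_error=0.05, n_draws=10000, seed=3):
    """Draw parameter values from their assumed uncertainty and propagate each through the model."""
    rng = random.Random(seed)
    outputs = sorted(predict(rng.gauss(k_estimate, k_std_error)) for _ in range(n_draws))
    lower = outputs[int(0.025 * n_draws)]        # approximate 2.5th percentile
    upper = outputs[int(0.975 * n_draws) - 1]    # approximate 97.5th percentile
    return statistics.mean(outputs), lower, upper

mc_mean, mc_lower, mc_upper = monte_carlo_prediction()
plug_in = predict(0.30)
print(f"plug-in prediction:    {plug_in:.2f}")
print(f"Monte Carlo mean:      {mc_mean:.2f}")
print(f"central 95% of draws:  {mc_lower:.2f} to {mc_upper:.2f}")
```

Because the model is nonlinear in the parameter, the Monte Carlo mean differs from the prediction made with the point estimate alone, and the spread of draws shows how parameter uncertainty carries through to the prediction.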
Independence
• Many estimation methods assume that each error is probabilistically independent of the other errors… and often they are far from independent.
  – 1700 2 ‘independent’ samples
  – IPCC work on climate change
• Dependence in data changes (increases!) the uncertainty in the estimates
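The effect of dependence can be seen in a simulation that compares the spread of a sample mean under independent errors with its spread under positively correlated errors. The sketch assumes a first-order autoregressive error structure with correlation `rho` between neighbouring errors; that structure, like the other numbers, is chosen only for illustration.

```python
import math
import random
import statistics

def sd_of_mean(rho, n=50, sigma=1.0, n_repeats=2000, seed=4):
    """Spread of the mean of n errors whose neighbours have correlation rho."""
    rng = random.Random(seed)
    innovation_sd = sigma * math.sqrt(1.0 - rho ** 2)   # keeps each error's variance at sigma**2
    means = []
    for _repeat in range(n_repeats):
        error = rng.gauss(0.0, sigma)
        total = error
        for _step in range(n - 1):
            # Each error carries over a fraction rho of the previous one.
            error = rho * error + rng.gauss(0.0, innovation_sd)
            total += error
        means.append(total / n)
    return statistics.stdev(means)

sd_independent = sd_of_mean(rho=0.0)
sd_dependent = sd_of_mean(rho=0.8)
print(f"independent errors: spread of mean = {sd_independent:.3f}")
print(f"dependent errors:   spread of mean = {sd_dependent:.3f}")
```

A formula that assumes independence would report the first figure in both cases, understating the uncertainty when the errors are in fact correlated.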
Rev. Thomas Bayes
• 1701?-1761
• Main work published posthumously: T. Bayes (1763) An essay towards solving a problem in the doctrine of chances. Phil. Trans. Roy. Soc. 53, 370-418
• Bayes Theorem – inverse probability
Bayes theorem
$$p(\theta \mid x) \propto p(x \mid \theta) \times p(\theta)$$
Posterior probability ∝ likelihood × prior probability
• $p(\theta)$: probability distribution of parameters
• $p(x \mid \theta)$: likelihood of data given parameters
• $p(\theta \mid x)$: probability distribution of parameters given data
• There is a constant, but it is ‘easy’ to find as probability adds (integrates) to one
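Writing $\theta$ for the parameters and $x$ for the data (symbols assumed here), the constant can be made explicit. Requiring the posterior to integrate to one gives the standard normalised form:

$$p(\theta \mid x) = \frac{p(x \mid \theta)\, p(\theta)}{\int p(x \mid \theta')\, p(\theta')\, d\theta'}$$

The denominator does not depend on $\theta$, which is why the theorem is often stated as a proportionality.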
On the treatment of negative intensity measurements
Simon French
Crystallography data
• Roughly, x-rays shone at a crystal diffract into many rays radiating out in a fixed pattern from the crystal.
• The intensities of these diffracted rays are related to the modulus of the coefficients in the Fourier expansion of the electron density of the molecule.
• So getting hold of the intensities gives structural information
Intensity measurement
• Measure X-ray intensity in a diffracted ray and subtract the background ‘near to it’

Measured intensity, I = ray strength - background

• But in protein crystallography most intensities are small relative to background so some are ‘measured’ as negative
• And theory says they are non-negative …
• Approaches in the early 1970s simply set negative measurements to zero … and got biased data sets
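The bias from setting negative measurements to zero shows up in a simple simulation. This sketch assumes normally distributed measurement error around a small true intensity; the true value, noise level, and seed are illustrative and not taken from any data set.

```python
import random
import statistics

def mean_after_clipping(true_intensity=1.0, noise_sd=3.0, n_measurements=20000, seed=5):
    """Simulate noisy intensity measurements, then set the negative ones to zero."""
    rng = random.Random(seed)
    measured = [rng.gauss(true_intensity, noise_sd) for _ in range(n_measurements)]
    clipped = [max(value, 0.0) for value in measured]     # the early-1970s treatment
    fraction_negative = sum(value < 0 for value in measured) / n_measurements
    return statistics.mean(measured), statistics.mean(clipped), fraction_negative

raw_mean, clipped_mean, fraction_negative = mean_after_clipping()
print(f"fraction measured as negative: {fraction_negative:.2f}")
print(f"mean of raw measurements:      {raw_mean:.2f}")
print(f"mean after setting to zero:    {clipped_mean:.2f}")
```

The raw measurements average close to the true intensity, while the clipped ones average noticeably higher, so the clipped data set is biased upwards.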
A Bayesian approach
• Good reason to think the likelihood for intensity measurements is near normal
  – Difference of Poisson (‘counting statistics’)
  – Further ‘corrections’
• Theory gives the prior: “Wilson’s statistics” (AJC Wilson 1949)
• Estimate with the posterior mean
$$E(J \mid I) \propto \int_0^\infty J \, \underbrace{p(I \mid J)}_{\text{Normal likelihood}} \, \underbrace{p(J)}_{\text{Wilson's statistics}} \, dJ$$
Simon French and Keith Wilson (1978)
On the treatment of negative intensity measurements
Acta Crystallographica A34, 517-525
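A numerical sketch of the posterior-mean estimate follows. It is not the procedure from the paper: it assumes a normal likelihood with known standard deviation `sigma`, takes the prior to be the exponential form of Wilson's statistics used for acentric reflections with mean intensity `wilson_mean`, and evaluates the integrals with a simple trapezoidal rule, dividing by the normalising integral. The function and parameter names are hypothetical.

```python
import math

def posterior_mean_intensity(i_obs, sigma, wilson_mean, n_grid=20001):
    """Posterior mean of the true intensity J >= 0 given a measured intensity i_obs."""
    upper = max(i_obs, 0.0) + 10.0 * sigma           # posterior mass beyond this is negligible
    step = upper / (n_grid - 1)
    grid = [k * step for k in range(n_grid)]
    # log of (normal likelihood x exponential prior), up to an additive constant
    log_terms = [-0.5 * ((i_obs - j) / sigma) ** 2 - j / wilson_mean for j in grid]
    largest = max(log_terms)                         # subtract the maximum to avoid underflow
    weights = [math.exp(t - largest) for t in log_terms]
    weights[0] *= 0.5                                # trapezoidal end-point weights
    weights[-1] *= 0.5
    numerator = sum(j * w for j, w in zip(grid, weights))
    denominator = sum(weights)
    return numerator / denominator

negative_case = posterior_mean_intensity(i_obs=-2.0, sigma=1.0, wilson_mean=50.0)
strong_case = posterior_mean_intensity(i_obs=100.0, sigma=1.0, wilson_mean=50.0)
print(f"measured -2.0  -> posterior mean {negative_case:.3f}")
print(f"measured 100.0 -> posterior mean {strong_case:.3f}")
```

A negative measurement yields a small positive estimate rather than zero, while a strong measurement is left almost unchanged.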
Bayesian Estimation
Toss a biased coin 12 times; obtain 9 heads
[Figure: prior and posterior distributions]
Take mean, median or mode
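The three point estimates can be computed on a grid for the coin example of 9 heads in 12 tosses. The slides do not state the prior, so this sketch assumes a uniform prior on the probability of heads; with a different prior the numbers would change.

```python
def posterior_on_grid(heads=9, tosses=12, n_grid=10001):
    """Posterior for the probability of heads on a grid, assuming a uniform prior."""
    grid = [k / (n_grid - 1) for k in range(n_grid)]
    # likelihood x prior; the uniform prior contributes a constant factor
    unnormalised = [t ** heads * (1.0 - t) ** (tosses - heads) for t in grid]
    total = sum(unnormalised)
    weights = [u / total for u in unnormalised]      # normalise so the weights add to one
    return grid, weights

def point_estimates(grid, weights):
    """Posterior mean, median and mode from grid weights."""
    mean = sum(t * w for t, w in zip(grid, weights))
    mode = grid[max(range(len(grid)), key=weights.__getitem__)]
    cumulative = 0.0
    median = grid[-1]
    for t, w in zip(grid, weights):
        cumulative += w
        if cumulative >= 0.5:                        # first grid point with half the mass below it
            median = t
            break
    return mean, median, mode

grid, weights = posterior_on_grid()
post_mean, post_median, post_mode = point_estimates(grid, weights)
print(f"mean {post_mean:.3f}, median {post_median:.3f}, mode {post_mode:.3f}")
```

Under the uniform prior the mode coincides with the observed proportion 9/12, while the mean and median are pulled slightly towards one half.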
Bayesian confidence interval
Toss a biased coin 12 times; obtain 9 heads
[Figure: prior and posterior distributions]
Highest 95% density
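A highest-density interval can be approximated on a grid by collecting the most probable values until they hold 95% of the posterior mass. As before, the uniform prior is an assumption (the slides do not state the prior), and the approach relies on the posterior having a single peak.

```python
def highest_density_interval(heads=9, tosses=12, mass=0.95, n_grid=10001):
    """Approximate highest-density interval for the probability of heads (uniform prior assumed)."""
    grid = [k / (n_grid - 1) for k in range(n_grid)]
    unnormalised = [t ** heads * (1.0 - t) ** (tosses - heads) for t in grid]
    total = sum(unnormalised)
    weights = [u / total for u in unnormalised]
    # Visit grid points from the highest posterior density downwards.
    order = sorted(range(n_grid), key=lambda k: weights[k], reverse=True)
    included = []
    covered = 0.0
    for k in order:
        included.append(k)
        covered += weights[k]
        if covered >= mass:
            break
    # For a single-peaked posterior the included points form one interval.
    return grid[min(included)], grid[max(included)], covered

hdi_lower, hdi_upper, hdi_mass = highest_density_interval()
print(f"95% highest-density interval: {hdi_lower:.3f} to {hdi_upper:.3f}")
```

Unlike a frequentist confidence interval, this interval is a direct probability statement about the parameter given the data and the assumed prior.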