Estimation, Variation and Uncertainty

Simon French

[email protected]

Aims of Session

• gain a greater understanding of the estimation of parameters and variables.

• gain an appreciation of point estimation.

• gain an appreciation of how to assess the uncertainty and confidence levels in estimates.

Cynefin and statistics

• Known: the realm of Scientific Knowledge. Cause and effect understood and predictable.

• Knowable: the realm of Scientific Inquiry. Cause and effect can be determined with sufficient data.

• Complex: the realm of Social Systems. Cause and effect may be determined after the event.

• Chaotic: cause and effect not discernible.

[Figure: Cynefin diagram, annotated with "Repeatable events", "Unique events", "Events?", "Estimation and confirmatory analysis" and "Exploratory analyses"]

Frequentist Statistics

Key point: Probability represents a long run frequency of occurrence

[Figure: proportion of heads in tosses of a coin (vertical axis) against number of tosses (horizontal axis, 1 to 10,000)]
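The long-run-frequency idea can be illustrated by simulation. The following is a minimal sketch, not part of the original slides: it assumes a fair coin and a fixed random seed, and the function name `running_proportions` is made up for illustration. It records the proportion of heads at the toss counts shown on the figure's axis.

```python
import random

def running_proportions(checkpoints, p_heads=0.5, seed=1):
    """Toss a coin max(checkpoints) times; record the proportion of heads at each checkpoint."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable (an assumption)
    wanted = set(checkpoints)
    heads = 0
    result = {}
    for n in range(1, max(checkpoints) + 1):
        heads += rng.random() < p_heads   # True counts as 1
        if n in wanted:
            result[n] = heads / n
    return result

checkpoints = [1, 2, 5, 10, 20, 50, 100, 1000, 10000]
props = running_proportions(checkpoints)
for n in checkpoints:
    print(f"{n:>6} tosses: proportion of heads = {props[n]:.3f}")
```

With few tosses the proportion can be far from 0.5; as the number of tosses grows it settles close to the underlying probability.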

Frequentist Statistics

• Scientific Method is based upon repeatability of experiments

• Parameters in a (scientific) model or theory are fixed
  ⇒ Cannot talk of the probability of an objective quantity or parameter value

• Data come from repeatable experiments
  ⇒ Can talk of the probability of a data value

Measurement and Variation of Objective Quantities

• Ideally we simply perform an experiment and measure the quantities that interest us

• But variation and experimental error mean that we cannot simply do this

• So we need to make multiple measurements, learn about the variation and estimate the quantity of interest

Estimation

Try to find a function of the data that is tightly distributed about the quantity of interest.

[Figure: two distributions. Distribution of data, marking a data point and the quantity of interest, θ. Distribution of mean, marking the data mean and the quantity of interest, θ.]
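As a rough illustration of why the mean is a useful function of the data, the sketch below compares the spread of single data points with the spread of sample means over many repeated experiments. Everything numeric here is an assumption for illustration: normally distributed measurements, θ = 10, standard deviation 2, and 25 measurements per experiment.

```python
import random
import statistics

def simulate_means(theta=10.0, sigma=2.0, n=25, n_experiments=2000, seed=2):
    """Repeat an experiment of n measurements; keep one data point and the mean from each."""
    rng = random.Random(seed)
    single_points, means = [], []
    for _ in range(n_experiments):
        data = [rng.gauss(theta, sigma) for _ in range(n)]
        single_points.append(data[0])
        means.append(statistics.fmean(data))
    return single_points, means

points, means = simulate_means()
sd_points = statistics.stdev(points)
sd_means = statistics.stdev(means)
print(f"spread of single data points: {sd_points:.3f}")
print(f"spread of sample means:       {sd_means:.3f}")
```

Under these assumptions the spread of the mean is close to sigma divided by the square root of n, so the mean is much more tightly distributed about θ than any single data point.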

Confidence intervals

Intervals defined from the data.

95% confidence intervals: calculate an interval for each of 100 data sets; about 95 will contain the quantity of interest, θ.
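The coverage statement can be checked by simulation. This is a sketch under simplifying assumptions that are not in the slides: normal data with a known standard deviation, so that the interval is the sample mean plus or minus 1.96 standard errors. The function name `count_covering` and all numeric values are illustrative.

```python
import random
import statistics

Z95 = statistics.NormalDist().inv_cdf(0.975)   # two-sided 95% normal quantile, about 1.96

def count_covering(theta=5.0, sigma=1.0, n=20, n_sets=100, seed=3):
    """Count how many of n_sets simulated data sets give an interval containing theta."""
    rng = random.Random(seed)
    half_width = Z95 * sigma / n ** 0.5        # sigma is assumed known
    covered = 0
    for _ in range(n_sets):
        mean = statistics.fmean(rng.gauss(theta, sigma) for _ in range(n))
        if mean - half_width <= theta <= mean + half_width:
            covered += 1
    return covered

covered = count_covering()
print(f"{covered} of 100 intervals contain theta")

many = count_covering(n_sets=5000)
print(f"long-run coverage: {many / 5000:.3f}")
```

Note that the probability statement is about the procedure over repeated data sets: θ is fixed, and it is the interval that varies from one data set to the next.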

Uncertainty

• But there is more uncertainty in what we do than just variation and experimental error

• We do our calculations in a statistical model.

• But the model is not the real world

• So there is modelling error, which covers a multitude of sins!

Uncertainty

• So a 95% confidence interval may represent a much greater uncertainty!

• Studies have shown that the uncertainty bounds given by scientists (and others!) are often overconfident by a factor of 10.

Estimation of model parameters

• Sometimes the quantities that we wish to estimate do not exist!

• Parameters may only have existence within a model
  – Transfer coefficients
  – Release height in atmospheric dispersion
  – Risk aversion

Why do we want estimates?

• [Remember our exhortations that you should be clear on your research objectives or questions.]

• To measure ‘something out there’

• To find the parameter to use for some purpose in a model
  – Evaluation of systems
  – Prediction of some effect
  – May use estimates of parameters and their uncertainty to predict how a complex system may evolve, e.g. through Monte Carlo methods.
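A minimal sketch of the Monte Carlo idea mentioned above: draw parameter values from a distribution describing their uncertainty, run the model for each draw, and summarise the spread of the predictions. The model (exponential decay), the parameter estimate and its standard error are all hypothetical and chosen only for illustration.

```python
import math
import random
import statistics

def predict(k, t=5.0, y0=100.0):
    """Hypothetical model: a quantity decays exponentially from y0 at rate k."""
    return y0 * math.exp(-k * t)

def monte_carlo(k_est=0.3, k_se=0.05, n_draws=10000, seed=4):
    """Propagate uncertainty in k (assumed normal) through the model by simulation."""
    rng = random.Random(seed)
    draws = sorted(predict(rng.gauss(k_est, k_se)) for _ in range(n_draws))
    lower = draws[int(0.025 * n_draws)]    # empirical 2.5% point
    upper = draws[int(0.975 * n_draws)]    # empirical 97.5% point
    return statistics.median(draws), lower, upper

median, lower, upper = monte_carlo()
print(f"plug-in prediction:  {predict(0.3):.1f}")
print(f"Monte Carlo median:  {median:.1f}")
print(f"95% prediction band: {lower:.1f} to {upper:.1f}")
```

The band is not symmetric about the plug-in prediction because the model is non-linear in the parameter, which is one reason to simulate rather than quote a single number.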

Independence

• Many estimation methods assume that each error is probabilistically independent of the other errors … and often they are far from independent.
  – 1700 2 ‘independent’ samples
  – IPCC work on climate change

• Dependence in data changes (increases!) the uncertainty in the estimates
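The effect of dependence can be sketched with a simple simulation. It assumes, purely for illustration, that every pair of errors shares the same positive correlation rho (built from one shared error plus an individual error); the slides do not specify any particular dependence structure.

```python
import math
import random
import statistics

def sd_of_mean(rho, n=50, sigma=1.0, n_experiments=2000, seed=5):
    """Spread of the sample mean when every pair of errors has correlation rho."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_experiments):
        shared = rng.gauss(0.0, 1.0)       # error component common to all measurements
        data = [
            sigma * (math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0))
            for _ in range(n)
        ]
        means.append(statistics.fmean(data))
    return statistics.stdev(means)

sd_independent = sd_of_mean(rho=0.0)
sd_dependent = sd_of_mean(rho=0.5)
print(f"spread of mean, independent errors: {sd_independent:.3f}")
print(f"spread of mean, correlated errors:  {sd_dependent:.3f}")
```

In this set-up the variance of the mean is sigma² (1 + (n - 1) rho) / n, so with rho = 0.5 the 50 measurements carry roughly the information of two independent ones. An analysis that assumed independence would understate the uncertainty.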

Bayesian Statistics

Rev. Thomas Bayes

• 1701?-1761

• Main work published posthumously: T. Bayes (1763) An essay towards solving a problem in the doctrine of chances. Phil. Trans. Roy. Soc. 53, 370-418

• Bayes Theorem – inverse probability

Bayes theorem

Posterior probability ∝ likelihood × prior probability

p(θ | x) ∝ p(x | θ) × p(θ)

Bayes theorem

There is a constant, but it is ‘easy’ to find as probability adds (integrates) to one

p(θ | x) ∝ p(x | θ) × p(θ)


Bayes theorem

Probability distribution of parameters: p(θ)

p(θ | x) ∝ p(x | θ) × p(θ)


Bayes theorem

Likelihood of data given parameters: p(x | θ)

p(θ | x) ∝ p(x | θ) × p(θ)


Bayes theorem

Probability distribution of parameters given data: p(θ | x)

p(θ | x) ∝ p(x | θ) × p(θ)
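Bayes theorem can be applied numerically on a grid of parameter values: multiply likelihood by prior at each grid point, then find the constant by making the result integrate to one. The sketch below is illustrative only. It assumes a normal likelihood with known standard deviation and a normal prior, with made-up data, so that the grid answer can be compared with the known closed-form posterior mean for that pairing.

```python
import math

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def grid_posterior(data, grid, step, prior_mean=4.0, prior_sd=2.0, sigma=1.0):
    """Posterior density on a grid: likelihood x prior, rescaled to integrate to one."""
    unnormalised = []
    for theta in grid:
        likelihood = math.prod(normal_pdf(x, theta, sigma) for x in data)
        unnormalised.append(likelihood * normal_pdf(theta, prior_mean, prior_sd))
    constant = sum(unnormalised) * step     # the 'easy' normalising constant
    return [u / constant for u in unnormalised]

step = 0.01
grid = [i * step for i in range(1001)]      # theta from 0 to 10
data = [4.8, 5.6, 5.1]                      # hypothetical measurements
posterior = grid_posterior(data, grid, step)

post_mean = sum(t * p for t, p in zip(grid, posterior)) * step
# Closed form for a normal prior and normal likelihood: precision-weighted average.
exact = (4.0 / 2.0 ** 2 + sum(data) / 1.0 ** 2) / (1.0 / 2.0 ** 2 + len(data) / 1.0 ** 2)
print(f"grid posterior mean:  {post_mean:.4f}")
print(f"exact posterior mean: {exact:.4f}")
```

The grid approach needs no algebra, which makes it a convenient check when the prior and likelihood do not combine in closed form.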

On the treatment of negative intensity measurements

Simon French

[email protected]

Crystallography data

• Roughly, x-rays shone at a crystal diffract into many rays radiating out in a fixed pattern from the crystal.

• The intensities of these diffracted rays are related to the modulus of the coefficients in the Fourier expansion of the electron density of the molecule.

• So getting hold of the intensities gives structural information

Intensity measurement

• Measure X-ray intensity in a diffracted ray and subtract the background ‘near to it’

Measured intensity, I = ray strength - background

• But in protein crystallography most intensities are small relative to background so some are ‘measured’ as negative

• And theory says they are non-negative …

• Approaches in the early 1970s simply set negative measurements to zero … and got biased data sets

A Bayesian approach

• Good reason to think the likelihood for intensity measurements is near normal
  – Difference of Poisson (‘counting statistics’)
  – Further ‘corrections’

• Theory gives the prior: “Wilson’s statistics” (AJC Wilson 1949)

• Estimate with the posterior mean

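E(J | I) = ∫_0^∞ J p(J | I) dJ ∝ ∫_0^∞ J × p(I | J) × p(J) dJ

where J is the true intensity, I is the measured intensity, p(I | J) is the normal likelihood and p(J) is the prior from Wilson’s statistics.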

Simon French and Keith Wilson (1978)

On the treatment of negative intensity measurements

Acta Crystallographica A34, 517-525
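The posterior-mean estimate can be sketched numerically. This is an illustration, not the published procedure: it assumes the acentric form of Wilson's distribution (an exponential prior on the true intensity J, with mean `mean_intensity`), a normal likelihood with standard deviation `sigma`, and made-up numbers for the measurement. The function name and the midpoint-rule integration are choices made here for illustration.

```python
import math

def posterior_mean_intensity(i_obs, sigma, mean_intensity, n_steps=20000):
    """Posterior mean of the true intensity J >= 0 given a measured intensity i_obs.

    Assumes a normal likelihood p(I | J) and an exponential (acentric Wilson) prior p(J).
    Integrals over J >= 0 are approximated with the midpoint rule.
    """
    upper = max(i_obs, 0.0) + 10.0 * sigma   # beyond this the likelihood is negligible
    step = upper / n_steps
    numerator = 0.0
    denominator = 0.0
    for k in range(n_steps):
        j = (k + 0.5) * step
        weight = math.exp(-0.5 * ((i_obs - j) / sigma) ** 2) * math.exp(-j / mean_intensity)
        numerator += j * weight
        denominator += weight
    return numerator / denominator           # the step size cancels in the ratio

est_negative = posterior_mean_intensity(i_obs=-1.0, sigma=2.0, mean_intensity=5.0)
est_strong = posterior_mean_intensity(i_obs=50.0, sigma=2.0, mean_intensity=5.0)
print(f"measured -1.0 -> estimated {est_negative:.3f}")
print(f"measured 50.0 -> estimated {est_strong:.3f}")
```

A negative measurement yields a small positive estimate rather than zero, while a strong measurement is barely changed. That is the behaviour wanted from a treatment that avoids the bias of simply truncating at zero.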

Toss a biased coin 12 times; obtain 9 heads

[Figure: prior and posterior distributions]

Bayes theorem

[Figure: prior and posterior distributions]
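For the coin example, the update from prior to posterior can be written in closed form if the prior is a Beta distribution. The prior used on the slide is not recoverable from the text, so the sketch below assumes a uniform prior, Beta(1, 1). With 9 heads and 3 tails the posterior is then Beta(10, 4). The helper names are illustrative.

```python
import math

def update_beta(a, b, heads, tails):
    """Conjugate update: a Beta(a, b) prior with binomial data gives Beta(a + heads, b + tails)."""
    return a + heads, b + tails

def beta_pdf(theta, a, b):
    """Density of a Beta(a, b) distribution at theta, for 0 < theta < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(theta) + (b - 1) * math.log(1.0 - theta))

# Assumed uniform prior Beta(1, 1); the data are 9 heads in 12 tosses.
a_post, b_post = update_beta(1, 1, heads=9, tails=3)
print(f"posterior: Beta({a_post}, {b_post})")
for theta in (0.25, 0.5, 0.75):
    print(f"theta = {theta}: prior density = {beta_pdf(theta, 1, 1):.3f}, "
          f"posterior density = {beta_pdf(theta, a_post, b_post):.3f}")
```

The posterior concentrates around 0.75, the observed proportion of heads, while still allowing substantial probability on other values after only 12 tosses.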


Bayesian Estimation

Take mean, median or mode

[Figure: prior and posterior distributions]
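The three point estimates can be computed for a concrete posterior. The sketch below uses Beta(10, 4), which is what the coin example (9 heads in 12 tosses) gives under an assumed uniform prior; that prior is an assumption, not something stated on the slide. The mean and mode use the standard Beta formulas and the median is found by accumulating probability on a grid.

```python
import math

def beta_pdf(theta, a, b):
    """Density of a Beta(a, b) distribution at theta, for 0 < theta < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(theta) + (b - 1) * math.log(1.0 - theta))

def point_estimates(a, b, n_grid=20000):
    """Posterior mean, median and mode of a Beta(a, b) distribution (a > 1, b > 1)."""
    mean = a / (a + b)
    mode = (a - 1) / (a + b - 2)
    step = 1.0 / n_grid
    cumulative = 0.0
    median = None
    for k in range(n_grid):
        theta = (k + 0.5) * step
        cumulative += beta_pdf(theta, a, b) * step
        if cumulative >= 0.5:              # first grid point with half the probability below it
            median = theta
            break
    return mean, median, mode

mean, median, mode = point_estimates(10, 4)
print(f"mean = {mean:.3f}, median = {median:.3f}, mode = {mode:.3f}")
```

Because this posterior is skewed, the three summaries differ, and which one to report depends on the purpose the estimate is to serve.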


Bayesian confidence interval

Highest 95% density
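A highest-density interval can be approximated on a grid by collecting the grid cells with the highest posterior density until they hold 95% of the probability. The sketch below again assumes a Beta(10, 4) posterior (the coin example under an assumed uniform prior) and relies on the posterior having a single peak, so that the selected cells form one interval.

```python
import math

def beta_pdf(theta, a, b):
    """Density of a Beta(a, b) distribution at theta, for 0 < theta < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(theta) + (b - 1) * math.log(1.0 - theta))

def hpd_interval(a, b, mass=0.95, n_grid=10000):
    """Approximate highest-density interval for a single-peaked Beta(a, b) posterior."""
    step = 1.0 / n_grid
    cells = [((k + 0.5) * step, beta_pdf((k + 0.5) * step, a, b)) for k in range(n_grid)]
    cells.sort(key=lambda cell: cell[1], reverse=True)   # highest density first
    total = 0.0
    included = []
    for theta, density in cells:
        included.append(theta)
        total += density * step
        if total >= mass:
            break
    return min(included), max(included)

lower, upper = hpd_interval(10, 4)
print(f"95% highest-density interval: {lower:.3f} to {upper:.3f}")
```

Unlike a frequentist confidence interval, this interval is a direct probability statement about the parameter given the data and the prior.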

But why do any of these?

Just report the posterior.

It encodes all that is known about θ.
