Statistics and Data Analysis in Proficiency Testing

Michael Thompson
School of Biological and Chemical Sciences
Birkbeck College (University of London)
Malet Street
London WC1E 7HX



Organisation of a proficiency test

“Harmonised Protocol”, Pure Appl. Chem., 2006, 78, 145-196.


Where do we use statistics in proficiency testing?

• Finding a consensus, and its uncertainty, to use as an assigned value
• Assessing participants’ results
• Assessing the efficacy of the PT scheme
• Testing the distributed test material for sufficient homogeneity and stability
• Others


Criteria for an ideal scoring method

• Adds value to raw results.
• Easily understandable, based on the properties of the normal distribution.
• Has no arbitrary scaling transformation.
• Is transferable between different concentrations, analytes, matrices, and measurement principles.


How can we construct a score?

• An obvious idea is to utilise the properties of the normal distribution to interpret the results of a proficiency test.

BUT…

We do not make any assumptions about the actual data.


Example dataset A

• Determination of protein nitrogen in a meat product.


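A weak scoring method

• On average, slightly more than 95% of laboratories receive a z-score within the range ±2.

$z = (x - \bar{x})/s$, where $\bar{x} = 2.126$ and $s = 0.077$ are the mean and standard deviation of the participants’ results.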


Robust mean and standard deviation

• Robust statistics is applicable to datasets that look like normally distributed samples contaminated with outliers and stragglers (i.e., unimodal and roughly symmetric).
• The method downweights the otherwise large influence of outliers and stragglers on the estimates.
• It models the central ‘reliable’ part of the dataset.

Robust estimates: $\hat{\mu}_{rob}$, $\hat{\sigma}_{rob}$


Can I use robust estimates?

[Figure: example distributions along the measurement axis: skewed, bimodal, heavy-tailed]


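Huber’s H15

Data: $\mathbf{x}^T = (x_1, x_2, \ldots, x_n)$

Set $1 < k < 2$, $p = 0$, $\hat{\mu}_0 = \mathrm{median}(\mathbf{x})$, $\hat{\sigma}_0 = 1.5\,\mathrm{MAD}(\mathbf{x})$.

$$\tilde{x}_i = \begin{cases} \hat{\mu}_p - k\hat{\sigma}_p & \text{if } x_i < \hat{\mu}_p - k\hat{\sigma}_p \\ \hat{\mu}_p + k\hat{\sigma}_p & \text{if } x_i > \hat{\mu}_p + k\hat{\sigma}_p \\ x_i & \text{if } \hat{\mu}_p - k\hat{\sigma}_p \le x_i \le \hat{\mu}_p + k\hat{\sigma}_p \end{cases}$$

$$\hat{\mu}_{p+1} = \mathrm{mean}(\tilde{\mathbf{x}}), \qquad \hat{\sigma}_{p+1}^2 = f(k)\,\mathrm{var}(\tilde{\mathbf{x}})$$

where $f(k)$ is a factor that corrects for the variance removed by winsorisation.

If not converged, set $p \to p + 1$ and repeat.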
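The H15 iteration can be sketched in Python as follows. This is a minimal illustration, not the AMC's published implementation: it assumes k = 1.5, takes f(k) as the reciprocal of the variance of a standard normal variable winsorised at ±k (the usual choice, which makes σ̂ consistent for normal data), and stops when successive estimates change by less than a small fraction of σ̂. The data are hypothetical.

```python
import math
import statistics


def winsorised_variance_factor(k):
    """f(k): reciprocal of the variance of a standard normal winsorised at +/-k.

    Assumed form of f(k); the slide only names the factor.
    """
    phi = math.exp(-k * k / 2) / math.sqrt(2 * math.pi)  # normal density at k
    Phi = 0.5 * (1 + math.erf(k / math.sqrt(2)))         # normal CDF at k
    beta = (2 * Phi - 1) - 2 * k * phi + 2 * k * k * (1 - Phi)
    return 1 / beta


def huber_h15(x, k=1.5, tol=1e-6, max_iter=100):
    """Sketch of Huber's H15: iteratively winsorised robust mean and SD."""
    mu = statistics.median(x)                                  # mu_0 = median
    sigma = 1.5 * statistics.median(abs(xi - mu) for xi in x)  # sigma_0 = 1.5 MAD
    if sigma == 0:
        raise ValueError("MAD is zero; H15 cannot start")
    f_k = winsorised_variance_factor(k)
    for _ in range(max_iter):
        lo, hi = mu - k * sigma, mu + k * sigma
        x_w = [min(max(xi, lo), hi) for xi in x]  # winsorise at mu_p +/- k sigma_p
        mu_new = statistics.fmean(x_w)
        sigma_new = math.sqrt(f_k * statistics.variance(x_w))
        done = abs(mu_new - mu) <= tol * sigma and abs(sigma_new - sigma) <= tol * sigma
        mu, sigma = mu_new, sigma_new
        if done:
            break
    return mu, sigma


results = [2.10, 2.12, 2.13, 2.11, 2.14, 2.12, 2.13, 2.50]  # hypothetical; 2.50 is an outlier
mu_rob, sigma_rob = huber_h15(results)
print(f"robust mean = {mu_rob:.4f}, robust SD = {sigma_rob:.4f}")
```

Unlike the ordinary mean and standard deviation, the robust estimates here are barely moved by the single outlying result.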


References: robust statistics

• Analytical Methods Committee, Analyst, 1989, 114, 1489.
• AMC Technical Brief No. 6, 2001 (download from www.rsc.org/amc).
• P J Rousseeuw, J. Chemomet., 1991, 5, 1.


Is that enough?

• On average, slightly less than 95% of laboratories receive a z-score between ±2.

$z = (x - \hat{\mu}_{rob})/\hat{\sigma}_{rob}$, where $\hat{\mu}_{rob} = 2.128$ and $\hat{\sigma}_{rob} = 0.048$.


What more do we need?

• We need a method that evaluates the data in relation to its intended use, rather than merely describing it.
• This adds value to the data rather than simply summarising it.
• The method is based on fitness for purpose.


Fitness for purpose

• Fitness for purpose occurs when the uncertainty of the result, $u_f$, gives best value for money.
• If the uncertainty is smaller than $u_f$, the analysis may be too expensive.
• If the uncertainty is larger than $u_f$, the cost and the probability of a mistaken decision will rise.


Fitness for purpose

• The value of $u_f$ can sometimes be estimated objectively by decision-theoretic methods, but is most often simply agreed between the laboratory and the customer by professional judgement.
• In the proficiency test context, $u_f$ should be determined by the scheme provider.

Reference: T Fearn, S A Fisher, M Thompson and S L R Ellison, Analyst, 2002, 127, 818-824.


A score that meets all of the criteria

• If we now define a z-score thus:

$z = (x - \hat{\mu}_{rob})/\sigma_p$, where $\sigma_p = u_f$,

we have a z-score that is both robustified against extreme values and tells us something about fitness for purpose.
• In an exactly compliant laboratory, scores of $2 < |z| < 3$ will be encountered occasionally, and scores of $|z| > 3$ rarely. Better performers will receive fewer of these extreme z-scores.
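To put numbers on “occasionally” and “rarely”: if a compliant laboratory’s results were normally distributed about the assigned value with standard deviation exactly $\sigma_p$ (an idealising assumption), the expected fractions of scores follow from the normal tail areas. A short Python check:

```python
import math


def prob_abs_z_exceeds(c):
    """P(|Z| > c) for a standard normal variable Z."""
    return math.erfc(c / math.sqrt(2))


p_gt2 = prob_abs_z_exceeds(2)
p_gt3 = prob_abs_z_exceeds(3)
p_between = p_gt2 - p_gt3

print(f"P(2 < |z| < 3) = {p_between:.4f}")  # about 1 score in 23
print(f"P(|z| > 3)     = {p_gt3:.4f}")      # about 1 score in 370
```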


Example data A again

• Suppose that the fitness-for-purpose criterion set for the analysis is an RSD of 1%. This gives us: $\sigma_p = 0.01 \times 2.1 = 0.021$.
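The effect of changing the denominator shows up in a single result. The sketch below uses the dataset A values quoted in these slides ($\hat{\mu}_{rob} = 2.128$, $\hat{\sigma}_{rob} = 0.048$, $\sigma_p = 0.021$). The participant result x = 2.20 is hypothetical and chosen only for illustration.

```python
assigned = 2.128                # robust mean of dataset A
sigma_rob = 0.048               # robust SD of dataset A
sigma_p = round(0.01 * 2.1, 3)  # fitness-for-purpose SD: 1% RSD -> 0.021


def z_score(x, x_a, sigma):
    """z = (x - x_a) / sigma."""
    return (x - x_a) / sigma


x = 2.20  # hypothetical participant result
print(round(z_score(x, assigned, sigma_rob), 2))  # 1.5
print(round(z_score(x, assigned, sigma_p), 2))    # 3.43
```

A result that looks unremarkable against the spread of the participants’ results scores beyond 3 against the fitness-for-purpose criterion.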


Finding a consensus from participants’ results

• The consensus is not theoretically the best option for the assigned value, but it is usually the only practicable one.
• The consensus is not necessarily identical with the true value. PT providers have to be alert to this possibility.


What is a ‘consensus’?

• Mean? Easy to calculate, but affected by outliers and asymmetry.
• Robust mean? Fairly easy to calculate; handles outliers but is affected by asymmetry.
• Median? Easy to calculate and more robust for asymmetric distributions, but has a larger standard error than the robust mean.
• Mode? Intuitively good, but difficult to define and difficult to calculate.


The robust mean as consensus

• The robust mean provides a useful consensus in the great majority of instances, where the underlying distribution is roughly symmetric and there are 0-10% outliers.
• The uncertainty of this consensus can be safely taken as

$u(x_a) = \hat{\sigma}_{rob}/\sqrt{n}$
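As a worked illustration with dataset A, taking $\hat{\sigma}_{rob} = 0.048$ and assuming, hypothetically, $n = 25$ participants (the number of results in dataset A is not given here):

```latex
u(x_a) = \frac{\hat{\sigma}_{rob}}{\sqrt{n}} = \frac{0.048}{\sqrt{25}} = \frac{0.048}{5} = 0.0096
```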


When can I use robust estimates?

[Figure: example distributions along the measurement axis: skewed, bimodal, heavy-tailed]


Skewed distributions

• Skews can arise when the participants’ results come from two or more inconsistent methods.
• They can also arise as an artefact at low concentrations of analyte as a result of data-recording practice.
• Rarely, skews can arise when the distribution is truly lognormal.


Possible use of a trimmed dataset?


Can I use the mode? How many modes? Where are they?


The normal kernel density for identifying a mode

$$y = \frac{1}{nh}\sum_{i=1}^{n}\phi\left(\frac{x - x_i}{h}\right)$$

where $\phi(a) = \exp(-a^2/2)/\sqrt{2\pi}$ is the standard normal density and $h$ is the bandwidth.

AMC Technical Brief No. 4
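A minimal Python sketch of the kernel density above, locating modes as local maxima on a fine grid. The grid search, the grid size, the bandwidth h = 0.1 and the data in the usage example are illustrative assumptions; in practice choosing h is a matter of judgement.

```python
import math


def kernel_density(x, data, h):
    """Normal kernel density estimate y(x) with bandwidth h."""
    total = sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)
    return total / (len(data) * h * math.sqrt(2 * math.pi))


def kde_modes(data, h, grid_points=1000):
    """Return grid locations where the density has a local maximum."""
    lo, hi = min(data) - 3 * h, max(data) + 3 * h
    step = (hi - lo) / (grid_points - 1)
    xs = [lo + i * step for i in range(grid_points)]
    ys = [kernel_density(x, data, h) for x in xs]
    return [xs[i] for i in range(1, grid_points - 1)
            if ys[i - 1] < ys[i] >= ys[i + 1]]


# Hypothetical bimodal results, e.g. from two inconsistent methods
results = [0.90, 0.95, 1.00, 1.05, 1.10, 1.90, 2.00, 2.05, 2.10]
print(kde_modes(results, h=0.1))  # two modes, near 1.0 and 2.0
```

A smaller h reveals more (possibly spurious) bumps; a larger h merges them.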


A normal kernel


A kernel density


Another kernel density


Graphical representation of sample data


Kernel density of the aflatoxin data


Uncertainty of the mode

• The uncertainty of the consensus can be estimated as the standard error of the mode, obtained by applying the bootstrap to the mode-finding procedure.
• The bootstrap is a general resampling procedure for estimating the standard errors of complex statistics.
• Reference: Bump-hunting for the proficiency tester – searching for multimodality. P J Lowthian and M Thompson, Analyst, 2002, 127, 1359-1364.
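The bootstrap idea can be sketched as follows: resample the participants’ results with replacement, recompute the mode each time, and take the standard deviation of the replicate modes as the standard error. This is a generic illustration, not the procedure of the reference above. The grid-based mode finder, the bandwidth, the number of replicates and the data are all assumptions.

```python
import math
import random
import statistics


def kde_global_mode(data, h, grid_points=400):
    """Location of the highest peak of a normal kernel density (grid search)."""
    lo, hi = min(data) - 3 * h, max(data) + 3 * h
    step = (hi - lo) / (grid_points - 1)
    best_x, best_y = lo, -1.0
    for i in range(grid_points):
        x = lo + i * step
        # The normalising constant does not move the peak, so it is omitted
        y = sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)
        if y > best_y:
            best_x, best_y = x, y
    return best_x


def bootstrap_se(data, statistic, n_boot=200, seed=1):
    """Bootstrap standard error of statistic(data)."""
    rng = random.Random(seed)
    replicates = [statistic(rng.choices(data, k=len(data))) for _ in range(n_boot)]
    return statistics.stdev(replicates)


results = [1.00, 1.10, 0.90, 1.05, 0.95, 1.02, 0.98, 1.20, 0.85, 1.00]  # hypothetical
mode = kde_global_mode(results, h=0.1)
se = bootstrap_se(results, lambda d: kde_global_mode(d, h=0.1))
print(f"mode = {mode:.3f}, bootstrap SE = {se:.3f}")
```

With multimodal data the resampled mode can jump between peaks, which shows up as a large standard error.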


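The normal mixture model

$$f(y) = \sum_{j=1}^{m} p_j f_j(y), \qquad \sum_{j=1}^{m} p_j = 1$$

$$f_j(y) = \frac{1}{\sigma_j\sqrt{2\pi}}\exp\left(-\frac{(y - \mu_j)^2}{2\sigma_j^2}\right)$$

AMC Technical Brief No. 23, and AMC Software. M Thompson, Accred. Qual. Assur., 2006, 10, 501-505.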


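Mixture models found by the maximum likelihood method (the EM algorithm)

• The E-step

$$\hat{P}(j \mid y_i) = \frac{\hat{p}_j f_j(y_i)}{\sum_{l=1}^{m}\hat{p}_l f_l(y_i)}$$

• The M-step

$$\hat{p}_j = \sum_{i=1}^{n}\hat{P}(j \mid y_i)\,/\,n$$

$$\hat{\mu}_j = \sum_{i=1}^{n} y_i\,\hat{P}(j \mid y_i) \Big/ \sum_{i=1}^{n}\hat{P}(j \mid y_i)$$

$$\hat{\sigma}_j^2 = \sum_{i=1}^{n}(y_i - \hat{\mu}_j)^2\,\hat{P}(j \mid y_i) \Big/ \sum_{i=1}^{n}\hat{P}(j \mid y_i)$$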
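The E- and M-steps can be iterated in a few lines of Python. This is a bare sketch, assuming user-supplied starting values and a fixed iteration limit. The small floor on σ_j is a hypothetical safeguard, not part of the equations, and stops a component collapsing onto a single result. The variance-inflated variant is not implemented.

```python
import math


def normal_pdf(y, mu, sigma):
    """Normal density f_j(y) with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def em_normal_mixture(y, p, mu, sigma, max_iter=500, tol=1e-9, sigma_floor=1e-6):
    """Fit an m-component normal mixture by EM from starting lists p, mu, sigma."""
    n, m = len(y), len(p)
    p, mu, sigma = list(p), list(mu), list(sigma)
    for _ in range(max_iter):
        # E-step: posterior probabilities P(j | y_i)
        post = []
        for yi in y:
            w = [p[j] * normal_pdf(yi, mu[j], sigma[j]) for j in range(m)]
            total = sum(w)
            if total == 0.0:
                raise ValueError("starting values are too far from the data")
            post.append([wj / total for wj in w])
        # M-step: proportions, means and variances weighted by P(j | y_i)
        previous = mu[:]
        for j in range(m):
            weight = sum(post[i][j] for i in range(n))
            p[j] = weight / n
            mu[j] = sum(post[i][j] * y[i] for i in range(n)) / weight
            var = sum(post[i][j] * (y[i] - mu[j]) ** 2 for i in range(n)) / weight
            sigma[j] = max(math.sqrt(var), sigma_floor)
        if max(abs(a - b) for a, b in zip(mu, previous)) < tol:
            break
    return p, mu, sigma


results = [0.90, 0.95, 1.00, 1.05, 1.10, 1.90, 2.00, 2.05, 2.10]  # hypothetical
p_hat, mu_hat, sigma_hat = em_normal_mixture(
    results, p=[0.5, 0.5], mu=[0.8, 2.2], sigma=[0.3, 0.3])
print(p_hat, mu_hat, sigma_hat)
```

EM converges to a local maximum of the likelihood, so different starting values can give different fits.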


Kernel density and fit of 2-component normal mixture model


Kernel density and variance-inflated mixture model


Useful References

• Mixture models: M Thompson, Accred. Qual. Assur., 2006, 10, 501-505; AMC Technical Brief No. 23, 2006, www.rsc.org/amc.
• Kernel densities: B W Silverman, Density Estimation for Statistics and Data Analysis, Chapman and Hall, London, 1986; AMC Technical Brief No. 4, 2001, www.rsc.org/amc.
• The bootstrap: B Efron and R J Tibshirani, An Introduction to the Bootstrap, Chapman and Hall, London, 1993; AMC Technical Brief No. 8, 2001, www.rsc.org/amc.


Conclusions: scoring

• Use z-scores based on fitness for purpose.
• If the dataset is roughly symmetric, estimate the consensus as the robust mean and its uncertainty as $\hat{\sigma}_{rob}/\sqrt{n}$.
• If the dataset is skewed and plausibly composite, use kernel density methods or mixture models.


Homogeneity testing

• Comminute and mix the bulk material.
• Split it into distribution units.
• Select m > 10 distribution units at random.
• Homogenise each one.
• Analyse two test portions from each unit in random order, with high precision, and conduct a one-way analysis of variance on the results.


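Design for homogeneity testing

$s_{an} = \sqrt{MSW}$, $\quad s_{sam} = \sqrt{(MSB - MSW)/2}$

where MSW and MSB are the within-unit and between-unit mean squares from the one-way analysis of variance.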
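For duplicate analyses the mean squares reduce to simple closed forms, so the estimates can be computed without an ANOVA package. A minimal sketch: the data are hypothetical (only four units, for brevity), and truncating a negative MSB - MSW to zero is a common convention rather than something stated on this slide.

```python
import math
import statistics


def duplicate_anova(pairs):
    """One-way ANOVA for m distribution units, each analysed in duplicate.

    Returns (s_an, s_sam, msb, msw).
    """
    m = len(pairs)
    # Within-unit mean square: for duplicates, SS_within = sum((a - b)^2) / 2 on m df
    msw = sum((a - b) ** 2 for a, b in pairs) / (2 * m)
    # Between-unit mean square: 2 x variance of the unit means, on m - 1 df
    unit_means = [(a + b) / 2 for a, b in pairs]
    msb = 2 * statistics.variance(unit_means)
    s_an = math.sqrt(msw)
    s_sam = math.sqrt(max(msb - msw, 0.0) / 2)  # negative estimate truncated to zero
    return s_an, s_sam, msb, msw


pairs = [(10.1, 10.3), (10.0, 10.2), (10.4, 10.4), (10.2, 10.0)]  # hypothetical duplicates
s_an, s_sam, msb, msw = duplicate_anova(pairs)
print(f"MSW = {msw:.4f}, MSB = {msb:.4f}, s_an = {s_an:.4f}, s_sam = {s_sam:.4f}")
```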


Problems with simple ANOVA based on testing $H_0: \sigma_{sam} = 0$

• Analytical precision too low: the method cannot detect a consequential degree of heterogeneity.
• Analytical precision too high: the method finds a significant degree of heterogeneity that may not be consequential.

(Everything is heterogeneous!)


“Sufficient homogeneity”: original definition

• Material passes the homogeneity test if $s_{sam} < \sigma_L = 0.3\sigma_p$.
• Problems are:
  – $s_{sam}$ may not be well estimated;
  – there is too big a probability of rejecting satisfactory test material.


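Fearn test

• Test $H_0: \sigma_{sam}^2 \le \sigma_L^2$ by rejecting when

$$s_{sam}^2 > \frac{\chi^2_{m-1}}{m-1}\,\sigma_L^2 + \frac{F_{m-1,m} - 1}{2}\,s_{an}^2$$

where $\chi^2_{m-1}$ and $F_{m-1,m}$ are upper percentage points of the chi-squared and F distributions with the indicated degrees of freedom.

Ref: Analyst, 2001, 127, 1359-1364.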
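A sketch of the Fearn test decision, assuming $\sigma_L = 0.3\sigma_p$ as in the original definition and upper 95% percentage points. To stay within the Python standard library, the χ² and F percentage points are passed in by the caller (from tables or a statistics package). In the usage example, χ²(9) = 16.919 and F(9, 10) = 3.02 are the standard 95% table values for m = 10; the other inputs are hypothetical.

```python
def fearn_test(s_an, s_sam, sigma_p, m, chi2_q, f_q):
    """Return (passes, critical) for the Fearn homogeneity test.

    chi2_q: upper percentage point of chi-squared on m - 1 df
    f_q:    upper percentage point of F on (m - 1, m) df
    """
    sigma_l = 0.3 * sigma_p  # allowed sampling SD (assumed 0.3 sigma_p)
    f1 = chi2_q / (m - 1)
    f2 = (f_q - 1) / 2
    critical = f1 * sigma_l ** 2 + f2 * s_an ** 2
    return s_sam ** 2 <= critical, critical


# m = 10 units; 95% points chi2(9) = 16.919, F(9, 10) = 3.02
passes, critical = fearn_test(s_an=0.12, s_sam=0.20, sigma_p=0.5, m=10,
                              chi2_q=16.919, f_q=3.02)
print(passes, round(critical, 4))  # True 0.0568
```

The critical value grows with $s_{an}^2$, so imprecise analysis does not by itself cause satisfactory material to be rejected.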


Problems with homogeneity data

• Problems with data are common: e.g., no proper randomisation, insufficient precision, biases, trends, steps, insufficient significant figures recorded, outliers.
• Laboratories need detailed instructions.
• Data need careful scrutiny before statistical analysis.
• HP1 is incorrect in saying that all outlying data should be retained.


General references

• The Harmonised Protocol (revised): M Thompson, S L R Ellison and R Wood, Pure Appl. Chem., 2006, 78, 145-196.
• R E Lawn, M Thompson and R F Walker, Proficiency Testing in Analytical Chemistry, The Royal Society of Chemistry, Cambridge, 1997.
• ISO Guide 43, International Organization for Standardization, Geneva, 1997.