Analytical Methods | Critical Review
Cite this: Anal. Methods, 2012, 4, 1598
www.rsc.org/methods
Precision in chemical analysis: a critical survey of uses and abuses
Michael Thompson
School of Biological and Chemical Sciences, Birkbeck, University of London, Malet Street, London WC1E 7HX, UK. E-mail: [email protected]
Received 23rd January 2012, Accepted 24th April 2012
DOI: 10.1039/c2ay25083g
Precision is a key quantity in assessing the quality of chemical measurement results. It enters into
considerations of uncertainty, fitness for purpose, method validation, instrumental performance,
internal quality control, proficiency testing, and higher-level activities. The standard deviation of
measurement results derived from a single analytical ‘system’ (a combination of a particular analytical
procedure and a specific type of test material) depends on many factors, including the conditions of
measurement, the state of the test material, and the concentration of the analyte. It is essential that
these factors are properly matched to the use to which the precision information will be put.
Apologia
Does the analytical community really need yet another long
document about data quality? Well, in relation to precision, I
think that the answer is ‘yes’. In more than 50 years as an analyst
I have seen (and am still seeing) numerous examples of incorrect
estimation and inappropriate use of precision, in method devel-
opment, validation, quality control and, especially, in relation to
the estimation of uncertainty.
Problems occur when practitioners assess precision by repli-
cation of measurement under one set of conditions and then
apply the estimate to different and therefore inappropriate
conditions. Commonly encountered mismatches of conditions
occur between: (a) instrumental precision and repeatability; (b)
repeatability and reproducibility; (c) validation and quality
control; (d) one type of test material and another; and (e) one
concentration of analyte and another. Unfortunately the resul-
ting discrepancies can be large, and ignoring them can give rise to
misunderstanding and bad decision making. Getting the condi-
tions wrong gives rise to the tendency for analytical chemists to
underestimate when specifying the uncertainty associated with
their results. The fault is not all with practitioners: there is
insufficient guidance in textbooks and normative documents. It is
definitely worth thinking carefully about precision.
In this review, I examine the various conditions under which
precision is assessed and their relevance to various quality-
related practices: method development, validation, internal
quality control, collaborative trials and proficiency tests. I hope
to be excused for having taken most of the examples from my
own experience in the transport industry, forensics, prospecting
technology, biogeochemistry, and food quality.

Michael Thompson has been a professional analytical chemist since 1960, since that date having worked in industry, the Civil Service, academia and as a consultant. He is currently Emeritus Professor of Analytical Chemistry at Birkbeck, University of London. He has a long-standing interest and research involvement in the quality of analytical data. He has been awarded the SAC Gold Medal, the Theobald Lectureship (both by the RSC), the Harvey Wiley Award (by AOAC International), and Honorary Life Membership by the International Association of Geoanalysts.
1. Introduction
1.1 The concept of precision and scope of this study
Precision is a ubiquitous feature of chemical analysis as it figures
in most aspects of data quality, including method validation,1
internal quality control2 and proficiency testing.3 Crucially it is
referenced in the quantification of various contributions to the
combined uncertainty of the result of a measurement.4,5 The
current definition of precision in the Vocabulaire International
de Métrologie (VIM3) is: closeness of agreement between indica-
tions or measured quantity values obtained by replicate measure-
ments on the same or similar objects under specified conditions.6 As
the level of precision depends critically on the conditions of
measurement, it is essential for analytical chemists to understand
the exact implications of the various ways of assessing it.
Precision per se is an ordinal quantity because the only levels
available for comparison are lower, equal or higher. Standard
deviation provides a related ratio scale but quantifies dispersion
rather than precision. Awkwardly then for clear and concise
expression, higher precision correlates with smaller standard
deviation, and we must shun the common fault of using precision
and standard deviation as synonyms. The notion of precision, like
uncertainty, is usually taken by analytical chemists to exclude the
influence of observations resulting from gross errors, that is,
mistakes in the execution of the analytical procedure, malfunc-
tion of equipment or faulty calculation. This exclusion colours
our attitude to the handling of outlying results.
Precision can refer to any set of results conforming to the
above definition. For most purposes in chemical analysis,
however, conditions of measurement is assumed to imply the
replication of the entire analytical procedure starting with
separate test portions of a single homogeneous test sample. (In
some instances it may be relevant to include variation caused by
the physical treatment of the material submitted to the labora-
tory (the laboratory sample) before analysis.) In this context
homogeneous means only that residual heterogeneity makes
a minor, usually trivial, contribution to the variation in the
results. Other conditions of replication are of restricted use. For
instance, instrumental conditions refer only to the performance of
the analytical instrument, excluding variations arising from the
chemical treatment of the test portion.
This review treats only precision applied to chemical
measurement data on interval and ratio scales, that is, it excludes
qualitative and semi-quantitative data. The discussions assume
that the digit resolution in any measurement is capable of
reflecting adequately the dispersion of results. It is important to
notice that the precision of an analytical method is meaningful
only when the procedure is applied to a narrowly defined class of
test material, ‘vegetables’ for example. More strictly we should
refer to the precision of an analytical system comprising
a procedure and a type of test material. Sampling precision,7
although very important in the wider context, is not addressed
here.
1.2 Development of the concept of precision
The essential ideas of data quality were established by the 1890s.8
The terms error, accuracy, and precision were used then with
meanings very close to modern conceptions. Bias was recognised
as having components inherent in the measurement method itself
but also from a personal equation. A concept close to the modern
uncertainty was also recognised (but not under that name).
Standard deviation was introduced in 1893 in lectures by Karl
Pearson.
Uptake of these ideas by the analytical chemistry community
was very slow and patchy. From an early date9 practitioners were
content to discuss analytical method performance solely in terms
of accuracy (then taken as the closeness of agreement between
a measured value and a reference value). Precision, however, was
overall slow to emerge as a separate concept among analytical
chemists.
This was no doubt because analysis until recently comprised
the painstaking and time-consuming chemical manipulations of
gravimetry and titrimetry. Analysts had to concentrate on
getting the chemistry right, and there was little incentive for them
to replicate longwinded measurements. Landmark texts such as
Hillebrand and Lundell10 did not mention precision. The advent
of rapid instrumental analysis changed all of that. By the 1950s
analysts began to recognise a clear distinction between accuracy
(as smallness of bias) and precision (as smallness of dispersion).11
(Note: accuracy was originally regarded as smallness of bias, but
has been replaced for this meaning with trueness. Accuracy is
now defined in VIM3 as ‘closeness of agreement between
a measured quantity value and a true quantity value of
a measurand’.)
The effect of conditions of measurement on precision was
recognised by the 1950s.12 The key distinction between within-
laboratory conditions and between-laboratory conditions for
estimating precision was emphasised by Youden13,14 who, in
1969, tentatively ascribed to them the respective terms repeat-
ability and reproducibility in relation to the landmark develop-
ment of the collaborative trial (inter-laboratory method
performance study). Adoption of these terms in the analytical
community was reinforced through international standards15
and protocols.16 Even so, this distinction has been slow to be
generally recognised: undergraduate texts in analytical chemistry
until recently tended to ignore the distinction (or even get it
wrong!). In a survey of papers in one issue of The Analyst in 1994,
it was found that repeatability and reproducibility were confused
with each other in no less than 40% of the papers.17 This lack of
discrimination was cognate with the common propensity of
analysts to underestimate their uncertainties, a tendency still
discernible in 2008.18
1.3 Interpretations of VIM3
Shortcomings in the VIM3 definition of precision are immedi-
ately apparent to the analytical chemist. At face value VIM3
precision does not apply to measurements on substances (e.g.,
steel) or specific bodies of material (e.g., consignments of
peanuts), as opposed to objects. We must assume that the wider
meaning was intended. Another problem stems from same or
similar. Replicated chemical measurement usually (but not
always) involves separate test portions of the material of interest,
rather than the same object. The alternative word similar is
vague. Which qualities have to be similar? How similar is similar?
An earlier definition of precision19 is of interest: closeness of
agreement between independent test results obtained under stipu-
lated conditions. A noteworthy difference is that in VIM3 the
requirement for independence has been dropped. This change
could have important repercussions. Independence is one of the
premises supporting many types of statistical inference, such as
commonly used tests of significance. The possibility of serial
dependence (time series) in analytical data should always be
borne in mind.
The most commonly referenced conditions of replication are
repeatability and reproducibility. These terms, as originally
developed for use in analytical chemistry, had a narrow specific
meaning.13,15,19 Their original definitions, however, have been
generalised in VIM3, so that there are now unhelpful variations
in the way that the terms can be interpreted. Besides these
conditions, plus the VIM3-defined intermediate conditions, there
are other conditions of replication that are commonly encoun-
tered but are not defined in normative documents, not consis-
tently named, and often used inappropriately. Of these,
instrumental precision and calibration precision (using my own
terminology for want of apt alternatives) are in a class apart,
referring largely to the instrumental measurement aspect of an
analytical method. All other conditions refer to replication of the
complete analytical procedure, beginning with the selection and
weighing of a test portion. These conditions are summarised in
Table 1 and discussed in more detail subsequently. A quantita-
tive example of the way that standard deviation in a single
method (zinc in foodstuffs by atomic spectrometry) depends on
conditions of measurement is also shown in the table, to
emphasise the importance of using the value appropriate to the
context.
Fig. 1 Values (solid circles) and 95% confidence limits (open circles) of
a standard deviation estimated from various numbers of independent
random observations taken from a standard normal distribution (i.e.,
with a variance of unity).
1.4 Statistical considerations
This survey is concerned with clarifying the definitions and
appropriate applications of the various conditions for assessing
precision. There is an appendix commenting on aspects of the
statistical approaches that support these applications. However,
there are certain statistical facts that, from the beginning, colour
any discussion of precision and uncertainty and are therefore
worth stating immediately.
• A standard deviation s is an estimate of a population value σ,
derived from observations x1, x2, …, xn via the equation
s = √(Σ(xi − x̄)²/(n − 1)), and is itself a variable. Estimates from
small numbers of replicated results have a wide dispersion
(Fig. 1). For example, if you want a standard deviation estimate
that itself has a relative standard error of 10%, you would need to
use 50 replicated results. With the customary 10 replicates, the
relative standard error of the estimate would be 22%. This
implies that random variations in estimates of major components
of uncertainty are often able to dwarf the complete contributions
from minor components. Moreover, while the variance (s²)
provides an unbiased estimate of σ², s is a biased estimate of σ,
and the bias is especially noticeable for small n (<10). These features
may be important when we consider standard deviation per se.
However, use of the t-distribution takes care of these problems in
estimating the confidence limits of means. (Notes: the standard
deviation of a statistic (as opposed to a simple variable) is called
a 'standard error'. 'Relative standard deviation' or 'relative
standard error' refers to the value of s/x̄ or 100s/x̄ as appropriate.)

Table 1  Current usage of terms denoting conditions of replication in chemical measurement. All assume independence of results. The example relative standard deviations (RSD) refer to the determination of zinc, at midrange concentrations in foodstuffs, by atomic spectrometry

Name of condition | Conditions of replication | Comments | Example RSD, %
'Instrumental' | Replication of measurement as quickly as possible, on a single portion of test solution, with no adjustment of instrument. | Does not include variation originating from separate test portions or chemical manipulations. | 0.9
'Calibration' | Replication of results obtained by repeat measurement of a single test solution, involving evaluation via the estimated calibration function. | Does not include variation originating from separate test portions or chemical manipulations. | 1.9
Repeatability | Replication with the same analytical procedure, instrument and reagents, in the same laboratory, by the same analyst, in a 'short' period of time. | The 'short period of time' is the length of an analytical 'run', that is, the period in which we assume that the factors affecting the magnitude of errors have not changed. | 2.9
Intermediate (synonyms 'run-to-run' and 'within-laboratory reproducibility') | Replication in separate runs. Same analytical procedure and laboratory, but there may be different analysts, instruments, or batches of reagent. | This condition of precision is addressed in internal quality control. | 4.0
Reproducibility (1) | Replication by the same analytical procedure in different laboratories. | This is the estimate provided by the collaborative (interlaboratory) trial. | 5.8
Reproducibility (2) | Replication by the same nominal method but with variation in details in different laboratories. | Estimate can often be obtained from the results of a single round of a proficiency test. | 6.0
Reproducibility (3) | Replication by various methods in different laboratories. | Estimate can often be obtained from the results of a single round of a proficiency test. The standard deviation is usually greater than that of reproducibility (1). | 7.4
• Chemical analysis is concerned with quantities that are
strictly bounded: masses and amounts of substance (and there-
fore mass fractions, concentrations, etc.) cannot be negative. But
the quantities actually measured—light intensity is an example—
are only surrogates for mass or amount, and are often not
bounded at a signal strength corresponding to zero concentra-
tion. This means that chemical measurements can (and some-
times do) give rise to analytical signals that impute negative
results on the concentration scale (Fig. 2). While such results
have no corresponding physical realisation, they are still
imparting real information about precision and we need to
handle them aptly to avoid incorrect conclusions.
• Precision varies with the concentration of the analyte, as
does relative precision. In much of the discussion below, it
has been possible to discuss aspects of precision as though it
were invariant. However, in analytical systems where the
concentration of the analyte in typical samples is dispersed over
a wide range, this dependence of precision on concentration has
to be taken into account. Methods of doing that are discussed in
detail below (Section 2.7 ff).
• Outliers among replicated results in chemical measurement
are not rare. In a study of proficiency test results from 2006,
covering many different analytes, test materials and laboratories,
as many as 4% of reported results were classed as outliers.20 The
appropriate statistical treatment of such results is still a disputed
topic among analytical chemists.
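The sampling behaviour of s described above is easy to verify numerically. The short Python sketch below (an illustration added here, not part of the paper; it assumes numpy and scipy are available) computes the 95% limits of a standard deviation estimated from n observations of a standard normal distribution, the quantities plotted in Fig. 1, together with the rough approximation 1/√(2n) for the relative standard error of s.

# Sampling dispersion of a standard deviation estimated from n normal observations.
# Illustrative sketch; under normality s = sqrt(chi2_{n-1} / (n - 1)) when sigma = 1.
import numpy as np
from scipy.stats import chi2

def sd_limits(n, prob=0.95):
    """95% limits of s estimated from n observations of N(0, 1)."""
    df = n - 1
    lo, hi = chi2.ppf([(1 - prob) / 2, (1 + prob) / 2], df)
    return np.sqrt(lo / df), np.sqrt(hi / df)

for n in (3, 10, 50):
    lo, hi = sd_limits(n)
    approx_rse = 1 / np.sqrt(2 * n)   # rough relative standard error of s
    print(f"n = {n:2d}: 95% limits of s = ({lo:.2f}, {hi:.2f}), "
          f"approximate RSE ~ {approx_rse:.0%}")

Under these assumptions the n = 3 limits come out close to (0.16, 1.92), the figures quoted in Section 2.1, while n = 10 and n = 50 give approximate relative standard errors of about 22% and 10%, matching the values cited above.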
2. Definitions and applications
2.1 Instrumental precision
Instrumental conditions of replication pertain when a single
portion of the prepared test material (usually a test solution) is
subjected to replicated measurement, with no instrumental
adjustment, in the shortest possible time. This quantity essen-
tially describes the short-term behaviour of the instrument only
Fig. 2 Conceptual calibration function (solid line) near zero concen-
tration. Random variation (example points) in the analytical signal at
zero gives rise to a proportion of negative concentration results via the
extrapolated function.
and is therefore of limited relevance when, as usual, measurement
is preceded by chemical treatment. It is an essential tool in
instrumental development, and is often quoted in research papers
but, as a glance at Table 1 confirms, would be grossly misleading
if used to predict analytical performance in real-life conditions. It
is properly used, in classes of analysis like atomic spectrometry,
in checking the short-term instrumental stability before a run of
analysis begins.
It is less helpful where several rapidly replicated readings from
a single test solution are averaged for the final reading, as in
atomic spectrometry methods. A transient instrument malfunc-
tion, imperfectly mixed solution, or a memory effect might well
give rise to poor precision at that stage. However, the converse
inference is not true: a poor precision estimate does not neces-
sarily imply a problem. Standard deviations estimated from a few
repeated readings will have relatively enormous uncertainties, so
little can be read into these outputs (Fig. 1). For instance, for
a population standard deviation of 1.0, the 95% confidence limits
of a standard deviation estimated from the commonly used three
results will be as wide as (0.16, 1.92). Clearly such a statistic
would be misleading in screening results for problems.
Instrumental standard deviations tend to be several times
smaller than the corresponding repeatability standard devia-
tions, which include lower-frequency variations in the baseline
signal and sensitivity, and effects brought about by variations in
the chemical treatment of successive test portions. They should
never be taken to represent the precision of the whole analytical
procedure. A common mistake occurs when instrumental preci-
sion is assessed at or near zero concentration and then used by
the unwary to quantify a detection limit. This practice, some-
times seen in instrument brochures, gives rise to a false idea of the
detection power of an analytical method, with discrepancies as
large as tenfold between ‘instrumental detection limit’ and the
more reasonable ‘repeatability detection limit’, which includes
inter alia some effects resulting from variation in chemical pre-
treatment of the test portions.21
2.2 Calibration precision
When an instrument is calibrated, the analyst measures the
analytical signals (xi) corresponding to calibrators containing
different concentrations (ci) of the analyte and calculates a cali-
bration function x = f(c, q) with parameter estimates q = [q1, q2,
…, qn]. A concentration c0 in a test solution is calculated from the
corresponding response x0 via the 'evaluation function' c0 = f⁻¹(x0, q), in an operation sometimes called 'inverse calibration'.
(With a simple linear calibration function x = a + bc, we have
parameter estimates q = [a, b] and c0 = (x0 − a)/b.) As both x0 and q are variables, they interact to provide an unexpectedly
large dispersion of possible values of c0 (Fig. 3). The corresponding standard deviation can be calculated directly from
the calibration data and, under the normal assumption, provides
confidence limits on the predicted concentration, sometimes
called ‘inverse confidence limits’22 or ‘fiducial limits’.23 Lack of fit
between the calibration data and the selected calibration function
is subsumed into the precision. This can be a useful exercise to
check or improve the calibration strategy. However, the cali-
bration precision obtained does not reflect the uncertainty of the
‘real-life’ analytical result and must not be used for that purpose.
Fig. 3 Schematic diagram of an estimated calibration function (diagonal
solid line) with confidence interval (shaded band) and a newly observed
response (horizontal solid line) with its own confidence interval (shaded
band). The response x0 gives rise to an estimated concentration c0. The
interaction between the two confidence bands gives rise to an unexpect-
edly wide confidence interval (double arrow) around c0.
This is because error contributions from the preparation of the
test solution from the test portion, and matrix mismatch between
the calibrators and test solutions, are not accounted for.
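As a concrete illustration of the calculation sketched above, the following Python fragment fits a straight-line calibration by ordinary least squares and propagates the calibration scatter into an approximate standard deviation for c0, using the standard first-order formula from regression theory. The data and the function name c0_with_sd are invented for the example; this is a sketch of calibration precision only, not of the full uncertainty of a result.

# Sketch: straight-line calibration x = a + b*c and the approximate standard
# deviation of a concentration c0 read back from a new response x0.
# Hypothetical data; the formula is the usual first-order regression result.
import numpy as np

c = np.array([0.0, 2.0, 4.0, 6.0, 8.0, 10.0])        # calibrator concentrations
x = np.array([0.05, 2.10, 3.95, 6.20, 7.90, 10.10])  # measured signals

n = len(c)
b, a = np.polyfit(c, x, 1)                  # slope b and intercept a
resid = x - (a + b * c)
s_xc = np.sqrt(np.sum(resid**2) / (n - 2))  # residual standard deviation

def c0_with_sd(x0, m=1):
    """Concentration estimate and its approximate calibration-only standard
    deviation, for a new response x0 that is the mean of m replicate readings."""
    c0 = (x0 - a) / b
    s_c0 = (s_xc / abs(b)) * np.sqrt(1.0 / m + 1.0 / n
                                     + (x0 - x.mean())**2 / (b**2 * np.sum((c - c.mean())**2)))
    return c0, s_c0

c0, s_c0 = c0_with_sd(5.0)
print(f"c0 = {c0:.2f}, calibration-only sd = {s_c0:.2f}")

As the text stresses, this dispersion excludes the preparation of the test solution and any matrix mismatch, so it must not be presented as the uncertainty of a real-life result.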
Fig. 4 A run of replicate observations under repeatability conditions
showing raw results (upper plot) and differences between adjacent results
(lower plot). The clear trend of the raw results is absent in the differences.
The standard deviation of the differences is divided by √2 to obtain the
de-trended standard deviation.
2.3 Repeatability precision
Repeatability conditions are defined as: condition of measure-
ment, out of a set of conditions that includes the same measurement
procedure, same operators, same measuring system, same oper-
ating conditions and same location, and replicate measurements on
the same or similar objects over a short period of time.6 Unlike
instrumental precision and calibration precision, repeatability is
taken to involve ‘real-life’ analysis as the chemical manipulations
preceding measurement are included as possible sources of
variation. However, the definition is ambiguous for analytical
chemists and this leads, unfortunately, to a range of possible sub-
types of repeatability.
2.3.1 Initial interpretation—the ‘run’. As well as the previ-
ously noted problem with same or similar, the vagueness in this
definition occurs in short period, which leaves the conditions of
measurement open to an important variety of interpretations.
First, an analytical procedure may extend over several days in
stages, between which the part-completed work is set aside. Thus
we might have: on Day 1, weighing the test portions; on Day 2
chemical decomposition; on Day 3 making solutions to a fixed
volume and instrumental measurement. This is hardly a short
period, but would certainly be covered by the original idea of
repeatability conditions. Secondly, overlooking this problem,
how short is short?
A prima facie interpretation of repeatability might be that
short means a period during which factors that determine
precision remain constant, including factors not specified in the
VIM3 definition. But factors do not remain constant—there are
inevitably systematic changes with time, even if negligibly small.
So in the real world we have to specify instead a period during
which the factors have of necessity to be regarded as constant.
Any systematic changes within the period then become attributed
to random variation. This period is the analytical run,24 during
which a number of test materials of the same type are processed
sequentially as a batch, and drifts will be negligible, or at least
tolerable in the context of fitness for purpose. A run could
conceivably comprise hundreds of different test materials or as
few as one. The duration of a run would depend on the stability
of the particular method and the requirements of fitness for
purpose. Often the run would be defined as the period between
discrete changes in the analytical system, such as the preparation
of new batches of reagents or restarting an instrument after an
overnight shutdown. Adjustment of calibration drift might or
might not define the start of a run.
2.3.2 Factors affecting repeatability precision. A run of
analyses carried out to assess repeatability precision could
conceivably comprise nothing other than a succession of test
portions of the same homogeneous test sample. In such a run it is
a simple task to detect significant drifts and, if required, de-trend
the data. This possibility immediately engenders two extreme
versions of repeatability precision, that is, precision determined
either on the raw data or on the de-trended data. Although this
experiment is seldom carried out, the distinction is by no means
an academic quibble. Fig. 4 shows a run of such repeatability
results in which a clear trend is visible, of a magnitude roughly
the same as the peak-to-peak variation. This behaviour is typical
of many analytical systems. The lower run shows the same results
de-trended by plotting the differences between successive results. The
standard deviation of the raw results (0.58) is reduced to 0.46 in
the de-trended data. (Note that the standard deviation of the
differences is divided by √2 to obtain the de-trended standard
deviation, because the variance of a difference is the sum of the
variances of the individual values.) A specific term, immediate
conditions, is suggested for detrended repeatability conditions,
which would apply to duplicates adjacent in the sequence of an
analytical run.
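A minimal sketch of the de-trending calculation follows (illustrative only; the simulated drifting run stands in for the behaviour shown in Fig. 4, and the drift and noise values are invented).

# De-trended ('immediate') repeatability from successive differences.
import numpy as np

rng = np.random.default_rng(1)
run = 0.05 * np.arange(40) + rng.normal(0.0, 0.4, size=40)  # drifting run of results

raw_sd = np.std(run, ddof=1)                        # includes the drift
diffs = np.diff(run)                                # differences between adjacent results
detrended_sd = np.std(diffs, ddof=1) / np.sqrt(2)   # var(difference) = 2 * var(single result)

print(f"raw sd = {raw_sd:.2f}, de-trended sd = {detrended_sd:.2f}")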
A further factor affecting estimates of repeatability standard
deviation is the condition of the test material. In ‘real-life’
analysis (that is, with typical test materials) the state of the test
samples will be representative automatically. Often, however,
precision is studied by the replication of measurements on
Fig. 5 Absolute differences of duplicated measurements plotted against
mean results (points), with lines showing percentiles for a relative stan-
dard deviation of 0.1 (median solid, 95th percentile dashed, and 99th
percentile dotted).
Fig. 6 Duplicated results (arbitrary units) showing absolute difference
plotted against mean (black squares), bin boundaries (dashed lines), and
medians of results in bins (red solid circles). The corresponding estimates
of standard deviation are the median absolute differences divided by
0.954.
reference materials or ‘control materials’ that, even when matrix-
matched, are not completely representative. These materials are
specially prepared for use by atypically fine grinding and thor-
ough mixing, and sometimes additionally by chemical treatment
to ensure long-term stability. That is of course essential for many
analytical tasks, such as internal quality control (see Section 2.4),
but it will usually improve precision noticeably beyond that of
‘real-life’ analysis. These special materials will tend to provide
unrealistically small estimates of dispersion that may not be
suitable for incorporation into uncertainty budgets.
2.3.3 Within-run repeatability. In real-life analysis, duplica-
tion is often required, either as a feature of the method per se or
for the purpose of within-run internal quality control. If the
duplicate test portions were processed in adjacent positions in the
run, then the de-trended precision would describe the distribu-
tion of differences between adjacent pairs of results. However,
de-trended precision would not be typical of variation over the
whole run, that is, under a within-run interpretation of repeat-
ability. For that purpose the duplicate test portions should be
located at random positions in the sequence. An additional
feature of ‘real-life’ analysis is that the run will comprise
a number of different test materials, reference materials, control
materials, blanks, etc. Adjacent test solutions in the measure-
ment sequence may thus have very different compositions, giving
rise to the possibility of memory effects. These extra sources of
variation combine to produce the ‘real-life’ repeatability condi-
tions, which cannot therefore be assessed simply by an unbroken
run of repeat test portions.
Real-life repeatability could be addressed by intercalating
portions of a single test material at random positions in a real-life
run. However, any single test material might not be representa-
tive of the class of the test material specified in the procedure and
could give rise to an atypical standard deviation. An alternative
approach that avoids this problem is to duplicate the entire set of
test materials in a randomised sequence in the run. The precision
standard deviation could be estimated from the differences
between corresponding pairs. The potential for variation of
precision with concentration of the analyte would have to be
taken into account but this is readily manageable, given enough
data (see Section 2.3.4).
2.3.4 Within-run precision tests. In instances where dupli-
cated results are obtained within a run from a not-large number
of test materials, the data can be used to check for unacceptably
poor repeatability precision. This may be especially valuable in
non-routine analysis where run-to-run statistical control is a void
concept. (Even in routine (multi-run) analysis, duplication can
support run-to-run control as an extra diagnostic guide and help
to eliminate blunders.25) However, we have seen that standard
deviations estimated from two results are biased, as would be an
average of several such estimates. An additional complication is
that the concentrations of the analyte in successive test materials
will differ and the precision is likely to be dependent on the
concentration. A compact way of overcoming these problems is
to ‘map’ the absolute difference between each duplicate pair
against the mean of the two results. Lines can be placed on such
a map showing percentiles of a prescribed distribution as
a function of concentration (Fig. 5). A key feature of the map is
that the median absolute difference (MAD) between random
duplicate results from a normal distribution N(μ, σ²) will have an
expectation of 0.954σ, close enough to σ to ignore the
factor for visual comparison. The underlying standard deviation
could represent an independent criterion of fitness for
purpose.26,27 Adherence to the criterion could be judged visually
(Fig. 5).
When large numbers of duplicated results are available, stan-
dard deviations can be estimated with reasonable accuracy and
the relationship with concentration explored.28 The absolute
differences can be binned into narrow concentration ranges
containing differences from at least 20 different test materials
(Fig. 6). Within each bin the concentration c can be regarded as
constant and estimated as the median, and the standard devia-
tion can be estimated as sc = MAD/0.954. Use of the median
robustifies the estimate against the influence of outlying results
from atypical test materials. The relationship between sc and c
can then be considered.
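A minimal sketch of these binned duplicate statistics is given below, using simulated duplicate results with a constant 5% relative standard deviation; the data, bin choice, and variable names are invented for the illustration.

# 'Duplicate map' statistics: bin duplicate pairs by their mean concentration and
# estimate the standard deviation in each bin as median(|difference|) / 0.954.
import numpy as np

rng = np.random.default_rng(2)
true_c = rng.uniform(1.0, 100.0, size=200)                 # 200 test materials
pairs = true_c[:, None] * (1 + rng.normal(0, 0.05, size=(200, 2)))  # duplicate results

means = pairs.mean(axis=1)
abs_diff = np.abs(pairs[:, 0] - pairs[:, 1])

bins = np.quantile(means, np.linspace(0, 1, 6))   # 5 bins of ~40 materials (> 20 each)
bins[-1] += 1e-9                                  # make the last edge inclusive
for lo, hi in zip(bins[:-1], bins[1:]):
    sel = (means >= lo) & (means < hi)
    c_mid = np.median(means[sel])
    s_c = np.median(abs_diff[sel]) / 0.954        # robust estimate of sigma near c_mid
    print(f"c ~ {c_mid:6.1f}: s_c ~ {s_c:5.2f} (relative {s_c / c_mid:.1%})")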
2.3.5 Repeatability precision from collaborative trials.
Collaborative trials (interlaboratory method performance
studies) involve the analysis of at least five different examples of
a class of test material, by a carefully described method, inde-
pendently in eight or more competent laboratories. The test
materials are selected to encompass a range of concentrations of
the analyte and to span the matrix types falling within the class.
The analysis is carried out with blind duplication or with a split-
level design in a randomised order to minimise observer bias. The
results for each test material are separately subjected to one-way
analysis of variance, to provide estimates of repeatability and
reproducibility standard deviation. These statistics are regarded
as properties of the method applied to the specified class of test
material. These conditions of analysis approximate to real-life,
although the test materials are likely to be atypical in that they
will have been subjected to fine grinding to ensure that they are
sufficiently close to homogeneous. A large number of methods
have been subjected to a collaborative trial, especially in the food
analysis sector. Shortcomings of the collaborative trial are (i) it is
very expensive to carry out, typically £50 000–100 000 and (ii) the
dataset is small so that the 95% confidence intervals on the
estimated standard deviations are uncomfortably wide.29
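For a single test material analysed in blind duplicate by several laboratories, the repeatability and reproducibility standard deviations follow from one-way analysis of variance. The sketch below shows that calculation on hypothetical results; the outlier screening required by the full protocol is omitted.

# One-way ANOVA estimates of repeatability (s_r) and reproducibility (s_R)
# for one test material analysed in blind duplicate by several laboratories.
import numpy as np

results = np.array([   # rows = laboratories, columns = duplicate results (hypothetical)
    [10.1, 10.3], [ 9.8,  9.9], [10.6, 10.4], [10.0, 10.2],
    [ 9.5,  9.7], [10.9, 11.0], [10.2, 10.1], [ 9.9, 10.4],
])
p, n_rep = results.shape

lab_means = results.mean(axis=1)
grand_mean = results.mean()

ms_within = np.sum((results - lab_means[:, None])**2) / (p * (n_rep - 1))
ms_between = n_rep * np.sum((lab_means - grand_mean)**2) / (p - 1)

s_r = np.sqrt(ms_within)                              # repeatability sd
var_lab = max((ms_between - ms_within) / n_rep, 0.0)  # between-laboratory variance
s_R = np.sqrt(s_r**2 + var_lab)                       # reproducibility sd

print(f"s_r = {s_r:.3f}, s_R = {s_R:.3f}")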
The repeatability standard deviation derived from a collabo-
rative trial is an average over all the participating laboratories. It
is a useful guide for potential users of the method, but cannot be
taken a priori as describing precision in individual participating
laboratories, or in non-participants, as conditions of measure-
ment will differ between laboratories. In consequence, a method
would have to be revalidated and its repeatability precision re-
assessed for use in a new laboratory.
2.4 Intermediate conditions of precision
Intermediate conditions are defined as: condition of measurement,
out of a set of conditions that includes the same measurement
procedure, same location, and replicate measurements on the same
or similar objects over an extended period of time, but may include
other conditions involving changes.6 This definition suffers from
the ambiguities previously noted, and is also unfortunately often
referred to as ‘within-laboratory reproducibility’. From the
standpoint of analytical chemistry, the only useful condition
from the set is run-to-run precision, which is the relevant measure
for internal quality control1 involving control materials and
control charts.
2.4.1 Statistical control. The essential idea of statistical
control is that the behaviour of a relevant index of the system
resembles an independent random variable from a normal
distribution. A deviation from the mean value, of magnitude
greater than three standard deviations, is taken as so unlikely
under statistical control that it indicates the opposite. Out-of-
control conditions thus show that factors affecting the level of
uncertainty of the measurement have changed. This topic has
been reviewed in depth for univariate analytical data30 and
studied for multivariate data.31
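How unlikely is such a deviation? For an in-control variable that really is normally distributed the figures are easily checked (an added illustration, assuming scipy is available).

# False-alarm behaviour of the 3-sigma rule for an in-control normal variable.
from scipy.stats import norm

p = 2 * norm.sf(3.0)                  # probability that |z| exceeds 3 under control
print(f"P(|z| > 3) = {p:.4f}")        # about 0.0027
print(f"average run length between false alarms ~ {1 / p:.0f} results")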
Internal quality control is essential in routine analysis to
ensure as far as possible that conditions affecting uncertainty of
results are stable over long periods of time, so that uncertainties
established during or close to the initial validation of a method in
a single laboratory can be attributed to results obtained there in
the indefinite future. It is executed by the insertion, into each run
of test portions, of one or more control materials. These materials
act as surrogates for the test materials so must have a bulk
composition typical of the materials under test. Moreover, the
concentration(s) of the analyte(s) in the control material(s) must
be appropriate for the application. The control materials will
probably be more finely divided than usual to reduce heteroge-
neity to negligible levels, and dried or otherwise treated to ensure
stability. Results are plotted on control charts. Where out-of-
control conditions are indicated, analysis should be halted until
the cause of the problem has been investigated and, where
necessary, alleviated. The affected run of results is reviewed with
the possibility of rejection and reanalysis.
A word of caution about the exact purpose of a control chart is
required to deflect practitioners away from some common
misunderstandings: the control chart must be based only on the
statistical behaviour of an indicator variable of the system under
study. That variable would normally be the result of a specific
analytical method applied in a particular laboratory to portions
of a specific control material. The control chart must be defined
by the mean and standard deviation of the indicator variable
itself and no other. Separately determined preferred values or
certified values should not be used for the mean. Nor should fit-
for-purpose, certified ranges or other uncertainties be used to set
the standard deviation. The point is that the control chart
describes the complete analytical system applied to a specific
control material. Certificate values, however, describe the refer-
ence material alone, while fitness-for-purpose criteria describe an
ideal (and therefore nonexistent) situation. (The behaviour of the
control material may not represent exactly the precision relevant
to routine test materials, because it is likely to be more finely
ground, so judgement should be used in any application of the
control statistics outside the control chart itself.)
Certified reference materials are sometimes used in quality
control, indeed are mandatory in some sectors, but are much
more reasonably used as occasional checks on accuracy, effec-
tively as a one-laboratory proficiency test. The use of a CRM on
a scale appropriate for internal quality control would nearly
always be inordinately expensive but on a lesser scale ineffectual.
2.4.2 Initiating a control chart. Setting up a control chart
calls for considerably more care than generally realised.30 Text-
books commonly refer to deriving the control lines from the
parameters of the process, that is, the mean m and the standard
deviation s. However, we have access only to estimates �x, s of
these. When setting up a control chart we typically have few
observations and their values, even for an ‘in control’ system, are
likely to be erratic. Moreover, the analysts will often be relatively
inexperienced with a new method so that early results are likely
to be more variable, and more likely to contain outliers, than
those obtained when the process has ‘bedded down’. In practice
the limits on a control chart may become stabilised only after
about 30 results have accumulated and the parameters estimated
from them by a robust method.32
In addition to these statistical problems, the results have to be
as far as possible representative of the system as it operates under
routine conditions. We need to see the variability of results on the
control material when it is in a random position in successive
runs containing the usual number of test materials, check solu-
tions, blanks, and duplicates, etc. All of this implies that a good
value for run-to-run standard deviation cannot be obtained
during a one-off validation exercise, but only after the system has
been in actual use for some time. That presents a difficulty
because we need a control chart from the very start of routine
operations.
The best approach seems to be to start immediately after
validation, with an interim chart that is reviewed and replaced
after enough ‘real-life’ results have accumulated. The reviews
should take place after (say) 10, 20 and 30 runs have taken place,
and thereafter at less frequent intervals. The interim chart could
start with a mean value estimated under repeatability conditions
during validation. The standard deviation sr estimated under
repeatability conditions would be too small for a run-to-run
precision, but a value of srtr = 1.6sr has been suggested as suitable
for an interim chart.32
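A minimal sketch of such an interim chart follows, assuming the 1.6 factor quoted above and the conventional Shewhart placement of warning and action limits at two and three standard deviations; the validation results are hypothetical.

# Interim control limits from validation data, using the suggested
# provisional run-to-run standard deviation of 1.6 * s_r (ref. 32).
import numpy as np

validation_results = np.array([52.1, 51.8, 52.4, 52.0, 51.9, 52.3, 52.2, 51.7])
mean = validation_results.mean()
s_r = validation_results.std(ddof=1)   # repeatability sd from validation
s_interim = 1.6 * s_r                  # provisional run-to-run sd

print(f"mean = {mean:.2f}, interim sd = {s_interim:.2f}")
print(f"warning limits = ({mean - 2 * s_interim:.2f}, {mean + 2 * s_interim:.2f})")
print(f"action limits  = ({mean - 3 * s_interim:.2f}, {mean + 3 * s_interim:.2f})")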
2.5 Reproducibility conditions
Reproducibility conditions: condition of measurement, out of a set
of conditions that includes different locations, operators,
measurement systems, and replicate measurements on the same or
similar objects.6 There are several conditions under this heading
that are of crucial interest to analytical chemists, because
reproducibility is the condition where the standard deviation
most closely approximates uncertainty (see Section 2.6). It should
be noted that reproducibility conditions in analytical chemistry
originally referred to collaborative trials (see Section 2.5.1), and
that convention should ideally remain the established practice.
However, VIM3 redefined the term vaguely, and analysts must
be prepared to encounter alternative usage.
2.5.1 Reproducibility (1st meaning) and Horwitz’s generalisa-
tions. The most easily defined and by far the most intensively
studied version of this condition is simple between-laboratory
precision where a single well-defined analytical method is used in
different laboratories, giving rise to the familiar reproducibility
standard deviation sR derived from the collaborative trial (more
properly, the interlaboratory method performance study)
described in Section 2.3.5. It is a common observation that the
standard deviations tend to increase with increasing concentra-
tion. The resulting statistics are attributed to the method rather
than the laboratories participating. The test materials have to be
specially prepared for ‘homogeneity’ by fine-grinding, however,
so the resulting statistics tend if anything to underestimate values
relevant to routine practice.
Interlaboratory studies of method precision have been carried
out since the 1930s, and Horwitz has made databases of precision
statistics garnered from several thousand in the food sector.33–42
From these he was able to see two striking generalisations.43
• The reproducibility relative standard deviation tended to be
2% at unit mass fraction (c = 1), doubling for each reduction in
mass fraction by a factor of 100. This can be expressed more
conveniently as the 'Horwitz function', sR = 0.02c^0.8495 or, in
logarithmic form, log10 sR = 0.8495 log10 c − 1.699, where sR is
the standard deviation predicted at mass fraction c. This trend
held over a long period of time, irrespective of the measurement
principle, the analyte, and the test material.
• The ratio sr/sR had an average of 0.5 over all results.
These behaviours were carefully tested by subsequent statis-
tical studies.44 The trend of the data was strikingly close to the
Horwitz function over the approximate concentration range 10^−7
to 10^−1 mass fraction, the 'Horwitz region'. Within that range the
average value of sr/sR was 0.49. This latter result has an impor-
tant use. Collaborative trials are costly to organise, but it is easy
to obtain a value of sr in a single laboratory. Failing the avail-
ability of an interlaboratory study, a tentative estimate of sR would be simply 2sr.
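These relationships are simple to apply in practice. The sketch below evaluates the Horwitz prediction at a few mass fractions and forms the tentative reproducibility estimate 2sr mentioned above; the chosen values of c and sr are hypothetical.

# Horwitz prediction of reproducibility sd, and the tentative estimate 2 * s_r.
# c is a mass fraction (e.g. 1 mg/kg = 1e-6); valid roughly for 1e-7 < c < 0.1.
def horwitz_sd(c):
    """Predicted reproducibility standard deviation, s_R = 0.02 * c**0.8495."""
    return 0.02 * c**0.8495

for c in (1e-3, 1e-4, 1e-6):
    s_R = horwitz_sd(c)
    print(f"c = {c:.0e}: s_R = {s_R:.2e}  (RSD = {100 * s_R / c:.1f}%)")

# Failing a collaborative trial, a tentative reproducibility estimate from a
# single-laboratory repeatability sd (hypothetical value, at c = 1e-4) might be:
s_r = 0.5 * horwitz_sd(1e-4)
print(f"tentative s_R ~ 2 * s_r = {2 * s_r:.2e}")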
The existence of the Horwitz function has been tentatively
attributed to evolution of the methods used towards fitness for
purpose.45 By ‘natural selection’, with no other overarching
principle, methods that were either too expensive (uncertainty
too small) or gave rise to too many incorrect decisions (uncer-
tainty too great) would be discarded in favour of more suitable
methods. The Horwitz function is simply an ‘emergent’ feature of
fitness. Because of that the Horwitz function is used in a number
of contexts as a concentration-dependent fitness criterion, for
example in proficiency tests (for example, FAPAS46) and method
validation47 in the food sector.
Outside the Horwitz region the standard deviation predicted
by the function is systematically too high. At mass fractions
below about 10^−7 the function predicts values of sR/c that would
exceed 0.3, thereby implying that all concentrations would be
below the detection limit and rendering analysis futile. However,
the observed trend, from 10^−7 down to 10^−14, is close to sR/c = 0.22, which keeps results just sufficiently above the detection limit. At mass fractions
above about 10%, precision tends again to be better than the
Horwitz prediction. This may be related to the high concentra-
tion of the analyte in relation to potential interferents and the use
of high-precision gravimetric and volumetric procedures.
We should note that the Horwitz function, with no unknown
parameters, is a descriptor of analytical methods in general.
Detrimentally it has no intercept (sR ¼ 0 at zero concentration)
so cannot take account of results near detection limits of indi-
vidual methods. Other functional relationships have been devised
to provide better fits of individual methods (see Section 2.7).
2.5.2 Reproducibility (2nd meaning). Participants in profi-
ciency tests often claim to use a standard method, but in fact
introduce minor modifications to suit their own environment or to
accommodate peculiarities of the test materials they encounter.
These modifications introduce an extra source of variation into
the uncertainty of the results reported. This can be seen for
example in Fig. 7, which shows results relating to the determina-
tion of protein in foods and feeds by the Kjeldahl method. The line
shows the trend of results from a large collaborative trial of one
particular version of the method with 22 laboratories partici-
pating and 26 test materials of varied matrix composition.48 The
points show robust standard deviations of results from 26 rounds
of a proficiency test in which laboratories used a number of
different versions of the method. The proficiency test reproduc-
ibilities exceed those from the collaborative trial at the same
concentrations by a mean factor of 1.4.
However, an opposite tendency could possibly neutralise or
even reverse this effect. Participants in a mature proficiency test
that calls for a defined procedure will have built up a consider-
able body of relevant experience, unavailable to the collaborative
Fig. 7 Comparison of reproducibility standard deviations from profi-
ciency test rounds (points) with the trend of statistics from a collaborative
trial (line). The test materials were cereals (open circles), fish (closed
circles), meat (triangles), and milk powder (asterisks).
Fig. 8 Ratio of robust standard deviations from proficiency rounds in
the food analysis sector to the trend of collaborative trial reproducibility
standard deviations at mass fractions greater than about 10^−7.
Fig. 9 Ratio of robust standard deviations from proficiency rounds in
the food analysis sector to the trend of collaborative trial reproducibility
standard deviations at mass fractions less than about 10^−7.
trial participants (who test newly formulated procedures), and
thus demonstrate a smaller dispersion of results. This improve-
ment could also be the outcome of the proficiency test partici-
pants working to a predetermined fitness-for-purpose criterion.
(Collaborative trial participants have no prescribed criterion for
precision—they execute the method as closely as possible to the
written procedure and report the outcome.)
2.5.3 Reproducibility (3rd meaning). Results derived on the
same material but obtained by a variety of analytical methods in
different laboratories (the usual conditions in a proficiency test)
are likely to show a somewhat greater dispersion still, because
of the potential for bias between different analytical methods. An
early study, encompassing a variety of analytes, test materials
and measurement principles, showed that the robust standard
deviation of results from proficiency test rounds showed a quasi-
Horwitz dependence on concentration,49 namely srob = 0.023c^0.8255. This equation describes a trend with a standard
deviation greater than that found in collaborative trials by an
average factor close to 1.5.
A study of more recent and far more numerous statistics from
a proficiency test50 shows a monotonic increasing trend over
a wider range of mass fractions. At levels exceeding 10^−6.92 the
trend of the proficiency test statistics has been found to exceed
that of the collaborative trial statistics (that is, reproducibility
standard deviations) by a factor rising to a maximum of 1.6 at about 10^−4
mass fraction (Fig. 8).
Below 10^−6.92 the Horwitz function predicts a standard deviation
that would bring results uncomfortably close to or lower than any
reasonable definition of detection limit. So to be fit for any purpose
at all, standard deviations have to be lower than that prediction.
The observed trend of collaborative trial statistics at concentrations
lower than 10^−6.92 is close to sR/c = 0.22, showing that higher
precision is available, given the need, regardless of the Horwitz
prediction. However, the cost of such determinations can be very
high. The trend of the proficiency test data is only slightly greater
than sR/c = 0.22, with an average ratio of about 1.1 (Fig. 9).
2.6 Precision and uncertainty
The ISO document ‘‘Guide to the expression of uncertainty in
measurement’’ (GUM)4 describes the estimation of uncertainty
of measurement in terms of a complete operational model of the
measurement process, broken down into fundamental inputs
each traceable to international standards. Each fundamental
input will be quantified by a standard deviation that characterises
its dispersion, whether that is estimated directly by replication
(‘‘Type A’’) or in any other way (‘‘Type B’’ uncertainty). Under
this regime, many of the inputs will be based on precisions
assessed under repeatability conditions, often during method
validation.
Many scientists, however, think that chemical measurement is
usually too complex to be represented adequately by such
a model. The implication is that the GUM approach might tend
to underestimate uncertainty in analysis, either by overlooking
latent inputs or by ignoring possible interactions among known
inputs. As a readily available alternative, those analytical scien-
tists have often regarded reproducibility precision as a practical
estimate of standard uncertainty. The Eurachem Guide5 recom-
mends reproducibility standard deviation as a basis for uncer-
tainty provided that bias and contributions associated with
traceability (usually calibration uncertainties) also are taken into
account.
2.6.1 Limitations of replication in uncertainty estimation.
Many metrologists are uncomfortable with replication as an
unqualified method of estimating uncertainty, and it is easy to see
why. Replication alone could be appropriate only under two
conditions, namely:
• the replication of the measurement is able to explore all of
the scope for variation in the analytical measurement procedure;
• the analytical method is known to be unbiased or, more
strictly, systematic effects are negligible in relation to uncon-
trolled variation.
Repeatability replication clearly does not fulfil the first
requirement. For example, if a procedure specified drying the test
material for one hour at 110 °C, we might reasonably expect
variations in timing between 50 and 70 minutes and in temper-
ature between 105 and 115 °C on different occasions or in
different laboratories. Repeatability results would not reflect this
potential variation, so the repeatability standard deviation sr would be too small an estimate of the uncertainty contribution.
On these grounds alone, repeatability standard deviation should
tend to underestimate standard uncertainty and individual
contributions to an uncertainty budget.
It therefore comes as a surprise to find many instances where
estimates of sr obtained covertly have been found considerably
to exceed claimed levels of standard uncertainty in results from
contracted-out analysis produced by accredited laboratories.51
This finding provides a compelling case for users of contracted-
out data to conduct such checking by the insertion of blind
duplicate portions of actual test samples (in contrast with care-
fully prepared control materials, which would provide a smaller
standard deviation). Results such as these in sufficient numbers
could be assessed by using a ‘duplicate map’ as discussed in
Section 2.3.4.
2.6.2 Relevance of interlaboratory replication. Measurements
under reproducibility conditions are much better able to explore
the sample space for variation, because practice and environment
in one laboratory will be to a large extent independent of those in
another. Even so there are conceivable objections to this
assertion.
• Practice in different laboratories might be unexpectedly
uniform so that the potential for variation is not fully explored.
For example, if the written procedure specified a heating time of
60 ± 10 minutes, different laboratories might tend to use a time
of 60 ± 2 minutes and thus not sample the full range allowed.
This would give rise to an estimate of reproducibility standard
deviation sR smaller than that corresponding with the written
procedure.
• Within any class of test material there will be variation in the
matrix of the test material. For example, in the class ‘vegetable’
there might be carrots, onions, potatoes, brassicas, etc. Each of
these matrix varieties will, in principle, introduce its own matrix
effect52 and thus engender a contribution to the uncertainty
budget. This contribution may not be encompassed by repro-
ducibility standard deviation. Perhaps more importantly, any
inherent bias in a single measurement procedure will systemati-
cally affect all results from the laboratories using it. (Some
metrologists claim that interlaboratory precision permits no
proper reference to traceability, but I regard this feature as
subsumed in the issue of bias.)
Thus these two considerations imply that sR tends to
underestimate standard uncertainty, although perhaps by
a relatively small margin because analytical chemists use
methods with negligible bias wherever possible. Rather
surprisingly, when we compare reported estimates of sR and
corresponding standard uncertainties we find an opposite
tendency to be the case, overall by a substantial margin.53 In
a number of contexts the tendency has been for sR to be greater
by an average factor of 1.5. This implies that there is a broad
tendency for unknown factors affecting uncertainty to be
omitted from uncertainty budgets, presumably because of the
complexity of chemical measurement. This phenomenon has
been called ‘dark uncertainty’. Replication under reproduc-
ibility conditions can apparently account for at least some of
these unknown factors, however.
2.7 Precision as a function of concentration
When the analyte is determined directly (that is, not by differ-
ence) by using a single procedure, it is a common observation
that the dispersion of the measurement results increases
smoothly with the concentration of the analyte. When the
concentration range is small, especially when confined to values
less than ten times the detection limit, this variation in precision
may be imperceptible or, for practical purposes, negligible. For
wide concentration ranges distant from the detection limit,
however, a roughly constant relative dispersion is often
apparent. These two behaviours can be reconciled into a widely
applicable single model by error-propagation theory.
2.7.1 Characteristic function. Consider an analytical system
in which the net signal at zero concentration has a standard
deviation of magnitude a. The dispersion at a higher concentration c > 0 will be greater because of an additional contribution from the uncertainty in the estimated gradient of the calibration function. That term will be proportional to the concentration c, thus contributing a standard deviation of bc. The combined dispersion at c will be described by s = √(a² + (bc)²). This equation
(sometimes called the ‘characteristic function’) displays the
observed relationship. At low concentrations s increases slowly,
being determined largely by the value of a (Fig. 10). At higher
concentrations the term bc dominates and a tendency towards
constant relative standard deviation b prevails (Fig. 11).
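A minimal numerical sketch of this behaviour (using the same illustrative parameter values as Fig. 10 and 11, a = 5 and b = 0.05, rather than data from any particular method):

```python
import numpy as np

def characteristic_sd(c, a, b):
    """Standard deviation s = sqrt(a^2 + (b*c)^2) at concentration c."""
    return np.sqrt(a**2 + (b * c)**2)

a, b = 5.0, 0.05                      # illustrative values, as in Fig. 10 and 11
for c in [0.0, 10.0, 100.0, 1000.0, 10000.0]:
    s = characteristic_sd(c, a, b)
    rsd = s / c if c > 0 else float("nan")   # relative standard deviation
    print(f"c = {c:8.0f}   s = {s:8.2f}   RSD = {rsd:.3f}")
# At low c, s stays close to a; at high c, s approaches b*c, so the RSD tends to b.
```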
Characteristic functions conforming to that model have been
confirmed in various types of analytical system under conditions
of instrumental precision, repeatability precision and reproduc-
ibility precision54 and are presumably of wide applicability. By
implication they also apply to uncertainty, although that might
be impracticable to demonstrate. Characteristic functions are
essential for estimating uncertainties at concentrations other
than those utilised in validation.
A characteristic function refers only to a single analytical
procedure under fixed conditions of replication. Its parameters
are unique and have to be estimated by validation. It takes
proper account of detection limit phenomena by virtue of the
a parameter. In all of these aspects it differs from the Horwitz
function (see Section 2.5.1), which has a quite different purpose.
The Horwitz function is a generalisation about the trend of sR in
large suites of analytical methods. It has no unique parameters to
determine, and takes no account of detection limit.
Fig. 10 Standard deviation varying with concentration according to the characteristic function s = √(a² + (bc)²), with a = 5, b = 0.05.
2.7.2 Estimating parameters of a characteristic function.
Demonstration of compliance (or otherwise) of an analytical
system with a model such as the characteristic function requires
a large amount of data in the form of precise estimates of stan-
dard deviation at closely spaced concentrations. Even collabo-
rative trials, widely regarded as the most informative type of
method validation, can scarcely provide enough information for
the task.29 Nevertheless, in instances where the characteristic
function is likely to prevail, its parameters can be readily esti-
mated as part of method validation. The parameter a is simply
the standard deviation (or uncertainty) estimated at (or close to)
zero concentration. The parameter b is simply the asymptotic
relative dispersion (ARD) at concentrations well above the
detection limit. In cases where the ARD is not approached, the
parameters can be found by estimating a first and then b as √(sc² − a²)/c, where sc is the dispersion found at a concentration c ≫ 0.
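A small sketch of that two-step estimation, with invented validation figures; the function name is illustrative rather than taken from any published software:

```python
import math

def characteristic_parameters(s0, sc, c):
    """Estimate the characteristic-function parameters from validation data.

    s0 : standard deviation (or uncertainty) estimated at or near zero concentration (= a)
    sc : dispersion observed at a concentration c well above the detection limit
    c  : that concentration
    Returns (a, b) for s = sqrt(a^2 + (b*c)^2).
    """
    a = s0
    if sc <= a:
        raise ValueError("sc must exceed the zero-concentration dispersion a")
    b = math.sqrt(sc**2 - a**2) / c
    return a, b

# Invented validation figures, for illustration only (units arbitrary):
a, b = characteristic_parameters(s0=0.8, sc=2.6, c=50.0)
print(f"a = {a:.2f}, b = {b:.4f}")
# b approximates the asymptotic relative standard deviation at high concentration.
```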
2.8 Precision and detection limit
The key measure of detection capability is the dispersion of the
measured analytical signal at or very close to zero concentration.
This is used to calculate the detection limit as a concentration
corresponding with critical points in the dispersion, almost
always under the assumption of a normal distribution. The
Fig. 11 Relative standard deviation varying with concentration according to the characteristic function s = √(a² + (bc)²), with a = 5, b = 0.05.
detection limit cL is defined in various ways so that, at the limit,
only a negligible proportion of measurement results will fall at or
below zero. (A proportion of negative observations is the natural
outcome of estimating concentrations close to zero: there is, of
course, no corresponding physical realisation. Some analytical
instruments suppress such readings, as do some data recording
practices.)
There is a considerable literature on the detection limit,55,56,57
which is still regarded as a key aspect of validation, but has
definite shortcomings. The definition of the term has become
unduly complicated and, if it were followed to the letter, would
be difficult to put into practice. Analytical chemists commonly
use a simplified version. Moreover, the detection limit cL,
together with other related ‘limits’, encourages a false dichotomy
of a concentration scale that is in reality continuous. Analysts
tend to consider a result of 1.1cL valid but a result of 0.9cL as
qualitatively different and to be reported only on an ordinal scale
(‘less than cL’ for example). A more modern approach is simply
to report the result—censored at zero if necessary—with its
appropriate uncertainty.58 It is not clear at present whether this
new paradigm will prevail.
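For concreteness, one widely used simplification (broadly in the spirit of Currie's scheme with false-positive and false-negative rates of about 0.05, not the full formal definition) computes the decision threshold and detection limit as fixed multiples of the standard deviation s0 at zero concentration, assuming normality:

```python
# A rough sketch of one common simplification; it is not the full ISO 11843
# treatment and assumes the dispersion s0 near zero concentration is normal
# and well estimated, in concentration units.
def detection_limits(s0):
    """Return (decision threshold, detection limit) in concentration units."""
    critical_value = 1.645 * s0    # threshold for declaring 'detected'
    detection_limit = 3.29 * s0    # concentration reliably distinguished from zero
    return critical_value, detection_limit

s0 = 0.12   # invented standard deviation at zero concentration, mg/kg
cc, cl = detection_limits(s0)
print(f"decision threshold ~ {cc:.2f} mg/kg, detection limit ~ {cl:.2f} mg/kg")
```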
2.8.1 Conditions of measurement for the detection limit. The
main issue for the purposes of this review is to identify the
conditions of measurement that provide the standard deviation
leading to a useful detection limit. Ideally the dispersion cited
should represent the standard uncertainty but that is seldom
given sufficient consideration. A detection limit is commonly
estimated under instrumental conditions of precision (Section
2.1), which cannot represent the true detection capability of the
entire analytical procedure. At the other extreme, estimation by
extrapolating real-life uncertainties to zero concentration
requires an elaborate experiment and more data than is
economically practicable, even in collaborative trials. There is
little information available on the comparison. One experiment,
however, found that detection limits extrapolated from repeat-
ability data tended to be between four and ten times greater than
the comparable values based on instrumental precision.59
Uncertainty in chemical measurement springs from three main
sources: (i) variation derived from the calibration/evaluation
function; (ii) variation in the preparation of the treated test
solution used for measurement; and (iii) error in the comparison
caused by matrix mismatch. An interesting conjecture is that
items (ii) and (iii) will contribute negligibly to uncertainty at zero
concentration, which will therefore be dominated by repeat-
ability dispersion. Were that true, a ‘real-life’ detection limit
could be estimated very easily from single laboratory validation.
There is virtually no experimental support for this conjecture,
but there are indications that it might repay further investigation.
In one study, the ‘characteristic functions’ of repeatability and reproducibility standard deviations estimated in a large collaborative trial48 extrapolated to almost the same intercept a (that is, the standard deviation at zero concentration). The intercept
estimates and their standard errors were (as % mass fraction):
repeatability standard deviation, 0.236 (0.044); reproducibility
standard deviation 0.261 (0.037), so the estimates were not
significantly different. The data and fitted lines are shown in
Fig. 12. Unfortunately, very few collaborative trials will provide
enough observations for this kind of study to be conclusive.
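For readers who want the arithmetic behind 'not significantly different', a naive z-type comparison of the two intercepts runs as follows; it treats the two estimates as independent, which is only approximately true because both derive from the same trial data.

```python
import math

# Intercept estimates and standard errors quoted above (% mass fraction).
a_r, se_r = 0.236, 0.044   # repeatability characteristic function
a_R, se_R = 0.261, 0.037   # reproducibility characteristic function

diff = a_R - a_r
se_diff = math.sqrt(se_r**2 + se_R**2)   # assumes roughly independent estimates
z = diff / se_diff
print(f"difference = {diff:.3f}, combined se = {se_diff:.3f}, z = {z:.2f}")
# z is well below 2, so the two intercepts are statistically indistinguishable.
```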
Fig. 12 Standard deviation of repeatability (open circles) and repro-
ducibility (solid circles) from a large collaborative trial (26 materials, 26
laboratories). The characteristic functions (lines) were fitted by
a weighted non-linear method. The bar on the y-axis shows the estimated
95% confidence interval for the intercepts.
Appendix A: useful distinctions among conditions ofmeasurement
These are regarded as a minimal set of conditions of measure-
ment for the unambiguous definition of precision in analytical
chemistry. They are proposed as a starting point for the eventual
establishment of an appropriate range of normative terms and
definitions.
A.1 Instrumental conditions: when a single portion of the
prepared test material (usually a test solution) is subjected to
replicated measurement, with no instrumental adjustment, in the
shortest possible time.
A.2 Calibration conditions: when an ‘inverse calibration’
precision is calculated from calibration data.
A.3 Immediate conditions: subset of ‘VIM3’ repeatability
conditions defined by differences between results from sequen-
tially adjacent pairs of a test material in an analytical run.
A.4 In-run conditions: subset of ‘VIM3’ repeatability condi-
tions defined by variation in results from replicate portions of
a test material alone in a run.
A.5 ‘Real-life’ conditions: subset of ‘VIM3’ repeatability
conditions where replicate test portions are in random positions
within a full-length run of routine test materials.
A.6 Run-to-run conditions: subset of ‘VIM3’ intermediate
conditions where test portions are replicated in random positions
within many runs of routine test materials.
A.7 Collaborative conditions: subset of ‘VIM3’ reproducibility conditions prevailing in collaborative trials.
A.8 Generic conditions: subset of ‘VIM3’ reproducibility conditions prevailing when a single method with procedural variations is used for routine analysis.
A.9 Broad conditions: subset of ‘VIM3’ reproducibility conditions where there is no restriction on the method of analysis.
Appendix B: statistical aspects
B.1 Rounding
For the most part, analytical results are treated as stemming
from an underlying continuous distribution. Under modern
conditions the initial measurements are truncated according to the
digit resolution of the instrument display and then subjected to
calculations that give the final result to an inordinate number of
significant figures. These results should be rounded for reporting
to a degree that avoids both a false suggestion of high precision
and a loss of information.
The commonly used rule of thumb is to retain the first digit
that is uncertain but, naively applied, that is sometimes delete-
rious. For instance, in the repeated results 4.8, 4.7, 4.5, 5.4, 5.1,
the digit to the left of the decimal point is variable but, under the
simple rule, all of the results round to 5, which would generate
a variance estimate of zero. A more useful principle is that the
rounding should reflect the dispersion of the data. We have seen
(above) that a standard deviation based on a statistically small
number of results is very unlikely to be more precise than 10%
relative, which suggests that only one significant figure is likely to
be meaningful in a standard deviation (or uncertainty). However,
retaining just one significant figure sometimes creates an ambi-
guity. Rounding a raw analytical result of 0.951 to exactly 1 implies a possible range of 0.500 to 1.499 and a potential relative uncertainty of ±50%. If the estimate were just slightly lower at 0.949, however, the rounded version would be 0.9, implying a range between 0.85 and 0.949, that is, a relative uncertainty of about ±6%. This ambiguity could be removed for all practical purposes by retaining an extra digit and rounding either number to 0.95.
An appropriate rule is therefore: (a) do no rounding except on the
final reported value; and (b) round the estimated standard
deviation, standard error or standard uncertainty to two
significant figures and round the result (or the mean result) to the
corresponding degree.
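A minimal sketch of rules (a) and (b) in code form (the function is illustrative, not from any standard library; note that Python's round() applies banker's rounding at exact ties):

```python
import math

def round_result(value, uncertainty):
    """Round the uncertainty to two significant figures and round the result
    to the corresponding decimal place, following rules (a) and (b) above."""
    if uncertainty <= 0:
        raise ValueError("uncertainty must be positive")
    exponent = math.floor(math.log10(uncertainty)) - 1   # place of 2nd significant figure
    return round(value, -exponent), round(uncertainty, -exponent)

print(round_result(4.8732, 0.2641))   # (4.87, 0.26)
print(round_result(0.951, 0.048))     # (0.951, 0.048)
```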
B.2 Observations near natural limits
Analysts respond to results falling below natural limits (such as
a concentration of zero) in a variety of ways, all of which cause
problems in a subsequent statistical analysis:
� repeat the measurement until a non-negative result is
obtained;
� record a value of zero;
� record a value of ‘less than’ an arbitrary limit, such as
a detection limit or multiple thereof.
The first two practices give rise to a positively biased mean and
a negatively biased standard deviation if naive estimation
procedures are used. This could have a noticeable effect on
precision estimates and thence detection limits. However,
a simple expedient is to estimate the precision at the expected
detection limit. The probability of obtaining a subzero result is
then negligible, as is the effect of the slight increase in
concentration.
A proportion of ‘less than’ results in a dataset renders least-
squares estimation impossible, although maximum likelihood
estimation can cope with these mixed-type datasets.60 These
difficulties can be important for chemical measurement, where
a considerable proportion of work is related to testing for
undesirable impurities. The best method for studying precision is
to record observations exactly as they occur, so that simple
statistical estimation applies. (This is a different circumstance
from reporting results to a customer, where subzero results
would usually be unacceptable. For that purpose, reporting
a zero result with a zero-truncated expanded uncertainty would
be appropriate.58)
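Returning to the maximum-likelihood treatment of mixed exact/‘less than’ datasets mentioned above, the following sketch (invented data, assuming an underlying normal distribution) lets exact results contribute the normal density and censored results the cumulative probability below their reporting limit:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Invented dataset: exact observations plus three results recorded only as '< 0.5'.
exact = np.array([0.62, 0.71, 0.55, 0.90, 0.52, 0.66])
censor_limit = 0.5
n_censored = 3

def neg_log_likelihood(params):
    mu, log_sigma = params
    sigma = np.exp(log_sigma)                       # keeps sigma positive
    ll = norm.logpdf(exact, mu, sigma).sum()        # exact results: density
    ll += n_censored * norm.logcdf(censor_limit, mu, sigma)   # censored: P(X < limit)
    return -ll

start = [exact.mean(), np.log(exact.std(ddof=1))]
fit = minimize(neg_log_likelihood, start, method="Nelder-Mead")
mu_hat, sigma_hat = fit.x[0], float(np.exp(fit.x[1]))
print(f"ML estimates: mean = {mu_hat:.3f}, standard deviation = {sigma_hat:.3f}")
```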
B.3 Outliers and robust estimation
Suspect values are not rare in chemical measurement: some of
these may be outliers. In estimating precision it is sometimes
regarded as appropriate to remove the influence of outlying
results. This is usually a matter of professional judgement
depending on the circumstances and purpose of the measure-
ment: no strict guidelines can be drawn. When a genuine outlier
occurs in replicated data the mean will be biased and the stan-
dard deviation inflated. These features can affect statistical
inference, such as tests of significance. In calibration/evaluation,
an outlier can have an unexpectedly large effect on an estimated
concentration.
In a dataset with an outlier, the analyst is faced with
a choice between describing all of the data badly and most of
the data well. Some analysts argue that raw statistics provide
a true representation of the dispersion of results that could be
expected from further use of the same procedure. I believe that
this is usually incorrect, because outliers are inherently
unpredictable. In any event, it is a question of degree—
everybody would agree to delete (or at least review) a value
that was ten times greater than the other members of the
dataset!
In cases where reducing the influence of outliers is deemed
necessary, the analyst has two options: (i) to employ outlier tests
and reject any result found unlikely to be a part of the pop-
ulation to be represented; or (ii) to employ robust statistical
methods, which typically downweight results far from the
central tendency. Outlier tests such as Dixon’s, Grubbs’s and
Cochran’s are commonly used by analysts, but have their
problems. Robust methods have become widely used since the
AMC study provided a rationale and software for robust
descriptive statistics61 and one-way analysis of variance.62
Robust regression methods in calibration are also
recommended.63
Robust methods are mostly applicable to unimodal distribu-
tions that are (outliers aside) close to symmetrical about the
mode. It is important to realise that there is no single robust
estimate of a parameter—the value obtained will depend on the
statistical procedure used, although all reputable methods should
give very similar results for roughly symmetric distributions.64
Having said that, there is no justification for calculating
a barrage of robust estimates and selecting from them on
a subjective basis.
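To illustrate the downweighting idea, the sketch below implements a simple iterative Huber-type (winsorised) estimate of mean and standard deviation; it is in the same spirit as, but not identical to, the AMC's published procedures, and the data are invented.

```python
import numpy as np

def huber_estimate(x, c=1.5, tol=1e-6, max_iter=100):
    """Iterative Huber-type robust mean and standard deviation.

    Results further than c*sigma from the current mean are winsorised
    (pulled in to mean +/- c*sigma); the factor 1.134 rescales the
    winsorised standard deviation to be consistent with a normal
    distribution when c = 1.5.
    """
    x = np.asarray(x, dtype=float)
    mu = np.median(x)
    sigma = 1.483 * np.median(np.abs(x - mu))      # MAD-based starting value
    for _ in range(max_iter):
        xw = np.clip(x, mu - c * sigma, mu + c * sigma)
        mu_new, sigma_new = xw.mean(), 1.134 * xw.std(ddof=1)
        if abs(mu_new - mu) < tol and abs(sigma_new - sigma) < tol:
            break
        mu, sigma = mu_new, sigma_new
    return mu, sigma

data = [10.2, 9.8, 10.1, 10.4, 9.9, 10.0, 15.3]    # last value is a gross outlier
print(huber_estimate(data))                        # robust mean near 10, sd not inflated
print(np.mean(data), np.std(data, ddof=1))         # classical statistics for comparison
```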
In collaborative trials, the Harmonised Protocol requires that
outliers be rejected before analysis of variance is carried out for the
estimation of repeatability and reproducibility standard devia-
tions. This is because the precisions are regarded as properties of
the analytical method rather than of the participating laborato-
ries.16 There is a fixed protocol for outlier removal, involving
Grubbs’s and Cochran’s tests. Robustification can be applied
also to analysis of variance and has been found to give statistics
almost identical with those from outlier deletion in results of
collaborative trials.65
B.4 Non-normal distributions
Normal distributions result from the addition of numerous
small independent variations—the central limit theorem.
Analytical results are the outcome of a succession of operations
that are often numerous, each of which contributes dispersion
that is sometimes small, mostly additive, and usually indepen-
dent. In short we expect but do not guarantee that analytical
results will, outliers aside, resemble a random sample from
a normal distribution. Under repeatability conditions we see
plausibly normal datasets, although minor deviations are
commonplace. The most usually encountered deviation is
a tendency to ‘heavy tails’, that is, an unduly high proportion of
results distant from the mean. Even under reproducibility
conditions, distributions approximating to the normal are
common. However, more drastic deviations from normal can
sometimes be observed. It is not rare to see datasets with
a skewed or bimodal tendency in the results of proficiency tests,
the outcome of the participants using one or more inconsistent
analytical methods. Statistical methods for characterising such
datasets include kernel density estimation66 and mixture
modelling.67
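As a brief illustration of the first of these (using simulated, deliberately bimodal ‘proficiency test’ results rather than real data), a Gaussian kernel density estimate exposes structure that a mean and standard deviation would conceal:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)

# Simulated proficiency-test results: two participant groups whose methods
# carry different biases, giving a bimodal distribution of reported values.
results = np.concatenate([rng.normal(10.0, 0.4, 40),
                          rng.normal(12.0, 0.5, 25)])

kde = gaussian_kde(results)                       # Gaussian kernel, default bandwidth
grid = np.linspace(results.min() - 1, results.max() + 1, 400)
density = kde(grid)

# Simple local-maximum search to locate the modes of the estimated density.
interior = (density[1:-1] > density[:-2]) & (density[1:-1] > density[2:])
print("density peaks near:", np.round(grid[1:-1][interior], 2))
```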
Other special circumstances can give rise to seriously non-
normal distributions in analytical results, resulting usually in
a positive skew.
�When results are censored at or near zero, a very asymmetric
distribution may be observed.
�When an analyte is restricted to a trace phase in a mixture of
phases, the number of discrete particles containing the analyte in
a test portion will vary according to a Poissonian distribution
(called the ‘nugget effect’ in geochemical analysis because it is
often observed in the analysis of ores of precious metals). When
small (less than ten) numbers of particles are involved, results
deviate more-or-less strongly from the normal distribution, as the simulation sketch after this list illustrates.
� When effects are multiplicative (rather than additive), as in
quantitative versions of the polymerase chain reaction (PCR),
distributions with a lognormal tendency are observed.68
� When the final result is a quotient of two widely dispersed
normally distributed variables, the outcome may have a notice-
able positive skew. This may sometimes be observed in profi-
ciency test data when the raw result is close to the detection limit
and is then corrected for a low recovery.
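Returning to the Poissonian ‘nugget effect’ above, a small simulation with invented parameters shows how a small expected particle count produces a markedly skewed, distinctly non-normal distribution of results:

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulated 'nugget effect': the analyte is carried by discrete particles whose
# number per test portion is Poisson-distributed, and each particle contributes
# a roughly equal amount, so the result is proportional to the particle count.
mean_particles = 3          # small expected number of analyte particles per portion
amount_per_particle = 2.5   # invented concentration contribution per particle

counts = rng.poisson(mean_particles, size=10000)
results = counts * amount_per_particle

mean = results.mean()
skew = ((results - mean)**3).mean() / results.std()**3
print(f"mean = {mean:.2f}, skewness = {skew:.2f}")        # clearly positive skew
print(f"proportion of exactly-zero results: {(results == 0).mean():.3f}")
# With a large expected particle count the same simulation looks near-normal.
```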
In such instances the analyst is sometimes tempted to log-
transform the data before statistical treatment. In most of the
situations described above (that is, apart from PCR), log-
transformation would tend to confuse rather than clarify the
issues, because the raw data will not be strictly lognormal. Log-
transformation may, however, be a useful technique in statistical
operations such as regression and analysis of variance, in
instances when we expect data showing an approximation to
constant relative standard deviation. The transformation has the
effect of stabilising the variance across the concentration range,
obviating the need for weighted statistical methods.
To avoid misunderstanding, readers should note that collec-
tions of data for the concentration of a trace constituent in
a large number of samples of a particular type often show
a distribution resembling the lognormal. This real variation
between different test materials must not be confused with the
dispersion of replicated analytical results.
Tests for normality are seldom informative except in speci-
alised studies. With small numbers of observations the power of
such a test is low, so significant outcomes are unlikely. In
particular, such tests require inordinately large numbers of
observations to distinguish between normally and lognormally
distributed variables.69 Real-life datasets with large numbers of
observations nearly always deviate from the normal to a signifi-
cant extent.
References
1 M. Thompson, S. L. R. Ellison and R. Wood, Pure Appl. Chem., 2002, 74, 835–855.
2 M. Thompson and R. Wood, Pure Appl. Chem., 1995, 67, 649–666.
3 M. Thompson, S. L. R. Ellison and R. Wood, Pure Appl. Chem., 2006, 78, 145–196.
4 ISO/IEC Guide 98:1995, Guide to the Expression of Uncertainty in Measurement (GUM), ISO, Geneva, 1995.
5 Quantifying Uncertainty in Analytical Measurement, ed. A. Williams, S. L. R. Ellison and M. Roesslein, Eurachem/CITAC Guide, 2nd edn, 2000, available from the Eurachem Secretariat and website, http://www.eurachem.com/.
6 International Vocabulary of Basic and General Terms in Metrology (VIM), 3rd edn, JCGM 200:2008, http://www.bipm.org/vim.
7 Eurachem/Eurolab/CITAC/Nordtest/AMC Guide: Measurement Uncertainty Arising from Sampling, ed. M. H. Ramsey and S. L. R. Ellison, Eurachem, 2007, ISBN 978 0 948926 26 6.
8 S. W. Holman, Discussion of the Precision of Measurements, Wiley, New York, 1892.
9 J. Kjeldahl, Fresenius’ Z. Anal. Chem., 1883, 22, 366–382.
10 W. F. Hillebrand and G. E. F. Lundell, Applied Inorganic Analysis, Wiley, New York, 1929.
11 H. W. Fairbairn, A Cooperative Investigation of Precision and Accuracy in Chemical, Spectrochemical and Modal Analysis of Silicate Rocks, U.S. Geological Survey Bulletin 980, Washington DC, 1951.
12 C. R. N. Strouts, J. H. Gilfillan and H. N. Wilson, Analytical Chemistry: the Working Tools, Clarendon Press, Oxford, 1955.
13 W. J. Youden, Statistical Methods for Chemists, Wiley, New York, 1951.
14 W. J. Youden, Statistical Techniques for Collaborative Tests, Association of Official Analytical Chemists, Washington DC, 1969.
15 ISO 5725-1:1994, Accuracy (Trueness and Precision) of Measurement Methods and Results—Part 1: General Principles and Definitions, ISO, Geneva, 1994.
16 W. Horwitz, Pure Appl. Chem., 1988, 60, 855–864.
17 M. Thompson, Analyst, 1994, 119, 127N.
18 S. L. R. Ellison and K. Mathieson, Accredit. Qual. Assur., 2008, 13, 231–238.
19 ISO 3534-1:2006, Statistics—Vocabulary and Symbols—Part 1: General Statistical Terms and Terms Used in Probability, ISO, Geneva, 2006.
20 M. Thompson, K. Mathieson, L. Owen, A. P. Damant and R. Wood, Accredit. Qual. Assur., 2009, 14, 73–78.
21 M. Thompson, Analyst, 1988, 113, 1469–1471.
22 J. N. Miller and J. C. Miller, Statistics and Chemometrics for Analytical Chemistry, Pearson Education Ltd, Harlow, UK, 6th edn, 2005.
23 N. R. Draper and H. Smith, Applied Regression Analysis, Wiley, New York, 3rd edn, 1998.
24 M. Thompson, Analyst, 2000, 125, 385–386.
25 AMC Technical Briefs, No 49, 2011.
26 M. Thompson and R. J. Howarth, J. Geochem. Explor., 1978, 9, 23–30.
27 AMC Technical Briefs, No 9, 2002.
28 M. Thompson and B. J. Coles, Accredit. Qual. Assur., 2011, 16, 13–19.
29 M. Thompson, Accredit. Qual. Assur., 2008, 13, 479–482.
30 R. J. Howarth, Analyst, 1995, 120, 1851–1873.
31 R. J. Howarth, B. J. Coles and M. H. Ramsey, Analyst, 2000, 125, 2032–2037.
32 M. Thompson and P. J. Lowthian, Notes on Statistics for Analytical Chemists, Imperial College Press, Singapore, 2011.
33 W. Horwitz and R. Albert, J. - Assoc. Off. Anal. Chem., 1984, 67, 81–90.
34 W. Horwitz and R. Albert, J. - Assoc. Off. Anal. Chem., 1984, 67, 648–652.
35 W. Horwitz and R. Albert, J. - Assoc. Off. Anal. Chem., 1985, 68, 112–121.
36 W. Horwitz and R. Albert, J. - Assoc. Off. Anal. Chem., 1985, 68, 191–198.
37 M. Margosis, W. Horwitz and R. Albert, J. - Assoc. Off. Anal. Chem., 1988, 71, 619–635.
38 J. T. Peeler, W. Horwitz and R. Albert, J. - Assoc. Off. Anal. Chem., 1989, 72, 784–806.
39 W. Horwitz, R. Albert, M. J. Deutch and J. N. Thompson, J. - Assoc. Off. Anal. Chem., 1990, 73, 661–680.
40 W. Horwitz and R. Albert, J. - Assoc. Off. Anal. Chem., 1991, 74, 718–744.
41 W. Horwitz, R. Albert, M. J. Deutch and J. N. Thompson, J. AOAC Int., 1992, 75, 227–239.
42 W. Horwitz, R. Albert and S. Nesheim, J. AOAC Int., 1993, 76, 461–491.
43 W. Horwitz and R. Albert, J. AOAC Int., 1996, 79, 589–621.
44 M. Thompson and P. J. Lowthian, J. AOAC Int., 1997, 80, 676–679.
45 M. Thompson, Analyst, 1999, 124, 991.
46 FAPAS Secretariat, Central Science Laboratory, FERA, Sand Hutton, York YO41 1LZ, UK.
47 W. Horwitz, P. Britton and S. J. Chirtel, J. AOAC Int., 1998, 81, 1257.
48 P. F. Kane, J. - Assoc. Off. Anal. Chem., 1984, 67, 869–877.
49 M. Thompson and P. J. Lowthian, Analyst, 1995, 120, 271–272.
50 Unpublished data.
51 M. H. Ramsey, personal communication.
52 M. Thompson and S. L. R. Ellison, Accredit. Qual. Assur., 2005, 10, 82–97.
53 M. Thompson and S. L. R. Ellison, Accredit. Qual. Assur., 2011, 16, 483–487.
54 M. Thompson, TrAC, Trends Anal. Chem., 2011, 30, 1168–1175.
55 ISO 11843-1:1997, Capability of Detection—Part 1: Terms and Definitions.
56 Compendium of Chemical Terminology (‘The Gold Book’), Online Corrected Edition, IUPAC, ‘Detection limit’, 2006.
57 L. A. Currie, Anal. Chim. Acta, 1999, 391, 105–126.
58 Analytical Methods Committee, Accredit. Qual. Assur., 2008, 13, 29–32.
59 M. Thompson, Analyst, 1988, 113, 1579–1587.
60 Y. Pawitan, In All Likelihood: Statistical Modelling and Inference Using Likelihood, Clarendon Press, Oxford, 2001, pp. 312–313.
61 Analytical Methods Committee, Analyst, 1989, 114, 1693–1697.
62 Analytical Methods Committee, Analyst, 1989, 114, 1699–1702.
63 AMC Technical Brief No 50, Anal. Methods, 2012, 4, 893–894.
64 S. L. R. Ellison, Accredit. Qual. Assur., 2009, 14, 411–419.
65 P. J. Lowthian, M. Thompson and R. Wood, Analyst, 1998, 123, 2803–2807.
66 M. Thompson, Analyst, 2002, 127, 1359–1364.
67 M. Thompson, Accredit. Qual. Assur., 2006, 10, 501–505.
68 M. Thompson, S. L. R. Ellison, L. Owen, K. Mathieson, J. Powell, P. Key, R. Wood and A. P. Damant, J. AOAC Int., 2006, 89, 232–239.
69 M. Thompson and R. J. Howarth, Analyst, 1980, 105, 1188–1195.