Analytical Methods | Critical Review
Cite this: Anal. Methods, 2012, 4, 1598
www.rsc.org/methods
Precision in chemical analysis: a critical survey of uses and abuses
Michael Thompson
School of Biological and Chemical Sciences, Birkbeck, University of London, Malet Street, London WC1E 7HX, UK. E-mail: [email protected]
Received 23rd January 2012, Accepted 24th April 2012
DOI: 10.1039/c2ay25083g
Precision is a key quantity in assessing the quality of chemical measurement results. It enters into
considerations of uncertainty, fitness for purpose, method validation, instrumental performance,
internal quality control, proficiency testing, and higher-level activities. The standard deviation of
measurement results derived from a single analytical ‘system’ (a combination of a particular analytical
procedure and a specific type of test material) depends on many factors, including the conditions of
measurement, the state of the test material, and the concentration of the analyte. It is essential that
these factors are properly matched to the use to which the precision information will be put.
Apologia
Does the analytical community really need yet another long
document about data quality? Well, in relation to precision, I
think that the answer is ‘yes’. In more than 50 years as an analyst
I have seen (and am still seeing) numerous examples of incorrect
estimation and inappropriate use of precision, in method devel-
opment, validation, quality control and, especially, in relation to
the estimation of uncertainty.
Problems occur when practitioners assess precision by repli-
cation of measurement under one set of conditions and then
apply the estimate to different and therefore inappropriate
conditions. Commonly encountered mismatches of conditions
occur between: (a) instrumental precision and repeatability; (b)
repeatability and reproducibility; (c) validation and quality
control; (d) one type of test material and another; and (e) one
concentration of analyte and another. Unfortunately the resul-
ting discrepancies can be large, and ignoring them can give rise to
misunderstanding and bad decision making. Getting the condi-
tions wrong gives rise to the tendency for analytical chemists to
underestimate when specifying the uncertainty associated with
their results. The fault is not all with practitioners: there is
insufficient guidance in textbooks and normative documents. It is
definitely worth thinking carefully about precision.
In this review, I examine the various conditions under which
precision is assessed and their relevance to various quality-
related practices: method development, validation, internal
quality control, collaborative trials and proficiency tests. I hope
to be excused for having taken most of the examples from my
own experience in the transport industry, forensics, prospecting
technology, biogeochemistry, and food quality.

Michael Thompson has been a professional analytical chemist since 1960, since that date having worked in industry, the Civil Service, academia and as a consultant. He is currently Emeritus Professor of Analytical Chemistry at Birkbeck, University of London. He has a long-standing interest and research involvement in the quality of analytical data. He has been awarded the SAC Gold Medal, the Theobald Lectureship (both by the RSC), the Harvey Wiley Award (by AOAC International), and Honorary Life Membership by the International Association of Geoanalysts.
1. Introduction
1.1 The concept of precision and scope of this study
Precision is a ubiquitous feature of chemical analysis as it figures
in most aspects of data quality, including method validation,1
internal quality control2 and proficiency testing.3 Crucially it is
referenced in the quantification of various contributions to the
combined uncertainty of the result of a measurement.4,5 The
current definition of precision in the Vocabulaire International
de Métrologie (VIM3) is: closeness of agreement between indica-
tions or measured quantity values obtained by replicate measure-
ments on the same or similar objects under specified conditions.6 As
the level of precision depends critically on the conditions of
measurement, it is essential for analytical chemists to understand
the exact implications of the various ways of assessing it.
Precision per se is an ordinal quantity because the only levels
available for comparison are lower, equal or higher. Standard
deviation provides a related ratio scale but quantifies dispersion
rather than precision. Awkwardly then for clear and concise
expression, higher precision correlates with smaller standard
deviation, and we must shun the common fault of using precision
and standard deviation as synonyms. The notion of precision, like
uncertainty, is usually taken by analytical chemists to exclude the
influence of observations resulting from gross errors, that is,
mistakes in the execution of the analytical procedure, malfunc-
tion of equipment or faulty calculation. This exclusion colours
our attitude to the handling of outlying results.
Precision can refer to any set of results conforming to the
above definition. For most purposes in chemical analysis,
however, conditions of measurement is assumed to imply the
replication of the entire analytical procedure starting with
separate test portions of a single homogeneous test sample. (In
some instances it may be relevant to include variation caused by
the physical treatment of the material submitted to the labora-
tory (the laboratory sample) before analysis.) In this context
homogeneous means only that residual heterogeneity makes
a minor, usually trivial, contribution to the variation in the
results. Other conditions of replication are of restricted use. For
instance, instrumental conditions refer only to the performance of
the analytical instrument, excluding variations arising from the
chemical treatment of the test portion.
This review treats only precision applied to chemical
measurement data on interval and ratio scales, that is, it excludes
qualitative and semi-quantitative data. The discussions assume
that the digit resolution in any measurement is capable of
reflecting adequately the dispersion of results. It is important to
notice that the precision of an analytical method is meaningful
only when the procedure is applied to a narrowly defined class of
test material, ‘vegetables’ for example. More strictly we should
refer to the precision of an analytical system comprising
a procedure and a type of test material. Sampling precision,7
although very important in the wider context, is not addressed
here.
1.2 Development of the concept of precision
The essential ideas of data quality were established by the 1890s.8
The terms error, accuracy, and precision were used then with
meanings very close to modern conceptions. Bias was recognised
as having components inherent in the measurement method itself
but also from a personal equation. A concept close to the modern
uncertainty was also recognised (but not under that name).
Standard deviation was introduced in 1893 in lectures by Karl
Pearson.
Uptake of these ideas by the analytical chemistry community
was very slow and patchy. From an early date9 practitioners were
content to discuss analytical method performance solely in terms
of accuracy (then taken as the closeness of agreement between
a measured value and a reference value). Precision, however, was
overall slow to emerge as a separate concept among analytical
chemists.
This was no doubt because analysis until recently comprised
the painstaking and time-consuming chemical manipulations of
gravimetry and titrimetry. Analysts had to concentrate on
getting the chemistry right, and there was little incentive for them
to replicate longwinded measurements. Landmark texts such as
Hillebrand and Lundell10 did not mention precision. The advent
of rapid instrumental analysis changed all of that. By the 1950s
analysts began to recognise a clear distinction between accuracy
(as smallness of bias) and precision (as smallness of dispersion).11
(Note: accuracy was originally regarded as smallness of bias, but
has been replaced for this meaning with trueness. Accuracy is
now defined in VIM3 as ‘closeness of agreement between
a measured quantity value and a true quantity value of
a measurand’.)
The effect of conditions of measurement on precision was
recognised by the 1950s.12 The key distinction between within-
laboratory conditions and between-laboratory conditions for
estimating precision was emphasised by Youden13,14 who, in
1969, tentatively ascribed to them the respective terms repeat-
ability and reproducibility in relation to the landmark develop-
ment of the collaborative trial (inter-laboratory method
performance study). Adoption of these terms in the analytical
community was reinforced through international standards15
and protocols.16 Even so, this distinction has been slow to be
generally recognised: undergraduate texts in analytical chemistry
until recently tended to ignore the distinction (or even get it
wrong!). In a survey of papers in one issue of The Analyst in 1994,
it was found that repeatability and reproducibility were confused
with each other in no less than 40% of the papers.17 This lack of
discrimination was cognate with the common propensity of
analysts to underestimate their uncertainties, a tendency still
discernible in 2008.18
1.3 Interpretations of VIM3
Shortcomings in the VIM3 definition of precision are immedi-
ately apparent to the analytical chemist. At face value VIM3
precision does not apply to measurements on substances (e.g.,
steel) or specific bodies of material (e.g., consignments of
peanuts), as opposed to objects. We must assume that the wider
meaning was intended. Another problem stems from same or
similar. Replicated chemical measurement usually (but not
always) involves separate test portions of the material of interest,
rather than the same object. The alternative word similar is
vague. Which qualities have to be similar? How similar is similar?
An earlier definition of precision19 is of interest: closeness of
agreement between independent test results obtained under stipu-
lated conditions. A noteworthy difference is that in VIM3 the
requirement for independence has been dropped. This change
could have important repercussions. Independence is one of the
premises supporting many types of statistical inference, such as
commonly used tests of significance. The possibility of serial
dependence (time series) in analytical data should always be
borne in mind.
The most commonly referenced conditions of replication are
repeatability and reproducibility. These terms, as originally
developed for use in analytical chemistry, had a narrow specific
meaning.13,15,19 Their original definitions, however, have been
generalised in VIM3, so that there are now unhelpful variations
in the way that the terms can be interpreted. Besides these
conditions, plus the VIM3-defined intermediate conditions, there
are other conditions of replication that are commonly encoun-
tered but are not defined in normative documents, not consis-
tently named, and often used inappropriately. Of these,
instrumental precision and calibration precision (using my own
terminology for want of apt alternatives) are in a class apart,
referring largely to the instrumental measurement aspect of an
analytical method. All other conditions refer to replication of the
complete analytical procedure, beginning with the selection and
weighing of a test portion. These conditions are summarised in
Table 1 and discussed in more detail subsequently. A quantita-
tive example of the way that standard deviation in a single
method (zinc in foodstuffs by atomic spectrometry) depends on
conditions of measurement is also shown in the table, to
emphasise the importance of using the value appropriate to the
context.
Fig. 1 Values (solid circles) and 95% confidence limits (open circles) of
a standard deviation estimated from various numbers of independent
random observations taken from a standard normal distribution (i.e.,
with a variance of unity).
1.4 Statistical considerations
This survey is concerned with clarifying the definitions and
appropriate applications of the various conditions for assessing
precision. There is an appendix commenting on aspects of the
statistical approaches that support these applications. However,
there are certain statistical facts that, from the beginning, colour
any discussion of precision and uncertainty and are therefore
worth stating immediately.
• A standard deviation s is an estimate of a population value σ,
derived from observations x1, x2, …, xn via the equation
s = √(Σ(xi − x̄)²/(n − 1)), and is itself a variable. Estimates from
small numbers of replicated results have a wide dispersion
(Fig. 1). For example, if you want a standard deviation estimate
that itself has a relative standard error of 10%, you would need to
use 50 replicated results. With the customary 10 replicates, the
relative standard error of the estimate would be 22%. This
implies that random variations in estimates of major components
of uncertainty are often able to dwarf the complete contributions
from minor components. Moreover, while the variance (s²)
provides an unbiased estimate of σ², s is a biased estimate of σ,
and the bias is especially noticeable for small n (<10). These features
may be important when we consider standard deviation per se.
However, use of the t-distribution takes care of these problems in
estimating the confidence limits of means. (Notes: the standard
deviation of a statistic (as opposed to a simple variable) is called
a 'standard error'. 'Relative standard deviation' or 'relative
standard error' refers to the value of s/x̄ or 100s/x̄ as appropriate.)

Table 1  Current usage of terms denoting conditions of replication in chemical measurement. All assume independence of results. The example relative standard deviations (RSD) refer to the determination of zinc, at midrange concentrations in foodstuffs, by atomic spectrometry

Name of condition | Conditions of replication | Comments | Example RSD, %
'Instrumental' | Replication of measurement as quickly as possible, on a single portion of test solution, with no adjustment of instrument. | Does not include variation originating from separate test portions or chemical manipulations. | 0.9
'Calibration' | Replication of results obtained by repeat measurement of a single test solution, involving evaluation via the estimated calibration function. | Does not include variation originating from separate test portions or chemical manipulations. | 1.9
Repeatability | Replication with the same analytical procedure, instrument and reagents, in the same laboratory, by the same analyst, in a 'short' period of time. | The 'short period of time' is the length of an analytical 'run', that is, the period in which we assume that the factors affecting the magnitude of errors have not changed. | 2.9
Intermediate (synonyms 'run-to-run' and 'within-laboratory reproducibility') | Replication in separate runs. Same analytical procedure and laboratory, but there may be different analysts, instruments, or batches of reagent. | This condition of precision is addressed in internal quality control. | 4.0
Reproducibility (1) | Replication by the same analytical procedure in different laboratories. | This is the estimate provided by the collaborative (interlaboratory) trial. | 5.8
Reproducibility (2) | Replication by the same nominal method but with variation in details in different laboratories. | Estimate can often be obtained from the results of a single round of a proficiency test. | 6.0
Reproducibility (3) | Replication by various methods in different laboratories. | Estimate can often be obtained from the results of a single round of a proficiency test. The standard deviation is usually greater than that of reproducibility (1). | 7.4
• Chemical analysis is concerned with quantities that are
strictly bounded: masses and amounts of substance (and there-
fore mass fractions, concentrations, etc.) cannot be negative. But
the quantities actually measured—light intensity is an example—
are only surrogates for mass or amount, and are often not
bounded at a signal strength corresponding to zero concentra-
tion. This means that chemical measurements can (and some-
times do) give rise to analytical signals that impute negative
results on the concentration scale (Fig. 2). While such results
have no corresponding physical realisation, they are still
imparting real information about precision and we need to
handle them aptly to avoid incorrect conclusions.
• Precision varies with the concentration of the analyte, as
does relative precision. In much of the discussion below, it
has been possible to discuss aspects of precision as though it
were invariant. However, in analytical systems where the
concentration of the analyte in typical samples is dispersed over
a wide range, this dependence of precision on concentration has
to be taken into account. Methods of doing that are discussed in
detail below (Section 2.7 ff).
• Outliers among replicated results in chemical measurement
are not rare. In a study of proficiency test results from 2006,
covering many different analytes, test materials and laboratories,
as many as 4% of reported results were classed as outliers.20 The
appropriate statistical treatment of such results is still a disputed
topic among analytical chemists.
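The sampling behaviour of s described above is easy to verify numerically. The short Python sketch below (an illustration added here, not part of the paper; it assumes numpy and scipy are available) computes the 95% limits of a standard deviation estimated from n observations of a standard normal distribution, the quantities plotted in Fig. 1, together with the rough approximation 1/√(2n) for the relative standard error of s.

# Sampling dispersion of a standard deviation estimated from n normal observations.
# Illustrative sketch; under normality s = sqrt(chi2_{n-1} / (n - 1)) when sigma = 1.
import numpy as np
from scipy.stats import chi2

def sd_limits(n, prob=0.95):
    """95% limits of s estimated from n observations of N(0, 1)."""
    df = n - 1
    lo, hi = chi2.ppf([(1 - prob) / 2, (1 + prob) / 2], df)
    return np.sqrt(lo / df), np.sqrt(hi / df)

for n in (3, 10, 50):
    lo, hi = sd_limits(n)
    approx_rse = 1 / np.sqrt(2 * n)   # rough relative standard error of s
    print(f"n = {n:2d}: 95% limits of s = ({lo:.2f}, {hi:.2f}), "
          f"approximate RSE ~ {approx_rse:.0%}")

Under these assumptions the n = 3 limits come out close to (0.16, 1.92), the figures quoted in Section 2.1, while n = 10 and n = 50 give approximate relative standard errors of about 22% and 10%, matching the values cited above.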
2. Definitions and applications
2.1 Instrumental precision
Instrumental conditions of replication pertain when a single
portion of the prepared test material (usually a test solution) is
subjected to replicated measurement, with no instrumental
adjustment, in the shortest possible time. This quantity essen-
tially describes the short-term behaviour of the instrument only
Fig. 2 Conceptual calibration function (solid line) near zero concen-
tration. Random variation (example points) in the analytical signal at
zero gives rise to a proportion of negative concentration results via the
extrapolated function.
and is therefore of limited relevance when, as usual, measurement
is preceded by chemical treatment. It is an essential tool in
instrumental development, and is often quoted in research papers
but, as a glance at Table 1 confirms, would be grossly misleading
if used to predict analytical performance in real-life conditions. It
is properly used, in classes of analysis like atomic spectrometry,
in checking the short-term instrumental stability before a run of
analysis begins.
It is less helpful where several rapidly replicated readings from
a single test solution are averaged for the final reading, as in
atomic spectrometry methods. A transient instrument malfunc-
tion, imperfectly mixed solution, or a memory effect might well
give rise to poor precision at that stage. However, the converse
inference is not true: a poor precision estimate does not neces-
sarily imply a problem. Standard deviations estimated from a few
repeated readings will have relatively enormous uncertainties, so
little can be read into these outputs (Fig. 1). For instance, for
a population standard deviation of 1.0, the 95% confidence limits
of a standard deviation estimated from the commonly used three
results will be as wide as (0.16, 1.92). Clearly such a statistic
would be misleading in screening results for problems.
Instrumental standard deviations tend to be several times
smaller than the corresponding repeatability standard devia-
tions, which include lower-frequency variations in the baseline
signal and sensitivity, and effects brought about by variations in
the chemical treatment of successive test portions. They should
never be taken to represent the precision of the whole analytical
procedure. A common mistake occurs when instrumental preci-
sion is assessed at or near zero concentration and then used by
the unwary to quantify a detection limit. This practice, some-
times seen in instrument brochures, gives rise to a false idea of the
detection power of an analytical method, with discrepancies as
large as tenfold between ‘instrumental detection limit’ and the
more reasonable ‘repeatability detection limit’, which includes
inter alia some effects resulting from variation in chemical pre-
treatment of the test portions.21
2.2 Calibration precision
When an instrument is calibrated, the analyst measures the
analytical signals (xi) corresponding to calibrators containing
different concentrations (ci) of the analyte and calculates a cali-
bration function x = f(c, q) with parameter estimates q = [q1, q2,
…, qn]. A concentration c0 in a test solution is calculated from the
corresponding response x0 via the 'evaluation function' c0 = f⁻¹(x0, q), in an operation sometimes called 'inverse calibration'.
(With a simple linear calibration function x = a + bc, we have
parameter estimates q = [a, b] and c0 = (x0 − a)/b.) As both x0 and q are variables, they interact to provide an unexpectedly
large dispersion of possible values of c0 (Fig. 3). The corresponding standard deviation can be calculated directly from
the calibration data and, under the normal assumption, provides
confidence limits on the predicted concentration, sometimes
called ‘inverse confidence limits’22 or ‘fiducial limits’.23 Lack of fit
between the calibration data and the selected calibration function
is subsumed into the precision. This can be a useful exercise to
check or improve the calibration strategy. However, the cali-
bration precision obtained does not reflect the uncertainty of the
‘real-life’ analytical result and must not be used for that purpose.
Fig. 3 Schematic diagram of an estimated calibration function (diagonal
solid line) with confidence interval (shaded band) and a newly observed
response (horizontal solid line) with its own confidence interval (shaded
band). The response x0 gives rise to an estimated concentration c0. The
interaction between the two confidence bands gives rise to an unexpect-
edly wide confidence interval (double arrow) around c0.
This is because error contributions from the preparation of the
test solution from the test portion, and matrix mismatch between
the calibrators and test solutions, are not accounted for.
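As a concrete illustration of the calculation sketched above, the following Python fragment fits a straight-line calibration by ordinary least squares and propagates the calibration scatter into an approximate standard deviation for c0, using the standard first-order formula from regression theory. The data and the function name c0_with_sd are invented for the example; this is a sketch of calibration precision only, not of the full uncertainty of a result.

# Sketch: straight-line calibration x = a + b*c and the approximate standard
# deviation of a concentration c0 read back from a new response x0.
# Hypothetical data; the formula is the usual first-order regression result.
import numpy as np

c = np.array([0.0, 2.0, 4.0, 6.0, 8.0, 10.0])        # calibrator concentrations
x = np.array([0.05, 2.10, 3.95, 6.20, 7.90, 10.10])  # measured signals

n = len(c)
b, a = np.polyfit(c, x, 1)                  # slope b and intercept a
resid = x - (a + b * c)
s_xc = np.sqrt(np.sum(resid**2) / (n - 2))  # residual standard deviation

def c0_with_sd(x0, m=1):
    """Concentration estimate and its approximate calibration-only standard
    deviation, for a new response x0 that is the mean of m replicate readings."""
    c0 = (x0 - a) / b
    s_c0 = (s_xc / abs(b)) * np.sqrt(1.0 / m + 1.0 / n
                                     + (x0 - x.mean())**2 / (b**2 * np.sum((c - c.mean())**2)))
    return c0, s_c0

c0, s_c0 = c0_with_sd(5.0)
print(f"c0 = {c0:.2f}, calibration-only sd = {s_c0:.2f}")

As the text stresses, this dispersion excludes the preparation of the test solution and any matrix mismatch, so it must not be presented as the uncertainty of a real-life result.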
Fig. 4 A run of replicate observations under repeatability conditions
showing raw results (upper plot) and differences between adjacent results
(lower plot). The clear trend of the raw results is absent in the differences.
The standard deviation of the differences is divided by √2 to obtain the
de-trended standard deviation.
2.3 Repeatability precision
Repeatability conditions are defined as: condition of measure-
ment, out of a set of conditions that includes the same measurement
procedure, same operators, same measuring system, same oper-
ating conditions and same location, and replicate measurements on
the same or similar objects over a short period of time.6 Unlike
instrumental precision and calibration precision, repeatability is
taken to involve ‘real-life’ analysis as the chemical manipulations
preceding measurement are included as possible sources of
variation. However, the definition is ambiguous for analytical
chemists and this leads, unfortunately, to a range of possible sub-
types of repeatability.
2.3.1 Initial interpretation—the ‘run’. As well as the previ-
ously noted problem with same or similar, the vagueness in this
definition occurs in short period, which leaves the conditions of
measurement open to an important variety of interpretations.
First, an analytical procedure may extend over several days in
stages, between which the part-completed work is set aside. Thus
we might have: on Day 1, weighing the test portions; on Day 2
chemical decomposition; on Day 3 making solutions to a fixed
volume and instrumental measurement. This is hardly a short
period, but would certainly be covered by the original idea of
repeatability conditions. Secondly, overlooking this problem,
how short is short?
A prima facie interpretation of repeatability might be that
short means a period during which factors that determine
precision remain constant, including factors not specified in the
VIM3 definition. But factors do not remain constant—there are
inevitably systematic changes with time, even if negligibly small.
So in the real world we have to specify instead a period during
which the factors have of necessity to be regarded as constant.
Any systematic changes within the period then become attributed
to random variation. This period is the analytical run,24 during
which a number of test materials of the same type are processed
sequentially as a batch, and drifts will be negligible, or at least
tolerable in the context of fitness for purpose. A run could
conceivably comprise hundreds of different test materials or as
few as one. The duration of a run would depend on the stability
of the particular method and the requirements of fitness for
purpose. Often the run would be defined as the period between
discrete changes in the analytical system, such as the preparation
of new batches of reagents or restarting an instrument after an
overnight shutdown. Adjustment of calibration drift might or
might not define the start of a run.
2.3.2 Factors affecting repeatability precision. A run of
analyses carried out to assess repeatability precision could
conceivably comprise nothing other than a succession of test
portions of the same homogeneous test sample. In such a run it is
a simple task to detect significant drifts and, if required, de-trend
the data. This possibility immediately engenders two extreme
versions of repeatability precision, that is, precision determined
either on the raw data or on the de-trended data. Although this
experiment is seldom carried out, the distinction is by no means
an academic quibble. Fig. 4 shows a run of such repeatability
results in which a clear trend is visible, of a magnitude roughly
the same as the peak-to-peak variation. This behaviour is typical
of many analytical systems. The lower run shows the same results
de-trended by plotting the differences between successive results. The
standard deviation of the raw results (0.58) is reduced to 0.46 in
the de-trended data. (Note that the standard deviation of the
differences is divided by √2 to obtain the de-trended standard
deviation, because the variance of a difference is the sum of the
variances of the individual values.) A specific term, immediate
conditions, is suggested for detrended repeatability conditions,
which would apply to duplicates adjacent in the sequence of an
analytical run.
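A minimal sketch of the de-trending calculation follows (illustrative only; the simulated drifting run stands in for the behaviour shown in Fig. 4, and the drift and noise values are invented).

# De-trended ('immediate') repeatability from successive differences.
import numpy as np

rng = np.random.default_rng(1)
run = 0.05 * np.arange(40) + rng.normal(0.0, 0.4, size=40)  # drifting run of results

raw_sd = np.std(run, ddof=1)                        # includes the drift
diffs = np.diff(run)                                # differences between adjacent results
detrended_sd = np.std(diffs, ddof=1) / np.sqrt(2)   # var(difference) = 2 * var(single result)

print(f"raw sd = {raw_sd:.2f}, de-trended sd = {detrended_sd:.2f}")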
A further factor affecting estimates of repeatability standard
deviation is the condition of the test material. In ‘real-life’
analysis (that is, with typical test materials) the state of the test
samples will be representative automatically. Often, however,
precision is studied by the replication of measurements on
Fig. 5 Absolute differences of duplicated measurements plotted against
mean results (points), with lines showing percentiles for a relative stan-
dard deviation of 0.1 (median solid, 95th percentile dashed, and 99th
percentile dotted).
Fig. 6 Duplicated results (arbitrary units) showing absolute difference
plotted against mean (black squares), bin boundaries (dashed lines), and
medians of results in bins (red solid circles). The corresponding estimates
of standard deviation are the median absolute differences divided by
0.954.
reference materials or ‘control materials’ that, even when matrix-
matched, are not completely representative. These materials are
specially prepared for use by atypically fine grinding and thor-
ough mixing, and sometimes additionally by chemical treatment
to ensure long-term stability. That is of course essential for many
analytical tasks, such as internal quality control (see Section 2.4),
but it will usually improve precision noticeably beyond that of
‘real-life’ analysis. These special materials will tend to provide
unrealistically small estimates of dispersion that may not be
suitable for incorporation into uncertainty budgets.
2.3.3 Within-run repeatability. In real-life analysis, duplica-
tion is often required, either as a feature of the method per se or
for the purpose of within-run internal quality control. If the
duplicate test portions were processed in adjacent positions in the
run, then the de-trended precision would describe the distribu-
tion of differences between adjacent pairs of results. However,
de-trended precision would not be typical of variation over the
whole run, that is, under a within-run interpretation of repeat-
ability. For that purpose the duplicate test portions should be
located at random positions in the sequence. An additional
feature of ‘real-life’ analysis is that the run will comprise
a number of different test materials, reference materials, control
materials, blanks, etc. Adjacent test solutions in the measure-
ment sequence may thus have very different compositions, giving
rise to the possibility of memory effects. These extra sources of
variation combine to produce the ‘real-life’ repeatability condi-
tions, which cannot therefore be assessed simply by an unbroken
run of repeat test portions.
Real-life repeatability could be addressed by intercalating
portions of a single test material at random positions in a real-life
run. However, any single test material might not be representa-
tive of the class of the test material specified in the procedure and
could give rise to an atypical standard deviation. An alternative
approach that avoids this problem is to duplicate the entire set of
test materials in a randomised sequence in the run. The precision
standard deviation could be estimated from the differences
between corresponding pairs. The potential for variation of
precision with concentration of the analyte would have to be
taken into account but this is readily manageable, given enough
data (see Section 2.3.4).
2.3.4 Within-run precision tests. In instances where dupli-
cated results are obtained within a run from a not-large number
of test materials, the data can be used to check for unacceptably
poor repeatability precision. This may be especially valuable in
non-routine analysis where run-to-run statistical control is a void
concept. (Even in routine (multi-run) analysis, duplication can
support run-to-run control as an extra diagnostic guide and help
to eliminate blunders.25) However, we have seen that standard
deviations estimated from two results are biased, as would be an
average of several such estimates. An additional complication is
that the concentrations of the analyte in successive test materials
will differ and the precision is likely to be dependent on the
concentration. A compact way of overcoming these problems is
to ‘map’ the absolute difference between each duplicate pair
against the mean of the two results. Lines can be placed on such
a map showing percentiles of a prescribed distribution as
a function of concentration (Fig. 5). A key feature of the map is
that the median absolute difference (MAD) between random
duplicate results from a normal distribution N(μ, σ²) will have an
expectation of 0.954σ, close enough to σ to ignore the
factor for visual comparison. The underlying standard deviation
could represent an independent criterion of fitness for
purpose.26,27 Adherence to the criterion could be judged visually
(Fig. 5).
When large numbers of duplicated results are available, stan-
dard deviations can be estimated with reasonable accuracy and
the relationship with concentration explored.28 The absolute
differences can be binned into narrow concentration ranges
containing differences from at least 20 different test materials
(Fig. 6). Within each bin the concentration c can be regarded as
constant and estimated as the median, and the standard devia-
tion can be estimated as sc = MAD/0.954. Use of the median
robustifies the estimate against the influence of outlying results
from atypical test materials. The relationship between sc and c
can then be considered.
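A minimal sketch of these binned duplicate statistics is given below, using simulated duplicate results with a constant 5% relative standard deviation; the data, bin choice, and variable names are invented for the illustration.

# 'Duplicate map' statistics: bin duplicate pairs by their mean concentration and
# estimate the standard deviation in each bin as median(|difference|) / 0.954.
import numpy as np

rng = np.random.default_rng(2)
true_c = rng.uniform(1.0, 100.0, size=200)                 # 200 test materials
pairs = true_c[:, None] * (1 + rng.normal(0, 0.05, size=(200, 2)))  # duplicate results

means = pairs.mean(axis=1)
abs_diff = np.abs(pairs[:, 0] - pairs[:, 1])

bins = np.quantile(means, np.linspace(0, 1, 6))   # 5 bins of ~40 materials (> 20 each)
bins[-1] += 1e-9                                  # make the last edge inclusive
for lo, hi in zip(bins[:-1], bins[1:]):
    sel = (means >= lo) & (means < hi)
    c_mid = np.median(means[sel])
    s_c = np.median(abs_diff[sel]) / 0.954        # robust estimate of sigma near c_mid
    print(f"c ~ {c_mid:6.1f}: s_c ~ {s_c:5.2f} (relative {s_c / c_mid:.1%})")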
2.3.5 Repeatability precision from collaborative trials.
Collaborative trials (interlaboratory method performance
studies) involve the analysis of at least five different examples of
a class of test material, by a carefully described method, inde-
pendently in eight or more competent laboratories. The test
materials are selected to encompass a range of concentrations of
the analyte and to span the matrix types falling within the class.
The analysis is carried out with blind duplication or with a split-
level design in a randomised order to minimise observer bias. The
results for each test material are separately subjected to one-way
analysis of variance, to provide estimates of repeatability and
reproducibility standard deviation. These statistics are regarded
as properties of the method applied to the specified class of test
material. These conditions of analysis approximate to real-life,
although the test materials are likely to be atypical in that they
will have been subjected to fine grinding to ensure that they are
sufficiently close to homogeneous. A large number of methods
have been subjected to a collaborative trial, especially in the food
analysis sector. Shortcomings of the collaborative trial are (i) it is
very expensive to carry out, typically £50 000–100 000 and (ii) the
dataset is small so that the 95% confidence intervals on the
estimated standard deviations are uncomfortably wide.29
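For a single test material analysed in blind duplicate by several laboratories, the repeatability and reproducibility standard deviations follow from one-way analysis of variance. The sketch below shows that calculation on hypothetical results; the outlier screening required by the full protocol is omitted.

# One-way ANOVA estimates of repeatability (s_r) and reproducibility (s_R)
# for one test material analysed in blind duplicate by several laboratories.
import numpy as np

results = np.array([   # rows = laboratories, columns = duplicate results (hypothetical)
    [10.1, 10.3], [ 9.8,  9.9], [10.6, 10.4], [10.0, 10.2],
    [ 9.5,  9.7], [10.9, 11.0], [10.2, 10.1], [ 9.9, 10.4],
])
p, n_rep = results.shape

lab_means = results.mean(axis=1)
grand_mean = results.mean()

ms_within = np.sum((results - lab_means[:, None])**2) / (p * (n_rep - 1))
ms_between = n_rep * np.sum((lab_means - grand_mean)**2) / (p - 1)

s_r = np.sqrt(ms_within)                              # repeatability sd
var_lab = max((ms_between - ms_within) / n_rep, 0.0)  # between-laboratory variance
s_R = np.sqrt(s_r**2 + var_lab)                       # reproducibility sd

print(f"s_r = {s_r:.3f}, s_R = {s_R:.3f}")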
The repeatability standard deviation derived from a collabo-
rative trial is an average over all the participating laboratories. It
is a useful guide for potential users of the method, but cannot be
taken a priori as describing precision in individual participating
laboratories, or in non-participants, as conditions of measure-
ment will differ between laboratories. In consequence, a method
would have to be revalidated and its repeatability precision re-
assessed for use in a new laboratory.
2.4 Intermediate conditions of precision
Intermediate conditions are defined as: condition of measurement,
out of a set of conditions that includes the same measurement
procedure, same location, and replicate measurements on the same
or similar objects over an extended period of time, but may include
other conditions involving changes.6 This definition suffers from
the ambiguities previously noted, and is also unfortunately often
referred to as ‘within-laboratory reproducibility’. From the
standpoint of analytical chemistry, the only useful condition
from the set is run-to-run precision, which is the relevant measure
for internal quality control1 involving control materials and
control charts.
2.4.1 Statistical control. The essential idea of statistical
control is that the behaviour of a relevant index of the system
resembles an independent random variable from a normal
distribution. A deviation from the mean value, of magnitude
greater than three standard deviations, is taken as so unlikely
under statistical control that it indicates the opposite. Out-of-
control conditions thus show that factors affecting the level of
uncertainty of the measurement have changed. This topic has
been reviewed in depth for univariate analytical data30 and
studied for multivariate data.31
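How unlikely is such a deviation? For an in-control variable that really is normally distributed the figures are easily checked (an added illustration, assuming scipy is available).

# False-alarm behaviour of the 3-sigma rule for an in-control normal variable.
from scipy.stats import norm

p = 2 * norm.sf(3.0)                  # probability that |z| exceeds 3 under control
print(f"P(|z| > 3) = {p:.4f}")        # about 0.0027
print(f"average run length between false alarms ~ {1 / p:.0f} results")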
Internal quality control is essential in routine analysis to
ensure as far as possible that conditions affecting uncertainty of
results are stable over long periods of time, so that uncertainties
established during or close to the initial validation of a method in
a single laboratory can be attributed to results obtained there in
the indefinite future. It is executed by the insertion, into each run
of test portions, of one or more control materials. These materials
act as surrogates for the test materials so must have a bulk
composition typical of the materials under test. Moreover, the
concentration(s) of the analyte(s) in the control material(s) must
be appropriate for the application. The control materials will
probably be more finely divided than usual to reduce heteroge-
neity to negligible levels, and dried or otherwise treated to ensure
stability. Results are plotted on control charts. Where out-of-
control conditions are indicated, analysis should be halted until
the cause of the problem has been investigated and, where
necessary, alleviated. The affected run of results is reviewed with
the possibility of rejection and reanalysis.
A word of caution about the exact purpose of a control chart is
required to deflect practitioners away from some common
misunderstandings: the control chart must be based only on the
statistical behaviour of an indicator variable of the system under
study. That variable would normally be the result of a specific
analytical method applied in a particular laboratory to portions
of a specific control material. The control chart must be defined
by the mean and standard deviation of the indicator variable
itself and no other. Separately determined preferred values or
certified values should not be used for the mean. Nor should fit-
for-purpose, certified ranges or other uncertainties be used to set
the standard deviation. The point is that the control chart
describes the complete analytical system applied to a specific
control material. Certificate values, however, describe the refer-
ence material alone, while fitness-for-purpose criteria describe an
ideal (and therefore nonexistent) situation. (The behaviour of the
control material may not represent exactly the precision relevant
to routine test materials, because it is likely to be more finely
ground, so judgement should be used in any application of the
control statistics outside the control chart itself.)
Certified reference materials are sometimes used in quality
control, indeed are mandatory in some sectors, but are much
more reasonably used as occasional checks on accuracy, effec-
tively as a one-laboratory proficiency test. The use of a CRM on
a scale appropriate for internal quality control would nearly
always be inordinately expensive but on a lesser scale ineffectual.
2.4.2 Initiating a control chart. Setting up a control chart
calls for considerably more care than generally realised.30 Text-
books commonly refer to deriving the control lines from the
parameters of the process, that is, the mean m and the standard
deviation s. However, we have access only to estimates �x, s of
these. When setting up a control chart we typically have few
observations and their values, even for an ‘in control’ system, are
likely to be erratic. Moreover, the analysts will often be relatively
inexperienced with a new method so that early results are likely
to be more variable, and more likely to contain outliers, than
those obtained when the process has ‘bedded down’. In practice
the limits on a control chart may become stabilised only after
about 30 results have accumulated and the parameters estimated
from them by a robust method.32
In addition to these statistical problems, the results have to be
as far as possible representative of the system as it operates under
routine conditions. We need to see the variability of results on the
control material when it is in a random position in successive
runs containing the usual number of test materials, check solu-
tions, blanks, and duplicates, etc. All of this implies that a good
value for run-to-run standard deviation cannot be obtained
during a one-off validation exercise, but only after the system has
been in actual use for some time. That presents a difficulty
because we need a control chart from the very start of routine
operations.
The best approach seems to be to start immediately after
validation, with an interim chart that is reviewed and replaced
after enough ‘real-life’ results have accumulated. The reviews
should take place after (say) 10, 20 and 30 runs have taken place,
and thereafter at less frequent intervals. The interim chart could
start with a mean value estimated under repeatability conditions
during validation. The standard deviation sr estimated under
repeatability conditions would be too small for a run-to-run
precision, but a value of srtr = 1.6sr has been suggested as suitable
for an interim chart.32
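A minimal sketch of such an interim chart follows, assuming the 1.6 factor quoted above and the conventional Shewhart placement of warning and action limits at two and three standard deviations; the validation results are hypothetical.

# Interim control limits from validation data, using the suggested
# provisional run-to-run standard deviation of 1.6 * s_r (ref. 32).
import numpy as np

validation_results = np.array([52.1, 51.8, 52.4, 52.0, 51.9, 52.3, 52.2, 51.7])
mean = validation_results.mean()
s_r = validation_results.std(ddof=1)   # repeatability sd from validation
s_interim = 1.6 * s_r                  # provisional run-to-run sd

print(f"mean = {mean:.2f}, interim sd = {s_interim:.2f}")
print(f"warning limits = ({mean - 2 * s_interim:.2f}, {mean + 2 * s_interim:.2f})")
print(f"action limits  = ({mean - 3 * s_interim:.2f}, {mean + 3 * s_interim:.2f})")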
2.5 Reproducibility conditions
Reproducibility conditions: condition of measurement, out of a set
of conditions that includes different locations, operators,
measurement systems, and replicate measurements on the same or
similar objects.6 There are several conditions under this heading
that are of crucial interest to analytical chemists, because
reproducibility is the condition where the standard deviation
most closely approximates uncertainty (see Section 2.6). It should
be noted that reproducibility conditions in analytical chemistry
originally referred to collaborative trials (see Section 2.5.1), and
that convention should ideally remain the established practice.
However, VIM3 redefined the term vaguely, and analysts must
be prepared to encounter alternative usage.
2.5.1 Reproducibility (1st meaning) and Horwitz’s generalisa-
tions. The most easily defined and by far the most intensively
studied version of this condition is simple between-laboratory
precision where a single well-defined analytical method is used in
different laboratories, giving rise to the familiar reproducibility
standard deviation sR derived from the collaborative trial (more
properly, the interlaboratory method performance study)
described in Section 2.3.5. It is a common observation that the
standard deviations tend to increase with increasing concentra-
tion. The resulting statistics are attributed to the method rather
than the laboratories participating. The test materials have to be
specially prepared for ‘homogeneity’ by fine-grinding, however,
so the resulting statistics tend if anything to underestimate values
relevant to routine practice.
Interlaboratory studies of method precision have been carried
out since the 1930s, and Horwitz has made databases of precision
statistics garnered from several thousand in the food sector.33–42
From these he was able to see two striking generalisations.43
• The reproducibility relative standard deviation tended to be
2% at unit mass fraction (c = 1), doubling for each reduction in
mass fraction by a factor of 100. This can be expressed more
conveniently as the 'Horwitz function', sR = 0.02c^0.8495 or, in
logarithmic form, log10 sR = 0.8495 log10 c − 1.699, where sR is
the standard deviation predicted at mass fraction c. This trend
held over a long period of time, irrespective of the measurement
principle, the analyte, and the test material.
• The ratio sr/sR had an average of 0.5 over all results.
These behaviours were carefully tested by subsequent statis-
tical studies.44 The trend of the data was strikingly close to the
Horwitz function over the approximate concentration range 10^−7
to 10^−1 mass fraction, the 'Horwitz region'. Within that range the
average value of sr/sR was 0.49. This latter result has an impor-
tant use. Collaborative trials are costly to organise, but it is easy
to obtain a value of sr in a single laboratory. Failing the avail-
ability of an interlaboratory study, a tentative estimate of sR would be simply 2sr.
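These relationships are simple to apply in practice. The sketch below evaluates the Horwitz prediction at a few mass fractions and forms the tentative reproducibility estimate 2sr mentioned above; the chosen values of c and sr are hypothetical.

# Horwitz prediction of reproducibility sd, and the tentative estimate 2 * s_r.
# c is a mass fraction (e.g. 1 mg/kg = 1e-6); valid roughly for 1e-7 < c < 0.1.
def horwitz_sd(c):
    """Predicted reproducibility standard deviation, s_R = 0.02 * c**0.8495."""
    return 0.02 * c**0.8495

for c in (1e-3, 1e-4, 1e-6):
    s_R = horwitz_sd(c)
    print(f"c = {c:.0e}: s_R = {s_R:.2e}  (RSD = {100 * s_R / c:.1f}%)")

# Failing a collaborative trial, a tentative reproducibility estimate from a
# single-laboratory repeatability sd (hypothetical value, at c = 1e-4) might be:
s_r = 0.5 * horwitz_sd(1e-4)
print(f"tentative s_R ~ 2 * s_r = {2 * s_r:.2e}")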
The existence of the Horwitz function has been tentatively
attributed to evolution of the methods used towards fitness for
purpose.45 By ‘natural selection’, with no other overarching
principle, methods that were either too expensive (uncertainty
too small) or gave rise to too many incorrect decisions (uncer-
tainty too great) would be discarded in favour of more suitable
methods. The Horwitz function is simply an ‘emergent’ feature of
fitness. Because of that the Horwitz function is used in a number
of contexts as a concentration-dependent fitness criterion, for
example in proficiency tests (for example, FAPAS46) and method
validation47 in the food sector.
Outside the Horwitz region the standard deviation predicted
by the function is systematically too high. At mass fractions
below about 10^−7 the function predicts values of sR/c that would
exceed 0.3, thereby implying that all concentrations would be
below the detection limit and rendering analysis futile. However,
the observed trend, from 10^−7 down to 10^−14, is close to sR/c = 0.22, which keeps results just sufficiently above the detection limit. At mass fractions
above about 10%, precision tends again to be better than the
Horwitz prediction. This may be related to the high concentra-
tion of the analyte in relation to potential interferents and the use
of high-precision gravimetric and volumetric procedures.
We should note that the Horwitz function, with no unknown
parameters, is a descriptor of analytical methods in general.
Detrimentally it has no intercept (sR ¼ 0 at zero concentration)
so cannot take account of results near detection limits of indi-
vidual methods. Other functional relationships have been devised
to provide better fits of individual methods (see Section 2.7).
2.5.2 Reproducibility (2nd meaning). Participants in profi-
ciency tests often claim to use a standard method, but in fact
introduce minor modifications to suit their own environment or to
accommodate peculiarities of the test materials they encounter.
These modifications introduce an extra source of variation into
the uncertainty of the results reported. This can be seen for
example in Fig. 7, which shows results relating to the determina-
tion of protein in foods and feeds by the Kjeldahl method. The line
shows the trend of results from a large collaborative trial of one
particular version of the method with 22 laboratories partici-
pating and 26 test materials of varied matrix composition.48 The
points show robust standard deviations of results from 26 rounds
of a proficiency test in which laboratories used a number of
different versions of the method. The proficiency test reproduc-
ibilities exceed those from the collaborative trial at the same
concentrations by a mean factor of 1.4.
However, an opposite tendency could possibly neutralise or
even reverse this effect. Participants in a mature proficiency test
that calls for a defined procedure will have built up a consider-
able body of relevant experience, unavailable to the collaborative
Fig. 7 Comparison of reproducibility standard deviations from profi-
ciency test rounds (points) with the trend of statistics from a collaborative
trial (line). The test materials were cereals (open circles), fish (closed
circles), meat (triangles), and milk powder (asterisks).
Fig. 8 Ratio of robust standard deviations from proficiency rounds in
the food analysis sector to the trend of collaborative trial reproducibility
standard deviations at mass fractions greater than about 10^−7.
Fig. 9 Ratio of robust standard deviations from proficiency rounds in
the food analysis sector to the trend of collaborative trial reproducibility
standard deviations at mass fractions less than about 10^−7.
trial participants (who test newly formulated procedures), and
thus demonstrate a smaller dispersion of results. This improve-
ment could also be the outcome of the proficiency test partici-
pants working to a predetermined fitness-for-purpose criterion.
(Collaborative trial participants have no prescribed criterion for
precision—they execute the method as closely as possible to the
written procedure and report the outcome.)
2.5.3 Reproducibility (3rd meaning). Results derived on the
same material but obtained by a variety of analytical methods in
different laboratories (the usual conditions in a proficiency test)
are likely to show a somewhat greater dispersion still, because
of the potential for bias between different analytical methods. An
early study, encompassing a variety of analytes, test materials
and measurement principles, showed that the robust standard
deviation of results from proficiency test rounds showed a quasi-
Horwitz dependence on concentration,49 namely srob = 0.023c^0.8255. This equation describes a trend with a standard
deviation greater than that found in collaborative trials by an
average factor close to 1.5.
A study of more recent and far more numerous statistics from
a proficiency test50 shows a monotonic increasing trend over
a wider range of mass fractions. At levels exceeding 10^−6.92 the
trend of the proficiency test statistics has been found to exceed
that of the collaborative trial statistics (that is, reproducibility
standard deviations) by a factor rising to a maximum of 1.6 at about 10^−4
mass fraction (Fig. 8).
Below 10^−6.92 the Horwitz function predicts a standard deviation
that would bring results uncomfortably close to or lower than any
reasonable definition of detection limit. So to be fit for any purpose
at all, standard deviations have to be lower than that prediction.
The observed trend of collaborative trial statistics at concentrations
lower than 10^−6.92 is close to sR/c = 0.22, showing that higher
precision is available, given the need, regardless of the Horwitz
prediction. However, the cost of such determinations can be very
high. The trend of the proficiency test data is only slightly greater
than sR/c = 0.22, with an average ratio of about 1.1 (Fig. 9).
2.6 Precision and uncertainty
The ISO document ‘‘Guide to the expression of uncertainty in
measurement’’ (GUM)4 describes the estimation of uncertainty
of measurement in terms of a complete operational model of the
measurement process, broken down into fundamental inputs
each traceable to international standards. Each fundamental
input will be quantified by a standard deviation that characterises
its dispersion, whether that is estimated directly by replication
(‘‘Type A’’) or in any other way (‘‘Type B’’ uncertainty). Under
this regime, many of the inputs will be based on precisions
assessed under repeatability conditions, often during method
validation.
Many scientists, however, think that chemical measurement is
usually too complex to be represented adequately by such
a model. The implication is that the GUM approach might tend
to underestimate uncertainty in analysis, either by overlooking
latent inputs or by ignoring possible interactions among known
inputs. As a readily available alternative, those analytical scien-
tists have often regarded reproducibility precision as a practical
estimate of standard uncertainty. The Eurachem Guide5 recom-
mends reproducibility standard deviation as a basis for uncer-
tainty provided that bias and contributions associated with
traceability (usually calibration uncertainties) also are taken into
account.
2.6.1 Limitations of replication in uncertainty estimation.
Many metrologists are uncomfortable with replication as an
unqualified method of estimating uncertainty, and it is easy to see
why. Replication alone could be appropriate only under two
conditions, namely:
• the replication of the measurement is able to explore all of
the scope for variation in the analytical measurement procedure;
• the analytical method is known to be unbiased or, more
strictly, systematic effects are negligible in relation to uncon-
trolled variation.
Repeatability replication clearly does not fulfil the first
requirement. For example, if a procedure specified drying the test
material for one hour at 110 °C, we might reasonably expect
variations in timing between 50 and 70 minutes and in temper-
ature between 105 and 115 °C on different occasions or in
different laboratories. Repeatability results would not reflect this
potential variation, so the repeatability standard deviation sr would be too small an estimate of the uncertainty contribution.
On these grounds alone, repeatability standard deviation should
tend to underestimate standard uncertainty and individual
contributions to an uncertainty budget.
It therefore comes as a surprise to find many instances where
estimates of sr obtained covertly have been found considerably
to exceed claimed levels of standard uncertainty in results from
contracted-out analysis produced by accredited laboratories.51
This finding provides a compelling case for users of contracted-
out data to conduct such checking by the insertion of blind
duplicate portions of actual test samples (in contrast with care-
fully prepared control materials, which would provide a smaller
standard deviation). Results such as these in sufficient numbers
could be assessed by using a ‘duplicate map’ as discussed in
Section 2.3.4.
2.6.2 Relevance of interlaboratory replication. Measurements
under reproducibility conditions are much better able to explore
the sample space for variation, because practice and environment
in one laboratory will be to a large extent independent of those in
another. Even so there are conceivable objections to this
assertion.
• Practice in different laboratories might be unexpectedly
uniform so that the potential for variation is not fully explored.
For example, if the written procedure specified a heating time of
60 ± 10 minutes, different laboratories might tend to use a time
of 60 ± 2 minutes and thus not sample the full range allowed.
This would give rise to an estimate of reproducibility standard
deviation sR smaller than that corresponding with the written
procedure.
• Within any class of test material there will be variation in the
matrix of the test material. For example, in the class ‘vegetable’
there might be carrots, onions, potatoes, brassicas, etc. Each of
these matrix varieties will, in principle, introduce its own matrix
effect52 and thus engender a contribution to the uncertainty
budget. This contribution may not be encompassed by repro-
ducibility standard deviation. Perhaps more importantly, any
inherent bias in a single measurement procedure will systemati-
cally affect all results from the laboratories using it. (Some
metrologists claim that interlaboratory precision permits no
proper reference to traceability, but I regard this feature as
subsumed in the issue of bias.)
Thus these two considerations imply that sR tends to
underestimate standard uncertainty, although perhaps by
a relatively small margin because analytical chemists use
methods with negligible bias wherever possible. Rather
surprisingly, when we compare reported estimates of sR and
corresponding standard uncertainties we find an opposite
tendency to be the case, overall by a substantial margin.53 In
a number of contexts the tendency has been for sR to be greater
by an average factor of 1.5. This implies that there is a broad
tendency for unknown factors affecting uncertainty to be
omitted from uncertainty budgets, presumably because of the
complexity of chemical measurement. This phenomenon has
been called ‘dark uncertainty’. Replication under reproduc-
ibility conditions can apparently account for at least some of
these unknown factors, however.
2.7 Precision as a function of concentration
When the analyte is determined directly (that is, not by differ-
ence) by using a single procedure, it is a common observation
that the dispersion of the measurement results increases
smoothly with the concentration of the analyte. When the
concentration range is small, especially when confined to values
less than ten times the detection limit, this variation in precision
may be imperceptible or, for practical purposes, negligible. For
wide concentration ranges distant from the detection limit,
however, a roughly constant relative dispersion is often
apparent. These two behaviours can be reconciled into a widely
applicable single model by error-propagation theory.
2.7.1 Characteristic function. Consider an analytical system
in which the net signal at zero concentration has a standard
deviation of magnitude a. The dispersion at a higher concentration c > 0 will be greater because of an additional contribution from the uncertainty in the estimated gradient of the calibration function. That term will be proportional to the concentration c, thus contributing a standard deviation of bc. The combined dispersion at c will be described by s = √(a² + (bc)²). This equation
(sometimes called the ‘characteristic function’) displays the
observed relationship. At low concentrations s increases slowly,
being determined largely by the value of a (Fig. 10). At higher
concentrations the term bc dominates and a tendency towards
constant relative standard deviation b prevails (Fig. 11).
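A minimal numerical sketch of this behaviour (using the same illustrative parameter values as Fig. 10 and 11, a = 5 and b = 0.05, rather than data from any particular method):

```python
import numpy as np

def characteristic_sd(c, a, b):
    """Standard deviation s = sqrt(a^2 + (b*c)^2) at concentration c."""
    return np.sqrt(a**2 + (b * c)**2)

a, b = 5.0, 0.05                      # illustrative values, as in Fig. 10 and 11
for c in [0.0, 10.0, 100.0, 1000.0, 10000.0]:
    s = characteristic_sd(c, a, b)
    rsd = s / c if c > 0 else float("nan")   # relative standard deviation
    print(f"c = {c:8.0f}   s = {s:8.2f}   RSD = {rsd:.3f}")
# At low c, s stays close to a; at high c, s approaches b*c, so the RSD tends to b.
```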
Characteristic functions conforming to that model have been
confirmed in various types of analytical system under conditions
of instrumental precision, repeatability precision and reproduc-
ibility precision54 and are presumably of wide applicability. By
implication they also apply to uncertainty, although that might
be impracticable to demonstrate. Characteristic functions are
essential for estimating uncertainties at concentrations other
than those utilised in validation.
A characteristic function refers only to a single analytical
procedure under fixed conditions of replication. Its parameters
are unique and have to be estimated by validation. It takes
proper account of detection limit phenomena by virtue of the
a parameter. In all of these aspects it differs from the Horwitz
function (see Section 2.5.1), which has a quite different purpose.
The Horwitz function is a generalisation about the trend of sR in
large suites of analytical methods. It has no unique parameters to
determine, and takes no account of detection limit.
Fig. 10 Standard deviation varying with concentration according to the characteristic function s = √(a² + (bc)²), with a = 5, b = 0.05.
2.7.2 Estimating parameters of a characteristic function.
Demonstration of compliance (or otherwise) of an analytical
system with a model such as the characteristic function requires
a large amount of data in the form of precise estimates of stan-
dard deviation at closely spaced concentrations. Even collabo-
rative trials, widely regarded as the most informative type of
method validation, can scarcely provide enough information for
the task.29 Nevertheless, in instances where the characteristic
function is likely to prevail, its parameters can be readily esti-
mated as part of method validation. The parameter a is simply
the standard deviation (or uncertainty) estimated at (or close to)
zero concentration. The parameter b is simply the asymptotic
relative dispersion (ARD) at concentrations well above the
detection limit. In cases where the ARD is not approached, the
parameters can be found by estimating a first and then b as √(sc² − a²)/c, where sc is the dispersion found at a concentration c ≫ 0.
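A small sketch of that two-step estimation, with invented validation figures; the function name is illustrative rather than taken from any published software:

```python
import math

def characteristic_parameters(s0, sc, c):
    """Estimate the characteristic-function parameters from validation data.

    s0 : standard deviation (or uncertainty) estimated at or near zero concentration (= a)
    sc : dispersion observed at a concentration c well above the detection limit
    c  : that concentration
    Returns (a, b) for s = sqrt(a^2 + (b*c)^2).
    """
    a = s0
    if sc <= a:
        raise ValueError("sc must exceed the zero-concentration dispersion a")
    b = math.sqrt(sc**2 - a**2) / c
    return a, b

# Invented validation figures, for illustration only (units arbitrary):
a, b = characteristic_parameters(s0=0.8, sc=2.6, c=50.0)
print(f"a = {a:.2f}, b = {b:.4f}")
# b approximates the asymptotic relative standard deviation at high concentration.
```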
2.8 Precision and detection limit
The key measure of detection capability is the dispersion of the
measured analytical signal at or very close to zero concentration.
This is used to calculate the detection limit as a concentration
corresponding with critical points in the dispersion, almost
always under the assumption of a normal distribution. The
Fig. 11 Relative standard deviation varying with concentration according to the characteristic function s = √(a² + (bc)²), with a = 5, b = 0.05.
detection limit cL is defined in various ways so that, at the limit,
only a negligible proportion of measurement results will fall at or
below zero. (A proportion of negative observations is the natural
outcome of estimating concentrations close to zero: there is, of
course, no corresponding physical realisation. Some analytical
instruments suppress such readings, as do some data recording
practices.)
There is a considerable literature on the detection limit,55,56,57
which is still regarded as a key aspect of validation, but has
definite shortcomings. The definition of the term has become
unduly complicated and, if it were followed to the letter, would
be difficult to put into practice. Analytical chemists commonly
use a simplified version. Moreover, the detection limit cL,
together with other related ‘limits’, encourages a false dichotomy
of a concentration scale that is in reality continuous. Analysts
tend to consider a result of 1.1cL valid but a result of 0.9cL as
qualitatively different and to be reported only on an ordinal scale
(‘less than cL’ for example). A more modern approach is simply
to report the result—censored at zero if necessary—with its
appropriate uncertainty.58 It is not clear at present whether this
new paradigm will prevail.
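For concreteness, one widely used simplification (broadly in the spirit of Currie's scheme with false-positive and false-negative rates of about 0.05, not the full formal definition) computes the decision threshold and detection limit as fixed multiples of the standard deviation s0 at zero concentration, assuming normality:

```python
# A rough sketch of one common simplification; it is not the full ISO 11843
# treatment and assumes the dispersion s0 near zero concentration is normal
# and well estimated, in concentration units.
def detection_limits(s0):
    """Return (decision threshold, detection limit) in concentration units."""
    critical_value = 1.645 * s0    # threshold for declaring 'detected'
    detection_limit = 3.29 * s0    # concentration reliably distinguished from zero
    return critical_value, detection_limit

s0 = 0.12   # invented standard deviation at zero concentration, mg/kg
cc, cl = detection_limits(s0)
print(f"decision threshold ~ {cc:.2f} mg/kg, detection limit ~ {cl:.2f} mg/kg")
```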
2.8.1 Conditions of measurement for the detection limit. The
main issue for the purposes of this review is to identify the
conditions of measurement that provide the standard deviation
leading to a useful detection limit. Ideally the dispersion cited
should represent the standard uncertainty but that is seldom
given sufficient consideration. A detection limit is commonly
estimated under instrumental conditions of precision (Section
2.1), which cannot represent the true detection capability of the
entire analytical procedure. At the other extreme, estimation by
extrapolating real-life uncertainties to zero concentration
requires an elaborate experiment and more data than is
economically practicable, even in collaborative trials. There is
little information available on the comparison. One experiment,
however, found that detection limits extrapolated from repeat-
ability data tended to be between four and ten times greater than
the comparable values based on instrumental precision.59
Uncertainty in chemical measurement springs from three main
sources: (i) variation derived from the calibration/evaluation
function; (ii) variation in the preparation of the treated test
solution used for measurement; and (iii) error in the comparison
caused by matrix mismatch. An interesting conjecture is that
items (ii) and (iii) will contribute negligibly to uncertainty at zero
concentration, which will therefore be dominated by repeat-
ability dispersion. Were that true, a ‘real-life’ detection limit
could be estimated very easily from single laboratory validation.
There is virtually no experimental support for this conjecture,
but there are indications that it might repay further investigation.
In one study, the ‘characteristic functions’ of repeatability and reproducibility standard deviations estimated in a large collaborative trial48 extrapolated to almost the same intercept a (that is, the standard deviation at zero concentration). The intercept
estimates and their standard errors were (as % mass fraction):
repeatability standard deviation, 0.236 (0.044); reproducibility
standard deviation 0.261 (0.037), so the estimates were not
significantly different. The data and fitted lines are shown in
Fig. 12. Unfortunately, very few collaborative trials will provide
enough observations for this kind of study to be conclusive.
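For readers who want the arithmetic behind 'not significantly different', a naive z-type comparison of the two intercepts runs as follows; it treats the two estimates as independent, which is only approximately true because both derive from the same trial data.

```python
import math

# Intercept estimates and standard errors quoted above (% mass fraction).
a_r, se_r = 0.236, 0.044   # repeatability characteristic function
a_R, se_R = 0.261, 0.037   # reproducibility characteristic function

diff = a_R - a_r
se_diff = math.sqrt(se_r**2 + se_R**2)   # assumes roughly independent estimates
z = diff / se_diff
print(f"difference = {diff:.3f}, combined se = {se_diff:.3f}, z = {z:.2f}")
# z is well below 2, so the two intercepts are statistically indistinguishable.
```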
Fig. 12 Standard deviation of repeatability (open circles) and repro-
ducibility (solid circles) from a large collaborative trial (26 materials, 26
laboratories). The characteristic functions (lines) were fitted by
a weighted non-linear method. The bar on the y-axis shows the estimated
95% confidence interval for the intercepts.
Appendix A: useful distinctions among conditions ofmeasurement
These are regarded as a minimal set of conditions of measure-
ment for the unambiguous definition of precision in analytical
chemistry. They are proposed as a starting point for the eventual
establishment of an appropriate range of normative terms and
definitions.
A.1 Instrumental conditions: when a single portion of the
prepared test material (usually a test solution) is subjected to
replicated measurement, with no instrumental adjustment, in the
shortest possible time.
A.2 Calibration conditions: when an ‘inverse calibration’
precision is calculated from calibration data.
A.3 Immediate conditions: subset of ‘VIM3’ repeatability
conditions defined by differences between results from sequen-
tially adjacent pairs of a test material in an analytical run.
A.4 In-run conditions: subset of ‘VIM3’ repeatability condi-
tions defined by variation in results from replicate portions of
a test material alone in a run.
A.5 ‘Real-life’ conditions: subset of ‘VIM3’ repeatability
conditions where replicate test portions are in random positions
within a full-length run of routine test materials.
A.6 Run-to-run conditions: subset of ‘VIM3’ intermediate
conditions where test portions are replicated in random positions
within many runs of routine test materials.
A.7 Collaborative conditions: subset of ‘VIM3’ reproducibility conditions prevailing in collaborative trials.
A.8 Generic conditions: subset of ‘VIM3’ reproducibility conditions prevailing when a single method with procedural variations is used for routine analysis.
A.9 Broad conditions: subset of ‘VIM3’ reproducibility conditions where there is no restriction on the method of analysis.
Appendix B: statistical aspects
B.1 Rounding
For the most part, analytical results are treated as stemming
from an underlying continuous distribution. Under modern
conditions the initial measurements are truncated according to the
digit resolution of the instrument display and then subjected to
calculations that give the final result to an inordinate number of
significant figures. These results should be rounded for reporting
to a degree that avoids both a false suggestion of high precision
and a loss of information.
The commonly used rule of thumb is to retain the first digit
that is uncertain but, naively applied, that is sometimes delete-
rious. For instance, in the repeated results 4.8, 4.7, 4.5, 5.4, 5.1,
the digit to the left of the decimal point is variable but, under the
simple rule, all of the results round to 5, which would generate
a variance estimate of zero. A more useful principle is that the
rounding should reflect the dispersion of the data. We have seen
(above) that a standard deviation based on a statistically small
number of results is very unlikely to be more precise than 10%
relative, which suggests that only one significant figure is likely to
be meaningful in a standard deviation (or uncertainty). However,
retaining just one significant figure sometimes creates an ambi-
guity. Rounding a raw analytical result of 0.951 to exactly 1 implies a possible range of 0.500 to 1.499 and a potential relative uncertainty of ±50%. If the estimate were just slightly lower at 0.949, however, the rounded version would be 0.9, implying a range between 0.85 and 0.949, that is, a relative uncertainty of about ±6%. This ambiguity could be removed for all practical purposes by retaining an extra digit and rounding either number to 0.95.
An appropriate rule is therefore: (a) do no rounding except on the
final reported value; and (b) round the estimated standard
deviation, standard error or standard uncertainty to two
significant figures and round the result (or the mean result) to the
corresponding degree.
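A minimal sketch of rules (a) and (b) in code form (the function is illustrative, not from any standard library; note that Python's round() applies banker's rounding at exact ties):

```python
import math

def round_result(value, uncertainty):
    """Round the uncertainty to two significant figures and round the result
    to the corresponding decimal place, following rules (a) and (b) above."""
    if uncertainty <= 0:
        raise ValueError("uncertainty must be positive")
    exponent = math.floor(math.log10(uncertainty)) - 1   # place of 2nd significant figure
    return round(value, -exponent), round(uncertainty, -exponent)

print(round_result(4.8732, 0.2641))   # (4.87, 0.26)
print(round_result(0.951, 0.048))     # (0.951, 0.048)
```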
B.2 Observations near natural limits
Analysts respond to results falling below natural limits (such as
a concentration of zero) in a variety of ways, all of which cause
problems in a subsequent statistical analysis:
� repeat the measurement until a non-negative result is
obtained;
� record a value of zero;
� record a value of ‘less than’ an arbitrary limit, such as
a detection limit or multiple thereof.
The first two practices give rise to a positively biased mean and
a negatively biased standard deviation if naive estimation
procedures are used. This could have a noticeable effect on
precision estimates and thence detection limits. However,
a simple expedient is to estimate the precision at the expected
detection limit. The probability of obtaining a subzero result is
then negligible, as is the effect of the slight increase in
concentration.
A proportion of ‘less than’ results in a dataset renders least-
squares estimation impossible, although maximum likelihood
estimation can cope with these mixed-type datasets.60 These
difficulties can be important for chemical measurement, where
a considerable proportion of work is related to testing for
undesirable impurities. The best method for studying precision is
to record observations exactly as they occur, so that simple
statistical estimation applies. (This is a different circumstance
from reporting results to a customer, where subzero results
would usually be unacceptable. For that purpose, reporting
a zero result with a zero-truncated expanded uncertainty would
be appropriate.58)
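Returning to the maximum-likelihood treatment of mixed exact/‘less than’ datasets mentioned above, the following sketch (invented data, assuming an underlying normal distribution) lets exact results contribute the normal density and censored results the cumulative probability below their reporting limit:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Invented dataset: exact observations plus three results recorded only as '< 0.5'.
exact = np.array([0.62, 0.71, 0.55, 0.90, 0.52, 0.66])
censor_limit = 0.5
n_censored = 3

def neg_log_likelihood(params):
    mu, log_sigma = params
    sigma = np.exp(log_sigma)                       # keeps sigma positive
    ll = norm.logpdf(exact, mu, sigma).sum()        # exact results: density
    ll += n_censored * norm.logcdf(censor_limit, mu, sigma)   # censored: P(X < limit)
    return -ll

start = [exact.mean(), np.log(exact.std(ddof=1))]
fit = minimize(neg_log_likelihood, start, method="Nelder-Mead")
mu_hat, sigma_hat = fit.x[0], float(np.exp(fit.x[1]))
print(f"ML estimates: mean = {mu_hat:.3f}, standard deviation = {sigma_hat:.3f}")
```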
B.3 Outliers and robust estimation
Suspect values are not rare in chemical measurement: some of
these may be outliers. In estimating precision it is sometimes
regarded as appropriate to remove the influence of outlying
results. This is usually a matter of professional judgement
depending on the circumstances and purpose of the measure-
ment: no strict guidelines can be drawn. When a genuine outlier
occurs in replicated data the mean will be biased and the stan-
dard deviation inflated. These features can affect statistical
inference, such as tests of significance. In calibration/evaluation,
an outlier can have an unexpectedly large effect on an estimated
concentration.
In a dataset with an outlier, the analyst is faced with
a choice between describing all of the data badly and most of
the data well. Some analysts argue that raw statistics provide
a true representation of the dispersion of results that could be
expected from further use of the same procedure. I believe that
this is usually incorrect, because outliers are inherently
unpredictable. In any event, it is a question of degree—
everybody would agree to delete (or at least review) a value
that was ten times greater than the other members of the
dataset!
In cases where reducing the influence of outliers is deemed
necessary, the analyst has two options: (i) to employ outlier tests
and reject any result found unlikely to be a part of the pop-
ulation to be represented; or (ii) to employ robust statistical
methods, which typically downweight results far from the
central tendency. Outlier tests such as Dixon’s, Grubbs’s and
Cochran’s are commonly used by analysts, but have their
problems. Robust methods have become widely used since the
AMC study provided a rationale and software for robust
descriptive statistics61 and one-way analysis of variance.62
Robust regression methods in calibration are also
recommended.63
Robust methods are mostly applicable to unimodal distribu-
tions that are (outliers aside) close to symmetrical about the
mode. It is important to realise that there is no single robust
estimate of a parameter—the value obtained will depend on the
statistical procedure used, although all reputable methods should
give very similar results for roughly symmetric distributions.64
Having said that, there is no justification for calculating
a barrage of robust estimates and selecting from them on
a subjective basis.
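To illustrate the downweighting idea, the sketch below implements a simple iterative Huber-type (winsorised) estimate of mean and standard deviation; it is in the same spirit as, but not identical to, the AMC's published procedures, and the data are invented.

```python
import numpy as np

def huber_estimate(x, c=1.5, tol=1e-6, max_iter=100):
    """Iterative Huber-type robust mean and standard deviation.

    Results further than c*sigma from the current mean are winsorised
    (pulled in to mean +/- c*sigma); the factor 1.134 rescales the
    winsorised standard deviation to be consistent with a normal
    distribution when c = 1.5.
    """
    x = np.asarray(x, dtype=float)
    mu = np.median(x)
    sigma = 1.483 * np.median(np.abs(x - mu))      # MAD-based starting value
    for _ in range(max_iter):
        xw = np.clip(x, mu - c * sigma, mu + c * sigma)
        mu_new, sigma_new = xw.mean(), 1.134 * xw.std(ddof=1)
        if abs(mu_new - mu) < tol and abs(sigma_new - sigma) < tol:
            break
        mu, sigma = mu_new, sigma_new
    return mu, sigma

data = [10.2, 9.8, 10.1, 10.4, 9.9, 10.0, 15.3]    # last value is a gross outlier
print(huber_estimate(data))                        # robust mean near 10, sd not inflated
print(np.mean(data), np.std(data, ddof=1))         # classical statistics for comparison
```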
In collaborative trials, the Harmonised Protocol requires that
outliers be rejected before analysis of variance is carried out for the
estimation of repeatability and reproducibility standard devia-
tions. This is because the precisions are regarded as properties of
the analytical method rather than of the participating laborato-
ries.16 There is a fixed protocol for outlier removal, involving
Grubbs’s and Cochran’s tests. Robustification can be applied
also to analysis of variance and has been found to give statistics
almost identical with those from outlier deletion in results of
collaborative trials.65
B.4 Non-normal distributions
Normal distributions result from the addition of numerous
small independent variations—the central limit theorem.
Analytical results are the outcome of a succession of operations
that are often numerous, each of which contributes dispersion
that is sometimes small, mostly additive, and usually indepen-
dent. In short we expect but do not guarantee that analytical
results will, outliers aside, resemble a random sample from
a normal distribution. Under repeatability conditions we see
plausibly normal datasets, although minor deviations are
commonplace. The most usually encountered deviation is
a tendency to ‘heavy tails’, that is, an unduly high proportion of
results distant from the mean. Even under reproducibility
conditions, distributions approximating to the normal are
common. However, more drastic deviations from normal can
sometimes be observed. It is not rare to see datasets with
a skewed or bimodal tendency in the results of proficiency tests,
the outcome of the participants using one or more inconsistent
analytical methods. Statistical methods for characterising such
datasets include kernel density estimation66 and mixture
modelling.67
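As a brief illustration of the first of these (using simulated, deliberately bimodal ‘proficiency test’ results rather than real data), a Gaussian kernel density estimate exposes structure that a mean and standard deviation would conceal:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)

# Simulated proficiency-test results: two participant groups whose methods
# carry different biases, giving a bimodal distribution of reported values.
results = np.concatenate([rng.normal(10.0, 0.4, 40),
                          rng.normal(12.0, 0.5, 25)])

kde = gaussian_kde(results)                       # Gaussian kernel, default bandwidth
grid = np.linspace(results.min() - 1, results.max() + 1, 400)
density = kde(grid)

# Simple local-maximum search to locate the modes of the estimated density.
interior = (density[1:-1] > density[:-2]) & (density[1:-1] > density[2:])
print("density peaks near:", np.round(grid[1:-1][interior], 2))
```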
Other special circumstances can give rise to seriously non-
normal distributions in analytical results, resulting usually in
a positive skew.
�When results are censored at or near zero, a very asymmetric
distribution may be observed.
�When an analyte is restricted to a trace phase in a mixture of
phases, the number of discrete particles containing the analyte in
a test portion will vary according to a Poissonian distribution
(called the ‘nugget effect’ in geochemical analysis because it is
often observed in the analysis of ores of precious metals). When
small (less than ten) numbers of particles are involved, results
deviate more-or-less strongly from the normal distribution, as the simulation sketch after this list illustrates.
� When effects are multiplicative (rather than additive), as in
quantitative versions of the polymerase chain reaction (PCR),
distributions with a lognormal tendency are observed.68
� When the final result is a quotient of two widely dispersed
normally distributed variables, the outcome may have a notice-
able positive skew. This may sometimes be observed in profi-
ciency test data when the raw result is close to the detection limit
and is then corrected for a low recovery.
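Returning to the Poissonian ‘nugget effect’ above, a small simulation with invented parameters shows how a small expected particle count produces a markedly skewed, distinctly non-normal distribution of results:

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulated 'nugget effect': the analyte is carried by discrete particles whose
# number per test portion is Poisson-distributed, and each particle contributes
# a roughly equal amount, so the result is proportional to the particle count.
mean_particles = 3          # small expected number of analyte particles per portion
amount_per_particle = 2.5   # invented concentration contribution per particle

counts = rng.poisson(mean_particles, size=10000)
results = counts * amount_per_particle

mean = results.mean()
skew = ((results - mean)**3).mean() / results.std()**3
print(f"mean = {mean:.2f}, skewness = {skew:.2f}")        # clearly positive skew
print(f"proportion of exactly-zero results: {(results == 0).mean():.3f}")
# With a large expected particle count the same simulation looks near-normal.
```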
In such instances the analyst is sometimes tempted to log-
transform the data before statistical treatment. In most of the
situations described above (that is, apart from PCR), log-
transformation would tend to confuse rather than clarify the
issues, because the raw data will not be strictly lognormal. Log-
transformation may, however, be a useful technique in statistical
operations such as regression and analysis of variance, in
instances when we expect data showing an approximation to
constant relative standard deviation. The transformation has the
effect of stabilising the variance across the concentration range,
obviating the need for weighted statistical methods.
To avoid misunderstanding, readers should note that collec-
tions of data for the concentration of a trace constituent in
a large number of samples of a particular type often show
a distribution resembling the lognormal. This real variation
between different test materials must not be confused with the
dispersion of replicated analytical results.
Tests for normality are seldom informative except in speci-
alised studies. With small numbers of observations the power of
such a test is low, so significant outcomes are unlikely. In
particular, such tests require inordinately large numbers of
observations to distinguish between normally and lognormally
distributed variables.69 Real-life datasets with large numbers of
observations nearly always deviate from the normal to a signifi-
cant extent.
References
1 M. Thompson, S. L. R. Ellison and R. Wood, Pure Appl. Chem., 2002, 74, 835–855.
2 M. Thompson and R. Wood, Pure Appl. Chem., 1995, 67, 649–666.
3 M. Thompson, S. L. R. Ellison and R. Wood, Pure Appl. Chem., 2006, 78, 145–196.
4 ISO/IEC Guide 98:1995, Guide to the Expression of Uncertainty in Measurement (GUM), ISO, Geneva, 1995.
5 Quantifying Uncertainty in Analytical Measurement, ed. A. Williams, S. L. R. Ellison and M. Roesslein, Eurachem/CITAC Guide, 2nd edn, 2000, available from the Eurachem Secretariat and website, http://www.eurachem.com/.
6 International Vocabulary of Basic and General Terms in Metrology (VIM), 3rd edn, JCGM 200:2008, http://www.bipm.org/vim.
7 Eurachem/Eurolab/CITAC/Nordtest/AMC Guide: Measurement Uncertainty Arising from Sampling, ed. M. H. Ramsey and S. L. R. Ellison, Eurachem, 2007, ISBN 978 0 948926 26 6.
8 S. W. Holman, Discussion of the Precision of Measurements, Wiley, New York, 1892.
9 J. Kjeldahl, Fresenius’ Z. Anal. Chem., 1883, 22, 366–382.
10 W. F. Hillebrand and G. E. F. Lundell, Applied Inorganic Analysis, Wiley, New York, 1929.
11 H. W. Fairbairn, A Cooperative Investigation of Precision and Accuracy in Chemical, Spectrochemical and Modal Analysis of Silicate Rocks, U.S. Geological Survey Bulletin 980, Washington DC, 1951.
12 C. R. N. Strouts, J. H. Gilfillan and H. N. Wilson, Analytical Chemistry: the Working Tools, Clarendon Press, Oxford, 1955.
13 W. J. Youden, Statistical Methods for Chemists, Wiley, New York, 1951.
14 W. J. Youden, Statistical Techniques for Collaborative Tests, Association of Official Analytical Chemists, Washington DC, 1969.
15 ISO 5725-1:1994, Accuracy (Trueness and Precision) of Measurement Methods and Results—Part 1: General Principles and Definitions, ISO, Geneva, 1994.
16 W. Horwitz, Pure Appl. Chem., 1988, 60, 855–864.
17 M. Thompson, Analyst, 1994, 119, 127N.
18 S. L. R. Ellison and K. Mathieson, Accredit. Qual. Assur., 2008, 13, 231–238.
19 ISO 3534-1:2006, Statistics—Vocabulary and Symbols—Part 1: General Statistical Terms and Terms Used in Probability, ISO, Geneva, 2006.
20 M. Thompson, K. Mathieson, L. Owen, A. P. Damant and R. Wood, Accredit. Qual. Assur., 2009, 14, 73–78.
21 M. Thompson, Analyst, 1988, 113, 1469–1471.
22 J. N. Miller and J. C. Miller, Statistics and Chemometrics for Analytical Chemistry, Pearson Education Ltd, Harlow, UK, 6th edn, 2005.
23 N. R. Draper and H. Smith, Applied Regression Analysis, Wiley, New York, 3rd edn, 1998.
24 M. Thompson, Analyst, 2000, 125, 385–386.
25 AMC Technical Briefs, No 49, 2011.
26 M. Thompson and R. J. Howarth, J. Geochem. Explor., 1978, 9, 23–30.
27 AMC Technical Briefs, No 9, 2002.
28 M. Thompson and B. J. Coles, Accredit. Qual. Assur., 2011, 16, 13–19.
29 M. Thompson, Accredit. Qual. Assur., 2008, 13, 479–482.
30 R. J. Howarth, Analyst, 1995, 120, 1851–1873.
31 R. J. Howarth, B. J. Coles and M. H. Ramsey, Analyst, 2000, 125, 2032–2037.
32 M. Thompson and P. J. Lowthian, Notes on Statistics for Analytical Chemists, Imperial College Press, Singapore, 2011.
33 W. Horwitz and R. Albert, J. - Assoc. Off. Anal. Chem., 1984, 67, 81–90.
34 W. Horwitz and R. Albert, J. - Assoc. Off. Anal. Chem., 1984, 67, 648–652.
35 W. Horwitz and R. Albert, J. - Assoc. Off. Anal. Chem., 1985, 68, 112–121.
36 W. Horwitz and R. Albert, J. - Assoc. Off. Anal. Chem., 1985, 68, 191–198.
37 M. Margosis, W. Horwitz and R. Albert, J. - Assoc. Off. Anal. Chem., 1988, 71, 619–635.
38 J. T. Peeler, W. Horwitz and R. Albert, J. - Assoc. Off. Anal. Chem., 1989, 72, 784–806.
39 W. Horwitz, R. Albert, M. J. Deutch and J. N. Thompson, J. - Assoc. Off. Anal. Chem., 1990, 73, 661–680.
40 W. Horwitz and R. Albert, J. - Assoc. Off. Anal. Chem., 1991, 74, 718–744.
41 W. Horwitz, R. Albert, M. J. Deutch and J. N. Thompson, J. AOAC Int., 1992, 75, 227–239.
42 W. Horwitz, R. Albert and S. Nesheim, J. AOAC Int., 1993, 76, 461–491.
43 W. Horwitz and R. Albert, J. AOAC Int., 1996, 79, 589–621.
44 M. Thompson and P. J. Lowthian, J. AOAC Int., 1997, 80, 676–679.
45 M. Thompson, Analyst, 1999, 124, 991.
46 FAPAS Secretariat, Central Science Laboratory, FERA, Sand Hutton, York YO41 1LZ, UK.
47 W. Horwitz, P. Britton and S. J. Chirtel, J. AOAC Int., 1998, 81, 1257.
48 P. F. Kane, J. - Assoc. Off. Anal. Chem., 1984, 67, 869–877.
49 M. Thompson and P. J. Lowthian, Analyst, 1995, 120, 271–272.
50 Unpublished data.
51 M. H. Ramsey, personal communication.
52 M. Thompson and S. L. R. Ellison, Accredit. Qual. Assur., 2005, 10, 82–97.
53 M. Thompson and S. L. R. Ellison, Accredit. Qual. Assur., 2011, 16, 483–487.
54 M. Thompson, TrAC, Trends Anal. Chem., 2011, 30, 1168–1175.
55 ISO 11843-1:1997, Capability of Detection—Part 1: Terms and Definitions.
56 Compendium of Chemical Terminology (‘The Gold Book’), Online Corrected Edition, IUPAC, ‘Detection limit’, 2006.
57 L. A. Currie, Anal. Chim. Acta, 1999, 391, 105–126.
58 Analytical Methods Committee, Accredit. Qual. Assur., 2008, 13, 29–32.
59 M. Thompson, Analyst, 1988, 113, 1579–1587.
60 Y. Pawitan, In All Likelihood: Statistical Modelling and Inference Using Likelihood, Clarendon Press, Oxford, 2001, pp. 312–313.
61 Analytical Methods Committee, Analyst, 1989, 114, 1693–1697.
62 Analytical Methods Committee, Analyst, 1989, 114, 1699–1702.
63 AMC Technical Brief No 50, Anal. Methods, 2012, 4, 893–894.
64 S. L. R. Ellison, Accredit. Qual. Assur., 2009, 14, 411–419.
65 P. J. Lowthian, M. Thompson and R. Wood, Analyst, 1998, 123, 2803–2807.
66 M. Thompson, Analyst, 2002, 127, 1359–1364.
67 M. Thompson, Accredit. Qual. Assur., 2006, 10, 501–505.
68 M. Thompson, S. L. R. Ellison, L. Owen, K. Mathieson, J. Powell, P. Key, R. Wood and A. P. Damant, J. AOAC Int., 2006, 89, 232–239.
69 M. Thompson and R. J. Howarth, Analyst, 1980, 105, 1188–1195.