

Computational Statistics & Data Analysis 12 (1991) 201-209 North-Holland


A stochastic model for interlaboratory tests

Laurie Davies

University of Essen, D-4300 Essen, Germany

Received March 1989 Revised January 1990

Abstract: A model for interlaboratory tests is proposed which assumes that correctly taken measurements lie in a compact interval. Those distributions with support [−1, 1], zero mean and given variance which minimize the Fisher information are obtained. An outlier model is then developed and a neighbourhood of such a model is defined. This allows an answer to the question of what is being estimated when the assumed parametric model does not hold.

Keywords: Interlaboratory tests, Compact distributions, Fisher information, Outlier models, Neighbourhoods of models.

1. Introduction

Interlaboratory tests are often used as a means of determining the quantities of certain chemicals in small samples. One example is the determination of mercury in river water. A sample of water is taken and sent to different laboratories. Each laboratory returns a set of measurements which then have to be analyzed in order to obtain an estimate of the quantity of mercury in the sample as well as the variability of the estimate. The main problem in the statistical evaluation of such interlaboratory tests is the presence of outliers, where the word "outlier" can be applied to a laboratory as well as to an individual observation. There exist national and international standards on the evaluation of such data and they all have in common the use of outlier tests to determine outlying observations and laboratories. The weakness of such tests is now well known (Hampel (1985), Davies and Gather (1989)) and it is a matter of some interest to develop procedures for evaluating interlaboratory tests which do not suffer from these deficiencies.

In order to compare different evaluation procedures it is necessary to develop some form of stochastic model to describe interlaboratory tests. The national and international standards mentioned above do not explicitly describe a statistical model but, to judge by the tests used, they are based on the normal distribution. The purpose of this paper is to describe an alternative model which, it is hoped, will clarify the issues involved and enable a more transparent comparison of different evaluation procedures.

0167-9473/91/$03.50 © 1991 - Elsevier Science Publishers B.V. All rights reserved

2. A simple sample

Before describing a model for an interlaboratory test we first consider an individual laboratory. The laboratory returns n readings X_1, ..., X_n and we require a model for this which allows for the possibility of outliers. The standard practice in chemistry is to assume that "correct" measurements are i.i.d. N(μ, σ²) and that outliers are caused by some malfunctioning of the measuring instrument, for example, dirt. Standard practice is also to eliminate any measurement differing markedly from the remainder. Such behaviour is consistent with the assumption that the measuring instrument will, if functioning correctly, return measurements which lie in a compact interval. To go to extremes, it will not return negative measurements nor readings indicating more mercury than water in the sample. Even without going to extremes a case can be made for assuming that correct observations will necessarily lie in some compact interval although the question of how compact will be left open for the moment. Given that correct measurements lie in a compact interval [a − b, a + b] the outliers can be defined as observations which lie outside this interval. A weaker definition is possible (Davies and Gather (1989)) but the present one is sufficient for our needs.

The above considerations lead to the following model for n measurements. Let P_n denote the empirical measure of the n measurements, P the common distribution of the n − k "correct" measurements, P_{n−k} the empirical measure of these measurements and Q_k the empirical measure of the k outliers. We have

P_n = ((n − k)/n) P_{n−k} + (k/n) Q_k,   (2.1)

where

supp(Q_k) ⊂ ℝ \ supp(P),   (2.2a)

P ∈ {P_0((· − μ)/σ): μ ∈ ℝ, σ > 0},   (2.2b)

0 ≤ k ≤ [(n − 1)/2].   (2.2c)

The condition (2.2a) is nothing more than the definition of an outlier. The condition (2.2b) specifies a location and scale model for the "correct" measurements and the condition (2.2c) is a consequence of the fact that for such models it is not possible to identify more than [(n − 1)/2] outliers.

In order to complete the description of the model of a simple sample it remains to specify the distribution P_0 of (2.2b).

3. Minimum Fisher distributions

One of the main statistical problems when analyzing interlaboratory tests is to give a reasonable estimate of the mean of P, the distribution of the correct measurements. We make the philosophical assumption that this represents the true amount of the substance being measured in the sample. The estimation of a location parameter is most difficult for that distribution P which minimizes the Fisher information amongst all distributions whose support lies in the compact interval in which correct measurements are assumed to lie. This then leads to the problem of determining the distribution with support [−1, 1] and minimal Fisher information.

The Fisher information I(P) for location is given by

I(P) = sup_ψ (∫ ψ^(1) dP)² / ∫ ψ² dP,

where the supremum is taken over all bounded continuously differentiable functions ψ with ∫ ψ² dP > 0 and ψ^(1) denotes the first derivative of ψ. We have the additional restriction

supp(P) = [−1, 1].

This problem has a unique solution, namely the probability measure with density cos(πx/2)², whose variance however is too large in comparison with the support [−1, 1] (see below). As with the normal distribution on the real line one can consider the problem of minimizing the Fisher information amongst all distributions with support [−1, 1] and a given variance σ²,

∫ x² dP(x) − (∫ x dP(x))² = σ².

We shall however add the additional restriction that P have mean zero

∫ x dP(x) = 0,

as this will simplify the arguments and still lead to an acceptable family of distributions. Although we give no proof of this, the arguments given in Huber (1981), pages 77-80, lead to the following. The distribution minimizing I(P) under the above conditions has a differentiable density function f with f(−1) = f(1) = 0. If we write

f = u²,

then u satisfies the following differential equation,

u^(2)(x) = (λ_0 + λ_1 x + λ_2 x²)u(x).   (3.1)

As the densities f(x) and f(−x) have the same Fisher information and I is a strictly convex function (Huber (1981), page 80) it follows that we require symmetric solutions of (3.1). In particular this implies that λ_1 = 0 and u^(1)(0) = 0. We must therefore solve

u^(2)(x) = (λ_0 + λ_2 x²)u(x),


under the initial conditions

u(0) = 1, u^(1)(0) = 0.   (3.2)

The further condition u(1) = 0 can be satisfied by an appropriate choice of λ_0 and λ_2. Finally the resulting u² can be normalized to give a density on [−1, 1]. The following arguments are largely heuristic but hopefully correct.

If λ_0 > 0 then u^(1)(x) > 0 for x in some interval (0, τ) and this will lead to a bimodal density. This we reject for two reasons. The first is that the variances of such densities are too large when compared to the support [−1, 1] and the second is that a bimodal density is not a good model for interlaboratory tests. If λ_0 = 0 then the standard deviation may be calculated to be 0.424 which is also too large in comparison to the support. We therefore restrict ourselves to the case λ_0 < 0 and, writing λ_0 = −τ² and λ_2 = −ητ⁴, we write (3.1) in the form

u^(2)(x) = −τ²(1 + ητ²x²)u(x).

If we set ũ(x) = u(x/τ) then we see that

ũ^(2)(x) = −(1 + ηx²)ũ(x).   (3.3)

Let z(η) denote the first zero of this function on ℝ₊. Then z(η) → 0 as η → ∞ and z(η) → ∞ as η ↓ −1. This is at least made plausible by noting that η = −1 has the solution

ũ(x) = exp(−x²/2),

which is strictly positive.

If ũ_η denotes the solution of (3.3) for −1 < η < ∞ subject to the initial conditions (3.2) it follows that

u_η(x) = ũ_η(z(η)x)

satisfies (3.2), u_η(1) = 0, and also the differential equation

u_η^(2)(x) = −z(η)²(1 + ηz(η)²x²)u_η(x).

Let σ²(η) denote the variance of the density f_η given by

f_η(x) = u_η(x)² / ∫_{−1}^{1} u_η(y)² dy,   −1 ≤ x ≤ 1.

As η → ∞ we obtain the solution with λ_0 = 0 discussed above and σ(η) → 0.424. As η ↓ −1, σ(η) → 0. For η = 0 the solution of (3.3) is ũ(x) = cos x giving z(0) = π/2. Thus the density function

f(x) = cos²(πx/2),   (3.4)

minimizes the Fisher information over [−1, 1] without regard to the variance. It is also the limiting distribution of a Brownian motion B(t) confined to the interval [−1, 1] as t → ∞. Presumably this happens precisely because the density (3.4) minimizes the Fisher information over [−1, 1]. We have no similar example for the other densities which arise.
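The quantities z(η) and σ²(η) in this construction are straightforward to approximate numerically. The following sketch is our own illustrative code (a hand-rolled Runge-Kutta step, not any library solver, and invented function names); it integrates (3.3) and reproduces z(0) = π/2 together with the variance (1/3) − (2/π²) of the density (3.4).

```python
import math

def solve_eq33(eta, h=1e-3, xmax=30.0):
    """Integrate (3.3), u'' = -(1 + eta x^2) u with u(0) = 1, u'(0) = 0,
    by classical Runge-Kutta until the first sign change of u, returning
    the samples together with the interpolated first zero z(eta)."""
    def acc(x, u):
        return -(1.0 + eta * x * x) * u
    xs, us = [0.0], [1.0]
    x, u, v = 0.0, 1.0, 0.0
    while u > 0.0 and x < xmax:
        k1u, k1v = v, acc(x, u)
        k2u, k2v = v + 0.5 * h * k1v, acc(x + 0.5 * h, u + 0.5 * h * k1u)
        k3u, k3v = v + 0.5 * h * k2v, acc(x + 0.5 * h, u + 0.5 * h * k2u)
        k4u, k4v = v + h * k3v, acc(x + h, u + h * k3u)
        u += h * (k1u + 2 * k2u + 2 * k3u + k4u) / 6.0
        v += h * (k1v + 2 * k2v + 2 * k3v + k4v) / 6.0
        x += h
        xs.append(x)
        us.append(u)
    z = xs[-2] + h * us[-2] / (us[-2] - us[-1])   # linear interpolation at the zero
    return xs, us, z

def variance(eta):
    """Variance of f_eta(x), proportional to u_eta(x)^2 on [-1, 1], where
    u_eta(x) = u~(z(eta) x); computed by the trapezoidal rule on [0, z]."""
    xs, us, z = solve_eq33(eta)
    num = den = 0.0
    for t0, w0, t1, w1 in zip(xs, us, xs[1:], us[1:]):
        if t1 > z:
            break
        den += 0.5 * (w0 * w0 + w1 * w1) * (t1 - t0)
        num += 0.5 * ((t0 / z) ** 2 * w0 * w0 + (t1 / z) ** 2 * w1 * w1) * (t1 - t0)
    return num / den
```

The monotone behaviour of z(η) described above (decreasing in η, diverging as η ↓ −1) can also be observed directly from `solve_eq33`.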

The variance of the density (3.4) is (1/3) − (2/π²), implying a standard deviation of 0.36. This is again too large when compared to the support [−1, 1] and we therefore concentrate on the range −1 < η < 0 and set η = −γ². If ũ satisfies (3.3) with this η we write

ũ(x) = exp(−γx²/2)u(x).

Then u satisfies the differential equation

u^(2)(x) − 2γxu^(1)(x) + (1 − γ)u(x) = 0,

subject to u(0) = 1 and u^(1)(0) = 0. On writing u as a power series,

u(x) = 1 + Σ_{j≥1} a_j x^{2j},

we obtain a_1 = −(1 − γ)/2 and

a_{j+1} = (γ(4j + 1) − 1)a_j / (2(j + 1)(2j + 1)),   j ≥ 1.

Thus for γ > 1/5 all the a_j, j ≥ 1, are negative and u is a monotone decreasing function. It is therefore relatively easy to determine the zeros of u and hence the function z(η) for −1 < η < 0.
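The recursion for the a_j can be sketched directly; since the Gaussian factor exp(−γx²/2) is strictly positive, the zeros of ũ coincide with those of u. For γ = 0 the recursion reproduces the series of cos x, so the first zero recovers z(0) = π/2. Illustrative code, function names our own:

```python
from math import pi

def series_coeffs(gamma, nterms=60):
    """Coefficients a_j, j >= 1, of u(x) = 1 + sum_j a_j x^(2j) for the
    equation u'' - 2 gamma x u' + (1 - gamma) u = 0, via the recursion."""
    a = [-(1.0 - gamma) / 2.0]                       # a_1
    for j in range(1, nterms):
        a.append((gamma * (4 * j + 1) - 1.0) * a[-1]
                 / (2.0 * (j + 1) * (2 * j + 1)))
    return a

def u(x, gamma, nterms=60):
    return 1.0 + sum(aj * x ** (2 * (j + 1))
                     for j, aj in enumerate(series_coeffs(gamma, nterms)))

def first_zero(gamma, step=0.01, iters=80):
    """March until u changes sign, then bisect for the first zero."""
    x = step
    while u(x, gamma) > 0.0:
        x += step
    lo, hi = x - step, x
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if u(mid, gamma) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For γ > 1/5 the coefficients returned by `series_coeffs` are all negative, as claimed in the text.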

Using these methods it is possible to choose a standard deviation in relation to the interval [−1, 1], say 1/5, and obtain the corresponding minimum Fisher distribution. This at least provides a theoretical alternative to the normal distribution although the actual values of this distribution will, for such a small variance, not differ greatly from those of the normal distribution with the same variance.

For fixed η we obtain through affine transformations a whole class of probability measures. Each such probability measure is uniquely determined by its mean μ and variance σ². Furthermore its support is of the form [μ − c(η)σ, μ + c(η)σ] and it is unimodal and symmetric about μ. We will call such a distribution a compact normal distribution and denote it by CN(μ, σ²). It is to be emphasized that each valid value of η gives rise to a distinct family of compact normal distributions but we shall assume that η and hence c(η) is fixed once and for all.

4. What is being estimated?

Given an empirical distribution P_n of the form (2.1) the estimation problem is that of estimating the mean and the variance of the distribution of the observations corresponding to the distribution P_{n−k}. Assuming the parametric model (2.2) for the distribution of the correct measurements this reduces to the problem of estimating the parameters μ and σ². The model goes beyond a classical parametric model in that it allows for possible outliers. Nevertheless, it is still strongly wedded to the parametric family (2.2). It is part of the philosophy of robust statistics to consider full neighbourhoods of parametric models and in this case one could replace the assumption

P ∈ {P_0((· − μ)/σ): μ ∈ ℝ, σ > 0}

by the assumption that P lies in a neighbourhood of some P_{μ,σ} = P_0((· − μ)/σ). In keeping with the assumption that the distribution of correct measurements has compact support we will continue to demand that supp(P) = [a − b, a + b] for some a and b. It would be possible to let the support of P be larger than that of the CN(μ, σ²) in whose neighbourhood P lies. However we will not consider this slight extra generality and we shall assume that supp(P) ⊂ supp(CN(μ, σ²)). Given α > 0 we denote by d some metric on the set of probability measures and define

U_{μ,σ²}(α) = {P: supp(P) ⊂ supp(CN(μ, σ²)), d(CN(μ, σ²), P) < α}.   (4.1)

We now replace the model (2.2) by

P_n = ((n − k)/n) P_{n−k} + (k/n) Q_k,

where (a) supp(Q_k) ⊂ ℝ \ supp(P), (b) P ∈ U_{μ,σ²}(α) for some μ, σ², (c) 0 ≤ k ≤ [(n − 1)/2].

We turn to the question of what is to be estimated. A measure P in U_{μ,σ²}(α) can be interpreted in two different ways. The first interpretation is that the true distribution is CN(μ, σ²) and that P is caused by some form of contamination such as rounding errors. The second interpretation is that P itself is the true distribution. Some mixed form of interpretation is also possible. We shall assume that P represents the true distribution of the measurements. Apart from rounding errors it will in general not be possible to decide which interpretation is correct. However, interpreting P as the true distribution allows an answer to the question of what is being estimated off the parametric model. It is generally assumed that correctly performed measurements will be unbiased. This cannot be verified but if it is not accepted it is not clear what the alternative could be. The quantities to be estimated are therefore ∫ x dP(x) and, as a measure of variability, ∫ x² dP(x) − (∫ x dP(x))².

This gives, at least in the case under consideration, an answer to the question of what is being estimated by a robust statistic off the assumed parametric model. It also allows a comparison of different robust estimators away from the model. It would seem plausible, for example, that robust estimators with a linear influence function near zero will perform better, in terms of bias, than those estimators whose influence functions are not of this form.

5. A model for interlaboratory tests

We now turn to the problem of extending the above model of a simple sample with outliers to the case of interlaboratory tests. Although there exist many articles on the statistical evaluation of such tests there does not seem to exist any precise description of the stochastic model on which the evaluation procedures are based. Judging by the tests used to detect outliers it would seem that the following model is assumed. For simplicity we suppose that n laboratories take part and that each laboratory returns k measurements. We denote the measurements taken by the ith laboratory by X_ij, j = 1, ..., k, i = 1, ..., n.

The distributional assumptions are the following. For each i the random variables X_ij, 1 ≤ j ≤ k, are normally distributed with mean M_i and variance σ²,

X_ij ~ N(M_i, σ²).

Given M_i the random variables X_ij, 1 ≤ j ≤ k, are independently distributed. The variance σ² is the same for all laboratories but the means M_i, 1 ≤ i ≤ n, are themselves random variables which are independently and identically distributed with a common normal distribution N(μ, Σ²). It would, of course, be possible to let the variance σ² also be a random variable but this would not essentially alter the model.
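Under this classical normal model the parameters are usually estimated by the balanced one-way random-effects ANOVA decomposition. The following is a minimal sketch of that standard method-of-moments estimator; it is background material, not a procedure taken from this paper, and the function name is our own.

```python
def anova_estimates(data):
    """Balanced one-way random-effects (ANOVA) estimates of mu, sigma^2
    and Sigma^2, where data[i][j] is reading j of laboratory i."""
    n, k = len(data), len(data[0])
    lab_means = [sum(row) / k for row in data]
    grand = sum(lab_means) / n
    # within-laboratory mean square estimates sigma^2
    ssw = sum((x - m) ** 2 for row, m in zip(data, lab_means) for x in row)
    msw = ssw / (n * (k - 1))
    # between-laboratory mean square has expectation sigma^2 + k Sigma^2
    msb = k * sum((m - grand) ** 2 for m in lab_means) / (n - 1)
    return grand, msw, max((msb - msw) / k, 0.0)
```

The truncation at zero reflects the fact that the moment estimator of Σ² can be negative; it is precisely this estimator's lack of robustness to outlying laboratories that motivates the model below.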

The first step in extending the simple sample model described in the previous section is to replace the normal distributions by the compact normal distributions. The basic model is then of the form

X_ij ~ CN(M_i, σ²),   M_i ~ CN(μ, Σ²), independently.

This model involves a random choice of a probability measure of the form CN(M_i, σ²) to describe the distribution of the readings of the ith laboratory. In order to obtain a full neighbourhood of this model it would therefore seem necessary to consider a probability measure on the space of probability measures on the real line. We introduce the following notation. For any metric space S we will denote the Borel σ-algebra on S by ℬ(S) and the set of probability measures defined on ℬ(S) by 𝒲(S).

If d is the metric on S the Prohorov metric d_P on 𝒲(S) is defined by

d_P(W_1, W_2) = inf{α ≥ 0: W_1(B) ≤ W_2(B^α) + α, W_2(B) ≤ W_1(B^α) + α for all B ∈ ℬ(S)},

where B^α = {x: d(x, y) < α for some y ∈ B}. If S is a separable metric space then 𝒲(S) with the Prohorov metric is also

separable metric space.

The probability measure on ℬ(𝒲(ℝ)) used to describe the model for interlaboratory tests is the following. The set

CN = {CN(μ, σ²): μ ∈ ℝ, σ² ≥ 0}

is a closed subset of 𝒲(ℝ) and we define T: CN → ℝ × ℝ₊ by T(CN(μ, σ²)) = (μ, σ²).


The mapping T is bijective, measurable and with a measurable inverse. We define a probability measure ℙ_{μ,Σ²,σ²} on ℬ(𝒲(ℝ)) by

ℙ_{μ,Σ²,σ²}(B) = (CN(μ, Σ²) ⊗ δ_{σ²})(T(B ∩ CN)),   B ∈ ℬ(𝒲(ℝ)),

where δ_{σ²} denotes the Dirac measure at the point σ². Let U_{μ,σ²}(α) be as defined by (4.1) with d = d_P. We write

U(CN, α) = ∪_{μ,σ²} U_{μ,σ²}(α).

Then U(CN, α) is a Borel subset of 𝒲(ℝ) and we define

𝒲(μ, Σ², σ², α) = {ℙ ∈ 𝒲(𝒲(ℝ)): D_P(ℙ_{μ,Σ²,σ²}, ℙ) < α, supp(ℙ) ⊂ U(CN, α)},

where D_P denotes the Prohorov metric on 𝒲(𝒲(ℝ)). 𝒲(μ, Σ², σ², α) can be interpreted as a neighbourhood of the measure ℙ_{μ,Σ²,σ²}. It is chosen to guarantee that any random measure with distribution ℙ in 𝒲(μ, Σ², σ², α) will have compact support.
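For measures with finite support the Prohorov distance can be evaluated directly from its definition, which is useful for checking small examples. The sketch below is purely illustrative (brute force over all subsets of the supports, so only tiny supports are feasible) and the function names are our own.

```python
from itertools import combinations

def prohorov(P, Q, iters=60):
    """Prohorov distance between two finite discrete measures on the line.
    P and Q are dicts {atom: mass} with masses summing to 1."""
    def mass_near(W, B, alpha):
        # W(B^alpha) with B^alpha = {x: |x - y| < alpha for some y in B}
        return sum(w for x, w in W.items()
                   if any(abs(x - y) < alpha for y in B))
    def feasible(alpha):
        # check both inequalities of the definition for every subset B
        for W1, W2 in ((P, Q), (Q, P)):
            pts = list(W1)
            for r in range(1, len(pts) + 1):
                for B in combinations(pts, r):
                    if sum(W1[y] for y in B) > mass_near(W2, B, alpha) + alpha:
                        return False
        return True
    lo, hi = 0.0, 1.0          # the Prohorov distance never exceeds 1
    for _ in range(iters):     # bisect for the infimum
        mid = 0.5 * (lo + hi)
        if feasible(mid):
            hi = mid
        else:
            lo = mid
    return hi
```

For two point masses δ_0 and δ_t with t < 1 the distance is t, as the bisection confirms.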

The description of a model with this neighbourhood is as follows. The distribution of the readings of the ith laboratory is P_i and the (P_i) are independently and identically distributed random variables with a common distribution ℙ in the neighbourhood of some ℙ_{μ,Σ²,σ²}.

The question of what is to be estimated at distributions off the parametric model may be answered as follows. The statistical functional which reduces to μ in the parametric model is

∫∫ x dP(x) dℙ(P),

that corresponding to σ² is

∫(∫ x² dP(x) − (∫ x dP(x))²) dℙ(P),

and that corresponding to Σ² is

∫(∫ x dP(x))² dℙ(P) − (∫(∫ x dP(x)) dℙ(P))².
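For a mixing measure ℙ concentrated on finitely many discrete measures these three functionals reduce to weighted sums, which the following illustrative sketch (names our own) makes explicit:

```python
def functionals(Ps, weights):
    """The three statistical functionals above for a mixing measure putting
    the given weights on finitely many discrete measures, each a dict
    {atom: mass}. Returns the (mu, sigma^2, Sigma^2) functionals."""
    def mean(P):
        return sum(x * w for x, w in P.items())
    def var(P):
        m = mean(P)
        return sum((x - m) ** 2 * w for x, w in P.items())
    ms = [mean(P) for P in Ps]
    mu_f = sum(w * m for w, m in zip(weights, ms))          # mean of the means
    sigma2_f = sum(w * var(P) for w, P in zip(weights, Ps)) # mean within-variance
    Sigma2_f = sum(w * m * m for w, m in zip(weights, ms)) - mu_f ** 2
    return mu_f, sigma2_f, Sigma2_f
```

The decomposition mirrors the parametric model: within-laboratory variability averages to the σ² functional, and the dispersion of the laboratory means gives the Σ² functional.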

The above description of a full neighbourhood of the parametric model does not allow for outliers. However it should be clear how outliers, including individual measurements as well as individual laboratories, can be taken into account. It is perhaps worth noting that all one requires of outliers is that they lie outside the support of some measure. It is not necessary to assume that they are independent either of themselves or of the correct observations (Davies and Gather (1989)).


Finally we mention the problem of affine equivariance. It is clear that the problem of analyzing interlaboratory tests is affine invariant in that the same evaluation procedure should work for affinely transformed data, i.e. the estimates should be affine equivariant and the detected outliers affine invariant. In this situation it would seem reasonable to consider affine invariant distances d on the set of probability measures 𝒲(ℝ) in the following sense:

d(P_1^A, P_2^A) = d(P_1, P_2)

for each affine transformation A: ℝ → ℝ. Here P^A denotes the probability measure

P^A(B) = P(A⁻¹(B)),   B ∈ ℬ(ℝ).

The Prohorov metric and the Lévy metric are not affine invariant whilst the Kolmogorov metric is. Further affine invariant metrics are the total variation metric and the fuzzy finite interval metric

d_F(P_1, P_2) = inf{α > 0: P_1(J) ≤ P_2(J_α) + α, P_2(J) ≤ P_1(J_α) + α for all compact intervals J},

where J_α is defined as [a − be^α, a + be^α] for J = [a − b, a + b].
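The affine invariance of the Kolmogorov metric is easy to check numerically for empirical distributions. The sketch below is illustrative (`kolmogorov` is our own name, and only order-preserving affine maps are exercised):

```python
def kolmogorov(xs, ys):
    """Kolmogorov distance sup_t |F(t) - G(t)| between the empirical
    distributions of two samples; for step functions the supremum is
    attained at the sample points, so it suffices to check those."""
    def ecdf(sample, t):
        return sum(1 for s in sample if s <= t) / len(sample)
    return max(abs(ecdf(xs, t) - ecdf(ys, t)) for t in list(xs) + list(ys))
```

An affine map with positive slope preserves the order of the pooled sample, so the two empirical distribution functions are merely relabelled and the distance is unchanged exactly.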

The problem with affine invariant metrics is that 𝒲(ℝ) is no longer separable. To see this it is sufficient to note that if d is any affine invariant metric then d(P_1, P_2) = d_0 > 0 for any two distinct point measures P_1 and P_2, since any pair of distinct point measures can be mapped onto any other such pair by an affine transformation. It is no longer clear how to define appropriate probability measures and neighbourhoods on 𝒲(ℝ).

Acknowledgements

I would like to acknowledge the help of a sympathetic referee who found an embarrassing number of mistakes in the first version.

References

[1] Davies, P.L. and Gather, U. (1989). The identification of multiple outliers. Preprint.

[2] Hampel, F.R. (1985). The breakdown points of the mean combined with some rejection rules. Technometrics, 27, 95-107.

[3] Huber, P.J. (1981). Robust Statistics. Wiley, New York.