ELSEVIER Computational Statistics & Data Analysis 19 (1995) 655-668

Estimation of tolerance limits from reference data

Paul Jordan

Department of Vitamin and Nutrition Research, F. Hoffman-La Roche Ltd., Basel, Switzerland

Abstract

Results from a series of studies which have been published only in the form of mean values and standard deviations are often combined for statistical analysis. In this paper, the results of seven studies of the relationship between daily intake of vitamin C and the resulting steady-state serum concentration are used to estimate a tolerance interval about a regression line in order to give recommendations for daily intake.

In the regression model, a random study effect arises which, together with the residual error, contributes to the overall variability. We use a consistent iteratively reweighted least squares method to estimate the parameters of the model, which are then used in the estimation of a one-sided tolerance interval. Comparison of our method with a simple least squares model indicates an improvement in the estimation in the region of greater serum concentrations.

1. Introduction

Several studies in the literature (Lowry, 1946; Dodds, 1947; Davey, 1952; Morse, 1956; Kallner, 1977; Kallner, 1979; Jacob, 1988) have investigated the steady-state serum concentration of vitamin C in humans at different daily intakes. Because such a study requires continuous dietary control of the participants over a long period, only a few good studies are available, each based on a different sample with a different dietary intake.

We were asked to give several national health organizations a recommendation for vitamin supplementation based on this kind of data, i.e. to recommend a minimum daily vitamin C dose such that at least $\psi$% of the population have a certain vitamin level with probability $(1 - \alpha)$. This leads to the problem of finding a one-sided tolerance limit for the $\psi$-th percentile of the population as a function of an arbitrary intake (for $\psi$, the values 80, 90 or 95 are usually taken for nutritional recommendations). A good survey of tolerance limits is given by Patel (1986).

0167-9473/95/$09.50 © 1995 Elsevier Science B.V. All rights reserved
SSDI 0167-9473(94)00028-H

The mean serum level, the standard deviation and the sample size are usually reported for each of these studies.

As the data were collected from various sources in different contexts, study effects due to different measurement methods and other unknown factors that may influence the results cannot be excluded. In this paper, we develop a natural estimator of the tolerance limit that treats the study effects in a random-effects model, for the case where such a study effect cannot be neglected.

In Section 2, we choose a linear model which contains a study effect and describe it by a random variable. The occurrence of a study effect prevents us from treating this model with standard methods known from the theory of linear models (see e.g. Johnson and Leone (1977) or Draper and Smith (1981)), i.e. the corresponding maximum likelihood equations for the regression coefficients and the variances are no longer analytically solvable.

In Section 3, we first develop an iterative procedure to solve the ML equations. Lemma 1 proves that the obtained estimators are consistent at any step. Then we develop a test for the existence of the study effect. Furthermore, we propose a natural choice of an estimator for the variance of estimated serum levels for any given intake value.

Section 4 deals with the estimation of tolerance limits as a function of an arbitrarily chosen intake value, i.e. we obtain the one-sided tolerance limit as a curve in the intake/serum-level diagram. On the one hand, it is a generalisation of the method described by Owen (1968), which estimates a single tolerance limit from one given sample mean and standard deviation; on the other hand, it is partly related to the techniques for one-way ANOVA models described by Mee and Owen (1983).

An estimation of a tolerance limit finally enables us to give a solution for the "inverse" problem i.e. to determine the minimal intake such that a desired tolerance limit can be achieved.

The numerical example of Section 5 is based on real data. We compare our method with the "naive" approach omitting the study effect and find a considerable difference for both high and low intake values.

2. Choice of a model

It is well known from the pharmacokinetic behaviour of water-soluble vitamins that, over a wide range, the relation between intake and serum concentration is approximately linear. This linearity fails only at high concentrations where saturation effects may occur (see e.g. Toggenburger et al. (1979)). As our studies consider steady-state concentrations with rather low values, we model the dependence by a linear relationship. Let $y_{ij}$ denote the steady-state serum level of the $j$-th participant in the $i$-th study and $x_i$ the daily intake in the $i$-th study. Then we express the linear relation by

$$y_{ij} = \alpha + \beta x_i + \delta_i + \varepsilon_{ij} \qquad (i = 1, \ldots, k;\ j = 1, \ldots, n_i). \tag{2.1}$$

The intercept $\alpha$ stands for a minimal background serum level which always exists, $\delta_i$ represents a random study-effect term, and the $\varepsilon_{ij}$'s are the residual error terms with the assumption

$$\varepsilon_{ij} \sim N(0, \sigma_i^2).$$

Later, we will use the reported standard deviations $s_i$ as estimators of the $\sigma_i$'s. Further, we assume

$$\delta_i \sim N(0, \tau^2), \qquad \operatorname{cov}(\delta_i, \varepsilon_{ij}) = 0 \quad \forall i, j.$$

Denoting the reported means of serum level by $y_i$ enables us to rewrite (2.1) as

$$y_i = \alpha + \beta x_i + \eta_i, \qquad \eta_i \sim N(0, \sigma_{\eta_i}^2) = N(0, \sigma_i^2/n_i + \tau^2). \tag{2.2}$$

In the next section, we will use (2.2) in order to estimate the study-effect variance $\tau^2$ and to develop a test for the hypothesis $\tau^2 > 0$.
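To make (2.2) concrete, the following Python sketch evaluates the variance $\sigma_i^2/n_i + \tau^2$ of each reported study mean. It is only an illustration: the function name `mean_variance` is introduced here, the sample sizes and standard deviations are those of Table 1, and the value of $\tau^2$ is the estimate reported in Section 5.2.

```python
# Sketch of (2.2): variance of a reported study mean, sigma_i^2 / n_i + tau^2.
# Names are illustrative; data are taken from Table 1 and Section 5.2.

def mean_variance(s, n, tau2):
    """Return s^2 / n + tau2, the variance of a study mean under (2.2)."""
    return s * s / n + tau2

# (study, number of participants n_i, reported standard deviation s_i)
studies = [
    ("Lowry (1946)", 24, 0.029),
    ("Davey (1952)", 4, 0.022),
    ("Morse (1956)", 20, 0.034),
    ("Dodds (1947)", 41, 0.120),
    ("Kallner (1977)", 4, 0.066),
    ("Kallner (1979)", 3, 0.042),
    ("Jacob (1988)", 11, 0.034),
]
TAU2 = 0.003398  # study-effect variance reported in Section 5.2

variances = {name: mean_variance(s, n, TAU2) for name, n, s in studies}
# Fraction of each mean's variance that is due to the study effect.
share = {name: TAU2 / v for name, v in variances.items()}

for name in variances:
    print(f"{name:15s} var(mean) = {variances[name]:.6f}  "
          f"study-effect share = {share[name]:.2f}")
```

With these inputs the study effect accounts for most of the variance of every reported mean, which is why it matters for the weighting.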

3. Estimation of the study effect

3.1. Maximum likelihood estimation for the study-effect term

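For further analysis, we start with (2.2), consider the $s_i$ as known and set $\sigma_i = s_i$. We proceed using the maximum likelihood method in order to estimate $\alpha$, $\beta$ and $\tau$. The logarithm of the likelihood function is given by

$$\ln L = \text{const} - \frac{1}{2}\sum_{i=1}^{k} \ln\left(\sigma_i^2/n_i + \tau^2\right) - \sum_{i=1}^{k} \frac{\left(y_i - (\alpha + \beta x_i)\right)^2}{2\left(\sigma_i^2/n_i + \tau^2\right)} \tag{3.1.1}$$

which leads to the following ML equations:

$$\frac{\partial \ln L}{\partial \alpha} = 0 \ \Rightarrow\ \sum_{i=1}^{k} \frac{y_i - (\hat\alpha + \hat\beta x_i)}{\sigma_i^2/n_i + \hat\tau^2} = 0, \tag{3.1.2a}$$

$$\frac{\partial \ln L}{\partial \beta} = 0 \ \Rightarrow\ \sum_{i=1}^{k} \frac{x_i\left(y_i - (\hat\alpha + \hat\beta x_i)\right)}{\sigma_i^2/n_i + \hat\tau^2} = 0, \tag{3.1.2b}$$

$$\frac{\partial \ln L}{\partial (\tau^2)} = 0 \ \Rightarrow\ -\sum_{i=1}^{k} \frac{1}{\sigma_i^2/n_i + \hat\tau^2} + \sum_{i=1}^{k} \frac{\left(y_i - (\hat\alpha + \hat\beta x_i)\right)^2}{\left(\sigma_i^2/n_i + \hat\tau^2\right)^2} = 0. \tag{3.1.2c}$$

Eq. (3.1.2c) yields:

$$\hat\tau^2 = \frac{\sum_{i=1}^{k} \left(y_i - \hat\alpha - \hat\beta x_i\right)^2 \big/ \left(1 + \sigma_i^2/(n_i\hat\tau^2)\right)^2}{\sum_{i=1}^{k} 1 \big/ \left(1 + \sigma_i^2/(n_i\hat\tau^2)\right)} \tag{3.1.3}$$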
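The likelihood (3.1.1) and the condition (3.1.2c) can be checked numerically. The Python sketch below is an illustration under stated assumptions: the function names `log_lik` and `score_tau2` are introduced here, the data are the summary values of Table 1, and $\alpha$ and $\beta$ are held at the OLS values reported in Section 5.2.

```python
import math

# Table 1 summary data
x = [8, 25, 33, 107, 90, 60, 5]                        # intake (mg/day)
y = [0.18, 0.29, 0.47, 1.05, 0.83, 0.77, 0.13]         # mean serum level (mg/dl)
s = [0.029, 0.022, 0.034, 0.120, 0.066, 0.042, 0.034]  # standard deviation
n = [24, 4, 20, 41, 4, 3, 11]                          # number of participants

def log_lik(alpha, beta, tau2):
    """Log-likelihood (3.1.1) without the additive constant."""
    total = 0.0
    for xi, yi, si, ni in zip(x, y, s, n):
        v = si * si / ni + tau2                 # variance of the i-th study mean
        total -= 0.5 * math.log(v) + (yi - alpha - beta * xi) ** 2 / (2.0 * v)
    return total

def score_tau2(alpha, beta, tau2):
    """Derivative of (3.1.1) with respect to tau^2; it vanishes where (3.1.2c) holds."""
    total = 0.0
    for xi, yi, si, ni in zip(x, y, s, n):
        v = si * si / ni + tau2
        r2 = (yi - alpha - beta * xi) ** 2
        total += 0.5 * (r2 / (v * v) - 1.0 / v)
    return total

ALPHA0, BETA0 = 0.1217, 0.008744   # OLS estimates reported in Section 5.2

for tau2 in (0.0030, 0.0033, 0.0035, 0.0040):
    print(f"tau2 = {tau2:.4f}  score = {score_tau2(ALPHA0, BETA0, tau2):+.3f}")
```

The score changes sign between the trial values, so for fixed $\alpha$ and $\beta$ the likelihood has its maximum in $\tau^2$ inside that interval.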

Equations (3.1.2) and (3.1.3) form an implicit nonlinear system for the parameters to be estimated, which cannot be solved analytically. We therefore propose the following double iteration procedure, which consists of six steps and extends a method described by Pocock et al. (1981), and we prove its consistency.

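Iterative procedure to solve the ML equations

(a) Obtain initial values $\hat\alpha_0$ and $\hat\beta_0$ for $\alpha$ and $\beta$ by OLS.

(b) The initial value $\hat\tau_0^2$ is given by

$$\hat\tau_0^2 = \frac{1}{k}\sum_{i=1}^{k}\left[\left(y_i - \hat\alpha_0 - \hat\beta_0 x_i\right)^2 - \sigma_i^2/n_i\right].$$

(c) By solving (3.1.3) iteratively, one obtains a new estimate for $\tau^2$. The $q$-th step is given by

$$\hat\tau_q^2 = \frac{\sum_{i=1}^{k} \left(y_i - (\hat\alpha_0 + \hat\beta_0 x_i)\right)^2 \big/ \left(1 + \sigma_i^2/(n_i\hat\tau_{q-1}^2)\right)^2}{\sum_{i=1}^{k} 1 \big/ \left(1 + \sigma_i^2/(n_i\hat\tau_{q-1}^2)\right)}.$$

(d) By weighted least squares, one obtains new estimators $\hat\alpha_1$ and $\hat\beta_1$. The weight matrix is given by

$$V = \operatorname{diag}\left(\sigma_1^2/n_1 + \hat\tau^2, \ldots, \sigma_k^2/n_k + \hat\tau^2\right).$$

(e) Repeat steps (c) and (d) until convergence is achieved.

(f) Calculate the weighted least squares estimator for $\tau^2$:

$$\tilde\tau^2 = \frac{1}{k-2}\sum_{i=1}^{k} w_i\left(y_i - \hat\alpha - \hat\beta x_i\right)^2,$$

where the weights $w_i$ are the diagonal elements of the final estimate of $V^{-1}$.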

If this iteration converges, then by the fixed-point theorem for contraction mappings it converges to a solution of the ML equations.

The whole procedure can be considered as an iteratively reweighted least squares solution where the population variances enter into the weight matrix V.
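Steps (a) to (f) can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the SAS program used for the paper: all function and variable names are introduced here, step (c) is run with the current coefficient estimates at every pass, and in step (f) the weights are scaled as $w_i = 1/(1 + \sigma_i^2/(n_i\hat\tau^2))$, i.e. taken from $\hat\tau^2 V^{-1}$, so that $\tilde\tau^2$ stays on the scale of $\tau^2$. Beyond the OLS start and the first fixed point, the sketch is not claimed to reproduce the values of Section 5.2 digit for digit.

```python
# Table 1 summary data
x = [8, 25, 33, 107, 90, 60, 5]                        # intake (mg/day)
y = [0.18, 0.29, 0.47, 1.05, 0.83, 0.77, 0.13]         # mean serum level (mg/dl)
s = [0.029, 0.022, 0.034, 0.120, 0.066, 0.042, 0.034]  # standard deviation
n = [24, 4, 20, 41, 4, 3, 11]                          # number of participants
k = len(x)
a = [si * si / ni for si, ni in zip(s, n)]             # sigma_i^2 / n_i

def wls(weights):
    """Weighted least squares fit of y on (1, x); returns (alpha, beta)."""
    sw = sum(weights)
    xbar = sum(w * xi for w, xi in zip(weights, x)) / sw
    ybar = sum(w * yi for w, yi in zip(weights, y)) / sw
    sxx = sum(w * (xi - xbar) ** 2 for w, xi in zip(weights, x))
    sxy = sum(w * (xi - xbar) * (yi - ybar) for w, xi, yi in zip(weights, x, y))
    beta = sxy / sxx
    return ybar - beta * xbar, beta

def resid2(alpha, beta):
    """Squared residuals of the study means."""
    return [(yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y)]

def solve_tau2(r2, tau2, tol=1e-12, max_iter=1000):
    """Step (c): fixed-point iteration of (3.1.3); assumes tau2 stays positive."""
    for _ in range(max_iter):
        shrink = [1.0 / (1.0 + ai / tau2) for ai in a]
        new = sum(r * c * c for r, c in zip(r2, shrink)) / sum(shrink)
        if abs(new - tau2) <= tol:
            return new
        tau2 = new
    return tau2

alpha0, beta0 = wls([1.0] * k)                                   # (a) OLS start
tau2_start = sum(r - ai for r, ai in zip(resid2(alpha0, beta0), a)) / k   # (b)
tau2_first = solve_tau2(resid2(alpha0, beta0), tau2_start)       # (c)

alpha, beta, tau2 = alpha0, beta0, tau2_first
for _ in range(100):                                             # (d) and (e)
    alpha, beta = wls([1.0 / (ai + tau2) for ai in a])
    new_tau2 = solve_tau2(resid2(alpha, beta), tau2)
    converged = abs(new_tau2 - tau2) <= 1e-9
    tau2 = new_tau2
    if converged:
        break

w = [1.0 / (1.0 + ai / tau2) for ai in a]                        # (f), scaled weights
tau2_wls = sum(wi * r for wi, r in zip(w, resid2(alpha, beta))) / (k - 2)

print(f"OLS start: alpha0 = {alpha0:.4f}, beta0 = {beta0:.6f}")
print(f"tau2: start = {tau2_start:.6f}, first fixed point = {tau2_first:.6f}")
print(f"final: alpha = {alpha:.4f}, beta = {beta:.6f}, tau2 = {tau2:.6f}, "
      f"WLS tau2 = {tau2_wls:.6f}")
```

The closed-form two-parameter WLS fit keeps the sketch free of matrix libraries; for more regressors a linear-algebra routine would be the natural replacement.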

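Lemma 1. Steps (a)-(e) yield consistent estimators of $\alpha$, $\beta$ and $\tau^2$ at any step, if $V_j^{-1}$ exists and is convergent as supposed ($V_j$ contains $\hat\tau_j$).

Proof. The consistency is shown by induction. $\hat\alpha_0$ and $\hat\beta_0$ are consistent, and because of

$$\operatorname{plim}\,\left(y_i - \hat\alpha_0 - \hat\beta_0 x_i\right)^2 = \tau^2 + \sigma_i^2/n_i = \sigma_{\eta_i}^2$$

$\hat\tau_0^2$ is consistent as well.

(i) From the consistency of $\hat\tau_j^2$ follows that of $\hat\alpha_{j+1}$ and $\hat\beta_{j+1}$. When matrix notation is used (see below), then

$$\hat{\boldsymbol\beta}_{j+1} = (X'V_j^{-1}X)^{-1}X'V_j^{-1}y = (X'V_j^{-1}X)^{-1}X'V_j^{-1}(X\boldsymbol\beta + \eta)$$

where

$$V_j = \operatorname{diag}\left(\sigma_1^2/n_1 + \hat\tau_j^2, \ldots, \sigma_k^2/n_k + \hat\tau_j^2\right).$$

Therefore

$$\operatorname{plim}\,\hat{\boldsymbol\beta}_{j+1} = \boldsymbol\beta + \operatorname{plim}\,(N^{-1}X'V_j^{-1}X)^{-1}(N^{-1}X'V_j^{-1}\eta).$$

$X$ is a non-random variable and therefore not correlated with $\eta$, which implies

$$\operatorname{plim}\,(N^{-1}X'V_j^{-1}\eta) = 0$$

if $V_j^{-1}$ exists and is convergent as supposed ($V_j$ contains $\hat\tau_j$). So one finds finally

$$X'V_j^{-1}X = O(N), \qquad \operatorname{plim}\,(N^{-1}X'V_j^{-1}X)^{-1} = O(1).$$

(ii) From the consistency of $\hat\alpha_j$, $\hat\beta_j$ and $\hat\tau_{j-1}^2$ follows that of $\hat\tau_j^2$:

$$\operatorname{plim}\sum_{i=1}^{k}\frac{\left(y_i - \hat\alpha_j - \hat\beta_j x_i\right)^2}{\left(1 + \sigma_i^2/(n_i\hat\tau_{j-1,q-1}^2)\right)^2} = \tau^2\,\operatorname{plim}\sum_{i=1}^{k}\frac{1}{1 + \sigma_i^2/(n_i\hat\tau_{j-1,q-1}^2)} \quad \forall q$$

$$\Rightarrow\quad \operatorname{plim}\,\frac{\sum_{i=1}^{k}\left(y_i - \hat\alpha_j - \hat\beta_j x_i\right)^2 \big/ \left(1 + \sigma_i^2/(n_i\hat\tau_{j-1,q-1}^2)\right)^2}{\sum_{i=1}^{k} 1 \big/ \left(1 + \sigma_i^2/(n_i\hat\tau_{j-1,q-1}^2)\right)} = \tau^2 \quad \forall q$$

and therefore $\hat\tau_j^2$ is consistent.

Matrix notation used in the proof of Lemma 1

$\mathbf{x}_0 = (1, x_0)'$, $\boldsymbol\beta = (\alpha, \beta)'$ and $X$: design matrix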

3.2. Test for occurrence of the study effect

In certain situations it may be desirable to perform a significance test for the existence of study effects, i.e. to test the hypothesis $\tau^2 = 0$. In the case when the test is not significant, an estimate of $\sigma^2$ can be obtained by simply regressing the population means against the $x$ values. The variance of this OLS estimator is well known (e.g. Draper and Smith (1981)) and can be used later for calculation of the tolerance limits.

In order to develop a method for testing the study effect we start with formulation (2.1) of our model and first apply standard techniques of analysis.

The hypothesis that no study effect exists can be formulated as

$$H_0:\ \tau^2 = 0 \quad\text{or equivalently}\quad \delta_1 = \delta_2 = \cdots = \delta_k = 0.$$

Writing

$$y_{ij} - \hat y_i = (y_{ij} - \bar y_{i.}) - (\hat y_i - \bar y_{i.})$$

we get

$$\sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij} - \hat y_i)^2 = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij} - \bar y_{i.})^2 + \sum_{i=1}^{k} n_i(\hat y_i - \bar y_{i.})^2. \tag{3.2.1}$$

The left-hand side of the equation above is the residual sum of squares, the first term on the right-hand side is the within-groups term, and the second term is the between-groups or study-effect term.

Since

$$\hat y_i = \bar y_{..} + \hat\beta(x_i - \bar x_{.})$$

the second term in (3.2.1) becomes

$$\sum_{i=1}^{k} n_i\left[\bar y_{i.} - \bar y_{..} - \hat\beta(x_i - \bar x_{.})\right]^2.$$

Therefore, under $H_0$, the ratio

$$F = \frac{\sum_{i=1}^{k} n_i\left[\bar y_{i.} - \bar y_{..} - \hat\beta(x_i - \bar x_{.})\right]^2 / q}{\sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij} - \bar y_{i.})^2 / r} \tag{3.2.2}$$

is distributed as $F_{q,r}$ with degrees of freedom

$$q = k - 2 \quad\text{and}\quad r = \sum_{i=1}^{k} n_i - k.$$

The estimate of the slope is given by the iterative procedure described in Section 3.1. As the individual values $y_{ij}$ are unknown, we substitute the error term

$$\sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij} - \bar y_{i.})^2$$

by

$$\sum_{i=1}^{k} s_i^2(n_i - 1)$$

where the $s_i$'s are the standard deviations from the studies. This shows that in our model, mean, standard deviation and sample size are sufficient for testing the study effect.
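The ratio (3.2.2) needs only the reported means, standard deviations and sample sizes. The Python sketch below shows one way to compute it. It rests on assumptions that the text does not spell out: the function name `study_effect_f` is introduced here, and the grand means $\bar y_{..}$ and $\bar x_{.}$ are taken as averages over all participants, i.e. weighted by $n_i$. For that reason it need not reproduce the value reported in Section 5.2 exactly.

```python
# Table 1 summary data
x = [8, 25, 33, 107, 90, 60, 5]                        # intake (mg/day)
y = [0.18, 0.29, 0.47, 1.05, 0.83, 0.77, 0.13]         # mean serum level (mg/dl)
s = [0.029, 0.022, 0.034, 0.120, 0.066, 0.042, 0.034]  # standard deviation
n = [24, 4, 20, 41, 4, 3, 11]                          # number of participants

def study_effect_f(x, y, s, n, beta):
    """F ratio (3.2.2) from summary statistics; returns (F, q, r)."""
    k = len(x)
    total_n = sum(n)
    # Grand means over all participants (assumption: weighted by n_i).
    x_bar = sum(ni * xi for ni, xi in zip(n, x)) / total_n
    y_bar = sum(ni * yi for ni, yi in zip(n, y)) / total_n
    # Between-groups (study-effect) sum of squares.
    between = sum(ni * (yi - y_bar - beta * (xi - x_bar)) ** 2
                  for ni, xi, yi in zip(n, x, y))
    # Within-groups sum of squares, replaced by sum of s_i^2 (n_i - 1).
    within = sum(si * si * (ni - 1) for si, ni in zip(s, n))
    q = k - 2
    r = total_n - k
    return (between / q) / (within / r), q, r

# Slope taken from Section 5.2 (result of the iterative procedure).
f_value, q, r = study_effect_f(x, y, s, n, beta=0.008794)
print(f"F = {f_value:.3f} with ({q}, {r}) degrees of freedom")
```

The p-value then follows from the upper tail of the $F_{q,r}$ distribution, which is available in any statistics package.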

3.3. Estimation of the variance of a predicted value

For our final goal, the construction of a tolerance limit, we need an estimate of the variance of the predicted value at any given $x_0$:

$$\hat y(x_0) = \hat\alpha + \hat\beta x_0.$$

Let $\mathbf{x}_0 = (1, x_0)'$ and $\boldsymbol\beta = (\alpha, \beta)'$ as above. Then

$$\sigma_{\hat y}^2 = \operatorname{Var}(\hat y(x_0)) = \mathbf{x}_0'\,\operatorname{Var}(\hat{\boldsymbol\beta})\,\mathbf{x}_0 \tag{3.3.1}$$

where the variance-covariance matrix of the coefficients may be obtained by the weighted least squares estimator:

$$\operatorname{Var}(\hat{\boldsymbol\beta}) \approx (X'V^{-1}X)^{-1}\hat\sigma^2 = (X'V^{-1}X)^{-1}\,\frac{(y - X\hat{\boldsymbol\beta})'V^{-1}(y - X\hat{\boldsymbol\beta})}{k-2} \tag{3.3.2}$$

where $X$ denotes the design matrix, and for the elements of $V$ we use the final values of the iterations. Expression (3.3.2) is only approximately true, as the weight matrix $V$ is itself estimated. We will see later that the differences between the initial and final estimates of $\boldsymbol\beta$ are very small, which justifies our approximation (see also Johnston (1972)).
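Expressions (3.3.1) and (3.3.2) can be written out explicitly for the two-parameter model. The Python sketch below is a minimal illustration: the names `wls_fit` and `var_pred` are introduced here, the data are those of Table 1, and the diagonal of $V$ is built from the value of $\tau^2$ reported in Section 5.2.

```python
# Table 1 summary data
x = [8, 25, 33, 107, 90, 60, 5]                        # intake (mg/day)
y = [0.18, 0.29, 0.47, 1.05, 0.83, 0.77, 0.13]         # mean serum level (mg/dl)
s = [0.029, 0.022, 0.034, 0.120, 0.066, 0.042, 0.034]  # standard deviation
n = [24, 4, 20, 41, 4, 3, 11]                          # number of participants

def wls_fit(x, y, v):
    """WLS of y on (1, x) with diagonal V = diag(v).

    Returns (alpha, beta, cov), where cov is the 2x2 matrix of (3.3.2)."""
    w = [1.0 / vi for vi in v]
    s0 = sum(w)
    s1 = sum(wi * xi for wi, xi in zip(w, x))
    s2 = sum(wi * xi * xi for wi, xi in zip(w, x))
    t0 = sum(wi * yi for wi, yi in zip(w, y))
    t1 = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = s0 * s2 - s1 * s1
    inv = [[s2 / det, -s1 / det], [-s1 / det, s0 / det]]   # (X' V^-1 X)^-1
    alpha = inv[0][0] * t0 + inv[0][1] * t1
    beta = inv[1][0] * t0 + inv[1][1] * t1
    # Weighted residual variance with k - 2 degrees of freedom.
    sigma2 = sum(wi * (yi - alpha - beta * xi) ** 2
                 for wi, xi, yi in zip(w, x, y)) / (len(x) - 2)
    cov = [[sigma2 * e for e in row] for row in inv]
    return alpha, beta, cov

def var_pred(x0, cov):
    """(3.3.1): x0' Var(beta_hat) x0 with x0 = (1, x0)'."""
    return cov[0][0] + 2.0 * x0 * cov[0][1] + x0 * x0 * cov[1][1]

TAU2 = 0.003398                                         # reported in Section 5.2
v = [si * si / ni + TAU2 for si, ni in zip(s, n)]
alpha, beta, cov = wls_fit(x, y, v)
for x0 in (10.0, 50.0, 100.0):
    print(f"x0 = {x0:5.1f}  Var(y_hat) = {var_pred(x0, cov):.6f}")
```

Setting all entries of `v` to 1 gives the OLS special case, for which the variance takes the familiar form $\hat\sigma^2\,(1/k + (x_0 - \bar x)^2/S_{xx})$.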

4. Estimation of the tolerance limits

For any given intake value $x_0$, we assume that the individual values $y$ of the corresponding hypothetical population are normally distributed according to

$$y_{ij}(x_0) \sim N\left(\alpha + \beta x_0,\ \tau^2 + \sigma^2(x_0)\right). \tag{4.1}$$

How to obtain an estimate of $\sigma^2(x_0)$ will be discussed later. As we have only estimated values for $y(x_0)$ and $\tau^2$, the problem of finding a tolerance limit reduces to finding a value $\kappa$ such that, with probability $1 - \alpha$, at least a proportion $\psi$ of the population is above

$$\hat y(x_0) - \kappa\left(\hat\tau^2 + \sigma^2(x_0)\right)^{1/2}. \tag{4.2}$$

Mathematically, the problem is to find $\kappa$ such that

$$P\left\{P\left\{y \ge \hat y(x_0) - \kappa\left(\hat\tau^2 + \sigma^2(x_0)\right)^{1/2}\right\} \ge \psi\right\} = 1 - \alpha. \tag{4.3}$$

The inner probability in (4.3) is with respect to $y$ and the outer is with respect to the estimates of $y(x_0) = \alpha + \beta x_0$ and $\tau^2$. Rewriting the inner probability yields

$$P\left\{y \ge \hat y(x_0) - \kappa\left(\hat\tau^2 + \sigma^2(x_0)\right)^{1/2}\right\} \ge \psi \iff P\left\{\frac{y - (\alpha + \beta x_0)}{\left(\tau^2 + \sigma^2(x_0)\right)^{1/2}} \ge \frac{\hat y(x_0) - \kappa\left(\hat\tau^2 + \sigma^2(x_0)\right)^{1/2} - (\alpha + \beta x_0)}{\left(\tau^2 + \sigma^2(x_0)\right)^{1/2}}\right\} \ge \psi. \tag{4.4}$$

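As

$$\frac{y - (\alpha + \beta x_0)}{\left(\tau^2 + \sigma^2(x_0)\right)^{1/2}} \sim N(0, 1),$$

the right inequality of (4.4) can be transformed to

$$\frac{\hat y(x_0) - \kappa\left(\hat\tau^2 + \sigma^2(x_0)\right)^{1/2} - (\alpha + \beta x_0)}{\left(\tau^2 + \sigma^2(x_0)\right)^{1/2}} \le K_\psi, \tag{4.5}$$

where $K_\psi$ is defined by

$$\frac{1}{\sqrt{2\pi}}\int_{K_\psi}^{\infty} e^{-t^2/2}\,dt = \psi.$$

Substituting (4.5) into (4.3), the latter becomes after rewriting

$$P\left\{\frac{\left(\hat y(x_0) - (\alpha + \beta x_0)\right)\big/\left(\tau^2 + \sigma^2(x_0)\right)^{1/2} - K_\psi}{\left(\hat\tau^2 + \sigma^2(x_0)\right)^{1/2}\big/\left(\tau^2 + \sigma^2(x_0)\right)^{1/2}} \le \kappa\right\} = 1 - \alpha. \tag{4.6}$$

Rescaling the inequality in (4.6) by the standard deviation of the predicted value, $\sigma(\hat y(x_0)) := \sigma_{\hat y}$, i.e. multiplying both sides by $\left(\tau^2 + \sigma^2(x_0)\right)^{1/2}/\sigma_{\hat y}$, gives

$$P\left\{\frac{\dfrac{\hat y(x_0) - (\alpha + \beta x_0)}{\sigma_{\hat y}} - K_\psi\,\dfrac{\left(\tau^2 + \sigma^2(x_0)\right)^{1/2}}{\sigma_{\hat y}}}{\left(\hat\tau^2 + \sigma^2(x_0)\right)^{1/2}\big/\left(\tau^2 + \sigma^2(x_0)\right)^{1/2}} \le \kappa\,\frac{\left(\tau^2 + \sigma^2(x_0)\right)^{1/2}}{\sigma_{\hat y}}\right\} = 1 - \alpha. \tag{4.7}$$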

With the ansatz $\left(\tau^2 + \sigma^2(x_0)\right)^{1/2} = \tau\sqrt{1 + \lambda(x_0)}$, the weight matrix $V$ becomes $V = \tau^2 W$ with $W = \operatorname{diag}\left(1 + \sigma_1^2/(n_1\tau^2), \ldots, 1 + \sigma_k^2/(n_k\tau^2)\right)$ and, with $\mathbf{x}_0 = (1, x_0)'$ as in Section 3.3,

$$\sigma_{\hat y}^2(x_0) = \tau^2\,\mathbf{x}_0'(X'W^{-1}X)^{-1}\mathbf{x}_0. \tag{4.8}$$

Defining

$$g(x_0) := \frac{\left(\tau^2 + \sigma^2(x_0)\right)^{1/2}}{\sigma_{\hat y}} = \left(\frac{1 + \lambda(x_0)}{\mathbf{x}_0'(X'W^{-1}X)^{-1}\mathbf{x}_0}\right)^{1/2} \tag{4.9}$$

and considering $\lambda(x_0)$ as non-random, (4.7) can be written as

$$P\left(\frac{\left(\hat y(x_0) - (\alpha + \beta x_0)\right)/\sigma_{\hat y} - K_\psi\,g(x_0)}{\left(\hat\tau^2/\tau^2\right)^{1/2}} \le \kappa\,g(x_0)\right) = 1 - \alpha. \tag{4.10}$$

Neglecting the fact that the weight matrix contains estimates, the WLS estimator of $\tau^2$ is approximately $\chi^2$-distributed, and the probability distribution on the left-hand side of (4.10) can be approximated by a noncentral t-distribution with non-centrality parameter $d = -K_\psi\,g(x_0)$, $t = \kappa\,g(x_0)$ and $k - 2$ degrees of freedom.

The next problem, which we must solve by an approximation, is to find a value for the unknown variance $\sigma^2(x_0)$ of the serum concentration at an arbitrary intake value $x_0$, in order to obtain a value for $g(x_0)$. This will be achieved by regressing the known $s_i$ on the corresponding intakes $x_i$.

Finally, we obtain the following expression for the tolerance limits

$$T(y) = \hat\alpha + \hat\beta x_0 - \frac{t_{1-\alpha}}{g(x_0)}\left(\hat\tau^2 + \sigma^2(x_0)\right)^{1/2} \tag{4.11}$$

where $t_{1-\alpha}$ is the $(1-\alpha)$-percentile of the non-central t-distribution with $k - 2$ degrees of freedom and non-centrality parameter $d$.
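The limit (4.11) requires a quantile of the noncentral t-distribution. The Python sketch below computes it with the standard library only, by integrating numerically over the scale variable and inverting the distribution function by bisection. Everything in it is an assumption made for illustration: the function names, the integration settings, and above all the inputs of the usage example, which are hypothetical numbers and not results of this paper. A statistics library with a tested noncentral t implementation would be preferable in practice.

```python
import math

def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_ppf(p):
    """Standard normal quantile by bisection."""
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if norm_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def nct_cdf(t, df, nc, steps=2000):
    """P(T <= t) for T = (Z + nc) / R with Z ~ N(0, 1) and R = sqrt(chi2_df / df).

    Uses P(T <= t) = E[Phi(t R - nc)], integrated over the density of R
    with Simpson's rule (steps must be even)."""
    log_c = math.log(2.0) + 0.5 * df * math.log(0.5 * df) - math.lgamma(0.5 * df)
    upper = 1.0 + 12.0 / math.sqrt(df)     # density of R is negligible beyond this
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        r = i * h
        if i == 0:
            dens = math.exp(log_c) if df == 1 else 0.0
        else:
            dens = math.exp(log_c + (df - 1) * math.log(r) - 0.5 * df * r * r)
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * norm_cdf(t * r - nc) * dens
    return total * h / 3.0

def nct_ppf(p, df, nc):
    """Quantile of the noncentral t-distribution by bracketing and bisection."""
    lo, hi = -1.0, 1.0
    while nct_cdf(lo, df, nc) > p:
        lo *= 2.0
    while nct_cdf(hi, df, nc) < p:
        hi *= 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if nct_cdf(mid, df, nc) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def lower_tolerance_limit(y_hat, scale, g, psi, conf, df):
    """(4.11): y_hat - t_{1-alpha} / g * scale, where scale = (tau^2 + sigma^2(x0))^(1/2)."""
    k_psi = -norm_ppf(psi)          # K_psi: P(Z >= K_psi) = psi
    d = -k_psi * g                  # non-centrality parameter
    return y_hat - nct_ppf(conf, df, d) / g * scale

# Hypothetical inputs, for illustration only.
limit = lower_tolerance_limit(y_hat=0.55, scale=0.09, g=2.5, psi=0.95, conf=0.95, df=5)
print(f"lower tolerance limit = {limit:.3f}")
```

For $d = 0$ the quantile reduces to that of the ordinary t-distribution, which gives a simple check of the numerical integration.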

5. Numerical example

5.1. Description o f the study data

For vitamin C, we try to establish an intake recommendation based on the following data which are summarized in Table 1 below according to the methods developed in the previous sections.

All calculations were performed with the SAS® system using its specific programming language.

Fig. 1 gives a graphical representation of the vitamin intake and the resulting mean serum level.

5.2. Results o f the estimation procedure

We applied the iterative procedure described in Section 3.1 to our data until the convergence criterion

$$\left|\hat\tau_{r,i}^2 - \hat\tau_{r,i+1}^2\right| \le 10^{-9} \quad \forall r$$

was met.

Table 1
Number of participants, intake and steady-state serum concentration of the vitamin C studies treated in this paper

Study            Number of      Intake     Mean steady-state     Standard
                 participants   (mg/day)   serum level (mg/dl)   deviation
Lowry (1946)     24             8          0.18                  0.029
Davey (1952)     4              25         0.29                  0.022
Morse (1956)     20             33         0.47                  0.034
Dodds (1947)     41             107        1.05                  0.120
Kallner (1977)   4              90         0.83                  0.066
Kallner (1979)   3              60         0.77                  0.042
Jacob (1988)     11             5          0.13                  0.034

[Figure: mean serum level (mg/dl) plotted against daily intake of vitamin C in mg, one plotting symbol per study.]

Fig. 1. Mean serum level of vitamin C (mg/dl) versus daily intake in mg for the studies listed in Table 1.

Step (a) gives the following initial estimates for $\alpha$ and $\beta$:

$$\hat\alpha_0 = 0.1217, \qquad \hat\beta_0 = 0.008744,$$

from which the following estimates of $\tau_0^2$ are obtained:

$$\hat\tau_{0,0}^2 = 0.003811, \qquad \hat\tau_{0,\infty}^2 = 0.003398.$$

The next step (d) yields:

$$\hat\alpha_1 = 0.1104, \qquad \hat\beta_1 = 0.008794,$$

and the iterated estimate for $\tau^2$ becomes:

$$\hat\tau_{1,0}^2 = 0.003398, \qquad \hat\tau_{1,\infty}^2 = 0.003398.$$

The final WLS estimator becomes:

$$\tilde\tau^2 = 0.005126.$$

So we see that our iteration algorithm has already converged after only one weighted-least-squares step.

The F-test for the study effect described in Section 3.2 gives a value of 5.274 (p = 0.0003), i.e. the study effect is highly significant.

5.3. Estimation of tolerance limits

5.3.1. By iteratively reweighted least squares

By means of stepwise regression we found that a quadratic relationship between intake ($x$) and standard deviation of serum level fits the data optimally (Fig. 2). The high SS(regression)/SS(error) ratio, which results in an $R^2$ value of 0.955, justifies the approximation that $\sigma(x_0)$ is treated as a non-random function of $x$. By means of (4.1) and (4.9) we then calculate $g(x_0)$ numerically. The solid line in Fig. 3 shows this function graphically.
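The quadratic relationship between intake and standard deviation can be explored with a few lines of Python. The sketch below fits $s \approx c_0 + c_1 x + c_2 x^2$ to the Table 1 values by ordinary least squares, which is a plain substitute for the stepwise regression mentioned above and not the original SAS analysis. The function names `fit_quadratic` and `sd_at` are introduced here for illustration.

```python
# Table 1: intake (mg/day) and reported standard deviation of the serum level
x = [8, 25, 33, 107, 90, 60, 5]
s = [0.029, 0.022, 0.034, 0.120, 0.066, 0.042, 0.034]

def fit_quadratic(x, s):
    """Least-squares coefficients (c0, c1, c2) of s ~ c0 + c1*x + c2*x^2."""
    rows = [[1.0, xi, xi * xi] for xi in x]
    # Normal equations A c = b, stored as an augmented 3x4 matrix.
    m = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
         + [sum(r[i] * si for r, si in zip(rows, s))] for i in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    # Back substitution.
    coef = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        coef[i] = (m[i][3] - sum(m[i][j] * coef[j] for j in range(i + 1, 3))) / m[i][i]
    return coef

def sd_at(x0, coef):
    """Fitted standard deviation s(x0) of a hypothetical population at intake x0."""
    return coef[0] + coef[1] * x0 + coef[2] * x0 * x0

coef = fit_quadratic(x, s)
mean_s = sum(s) / len(s)
sse = sum((si - sd_at(xi, coef)) ** 2 for xi, si in zip(x, s))
sst = sum((si - mean_s) ** 2 for si in s)
r2 = 1.0 - sse / sst
print(f"coefficients = {coef}")
print(f"R^2 = {r2:.3f}")
```

The fitted curve `sd_at` can then supply $\sigma(x_0)$ wherever the tolerance limit is evaluated.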

5.3.2. Comparison with an OLS approach

When we neglect all information about the known study variances $\sigma_i$ and treat the study means like single measurements, we obtain the OLS estimates. In this case, $g(x_0)$ simply reduces to

$$g(x_0) = \left(\mathbf{x}_0'(X'X)^{-1}\mathbf{x}_0\right)^{-1/2}$$

where $X$ is the design matrix of the OLS estimation (see e.g. Draper and Smith (1981)) and $\mathbf{x}_0 = (1, x_0)'$. This function is shown by the dashed line in Fig. 3.
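For a straight-line model the quadratic form in the OLS expression has the closed form $\mathbf{x}_0'(X'X)^{-1}\mathbf{x}_0 = 1/k + (x_0 - \bar x)^2/S_{xx}$, so $g(x_0)$ is easy to evaluate. The short Python sketch below does this for the intakes of Table 1. The function name `g_ols` is introduced here for illustration.

```python
import math

# Table 1: daily intake (mg/day) of the seven studies
x = [8, 25, 33, 107, 90, 60, 5]

def g_ols(x0, x):
    """g(x0) = (x0' (X'X)^-1 x0)^(-1/2) for the design (1, x)."""
    k = len(x)
    x_bar = sum(x) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return 1.0 / math.sqrt(1.0 / k + (x0 - x_bar) ** 2 / sxx)

for x0 in (0, 20, 40, 60, 80, 100, 110):
    print(f"x0 = {x0:3d}  g = {g_ols(x0, x):.3f}")
```

The function peaks at the mean intake, where it equals $\sqrt{k}$, and falls off towards both ends of the intake range.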

Fig. 4 compares both regression lines and tolerance limits obtained by iteratively reweighted least squares (solid lines) and OLS (dashed lines), respectively.

Fig. 5 shows the final estimates of the tolerance limits for various percentages $\psi$ at the 0.05 and the 0.01 significance level.

Fig. 2. Standard deviation of the study populations versus daily intake of vitamin C in mg. The solid line represents the regression line for the evaluation of the standard deviation s(x) of a hypothetical population.

Fig. 3. The function $g(x_0)$ for the iteratively reweighted least squares solution (solid line) and for the OLS approach (dashed line).

Fig. 4. Regression line and 95% tolerance limit for the iteratively reweighted least squares solution (solid line) and for the OLS approach (dashed line) at the $\alpha$ = 5% level.

Fig. 5. Regression line and tolerance limits for the iteratively reweighted least squares solution for $\psi$ = 95%, $\psi$ = 80% and $\psi$ = 70% ($\alpha$ = 5% and $\alpha$ = 1%).

6. Discussion

When we look at our results we notice that we are in the "worst" case, where the variance of the study effect is of the same magnitude as the reported population variances and the largest discrepancy between the two approaches appears. In the limit where

$$\tau^2 \gg \sigma_i^2 \quad \forall i$$

the OLS approach would be adequate. In the other limit where

$$\tau^2 \ll \sigma_i^2 \quad \forall i$$

a simple weighted-least-squares solution with weights proportional to the inverse of the squared standard error of the means might be considered.

When we compare our more sophisticated estimates with the "naive" ones based on OLS, we find that, although the regression lines are almost the same, the tolerance limits differ along the x axis because of the different shape of $g(x_0)$. The tolerance limits obtained by OLS are always smaller than those from the more complicated method.

In order to give a recommendation for vitamin supplementation, one has to find the intersection of the tolerance-limit line (for the desired $\psi$ and $\alpha$) with the horizontal line at the desired serum value. We see that for high values of serum concentration, the OLS method would considerably underestimate the needed intake values.
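Finding this intersection is a one-dimensional root search. The Python sketch below uses bisection and assumes that the tolerance limit increases with intake over the search interval. The function name `minimal_intake` and the linear stand-in for the tolerance-limit curve are hypothetical and serve only to show the mechanics. In an application the stand-in would be replaced by an evaluation of (4.11).

```python
def minimal_intake(limit, target, lo, hi, tol=1e-6):
    """Smallest intake in [lo, hi] whose tolerance limit reaches the target.

    Assumes limit(x) is increasing in x on [lo, hi]."""
    if limit(hi) < target:
        raise ValueError("target serum level is not reached within the interval")
    if limit(lo) >= target:
        return lo
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if limit(mid) >= target:
            hi = mid
        else:
            lo = mid
    return hi

def example_limit(intake):
    """Hypothetical, purely illustrative tolerance-limit curve (mg/dl)."""
    return 0.008 * intake - 0.1

dose = minimal_intake(example_limit, target=0.5, lo=0.0, hi=200.0)
print(f"minimal intake = {dose:.2f} mg/day")
```

Bisection is chosen here because it needs no derivative of the limit curve and cannot leave the search interval.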

For small and medium-range intake values, OLS and iteratively reweighted least squares yield almost the same results, and one can assume that the effect of the approximation of treating $g(x_0)$ as non-random is negligible. Nevertheless, it would be an interesting topic for further analysis to find that distribution empirically.

References

Davey, B. L. et al., J. Nutr. 47 (1952).
Draper, N. R. and Smith, H., Applied regression analysis (Wiley, New York, 1981).
Dodds, M. L. and McLeod, F. L., Science (July 1947).
Jacob, R. A. et al., Am. J. Clin. Nutr. 48 (1988).
Johnson, N. L. and Leone, F. C., Statistics and experimental design (Wiley, New York, 1977).
Johnston, J., Econometric methods (McGraw-Hill Kogakusha, Tokyo, 1972).
Kallner, A. et al., Nutr. Metab. 21 (1977).
Kallner, A. et al., Am. J. Clin. Nutr. 32 (1979).
Lowry, O. H. et al., J. Biol. Chem. 166 (1946).
Mee, R. W. and Owen, D. B., Improved factors for one-sided tolerance limits for balanced one-way ANOVA random model, JASA 78 (1983) 901.
Morse, E. H. et al., J. Nutr. 60 (1956).
Owen, D. B., A survey of properties and applications of the non-central t-distribution, Technometrics 10 (1968) 445.
Patel, J. K., Tolerance limits - a survey, Comm. Statist. Theory Methods 15 (1986) 2719.
Pocock, S. J., Cook, D. G. and Beresford, S. A., Regression of area mortality rates on explanatory variables: What weighting is appropriate? Appl. Statist. 30 (1981) 286.
Toggenburger, G., Landolt, M. and Semenza, G., Na+-dependent electroneutral L-ascorbate transport, FEBS Lett. 108 (1979) 154.