
© Journal of the American Statistical Association, June 1971, Volume 66, Number 334

Theory and Methods Section

Optimal Experimental Design for Polynomial Regression

STEPHEN M. STIGLER*

The problem of choosing the optimal design to estimate a regression function which can be well approximated by a polynomial is considered, and two new optimality criteria are presented and discussed. Use of these criteria is illustrated by a detailed discussion of the case in which the regression function can be assumed approximately linear. These criteria, which can be considered as compromises between the incompatible goals of inference about the regression function under an assumed model and of checking the model's adequacy, are found to yield designs superior in certain respects to others which have been proposed to deal with this problem, including minimum bias designs.

1. INTRODUCTION

In this article we shall be concerned with the standard univariate regression setup; that is, for n fixed (but not necessarily distinct) values of an independent variable x, an experiment is run and a response Y measured. We shall write

$$Y_i = f(x_i) + e_i, \qquad i = 1, \ldots, n,$$

where f is an unknown function, the $x_i$'s are known, and the $e_i$'s are uncorrelated random errors with mean zero and common variance $\sigma^2$ (either known or unknown).

In many situations, while the function f may not be known, we would be willing to assume that it is sufficiently smooth over the range of interest to be adequately represented by a polynomial. If we assume $f(x) = \beta_0 + \beta_1 x + \cdots + \beta_m x^m$ is a polynomial of degree m, where the $\beta_i$'s are unknown, we will call the setup model $P_m$. If we let Y, $\beta$, and e denote column vectors, $Y' = (Y_1, \ldots, Y_n)$, $\beta' = (\beta_0, \beta_1, \ldots, \beta_m)$, $e' = (e_1, \ldots, e_n)$, and let X denote the $n \times (m+1)$ matrix given by

$$X = \begin{pmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^m \\ 1 & x_2 & x_2^2 & \cdots & x_2^m \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^m \end{pmatrix},$$

then model $P_m$ can be written $Y = X\beta + e$, where e has mean 0 and covariance matrix $\sigma^2 I$ (I is the $n \times n$ identity matrix). We will always assume that the $x_i$'s are specified in such a manner that the matrix $X'X$ is nonsingular. In the present context this is equivalent to assuming the $x_i$'s take at least m+1 distinct values (so that in particular, $n \ge m+1$). If this condition is satisfied,

* Stephen M. Stigler is assistant professor, Department of Statistics, University of Wisconsin. Part of the work for this article was done while the author was a summer employee with Bell Telephone Laboratories, Inc., Holmdel, N.J.

then the unique least squares estimator of $\beta$ is given by $\hat\beta = (X'X)^{-1}X'Y = (\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_m)'$, which has mean $\beta$ and covariance matrix $\sigma^2(X'X)^{-1}$.
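The following is a minimal numerical sketch (in present-day Python with NumPy, not part of the original article) of the setup above: it simulates model $P_2$, fits by least squares, and checks the covariance $\sigma^2(X'X)^{-1}$ by Monte Carlo. The sample size, coefficients, and seed are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, sigma = 2, 50, 0.5                     # illustrative choices
beta = np.array([1.0, -2.0, 0.75])           # true coefficients of P_2
x = np.linspace(-1.0, 1.0, n)                # fixed design points in [-1, 1]
X = np.vander(x, m + 1, increasing=True)     # n x (m+1) matrix: 1, x, ..., x^m

cov_theory = sigma**2 * np.linalg.inv(X.T @ X)

# Monte Carlo check of the covariance of the least squares estimator.
draws = np.empty((2000, m + 1))
for r in range(draws.shape[0]):
    y = X @ beta + rng.normal(0.0, sigma, n)
    draws[r] = np.linalg.lstsq(X, y, rcond=None)[0]

print(np.round(cov_theory, 4))
print(np.round(np.cov(draws.T), 4))          # agrees up to Monte Carlo error
```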

The design problem with which we shall be concerned is the following: How should the values $x_1, \ldots, x_n$ of the independent variable be chosen (assuming the statistician has a choice) in order to give rise to the "best" experiment? Which design is best of course depends critically on what one means by "best." A number of optimality criteria have been proposed and discussed in the past (see, for example, Kiefer [8] and Box and Draper [1, 2]), but as we shall argue in Section 5 (see Table 3 and the accompanying discussion), none of these is entirely satisfactory. The main point of this article is to introduce two new criteria, which we shall call restricted D- and G-optimality, and which seem to overcome some of the drawbacks of the designs Kiefer discusses while meeting most of the points raised by Box and Draper.

Before discussing the different optimality criteria we add one further condition that all designs will be required to satisfy: $-1 \le x_i \le 1$ for $i = 1, \ldots, n$. The reason for this constraint is that in any practical problem there will be some finite interval of values of x over which we are interested in estimating the regression function. The restriction of the design to this "region of interest" can be justified on the grounds that often the region of interest will roughly coincide with the region over which the model is assumed valid. Even if it were feasible to run the experiment outside this region, the possibility of serious bias will, for all practical purposes, limit the design to the region of interest, as was discovered by Box and Draper [1]. We lose no generality by taking this region to be the interval from -1 to 1, since the criteria we shall consider are invariant under linear transformations of the independent variable. Thus the best design for the interval [a, b] can easily be found by transforming (linearly) the best design for [-1, 1] to [a, b].

In describing designs we shall employ the device introduced by Kiefer [8] of describing a design as a discrete probability measure $\xi$ on the region of interest [-1, 1]. That is, the design $\xi$ which corresponds to the selection of values $x_1, x_2, \ldots, x_n$ is the measure which puts mass 1/n at each value of x which occurs only once, mass 2/n at each value which occurs twice, and so on. Similarly, corresponding to a discrete probability measure $\xi$ we have the design which takes $n\xi(x)$ independent measurements at the value x, for each x in [-1, 1]. For reasons of mathematical convenience we shall often describe a design by a measure $\xi$ without insisting that $n\xi$ take on only integer values. Thus, for example, we will state that for some optimality criteria the best design for linear regression is $\xi(-1) = \xi(1) = 1/2$, while $n\xi(1)$ is not an integer unless n is even. For such situations it is recommended that an approximation to the stated design be used (in the above example with n odd, take $(n-1)/2$ measurements at x = -1 and $(n+1)/2$ at x = 1). Such approximations will yield "approximately best" designs if n is large, for most optimality criteria [8].

In connection with this characterization of designs as probability measures, it will be convenient to let $M_m(\xi)$ denote the $(m+1) \times (m+1)$ matrix whose ij-th element is

$$a_{ij} = \int x^{i+j-2}\, d\xi,$$

where $\xi$ is a probability measure on [-1, 1]. $M_m(\xi)$ is often called the information matrix of the design $\xi$, since if the measure $\xi$ corresponds to an experiment run at the (not necessarily distinct) points $x_1, \ldots, x_n$, then $M_m(\xi) = (1/n)X'X$, and $(\sigma^2/n)M_m^{-1}(\xi)$ is the covariance matrix of the least squares estimator.
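As a sketch of this definition (modern Python, not from the paper), the information matrix of a discrete design can be computed directly from its support points and weights; with zero-based indices the (i, j) entry is the moment $\int x^{i+j}\,d\xi$, which matches $a_{ij} = \int x^{i+j-2}\,d\xi$ for one-based i, j.

```python
import numpy as np

def info_matrix(points, weights, m):
    """M_m(xi) for a discrete design xi placing mass weights[k] at points[k]."""
    pts = np.asarray(points, float)
    w = np.asarray(weights, float)
    return np.array([[np.sum(w * pts**(i + j)) for j in range(m + 1)]
                     for i in range(m + 1)])

# The design with mass 1/2 at -1 and +1 (D-optimal for P_1, as noted below):
M1 = info_matrix([-1.0, 1.0], [0.5, 0.5], 1)
print(M1)                  # the 2 x 2 identity matrix
print(np.linalg.det(M1))   # |M_1(xi)| = 1, the maximum attainable on [-1, 1]
```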

2. SEVERAL OPTIMALITY CRITERIA

2.1 D-Optimal and G-Optimal Designs

Perhaps the most carefully studied optimality criterion for the design of regression experiments is what Kiefer refers to in his work as D-optimality (see [8] in particular). A design $\xi_0$ is said to be D-optimal for the model $P_m$ if it minimizes the determinant of the covariance matrix of $\hat\beta$ (which is also called the generalized variance of $\hat\beta$); that is, if it maximizes $|M_m(\xi)|$.

From a historical point of view, the design criterion which was first considered (by Smith [15]) was a global criterion based on the variance of the estimated regression function. Following Kiefer, we shall say a design is G-optimal for the model $P_m$ if it minimizes

$$\max_{-1 \le x \le 1} \operatorname{var}\left(\sum_{k=0}^{m} \hat\beta_k x^k\right).$$

Much of the appeal of these criteria stems from the fact, discovered by Hoel [3] when he found the D-optimal designs and proved in more general contexts by Kiefer and Wolfowitz [12] and Karlin and Studden [5, 6], that a design is D-optimal if and only if it is G-optimal. This discovery simultaneously gave a global justification for D-optimality and showed that G-optimal designs were good for parameter estimation.

Another attractive feature of these criteria is that if the errors are assumed to be normally distributed, then the D-optimal design minimizes the expected volume of the smallest invariant confidence region (in (m+1)-dimensional space) for $\beta$, for any confidence coefficient. Further arguments in favor of D- and G-optimality as appropriate criteria and other characterizations of D-optimal designs are given by Kiefer [8, 10] and by Kiefer and Wolfowitz [11].

It should be remarked, however, that while these are quite reasonable criteria to consider for the model $P_m$ if one is interested in estimating the whole regression function over the region [-1, 1], the optimal designs have a severe practical disadvantage: they are extremely model dependent. In fact, the D-optimal design for the model $P_m$ is concentrated on m+1 points. Thus if an experiment is run under the assumption that the model $P_m$ is correct, but in fact this model is inadequate and $P_{m+1}$ is more appropriate, it is impossible to detect this departure from the assumed model using the D-optimal design for $P_m$, no matter how large the sample size. For example, the D-optimal design for the model $P_1$ (linear regression) calls for half the measurements to be taken at x = -1 and half at x = 1; using this design it is not possible to detect the presence of a quadratic term in the regression function.
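The inestimability is easy to exhibit numerically (a sketch, not from the paper): with all observations at x = ±1, the column for $x^2$ in the model $P_2$ coincides with the constant column, so $X'X$ is singular and $\beta_2$ cannot be estimated.

```python
import numpy as np

x = np.array([-1.0, 1.0] * 50)             # 100 runs, half at each endpoint
X2 = np.vander(x, 3, increasing=True)      # columns: 1, x, x^2
print(np.linalg.matrix_rank(X2.T @ X2))    # 2, not 3: beta_2 is not estimable
```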

2.2 Designs for Making Inferences about $\beta_m$

Another criterion that has been widely considered is that of minimizing $\operatorname{var}(\hat\beta_m)$. The rationale behind this criterion is that, from the point of view of squared error loss, the design optimal in this sense permits the sharpest possible inferences to be made about $\beta_m$. This would be particularly appropriate for a statistician who wishes to test the adequacy of the model $P_{m-1}$ by testing the hypothesis $\beta_m = 0$.

The main drawback of this criterion is that it is only appropriate for a very limited problem: estimating $\beta_m$ or testing hypotheses about $\beta_m$. If, for example, the hypothesis $\beta_m = 0$ is accepted and the statistician decides the model $P_{m-1}$ is adequate, then the design which is optimal in the above sense is not optimal in any sense for making inferences about the model $P_{m-1}$. On the other hand, should he reject the hypothesis $\beta_m = 0$, deciding that the model $P_m$ is indeed the appropriate one, then again the design is not optimal for making global inferences about the regression function. Also, as with D-optimal designs, the optimal design is concentrated on m+1 points, making it impossible to detect the presence of terms of higher degree than $x^m$.

2.3 Minimum Bias Designs

The criteria so far discussed are alike in one disturbing respect: they are all so dependent on the assumed model that the optimal designs provide no opportunity for a check of the model's adequacy. One attempt to meet this objection has been made by Box and Draper [1, 2]. If we let

$$\hat f_m(x) = \sum_{i=0}^{m} \hat\beta_i x^i$$

denote the least-squares estimate of the true regression function f(x), calculated under the assumption that the model $P_m$ is correct, then Box and Draper suggested that the first requirement the design should satisfy is that it should minimize

$$\frac{n}{2\sigma^2}\int_{-1}^{1} E[\hat f_m(x) - f(x)]^2\, dx = \frac{n}{2\sigma^2}\int_{-1}^{1} \operatorname{var}[\hat f_m(x)]\, dx + \frac{n}{2\sigma^2}\int_{-1}^{1} [E\hat f_m(x) - f(x)]^2\, dx = V + B,$$

where they referred to the first term on the right, V, as "variance error" (i.e., error due to sampling variation), and the second term, B, as "bias error" (i.e., error due to a mistaken or inadequate model). If a design could be chosen to minimize V+B, and some additional freedom of choice existed, they suggested attempting to satisfy the additional criterion of maximizing the power of a goodness-of-fit test for some class of alternative functions, such as polynomials of degree m+d, for some d > 0.

The principal difficulty with adopting the criterion "minimize V+B" is that the optimal design depends on the function f, which is unknown. Even if it is assumed that f belongs to some well-behaved family of functions, such as the polynomials of degree m+1, the optimal design cannot be found, as it will depend on the (unknown) coefficient of $x^{m+1}$.

To avoid this difficulty, Box and Draper [1] recommended that one choose the design to minimize B alone. They then showed that if f were a polynomial of degree m+d, this could be done by choosing a design $\xi$ whose first 2m+d moments agreed with the first 2m+d moments of a uniform distribution on [-1, 1]. Unfortunately, their recommendation that the design minimize B alone, which is based on calculations for the case m = d = 1, depends quite strongly on the fact that they allowed the "operability region" over which the experiment could be run to extend indefinitely beyond the region of interest [-1, 1]. For the case they considered, they found that if the model were inadequate to the point where bias contributed even only 20 percent to V+B, then the square root of the second moment of the design which minimized V+B differed only slightly from that of the design which minimized B alone, but was drastically different from that of the design which minimized V alone. Thus they concluded that minimum bias designs come closer to minimizing V+B than do designs which minimize V alone.

However, these calculations are open to a different interpretation. It can reasonably be argued that the great difference between the design to minimize B and the design to minimize V is due largely to allowing the operability region to extend beyond the region of interest, and that their results show that if a serious possibility of bias exists, then the experimenter would be ill-advised to take any observations at all outside of the region of interest [-1, 1]. This can be viewed as a converse to the universally acknowledged fact that polynomial models are badly suited for extrapolation purposes if a serious possibility of model inadequacy exists. If we accept this interpretation of Box and Draper's results and constrain the operability region to the region of interest, then the comparisons of the square roots of the second moments of the designs cease to be dramatic (ranging from .58 to 1.00 instead of from .58 to ∞), and bias considerations no longer seem to be of overriding importance. We shall see in Section 5 (particularly Table 3 and the accompanying discussion) that minimum bias designs, while they are an important attempt to meet realistically the problem of checking the representational adequacy of the model, may often be inappropriate, inefficient, or both.

We might also remark here that this conclusion (that bias considerations are not of overriding importance) reduces the appeal of Hader, Karson, and Manson's approach [16] to the associated estimation problem. They suggested choosing the estimator according to the minimum bias criterion rather than by least squares.

An additional objection has been raised by Kiefer [8] to the use of integrated error rather than maximum error as a criterion: unlike maximum error, integrated error is not invariant under nonlinear transformations of the independent variable, and therefore one may get a different "optimal" design depending upon the scale of measurement.

3. A NEW CRITERION

It should be evident from the preceding discussion that all of the criteria we have considered (and therefore the designs to which they lead) have serious shortcomings; specifically, they may not be appropriate to many of the problems most often faced in practice. D-optimal (and G-optimal) designs permit no check of the adequacy of the model; designs for making inferences about $\beta_m$ are inefficient for more general inferences about the regression function; designs to minimize maximum or integrated mean square error depend upon the unknown regression function; and minimum bias designs can be quite inefficient, as will be shown. It is therefore desirable to find a criterion and designs which meet the following considerations:

a. The design should allow for a check of whether or not the assumed model provides an adequate fit to the true regression function.

b. If it is concluded that the model is adequate, it should be possible to make reasonably efficient inferences concerning that model.

c. The optimal design should not depend on unknown parameters.

The criteria which we shall now propose in an attempt to satisfy the preceding goals are compromises between previously studied criteria.

Definition 1: We shall call $\xi_0$ a C-restricted D-optimal design for the model $P_m$ if $\xi_0$ maximizes $|M_m(\xi)|$ among all designs $\xi$ satisfying $|M_m(\xi)| \le C\,|M_{m+1}(\xi)|$.

The justification for this choice of definition is based on the fact that if $\hat\beta_{m+1}$ is the least-squares estimate of $\beta_{m+1}$ for the model $P_{m+1}$, and $\xi$ corresponds to an experiment run at $x_1, \ldots, x_n$, then $n \operatorname{var}(\hat\beta_{m+1}) = \sigma^2\,|M_m(\xi)| \cdot |M_{m+1}(\xi)|^{-1}$. Thus this criterion says "minimize the generalized variance of the least squares estimators $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_m$ for the model $P_m$, subject to the constraint that $\operatorname{var}(\hat\beta_{m+1}) \le \sigma^2 C/n$."

Another criterion we shall consider is described by the following definition:

Definition 2: We shall call $\xi_0$ a C-restricted G-optimal design for the model $P_m$ if $\xi_0$ minimizes

$$\max_{-1 \le x \le 1} d_m(x, \xi)$$

among all designs $\xi$ satisfying $|M_m(\xi)| \le C\,|M_{m+1}(\xi)|$, where $d_m(x, \xi) = g(x)' M_m^{-1}(\xi) g(x)$ and $g(x)' = (1, x, \ldots, x^m)$.

This choice of definition can be explained by noting that if $\xi$ corresponds to an experiment run at $x_1, \ldots, x_n$, then $n^{-1} d_m(x, \xi) = \operatorname{var}(\sum_{i=0}^{m} \hat\beta_i x^i)$. Thus this criterion says "minimize the maximum variance of the best linear unbiased estimate of the regression function for the model $P_m$, subject to the constraint that $\operatorname{var}(\hat\beta_{m+1}) \le C\sigma^2/n$." The optimal designs in the senses of both Definitions 1 and 2 will actually achieve $\operatorname{var}(\hat\beta_{m+1}) = \sigma^2 C/n$.
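A small sketch (modern Python, illustrative only) of the variance function $d_m(x, \xi)$ of Definition 2, with its maximum over [-1, 1] approximated on a grid:

```python
import numpy as np

def d_m(x, points, weights, m):
    """g(x)' M_m^{-1}(xi) g(x) with g(x)' = (1, x, ..., x^m)."""
    pts = np.asarray(points, float)
    w = np.asarray(weights, float)
    M = np.array([[np.sum(w * pts**(i + j)) for j in range(m + 1)]
                  for i in range(m + 1)])
    g = np.vander(np.atleast_1d(x), m + 1, increasing=True)
    return np.einsum('ki,ij,kj->k', g, np.linalg.inv(M), g)

grid = np.linspace(-1.0, 1.0, 2001)
# For the G-optimal design for P_1 (mass 1/2 at each endpoint), the maximum
# equals m + 1 = 2, in line with the equivalence theorem cited in Section 2.
print(d_m(grid, [-1.0, 1.0], [0.5, 0.5], 1).max())
```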

These definitions describe a class of optimal designs rather than a single design; the actual choice of design will depend upon what value is specified for the number C. To see what considerations enter into the choice of C, we note that in adopting this criterion the statistician is in effect employing the following line of reasoning: "I believe rather strongly that the model $P_m$ is indeed the correct one for this experiment; that is, I believe the true regression function can be quite closely approximated over the region of interest [-1, 1] by a polynomial of degree m. I am, however, reluctant to use a design which allows no opportunity to check the adequacy of the model, and I would like a design that will allow me either to estimate $\beta_{m+1}$ or test hypotheses such as $\beta_{m+1} = 0$ with some specified degree of precision."

The choice of C reflects a compromise between two conflicting goals: precise inferences about $\beta_{m+1}$ and precise inferences about the model $P_m$. On the one hand, C should be chosen sufficiently small so that it will be possible to detect practically significant departures from the model with a specified precision (this requirement could be phrased in terms of the power of the test of $H: \beta_{m+1} = 0$); on the other hand, large values of C will yield more efficient designs for the model $P_m$.

How C can be chosen in practice will be discussed further in a later section in connection with an example. We do note now that for C = ∞, the optimal design will be the D-optimal design under Definition 1 and the G-optimal design under Definition 2, while if C takes its minimum possible value (C cannot be arbitrarily small; a lower bound is $(n/\sigma^2) \cdot \operatorname{var}_0(\hat\beta_{m+1})$, where $\operatorname{var}_0(\hat\beta_{m+1})$ is the minimum variance attainable by any design for $\hat\beta_{m+1}$), then the optimal design in both cases will be the optimal design for making inferences about $\beta_{m+1}$. Therefore we can think of restricted D- and G-optimal designs as compromises between these extremes.

4. FINDING OPTIMAL DESIGNS

In general it is quite difficult to find restricted D- and G-optimal designs, although a few simple observations reduce the problem to a nonlinear programming problem of manageable proportions. It would be desirable to have an analytic characterization of these designs comparable to the results of Hoel, Kiefer, Wolfowitz, Karlin, and Studden on D-optimal designs and designs for estimating $\beta_m$, but the nature of the criterion (constrained optimization) makes it doubtful that a useful characterization will be found.

The search for optimal designs is first simplified by noting that for both criteria under consideration, we can restrict attention to measures $\xi$ symmetric about zero. This fact is proved in the Appendix, and has the effect of reducing the dimensionality of the programming problem by about 75 percent.

The problem can be further simplified by restricting our attention to symmetric measures supported by no more than m+3 points (m+2 when m is odd), which must include -1, 0, and +1. (The optimal design may put weight zero on 0 when m is even, in which case the support will be m+2 points including -1 and +1.) The reason for this is that the design criteria depend only on the first 2m+2 moments of $\xi$, and, since we may assume the odd moments are zero, the problem reduces to a well-studied problem from the theory of moments (see [4] and [14], for example): given the first m+1 moments of a probability distribution on [0, 1], what is the smallest number of points such that there is a distribution supported by that number of points having the prescribed moments?

These two facts together reduce the problem to a nonlinear programming problem with m variables (m+1 if m is even). These variables are the $(m-1)/2$ points ($m/2$ if m is even) $x_i$, $i = 1, \ldots, (m-1)/2$, in [0, 1] such that the optimal $\xi$ is supported by $\{\pm x_i\}$ together with ±1 and 0, and the $(m+1)/2$ weights ($(m+2)/2$ if m is even) $\xi(x_i)$, $i = 1, \ldots, (m-1)/2$, and $\xi(0)$. For a few small values of m the optimal designs can be found analytically, as we will show in the next section.
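For m = 1 the nonlinear program collapses to a single variable, the weight at 0, and can be handed to a generic constrained optimizer. The sketch below (one possible formulation using SciPy, not the author's computation) maximizes $|M_1| = \mu_2$ subject to $\mu_4 - \mu_2^2 \ge 1/C$, anticipating the notation of Section 5; the value C = 6.53 is taken from Table 2.

```python
import numpy as np
from scipy.optimize import minimize

C = 6.53

def moments(w0):
    # symmetric design: mass w0 at 0 and (1 - w0)/2 at each of -1, +1,
    # so mu_2 = mu_4 = 1 - w0
    return 1.0 - w0, 1.0 - w0

def neg_det_M1(v):
    mu2, _ = moments(v[0])
    return -mu2                              # |M_1(xi)| = mu_2

def feasible(v):                             # mu_4 - mu_2^2 - 1/C >= 0
    mu2, mu4 = moments(v[0])
    return mu4 - mu2**2 - 1.0 / C

res = minimize(neg_det_M1, x0=[0.3], bounds=[(0.0, 1.0)],
               constraints=[{'type': 'ineq', 'fun': feasible}])
print(res.x)   # ~0.189, the weight xi_0(0) given by (5.1) for C = 6.53
```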

5. AN EXAMPLE: SOME DESIGNS FOR LINEAR REGRESSION

We shall illustrate the preceding ideas for the standard linear regression model; that is, we shall find the C-restricted D- and G-optimal designs for the model $P_1$ and discuss various aspects of their performance.

As we remarked in the previous section, we can restrict our attention to symmetric designs supported by the three points -1, 0, and +1. If we let

$$\mu_k = \int x^k\, d\xi,$$

then

$$|M_1(\xi)| = \begin{vmatrix} 1 & 0 \\ 0 & \mu_2 \end{vmatrix} = \mu_2, \qquad |M_2(\xi)| = \begin{vmatrix} 1 & 0 & \mu_2 \\ 0 & \mu_2 & 0 \\ \mu_2 & 0 & \mu_4 \end{vmatrix} = \mu_2(\mu_4 - \mu_2^2),$$

$$d_1(x, \xi) = 1 + x^2 \mu_2^{-1}, \qquad \max_{-1 \le x \le 1} d_1(x, \xi) = 1 + \mu_2^{-1}.$$

The restricted D-optimality criterion says "choose $\xi$ to maximize $\mu_2$ subject to $\mu_4 - \mu_2^2 \ge C^{-1}$," while the restricted G-optimality criterion says "choose $\xi$ to minimize $1 + \mu_2^{-1}$ subject to $\mu_4 - \mu_2^2 \ge C^{-1}$." Thus for this case, m = 1, the optimal designs are identical. While this will not be true in general, it seems likely that since the optimal designs do agree for the extreme values of C (C = ∞ and C = the smallest feasible value), their performances will not differ greatly for intermediate values of C. In particular, it is to be hoped that the restricted D-optimal designs (which are somewhat simpler to calculate) perform well from the point of view of maximum variance.

The optimal designs for linear regression are given by the following theorem.

Theorem: For $C \ge 4$, the C-restricted D- and G-optimal design for the model $P_1$ is given by

$$\xi_0(-1) = \xi_0(1) = \frac{1}{4} + \frac{1}{2}\sqrt{\frac{1}{4} - \frac{1}{C}}, \qquad \xi_0(0) = \frac{1}{2} - \sqrt{\frac{1}{4} - \frac{1}{C}}. \tag{5.1}$$

The theorem follows immediately from the following lemma.

Lemma: Let Z be any random variable such that $P(0 \le Z \le 1) = 1$. Then for any constant K, $0 \le K \le 1/4$, $\operatorname{var}(Z) \ge K$ implies $E(Z) \le \frac{1}{2} + \sqrt{\frac{1}{4} - K}$.

Proof of Lemma: Since $0 \le Z \le 1$, $E(Z) \ge E(Z^2) = \operatorname{var}(Z) + [E(Z)]^2$, and hence $E(Z) - [E(Z)]^2 \ge \operatorname{var}(Z) \ge K$. Thus $[E(Z)]^2 - E(Z) + K \le 0$, or $[E(Z) - \frac{1}{2}]^2 \le \frac{1}{4} - K$, or $|E(Z) - \frac{1}{2}| \le \sqrt{\frac{1}{4} - K}$. Thus in particular, $E(Z) \le \frac{1}{2} + \sqrt{\frac{1}{4} - K}$, where equality is possible only when $\operatorname{var}(Z) = K$ and $P(Z = 0) = 1 - P(Z = 1) = \frac{1}{2} - \sqrt{\frac{1}{4} - K}$. Q.E.D.

The theorem then follows if we let Z denote the square of a random variable with distribution $\xi$, and $K = 1/C$. It is clear from the proof that the design is unique.

We note that for C = ∞, the design becomes $\xi_0(-1) = \xi_0(1) = 1/2$, which is the design which maximizes $|M_1(\xi)|$, that is, the D-optimal design for $P_1$. For C = 4 we get $\xi_0(-1) = \xi_0(1) = 1/4$, $\xi_0(0) = 1/2$, which is the best design for estimating $\beta_2$ in the model $P_2$. Therefore we see that as C varies from 4 to ∞ we get compromises between these designs.

Table 1. C-RESTRICTED D- AND G-OPTIMAL DESIGNS FOR P₁

   C      ξ₀(−1), ξ₀(+1)     ξ₀(0)
  4.0          .250           .500
  4.5          .333           .333
  5.0          .362           .276
  5.5          .381           .239
  6.0          .394           .211
  7.0          .414           .173
  8.0          .427           .146
  9.0          .436           .127
 10.0          .444           .113
 15.0          .464           .072
 20.0          .474           .053
 25.0          .479           .042
 50.0          .490           .020
  ∞            .500           .000
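Table 1 can be regenerated directly from (5.1); a short illustrative check in Python:

```python
import numpy as np

for C in [4.0, 4.5, 5.0, 5.5, 6.0, 7.0, 8.0, 9.0, 10.0,
          15.0, 20.0, 25.0, 50.0]:
    s = np.sqrt(0.25 - 1.0 / C)
    print(f"C = {C:5.1f}   xi0(-1) = xi0(+1) = {0.25 + 0.5*s:.3f}"
          f"   xi0(0) = {0.5 - s:.3f}")
```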

The question remains, how should C be chosen? As we have already mentioned, this choice should reflect both the desire for efficiency for the model P1 and the wish to check the fit of this model. As an aid in making this choice we consider the following measures of the efficiency of a design for a model.

Definition: The model $P_m$ D-efficiency of a design $\xi$ is given by

$$e_m^D(\xi) = \left[\frac{|M_m(\xi)|}{\max_{\xi} |M_m(\xi)|}\right]^{1/(m+1)}. \tag{5.2}$$

The model $P_m$ G-efficiency of a design $\xi$ is given by

$$e_m^G(\xi) = \frac{m+1}{\max_{-1 \le x \le 1} d_m(x, \xi)}. \tag{5.3}$$

Thus D-efficiency is defined relative to the D-optimal design for the model, and has the interpretation that (if the errors are normally distributed) the same expected volume confidence sets for $\beta$ can be achieved by using n runs with design $\xi$ as will result from $n\, e_m^D(\xi)$ runs with the D-optimal design for model $P_m$. Similarly, since for the G-optimal design $\max_x d_m(x, \xi) = m+1$, G-efficiency is defined relative to the G-optimal design for the model, and has the interpretation that the same maximum variance is obtained by using n runs with design $\xi$ as will result from $n\, e_m^G(\xi)$ runs with the G-optimal design for model $P_m$.
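The two efficiency measures are straightforward to evaluate numerically; the sketch below (modern Python, illustrative) uses the known maxima $\max_\xi |M_1(\xi)| = 1$ and $\max_\xi |M_2(\xi)| = 4/27$, attained by the D-optimal designs for $P_1$ and $P_2$ on [-1, 1], and checks the Table 2 row for n = 100.

```python
import numpy as np

def M(points, weights, m):
    pts, w = np.asarray(points, float), np.asarray(weights, float)
    return np.array([[np.sum(w * pts**(i + j)) for j in range(m + 1)]
                     for i in range(m + 1)])

def e_D(points, weights, m, max_det):
    # (5.2): D-efficiency relative to the D-optimal design for P_m
    return (np.linalg.det(M(points, weights, m)) / max_det)**(1.0 / (m + 1))

def e_G(points, weights, m):
    # (5.3): G-efficiency, with max d_m approximated on a grid
    grid = np.linspace(-1.0, 1.0, 4001)
    G = np.vander(grid, m + 1, increasing=True)
    d = np.einsum('ki,ij,kj->k', G, np.linalg.inv(M(points, weights, m)), G)
    return (m + 1) / d.max()

pts, w = [-1.0, 0.0, 1.0], [0.4055, 0.189, 0.4055]   # xi_0 for n = 100
print(e_D(pts, w, 1, 1.0), e_G(pts, w, 1))           # ~.90, ~.90
print(e_D(pts, w, 2, 4/27), e_G(pts, w, 2))          # ~.94, ~.57
```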

If we let $\xi_0$ denote the C-restricted D- and G-optimal design for the model $P_1$, then elementary calculations give us:

$$e_0^D(\xi_0) = e_0^G(\xi_0) = 1,$$

$$e_1^D(\xi_0) = \left[\frac{1}{2} + \sqrt{\frac{1}{4} - \frac{1}{C}}\right]^{1/2},$$

$$e_1^G(\xi_0) = \frac{2\mu_2}{1 + \mu_2}, \quad \text{where } \mu_2 = \frac{1}{2} + \sqrt{\frac{1}{4} - \frac{1}{C}},$$

$$e_2^D(\xi_0) = \left\{\frac{27}{4C}\left[\frac{1}{2} + \sqrt{\frac{1}{4} - \frac{1}{C}}\right]\right\}^{1/3},$$

$$e_2^G(\xi_0) = 3 \min\left\{\frac{1}{2} - \sqrt{\frac{1}{4} - \frac{1}{C}},\ \frac{1}{4} + \frac{1}{2}\sqrt{\frac{1}{4} - \frac{1}{C}}\right\},$$

$$e_m^D(\xi_0) = e_m^G(\xi_0) = 0 \quad \text{for } m > 2.$$

As illustrated by the figure, both $e_1^D$ and $e_1^G$ increase as C increases, that is, as $\xi_0$ approaches the D- and G-optimal design for $P_1$. Both $e_2^D$ and $e_2^G$ reach a maximum at C = 4.5, where $\xi_0$ is the D- and G-optimal design for $P_2$.

[Figure: The model P₁ and P₂ efficiencies of the restricted D- and G-optimal design ξ₀, plotted against C (horizontal axis from 4.0 to 11.0; vertical axis from 0 to 1.00).]

As an example of how we could proceed in practice, consider the following problem. We wish to design an experiment to estimate the interference effect (measured by induced voltage) of a certain power line on phone lines as a function of distance. Based on previous experience with similar situations, we are willing to assume that over our range of interest the log of the induced voltage behaves approximately as a linear function of the distance from the power line, and that the errors of measurement are independent normally distributed random variables with a common variance. Let us further simplify the problem by transforming (linearly) the scale of the distance measurements (x) so that our range of interest is from x = -1 to x = 1, and by assuming that the common variance of the errors is known. The problem then becomes one of choosing n distances $x_1, \ldots, x_n$ at which to measure the log of the induced voltage $Y_1, \ldots, Y_n$ in order to "best" fit the data by a straight line (model $P_1$).

In designing this experiment we must remember that the true relationship is not exactly linear; we merely hope it can be quite well approximated by a linear function. We would like to design the experiment in such a way that if the true relationship differs significantly from a linear one, we will detect the difference with high probability.

As a measure of departure from linearity, we could consider either the Box-Draper measure of integrated squared bias relative to variance

$$B = \frac{n}{2\sigma^2} \int_{-1}^{1} [E\hat f_1(x) - f(x)]^2\, dx,$$

or the maximum (squared) bias relative to variance

$$B' = \frac{n}{\sigma^2} \max_{-1 \le x \le 1} [E\hat f_1(x) - f(x)]^2,$$

where $\hat f_1(x) = \hat\beta_0 + \hat\beta_1 x$. If f(x) is quadratic with coefficient $\beta_2$ for $x^2$, then for a symmetric design

$$B = \frac{n\beta_2^2}{\sigma^2}\left(\mu_2^2 - \frac{2}{3}\mu_2 + \frac{1}{5}\right) \qquad \text{and} \qquad B' = \frac{n\beta_2^2}{\sigma^2} \max\{\mu_2, 1 - \mu_2\}^2,$$

where $\mu_2 = \int x^2\, d\xi$. Since for a fixed design both measures are increasing functions of $|\beta_2|/\sigma$, it seems reasonable to take $|\beta_2|/\sigma$ as a measure of quadratic departure from linearity.
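The quadratic-case expressions above can be verified numerically; in the sketch below (illustrative Python), the bias of the fitted line under a symmetric design is $\beta_2(\mu_2 - x^2)$, and the grid integral and maximum reproduce B and B'. The parameter values are arbitrary illustrations.

```python
import numpy as np

n, sigma, b2, mu2 = 100, 1.0, 0.5, 0.7      # illustrative values
x = np.linspace(-1.0, 1.0, 200001)
bias = b2 * (mu2 - x**2)                    # E f1_hat(x) - f(x)

B_num = (n / (2 * sigma**2)) * 2.0 * np.mean(bias**2)   # 2*mean ~ integral
B_formula = (n * b2**2 / sigma**2) * (mu2**2 - (2/3) * mu2 + 0.2)

Bp_num = (n / sigma**2) * np.max(bias**2)
Bp_formula = (n * b2**2 / sigma**2) * max(mu2, 1.0 - mu2)**2

print(B_num, B_formula)     # both ~5.583
print(Bp_num, Bp_formula)   # both 12.25
```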

For the case under consideration we might define "does not differ significantly from a linear relationship" to mean that $|\beta_2| < .5\sigma$, where $\beta_2$ is the coefficient of $x^2$ in the quadratic which best approximates the true relationship. The number .5 was chosen since, for the purposes of prediction to which the model will be put, the presence of a quadratic term with $|\beta_2| < .5\sigma$ would be largely "washed out" by the random error.

If we decide to follow the procedure "test the hypothesis $H_0: \beta_2 = 0$ versus the alternative $H_1: \beta_2 \ne 0$ at the significance level α = .20 with power of .75 at the alternative $|\beta_2| = .5\sigma$; if we accept the hypothesis, fit model $P_1$; if we reject the hypothesis, fit model $P_2$," then we might state our design criterion as "maximize $e_1^D(\xi)$ subject to $p(\beta_2) \ge .75$ for $|\beta_2| \ge .5\sigma$," where $p(\beta_2)$ is the power of the uniformly most powerful unbiased .20 level test of the hypothesis $\beta_2 = 0$. This criterion is met by a restricted D-optimal design. In fact, we find that

$$p(\beta_2) = 1 - \Phi\left(1.2816 - \frac{\sqrt{n}\,\beta_2}{\sigma\sqrt{C}}\right) + \Phi\left(-1.2816 - \frac{\sqrt{n}\,\beta_2}{\sigma\sqrt{C}}\right),$$

and C can be found for a given n by solving the equation $p(.5\sigma) = .75$. We find that we must take

$$C = (0.06534)\, n.$$
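The constant 0.06534 can be recovered by solving the power equation numerically; a sketch (SciPy, illustrative), assuming the normal power formula above:

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def power(C, n, beta2_over_sigma=0.5):
    # p(beta_2) for the two-sided .20-level test, var(beta2_hat) = sigma^2 C/n
    shift = np.sqrt(n / C) * beta2_over_sigma
    return 1 - norm.cdf(1.2816 - shift) + norm.cdf(-1.2816 - shift)

n = 100
C = brentq(lambda c: power(c, n) - 0.75, 0.001, 100.0)
print(C, C / n)   # ~6.53 and ~0.06534, as in the text and Table 2
```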

The optimal design can then be found from the theorem given earlier in this section. We recall that no feasible design exists unless $C \ge 4$, which means the criterion cannot be met unless $n \ge 62$. If $\sigma^2$ were unknown, then C could be found from tables of the noncentral t-distribution. Table 2 gives the optimal designs and their efficiencies for different choices of n.

Table 2. C-RESTRICTED D- AND G-OPTIMAL DESIGNS FOR P₁ WITH C = (0.06534)n, AND THEIR EFFICIENCIES

    n       C     ξ₀(0)   ξ₀(±1)   e₁D(ξ₀)  e₂D(ξ₀)  e₁G(ξ₀)  e₂G(ξ₀)
   62    4.05     .444     .278     .746     .975     .715     .834
   65    4.25     .379     .310     .788     .995     .766     .931
   70    4.57     .323     .339     .823    1.000     .807     .969
   75    4.90     .286     .357     .845     .995     .833     .857
   80    5.23     .258     .371     .862     .986     .852     .773
   90    5.88     .217     .391     .885     .965     .878     .652
  100    6.53     .189     .406     .901     .943     .896     .566
  110    7.19     .167     .416     .913     .921     .909     .501
  120    7.84     .150     .425     .922     .901     .919     .450
  130    8.49     .136     .432     .929     .882     .927     .409
  140    9.15     .125     .438     .935     .864     .933     .375
  150    9.80     .115     .442     .941     .848     .939     .346
  200   13.07     .083     .458     .957     .779     .956     .250
  300   19.60     .054     .473     .973     .688     .972     .162
  400   26.14     .040     .480     .980     .628     .980     .120
  500   32.67     .032     .484     .984     .585     .984     .095
 1000   65.34     .016     .492     .992     .467     .992     .047


It is interesting to compare the efficiency of the restricted D- and G-optimal design $\xi_0$ for a particular value of n, say n = 100, with that of some other designs which have been suggested. Consider:

1. The D- and G-optimal design for $P_2$, given by $\xi_1(-1) = \xi_1(0) = \xi_1(1) = 1/3$.

2. The best design for estimating $\beta_2$, given by $\xi_2(-1) = \xi_2(1) = 1/4$, $\xi_2(0) = 1/2$.

3. The "equal spacing" design $\xi_3$, which takes one observation at each of 100 equally spaced points (this is the design which minimizes bias of all degrees).

4. The design given by $\xi_4(-1) = \xi_4(1) = 1/6$, $\xi_4(0) = 2/3$ (this design maximizes the power of the goodness-of-fit test for quadratic alternatives, subject to minimizing quadratic bias).

Table 3 gives the efficiencies as defined by (5.2) and (5.3) for the designs described above, together with the power $p(.5\sigma)$ at the alternative $|\beta_2| = .5\sigma$ of the uniformly most powerful unbiased test of the hypothesis $\beta_2 = 0$. Thus, for example, the relative model $P_1$ D-efficiency of the designs $\xi_i$ and $\xi_j$ can be found as the ratio $e_1^D(\xi_i)/e_1^D(\xi_j)$, in the sense that in order to achieve equivalent performance (from the point of view of generalized variance) from the designs $\xi_i$ and $\xi_j$, it is necessary to take $e_1^D(\xi_i)/e_1^D(\xi_j)$ measurements using $\xi_j$ for every one measurement using $\xi_i$.

We see from Table 3 that the design $\xi_0$, which takes 40 measurements at -1, 19 at 0, and 41 at +1, seems to perform very well.

Table 3. THE EFFICIENCIES OF VARIOUS DESIGNS FOR LINEAR REGRESSION

         e₁D(ξ)   e₁G(ξ)   e₂D(ξ)   e₂G(ξ)   p(.5σ)
  ξ₀      .90      .90      .94      .57      .75
  ξ₁      .82      .80     1.00     1.00      .86
  ξ₂      .71      .67      .94      .75      .89
  ξ₃      .58      .50      .59      .33      .58
  ξ₄      .58      .50      .79      .50      .86

With respect to D- and G-efficiency for the model $P_1$, $\xi_0$ is considerably superior to all of its competitors. In fact, in order for the minimum bias designs $\xi_3$ and $\xi_4$ to achieve the same maximum variance as $\xi_0$ for linear regression, $100\, e_1^G(\xi_0)/e_1^G(\xi_3) \approx 180$ measurements would be required. In addition, should it be necessary to fit a quadratic to the data, $\xi_0$ performs quite creditably (excellently from the standpoint of D-efficiency). While it is true that most of the other designs ($\xi_2$ in particular) will give higher power for testing $H_0: \beta_2 = 0$, it would seem that the principal problem of concern is one of inference about the regression function. Therefore we would not wish to sacrifice too much efficiency for the sake of power, especially since in specifying C for our design we agreed that $p(.5\sigma) = .75$ was sufficient.
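For the three-point competitors, the Table 3 entries follow from the closed forms of this section (for a symmetric design on {-1, 0, 1}, $\mu_2 = \mu_4 = 1 - \xi(0)$); below is a sketch reproducing them (illustrative Python; the equal-spacing design $\xi_3$ is not a three-point design and is omitted):

```python
import numpy as np

def efficiencies(w0):
    """P_1 and P_2 efficiencies of the symmetric design with mass w0 at 0."""
    mu2 = 1.0 - w0
    e1D = np.sqrt(mu2)
    e1G = 2 * mu2 / (1 + mu2)
    e2D = ((27 / 4) * mu2**2 * (1 - mu2))**(1 / 3)
    e2G = 3 * min(1 - mu2, mu2 / 2)
    return round(e1D, 2), round(e1G, 2), round(e2D, 2), round(e2G, 2)

for name, w0 in [("xi_0 (n=100)", 0.189), ("xi_1", 1/3),
                 ("xi_2", 1/2), ("xi_4", 2/3)]:
    print(name, efficiencies(w0))   # matches the Table 3 rows
```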

We also note that, depending upon the cost of experimentation and the degree of confidence in the assumed model, it might prove unwise to follow the advice of Kendall and Stuart [7, p. 161] and Kussmaul [13] that the experimenter use the D-optimal design for the highest degree model he will admit as possible. The restricted D-optimal design is more efficient for the assumed model, provides an adequate check of that model, and is reasonably efficient if a higher degree model is found necessary.

6. REMARKS

The design criteria we have introduced here, restricted D- and G-optimality, are attempts to provide a basis for choosing a design for polynomial regression which will permit efficient inferences to be made about the assumed model while still allowing the model to be checked. It could be argued that the designs considered do not really provide an adequate opportunity to check for departures from the assumed model. For example, the restricted D- and G-optimal design for linear regression is concentrated at three points and can be useful for determining the presence of a quadratic term in the regression function, but it is no good at all for estimating cubic or higher-order coefficients.

To answer this criticism we note that the idea upon which the criteria are based can be easily extended. If we are dealing with a situation where it is felt that the model $P_m$ should do, but we wish to protect against models $P_{m+1}, P_{m+2}, \ldots, P_{m+l}$, then appropriate criteria would be: "maximize $|M_m(\xi)|$ or minimize $\max_x d_m(x, \xi)$ subject to the constraints $|M_{m+i}(\xi)| \le C_i\, |M_{m+i+1}(\xi)|$, $i = 0, 1, \ldots, l-1$." The constants $C_i$ could be chosen to reflect the degree of protection desired. Computation of the optimal design would be a tedious but feasible nonlinear programming problem.

It should also be noted that the principle upon which our criteria are based could easily be applied to problems other than polynomial regression. It would be particularly appropriate for regression with more than one independent variable (response surfaces) or for regression on functions other than powers of x, such as orthogonal polynomials or trigonometric functions.

APPENDIX

The fact that there are symmetric restricted D- and G-optimal designs follows from the arguments Kiefer used to prove that there is an invariant D-optimal design [8].

Let us denote by V(C) the class of all designs $\xi$ satisfying the constraint $|M_m(\xi)| \le C\,|M_{m+1}(\xi)|$. We first show that V(C) is convex. Let $\xi_1$ and $\xi_2$ be in V(C), and $0 \le \lambda \le 1$. Let $\xi^* = \lambda\xi_1 + (1-\lambda)\xi_2$. Denote by P the (m+2)-dimensional column vector given by $P' = (0, 0, \ldots, 0, 1)$. Then V(C) is just the class of all designs $\xi$ with $P' M_{m+1}^{-1}(\xi) P \le C$. Now as Kiefer noticed, if A and B are any symmetric positive definite matrices, $(\lambda A + (1-\lambda)B)^{-1} \le \lambda A^{-1} + (1-\lambda)B^{-1}$, where we write $A \le B$ to mean $B - A$ is positive semidefinite. Then taking $A = M_{m+1}(\xi_1)$, $B = M_{m+1}(\xi_2)$, we have, since $M_{m+1}(\xi^*) = \lambda M_{m+1}(\xi_1) + (1-\lambda)M_{m+1}(\xi_2)$,

$$P' M_{m+1}^{-1}(\xi^*) P \le P'\left(\lambda M_{m+1}^{-1}(\xi_1) + (1-\lambda)M_{m+1}^{-1}(\xi_2)\right)P = \lambda P' M_{m+1}^{-1}(\xi_1) P + (1-\lambda) P' M_{m+1}^{-1}(\xi_2) P \le C.$$

Thus V(C) is convex.
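A quick numerical illustration (not part of the proof) of the matrix inequality just used, i.e., convexity of the matrix inverse in the semidefinite order:

```python
import numpy as np

rng = np.random.default_rng(1)
R1, R2 = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
A = R1 @ R1.T + np.eye(3)      # symmetric positive definite
B = R2 @ R2.T + np.eye(3)
lam = 0.4

diff = lam * np.linalg.inv(A) + (1 - lam) * np.linalg.inv(B) \
       - np.linalg.inv(lam * A + (1 - lam) * B)
print(np.linalg.eigvalsh(diff))   # all eigenvalues >= 0
```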

Next, we show that the sets of restricted D- and G-optimal designs are convex subsets of V(C). First let $\xi_1$ and $\xi_2$ be restricted D-optimal designs, $\xi^* = \lambda\xi_1 + (1-\lambda)\xi_2$, and $0 < \lambda < 1$. Then it follows by diagonalizing $M_m(\xi_1)$ and $M_m(\xi_2)$ (see [8], p. 283) that $|M_m(\xi^*)| \ge |M_m(\xi_1)|$. Since $\xi_1$ is restricted D-optimal and $\xi^* \in V(C)$, we must have $|M_m(\xi^*)| = |M_m(\xi_1)|$, and $\xi^*$ is restricted D-optimal also.

Secondly, if $\xi_1$ and $\xi_2$ are restricted G-optimal and $\xi^* = \lambda\xi_1 + (1-\lambda)\xi_2$, then by the previously mentioned fact that $M_m^{-1}(\xi^*) \le \lambda M_m^{-1}(\xi_1) + (1-\lambda)M_m^{-1}(\xi_2)$, we see that $d_m(x, \xi^*) \le \lambda d_m(x, \xi_1) + (1-\lambda)d_m(x, \xi_2)$ for all $-1 \le x \le 1$. Thus

$$\max_{-1 \le x \le 1} d_m(x, \xi^*) \le \lambda \max_{-1 \le x \le 1} d_m(x, \xi_1) + (1-\lambda) \max_{-1 \le x \le 1} d_m(x, \xi_2) = \max_{-1 \le x \le 1} d_m(x, \xi_1).$$

Since $\xi_1$ is restricted G-optimal and $\xi^* \in V(C)$, we must have $\max_x d_m(x, \xi^*) = \max_x d_m(x, \xi_1)$, and $\xi^*$ is restricted G-optimal, too.

Since the sets of restricted D- and G-optimal designs are convex, and since if $\xi_1$ is optimal in either sense and we define $\xi_2$ by $\xi_2(x) = \xi_1(-x)$, then $\xi_2$ is optimal in the same sense, it follows that there is a symmetric optimal design, namely $(\xi_1 + \xi_2)/2$.

REFERENCES

[1] Box, G. E. P., and Draper, N. R., "A Basis for the Selection of a Response Surface Design," Journal of the American Statistical Association, 54 (September 1959), 622-54.

[2] Box, G. E. P., and Draper, N. R., "The Choice of a Second Order Rotatable Design," Biometrika, 50 (December 1963), 335-52.

[3] Hoel, P. G., "Efficiency Problems in Polynomial Estimation," The Annals of Mathematical Statistics, 29 (1958), 1134-45.

[4] Karlin, S., and Shapley, L. S., Geometry of Moment Spaces, No. 12 of Memoirs of the American Mathematical Society, 1953.

[5] Karlin, S., and Studden, W. J., "Optimal Experimental Designs," The Annals of Mathematical Statistics, 37 (August 1966), 783-815.

[6] Karlin, S., and Studden, W. J., Tchebycheff Systems: With Applications in Analysis and Statistics, New York: Wiley Interscience, 1966.

[7] Kendall, M. G., and Stuart, A., The Advanced Theory of Statistics, Vol. 3, New York: Hafner Publishing Co., 1968.

[8] Kiefer, J., "Optimum Experimental Designs," Journal of the Royal Statistical Society, Ser. B, 21 (No. 2, 1959), 273-319.

[9] , "Optimum Designs in, Regression Problems, II," The Annals of Mathematical Statistics, 32, (March, 1961), 298-325.

[10] I "Two More Criteria Equivalent to D-Optimality of Designs," The Annals of Mathematical Statistics, 33 (June, 1962), 792-6.

[11] Kiefer, J., and Wolfowitz, J., "Optimum Designs in Regression Problems," The Annals of Mathematical Statistics, 30 (June 1959), 271-92.

[12] , "The Equivalence of Two Extremum Problems," Canadian Journal of Mathematics, 12 (No. 3, 1960), 363-6.

[13] Kussmaul, K., "Protection Against Assuming the Wrong Degree in Polynomial Regression," Technometrics, 11 (November 1969), 677-82.

[14] Shohat, J. A., and Tamarkin, J. D., The Problem of Moments, New York: American Mathematical Society Surveys, No. 1, 1943.

[15] Smith, K., "On the Standard Deviations and Interpolated Values of an Observed Polynomial Function and Its Constants and the Guidance They Give Towards a Proper Choice of the Distribution of Observations," Biometrika, 12 (November 1918), 1-85.

[16] Hader, R. J., Karson, M. J., and Manson, A. R., "Minimum Bias Estimation and Experimental Design for Response Surfaces," Technometrics, 11 (August 1969), 461-75.
