

Statistics and Computing 5 (1995) 91-92

Comments on J. A. Nelder 'The statistics of linear models: back to basics'

JOHN GOWER

Department of Statistics, Open University, Milton Keynes, MK7 6AA, UK

Submitted December 1994; accepted January 1995

I am totally convinced by the arguments given in John Nelder's paper and hope that others will be too. I particularly like the way that the issues are discussed in terms of simple models and I shall continue in the same vein.

Firstly, I think the issue of constraints might be even more simply demonstrated in terms of the model

$\eta_i = E(y_i) = \mu + \alpha_i$

from which it is immediately obvious that $\mu$ and $\alpha_i$ cannot be separately estimated. Nevertheless, $\alpha_i - \alpha_j$ may be estimated to give $\widehat{\alpha_i - \alpha_j} = y_i - y_j$, where the 'hat' is placed centrally to emphasize that the parameters are not estimated separately. We have that $E(\widehat{\alpha_i - \alpha_j}) = \alpha_i - \alpha_j$ and, under the usual independence assumptions, $\mathrm{Var}(\widehat{\alpha_i - \alpha_j}) = 2\sigma^2$. Not a word about constraints; they are irrelevant.
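
To make the point concrete, here is a minimal numerical sketch (my own illustration, not part of the original comment): it simulates the model $\eta_i = \mu + \alpha_i$ and confirms that $y_i - y_j$ estimates $\alpha_i - \alpha_j$ unbiasedly with variance $2\sigma^2$, with no constraint in sight.

```python
# Minimal sketch (illustrative only): the contrast alpha_1 - alpha_2 is
# estimable without any constraint on mu or the alpha_i.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 10.0, 2.0
alpha = np.array([1.0, -2.0, 0.5])            # true (unidentified) effects

# 200 000 replicates of y_i = mu + alpha_i + e_i, with e_i ~ N(0, sigma^2)
y = mu + alpha + sigma * rng.standard_normal((200_000, alpha.size))
contrast = y[:, 0] - y[:, 1]                  # estimates alpha_1 - alpha_2

print(contrast.mean())   # ~ 3.0 = alpha_1 - alpha_2
print(contrast.var())    # ~ 8.0 = 2 * sigma^2
```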

If, for some good reason, we wish to impose linear constraints on the parameters, then this is easy to do. Writing the model in vector form

$\boldsymbol{\eta} = E(\mathbf{y}) = \mu\mathbf{1} + \boldsymbol{\alpha}$

to which we add the constraint $\boldsymbol{\alpha}'\mathbf{c} = 0$; then the parameters are all estimable and are given by

fltc t ~ = ] ~ c and & = y - ~ l .

Thus, $\hat\alpha_i - \hat\alpha_j = \widehat{\alpha_i - \alpha_j} = y_i - y_j$ with the same expectation and variance as without the constraint. But now, after a little algebra, we have that

$\mathrm{Var}\,\hat\alpha_i = \left[1 - \dfrac{2c_i}{\mathbf{1}'\mathbf{c}} + \dfrac{\mathbf{c}'\mathbf{c}}{(\mathbf{1}'\mathbf{c})^2}\right]\sigma^2$

so that when $\mathbf{c} = \mathbf{1}$ (the usual symmetric constraint) $\mathrm{Var}\,\hat\alpha_i = [1 - 1/n]\sigma^2$, and when $\mathbf{c} = \mathbf{e}_1$ (as in GLIM) $\mathrm{Var}\,\hat\alpha_i = 2\sigma^2$ for $i \neq 1$ and, not unnaturally, $\mathrm{Var}\,\hat\alpha_1 = 0$. Thus the constraints affect the variances of the parameters. These results are perfectly valid if one wishes to fit

the constrained models, but then one must have a very good reason to do so. In nearly all practical situations, good reasons are not forthcoming and even if there were good reason for imposing constraints, there seems little reason to expect the conventional ones to be those required. The constraints adopted for solving the normal equations are merely a convenience that allows explicit algebraic expression and, more importantly, the avoidance of inverting singular matrices. Only contrasts are estimable, and these and their variances are invariant to the particular choice of linear constraint; to require estimates of variances of individual $\alpha_i$ when the parameters themselves are not estimable is beyond comprehension. Whether or not the $\alpha_i$ are instances of random variables does not make a lot of difference to the variances of contrasts. In my view the confusion has arisen, at least in part, because of insufficient distinction between substantive constraints and identifiability constraints; it is a pity that the word constraint is often used without qualification. Identification constraints are more in the nature of a (re)parametrization of the model, as can be seen most clearly in the GLIM form where the redundant parameters disappear.
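
The variance formula above is easy to check numerically. The following sketch (mine, not the paper's) imposes $\boldsymbol{\alpha}'\mathbf{c} = 0$, computes $\hat\mu = \mathbf{y}'\mathbf{c}/\mathbf{1}'\mathbf{c}$ and $\hat{\boldsymbol{\alpha}} = \mathbf{y} - \hat\mu\mathbf{1}$, and compares simulated variances with $[1 - 2c_i/\mathbf{1}'\mathbf{c} + \mathbf{c}'\mathbf{c}/(\mathbf{1}'\mathbf{c})^2]\sigma^2$ for the two conventional choices of $\mathbf{c}$.

```python
# Sketch (illustrative): constrained estimates and their variances for
# c = 1 (symmetric constraint) and c = e_1 (GLIM-style corner constraint).
import numpy as np

rng = np.random.default_rng(1)
n, mu, sigma = 5, 3.0, 1.5
alpha = np.array([2.0, -1.0, 0.0, 1.5, -2.5])

for c in (np.ones(n), np.eye(n)[0]):
    y = mu + alpha + sigma * rng.standard_normal((100_000, n))
    mu_hat = y @ c / c.sum()                  # y'c / 1'c
    alpha_hat = y - mu_hat[:, None]           # y - mu_hat * 1
    theory = (1 - 2 * c / c.sum() + (c @ c) / c.sum() ** 2) * sigma**2
    print(np.round(alpha_hat.var(axis=0), 2), np.round(theory, 2))
# c = 1  gives Var = (1 - 1/n) sigma^2 for every i;
# c = e_1 gives Var = 2 sigma^2 for i != 1 and Var = 0 for i = 1.
```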

Secondly, I have some comments on marginality. In the linear case John Nelder's arguments are very persuasive, but with non-linearity things seem less clear. The most simple model with multiplicative interaction is

$\eta_{ij} = E(y_{ij}) = \mu + \alpha_i + \beta_j + \gamma_i\delta_j$

which has a similar degree of unidentifiability to the corresponding additive model, usually handled by similar identifiability constraints $\alpha_. = \beta_. = \gamma_. = \delta_. = 0$. Nothing new comes in here and the marginality principle applies to the desirability of including main effects with the interaction. There is a small additional lack of identifiability of the scaling of the multiplicative parameters, inherent in the identity $\gamma_i\delta_j = (\pi\gamma_i)((1/\pi)\delta_j)$ for arbitrary $\pi$, but this causes little difficulty.
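
For a complete two-way table, this biadditive model can be fitted by the familiar device of double-centring and extracting the leading singular component of the residuals; the code below is my own sketch, not the paper's, and it also makes the arbitrariness of the split between $\gamma$ and $\delta$ plain.

```python
# Sketch: least-squares fit of mu + alpha_i + beta_j + gamma_i*delta_j
# for a complete balanced table, via the SVD of the doubly centred residuals.
import numpy as np

rng = np.random.default_rng(2)
Y = rng.normal(size=(6, 4))                      # toy 6 x 4 response table

mu = Y.mean()
alpha = Y.mean(axis=1) - mu                      # row main effects
beta = Y.mean(axis=0) - mu                       # column main effects
resid = Y - mu - alpha[:, None] - beta[None, :]  # interaction residuals

U, s, Vt = np.linalg.svd(resid)
gamma, delta = s[0] * U[:, 0], Vt[0]             # one of many equivalent splits

# The scaling is arbitrary: (pi*gamma)(delta/pi) gives the same interaction.
pi = 3.7
print(np.allclose(np.outer(gamma, delta), np.outer(pi * gamma, delta / pi)))
```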

When further multiplicative interaction terms are included, the model becomes

$\eta_{ij} = E(y_{ij}) = \mu + \alpha_i + \beta_j + \sum_{r=1}^{R} \gamma_{ir}\delta_{jr},$

where the interaction terms may be collected into a rank-$R$ matrix $\mathbf{R}$ which may be parametrized in its singular value decomposition form $U\Gamma_R V'$, where $U$ and $V$ are orthonormal and $\Gamma_R$ is zero off its diagonal. Marginality considerations remain valid. Suppose $\mathbf{R} = PQ'$ is any decomposition of $\mathbf{R}$; then unidentifiability may be expressed by an arbitrary $R \times R$ orthogonal matrix $H$ encapsulated in $\mathbf{R} = (PH)(H'Q')$. The singular value decomposition form is merely a useful way of handling the calculations, especially when one is interested in increasing $R$ one dimension at a time to fit a sequence of nested models. But surely nobody would regard the orthonormal identifiability constraints as being paralleled by substantive constraints on the actual parameters that should be taken into account when determining the variances of their estimates. However, the expression $\mathbf{R} = (PH)(H'Q')$ shows that the row parameters $PH$ and the column parameters $QH$ are invariant to rotations in $R$-dimensional Euclidean space and hence eligible for distance (the multidimensional form of $\gamma_i\delta_j$), inner-product and confidence-region interpretations. Even row/column distances have some meaning, provided care is taken over the distribution of the scale parameters $\Gamma_R$ between the two sets. These are the parameters of statistical interest, not the artificially constrained forms of the singular value decomposition; yet with linear models, similar artificial constraints of no statistical importance have been treated seriously.
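
The rotational indeterminacy is equally easy to exhibit (again my own sketch): any factorization $\mathbf{R} = PQ'$ may be replaced by $(PH)(QH)'$ for orthogonal $H$, while inter-row distances and inner products of the row and column parameters are unchanged.

```python
# Sketch: R = P Q' = (P H)(Q H)' for any R x R orthogonal H, and
# distances between rows of P (or of Q) are invariant under the rotation.
import numpy as np

rng = np.random.default_rng(3)
rank = 2
P = rng.normal(size=(6, rank))
Q = rng.normal(size=(4, rank))
H, _ = np.linalg.qr(rng.normal(size=(rank, rank)))   # random orthogonal matrix

PH, QH = P @ H, Q @ H
dist = lambda X: np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

print(np.allclose(P @ Q.T, PH @ QH.T))   # same interaction matrix R
print(np.allclose(dist(P), dist(PH)))    # row distances unchanged
print(np.allclose(P @ P.T, PH @ PH.T))   # inner products unchanged
```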

A sub-model of interest in the biadditive family (as I prefer to term it) is that for Tukey's one degree of freedom for non-additivity, which may be written:

$\eta_{ij} = E(y_{ij}) = \mu + \alpha_i + \beta_j + \lambda\alpha_i\beta_j,$

expressing that the interaction is proportional to the product of the main effects. The usual identifiability constraints apply, but the most simple way of reparametrizing this model is to write:

$\eta_{ij} = E(y_{ij}) = \mu + \alpha_i + \beta_j + \lambda\alpha_i\beta_j = (\mu - \lambda^{-1}) + \lambda^{-1}(\lambda\alpha_i + 1)(\lambda\beta_j + 1)$

or

$\eta_{ij} = \mu' + \gamma_i\delta_j,$

where the parameters have been redefined and are all estimable, apart from the relative scalings of the products, without requiring identifiability constraints. The reparametrization has simplified things but shows that the original parametrization of the Tukey model unexpectedly contains the seeds of a violation of the marginality principle.
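
The identity behind this reparametrization can be checked symbolically; the snippet below is my verification of the algebra, assuming the $\lambda$ form of the Tukey model given above.

```python
# Symbolic check: mu + a + b + lam*a*b
#   == (mu - 1/lam) + (1/lam)*(lam*a + 1)*(lam*b + 1)
import sympy as sp

mu, a, b, lam = sp.symbols('mu alpha beta lam')
lhs = mu + a + b + lam * a * b
rhs = (mu - 1 / lam) + (lam * a + 1) * (lam * b + 1) / lam
print(sp.simplify(lhs - rhs))   # 0
```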

If we now examine the most simple triple product model

$\eta_{ijk} = E(y_{ijk}) = \alpha_i\beta_j\gamma_k,$

which represents a three-factor interaction with neither main effects nor two-factor interactions, the first thing to note is that on a logarithmic scale this is a simple main-effects model: no marginality problems there. If we do not transform, the usual considerations of reparametrizing

to $\alpha_i^* = \alpha_i + A$, $\beta_j^* = \beta_j + B$ and $\gamma_k^* = \gamma_k + C$ show that a reparametrized form of the model induces main effects and two-product interaction terms. However, if the original form of the model already contained an interaction $\theta_{ij}$ (say), then the induced terms are of the form $\theta_{ij} + \alpha_i\beta_j$, which does not simplify to a single multiplicative term. These examples suggest to me that the marginality principle rests rather strongly on the additivity assumption and needs some kind of modification if it is to be extended to biadditive and other classes of non-linear model. I hope that there is some simple explanation of the difficulties I perceive, or that a simple generalization of the principle can be found. My intuition is that marginality is a fundamental concept that should not be too much influenced by the type of model under consideration.
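
Finally, the induced terms in the triple product model can be displayed symbolically (my own sketch): shifting each parameter by a constant turns the pure three-factor term into a sum containing main effects and two-factor products.

```python
# Sketch: substituting alpha = alpha* - A, beta = beta* - B, gamma = gamma* - C
# into alpha*beta*gamma induces lower-order terms in the starred parameters.
import sympy as sp

a, b, g, A, B, C = sp.symbols('alpha_s beta_s gamma_s A B C')
print(sp.expand((a - A) * (b - B) * (g - C)))
# -> contains the triple product, three two-factor products
#    (e.g. -C*alpha_s*beta_s), three main-effect terms (e.g. B*C*alpha_s)
#    and the constant -A*B*C.
```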