This article was downloaded by: [University North Carolina - Chapel Hill]
On: 11 November 2014, At: 06:55
Publisher: Taylor & Francis
Informa Ltd Registered in England and Wales, Registered Number: 1072954. Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK
Communications in Statistics - Theory and Methods
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/lsta20
A characterization of the independence-distribution-preserving covariance structure for the multivariate maximum squared-radii statistic
Dean M. Young (a), John W. Seaman Jr (a) & Laurie M. Meaux (b)
(a) Department of Information Systems, Baylor University, Waco, TX 76798-8005
(b) Department of Mathematical Sciences, University of Arkansas, Fayetteville, AR 72701
Published online: 27 Jun 2007.
To cite this article: Dean M. Young, John W. Seaman Jr & Laurie M. Meaux (1992) A characterization of the independence-distribution-preserving covariance structure for the multivariate maximum squared-radii statistic, Communications in Statistics - Theory and Methods, 21:6, 1605-1613, DOI: 10.1080/03610929208830867
To link to this article: http://dx.doi.org/10.1080/03610929208830867
COMMUN. STATIST.-THEORY METH., 21(6), 1605-1613 (1992)
A CHARACTERIZATION OF THE INDEPENDENCE-DISTRIBUTION-PRESERVING COVARIANCE STRUCTURE FOR THE MULTIVARIATE MAXIMUM SQUARED-RADII STATISTIC
Dean M. Young and John W. Seaman, Jr.
Department of Information Systems
Baylor University
Waco, TX 76798-8005

Laurie M. Meaux
Department of Mathematical Sciences
University of Arkansas
Fayetteville, AR 72701
Key Words and Phrases: dependent observation vectors; independence- distribution-preserving covariance structures; multivariate outlier detection; matrix-normal random matrices.
ABSTRACT

Necessary and sufficient conditions on the observation covariance structure and on the set of linear transformations are given for which the distribution of the multivariate maximum squared-radii statistic for detecting a single multivariate outlier is invariant from the distribution assuming the usual independence covariance structure. Thus, we extend the work of Baksalary and Puntanen (1990), who have given necessary and sufficient conditions for an independence-distribution-preserving covariance structure for Grubbs' statistic for detecting a univariate outlier. We also extend the work of Marco, Young, and Turner (1987) and Pavur and Young (1991), who have given sufficient conditions for an independence-distribution-preserving dependency structure for the multivariate squared-radii statistic.
1. INTRODUCTION

Research concerning the effect of the violation of the usual independence assumption for sampled observations has been ongoing for the past thirty years.
Copyright © 1992 by Marcel Dekker, Inc.
Many papers, such as Walsh (1947), Tubbs (1980a, 1980b), Smith and Lewis (1980), Lawoko and McLachlan (1982), and Pavur and Davenport (1985), have demonstrated that ignoring requisite dependence structures for statistics may result in severe consequences. Nevertheless, in some cases dependence structures may be changed without detrimental effects to sampling distributions. Thus, it is of interest to study dependency structures for which the sampling distribution of a statistic is the same as that under assumptions of independence. We shall refer to such dependency structures as independence distribution preserving (IDP).
IDP covariance structures have been devised for many different statistics. Faleschini (1956), Bhat (1962), and Baldessari (1965, 1966) have derived IDP covariance structures for tests in univariate analysis of variance. Gallo (1967) has derived IDP covariance structures for analysis of covariance, and Baldessari (1968) has studied IDP structures for balanced incomplete block designs. Turner, Young, and Marco (1986) have presented an IDP structure for the Lilliefors' test statistic, which is used to detect univariate nonnormality. Srivastava (1980), Young, Pavur, and Marco (1989), and Baksalary and Puntanen (1990) have considered IDP structures for Grubbs' statistic for detecting a univariate outlier. Also, Tranquilli and Baldessari (1988) have described an IDP structure for the statistic used to test for model utility of a multiple regression model, and Jeyaratnam (1982) has given sufficient conditions for an IDP covariance structure for testing linear hypotheses concerning the parameters of a general linear model.
IDP structures for statistics formed from multivariate observations have also been studied. Pavur (1987) has obtained an IDP covariance structure for a test statistic used in multivariate analysis of variance. Marco, Young, and Turner (1987) and Pavur and Young (1991) have given sufficient conditions for an IDP covariance structure for the multivariate squared-radii statistic, which is utilized in detecting a single multivariate outlier.
In this paper we present a complete solution to the problem of robustness of the power of the multivariate maximum squared-radii statistic against the effect of correlation and unequal covariance matrices in the following manner. We completely characterize the set of linear transformations on the matrix of observations and observation covariance structures such that the distribution of the maximum squared-radii statistic is identical to its distribution (except possibly for a scalar multiplier) under the assumption of independent observations. Hence, this paper extends the work of Baksalary and Puntanen (1990) to the case of
multivariate dependent observations in the single outlier-detection setting. The paper also generalizes the work of Marco et al. (1987) and Pavur and Young (1991) in that we demonstrate that the IDP dependency structure for the multivariate squared-radii statistic given in Pavur and Young (1991) is not only sufficient, but also necessary.
2. MATHEMATICAL PRELIMINARIES

In this section we define notation utilized throughout the remainder of the paper, and we present two lemmas which are used to derive the main results. Given $n$ $(p \times 1)$ vectors $m_1, m_2, \ldots, m_n$, denote by $\mathrm{vec}(m)$ the $np \times 1$ vector formed by the vertical concatenation of the $m_i$'s. Let $Y = [Y_1, Y_2, \ldots, Y_n]'$ represent an $n \times p$ random matrix where each $Y_i$ is a $p \times 1$ random vector. The random matrix $Y$ is said to have a matrix normal distribution with mean $\mu = [\mu_1, \mu_2, \ldots, \mu_n]'$ and covariance matrix $V_{np \times np}$ if $\mathrm{vec}(Y')$ has a multivariate normal distribution with mean $\mathrm{vec}(\mu')$ and covariance matrix $V$. We shall write $Y \sim MN(\mu, V)$.

Throughout the remainder of the paper we shall denote the $k \times k$ identity matrix by $I_k$, and we shall let $M$ denote an $n \times n$ matrix of ones, $J$ denote an $n \times 1$ vector of ones, and $E$ be an $n \times 1$ vector of $n - 1$ zeros and a one in the $k$th element, where $1 \le k \le n$. We shall also let $\mu_0$ denote an arbitrary $p \times 1$ vector, $\Sigma$ be a $p \times p$ symmetric, positive definite matrix, and $\delta$ be a $p \times 1$ vector not equal to $0$. For any two matrices $A$ and $B$, denote by $A \otimes B$ the Kronecker matrix product defined by $(a_{ij}B)$, as given in Anderson (1984, p. 599).
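As a quick illustration of this definition, the following sketch (NumPy, with illustrative values of our own choosing) forms $A \otimes B$ both via the library routine and directly from the blockwise definition $(a_{ij}B)$:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[0.0, 5.0],
              [6.0, 7.0]])

# Kronecker product via the library routine
K = np.kron(A, B)

# the same product built blockwise from the definition (a_ij * B)
blocks = [[A[i, j] * B for j in range(A.shape[1])] for i in range(A.shape[0])]
K_manual = np.block(blocks)

assert np.allclose(K, K_manual)
```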
The problem of detecting a single outlier from a multivariate normal population may be formulated as an hypothesis-testing problem as follows. Let $X_1, X_2, \ldots, X_n$ be a random sample of $n$ $p$-dimensional observation vectors. The null hypothesis is that the group of sampled observations is a random sample from a multivariate normal population with mean $\mu_0$ and covariance matrix $\Sigma$. The alternative hypothesis is that the expected value of a single observation is not equal to $\mu_0$. Thus, we may express the multivariate outlier detection problem as testing the hypotheses
$$H_0: X \sim MN[(J \otimes \mu_0'),\; cI_n \otimes \Sigma]$$
and
$$H_1: X \sim MN[(J \otimes \mu_0' + E \otimes \delta'),\; cI_n \otimes \Sigma].$$
Two test statistics are commonly used to test for the presence of a single multivariate outlier from a multivariate normal population. The first statistic is defined as $T_{\max}^2 = \max_i T_i^2$ with
$$T_i^2 = (X_i - \bar{X}_i)' A_i^{-1} (X_i - \bar{X}_i)\,[(n-1)(n-2)/n],$$
where $\bar{X}_i = \frac{1}{n-1}\sum_{j \ne i} X_j$ and $A_i = \sum_{j \ne i} (X_j - \bar{X}_i)(X_j - \bar{X}_i)'$. The second statistic, sometimes called the multivariate maximum squared-radii statistic, is $D_{\max}^2 = \max_i D_i^2$ where
$$D_i^2 = (X_i - \bar{X})' A^{-1} (X_i - \bar{X})$$
with $\bar{X} = \sum_{i=1}^n X_i / n$ and $A = \sum_{i=1}^n (X_i - \bar{X})(X_i - \bar{X})'$. However, the $T_{\max}^2$ and $D_{\max}^2$ statistics are equivalent since one can be written as a monotone function of the other. If we let
$$B = I_n - (1/n)M, \tag{2.1}$$
then, clearly, both $T_{\max}^2$ and $D_{\max}^2$ depend solely on the distribution of
$$BX = [\,X_1 - \bar{X} \mid X_2 - \bar{X} \mid \cdots \mid X_n - \bar{X}\,]'. \tag{2.2}$$
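To make these definitions concrete, here is a small NumPy sketch (the helper name and test data are our own, not from the paper) that computes the squared radii $D_i^2$ from the rows of $BX$ and takes their maximum:

```python
import numpy as np

def squared_radii(X):
    """Return D_i^2 = (X_i - Xbar)' A^{-1} (X_i - Xbar) for each row of X,
    where A = sum_i (X_i - Xbar)(X_i - Xbar)' is the total scatter matrix."""
    Xbar = X.mean(axis=0)
    R = X - Xbar                      # rows of BX, i.e., X_i - Xbar
    A = R.T @ R                       # p x p scatter matrix
    Ainv = np.linalg.inv(A)
    return np.einsum('ij,jk,ik->i', R, Ainv, R)

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))
X[0] += 8.0                           # plant a single outlying observation in row 1

D2 = squared_radii(X)
D2_max = D2.max()                     # the maximum squared-radii statistic
```

Since $D_{\max}^2$ is a monotone function of $T_{\max}^2$, either statistic flags the same observation; here the planted outlier attains the maximum. A useful sanity check is that the $D_i^2$ always sum to $p$, because $\sum_i D_i^2 = \mathrm{tr}(A^{-1}A) = p$.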
We shall need the following two lemmas for the proof of our main result.
Lemma 1. Let $X \sim MN[(J \otimes \mu_0' + E \otimes \delta'),\; cI_n \otimes \Sigma]$ where $c > 0$ and let $B$ be defined as in (2.1). Then $BX \sim MN[(E - (1/n)J) \otimes \delta',\; cB \otimes \Sigma]$.

Proof. From Barra (1981, p. 111) we have that
$$E(BX) = (I_n - (1/n)M)(J \otimes \mu_0' + E \otimes \delta') = (E - (1/n)J) \otimes \delta' \tag{2.3}$$
and
$$\mathrm{Cov}(BX) = [(I_n - (1/n)M) \otimes I_p](cI_n \otimes \Sigma)[(I_n - (1/n)M) \otimes I_p] = c(I_n - (1/n)M) \otimes \Sigma = cB \otimes \Sigma. \tag{2.4}$$
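The mean formula in Lemma 1 can be checked numerically. The sketch below (toy dimensions and parameter values are our own choices, with $\Sigma = I_p$ for simplicity) draws many replicates of $X$ under the model of the lemma and compares the empirical mean of $BX$ with $(E - (1/n)J) \otimes \delta'$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, c, k = 6, 2, 2.0, 3            # k = position of the outlying row (1-based)

mu0 = np.array([1.0, -1.0])
delta = np.array([4.0, 0.0])
J = np.ones((n, 1))
E = np.zeros((n, 1)); E[k - 1, 0] = 1.0
B = np.eye(n) - np.ones((n, n)) / n

mean = J @ mu0[None, :] + E @ delta[None, :]   # J (x) mu0' + E (x) delta', as an n x p array
V = c * np.kron(np.eye(n), np.eye(p))          # c I_n (x) Sigma with Sigma = I_p

reps = 20000
draws = rng.multivariate_normal(mean.reshape(-1), V, size=reps)
X = draws.reshape(reps, n, p)                  # each slice is one n x p draw of X
BX = np.einsum('ij,rjk->rik', B, X)            # apply B to every replicate

emp_mean = BX.mean(axis=0)
target = (E - J / n) @ delta[None, :]          # (E - (1/n)J) (x) delta'
assert np.allclose(emp_mean, target, atol=0.1)
```

Note that `mean.reshape(-1)` is exactly $\mathrm{vec}(\mu')$ in the row-stacked ordering used by the matrix-normal definition above, so the Kronecker covariance blocks line up with the rows of $X$.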
The following lemma, which is well known, concerns solutions of a matrix equation and may be found in Lewis and Odell (1971, p. 10).

Lemma 2. Let $A$, $B$, $C$, and $D$ be real matrices such that the matrix equation $ACB = D$ is defined, and $A$, $B$, $D$ are known but $C$ is unknown. Then the matrix
equation $ACB = D$ has a solution if and only if $AA^+DB^+B = D$, and the general form of the solution is $C = A^+DB^+ + E - A^+AEBB^+$, where $E$ is an arbitrary matrix and $A^+$, $B^+$ are the Moore-Penrose inverses of the matrices $A$ and $B$, respectively.
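Lemma 2 admits a direct numerical check with NumPy's Moore-Penrose pseudoinverse (the test matrices below are arbitrary choices of our own, constructed so that a solution is guaranteed to exist):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((5, 4))
C_true = rng.standard_normal((3, 5))
D = A @ C_true @ B                       # building D this way guarantees a solution exists

Ap, Bp = np.linalg.pinv(A), np.linalg.pinv(B)

# consistency condition: A A+ D B+ B = D
assert np.allclose(A @ Ap @ D @ Bp @ B, D)

# general solution C = A+ D B+ + E - A+ A E B B+, with E an arbitrary matrix
E = rng.standard_normal((3, 5))
C = Ap @ D @ Bp + E - Ap @ A @ E @ B @ Bp
assert np.allclose(A @ C @ B, D)
```

The second assertion holds for any choice of $E$, which is what makes the expression the *general* solution.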
In the next section we shall use the ideas, notation, and lemmas provided in this section to prove the main result of this paper.
3. THE MAIN RESULT

In the following theorem we provide a characterization of the dependency structures and of the set of linear transformations for which the distribution of the multivariate maximum squared-radii statistic is identical to the distribution assuming the usual independence covariance structure for the observations.
Theorem. Let $X \sim MN[(J \otimes \mu_0' + E \otimes \delta'),\; V]$ where $V$ is a symmetric, positive definite matrix. Let $T$ be an arbitrary $n \times n$ matrix and let $B$ be defined as in (2.1). Then $TX \sim MN[(E - (1/n)J) \otimes \delta',\; kB \otimes \Sigma]$ if and only if $T = B$ and
$$V = c(I_n - (1/n)M) \otimes \Sigma + (M \otimes I_p)H + H'(M \otimes I_p) \tag{3.1}$$
where $c > 0$ and $H$ is an $np \times np$ matrix such that $\mathrm{rank}(H) \le p$ and $V$ is positive definite.
Proof. We shall first prove the necessity portion of the theorem. Let $T$ be an $n \times n$ matrix. From expression (2.3) we must have that $T(J \otimes \mu_0' + E \otimes \delta') = (E - (1/n)J) \otimes \delta'$ and
$$(I - T)(E \otimes \delta') = (1/n)J \otimes \delta'. \tag{3.2}$$
Thus, it follows that $TJ \otimes \mu_0' = 0$, and hence that $TJ = 0$. From Lemma 2 it follows that $T = KB$ where $K$ is an arbitrary $n \times n$ matrix. Hence, we have that $KBJ \otimes \mu_0' = 0$ where $\mu_0$ is an arbitrary $p \times 1$ vector. Substituting into (3.2), we have that $(I - KB)(E \otimes \delta') = (1/n)J \otimes \delta'$, which implies that
$$[E - (1/n)J] \otimes \delta' = (KB)(E \otimes \delta') = [K(I_n - (1/n)M)](E \otimes \delta') = [K(E - (1/n)J)] \otimes \delta',$$
which, in turn, implies that
$$E - (1/n)J = K(E - (1/n)J). \tag{3.3}$$
Equation (3.3) may be reexpressed as $(K - I)(E - (1/n)J) = 0$. It follows that $K - I = 0$, or that $K = I$, and hence $T = B$. From (2.4) we see that the distribution of
$TX$, or equivalently $BX$, will be identical (except for a constant multiple) to the distribution of $BX$ under the usual independence structure if
$$(B \otimes I_p)V(B \otimes I_p) = kB \otimes \Sigma \tag{3.4}$$
where $k > 0$. Solving (3.4) for $V$, we have, by Lemma 2, that
$$V = (kB \otimes \Sigma) + Q \tag{3.5}$$
where $Q = U - (B \otimes I_p)U(B \otimes I_p)$ and $U$ is an arbitrary $np \times np$ matrix.

Following Pavur (1987), we may express $Q$ as
$$Q = (1/n)(M \otimes I_p)U[I_{np} - (1/2n)(M \otimes I_p)] + [I_{np} - (1/2n)(M \otimes I_p)]U(1/n)(M \otimes I_p).$$
Letting $W = U[I_{np} - (1/2n)(M \otimes I_p)]$ and using the fact that $(1/n)(M \otimes I_p)$ is idempotent, we have
$$Q = (M \otimes I_p)[(1/n^2)(M \otimes I_p)W] + [W'(1/n^2)(M \otimes I_p)](M \otimes I_p).$$
Finally, letting $H = (1/n^2)(M \otimes I_p)W$, we have
$$Q = (M \otimes I_p)H + H'(M \otimes I_p) \tag{3.6}$$
where $\mathrm{rank}(H) \le p$. The result follows when we let $k = c$ and substitute the right-hand side of equation (3.6) for $Q$ into equation (3.5).
The sufficiency portion of the proof follows directly from the substitution of the matrix $V$, defined in (3.1), into the quantity $(B \otimes I_p)V(B \otimes I_p)$ and simplifying, from letting $c = k$, and from Lemma 1.

We note that, given $c$, conditions on $H$ for which the matrix $V$ is positive definite are given in Pavur and Young (1991).
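The algebra behind the sufficiency argument can be verified numerically: since $BM = MB = 0$, both $H$-terms of (3.1) are annihilated by $B \otimes I_p$. The sketch below (dimensions, $\Sigma$, and $W$ are our own choices; positive definiteness of $V$, which the theorem additionally requires, is not enforced here) builds $V$ as in (3.1) and confirms $(B \otimes I_p)V(B \otimes I_p) = cB \otimes \Sigma$:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, c = 5, 2, 1.5

M = np.ones((n, n))
B = np.eye(n) - M / n
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])                 # an arbitrary p x p SPD matrix
Ip = np.eye(p)

# build H = (1/n^2)(M (x) I_p) W as in the proof, so rank(H) <= p
W = rng.standard_normal((n * p, n * p))
H = (np.kron(M, Ip) @ W) / n**2

# V as in (3.1): c(I_n - (1/n)M) (x) Sigma + (M (x) I_p)H + H'(M (x) I_p)
V = c * np.kron(B, Sigma) + np.kron(M, Ip) @ H + H.T @ np.kron(M, Ip)

BIp = np.kron(B, Ip)
assert np.allclose(BIp @ V @ BIp, c * np.kron(B, Sigma))   # equation (3.4) with k = c
assert np.linalg.matrix_rank(H) <= p
```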
4. COMMENTS

We have provided a multivariate extension to the work of Baksalary and Puntanen (1990), who have provided necessary and sufficient conditions for an IDP covariance structure and linear transformation for Grubbs' univariate outlier statistic. We have also provided an extension of the work by Marco et al. (1987) and Pavur and Young (1991), who have provided sufficient conditions for an IDP covariance structure for the multivariate squared-radii statistic for detecting a single multivariate outlier. Finally, we note that the dependency structure $V$, defined in (3.1), will be IDP for any statistic which is a function of the matrix $BX$,
defined in (2.2). Some examples of statistics which are functions of the matrix $BX$ are the test statistics for detecting multivariate nonnormality developed by Hawkins (1981), Fattorini (1982), and Paulson, Roohan, and Sullo (1987).
REFERENCES
Anderson, T.W. (1984). An Introduction to Multivariate Statistical Analysis. Wiley, New York.
Baksalary, J.K. and Puntanen, S. (1990). A complete solution to the problem of robustness of Grubbs' test. Canad. J. Statist. 18, 285-287.

Baldessari, B. (1965). Remarques sur les échantillons gaussiens. Publications de l'Institut de Statistique de l'Université de Paris XIV, 4, 393-405.

Baldessari, B. (1966). Analysis of variance of dependent data. Statistica XXVI, 895-903.

Baldessari, B. (1968). The incomplete blocks design of dependent data. Giorn. Istit. Ital. Att. XXXI, 144-151.

Barra, J.R. (1981). Mathematical Basis of Statistics. Academic Press, New York.

Bhat, B.R. (1962). On the distribution of certain quadratic forms in normal variates. J. Roy. Statist. Soc. B 24, 148-151.

Faleschini, L. (1956). Analisi della varianza di fenomeni correlativi. Pontificia Academia Scientiarum Commentationes XVI, 7, 315-343.

Fattorini, L. (1982). Assessing multivariate normality on beta plots. Statistica 42, 251-257.

Gallo, F. (1967). Covariance analysis for dependent data. Statistica XXVII, 83-92.

Hawkins, D.M. (1981). A new test for multivariate normality and homoscedasticity. Technometrics 23, 105-110.

Jeyaratnam, S. (1982). A sufficient condition on the covariance matrix for F tests in linear models to be valid. Biometrika 69, 679-680.

Lawoko, C.R.O. and McLachlan, G.J. (1982). Some asymptotic results on the effect of autocorrelation on the error rates of the sample linear discriminant function. Pattern Recognition 16, 119-121.
Lewis, T.O. and Odell, P.L. (1971). Estimation in Linear Models. Prentice-Hall, Inc., New Jersey.

Marco, V.R., Young, D.M., and Turner, D.W. (1987). A note on the effect of simple equicorrelation in detecting a spurious multivariate observation. Comm. Statist. A16, 1027-1036.

Paulson, A.S., Roohan, P., and Sullo, P. (1987). Some empirical distribution function tests for multivariate normality. J. Statist. Comput. Simul. 28, 15-30.

Pavur, R.J. (1987). Distribution of multivariate quadratic forms under certain covariance structures. Canad. J. Statist. 15, 169-176.

Pavur, R.J. and Davenport, J.M. (1985). The (large) effect of (small) correlations in ANOVA and correction procedures. Amer. J. Math. Management Sci. 5, 77-92.

Pavur, R.J. and Young, D.M. (1991). Conditions for the invariance for the multivariate versions of Grubbs' test and Bartlett's test under a general dependency structure. Metrika 38, 83-97.

Smith, J.H. and Lewis, T.O. (1980). Determining the effects of intraclass correlation on factorial experiments. Comm. Statist. A9, 1353-1364.

Srivastava, M.S. (1980). Effect of equicorrelation in detecting a spurious observation. Canad. J. Statist. 2, 249-254.

Tranquilli, G.B. and Baldessari, B. (1988). Regression analysis with dependent observations: conditions for the invariance of the distribution of the F-statistic. Statistica XLVIII, 49-57.

Tubbs, J. (1980a). Effect of autocorrelated observations on confidence sets based upon chi-square statistics. IEEE Trans. Systems, Man, Cybern. SMC-10, 177-180.

Tubbs, J. (1980b). The effect of serial correlation on confidence regions for the parameters of a multivariate normal population. Comm. Statist. A13, 1341-1351.

Turner, D.W., Young, D.M., and Marco, V.R. (1986). A note on the robustness of the Lilliefors test for univariate normality with respect to equicorrelated data. Comm. Statist. A15, 2355-2361.

Walsh, E.J. (1947). Concerning the effect of intraclass correlation on certain significance tests. Ann. Math. Statist. 18, 88-96.
Young, D.M., Pavur, R.J., and Marco, V.R. (1989). On the effect of correlation and unequal variances in detecting a spurious observation. Canad. J. Statist. 17, 103-105.
Received September 1991