
When can we trust the F-approximation of the Box-test?


PSYCHOMETRIKA--VOL. 55, NO. 4, 727-728 DECEMBER 1990 NOTES AND COMMENTS

WHEN CAN WE TRUST THE F-APPROXIMATION OF THE BOX-TEST?

FRIEDRICH FOERSTER AND GERHARD STEMMLER

UNIVERSITY OF FREIBURG, F.R.G.

Consider a multivariate context with p variates and k independent samples, each of size n. To test equality of the k population covariance matrices, the likelihood ratio test is commonly employed. Box's F-approximation to the null distribution of the test statistic can be used to compute p-values if the sample sizes are not too small. It is suggested that the F-approximation be regarded as accurate if the sample sizes n are greater than or equal to $1 + 0.0163p^2 + 2.7265p - 1.4182p^{0.5} + 0.235p^{1.4} \cdot \ln(k)$, for $5 \le p \le 30$, $k \le 20$.

Key words: Box-test, F-approximation, homogeneity of covariance matrices.

Box (1949) published a likelihood criterion to test the equality of population covariance matrices. This test is known to be sensitive to even slight departures from multinormality. A more robust test was developed by Tiku and Balakrishnan (1985), at least for two groups, but the frequently used statistical program packages (if they offer any homogeneity test at all) still use the Box-test with an F-approximation. As Box showed, the null distribution of his test statistic can be expressed as an infinite series. The truncated series solution corresponded very well with the (known) exact results for p = 1 and p = 2; for greater values of p, the deviation from the exact distribution is controlled by calculating the residual terms of the series. Critical values of Box's test statistic were tabulated by Korin (1968) for relatively small values of $p \le 6$ variables, $k \le 10$ groups, and (equal) sample sizes of $n \le 21$. For larger values of p, k, and/or n, an F-approximation was suggested, but its accuracy is known only for small values of p and k. The aim of this paper is to calculate the series solution for larger values of p and k in order to find the minimum group sample sizes necessary for an adequate F-approximation.
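For readers who want to see what the F-approximation involves, the following Python sketch converts a given value of Box's statistic M into an approximate F statistic with its degrees of freedom. The note itself does not print these formulas; the sketch follows the form commonly given in multivariate textbooks, and the function name `box_f_approximation` and its argument names are illustrative assumptions, not part of the original FORTRAN program.

```python
def box_f_approximation(m_stat, p, group_sizes):
    """Return (F, df1, df2) for Box's M statistic (textbook form, assumed here).

    m_stat      : M = (N - k) ln|S_pooled| - sum_i (n_i - 1) ln|S_i|,
                  computed from unbiased covariance estimates
    p           : number of variates
    group_sizes : list with the k sample sizes n_i
    """
    k = len(group_sizes)
    if k < 2:
        raise ValueError("at least two groups are required")
    nu = [n - 1 for n in group_sizes]          # per-group degrees of freedom
    nu_total = sum(nu)

    # Scaling constants of the approximation
    sum_inv = sum(1.0 / v for v in nu) - 1.0 / nu_total
    sum_inv_sq = sum(1.0 / v**2 for v in nu) - 1.0 / nu_total**2
    c1 = sum_inv * (2 * p * p + 3 * p - 1) / (6.0 * (p + 1) * (k - 1))
    c2 = sum_inv_sq * (p - 1) * (p + 2) / (6.0 * (k - 1))

    df1 = p * (p + 1) * (k - 1) / 2.0
    # Note: the degenerate case c2 == c1**2 is not handled in this sketch.
    df2 = (df1 + 2.0) / abs(c2 - c1**2)

    if c2 > c1**2:
        f_stat = m_stat * (1.0 - c1 - df1 / df2) / df1
    else:
        b2 = (1.0 - c1 + 2.0 / df2) / df2
        f_stat = df2 * b2 * m_stat / (df1 * (1.0 - b2 * m_stat))
    return f_stat, df1, df2


if __name__ == "__main__":
    # Hypothetical input: M = 10 with p = 5 variates and two groups of 14
    print(box_f_approximation(10.0, 5, [14, 14]))
```

The approximate p-value is then the upper tail of an F distribution with (df1, df2) degrees of freedom, for example via `scipy.stats.f.sf(F, df1, df2)` if SciPy is available.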

A FORTRAN program calculated the critical value of Box's statistic from the F-approximation ($\alpha = .05$) and then determined the corresponding significance level $\hat{\alpha}$ of the series solution. The series were limited to 30 terms (the required Bernoulli polynomials were calculated recursively). An (arbitrary but suggestive) cutoff point of $\hat{\alpha} \le .051$ was chosen, for which the minimum sample sizes n were calculated for p = 5 to 30 in steps of 5, and k = 2, 10, and 20. Using these values, an equation was derived by least squares to estimate the necessary n (given p and k):

$$n(p, k) \ge 1 + 0.0163p^2 + 2.7265p - 1.4182p^{0.5} + 0.235p^{1.4} \cdot \ln(k), \qquad 5 \le p \le 30,\ 2 \le k \le 20. \tag{1}$$
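Equation (1) is easy to evaluate directly. The short Python sketch below returns the smallest integer group size that satisfies it; the function name `min_group_size` and the choice to raise an error outside the fitted range are assumptions made for illustration.

```python
import math


def min_group_size(p: int, k: int) -> int:
    """Smallest integer n satisfying equation (1) for p variates and k groups."""
    # Equation (1) was fitted only on this range, so extrapolation is refused.
    if not (5 <= p <= 30 and 2 <= k <= 20):
        raise ValueError("equation (1) covers only 5 <= p <= 30 and 2 <= k <= 20")
    bound = (1
             + 0.0163 * p**2
             + 2.7265 * p
             - 1.4182 * p**0.5
             + 0.235 * p**1.4 * math.log(k))   # math.log is the natural log
    return math.ceil(bound)


if __name__ == "__main__":
    for p in (5, 10, 20, 30):
        print(p, [min_group_size(p, k) for k in (2, 10, 20)])
```

For p = 5 and k = 2 the right-hand side of (1) is about 13.4, so the sketch returns 14 cases per group.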

Now, for all p = 5 to 30, all k = 2 to 20, $\alpha$ = .01, .05, .10, .20, and .50, and n calculated from (1), the series solution was computed. The values of $\hat{\alpha}$ were close to the $\alpha$'s throughout: for $\alpha = .01$ we found $\hat{\alpha} \in (.0103, .0104)$; for $\alpha = .05$, $\hat{\alpha} \in (.0508, .0510)$; for $\alpha = .10$, $\hat{\alpha} \in (.1017, .1021)$; for $\alpha = .20$, $\hat{\alpha} \in (.2025, .2033)$; and for $\alpha = .50$, $\hat{\alpha} \in (.5030, .5044)$.

This research was supported by the Deutsche Forschungsgemeinschaft through Ste 405/2-1. Requests for reprints should be sent to Friedrich Foerster, Forschungsgruppe Psychophysiologie, Universität Freiburg, Belfortstr. 20, D-7800 Freiburg, F.R.G.

0033-3123/90/1200-8911$00.75/0 © 1990 The Psychometric Society

Although Box's series solution can be carried out today with digital computers even for large values of p, k, and n, in practice one will fall back upon the much easier (but liberal) F-approximation in spite of a certain loss of accuracy. However, to avoid excessively large errors in $\alpha$, normality must be assured, and sample sizes should be equal and not too small. We suggest minimum sample sizes as given by (1).
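How liberal the F-approximation is at the suggested sample sizes can be read off the reported ranges of the series-based significance level. The Python sketch below expresses each range as a relative excess over the nominal level; the dictionary simply restates the values reported above, and the names `REPORTED_RANGES`, `relative_excess`, and `summarize` are illustrative assumptions.

```python
# Nominal level -> (lowest, highest) series-based level reported in the note
REPORTED_RANGES = {
    0.01: (0.0103, 0.0104),
    0.05: (0.0508, 0.0510),
    0.10: (0.1017, 0.1021),
    0.20: (0.2025, 0.2033),
    0.50: (0.5030, 0.5044),
}


def relative_excess(nominal: float, actual: float) -> float:
    """Excess of the actual over the nominal level, as a fraction of the nominal level."""
    return (actual - nominal) / nominal


def summarize(ranges=REPORTED_RANGES):
    """Map each nominal level to its (lowest, highest) relative excess."""
    return {
        nominal: (relative_excess(nominal, low), relative_excess(nominal, high))
        for nominal, (low, high) in ranges.items()
    }


if __name__ == "__main__":
    for nominal, (low, high) in summarize().items():
        print(f"alpha = {nominal:.2f}: excess between {low:.1%} and {high:.1%}")
```

Every range lies above its nominal level, which is what "liberal" means here, and the relative excess is largest for the smallest nominal level.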

References

Box, G. E. P. (1949). A general distribution theory for a class of likelihood criteria. Biometrika, 36, 317-346.

Korin, B. P. (1968). On the distribution of a statistic used for testing a covariance matrix. Biometrika, 55, 171-178.

Tiku, M. L., & Balakrishnan, N. (1985). Testing the equality of variance-covariance matrices the robust way. Communications in Statistics: Theory and Methods, 13, 3033-3051.

Manuscript received 5/2/89 Final version received 11/13/89