

The Canadian Journal of Statistics, Vol. 19, No. 2, 1991, Pages 209-217
La Revue Canadienne de Statistique

On the asymptotics of randomness statistics D. McDONALD*

University of Ottawa

Key words and phrases: Asymptotics, q-sample, Cramér-von Mises, nonparametric statistics.

AMS 1985 subject classifications: Primary 60F05; secondary 62G10.

ABSTRACT

Kiefer (1959) studied the asymptotics of q-sample Cramér-von Mises nonparametric statistics when q is fixed and the sample sizes tend to infinity. Here we prove the asymptotic normality of such statistics when the sample sizes stay fixed or small while the number of samples, q, becomes large.

RÉSUMÉ

Le comportement asymptotique des statistiques non-paramétriques de Cramér-von Mises pour q échantillons a été étudié par Kiefer (1959) dans le cas où q demeure fixe et les tailles échantillonnales deviennent grandes. Ici on obtient la normalité asymptotique de ces statistiques lorsque q tend vers l'infini mais les tailles échantillonnales demeurent petites.

1. INTRODUCTION

During the startup phase of a production process there may be no specified or nominal control value for quality measurements, nor any a priori information on the distribution of these measurements. While statistics on the product quality are being collected it is useful to establish that the process is in control. Small samples are taken periodically, and a process is in control if all the observations are deemed to be independent and identically distributed. The first step is to implement a sequential and necessarily nonparametric quality-control procedure to detect an out-of-control situation. Reynolds (1972) proposed a CUSUM (cumulative sum) procedure based on the signed sequential ranks of the observations. A procedure based on the sequential ranks was analyzed in McDonald (1985, 1987). Next, after enough statistics have been collected without an out-of-control signal, one could implement a parametric quality-control procedure or, as in Park and Reynolds (1987), use the past values as a standard sample. In either case it is necessary to do retrospective tests on the past measurements to be very sure the process was always in control. This is a major application of the results in this paper.

Suppose small samples of sizes $\{n(i)\}_{i=1}^{q}$ are taken periodically for q periods for a total of $N = \sum_{i=1}^{q} n(i)$ observations. Let $F^i$ represent the empirical distribution function of the ith sample. Let $F$ represent the empirical distribution function of all the observations. Lehmann (1951) proposed statistics of the form

$$\sum_{i=1}^{q} \int_{-\infty}^{\infty} \{F^i(s) - F(s)\}^2 \, dF(s). \tag{1}$$
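As a rough illustration of the statistic above, the following Python sketch evaluates it for a handful of small samples. It relies on the fact that the pooled empirical distribution function places mass 1/N on each observation, so the integral is a finite average. The function names `ecdf` and `lehmann_statistic` are hypothetical and not taken from the paper.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical distribution function of `sample` as a callable."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda s: bisect_right(ordered, s) / n


def lehmann_statistic(samples):
    """Sum over samples of the integral of (F^i - F)^2 with respect to dF.

    F is the pooled empirical distribution function, so dF puts mass 1/N on
    each pooled observation and each integral is an average over the pool.
    """
    pooled = [x for sample in samples for x in sample]
    big_n = len(pooled)
    pooled_cdf = ecdf(pooled)
    total = 0.0
    for sample in samples:
        sample_cdf = ecdf(sample)
        total += sum((sample_cdf(s) - pooled_cdf(s)) ** 2 for s in pooled) / big_n
    return total


# Two separated samples give a larger value than two interleaved ones.
print(lehmann_statistic([[1, 2], [3, 4]]))
print(lehmann_statistic([[1, 3], [2, 4]]))
```

Separated samples should score higher than interleaved ones, which is the behaviour a test of a common sampling distribution needs.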

*Research supported in part by Natural Sciences and Engineering Research Council of Canada grant A4551.


He used this statistic to test the hypothesis that the samples are independent and that the sampling distribution, F say, is the same for each sample. The asymptotics of such nonparametric q-sample Cramér-von Mises statistics were studied in Kiefer (1959). The emphasis, however, is on the case where $n(i) \to \infty$ while q stays fixed. In the above quality-control situation, where $q \to \infty$ while $n(i)$ stays fixed, the asymptotics were studied in Chouinard and McDonald (1985a) in the special case of the $\mathcal{V}$-statistic:

8 = 2 1: (n(i) F'(s) - n(i) + 1 N + l

i= I

One aspect of retrospective testing is the detection of a change point, that is, a point in time where the sampling distribution changes. Typically one wishes to contrast the observations after the change point with those prior to it; that is, one views the data as a two-sample problem. Nonparametric tests were proposed by Pettitt (1979). For recent results and a general review see Csörgő and Horváth (1988). The family of regular randomness statistics proposed below, which includes Lehmann's statistic and $\mathcal{V}$, is fairly robust for detecting a change in the sampling distribution but is not as powerful as the change-point statistics precisely when the alternative is a change point. On the other hand, these statistics are particularly powerful when dependencies exist within and between the small samples (see McDonald 1989).

A second application of our results is a test for randomness. The $\mathcal{V}$-statistic was inspired by the statistic (1) and was studied in Chouinard and McDonald (1985b) and proposed as a test for randomness: the rankings $R_i = \{R_{i(j)}/(N+1) : j = 1, \dots, n(i)\}$ of the $n(i)$ points of the ith realization when ranked among the N points of q realizations are randomly distributed among $\{1, \dots, N\}$. Here $R_{i(j)}$ denotes the rank of the jth largest of the ith realization. One may easily integrate to check that

Now among the class of simple point processes on the line, Poisson processes, whether homogeneous or nonhomogeneous, are characterized by the property that for all q

(a) $\{n(i)\}_{i=1}^{q}$ are i.i.d. Poisson random variables, and
(b) the $\{R_{i(j)} : i = 1, \dots, q;\ j = 1, \dots, n(i)\}$ satisfy randomness.

This characterization and the $\mathcal{V}$-statistic provide a practical test for the class of (nonhomogeneous) Poisson processes, and the asymptotic normality of $\mathcal{V}$ under randomness is used to set the level of the test. This test was extended to spatial point processes in McDonald (1989).

The proof of the asymptotic normality of the $\mathcal{V}$-statistic given in Chouinard and McDonald (1985a) was extremely tedious. Here we prove asymptotic normality for the family of regular randomness statistics defined below. We say regular because of the smoothness conditions imposed on the kernel. These smoothness conditions seem too strong, and one might hope to prove asymptotic normality without them.

2. DEFINITIONS AND MAIN RESULTS

DEFINITION 2.1. We say $S_q$ is a regular randomness statistic with kernel $k_q$ if it is of the form

$$S_q = \sum_{i=1}^{q} k_q(N, i, R_i)$$


and if for all positive integers $\{N, q, i\}$ and for all vectors $z_i = (z_{i1}, \dots, z_{ij}, \dots, z_{in(i)})$ with components in [0, 1]

and

for all vectors $(v_1, \dots, v_{n(i)})$ and $(w_1, \dots, w_{n(i)})$ [that is, the Hessian of $k_q(N, i, z_i)$ is a linear form uniformly bounded in i].

We remark that $\mathcal{V}$ is of this form, as is, after integration, N times the statistic (1).

To establish asymptotic results under the null hypothesis of i.i.d. observations, we first consider a sequence $\{n(i)\}_{i=1}^{\infty}$, and we define a family $\{U_{ij} : j = 1, \dots, n(i),\ i = 1, \dots, \infty\}$ of independent random variables which, without loss of generality, we may assume to be uniform on [0, 1] having distribution function F. Denote the jth largest of the ith sample by $U_{i(j)}$ and let $U_i$ denote the vector $\{U_{i(j)} : j = 1, \dots, n(i)\}$. The ranks $\{R_{i(j)} : j = 1, \dots, n(i),\ i = 1, \dots, q\}$ are the ranks associated with the first q samples and hence change with q, but we do not make this dependence explicit. We establish the following.

THEOREM 2.2. If $S_q$ is a regular randomness statistic and if

then $(S_q - E S_q)/\sqrt{\mathrm{Var}\, S_q} \Rightarrow Z(0,1)$, where $Z(0,1)$ indicates a standard normal random variable.

After the proof we give examples showing the first two conditions are independent. In general the third condition is hard to check, but in special cases we can verify it. Note, however, that $k_q(N, i, R_i) = \sum_{j=1}^{n(i)} R_{i(j)}$ is a regular randomness statistic, but $S_q$ collapses to a constant, so the third condition fails.
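The degenerate example can be checked numerically. With the sum-of-ranks kernel, $S_q$ adds up every pooled rank exactly once, so it always equals $N(N+1)/2$ whatever the data. The sketch below is only an illustration: the sample sizes and the helper names `pooled_ranks` and `sum_of_ranks_statistic` are hypothetical, and ties are assumed not to occur.

```python
import random


def pooled_ranks(samples):
    """Rank every observation among the pooled data (1 = smallest), per sample."""
    pooled = sorted(x for sample in samples for x in sample)
    rank_of = {x: r for r, x in enumerate(pooled, start=1)}  # assumes no ties
    return [[rank_of[x] for x in sample] for sample in samples]


def sum_of_ranks_statistic(samples):
    """S_q for the kernel k_q(N, i, R_i) = sum over j of R_i(j)."""
    return sum(sum(ranks) for ranks in pooled_ranks(samples))


rng = random.Random(0)
sizes = [2, 3, 1, 4]          # hypothetical sample sizes n(i)
big_n = sum(sizes)

# Collect the distinct values of S_q over many independent data sets.
values = {
    sum_of_ranks_statistic([[rng.random() for _ in range(n)] for n in sizes])
    for _ in range(200)
}
print(values, big_n * (big_n + 1) // 2)
```

Only one value ever appears, so the variance of this $S_q$ is zero and no variance condition can hold.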

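COROLLARY 2.3. If

$$\liminf_{q\to\infty} \frac{1}{q} \sum_{i=1}^{q} I\{n(i) \ge 2\} > 0, \qquad \lim_{q\to\infty} \frac{1}{N} \max_{i \in \{1,\dots,q\}} n(i) = 0, \qquad \limsup_{q\to\infty} \frac{N(q)}{q} < \infty,$$

then $(\mathcal{V}_q - E\,\mathcal{V}_q)/\sqrt{\mathrm{Var}\,\mathcal{V}_q} \Rightarrow Z(0,1)$.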

3. PROOFS

We first define the statistic

$$T_q = \sum_{i=1}^{q} k_q(N, i, U_i)$$

obtained by substituting $U_i$ for $R_i$ in $S_q$. By Taylor's formula we remark that

(",(i) - 2) ( U;(m) - $)


for some stochastic 0 where

By Definition 2.1 the remainder term is bounded by

Now,

4 =EX i= I

k ( N - k + 1). N

= c ( N + 2 ) ( N + 1)2 ' & = I

Hence (2) is $O(1)$, so if $\mathrm{Var}\, S_q \to \infty$ as $q \to \infty$, we see that the remainder term is negligible asymptotically. It follows that $T_q$ is asymptotically equivalent to $S_q + A_q$.
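The boundedness claim can be checked exactly under one reading of the damaged display: that its right-hand side is the sum over k of $k(N-k+1)/\{(N+2)(N+1)^2\}$, the variances of the order statistics of N i.i.d. uniforms. This reading is an assumption. Under it the sum equals $N/\{6(N+1)\}$, which never exceeds 1/6. The function names below are hypothetical.

```python
from fractions import Fraction


def order_statistic_variance(k, n):
    """Variance of the k-th smallest of n i.i.d. uniform(0, 1) variables."""
    return Fraction(k * (n - k + 1), (n + 2) * (n + 1) ** 2)


def total_variance(n):
    """Sum of the variances of all n uniform order statistics."""
    return sum(order_statistic_variance(k, n) for k in range(1, n + 1))


for n in (1, 10, 100, 1000):
    # Exact rational arithmetic: the two printed values should agree.
    print(n, total_variance(n), Fraction(n, 6 * (n + 1)))
```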

The main idea of the proof is to show that as $q \to \infty$, $S_q$ and $A_q$ suitably standardized are asymptotically independent and that $A_q$ standardized converges in distribution to a standard normal. It is easy to check that $T_q$ standardized converges to a normal, so it will follow that $S_q$ standardized must converge in distribution to a normal.

We start with the asymptotics of $A_q$.

PROPOSITION 3.1. The conditional distribution of $A_q$ given the ranks $\{R_{i(j)} : j = 1, \dots, n(i),\ i = 1, \dots, q\}$ has mean 0 and conditional variance $V_q$ satisfying

x K(""'i k)}, (3) N + I ' N + l

where $K(u, v) = \min[u, v] - uv$. If, moreover, the hypotheses of Theorem 2.2 hold and if $\liminf_{q\to\infty} (\mathrm{Var}\, A_q)/N > 0$, then $V_q/\mathrm{Var}\, A_q$ converges almost surely to 1 as $q \to \infty$ (and $\liminf V_q/N > 0$ a.s.).

Proof. Conditional on the ranks $\{R_{i(j)} = r_{i(j)} : j = 1, \dots, n(i),\ i = 1, \dots, q\}$, the $\{U_{i(j)}\}$ are distributed like the order statistics of N i.i.d. uniforms. That is, conditionally, the distribution of $U_{i(j)}$ is that of $W_{(r_{i(j)})}$, the $r_{i(j)}$th order statistic of N i.i.d. uniforms. Since $W_{(r_{i(j)})}$ has mean $r_{i(j)}/(N+1)$, it follows, after reordering, that given the ranks, $A_q$ is of the form

$$\sum_{k=1}^{N} c_{kN}\,\bigl(W_{(k)} - E\,W_{(k)}\bigr). \tag{4}$$


Statistics of this form are discussed in Stigler (1969). Equation (3) follows, since

$$\mathrm{Cov}\bigl(W_{(k)}, W_{(m)}\bigr) = \frac{1}{N+2}\left(\min\left\{\frac{k}{N+1}, \frac{m}{N+1}\right\} - \frac{k}{N+1}\cdot\frac{m}{N+1}\right). \tag{5}$$
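The mean $k/(N+1)$ and the covariance (5) of uniform order statistics can be checked by simulation. The following Monte Carlo sketch uses only the Python standard library. The choices N = 5, 20000 replications, and the seed are arbitrary, and the function names are hypothetical.

```python
import random


def simulate_order_statistics(n, reps, seed=0):
    """Draw `reps` sorted samples of n i.i.d. uniform(0, 1) variables."""
    rng = random.Random(seed)
    return [sorted(rng.random() for _ in range(n)) for _ in range(reps)]


def empirical_mean(draws, k):
    """Sample mean of the k-th order statistic (k is 1-based)."""
    return sum(d[k - 1] for d in draws) / len(draws)


def empirical_cov(draws, k, m):
    """Sample covariance of the k-th and m-th order statistics."""
    mean_k = empirical_mean(draws, k)
    mean_m = empirical_mean(draws, m)
    return sum((d[k - 1] - mean_k) * (d[m - 1] - mean_m) for d in draws) / len(draws)


def kernel(u, v):
    """K(u, v) = min[u, v] - uv."""
    return min(u, v) - u * v


def theoretical_cov(k, m, n):
    """Right-hand side of (5): K(k/(n+1), m/(n+1)) / (n+2)."""
    return kernel(k / (n + 1), m / (n + 1)) / (n + 2)


draws = simulate_order_statistics(n=5, reps=20000)
print(empirical_mean(draws, 2), 2 / 6)
print(empirical_cov(draws, 2, 4), theoretical_cov(2, 4, 5))
```

The simulated and theoretical values should agree up to Monte Carlo error.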

Next, $(1/N)V_q$ is obviously asymptotic to

Now

by the Glivenko-Cantelli theorem. Moreover $\frac{\partial k_q}{\partial z_{ij}}(N, i, z_i)$ is bounded uniformly for $\{j = 1, \dots, n(i),\ i = 1, \dots, q\}$ and for vectors $z_i = (z_{i1}, \dots, z_{in(i)})$ having components in [0, 1], so $(1/N)V_q$ is also asymptotic to

using the uniform continuity of $K$, since each of the $N^2$ terms of the difference tends uniformly to 0. Next, using the condition in Definition 2.1, we have (for some stochastic

$$\le K_2\, \frac{1}{N^2} \sum_{i=1}^{q} N\, n(i)\, o(1) = K_2\, o(1),$$

so the above converges to 0 almost surely. The final replacement of $R_i$ by $U_i$ follows in a similar fashion, so we conclude that (6) has the same asymptotics as


Since the rows $\{U_{i(j)} : j = 1, \dots, n(i)\}$ are independent, if we subtract the expected value of (7) and take the fourth moment, terms involving more than six different rows have expectation 0. This centered fourth moment of (7) then comprises at most $O(N^6)$ terms, which are uniformly bounded, divided by $N^8$. It follows that the expression (7) centered by its expected value converges almost surely to 0. Since $E V_q = \mathrm{Var}\, A_q$ and $\liminf_{q\to\infty} (\mathrm{Var}\, A_q)/N > 0$ by hypothesis, it follows that $V_q/E V_q$ converges almost surely to 1 and $\liminf V_q/N > 0$. Q.E.D.

PROPOSITION 3.2. If $\liminf_{q\to\infty} (\mathrm{Var}\, A_q)/N > 0$, then the conditional distribution of $A_q/\sqrt{\mathrm{Var}\, A_q}$ given the ranks $\{R_{i(j)} : j = 1, \dots, n(i),\ i = 1, \dots, q\}$ converges weakly to a standard normal distribution which is asymptotically independent of $S_q$. More precisely, if we pick a subsequence so that the pair

$$\left(\frac{S_q - E S_q}{\sqrt{\mathrm{Var}\, S_q}},\ \frac{A_q}{\sqrt{\mathrm{Var}\, A_q}}\right)$$

converges in distribution to a pair of variables $S, Z$, then the conditional distribution of $Z$ given $S$ is a standard normal: that is, $Z$ is independent of $S$.

Proof. Apply Corollary 4.1 in Stigler (1969) to the representation (4) with $b_N = (\log N)^2$. Since $|c_{kN}| \le K_1$ for all k, condition (b) there follows. By hypothesis and Proposition 3.1 we have that $\liminf V_q/N > 0$. Hence since

.. - ,. ij=bw

it follows, using the uniform boundedness of the $c_{kN}$'s, that there exists a $C$ such that (c) in Stigler's Corollary 4.1 holds. The asymptotic normality of $A_q/\sqrt{V_q}$ as $q \to \infty$ now follows from that corollary. By Proposition 3.1 the conditional distribution of $A_q/\sqrt{\mathrm{Var}\, A_q}$ also converges weakly to a standard normal; that is, the distribution of $A_q/\sqrt{\mathrm{Var}\, A_q}$ is (asymptotically) independent of the ranks and hence of $S_q$, since $S_q$ is a function of the ranks. Q.E.D.

PROPOSITION 3.3. If $\lim_{q\to\infty} (1/N) \max_{i=1,\dots,q} n(i) = 0$ and $\liminf_{q\to\infty} (\mathrm{Var}\, T_q)/N > 0$, then we have

$$\frac{T_q - E\,T_q}{\sqrt{\mathrm{Var}\, T_q}} \Rightarrow Z(0,1).$$

Proof. Since the rows $\{U_{i(j)} : j = 1, \dots, n(i)\}$ are independent, it suffices to verify the Lindeberg condition for the central limit theorem. By hypothesis, $\liminf_{q\to\infty} (\mathrm{Var}\, T_q)/N > 0$ and $|k_q(N, i, U_i) - E\,k_q(N, i, U_i)| \le 2 n(i) K_1$ by Definition 2.1. The Lindeberg condition follows, since

$$\lim_{q\to\infty} \frac{1}{N} \max_{i=1,\dots,q} n(i) = 0.$$

Q.E.D.

LEMMA 3.4. Let $\{\bar T_q, \bar S_q, \bar A_q, \bar L_q\}_{q=1}^{\infty}$ be such that:

(a) $\bar T_q = \bar S_q + \bar A_q + \bar L_q$, where $\bar T_q$, $\bar S_q$, $\bar A_q$, and $\bar L_q$ have expectation 0.


(b) Along every weakly convergent subsequence the joint law of

converges weakly to the joint distribution of two independent random variables S, Z, where Z is a standard normal ( S may, in principle, depend on the subsequence).

(c) One has

(d) $\bar T_q/\sqrt{\mathrm{Var}\, \bar T_q}$ converges weakly to a standard normal.

Then S is a standard normal distribution.

Proof. Consider any subsequence of q's (the index is not made explicit) such that

Let $d_q = \sqrt{\mathrm{Var}\, \bar A_q}/\sqrt{\mathrm{Var}\, \bar S_q}$ and $t_q = \sqrt{\mathrm{Var}\, \bar T_q}/\sqrt{\mathrm{Var}\, \bar S_q}$. Denote the characteristic function of $\bar T_q/\sqrt{\mathrm{Var}\, \bar T_q}$ by $\gamma_q$. Since $\bar T_q$ is a differentiable function of the order statistics of independent uniforms, it has a density, so $\lim_{|u|\to\infty} \gamma_q(u) = 0$ by the Riemann-Lebesgue lemma. By (d), $|\gamma_q(u) - \exp(-u^2/2)| \to 0$ as $q \to \infty$ for u in a compact interval containing 0, and hence, by the preceding, it does so uniformly for all u. We conclude $|\gamma_q(t_q u) - \exp(-t_q^2 u^2/2)| \to 0$ uniformly in u as $q \to \infty$. Similarly $|E \exp(i u\, d_q \bar A_q/\sqrt{\mathrm{Var}\, \bar A_q}) - \exp(-d_q^2 u^2/2)| \to 0$ uniformly in u as $q \to \infty$. Next,

/yq(tqu) - Ee'USe-'/24u2 I

I) - , ~ ~ i u S ~ - 1 / 2 d ~ u ~

which tends to 0 by (b) and (c), where we note that (c) implies the convergence in probability of $\bar L_q/\sqrt{\mathrm{Var}\, \bar S_q}$. We conclude that

$$\left|E e^{iuS} - \exp\{-\tfrac{1}{2}(t_q^2 - d_q^2)u^2\}\right| \to 0$$

uniformly in bounded u. Since $S$ has unit variance and zero mean, we conclude that $S$ is standard normal (and $t_q^2 - d_q^2 \to 1$). Since this holds regardless of the subsequence taken, the theorem follows. Q.E.D.

Proof of Theorem 2.2. We have already shown that if $\mathrm{Var}\, S_q \to \infty$ then $T_q$ is asymptotically equivalent to $S_q + A_q$.

First suppose $\liminf_{q\to\infty} (\mathrm{Var}\, A_q)/N > 0$. Since asymptotically $(\mathrm{Var}\, T_q)/N = \{\mathrm{Var}(S_q + A_q)\}/N$ and since $S_q$ is asymptotically independent of $A_q$, we have $\liminf_{q\to\infty} (\mathrm{Var}\, T_q)/N \ge \liminf_{q\to\infty} (\mathrm{Var}\, S_q)/N$. Moreover, since $\liminf_{q\to\infty} (\mathrm{Var}\, S_q)/N > 0$, we have $\liminf_{q\to\infty} (\mathrm{Var}\, T_q)/N > 0$. Setting $\bar T_q = T_q - E\,T_q$, $\bar S_q = S_q - E\,S_q$, and $\bar L_q = L_q - E\,L_q$, we see by Propositions 3.3 and 3.2 that the hypotheses of Lemma 3.4 are satisfied. The result then follows.

If, on the other hand, there is a subsequence such that $\lim_{q\to\infty} (\mathrm{Var}\, A_q)/N = 0$, then $\bar S_q/\sqrt{\mathrm{Var}\, S_q}$ is asymptotically equal to $t_q \bar T_q/\sqrt{\mathrm{Var}\, T_q}$, which again is asymptotically normal. Q.E.D.

Note that the first two conditions of the theorem are independent. If $n(i) = i$, then the first condition holds but the second fails. If, however, $n(i) = 2$ except for integers i such that $i = 2^k$, $k = 1, 2, \dots$, where $n(i) = i$, then $N(2^k) = 2 \times 2^k - 2k + \sum_{j=1}^{k} 2^j = 4 \times 2^k - 2k - 2$, so the second condition holds; but since $n(2^k)/N(2^k) \to 1/4$, the first condition fails.
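The two sequences in this note can be tabulated directly. The sketch below assumes, as Corollary 2.3 suggests, that the first condition is $\max_i n(i)/N \to 0$ and the second is $\limsup N(q)/q < \infty$. The function names are hypothetical.

```python
def is_power_of_two(i):
    """True for i = 2^k with k = 1, 2, ... (so i = 1 is excluded)."""
    return i >= 2 and (i & (i - 1)) == 0


def n_linear(i):
    """First example: n(i) = i."""
    return i


def n_mostly_two(i):
    """Second example: n(i) = 2 except n(i) = i when i is a power of two."""
    return i if is_power_of_two(i) else 2


def total_size(size_fn, q):
    """N(q), the total number of observations in the first q samples."""
    return sum(size_fn(i) for i in range(1, q + 1))


def ratios(size_fn, q):
    """Return (max n(i) / N, N / q) for the first q samples."""
    sizes = [size_fn(i) for i in range(1, q + 1)]
    big_n = sum(sizes)
    return max(sizes) / big_n, big_n / q


for k in (4, 8, 12, 16):
    q = 2 ** k
    print(q, ratios(n_linear, q), ratios(n_mostly_two, q))
```

For $n(i) = i$ the first ratio shrinks while $N/q$ grows without bound. For the second sequence $N/q$ stays below 4 while the first ratio approaches 1/4 along $q = 2^k$.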

Proof of Corollary 2.3. Take

From Theorem 2 in Chouinard and McDonald (1985b) we have

Vur % lim inf -

-00 9

But if

then 1 - € €

n(i) + 1 - 2 3 <- + -. N 1 , 1

4 P O o 4 i = ,

lim inf - 2 1 + e , lim inf - C - Moreover.

so Var % 2 i ( 1 - 2 (1 - & ) (T) - >o.

45 lim inf - *+m 4

The result now follows from Theorem 2.2, since the other hypotheses are clearly satisfied. Q.E.D.

For the $\mathcal{V}$-statistic exact means and variances have been calculated by painful combinatorial methods (see Chouinard and McDonald 1985a). The expected value and the variance of $T_q$ are easier to calculate than those of $S_q$, since for each independent realization i the $\{U_{i(j)} : j = 1, \dots, n(i)\}$ are the order statistics of independent uniforms. By the proof of Theorem 2.2 we have $(E\,T_q - E\,S_q)/\sqrt{\mathrm{Var}\, S_q} \to 0$, since $\mathrm{Var}\, S_q \to \infty$ as $q \to \infty$. Hence, under the hypotheses of Theorem 2.2, it follows using Slutsky's lemma that

S, - E T9 =+ Z ( 0 , 1). JW


Kilani Ghoudi (1990) in his M.Sc. Thesis used the software package Maple (see Char et al. 1985) to symbolically evaluate the mean and variance of Tq.

ACKNOWLEDGEMENT

I gratefully acknowledge an off-the-cuff remark by Harry Kesten which is essential for proving the asymptotics. Thanks also to Kilani Ghoudi for several corrections. Finally I thank the referee for several insightful simplifications and generalizations as well as for corrections and improvements in style.

REFERENCES

Char, B., Geddes, K., Gonnet, G., and Watt, S. (1985). Maple User's Guide. Fourth Edition. WATCOM Publications, 310 pp.

Chouinard, A., and McDonald, D. (1985a). Limit theorems for a rank test measuring randomness. Technical Report No. 51, Laboratory for Research in Statistics and Probability, Carleton University, 34 pp.

Chouinard, A., and McDonald, D. (1985b). A characterization of non-homogeneous Poisson processes. Stochastics, 15, 113-119.

Csörgő, M., and Horváth, L. (1988). Nonparametric Methods of Changepoint Problems. Handbook of Statistics, 7. Elsevier Science, 403-425.

Ghoudi, K. (1990). Multivariate non-parametric quality control statistics. M.Sc. Thesis, University of Ottawa, 69 pp.

Kiefer, J. (1959). K-sample analogues of the Kolmogorov-Smirnov and Cramér-v. Mises tests. Ann. Math. Statist., 30, 420-447.

Lehmann, E.L. (1951). Consistency and unbiasedness of certain nonparametric tests. Ann. Math. Statist., 22, 165-179.

McDonald, D. (1985). A cusum procedure based on sequential ranks. Technical Report No. 78, Laboratory for Research in Statistics and Probability, Carleton University, 16 pp.

McDonald, D. (1990). A cusum procedure based on sequential ranks. Naval Res. Logist., 37, 627-646.

McDonald, D. (1989). On nonhomogeneous, spatial Poisson processes. Canad. J. Statist., 17(2), 183-195.

Park, C., and Reynolds, M.R. (1987). Nonparametric procedures for monitoring a location parameter based on linear placement statistics. Sequential Anal., 6(4), 303-323.

Pettitt, A.N. (1979). A non-parametric approach to the change-point problem. Appl. Statist., 28(2), 126-135.

Reynolds, M.R. (1972). A sequential nonparametric test for symmetry with applications to process control. Technical Report No. 148, Stanford University.

Stigler, S.M. (1969). Linear functions of order statistics. Ann. Math. Statist., 40(3), 770-788.

Received 29 March 1989
Revised 15 May 1990
Accepted 15 November 1990

Department of Mathematics
University of Ottawa
585 King Edward Avenue
Ottawa, Ontario K1N 6N5