
TESTS OF GOODNESS OF FIT BASED ON THE L2-WASSERSTEIN DISTANCE*

E. del Barrio, C. Matrán, J. Rodríguez-Rodríguez
Departamento de Estadística e Investigación Operativa, Universidad de Valladolid, Spain

J.A. Cuesta-Albertos
Departamento de Matemáticas, Estadística y Computación, Universidad de Cantabria, Spain

Abstract

Given $P_1$ and $P_2$ in the set $\mathcal{P}_2(\mathbb{R})$ of probabilities on the line with finite second order moment, the $L_2$-Wasserstein distance between $P_1$ and $P_2$ is defined as the lowest $L_2$-distance between random variables with these distribution laws. When $P \in \mathcal{P}_2(\mathbb{R})$ has variance $\sigma_0^2$, and $\mathcal{H}$ is a location and scale family also in $\mathcal{P}_2(\mathbb{R})$, the ratio $(\sigma_0^2)^{-1}\inf_{Q\in\mathcal{H}}\mathcal{W}^2(P,Q)$ does not depend on location or scale changes on $P$. Therefore it can be considered as a measure of dissimilarity between $P$ and $\mathcal{H}$. This dissimilarity is used to analyze goodness of fit, showing its connection with the so-called correlation tests. We also obtain, through approximations of the quantile process by Brownian bridges, the asymptotic behaviour of the sample version of this statistic, giving direct proofs which cover the family of tests that arose from the Shapiro-Wilk test of normality.

Key words and phrases: Wasserstein distance, correlation test, Shapiro-Wilk, goodness of fit, test of normality, quantile process, principal components decomposition, Brownian bridge, convergence of integrals, heavy tailed distributions.

A.M.S. 1991 subject classification: Primary: 62F05, 62E20. Secondary: 60F25.

1 Introduction

In this work we introduce a general framework to measure dissimilarities between probability measures on the line. Our approach is based on the Wasserstein distance which, although well known and widely used in probabilistic contexts (see e.g. [18], [9]), has not yet received adequate acknowledgement of its statistical merits.

* Research partially supported by DGICYT, grants PB95-0715-c02-00, 01, 02 and PAPIJCL VA08/97


The methodology consists of analyzing the distance between a fixed distribution and a location and scale family of probability distributions in $\mathbb{R}$. When applied to sample distributions this approach leads to natural estimators of location and scale, but also to natural $L_2$-based tests of goodness of fit. Our development will pay special attention to the behaviour for different kinds of tails in the distributions, including as key examples the uniform, normal, exponential and a heavier tailed law.

For probabilities $P_1$ and $P_2$ in $\mathcal{P}_2(\mathbb{R})$ (the set of probabilities on the line with finite second order moment), the ($L_2$-)Wasserstein distance between $P_1$ and $P_2$, $\mathcal{W}(P_1,P_2)$, is defined as the lowest $L_2$-distance between random variables (r.v.'s), defined on any probability space, with these distribution laws:

$$\mathcal{W}(P_1,P_2) := \inf\left\{ \left[E(X_1-X_2)^2\right]^{1/2} : \mathcal{L}(X_1)=P_1,\ \mathcal{L}(X_2)=P_2 \right\}.$$

Besides the intrinsic interest of $\mathcal{W}$ as a direct consequence of this definition, a main fact which makes $\mathcal{W}$ useful in statistics on the line (the multivariate setting is very different) is that it can be explicitly obtained in terms of quantile functions. If $F_1$ and $F_2$ are the distribution functions of $P_1$ and $P_2$ and $F_1^{-1}$ and $F_2^{-1}$ are the respective left continuous inverses, or quantile functions, then (see e.g. [28], [2])

$$\mathcal{W}(P_1,P_2) = \left( \int_0^1 \left(F_1^{-1}(t)-F_2^{-1}(t)\right)^2 dt \right)^{1/2} \qquad (1)$$

(recall that $F^{-1}$ is defined on $(0,1)$ by $F^{-1}(t)=\inf\{s: F(s)\ge t\}$ and, when considered as a r.v. defined on the unit interval, its distribution function is $F$).

The $L_2$-distance is related to several statistical procedures which range from the mean value to goodness of fit tests, like those proposed by Cramér-von Mises and Anderson-Darling. However the Wasserstein distance, although it constitutes the natural $L_2$-distance between probability measures, has not been explicitly used, to our best knowledge, in connection with goodness of fit.

Let $P$ be a probability measure in $\mathcal{P}_2(\mathbb{R})$ with distribution function $F$ and variance $\sigma_0^2$ and let $\mathcal{H}\subset\mathcal{P}_2(\mathbb{R})$ be a family of probabilities obtained from the probability $Q_0$ by changes in location and scale. We will assume $Q_0$ to be standardized. Then, if $H_0$ is the distribution function of $Q_0$, it turns out (see (6)) that

$$\mathcal{W}^2(P,\mathcal{H}) := \inf\left\{ \mathcal{W}^2(P,Q) : Q\in\mathcal{H} \right\} = \sigma_0^2 - \left(\int_0^1 F^{-1}(t)H_0^{-1}(t)\,dt\right)^2. \qquad (2)$$

In consequence, the value $\mathcal{W}^2(P,\mathcal{H})/\sigma_0^2$ does not depend on changes of location and scale on $P$ and can be considered as a measure of dissimilarity between $P$ and $\mathcal{H}$. Therefore, if $P_n$ is the sample probability measure based on a random sample of size $n$ from $P$, the statistic

$$R_n := \frac{\mathcal{W}^2(P_n,\mathcal{H})}{S_n^2},$$

where $S_n^2$ is the empirical variance, measures the dissimilarity between $P_n$ and $\mathcal{H}$, and it is natural to try to employ it to test the hypothesis $P\in\mathcal{H}$.

The purpose of this paper is to analyze this possibility through a unified treatment of $R_n$, based on approximations to quantile processes by Brownian bridges, $B(t)$, valid for different parent families $\mathcal{H}$. Our approach incorporates a trick which gives a sense to

$$\int_0^1 \left(B^2(t)-EB^2(t)\right)w(t)\,dt$$

even in some cases where $\int_0^1 t(1-t)w(t)\,dt=\infty$, thus avoiding the problem which arises when centering by divergent sequences is necessary. We also analyze the role of tests based on $R_n$ to test the hypothesis $P\in\mathcal{H}$ for heavy tailed distributions in $\mathcal{P}_2(\mathbb{R})$, showing their inadequacy, which follows easily from the fact that only the tails influence the asymptotic behaviour of the statistic.

However, notice that (2) relates $R_n$ to the so-called correlation tests, whose interest is largely motivated by the powerful Shapiro-Wilk test of normality [23]. This test is based on the statistic

$$W^{1/2} := \frac{X'V_0^{-1}m}{\left[\left(\sum_{i=1}^n \left(X_i-\bar X\right)^2\right) m'V_0^{-1}V_0^{-1}m\right]^{1/2}},$$

where $X=(X_{1n},X_{2n},\dots,X_{nn})$ is the ordered sample, and $m$ is the vector of means and $V_0$ the covariance matrix of the order statistics of a sample of size $n$ taken from a standard normal law.

The presence of $V_0^{-1}$ in this expression (for which only twenty years later Leslie [11] obtained reasonably accurate approximations) motivated successive redefinitions, such as that given by Shapiro and Francia (see [22]),

$$W^{*\,1/2} := \frac{X'm}{\left[\left(\sum_{i=1}^n \left(X_i-\bar X\right)^2\right) m'm\right]^{1/2}}.$$

Later, in [30], first assuming that $F=\Phi$, where $\Phi$ is the distribution function of the standard normal law, and starting from the consideration of the statistic


$$L_n^0 = \sum_{i=1}^n (X_{in}-H_{in})^2, \qquad (3)$$

and standardizing when it is only assumed that $P$ is normal, De Wet and Venter proposed the version

$$W^{**\,1/2} := \frac{X'H}{\left[\left(\sum_{i=1}^n \left(X_{in}-\bar X\right)^2\right) H'H\right]^{1/2}},$$

based on the vector $H=(H_{1n},\dots,H_{nn})$ of the quantiles of $\Phi$, $H_{in}=\Phi^{-1}\left(\frac{i}{n+1}\right)$. Their proof of the fact that the asymptotic law of $2n(1-W^{**\,1/2})-a_n$, for adequate centering constants $a_n$, is that of

$$U := \sum_{j=3}^{\infty} \frac{Y_j^2-1}{j} \qquad (4)$$

for a sequence, $\{Y_j\}_j$, of i.i.d. standard gaussian r.v.'s can be considered as the first formal result about the asymptotic behaviour of Shapiro-Wilk's test.

Also, in the context of normality plots, Brown and Hettmansperger [3] have considered the problem of a best choice for the plotting positions. They obtained precisely the same statistic $R_n$ and heuristically provided some connections with orthogonal decompositions through Hermite polynomials, giving additional support to previous considerations by Stephens in [26]. This approach will be wholly justified in our Theorems 8 and 9.

In fact the asymptotic equivalence of the previous statistics motivated an extensive literature (e.g. [24], [22], [30], [31], [20], [26], [19], [12], [29]), but the only direct proof of the asymptotic behaviour of one of these statistics is the one in [30].

Curiously, to our best knowledge, the only reported attempts to consider this kind of statistics in the light of the modern theory of quantile processes are [5], which motivates a new statistic whose distribution is not closely related to that of $W$, and [4], which only obtains the law of $L_n^0$. Moreover, this last approach depends heavily on the previous results because it requires giving a sense to the limit expression

$$Z = \int_0^1 \frac{B^2(t)-EB^2(t)}{(\varphi(\Phi^{-1}(t)))^2}\,dt$$

(here and everywhere $\Phi$ and $\varphi$ are respectively the distribution and the density functions of the standard normal law) but the function


$$t \mapsto \frac{B^2(t)-EB^2(t)}{(\varphi(\Phi^{-1}(t)))^2} \qquad (5)$$

is not integrable with probability one (this quickly follows from Lemma 2.2 in [8], taking into account that the function $t(1-t)/(\varphi(\Phi^{-1}(t)))^2$ is not integrable on $(0,1)$). The problem is circumvented in [4], through the limit law obtained in [30], by stating that $Z$ is a r.v. with the same law as $U$ (see (4)), so that the arguments depend heavily on the proof in [30].

Also, an ambitious program on convergence of integrals of empirical and quantile processes has been developed in [6], [7] and [8], but it does not cover our results.

Perhaps the main difficulty in treating these statistics in the quantile processes setup arises from the divergent character of the centering sequence $a_n$ proposed in [30],

$$a_n = \frac{1}{n+1}\sum_{j=1}^n \frac{\frac{j}{n+1}\left(1-\frac{j}{n+1}\right)}{\left(\varphi\left(\Phi^{-1}\left(\frac{j}{n+1}\right)\right)\right)^2} - \frac{3}{2}.$$

This difficulty will be circumvented in Theorem 8, where we will show that, although the set of trajectories of a Brownian bridge $B(t)$ on $[0,1]$ for which the function (5) is integrable has zero probability, the sequence

$$\left\{ \int_{1/n}^{1-1/n} \frac{B^2(t)-EB^2(t)}{(\varphi(\Phi^{-1}(t)))^2}\,dt \right\}_n$$

is an $L_2$-Cauchy sequence, whence we can give an adequate sense to $Z$ as an $L_2$-limit. We want to emphasize that our procedure does not depend at all on any previous result on the behaviour of the Shapiro-Wilk family of statistics.

Moreover, by adequately formulating our statistic $R_n$ we will enjoy additional benefits, which in some sense are the best improvement of our approach. In particular, we will provide a general framework in which we have not only been able to obtain the asymptotic distributions of those statistics belonging to the Shapiro-Wilk family, but we have also found limit laws for some of these statistics which have not been previously reported in the literature concerning the correlation tests (see [15]).

2 The results

For simplicity of notation we will often identify a probability distribution with its distribution function. The weak convergence (respectively, convergence in probability) of the


sequence of probability measures $\{P_n\}$ (resp. of r.v.'s $\{X_n\}$) will be denoted by $P_n \to_w$ (resp. $X_n \to_p$).

$\mathcal{H} = \left\{H : H(x)=H_0\left(\frac{x-\mu}{\sigma}\right),\ \mu\in\mathbb{R},\ \sigma>0\right\}$ will be a location and scale family of distribution functions in $\mathcal{P}_2(\mathbb{R})$, where $H_0$ is a canonical representative which, for simplicity, we choose with zero mean and unit variance. Therefore, given $H(x)=H_0\left(\frac{x-\mu}{\sigma}\right)$ in $\mathcal{H}$, $\mu$ and $\sigma$ are its mean and its standard deviation. Also note that the corresponding quantile functions verify $H^{-1}(t)=\mu+\sigma H_0^{-1}(t)$.

To begin this section we give some relevant, well known properties of the Wasserstein distance (see e.g. [2], [25], [9]) for future reference.

Proposition 1 (a) Let $m_1$ and $m_2$ be the mean values of $P_1$ and $P_2$ (in $\mathcal{P}_2(\mathbb{R})$), and let $P_1^*$ and $P_2^*$ be the result of centering these probability measures at their means; then

$$\mathcal{W}^2(P_1,P_2) = \mathcal{W}^2(P_1^*,P_2^*) + (m_1-m_2)^2.$$

(b) Let $\{P_n\}_n$ be a sequence in $\mathcal{P}_2(\mathbb{R})$. Then the following are equivalent:

i. $P_n \to P \in \mathcal{P}_2(\mathbb{R})$ in $\mathcal{W}$-distance (i.e. $\mathcal{W}(P_n,P)\to 0$).

ii. $P_n \to_w P$ and $\int |t|^2 dP_n \to \int |t|^2 dP < \infty$.

iii. If $F_n$ (resp. $F$) is the distribution function of $P_n$ (resp. $P$), then $F_n^{-1}\to F^{-1}$ a.s. and in $L_2(0,1)$.

From (1) and (a) in Proposition 1 it follows that, if $P$ is a probability law in $\mathcal{P}_2(\mathbb{R})$ with distribution function $F$, mean $\mu_0$ and standard deviation $\sigma_0$:

$$\begin{aligned}
\mathcal{W}^2(P,\mathcal{H}) &= \inf\{\mathcal{W}^2(P,H) : H\in\mathcal{H}\} = \inf_{\sigma>0} \int_0^1 \left(F^{-1}(t)-\mu_0-\sigma H_0^{-1}(t)\right)^2 dt \\
&= \inf_{\sigma>0}\left[ \sigma_0^2+\sigma^2-2\sigma\int_0^1\left(F^{-1}(t)-\mu_0\right)H_0^{-1}(t)\,dt \right] \qquad (6)\\
&= \sigma_0^2-\left(\int_0^1 \left(F^{-1}(t)-\mu_0\right)H_0^{-1}(t)\,dt\right)^2 \\
&= \sigma_0^2-\left(\int_0^1 F^{-1}(t)H_0^{-1}(t)\,dt\right)^2.
\end{aligned}$$

Thus the nearest law in $\mathcal{H}$ to $P$ is given by $\mu=\mu_0$, the mean of $P$, and $\sigma=\int_0^1 F^{-1}(t)H_0^{-1}(t)\,dt$, the covariance between the r.v.'s $F^{-1}$ and $H_0^{-1}$ defined on the unit interval. Also note that the ratio $\mathcal{W}^2(P,\mathcal{H})/\sigma_0^2$ is not affected by location or scale changes on $P$; thus it can be considered as a measure of dissimilarity between $P$ and $\mathcal{H}$.
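As a numerical illustration of formula (1) and Proposition 1(a), the following sketch (ours; it assumes NumPy and SciPy are available, and the helper name `w2_squared` is hypothetical) approximates $\mathcal{W}^2$ between two normal laws by discretizing the integral of squared quantile differences. For two normal laws the closed form is $\mathcal{W}^2 = (m_1-m_2)^2+(\sigma_1-\sigma_2)^2$, since the centered parts contribute $(\sigma_1-\sigma_2)^2$ and the means contribute $(m_1-m_2)^2$.

```python
import numpy as np
from scipy import stats

def w2_squared(ppf1, ppf2, m=200_000):
    """Approximate W^2(P1, P2) by formula (1): the squared L2-distance
    between the quantile functions, discretized on a midpoint grid of (0,1)."""
    t = (np.arange(m) + 0.5) / m
    return np.mean((ppf1(t) - ppf2(t)) ** 2)

# Proposition 1(a) for two normal laws
m1, s1, m2, s2 = 0.0, 1.0, 3.0, 2.0
w2 = w2_squared(stats.norm(m1, s1).ppf, stats.norm(m2, s2).ppf)
exact = (m1 - m2) ** 2 + (s1 - s2) ** 2   # = 10
```

The midpoint grid slightly truncates the (integrable) tail contribution of the quantile difference, so the agreement is approximate but close.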


In particular, if $\Phi$ denotes the distribution function of a standard normal law, the best $\mathcal{W}$-approximant to $P$ in the set $\mathcal{H}_N$ of normal laws will be the law $N(\mu,\sigma)$ determined by $\mu=\mu_0$ and $\sigma=\int_0^1 F^{-1}(t)\Phi^{-1}(t)\,dt$. Also,

$$\frac{\mathcal{W}^2(P,\mathcal{H}_N)}{\sigma_0^2} = 1-\frac{\left(\int_0^1 F^{-1}(t)\Phi^{-1}(t)\,dt\right)^2}{\sigma_0^2}$$

plays the role of dissimilarity from the "normal pattern", but other patterns, such as uniformity, will also be considered.

The following easy technical result shows the continuity of the best $\mathcal{W}$-approximation by elements of a location and scale family.

Proposition 2 Given a location and scale family $\mathcal{H}$ in $\mathcal{P}_2(\mathbb{R})$, the map $\pi_{\mathcal{H}}: \mathcal{P}_2(\mathbb{R}) \to \mathcal{H}$ which sends a probability $P$ to its best $\mathcal{W}$-approximation by elements of $\mathcal{H}$ is $\mathcal{W}$-continuous.

PROOF: Recalling the development in (6), it becomes clear that the best $\mathcal{W}$-approximation to a probability $P\in\mathcal{P}_2(\mathbb{R})$, with distribution function $F$, is given by the law of the quantile function $m_P+\sigma_{P,\mathcal{H}}H_0^{-1}$, where $m_P$ is the mean of $P$ and $\sigma_{P,\mathcal{H}}=\int_0^1 F^{-1}(t)H_0^{-1}(t)\,dt$.

Now let us assume that $\mathcal{W}(P_n,P)\to 0$. From iii) in (b) in Proposition 1 we have that, if $F_n$ (resp. $F$) is the distribution function of $P_n$ (resp. $P$), then $F_n^{-1}\to F^{-1}$ a.s. and in $L_2(0,1)$. Therefore

$$m_{P_n}=\int_0^1 F_n^{-1}(t)\,dt \longrightarrow m_P=\int_0^1 F^{-1}(t)\,dt \quad\text{and}$$

$$\sigma_{P_n,\mathcal{H}}=\int_0^1 F_n^{-1}(t)H_0^{-1}(t)\,dt \longrightarrow \sigma_{P,\mathcal{H}}=\int_0^1 F^{-1}(t)H_0^{-1}(t)\,dt.$$

Since, from (1), the $\mathcal{W}$-distance between probabilities is the $L_2$-distance between their quantile functions, we have

$$\mathcal{W}(\pi(P_n),\pi(P)) = \left\|m_{P_n}+\sigma_{P_n,\mathcal{H}}H_0^{-1}-m_P-\sigma_{P,\mathcal{H}}H_0^{-1}\right\|_2 \le |m_{P_n}-m_P| + |\sigma_{P_n,\mathcal{H}}-\sigma_{P,\mathcal{H}}|\,\|H_0^{-1}\|_2 \longrightarrow 0,$$

showing the continuity of $\pi$. □
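The normal-pattern dissimilarity and its location-scale invariance can be computed directly from quantile functions. The sketch below (ours; it assumes SciPy, and the helper name `normal_dissimilarity` is hypothetical) evaluates $\mathcal{W}^2(P,\mathcal{H}_N)/\sigma_0^2$ for an exponential law and checks that a location-scale change leaves it unchanged.

```python
import numpy as np
from scipy import stats, integrate

def normal_dissimilarity(ppf, sd):
    """W^2(P, H_N)/sigma_0^2 = 1 - (int_0^1 F^{-1}(t) Phi^{-1}(t) dt)^2 / sigma_0^2,
    the dissimilarity of P (given by its quantile function and its standard
    deviation) from the normal location-scale pattern."""
    sigma, _ = integrate.quad(lambda t: ppf(t) * stats.norm.ppf(t), 0.0, 1.0)
    return 1.0 - (sigma / sd) ** 2

# Exp(1): quantile function -log(1-t), standard deviation 1
d1 = normal_dissimilarity(lambda t: -np.log1p(-t), 1.0)
# 3 + 2*Exp(1): a location-scale change must leave the dissimilarity invariant
d2 = normal_dissimilarity(lambda t: 3.0 - 2.0 * np.log1p(-t), 2.0)
```

Here $\sigma=\int_0^1 F^{-1}\Phi^{-1}\,dt$ is the covariance between the two quantile functions on the unit interval, so $0\le \sigma \le \sigma_0$ by Cauchy-Schwarz and the dissimilarity lies in $[0,1]$.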


Remark 1 Note that the essential part of the previous proof consists of showing the continuity of the parameters $m_P$ and $\sigma_{P,\mathcal{H}}$. With different techniques it is possible to prove more general results on continuity of projections based on $\mathcal{W}$ onto closed families of probabilities (not necessarily of location and scale type) for which the existence and uniqueness of the projection is guaranteed.

From now on we will assume that $X_1,X_2,\dots,X_n$ is a random sample obtained from a probability law $P\in\mathcal{P}_2(\mathbb{R})$ with distribution function $F$. The associated sample probability law (resp. distribution function) will be denoted by $P_n$ (resp. $F_n$). For a fixed $n$, the ordered sample will be represented as $X_{1n},X_{2n},\dots,X_{nn}$ and $X$ will be the random vector $X=(X_{1n},X_{2n},\dots,X_{nn})$.

Let us assume that $F\in\mathcal{H}$. Thus we are assuming the associated parametric model $X_1=\mu_0+\sigma_0 Y_1$, where $\mathcal{L}(Y_1)=H_0$. To estimate the mean $\mu_0=\mu(F)$ and the standard deviation $\sigma_0=\sigma(F)$ on the basis of the sample $X_1,X_2,\dots,X_n$ we can use the natural estimators associated to the $\mathcal{W}$-metric:

$$(\mu_n,\sigma_n) := \arg\min_{(\mu,\sigma)} \mathcal{W}\left(P_n,\mathcal{L}(\mu+\sigma Y_1)\right).$$

By the usual argument we have that $\mu_n$ is the sample mean, $\mu_n=\mu(F_n)=\bar X_n$, while

$$\sigma_n = \sum_{k=1}^n X_{kn}\int_{(k-1)/n}^{k/n} H_0^{-1}(t)\,dt.$$

Thus the $\mathcal{W}$-estimator of $\sigma_0$ is an L-estimate (a linear combination of order statistics), which has been widely studied from other motivations (Chapter 8 in [21] and Chapter 19 in [25] give an extensive summary and further references, while Mason and Shorack [16] give necessary and sufficient conditions for asymptotic normality), although the cases to be treated here will be handled through the approximations by Brownian bridges to the quantile process which we will formulate later. On the other hand, note that the strong consistency of the estimators follows quickly, without additional conditions, from Proposition 2, taking into account the Glivenko-Cantelli theorem and the strong law of large numbers.

Now let us return to the dissimilarity measure between the probability law $P$ and the pattern $\mathcal{H}=\left\{H : H(x)=H_0\left(\frac{x-\mu}{\sigma}\right),\ \mu\in\mathbb{R},\ \sigma>0\right\}$. Taking into account that this dissimilarity is invariant under location or scale changes in the distribution $P$, we can naturally consider the dissimilarity between the sample distribution $P_n$ and $\mathcal{H}$ to test the null hypothesis that $P\in\mathcal{H}$. The statistic to study is then

$$R_n = \frac{\mathcal{W}^2(P_n,\mathcal{H})}{S_n^2} = 1-\frac{\sigma_n^2}{S_n^2},$$


where $S_n^2=\frac{1}{n}\sum_{i=1}^n (X_i-\bar X_n)^2$ is the variance of the sample distribution $P_n$.

The invariance of $R_n$ with respect to scale or translation transformations allows us to assume $F=H_0$, and from the convergence $S_n^2\to\sigma^2(H_0)=1$ a.s. it will generally be possible to study the asymptotic behaviour of $R_n$ through that of $S_n^2 R_n$, which in turn admits the following decomposition:

$$\begin{aligned}
0 \le R_n^* := S_n^2 R_n &= S_n^2-\left(\int_0^1 F_n^{-1}(t)H_0^{-1}(t)\,dt\right)^2 \\
&= \int_0^1 (F_n^{-1}(t))^2\,dt-\left(\bar X_n\right)^2-\left(\int_0^1 F_n^{-1}(t)F^{-1}(t)\,dt\right)^2 \\
&= \int_0^1 (F_n^{-1}(t)-F^{-1}(t))^2\,dt-\left(\int_0^1 (F_n^{-1}(t)-F^{-1}(t))\,dt\right)^2 \\
&\qquad -\left(\int_0^1 (F_n^{-1}(t)-F^{-1}(t))F^{-1}(t)\,dt\right)^2 \\
&=: R_n(1)-R_n(2)-R_n(3), \qquad (7)
\end{aligned}$$

where we have taken into account the standardization of $H_0$ and that $F=H_0$.

Let us remark that $nR_n(2)=(n^{1/2}\bar X_n)^2$ which, taking into account that $F\in\mathcal{P}_2(\mathbb{R})$, has a $\chi_1^2$ asymptotic law. On the other hand,

$$nR_n(3) = \left(n^{1/2}\left(\int_0^1 F_n^{-1}(t)F^{-1}(t)\,dt-1\right)\right)^2 = \left(n^{1/2}(\sigma_n-1)\right)^2,$$

which under not too restrictive conditions has a scaled $\chi_1^2$ asymptotic law (but see (21)). Finally, note that for the $N(0,1)$ law, $nR_n(1)$ is similar to the statistic $L_n^0$ of De Wet and Venter. However, we need a joint treatment of $(R_n(1),R_n(2),R_n(3))$.

To do this, we will assume the following regularity conditions on the continuous parent distribution function $F$.

Assumptions. Let $a=\sup\{x : F(x)=0\}$, $b=\inf\{x : F(x)=1\}$, $-\infty\le a\le b\le\infty$. We will assume that:

1. $F$ is twice differentiable on $(a,b)$.

2. $F'(x)=f(x)>0$, $x\in(a,b)$.

3. For some $\gamma>0$ we have

$$\sup_{0<t<1} \frac{t(1-t)\,\left|f'(F^{-1}(t))\right|}{f^2(F^{-1}(t))} \le \gamma.$$
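The algebra behind (7) can be checked numerically. The sketch below (ours, assuming NumPy/SciPy) computes $R_n^*=S_n^2-\sigma_n^2$ and the three terms of (7) for a standard normal sample ($F=H_0=\Phi$), using the closed form $\int_a^b \Phi^{-1}(t)\,dt=\varphi(\Phi^{-1}(a))-\varphi(\Phi^{-1}(b))$ for the piecewise integrals against the empirical quantile function.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n = 2000
x = np.sort(rng.normal(size=n))
grid = np.arange(n + 1) / n

# weights int_{(k-1)/n}^{k/n} Phi^{-1}(t) dt = phi(Phi^{-1}((k-1)/n)) - phi(Phi^{-1}(k/n)),
# with the conventions phi(Phi^{-1}(0)) = phi(Phi^{-1}(1)) = 0
g = np.zeros(n + 1)
g[1:-1] = stats.norm.pdf(stats.norm.ppf(grid[1:-1]))
sigma_n = np.sum(x * (g[:-1] - g[1:]))

S2 = np.mean(x ** 2) - np.mean(x) ** 2
Rn_star = S2 - sigma_n ** 2                  # = S_n^2 R_n

Rn1 = np.mean(x ** 2) - 2.0 * sigma_n + 1.0  # int (F_n^{-1} - Phi^{-1})^2
Rn2 = np.mean(x) ** 2                        # (int (F_n^{-1} - Phi^{-1}))^2
Rn3 = (sigma_n - 1.0) ** 2                   # (int (F_n^{-1} - Phi^{-1}) Phi^{-1})^2
```

Expanding $R_n(1)$ and using $\int_0^1\Phi^{-1}=0$, $\int_0^1(\Phi^{-1})^2=1$ recovers the identity $R_n(1)-R_n(2)-R_n(3)=S_n^2-\sigma_n^2$ exactly, up to floating-point error.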


The general quantile process $\rho_n$ is defined by

$$\rho_n(t) := n^{1/2} f(F^{-1}(t))\left(F^{-1}(t)-F_n^{-1}(t)\right),\quad 0\le t\le 1,$$

and we have (Theorem 6.2.1 in [7]):

Theorem 3 If Assumptions 1, 2 and 3 hold, we can define a sequence of Brownian bridges $\{B_n(t),\ 0\le t\le 1\}_n$ such that

$$n^{1/2-\nu}\sup_{\frac{1}{n+1}\le t\le 1-\frac{1}{n+1}} \frac{|\rho_n(t)-B_n(t)|}{(t(1-t))^{\nu}} = \begin{cases} O_P(\log n), & \text{if } \nu=0,\\ O_P(1), & \text{if } 0<\nu\le\frac12. \end{cases}$$

Theorem 3 allows us to consider jointly the three integrals in (7) because, in terms of the general quantile process, the decomposition (7) is equivalent to

$$n\left(S_n^2-\sigma_n^2\right) = \int_0^1 \left(\frac{\rho_n(t)}{f(F^{-1}(t))}\right)^2 dt-\left(\int_0^1 \frac{\rho_n(t)}{f(F^{-1}(t))}\,dt\right)^2-\left(\int_0^1 \frac{\rho_n(t)F^{-1}(t)}{f(F^{-1}(t))}\,dt\right)^2.$$

When

$$\int_0^1 \frac{t(1-t)}{f(F^{-1}(t))^2}\,dt < \infty, \qquad (8)$$

it is possible to show, using the same techniques as in the proof of Theorem 6.4.2 in [7] (based on Theorem 3 above), that there exists a Brownian bridge $B(t)$ on $[0,1]$ such that

$$\int_{\frac{1}{n+1}}^{\frac{n}{n+1}} \left(\frac{\rho_n(t)}{f(F^{-1}(t))}\right)^2 dt-\left(\int_{\frac{1}{n+1}}^{\frac{n}{n+1}} \frac{\rho_n(t)}{f(F^{-1}(t))}\,dt\right)^2-\left(\int_{\frac{1}{n+1}}^{\frac{n}{n+1}} \frac{\rho_n(t)F^{-1}(t)}{f(F^{-1}(t))}\,dt\right)^2$$
$$\to_w\ \int_0^1 \left(\frac{B(t)}{f(F^{-1}(t))}\right)^2 dt-\left(\int_0^1 \frac{B(t)}{f(F^{-1}(t))}\,dt\right)^2-\left(\int_0^1 \frac{B(t)F^{-1}(t)}{f(F^{-1}(t))}\,dt\right)^2. \qquad (9)$$

Therefore, if the following conditions on the behaviour of the extremes hold,

$$n\int_0^{1/n} (F_n^{-1}(t)-F^{-1}(t))^2\,dt = n\int_0^{1/n}\left(X_{1n}-F^{-1}(t)\right)^2 dt \to_p 0 \quad\text{and} \qquad (10)$$
$$n\int_{1-1/n}^1 (F_n^{-1}(t)-F^{-1}(t))^2\,dt = n\int_{1-1/n}^1\left(X_{nn}-F^{-1}(t)\right)^2 dt \to_p 0,$$


taking into account that for every Borel set $A\subset[0,1]$:

$$\int_A (F_n^{-1}(t)-F^{-1}(t))^2\,dt \ge \left(\int_A (F_n^{-1}(t)-F^{-1}(t))\,dt\right)^2 \quad\text{and}$$
$$\int_A (F_n^{-1}(t)-F^{-1}(t))^2\,dt \ge \left(\int_A (F_n^{-1}(t)-F^{-1}(t))F^{-1}(t)\,dt\right)^2, \qquad (11)$$

the final expression in (9) will be the limit of the statistic of interest. The following theorem summarizes this fact.

Theorem 4 Under Assumptions 1, 2 and 3, if $F\in\mathcal{H}$ and (8) and (10) hold, then

$$nR_n \to_w \int_0^1 \left(\frac{B(t)}{f(F^{-1}(t))}\right)^2 dt-\left(\int_0^1 \frac{B(t)}{f(F^{-1}(t))}\,dt\right)^2-\left(\int_0^1 \frac{B(t)F^{-1}(t)}{f(F^{-1}(t))}\,dt\right)^2.$$

Next we are going to apply the previous theorem to several patterns $\mathcal{H}$.

2.1 Uniform pattern

The conditions involved in Theorem 4 are easily checked for the uniform model, i.e. if we assume that $H_0$ is a uniform distribution. Taking into account the expression of the density of the standardized uniform distribution, we trivially obtain the following result:

Theorem 5 (Uniform model) If $\mathcal{H}$ is the family of uniform distributions on intervals, then

$$nR_n \to_w 12\left[\int_0^1 B^2(t)\,dt-\left(\int_0^1 B(t)\,dt\right)^2\right]-144\left(\int_0^1 \left(t-\tfrac12\right)B(t)\,dt\right)^2. \qquad (12)$$

Through a principal components decomposition (see e.g. [25]) this can be expressed in terms of a sequence, $\{Z_j\}_j$, of independent $N(0,1)$ random variables. The expression between brackets has been studied in relation with the Watson statistic and admits an easy expansion, which leads to another in terms of a sequence of i.i.d. exponential random variables (see e.g. [25]). On the other hand, Lockhart and Stephens have obtained in [15] the expansion for (12) by the analysis of the covariance function of the gaussian process

$$B(t)-\int_0^1 B(s)\,ds-12\left(t-\tfrac12\right)\int_0^1 \left(s-\tfrac12\right)B(s)\,ds,$$

resulting in a much more involved expression for the corresponding eigenvalues.
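The limit law (12) is straightforward to simulate by discretizing the Brownian bridge. The sketch below (ours, assuming NumPy; the function name is hypothetical) draws from (12); from the eigenexpansion of the bridge its mean is $12\left(\frac16-\frac1{12}\right)-144\cdot\frac1{720}=0.8$, which gives a quick sanity check.

```python
import numpy as np

def uniform_limit_draws(n_sims=10_000, m=256, seed=4):
    """Monte Carlo draws of 12[int B^2 - (int B)^2] - 144 (int (t-1/2) B)^2
    in (12), with B a Brownian bridge discretized on a grid of m+1 points."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / (m + 1)
    t = np.arange(1, m + 2) * dt                  # grid t_1, ..., t_{m+1} = 1
    w = np.cumsum(rng.normal(scale=np.sqrt(dt), size=(n_sims, m + 1)), axis=1)
    b = w - t * w[:, -1:]                         # bridge: b(1) = 0
    i1 = (b ** 2).mean(axis=1)                    # int_0^1 B^2
    i2 = b.mean(axis=1)                           # int_0^1 B
    i3 = ((t - 0.5) * b).mean(axis=1)             # int_0^1 (t - 1/2) B
    return 12.0 * (i1 - i2 ** 2) - 144.0 * i3 ** 2

draws = uniform_limit_draws()
```

The discretization bias in the first two moments is $O(m^{-2})$, so a modest grid already reproduces the mean closely.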


It is troublesome to find such an involved expression for this limit law when developed in orthogonal series, especially when compared with the situation for the normal family (see Theorem 9); but, as we will prove in Remark 2, the normal is the only situation where the contribution of $R_n(2)$ and $R_n(3)$ precisely cancels out some terms in the asymptotic orthogonal series expansion of $R_n(1)$.

2.2 Normal pattern

The normal model needs a more careful treatment. The first problem arises from the fact that the integral in (8) diverges. In fact we have (see [1])

$$\int_{1/n}^{1-1/n} \frac{t(1-t)}{\varphi(\Phi^{-1}(t))^2}\,dt = \log\log n+\log 2+\gamma+o(1), \qquad (13)$$

where $\gamma=\lim_{k\to\infty}\left(\sum_{j=1}^k \frac1j-\log k\right)$ is Euler's constant. Since it is well known (see e.g. Lemma 5.3.2 in [7] or Corollary 2.2 in [8]) that

$$P\left[\int_0^1 \left(\frac{B(t)}{f(F^{-1}(t))}\right)^2 dt < \infty\right] = \begin{cases} 1, & \text{if } \int_0^1 \frac{t(1-t)}{f(F^{-1}(t))^2}\,dt<\infty,\\ 0, & \text{if } \int_0^1 \frac{t(1-t)}{f(F^{-1}(t))^2}\,dt=\infty, \end{cases}$$

we must apply a sharper argument. To do this we begin by considering the behaviour of the extremes, obtaining (10) for the normal law.

Proposition 6 If $\{X_{in},\ i=1,\dots,n\}$ is the ordered sample obtained from an i.i.d. random sample of a $N(0,1)$ law, then:

$$n\int_0^{1/n}\left(X_{1n}-\Phi^{-1}(t)\right)^2 dt \to_p 0 \quad\text{and}\quad n\int_{1-1/n}^1\left(X_{nn}-\Phi^{-1}(t)\right)^2 dt \to_p 0.$$

PROOF: By symmetry we only consider the behaviour of $\{X_{1n}\}_n$. It is well known (see e.g. [10]) that $a_n(X_{1n}-b_n)$ converges weakly for some $a_n\to\infty$ and $b_n=\Phi^{-1}(1/n)$. Hence

$$n\int_0^{1/n} (X_{1n}-b_n)^2\,dt = (X_{1n}-b_n)^2 \to_p 0.$$


Then, starting from

$$n\int_0^{1/n}\left(X_{1n}-\Phi^{-1}(t)\right)^2 dt = n\int_0^{1/n} (X_{1n}-b_n)^2\,dt+n\int_0^{1/n}\left(b_n-\Phi^{-1}(t)\right)^2 dt+2n(X_{1n}-b_n)\int_0^{1/n}\left(b_n-\Phi^{-1}(t)\right)dt,$$

by the Schwarz inequality it will be sufficient to prove that the second summand converges to 0.

If $Z_n$ denotes a r.v. with the $N(0,1)$ law conditioned to the interval $(-\infty,b_n)$, we have

$$n\int_0^{1/n}\left(b_n-\Phi^{-1}(t)\right)^2 dt = E[b_n-Z_n]^2 = [b_n-EZ_n]^2+\mathrm{Var}(Z_n),$$

and

$$E[Z_n] = -\frac{\varphi(b_n)}{\Phi(b_n)} \quad\text{and}\quad \mathrm{Var}(Z_n) = 1-b_n\frac{\varphi(b_n)}{\Phi(b_n)}-\left(\frac{\varphi(b_n)}{\Phi(b_n)}\right)^2.$$

Therefore, taking into account the well known equivalence $\Phi(x)\sim\varphi(x)/|x|$ as $x\to-\infty$, and applying L'Hôpital's rule:

$$\begin{aligned}
\lim_{n\to\infty} E[b_n-Z_n]^2 &= 1+\lim_{n\to\infty}\left(b_n^2+b_n\frac{\varphi(b_n)}{\Phi(b_n)}\right) = 1+\lim_{n\to\infty}\frac{b_n^2\Phi(b_n)+b_n\varphi(b_n)}{\Phi(b_n)}\\
&= 1+\lim_{n\to\infty}\frac{2b_n\Phi(b_n)+b_n^2\varphi(b_n)+\varphi(b_n)+b_n\varphi'(b_n)}{\varphi(b_n)}\\
&= 1+\lim_{n\to\infty}\frac{2b_n\Phi(b_n)+b_n^2\varphi(b_n)+\varphi(b_n)-b_n^2\varphi(b_n)}{\varphi(b_n)}\\
&= 2+\lim_{n\to\infty}\frac{2b_n\Phi(b_n)}{\varphi(b_n)} = 0. \qquad\square
\end{aligned}$$

Proposition 7 On an adequate probability space there exists a sequence $\{B_n(t)\}_n$ of Brownian bridges on $[0,1]$ such that, when the parent distribution is the $N(0,1)$ law, the statistic $nR_n^*:=n(S_n^2-\sigma_n^2)$ verifies

$$nR_n^*-\left[\int_{\frac{1}{n+1}}^{\frac{n}{n+1}}\left(\frac{B_n(t)}{\varphi(\Phi^{-1}(t))}\right)^2 dt-\left(\int_{\frac{1}{n+1}}^{\frac{n}{n+1}}\frac{B_n(t)}{\varphi(\Phi^{-1}(t))}\,dt\right)^2-\left(\int_{\frac{1}{n+1}}^{\frac{n}{n+1}}\frac{B_n(t)\Phi^{-1}(t)}{\varphi(\Phi^{-1}(t))}\,dt\right)^2\right] \to_p 0.$$


PROOF: From Proposition 6 and (11) it follows that

$$nR_n^*-\left[\int_{\frac{1}{n+1}}^{\frac{n}{n+1}}\left(\frac{\rho_n(t)}{\varphi(\Phi^{-1}(t))}\right)^2 dt-\left(\int_{\frac{1}{n+1}}^{\frac{n}{n+1}}\frac{\rho_n(t)}{\varphi(\Phi^{-1}(t))}\,dt\right)^2-\left(\int_{\frac{1}{n+1}}^{\frac{n}{n+1}}\frac{\rho_n(t)\Phi^{-1}(t)}{\varphi(\Phi^{-1}(t))}\,dt\right)^2\right] \to_p 0.$$

Now let us show that (on an adequate space)

$$\int_{\frac{1}{n+1}}^{\frac{n}{n+1}}\left(\frac{\rho_n(t)}{\varphi(\Phi^{-1}(t))}\right)^2 dt-\int_{\frac{1}{n+1}}^{\frac{n}{n+1}}\left(\frac{B_n(t)}{\varphi(\Phi^{-1}(t))}\right)^2 dt \to_p 0. \qquad (14)$$

Theorem 3 guarantees the existence of a sequence of Brownian bridges such that, for every $\nu\in(0,1/2)$:

$$\begin{aligned}
&\left|\int_{\frac{1}{n+1}}^{\frac{n}{n+1}}\left(\frac{\rho_n(t)}{\varphi(\Phi^{-1}(t))}\right)^2 dt-\int_{\frac{1}{n+1}}^{\frac{n}{n+1}}\left(\frac{B_n(t)}{\varphi(\Phi^{-1}(t))}\right)^2 dt\right|\\
&\qquad\le \int_{\frac{1}{n+1}}^{\frac{n}{n+1}}\left(\frac{\rho_n(t)-B_n(t)}{\varphi(\Phi^{-1}(t))}\right)^2 dt+2\int_{\frac{1}{n+1}}^{\frac{n}{n+1}}\frac{|\rho_n(t)-B_n(t)|\,|B_n(t)|}{\varphi(\Phi^{-1}(t))^2}\,dt\\
&\qquad\le O_P(1)\,n^{2\nu-1}\int_{\frac{1}{n+1}}^{\frac{n}{n+1}}\frac{(t(1-t))^{2\nu}}{\varphi(\Phi^{-1}(t))^2}\,dt+O_P(1)\,n^{\nu-\frac12}\int_{\frac{1}{n+1}}^{\frac{n}{n+1}}\frac{(t(1-t))^{\nu}\,|B_n(t)|}{\varphi(\Phi^{-1}(t))^2}\,dt\\
&\qquad=: A_n^{(1)}+A_n^{(2)}.
\end{aligned}$$

But if $0<\nu<1$ then

$$\lim_{n\to\infty} n^{\nu-1}\int_{\frac{1}{n+1}}^{\frac{n}{n+1}}\frac{(t(1-t))^{\nu}}{\varphi(\Phi^{-1}(t))^2}\,dt = 0 \qquad (15)$$

because, once more, the equivalence $|x|\Phi(x)\sim\varphi(x)$, as $x\to-\infty$, easily shows that

$$n^{\nu-1}\int_{\frac{1}{n+1}}^{\frac12}\frac{t^{\nu}}{\varphi(\Phi^{-1}(t))^2}\,dt = -n^{\nu-1}(n+1)^{-\nu}\,\frac{\Phi^{-1}\left(\frac{1}{n+1}\right)}{\varphi\left(\Phi^{-1}\left(\frac{1}{n+1}\right)\right)}-n^{\nu-1}\int_{\frac{1}{n+1}}^{\frac12}\left(\frac{\nu t^{\nu-1}}{\varphi(\Phi^{-1}(t))}+\frac{t^{\nu}\,\Phi^{-1}(t)}{\varphi(\Phi^{-1}(t))^2}\right)\Phi^{-1}(t)\,dt \to 0.$$

Therefore $A_n^{(1)}\to_p 0$. On the other hand, also for $\nu\in(0,1/2)$ (taking $\nu+\frac12$ in (15)),

$$E\left[n^{\nu-\frac12}\int_{\frac{1}{n+1}}^{\frac{n}{n+1}}\frac{(t(1-t))^{\nu}\,|B_n(t)|}{\varphi(\Phi^{-1}(t))^2}\,dt\right] \le n^{\nu-\frac12}\int_{\frac{1}{n+1}}^{\frac{n}{n+1}}\frac{(t(1-t))^{\nu+\frac12}}{\varphi(\Phi^{-1}(t))^2}\,dt \to 0;$$


thus $A_n^{(2)}\to_p 0$, which ends the proof of (14).

Now let us consider the proof of

$$\left(\int_{\frac{1}{n+1}}^{\frac{n}{n+1}}\frac{\rho_n(t)}{\varphi(\Phi^{-1}(t))}\,dt\right)^2-\left(\int_{\frac{1}{n+1}}^{\frac{n}{n+1}}\frac{B_n(t)}{\varphi(\Phi^{-1}(t))}\,dt\right)^2 = \left(\int_{\frac{1}{n+1}}^{\frac{n}{n+1}}\frac{\rho_n(t)-B_n(t)}{\varphi(\Phi^{-1}(t))}\,dt\right)\left(\int_{\frac{1}{n+1}}^{\frac{n}{n+1}}\frac{\rho_n(t)+B_n(t)}{\varphi(\Phi^{-1}(t))}\,dt\right) \to_p 0. \qquad (16)$$

The first factor in (16) is bounded by

$$\left[\int_{\frac{1}{n+1}}^{\frac{n}{n+1}}\left(\frac{\rho_n(t)-B_n(t)}{\varphi(\Phi^{-1}(t))}\right)^2 dt\right]^{1/2},$$

which, in turn, converges in probability to zero by the convergence $A_n^{(1)}\to_p 0$ already proved. Moreover, as is well known, the law of

$$\int_{\frac{1}{n+1}}^{\frac{n}{n+1}}\frac{B_n(t)}{\varphi(\Phi^{-1}(t))}\,dt$$

is $N(0,\sigma_1^2(1/(n+1)))$, with

$$\sigma_1^2(x) := \int_x^{1-x}\int_x^{1-x}\frac{u\wedge v-uv}{\varphi(\Phi^{-1}(u))\,\varphi(\Phi^{-1}(v))}\,du\,dv.$$

It is easy to verify that $\sigma_1^2(x)\to 1$ as $x\to 0$, from which

$$\int_{\frac{1}{n+1}}^{\frac{n}{n+1}}\frac{B_n(t)}{\varphi(\Phi^{-1}(t))}\,dt = O_P(1),$$

and therefore also

$$\int_{\frac{1}{n+1}}^{\frac{n}{n+1}}\frac{\rho_n(t)}{\varphi(\Phi^{-1}(t))}\,dt = O_P(1),$$

which, together with the convergence of the first factor, shows (16).

In an analogous way we have

$$\left|\int_{\frac{1}{n+1}}^{\frac{n}{n+1}}\frac{(\rho_n(t)-B_n(t))\,\Phi^{-1}(t)}{\varphi(\Phi^{-1}(t))}\,dt\right| \le \left[\left(\int_0^1 (\Phi^{-1}(t))^2\,dt\right)\int_{\frac{1}{n+1}}^{\frac{n}{n+1}}\frac{(\rho_n(t)-B_n(t))^2}{(\varphi(\Phi^{-1}(t)))^2}\,dt\right]^{1/2} \to_p 0,$$


the law of

$$\int_{\frac{1}{n+1}}^{\frac{n}{n+1}}\frac{B_n(t)\,\Phi^{-1}(t)}{\varphi(\Phi^{-1}(t))}\,dt$$

being $N(0,\sigma_2^2(1/(n+1)))$, where

$$\sigma_2^2(x) := \int_x^{1-x}\int_x^{1-x}\frac{u\wedge v-uv}{\varphi(\Phi^{-1}(u))\,\varphi(\Phi^{-1}(v))}\,\Phi^{-1}(u)\,\Phi^{-1}(v)\,du\,dv \to \frac12 \ \text{ as } x\to 0,$$

from which we finish the proof, because this shows that also

$$\left(\int_{\frac{1}{n+1}}^{\frac{n}{n+1}}\frac{\rho_n(t)\,\Phi^{-1}(t)}{\varphi(\Phi^{-1}(t))}\,dt\right)^2-\left(\int_{\frac{1}{n+1}}^{\frac{n}{n+1}}\frac{B_n(t)\,\Phi^{-1}(t)}{\varphi(\Phi^{-1}(t))}\,dt\right)^2 \to_p 0. \qquad\square$$

Now the convergence and the characterization of the limit law are easier problems. In the following theorem we establish the convergence in law of $R_n$ through that of its equivalent version based on the Brownian bridge. This will be achieved by centering by its mean and then showing its convergence in the $L_2$-sense.

In the next theorem we obtain the limit law of $R_n$. Notice that the main difficulty is to give a sense to the expression

$$\int_0^1 \frac{B^2(t)-EB^2(t)}{(\varphi(\Phi^{-1}(t)))^2}\,dt \qquad (17)$$

because, as stated in the introduction, from Lemma 2.2 in [8] it quickly follows that this function is not integrable a.s. Therefore the problem is that we cannot assume the existence of the limit

$$\lim_n \int_{\frac{1}{n+1}}^{\frac{n}{n+1}}\frac{B^2(t)-EB^2(t)}{(\varphi(\Phi^{-1}(t)))^2}\,dt.$$

But it turns out that this limit does exist in the $L_2$-sense and we can define (17) as the value of this $L_2$-limit. This process is carried out in the next theorem.

Theorem 8 (Normal case) Let $\{X_n\}_n$ be a sequence of independent identically distributed random variables with a normal distribution. Then

$$n(R_n-a_n) \to_w \int_0^1 \frac{B^2(t)-EB^2(t)}{(\varphi(\Phi^{-1}(t)))^2}\,dt-\left(\int_0^1 \frac{B(t)}{\varphi(\Phi^{-1}(t))}\,dt\right)^2-\left(\int_0^1 \frac{B(t)\,\Phi^{-1}(t)}{\varphi(\Phi^{-1}(t))}\,dt\right)^2,$$


where

$$a_n = \frac1n\int_{\frac{1}{n+1}}^{\frac{n}{n+1}}\frac{t(1-t)}{[\varphi(\Phi^{-1}(t))]^2}\,dt.$$

PROOF: As was already observed, we can assume without loss of generality that the variables have the standard normal law; then, taking into account the asymptotic normality of the sample variance $S_n^2$, we have:

$$n(R_n-a_n)-n(R_n^*-a_n) = \frac{n}{S_n^2}\,R_n^*\left(1-S_n^2\right) = \frac{O_P(1)}{\sqrt n}\,n\left(R_n^*-a_n+a_n\right) \to_p 0$$

by (13), whenever $n(R_n^*-a_n)=O_P(1)$. In particular this shows that we can prove the theorem by showing that the asymptotic law of $n(R_n^*-a_n)$ is that of the functional of the Brownian bridge involved in the statement of the theorem and, by Proposition 7, it even suffices to give a limit sense to

$$\int_0^1 \frac{B^2(t)-EB^2(t)}{(\varphi(\Phi^{-1}(t)))^2}\,dt.$$

If

$$A_n := \int_{\frac{1}{n+1}}^{\frac{n}{n+1}}\frac{B^2(t)-EB^2(t)}{(\varphi(\Phi^{-1}(t)))^2}\,dt,$$

then it can be shown that

$$EA_n^2 = \int_{\frac{1}{n+1}}^{\frac{n}{n+1}}\int_{\frac{1}{n+1}}^{\frac{n}{n+1}}\frac{2(s\wedge t-st)^2}{(\varphi(\Phi^{-1}(s)))^2\,(\varphi(\Phi^{-1}(t)))^2}\,ds\,dt \to \int_0^1\int_0^1 \frac{2(s\wedge t-st)^2}{(\varphi(\Phi^{-1}(s)))^2\,(\varphi(\Phi^{-1}(t)))^2}\,ds\,dt < \infty. \qquad (18)$$

From this it is easy to show that $E(A_n-A_m)^2\to 0$ as $n,m\to\infty$; hence $A_n$ converges in $L_2$ to some r.v.

$$A := \int_0^1 \frac{B^2(t)-EB^2(t)}{[\varphi(\Phi^{-1}(t))]^2}\,dt. \qquad\square$$
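The divergence rate in (13), and hence the size of the centering $n\,a_n$, can be checked numerically. The sketch below (ours, assuming SciPy; the helper names are hypothetical) evaluates the integral after the substitution $t=\Phi(x)$, which turns it into the smooth integral $\int_{\Phi^{-1}(1/n)}^{\Phi^{-1}(1-1/n)}\Phi(x)(1-\Phi(x))/\varphi(x)\,dx$ with a bounded, slowly varying integrand.

```python
import numpy as np
from scipy import stats, integrate

def centering_integral(n):
    """int_{1/n}^{1-1/n} t(1-t)/phi(Phi^{-1}(t))^2 dt computed via t = Phi(x);
    by (13) this behaves like log log n + log 2 + euler_gamma + o(1)."""
    g = lambda x: stats.norm.cdf(x) * stats.norm.sf(x) / stats.norm.pdf(x)
    val, _ = integrate.quad(g, stats.norm.ppf(1.0 / n), stats.norm.ppf(1.0 - 1.0 / n))
    return val

def loglog_approx(n):
    return np.log(np.log(n)) + np.log(2.0) + np.euler_gamma
```

Since the $o(1)$ term in (13) is not quantified here, any numerical comparison should be read with a generous tolerance; the slow $\log\log n$ growth is the essential point.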


Goodness of �t based on the Wasserstein distance 18The next theorem provides an explicit expression for the limit law just obtained. Noticethat, in some sense, the proof of Theorem 9 contains that of Theorem 8 because the keystep in this theorem is statement (18) and the proof of Theorem 9 relies, precisely, in toanalyze more deeply the limit in (18).Theorem 9 Let fXngn be a sequence of independent identically distributed random vari-ables with a normal distribution. Thenn(Rn � an) !!�32 + 1Xj=3 Z2j � 1j ;where fZngn is a sequence of independent N(0; 1) random variables andan = 1n Z nn+11n+1 t(1� t)[�(��1(t))]2dt:PROOF: It su�ces to show that the distribution of the functional of the Brownian bridgeinvolved in the statement of the previous theorem coincides with that of�32 + 1Xj=3 Z2j � 1j :The operator L : L2(0; 1)! L2(0; 1) de�ned byLf(t) := Z 10 s ^ t� st�(��1(s))�(��1(t))f(s)dshas eigenvalues �j = 1j ; j = 1; 2; :::, with associated eigenfunctions Hj(��1(t)), Hj beingthe j-th Hermite polynomial. But fHj(��1(t))g1j=1 is a complete orthonormal system inL2(0; 1), whenceZ nn+11n+1 B(t)�(��1(��1(t)))!2 dt = 1Xj=1 Z nn+11n+1 B(t)�(��1(t))Hj(��1(t))dt!2Since H1(x) = 1;H2(x) = x; we have the relationWn := Z nn+11n+1 B(t)�(��1(t))!2 dt� E 0@Z nn+11n+1 B(t)�(��1(t))!2 dt1A

Page 19: Go o dness of t based on the W - Semantic Scholar · in the set of probabiliti es on the line with nite second order momen t, (

Goodness of �t based on the Wasserstein distance 19� Z nn+11n+1 B(t)�(��1(t))dt!2 � Z nn+11n+1 B(t)��1(t)�(��1(t)) dt!2= 1Xj=30@ Z nn+11n+1 B(t)Hj(��1(t))�(��1(t)) dt!2 �E Z nn+11n+1 B(t)Hj(��1(t))�(��1(t)) dt)!21A�E Z nn+11n+1 B(t)H1(��1(t))�(��1(t)) dt!2 � E Z nn+11n+1 B(t)H2(��1(t))�(��1(t)) dt!2:= 1Xj=3 �Zj(n)2 � EZj(n)2�� EZ1(n)2 � EZ2(n)2;where the r.v.'s fZj(n)gMj=1 have, for every �xed M , a joint M-dimensional gaussian law,and their variances, �2j (1=n), satisfy�2j (1=n) = Z nn+11n+1 Z nn+11n+1 (s ^ t� st)�(��1(s))�(��1(t))Hj(��1(s))Hj(��1(t))dsdt! �j = 1j :MoreoverCov(Zj(n); Zk(n)) = Z nn+11n+1 Z nn+11n+1 (s ^ t� st)�(��1(s))�(��1(t))Hj(��1(s))Hk(��1(t))dsdt! 0as n!1. Therefore for every �xed M :MXj=3 �Zj(n)2 � EZj(n)2�� EZ1(n)2 � EZ2(n)2 !!�32 + MXj=3 Z2j � 1j ;where Z1; Z2; :::; ZM are independent N(0; 1) random variables.Now let us observe thatVar Z nn+11n+1 B2(t)� EB2(t)(�(��1(t)))2 dt! = 2 Z nn+11n+1 Z nn+11n+1 s ^ t� st�(��1(s))�(��1(t))!2 dsdt! 2 Z 10 Z 10 s ^ t� st�(��1(s))�(��1(t))!2 dsdt= 2 Z 10 1Xj=1 Z 10 s ^ t� st�(��1(s))�(��1(t))Hj(��1(s))ds!2 dt


\[
= 2 \sum_{j=1}^{\infty} \int_0^1 \lambda_j^2\, H_j^2(\Phi^{-1}(t))\, dt = 2 \sum_{j=1}^{\infty} \frac{1}{j^2},
\]
while
\[
\mathrm{Var}\left( \sum_{j=1}^{M} \left( Z_j(n)^2 - E Z_j(n)^2 \right) \right)
\;\to\; \mathrm{Var}\left( \sum_{j=1}^{M} \frac{Z_j^2 - 1}{j} \right) = 2 \sum_{j=1}^{M} \frac{1}{j^2}.
\]
On the other hand, taking into account that if $X$ and $Y$ are standardized random variables with a joint normal law and covariance $\rho$, then $\mathrm{Cov}(X^2, Y^2) = 2\rho^2$,
\[
\mathrm{Cov}\left( \int_{1/(n+1)}^{n/(n+1)} \frac{B^2(t) - E B^2(t)}{\phi^2(\Phi^{-1}(t))}\, dt,\; Z_k^2(n) \right)
= \sum_{j=1}^{\infty} \mathrm{Cov}\left( Z_j^2(n), Z_k^2(n) \right)
= 2 \sum_{j=1}^{\infty} \left[ E(Z_j(n) Z_k(n)) \right]^2
\]
\[
= 2 \sum_{j=1}^{\infty} \left( \int_{1/(n+1)}^{n/(n+1)} \int_{1/(n+1)}^{n/(n+1)}
\frac{s \wedge t - st}{\phi(\Phi^{-1}(s))\,\phi(\Phi^{-1}(t))}\, H_j(\Phi^{-1}(s))\, H_k(\Phi^{-1}(t))\, dt\, ds \right)^2
\]
\[
= 2 \int_{1/(n+1)}^{n/(n+1)} \left( \int_{1/(n+1)}^{n/(n+1)}
\frac{s \wedge t - st}{\phi(\Phi^{-1}(s))\,\phi(\Phi^{-1}(t))}\, H_k(\Phi^{-1}(t))\, dt \right)^2 ds
\]
\[
\to\; 2 \int_0^1 \left( \int_0^1 \frac{s \wedge t - st}{\phi(\Phi^{-1}(s))\,\phi(\Phi^{-1}(t))}\, H_k(\Phi^{-1}(t))\, dt \right)^2 ds
= 2 \int_0^1 \lambda_k^2\, H_k^2(\Phi^{-1}(t))\, dt = \frac{2}{k^2},
\]
hence
\[
\mathrm{Cov}\left( \int_{1/(n+1)}^{n/(n+1)} \frac{B^2(t) - E B^2(t)}{\phi^2(\Phi^{-1}(t))}\, dt,\; \sum_{j=1}^{M} Z_j^2(n) \right)
\;\to\; 2 \sum_{j=1}^{M} \frac{1}{j^2}.
\]
Now it is obvious that
\[
\mathrm{Var}\left( \sum_{j=M+1}^{\infty} \left( Z_j^2(n) - E Z_j^2(n) \right) \right) \;\to\; 2 \sum_{j=M+1}^{\infty} \frac{1}{j^2},
\]
and then that
\[
W_n \;\to_w\; -\frac{3}{2} + \sum_{j=3}^{\infty} \frac{Z_j^2 - 1}{j}. \qquad \Box
\]
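The spectral decomposition at the heart of this proof can be checked numerically. The sketch below (ours, not part of the paper; it assumes NumPy and SciPy are available) evaluates $L H_j(\Phi^{-1}(\cdot))$ by quadrature after the substitution $s = \Phi(y)$, which cancels one factor $\phi(\Phi^{-1}(s))$ and leaves a Gaussian-tailed integrand, and compares the result with $\frac{1}{j} H_j(\Phi^{-1}(t))$:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm
from numpy.polynomial.hermite_e import hermeval

def H(j, y):
    """Probabilists' Hermite polynomial in the paper's indexing:
    H_1 = 1, H_2 = x, H_3 = x^2 - 1, ... (normalization is irrelevant
    for the eigenvalue check)."""
    c = np.zeros(j)
    c[j - 1] = 1.0
    return hermeval(y, c)

def L_applied(j, t):
    """Evaluate (L H_j(Phi^{-1}(.)))(t) by quadrature, after the
    substitution s = Phi(y); the integration is split at y = Phi^{-1}(t),
    the kink of s ^ t."""
    x = norm.ppf(t)
    f = lambda y: (min(norm.cdf(y), t) - norm.cdf(y) * t) / norm.pdf(x) * H(j, y)
    left, _ = quad(f, -np.inf, x)
    right, _ = quad(f, x, np.inf)
    return left + right

t = 0.3
for j in (1, 2, 3):
    # the two printed values agree: eigenvalue 1/j for eigenfunction H_j(Phi^{-1}(.))
    print(j, L_applied(j, t), H(j, norm.ppf(t)) / j)
```

The quadrature confirms $\lambda_j = 1/j$ for the first few $j$; the choice $t = 0.3$ is arbitrary.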


Remark 2 Let us consider the decomposition (7). It turns out that the normal is the only parent distribution for which the asymptotic terms corresponding to $nR_n(2)$ and $nR_n(3)$ exactly cancel some terms of the principal components expansion of the limit law of $nR_n(1)$.

More precisely, let $F$ be an arbitrary distribution function with finite variance and density function $f$. Under the hypothesis of cancellation of the first term in the orthogonal series development of
\[
\int_0^1 \frac{B^2(t) - E B^2(t)}{(f(F^{-1}(t)))^2}\, dt
- \left( \int_0^1 \frac{B(t)}{f(F^{-1}(t))}\, dt \right)^2
- \left( \int_0^1 \frac{B(t)\, F^{-1}(t)}{f(F^{-1}(t))}\, dt \right)^2,
\]
$h(t) \equiv 1$ should be an eigenfunction of the operator
\[
Lh(t) := \int_0^1 \frac{s \wedge t - st}{f(F^{-1}(t))\, f(F^{-1}(s))}\, h(s)\, ds.
\]
Therefore the relation
\[
\int_0^1 \frac{s \wedge t - st}{f(F^{-1}(t))\, f(F^{-1}(s))}\, h(s)\, ds = \lambda h(t)
\]
should hold.

Now let $h(t) = g(F^{-1}(t))$; by differentiating twice in the relation above we obtain that $g$ must satisfy the equation
\[
g''(x) + l(x) g'(x) + l'(x) g(x) = -\frac{1}{\lambda}\, g(x), \tag{19}
\]
where $l(x) = \frac{d}{dx} \log f(x)$. If $g(x) = 1$ is a solution of (19), then $l'(x) = -\frac{1}{\lambda}$ and $l(x) = -\frac{1}{\lambda} x + b$, from which $\log f(x) = -\frac{1}{2\lambda} x^2 + bx + c$, and necessarily (under the additional hypothesis of standardization) $f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$.

Remark 3 The comparison of $R_n$ with the statistics of Shapiro-Wilk, Shapiro-Francia or de Wet-Venter could now be carried out from Theorem 9 through the results available in Leslie, Stephens and Fotopoulos [12] or in Verrill and Johnson [29].
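The final step of Remark 2 can be verified symbolically. The sketch below (ours; it assumes SymPy is available) checks that with $f$ the standard normal density one gets $l(x) = -x$, and that the constant function $g \equiv 1$ then solves (19) with $\lambda = 1$:

```python
import sympy as sp

x = sp.symbols('x')

# standard normal density and l(x) = d/dx log f(x)
f = sp.exp(-x**2 / 2) / sp.sqrt(2 * sp.pi)
l = sp.diff(sp.log(f), x)
print(sp.simplify(l))  # -x

# equation (19) with g(x) = 1 and lambda = 1:
# g'' + l g' + l' g + (1/lambda) g should vanish identically
g = sp.Integer(1)
residual = sp.diff(g, x, 2) + l * sp.diff(g, x) + sp.diff(l, x) * g + g
print(sp.simplify(residual))  # 0
```

Conversely, as the remark argues, demanding that $g \equiv 1$ be a solution forces $l$ to be affine, i.e. $f$ to be a normal density.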


2.3 Heavier tailed patterns

For distributions with heavier tails than the normal the situation is even more complex. An interesting fact, already observed in several senses ([4], [27], [17], [14]), is the bad behaviour of this kind of test for distributions with heavy tails. In fact, let us assume that $R_n^*$ is not shift tight, so that another normalizing sequence, $b_n = o(n)$, is necessary to achieve a nondegenerate law; i.e., let us assume that $b_n R_n^* - c_n \to_w V$, where $c_n \in \mathbb{R}$. Then, by Theorem 3, for every fixed $\delta > 0$ we have the approximation
\[
\left| \int_{\delta}^{1-\delta} \left( \frac{\alpha_n(t)}{f(F^{-1}(t))} \right)^2 dt
- \int_{\delta}^{1-\delta} \left( \frac{B_n(t)}{f(F^{-1}(t))} \right)^2 dt \right| \to 0
\]
and then
\[
\frac{b_n}{n} \int_{\delta}^{1-\delta} \left( \frac{\alpha_n(t)}{f(F^{-1}(t))} \right)^2 dt \to 0;
\]
thus (recall inequalities (11) and that $nR_n(2) \to_w \chi_1^2$) the statistic $b_n R_n^* - c_n$ has the same behaviour as
\[
\frac{b_n}{n} \left( \int_{[\delta, 1-\delta]^c} \left( F_n^{-1}(t) - F^{-1}(t) \right)^2 dt
- \left( \int_{[\delta, 1-\delta]^c} \left( F_n^{-1}(t) - F^{-1}(t) \right) F^{-1}(t)\, dt \right)^2 \right) - c_n.
\]
That is, it depends only on the tails of the distribution, so that a sample obtained from a distribution different from $F$ but with the same tails would be indistinguishable through this statistic. This phenomenon makes this kind of test undesirable for goodness-of-fit purposes.

However, in spite of the previous remark, it is possible to obtain the asymptotic distribution of this test in several cases. For instance, Lockhart in [13] and McLaren and Lockhart in [17] have obtained the asymptotic normality of the related correlation test of fit for the exponential, extreme value and logistic distributions at rate $\sqrt{\log n}$. Notice that the exponential case could be almost trivially treated from Theorem 5.4.3 ii) in [7], but this could even be considered fortuitous, because that proof does not generalize to other tails.

To end the paper we will provide an example showing that for distributions with heavier tails (but in $\mathcal{P}_2(\mathbb{R})$) we can obtain even nonnormal limits for $R_n^*$. This fact, as far as we know, was previously unknown.

Example 1 Let


\[
Q(x) = \begin{cases}
(\sqrt{x}\, \log x)^{-1}, & 0 < x < e^{-3}, \\
-(\sqrt{1-x}\, \log(1-x))^{-1}, & 1 - e^{-3} < x < 1.
\end{cases}
\]
We can assume that $Q$ is also defined in $[e^{-3}, 1 - e^{-3}]$, in such a way that it is a nondecreasing function of class $C^2$ in $(0,1)$ verifying $Q(1-x) = -Q(x)$ and $Q'(x) > 0$ for every $x \in (0,1)$, and $\int_0^1 Q^2(t)\, dt = 1$.

If we define $F = Q^{-1}$, then $F$ is a distribution function (and $Q$ its quantile function) with variance 1, which (as can be easily checked) verifies our Assumptions 1, 2 and 3. We will denote its density function by $f$.

We will analyze the behaviour of $R_n^*$ for this example in the following propositions.

Proposition 10 Let $\{X_n\}_n$ be a sequence of independent random variables with the distribution function just defined. If $\{\gamma_n^1\}_n$ and $\{\gamma_n^2\}_n$ are two independent sequences of independent random variables with exponential distribution with $E\gamma_j^i = 1$, and $S_i(x) = \sum_{1 \le j < x+1} \gamma_j^i$, $x \ge 1$, then
\[
(\log n)^2 \left( \int_0^1 \left( F_n^{-1}(t) - F^{-1}(t) \right)^2 dt - \frac{2}{\log(n+1)} \right) \to_w \Gamma, \tag{20}
\]
where
\[
\Gamma := \frac{1}{\gamma_1^1} - 4\sqrt{\frac{1}{\gamma_1^1}}
+ \int_1^{\infty} \frac{1}{u} \left( \left( \frac{S_1(u)}{u} \right)^{-1/2} - 1 \right)^2 du
+ \frac{1}{\gamma_1^2} - 4\sqrt{\frac{1}{\gamma_1^2}}
+ \int_1^{\infty} \frac{1}{u} \left( \left( \frac{S_2(u)}{u} \right)^{-1/2} - 1 \right)^2 du.
\]

PROOF: Let $\{B_n(t)\}_n$ be the sequence of Brownian bridges of Theorem 3. Then
\[
\left| \left( \frac{(\log n)^2}{n} \int_{\log n/n}^{1 - \log n/n} \frac{\alpha_n^2(t)}{(f(Q(t)))^2}\, dt \right)^{1/2}
- \left( \frac{(\log n)^2}{n} \int_{\log n/n}^{1 - \log n/n} \frac{B_n^2(t)}{(f(Q(t)))^2}\, dt \right)^{1/2} \right|
\]
\[
\le \left( \sup_{\frac{1}{n+1} \le t \le \frac{n}{n+1}} \frac{|\alpha_n(t) - B_n(t)|}{(t(1-t))^{1/2}} \right)
\left( \frac{(\log n)^2}{n} \int_{\log n/n}^{1 - \log n/n} \frac{t(1-t)}{(f(Q(t)))^2}\, dt \right)^{1/2} \overset{p}{\to} 0,
\]
because the first factor is $O_p(1)$ and


\[
\frac{(\log n)^2}{n} \int_{\log n/n}^{\delta} \frac{t(1-t)}{4 t^3 (\log t)^2}\, dt
\le \frac{(\log n)^2}{n} \int_{\log n/n}^{\delta} \frac{1}{4 t^2 (\log t)^2}\, dt \to 0.
\]
Moreover,
\[
E\left( \frac{(\log n)^2}{n} \int_{\log n/n}^{1 - \log n/n} \frac{B_n^2(t)}{(f(Q(t)))^2}\, dt \right)
= \frac{(\log n)^2}{n} \int_{\log n/n}^{1 - \log n/n} \frac{t(1-t)}{(f(Q(t)))^2}\, dt \to 0,
\]
and then
\[
(\log n)^2 \int_{\log n/n}^{1 - \log n/n} \left( F_n^{-1}(t) - F^{-1}(t) \right)^2 dt \overset{p}{\to} 0.
\]
By symmetry we will only consider the integral in the left tail, which we split into two pieces.

Since $Q$ varies regularly at 0 with exponent $-1/2$, it is well known (see e.g. [10]) that, using $Q(1/n)$ as normalizing constants, we have
\[
\frac{\log n}{\sqrt{n}}\, X_{1n} \to_w L_{1,2}.
\]
From this it is straightforward, using L'H\^opital's rule, to get
\[
(\log n)^2 \left( \int_0^{1/(n+1)} \left( F_n^{-1}(t) - F^{-1}(t) \right)^2 dt - \frac{1}{\log(n+1)} \right)
\]
\[
= (\log n)^2 \left( \frac{X_{1n}^2}{n+1} + \int_0^{1/(n+1)} \frac{1}{t (\log t)^2}\, dt
- 2 X_{1n} \int_0^{1/(n+1)} \frac{1}{\sqrt{t}\, \log t}\, dt - \frac{1}{\log(n+1)} \right)
\]
\[
= (\log n)^2 \left( \frac{X_{1n}^2}{n+1} - 2 X_{1n} \int_0^{1/(n+1)} \frac{1}{\sqrt{t}\, \log t}\, dt \right)
\;\to_w\; L_{1,2}^2 + 4 L_{1,2}.
\]
On the other hand, the fact that
\[
\lim_{t \to -\infty} \frac{|t|\, f(t)}{F(t)} = \lim_{t \to 0} \frac{|Q(t)|\, f(Q(t))}{t}
= \lim_{t \to 0} \frac{|Q(t)|}{t\, Q'(t)} = 2
\]
permits applying Theorem 6.4.5 ii) in [7] (take $p = \gamma = 2$, $\beta = 0$, $L \equiv 1$) to obtain
\[
(\log n)^2 \int_{1/(n+1)}^{\log n/n} \left( F_n^{-1}(t) - F^{-1}(t) \right)^2 dt
\;\to_w\; \int_1^{\infty} \frac{1}{u} \left( \left( \frac{\tilde{S}(u)}{u} \right)^{-1/2} - 1 \right)^2 du,
\]


where $\tilde{S}(u) := \sum_{1 \le j < u} \gamma_j$, $u \ge 1$, for a sequence $\{\gamma_j\}_j$ of i.i.d. exponentially distributed r.v.'s with $E\gamma_j = 1$.

Finally, taking into account the simultaneous character of the approximations used to prove the convergences above (based on Lemma 3.0.1 in [7]), standard arguments about the asymptotic independence of functions of order statistics, like Rossberg's lemma (see e.g. Lemma 5.1.4 in [7]), and some elementary calculus on distributions, we get (20). $\Box$

As was already observed, $nR_n(2) \to_w \chi_1^2$, so that $(\log n)^2 R_n(2) \overset{p}{\to} 0$. On the other hand, from the computations in the proof above it is simple, by using the Schwarz inequality and inequalities (11), to show that
\[
(\log n)^2 R_n(3) = (\log n)^2 \left( \int_0^1 \left( F_n^{-1}(t) - F^{-1}(t) \right) F^{-1}(t)\, dt \right)^2 \tag{21}
\]
\[
\approx (\log n)^2 \left( \int_0^{1/n} \left( F_n^{-1}(t) - F^{-1}(t) \right) F^{-1}(t)\, dt
+ \int_{1 - 1/n}^{1} \left( F_n^{-1}(t) - F^{-1}(t) \right) F^{-1}(t)\, dt \right)^2 \tag{22}
\]
\[
\overset{p}{\to} 4.
\]
This completes the proof of the following consequence of Proposition 10.

Proposition 11 With the notation and hypotheses of Proposition 10 we have
\[
(\log n)^2 \left( S_n^2 R_n - \frac{2}{\log(n+1)} \right) \to_w \Gamma - 4. \tag{23}
\]
Obtaining the asymptotic behaviour of $R_n$ from that of $S_n^2 R_n$ is not as easy now as in the previously considered cases. What is obvious from (23) is that
\[
(\log n)^2 \left( R_n - \frac{2}{S_n^2 \log(n+1)} \right) \to_w \Gamma - 4, \tag{24}
\]
but to get the behaviour of $S_n^2 \log(n+1)$ is far from trivial, although the conclusion, given in the following proposition, is amusing: the inclusion of $S_n^2$ contributes to the asymptotic law of $R_n$ by just cancelling the $-4$ summand arising from $R_n(3)$, thus retrieving the original asymptotic law of $R_n(1)$.
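Two elementary facts about Example 1 that drive these results can be confirmed numerically: the regular variation of $Q$ at 0 with exponent $-1/2$ (the convergence is logarithmically slow, which is consistent with the $\log n$ rates above), and the exact upper-tail identity $(Q(1-t/2))^2 = 2/(t(\log t - \log 2)^2)$ used in the proof of Proposition 12. A small sketch (ours; plain Python):

```python
import math

def Q_left(t):
    # lower-tail branch of the quantile function of Example 1, 0 < t < e**-3
    return 1.0 / (math.sqrt(t) * math.log(t))

def Q_right(t):
    # upper-tail branch, 1 - e**-3 < t < 1
    return -1.0 / (math.sqrt(1 - t) * math.log(1 - t))

# regular variation at 0 with exponent -1/2: Q(lam*t)/Q(t) -> lam**-0.5,
# here lam = 0.5, so the ratio slowly approaches 2**0.5 = 1.4142...
lam = 0.5
for t in (1e-4, 1e-8, 1e-16):
    print(Q_left(lam * t) / Q_left(t))

# exact identity for the upper-tail quantile of X^2
t = 1e-6
lhs = Q_right(1 - t / 2) ** 2
rhs = 2 / (t * (math.log(t) - math.log(2)) ** 2)
print(abs(lhs - rhs) / rhs)  # ~0 up to floating-point rounding
```

The slowly varying factor $\log t$ is what separates this example from a pure power tail: the normalization $Q(1/n) = -\sqrt{n}/\log n$ carries the extra logarithm that appears in Propositions 10-12.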


Proposition 12 With the notation and hypotheses of Proposition 10 we have
\[
(\log n)^2 \left( R_n - \frac{2}{\log(n+1)} \right) \to_w \Gamma. \tag{25}
\]

PROOF: First note that, for small enough $t$, the quantile function $Q_{X^2}$ of the square of the random variables $X_1, X_2, \ldots, X_n$ verifies
\[
Q_{X^2}(1-t) = \inf\{x : t \ge P(X_i^2 > x)\}
= \inf\left\{ x : F(\sqrt{x}) \ge 1 - \frac{t}{2} \right\}
= \left( Q\left( 1 - \frac{t}{2} \right) \right)^2
= \frac{2}{t\, (\log t - \log 2)^2}.
\]
Therefore $Q_{X^2}(1-t)$ is regularly varying at 0 with exponent $-1$, and the central limit theory allows us to assert that $(Q_{X^2}(1 - 1/n))^{-1} \sum_{i=1}^{n} X_i^2$ is shift tight, hence
\[
\frac{(\log n)^2}{n} \sum_{i=1}^{n} X_i^2 - b_n
\]
is weakly convergent, where $b_n = (\log n)^2\, E\left( X_i^2\, I_{\{X_1^2 \le n/(\log n)^2\}} \right)$. From this it is obvious that
\[
\log n\, (S_n^2 - 1)
\approx \log n \left( \frac{1}{n} \sum_{i=1}^{n} X_i^2 - \frac{b_n}{\log n} \right)
+ \frac{b_n}{\log n} - \log n
\overset{p}{\approx} \frac{b_n}{\log n} - \log n.
\]
The last expression is
\[
\frac{b_n}{\log n} - \log n
= -\log n\, E\left( X_i^2\, I_{\{X_1^2 > n/(\log n)^2\}} \right)
= -2 \log n \int_0^{1/n} \frac{1}{x (\log x)^2}\, dx \to -2
\]
by L'H\^opital's rule, hence $\log n\, (S_n^2 - 1) \overset{p}{\to} -2$ and
\[
(\log n)^2 \left( \frac{2}{S_n^2 \log(n+1)} - \frac{2}{\log(n+1)} \right)
= \frac{2 (\log n)^2}{S_n^2 \log(n+1)}\, (1 - S_n^2) \overset{p}{\to} 4,
\]
which, together with (24), shows (25). $\Box$

References

[1] Bickel, P. and van Zwet, W.R. (1978). Asymptotic expansions for the power of distribution free tests in the two-sample problem. Ann. Statist. 6, 937-1004.


[2] Bickel, P. and Freedman, D. (1981). Some asymptotic theory for the bootstrap. Ann. Statist. 9, 1196-1217.
[3] Brown, B. and Hettmansperger, T. (1996). Normal scores, normal plots, and tests for normality. J. Amer. Statist. Assoc. 91, 1668-1675.
[4] Csörgő, M. (1983). Quantile Processes with Statistical Applications. SIAM, Philadelphia.
[5] Csörgő, M. and Révész, P. (1981). Strong Approximations in Probability and Statistics. Academic Press, New York.
[6] Csörgő, M. and Horváth, L. (1988). On the distributions of Lp norms of weighted uniform empirical and quantile processes. Ann. Probab. 16, 142-161.
[7] Csörgő, M. and Horváth, L. (1993). Weighted Approximations in Probability and Statistics. John Wiley and Sons.
[8] Csörgő, M., Horváth, L. and Shao, Q.-M. (1993). Convergence of integrals of uniform empirical and quantile processes. Stochastic Process. Appl. 45, 283-294.
[9] Cuesta-Albertos, J.A., Matrán, C., Rachev, S.T. and Rüschendorf, L. (1996). Mass transportation problems in Probability Theory. Math. Scientist 21, 34-72.
[10] Galambos, J. (1987). The Asymptotic Theory of Extreme Order Statistics, 2nd ed. Krieger, Melbourne, Florida.
[11] Leslie, J.R. (1984). Asymptotic properties and new approximations for both the covariance matrix of normal order statistics and its inverse. Colloquia Mathematica Societatis János Bolyai 45, 317-354.
[12] Leslie, J.R., Stephens, M.A. and Fotopoulos, S. (1986). Asymptotic distribution of the Shapiro-Wilk W for testing for normality. Ann. Statist. 14, 1497-1506.
[13] Lockhart, R.A. (1985). The asymptotic distribution of the correlation coefficient in testing fit to the exponential distribution. Canad. J. Statist. 13, 253-256.
[14] Lockhart, R.A. (1991). Overweight tails are inefficient. Ann. Statist. 19, 2254-2258.
[15] Lockhart, R.A. and Stephens, M.A. (1996). The Probability Plot: Test of Fit Based on the Correlation Coefficient. Preprint.
[16] Mason, D. and Shorack, G. (1992). Necessary and sufficient conditions for asymptotic normality of L-statistics. Ann. Probab. 20, 1779-1804.


[17] McLaren, C.G. and Lockhart, R. (1987). On the asymptotic efficiency of certain correlation tests of fit. Canad. J. Statist. 15, 159-167.
[18] Rachev, S.T. (1991). Probability Metrics and the Stability of Stochastic Models. Wiley.
[19] Royston, J. (1982). An extension of Shapiro and Wilk's W test for normality to large samples. Appl. Statist. 31, 115-124.
[20] Sarkadi, K. (1975). The consistency of the Shapiro-Francia test. Biometrika 62, 445-450.
[21] Serfling, R.J. (1980). Approximation Theorems of Mathematical Statistics. Wiley, New York.
[22] Shapiro, S. and Francia, R. (1972). An approximate analysis of variance test for normality. J. Amer. Statist. Assoc. 67, 215-216.
[23] Shapiro, S. and Wilk, M. (1965). An analysis of variance test for normality (complete samples). Biometrika 52, 591-611.
[24] Shapiro, S. and Wilk, M. (1968). Approximation for the null distribution of the W statistic. Technometrics 10, 861-866.
[25] Shorack, G. and Wellner, J. (1986). Empirical Processes with Applications to Statistics. John Wiley and Sons.
[26] Stephens, M.A. (1975). Asymptotic properties for covariance matrices of order statistics. Biometrika 62, 23-28.
[27] Stephens, M.A. (1986). Tests based on regression and correlation. In Goodness-of-Fit Techniques (R.B. D'Agostino and M.A. Stephens, eds.), 195-233. North-Holland, Amsterdam.
[28] Vallender, S. (1973). Calculation of the Wasserstein distance between probability distributions on the line. Theory Probab. Appl. 18, 785-786.
[29] Verrill, S. and Johnson, R. (1987). The asymptotic equivalence of some modified Shapiro-Wilk statistics: complete and censored sample cases. Ann. Statist. 15, 413-419.
[30] de Wet, T. and Venter, J. (1972). Asymptotic distributions of certain test criteria of normality. South African Statistical Journal 6, 135-149.
[31] de Wet, T. and Venter, J. (1973). Asymptotic distributions for quadratic forms with applications to tests of fit. Ann. Statist. 2, 380-387.