
To appear in the Journal of Nonparametric Statistics, Vol. 00, No. 00, Month 20XX, 1–25

Cramér–von Mises distance: Probabilistic interpretation, confidence intervals, and neighborhood-of-model validation

L. Baringhaus^a and N. Henze^b*

^a Institut für Mathematische Stochastik, Leibniz Universität Hannover, Postfach 6009, D-30060 Hannover, Germany; ^b Karlsruher Institut für Technologie (KIT), Institut für Stochastik, Englerstraße 2, D-76131, Germany

(Received 00 Month 20XX; accepted 00 Month 20XX)

* Corresponding author. Email: [email protected]

We give a probabilistic interpretation of the Cramér–von Mises distance ∆(F, F0) = ∫ (F − F0)² dF0 between continuous distribution functions F and F0. If F is unknown, we construct an asymptotic confidence interval for ∆(F, F0) based on a random sample from F. Moreover, for given F0 and some value ∆0 > 0, we propose an asymptotic equivalence test of the hypothesis that ∆(F, F0) ≥ ∆0 against the alternative ∆(F, F0) < ∆0. If such a 'neighborhood-of-F0 validation test', carried out at a small asymptotic level, rejects the hypothesis, there is evidence that F is within a distance ∆0 of F0. As a neighborhood-of-exponentiality test shows, the method may be extended to the case that H0 is composite.

Keywords: representation of Cramér–von Mises distance; confidence interval for the Cramér–von Mises distance; equivalence testing; neighborhood-of-exponentiality test; model validation

AMS Subject Classification: 62G10; 62G20

1. Introduction

The Cramér–von Mises distance

∆(F, F0) = ∫_{-∞}^{∞} (F(x) − F0(x))² dF0(x)    (1.1)

between continuous distribution functions is one of the distinguished measures of deviation between distributions. At first sight, ∆(F, F0) seems to be difficult to interpret, but we will show that 2/3 + ∆(F, F0) can be regarded as a probability


involving four independent random variables, two of which have distribution function (df) F, while the other two follow df F0.

For testing the hypothesis

H0 : F = F0

that an unknown continuous df F equals some given continuous df F0, based on an independent and identically distributed sample X1, X2, ..., Xn from F, there is the time-honored Cramér–von Mises statistic

ω_n² = n ∆(Fn, F0) = n ∫_{-∞}^{∞} (Fn(x) − F0(x))² dF0(x).

Here, Fn(x) = n⁻¹ Σ_{j=1}^n 1{Xj ≤ x} is the empirical distribution function (edf) of X1, ..., Xn, and 1{A} denotes the indicator function of an event A (see, e.g., Csörgő and Faraway (1996) for a short historical account, especially with respect to the contributions of Smirnov (1936) and Smirnov (1937)).

The Cramér–von Mises test, which rejects H0 for large values of ω_n², is one of the most prominent goodness-of-fit tests. The test statistic has the computationally simple form

ω_n² = 1/(12n) + Σ_{j=1}^n ( F0(X_(j)) − (2j − 1)/(2n) )²,

where X_(1) ≤ ... ≤ X_(n) are the order statistics of X1, ..., Xn. The H0-limit distribution of ω_n² is well known (see, e.g., Anderson and Darling (1952)), and the finite-sample distribution of ω_n² approaches this limit very quickly (see, e.g., Csörgő and Faraway (1996), p. 229, for extensive tables of quantiles of ω_n²).
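As an illustration (not part of the original paper), the computational formula above can be evaluated directly once F0 is specified; the following Python sketch does so for a sample and a user-supplied F0, with the uniform df on [0, 1] used only as an assumed example.

import numpy as np

def cramer_von_mises_stat(x, F0):
    """omega_n^2 = 1/(12n) + sum_j (F0(X_(j)) - (2j-1)/(2n))^2."""
    x = np.sort(np.asarray(x, dtype=float))       # order statistics X_(1) <= ... <= X_(n)
    n = x.size
    j = np.arange(1, n + 1)
    return 1.0 / (12.0 * n) + np.sum((F0(x) - (2.0 * j - 1.0) / (2.0 * n)) ** 2)

# Assumed example: H0 that the data are uniform on [0, 1].
rng = np.random.default_rng(0)
print(cramer_von_mises_stat(rng.uniform(size=50), F0=lambda t: t))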

The limit distribution of ω_n² under fixed alternatives is also known, see, e.g., Shorack and Wellner (1986). For an elementary derivation we refer to Angus (1983). To state the result, consider any continuous df F ≠ F0, and rewrite (1.1) as ∆(F) := ∆(F, F0), thus suppressing the dependence on F0. Under the alternative F,

√n ( ω_n²/n − ∆(F) ) →_D N(0, σ²(F)),    (1.2)

where →_D denotes convergence in distribution,

σ²(F) = 4 ∫∫ (F(x) − F0(x)) (F(y) − F0(y)) [F(x∧y) − F(x)F(y)] dF0(x) dF0(y),    (1.3)

and x∧y is shorthand for min(x, y). Here and in what follows, an unspecified integral is over R.


If σ²(F) > 0 and σ_n² denotes a consistent estimator of σ²(F), Slutsky's Lemma yields

(√n/σ_n) ( ω_n²/n − ∆(F) ) →_D N(0, 1).    (1.4)

The paper is organized as follows. In Section 2 we give an interpretation of 2/3 + ∆(F, F0) in terms of a probability involving four independent random variables, of which two follow the df F and the other two the df F0. In Section 3, we use (1.2) to approximate the power function of the Cramér–von Mises test. Moreover, (1.4) yields asymptotic confidence intervals for ∆(F). Even more important is the fact that we may construct an asymptotic equivalence test of the hypothesis H_∆0: ∆(F) ≥ ∆0 versus the alternative K_∆0: ∆(F) < ∆0. Here, ∆0 is a given value that defines a neighborhood N0 := {F : ∆(F) < ∆0} of F0. If α is a small positive number and an asymptotic level-α test of H_∆0 versus K_∆0 leads to the rejection of H_∆0, there is much evidence that the unknown underlying distribution belongs to the above neighborhood N0 and in this sense is 'sufficiently close to F0'. For more information on equivalence testing, especially bioequivalence testing, we refer to Wellek (2010) and Romano (2005). The latter paper derives asymptotically optimal equivalence tests for a real-valued functional within a regular parametric family of distributions. Moreover, Munk and Czado (1998) and Freitag and Munk (2005) study equivalence testing for equality of two distributions and for structural relationships in a semiparametric two-sample context, respectively, based on trimmed versions of the Mallows distance.

Section 4 shows that the approach also covers the more general case of a composite hypothesis H0, provided that the limit distribution of the Cramér–von Mises statistic under fixed alternatives to H0 is available. Section 5 considers a real-data example, and Section 6 contains some concluding remarks. For the sake of readability, lengthy proofs are deferred to Section 7.

2. A probabilistic representation of the Cramér–von Mises distance

It is sometimes argued that ∆(F, F0), being a weighted L2-distance between distribution functions, is difficult to interpret. The following result shows that 2/3 + ∆(F, F0) is nothing but a probability involving four independent random variables; this probability attains its minimum value 2/3 if, and only if, the random variables are identically distributed.

Theorem 1 Let X1, X2, Z1, Z2 be independent random variables, where X1, X2 have continuous df F and Z1, Z2 have continuous df F0. Then

∆(F, F0) = P(X1 ∨ X2 < Z1) + P(Z1 ∨ Z2 < X1) − 2/3,

3

Page 4: Cram´er–von Mises distance: Probabilistic interpretation ...henze/seite/aiwz/media/gnst-2015-09-01-rev... · Cram´er–von Mises distance: Probabilistic interpretation, ... portant

March 12, 2016 Journal of Nonparametric Statistics GNST-2015-09-01-rev-1

where x ∨ y is shorthand for max(x, y).

Proof. Notice that Fubini’s theorem yields

∆(F, F0) = ∫ (F(x) − F0(x))² dF0(x)
= ∫ E[ (1{X1 ≤ x} − 1{Z1 ≤ x})(1{X2 ≤ x} − 1{Z2 ≤ x}) ] dF0(x)
= E[ ∫ (1{X1 ≤ x} − 1{Z1 ≤ x})(1{X2 ≤ x} − 1{Z2 ≤ x}) dF0(x) ]
= E[ ∫ (1{X1 ∨ X2 ≤ x} + 1{Z1 ∨ Z2 ≤ x} − 1{Z1 ∨ X2 ≤ x} − 1{X1 ∨ Z2 ≤ x}) dF0(x) ].

Since, by the continuity of F0, we have that ∫ 1{X1 ∨ X2 ≤ x} dF0(x) = 1 − F0(X1 ∨ X2) and the other integrals lead to similar expressions, it follows that

∆(F, F0) = E(W),

where

W = F0(X1 ∨ Z2) + F0(X2 ∨ Z1) − F0(X1 ∨ X2) − F0(Z1 ∨ Z2).    (2.1)

In what follows, let Z0 be independent of X1, X2, Z1, Z2 and have the distribution function F0. By conditioning on X1, Z2 and using the fact that the joint distribution of (Z0, Z2) is that of (Z1, Z2), we have

E[F0(X1 ∨ Z2)] = P(Z0 < X1 ∨ Z2)
= P(Z0 < X1 < Z2) + P(Z0 < Z2 < X1) + P(X1 < Z0 < Z2) + P(Z2 < Z0 < X1)
= P(Z1 < X1 < Z2) + P(Z1 < Z2 < X1) + P(X1 < Z1 < Z2) + P(Z2 < Z1 < X1).

By symmetry, it follows that E[F0(X2 ∨ Z1)] = E[F0(X1 ∨ Z2)]. Moreover,

E[F0(X1 ∨ X2)] = P(Z0 < X1 ∨ X2)
= P(X1 < Z0 < X2) + P(X2 < Z0 < X1) + P(Z0 < X1 < X2) + P(Z0 < X2 < X1)
= P(X1 < Z1 < X2) + P(X2 < Z1 < X1) + P(Z1 < X1 < X2) + P(Z1 < X2 < X1).


Since E[F0(Z1 ∨ Z2)] = P(Z0 < Z1 ∨ Z2) = 2/3, (2.1) and symmetry give

E(W) = P(Z1 ∧ Z2 < X1 < Z1 ∨ Z2) + 2 P(Z1 ∨ Z2 < X1) + P(X1 < Z1 ∧ Z2)
− P(X1 ∧ X2 < Z1 < X1 ∨ X2) − P(Z1 < X1 ∧ X2) − 2/3.

Since 1 = P(Z1 ∧ Z2 < X1 < Z1 ∨ Z2) + P(Z1 ∨ Z2 < X1) + P(X1 < Z1 ∧ Z2) and P(X1 ∧ X2 < Z1 < X1 ∨ X2) + P(Z1 < X1 ∧ X2) = 1 − P(X1 ∨ X2 < Z1), the result follows.
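Theorem 1 can be checked by simulation. The sketch below (an illustration under assumed choices, not code from the paper) takes F(t) = t^γ and F0(t) = t on [0, 1] and compares the Monte Carlo estimate of P(X1 ∨ X2 < Z1) + P(Z1 ∨ Z2 < X1) − 2/3 with the direct evaluation of (1.1), which equals 1/(2γ+1) − 2/(γ+2) + 1/3 for this pair of dfs.

import numpy as np

rng = np.random.default_rng(1)
gamma, m = 1.5, 1_000_000
x1, x2 = rng.uniform(size=(2, m)) ** (1.0 / gamma)   # X1, X2 with df F(t) = t^gamma
z1, z2 = rng.uniform(size=(2, m))                    # Z1, Z2 with df F0(t) = t

mc = np.mean(np.maximum(x1, x2) < z1) + np.mean(np.maximum(z1, z2) < x1) - 2.0 / 3.0
exact = 1.0 / (2 * gamma + 1) - 2.0 / (gamma + 2) + 1.0 / 3.0   # Delta(F, F0) from (1.1)
print(mc, exact)                                     # the two numbers should nearly agree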

Remark 1 (i) Theorem 1 shows that

∆(F, F0) = ∫ (F − F0)² dF0 = ∫ (F0 − F)² dF = ∆(F0, F).

At first glance, this symmetry looks somewhat surprising. It follows, however, also from

0 = (1/3) ∫ d(F − F0)³ = ∫ (F − F0)² d(F − F0) = ∫ (F − F0)² dF − ∫ (F − F0)² dF0.

(ii) Using integration by parts, we have

8/3 = (1/3) ∫ d(F + F0)³ = ∫ (F + F0)² d(F + F0),

whence

8/3 + 2∆(F, F0) = ∫ (F + F0)² d(F + F0) + ∫ (F − F0)² d(F + F0)
= 2 ∫ F² d(F + F0) + 2 ∫ F0² d(F + F0)
= 4/3 + 2 ∫ F² dF0 + 2 ∫ F0² dF.

Therefore,

∆(F, F0) = ∫ F² dF0 + ∫ F0² dF − 2/3.

By noting that

∫ F² dF0 = P(X1 ∨ X2 < Z1) and ∫ F0² dF = P(Z1 ∨ Z2 < X1),

we have another proof of Theorem 1.

(iii) In comparison to Theorem 1, it is interesting to note that Lehmann (1951)


based a nonparametric two-sample test on the measure of deviation

∫ (F − F0)² d(F + F0) = P(X1 ∨ X2 < Z1 ∧ Z2) + P(Z1 ∨ Z2 < X1 ∧ X2) − 1/3

(see also Sundrum (1954)).

3. The case of a simple hypothesis

This section exploits statistical applications of (1.2). In what follows, it is indispensable to have a consistent estimator σ_n² of σ²(F) figuring in (1.2). Such an estimator is obtained if F in (1.3) is replaced throughout by the edf Fn. Putting Uj := F0(Xj), U_(j) := F0(X_(j)), j = 1, ..., n, and Gn(t) = n⁻¹ Σ_{j=1}^n 1{Uj ≤ t}, 0 ≤ t ≤ 1, we have

σ_n² = 4 ∫_0^1 ∫_0^1 (Gn(s) − s)(Gn(t) − t)(Gn(s∧t) − Gn(s)Gn(t)) ds dt.    (3.1)

To state an expression for σ_n² in terms of U_(1), ..., U_(n) that is suitable for computations, let

S := Σ_{i=1}^n Σ_{j=1}^n (1 − U_(j))(1 − U_(i) ∨ U_(j)) + Σ_{i=1}^n Σ_{j<k} (1 − U_(k))(1 − U_(i) ∨ U_(k)) + Σ_{i=1}^n Σ_{k<j} (1 − U_(j))(1 − U_(i) ∨ U_(k)).

Proposition 1 Putting

U_k := (1/n) Σ_{i=1}^n U_(i)^k,    V_k := Σ_{i=1}^n (i − 1) U_(i)^k,    k ≥ 1,    (3.2)

we have

σ_n²/4 = S/n³ − 1 + (1/n)(2U_1 + U_1·U_2 − U_3) + U_4/4 − U_2²/4
+ (4V_1 − V_3 − U_1² + 2V_1 U_2)/n² − 4U_1 V_1/n³ − 4V_1²/n⁴ − (1/n²) Σ_{j=1}^{n−1} Σ_{i=j+1}^{n} U_(i) U_(j)².

The proof of Proposition 1 is given in Section 7.
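Instead of the closed-form expression of Proposition 1, the estimator σ_n² can also be obtained by evaluating the double integral (3.1) numerically. The following Python sketch does this on a grid; it is an illustration only and is not meant to reproduce the authors' implementation.

import numpy as np

def sigma2_n(x, F0, grid_size=400):
    """Grid evaluation of (3.1), with Gn the edf of U_j = F0(X_j)."""
    u = np.sort(F0(np.asarray(x, dtype=float)))
    t = (np.arange(grid_size) + 0.5) / grid_size            # midpoint grid on (0, 1)
    Gn = np.searchsorted(u, t, side='right') / u.size       # Gn evaluated on the grid
    d = Gn - t
    Gmin = np.minimum.outer(Gn, Gn)                         # Gn(s ^ t), since Gn is nondecreasing
    return 4.0 * np.mean(np.outer(d, d) * (Gmin - np.outer(Gn, Gn)))

# Assumed example: F0 uniform on [0, 1]; the value can be plugged into (1.4).
rng = np.random.default_rng(2)
print(sigma2_n(rng.uniform(size=100) ** 0.5, F0=lambda s: s))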


Remark 2 Angus (1983) stated his result (1.2) without discussing whether the variance σ²(F) of the limiting normal distribution is positive. Since

F(x∧y) − F(x)F(y) = ∫ (1{u ≤ x} − F(x))(1{u ≤ y} − F(y)) dF(u),   x, y ∈ R,

we obtain by Fubini's theorem that σ²(F) given in (1.3) takes the form

σ²(F) = 4 ∫ ( ∫ (F(x) − F0(x))(1{u ≤ x} − F(x)) dF0(x) )² dF(u).    (3.3)

Choosing F = Fn, the edf of X1, ..., Xn, we see that σ_n² = σ²(Fn) ≥ 0. If F and F0 have supports that are disjoint closed intervals, for example, it follows from (1.3) that σ²(F) = 0. This observation contradicts an assertion of Tiago de Oliveira (1987). Recognizing that ω_n² is a V-statistic, this author uses the standard asymptotic theory of V-statistics, see, e.g., Serfling (1980), to obtain the limit distribution of ω_n² for each fixed alternative. Starting from the alternative formula for the variance given there, the author argues that the variance vanishes if, and only if, F = F0. In fact, if F and F0 are continuous and mutually absolutely continuous with the same (closed) interval as support, then (3.3) readily shows that σ²(F) = 0 if, and only if, F = F0. In the remainder of this section, only alternative distributions F with σ²(F) > 0 are considered.

From (1.2) we obtain the following approximation of the power of the Cramér–von Mises test against a fixed alternative distribution function F.

Corollary 1 Let cn = cn(α) be the critical value of the Cramér–von Mises test, carried out at level α. Then, under a fixed alternative distribution function F, the power P_F(ω_n² > cn) may be approximated by

P_F(ω_n² > cn) ≈ 1 − Φ( (√n/σ(F)) (cn/n − ∆(F)) ),    (3.4)

where Φ denotes the distribution function of the standard normal distribution.

Proof. The result follows immediately from

P_F(ω_n² > cn) = P_F( (√n/σ(F)) (ω_n²/n − ∆(F)) > (√n/σ(F)) (cn/n − ∆(F)) )
≈ 1 − Φ( (√n/σ(F)) (cn/n − ∆(F)) ).

Example 1 To give an impression of the quality of the approximation provided by the right-hand side of (3.4), we consider the case where F0(t) = t, 0 ≤ t ≤ 1, and Hγ(t) := t^γ, 0 ≤ t ≤ 1, is the alternative distribution function parametrized by the parameter γ > 0; see also Shorack and Wellner (1986), Exercise 4.4.3. Then

∆(Hγ) = ∫_0^1 (t^γ − t)² dt = 1/(2γ + 1) − 2/(γ + 2) + 1/3.

Moreover, straightforward calculations give

σ²(Hγ) = 4 ∫_0^1 ∫_0^1 (s^γ − s)(t^γ − t)(s^γ ∧ t^γ − s^γ t^γ) ds dt
= 4 [ (2/(2γ + 1)) (1/(3γ + 2) − 1/(2γ + 3)) − (2/(γ + 2)) (1/(2γ + 3) − 1/(γ + 4)) − (1/(2γ + 1) − 1/(γ + 2))² ].

Table 1 displays the empirical power of the Cramér–von Mises test against alternative df's Hγ for several values of γ and sample sizes n = 20, n = 50, n = 100 and n = 200. The nominal level is 0.05. Each entry in the lines denoted by 'MC' is based on a Monte Carlo simulation with 25 000 replications. The lines denoted by 'App' show the corresponding approximations given by the right-hand side of (3.4). In each case, the approximation seems to be a lower bound for the true power.

  n        γ:  0.3  0.5  0.7  0.8  0.9  1.1  1.2  1.3  1.5  1.7  1.9
  20  MC       .99  .73  .28  .14  .07  .06  .10  .15  .32  .52  .70
  20  App      .95  .65  .16  .01  .00  .00  .00  .03  .21  .43  .61
  50  MC        1   .98  .56  .27  .10  .08  .18  .34  .69  .91  .98
  50  App       1   .93  .50  .16  .00  .00  .05  .24  .62  .84  .94
  100 MC        1    1   .85  .48  .14  .12  .33  .61  .94   1    1
  100 App       1   .99  .77  .40  .02  .01  .24  .54  .88  .98   1
  200 MC        1    1   .99  .76  .24  .20  .60  .90   1    1    1
  200 App       1    1   .94  .69  .12  .07  .52  .82  .99   1    1

Table 1. Empirical and approximated power against several alternatives (α = 0.05), rounded to 2 decimal places
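The closed-form expressions of Example 1 translate directly into code. The sketch below (illustration only) evaluates ∆(Hγ), σ²(Hγ) and the approximation (3.4); the critical value used in the example call is an assumed value close to the asymptotic 5% point of the null distribution of ω_n², not a quantity taken from the paper.

import numpy as np
from scipy.stats import norm

def delta_H(g):
    # Delta(H_gamma) = 1/(2g+1) - 2/(g+2) + 1/3
    return 1.0 / (2 * g + 1) - 2.0 / (g + 2) + 1.0 / 3.0

def sigma2_H(g):
    # closed-form sigma^2(H_gamma) from Example 1
    return 4.0 * (2.0 / (2 * g + 1) * (1.0 / (3 * g + 2) - 1.0 / (2 * g + 3))
                  - 2.0 / (g + 2) * (1.0 / (2 * g + 3) - 1.0 / (g + 4))
                  - (1.0 / (2 * g + 1) - 1.0 / (g + 2)) ** 2)

def approx_power(g, n, c_n):
    # right-hand side of (3.4)
    return 1.0 - norm.cdf(np.sqrt(n / sigma2_H(g)) * (c_n / n - delta_H(g)))

print(approx_power(1.5, 100, c_n=0.461))   # roughly .88, in line with the 'App' row of Table 1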

From (1.4), we obtain the following asymptotic confidence interval for ∆(F ).

Corollary 2 Given α ∈ (0, 1), let uα = Φ⁻¹(1 − α/2) be the (1 − α/2)-quantile of the standard normal distribution. Then

In := [ ω_n²/n − uα σ_n/√n ,  ω_n²/n + uα σ_n/√n ]


is an asymptotic confidence interval for ∆(F) at level 1 − α, i.e., we have

lim_{n→∞} P_F(In ∋ ∆(F)) = 1 − α.

For illustration, we pick up Example 1. Table 2 gives the empirical coverage probability of In for ∆(Hγ) in the case α = 0.05 and n ∈ {20, 50, 100} for various values of γ. Each of the entries in Table 2 is based on 10 000 replications.

  n        γ:  0.2  0.4  0.6  0.8  1.0  1.2  1.4  1.6  1.8  2.0
  20           .89  .90  .89  .99  .98  .98  .93  .90  .90  .90
  50           .93  .93  .91  .94  .99  .97  .91  .92  .93  .93
  100          .94  .94  .93  .90  .99  .93  .92  .93  .94  .94

Table 2. Empirical coverage probability for ∆(Hγ) (10 000 replications), rounded to 2 decimal places, α = 0.05
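Given ω_n² and a variance estimate σ_n², the interval In of Corollary 2 is a one-line computation; the sketch below is an illustration with assumed input values rather than output from the paper.

import numpy as np
from scipy.stats import norm

def cvm_confidence_interval(omega2, s2, n, alpha=0.05):
    """Asymptotic level-(1 - alpha) confidence interval I_n for Delta(F), Corollary 2."""
    half = norm.ppf(1.0 - alpha / 2.0) * np.sqrt(s2 / n)    # u_alpha * sigma_n / sqrt(n)
    return omega2 / n - half, omega2 / n + half

print(cvm_confidence_interval(omega2=0.8, s2=0.004, n=100))   # assumed illustrative inputs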

A further application of (1.4) is an equivalence testing procedure or 'neighborhood-of-F0 validation procedure' that tests, for some given positive value ∆0, the hypothesis

H_∆0: ∆(F) ≥ ∆0   versus   K_∆0: ∆(F) < ∆0.

Theorem 2 Let α ∈ (0, 1). The test that rejects H_∆0 if

ω_n²/n ≤ ∆0 − (σ_n/√n) Φ⁻¹(1 − α)

is an asymptotic level-α test of H_∆0 versus K_∆0. This test is consistent against each alternative distribution.

Proof. Using (1.4) we have for each F ∈ H_∆0

limsup_{n→∞} P_F( ω_n²/n ≤ ∆0 − (σ_n/√n) Φ⁻¹(1 − α) ) = limsup_{n→∞} P_F( (√n/σ_n)(ω_n²/n − ∆0) ≤ Φ⁻¹(α) ) ≤ α.

In particular,

lim_{n→∞} P_F( ω_n²/n ≤ ∆0 − (σ_n/√n) Φ⁻¹(1 − α) ) = α

for each F such that ∆(F) = ∆0. Therefore, the test has asymptotic level α. It is easy to see that

lim_{n→∞} P_F( ω_n²/n ≤ ∆0 − (σ_n/√n) Φ⁻¹(1 − α) ) = 1

if ∆(F) < ∆0. Thus, this test is consistent against each fixed alternative.
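The decision rule of Theorem 2 is equally simple to apply once ω_n² and σ_n² are available (for instance from the sketches above). The following illustration uses assumed input values.

import numpy as np
from scipy.stats import norm

def neighborhood_validation_test(omega2, s2, n, delta0, alpha=0.05):
    """Reject H_{Delta0}: Delta(F) >= Delta0 if omega_n^2/n <= Delta0 - sigma_n/sqrt(n) * Phi^{-1}(1-alpha)."""
    threshold = delta0 - np.sqrt(s2 / n) * norm.ppf(1.0 - alpha)
    return omega2 / n <= threshold

print(neighborhood_validation_test(omega2=0.15, s2=0.003, n=200, delta0=0.01))  # assumed inputs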

To exhibit the limiting power of the test against special local alternatives we firstly state a limit theorem for ω_n² under a triangular scheme. To this end, let F be a continuous df with σ²(F) > 0 and ∆(F) > 0. For each n ≥ 2, suppose Xn1, ..., Xnn are independent and identically distributed random variables with continuous df Gn such that lim_{n→∞} Gn = F. We can (and do) assume that the random variables Xnk are defined on the same probability space (Ω, A, P), say. Let Fn be the empirical df of Xn1, ..., Xnn. Define (with this Fn) ω_n² as before.

Theorem 3 Under these assumptions, we have, as n → ∞:

(i) σ²(Gn) → σ²(F),
(ii) σ²(Fn) → σ²(F) in probability,
(iii) √n ( ω_n²/n − ∆(Gn) ) →_D N(0, σ²(F)).

Proof. (i) obviously follows from the definition of

σ²(H) = 4 ∫ ( ∫ (H(x) − F0(x))(1{u ≤ x} − H(x)) dF0(x) )² dH(u)

for distribution functions H. Using a central limit theorem for processes, see, e.g., Van der Vaart and Wellner (1996), Section 2.11, there is a Brownian bridge (U(t), 0 ≤ t ≤ 1) with continuous sample paths such that the empirical process (√n (Fn(x) − Gn(x)), x ∈ R) converges in distribution to (U(F(x)), x ∈ R).

Thus, as n → ∞,

sup_{x∈R} |Fn(x) − F(x)| → 0 in probability.    (3.5)

By (3.5) and, again, by the definition of σ²(H), (ii) follows. To prove (iii) we argue as in Angus (1983). As n → ∞,

2 ∫ (Gn(x) − F0(x)) [ √n (Fn(x) − Gn(x)) ] dF0(x) →_D N(0, σ²(F)).


Therefore, due to

√n | ω_n²/n − ∆(Gn) − 2 ∫ (Gn(x) − F0(x))(Fn(x) − Gn(x)) dF0(x) |
= √n ∫ (Fn(x) − Gn(x))² dF0(x)
≤ (1/√n) ( √n sup_{x∈R} |Fn(x) − Gn(x)| )²  →  0 in probability,

the proof is finished.

Now, suppose ∆0 = ∆(F) and assume that the sequence (Gn) has the additional property that

∆(Gn) = ∆0 − an/√n,   n ≥ 1,

where (an) is some sequence of positive real numbers converging to some positive real number a. Then as an easy consequence of the theorem we obtain

√n ( ω_n²/n − ∆0 ) →_D N(−a, σ²(F)).

Thus, for such a sequence (Gn) of local alternatives, the asymptotic power is

lim_{n→∞} P( ω_n²/n ≤ ∆0 − (σ(Fn)/√n) Φ⁻¹(1 − α) ) = Φ( a/σ(F) − Φ⁻¹(1 − α) ).

4. Composite hypothesis: neighborhood-of-exponentiality testing

This section shows that, at least in principle, the above reasoning carries over to the case that H0 is a composite hypothesis, e.g., a parametric family of distributions. We confine ourselves to the important special case of 'neighborhood-of-exponentiality testing', but it will become apparent what is needed to tackle the problem of a model validation also in other cases (see, e.g., Dette and Munk (2003) and Czado et al. (2007) in other contexts).

In what follows, write Eµ for the df of the exponential distribution with density µ⁻¹ exp(−x/µ) for x > 0 and 0 else, and put E = E1. The parameter µ ∈ (0, ∞) is the mean of Eµ. Let ℰ be the family of these exponential distributions. The goodness-of-fit problem of testing the hypothesis

H0 : F ∈ ℰ

of exponentiality against general or special alternatives has received considerable interest in the literature, see, e.g., Baringhaus and Henze (1991) or Henze and Meintanis (2005), where many more references are supplied.

Writing X̄n = n⁻¹ Σ_{j=1}^n Xj for the mean of X1, ..., Xn, the Cramér–von Mises statistic for testing H0 versus general alternatives is

ω_n² = n ∫_{-∞}^{∞} (Fn(x) − E_{X̄n}(x))² dE_{X̄n}(x)
= 1/(12n) + Σ_{j=1}^n ( E_{X̄n}(X_(j)) − (2j − 1)/(2n) )²

(see, e.g., D'Agostino and Stephens (1986), p. 133). Since

E_{X̄n}(X_(j)) = 1 − exp(−X_(j)/X̄n),   j = 1, ..., n,

ω_n² is a function of the scaled random variables Xj/X̄n, j = 1, ..., n, and hence is distribution-free under H0, that is, the distribution of ω_n² does not depend on the true unknown distribution Eµ. For results on the limit null distribution of ω_n² we refer to Stephens (1976). For convenience, we present a small table (Table 3) showing the critical values ((1 − α)-quantiles) of ω_n² for α ∈ {0.1, 0.05, 0.025, 0.01} and sample sizes n = 20, n = 50 and n = 100. The values have been obtained by simulations based on 100 million replications.

            α = 0.100   α = 0.050   α = 0.025   α = 0.010
  n = 20     0.1735      0.2191      0.2660      0.3293
  n = 50     0.1741      0.2205      0.2687      0.3343
  n = 100    0.1743      0.2210      0.2697      0.3360

Table 3. Critical values for ω_n²
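Since the statistic depends on the data only through the scaled values Xj/X̄n, it is straightforward to compute. The Python sketch below is an illustration; the comparison with the 5% critical values of Table 3 in the example call is only indicative.

import numpy as np

def cvm_exponentiality_stat(x):
    """omega_n^2 = 1/(12n) + sum_j (E_{Xbar_n}(X_(j)) - (2j-1)/(2n))^2 with E_{Xbar_n}(t) = 1 - exp(-t/Xbar_n)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    e = 1.0 - np.exp(-x / x.mean())              # depends only on the scaled order statistics
    j = np.arange(1, n + 1)
    return 1.0 / (12.0 * n) + np.sum((e - (2.0 * j - 1.0) / (2.0 * n)) ** 2)

rng = np.random.default_rng(3)
print(cvm_exponentiality_stat(rng.exponential(scale=3.0, size=50)))  # typically below ~0.22 under H0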

In the sequel, we assume that X1 is a positive random variable with df F and finite mean µ. We will construct an equivalence test of the hypothesis H_∆0: ∆(F, ℰ) ≥ ∆0 against the alternative K_∆0: ∆(F, ℰ) < ∆0, where ∆0 is some given positive number and

∆(F, ℰ) := ∫_0^∞ (F(xµ) − E(x))² e^{−x} dx

is the Cramér–von Mises distance between F and the class of exponential distributions. Notice that x ↦ F(xµ) is the df of the random variable X1/µ, which shows that ∆(F, ℰ) is scale invariant. In order to derive an asymptotic test of H_∆0 versus K_∆0, we first give the limit distribution of ω_n² for fixed alternatives.

Theorem 4 Let X1 have the df F, where F has finite positive mean µ and finite positive variance σ². Additionally, let F be differentiable with uniformly continuous density f = F′. Define

Vn = √n ( ω_n²/n − ∆(F, ℰ) ).

Then Vn →_D N(0, τ²(F)), where

τ²(F) = 4 [ ∫_0^∞ ∫_0^∞ (F(xµ) − E(x)) (F(yµ) − E(y)) (F(µ(x∧y)) − F(xµ)F(yµ)) dE(x) dE(y)
+ 2ρ ∫_0^∞ ( ∫_0^{xµ} y dF(y) − µ F(xµ) ) (F(xµ) − E(x)) dE(x) + σ²ρ² ]

and

ρ = ∫_0^∞ x f(xµ) (F(xµ) − E(x)) exp(−x) dx.    (4.1)

The proof of Theorem 4 is given in Section 7.

To avoid estimation of the density f figuring in (4.1) when estimating the parameter ρ, notice that, putting F̄ = 1 − F, ρ can alternatively be written as

ρ = (1/µ²) ∫_0^∞ x ( exp(−x/µ) − F̄(x) ) exp(−x/µ) dF(x).

A consistent estimator τ_n² of τ²(F) is obtained if, in the above expression, we replace throughout F by the edf Fn, µ by X̄n and σ² by the empirical variance n⁻¹ Σ_{j=1}^n (Xj − X̄n)² (or its unbiased version (n − 1)⁻¹ Σ_{j=1}^n (Xj − X̄n)²). The estimator can be expressed in terms of the scaled variables Yj := Xj/X̄n, j = 1, ..., n, and their order statistics Y_(1) ≤ ... ≤ Y_(n). Putting Tj := exp(−Y_(j)), j = 1, ..., n, we have the following result.

Theorem 5 A consistent estimate τ_n² of τ²(F) is given by

τ_n² = 4 [ Ĵ + 2 κ̂ η̂ + σ²_Y κ̂² ],

where

Ĵ = Ĵ1 − 2Ĵ2 + Ĵ3 − Ĵ4²

with

Ĵ1 = (1/n³) Σ_{i,j=1}^n Tj (Ti ∧ Tj) + (1/n³) Σ_{i=1}^n Σ_{j<k} Tk (Ti ∧ Tk) + (1/n³) Σ_{i=1}^n Σ_{k<j} Tj (Ti ∧ Tk),

Ĵ2 = (1/n²) Σ_{i=1}^n Ti² (1 − Ti/2) + (1/n²) Σ_{j<i} Ti² (1 − Ti/2) + (1/n²) Σ_{i<j} Ti Tj (1 − Ti/2),

Ĵ3 = (1/n) Σ_{i=1}^n [ Ti (1 − Ti/2) ]²,

Ĵ4 = (1/n²) Σ_{i,j=1}^n Ti ∧ Tj − (1/n) Σ_{i=1}^n Ti (1 − Ti/2),

κ̂ = (1/n) Σ_{j=1}^n Y_(j) ( Tj − (1 − j/n) ) Tj,

η̂ = (1/n²) Σ_{j,k=1}^n (Y_(j) − 1) Tj ∧ Tk − (1/n) Σ_{j=1}^n (Y_(j) − 1) (1 − Tj/2) Tj,

and σ²_Y = (1/n) Σ_{j=1}^n (Yj − 1)², the empirical variance of the scaled variables Y1, ..., Yn.

The proof of Theorem 5 is given in Section 7.
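The estimator of Theorem 5 involves only elementary array operations. The sketch below is an illustrative transcription of the displayed sums (it uses Σ_{j<k} f(k) = Σ_k (k − 1) f(k) and suffix sums for the sums over k < j); it is not the authors' code, and its output has not been checked against their implementation.

import numpy as np

def tau2_n(x):
    """Plug-in variance estimate tau_n^2 of Theorem 5 (illustrative transcription)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    y = np.sort(x / x.mean())                  # Y_(1) <= ... <= Y_(n)
    t = np.exp(-y)                             # T_j = exp(-Y_(j))
    idx = np.arange(1, n + 1)
    suffix = np.cumsum(t[::-1])[::-1] - t      # suffix[k] = sum_{j > k} T_j
    tmin = np.minimum.outer(t, t)              # T_i ^ T_j
    g = t * (1.0 - t / 2.0)                    # T_i (1 - T_i/2)

    j1 = (np.sum(tmin * t[None, :])                    # sum_{i,j} T_j (T_i ^ T_j)
          + np.sum(tmin * ((idx - 1) * t)[None, :])    # sum_i sum_{j<k} T_k (T_i ^ T_k)
          + np.sum(tmin * suffix[None, :])) / n**3     # sum_i sum_{k<j} T_j (T_i ^ T_k)
    j2 = (np.sum(t * g)                                # sum_i T_i^2 (1 - T_i/2)
          + np.sum((idx - 1) * t * g)                  # sum_{j<i} T_i^2 (1 - T_i/2)
          + np.sum(g * suffix)) / n**2                 # sum_{i<j} T_i T_j (1 - T_i/2)
    j3 = np.sum(g**2) / n
    j4 = np.sum(tmin) / n**2 - np.sum(g) / n
    jhat = j1 - 2.0 * j2 + j3 - j4**2

    kappa = np.sum(y * (t - (1.0 - idx / n)) * t) / n
    eta = np.sum((y - 1.0)[:, None] * tmin) / n**2 - np.sum((y - 1.0) * (1.0 - t / 2.0) * t) / n
    s2y = np.mean((y - 1.0)**2)
    return 4.0 * (jhat + 2.0 * kappa * eta + s2y * kappa**2)

print(tau2_n(np.random.default_rng(4).exponential(size=100)))   # illustrative call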

The next results are immediate consequences of Theorem 4 and Theorem 5. Throughout, only alternative distributions F with τ²(F) > 0 are considered.

Corollary 3 With α and uα as in Corollary 2, an asymptotic confidence interval for ∆(F, ℰ) at level 1 − α is

Jn := [ ω_n²/n − uα τ_n/√n ,  ω_n²/n + uα τ_n/√n ],

i.e., we have lim_{n→∞} P_F(Jn ∋ ∆(F, ℰ)) = 1 − α.

Corollary 4 Let dn = dn(α) be the critical value of the Cramér–von Mises test for exponentiality, carried out at level α. Then, under a fixed alternative distribution function F satisfying the assumptions of Theorem 4, the power P_F(ω_n² > dn) may be approximated by

P_F(ω_n² > dn) ≈ 1 − Φ( (√n/τ(F)) (dn/n − ∆(F, ℰ)) ).    (4.2)

Example 2 Let F be the df of the Erlang(1,2)-distribution (Gamma distribution with shape parameter 2 and scale parameter 1), that is, F(x) = 1 − (1 + x)e^{−x} for x ≥ 0 and 0 else. The density is f(x) = xe^{−x} for x ≥ 0 and 0 else, the mean is µ = 2, and the variance is σ² = 2. By straightforward calculation we obtain

∆(F, ℰ) = ∫_0^∞ ( e^{−x} − (1 + 2x)e^{−2x} )² e^{−x} dx = 11/1500 = 0.0073333... .
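The value 11/1500 is easy to confirm by numerical quadrature; the following lines are an illustration only.

import numpy as np
from scipy.integrate import quad

h = lambda x: np.exp(-x) - (1.0 + 2.0 * x) * np.exp(-2.0 * x)     # F(2x) - E(x) for the Erlang(1,2) df
print(quad(lambda x: h(x)**2 * np.exp(-x), 0.0, np.inf)[0], 11.0 / 1500.0)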

The variance of the limit distribution is τ²(F) = 4[K + 2ρλ + 2ρ²], where, putting h(x) = e^{−x} − (1 + 2x)e^{−2x} for x ≥ 0 and 0 else,

K = ∫∫ h(x) h(y) ( 1 − (1 + 2(x∧y)) e^{−2(x∧y)} − (1 − (1 + 2x)e^{−2x})(1 − (1 + 2y)e^{−2y}) ) e^{−x} e^{−y} dx dy
= 2 ∫ h(x)(1 + 2x) e^{−3x} dx ∫ h(y) e^{−y} dy − ( ∫ h(x)(1 + 2x) e^{−3x} dx )² − ∫∫ h(x) h(y) (1 + 2(x∧y)) e^{−2(x∧y)} e^{−x} e^{−y} dx dy
=: 2 K1·K2 − K1² − K3, say,

ρ = ∫ x f(xµ)(F(xµ) − E(x)) e^{−x} dx = 2 ∫_0^∞ ( x² e^{−4x} − x² e^{−5x} − 2x³ e^{−5x} ) dx = −79/10000,

and

λ = ∫ ( ∫_0^{xµ} y dF(y) − µ F(xµ) ) (F(xµ) − E(x)) dE(x) = −4 ∫_0^∞ ( x² e^{−4x} − x² e^{−5x} − 2x³ e^{−5x} ) dx = −2ρ.

The integrals K1, K2 and K3 are calculated to be

K1 = ∫_0^∞ ( e^{−4x} + 2xe^{−4x} − e^{−5x} − 4xe^{−5x} − 4x²e^{−5x} ) dx = −49/1000,

K2 = ∫_0^∞ ( e^{−2x} − e^{−3x} − 2xe^{−3x} ) dx = −1/18,

K3 = 2 ∫_0^∞ ( ∫_x^∞ ( e^{−2y} − e^{−3y} − 2ye^{−3y} ) dy ) ( e^{−4x} + 2xe^{−4x} − e^{−5x} − 4xe^{−5x} − 4x²e^{−5x} ) dx = 361/131712.


Thus,

K = 1868351/6174000000.

Putting pieces together we obtain

τ²(F) = 4(K − 2ρ²) = 3430351/4823437500 = 0.000711183... .
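The closed-form value of τ²(F) can be cross-checked against the definition in Theorem 4 by numerical integration (with µ = 2, σ² = 2 and λ = −2ρ as derived above). The sketch below is an illustration; the finite upper integration limit is an assumed stand-in for ∞, and the quadrature of the kinked integrand is only approximate.

import numpy as np
from scipy.integrate import quad, dblquad

h = lambda x: np.exp(-x) - (1.0 + 2.0 * x) * np.exp(-2.0 * x)     # F(2x) - E(x)
F2 = lambda x: 1.0 - (1.0 + 2.0 * x) * np.exp(-2.0 * x)           # F(2x)

K = dblquad(lambda y, x: h(x) * h(y) * (F2(min(x, y)) - F2(x) * F2(y)) * np.exp(-x - y),
            0.0, 40.0, 0.0, 40.0)[0]
rho = 2.0 * quad(lambda x: x**2 * np.exp(-4 * x) - x**2 * np.exp(-5 * x)
                 - 2.0 * x**3 * np.exp(-5 * x), 0.0, np.inf)[0]
tau2 = 4.0 * (K + 2.0 * rho * (-2.0 * rho) + 2.0 * rho**2)        # lambda = -2*rho, sigma^2 = 2
print(tau2)                                                        # should be close to 0.000711183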

Table 4 gives the empirical coverage probability of Jn for ∆(F, ℰ) in the case α = 0.05 and n ∈ {20, 50, 100, 200} for the Erlang(1,2)-distribution. Each of the entries in Table 4 is based on 10 000 replications.

  n          20    50    100   200
  coverage  .93   .94   .94   .95

Table 4. Empirical coverage probability for ∆(F, ℰ) (10 000 replications), rounded to 2 decimal places, α = 0.05

To conclude this example, Table 5 displays the empirical power of the Cramér–von Mises test for exponentiality against the Erlang(1,2)-distribution for the sample sizes n = 20, n = 50 and n = 100. The nominal level is 0.05. Each entry in the line denoted by 'MC' is based on a Monte Carlo simulation with 25 000 replications. The line denoted by 'App' shows the corresponding approximations given by the right-hand side of (4.2). As in Example 1, the approximation seems to be a lower bound for the true power.

  n      20    50    100
  MC    .48   .90   1.0
  App   .27   .78   .97

Table 5. Power of the Cramér–von Mises test for exponentiality against the Erlang(1,2)-distribution (α = 0.05), rounded to 2 decimal places

Finally, as an analogue to Theorem 2 treating

H_∆0: ∆(F, ℰ) ≥ ∆0   versus   K_∆0: ∆(F, ℰ) < ∆0,    (4.3)

where ∆0 is a given positive number, we obtain an equivalence test or 'neighborhood-of-exponentiality test'.

Theorem 6 Let α ∈ (0, 1). The test that rejects H_∆0 if

ω_n²/n ≤ ∆0 − (τ_n/√n) Φ⁻¹(1 − α)    (4.4)

is an asymptotic level-α test of H_∆0 against K_∆0. This test is consistent against each alternative distribution.

Proof. The proof follows the same lines as the proof of Theorem 2 and is therefore omitted.

5. A real data example

To illustrate a new K-S type goodness-of-fit test for exponentiality based on empirical Hankel transforms, Baringhaus and Taherizadeh (2013) considered the n = 21 values 33.5, 20.9, 17.8, 102.8, 110.6, 20.7, 54.2, 69.0, 63.4, 37.8, 37.1, 45.7, 35.3, 55.2, 5.8, 0.6, 54.3, 89.8, 58.6, 39.0, 24.8 (in millimeters) of the monthly amount of rainfall in January during the period 1991 to 2011, taken at the weather station Berlin-Tempelhof of the German Weather Service. For these data, the hypothesis of exponentiality was rejected at the 5% level. Treating the testing problem (4.3), we applied the test (4.4) to these data, choosing ∆0 = 11/1500 = 0.0073333... = ∆(F, ℰ), where F is the df of the Erlang distribution Erlang(1,2), see Example 2. Since the values τ_n² = 0.001379132 and ω_n² = 0.2654209 were observed, and due to the fact that ω_n²/n = 0.01263909 exceeds ∆0, the hypothesis is accepted at each level α ∈ (0, 1/2). A Q-Q plot of the data against the quantiles F⁻¹((i − 3/8)/(n + 0.25)), i = 1, ..., n, of the Erlang(1,2) distribution shown in Figure 1 seems to confirm this finding.
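The numbers reported for the rainfall data can be traced in outline with a few lines of code. The sketch below recomputes ω_n² from the listed values and plugs in the variance estimate τ_n² = 0.001379132 quoted in the text (alternatively, the Theorem 5 sketch above could be used); it is an illustration and has not been checked against the authors' computation.

import numpy as np
from scipy.stats import norm

rain = np.array([33.5, 20.9, 17.8, 102.8, 110.6, 20.7, 54.2, 69.0, 63.4, 37.8, 37.1,
                 45.7, 35.3, 55.2, 5.8, 0.6, 54.3, 89.8, 58.6, 39.0, 24.8])
n = rain.size
e = 1.0 - np.exp(-np.sort(rain) / rain.mean())         # E_{Xbar_n}(X_(j))
j = np.arange(1, n + 1)
omega2 = 1.0 / (12.0 * n) + np.sum((e - (2.0 * j - 1.0) / (2.0 * n)) ** 2)

delta0 = 11.0 / 1500.0                                 # Delta(F, E) of the Erlang(1,2) df (Example 2)
tau2 = 0.001379132                                     # value of tau_n^2 reported in the text
reject = omega2 / n <= delta0 - np.sqrt(tau2 / n) * norm.ppf(0.95)
print(omega2 / n, reject)                              # omega2/n exceeds delta0, so no rejection at the 5% level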

Figure 1. Q-Q plot of rainfall data: the ordered observations x_(i) plotted against the Erlang(1,2) quantiles F⁻¹((i − 3/8)/(n + 0.25)), i = 1, ..., n.


6. Concluding remarks

We have given a probabilistic representation of the Cramér–von Mises distance between continuous distribution functions. Moreover, we have shown that the time-honored Cramér–von Mises statistic can be used to construct nonparametric confidence intervals for the Cramér–von Mises distance between an unknown continuous df F and a given continuous df F0 when a random sample of F is available. The same holds for a continuous distribution F with support [0, ∞) and the class ℰ of exponential distributions. Moreover, given a positive number ∆0, there are asymptotic level-α tests of H0 : ∆(F, F0) ≥ ∆0 versus K0 : ∆(F, F0) < ∆0 and H0 : ∆(F, ℰ) ≥ ∆0 versus K0 : ∆(F, ℰ) < ∆0, where ∆(F, F0) and ∆(F, ℰ) denote the Cramér–von Mises distance between F and F0 and between F and ℰ, respectively. The method requires an asymptotic normal distribution of the Cramér–von Mises statistic under fixed alternatives and a consistent estimator of the variance of the limit distribution, and will presumably also work in other contexts, for example in the case of testing for normality. In the latter case, Gürtler (2000) constructed a neighborhood-of-normality test and a pertaining confidence interval based on the BHEP statistic for testing for univariate and multivariate normality (see Baringhaus and Henze (1988) and Epps and Pulley (1983)).

7. Proofs

Proof of Proposition 1: From (3.1) we have

σ_n²/4 = I1 − 2I2 + I3 − I4,

where

I1 = ∫_0^1 ∫_0^1 Gn(s) Gn(t) Gn(s∧t) ds dt,

I2 = ∫_0^1 ∫_0^1 s Gn(t) Gn(s∧t) ds dt = ∫_0^1 ∫_0^1 t Gn(s) Gn(s∧t) ds dt,

I3 = ∫_0^1 ∫_0^1 s t Gn(s∧t) ds dt,

I4 = ∫_0^1 ∫_0^1 (Gn(s) − s)(Gn(t) − t) Gn(s) Gn(t) ds dt = [ ∫_0^1 (Gn(s) − s) Gn(s) ds ]².

Now,

I1 = (1/n³) Σ_{i=1}^n Σ_{j=1}^n Σ_{k=1}^n ∫_0^1 ∫_0^1 1{U_(i) ≤ s} 1{U_(j) ≤ t} 1{U_(k) ≤ s∧t} ds dt.    (7.1)


Since

∫_0^1 1{U_(j) ≤ t} 1{U_(k) ≤ s∧t} dt = 1{U_(j) ≤ s}(1 − U_(j)) if j = k,  1{U_(k) ≤ s}(1 − U_(k)) if j < k,  1{U_(k) ≤ s}(1 − U_(j)) if k < j,

it follows that the double integral figuring in (7.1) equals (1 − U_(j))(1 − U_(i) ∨ U_(j)) if j = k, (1 − U_(k))(1 − U_(i) ∨ U_(k)) if j < k and (1 − U_(j))(1 − U_(i) ∨ U_(k)) if j > k. Hence, the triple sum in (7.1) is

S = Σ_{i=1}^n Σ_{j=1}^n (1 − U_(j))(1 − U_(i) ∨ U_(j)) + Σ_{i=1}^n Σ_{j<k} (1 − U_(k))(1 − U_(i) ∨ U_(k)) + Σ_{i=1}^n Σ_{k<j} (1 − U_(j))(1 − U_(i) ∨ U_(k)).

Likewise,

I2 = (1/n²) Σ_{i=1}^n Σ_{j=1}^n ∫_0^1 ∫_0^1 s 1{U_(i) ≤ t} 1{U_(j) ≤ s∧t} ds dt.    (7.2)

Since

∫_0^1 1{U_(i) ≤ t} 1{U_(j) ≤ s∧t} dt = 1{U_(i) ≤ s}(1 − U_(i)) if i = j,  1{U_(j) ≤ s}(1 − U_(j)) if i < j,  1{U_(j) ≤ s}(1 − U_(i)) if i > j,

the double integral figuring in (7.2) is (1 − U_(i))(1 − U_(i)²)/2 if i = j, (1 − U_(j))(1 − U_(j)²)/2 if i < j and (1 − U_(i))(1 − U_(j)²)/2 if i > j. Using the notation introduced in (3.2), straightforward algebra gives

I2 = 1/2 − (U_1 − U_3)/(2n) − V_1/n² − U_2/2 + V_3/(2n²) + (1/(2n²)) Σ_{j=1}^{n−1} Σ_{i=j+1}^{n} U_(i) U_(j)².

Next, we have

I3 = (1/n) Σ_{j=1}^n ∫_0^1 ∫_0^1 s t 1{U_(j) ≤ s∧t} ds dt = (1/(4n)) Σ_{j=1}^n (1 − U_(j)²)².

Finally,

I4 = [ ∫_0^1 (Gn(s) − s) Gn(s) ds ]².


Here, the term within squared brackets equals

(1/n²) Σ_{i=1}^n Σ_{j=1}^n ∫_0^1 1{U_(i) ≤ s} 1{U_(j) ≤ s} ds − (1/n) Σ_{i=1}^n ∫_0^1 s 1{U_(i) ≤ s} ds
= (1/n²) Σ_{i=1}^n Σ_{j=1}^n (1 − U_(i) ∨ U_(j)) − (1/n) Σ_{i=1}^n (1/2)(1 − U_(i)²)
= 1/2 − U_1/n − 2V_1/n² + U_2/2.

Putting the results together, the assertion follows.

Proof of Theorem 4: Throughout the proof, an unspecified integral is over [0, ∞). Since ∆(F, ℰ) = ∫ (F(x) − Eµ(x))² dEµ(x), we have Vn = An + Bn, where

2dEµ(x), we have Vn = An +Bn, where

An = √n ( ∫ (Fn(x) − E_{X̄n}(x))² dE_{X̄n}(x) − ∫ (F(x) − E_{X̄n}(x))² dE_{X̄n}(x) ),

Bn = √n ( ∫ (F(x) − E_{X̄n}(x))² dE_{X̄n}(x) − ∫ (F(x) − Eµ(x))² dEµ(x) ).

Since

√n sup

x∈R|Fn(x)− F (x)|2 = oP(1) as n → ∞, (7.3)

we have

An = 2√n ( ∫ (Fn(x) − F(x))(F(x) − E_{X̄n}(x)) dE_{X̄n}(x) ) + √n ( ∫ (Fn(x) − F(x))² dE_{X̄n}(x) )
= 2√n ( ∫ (Fn(x) − F(x))(F(x) − E_{X̄n}(x)) dE_{X̄n}(x) ) + oP(1).

Again using (7.3) and the fact that

lim_{n→∞} ∫ |Eµ(x) − E_{X̄n}(x)| dE_{X̄n}(x) = 0

P-almost surely, we obtain

An = 2√n ( ∫ (Fn(x) − F(x))(F(x) − Eµ(x)) dE_{X̄n}(x) ) + oP(1).


Since X̄n → µ P-a.s. we can apply Scheffé's theorem to obtain

An = 2√n ( ∫ (Fn(x) − F(x))(F(x) − Eµ(x)) dEµ(x) ) + oP(1)
= 2√n ( ∫ (Fn(xµ) − F(xµ))(F(xµ) − E(x)) dE(x) ) + oP(1).

The treatment of Bn is similar. We have Bn = B1,n + B2,n, where

B1,n = √n ∫ (F(x) − E_{X̄n}(x))² dE_{X̄n}(x) − √n ∫ (F(x) − Eµ(x))² dE_{X̄n}(x)
= 2√n ∫ (Eµ(x) − E_{X̄n}(x))(F(x) − Eµ(x)) dE_{X̄n}(x) + oP(1)
= 2√n ∫ (Eµ(xX̄n) − E(x))(F(xX̄n) − Eµ(xX̄n)) dE(x) + oP(1)
= 2√n ∫ (Eµ(xX̄n) − E(x))(F(xµ) − E(x)) dE(x) + oP(1)

and

B2,n = √n ∫ (F(x) − Eµ(x))² dE_{X̄n}(x) − √n ∫ (F(x) − Eµ(x))² dEµ(x)
= √n ∫ ( (F(xX̄n) − Eµ(xX̄n))² − (F(xµ) − E(x))² ) dE(x)
= 2√n ∫ ( F(xX̄n) − F(xµ) − (Eµ(xX̄n) − E(x)) ) (F(xµ) − E(x)) dE(x) + oP(1).

Thus,

Bn = 2√n ∫ (F(xX̄n) − F(xµ))(F(xµ) − E(x)) dE(x) + oP(1)
= 2√n (X̄n − µ) ∫_0^∞ x F′(xµ)(F(xµ) − E(x)) e^{−x} dx + oP(1).

Remembering the definition of ρ in (4.1) we recognize that Vn →_D V, where

V = 2 ( ∫ U(F(xµ))(F(xµ) − E(x)) exp(−x) dx + Zρ ).

Here, (U(t), 0 ≤ t ≤ 1, Z) is a Gaussian process with (U(t), 0 ≤ t ≤ 1) being a Brownian bridge with continuous sample paths, and Z is a centered normal variable with variance σ². Moreover, the covariance of U(F(xµ)) and Z is

Cov(U(F(xµ)), Z) = ∫_0^{xµ} y dF(y) − µ F(xµ),   x ≥ 0.

The distribution of V is the centered normal distribution stated in the theorem.


Proof of Theorem 5: Putting κ := µρ and

η := (1/µ) ∫_0^∞ ( ∫_0^{xµ} y dF(y) − µF(xµ) ) (F(xµ) − E(x)) dE(x),

τ²(F) figuring in Theorem 4 can be written as

τ²(F) = 4 [ J + 2κη + (σ²/µ²) κ² ],

where

J := ∫∫ (F(xµ) − E(x))(F(yµ) − E(y))(F(µ(x∧y)) − F(xµ)F(yµ)) dE(x) dE(y)
= ∫∫ F(xµ)F(yµ)F(µ(x∧y)) dE(x) dE(y) − 2 ∫∫ E(x)F(yµ)F(µ(x∧y)) dE(x) dE(y)
+ ∫∫ E(x)E(y)F(µ(x∧y)) dE(x) dE(y) − [ ∫ (F(xµ) − E(x)) F(xµ) dE(x) ]²
=: J1 − 2J2 + J3 − J4²  (say).

Replacing F by the edf Fn and µ by X̄n, an estimator of J1 is

Ĵ1 = ∫∫ Fn(xX̄n) Fn(yX̄n) Fn(X̄n(x∧y)) dE(x) dE(y)
= (1/n³) Σ_{i=1}^n Σ_{j=1}^n Σ_{k=1}^n ∫∫ 1{Y_(i) ≤ x} 1{Y_(j) ≤ y} 1{Y_(k) ≤ x∧y} e^{−(x+y)} dx dy.

Here, Y_(1) ≤ ... ≤ Y_(n) are the order statistics of Y1, ..., Yn. By analogy with the reasoning in the proof of Proposition 1, we have

Ĵ1 = (1/n³) Σ_{i,j=1}^n exp( −(Y_(j) + Y_(i) ∨ Y_(j)) ) + (1/n³) Σ_{i=1}^n Σ_{j<k} exp( −(Y_(k) + Y_(i) ∨ Y_(k)) )
+ (1/n³) Σ_{i=1}^n Σ_{k<j} exp( −(Y_(j) + Y_(i) ∨ Y_(k)) ).


Likewise, it follows that

Ĵ2 = ∫∫ E(x) Fn(yX̄n) Fn(X̄n(x∧y)) dE(x) dE(y)
= (1/n²) Σ_{i=1}^n e^{−2Y_(i)} (1 − (1/2) e^{−Y_(i)}) + (1/n²) Σ_{j<i} e^{−2Y_(i)} (1 − (1/2) e^{−Y_(i)})
+ (1/n²) Σ_{i<j} e^{−(Y_(i)+Y_(j))} (1 − (1/2) e^{−Y_(i)}).

In the same way, we obtain

Ĵ3 = ∫∫ E(x) E(y) Fn(X̄n(x∧y)) dE(x) dE(y)
= (1/n) Σ_{i=1}^n [ ∫ 1{Y_(i) ≤ x}(1 − e^{−x}) e^{−x} dx ]²
= (1/n) Σ_{i=1}^n [ e^{−Y_(i)} (1 − (1/2) e^{−Y_(i)}) ]².

Finally, an estimator of J4² is

Ĵ4² = [ ∫ (Fn(xX̄n) − E(x)) Fn(xX̄n) dE(x) ]²,

where

Ĵ4 = (1/n²) Σ_{i,j=1}^n e^{−Y_(i) ∨ Y_(j)} − (1/n) Σ_{i=1}^n e^{−Y_(i)} (1 − (1/2) e^{−Y_(i)}).

The estimator of κ is

κ̂ = (1/X̄n) ∫ x ( exp(−x/X̄n) − (1 − Fn(x)) ) exp(−x/X̄n) dFn(x)
= (1/n) Σ_{j=1}^n Y_(j) ( exp(−Y_(j)) − (1 − j/n) ) exp(−Y_(j)),


and the estimator of η takes the form

η̂ = (1/X̄n) ∫_0^∞ ( ∫_0^{xX̄n} y dFn(y) − X̄n Fn(xX̄n) ) (Fn(xX̄n) − E(x)) dE(x)
= (1/n) Σ_{j=1}^n ∫_0^∞ (Y_(j) − 1) 1{Y_(j) ≤ x} ( (1/n) Σ_{k=1}^n 1{Y_(k) ≤ x} − (1 − e^{−x}) ) e^{−x} dx
= (1/n²) Σ_{j=1}^n Σ_{k=1}^n (Y_(j) − 1) exp(−Y_(j) ∨ Y_(k)) − (1/n) Σ_{j=1}^n (Y_(j) − 1) (1 − (1/2) e^{−Y_(j)}) e^{−Y_(j)}.

Acknowledgement: The authors would like to thank two anonymous referees and an Associate Editor for many helpful remarks.

References

Anderson, T.W., and Darling, D.A. (1952). Asymptotic theory of certain goodness-of-fit criteria based on stochastic processes. Ann. Math. Statist., 23, 193–212.

Angus, J.E. (1983). Asymptotic distribution of Cramér–von Mises one-sample test statistics under an alternative. Comm. Statist. A–Theory Methods, 12, 2477–2482.

Baringhaus, L., and Henze, N. (1988). An invariant consistent test for multivariate normality. Metrika, 13, 269–274.

Baringhaus, L. and Henze, N. (1991). A class of consistent tests for exponentiality based on the empirical Laplace transform. Ann. Inst. Statist. Math., 43, 51–69.

Baringhaus, L. and Taherizadeh, F. (2013). A K-S type test for exponentiality based on empirical Hankel transforms. Comm. Statist. A–Theory Methods, 42, 3781–3792.

Csörgő, S. and Faraway, J. (1996). The exact and asymptotic distributions of Cramér–von Mises statistics. J. Royal Statist. Soc. Ser. B, 58, 221–234.

Czado, C., Freitag, G., and Munk, A. (2007). A nonparametric test for similarity of marginals – with applications to the assessment of bioequivalence. J. Statist. Plann. Inference, 137, 697–711.

D'Agostino, R.B. and Stephens, M.A. (1986). Goodness-of-Fit Techniques. Marcel Dekker, New York.

Dette, H. and Munk, A. (2003). Some methodological aspects of validation of models in nonparametric regression. Statist. Neerlandica, 57, 207–244.

Epps, T.W., and Pulley, L.B. (1983). A test for normality based on the empirical characteristic function. Biometrika, 70, 723–726.

Freitag, G., and Munk, A. (2005). On Hadamard differentiability in k-sample semiparametric models – with applications to the assessment of structural relationships. J. Multiv. Anal., 94, 123–158.

Gürtler, N. (2000). Asymptotic results on the BHEP tests for multivariate normality with fixed and variable smoothing parameter (in German). Doctoral dissertation, University of Karlsruhe.

Henze, N. and Meintanis, S.G. (2005). Recent and classical tests for exponentiality: a partial review with comparisons. Metrika, 61, 29–45.

Lehmann, E.L. (1951). Consistency and unbiasedness of certain nonparametric tests. Ann. Math. Statist., 22, 165–179.

Munk, A., and Czado, C. (1998). Nonparametric validation of similar distributions and the assessment of goodness of fit. J. Roy. Statist. Soc. Ser. B, 60, 223–241.

Romano, J.P. (2005). Optimal testing of equivalence hypotheses. Ann. Statist., 33, 1036–1047.

Serfling, R. (1980). Approximation theorems of mathematical statistics. Wiley, New York.

Shorack, G. and Wellner, J. (1986). Empirical processes with applications to statistics. Wiley, New York.

Smirnov, N.V. (1936). Sur la distribution de ω² (criterium de M. R. v. Mises). C. R. Acad. Sci. Paris, 202, 449–452.

Smirnov, N.V. (1937). On the distribution of von Mises' ω²-criterion (in Russian). Mat. Sb. (Nov. Ser.), 2, 973–993.

Stephens, M. (1976). Asymptotic results for goodness-of-fit statistics with unknown parameters. Ann. Statist., 4, 357–369.

Sundrum, R.M. (1954). On Lehmann's two-sample test. Ann. Math. Statist., 25, 139–145.

Tiago de Oliveira, J. (1987). Asymptotics of the Cramér–von Mises test of goodness-of-fitting. In Goodness-of-fit, Debrecen, Colloq. Math. Soc. János Bolyai, 45, 415–421.

Van der Vaart, A. and Wellner, J. (1996). Weak convergence and empirical processes. Springer, New York.

Wellek, S. (2010). Testing statistical hypotheses of equivalence and noninferiority. CRC Press, Boca Raton.
