An update on the Statistical Toolkit Barbara Mascialino, Maria Grazia Pia, Andreas Pfeiffer, Alberto Ribon, Paolo Viarengo July 19 th, 2005

An update on the An update on the Statistical ToolkitStatistical Toolkit

Barbara Mascialino, Maria Barbara Mascialino, Maria Grazia Pia, Andreas Pfeiffer, Grazia Pia, Andreas Pfeiffer,

Alberto Ribon, Alberto Ribon,

Paolo ViarengoPaolo Viarengo

July 19July 19thth, 2005, 2005

G.A.P Cirrone, S. Donadio, S. Guatelli, A. Mantero, B. Mascialino, S. Parlati, M.G. Pia, A. Pfeiffer, A. Ribon, P. Viarengo

“A Goodness-of-Fit Statistical Toolkit”IEEE- Transactions on Nuclear Science (2004), 51 (5): 2056-2063.

Release StatisticsTesting-V1-01-00 downloadable from the web:http://www.ge.infn.it/geant4/analysis/HEPstatistics/

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

EMPIRICAL DISTRIBUTION FUNCTIONORIGINAL DISTRIBUTIONS

• Kolmogorov-Smirnov test

• Goodman approximation of KS test

• Kuiper test

)(

4 22

nm

nmDmn

)()( xGxFSupD mnmn

)()()()( 00* xFxFMaxxFxFMaxD TT

Dmn

Tests based on maximum distanceTests based on maximum distanceunbinned distributionsunbinned distributions

SUPREMUMSUPREMUMSTATISTICSSTATISTICS

• Fisz-Cramer-von Mises test

• Anderson-Darling test

i

ii xFxFnn

nnt 2

21221

21 )]()([)(

i k kkk

kiikk

ik nh

HnH

HnnFh

nkn

nA

4)(

)(1

)1(

)1( 2

22

Tests containing a weighting functionTests containing a weighting function

binned/unbinned distributionsbinned/unbinned distributions

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

EMPIRICAL DISTRIBUTION FUNCTIONORIGINAL DISTRIBUTIONS

QUADRATICQUADRATICSTATISTICSSTATISTICS

+ + WEIGHTING WEIGHTING FUNCTIONFUNCTION

Sum/integral of all the distances

1.1. Status of the existing testsStatus of the existing tests

2. New GoF Tests added2. New GoF Tests added

3. Description of the power study3. Description of the power studyphase Iphase I

4. Description of the power study4. Description of the power studyphase IIphase II

5. A concrete example: IMRT5. A concrete example: IMRT

1.1. Status of the existing tests:Status of the existing tests:Fisz-Cramer-von MisesFisz-Cramer-von Mises

Conover (book) + Darling (1957):

- The two-sample Cramer-von Mises test (Fisz test) has the same asymptotic distribution of the one-sample test (Cramer-von Mises test).

- The equation of the asymptotic distribution is available in the paperby Anderson and Darling (1952).


1.1. Status of the existing tests:Status of the existing tests:two-sample Anderson-Darlingtwo-sample Anderson-Darling

Scholz and Stephens (1987):-The two-sample Anderson-Darling test can be written in different ways:

- exact formulation (for unbinned distributions only)- approximated formulation (for binned/unbinned distributions)

-The approximated distance is already available in the toolkit.

-The asymptotic distributions of both exact and approximated formulations are available in the paper.- The two-sample Anderson-Darling test has the same asymptotic distribution of the one-sample test.


1.1. Status of the existing tests:Status of the existing tests:Tiku testTiku test

Tiku (1965):

- Cramer-von Mises test in a chi-squared approximation.

- Cramer-von Mises test statistics is converted into a central chi-square, bypassing the problem of integrating the weighting function.


2. New GoF Tests:2. New GoF Tests:weighted Kolmogorov-Smirnovweighted Kolmogorov-Smirnov

Canner (1975) & Buning (2001):

- Canner modified KS test introducing one weighting function identical to the one used in AD test.

- Buning modified KS test introducing one weighting function similar to the one used in AD test.

- The equation of the asymptotic distribution is not available in Canner’s paper, only a few critical values for some samples sizes (n=m).

))()( xxGxFSupKSW mnmn

unbinned distributionsunbinned distributions

2. New GoF Tests:2. New GoF Tests:weighted Cramer von Misesweighted Cramer von Mises

Buning (2001):

- Buning modified CVM test introducing one weighting function similar to the one used in AD test.

- The equation of the asymptotic distribution is not available in the paper, only critical values for many samples sizes.

unbinned distributionsunbinned distributions

)][ xGFCVMW mnmn 2

2. New GoF Tests:2. New GoF Tests:WatsonWatson

Watson (1975):

-Derives from Cramer-von Mises test statistics.

- Like Kuiper test it can be applied in case of cyclic observations.

- The equation of the asymptotic distribution is not available in the paper, only critical values for many samples sizes.

)()()()( xFxGxGxFW nmmn 2 2

Other newsOther news

• New user layer dealing with ROOT histograms (Andreas is working on that).

• Paper to IEEE-TNS

• Next release of the GoF Statistical Toolkit scheduled within summer.

Future developmentsFuture developments

• Fix some design-related problems.

• New design (add uncertainties).

• Extend the toolkit to the comparison of:– Experimental data versus theoretical functions,– k-sample problem,– Many dimensional one-, two-, k-sample problem.

Which is the recipe toWhich is the recipe toselect the most suitableselect the most suitable

Goodness-of-Fit testGoodness-of-Fit testamong the ones available inamong the ones available inthe GoF Statistical Toolkit?the GoF Statistical Toolkit?

3. Description of the3. Description of thepower study – phase Ipower study – phase I

SAMPLE 1SAMPLE 1

RESULTS:RESULTS:LOCATION-SCALE LOCATION-SCALE

ALTERNATIVEALTERNATIVE

SAMPLE 2SAMPLE 2TESTTEST

PARENT 1PARENT 1 PARENT 2PARENT 2MONTEMONTECARLOCARLO

REPLICATIONSREPLICATIONSk=1000k=1000

EDF STATISTICS (UNBINNED DATA): EDF STATISTICS (UNBINNED DATA): KS, KSW, KSA, KUIPER, CVM, ADAKS, KSW, KSA, KUIPER, CVM, ADA

““EMPIRICAL” POWER EVALUATIONEMPIRICAL” POWER EVALUATION

RESULTS:RESULTS:GENERAL GENERAL


COMPARISON WITHCOMPARISON WITHPUBLISHED PUBLISHED

RESULTSRESULTS

REALREALDATADATA

EXAMPLESEXAMPLES

Parent distributionsParent distributions

1)(1 xf

Uniform)

2(

2

2

2

1)(

x

exf

Gaussian

||3

2

1)( xexf

Double exponential

24

1

11)(

xxf

Cauchy

xexf )(5

Exponential

Contaminated Normal Distribution 2

)1,1(5.0)4,1(5.0)(7 xf

)9,0(1.0)1,0(9.0)(6 xf

Contaminated Normal Distribution 1

Skewness and tailweightSkewness and tailweight

025.05.0

5.0975.0

xx

xxS

125.0875.0

025.0975.0

xx

xxT

ParentParent SS TTf1(x) Uniform 1 1.267

f2(x) Gaussian 1 1.704

f3(x) Double exponential 1 2.161

f4(x) Cauchy 1 5.263

f5(x) Exponential 4.486 1.883

f6(x) Contamined normal 1

1 1.991

f7(x) Contamined normal 2

1.769 1.693

SkewnessSkewness TailweightTailweight

Comparative evaluation of testsComparative evaluation of tests

ShortShort

(T(T<1.5)<1.5)

MediumMedium

(1.5 < T < 2)(1.5 < T < 2)

LongLong

(T>2)(T>2)

SS~~11 KSKS KS – CVMKS – CVM CVM - ADCVM - AD

SS>1.5>1.5 KS - ADKS - AD ADAD CVM - ADCVM - AD

Skewness

Skewness

TailweightTailweight

2222 Supremum Supremum statistics statistics

teststests

Supremum Supremum statistics statistics

teststests

Tests Tests containing a containing a

weight functionweight function

Tests Tests containing a containing a

weight functionweight function< <

4. Description of the4. Description of thepower study – phase IIpower study – phase II

SAMPLE 1SAMPLE 1

RESULTS:RESULTS:LOCATION-SCALE LOCATION-SCALE


SAMPLE 2SAMPLE 2TESTTEST

PARENT 1PARENT 1 PARENT 2PARENT 2MONTEMONTECARLOCARLO

REPLICATIONSREPLICATIONSk=1000k=1000

BINNED/UNBINNED DATABINNED/UNBINNED DATACHI2, KS, KSW, KSA, KUIPER, CVM, CVMA, ADA, ADCHI2, KS, KSW, KSA, KUIPER, CVM, CVMA, ADA, AD

““EMPIRICAL” + “MC” POWER EVALUATIONEMPIRICAL” + “MC” POWER EVALUATION

RESULTS:RESULTS:GENERAL GENERAL


LINEAR LINEAR POWERPOWER

CORRELATIONCORRELATIONBETWEENBETWEEN

TESTSTESTSISODYNESISODYNES

Which is the most suitable goodness-of-fit test?

EXAMPLE EXAMPLE : unbinned dataLateral profilesLateral profiles

5. A concrete example: IMRT5. A concrete example: IMRT

Mich

ela

Mich

ela

Pierge

ntili

Pierge

ntili

mn GFH :0

mn GFH :1

GoF test selectionGoF test selection

025.05.0

5.0975.0

xx

xxS

125.0875.0

025.0975.0

xx

xxT

SkewnessSkewness TailweightTailweight

S = 1: symmetric distributionS < 1: left skewed distributionS > 1: right skewed distribution

T is always greater than 1, the longer the tail the greater

the value of T.

1.1. Classify the type of the distributions in terms of Classify the type of the distributions in terms of skewness S and tailweight Tskewness S and tailweight T

Comparative evaluation of tests powerComparative evaluation of tests power

ShortShort

(T(T<1.3)<1.3)

MediumMedium

(1.3 < T < 2)(1.3 < T < 2)

LongLong

(T>2)(T>2)

SS~~11 KSKS KS – CVMKS – CVM CVM - ADCVM - AD

SS>1.5>1.5 KS - ADKS - AD ADAD CVM - ADCVM - ADSkewness

Skewness

TailweightTailweight

2. Choose the most appropriate test for the classified 2. Choose the most appropriate test for the classified type of distributiontype of distribution

GoF test: test selection & resultsGoF test: test selection & results

Moderate skewed – medium tail

KOLMOGOROV-SMIRNOV TESTD=0.27 – p>0.05

X-variable: Ŝ=1.53 T=1.36

Y-variable: Ŝ=1.27 T=1.34

^

^

RESULTSRESULTS: unbinned data

Documents

An update on the Statistical Toolkit Barbara Mascialino, Maria Grazia Pia, Andreas Pfeiffer, Alberto Ribon, Paolo Viarengo July 19 th, 2005