23
An update on the An update on the Statistical Toolkit Statistical Toolkit Barbara Mascialino, Maria Barbara Mascialino, Maria Grazia Pia, Andreas Grazia Pia, Andreas Pfeiffer, Alberto Ribon, Pfeiffer, Alberto Ribon, Paolo Viarengo Paolo Viarengo July 19 July 19 th th , 2005 , 2005

An update on the Statistical Toolkit Barbara Mascialino, Maria Grazia Pia, Andreas Pfeiffer, Alberto Ribon, Paolo Viarengo July 19 th, 2005

Embed Size (px)

Citation preview

Page 1: An update on the Statistical Toolkit Barbara Mascialino, Maria Grazia Pia, Andreas Pfeiffer, Alberto Ribon, Paolo Viarengo July 19 th, 2005

An update on the An update on the Statistical ToolkitStatistical Toolkit

Barbara Mascialino, Maria Barbara Mascialino, Maria Grazia Pia, Andreas Pfeiffer, Grazia Pia, Andreas Pfeiffer,

Alberto Ribon, Alberto Ribon,

Paolo ViarengoPaolo Viarengo

July 19July 19thth, 2005, 2005

Page 2: An update on the Statistical Toolkit Barbara Mascialino, Maria Grazia Pia, Andreas Pfeiffer, Alberto Ribon, Paolo Viarengo July 19 th, 2005

G.A.P Cirrone, S. Donadio, S. Guatelli, A. Mantero, B. Mascialino, S. Parlati, M.G. Pia, A. Pfeiffer, A. Ribon, P. Viarengo

“A Goodness-of-Fit Statistical Toolkit”IEEE- Transactions on Nuclear Science (2004), 51 (5): 2056-2063.

Release StatisticsTesting-V1-01-00 downloadable from the web:http://www.ge.infn.it/geant4/analysis/HEPstatistics/

Page 3: An update on the Statistical Toolkit Barbara Mascialino, Maria Grazia Pia, Andreas Pfeiffer, Alberto Ribon, Paolo Viarengo July 19 th, 2005

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

EMPIRICAL DISTRIBUTION FUNCTIONORIGINAL DISTRIBUTIONS

• Kolmogorov-Smirnov test

• Goodman approximation of KS test

• Kuiper test

)(

4 22

nm

nmDmn

)()( xGxFSupD mnmn

)()()()( 00* xFxFMaxxFxFMaxD TT

Dmn

Tests based on maximum distanceTests based on maximum distanceunbinned distributionsunbinned distributions

SUPREMUMSUPREMUMSTATISTICSSTATISTICS

Page 4: An update on the Statistical Toolkit Barbara Mascialino, Maria Grazia Pia, Andreas Pfeiffer, Alberto Ribon, Paolo Viarengo July 19 th, 2005

• Fisz-Cramer-von Mises test

• Anderson-Darling test

i

ii xFxFnn

nnt 2

21221

21 )]()([)(

i k kkk

kiikk

ik nh

HnH

HnnFh

nkn

nA

4)(

)(1

)1(

)1( 2

22

Tests containing a weighting functionTests containing a weighting function

binned/unbinned distributionsbinned/unbinned distributions

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

EMPIRICAL DISTRIBUTION FUNCTIONORIGINAL DISTRIBUTIONS

QUADRATICQUADRATICSTATISTICSSTATISTICS

+ + WEIGHTING WEIGHTING FUNCTIONFUNCTION

Sum/integral of all the distances

Page 5: An update on the Statistical Toolkit Barbara Mascialino, Maria Grazia Pia, Andreas Pfeiffer, Alberto Ribon, Paolo Viarengo July 19 th, 2005

1.1. Status of the existing testsStatus of the existing tests

2. New GoF Tests added2. New GoF Tests added

3. Description of the power study3. Description of the power studyphase Iphase I

4. Description of the power study4. Description of the power studyphase IIphase II

5. A concrete example: IMRT5. A concrete example: IMRT

Page 6: An update on the Statistical Toolkit Barbara Mascialino, Maria Grazia Pia, Andreas Pfeiffer, Alberto Ribon, Paolo Viarengo July 19 th, 2005

1.1. Status of the existing tests:Status of the existing tests:Fisz-Cramer-von MisesFisz-Cramer-von Mises

Conover (book) + Darling (1957):

- The two-sample Cramer-von Mises test (Fisz test) has the same asymptotic distribution of the one-sample test (Cramer-von Mises test).

- The equation of the asymptotic distribution is available in the paperby Anderson and Darling (1952).

binned/unbinned distributionsbinned/unbinned distributions

Page 7: An update on the Statistical Toolkit Barbara Mascialino, Maria Grazia Pia, Andreas Pfeiffer, Alberto Ribon, Paolo Viarengo July 19 th, 2005

1.1. Status of the existing tests:Status of the existing tests:two-sample Anderson-Darlingtwo-sample Anderson-Darling

Scholz and Stephens (1987):-The two-sample Anderson-Darling test can be written in different ways:

- exact formulation (for unbinned distributions only)- approximated formulation (for binned/unbinned distributions)

-The approximated distance is already available in the toolkit.

-The asymptotic distributions of both exact and approximated formulations are available in the paper.- The two-sample Anderson-Darling test has the same asymptotic distribution of the one-sample test.

binned/unbinned distributionsbinned/unbinned distributions

Page 8: An update on the Statistical Toolkit Barbara Mascialino, Maria Grazia Pia, Andreas Pfeiffer, Alberto Ribon, Paolo Viarengo July 19 th, 2005

1.1. Status of the existing tests:Status of the existing tests:Tiku testTiku test

Tiku (1965):

- Cramer-von Mises test in a chi-squared approximation.

- Cramer-von Mises test statistics is converted into a central chi-square, bypassing the problem of integrating the weighting function.

binned/unbinned distributionsbinned/unbinned distributions

Page 9: An update on the Statistical Toolkit Barbara Mascialino, Maria Grazia Pia, Andreas Pfeiffer, Alberto Ribon, Paolo Viarengo July 19 th, 2005

2. New GoF Tests:2. New GoF Tests:weighted Kolmogorov-Smirnovweighted Kolmogorov-Smirnov

Canner (1975) & Buning (2001):

- Canner modified KS test introducing one weighting function identical to the one used in AD test.

- Buning modified KS test introducing one weighting function similar to the one used in AD test.

- The equation of the asymptotic distribution is not available in Canner’s paper, only a few critical values for some samples sizes (n=m).

))()( xxGxFSupKSW mnmn

unbinned distributionsunbinned distributions

Page 10: An update on the Statistical Toolkit Barbara Mascialino, Maria Grazia Pia, Andreas Pfeiffer, Alberto Ribon, Paolo Viarengo July 19 th, 2005

2. New GoF Tests:2. New GoF Tests:weighted Cramer von Misesweighted Cramer von Mises

Buning (2001):

- Buning modified CVM test introducing one weighting function similar to the one used in AD test.

- The equation of the asymptotic distribution is not available in the paper, only critical values for many samples sizes.

unbinned distributionsunbinned distributions

)][ xGFCVMW mnmn 2

Page 11: An update on the Statistical Toolkit Barbara Mascialino, Maria Grazia Pia, Andreas Pfeiffer, Alberto Ribon, Paolo Viarengo July 19 th, 2005

2. New GoF Tests:2. New GoF Tests:WatsonWatson

Watson (1975):

-Derives from Cramer-von Mises test statistics.

- Like Kuiper test it can be applied in case of cyclic observations.

- The equation of the asymptotic distribution is not available in the paper, only critical values for many samples sizes.

)()()()( xFxGxGxFW nmmn 2 2

Page 12: An update on the Statistical Toolkit Barbara Mascialino, Maria Grazia Pia, Andreas Pfeiffer, Alberto Ribon, Paolo Viarengo July 19 th, 2005

Other newsOther news

• New user layer dealing with ROOT histograms (Andreas is working on that).

• Paper to IEEE-TNS

• Next release of the GoF Statistical Toolkit scheduled within summer.

Page 13: An update on the Statistical Toolkit Barbara Mascialino, Maria Grazia Pia, Andreas Pfeiffer, Alberto Ribon, Paolo Viarengo July 19 th, 2005

Future developmentsFuture developments

• Fix some design-related problems.

• New design (add uncertainties).

• Extend the toolkit to the comparison of:– Experimental data versus theoretical functions,– k-sample problem,– Many dimensional one-, two-, k-sample problem.

Page 14: An update on the Statistical Toolkit Barbara Mascialino, Maria Grazia Pia, Andreas Pfeiffer, Alberto Ribon, Paolo Viarengo July 19 th, 2005

Which is the recipe toWhich is the recipe toselect the most suitableselect the most suitable

Goodness-of-Fit testGoodness-of-Fit testamong the ones available inamong the ones available inthe GoF Statistical Toolkit?the GoF Statistical Toolkit?

Page 15: An update on the Statistical Toolkit Barbara Mascialino, Maria Grazia Pia, Andreas Pfeiffer, Alberto Ribon, Paolo Viarengo July 19 th, 2005

3. Description of the3. Description of thepower study – phase Ipower study – phase I

SAMPLE 1SAMPLE 1

RESULTS:RESULTS:LOCATION-SCALE LOCATION-SCALE

ALTERNATIVEALTERNATIVE

SAMPLE 2SAMPLE 2TESTTEST

PARENT 1PARENT 1 PARENT 2PARENT 2MONTEMONTECARLOCARLO

REPLICATIONSREPLICATIONSk=1000k=1000

EDF STATISTICS (UNBINNED DATA): EDF STATISTICS (UNBINNED DATA): KS, KSW, KSA, KUIPER, CVM, ADAKS, KSW, KSA, KUIPER, CVM, ADA

““EMPIRICAL” POWER EVALUATIONEMPIRICAL” POWER EVALUATION

RESULTS:RESULTS:GENERAL GENERAL

ALTERNATIVEALTERNATIVE

COMPARISON WITHCOMPARISON WITHPUBLISHED PUBLISHED

RESULTSRESULTS

REALREALDATADATA

EXAMPLESEXAMPLES

Page 16: An update on the Statistical Toolkit Barbara Mascialino, Maria Grazia Pia, Andreas Pfeiffer, Alberto Ribon, Paolo Viarengo July 19 th, 2005

Parent distributionsParent distributions

1)(1 xf

Uniform)

2(

2

2

2

1)(

x

exf

Gaussian

||3

2

1)( xexf

Double exponential

24

1

11)(

xxf

Cauchy

xexf )(5

Exponential

Contaminated Normal Distribution 2

)1,1(5.0)4,1(5.0)(7 xf

)9,0(1.0)1,0(9.0)(6 xf

Contaminated Normal Distribution 1

Page 17: An update on the Statistical Toolkit Barbara Mascialino, Maria Grazia Pia, Andreas Pfeiffer, Alberto Ribon, Paolo Viarengo July 19 th, 2005

Skewness and tailweightSkewness and tailweight

025.05.0

5.0975.0

xx

xxS

125.0875.0

025.0975.0

xx

xxT

ParentParent SS TTf1(x) Uniform 1 1.267

f2(x) Gaussian 1 1.704

f3(x) Double exponential 1 2.161

f4(x) Cauchy 1 5.263

f5(x) Exponential 4.486 1.883

f6(x) Contamined normal 1

1 1.991

f7(x) Contamined normal 2

1.769 1.693

SkewnessSkewness TailweightTailweight

Page 18: An update on the Statistical Toolkit Barbara Mascialino, Maria Grazia Pia, Andreas Pfeiffer, Alberto Ribon, Paolo Viarengo July 19 th, 2005

Comparative evaluation of testsComparative evaluation of tests

ShortShort

(T(T<1.5)<1.5)

MediumMedium

(1.5 < T < 2)(1.5 < T < 2)

LongLong

(T>2)(T>2)

SS~~11 KSKS KS – CVMKS – CVM CVM - ADCVM - AD

SS>1.5>1.5 KS - ADKS - AD ADAD CVM - ADCVM - AD

Skewness

Skewness

TailweightTailweight

2222 Supremum Supremum statistics statistics

teststests

Supremum Supremum statistics statistics

teststests

Tests Tests containing a containing a

weight functionweight function

Tests Tests containing a containing a

weight functionweight function< <

Page 19: An update on the Statistical Toolkit Barbara Mascialino, Maria Grazia Pia, Andreas Pfeiffer, Alberto Ribon, Paolo Viarengo July 19 th, 2005

4. Description of the4. Description of thepower study – phase IIpower study – phase II

SAMPLE 1SAMPLE 1

RESULTS:RESULTS:LOCATION-SCALE LOCATION-SCALE

ALTERNATIVEALTERNATIVE

SAMPLE 2SAMPLE 2TESTTEST

PARENT 1PARENT 1 PARENT 2PARENT 2MONTEMONTECARLOCARLO

REPLICATIONSREPLICATIONSk=1000k=1000

BINNED/UNBINNED DATABINNED/UNBINNED DATACHI2, KS, KSW, KSA, KUIPER, CVM, CVMA, ADA, ADCHI2, KS, KSW, KSA, KUIPER, CVM, CVMA, ADA, AD

““EMPIRICAL” + “MC” POWER EVALUATIONEMPIRICAL” + “MC” POWER EVALUATION

RESULTS:RESULTS:GENERAL GENERAL

ALTERNATIVEALTERNATIVE

LINEAR LINEAR POWERPOWER

CORRELATIONCORRELATIONBETWEENBETWEEN

TESTSTESTSISODYNESISODYNES

Page 20: An update on the Statistical Toolkit Barbara Mascialino, Maria Grazia Pia, Andreas Pfeiffer, Alberto Ribon, Paolo Viarengo July 19 th, 2005

Which is the most suitable goodness-of-fit test?

EXAMPLE EXAMPLE : unbinned dataLateral profilesLateral profiles

5. A concrete example: IMRT5. A concrete example: IMRT

Mich

ela

Mich

ela

Pierge

ntili

Pierge

ntili

mn GFH :0

mn GFH :1

Page 21: An update on the Statistical Toolkit Barbara Mascialino, Maria Grazia Pia, Andreas Pfeiffer, Alberto Ribon, Paolo Viarengo July 19 th, 2005

GoF test selectionGoF test selection

025.05.0

5.0975.0

xx

xxS

125.0875.0

025.0975.0

xx

xxT

SkewnessSkewness TailweightTailweight

S = 1: symmetric distributionS < 1: left skewed distributionS > 1: right skewed distribution

T is always greater than 1, the longer the tail the greater

the value of T.

1.1. Classify the type of the distributions in terms of Classify the type of the distributions in terms of skewness S and tailweight Tskewness S and tailweight T

Page 22: An update on the Statistical Toolkit Barbara Mascialino, Maria Grazia Pia, Andreas Pfeiffer, Alberto Ribon, Paolo Viarengo July 19 th, 2005

Comparative evaluation of tests powerComparative evaluation of tests power

ShortShort

(T(T<1.3)<1.3)

MediumMedium

(1.3 < T < 2)(1.3 < T < 2)

LongLong

(T>2)(T>2)

SS~~11 KSKS KS – CVMKS – CVM CVM - ADCVM - AD

SS>1.5>1.5 KS - ADKS - AD ADAD CVM - ADCVM - ADSkewness

Skewness

TailweightTailweight

2. Choose the most appropriate test for the classified 2. Choose the most appropriate test for the classified type of distributiontype of distribution

Page 23: An update on the Statistical Toolkit Barbara Mascialino, Maria Grazia Pia, Andreas Pfeiffer, Alberto Ribon, Paolo Viarengo July 19 th, 2005

GoF test: test selection & resultsGoF test: test selection & results

Moderate skewed – medium tail

KOLMOGOROV-SMIRNOV TESTD=0.27 – p>0.05

X-variable: Ŝ=1.53 T=1.36

Y-variable: Ŝ=1.27 T=1.34

^

^

RESULTSRESULTS: unbinned data