Mobile Computing Group

A quick-and-dirty tutorial on the chi2 test for goodness-of-fit testing

Outline

Background concepts

Goodness-of-fit (GoF)

Chi2 tests for GoF

The presentation follows the pyramid schema

Background

Descriptive vs. inferential statistics

– Descriptive : data are used only to describe the sample (tables, graphs, measures of variability etc.)

– Inferential : data are used to draw inferences, make predictions etc.

Sample vs. population

– A sample is drawn from a population, assumed to have some characteristics.

– The sample is often used to make inferences about the population (inferential statistics) :

Hypothesis testing

Estimation of population parameters

Background

Statistic vs. parameter

– A statistic is computed from a sample. It can be used for both descriptive and inferential purposes

– A parameter refers to the whole population. A sample statistic is often used to infer a population parameter

Example : the sample mean may be used to infer the population mean (expected value)

Hypothesis testing

– A procedure where sample data are used to evaluate a hypothesis regarding the population

– A hypothesis may refer to several things : properties of a single population, relation between two populations etc.

– Two statistical hypotheses are defined: a null H0 and an alternative H1

H0 is often a statement of no effect or no difference. It is the hypothesis the researcher seeks to reject

Background

Inferential statistical test

– Hypothesis testing is carried out via an inferential statistical test :

Sample data are manipulated to yield a test statistic

The obtained value of the test statistic is evaluated with respect to a sampling distribution, i.e., a theoretical probability distribution for the possible values of the test statistic

The theoretical values of the statistic are usually tabulated, so that one can assess the statistical significance of the result of the test

Goodness-of-fit (GoF) testing is a type of hypothesis testing

– Devise inferential statistical tests, apply them to the sample, and infer whether a theoretical distribution matches the population distribution

GoF as hypothesis testing

Hypothesis H0:

– The sample is derived from a theoretical distribution F()

The sample data are manipulated to derive a test statistic

– In the case of the chi2 statistic this includes aggregation of data into bins and some computations

The statistic, as computed from data, is checked against the sampling distribution

– For the chi2 test, the sampling distribution is the chi2 distribution, hence the name

Goodness-of-fit

Statistical tests and statistics : the big picture

Chi2 type tests

– Classical chi2 statistics : Pearson chi2 statistic, modified chi2 statistic, log-likelihood ratio statistic

– Generalized chi2 statistics

EDF-based tests

– e.g., KS test, Anderson-Darling test

Specialized tests

– e.g., Shapiro-Wilk test for normality

Pearson chi2 statistic

If X1, X2, …, Xn is the random sample and F() the theoretical distribution under test, the Pearson chi2 statistic is computed as:

\[
X^2 = \sum_{i=1}^{M} \frac{(O_i - E_i)^2}{E_i} = \sum_{i=1}^{M} \frac{(N_i - n p_i)^2}{n p_i}
\]

where

M : number of bins

Oi (Ni) : observed frequency in bin i

n : sample size

Ei (npi) : expected frequency in bin i according to the theoretical distribution F(), with

\[
p_i = P(X_j \text{ falls in bin } i) = \int_{\text{bin } i} dF(x)
\]
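As a rough illustration of the definitions above, the following sketch computes the bin probabilities p_i as differences of the CDF at the bin edges and then the Pearson statistic. The exponential CDF, the bin edges and the observed counts are hypothetical choices made only for this example; the function names are not taken from any library.

```python
import math

def bin_probabilities(cdf, edges):
    """p_i = F(b_i) - F(b_{i-1}) for consecutive bin edges b_0 < b_1 < ... < b_M."""
    return [cdf(hi) - cdf(lo) for lo, hi in zip(edges[:-1], edges[1:])]

def pearson_chi2(observed, probs):
    """X^2 = sum_i (N_i - n p_i)^2 / (n p_i), with n the total sample size."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))

def exp_cdf(x, rate=1.0):
    """CDF of an exponential distribution (hypothetical F() for this sketch)."""
    return 1.0 - math.exp(-rate * x)

# Hypothetical bins covering [0, inf) and hypothetical observed counts (n = 100)
edges = [0.0, 0.5, 1.0, 2.0, math.inf]
observed = [40, 25, 22, 13]

probs = bin_probabilities(exp_cdf, edges)
x2 = pearson_chi2(observed, probs)
print("bin probabilities:", [round(p, 4) for p in probs])
print("Pearson X2:", round(x2, 4))
```

The last bin extends to infinity so that the probabilities sum to 1; otherwise the expected frequencies would not add up to n.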

Interpretation of chi2 statistic

Theory says that the Pearson chi2 statistic follows a chi2 distribution, whose df are

– M-1, when the parameters of the fitted distribution are given a priori (case 0 test)

– Somewhere between M-1 and M-1-q, when the q parameters of the distribution are estimated by the sample data

– Usually, the df for this case are taken to be M-1-q

Having computed the value of the chi2 statistic X2, I check the chi2 distribution with M-1 (or M-1-q) df to find

– The probability of getting a value equal to or greater than the computed X2, called the p-value

– If p < a, where a is the significance level of my test, the hypothesis H0 is rejected; otherwise it is retained

– Standard values for a are 0.1, 0.05, 0.01; the lower a is, the more conservative I am in rejecting the hypothesis H0
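The decision rule can be sketched in a few lines. This is a minimal sketch, assuming integer df, that uses the closed-form series for the upper tail of the chi2 distribution so that only the standard library is needed; the helper names `chi2_sf` and `gof_decision` and the input values in the usage lines are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi2_df >= x) for x >= 0 and integer df >= 1."""
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{j=0}^{df/2 - 1} y^j / j!
        return math.exp(-y) * sum(y ** j / math.factorial(j) for j in range(df // 2))
    # Odd df: erfc(sqrt(y)) + exp(-y) * sum_{j=1}^{(df-1)/2} y^(j - 1/2) / Gamma(j + 1/2)
    series = sum(y ** (j - 0.5) / math.gamma(j + 0.5)
                 for j in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(y)) + math.exp(-y) * series

def gof_decision(x2, num_bins, alpha=0.05, num_estimated=0):
    """Return (p_value, reject_H0) using df = M - 1 - q."""
    df = num_bins - 1 - num_estimated
    p_value = chi2_sf(x2, df)
    return p_value, p_value < alpha

# Hypothetical inputs: X2 = 12.3 from M = 8 bins, q = 2 parameters estimated from the sample
p_value, reject = gof_decision(12.3, num_bins=8, alpha=0.05, num_estimated=2)
print(f"p-value = {p_value:.4f}, reject H0: {reject}")
```

Using M-1-q df when parameters are estimated follows the usual convention mentioned above; the true df lie somewhere between M-1-q and M-1.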

Example

A die is rolled 120 times

1 comes 20 times, 2 comes 14, 3 comes 18, 4 comes 17, 5 comes 22 and 6 comes 29 times

The question is: “Is the die biased?” –or better: “Do these data suggest that the die is biased?”

Hypothesis H0 : the die is not biased

– Therefore, according to the null hypothesis these numbers should be distributed uniformly

– F() : the discrete uniform distribution

Example – cont.

Computations:

Bin    Oi    Ei    Oi-Ei   (Oi-Ei)2   (Oi-Ei)2/Ei
1      20    20      0        0         0
2      14    20     -6       36         1.8
3      18    20     -2        4         0.2
4      17    20     -3        9         0.45
5      22    20      2        4         0.2
6      29    20      9       81         4.05
Sums  120   120      0                  X2 = 6.7

Interpretation

– The distribution of the test statistic has 5 df (M-1 = 6-1, since no parameters are estimated)

– The probability of getting a value greater than or equal to 6.7 under a chi2 distribution with 5 df (the p-value) is about 0.25, which is > a for all a in {0.01, 0.05, 0.1}

– Therefore the hypothesis that the die is not biased cannot be rejected
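The arithmetic of the die example can be reproduced with a short script. The counts are the ones given in the example; the variable names are only illustrative.

```python
observed = [20, 14, 18, 17, 22, 29]  # counts for faces 1..6, from the example
n = sum(observed)                    # 120 rolls
expected = [n / len(observed)] * len(observed)  # H0: fair die, so E_i = n/6 = 20

# Per-bin contributions (O_i - E_i)^2 / E_i and their sum X2
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
x2 = sum(contributions)

for face, (o, e, c) in enumerate(zip(observed, expected, contributions), start=1):
    print(f"{face}  {o:3d}  {e:5.1f}  {o - e:5.1f}  {c:5.2f}")
print(f"X2 = {x2:.2f}")
```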

Interpretation of Pearson chi2

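Graphical illustration

[Figure: density f(z) of the chi2 distribution with 5 df plotted against z, with the following points marked on the z axis]

z :        6.7    9.24   11.07   15.09
P-value :  0.25   0.1    0.05    0.01

– 10% of the area under the curve lies to the right of z = 9.24

– At the 10% significance level, I would reject the hypothesis if the computed X2 > 9.24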
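The areas quoted for the figure can be checked numerically. The sketch below integrates the chi2 density with Simpson's rule and reports the area to the right of each marked point; it assumes df >= 2 (the density is unbounded at 0 for df = 1), and the helper names are hypothetical.

```python
import math

def chi2_pdf(z, df):
    """Density of the chi2 distribution with df degrees of freedom (df >= 2 here)."""
    half = df / 2.0
    return z ** (half - 1.0) * math.exp(-z / 2.0) / (2.0 ** half * math.gamma(half))

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def tail_area(c, df):
    """Area under the density to the right of c, i.e. 1 minus the area on [0, c]."""
    return 1.0 - simpson(lambda z: chi2_pdf(z, df), 0.0, c)

for c in (6.7, 9.24, 11.07, 15.09):
    print(f"area to the right of {c}: {tail_area(c, 5):.3f}")
```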

Properties of Pearson chi2 statistic

It can be computed for both discrete and continuous variables

– Holds for all chi2 statistics. Maximum flexibility, but for continuous variables binning fails to make use of all available information

It is arguably the simplest one from a computational point of view

As with all chi2 statistics, one needs to define number and borders of bins

– These are generally a function of sample size and the theoretical distribution under test

Bin selection

How many and which?

– Different opinions in the literature, no rigorous proof of optimality

There seems to be convergence on the following aspects

– Probability of bins

The bins should be chosen equiprobable with respect to the theoretical distribution under test

– Minimum expected frequencies npi :

(Cramer, 46) : npi > 10, for all bins

(Cochran, 54) : npi > 1 for all bins, npi >= 5 for 80% of bins

(Roscoe and Byars,71)
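A minimal sketch of these two recommendations for a continuous distribution: equi-probable bin edges are obtained by inverting the CDF at the levels i/M, and the expected frequencies n/M are then checked against the quoted rules. The exponential distribution, the sample size and the function names are assumptions made for illustration, and "80% of bins" is read here as "at least 80%".

```python
import math

def equiprobable_edges_exponential(num_bins, rate=1.0):
    """Edges b_0..b_M with F(b_i) = i/M for the exponential CDF F(x) = 1 - exp(-rate*x)."""
    inner = [-math.log(1.0 - i / num_bins) / rate for i in range(1, num_bins)]
    return [0.0] + inner + [math.inf]

def cramer_ok(expected):
    """Rule quoted from (Cramer, 46): n*p_i > 10 for all bins."""
    return all(e > 10 for e in expected)

def cochran_ok(expected):
    """Rule quoted from (Cochran, 54): n*p_i > 1 for all bins, n*p_i >= 5 for 80% of bins."""
    share_at_least_five = sum(e >= 5 for e in expected) / len(expected)
    return all(e > 1 for e in expected) and share_at_least_five >= 0.8

# Hypothetical setting: n = 60 observations, M = 8 equi-probable bins
n, num_bins = 60, 8
edges = equiprobable_edges_exponential(num_bins)
expected = [n / num_bins] * num_bins  # equi-probable bins give E_i = n/M for every bin
print("edges:", [round(b, 3) for b in edges])
print("Cramer rule satisfied:", cramer_ok(expected))
print("Cochran rule satisfied:", cochran_ok(expected))
```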

Bin selection

Relation of the number of bins M to the sample size n

– (Mann and Wald, 42), (Schorr, 74) : for large sample sizes

\( 1.88\, n^{2/5} < M < 3.76\, n^{2/5} \)

– (Koehler and Larntz, 80) : for small sample sizes

\( M \ge 3 \), \( n \ge 10 \) and \( n^2/M \ge 10 \)

– (Roscoe and Byars, 71)

Equi-probable bins : n > M when a = 0.01 and a = 0.05

Non-equiprobable bins : n > 2M (a = 0.05) and n > 4M (a = 0.01)
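The first two rules are easy to evaluate for a given sample size. The following sketch simply codes the inequalities as quoted above; the function names and the sample sizes in the usage lines are hypothetical.

```python
import math

def mann_wald_range(n):
    """Bounds 1.88*n^(2/5) < M < 3.76*n^(2/5), quoted for large sample sizes."""
    return 1.88 * n ** 0.4, 3.76 * n ** 0.4

def mann_wald_candidates(n):
    """Integer numbers of bins strictly inside the Mann and Wald range."""
    low, high = mann_wald_range(n)
    return list(range(math.floor(low) + 1, math.ceil(high)))

def koehler_larntz_ok(n, num_bins):
    """Conditions M >= 3, n >= 10 and n^2/M >= 10, quoted for small sample sizes."""
    return num_bins >= 3 and n >= 10 and n * n / num_bins >= 10

low, high = mann_wald_range(1000)
candidates = mann_wald_candidates(1000)
print(f"n = 1000: {low:.1f} < M < {high:.1f}, i.e. M from {candidates[0]} to {candidates[-1]}")
print("n = 20, M = 5 acceptable:", koehler_larntz_ok(20, 5))
```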

Bin selection

Bins vs. sample size according to Mann and Wald

Bin selection : cont. vs. discrete

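[Figure, two panels, each plotting the CDF FX(x) against x, with the vertical axis marked from 0.1 to 1.0 and bin i indicated]

– Continuous variable : equi-probable bins are easy to select

– Discrete variable (values 1 to 7) : less straightforward to define equi-probable bins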

References

Textbooks

D.J. Sheskin, Handbook of parametric and nonparametric statistical procedures

– Introduction (descriptive vs. inferential statistics, hypothesis testing, concepts and terminology)

– Test 8 (chap. 8) : The Chi-Square Goodness-of-Fit Test (high-level description with examples and discussion on several aspects)

R. D'Agostino, M. Stephens, Goodness-of-fit techniques

– Chapter 3 : Tests of Chi-square type

– Reviews the theoretical background and looks more generally at chi2 tests, not only the Pearson test

Papers

S. Horn, Goodness-of-Fit tests for discrete data: A review and an Application to a Health Impairment scale

– Good discussion of the properties and pros/cons of most goodness-of-fit tests for discrete data

– Accessible, tutorial-like
