For testing significance of patterns in qualitative data. Test statistic is based on counts that represent the number of items that fall in each category.

χ² test

•For testing significance of patterns in qualitative data

•Test statistic is based on counts that represent the number of items that fall in each category

•Test statistic measures the agreement between actual counts and expected counts, assuming the null hypothesis

Chi-squared Tests

χ² Distribution

The chi-square distribution can be used to see whether or not observed counts agree with expected counts. Let

O = observed count and

E = expected count

Chi-squared Distribution

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Testing if Observed Counts are in Agreement with Known Percentages

Consider n items of a population distributed over k categories in proportions $\pi_i$, $i = 1, 2, \ldots, k$.

$$H_0: \pi_i = \pi_{i0}, \quad i = 1, 2, \ldots, k$$

If H0 is true then we expect $E_i = n\,\pi_{i0}$, the expected frequency for the ith category, as opposed to $O_i$, the observed frequency.

An Example: Biased Coin?

        Observed     Expected
        Frequency    Frequency
H          40           50
T          60           50
sum       100          100

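χ² statistic formula:

$$\begin{aligned}
\chi^2 &= \sum \frac{(O - E)^2}{E} \\
&= \frac{(40 - 50)^2}{50} + \frac{(60 - 50)^2}{50} \\
&= \frac{(-10)^2}{50} + \frac{(10)^2}{50} \\
&= \frac{100}{50} + \frac{100}{50} \\
&= 2 + 2 = 4
\end{aligned}$$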
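The coin calculation can be checked with a few lines of Python. This is only a minimal sketch: the function name `chi_squared_statistic` is illustrative, and the fair-coin proportions of 0.5 are the assumption behind the expected counts of 50.

```python
def chi_squared_statistic(observed, proportions):
    """Sum of (O - E)^2 / E, with E = n * hypothesised proportion."""
    n = sum(observed)
    expected = [n * p for p in proportions]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Coin example: 40 heads and 60 tails; H0 assumes a fair coin (0.5 each).
coin_chi2 = chi_squared_statistic([40, 60], [0.5, 0.5])
print(coin_chi2)  # 4.0
```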

χ² distribution

degrees of freedom = (R - 1)(C - 1)

R = number of rows

C = number of columns

χ² distribution

Is our chi-squared value an extreme outcome that arose just by chance, while in fact the null hypothesis is true and the sample frequencies do not differ significantly from the ideal frequencies?

Note that the chi-squared statistic cannot be negative

χ² test

•only the right-hand side of the table is used

•nondirectional test

•the statistic has no sign
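Because only the right-hand tail matters, the question "how extreme is this value?" comes down to one tail probability. The sketch below computes it with the standard library only, using the closed-form expressions that hold for whole-number degrees of freedom. The function name `chi2_right_tail` is an assumption made for illustration; a statistics package would normally be used instead.

```python
import math


def chi2_right_tail(x, df):
    """P(chi-squared with integer df >= x), via closed-form finite sums."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        terms = (half ** j / math.factorial(j) for j in range(df // 2))
        return math.exp(-half) * sum(terms)
    # Odd df: erfc(sqrt(x/2)) plus a finite correction sum.
    tail = math.erfc(math.sqrt(half))
    for j in range(1, (df - 1) // 2 + 1):
        tail += math.exp(-half) * half ** (j - 0.5) / math.gamma(j + 0.5)
    return tail


# A statistic of 4 with 1 degree of freedom (the coin example).
p_coin = chi2_right_tail(4.0, 1)
print(round(p_coin, 4))  # 0.0455
```

A small tail probability means a value this large would rarely occur by chance if the null hypothesis were true.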

Biased Die?

         Observed     Expected
Die      Frequency    Frequency
1            4           10
2            6           10
3           17           10
4           16           10
5            8           10
6            9           10
sum         60           60

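χ² statistic formula:

$$\begin{aligned}
\chi^2 &= \sum \frac{(O - E)^2}{E} \\
&= \frac{(4 - 10)^2}{10} + \frac{(6 - 10)^2}{10} + \frac{(17 - 10)^2}{10} + \frac{(16 - 10)^2}{10} + \frac{(8 - 10)^2}{10} + \frac{(9 - 10)^2}{10} \\
&= \frac{36 + 16 + 49 + 36 + 4 + 1}{10} \\
&= 14.2
\end{aligned}$$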

χ² distribution

degrees of freedom = number of terms - 1
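The die example can be carried through to a decision by comparing the statistic with a tabulated critical value. This is a sketch under one assumption the slides do not state: a 5% significance level. The dictionary holds the standard 5% right-tail critical values for 1 to 5 degrees of freedom.

```python
observed = [4, 6, 17, 16, 8, 9]
expected = [10] * 6  # fair die: 60 rolls / 6 faces

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1  # number of terms - 1

# Standard 5% right-tail critical values (the 5% level is an assumption).
CRITICAL_5_PERCENT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}

reject_h0 = chi2 > CRITICAL_5_PERCENT[df]
print(round(chi2, 1), df, reject_h0)  # 14.2 5 True
```

At that assumed level, 14.2 exceeds 11.070, so the observed frequencies would be judged inconsistent with a fair die.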

2 x 2 contingency tables
Chi-squared test for independence

                 Var B
Var A      b1      b2      total
a1
a2
total

H0: The two variables are independent

Ha: The two variables are associated

χ² test: 2 x 2 contingency tables

                  Result
Operator    def    not def.    total
A           100      900       1000
B            60      440        500
total       160     1340       1500


Total number of items = 1500

Total number of defective items = 160

Overall defective rate = 160/1500 = 0.1067

Now, apply this rate to the number of items produced by each operator.


Expected defective from Operator A = 1000 * 0.1067 = 106.7

(expected not defective = 1000 - 106.7 = 893.3)

Expected defective from Operator B = 500 * 0.1067 = 53.3

(expected not defective = 500 - 53.3 = 446.7)
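Applying the overall rate to each operator's total is the same as computing row total * column total / grand total for every cell. The short sketch below does this for the operator table; the variable names are illustrative only.

```python
# Rows: operators A and B; columns: defective, not defective.
observed = [[100, 900],
            [60, 440]]

row_totals = [sum(row) for row in observed]        # [1000, 500]
col_totals = [sum(col) for col in zip(*observed)]  # [160, 1340]
grand_total = sum(row_totals)                      # 1500

# Expected count = row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

for row in expected:
    print([round(e, 1) for e in row])
# [106.7, 893.3]
# [53.3, 446.7]
```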

χ² test

Expected counts
                  Result
Operator    def      not def.
A           106.7     893.3
B            53.3     446.7

Observed counts
                  Result
Operator    def    not def.    total
A           100      900       1000
B            60      440        500
total       160     1340       1500
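With observed and expected counts side by side, the statistic is the sum of the four cell contributions. The sketch below uses the rounded expected values from the tables. The comparison with 3.841 assumes a 5% significance level, which the slides do not specify.

```python
observed = [100, 900, 60, 440]
expected = [106.7, 893.3, 53.3, 446.7]  # rounded expected counts from the table

# One (O - E)^2 / E term per cell of the 2 x 2 table.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2 = sum(contributions)
df = (2 - 1) * (2 - 1)  # (R - 1)(C - 1)

print(round(chi2, 2), df)  # 1.41 1
print(chi2 > 3.841)        # False (3.841 is the 5% critical value for df = 1)
```

Under that assumed level, the difference between the operators' defective rates would not be judged significant.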

χ² test

r x c contingency tables

         SA    A     NO    D     SD
Gr 1     12    18     4     8    12
Gr 2     48    22    10     8    10
Gr 3     10     4    12    10    12
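The same steps extend to any r x c table: expected counts from the margins, then the sum of (O - E)² / E over all cells, with (R - 1)(C - 1) degrees of freedom. The following is a minimal sketch applied to the survey table above; the function name `chi_squared_independence` is an assumption made for illustration.

```python
def chi_squared_independence(table):
    """Return (statistic, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


survey = [
    [12, 18, 4, 8, 12],   # Gr 1
    [48, 22, 10, 8, 10],  # Gr 2
    [10, 4, 12, 10, 12],  # Gr 3
]
survey_chi2, survey_df = chi_squared_independence(survey)
print(round(survey_chi2, 2), survey_df)
```

For a 3 x 5 table the degrees of freedom are (3 - 1)(5 - 1) = 8.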

χ² test

•use when you have categorical data

•measure the difference between actual counts and expected counts

•test the independence of two variables

•Assumptions: the data set is a random sample; you have at least 5 counts in each category

•degrees of freedom = (categories var1 - 1)(categories var2 - 1)
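The count assumption is easy to check before running the test. The sketch below applies the rule literally to observed counts, using the r x c survey table as input. The helper name `cells_below_minimum` is hypothetical. Many textbooks apply the same threshold to expected counts instead, and the helper works on either table.

```python
def cells_below_minimum(table, minimum=5):
    """List (row, column, count) for every cell whose count is below minimum."""
    return [(i, j, count)
            for i, row in enumerate(table)
            for j, count in enumerate(row)
            if count < minimum]


survey = [
    [12, 18, 4, 8, 12],   # Gr 1
    [48, 22, 10, 8, 10],  # Gr 2
    [10, 4, 12, 10, 12],  # Gr 3
]
small_cells = cells_below_minimum(survey)
print(small_cells)  # [(0, 2, 4), (2, 1, 4)]
```

Two cells in that table have a count of 4, so the result of the test on it should be read with some caution.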
