cynthia-price
χ² test
•For testing the significance of patterns in qualitative data
•The test statistic is based on counts that represent the number of items that fall in each category
•The test statistic measures the agreement between the actual counts and the counts expected under the null hypothesis
Chi-squared Tests
χ² Distribution
The chi-squared distribution can be used to see whether observed counts agree with expected counts. Let
O = observed count and
E = expected count
Chi-squared Distribution
\[ \chi^2 = \sum \frac{(O - E)^2}{E} \]
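The statistic can be sketched in a few lines of Python. The function name `chi_squared` and the three-category counts below are illustrative assumptions, not part of the slides.

```python
def chi_squared(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for three categories, each expected to hold 20 items.
statistic = chi_squared([18, 22, 20], [20, 20, 20])
print(statistic)
```

A value near zero means the observed counts sit close to the expected ones; larger values mean larger disagreement.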
Testing if Observed Counts are in Agreement with Known Percentages
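Consider items of a population distributed over k categories in proportions \pi_1, \pi_2, \ldots, \pi_k. The null hypothesis specifies these proportions:
\[ H_0: \pi_i = \pi_{i0}, \quad i = 1, 2, \ldots, k \]
If H0 is true, then for a sample of n items we expect
\[ E_i = n \pi_{i0} \]
as the expected frequency for the ith category, as opposed to O_i, the observed frequency.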
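A minimal sketch of the expected-frequency step, assuming the hypothesized proportions are supplied as a list that sums to 1. The helper name `expected_counts` is hypothetical.

```python
def expected_counts(n, proportions):
    """Expected frequency for each category: n times its hypothesized proportion."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("hypothesized proportions must sum to 1")
    return [n * p for p in proportions]


# Two equally likely categories and a sample of 100 items.
print(expected_counts(100, [0.5, 0.5]))
```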
An Example: Biased Coin?
       Observed     Expected
       Frequency    Frequency
H      40           50
T      60           50
sum    100          100
χ² statistic formula
\[ \chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(40 - 50)^2}{50} + \frac{(60 - 50)^2}{50} = \frac{(-10)^2}{50} + \frac{(10)^2}{50} = \frac{100}{50} + \frac{100}{50} = 2 + 2 = 4 \]
χ² distribution
degrees of freedom = (R –1)(C – 1)
R = number of rows
C = number of columns
χ² distribution
Is our chi-squared value an extreme outcome that arose just by chance, while the null hypothesis is in fact true and the sample frequencies do not differ significantly from the expected frequencies?
Note that the chi-squared statistic is never negative.
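One way to answer this question for the coin example is to compute the right-tail probability of the statistic. The sketch below assumes 1 degree of freedom and uses the fact that a chi-squared variable with 1 degree of freedom is the square of a standard normal variable. The function name `p_value_df1` is hypothetical.

```python
import math


def p_value_df1(chi2):
    """P(X >= chi2) for a chi-squared variable with 1 degree of freedom.

    Uses P(Z^2 >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)).
    """
    if chi2 < 0:
        raise ValueError("the chi-squared statistic cannot be negative")
    return math.erfc(math.sqrt(chi2 / 2.0))


p = p_value_df1(4.0)  # statistic from the coin example
print(round(p, 4))  # about 0.0455
```

A small probability means a value this large would rarely occur by chance if the null hypothesis were true.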
χ² test
•only the right-hand side of the table is used
•nondirectional test
•the statistic has no sign
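The nondirectional nature of the statistic can be illustrated with a short sketch: too few heads and too many heads, by the same margin, give the same value. The function name is an assumption made for illustration.

```python
def chi_squared(observed, expected):
    """Sum of (O - E)^2 / E; squaring removes the sign of each deviation."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


too_few_heads = chi_squared([40, 60], [50, 50])
too_many_heads = chi_squared([60, 40], [50, 50])
print(too_few_heads, too_many_heads)  # both are 4.0
```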
Biased Die?
       Observed     Expected
Die    Frequency    Frequency
1      4            10
2      6            10
3      17           10
4      16           10
5      8            10
6      9            10
sum    60           60
χ² statistic formula
\[ \chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(4 - 10)^2}{10} + \frac{(6 - 10)^2}{10} + \frac{(17 - 10)^2}{10} + \frac{(16 - 10)^2}{10} + \frac{(8 - 10)^2}{10} + \frac{(9 - 10)^2}{10} = 14.2 \]
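The die calculation can be checked term by term with a short script. The variable names are illustrative; the counts are those of the die table.

```python
observed = [4, 6, 17, 16, 8, 9]   # observed frequency for faces 1 to 6
expected = [10] * 6               # a fair die rolled 60 times

# One (O - E)^2 / E term per face.
terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2 = sum(terms)

for face, term in enumerate(terms, start=1):
    print(face, round(term, 1))
print("chi-squared:", round(chi2, 1))
```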
χ² distribution
degrees of freedom = number of terms - 1
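For the die, six terms give 5 degrees of freedom. The following is a sketch of how the right-tail probability could then be computed with only the standard library, using the closed-form series that exist for integer degrees of freedom. The function name `chi2_sf` is hypothetical, and a statistics library would normally be used instead.

```python
import math


def chi2_sf(x, df):
    """P(X >= x) for a chi-squared variable with integer df >= 1."""
    if df < 1 or x < 0:
        raise ValueError("need df >= 1 and x >= 0")
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series of correction terms,
    # sqrt(2/pi) * exp(-x/2) * sum_j x^(j - 1/2) / (1 * 3 * ... * (2j - 1)).
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    total = 0.0
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return math.erfc(math.sqrt(half)) + total


number_of_terms = 6
df = number_of_terms - 1
p = chi2_sf(14.2, df)  # statistic from the die example
print(df, p)
```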
2 x 2 contingency tables
Chi-squared test for independence
                  Var B
                  b1      b2      total
Var A    a1
         a2
         total
H0: The two variables are independent
Ha: The two variables are associated
χ² test: 2 x 2 contingency tables
            Result
Operator    def     not def.    total
A           100     900         1000
B           60      440         500
total       160     1340        1500
χ² test
Total number of items = 1500
Total number of defective items = 160
Overall defective rate = 160/1500 = 0.1067
Now, apply this rate to the number of items produced by each operator.
χ² test
Expected defective from Operator A
= 1000 * 0.1067 = 106.7
(expected not defective=1000-106.7=893.3)
Expected defective from Operator B
= 500 * 0.1067 = 53.3
(expected not defective=500-53.3=446.7)
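The same expected counts follow from the row and column totals: each expected cell is (row total x column total) / grand total. The sketch below uses illustrative variable names. It keeps full precision, so the values differ slightly from the slide figures, which use the rounded rate 0.1067.

```python
# Observed counts: rows are operators A and B, columns are (def, not def.).
table = [[100, 900],
         [60, 440]]

row_totals = [sum(row) for row in table]        # items per operator
col_totals = [sum(col) for col in zip(*table)]  # defective / not defective overall
grand_total = sum(row_totals)

# Expected count for each cell under independence.
expected = [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]

for operator, row in zip("AB", expected):
    print(operator, [round(value, 1) for value in row])
```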
χ² test
Expected
Operator    def      not def.
A           106.7    893.3
B           53.3     446.7

Result
Operator    def     not def.    total
A           100     900         1000
B           60      440         500
total       160     1340        1500
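With observed and expected counts side by side, the statistic for the operator table can be sketched as follows. The comparison at the end assumes a 5% significance level, which the slides do not state; 3.841 is the standard critical value for 1 degree of freedom at that level.

```python
observed = [[100, 900],
            [60, 440]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Sum (O - E)^2 / E over the four cells, with E = row total * column total / grand total.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total
        chi2 += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 3.841  # assumed 5% level, 1 degree of freedom
print(round(chi2, 3), df, chi2 > critical_value)
```

Under that assumed level, a statistic below the critical value would not give grounds to reject independence of operator and result.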
χ² test
r x c contingency tables
        SA    A     NO    D     SD
Gr 1    12    18    4     8     12
Gr 2    48    22    10    8     10
Gr 3    10    4     12    10    12
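The same procedure extends to a table with r rows and c columns. The following sketch applies it to the three-group table; the function name `chi_squared_independence` is hypothetical.

```python
def chi_squared_independence(table):
    """Return (statistic, degrees of freedom, expected counts) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    expected = [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, expected


# Rows: Gr 1, Gr 2, Gr 3. Columns: SA, A, NO, D, SD.
survey = [[12, 18, 4, 8, 12],
          [48, 22, 10, 8, 10],
          [10, 4, 12, 10, 12]]

statistic, df, expected = chi_squared_independence(survey)
print(round(statistic, 2), df)
```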
χ² test
•use when you have categorical data
•measure the difference between actual counts and expected counts
•test the independence of two variables
•Assumptions:
  the data set is a random sample
  you have at least 5 counts in each category
•degrees of freedom = (categories var1 - 1)(categories var2 - 1)
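The count assumption and the degrees-of-freedom rule can be checked before running a test. In the sketch below the helper names are hypothetical, and applying the minimum of 5 to the expected counts rather than the observed ones is an assumption that follows common textbook practice.

```python
def meets_minimum_count(counts, minimum=5):
    """True if every category has at least `minimum` counts."""
    return all(count >= minimum for count in counts)


def degrees_of_freedom(categories_var1, categories_var2):
    """(categories var1 - 1) * (categories var2 - 1)."""
    return (categories_var1 - 1) * (categories_var2 - 1)


die_expected = [10] * 6              # expected counts from the die example
die_observed = [4, 6, 17, 16, 8, 9]  # observed counts from the die example

print(meets_minimum_count(die_expected))  # True
print(meets_minimum_count(die_observed))  # False: face 1 was seen only 4 times
print(degrees_of_freedom(3, 5))           # 8 for a 3 x 5 table
```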