kenneth-ball
Chi-square Basics
The Chi-square distribution
$\chi^2 = \sum_{i=1}^{n} z_i^2$
• Positively skewed but becomes symmetrical with increasing degrees of freedom
• Mean = k, where k = degrees of freedom
• Variance = 2k
• Assuming a normally distributed dataset and sampling a single z² value at a time: $\chi^2(1) = z^2$
– If more than one… $\chi^2(N) = \sum_{i=1}^{N} z_i^2$
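The properties above can be checked with a short simulation. This is only a sketch: the function names, the seed, and the number of draws are arbitrary choices for illustration, and the printed values are sample estimates that will only approximate k and 2k.

```python
import random
import statistics

def chi_square_draw(k, rng):
    """One chi-square(k) value: the sum of k squared standard-normal z scores."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))

def simulate(k, n_draws=20000, seed=0):
    """Return the sample mean and sample variance of n_draws chi-square(k) values."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same numbers
    draws = [chi_square_draw(k, rng) for _ in range(n_draws)]
    return statistics.mean(draws), statistics.variance(draws)

if __name__ == "__main__":
    for k in (1, 3, 10):
        mean, var = simulate(k)
        print(f"k={k}: mean ~ {mean:.2f} (theory {k}), variance ~ {var:.2f} (theory {2 * k})")
```

With larger k the simulated distribution also looks more symmetrical, in line with the first bullet.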
Why used?
• Chi-square analysis is primarily used to deal with categorical (frequency) data
• We measure the “goodness of fit” between our observed outcome and the expected outcome for some variable
• With two variables, we test in particular whether they are independent of one another using the same basic approach.
One-dimensional
• Suppose we want to know how people in a particular area will vote in general and go around asking them.
• How will we go about seeing what’s really going on?
Republican Democrat Other
20 30 10
• Hypothesis: Dems should win district
• Solution: chi-square analysis to determine if our outcome is different from what would be expected if there was no preference
$\chi^2 = \sum \frac{(O - E)^2}{E}$
• Plug in to formula
            Republican   Democrat   Other
Observed    20           30         10
Expected    20           20         20
$\chi^2 = \frac{(20 - 20)^2}{20} + \frac{(30 - 20)^2}{20} + \frac{(10 - 20)^2}{20} = 0 + 5 + 5 = 10$
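The same plug-in calculation can be sketched in a few lines of Python. The function name `chi_square_gof` is a hypothetical helper for illustration; the critical value 5.99 is the one quoted on the slides for 2 degrees of freedom at the .05 level.

```python
def chi_square_gof(observed, expected):
    """Goodness-of-fit statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [20, 30, 10]                          # Republican, Democrat, Other
n = sum(observed)                                # 60 respondents
expected = [n / len(observed)] * len(observed)   # no preference: 20 in each category

statistic = chi_square_gof(observed, expected)
df = len(observed) - 1                           # categories minus one
CRITICAL_05_DF2 = 5.99                           # critical value taken from the slides

print(f"chi-square({df}) = {statistic}")
print("reject H0" if statistic > CRITICAL_05_DF2 else "retain H0")
```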
$\chi^2(2) = 10$, $\chi^2_{.05} = 5.99$
• Reject H0, since 10 > 5.99
• The district will probably vote Democratic
• However…
Conclusion
• Note that all we really can conclude is that our data differ from the outcome expected under the null hypothesis (here, no preference)
– Although it would appear that the district will vote Democratic, really we can only conclude they were not responding by chance
– Regardless of the position of the frequencies we'd have come up with the same result
– In other words, it is a non-directional test regardless of the prediction
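A quick way to see the non-directional point is to shuffle the observed frequencies across the three categories and recompute the statistic. This is a sketch with a hypothetical helper name; the expected value of 20 per category is the no-preference expectation from the example.

```python
from itertools import permutations

def chi_square_gof(observed, expected):
    """Goodness-of-fit statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

expected = [20, 20, 20]  # no preference among the three categories

# Every possible assignment of the counts 20, 30, 10 to the three categories
results = {counts: chi_square_gof(counts, expected)
           for counts in permutations([20, 30, 10])}

for counts, value in results.items():
    print(counts, value)

distinct_values = set(results.values())
```

Every ordering gives the same statistic, so the test cannot tell a Democratic lead from a Republican one; it only says the counts depart from the no-preference expectation.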
More complex
• What do stats kids do with their free time?
           TV   Nap   Worry   Stare at Ceiling
Males      30   40    20      10
Females    20   30    40      10
• Is there a relationship between gender and what the stats kids do with their free time?
• Expected = (Ri*Cj)/N, where Ri = row total, Cj = column total, N = grand total
• Example for males TV: (100*50)/200 = 25
           TV   Nap   Worry   Stare at Ceiling   Total
Males      30   40    20      10                 100
Females    20   30    40      10                 100
Total      50   70    60      20                 200
• df = (R-1)(C-1)
– R = number of rows
– C = number of columns
              TV        Nap       Worry     Stare at Ceiling   Total
Males (E)     30 (25)   40 (35)   20 (30)   10 (10)            100
Females (E)   20 (25)   30 (35)   40 (30)   10 (10)            100
Total         50        70        60        20                 200
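The expected frequencies in parentheses and the degrees of freedom can be reproduced from the observed counts alone. A minimal sketch, assuming the table is stored as a list of rows; `expected_counts` is a hypothetical helper name.

```python
def expected_counts(table):
    """Expected cell counts under independence: (row total * column total) / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Rows: Males, Females. Columns: TV, Nap, Worry, Stare at Ceiling.
observed = [
    [30, 40, 20, 10],
    [20, 30, 40, 10],
]

expected = expected_counts(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)  # (R - 1)(C - 1)

for row in expected:
    print(row)
print("df =", df)
```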
Interpretation
$\chi^2(3) = 10.10$, $\chi^2_{.05} = 7.82$
• Reject H0, since 10.10 > 7.82: there is some relationship between gender and how stats students spend their free time
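The statistic for the two-way table uses the same (O - E)²/E sum, taken over every cell. The sketch below recomputes it from the observed counts; the helper names are hypothetical, and the critical value 7.82 is the one quoted on the slides for 3 degrees of freedom.

```python
def expected_counts(table):
    """Expected cell counts under independence: (row total * column total) / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_independence(table):
    """Sum of (O - E)^2 / E over every cell of the contingency table."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )

# Rows: Males, Females. Columns: TV, Nap, Worry, Stare at Ceiling.
observed = [
    [30, 40, 20, 10],
    [20, 30, 40, 10],
]

statistic = chi_square_independence(observed)
CRITICAL_05_DF3 = 7.82  # critical value taken from the slides

print(f"chi-square(3) = {statistic:.2f}")
print("reject H0" if statistic > CRITICAL_05_DF3 else "retain H0")
```

Most of the total comes from the Worry column, where the observed counts are furthest from the expected 30.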
Other
• An important point about the non-directional nature of the test: the chi-square test by itself cannot speak to specific hypotheses about the way the results would come out
• Because of this, it is not useful for ordinal data, since it ignores the ordering of the categories
Assumptions
• Normality
– Rule of thumb is that we need expected frequencies of at least 5
• Inclusion of non-occurrences
– Must include all responses, not just the positive ones
• Independence
– Not that the variables are independent or related (that's what the test can be used for), but rather, as with our t-tests, that the observations (data points) don't have any bearing on one another
• To help with the last two, make sure that your N equals the total number of people who responded
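Two of these checks are mechanical and can be sketched in code: the rule of thumb on expected frequencies, and the check that the cell counts add up to the number of respondents. The function name and the `min_expected` parameter are hypothetical, and passing both checks does not by itself establish independence of observations, which depends on how the data were collected.

```python
def check_assumptions(observed, expected, n_respondents, min_expected=5):
    """Return a list of warnings; an empty list means neither check failed."""
    warnings = []

    # Rule of thumb: every expected frequency should be at least 5
    smallest = min(e for row in expected for e in row)
    if smallest < min_expected:
        warnings.append(f"smallest expected frequency {smallest} is below {min_expected}")

    # Each person should be counted exactly once, so the cells must sum to N
    total = sum(sum(row) for row in observed)
    if total != n_respondents:
        warnings.append(f"cell counts sum to {total}, but {n_respondents} people responded")

    return warnings

# The free-time table from the earlier example, with its expected frequencies
observed = [[30, 40, 20, 10], [20, 30, 40, 10]]
expected = [[25, 35, 30, 10], [25, 35, 30, 10]]

print(check_assumptions(observed, expected, n_respondents=200))
```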
Measures of Association
• Contingency coefficient
• Phi
• Cramér's phi (also called Cramér's V)
• Odds Ratios
• Kappa
• These were discussed in 5700