Chi-Square Tests
• Categorical data
• 1-sample, compared to theoretical distribution
– Goodness-of-Fit Test
• 2+ samples, 2+ levels of response variable
– Chi-square Test
Slide #1
Chi-square Slide #2
Chi-Square -- Examples
• Do the dominant plants in plots differ between two locations?
• Does the frequency of females differ among majors in the natural sciences, social sciences, and humanities?
• Does the occurrence of a food item in the stomach differ between lake trout and chinook salmon?
Chi-square Slide #3
What do those examples have in common?
• A categorical response variable
– dominant plant in a plot
– sex of student (male or female)
– occurrence of a food item (Y/N)
• Compare response frequencies among 2+ groups
– between two locations
– among three divisions
– between lake trout and chinook salmon
Chi-square Slide #4
An Illustrative Example
• When Chinook Salmon were first introduced to Lake Superior there was concern that they would compete with native Lake Trout for Lake Herring. Preliminarily, fisheries biologists classified the diets of 50 Lake Trout and 40 Chinook Salmon as containing Lake Herring or not. They found 36 Lake Trout and 24 Chinook Salmon contained Lake Herring. Test (at the 10% level) if there is a difference in the proportion of Lake Trout and Chinook Salmon that had Lake Herring.
Chi-square Slide #5
Observed Table
– Recall – “… the diets of 50 Lake Trout and 40 Chinook Salmon … found 36 Lake Trout and 24 Chinook Salmon contained Lake Herring”
             LH   no LH   Total
Lake Trout   36      14      50
Ch. Salmon   24      16      40
Total        60      30      90
Chi-square Slide #6
Observed Table
• If there is no difference between rows (i.e., Ho is true) then the Total row could represent either row.
• Thus, the proportion of predators (regardless of species) that consumed Lake Herring is estimated to be 60/90, or about 0.67.
             LH   no LH   Total
Lake Trout   36      14      50
Ch. Salmon   24      16      40
Total        60      30      90
Chi-square Slide #7
Expectations if Ho is true
• If there is no difference and the common proportion is estimated by 0.67 then how many ….
– LT do we expect to have LH = 50*0.67 = (60*50)/90
– LT … … to not have LH = 50*0.33 = (30*50)/90
– CS … … to have LH = 40*0.67 = (60*40)/90
– CS … … to not have LH = 40*0.33 = (30*40)/90
Chi-square Slide #8
Create Expected Table
• LT to have LH = (60*50)/90 = 33.3
             LH     no LH   Total
Lake Trout   33.3              50
Ch. Salmon                     40
Total        60        30      90
Chi-square Slide #9
Create Expected Table
• LT to NOT have LH = (30*50)/90 = 16.7
             LH     no LH   Total
Lake Trout   33.3    16.7      50
Ch. Salmon   26.7    13.3      40
Total        60      30        90
• Each expected count is the product of its row total and its column total, divided by the table total.
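The rule for expected counts can be checked numerically. The following is a minimal sketch in plain Python, using the observed counts from the Lake Trout and Chinook Salmon example; the helper name `expected_counts` is hypothetical and not taken from the handout.

```python
def expected_counts(observed):
    """Expected table under Ho: (row total * column total) / table total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Rows: Lake Trout, Chinook Salmon; columns: LH, no LH
observed = [[36, 14], [24, 16]]
expected = expected_counts(observed)

for label, row in zip(["Lake Trout", "Ch. Salmon"], expected):
    print(label, [round(x, 1) for x in row])
```

Rounded to one decimal, this reproduces the expected table of 33.3, 16.7, 26.7, and 13.3.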
Chi-Square Tests Slide #10
A New Test Statistic
$$\chi^2 = \sum_{\text{table}} \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$
df = (rows-1)*(cols-1)
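As an illustration of the statistic (the sum over all table cells of (observed - expected)^2 / expected) and its df, here is a short sketch in plain Python applied to the Lake Trout and Chinook Salmon counts. The function name `chi_square_statistic` is an assumption made for this example.

```python
def chi_square_statistic(observed):
    """Return (chi-square statistic, df) for a two-way table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n  # expected count under Ho
            chi2 += (obs - exp) ** 2 / exp
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df

# Rows: Lake Trout, Chinook Salmon; columns: LH, no LH
chi2, df = chi_square_statistic([[36, 14], [24, 16]])
print(round(chi2, 2), df)
```

Note that df depends only on the shape of the table (2 rows by 2 columns gives 1), not on the sample size.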
Chi-Square Tests Slide #11
Chi-Square Distribution
• Right-skewed (all values are positive)
• Less sharply skewed with increasing df
– df are related to the size of the table, not n
• All p-values are “right-ofs” – no “one-tailed” tests with chi-square
• Examine HO – page 1
[Figure: chi-square density curves for Chi(3), Chi(10), and Chi(20); x-axis "Chi-square", 0 to 50]
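Because every chi-square p-value is a "right-of" area, it can be computed as an upper-tail probability. The sketch below uses only the Python standard library and assumes integer df; it starts from the closed forms for df = 1 and df = 2 and applies the standard recurrence Q(x; df + 2) = Q(x; df) + (x/2)^(df/2) e^(-x/2) / Γ(df/2 + 1). The function name `chi2_upper_tail` is hypothetical, and statistical software remains the usual way to obtain these values.

```python
import math

def chi2_upper_tail(x, df):
    """P(X >= x) for X ~ chi-square(df); df a positive integer, x >= 0."""
    if x < 0 or df < 1:
        raise ValueError("need x >= 0 and integer df >= 1")
    half = x / 2.0
    if df % 2 == 0:
        tail, nu = math.exp(-half), 2               # exact for df = 2
    else:
        tail, nu = math.erfc(math.sqrt(half)), 1    # exact for df = 1
    while nu < df:
        # Step from nu to nu + 2 degrees of freedom
        tail += half ** (nu / 2.0) * math.exp(-half) / math.gamma(nu / 2.0 + 1.0)
        nu += 2
    return tail

# Same statistic, different df: the right-of area grows as df increases
for df in (3, 10, 20):
    print(df, round(chi2_upper_tail(10.0, df), 4))
```

The loop shows why the same statistic can be significant for a small table and unremarkable for a large one.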
Chi-square Slide #12
Chi-Square Test
• Ho: “distribution of individuals into the levels is the same for each population”
• HA: “distribution of individuals into levels is different for at least one pair of populations”
• Assume: at least 5 in each cell of expected table
• Statistic: Observed frequency table
• Test Statistic: $\chi^2 = \sum_{\text{table}} \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$
• df: (rows-1)*(columns-1)
• When: categorical variable, 2+ populations/groups
Chi-square Slide #13
A Full Example
• When Chinook Salmon were first introduced to Lake Superior there was concern that they would compete with native Lake Trout for Lake Herring. Preliminarily, fisheries biologists classified the diets of 50 Lake Trout and 40 Chinook Salmon as containing Lake Herring or not. They found 36 Lake Trout and 24 Chinook Salmon contained Lake Herring. Test (at the 10% level) if there is a difference in the proportion of Lake Trout and Chinook Salmon that had Lake Herring.
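One way to work this example end to end is sketched below in plain Python. It is an illustration under stated assumptions, not the handout's solution: the variable names are made up for this sketch, the p-value uses the closed form that holds only for df = 1, and no continuity correction is applied (software that applies one to 2x2 tables will report somewhat different numbers).

```python
import math

# Rows: Lake Trout, Chinook Salmon; columns: LH, no LH
observed = [[36, 14], [24, 16]]
alpha = 0.10

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Assumption check: at least 5 in each cell of the expected table
assumption_ok = all(e >= 5 for row in expected for e in row)

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Upper-tail ("right-of") area; this closed form is valid for df = 1 only
p_value = math.erfc(math.sqrt(chi2 / 2))
reject_ho = p_value < alpha

print(f"chi2 = {chi2:.2f}, df = {df}, p-value = {p_value:.3f}, reject Ho: {reject_ho}")
```

With these counts the sketch gives a statistic of 1.44 on 1 df and a p-value of about 0.23, which is larger than 0.10, so Ho would not be rejected at the 10% level.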
Chi-square Slide #14
Another Full Example
• Modification -- the researchers recorded what the dominant food item was. Do the dominant food items in Lake Trout and Chinook Salmon differ at the 5% level?
• See R HO Page 2.
             LH   smelt   Mysis   Total
Lake Trout   32      10       8      50
Ch. Salmon   18      18       4      40
Total        50      28      12      90
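The same steps extend to the 2x3 dominant-food-item table. The sketch below is a plain-Python illustration only; the R handout remains the reference for this example. The helper name `chi_square_test` is hypothetical, and the p-value uses the closed form exp(-χ²/2), which is valid only because df = 2 here.

```python
import math

def chi_square_test(observed):
    """Return (chi2, df, expected) for a two-way table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df, expected

# Rows: Lake Trout, Chinook Salmon; columns: LH, smelt, Mysis
observed = [[32, 10, 8], [18, 18, 4]]
alpha = 0.05

chi2, df, expected = chi_square_test(observed)
smallest_expected = min(min(row) for row in expected)  # should be at least 5
p_value = math.exp(-chi2 / 2)  # upper-tail area; valid for df = 2 only

print(f"smallest expected count = {smallest_expected:.2f}")
print(f"chi2 = {chi2:.2f}, df = {df}, p-value = {p_value:.4f}")
print("reject Ho" if p_value < alpha else "do not reject Ho")
```

For these counts the statistic is about 6.51 on 2 df with a p-value near 0.039, which falls below the 5% level.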
Chi-Square Tests Slide #15
Examine HO – Page 3
Chi-Square Tests Slide #17
Goodness-of-Fit Test
• Compare observed to theoretical frequencies of individuals in categories.
• Examples –
– Test whether responses are “random” (e.g., preference)
– Test Mendelian genetics (e.g., 3:1 and 9:3:3:1 theories).
– Test use of available resources (e.g., compare habitat usage to availability).
Chi-Square Tests Slide #18
An Illustrative Example
• Determine, at the 10% level, if Northland students prefer the Chris Duarte Group (CDG), Ronnie Baker Brooks (RBB), or Bernard Allison (BA).
• Hypotheses?
• Ha: “different # of students prefer each artist”
• Ho: “same # of students prefer each artist”
Chi-Square Tests Slide #19
An Illustrative Example
• Under Ho, what proportion prefer each artist? 1/3
• If n=78, how many students prefer each artist if Ho is true? 26
Expected Table
Artist   CDG   RBB   BA
Freq      26    26   26
Chi-Square Tests Slide #20
An Illustrative Example
• Suppose these results were obtained:
Observed Table
Artist   CDG   RBB   BA
Freq      24    38   16
• Is there a preference – i.e., are these observations significantly different from what was expected when assuming no preference?
Chi-Square Tests Slide #21
A New Test Statistic
$$\chi^2 = \sum_{\text{table}} \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$
df = cells - 1
Chi-Square Tests Slide #22
An Illustrative Example
Observed Table
Artist   CDG   RBB   BA
#         24    38   16
Expected Table
Artist   CDG   RBB   BA
#         26    26   26
$$\chi^2 = \frac{(24-26)^2}{26} + \frac{(38-26)^2}{26} + \frac{(16-26)^2}{26}$$
$$\chi^2 = 0.15 + 5.54 + 3.85 = 9.54$$
df = (3-1) = 2, p-value = 0.00848
Conclusion?
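The goodness-of-fit arithmetic for the artist example can be reproduced with a few lines of plain Python. This is a sketch under stated assumptions: the variable names are illustrative, and the p-value uses exp(-χ²/2), a closed form that holds only for df = 2.

```python
import math

observed = {"CDG": 24, "RBB": 38, "BA": 16}
n = sum(observed.values())

# Under Ho each artist is preferred by the same proportion, 1/3
expected = {artist: n / len(observed) for artist in observed}

# One (observed - expected)^2 / expected term per cell
terms = {a: (observed[a] - expected[a]) ** 2 / expected[a] for a in observed}
chi2 = sum(terms.values())
df = len(observed) - 1

p_value = math.exp(-chi2 / 2)  # upper-tail area; valid for df = 2 only

print({a: round(t, 2) for a, t in terms.items()})
print(f"chi2 = {chi2:.2f}, df = {df}, p-value = {p_value:.5f}")
```

Carrying the unrounded statistic (248/26) gives a p-value of about 0.00849; the 0.00848 shown above corresponds to using the rounded value 9.54. Either way the p-value is well below 0.10.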
Chi-Square Tests Slide #23
Goodness-of-Fit Test
• Ho: distribution of individuals into levels follows the theoretical distribution
• HA: distribution of individuals into levels does NOT follow the theoretical distribution
• Sample: randomized, single variable of size n
• Assume: at least 5 in each cell of expected table
• Statistic: Observed frequency table
Chi-Square Tests Slide #24
Goodness-of-Fit Test
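• Test Statistic: $\chi^2 = \sum_{\text{table}} \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$
• df: cells-1
• Confidence Region:
– $\hat{p} \pm z^{*}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$, where $\hat{p}$ is the sample proportion in the level of interest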
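A confidence region for a single level, computed as the sample proportion plus or minus z* standard errors, can be sketched as follows. The example is hypothetical in two respects: it picks RBB (38 of 78 students) as the level of interest, and it assumes a 90% confidence level to match the 10% test level used earlier. Neither choice is specified in the slides, and the function name `proportion_ci` is made up for this sketch.

```python
import math
from statistics import NormalDist

def proportion_ci(count, n, conf_level=0.90):
    """Interval p_hat +/- z* * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = count / n
    # z* leaves (1 - conf_level) / 2 in each tail of the standard normal
    z_star = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Assumed level of interest: RBB, preferred by 38 of the 78 students
low, high = proportion_ci(38, 78, conf_level=0.90)
print(round(low, 3), round(high, 3))
```

Changing `conf_level` changes only z*; the standard error term depends on the sample proportion and n alone.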
Chi-Square Tests Slide #25
Examine HO – Page 5