Chi-square test, or χ² test
What if we are interested in seeing if my “crazy” dice are considered “fair”?
What can I do?
Chi-square test
• Used to test the counts of categorical data
• Three types:
  – Goodness of fit (univariate)
  – Independence (bivariate)
  – Homogeneity (univariate with two samples)
Chi-square distributions
[Figure: chi-square density curves for df = 1, 2, 3, 4, 5, 8, 10, and 15, plotted for x from 0 to 25]
Upper-tail Areas for Chi-square Distributions

Right-tail area   df = 1    df = 2    df = 3    df = 4    df = 5
> .100            < 2.70    < 4.60    < 6.25    < 7.77    < 9.23
0.100             2.70      4.60      6.25      7.77      9.23
0.095             2.78      4.70      6.36      7.90      9.37
0.090             2.87      4.81      6.49      8.04      9.52
0.085             2.96      4.93      6.62      8.18      9.67
0.080             3.06      5.05      6.75      8.33      9.83
0.075             3.17      5.18      6.90      8.49      10.00
0.070             3.28      5.31      7.06      8.66      10.19
0.065             3.40      5.46      7.22      8.84      10.38
0.060             3.53      5.62      7.40      9.04      10.59
0.055             3.68      5.80      7.60      9.25      10.82
0.050             3.84      5.99      7.81      9.48      11.07
0.045             4.01      6.20      8.04      9.74      11.34
0.040             4.21      6.43      8.31      10.02     11.64
0.035             4.44      6.70      8.60      10.34     11.98
0.030             4.70      7.01      8.94      10.71     12.37
0.025             5.02      7.37      9.34      11.14     12.83
0.020             5.41      7.82      9.83      11.66     13.38
0.015             5.91      8.39      10.46     12.33     14.09
0.010             6.63      9.21      11.34     13.27     15.08
0.005             7.87      10.59     12.83     14.86     16.74
0.001             10.82     13.81     16.26     18.46     20.51
< .001            > 10.82   > 13.81   > 16.26   > 18.46   > 20.51

Right-tail area   df = 6    df = 7    df = 8    df = 9    df = 10
> .100            < 10.64   < 12.01   < 13.36   < 14.68   < 15.98
0.100             10.64     12.01     13.36     14.68     15.98
0.095             10.79     12.17     13.52     14.85     16.16
0.090             10.94     12.33     13.69     15.03     16.35
0.085             11.11     12.50     13.87     15.22     16.54
0.080             11.28     12.69     14.06     15.42     16.75
0.075             11.46     12.88     14.26     15.63     16.97
0.070             11.65     13.08     14.48     15.85     17.20
0.065             11.86     13.30     14.71     16.09     17.44
0.060             12.08     13.53     14.95     16.34     17.71
0.055             12.32     13.79     15.22     16.62     17.99
0.050             12.59     14.06     15.50     16.91     18.30
0.045             12.87     14.36     15.82     17.24     18.64
0.040             13.19     14.70     16.17     17.60     19.02
0.035             13.55     15.07     16.56     18.01     19.44
0.030             13.96     15.50     17.01     18.47     19.92
0.025             14.44     16.01     17.53     19.02     20.48
0.020             15.03     16.62     18.16     19.67     21.16
0.015             15.77     17.39     18.97     20.51     22.02
0.010             16.81     18.47     20.09     21.66     23.20
0.005             18.54     20.27     21.95     23.58     25.18
0.001             22.45     24.32     26.12     27.87     29.58
< .001            > 22.45   > 24.32   > 26.12   > 27.87   > 29.58
χ² distribution
• Different df have different curves
• Skewed right
• Cannot take on negative values
• As df increases, the curve shifts toward the right and becomes more like a normal curve
• Each curve has a mode at df − 2 (for df ≥ 2) and a mean at df
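The mode and mean properties can be checked numerically. This is a rough sketch using only the Python standard library; the helper names `chi2_pdf`, `approx_mode`, and `approx_mean` are illustrative, the grid search assumes df ≥ 3, and both results are numerical approximations rather than exact values.

```python
import math

def chi2_pdf(x, df):
    """Density of the chi-square distribution with df degrees of freedom."""
    if x <= 0:
        return 0.0
    k = df / 2.0
    log_pdf = (k - 1) * math.log(x) - x / 2.0 - k * math.log(2) - math.lgamma(k)
    return math.exp(log_pdf)

def approx_mode(df, upper=30.0, steps=30000):
    """Grid search for the x that maximizes the density (assumes df >= 3)."""
    xs = [upper * i / steps for i in range(1, steps + 1)]
    return max(xs, key=lambda x: chi2_pdf(x, df))

def approx_mean(df, upper=100.0, steps=50000):
    """Midpoint-rule approximation of the integral of x * f(x) on (0, upper)."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += x * chi2_pdf(x, df) * h
    return total

# The mode should land near df - 2 and the mean near df.
for df in (3, 5, 10):
    print(df, round(approx_mode(df), 2), round(approx_mean(df), 2))
```

The gap between the mode and the mean is one way to see the right skew: the peak sits to the left of the balance point.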
χ² assumptions
• SRS – reasonably random sample
• Have counts of categorical data, and we expect each category to happen at least once
• Sample size – to ensure that the sample size is large enough, we should expect at least five in each category.
***Be sure to list expected counts!!
Combine these together: All expected counts are at least 5.
χ² formula
$$\chi^2 = \sum \frac{(\text{obs} - \text{exp})^2}{\text{exp}} = \sum \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$$
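The formula translates almost directly into code. In this minimal sketch the function name `chi_square_statistic` is illustrative, and the die counts are hypothetical, made up for the example rather than taken from the slides.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts from 60 rolls of a die (not data from the slides).
observed = [8, 12, 9, 11, 6, 14]
expected = [60 / 6] * 6   # a fair die: 10 expected per face

print(round(chi_square_statistic(observed, expected), 2))  # 4.2
```

Dividing each squared difference by its expected count puts large and small categories on a comparable scale before they are added up.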
χ² goodness of fit test
• Uses univariate data (one sample, one variable)
• Want to see how well the observed counts “fit” what we expect the counts to be
• Use the χ²cdf function on the calculator to find p-values
• Based on df: df = number of categories − 1
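For readers without a calculator, the upper-tail area that χ²cdf(statistic, 10^99, df) returns can be approximated in code. This sketch assumes df is a positive integer and uses the closed-form expressions that exist in that case; the names `chi2_upper_tail` and `p_value_goodness_of_fit` are illustrative, not part of any library.

```python
import math

def chi2_upper_tail(x, df):
    """P(X > x) for a chi-square variable with a positive integer df."""
    if df < 1 or df != int(df):
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{k=0}^{df/2 - 1} y^k / k!
        term = math.exp(-y)
        total = term
        for k in range(1, df // 2):
            term *= y / k
            total += term
        return total
    # Odd df: erfc(sqrt(y)) + exp(-y) * sum_j y^(j + 1/2) / Gamma(j + 3/2)
    total = math.erfc(math.sqrt(y))
    term = 2.0 * math.sqrt(y / math.pi) * math.exp(-y)
    for j in range((df - 1) // 2):
        total += term
        term *= y / (j + 1.5)
    return total

def p_value_goodness_of_fit(statistic, num_categories):
    """P-value for a goodness of fit test, with df = number of categories - 1."""
    return chi2_upper_tail(statistic, num_categories - 1)

# Both values should be close to 0.05, matching the 0.050 row of the table.
print(round(chi2_upper_tail(3.84, 1), 3), round(chi2_upper_tail(11.07, 5), 3))
```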
Let’s test our dice!
Hypotheses – written in words
H0: proportions are equal
Ha: at least one proportion is not the same
Be sure to write in context!
Does your zodiac sign determine how successful you will be? Fortune magazine collected the zodiac signs of 256 heads of the largest 400 companies. Is there sufficient evidence to claim that successful people are more likely to be born under some signs than others?
Aries 23       Libra 18         Leo 20
Taurus 20      Scorpio 21       Virgo 19
Gemini 18      Sagittarius 19   Aquarius 24
Cancer 23      Capricorn 22     Pisces 29
How many would you expect in each sign if there were no difference between them?
How many degrees of freedom?
I would expect CEOs to be equally likely to be born under each sign.
So 256/12 = 21.333333 expected in each sign.
Since there are 12 signs, df = 12 – 1 = 11
Assumptions:
• Have a random sample of CEOs
• All expected counts are greater than 5. (I expect 21.33 CEOs to be born under each sign.)
H0: The proportions of CEOs born under each sign are the same.
Ha: At least one of the proportions of CEOs born under each sign is different.
2.) Compute the residuals. (Observed – Expected)
Sign          Observed   Expected (256/12)   Residual = observed - expected
Aries         23         21.333              1.667
Taurus        20         21.333              -1.333
Gemini        18         21.333              -3.333
Cancer        23         21.333              1.667
Leo           20         21.333              -1.333
Virgo         19         21.333              -2.333
Libra         18         21.333              -3.333
Scorpio       21         21.333              -0.333
Sagittarius   19         21.333              -2.333
Capricorn     22         21.333              0.667
Aquarius      24         21.333              2.667
Pisces        29         21.333              7.667
3.) Square the residuals
Sign          Observed   Expected (256/12)   Residual = observed - expected   (Observed - expected)²
Aries         23         21.333              1.667                            2.778889
Taurus        20         21.333              -1.333                           1.776889
Gemini        18         21.333              -3.333                           11.108889
Cancer        23         21.333              1.667                            2.778889
Leo           20         21.333              -1.333                           1.776889
Virgo         19         21.333              -2.333                           5.442889
Libra         18         21.333              -3.333                           11.108889
Scorpio       21         21.333              -0.333                           0.110889
Sagittarius   19         21.333              -2.333                           5.442889
Capricorn     22         21.333              0.667                            0.444889
Aquarius      24         21.333              2.667                            7.112889
Pisces        29         21.333              7.667                            58.782889
4.) Compute the components for each cell
Sign          Observed   Expected (256/12)   Residual = observed - expected   (Observed - expected)²   (Observed - expected)² / expected
Aries         23         21.333              1.667                            2.778889                 0.130262
Taurus        20         21.333              -1.333                           1.776889                 0.083293
Gemini        18         21.333              -3.333                           11.108889                0.520737
Cancer        23         21.333              1.667                            2.778889                 0.130262
Leo           20         21.333              -1.333                           1.776889                 0.083293
Virgo         19         21.333              -2.333                           5.442889                 0.255139
Libra         18         21.333              -3.333                           11.108889                0.520737
Scorpio       21         21.333              -0.333                           0.110889                 0.005198
Sagittarius   19         21.333              -2.333                           5.442889                 0.255139
Capricorn     22         21.333              0.667                            0.444889                 0.020854
Aquarius      24         21.333              2.667                            7.112889                 0.333422
Pisces        29         21.333              7.667                            58.782889                2.755491
5.) Find the sum of the components (that’s the chi-square statistic)
Sign          Observed   Expected (256/12)   Residual = observed - expected   (Observed - expected)²   (Observed - expected)² / expected
Aries         23         21.333              1.667                            2.778889                 0.130262
Taurus        20         21.333              -1.333                           1.776889                 0.083293
Gemini        18         21.333              -3.333                           11.108889                0.520737
Cancer        23         21.333              1.667                            2.778889                 0.130262
Leo           20         21.333              -1.333                           1.776889                 0.083293
Virgo         19         21.333              -2.333                           5.442889                 0.255139
Libra         18         21.333              -3.333                           11.108889                0.520737
Scorpio       21         21.333              -0.333                           0.110889                 0.005198
Sagittarius   19         21.333              -2.333                           5.442889                 0.255139
Capricorn     22         21.333              0.667                            0.444889                 0.020854
Aquarius      24         21.333              2.667                            7.112889                 0.333422
Pisces        29         21.333              7.667                            58.782889                2.755491
Σ = 5.094
P-value = χ²cdf(5.094, 10^99, 11) = .9265
α = .05
Since p-value > α, I fail to reject H0. There is not sufficient evidence to suggest that the CEOs are born under some signs more than under others.
$$\chi^2 = \frac{(23 - 21.3)^2}{21.3} + \frac{(20 - 21.3)^2}{21.3} + \cdots + \frac{(29 - 21.3)^2}{21.3} = 5.094$$
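The zodiac computation can be reproduced in a few lines. This sketch uses the observed counts from the example and keeps the expected count unrounded, so individual components differ slightly in the later decimals from the table, which works from 21.333 and rounded residuals; the variable names are illustrative.

```python
observed = {
    "Aries": 23, "Taurus": 20, "Gemini": 18, "Cancer": 23,
    "Leo": 20, "Virgo": 19, "Libra": 18, "Scorpio": 21,
    "Sagittarius": 19, "Capricorn": 22, "Aquarius": 24, "Pisces": 29,
}

total = sum(observed.values())       # 256 CEOs
expected = total / len(observed)     # 256 / 12, about 21.333 per sign

# Component for each sign: (observed - expected)^2 / expected
components = {}
for sign, count in observed.items():
    residual = count - expected
    components[sign] = residual ** 2 / expected

chi_square = sum(components.values())
df = len(observed) - 1
largest = max(components, key=components.get)

print(f"chi-square = {chi_square:.3f}, df = {df}, largest component: {largest}")
```

Listing the components separately shows which category drives the statistic; here Pisces contributes more than half of the total, yet the total is still small for 11 degrees of freedom.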
Offspring of certain fruit flies may have yellow or ebony bodies and normal wings or short wings. Genetic theory predicts that these traits will appear in the ratio 9:3:3:1 (yellow & normal, yellow & short, ebony & normal, ebony & short). A researcher checks 100 such flies and finds the distribution of traits to be 59, 20, 11, and 10, respectively. What are the expected counts? df?
Are the results consistent with the theoretical distribution predicted by the genetic model? (see next page)
Expected counts:
Y & N = 56.25
Y & S = 18.75
E & N = 18.75
E & S = 6.25
We expect 9/16 of the 100 flies to have yellow bodies and normal wings (Y & N).
Since there are 4 categories, df = 4 – 1 = 3
Assumptions:
•Have a random sample of fruit flies
• All expected counts are greater than 5. Expected counts: Y & N = 56.25, Y & S = 18.75, E & N = 18.75, E & S = 6.25
H0: The proportions of fruit flies are the same as the theoretical model.
Ha: At least one of the proportions of fruit flies is not the same as the theoretical model.
P-value = χ²cdf(5.671, 10^99, 3) = .129
α = .05
Since p-value > α, I fail to reject H0. There is not sufficient evidence to suggest that the distribution of fruit flies is not the same as the theoretical model.
$$\chi^2 = \frac{(59 - 56.25)^2}{56.25} + \frac{(20 - 18.75)^2}{18.75} + \cdots + \frac{(10 - 6.25)^2}{6.25} = 5.671$$
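When the null hypothesis is a ratio rather than equal proportions, the expected counts come from scaling the ratio to the sample size. The sketch below works through the fruit fly numbers; the p-value line uses a closed-form expression that holds only for df = 3, chosen here as a stand-in for the calculator's χ²cdf, and the variable names are illustrative.

```python
import math

# Order: yellow & normal, yellow & short, ebony & normal, ebony & short
ratio = [9, 3, 3, 1]
observed = [59, 20, 11, 10]
n = sum(observed)   # 100 flies

# Expected count for each category: n * (ratio part / ratio total)
expected = [n * r / sum(ratio) for r in ratio]

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

# Upper-tail area, valid for df = 3 only:
# P(X > x) = erfc(sqrt(x / 2)) + sqrt(2x / pi) * exp(-x / 2)
p_value = (math.erfc(math.sqrt(chi_square / 2))
           + math.sqrt(2 * chi_square / math.pi) * math.exp(-chi_square / 2))

print(expected)
print(round(chi_square, 3), df, round(p_value, 3))
```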
A company says its premium mixture of nuts contains 10% Brazil nuts, 20% cashews, 20% almonds, 10% hazelnuts and 40% peanuts. You buy a large can and separate the nuts. Upon weighing them, you find there are 112 g of Brazil nuts, 183 g of cashews, 207 g of almonds, 71 g of hazelnuts, and 446 g of peanuts. You wonder whether your mix is significantly different from what the company advertises.
Why is the chi-square goodness-of-fit test NOT appropriate here?
What might you do instead of weighing the nuts in order to use chi-square?
Because we do NOT have counts of the types of nuts.
We could count the number of each type of nut and then perform a χ² test.
Example: Does the color of a car influence the chance that it will be stolen? Of 830 cars reported stolen, 140 were white, 100 were blue, 270 were red, 230 were black, and 90 were other colors. It is known that 15% of all cars are white, 15% are blue, 35% are red, 30% are black, and 5% are other colors.
Category Color Observed Expected
1 White 140 .15*830 = 124.5
2 Blue 100 .15*830 = 124.5
3 Red 270 .35*830 = 290.5
4 Black 230 .30*830 = 249
5 Other 90 .05*830 = 41.5
Category Color Observed Expected
1 White 140 124.5
2 Blue 100 124.5
3 Red 270 290.5
4 Black 230 249
5 Other 90 41.5
Let π1, π2, ..., π5 denote the true proportions of stolen cars that fall into the 5 color categories.
H0: π1 = .15, π2 = .15, π3 = .35, π4 = .30, π5 = .05
Ha: H0 is not true.
α = .01
Test statistic:
$$\chi^2 = \sum \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$$
Assumptions: The sample was a random sample of stolen cars. All expected counts are greater than 5, so the sample size is large enough to use the chi-square test.
Calculations:
$$\chi^2 = \frac{(140 - 124.5)^2}{124.5} + \frac{(100 - 124.5)^2}{124.5} + \frac{(270 - 290.5)^2}{290.5} + \frac{(230 - 249.0)^2}{249.0} + \frac{(90 - 41.5)^2}{41.5}$$
$$= 1.93 + 4.82 + 1.45 + 1.45 + 56.68 = 66.33$$
P-value: All expected counts exceed 5, so the P-value can be based on a chi-square distribution with 4 df. The computed value is larger than 18.46, so P-value < .001.
Because P-value < α, H0 is rejected. There is convincing evidence that at least one of the color proportions for stolen cars differs from the corresponding proportion for all cars.
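The whole stolen-car test can be traced in one short script. This is a sketch under stated assumptions: the variable names are illustrative, the critical value 18.46 is read from the df = 4 column of the upper-tail table (area 0.001), and the p-value line uses a closed form that holds only for df = 4.

```python
import math

colors = ["White", "Blue", "Red", "Black", "Other"]
observed = [140, 100, 270, 230, 90]
population_share = [0.15, 0.15, 0.35, 0.30, 0.05]
alpha = 0.01

n = sum(observed)   # 830 stolen cars
expected = [n * p for p in population_share]

# Sample-size condition: every expected count should be at least 5.
counts_ok = all(e >= 5 for e in expected)

components = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(components)
df = len(observed) - 1

# Table-based conclusion: a statistic above 18.46 means P-value < .001 for df = 4.
beyond_table = chi_square > 18.46

# Upper-tail area, valid for df = 4 only: P(X > x) = exp(-x / 2) * (1 + x / 2)
p_value = math.exp(-chi_square / 2) * (1 + chi_square / 2)

for color, comp in zip(colors, components):
    print(f"{color:<6} {comp:6.2f}")
print(f"chi-square = {chi_square:.2f}, df = {df}, reject H0: {p_value < alpha}")
```

The per-color output makes it clear that the "Other" category accounts for most of the statistic.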