Datasets with counts

150 people were surveyed about their soft drink preferences. They were asked, “Do you prefer Coke, Pepsi, or Sprite?” 50 people said “Coke”, 70 people said “Pepsi”, and 30 people said “Sprite”. Test the hypothesis that people differ in their soda preferences.

Solution:

H0: $p_{pepsi} = p_{coke} = p_{sprite} = 1/3$

HA: not all p’s are equal

Use a one-way chi-square test.

Pepsi Coke Sprite

70 50 30
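As a quick check on this problem, the statistic can be computed directly. This is a minimal Python sketch. It assumes equal expected frequencies under H0 (150/3 = 50 per brand), and `one_way_chi_square` is a hypothetical helper name, not part of any library.

```python
def one_way_chi_square(observed, expected):
    """Sum the relative squared discrepancies (fo - fe)^2 / fe across categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


# Observed counts from the survey, in the order Pepsi, Coke, Sprite
observed = [70, 50, 30]
n = sum(observed)                         # 150 respondents
k = len(observed)                         # 3 categories
expected = [n / k] * k                    # H0: each p = 1/3, so fe = 50 for every brand
chi2 = one_way_chi_square(observed, expected)
df = k - 1
print(f"chi2 = {chi2:.2f}, df = {df}")    # chi2 = 16.00, df = 2
```

With df = 2, a value of 16.00 is well above the critical value of 5.99 used for df = 2 (α = .05) in these slides. Under these assumptions, the null hypothesis of equal preference would be rejected.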

One-way Chi-Square Test (χ²)

• Used when your dependent variable is counts within categories (# Pepsi lovers, # Coke lovers, # Sprite lovers)

• Used when your DV has two or more mutually exclusive categories

• Compares the counts you got in your sample to those you would expect under the null hypothesis

• Also called the Chi-Square “Goodness of Fit” test.

One-way χ² example

Which power would you rather have: flight, invisibility, or x-ray vision?

Flight Invisibility X-ray vision

18 people 14 people 10 people

Is this difference significant, or is it just due to chance?

One-way χ² example

H0: $p_{fly} = p_{invis} = p_{xray} = 1/3$

HA: not all p’s are equal

Flight | Invisibility | X-ray vision
f_o = 18 | f_o = 14 | f_o = 10

Step 1: Write hypotheses

Step 2: Write the observed frequencies, and also the frequencies that would be expected under the null hypothesis

N=42

One-way χ² example

Flight | Invisibility | X-ray vision
f_o = 18, f_e = 14 | f_o = 14, f_e = 14 | f_o = 10, f_e = 14

H0: $p_{fly} = p_{invis} = p_{xray} = 1/3$

HA: not all p’s are equal

Step 1: Write hypotheses

Step 2: Write the observed frequencies, and also the frequencies that would be expected under the null hypothesis

N=42

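One-way χ² example

Flight | Invisibility | X-ray vision
f_o = 18, f_e = 14 | f_o = 14, f_e = 14 | f_o = 10, f_e = 14

N = 42

Step 3: Compute the relative squared discrepancies

Flight: $\frac{(f_o - f_e)^2}{f_e} = \frac{(18 - 14)^2}{14} = 1.143$

Invisibility: $\frac{(f_o - f_e)^2}{f_e} = \frac{(14 - 14)^2}{14} = 0$

X-ray vision: $\frac{(f_o - f_e)^2}{f_e} = \frac{(10 - 14)^2}{14} = 1.143$

And sum them up:

$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 1.143 + 0 + 1.143 = 2.286$

$df = k - 1 = 2$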
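Steps 2 and 3 can be sketched in Python as follows. The sketch assumes equal null proportions (N/k = 42/3 = 14 per category), prints each relative squared discrepancy, and then sums them. The variable names are illustrative only.

```python
observed = {"Flight": 18, "Invisibility": 14, "X-ray vision": 10}
n = sum(observed.values())   # N = 42
k = len(observed)            # 3 categories
fe = n / k                   # expected frequency under H0: 14.0 for every category

# Step 3: relative squared discrepancy (fo - fe)^2 / fe for each category
terms = {name: (fo - fe) ** 2 / fe for name, fo in observed.items()}
for name, term in terms.items():
    print(f"{name}: {term:.3f}")   # Flight: 1.143, Invisibility: 0.000, X-ray vision: 1.143

chi2 = sum(terms.values())
df = k - 1
print(f"chi2 = {chi2:.3f}, df = {df}")   # chi2 = 2.286, df = 2
```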

One-way χ² example

Flight | Invisibility | X-ray vision
f_o = 18, f_e = 14 | f_o = 14, f_e = 14 | f_o = 10, f_e = 14

N = 42

$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 1.143 + 0 + 1.143 = 2.286$, $df = 2$

Step 4: Compare to the critical value of χ²

$\chi^2_{crit} = 5.99$ (df = 2, α = .05). Since 2.286 < 5.99: Retain null!

Calculating one-way χ²

Steps:

1) State hypotheses

2) Write observed and expected frequencies

3) Get χ² by summing up relative squared deviations

4) Use Table I to get critical χ²
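The four steps can be strung together in a small Python sketch. The critical values below are the standard α = .05 entries of a χ² table for df = 1 to 4 (3.84, 5.99, 7.81, 9.49). The function name `one_way_test` and its return format are hypothetical. A real analysis would look up other df or α levels in a full table or a statistics library.

```python
# Critical chi-square values at alpha = .05, copied from a standard chi-square table
CRITICAL_05 = {1: 3.84, 2: 5.99, 3: 7.81, 4: 9.49}


def one_way_test(observed, proportions=None):
    """Steps 2-4 of a one-way chi-square test.

    Step 1 (the hypotheses) is encoded by `proportions`: None means H0 says
    every category is equally likely.
    """
    k = len(observed)
    n = sum(observed)
    if proportions is None:
        proportions = [1 / k] * k
    # Step 2: expected frequencies under H0
    expected = [p * n for p in proportions]
    # Step 3: sum of relative squared deviations
    chi2 = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
    # Step 4: compare with the tabled critical value for df = k - 1
    df = k - 1
    if df not in CRITICAL_05:
        raise ValueError(f"no tabled critical value for df = {df} in this sketch")
    critical = CRITICAL_05[df]
    decision = "Reject null!" if chi2 > critical else "Retain null!"
    return chi2, df, critical, decision


chi2, df, critical, decision = one_way_test([18, 14, 10])   # superpower example
print(f"chi2 = {chi2:.3f}, df = {df}, critical = {critical}, {decision}")
# chi2 = 2.286, df = 2, critical = 5.99, Retain null!
```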

Practice

Suppose we ask 200 randomly selected people if they think that voting should be made compulsory. The data come out like this:

No | Yes
f_o = 84 | f_o = 116

Is there evidence for a clear preference?

$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$

Practice

Suppose we ask 200 randomly selected people if they think that voting should be made compulsory. The data come out like this:

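No | Yes
f_o = 84, f_e = 100 | f_o = 116, f_e = 100

$\frac{(84 - 100)^2}{100} = 2.56$, $\frac{(116 - 100)^2}{100} = 2.56$

$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 2.56 + 2.56 = 5.12$, $df = 1$

$\chi^2_{crit} = 3.84$ (df = 1, α = .05) Reject null!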
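A minimal Python check of this practice problem. It assumes a 50/50 split under H0 (f_e = 200/2 = 100) and the tabled critical value of 3.84 for df = 1 at α = .05.

```python
observed = {"No": 84, "Yes": 116}
n = sum(observed.values())               # 200 respondents
fe = n / len(observed)                   # 100.0 expected in each cell under H0

chi2 = sum((fo - fe) ** 2 / fe for fo in observed.values())
df = len(observed) - 1                   # 1
critical = 3.84                          # chi-square table, df = 1, alpha = .05
print(f"chi2 = {chi2:.2f}, df = {df}")   # chi2 = 5.12, df = 1
print("Reject null!" if chi2 > critical else "Retain null!")   # Reject null!
```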

Other null hypotheses

• We tested H0 that all cell frequencies are equal

• But we can test any expected frequencies

• Example: political affiliation among psych grad students:

Democrat | Republican | Independent
9 | 5 | 18

• Political affiliation in the U.S. (Gallup):

Democrat | Republican | Independent
46% | 43% | 11%

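Other null hypotheses

Is the distribution for psych grad students different from the distribution for the U.S.?

If not, then 46% of the 32 students would be Democrats, 43% would be Republicans, and 11% would be Independents ($f_e$ = .46(32) ≈ 14.7, .43(32) ≈ 13.8, .11(32) ≈ 3.5).

Democrat | Republican | Independent
f_o = 9, f_e = 14.7 | f_o = 5, f_e = 13.8 | f_o = 18, f_e = 3.5

N = 32

$\frac{(9 - 14.7)^2}{14.7} = 2.21$, $\frac{(5 - 13.8)^2}{13.8} = 5.61$, $\frac{(18 - 3.5)^2}{3.5} = 60.07$

$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 2.21 + 5.61 + 60.07 = 67.89$, $df = k - 1 = 2$

$\chi^2_{crit} = 5.99$ (df = 2, α = .05) Reject null!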
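When the null proportions are not equal, only Step 2 changes: each expected frequency becomes the null proportion times N. Here is a minimal Python sketch using the Gallup percentages above. The helper name `goodness_of_fit` is hypothetical.

```python
def goodness_of_fit(observed, proportions):
    """One-way chi-square against arbitrary null proportions.

    Returns (chi2, expected) where expected[i] = proportions[i] * N.
    """
    if len(observed) != len(proportions):
        raise ValueError("need one null proportion per category")
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("null proportions must sum to 1")
    n = sum(observed)
    expected = [p * n for p in proportions]
    chi2 = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
    return chi2, expected


# Psych grad students: Democrat, Republican, Independent
observed = [9, 5, 18]
gallup = [0.46, 0.43, 0.11]          # U.S. proportions assumed under H0
chi2, expected = goodness_of_fit(observed, gallup)
df = len(observed) - 1               # k - 1 = 2
print([round(fe, 2) for fe in expected])   # [14.72, 13.76, 3.52]
print(f"chi2 = {chi2:.2f}, df = {df}")     # chi2 = 67.37, df = 2
```

This sketch keeps the expected frequencies unrounded (14.72, 13.76, 3.52), so its χ² differs slightly from a hand calculation that uses expected frequencies rounded to one decimal. The conclusion, rejecting H0 at df = 2, is the same.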

Points of interest about χ²

1. χ² cannot be negative

2. χ² will be zero only if each observed frequency exactly equals the expected frequency

3. The larger the discrepancies, the larger χ²

4. The greater the number of groups, the larger χ² tends to be. That’s why the χ² distribution is a family of curves with df = k - 1.

Two-Factor Chi-Square

A 1999 New Jersey poll sampled people’s opinions concerning the use of the death penalty for murder when given the option of life in prison instead. 800 people were polled, and the number of men and women supporting each penalty was tabulated.

Preferred Penalty | Death Penalty | Life in Prison | No Opinion
Female | 151 | 179 | 80
Male | 201 | 117 | 72

Contingency table: shows the contingency between two variables. Are these two variables (gender, penalty preference) independent?

Two-Factor Chi-Square Test

• Used to test whether two nominal variables are independent or related

• E.g. Is gender related to socio-economic class?

• Compares the observed frequencies to the frequencies expected if the variables were independent

• Also called a chi-square test of independence

• Fundamentally, it tests: “Do these variables interact?”

Example

A 1999 New Jersey poll sampled people’s opinions concerning the use of the death penalty for murder when given the option of life in prison instead. 800 people were polled, and the number of men and women supporting each penalty was tabulated.

Preferred Penalty | Death Penalty | Life in Prison | No Opinion
Female | 151 | 179 | 80
Male | 201 | 117 | 72

H0: distribution of female preferences matches distribution of male preferences

HA: female proportions do not match male proportions

Example

Preferred Penalty | Death Penalty | Life in Prison | No Opinion
Female | f_o = 151, f_e = ___ | f_o = 179, f_e = ___ | f_o = 80, f_e = ___
Male | f_o = 201, f_e = ___ | f_o = 117, f_e = ___ | f_o = 72, f_e = ___

Example

Preferred Penalty | Death Penalty | Life in Prison | No Opinion
Female | f_o = 151, f_e = 133.3? | f_o = 179, f_e = 133.3? | f_o = 80, f_e = 133.3?
Male | f_o = 201, f_e = 133.3? | f_o = 117, f_e = 133.3? | f_o = 72, f_e = 133.3?

WRONG: spreading the 800 people evenly over the 6 cells (800/6 ≈ 133.3) assumes an equal number of men and women and an equal preference for each penalty (i.e., no main effects).

We are willing to let there be main effects. We just want to test whether the distribution of preferences is the same for men and women (i.e., no interaction effect).

Example

Preferred Penalty | Death Penalty | Life in Prison | No Opinion
Female | f_o = 151, f_e = ___ | f_o = 179, f_e = ___ | f_o = 80, f_e = ___
Male | f_o = 201, f_e = ___ | f_o = 117, f_e = ___ | f_o = 72, f_e = ___

We need to look at the marginal totals to get our expected frequencies

Example

Preferred Penalty | Death Penalty | Life in Prison | No Opinion | f_row
Female | f_o = 151, f_e = ___ | f_o = 179, f_e = ___ | f_o = 80, f_e = ___ | 410
Male | f_o = 201, f_e = ___ | f_o = 117, f_e = ___ | f_o = 72, f_e = ___ | 390
f_col | 352 | 296 | 152 | n = 800

Example

Preferred Penalty | Death Penalty | Life in Prison | No Opinion | f_row
Female | f_o = 151, f_e = ___ | f_o = 179, f_e = ___ | f_o = 80, f_e = ___ | 410
Male | f_o = 201, f_e = ___ | f_o = 117, f_e = ___ | f_o = 72, f_e = ___ | 390
f_col | 352 (p_death = .44) | 296 (p_life = .37) | 152 (p_none = .19) | n = 800

Example

Preferred Penalty | Death Penalty | Life in Prison | No Opinion | f_row
Female | f_o = 151, f_e = .44(410) | f_o = 179, f_e = .37(410) | f_o = 80, f_e = .19(410) | 410
Male | f_o = 201, f_e = .44(390) | f_o = 117, f_e = .37(390) | f_o = 72, f_e = .19(390) | 390
f_col | 352 (p_death = .44) | 296 (p_life = .37) | 152 (p_none = .19) | n = 800

Example

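Preferred Penalty | Death Penalty | Life in Prison | No Opinion | f_row
Female | f_o = 151, f_e = 180.4 | f_o = 179, f_e = 151.7 | f_o = 80, f_e = 77.9 | 410
Male | f_o = 201, f_e = 171.6 | f_o = 117, f_e = 144.3 | f_o = 72, f_e = 74.1 | 390
f_col | 352 (p_death = .44) | 296 (p_life = .37) | 152 (p_none = .19) | n = 800

$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 4.79 + 4.91 + 0.06 + 5.04 + 5.16 + 0.06 = 20.02$

$df = (k_{gender} - 1)(k_{preference} - 1) = (2 - 1)(3 - 1) = 2$, $\chi^2_{crit} = 5.99$ (α = .05) Reject null!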
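The marginal-total method on these slides (column proportion times row total) can be sketched in Python. The variable names are illustrative, and the result should match the hand calculation up to rounding.

```python
# Observed counts: rows = gender, columns = preferred penalty
penalties = ["Death Penalty", "Life in Prison", "No Opinion"]
observed = {
    "Female": [151, 179, 80],
    "Male": [201, 117, 72],
}

row_totals = {g: sum(counts) for g, counts in observed.items()}   # 410, 390
col_totals = [sum(col) for col in zip(*observed.values())]        # 352, 296, 152
n = sum(row_totals.values())                                      # 800
col_props = [c / n for c in col_totals]                           # .44, .37, .19

chi2 = 0.0
for gender, counts in observed.items():
    for fo, p in zip(counts, col_props):
        fe = p * row_totals[gender]      # e.g. .44(410) = 180.4
        chi2 += (fo - fe) ** 2 / fe

df = (len(observed) - 1) * (len(penalties) - 1)   # (2 - 1)(3 - 1) = 2
print(f"chi2 = {chi2:.2f}, df = {df}")             # chi2 = 20.02, df = 2
```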

Calculating two-way χ²

Steps:

1) State hypotheses

2) Get expected frequencies

3) Get χ² by summing up relative squared deviations

4) Use table to get critical χ²

$f_e = \frac{(f_{col})(f_{row})}{n}$

Practice

Suppose we want to determine whether there is any relationship between level of education and the medium through which one follows current events. We ask a random sample of high school graduates and a random sample of college graduates whether they keep up with the news mostly by reading the paper, listening to the radio, or watching television.

Education | radio | paper | TV
HS | 10 | 29 | 61
college | 24 | 44 | 32

$f_e = \frac{(f_{col})(f_{row})}{n}$

$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$

Practice

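Education | radio | paper | TV | f_row
HS | f_o = 10, f_e = 17 | f_o = 29, f_e = 36.5 | f_o = 61, f_e = 46.5 | 100
college | f_o = 24, f_e = 17 | f_o = 44, f_e = 36.5 | f_o = 32, f_e = 46.5 | 100
f_col | 34 (p_radio = .17) | 73 (p_paper = .365) | 93 (p_TV = .465) | N = 200

$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 17.89$

$df = (2 - 1)(3 - 1) = 2$, $\chi^2_{crit} = 5.99$ (α = .05) Reject null!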
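Here is a minimal generic sketch of the two-way procedure. It applies the formula f_e = (f_col)(f_row)/n to every cell. `chi_square_independence` is a hypothetical helper name, and the sketch assumes the table is a list of rows with no zero marginal totals.

```python
def chi_square_independence(table):
    """Chi-square test of independence for a table given as a list of rows.

    Expected frequency for each cell: fe = (f_col * f_row) / n.
    Returns (chi2, df).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, fo in enumerate(row):
            fe = col_totals[j] * row_totals[i] / n
            chi2 += (fo - fe) ** 2 / fe
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, df


# Rows: HS, college; columns: radio, paper, TV
news = [[10, 29, 61], [24, 44, 32]]
chi2, df = chi_square_independence(news)
print(f"chi2 = {chi2:.2f}, df = {df}")   # chi2 = 17.89, df = 2
```

Since 17.89 exceeds the critical value of 5.99 for df = 2 (α = .05), education level and news medium would appear to be related. The same function applied to the death penalty table returns about 20.02.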

Assumptions of Chi-Square Test

1. Categories are mutually exclusive

– A subject cannot be counted in more than one cell

2. Expected frequency in each cell must be

– at least 10 when kA and kB are less than or equal to 2

– at least 5 when kA or kB is greater than 2 (e.g., a 2x3 design)

– N must be sufficiently large to ensure that this is true
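The expected-frequency rule above can be checked mechanically. This Python sketch encodes the thresholds as stated on this slide: 10 when both factors have at most 2 levels, otherwise 5. It treats a one-way design as a single factor (k_b = 1). The function name `min_expected_ok` and that one-way treatment are assumptions made for illustration.

```python
def min_expected_ok(expected, k_a, k_b=1):
    """Check the slide's rule of thumb for minimum expected cell frequencies.

    expected: flat list of expected frequencies, one per cell
    k_a, k_b: number of levels of each factor (k_b = 1 for a one-way test, an assumption)
    Returns (passes, threshold).
    """
    threshold = 10 if (k_a <= 2 and k_b <= 2) else 5
    return all(fe >= threshold for fe in expected), threshold


# Political affiliation example: fe = .46(32), .43(32), .11(32)
print(min_expected_ok([14.72, 13.76, 3.52], k_a=3))   # (False, 5)
# Death penalty example (2x3): smallest expected frequency is 74.1
print(min_expected_ok([180.4, 151.7, 77.9, 171.6, 144.3, 74.1], k_a=2, k_b=3))   # (True, 5)
```

Under this rule, the political affiliation example fails the requirement because its Independent cell expects only about 3.5 students. Its χ² result should therefore be interpreted cautiously.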