
BRM Chi Square Test


Chi-Square Test

Introduction

• The chi-square test is a non-parametric test.

• It makes no assumption about the distribution of the population.

• It uses data at the nominal scale.

Chi-Square Test

In the chi-square test we compare the observed frequency of some observation with the expected frequency. The comparison of observed and expected frequencies is used to calculate the value of the chi-square statistic.

The symbol for chi-square is χ², and the formula is:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where

• O is the observed frequency, and

• E is the expected frequency.

How would you show whether absenteeism is distributed evenly across the week?

The HR Manager at Georgetown Paper Ltd. is concerned about absenteeism among workers. He decides to sample the company records to determine whether absenteeism is distributed evenly throughout the six-day work week.

Week Day    Number Absent
Mon         12
Tue         9
Wed         11
Thu         10
Fri         9
Sat         9
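The absenteeism question can be worked through with a few lines of Python. This is a minimal sketch, not part of the original slides: it assumes an alpha level of .05 (the slides do not state one for this example) and uses the tabled critical value of 11.070 for 5 degrees of freedom. The variable names are illustrative.

```python
# Observed absences per weekday, taken from the table above.
observed = {"Mon": 12, "Tue": 9, "Wed": 11, "Thu": 10, "Fri": 9, "Sat": 9}

total = sum(observed.values())        # 60 absences in the sample
expected = total / len(observed)      # 10 per day if absences are spread evenly

# Chi-square statistic: sum of (O - E)^2 / E over all six days.
chi_square = sum((o - expected) ** 2 / expected for o in observed.values())

df = len(observed) - 1                # 6 categories - 1 = 5
critical_value = 11.070               # assumed: tabled value for df = 5, alpha = .05

print(f"chi-square = {chi_square:.3f}, df = {df}")
print("reject H0" if chi_square >= critical_value else "fail to reject H0")
```

Under these assumptions the statistic is far below the critical value, so the sample gives no evidence that absenteeism differs by day of the week.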


Chi-Square Test (goodness-of-fit test)

Suppose students are required to purchase a computer for college use and may select an Apple computer, an IBM computer, or some other brand. We want to know whether there is a significant difference among the frequencies with which these three brands are selected, or whether the students select equally among the three brands. The data for 100 students are recorded in the table.

Computer    No. of people preferred
IBM         47
Apple       36
Other       17

Frequency Table

Frequency with which students select computer brand

Computer    Observed Frequency    Expected Frequency    (O - E)²/E
IBM         47                    33.333                5.604
Apple       36                    33.333                0.213
Other       17                    33.333                8.003
Total (chi-square)                                      13.820

From the table we can see that χ² = 13.820.

Equal expected frequencies

This table also indicates the expected frequency for each category. Since there are 100 observations and three categories (Apple, IBM, and Other), the expected frequency for each category is 100/3, or 33.333.
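The table values can be checked with a short Python sketch. The helper name `chi_square_statistic` is hypothetical and introduced only for illustration. It applies the formula (O - E)²/E to each category and sums the results.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over paired frequencies."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [47, 36, 17]                    # IBM, Apple, Other
n = sum(observed)                          # 100 students
k = len(observed)                          # 3 categories
expected = [n / k] * k                     # equal expected frequencies: 100/3 each

contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = chi_square_statistic(observed, expected)

for brand, c in zip(["IBM", "Apple", "Other"], contributions):
    print(f"{brand}: {c:.3f}")
print(f"chi-square = {chi_square:.3f}")
```

The per-category contributions and their total should agree with the table up to rounding.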


Critical Value for Chi-Square

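The degrees of freedom

The degrees of freedom determine which chi-square distribution is used to look up the critical value. For a goodness-of-fit test:

Degrees of freedom (df) = (C - 1), where C is the number of categories.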

Chi-Square Test (goodness-of-fit test)

Null hypothesis:

• There are no differences between the observed and the expected frequencies.

Alternate hypothesis:

• There are significant differences between the observed and the expected frequencies.

Set the alpha level.

• The alpha level is .05.

• Find the critical value for the degrees of freedom and alpha level:

df = C - 1 = (3 - 1) = 2

For df = 2 and alpha = .05, the critical value is 5.991.
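The tabled value 5.991 can be reproduced without a table in the special case df = 2. This sketch relies on the fact that, for 2 degrees of freedom, the upper-tail probability of the chi-square distribution is exp(-x/2). The function name is hypothetical, and the shortcut does not apply to other degrees of freedom.

```python
import math

def chi_square_critical_df2(alpha):
    """Critical value of the chi-square distribution with 2 degrees of freedom.

    For df = 2 the upper-tail probability is P(X > x) = exp(-x / 2),
    so solving exp(-x / 2) = alpha gives x = -2 * ln(alpha).
    This closed form is valid only for df = 2.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must be strictly between 0 and 1")
    return -2.0 * math.log(alpha)

critical_value = chi_square_critical_df2(0.05)
print(round(critical_value, 3))  # 5.991
```

For other degrees of freedom, a printed chi-square table or a statistics library is the usual route.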


• Write the decision rule for rejecting the null hypothesis.

Reject H0 if χ² ≥ 5.991.

Chi-Square Test (goodness-of-fit test)

Write a summary statement based on the decision.

Since our calculated value of χ² (13.820) is greater than 5.991, we reject the null hypothesis in favor of the alternative hypothesis: the three brands are not selected equally often.

Chi-Square Test (goodness-of-fit test) with unequal expected frequencies

In a national study, students required to buy computers for college use bought IBM computers 50% of the time, Apple computers 25% of the time, and other computers 25% of the time. A survey of 100 new students shows that 36 bought Apple computers, 47 bought IBM computers, and 17 bought some other brand of computer. We want to know whether these purchase frequencies are similar to or different from the national study data.


• Null hypothesis: there are no differences between the observed and the expected frequencies.

• Alternate hypothesis: there are significant differences between the observed and the expected frequencies.

• Set the alpha level: the alpha level is .05.

Chi-Square Test (goodness-of-fit test)

Computer    Observed Frequency    Expected Frequency    (O - E)²/E
IBM         47                    50                    0.18
Apple       36                    25                    4.84
Other       17                    25                    2.56
Total (chi-square)                                      7.58

From the table we can see that χ² = 7.58.

• Find the critical value for the degrees of freedom and alpha level:

df = (C - 1) = (3 - 1) = 2

For df = 2 and alpha = .05, the critical value is 5.991.

• Write the decision rule for rejecting the null hypothesis.

Reject H0 if χ² ≥ 5.991.


Write a summary statement based on the decision.

Since our calculated value of χ² (7.58) is greater than 5.991, we reject the null hypothesis in favor of the alternative hypothesis: the purchase frequencies in the survey differ from the national study data.
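The unequal-expected-frequency case differs from the equal case only in how the expected counts are built: each is the national proportion multiplied by the sample size. The sketch below illustrates this with the survey data. The dictionary names are illustrative, and the critical value 5.991 is taken from the table for df = 2 and alpha = .05.

```python
observed = {"IBM": 47, "Apple": 36, "Other": 17}
national_share = {"IBM": 0.50, "Apple": 0.25, "Other": 0.25}

n = sum(observed.values())                                   # 100 students
# Expected count = national proportion x sample size.
expected = {brand: share * n for brand, share in national_share.items()}

# Contribution of each brand to chi-square: (O - E)^2 / E.
contributions = {
    brand: (observed[brand] - expected[brand]) ** 2 / expected[brand]
    for brand in observed
}
chi_square = sum(contributions.values())

df = len(observed) - 1                                       # 3 - 1 = 2
critical_value = 5.991                                       # tabled: df = 2, alpha = .05
reject_h0 = chi_square >= critical_value

print(f"chi-square = {chi_square:.2f}, df = {df}, reject H0: {reject_h0}")
```

Most of the statistic comes from the Apple category, where 36 purchases were observed against 25 expected.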

Chi-Square Test: Test of Independence

We want to know whether there is a significant difference between the frequencies with which males and females come from small, medium, or large cities. The two variables we are considering here are hometown size (small, medium, or large) and gender (male or female). Another way of putting our research question is: Is gender independent of size of hometown?


The data for 30 females and 6 males are in the following table.

Frequency with which males and females come from small, medium, and large cities

          Small    Medium    Large    Totals
Female    10       14        6        30
Male      4        1         1        6
Totals    14       15        7        36

The formula for chi-square:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O is the observed frequency and E is the expected frequency.

• The degrees of freedom for the two-dimensional chi-square statistic are:

df = (C - 1)(R - 1)

where C is the number of columns (levels of the first variable) and R is the number of rows (levels of the second variable).

• The table above gives the observed frequencies. Now we must calculate the expected frequency for each of the six cells. For the two-variable chi-square we find the expected frequencies with the formula:

Expected Frequency for a Cell = (Column Total × Row Total) / Grand Total


• In the table above, the column totals are 14 (small), 15 (medium), and 7 (large), while the row totals are 30 (female) and 6 (male). The grand total is 36.

• Using the formula, we can find the expected frequency for each cell:

• Small female cell: 14 × 30 / 36 = 11.667

• Medium female cell: 15 × 30 / 36 = 12.500

• Large female cell: 7 × 30 / 36 = 5.833

• Small male cell: 14 × 6 / 36 = 2.333

• Medium male cell: 15 × 6 / 36 = 2.500

• Large male cell: 7 × 6 / 36 = 1.167
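The six expected frequencies follow mechanically from the row and column totals, so they are easy to generate in code. A minimal sketch, with illustrative variable names:

```python
# Observed counts: rows are Female, Male; columns are Small, Medium, Large.
observed = [
    [10, 14, 6],
    [4, 1, 1],
]

row_totals = [sum(row) for row in observed]            # [30, 6]
col_totals = [sum(col) for col in zip(*observed)]      # [14, 15, 7]
grand_total = sum(row_totals)                          # 36

# Expected frequency for a cell = column total x row total / grand total.
expected = [
    [r * c / grand_total for c in col_totals]
    for r in row_totals
]

for label, row in zip(["Female", "Male"], expected):
    print(label, [round(e, 3) for e in row])
```

Each row of expected frequencies sums to its row total, and each column to its column total, which is a quick sanity check on the arithmetic.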

We can put these expected frequencies in our table and also include the values for (O - E)²/E. The sum of all these values is the value of chi-square.

Observed frequencies, expected frequencies, and (O - E)²/E for males and females from small, medium, and large cities

Gender    Hometown    Observed    Expected    (O - E)²/E
Female    Small       10          11.667      0.238
Female    Medium      14          12.500      0.180
Female    Large       6           5.833       0.005
Male      Small       4           2.333       1.191
Male      Medium      1           2.500       0.900
Male      Large       1           1.167       0.024
Totals                36          36.000      2.538

From the table we can see that χ² = 2.538

and df = (C - 1)(R - 1) = (3 - 1)(2 - 1) = (2)(1) = 2
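The whole calculation for the test of independence can be sketched as one small function. The name `chi_square_independence` is hypothetical. Note that computing with unrounded expected frequencies gives approximately 2.537, while summing the rounded cell values from the table gives 2.538. The difference is rounding only.

```python
def chi_square_independence(table):
    """Return (chi-square, df) for a two-way table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi_square = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total   # expected count
            chi_square += (o - e) ** 2 / e

    df = (len(table) - 1) * (len(table[0]) - 1)               # (R - 1)(C - 1)
    return chi_square, df

# Rows: Female, Male; columns: Small, Medium, Large.
chi_square, df = chi_square_independence([[10, 14, 6], [4, 1, 1]])
print(f"chi-square = {chi_square:.3f}, df = {df}")
```

The same function works for any table with at least two rows and two columns, provided no row or column total is zero.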


State the null hypothesis and the alternative hypothesis.

• H0: gender is independent of size of hometown.

• H1: gender is not independent of size of hometown.

Set the alpha level.

• The alpha level is .05.

Calculate the value of the appropriate statistic and indicate the degrees of freedom.

• χ² = 2.538 with df = (C - 1)(R - 1) = (2)(1) = 2

Write the decision rule for rejecting the null hypothesis.

• Reject H0 if χ² ≥ 5.991.

To write the decision rule, we find the critical value in the chi-square table, in the column for the .05 level and the row for 2 df.

Write a summary statement based on the decision.

• Fail to reject H0.

Note: Since our calculated value of χ² (2.538) is not greater than 5.991, we fail to reject the null hypothesis. The data do not show a significant relationship between gender and size of hometown.


Example

Ms. Jan Kilpatrick is the marketing manager for a manufacturer of sports cards. She plans to begin selling a series of cards with pictures and playing statistics of former Major League Baseball players. One of the problems is the selection of the former players. At a baseball card show at South Mall last weekend, she set up a booth and offered cards of the following six Hall of Fame baseball players: Tom Seaver, Nolan Ryan, Ty Cobb, George Brett, Hank Aaron, and Johnny Bench. At the end of the day she sold a total of 120 cards. The number of cards sold for each old-time player is shown in the table on the right. Can she conclude the sales are not the same for each player?


Step 1: State the null hypothesis and the alternate hypothesis.

H0: There is no difference between fo (the observed frequencies) and fe (the expected frequencies).

H1: There is a difference between fo and fe.

Step 2: Select the level of significance.

α = 0.05 as stated in the problem

Step 3: Select the test statistic.

The test statistic follows the chi-square distribution, designated as χ².


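Step 4: Formulate the decision rule.

Reject H0 if χ² > χ² with α and k - 1 degrees of freedom, where k is the number of categories:

$$\chi^2 = \sum \left[ \frac{(f_o - f_e)^2}{f_e} \right] > \chi^2_{\alpha,\,k-1} = \chi^2_{.05,\,6-1} = \chi^2_{.05,\,5} = 11.070$$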

Step 5: Compute the value of the chi-square statistic and make a decision.

$$\chi^2 = \sum \left[ \frac{(f_o - f_e)^2}{f_e} \right] = 34.40$$

The computed χ² of 34.40 is in the rejection region, beyond the critical value of 11.070. The decision, therefore, is to reject H0 at the .05 level.

Conclusion: The difference between the observed and the expected frequencies is not due to chance. Rather, the differences between fo and fe are large enough to be considered significant. It is unlikely that card sales are the same among the six players.

Contingency Analysis - Example

Step 1: State the null hypothesis and the alternate hypothesis.

H0: There is no relationship between adjustment to civilian life and where the individual lives after being released from prison.

H1: There is a relationship between adjustment to civilian life and where the individual lives after being released from prison.

Step 2: Select the level of significance.

α = 0.01 as stated in the problem

Step 3: Select the test statistic.

The test statistic follows the chi-square distribution, designated as χ².

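Step 4: Formulate the decision rule.

Reject H0 if χ² > χ² with α and (r - 1)(c - 1) degrees of freedom, where r is the number of rows and c is the number of columns:

$$\chi^2 = \sum \left[ \frac{(f_o - f_e)^2}{f_e} \right] > \chi^2_{\alpha,\,(r-1)(c-1)} = \chi^2_{.01,\,(2-1)(4-1)} = \chi^2_{.01,\,(1)(3)} = \chi^2_{.01,\,3} = 11.345$$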

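Computing Expected Frequencies (fe)

Each expected frequency is the product of the cell's row total and column total, divided by the grand total. For a cell with marginal totals of 120 and 50 and a grand total of 200:

$$f_e = \frac{(120)(50)}{200} = 30$$

Computing the Chi-square Statistic

$$\chi^2 = \sum \left[ \frac{(f_o - f_e)^2}{f_e} \right] = 5.729$$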

Conclusion

The computed χ² of 5.729 is in the "do not reject H0" region, below the critical value of 11.345. The null hypothesis is not rejected at the .01 significance level.

We conclude there is no evidence of a relationship between adjustment to civilian life and where the prisoner resides after being released from prison. For the Federal Correction Agency's advisement program, the sample gives no reason to treat adjustment to civilian life as related to where the ex-prisoner lives.