
Page 1: Session 9

Session 9

Tests of Association in two-way tables

Page 2: Session 9

By the end of this session, you will be able to:

• conduct and interpret results from a chi-square test for comparing several proportions or (equivalently) testing the association between two categorical variables
• explain how the results above extend to the study of associations in general r × c tables
• state the assumptions underlying the above test and the actions to take if they fail

Learning Objectives

Page 3: Session 9

Below is a 5 × 2 table of observed frequencies showing animals that did or did not become diseased after inoculation with one of five vaccines.

Vaccine   Diseased   Healthy   Total
A             43        237      280
B             52        198      250
C             25        245      270
D             48        212      260
E             57        233      290
Total        225       1125     1350

Question:

Is there an association between occurrence of disease and type of vaccine?

An example

Page 4: Session 9

To answer the question, we need to test

H0: disease occurrence is independent of type of vaccine (i.e. the proportions diseased are the same for all vaccines)
H1: the two variables are associated

If H0 is true, the best estimate of the proportion diseased is the diseased total divided by the grand total: 225/1350 = 0.167.

We use this to compute the expected value in each cell of the table under the null hypothesis.

Null and alternative hypotheses

Page 5: Session 9

Expected values in the first row:

Expected value in cell 1 = (225/1350) × 280 = (225 × 280)/1350 = 46.67
Expected value in cell 2 = (1125/1350) × 280 = (1125 × 280)/1350 = 233.33

Can you calculate the expected values in the next row? Check that your two numbers add to 250.

Computation of expected values
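The hand calculation above generalises to every cell: the expected count is (row total × column total) / grand total, which is exactly (225/1350) × 280 for cell 1. The Python sketch below is not part of the original course material, and the function name `expected_counts` is hypothetical; it simply applies that rule to the whole vaccine table.

```python
from typing import List

Table = List[List[float]]


def expected_counts(observed: Table) -> Table:
    """Expected cell counts under H0 (independence): row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Vaccine data from the example: columns are (diseased, healthy), rows are vaccines A to E.
vaccines = [[43, 237], [52, 198], [25, 245], [48, 212], [57, 233]]

expected = expected_counts(vaccines)
for name, row in zip("ABCDE", expected):
    print(name, [round(e, 2) for e in row])
```

Each printed row should match the corresponding row of the table of expected values, and each row of expected values sums to the observed row total (e.g. 250 for vaccine B).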

Page 6: Session 9

Vaccine   Diseased   Healthy   Total
A           46.67     233.33     280
B           41.67     208.33     250
C           45.00     225.00     270
D           43.33     216.67     260
E           48.33     241.67     290
Total      225       1125       1350

Table of expected values

Note:

Page 7: Session 9

Now compute the chi-square test statistic, given by

$$X^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E} = 16.56$$

If H0 is true, X² follows a χ² distribution with 4 d.f. Note: d.f. = (r − 1)(c − 1), where r = number of rows and c = number of columns in the table.

Comparing 16.56 to the χ² distribution with 4 d.f., we get a p-value of 0.0024, a highly significant result. We may conclude that there is strong evidence of an association between disease occurrence and type of vaccine.

Chi-square test
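As a cross-check of the figures on this slide, here is a minimal pure-Python sketch of the whole test: expected counts, X² summed over all cells, d.f. = (r − 1)(c − 1) and the upper-tail p-value. The helper names (`expected_counts`, `chi2_sf`, `chi_square_test`) are hypothetical, and the p-value uses the standard closed forms of the χ² tail for integer d.f. rather than any statistics package.

```python
import math
from typing import List, Tuple

Table = List[List[float]]


def expected_counts(observed: Table) -> Table:
    """Expected counts under H0: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(chi-square on df d.f. > x), closed forms for integer df >= 1."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over k = 0 .. df/2 - 1 of (x/2)^k / k!
        return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum over k = 1 .. (df-1)/2 of (x/2)^(k-1/2) / Gamma(k+1/2)
    series = sum(half ** (k - 0.5) / math.gamma(k + 0.5) for k in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * series


def chi_square_test(observed: Table) -> Tuple[float, int, float]:
    """Pearson chi-square test of association for an r x c table: returns (X2, d.f., p-value)."""
    expected = expected_counts(observed)
    x2 = sum((o - e) ** 2 / e
             for obs_row, exp_row in zip(observed, expected)
             for o, e in zip(obs_row, exp_row))
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return x2, df, chi2_sf(x2, df)


vaccines = [[43, 237], [52, 198], [25, 245], [48, 212], [57, 233]]
x2, df, p = chi_square_test(vaccines)
print(f"X2 = {x2:.2f}, d.f. = {df}, p = {p:.4f}")
```

The same function works unchanged for any r × c table, because nothing in it assumes two columns.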

Page 8: Session 9

Survey results are often expressed in terms of two-way tables. In general, such tables may contain r rows and c columns. Questions of interest in such tables centre on whether there is an association between the two variables that have been tabulated.

For example, if the table cross-tabulates the education level of the household (HH) head (none, primary, secondary, tertiary) by poverty level (not poor, poor, very poor), the question “is poverty related to education?” may be asked.

Extensions to r × c tables

Page 9: Session 9

To answer the above question, the null hypothesis is that the two variables are NOT related, against the alternative that they are.

Comparing the observed values with the values expected under the null hypothesis leads to a chi-square test with (r − 1)(c − 1) d.f.

In the above example, d.f. = (4 − 1)(3 − 1) = 6.

Chi-square test for an r × c table
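The calculation for a general r × c table can be written compactly. In this sketch, O_ij and E_ij are the observed and expected counts in row i and column j, while R_i, C_j and N (row totals, column totals and grand total) are notation introduced here rather than on the slides.

```latex
E_{ij} = \frac{R_i \, C_j}{N}, \qquad
X^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
\text{d.f.} = (r - 1)(c - 1)
```

For the vaccine table, E_11 = (280 × 225)/1350 = 46.67, the value computed earlier.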

Page 10: Session 9

• The chi-square test is approximate.
• Its validity relies on “large” samples.
• Small samples of unbalanced data (large and small counts together) may invalidate the approximation.
• Rules of thumb for validity involve the expected values, E:
  • we need large expected values under H0;
  • say, most E ≥ 5 and none less than 1.
• If the rule of thumb is not satisfied, the p-value may be unreliable.

Assumption underlying the test
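The rule of thumb can be checked mechanically before running the test. The sketch below is an illustration only: the slide says "most E ≥ 5", and reading "most" as at least 80% of cells is an assumption (a common convention), exposed as the hypothetical parameter `min_share_at_least_5`. The small table is made up for the example.

```python
from typing import List

Table = List[List[float]]


def expected_counts(observed: Table) -> Table:
    """Expected counts under H0: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def rule_of_thumb_ok(observed: Table, min_share_at_least_5: float = 0.8) -> bool:
    """True if at least `min_share_at_least_5` of the cells have E >= 5 and no cell has E < 1."""
    cells = [e for row in expected_counts(observed) for e in row]
    share_ge_5 = sum(e >= 5 for e in cells) / len(cells)
    return share_ge_5 >= min_share_at_least_5 and min(cells) >= 1


vaccines = [[43, 237], [52, 198], [25, 245], [48, 212], [57, 233]]
small = [[3, 1], [1, 4]]  # made-up sparse table: every expected count is below 5
print(rule_of_thumb_ok(vaccines), rule_of_thumb_ok(small))
```

For the vaccine data the smallest expected count is 41.67, so the approximation is comfortable; the made-up 2 × 2 table fails the check.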

Page 11: Session 9

(a) Simple approaches:

• Collect more data, if this is possible.
• Collapse rows or columns if the table has more than two rows/columns. But recognise that:
  • this leads to loss of information;
  • with some types of variables, there may be no natural way of combining rows/columns.

Actions when assumptions fail
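Collapsing rows simply adds their counts together, after which the expected values are recomputed. A minimal sketch, using a made-up 3 × 2 table (not from the course data) and hypothetical helper names:

```python
from typing import List

Table = List[List[float]]


def expected_counts(observed: Table) -> Table:
    """Expected counts under H0: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def collapse_rows(observed: Table, groups: List[List[int]]) -> Table:
    """Merge rows by summing counts; each entry of `groups` lists the row indices forming one new row."""
    n_cols = len(observed[0])
    return [[sum(observed[i][j] for i in group) for j in range(n_cols)] for group in groups]


def min_expected(observed: Table) -> float:
    """Smallest expected count in the table."""
    return min(e for row in expected_counts(observed) for e in row)


# Made-up table for illustration only: the last row is sparse.
table = [[10, 20], [3, 6], [1, 2]]
merged = collapse_rows(table, [[0], [1, 2]])  # keep row 0, combine rows 1 and 2
print(merged, min_expected(table), min_expected(merged))
```

Here the smallest expected count rises from 1 to 4, at the cost of no longer distinguishing the two merged categories; whether such a merge makes sense depends on the variable, as the slide notes.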

Page 12: Session 9

(b) Use a continuity correction

This method is often called Yates’s correction and applies only to 2 × 2 tables.

First, we show the standard chi-square value for a table with cell counts a, b, c, d, as below. (Verify later that this is correct.)

          col1   col2   Total
row1       a      b      r1
row2       c      d      r2
Total      n1     n2     N

$$X^2 = \frac{(ad - bc)^2 N}{r_1 r_2 n_1 n_2}$$

Actions when assumptions fail
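One way to "verify later" is numerical: the shortcut formula should agree with the general sum of (O − E)²/E over the four cells. The sketch below checks this on the bus data used later in this session (a = 40, b = 52, c = 19, d = 14); the function names are hypothetical, and a numerical match is a sanity check, not a proof.

```python
def x2_shortcut(a: int, b: int, c: int, d: int) -> float:
    """X2 = (ad - bc)^2 N / (r1 r2 n1 n2) for the 2 x 2 table [[a, b], [c, d]]."""
    r1, r2 = a + b, c + d  # row totals
    n1, n2 = a + c, b + d  # column totals
    n = r1 + r2
    return (a * d - b * c) ** 2 * n / (r1 * r2 * n1 * n2)


def x2_cellwise(a: int, b: int, c: int, d: int) -> float:
    """The general formula: sum of (O - E)^2 / E over the four cells."""
    observed = [[a, b], [c, d]]
    rows = [a + b, c + d]
    cols = [a + c, b + d]
    n = sum(rows)
    return sum((observed[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
               for i in range(2) for j in range(2))


# Bus data: rows = non-smoker / smoker, columns = driver / conductor.
print(round(x2_shortcut(40, 52, 19, 14), 3), round(x2_cellwise(40, 52, 19, 14), 3))
```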

Page 13: Session 9

(b) Continuity correction (continued)…

The chi-square approximation to the distribution of X² is improved by reducing the absolute value of each O − E by ½ before calculating X². This gives X² the value below.

$$X^2 = \frac{\left(|ad - bc| - \tfrac{1}{2}N\right)^2 N}{r_1 r_2 n_1 n_2}$$

Note: The equivalent correction when comparing two proportions with a z-test is to reduce by ½ the count r in the first proportion p = r/n and increase by ½ the count r in the second, taking the first to be the larger proportion, so that the difference between them shrinks.

Actions when assumptions fail
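The corrected formula translates directly into code. In this minimal sketch (hypothetical function name), |ad − bc| is reduced by N/2 before squaring; clipping the reduced value at zero is an added safeguard, not something stated on the slide, so that the correction cannot overshoot when |ad − bc| < N/2.

```python
def x2_yates(a: int, b: int, c: int, d: int) -> float:
    """Yates-corrected X2 = (|ad - bc| - N/2)^2 N / (r1 r2 n1 n2) for [[a, b], [c, d]]."""
    r1, r2 = a + b, c + d  # row totals
    n1, n2 = a + c, b + d  # column totals
    n = r1 + r2
    # Shrink |ad - bc| by N/2; clipping at zero is our safeguard against overcorrection.
    diff = max(abs(a * d - b * c) - n / 2, 0.0)
    return diff ** 2 * n / (r1 * r2 * n1 * n2)


print(round(x2_yates(40, 52, 19, 14), 3))  # bus data from the practical sessions
```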

Page 14: Session 9

Example of use of continuity correction

Rows: whether smoker? Columns: job (column percentages in brackets).

                   Driver        Conductor     Total
No                 40 (67.8%)    52 (78.8%)     92 (73.6%)
Yes                19 (32.2%)    14 (21.2%)     33 (26.4%)
Total              59 (100.0%)   66 (100.0%)   125 (100%)

The table above shows the bus data used during the practical sessions. The question of interest is whether the proportion of smokers differs between job types.

Page 15: Session 9

The usual chi-square test gives X² = 1.937.

Applying the continuity correction, we get X² = 1.412.

Here, the correction makes little practical difference because the sample sizes are reasonably large: neither value exceeds 3.84, the 5% critical value of the χ² distribution with 1 d.f.

It is more important to apply the continuity correction when sample sizes are small.

Example of use of continuity correction
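To see that the conclusion is the same with or without the correction, both statistics can be turned into p-values. With 1 d.f., P(χ² > x) = P(|Z| > √x) = erfc(√(x/2)), which Python's standard `math.erfc` provides. This is a sketch with hypothetical function names, not the software used in the practicals.

```python
import math


def x2_2x2(a: int, b: int, c: int, d: int, yates: bool = False) -> float:
    """Chi-square statistic for [[a, b], [c, d]], optionally with Yates's continuity correction."""
    r1, r2, n1, n2 = a + b, c + d, a + c, b + d
    n = r1 + r2
    diff = abs(a * d - b * c)
    if yates:
        diff = max(diff - n / 2, 0.0)  # clipping at zero is our safeguard, not from the slides
    return diff ** 2 * n / (r1 * r2 * n1 * n2)


def chi2_sf_1df(x: float) -> float:
    """P(chi-square with 1 d.f. > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2))


bus = (40, 52, 19, 14)  # rows: non-smoker / smoker; columns: driver / conductor
for label, corrected in (("usual", False), ("Yates", True)):
    x2 = x2_2x2(*bus, yates=corrected)
    print(f"{label}: X2 = {x2:.3f}, p = {chi2_sf_1df(x2):.3f}")
```

Both p-values are above 0.05, so neither version of the test suggests a difference in smoking between drivers and conductors at the 5% level.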

Page 16: Session 9

Actions when assumptions fail

(c) Use an Exact Test

• When actions suggested in (a) or (b) are not possible, consider using an Exact Test.

• Details of such tests are beyond the scope of this module.

• Some software packages (e.g. Stata) can perform Fisher’s exact test. SPSS does this only for 2 × 2 tables. Specialised software also exists for such tests, e.g. StatXact.
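Although details are beyond the scope of this module, the idea behind Fisher’s exact test for a 2 × 2 table is short enough to sketch: with the row and column totals fixed, the top-left count follows a hypergeometric distribution, and the two-sided p-value used here (one common convention) sums the probabilities of all tables no more likely than the observed one. The function name is hypothetical; for real analyses, use the packages named above.

```python
from math import comb


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for [[a, b], [c, d]] with row and column totals fixed."""
    r1, n1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x, given the margins.
        return comb(r1, x) * comb(n - r1, n1 - x) / comb(n, n1)

    p_obs = prob(a)
    lo, hi = max(0, r1 + n1 - n), min(r1, n1)  # feasible range for the top-left cell
    # Sum over tables no more likely than the observed one (small tolerance for rounding).
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


print(fisher_exact_2x2(3, 0, 0, 3))  # a tiny table where the chi-square rule of thumb fails
```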

Page 17: Session 9

Limitations

Chi-square tests are limited, in that only two factors are examined at a time.

This may cause erroneous inferences to be made (see Practical 15 for an example).

The inter-relations between more than two factors can be investigated using more sophisticated statistical techniques, e.g. log-linear modelling.
