Chi-Square Test

Chapter 7

Contents

• χ² test of fourfold table data
• χ² test of paired fourfold table data
• Fisher exact probabilities in fourfold table data
• χ² test of R×C tables
• Multiple comparison of sample rates
• χ² test of goodness of fit

Objectives of the χ² test:
• to infer whether the rates or constituent ratios of two or more populations differ;
• multiple comparison of the rates of several samples;
• to infer whether two categorical variables are associated;
• test of goodness of fit.

Test statistic: χ²

Suitable for: qualitative (count) data

Section 1 χ² test of fourfold table data

Objective: to judge whether the rates or constituent ratios of two populations differ (equivalent to the u-test for comparing two rates).

Requirement: the individuals of the two samples, each classified into two categories, should be arranged as fourfold (2×2) table data.
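The equivalence with the u-test can be checked numerically. This is a minimal sketch in plain Python that uses the Table 7-1 data of Example 7-1, and the variable names are illustrative only. It computes the pooled-rate u statistic for two proportions and the uncorrected fourfold χ². For a 2×2 table, u² equals χ² exactly, so the two tests give the same two-sided P.

```python
import math

# Table 7-1 (Example 7-1): number effective and group size
x1, n1 = 99, 104   # experimental group (drug A)
x2, n2 = 75, 96    # control group (drug B)

# u-test for two rates, using the pooled rate under H0: pi1 = pi2
p1, p2 = x1 / n1, x2 / n2
pc = (x1 + x2) / (n1 + n2)
u = (p1 - p2) / math.sqrt(pc * (1 - pc) * (1 / n1 + 1 / n2))

# Uncorrected fourfold chi-square (special formula)
a, b, c, d = x1, n1 - x1, x2, n2 - x2
n = a + b + c + d
chi2 = (a * d - b * c) ** 2 * n / ((a + b) * (c + d) * (a + c) * (b + d))

print(round(u ** 2, 2), round(chi2, 2))   # both about 12.86
```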

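1. The χ² distribution

(1) The χ² distribution is a continuous distribution whose shape is determined by its degrees of freedom ν.

(2) Additivity: the sum of independent χ² variables again follows a χ² distribution, with degrees of freedom equal to the sum of their degrees of freedom:

$$\chi^2_1\sim\chi^2(\nu_1),\ \chi^2_2\sim\chi^2(\nu_2)\ \text{independent}\;\Rightarrow\;\chi^2_1+\chi^2_2\sim\chi^2(\nu_1+\nu_2)$$

(3) Critical value of χ²: χ²_{α,ν} is the value for which P(χ² ≥ χ²_{α,ν}) = α.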

[Figure: probability density curves f(χ²) of the χ² distribution for several degrees of freedom ν]
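Critical values are normally read from the χ² table, but they can also be computed. This is a sketch in plain Python (standard library only) under stated assumptions. `chi2_sf` evaluates the upper-tail probability through the power series for the regularized incomplete gamma function. `chi2_critical` is a hypothetical helper that finds χ²_{α,ν} by bisection. In practice, a statistics library such as SciPy (`scipy.stats.chi2`) would normally be used instead.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(chi2 >= x) with df degrees of freedom.

    Uses the power series of the regularized lower incomplete gamma
    function P(df/2, x/2); adequate for the moderate x values of this chapter.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a            # k = 0 term of sum z^k / (a(a+1)...(a+k))
    k = 0
    while term > total * 1e-16:
        k += 1
        term *= z / (a + k)
        total += term
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def chi2_critical(alpha: float, df: int) -> float:
    """Critical value chi2_{alpha,df}: the x with P(chi2 >= x) = alpha, by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:    # widen the bracket until it contains the root
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


for nu in (1, 2, 6, 10):
    print(nu, round(chi2_critical(0.05, nu), 2))   # 3.84, 5.99, 12.59, 18.31
```

This reproduces the familiar table values χ²_{0.05,1} = 3.84 and χ²_{0.05,2} = 5.99. For ν = 2 the tail probability has the closed form e^(−x/2), which makes a convenient check.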

2. The basic idea of the χ² test

Example 7-1 A hospital wanted to compare the curative effect of drug A (experimental group) and drug B (control group) in lowering intracranial pressure. 200 patients with high intracranial pressure were randomly allocated into the two groups; the results are shown in Table 7-1. Do the effective rates of the two drugs differ?

Group | Effective | Ineffective | Total | Effective rate (%)
Experimental group | 99 (90.48) a | 5 (13.52) b | 104 (a+b) | 95.19
Control group | 75 (83.52) c | 21 (12.48) d | 96 (c+d) | 78.13
Total | 174 (a+c) | 26 (b+d) | 200 (n) | 87.00

Table 7-1 Comparison of the effective rates of the two groups in lowering intracranial pressure (theoretical frequencies in parentheses)

These data can be arranged in the form of Chart 7-2. There are two treatment groups, and the subjects in each group fall into two parts: those in whom the outcome occurred and those in whom it did not. The table contains four basic data (99, 5, 75 and 21). All the other entries can be derived from them, which is why such data are called fourfold table data.

Treatment group | Occurred | Not occurred | Total
First | a | b | a+b
Second | c | d | c+d
Total | a+c | b+d | n

Chart 7-2 The basic form of fourfold table data

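Basic idea: it can be understood through the basic formula of the χ² test:

$$\chi^2=\sum\frac{(A-T)^2}{T},\qquad \nu=(\text{number of rows}-1)(\text{number of columns}-1) \tag{7-1}$$

where A is the actual (observed) frequency and T is the theoretical (expected) frequency.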

The theoretical frequencies can be calculated by the following formula:

$$T_{RC}=\frac{n_R\,n_C}{n}$$

where T_RC is the theoretical frequency in row R and column C, n_R is the total of that row, n_C is the total of that column, and n is the total sample size.
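This is a minimal sketch in plain Python; the function names are illustrative and not taken from any library. It computes the theoretical frequencies and the basic formula (7-1) for any R×C table of counts. Applied to Table 7-1, it reproduces the theoretical frequencies shown in parentheses and gives χ² = 12.86 with ν = 1.

```python
def expected_frequencies(table):
    """T_RC = n_R * n_C / n for every cell of an R x C table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[nr * nc / n for nc in col_totals] for nr in row_totals]


def pearson_chi2(table):
    """Basic formula (7-1): chi2 = sum (A - T)^2 / T, df = (R - 1)(C - 1)."""
    expected = expected_frequencies(table)
    chi2 = sum(
        (a - t) ** 2 / t
        for row_a, row_t in zip(table, expected)
        for a, t in zip(row_a, row_t)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


table_7_1 = [[99, 5], [75, 21]]   # rows: experimental, control; columns: effective, ineffective
T = expected_frequencies(table_7_1)
print([[round(t, 2) for t in row] for row in T])   # [[90.48, 13.52], [83.52, 12.48]]
chi2, df = pearson_chi2(table_7_1)
print(round(chi2, 2), df)                           # 12.86 1
```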

The theoretical frequencies are derived under the null hypothesis H0: π1 = π2, using the pooled rate of the two groups combined (for Table 7-1, 174/200 = 87.00%). For example, T11 = 104 × 174/200 = 90.48.

The test statistic χ² reflects how well the actual frequencies agree with the theoretical frequencies: the closer A is to T, the smaller χ² is.

From formula (7-1), the value of χ² also depends on the number of (A − T)²/T terms being summed, or more exactly on the degrees of freedom ν. ν is determined by the number of cells whose frequencies can vary freely once the marginal totals are fixed, not by the sample size.

3. The process of the hypothesis test

(1) Establish the hypotheses and set the test level.

H0: π1 = π2, the effective rates of the experimental and control populations in lowering intracranial pressure are equal;

H1: π1 ≠ π2, the two effective rates are not equal;

α = 0.05.

(2) Calculate the test statistic.

$$T_{11}=\frac{104\times174}{200}=90.48,\quad T_{12}=104-90.48=13.52,\quad T_{21}=174-90.48=83.52,\quad T_{22}=26-13.52=12.48$$

$$\chi^2=\frac{(99-90.48)^2}{90.48}+\frac{(5-13.52)^2}{13.52}+\frac{(75-83.52)^2}{83.52}+\frac{(21-12.48)^2}{12.48}=12.86$$

$$\nu=(2-1)(2-1)=1$$

(3) Determine P and draw a conclusion. Consulting the table of critical values of χ² in Appendix Table 8 gives χ²_{0.05,1} = 3.84 < 12.86, so P < 0.05. At the test level α = 0.05, reject H0 and accept H1. The effective rates of the two groups in lowering intracranial pressure differ, and drug A (experimental group) is more effective than drug B (control group).
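For ν = 1, the table lookup can be replaced by a closed form. A χ² variable with one degree of freedom is the square of a standard normal variable, so P(χ² ≥ x) = erfc(√(x/2)). This is a minimal Python sketch using only `math.erfc` from the standard library; the helper name `chi2_sf_df1` is illustrative.

```python
import math


def chi2_sf_df1(x: float) -> float:
    """P(chi2 >= x) for nu = 1.

    chi2 with one degree of freedom is u^2 for a standard normal u, so
    P(chi2 >= x) = P(|u| >= sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


p = chi2_sf_df1(12.86)                 # Example 7-1
print(p < 0.05)                        # True -> reject H0 at alpha = 0.05
print(round(chi2_sf_df1(3.84), 3))     # about 0.05, i.e. 3.84 is chi2_{0.05,1}
```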

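2. The special formula for fourfold tables

$$\chi^2=\frac{(ad-bc)^2\,n}{(a+b)(c+d)(a+c)(b+d)}$$

For Example 7-1:

$$\chi^2=\frac{(99\times21-5\times75)^2\times200}{104\times96\times174\times26}=12.86$$

which agrees with the result of the basic formula.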
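The special formula is an algebraic rearrangement of the basic formula for the 2×2 case. This sketch uses plain Python with `fractions.Fraction`, so the comparison is exact rather than subject to rounding; the function names are illustrative. It evaluates both formulas on Table 7-1 and confirms they agree.

```python
from fractions import Fraction


def chi2_special(a, b, c, d):
    """Special formula: chi2 = (ad - bc)^2 n / ((a+b)(c+d)(a+c)(b+d))."""
    n = a + b + c + d
    return Fraction((a * d - b * c) ** 2 * n, (a + b) * (c + d) * (a + c) * (b + d))


def chi2_basic(a, b, c, d):
    """Basic formula sum (A - T)^2 / T with T_RC = n_R n_C / n (exact fractions)."""
    n = a + b + c + d
    # each cell: (actual count, its row total, its column total)
    cells = [(a, a + b, a + c), (b, a + b, b + d), (c, c + d, a + c), (d, c + d, b + d)]
    total = Fraction(0)
    for A, n_r, n_c in cells:
        T = Fraction(n_r * n_c, n)
        total += (A - T) ** 2 / T
    return total


special = chi2_special(99, 5, 75, 21)
basic = chi2_basic(99, 5, 75, 21)
print(special == basic, round(float(special), 2))   # True 12.86
```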

The χ² distribution is continuous, whereas fourfold table data are discrete counts, so the χ² values calculated from them are also discrete. A continuity correction is therefore needed to improve the continuity of the statistic's distribution, so that it better approximates the χ² distribution.

3. The corrected formula

$$\chi^2_c=\sum\frac{(|A-T|-0.5)^2}{T}$$

$$\chi^2_c=\frac{\left(|ad-bc|-\frac{n}{2}\right)^2 n}{(a+b)(c+d)(a+c)(b+d)}$$
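The two corrected formulas are the same quantity written in two ways. In a 2×2 table, |A − T| equals |ad − bc|/n in every cell, which turns the first form into the second. This is a hedged sketch in plain Python, using exact fractions and illustrative function names, that checks the identity on Table 7-1.

```python
from fractions import Fraction


def chi2_corrected_special(a, b, c, d):
    """chi2_c = (|ad - bc| - n/2)^2 n / ((a+b)(c+d)(a+c)(b+d))."""
    n = a + b + c + d
    num = (abs(a * d - b * c) - Fraction(n, 2)) ** 2 * n
    return num / ((a + b) * (c + d) * (a + c) * (b + d))


def chi2_corrected_basic(a, b, c, d):
    """chi2_c = sum (|A - T| - 0.5)^2 / T over the four cells."""
    n = a + b + c + d
    cells = [(a, a + b, a + c), (b, a + b, b + d), (c, c + d, a + c), (d, c + d, b + d)]
    total = Fraction(0)
    for A, n_r, n_c in cells:
        T = Fraction(n_r * n_c, n)
        total += (abs(A - T) - Fraction(1, 2)) ** 2 / T
    return total


print(chi2_corrected_special(99, 5, 75, 21) == chi2_corrected_basic(99, 5, 75, 21))  # True
print(round(float(chi2_corrected_special(99, 5, 75, 21)), 2))                        # 11.39
```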

The conditions for choosing the test formula for fourfold table data:

• n ≥ 40 and all T ≥ 5: the basic or special formula;
• n ≥ 40 and 1 ≤ T < 5: the corrected formula;
• n < 40, or T < 1: the Fisher exact probabilities method.

The continuity correction of the χ² test applies only to fourfold table data, i.e. when ν = 1. When ν is greater than 1, the correction is not applied.
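The decision rule above is easy to encode. The function below is a hypothetical helper, not from any library. It computes n and the smallest theoretical frequency of a 2×2 table and returns the method the rule selects. The three calls use the tables of Examples 7-1, 7-2 and 7-4.

```python
def choose_fourfold_method(a, b, c, d):
    """Pick the method for a 2x2 table by the rule in the text:
    n >= 40 and all T >= 5      -> basic/special formula
    n >= 40 and 1 <= min T < 5  -> corrected formula
    n < 40 or min T < 1         -> Fisher exact probabilities
    """
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    t_min = min(r * k for r in rows for k in cols) / n   # smallest theoretical frequency
    if n < 40 or t_min < 1:
        return "Fisher exact probabilities"
    if t_min < 5:
        return "corrected formula"
    return "basic/special formula"


print(choose_fourfold_method(99, 5, 75, 21))   # basic/special formula (Example 7-1, min T = 12.48)
print(choose_fourfold_method(46, 6, 18, 8))    # corrected formula (Example 7-2, min T = 4.67)
print(choose_fourfold_method(4, 18, 5, 6))     # Fisher exact probabilities (Example 7-4, n = 33)
```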

Example 7-2 A doctor wanted to compare the effects of drug A and drug B in treating cerebrovascular diseases. 78 patients with such diseases were randomly allocated into two groups; the results are shown in Table 7-2. Is the curative effect of the two drugs the same?

Table 7-2 Comparison of the effective rates of two drugs in treating cerebrovascular diseases

Group | Effective | Ineffective | Total | Effective rate (%)
Citicoline group | 46 | 6 | 52 | 88.46
Ganglioside group | 18 | 8 (4.67) | 26 | 69.23
Total | 64 | 14 | 78 | 82.05

H0: π1 = π2, H1: π1 ≠ π2, α = 0.05.

In this case n = 78 ≥ 40, and the smallest theoretical frequency is T22 = 26 × 14/78 = 4.67, which lies between 1 and 5. The corrected formula is therefore used:

$$\chi^2_c=\frac{\left(|46\times8-6\times18|-\frac{78}{2}\right)^2\times78}{52\times26\times64\times14}=3.14$$

With ν = 1, the χ² critical value table gives 0.05 < P < 0.10. At the test level α = 0.05, H0 cannot be rejected, so we cannot conclude that the effective rates of the two drugs in treating cerebrovascular diseases differ.

If the correction is not applied, χ² = 4.35 and P < 0.05, and the conclusion would be the opposite.
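The effect of the correction on Example 7-2 can be reproduced directly. This sketch uses plain Python (standard library only). It computes both the corrected and the uncorrected statistic from Table 7-2, together with the corresponding ν = 1 P values via erfc(√(χ²/2)). It only illustrates the arithmetic; the choice between the two formulas should still follow the rule for n and T.

```python
import math

a, b, c, d = 46, 6, 18, 8          # Table 7-2
n = a + b + c + d
denom = (a + b) * (c + d) * (a + c) * (b + d)

chi2 = (a * d - b * c) ** 2 * n / denom                  # uncorrected
chi2_c = (abs(a * d - b * c) - n / 2) ** 2 * n / denom   # continuity-corrected


def p_df1(x):
    """P(chi2 >= x) for nu = 1: erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2))


print(round(chi2_c, 2), round(p_df1(chi2_c), 3))   # 3.14, P about 0.076 (> 0.05)
print(round(chi2, 2), round(p_df1(chi2), 3))       # 4.35, P about 0.037 (< 0.05)
```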

Section 2 χ² test of paired fourfold table data

As with measurement data, inference about the difference between two population rates (proportions) from count data can use either a group design or a paired design. The corresponding data are fourfold table data and paired fourfold table data, respectively.

Example 7-3 A laboratory measured serum antinuclear antibodies in 58 patients with suspected systemic lupus erythematosus, using both latex agglutination and immunofluorescence; the results are shown in Table 7-3. Is there a difference between the two methods?

Immunofluorescence | Latex agglutination + | Latex agglutination − | Total
+ | 11 (a) | 12 (b) | 23
− | 2 (c) | 33 (d) | 35
Total | 13 | 45 | 58

Table 7-3 Results of the two methods

In a paired design, each pair gives one of four possible results under the two methods:

① both methods positive (a);
② both methods negative (d);
③ immunofluorescence positive, latex agglutination negative (b);
④ latex agglutination positive, immunofluorescence negative (c).

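a and d are the pairs on which the two methods agree; b and c are the pairs on which they disagree.

Statistic (only b and c enter it):

$$b+c\ge 40:\quad \chi^2=\frac{(b-c)^2}{b+c},\quad \nu=1$$

$$b+c<40:\quad \chi^2_c=\frac{(|b-c|-1)^2}{b+c},\quad \nu=1$$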

Caution: this method is generally used when the sample size is not large.

Reasons:

1. it considers only the discordant pairs (b, c);

2. it ignores the sample size n and the concordant pairs (a, d).

When n, a and d are large and b and c are relatively small, a statistically significant result may have no practical significance.

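Steps of the test:

H0: B = C, the population frequencies of the two kinds of discordant pairs are equal (the two methods have the same positive rate); H1: B ≠ C; α = 0.05.

b + c = 12 + 2 = 14 < 40, so the corrected formula is used:

$$\chi^2_c=\frac{(|12-2|-1)^2}{12+2}=5.79,\qquad \nu=1$$

From the χ² critical value table, 0.01 < P < 0.025. At the level α = 0.05, reject H0 and accept H1. The two methods differ, and the positive rate of immunofluorescence is higher than that of latex agglutination.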
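This is a minimal sketch of the paired (McNemar-type) statistic in plain Python. `mcnemar_chi2` is an illustrative name, and the P value uses the ν = 1 identity P(χ² ≥ x) = erfc(√(x/2)). With b = 12 and c = 2 from Table 7-3, it reproduces χ²_c = 5.79.

```python
import math


def mcnemar_chi2(b: int, c: int) -> float:
    """Paired fourfold table: uses only the discordant cells b and c.

    b + c >= 40: chi2 = (b - c)^2 / (b + c)
    b + c < 40:  chi2_c = (|b - c| - 1)^2 / (b + c)   (continuity-corrected)
    nu = 1 in both cases.
    """
    if b + c >= 40:
        return (b - c) ** 2 / (b + c)
    return (abs(b - c) - 1) ** 2 / (b + c)


b, c = 12, 2                              # discordant pairs in Table 7-3
chi2 = mcnemar_chi2(b, c)                 # b + c = 14 < 40 -> corrected formula
p = math.erfc(math.sqrt(chi2 / 2))        # nu = 1
print(round(chi2, 2), 0.01 < p < 0.025)   # 5.79 True
```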

Section 3 Fisher exact probabilities method in 2×2 tables

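Conditions: n < 40, or T < 1, or the P value obtained by the χ² test is close to the test level α.

Theoretical basis: the hypergeometric distribution. Strictly speaking, this method is not a χ² test.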

Example 7-4 A doctor studied the preventive effect of hepatitis B immunoglobulin against intrauterine HBV infection of the fetus. 33 HBsAg-positive pregnant patients were randomly allocated into two groups, a prevention group and a non-prevention group; the results are shown in Table 7-4. Is there a difference in the fetal infection rate between the two groups?

Group | Positive | Negative | Total | Infection rate (%)
Prevention | 4 | 18 | 22 | 18.18
No prevention | 5 (3) | 6 | 11 | 45.45
Total | 9 | 24 | 33 | 27.27

Table 7-4 Comparison of the fetal HBV infection rates of the two groups

1. Basic idea:

With the marginal totals of the fourfold table fixed, calculate the probabilities of all possible combinations of the four actual frequencies. Then draw an inference by comparing the cumulative probability with the α level.

All possible fourfold tables with the margins of Table 7-4, written row by row as a b / c d:

(1) 0 22 / 9 2: ad − bc = −198
(2) 1 21 / 8 3: ad − bc = −165
(3) 2 20 / 7 4: ad − bc = −132
(4) 3 19 / 6 5: ad − bc = −99
(5) 4 18 / 5 6: ad − bc = −66
(6) 5 17 / 4 7: ad − bc = −33
(7) 6 16 / 3 8: ad − bc = 0
(8) 7 15 / 2 9: ad − bc = 33
(9) 8 14 / 1 10: ad − bc = 66
(10) 9 13 / 0 11: ad − bc = 99

1. Calculate Pi:

Number of combinations = the smallest marginal total + 1. For Example 7-4, the number of combinations is 9 + 1 = 10.

The sum of all Pi is 1.

Calculation formula:

$$P_i=\frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{a!\,b!\,c!\,d!\,n!}$$
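The enumeration and the formula for P_i can be sketched in plain Python with `math.comb` (Python 3.8+). `fourfold_tables` is an illustrative name. It uses the equivalent form P_i = C(a+b, a)·C(c+d, c)/C(n, a+c), which equals the factorial formula above. For Example 7-4, it lists the 10 tables with their D_i and P_i.

```python
from math import comb


def fourfold_tables(a, b, c, d):
    """All 2x2 tables with the same margins as the observed one,
    returned as (a, b, c, d, D_i = ad - bc, P_i)."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2
    out = []
    for ai in range(max(0, c1 - r2), min(r1, c1) + 1):
        bi, ci = r1 - ai, c1 - ai
        di = r2 - ci
        # P_i = (a+b)!(c+d)!(a+c)!(b+d)! / (a! b! c! d! n!) = C(r1, a) C(r2, c) / C(n, c1)
        p = comb(r1, ai) * comb(r2, ci) / comb(n, c1)
        out.append((ai, bi, ci, di, ai * di - bi * ci, p))
    return out


tables = fourfold_tables(4, 18, 5, 6)         # Example 7-4
print(len(tables))                            # 10 = smallest margin (9) + 1
for i, (ai, bi, ci, di, D, p) in enumerate(tables, start=1):
    print(i, ai, bi, ci, di, D, f"{p:.8f}")
print(round(sum(t[-1] for t in tables), 10))  # 1.0
```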

2. Calculation of the cumulative probabilities

Let the cross-product difference of the observed fourfold table be a*d* − b*c* = D*, with probability P*. Let Di denote the cross-product difference of each other combination, with probability Pi.

(1) One-sided test

If D* > 0 in the observed fourfold table, sum the probabilities of all tables with Di ≥ D* and Pi ≤ P*. If D* < 0, sum the probabilities of all tables with Di ≤ D* and Pi ≤ P*.

(2) Two-sided test

Sum the probabilities of all fourfold tables satisfying |Di| ≥ |D*| and Pi ≤ P*.

If a+b = c+d or a+c = b+d, the sequence of all combinations of the fourfold table is symmetric. The two-sided cumulative probability is then simply the one-sided cumulative probability × 2.

Test procedure (in this example n = 33 < 40):

H0: π1 = π2, H1: π1 ≠ π2, α = 0.05.

1. Calculate D* and P* of the observed fourfold table, as well as Di of all fourfold tables (see Table 7-5). In this example, D* = −66 and P* = 0.08762728.

2. Calculate Pi of all fourfold tables satisfying |Di| ≥ |D*|.

3. Calculate the cumulative probability of the fourfold tables satisfying |Di| ≥ 66 and Pi ≤ P*. In this example, P1, P2, P3, P4, P5 and P10 meet these conditions (P9 = 0.09120390 > P* is excluded), so the cumulative probability is

$$P=P_1+P_2+P_3+P_4+P_5+P_{10}=0.1210>0.05$$

At the test level α = 0.05, H0 cannot be rejected. We cannot conclude that the fetal HBV infection rate of the prevention group differs from that of the non-prevention group.
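The two-sided rule can be sketched as follows, in plain Python with `math.comb`; `fisher_two_sided` is an illustrative name. It follows the text's criterion |Di| ≥ |D*| and Pi ≤ P*. A tiny relative tolerance keeps tables whose probability equals P* from being dropped by floating-point rounding. Other software may define the two-sided P slightly differently.

```python
from math import comb


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact P by the rule in the text:
    sum P_i over all tables with |D_i| >= |D*| and P_i <= P*."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(ai):   # hypergeometric probability of the table with top-left cell ai
        return comb(r1, ai) * comb(r2, c1 - ai) / comb(n, c1)

    def cross(ai):  # D_i = ad - bc for that table
        bi, ci = r1 - ai, c1 - ai
        return ai * (r2 - ci) - bi * ci

    d_star, p_star = cross(a), prob(a)
    total = 0.0
    for ai in range(max(0, c1 - r2), min(r1, c1) + 1):
        # small relative tolerance so tables with P_i equal to P* are kept
        if abs(cross(ai)) >= abs(d_star) and prob(ai) <= p_star * (1 + 1e-9):
            total += prob(ai)
    return total


print(round(fisher_two_sided(4, 18, 5, 6), 4))   # 0.121 -> P > 0.05, H0 not rejected
```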

i | a | b | c | d | Di = ad − bc | Pi
1 | 0 | 22 | 9 | 2 | −198 | 0.00000143
2 | 1 | 21 | 8 | 3 | −165 | 0.00009412
3 | 2 | 20 | 7 | 4 | −132 | 0.00197656
4 | 3 | 19 | 6 | 5 | −99 | 0.01844785
5* | 4 | 18 | 5 | 6 | −66* | 0.08762728*
6 | 5 | 17 | 4 | 7 | −33 |
7 | 6 | 16 | 3 | 8 | 0 |
8 | 7 | 15 | 2 | 9 | 33 |
9 | 8 | 14 | 1 | 10 | 66 | 0.09120390
10 | 9 | 13 | 0 | 11 | 99 | 0.01289752

Table 7-5 The Fisher exact probability calculation table for Example 7-4 (* observed table)

Example 7-5 A study compared P53 expression in gallbladder adenocarcinoma and gallbladder adenoma. Ten surgically resected specimens of each disease, collected over the same period, were examined for P53 expression by immunohistochemistry; the data are shown in Table 7-6. Is there a significant difference in the P53 positive rate between gallbladder adenocarcinoma and gallbladder adenoma?

Type | Positive | Negative | Total
Gallbladder adenocarcinoma | 6 | 4 | 10
Gallbladder adenoma | 1 | 9 | 10
Total | 7 | 13 | 20

Table 7-6 P53 positive expression in gallbladder adenocarcinoma and gallbladder adenoma

H0: π1 = π2, H1: π1 ≠ π2, α = 0.05.

i | a | b | c | d | Di = ad − bc | Pi
1 | 0 | 10 | 7 | 3 | −70 |
2 | 1 | 9 | 6 | 4 | −50 |
3 | 2 | 8 | 5 | 5 | −30 |
4 | 3 | 7 | 4 | 6 | −10 |
5 | 4 | 6 | 3 | 7 | 10 |
6 | 5 | 5 | 2 | 8 | 30 |
7* | 6 | 4 | 1 | 9 | 50* | 0.02708978*
8 | 7 | 3 | 0 | 10 | 70 | 0.00154799

Table 7-7 Fisher exact probability calculation table for Example 7-5 (* observed table)

In this sample a+b = c+d = 10. Table 7-7 shows that the combinations of the fourfold table are distributed symmetrically about the midpoint between i = 4 and i = 5.

1. Calculate D* and P* of the observed table: D* = 50, P* = 0.02708978.

2. Calculate Pi for each combination of the fourfold table with Di ≥ 50.

3. Calculate the cumulative probability for Di ≥ 50 and Pi ≤ P*; here these are P7 and P8:

$$P_7+P_8=0.02708978+0.00154799=0.0286$$

4. Calculate the two-sided cumulative probability. Because the combinations are symmetric:

$$P=2(P_7+P_8)=0.057$$

Since P > 0.05, H0 cannot be rejected at the test level α = 0.05. We cannot conclude that the P53 positive expression rates of gallbladder adenocarcinoma and gallbladder adenoma differ.

Notice: In Example 7-5, professional knowledge may indicate that the P53 expression rate in gallbladder adenocarcinoma should be higher than in gallbladder adenoma. In that case a one-sided test can be made: H0: π1 = π2, H1: π1 > π2, α = 0.05. The one-sided probability from Table 7-7 is P = P7 + P8 = 0.0286, so 0.01 < P < 0.05. We therefore reject H0 and accept H1, and conclude that the P53 expression rate in gallbladder adenocarcinoma is higher than in gallbladder adenoma.
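The one-sided rule and the symmetry shortcut can be checked with a short sketch in plain Python using `math.comb`. `fisher_one_sided` is an illustrative name, and treating D* = 0 like D* > 0 is an assumption the text does not cover. For Table 7-6, it gives the one-sided P = 0.0286. Doubling it, which is allowed because a+b = c+d, gives the two-sided 0.057.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact P by the rule in the text:
    D* > 0: sum P_i with D_i >= D* and P_i <= P*;
    D* < 0: sum P_i with D_i <= D* and P_i <= P*."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(ai):   # hypergeometric probability of the table with top-left cell ai
        return comb(r1, ai) * comb(r2, c1 - ai) / comb(n, c1)

    def cross(ai):  # D_i = ad - bc for that table
        bi, ci = r1 - ai, c1 - ai
        return ai * (r2 - ci) - bi * ci

    d_star, p_star = cross(a), prob(a)
    upper = d_star >= 0   # assumption: D* = 0 is treated like D* > 0
    total = 0.0
    for ai in range(max(0, c1 - r2), min(r1, c1) + 1):
        d_i = cross(ai)
        in_tail = d_i >= d_star if upper else d_i <= d_star
        if in_tail and prob(ai) <= p_star * (1 + 1e-9):
            total += prob(ai)
    return total


p_one = fisher_one_sided(6, 4, 1, 9)   # Table 7-6
print(round(p_one, 4))                 # 0.0286
# a + b == c + d here, so the combinations are symmetric and two-sided P = 2 * p_one
print(round(2 * p_one, 3))             # 0.057
```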

Section 5 Multiple comparison of several sample rates

Methods for the multiple comparison of several sample rates include the χ² partition method, Scheffé's method and the SNK method. This unit introduces only the χ² partition method.

χ² partition method

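One. Basic principle

The data for the multiple comparison of several sample rates (a 2×C table) can be partitioned into several fourfold tables, but the test level α must be re-set for these comparisons to keep the overall type I error under control.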

1. Pairwise comparisons among several sample rates

The test level must be re-set as

$$\alpha'=\frac{\alpha}{\frac{k(k-1)}{2}+1}$$

where k is the number of sample rates (groups) being compared.

2. Comparisons of several treatment groups with one control group

The test level must be re-set as

$$\alpha'=\frac{\alpha}{2(k-1)}$$

where k is the number of groups, including the control group.
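The two adjusted levels can be written as small functions. This is a plain-Python sketch with illustrative names. With k = 3 groups, both give α' = 0.0125, the value used in Examples 7-9 and 7-10.

```python
def alpha_pairwise(alpha: float, k: int) -> float:
    """Re-set test level for all pairwise comparisons among k groups:
    alpha' = alpha / (k(k-1)/2 + 1)."""
    return alpha / (k * (k - 1) / 2 + 1)


def alpha_vs_control(alpha: float, k: int) -> float:
    """Re-set test level when k-1 treatment groups are each compared with one control:
    alpha' = alpha / (2(k-1))."""
    return alpha / (2 * (k - 1))


print(alpha_pairwise(0.05, 3))    # 0.0125 (Example 7-9)
print(alpha_vs_control(0.05, 3))  # 0.0125 (Example 7-10)
```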

105 60

105 61

Two 、 inter-comparison of many groups

105 62

Example 7-9 Make pairwise comparisons for the data of Table 7-8 to test whether the effective rates of any two groups differ.

Test steps:

H0: π1 = π2, the total effective rates of the two groups compared are equal;

H1: π1 ≠ π2, the total effective rates of the two groups compared are not equal;

α = 0.05.

$$\alpha'=\frac{0.05}{3(3-1)/2+1}=\frac{0.05}{4}=0.0125$$

Group | Effective | Ineffective | Total | χ² | P
Physical group | 199 | 7 | 206 | 6.76 | <0.0125
Drug group | 164 | 18 | 182 | |
Total | 363 | 25 | 388 | |

Physical group | 199 | 7 | 206 | 21.32 | <0.00313
Plaster group | 118 | 26 | 144 | |
Total | 317 | 33 | 350 | |

Drug group | 164 | 18 | 182 | 4.59 | >0.0125
Plaster group | 118 | 26 | 144 | |
Total | 282 | 44 | 326 | |

Table 7-12 Pairwise comparisons of the three methods

At the level α' = 0.0125:
• physical group vs drug group: reject H0 and accept H1;
• physical group vs plaster group: reject H0 and accept H1;
• drug group vs plaster group: H0 is not rejected.

We may conclude that the effective rate of the physical group differs from those of the drug group and the plaster group, but we cannot conclude that the effective rates of the drug group and the plaster group differ.
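The three pairwise tests of Table 7-12 can be reproduced with a short plain-Python sketch (standard library only; the dictionary layout and names are illustrative). Each pair is tested with the special fourfold formula, since n ≥ 40 and every theoretical frequency here is at least 5. The ν = 1 P value is then compared with α' = 0.0125.

```python
import math

alpha_prime = 0.05 / (3 * (3 - 1) / 2 + 1)    # 0.0125

# (effective, ineffective) counts from Table 7-12
groups = {
    "physical": (199, 7),
    "drug": (164, 18),
    "plaster": (118, 26),
}


def chi2_fourfold(a, b, c, d):
    """Special formula; all theoretical frequencies here are >= 5, so no correction."""
    n = a + b + c + d
    return (a * d - b * c) ** 2 * n / ((a + b) * (c + d) * (a + c) * (b + d))


results = {}
for g1, g2 in [("physical", "drug"), ("physical", "plaster"), ("drug", "plaster")]:
    chi2 = chi2_fourfold(*groups[g1], *groups[g2])
    p = math.erfc(math.sqrt(chi2 / 2))         # nu = 1
    results[(g1, g2)] = (chi2, p, p < alpha_prime)
    print(g1, "vs", g2, round(chi2, 2), "reject H0" if p < alpha_prime else "do not reject H0")
```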

Three. Comparison of several treatment groups with one control group

Example 7-10 Take the drug group as the control group, and the physical group and plaster group as treatment groups. Do the total effective rates of the two treatment groups differ from that of the control group?

H0: πT = πC, the total effective rates of the treatment group and the control group are equal;

H1: πT ≠ πC, the total effective rates of the treatment group and the control group are not equal;

α = 0.05.

$$\alpha'=\frac{0.05}{2(3-1)}=0.0125$$

Physical group vs drug group: χ² = 6.76, P < 0.0125.

Plaster group vs drug group: χ² = 4.59, P > 0.0125.

At the level α' = 0.0125, reject H0 and accept H1 for the physical group vs the drug group: the total effective rates of the physical group and the drug group differ. For the plaster group vs the drug group, H0 is not rejected, so we cannot conclude that the two total effective rates differ. Combined with the results of Table 7-8, the effective rate of the physical group is higher than that of the drug group.

Section 6 The χ² test for linear trend in ordered grouped data (omitted)

Section 7 The χ² test for goodness of fit of frequency distributions

The Pearson χ² reflects the agreement between actual and theoretical frequencies. It can therefore be used to infer whether observed data follow a given theoretical frequency distribution, for example the normal, binomial, Poisson or negative binomial distribution.

Example 7-12 To study the distribution of Keshan disease patients across a region, an investigator divided the region into 279 units and counted the total number of cases in each unit over past years. The data are shown in columns (1) and (2) of Table 7-15. Do these data follow a Poisson distribution?

Cases (X) | Observed A | P(X) | Theoretical T | (A−T)²/T
(1) | (2) | (3) | (4) = (3) × n | (5)
0 | 26 | 0.0854 | 23.8 | 0.20
1 | 51 | 0.2102 | 58.6 | 0.99
2 | 75 | 0.2585 | 72.1 | 0.12
3 | 63 | 0.2120 | 59.1 | 0.26
4 | 38 | 0.1304 | 36.4 | 0.07
5 | 17 | 0.0641 | 17.9 | 0.05
6 | 5 | 0.0263 | 7.3 | 0.36 (X ≥ 6 merged: A = 9, T = 11)
7 | 3 | 0.0092 | 2.6 |
≥8 | 1 | 0.0039* | 1.1 |
Total | 279 (n) | | | 2.05 (χ²)

Table 7-15 Poisson distribution test

* P(X ≥ 8) = 1 − 0.9961 = 0.0039

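n = 279, ΣfX = 686, ΣfX² = 2342

$$\bar X=\frac{686}{279}=2.46,\qquad S^2=\frac{2342-686^2/279}{279-1}=2.36$$

The mean and the variance are close, and for a Poisson distribution the mean equals the variance. It is therefore reasonable to test whether these data follow a Poisson distribution.

H0: these data follow a Poisson distribution;

H1: these data do not follow a Poisson distribution;

α = 0.10.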

Use the Poisson probability function

$$P(X)=\frac{e^{-\lambda}\lambda^X}{X!},\qquad \lambda=\bar X=2.46$$

From it we obtain the probability P(X) for X = 0, 1, 2, …, the theoretical frequency T_X = n·P(X), and (A − T)²/T for each row.

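$$\chi^2=\sum\frac{(A-T)^2}{T}=2.05$$

The groups with T6, T7 and T8 were merged, so only seven groups remain. Two degrees of freedom are lost: one for the fixed total n and one for the parameter λ estimated from the sample. Hence ν = 7 − 2 = 5. From the χ² critical value table, 0.75 < P < 0.90. At the test level α = 0.10, H0 is not rejected, so we may consider that these data follow a Poisson distribution.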
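The whole goodness-of-fit calculation can be sketched in plain Python (standard library only). The sketch makes three assumptions, each labelled in the comments:
• the open class "≥ 8" is counted as 8 when computing the mean;
• λ is estimated by the unrounded sample mean;
• tail classes are merged until the last theoretical frequency is at least 5, which reproduces the merging of X = 6, 7, ≥ 8 used in the text.

Because the sketch keeps full precision instead of λ = 2.46 and two-decimal terms, it gives χ² of about 2.04 rather than 2.05, with the same conclusion.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """P(chi2 >= x): series for the regularized lower incomplete gamma function."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    k = 0
    while term > total * 1e-16:
        k += 1
        term *= z / (a + k)
        total += term
    return max(0.0, 1.0 - total * math.exp(-z + a * math.log(z) - math.lgamma(a)))


# Table 7-15: observed numbers of units with X = 0, 1, ..., 7 and ">= 8" cases
observed = [26, 51, 75, 63, 38, 17, 5, 3, 1]
n = sum(observed)                                          # 279
sum_fx = sum(x * f for x, f in enumerate(observed))        # 686 (assumption: ">= 8" counted as 8)
sum_fx2 = sum(x * x * f for x, f in enumerate(observed))   # 2342
mean = sum_fx / n
var = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
print(round(mean, 2), round(var, 2))                       # 2.46 2.36

lam = mean                                                 # assumption: unrounded sample mean
probs = [math.exp(-lam) * lam ** x / math.factorial(x) for x in range(8)]
probs.append(1.0 - sum(probs))                             # P(X >= 8)
expected = [n * p for p in probs]

# Assumed merging rule: pool tail classes until the last theoretical frequency is >= 5
A, T = observed[:], expected[:]
while T[-1] < 5:
    last_a, last_t = A.pop(), T.pop()
    A[-1] += last_a
    T[-1] += last_t

chi2 = sum((a - t) ** 2 / t for a, t in zip(A, T))
df = len(A) - 2        # minus 1 for the fixed total n, minus 1 for the estimated lambda
p = chi2_sf(chi2, df)
print(len(A), df, f"{chi2:.3f}", 0.75 < p < 0.90)          # 7 groups, df 5, chi2 about 2.04, True
```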
