105 1 Chi-Square Test Chapter 7


Content

• $\chi^2$ test of fourfold data
• $\chi^2$ test of paired fourfold data
• Fisher exact probabilities in fourfold data
• $\chi^2$ test of R×C table
• Multiple comparison of sample rates
• $\chi^2$ test of goodness of fit

Objectives:
• to infer whether the rates or composition ratios differ between two populations or among more than two populations;
• to make multiple comparisons among the rates of several samples;
• to infer whether two categorical variables are associated;
• to test goodness of fit.

Test statistic: $\chi^2$

Suitable for: qualitative (categorical) data

Section 1 $\chi^2$ test of fourfold data

Objective: to judge whether the rates or composition ratios of two populations differ (equivalent to the u-test for two rates).

Requirement: the individuals of the two samples, each classified into two categories, must be arranged as fourfold (2×2) table data.
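The equivalence with the u-test can be checked numerically. The sketch below is an illustration, not the chapter's own procedure: it computes the pooled two-rate u statistic for the data of Example 7-1 (Table 7-1: 99 of 104 effective versus 75 of 96) and compares $u^2$ with the uncorrected fourfold $\chi^2$. For a 2×2 table without continuity correction the two agree exactly. The function names are hypothetical.

```python
import math

def u_two_rates(x1: int, n1: int, x2: int, n2: int) -> float:
    """Pooled u (z) statistic for comparing two sample rates."""
    p1, p2 = x1 / n1, x2 / n2
    pc = (x1 + x2) / (n1 + n2)  # pooled rate under H0: pi1 = pi2
    se = math.sqrt(pc * (1 - pc) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

def chi2_fourfold(a: int, b: int, c: int, d: int) -> float:
    """Uncorrected chi-square of a fourfold table (special formula)."""
    n = a + b + c + d
    return (a * d - b * c) ** 2 * n / ((a + b) * (c + d) * (a + c) * (b + d))

# Example 7-1: experimental 99 effective / 5 ineffective, control 75 / 21
u = u_two_rates(99, 104, 75, 96)
chi2 = chi2_fourfold(99, 5, 75, 21)
print(f"u = {u:.3f}, u^2 = {u * u:.2f}, chi2 = {chi2:.2f}")
```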

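1. The $\chi^2$ distribution

(1) The $\chi^2$ distribution is a continuous distribution with probability density
$$f(\chi^2)=\frac{(\chi^2)^{\nu/2-1}e^{-\chi^2/2}}{2^{\nu/2}\,\Gamma(\nu/2)},\qquad \chi^2>0,$$
where $\nu$ is the degrees of freedom; the shape of the distribution depends only on $\nu$.

(2) One of its basic properties is additivity: if $\chi^2_1\sim\chi^2(\nu_1)$ and $\chi^2_2\sim\chi^2(\nu_2)$ are independent, then $\chi^2_1+\chi^2_2\sim\chi^2(\nu_1+\nu_2)$.

(3) Critical value of $\chi^2$: $\chi^2_{\alpha,\nu}$ denotes the value whose upper-tail area under the $\chi^2$ distribution with $\nu$ degrees of freedom equals $\alpha$.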

[Figure: density curves $f(\chi^2)$ of the $\chi^2$ distribution for several degrees of freedom $\nu$; horizontal axis $\chi^2$ from 0 to 16, vertical axis $f(\chi^2)$ from 0 to 0.5.]
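The critical values $\chi^2_{\alpha,\nu}$ found in tables can be reproduced with a short routine. This is a sketch under stated assumptions, not the textbook's appendix: it computes the upper-tail probability $P(\chi^2_\nu>x)$ from the standard series for the regularized lower incomplete gamma function, $P(\chi^2_\nu\le x)=P(\nu/2,\,x/2)$, and then finds $\chi^2_{\alpha,\nu}$ by bisection. The names `chi2_sf` and `chi2_critical` are hypothetical.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(chi2_df > x) via the regularized gamma series."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # P(a, z) = z^a e^-z / Gamma(a+1) * sum_{n>=0} z^n / ((a+1)(a+2)...(a+n))
    term = total = 1.0
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = math.exp(a * math.log(z) - z - math.lgamma(a + 1)) * total
    return max(0.0, 1.0 - lower)

def chi2_critical(alpha: float, df: int) -> float:
    """Value whose upper-tail area equals alpha (bisection on chi2_sf)."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:  # widen the bracket until it contains the root
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for nu in (1, 2, 5):
    print(nu, round(chi2_critical(0.05, nu), 2))
```

Bisection is used only because the tail probability decreases monotonically in $x$; any root finder would do.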

2. The basic idea of the $\chi^2$ test

Example 7-1 A hospital wanted to compare the curative effect of drug A (experimental group) and drug B (control group) in lowering intracranial pressure. It randomly divided 200 patients with raised intracranial pressure into two groups; the results are shown in Table 7-1. Do the effective rates of the two drugs differ?

Group                Effective       Ineffective     Total        Effective rate (%)
Experimental group   99 (90.48) a    5 (13.52) b     104 (a+b)    95.19
Control group        75 (83.52) c    21 (12.48) d    96 (c+d)     78.13
Total                174 (a+c)       26 (b+d)        200 (n)      87.00

Table 7-1 Comparison of the effective rates of two groups in lowering intracranial pressure (theoretical frequencies in parentheses)

These data can be arranged in the form of Chart 7-2: there are two treatment groups, and each group consists of two parts, occurred and not occurred. The table contains four basic data (99, 5, 75 and 21), from which all the other data can be derived; that is why it is called fourfold table data.

Treatment group   Occurred   Not occurred   Total
First             a          b              a+b
Second            c          d              c+d
Total             a+c        b+d            n

Chart 7-2 The basic form of fourfold table data

Basic idea: it can be understood through the basic formula of the $\chi^2$ test:
$$\chi^2=\sum\frac{(A-T)^2}{T},\qquad \nu=(\text{number of rows}-1)(\text{number of columns}-1)\tag{7-1}$$
where $A$ is the actual (observed) frequency and $T$ is the theoretical frequency.

The theoretical frequencies can be calculated with the following formula:
$$T_{RC}=\frac{n_R\,n_C}{n}$$
where $T_{RC}$ is the theoretical frequency in row $R$ and column $C$, $n_R$ is the total of the corresponding row, $n_C$ is the total of the corresponding column, and $n$ is the grand total.
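As an illustration (not part of the original slides), the formula $T_{RC}=n_Rn_C/n$ can be applied to every cell of a table at once. The sketch assumes the table is given as a list of rows of observed counts; `expected_frequencies` is a hypothetical helper name.

```python
def expected_frequencies(table: list[list[int]]) -> list[list[float]]:
    """Theoretical frequency T_RC = n_R * n_C / n for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Table 7-1: rows = experimental, control; columns = effective, ineffective
observed = [[99, 5], [75, 21]]
for row in expected_frequencies(observed):
    print([round(t, 2) for t in row])
```

The same function works for any R×C table, since it only needs the row and column totals.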

The theoretical frequencies are derived under the null hypothesis $H_0:\pi_1=\pi_2$: both groups are assumed to share the pooled rate of the merged samples (here $174/200=87.00\%$ effective).

The test statistic $\chi^2$: its value reflects the agreement between the actual frequencies and the theoretical frequencies; the larger the differences between $A$ and $T$, the larger $\chi^2$.

From formula (7-1) we can see that the value of $\chi^2$ also depends on the number of cells being summed (more exactly, on the degrees of freedom $\nu$). $\nu$ is determined by the number of cells whose frequencies can vary freely once the marginal totals are fixed, not by the sample size.

3. The process of hypothesis testing

(1) Establish the hypotheses and set the test criterion.
$H_0:\pi_1=\pi_2$, the effective rates of the two populations (experimental and control) in lowering intracranial pressure are equal;
$H_1:\pi_1\neq\pi_2$, the two effective rates are not equal;
$\alpha=0.05$.

(2) Calculate the test statistic.
$$T_{11}=\frac{104\times174}{200}=90.48,\qquad T_{12}=104-90.48=13.52,$$
$$T_{21}=174-90.48=83.52,\qquad T_{22}=26-13.52=12.48.$$
$$\chi^2=\frac{(99-90.48)^2}{90.48}+\frac{(5-13.52)^2}{13.52}+\frac{(75-83.52)^2}{83.52}+\frac{(21-12.48)^2}{12.48}=12.86$$
$$\nu=(2-1)(2-1)=1$$

(3) Determine the P value and draw the conclusion. Consulting the $\chi^2$ critical value table (Appendix Table 8) with $\nu=1$, we find $P<0.05$. At the test level $\alpha=0.05$ we reject $H_0$ and accept $H_1$: the effective rates of the two groups in lowering intracranial pressure are different, and the experimental group (drug A) is better than the control group (drug B).
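The three steps can be reproduced in a few lines. This is a sketch, not the textbook's procedure: it applies formula (7-1) to an R×C table of observed counts and returns $\chi^2$ and $\nu$; for $\nu=1$ only, it converts $\chi^2$ into a P value with the identity $P(\chi^2_1>x)=\operatorname{erfc}(\sqrt{x/2})$, so no table is needed. The function name is hypothetical.

```python
import math

def pearson_chi2(table):
    """Basic formula (7-1): sum of (A - T)^2 / T, with nu = (R-1)(C-1)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, a in enumerate(row):
            t = row_totals[i] * col_totals[j] / n  # theoretical frequency
            chi2 += (a - t) ** 2 / t
    nu = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, nu

chi2, nu = pearson_chi2([[99, 5], [75, 21]])  # Table 7-1
p = math.erfc(math.sqrt(chi2 / 2))  # valid for nu == 1 only
print(f"chi2 = {chi2:.2f}, nu = {nu}, P = {p:.5f}")
```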

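2. The special formula for fourfold tables
$$\chi^2=\frac{(ad-bc)^2\,n}{(a+b)(c+d)(a+c)(b+d)}$$
For Example 7-1:
$$\chi^2=\frac{(99\times21-5\times75)^2\times200}{104\times96\times174\times26}=12.86$$
which is the same as the result of the basic formula.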
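A quick numerical check, offered as an illustration: the special formula is an algebraic rearrangement of (7-1) for 2×2 tables, so both must give the same value for any fourfold table with nonzero margins. The sketch compares them on the data of Tables 7-1 and 7-2; the function names are hypothetical.

```python
def chi2_basic(a, b, c, d):
    """Basic formula (7-1) applied to a fourfold table."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    cells = ((a, 0, 0), (b, 0, 1), (c, 1, 0), (d, 1, 1))
    return sum((x - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
               for x, i, j in cells)

def chi2_special(a, b, c, d):
    """Special (shortcut) formula for a fourfold table."""
    n = a + b + c + d
    return (a * d - b * c) ** 2 * n / ((a + b) * (c + d) * (a + c) * (b + d))

for table in [(99, 5, 75, 21), (46, 6, 18, 8)]:  # Tables 7-1 and 7-2
    print(table, round(chi2_basic(*table), 4), round(chi2_special(*table), 4))
```

The special formula avoids computing the four theoretical frequencies, which is why it is preferred for hand calculation.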

The $\chi^2$ distribution is continuous, whereas fourfold table data are discrete, so the $\chi^2$ values calculated from them are also discrete. To improve the approximation of the statistic to the continuous $\chi^2$ distribution, a continuity correction is needed.

3. The corrected formula
$$\chi^2_c=\sum\frac{(|A-T|-0.5)^2}{T}$$
$$\chi^2_c=\frac{\left(|ad-bc|-\dfrac{n}{2}\right)^2 n}{(a+b)(c+d)(a+c)(b+d)}$$

Conditions for choosing the test formula for fourfold table data:
• $n\ge40$ and all $T\ge5$: special formula;
• $n\ge40$ and some $1\le T<5$: corrected formula;
• $n<40$ or some $T<1$: Fisher exact probabilities method.

The continuity correction of the $\chi^2$ test applies only to fourfold table data, where $\nu=1$; when $\nu$ is greater than 1, no correction should be made.
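These rules can be encoded directly. The helper below is a hypothetical sketch: it computes $n$ and the smallest theoretical frequency of a fourfold table and returns the procedure that the rules above select.

```python
def choose_fourfold_method(a: int, b: int, c: int, d: int) -> str:
    """Pick the test for a fourfold table from n and the smallest T."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    t_min = min(r * k / n for r in rows for k in cols)  # smallest theoretical frequency
    if n < 40 or t_min < 1:
        return "Fisher exact probabilities"
    if t_min < 5:
        return "corrected formula"
    return "special formula"

print(choose_fourfold_method(99, 5, 75, 21))  # Table 7-1 -> special formula
print(choose_fourfold_method(46, 6, 18, 8))   # Table 7-2 -> corrected formula
print(choose_fourfold_method(4, 18, 5, 6))    # Table 7-4 -> Fisher exact probabilities
```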

Example 7-2 A doctor wanted to compare the effects of drug A and drug B in treating cerebrovascular diseases. He randomly divided 78 patients with such diseases into two groups; the results are shown in Table 7-2. Is the curative effect of the two drugs the same?

Table 7-2 Comparison of the effective rates of two drugs in treating cerebrovascular diseases

Group                Effective   Ineffective   Total   Effective rate (%)
Citicoline group     46          6             52      88.46
Ganglioside group    18          8 (4.67)      26      69.23
Total                64          14            78      82.05

$H_0:\pi_1=\pi_2$, $H_1:\pi_1\neq\pi_2$, $\alpha=0.05$.

In this case $n=78>40$ and the smallest theoretical frequency is $T_{22}=4.67$, with $1<T_{22}<5$, so the corrected formula should be used:
$$\chi^2_c=\frac{(|46\times8-6\times18|-78/2)^2\times78}{52\times26\times64\times14}=3.14$$
With $\nu=1$, the $\chi^2$ critical value table gives $0.05<P<0.10$. At the test level $\alpha=0.05$, $H_0$ cannot be rejected, so we cannot conclude that the effective rates of the two drugs in treating cerebrovascular diseases are different.

If the correction is not made, $\chi^2=4.35$ and $P<0.05$, so the conclusion would be the opposite.
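The effect of the correction on Example 7-2 can be reproduced with the two fourfold formulas. In this sketch (function names hypothetical) the P values for $\nu=1$ come from $P(\chi^2_1>x)=\operatorname{erfc}(\sqrt{x/2})$ instead of the critical value table.

```python
import math

def chi2_fourfold(a, b, c, d, corrected=False):
    """Special formula; with corrected=True, |ad - bc| is reduced by n/2."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if corrected:
        diff = diff - n / 2
    return diff ** 2 * n / ((a + b) * (c + d) * (a + c) * (b + d))

def p_value_df1(chi2):
    """Upper-tail probability of the chi-square distribution with nu = 1."""
    return math.erfc(math.sqrt(chi2 / 2))

for corrected in (True, False):
    x = chi2_fourfold(46, 6, 18, 8, corrected)  # Table 7-2
    print(f"corrected={corrected}: chi2 = {x:.2f}, P = {p_value_df1(x):.4f}")
```

The corrected P lands between 0.05 and 0.10 and the uncorrected one below 0.05, which is exactly the reversal described above.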

Section 2 $\chi^2$ test of paired fourfold table data

As with measurement data, inference about the difference between two population rates (proportions) for enumeration (count) data can use either a group design or a paired design; the corresponding data are fourfold table data and paired fourfold table data, respectively.

Example 7-3 A laboratory measured serum antinuclear antibodies in 58 patients with suspected systemic lupus erythematosus by both latex agglutination and immunofluorescence; the results are shown in Table 7-3. Is there a difference between the two methods?

                      Latex agglutination
Immunofluorescence    +         -         Total
+                     11 (a)    12 (b)    23
-                     2 (c)     33 (d)    35
Total                 13        45        58

Table 7-3 Results of the two methods

In a paired design, each pair has four possible combinations of results under the two methods:

① positive by both methods (a);
② negative by both methods (d);
③ positive by immunofluorescence, negative by latex agglutination (b);
④ positive by latex agglutination, negative by immunofluorescence (c).

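$a$ and $d$ are the pairs on which the two methods agree; $b$ and $c$ are the pairs on which they disagree.

Statistic:
when $b+c\ge40$,
$$\chi^2=\frac{(b-c)^2}{b+c},\qquad \nu=1$$
when $b+c<40$, the corrected formula is used:
$$\chi^2_c=\frac{(|b-c|-1)^2}{b+c},\qquad \nu=1$$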

Caution: this method is generally used when the sample size is not large.

Reasons:
1. it considers only the pairs on which the methods disagree ($b$, $c$);
2. it ignores the sample size $n$ and the pairs on which the methods agree ($a$, $d$).

When $n$, $a$ and $d$ are large while $b$ and $c$ are relatively small, a statistically significant result may have no practical significance.

Steps of the test:

$H_0:B=C$, $H_1:B\neq C$, $\alpha=0.05$.

Since $b+c=12+2=14<40$, the corrected formula is used:
$$\chi^2_c=\frac{(|12-2|-1)^2}{12+2}=5.79,\qquad \nu=1$$
From the $\chi^2$ critical value table, $0.01<P<0.025$. At the level $\alpha=0.05$, we reject $H_0$ and accept $H_1$: there is a difference between the two methods, and the positive rate of immunofluorescence is higher than that of latex agglutination.
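A minimal sketch of the paired test (widely known as McNemar's test), assuming only the discordant counts $b$ and $c$ are supplied. It applies the uncorrected formula when $b+c\ge40$ and the corrected one otherwise, following the rule above; the function name is hypothetical, and the $\nu=1$ P value uses $\operatorname{erfc}(\sqrt{x/2})$.

```python
import math

def paired_chi2(b: int, c: int):
    """Paired fourfold chi-square from the discordant cells b and c."""
    if b + c >= 40:
        chi2 = (b - c) ** 2 / (b + c)
    else:
        chi2 = (abs(b - c) - 1) ** 2 / (b + c)  # corrected formula
    p = math.erfc(math.sqrt(chi2 / 2))  # nu = 1
    return chi2, p

chi2, p = paired_chi2(12, 2)  # Table 7-3
print(f"chi2_c = {chi2:.2f}, P = {p:.4f}")
```

Note that $a$ and $d$ never enter the calculation, which is the limitation described in the cautions.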

Section 3 Fisher exact probabilities method for 2×2 tables

Conditions: $n<40$, or $T<1$, or when the P value obtained from the $\chi^2$ test is close to the test level $\alpha$.

Theoretical basis: the hypergeometric distribution. It is not a $\chi^2$ test.

Example 7-4 A doctor studied the preventive effect of hepatitis B immunoglobulin against intrauterine infection of the fetus. He randomly divided 33 HBsAg-positive patients into two groups, a prevention group and a non-prevention group; the results are shown in Table 7-4. Is there a difference in the fetal infection rate between the two groups?

Group            Positive   Negative   Total   Infection rate (%)
Prevention       4          18         22      18.18
No prevention    5 (3)      6          11      45.45
Total            9          24         33      27.27

Table 7-4 Comparison of the fetal HBV infection rates of the two groups

Basic idea: with the marginal totals of the fourfold table fixed, calculate the probabilities of all possible combinations of the four actual frequencies, then draw the inference by comparing the cumulative probability with the $\alpha$ level.

1. Calculate $P_i$

Number of combinations = smallest marginal total + 1. For Example 7-4 the number of combinations is $9+1=10$:

i      a   b    c   d    ad-bc
(1)    0   22   9   2    -198
(2)    1   21   8   3    -165
(3)    2   20   7   4    -132
(4)    3   19   6   5    -99
(5)    4   18   5   6    -66
(6)    5   17   4   7    -33
(7)    6   16   3   8    0
(8)    7   15   2   9    33
(9)    8   14   1   10   66
(10)   9   13   0   11   99

The probabilities $P_i$ of all combinations sum to 1. Calculation formula:
$$P_i=\frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{a!\,b!\,c!\,d!\,n!}$$
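The enumeration and the factorial formula translate directly into code. The sketch below (helper name hypothetical) keeps the margins of the observed table fixed, lets $a$ run over its possible values, and evaluates $P_i$ with exact integer factorials, so it should reproduce the $P_i$ column of the chapter's calculating tables.

```python
from math import factorial

def fisher_tables(a: int, b: int, c: int, d: int):
    """All fourfold tables with the observed margins, with D_i and P_i."""
    r1, r2, c1, c2 = a + b, c + d, a + c, b + d
    n = r1 + r2
    numer = factorial(r1) * factorial(r2) * factorial(c1) * factorial(c2)
    tables = []
    for ai in range(max(0, c1 - r2), min(r1, c1) + 1):
        bi, ci = r1 - ai, c1 - ai
        di = r2 - ci
        denom = (factorial(ai) * factorial(bi) * factorial(ci)
                 * factorial(di) * factorial(n))
        tables.append((ai, bi, ci, di, ai * di - bi * ci, numer / denom))
    return tables

for row in fisher_tables(4, 18, 5, 6):  # Table 7-4
    print(row[:5], f"{row[5]:.8f}")
```

Python's exact integers keep the huge factorials lossless until the final division.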

2. Calculation of the cumulative probabilities

Let the cross-product difference of the observed fourfold table be $D^*=a^*d^*-b^*c^*$, with probability $P^*$, and let $D_i$ and $P_i$ denote the cross-product difference and the probability of each other combination.

(1) One-sided test
If $D^*>0$ in the observed table, sum the probabilities of all combinations with $D_i\ge D^*$ and $P_i\le P^*$. If $D^*<0$, sum the probabilities of all combinations with $D_i\le D^*$ and $P_i\le P^*$.

(2) Two-sided test
Sum the probabilities of all combinations of the fourfold table satisfying $|D_i|\ge|D^*|$ and $P_i\le P^*$.
If $a+b=c+d$ or $a+c=b+d$, the probabilities of the combinations are symmetric, and the two-sided cumulative probability can be obtained simply as the one-sided cumulative probability × 2.

Testing procedure (in this example $n=33<40$):

$H_0:\pi_1=\pi_2$, $H_1:\pi_1\neq\pi_2$, $\alpha=0.05$.

1. Calculate $D^*$ and $P^*$ of the observed fourfold table, as well as $D_i$ of all the fourfold tables (see Table 7-5). In this example $D^*=-66$ and $P^*=0.08762728$.

2. Calculate $P_i$ of all the fourfold tables satisfying $|D_i|\ge|D^*|$.

3. Sum the probabilities of the fourfold tables satisfying $|D_i|\ge66$ and $P_i\le P^*$. In this example $P_1$, $P_2$, $P_3$, $P_4$, $P_5$ and $P_{10}$ meet these conditions ($P_9$ has $|D_9|=66$ but $P_9>P^*$). The cumulative probability is
$$P=P_1+P_2+P_3+P_4+P_5+P_{10}=0.1210>0.05$$
At the test level $\alpha=0.05$ we cannot conclude that the fetal HBV infection rate of the prevention group differs from that of the non-prevention group.

i     a   b    c   d    Di = ad-bc   Pi
1     0   22   9   2    -198         0.00000143
2     1   21   8   3    -165         0.00009412
3     2   20   7   4    -132         0.00197656
4     3   19   6   5    -99          0.01844785
5*    4   18   5   6    -66*         0.08762728*
6     5   17   4   7    -33
7     6   16   3   8    0
8     7   15   2   9    33
9     8   14   1   10   66           0.09120390
10    9   13   0   11   99           0.01289752

Table 7-5 Fisher exact probability calculating table of Example 7-4 (* observed table)
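The two-sided rule ($|D_i|\ge|D^*|$ and $P_i\le P^*$) can be checked against Table 7-5 with a short, self-contained sketch. The function name is hypothetical; $P_i$ is written in the equivalent hypergeometric form $\binom{a+b}{a}\binom{c+d}{c}/\binom{n}{a+c}$, and a tiny relative tolerance is used when comparing $P_i$ with $P^*$ as a safeguard against rounding.

```python
from math import comb

def fisher_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Sum P_i over tables with |D_i| >= |D*| and P_i <= P* (fixed margins)."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(ai):  # hypergeometric probability, equal to the factorial formula
        return comb(r1, ai) * comb(r2, c1 - ai) / comb(n, c1)

    def cross_diff(ai):  # D_i = a_i d_i - b_i c_i
        bi, ci = r1 - ai, c1 - ai
        return ai * (r2 - ci) - bi * ci

    p_star, d_star = prob(a), cross_diff(a)
    total = 0.0
    for ai in range(max(0, c1 - r2), min(r1, c1) + 1):
        if abs(cross_diff(ai)) >= abs(d_star) and prob(ai) <= p_star * (1 + 1e-9):
            total += prob(ai)
    return total

print(f"P = {fisher_two_sided(4, 18, 5, 6):.4f}")  # Example 7-4
```

Table 9 is skipped automatically because its $P_i$ exceeds $P^*$.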

Example 7-5 A study examined P53 expression in gallbladder adenocarcinoma and gallbladder adenoma. P53 expression was detected by immunohistochemistry in 10 surgically excised specimens of each disease collected over the same period; the data are shown in Table 7-6. Is there a significant difference in the P53 positive rate between gallbladder adenocarcinoma and gallbladder adenoma?

Type                          Positive   Negative   Total
Gallbladder adenocarcinoma    6          4          10
Gallbladder adenoma           1          9          10
Total                         7          13         20

Table 7-6 P53 positive expression rates of gallbladder adenocarcinoma and gallbladder adenoma

$H_0:\pi_1=\pi_2$, $H_1:\pi_1\neq\pi_2$, $\alpha=0.05$.

i     a   b    c   d    Di = ad-bc   Pi
1     0   10   7   3    -70
2     1   9    6   4    -50
3     2   8    5   5    -30
4     3   7    4   6    -10
5     4   6    3   7    10
6     5   5    2   8    30
7*    6   4    1   9    50*          0.02708978*
8     7   3    0   10   70           0.00154799

Table 7-7 Fisher exact probability calculating table of Example 7-5 (* observed table)

In this sample $a+b=c+d=10$, and Table 7-7 shows that the combinations of the fourfold table are distributed symmetrically about the centre between $i=4$ and $i=5$.

1. Calculate $D^*$ and $P^*$ of the observed sample: $D^*=50$, $P^*=0.02708978$.
2. Calculate $P_i$ of every combination of the fourfold table with $D_i\ge50$.
3. Sum the probabilities with $D_i\ge50$ and $P_i\le P^*$; here these are $P_7$ and $P_8$:
$$P_7+P_8=0.02708978+0.00154799=0.0286$$
4. Because the combinations are symmetric, the two-sided cumulative probability is
$$P=2(P_7+P_8)=0.057$$
Since $P>0.05$, $H_0$ cannot be rejected at the test level $\alpha=0.05$, so we cannot conclude that the P53 positive expression rates of gallbladder adenocarcinoma and gallbladder adenoma differ.

Note: in Example 7-5, if professional knowledge indicates that the P53 expression rate in gallbladder adenocarcinoma should be higher than in gallbladder adenoma, a one-sided test can be made: $H_0:\pi_1=\pi_2$, $H_1:\pi_1>\pi_2$, $\alpha=0.05$. The one-sided probability from Table 7-7 is $P=P_7+P_8=0.0286$, so $0.01<P<0.05$; we reject $H_0$ and accept $H_1$, and conclude that the P53 expression rate in gallbladder adenocarcinoma is higher than in gallbladder adenoma.
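A matching sketch of the one-sided rule (function name hypothetical): for $D^*>0$ it sums $P_i$ over tables with $D_i\ge D^*$ and $P_i\le P^*$, and for $D^*<0$ over $D_i\le D^*$. Applied to Table 7-6 it should give the one-sided 0.0286, and doubling it gives the two-sided 0.057 because the margins are symmetric.

```python
from math import comb

def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """One-sided cumulative probability in the direction of D* = ad - bc."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(ai):  # hypergeometric probability of the table with first cell ai
        return comb(r1, ai) * comb(r2, c1 - ai) / comb(n, c1)

    def cross_diff(ai):  # D_i = a_i d_i - b_i c_i
        ci = c1 - ai
        return ai * (r2 - ci) - (r1 - ai) * ci

    d_star, p_star = cross_diff(a), prob(a)
    total = 0.0
    for ai in range(max(0, c1 - r2), min(r1, c1) + 1):
        di = cross_diff(ai)
        same_side = di >= d_star if d_star > 0 else di <= d_star
        if same_side and prob(ai) <= p_star * (1 + 1e-9):
            total += prob(ai)
    return total

p_one = fisher_one_sided(6, 4, 1, 9)  # Table 7-6
print(f"one-sided P = {p_one:.4f}, doubled = {2 * p_one:.3f}")
```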

Section 5 Multiple comparison of sample rates

Methods for multiple comparison of sample rates include the $\chi^2$ partition method, Scheffé's method and the SNK method. This unit introduces only the $\chi^2$ partition method.

The $\chi^2$ partition method

One. Basic idea

Data for the multiple comparison of several sample rates can be partitioned into several fourfold (2×2) tables for pairwise comparison, but the test level must then be reset.

1. Pairwise comparisons among several sample rates

The test level must be reset as
$$\alpha'=\frac{\alpha}{\dfrac{k(k-1)}{2}+1}$$
where $k$ is the number of groups being compared.

2. Several treatment groups compared with one control group

The test level must be reset as
$$\alpha'=\frac{\alpha}{2(k-1)}$$
where $k$ is the number of groups, including the control group.
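Both adjustments are simple arithmetic. The sketch below (hypothetical function names) evaluates them for three groups, the setting of the examples that follow, where both give 0.0125.

```python
def alpha_pairwise(alpha: float, k: int) -> float:
    """Adjusted level for all pairwise comparisons among k groups."""
    return alpha / (k * (k - 1) / 2 + 1)

def alpha_vs_control(alpha: float, k: int) -> float:
    """Adjusted level for comparing k - 1 treatment groups with one control."""
    return alpha / (2 * (k - 1))

print(alpha_pairwise(0.05, 3), alpha_vs_control(0.05, 3))
```

The two formulas coincide for $k=3$ but diverge as $k$ grows, because the number of pairwise comparisons rises quadratically while the number of comparisons with a control rises linearly.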


Two. Pairwise comparisons among several groups

Example 7-9 Make pairwise comparisons for the data of Table 7-8 to test whether the effective rates of any two groups differ.

Test steps:
$H_0:\pi_1=\pi_2$, the total effective rates of the two groups compared are equal;
$H_1:\pi_1\neq\pi_2$, the total effective rates of the two groups compared are unequal;
$\alpha=0.05$,
$$\alpha'=\frac{0.05}{3(3-1)/2+1}=\frac{0.05}{4}=0.0125$$

Group            Effective   Ineffective   Total   $\chi^2$   P
Physical group   199         7             206
Drug group       164         18            182
Total            363         25            388     6.76       <0.0125

Physical group   199         7             206
Plaster group    118         26            144
Total            317         33            350     21.32      <0.00313

Drug group       164         18            182
Plaster group    118         26            144
Total            282         44            326     4.59       >0.0125

Table 7-12 Pairwise comparisons of the three methods
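The three pairwise $\chi^2$ values in Table 7-12 come from the special fourfold formula applied to each pair of groups. The sketch below (hypothetical names; counts taken from Table 7-12) recomputes them and compares each $\nu=1$ P value, obtained from $\operatorname{erfc}(\sqrt{\chi^2/2})$, with $\alpha'=0.0125$.

```python
import math
from itertools import combinations

groups = {  # effective, ineffective (Table 7-12)
    "physical": (199, 7),
    "drug": (164, 18),
    "plaster": (118, 26),
}
alpha_adj = 0.05 / (3 * (3 - 1) / 2 + 1)  # alpha' = 0.0125

def chi2_fourfold(a, b, c, d):
    """Uncorrected special formula for a fourfold table."""
    n = a + b + c + d
    return (a * d - b * c) ** 2 * n / ((a + b) * (c + d) * (a + c) * (b + d))

results = {}
for g1, g2 in combinations(groups, 2):
    chi2 = chi2_fourfold(*groups[g1], *groups[g2])
    p = math.erfc(math.sqrt(chi2 / 2))  # nu = 1
    results[(g1, g2)] = (chi2, p < alpha_adj)
    print(f"{g1} vs {g2}: chi2 = {chi2:.2f}, P = {p:.6f}, significant = {p < alpha_adj}")
```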

At the level $\alpha'=0.0125$: for the physical group vs the drug group, $H_0$ is rejected and $H_1$ accepted; for the physical group vs the plaster group, $H_0$ is rejected and $H_1$ accepted; for the drug group vs the plaster group, $H_0$ is not rejected. We may conclude that the effective rate of the physical group differs from those of the drug group and the plaster group, but we cannot conclude that the effective rates of the drug group and the plaster group differ.

Three. Comparison of several treatment groups with one control group

Example 7-10 Taking the drug group as the control and the physical group and plaster group as treatment groups, is there a difference in total effective rate between each treatment group and the control group?

$H_0:\pi_T=\pi_C$, the total effective rates of the treatment group and the control group are equal;
$H_1:\pi_T\neq\pi_C$, the total effective rates of the treatment group and the control group are unequal;
$\alpha=0.05$,
$$\alpha'=\frac{0.05}{2(3-1)}=0.0125$$

Physical group vs drug group: $\chi^2=6.76$, $P<0.0125$.
Plaster group vs drug group: $\chi^2=4.59$, $P>0.0125$.

At the level $\alpha'=0.0125$, for the physical group vs the drug group $H_0$ is rejected and $H_1$ accepted, so the total effective rates of the physical group and the drug group can be considered different; for the plaster group vs the drug group $H_0$ is not rejected, so we cannot conclude that the two total effective rates differ. Combined with the results of Table 7-8, the effective rate of the physical group is higher than that of the drug group.

Section 6 $\chi^2$ test for linear trend of ordered grouped data (omitted)

Section 7 $\chi^2$ test for frequency distributions

The Pearson $\chi^2$ reflects the agreement between observed and theoretical frequencies, so it can be used to infer whether data follow a given frequency distribution, for example the normal, binomial, Poisson or negative binomial distribution.

Example 7-12 To study how Keshan disease cases are distributed among residential units, investigators sampled 279 units in a region and counted the total number of cases in each unit over past years. The data are shown in columns (1) and (2) of Table 7-15. Do the data follow a Poisson distribution?

Cases X            Observed A   P(X)      Theoretical T   (A-T)²/T
(1)                (2)          (3)       (4) = (3)×n     (5)
0                  26           0.0854    23.8            0.20
1                  51           0.2102    58.6            0.99
2                  75           0.2585    72.1            0.12
3                  63           0.2120    59.1            0.26
4                  38           0.1304    36.4            0.07
5                  17           0.0641    17.9            0.05
6                  5            0.0263    7.3
7                  3            0.0092    2.6
≥8                 1            0.0039*   1.1
6 to ≥8 (merged)   9                      11.0            0.36
Total              279 (n)                                2.05 ($\chi^2$)

Table 7-15 Poisson distribution test
* $P(X\ge8)=1-0.9961=0.0039$

$n=279$, $\sum fX=686$, $\sum fX^2=2342$
$$\bar X=686/279=2.46,\qquad S^2=\frac{2342-686^2/279}{279-1}=2.36$$
The mean and variance are close (for a Poisson distribution they are equal), so it is reasonable to test whether the data follow a Poisson distribution.

$H_0$: the data follow a Poisson distribution;
$H_1$: the data do not follow a Poisson distribution;
$\alpha=0.10$.
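The summary statistics come from the grouped frequencies of Table 7-15. In this sketch the open class "≥8" is treated as X = 8. That is an assumption, but it is the value that reproduces $\sum fX=686$ and $\sum fX^2=2342$.

```python
# X -> number of units; the key 8 stands for the open class ">= 8" (assumption)
counts = {0: 26, 1: 51, 2: 75, 3: 63, 4: 38, 5: 17, 6: 5, 7: 3, 8: 1}

n = sum(counts.values())
sum_fx = sum(x * f for x, f in counts.items())
sum_fx2 = sum(x * x * f for x, f in counts.items())
mean = sum_fx / n
variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)  # sample variance S^2
print(n, sum_fx, sum_fx2, round(mean, 2), round(variance, 2))
```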

Using the Poisson probability function
$$P(X)=e^{-\lambda}\frac{\lambda^X}{X!},\qquad \lambda=2.46,$$
we obtain the probabilities $P(X)$ for $X=0,1,2,\dots$, the theoretical frequencies $T_X=nP(X)$, and $(A-T)^2/T$ for each row.

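$$\chi^2=\sum\frac{(A-T)^2}{T}=2.05$$
Because the theoretical frequencies for $X=6$, 7 and ≥8 were merged, only seven groups remain, so $\nu=7-2=5$ (one degree of freedom is lost to the fixed total and one to estimating $\lambda$ from the sample). Looking up the $\chi^2$ critical value table gives $0.75<P<0.90$. At the level $\alpha=0.10$, $H_0$ is not rejected, so the data can be regarded as following a Poisson distribution.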
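The whole goodness-of-fit calculation can be sketched as follows. Assumptions: $\lambda$ is taken as 2.46 as in the text, the last class is $P(X\ge8)=1-P(X\le7)$, classes 6, 7 and ≥8 are merged as in Table 7-15, and the upper-tail probability for $\nu=5$ uses the standard incomplete-gamma series (the helper name `chi2_sf` is hypothetical). Because $T$ is not rounded to one decimal here, the statistic comes out close to, but not exactly equal to, the table's 2.05.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability via the regularized gamma series."""
    a, z = df / 2.0, x / 2.0
    term = total = 1.0
    k = 0
    while term > 1e-15 * total:
        k += 1
        term *= z / (a + k)
        total += term
    return 1.0 - math.exp(a * math.log(z) - z - math.lgamma(a + 1)) * total

observed = [26, 51, 75, 63, 38, 17, 5, 3, 1]  # X = 0..7 and X >= 8
n, lam = sum(observed), 2.46
probs = [math.exp(-lam) * lam ** x / math.factorial(x) for x in range(8)]
probs.append(1.0 - sum(probs))  # P(X >= 8)
expected = [n * p for p in probs]

# merge X = 6, 7, >=8 into one class, as in Table 7-15
obs_m = observed[:6] + [sum(observed[6:])]
exp_m = expected[:6] + [sum(expected[6:])]
chi2 = sum((a - t) ** 2 / t for a, t in zip(obs_m, exp_m))
nu = len(obs_m) - 2  # minus 1 for the fixed total, minus 1 for estimating lambda
print(f"chi2 = {chi2:.2f}, nu = {nu}, P = {chi2_sf(chi2, nu):.3f}")
```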
