demetria-lois
Chi-Square Test
Chapter 7
Content
• χ² test of fourfold data
• χ² test of paired fourfold data
• Fisher exact probabilities in fourfold data
• χ² test of R×C table
• Multiple comparison of sample rates
• χ² test of goodness of fit
Objectives of the χ² test:
• to infer whether the rates or constituent ratios of two populations, or of more than two populations, differ
• multiple comparison of the rates of several samples
• to infer whether two categorical variables are associated
• test of goodness of fit
Test statistic: χ²
Applies to: qualitative data
Section 1 χ² test of fourfold data
Objective: to judge whether the rates or constituent ratios of two populations differ (equivalent to the u-test).
Requirement: the individuals of the two samples, each classified into two categories, must be arranged as fourfold table data.
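1. The χ² distribution, the basis of the χ² test
(1) The χ² distribution is a continuous distribution.
(2) One of its basic properties is additivity: the sum of independent χ² variables also follows a χ² distribution, with degrees of freedom equal to the sum of their degrees of freedom.
(3) Critical values of χ² are read from the critical value table of χ² for the given degrees of freedom ν.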
[Figure: density curves f(χ²) of the χ² distribution for ν = 1, ν = 6 and ν = 10; horizontal axis χ² from 0 to 16, vertical axis f(χ²) from 0 to 0.5]
2. The basic idea of the χ² test
Example 7-1. A hospital wanted to compare the curative effect of drug A (experimental group) and drug B (control group) in lowering intracranial pressure. 200 patients with raised intracranial pressure were divided into two groups at random; the results are shown in Table 7-1. Do the effective rates of the two drugs differ?
Table 7-1 Comparison of the effective rates of two groups in lowering intracranial pressure
group                effective       ineffective     total        effective rate (%)
experimental group   99 (90.48) a    5 (13.52) b     104 (a+b)    95.20
control group        75 (83.52) c    21 (12.48) d    96 (c+d)     78.13
total                174 (a+c)       26 (b+d)        200 (n)      87.00
Values in parentheses are the theoretical frequencies.
These data can be arranged in the form of chart 7-2: there are two treatment groups, and each group consists of two parts, events that occurred and events that did not occur. The table contains four basic counts (a, b, c, d), and every other entry can be derived from them, which is why the data are called fourfold table data.
  99    5
  75   21
Chart 7-2 The basic form of fourfold table data
treatment group   occurred   not occurred   total
first             a          b              a+b
second            c          d              c+d
total             a+c        b+d            n
Basic idea: it can be understood through the basic formula of the χ² test:
$$\chi^2 = \sum \frac{(A-T)^2}{T}, \qquad \nu = (\text{number of rows} - 1)(\text{number of columns} - 1)$$
A denotes the actual frequency, while T denotes the theoretical frequency.
The theoretical (expected) frequencies can be calculated by the following formula:
$$T_{RC} = \frac{n_R \, n_C}{n}$$
T_RC is the theoretical frequency of the cell in row R and column C
n_R is the total of the corresponding row
n_C is the total of the corresponding column
n is the total sample size
The theoretical frequencies T are derived under the null hypothesis H0: π1 = π2, using the pooled rate of the combined samples.
The test statistic: the value of χ² reflects how closely the actual frequencies agree with the theoretical frequencies.
From formula 7-1, $\chi^2 = \sum \frac{(A-T)^2}{T}$, we can see that the value of χ² also depends on the number of terms (A-T)²/T being summed (strictly, on the degrees of freedom ν). ν is determined by the number of cells that can take values freely, not by the sample size n.
3. The steps of the hypothesis test
(1) Establish the hypotheses and set the test level.
H0: π1 = π2, the population effective rates of the experimental group and the control group in lowering intracranial pressure are equal
H1: π1 ≠ π2, the two population effective rates are not equal
α = 0.05
(2) Calculate the test statistic.
$$T_{11} = \frac{104 \times 174}{200} = 90.48, \qquad T_{12} = 104 - 90.48 = 13.52$$
$$T_{21} = 174 - 90.48 = 83.52, \qquad T_{22} = 26 - 13.52 = 12.48$$
$$\chi^2 = \frac{(99-90.48)^2}{90.48} + \frac{(5-13.52)^2}{13.52} + \frac{(75-83.52)^2}{83.52} + \frac{(21-12.48)^2}{12.48} = 12.86$$
$$\nu = (2-1)(2-1) = 1$$
(3) Determine P and draw the conclusion. From the critical value table of χ² (appendix table 8) with ν = 1, P is smaller than 0.05. At the test level α = 0.05 we reject H0 and accept H1: the effective rates of the two groups in lowering intracranial pressure differ, and the experimental group does better than the control group.
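The calculation in Example 7-1 can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `chi_square_basic` is hypothetical and not part of the slides.

```python
def chi_square_basic(table):
    """Basic-formula chi-square for an R x C table of actual frequencies A."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Theoretical frequencies: T_RC = n_R * n_C / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (a - t) ** 2 / t
        for obs_row, exp_row in zip(table, expected)
        for a, t in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


# Table 7-1: rows = experimental / control, columns = effective / ineffective
chi2, df, expected = chi_square_basic([[99, 5], [75, 21]])
print(round(chi2, 2), df)  # 12.86 1
```

The same function works for any R×C table, because the basic formula and the degrees of freedom do not depend on the table being 2×2.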
2. The special formula
$$\chi^2 = \frac{(ad-bc)^2 \, n}{(a+b)(c+d)(a+c)(b+d)}$$
For Example 7-1:
$$\chi^2 = \frac{(99 \times 21 - 5 \times 75)^2 \times 200}{104 \times 96 \times 174 \times 26} = 12.86$$
The χ² distribution is continuous, while fourfold table data are discrete, so the χ² value calculated from them is also discrete. To bring the distribution of the statistic closer to the continuous χ² distribution, a continuity correction is needed.
3. The corrected formula
$$\chi_c^2 = \sum \frac{(|A-T| - 0.5)^2}{T}$$
$$\chi_c^2 = \frac{\left(|ad-bc| - \frac{n}{2}\right)^2 n}{(a+b)(c+d)(a+c)(b+d)}$$
Conditions for choosing the test formula for fourfold table data:
n ≥ 40 and T ≥ 5: special formula;
n ≥ 40 and 1 ≤ T < 5: corrected formula;
n < 40 or T < 1: Fisher exact probabilities method.
The continuity correction of the χ² test applies only to fourfold table data, where ν equals 1; when ν is more than 1, no correction should be made.
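The selection rule above can be written as a small helper. This is a sketch under the assumption that T in the conditions means the smallest theoretical frequency of the table; the function name `choose_fourfold_method` is illustrative only.

```python
def choose_fourfold_method(a, b, c, d):
    """Pick the test for a fourfold table from n and the smallest theoretical frequency."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    # Assumption: "T" in the rule is the smallest of the four T_RC = n_R * n_C / n
    t_min = min(r * k / n for r in row_totals for k in col_totals)
    if n < 40 or t_min < 1:
        return "Fisher exact probabilities method"
    if t_min < 5:
        return "corrected formula"
    return "special formula"


print(choose_fourfold_method(99, 5, 75, 21))  # Example 7-1
print(choose_fourfold_method(46, 6, 18, 8))   # Example 7-2
print(choose_fourfold_method(4, 18, 5, 6))    # Example 7-4
```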
Example 7-2. A doctor wanted to compare the effect of drug A and drug B in treating cerebrovascular disease. 78 patients with the disease were divided into two groups at random; the results are shown in Table 7-2. Are the effective rates of the two drugs the same?
Table 7-2 Comparison of the effective rates of two drugs in treating cerebrovascular disease
group               effective   ineffective   total   effective rate (%)
citicoline group    46          6             52      88.46
ganglioside group   18          8 (4.67)      26      69.23
total               64          14            78      82.05
H0: π1 = π2, H1: π1 ≠ π2, α = 0.05
In this case n = 78 ≥ 40 and T22 = 4.67 < 5, so the corrected formula is used:
$$\chi_c^2 = \frac{\left(|46 \times 8 - 6 \times 18| - 78/2\right)^2 \times 78}{52 \times 26 \times 64 \times 14} = 3.14$$
With ν = 1, the critical value table of χ² gives 0.05 < P < 0.10. At the test level α = 0.05, H0 cannot be rejected, so we cannot say that the effective rates of the two drugs in treating cerebrovascular disease differ.
If the correction is not applied, χ² = 4.35 and P < 0.05, so the conclusion would be the opposite.
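The effect of the continuity correction in Example 7-2 can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and the P value uses the identity P(χ² > x) = erfc(√(x/2)), which holds for ν = 1, instead of the printed critical value table.

```python
import math


def fourfold_chi2(a, b, c, d, corrected=False):
    """Special formula for the fourfold table, optionally with continuity correction."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if corrected:
        # Assumption: the corrected difference is not allowed to fall below zero.
        diff = max(diff - n / 2, 0.0)
    return diff ** 2 * n / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_df1(chi2):
    """Upper-tail probability of the chi-square distribution with nu = 1."""
    return math.erfc(math.sqrt(chi2 / 2))


# Table 7-2: 46 / 6 and 18 / 8 (effective / ineffective)
chi2_c = fourfold_chi2(46, 6, 18, 8, corrected=True)
chi2_u = fourfold_chi2(46, 6, 18, 8)
print(round(chi2_c, 2), round(chi2_u, 2))  # 3.14 4.35
print(p_value_df1(chi2_c) > 0.05, p_value_df1(chi2_u) < 0.05)  # True True
```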
Section 2 χ² test of paired fourfold table data
As with measurement data, inference about the difference between two population rates (proportions) from enumeration data can be based on either a group design or a paired design. These give fourfold table data and paired fourfold table data respectively.
Example 7-3. A laboratory measured serum antinuclear antibodies in 58 patients with suspected systemic lupus erythematosus by both latex agglutination and immunofluorescence; the results are shown in Table 7-3. Do the results of the two methods differ?
Table 7-3 Results of the two methods
                      latex agglutination
immunofluorescence    +          -          total
+                     11 (a)     12 (b)     23
-                     2 (c)      33 (d)     35
total                 13         45         58
In a paired design, each pair can give one of four possible results for the two treatments:
① positive by both methods (a);
② negative by both methods (d);
③ positive by immunofluorescence, negative by latex agglutination (b);
④ positive by latex agglutination, negative by immunofluorescence (c).
a and d count the pairs on which the two methods agree; b and c count the pairs on which they disagree.
Statistic:
$$\chi^2 = \frac{(b-c)^2}{b+c}, \qquad \nu = 1$$
$$\chi_c^2 = \frac{(|b-c|-1)^2}{b+c}, \qquad \nu = 1$$
Caution: the method is meant for data with small samples.
Reasons:
1. it considers only the disagreeing pairs (b, c);
2. it does not consider the sample size n or the agreeing pairs (a, d).
When n, a and d are large and b and c are relatively small, a statistically significant result may have no practical significance.
Steps of the test:
H0: B = C, H1: B ≠ C, α = 0.05
b + c = 12 + 2 = 14 < 40, so the corrected formula is used:
$$\chi_c^2 = \frac{(|12-2|-1)^2}{12+2} = 5.79$$
With ν = 1, the critical value table of χ² gives 0.01 < P < 0.025. At the level α = 0.05 we reject H0 and accept H1: the two methods differ, and the positive rate of immunofluorescence is higher than that of latex agglutination.
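The paired test of Example 7-3 can be sketched as follows. The function name `paired_chi2` is hypothetical; the switch to the corrected formula when b + c < 40 follows the example, and the P value again relies on the ν = 1 identity P(χ² > x) = erfc(√(x/2)).

```python
import math


def paired_chi2(b, c):
    """Chi-square test for a paired fourfold table, using only the discordant pairs."""
    discordant = b + c
    if discordant < 40:
        chi2 = (abs(b - c) - 1) ** 2 / discordant  # corrected formula
    else:
        chi2 = (b - c) ** 2 / discordant
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail for nu = 1
    return chi2, p


# Table 7-3: b = 12 (immunofluorescence +, latex -), c = 2 (latex +, immunofluorescence -)
chi2, p = paired_chi2(12, 2)
print(round(chi2, 2), 0.01 < p < 0.025)  # 5.79 True
```

Note that a and d never enter the calculation, which is the limitation the caution slide describes.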
Section 3 Fisher exact probabilities method in 2×2 table
Conditions: n < 40, or T < 1, or P ≈ α
Theoretical basis: the hypergeometric distribution (this is not a χ² test)
Example 7-4. A doctor studied the preventive effect of hepatitis B immunoglobulin against intrauterine infection of the fetus. 33 HBsAg-positive patients were randomized into two groups, a prevention group and a non-prevention group; the results are shown in Table 7-4. Do the fetal infection rates of the two groups differ?
Table 7-4 Comparison of the fetal HBV infection rates of the two groups
group            positive   negative   total   infection rate (%)
prevention       4          18         22      18.18
non-prevention   5 (3)      6          11      45.45
total            9          24         33      27.27
1. Basic idea:
With the marginal totals of the fourfold table held fixed, calculate the probability of every possible combination of the four actual frequencies, then draw the inference from the cumulative probability and the level α.
1. Calculate Pi:
Number of combinations = smallest marginal total + 1
For Example 7-4 the number of combinations is 9 + 1 = 10:
combination   a   b    c   d    ad-bc
(1)           0   22   9   2    -198
(2)           1   21   8   3    -165
(3)           2   20   7   4    -132
(4)           3   19   6   5    -99
(5)           4   18   5   6    -66
(6)           5   17   4   7    -33
(7)           6   16   3   8    0
(8)           7   15   2   9    33
(9)           8   14   1   10   66
(10)          9   13   0   11   99
The sum of the Pi is 1.
Calculation formula:
$$P_i = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{a!\,b!\,c!\,d!\,n!}$$
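The factorial formula can be evaluated directly with Python's exact integer arithmetic. This is a minimal sketch; `table_probability` is an illustrative name, and the list of tables is built from the fixed marginal totals of Example 7-4 (row totals 22 and 11, first column total 9).

```python
from math import factorial


def table_probability(a, b, c, d):
    """Probability of one fourfold table with fixed marginal totals."""
    n = a + b + c + d
    numerator = factorial(a + b) * factorial(c + d) * factorial(a + c) * factorial(b + d)
    denominator = factorial(a) * factorial(b) * factorial(c) * factorial(d) * factorial(n)
    return numerator / denominator


# Observed table of Example 7-4
p_observed = table_probability(4, 18, 5, 6)
print(round(p_observed, 8))  # 0.08762728

# All 10 combinations with the same marginal totals; their probabilities sum to 1
all_tables = [(a, 22 - a, 9 - a, 2 + a) for a in range(10)]
total = sum(table_probability(*t) for t in all_tables)
```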
2. Calculation of the cumulative probability
Let the cross-product difference of the observed fourfold table be D* = a*d* - b*c*, with probability P*. Di denotes the cross-product difference of any other combination of the fourfold table, with probability Pi.
(1) One-sided test
If D* > 0 in the observed fourfold table, accumulate the probabilities of all tables with Di ≥ D* and Pi ≤ P*. If D* < 0, accumulate the probabilities of all tables with Di ≤ D* and Pi ≤ P*.
(2) Two-sided test
Accumulate the probabilities of all fourfold tables that satisfy |Di| ≥ |D*| and Pi ≤ P*.
If a+b = c+d or a+c = b+d, the sequence of combinations of the fourfold table is symmetric, and the two-sided cumulative probability is simply the one-sided cumulative probability × 2.
Test procedure (in this example n = 33 < 40)
H0: π1 = π2, H1: π1 ≠ π2, α = 0.05
1. Calculate D* and P* of the observed fourfold table, as well as Di of all fourfold tables (see Table 7-5). In this example D* = -66 and P* = 0.08762728.
2. Calculate Pi for every fourfold table with |Di| ≥ |D*|.
3. Accumulate the probabilities of the fourfold tables with |Di| ≥ 66 and Pi ≤ P*. In this example P1, P2, P3, P4, P5 and P10 meet the conditions. The cumulative probability is
$$P = P_1 + P_2 + P_3 + P_4 + P_5 + P_{10} = 0.1210 > 0.05$$
At the test level α = 0.05 we cannot reject H0, so we cannot conclude that the HBV infection rate of infants in the prevention group differs from that of infants in the non-prevention group.
Table 7-5 Fisher exact probability calculation for Example 7-4
i     a   b    c   d    Di = ad-bc   Pi
1     0   22   9   2    -198         0.00000143
2     1   21   8   3    -165         0.00009412
3     2   20   7   4    -132         0.00197656
4     3   19   6   5    -99          0.01844785
5*    4   18   5   6    -66*         0.08762728*
6     5   17   4   7    -33
7     6   16   3   8    0
8     7   15   2   9    33
9     8   14   1   10   66           0.09120390
10    9   13   0   11   99           0.01289752
* the observed table
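The two-sided accumulation rule (|Di| ≥ |D*| and Pi ≤ P*) can be sketched by enumerating every table with the observed marginal totals. The function name `fisher_two_sided` is hypothetical, each Pi is written in its equivalent hypergeometric form with binomial coefficients, and the tiny tolerance on the comparison Pi ≤ P* is an implementation choice to guard against floating-point ties.

```python
from math import comb


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher probability by the rule |Di| >= |D*| and Pi <= P*."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):
        # Probability of the table whose top-left cell equals x
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    def cross(x):
        # D = ad - bc for the table whose top-left cell equals x
        return x * (r2 - (c1 - x)) - (r1 - x) * (c1 - x)

    d_obs, p_obs = cross(a), prob(a)
    lowest, highest = max(0, c1 - r2), min(r1, c1)
    total = 0.0
    for x in range(lowest, highest + 1):
        if abs(cross(x)) >= abs(d_obs) and prob(x) <= p_obs * (1 + 1e-9):
            total += prob(x)
    return d_obs, p_obs, total


d_star, p_star, p_two_sided = fisher_two_sided(4, 18, 5, 6)  # Example 7-4
print(d_star, round(p_star, 8), round(p_two_sided, 4))  # -66 0.08762728 0.121
```

Table 9 of the enumeration (Di = 66) is excluded because its probability 0.09120390 exceeds P*, which is why only P1 to P5 and P10 are summed.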
Example 7-5. A study examined P53 expression in gallbladder adenocarcinoma and gallbladder adenoma. P53 expression was detected by immunohistochemistry in 10 specimens of each disease, surgically resected during the same period; the data are shown in Table 7-6. Is there a significant difference in the P53 positive rate between gallbladder adenocarcinoma and gallbladder adenoma?
Table 7-6 P53 positive expression rate in gallbladder adenocarcinoma and gallbladder adenoma
type                         positive   negative   total
gallbladder adenocarcinoma   6          4          10
gallbladder adenoma          1          9          10
total                        7          13         20
H0: π1 = π2, H1: π1 ≠ π2, α = 0.05
Table 7-7 Fisher exact probability calculation for Example 7-5
i     a   b    c   d    Di = ad-bc   Pi
1     0   10   7   3    -70
2     1   9    6   4    -50
3     2   8    5   5    -30
4     3   7    4   6    -10
5     4   6    3   7    10
6     5   5    2   8    30
7*    6   4    1   9    50*          0.02708978*
8     7   3    0   10   70           0.00154799
* the observed table
In this example a+b = c+d = 10, and Table 7-7 shows that the combinations of the fourfold table are distributed symmetrically about the midpoint between i = 4 and i = 5.
1. Calculate D* and P* of the observed table. Here D* = 50 and P* = 0.02708978.
2. Calculate Pi for every combination of the fourfold table with Di ≥ 50.
3. Accumulate the probabilities of the tables with Di ≥ 50 and Pi ≤ P*, here P7 and P8:
$$P_7 + P_8 = 0.02708978 + 0.00154799 = 0.0286$$
4. Calculate the two-sided cumulative probability P:
$$P = 2 \times (P_7 + P_8) = 0.057$$
P > 0.05, so at the test level α = 0.05 we cannot reject H0, and we cannot say that the P53 positive expression rates of gallbladder adenocarcinoma and gallbladder adenoma differ.
Notice:
In Example 7-5, if professional knowledge tells us that the P53 expression rate in gallbladder adenocarcinoma should be higher than in gallbladder adenoma, we can make a one-sided test: H0: π1 = π2, H1: π1 > π2, α = 0.05. From Table 7-7 the one-sided probability is P = P7 + P8 = 0.0286, so 0.01 < P < 0.05. We therefore reject H0 and accept H1, and conclude that the P53 expression rate in gallbladder adenocarcinoma is higher than in gallbladder adenoma.
Section 5 Multiple comparison of several sample rates
The methods for multiple comparison of several sample rates include the χ² partition method, the Scheffé method and the SNK method. This section introduces only the χ² partition method.
The χ² partition method
I. Basic idea
The data for the multiple comparison of several sample rates can be arranged as 2×C table data, but the test level must be set anew.
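1. Pairwise comparison among several sample rates
The test level must be reset to α':
$$\alpha' = \frac{\alpha}{\binom{k}{2} + 1}$$
k: the number of sample rates being compared, so that $\binom{k}{2} = k(k-1)/2$ is the number of pairwise comparisons.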
2. Comparison of the treatment groups with one control group
The test level must be reset to α':
$$\alpha' = \frac{\alpha}{2(k-1)}$$
k: the number of sample rates being compared, including the control group.
II. Pairwise comparison among several groups
Example 7-9. Make pairwise comparisons for the data of Table 7-8, to test whether the effective rates of any two groups differ.
Test steps
H0: π1 = π2, the total effective rates of the two groups compared are equal
H1: π1 ≠ π2, the total effective rates of the two groups compared are not equal
α = 0.05
$$\alpha' = \frac{0.05}{3(3-1)/2 + 1} = \frac{0.05}{4} = 0.0125$$
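Both adjusted test levels used in this section can be computed with a short helper. This is a sketch; the function names are hypothetical, and k is taken to be the number of groups (three in Examples 7-9 and 7-10), as the arithmetic on the slides implies.

```python
from math import comb


def alpha_pairwise(alpha, k):
    """Adjusted level for pairwise comparison among k sample rates."""
    return alpha / (comb(k, 2) + 1)


def alpha_vs_control(alpha, k):
    """Adjusted level for comparing k - 1 treatment groups with one control group."""
    return alpha / (2 * (k - 1))


print(alpha_pairwise(0.05, 3))    # 0.0125
print(alpha_vs_control(0.05, 3))  # 0.0125
```

With three groups the two formulas happen to give the same level; for larger k they diverge.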
Table 7-12 Pairwise comparison of the three treatments
group                    effective   ineffective   total   χ²      P
physical therapy group   199         7             206
drug group               164         18            182
total                    363         25            388     6.76    <0.0125
physical therapy group   199         7             206
plaster group            118         26            144
total                    317         33            350     21.32   <0.00313
drug group               164         18            182
plaster group            118         26            144
total                    282         44            326     4.59    >0.0125
At the level α' = 0.0125, the comparison of the physical therapy group with the drug group rejects H0 and accepts H1; the comparison of the physical therapy group with the plaster group rejects H0 and accepts H1; the comparison of the drug group with the plaster group does not reject H0. We may conclude that the effective rate of the physical therapy group differs from those of the drug group and the plaster group, but we cannot conclude that the effective rates of the drug group and the plaster group differ.
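The pairwise partition of Table 7-12 can be reproduced with a short script. It is a sketch only: the names `fourfold_chi2` and `pairwise_partition` are hypothetical, the group labels are taken from the table, and P is obtained from the ν = 1 identity P(χ² > x) = erfc(√(x/2)) instead of the critical value table.

```python
import math
from itertools import combinations

# (effective, ineffective) counts from Table 7-12
groups = {
    "physical therapy": (199, 7),
    "drug": (164, 18),
    "plaster": (118, 26),
}


def fourfold_chi2(a, b, c, d):
    """Special formula for the fourfold table (no continuity correction)."""
    n = a + b + c + d
    return (a * d - b * c) ** 2 * n / ((a + b) * (c + d) * (a + c) * (b + d))


def pairwise_partition(groups, alpha=0.05):
    """Test every pair of groups at the adjusted level alpha'."""
    k = len(groups)
    alpha_adj = alpha / (math.comb(k, 2) + 1)
    results = {}
    for (name1, (a, b)), (name2, (c, d)) in combinations(groups.items(), 2):
        chi2 = fourfold_chi2(a, b, c, d)
        p = math.erfc(math.sqrt(chi2 / 2))  # upper tail for nu = 1
        results[(name1, name2)] = (chi2, p, p < alpha_adj)
    return alpha_adj, results


alpha_adj, results = pairwise_partition(groups)
for pair, (chi2, p, reject) in results.items():
    print(pair, round(chi2, 2), "reject H0" if reject else "do not reject H0")
```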
III. Comparison of several treatment groups with one control group
Example 7-10. Take the drug group as the control group and the physical therapy group and the plaster group as treatment groups. Do the total effective rates of the two treatment groups differ from that of the control group?
H0: πT = πC, the total effective rates of the treatment group and the control group are equal
H1: πT ≠ πC, the total effective rates of the treatment group and the control group are not equal
α = 0.05
$$\alpha' = \frac{0.05}{2(3-1)} = 0.0125$$
Physical therapy group and drug group: χ² = 6.76, P < 0.0125
Plaster group and drug group: χ² = 4.59, P > 0.0125
At the level α' = 0.0125, the comparison of the physical therapy group with the drug group rejects H0 and accepts H1, so we conclude that the total effective rates of the physical therapy group and the drug group differ. The comparison of the plaster group with the drug group does not reject H0, so we cannot conclude that their total effective rates differ. Combined with the results of Table 7-8, the effective rate of the physical therapy group is higher than that of the drug group.
Section 6 Test for linear trend in ordered grouped data (omitted)
Section 7 χ² test of goodness of fit for frequency distributions
Pearson's χ² reflects how closely the actual frequencies agree with the theoretical frequencies, so the χ² test can be used to infer whether a frequency distribution fits a theoretical distribution, for example the normal distribution, binomial distribution, Poisson distribution, negative binomial distribution and so on.
Example 7-12. To study the spatial distribution of patients with Keshan disease, investigators divided a region into 279 sampling units and counted the total number of cases in each unit over past years. The data are given in columns (1) and (2) of Table 7-15. Do these data follow a Poisson distribution?
Table 7-15 Goodness-of-fit test for the Poisson distribution
cases X   observed A   P(X)      theoretical T    (A-T)²/T
(1)       (2)          (3)       (4) = (3)×n      (5)
0         26           0.0854    23.8             0.20
1         51           0.2102    58.6             0.99
2         75           0.2585    72.1             0.12
3         63           0.2120    59.1             0.26
4         38           0.1304    36.4             0.07
5         17           0.0641    17.9             0.05
6         5            0.0263    7.3
7         3            0.0092    2.6
≥8        1            0.0039*   1.1
6 to ≥8 merged: A = 9, T = 11                     0.36
total     279 (n)                                 2.05 (χ²)
* P(X ≥ 8) = 1 - 0.9961 = 0.0039
n = 279, ΣfX = 686, ΣfX² = 2342
$$\bar{X} = 686 / 279 = 2.46$$
$$S^2 = \frac{2342 - 686^2/279}{279 - 1} = 2.36$$
The mean and the variance are close, so it is reasonable to test whether the data follow a Poisson distribution.
H0: the data follow a Poisson distribution
H1: the data do not follow a Poisson distribution
α = 0.10
From the probability function of the Poisson distribution,
$$P(X) = \frac{e^{-\mu}\mu^{X}}{X!}, \qquad \mu = 2.46,$$
we obtain, for X = 0, 1, 2, ..., the probability P(X), the theoretical frequency T_X = n P(X), and for each row the value (A-T)²/T.
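$$\chi^2 = \sum \frac{(A-T)^2}{T} = 2.05$$
ν = 7 - 2 = 5 (because T6, T7 and T8 were merged, only seven classes remain; one degree of freedom is lost for the fixed total and one for the estimated mean, so ν = 7 - 2). From the critical value table of χ², 0.75 < P < 0.90. At the level α = 0.10 we do not reject H0, so the data can be regarded as following a Poisson distribution.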
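The whole goodness-of-fit calculation of Example 7-12 can be sketched in Python. The function name `poisson_fit` is hypothetical, and the rule for merging tail classes (merge while the last theoretical frequency is below 5) is an assumption that happens to reproduce the merging of X = 6, 7 and ≥8 on the slide. Because the script does not round P(X) and T, its χ² comes out close to, but not exactly equal to, the 2.05 obtained from the rounded table values.

```python
import math

observed = [26, 51, 75, 63, 38, 17, 5, 3, 1]  # X = 0, 1, ..., 7 and X >= 8

n = sum(observed)
sum_fx = sum(x * f for x, f in enumerate(observed))       # last class counted as X = 8
sum_fx2 = sum(x * x * f for x, f in enumerate(observed))
mean = sum_fx / n
variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)


def poisson_fit(observed, mu, min_expected=5.0):
    """Chi-square goodness of fit to a Poisson distribution with mean mu."""
    n = sum(observed)
    last = len(observed) - 1  # the last class is "X >= last"
    probs = [math.exp(-mu) * mu ** x / math.factorial(x) for x in range(last)]
    probs.append(1 - sum(probs))
    obs_m = list(observed)
    exp_m = [n * p for p in probs]
    # Assumption: merge tail classes while the last theoretical frequency is too small
    while len(exp_m) > 1 and exp_m[-1] < min_expected:
        tail_t = exp_m.pop()
        exp_m[-1] += tail_t
        tail_a = obs_m.pop()
        obs_m[-1] += tail_a
    chi2 = sum((a - t) ** 2 / t for a, t in zip(obs_m, exp_m))
    df = len(exp_m) - 2  # classes - 1 - one estimated parameter
    return chi2, df, obs_m


chi2, df, merged = poisson_fit(observed, mu=2.46)
print(round(mean, 2), round(variance, 2), df)  # 2.46 2.36 5
```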