Medical Statistics (full English class)
Ji-Qian Fang
School of Public Health
Sun Yat-Sen University
Vocabulary for Chapter 10 (I)
statistical description 统计描述
enumeration data 计数资料
absolute measure 绝对量
relative measures 相对量
category 类别
frequency 频数、频率
relative frequency 相对频数、频率
proportion 比率
intensity 强度
rate 速率
ratio 比
denominator 分母
numerator 分子
pooled estimate 联合估计
myopia 近视眼
balance 均衡
standardization 标准化
direct standardization 直接标准化
indirect standardization 间接标准化
standard population 标准人口
standard mortality rates 标准死亡率
standardized mortality rate 标准化死亡率
standard mortality ratio 标准死亡率比
weighted average 加权平均
incidence rates 发病率
Absolute measure: the numbers counted for each category (frequencies). An absolute measure can hardly be used for comparison between different populations.
1. Relative measure
Three kinds of relative measures: frequency (proportion), intensity (rate), and ratio.
(1) Relative frequency
$$ \text{Relative frequency (proportion)} = \frac{\text{Number of units with a certain condition}}{\text{Total number of units possibly having the condition}} $$
Note: The Chinese textbook is wrong! It is not a “rate”; it is a proportion or frequency!
Example 10-1 (P.304, revised)
P(Myopia | First grade) = 15.16%, P(Myopia | Second grade) = 15.89%, P(Myopia | Third grade) = 18.36%
P(First grade | Myopia) = 35.08%, P(Second grade | Myopia) = 35.60%, P(Third grade | Myopia) = 29.32%
Question: Which grade has the most serious condition of myopias?
Table 10-1 Prevalence rates and constitute of myopia in a junior high school
Grade          Students tested   Students with myopia   Prevalence rate (%)   Constitute among myopias (%)
First grade    442               67                     15.16                 35.08
Second grade   428               68                     15.89                 35.60
Third grade    305               56                     18.36                 29.32
Total          1175              191                    16.26                 100.00
Prevalence rates describe P(Myopia | First grade), P(Myopia | Second grade), P(Myopia | Third grade).
Constitutes among myopias describe P(First grade | Myopia), P(Second grade | Myopia), P(Third grade | Myopia).
Which grade has the most serious condition of myopia?
Answer: P(Myopia | Third grade) is the maximum, so the third grade has the highest prevalence of myopia. P(Second grade | Myopia) is the maximum only because, among the myopic students, the absolute number of second-grade students is the largest.
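The two percentage columns of Table 10-1 differ only in their denominators. A minimal Python sketch (the dictionary names are illustrative, not from the textbook) recomputes both from the raw counts and picks out the maximum of each:

```python
# Counts from Table 10-1
tested = {"First": 442, "Second": 428, "Third": 305}
myopic = {"First": 67, "Second": 68, "Third": 56}
total_myopic = sum(myopic.values())  # 191 myopic students in all

# Prevalence rate P(Myopia | grade): denominator = students tested in that grade
prevalence = {g: 100 * myopic[g] / tested[g] for g in tested}

# Constitute P(grade | Myopia): denominator = all myopic students
constitute = {g: 100 * myopic[g] / total_myopic for g in tested}

for g in tested:
    print(f"{g:<7} prevalence {prevalence[g]:5.2f}%   constitute {constitute[g]:5.2f}%")

print("Highest prevalence:", max(prevalence, key=prevalence.get))
print("Largest share of myopias:", max(constitute, key=constitute.get))
```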
(2) Intensity
Example: A smoking population was followed up for 562,833 person-years, and 346 lung cancer cases were found. The incidence rate of lung cancer in the smoking population is
Incidence rate = 346/562833 = 61.47 per 100,000 person-years
$$ \text{Incidence rate in a certain year} = \frac{\text{Number of patients occurring during the year}}{\text{Person-years exposed to the risk of the disease during the year}} $$
$$ \text{Mortality rate in a certain year} = \frac{\text{Number of deaths during the year}}{\text{Person-years exposed to the risk of death during the year}} $$
Example: The mortality rate of liver cancer in Guangzhou is 32 per 100,000 per year.
$$ \text{Intensity in a certain period} = \frac{\text{Number of events appearing in the period}}{\text{Total person-years observed in the period}} $$
In general,
Denominator: the sum of the person-years observed in the period
Numerator: the total number of events appearing in the period
Unit: persons per person-year, or 1/year
Nature: the relative frequency per unit of time.
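The general definition translates directly into code. In this sketch the helper name `intensity` and its `per` scaling parameter are hypothetical choices, not part of the course material:

```python
def intensity(events: int, person_years: float, per: float = 100_000) -> float:
    """Events per `per` person-years: numerator / sum of person-years observed."""
    if person_years <= 0:
        raise ValueError("person-years must be positive")
    return events / person_years * per

# Smoking cohort: 346 lung cancer cases during 562,833 person-years of follow-up
rate = intensity(346, 562_833)
print(f"{rate:.2f} per 100,000 person-years")
```

Scaling by 100,000 only changes the reporting unit; the quantity is still events per person-year (1/year).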
(3) Ratio
A ratio is one number divided by another related number. Examples:
Sex ratio of students in this class: No. of males : No. of females = 52%
Coefficient of variation: CV = SD/mean
Ratio of time spent per clinic visit: large hospital : community health station = 81.9 min : 18.6 min = 4.40
2. Caution in use of relative measures
a. The denominator should be big enough! Otherwise the absolute measure should be used.
Example: Out of 5 cases, 3 were cured. Is it meaningful to report a cure rate of 60%?
b. Pay attention to the population the relative measure comes from.
Mistake in the textbook (P.305): “Distinguish between constitutes and proportion”!? We should say “Distinguish between the prevalence rate and the constitute among patients”.
Prevalence rate: the population is the students in the same grade.
Constitute: the population is all the patients.
Table 10-2 Constitute of infectious diseases in a city (frequency distribution among patients)
                         1985                                      1995
Infectious disease       No. of cases   Relative frequency (%)     No. of cases   Relative frequency (%)
Diarrhea                 3604           49.39                      2032           37.92
Hepatitis                1203           16.49                      1143           21.33
Epidemic encephalitis    698            9.56                       542            10.11
Measles                  890            12.20                      767            14.31
Others                   902            12.36                      875            16.33
Total                    7297           100.00                     5359           100.00
• The two frequency distributions above describe two populations of patients (all cases in 1985 and all cases in 1995);
• To describe the prevalence rate, one has to look at the general population, not only at the patients.
c. Pooled estimate of the frequency: pooled estimate = sum of numerators / sum of denominators.
Example: The prevalence of myopia among the 3 grades ≠ (15.16 + 15.89 + 18.36)/3.
The prevalence of myopia among the 3 grades = (67 + 68 + 56)/(442 + 428 + 305) = 191/1175 = 16.26%.
d. Comparability between frequencies or between frequency distributions: notice the balance of other conditions.
e. If the distributions of other variables are different, “standardization” is needed to improve the comparability.
f. To compare two samples, a hypothesis test is needed (see the chi-square test).
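Item c can be checked numerically. This sketch contrasts the pooled estimate with the (incorrect) simple average of the three grade-specific prevalence rates from Table 10-1:

```python
myopic = [67, 68, 56]      # numerators: students with myopia in each grade
tested = [442, 428, 305]   # denominators: students tested in each grade

# Incorrect: unweighted mean of the grade-specific percentages
naive = sum(100 * x / n for x, n in zip(myopic, tested)) / len(tested)

# Correct: pooled estimate = sum of numerators / sum of denominators
pooled = 100 * sum(myopic) / sum(tested)

print(f"average of rates: {naive:.2f}%")
print(f"pooled estimate:  {pooled:.2f}%")
```

The simple average gives every grade the same weight, although the third grade contributed far fewer students (305) than the first (442).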
The following will emphasize the above two points:
Standardization
Hypothesis test
3. Standardization for crude frequency or crude intensity
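Crude incidence rate of city A = 28.96‰; crude incidence rate of city B = 35.03‰. Strange!? Every age-specific rate in city A is higher than in city B, yet city A has the lower crude rate.
They are not comparable, because the age constitutes of the two cities are quite different.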
Table 10-3 Incidence rates of infectious diseases, children of two cities
            City A                                              City B
Age group   Population  Constitute  No. of cases  Incidence     Population  Constitute  No. of cases  Incidence
(years)                                           rate (‰)                                            rate (‰)
1 -         2542        0.1219      316           124.31        1014        0.2592      117           115.38
5 -         4285        0.2054      168           39.21         1905        0.4870      16            8.40
10 - 12     14029       0.6727      120           8.55          992         0.2538      4             4.03
Total       20856       1.0000      604           28.96         3911        1.0000      137           35.03
Standardized incidence rate of city A = 813/24767 = 32.83‰
Standardized incidence rate of city B = 523/24767 = 21.12‰
After standardization city A has the higher rate, in agreement with its higher age-specific rates.
Two steps:
(1) Select a standard population and take its age-specific sizes Ni as the “weights”.
(2) Take the weighted average of the actual age-specific incidence rates; this is the direct standardized rate:
$$ P' = \frac{\sum_i N_i P_i}{N} $$
Direct standardization of the incidence rates of infectious disease for children in city A and B
(standard population = the two cities combined)
            Standard       City A                              City B
Age group   population     Actual incidence   Expected no.     Actual incidence   Expected no.
(years)     Ni             rate Pa (‰)        of cases NiPa    rate Pb (‰)        of cases NiPb
1 -         3556           124.31             442              115.38             410
5 -         6190           39.21              243              8.40               52
10 - 12     15021          8.55               128              4.03               61
Total       24767 (N)      28.96              813 (ΣNiPa)      35.03              523 (ΣNiPb)
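A minimal sketch of the two steps, assuming (as in the table above) that the standard population is the two cities combined. The helper names are illustrative. The code keeps full precision for the expected numbers, so the last digit can differ slightly from the table, which rounds each expected count to an integer:

```python
# Age groups 1-, 5-, 10-12 years (Table 10-3)
cases_a, pop_a = [316, 168, 120], [2542, 4285, 14029]
cases_b, pop_b = [117, 16, 4], [1014, 1905, 992]

# Step 1: standard population (assumed: both cities combined) gives the weights N_i
standard = [na + nb for na, nb in zip(pop_a, pop_b)]


def crude_rate(cases, pop):
    """Crude rate per 1000: total cases / total population."""
    return 1000 * sum(cases) / sum(pop)


def direct_standardized_rate(cases, pop, weights):
    """Step 2: weighted average of age-specific rates, sum(N_i * P_i) / N, per 1000."""
    expected = sum(w * c / p for w, c, p in zip(weights, cases, pop))
    return 1000 * expected / sum(weights)


for name, c, p in [("A", cases_a, pop_a), ("B", cases_b, pop_b)]:
    print(f"City {name}: crude {crude_rate(c, p):.2f}, "
          f"standardized {direct_standardized_rate(c, p, standard):.2f} per 1000")
```

Changing the standard population changes the standardized values themselves, so only comparisons made with the same standard are meaningful.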
Indirect standardization
• Known: the age-specific person-years Ni1 (smokers) and Ni2 (non-smokers); the total numbers of lung cancer deaths D1 = 432 and D2 = 210.
• Select a set of standard mortality rates Pi.
• Standard mortality ratio: SMR1 = D1/ΣNi1Pi = 432/100.67 = 4.2912 (smokers)
  SMR2 = D2/ΣNi2Pi = 210/243.61 = 0.8620 (non-smokers)
• Standardized mortality rate (34.60 = overall rate of the standard population):
  P'1 = 34.60 × SMR1 = 148.48 (1/10^5)
  P'2 = 34.60 × SMR2 = 29.83 (1/10^5)
Table 10-6 Indirect standardization of death rates for lung cancer, smokers and non-smokers
            Standard mortality     Smokers                              Non-smokers
Age group   rate of lung cancer    Observed        Expected no. of      Observed        Expected no. of
(years)     (1/10^5) Pi            person-years    deaths Ni1Pi         person-years    deaths Ni2Pi
                                   Ni1                                  Ni2
35 -        7.04                   49705           3.50                 189370          13.33
45 -        25.70                  42633           10.96                104762          26.92
55 -        108.25                 28117           30.44                60043           65.00
65 -        263.94                 10624           28.04                27540           72.69
75 -        451.87                 6137            27.73                14532           65.67
Total       34.60                  137216          100.67               396247          243.61
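The indirect method needs only each group's total deaths plus the standard rates. A sketch that reproduces the expected deaths, the SMRs and the standardized rates from Table 10-6 (34.60 per 10^5 is read from the Total row as the overall rate of the standard population; the variable names are illustrative):

```python
# Standard lung-cancer mortality rates P_i (per 100,000) for ages 35-, 45-, 55-, 65-, 75-
std_rates = [7.04, 25.70, 108.25, 263.94, 451.87]
STD_OVERALL = 34.60  # overall rate P of the standard population (Total row), per 100,000

person_years = {
    "smokers": [49705, 42633, 28117, 10624, 6137],
    "non-smokers": [189370, 104762, 60043, 27540, 14532],
}
observed_deaths = {"smokers": 432, "non-smokers": 210}

results = {}
for group, years in person_years.items():
    # Expected deaths if the group had the standard age-specific rates: sum(N_i * P_i)
    expected = sum(n * p / 100_000 for n, p in zip(years, std_rates))
    smr = observed_deaths[group] / expected   # SMR = D / sum(N_i * P_i)
    standardized = STD_OVERALL * smr          # P' = P * SMR, per 100,000
    results[group] = (expected, smr, standardized)
    print(f"{group:<12} expected {expected:7.2f}  SMR {smr:.4f}  "
          f"standardized rate {standardized:.2f} per 100,000")
```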
Vocabulary of Chapter 10 (II)
chi-square test (χ² test) 卡方检验
u test u检验
contingency table 列联表
observed frequency 观察频数
theoretical frequency 理论频数
row 行
column 列
adjustment 校正
positive rate 阳性率
equivalent to 等价于
large sample 大样本
significant difference 有统计学意义的差异
1. Sampling error of frequency
Example: Suppose the death rate of rats fed with a kind of poison is 0.2.
What will happen when we do the experiment on n = 1, 2, 3 or 4 rat(s)?
n   Deaths d   Frequency distribution (probability)         Sample rate d/n
1   0          0.8                                          0/1 = 0
    1          0.2                                          1/1 = 1
2   0          0.8 × 0.8 = 0.64                             0/2 = 0
    1          0.8 × 0.2 + 0.2 × 0.8 = 0.32                 1/2 = 0.5
    2          0.2 × 0.2 = 0.04                             2/2 = 1
3   0          0.8 × 0.8 × 0.8 = 0.512                      0/3 = 0
    1          3(0.8 × 0.8 × 0.2) = 0.384                   1/3 ≈ 0.33
    2          3(0.8 × 0.2 × 0.2) = 0.096                   2/3 ≈ 0.67
    3          0.2 × 0.2 × 0.2 = 0.008                      3/3 = 1
4   0          0.8 × 0.8 × 0.8 × 0.8 = 0.4096               0/4 = 0
    1          4(0.8 × 0.8 × 0.8 × 0.2) = 0.4096            1/4 = 0.25
    2          6(0.8 × 0.8 × 0.2 × 0.2) = 0.1536            2/4 = 0.5
    3          4(0.8 × 0.2 × 0.2 × 0.2) = 0.0256            3/4 = 0.75
    4          0.2 × 0.2 × 0.2 × 0.2 = 0.0016               4/4 = 1
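The probabilities in the table are binomial. The sketch below (function names are illustrative) regenerates them for π = 0.2 and checks that the sample rate d/n has mean π and standard deviation sqrt(π(1 − π)/n), which shrinks as n grows:

```python
from math import comb, sqrt


def sample_rate_distribution(n: int, pi: float):
    """(d, probability, sample rate d/n) for d = 0..n events out of n trials."""
    return [(d, comb(n, d) * pi**d * (1 - pi) ** (n - d), d / n) for d in range(n + 1)]


def mean_and_sd(dist):
    """Mean and standard deviation of the sample rate over the distribution."""
    mean = sum(prob * rate for _, prob, rate in dist)
    var = sum(prob * (rate - mean) ** 2 for _, prob, rate in dist)
    return mean, sqrt(var)


PI = 0.2  # true death rate assumed in the rat example
for n in range(1, 5):
    dist = sample_rate_distribution(n, PI)
    for d, prob, rate in dist:
        print(f"n={n} d={d} probability={prob:.4f} sample rate={rate:.2f}")
    mean, sd = mean_and_sd(dist)
    print(f"  mean={mean:.3f} sd={sd:.3f} sqrt(pi(1-pi)/n)={sqrt(PI * (1 - PI) / n):.3f}")
```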
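In general,
Suppose the population proportion is π and the sample size is n. The sample frequency P = X/n (X = number of events) is a random variable with mean π and standard error
$$ \sigma_P = \sqrt{\frac{\pi(1-\pi)}{n}} $$
When π is unknown and n is big enough, σ_P is approximately equal to
$$ s_P = \sqrt{\frac{p(1-p)}{n}} $$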
Example 10-5 HBV surface antigen: 200 people were tested, and 7 were positive.
$$ p = \frac{X}{n} = \frac{7}{200} = 3.5\% $$
$$ s_P = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.035(1-0.035)}{200}} = 0.0130 = 1.30\% $$
If the sample size n is big enough and the observed frequency is p, then approximately
$$ P \sim N\left(\pi, \frac{p(1-p)}{n}\right) $$
2. Confidence Interval of Probability
If the sample size n is big enough and the observed frequency is p, then
95% confidence interval: $$ p \pm 1.96\sqrt{\frac{p(1-p)}{n}} $$
99% confidence interval: $$ p \pm 2.58\sqrt{\frac{p(1-p)}{n}} $$
Example 10-5 HBV surface antigen: 200 people were tested, and 7 were positive.
95% CI: $$ p \pm 1.96\sqrt{\frac{p(1-p)}{n}} = 3.5\% \pm 1.96 \times 1.30\% = 0.95\% \sim 6.05\% $$
99% CI: $$ p \pm 2.58\sqrt{\frac{p(1-p)}{n}} = 3.5\% \pm 2.58 \times 1.30\% = 0.15\% \sim 6.85\% $$
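A minimal sketch of the normal-approximation interval above (often called the Wald interval); the helper name `proportion_ci` is hypothetical:

```python
from math import sqrt


def proportion_ci(x: int, n: int, z: float = 1.96):
    """Normal-approximation interval p +/- z * sqrt(p(1 - p) / n)."""
    p = x / n
    se = sqrt(p * (1 - p) / n)
    return p, se, p - z * se, p + z * se


p, se, lo95, hi95 = proportion_ci(7, 200)          # 95%: z = 1.96
_, _, lo99, hi99 = proportion_ci(7, 200, z=2.58)   # 99%: z = 2.58
print(f"p = {p:.1%}, s_p = {se:.2%}")
print(f"95% CI: {lo95:.2%} ~ {hi95:.2%}")
print(f"99% CI: {lo99:.2%} ~ {hi99:.2%}")
```

Here np = 7 > 5 and n(1 − p) = 193 > 5, so by the usual rule of thumb the normal approximation is acceptable.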
3. The hypothesis testing of proportion (u test)
1. Comparison of a sample proportion with a population proportion
Example 10.6 Cerebral infarction
Method        Cases   Cure rate
New method    98      50%
Routine               30% (population cure rate)
H0: π = 0.3, H1: π ≠ 0.3
Statistic u:
$$ u = \frac{p - \pi_0}{\sqrt{\pi_0(1-\pi_0)/n}} = \frac{0.5 - 0.3}{\sqrt{0.3(1-0.3)/98}} = 4.32 $$
Decision rule: if |u| ≥ u_α, reject H0; otherwise there is no reason to reject H0 (accept H0).
Since |u| = 4.32 > 1.96, reject H0.
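The one-sample u test can be sketched as follows. The helper name is hypothetical, and the two-sided P value computed with `math.erfc` is an addition for illustration; the slides only compare |u| with 1.96:

```python
from math import erfc, sqrt


def one_sample_u(p: float, pi0: float, n: int) -> float:
    """u = (p - pi0) / sqrt(pi0 (1 - pi0) / n); the standard error uses the H0 value pi0."""
    return (p - pi0) / sqrt(pi0 * (1 - pi0) / n)


u = one_sample_u(p=0.50, pi0=0.30, n=98)
p_value = erfc(abs(u) / sqrt(2))  # two-sided P from the standard normal distribution
decision = "reject H0" if abs(u) >= 1.96 else "no reason to reject H0"
print(f"u = {u:.2f}, two-sided P = {p_value:.1e}: {decision}")
```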
2. Comparison of two sample proportions
Example 10.7 Carrier rate of hepatitis B
City: 522 people were tested, 24 carriers, 4.60% (population carrier rate: π1)
Countryside: 478 people were tested, 33 carriers, 6.90% (population carrier rate: π2)
H0: π1 = π2, H1: π1 ≠ π2
Pooled estimate:
$$ p_c = \frac{X_1 + X_2}{n_1 + n_2} = \frac{24 + 33}{522 + 478} = 0.057 $$
Standard error of p1 − p2:
$$ s_{p_1 - p_2} = \sqrt{p_c(1-p_c)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{0.057(1-0.057)\left(\frac{1}{522} + \frac{1}{478}\right)} = 0.0147 $$
Statistic u:
$$ u = \frac{|p_1 - p_2|}{s_{p_1 - p_2}} = \frac{|0.046 - 0.069|}{0.0147} = 1.565 $$
Decision rule: if u ≥ u_α, reject H0; otherwise there is no reason to reject H0 (accept H0).
Since u = 1.565 < 1.96, H0 is not rejected.
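A sketch of the two-sample u test with the pooled proportion in the standard error (the helper name is hypothetical; the P value via `math.erfc` is an illustrative addition). With unrounded proportions u comes out near 1.571 rather than 1.565; the conclusion is unchanged:

```python
from math import erfc, sqrt


def two_sample_u(x1: int, n1: int, x2: int, n2: int):
    """u test for pi1 = pi2 using the pooled proportion p_c in the standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pc = (x1 + x2) / (n1 + n2)                      # pooled estimate
    se = sqrt(pc * (1 - pc) * (1 / n1 + 1 / n2))   # s_{p1 - p2} under H0
    return pc, se, abs(p1 - p2) / se


pc, se, u = two_sample_u(24, 522, 33, 478)  # city vs countryside
p_value = erfc(u / sqrt(2))                  # two-sided P
print(f"p_c = {pc:.3f}, s = {se:.4f}, u = {u:.3f}, P = {p_value:.3f}")
print("reject H0" if u >= 1.96 else "H0 not rejected")
```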
Summary
The parameter estimation and hypothesis testing of proportions are based on the normal approximation (when the sample size is big enough).
How big is enough?
By experience, nπ > 5 and n(1 − π) > 5.
If the sample size is not big, the u test cannot be used, and there is no t test for proportions (see a more detailed textbook).
The u test can only be used for comparing a proportion with a given π0 (one sample) or comparing π1 with π2 (two samples).
If we need to compare more than two samples, the chi-square test is widely used.
1. Basic idea of χ² test
Given a set of observed frequencies A1, A2, A3, …, we want to test whether the data follow a certain theory. If the theory is true, we will have a set of theoretical frequencies T1, T2, T3, ….
Compare A1, A2, A3, … with T1, T2, T3, …:
If they are quite different, the theory might not be true;
Otherwise, the theory is acceptable.
Example 10-8 Acute lower respiratory infection (theoretical frequencies in parentheses)
Treatment   Effect          Non-effect     Total        Effect rate
Drug A      68 (64.82) a    6 (9.18) b     74 (a+b)     91.89%
Drug B      52 (55.18) c    11 (7.82) d    63 (c+d)     82.54%
Total       120 (a+c)       17 (b+d)       137          87.59%
(2) Chi-square test for 2×2 table
H0: π1 = π2, H1: π1 ≠ π2, α = 0.05
To calculate the theoretical frequencies: if H0 is true, π1 = π2 ≈ 120/137, so
T11 = 74 × 120/137 = 64.82, T21 = 63 × 120/137 = 55.18
T12 = 74 × 17/137 = 9.18, T22 = 63 × 17/137 = 7.82
In general, with n_R = row total and n_C = column total,
$$ T_{RC} = \frac{n_R n_C}{n} $$
To compare A and T by the statistic χ²:
$$ \chi^2 = \frac{(A_{11} - T_{11})^2}{T_{11}} + \frac{(A_{12} - T_{12})^2}{T_{12}} + \cdots = \sum \frac{(A - T)^2}{T} $$
If H0 is true, χ² follows a chi-square distribution with ν = (row − 1)(column − 1) degrees of freedom. If the χ² value is big enough, we doubt H0 and reject it!
For Example 10-8,
$$ \chi^2 = \frac{(68 - 64.82)^2}{64.82} + \frac{(6 - 9.18)^2}{9.18} + \frac{(52 - 55.18)^2}{55.18} + \frac{(11 - 7.82)^2}{7.82} = 2.734 $$
ν = (row − 1)(column − 1) = (2 − 1)(2 − 1) = 1, χ²_{0.05(1)} = 3.84.
Now χ² = 2.734 < 3.84, P > 0.05, so H0 is not rejected.
We have no reason to say the effects of the two treatments are different.
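For a 2×2 table, there is a specific formula for the chi-square calculation:
$$ \chi^2 = \frac{(ad - bc)^2 n}{(a+b)(c+d)(a+c)(b+d)} $$
For Example 10-8,
$$ \chi^2 = \frac{(68 \times 11 - 6 \times 52)^2 \times 137}{74 \times 63 \times 120 \times 17} = 2.738 $$
The small difference from 2.734 comes from rounding the theoretical frequencies to two decimals.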
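Both routes to χ² can be checked side by side. This sketch (helper names are hypothetical) computes T_RC = n_R n_C / n, the sum of (A − T)²/T, and the specific 2×2 formula; at full precision the two agree exactly:

```python
def expected_counts(table):
    """Theoretical frequencies T_RC = n_R * n_C / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[nr * nc / n for nc in col_totals] for nr in row_totals]


def pearson_chi2(table):
    """chi^2 = sum over cells of (A - T)^2 / T."""
    expected = expected_counts(table)
    return sum((a - t) ** 2 / t
               for obs_row, exp_row in zip(table, expected)
               for a, t in zip(obs_row, exp_row))


def chi2_2x2_shortcut(a, b, c, d):
    """Specific formula for a 2x2 table: (ad - bc)^2 n / [(a+b)(c+d)(a+c)(b+d)]."""
    n = a + b + c + d
    return (a * d - b * c) ** 2 * n / ((a + b) * (c + d) * (a + c) * (b + d))


table = [[68, 6],   # Drug A: effect, non-effect
         [52, 11]]  # Drug B: effect, non-effect
print(f"sum of (A-T)^2/T: {pearson_chi2(table):.3f}")
print(f"specific formula: {chi2_2x2_shortcut(68, 6, 52, 11):.3f}")
```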
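Large sample is required:
(1) If N ≥ 40 and all Ti ≥ 5, use the χ² formula directly.
(2) If N < 40 or any Ti < 1, the χ² test is not applicable.
(3) If N ≥ 40 and 1 ≤ Ti < 5, an adjustment is needed:
$$ \chi^2 = \frac{(|A_{11} - T_{11}| - 0.5)^2}{T_{11}} + \frac{(|A_{12} - T_{12}| - 0.5)^2}{T_{12}} + \cdots $$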
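Example 10-9 Hematosepsis (theoretical frequencies in parentheses)
Treatment   Effective     No effect    Total   Effective rate (%)
Drug A      28 (26.09)    2 (3.91)     30      93.33
Drug B      12 (13.91)    4 (2.09)     16      75.00
Total       40            6            46      86.96
H0: π1 = π2, H1: π1 ≠ π2, α = 0.05
Since N = 46 ≥ 40 but T22 = 2.09 < 5, the adjusted formula is used:
$$ \chi^2 = \frac{(|ad - bc| - n/2)^2 n}{(a+b)(c+d)(a+c)(b+d)} = \frac{(|28 \times 4 - 2 \times 12| - 46/2)^2 \times 46}{30 \times 16 \times 40 \times 6} = 1.687 $$
ν = (2 − 1)(2 − 1) = 1, χ²_{0.05(1)} = 3.84.
χ² = 1.687 < 3.84, P > 0.05, H0 is not rejected.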
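The large-sample rules (1) to (3) can be turned into a small decision function. This is a sketch under the rules exactly as stated in these slides (other textbooks use slightly different cut-offs); the name `chi2_2x2` is hypothetical:

```python
def chi2_2x2(a: int, b: int, c: int, d: int):
    """Pick the 2x2 chi-square formula by the large-sample rules (1)-(3)."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    min_t = min(rows) * min(cols) / n        # smallest theoretical frequency
    if n < 40 or min_t < 1:
        raise ValueError("chi-square test not applicable (rule 2)")
    denom = rows[0] * rows[1] * cols[0] * cols[1]
    if min_t >= 5:                           # rule 1: no adjustment
        return (a * d - b * c) ** 2 * n / denom, "uncorrected"
    # rule 3: adjusted (continuity-corrected) formula
    return (abs(a * d - b * c) - n / 2) ** 2 * n / denom, "corrected"


chi2, method = chi2_2x2(28, 2, 12, 4)  # Example 10-9
print(f"{method} chi-square = {chi2:.3f}")
```

For a 2×2 table every cell has the same |A − T| = |ad − bc|/n, which is why the cell-by-cell adjusted sum collapses into this single expression.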
(3) χ² test for paired 2×2 table
Example 10-10 Two diagnosis methods are used respectively for 53 cases of lung cancer.
Question: Are the two positive rates equal?
               Method B
Method A       +          -          Total
+              25 (a)     2 (b)      27
-              11 (c)     15 (d)     26
Total          36         17         53
Note: The two samples are not independent, so the above χ² test for a 2×2 table does not work.
Basic idea: comparing p_A = (a + b)/n with p_B = (a + c)/n is equivalent to comparing b = 2 with c = 11, because cell a is common to both. Given the 13 patients on whom the two methods disagree, do they fall into the two cells b and c with equal chance?
$$ p_A = \frac{25 + 2}{53}, \qquad p_B = \frac{25 + 11}{53} $$
H0: π1 = π2, H1: π1 ≠ π2, α = 0.05
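When H0 is true, with A1 = b and A2 = c, the theoretical frequencies are T1 = T2 = (b + c)/2, so
$$ \chi^2 = \frac{\left(b - \frac{b+c}{2}\right)^2}{\frac{b+c}{2}} + \frac{\left(c - \frac{b+c}{2}\right)^2}{\frac{b+c}{2}} = \frac{(b - c)^2}{b + c} $$
This is for a large sample (b + c > 40); otherwise an adjustment is needed:
$$ \chi^2 = \frac{(|b - c| - 1)^2}{b + c} $$
If the χ² value is too big, reject H0.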
Example 10-10: since b + c = 13 < 40, the adjusted formula is used:
$$ \chi^2 = \frac{(|2 - 11| - 1)^2}{2 + 11} = 4.92 $$
ν = 1, 4.92 > 3.84, P < 0.05, H0 is rejected.
Conclusion: There is a significant difference in positive rates between the two diagnosis methods.
Since p_A < p_B (27/53 vs 36/53), method B is better: it detects more of the lung cancer cases.
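The paired test (widely known as McNemar's test) uses only the discordant cells b and c. A sketch following the slides' b + c > 40 threshold; the function name `paired_chi2` is hypothetical and 3.84 is the χ²_{0.05(1)} critical value quoted above:

```python
def paired_chi2(b: int, c: int) -> float:
    """Paired 2x2 chi-square from the discordant cells b and c only."""
    if b + c > 40:                             # large sample: (b - c)^2 / (b + c)
        return (b - c) ** 2 / (b + c)
    return (abs(b - c) - 1) ** 2 / (b + c)     # adjusted form when b + c <= 40


a, b, c, d = 25, 2, 11, 15   # Example 10-10
n = a + b + c + d
p_a, p_b = (a + b) / n, (a + c) / n
chi2 = paired_chi2(b, c)
print(f"p_A = {p_a:.3f}, p_B = {p_b:.3f}, chi-square = {chi2:.2f}")
print("H0 rejected" if chi2 > 3.84 else "H0 not rejected")
```

Cell a and cell d never enter the statistic: patients on whom the two methods agree carry no information about which method has the higher positive rate.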
(4) Chi-square test for R×C table
Table 6.6 Blood types of patients with different diseases
Disease status     Blood type                  Total
                   A        B       O
Digestive ulcer    679      134     983        1796
Stomach cancer     416      84      383        883
Control            2625     570     2892       6087
Total              3720     788     4258       8766
Remark: There is no order among the categories!
H0: The distributions of blood types in the three populations are all the same
H1: The distributions are not all the same
To calculate theoretical frequencies:
$$ T_{RC} = \frac{n_R n_C}{n} $$
To compare A and T by the statistic χ²:
$$ \chi^2 = \frac{(A_{11} - T_{11})^2}{T_{11}} + \frac{(A_{12} - T_{12})^2}{T_{12}} + \cdots $$
Specific formula:
$$ \chi^2 = n\left(\sum \frac{A^2}{n_R n_C} - 1\right) $$
$$ \chi^2 = 8766\left(\frac{679^2}{1796 \times 3720} + \cdots + \frac{2892^2}{6087 \times 4258} - 1\right) = 40.543 $$
ν = (3 − 1)(3 − 1) = 4, χ²_{0.05(4)} = 9.488, P < 0.05, H0 is rejected.
Conclusion: the three groups (two diseases and the control) might have different distributions of blood types.
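The specific formula extends to any R×C table. This sketch (the name `rc_chi2` is hypothetical) also reports the smallest theoretical frequency, which is the cell built from the smallest row total and smallest column total, so the large-sample requirement can be checked at the same time:

```python
def rc_chi2(table):
    """R x C chi-square via the specific formula n * (sum A^2 / (n_R n_C) - 1)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    s = sum(a * a / (row_totals[i] * col_totals[j])
            for i, row in enumerate(table)
            for j, a in enumerate(row))
    chi2 = n * (s - 1)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    min_t = min(row_totals) * min(col_totals) / n  # smallest theoretical frequency
    return chi2, df, min_t


blood = [[679, 134, 983],     # digestive ulcer: A, B, O
         [416, 84, 383],      # stomach cancer
         [2625, 570, 2892]]   # control
chi2, df, min_t = rc_chi2(blood)
print(f"chi-square = {chi2:.3f}, df = {df}, smallest T = {min_t:.2f}")
print("H0 rejected" if chi2 > 9.488 else "H0 not rejected")  # 9.488 = chi2 critical value, df 4
```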
Caution:
(1) Both 2×2 tables and R×C tables are called contingency tables; the 2×2 table is a special case of the R×C table.
(2) When R > 2, “H0 is rejected” only means there are differences among some groups. It does not necessarily mean that all the groups are different.
(3) The χ² test requires a large sample. By experience:
The theoretical frequencies should be greater than 5 in more than 4/5 of the cells;
The theoretical frequency in every cell should be greater than 1.
Otherwise, we cannot use the chi-square test directly.
If the above requirements are violated, what should we do?
(1) Increase the sample size.
(2) Re-organize the categories: pool some categories, or cancel some categories.
Think: In fact, it is not appropriate to use a Chi-square test for Example 10-10 in the textbook. Why?