laurencia-leny
Research methodology
[email protected]/2010
Population is a large group of study subjects (humans, animals, tissues, blood specimens, medical records, etc.) with defined characteristics [“Population is a group of study subjects defined by the researcher as population”]
Sample is a subset of the population which will be directly investigated. The sample should be (or be assumed to be) representative of the population; otherwise all statistical analyses will be invalid
All investigations are always performed in the sample, and the results are then applied to the population
Avoid using ambiguous terms
• Sample population
• Sampled population
• Populasi sampel (sample population)
• Study population ~ sample
Gap between Das Sein & Das Sollen
Literature study
Research question(s) / Hypothesis
Methods / Design
Data collection & analyses
Conclusions
In the real world (“Population”)
In the sample
Infer
The sample is assumed to be representative of the population. In research, measurements are always done in the sample, and the results are applied to the population.
[Figure: sample (S) and population (P)]
Target population → Accessible population → Intended sample → Actual study subjects
Target population = domain = population to which the results of the study will be applied. In clinical research it is usually characterized by demographic & clinical characteristics; e.g. normal infants, teens with epilepsy, post-menopausal women with osteoporosis.
Accessible population = subset of the target population which can be accessed by the investigator. Frame: time & place. Example: teens with epilepsy in RSCM, 2000-2005; women with osteoporosis, 2002 RSGS
Intended sample = subjects who meet the eligibility criteria and are selected to be included in the study
Actual study subjects = subjects who actually completed participation in the study
Target population (demographic, clinical)
  ↓ Usually based on practical purposes
Accessible population (+ time, place)
  ↓ Appropriate sampling technique
Intended sample [Subjects selected for study]
  ↓ [Non-response, drop outs, withdrawals, loss to follow-up]
Actual study subjects [Subjects who completed the study]
Target population (TP, domain)
  [External validity II: Does AP represent TP?]
Accessible population (AP)
  [External validity I: Does IS represent AP?]
Intended sample (IS)
  [Internal validity: Does ASS represent IS?]
Actual study subjects (ASS)
Validity: Internal & external
Internal validity: how well the study was done (usually measurement, but also including whether the actual study subjects represent the intended sample). Many drop outs? Loss to follow-up? Low compliance?
External validity I: whether the intended sample represents the accessible population (random sampling? convenience sampling?)
External validity II: whether the accessible population represents the target population. This cannot be calculated, but can be judged by common sense & general knowledge
Sampling methods
A. Probability sampling
• Simple random sampling (random number table, computer generated)
• Stratified random sampling
• Systematic sampling
• Cluster sampling
• Others: two-stage cluster sampling, etc.
B. Non-probability sampling
• Consecutive sampling
• Convenience sampling
• Judgmental sampling / Purposive sampling
Predicting the 1936 Election
In 1936, Literary Digest mailed questionnaires to 10 million people, asking who they would vote for in the upcoming presidential election. The list was compiled from magazine subscribers, car owners and telephone directories. Based on the 2.3 million responses, they predicted a victory for Republican Landon over Roosevelt by a margin of roughly 60 to 40.
Roosevelt won with 61% of the vote, to 36% for Landon.
George Gallup correctly predicted the election (and the results of the Literary Digest poll!) to within 1 percent, using random samples.
Probability sampling (1)
Simple random sampling: select 50 out of 900 students
1. Using a random number table:
   Example: 146*72 2*238*9 12*970 *127*63 8*759*0 29*874 *390*48 6*8301
2. Using computer-generated random numbers (pseudo-random)
   Command: How many subjects do you have? 900
   How many do you want to select? 50
   Enter → 017, 068, 113, 142, etc.
Repeating the procedure exactly will result in completely different numbers
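The computer-generated selection can be sketched in a few lines of Python. This is only an illustration, not the program used on the slide: the function name `simple_random_sample` and the optional `seed` argument are assumptions introduced here. Leaving `seed` unset gives a different selection on every run, as noted above; fixing it makes a draw reproducible.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Draw sample_size distinct subject numbers from 1..population_size."""
    rng = random.Random(seed)  # seed=None: a different draw on each run
    return sorted(rng.sample(range(1, population_size + 1), sample_size))

# Select 50 out of 900 students
selected = simple_random_sample(900, 50)
print(selected[:4])
```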
Simple Random Sample: n = 20, N= 2000
Probability sampling (2)
Systematic sampling: every m-th subject is selected, starting from a selected number k
Example: k = 3, m = 10: 3, 13, 23, 33, 43, etc.
Better (more representative) than simple random sampling (SRS) if there are no natural trends or strata
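A minimal sketch of the same rule in Python, assuming subjects are numbered 1 to N. The function name and the population size of 50 are illustrative assumptions; in practice the starting number k is usually picked at random between 1 and m.

```python
def systematic_sample(population_size, start, interval):
    """Select subject numbers start, start + interval, ... up to population_size."""
    return list(range(start, population_size + 1, interval))

# k = 3, m = 10, with a hypothetical population of 50 subjects
print(systematic_sample(50, start=3, interval=10))  # [3, 13, 23, 33, 43]
```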
Probability sampling (2)
Stratified [random] sampling: random sampling is done in each stratum separately, e.g. by sex, age group, stage of disease, etc.
The results are then combined
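As a rough illustration, stratified random sampling amounts to running a simple random draw inside each stratum and then pooling the draws. The strata, subject IDs and the equal number drawn per stratum below are hypothetical choices for the sketch, not part of the original material.

```python
import random

def stratified_sample(strata, per_stratum, seed=None):
    """strata maps a stratum name to its list of subject IDs.
    Returns a dict with a simple random sample drawn from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}

# Hypothetical strata by sex, 100 subjects each
strata = {
    "male": list(range(1, 101)),
    "female": list(range(101, 201)),
}
by_stratum = stratified_sample(strata, per_stratum=5)
# Combine the per-stratum results into one sample
combined = [subject for drawn in by_stratum.values() for subject in drawn]
print(len(combined))  # 10
```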
Stratified sample of 20 from 4 strata
Probability sampling (3)
Cluster sampling
Subjects are selected separately according to cluster or place (RT, RW, district, etc.)
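One common variant, one-stage cluster sampling, picks whole clusters at random and includes every subject in the chosen clusters. The sketch below assumes that variant; the cluster names, sizes and function name are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick n_clusters whole clusters; include all of their subjects."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [subject for name in chosen for subject in clusters[name]]

# Hypothetical: 10 clusters (e.g. RTs) of 4 subjects each
clusters = {f"RT{i:02d}": [f"RT{i:02d}-{j}" for j in range(1, 5)]
            for i in range(1, 11)}
sample = cluster_sample(clusters, n_clusters=5)
print(len(sample))  # 20
```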
Cluster Sample of 20 (cluster size = 4)
Non-probability sampling (1)
Consecutive sampling:
Subjects are selected according to their appearance on the list
Most commonly used in clinical studies
Can be expected to resemble random sampling if the time span is long enough
This is the best of the non-probability sampling methods
Non-probability sampling (2)
• Convenience sampling
• Judgmental sampling
They are rarely justified except for certain conditions, e.g. normal values
Note
All statistical analyses (inferences) are based on (simple) random sampling
Whether or not a sample is representative of the population depends on whether or not it resembles the sample that random sampling would have produced
Introduction to statistical inference
How to generalize results in the sample to the population
IMPORTANT!!! Statistical significance vs. clinical importance
• A negligible clinical difference may be statistically very significant if the number of subjects is very large, e.g. difference in reduction of cholesterol level of 3 mg/dl, n1 = n2 = 10,000; p = 0.00002
• A large clinical difference may be statistically non-significant if the number of subjects is very small, e.g. 30% difference in cure rate, n1 = n2 = 10, p = 0.74
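[Figure: Clinical importance vs. statistical significance. Randomization (R) to standard treatment or new treatment, n = 10,000 in each group. Cholesterol level, mg/dl: mean 300 in both groups at the start; means 200 and 197 after treatment. t-test, df = 9998, p = 0.00002. Clinically the difference is negligible; statistically it is highly significant.]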
Clinical significance vs. statistical significance
Group          Cured    Died
Standard Rx    0        10 (100%)
New Rx         3        7 (70%)
Clinical: absolute risk reduction = 30%
Statistical: Fisher exact test, p = 0.211
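The Fisher exact p value for this 2x2 table can be checked by hand with the hypergeometric distribution. The following is a small sketch using only the standard library; the function name is an assumption, and the two-sided rule used (sum the probabilities of all tables no more likely than the observed one) is one common convention rather than necessarily the one used on the slide.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= observed * (1 + 1e-9))

# Standard Rx: 0 cured, 10 died; New Rx: 3 cured, 7 died
p = fisher_exact_two_sided(0, 10, 3, 7)
print(round(p, 3))  # 0.211
```

A 30% absolute risk reduction is clinically large, yet with only 10 subjects per group the result does not reach p < 0.05.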
Abstract
• Objectives:
• Methods:
• Results: After 2 months of treatment, there was a significant difference in LDL (P = 0.0032) and HDL (P = 0.048), but there was no significant difference in triglyceride (P = 0.073) between the 2 groups.
• Conclusion:
Introduction to statistical inference
Can the results of the study (in the sample) be applied to the accessible or target population?
Hypothesis testing & confidence interval
Statistic and Parameter
• An observed value drawn from the sample is called a statistic (cf. statistics, the science)
• The corresponding value in the population is called a parameter
• We measure, analyze, etc. statistics and translate them as parameters
Examples of statistics:
• Proportion, percentage, mean, median, mode, difference in proportion/mean
• OR, RR, sensitivity, specificity, kappa, LR, NNT
There are 2 ways of inferring a parameter from a statistic:
• Hypothesis testing → p value
• Estimation → confidence interval (CI)
P value & CI express the same concept in different ways
P value
The probability of obtaining the observed result (or a more extreme one) if Ho were true; loosely, the probability that the observed result is caused solely by chance
Group    Success     Failure     Total
C        30 (60%)    20 (40%)    50
E        40 (80%)    10 (20%)    50
X2= ; df = 1; p = 0.0432
Group    Success     Failure     Total
C        30 (60%)    20 (40%)    50
E        40 (80%)    10 (20%)    50
X2= ; df = 1; p = 0.0432
If drugs E and C were equally effective, we could still obtain the above result (difference in success rate of 20%), but the probability is small (4.32%)
In other words: if drugs E and C were equally effective, the probability of obtaining this result merely by chance is 4.32%
If we define in advance that p < 0.05 is significant, then the result is called statistically significant
Similar interpretation applies to ALL hypothesis testing: t-test, Anova, non-parametric tests, Pearson correlation, multivariate tests, etc.:
If the null hypothesis were true, the probability of obtaining the result is ……. (for example 0.02 or 2%, etc.)
Confidence Intervals
Estimate the range of values (parameter) in the population using a statistic in the sample (as point estimate)
[Figure: A statistic X (point estimate) observed in the sample (S) is projected to the population (P) as a range, the CI. If the observed result in the sample is X, what is the figure in the population?]
Most commonly used CI:
• CI 90% corresponds to p 0.10
• CI 95% corresponds to p 0.05
• CI 99% corresponds to p 0.01
Note:
• p value only for analytical studies
• CI for descriptive and analytical studies
How to calculate CI
General formula:
$CI = p \pm Z_{\alpha} \times SE$
• p = point estimate, a value drawn from the sample (a statistic)
• $Z_{\alpha}$ = standard normal deviate for $\alpha$; if $\alpha$ = 0.05, Z = 1.96 (~ 95% CI)
• SE = standard error of the point estimate
Example 1
100 FKUI students → 60 females (p = 0.6)
What is the proportion of females among Indonesian FK (medical faculty) students? (assuming FKUI represents FK in Indonesia)
$SE(p) = \sqrt{\dfrac{pq}{n}}$
$95\%\ CI = 0.60 \pm 1.96 \times \sqrt{\dfrac{0.6 \times 0.4}{100}} \approx 0.60 \pm 1.96 \times \dfrac{0.5}{10} \approx 0.60 \pm 0.10 = 0.50;\ 0.70$
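The same calculation can be reproduced with a short Python sketch, using the normal-approximation formula p ± Z × SE with SE = √(pq/n). The function name is an assumption made for illustration.

```python
from math import sqrt

def ci_proportion(successes, n, z=1.96):
    """Normal-approximation CI for a proportion: p +/- z * sqrt(p*q/n)."""
    p = successes / n
    se = sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se

# 60 females among 100 students
low, high = ci_proportion(60, 100)
print(f"{low:.2f} to {high:.2f}")  # 0.50 to 0.70
```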
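Example 2: CI of the mean
• 100 newborn babies, mean BW = 3000 (SD = 400) grams, what is 95% CI?
$95\%\ CI = \bar{x} \pm 1.96 \times SEM$
$SEM = \dfrac{SD}{\sqrt{n}}$
$95\%\ CI = 3000 \pm 1.96 \times \dfrac{400}{\sqrt{100}} \approx 3000 \pm 80 = (3000 - 80);\ (3000 + 80) = 2920;\ 3080$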
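A minimal Python sketch of this calculation, with an assumed function name. Without rounding 1.96 × 40 = 78.4 up to 80, the interval comes out slightly narrower than the rounded figures.

```python
from math import sqrt

def ci_mean(mean, sd, n, z=1.96):
    """CI for a mean: mean +/- z * SEM, where SEM = SD / sqrt(n)."""
    sem = sd / sqrt(n)
    return mean - z * sem, mean + z * sem

# 100 newborns, mean birth weight 3000 g, SD 400 g
low, high = ci_mean(3000, 400, 100)
print(f"{low:.1f} to {high:.1f}")  # 2921.6 to 3078.4
```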
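Example 3: CI of difference between proportions (p1-p2)
• 50 patients with drug A, 30 cured (p1 = 0.6)
• 50 patients with drug B, 40 cured (p2 = 0.8)
$95\%\ CI(p_1 - p_2) = (p_1 - p_2) \pm 1.96 \times SE(p_1 - p_2)$
$SE(p_1 - p_2) = \sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}} = \sqrt{\dfrac{(0.6 \times 0.4)}{50} + \dfrac{(0.8 \times 0.2)}{50}} = \sqrt{\dfrac{0.4}{50}} \approx 0.09$
Taking the absolute difference in cure rate, 0.8 - 0.6 = 0.2:
$95\%\ CI = 0.2 \pm 1.96 \times 0.09 \approx (0.2 - 0.18);\ (0.2 + 0.18) = 0.02;\ 0.38$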
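The sketch below applies the general formula (difference ± 1.96 × SE) to these data. The function name is an assumption, and the difference is taken as drug B minus drug A so that it is positive.

```python
from math import sqrt

def ci_diff_proportions(x1, n1, x2, n2, z=1.96):
    """CI for p2 - p1 using SE = sqrt(p1*q1/n1 + p2*q2/n2)."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p2 - p1
    return diff - z * se, diff + z * se

# Drug A: 30 of 50 cured; drug B: 40 of 50 cured
low, high = ci_diff_proportions(30, 50, 40, 50)
print(f"{low:.2f} to {high:.2f}")  # 0.02 to 0.38
```

The interval excludes 0, which agrees with a p value below 0.05 for the same data.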
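Example 4: CI for difference between 2 means
Mean systolic BP:
• 50 smokers = 146.4 (SD 18.5) mmHg
• 50 non-smokers = 140.4 (SD 16.8) mmHg
$\bar{x}_1 - \bar{x}_2 = 6.0$ mmHg
$95\%\ CI(\bar{x}_1 - \bar{x}_2) = (\bar{x}_1 - \bar{x}_2) \pm 1.96 \times SE(\bar{x}_1 - \bar{x}_2)$
$SE(\bar{x}_1 - \bar{x}_2) = s \times \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$
$s = \sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} = \sqrt{\dfrac{(49)18.5^2 + (49)16.8^2}{98}} \approx 17.7$
$SE(\bar{x}_1 - \bar{x}_2) = 17.7 \times \sqrt{\dfrac{1}{50} + \dfrac{1}{50}} \approx 3.53$
$95\%\ CI = 6.0 \pm (1.96 \times 3.53) \approx -1.0;\ 13.0$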
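For readers who want to verify the arithmetic, here is a small Python sketch of the pooled-SD calculation. The function name is an assumption, and Z = 1.96 is used as on the slide rather than a t quantile.

```python
from math import sqrt

def ci_diff_means(m1, sd1, n1, m2, sd2, n2, z=1.96):
    """CI for m1 - m2 using the pooled standard deviation."""
    pooled_sd = sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    se = pooled_sd * sqrt(1 / n1 + 1 / n2)
    diff = m1 - m2
    return diff - z * se, diff + z * se

# Smokers: 146.4 (SD 18.5), n = 50; non-smokers: 140.4 (SD 16.8), n = 50
low, high = ci_diff_means(146.4, 18.5, 50, 140.4, 16.8, 50)
print(f"{low:.1f} to {high:.1f}")  # -0.9 to 12.9
```

The interval includes 0, so a 6 mmHg difference with these sample sizes is not statistically significant at the 0.05 level.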
Other commonly supplied CI
• Relative risk (RR)
• Odds ratio (OR)
• Sensitivity, specificity (Se, Sp)
• Likelihood ratio (LR)
• Relative risk reduction (RRR)
• Number needed to treat (NNT)
Suggested CI presentation:
• 95%CI: 1.5 to 4.5
• 95%CI: -2.5 to 4.3
• 95%CI: 12 to -6
• Not recommended: 3 + 1.5
• Not recommended: -9 + -3
In contrast to the CI for a proportion, a mean, or a difference between proportions/means, where the values of the CI are symmetrical around the point estimate, CIs for RR, OR, LR, NNT are asymmetrical because the calculations involve logarithms
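To see where the asymmetry comes from, the sketch below computes a 95% CI for an odds ratio on the log scale (the Woolf method, chosen here for illustration; the function name is an assumption). It reuses the E vs. C counts from the P value example: E 40 successes and 10 failures, C 30 successes and 20 failures.

```python
from math import exp, log, sqrt

def ci_odds_ratio(a, b, c, d, z=1.96):
    """OR and CI for the 2x2 table [[a, b], [c, d]], computed on the log scale."""
    odds_ratio = (a * d) / (b * c)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    low = exp(log(odds_ratio) - z * se_log)
    high = exp(log(odds_ratio) + z * se_log)
    return odds_ratio, low, high

odds_ratio, low, high = ci_odds_ratio(40, 10, 30, 20)
print(f"OR = {odds_ratio:.2f}, 95% CI {low:.2f} to {high:.2f}")
# OR = 2.67, 95% CI 1.09 to 6.52
```

The interval is symmetrical around ln(OR) but, after exponentiation, extends further above the point estimate than below it.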
Examples
• RR = 5.6 (95% CI 1.2 ; 23.7)
• OR = 12.8 (95% CI 3.6 ; 44.2)
• NNT = 12 (95% CI 9 ; 26)
If p value < 0.05, then the 95% CI:
• excludes 0 (for a difference), because if A = B then A - B = 0 → p > 0.05
• excludes 1 (for a ratio), because if A = B then A/B = 1 → p > 0.05
For a small number of subjects, a computer-calculated CI may not meet this rule due to the correction for continuity automatically done by the computer
Concluding remarks
• In every study the sample should be (or be assumed to be) representative of the population. Otherwise all statistical calculations are not valid
• The p value (hypothesis testing) gives the probability of obtaining the result in the sample merely by chance; it does not give the magnitude and direction of the difference
• The confidence interval (estimation) gives an estimate of the value in the population given one result in the sample; it gives the magnitude and direction of the difference
Concluding remarks
• The p value alone tends to equate statistical significance with clinical importance
• CI avoids this confusion because it provides an estimate of clinical values and also indicates statistical significance
• Whenever applicable, supply the CI, especially for the main results of the study
• In critical appraisal of study results, focus should be on the CI rather than on the p value