Power Analysis and Sample Size Determination
AK Dhamija
Researchers differ
A researcher conducted a study comparing the effect of an intervention vs placebo on reducing body weight, and found a 5 kg reduction in the intervention group with P = 0.01.
Another researcher conducted a similar study comparing the same intervention with the same placebo, and found the same 5 kg reduction in the intervention group, but could not claim that the intervention was effective because P = 0.35.
Agenda
Power
Sample Size Calculations
Examples
Changes in the basic formulae
Flaws in Statements
Power is Affected by...
Variation in the outcome (σ²): ↓ σ² → power ↑
Significance level (α): ↑ α → power ↑
Difference (effect) to be detected (δ): ↑ δ → power ↑
One-tailed vs. two-tailed tests: power is greater in one-tailed tests than in comparable two-tailed tests
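Power Changes
Baseline: 2n = 32 (16 per group), 2-sample test, 81% power, δ = 2, σ = 2, α = 0.05, 2-sided test
Variance/standard deviation
– σ: 2 → 1, power: 81% → 99.99%
– σ: 2 → 3, power: 81% → 47%
Significance level (α)
– α: 0.05 → 0.01, power: 81% → 69%
– α: 0.05 → 0.10, power: 81% → 94%
(The two α figures match a one-sided test at the new α. If the test stays two-sided, the normal approximation gives about 60% and 88%.)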
Power Changes
Baseline: 2n = 32 (16 per group), 2-sample test, 81% power, δ = 2, σ = 2, α = 0.05, 2-sided test
Difference to be detected (δ)
– δ: 2 → 1, power: 81% → 29%
– δ: 2 → 3, power: 81% → 99%
Sample size (total, 2n)
– 2n: 32 → 64, power: 81% → 98%
– 2n: 32 → 28, power: 81% → 75%
One-tailed vs. two-tailed tests
– Power: 81% → 88%
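The power figures on the "Power Changes" slides can be approximated with a normal (z-test) calculation. The following is a minimal sketch, assuming a two-sample comparison with known σ and equal group sizes; the function name `power_two_sample` and its parameters are illustrative and do not come from the slides.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sample(n_per_group, delta, sigma, alpha=0.05, two_sided=True):
    """Approximate power of a two-sample z-test with equal group sizes."""
    z = NormalDist()
    tail = alpha / 2 if two_sided else alpha
    z_crit = z.inv_cdf(1 - tail)
    # Distance between the group means in standard-error units
    shift = delta / (sigma * sqrt(2 / n_per_group))
    # Ignores the negligible chance of rejecting in the wrong direction
    return z.cdf(shift - z_crit)


baseline = dict(n_per_group=16, delta=2, sigma=2)
print(f"baseline:       {power_two_sample(**baseline):.2%}")   # slide reports 81%
print(f"delta = 1:      {power_two_sample(16, 1, 2):.2%}")      # slide reports 29%
print(f"2n = 64:        {power_two_sample(32, 2, 2):.2%}")      # slide reports 98%
print(f"one-sided test: {power_two_sample(16, 2, 2, two_sided=False):.2%}")  # slide reports 88%
```

Under these assumptions, the slide's figures for α = 0.01 and α = 0.10 (69% and 94%) are reproduced with `two_sided=False`; with `two_sided=True` the same calls give roughly 60% and 88%.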
Power Formula
Depends on study design
Not hard, but can be VERY algebra intensive
May want to use a computer program or statistician
How Big a Sample Do We Need?
Fundamental research question
Should be addressed after determining the primary objective and study design
Too few patients in a clinical study
– May fail to detect a clinically important difference
Too many
– Involves extra patients
– Therapy may have risks
– Costs more
How Big a Sample Do We Need?
Fundamental research question: how big?
18   180   1,800   18,000   180,000
Sample Size Formula Information
Variables of interest
– Type of data, e.g. continuous, categorical
Desired power
Desired significance level
Effect/difference of clinical importance
Standard deviations of continuous outcome variables
One- or two-sided tests
Sample Size & Study Design
Randomized controlled trial (RCT)
Block/stratified-block randomized trial
Equivalence trial
Non-randomized intervention study
Observational study
Prevalence study
Measuring sensitivity and specificity
Sample Size & Data Structure
Paired data
Repeated measures
Groups of equal sizes
Hierarchical data
Sample Size
Non-randomized studies looking for differences or associations require a larger sample to allow adjustment for confounding factors
Absolute sample size is of interest
– Surveys sometimes take a % of population approach
The study's primary outcome is the variable you do the sample size calculation for
– If secondary outcome variables are considered important, make sure the sample size is sufficient for them too
Increase the 'real' sample size to reflect loss to follow-up, expected response rate, lack of compliance, etc.
– Make the link between the calculation and the increase
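One common way to make the link between the calculated size and the recruited size explicit is to divide by the expected retention rate. The sketch below assumes that simple adjustment; the helper name `inflate_for_attrition` and the 90% retention figure are hypothetical, not taken from the slides.

```python
from math import ceil


def inflate_for_attrition(n_required, retention):
    """Number to recruit so that about n_required subjects remain analysable."""
    if not 0 < retention <= 1:
        raise ValueError("retention must be in (0, 1]")
    return ceil(n_required / retention)


# Hypothetical example: 253 needed per group, 90% expected to complete
print(inflate_for_attrition(253, 0.90))
```

The same division can be applied to an expected response rate in a survey.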
Steps
Step 1. Define Primary Objective
To see if feeding milk to 5-year-old kids enhances growth.
Step 2. Study Design
5 yr olds: Extra Milk Diet vs. Normal Milk Diet
Outcome: height (cm)
Step 3. Define the clinically significant difference one wishes to detect
Difference (∆) of 0.5 cm
Steps
Step 4. Define the degree of certainty of finding this difference
Beta (β) or type II error: the probability of NOT detecting a significant difference when there really is one.
Risk of a false-negative finding, i.e. the risk of declaring no significant difference in height between the milk diets when a difference really does exist.
Set at ≤ 20%
Power of the test: probability of detecting a predefined clinically significant difference.
Power = (1 − β) = 1 − 20% = 80%
Steps
Step 5. Define significance level
Alpha (α) or type I error: the probability of detecting a significant difference when the treatments are really equally effective
Risk of a false-positive finding
Set at 5%: one has a 5% chance (1 in 20) of declaring a significant difference between the milk diets when in fact they are really equal.
We are willing to accept that 1 time out of 20 we will produce a false-positive finding
Statistical significance testing does not eliminate uncertainty, it merely quantifies it.
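The "1 time out of 20" reading of α can be checked by simulation. This is a rough sketch, assuming two groups drawn from the same normal distribution and a z-test with known SD; the group size, SD, number of trials and seed are arbitrary illustrative choices.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def false_positive_rate(n_per_group=50, sd=2.0, alpha=0.05, trials=2000, seed=1):
    """Fraction of simulated null trials that are declared significant."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sd * sqrt(2 / n_per_group)  # standard error of the mean difference
    rejections = 0
    for _ in range(trials):
        # Both groups share the same true mean, so any rejection is a false positive
        a = fmean(rng.gauss(0.0, sd) for _ in range(n_per_group))
        b = fmean(rng.gauss(0.0, sd) for _ in range(n_per_group))
        if abs(a - b) / se > z_crit:
            rejections += 1
    return rejections / trials


print(false_positive_rate())  # expected to land near 0.05
```

The simulated rate fluctuates around α from run to run, which is the uncertainty the test quantifies rather than removes.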
For the Milk Study
Type I error (α) = 0.05
Type II error (β) = 0.20
Power = (1 − β) = 0.80
Clinically significant difference (∆) = 0.5 cm
Measure of variation (SD) = 2.0 cm
– Exists in literature or “Guesstimate”
Formula
$$N = \frac{2\,(SD)^2 \times f(\alpha, \beta)}{\Delta^2} = \frac{2\,(2)^2 \times 7.9}{0.5^2} = 252.8 \ \text{(each group)}$$
Values of f(α, β)
             Beta (β)
Alpha (α)    0.05    0.10    0.20    0.50
0.10         10.8     8.6     6.2     2.7
0.05         13.0    10.5     7.9     3.8
0.02         15.8    13.0    10.0     5.4
0.01         17.8    14.9    11.7     6.6
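The tabulated values agree with f(α, β) = (z₁₋α/₂ + z₁₋β)² for a two-sided test. The sketch below assumes that definition and recomputes both the table and the milk-study sample size; the function names are illustrative.

```python
from math import ceil
from statistics import NormalDist


def f(alpha, beta):
    """(z_{1-alpha/2} + z_{1-beta})^2 for a two-sided test."""
    z = NormalDist().inv_cdf
    return (z(1 - alpha / 2) + z(1 - beta)) ** 2


def n_per_group(sd, delta, alpha=0.05, beta=0.20):
    """Unrounded per-group size for comparing two means."""
    return 2 * sd ** 2 * f(alpha, beta) / delta ** 2


# Rebuild the table: rows are alpha, columns are beta
for alpha in (0.10, 0.05, 0.02, 0.01):
    row = [f(alpha, beta) for beta in (0.05, 0.10, 0.20, 0.50)]
    print(f"{alpha:4.2f}", " ".join(f"{v:5.1f}" for v in row))

# Milk study: SD = 2.0 cm, difference = 0.5 cm, alpha = 0.05, beta = 0.20
n = n_per_group(sd=2.0, delta=0.5)
print(round(n, 1), ceil(n))
```

With unrounded normal quantiles f(0.05, 0.20) is about 7.85 rather than the tabulated 7.9, so the computed size comes out near 251 per group instead of 252.8. The difference is immaterial next to the uncertainty in the SD guess.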
Simple Method: Nomogram
Standardized difference = smallest medically relevant difference / estimated standard deviation = 0.5/2.0 = 0.25
Assumptions:
1. 2-sample comparison only
2. Same number of subjects per group
3. Variable is a continuous measure that is normally distributed
[Figure: sample size nomogram; value shown: 500]
1 sample test
Study objective: study the effect of a new sleep aid
Baseline to sleep time after taking the medication for one week
Two-sided test, α = 0.05, power = 1 − β = 90%
Difference (δ) = 1 hr (4 hours of sleep to 5)
Standard deviation (σ) = 2 hr
$$n = \frac{(Z_{1-\alpha/2} + Z_{1-\beta})^2\,\sigma^2}{\delta^2} = \frac{(1.960 + 1.282)^2 \cdot 2^2}{1^2} = 42.04 \;\Rightarrow\; n = 43$$
Changing δ from 1 hr to 2 hr takes n from 43 to 11
$$n = \frac{(1.960 + 1.282)^2 \cdot 2^2}{2^2} = 10.51 \;\Rightarrow\; n = 11$$
1 sample test
Changing power from 90% to 80% takes n from 11 to 8 (small sample: start thinking about using the t distribution)
$$n = \frac{(1.960 + 0.841)^2 \cdot 2^2}{2^2} = 7.85 \;\Rightarrow\; n = 8$$
Changing the standard deviation from 2 to 3 takes n from 8 to 18
$$n = \frac{(1.960 + 0.841)^2 \cdot 3^2}{2^2} = 17.65 \;\Rightarrow\; n = 18$$
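The one-sample sleep-aid calculations can be scripted so that each change of δ, power or σ is a single call. This is a minimal sketch, assuming the normal-approximation formula n = (z₁₋α/₂ + z₁₋β)² σ² / δ² with rounding up; `n_one_sample` is an illustrative name.

```python
from math import ceil
from statistics import NormalDist


def n_one_sample(delta, sigma, alpha=0.05, power=0.90):
    """Sample size for a two-sided one-sample z-test, rounded up."""
    z = NormalDist().inv_cdf
    n = (z(1 - alpha / 2) + z(power)) ** 2 * sigma ** 2 / delta ** 2
    return ceil(n)


print(n_one_sample(delta=1, sigma=2))               # original design
print(n_one_sample(delta=2, sigma=2))               # larger detectable difference
print(n_one_sample(delta=2, sigma=2, power=0.80))   # lower power
print(n_one_sample(delta=2, sigma=3, power=0.80))   # larger standard deviation
```

For sizes as small as 8, a t-based calculation would give a somewhat larger n, as the slide warns.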
Sleep Aid Example: 2 Sample
Original design (2-sided test, α = 0.05, 1 − β = 90%, σ = 2 hr, δ = 1 hr)
Two-sample randomized parallel design
Needed 43 in the one-sample design
In the 2-sample design, need twice that, in each group!
4 times as many people are needed in this design
$$n = \frac{2\,(Z_{1-\alpha/2} + Z_{1-\beta})^2\,\sigma^2}{\delta^2} = \frac{2\,(1.960 + 1.282)^2 \cdot 2^2}{1^2} = 84.1 \;\Rightarrow\; 85 \text{ per group, } 170 \text{ total!}$$
Changing δ from 1 hr to 2 hr takes the total from 170 to 44
$$n = \frac{2\,(1.960 + 1.282)^2 \cdot 2^2}{2^2} = 21.02 \;\Rightarrow\; 22 \text{ per group, } 44 \text{ total}$$
Sleep Aid Example: 2 Sample
Changing power from 90% to 80% takes the total from 44 to 32
$$n = \frac{2\,(1.960 + 0.841)^2 \cdot 2^2}{2^2} = 15.69 \;\Rightarrow\; 16 \text{ per group, } 32 \text{ total}$$
Changing the standard deviation from 2 to 3 takes the total from 32 to 72
$$n = \frac{2\,(1.960 + 0.841)^2 \cdot 3^2}{2^2} = 35.31 \;\Rightarrow\; 36 \text{ per group, } 72 \text{ total}$$
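For the two-sample parallel design the only change from the one-sample case is the factor of 2 in the per-group size. A short sketch under the same normal-approximation assumptions (known common σ, equal groups); `n_two_sample` is an illustrative name and returns both the per-group and total sizes.

```python
from math import ceil
from statistics import NormalDist


def n_two_sample(delta, sigma, alpha=0.05, power=0.90):
    """(per-group, total) sample size for a two-sided two-sample z-test."""
    z = NormalDist().inv_cdf
    per_group = ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 * sigma ** 2 / delta ** 2)
    return per_group, 2 * per_group


for label, kwargs in [
    ("original design", dict(delta=1, sigma=2)),
    ("delta = 2", dict(delta=2, sigma=2)),
    ("power = 80%", dict(delta=2, sigma=2, power=0.80)),
    ("sigma = 3", dict(delta=2, sigma=3, power=0.80)),
]:
    print(label, n_two_sample(**kwargs))
```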
Summary
Changes in the detectable difference have HUGE impacts on sample size
– 20 point difference → 25 patients/group
– 10 point difference → 100 patients/group
– 5 point difference → 400 patients/group
Changes in α, β, σ, the number of samples, and whether the test is 1- or 2-sided can all have a large impact on your sample size calculation
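The 25 → 100 → 400 pattern follows from the squared difference in the denominator of the sample size formulas. With α, β and σ held fixed:

$$n \propto \frac{1}{\delta^2} \quad\Longrightarrow\quad \frac{n(\delta/2)}{n(\delta)} = \frac{\delta^2}{(\delta/2)^2} = 4$$

so each halving of the detectable difference (20 → 10 → 5 points) quadruples the number of patients per group.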
Matched Pair Designs
Similar to 1-sample formula
Means (paired t-test)
– Mean difference from paired data
– Variance of differences
Proportions
– Based on discordant pairs
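For paired means, a sketch of what "similar to the 1-sample formula" could look like, assuming the same normal approximation; the symbols σ_d (standard deviation of the within-pair differences) and ρ (correlation between the paired measurements) are notation introduced here, not taken from the slides:

$$n_{\text{pairs}} = \frac{(Z_{1-\alpha/2} + Z_{1-\beta})^2\,\sigma_d^2}{\delta^2}, \qquad \sigma_d^2 = \sigma_1^2 + \sigma_2^2 - 2\rho\,\sigma_1\sigma_2$$

The more strongly the paired measurements are correlated, the smaller σ_d² becomes and the fewer pairs are needed.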
Difference in Proportion
Study objective: to increase survival by 5% with a new cancer drug
P1 = % survival (std) = 85%
P2 = % survival (new) = 90%
Power = 90%
$$N = \frac{P_1(100 - P_1) + P_2(100 - P_2)}{(P_2 - P_1)^2} \times f(\alpha, \beta) = \frac{85 \times 15 + 90 \times 10}{5^2} \times 10.5 = 913.5 \ \text{(each group)}$$
Here f(α, β) = 10.5 is the tabulated value for α = 0.05 and β = 0.10.
= 1827 total
A very large study has the power to demonstrate statistical significance for very small, even clinically inconsequential differences.
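The proportion formula can be evaluated the same way as the formula for means. The sketch below assumes proportions expressed in percent, as on the slide, and α = 0.05 with 90% power; `n_per_group_proportions` is an illustrative name.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_proportions(p1, p2, alpha=0.05, beta=0.10):
    """Unrounded per-group size; p1 and p2 are percentages (0-100)."""
    z = NormalDist().inv_cdf
    f = (z(1 - alpha / 2) + z(1 - beta)) ** 2
    return (p1 * (100 - p1) + p2 * (100 - p2)) * f / (p2 - p1) ** 2


n = n_per_group_proportions(85, 90)
print(round(n, 1), ceil(n), 2 * ceil(n))
```

Using the unrounded f instead of the tabulated 10.5 moves the result by less than one patient per group.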
Changes in basic formulae
Unequal #s in Each Group
Ratio of cases to controls
Use if you want λ patients randomized to the treatment arm for every patient randomized to the placebo arm
Take no more than 4-5 controls/case
$$n_2 = \lambda\, n_1 \qquad (\lambda \text{ controls for every case})$$
$$n_1 = \frac{(Z_{1-\alpha/2} + Z_{1-\beta})^2\,(\sigma_1^2 + \sigma_2^2/\lambda)}{\delta^2}$$
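To see why little is gained beyond 4-5 controls per case, the allocation ratio can be varied numerically. This is a minimal sketch, assuming the usual unequal-allocation formula n₁ = (z₁₋α/₂ + z₁₋β)²(σ₁² + σ₂²/λ)/δ² with n₂ = λ·n₁; the function name and the sleep-aid inputs (δ = 1, σ = 2, 90% power) are used purely for illustration.

```python
from math import ceil
from statistics import NormalDist


def n_unequal(delta, sigma1, sigma2, lam, alpha=0.05, power=0.90):
    """(n1, n2) with n2 = lam * n1, for a two-sided two-sample z-test."""
    z = NormalDist().inv_cdf
    zsum_sq = (z(1 - alpha / 2) + z(power)) ** 2
    n1 = ceil(zsum_sq * (sigma1 ** 2 + sigma2 ** 2 / lam) / delta ** 2)
    n2 = ceil(lam * n1)
    return n1, n2


for lam in (1, 2, 4, 8):
    n1, n2 = n_unequal(delta=1, sigma1=2, sigma2=2, lam=lam)
    print(f"lambda={lam}: n1={n1}, n2={n2}, total={n1 + n2}")
```

As λ grows, n₁ approaches a floor set by σ₁² alone while the total keeps rising, so extra controls buy progressively less.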
# of Covariates & # of Subjects
At least 10 subjects for every variable investigated
– In logistic regression
– No general justification
– This is stability, not power
– Peduzzi et al. (1985): biased regression coefficients and variance estimates
Principal component analysis (PCA) (Thorndike 1978, p. 184): N ≥ 10m + 50 or even N ≥ m² + 50
Balanced Designs: Easier
Equal numbers in two groups is the easiest to handle
If you have more than two groups, equal sample sizes are still easiest
Complicated design = simulations
– Done by the statistician
Multiple Comparisons
If you have 4 groups
– All 2-way comparisons of means
– 6 different tests
Bonferroni: divide α by # of tests
– 0.025/6 ≈ 0.0042
High-throughput laboratory tests
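The count of pairwise tests and the Bonferroni-adjusted level are quick to compute for any number of groups. A small sketch; `bonferroni_alpha` is an illustrative helper, and the 0.025 input mirrors the figure used on the slide.

```python
from math import comb


def bonferroni_alpha(alpha, n_groups):
    """Return (number of pairwise tests, Bonferroni-adjusted alpha per test)."""
    n_tests = comb(n_groups, 2)  # all 2-way comparisons
    return n_tests, alpha / n_tests


n_tests, adjusted = bonferroni_alpha(0.025, 4)
print(n_tests, round(adjusted, 4))
```

The adjusted level then replaces α in the sample size formula, which raises the required n for every comparison.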
Flaws in Statements
"A previous study in this area recruited 150 subjects and found highly significant results (p=0.014), and therefore a similar sample size should be sufficient here."
– Previous studies may have been 'lucky' to find significant results, due to random sampling variation.
"Sample sizes are not provided because there is no prior information on which to base them."
– Find previously published information
– Conduct a small pre-study
– If a very preliminary pilot study, sample size calculations are not usually necessary
No prior information on standard deviations
– Give the size of difference that may be detected in terms of number of standard deviations
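When no standard deviation is available, the two-sample formula can be inverted to report the detectable difference in SD units for a given group size. A minimal sketch under the normal approximation; the function name and the example of 75 per group are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def detectable_standardized_difference(n_per_group, alpha=0.05, power=0.80):
    """Smallest difference, in standard deviations, detectable with the given power."""
    z = NormalDist().inv_cdf
    return (z(1 - alpha / 2) + z(power)) * sqrt(2 / n_per_group)


# Hypothetical study with 75 subjects per group
print(round(detectable_standardized_difference(75), 2))
```

A protocol can then state, for example, that the study can detect a difference of roughly that many standard deviations with 80% power.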
Roadmap
1. Do a sample size calculation before you start collecting data
2. Collect data
3. Perform statistical test: if p value < 0.05, declare statistical significance
4. Consider clinical significance by looking at the size of the difference