Power Analysis and Sample Size Determination


AK Dhamija

Researchers differ

A researcher conducted a study comparing the effect of an intervention vs placebo on reducing body weight, and found a 5 kg reduction in the intervention group with P = 0.01.

Another researcher conducted a similar study comparing the effect of the same intervention vs the same placebo on reducing body weight, and found the same 5 kg reduction in the intervention group, but could not claim that the intervention was effective because P = 0.35.


Agenda
– Power
– Sample size calculations
– Examples
– Changes in the basic formulae
– Flaws in statements


Power is Affected by

– Variation in the outcome (σ²): ↓ σ² → power ↑
– Significance level (α): ↑ α → power ↑
– Difference (effect) to be detected (δ): ↑ δ → power ↑
– One-tailed vs. two-tailed tests: power is greater in one-tailed tests than in comparable two-tailed tests


Power Changes

Baseline: 2n = 32 (16 per group), two-sample test, δ = 2, σ = 2, α = 0.05, two-sided test → 81% power

– Variance/standard deviation: σ: 2 → 1, power 81% → 99.99%; σ: 2 → 3, power 81% → 47%
– Significance level (α): α: 0.05 → 0.01, power 81% → 60%; α: 0.05 → 0.10, power 81% → 88%


Power Changes

Baseline: 2n = 32 (16 per group), two-sample test, δ = 2, σ = 2, α = 0.05, two-sided test → 81% power

– Difference to be detected (δ): δ: 2 → 1, power 81% → 29%; δ: 2 → 3, power 81% → 99%
– Total sample size (2n): 32 → 64, power 81% → 98%; 32 → 28, power 81% → 75%
– One-tailed instead of two-tailed test: power 81% → 88%

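The power figures on these two slides can be checked with the usual normal-approximation formula for comparing two means, with n per group:

power ≈ Φ(δ / (σ√(2/n)) − z₁₋α/₂)

Below is a minimal Python sketch. It assumes equal group sizes and a z-test, and the function name `power_two_sample` is ours, not from the source. Under this approximation it reproduces the baseline, σ, δ, sample-size and one-tailed figures. It gives about 60% for α = 0.01 and about 88% for α = 0.10.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def power_two_sample(delta, sigma, n_per_group, alpha=0.05, two_sided=True):
    """Approximate power of a two-sample z-test with equal group sizes.

    Normal approximation; ignores the negligible chance of rejecting
    in the wrong direction when the test is two-sided.
    """
    se_diff = sigma * (2 / n_per_group) ** 0.5   # SE of the difference in means
    tail = alpha / 2 if two_sided else alpha      # rejection area in one tail
    z_crit = STD_NORMAL.inv_cdf(1 - tail)
    return STD_NORMAL.cdf(delta / se_diff - z_crit)


baseline = dict(delta=2, sigma=2, n_per_group=16, alpha=0.05, two_sided=True)
scenarios = {
    "baseline": {},
    "sigma 2 -> 1": {"sigma": 1},
    "sigma 2 -> 3": {"sigma": 3},
    "alpha 0.05 -> 0.01": {"alpha": 0.01},
    "alpha 0.05 -> 0.10": {"alpha": 0.10},
    "delta 2 -> 1": {"delta": 1},
    "delta 2 -> 3": {"delta": 3},
    "2n 32 -> 64": {"n_per_group": 32},
    "2n 32 -> 28": {"n_per_group": 14},
    "one-tailed": {"two_sided": False},
}
for label, change in scenarios.items():
    print(f"{label:<20} {power_two_sample(**{**baseline, **change}):.1%}")
```

Dedicated tools (statistical packages or a statistician) use the t distribution instead. This matters mainly for small samples like this one.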

Power Formula

– Depends on study design
– Not hard, but can be VERY algebra-intensive
– May want to use a computer program or a statistician


How Big a Sample Do We Need?
– A fundamental research question
– Should be addressed after determining the primary objective and study design

Too few patients in a clinical study:
– May fail to detect a clinically important difference

Too many:
– Involve extra patients
– The therapy may have risks
– Cost more


How Big a Sample Do We Need?
A fundamental research question: how big? 18, 180, 1,800, 18,000 or 180,000?


Sample Size Formula Information
– Variables of interest: type of data, e.g. continuous, categorical
– Desired power
– Desired significance level
– Effect/difference of clinical importance
– Standard deviations of continuous outcome variables
– One- or two-sided tests


Sample Size & Study Design
– Randomized controlled trial (RCT)
– Block/stratified-block randomized trial
– Equivalence trial
– Non-randomized intervention study
– Observational study
– Prevalence study
– Measuring sensitivity and specificity


Sample Size & Data Structure
– Paired data
– Repeated measures
– Groups of equal sizes
– Hierarchical data


Sample Size
– Non-randomized studies looking for differences or associations require a larger sample to allow adjustment for confounding factors.
– What matters is the absolute sample size, although surveys sometimes take a percentage-of-population approach.
– The study's primary outcome is the variable you do the sample size calculation for. If secondary outcome variables are considered important, make sure the sample size is sufficient for them too.
– Increase the 'real' sample size to reflect loss to follow-up, expected response rate, lack of compliance, etc. State explicitly how the increased figure was derived from the calculated one.

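One common way to link the calculated and recruited sizes is to divide the calculated size by the expected retention proportion. The sketch below assumes losses occur at a single known rate. The 20% rate is hypothetical, and `inflate_for_losses` is our name. Dividing by (1 − loss rate), rather than multiplying by (1 + loss rate), keeps the expected number of completers at or above the calculated figure.

```python
import math


def inflate_for_losses(n_calculated, loss_rate):
    """Number to recruit so that about n_calculated remain after losses.

    loss_rate is the expected proportion lost (drop-out, non-response,
    non-compliance treated as loss), e.g. 0.20 for 20%.
    """
    if not 0 <= loss_rate < 1:
        raise ValueError("loss_rate must be in [0, 1)")
    return math.ceil(n_calculated / (1 - loss_rate))


# Hypothetical example: 253 per group calculated, 20% loss to follow-up assumed
n_calc = 253
n_recruit = inflate_for_losses(n_calc, 0.20)
print(f"calculated {n_calc} per group; recruit {n_recruit} per group "
      f"({n_calc} / (1 - 0.20), rounded up)")
```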

Steps
Step 1. Define the primary objective
To see if feeding milk to 5-year-old kids enhances growth.

Step 2. Study design
5-year-olds → extra milk diet vs normal milk diet
Outcome: height (cm)

Step 3. Define the clinically significant difference one wishes to detect
Difference (∆) of 0.5 cm


Steps
Step 4. Define the degree of certainty of finding this difference

Beta (β) or type II error: the probability of NOT detecting a significant difference when there really is one.
– Risk of a false-negative finding, i.e. the risk of declaring no significant difference in height between the milk diets when a difference really does exist
– Set at ≤ 20%

Power of the test: the probability of detecting a predefined clinically significant difference.

Power = 1 − β = 1 − 20% = 80%


Steps
Step 5. Define the significance level

Alpha (α) or type I error: the probability of detecting a significant difference when the treatments are really equally effective.
– Risk of a false-positive finding
– Set at 5%: one has a 5% (1 in 20) chance of declaring a significant difference between the milk diets when in fact they are really equal
– We are willing to accept that 1 time out of 20 we will produce a false-positive finding

Statistical significance testing does not eliminate uncertainty; it merely quantifies it.


For the Milk Study
– Type I error (α) = 0.05
– Type II error (β) = 0.20
– Power = 1 − β = 0.80
– Clinically significant difference (∆) = 0.5 cm
– Measure of variation (SD) = 2.0 cm (from the literature or a “guesstimate”)

Formula, with f(α, β) = 7.9 for α = 0.05, β = 0.20 taken from the table below:

$$N = \frac{2\,(SD)^2 \times f(\alpha,\beta)}{\Delta^2} = \frac{2\,(2)^2 \times 7.9}{0.5^2} = 252.8 \text{ (each group)}$$

Rounded up, that is 253 children per group.

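The arithmetic above can be checked with a few lines of Python; the helper name is ours. Computing f(α, β) from exact normal quantiles (7.85) instead of the rounded 7.9 lowers the answer slightly, to about 251.2, i.e. 252 per group.

```python
import math
from statistics import NormalDist


def n_per_group_means(sd, diff, f_ab):
    """N per group = 2 * SD^2 * f(alpha, beta) / diff^2 (two equal groups)."""
    return 2 * sd ** 2 * f_ab / diff ** 2


# Milk study, using the tabulated f(0.05, 0.20) = 7.9
n_table = n_per_group_means(sd=2.0, diff=0.5, f_ab=7.9)

# Same calculation with f built from exact standard normal quantiles
z = NormalDist().inv_cdf
f_exact = (z(1 - 0.05 / 2) + z(1 - 0.20)) ** 2
n_exact = n_per_group_means(sd=2.0, diff=0.5, f_ab=f_exact)

print(f"table f: {n_table:.1f} -> {math.ceil(n_table)} per group")
print(f"exact f: {n_exact:.1f} -> {math.ceil(n_exact)} per group")
```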

Values of f(α, β) (two-sided α)

                  Beta (β)
Alpha (α)    0.05   0.10   0.20   0.50
0.10         10.8    8.6    6.2    2.7
0.05         13.0   10.5    7.9    3.8
0.02         15.8   13.0   10.0    5.4
0.01         17.8   14.9   11.7    6.6
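The tabulated factor is f(α, β) = (z₁₋α/₂ + z₁₋β)², where z is the standard normal quantile. The short Python sketch below regenerates the table; the function name `f_factor` is ours. It matches every cell to within rounding. The α = 0.05, β = 0.20 cell computes to 7.85, which the table shows as 7.9.

```python
from statistics import NormalDist

Z = NormalDist().inv_cdf  # standard normal quantile function


def f_factor(alpha, beta):
    """f(alpha, beta) = (z_{1-alpha/2} + z_{1-beta})^2 for a two-sided test."""
    return (Z(1 - alpha / 2) + Z(1 - beta)) ** 2


ALPHAS = (0.10, 0.05, 0.02, 0.01)
BETAS = (0.05, 0.10, 0.20, 0.50)

print("alpha\\beta" + "".join(f"{b:>7.2f}" for b in BETAS))
for a in ALPHAS:
    print(f"{a:<10.2f}" + "".join(f"{f_factor(a, b):7.1f}" for b in BETAS))
```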

Simple Method: Nomogram

Standardized difference = smallest medically relevant difference / estimated standard deviation = 0.5/2.0 = 0.25

Assumptions:
1. Two-sample comparison only
2. Same number of subjects per group
3. The variable is a continuous measure that is normally distributed


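[Nomogram figure: at a standardized difference of 0.25, the nomogram gives a total sample size of about 500]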

1 Sample Test
Study objective: study the effect of a new sleep aid, comparing baseline sleep time with sleep time after taking the medication for one week.
– Two-sided test, α = 0.05, power = 1 − β = 90%
– Difference (δ) = 1 hr (4 hours of sleep to 5)
– Standard deviation (σ) = 2 hr

$$n = \frac{(Z_{1-\alpha/2} + Z_{1-\beta})^2\,\sigma^2}{\delta^2} = \frac{(1.960 + 1.282)^2\,(2)^2}{1^2} = 42.04 \rightarrow 43$$

Changing δ from 1 hr to 2 hr changes n from 43 to 11:

$$n = \frac{(1.960 + 1.282)^2\,(2)^2}{2^2} = 10.51 \rightarrow 11$$

1 Sample Test
Changing power from 90% to 80% changes n from 11 to 8 (small sample: start thinking about using the t distribution):

$$n = \frac{(1.960 + 0.841)^2\,(2)^2}{2^2} = 7.85 \rightarrow 8$$

Changing the standard deviation from 2 to 3 changes n from 8 to 18:

$$n = \frac{(1.960 + 0.841)^2\,(3)^2}{2^2} = 17.65 \rightarrow 18$$
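The four one-sample results can be reproduced with exact normal quantiles in place of 1.960, 1.282 and 0.841. Below is a minimal sketch; the function name is ours. Because n is rounded up, the small quantile differences do not change any answer.

```python
import math
from statistics import NormalDist

Z = NormalDist().inv_cdf  # standard normal quantile function


def n_one_sample(delta, sigma, alpha=0.05, power=0.90):
    """n = (z_{1-alpha/2} + z_{1-beta})^2 * sigma^2 / delta^2, rounded up.

    power = 1 - beta, so Z(power) is z_{1-beta}.
    """
    n = (Z(1 - alpha / 2) + Z(power)) ** 2 * sigma ** 2 / delta ** 2
    return math.ceil(n)


designs = [
    ("original: delta=1, sigma=2, power 90%", dict(delta=1, sigma=2)),
    ("delta 1 -> 2", dict(delta=2, sigma=2)),
    ("then power 90% -> 80%", dict(delta=2, sigma=2, power=0.80)),
    ("then sigma 2 -> 3", dict(delta=2, sigma=3, power=0.80)),
]
for label, kw in designs:
    print(f"{label:<40} n = {n_one_sample(**kw)}")
```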

Sleep Aid Example: 2 Sample
Original design (two-sided test, α = 0.05, 1 − β = 90%, σ = 2 hr, δ = 1 hr)
– Two-sample randomized parallel design
– The one-sample design needed 43; the two-sample design needs twice that in each group, so 4 times as many people are needed in this design

$$n = \frac{2\,(Z_{1-\alpha/2} + Z_{1-\beta})^2\,\sigma^2}{\delta^2} = \frac{2\,(1.960 + 1.282)^2\,(2)^2}{1^2} = 84.1 \rightarrow 85 \text{ per group (170 total!)}$$

With δ = 2 hr:

$$n = \frac{2\,(1.960 + 1.282)^2\,(2)^2}{2^2} = 21.02 \rightarrow 22 \text{ per group (44 total)}$$

Sleep Aid Example: 2 Sample

Changing power from 90% to 80% changes the total from 44 to 32:

$$n = \frac{2\,(1.960 + 0.841)^2\,(2)^2}{2^2} = 15.69 \rightarrow 16 \text{ per group (32 total)}$$

Changing the standard deviation from 2 to 3 changes the total from 32 to 72:

$$n = \frac{2\,(1.960 + 0.841)^2\,(3)^2}{2^2} = 35.31 \rightarrow 36 \text{ per group (72 total)}$$
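The same check works for the two-sample parallel design. The only change from the one-sample formula is the factor of 2, and the function name is ours. This is where the "four times as many people" comes from: twice the per-group n, times two groups.

```python
import math
from statistics import NormalDist

Z = NormalDist().inv_cdf  # standard normal quantile function


def n_two_sample(delta, sigma, alpha=0.05, power=0.90):
    """Per-group n = 2 (z_{1-alpha/2} + z_{1-beta})^2 sigma^2 / delta^2, rounded up."""
    n = 2 * (Z(1 - alpha / 2) + Z(power)) ** 2 * sigma ** 2 / delta ** 2
    return math.ceil(n)


designs = [
    ("original: delta=1, sigma=2, power 90%", dict(delta=1, sigma=2)),
    ("delta 1 -> 2", dict(delta=2, sigma=2)),
    ("then power 90% -> 80%", dict(delta=2, sigma=2, power=0.80)),
    ("then sigma 2 -> 3", dict(delta=2, sigma=3, power=0.80)),
]
for label, kw in designs:
    n = n_two_sample(**kw)
    print(f"{label:<40} {n} per group, {2 * n} total")
```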

Summary

– Changes in the detectable difference have HUGE impacts on sample size:
  20-point difference → 25 patients/group
  10-point difference → 100 patients/group
  5-point difference → 400 patients/group

– Changes in α, β, σ, the number of samples, and whether the test is one- or two-sided can all have a large impact on your sample size calculation.

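The pattern in the first bullet follows from δ appearing squared in the denominator of the sample size formulas: halving the detectable difference quadruples n. A short check, starting from the 25 patients/group quoted for a 20-point difference (the helper name is ours):

```python
def rescale_n(n_ref, delta_ref, delta_new):
    """Per-group n when only the detectable difference changes.

    With sigma, alpha and power fixed, n is proportional to 1 / delta^2.
    """
    return n_ref * (delta_ref / delta_new) ** 2


for delta in (20, 10, 5):
    print(f"{delta:>2}-point difference -> {rescale_n(25, 20, delta):.0f} patients/group")
```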

Matched Pair Designs

– Similar to the 1-sample formula
– Means (paired t-test): use the mean difference from the paired data and the variance of the differences
– Proportions: based on discordant pairs

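For paired means, the 1-sample formula is applied to the within-pair differences. What matters is therefore the SD of the differences, sd_d² = sd1² + sd2² − 2·ρ·sd1·sd2, where ρ is the correlation between the paired measurements.

The sketch below uses hypothetical numbers echoing the sleep aid example: SD 2 hr at each time point and a true mean change of 1 hr. The function names are ours, and the discordant-pairs calculation for proportions is not covered. With ρ = 0.5 the SD of the differences equals 2 hr, which reproduces the 43 of the 1-sample slide; stronger correlation cuts the requirement further.

```python
import math
from statistics import NormalDist

Z = NormalDist().inv_cdf  # standard normal quantile function


def sd_of_differences(sd1, sd2, rho):
    """SD of paired differences: sqrt(sd1^2 + sd2^2 - 2 rho sd1 sd2)."""
    return math.sqrt(sd1 ** 2 + sd2 ** 2 - 2 * rho * sd1 * sd2)


def n_pairs(delta, sd_diff, alpha=0.05, power=0.90):
    """Number of pairs: the 1-sample formula applied to the differences."""
    return math.ceil((Z(1 - alpha / 2) + Z(power)) ** 2 * sd_diff ** 2 / delta ** 2)


# Hypothetical: SD 2 hr at each time point, true mean change 1 hr
for rho in (0.0, 0.5, 0.8):
    sd_d = sd_of_differences(2, 2, rho)
    print(f"rho = {rho}: SD of differences = {sd_d:.2f} hr, n = {n_pairs(1, sd_d)} pairs")
```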

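Difference in Proportion
Study objective: to increase survival by 5 percentage points with a new cancer drug
– P1 = % survival (standard) = 85%
– P2 = % survival (new) = 90%
– Power = 90%, α = 0.05 (two-sided), so f(α, β) = 10.5

$$N = \frac{P_1(100 - P_1) + P_2(100 - P_2)}{(P_2 - P_1)^2} \times f(\alpha,\beta) = \frac{85 \times 15 + 90 \times 10}{5^2} \times 10.5 = 913.5 \text{ (each group)}$$

Rounded up: 914 per group, 1,828 total.

A very large study has the power to demonstrate statistical significance for very small, even clinically inconsequential, differences.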

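Here is the proportion formula in Python, as a check on the 913.5; the helper name is ours. It also shows why very large studies can detect trivial differences: with everything else fixed, detecting a 1-point gain (85% → 86%) would need about 26,000 per group.

```python
import math


def n_per_group_props(p1, p2, f_ab):
    """Per-group N for two proportions given in percent:
    [P1(100 - P1) + P2(100 - P2)] * f(alpha, beta) / (P2 - P1)^2
    """
    return (p1 * (100 - p1) + p2 * (100 - p2)) * f_ab / (p2 - p1) ** 2


n = n_per_group_props(85, 90, f_ab=10.5)   # f(0.05, 0.10) from the table
print(f"{n} per group -> recruit {math.ceil(n)} per group, {2 * math.ceil(n)} total")

n_small = n_per_group_props(85, 86, f_ab=10.5)   # a 1-point difference
print(f"1-point difference: {math.ceil(n_small)} per group")
```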

Changes in Basic Formulae: Unequal Numbers in Each Group
– Ratio of cases to controls
– Use if you want λ patients randomized to the treatment arm for every patient randomized to the placebo arm
– Take no more than 4–5 controls per case

With λ controls for every case (group 2 is the group with λ subjects for every subject in group 1):

$$n_1 = \frac{(Z_{1-\alpha/2} + Z_{1-\beta})^2\,(\sigma_1^2 + \sigma_2^2/\lambda)}{(\mu_1 - \mu_2)^2}, \qquad n_2 = \lambda\, n_1$$
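Below is a sketch of the unequal-allocation formula. The function name and inputs are ours; the inputs reuse σ = 2 hr, δ = 1 hr and 90% power from the sleep aid example. n₁ can never fall below the value reached as σ₂²/λ → 0, here 43. Beyond about 4–5 controls per case, each extra control therefore buys very little while the total keeps rising.

```python
import math
from statistics import NormalDist

Z = NormalDist().inv_cdf  # standard normal quantile function


def n_unequal(delta, sd1, sd2, lam, alpha=0.05, power=0.90):
    """(n1, n2) with lam subjects in group 2 for every subject in group 1."""
    k = (Z(1 - alpha / 2) + Z(power)) ** 2
    n1 = math.ceil(k * (sd1 ** 2 + sd2 ** 2 / lam) / delta ** 2)
    return n1, lam * n1


# Hypothetical inputs borrowed from the sleep aid example: sd = 2 hr, delta = 1 hr
for lam in (1, 2, 3, 4, 5, 10):
    n1, n2 = n_unequal(delta=1, sd1=2, sd2=2, lam=lam)
    print(f"lambda = {lam:>2}: n1 = {n1:>3}, n2 = {n2:>4}, total = {n1 + n2}")
```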

# of Covariates & # of Subjects
– At least 10 subjects for every variable investigated
– In logistic regression: no general justification; this is about stability, not power (Peduzzi et al., 1985: biased regression coefficients and variance estimates)
– Principal component analysis (PCA) (Thorndike 1978, p. 184): N ≥ 10m + 50, or even N ≥ m² + 50, where m is the number of variables


Balanced Designs: Easier

– Equal numbers in two groups are the easiest to handle
– If you have more than two groups, equal sample sizes are still easiest
– Complicated designs need simulations, done by the statistician


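Multiple Comparisons
– If you have 4 groups, comparing every pair of group means takes 6 different tests
– Bonferroni: divide α by the number of tests; for a two-sided α = 0.05 (0.025 in each tail), 0.025/6 ≈ 0.0042 per tail
– The same problem arises on a much larger scale with high-throughput laboratory tests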

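Since the sample size formulas scale with f(α, β), a Bonferroni correction feeds straight into n. The sketch below covers the 4-group case at 80% power; the function and group labels are ours. With α = 0.05/6 per two-sided test, each comparison needs roughly 1.5 times the subjects it would need at α = 0.05.

```python
from itertools import combinations
from statistics import NormalDist

Z = NormalDist().inv_cdf  # standard normal quantile function


def f_factor(alpha, beta):
    """(z_{1-alpha/2} + z_{1-beta})^2 for a two-sided test at level alpha."""
    return (Z(1 - alpha / 2) + Z(1 - beta)) ** 2


groups = ["A", "B", "C", "D"]           # hypothetical group labels
pairs = list(combinations(groups, 2))    # all two-way comparisons
alpha_each = 0.05 / len(pairs)           # Bonferroni: two-sided alpha per test

f_plain = f_factor(0.05, 0.20)
f_bonf = f_factor(alpha_each, 0.20)
print(f"{len(pairs)} comparisons; two-sided alpha per test = {alpha_each:.4f} "
      f"({alpha_each / 2:.4f} per tail)")
print(f"n per comparison grows by a factor of {f_bonf / f_plain:.2f} at 80% power")
```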

Flaws in Statements

"A previous study in this area recruited 150 subjects and found highly significant results (p = 0.014), and therefore a similar sample size should be sufficient here."
– Previous studies may have been 'lucky' to find significant results, due to random sampling variation.

"Sample sizes are not provided because there is no prior information on which to base them."
– Find previously published information
– Conduct a small pre-study
– If it is a very preliminary pilot study, sample size calculations are not usually necessary
– With no prior information on standard deviations, give the size of difference that may be detected in terms of the number of standard deviations


Roadmap
1. Do a sample size calculation before you start collecting data
2. Collect data
3. Perform the statistical test: IF p value < 0.05, declare statistical significance
4. Consider clinical significance by looking at the size of the difference


References

“Sample Size Estimation”, Phil Hahn, Queen’s University

“Sample Size and Power”, Laura Lee Johnson, Ph.D., Statistician, National Center for Complementary and Alternative Medicine

“Sample Size Estimation and Power Analysis”, Ayumi Shintani, PhD, MPH, Department of Biostatistics, Vanderbilt University

