Biostatistics Case Studies 2014

Youngju Pak, PhD.

Biostatistician

[email protected]

Session 1:

Sample Size & Power for Inequality and Equivalence Studies I

Class Schedule

Date | Topic | Related Paper

Session 1, Sept 16 | Sample Size & Power for Inequality and Equivalence Studies I | Howard Paper, Gilchrist Paper, Williamson Paper

Session 2, Sept 23 | Sample Size & Power for Inequality and Equivalence Studies II | Diestelhorst Paper

Session 3, Sept 30 | Research Study Designs | To Be Determined

Session 4, Oct 7 | Regression Models and Multivariate Analyses | TBD

Session 5, Oct 14 | Survival Analysis Fundamentals | TBD

Session 6, Oct 21 | Free Topics & Discussion | TBD

Announcements

• All class materials will be uploaded to the following website:

• http://research.labiomed.org/Biostat/Education/CaseStudies_Fall2014/CaseStudies2014Outline.htm

• Try to read the posted articles before each session as best you can, and pay particular attention to the statistical components when you read them.

• Send me an e-mail ([email protected]) so I can communicate with you if necessary.

• Send me a copy of any article that you want to discuss, if you have one. It might be used for the last session.

Inequality study:

• Two or more treatments are assumed equal (H0), and the study is designed to find overwhelming evidence of a difference (superiority and/or inferiority).

• Most common comparative study type.

• It is rare to assess only one of superiority or inferiority (“one-sided” statistical tests), unless one of them is biologically impossible.

• Hypotheses:

Ha: | mean(treatment) - mean(control) | ≠ 0

H0: | mean(treatment) - mean(control) | = 0

Insignificant p-values for Inequality tests

• Insignificant p-values (> 0.05) usually mean that you did not find statistically sufficient evidence to support Ha. This does not necessarily mean H0 is true.

• H0 might or might not be true => Your study is still “INCONCLUSIVE”.

• Insignificant p-values do NOT prove your null!

Equivalence Study: Two treatments are assumed to differ (H0), and the study is designed to find overwhelming evidence that they are equal.

• Usually, the quantity of interest is a measure of biological activity or potency (the amount of drug required to produce an effect), and the “treatments” are drugs, or lots or batches of drugs.

• AKA, bioequivalence.

• Sometimes used to compare clinical outcomes for two active treatments if neither treatment can be considered standard or accepted. This usually requires LARGE numbers of subjects.

Hypotheses for equivalence tests

• Ha: mean(trt 1) - mean(trt 2) = 0

• H0: mean(trt 1) - mean(trt 2) ≠ 0

• With a finite sample size, it is practically impossible to show that two group means are exactly the same.

• So we set a tolerance level for equivalence, AKA the equivalence margin, usually denoted Δ.

• Practical hypotheses would be:

• Ha: Δ1 < mean(trt 1) - mean(trt 2) < Δ2

• H0: mean(trt 1) - mean(trt 2) ≤ Δ1 or mean(trt 1) - mean(trt 2) ≥ Δ2
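One common way to test the practical hypotheses above is the two one-sided tests (TOST) procedure, which at level α amounts to checking that the (1 − 2α) confidence interval for the mean difference lies entirely inside (Δ1, Δ2). The slides do not name a specific procedure, so treat this as an illustrative sketch: it assumes a large-sample Normal approximation, and the function name and all numbers in the usage lines are hypothetical.

```python
from statistics import NormalDist

def is_equivalent(diff, se, lower, upper, alpha=0.05):
    """Return True if the (1 - 2*alpha) confidence interval for the
    mean difference lies strictly inside the margins (lower, upper)."""
    z = NormalDist().inv_cdf(1 - alpha)  # about 1.645 for alpha = 0.05
    ci_low = diff - z * se
    ci_high = diff + z * se
    return lower < ci_low and ci_high < upper

# Hypothetical observed difference 0.5 with margins (-3, 3).
print(is_equivalent(0.5, 1.0, -3.0, 3.0))  # small SE: interval fits inside
print(is_equivalent(0.5, 2.0, -3.0, 3.0))  # large SE: interval crosses +3
```

Note how the same observed difference supports equivalence only when the standard error is small, which is why equivalence studies tend to need large numbers of subjects.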

Non-inferiority

Today, we are going to learn how to determine sample size for inequality tests, using software, for three papers.

Then we will discuss some of the logic.

Paper #1

How was N=498 determined?

What reduction in CVD events can 224 + 224 subjects detect?

How many subjects would be needed to detect this Δ?

Software Output for % of CVD Events

224 + 224 → detect 6.7% vs. 1.13%, i.e., 88% ↓.

Need 3115 + 3115 to detect 25% ↓ from 6.7% to 5%, i.e., a total of (3115+3115)/0.9 = 6922.

From earlier design paper (Russell 2007):

Δ = 0.85(0.05) mm = 0.0425 mm

Software Output for Mean IMT

Each group N for 10% Dropout → 0.9N = 224

→ N = 224/0.9 = 249. Total study size = 2(249)=498
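The dropout adjustment above is simple enough to script. This is a minimal sketch; the function name is made up for illustration, and it assumes dropout is the only loss and that group sizes are rounded up.

```python
import math

def inflate_for_dropout(n_completers, dropout_rate):
    """Number to enroll so that about n_completers remain after dropout."""
    return math.ceil(n_completers / (1 - dropout_rate))

per_group = inflate_for_dropout(224, 0.10)  # 224 / 0.9, rounded up
total = 2 * per_group                       # two equal groups
print(per_group, total)
```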

Paper #2

Williamson paper

Software Output - Percentages

Software Output - Means

Can detect 0.4 SDs. Units? Since the normal range ≈ 6 SD, this corresponds to a shift of about 0.4/6 ≈ 7% of the normal range.

Applies to any continuously measured outcome.
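To see where a figure like "0.4 SDs" can come from, the sketch below computes the smallest difference, in SD units, detectable with a given group size, using the usual Normal-approximation formula for comparing two means. The group size of 100 is a hypothetical value chosen for illustration, not the Williamson study's N, and the function name is likewise an assumption.

```python
import math
from statistics import NormalDist

def detectable_effect_in_sds(n_per_group, alpha=0.05, power=0.80):
    """Smallest true difference (in SD units) detectable with the given
    two-sided alpha and power, for two equal groups."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = z.inv_cdf(power)          # about 0.842 for 80% power
    return (z_alpha + z_power) * math.sqrt(2 / n_per_group)

n_per_group = 100  # hypothetical group size, for illustration only
d = detectable_effect_in_sds(n_per_group)
print(round(d, 2))      # detectable difference in SDs
print(round(d / 6, 3))  # as a fraction of a ~6 SD normal range
```

Because the result is in SD units, the same calculation applies to any continuously measured outcome.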

Paper #3

From Nance paper

Δ ≈ 8%

$$\frac{\Delta}{SD\sqrt{1/N_1 + 1/N_2}} = 2.82$$

Solve for SD to get SD ≈ 6.8%

Software Output for Gilchrist Paper

Some Logic

How was 498 determined?

Back to:

How IMT Change Comparison Will be Made

Strength of Treatment Effect:

Signal:Noise Ratio

$$t = \frac{\text{Observed } \Delta}{SD\sqrt{1/N_1 + 1/N_2}}$$

Δ = Aggressive - Standard Mean Diff in IMT changes

SD = Std Dev of within group IMT changes

N1 = N2 = Group size

| t | > ~1.96 ↔ p<0.05
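The signal:noise ratio is easy to compute directly. The sketch below plugs in the design values used elsewhere in these slides (Δ = 0.0425 mm, SD = 0.16, 224 per group) as if they were the observed results. That substitution is an assumption made for illustration, as are the function names, and the p-value uses the large-sample Normal approximation rather than the exact t distribution.

```python
import math
from statistics import NormalDist

def signal_to_noise(delta, sd, n1, n2):
    """t = observed difference / (SD * sqrt(1/N1 + 1/N2))."""
    return delta / (sd * math.sqrt(1 / n1 + 1 / n2))

def two_sided_p(t):
    """Large-sample (Normal) two-sided p-value for a given t."""
    return 2 * (1 - NormalDist().cdf(abs(t)))

# Illustration: treat the design values as the observed data.
t = signal_to_noise(0.0425, 0.16, 224, 224)
print(round(t, 2), two_sided_p(t) < 0.05)
```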

Could Solve for N

$$t = \frac{\text{Observed } \Delta}{SD\sqrt{1/N_1 + 1/N_2}} \gtrsim 1.96$$

if (with N = N1 = N2):

$$\Delta \geq 1.96\,SD\sqrt{2/N} \quad\text{or}\quad N \geq \frac{2SD^2}{\Delta^2}(1.96)^2$$

This is not quite right.

The Δ is the actual observed difference.

This sample Δ will vary from the real Δ in “everyone”.

Need to increase N in case the sample happens to have a Δ that is lower than the real Δ (50% possibility).

Need to Increase N for Power

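Need to increase N to:

$$N = \frac{2SD^2}{\Delta^2}(1.96 + 0.842)^2$$

Power is the probability that p<0.05 if Δ is the real effect, incorporating the possibility that the Δ in our sample could be smaller.

$$N = \frac{2SD^2}{\Delta^2}(1.96)^2 \quad\text{for 50\% power.}$$

$$N = \frac{2SD^2}{\Delta^2}(1.96 + 0.842)^2 \quad\text{for 80\% power.}$$

$$N = \frac{2SD^2}{\Delta^2}(1.96 + 1.282)^2 \quad\text{for 90\% power.}$$

The values 0.842 and 1.282 are from Normal tables.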
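The three power formulas differ only in the Normal quantile added to 1.96, so one function covers them all. This is a sketch under the Normal approximation, with a made-up function name. It uses Δ = 0.0425 and SD = 0.16 from the worked example in these slides. It gives 223 per group at 80% power, not the 224 reported by the software, which presumably applies a small-sample correction (that explanation is an assumption, not something the slides state).

```python
import math
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Per-group N for comparing two means (Normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = z.inv_cdf(power)          # 0 at 50%, 0.842 at 80%, 1.282 at 90%
    return 2 * sd**2 / delta**2 * (z_alpha + z_power) ** 2

for power in (0.50, 0.80, 0.90):
    n = n_per_group(0.0425, 0.16, power=power)
    print(f"{power:.0%} power: N per group = {math.ceil(n)}")
```

Going from 50% to 80% power roughly doubles the required N, which is the cost of guarding against a sample Δ that falls below the real Δ.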

Info Needed for Study Size: Comparing Means

1. Effect

2. Subject variability

3. Type I error (1.96 for α=0.05; 2.58 for α=0.01)

4. Power (0.842 for 80% power; 1.645 for 95% power)

$$N = \frac{2SD^2}{\Delta^2}(1.96 + 0.842)^2$$

Same four quantities, but different formula, if comparing %s, hazard ratios, odds ratios, etc.
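As an example of a "different formula", here is one standard Normal-approximation formula for comparing two percentages, applied to the 6.7% vs. 5% CVD event rates from Paper #1. This is a sketch, not the method the software used: packages differ in approximations and continuity corrections, so it yields roughly 3,000 per group, not the software's exact 3115. The function name is an assumption.

```python
import math
from statistics import NormalDist

def n_per_group_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-group N for comparing two proportions (Normal approximation,
    pooled variance under H0, no continuity correction)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    p_bar = (p1 + p2) / 2
    null_term = z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
    alt_term = z_power * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return (null_term + alt_term) ** 2 / (p1 - p2) ** 2

n = n_per_group_proportions(0.067, 0.05)
print(math.ceil(n))  # on the order of 3,000 per group
```

The same four inputs appear (effect, variability, type I error, power), but variability is now determined by the percentages themselves.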

$$N = \frac{(1.96 + 0.842)^2\, 2(0.16)^2}{(0.0425)^2} \approx 224$$

Each group N for 10% Dropout → 0.9N = 224

→ N = 224/0.9 = 249. Total study size = 2(249)=498

$$N = \frac{2SD^2}{\Delta^2}(1.96 + 0.842)^2$$

Change Effect Size to be Detected
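Because Δ enters the formula squared in the denominator, the required N is very sensitive to the effect size chosen. The sketch below varies Δ with SD held at 0.16. Only Δ = 0.0425 comes from the slides; the other two values are hypothetical, and the function name is an assumption.

```python
import math

Z_ALPHA, Z_POWER = 1.96, 0.842  # alpha = 0.05 (two-sided), 80% power

def n_per_group(delta, sd):
    """Per-group N for comparing two means (Normal approximation)."""
    return 2 * sd**2 / delta**2 * (Z_ALPHA + Z_POWER) ** 2

sd = 0.16
for delta in (0.0425, 0.03, 0.02125):  # last two are hypothetical
    print(delta, math.ceil(n_per_group(delta, sd)))
```

Halving the effect to be detected quadruples the required group size.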

SD Estimate Could be Wrong

Should examine SD as study progresses.

May need to increase N if SD was underestimated.
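Since N is proportional to SD², an updated SD estimate rescales the planned N by the squared ratio of the two SDs. In this minimal sketch the interim SD of 0.18 is hypothetical and the function name is made up for illustration. It ignores any statistical adjustment a formal interim re-estimation might require.

```python
import math

def rescale_n(planned_n, planned_sd, observed_sd):
    """Rescale a per-group N when the SD estimate changes (N ~ SD^2)."""
    return math.ceil(planned_n * (observed_sd / planned_sd) ** 2)

# Planned with SD = 0.16; suppose the interim data suggest SD = 0.18.
print(rescale_n(224, 0.16, 0.18))
```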

Some Study Size Software

Free Study Size Software

www.stat.uiowa.edu/~rlenth/Power

Study Size Software in GCRC Lab

ncss.com ~$500

nQuery - Used by Most Drug Companies
