Efficacy, Safety and Futility Stopping Boundaries
ExL Pharma Workshop, Philadelphia, PA
Feb 25-26, 2008
Cyrus R. Mehta
President, Cytel Inc.
email: [email protected] – web: www.cytel.com – tel: 617-661-2011
1 ExL Pharma. Feb 25-26, 2008. Philadelphia
Contents of the Talk
• Three Real Examples with Early Stopping Boundaries
– Efficacy boundaries alone (CHARM trial)
– Efficacy and safety boundaries (CRASH trial)
– Efficacy and futility boundaries (COMET trial)
• Acknowledgement: I thank Professor Stuart Pocock for providing me with these examples
1. Tough Efficacy Boundaries: CHARM Trial
• Candesartan vs. placebo for reducing mortality in heart failure patients (CHARM). American Heart Journal (Pocock, 2005)
• Primary endpoint is all-cause mortality
• Require 85% power to detect a 14% reduction in annual mortality from 8% in the placebo group, with a 2-sided level-0.05 test
How Many Events Needed?
• Number of events needed to achieve 1 − β power is
$$D = 4\left[\frac{z_{\alpha/2} + z_{\beta}}{\ln(\mathrm{HR})}\right]^{2} = 1570$$
• Investigators want a 4 year study
– Enroll for 2 years
– Follow up for 2 additional years after last patient enrolled
• What sample size will produce 1570 events in 4 years?
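The event-count formula on this slide can be checked with a few lines of Python. This is a minimal sketch, assuming the 14% reduction is read as a hazard ratio of 0.86 (the slides do not state the HR explicitly); the function name `required_events` is illustrative. Under that assumption the result lands within about 1% of the 1570 quoted above, and the exact figure depends on how the hazard ratio is derived and rounded.

```python
from math import log
from statistics import NormalDist

def required_events(hazard_ratio, alpha=0.05, power=0.85):
    """Events needed for a two-sided level-alpha test with 1:1 allocation."""
    z = NormalDist().inv_cdf
    z_alpha = z(1.0 - alpha / 2.0)   # z_{alpha/2}
    z_beta = z(power)                # z_{beta}, where power = 1 - beta
    return 4.0 * ((z_alpha + z_beta) / log(hazard_ratio)) ** 2

# Assumption: "14% reduction" is taken to mean HR = 0.86.
events = required_events(0.86)
print(f"required events: {events:.0f}")
```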
Sample Size Calculation
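• Patients enroll at the rate of A per month for Sa months and are followed for an additional Sf months
[Timeline: accrual period from 0 to Sa, then follow-up from Sa to Sa + Sf]
• For exponential survival with hazard rate λ the expected number of failures by calendar time l is
$$D_\lambda(l) = \begin{cases} A\left(l - \dfrac{1 - e^{-\lambda l}}{\lambda}\right) & \text{for } l \le S_a \\ A\left\{S_a - \dfrac{e^{-\lambda l}}{\lambda}\left(e^{\lambda S_a} - 1\right)\right\} & \text{for } l > S_a \end{cases}$$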
• Sa = 24, Sa + Sf = 48. Find the accrual rate A such that
$$D_{\lambda_e}(48) + D_{\lambda_c}(48) = 1570$$
• We must enroll A = 317 subjects/month for 24 months. Sample size N = 317 × 24 ≈ 7600
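The accrual-rate calculation can be reproduced as follows. This is a sketch under stated assumptions, none of which appear on the slides: monthly hazards are derived from the 8% annual placebo mortality as −ln(1 − 0.08)/12, the treatment hazard is 0.86 times the placebo hazard, and the overall accrual rate A is split equally between the two arms. With these choices the overall rate comes out close to the 317 subjects/month quoted above.

```python
from math import exp, log

def expected_events(accrual_rate, hazard, l, s_a):
    """Expected failures by calendar time l under uniform accrual over [0, s_a]."""
    if l <= s_a:
        return accrual_rate * (l - (1.0 - exp(-hazard * l)) / hazard)
    return accrual_rate * (s_a - exp(-hazard * l) * (exp(hazard * s_a) - 1.0) / hazard)

# Assumptions (not stated on the slides): hazards per month, HR = 0.86.
lam_c = -log(1.0 - 0.08) / 12.0   # placebo (control) hazard
lam_e = 0.86 * lam_c              # experimental-arm hazard
s_a, study_length, target_events = 24, 48, 1570

# Expected events are linear in the accrual rate, so solve directly.
events_per_unit_rate = (expected_events(1.0, lam_e, study_length, s_a)
                        + expected_events(1.0, lam_c, study_length, s_a))
per_arm_rate = target_events / events_per_unit_rate
overall_rate = 2.0 * per_arm_rate  # assumption: each arm enrolls at A/2

print(f"overall accrual rate: {overall_rate:.1f} subjects/month")
print(f"sample size: {overall_rate * s_a:.0f}")
```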
Single Look Design: Blinded Monitoring of Events
• The single-look design monitors events in blinded fashion until 1570 events have been observed, then performs the final analysis
• Statistical significance is declared if |Z| ≥ 1.96, or equivalently p ≤ 0.05
• Designs with unblinded interim monitoring and possible early stopping have more complicated criteria for declaring significance
Group Sequential Design: Unblinded Monitoring of Events
• In a group sequential design a DMC performs unblinded efficacy analyses up to K times, after observing D1, D2, . . . DK events
• Let c1, c2, . . . cK be the corresponding stopping boundaries. Statistical significance is declared the first time that |Zj| ≥ cj
• We require the cj’s to satisfy the level condition:
$$P_0\left\{\bigcup_{j=1}^{K} \left(|Z_j| \ge c_j\right)\right\} = \alpha$$
Spending Function Boundaries
• Specify a monotone increasing function α(t) of t for t ∈ [0, 1] with α(0) = 0, α(1) = α. Lan and DeMets (1983) have proposed
$$\alpha(t) = 4 - 4\Phi\left(\frac{z_{\alpha/4}}{\sqrt{t}}\right)$$
but any other monotone function could also be used
• Let tj = Dj/DK be the “information fraction” at look j
• Solve recursively for c1, c2, . . . cK:
P0{|Z1| ≥ c1} = α(t1)
α(t1) + P0{|Z1| < c1, |Z2| ≥ c2} = α(t2)
and for j = 3, . . . K,
α(tj−1) + P0{|Z1| < c1, . . . , |Zj−1| < cj−1, |Zj| ≥ cj} = α(tj)
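The recursion above can be sketched numerically. The code below is one possible implementation, not the method used by any particular software package: it works on the score scale Sj = Zj√tj, whose increments under H0 are independent normals, and propagates the continuation density on a midpoint grid. The function names, the grid size and the choice of K = 3 equally spaced looks are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

N = NormalDist()

def obf_spend(t, alpha=0.05):
    """Lan-DeMets O'Brien-Fleming-type spending function (two-sided)."""
    return 4.0 - 4.0 * N.cdf(N.inv_cdf(1.0 - alpha / 4.0) / sqrt(t))

def spending_boundaries(times, spend, grid=400):
    """Two-sided Z-scale boundaries c_1..c_K at information fractions `times`."""
    c = N.inv_cdf(1.0 - spend(times[0]) / 2.0)        # P0{|Z_1| >= c_1} = alpha(t_1)
    bounds = [c]
    b = c * sqrt(times[0])                            # boundary on the S scale
    h = 2.0 * b / grid
    pts = [-b + (i + 0.5) * h for i in range(grid)]   # midpoints of continuation region
    sd1 = sqrt(times[0])
    mass = [h * N.pdf(s / sd1) / sd1 for s in pts]    # P(S_1 in cell), trial not stopped
    for j in range(1, len(times)):
        sd = sqrt(times[j] - times[j - 1])            # sd of the increment S_j - S_{j-1}
        target = spend(times[j]) - spend(times[j - 1])  # alpha to be spent at look j
        root_t = sqrt(times[j])

        def exit_prob(cj):
            bj = cj * root_t
            return sum(m * (N.cdf((-bj - u) / sd) + 1.0 - N.cdf((bj - u) / sd))
                       for u, m in zip(pts, mass))

        lo, hi = 0.0, 10.0
        for _ in range(50):                           # bisection: exit_prob falls as cj grows
            mid = 0.5 * (lo + hi)
            if exit_prob(mid) > target:
                lo = mid
            else:
                hi = mid
        c = 0.5 * (lo + hi)
        bounds.append(c)
        if j < len(times) - 1:                        # propagate density to the next look
            b = c * root_t
            h = 2.0 * b / grid
            new_pts = [-b + (i + 0.5) * h for i in range(grid)]
            mass = [h * sum(m * N.pdf((s - u) / sd) / sd for u, m in zip(pts, mass))
                    for s in new_pts]
            pts = new_pts
    return bounds

# Assumption for illustration: three equally spaced looks.
boundaries = spending_boundaries([1 / 3, 2 / 3, 1.0], obf_spend)
print([round(c, 3) for c in boundaries])
```

The boundaries start very high and fall toward a final value slightly above 1.96, which is the penalty paid at the last look for the earlier opportunities to stop.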
Lan-DeMets (OBF) α-Spending Function
$$\alpha(t) = 4 - 4\Phi\left(\frac{z_{\alpha/4}}{\sqrt{t}}\right)$$
Lan-DeMets (PK) α-Spending Function
α(t) = α log{1 + (e − 1)t}
A Parametric Family of Spending Functions
• Gamma family: Hwang IK, Shih WJ and DeCani JS (1990). Statistics in Medicine, 9, 1439-1445
$$\alpha(t) = \alpha\,\frac{1 - e^{-\gamma t}}{1 - e^{-\gamma}}, \quad \gamma \neq 0$$
• Setting γ to -4 or -5 generates boundaries similar to O’Brien-Fleming, while setting γ to 1 generates boundaries similar to Pocock
• Can generate very conservative or very aggressive boundaries by choice of γ
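The three spending functions from the preceding slides can be compared side by side. A minimal sketch, assuming a two-sided α = 0.05 and four illustrative information fractions; the function names are made up for this example.

```python
from math import e, exp, log, sqrt
from statistics import NormalDist

N = NormalDist()

def ld_obf(t, alpha=0.05):
    """Lan-DeMets O'Brien-Fleming-type spending function."""
    return 4.0 - 4.0 * N.cdf(N.inv_cdf(1.0 - alpha / 4.0) / sqrt(t))

def ld_pocock(t, alpha=0.05):
    """Lan-DeMets Pocock-type spending function."""
    return alpha * log(1.0 + (e - 1.0) * t)

def gamma_family(t, gamma, alpha=0.05):
    """Hwang-Shih-DeCani gamma family (gamma must be non-zero)."""
    return alpha * (1.0 - exp(-gamma * t)) / (1.0 - exp(-gamma))

print("t      LD(OBF)   gamma=-4  LD(PK)    gamma=1")
for t in (0.25, 0.5, 0.75, 1.0):
    print(f"{t:<6} {ld_obf(t):<9.5f} {gamma_family(t, -4):<9.5f} "
          f"{ld_pocock(t):<9.5f} {gamma_family(t, 1):<9.5f}")
```

All four functions spend the full α at t = 1; they differ in how much of it is released at early looks.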
Impact of Group Sequential Design on Power and P-value Penalty
• Taking multiple looks for early stopping:
– decreases power
– raises the p-value hurdle at the final look
• The magnitude of these changes increases with the number of looks and with the aggressiveness of the stopping boundaries
• The Lan-DeMets, O’Brien-Fleming type boundary is popular because it is not too aggressive
• For the CHARM trial, however, the DMC wanted much tougher boundaries
Haybittle-Peto Boundaries
• The DMC decided that they would use the Haybittle-Peto boundaries
• These are very simple rules based purely on the p-value at each look
– Plan for 7 looks
– Reject if p < 0.0001 at the first 3 looks
– Reject if p < 0.001 at the next 3 looks
– Adjust the p-value at the final look to get a level-α test
Last-look P-Value for HP
Let cj = Φ^{-1}(1 − 0.0001/2) for j = 1, 2, 3 and cj = Φ^{-1}(1 − 0.001/2) for j = 4, 5, 6. Then c7 satisfies
$$P_0\left\{\bigcup_{j=1}^{6} \left(|Z_j| \ge c_j\right) \cup \left(|Z_7| \ge c_7\right)\right\} = \alpha$$
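One way to approximate c7 is by simulation under H0. The sketch below is a Monte Carlo estimate, not an exact calculation, and it assumes seven equally spaced looks (the actual CHARM looks were not equally spaced). The function name, seed and number of simulations are arbitrary choices; the estimate carries simulation noise in the second decimal place.

```python
import random
from math import sqrt
from statistics import NormalDist

def hp_final_boundary(early_p, alpha=0.05, n_sims=100_000, seed=2008):
    """Estimate the final-look critical value by simulating Z_1..Z_K under H0."""
    rng = random.Random(seed)
    inv = NormalDist().inv_cdf
    early_c = [inv(1.0 - p / 2.0) for p in early_p]   # two-sided interim boundaries
    n_looks = len(early_c) + 1
    early_stops = 0
    final_abs_z = []
    for _ in range(n_sims):
        s = 0.0
        crossed = False
        for j in range(1, n_looks + 1):
            s += rng.gauss(0.0, 1.0)   # equal information per look (assumption)
            z = s / sqrt(j)            # Z statistic at look j
            if j < n_looks and abs(z) >= early_c[j - 1]:
                crossed = True
                break
        if crossed:
            early_stops += 1
        else:
            final_abs_z.append(abs(z))
    # Allow just enough final-look rejections to bring the total to alpha.
    allowed = round(alpha * n_sims) - early_stops
    final_abs_z.sort(reverse=True)
    return final_abs_z[allowed - 1], early_stops / n_sims

c7, p_early = hp_final_boundary([0.0001] * 3 + [0.001] * 3)
final_p = 2.0 * (1.0 - NormalDist().cdf(c7))
print(f"c7 ~ {c7:.3f}, nominal final p ~ {final_p:.4f}, P0(early stop) ~ {p_early:.4f}")
```

Because so little α is spent at the interim looks, the final critical value stays close to the single-look value of 1.96.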
Comments on HP Boundaries
• Developed as an ad-hoc method to enable interim monitoring when there is no serious intention to stop early
• Final p-value depends on number and spacing of interim looks. It must be re-calculated if these design parameters change
Interim Monitoring of CHARM
Efficacy results at each DMC meeting
Date       # Deaths   HP Boundary   P-Value   Hazard Ratio
8/9/99     12         0.0001        0.3       0.55
3/27/00    199        0.0001        0.0007    0.618
7/27/00    331        0.0001        0.0002    0.664
3/1/01     599        0.001         0.0006    0.755
8/9/01     861        0.001         0.001     0.799
2/22/02    1187       0.001         0.009     0.859
8/1/02     1438       0.001         0.015     0.880
3/31/03    1831       0.001         0.055     0.914
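The table can be scanned programmatically for boundary crossings. A small sketch using the values exactly as tabulated above; the tabulated p-values are rounded, so a tie such as 0.001 against a boundary of 0.001 is treated here as not crossing, which is an assumption.

```python
# (date, Haybittle-Peto boundary, observed p-value) as tabulated above
charm_looks = [
    ("8/9/99", 0.0001, 0.3),
    ("3/27/00", 0.0001, 0.0007),
    ("7/27/00", 0.0001, 0.0002),
    ("3/1/01", 0.001, 0.0006),
    ("8/9/01", 0.001, 0.001),
    ("2/22/02", 0.001, 0.009),
    ("8/1/02", 0.001, 0.015),
    ("3/31/03", 0.001, 0.055),
]

# The rule rejects when p is strictly below the boundary.
crossed = [date for date, boundary, p in charm_looks if p < boundary]
first_crossing_look = next(
    (i for i, (_, boundary, p) in enumerate(charm_looks, start=1) if p < boundary),
    None,
)
print("boundary crossed on:", crossed)
print("first crossing at DMC meeting number:", first_crossing_look)
```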
Why Didn’t They Stop at Look 4?
• Secondary endpoints, CV death and CHF hospitalization, still awaiting adjudication
• Very short average length of follow-up
• No previous trial had shown evidence of benefit from Candesartan
• Results did not appear strong enough to influence clinical practice
2. Asymmetric Efficacy and Safety Boundaries: The CRASH Trial
Large international multicenter trial to determine efficacy and safety of administering intravenous corticosteroids to subjects with significant head injury (Lancet, vol 364, 2004)
• Endpoint is death within 14 days of randomization
• Randomize subjects with Glasgow Coma Score ≤ 14 to placebo or corticosteroids
• Placebo arm 14-day mortality estimated to be 15%
• Design for 90% power to detect a 2% drop in 14-day mortality with a two-sided test conducted at level α = 0.05
• Risk-benefit ratio is unclear. Corticosteroids believed to be beneficial, but evidence from meta-analysis suggests possibility of harm
Single-Look Design
12640 patients required to achieve 90% power
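The 12640 figure can be reproduced with the usual normal-approximation formula for comparing two proportions. A minimal sketch, assuming the "2% drop" is an absolute drop from 15% to 13%, 1:1 allocation, and an unpooled variance; the function name is illustrative.

```python
from statistics import NormalDist

def n_per_arm_two_proportions(p_control, p_treatment, alpha=0.05, power=0.90):
    """Per-arm sample size for a two-sided test of a difference in proportions."""
    z = NormalDist().inv_cdf
    z_alpha = z(1.0 - alpha / 2.0)
    z_beta = z(power)
    variance = p_control * (1.0 - p_control) + p_treatment * (1.0 - p_treatment)
    delta = p_control - p_treatment
    return (z_alpha + z_beta) ** 2 * variance / delta ** 2

# Assumption: 14-day mortality of 15% on placebo vs 13% on corticosteroids.
n_arm = n_per_arm_two_proportions(0.15, 0.13)
n_total = 2.0 * n_arm
print(f"per arm: {n_arm:.0f}, total: {n_total:.0f}")
```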
Drawback of the Single-Look Design
• Very large sample size commitment with no possibility of early termination for benefit, harm or futility
• Suppose the corticosteroids are actually beneficial? Do we really have to randomize 6320 patients to placebo before we know for sure?
• What if the meta-analysis results are correct and corticosteroids are actually harmful? In that case we will have randomized 6320 patients to a treatment that is worse than placebo
Group Sequential Design
• Monitor the interim data
• Stop the trial early if evidence of benefit or harm emerges
Interim Monitoring of CRASH
• Recruitment began in April 1999. The DMC met twice. The efficacy results at the two meetings are tabulated below along with the final data:
Date of        Corticosteroid             Placebo                   Statistics
DMC Meeting    Deaths          Subjects   Deaths        Subjects    δ̂       se(δ̂)    Z
June 2003      627 (20.2%)     3102       562 (18.2%)   3091        0.02    0.01     2.0
May 2004       902 (20.6%)     4377       776 (17.9%)   4334        0.027   0.0084   3.2
Final Data     1052 (21.1%)    4985       893 (17.9%)   4979        0.032   0.0079   4.0
• The safety boundary was crossed at the second look and the DMC stopped the trial, declaring that use of corticosteroids was unsafe
• The final analysis confirmed the conclusions of the DMC
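The statistics columns in the CRASH table follow from the death counts. A short sketch, assuming δ̂ is the difference in death proportions (corticosteroid minus placebo) with an unpooled standard error; the slides do not say which variance estimate was used, and the function name is illustrative.

```python
from math import sqrt

def risk_difference(deaths_trt, n_trt, deaths_pbo, n_pbo):
    """Return (delta_hat, standard error, Z) for treatment minus placebo."""
    p_trt = deaths_trt / n_trt
    p_pbo = deaths_pbo / n_pbo
    delta = p_trt - p_pbo
    se = sqrt(p_trt * (1.0 - p_trt) / n_trt + p_pbo * (1.0 - p_pbo) / n_pbo)
    return delta, se, delta / se

crash_looks = {
    "June 2003": (627, 3102, 562, 3091),
    "May 2004": (902, 4377, 776, 4334),
    "Final Data": (1052, 4985, 893, 4979),
}
crash_stats = {label: risk_difference(*counts) for label, counts in crash_looks.items()}
for label, (delta, se, z) in crash_stats.items():
    print(f"{label:<11} delta={delta:.3f}  se={se:.4f}  Z={z:.1f}")
```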
3. Futility Boundaries: The COMET Trial
• Dornase alpha versus placebo for patients with a hospitalized exacerbation of chronic bronchitis
• Primary endpoint is 90-day all-cause mortality
• Encouraging results from 244 patient pilot (p = 0.002)
• Investigators plan a 3-look 5600 patient trial
• Provides 90% power to detect a 20% drop in 90-day mortality (15% to 12%) with a two-sided level-0.05 test
Options for Futility Boundaries
1. Low conditional power
• Somewhat arbitrary. How low?
• Impact on overall power must be evaluated
2. Lower confidence bound rules out clinical benefit
• Has same drawbacks as conditional power
3. Formal futility boundary
• Overall power is preserved via β-spending function
• Boundary can be expressed in terms of conditional power
Benefit of Formal Futility Boundaries
• Stopping a trial for futility (as opposed to safety) is a difficult recommendation for the DMC to make, and a painful decision for the sponsor
• As a result many trials continue to the end, consuming resources that could have been better utilized on other compounds
• The presence of a formal futility boundary whose operating characteristics have been examined ahead of time can encourage more aggressive early stopping for futility
Implications of Adding a Futility Boundary
• Need larger sample size (5699 patients) for same amount of power
• Lower expected sample size whether the risk reduction is 20% or 0%
• The two boundaries meet at l3 = u3 = −1.959, an easier hurdle than the corresponding boundary c3 = −1.993 for the design with no futility boundary
Boundary Interaction
• Because of the futility boundary, the final efficacy boundary became easier to cross (from -1.993 to -1.959). This is sometimes referred to as buying back the α
• In this case the final efficacy boundary is more favorable even than -1.96, the one-sided efficacy cut-off for a single-look trial
• We shall see, however, that there is a price to be paid for this windfall: the futility boundary is binding.
Why Futility Boundary Relaxes the Efficacy Criterion
• Multiple looks give you extra opportunities to cross an efficacy boundary under H0
• Therefore you pay a penalty (c3 = 1.993 in Plan 1) to prevent excess false positives
• But multiple looks give you extra opportunities to cross a futility boundary under H1
• Therefore you receive a reward (l3 = u3 = 1.959 in Plan 5) to prevent excess false negatives
Warning! This Futility Boundary is Binding
• The sponsor should be aware that taking advantage of this reward (i.e., relaxing the standard for declaring statistical significance at the final look) will make the futility boundary binding
• If you overrule the futility boundary, the type-1 error will be inflated
Non-Binding Futility Boundary
• To make the futility boundary non-binding, you must leave the efficacy boundary untouched
• Only in that way can you be assured that the type-1 error will be preserved
• This will cost you a slight loss of power since you cannot pull up the efficacy boundary anymore
Plan 3 has a non-binding futility boundary
Notice how the sample size increases when going from Plan 1 to Plan 2 to Plan 3 to compensate for the power loss
Interim Monitoring of COMET
At the DMC meeting in July 1995, the trial was stopped for futility with 197/1866 (10.6%) deaths on dornase alpha and 163/1865 (8.7%) deaths on placebo
Final Comments on COMET
• In the actual trial there was no futility boundary
• Trial was indeed stopped, but only after much discussion
• That decision still remains a topic for discussion
• Had there been a futility boundary it would have been crossed by a wide margin and there would be no further discussion
• What would the DMC have done had the results come out 163/1865 (8.7%) for placebo and 153/1866 (8.2%) for dornase alpha? Without a futility boundary the decision would have been more difficult
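To make the contrast between the actual and the hypothetical results concrete, the sketch below computes conditional power under the design alternative for both. It is an illustration under several assumptions that are not from the slides: the information fraction is taken as patients observed over the planned 5600, the final critical value is 1.96 (group sequential adjustments are ignored), the drift under the alternative is z(0.975) + z(0.90), and Z is oriented so that positive values favor dornase alpha.

```python
from math import sqrt
from statistics import NormalDist

N = NormalDist()

def z_benefit(deaths_trt, n_trt, deaths_pbo, n_pbo):
    """Z for the risk difference, positive when treatment has fewer deaths."""
    p_trt = deaths_trt / n_trt
    p_pbo = deaths_pbo / n_pbo
    se = sqrt(p_trt * (1.0 - p_trt) / n_trt + p_pbo * (1.0 - p_pbo) / n_pbo)
    return (p_pbo - p_trt) / se

def conditional_power(z, t, drift, c_final=1.96):
    """P(Z_final >= c_final | Z_t = z) when E[Z_final] = drift (Brownian approximation)."""
    b = z * sqrt(t)   # B-value at information fraction t
    return 1.0 - N.cdf((c_final - b - drift * (1.0 - t)) / sqrt(1.0 - t))

# Assumptions: information fraction from patient counts; drift for 90% power.
t_interim = (1866 + 1865) / 5600
design_drift = N.inv_cdf(0.975) + N.inv_cdf(0.90)

cp_actual = conditional_power(z_benefit(197, 1866, 163, 1865), t_interim, design_drift)
cp_hypothetical = conditional_power(z_benefit(153, 1866, 163, 1865), t_interim, design_drift)
print(f"actual results:       conditional power = {cp_actual:.5f}")
print(f"hypothetical results: conditional power = {cp_hypothetical:.2f}")
```

Under these assumptions the actual results leave essentially no chance of a positive trial, while the hypothetical results leave a modest but non-negligible chance, which is the kind of situation where a pre-specified rule helps the DMC most.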