
Efficacy, Safety and Futility Stopping Boundaries

ExL Pharma Workshop, Philadelphia, PA

Feb 25-26, 2007

Cyrus R. Mehta

President, Cytel Inc.

email: [email protected] – web: www.cytel.com – tel: 617-661-2011

1 ExL Pharma. Feb 25-26, 2008. Philadelphia

Contents of the Talk

• Three Real Examples with Early Stopping Boundaries

– Efficacy boundaries alone (CHARM trial)

– Efficacy and safety boundaries (CRASH trial)

– Efficacy and futility boundaries (COMET trial)

• Acknowledgement: I thank Professor Stuart Pocock for providing me with these examples


1. Tough Efficacy Boundaries: CHARM Trial

• Candesartan vs. placebo for reducing mortality in heart failure patients (CHARM). American Heart Journal (Pocock, 2005)

• Primary endpoint is all-cause mortality

• Require 85% power to detect a 14% reduction in annual mortality from 8% in the placebo group, with a 2-sided level-0.05 test


How Many Events Needed?

• Number of events needed to achieve $1 - \beta$ power is

$$D = 4\left[\frac{z_{\alpha/2} + z_\beta}{\ln(\mathrm{HR})}\right]^2 = 1570$$

• Investigators want a 4 year study

– Enroll for 2 years

– Follow up for 2 additional years after last patient enrolled

• What sample size will produce 1570 events in 4 years?
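The event formula above can be checked numerically. The sketch below is a minimal Python version, assuming 1:1 randomization and reading the 14% reduction as a hazard ratio of 0.86 (the slide does not state the conversion). Under that assumption it gives about 1579 events, close to the slide's 1570; the small gap may reflect a different conversion of the 14% reduction or rounding in the design software.

```python
from math import log

from scipy.stats import norm


def required_events(hr, alpha=0.05, power=0.85):
    """Events needed for a two-sided level-alpha log-rank test, 1:1 allocation:
    D = 4 * ((z_{alpha/2} + z_beta) / ln(HR))**2."""
    z_alpha = norm.isf(alpha / 2)   # upper alpha/2 point (1.96 for alpha = 0.05)
    z_beta = norm.isf(1 - power)    # upper beta point (about 1.036 for 85% power)
    return 4 * ((z_alpha + z_beta) / log(hr)) ** 2


# Assumption: "14% reduction" is read as a hazard ratio of 0.86
D = required_events(0.86)
print(round(D))
```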


Sample Size Calculation

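• Patients enroll at the rate of $A$ per month for $S_a$ months and are followed for an additional $S_f$ months

[Figure: study timeline. Accrual period runs from 0 to $S_a$; accrual plus follow-up runs from 0 to $S_a + S_f$]

• For exponential survival with hazard rate $\lambda$, the expected number of failures by calendar time $l$ is

$$D_\lambda(l) = \begin{cases} A\left(l - \dfrac{1 - e^{-\lambda l}}{\lambda}\right) & \text{for } l \le S_a \\[1ex] A\left\{S_a - \dfrac{e^{-\lambda l}}{\lambda}\left(e^{\lambda S_a} - 1\right)\right\} & \text{for } l > S_a \end{cases}$$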


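• $S_a = 24$, $S_a + S_f = 48$. Find the total accrual rate $A$ such that

$$D_{\lambda_e}(48) + D_{\lambda_c}(48) = 1570$$

where, with 1:1 randomization, each arm accrues at $A/2$, so each $D_\lambda$ is evaluated with $A/2$ in place of $A$

• We must enroll $A = 317$ subjects/month for 24 months. Sample size $N = 317 \times 24 \approx 7600$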
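The accrual calculation can be sketched by solving for $A$ numerically. This is a minimal sketch under stated assumptions: exponential survival, 8% annual placebo mortality converted to a monthly hazard, a hazard ratio of 0.86 for candesartan, and 1:1 allocation so that each arm accrues at $A/2$. The function names are illustrative. Under these assumptions the root comes out close to 317 patients/month.

```python
import numpy as np
from scipy.optimize import brentq


def expected_events(rate, lam, l, s_a):
    """Expected failures by calendar time l, for uniform accrual at `rate` per month
    over s_a months and exponential survival with monthly hazard lam."""
    if l <= s_a:
        return rate * (l - (1 - np.exp(-lam * l)) / lam)
    return rate * (s_a - np.exp(-lam * l) / lam * (np.exp(lam * s_a) - 1))


lam_c = -np.log(1 - 0.08) / 12   # assumption: 8% annual placebo mortality, exponential
lam_e = 0.86 * lam_c             # assumption: 14% reduction read as hazard ratio 0.86


def total_events(a, l=48, s_a=24):
    # Total accrual a per month split 1:1, so each arm accrues a / 2 per month
    return expected_events(a / 2, lam_c, l, s_a) + expected_events(a / 2, lam_e, l, s_a)


A = brentq(lambda a: total_events(a) - 1570, 1.0, 5000.0)
print(round(A, 1), round(24 * A))   # accrual rate and total sample size
```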


Single Look Design


Single-Look Design: Blinded Monitoring of Events

• The single-look design monitors events in blinded fashion until 1570 events have been observed, then performs the final analysis

• Statistical significance is declared if |Z| ≥ 1.96, or equivalently two-sided p ≤ 0.05

• Designs with unblinded interim monitoring and possible early stopping have more complicated criteria for declaring significance


Group Sequential Design: Unblinded Monitoring of Events

• In a group sequential design a DMC performs unblinded efficacy analyses up to $K$ times, after observing $D_1, D_2, \ldots, D_K$ events

• Let $c_1, c_2, \ldots, c_K$ be the corresponding stopping boundaries. Statistical significance is declared the first time that $|Z_j| \ge c_j$

• We require the $c_j$'s to satisfy the level condition:

$$P_0\left\{\bigcup_{j=1}^{K} \left(|Z_j| \ge c_j\right)\right\} = \alpha$$
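Why the level condition matters: testing at the fixed single-look cut-off 1.96 at every look inflates the type-1 error. The Monte Carlo sketch below assumes equally spaced looks, so that under $H_0$ the score process has independent standard normal increments between looks, and estimates $P_0(\text{any } |Z_j| \ge 1.96)$. The estimate grows well beyond 0.05 as looks are added, to roughly 0.14 with five looks.

```python
import numpy as np


def naive_type1_error(n_looks, crit=1.96, n_sims=200_000, seed=2024):
    """Monte Carlo estimate of P0(|Z_j| >= crit at any of n_looks equally spaced looks)."""
    rng = np.random.default_rng(seed)
    # Score at look j is a sum of j iid N(0, 1) increments; Z_j = S_j / sqrt(j)
    scores = np.cumsum(rng.standard_normal((n_sims, n_looks)), axis=1)
    z = scores / np.sqrt(np.arange(1, n_looks + 1))
    return np.mean(np.any(np.abs(z) >= crit, axis=1))


for k in (1, 2, 5):
    print(k, round(naive_type1_error(k), 3))
```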


Spending Function Boundaries

• Specify a monotone increasing function $\alpha(t)$ for $t \in [0, 1]$ with $\alpha(0) = 0$, $\alpha(1) = \alpha$. Lan and DeMets (1983) have proposed

$$\alpha(t) = 4 - 4\Phi\left(\frac{z_{\alpha/4}}{\sqrt{t}}\right)$$

but any other monotone function could also be used

• Let $t_j = D_j / D_K$ be the "information fraction" at look $j$

• Solve recursively for $c_1, c_2, \ldots, c_K$:

$$P_0\{|Z_1| \ge c_1\} = \alpha(t_1)$$

$$\alpha(t_1) + P_0\{|Z_1| < c_1,\ |Z_2| \ge c_2\} = \alpha(t_2)$$

and for $j = 3, \ldots, K$,

$$\alpha(t_{j-1}) + P_0\{|Z_1| < c_1, \ldots, |Z_{j-1}| < c_{j-1},\ |Z_j| \ge c_j\} = \alpha(t_j)$$
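The recursion can be solved numerically. The sketch below propagates the null density of the score $S(t) = Z(t)\sqrt{t}$ on a grid between looks (a standard recursive-integration approach) and uses a root finder to pick each $c_j$ so that the cumulative crossing probability equals $\alpha(t_j)$. It assumes symmetric two-sided boundaries and the Lan-DeMets O'Brien-Fleming-type spending function in the form written above; the grid size and the five equally spaced looks in the example are arbitrary choices. It is an illustration, not the algorithm of any particular software package.

```python
import numpy as np
from scipy.integrate import trapezoid
from scipy.optimize import brentq
from scipy.stats import norm


def crossing_probs(t, c, n_grid=501):
    """P0(first crossing |Z_j| >= c_j happens at look j), j = 1..K, by recursive
    integration of the score S(t) = Z(t) * sqrt(t), a Brownian motion under H0."""
    probs, grid, dens, t_prev = [], None, None, 0.0
    for tj, cj in zip(t, c):
        sd, b = np.sqrt(tj - t_prev), cj * np.sqrt(tj)  # increment s.d., boundary on S scale
        new_grid = np.linspace(-b, b, n_grid)           # continuation region |S| < b
        if dens is None:                                  # first look: S ~ N(0, t_1)
            probs.append(2 * norm.sf(b / sd))
            new_dens = norm.pdf(new_grid / sd) / sd
        else:                                             # later looks: integrate surviving paths
            tail = norm.sf((b - grid) / sd) + norm.cdf((-b - grid) / sd)
            probs.append(trapezoid(dens * tail, grid))
            kernel = norm.pdf((new_grid[:, None] - grid[None, :]) / sd) / sd
            new_dens = trapezoid(kernel * dens[None, :], grid, axis=1)
        grid, dens, t_prev = new_grid, new_dens, tj
    return np.array(probs)


def ld_obf(t, alpha=0.05):
    """Two-sided Lan-DeMets O'Brien-Fleming-type spending, in the form on the slide."""
    return 4 - 4 * norm.cdf(norm.isf(alpha / 4) / np.sqrt(t))


def spending_boundaries(t, spend=ld_obf, alpha=0.05):
    """Solve c_1..c_K in turn so the cumulative crossing probability equals alpha(t_j)."""
    cs = []
    for j, tj in enumerate(t):
        target = spend(tj, alpha)

        def excess(cand):
            return crossing_probs(t[: j + 1], cs + [cand]).sum() - target

        cs.append(brentq(excess, 0.5, 10.0))
    return np.array(cs)


t = [0.2, 0.4, 0.6, 0.8, 1.0]
c = spending_boundaries(t)
print(np.round(c, 3))
```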


Lan-DeMets (OBF) α-Spending Function

$$\alpha(t) = 4 - 4\Phi\left(\frac{z_{\alpha/4}}{\sqrt{t}}\right)$$


Lan-DeMets (PK) α-Spending Function

α(t) = α log{1 + (e − 1)t}
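A quick way to see the difference between the two spending functions is to tabulate how much α each has spent at a few information fractions. This sketch assumes two-sided α = 0.05 and takes $z_{\alpha/4}$ as the upper α/4 point of the standard normal.

```python
import numpy as np
from scipy.stats import norm


def ld_obf(t, alpha=0.05):
    """Lan-DeMets O'Brien-Fleming-type spending: 4 - 4*Phi(z_{alpha/4} / sqrt(t)), t > 0."""
    return 4 - 4 * norm.cdf(norm.isf(alpha / 4) / np.sqrt(t))


def ld_pk(t, alpha=0.05):
    """Lan-DeMets Pocock-type spending: alpha * log(1 + (e - 1) t)."""
    return alpha * np.log(1 + (np.e - 1) * t)


for t in (0.25, 0.5, 0.75, 1.0):
    print(f"t={t:.2f}  OBF={ld_obf(t):.5f}  PK={ld_pk(t):.5f}")
```

The O'Brien-Fleming type spends almost nothing early and saves most of α for the end, which is why it is described as less aggressive.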


A Parametric Family of Spending Functions

• Gamma family: Hwang IK, Shih WJ and DeCani JS (1990). Statistics in Medicine, 9, 1439-1445

$$\alpha(t) = \alpha\,\frac{1 - e^{-\gamma t}}{1 - e^{-\gamma}}, \quad \gamma \ne 0$$

• Setting $\gamma$ to -4 or -5 generates boundaries similar to O’Brien-Fleming, while setting $\gamma$ to 1 generates boundaries similar to Pocock

• Can generate very conservative or very aggressive boundaries by choice of $\gamma$
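The gamma family is simple to evaluate directly. The sketch below uses γ values that appear on these slides, including the Gamma(-12) boundary considered for CHARM, and assumes α = 0.05. Negative γ spends very little α early, while γ = 1 spends it much faster.

```python
import numpy as np


def gamma_spend(t, gamma, alpha=0.05):
    """Hwang-Shih-DeCani spending: alpha * (1 - exp(-gamma t)) / (1 - exp(-gamma))."""
    if gamma == 0:
        raise ValueError("gamma must be non-zero")
    return alpha * (1 - np.exp(-gamma * t)) / (1 - np.exp(-gamma))


for g in (-12, -4, 1):
    print(g, [round(float(gamma_spend(t, g)), 5) for t in (0.25, 0.5, 0.75, 1.0)])
```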


LD(OF) Boundaries of CHARM


Impact of Group Sequential Design on Power and P-value Penalty

• Taking multiple looks for early stopping:

– decreases power

– raises the p-value hurdle at the final look

• The magnitude of these changes increases with the number of looks and with the aggressiveness of the stopping boundaries

• The Lan-DeMets O’Brien-Fleming-type boundary is popular because it is not too aggressive

• For the CHARM trial, however, the DMC wanted much tougher boundaries


LD(OF), Haybittle-Peto and Gamma(-12) Boundaries


Haybittle-Peto Boundaries

• The DMC decided that they would use the Haybittle-Peto boundaries

• These are very simple rules based purely on the p-value at each look

– Plan for 7 looks

– Reject if p < 0.0001 at the first 3 looks

– Reject if p < 0.001 at the next 3 looks

– Adjust the p-value at the final look to get a level-α test


Last-look P-Value for HP

Let $c_j = \Phi^{-1}(1 - 0.0001/2)$ for $j = 1, 2, 3$ and $c_j = \Phi^{-1}(1 - 0.001/2)$ for $j = 4, 5, 6$. Then $c_7$ satisfies

$$P_0\left\{\bigcup_{j=1}^{6} \left(|Z_j| \ge c_j\right) \cup \left(|Z_7| \ge c_7\right)\right\} = \alpha$$
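The last-look critical value $c_7$ can be found by recursive numerical integration with $c_1, \ldots, c_6$ held fixed. This sketch assumes seven equally spaced looks; the answer changes if the number or spacing of looks changes. Because the six interim looks spend only a few thousandths of α, $c_7$ ends up slightly above 1.96.

```python
import numpy as np
from scipy.integrate import trapezoid
from scipy.optimize import brentq
from scipy.stats import norm


def crossing_probs(t, c, n_grid=501):
    """P0(first crossing |Z_j| >= c_j happens at look j), j = 1..K, by recursive
    integration of the score S(t) = Z(t) * sqrt(t), a Brownian motion under H0."""
    probs, grid, dens, t_prev = [], None, None, 0.0
    for tj, cj in zip(t, c):
        sd, b = np.sqrt(tj - t_prev), cj * np.sqrt(tj)
        new_grid = np.linspace(-b, b, n_grid)            # continuation region |S| < b
        if dens is None:
            probs.append(2 * norm.sf(b / sd))
            new_dens = norm.pdf(new_grid / sd) / sd
        else:
            tail = norm.sf((b - grid) / sd) + norm.cdf((-b - grid) / sd)
            probs.append(trapezoid(dens * tail, grid))
            kernel = norm.pdf((new_grid[:, None] - grid[None, :]) / sd) / sd
            new_dens = trapezoid(kernel * dens[None, :], grid, axis=1)
        grid, dens, t_prev = new_grid, new_dens, tj
    return np.array(probs)


t = np.arange(1, 8) / 7                                   # assumption: equally spaced looks
c_interim = [norm.isf(0.0001 / 2)] * 3 + [norm.isf(0.001 / 2)] * 3
c7 = brentq(lambda x: crossing_probs(t, c_interim + [x]).sum() - 0.05, 1.5, 3.0)
p_final = 2 * norm.sf(c7)                                  # nominal two-sided p at last look
print(round(c7, 3), round(p_final, 4))
```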


Comments on HP Boundaries

• Developed as an ad-hoc method to enable interim monitoring when there is no serious intention to stop early

• Final p-value depends on the number and spacing of interim looks. It must be re-calculated if these design parameters change


Interim Monitoring of CHARM

Efficacy results at each DMC meeting

Date      Deaths   HP Boundary (p)   P-Value   Hazard Ratio
8/9/99    12       0.0001            0.3       0.55
3/27/00   199      0.0001            0.0007    0.618
7/27/00   331      0.0001            0.0002    0.664
3/1/01    599      0.001             0.0006    0.755
8/9/01    861      0.001             0.001     0.799
2/22/02   1187     0.001             0.009     0.859
8/1/02    1438     0.001             0.015     0.880
3/31/03   1831     0.001             0.055     0.914



Why Didn’t They Stop at Look 4?

• Secondary endpoints (CV death and CHF hospitalization) were still awaiting adjudication

• Very short average length of follow-up

• No previous trial had shown evidence of benefit from candesartan

• Results did not appear strong enough to influence clinical practice


2. Asymmetric Efficacy and Safety Boundaries: The CRASH Trial

Large international multicenter trial to determine the efficacy and safety of administering intravenous corticosteroids to subjects with significant head injury (Lancet, vol 364, 2004)

• Endpoint is death within 14 days of randomization

• Randomize subjects with Glasgow Coma Score ≤ 14 to placebo or corticosteroids

• Placebo arm 14-day mortality estimated to be 15%

• Design for 90% power to detect an absolute 2% drop in 14-day mortality (from 15% to 13%) with a two-sided test conducted at level α = 0.05

• The risk-benefit ratio is unclear. Corticosteroids are believed to be beneficial, but evidence from a meta-analysis suggests the possibility of harm


Single-Look Design

12640 patients required to achieve 90% power
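One common way to reproduce this figure is the normal-approximation formula for comparing two proportions with unpooled variance, $n = (z_{\alpha/2} + z_\beta)^2 [p_c(1-p_c) + p_t(1-p_t)] / (p_c - p_t)^2$ per arm. The slide does not say which formula was used; with 15% vs 13% mortality this sketch gives 6321 per arm (12642 in total), within a few patients of 12640.

```python
from math import ceil

from scipy.stats import norm


def n_per_arm(p_control, p_treat, alpha=0.05, power=0.90):
    """Per-arm sample size for a two-sided level-alpha comparison of two proportions
    (normal approximation, unpooled variance)."""
    z_a = norm.isf(alpha / 2)
    z_b = norm.isf(1 - power)
    var = p_control * (1 - p_control) + p_treat * (1 - p_treat)
    return ceil((z_a + z_b) ** 2 * var / (p_control - p_treat) ** 2)


n = n_per_arm(0.15, 0.13)
total = 2 * n
print(n, total)
```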


Drawback of the Single-LookDesign

• Very large sample size commitment with no possibility of early termination for benefit, harm or futility

• What if the corticosteroids are actually beneficial? Do we really have to randomize 6320 patients to placebo before we know for sure?

• What if the meta-analysis results are correct and corticosteroids are actually harmful? In that case we will have randomized 6320 patients to a treatment that is worse than placebo


Group Sequential Design

• Monitor the interim data

• Stop the trial early if evidence of benefit or harm emerges


Using East for the Design


Evaluate Properties by Simulation


Interim Monitoring of CRASH

• Recruitment began in April 1999. The DMC met twice. The efficacy results at the two meetings are tabulated below along with the final data (δ̂ is the difference in 14-day mortality, corticosteroid minus placebo):

DMC Meeting   Corticosteroid             Placebo                   Statistics
              Deaths        Subjects     Deaths        Subjects    δ̂       se(δ̂)    Z
June 2003     627 (20.2%)   3102         562 (18.2%)   3091        0.020    0.010     2.0
May 2004      902 (20.6%)   4377         776 (17.9%)   4334        0.027    0.0084    3.2
Final Data    1052 (21.1%)  4985         893 (17.9%)   4979        0.032    0.0079    4.0

• The safety boundary was crossed at the second look and the DMC stopped the trial, declaring that use of corticosteroids was unsafe

• The final analysis confirmed the conclusions of the DMC
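The δ̂, se(δ̂) and Z columns can be recomputed from the death counts. This sketch assumes the unpooled standard error of a difference in proportions, which reproduces the tabulated values to the precision shown; positive Z indicates higher mortality on corticosteroids, the direction monitored by the safety boundary.

```python
import numpy as np

# (deaths, subjects) for corticosteroid and placebo at each analysis, from the table
looks = {
    "June 2003": ((627, 3102), (562, 3091)),
    "May 2004": ((902, 4377), (776, 4334)),
    "Final Data": ((1052, 4985), (893, 4979)),
}


def risk_difference_z(treated, control):
    """Mortality difference (treated minus control), its unpooled s.e., and Z."""
    (d1, n1), (d0, n0) = treated, control
    p1, p0 = d1 / n1, d0 / n0
    delta = p1 - p0
    se = np.sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
    return delta, se, delta / se


results = {k: risk_difference_z(*v) for k, v in looks.items()}
for k, (delta, se, z) in results.items():
    print(f"{k:10s} delta={delta:.3f} se={se:.4f} Z={z:.2f}")
```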


Tracking the Path of the Test Statistic


3. Futility Boundaries: The COMET Trial

• Dornase alpha versus placebo for patients with a hospitalized exacerbation of chronic bronchitis

• Primary endpoint is 90-day all-cause mortality

• Encouraging results from a 244-patient pilot (p = 0.002)

• Investigators plan a 3-look, 5600-patient trial

• Provides 90% power to detect a 20% drop in 90-day mortality (15% to 12%) with a two-sided level-0.05 test


Trial with No Futility Boundary


Options for Futility Boundaries

1. Low conditional power

• Somewhat arbitrary. How low?

• Impact on overall power must be evaluated

2. Lower confidence bound rules out clinical benefit

• Has same drawbacks as conditional power

3. Formal futility boundary

• Overall power is preserved via β-spending function

• Boundary can be expressed in terms of conditional power


Benefit of Formal Futility Boundaries

• Stopping a trial for futility (as opposed to safety) is a difficult recommendation for the DMC to make, and a painful decision for the sponsor

• As a result, many trials continue to the end, consuming resources that could have been better utilized on other compounds

• The presence of a formal futility boundary whose operating characteristics have been examined ahead of time can encourage more aggressive early stopping for futility


COMET with Futility Boundary


Implications of Adding a Futility Boundary

• A larger sample size (5699 patients) is needed for the same power

• Lower expected sample size whether the risk reduction is 20% or 0%

• The two boundaries meet at $l_3 = u_3 = -1.959$, an easier hurdle than the corresponding boundary $c_3 = -1.993$ for the design with no futility boundary



Boundary Interaction

• Because of the futility boundary, the final efficacy boundary became easier to cross (from -1.993 to -1.959). This is sometimes referred to as buying back the α

• In this case the final efficacy boundary is more favorable even than -1.96, the one-sided efficacy cut-off for a single-look trial

• We shall see, however, that there is a price to be paid for this windfall: the futility boundary is binding.


Why the Futility Boundary Relaxes the Efficacy Criterion

• Multiple looks give you extra opportunities to cross an efficacy boundary under $H_0$

• Therefore you pay a penalty ($|c_3| = 1.993$ in Plan 1) to prevent excess false positives

• But multiple looks also give you extra opportunities to cross the futility boundary under $H_1$

• Therefore you receive a reward ($|l_3| = |u_3| = 1.959$ in Plan 5) to prevent excess false negatives


Warning! This Futility Boundary is Binding

• The sponsor should be aware that taking advantage of this reward (i.e., relaxing the standard for declaring statistical significance at the final look) will make the futility boundary binding

• If you overrule the futility boundary, the type-1 error will be inflated
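The binding/non-binding distinction can be seen numerically. The sketch below uses a hypothetical three-look, one-sided α = 0.025 design (interim efficacy bounds 3.2 and 2.5, futility bounds 0 and 1 on the Z scale, with Z > 0 favouring treatment); these numbers are illustrative and are not the COMET boundaries. It solves for the final efficacy bound with and without the futility stops counted, then computes the type-1 error when the binding design's futility bound is ignored. Under these assumptions the binding design has the lower final bound, and overruling its futility bound pushes the type-1 error above 0.025.

```python
import numpy as np
from scipy.integrate import trapezoid
from scipy.optimize import brentq
from scipy.stats import norm


def efficacy_crossing(t, upper, lower, n_grid=801):
    """One-sided P0(Z crosses upper[j] at some look j); Z > 0 favours treatment.
    Paths with Z_j <= lower[j] stop for futility; -np.inf means no futility stop."""
    total, grid, dens, t_prev = 0.0, None, None, 0.0
    for tj, uj, lj in zip(t, upper, lower):
        sd = np.sqrt(tj - t_prev)
        hi = uj * np.sqrt(tj)                    # efficacy bound on the score scale
        lo = max(lj, -8.0) * np.sqrt(tj)         # truncate "no futility bound" at Z = -8
        new_grid = np.linspace(lo, hi, n_grid)   # continuation region lo < S < hi
        if dens is None:
            total += norm.sf(hi / sd)
            new_dens = norm.pdf(new_grid / sd) / sd
        else:
            total += trapezoid(dens * norm.sf((hi - grid) / sd), grid)
            kernel = norm.pdf((new_grid[:, None] - grid[None, :]) / sd) / sd
            new_dens = trapezoid(kernel * dens[None, :], grid, axis=1)
        grid, dens, t_prev = new_grid, new_dens, tj
    return total


# Hypothetical design: equally spaced looks, one-sided alpha = 0.025
t = [1 / 3, 2 / 3, 1.0]
upper_interim = [3.2, 2.5]
futility = [0.0, 1.0, -np.inf]
no_futility = [-np.inf] * 3
alpha = 0.025


def final_bound(lower):
    return brentq(lambda u: efficacy_crossing(t, upper_interim + [u], lower) - alpha, 1.0, 3.5)


u3_nonbinding = final_bound(no_futility)   # futility stops not credited
u3_binding = final_bound(futility)         # futility stops credited ("buying back" alpha)
# Type-1 error if the binding design is run but its futility bound is overruled
inflated = efficacy_crossing(t, upper_interim + [u3_binding], no_futility)
print(round(u3_nonbinding, 3), round(u3_binding, 3), round(inflated, 4))
```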



Non-Binding Futility Boundary

• To make the futility boundary non-binding, you must leave the efficacy boundary untouched

• Only in that way can you be assured that the type-1 error will be preserved

• This will cost you a slight loss of power since you cannot pull up the efficacy boundary anymore


Plan 3 has a non-binding futility boundary

Notice how the sample size increases when going from Plan 1 to Plan 2 to Plan 3 to compensate for the power loss


Verify Properties by Simulation


Interim Monitoring of COMET

At the DMC meeting in July 1995, the trial was stopped for futility with 197/1866 (10.6%) deaths on dornase alpha and 163/1865 (8.7%) deaths on placebo
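Conditional power, the first futility option listed earlier, can be computed for these interim counts. This is a minimal sketch using the B-value formulation, assuming: Z > 0 favours dornase alpha; an unpooled standard error; an information fraction equal to the fraction of the planned 5600 patients; a final one-sided cut-off of 1.96 with any remaining interim looks ignored; and a drift of $z_{0.025} + z_{0.10}$ as an approximation to the planned alternative. All of these are simplifications, not the DMC's calculation. With the observed trend against dornase alpha (Z ≈ -1.9), conditional power is essentially zero under both the current trend and the design alternative.

```python
import numpy as np
from scipy.stats import norm


def conditional_power(z, t, drift=None, crit=1.96):
    """P(Z(1) >= crit | Z(t) = z) via the B-value B(t) = Z(t) * sqrt(t), whose increment
    B(1) - B(t) ~ N(drift * (1 - t), 1 - t). drift=None uses the current trend B(t) / t."""
    b = z * np.sqrt(t)
    if drift is None:
        drift = b / t
    return norm.sf((crit - b - drift * (1 - t)) / np.sqrt(1 - t))


# COMET interim counts; Z > 0 would favour dornase alpha
d_deaths, d_n, p_deaths, p_n = 197, 1866, 163, 1865
p_d, p_p = d_deaths / d_n, p_deaths / p_n
z = (p_p - p_d) / np.sqrt(p_d * (1 - p_d) / d_n + p_p * (1 - p_p) / p_n)
t = (d_n + p_n) / 5600                              # assumption: fraction of planned patients
design_drift = norm.isf(0.025) + norm.isf(0.10)     # assumption: approx. 90%-power drift
print(round(z, 2), round(t, 3))
print(conditional_power(z, t, drift=design_drift), conditional_power(z, t))
```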


Final Comments on COMET

• In the actual trial there was no futility boundary

• Trial was indeed stopped, but only after much discussion

• That decision still remains a topic for discussion

• Had there been a futility boundary, it would have been crossed by a wide margin and there would have been no further discussion

• What would the DMC have done had the results come out 163/1865 (8.7%) for placebo and 153/1866 (8.2%) for dornase alpha? Without a futility boundary the decision would have been more difficult