Upload
noelle
View
44
Download
0
Embed Size (px)
DESCRIPTION
Optimizing Group Sequential Designs that Allow Changes to the Population Sampled Based on Interim Data. Michael Rosenblum Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Joint Work with Mark J. van der Laan Division of Biostatistics - PowerPoint PPT Presentation
Citation preview
Optimizing Group Sequential Designs that Allow Changes to the
Population Sampled Based on Interim Data
Michael RosenblumDepartment of Biostatistics
Johns Hopkins Bloomberg School of Public Health
Joint Work with Mark J. van der LaanDivision of Biostatistics
University of California, Berkeley
``Death is caused by swallowing small amounts of saliva over a long period of time.” --George Carlin
<<La mort est causé par ingestion de petites quantités de salive sur une longue période de temps.>> --George Carlin
Outline of Talk
• Motivation: Why Change the Population Sampled in Group Sequential Designs?
• Simple Example• Our Method• Our Method Applied to Example• Use Results to Compare Power of
Adaptive vs. Non-Adaptive Designs
Adaptive Clinical Trial DesignsFDA is Interested:
“A large effort has been under way at FDA during the past several years to encourage the development and use of new trial designs, including enrichment designs.”
Adaptive Clinical Trial Designs• Pharmaceutical Companies are Interested:
“An adaptive clinical trial conducted by Merck saved the company $70.8 million compared with what a hypothetical traditionally designed study would have cost…”
Why Use Adaptive Designs?
Benefits:– Can Give More Power to Confirm Effective
Drugs and Determine Subpopulations who Benefit Most
– Can Reduce Cost, Duration, and Number of Subjects of Trials
Designs Must:– Control Probability of False Positive Results
Group Sequential Randomized Trial Designs
• Participants Enrolled over Time• At Interim Points, Can Change Sampling
in Response to Accrued Data.
FDA Draft Guidance on Adaptive Designs
Focus is AW&C (adequate and well-controlled) trials.
Distinguishes well understood vs. less well understood adaptations.
Explains chief concerns: Type I error, bias, interpretability.
FDA Draft Guidance on Adaptive Designs
• Adapt Study Eligibility Criteria Using
Only Pre-randomization data.• Adapt to Maintain Study Power Based
on Blinded Interim Analyses of Aggregate Data (or Based on Data Unrelated to Outcome).
• Adaptations Not Dependent on Within Study, Between-Group Outcome Differences
Well Understood Adaptations:
FDA Draft Guidance on Adaptive Designs
• Group Sequential Methods (i.e. Early
Stopping)
Well Understood Adaptations:
FDA Draft Guidance on Adaptive Designs
• Adaptive Dose Selection• Response-Adaptive Randomization• Sample Size Adaptation Based on
Interim-Effect Size Estimates• Adaptation of Patient Population Based
on Treatment-Effect Estimates• Adaptive Endpoint Selection
Less-Well Understood Adaptations:
FDA Draft Guidance on Adaptive Designs
Adaptation of Patient Population Based on Treatment-Effect Estimates
“These designs are less well understood, pose challenges in avoiding introduction of bias, and generally call for statistical adjustment to avoid increasing the Type I error rate.“
Example
Population: Subjects with depression.
Research Questions: How does a new antidepressant compare to placebo in effect on change in Hamilton Rating Score of Depression (HRSD) after 6 weeks?
Does it differ depending on initial severity of depression?
Prior Data Indicates: Maybe only a treatment benefit for those with severe initial depression.
Mean Drug-Placebo Difference as a Function of Initial Severity
(from meta-analysis of Kirsch et al. 2008)
Some Possible Fixed Designs• Enroll from total population (both those
with moderate initial depression and severe initial depression)
Subpopulation 1
Subpopulation 2
Subpopulation 2
• Enroll only from those with severe initial depression
?21 TT Subpopulation 1
Stage 1 Decision Stage 2
Subpopulation 2 Subpopulation 1
Subpopulation 2
Enrichment Design Recruitment Procedure
Recruit Both Populations
Recruit Only Subpop.1
Recruit Only Subpop.2
Recruit Both Pop.Subpopulation 1
Subpopulation 2
If TreatmentEffect Strong inTotal Pop.
Else, if TreatmentEffect Strongerin Subpop. 1
Else, if TreatmentEffect Strongerin Subpop. 2
Problem and Goals• Problem:
Analyze Group Sequential Designs thatAllow Changes to Population Sampled at Interim Points, Based on Earlier Data (Including Outcomes Data)
• Goals: 1. Make Inferences about Selected Populations2. Control Asymptotic, Worst-case, Familywise Type I Error Rate, Making No Model Assumptions
• Our Contribution: General Method for Reducing Problem to Optimization Problem that Can Be Easy to Solve with Standard Statistical Software
Some Related Work• Adapt Treatments and/or Population Sampled
[Thall, Simon, Ellenberg 1988, Schaid, Wieand, Therneau 1990, Wittes and Brittain 1990, Follman 1997, Bauer and Köhne 1994, Bauer and Kieser 1999, Liu, Proschan, Pledger 2002, Stallard and Todd 2003, Sampson and Sill 2005, Bischoff and Miller 2005, Freidlin and Simon 2005, Jennison and Turnbull, 2003, 2006, 2007, Wang, Hung, O’Neill 2009]
Our Contribution Theorem that Gives Reduction of Problem
of Computing Asymptotic, Worst-Case, Familywise Error Rate to Optimization Problem that Often Can be Solved Numerically with Standard Statistical Software.
Reduces Problem to Computations of
Probability that a Multivariate Normal is in a Set of Regions (that Depend on Design).
Scope of Method
Can be applied to studies with: • Any number of pre-planned treatment
arms• Any number of pre-planned stages• Sequential testing• Adaptation of Randomization Probabilities
Asymptotic, Worst-case familywise Type I Error
• We get sharp bounds on the asymptotic, worst-case, Type I error.
• For sample size n, P’ the set of possible data generating distributions, this is defined as:
€
limsupn→ ∞
[supP∈P'P(Reject at least one true null)]
Example from Earlier• 2-Stage Design, 2 Subpopulations of Interest• Three Null Hypotheses:
H00: Mean Effect in Total (Mixture) Population is ≤ 0.
H01: Mean Effect in Subpopulation 1 (moderate) is ≤ 0.
H02: Mean Effect in Subpopulation 2 (severe) is ≤ 0.
• Stage 1: Recruit Equal Number of Subjects from Each Subpopulation. Get t-statistics:
• Stage 2: if recruit as in Stage 1.
Else, Recruit from severe depressed only.• Test Procedure: if
then reject null for the population enrolled in stage 2.• Our Method Shows Asympt. Worst Case, FWER ≤ 0.05.
€
T10,T11,T12.
€
T11 > T12 or T11 > 0.3
1.65. 22
12
1211
T
TT
Computation of Worst-Case Asymptotic Familywise Error RateConsider Case of
μ1 > 0 (H01 false), and μ2 = 0 (H02 true). Type I Error only when Subpopulation 2 Selected
for Recruitment in Stage 2, ANDFinal Statistic:
By our Theorem, we have uniformly* in μ1 , σ1, σ2:
1.65. 22
12
1211
T
TT
.065.14
2/
4HReject
1
1
1
102
nn
P
Computation of Worst-Case Asymptotic Familywise Error RateWorst-Case Familywise Error Rate for case:
μ1 > 0 (H01 false), and μ2 = 0 (H02 true)
.65.14
2/
4suplim
1
1
1
1
, 11
nnn
.0302.065.12/sup0
.0302.0
Thus, in the Case:
μ1 > 0 (H01 false), and μ2 = 0 (H02 true), we haveAsymptotic, Worst-Case, FWER at most
Computation of Worst-Case Asymptotic Familywise Error RateMore generally, for more complex designs,
need to solve optimization problems of the form:
where X is a multivariate normal random variable with mean vector and covariance matrix that are functions of . , for R and S fixed regions.
Solve by grid search combined with bound on approximation error of grid search.
€
sup(λ1 ,...,λ J )∈R
Pλ1 ,...,λ J(X ∈S),
€
1,...,λ J
Power ComparisonConsider 3 scenarios:a) New antidepressant works only for those
with severe initial depression, and mean benefit modest (1.8 HRSD points)
b) Same as (a) but benefit strong (3 pts.) c) New antidepressant works equally well for
those with moderate and with severe initial depression, and benefit modest.
For each, consider:1. first stage enrolls equally from each subpop2. 75% of first stage enrolled moderate depr.
Power Comparison
Power Comparison
When Our Method Can Be AppliedOur Main Theorem:• Gives Reduction of Problem of Computing Worst-Case,
Asymptotic Familywise Error Rate to
Optimization Problem that Often Can be Solved with Standard Statistical Software
• Relies on:
1. Prespecification of Adaptation Algorithm
2. Centered Statistics from Each Stage Asymptotically Normal, and Independent of Past Stage Data Given Previous Design Decision.
3. Uniform Convergence in (2).
4. Selection and Rejection Regions Simple to Compute.
Assumptions
To Get Uniform Asymptotic Normality of Test Statistics, Conditioned on Prior Design Selection, Need to Assume:
• Variances Bounded Away from Zero• Sample Size at Each Stage Goes to InfinityTo Efficiently Compute Asymptotic FWER:• Decision Regions for Between-Stage
Adaptations and Rejection Regions Easy to Specify and Compute.
• Asymptotic Distribution of Test Statistics Depends on Finite Dimensional Parameter of Data Generating Distribution (e.g. Moments)
Limitations of Adaptation
• In General No Guarantee of Power Gain; May Lead to Loss of Efficiency
• Enrichment Designs Result in Reduced Generalizability of Results
• May Encourage Poorly Planned Designs (with Hope Adaptation will Fix Everything)
Open Problems
Generalizing Our Method to Designs with:• Adaptation of Monitoring Frequency• Adaptation of List of Covariates Collected• Survival Outcomes
Confidence Intervals
Conclusions
General Method for Reducing Problem of Computing Worst-case, Asymptotic, Familywise Error Rate to Optimization Problem that Often Can Be Easy to Solve with Standard Statistical Software.
References
Rosenblum, M. and van der Laan, M.J. Optimizing Randomized Trial Designs to Distinguish which Subpopulations Benefit from Treatment (under review)
http://people.csail.mit.edu/mrosenblum/adaptive_subpop.pdf
Kirsch, I., Deacon, B., Huedo-Medina, T., Scoboria, A., Moore, T. & et al. (2008). Initial severity and antidepressant benefits: A meta-analysis of data submitted to the food and drug administration. PLoS Med 5, e45.doi:10.1371/journal.pmed.0050045.
FDA (2010). Draft Guidance for Industry. Adaptive Design Clinical Trials for Drugs and Biologics. http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM201790.pdf
Comparison to Alternative Analysis Method
Method based on [Bauer and Köhne 1994]:
Use Closed Test Principle [Marcus 1976] to
Deal with Multiple Hypotheses, and Prespecified
Function to Combine Stagewise p-values.
• Extremely Useful, Flexible Method• Facilitates Proving Bounds on FWER• Restricted to Cases where p-values in Each Stage
Dominate U[0,1] under Intersection Null Hypotheses.
Group Sequential Designs We Consider
• K stages (though may stop early)• Population Sampled at Stage m is
Prespecified Function of Data Collected in Past Stages.
• Allow Early Stopping for Efficacy, Safety, or Futility
• Prespecify: Primary Outcome, Null Hypotheses, Test statistics, Rejection Regions
When Our Method Can Be AppliedOur Main Theorem:• Gives Reduction of Problem of Computing Worst-Case,
Asymptotic Familywise Error Rate to
Optimization Problem that Often Can be Solved with Standard Statistical Software
• Relies on:
1. Prespecification of Adaptation Algorithm
2. For Each Possible Design Selection Before a Stage, Resulting Centered Test-Statistics from that Stage Asymptotically Normal (e.g. t-test or log-rank test)
3. Uniform Convergence in (2).
4. Selection and Rejection Regions Simple to Compute.