89
It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of a new drug is ready to start accrual Changes in the way we do phase II trials Adaptive methods for the refinement and evaluation of predictive biomarkers in the pivotal trials in a non-exploratory manner Use of archived tissues in focused “prospective-retrospective” designs based on randomized pivotal trials

It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

Embed Size (px)

Citation preview

Page 1: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

• It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of a new drug is ready to start accrual– Changes in the way we do phase II trials– Adaptive methods for the refinement and evaluation

of predictive biomarkers in the pivotal trials in a non-exploratory manner

– Use of archived tissues in focused “prospective-retrospective” designs based on randomized pivotal trials

Page 2: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

Biomarker Adaptive Threshold Design

Wenyu Jiang, Boris Freidlin & Richard Simon

JNCI 99:1036-43, 2007

Page 3: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

Biomarker Adaptive Threshold Design

• Randomized trial of T vs C

• Have identified a univariate biomarker index B thought to be predictive of patients likely to benefit from T relative to C

• Eligibility not restricted by biomarker

• No threshold for biomarker determined

• Biomarker value scaled to range (0,1)

• Time-to-event data

Page 4: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

Procedure AFallback Procedure

• Compare T vs C for all patients– If results are significant at level .04 claim

broad effectiveness of T– Otherwise proceed as follows

Page 5: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

Procedure A

• Test T vs C restricted to patients with biomarker B > b – Let S(b) be log likelihood ratio statistic

• Repeat for all values of b• Let S* = max{S(b)}• Compute null distribution of S* by permuting

treatment labels• If the data value of S* is significant at 0.01 level,

then claim effectiveness of T for a patient subset• Compute point and interval estimates of the

threshold b

Page 6: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

Sample Size Planning (A)

• Standard broad eligibility trial is sized for 80% power to detect reduction in hazard D at significance level 5%

• Biomarker adaptive threshold design is sized for 80% power to detect same reduction in hazard D at significance level 4% for overall analysis

Page 7: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

Model Hazard reduction

for those

who benefit

Overall

Power

Adaptive

Test

Everyone

benefits

33% .775 .751

50%

benefit

60% .888 .932

25%

benefit

60% .429 .604

Page 8: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

Estimated Power of Broad Eligibility Design (n=386 events) vs Adaptive Design A (n=412 events) 80% power for 30% hazard reduction

Model Broad Eligibility

Design

Biomarker Adaptive

Threshold A40% reduction in 50% of

patients

(22% overall reduction)

.70 .78

60% reduction in 25% of patients

(20% overall reduction)

.65 .91

79% reduction in 10% of patients

(14% overall reduction)

.35 .93

Page 9: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

0

*

*

log ( ) log ( ) ( ) ( )

binary treatment indicator

( , , , ) log partial likelihood

( ) max ( , , , )

b̂=argmax{l(b)}

ˆ ˆb value for bootstrap sample of cases

ˆ empirical distribution o

h t h t I B b I B b

l b

l b l b

b

F

*

*

*

ˆf b

ˆ for b based on percentiles of

ˆ ( ) probability patient with biomarker value B will benefit

from treatment with E rather than C

CI F

F B

Estimation of Threshold

Page 10: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

506 prostate cancer patients were randomly allocated to one of four arms: Placebo and 0.2 mg of diethylstilbestrol (DES) were combined as control arm C

1.0 mg DES, or 5.0 mg DES were combined as E.

The end-point was overall survival (death from any cause).

Covariates: Age: In years

Performance status (pf): Not bed-ridden at all vs other

Tumor size (sz): Size of the primary tumor (cm2)

Index of a combination of tumor stage and histologic grade (sg)

Serum phosphatic acid phosphatase levels (ap)

Page 11: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

Prostate Cancer Data

Covariate # patients with measured covariate

Overall Test

p value

Procedure A

Stage 2

p value

Procedure B

p value

AP 505 .084 .019 .041

SG 494 .110 .025 .050

Page 12: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

Prostate Cancer Data

Covariate # patients with measured covariate

Estimated Threshold

95% CI 80% CI

AP 505 36 (9,170) (25,108)

SG 494 11 (10,13) (11,11)

Page 13: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of
Page 14: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of
Page 15: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

Procedure B

• S(b)=log likelihood ratio statistic for treatment effect in subset of patients with Bb

• T=max{S(0)+R, max{S(b)}}• Compute null distribution of T by permuting

treatment labels• If the data value of T is significant at 0.05 level,

then reject null hypothesis that T is ineffective • Compute point and interval estimates of the

threshold b

Page 16: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

Sample Size Planning (B)

• Estimate power of procedure B relative to standard broad eligibility trial based on Table 1 for the row corresponding to the expected proportion of sensitive patients ( ) and the target hazard ratio for sensitive patients– e.g. =25% and =.4 gives RE=.429/.641=.67

• When B has power 80%, overall test has power 80*.67=53%

• Use formula B.2 to determine the approximate number of events needed for overall test to have power 53% for detecting =.4 limited to =25% of patients

Page 17: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

Example Sample Size Planning for Procedure B

• Design a trial to detect =0.4 (60% reduction) limited to =25% of patients– Relative efficiency from Table 1 .429/.641=.67

• When procedure B has power 80%, standard test has power 80%*.67=53%

• Formula B.2 gives D’=230 events to have 53% power for overall test and thus approximate 80% power for B

• Overall test needs D=472 events for 80% power for detecting the diluted treatment effect

Page 18: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

Events needed to Detect Hazard Ratio With Proportional Hazards

2

1 14log

z zD

Page 19: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

Events (D’) Needed for Overall Test to Detect Hazard Ratio

Limited to Fraction

2' /D D

Page 20: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of
Page 21: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of
Page 22: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

Multiple Biomarker DesignA Generalization of the Biomarker Adaptive Threshold Design

• Have identified K candidate binary classifiers B1 , …, BK thought to be predictive of patients likely to benefit from T relative to C

• RCT comparing new treatment T to control C• Eligibility not restricted by candidate classifiers

• Let the B0 classifier classify all patients positive

Page 23: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

• Test T vs C restricted to patients positive for Bk for k=0,1,…,K – Let S(Bk) be a measure of treatment effect in patients

positive for Bk – Let S* = max{S(Bk)} , k* = argmax{S(Bk)}– S* is the largest treatment effect observed– k* is the marker that identifies the patients where the

largest treatment effect is observed

Page 24: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

• For a global test of significance– Randomly permute the treatment labels and repeat

the process of computing S* for the shuffled data– Repeat this to generate the distribution of S* under

the null hypothesis that there is no treatment effect for any subset of patients

– The statistical significance level is the area in the tail of the null distribution beyond the value of S* obtained for the un-suffled data

– If the data value of S* is significant at 0.05 level, then claim effectiveness of T for patients positive for marker k*

Page 25: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

• Repeating the analysis for bootstrap samples of cases provides– an estimate of the stability of k* (the

indication)

Page 26: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

Adaptive Signature Design An adaptive design for generating and prospectively testing a gene expression

signature for sensitive patients

Boris Freidlin and Richard SimonClinical Cancer Research 11:7872-8, 2005

Page 27: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

Adaptive Signature DesignEnd of Trial Analysis

• Compare E to C for all patients at significance level 0.04– If overall H0 is rejected, then claim

effectiveness of E for eligible patients– Otherwise

Page 28: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

• Otherwise:– Using only the first half of patients accrued during the

trial, develop a binary classifier that predicts the subset of patients most likely to benefit from the new treatment E compared to control C

– Compare E to C for patients accrued in second stage who are predicted responsive to E based on classifier

• Perform test at significance level 0.01

• If H0 is rejected, claim effectiveness of E for subset defined by classifier

Page 29: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

Treatment effect restricted to subset.10% of patients sensitive, 10 sensitivity genes, 10,000 genes, 400

patients.

Test Power

Overall .05 level test 46.7

Overall .04 level test 43.1

Sensitive subset .01 level test(performed only when overall .04 level test is negative)

42.2

Overall adaptive signature design 85.3

Page 30: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

Overall treatment effect, no subset effect.10% of patients sensitive, 10 sensitivity genes, 10,000 genes, 400

patients.

Test Power

Overall .05 level test 74.2

Overall .04 level test 70.9

Sensitive subset .01 level test 1.0

Overall adaptive signature design 70.9

Page 31: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

* * *1log ( ,..., )

1i

i i i iKi

pt t x x

p

True Model

Page 32: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

Classifier Development

• Using data from stage 1 patients, fit all single gene logistic models (j=1,…,M)

• Select genes with interaction significant at level

log ( )i j i j ij j i ijit p t x t x

Page 33: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

Classification of Stage 2 Patients

• For i’th stage 2 patient, selected gene j votes to classify patient as preferentially sensitive to T if

ˆ ˆexp j j ijx R

Page 34: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

Classification of Stage 2 Patients

• Classify i’th stage 2 patient as differentially sensitive to T relative to C if at least G selected genes vote for differential sensitivity of that patient

Page 35: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

Empirical PowerResponse Rate for Control Patients 25%

Response Rate in

Sensitive Subset

Overall .05 Overall .04 Subset .01 Overall Adaptive

98% 49.5 45.4 75.8 85.7

95% 43.0 38.5 63.1 75.0

87% 36.7 31.7 34.5 51.6

80% 31.6 28.4 17.6 38.8

71% 26.0 22.6 6.3 26.3

Page 36: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

Adaptive Signature Design for Clinical Trial of Advanced Prostate Cancer

Richard Simon, D.Sc.Chief, Biometric Research Branch, National Cancer Institute

http://brb.nci.nih.govhttp://brb.nci.nih.gov

Page 37: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

• Cancers of a primary site often represent a heterogeneous group of diverse molecular diseases which vary fundamentally with regard to – the oncogenic mutations that cause them– their responsiveness to specific drugs

Page 38: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

• How can we develop new drugs in a manner more consistent with modern tumor biology and obtain reliable information about what regimens work for what kinds of patients?

Page 39: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

• Developing a drug with a companion test increases complexity and cost of development but should improve chance of success and has substantial benefits for patients and for the economics of medical care

Page 40: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

• Although the randomized clinical trial remains of fundamental importance for predictive genomic medicine, some of the conventional wisdom of how to design and analyze rct’s requires re-examination

• The concept of doing an rct of thousands of patients to answer a single question about average treatment effect for a target population presumed homogeneous with regard to the direction of treatment efficacy in many cases no longer has an adequate scientific basis

Page 41: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

• Predictive biomarkers– Measured before treatment to identify who will

benefit from a particular treatment

Page 42: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

Prospective Co-Development of Drugs and Companion Diagnostics in Ideal Settings

1. Develop a completely specified classifier identifying the patients most likely to benefit from a new drug

Based on biology, pre-clinical data and phase I-II studies

2. Establish analytical validity of the classifier3. Design and analyze a focused clinical trial to evaluate effectiveness of the new treatment and how it relates to the classifier

Page 43: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

• Cancer biology is complex and it is not always possible to have the right single completely defined predictive classifier identified and analytically validated by the time the pivotal trial of a new drug is ready to start accrual– Adaptive methods for the refinement and evaluation

of predictive biomarkers in the pivotal trials in a non-exploratory manner

– Use of archived tissues in focused “prospective-retrospective” designs based on previously conducted randomized pivotal trials

• Simon, Paik, Hayes; JNCI 101:1-7, 2009

Page 44: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

Adaptive Signature Design

Boris Freidlin and Richard SimonClinical Cancer Research 11:7872-8, 2005Clinical Cancer Research 11:7872-8, 2005

Page 45: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

Adaptive Signature DesignEnd of Trial Analysis

• Compare X to C for all patients at significance level 0.01– If overall H0 is rejected, then claim

effectiveness of X for eligible patients– Otherwise

• Compare X to C in adaptively defined subset of patients using threshold of statistical significance 0.04

Page 46: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

• Divide the patients randomly into a training set T and a validation set V. The training set will contain one-third of the patients.

• Using the biomarker information, treatment and outcome for the patients in T, develop a binary classifier that identifies the subset of patients who appear most likely to benefit from the new treatment X compared to control C– f(B1,B2,B3,B4) = log hazard ratio of death for X relative to C as a

function of biomarker values– If f(B1,B2,B3,B4)/ser <c then Classifier(B1,B2,B3,B4)=X– If f(B1,B2,B3,B4)/ser >c then Classifier(B1,B2,B3,B4)=C– Cutpoint c optimized

Page 47: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

• Use the classifier developed in training set T to classify the patients in the validation set V.

• Let VX denote the subset of patients in V who are classified as likely to benefit from X

• Compare survivals of patients who received T to survivals of those who received C for patients accrued in VX

– If the difference in survival is significant at level 0.04, then the new treatment is more effective than the control for patients with biomarker values for which Classifier(B1,B2,B3,B4) =X.

Page 48: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

• This approach can also be used to identify the subset of patients who don’t benefit from X in cases where X is superior to C overall at the 0.01 level. The patients in VC= V – VX are predicted not to benefit from X. Survivals of X vs C can be examined for patients in that subset and a confidence interval for the hazard ratio calculated.

Page 49: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

• This design has improved statistical power for identifying treatments that benefit a subset of patients in molecularly heterogeneous diseases

• It has greater specificity than the standard approach which results in over-treatment of vast numbers of patients with approved drugs that do not benefit them

Page 50: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

Sample Size Planning for Advanced Sample Size Planning for Advanced Prostate Cancer TrialProstate Cancer Trial

• Survival endpoint• Final analysis when there are 700 deaths total

– 90% power for detecting a 25% overall reduction in hazard at two-sided 0.01 significance level (increase in median from 12 months to 9 months)

• Power for evaluating treatment in adaptively determined subset– 157 deaths required for 80% power to detect 37% reduction in hazard at

two-sided 0.04 significance level. – If one-third of patients in the validation set are classifier positive, then to

have 157 deaths in the subset we need 157*3=471 deaths in the validation set. Since the validation set is two-thirds of the total, we require 707 total deaths.

– To have 700 deaths at final analysis, 935 patients will be accrued and followed till the event rate is 75%

Page 51: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

Sample Size PlanningSample Size Planning

• For this example, the sample size is strongly dependent on having high statistical power for detecting relatively modest treatment effect overall and in an adaptively defined subset consisting of only 33% of the patients.

• The number of required patients can be substantially reduced by – Targeting larger treatment effects– Targeting treatment benefits that apply to more than 33% of the patients– Refining the simple interim analysis for futility described for this example

Page 52: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

• Tumor specimen at entry as condition for eligibility• Specimen preserved for later assay• Assays will be performed prior to analysis using

analytically validated tests– Reproducible, robust and accurate for use with

archived tissue– No cut-point required– Additional markers could be included prior to using

specimens

Page 53: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

Interim Futility Analysis

• Interim futility analysis conducted when there are approximately 340 patients who have been followed for 6 months after randomization

• The analysis will use 6-month progression-free survival as intermediate endpoint.

• If difference between X group and C group is not significant at one-sided 0.20 level, then accrual will be terminated

• Power 90% for detecting 12 percentage point increase in proportion free of recurrence at 6 months from baseline of 40%

Page 54: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

Interim Futility Analysis

• Interim futility analysis does not utilize any of the 5% type I error of the study

• Using 6 month PFS as endpoint for interim futility analysis does not assume that PFS is a valid surrogate of survival; only that it is plausible to not expect a survival benefit if there is no PFS benefit

• Using PFS enables trial to be stopped earlier if there is no evidence of benefit for X

• The one-sided 0.20 significance level is used because the overall effect may be weak if the treatment benefits only a 33% subset of the patients.

Page 55: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

If the Markers Were Measured at Randomization

• Analytically validated tests would be required by the start of accrual

• The interim analysis could involve marker-defined subsets of patients

• Restricting accrual based on interim evaluation of marker specific treatment effects could substantially reduce sample size but would introduce additional issues not addressed in the current design

Page 56: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

Key Features

• Trial-wise type I error limited to 0.05– Chance of any false positive conclusion of treatment

benefit limited to 0.05• Randomized treatment assignment• Regulatory endpoint• Sample size sufficient for

– evaluating treatment effect in 33% subset• Biomarkers measured using analytically validated tests• Analysis algorithm pre-defined, and specific analysis

plan defined prior to any assaying of tumors or data analysis

Page 57: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

• This approach is as sound statistically as the conventional one treatment fits all design– It provides strong evidence for evaluating the

new treatment overall and within the classifier positive subset and for evaluating the classifier

– In settings where a single conventional “average effect” trials would be the basis for drug approval, this design should be the basis for approval either overall or for the identified subset.

Page 58: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

• This approach is more science based and consistent with tumor biology than the standard approach of treating thousands of patients with a heterogeneous disease to answer one question of whether the average treatment effect is zero and then treating everyone in a one treatment fits all manner.

Page 59: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

Cross-Validated Adaptive Signature Design

Wenyu Jiang, Boris Freidlin, Richard Simon

Clin Ca Res 16:691-8, 2010

Page 60: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

Cross-Validated Adaptive Signature Design

End of Trial Analysis

• Compare T to C for all patients at significance level overall

– If overall H0 is rejected, then claim effectiveness of T for eligible patients

– Otherwise

Page 61: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

Otherwise

• Partition the full data set into K parts• Form a training set by omitting one of the K

parts. The omitted part is the test set– Using the training set, develop a predictive classifier

of the subset of patients who benefit preferentially from the new treatment T compared to control C using the methods developed for the ASD

– Classify the patients in the test set as either sensitive or not sensitive to T relative to C

• Repeat this procedure K times, leaving out a different part each time– After this is completed, all patients in the full dataset

are classified as sensitive or insensitive

Page 62: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

• Compare T to C for sensitive patients by computing a test statistic S e.g. the difference in response proportions or log-rank statistic (for survival)

• Generate the null distribution of S by permuting the treatment labels and repeating the entire K-fold cross-validation procedure

• Perform test at significance level 0.05 - overall

• If H0 is rejected, claim effectiveness of E for subset defined by classifier– The sensitive subset is determined by developing a

classifier using the full dataset

Page 63: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

80% Response to T in Sensitive Patients25% Response to C otherwise

25% Response to C 10% Patients Sensitive

ASD CV-ASD

Overall 0.05 Test 0.223 0.240

Overall 0.04 Test 0.198 0.209

Sensitive Subset 0.01 Test

0.205 0.661

Overall Power 0.351 0.714

Page 64: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

70% Response to T in Sensitive Patients25% Response to T Otherwise

25% Response to C20% Patients Sensitive

ASD CV-ASD

Overall 0.05 Test 0.486 0.503

Overall 0.04 Test 0.452 0.471

Sensitive Subset 0.01 Test

0.207 0.588

Overall Power 0.525 0.731

Page 65: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

70% Response to T in Sensitive Patients25% Response to T Otherwise

25% Response to C30% Patients Sensitive

ASD CV-ASD

Overall 0.05 Test 0.830 0.838

Overall 0.04 Test 0.794 0.808

Sensitive Subset 0.01 Test

0.306 0.723

Overall Power 0.825 0.918

Page 66: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

35% Response to T 25% Response to C

No Subset EffectASD CV-ASD

Overall 0.05 Test 0.586 0.594

Overall 0.04 Test 0.546 0.554

Sensitive Subset 0.01 Test

0.009 0

Overall Power 0.546 0.554

Page 67: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

25% Response to T 25% Response to C

No Subset EffectASD CV-ASD

Overall 0.05 Test 0.047 0.056

Overall 0.04 Test 0.04 0.048

Sensitive Subset 0.01 Test

0.001 0

Overall Power 0.041 0.048

Page 68: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

Predictive Analysis of Clinical Trials

• Using cross-validation we can evaluate our methods for analysis of clinical trials, including complex subset analysis algorithms, in terms of their effect on improving patient outcome via informing therapeutic decision making

• R. Simon Clinical trials for predictive medicine, Clinical Trials 2010:1-9

Page 69: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

• Define an algorithm for predicting optimal treatment as function of covariate vector x using training dataset D

• For patients with covariate vector x, the algorithm predicts preferred treatment

• R(x | D) = T– Or

• R(x | D) = C

Page 70: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

• At the conclusion of the trial randomly partition the patients into 10 equally sized sets P1 , … , P10

• Let D-i denote the full dataset minus data for patients in Pi

• Using 10-fold complete cross-validation, omit patients in Pi

• Apply the defined algorithm to analyze the trial using only data in D-i

• For each patient j in Pi record the treatment recommendations based on fitting the algorithm to data D-I i.e. Rj=T or Rj=C.

Page 71: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

• Repeat the above for all 10 loops of the cross-validation

• When all 10 loops are completed, all patients have been classified as what their optimal treatment is predicted to be

Page 72: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

Test of Significance for Effectiveness of T vs C Using the New Algorithm

• Let A denote the set of patients for whom treatment T is predicted optimal with the new algorithm i.e. Rj=T

• Compare outcomes for patients in A who actually received T to those in A who actually received C– Let z = standardized log-rank statistic

• Compute statistical significance of z by randomly permuting treatment labels and repeating the entire procedure– Do this 1000 or more times to generate the permutation

null distribution of treatment effect for the patients predicted to be the best candidates for T

Page 73: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

• The significance test based on comparing T vs C for the adaptively defined subset is the basis for demonstrating that T is more effective than C for some patients.– This test may be more powerful than the standard

overall test in cases where the proportion of patients who benefit from T is limited.

• Although there is less certainty about which patients actually benefit, prediction accuracy may be substantially greater than for the standard single null hypothesis test based method have greater specificity for identifying the right patients

Page 74: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

506 prostate cancer patients were randomly allocated to one of four arms: Placebo and 0.2 mg of diethylstilbestrol (DES) were combined as control arm C

1.0 mg DES, or 5.0 mg DES were combined as E.

The end-point was overall survival (death from any cause).

Covariates: Age: In years

Performance status (pf): Not bed-ridden at all vs other

Tumor size (sz): Size of the primary tumor (cm2)

Index of a combination of tumor stage and histologic grade (sg)

Serum phosphatic acid phosphatase levels (ap)

Page 75: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

After removing records with missing observations in any of the covariates, 485 observations remained.

A proportional hazards regression model was developed using patients in both E and C groups. Main effect of treatment, main effect of covariates and treatment by covariate interactions were considered.

log[HR(z,x)]=a z + b’x + z c’x

z = 0,1 treatment indicator (z=0 for control)

x = vector of covariates

log[HR(1,x)] – log[HR(0,x)] = a + c’x

Define classifier C(X) = 1 if a + c’x < c

= 0 otherwise

c was fixed to be the median of the a + c’x values in the training set.

Page 76: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

Figure 1: Overall analysis. The value of the log-rank statistic is 2.9 and the corresponding p-value is 0.09. The new treatment thus shows no benefit overall

at the 0.05 level.

Page 77: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

Figure 2: Cross-validated survival curves for patients predicted to benefit from

the new treatment. log-rank statistic = 10.0, permutation p-value is .002

Page 78: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

Figure 3: Survival curves for cases predicted not to benefit from the new treatment. The value of the log-rank statistic is 0.54.

Page 79: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

Proportional Hazards Model Fitted to Full Dataset

coef p-valueTreatment -2.195 0.12age 0.002 0.85pf(Normal.Activity) -0.260 0.25sz 0.020 0.001sg 0.113 0.004ap 0.002 0.21Treatment*age 0.050 0.003Treatment*pf(Normal.Activity) -0.743 0.026Treatment*sz -0.010 0.26Treatment*sg -0.074 0.19Treatment*ap -0.003 0.11

Page 80: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

• By applying the analysis algorithm to the full RCT dataset D, recommendations are developed for how future patients should be treated; i.e. R(x|D) for all x vectors.

• The stability of the recommendations can be evaluated based on the distribution of R(x|D(b)) for non-parametric bootstrap samples D(b) from the full dataset D.

Page 81: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of
Page 82: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of
Page 83: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of
Page 84: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of
Page 85: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

Characteristics of Patients for Whom Classifications are Stable

< 20% of classifiers favor X >80% of classifiers favor X

Median IQR Median IQR

Age 76 4 68 12

Size 7 8 17 17

Stage 9 2 11 4

Acid Phos 0.6 0.4 1.45 9.42

Perf Status 144 good 31 poor 166 good 2 poor

Page 86: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

Standard Analysis Algorithm

• Test the overall H0

• If you reject H0 then treat all future patients with T, otherwise treat all future patients with C

Page 87: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

Expected K-Year DFS Using Standard Analysis

• If the overall null hypothesis is not rejected– Expected K-Year DFS is the observed K-year

DFS in the control group

• If the overall null hypothesis is rejected– Expected K-Year DFS is the observed K-year

DFS in T group

Page 88: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

Expected K-Year DFS Using New Algorithm

• Let S(T) = observed K-year DFS for patients j for whom Rj=T and who received treatment T– mT such patients

• Let S(C) = observed K-year DFS for patients j for whom Rj=C and who received treatment C– mC such patients

• Expected K-Year DFS using new algorithm • {mT S(T) + mC S(C)}/{mT + mC} • Confidence limits for this estimate can be

obtained by bootstrapping the complete cross-validation procedure

Page 89: It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of

• Hence, alternative methods for analyzing RCT’s can be evaluated in an unbiased manner with regard to their value to patients using the actual RCT data