Biostatistics in cancer RCTs

Biostatistics in Cancer Clinical Trials

Dr. Bhaswat S. Chakraborty

VP, R&D, Cadila Pharmaceuticals Ltd.

Presented at the “Recent Trends in Bio-Medical Biostatistics”,

Gujarat University, Ahmedabad on 24.02.2007

Contents
- Research and Regulations of Cancer Trials
- Pivotal Cancer Trials (Phase III)
  - Efficacy end points
  - Merits and demerits
- Optimum Study Designs
  - Superiority
  - Non-Inferiority and other designs
- Sample Size Considerations
  - Scientific questions
  - Basics of sample size calculation
- Statistical Plan for a Cancer RCT
- Statistical Analysis of Cancer Data
- Tumor Data Analysis – an Example
- Conclusion

From Parkin, D. M. et al. CA Cancer J Clin 2005;55:74-108.

Worldwide Cancer Statistics (All Types)

From Parkin, D. M. et al. CA Cancer J Clin 2005;55:74-108.

Population Based Cancer Registries in India (PBCR)

Cancer Research Today
Research is conducted mainly on
- New drugs
- New combinations
- Radiotherapy
- Surgery
In India, research is done by
- Large pharmaceutical companies
- Co-operative groups, e.g., ICON (Indian Co-operative Oncology Network)
- Regional Cancer Centres and Govt.-sponsored studies
- Academia
In the West, research is usually done by large co-operative groups, in addition to those mentioned for India

What does FDA Look for?
FDA approves a drug application based on substantial evidence of efficacy and safety from “adequate and well-controlled investigations” (21 CFR 314.126):
- A valid comparison to a control
- Quantitative assessment of the drug’s effect
The design of cancer trials intended to support drug approval is therefore very important

Study Design: Approaches
- Randomised Controlled Trials (RCT): the most preferred approach
  - Demonstrating superiority of the new therapy
- Other approaches
  - Single-arm studies (e.g., Phase II), e.g., when many complete responses were observed or when toxicity was minimal or modest
  - Equivalence trials
  - No-treatment or placebo-control studies
  - Isolating drug effect in combinations
  - Studies for radio- and chemotherapy protectants

Randomized Clinical Trials
Gold standard in Phase III
- Single-centre CT
  - Primary and secondary indications
  - Safety profile in patients
  - Pharmacological / toxicological characteristics
- Multi-centre CT
  - Confirmation of the above
  - Effect size
  - Site, care and demographic differences
  - Epidemiological determination
  - Complexity
  - Far superior to meta-analyzed determination of effect

Non-Inferiority Trials

- The new drug must not be less effective than the control by more than a predefined amount, the non-inferiority (NI) margin
- The NI margin cannot be larger than the effect of the control drug in the new study
- A new drug that is inferior by the whole effect of the control drug would have no effect at all
- The NI margin is therefore some fraction (e.g., 50 percent) of the control drug effect
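A minimal sketch of how the margin rule above could be checked numerically, assuming effects are expressed on a scale where larger is better and that non-inferiority is judged from the lower confidence bound of the difference (new minus control). The function names and the numbers in the usage lines are hypothetical and are not taken from the talk.

```python
def non_inferiority_margin(control_effect, fraction=0.5):
    """NI margin as a fraction of the control drug's effect (e.g., 50 percent)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]: the margin cannot exceed the control effect")
    return fraction * control_effect


def is_non_inferior(ci_lower_diff, margin):
    """True if the lower confidence bound of (new - control) stays above -margin."""
    return ci_lower_diff > -margin


# Hypothetical numbers: the control drug beats placebo by 0.20 on the chosen scale.
margin = non_inferiority_margin(control_effect=0.20, fraction=0.5)
print(margin)                          # size of the allowed deficit
print(is_non_inferior(-0.06, margin))  # lower bound inside the margin
print(is_non_inferior(-0.12, margin))  # lower bound beyond the margin
```

Keeping the fraction at or below 1 enforces the rule that the margin cannot be larger than the control effect itself.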

Placebo Control Equality Trials
- No anticancer drug treatment in the control arm is generally unethical
- Sometimes acceptable, e.g., in early-stage cancer when standard practice is to give no treatment
- Add-on design (also for adjuvants): all patients receive standard treatment plus either no additional treatment or the experimental drug
- Placebos are preferred to no-treatment controls because they permit blinding
- Unless toxicity is very low, blinding may not be feasible because of a relatively high rate of recognizable toxicities

Drug or Therapy Combinations
- Use the add-on design: Standard + Placebo versus Standard + Drug X
- Effects seen in early phases of development
- Establish the contribution of a drug to a standard regimen, particularly if the combination is more effective than any of the individual components

What to Measure?
- Time-to-event end points
  - Survival
  - Disease-free survival
  - Progression-free survival
- Objective response rates
  - Complete
  - Partial
  - Stable disease
  - Progressive disease
- Symptom end points
  - Palliation
  - QoL

Cancer Trials – End Points

Endpoint: Survival
  Evidence: Clinical benefit
  Assessment: RCT needed; blinding not essential
  Some advantages: Direct measure of benefit; easily measured; precisely measured
  Some disadvantages: Requires larger and longer studies; potentially affected by crossover therapy; does not capture symptom benefit; includes noncancer deaths

Endpoint: Disease-Free Survival (DFS)
  Evidence: Surrogate for accelerated approval or regular approval*
  Assessment: RCT needed; blinding preferred
  Some advantages: Considered to be clinical benefit by some; needs fewer patients and shorter studies than survival
  Some disadvantages: Not a validated survival surrogate in most settings; subject to assessment bias; various definitions exist

Endpoint: Objective Response Rate (ORR)
  Assessment: Single-arm or randomized studies can be used; blinding preferred in comparative studies
  Some advantages: Can be assessed in single-arm studies
  Some disadvantages: Not a direct measure of benefit; usually reflects drug activity in a minority of patients; data are moderately complex compared to survival

Endpoint: Complete Response (CR)
  Assessment: Single-arm or randomized studies can be used; blinding preferred in comparative studies
  Some advantages: Durable CRs represent obvious benefit in some settings (see text); can be assessed in single-arm studies
  Some disadvantages: Few drugs produce high rates of CR; data are moderately complex compared to survival

Design Concepts

[Figure: axis of difference in clinical efficacy (Є), running from – through 0 to +, with the meaningful difference marked. Regions labelled Inferiority, Non-Inferiority, Equivalence, Equality, Non-Superiority and Superiority.]

Phase III Cancer Trials: Superiority Trials

New Drug (or Regimen) is Compared with a Standard

[Figure: bar chart comparing New and Standard on Survival, DFS and QoL; vertical axis 0 to 90.]

Phase III Cancer Trials: Non-Inferiority or Equivalence Trials

[Figure: bar chart comparing New and Standard on Survival, DFS and QoL; vertical axis 0 to 40.]

Understanding Basics
- μ0 and μA: means under the null and alternate hypotheses
- σ0² and σA²: variances under the null and alternate hypotheses (may be the same)
- N0 and NA: sample sizes in the two groups (may be the same)
- H0 (null hypothesis): μ0 – μA = 0
- HA (alternate hypothesis): μ0 – μA = δ
- Type I error (α): false positive, the probability of rejecting a true H0
- Type II error (β): false negative, the probability of rejecting a true HA (that is, of failing to reject a false H0)
- Power (1 – β): true positive, the probability of accepting a true HA

Basics of Sample Size Calculation

Answer the scientific questions for the Trial size

Understand the distribution and variability of the data

Construct correct Null and Alternate hypotheses

From the hypotheses derive formula for sample size

Also make sure that this size trial has adequate power to establish a true alternate

Five Key Questions
1. What is the main purpose of the trial?
2. What is the principal measure of patient outcome?
3. How will the data be analysed to detect a treatment difference?
4. What type of results does one anticipate with standard treatment?
5. How small a treatment difference is it important to detect and with what degree of certainty?

Answers to all five questions enable us to calculate the sample size and analyze the data with the most appropriate test of hypothesis.

Pocock SJ: Clinical Trials: A Practical Approach Chichester: Wiley; 1983

Start Planning
- Reliable or historical data available?
  - Yes: use conventional methods for analysis
  - No: use bootstrap simulation for sample size
- Normally distributed continuous data? (Summary measure: mean and mean difference)
  - Yes: use parametric methods of analysis, two-sample ‘t’ or ANOVA
  - No: use non-parametric methods of analysis, Mann-Whitney U or Proportional Odds Model
- Effect size:
  \[ \Delta_{normal} = \frac{\mu_T - \mu_C}{\sigma} \]
- Sample size:
  \[ N_{normal} = \frac{2\,(Z_{1-\alpha/2} + Z_{1-\beta})^2}{\Delta_{normal}^2} \]

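Understanding Sample Size Determination

H0: μ0 – μA = 0; HA: μ0 – μA = δ

[Figure: sampling distributions of X0 – XA under H0 (centred at 0) and under HA (centred at δ), each with standard error σ√(2/N). The critical value lies at 0 + Z1-α/2 σ√(2/N) = δ – Z1-β σ√(2/N). Shaded areas: α/2 in each tail under H0, β under HA, and power = 1 – β.]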

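From the previous graph, we have

\[ 0 + Z_{1-\alpha/2}\,\sigma\sqrt{2/N} = \delta - Z_{1-\beta}\,\sigma\sqrt{2/N} \]

Upon simplification, with \( \Delta_{normal} = \delta/\sigma \),

\[ N_{normal} = \frac{2\,(Z_{1-\alpha/2} + Z_{1-\beta})^2}{\Delta_{normal}^2} \]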
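As a worked illustration of the formula, here is a small sketch using only the Python standard library. It assumes the second quantile is Z at 1 – β, as the critical-value equation implies, a two-sided α, equal variances and equal group sizes. The function name and the example numbers (a difference of 5 units with a standard deviation of 10) are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_normal(delta, sigma, alpha=0.05, power=0.80):
    """Patients per group for a two-sided comparison of two means (normal approximation)."""
    z = NormalDist().inv_cdf          # standard normal quantile function
    effect_size = delta / sigma       # standardized difference
    n = 2 * (z(1 - alpha / 2) + z(power)) ** 2 / effect_size ** 2
    return ceil(n)                    # round up to whole patients


# Hypothetical planning values: detect a 5-unit difference when sigma = 10.
print(n_per_group_normal(delta=5.0, sigma=10.0))              # 80% power
print(n_per_group_normal(delta=5.0, sigma=10.0, power=0.90))  # 90% power
```

Halving the detectable difference quadruples the required sample size, because the effect size enters the formula squared.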

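Sample Size: 2-Sample, Parallel Superiority/Non-Inferiority Trial

\[ N \text{ in each arm} = \frac{(z_{\alpha} + z_{\beta})^2 \,\bigl[\,p_1(1 - p_1) + p_2(1 - p_2)\,\bigr]}{(\epsilon - \delta)^2} \]

where ε (Є on the slides) is the difference in clinical efficacy between the two arms and δ is the meaningful difference (margin).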
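The sketch below evaluates this kind of sample size formula for two proportions. It assumes that the two symbols lost from the slide are the one-sided significance quantile z at 1 – α and the margin δ, with ε = p1 – p2 and δ negative for a non-inferiority margin. The function name and the example response rates are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm_proportions(p1, p2, margin, alpha=0.025, power=0.80):
    """Patients per arm for a parallel superiority/non-inferiority trial on proportions."""
    z = NormalDist().inv_cdf
    epsilon = p1 - p2                               # assumed true difference
    variance = p1 * (1 - p1) + p2 * (1 - p2)        # sum of binomial variances
    n = (z(1 - alpha) + z(power)) ** 2 * variance / (epsilon - margin) ** 2
    return ceil(n)


# Hypothetical non-inferiority setting: both arms respond at 65%, margin of -10 points.
print(n_per_arm_proportions(p1=0.65, p2=0.65, margin=-0.10))
```

The denominator is the distance between the assumed true difference and the margin, so a tighter margin or a smaller expected advantage drives the sample size up quickly.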

Power: 2-Sample, Parallel Superiority/Non-Inferiority Trial

Sample and Power for Simulated Tumor Data

[Figure: curves for n = 50, 64, 86 and 110 plotted against expected relative risk (0.3 to 0.8); vertical axis 0 to 1.]

Statistical Plan
- Primary outcome considerations
- Study design
- Sample size calculation
- Randomization
- Statistical consideration in inclusion/exclusion criteria (homogeneity within centre and strata)
- Accrual of patients
- Cleaning of data
- Interim analysis
  - Go/no-go criteria
  - α considerations
- Final analysis
- Final conclusions

Accrual of Patients
- Study of the statistical trends in accrual patterns
  - Seasonal
  - Planned approaches
  - Reasons for drop-outs and loss to follow-up
  - Motivational factors
- Monitoring of recruitment progress and strategies
  - Frequency
  - Parameters
  - Duration
- Understanding natural history and non-cancer, non-intervention deaths
- Changes in accrual after interim analysis

Randomisation
- Generation of randomisation scheme according to
  - Centre
  - Block
  - Strata
  - Patient
- Investigational product to be given
- Measures of ensuring non-bias
- Allocations
- What should go on the labels
  - Primary, secondary, tertiary packaging
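One common way to generate a scheme by centre, block and stratum is permuted-block randomisation. The sketch below illustrates that technique and is not necessarily the scheme used by the author. The arm labels, block size, centre and stratum names, and seeds are hypothetical.

```python
import random


def permuted_block_schedule(n_blocks, block_size=4, arms=("New", "Standard"), seed=None):
    """Allocation list built from randomly permuted blocks, balanced within each block."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    schedule = []
    for _ in range(n_blocks):
        block = list(arms) * (block_size // len(arms))  # equal numbers per arm
        rng.shuffle(block)                              # random order within the block
        schedule.extend(block)
    return schedule


def stratified_schedules(centres, strata, n_blocks, block_size=4, seed=0):
    """One independent permuted-block schedule per (centre, stratum) combination."""
    rng = random.Random(seed)
    return {
        (centre, stratum): permuted_block_schedule(
            n_blocks, block_size, seed=rng.randrange(2**32)
        )
        for centre in centres
        for stratum in strata
    }


# Hypothetical trial with two centres and two disease-stage strata.
schedules = stratified_schedules(["Centre A", "Centre B"], ["Stage II", "Stage III"], n_blocks=2)
for key, allocation in schedules.items():
    print(key, allocation)
```

Fixing the seed makes the schedule reproducible for audit, while the list itself would normally be held away from investigators to protect allocation concealment.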

Blinding
- Often difficult in oncology trials
- Test and control are of different characteristics
  - Different routes of administration
  - Different schedules
- New low-toxicity oral treatments are relatively easy to blind
- In other cases the investigator evaluating the end point must be different from the one administering the drugs

Data Capture
- Manual or electronic
- CRF is the main source of raw data capture
- Data must be quality assured: integrity, accountability, traceability
- Data must be validated
- All production and/or quality system software, purchased or developed in-house, should document its intended use and the information against which testing results and other evidence can be compared, to show that the software is validated for its intended use

Data Cleaning & Locking
- Data are cleaned based on a good plan for interim or final analysis, e.g.,
  - One hundred percent of the data are quality checked and assured
  - Eligibility criteria for data selection
  - Correction and editing
  - Double data entry or other methods of data integrity
- Data will be locked after cleaning the data and resolving all the queries
  - SOP for data locking
  - No change after locking
- Only locked data are used as input into the data analysis program

Interim Analysis of Data

[Figure: nominal p (0 to 0.05) plotted against number of looks (1 to 4).]

Interim Analysis of Data

[Figure: four panels (2 Looks, 3 Looks, 4 Looks, 5 Looks), each plotting nominal P value (0 to 0.05) against look number. Legend: pocock (Pocock), ob+fle (O'Brien-Fleming), fle+har+ob (Fleming-Harrington-O'Brien).]

How many times can you look into the data?

The Type I error at the kth test is NOT the same as the nominal p value for the kth test
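The point can be illustrated with a small Monte Carlo sketch: under a true null hypothesis, testing at the same nominal level at every look inflates the overall Type I error. The simulation assumes equally spaced looks on normally distributed data with known variance. The function name, number of simulations and seed are hypothetical choices.

```python
import random
from math import sqrt
from statistics import NormalDist


def overall_type1_error(n_looks, nominal_alpha=0.05, n_sims=20000, seed=1):
    """Share of null trials that cross the two-sided boundary at any of n_looks analyses."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - nominal_alpha / 2)
    rejections = 0
    for _ in range(n_sims):
        cumulative = 0.0
        for look in range(1, n_looks + 1):
            cumulative += rng.gauss(0.0, 1.0)   # information from one more equal-sized stage
            z_stat = cumulative / sqrt(look)    # standardized statistic on accumulated data
            if abs(z_stat) > z_crit:
                rejections += 1                 # trial stopped early for "efficacy"
                break
    return rejections / n_sims


for looks in (1, 2, 3, 4, 5):
    print(looks, round(overall_type1_error(looks), 3))

# A constant nominal level of about 0.0158 is the value usually tabulated for
# Pocock's design with five looks; it should bring the overall error back near 0.05.
print(round(overall_type1_error(5, nominal_alpha=0.0158), 3))
```

Group-sequential boundaries such as Pocock or O'Brien-Fleming exist precisely to choose the nominal level at each look so that the overall error stays at the planned α.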

Considerations for IA
- Stopping rules for significant efficacy
- Stopping rules for futility
- Measures taken to minimize bias
- A procedure/method for preparation of data for analysis
- Data have to be centrally pooled, cleaned and locked
- Data analysis: blinded or unblinded?
- To whom will the interim results be submitted?
  - DSMB
  - Expert Steering Group
- What is the scope of recommendations from IA results?
  - Safety? Efficacy? Both? Futility? Sample size readjustment for borderline results?

Final Analysis and Conclusion
- Clinically meaningful margins must be well defined prospectively in control trials
  - Superiority and non-inferiority margins must not be confused
- Two- or one-sidedness of α should also be prospectively defined
- Power must be adequate
- Variance must be analysed using the right model
- Strategy for dealing with multiple end points must be prespecified
  - Too many end points or tests will increase the false positive (α) error
- Sometimes (e.g., in equality trials) statistically significant results may not be medically significant
- Data censoring or skewed data, e.g., time-to-event data

Intent-to-Treat Principle
- All randomized patients
- Exclusions on prespecified baseline criteria permissible, also known as Modified Intent-to-Treat
- Confusion regarding the intent-to-treat population: define and agree upon it in advance, based upon the desired indication
- Advantages:
  - Comparison protected by randomization
  - Guards against bias when dropping out is related to outcome
  - Can be interpreted as a comparison of two strategies
  - Failure to take drug is informative
  - Reflects the way treatments will perform in the population
- Concerns: “Difference detecting ability”

Per Protocol Analyses
- Focuses on the outcome data
- Addresses what happens to patients who remain on therapy
- Typically excludes patients with missing or problematic data
- Statistical concerns:
  - Selection bias
  - Bias difficult to assess

Intent to Treat & Per Protocol Analyses
- Both types of analyses are important for approval
- Results should be logically consistent
- Design protocol and monitor trial to minimize exclusions
- Substantial missing data and poor drug compliance weaken the trial’s ability to demonstrate efficacy

Missing Data
- Protocol should specify the preferred method for dealing with a missing primary endpoint
- ITT
  - e.g., treat missing as failures
  - e.g., assign outcome based on blinded case-by-case review
- Per Protocol
  - e.g., exclusion of patients with missing endpoint
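To make the two conventions concrete, the sketch below computes a response rate once with missing endpoints counted as failures (the ITT example) and once with those patients excluded (the per protocol example). The function name and the ten outcomes are hypothetical.

```python
def response_rates(outcomes):
    """Return (ITT rate, per protocol rate) for a list of True/False/None outcomes.

    True = response, False = failure, None = missing primary endpoint.
    """
    n_randomised = len(outcomes)
    responders = sum(1 for o in outcomes if o is True)
    observed = sum(1 for o in outcomes if o is not None)
    itt_rate = responders / n_randomised                      # missing treated as failure
    pp_rate = responders / observed if observed else float("nan")  # missing excluded
    return itt_rate, pp_rate


# Hypothetical arm: 4 responders, 4 failures, 2 patients with a missing endpoint.
arm = [True, True, True, True, False, False, False, False, None, None]
itt, pp = response_rates(arm)
print(f"ITT (missing = failure): {itt:.2f}")
print(f"Per protocol (missing excluded): {pp:.2f}")
```

The gap between the two rates grows with the amount of missing data, which is one reason the protocol should fix the handling method in advance.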

Data Safety and Monitoring Board (DSMB)

All trials may not need a DSMB

DSMB Membership Medical Oncologist, Biostatistician and Ethicist

Statistical expertise is a key constituent of a DSMB

Three Critical Issues Risk to participants

Practicality of periodic review of a trial

Scientific validity of the trial

Simulated Tumor Data: An Example

Group 1 (Control)

time  death  group  futime  number  reduction in size
   0      0      1       0       1       1
   1      0      1       1       1       3
   4      0      1       4       2       1
   7      0      1       7       1       1
  10      0      1      10       5       1
   6      1      1      10       4       1
  14      0      1      14       1       1
  18      0      1      18       1       1
   5      1      1      18       1       3
  12      1      1      18       1       1
  23      0      1      23       3       3
  10      1      1      23       1       3
   3      1      1      23       1       1
   3      1      1      23       3       1
   7      1      1      24       2       3
   3      1      1      25       1       1
  26      0      1      26       1       2
   1      1      1      26       8       1
   2      1      1      26       1       4
  25      1      1      28       1       2
  29      0      1      29       1       4
  29      0      1      29       1       2
  29      0      1      29       4       1
  28      1      1      30       1       6
   2      1      1      30       1       5
   3      1      1      30       2       1
  12      1      1      31       1       3
  32      0      1      32       1       2
  34      0      1      34       2       1
  36      0      1      36       2       1
  29      1      1      36       3       1
  37      0      1      37       1       2
   9      1      1      40       4       1
  16      1      1      40       5       1
  41      0      1      41       1       2
   3      1      1      43       1       1
   6      1      1      43       2       6
   3      1      1      44       2       1
   9      1      1      45       1       1
  18      1      1      48       1       1
  49      0      1      49       1       3
  35      1      1      51       3       1
  17      1      1      53       1       7
   3      1      1      53       3       1
  59      0      1      59       1       1
   2      1      1      61       3       2
   5      1      1      64       1       3
   2      1      1      64       2       3

Simulated Tumor Data: An Example

Group 2 (Experimental)

time  death  group  futime  number  reduction in size
   1      0      2       1       1       3
 210      0      2     210       1      10
 180      1      2     180       8       8
 180      0      2     180       1       6
  10      0      2      10       1       1
  13      0      2      13       1       1
 221      1      2     365       2       7
   1      1      2      17       5       3
  18      0      2      18       5       1
 142      1      2     365       1       5
   2      1      2      19       5       1
  76      1      2      21       1       4
  22      0      2      22       1       1
  25      0      2      25       1      10
  25      0      2      25       1       5
  25      0      2      25       1       1
   6      1      2      26       1       1
   6      1      2      27       1       6
   2      1      2      29       2       6
   2      1      2      36       8       8
  38      0      2      38       1       1
  22      1      2      39       1      11
   4      1      2      39       6       5
  24      1      2      40       3       1
  41      0      2      41       3       2
  41      0      2      41       1       1
   1      1      2      43       1       1
  44      0      2      44       1       1
   2      1      2      44       6       1
  45      0      2      45       1       2
   2      1      2      46       1       4
  46      0      2      46       1       4
  49      0      2      49       3       3
  50      0      2      50       1       1
  87      1      2     100       4       6
  54      0      2      54       3       4
  38      1      2      54       2       1
  59      0      2      59       1       3

Simulated Tumor Data: An Example

[Figure: Control Group, survival time (days, axis 0 to 70) by patient number.]

[Figure: Experimental Group, survival time (days, axis 0 to 250) by patient number.]

[Figure: mean time ± standard deviation by group (1 and 2); axis 0 to 120.]

Descriptive Statistics

Variable: time, grouped by: group

Group           N    Mean          95% Conf. (±)   Std.Error     Std.Dev.
1               48   15.77083333   4.241619672     2.108375644   14.60725495
2               38   47.73684211   19.66266124     9.704135603   59.8203094
Entire sample   86   29.89534884   9.420677178     4.738064537   43.93900293

Kaplan Meier

[Figure: Kaplan-Meier curves for groups 1 and 2, probability (axis 0 to 1.2) against time (0 to 250), with censored observations marked.]

Log-rank Test (Cox-Mantel)

Group   Events observed   Events expected
1       29                21.09256306
2       18                25.90743694

Chi-square    Degrees of Freedom   P
6.369814034   1                    0.011607777
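The observed and expected event counts can be recomputed with a short log-rank sketch. The (time, death) pairs below are as read from the simulated tumor data tables earlier in the talk. The chi-square here uses the hypergeometric variance, which is one common choice and an assumption on my part; it need not match the variance used by the software behind the slide, so the statistic may differ slightly from the value shown.

```python
from math import sqrt
from statistics import NormalDist

# (time, death) pairs; death = 1 is an event, 0 is censored.
control = [
    (0, 0), (1, 0), (4, 0), (7, 0), (10, 0), (6, 1), (14, 0), (18, 0), (5, 1), (12, 1),
    (23, 0), (10, 1), (3, 1), (3, 1), (7, 1), (3, 1), (26, 0), (1, 1), (2, 1), (25, 1),
    (29, 0), (29, 0), (29, 0), (28, 1), (2, 1), (3, 1), (12, 1), (32, 0), (34, 0), (36, 0),
    (29, 1), (37, 0), (9, 1), (16, 1), (41, 0), (3, 1), (6, 1), (3, 1), (9, 1), (18, 1),
    (49, 0), (35, 1), (17, 1), (3, 1), (59, 0), (2, 1), (5, 1), (2, 1),
]
experimental = [
    (1, 0), (210, 0), (180, 1), (180, 0), (10, 0), (13, 0), (221, 1), (1, 1), (18, 0), (142, 1),
    (2, 1), (76, 1), (22, 0), (25, 0), (25, 0), (25, 0), (6, 1), (6, 1), (2, 1), (2, 1),
    (38, 0), (22, 1), (4, 1), (24, 1), (41, 0), (41, 0), (1, 1), (44, 0), (2, 1), (45, 0),
    (2, 1), (46, 0), (49, 0), (50, 0), (87, 1), (54, 0), (38, 1), (59, 0),
]


def logrank(group1, group2):
    """Two-sample log-rank test: observed and expected events, chi-square (1 df), p value."""
    pooled = group1 + group2
    event_times = sorted({t for t, d in pooled if d == 1})
    obs1 = sum(d for _, d in group1)
    obs2 = sum(d for _, d in group2)
    exp1 = exp2 = var = 0.0
    for t in event_times:
        n1 = sum(1 for u, _ in group1 if u >= t)       # at risk in group 1 just before t
        n2 = sum(1 for u, _ in group2 if u >= t)       # at risk in group 2 just before t
        d = sum(1 for u, e in pooled if u == t and e == 1)  # events at t, both groups
        n = n1 + n2
        exp1 += d * n1 / n
        exp2 += d * n2 / n
        if n > 1:
            var += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)
    chi2 = (obs1 - exp1) ** 2 / var
    p_value = 2 * (1 - NormalDist().cdf(sqrt(chi2)))   # chi-square with 1 df
    return obs1, exp1, obs2, exp2, chi2, p_value


obs1, exp1, obs2, exp2, chi2, p_value = logrank(control, experimental)
print(f"Group 1: observed {obs1}, expected {exp1:.4f}")
print(f"Group 2: observed {obs2}, expected {exp2:.4f}")
print(f"Chi-square {chi2:.3f}, p = {p_value:.4f}")
```

Expected events always sum to the total number of observed events, which gives a quick consistency check on any log-rank table.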

[Figure: “at Mean” curve, probability (axis 0 to 1.2) against time (0 to 250).]

Cox Regression

Equation: Hazard =

Term    Coefficient    95% Conf. (±)   Std.Error     P             Exp(Coef.)
group   -0.823394288   0.667410889     0.340517244   0.015603315   0.438939237
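The derived quantities in this table follow from the coefficient and its standard error alone. The sketch below recomputes the hazard ratio, a 95% confidence interval for it and the Wald p value, assuming the reported P comes from a two-sided Wald test (an assumption that the arithmetic appears to support). Variable names are illustrative.

```python
from math import exp
from statistics import NormalDist

coefficient = -0.823394288   # log hazard ratio for group, from the table
std_error = 0.340517244      # its standard error, from the table

normal = NormalDist()
z_crit = normal.inv_cdf(0.975)                     # about 1.96 for a 95% interval

hazard_ratio = exp(coefficient)                    # Exp(Coef.)
half_width = z_crit * std_error                    # the "95% Conf. (±)" on the log scale
ci_low = exp(coefficient - half_width)
ci_high = exp(coefficient + half_width)

wald_z = coefficient / std_error
p_value = 2 * (1 - normal.cdf(abs(wald_z)))        # two-sided Wald test

print(f"Hazard ratio: {hazard_ratio:.4f}")
print(f"95% CI for hazard ratio: {ci_low:.4f} to {ci_high:.4f}")
print(f"Wald z = {wald_z:.3f}, p = {p_value:.4f}")
```

The interval is computed on the log scale and then exponentiated, so it is not symmetric around the hazard ratio itself.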

Conclusion of Tumor Data
- Kaplan Meier
  - The two survival patterns are different, with medians of 12 and 70 days for the Control and Experimental groups
- Log-Rank Test
  - The p-value of 0.0116 indicates a significantly better survival experience in the Experimental group
- Cox Regression
  - The hazard of death for the Experimental group is estimated to be about 44% of that of the Control group
  - The log hazard coefficient is –0.8234 (hence e^(–0.8234) = 0.4389, which gives the estimated unadjusted hazard ratio for the Experimental group). It means that the expected log hazard for the Experimental group is 0.8234 lower than it is for the Control group
  - The difference in survival time between the Experimental and Control groups is statistically significant (p = 0.0156)

Conclusions

Clinical testing of new Oncology products is very sophisticated and complex

A Statistician’s role in Cancer trials is invaluable

Statistical considerations must be given thorough attention and built in while planning the study design and calculating the sample size

Cancer clinical data are very complex (censored, skewed, often fraught with missing data points); therefore, proper formulation of hypotheses and statistical treatment of the data are required

Prospective RCTs are usually the preferred approach for evaluation of new therapies

Conclusions
- Survival as primary end point is preferred by regulatory agencies
- Randomisation and blinding offer a robust way to remove bias in end-point estimations
- Data must be accurately captured without any bias and analysed by prospectively described methods
- Interim analysis should carefully plan the ‘α spending’ function
- Final analysis should be done carefully, independently and meaningfully (medical as well as scientific)
- Choose a clinically relevant delta
- Design, conduct, and monitor trials to minimize missing data and poor compliance to drug
- Analysis
  - Both intent-to-treat and per protocol analyses should be conducted
  - Sensitivity analyses
- There are many outstanding statistical issues in cancer trials that need to be discussed and solved

Acknowledgements

Dr. Nikunj Patel Dr. Sumit Goyal Dr. Manish Harsh Dr. Nilesh Patel Ms. Darshini Shah

Thank You Very Much
