Statistical Methods for Dose-Finding Experiments (Chevret/Statistical Methods for Dose-Finding Experiments) || Dose-Finding Based on Multiple Ordinal Toxicities in Phase I Oncology


12

Dose-finding based on multiple ordinal toxicities in phase I oncology trials

B. Nebiyou Bekele and Peter F. Thall
M. D. Anderson Cancer Center, Department of Biostatistics and Applied Mathematics, The University of Texas, Houston, Texas, USA

12.1 Introduction

This chapter describes a method for designing a phase I trial of presurgical gemcitabine with external beam radiation (EBR) for patients with soft tissue sarcoma. We planned the trial with a team of three oncologists who share responsibility for trial conduct. Each patient receives a fixed dose of 50 cGy EBR and one of 10 doses of gemcitabine, 100, 200, ... or 1000 mg/m². In most oncology chemotherapy trials, the patient is at risk of several different types of toxicity, each occurring at several possible severity levels, or ‘grades’. The toxicities used as a basis for dose-finding in the sarcoma trial are summarized in Table 12.1.

The ‘severity weight’ listed beside each grade of each toxicity will play a central role in all that follows. Because toxicity evaluation may take up to six weeks, to facilitate trial conduct the cohort size is allowed to vary between three and four. If the first three patients in a cohort have had all of their toxicities evaluated before the next is accrued, then that cohort is considered complete, the next dose is chosen and treatment of the next cohort at that dose is begun. The safety constraint is imposed that no untried dose may be skipped when escalating. The planned sample size is 36 patients.

Statistical Methods for Dose-Finding Experiments. Edited by S. Chevret. © 2006 John Wiley & Sons, Ltd. ISBN: 0-470-86123-1

Table 12.1 Toxicities and elicited severity weights used in the sarcoma trial.

    Type of toxicity                  Grade   Severity weight
1   Myelosuppression without fever    3       1.0
                                      4       1.5
    Myelosuppression with fever       3       5.0
                                      4       6.0
2   Dermatitis                        3       2.5
                                      4       6.0
3   Liver                             2       2.0
                                      3       3.0
                                      4       6.0
4   Nausea/vomiting                   3       1.5
                                      4       2.0
5   Fatigue                           3       0.5
                                      4       1.0

The methodology that we developed for dose-finding in the sarcoma trial was motivated by several concerns. The oncologists requested that the dose-finding method account for the fact that the toxicities that they had identified are not equally important and do not occur independently. For example, fatigue and nausea/vomiting are likely to occur together, as are myelosuppression and fever. It was also requested that the method utilize the information that a low-grade toxicity observed at a given dose, while not dose-limiting, is a warning that a higher grade of that toxicity is more likely to occur at a higher dose.

Most phase I oncology trials require multiple toxicities to be monitored. These often include transient conditions such as fatigue, nausea/vomiting, myelosuppression, thrombocytopenia, fever, infection, organ dysfunction and irreversible toxicities such as permanent organ damage or death. In general, the different toxicities do not occur independently. Most phase I protocols define ‘toxicity’ as the occurrence at grade 3 or 4 of several listed toxicities. While it is convenient to reduce each toxicity to a binary variable, typically calling grades 0, 1 or 2 ‘no toxicity’ and grades 3 or 4 ‘toxicity’, this discards useful information. For example, if several patients experience a grade 2 toxicity at dose level k, then a typical method based on the above binary variable would escalate to level k + 1 as if no toxicities had occurred at level k. We will use a probability model that distinguishes between grades 0, 1 and 2, rather than combining them as the event ‘no toxicity’, in order to obtain a more reliable basis for predicting the jump from grade 2 to grade 3 or higher as the dose is increased from k to k + 1. Moreover, if each type of toxicity is defined as a binary variable, defining ‘toxicity’ as the maximum of these indicators implicitly assumes that the different toxicities are equally important. For example, this definition does not distinguish between a patient with grade 3 fatigue and a patient who has suffered complete kidney failure.

To develop a dose-finding method for the sarcoma trial addressing all of these issues, we characterized patient outcome as a vector of correlated, ordinal-valued toxicities by applying the multivariate ordinal probit regression model of Chen and Dey [1], extended to allow the different toxicities to have different numbers of severity levels. To address the oncologists’ concern that different toxicities may not be equally important, we elicited numerical weights to characterize the clinical importance of each severity level of each type of toxicity. We defined a patient’s total toxicity burden to be the sum of the weights of all toxicities experienced by that patient. Given the elicited weights, we identified a target total toxicity burden by constructing a set of hypothetical dose-toxicity scenarios and asking the physicians, in each scenario, whether they would escalate, repeat the same dose, or de-escalate for the next cohort. Our method assigns each cohort the dose having the posterior mean total toxicity burden closest to the target.

We developed the methodology during several meetings with the oncologists. Initially, they specified five toxicities, myelosuppression (M), fever, dermatitis (D), nausea/vomiting (N) and fatigue (F), as binary variables. We defined ‘toxicity’ conventionally as the maximum of these five indicators, and we constructed a continual reassessment method (CRM) design [2] with a target toxicity rate of 30 %. At the second meeting the issues arose that M is positively associated with fever and that M with fever (M+) is a much more severe event than M without fever (M−). This led us to combine M and fever into the five-level ordinal toxicity

\[ M_0 < M_3^- < M_4^- < M_3^+ < M_4^+ , \]

where the grade is denoted by a subscript and $M_0$ denotes no M of grade > 2. This also motivated the physicians to refine the other toxicities, making D, N and F each a trinary variable, and they also added liver toxicity (L), defined as a four-level variable (Table 12.1). We next elicited numerical weights to quantify the clinical importance of each level of each type of toxicity. We repeated this process over the course of several meetings, until the algorithm made decisions that the oncologists considered clinically sensible under all of the scenarios. Additional details are given in reference [3].

12.2 Probability model

For the sarcoma trial, since the decision is which dose to give the next cohort, we required a model characterizing how the probabilities of the severity levels of each type of toxicity vary with dose, while also accounting for association among the toxicities. Since we use computer simulation of the trial design to examine its operating characteristics and calibrate its parameters before actual trial conduct, and this requires the model to be fit thousands of times, computational tractability is also an essential requirement. To obtain a model with all of these properties for the sarcoma trial, we applied the Bayesian multivariate ordinal probit model of Chen and Dey [1]. This belongs to the general class of models, developed by Albert and Chib [4] and Chib and Greenberg [5], that uses a vector of correlated latent Gaussian variables to induce association among discrete outcomes.

Let $\mathbf{Y} = (Y_1, \ldots, Y_J)$ denote the vector of ordinal toxicities. The $j$th type of toxicity, $Y_j$, takes on one of the $C_j + 1$ values $\{y_{j,0}, y_{j,1}, \ldots, y_{j,C_j}\}$, where $y_{j,k}$ is the $k$th level in order of increasing severity for $k = 0, \ldots, C_j$. In the sarcoma trial, $J = 5$. For example, if $Y_j$ is dermatitis then $C_j = 2$, $y_{j,0}$ denotes grade 0, 1 or 2, $y_{j,1}$ denotes grade 3 and $y_{j,2}$ denotes grade 4. Binary $Y_j$ corresponds to $C_j = 1$. We replaced each raw gemcitabine dose $d$ by $x = \log(d/1000)$, and we will refer to $x$ as the ‘dose’. Association among the $Y_j$ values is induced by the vector $\mathbf{Z}^{J \times 1} = (Z_1, \ldots, Z_J)$ of correlated latent variables, assumed to be multivariate normal with $E(Z_j) = \beta_{j,0} + x\beta_{j,1}$ for each $j$, all variances equal to 1 and correlation matrix $\boldsymbol{\Sigma}$. Thus, $E(\mathbf{Z}) = \mathbf{X}\boldsymbol{\beta}$, where $\mathbf{X}^{J \times 2J}$ is the block diagonal matrix with the $j$th block $(1 \;\; x)$ and $\boldsymbol{\beta}^{2J \times 1} = (\beta_{1,0}, \beta_{1,1}, \ldots, \beta_{J,0}, \beta_{J,1})$. The latent variable vector $\mathbf{Z}$ determines the observed vector $\mathbf{Y}$ as follows:

\[ Y_j = y_{j,k} \quad \text{if } \gamma_{j,k} \le Z_j < \gamma_{j,k+1} \quad \text{for } k = 0, 1, \ldots, C_j \text{ and } j = 1, \ldots, J, \]

where the cut-off parameters $\boldsymbol{\gamma}_j^{C_j \times 1} = (\gamma_{j,1}, \ldots, \gamma_{j,C_j})$ satisfy the constraint $-\infty = \gamma_{j,0} < \gamma_{j,1} < \cdots < \gamma_{j,C_j} < \gamma_{j,C_j+1} = +\infty$. Denote $A_{j,k} = (\gamma_{j,k}, \gamma_{j,k+1}]$ and $\boldsymbol{\gamma}^{C_+ \times 1} = (\boldsymbol{\gamma}_1, \ldots, \boldsymbol{\gamma}_J)$, where $C_+ = C_1 + \cdots + C_J$. The variance–covariance matrix $\boldsymbol{\Sigma}$ of $\mathbf{Z}$ must be its correlation matrix to ensure identifiability of the posteriors, which also requires that $\gamma_{j,1} \equiv 0$. Since $\gamma_{j,0} = -\infty$, $\gamma_{j,C_j+1} = +\infty$ and $\gamma_{j,1} = 0$, if $C_j > 1$ there are only $C_j - 1$ random cut-point parameters. Thus, while $\boldsymbol{\gamma}$ has $C_+$ entries, it actually contains only $\sum_{j=1}^{J} 1(C_j > 1)(C_j - 1)$ parameters, where $1(A)$ indicates the event $A$. Denoting the vector of all model parameters by $\boldsymbol{\theta}$, the marginal distribution of $Y_j$ for a patient treated with dose $x$ is

\[ \pi_{j,k}(x, \boldsymbol{\theta}) = \Pr(Y_j = y_{j,k} \mid x, \boldsymbol{\theta}) = \Phi\{\gamma_{j,k+1} - (\beta_{j,0} + \beta_{j,1}x)\} - \Phi\{\gamma_{j,k} - (\beta_{j,0} + \beta_{j,1}x)\}, \tag{12.1} \]
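Equation (12.1) is straightforward to evaluate numerically. The following Python sketch computes the marginal probabilities for a single toxicity. The function name `marginal_probs` and the parameter values in the example are illustrative assumptions, not estimates from the sarcoma trial.

```python
import math
from statistics import NormalDist

def marginal_probs(x, beta0, beta1, cutpoints):
    """Marginal probabilities (pi_{j,0}, ..., pi_{j,C_j}) from equation (12.1).

    cutpoints holds the finite cut-offs (gamma_{j,1}, ..., gamma_{j,C_j}),
    with gamma_{j,1} = 0 for identifiability.
    """
    Phi = NormalDist().cdf
    mean = beta0 + beta1 * x
    # Phi at gamma_{j,0} = -inf, at the finite cut-offs, and at gamma_{j,C_j+1} = +inf.
    cdf = [0.0] + [Phi(g - mean) for g in cutpoints] + [1.0]
    return [cdf[k + 1] - cdf[k] for k in range(len(cdf) - 1)]

# Hypothetical parameter values for a three-level toxicity such as dermatitis (C_j = 2).
beta0, beta1, cutpoints = -1.0, 0.8, [0.0, 1.2]
for d in (200, 500, 800):
    x = math.log(d / 1000)  # dose transformation used in the chapter
    print(d, [round(p, 3) for p in marginal_probs(x, beta0, beta1, cutpoints)])
```

Because the slope is positive, the probability of the lowest severity level falls as the dose rises, which is the monotonicity that the prior on the slopes is designed to guarantee.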

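where $\Phi$ is the standard normal cumulative distribution function (CDF). Denote $\boldsymbol{\pi}_j(x, \boldsymbol{\theta}) = (\pi_{j,1}(x, \boldsymbol{\theta}), \ldots, \pi_{j,C_j}(x, \boldsymbol{\theta}))$ for each $j = 1, \ldots, J$ and $\boldsymbol{\pi}(x, \boldsymbol{\theta}) = (\boldsymbol{\pi}_1(x, \boldsymbol{\theta}), \ldots, \boldsymbol{\pi}_J(x, \boldsymbol{\theta}))$. Let $\phi_{\mathbf{W}}(\cdot \mid \boldsymbol{\mu}, \boldsymbol{\Sigma})$ denote the probability density function (PDF) of a multivariate normal random vector $\mathbf{W}$ with mean vector $\boldsymbol{\mu}$ and variance–covariance matrix $\boldsymbol{\Sigma}$, and write $\mathbf{W} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$. For a given vector $\mathbf{k} = (k_1, \ldots, k_J)$ of toxicity severity levels, the corresponding outcome is $\mathbf{y}(\mathbf{k}) = (y_{1,k_1}, \ldots, y_{J,k_J})$. This would arise from the $J$-dimensional set $\mathbf{A}(\mathbf{k}, \boldsymbol{\gamma}) = A_{1,k_1} \times \cdots \times A_{J,k_J}$ of $\mathbf{Z}$ values. A single patient’s likelihood contribution may be written as

\[ L(\mathbf{Y} \mid \boldsymbol{\gamma}, \boldsymbol{\beta}, \boldsymbol{\Sigma}, x) = \prod_{k_1=0}^{C_1} \cdots \prod_{k_J=0}^{C_J} \left\{ \int_{\mathbf{A}(\mathbf{k}, \boldsymbol{\gamma})} \phi_{\mathbf{Z}}(\mathbf{z} \mid \mathbf{X}\boldsymbol{\beta}, \boldsymbol{\Sigma}) \, d\mathbf{z} \right\}^{1[\mathbf{Y} = \mathbf{y}(\mathbf{k})]} . \tag{12.2} \]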

This shows how $\mathbf{Z}$ induces association among the elements of $\mathbf{Y}$ through $\boldsymbol{\Sigma}$. Let $x_{(i)}$ denote the $i$th patient’s dose and $\mathbf{X}_i$ the corresponding matrix. The likelihood for $n$ patients is obtained by substituting $\mathbf{Y} = \mathbf{Y}_i$, $x = x_{(i)}$ and $\mathbf{X} = \mathbf{X}_i$ in equation (12.2) and taking the product over $i = 1, \ldots, n$.



Denote the $J(J-1)/2$ unique off-diagonal elements of $\boldsymbol{\Sigma}$ by $\boldsymbol{\rho} = (\rho_{1,2}, \rho_{1,3}, \ldots, \rho_{J-1,J})$ and the cut-point parameter vector by $\boldsymbol{\gamma}$, so that the model parameter vector is $\boldsymbol{\theta} = (\boldsymbol{\beta}, \boldsymbol{\gamma}, \boldsymbol{\rho})$. A priori, we assume that $\boldsymbol{\beta} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma}_{\beta})$, where $\boldsymbol{\Sigma}_{\beta}$ denotes the prior variance–covariance matrix of $\boldsymbol{\beta}$, and require $\Pr(\beta_{j,1} > 0) = 1$ for all $j = 1, \ldots, J$. Thus, the prior of $\boldsymbol{\beta}$ is $2J$-variate normal with $J$ entries truncated at 0, but $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}_{\beta}$ correspond to the untruncated $2J$-variate normal. This ensures that $\Pr(Y_j \ge y_{j,k} \mid x) = 1 - \Phi\{\gamma_{j,k} - (\beta_{j,0} + \beta_{j,1}x)\}$ increases with $x$ for each $j$ and $k \ge 1$, so that toxicity severity increases with dose. For each $j$ with $C_j > 1$, we will assume that $\{\gamma_{j,2}, \ldots, \gamma_{j,C_j}\}$ follow independent, uninformative priors on [0, 10], with each $g(\boldsymbol{\gamma}_j) \propto 1$, subject to the constraint $0 < \gamma_{j,2} < \gamma_{j,3} < \cdots < \gamma_{j,C_j}$. The upper limit 10 on the support of the distribution of the $\gamma_{j,k}$ values is chosen for numerical convenience, since the probability mass beyond 10 is negligible. We assume that the elements of $\boldsymbol{\rho}$ are iid (independent and identically distributed) $N(0, 1000)$, truncated to the support $[-1, +1]$, with $\boldsymbol{\Sigma}$ positive definite. A method for eliciting a prior on $\boldsymbol{\beta}$ will be described in Section 12.4.

12.3 Dose-finding algorithm

If doses are to be chosen based on multivariate toxicity data, inevitably some form of dimension reduction must be carried out to obtain a real-valued criterion to use as a basis for deciding whether to escalate, stay at the same dose or de-escalate. The following dose-finding method incorporates medical knowledge into the dimension reduction process. The method requires positive-valued numerical severity weights, characterizing the importance of each level of each type of toxicity, and these must be elicited from the physicians. For each $j = 1, \ldots, J$, denote the elicited weight of toxicity type $j$ at severity level $y_{j,k}$ by $w_{j,k}$, and denote $\mathbf{w}_j = (w_{j,1}, \ldots, w_{j,C_j})$ and $\mathbf{w}^{(C_+ - J) \times 1} = (\mathbf{w}_1, \ldots, \mathbf{w}_J)$, subject to the admissibility requirement $0 = w_{j,0} < w_{j,1} < w_{j,2} < \cdots < w_{j,C_j}$. If the physicians assign the same weights to adjacent severity levels, $w_{j,k} = w_{j,k+1}$, then levels $k$ and $k + 1$ of $Y_j$ should be combined. The numerical domain of the $w_{j,k}$ values is arbitrary since the method is invariant to the weights’ multiplicative scale, and we used the interval 0 to 10 in the sarcoma trial because the physicians were comfortable with this range.

The severity weight of $Y_j$ for a patient given dose $x$ is the random variable $W_j$ taking on the value $w_{j,k}$ with probability $\pi_{j,k}(x, \boldsymbol{\theta})$. This replaces the observed toxicity $Y_j$ with the weight-valued variable $W_j$, by assigning the severity category probabilities of $Y_j$ to the corresponding elicited weights. The patient’s total toxicity burden is $\mathrm{TTB} = \sum_{j=1}^{J} W_j$. The dose-finding algorithm is based on the posterior expected TTB at each dose, which takes the following form:

\[ \psi(\mathbf{w}, x, \text{data}) = E\{E(\mathrm{TTB} \mid x, \boldsymbol{\theta}) \mid \text{data}\} = \sum_{j=1}^{J} \sum_{k=1}^{C_j} w_{j,k} \, E\{\pi_{j,k}(x, \boldsymbol{\theta}) \mid \text{data}\}. \tag{12.3} \]
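In practice the posterior expectation in equation (12.3) would be approximated by averaging over posterior draws of the parameters. The following Python sketch shows that Monte Carlo average, assuming the category probabilities have already been evaluated at each draw. The function name, the data layout and the numerical probabilities are hypothetical; the weights are the dermatitis and fatigue weights from Table 12.1.

```python
def posterior_expected_ttb(weights, prob_draws):
    """Monte Carlo estimate of psi(w, x, data) in equation (12.3).

    weights[j][k-1]       = w_{j,k} for k = 1, ..., C_j
    prob_draws[s][j][k-1] = pi_{j,k}(x, theta_s) for posterior draw theta_s
    """
    total = 0.0
    for draw in prob_draws:
        # E(TTB | x, theta_s) = sum_j sum_k w_{j,k} * pi_{j,k}(x, theta_s)
        total += sum(w * p
                     for w_j, p_j in zip(weights, draw)
                     for w, p in zip(w_j, p_j))
    return total / len(prob_draws)

# Weights for dermatitis (grades 3, 4) and fatigue (grades 3, 4) from Table 12.1.
weights = [[2.5, 6.0], [0.5, 1.0]]
# Two hypothetical posterior draws of the category probabilities at one dose.
prob_draws = [
    [[0.2, 0.1], [0.3, 0.1]],
    [[0.4, 0.1], [0.2, 0.2]],
]
psi = posterior_expected_ttb(weights, prob_draws)
print(round(psi, 3))
```

The level-0 probabilities are omitted because their weight is zero and so they contribute nothing to the sum.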

It is easy to show [3] that $\psi(\mathbf{w}, x, \text{data})$ is increasing in $x$. We use $\psi(\mathbf{w}, x, \text{data})$ as a basis for dose-finding by eliciting a fixed target TTB value, $\psi^*$, from the physicians, and assigning each successive cohort the dose currently having $\psi(\mathbf{w}, x, \text{data})$ closest to $\psi^*$. For simplicity, suppress $j$ and consider one toxicity. If $Y = y_k$ is observed at $x$, then the posterior of $\{\pi_1(x, \boldsymbol{\theta}), \ldots, \pi_C(x, \boldsymbol{\theta})\}$ must shift probability mass towards $\pi_k(x, \boldsymbol{\theta})$. If $y_k$ is a high-level toxicity, i.e. if $k$ is high in the range $1, \ldots, C$, since the $w_k$ values are increasing in $k$, it follows that $\sum_{k=1}^{C} w_k \pi_k(x, \boldsymbol{\theta})$ must increase stochastically; hence it must increase in expectation given the data. By the monotonicity of $\psi(\mathbf{w}, x, \text{data})$ in $x$, this tends to decrease the next selected dose. Similarly, observation of low-level toxicities will on average increase the chosen dose.
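The selection rule, combined with the constraint from Section 12.1 that no untried dose may be skipped when escalating, can be sketched in a few lines of Python. The function name `next_dose`, the tie-breaking towards the lower dose and all of the ψ values except the 3.24 reported for 700 mg/m² in Section 12.5 are assumptions made for illustration.

```python
def next_dose(doses, psi, target, highest_tried):
    """Pick the dose whose posterior expected TTB is closest to the target.

    doses is in ascending order and psi[i] is psi(w, x, data) at doses[i].
    Escalation is capped at one level above the highest dose tried so far.
    Ties are broken towards the lower dose (an assumption of this sketch).
    """
    best = min(range(len(doses)), key=lambda i: abs(psi[i] - target))
    cap = doses.index(highest_tried) + 1  # no untried dose may be skipped
    return doses[min(best, cap, len(doses) - 1)]

doses = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
# Hypothetical increasing psi values; only psi = 3.24 at 700 comes from the text.
psi = [1.0, 1.4, 1.8, 2.1, 2.4, 2.7, 3.24, 3.7, 4.1, 4.5]

print(next_dose(doses, psi, target=3.04, highest_tried=400))
print(next_dose(doses, psi, target=3.04, highest_tried=700))
```

With these inputs the first call returns 500 rather than 700, mirroring the decision made after the first cohort of the sarcoma trial, and the second call returns 700 because that dose has already been tried.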

12.4 Elicitation process

To implement the method, the physicians must specify the cohort size, $c$, sample size, $N$, doses, $\mathbf{d} = (d_1, \ldots, d_K)$, the toxicities to be monitored and their severity levels. Since there will be $N/c$ cohorts available to search among the $K$ doses, it is useful to study several feasible values of $N$ in the computer simulations, and choose $N$ on that basis. The physicians must specify a numerical severity weight for each level of each toxicity, within some given positive-valued numerical range, with 0 corresponding to no toxicity. The method is invariant to the particular numerical range of severity weights, so the main criterion is that the physicians should use a range with which they are comfortable. This establishes $\mathbf{w}$. This process may also lead the physicians to modify the toxicities, and this is a natural part of the process of establishing toxicities and severity weights.

To establish a target TTB given $\mathbf{d}$, $\mathbf{Y}$ and $\mathbf{w}$, ask the physicians to specify several hypothetical $J$-variate toxicity outcomes for $m$ hypothetical cohorts, with the severities of the toxicities varying widely between cohorts, from ‘very toxic’ to ‘not toxic at all’. The number of hypothetical cohorts should be large enough to obtain a reasonable representation of the range of possible toxicities and severities, but not so large that the elicitation process becomes unduly burdensome to the physicians. Toxicities having larger numbers of levels should be included in more cohorts than toxicities having fewer levels, and no single type of toxicity should be the only contributor to all of the ‘very toxic’ cohorts. The statisticians may suggest additional cohorts, but these must make sense clinically to the physicians. Denote the outcomes of the $m$ hypothetical cohorts by $\mathbf{y}^*_1 = (\mathbf{y}^*_{1,1}, \ldots, \mathbf{y}^*_{1,c}), \ldots, \mathbf{y}^*_m = (\mathbf{y}^*_{m,1}, \ldots, \mathbf{y}^*_{m,c})$. For each hypothetical cohort, $r = 1, \ldots, m$, first ask the physicians whether observation of $\mathbf{y}^*_r = (\mathbf{y}^*_{r,1}, \ldots, \mathbf{y}^*_{r,c})$ in the first cohort of the trial would cause them to repeat the same dose ($D_r$ = repeat), escalate to a higher dose ($D_r$ = escalate), or de-escalate to a lower dose ($D_r$ = de-escalate) for the next cohort. Next, ask what dose, $d^*_r = d(\mathbf{y}^*_r)$, they would consider most likely to produce the outcomes $\mathbf{y}^*_r$ of that hypothetical cohort. Let $\mathbf{w}^*_{r,l}$ denote the severity weight vector corresponding to $\mathbf{y}^*_{r,l}$, for $r = 1, \ldots, m$ and $l = 1, \ldots, c$. The mean total toxicity burden of the $r$th hypothetical cohort is

\[ \mathrm{TTB}^*_r = \frac{1}{c} \sum_{l=1}^{c} \sum_{j=1}^{J} w^*_{r,l,j} . \]



Denote the $m$ ordered mean hypothetical TTB values by $\mathrm{TTB}^*_{(1)} \le \cdots \le \mathrm{TTB}^*_{(m)}$, and the corresponding vector of decisions in this order of increasing TTB values by $D_{(1)}, \ldots, D_{(m)}$. An admissible sequence of decisions ordered in this way is one consisting of a string of escalations, followed by a string of repeats, followed by a string of de-escalations. If the $m$ decisions are not admissible then, working with the physicians, one must modify the hypothetical outcomes, elicited decisions, weights or possibly other portions of the underlying structure as appropriate. Once an admissible set of decisions is established, the target TTB is defined to be the mean of the elicited $\mathrm{TTB}^*_r$ values for which the physicians’ decision was to repeat the same dose, formally

\[ \psi^* = \frac{\sum_{r=1}^{m} \mathrm{TTB}^*_r \, 1(D_r = \text{repeat})}{\sum_{r=1}^{m} 1(D_r = \text{repeat})} . \]

Because $\psi^*$ is determined from the physicians’ subjective input, it is analogous to a fixed target toxicity probability specified by the physicians in the simpler case of one binary toxicity.
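The admissibility check and the target calculation can be illustrated with the 16 elicited cohorts listed in Table 12.2. In the Python sketch below, the mean TTB values and decisions are copied from that table; the function names and the handling of tied TTB values (ties are ordered so as to favour admissibility) are assumptions of the sketch.

```python
RANK = {"escalate": 0, "repeat": 1, "de-escalate": 2}

def is_admissible(cohorts):
    """True if, in order of increasing TTB, decisions run escalate -> repeat -> de-escalate."""
    ordered = sorted(cohorts, key=lambda c: (c[0], RANK[c[1]]))
    ranks = [RANK[decision] for _, decision in ordered]
    return all(a <= b for a, b in zip(ranks, ranks[1:]))

def target_ttb(cohorts):
    """Mean of the cohort TTB values for which the elicited decision was 'repeat'."""
    repeats = [ttb for ttb, decision in cohorts if decision == "repeat"]
    return sum(repeats) / len(repeats)

# (mean TTB, elicited decision) for hypothetical cohorts 1-16 of Table 12.2.
cohorts = [
    (3.00, "repeat"), (1.88, "escalate"), (2.00, "escalate"), (4.00, "de-escalate"),
    (2.25, "escalate"), (4.50, "de-escalate"), (1.25, "escalate"), (1.25, "escalate"),
    (3.12, "repeat"), (5.50, "de-escalate"), (2.12, "escalate"), (1.50, "escalate"),
    (5.62, "de-escalate"), (2.38, "escalate"), (4.25, "de-escalate"), (3.00, "repeat"),
]

print(is_admissible(cohorts))
print(round(target_ttb(cohorts), 2))
```

The elicited decisions are admissible, and the computed target agrees with the value 3.04 used in the sarcoma trial.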

Since we assume vague priors on $\boldsymbol{\gamma}$ and $\boldsymbol{\rho}$, only the hyperparameters $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}_{\beta}$ of the prior on $\boldsymbol{\beta}$ must be specified. The doses $d^*_1, \ldots, d^*_m$ obtained as answers to the second question may also be used to construct a prior on $\boldsymbol{\beta}$, as follows. Beginning with a vague $N(\boldsymbol{\mu}^o, \boldsymbol{\Sigma}^o)$ prior on $\boldsymbol{\beta}$, with $\mu^o_{j,0} = 0$, $\mu^o_{j,1} = 1$, $\mathrm{var}^o(\beta_{j,0}) = \mathrm{var}^o(\beta_{j,1}) = 10\,000$ for each $j$, compute the posterior of $\boldsymbol{\beta}$ given the hypothetical data $\{d^*_1, \mathbf{y}^*_1, \ldots, d^*_m, \mathbf{y}^*_m\}$. We modified this distribution by multiplying each variance by $m$, the number of hypothetical cohorts, and setting each off-diagonal element of its covariance matrix equal to 0. This yielded the prior on $\boldsymbol{\beta}$ used at the start of the trial. As a check for internal consistency, one may assume in turn that each hypothetical cohort is the first cohort in the trial. If the algorithm’s decisions are the same as those made by the physicians, then one may proceed; otherwise, the prior or possibly some other aspect of the model or design must be modified, as appropriate, so that the method properly reflects clinical practice.
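The variance inflation step can be written out explicitly. The Python sketch below takes a posterior mean vector and covariance matrix for the regression coefficients, multiplies each variance by m and zeroes the off-diagonal elements. Keeping the posterior mean unchanged is an assumption of this sketch, and the function name and numerical inputs are hypothetical.

```python
def inflate_prior(post_mean, post_cov, m):
    """Build the trial prior from the posterior given the hypothetical cohorts.

    Each variance is multiplied by m and every covariance is set to 0.
    The mean vector is carried over unchanged (an assumption of this sketch).
    """
    n = len(post_mean)
    cov = [[m * post_cov[i][i] if i == j else 0.0 for j in range(n)]
           for i in range(n)]
    return list(post_mean), cov

# Hypothetical posterior summary for one toxicity's (intercept, slope), with m = 16 cohorts.
post_mean = [-0.4, 0.9]
post_cov = [[0.20, 0.05],
            [0.05, 0.40]]
prior_mean, prior_cov = inflate_prior(post_mean, post_cov, m=16)
print(prior_mean)
print(prior_cov)
```

Multiplying by m is a way of discounting the hypothetical cohorts so that, taken together, they carry roughly the information of a single cohort.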

12.5 Application to the sarcoma trial

The oncologists specified toxicity severity weights ranging from 0 (no clinical importance) to 6 (Table 12.1). The elicited toxicity severity weights are illustrated in Figure 12.1. Table 12.2 summarizes the 16 hypothetical cohorts used in the elicitation process for the sarcoma trial, with no toxicities of any type denoted by NT. Each toxicity is subscripted by its grade. For example, the first patient in hypothetical cohort 3, with outcomes denoted by $M_3^- + D_3 + N_3$, experienced grade 3 myelosuppression without fever, grade 3 dermatitis and grade 3 nausea/vomiting. The description of each cohort is followed by the corresponding empirical mean TTB, the elicited decision for the next cohort and the dose considered by the physicians to be most likely to have caused the cohort’s outcomes. Using the elicitation method described in Section 12.4, the three cohorts for which the decision was to repeat the same dose were 1, 9 and 16, giving a target per patient TTB of $\psi^* = (3.00 + 3.12 + 3.00)/3 = 3.04$.



[Figure 12.1: chart titled ‘Elicited Toxicity Severity Weights’, with a weight axis running from 0 to 7 and one entry for each toxicity and grade in Table 12.1, keyed by grade (Grade 2, Grade 3, Grade 4).]

Figure 12.1 Elicited toxicity severity weights identified as clinically important by the investigators (Naus., Nausea; Myelosup, Myelosuppression).

At this writing, 18 patients have been treated and evaluated. Table 12.3 summarizes their outcomes and TTBs. The third cohort included only three patients because the toxicities of all of these three patients were evaluated before a fourth patient was available to be accrued. The first cohort was treated at 400 mg/m². Although $\psi(\mathbf{w}, 700, \text{data}_4) = 3.24$ is closest to the target $\psi^* = 3.04$, because no untried dose may be skipped when escalating, the second cohort received 500 mg/m². Incorporating the second cohort’s data, since $\psi(\mathbf{w}, 600, \text{data}_8) = 2.85$ is closest to 3.04, the third cohort received 600 mg/m². Since $\psi(\mathbf{w}, 700, \text{data}_{11}) = 3.24$, the fourth cohort received 700 mg/m². Since $\psi(\mathbf{w}, 700, \text{data}_{15}) = 3.19$, the trial stayed at 700 mg/m². The last cohort of patients resulted in $\psi(\mathbf{w}, 700, \text{data}_{18}) = 3.14$, resulting in a decision to again stay at 700 mg/m².

Table 12.2 Hypothetical cohorts used to elicit the target TTB for the sarcoma trial. Myelosuppression, dermatitis, liver toxicity, fatigue and nausea/vomiting are denoted by M, D, L, F and N respectively, with the grade represented by a subscript. With myelosuppression, the presence and absence of fever are denoted by superscripted + and −. NT denotes no toxicities of any type and d* is the dose considered by the physicians most likely to have caused the cohort’s outcomes.

Cohort  Outcomes                                              TTB*   Decision        d*
1       M_4^+, D_4, NT, NT                                    3.00   Repeat          400
2       M_4^-, L_3, F_4, N_4                                  1.88   Escalation      200
3       M_3^- + D_3 + N_3, M_3^-, M_3^-, M_3^-                2.00   Escalation      300
4       D_4, D_4, L_2, L_2                                    4.00   De-escalation   600
5       M_3^- + F_3, M_3^- + F_3, L_2 + F_3, N_3              2.25   Escalation      300
6       M_4^+, L_4, D_4, NT                                   4.50   De-escalation   700
7       D_3, D_3, NT, NT                                      1.25   Escalation      100
8       M_3^-, D_3, F_3, F_3                                  1.25   Escalation      200
9       D_3, D_4, L_2, L_2                                    3.12   Repeat          400
10      M_3^+ + N_3, M_3^+ + D_3 + N_3,                       5.50   De-escalation   800
        D_3 + F_3 + N_3, F_3 + N_3
11      M_3^- + D_3 + F_3, F_3, F_2, N_4                      2.12   Escalation      300
12      L_3, L_3, NT, NT                                      1.50   Escalation      200
13      M_3^+ + D_3 + F_4, M_3^+ + D_3 + F_4,                 5.62   De-escalation   900
        M_3^- + F_4, D_3 + F_4
14      M_3^- + N_3, F_4, L_2, D_3 + N_3                      2.38   Escalation      300
15      D_4 + F_4, L_3 + F_4, L_2 + N_4, N_4                  4.25   De-escalation   600
16      M_3^- + F_4, L_2 + F_3, M_3^- + D_3 + N_4, L_2        3.00   Repeat          500

For this trial, conventional dose-finding methods typically would define one binary ‘toxicity’ as the maximum of the indicators 1(myelosuppression grade ≥ 3), 1(dermatitis grade ≥ 3), 1(liver toxicity grade ≥ 3), 1(nausea/vomiting grade ≥ 3), 1(fatigue grade ≥ 3). Thus, for example, a conventional method would consider a patient with grade 2 liver toxicity (TTB = 2.5) to have ‘no toxicity’ and a patient with grade 4 fatigue (TTB = 1) to have ‘toxicity’. Furthermore, conventional methods would not distinguish the latter patient from a patient with grade 4 myelosuppression with fever, grade 4 dermatitis, and grade 4 liver toxicity (TTB = 18). The proposed algorithm based on the TTB with target 3.04 for $\psi(\mathbf{w}, x, \text{data})$ makes more sensible decisions. For example, three of the four patients in the first cohort had grade 3 myelosuppression without fever and one also had grade 3 nausea and grade 3 dermatitis, for TTB values {5, 1, 1, 0} and empirical mean TTB = 1.75. As noted above, $\psi(\mathbf{w}, 700, \text{data}_4) = 3.24$ so the algorithm escalated to 500 mg/m². In contrast, a conventional method would score three toxicities in these four patients and de-escalate to a lower dose.
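The contrast between the two scoring rules can be reproduced for the first cohort. The following Python sketch uses the severity weights of Table 12.1 and the first four patients of Table 12.3; the dictionary layout, the toxicity codes and the function names are illustrative assumptions.

```python
# Severity weights from Table 12.1, keyed by (toxicity code, grade).
WEIGHTS = {
    ("M-", 3): 1.0, ("M-", 4): 1.5,   # myelosuppression without fever
    ("M+", 3): 5.0, ("M+", 4): 6.0,   # myelosuppression with fever
    ("D", 3): 2.5, ("D", 4): 6.0,     # dermatitis
    ("L", 2): 2.0, ("L", 3): 3.0, ("L", 4): 6.0,  # liver
    ("N", 3): 1.5, ("N", 4): 2.0,     # nausea/vomiting
    ("F", 3): 0.5, ("F", 4): 1.0,     # fatigue
}

def total_toxicity_burden(toxicities):
    """Sum of the elicited weights of all toxicities one patient experienced."""
    return sum(WEIGHTS[t] for t in toxicities)

def binary_toxicity(toxicities):
    """Conventional indicator: 1 if any toxicity occurred at grade 3 or higher."""
    return int(any(grade >= 3 for _, grade in toxicities))

# Patients 1-4 of Table 12.3 (first cohort, 400 mg/m2).
first_cohort = [
    [("M-", 3), ("D", 3), ("N", 3)],
    [("M-", 3)],
    [("M-", 3)],
    [],
]

ttbs = [total_toxicity_burden(p) for p in first_cohort]
print(ttbs, sum(ttbs) / len(ttbs))
print(sum(binary_toxicity(p) for p in first_cohort), "of", len(first_cohort))
```

The per-patient burdens and their mean match the values quoted above, while the binary rule counts three of the four patients as having ‘toxicity’.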

Table 12.3 Patient-by-patient illustration of the method in the sarcoma trial. Toxicities for the first 18 patients enrolled in the trial.

Patient  Dose  Myelosuppression   Dermatitis  Liver   Fatigue  Nausea  TTB
1        400   Gr. 3 w/o fever    Gr. 3       None    None     Gr. 3   5.0
2        400   Gr. 3 w/o fever    None        None    None     None    1.0
3        400   Gr. 3 w/o fever    None        None    None     None    1.0
4        400   None               None        None    None     None    0
5        500   None               Gr. 3       None    None     None    2.5
6        500   None               None        None    None     None    0
7        500   Gr. 3 w/o fever    Gr. 3       None    None     None    3.5
8        500   Gr. 4 w/o fever    None        Gr. 2   None     None    3.5
9        600   None               Gr. 3       None    None     None    2.5
10       600   None               None        Gr. 3   None     None    2.0
11       600   None               Gr. 3       None    None     None    2.5
12       700   Gr. 3 w/o fever    None        None    None     None    1.0
13       700   None               None        Gr. 3   None     None    3.0
14       700   None               None        None    Gr. 3    None    0.5
15       700   Gr. 4 w/o fever    Gr. 3       Gr. 2   Gr. 3    None    6.5
16       700   None               None        Gr. 3   None     None    3.0
17       700   Gr. 3 w/o fever    None        Gr. 3   None     Gr. 3   5.5
18       700   None               None        None    None     None    0

12.6 Simulation study and sensitivity analyses

12.6.1 Simulation study design

This section describes a simulation study of the sarcoma trial to assess the method’s average behavior. In order to obtain a manageable set of dose-toxicity scenarios for the simulations, we considered only cases where the target TTB occurred at 200, 500 or 800 mg/m², and we categorized the main source of toxicity as being either those having high severity (HS) weights ($w \ge 5$) or those having low severity (LS) weights ($w \le 2$). The remaining toxicities, grade 3 dermatitis ($w = 2.5$) and grade 3 liver toxicity ($w = 3$), were considered intermediate and were included in either group. For each simulation scenario, the fixed probabilities were $p_{j,1,d}, \ldots, p_{j,C_j,d}$, for $j = 1, \ldots, 5$ and $d = 100, \ldots, 1000$, where $p_{j,k,d} = \Pr(Y_j = y_{j,k} \mid d)$ and $p_{j,0,d} = 1 - \sum_{k=1}^{C_j} p_{j,k,d}$. These probabilities were chosen nonparametrically to satisfy $\sum_j \sum_k w_{j,k}\, p_{j,k,d} = 3.04$ for $d = 200$ under scenarios 1 and 2, $d = 500$ under scenarios 3 and 4, and $d = 800$ under scenarios 5 and 6. Figure 12.2 summarizes the six scenarios graphically in terms of the TTB as a function of dose.

[Figure 12.2: plot titled ‘Total Toxicity Burden by Dose’, showing total toxicity burden against dose (mg/m², axis from 0 to 1200) with one curve for each of Scenarios 1 to 6 and the target marked.]

Figure 12.2 Total toxicity burden as a function of dose under each of the six dose-toxicity scenarios considered in the simulation study.

Association among the elements of each simulated $(Y_1, \ldots, Y_5)$ was induced by first generating a sample of standard normal random variables $\mathbf{Z}^{5 \times 1}$ with specified correlation matrix, then defining the correlated uniform(0, 1) random variates $\mathbf{U}^{5 \times 1} = (\Phi(Z_1), \ldots, \Phi(Z_5))$ and then, denoting $P_{j,k,d} = \sum_{r=0}^{k} p_{j,r,d}$ for $k = 0, \ldots, C_j$ and $P_{j,-1,d} = 0$, defining $Y_j = y_{j,k}$ if $P_{j,k-1,d} \le U_j < P_{j,k,d}$. The correlations were elicited from the physicians in terms of the underlying toxicities. The trial was simulated 1000 times under each scenario, and each reported value is the average over these replications. We followed the computational framework developed by Albert and Chib [4] for one polytomous outcome, extended by Chib and Greenberg [5] to accommodate correlated binary outcomes and by Chen and Dey [1] to the correlated ordinal case.
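The construction of correlated ordinal outcomes described above can be sketched with the Python standard library alone. In this sketch the two-toxicity setting, the correlation of 0.6, the category probabilities and the function names are all hypothetical, and the hand-written Cholesky factorization stands in for whatever routine was actually used to generate the correlated normal variables.

```python
import math
import random
from bisect import bisect_right
from itertools import accumulate
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF

def cholesky(a):
    """Lower-triangular factor L with L L^T = a, for a positive definite matrix a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / L[j][j]
    return L

def simulate_patient(probs, chol, rng):
    """Draw one vector of severity-level indices (k_1, ..., k_J).

    probs[j] = [p_{j,0,d}, ..., p_{j,C_j,d}] are the fixed probabilities at one dose.
    """
    J = len(probs)
    e = [rng.gauss(0.0, 1.0) for _ in range(J)]
    z = [sum(chol[i][k] * e[k] for k in range(i + 1)) for i in range(J)]  # correlated N(0, 1)
    u = [PHI(z_i) for z_i in z]                                           # correlated uniform(0, 1)
    levels = []
    for p_j, u_j in zip(probs, u):
        cum = list(accumulate(p_j))            # P_{j,0,d}, ..., P_{j,C_j,d}
        k = bisect_right(cum, u_j)             # first k with u_j < P_{j,k,d}
        levels.append(min(k, len(p_j) - 1))    # guard against rounding in the final sum
    return levels

rng = random.Random(2006)
corr = [[1.0, 0.6], [0.6, 1.0]]                 # hypothetical correlation matrix
probs = [[0.6, 0.3, 0.1], [0.7, 0.2, 0.1]]      # hypothetical category probabilities
chol = cholesky(corr)
sample = [simulate_patient(probs, chol, rng) for _ in range(20000)]

freq0 = sum(1 for y in sample if y[0] == 0) / len(sample)
both = sum(1 for y in sample if y[0] > 0 and y[1] > 0) / len(sample)
print(round(freq0, 3), round(both, 3))
```

The marginal frequencies reproduce the specified probabilities, while the positive latent correlation makes joint occurrence of the two toxicities more frequent than it would be under independence.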

For the simulation study, we evaluated the MCMC algorithm’s performance using standard convergence diagnostics. A burn-in of 1000 and a chain of length 30 000, retaining every 15th sample, provided adequate convergence. Additional details are provided in Bekele and Thall (Section 12.6.1) [3].

12.6.2 Simulation results

Table 12.4 summarizes the simulation results. Under scenarios 1 and 2, where the target $\psi^* = 3.04$ is achieved at 200 mg/m², the starting dose of 400 mg/m² is unacceptably toxic. Under scenario 1, most of the toxicity is due to LS events such as $L_2$, F or N. Under scenario 2, most of the toxicity burden is due to HS toxicities such as $M^+$ or $L_4$.



Table 12.4 Simulation results for the sarcoma trial under six dose-toxicity scenarios. (LS = low severity toxicities, HS = high severity toxicities, Psel = % selected, Npats = number of patients treated).

                                        Gemcitabine dose (mg/m²)
Scenario  Main        Statistic   100   200   300   400   500   600   700   800   900   1000
          toxicities

ψ* = 3.04 at 200 mg/m²
1         LS          Psel        1.6   93.7  4.7   0.0   0.0   0.0   0.0   0.0   0.0   0.0
                      Npats       2.2   21.8  5.3   5.6   1.1   0.0   0.0   0.0   0.0   0.0
2         HS          Psel        4.1   92.0  3.8   0.0   0.0   0.0   0.0   0.0   0.0   0.0
                      Npats       4.5   22.0  4.5   4.6   0.4   0.0   0.0   0.0   0.0   0.0

ψ* = 3.04 at 500 mg/m²
3         LS          Psel        0.0   0.0   0.0   5.6   85.4  9.0   0.0   0.0   0.0   0.0
                      Npats       0.0   0.0   0.0   5.6   18.5  9.7   2.1   0.1   0.0   0.0
4         HS          Psel        0.0   0.1   0.0   5.2   86.9  7.8   0.0   0.0   0.0   0.0
                      Npats       0.0   0.0   0.1   6.6   20.6  7.8   1.0   0.0   0.0   0.0

ψ* = 3.04 at 800 mg/m²
5         LS          Psel        0.0   0.0   0.0   0.0   0.0   0.3   10.4  80.7  8.3   0.2
                      Npats       0.0   0.0   0.0   4.0   4.0   4.0   5.6   11.4  5.9   1.0
6         HS          Psel        0.0   0.0   0.0   0.0   0.0   0.1   13.3  80.3  6.2   0.0
                      Npats       0.0   0.0   0.0   4.0   4.0   4.0   5.7   11.7  5.8   0.7



[Figure 12.3: three panels (n = 36 in each), the first titled ‘Target TTB at 200 mg/m²’, plotting the probability of selection and the average number of patients against dose (200 to 1000 mg/m²).]

Figure 12.3 Graphical display of simulation results for the sarcoma trial under scenarios 1, 3 and 5.

Under either of these scenarios, the method chooses the best dose, 200 mg/m², over 90 % of the time, and on average treats 22 (61 %) of the 36 patients at this dose. The algorithm appears to perform well regardless of whether most of the toxicity burden arises from low or high severity toxicities. For scenarios 3 and 4, the target TTB is achieved at 500 mg/m², with most of the TTB due to LS toxicities under scenario 3 and HS toxicities under scenario 4. The method is insensitive to the source of the toxicities, with an 85–87 % correct selection rate and most of the 36 patients treated at or within one level of the selected MTD. For each of scenarios 5 and 6, where $\psi^* = 3.04$ at 800 mg/m², the correct selection rate is about 80 %, slightly lower than the other cases. This is due primarily to the ‘do-not-skip’ rule, which requires that at least one cohort be treated at each dose level when escalating, so that at least 16 of the 36 patients must be treated at doses below 800 mg/m² and few patients are available for evaluation at the higher dose levels. The selection probabilities and sample sizes are illustrated in Figure 12.3.

12.6.3 Sensitivity analyses

To address the method’s robustness to the severity weights, we performed a sensitivity analysis by randomly perturbing the elicited weight vector, $\mathbf{w} = (\mathbf{w}_1, \ldots, \mathbf{w}_J)$, to obtain hypothetical weights. For each type of toxicity, we randomly perturbed the maximum weight, $w_C$, of its most severe level by multiplying it by a uniform [0.5, 1.5] random variable, and then we randomly perturbed the lower weights $w_1 < w_2 < \cdots < w_{C-1}$ while still maintaining their order. The hypothetical $\mathbf{w}^{(h)}$ obtained in this way determines a TTB target, $\psi^{(*,h)}$, and, given fixed outcome probabilities $\{p_{j,k,d}\}$ as defined in Section 12.6.1, $\mathbf{w}^{(h)}$ also determines the best dose, $d^{(*,h)}$, among the 10 doses {100, ..., 1000}, namely the dose having the TTB closest to $\psi^{(*,h)}$. Independently generating 10 000 such $\mathbf{w}^{(h)}$ vectors, the distribution of the corresponding $\psi^{(*,h)}$ values has (2.5, 50, 97.5)th percentiles (1.76, 2.97, 4.67). To examine the distribution of the corresponding $d^{(*,h)}$ values, consider for example scenario 4, where $d = 500$ has the TTB closest to the elicited $\psi^* = 3.04$: here $d^{(*,h)}$ = 400, 500 and 600 mg/m² with probabilities 0.051, 0.875 and 0.071. Thus, despite the fact that $\mathbf{w}^{(h)}$ is obtained as a rather severe perturbation of $\mathbf{w}$, under scenario 4 the targeted dose under $\mathbf{w}^{(h)}$ is nearly certain to be within one dose level of the dose (500 mg/m²) targeted by the elicited $\mathbf{w}$, and 87.5 % of the time the targeted dose is the same. Similar results were obtained under the other scenarios. Thus, the targeted dose appears to be robust to the elicited weights.

To examine the sensitivity of the dose-finding method to $\mathbf{w}$, we simulated the trial under scenario 4 using each of 16 hypothetical weight vectors, $\mathbf{w}^{(h,1)}, \ldots, \mathbf{w}^{(h,16)}$. We chose these to reflect the distribution of $d^{(*,h)}$ noted above, so that the targeted doses were $d^{(*,h,1)} = 400$, $d^{(*,h,2)} = \cdots = d^{(*,h,15)} = 500$ and $d^{(*,h,16)} = 600$. For the $r$th hypothetical weight vector, $\mathbf{w}^{(h,r)}$, denote the percentage absolute deviation of the selected dose, $d_{\mathrm{sel}}$, from $d^{(*,h,r)}$ by $\mathrm{dev}_r = 100\,|d_{\mathrm{sel}} - d^{(*,h,r)}|/d^{(*,h,r)}$. In all 16 cases, $d_{\mathrm{sel}}$ was within one level of $d^{(*,h,r)}$ over 99.9 % of the time. For $d^{(*,h,1)} = 400$, doses (300, 400, 500) were selected (1.7, 60.6, 37.7) % of the time, and on average $\mathrm{dev}_1$ was 10.4 %. In 13 of the 14 cases where $d^{(*,h,r)} = 500$, this target dose was selected between 80.6 % and 86.5 % of the time, and in one case 500 was selected 73.2 % of the time. The mean values of $\mathrm{dev}_2, \ldots, \mathrm{dev}_{15}$ varied from 2.9 % to 5.5 %. In the 16th case, (500, 600, 700) were selected (43, 54, 3) % of the time, and on average $\mathrm{dev}_{16}$ was 12.1 %. Thus, the method appears to be robust to the elicited weights in terms of changes in the targeted dose, correct selection percentage and deviation of the selected dose from the targeted dose.

Using the elicited severity weights, we also examined the method’s sensitivity to cohort size, sample size, starting dose and $\psi^*$. Simulations examining the effects of cohort size and sample size were conducted under scenario 4. For cohort sizes (1, 2, 3, 4, 5), with starting dose 400 mg/m² and sample size 36, the corresponding correct selection percentages were (89, 86, 89, 87, 86). Since the range of these values is well within what would be expected from simulation variation, the method appears to be insensitive to cohort size. For sample sizes (28, 32, 36, 40, 44), with the cohort size fixed at 4 and starting dose 400 mg/m², the correct selection percentages were (82, 84, 87, 88, 89). Thus, the method’s reliability improves with larger sample size. We examined the effect of changing the starting dose from 400 mg/m² to 100 mg/m² under scenarios 2, 4 and 6, where the target TTB is achieved at 200 mg/m², 500 mg/m² and 800 mg/m² respectively. In these cases the correct selection percentages were 92 % when the target is 200 mg/m², 84 % when the target is 500 mg/m² and 42 % when the target is 800 mg/m². The comparatively low value in the last case is as expected, since the ‘do-not-skip’ rule with starting dose 100 mg/m² requires that 28 of the 36 patients be treated at doses below 800 mg/m², which leaves at most two cohorts to treat at the correct dose. If this rule is dropped, then the correct selection percentage in this case is 80 %.

To examine the effect of higher correlation among the toxicities on the correct selection rate, we changed the correlations so that there was high correlation between fatigue and nausea/vomiting (0.60), low correlation between fatigue and dermatitis (0.20) and moderate correlation between fatigue and myelosuppression (0.40). Under scenarios 2, 4 and 6, with this correlation structure the correct selection percentages were 92 % when the target TTB was achieved at 200 mg/m², 86 % at 500 mg/m² and 79 % at 800 mg/m². Since these are nearly identical to the values in Table 12.4 obtained with the original correlation structure, it appears that this degree of association among the toxicities does not alter the method’s behavior, on average.
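The patient counts implied by the ‘do-not-skip’ rule can be checked with a short calculation. The sketch below assumes full cohorts of four patients and dose levels spaced 100 mg/m² apart; the function name `patients_below_target` is ours, introduced only for illustration.

```python
def patients_below_target(start_dose, target_dose, cohort_size=4, step=100):
    """Fewest patients treated below the target dose when one full cohort
    must be treated at every level on the way up ('do-not-skip' rule).

    ASSUMPTION: full cohorts of `cohort_size` and equally spaced levels.
    """
    levels_below = max(0, (target_dose - start_dose) // step)
    return levels_below * cohort_size


TOTAL_PATIENTS = 36
below_from_400 = patients_below_target(400, 800)   # levels 400, 500, 600, 700
below_from_100 = patients_below_target(100, 800)   # levels 100, ..., 700
cohorts_left = (TOTAL_PATIENTS - below_from_100) // 4
```

With a starting dose of 400 mg/m² this gives 16 patients below 800 mg/m², and with a starting dose of 100 mg/m² it gives 28, leaving eight patients (two cohorts) for the target dose, in agreement with the figures quoted in the text.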

We assessed the method’s sensitivity to $\psi^*$ by simulating the trial with target $\psi^*(\Delta) = \psi^* + \Delta$, for $\Delta = \pm 0.25$ and $\pm 0.50$, i.e. $\psi^*(\Delta)$ = 2.54, 2.79, 3.29 and 3.54, under each of scenarios 2, 4 and 6. Defining the ‘best’ dose $d$ among the 10 levels studied as that at which $\sum_j \sum_k w_{j,k}\, p_{j,k,d}$ is closest to $\psi^*(\Delta)$, in each of these cases the best dose is identical to that for which $\Delta = 0$. The percentages of selecting the best dose for $\Delta$ = (−0.50, −0.25, 0, +0.25, +0.50) are (68, 76, 92, 75, 65) under scenario 2, (77, 82, 87, 85, 80) under scenario 4 and (55, 67, 80, 79, 75) under scenario 6. However, the method selects a dose within one level, i.e. within ± 100 mg/m², of the best dose over 99.5 % of the time in all of these cases.
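The definition of the ‘best’ dose under a shifted target can be illustrated with a small calculation. All numbers below (the weights, the probability ceilings and the dose-response scale) are hypothetical and chosen only so that the expected TTB rises steeply near 500 mg/m²; they are not the chapter’s scenario probabilities, and the names `probs_at`, `expected_ttb` and `best_dose` are ours.

```python
DOSES = list(range(100, 1001, 100))

# HYPOTHETICAL inputs: two toxicities, two severity levels each.
WEIGHTS = [[1.0, 1.5], [5.0, 6.0]]            # w[j][k]
CEILING = [[0.2, 0.6], [0.2, 0.7]]            # limiting probabilities
SCALE = [0.02, 0.06, 0.12, 0.25, 0.48,        # made-up dose-response curve,
         0.70, 0.84, 0.92, 0.96, 0.98]        # one entry per dose level


def probs_at(dose):
    """Hypothetical p[j][k] at the given dose."""
    s = SCALE[DOSES.index(dose)]
    return [[c * s for c in row] for row in CEILING]


def expected_ttb(weights, probs):
    """Sum over toxicities j and levels k of w[j][k] * p[j][k]."""
    return sum(w * p
               for w_row, p_row in zip(weights, probs)
               for w, p in zip(w_row, p_row))


def best_dose(target):
    """Dose whose expected TTB is closest to the target."""
    return min(DOSES,
               key=lambda d: abs(expected_ttb(WEIGHTS, probs_at(d)) - target))


PSI_STAR = 3.04
best = {delta: best_dose(PSI_STAR + delta)
        for delta in (-0.50, -0.25, 0.0, 0.25, 0.50)}
```

Because the hypothetical TTB curve is steep around 500 mg/m², every shifted target in `best` maps to the same dose, which mirrors the behavior reported for the chapter’s scenarios. A flatter curve could move the best dose by a level.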

12.7 Concluding remarks

The dose-finding method used for the sarcoma trial accommodates several different toxicities having severity levels of varying clinical importance. Our simulation study shows that, on average, the algorithm performs well under a wide variety of circumstances. The method requires substantially more effort to implement than conventional dose-finding methods, including close interaction with the physicians to establish the toxicities, severity weights and target TTB, as well as a simulation study of the design.

The method is much easier to implement in the case of one ordinal toxicity, which still provides a substantial advantage over methods based on one binary toxicity. In this case a single ordinal $Y$ takes on one of $C + 1$ severity values $y_0 < y_1 < \cdots < y_C$, and there is one latent variable $Z \sim N(\beta_0 + \beta_1 x, 1)$ with $(Y = y_k) = (\gamma_k \le Z < \gamma_{k+1})$ and

$$\pi_k(x, \boldsymbol{\theta}) = \Pr(Y = y_k \mid x, \boldsymbol{\theta}) = \Phi\{\gamma_{k+1} - (\beta_0 + \beta_1 x)\} - \Phi\{\gamma_k - (\beta_0 + \beta_1 x)\}$$

for $k = 0, \ldots, C$, where $0 = \gamma_1 < \gamma_2 < \cdots < \gamma_C$, with $\gamma_0 = -\infty$ and $\gamma_{C+1} = +\infty$ understood. Only one vector of severity weights, $(w_1, \ldots, w_C)$, is elicited. The TTB is $W$, where $W$ is univariate with $\Pr(W = w_k) = \pi_k(x, \boldsymbol{\theta})$, and $\psi^*$ is the elicited target for $E(W \mid \text{data}) = \sum_{k=1}^{C} w_k\, E\{\pi_k(x, \boldsymbol{\theta}) \mid \text{data}\}$. For example, if a single ordinal toxicity is defined in terms of grades 0, 1, 2, 3, 4 but has elicited severity weights 0, 1, 2, 3, 6, and if $\boldsymbol{\pi}^{(a)}(x) = (0.50, 0.10, 0.10, 0.20, 0.10)$ and $\boldsymbol{\pi}^{(b)}(x) = (0.10, 0.10, 0.50, 0.10, 0.20)$ for a given dose $x$, then both of these probability vectors yield the same conventionally used criterion $\Pr(Y \ge 3 \mid x) = 0.30$ for dose-limiting (grade 3 or 4) toxicity, whereas $\boldsymbol{\pi}^{(a)}(x)$ has $E^{(a)}(\mathrm{TTB}) = 1.5$ while $\boldsymbol{\pi}^{(b)}(x)$ has $E^{(b)}(\mathrm{TTB}) = 2.6$. This illustrates the fact that, even with only one toxicity, accounting for multiple severity levels and eliciting severity weights provides a more informative evaluation of toxicity.
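The worked example above can be reproduced in a few lines, together with a sketch of how the grade probabilities arise from the latent-variable model. The weights and the two probability vectors are taken from the text. The cutpoints and latent mean passed to `cell_probs` are hypothetical, and the function names are ours. The standard normal cdf is computed from `math.erf`.

```python
import math


def normal_cdf(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def cell_probs(cutpoints, mean):
    """Pr(Y = y_k), k = 0..C, from finite cutpoints gamma_1 < ... < gamma_C,
    taking gamma_0 = -inf and gamma_{C+1} = +inf, for Z ~ N(mean, 1)."""
    cdf = [0.0] + [normal_cdf(g - mean) for g in cutpoints] + [1.0]
    return [upper - lower for lower, upper in zip(cdf, cdf[1:])]


def single_tox_ttb(weights, probs):
    """Expected toxicity burden for one ordinal toxicity."""
    return sum(w * p for w, p in zip(weights, probs))


def prob_dlt(probs, min_grade=3):
    """Conventional criterion Pr(Y >= min_grade)."""
    return sum(probs[min_grade:])


SEVERITY = (0, 1, 2, 3, 6)                    # weights for grades 0..4 (from text)
PI_A = (0.50, 0.10, 0.10, 0.20, 0.10)         # from text
PI_B = (0.10, 0.10, 0.50, 0.10, 0.20)         # from text

ttb_a = single_tox_ttb(SEVERITY, PI_A)
ttb_b = single_tox_ttb(SEVERITY, PI_B)
dlt_a, dlt_b = prob_dlt(PI_A), prob_dlt(PI_B)

# HYPOTHETICAL cutpoints and latent mean, for illustration only.
example_probs = cell_probs([0.0, 0.5, 1.2, 2.0], mean=0.3)
```

Both vectors give a dose-limiting toxicity probability of 0.30, while the expected TTB is 1.5 under (a) and 2.6 under (b), as stated in the text.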

References1. M.H. Chen and D.K. Dey (2000) Bayesian analysis for correlated ordinal data models, in

Generalized Linear Models: A Bayesian Perspective (eds S.K. Ghosh, D.K. Dey and B.K.Mallick), Marcel Dekker, New York, PP. 133–57.

2. J. O’Quigley, M. Pepe and L. Fisher (1990) Continual reassessment method: a practicaldesign for phase I clinical trials in cancer. Biometrics, 46, 33–48.

3. B.N. Bekele and P.F. Thall (2004) Dose-finding based on multiple toxicities in a soft tissuesarcoma trial. J. Am. Statistical Assoc., 99, 26–35.

4. J.H. Albert and S. Chib (1993) Bayesian analysis of binary and polytomous response data.J. Am. Statistical Assoc., 88, 669–79.

5. S. Chib and E. Greenberg (1998) Analysis of multivariate probit models. Biometrika, 85,347–61.
