This article was downloaded by: [York University Libraries] On: 12 November 2014, At: 06:26 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK Journal of the American Statistical Association Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/uasa20 Sequential Implementation of Stepwise Procedures for Identifying the Maximum Tolerated Dose Ying Kuen Cheung a a Ying Kuen Cheung is Associate Professor, Department of Biostatistics, Columbia University, New York, NY 10032 . This project was supported by grant R01NS055809 from the National Institute of Neurological Disorders and Stroke. The content is solely the responsibility of the author and does not necessarily represent the official views of the National Institute of Neurological Disorders and Stroke or the National Institutes of Health. Published online: 01 Jan 2012. To cite this article: Ying Kuen Cheung (2007) Sequential Implementation of Stepwise Procedures for Identifying the Maximum Tolerated Dose, Journal of the American Statistical Association, 102:480, 1448-1461, DOI: 10.1198/016214507000000699 To link to this article: http://dx.doi.org/10.1198/016214507000000699 PLEASE SCROLL DOWN FOR ARTICLE Taylor & Francis makes every effort to ensure the accuracy of all the information (the “Content”) contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. 
Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content. This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http:// www.tandfonline.com/page/terms-and-conditions

Sequential Implementation of Stepwise Procedures for Identifying the Maximum Tolerated Dose


Ying Kuen CHEUNG

This article considers the problem of finding the maximum tolerated dose (MTD) of a drug in human trials. The MTD is defined as the maximum test dose with toxicity probability less than or equal to a target toxicity rate. We adopt the multiple test framework, with step-down tests used in an escalation stage and step-up tests used in a deescalation stage, to allow sequential dose assignments for ethical purposes. By formulating the estimation problem as a testing problem, the proposed procedures formally control the error probability of selecting an unsafe dose. In addition, we can control the probability of correctly selecting the MTD under a parameter subspace where no toxicity probability lies in an interval bracketed by the target toxicity rate and an unacceptably high toxicity rate, the so-called “indifference zone.” This frequentist property, which is currently lacking in the conduct of dose-finding trials in humans, is appealing from a regulatory standpoint. We give the general expressions of the selection probabilities and apply some common statistical tests to the stepwise procedure. The design parameters are calibrated so that the average number of patients receiving an overdose is kept low. From a practical viewpoint, stepwise tests are simple and easy to understand, and the sequential implementation operates in a manner similar to the traditional algorithm familiar to clinicians. Extensive simulations illustrate that our methods yield good, competitive operating characteristics under a wide range of scenarios with realistic sample size and perform well even in situations in which other existing methods may fail, namely when the dose–toxicity curve is flat up to the targeted MTD.

KEY WORDS: Algorithm-based design; Background toxicity; Familywise error rate; Least favorable configuration; Minimum overdose; Phase I trial; Unbiased selection.

1. INTRODUCTION

In the early phases of clinical development of a new drug, the primary concern is to establish safety of the drug. A specific objective is to identify the maximum tolerated dose (MTD), defined as the maximum dose that causes toxicity with a prespecified probability $\theta$. Traditionally, the MTD is identified by a “3+3” algorithm, whereby the MTD is given as the highest test dose with observed toxicity rate $< .33$ in a group of six patients. Despite its widespread use in practice, there are a number of criticisms against this traditional algorithm. In particular, the MTD estimation properties are unclear, although Storer and DeMets (1987) suggested that the algorithm is intended to identify the 33rd percentile of the tolerance distribution for some clinical toxicity, that is, $\theta = .33$. Several authors have since proposed new methods to address this percentile estimation problem, including up-and-down designs (Storer 1989), the continual reassessment method (CRM; O’Quigley, Pepe, and Fisher 1990), and the biased coin design (BCD; Durham, Flournoy, and Rosenberger 1997). These methods generally have desirable estimation properties and accommodate different choices of target probability $\theta$. But their use in practice has been limited, because their operations are either unfamiliar or opaque to the clinicians; therefore, there seems to be renewed interest in studying the statistical properties of the modification of the traditional algorithm (Lin and Shih 2001).

In addition to logistical considerations, most methods in the literature assume strict monotonicity of the dose–toxicity curve (see, e.g., condition D2 in Shen and O’Quigley 1996 for the CRM). A practical implication is that these methods may fail in the scenario where the target toxicity probability $\theta$ is chosen to be close to the background rate. This scenario is quite common. For example, Table 1 extracts from the work of Teal, Silver, and Simard (2005) the number of severe adverse events by treatment dose in a Phase II randomized study of repinotan in acute stroke patients. The most common adverse event was worsening neurological status. The observed adverse event rates in the placebo group and the three test doses were respectively 28%, 15%, 25%, and 32%. Although the observed rates suggested a nonmonotone dose–toxicity relationship, they were not statistically significantly different ($p = .15$ by Fisher’s exact test). On the other hand, this data set illustrates the high background rate in acute stroke patients and the flatness at the lower end of the dose–toxicity curve. We planned a clinical trial of early physical therapy in acute stroke patients based on these data. Although early rehabilitation might enhance recovery from stroke, prematurely instituting physical therapy could cause worsening neurological status, cardiac complications, or death, which was estimated to occur in 25% of untreated stroke patients based on the pooled adverse event rate in Table 1. The trial is conducted to identify the maximum (earliest) dose of physical therapy that can be instituted without causing adverse events in excess of this background rate. As suggested in the data, we expect the dose–toxicity curve to be quite flat at 25% up to the MTD, which is defined as the maximum dose with toxicity probability $\le \theta = .25$.

In this article we explore a class of algorithm-based designs based on stepwise test procedures that do not require the monotonicity assumption. Consider a trial with $K$ test doses, and let $p_i$ denote the true toxicity probability associated with dose level $i$. Our goal is to identify the MTD with high probability while keeping the probability of choosing an unsafe dose low. Precisely, dose $i$ is said to be safe if $p_i$ is not much higher than $\theta$, that is, $p_i < \theta + \delta \equiv \phi$ for some $\delta > 0$. The maximum safe dose (MAXSD) is then defined as

$$\nu = \max\{i : p_i < \phi\}, \tag{1}$$

with the convention that $\max \emptyset = 0$. This definition of the MAXSD was introduced by Tamhane, Dunnett, Green, and Wetherington (2001) and corresponds to the well-accepted notion of minimum effective dose (MED) for efficacy, which may be formulated in terms of a family of noninferiority hypotheses (see Hsu and Berger 1999 for the definition of MED). We adopt different terminology from that of Hsu and Berger (1999), who called $\nu$ defined in (1) MTD; here we reserve the term MTD for the maximum dose with $p_i \le \theta$ in accord with the convention in Phase I human trials (Storer and DeMets 1987).

Table 1. Dose–toxicity data of the repinotan trial in acute stroke patients

Dose (mg/day)   Sample size   Number of adverse events   Event rate
Placebo         58            16                         27.6%
.50             61            9                          14.8%
1.25            61            15                         24.6%
2.50            60            19                         31.7%

We then define a stepwise test with respect to the family of hypotheses $H_{0i}: p_i \ge \phi$ versus $H_{1i}: p_i < \phi$ for $i = 1, \ldots, K$. Under this formulation, a dose is not considered safe unless proven so by the data through rejection of the null hypothesis. When a type I error is committed against any true $H_{0i}$, an unsafe dose will be selected for future studies. Our primary concern is thus to protect against making any false-positive inference with a prespecified probability. Let $P_\pi(\cdot)$ denote probability computed under the probability vector $\pi = (p_1, \ldots, p_K)^T$. Then the familywise error rate (FWER) of a stepwise procedure $\hat{\nu}$ is defined as

$$\mathrm{FWER}(\hat{\nu}) = \max_{0 \le m \le K} \, \sup_{\pi \in \Theta_m} P_\pi(\hat{\nu} > m),$$

where $\Theta_m = \{\pi : p_m < \phi,\ p_k \ge \phi;\ k > m\}$ is the parameter subspace in which $\nu = m$. In this article we consider designs with

$$\mathrm{FWER}(\hat{\nu}) \le \alpha_0 \tag{2}$$

for given $\alpha_0$ and $\phi$. Note that because $\Theta_0, \ldots, \Theta_K$ partition the entire parameter space $\Theta = \{\pi : p_i \in [0,1] \text{ for } i = 1, \ldots, K\}$, designs with property (2) provide strong control of the FWER without assuming monotonicity on the parameters.
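As a concrete reading of definition (1) and of the MTD convention adopted here, the following minimal sketch computes both quantities for a given toxicity vector. The function names, the values $\theta = .25$ and $\phi = .40$, and the vector `p` are illustrative assumptions, not taken from the article.

```python
def max_safe_dose(p, phi):
    """MAXSD of definition (1): largest 1-based dose index i with p_i < phi (0 if none)."""
    safe = [i for i, p_i in enumerate(p, start=1) if p_i < phi]
    return max(safe, default=0)


def max_tolerated_dose(p, theta):
    """MTD as used in the text: largest 1-based dose index i with p_i <= theta (0 if none)."""
    tolerated = [i for i, p_i in enumerate(p, start=1) if p_i <= theta]
    return max(tolerated, default=0)


# Hypothetical configuration: flat at the background rate, then rising.
p = [0.25, 0.25, 0.30, 0.45]
nu = max_safe_dose(p, phi=0.40)           # dose 3 has p_3 strictly between theta and phi
mtd = max_tolerated_dose(p, theta=0.25)   # doses 1 and 2 sit exactly at theta
```

Neither function assumes that `p` is monotone, which mirrors the fact that definition (1) takes a maximum over all doses rather than stopping at the first unsafe one.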

A comparatively less serious error is the selection of a safe dose that is lower than the MTD, which is equivalent to $\hat{\nu} < m$ when $\pi \in \Theta^*_m = \{\pi : p^*_m \le \theta,\ p_k \ge \phi;\ k > m\}$, where $p^*_m := \max\{p_1, \ldots, p_m\}$. This occurs when the test fails to reject some false $H_{0i}$ for $i \le m$ under $\Theta^*_m$. Thus we may call this a type II error, in accordance with the Neyman–Pearson paradigm, and define power as $1 - \Pr(\text{type II error}) = P_\pi(\hat{\nu} \ge m)$ for $\pi \in \Theta^*_m$. Intuitively, the power of a test procedure $\hat{\nu}$ here indicates how close $\hat{\nu}$ is to the true $\nu$ while having the probability of $\hat{\nu} > \nu$ controlled through the FWER. In the context of dose-finding, however, we opt to control the probability of an incorrect selection under the parameter subspace $\bigcup_{m=0}^{K} \Theta^*_m$ directly by controlling

$$\mathrm{PCS}(\hat{\nu}) = \min_{0 \le m \le K} \, \inf_{\pi \in \Theta^*_m} P_\pi(\hat{\nu} = m) \ge 1 - \alpha_1 \tag{3}$$

for given $\alpha_1$ and $\theta$. Some notes on the rationale for objective (3) are in order. First, $\pi \in \bigcup_{m=0}^{K} \Theta^*_m$ does not imply monotonicity (i.e., $p_1 \le \cdots \le p_K$), but rather implies that every dose below $\nu$ is safe. The latter condition, referred to as weak monotonicity by Tamhane et al. (2001), seems to be a prudent assumption in many situations. Second, the computation of $\mathrm{PCS}(\hat{\nu})$ involves a search in the subspace $\bigcup_{m=0}^{K} \Theta^*_m$, where no $p_i$ lies in the interval $(\theta, \phi)$. Note that a type II error as defined here could occur under some $\pi \in \bigcup_{m=0}^{K} \Theta_m$. Here $\Theta^*_m$ is introduced to separate the MTD from the dose above it and ensure adequate probability of correctly selecting the MTD; this operational objective is motivated by its relevance to the objective of dose-finding with respect to the MTD. In a sense, we are indifferent to whether a dose with $p_i \in (\theta, \phi)$ is considered safe, because there is no explicit control of such probability. However, we may look at the numerical operating characteristics under the “indifference zone” scenarios after the test procedure has been calibrated and examine whether the behavior is reasonable. The inferential role of such a separation between parameters in guarantee of (3) is discussed further in Section 2.

The multiple test framework for dose-finding inference was introduced by Tamhane, Hochberg, and Dunnett (1996), Hsu and Berger (1999), and Tamhane et al. (2001), who considered situations in which randomization of subjects to doses is feasible. This is generally not the case in the early-phase safety trials, in which doses are determined sequentially for ethical purposes. In Section 2 we introduce a two-stage stepwise testing procedure that allows sequential implementation, and derive its key properties. In Section 3 we give applications of the procedure with some statistical tests along with a discussion on design calibration. We present numerical examples of the calibrated designs in the context of acute stroke trials in Section 4, and a simulation study comparing the proposed procedures and existing methods in Section 5. We conclude with a discussion in Section 6.

2. TWO–STAGE STEPWISE TESTS: GENERAL METHOD

2.1 The Procedure

Let $Y_i(j)$ denote the number of toxic outcomes in the first $j$ patients at dose $i$, and let the random vector $X_i(j) = \{Y_i(1), \ldots, Y_i(j)\}$ represent the data accrued for dose $i$ up to patient $j$. Suppose for the moment that $n$ patients have been observed for each dose. Let $\{X_i(n) \in B_n\}$ be the rejection region and $\{X_i(n) \in \bar{B}_n\}$ be the acceptance region for hypothesis $H_{0i}$, where $B_n$ and $\bar{B}_n$ are Borel sets that partition the range space of $X_i(n)$. A step-down (SD) procedure starts with testing the most restrictive hypothesis (i.e., $H_{01}$) and continues to the next restrictive one until a null hypothesis is accepted. Then $\nu$ is estimated by $\hat{\nu}_{\mathrm{SD}} = \min\{i : X_i(n) \in \bar{B}_n\} - 1$ with the convention $\min \emptyset = K + 1$. It is a consequence of the closed testing principle (Marcus, Peritz, and Gabriel 1976) that the familywise error rate is bounded above by the individual test level. A step-up (SU) test proceeds in the opposite direction to a SD procedure and starts with the least significant statistic. Assuming monotonicity, we may a priori fix the test order by starting with the highest dose and working downward until $\{X_i(n) \in B_n\}$ occurs for some $i$, at which point $\nu$ is estimated by $\hat{\nu}_{\mathrm{SU}} = \max\{i : X_i(n) \in B_n\}$. To achieve $\mathrm{FWER}(\hat{\nu}_{\mathrm{SU}}) \le \alpha$, we generally need to devise $B_n$ such that the individual test level is smaller than $\alpha$.

A SU procedure may not be viable in a human trial setting because it is not ethical to treat patients at a high dose unless some evidence of safety is seen at its lower doses. However, we can implement a SD procedure in a sequential manner in an escalation stage (stage 1) and continue subsequent enrollments in a deescalation stage (stage 2) according to a SU test, as follows. Let $n_{il}$ be the number of patients at dose $i$ at the end of stage $l = 1, 2$, and denote $R_{il} = \{X_i(n_{il}) \in B_{n_{il}}\}$ and $A_{il} = \{X_i(n_{il}) \in \bar{B}_{n_{il}}\}$. In other words, $n_{i2}$ is the cumulative sample size at dose $i$, and $R_{i2}$ is a test that uses all of the data accrued. We allow $n_{il}$ to be random as long as $\Pr(R_{il} \mid p_i) = \Pr(R_{jl} \mid p_j)$ if $p_i = p_j$. Stage 1 starts at dose $S_1 < K$ and escalates to dose $i + 1$ if and only if $R_{i1}$ is observed; we allow a trial to start at any dose below the highest test dose, although it often starts at the lower end of the test doses in practice. Stage 2 starts at $S_2 = \min\{i : A_{i1} \text{ is observed}\} - 1$, from which deescalation occurs until $R_{i2}$ is observed for some $i$. At the end of the deescalation stage, $\nu$ is estimated by $\hat{\nu}_{\mathrm{SDSU}} = \max\{i : R_{i1} \cap R_{i2} \text{ is observed}\}$, where the maximum takes values from $\{1, \ldots, S_2\}$. Therefore, doses between $S_2$ and $\hat{\nu}_{\mathrm{SDSU}}$ are tested twice.
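The two stages described above can be sketched as a small simulation. This is a minimal sketch under stated assumptions, not the article's implementation: it assumes fixed cohort sizes with threshold tests of the form "at most `c1` toxicities among `n1` patients" in stage 1 and "at most `c2` among `n2` cumulative patients" in stage 2, and it assumes that doses below the starting dose, which have no stage-1 data, are tested in stage 2 on `n2` fresh patients. The function name and all argument names are hypothetical.

```python
import random


def simulate_sdsu(p, S1, n1, c1, n2, c2, rng):
    """Simulate one two-stage trial and return the selected dose (0 = none selected).

    p      : true toxicity probabilities, p[i-1] for dose i = 1..K
    S1     : starting dose of the escalation stage
    n1, c1 : stage-1 cohort size and threshold (escalate if toxicities <= c1)
    n2, c2 : cumulative stage-2 cohort size and threshold, with n1 <= n2
    rng    : a random.Random instance
    """
    K = len(p)
    tox = {}  # cumulative number of toxic outcomes observed at each dose

    def enroll(i, n):
        tox[i] = tox.get(i, 0) + sum(rng.random() < p[i - 1] for _ in range(n))
        return tox[i]

    # Stage 1 (step-down, escalation): move up while the stage-1 test rejects H_0i.
    i = S1
    while i <= K and enroll(i, n1) <= c1:
        i += 1
    S2 = i - 1  # dose just below the first accepted hypothesis (K if none accepted)

    # Stage 2 (step-up, deescalation): retest downward using the enlarged cohort.
    for i in range(S2, 0, -1):
        already = n1 if i >= S1 else 0  # doses below S1 have no stage-1 data
        if enroll(i, n2 - already) <= c2:
            return i
    return 0
```

Repeating the simulation many times with a seeded `random.Random` gives Monte Carlo estimates of the selection distribution, which can serve as a check on exact calculations.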

2.2 Expressions for $\mathrm{FWER}(\hat{\nu}_{\mathrm{SDSU}})$ and $\mathrm{PCS}(\hat{\nu}_{\mathrm{SDSU}})$

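Let $\beta_{il} = \beta_l(p_i) = \Pr(R_{il} \mid p_i)$ for $l = 1, 2$ and $\gamma_i = \gamma(p_i) = \Pr(R_{i1} \cap R_{i2} \mid p_i)$. Direct computation gives

$$P_\pi(\hat{\nu}_{\mathrm{SDSU}} = m) = \begin{cases} \beta_{m2} \left\{ \displaystyle\prod_{j=m+1}^{S_1-1} (1 - \beta_{j2}) \right\} \Delta_{S_1,K} & \text{for } 0 \le m \le S_1 - 1, \\[2ex] \left\{ \displaystyle\prod_{j=S_1}^{m-1} \beta_{j1} \right\} \gamma_m \, \Delta_{m+1,K} & \text{for } S_1 \le m \le K, \end{cases} \tag{4}$$

with $\beta_{02} \equiv 1$, $\Delta_{K+1,K} \equiv 1$, and

$$\Delta_{m,K} = (1 - \beta_{m1}) + \sum_{i=m}^{K-1} (1 - \beta_{i+1,1}) \prod_{j=m}^{i} (\beta_{j1} - \gamma_j) + \prod_{j=m}^{K} (\beta_{j1} - \gamma_j). \tag{5}$$

Whereas (4) gives useful expressions for the exact computation of the distribution of $\hat{\nu}_{\mathrm{SDSU}}$ under a given toxicity configuration $\pi$, the results that follow shed light on how to compute $\mathrm{PCS}(\hat{\nu}_{\mathrm{SDSU}})$ and $\mathrm{FWER}(\hat{\nu}_{\mathrm{SDSU}})$ under the following condition: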

Condition 1. $\beta_{il}$ and $\gamma_i$ are nonincreasing in $p_i$.

Condition 1 implies that the test regions $R_{i1}$ and $R_{i1} \cap R_{i2}$ are unbiased. This assumption is satisfied by many tests associated with the likelihood ratio and usually can be verified.
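Expressions (4) and (5) translate directly into code once the per-dose probabilities are known. The following is a minimal sketch, assuming the caller supplies three lists indexed by dose (index 0 unused): `beta1[i]` and `beta2[i]` for the stage-1 and stage-2 rejection probabilities and `gamma[i]` for the probability of rejecting at both stages. The function names are hypothetical.

```python
from math import prod


def tail_term(m, K, beta1, gamma):
    """Expression (5): probability that no dose in {m, ..., K} is selected
    when stage-1 testing proceeds from dose m; equals 1 when m == K + 1."""
    if m == K + 1:
        return 1.0
    total = 1.0 - beta1[m]
    for i in range(m, K):
        total += (1.0 - beta1[i + 1]) * prod(beta1[j] - gamma[j] for j in range(m, i + 1))
    total += prod(beta1[j] - gamma[j] for j in range(m, K + 1))
    return total


def selection_probs(beta1, beta2, gamma, S1):
    """Expression (4): list of Pr(selected dose = m) for m = 0, ..., K."""
    K = len(beta1) - 1
    probs = []
    for m in range(K + 1):
        if m <= S1 - 1:
            b_m2 = 1.0 if m == 0 else beta2[m]  # convention beta_02 = 1
            below = prod(1.0 - beta2[j] for j in range(m + 1, S1))
            probs.append(b_m2 * below * tail_term(S1, K, beta1, gamma))
        else:
            climb = prod(beta1[j] for j in range(S1, m))
            probs.append(climb * gamma[m] * tail_term(m + 1, K, beta1, gamma))
    return probs


# Hypothetical inputs for K = 2 doses, starting at S1 = 1 (index 0 is a placeholder).
example = selection_probs([None, 0.8, 0.5], [None, 0.7, 0.4], [None, 0.6, 0.3], S1=1)
```

A useful sanity check is that the returned probabilities sum to 1 for any valid inputs with `gamma[i] <= beta1[i]`.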

Theorem 1. Suppose that Condition 1 holds; then we have the following results:

a. $P_\pi(\hat{\nu}_{\mathrm{SDSU}} = m)$ is nonincreasing in $p_i$ for $i \le m$.
b. $P_\pi(\hat{\nu}_{\mathrm{SDSU}} = m)$ is nondecreasing in $p_i$ for $i > m$.

The proof of Theorem 1a follows trivially from (4) and the observation that $\Delta_{m+1,K}$ is free of $p_i$ for $i \le m$. The proof of Theorem 1b is given in the Appendix.

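Theorem 2. For every $\pi_m \in \Theta_m$, there exists $\pi_{m-1} \in \Theta_{m-1}$ such that

$$P_{\pi_m}(\hat{\nu}_{\mathrm{SDSU}} > m) \le P_{\pi_{m-1}}(\hat{\nu}_{\mathrm{SDSU}} > m - 1)$$

for $m = 1, \ldots, K$.

By applying Theorem 2 recursively, for every $\pi \in \Theta_m$ for $m \ge 1$, we can find $\pi_0 \in \Theta_0$ such that $P_\pi(\hat{\nu}_{\mathrm{SDSU}} > m) \le P_{\pi_0}(\hat{\nu}_{\mathrm{SDSU}} > 0)$. Consequently, we have

$$\sup_{\pi \in \Theta_m} P_\pi(\hat{\nu}_{\mathrm{SDSU}} > m) \le \sup_{\pi \in \Theta_0} P_\pi(\hat{\nu}_{\mathrm{SDSU}} > 0).$$

Taking the maximum of both sides over all $m$ thus gives

$$\mathrm{FWER}(\hat{\nu}_{\mathrm{SDSU}}) = 1 - \inf_{\pi \in \Theta_0} P_\pi(\hat{\nu}_{\mathrm{SDSU}} = 0) = 1 - P_{\pi^*_0}(\hat{\nu}_{\mathrm{SDSU}} = 0) \equiv 1 - P^*_0,$$

where $\pi^*_0 = (\phi, \ldots, \phi)^T$. The second identity is due to Theorem 1b. This, together with (4), gives the computational form for $\mathrm{FWER}(\hat{\nu}_{\mathrm{SDSU}})$ based on the test regions $R_{il}$ for specified $\phi$.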

Furthermore, let $\pi^*_m$ be the probability vector with $p_i = \theta$ for $i \le m$ and $p_i = \phi$ for $i > m$. It follows from Theorem 1 that $\inf_{\pi \in \Theta^*_m} P_\pi(\hat{\nu}_{\mathrm{SDSU}} = m) = P_{\pi^*_m}(\hat{\nu}_{\mathrm{SDSU}} = m) \equiv P^*_m$ and that $\mathrm{PCS}(\hat{\nu}_{\mathrm{SDSU}}) = \min_m P^*_m$. The configuration $\pi^*_m$, minimizing the probability of correct selection under $\Theta^*_m$, is called least favorable for $m = 0, 1, \ldots, K$.
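As a worked illustration of the least favorable configurations, consider the smallest nontrivial case. This is a sketch under assumed values $K = 2$ and $S_1 = 1$, which are chosen for illustration and do not come from the article. Substituting $\pi^*_0 = (\phi, \phi)^T$, $\pi^*_1 = (\theta, \phi)^T$, and $\pi^*_2 = (\theta, \theta)^T$ into (4) and (5), and writing $\beta_1(\cdot)$ and $\gamma(\cdot)$ for the stage-1 and joint rejection probabilities as functions of the toxicity probability, gives

$$
\begin{aligned}
P^*_0 &= \{1 - \beta_1(\phi)\} + \{1 - \beta_1(\phi)\}\{\beta_1(\phi) - \gamma(\phi)\} + \{\beta_1(\phi) - \gamma(\phi)\}^2, \\
P^*_1 &= \gamma(\theta)\,\{1 - \gamma(\phi)\}, \\
P^*_2 &= \beta_1(\theta)\,\gamma(\theta),
\end{aligned}
$$

so that the familywise error rate is $1 - P^*_0$ and the probability of correct selection is $\min(P^*_0, P^*_1, P^*_2)$. In this small case the three terms of $P^*_0$ correspond to stopping at dose 1 in stage 1, escalating once and stopping at dose 2 with dose 1 failing its stage-2 test, and escalating through both doses with both failing their stage-2 tests.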

Remark 1. Our design approach is to construct test regions $R_{il}$ so as to keep the constraint $\mathrm{PCS}(\hat{\nu}_{\mathrm{SDSU}}) \ge 1 - \alpha_1$, which excludes $\pi$ in the “indifference zone” from consideration. Theorem 1a implies that although there is no explicit control of the probability of selecting dose $m$ when $p_m$ lies in the indifference interval $(\theta, \phi)$, the design gives a reasonable property that the selection probability is decreasing as $p_m$ increases. This may be viewed as an extension of unbiasedness in hypothesis testing to unbiasedness in dose-finding design.

Remark 2. Suppose that $\pi = (p_1, \ldots, p_K)^T \in \Theta_m$ with $p^*_m < \phi$ and $p_i \ge \phi$ for $i > m$. It is a ready consequence of Theorem 1 that $P_\pi(\hat{\nu}_{\mathrm{SDSU}} = \nu)$ is bounded below by any prespecified level $1 - \alpha_1$ whenever $\alpha_1 \ge \alpha_0$ and the test regions $R_{il}$ ($l = 1, 2$) are chosen such that $\beta_l(\phi) \le 1 - (1 - \alpha_0)^{1/S_1}$ and $\gamma(p^*_m) > 1 - \varepsilon$ for a given $\varepsilon > 0$. In other words, the probability of correctly choosing $\nu$ can be guaranteed without introducing an indifference zone. This result is analogous to that of Hsu (1981, 1984) that a separation between parameters is not necessary to bound the probability of an incorrect selection in the ranking and selection setting (see remark 2.1 in Hsu 1981).

2.3 An Upper Bound for Expected Number of Overdoses

Let $N$ denote the total number of patients enrolled to a dose so that a conclusion about $H_{0i}$ can be reached, that is, $N = n_{i1}$ if $A_{i1}$ occurs and $N = n_{i2}$ if $R_{i1}$ occurs. The approach here is to calibrate the design within a class of $R_{il}$ to minimize $E_\phi(N)$, where $E_\phi(\cdot)$ denotes expectation computed under $p_i = \phi$, subject to the error constraints $P^*_0 \ge 1 - \min(\alpha_0, \alpha_1)$ and $P^*_m \ge 1 - \alpha_1$ for $m = 1, \ldots, K$. (Thus, as far as the error constraints are concerned, we assume without loss of generality that $\alpha_0 \le \alpha_1$.) Minimizing $E_\phi(N)$ is a simple operating objective that represents the hope to conclude that a truly toxic dose is toxic with the smallest number of patients on average. Let $T_m$ denote the number of patients treated at doses $\{m, \ldots, K\}$ in a trial.

Theorem 3. Let $\varepsilon_1 = \Pr(R_{i1} \mid \phi)$ be the probability of erroneous escalation. Then

$$E^*_m(T_{m+1}) \le \frac{(1 - \varepsilon_1^{K-m})\, E_\phi(N)}{1 - \varepsilon_1} \equiv \tau^*_m \quad \text{for } m = S_1, \ldots, K - 1,$$

where $E^*_m(\cdot) = E_{\pi^*_m}(\cdot \mid S_2 \ge m)$ denotes conditional expectation computed under the least favorable $\pi^*_m$ given $S_2 \ge m$.

Note that $E_{\pi^*_m}(T_{m+1}) = E^*_m(T_{m+1})\, P_{\pi^*_m}(S_2 \ge m)$. Theorem 3 thus prescribes an upper bound for the marginal expected number of patients receiving an overdose as an increasing function of $E_\phi(N)$ and $\varepsilon_1$ when the starting dose $S_1$ does not exceed the true $\nu$. Therefore, in addition to the error constraints, we consider minimizing $E_\phi(N)$ over all designs that respect the constraint that $\varepsilon_1 \le \varepsilon^*$ for some specified $\varepsilon^*$. The proofs of the theorems are given in the Appendix. Using arguments similar to the proof of Theorem 3 will give the upper bound for the expected number of overdoses for situations in which the starting dose $S_1 > \nu$ as well. In this case we expect that an inflated number of patients will receive an overdose not only because $E^*_m(T_{m+1})$ may increase, but also because $P_{\pi^*_m}(S_2 \ge m) = 1$. To be precise, we get

$$E_{\pi^*_m}(T_{m+1}) \le \tau_m \equiv \begin{cases} \tau^*_{S_1-1} + \{1 - (1 - \varepsilon_2)^{S_1-m-1}\}\, E_\phi(n_{i2})/\varepsilon_2 & \text{for } m = 0, \ldots, S_1 - 1, \\[1ex] \tau^*_m \{\Pr(R_{i1} \mid \theta)\}^{m-S_1+1} & \text{for } m = S_1, \ldots, K - 1, \end{cases}$$

where $\varepsilon_2 = \Pr(R_{i2} \mid \phi)$. Whereas a minimum overdose design may be calibrated with respect to some weighted average of $\tau_m$’s, we opt for the computationally simple objective $E_\phi(N)$. On the other hand, it can be informative during the planning stage to compute these upper bounds for design comparison purposes.
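The bound of Theorem 3 is straightforward to evaluate at the planning stage. The sketch below is illustrative only: the function name and the numerical inputs are assumptions. It also makes explicit that the bound is $E_\phi(N)$ multiplied by the geometric sum $1 + \varepsilon_1 + \cdots + \varepsilon_1^{K-m-1}$, which is an algebraic identity; one way to read it is that each further erroneous escalation contributes a factor of $\varepsilon_1$.

```python
def tau_star(m, K, eps1, expected_n):
    """Theorem 3 upper bound on the conditional expected number of patients
    treated at doses m+1, ..., K, for m = S1, ..., K-1.

    eps1       : probability of erroneous escalation, Pr(R_i1 | phi), must be < 1
    expected_n : E_phi(N), expected patients needed at a dose with p_i = phi
    """
    return (1.0 - eps1 ** (K - m)) * expected_n / (1.0 - eps1)


def tau_star_geometric(m, K, eps1, expected_n):
    """Same quantity written as expected_n times a finite geometric sum."""
    return expected_n * sum(eps1 ** j for j in range(K - m))


# Hypothetical planning values: K = 4 doses, eps1 = 0.2, E_phi(N) = 5 patients.
bounds = {m: tau_star(m, 4, 0.2, 5.0) for m in range(1, 4)}
```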

3. APPLICATIONS WITH SPECIFIC TESTS

For practical purposes, it is often desirable to impose a maximum group size $N_{\max}$ so that the total sample size will not exceed $K N_{\max}$. In Sections 3.1 and 3.2 we consider the two-stage stepwise procedure applied with tests with equal and fixed group size, that is, $n_{il} \equiv n_l$ for all $i$ and $n_1 \le n_2 \le N_{\max}$. We discuss designs with the sequential probability ratio test (SPRT) in Section 3.3.

3.1 Likelihood Ratio Test

The likelihood ratio test prescribes the test region $R_{il} = \{Y_i(n_l) \le c_l\}$ with $c_1 \le c_2$ and $n_1 \le n_2 \le N_{\max}$. Then $\varepsilon_1 = \Pr\{Y_i(n_1) \le c_1 \mid \phi\}$ and $E_\phi(N) = (1 - \varepsilon_1) n_1 + \varepsilon_1 n_2$. Due to sequential enrollment, it is possible to escalate or deescalate without accruing all $n_l$ patients to a dose at stage $l$ if future observations will not alter the decision. For instance, we should deescalate once we observe $Y_i(n'_1) > c_1$ for some $n'_1 < n_1$ in stage 1. Likewise, we can escalate to dose $i + 1$ if $Y_i(n'_1) \le c_1 - (n_1 - n'_1)$, that is, if the threshold $c_1$ cannot be exceeded even when all remaining $n_1 - n'_1$ patients experience toxicity. These logical checks prevent future incoherent escalation (deescalation) that follows a toxic (nontoxic) outcome, as described by Cheung (2005).

The design is defined completely by the parameters $(c_1, n_1, c_2, n_2)$; the set of design parameters that give the smallest $E_\phi(N)$ among those that satisfy all constraints is considered optimal in terms of overdose control. An exhaustive search is possible by iterating $n_2$ from 2 to $N_{\max}$ in the outermost loop, with inner loops $n_1 \in \{1, \ldots, n_2\}$ and $c_l \in \{0, \ldots, n_l\}$ for $l = 1, 2$.
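The exhaustive search can be sketched as follows. This is a minimal sketch under stated assumptions rather than the article's code: it evaluates the error constraints at the least favorable configurations (all doses up to $m$ at $\theta$, all higher doses at $\phi$), treats outcomes as independent binomial counts, and uses hypothetical function names and example settings.

```python
from math import comb


def binom_pmf(y, n, p):
    return comb(n, y) * p**y * (1 - p) ** (n - y)


def binom_cdf(c, n, p):
    return sum(binom_pmf(y, n, p) for y in range(0, min(c, n) + 1))


def lr_probs(p, c1, n1, c2, n2):
    """(beta1, beta2, gamma) at toxicity p for the regions {Y(n_l) <= c_l}."""
    beta1 = binom_cdf(c1, n1, p)
    beta2 = binom_cdf(c2, n2, p)
    # gamma = Pr{Y(n1) <= c1 and Y(n2) <= c2}; the n2 - n1 added patients are independent
    gamma = sum(binom_pmf(y, n1, p) * binom_cdf(c2 - y, n2 - n1, p)
                for y in range(0, min(c1, n1) + 1))
    return beta1, beta2, gamma


def least_favorable_pcs(K, S1, theta, phi, c1, n1, c2, n2):
    """Correct-selection probabilities P*_0..P*_K under the least favorable configurations."""
    b1t, b2t, gt = lr_probs(theta, c1, n1, c2, n2)
    b1p, b2p, gp = lr_probs(phi, c1, n1, c2, n2)

    def tail_phi(m):  # expression (5) when doses m..K all have toxicity phi
        n, r = K - m + 1, b1p - gp
        return (1 - b1p) * sum(r**k for k in range(n)) + r**n

    out = []
    for m in range(K + 1):
        if m <= S1 - 1:
            bm2 = 1.0 if m == 0 else b2t
            out.append(bm2 * (1 - b2p) ** (S1 - 1 - m) * tail_phi(S1))
        else:
            out.append(b1t ** (m - S1) * gt * tail_phi(m + 1))
    return out


def feasible(pstar, alpha0, alpha1):
    return pstar[0] >= 1 - min(alpha0, alpha1) and min(pstar[1:]) >= 1 - alpha1


def search_lr_design(K, S1, theta, phi, alpha0, alpha1, n_max, eps_star=1.0):
    """Return (E_phi(N), (c1, n1, c2, n2)) with the smallest E_phi(N), or None."""
    best = None
    for n2 in range(2, n_max + 1):
        for n1 in range(1, n2 + 1):
            for c1 in range(0, n1 + 1):
                eps1 = binom_cdf(c1, n1, phi)
                e_n = (1 - eps1) * n1 + eps1 * n2
                if eps1 > eps_star or (best is not None and e_n >= best[0]):
                    continue
                for c2 in range(c1, n2 + 1):  # E_phi(N) does not depend on c2
                    pstar = least_favorable_pcs(K, S1, theta, phi, c1, n1, c2, n2)
                    if feasible(pstar, alpha0, alpha1):
                        best = (e_n, (c1, n1, c2, n2))
                        break
    return best


# Hypothetical, deliberately easy setting so that a small design exists.
best = search_lr_design(K=2, S1=1, theta=0.05, phi=0.60, alpha0=0.2, alpha1=0.2, n_max=6)
```

The inner loop stops at the first feasible `c2` because the objective depends only on `(c1, n1, n2)`; a production search would also want to record ties and report the achieved error rates.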

Remark 3. Suppose that the optimal design thus obtained has $E_\phi(N)$ rounded up to $\bar{N}$. Such optimality is local in that there may exist a design with $E_\phi(N)$ smaller than $\bar{N}$ if we allow $n_2$ to iterate beyond $N_{\max}$. A globally minimum overdose design may subsequently be obtained, as follows:

Algorithm 1.
1. Iterate $n_1$ from 2 to $\bar{N}$ and $c_1$ from 0 to $n_1$ for each $n_1$.
2. For each $(n_1, c_1)$, increment $n_2$ by 1 at each iteration until (a) $E_\phi(N) \ge \bar{N}$; or (b) all error constraints are satisfied for some $c_2 \in \{0, \ldots, n_2\}$.
3. Select the design with smallest $E_\phi(N)$ among all designs in step 2(b).

Because $E_\phi(N)$ is increasing in $n_2$ without bound, Algorithm 1 is guaranteed to stop at step 2 after a finite number of iterations. If no design satisfies the error constraints in step 2(b), then the local minimum overdose design minimizes $E_\phi(N)$ globally as well.

3.2 The Traditional Test Regions

The second class of tests takes an initial small cohort of $n_0$ patients in an attempt to speed up escalation at the virtually nontoxic dose, or to quickly detect a high toxicity rate without taking a full cohort of $n_1 (\ge n_0)$ patients in stage 1. Specifically, the test regions are $R_{i1} = \{Y_i(n_0) \le c_0\} \cup \{c_0 < Y_i(n_0) \le \bar{c}_0,\ Y_i(n_1) \le c_1\}$ and $R_{i2} = \{Y_i(n_2) \le c_2\}$ for $n_0 \le n_1 \le n_2$ and $c_0 \le \bar{c}_0 \le c_1 \le c_2$. Consequently, $E_\phi(N) = (1 - \varepsilon_0) n_0 + (\varepsilon_0 - \varepsilon_1) n_1 + \varepsilon_1 n_2$, where $\varepsilon_0 = \Pr\{Y_i(n_0) \le \bar{c}_0 \mid \phi\}$. This approach mimics the test region in the traditional “3+3” algorithm and includes it as a special case when $n_0 = 3$, $n_1 = n_2 = 6$, $c_0 = 0$, and $\bar{c}_0 = c_1 = 1$. Thus this class of tests is called the “traditional tests” throughout. We include the possibility of expanding the group size in stage 2 (i.e., if $n_1 < n_2$), because it is common practice to expand the cohort at the potential MTD. Sequential escalation and deescalation can be carried out in a coherent manner as in the likelihood ratio tests. Also note that a traditional test region reduces to that of a likelihood ratio test when $n_0 = n_1$ and $c_0 = \bar{c}_0 = c_1$. Therefore, the minimum overdose design associated with this class of tests will be at least as safe as that with the likelihood ratio tests in terms of $E_\phi(N)$. We can verify that the traditional tests, and thus the likelihood ratio tests, satisfy Condition 1 by applying the results of Lehmann (1986, lemma 2, p. 85).
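The quantities that enter $E_\phi(N)$ for a traditional test can be computed directly from binomial probabilities. The sketch below is illustrative: the function names are hypothetical, the two first-cohort thresholds are passed as `c0` (escalate immediately at or below it) and `c0_bar` (expand the cohort at or below it), and the toxicity value used in the example is an assumption chosen so that the arithmetic is easy to check by hand.

```python
from math import comb


def binom_pmf(y, n, p):
    return comb(n, y) * p**y * (1 - p) ** (n - y)


def traditional_test_summary(p, n0, c0, c0_bar, n1, c1, n2):
    """Return (eps0, eps1, expected N) at toxicity probability p.

    eps0 : probability that the first cohort of n0 shows at most c0_bar toxicities
    eps1 : probability of escalation, i.e. of the stage-1 rejection region
    """
    eps0 = sum(binom_pmf(y, n0, p) for y in range(0, c0_bar + 1))
    eps1 = sum(binom_pmf(y, n0, p) for y in range(0, c0 + 1))
    for y in range(c0 + 1, c0_bar + 1):
        # the remaining n1 - n0 patients may add at most c1 - y toxicities
        rest = sum(binom_pmf(z, n1 - n0, p)
                   for z in range(0, min(c1 - y, n1 - n0) + 1))
        eps1 += binom_pmf(y, n0, p) * rest
    expected_n = (1 - eps0) * n0 + (eps0 - eps1) * n1 + eps1 * n2
    return eps0, eps1, expected_n


# The "3+3" special case (n0 = 3, n1 = n2 = 6, c0 = 0, c0_bar = c1 = 1) at an assumed p = 0.5.
eps0, eps1, expected_n = traditional_test_summary(0.5, n0=3, c0=0, c0_bar=1, n1=6, c1=1, n2=6)
```

At $p = 0.5$ the first cohort shows at most one toxicity with probability $0.5$, so half of the time the dose is abandoned after 3 patients and otherwise 6 patients are used, giving an expected 4.5 patients.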

3.3 The Sequential Probability Ratio Test

Let $\lambda_{i,n}$ denote the likelihood ratio for dose $i$, that is,

$$\lambda_{i,n} = \frac{\theta^{Y_i(n)} (1 - \theta)^{n - Y_i(n)}}{\phi^{Y_i(n)} (1 - \phi)^{n - Y_i(n)}},$$

and define stopping times $n_{il} = \inf\{n > 0 : \lambda_{i,n} \ge \rho_l \text{ or } \lambda_{i,n} \le \zeta\}$ for $l = 1, 2$ with $\zeta \le \theta/\phi$ and $\rho_2 \ge \rho_1 \ge (1 - \theta)/(1 - \phi)$. We then define Wald’s (1945) sequential test $R_{il} = \{\lambda_{i,n_{il}} \ge \rho_l\}$ and $A_{il} = \{\lambda_{i,n_{il}} \le \zeta\}$. Because $R_{i2} \subseteq R_{i1}$, we have $\gamma_i = \beta_{i2} \le \beta_{i1}$. We can verify that Condition 1 holds for these test regions by applying the theorem of Hoel (1970). Ignoring overshoot, we can approximate $\beta_{i1}$ and $\beta_{i2}$ by $a_\phi := (1 - \zeta)/(\rho_1 - \zeta)$ and $b_\phi := (1 - \zeta)/(\rho_2 - \zeta)$ when $p_i = \phi$ and by $a_\theta := \rho_1 a_\phi$ and $b_\theta := \rho_2 b_\phi$ when $p_i = \theta$. The probability of correct selection $P^*_m$ under the least favorable $\pi^*_m$ can then be computed as functions of $\rho_1$, $\rho_2$, and $\zeta$ according to (4), with expression (5) simplified as

$$\Delta^*_{m,K} = \frac{1 - a_\phi}{1 - a_\phi + b_\phi} + \frac{b_\phi}{1 - a_\phi + b_\phi} (a_\phi - b_\phi)^{K-m+1}.$$

In addition, because $n_{i1} = n_{i2}$ on $A_{i1}$, we have that $N = n_{i1} I(A_{i1}) + n_{i2} I(R_{i1}) = n_{i2}$, where $I(\cdot)$ is the indicator function of an event, and thus,

$$E_\phi(N) \sim \frac{(1 - b_\phi) \log \zeta + b_\phi \log \rho_2}{\phi \log(\theta/\phi) + (1 - \phi) \log\{(1 - \theta)/(1 - \phi)\}}. \tag{6}$$

We can verify through routine differentiation that the right side of (6) is strictly increasing in $\rho_2$ and strictly decreasing in $\zeta$. It is not as clear how to perform a global search for the minimum overdose design, however, because $\rho_1$ and $\rho_2$ may increase indefinitely while $E_\phi(N)$ is bounded above by $-C \log \zeta$ for some finite $C > 0$. The following lemma facilitates the search of the design that minimizes (6) subject to the error constraints and $a_\phi \le \varepsilon^*$.
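The Wald-type approximations and the right side of (6) can be evaluated numerically. The sketch below is illustrative only: the function name, the dictionary keys, and the numerical design values are assumptions, and the approximations ignore overshoot as in the text.

```python
from math import log


def sprt_approximations(theta, phi, zeta, rho1, rho2):
    """Approximate rejection probabilities and the right side of (6).

    Requires zeta <= theta/phi and rho2 >= rho1 >= (1 - theta)/(1 - phi).
    """
    a_phi = (1 - zeta) / (rho1 - zeta)   # approx. Pr(R_i1) when p_i = phi
    b_phi = (1 - zeta) / (rho2 - zeta)   # approx. Pr(R_i2) when p_i = phi
    a_theta = rho1 * a_phi               # approx. Pr(R_i1) when p_i = theta
    b_theta = rho2 * b_phi               # approx. Pr(R_i2) when p_i = theta
    # expected log-likelihood-ratio increment per patient under p_i = phi (negative)
    drift = phi * log(theta / phi) + (1 - phi) * log((1 - theta) / (1 - phi))
    expected_n = ((1 - b_phi) * log(zeta) + b_phi * log(rho2)) / drift
    return {"a_phi": a_phi, "b_phi": b_phi, "a_theta": a_theta,
            "b_theta": b_theta, "expected_n": expected_n}


# Hypothetical design values with theta = .25 and phi = .45.
base = sprt_approximations(0.25, 0.45, zeta=0.2, rho1=5.0, rho2=10.0)
larger_rho2 = sprt_approximations(0.25, 0.45, zeta=0.2, rho1=5.0, rho2=20.0)
larger_zeta = sprt_approximations(0.25, 0.45, zeta=0.3, rho1=5.0, rho2=10.0)
```

Comparing `base` with the two perturbed designs gives a quick numerical check of the stated monotonicity of (6) in $\rho_2$ and $\zeta$ for these particular values.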

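Lemma 1.
a. $\Delta^*_{m,K}$ is strictly increasing in $\zeta$, $\rho_1$, and $\rho_2$, so that $\Delta^*_{m,K} \to 1$ as $\rho_2 \to \infty$. Thus $P^*_0 = (1 - b_\phi)^{S_1-1} \Delta^*_{S_1,K}$ is strictly increasing in $\zeta$, $\rho_1$, and $\rho_2$.
b. $P^*_K$ is strictly decreasing in $\zeta$, $\rho_1$, and $\rho_2$.
c. For any $\zeta$ and $\rho_1$, there exists $\rho^*_2(\zeta, \rho_1)$ such that $P^*_K \le P^*_m$ for $m = 1, \ldots, K - 1$ for every $\rho_2 \ge \rho^*_2(\zeta, \rho_1)$.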

We restrict the choice of $\alpha_0$ and $\alpha_1$ to $(1 - \alpha_0)^{1/S_1} + (1 - \alpha_1)^{1/(K-S_1+1)} \ge 1$, so that $\bar{\zeta} := \{1 - (1 - \alpha_1)^{1/(K-S_1+1)}\}(1 - \alpha_0)^{-1/S_1}$ is between 0 and 1. Let $\xi = (\zeta, \rho_1, \rho_2)$ denote the design triplet, and let $\Xi_1 = \{\xi : \zeta \le \bar{\zeta}\}$ and $\Xi_2 = \{\xi : \zeta > \bar{\zeta}\}$ be partitions of the space of the design parameters.

Theorem 4.a. Any design ξ ∈ �2 that satisfies the error constraints also

must satisfy ρ1 < ρ(ζ ), where

ρ(ζ ) := 1 − ζ(1 − α0)1/S1

1 − (1 − α0)1/S1for 0 ≤ ζ ≤ 1.

b. If ξ = (ζ , ρ(ζ ), ρ(ζ )) satisfies the error constraints andaφ ≤ ε∗

1 , then ξ minimizes Eφ(N) among all ξ ’s that satisfy theerror constraints in �1.

Consequent to Theorem 4a, we can search for the minimumoverdose design ξ2 among all ξ ∈ �2, as follows.

Algorithm 2.1. Iterate ζ from ζ to θ/φ on a discrete domain with grid

width .001.2. For each given ζ , iterate ρ1 from max{(1 − θ)/(1 −

φ), ζ + (1 − ζ )/ε∗} to ρ(ζ ) on a discrete domain withgrid width .01.

3. For each given (ζ, ρ1), find ρ2,LB(ζ, ρ1) := min{ρ2 ≥ρ1 :P ∗

0 (ζ, ρ1, ρ2) ≥ 1 − α0} and ρ2,UB(ζ, ρ1) :=max{ρ2 ≥ ρ1 :P ∗

K(ζ,ρ1, ρ2) ≥ 1 − α1}. Then:a. If ρ2,LB(ζ, ρ1) > ρ2,UB(ζ, ρ1), then no ρ2 for this

given (ζ, ρ1) will satisfy the error constraints.b. Otherwise, iterate ρ2 from ρ2,LB(ζ, ρ1) to ρ2,UB(ζ,

ρ1) on a discrete domain with grid width .01 andrecord the smallest ρ2 for which the error constraintsare met.

4. ξ2 = (ζ , ρ1, ρ2) is the design with the smallest Eφ(N)

among those recorded in step 3(b).

By strict monotonicity of P ∗0 and P ∗

K in ρ2, ρ2,LB(ζ, ρ1)

and ρ2,UB(ζ, ρ1) can be determined efficiently through a bi-nary search. It is possible that ρ2,UB(ζ, ρ1) = ∞. In such cases,however, Lemma 1c guarantees that the error constraints willbe satisfied for some ρ2 ≤ max{ρ2,LB(ζ, ρ1), ρ

∗2 (ζ, ρ1)} < ∞.

Therefore, Algorithm 2 will have a finite number of iterations.If the condition in Theorem 4b is satisfied, then let ξ1 = ξ

be the optimum design in �1. Otherwise, an exhaustive searchfor ξ1 on a discrete domain is possible by iterating ζ from 0to ζ , and ρ1 and ρ2 from their corresponding lowest possiblevalues up to ρ2 defined in Algorithm 2, because any design withζ ≤ ζ < ζ and ρ2 > ρ2 will induce a larger Eφ(N) than ξ2

and can be excluded from consideration. The global minimumoverdose design is then chosen between ξ1 and ξ2.
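The ingredients of the search in Algorithm 2 that depend only on P*_0 can be sketched in Python as follows. This is a partial illustration under stated assumptions, not the full algorithm: all function names are hypothetical, the bisection works on a continuous domain rather than on the .01 grid, and ρ2,UB is omitted because P*_K depends on expression (4), which is not reproduced here.

```python
def p0_star(zeta, rho1, rho2, S1, K):
    """P*_0 from Lemma 1a, using the Wald approximations a_phi and b_phi."""
    a = (1.0 - zeta) / (rho1 - zeta)
    b = (1.0 - zeta) / (rho2 - zeta)
    denom = 1.0 - a + b
    lam = (1.0 - a) / denom + (b / denom) * (a - b) ** (K - S1 + 1)
    return (1.0 - b) ** (S1 - 1) * lam


def zeta_threshold(alpha0, one_minus_alpha1, S1, K):
    """Threshold value of zeta that separates the two design regions."""
    return (1.0 - one_minus_alpha1 ** (1.0 / (K - S1 + 1))) * (1.0 - alpha0) ** (-1.0 / S1)


def rho_cap(zeta, alpha0, S1):
    """Upper limit for rho1 given in Theorem 4a."""
    q = (1.0 - alpha0) ** (1.0 / S1)
    return (1.0 - zeta * q) / (1.0 - q)


def rho1_start(zeta, theta, phi, eps_star):
    """Smallest rho1 considered in step 2 of Algorithm 2."""
    return max((1.0 - theta) / (1.0 - phi), zeta + (1.0 - zeta) / eps_star)


def rho2_lower_bound(zeta, rho1, alpha0, S1, K, tol=1e-6):
    """Bisection for the smallest rho2 >= rho1 with P*_0 >= 1 - alpha0."""
    target = 1.0 - alpha0
    if p0_star(zeta, rho1, rho1, S1, K) >= target:
        return rho1
    lo, hi = rho1, 2.0 * rho1
    while p0_star(zeta, rho1, hi, S1, K) < target:  # P*_0 -> 1 as rho2 grows
        lo, hi = hi, 2.0 * hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if p0_star(zeta, rho1, mid, S1, K) >= target:
            hi = mid
        else:
            lo = mid
    return hi


zeta = 0.330
rho1 = rho1_start(zeta, theta=0.25, phi=0.45, eps_star=0.5)
print(round(zeta_threshold(0.10, 0.40, S1=1, K=4), 4))
print(round(rho1, 2), round(rho_cap(zeta, 0.10, S1=1), 2))
print(round(rho2_lower_bound(zeta, rho1, alpha0=0.10, S1=1, K=4), 2))
```

For ζ = .330 with S1 = 1 and K = 4, the starting value of ρ1 and the lower bound for ρ2 should land close to the values ρ1 = 1.67 and ρ2 = 11.88 tabulated in the first row of Table 2(c).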

To avoid an indefinite trial period and enrollment, we may truncate the SPRT at Nmax for each dose in stage 1 and continue in stage 2 until the stopping rule is reached or a total of KNmax patients have been enrolled. The test regions are modified as R′_{il} = {2 log λ_{i,n′_{il}} > log ζ + log ρ_l} and A′_{il} = {2 log λ_{i,n′_{il}} < log ζ + log ρ_l} for l = 1, 2, where n′_{i1} = n_{i1} ∧ Nmax, n′_{i2} = n_{i2} ∧ N_{i2}, and N_{i2} is the number of patients treated at dose i at the trial’s end, that is, when KNmax patients have been enrolled. Because R′_{il} ∩ {n′_{il} = n_{il}} = R_{il} ∩ {n′_{il} = n_{il}}, the operating characteristics of the truncated design will be quite close to those of the untruncated design when Pr(n′_{il} = n_{il}) is large.
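A minimal sketch of the truncated decision rule for a single dose and a single boundary pair (ζ, ρ) is given below. It assumes that λ is the binomial likelihood ratio of θ against φ, a form consistent with the boundary constraints stated above but written out here as an assumption; the function names and return labels are hypothetical.

```python
import math


def log_lr(y, n, theta=0.25, phi=0.45):
    """Log-likelihood ratio of theta versus phi after y toxicities in n patients."""
    return y * math.log(theta / phi) + (n - y) * math.log((1 - theta) / (1 - phi))


def truncated_sprt(outcomes, zeta, rho, n_max, theta=0.25, phi=0.45):
    """Return (decision, n): 'reject' H0, 'accept' H0, 'undecided', or 'continue'."""
    y = 0
    for n, toxic in enumerate(outcomes, start=1):
        y += int(toxic)
        llr = log_lr(y, n, theta, phi)
        if llr >= math.log(rho):
            return "reject", n
        if llr <= math.log(zeta):
            return "accept", n
        if n == n_max:
            # At truncation, compare 2 log(lambda) with log(zeta) + log(rho).
            midpoint = 0.5 * (math.log(zeta) + math.log(rho))
            if llr > midpoint:
                return "reject", n
            if llr < midpoint:
                return "accept", n
            return "undecided", n
    return "continue", len(outcomes)


print(truncated_sprt([0, 0], zeta=0.33, rho=1.67, n_max=20))
print(truncated_sprt([1, 0, 1, 0], zeta=0.33, rho=11.88, n_max=4))
```

Rejecting H0i here corresponds to the event R′_{il}, and accepting it corresponds to A′_{il}.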

4. NUMERICAL ILLUSTRATIONS

4.1 Design Parameters and General Behaviors

Consider an acute stroke trial of physical therapy or drug with K = 4 regimens and possible starting dose S1 = 1 or 2. Table 2 gives the parameters of the minimum overdose designs associated with the three statistical tests described in Section 3, along with each design’s behavior: Eφ(N), ε1, ε2, and τm. Each design was derived with respect to error constraints φ = .45, α0 = .10, θ = .25, and 1 − α1 = .40 or .50. The search was confined to n2 ≤ Nmax = 20 for designs associated with the likelihood ratio test and the traditional test and to ε1 ≤ .5 for all three tests. For the SPRT, the minimum overdose designs were determined according to Algorithm 2 based on the approximation (6) of Eφ(N) and the probability approximations aφ, bφ, aθ, and bθ. However, the numerical values of the design’s behavior in Table 2(c) are based on Monte Carlo estimates with 20,000 replicates. Values given in Table 2 for the other two tests are computed exactly.

An increase in the lower bound of the PCS(νSDSU) generally requires larger sample sizes. For example, as 1 − α1 increases from .40 to .50, the maximum group size n2 of the likelihood ratio test with S1 = 1 (2) increases from 20 (18) to 23 (24), and the acceptance and rejection boundaries of the SPRT grow wider apart (Fig. 1). The effects of the starting dose S1 on the sample sizes are not as clear cut. However, starting a trial at one level above the lowest (i.e., S1 = 2) often implies an increase in the upper bounds, τm, of the average number of overdoses, although this increase is not substantial except in the situations in which all test doses have unacceptable toxicity rates. A row-by-row comparison of the three classes of tests indicates that the SPRT induces a smaller Eφ(N) than the traditional test, which in turn induces a smaller Eφ(N) than the likelihood ratio test. This is expected because the traditional tests include the likelihood ratio tests as special cases, and the SPRT is known to minimize Eφ(N) [and Eθ(N)] for comparable error rates (Wald and Wolfowitz 1948). As a result, the SPRT generally yields smaller overdose upper bounds, τm, than the traditional and likelihood ratio tests.

Table 2. Minimum overdose designs associated with three statistical tests for given K = 4, α0 = .10, 1 − α1, S1, θ = .25, φ = .45, ε∗ = .5, and Nmax = 20

(a) Likelihood ratio test

S1   1 − α1   n1   c1   n2   c2   Eφ(N)   ε1    ε2     τ0     τ1     τ2     τ3
1    .40      8    3    20   5    13.7    .48   .055   24.9   20.7   15.9   9.6
1    .50      13   5    23   6    17.3    .43   .051   29.1   25.6   20.8   13.4
2    .40      11   4    18   4    13.8    .40   .041   39.4   21.4   17.0   10.8
2    .50      13   5    24   6    17.7    .43   .036   52.5   28.5   23.2   15.0

(b) Traditional test

S1   1 − α1   n0   c0 (lower)   c0 (upper)   n1   c1   n2   c2   Eφ(N)   ε1    ε2     τ0     τ1     τ2     τ3
1    .40      4    0            2            10   4    17   4    11.9    .48   .060   21.5   18.2   14.1   8.6
1    .50      10   1            4            15   6    22   6    15.2    .38   .071   24.2   21.0   17.2   11.2
2    .40      6    1            3            11   4    18   4    12.6    .41   .041   37.7   19.7   15.6   9.8
2    .50      7    2            3            14   6    24   6    16.2    .49   .036   52.2   28.2   21.2   13.6

(c) SPRT

S1   1 − α1   ζ      ρ1     ρ2      Eφ(N)   ε1    ε2     τ0     τ1     τ2     τ3
1    .40      .330   1.67   11.88   11.4    .44   .056   19.7   15.7   11.6   6.8
1    .50      .258   1.74   13.07   14.3    .46   .053   25.4   21.3   16.5   10.0
2    .40      .393   1.61   16.24   11.1    .44   .040   29.2   18.1   13.4   7.8
2    .50      .310   1.69   18.33   13.2    .45   .037   35.1   21.9   16.5   9.8

Figure 1. Two sets of decision boundaries of the SPRT in the two-stage stepwise procedure with S1 = 1 and 1 − α1 = .40 (ζ = .33, ρ1 = 1.67, ρ2 = 11.88) or .50 (ζ = .258, ρ1 = 1.74, ρ2 = 13.07). Decisions regarding each dose are based on three boundaries. A deescalation will be made when the sample path Yi(n) hits the top line (acceptance of H0i); an escalation, when Yi(n) hits the middle line. Dose i is declared to be MTD when Yi(n) hits the bottom line (rejection of H0i).
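The three boundary lines described for Figure 1 can be expressed as thresholds on the toxicity count Yi(n). The sketch below assumes that λ is the binomial likelihood ratio of θ against φ, so that the boundaries are linear in n; the function name is hypothetical, and the design values are those quoted in the Figure 1 caption for 1 − α1 = .40.

```python
import math


def y_boundaries(n, zeta, rho1, rho2, theta=0.25, phi=0.45):
    """Return (top, middle, bottom) thresholds on the toxicity count at sample size n.

    top:    smallest count with lambda <= zeta (acceptance of H0, deescalation)
    middle: largest count with lambda >= rho1 (escalation)
    bottom: largest count with lambda >= rho2 (dose declared the MTD)
    A negative middle or bottom value means that boundary cannot yet be reached.
    """
    up = math.log((1 - theta) / (1 - phi))  # log-LR step for a nontoxic outcome
    down = math.log(theta / phi)            # log-LR step for a toxic outcome
    span = up - down
    top = math.ceil((n * up - math.log(zeta)) / span)
    middle = math.floor((n * up - math.log(rho1)) / span)
    bottom = math.floor((n * up - math.log(rho2)) / span)
    return top, middle, bottom


for n in (1, 2, 5, 19, 20):
    print(n, y_boundaries(n, zeta=0.33, rho1=1.67, rho2=11.88))
```

Under these assumptions the escalation boundary is first reachable at n = 2 with no toxicities, and the bottom boundary at n = 20 sits at four toxicities.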

4.2 Sequential Implementation

Suppose that the repinotan trial was redesigned as a Phase I safety trial with test doses of .5, 1.25, 2.5, and 4.0 mg/day. The data from Teal et al. (2005), in hindsight, was suggestive of a scenario in which the three lower doses were safe with a 25% toxicity probability, whereas it might be conservatively speculated that the highest dose is unsafe. Under this scenario, we would like to identify 2.5 mg/day (dose 3) as the MTD. Suppose that each patient was enrolled to the trial sequentially with a latent tolerance uniformly distributed on the interval (0, 1). If the uniform variate was smaller than the toxicity probability associated with the dose given to the patient, then the patient had a toxic outcome; otherwise, the patient did not have a toxic outcome. To anticipate what might be seen in the actual trial, a sequence of uniform random variates was generated and run under the scenario p1 = p2 = p3 = .25 and p4 = .45 using the designs in the first rows in Table 2(a)–(c), that is, S1 = 1 and 1 − α1 = .4.
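The latent-tolerance mechanism can be written down in a few lines. The sketch below is illustrative only: the function name and the random seed are arbitrary choices, and it does not reproduce the particular sequence of variates used for Figure 2.

```python
import random

TOX_PROBS = [0.25, 0.25, 0.25, 0.45]  # scenario p1 = p2 = p3 = .25, p4 = .45


def toxic_outcome(tolerance, dose, tox_probs=TOX_PROBS):
    """A patient has a toxic outcome when the latent tolerance is below p_dose."""
    return tolerance < tox_probs[dose - 1]


rng = random.Random(2007)  # arbitrary seed, for repeatability of the sketch
tolerances = [rng.random() for _ in range(5)]
for patient, u in enumerate(tolerances, start=1):
    # The same latent tolerance determines the outcome at whichever dose is given.
    print(patient, round(u, 3), [toxic_outcome(u, d) for d in (1, 2, 3, 4)])
```

Fixing one sequence of tolerances in this way lets different designs be run on the same hypothetical patients, so that differences in dose assignments reflect the designs rather than the random draws.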

Figure 2. Simulated trials by minimum overdose designs associated with (a) the likelihood ratio test, (b) the traditional test, (c) the SPRT, and (d) a two-stage CRM. Each point represents a patient, with “o” indicating no toxicity and “x” indicating toxicity.

Figure 2(a) shows the outcomes for the design with the likelihood ratio test. Among the first seven patients at dose 1 were two toxic outcomes. Had the eighth patient been given dose 1, the ninth patient would have received dose 2 regardless of the outcome of the eighth patient, because the stage 1 test region, Ri1 = {Yi(8) ≤ 3}, was to be observed with certainty. Thus the eighth patient could receive dose 2 without altering the dose escalation decision. Similarly, dose escalation to dose 3 occurred after five consecutive nontoxic outcomes were seen at dose 2, and deescalation occurred after four toxic outcomes were seen at dose 4. These coherence checks suggested in Section 3.1 help reduce the number of patients treated at doses other than the final MTD estimate. After enrolling 35 patients, the trial concluded that dose 3 was the MTD.
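The early decisions in this example amount to checking whether membership in the stage 1 test region {Yi(n1) ≤ c1} is already determined before all n1 patients are observed. A minimal sketch, with a hypothetical function name and return labels, is:

```python
def early_decision(y, n, n1, c1):
    """Check whether {Y(n1) <= c1} is already decided after n of n1 patients.

    y is the number of toxic outcomes among the first n patients.
    Returns 'in_region' if Y(n1) <= c1 is certain (escalation is certain),
    'out_of_region' if Y(n1) > c1 is certain, and None otherwise.
    """
    remaining = n1 - n
    if y + remaining <= c1:  # even if all remaining patients are toxic
        return "in_region"
    if y > c1:               # even if all remaining patients are nontoxic
        return "out_of_region"
    return None


# Design in the first row of Table 2(a): n1 = 8, c1 = 3.
print(early_decision(y=2, n=7, n1=8, c1=3))  # two toxicities in seven patients
print(early_decision(y=0, n=5, n1=8, c1=3))  # five consecutive nontoxic outcomes
print(early_decision(y=4, n=4, n1=8, c1=3))  # a fourth toxic outcome
```

The three calls mirror the situations described for doses 1, 2, and 4 in Figure 2(a); the value n = 4 in the last call is chosen only for illustration, because the result depends on the toxicity count alone once it exceeds c1.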

The traditional test for the design in Figure 2(b) started each dose with a small cohort (of size n0 = 4) in an attempt to reach a decision early. An escalation will occur when there is no toxic outcome in these four patients, as in dose 2 in the figure, and deescalation will occur when there are at least three toxic outcomes, as in dose 4 in the figure. This helps further reduce the number of patients treated at unpromising doses. The trial enrolled a total of 31 patients and gave the same MTD estimate as that with the likelihood ratio test.

As shown in Figure 2(c), the stepwise test with the SPRT allowed quick escalation after as few as two nontoxic observations (dose 1 and dose 3) but also was able to detect a toxic dose with only five patients (dose 4). The trial ended with 20 patients enrolled to dose 3 when {Y3(20) = 4} reached the rejection boundary (see Fig. 1).

To illustrate how existing methods may miss the correct conclusion under this particular scenario, Figure 2(d) shows the corresponding outcome sequence for the CRM with an initial “group-of-three” design before the first toxic outcome appeared in patient 4 at dose 2. (The design details are given in Sec. 5.) Thereafter, dose assignments went back and forth between dose 2 and its neighboring doses. The trial ended with 20 patients at dose 2, which is the final estimate of the MTD. This behavior is expected from the CRM because it is intended to concentrate dose assignments at a dose with toxicity probability θ = .25 and to provide a good local approximation of the true toxicity probability there. This myopic approach, which makes the method work well and robustly under a strictly increasing dose–toxicity curve, prevents the trial from exploring higher doses that may be equally safe under a flat dose–toxicity curve. Other methods that only target a single dose with toxicity rate θ may suffer from the same “complacency” problem. In contrast, the minimum overdose designs allow exploration of high doses with a prespecified safety parameter, ε1, while controlling the PCS.


5. SIMULATION STUDY

To further examine and compare the operating characteristics of the proposed methods, we ran 20,000 simulation replicates using the three stepwise procedures described in Section 4.2 under a wide range of scenarios. We used the truncated SPRT because an open-ended trial rarely would be considered in practice. Specifically, the SPRT was truncated at 20 patients per dose in stage 1 and at 80 patients in stage 2. We considered two existing methods for comparison purposes:

• The continual reassessment method (CRM). This started with an initial design that escalated after every group of three consecutive nontoxic observations. Once the first toxic outcome was seen, the dose–toxicity curve would be updated after every single patient, and the next patient would be given the dose with toxicity probability estimated to be closest to θ = .25. The dose–toxicity probability for dose i was modeled as d_i^ψ with d1 = .15, d2 = .25, d3 = .35, and d4 = .50, where ψ was a priori lognormal with location 0 and scale 1.34 (O’Quigley and Shen 1996).

• Biased coin design (BCD). During a trial, an escalation occurred with probability 1/3 (= θ/(1 − θ)) after a nontoxic outcome, and deescalation occurred with certainty after a toxic outcome. At the end of the trial, the MTD was estimated through isotonic regression (Stylianou and Flournoy 2002).
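The dose-transition rule of the biased coin design can be sketched as follows. The handling of the lowest and highest doses (staying put when a move is impossible) is an assumption of this sketch and is not stated in the description above; the function name is hypothetical, and the isotonic regression step used for the final estimate is not shown.

```python
import random


def bcd_next_dose(dose, toxic, K, theta, rng):
    """Next dose under the biased coin rule for target toxicity rate theta."""
    if toxic:
        return max(dose - 1, 1)  # deescalate with certainty (assumed floor at dose 1)
    if rng.random() < theta / (1.0 - theta):
        return min(dose + 1, K)  # escalate with probability theta / (1 - theta)
    return dose                  # otherwise repeat the current dose


rng = random.Random(1)  # arbitrary seed
dose = 1
path = [dose]
for toxic in [False, False, True, False, False, False]:
    dose = bcd_next_dose(dose, toxic, K=4, theta=0.25, rng=rng)
    path.append(dose)
print(path)
```

With θ = .25 the escalation probability after a nontoxic outcome is 1/3, as stated in the description of the design.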

Both of these methods enrolled up to a maximum of 80 patients and stopped when 20 patients had been allocated to a dose. All designs started at S1 = 1. Whereas the stepwise procedures include “selecting none” (when νSDSU = 0) as a possible decision, the existing methods do not have a clear recommendation for this. For comparison purposes, when the regression-based MTD estimate in CRM and BCD was dose 1, we performed an additional test: The final selection would remain dose 1 if {Y1(n′) ≤ c.1} was observed, where n′ is the number of patients treated at dose 1, and the method would select no dose otherwise. The constant c.1 is the critical value of an exact binomial test for H01 : p1 ≥ .45 versus H11 : p1 < .45 with significance level of .10, pretending that Y1(n′) is binomial by ignoring the sequential nature of data accrual; for example, c.1 = 5 when n′ = 20. This attempt was intended to control the type I error rate induced by CRM and BCD in scenarios where dose 1 was unsafe (i.e., p1 ≥ .45).
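The critical value of the exact binomial test can be computed directly from the binomial distribution function. The sketch below uses hypothetical function names and treats Y1(n′) as binomial, as in the text.

```python
from math import comb


def binom_cdf(c, n, p):
    """Pr(Y <= c) for Y ~ Binomial(n, p)."""
    return sum(comb(n, k) * p ** k * (1.0 - p) ** (n - k) for k in range(c + 1))


def critical_value(n, p0, level):
    """Largest c with Pr(Y <= c | p0) <= level; returns -1 if no such c exists."""
    c = -1
    while c + 1 <= n and binom_cdf(c + 1, n, p0) <= level:
        c += 1
    return c


print(critical_value(n=20, p0=0.45, level=0.10))
print(round(binom_cdf(5, 20, 0.45), 3), round(binom_cdf(6, 20, 0.45), 3))
```

For n′ = 20 this returns 5, in agreement with the example given above; the same function with level .15 would give the constant used for the smaller trials.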

We include the simulation results under 13 toxicity probability configurations that are representative of a wider scope of scenarios. The first set of five scenarios comprises the least favorable configurations L0–L4 for which the proposed stepwise procedures are intended. The results are summarized in Table 3. In the other eight scenarios, the dose–toxicity curves are strictly increasing around the MTD. The CRM and the BCD are expected to work well under these toxicity configurations. To present the results systematically, we group these scenarios into two sets of four with shallow and steep dose–toxicity curves. Table 4 gives the results under the shallow scenarios H1–H4, and Table 5 gives the results under the steep scenarios T1–T4.

In summary, the stepwise procedures with the likelihood ratio test (LRT) and the traditional test (Trad) are comparable. Trad requires 2–5 fewer patients on average than LRT in the 13 simulation scenarios but has slightly smaller PCS (up to 4 percentage points). The stepwise procedure with the truncated SPRT (SPRT) is generally at least as good as LRT and Trad in terms of average sample size and PCS, except under scenario T1, where “selecting none” is a correct decision. We note, however, that selecting dose 1 (toxicity probability .35) in scenario T1 may not be considered a serious mistake. All three procedures have a similar average number of toxicities. Importantly, the FWER is controlled at or below .10 in all scenarios.

Under the least favorable configurations, CRM and BCD are generally outperformed by the stepwise procedures. Simulations with larger sample sizes did not show much improvement in the PCS for these two methods. This is somewhat expected because strict monotonicity is required by the CRM, and the isotonic estimates of toxicity probabilities are known to be biased upward on the high doses and downward on the low doses, especially when the dose–toxicity curve is flat. Under steep dose–toxicity curves T1–T4, CRM and BCD are generally better than the stepwise methods, but SPRT is quite competitive in terms of PCS and average sample size. In addition, we note that the stepwise procedures select an unsafe dose less often than CRM and BCD under the steep scenarios T2 and T3. This illustrates the advantage of formally controlling the FWER in the stepwise procedures. In the shallow scenarios H4 and H1–H3, where the dose immediately above the MTD is not unsafe, the stepwise procedures have comparable PCS to CRM and BCD. Overall, CRM seems to be the safest in terms of the average number of toxicities and is more reliable in that the variability in the sample size is smallest. However, the stepwise procedures appear to give reasonable operating characteristics consistently over a wide range of scenarios, including those with a flat dose–toxicity curve.

One may be curious how the stepwise procedures will behave in situations with a much smaller sample size than 80, as a referee pointed out that the testing framework may be less useful in a small-sample setting. Naturally, we can reduce the required sample size by expecting and allowing larger error rates. For example, if we set α0 = .15 and 1 − α1 = .30 for θ = .25 and φ = .45, then the minimum overdose design parameters with the likelihood ratio test are n1 = 6, c1 = 2, n2 = 10, and c2 = 2. This design is more in line with the practical expectations of trial size. Alternatively, being slightly less wary of error control, we could simply apply truncation to the SPRT at Nmax = 10 in stage 1 and at a total of 40 in stage 2. Tables 6–8 show the operating characteristics of the stepwise procedure with the truncated SPRT with respect to three sets of (α0, 1 − α1), denoted as SPRT(·, ·), with the respective arguments α0 and 1 − α1. We observe that the actual error rates exceed the prespecified constraints due to truncation, although the violation is small. Consider SPRT(.10, .40). The probability of selecting an unsafe dose is .12 (> .10) under the global null L0, and the probability of correctly selecting the MTD is .38 (< .40) under scenario L4; see Table 6. These values suggest that the prespecified error constraints may be too ambitious for the available sample size.

Anticipating that the actual FWER would be slightly higher than the prespecified α0 in a truncated SPRT, we could calibrate the test procedure with (α0, 1 − α1) = (.08, .35) to keep the FWER under .10 while imposing a less stringent PCS constraint. Generally, SPRT(.08, .35) shifts the distribution of dose selection to the left of that of SPRT(.10, .40) and thus is more conservative. On the other hand, if one is willing to allow a larger FWER (say .15), then one may consider SPRT(.12, .40), which shifts the distribution of dose selection to higher doses from those of SPRT(.10, .40). As a result of this shift, there are generally improvements in terms of average sample size and PCS, except under scenarios L0 and T1.

Table 3. Operating characteristics of the CRM, the BCD, and the stepwise procedures associated with the LRT, the traditional test, and the truncated SPRT under the least favorable configurations

                  Proportion of selecting dose       No. of toxicities   Sample size
Design            None   1      2      3      4      ave(IQR)            ave(IQR)

Scenario L0, π           .45    .45    .45    .45
LRT               .91    .05    .02    .01    .01    9(4,10)             19(6,26)
Trad              .90    .05    .03    .01    .01    8(5,10)             17(7,25)
SPRT              .90    .05    .02    .01    .01    8(3,12)             19(4,27)
CRM               .93    .05    .01    .00    .00    10(8,11)            22(20,20)
BCD               .94    .05    .01    .00    .00    14(12,16)           32(25,37)

Scenario L1, π           .25    .45    .45    .45
LRT               .37    .55    .04    .02    .01    11(7,15)            32(23,41)
Trad              .40    .51    .05    .02    .01    10(7,13)            28(20,37)
SPRT              .26    .66    .05    .02    .01    11(4,16)            32(15,45)
CRM               .37    .52    .09    .01    .00    8(6,9)              26(20,29)
BCD               .38    .51    .10    .01    .00    12(10,14)           36(28,42)

Scenario L2, π           .25    .25    .45    .45
LRT               .22    .23    .49    .04    .02    12(8,15)            37(28,47)
Trad              .24    .23    .47    .04    .02    11(7,14)            34(26,43)
SPRT              .20    .19    .56    .04    .02    10(4,15)            33(17,46)
CRM               .31    .27    .34    .07    .01    8(6,9)              28(20,33)
BCD               .30    .21    .42    .06    .01    13(10,16)           43(36,50)

Scenario L3, π           .25    .25    .25    .45
LRT               .18    .13    .20    .45    .04    11(8,14)            38(32,47)
Trad              .18    .14    .21    .43    .04    10(7,13)            35(29,44)
SPRT              .18    .14    .16    .49    .03    9(4,13)             32(18,44)
CRM               .30    .24    .18    .25    .04    7(6,8)              28(20,34)
BCD               .25    .16    .23    .32    .04    13(10,16)           47(39,56)

Scenario L4, π           .25    .25    .25    .25
LRT               .16    .11    .12    .19    .42    9(6,11)             36(32,41)
Trad              .16    .10    .12    .20    .41    8(5,11)             34(29,41)
SPRT              .18    .14    .11    .14    .44    7(3,10)             28(17,38)
CRM               .30    .23    .17    .16    .14    7(6,8)              28(20,33)
BCD               .23    .14    .18    .17    .28    12(9,15)            48(39,58)

Tables 6–8 also give the operating characteristics of the CRM that enrolls up to a maximum of 40 patients, stops when 16 patients have been allocated to a dose, and will select no dose if the model-based MTD estimate is dose 1 and {Y1(n′) > c.15} occurs. The stopping sample size, 16, is chosen so that the average sample size of CRM matches that of SPRT(.12, .40), and the critical constant c.15 is intended to control the type I error rate at .15 in scenarios in which dose 1 is unsafe.

Overall, the relative performances of SPRT and CRM are similar to those in situations with larger sample sizes, although the PCS difference between the two methods generally increases under steep dose–toxicity curves, with CRM being superior. This is not surprising, because CRM is known to converge to the MTD quickly under these scenarios. However, the stepwise procedure SPRT has several desirable features that favor its use in a small-sample setting. First, it has robust performance across all simulation scenarios, with PCS ranging from .40 to .70 for SPRT(.12, .40), whereas CRM performs poorly under the flat dose–toxicity curves (L2–L4). Second, although truncation may have a greater-than-desired effect on preservation of the FWER, it can be easily mended by calibrating the design with a more stringent α0 than desired as in SPRT(.08, .35). Third, we could view α0 and α1 as tuning parameters that help control the shape of the distribution of dose selection as in SPRT(.12, .40). Such versatility and flexibility will prove useful in the planning stage of a trial.

6. DISCUSSION

The fundamental problem of the traditional “3+3” algorithm is the lack of a clear quantitative objective. On the other hand, its sequential operations are ethically accepted by and familiar to clinicians. This article proposes a class of two-stage stepwise procedures that can be viewed as an extension of this traditional algorithm with a clearly defined objective and explicit frequentist properties. The recent development of dose-finding methodology addresses the formulation of the scientific objective, but no attention has been given to how to formally control the error rates incurred. Although not much regulatory concern has been expressed about such control, a statement such as “a dose with


Table 4. Operating characteristics of the CRM, the BCD, and the stepwise procedures associated with the LRT, the traditional test, and the truncated SPRT under shallow dose–toxicity curves

                  Proportion of selecting dose       No. of toxicities   Sample size
Design            None   1      2      3      4      ave(IQR)            ave(IQR)

Scenario H1, π           .25    .35    .45    .60
LRT               .32    .44    .20    .03    .00    12(8,16)            35(25,45)
Trad              .35    .42    .20    .03    .00    11(8,14)            32(23,41)
SPRT              .25    .47    .25    .03    .00    12(4,17)            34(16,50)
CRM               .35    .42    .20    .03    .00    8(6,9)              27(20,32)
BCD               .36    .39    .22    .02    .00    13(10,15)           39(31,46)

Scenario H2, π           .17    .25    .35    .45
LRT               .06    .28    .43    .19    .03    12(8,15)            41(32,51)
Trad              .07    .29    .41    .19    .04    11(8,14)            37(30,46)
SPRT              .09    .20    .45    .23    .03    11(5,16)            37(20,51)
CRM               .08    .27    .42    .20    .02    8(6,9)              31(26,36)
BCD               .08    .20    .51    .19    .02    12(10,15)           47(39,55)

Scenario H3, π           .10    .20    .25    .40
LRT               .01    .11    .29    .50    .10    10(8,13)            42(36,48)
Trad              .01    .12    .29    .47    .11    9(7,12)             38(32,43)
SPRT              .03    .10    .21    .55    .11    10(5,13)            35(21,46)
CRM               .01    .08    .32    .47    .12    7(6,9)              34(29,38)
BCD               .01    .06    .36    .47    .11    11(9,13)            50(43,58)

Scenario H4, π           .05    .05    .18    .25
LRT               .00    .00    .08    .34    .58    6(4,8)              38(34,42)
Trad              .00    .00    .10    .35    .55    6(4,7)              34(29,37)
SPRT              .01    .00    .08    .23    .68    6(2,8)              28(20,35)
CRM               .00    .00    .02    .36    .62    6(5,7)              33(29,36)
BCD               .00    .00    .03    .36    .62    7(6,9)              47(41,53)

Table 5. Operating characteristics of the CRM, BCD, and the stepwise procedures associated with the LRT, the traditional test, and the truncated SPRT under steep dose–toxicity curves

                  Proportion of selecting dose       No. of toxicities   Sample size
Design            None   1      2      3      4      ave(IQR)            ave(IQR)

Scenario T1, π           .35    .45    .55    .60
LRT               .74    .22    .04    .00    .00    10(4,15)            25(8,36)
Trad              .74    .22    .04    .00    .00    9(5,13)             22(10,32)
SPRT              .67    .29    .04    .00    .00    11(3,16)            27(5,41)
CRM               .74    .22    .03    .00    .00    8(7,10)             23(20,24)
BCD               .75    .22    .02    .00    .00    13(11,15)           33(26,38)

Scenario T2, π           .11    .25    .45    .50
LRT               .01    .37    .56    .05    .01    11(8,15)            39(30,47)
Trad              .02    .40    .53    .05    .01    10(7,13)            35(27,42)
SPRT              .03    .25    .67    .05    .01    10(5,15)            34(19,45)
CRM               .01    .19    .68    .11    .01    8(7,9)              33(29,37)
BCD               .01    .17    .73    .08    .00    11(9,14)            45(38,51)

Scenario T3, π           .05    .10    .25    .50
LRT               .00    .01    .39    .59    .02    9(7,11)             40(35,45)
Trad              .00    .01    .41    .55    .02    8(7,10)             35(31,40)
SPRT              .00    .02    .25    .71    .02    9(4,11)             32(20,40)
CRM               .00    .00    .14    .78    .08    7(6,9)              34(30,38)
BCD               .00    .00    .18    .77    .04    10(8,12)            48(42,54)

Scenario T4, π           .05    .05    .12    .25
LRT               .00    .00    .02    .38    .60    6(4,7)              37(34,41)
Trad              .00    .00    .03    .41    .56    5(3,7)              33(29,36)
SPRT              .00    .00    .03    .25    .72    5(2,7)              28(19,34)
CRM               .00    .00    .00    .22    .77    6(4,7)              33(29,36)
BCD               .00    .00    .01    .23    .75    7(5,8)              45(39,51)


Table 6. Operating characteristics of the stepwise test associated with the truncated SPRT with given α0, 1 − α1 and Nmax = 10, and the CRM under the least favorable configurations

                  Proportion of selecting dose       No. of toxicities   Sample size
(α0, 1 − α1)      None   1      2      3      4      ave(IQR)            ave(IQR)

Scenario L0, π           .45    .45    .45    .45
SPRT(.10, .40)    .88    .07    .03    .01    .01    7(3,11)             15(4,26)
SPRT(.08, .35)    .91    .05    .02    .01    .01    7(3,11)             15(4,26)
SPRT(.12, .40)    .85    .08    .04    .02    .01    7(3,11)             15(4,26)
CRM               .84    .13    .02    .01    .00    8(7,10)             19(17,17)

Scenario L1, π           .25    .45    .45    .45
SPRT(.10, .40)    .35    .55    .06    .03    .01    9(4,13)             25(13,40)
SPRT(.08, .35)    .40    .53    .04    .02    .01    9(4,13)             26(16,40)
SPRT(.12, .40)    .28    .59    .08    .04    .01    8(4,13)             24(13,40)
CRM               .23    .66    .09    .02    .00    7(5,8)              22(17,26)

Scenario L2, π           .25    .25    .45    .45
SPRT(.10, .40)    .30    .17    .45    .05    .02    8(4,13)             26(16,40)
SPRT(.08, .35)    .34    .16    .44    .04    .02    9(4,13)             27(18,40)
SPRT(.12, .40)    .21    .20    .49    .07    .03    8(4,13)             25(16,40)
CRM               .20    .38    .33    .07    .01    7(5,8)              24(17,29)

Scenario L3, π           .25    .25    .25    .45
SPRT(.10, .40)    .26    .15    .15    .40    .04    8(4,11)             26(17,40)
SPRT(.08, .35)    .31    .13    .14    .39    .03    8(4,12)             27(18,40)
SPRT(.12, .40)    .20    .15    .17    .43    .05    7(4,11)             26(17,39)
CRM               .19    .35    .18    .24    .04    6(5,7)              24(17,30)

Scenario L4, π           .25    .25    .25    .25
SPRT(.10, .40)    .24    .14    .11    .12    .38    6(3,9)              24(16,36)
SPRT(.08, .35)    .27    .13    .11    .12    .37    6(3,9)              26(18,38)
SPRT(.12, .40)    .20    .14    .12    .14    .40    6(3,8)              24(16,34)
CRM               .20    .34    .17    .16    .13    6(5,7)              24(17,29)

Table 7. Operating characteristics of the stepwise test associated with the truncated SPRT with given α0, 1 − α1 and Nmax = 10, and the CRM under shallow dose–toxicity curves

                  Proportion of selecting dose       No. of toxicities   Sample size
(α0, 1 − α1)      None   1      2      3      4      ave(IQR)            ave(IQR)

Scenario H1, π           .25    .35    .45    .60
SPRT(.10, .40)    .35    .39    .22    .04    .00    9(4,14)             27(16,40)
SPRT(.08, .35)    .40    .37    .20    .03    .00    9(4,14)             27(17,40)
SPRT(.12, .40)    .25    .44    .26    .05    .00    9(4,14)             26(16,40)
CRM               .22    .54    .21    .03    .00    7(5,8)              23(17,28)

Scenario H2, π           .17    .25    .35    .45
SPRT(.10, .40)    .20    .20    .35    .21    .03    9(5,13)             29(20,40)
SPRT(.08, .35)    .25    .20    .33    .19    .03    9(5,13)             30(22,40)
SPRT(.12, .40)    .07    .23    .40    .25    .04    9(5,13)             28(20,40)
CRM               .04    .32    .42    .20    .03    7(5,8)              27(22,32)

Scenario H3, π           .10    .20    .25    .40
SPRT(.10, .40)    .12    .11    .20    .46    .11    8(4,11)             29(20,40)
SPRT(.08, .35)    .16    .11    .20    .44    .09    8(5,11)             30(23,40)
SPRT(.12, .40)    .02    .11    .23    .50    .13    8(4,11)             29(20,40)
CRM               .00    .10    .32    .46    .12    6(5,7)              30(26,33)

Scenario H4, π           .05    .05    .18    .25
SPRT(.10, .40)    .05    .00    .09    .23    .63    5(2,7)              26(19,34)
SPRT(.08, .35)    .06    .00    .08    .24    .61    5(3,8)              28(21,35)
SPRT(.12, .40)    .00    .00    .09    .25    .65    5(2,7)              26(20,31)
CRM               .00    .00    .02    .37    .61    5(4,6)              29(26,32)


Table 8. Operating characteristics of the stepwise test associated with the truncated SPRT with given α0, 1 − α1 and Nmax = 10, and the CRM under steep dose–toxicity curves

                  Proportion of selecting dose       No. of toxicities   Sample size
(α0, 1 − α1)      None   1      2      3      4      ave(IQR)            ave(IQR)

Scenario T1, π           .35    .45    .55    .60
SPRT(.10, .40)    .68    .27    .05    .00    .00    8(3,14)             21(5,39)
SPRT(.08, .35)    .72    .24    .04    .00    .00    9(3,14)             22(7,40)
SPRT(.12, .40)    .64    .31    .05    .00    .00    8(3,14)             21(7,37)
CRM               .57    .39    .04    .00    .00    7(6,8)              20(17,21)

Scenario T2, π           .11    .25    .45    .50
SPRT(.10, .40)    .13    .23    .56    .06    .01    9(4,13)             28(18,40)
SPRT(.08, .35)    .17    .24    .53    .05    .01    9(5,13)             29(21,40)
SPRT(.12, .40)    .03    .28    .60    .08    .01    8(4,13)             28(18,40)
CRM               .01    .21    .67    .11    .01    7(6,8)              29(25,33)

Scenario T3, π           .05    .10    .25    .50
SPRT(.10, .40)    .08    .02    .25    .64    .02    8(4,11)             28(20,40)
SPRT(.08, .35)    .09    .02    .26    .62    .01    8(5,11)             29(22,40)
SPRT(.12, .40)    .00    .02    .29    .66    .02    7(4,10)             28(20,37)
CRM               .00    .01    .15    .77    .08    7(5,8)              31(27,34)

Scenario T4, π           .05    .05    .12    .25
SPRT(.10, .40)    .05    .01    .03    .26    .66    5(2,7)              26(19,32)
SPRT(.08, .35)    .06    .00    .03    .27    .64    5(3,7)              27(21,35)
SPRT(.12, .40)    .00    .00    .03    .27    .70    5(2,7)              25(20,31)
CRM               .00    .00    .01    .24    .75    5(4,6)              30(26,32)

over 45% toxicity rate will be selected with less than .10 probability” is reassuring to clinicians and helps them appreciate statistical inputs in a dose-finding study. Above all, the proposed stepwise procedures yield competitive operating characteristics in a wide range of dose–toxicity probability configurations. In terms of implementation, the SPRT is slightly more cumbersome than the other two tests, but all three tests that we consider in this article are operationally simple and can be made accessible to clinicians, because escalation and deescalation rules can be tabulated numerically before the trial.

A potential weakness of the stepwise tests seems to be the randomness in sample size. But because the clinical investigators are quite used to such randomness in the context of human dose-finding studies as in the traditional “3+3” algorithm, we do not consider this a practical concern. The actual underlying criticism is perhaps the larger-than-typical sample size; for example, a maximum of 80 patients for 4 dose levels in our illustrations seems to be greater than what is generally expected in human dose-finding trials, although the average sample size is quite feasible. In Section 5 we showed that the stepwise procedures can be easily adapted to small-sample settings by explicitly allowing a larger error rate and by applying truncation directly to the SPRT, while still providing operating characteristics competitive with existing methods such as the CRM. This being said, the proposed method allows investigators to use prespecified error rates to gauge the (in)adequacy of the sample size, whereas current practice takes the converse approach, with sample size first determined based on somewhat arbitrary conventional expectations.

The primary focus of this article is on binomial outcomes as found in most clinical applications. However, the properties of the two-stage stepwise procedure developed in Section 2 are generally applicable to other outcome types. Consider, for example, the setting of reliability testing of a device with exponential failure times. Let $\mu_1 \le \cdots \le \mu_K$ be the hazard rates of the $K$ test levels, and consider the same family of hypotheses as specified in Section 1 with $H_{0i}: \mu_i \ge \phi$ versus $H_{1i}: \mu_i < \phi$, where $\phi$ is an unacceptably high hazard rate. Then the test regions $R_{il}$ may be defined in terms of the random vector $X_i(j) = \{Y_i(1), \ldots, Y_i(j)\}$, where $Y_i(j)$ is the partial sum of the first $j$ observations in level $i$. It is then straightforward to derive analogous results regarding the FWER and PCS in Section 2 under the unbiased condition that $\Pr(R_{il} \mid \mu_i)$ and $\Pr(R_{i1} \cap R_{i2} \mid \mu_i)$ are nonincreasing in $\mu_i$. The proposed stepwise test will be particularly practical in reliability testing of an expensive device, because fewer items may be tested at the unpromising levels as a result of the sequential implementation. In addition, because of the relatively large cohort, the number of experiment suspensions will be smaller than that induced by a fully sequential method, such as stochastic approximation.
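As a rough numerical illustration of the monotonicity condition in the exponential setting, the sketch below assumes a hypothetical one-stage region of the form {Y(j) > c}, that is, the null hypothesis is rejected when the partial sum of the first j failure times is large. The cutoff c, the number of items j, the grid of hazard rates, and the function name are illustrative assumptions and are not taken from the article. Because Y(j) is a sum of j independent exponential variables with rate μ, it follows an Erlang distribution, so the rejection probability has a closed form.

```python
import math


def prob_reject(mu, j, c):
    """Pr(Y(j) > c | mu) for Y(j) = sum of j iid Exponential(rate=mu) times.

    Y(j) is Erlang(j, mu), so its survival function is a finite Poisson sum.
    The region {Y(j) > c} is a hypothetical choice made for illustration.
    """
    x = mu * c
    return math.exp(-x) * sum(x ** k / math.factorial(k) for k in range(j))


# Hypothetical settings: j = 5 items per level, cutoff c = 10 time units.
j, c = 5, 10.0
rates = [0.1, 0.2, 0.4, 0.8, 1.6]
probs = [prob_reject(mu, j, c) for mu in rates]
for mu, p in zip(rates, probs):
    print(f"mu = {mu:.1f}  Pr(reject H0) = {p:.4f}")

# The rejection probability should not increase as the hazard rate grows.
nonincreasing = all(p1 >= p2 for p1, p2 in zip(probs, probs[1:]))
print("nonincreasing in mu:", nonincreasing)
```

Under these assumptions the printed probabilities fall as μ rises, which is the behavior the condition requires of each test region.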

APPENDIX: TECHNICAL DETAILS

Proof of Theorem 1b

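It suffices to show that $\partial{}_{j,K}/\partial p_i \ge 0$ for $i \ge j$, because

\[
P_\pi(\nu_{SDSU} = m) \propto {}_{m+1,K} \ \text{ for } m \ge S_1, \qquad \propto \prod_{j=m+1}^{S_1-1}(1-\beta_{j2})\,{}_{S_1,K} \ \text{ for } m < S_1,
\]

and ${}_{j,K}$ is free of $p_i$ for $i < j$. By mathematical induction, we can verify that ${}_{m,K}$ in (5) may be generated recursively as ${}_{i,K} = 1 - \beta_{i1} + (\beta_{i1} - \gamma_i)\,{}_{i+1,K}$ for $i = 1, \ldots, K$, with ${}_{K+1,K} = 1$. It is easy to see from the recursion that $0 \le {}_{i,K} \le 1$ for all $i$, because $1 - \beta_{i1} \le {}_{i,K} \le 1 - \gamma_i$ when $0 \le {}_{i+1,K} \le 1$ and ${}_{K+1,K} = 1$. Therefore, for $i > j$,

\[
\frac{\partial{}_{j,K}}{\partial p_i} = (\beta_{j1} - \gamma_j)\frac{\partial{}_{j+1,K}}{\partial p_i} = \cdots = \prod_{k=j}^{i-1}(\beta_{k1} - \gamma_k)\,\frac{\partial{}_{i,K}}{\partial p_i},
\]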


with $\partial{}_{i,K}/\partial p_i = -(1 - {}_{i+1,K})\,\partial\beta_{i1}/\partial p_i - {}_{i+1,K}\,\partial\gamma_i/\partial p_i \ge 0$ under Condition 1.
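The backward recursion used in this proof, together with the bounds it implies, can be checked numerically. The sketch below is only an illustration: the list name `g`, the helper `recursion`, and the numerical values standing in for $\beta_{i1}$ and $\gamma_i$ are assumptions chosen for the example (with each $\gamma_i$ no larger than the corresponding $\beta_{i1}$), and none of them come from the article.

```python
def recursion(beta1, gamma):
    """Backward recursion g[i] = 1 - beta1[i] + (beta1[i] - gamma[i]) * g[i + 1].

    beta1[i] and gamma[i] stand for beta_{i1} and gamma_i at level i + 1
    (0-based lists). The terminal value g[K] = 1 plays the role of the
    (K+1, K) term. All names and inputs here are hypothetical.
    """
    K = len(beta1)
    g = [0.0] * (K + 1)
    g[K] = 1.0
    for i in range(K - 1, -1, -1):
        g[i] = 1.0 - beta1[i] + (beta1[i] - gamma[i]) * g[i + 1]
    return g


# Hypothetical rejection probabilities for K = 4 levels (gamma <= beta1).
beta1 = [0.90, 0.75, 0.40, 0.10]
gamma = [0.05, 0.10, 0.08, 0.02]
g = recursion(beta1, gamma)

for i in range(len(beta1)):
    lower, upper = 1.0 - beta1[i], 1.0 - gamma[i]
    # Bounds stated in the proof: 1 - beta_{i1} <= g_i <= 1 - gamma_i.
    print(f"level {i + 1}: {lower:.3f} <= {g[i]:.4f} <= {upper:.3f}")
```

Each value stays between its two bounds because the next term in the recursion always lies in [0, 1], which is the step the proof relies on.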

Proof of Theorem 2

Let $\pi_m = (p_1, \ldots, p_K)^T \in$ �$_m$, that is, $p_m < \phi$, $p_i \ge \phi$ for $i > m$. First, for the case where $m \le S_1 - 1$, define $\pi_{m-1} = (p_1, \ldots, p_{m-1}, \phi, p_{m+1}, \ldots, p_K)^T \in$ �$_{m-1}$. Let $Q_{m,i}$ denote $P_{\pi_m}(\nu_{SDSU} > m, S_2 = i)$. Thus our goal is to show that

\[
\sum_{i=S_1-1}^{K} Q_{m,i} \le \sum_{i=S_1-1}^{K} Q_{m-1,i} \quad \text{for } m = 1, \ldots, K. \tag{A.1}
\]

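First,

\[
Q_{m,S_1-1} = P_{\pi_m}\Bigl(A_{S_1,1} \cap \Bigl\{\bigcup_{j=m+1}^{S_1-1} R_{j2}\Bigr\}\Bigr) = P_{\pi_{m-1}}\Bigl(A_{S_1,1} \cap \Bigl\{\bigcup_{j=m+1}^{S_1-1} R_{j2}\Bigr\}\Bigr) \le Q_{m-1,S_1-1}.
\]

(Note that $Q_{m,S_1-1} = 0 \le Q_{m-1,S_1-1}$ when $m = S_1 - 1$.) For $S_1 \le i \le K - 1$,

\[
\begin{aligned}
Q_{m,i} &= P_{\pi_m}\Bigl(\bigcap_{j=S_1}^{i} R_{j1} \cap A_{i+1,1} \cap \Bigl\{\bigcup_{k=m+1}^{i} R_{k2}\Bigr\}\Bigr) \\
&= P_{\pi_{m-1}}\Bigl(\bigcap_{j=S_1}^{i} R_{j1} \cap A_{i+1,1} \cap \Bigl\{\bigcup_{k=m+1}^{i} R_{k2}\Bigr\}\Bigr) \\
&\le P_{\pi_{m-1}}\Bigl(\bigcap_{j=S_1}^{i} R_{j1} \cap A_{i+1,1} \cap \Bigl\{\bigcup_{k=m}^{i} R_{k2}\Bigr\}\Bigr) \\
&= Q_{m-1,i}.
\end{aligned}
\]

Using a similar argument, we can show that $Q_{m,K} \le Q_{m-1,K}$, and thus verify (A.1).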

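In the second case when $m \ge S_1$, we consider $\pi_{m-1} = (p_1, \ldots, p_{m-1}, p_{m+1}, \ldots, p_K, p_K)^T$, and (A.1) becomes

\[
\sum_{i=m+1}^{K} Q_{m,i} \le \sum_{i=m}^{K} Q_{m-1,i} = \sum_{i=m+1}^{K+1} Q_{m-1,i-1} \quad \text{for } m = 1, \ldots, K,
\]

because $Q_{m,i} = 0$ when $i \le m$. Then, for $m + 1 \le i \le K - 1$,

\[
\begin{aligned}
Q_{m,i} &= P_{\pi_m}\Bigl(\bigcap_{j=S_1}^{i} R_{j1} \cap A_{i+1,1} \cap \Bigl\{\bigcup_{k=m+1}^{i} R_{k2}\Bigr\}\Bigr) \\
&\le P_{\pi_m}\Bigl(\bigcap_{j=S_1}^{m-1} R_{j1} \cap \bigcap_{j=m+1}^{i} R_{j1} \cap A_{i+1,1} \cap \Bigl\{\bigcup_{k=m+1}^{i} R_{k2}\Bigr\}\Bigr) \\
&= P_{\pi_{m-1}}\Bigl(\bigcap_{j=S_1}^{m-1} R_{j1} \cap \bigcap_{j=m}^{i-1} R_{j1} \cap A_{i1} \cap \Bigl\{\bigcup_{k=m}^{i-1} R_{k2}\Bigr\}\Bigr) \\
&= Q_{m-1,i-1}.
\end{aligned}
\]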

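To complete the proof, we have

\[
\begin{aligned}
Q_{m-1,K} + Q_{m-1,K-1}
&= P_{\pi_{m-1}}\Bigl(\bigcap_{j=S_1}^{K} R_{j1} \cap \Bigl\{\bigcup_{k=m}^{K} R_{k2}\Bigr\}\Bigr) + P_{\pi_{m-1}}\Bigl(\bigcap_{j=S_1}^{K-1} R_{j1} \cap A_{K1} \cap \Bigl\{\bigcup_{k=m}^{K-1} R_{k2}\Bigr\}\Bigr) \\
&\ge P_{\pi_{m-1}}\Bigl(\bigcap_{j=S_1}^{K} R_{j1} \cap \Bigl\{\bigcup_{k=m}^{K-1} R_{k2}\Bigr\}\Bigr) + P_{\pi_{m-1}}\Bigl(\bigcap_{j=S_1}^{K-1} R_{j1} \cap A_{K1} \cap \Bigl\{\bigcup_{k=m}^{K-1} R_{k2}\Bigr\}\Bigr) \\
&= P_{\pi_{m-1}}\Bigl(\bigcap_{j=S_1}^{m-1} R_{j1} \cap \bigcap_{j=m}^{K-1} R_{j1} \cap \Bigl\{\bigcup_{k=m}^{K-1} R_{k2}\Bigr\}\Bigr) \\
&= P_{\pi_m}\Bigl(\bigcap_{j=S_1}^{m-1} R_{j1} \cap \bigcap_{j=m+1}^{K} R_{j1} \cap \Bigl\{\bigcup_{k=m+1}^{K} R_{k2}\Bigr\}\Bigr) \\
&\ge Q_{m,K}.
\end{aligned}
\]

Thus (A.1) holds for all $m$.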

Proof of Theorem 3

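Given that $\{S_2 = i\}$ occurs for some $i$ between $m$ and $K - 1$, the expected number of patients treated at dose $i + 1$ is $E(n_{i+1,1} \mid A_{i+1,1})$ and that at dose $j$ (between $m + 1$ and $i$) is either $E(n_{j1} \mid R_{j1})$ or $E(n_{j2} \mid R_{j1})$. Therefore,

\[
\begin{aligned}
E_{\pi^*_m}(T_{m+1} \mid S_2 = i)
&\le E_\phi(n_{i+1,1} \mid A_{i+1,1}) + \sum_{j=m+1}^{i} E_\phi(n_{j2} \mid R_{j1}) \\
&= E_\phi(n_{i+1,1} \mid A_{i+1,1}) + (i - m)E_\phi(n_{i+1,2} \mid R_{i+1,1}) \\
&\equiv n_{1,A} + (i - m)n_{2,R}.
\end{aligned}
\]

Similarly, $E_{\pi^*_m}(T_{m+1} \mid S_2 = K) \le (K - m)n_{2,R}$. The conditional distribution of $S_2$ given $S_2 \ge m$ is $P_{\pi^*_m}(S_2 = i \mid S_2 \ge m) = (1 - \varepsilon_1)\varepsilon_1^{i-m}$ for $m \le i \le K - 1$ and $= \varepsilon_1^{K-m}$ for $i = K$. The proof is then completed by direct computation,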

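\[
\begin{aligned}
E^*_m(T_{m+1}) &= \sum_{i=m}^{K} P_{\pi^*_m}(S_2 = i \mid S_2 \ge m)\,E_{\pi^*_m}(T_{m+1} \mid S_2 = i) \\
&\le \sum_{i=m}^{K-1} (1 - \varepsilon_1)\varepsilon_1^{i-m}\{n_{1,A} + (i - m)n_{2,R}\} + \varepsilon_1^{K-m}(K - m)n_{2,R} \\
&= (1 - \varepsilon_1^{K-m})n_{1,A} + \varepsilon_1(1 - \varepsilon_1)\Bigl(\sum_{i=1}^{K-m-1} i\,\varepsilon_1^{i-1}\Bigr)n_{2,R} + \varepsilon_1^{K-m}(K - m)n_{2,R} \\
&= (1 - \varepsilon_1^{K-m})n_{1,A} + \varepsilon_1(1 - \varepsilon_1^{K-m})n_{2,R}/(1 - \varepsilon_1) \\
&= (1 - \varepsilon_1^{K-m})\{(1 - \varepsilon_1)n_{1,A} + \varepsilon_1 n_{2,R}\}/(1 - \varepsilon_1) \\
&= (1 - \varepsilon_1^{K-m})E_\phi(N)/(1 - \varepsilon_1).
\end{aligned}
\]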
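The chain of equalities above reduces a truncated geometric sum to a closed form, and the reduction can be confirmed numerically. The sketch below compares the summation in the second line with the final expression, using $E_\phi(N) = (1 - \varepsilon_1)n_{1,A} + \varepsilon_1 n_{2,R}$ as in the last two lines of the display. The function names and the numerical values of $\varepsilon_1$, $n_{1,A}$, $n_{2,R}$, and $K$ are hypothetical and serve only to exercise the identity.

```python
def bound_by_summation(eps1, n1A, n2R, K, m):
    """Sum the conditional bounds over the distribution of S2 given S2 >= m."""
    total = 0.0
    for i in range(m, K):  # i = m, ..., K - 1
        weight = (1.0 - eps1) * eps1 ** (i - m)
        total += weight * (n1A + (i - m) * n2R)
    # The remaining mass eps1^(K - m) sits on S2 = K.
    total += eps1 ** (K - m) * (K - m) * n2R
    return total


def bound_closed_form(eps1, n1A, n2R, K, m):
    """Closed form (1 - eps1^(K - m)) * E_phi(N) / (1 - eps1)."""
    e_phi_n = (1.0 - eps1) * n1A + eps1 * n2R
    return (1.0 - eps1 ** (K - m)) * e_phi_n / (1.0 - eps1)


# Hypothetical inputs, chosen only to exercise the identity.
eps1, n1A, n2R, K = 0.3, 6.0, 14.0, 5
results = []
for m in range(1, K + 1):
    s = bound_by_summation(eps1, n1A, n2R, K, m)
    c = bound_closed_form(eps1, n1A, n2R, K, m)
    results.append((m, s, c))
    print(f"m = {m}: summation = {s:.6f}, closed form = {c:.6f}")
```

The two columns agree for every m, including the boundary case m = K, where both expressions are zero.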

Proof of Lemma 1

We observe that $a_\phi$ is decreasing in $\rho_1$ and $\zeta$ and that $b_\phi$ is decreasing in $\rho_2$ and $\zeta$. Thus the proof of Lemma 1a can be achieved by showing that ${}^*_{m,K}$ is decreasing in $a_\phi$ and $b_\phi$ for $m = 1, \ldots, K$. This is obvious when $m = K$, because ${}^*_{K,K} = 1 - b_\phi$. Now for $m \le K - 1$, we take the derivative of ${}^*_{m,K}$ with respect to $a_\phi$ and get $\partial{}^*_{m,K}/\partial a_\phi = -b_\phi A_m/(1 - a_\phi + b_\phi)^2$, where

\[
A_m = 1 - (K - m + 1)(1 - a_\phi + b_\phi)(a_\phi - b_\phi)^{K-m} - (a_\phi - b_\phi)^{K-m+1}
\]


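\[
\begin{aligned}
&\le 1 - (K - m + 1)(1 - a_\phi + b_\phi)(a_\phi - b_\phi)^{K-m+1} - (a_\phi - b_\phi)^{K-m+1} \\
&= 1 - \{K - (m - 1) + 1\}(1 - a_\phi + b_\phi)(a_\phi - b_\phi)^{K-(m-1)} - (a_\phi - b_\phi)^{K-(m-1)+1} \\
&= A_{m-1}.
\end{aligned}
\]

Because $A_{K-1} = (1 - a_\phi + b_\phi)^2 \ge 0$, we have that $A_m \ge 0$, and thus $\partial{}^*_{m,K}/\partial a_\phi \le 0$ for all $m \le K$. In addition, we have

\[
\frac{\partial{}^*_{m,K}}{\partial b_\phi} = -\frac{(1 - a_\phi)\{1 - (a_\phi - b_\phi)^{K-m+1}\}}{(1 - a_\phi + b_\phi)^2} - \frac{(K - m + 1)\,b_\phi(a_\phi - b_\phi)^{K-m}}{1 - a_\phi + b_\phi} \le 0.
\]

The derivations of $\lim_{\rho_2 \to \infty} {}^*_{m,K} = 1$ and Lemma 1b are straightforward.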

To prove Lemma 1c, we obtain from (4) that $P^*_m/P^*_K = {}^*_{m+1,K}/a_\theta^{K-m}$ for $m \ge S_1$ and $= (1 - b_\phi)^{S_1-m-1}\,{}^*_{S_1,K}/a_\theta^{K-S_1}$ for $m \le S_1 - 1$. Because $a_\theta < 1$ and $b_\phi \to 0$ as $\rho_2 \to \infty$, $P^*_m/P^*_K$ will be greater than 1 for sufficiently large $\rho_2$ provided that $S_1 < K$.
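The two derivative formulas used for Lemma 1a can be checked against finite differences. The sketch below rests on an assumption that should be stated plainly: the starred quantity is taken to be the value produced by the recursion in the proof of Theorem 1b when every level has first-stage rejection probability $a_\phi$ and joint rejection probability $b_\phi$, which reproduces the boundary value $1 - b_\phi$ at $m = K$ quoted above. The function names and the numerical values of $a_\phi$, $b_\phi$, and $K$ are hypothetical.

```python
def star_value(a, b, K, m):
    """Iterate g_i = 1 - a + (a - b) * g_{i+1} from g_{K+1} = 1 down to i = m."""
    g = 1.0
    for _ in range(K - m + 1):
        g = 1.0 - a + (a - b) * g
    return g


def A(a, b, K, m):
    """A_m as displayed in the proof."""
    c, n = a - b, K - m + 1
    return 1.0 - n * (1.0 - c) * c ** (n - 1) - c ** n


def d_da(a, b, K, m):
    """Derivative with respect to a_phi as stated in the proof."""
    return -b * A(a, b, K, m) / (1.0 - a + b) ** 2


def d_db(a, b, K, m):
    """Derivative with respect to b_phi as stated in the proof."""
    c, n = a - b, K - m + 1
    return (-(1.0 - a) * (1.0 - c ** n) / (1.0 - c) ** 2
            - n * b * c ** (n - 1) / (1.0 - c))


# Hypothetical values with 0 < b < a < 1; h is the finite-difference step.
a, b, K, h = 0.6, 0.1, 5, 1e-6
checks = []
for m in range(1, K + 1):
    fd_a = (star_value(a + h, b, K, m) - star_value(a - h, b, K, m)) / (2 * h)
    fd_b = (star_value(a, b + h, K, m) - star_value(a, b - h, K, m)) / (2 * h)
    checks.append((m, fd_a, d_da(a, b, K, m), fd_b, d_db(a, b, K, m)))
    print(f"m = {m}: d/da = {d_da(a, b, K, m):+.6f}, d/db = {d_db(a, b, K, m):+.6f}")
```

Under the stated assumption, both closed-form derivatives match the finite differences and are nonpositive at every level, in line with the monotonicity claimed in the lemma.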

Proof of Theorem 4

To prove Theorem 4a, suppose that $\zeta > \zeta$ and express $P^*_K = P^*_K(\zeta, \rho_1, \rho_2)$ as an explicit function of the design parameters. It then can be verified that $P^*_K(\zeta, \rho(\zeta), \rho(\zeta)) = \{1 - \zeta(1 - \alpha_0)^{1/S_1}\}^{K-S_1+1} < \{1 - \zeta(1 - \alpha_0)^{1/S_1}\}^{K-S_1+1} = 1 - \alpha_1$. By Lemma 1b, $P^*_K(\zeta, \rho_1, \rho_2) < 1 - \alpha_1$ for all $\rho_2 \ge \rho_1 \ge \rho(\zeta)$. In other words, $\rho_1 < \rho(\zeta)$ is a necessary condition for the error constraints to hold.

To prove Theorem 4b, we observe that $P^*_0 = 1 - \alpha_0$ at $\xi$. Therefore, by Lemma 1a, $P^*_0(\zeta, \rho(\zeta), \rho(\zeta)) < 1 - \alpha_0$ when $\zeta < \zeta$. In other words, $P^*_0(\zeta, \rho_1, \rho_2) \ge 1 - \alpha_0$ implies that $\rho_2 \ge \rho(\zeta)$, which in turn induces a larger $E_\phi(N)$ than $\xi$. A similar argument can be made to complete the proof by considering the case where $\zeta = \zeta$.

[Received September 2006. Revised May 2007.]

REFERENCES

Cheung, Y. K. (2005), “Coherence Principles in Dose-Finding Studies,” Biometrika, 92, 863–873.

Durham, S. D., Flournoy, N., and Rosenberger, W. F. (1997), “A Random-Walk Rule for Phase I Clinical Trials,” Biometrics, 53, 745–760.

Hoel, D. G. (1970), “On the Monotonicity of the OC of an SPRT,” The Annals of Mathematical Statistics, 41, 310–314.

Hsu, J. C. (1981), “Simultaneous Confidence Intervals for All Distances From the Best,” The Annals of Statistics, 9, 1026–1034.

Hsu, J. C. (1984), “Constrained Simultaneous Confidence Intervals for Multiple Comparisons With the Best,” The Annals of Statistics, 12, 1136–1144.

Hsu, J. C., and Berger, R. L. (1999), “Stepwise Confidence Intervals Without Multiplicity Adjustment for Dose-Response and Toxicity Studies,” Journal of the American Statistical Association, 94, 468–482.

Lehmann, E. L. (1986), Testing Statistical Hypotheses (2nd ed.), New York: Wiley.

Lin, Y., and Shih, W. J. (2001), “Statistical Properties of the Traditional Algorithm-Based Designs for Phase I Cancer Trials,” Biostatistics, 2, 203–215.

Marcus, R., Peritz, E., and Gabriel, K. R. (1976), “Closed Testing Procedures With Special Reference to Ordered Analysis of Variance,” Biometrika, 63, 655–660.

O’Quigley, J., and Shen, L. Z. (1996), “Continual Reassessment Method: A Likelihood Approach,” Biometrics, 52, 673–684.

O’Quigley, J., Pepe, M., and Fisher, L. (1990), “Continual Reassessment Method: A Practical Design for Phase I Clinical Trials in Cancer,” Biometrics, 46, 33–48.

Shen, L. Z., and O’Quigley, J. (1996), “Consistency of Continual Reassessment Method Under Model Misspecification,” Biometrika, 83, 395–405.

Storer, B. E. (1989), “Design and Analysis of Phase I Clinical Trials,” Biometrics, 45, 925–937.

Storer, B., and DeMets, D. (1987), “Current Phase I/II Designs: Are They Adequate?” Journal of Clinical Research and Drug Development, 1, 121–130.

Stylianou, M., and Flournoy, N. (2002), “Dose Finding Using the Biased Coin Up-and-Down Design and Isotonic Regression,” Biometrics, 58, 171–177.

Tamhane, A. C., Dunnett, C. W., Green, J. W., and Wetherington, J. D. (2001), “Multiple Test Procedures for Identifying the Maximum Safe Dose,” Journal of the American Statistical Association, 96, 835–843.

Tamhane, A. C., Hochberg, Y., and Dunnett, C. W. (1996), “Multiple Test Procedures for Dose Finding,” Biometrics, 52, 21–37.

Teal, P., Silver, F. L., and Simard, D. (2005), “The BRAINS Study: Safety, Tolerability, and Dose-Finding of Repinotan in Acute Stroke,” The Canadian Journal of Neurological Sciences, 32, 61–67.

Wald, A. (1945), “Sequential Tests of Statistical Hypotheses,” The Annals of Mathematical Statistics, 16, 117–186.

Wald, A., and Wolfowitz, J. (1948), “Optimum Character of the Sequential Probability Ratio Test,” The Annals of Mathematical Statistics, 19, 326–339.
