
PHARMACEUTICAL STATISTICS

Pharmaceut. Statist. 2007; 6: 23–33

Published online 22 August 2006 in Wiley InterScience (www.interscience.wiley.com) DOI: 10.1002/pst.240

Optimal design of clinical trials comparing several treatments with a control

Ian C. Marschner*,†

Asia Biometrics Center, Pfizer Australia, West Ryde, NSW, Australia

*Correspondence to: Ian C. Marschner, Asia Biometrics Center, Pfizer Australia, 38-42 Wharf Rd, West Ryde, NSW 2114, Australia.
†E-mail: [email protected]

Clinical trials are often designed to compare several treatments with a common control arm in pairwise fashion. In this paper we study optimal designs for such studies, based on minimizing the total number of patients required to achieve a given level of power. A common approach when designing studies to compare several treatments with a control is to achieve the desired power for each individual pairwise treatment comparison. However, it is often more appropriate to characterize power in terms of the family of null hypotheses being tested, and to control the probability of rejecting all, or alternatively any, of these individual hypotheses. While all approaches lead to unbalanced designs with more patients allocated to the control arm, it is found that the optimal design and required number of patients can vary substantially depending on the chosen characterization of power. The methods make allowance for both continuous and binary outcomes and are illustrated with reference to two clinical trials, one involving multiple doses compared to placebo and the other involving combination therapy compared to mono-therapies. In one example a 55% reduction in sample size is achieved through an optimal design combined with the appropriate characterization of power. Copyright © 2006 John Wiley & Sons, Ltd.

Keywords: clinical trials; combination therapy; multiple comparisons; optimal design; power; unbalanced allocation

1. INTRODUCTION

In some clinical trials the primary objective is to make pairwise comparisons of k ≥ 2 treatment arms with a common control arm. Practical examples of this design, to be discussed later in the paper, include the comparison of a combination therapy with each of its constituent mono-therapies, and the comparison of multiple doses of a drug with placebo. The design leads to a collection of k primary comparisons between the common control arm and the other treatment arms, with no primary comparisons between the non-control arms. In this paper we will explore optimal design of such studies through the use of unbalanced allocation to the treatment and control arms. Unlike previous research in this area, we will discuss optimization of the design in terms of the power for testing hypotheses rather than the precision for estimating treatment effects, hence making the results more relevant to the design of typical clinical trials.

For contexts involving many-to-one comparisons, such as comparing several treatments with a control, an appropriately chosen unbalanced design may be more efficient than a balanced design, even under homoscedasticity. As a simple example, consider a trial with 300 patients in total and three treatment arms including a common control. If we divide the patients evenly between the three arms, then each of the two primary comparisons with the control arm will involve 200 patients. Alternatively, if we allocate 150 patients to the control and 75 to each of the other two arms, then the primary comparisons will each involve 225 patients. Since an unbalanced comparison involving 225 patients may be more efficient than a balanced comparison involving 200 patients, the potential exists for the unbalanced design to be more efficient. Fleiss [1] confirmed this, observing that the variance of the treatment differences is minimized by allocating a proportion γ* = 1/(1 + √k) of the total sample size to the control arm, with the remaining sample size being divided evenly between the non-control arms. Since γ* exceeds 1/(1 + k), the optimal design allocates more patients to the control arm than the balanced design. The approach of minimizing some measure of the dispersion of the treatment differences can be extended using more general optimality criteria, and most work has focused on the so-called 'alphabet' optimality criteria (D, A, DA, MV etc.) [2,3].
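To see where this allocation comes from, note that with a proportion γ of patients on the control arm and (1 − γ)/k on each non-control arm, each pairwise treatment difference has variance proportional to 1/γ + k/(1 − γ); minimizing this in γ gives

$$\frac{d}{d\gamma}\left\{\frac{1}{\gamma} + \frac{k}{1-\gamma}\right\} = -\frac{1}{\gamma^{2}} + \frac{k}{(1-\gamma)^{2}} = 0 \;\Longrightarrow\; (1-\gamma)^{2} = k\gamma^{2} \;\Longrightarrow\; \gamma^{*} = \frac{1}{1+\sqrt{k}}.$$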

While criteria based on minimum variance, or maximum information, are common in many optimal design problems, in clinical trials the sample size is generally determined by the need to achieve a desired level of power for testing hypotheses, rather than a desired level of precision for estimating treatment effects. It therefore makes sense to maximize power rather than minimize variance. Although the two approaches are clearly related, they are not equivalent when appropriate allowance is made for the multiple comparisons being conducted and the specific form of the hypotheses being tested. In particular, since the hypotheses to be tested can be stated in various ways when making multiple treatment comparisons against a common control, there are various characterizations of power that can be used in designing such a study. These alternative characterizations have an important effect on the corresponding designs, which can be substantively different from those based on minimum variance considerations. In the next section we consider the types of hypotheses and characterizations of power that can be used when comparing several treatments with a control.

2. TYPES OF POWER

Throughout we will use the term 'control' to mean a common comparator for all treatment comparisons, which includes situations in which the common comparator is an experimental treatment, such as when a combination therapy is compared to its constituent mono-therapies. We will be concerned with continuous or binary outcomes where the effect of treatment i = 1, ..., k is measured by the mean response or success probability in treatment i minus the mean response or success probability in the control arm (i = 0). Thus, the measure of treatment effect for treatment i is θ_i − θ_0, i = 1, ..., k, where the θ's are either means of continuous outcomes or success probabilities.

When comparing k treatments with a control there are at least three different ways to frame the hypotheses, and correspondingly three different ways to characterize power. The appropriate approach in any particular context will depend on the objectives of the study, and in Section 6 we consider illustrations of this. Firstly, each individual comparison can be considered in isolation, leading to a collection of k null hypotheses. Throughout this paper we will focus on one-sided testing situations, which means the collection of null hypotheses and associated alternative hypotheses can be stated as

H_0i: θ_i − θ_0 ≤ 0   versus   H_1i: θ_i − θ_0 > 0   (i = 1, ..., k)    (1)

In contrast to the collection of hypotheses in (1), a single null and alternative hypothesis can be defined in either of the following ways:

H_0: all of the H_0i are true   versus   H_1: at least one of the H_0i is false    (2)

H_0: all of the H_0i are true   versus   H_1: all of the H_0i are false    (3)

Under scenarios (1) and (2), the study would be considered positive if statistical significance was achieved on any of the individual comparisons. In both of these situations it is therefore necessary to take account of the multiple comparisons being conducted and to adjust the comparison-wise significance level accordingly. A crude way of doing this is through the Bonferroni procedure, which would use a comparison-wise significance level of α/k to ensure a family-wise level of α. A preferred method, which will be used in our calculations, is the Dunnett procedure [4,5]. Like the Bonferroni procedure, the Dunnett procedure provides strong control of the family-wise significance level by controlling the type I error for each component hypothesis. Step-wise procedures, such as the Hochberg and step-down Dunnett procedures [5], are also available in this context but we will not study them in this paper. The term union-intersection testing applies to tests of hypotheses of the form (2).

Under scenario (3), which is referred to as intersection-union testing, the study would be considered positive only if statistical significance is achieved on all of the individual comparisons, so no adjustment of the comparison-wise significance level is required [6]. This fact is acknowledged in clinical trial guidelines issued by regulatory agencies [7,8].

Corresponding to the three scenarios, we consider three characterizations of power, using terminology proposed by Westfall et al. [5]. Individual power, also sometimes called per-pair power, refers to the probability of rejecting a particular H_0i that is false, and corresponds to scenario (1). Minimal power, also sometimes called any-pair power, refers to the probability of rejecting at least one H_0i that is false, and corresponds to scenario (2). Complete power, also sometimes called all-pairs power, refers to the probability of rejecting all H_0i that are false, and corresponds to scenario (3). The appropriateness of each of these types of power will depend on the context and objectives of particular studies. In practice, the default approach is typically to base the design on the individual power of each comparison. In this particular case, Zhu and Wong [3] have provided some comments on the relationship between individual power and minimum variance optimality criteria. However, in most clinical trials comparing several treatments with a control, minimal or complete power will be more relevant than individual power. While the three characterizations of power have been studied previously in both this and other contexts [5,9,10], optimal study design based on these types of power has not previously been considered for the design of clinical trials, and in the remainder of the paper we will focus on this.

3. OPTIMAL DESIGNS

In this section we consider each of the versions of power and how they can be used to optimize the study design. Our goal is to find designs that maximize the power for given sample size, or equivalently that minimize sample size for given power. The optimality criterion is therefore based on minimization of the total size of the study, N, subject to achieving a desired level of power 1 − β, where power can refer to any of the three versions discussed in Section 2. We consider unbalanced designs that allocate a proportion γ of the total sample size to the control arm, with non-control allocations being (1 − γ)/k. For the present discussion we assume a homoscedastic Normal outcome for patient j in treatment group i, X_ij ∼ N(μ_i, σ²), where μ_i = μ + Δ for the non-control arms i = 1, ..., k and μ_0 = μ for the control arm i = 0. This leads to a common effect size δ = Δ/σ and, corresponding to each individual one-sided null hypothesis H_0i: μ_i − μ_0 ≤ 0, a Normally distributed test statistic

$$Z_i = \frac{\bar{X}_i - \bar{X}_0}{\sigma\sqrt{\dfrac{1}{\gamma N} + \dfrac{k}{(1-\gamma)N}}} = \frac{\bar{X}_i - \bar{X}_0}{\sigma/\sqrt{N}}\, C_k(\gamma)$$

where X̄_i is the sample mean in treatment group i and

$$C_k(\gamma) = \sqrt{\frac{\gamma(1-\gamma)}{1 + \gamma(k-1)}} \qquad (4)$$

The expressions for power and sample size to be presented below make use of the fact that Z_i has the following distribution, which reduces to standard Normal under μ_i = μ_0:

$$Z_i \sim N\{\delta\, C_k(\gamma)\sqrt{N},\; 1\} \qquad (5)$$

Use of the Normally distributed test statistics Z_i depends on the variance being known. Known variance is a typical assumption during the design stage of a study and we will not consider making use of the t-distribution in our design calculations. This would be expected to have minimal effect on the resulting optimal designs, particularly in samples of at least moderate size. In regard to other modelling assumptions, we note that the methods to be discussed can be extended to cover unequal treatment effects and heteroscedasticity, and this will be discussed in Section 7. They can also be modified to account for binary outcomes, as discussed in Section 5.
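As a quick sanity check of (5), the following R sketch (illustrative code only, with arbitrary example values for k, γ, N and δ) simulates one non-control arm against the control and compares the empirical behaviour of Z_i with its stated mean δ C_k(γ)√N and unit variance.

```r
## Monte Carlo check of distribution (5), using arbitrary illustrative values:
## k = 3 non-control arms, gamma = 0.4, N = 200, sigma = 1, delta = 0.3.
set.seed(1)
k <- 3; gamma <- 0.4; N <- 200; delta <- 0.3
n0 <- round(gamma * N)             # control arm size
n1 <- round((1 - gamma) * N / k)   # each non-control arm size
Ck <- sqrt(gamma * (1 - gamma) / (1 + gamma * (k - 1)))
z1 <- replicate(20000, {
  x0 <- rnorm(n0, mean = 0, sd = 1)        # control outcomes
  x1 <- rnorm(n1, mean = delta, sd = 1)    # one non-control arm
  (mean(x1) - mean(x0)) / sqrt(1 / n0 + 1 / n1)   # Z_i with sigma known
})
c(empirical = mean(z1), theoretical = delta * Ck * sqrt(N))  # should agree
c(empirical = var(z1), theoretical = 1)                      # unit variance
```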

The following expressions for power arise from a one-sided test with family-wise significance level α. We consider single-step multiple comparisons procedures for which, corresponding to the family-wise significance level α, there is a comparison-wise level A(α, k, γ) which makes adjustment for the k multiple comparisons being undertaken. As noted in Section 2, for complete power no such adjustment is necessary and A(α, k, γ) = α. For minimal or individual power, where such adjustment is necessary, Bonferroni correction yields A(α, k, γ) = α/k, which does not depend on γ, whereas Dunnett correction yields an expression that depends on all three arguments. In this case (see [5,11]), assuming σ is known and φ = γ/(1 − γ) is the odds of being in the control arm, then A(α, k, γ) = 1 − Φ{Y_{k,γ}^{-1}(1 − α)}, where

$$Y_{k,\gamma}(x) = \int_{-\infty}^{\infty} \Phi\!\left\{\frac{\lambda_\gamma z + x}{\sqrt{1-\lambda_\gamma^{2}}}\right\}^{k} \mathrm{d}\Phi(z) \qquad \text{and} \qquad \lambda_\gamma^{2} = \frac{1}{1+\phi k} \qquad (6)$$
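By way of illustration, the Dunnett-adjusted comparison-wise level can be computed numerically as in the following R sketch (the function name is illustrative only, and the numerical values in the comments are approximate); it uses the pmvnorm function from the mvtnorm package that is also used for the calculations later in this section.

```r
## Sketch of the Dunnett-adjusted comparison-wise level A(alpha, k, gamma)
## implied by (6), computed with mvtnorm::pmvnorm.
library(mvtnorm)

dunnett.level <- function(alpha, k, gamma) {
  lambda2 <- 1 / (1 + gamma / (1 - gamma) * k)   # correlation of the Z_i
  R <- matrix(lambda2, k, k); diag(R) <- 1
  ## Y_{k,gamma}^{-1}(1 - alpha): equicoordinate quantile of the k-dimensional
  ## Normal distribution, found by root search over pmvnorm
  crit <- uniroot(function(x)
    pmvnorm(lower = rep(-Inf, k), upper = rep(x, k),
            mean = rep(0, k), corr = R)[1] - (1 - alpha),
    c(0, 10))$root
  1 - pnorm(crit)                                # A(alpha, k, gamma)
}

## With k = 4 and balanced allocation (gamma = 0.2) at alpha = 0.025, the
## Dunnett level is roughly 0.0073, slightly larger than the Bonferroni
## level alpha/k = 0.00625, reflecting the positive correlation.
dunnett.level(0.025, k = 4, gamma = 0.2)
```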

Using the chosen form of A, the three versions of power can be determined. Individual power refers to the probability that, for given δ, a particular Z_i will be sufficiently large to reject H_0i, or Pr[Z_i ≥ Φ^{-1}{1 − A(α, k, γ)}]. Individual power therefore depends only on the univariate distribution of each individual test statistic, which under the null hypothesis is standard Normal with distribution function Φ(x). This leads to the following expression for individual power as a function of N and γ:

$$P_I(N,\gamma) = 1 - \Phi\left[\Phi^{-1}\{1 - A(\alpha,k,\gamma)\} - \delta\, C_k(\gamma)\sqrt{N}\right] \qquad (7)$$

Minimal power refers to the probability that at least one Z_i will be sufficiently large to reject H_0 in favour of H_1 in (2), or 1 − Pr[Z_1 < Φ^{-1}{1 − A(α, k, γ)}, ..., Z_k < Φ^{-1}{1 − A(α, k, γ)}]. Minimal power therefore depends on the multivariate distribution of all k test statistics. Under the null hypothesis this multivariate distribution is a k-dimensional Normal distribution with mean zero, unit variance and correlation λ²_γ, as defined in (6). Denoting the associated multivariate distribution function by Φ_{k,γ}(z_1, ..., z_k), then Y_{k,γ}(z) = Φ_{k,γ}(z, ..., z) [11]. This leads to the following expression for minimal power, as a function of N and γ:

$$P_M(N,\gamma) = 1 - Y_{k,\gamma}\left[\Phi^{-1}\{1 - A(\alpha,k,\gamma)\} - \delta\, C_k(\gamma)\sqrt{N}\right] \qquad (8)$$


Complete power refers to the probability that all Z_i will be sufficiently large to reject H_0 in favour of H_1 in (3), or Pr{Z_1 ≥ Φ^{-1}(1 − α), ..., Z_k ≥ Φ^{-1}(1 − α)}. Thus, using the same rationale as for minimal power, and the fact that the multivariate Normal distribution is centrally symmetric [12], complete power as a function of N and γ is

$$P_C(N,\gamma) = Y_{k,\gamma}\left\{-\Phi^{-1}(1-\alpha) + \delta\, C_k(\gamma)\sqrt{N}\right\} \qquad (9)$$

Setting (7)-(9) equal to 1 − β, a general expression arises for the sample size as a function of γ:

$$N(\gamma) = \left\{\frac{U(\alpha,k,\gamma) + V(\beta,k,\gamma)}{\delta\, C_k(\gamma)}\right\}^{2} \qquad (10)$$

The appropriate forms for U and V in (10) are as follows: for individual power, U(α, k, γ) = Φ^{-1}{1 − A(α, k, γ)} and V(β, k, γ) = Φ^{-1}(1 − β); for minimal power, U(α, k, γ) = Φ^{-1}{1 − A(α, k, γ)} and V(β, k, γ) = −Y_{k,γ}^{-1}(β); and for complete power, U(α, k, γ) = Φ^{-1}(1 − α) and V(β, k, γ) = Y_{k,γ}^{-1}(1 − β).

Using (10), the optimal design is then determined by the value γ* that minimizes N(γ). Although a proof is lacking for the general case, we have found N(γ) to be convex in all calculations, leaving little difficulty in undertaking numerical minimization. The calculations included in this paper were performed using R software [13], which includes a multivariate Normal distribution function pmvnorm in the mvtnorm add-on package. For individual power, algebraic minimization is possible if U does not depend on γ, such as for Bonferroni correction, and leads to the optimal design of Fleiss [1] discussed in Section 1. If U does depend on γ, such as for Dunnett correction, or if minimal or complete power is used, then the optimal design will not coincide with that of Fleiss and may be quite different.
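To make the calculation concrete, the following R sketch implements N(γ) in (10) for the three characterizations of power and minimizes it numerically over γ. The function and argument names (Ck, Yfun, Yinv, N.gamma) are illustrative rather than taken from the paper, the Dunnett single-step adjustment is assumed for individual and minimal power, and the numerical values in the comments are approximate.

```r
## Sketch of N(gamma) in (10) and its numerical minimization over gamma.
library(mvtnorm)

## C_k(gamma) from (4)
Ck <- function(k, gamma) sqrt(gamma * (1 - gamma) / (1 + gamma * (k - 1)))

## Y_{k,gamma}(x) = Phi_{k,gamma}(x, ..., x): joint lower-tail probability of
## k equicorrelated standard Normal statistics, correlation 1/(1 + phi*k)
Yfun <- function(x, k, gamma) {
  lambda2 <- 1 / (1 + gamma / (1 - gamma) * k)
  R <- matrix(lambda2, k, k); diag(R) <- 1
  pmvnorm(lower = rep(-Inf, k), upper = rep(x, k),
          mean = rep(0, k), corr = R)[1]
}
Yinv <- function(p, k, gamma)                 # inverse of Y by root search
  uniroot(function(x) Yfun(x, k, gamma) - p, c(-10, 10))$root

## Total sample size N(gamma) for a continuous outcome with effect size delta
N.gamma <- function(gamma, k, delta, alpha = 0.025, beta = 0.1,
                    power = c("individual", "minimal", "complete")) {
  power <- match.arg(power)
  U <- switch(power,
              individual = Yinv(1 - alpha, k, gamma),  # Dunnett critical value
              minimal    = Yinv(1 - alpha, k, gamma),
              complete   = qnorm(1 - alpha))           # no adjustment needed
  V <- switch(power,
              individual = qnorm(1 - beta),
              minimal    = -Yinv(beta, k, gamma),
              complete   = Yinv(1 - beta, k, gamma))
  ((U + V) / (delta * Ck(k, gamma)))^2
}

## Optimal control allocation: one-dimensional minimization of N(gamma)
opt <- optimize(N.gamma, interval = c(0.05, 0.95),
                k = 4, delta = 2 / 3.5, power = "minimal")
opt$minimum             # roughly 0.41, as in Example 1 of Section 6
ceiling(opt$objective)  # roughly 190, as in Example 1 of Section 6
```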

4. SUB-OPTIMAL DESIGNS

While the above optimality criterion will be widely appropriate, in some contexts it will be of interest to use a sub-optimal design due to non-statistical considerations. For example, one of the studies to be discussed in Section 6 has the feature that it is desirable, for administrative reasons, to allocate as many patients as possible to the control arm. It is therefore of interest to determine how far the control proportion γ can be increased beyond the optimum value γ* without unacceptably increasing the sample size. The definition of an unacceptable increase in sample size will depend on the context and the available resources. However, since the default design will often be based on balanced allocation, a useful guide in such situations is to determine how far γ can be increased beyond γ* such that the total sample size N does not exceed that required for a balanced design. Adopting this strategy, one finds the sub-optimal value γ_s > γ* with N(γ_s) = N{1/(1 + k)}, using equation (10).

For individual power using Bonferroni correction, the dependence of N on γ enters only through C_k(γ). It can be seen from (4) that C_k(0.5) = C_k{1/(1 + k)}, which leads to an intriguing result for individual power, namely that the required sample size under a balanced design is the same as that under a design that allocates 50% of patients to the control arm. This is true regardless of the value of k. Even though the result is exact only when A(α, k, γ) is independent of γ (e.g. Bonferroni correction), it is generally a very good approximation under Dunnett correction. This means that the proportion allocated to the control arm can be increased to as high as 50% without compromising the sample size relative to a balanced design when individual power is used. Of course, if there are no non-statistical considerations warranting a large γ, then the optimal design based on individual power, γ* = 1/(1 + √k), will be preferable and will occur somewhere between the balanced and 50% allocation designs by virtue of the inequality 1/(1 + k) < 1/(1 + √k) < 0.5.

Under the alternative types of power, minimal and complete, the above result will not hold, since the multivariate distribution of the test statistics depends on γ. However, the same principle can be followed in that a maximum γ can be determined from (10) such that no extra sample size is required over and above a balanced design. In this case it may be that γ_s can be even higher than 50% without increasing the total sample size relative to a balanced design.
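A rough sketch of this calculation is given below; it reuses the N.gamma function from the code after equation (10), the function name is again illustrative, and the value quoted in the comment is approximate.

```r
## Largest control allocation gamma_s whose total sample size does not exceed
## that of the balanced design (gamma = 1/(1 + k)).
gamma.s <- function(k, delta, alpha = 0.025, beta = 0.1, power = "minimal") {
  N.bal  <- N.gamma(1 / (1 + k), k, delta, alpha, beta, power)
  g.star <- optimize(N.gamma, c(0.05, 0.95), k = k, delta = delta,
                     alpha = alpha, beta = beta, power = power)$minimum
  ## root of N(gamma) = N(balanced) to the right of the optimum g.star
  uniroot(function(g) N.gamma(g, k, delta, alpha, beta, power) - N.bal,
          lower = g.star, upper = 0.99)$root
}
gamma.s(k = 4, delta = 2 / 3.5, power = "minimal")  # roughly 0.66 (Example 1)
```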

While Section 6 will illustrate the situation where it is desirable to maximize the proportion allocated to the control arm, in other contexts it may be of interest to minimize the proportion of patients on the control arm. This may arise in some placebo-controlled studies where there is an ethical imperative to maximize the number of patients on active treatment. Equation (10) can be used to explore the extra sample size arising from γ < γ*. In this case the balanced design may be preferred, or else designs with lower sample size, 1/(1 + k) < γ < γ*, or higher sample size, 0 < γ < 1/(1 + k), than the balanced design can be explored.

5. BINARY OUTCOMES

When the outcome is binary then a similar approach can be applied after modification of the multivariate distribution of the test statistics and modification of equation (10) to make allowance for the differing treatment effect variances under the null and alternative hypotheses. Hasselblad and Allen [14] have previously considered the power of various tests for comparing several treatments to a common control based on binary outcomes. While they considered a wider range of tests than considered here, an assumption of equal sample sizes was made, and so our primary focus of optimizing the design through unbalanced sample sizes was not considered by Hasselblad and Allen [14].

Let the outcome X_ij for patient j in treatment group i be binary with success probability p_i, where p_i = p + Δ for the non-control arms i = 1, ..., k and p_0 = p for the control arm i = 0. This leads to a common treatment difference Δ between the control and non-control arms, and the standard test statistic corresponding to each individual one-sided null hypothesis H_0i: p_i − p_0 ≤ 0 is

$$Z_i = \frac{\hat{P}_i - \hat{P}_0}{\sqrt{\dfrac{p(1-p)}{\gamma N} + \dfrac{p(1-p)\,k}{(1-\gamma)N}}} = \frac{\hat{P}_i - \hat{P}_0}{\sqrt{p(1-p)/N}}\, C_k(\gamma)$$

where P̂_i is the proportion of successes in treatment group i. The variance of Z_i is given by the ratio of the treatment effect variance under the alternative hypothesis, σ₁², to that under the null hypothesis, σ₀²:

$$R^{2}(r,k,\gamma) = \frac{\sigma_1^{2}}{\sigma_0^{2}} = \left\{\frac{p(1-p)}{\gamma N} + \frac{p_i(1-p_i)\,k}{(1-\gamma)N}\right\} \bigg/ \left\{\frac{p(1-p)}{\gamma N} + \frac{p(1-p)\,k}{(1-\gamma)N}\right\} = \frac{1 + \phi r k}{1 + \phi k}$$

where φ = γ/(1 − γ) is the odds of being in the control treatment group, and

$$r = \frac{p_i(1-p_i)}{p(1-p)} = 1 + \delta\{\sigma(c^{-1} - c) - \delta\}$$

using the notation σ² = p(1 − p) for the outcome variance, δ = Δ/σ for the effect size and c = p/(1 − p) for the odds of success in the control group. The discussion of sample size presented below will make use of the fact that Z_i has the following distribution, which reduces to standard Normal under p_i = p_0:

$$Z_i \sim N\{\delta\, C_k(\gamma)\sqrt{N},\; R^{2}(r,k,\gamma)\} \qquad (11)$$

Comparison of the distribution in (11) for binary endpoints with the distribution in (5) for continuous endpoints shows a similar form, with modified variance arising from the fact that r ≠ 1 under the alternative hypothesis.

As for continuous endpoints, use of the test statistic Z_i and its associated sampling distribution requires assumptions about parameters that determine the variance under the null and alternative hypotheses. This requires assumptions about the p_i, which is standard at the design stage of a study with binary outcomes.

Based on the distribution in (11), a similar argument to that outlined in Section 3 can be used to provide a modified form of N(γ) for binary outcomes, which is analogous to equation (10) for continuous outcomes:

$$N(\gamma) = \left\{\frac{U(\alpha,k,\gamma) + R(r,k,\gamma)\,V(\beta,k,\gamma)}{\delta\, C_k(\gamma)}\right\}^{2} \qquad (12)$$

Equation (12) makes use of the same definitions of U and V that were used in (10), with the exception that the multivariate Normal distribution Φ_{k,γ}, and the function Y_{k,γ}, has correlation λ²_γ = 1/(1 + φrk) rather than the 1/(1 + φk) correlation it has for continuous outcomes.

As in Section 3, the optimal design is determined from (12) by the value γ* that minimizes N(γ). Likewise, the discussion of sub-optimal designs in Section 4 is similarly pertinent to the context of binary outcomes, although the 50% rule is not an exact result for binary outcomes. In terms of the form of the optimal design, this will not be the same as the design of Fleiss [1] under any of the characterizations of power. However, although not exact, the design γ = 1/(1 + √k) generally provides a good approximation to the optimal design when individual power is used. For other characterizations of power the optimal design may be substantively different, as for continuous outcomes.
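As an illustration, the following R sketch evaluates (12) for complete power, the characterization used in Example 2 below; the function and parameter names are illustrative only, p0 denotes the success probability in the common control arm, p1 the common success probability in the other arms, and the example call at the end uses purely hypothetical rates.

```r
## Sketch of the binary-outcome sample size (12) for complete power only
## (so U = qnorm(1 - alpha) and no multiplicity adjustment is needed).
library(mvtnorm)

N.binary.complete <- function(gamma, k, p0, p1, alpha = 0.025, beta = 0.2) {
  phi     <- gamma / (1 - gamma)                     # odds of control allocation
  sigma   <- sqrt(p0 * (1 - p0))                     # outcome SD under the null
  delta   <- abs(p1 - p0) / sigma                    # effect size
  r       <- p1 * (1 - p1) / (p0 * (1 - p0))         # variance ratio r
  Rfac    <- sqrt((1 + phi * r * k) / (1 + phi * k)) # R(r, k, gamma)
  Ckg     <- sqrt(gamma * (1 - gamma) / (1 + gamma * (k - 1)))
  lambda2 <- 1 / (1 + phi * r * k)                   # correlation under H1
  Rmat    <- matrix(lambda2, k, k); diag(Rmat) <- 1
  ## V = Y_{k,gamma}^{-1}(1 - beta), with the binary-outcome correlation
  V <- uniroot(function(x)
    pmvnorm(lower = rep(-Inf, k), upper = rep(x, k),
            mean = rep(0, k), corr = Rmat)[1] - (1 - beta),
    c(-10, 10))$root
  ((qnorm(1 - alpha) + Rfac * V) / (delta * Ckg))^2
}

## Hypothetical use: k = 2 comparisons, control rate 40%, non-control rate 25%,
## balanced allocation (gamma = 1/3)
N.binary.complete(gamma = 1/3, k = 2, p0 = 0.40, p1 = 0.25)
```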

6. EXAMPLES

We consider two numerical examples corresponding to real-life clinical trials that motivated consideration of these methods.

6.1. Example 1

The first example is a double-blind placebo-controlled trial of four doses of atorvastatin, a drug widely used to improve cholesterol levels, for improving bone mineral density (BMD) in women at risk of osteoporosis. The trial is a proof-of-concept trial that would be used as a basis to launch studies of fracture rates, and therefore would be successful if any of the atorvastatin doses showed evidence of improving the continuous BMD outcome compared to placebo. It is therefore an example of a trial where minimal power, corresponding to (2), can be used as the basis of the design. Corresponding to the five treatment arms, we have k = 4 comparisons of each atorvastatin dose with the placebo control. Using per cent increase in BMD as the outcome measure, we will explore designs that provide 90% power to detect a mean difference of 2%, assuming a standard deviation of 3.5%, at a one-sided significance level of 2.5%, adjusting for multiple comparisons by the Dunnett method.

Based on these assumptions, Figure 1 shows the dependence of total sample size on the proportion of patients allocated to the control arm. A balanced design using minimal power would require N = 245 (49 patients in each arm), compared to N = 190 using optimal allocation of γ* = 41% (78 patients on placebo and 28 on each dose of active treatment). Alternatively, a balanced design using individual power would require N = 425 (85 patients in each arm), compared to N = 388 using optimal allocation of γ* = 33% (128 patients on placebo and 65 on each dose of active treatment). Note that the optimal proportion allocated to the control arm is the same as the minimum variance design when individual power is used, 1/(1 + √4) = 33%, but is larger when minimal power is used (41%). Importantly, the balanced design has a sample size excess of 29% based on minimal power, and an excess of 10% based on individual power. Compared to a balanced design based on individual power, which would likely be the default design in most circumstances, the use of a design based on minimal power with optimal allocation would lead to a reduction of 55% in the required sample size (190 compared to 425). This represents a substantial saving in resources, obtained through the joint effects of using the appropriate characterization of power and the optimal choice of design. Although there may be some concern about increasing the proportion of patients allocated to placebo (41% under the optimal design compared to 20% under the balanced design), the optimal design will still have more than half of all patients allocated to active treatment, which would typically be considered ethically acceptable.

Figure 1. Plot of total sample size required to achieve 90% power in Example 1, as a function of the proportion of patients allocated to placebo.

Although of less interest in this placebo-controlled example, Figure 1 does confirm the result described in Section 4 that when individual power is used, a design with 50% allocation to control will have approximately the same sample size as a balanced design. For minimal power an even higher proportion (66%) can be allocated to the control without increasing the sample size relative to a balanced design. In a placebo-controlled trial this is of only theoretical interest and would not be worth pursuing; however, in Example 2 we consider a situation where maximizing the proportion of patients on the control arm has important non-statistical advantages that are worth pursuing.

In order to explore the dependence of the efficiency gains on the number of treatment comparisons and the type of power, Figure 2 compares the sample size excess of the balanced design relative to the optimal design. Although the main example had k = 4 active treatment arms, calculations for Figure 2 have been performed assuming this number ranges from k = 2 to 5. It is seen that the gains tend to be greatest when minimal power is used, and when there are a greater number of treatment comparisons. Note that if the optimal minimal power design is compared to the default design using individual power and balanced allocation, then the gains are even more impressive. Complete power was not of interest in this example but the results are included in Figure 2 for comparison.

Figure 2. Sample size differences between optimal designs and balanced designs as a function of the number of comparisons, using design assumptions from Example 1.

6.2. Example 2

The second example is a study of a combination therapy, atorvastatin for cholesterol and amlodipine for blood pressure, formulated into a single pill and compared with each of its constituent mono-therapies. Since the combination therapy is only useful if it is superior to both of its constituents, this is an example of a trial where complete power is appropriate for determining the design. Corresponding to the three treatment arms, we have k = 2 comparisons of each mono-therapy with the combination. An important feature of this study is that the researchers consider it desirable to maximize the number of patients receiving the combination, as this will make the study more attractive to patients and will give treating physicians wider experience with the new combination.

For this study a patient is considered to have responded if they achieve both the target cholesterol level and the target blood pressure level. Based on this binary outcome we will explore designs that provide 80% power to establish superiority of the combination, assuming a combination response rate of 50% and a mono-therapy response rate of 30%, with a one-sided significance level of 2.5%.

Based on these assumptions, a balanced design using complete power would require N = 324, or 108 patients in each arm, compared to N = 317 using optimal allocation of 38% to the combination therapy (121 patients on combination therapy and 98 on each mono-therapy). Thus, there is no meaningful reduction in sample size from using an optimal design in this case, with the balanced design having only a 2% excess. However, it is possible to allocate as much as 46% of all patients to the combination (150 patients on combination therapy and 87 on each mono-therapy) without exceeding the balanced sample size of N = 324. Thus, an extra 13% of patients (33% versus 46%), or 42 patients in total, can be allocated to the combination therapy without any increase in sample size compared to the balanced design. Given the study features outlined above, this observation could lead to very useful advantages in the conduct of the study. In some settings the extra percentage of patients that can be allocated to the common comparator arm may be much higher than 13%, as illustrated in Example 1 where it was an extra 46% (66% versus 20%). Individual power calculations are not provided here but are similar to the complete power calculations. Minimal power would not be considered relevant for this study.

7. DISCUSSION

Our discussion and examples have assumed homoscedasticity and that power is to be determined for equal treatment effects. These are not essential assumptions, although they will be quite common in designing clinical trials. One of the problems with more flexible methods at the design stage of a study is that frequently there will not be prior information of sufficient detail and accuracy to take advantage of the flexibility. Nonetheless, it is possible to straightforwardly generalize the discussion to include unequal treatment effects or heteroscedasticity. This is trivial for calculations based on individual power. For calculations based on minimal or complete power, equations (8) and (9) can be used replacing Y_{k,γ} with the k-dimensional function Φ_{k,γ} having δ indexed by i = 1, ..., k in its ith argument, to account for different treatment effects and/or variances. Although a simple formula such as (10) will not arise in this case, numerical calculations can still be undertaken to minimize the required sample size using the more complex power formulas.
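A sketch of this generalization for unequal effect sizes δ_1, ..., δ_k (continuous outcomes, known variance) is given below; the function name and the effect sizes in the example call are illustrative only, and the power is computed directly from the k-dimensional Normal distribution via mvtnorm::pmvnorm.

```r
## Sketch of minimal and complete power with unequal effect sizes delta_i,
## generalizing (8) and (9) as described above.
library(mvtnorm)

power.unequal <- function(N, gamma, delta, alpha = 0.025,
                          type = c("minimal", "complete")) {
  type <- match.arg(type)
  k  <- length(delta)
  Ck <- sqrt(gamma * (1 - gamma) / (1 + gamma * (k - 1)))
  lambda2 <- 1 / (1 + gamma / (1 - gamma) * k)    # correlation of the Z_i
  R <- matrix(lambda2, k, k); diag(R) <- 1
  shift <- delta * Ck * sqrt(N)                   # mean of each Z_i under H1
  if (type == "complete") {
    ## all comparisons significant at the unadjusted level alpha
    pmvnorm(lower = rep(qnorm(1 - alpha), k), upper = rep(Inf, k),
            mean = shift, corr = R)[1]
  } else {
    ## at least one comparison significant at the Dunnett-adjusted level
    crit <- uniroot(function(x)
      pmvnorm(lower = rep(-Inf, k), upper = rep(x, k),
              mean = rep(0, k), corr = R)[1] - (1 - alpha), c(0, 10))$root
    1 - pmvnorm(lower = rep(-Inf, k), upper = rep(crit, k),
                mean = shift, corr = R)[1]
  }
}

## Hypothetical example: N = 300, 40% allocated to control, unequal effects
power.unequal(N = 300, gamma = 0.4, delta = c(0.3, 0.4, 0.5), type = "minimal")
```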

One complication that arises when the effect sizes δ_i differ between the k comparisons is that the power depends on the precise configuration of the δ_i's. This leads to the consideration of the least favourable configuration of effect sizes, and the requirement that the power is controlled for the configuration of δ_i's at which the power attains its minimum [9,10]. When using a single-step multiple comparisons procedure, as assumed in this paper, the complete power for the least favourable configuration can be controlled by using the complete power calculations described in this paper. However, for controlling minimal power when the effect sizes δ_i differ between the k comparisons, calculations should be based on the individual power calculations [9,10]. When using step-up or step-down multiple comparisons procedures these comments are no longer true. As the purpose of the present paper is to illustrate the issues associated with optimal design, we have made the simplifying assumption throughout that the multiple comparisons procedure will be a single-step procedure such as the Dunnett procedure. While there is no conceptual barrier to generalizing the optimal design approaches described in this paper to step-wise multiple comparisons methods, such as the step-down Dunnett procedure, the calculations involved would be more complex. Of course, the choice of a single-step or step-wise procedure is irrelevant for complete power calculations, as no adjustment for multiple comparisons is carried out in that setting.

Another assumption that we have made is that the hypotheses to be tested are one-sided. Again, while there is no conceptual barrier to generalizing the methods to account for two-sided hypotheses, additional complications will arise, including modifications to the power expressions. In particular, when testing two-sided hypotheses there may be a requirement for correct directional decisions in the definition of power, in order to control the probability of a type III error. This has been considered by Horn and Vollandt [9], who conclude that this additional requirement would involve only a negligible change in the required sample size to achieve a given level of power.

In assessing whether to use an unbalanced design such as those discussed in this paper, any statistical advantages need to be considered in the context of other considerations. For example, enrolling fewer patients in the non-control arms may be acceptable from the point of view of achieving adequate power for efficacy comparisons, but would lead to a reduction in safety information for these arms. This could be a concern in some placebo-controlled situations, but may actually have a positive impact in the combination therapy example, as more safety information would be available for the experimental combination therapy.

The results of the numerical examples illustrate the important impact that efficient statistical design can have on resource allocation in clinical trials comparing several treatments with a control. In particular, the sample size can be reduced by allocating more patients to the control, and this can be done optimally depending on the types of hypotheses that are being tested and the corresponding characterization of power that is used to design the study. Sample size reductions can be substantial, particularly if there are more than two comparisons and minimal power is used to design the study. Compared with the 'default' approach of using a balanced design to achieve adequate power for individual comparisons, if the statistician were to identify the appropriateness of a minimal power design with optimal allocation, then quite dramatic improvements could be achieved, more than halving the sample size in some cases. Even when optimal designs do not lead to meaningful reductions in required resources, gains in the duration of the study may be possible if the control arm is more attractive to potential participants, such as in a combination therapy study. In this case the statistician can identify an unbalanced design that does not increase the sample size relative to a balanced design, but does improve the attractiveness of the study, and hence its recruitment rate, by allocating more patients to the control arm.

REFERENCES

1. Fleiss JL. The design and analysis of clinical experiments. Wiley: New York, 1986; p. 96.

2. Hedayat AS, Jacroux M, Majumdar D. Optimal designs for comparing test treatments with controls. Statistical Science 1988; 3:462-491.

3. Zhu W, Wong WK. Optimal treatment allocation in comparative biomedical studies. Statistics in Medicine 2000; 19:639-648.

4. Dunnett CW. A multiple comparison procedure for comparing several treatments with a control. Journal of the American Statistical Association 1955; 50:1096-1121.

5. Westfall PH, Tobias RD, Rom D, Wolfinger RD, Hochberg Y. Multiple comparisons and multiple tests. SAS Institute Inc.: Cary, 1999.

6. Berger RL, Hsu JC. Bioequivalence trials, intersection-union tests and equivalence confidence sets. Statistical Science 1996; 11:283-319.

7. ICH. Guideline E9: statistical issues in clinical trials. International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use (ICH), 1998. Accessed at www.ich.org.

8. EMEA. Points to consider on multiplicity issues in clinical trials. European Medicines Agency (EMEA), 2002. Accessed at www.emea.eu.int.

9. Horn M, Vollandt R. A manual for the determination of sample sizes for multiple comparisons: formulas and tables. Informatik, Biometrie und Epidemiologie 2001; 32:1-38.

10. Horn M, Vollandt R. Sample size for comparisons of k treatments with a control based on different definitions of power. Biometrical Journal 1998; 40:589-612.

11. Hochberg Y, Tamhane AC. Multiple comparison procedures. Wiley: New York, 1987.

12. Serfling RJ. Multivariate symmetry and asymmetry. In Encyclopedia of statistical sciences (2nd edn), Balakrishnan N, Read CB, Vidakovic B (eds). Wiley: New York, 2004.

13. R Development Core Team. R: a language and environment for statistical computing. R Foundation for Statistical Computing: Vienna, 2004. ISBN 3-900051-00-3, URL http://www.R-project.org.

14. Hasselblad V, Allen AS. Power calculations for large multi-arm placebo-controlled studies of dichotomous outcomes. Statistics in Medicine 2003; 22:1943-1954.
