
PANEL DISCUSSION    Clinical Trials 2005; 2: 352-358

Panel discussion of case study 3

Marc Walton, Richard Simon, Frank Rockhold, William DuMouchel, Garry Koch, Robert O'Neill

Dr Anello: Marc Walton, I wonder if you wouldn't mind going first since you had a lot of interaction with this team during the regulatory aspects phase.

Dr Walton: Certainly. This was in actuality a Phase II study, not the seamless Phase II/III that was initially being considered. But as a Phase II study, it is a good example of an approach to addressing a very important question in the process of drug development, that is, to establish a good understanding of the performance of the drug, including the dose-response relationship of the drug. This question is broadly applicable to many different settings and many different drugs. One of my questions was, what is unique about this setting that made this approach feasible or worthwhile? My answer would be, perhaps not much.

There are certain aspects of this setting that the design really grappled with directly. But conceptually, the idea of wanting to establish the dose response, wanting to explore a broad range of relatively closely spaced doses, occurs time and again in many different drug development programs. There is a lot in this example that will be broadly applicable and that is not unique to this particular setting.

The model of the early clinical evaluation, predicting the 90-day clinical status of a patient, was an important feature of this study. It sped the adaptive random allocation of patients to the dose that would be most informative. I want to note that this model had relatively little regulatory impact. From a regulatory point of view, when we look at what we have learned at the end, we rely upon the 90-day data for all patients. The importance of the one-week and the four-week data was only in selecting which doses to allocate to the next patient. The meaning of those data evaporated as the patients' time in study matured. By the end, it had relatively little importance in interpreting the final result of the study and low regulatory impact. You can view that as a nice way of utilizing these approaches without impinging upon our ability to use the data.
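The mechanism Dr Walton describes can be sketched in a few lines. This is an illustrative reconstruction, not the trial's actual Bayesian longitudinal model: the scores are invented, and a simple least-squares fit stands in for the real week-4-to-90-day prediction model.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x (closed form)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

# Completed patients: (week-4 score, 90-day score); numbers are made up.
week4 = [10, 14, 8, 12, 16]
day90 = [22, 30, 18, 26, 34]
a, b = fit_line(week4, day90)

# An enrolled patient with only a week-4 score contributes a *predicted*
# 90-day outcome to the dose-allocation engine; once the real 90-day
# value arrives, the prediction is discarded (it "evaporates"), so the
# final analysis rests on observed 90-day data only.
predicted_day90 = a + b * 11
```

The design point is that the prediction influences only which dose the next patient receives, never the final estimate.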

In estimating the dose response during the study, the Bayesian approach allowed you to include multiple kinds of uncertainties that are of relatively little final concern to us, for the same reasons.

A feature of the study design that was not emphasized in your talk was our concern about the adaptive algorithm allocating patients to doses that would not teach us most about the dose response curve. We had a concern that, at some point, the utility of adding patients to the placebo group, if this was an effective drug, could become much less than focusing upon doses in the area where the upper bend in the curve lies. This was of concern because in studies like this there is the potential for a shift in the patient population as the different sites enroll and leave. There can be site-related differences related to subjective physician assessments as well.

We had concern that there might not be full comparability, and we might be misled by doses only within a narrow range. This study did include a lower bound on the probability of assigning a patient to placebo, so we could be assured that there would be a reasonably good distribution of placebo patients throughout the study, providing us with a good comparison benchmark. We can think of that as purchasing insurance. It may have slightly decreased the efficiency of learning about the dose response curve, but it ensured that we would still be able to reliably interpret the study in the event of site- or time-related effects.
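The "insurance" of a guaranteed minimum placebo probability can be sketched as clipping and renormalizing the model-driven allocation weights. This is a hedged reconstruction of the idea, not the trial's algorithm; the floor of 0.15 and the weights are invented for the example.

```python
def allocate_with_placebo_floor(weights, placebo_floor=0.15):
    """Turn model-driven allocation weights into probabilities,
    guaranteeing placebo (index 0) at least `placebo_floor` mass.

    `weights` are nonnegative scores, e.g. expected information
    gain per dose from the current dose-response fit.
    """
    total = sum(weights)
    probs = [w / total for w in weights]
    if probs[0] < placebo_floor:
        # Give placebo the floor, then rescale the active-dose
        # probabilities to share the remaining mass in proportion
        # to the model's preferences.
        active_total = sum(probs[1:])
        probs = [placebo_floor] + [
            p * (1.0 - placebo_floor) / active_total for p in probs[1:]
        ]
    return probs

# Example: the model wants almost no new placebo patients...
probs = allocate_with_placebo_floor([0.02, 0.50, 0.30, 0.18])
# ...but the floor keeps the placebo benchmark enrolled throughout.
```

The floor trades a little estimation efficiency for a concurrent control arm that spans the whole enrollment period, which is exactly the insurance being described.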

The final analysis was not heavily influenced by the prior. The expectation was, and proved to be the case, that the prior would be swamped out by the data obtained during the study. From a regulatory point of view, the validity of prior knowledge did not become a concern, because we relied mainly upon the study data.

Coming back to the idea of the seamless approach, although that was the initial plan, it was wisely not the final plan. It is the rare development program where the Phase II study needs to answer only the question of what dose to use in the Phase III. We need the opportunity to explore the data, and ask questions like what is the right population to employ in the Phase III study to either improve the safety or the efficacy. In this case, critical Phase II design parameters might be the range of severity in the stroke patients, or the right time window from time of onset for the stroke patients, or the concomitant treatments and medications needed or to be avoided. Other important questions are whether the protocol is being followed consistently, or are revisions needed. All of these are very important lessons from the Phase II, and I feel that we need to learn about these things from Phase II before initiating Phase III.

© Society for Clinical Trials 2005    DOI: 10.1191/1740774505cn106oa

Dr Simon: The first point is that in many cases,Bayesian statisticians are too enamored withcomputing or philosophical issues, and the actionreally is in the priors. Bayesian methods, in myexperience, really have a major contribution tomake when there is prior information or priorassumptions that need to be incorporated into theanalysis. Not some subjective prior of the investi-gator that nobody really cares about anyway, butpriors based on data, or on assumptions shared bymost stakeholders.

For example, I have studied a lot of Bayesian approaches to problems of multiplicity where there is a community assumption that treatment by subset interactions, qualitative interactions, are unlikely, and you can use that to specify a prior. For those kinds of problems, you can show that the Bayesian methods give much stronger inferences than the frequentist methods, and they work where the frequentist methods tend not to work. But for situations where you really don't have assumptions or data external to the trial that need to be brought into the analysis, my feeling is that the Bayesian methods actually don't have much advantage over frequentist methods.

In this particular example, there was very little focus on the prior in terms of what was presented. I couldn't even understand really what the prior on the dose response curve was. If it's true that it was sort of a noninformative prior, I would question whether that was appropriate. If it was used, then I would think that a similar kind of result could have been obtained with an adaptive frequentist method based on interim analysis.

I also had a question about the posterior distribution. Based on the definition of the ED95, given that there was no effectiveness of any dose, I would have thought the ED95 would be defined as zero. The ED95 is the smallest dose that is at least 95% as effective as the best response, and yet that isn't what the posterior distribution of the ED95 looked like.

The second point is, does the Bayesian paradigm make the use of a non-fully randomized design with adaptive allocation more valid than would a frequentist model? My answer is no. Now, it really doesn't matter if it is purely an exploratory Phase II type of thing, but if you want to draw inference, then that issue would come up and the question is, would Bayesian analysis make the absence of a really pure randomization among dose levels more palatable than a frequentist model would?

I would say no; any frequentist can write down a similar model and do a model-dependent analysis. Both the Bayesian and the frequentist approaches depend on model assumptions that the treatment assignment is ignorable, using the phrase of Paul Rosenbaum and Don Rubin. Once you are doing a model-based analysis, whether it's Bayesian or frequentist, the Bayesian paradigm really has no advantage in terms of the adaptiveness of the treatment allocation.

The third question is, are more dose levels better than fewer dose levels? I would say the answer to that question depends on being very specific about what your objectives are. It was said that the objective was determining a dose with sufficient efficacy to take into a confirmatory trial. That is a very different objective than trying to determine the ED95. If your objective is to know if there is a dose that is sufficiently effective to take into a confirmatory trial, you could answer that question with many fewer patients by studying fewer dose levels. You could do that if you are in a situation where you have a prior, and where you put a lot of prior weight on the belief that there is a monotone dose response curve, not some multimodal dose response curve.

The fourth point was, "Do Bayesian methods permit more adaptiveness than frequentist methods?" There are many ways you can formulate adaptive frequentist methods. Many times with Bayesian methods, it is more intuitive in terms of how you are doing the adaptiveness, and that is attractive, but you actually can do adaptiveness either way. If you want to make inference, the issues of nonrandomization come up, and you have the same problems either way.

My final point is: are seamless Phase II/III protocols a good idea? Generally, no. One of the biggest problems in drug development in terms of defining effective studies and doing things quickly is making sure you have crystal clear objectives for each study you do. You can waste a lot of time in drug development without really clear objectives. I think Phase II/III kinds of protocols encourage blurred objectives. In this case, it was not a Phase II/III protocol, but there were still some blurred objectives, similar to what I have alluded to, like the difference between an ED95 and whether there is an effective dose. In this study, I wonder whether you could have gotten to the final answer more efficiently by doing a pilot with fewer dose levels to try to answer the question of an effective dose, and only once you have answered that affirmatively, to get into what is the ED95.

Dr Rockhold: To state my prior, we have implemented many designs like this, mostly in early development, looking at tolerability. We have run a few adaptive designs for efficacy in Phase II, but these were in areas where we didn't have the advantage of having even a remote surrogate like they did in this trial.

I can definitely see the benefit of its use in decision-making. One of the important aspects of this particular trial was that it dealt with stroke and looked at benefit to risk comparisons. In a disease like this, tolerability becomes less of an issue because of the severity of the disease. If you were looking at a less severe disease, you would have to build tolerability into this model to understand how to pick a dose for Phase III. I assume they were looking at that, although it wasn't mentioned in the discussion. Looking at benefit to risk and tolerability has to be built into the model because the vast majority of studies we do are in diseases that are fortunately nowhere near this severe.

The question about exploratory versus confirmatory has certainly come up. I agree that when you run trials that are trying to serve multiple masters, you run into problems. Making that clear upfront is critical.

A question that I would like to pose is how they arrived at the number of doses and whether or not that number was critical. I try to press people in our institution to run more doses. People tend to work to fewer doses, but most of the medications we work on are oral medications. Running trials like this at 15, 20 different oral medication sizes is logistically problematic.

The objective at the beginning was to go into Phase III with one dose. It would have been very difficult to go into Phase III and really understand the benefit to risk at a given dose from this trial. In regard to the futility, and I use Dr Grieve's term here, they would have liked to have reached that decision sooner. They attributed it to an enrollment that was too swift. That may have been part of it, but when we have tried to run futility designs, whether it is an adaptive design or not, they are very conservative.

If you didn't have an independent data monitoring committee, and trial sponsors were looking, I believe they would have come to the conclusion that futility was reached a long time ago, regardless of enrollment.

In order to roll out these designs in the future - and this is not a criticism of the model - it is important to make people understand the characteristics of futility and whether or not these models are useful.

You are dealing with a surrogate of a surrogate in order to fit this longitudinal model. I was interested to hear Dr Walton say that if they were going to review the data, they would have looked at the 90-day data. In an organization like mine, as Dr Grieve very properly indicated upfront, you are trying to sell this type of design to decision makers. They want to understand, if we make a decision, is it going to stick. If we are making a decision internally, and I can characterize the probability of a decision to them, they will (within the limits of how much they trust me) use that to make a decision.

If the regulators' view of the risk of the decision is another factor to consider, that makes it all the harder to sell. If you are only going to look at the endpoint data, that is going to make the design even more difficult to sell internally.

The recruitment speed issue was a good one. It is very difficult to tell people in an organization that they really ought to be recruiting slower.

From an ethical standpoint, using electronic data capture or the system they have, which is a good surrogate, is absolutely essential. It is going to be very difficult to convince people they need to recruit slower.

Dr DuMouchel: This was a very interesting study that raises a lot of interesting points about differences between being Bayesian or not. Although in some respects, I do agree with some of the other panelists that you don't really have to be Bayesian to utilize the techniques.

The flexibility was a hallmark of the way in which this study was done; there was a flexible notion of which doses to use and flexible models for the dose response curve, and flexible methodology for deciding when to stop. In the textbook studies of clinical trials, that flexibility seems to get suppressed. But I don't think that inflexibility is a fundamental feature of frequentist approaches; it's maybe a fundamental feature of regulation.

There needs to be more interest in how you can be flexible but still be strict enough. It's like parenting; you lay down a lot of inflexible rules because it just makes your life a lot easier. There are several different reasons that the idea of adaptive dose allocation has occurred in the literature. One is to allow this nonparametric dose response estimation to be better. When the adaptivity uses the past responses, that may throw a monkey wrench into certain frequentist calculations. Even there, you could compute frequentist numbers with simulation.

I was involved in a trial where the adaptivity was based on the use of the covariates. It was a surgical treatment, a fetal tissue implant for Parkinson's disease. We had a very small sample size, only a total of 40 patients for the whole study, and we wanted to get as much balance as possible with things like age, sex and the disease severity. As each patient was presented and being randomized, I used a computation of the amount of information that you would be getting on the slopes and curves of the treatment effect in order to get the best balance in the design. The probability of allocation to treatment or control was based on that.

The idea of a stopping rule being based on posterior probabilities works well. It is again an example where the frequentist approach is a little clumsy and a clear win for being Bayesian. While it is not certain whether using the posterior probabilities of interesting events is better or worse than some fancy optimal stopping decision theory, it is probably easier to explain to someone.
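A hedged sketch of such a rule, not taken from the trial: a Beta-Binomial posterior for a response rate, a Monte Carlo tail probability, and an invented futility boundary.

```python
import random

def posterior_prob_above(successes, failures, threshold, draws=100_000, seed=1):
    """P(response rate > threshold | data) under a Beta(1, 1) prior.

    The posterior is Beta(1 + successes, 1 + failures); the tail
    probability is estimated by Monte Carlo sampling rather than by
    evaluating the Beta CDF analytically.
    """
    rng = random.Random(seed)
    a, b = 1 + successes, 1 + failures
    hits = sum(rng.betavariate(a, b) > threshold for _ in range(draws))
    return hits / draws

# Hypothetical interim look: 8/40 responders, and we care whether the
# true rate clears 40%.  All numbers here are invented for this sketch.
p = posterior_prob_above(8, 32, threshold=0.40)
stop_for_futility = p < 0.05  # stopping boundary chosen for illustration
```

The appeal is exactly what is said above: "the probability the drug clears the bar is now below 5%" is a statement a non-statistician can act on.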

The idea of the simulations was very valuable here because of this ability to be both Bayesian and also to be able to predict certain frequentist properties. That is a good win.

The notion of validating the computer code is clearly one that several people have brought up over the course of our two-day conference. It points out the need for more standardized procedures in the Bayesian world, and that will probably happen over time.

There is not necessarily such a need for one-off designs every time. We can hopefully decide on a few Bayesian modifications of practice that are the biggest wins and then standardize them.

This bimodal posterior distribution was very interesting and perhaps an example of being a little too flexible in dose response modeling. Some simpler version of the dose response curve would give a better answer there. The whole idea of an ED95 is called into question if you don't have a monotone dose response curve.

The two greatest defects of the frequentist approach need more emphasis. The first is the excessive focus on P-values. The other big defect is the very clumsy approach that frequentist theory has with respect to multiple comparisons. I have had quite a bit of involvement in a data mining approach on FDA postmarketing surveillance data. We are evaluating millions of parameters, trying to estimate them, and the idea that we need a Bonferroni adjustment for every parameter being estimated seems like an impossible situation.

The Bayesian approach is that you don't need to adjust for multiplicity, but you need to use your prior distribution, and the prior distribution automatically shrinks the most extreme values in a way that looks like an adjustment for multiplicity. The use of shrinkage estimation and hierarchical modeling did not receive much focus here in the clinical trial examples. I would encourage people to be a little bolder with the use of historical data. That is an example where some hierarchical modeling could work.
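The shrinkage mechanism described here can be illustrated with a toy normal-normal hierarchical model. All numbers are invented, and `tau2` is an assumed between-parameter variance; a real analysis would estimate it from the data.

```python
def shrink_toward_mean(estimates, se, tau2):
    """Empirical-Bayes shrinkage of noisy per-parameter estimates.

    Each raw estimate y_i ~ Normal(theta_i, se^2), with the theta_i
    themselves drawn from Normal(grand mean, tau2).  The posterior mean
    pulls extreme values toward the overall mean -- the automatic
    "multiplicity adjustment" that the prior provides.
    """
    grand = sum(estimates) / len(estimates)
    w = tau2 / (tau2 + se ** 2)  # weight kept on the raw data
    return [grand + w * (y - grand) for y in estimates]

# Five noisy effect estimates (made-up numbers): the outlier 4.0 is
# pulled in hardest, which is where a Bonferroni-style correction
# would instead simply widen every interval.
shrunk = shrink_toward_mean([0.1, -0.2, 0.3, 4.0, 0.0], se=1.0, tau2=0.5)
```

With millions of parameters, this kind of partial pooling replaces per-comparison corrections with a single model for how extreme true effects tend to be.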

In the past, I have been involved with the combining of experiments in many different environments, which is like using historical data. There was one example where we actually had modeled dose response curves simultaneously with data from multiple species, not to mention multiple designs. That was a most demanding problem.

Finally, there was one discussion earlier about if you have these Bayesian models and they give complex results, how you put all that on the label. One thing that will eventually develop is that the label will just have the URL for a Website where you can go and get all your questions answered. You would put in your own patient characteristics and get a prediction for what this drug will do for you.

Dr Koch: First of all, I want to express to the speakers that I very much appreciated learning about their experience. I thought they provided outstanding documentation of all of the processes that they went through in order to carry out this study. I did have a few questions for people to contemplate. One was, are we dealing with a paradigm in which there actually is a correct dose that works best for everybody, or are we in a scenario where some doses work better for some people and other doses work better for other people? It could be that the bimodal posteriors suggest that high doses work for some patients and low doses work for others. I am not sure how one can proceed to implement the design in a manner that would be potentially sensitive to that.

Another consideration that might apply in other indications is that one would not only be interested in dose response for a primary endpoint; one might also be interested in dose response for some key secondary endpoints. One might not only be interested in dose response for all randomized patients, but might also be interested in wanting to know what the dose response was for males, for females, for younger people or older people. There are ways the Bayesian or other adaptive procedures can be applied which are as informative to dose response across multiple endpoints and dose response across multiple subgroups as if you essentially assumed that the dose response is homogeneous across all subgroups or across all potential endpoints.

My second comment is that you can indeed do a Phase II/III design, but the Phase III part of it should be independent and self-standing. It should achieve certain goals in and of itself, although in terms of ultimately having convincing evidence, it is fine to integrate the two studies. I think Phase II/III is feasible when there are clear go signals, but the Phase III should indeed be self-standing.

One of the curious things I was wondering in terms of implementation was whether or not the study could have recruited a relatively small number of centers. That smaller number of centers may have enrolled about 150 to 250 patients. They could have gotten as much learning as possible from that first stage of this process. If everything looked fully informative, they would then proceed to have the rapid recruitment that would have brought in the additional 300 to 600 patients. This is a way in which you don't have to have an interruption in the recruitment process, but it does allow you to have time to catch your breath before the process runs away from you.

Dr O'Neill: A lot of good things have been said. Drs Grieve, Krams and Pfizer should be commended for doing this study. One can say what else might you have done, but this is an area where we really haven't figured out things, so the dose ranging in the Phase II area in stroke is difficult.

I would hate to think that this was a design-induced negative result. At the end of the day, this actually would have been a good product, but it suffered from a design that actually said it was negative. It is probably not the case, but I have often wondered why outcome-dependent allocation designs have not been used. They have been around for 30 years in the statistical literature, but they have never been used. One of my thoughts is that for modest effects it is not clear whether the effect is due to the covariate or whether it is due to the treatment.

As you accrue, if you do have a situation where different people respond differently, then covariates are as important as the treatment itself. They made sure that you had a fixed allocation of the placebo throughout the entire design. That doesn't necessarily mean that you had the same covariates for that placebo patient as you do throughout the whole trial. If there was a dominant covariate that might have been responsible for the bimodality, then that would possibly explain what was going on.

There is a role for outcome-dependent allocation, but it is usually for big effects, and not for modest effects. If you are in the modest effect game, it is much harder to tease out dose response.

There is another issue here. Suppose this had turned out to be a positive study, everyone was happy, and you had dose response. In terms of the package, is it a confirmatory study, is it one of the two studies? Would it support approval? I would tend to think yes. This is a large trial, 400 patients. Seventeen doses is huge. We never see trials with 17 doses. It might have been that they could have gotten by with six doses or five doses, but that is after the fact. One could have gone into this with an inverse U shape. Too much of a good thing is a bad thing; it is not unusual that if you go a little too far, you go off the deep end.

At the end of the day, what I like is that they tried something that folks haven't been trying. All the planning that went into this, the amount of documentation that they provided, and all the simulations, is really a good example of the kind of upfront thinking that needs to go into something that has this level of complexity and needs to get the team involved. They had 18 centers involved, everybody was on-board, and there was enthusiastic recruitment. That is amazing. A lot of positives resulted from this experience.

Response

Dr Grieve: I accept entirely that almost every aspect of this clinical trial could have been done from a frequentist perspective. There are methods to do everything, to do the prediction, to do the stopping, to do all of it. It would probably be more difficult in some aspects, but it could have been done. However, I take the view, as Dr Berry did yesterday, that within the Bayesian approach you have a coherent system within which you can do all of these, so why should I decide to do it in a non-Bayesian way?

Some questions have been raised about the definition of the ED95. What is the ED95? Of course, we could have had an inverted U dose response curve. The ED95 was defined as being with respect to the maximum response within the dose interval that we were investigating. Had you had an inverse U dose response, we would have been looking for the minimum dose that would give you 95% of that maximum, because we wouldn't have been wanting to go too high, where you tip back down again.
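A sketch of that working definition on a grid of doses. The inverted-U numbers are made up, and this is only the point-estimate version: in the trial itself the ED95 carried a full posterior distribution rather than a single value.

```python
def ed95(doses, response):
    """Smallest dose achieving 95% of the best response over the studied
    dose range -- the working definition described above.  Because the
    target is 95% of the *maximum*, an inverted-U curve is handled
    naturally: doses past the peak are not rewarded."""
    target = 0.95 * max(response)
    for d, r in zip(doses, response):
        if r >= target:
            return d
    return None

# Made-up inverted-U response over seven doses:
doses    = [0, 10, 20, 30, 40, 50, 60]
response = [0.0, 0.3, 0.6, 0.9, 1.0, 0.8, 0.5]
dose95 = ed95(doses, response)
```

Note that if the whole curve is flat at zero, every dose trivially meets "95% of the maximum", which is the degenerate case behind Dr Simon's question about what the ED95 posterior should look like for an ineffective drug.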

ED95 was defined in that particular way because of an early study, a patient safety study, in which there was some indication that at the top end of the dose, it was starting to come down again.

Dr Krams: Unfortunately, we recruited much, much quicker than would have been appropriate, and Dr Grieve has done analyses showing that with half the patients over the same time, we would have come to very similar conclusions.

It is important to remember that whilst we did a quite good job in implementing this trial, this was a major flaw. The other issue is that the protocol was written in a very conservative way, and it stated that we would only allow stopping the trial after 500 patients' worth of evaluable data would be available. This meant that long after the independent Data Monitoring Committee had realized this was not going to be a winner, there were still patients coming in.

While we did quite a good job, we could have done a much better one. I want to make a couple of general points and then answer some of the questions. The first point is that we had the Copenhagen Stroke Study, and I am really grateful to Tom Skewer and Henry Jorgenson for allowing us to work with it, but that is a study from one center in Copenhagen 10 years ago. I tried very hard to get much more realistic clinical trial data. There are 80 000 patients' worth of data lying in vaults somewhere. Can we all work together to make that data available to each other?

It was asked why we had so many doses. The answer was that we went to our PK/PD (pharmacokinetic/pharmacodynamic) modeling people and asked them what a biologically sensible step would be. The answer is a combination of PK/PD-modeling-driven reasoning and what you can do without getting into an arena where dilution errors make it impossible to distinguish one dose from another. We have run a lot of simulations since then, looking at the advantage of a very high number of doses versus a smaller number of doses. All in all, if we had to do it again, we would probably run it with nine treatment arms.

There was a very strong impression that this might be a drug where we would have an inverse-U shape, and that is why it was important that we used something like the normal dynamic linear model, which does not depend on an assumption of monotonicity. We didn't discuss all the aspects of the safety observations. The independent Data Monitoring Committee looked very carefully at all the safety data and the antibody data coming in, and was much better placed to make judgments with a fuller understanding of the dose response.
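A minimal sketch of the smoothing idea behind a normal dynamic linear model over ordered doses (a first-order random walk with known variances; the function name, variance values, and data below are invented for illustration and are not the trial's actual model):

```python
import numpy as np

def ndlm_posterior_mean(y, sigma2, tau2):
    """Posterior mean of per-dose effects under y_k = theta_k + e_k,
    e_k ~ N(0, sigma2), with a first-order random-walk prior
    theta_{k+1} - theta_k ~ N(0, tau2) linking adjacent doses.

    Each dose is shrunk toward its neighbours, so an inverted-U
    shape is perfectly admissible -- no monotonicity is imposed.
    """
    n = len(y)
    D = np.diff(np.eye(n), axis=0)   # first-difference operator
    lam = sigma2 / tau2              # smoothing weight
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, np.asarray(y, float))

# Hypothetical noisy inverted-U observed mean effects at six doses:
y = [0.1, 0.7, 1.2, 1.6, 1.0, 0.3]
smooth = ndlm_posterior_mean(y, sigma2=0.5, tau2=0.5)
```

The point of the design choice is visible here: the smoothed curve borrows strength between neighbouring doses while still letting the response bend back down at high doses.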

If I were asked whether I wanted a full understanding of the dose response in future drug development versus just understanding what the safety was at a particular dose, I would clearly go for the full characterization of the dose response again.

Audience questions

Dr Gould: I am Larry Gould from Merck. First of all, I would like to commend Pfizer for supporting and sticking with a fairly complicated and elaborate technical development scheme over a long period of time. That is really quite a commitment.

Second, I think there were two comments made in the discussion that everyone ought to tattoo on their hands. One of them is Richard Simon's comment, which I will paraphrase: that before you undertake each stage, you ought to have a pretty clear idea of the statements you want to be able to make after you have completed the stages and done your analysis. In other words, knowing what you are looking to get at the outset is a good thing. The other is Marc Walton's comment that Phase II tells you more than just what your dosage should be for Phase III, and it has a very real value.

My question is: what would have happened if, given the data you had, you had actually pursued something like a more or less conventional frequentist-type approach, maybe a little imaginative, using group sequential stopping rules and such? Would that actually have led you to the same kind of conclusion with about the same economy of patients?

Dr Grieve: I haven't yet done that piece of work, and when I have some more time, I will do it and I will tell you what happened.

Dr Gillespie: Bill Gillespie, Pharsight. The main question I had is this: early on, when you first presented this work, back in 1999 and in some subsequent publications, you described a decision-analytic strategy for the stopping rule. It was certainly an intriguing approach and one that I am interested in pursuing. I was curious as to your reasons for going to the posterior-probability-based approach.

Dr Grieve: The reason that we ended up going for that approach was that the objectives of the trial changed during the development of the algorithm. When we first developed the decision-theoretic approach, we were looking to detect bigger effects than the ones that we ended up trying to detect, and the decision-theoretic algorithm didn't seem to be as sensitive to very small effects as the one based on bounds on the posterior probability.

Dr Krams: One of the reasons why we didn't go the decision-analytic route, and Dr Grieve and I had big battles about this, was that we thought it would be very difficult to explain. That is a shame. There is a huge need for information sharing, just as we do it here, so that in future research programs the difficulty of explaining doesn't, in itself, become a stumbling block.

Dr Rubin: Some experience suggests that when you get bimodal posterior distributions, it is a mistake, a mistake in one of possibly three senses. The most common is that, when using these really expensive Markov chain Monte Carlo runs, you haven't achieved convergence yet, and in order to detect that, you have to run multiple chains, perhaps millions of iterations, and wait for the multiple chains to converge to each other.
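The multiple-chains check Dr Rubin describes is commonly formalized as the Gelman-Rubin potential scale reduction factor, R-hat: run several chains from dispersed starting points and compare between-chain to within-chain variance. A toy sketch of that diagnostic (the chains below are simulated draws, not MCMC output from the trial):

```python
import random
import statistics

def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for one parameter.

    `chains` is a list of equal-length lists of draws. Values near 1
    suggest the chains have mixed; values well above 1 suggest the
    sampler has not converged -- one symptom of which can be an
    apparently bimodal pooled posterior.
    """
    m, n = len(chains), len(chains[0])
    means = [statistics.fmean(c) for c in chains]
    grand = statistics.fmean(means)
    B = n * sum((mu - grand) ** 2 for mu in means) / (m - 1)        # between-chain
    W = statistics.fmean([statistics.variance(c) for c in chains])  # within-chain
    var_hat = (n - 1) / n * W + B / n
    return (var_hat / W) ** 0.5

rng = random.Random(0)
# Two chains sampling the same N(0, 1) target: R-hat should sit near 1.
mixed = [[rng.gauss(0, 1) for _ in range(1000)] for _ in range(2)]
# Two chains stuck in separate modes: R-hat is far above 1.
stuck = [[rng.gauss(-3, 1) for _ in range(1000)],
         [rng.gauss(3, 1) for _ in range(1000)]]
```

Running millions of iterations does not help if all chains start in the same mode, which is why the standard advice is dispersed starting points.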

A lot of experience suggests that that can happen. Even if that step is correct, even if you have the correct posterior distribution for the model being fit, bimodality of a posterior distribution often reflects the fact that you have chosen a model that is incorrect. It actually suggests that you should do something to fix the model. It is an important diagnostic.

The third sense in which it indicates a mistake is that there could be a hidden covariate that you are not modeling. It could be two groups of people, a mixture of two types of people; that is not really a mistake in the sense that the model is wrong, but it is a mistake in the sense that you haven't included something in the model that you should have.

Dr Grieve: I certainly agree that if you are trying to model a dose response curve that you anticipate is either monotonically increasing or at least an inverted U, and it turns out that you don't have that, and it is just random variation, then using an ED95 might not be the most appropriate model to use.

In terms of the covariates, clearly when we specify in a protocol what covariate we are going to use in our primary analysis, that's it; we have chosen it. After the event, we put in just about every covariate that my colleague here could think of, and it didn't change anything. We still had a bimodal result. There may be some parameters out there that even the greatest clinical brains in neurology can't think of as being influential, but with everything that we tried, it still stayed the same.

Dr Simon: Why isn't your ED95 zero? You have no treatment effect. You defined your ED95 as the smallest dose that gives at least 95% of the full response. The full response is zero. The smallest dose that gives 95% of the full response is zero.

Dr Grieve: The full response is not zero. If you saw that the placebo response was about 17 points, the maximum response was about 18.5, so there was a maximum response, and we are defining the ED95 with respect to that maximum difference.

Dr Simon: The way you are looking at that posterior distribution is very inconsistent with the conclusion of futility.

Dr Krams: The decision problem that we had was whether there was a three-point or greater additional benefit over and above placebo.

Dr DeBrota: Dave DeBrota. I work at Eli Lilly. It occurs to me that Pfizer has done a wonderful job of showing a way that new things can be done, and I am very curious about the future of the source code that has been created. Will this continue to be a proprietary asset of Pfizer that no one will see, or will this source code be subject to greater scrutiny, or perhaps even released into a domain in which it can be contributed to by many different institutions?

Dr Grieve: Since we ran the trial, we have developed the same approach on our other software platform, so we could now do what we did with this home-grown system using WinBUGS. In theory at least, it will at some point in the not-too-distant future become available more generally.

Dr Louis: Tom Louis, Hopkins. Dr Rubin provided a long list of reasons that you might get a bimodal posterior. It is important to add that, whether you are talking about the likelihood or the posterior, sequentially monitored studies have the ability to come up with a bimodal posterior even in the most vanilla case with no covariates. That needs to be added to the list.

Dr Anello: I would like to thank the speakers and the discussants for really excellent presentations.

Participants

Marc Walton: Director, Division of Therapeutic Biological Internal Medicine Products, FDA
Richard Simon: Chief, Biometric Research Branch, NCI
Frank Rockhold: Senior Vice President, Biomedical Data Sciences, GlaxoSmithKline
William DuMouchel: Chief Statistical Scientist, Lincoln Technologies, Inc.
Gary Koch: Professor and Director, Biometric Consulting Laboratory, Department of Biostatistics, School of Public Health, University of North Carolina at Chapel Hill
Robert O'Neill: Director, Office of Biostatistics, Center for Devices and Radiological Health, FDA
