Quality assessment of randomized control trials in dental research I. Methods

Journal of Periodontal Research 1986: 21: 305-314

Quality assessment of randomized control trials in dental research

ALEXIA A. ANTCZAK, JULIE TANG AND THOMAS C. CHALMERS

Department of Health Policy and Management, School of Public Health, Harvard University, Boston, Massachusetts, and The Mount Sinai School of Medicine of City University of New York, New York, New York, U.S.A.

The criteria and scoring method for a system to evaluate the quality of randomized control trials (RCTs) in dental research based on published reports are presented. This system is based on one devised for evaluation of RCTs in medicine. Items assessed in this system include randomization and blinding procedures, subject selection criteria, treatment protocols, and statistical analyses. Assessing the quality of RCTs can contribute to improved study design, implementation and reporting by investigators, and evaluation of reports by referees and editors of scientific journals.

(Accepted for publication January 3, 1986)

Introduction

The randomized control trial (RCT) has become the standard experimental tool for evaluation of medical therapies. The process of randomly assigning subjects to different groups is the best available method for ensuring comparability across treatments. The validity of a clinical trial depends on several aspects of the study design, including the method for randomization, the criteria for subject selection, treatment descriptions, blinding procedures, and use of appropriate statistical analyses (Chilton & Barbano 1974, Levenstein & Bishop 1981).

Reporting of clinical trials has also received attention (DerSimonian et al. 1982, Mosteller 1979, O'Fallon et al. 1978). Adequate reporting facilitates interpretation of the quality of the trial design and results, and comparison with other trials. It also permits replication of the trial. Several papers have evaluated reporting of trials in the medical literature (Blackburn, Smith & Chalmers 1982, Chalmers et al. 1978, 1981, 1983, Freiman et al. 1978, Mosteller, Gilbert & McPeek 1980). Items of particular interest include the bias-reducing techniques of randomization and blinding, description of statistical methods employed, subject selection criteria, and sample size and power calculations. Although some standards have been set, further collaboration is needed in order to develop uniform rules for design, implementation, analysis and reporting. Standard designs would allow more direct comparison between trials and facilitate combining data from smaller, inconclusive trials in order to increase statistical power.

The process of combining the results of multiple randomized control trials, known as meta-analysis, is a relatively new scientific method. It is applicable to situations in which trials are undersized, have conflicting results or demonstrate a difference which needs clarification. Analysis of quality is the first step. A system for evaluating the design, implementation and analysis of data of randomized control trials has been applied to 286 miscellaneous RCTs in medicine. This paper provides an adaptation of the quality assessment system for use in evaluating clinical trials in dental research. An accompanying paper applies the quality analysis to periodontal trials. A meta-analysis of periodontal research will be presented in a subsequent paper.

Material and Methods

The quality score

The quality score is divided into three major sections, each evaluated on a separate form (see appendix). These include: (A) basic identification of the paper; (B) the study protocol; and (C) data analysis and presentation.

Form A records basic identifying data to facilitate classification of the papers. It includes title, author(s), journal, and country, as well as what procedures were evaluated, whether a biostatistician was involved, what was being randomized (either patients or parts of mouths), whether time or cost data were included, a general classification of the enthusiasm of the authors for the results, and a definition of the population from which subjects were sampled.

Form B items evaluate the quality of the study protocol. To minimize bias in assigning these scores, readers are blinded to authors, sources, and results. Differential photocopying of the methods section and relevant sections of the introduction is performed by a trained technician. Detailed interpretations of the items in this form are as follows:

Selection description: This item refers to the clarity with which subjects selected for the study are characterized. The two aspects of importance are the subject characteristics and the methods for evaluating those characteristics. Full credit is given when the following are available: distribution of subjects by age (mean, median or range); sex; number of teeth (mean or minimum); some assessment of oral health or disease status; and a description of the diagnostic work-up performed to determine oral health or disease status. For example, for clinical trials of periodontal treatment, periodontal health status might be evaluated by gingival score, plaque score, pocket depth and/or level of attachment, or degree of mobility. For trials of caries preventive agents, DMFS scores might be used. Partial credit is assigned when only part of the above information is reported. Papers are considered inadequate when, for example, only age and/or general periodontal condition (e.g., adult subjects with chronic periodontitis) are reported without additional measures of periodontal status.

Number of patients seen and reject log: This item reflects the importance of a description of the eligible population not accepted for the trial. Full credit is assigned when the total number of patients, specifying potentially eligible and actually included, is given. The optimal presentation would include a table or report in the text with the number of patients excluded before randomization along with the relevant reasons for exclusion. Partial credit is given when the authors mention the criteria for exclusion with the statement that not all eligible patients seen were included. It is important to note that the reject log refers to patients who were excluded from the study despite the fact that they satisfied all major biological criteria for admission. The outcome in these patients should ideally be compared with the outcome of subjects participating in the trial to obtain information about potential bias in selection of subjects.

Definition of the therapeutic regimen: Descriptions of the therapeutic regimen must be sufficiently detailed to allow proper interpretation of the results, comparison with other reports, and replication in future studies and/or practice. Criteria for credit for this item are dependent on the procedures being evaluated. In trials involving periodontal surgical techniques, for example, full credit is given when a complete description of the surgical procedure is included. Partial credit is assigned when less detailed descriptions are given (e.g., modified Widman flap). In studies evaluating non-surgical periodontal treatment, full credit is indicated when the description of non-surgical treatment includes information about the extent (e.g., amount of time, number of visits, by quadrant, with or without anesthesia, by a periodontist, hygienist or general dentist, etc.) and/or frequency (e.g., every 2 wk, every 3 months) of the treatment. Partial credit is assigned when mention is made only of a non-surgical phase (e.g., scaling and root planing). No credit is given when there is no description of the non-surgical treatment. For studies of antimicrobial agents, full credit is assigned when complete descriptions of the regimens are available (e.g., 250 mg penicillin 4x/d for 2 wk) and partial credit when general descriptions are presented (e.g., chlorhexidine rinses for 2 wk). Studies evaluating different preventive regimens likewise receive full credit when the regimen is described in detail (e.g., patients were seen once every 2 wk for polishing, flossing, and oral hygiene instruction including plaque disclosure), and partial credit when general descriptions are given (e.g., patients were seen on recall every 3 months). For studies evaluating the caries preventive potential of various fluoride regimens, the type, concentration, route of application and frequency of the fluoride therapy should be provided to receive full credit. Less detailed descriptions, such as sodium fluoride rinses twice weekly, are assigned partial credit. No credit would be given if the description simply said fluoride rinses.

Description of the placebo: The two items related to placebo appearance and taste are evaluated for trials which include chemotherapeutic drugs, fluoride rinses, or antimicrobial agents, whether antibiotics administered systemically or locally or antiseptic agents in rinses or toothpastes. Full credit is assigned if the placebo appearance and/or taste were controlled and described. The presence of placebos which were not identical to the experimental agent, or failure to describe the placebo characteristics, results in no credit being given.

Follow-up schedule: Studies receive full credit for this item when the follow-up schedule is described in detail and includes times, procedures performed, and evaluations. Partial credit is assigned when general descriptions such as times for follow-up visits or procedures performed are available, but not both.

Test of adherence to treatment: This item assesses the extent to which subjects are seen on follow-up or receive the entire treatment prescribed. Full credit is given when the number of subjects available for each follow-up visit or the number receiving full treatment regimens is reported. Partial credit is assigned when adherence is described in qualitative terms only (e.g., patients attended most recall visits or received the majority of their treatment doses).

Blinding of the randomization process (Chalmers et al. 1983): This item evaluates the randomization procedure to ensure that the investigators were not able to predict or influence which treatment the next patient would receive in studies where subjects were randomized to different groups, or which quadrants would be assigned in studies with a 'split-mouth' design. Full credit for blinded randomization is assigned when the allocation is defined through a centralized office, by telephone, by sealed envelopes, or through a centralized pharmacy. Methods such as ID number, table of random numbers, flip of a coin or alternate patients are not considered blinded procedures, and studies using these methods are not given any credit. Using a table of random numbers would be appropriate as long as the numbers generated are concealed in a sealed envelope so that investigators enrolling subjects are not aware of the assigned treatment.

Blinding of subjects and/or observers: Full credit is assigned when blinding as to therapy was performed and reported. In some situations, blinding is not possible and this item is not included.

Test of the randomization procedure: This item assigns credit for evaluation of the success of the randomization procedure in achieving equal groups of patients. Full credit is assigned when pre-treatment information about the different groups is reported. Ideally, this should include specific information about the prognostic factors (e.g., distribution of teeth in disease severity category, DMFS score, presence of metastasis). Partial credit is given when authors claimed they assessed the randomization procedure but failed to report methods or results.

Testing of blinding: Studies using blinding procedures should evaluate and report on the effectiveness of these blinding mechanisms. Full credit is assigned when this evaluation is reported in detail and partial credit when it is simply mentioned. In studies where blinding is not applicable, this item is not evaluated.

Stopping rules (Chalmers et al. 1976): This item applies particularly to studies where treatments other than those assigned by random allocation were, by protocol, administered to subjects or particular sites which were not responding adequately to the randomly assigned treatment. Here, full credit is assigned when definitions of non-response are provided, the alternative treatment to be performed is indicated, and the number of subjects so affected is reported. All papers should include a statement about how decisions will be made to stop a study.

Size of the study: Full credit is assigned for this item when the criteria for the sample size calculation are reported. Partial credit is given when the authors claim that the total number of subjects required was determined in advance but do not explain the underlying assumptions which led to the number.
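The kind of calculation whose criteria a report would need to state can be sketched as follows. This is only an illustration under stated assumptions: it uses the standard normal-approximation formula for a two-sided comparison of two means, which the paper does not prescribe, and the function name and default values (alpha = 0.05, power = 0.80) are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Subjects needed per group to detect a mean difference `delta`.

    Assumes two equal groups, a common standard deviation `sigma`,
    a two-sided test at level `alpha`, and a normal approximation.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = z.inv_cdf(power)           # quantile for the desired power
    n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
    return math.ceil(n)


# Example: detect a difference of half a standard deviation.
n_needed = sample_size_per_group(delta=0.5, sigma=1.0)
```

The inputs (difference to be detected, variability, alpha and power) are the "underlying assumptions" that a report would have to state to earn full credit on this item.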

Error measurement: This item relates to the measures of oral health or disease status used to classify subjects and evaluate treatment outcomes. Full credit is given when the protocol for error measurement is described and results are reported. Partial credit is assigned when it is simply stated that examiners were calibrated.

Form C items evaluate the quality of data analysis and presentation. Readers have the entire paper available for scoring this section. Items in this section are interpreted as follows:

Dates of the study: Credit is given when specific dates of the beginning and end of randomization are provided.

Results of pre-randomization analyses: This item refers to how authors handle the distribution of subjects as determined by the randomization process. Full credit is assigned when authors present data and state that there were no statistically significant differences between groups or between quadrants as a result of randomization, or when data analysis takes into consideration unbalanced randomization. Partial credit is given when data are presented on the results of randomization (without calculation of statistical significance) or when authors simply state without statistical documentation that there were no differences.

Major endpoints: When both the test statistic and its significance level (p-value) are available so that readers can verify the statistical conclusions, full credit is assigned. When either alone is reported, partial credit is given. The absence of both is unacceptable.

Post-beta estimate (Freiman et al. 1978): This item applies to studies where no statistically significant differences are reported between treatments. Full credit is given when authors discuss the possibility of a Type II error and estimate its probability. Partial credit is assigned when authors acknowledge that lack of statistical significance may be related to the small sample size.
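An estimate of the Type II error probability for a trial that found no significant difference might look like the sketch below. It is an assumption-laden illustration rather than the method of Freiman et al.: it takes a comparison of two means with equal group sizes, a known common standard deviation and a normal approximation, and the function and parameter names are hypothetical.

```python
import math
from statistics import NormalDist


def type_ii_error(delta, sigma, n_per_group, alpha=0.05):
    """Approximate probability of missing a true mean difference `delta`.

    Assumes two groups of `n_per_group` subjects, common standard
    deviation `sigma`, and a two-sided test at level `alpha`; the
    negligible contribution of the opposite tail is ignored.
    """
    z = NormalDist()
    standard_error = sigma * math.sqrt(2 / n_per_group)
    z_alpha = z.inv_cdf(1 - alpha / 2)
    power = z.cdf(abs(delta) / standard_error - z_alpha)
    return 1 - power


# Example: 10 subjects per group, difference of half a standard deviation.
beta_small_trial = type_ii_error(delta=0.5, sigma=1.0, n_per_group=10)
```

A large value of beta in such a calculation indicates that the "negative" result says little about whether a clinically important difference exists.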

Confidence limits: Full credit is assigned when confidence limits or mean differences and their standard errors are reported so that confidence limits can be determined. Adequate sizing can also be determined from the width of the confidence intervals around the mean difference.
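When only a mean difference and its standard error are reported, a reader can recover approximate confidence limits as sketched below. The normal approximation and the function name are assumptions made for illustration; a small trial would more properly use the t distribution.

```python
from statistics import NormalDist


def confidence_limits(mean_difference, standard_error, level=0.95):
    """Approximate confidence limits for a mean difference.

    Uses a normal approximation: difference +/- z * standard error.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * standard_error
    return mean_difference - half_width, mean_difference + half_width


# Example: a mean difference of 1.0 mm with a standard error of 0.5 mm.
lower, upper = confidence_limits(1.0, 0.5)
```

A wide interval (a large `upper - lower`) is the sign of an undersized trial referred to in the item above.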

Life table/repeat measures (Peto et al. 1977): The presence of either life tables or repeat measures is evaluated for this item depending on the outcome criteria in the study. For studies which look at mortality, such as trials of treatment of oral cancer, a life table would be an appropriate way to present results. This is especially important when the period of observation differs between subjects. For other studies, repeated measurement of outcomes of interest, such as level of attachment or caries increment, might be reported. Full credit is given when life table methods are employed. Partial credit is given when time curves are presented without adequate description of the method.

Timing of events: Full credit is given when both the number of subjects evaluated at each time and the value of the variables measured are given, preferably in tabular form. Partial credit is assigned when only one of these, either number of subjects or variable values, is reported.

Regression/correlation analysis: When either regression or correlation analysis is performed to allow for variables in prognostic factors, full credit is given.

Statistical analysis: This item relies on the readers' impression of the overall quality of the statistical analysis based on the results of the first eight items and the appropriateness of the statistical methods used in the study design and type of data collected. Considerations here include choice of statistical tests (e.g., parametric versus non-parametric methods). This analysis can be scored as either excellent, good, fair or poor.

Withdrawals: Full credit is assigned when a study reports that there were no dropouts or withdrawals, or when all are listed by treatment group and reason for withdrawal. No credit is given when there is no mention of withdrawals or when authors report more than 15% dropouts.

Handling of withdrawals (Peto et al. 1977): Maximum credit is given when results are presented with and without withdrawals in the analysis. When dropouts are included in their assigned treatment group, partial credit is assigned. Less credit is given when withdrawals are simply discarded. When dropouts are included in other than their randomly assigned treatment group, or when their assignment is not reported, no credit is given.

Side effects discussion: When data on side effects and the number of subjects suffering from each are presented, full credit is assigned. Partial credit is given when there is only a general discussion of side effects. When authors fail to mention side effects, studies receive no credit.

Retrospective analysis: Full credit is assigned when retrospective analysis is done for a number of prognostic factors (e.g., by initial pocket depth, tooth type, response to initial therapy, presence of metastasis at diagnosis, history of fluoride exposure). Partial credit is given when the analysis is only performed for one variable, or when obvious factors are not included.

Study procedure

To perform a quantitative quality assessment of randomized control trials of a particular therapy, one must first identify and delimit the therapy or disease of interest. Examples include all randomized control trials of periodontal surgical treatments for adult chronic periodontitis, of chemotherapeutic versus surgical treatments of oral cancer, of sealants to prevent occlusal caries in children 6-14 yr of age, of school-based versus home-based fluoride rinse programs for caries prevention, and of different post-operative pain medications following extraction of impacted third molars.

The second stage involves a literature search to identify reports of all pertinent trials during a specified time period. This is accomplished by a Medlar search plus scanning of the bibliographies of all published trials and review articles. Then all reports are evaluated using the criteria described previously for Forms A, B and C. All papers are read and scored independently by two readers. Any area of disagreement is settled in conference and a final score for the paper is determined.
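The step of identifying areas of disagreement between the two readers can be sketched as a simple comparison of their item scores. The dictionary layout, the item labels and the function name are hypothetical; the paper specifies only that disagreements are settled in conference.

```python
def disagreements(reader_1, reader_2):
    """Return the items on which two readers' scores differ.

    Each argument maps an item label to the points that reader assigned
    (None for an item judged not applicable). The result maps each
    disputed item to the pair (score from reader 1, score from reader 2).
    """
    items = sorted(set(reader_1) | set(reader_2))
    return {
        item: (reader_1.get(item), reader_2.get(item))
        for item in items
        if reader_1.get(item) != reader_2.get(item)
    }


# Example with illustrative item labels and scores.
first = {"2.1": 3, "2.2": 1.5, "2.8": 10}
second = {"2.1": 3, "2.2": 0, "2.8": 10}
to_discuss = disagreements(first, second)
```

Only the items returned need to be taken to the conference; the agreed items carry straight through to the final score.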

Some studies will have a number of published reports in the literature. Space limitations in journals often require, for example, that later reports of a study do not include details of subject selection or study protocol. It is necessary, therefore, to calculate an overall score for the study based on combined information from all the reports. For example, even if only the first report of a trial provides adequate selection description, the score for the study should be given full credit because the information was included somewhere.
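One way to combine the information from several reports of the same study is to keep, for each item, the highest credit earned in any report. The sketch below assumes that reading of the rule; the data layout, item labels and function name are hypothetical.

```python
def combine_reports(reports):
    """Merge item scores from several reports of one study.

    `reports` is a list of dictionaries mapping an item label to the
    points earned in that report (None when the item is not applicable).
    For each item the highest score found in any report is kept.
    """
    combined = {}
    for report in reports:
        for item, points in report.items():
            if points is None:
                # Keep the item visible, but let any real score override it.
                combined.setdefault(item, None)
            elif combined.get(item) is None or points > combined[item]:
                combined[item] = points
    return combined


# Example: only the first report describes subject selection adequately.
first_report = {"selection": 3, "follow_up": 1.5}
later_report = {"selection": 0, "follow_up": 3}
study_scores = combine_reports([first_report, later_report])
```

Here the study receives full credit for both items, because each was adequately reported somewhere.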

The method for calculating an overall quality score for a randomized control trial is as follows. A score is assigned to each item on each form when the item is applicable. An example of a non-applicable item would be placebo taste and appearance for a surgical trial. The total number of points earned becomes the numerator, the total number of potential (applicable) points the denominator. Items are assigned different weights depending upon their perceived importance. Randomization and blinding procedures and descriptions of subjects and treatments carry the greatest weight. An overall score is determined by adding the score of all three forms. The range of values for an overall score is from 0.00 to 1.00. To date, 286 miscellaneous RCTs in medicine have been evaluated using this system. The mean quality score for these studies is 0.46 ± 0.19. Some trials have had scores in the 0.90s, and some less than 0.10.
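The calculation described above, points earned divided by applicable points, can be sketched as follows. The item names and maximum point values in the example are illustrative assumptions and are not taken from the forms; `None` is used here as a hypothetical marker for a non-applicable item.

```python
def quality_score(earned, possible):
    """Overall quality score: points earned / applicable points.

    `earned` maps each item to the points assigned, or None when the
    item is not applicable. `possible` maps each item to its maximum.
    Non-applicable items are left out of both numerator and denominator.
    """
    applicable = [item for item, points in earned.items() if points is not None]
    denominator = sum(possible[item] for item in applicable)
    if denominator == 0:
        raise ValueError("no applicable items to score")
    numerator = sum(earned[item] for item in applicable)
    return numerator / denominator


# Illustrative weights only: a surgical trial, so placebo taste does not apply.
possible = {"selection": 3, "placebo_taste": 1.5, "randomization_blind": 10}
earned = {"selection": 1.5, "placebo_taste": None, "randomization_blind": 10}
score = quality_score(earned, possible)  # 11.5 / 13
```

Because non-applicable items are dropped from the denominator, a surgical trial is not penalized for lacking a placebo, and the score always falls between 0.00 and 1.00.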

Discussion

The criteria and scoring method for a system to evaluate the quality of randomized control trials (RCTs) in dental research based on published reports have been presented. The score determined for a study reflects the information available in the report and may, therefore, not be an entirely accurate assessment of what was done in the trial. It is, however, the only information available to readers upon which to make judgments about the research and findings. Evaluation of the quality of RCTs is the first step in efforts to combine data from a number of similar trials in meta-analysis. It is useful in situations where trials are undersized, have conflicting results, or are inconclusive. Assessing the quality of RCTs can also contribute to improved study design, implementation, and reporting. These improvements will afford better interpretation and comparison of results and maximize the benefits obtained from clinical research.

Acknowledgements

This work was supported by Grant No. LM03116 from the National Library of Medicine and Grant No. 1 R03 HS 05138-01 from the National Center for Health Services Research.

References

Blackburn, B. A., Smith, H. Jr. & Chalmers, T. C. 1982. The inadequate evidence for short hospital stay after hernia or varicose vein stripping surgery. The Mount Sinai Journal of Medicine 49: 383-390.

Chalmers, T. C., Celano, P., Sacks, H. C. & Smith, H. Jr. 1983. Bias in treatment assignment in controlled clinical trials. New England Journal of Medicine 309: 1358-1361.

Chalmers, T. C. and discussants. 1976. How to turn off an experiment. In: Ethical Safeguards in Research on Humans, ed. Cooper, J. D. & Lley, H. D. pp. 119-143. Washington, D.C.: Interdisciplinary Communications Associates.

Chalmers, T. C., Smith, H. Jr., Ambroz, A., Reitman, D. & Schroeder, B. J. 1978. In defense of the VA randomized control trial of coronary artery surgery. Clinical Research 26: 230-235.

Chalmers, T. C., Smith, H. Jr., Blackburn, B., Silverman, B., Schroeder, B., Reitman, D. & Ambroz, A. 1981. A method for assessing the quality of a randomized control trial. Controlled Clinical Trials 2: 31-49.

Chilton, N. W. & Barbano, J. P. 1974. Guidelines for reporting clinical trials. Journal of Periodontal Research 9: (suppl. 14) 207-208.

DerSimonian, R., Charette, J., McPeek, B. & Mosteller, F. 1982. Reporting on methods in clinical trials. New England Journal of Medicine 306: 1332-1337.

Freiman, J. A., Chalmers, T. C., Smith, H. Jr. & Kuebler, R. R. 1978. The importance of beta, the type II error and sample size in the design and interpretation of the randomized control trial. New England Journal of Medicine 299: 690-694.

Levenstein, M. J. & Bishop, Y. M. M. 1981. Analysis and reporting as causes of controversies. In: Controversies in Clinical Care, ed. Rosenoer, V. M. & Rothschild, M., Ch. 1 pp. 1-24. New York: Spectrum.

Mosteller, F. 1979. Problems of omission in communications. Clinical Pharmacology and Therapeutics 25: 761-764.

Mosteller, F., Gilbert, J. P. & McPeek, B. 1980. Reporting standards and research strategies for controlled trials. Controlled Clinical Trials 1: 37-58.

O'Fallon, J. R., Dubey, S. D., Salsburg, D. S., Edmonson, J. K., Soffer, A. & Colton, T. 1978. Should there be statistical guidelines for medical research papers? Biometrics 34: 687-695.

Peto, R., Pike, M. C., Armitage, P., Breslow, N. E., Cox, D. R., Howard, S. U., Mantel, N., McPherson, K., Peto, J. & Smith, P. G. 1977. Design and analysis of randomized clinical trials requiring prolonged observation of each patient. II. Analysis and examples. British Journal of Cancer 35: 1-39.

Address:

Alexia A. Antczak
Department of Health Policy and Management
Harvard School of Public Health
677 Huntington Avenue
Boston, Massachusetts 02115
U.S.A.


Appendix

FORM A

Study:
Reader:

1.1 Author(s)

1.2 Title

1.3 Journal

1.4 Volume and Year

1.5 Procedure(s) evaluated

1.6 Biostatistician
    a. Author
    b. Credits
    c. Neither

1.7 Country
    a. U.S.
    b. U.K.
    c. Scand.
    d. Other/Unknown

1.8 Randomize
    a. Patient
    b. Mouth

1.9 Cost and Time Data
    a. Specific
    b. General
    c. None

1.10 Global Enthusiasm of Authors
    a. Enthusiastic
    b. Moderate
    c. Equivocal
    d. Negative

1.11 Sampling of Patient Population
    a. Dental school patients
    b. Dental students
    c. Private practice patients
    d. School children
    e. Other


FORM B: STUDY PROTOCOL

Reader:

2.1 Selection Description (points: 3 / 1.5 / 0)
    a. adequate
    b. fair
    c. inadequate

2.2 Rejection Log (points: 3 / 1.5 / 0)
    a. yes
    b. partial
    c. no/unknown

2.3 Therapeutic Regimen Definition (points not legible in source)
    a. adequate
    b. fair
    c. inadequate

2.4 Placebo/Control Appearance (points not legible in source)
    a. same
    b. different
    c. unstated
    d. not applicable

2.5 Placebo/Control Taste (points not legible in source)
    a. same
    b. different
    c. unstated
    d. not applicable

2.6 Follow-up Schedule (points: 3 / 1.5 / 0)
    a. adequate
    b. fair
    c. inadequate

2.7 Test of Adherence to Treatment (points not legible in source)
    a. adequate
    b. fair
    c. inadequate

2.8 Randomization Blind (points: 10 / 5 / 0 / 0)
    a. yes
    b. partial
    c. fair
    d. inadequate

2.9 Patient Blinded (points: 8 / 0)
    a. yes
    b. no/unknown
    c. not applicable

2.10 Observer Blind to Treatment (points: 8 / 4 / 0)
    a. yes
    b. partial
    c. no/unknown
    d. not applicable

2.11 Observer Blind to Results (points not legible in source)
    a. yes
    b. partial
    c. no/unknown

2.12 Testing Randomization (points: 3 / 1.5 / 0)
    a. adequate
    b. fair
    c. inadequate

2.13 Testing Blinding (points not legible in source)
    a. adequate
    b. fair
    c. inadequate
    d. not applicable

2.14 Stopping Rules (points: 3 / 1.5 / 0)
    a. adequate
    b. fair
    c. inadequate

2.15 Prior Estimate of Sample Size (points: 3 / 1.5 / 0)
    a. yes, reported
    b. partial
    c. no/unknown

2.16 Error Measurement (points: 3 / 1.5 / 0)
    a. adequate & blinded
    b. partial
    c. no/unknown

Total Possible Points: 63


FORM C: DATA ANALYSIS AND PRESENTATION

ID:
Reader:

3.1 Dates of the Study (points: 2 / 0)
    a. yes
    b. no

3.2 Results of Randomization
    A. Data analysis (points: 2 / 1 / 0)
        a. adequate
        b. fair
        c. inadequate
    B. Prognostically favors
        a. treatment
        b. controls
        c. equivocal
        d. unknown

3.3 Major Endpoints (points not legible in source)
    a. test & p
    b. p, no test
    c. test, no p
    d. neither

3.4 Post Beta Estimate (points: 3 / 1.5 / 0)
    a. yes
    b. mention
    c. no
    d. not applicable

3.5 Confidence Limits (points not legible in source)
    a. yes
    b. no
    c. not applicable

3.6 Life Table/Repeat Measurement (points not legible in source)
    a. adequate
    b. fair
    c. inadequate
    d. not applicable

3.7 Timing of Events (points: 4 / 2 / 0)
    a. complete
    b. available
    c. neither

3.8 Regression/Correlation Analysis (points: 2 / 0)
    a. yes
    b. no
    c. not applicable
    Which?
    a. regression
    b. correlation
    c. neither

3.9 Statistical Analysis (points not legible in source)
    a. excellent
    b. good
    c. fair
    d. poor

3.10 Withdrawals (points not legible in source)
    a. listed
    b. none
    c. no list/unknown
    d. >15%

3.11 Handling Withdrawals (points: 4 / 2 / 1 / 0)
    a. several ways
    b. include in original
    c. discard
    d. change group/unknown
    e. not applicable

3.12 Side Effects Discussion (points: 3 / 1.5 / 0)
    a. adequate
    b. fair
    c. poor
    d. not applicable

3.14 Retrospective Analysis (points not legible in source)
    a. good
    b. partial
    c. none
    d. not applicable

Total Possible Points: 40
