Journal of Consulting and Clinical Psychology, 1983, Vol. 51, No. 1, 42-53

Copyright 1983 by the American Psychological Association, Inc. 0022-006X/83/5101-0042$00.75

Comparative Therapy Outcome Research: Methodological Implications of Meta-Analysis

David A. Shapiro and Diana Shapiro
Medical Research Council / Social Science Research Council

Social and Applied Psychology Unit, University of Sheffield, Sheffield, England

Findings obtained in the course of a meta-analysis of 143 outcome studies, published over a 5-year period, in which 2 or more treatments were compared with a control group, are used to evaluate the quality of such research. The statistical conclusion and internal validity of the research reviewed are generally satisfactory, although construct validity is insufficient to rule out the influence of nonspecific and demand effects. The construct and external validity of the work reviewed are severely limited by its unrepresentativeness of clinical practice. Several studies are marred by nonreproducible accounts of treatments, inadequate description of clients, therapists, and design features, and faulty data presentation. It is concluded that this meta-analysis underlines the urgent need for greater methodological diversity and clinical realism in therapy research.

Recent reviews of comparative psychotherapy-outcome research (Frank, 1979; Luborsky, Singer, & Luborsky, 1975; Smith & Glass, 1977; Smith, Glass, & Miller, 1980) have converged on the conclusion that the widely differing available therapies are modestly, but equally, effective. However, this view has been contested by proponents of behavioral methods (Eysenck, 1978; Kazdin & Wilson, 1978; Rachman & Wilson, 1980). These authors assert that such a conclusion is contradicted by the results of well-conducted studies favoring behavioral methods (Bandura, 1977; Franks & Wilson, 1978; Rachman & Hodgson, 1980) and owes its support to the above-cited reviewers' ill-conceived aggregation of data from unsound research studies.

Portions of this study were presented at the annual meeting of the Society for Psychotherapy Research, Aspen, Colorado, in June 1981.

Diana Shapiro is now with the North Derbyshire District Psychology Service.

We gratefully acknowledge helpful comments on this work from Chris Barker, Gene Glass, Paul Jackson, Mary Lee Smith, Peter Warr, and an anonymous reviewer.

Requests for reprints should be sent to David A. Shapiro, MRC/SSRC Social and Applied Psychology Unit, The University, Sheffield, S10 2TN, England.

The reviews by Smith and Glass (1977) and Smith et al. (1980) used meta-analysis, the numerical combination of data from independent studies, to generate collective summary statistics pertaining to the research question to which they are addressed (Glass, McGaw, & Smith, 1981). The method involves the application of the methodological principles of empirical research to the literature review process, in pursuit of superior objectivity and dependability. Beyond the field of therapy, the method has been applied to such problems as the effects of school class size on attainment (Glass & Smith, 1979), the relationship between social class and achievement (White, 1977), interpersonal expectancy or "experimenter effects" (Rosenthal & Rubin, 1978), self-serving bias in interpersonal influence situations (Arkin, Cooper, & Kolditz, 1980), and sex differences in conformity (Cooper, 1979). Elements of the method include calculation and aggregation across studies of indexes of statistical significance or magnitude of effect, systematic and often exhaustive search procedures, coding of objective and qualitative features of the source studies, and investigation of the correlates of study outcome via disaggregation and regression analysis (Shapiro & Shapiro, 1982a).
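To make the core computation concrete, the sketch below applies a Glass-style standardized mean difference (treated-group mean minus control-group mean, divided by the control-group standard deviation) to invented scores and then averages several such indexes. The function names and numbers are illustrative only and are not drawn from any source study.

    import statistics

    def glass_delta(treated_scores, control_scores):
        """Standardized mean difference: (treated mean - control mean) / control-group SD."""
        mean_diff = statistics.mean(treated_scores) - statistics.mean(control_scores)
        return mean_diff / statistics.stdev(control_scores)

    # Hypothetical posttreatment scores for one comparison (not data from any source study).
    treated = [14, 17, 12, 19, 15, 18]
    control = [11, 16, 10, 17, 12, 15]
    print(f"effect size for this comparison = {glass_delta(treated, control):.2f}")

    # Aggregation across studies is then, at its simplest, a mean of per-comparison effect sizes.
    effect_sizes = [0.4, 1.1, 0.8, 0.6]  # hypothetical per-study values
    print(f"mean effect size = {statistics.mean(effect_sizes):.2f}")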

A major criticism of meta-analysis concerns its aggregation of data from studies of diverse quality. Eysenck (1978), for example, invokes the computer scientist's dictum, "garbage in, garbage out." On the other hand, recent appraisals of meta-analysis (Cook & Leviton, 1980; Shapiro & Shapiro, 1982a; Strube & Hartmann, 1982) argue that it is preferable to retain data from studies of varying quality and then to determine empirically whether these yield differing results, rather than to discard data deemed "inferior" in terms of the reviewer's preferred methodological criteria. Furthermore, this inclusive strategy also yields additional benefits. By coding substantive and methodological features of a large and representative set of studies, we may obtain a composite profile of a research literature. This is particularly useful in appraising the current state of the art in a research domain and guiding suggestions for further research. The present article is devoted to just such a purpose in respect to the comparative therapy-outcome literature, drawing upon our meta-analysis (Shapiro & Shapiro, 1982b).

The Shapiro and Shapiro Study

Our meta-analysis (Shapiro & Shapiro, 1982b) was designed in response to criticisms of the work of Smith and Glass (1977) advanced by Kazdin and Wilson (1978) and by Rachman and Wilson (1980). We took account also of methodological considerations in comparative-outcome research outlined by Mahoney (1978) and Kazdin (1978). Our aim was to appraise the extent to which an unbiased sample of relatively well-designed, recently published comparative-outcome studies supports the case for differential effectiveness of diverse therapy techniques. Consistent with the emphasis of Mahoney (1978), we confined our attention to studies following a contrast-group design, in which clients are assigned to two or more treatment groups and one or more control groups. This design was followed only by a small minority (less than 10%) of studies concerned with treatment outcomes. Studies were excluded from our meta-analysis if they compared only one active treatment with one or more control groups, if they compared supposedly active treatments without any no-treatment or minimal-treatment groups, or if treatments could only be compared in their data via own-control comparisons, as in multiple-baseline or crossover designs. We confined our search to Psychological Abstracts, 1975-79. No reference was made to review articles or bibliographies, in order to avoid the sampling bias such sources might introduce. Our research yielded 143 studies meeting the above design requirements and presenting sufficient data to permit calculation or estimation of effect sizes; only 21 (15%) of these studies were also included by Smith et al. (1980).

Before considering the methodological implications of this study, an outline of its substantive findings is in order. Consistent with previous reviews (Smith & Glass, 1977; Smith et al., 1980), the mean effect size approached 1 standard deviation unit, and differences among treatment methods accounted for at most 10% of the variance in effect size. The impact of differences between treatment methods was outweighed by the combined effects of other variables, such as the nature of the target problem under treatment (accounting for some 20% of the variance), aspects of the measurement methods used to assess outcome, and features of the experimental design. The methodological implications of these nontreatment influences upon outcome will be considered below. Multiple regression analysis suggested that differences between treatments were largely independent of these other factors, however. Direct comparisons between pairs of treatments figuring together in the same subsets of the data suggested some consistent differences, with cognitive and certain multimodal behavioral methods yielding favorable results. Dynamic and humanistic methods were sparsely represented in the data and were associated with apparently modest effects.

Issues Addressed

Meta-analysis cannot transcend the limitations of the data upon which it is based. It can but hold a mirror to the scientific community, summarizing the conclusions and the quality of the available evidence concerning the substantive questions at issue. In evaluating the quality of the comparative, controlled-outcome literature encompassed by our meta-analysis, we conceptualize the dependability of conclusions from research in terms of Cook and Campbell's (1979) discussion of validity. These authors have subdivided the concepts of internal and external validity, familiar from Campbell and Stanley's (1963) treatment, into four types. Statistical conclusion validity refers to the validity with which a study permits conclusions about covariation between the assumed independent and dependent variables. Threats to statistical conclusion validity typically arise from unsystematic error rather than systematic bias. Internal validity refers to the validity with which statements can be made about whether there is a causal relationship from independent to dependent variables, in the form in which they were manipulated or measured. Threats to internal validity typically involve systematic bias. Construct validity of putative causes and effects refers to the validity with which we can make generalizations about higher-order constructs from research operations. Finally, external validity refers to the validity with which conclusions can be drawn about the generalizability of a causal relationship to and across populations of persons, settings, and times. In addition, we are concerned with reporting adequacy, the extent to which accounts of the procedures and circumstances of each study, and the data obtained, meet minimal requirements of precision and replicability.

The methodological concerns in treatment evaluation raised by Mahoney (1978) and Kazdin (1978) are usefully subsumable within the Cook and Campbell (1979) scheme. For example, Kazdin (1978) urges that the clinical-analogue continuum be analyzed into component dimensions relating to target problem; population; recruitment; therapists, selection, set and setting; treatment; and assessment. Study parameters bearing upon each of these dimensions were coded within our meta-analysis, thus permitting a thorough analysis of the external validity of the research reviewed therein. Since our meta-analytic coding was designed to refine the methods of Smith and Glass (1977) within resource constraints, we do not have data bearing upon every concern voiced by Cook and Campbell (1979), Mahoney (1978), and Kazdin (1978). On the other hand, our data are sufficiently detailed to permit some consideration of all four validity types and to yield important conclusions concerning the current state of the art in outcome research (Frank, 1979; Shapiro, 1980).

Statistical Conclusion Validity

A central issue in statistical conclusion validity concerns statistical power. The greater the sample size, the greater the power of the study to yield a significant effect (Cohen, 1977; Kraemer, 1981). For example, Kraemer (1981) reports that a study with 10 subjects in each of two groups has a 68% probability of detecting an effect size of 1 (group Ms separated by 1 SD) via a two-sample, one-tailed t test with a 5% alpha level. Thus the choice of sample size depends on the magnitude of effect the investigator wishes to be able to detect. The 414 treated groups in our meta-analysis contained a mean of 11.98 (SD = 7.12) clients; the 143 control groups contained a mean of 12.12 (SD = 6.64) clients. Forty-two (10%) of the treated groups contained six or fewer clients. Of the treated groups, 263 (64%) contained 10 or more clients and 115 (28%) contained 13 or more clients. If we assume that an effect size of 1 is a reasonable expectation in outcome research, these data suggest that the majority of our published, controlled comparative-outcome studies use adequate, although not impressive, sample sizes. Since the probability of publication is enhanced by the attainment of statistical significance (Shapiro & Shapiro, 1982a), one might expect that published findings with small samples would tend to show larger effects than more powerful studies with larger samples, capable of statistical significance with a smaller effect. Consistent with this expectation, correlations of -.14 and -.21, respectively, were obtained between treated-group and control-group N and effect size.
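The power figure quoted from Kraemer (1981) can be checked from the noncentral t distribution. The short calculation below is our own illustration of that arithmetic, using the scipy library, and it reproduces approximately the same value.

    from math import sqrt
    from scipy import stats

    def power_two_sample_t(effect_size, n_per_group, alpha=0.05):
        # One-tailed power of a two-sample t test, via the noncentral t distribution.
        df = 2 * n_per_group - 2
        noncentrality = effect_size * sqrt(n_per_group / 2)  # d * sqrt(n1*n2 / (n1 + n2)) with n1 = n2
        t_critical = stats.t.ppf(1 - alpha, df)
        return 1 - stats.nct.cdf(t_critical, df, noncentrality)

    # Kraemer's (1981) example: 10 subjects per group, effect size of 1 SD, alpha = .05.
    print(f"power = {power_two_sample_t(1.0, 10):.2f}")  # close to the 68% figure quoted above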

Uncontrolled variation in the way a treatment is implemented presents a threat to statistical conclusion validity. In our meta-analysis, we examined the precision with which treatment methods were described, assessed in terms of the extent to which the source paper permitted reproduction of the method by other workers. Three levels of reproducibility were defined. Of the groups included in our analysis, 159 (38%) received treatments that were well described in the text or in documents available from the source-study author or in other cited work. These treatments were either fully manualized or used well-defined and standard procedures that any recently trained clinical psychologist might be expected to replicate on the basis of the information supplied. At the opposite extreme, 47 (11%) of the groups were treated by methods that could not be reproduced with any precision on the basis of available information. Between these extremes lay the 208 (50%) groups receiving treatments given outline descriptions or adapted from methods that were well described elsewhere. Thus, almost two-thirds of our data were based on treatments that were not well specified. Even among the 38% that were coded as highly reproducible, full manuals were available in only a minority of cases. Thus, our standards in coding this variable were not exacting. Of course, it must be acknowledged that some treatments are intrinsically more readily specifiable than others. Nonetheless, the majority of the literature reviewed is deficient in this respect. However, the impact of this variable upon outcome was minimal, r(412) = .03, p > .10.

A related requirement is the monitoring of the actual carrying out of experimental procedures in order to check that the independent variables are manipulated as intended. So rarely was this done in the studies under review that it did not occur in the sample of the studies on which the coding system was developed, and hence was not coded in the meta-analysis proper. This virtually total neglect by researchers of the need to monitor treatments is a source of grave concern:

One need not lack integrity or competence to exhibit variability in the administration of even the most simple experiment, and these variations may figure prominently in the interpretation of one's data. (Mahoney, 1978, p. 665)

Internal Validity

A major safeguard against bias and consequent lack of internal validity is the random assignment of clients to treatment conditions. Only 10% of the client groups within our meta-analysis were not randomly assigned. Although somewhat controversial (Mahoney, 1978, p. 664), procedures in which systematic constraints were imposed on otherwise random assignment (either to assure matched group means or to match individual cases), as adopted in 32% of the groups under study, were considered to result in at least as much control over extraneous sources of variation as achieved by unconstrained randomization. It is of some interest to note that the 57% of groups using unconstrained randomization yielded an overall effect size typical of the study as a whole, whereas the poorest designs (failing to randomize) yielded a somewhat reduced effect size, M = .76, although there was no significant effect of the assignment variable upon effect size, F(3, 384) = 1.54, p > .10.

A related design feature is client attrition; if high, this can introduce bias in the comparisons between groups. With a mean of 10.68% of clients lost from the treated group (SD = 13.49) and a mean of 9.19% of clients lost from the control group (SD = 13.56), attrition was commendably low in most cases. Similarly, 75% of the effect sizes were obtained from groups showing no significant differences on any measure at pretreatment testing, and the adequacy of pretreatment equivalence was, if anything, positively correlated with obtained effect size, r(1,401) = .07, p < .05. In these respects, therefore, we may conclude that most studies are well designed and executed and that the overall results are, if anything, attenuated by the presence of inferior studies.

Construct Validity

Representativeness of Treatments

Cook and Campbell (1979) consider the adequacy with which the independent variable is represented within a study as an important aspect of construct validity. Several aspects of treatment as the independent variable are germane to the issue of representativeness. Behavioral and cognitive therapies were overwhelmingly predominant, with only 5% of the groups receiving dynamic or humanistic methods. As reported in detail by Shapiro and Shapiro (1982b), these latter methods yielded relatively modest effect sizes, implying that the overall mean effect size may be inflated relative to clinical practice by the disproportionate preponderance of behavioral and cognitive methods. Any inference concerning the relative efficacy of behavioral and nonbehavioral methods is, however, subject to the important cautionary limitations detailed by Shapiro and Shapiro (1982b).

Only 166 (41%) of the treated groups were seen individually; the majority, 210 (52%), were treated entirely in groups. Individual therapy was associated with larger effect sizes, r(405) = .15, p < .01. Thus it appears that the preponderance of group treatments may have attenuated the overall effect size obtained in the meta-analysis. Therapists typically had some 3 years' experience, M = 2.91; this average level represents advanced graduate students, rather than fully qualified and established professionals. However, the correlation between therapist experience and effect size was apparently negative, r(269) = -.14, p < .05, although this association was probably due to the tendency of researchers to employ more experienced therapists when the target problems were less readily treated. The correlation was abolished by partialling out the effects of dummy variables representing target problems (Shapiro & Shapiro, 1982b). It is nonetheless of some interest that no positive association occurred with therapist experience, even after making allowance for the tractability of the treatment target. The typical duration of treatment, M = 6.89 hours, might appear unrepresentatively short in comparison with clinical folklore. However, Garfield (1978) reports that traditional expectations in this regard are at variance with the empirical data, which indicate that most clinic clients remain in therapy for only a few interviews. On the other hand, there may be important differences between premature termination of a therapy expected to extend for several months and participation in the designedly brief therapy typical of our meta-analysis. The duration of therapy was uncorrelated with effect size, r(394) = .05, p > .10.

Control for Nonspecific Effects

Significantly, Cook and Campbell (1979, p. 60) cite the placebo effect and the introduction of the double-blind experimental design as their first example of a construct validity problem and the efforts of researchers to overcome it. Within the literature on psychological treatment, extensive discussion and research has focused on the comparative credibility of treatment and control conditions (Bootzin & Lick, 1979; Kazdin, 1979; Kazdin & Wilcoxon, 1976; Lick & Bootzin, 1975; Shapiro, 1981; Wilkins, 1979a, 1979b). Ideally, construct-validity considerations require the demonstration that the theoretical mechanisms underlying the differential efficacy of treatments correspond to those postulated by the treatment's rationale. Thus, all treatments under comparison, including control conditions, should be of demonstrably equal credibility to the client, so as to eliminate differential client expectation of benefit as the origin of outcome differences. Furthermore, the impact of investigator expectancies should be reduced via blind assessment procedures (client-change data obtained by personnel unaware of the treatment assignment of individual clients) and the use of measurement methods that are minimally reactive to the demand characteristics of the assessment situation. How well does the research reviewed in our meta-analysis meet these requirements?

Of the 143 studies reviewed, 70 (49%) included at least one control group receiving some form of minimal or placebo treatment. This leaves fully one half of the studies making no attempt to control for participation and expectancy effects, except insofar as these can be considered controlled for in the comparison between active treatments. The absolute (as distinct from relative) impact of treatment is only hazardously inferred from studies lacking such controls.

Kazdin and Wilcoxon (1976) have reviewed strategic issues in the design of experimental procedures to control for nonspecific treatment effects. These authors argue that the most rigorous procedure involves empirical demonstration that the control condition generates client expectancies for change comparable to those generated by the purportedly active treatment under investigation. Unfortunately, however, this empirical credibility matching was encountered in only four of our source studies, although several authors described the construction of placebo conditions with a view to enhancing credibility, without presenting data to demonstrate their success in this endeavour.


In the absence of empirical credibility matching, Kazdin and Wilcoxon advocate a treatment-element control strategy, in which the control procedure resembles the active treatment as closely as possible. The least preferred strategy reviewed by these authors is the attention placebo method, wherein any procedure is designated by the experimenter as a control for nonspecific treatment effects. In reviewing the minimal treatment conditions employed within our source studies, we found two additional classes: Minimal-contact conditions were confined to supplying information or instructions, usually with less therapist contact than in the active treatment groups; counseling/discussion control groups were considered separately from other placebo control groups where these were clearly designated as placebos by the source-study author, in a design contrasting these with behavioral treatments and with no articulated theoretical rationale sufficient to warrant coding as an active, dynamic/humanistic method.

In Table 1, minimal treatments of the four types outlined above are listed separately, according to whether they figured as a treatment group for which effect sizes were obtained by comparison with another (usually no-contact) control group or as the control condition used in the calculation of effect sizes for that study. The table shows that there were only 15 instances of the treatment-element strategy favoured by Kazdin and Wilcoxon (1976); the most frequent strategy was the attention placebo (32 groups), of which the counseling/discussion control (12 groups) may be considered a variant. Table 1 shows that, considered as treatment groups, the four control strategies yielded similar effect sizes, with the exception of the minimal-contact groups, whose mean was inflated by one group obtaining a mean effect size of 3.22. To the right of Table 1 are shown the mean effect sizes obtained by all treatment groups in each study whose control condition for effect-size calculation fell in each of the four classes of minimal control group. The larger these means, the greater the advantage enjoyed by the treatment groups over the control condition of interest here. Once again, there are few differences, the apparently greater disadvantage of the counseling/discussion groups being due to one group against which a mean effect size of 4.68 was recorded. Overall, we conclude that the choice of control groups in the source studies was somewhat unsatisfactory. In terms of the discussion by Kazdin and Wilcoxon (1976), only a small minority of our source studies followed preferred strategies for eliminating the hypothesis that nonspecific effects may account for the differences obtained between treated and untreated clients. On the other hand, there is no evidence from these data that stronger controls for nonspecific effects result in apparently weaker treatment effects, as would be expected were the apparent effects of treatment inflated by uncontrolled effects of this kind.

Table 1
Minimal Treatments Within Meta-Analysis

                              As treatments                      As control conditions
Type                          N of groups  Effect size  SD       N of groups  Effect size  SD
Minimal contact                    5          1.37      1.12          9           .61       .36
Attention placebo                 19           .50       .67         13          1.04       .54
Counseling/discussion              8           .50       .48          4          1.59      2.06
Treatment element                  6           .54       .58          9           .74       .83

Outcome Assessment

Measures were administered blind in the case of 530 (53%) of the outcome assessments within the meta-analysis. Nonblind assessors, people who knew which group each client was in but did not themselves act as therapists, administered 183 measures (18%). We consider that such assessors are vulnerable to bias, but to a somewhat lesser degree than the therapists themselves. The therapist treating the client administered 293 (29%) measures. Thus, over one half of our data are vulnerable to criticism on this score, a fact whose importance is heightened by the significant correlation of our "blindness" measure with effect size, r(1,004) = -.10, p < .01, indicating larger measured effects of treatment with nonblind assessment. The vulnerability of the measures to demand characteristics was assessed via a modification of Smith et al.'s (1980) reactivity scale. Only 192 (11%) of the measures were minimally reactive (physiological measures). The majority (1,172, or 64%) of measures were coded as slightly reactive (blind behavioral ratings and standardized tests). Measures coded as highly reactive, comprising such measures as therapist ratings and simple self-reports, numbered 464. There was a significant tendency for more reactive measures to yield larger effect sizes, r(1,826) = .11, p < .01. Taken together with the data on blinding, these results indicate that demand characteristics are free to influence outcome in many studies and that they do indeed appear to inflate the estimates of effectiveness obtained to some degree.

In outcome research, construct-validity issues arise in relation to choosing dependent variables and putting them into operation. Our coding of the specificity of outcome measurement reflects the degree to which this directly taps the target problem under treatment (Agras, Kazdin, & Wilson, 1979, p. 68; projective measures = 1; standardized trait measures = 2; directly related psychological measures, or physiological measures where the target is nonphysiological, = 3; behavioral tests or physiological measures of physiological targets = 4). As shown in Table 2, the majority of outcome measures were highly specific to the goals of treatment. The impact of specificity upon effect size was highly significant, F(3, 1824) = 15.83, p < .001. More powerful effects were associated with relatively specific measures, although the relationship was not linear. These results are generally encouraging, at least from the standpoint of the behaviorally oriented investigator. It appears that the overall impact of treatment in our meta-analysis is somewhat attenuated by the minority of measures that are not closely related to the target problem. On the other hand, personologists might point to the relatively modest effects of treatment upon standardized trait measures as a limitation on the depth and generalization of its impact.

Table 2
Specificity of Outcome Assessment and Effect Size

Specificity rating      N       %     Mean effect size     SD
1                         2      0          -.07            .03
2                       294     16           .55            .67
3                     1,091     60          1.05           1.19
4                       441     24           .88           1.29

Note. Specificity ratings are defined in the text.

A further important aspect of assessment is its modality. Conventional wisdom favors multimodal assessment, the use of measures based on diverse technologies, which counters the threat to construct validity posed by reliance upon a single assessment modality. In our meta-analysis, these were coded into seven types (psychometric measures, self-ratings, direct behavioral measures, indirect behavioral ratings, "real-life" nonbehavioral data, physiological measures, and projective tests). As shown in Table 3, most studies confined themselves to only one or two of these modes of assessment, despite the mean 4.42 outcome measures obtained in each study. This limitation is somewhat unsatisfactory, although Table 3 shows that the number of assessment modes was not related to the mean effect size obtained for each group, F(4, 409) = 1.33, p > .25. These findings are subject to the minor qualification that some outcome measures obtained in the source studies were discarded because authors did not supply any data; but there is no reason to believe that such measures were any more diverse in respect to modality than those reported in full.

Table 3
Multimodal Assessment and Effect Size

N (modes)    N (studies)    N (groups)    Mean effect size     SD
1                 64            177             1.02           .84
2                 54            167              .89           .53
3                 21             54             1.16           .88
4                  3             12              .87           .41
5                  1              4              .88           .11


Outcome research should demonstrate the maintenance of treatment effects over time. Of our data, 77% were obtained immediately posttreatment, and only 6% were obtained 4 or more months after the end of treatment. Although there was no relationship between extent of follow-up and the effect size obtained, F(3, 1824) = 1.57, p > .10, the mean follow-up of .79 months is clearly unsatisfactory, indicating that controlled comparative-outcome research of the kind reviewed in our meta-analysis has generally failed to address the issue of maintenance of therapy gains. In mitigation, however, it should be noted that a few longer term assessments were excluded from the meta-analysis because data from untreated controls were no longer available. Practical and ethical constraints militate here against the demonstration of longer term treatment effects, particularly in clinical populations.

External Validity

Cook and Campbell (1979) define external validity in relation to problems of generalizing to particular target persons, settings, and times, and of generalizing across types of persons, settings, and times. In the present context, the central issue is that of representativeness: To what extent do the persons and settings of comparative-outcome research represent those of clinical practice?

Concerning representativeness of samples, the practical constraints upon researchers can lead to sampling procedures that limit the generalizability of the findings. Only 67 (17%) of the treated groups comprised individuals who had sought the aid of a clinician or whose problems were of clinical severity. The participation of all but 42 (11%) of the groups was solicited by the investigators. Of the groups, 238 (61%) had a mean age of 20 years or less; the great majority were undergraduate students; 282 (82%) of the groups were university or college educated. Only 35 (6%) of the groups comprised individuals who had received a psychiatric diagnosis. Only 30 groups (7%) presented with anxiety and depression similar to the generalized neurotic problems commonly seen by clinicians; in contrast, 126 (30%) of the groups were treated for performance anxieties such as test and public-speaking anxiety, and 106 (26%) were treated for physical and habit disorders. These data point to serious limitations upon the external validity of the studies reviewed.

Table 4 shows the correlations of client variables, aside from target problem, with effect size. Kazdin (1978) argues against the unquestioning assumption that analogue studies yield an inflated estimate of treatment efficacy. In respect to client characteristics, for example, he suggests that the "real" clinical client may be more highly motivated to achieve improvement, thus evincing more compliance and positive expectancy than students solicited to participate in return for course credit. Table 4 suggests little difference in outcomes between clinical and nonclinical populations, between solicited and other participants, or between young, highly educated (i.e., college student) samples and other participants. On the other hand, those few studies of patients receiving a formal psychiatric diagnosis yielded significantly lower effect sizes than obtained for the nondiagnosed participants involved in the great majority of studies reviewed. Furthermore, differences in effect size among clients with different target problems were highly significant, F(29, 384) = 3.53, p < .001, and accounted for over 20% of the variance in effect size, eta² = .21. As shown in Table 5, the largest effects were obtained for phobias, M = 1.28, and the smallest for anxiety and depression, M = .67, with intermediate results for physical and habit problems, sexual and social problems, and performance anxieties. Thus it does appear that the overall efficacy of treatments included in our review is inflated by the underrepresentation within the data of clinical populations suffering depression and anxiety, which are typically the most frequent recipients of psychotherapy in practice.

Table 4
Correlations of Client Variables With Effect Size

Variable                              r      N of groups
Severity                             .04         393
Source (solicited vs. others)        .05         394
Age (years)                          .04         393
Education (high vs. others)          .00         344
Diagnosis (present vs. absent)      -.09*        414

* p < .05.

Table 5
Target Classes and Effect Size

Target class                       N      %    Mean effect size     SD
Anxiety and depression            30      7          .67            .62
Phobias                           76     18         1.28            .88
Physical and habit problems      106     26         1.10            .85
Social and sexual problems        76     18          .95            .75
Performance anxieties            126     30          .80            .71
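The variance-accounted-for figure reported above can be recovered from the F ratio and its degrees of freedom alone. The brief check below is an illustration of that arithmetic rather than part of the original analysis.

    def eta_squared_from_f(f_value, df_between, df_within):
        # One-way ANOVA: eta^2 = SS_between / SS_total = F*df_between / (F*df_between + df_within).
        return (f_value * df_between) / (f_value * df_between + df_within)

    # Target-problem effect reported above: F(29, 384) = 3.53.
    print(f"eta squared = {eta_squared_from_f(3.53, 29, 384):.2f}")  # approximately .21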

An important aspect of representativeness concerns the setting in which treatment is carried out and assessments of outcome are made. The 71% of our groups that were treated in laboratory settings yielded data that were unrepresentative in two respects: Not only was the treatment itself carried out in a setting unlike that of clinical practice, but laboratory-based outcome assessments cannot be deemed representative of assessments made in clinical settings (Kazdin, 1978). Although nonlaboratory data yielded slightly smaller effect sizes, this difference was not significant, t(388) = 1.43, p > .10.

Reporting Adequacy

We note quite severe inadequacies of data presentation; for only 60% of the data were means and standard deviations presented within the source papers. A further 24% of effect sizes were calculated on the basis of supplied means together with error terms obtained from analysis of variance and related statistics; the remaining 16% were inferred from minimal data, such as probability values and sample sizes, or by probit transformation of dichotomous data (Glass et al., 1981). Thus, a substantial minority of these published treatment-outcome studies failed to report basic descriptive statistics.
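Two of the estimation routes just mentioned can be sketched in a few lines. The helper functions below follow the general logic described by Glass et al. (1981) for recovering an effect size from a reported t value or from a probit transformation of dichotomous outcomes; the function names and example figures are ours and purely illustrative.

    from math import sqrt
    from scipy.stats import norm

    def d_from_t(t_value, n_treated, n_control):
        # Standardized mean difference recovered from an independent-groups t value.
        return t_value * sqrt(1.0 / n_treated + 1.0 / n_control)

    def d_from_proportions(p_treated, p_control):
        # Probit transformation of dichotomous outcomes (e.g., proportion of clients improved).
        return norm.ppf(p_treated) - norm.ppf(p_control)

    # Hypothetical reported results, for illustration only.
    print(f"d from t value     = {d_from_t(2.1, 12, 12):.2f}")
    print(f"d from proportions = {d_from_proportions(0.70, 0.40):.2f}")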

In addition to the problems raised by nonreproducible treatment methods, the failure to monitor treatment interventions, and the lack of evidence concerning the credibility of treatment and control conditions, many data on treatment, client, contextual, and design variables were missing from the meta-analysis as a result of simple reporting omissions within the source papers. These lacunae severely limited the number of variables that could be included within multiple regression analyses and, thus, hampered our attempts to examine the relationships among study parameters in their impact upon effect size. Table 6 shows that few studies specified whether or not clients were concurrently on medication; the blindedness or otherwise of the data gatherer was specified in a bare majority of cases. Several of the remaining major instances of missing data concern therapists; this may reflect the behavioral orientation of many of the source-study authors and a consequent lack of concern with the impact of the therapists. Many important design variables were afflicted by substantial proportions of missing values. The data of Table 6 represent quite conservative estimates of the extent of source-study authors' underreporting. They exclude data inferred during coding (such as the age of college student samples). Furthermore, our selection of parameters for coding within the meta-analysis was based on prior acquaintance with reporting standards current within the literature, together with pilot coding exercises. Missing values therefore represent the failure of source-study authors to meet quite modest reporting standards. Furthermore, most of the missing data would require minimal journal space, so that space restrictions cannot account for them, although brief reports were often needlessly scanty and imprecise in these matters.

Table 6
Missing Values

Variable                             % missing

Treatment variables
  Mode                                  1.7
  Therapist experience                 36.5
  Duration                              4.3

Client variables
  Education                            16.9
  Age                                   5.1
  Severity/screening                    5.1
  Source                                4.8

Contextual variables
  Setting                               6.0
  Medication                           88.4

Design variables
  Assignment of clients                 6.3
  Assignment of therapists             27.5
  N of therapists                      25.6
  Attrition of treated group           18.9
  Attrition of control group           18.7
  Pretreatment equivalence             23.2
  Blindedness of data gatherer         45.0

Conclusions

Our overall finding of an effect size approaching 1 standard deviation unit appears quite large, relative to findings in other domains of behavioral and social research (Cohen, 1977; Smith et al., 1980). But how good is the evidence upon which this estimate is based? The present review suggests that its statistical conclusion and internal validity are rather better than its construct and external validity. In respect to construct validity, the prevalence and somewhat inflated outcomes of reactive and nonblind outcome assessment, coupled with the general failure to control adequately for nonspecific treatment effects, give cause for concern. There is also insufficient multimodal outcome assessment. These deficiencies pale into insignificance, however, when set against the overwhelmingly unrepresentative nature of the data upon which our meta-analysis was performed. Typically, these studies concerned behavioral and cognitive therapies directed toward subclinical, focused target problems, such as performance and social anxieties, rather than generalized anxiety and depression. Data were usually obtained in laboratory settings, with a planned treatment duration of some 7 hours, with college student clients whose participation was solicited, and with the majority of the treatments conducted in groups by graduate student therapists. Some consolation may be derived from the generally small measured impact of these aspects of unrepresentativeness upon the effect sizes obtained. The fact remains that, on features relevant to all of Kazdin's (1978) dimensions of generality, most of these data are quite unrepresentative of clinical practice, and the quantity of evidence with acceptable generality is vanishingly small. The virtual absence of follow-up data is another factor seriously limiting the clinical utility of the findings. In respect to our final concern in the present review, the reporting adequacy of the source studies is somewhat mixed; although many treatments were quite highly reproducible, little attempt was made to monitor the actual conduct of therapy, and many data concerning the implementation of the research were gratuitously missing. A substantial minority of authors reported basic descriptive statistics inadequately.

Of course, the unrepresentativeness of our data might result from the relatively stringent design criterion (two or more active treatment groups plus a control) applied in selecting studies for consideration. For example, assignment to control conditions may be less feasible or considered ethically unacceptable with clinical populations, and longer term follow-up data on untreated controls are especially difficult to obtain. On the other hand, the less exacting design criteria of Smith et al. (1980), requiring only the comparison of two or more groups, did not result in much more representative data. Thus, it appears that contrast-design outcome research is indeed generally unrepresentative of clinical practice. This places severe limitations upon its implications for policy concerning service provision and training (Shapiro, 1980; VandenBos, 1980). The methodological profile of this research revealed by meta-analysis highlights a pressing need for the redirection of the efforts of investigators toward realistic clinical studies.

Several commentators have striven to understand the modest progress of current therapy research toward the goal of identifying effective ingredients within clinical practice (Agras et al., 1979; Frank, 1979; Garfield, 1981; Horowitz, 1982; Rachman & Wilson, 1980). Despite the varying emphases of these authors, the following common themes are underlined by this meta-analysis:

1. All research design necessarily involves compromise between conflicting priorities and requirements, in the context of resource constraints (Waskow, Note 1). For example, the design requirements of internal and construct validity may often conflict, in the practical implementation of a study, with the requirements of external validity. As revealed by the present review, contrast-design outcome research has been marred by the recurrent tendency of investigators to make the same compromises (e.g., sacrificing external and construct validity to internal validity). Our confidence in the data as a whole would be much greater had different investigators resolved their common dilemmas in more varied ways. Data overwhelmingly tending to share the same fault (such as the lack of effective control for nonspecific effects noted here) are more vulnerable to alternative explanations, in terms of the factors left uncontrolled by that fault, than data with more varied deficiencies requiring multiple alternative explanations for their interpretation to be challenged (Smith et al., 1980). Some of the blame for this uniformity of error must lie with sociocultural factors such as the reward and opportunity structures within which investigators conduct research (Agras et al., 1979).

2. It is costly of time and resources to conduct a clinically realistic contrast-design outcome study meeting current methodological requirements, and therefore research efforts must be diversified to include complementary research strategies (Agras et al., 1979; Horowitz, 1982). The analogue laboratory study of the type predominating within our meta-analysis should be viewed as but one such strategy. Other strategies sharing some claim to resolve issues of cause-effect relationships crucial to the identification of therapeutic ingredients include single-case experimental designs (Hayes, 1981; Hersen & Barlow, 1976) and experimental-process research, in which treatment techniques are systematically varied over the course of therapy and client changes in response to these are analyzed via both extensive and intensive designs (Shapiro & Shapiro, Note 2). Descriptive and correlational studies should also find a place within a methodologically diverse and evolving research effort, integrated via its sustained concern with a given scientific question (Agras et al., 1979).

3. The weaknesses of meta-analysis identified by critics, such as overgeneralization, indiscriminate inclusion of low-quality data, and idiosyncratic and unacceptable conclusions, are largely if not wholly inherent in the research endeavours reviewed by a meta-analysis (Shapiro & Shapiro, 1982a). The long-held and laudable aspiration of those behaviorally oriented investigators who have been most critical of meta-analysis (e.g., Eysenck, 1978; Rachman & Wilson, 1980) has been the development of a firm basis for clinical practice in replicated outcome studies. The establishment of such a research base depends upon generalizing across studies, and meta-analysis does this more systematically and sensitively than traditional literature reviews, especially where careful disaggregation is employed (Shapiro & Shapiro, 1982a; Strube & Hartmann, 1982). If the conclusions of such an analysis are uncongenial, this merely reflects what the literature in question can tell us. If the impact of therapy techniques upon outcome appears relatively modest, for example, then investigators should take such a finding seriously. They might consider methodological improvements, such as thorough efforts to control the ubiquitous nonspecific effects. Alternatively, they might pursue the substantive implications of the uncongenial result obtained. For example, experiments could be designed to identify and demystify the so-called "nonspecific" factors, together with other likely influences upon outcome, such as the client's readiness for change (Frank, 1979). If the image in the mirror displeases you, little is gained by throwing the mirror away.

Reference Notes

1. Waskow, I. E. Presidential address. Society for Psychotherapy Research, Oxford, England, July 1979.
2. Shapiro, D. A., & Shapiro, D. Comparative therapy outcome research: Reflections on meta-analysis. Paper presented at the London Conference of the British Psychological Society, December 1981.

References

Agras, W. S., Kazdin, A. E., & Wilson, G. T. Behavior therapy: Toward an applied clinical science. San Francisco: Freeman, 1979.
Arkin, R., Cooper, H., & Kolditz, T. A statistical review of the literature concerning the self-serving bias in interpersonal influence situations. Journal of Personality, 1980, 48, 435-448.
Bandura, A. Self-efficacy: Toward a unifying theory of behavioral change. Psychological Review, 1977, 84, 191-215.
Bootzin, R. R., & Lick, J. R. Expectancies in therapy research: Interpretive artifact or mediating mechanism? Journal of Consulting and Clinical Psychology, 1979, 47, 852-855.
Campbell, D. T., & Stanley, J. C. Experimental and quasi-experimental designs for research on teaching. In N. L. Gage (Ed.), Handbook of research on teaching. Chicago: Rand McNally, 1963. (Also published as Experimental and quasi-experimental designs for research. Chicago: Rand McNally, 1966.)
Cohen, J. Statistical power analysis for the behavioral sciences (Rev. ed.). New York: Academic Press, 1977.
Cook, T. D., & Campbell, D. T. Quasi-experimentation: Design and analysis for field settings. Chicago: Rand McNally, 1979.
Cook, T. D., & Leviton, L. C. Reviewing the literature: A comparison of traditional methods with meta-analysis. Journal of Personality, 1980, 48, 449-472.
Cooper, H. M. Statistically combining independent studies: A meta-analysis of sex differences in conformity research. Journal of Personality and Social Psychology, 1979, 37, 131-135.
Eysenck, H. J. An exercise in mega-silliness. American Psychologist, 1978, 33, 517.
Frank, J. D. The present status of outcome studies. Journal of Consulting and Clinical Psychology, 1979, 47, 310-316.
Franks, C. M., & Wilson, G. T. Annual review of behavior therapy: Theory and practice (Vol. 6). New York: Brunner/Mazel, 1978.
Garfield, S. L. Research on client variables in psychotherapy. In S. L. Garfield & A. E. Bergin (Eds.), Handbook of psychotherapy and behavior change (2nd ed.). New York: Wiley, 1978.
Garfield, S. L. Evaluating the psychotherapies. Behavior Therapy, 1981, 12, 295-307.
Glass, G. V., McGaw, B., & Smith, M. L. Meta-analysis in social research. Beverly Hills, Calif.: Sage Publications, 1981.
Glass, G. V., & Smith, M. L. The effects of class size on achievement. Journal of Education and Policy Studies, 1979, 7, 2-16.
Hayes, S. C. Single case experimental design and empirical clinical practice. Journal of Consulting and Clinical Psychology, 1981, 49, 193-211.
Hersen, M., & Barlow, D. H. Single case experimental designs: Strategies for studying behavior change. New York: Pergamon Press, 1976.
Horowitz, M. J. Strategic dilemmas and the socialization of psychotherapy researchers. British Journal of Clinical Psychology, 1982, 21, 119-127.
Kazdin, A. E. Evaluating the generality of findings in analogue therapy research. Journal of Consulting and Clinical Psychology, 1978, 46, 673-686.
Kazdin, A. E. Nonspecific treatment factors in psychotherapy outcome research. Journal of Consulting and Clinical Psychology, 1979, 47, 846-851.
Kazdin, A. E., & Wilcoxon, L. A. Systematic desensitization and non-specific treatment effects: A methodological evaluation. Psychological Bulletin, 1976, 83, 729-758.
Kazdin, A. E., & Wilson, G. T. Evaluation of behavior therapy: Issues, evidence and research strategies. Cambridge, Mass.: Ballinger, 1978.
Kraemer, H. C. Coping strategies in psychiatric clinical research. Journal of Consulting and Clinical Psychology, 1981, 49, 309-319.
Lick, J. R., & Bootzin, R. R. Expectancy factors in the treatment of fear: Methodological and theoretical issues. Psychological Bulletin, 1975, 82, 917-931.
Luborsky, L., Singer, B., & Luborsky, L. Comparative studies of psychotherapies. Archives of General Psychiatry, 1975, 32, 995-1008.
Mahoney, M. J. Experimental methods and outcome evaluation. Journal of Consulting and Clinical Psychology, 1978, 46, 660-672.
Rachman, S. J., & Hodgson, R. Obsessions and compulsions. Englewood Cliffs, N.J.: Prentice Hall, 1980.
Rachman, S. J., & Wilson, G. T. The effects of psychological therapy (2nd enlarged ed.). New York: Pergamon Press, 1980.
Rosenthal, R., & Rubin, D. B. Interpersonal expectancy effects: The first 345 studies. Behavioral and Brain Sciences, 1978, 3, 377-415.
Shapiro, D. A. Science and psychotherapy: The state of the art. British Journal of Medical Psychology, 1980, 53, 1-10.
Shapiro, D. A. Comparative credibility of treatment rationales: Three tests of expectancy theory. British Journal of Clinical Psychology, 1981, 20, 111-122.
Shapiro, D. A., & Shapiro, D. Meta-analysis of comparative therapy outcome research: A critical appraisal. Behavioral Psychotherapy, 1982, 10, 4-25. (a)
Shapiro, D. A., & Shapiro, D. Meta-analysis of comparative therapy outcome studies: A replication and refinement. Psychological Bulletin, 1982, 92, 581-604. (b)
Smith, M. L., & Glass, G. V. Meta-analysis of psychotherapy outcome studies. American Psychologist, 1977, 32, 752-760.
Smith, M. L., Glass, G. V., & Miller, T. I. The benefits of psychotherapy. Baltimore, Md.: Johns Hopkins University Press, 1980.
Strube, M. J., & Hartmann, D. P. A critical appraisal of meta-analysis. British Journal of Clinical Psychology, 1982, 21, 129-139.
VandenBos, G. R. Psychotherapy: Practice, research, policy. Beverly Hills, Calif.: Sage Publications, 1980.
White, K. The relationship between socioeconomic status and academic underachievement (Doctoral dissertation, University of Colorado, 1977). Dissertation Abstracts International, 1977, 37, 5076A. (University Microfilms No. 77-03250)
Wilkins, W. Expectancies in therapy research: Discriminating among heterogeneous nonspecifics. Journal of Consulting and Clinical Psychology, 1979, 47, 837-845. (a)
Wilkins, W. Heterogeneous referents, indiscriminate language, and complementary research purposes. Journal of Consulting and Clinical Psychology, 1979, 47, 856-859. (b)

Received February 24, 1982
Revision received April 13, 1982