
Journal of Child and Family Studies, Vol. 5, No. 2, 1996, pp. 197-206

Rejoinder to Questions About the Fort Bragg Evaluation

Leonard Bickman, Ph.D.,1,4 E. Warren Lambert, Ph.D.,2 Wm. Thomas Summerfelt, Ph.D.,2 and Craig Anne Heflinger, Ph.D.3

Bickman, Heflinger, Lambert, and Summerfelt (1996) describes the early mental health outcome results of the Fort Bragg Evaluation. More comprehensive results from the same project have been reported, including a final report to the Army (Bickman et al., 1994), a book (Bickman et al., 1995), and two special issues of journals (Bickman, 1996a, 1996b). In some cases the commentary refers to this later material. In addition, we conducted new analyses to answer questions raised in some reviews. The commentaries have thoughtfully raised several issues about the evaluation of the Fort Bragg Demonstration. Some concerns are warranted, as every study has strong and weak points. The following is our response to the four main questions raised in the previous articles, followed by a list of conclusions for which there is general agreement.

QUESTIONS ABOUT THE NATURE OF THE DEMONSTRATION

Was There Sufficient Intermediate Care Use?

Burchard (1996) says that intermediate services (respite care, therapeutic individual and group homes, day treatment, after school treatment, and in-home services) were not used as much as the continuum-of-care philosophy seems to predict. In the article published in this issue, 8.8% of the children who received any services received intermediate care. More recent data show that 13.9% used intermediate services (Summerfelt, Foster, & Saunders, 1996). However, without citations to the literature to which Burchard refers, responding to his concerns directly is difficult. Nevertheless, there may be a misunderstanding about the nature of the data being presented.

1Director, Center for Mental Health Policy, Vanderbilt University, Nashville, Tennessee.
2Research Associate, Center for Mental Health Policy, Vanderbilt University, Nashville, Tennessee.
3Senior Research Associate, Center for Mental Health Policy, Vanderbilt University, Nashville, Tennessee.
4Correspondence should be directed to Leonard Bickman, Center for Mental Health Policy, Vanderbilt University, 1207 18th Avenue South, Nashville, Tennessee 37212.

1062-1024/96/0600-0197$09.50/0 © 1996 Human Sciences Press, Inc.

The continuum-of-care philosophy highlights the importance of delivering both the least restrictive and most appropriate services. Appropriateness entails matching needs with services delivered. The Fort Bragg Evaluation included youth of varying severity in a case mix that would be typical for a community mental health agency. Since the denominator is all treated youth, rates of intermediate level service use at or above 90% would be of questionable appropriateness. If the denominator were reduced to children with SED, the percentage using intermediate care should be considerably higher, as it was at Fort Bragg. We used the presence of a diagnosis and a score lower than 61 on the CGAS to define SED. More than 60% of children with SED received intermediate services at the Demonstration. The remaining 40% received comprehensive intake assessments and intensive outpatient services. We suspect these rates are closer to what Burchard would expect.
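The effect of the denominator choice can be illustrated with a small sketch (Python; the records and the resulting rates are invented for illustration and are not Evaluation data):

```python
# The same utilization data yield very different rates depending on whether
# the denominator is all treated youth or only youth with SED (defined here,
# as in the text, by the presence of a diagnosis plus a CGAS score below 61).
def has_sed(record):
    return record["diagnosis"] and record["cgas"] < 61

def intermediate_rate(records, denominator):
    """Percentage of children in the denominator pool who used intermediate care."""
    pool = [r for r in records if denominator(r)]
    used = [r for r in pool if r["intermediate_care"]]
    return 100.0 * len(used) / len(pool)

# Invented case mix: 20 children with SED (12 of whom used intermediate
# care) among 100 treated children overall.
records = (
    [{"diagnosis": True, "cgas": 50, "intermediate_care": True}] * 12 +
    [{"diagnosis": True, "cgas": 50, "intermediate_care": False}] * 8 +
    [{"diagnosis": False, "cgas": 70, "intermediate_care": False}] * 80
)
print(intermediate_rate(records, lambda r: True))  # 12.0 (all treated youth)
print(intermediate_rate(records, has_sed))         # 60.0 (SED youth only)
```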

Burchard also raised questions concerning whether the impact of the Demonstration on psychopathology would be greater if we examined only the children who received the intermediate services. The outcomes of children who received intermediate level services were compared with a clinically matched group at the Comparison site. The results showed that the Demonstration children did better on two of the twelve measures; however, the Comparison children did better on two other measures. Thus, the provision of intermediate services provided no clinical advantage for the Demonstration children (Bickman et al., 1995). Moreover, when the median cost of services is compared, the Demonstration children using intermediate services were five times more expensive than comparable children at the Comparison site.

Was the Continuum-of-Care Sufficiently Mature?

Evans and Banks (1996) are perceptive in pointing out that our field is still in its infancy. The Fort Bragg Evaluation recognized that, because of the novelty of these services, providers lacked the experience to integrate and deliver the full array of services that were available in the Demonstration. Most practitioners have no opportunity to use the treatment options that were available at the Demonstration.

We were sensitive to the problem of testing a program that was not fully implemented. More than 97% of the outcome data in the Demonstration were collected 15 months after the Demonstration was funded, and it took more than 30 months to collect all baseline data. Clearly, the Evaluation and Demonstration staff both felt that the Demonstration was sufficiently stable to be evaluated. Moreover, only two small effects were found between outcomes and the date when a child entered the study; one favoring the Demonstration and one the Comparison. We do not think these results are meaningful.

Did Financing Differences Cause Potential Bias?

Friedman (1996) points out that the financing structure was different at the two sites. Though this difference might plausibly affect the test of the continuum of care, several factors mitigate its impact in explaining the psychopathology results. The generosity of the CHAMPUS system reduced the costs parents paid at both sites. The deductible was quite low ($50 per person and $100 per family for active duty dependents in the lower pay grades) and did not apply to hospitalization. Also important is the fact that yearly out-of-pocket expenditures were limited to $1,000 for the entire family. Medical expenditures count toward the deductible and the out-of-pocket maximum. Further, payments made to other nonpsychiatric insurers count toward the family's deductible and coinsurance expenses. Given the ease with which a family may reach its out-of-pocket maximum, the marginal cost of services under CHAMPUS was zero for many children, especially for the families that accounted for the lion's share of expenditures. The cost of services to the family, especially for the most expensive children, was very similar at both sites. While financing strategies may have affected the decision to use services, they are unlikely to have had much effect on the heavy users of services.
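The capping mechanics described above can be sketched in a few lines (illustrative Python; the dollar figures come from the text, but the simplified billing rules and the `family_out_of_pocket` helper are our own assumptions, not CHAMPUS's actual claims logic):

```python
# Minimal sketch of a deductible-plus-cap benefit: once the family's yearly
# payments hit the out-of-pocket maximum, additional services cost it nothing.
def family_out_of_pocket(charges, deductible=100.0, coinsurance=0.20, cap=1000.0):
    """Return the family's total out-of-pocket payment for a year of charges."""
    paid = 0.0
    deductible_left = deductible
    for charge in charges:
        ded = min(charge, deductible_left)   # portion applied to the deductible
        deductible_left -= ded
        share = ded + coinsurance * (charge - ded)
        share = min(share, cap - paid)       # yearly cap on family payments
        paid += share
    return paid

# A heavy-using family hits the $1,000 cap on the first large claim, after
# which every further service has a marginal cost of zero to the family.
print(family_out_of_pocket([5000.0] * 10))  # 1000.0
```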

Was the Demonstration a Type of Managed Care?

Friedman (1996) challenges our use of the term "managed care" to characterize the Demonstration. Some experts define managed care as any form of peer review to control utilization, while others understand it to mean active case management to coordinate and assure continuity of care (Broskowski, 1991). Others simply define managed care as "any patient care that is not determined solely by the provider" (Goodman, Brown, & Deitz, 1992). Written guidelines were used in the Demonstration to assign children to a level of care, as suggested by Olsen, Rickles, and Travlik (1995) in their definition of managed care. Financial risk is not necessary for care to be considered managed. Many managed behavioral care companies, especially in contracts with the private sector, do not use contracts that are capitated and at risk. The managed care (in contrast to managed cost) aspects of the Demonstration are pertinent to understanding current managed care systems. The Demonstration was administered by clinical managers using what we described as "clinical wisdom" as the basis for decisions (Bickman, 1996c). The Evaluation showed that this wisdom was not necessarily a cost-effective management strategy.

QUESTIONS ABOUT THE NATURE OF THE SAMPLE STUDIED

Can We Generalize from a Military Dependent Population?

Burchard (1996) raises the concern that a military population is not comparable to a nonmilitary population in that military families are more stable and, thus, children experience less emotional disturbance. This is the flip side of the critique that military families have many stressors and are, consequently, likely to suffer higher levels of emotional disturbance. The investigators have considered the issue of the generalizability of this sample of families. In the past, it had been suggested that there existed a "military family syndrome" (LaGrone, 1978), characterized by authoritarian fathers, depressed mothers, and behaviorally disordered children. Much of the evidence for such a syndrome has been based on myth, anecdotal evidence, and poorly drawn samples (Jensen, Xenakis, Wolf, & Bain, 1991). The empirical evidence suggests that the prevalence of children diagnosed with conduct disorders and other diagnostic categories is comparable to that in civilian families (Morrison, 1981). Jensen and his colleagues, in two separate studies (Jensen et al., 1991, 1995), found no evidence of a military family syndrome and concluded that the characteristics of children in military families were comparable to those in civilian families. Furthermore, the Evaluation sample was compared with community samples used for standardizing the Child Behavior Checklist (Achenbach, 1991; Breda, 1993). Overall similarity was found between the Evaluation sample and the sample of community children who had been referred for clinical services. Therefore, the fact that the Evaluation sample included only military dependents is no reason to preclude extending the findings to civilian populations.


Were the Children in the Study Sufficiently Ill to Generalize to Other Studies of Children with SED?

Several reviewers (Burchard, 1996; Friedman, 1996; Kingdon & Ichinose, 1996) voice the opinion that the children studied in the evaluation represented moderate rather than severe disturbance, and thus the results may not apply to children with severe illness. This issue is discussed in more detail in Lambert and Guthrie (1996) and Bickman (1996d). About two-thirds of the children in the evaluation sample have both a diagnosis and a functioning impairment that would categorize them as children with SED. The evaluation sample includes the entire range of pathology treated in a typical service system, with over-sampling of more serious cases. We see treated children as an important mainstream sample, comparable to children with parents working in government or large corporations. Such a mainstream sample is a fundamental one for national health care policy. Missing from the FBEP sample are certain special groups, such as children from single-parent families living on public assistance. However, criticizing a study of a population representing the full range of disorders for not being limited to a special subgroup is unconvincing.

Moreover, we analyzed the following five subgroups to determine if the Demonstration showed an advantage over the Comparison site: (a) children who needed intermediate care but not hospital care; (b) children who needed more treatment than outpatient care; (c) children with severe psychopathology and impairment; (d) children from families with multiple problems, including family difficulties; and (e) children with more than one type of clinical problem. Analyses of 12 key outcome variables at six and twelve months showed a somewhat larger number of findings than could be expected from chance alone, but the results favored the Demonstration and Comparison sites about equally (Lambert & Guthrie, 1996).

Moreover, the clinical significance of these findings was minor. The small .30 SD effect size found is equivalent to a T-score of 53 versus 50 on the CBCL. It is unlikely that any clinician would attach importance to this small difference. At the policy level there should be little support for the widespread implementation of continua of care, given the small to nonexistent clinical differences and the increased cost over usual services.
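To make the scale of that effect concrete: CBCL scores are reported as T-scores with mean 50 and standard deviation 10, so an effect size in SD units converts directly to T-score points (a minimal sketch in Python; the constants are the standard T-score parameters):

```python
# An effect size expressed in standard deviation units maps onto the
# T-score scale by multiplying by the T-score SD of 10.
T_MEAN, T_SD = 50, 10

def effect_in_t_points(effect_size_sd):
    """Convert an SD-unit effect size into T-score points."""
    return effect_size_sd * T_SD

print(effect_in_t_points(0.30))           # 3.0 T-score points
print(T_MEAN + effect_in_t_points(0.30))  # 53.0, versus a mean of 50
```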

Were the Comparison Site Parents More Motivated Since It Was More Difficult for Them to Obtain Treatment for Their Children?

Friedman (1996) questions whether the Evaluation considered motivation for treatment. He suggested that the Comparison parents would be more highly motivated since they had to pay for services. This motivation could result in improved outcomes. Motivation and readiness for treatment can be examined in terms of the parent's rating of the seriousness of the child's problem, their hope that treatment will help, and the burden experienced by parents. Related to motivation is the child's severity, as measured by standardized ratings, if we can assume that parents of a seriously disturbed child would be more motivated to find treatment. Tests of baseline equality at intake are shown in Table 1.

According to one item ranging from "very serious" to "not at all serious," Comparison parents saw their child's problems as slightly more serious than did Demonstration parents. According to another item, Demonstration parents, entering the study after some contact with the Rumbaugh Clinic, had somewhat more confidence that treatment would help. While these two items differed, more reliable scales combining many items found the sites about equal at Wave 1. How should such differences be interpreted?

The Evaluation was a quasi-experiment--it was impossible to randomly assign children to clinics hundreds of miles apart--and it is true that there are some baseline differences between the Demonstration and Comparison. However, when the few differences at baseline are weighed against the many similarities (Breda, 1996), the sites appear surprisingly well matched on child and family mental health variables. Even if there were mild site differences on outcome measures, the analysis of covariance used in the 12-month outcome study should reduce, if not eliminate, such problems. Short of a randomized clinical trial, of course, one can speculate about unknown site differences as "possible confounds."

Table 1. Motivation and Readiness for Treatment at Intake

Content                                        Item or Scale   Sig. at Wave 1   Site
Seriousness of child's problem                 Item            .01              Comparison more serious (ES = .18 SDs)
Confidence that treatment will help            Item            <.001            Demonstration more confident (ES = .39 SDs)
Burden of Care                                 Scale           .17              NS
CBCL, parent-reported child behavior problems  Scale           .95              NS
Severity: structured interviews (symptoms +    Scale           .84              NS
  functioning impairment) from P-CAS + CAFAS


Were the Demonstration Parents Sufficiently Involved?

Friedman (1996) says that parents were not sufficiently involved at the Demonstration. However, significantly more parents in the Demonstration reported that they were very involved in treatment planning and service delivery (Bickman et al., 1995; Sonnichsen & Heflinger, 1993). Moreover, in a subsequent analysis we could find no relationship between several measures of parent self-reported involvement and clinical and functional outcomes.

QUESTIONS ABOUT ALTERNATIVE EXPLANATIONS FOR THE RESULTS

Were the Comparison Children's Scores Artificially Lower Because More of Them Were Hospitalized?

Burchard (1996) suggests that the children in the Comparison site had less opportunity to exhibit emotional or behavioral problems because more of them were hospitalized, thus artifactually reducing their symptom scores. While hypothesizing this "artifact" is perceptive, before considering a problem, we must see if it applies empirically. To prevent behavior problems, the incidence of hospital treatment is less important than the percentage of time spent in all residential facilities. To make this comparison, we used data not available when the article was submitted. The total days of residential care (hospital + residential treatment centers + intermediate residential) were divided by the total number of days from intake to the one-year follow-up. Over one year, Demonstration cases were in hospital or residential care an average of 7% of their days, versus 4% at the Comparison [t(793) = 2.8, p < .01]. Empirically, then, we must reject the hypothesis that 24-hour care was more common at the Comparison and thus prevented pathological behavior. If twenty-four-hour care reduced pathology, then it was the Demonstration that experienced artifactually less psychopathology, not the Comparison.
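The measure just described reduces to a simple ratio (illustrative Python; the day counts below are invented, not Evaluation data):

```python
# Percentage of days spent in any 24-hour care (hospital + residential
# treatment centers + intermediate residential) out of all days from
# intake to the one-year follow-up.
def pct_days_in_residential(hospital_days, rtc_days, intermediate_days, followup_days):
    return 100.0 * (hospital_days + rtc_days + intermediate_days) / followup_days

# A hypothetical case: 25 total residential days over a 365-day follow-up.
print(round(pct_days_in_residential(10, 10, 5, 365), 1))  # 6.8
```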

QUESTIONS ABOUT THE NATURE OF THE ANALYSES

Are the Results Different for Different Instruments?

Evans and Banks (1996) suggest that the results of this study can simply be accounted for by the differences between professional assessment and self reports. The sign test presented by Evans and Banks in their Table 1 is inventive. However, the sign test of (higher, lower) is a modest substitute for analyses of covariance and random regressions based on exact measures. The sign test credits a site with better outcomes even when the difference between means is minuscule.

If it were true that the Demonstration showed superior outcomes on self-report measures and the Comparison produced superior results on rater measures, how would we interpret this? We know that client satisfaction is better in the Demonstration. Should we accept the following as a finding? "Clients, more satisfied at the Demonstration, thought Demonstration outcomes were significantly better; outside observers rated outcomes significantly worse at the Demonstration." Many interpretations are possible. Our feeling as of this writing is to hold with the original question: Are mental health outcomes better at the Demonstration? The answer is no. The outcomes at the two sites are about equal.

Would Categorical Analyses Produce Different Results?

Evans and Banks (1996) also raise the question of whether the results would be different if we analyzed just the number of cases that failed to improve. Even though it is unlikely that a categorical analysis would be more sensitive to change than examining a continuous variable, we (Bickman et al., 1995) examined a categorical approach in two ways. First, we analyzed the number of children who either lost or gained a diagnosis in eight major diagnostic categories. While most children improved (i.e., lost a diagnosis), there were no significant differences between sites. The second way this issue was examined was to use the Reliable Change Index (Jacobson & Truax, 1991) on twelve key outcome measures. This analysis produced similar results. Categorical methods produced no different conclusion. The sites were equivalent in affecting psychopathology.
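For readers unfamiliar with it, the Reliable Change Index classifies a pre-post difference as reliable when it exceeds what measurement error alone would produce (a minimal sketch of the Jacobson and Truax formula in Python; the reliability and SD values are illustrative, not taken from the Evaluation's instruments):

```python
import math

# Jacobson & Truax (1991): RCI = (post - pre) / S_diff, where
# S_diff = sqrt(2 * SE^2) and SE = SD_pre * sqrt(1 - reliability).
# |RCI| > 1.96 indicates change unlikely to be measurement error alone.
def reliable_change_index(pre, post, sd_pre, reliability):
    se_measurement = sd_pre * math.sqrt(1 - reliability)
    s_diff = math.sqrt(2 * se_measurement ** 2)
    return (post - pre) / s_diff

# A 10-point T-score drop on a scale with SD 10 and reliability .90:
rci = reliable_change_index(pre=70, post=60, sd_pre=10, reliability=0.90)
print(round(rci, 2), abs(rci) > 1.96)  # -2.24 True
```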

Would the Results be Different When Children are Followed for a Longer Time?

Friedman (1996) feels that children should be followed at least a year after entering treatment. He suggests that different results might appear over time. Funding has been obtained to follow the Evaluation sample for at least three years. Besides the present results at six months, results at one year have been reported (Lambert & Guthrie, 1996); an eighteen-month follow-up is currently being conducted. At twelve months, most children have terminated treatment, and overall outcomes are about equal at the Demonstration and Comparison (Lambert & Guthrie, 1996).


Were Too Many Analyses Conducted?

Friedman (1996) is critical of this paper for its analysis of too many measures. We agree with this criticism of the short-term impact study in the present issue. In later analyses (Bickman et al., 1995), a list of twelve key outcomes was used. Eight were high-level total scores, such as CBCL or YSR total pathology, and four were individualized scores, such as the narrow band score best representing the child's presenting problem.

POINTS OF AGREEMENT

There are many points of agreement between the evaluators and the reviewers, especially the papers by Weisz, Han, and Valeri (1996) and Henggeler, Schoenwald, and Munger (1996). We agree that: (a) typical community-based services may not be effective, and embedding them in a continuum would not necessarily make them more effective; (b) the individual impact of service components needs to be examined; (c) research-based services need to be studied in real world, ecologically valid environments, not just university laboratories; (d) providers should be accountable for client outcomes; (e) more efforts need to be devoted to creating better outcome measures; (f) theories of intervention need more precision; and (g) the null effects found in the present study are consistent with other findings.

REFERENCES

Achenbach, T. M. (1991). Manual for the Child Behavior Checklist and 1991 profile. Burlington, VT: University of Vermont, Department of Psychiatry.

Bickman, L. (Ed.). (1996a). Special issue: The Fort Bragg Experiment. The Journal of Mental Health Administration, 23, 7-15.

Bickman, L. (Ed.). (1996b). Special issue: Methodological issues in evaluating mental health services. Evaluation and Program Planning, 19.

Bickman, L. (1996c). The evaluation of a children's mental health managed care demonstration. The Journal of Mental Health Administration, 23, 7-15.

Bickman, L. (1996d). Implications of the Fort Bragg Evaluation. The Journal of Mental Health Administration, 23, 108-118.

Bickman, L., Guthrie, P., Foster, E. M., Lambert, E. W., Summerfelt, W. T., Breda, C., & Heflinger, C. A. (1995). Evaluating managed mental health care: The Fort Bragg Experiment. New York: Plenum Publishing.

Bickman, L., Guthrie, P., Foster, E. M., Lambert, E. W., Summerfelt, W. T., Breda, C., & Heflinger, C. A. (1994). Final report of the outcome and cost/utilization studies of the Fort Bragg Evaluation Project. Nashville, TN: Center for Mental Health Policy.

Bickman, L., Heflinger, C. A., Lambert, E. W., & Summerfelt, W. T. (1996). The Fort Bragg managed care experiment: Short term impact on psychopathology. Journal of Child and Family Studies.

Breda, C. (1996). Methodological issues in evaluating mental health outcomes of a children's mental health managed care Demonstration. The Journal of Mental Health Administration, 23, 40-50.

Breda, C. (1993). Representativeness of the evaluation sample: Appendix L to the Final Report of the Fort Bragg Evaluation. Nashville, TN: Center for Mental Health Policy.

Broskowski, A. (1991). Current mental health care environments: Why managed care is necessary. Professional Psychology: Research and Practice, 22, 6-14.

Burchard, J. D. (1996). Evaluation of the Fort Bragg Managed Care Experiment. Journal of Child and Family Studies, 5.

Evans, M. E., & Banks, S. M. (1996). Commentary on the Fort Bragg Managed Care Experiment. Journal of Child and Family Studies, 5.

Friedman, R. M. (1996). The Fort Bragg Study: What can we conclude? Journal of Child and Family Studies, 5.

Goodman, M., Brown, J., & Deitz, P. (1992). Managing managed care: A mental health practitioner's survival guide. Washington, DC: American Psychiatric Press.

Henggeler, S. W., Schoenwald, S. K., & Munger, R. L. (1996). Families and therapists achieve clinical outcomes, systems of care mediate the process. Journal of Child and Family Studies.

Jacobson, N. S., & Truax, P. (1991). Clinical significance: A statistical approach to defining meaningful change in psychotherapy research. Journal of Consulting and Clinical Psychology, 59, 12-19.

Jensen, P. S., Watanabe, H. K., Richters, J. E., Cortes, R., Roper, M., & Liu, S. (1995). Prevalence of mental disorder in military children and adolescents: Findings from a two-stage community survey. Journal of the American Academy of Child and Adolescent Psychiatry, 34, 1514-1524.

Jensen, P. S., Xenakis, S. N., Wolf, P., & Bain, M. W. (1991). The "military family syndrome" revisited: By the numbers. Journal of Nervous and Mental Disease, 179, 102-107.

Kingdon, D. W., & Ichinose, C. K. (1996). The Fort Bragg Managed Care Experiment: What do the results mean for publicly funded systems of care? Journal of Child and Family Studies.

LaGrone, D. A. (1978). The military family syndrome. American Journal of Psychiatry, 135, 1040-1043.

Lambert, E. W., & Guthrie, P. (1996). Clinical outcomes of a children's mental health managed care Demonstration. The Journal of Mental Health Administration, 23, 51-68.

Morrison, J. (1981). Rethinking the military family syndrome. American Journal of Psychiatry, 138, 354-357.

Olsen, D. P., Rickles, J., & Travlik, K. (1995). A treatment-team model of managed mental health care. Psychiatric Services, 46, 252-256.

Sonnichsen, S. E., & Heflinger, C. A. (1993). Preliminary assessment of parental involvement. In the Final Report of the Implementation Study of the Fort Bragg Evaluation Project (Appendix G). Unpublished document. Nashville, TN: Center for Mental Health Policy.

Summerfelt, W. T., Foster, E. M., & Saunders, R. C. (1996). Mental health services utilization in a children's mental health managed care demonstration. The Journal of Mental Health Administration, 23, 80-91.

Weisz, J. R., Han, S. S., & Valeri, S. (1996). What we can learn from Fort Bragg. Journal of Child and Family Studies.