
STATISTICS IN MEDICINE, VOL. 10, 629-633 (1991)

INNOVATIVE METHODOLOGIES FOR RESEARCH USING DATABASES

LINCOLN E. MOSES
Division of Biostatistics, Department of Health Research and Policy, Stanford University School of Medicine, Suite 114, Stanford, CA 94305-5092, U.S.A.

INTRODUCTION

Two areas for strengthening and improving database research seem especially promising: (i) methods of acquiring better data and (ii) better methods of analysing such data. Of the two, 'better data' seems much more important, and most of my remarks will be directed there, but I first offer a few thoughts on issues of analysis, an area where, presumably, I have a better right to speak anyway, being a licensed statistician.

DATA ANALYSIS

Two kinds of problem plague the effort to reach valid conclusions from statistical studies: bias and sampling error. Almost always bias is the more difficult to assess and to allow for. In instances where data are abundant, as sometimes happens with database research, sampling error may become inconsequential, but bias is not usually helped at all simply by virtue of abundant data. And bias can be ruinous.

One sometimes helpful approach to coping with bias stems from the fact that often one can conceive of several legitimate methods of analysing a given body of observational data. Those alternative approaches may be evidently vulnerable to different kinds of bias. The investigator may then do well to apply the various approaches, seeing how, and how greatly, conclusions vary among them. This can give insight into the confidence that can be placed in the conclusions.

An example is found in The National Halothane Study,1 mounted to assess the comparative safety of halothane and other general anaesthetics; the study made much use of the multiple analyses principle. The key methodology was comparison of standardized mortality rates of the various anaesthetics. Both direct and indirect standardization were routinely applied. The strata, which could take account of age, physical status, sex, operation performed, and hospital, were constructed in many ways. The consonance among these various analyses was strong, but since all the analyses just referred to were complex and not transparent, one other kind of comparison was especially important. It was possible, for some of the more common operations, to find many hospitals in which two regimens (say halothane and cyclopropane) were widely used, and then directly to compare outcomes for that operation on those anaesthetic regimens in such hospitals. When those transparent findings also fitted well with the more global analyses, confidence in the overall conclusions was still further enhanced.
This approach also warned us that there were not enough data about the use of ether anaesthesia to decide whether it was more or less safe than the others; the reason lay in its spotty use. It was not used at all in some hospitals, but was the most commonly used in some others. All together, there

© 1991 by John Wiley & Sons, Ltd. 0277-6715/91/040629-05$05.00


was no solid base for comparison within hospitals that widely used both ether and halothane or cyclopropane.
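The two standardization schemes at the heart of that comparison are easy to state concretely. The sketch below uses invented stratum counts (all numbers are hypothetical, for illustration only): direct standardization applies an agent's stratum-specific death rates to a standard population mix, while indirect standardization compares observed deaths with those expected under standard rates.

```python
# Hypothetical data for one anaesthetic: deaths and operations per
# risk stratum, plus a standard population mix and standard rates.
deaths = [2, 10, 30]                  # observed deaths by stratum
ops = [1000, 800, 400]                # operations by stratum
std_ops = [5000, 3000, 1000]          # standard population sizes
std_rates = [0.0015, 0.0100, 0.0600]  # standard stratum death rates

# Direct standardization: this agent's stratum rates, averaged
# with the standard population's weights.
rates = [d / n for d, n in zip(deaths, ops)]
direct = sum(r * w for r, w in zip(rates, std_ops)) / sum(std_ops)

# Indirect standardization: observed deaths over the deaths expected
# if the standard rates applied to this agent's case mix (an SMR).
expected = sum(r * n for r, n in zip(std_rates, ops))
smr = sum(deaths) / expected

print(round(direct, 4))  # directly standardized death rate
print(round(smr, 2))     # standardized mortality ratio
```

Agreement, or disagreement, between such standardized figures computed across many alternative stratifications is the kind of consonance check the Halothane Study relied on.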

In addition to pointing to the great relevance of bias, compared with variance, and to recommending systematic recourse to multiple analyses, I should like to pause for a moment to say a ritual bad word for p-values. They are often irrelevant, simply gauging sampling error and especially sample size. Contentment with a lot of small p-values tends to divert attention from the real questions, which are questions of estimation. 'How large is this effect?' That is an important question. The p-value answers quite a different one: 'How small is the probability that, if the effect is truly zero, we should see so big a sample effect as the one we have here?' Not only is that question well off the track of substantive interest, the answer to it is very largely only a meter for measuring sample size, because of the almost-fact that 'all null hypotheses are false'.
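The point that a p-value is largely a meter for measuring sample size can be seen in a two-line computation. In the sketch below (a normal-approximation z-test, with all numbers invented for illustration), the identical small effect of 0.1 standard deviation units yields an unremarkable p at n = 25 and a vanishingly small p at large n:

```python
import math

def two_sided_p(effect, sd, n):
    # Two-sided p-value for H0: true effect = 0, given an observed
    # mean effect, a known sd, and sample size n (normal approximation).
    z = abs(effect) / (sd / math.sqrt(n))
    return math.erfc(z / math.sqrt(2))

# The same fixed effect, 0.1 sd units, at growing sample sizes:
for n in (25, 2_500, 250_000):
    print(n, two_sided_p(0.1, 1.0, n))  # p shrinks toward zero as n grows
```

The estimated effect, 0.1, is unchanged throughout; only the p-value moves, which is exactly why it answers the wrong question.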

Somewhat related to the issue of p-values is my final topic concerning methods of analysis. Just as multiple analyses may be helpful in assessing bias, they greatly disturb the ability to compute correct p-values, or, more generally, to assess questions of statistical significance correctly. Database research is quite vulnerable to problems of multiplicity, for there is so much opportunity to look at many things, many times. The issues are hard, and the foundries of mathematical statistics need to provide more help. Meanwhile, native cunning can do a lot. Two fables, the first from Berkeley, the second from Oakland, nicely illustrate the matter.

David Freedman (in Berkeley)2 did the following study. He constructed 100 multivariate observations, each comprising a dependent variable y and 50 predictors x1, x2, . . . , x50. He then used a multiple regression program fitting y to all 50 predictors, and found 15 of the x's to be significant at level 0.25. He then fitted y, by multiple regression, to those 15 'good predictors'. R2 was 0.36, which by the usual significance test for 15 regressors and 100 observations reached p = 0.0005. This wonderfully significant p-value attests to the influence of multiple analyses, for it is a product of that only. Freedman's 5100 observations (100 cases by 51 variables) were independent normal random numbers. The true multiple correlation between y and any set of the x's was zero. The sample p-value arose from applying a statistical method, suitable to a one-shot analysis, in a context that used multiple choices in the analysis of the data.

The Oakland fable comes from Mark Blumberg.3 He first took 20 per cent of the data to locate significant predictors, then he entered those selected variables in a multiple regression using the remaining 80 per cent of cases. He dropped any of the initially selected variables that were not more significant in the 80 per cent data set. The thus-pruned set of initial predictors was used for the whole data set. One can imagine that this approach would free one from the trap that Freedman so graphically exposed, the trap of the naive use of stepwise procedures. I think of Blumberg's device as a triumph of native cunning. A fuller understanding of its properties would be nice to have, but perhaps out of the reach of extant theory.
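Blumberg's two-stage device can be sketched in the same spirit. This is again my own reconstruction under stated assumptions, not Blumberg's procedure verbatim: a 20/80 split, with a conventional |t| > 2 cut at both stages, applied to pure noise:

```python
import numpy as np

def t_stats(X, y):
    # OLS with intercept; coefficient t-statistics, intercept dropped.
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    s2 = resid @ resid / (len(y) - Xd.shape[1])
    return (beta / np.sqrt(np.diag(s2 * np.linalg.inv(Xd.T @ Xd))))[1:]

rng = np.random.default_rng(1)
n, p, crit = 500, 30, 2.0
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)            # pure noise again

# Stage 1: locate 'significant' predictors on 20 per cent of cases.
cut = n // 5
stage1 = np.flatnonzero(np.abs(t_stats(X[:cut], y[:cut])) > crit)

# Stage 2: refit the survivors on the remaining 80 per cent and
# drop any that are not significant there too.
t2 = t_stats(X[cut:, stage1], y[cut:])
final = stage1[np.abs(t2) > crit]
print(len(stage1), len(final))        # noise variables rarely survive both
```

Because the screening sample and the confirmation sample are disjoint, a noise variable must get lucky twice to survive, which is why the device largely escapes Freedman's trap.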

DATA ACQUISITION: ISSUES

It is easy to believe that good database research is more often hampered by problems with data quality than by analytical difficulties. Indeed, much analytical artifice is directed at alleviating difficulties with missing data, proxy data, etc. Thus, the place I prefer to look for rewards from innovative methodologies is where those methodologies are directed at getting better data in the first place.

Let me begin with illustrations from an exciting conference I attended in September of this year (‘Measuring health care effectiveness: The use of large data sets for technology assessment and quality assessment’, Washington D.C., 7-8 September, 1989). My notes contained several interesting anecdotes - or empirical illustrations of immanent problems that attend working with administrative data files.


Two examples of inaccurate data

Three DRGs are used to record chest pain; the other two are angina and atherosclerosis.4 At Cook County Hospital a careful study revealed that among cases with the diagnosis acute myocardial infarction, 57 per cent were in fact misdiagnosed.5

Two examples of misleading data

A review of many discharges produced the anomalous finding that there were fewer MRI procedures among the sickest patients, where experience suggests more, not fewer diagnostic procedures. Reason: the discharge abstract records up to three main procedures, and for a sick enough patient, MRI does not make the short list, in competition with surgical procedures and the like (B. McNeil*).

In a similar vein, Iezzoni found a lower death rate among patients with diabetes than among those without. Reason: only five diagnoses can appear on the discharge abstract, so relatively healthy patients, with few acute problems, are the ones whose diabetes gets into the record.

Three examples of inscrutable data

Ordinarily it is expected that older surgical patients will suffer higher complication rates. But Cebul* did not find this. Reason: older patients were, on average, referred to more experienced surgeons, who had lower surgical complication rates. Temple* found that patients operated on for oesophageal varices had better experience than was seen in non-operated historical controls. But this is in opposition to established understanding. What was the reason? The non-operated historical controls included those very sick patients upon whom the surgeons had been unwilling to operate. Finally, Nelson† made the important observation that from claims data you cannot identify the time of any event. This means, for example, that a comorbidity cannot generally be identified as pre-treatment or post-treatment; thus possible cause and possible effect cannot be distinguished.

This recital is sketchy and brief but it gives clues as to how efforts to improve data quality may be an important approach.

I turn now to two brilliant examples of ways to get better data. The first of these is the Metro Firms experience.6-9 At Cleveland Metropolitan General Hospital there are three parallel units, called 'firms'; each comprises approximately 40 resident and faculty physicians, 28 in-patient beds and an affiliated outpatient clinic. For some years now all new patients and providers have been assigned to one of the firms by random number. Deliberate modifications of practice are introduced in an experimental firm and not in the others. The management information system can then track results; questions of different patient mix are already solved by the random assignment, and the complexities of adjustment are thus obviated. It is natural to speculate about how this concept can be carried forward in more places and other settings.

The second example of getting better data into the database relates to Inman's yellow card system for surveillance of adverse drug reactions in the early post-marketing period.10 Rather than relying on spontaneous reports from physicians (the green card system), he collects, through

* Barbara McNeil, Robert Temple, Randall Cebul. Panel on 'Strengths and weaknesses of large data sets for technology assessment'. Conference on 'Measuring health care effectiveness: the use of large data sets for technology assessment and quality assessment', Washington D.C., 7 September, 1989.
† Kenneth Nelson. Comment from the floor of the conference on 'Measuring health care effectiveness: the use of large data sets for technology assessment and quality assessment', Washington D.C., 7 September, 1989.


the pharmacies of the National Health Service, a copy of each prescription written for the new drug. After 6 months he writes to the prescribing physician, asking to know of every entry in the patient's record during the 6 months. In this way the data are not filtered through a high-level inference of physician-assessed causal relationship between drug and event; even entirely unsuspected kinds of side effects have good opportunity to become manifest. It appears to be a superior method of acquiring data concerning adverse drug reactions.

Each time that we can arrange for a better kind of data on a subject we strengthen the database and improve its usefulness.

IMPROVING DATA ACQUISITION

More and more use of databases for health services research seems inevitable, even if fraught with difficulties and traps. It is in some measure a treacherous enterprise, for at root it seeks trustworthy conclusions (necessarily resting on notions of cause and effect) from observational data. But never mind, increasing use seems inevitable, so how to improve the prospects for success is an urgent question.

Broadly, the appropriate strategy would seem to be one of thoughtfully planned initiatives, carefully assessed and reported, with retention and propagation of the techniques and ideas that prove to be helpful. Choosing promising initiatives can be an important ingredient of success. So, with some diffidence, I offer some ideas concerning features of a database-driven proposal that would seem to favour its success.

Prospectively acquired data are far more likely to result in trustworthy, usable conclusions. A prospective program is much stronger if organized around a protocol. Definitions, and specification of what data will be acquired, how, and when, can go far to pre-empt the ambiguities that are the enemies of useful research. The protocol can address data quality problems by establishing edits, checks, and monitoring procedures. It can draw the fangs of the inference problems that attend multiple end-points, many covariates, etc., by spelling out in advance the lines of the primary statistical strategy to be applied.

A database research program is benefited by the quality of the participants. So, choice of collaborators is important, and so are good routines for maintaining that quality, through training, retraining, communication and feedback.

A program gains much if it involves good ways of controlling or allowing for interfering variables. The yellow card system overrides interfering variables by looking at all early prescriptions for the new drug. The Metro Firms side-step the influences of interfering variables by way of the random assignment of patients to firms. Of course it can be beneficial to acquire well-measured values of variables recognized correctly as influential interfering variables. Two critical ones, far more easily ascertained in a prospective study than in a retrospective one, are (i) therapeutic intent when some intervention is made; (ii) previous history of the patient in respects that are relevant to the main questions of the project. Naturally, a program is better if it addresses 'good' topics. Desirable aspects include importance, scientific interest, and feasibility with respect to time and to patient flow.

Finally, dissemination of findings and data bears on the desirability of a program. When we regard experience with carefully chosen and well executed database studies as steps toward stronger methodology, it is clear that propagation of findings and data is of first-order importance. Flournoy and Hearne11 go so far as to propose that experimental research projects using human subjects should include in their submissions to institutional review boards and to funding agencies detailed, justified plans for data sharing, and that peer review decisions should take those plans into account in gauging scientific merit and appropriate funding. In the area of


database projects the same ideas seem directly relevant. But in this latter context another idea can be seen as implicitly involved. Prospectively resolving what data shall be shared, how, and with whom, must quickly edge over into consultation in the drafting of the project itself. That may seem like a daunting prospect, having outsiders involved in the design phases of one's own research. But the elusive goal of database research is to make available, to a wide range of users, information that they can use advantageously. That is even more daunting, and finding the path to that ideal will almost surely be expedited by judiciously obtained experience with pre-proposal consultation involving users.

CONCLUSION

The emergence of the field of health services research, and the increasing use of computer technology, ensure that accessible databases will be more and more widely used in research. Conclusions based on the results of such research will influence budgets, norms of practice, corporate behaviour, and presumably much more.

Such research is difficult to do, indeed it is full of traps, but it is inevitably going to be done, more and more, never mind the risks. It is therefore urgent that we develop ways of doing such research as well as is possible.

Two kinds of effort offer themselves, and both should be undertaken. One is to develop better methods of statistical analysis. The other is to develop ways of getting better data into the databases. I have emphasized the second of these, proposing that we need well planned data acquisition initiatives in database research, initiatives that carry the prospect of success and generalizability, and ones where wise provision has been made, prospectively, for data sharing and propagation of findings.

REFERENCES

1. Bunker, J. P., Forrest, W. H. Jun., Mosteller, F. and Vandam, L. D. (eds). The National Halothane Study, National Institutes of Health, National Institute of General Medical Sciences, Bethesda, 1969.
2. Freedman, D. A. 'A note on screening regression equations', American Statistician, 37, 152-155 (1983).
3. Blumberg, M. and Binns, G. S. 'Risk-adjusted 30-day mortality of fresh acute myocardial infarctions: 1987 Medicare discharge data', Prepared for the Quality Measurement and Management Project of the Hospital Research and Educational Trust, 1989.
4. Iezzoni, L. I. 'Using administrative diagnostic data to assess the quality of hospital care: Pitfalls and potential of ICD-9-CM', International Journal of Technology Assessment in Health Care, 6(2), 272-281 (1990).
5. Demlo, L. K. 'Measuring health care effectiveness: Research and policy implications', International Journal of Technology Assessment in Health Care, 6(2), 288-294 (1990).
6. Cargill, V., Cohen, D., Kroenke, K. and Neuhauser, D. 'Ongoing patient randomization: an innovation in medical care research', Health Services Research, 21, 663-678 (1986).
7. Cohen, D. I. and Neuhauser, D. 'The Metro Firm Trials: an innovative approach to ongoing randomized clinical trials', in Assessing Medical Technologies, National Academy Press, Washington D.C., 1985, pp. 529-534.
8. Cohen, D. I., Breslau, D., Porter, D. K., et al. 'The cost implications of academic group practice: a randomized controlled trial', New England Journal of Medicine, 314, 1553-1557 (1986).
9. Cebul, R. D. 'Randomized controlled trials using the Metro Firm system', Medical Care (1990, in press).
10. Inman, W. H. 'Yellow cards and green forms', The Practitioner, 227, 1443-1449 (1983).
11. Flournoy, N. and Hearne, L. B. 'Sharing scientific data III: Planning and the research proposal', IRB: A Review of Human Subjects Research, 12(3), 6-9 (1990).