
Journal of Econometrics 67 (1995) 25-46

Three ways to think about testing in econometrics

Philip Mirowski

Department of Economics, University of Notre Dame, Notre Dame, IN 46556, USA

Abstract

Recent methodological disputes amongst econometricians are shown to depend critically upon a sequence of differing conceptions of the empirical process, although they are nowhere spelled out in detail. The third conception of critical postmodernism is further described, comparing practices across physics, psychology, and economics using both narratives from recent philosophy of science and the technique of Birge ratios.

Key words: Birge ratios; Econometricians; Empirical confirmation; Critical postmodernism; Quantitative estimation
JEL classification: C10; B41

1. Introduction

Easily the number one nostrum prescribed by the broad spectrum of methodologists (but not econometricians - more on this anon) concerned with the health of economics is more empiricism, or better empiricism, or more testing, or better testing (Blaug, 1980, pp. 256 et seq.; de Marchi and Blaug, 1991; Mayer, 1993). But there is surely something incongruous and dubitable about this proposed remedy, since it is hard to imagine any human inquiry better subsidized and more widespread than econometric modelling, unless it be clinical cancer research or atomic weapons development. Credulity is further stretched by the now-familiar Duhem-Quine literature which questions the very idea of a

The author is grateful to Wade Hands for helpful discussions, Ellen O’Brien and Martin Stack for indefatigable research assistance, and to Aris Spanos, David Hendry, Will Milberg, Robert Basmann, Nancy Wulwick, and Roger Backhouse for comments, cautions, and assistance.

0304-4076/95/$09.50 © 1995 Elsevier Science S.A. All rights reserved. SSDI 030440769401625A


crucial experiment (Harding, 1976). The image of formal tests decisively dispelling scientific controversies for everyone involved is a narrative device rarely found in the history of science nowadays. In such an environment, repetitive throwing of money and manpower at a problem is only a short-term solution; while pulling back and asking why more empiricism of a particular sort has a reasonable prospect of producing a more robust economics may perhaps clarify our hopes and fears. I shall argue here that three unarticulated and divergent scenarios of the nature and practice of tests motivate much of present methodological controversy, not to mention a fair amount of theorizing, amongst econometricians, and that the third narrative might better describe quotidian practices amongst empirical economists.¹

2. How to be a twentieth-century empiricist

The conception of the relationship of theory to empiricism has undergone at least two profound upheavals in our century; and these in turn have left their mark upon conceptions of inductive inference.² The three dominant schools have been a philosophical positivism, a modernist anti-positivism, and what Galison calls a critical postmodernism. None does full justice to the complicated set of concrete practices called theory and observation, yet each embodies a very different ideal of the contribution of the former to the latter, and each underwrites a different sort of narrative of the project of statistical inference.

In the early part of the present century, the positivist movement asserted the primacy of observation as the foundation and characteristic modality of all knowledge. Rudolf Carnap, for instance, held forth the ideal of a universal protocol language for immediate sense experience prior to all theoretical knowledge, which would permit the accumulation of a body of empirical fact, the result of observation and experiment, which would then guide the development of theoretical explanation. The protocol language was intended to be universal, both in the sense of being free of individual bias and in the sense of underwriting the unity of the physical and social sciences. There might be historical ruptures in the structure and content of the theories, but the requirement that each subsequent theory would 'encompass' its predecessor was guaranteed by the stricture that it must account for all the accumulated empirical base, which had no comparable breaks. Problems with specifying precisely how each successor

¹ This paper is a prolegomenon to another, entitled 'What Can Replication Mean in Econometrics?' (Notre Dame Department of Economics Discussion Paper, 1993), which treats the implications of the third, critical postmodernist, approach in much greater empirical detail. That paper includes an appendix detailing the sources for Table 3 below.

² The following taxonomy and figures, with some emendations, are taken from Galison (1988).


theory could account for all the previously accumulated base of experience eventually induced Carnap to replace simple verification with probabilistic confirmation; and this, of course, was thought to provide the warrant for statistical procedures where they were required. Progress was never located in the theories themselves, except insofar as they were thought to serve as perspicacious summaries of the empirical base.

[Fig. 1. Philosophical positivism: cumulative observation and experiment; time →.]

The decline of positivism has been documented many times, even within the literature on economic methodology (Caldwell, 1982, 1991). It began with various Popperian criticisms of induction, continued with the work of Quine on the 'Two Dogmas of Empiricism', and was spread throughout the history and philosophy of science community by Kuhn, Hesse, Lakatos, and especially Feyerabend (Hands, 1992). One theme of this movement was that because the neutral observation language of the positivist was a pipe dream, the total separation of observation from theory was impossible; that observation and experiment were not historically cumulative in any continuously comparable sense, and therefore theoretical activity tended to constitute the bedrock of science, whereas empirical activity was downgraded to a mere excrescence of the theoretical school. When theoretical traditions experienced ruptures, their concomitant theory-laden observations were also put at risk, so that Feyerabend's doctrine of observational 'incommensurability' became one of the hallmarks of the antipositivist skepticism about scientific progress. The primacy of theory as a linguistic practice meant that older positivist histories of the 'encompassing' of previous traditions also came in for skeptical scrutiny: it was shown, for instance, that the Copernican system did not 'really' explain everything in the preceding Ptolemaic astronomy, nor did Einstein's theory of special relativity 'really' subsume Newtonian mechanics (Angel, 1980, p. 95; Zahar, 1980, p. 40). Probability theory in this vision of science tended to be associated more with theoretical concerns, and only in an insignificant way with statistical procedures or inductive inference. Since pre-20th-century physicists had made relatively little direct use of statistical procedures in their experiments, this stance was


initially taken as hewing more faithfully to the actual history of science than had the positivist predecessors.³

[Fig. 2. Modernist antipositivism: successive theories with their own observations; time →.]

Because Imre Lakatos has been the philosopher most frequently quoted in debates over econometric methodology, it may clear up some misunderstandings if we try to understand his place within this schema. With his language of 'hard core', 'protective belt', and the other prosthodontic devices, he was very modernist in his stress on the primacy of theory. Theoretical traditions could either flourish or die a lingering death in his philosophy, and the theory-ladenness of observation was accorded its just due. However, as a good conservative, he also wished to restore general confidence in the ubiquity of scientific progress by somehow preserving the cumulative character of empirical observation, primarily by appending his notions of 'excess content' and 'novel facts'. These are not the sorts of ideals easily implemented within a statistical conception of empiricism, but then, Lakatos was a mathematician first and a partisan of physics second. It should be noted that Lakatos was never able to reconcile his subordination of empiricism to theory with his doctrine of excess content, which now constitutes the primary locus of disillusion with his philosophy of science (Hands, 1991, pp. 94-100).

The critical postmodern movement, associated with such authors as Ian Hacking, Harry Collins, Simon Schaffer, and Peter Galison, has attempted to reinstate the role of empiricism as central to an understanding of scientific inquiry, but without returning to the inverted world of Fig. 1. Their method of accomplishing this task has been to deny any fixed hierarchical relation between

³ This point is made in Gigerenzer et al. (1989, p. 211), as well as many other places in the history of statistics. The rise of error analysis as a pedagogical tool in the German natural science context is covered in Olesko (1991). The fact that the reporting of error bars in physics is a surprisingly recent phenomenon is discussed in Philip Mirowski, 'A Visible Hand in the Marketplace of Ideas: Precision Measurement as Arbitrage', forthcoming in Science in Context. The movement towards the utilization of statistics in more recent times can be traced to the rise of Big Science, and especially large-scale sub-atomic particle experiments, where huge teams of researchers sift through a wide variety of stochastic 'events' in order to locate specific particle interactions.


theory and empiricism, and indeed to repudiate the shared positivist and antipositivist premise of the unity of all science. They split scientific activity into three groups which possess a fair amount of autonomy: the theorists, the builders of instruments, and the empiricists. Each maintains a life of its own through its own journals, its own pedagogy, its own forms of tacit knowledge. Observations are still acknowledged to be theory-laden, but now it may be the theory of the instrument-builder or the experimentalists rather than that of the high theorist. Theoretical ruptures still happen, but the relative autonomy of the empiricists allows certain 'facts' to survive the change of regime. On the other hand, a few contrary empirical results are rarely sufficient to bring down a theoretical tradition, because of its relative autonomy and lack of commitment to the specific lore of the experimentalists. For certain historical periods, theory may come to dominate the other spheres; but it is also possible that empirical traditions may dominate or that an empirical tradition evolves into a rival theoretical tradition. The theory is postmodern in that it repudiates a strict determinism and praises the advantages of fragmentation in structure. As Galison (1988, p. 210) writes, it 'may suggest how the practice of physics can involve discontinuities at so many levels and yet not disintegrate into isolated blocks ... When masons build real walls they know better than to stack bricks directly on top of one another ... Disorder increases strength'.⁴

[Fig. 3. Critical postmodernism: interleaved, partially autonomous sequences of theories, instruments, experiments, and observations; time →.]

⁴ 'The idea that chance begets order, which is one of the cornerstones of modern physics, was put [by Darwin] into its clearest light' (Peirce, 1931, Vol. 6, p. 183).

It is our contention that the above taxonomy can clarify some very murky issues in the econometric methodology literature. The first source of contention



is the deep ambivalence inherent in the numerous exhortations for more and better testing in the methodology literature. Let us take, for instance, Blaug (1980). However much Lakatos is made to appear the doyen of the 'new heterodoxy' (Ch. 2), it is fairly clear from the tenor of his recommendations (Ch. 15) that Blaug is a partisan of the positivist image of empiricism: there is that cumulative base of observation (Blaug probably would not go so far as to say experiments) that would keep us honest, if only those rascals - the average applied economists in the trenches - would play by the rules of the falsificationist game. It is eminently a failure of 'method', in the sense of individual conformity to the canons of 'science', which prevents us from winnowing out the wheat from the chaff.⁵ As has been repeatedly pointed out by Wade Hands and other sophisticated philosophers, this is not kosher Lakatos, who would freely allow that all was fair in love, war, and inter-programmatic rivalry. Indeed, if we have correctly categorized Lakatos as a modernist antipositivist, then it should be the severe internalist theoretical regimen rather than a steady diet of facts that should keep us in line. This is why people like Blaug are so readily hoisted on their own petard, as soon as they acknowledge the merest whiff of dependence of the 'facts' upon the theory, i.e., the theory-laden character of observation. If monolithic research programs keep churning out their own versions of the relevant facts and chucking them into the 'protective belt' whenever convenient, or calling them 'novel' anytime they see fit, then all the ritual falsificationist activity in the world is not going to make one iota of difference as to how traditions come to agree on the substantive content of their programs.

⁵ 'Unfortunately, we lack both reliable data and powerful techniques for distinguishing sharply between valid and invalid propositions in positive economics, and the professional pressures to "publish or perish" ... These weaknesses, not so much of theoretical econometrics as of the actual procedures followed by applied econometricians, go a long way toward explaining why economists are frequently reluctant to follow their avowed falsificationist precepts' (Blaug, 1980, pp. 260-261).

It seems to me that perhaps the same complaint might be made about David Hendry, at least on those occasions when he speaks with a Lakatosian cadence (Hendry, 1990, pp. 295, 307). It is simply historically unfounded to assert that the activities of econometricians have been 'essentially destructive. The notable achievement of econometrics is the weeding out of inadequate models and economic theories' (Hendry and Mizon, 1990, p. 122). This could only have been the case in the presence of an independent stable observational base, which Hendry calls the 'Data Generating Process' [DGP], about which knowledge accumulates through time. Hendry's crusade to elevate belief in the DGP to some sort of scientific litmus test as to whether the econometrician in question is willing to testify to her belief in the existence of the real world, or else otherwise is some species of shameless bounder (Hendry, Leamer, and Poirier, 1990, p. 189), is certainly anti-Lakatosian, if not, given the present state of philosophy



of science, thoroughly obscurantist. By all accounts, Hendry appears to be an unreconstructed frequentist, talks the Neyman-Pearson language, and likes the positivist image of Fig. 1 that goes along with it. Numerous nostalgic and exploded portraits of science abound in this literature, such as the one about the special theory of relativity ‘encompassing’ Newtonian mechanics (Gilbert, 1990, p. 289; Angel, 1980).

The rhetorical device of 'encompassing' embodies the tiresome contradictions of being a positivist in a predominantly antipositivist environment. Contrary to Mizon (1984, p. 139), the very idea of encompassing requires the existence of something like the DGP, for without it, it makes no sense.⁶ Some sort of 'nesting' of rival hypotheses can either take place within the strict confines of a single narrowly defined research program, which is not very interesting, or else it must depend upon a theory-neutral observation language which can adequately mediate between the rivals. This has always been the great flaw of the Neyman-Pearson organon: any really interesting rival hypothesis rooted in its own distinct theoretical tradition could rarely be brought into direct quantitative comparison with the 'null' due to differences of definitions, different attitudes towards auxiliary hypotheses, differing constitutions of state spaces of possibilities, and so forth. It simply will not do to pretend the sensitive problem of translation across programs can be readily reduced to a problem of classical statistics or empiricism alone: 'General models are best obtained by specifying a particular form for the distribution of relevant variables, rather than being hybrid comprehensive models obtained via linear or exponential weighting of rival models' (Mizon, 1984, p. 143). This ideal kind of generality resides solely in the eye of the beholder, as one observes in Milton Friedman's and Anna Schwartz's (1991) response to Hendry and Ericsson's claim to have 'encompassed' Friedman's own model of monetary demand.

⁶ Hendry himself seems to acknowledge this is true in Hendry, Leamer, and Poirier (1990, p. 231): 'Poirier: I gather now that your interest in non-nested hypothesis testing has dwindled... Hendry: In my framework, they are all nested in the DGP and always were.'

Beyond revealing what is incoherent about one methodological stance, our taxonomy may also reveal that the quest for a single foundational basis of empiricism in orthodox economics is a forlorn hope, whether it looks like Fig. 1 or like Fig. 2. If we take the theory-independent observational base as the bedrock of inference, then neoclassical microeconomics is in big trouble, mainly because of the plethora of permitted changes in the designated fundamental causal determinants allowed through time: namely, endowments, tastes, and technologies. There is no intrinsic warrant for the kinds of invariants which are required to define and identify a cumulative base of raw data over time in the positivist world of Fig. 1. If, on the other hand, one opts instead for the portrait in Fig. 2 and deems the theory as the premier source of continuity, then again



there is no warrant for a well-defined subordinate empiricism. Since there is so much loose talk about the meaning and significance of general equilibrium these days, we can do no better than to quote a respected practitioner: 'We have arrived at the point where the current model is shown to be intrinsically incapable of generating verifiable propositions. Yet, far from being a purely abstract theoretical problem, it is one of real significance for practising economists' (Kirman, 1989, p. 127).

If this makes neoclassicism sound like an impossible project, that would be a misreading of our intentions in raising these issues. The point here is rather that neither positivism nor modernist antipositivism can describe the actual practices of orthodox empirical economists. After all, thousands of applied econometric papers are published each year. Neither do these authors behave as though there were some theory-independent data-generating process, nor do they passively conform to the dictates of some overweening theoretical ukase (Summers, 1991). Instead, we would argue that Fig. 3 best describes the situation in orthodox neoclassical economics: there is a fragmentation of traditions, with some groups prosecuting empirical practices semi-decoupled from most formal economic theory, others hewing to some loose Marshallian or Nash game-theoretic (as opposed to Walrasian) dictates, some standard statistical practices mutating into theoretical traditions (as with rational expectations), other decoupled theoretical traditions mutating into statistical projects (as with the Brock-Dechert-Scheinkman statistic growing out of an initial fascination with chaos theory), and finally, numerous statistical practices lacking any solid foundation in probability theory at all. That all can, if they so choose, claim to be part and parcel of the orthodoxy is due to some loose shared beliefs, such as privileging constrained individual optimization or perhaps treating the standard restrictions upon utility functions as inviolate; but few, if any, could attest to conformity to any strict checklist in its entirety. And it is precisely this lack of strict global discipline which renders the orthodoxy so very robust, contrary to the dictates of Blaug's sado-masochistic school of methodology.

We might press this point further. A critical postmodernism would assert that there could be no such thing as a generic statistical procedure which could, on its own, guarantee the scientific status of a particular inquiry in all its glorious historical specificity. For instance, under its auspices we would be prompted to ask: why did regression analysis become so entrenched in neoclassical economics in the later 20th century, whereas in sociology it was factor analysis and in hydrology it was R/S analysis (Mandelbrot, 1983, p. 386)? It should give econometric methodologists some pause to realize that the specific bundle of Neyman-Pearson hypothesis testing and regression analysis which became the hallmark of the Cowles Commission in the 1940s was due to the unlikely conjunction of influences emanating from physics (Tintner and Sengupta, 1972, p. 9; Mirowski, 1989a), the family resemblances between Neyman-Pearson and neoclassical optimization, and the serendipitous fact that Neyman moved to


America in 1938. In the history written from the vantage point of the victors, there were no other alternative methods at that time; but a closer reading of the record suggests there was another format of inductive statistics ensconced at the National Bureau for Economic Research in the later 1920s and 30s, one much more concerned with estimating population distributions than with point hypothesis testing (Mirowski, 1989b). The 'measurement without theory' controversy was not transparently about naive bumbling Baconians versus partisans of a judicious mix of theory and statistics, as Koopmans would have had it; but rather, it was about the incipient dominance of Cowles over the NBER, neoclassicism over institutionalism, and Neyman-Pearson over Karl Pearson.⁷ The instantiation and stabilization of regularized empirical practices is always embedded in a richer history of contending theories, instruments, and observations.

Thus we cannot agree with Spanos (1990, p. 359) that, 'if two people were given the same data set and similar theories, it is highly likely that they will end up with the same choice of statistical model and similar statistically adequate model'.⁸ One need only think of competing linear and nonlinear tests for extractable structure in the residuals, or else the contrast between Gaussian and Lévy stable distributions, to observe that this is false. In the parlance of neural nets, learning processes beginning with somewhat different random starting points defined over the same experimental regimen can end up with entirely different 'internal concepts' or statistical foundations (Tienson, 1990, p. 391). In the critical postmodern idiom, we might want to think of statistical models as the equivalents of 'instruments', which are constructed to help the empiricists, but maintain a degree of formal and institutional autonomy in their own history, may have originally been spawned from theoretical or practical concerns, and which themselves from time to time spawn theoretical traditions. People trained as geographically close as Wisconsin and Minnesota will come up with wildly different DGPs, while someone trained at UCLA will deny their very existence. Some may deplore this as an irresponsible cacophony; but others will be heartened by the absence of a Big Brother who would by rule of force render science 'rational'.

⁷ It is interesting that Spanos (1989, p. 413), one of the few who have actually made the effort to read the original texts, notes that Koopmans does not conform to what he perceives to be the novel probabilistic framework in that article. The reason is that it was NBER researchers such as Frederick Mills and Frederick Macaulay who were beginning from unrestricted distributions of economic data, and not the Cowles contingent. On this issue, see Mirowski (1989b).

⁸ This may, however, misrepresent Professor Spanos' actual position. In his letter to me dated 26 February 1992, he states: 'I do not disagree with your statement that, given the current state of econometric modelling, if we give two econometricians the same data set and similar theories they are not going to reach the same statistical model. My argument was that, in the context of my approach, the two statistical models will be very similar and there will be a common basis to discuss any differences.' But statistical differences are only one axis amongst many where disputes might be located, the location itself being a critical matter for negotiation.

Thus we assert that our three different scenarios of 'testing' in 20th-century thought more or less map into three divergent positions in the econometrics literature. I hope it is now clear that the Hendry persuasion is a version of the Neyman-Pearson frequentist conception, which is itself deeply rooted in the positivist image of science portrayed in our Fig. 1.⁹ From Haavelmo's (1944, p. 28) notion of 'autonomy' to Hendry's DGP, it is a metaphor where 'Nature' effectively performs our experiments for us by proffering observations from her 'infinite ballot box'; we are exquisitely passive in our struggle to discern the contents of the black box. The positivist metaphor has some historical basis in Karl Pearson's Grammar of Science (Gigerenzer et al., 1989, p. 59) and lives on in a curious new ontological inversion of the positivist project in the guise of Nancy Cartwright's 'capacities'.

On the other hand, the Bayesian econometricians, and in particular Edward Leamer, are the true inheritors of the modernist antipositivist movement. It is no accident that both Specification Searches and the 'Con' article repeatedly quote Kuhn and Polanyi; as Leamer said later: 'I think the issues are by and large defined prior to observation. These are the things I am hoping to learn from the data set' (Hendry, Leamer, and Poirier, 1990, p. 195). The claim that 'it thus seems desirable to have a method by which evidence can be formally discounted when the postdata model construction occurs' (Leamer, 1978, p. 286) looks very much like our Fig. 2.

This is the crux of the matter that drives the latter-day positivist right up a tree: 'Leamer's fragility means that people can doubt enormous amounts of evidence just because they choose to' (Hendry, 1990, p. 245). In the Bayesian corpus, the problem of theory incommensurability is mostly treated as a difference in individual priors; the relative subordination of empiricism to theory is represented by the ability to attach near-zero weight to various classes of evidence (Howson and Urbach, 1989, pp. 96-102); the 'paradigm' is held together more by an unspecified external injunction¹⁰ to obey Bayes' Rule than by any mandate of the data: theory is 'characterized in terms of its parametric hypotheses placed on the parameters (mental constructs) θ, to distinguish the theory from others. The choice of the window is largely a subjective exercise, albeit one subject to professional pressure to make it large enough to include the "theoretical panes" of the other theories' (Poirier, 1988, p. 223). The idea of having your priors and the identity of your peers drummed into you in graduate school is entirely consistent with this image of science. 'If we take (as I do) simplicity as a consequence of man's and society's shortcomings, the definition of simplicity necessarily changes from social milieu to social milieu ... Concept formation is interpreted as the decision to use a more complex model that was at least implicitly known all the time' (Leamer, 1978, pp. 204, 288). This is the central thrust of Fig. 2; and of course, the spectre that stalks this program is its inability to justify any sort of 'progress' while maintaining the premise of the unity of science, which appears as some species of alien rule from an obscure distant metropole.

⁹ One of the referees claims that Hendry might be better understood as a partisan of the approach of R.A. Fisher, who, as we know, disparaged the Neyman-Pearson approach. Since there is no specific published evidence for this interpretation, nor is there to my knowledge any explicit consideration of fiducial inference, we shall postpone this reading of the controversy.

¹⁰ The problem referred to here is the lack of commitment on the part of Bayesian econometricians to either de Finetti's Dutch Book argument or else some hardwired psychological predispositions which would explain why personalist probabilities are internally coherent, and thus why researchers must only employ Bayes' Rule in their otherwise idiosyncratic practices. Even Leamer, the only econometrician I have found who describes these philosophical problems in a textbook context, explicitly opts out of committing himself to either (Leamer, 1978, pp. 33-34).

Hence our assertion that recent discussions of econometric methodology are dominated by the voices and scribblings of long-dead philosophers of science. We shall next be concerned to assert there exists a third conception of the activity of testing, already present in other sciences as well as at the margins of discussions of econometric methodology, which resonates to a greater degree with the critical postmodernist scenario of Fig. 3. The potential attractions and drawbacks of this alternative notion are sketched very briefly in the next section.

3. Replication as hallmark of postmodern empirical commitment

In the postmodern idiom, the narrative of the 'test' can no longer be embodied in the persona of a single investigator. There are instead different groups, with divergent notions of what needs to be done, groups which interact in unpredictable ways. It is instructive that the 'instrument builders' (i.e., theoretical econometricians), no matter what might be their favorite image of empirical endeavor, seem to believe that their insights can be entirely embodied in a computer program such as SEARCH or PC-GIVE or RATS. This algorithmic conception of inference would of course tend to be favored by the community of instrument-builders, but it hardly constrains the intentions and implementations by the theorists or the actual applied economist. In their less 'formal' moments, I think this is implicitly recognized by both the Positivist/Frequentist and the Bayesian/Antimodernist.

'Modelling is not seen in economics as an incremental progressive accumulation of knowledge in the way I see it in other subjects where [N.B.-PM] you can get credit for taking an experiment somebody else did and improving upon it in one direction' (Hendry, Leamer, and Poirier, 1990, p. 180).

‘[A] strong argument can be made that statistical inference, not Sherlock Holmes inference, is unscientific . . . It is embarrassing, on almost the last page

Page 12: Three ways to think about testing in econometrics

36 P. MirowskilJournal ofEconometrics 67 (1995) 25-46

of a book dealing with learning, to make the observation that the personal learning heretofore discussed constitutes only a small part of the learning process' (Leamer, 1978, pp. 286, 319-320).

I propose that we try to take these comments very seriously; but in order to do so, we must expand our notion of testing well beyond the algorithmic conceptions exemplified by the Frequentist and the Bayesian. The postmodern approach, in sharp contrast, rejects the assumption implicit in both other traditions that every individual scientist, given sufficient background and calculational technology, can decide for him/herself the complete implications of any empirical exercise by wringing every last drop of 'information' from a sample. The irreducible fact of their various locations within communities of different interests, competencies, and objectives belies that possibility. Instead, we are prompted to ask how it is that a particular empirical result becomes stabilized through the interaction of theorists, econometric theorists, and applied empirical economists. It has long been the folklore that these three character types regard the act of 'testing' quite differently; it follows that statistical tests themselves must be subordinate to some larger empirical process. We shall very briefly describe how the postmodernist would organize an inquiry into the structure and status of econometrics. Further, this is not intended as yet another navel-gazing paper in philosophy with no concrete implications for any subsequent practice. With some preliminary clarification, the presence or absence of replicator activity will become a prime empirical question, one which itself can be subjected to both statistical and institutional inquiry.

Our case is actually made easier by the prior appearance of the sensational article by Dewald, Thursby, and Anderson (1986) in the American Economic Review which revealed that only 1.3% of the econometric exercises published in the Journal of Money, Credit and Banking could be replicated (in a sense yet to be explored), and moreover that came only after extensive correspondence with the authors after the fact, even in a regime mandating full documentation of data sets and procedures as a prerequisite of submission of articles for consideration of publication. This result was seconded by Mirowski and Sklivas (1991) who found that an attempt to encourage replicator activity by the editors of the Journal of Political Economy did nothing to improve a very low ratio of successful replication activity. Similar support has also been provided by Hubbard and Vetter (1991, 1992).

The crux of the problem of the dearth of replication within econometrics is not illuminated by either the positivist or antipositivist narratives, although it is highlighted by the quotes from Hendry and Leamer above. What seems to be the genesis of distress amongst applied econometricians is the palpable lack of credit garnered from attempting to criticize or improve upon a published empirical econometric exercise. Our postmodern point of departure would note that the act of 'getting credit' is neither a genuinely positivist nor antipositivist notion. Although most econometricians would happily subscribe


to the assertion that there has been substantive improvement in econometric theory over the last three generations, they probably feel there has been no attendant cumulative improvement of econometric applications, i.e., improvement of individual empirical exercises. And conversely, the Walrasian neoclassical theorist has largely absolved himself from any concern over the stability of parametric entities, as mentioned in the section above. The critical postmodernist would then suggest that the problem lay in the relations between diachronic slices of communities in Fig. 3, and not in some flaw intrinsic to econometrics as a generic tool-making process; in other words, there exists a credibility gap between the semi-autonomous theoretical, instrumental, and empiricist traditions in econometrics. Thus, the problem of replication in contemporary economics is that empiricist traditions cannot solely resort to their statistical instruments embodied in computer packages in order to define and implement replication.

To those who have never really paid attention to the actual process of investigation, it may come as a shock to learn that no experiment or observation is ever exactly replicated. As long as the definition of the phenomenon at issue is in question, then the meaning of replication is free to be situated at many different levels: the same data, the same context, the same instrumentation, the same behavior of the experimenter, the same numerical values, the same theoretical import, the same efficacy in utilization, and so on. Collins illustrates the postmodern maxim that no published report can provide sufficient guidance to exactly replicate the experiment of another researcher due to what he calls the 'experimenter's regress': 'to find out if X exists, you need to build a good X-detector; to check that the X-detector is good, you try it out on observing X; but to know whether a good X-detector should see X or not-X, you have to know whether X exists' (Collins, 1991a, p. 131). This is augmented with the further postulate that the provision of adequate guidance is prohibitively costly, and that requests for complete specs and documentation are reasonably regarded as hostile acts (Mirowski and Sklivas, 1991). The explanation behind this social phenomenon is that no journal wants to publish an exact replication, a pure duplicate in every possible respect, since it in principle adds nothing to the base of knowledge; the sole motivation behind such requests for documentation must therefore be an intention to undermine the reported results. Hence, experimenters will tend to encourage extensions and elaborations of their experiments, but will not regard pure replication (however defined) as a useful activity.

Although economists have long prided themselves on not being shocked by arguments from self-interest, I have found that whenever these arguments about their own practices are broached, they suddenly wax righteously indignant about fraud or bogus research or 'unscientific' behavior. Since models that analyze the problem of adopting standardized technology or product in the industrial organization literature (Farrell and Saloner, 1985) are very common, I cannot


see why what is good enough for a rational economic man is not good enough for the empirical applied econometrician. If what counts as replication is always and everywhere the subject of negotiation, and there is a systematic bias against initiators of experiments being willing to engage in such negotiations, then complete replication of a concrete experiment will be a rare occurrence in science. This is indeed historically the case in physics, and if we accept the postmodern analogy of econometric theory with instrumentation, it is also the case in economics.

The time has come to clarify the various uses of the term 'replication' in the context of postmodern notions of empirical practice. Let us call these four versions of replication consanguinity, mimesis, extension, and calibration. We begin with the most innocuous use of the term, and progressively add more complexity with each new connotation. The first meaning of replication is very simply that two individuals are hard at work on what they both regard as the same scientific research program. Perhaps they have undergone the same or similar training, have read the same or similar texts, and indeed regularly communicate with each other, formally or informally, on their respective activities. This first version of 'replication', while bordering upon the trivial, is not entirely empty, since it stresses the fact that in the absence of some reciprocal recognition by the relevant actors, nobody succeeds in replicating anything. The second connotation is the more commonplace usage of replication in discussions of science, where a second researcher claims to have entirely mimicked the procedures of the first; this might involve near-precise imitation of a reported experiment, or it might involve refitting the same equation as an original econometric investigator. This, of course, was the connotation favored by Dewald et al. (1986), and is the one that most people immediately think of when they use the term. The drawback of this interpretation is that this kind of replication is absent in most science, be it healthy or not, for the reasons outlined above. Pure mimicry of exact procedures is effectively barred by the requirement of the infinite amount of information requisite to do so, in the absence of a prior categorization of the universe of experience into relevant and irrelevant aspects. This leads us to the third, and more commonly practiced, version of replication, namely, the attempt to reproduce the 'same phenomenon' in a somewhat divergent setting or context. Here, instead of trying to recapitulate the research program of the original scientist, one negotiates a slightly different manifestation of the effect, motivated at least in part by the research interests of the second investigator. While this version is more common precisely because it is generally perceived by the originator as a friendly act 'extending' his results, it is also inherently very difficult and complicated, since it involves protracted negotiations over the legitimacy of claiming the phenomenon or effect is really 'the same' in the novel setting. Finally, the most complex version of the process of replication admits that most cutting-edge research regularly needs to check the promiscuous attribution of identity, and achieves this by further intervention in


the form of adjusting the novel inquiry to those which have preceded it, usually by imposing some other previous practices as exemplars. Here the task is to make instruments agree in their operation upon materials which are acknowledged by separate investigators to be 'the same'. In the area of the numerical determination of constants, this phenomenon is often made manifest in the 'bandwagon effect'.

Depending upon how you look at it, the bandwagon effect is either the most disturbing bit of evidence around that science is infected by social considerations, or else it is a most ingenious practical solution to the experimenter's regress. It has been discovered in 19th-century attempts to measure physical constants (Henrion and Fischhoff, 1986, p. 793) and in the most up-to-date particle physics (Franklin, 1990, p. 140; Rosenfeld, 1975, p. 581). Briefly, the bandwagon effect is demonstrated by plotting the estimates of a particular physical constant by different investigators along with their accompanying standard error estimates against time. What has been observed over and over is a tendency for estimates to bunch within standard errors for some stretch of time, and then for estimates to jump outside the error bars, only to cluster within a different set of error bars for another interval. The causes of this phenomenon are still being widely debated in the history and philosophy of science literature; the proposed explanations range from shifts in theory regimes to revival of the prospect of the inconstancy of the physical constants. We, on the contrary, will regard this phenomenon as evidence for the most active form of replication, which we have dubbed 'calibration'. Instead of 'testing' or checking the work of a previous investigator, the dominant tendency in science is rather to bring one's own work into line with the previously reported practices. This, of course, is one more major reason why the more commonplace hostile version of replication, which we have dubbed 'mimesis', is so rare in practice.
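To fix ideas, the kind of plot just described is easy to produce. What follows is a minimal sketch in Python using matplotlib; the numbers are fabricated purely for illustration and do not come from any of the measurement series cited above.

    import matplotlib.pyplot as plt

    # Fabricated illustrative values: (year, estimate, reported standard error)
    studies = [(1958, 1.02, 0.03), (1962, 1.04, 0.02), (1965, 1.03, 0.02),
               (1973, 1.19, 0.04), (1976, 1.21, 0.03), (1980, 1.20, 0.02)]
    years, estimates, errors = zip(*studies)

    # Estimates with their reported error bars, plotted against time: the
    # format in which the bandwagon effect becomes visible.
    plt.errorbar(years, estimates, yerr=errors, fmt='o', capsize=3)
    plt.xlabel('year of publication')
    plt.ylabel('reported estimate of the constant')
    plt.title('Estimates bunch within error bars, then jump to a new cluster')
    plt.show()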

Many have interpreted the findings of Dewald, Thursby, and Anderson (1986) as indicating that what they have uncovered is not 'replication' of any empirical facts per se, but rather an inability to police the activities of empirical economists. The critical postmodernist would retort that once the notion of replication is sufficiently explicated, one discovers that the failure of replication could occur for many different reasons and at many varied levels, and that not all of them are necessarily pernicious. It is precisely because the positivist and antipositivist scenarios capture neither the social concerns nor the quotidian practices of applied economists that the narrowly conceived statistical methodologies and computer packages that they tout as panaceas are severely compromised by the pervasive absence of the latter three sorts of replicator activity: namely, mimesis, extension, and calibration. The postmodernist would not presume that this is a prima facie indication of failure, but would instead inquire further: How is stability presently negotiated in empirical econometric work? Is it solely a function of the econometric tools? How does it stack up relative to our empirical knowledge of the other sciences?


4. Comparing replication in the sciences: Birge ratios

In order to merely suggest at least one way this inquiry might be prosecuted, we will turn to a statistical device first introduced into physics in order to assess the relative quality of estimates of physical constants, namely the Birge ratio. The physicist Raymond Birge, one of the people most instrumental in bringing Jerzy Neyman to Berkeley in 1938 (Reid, 1982), was very concerned in the 1930s to stabilize the precision measurement of such physical constants as the speed of light and Planck's constant. Birge's project was to try and clarify how much of the uncertainty plaguing the constants was due to factors internal to each individual experiment, and how much was due to differences of approach between experiments, or as he put it, the difference between internal and external consistency of measurement (Petley, 1985, p. 304; Henrion and Fischhoff, 1986). If we impose the conventional assumption of normality, each experiment reports an estimated value $X_i$ and estimated standard error $\sigma_i$, and we indicate the mean estimated value of the sought-after constant over all experiments as $\bar{X}$, then Birge defined the internal consistency $\sigma_I$ through the sum of the reciprocals of the squared estimated standard errors, $\sigma_I^{-2} = \sum_{i=1}^{n} 1/\sigma_i^2$, whereas the external consistency between experiments was defined by

$$\sigma_E^2 = \left( \sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{\sigma_i^2} \right) \left( (n-1) \sum_{i=1}^{n} \frac{1}{\sigma_i^2} \right)^{-1}.$$

The Birge ratio was then simply defined as $B = \sigma_E/\sigma_I$. As can be readily observed, when B = 1, the assessments of standard errors within experiments can adequately account for the distribution of errors in the estimates of X between experiments. If B is very much greater than one, then one or more of the experiments have severely underestimated the uncertainty of the value of X, or else the presumption of joint independence between experimental determinations may be called into question, in the sense that systematic errors may be present. Generally, a B well above one is taken as an indication of overconfidence in the reporting of statistical errors. Conversely, a B very much less than one means that some experiments are reporting standard errors that are too large. A value of B less than one is taken as a prevalence of experimenter preference for erring on the side of caution.
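To make the definitions above concrete, the ratio can be computed directly from a list of reported estimates and their standard errors. The following is a minimal sketch in Python, assuming the normality and independence conventions just described; the function name and the numerical values in the example are hypothetical, chosen only for illustration.

    import numpy as np

    def birge_ratio(x, se):
        # n independent estimates x_i with reported standard errors se_i
        x, se = np.asarray(x, float), np.asarray(se, float)
        w = 1.0 / se**2                       # inverse-variance weights
        xbar = np.sum(w * x) / np.sum(w)      # weighted mean of the estimates
        n = len(x)
        var_int = 1.0 / np.sum(w)             # internal consistency, sigma_I^2
        var_ext = np.sum(w * (x - xbar)**2) / ((n - 1) * np.sum(w))  # sigma_E^2
        return np.sqrt(var_ext / var_int)     # B = sigma_E / sigma_I

    # Hypothetical example: five estimates of the same 'constant'
    print(birge_ratio([0.52, 0.61, 0.40, 0.55, 0.48],
                      [0.05, 0.04, 0.06, 0.05, 0.07]))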

Birge went on to demonstrate that, under the assumptions of normality and independence, B is distributed asymptotically chi-square with (n - 1) degrees of freedom. Subsequent authors have proposed various amendments to the Birge ratio in order to accommodate further statistical refinements (Petley, 1985, pp. 305-306). These amendments, as well as a discussion of issues relating to the statistical properties of the Birge ratio, are relegated to a separate paper.¹¹

¹¹ The treatment of replication as one possible postmodern mode of analysis of 'testing' is discussed at much greater length in my 'What Could Replication Mean', cited in footnote 1 above.


Here, we merely wish to suggest how previous work on replication done in physics and psychology using a widely-accepted procedure might be extended to an analysis of econometrics. To that end, we report a selection of Birge ratios calculated for estimates of physical constants (Table 1) and psychological 'constants' (Table 2).

Table 1
Birge ratios for estimates of physical constants

Constant                     Years      B      Nᵃ
Speed of light               1875-58    1.42   21
Gravitational                1798-83    1.38   14
Magnetic moment, proton      1949-67    1.44   7
HA fine structureᵇ           -          2.95   24
LA fine structureᶜ           -          1.26   14
Muon lifetime                1957-80    3.28   10
Charged pion mass            1957-80    2.23   10
Lambda mass                  1957-80    4.34   10
Lambda lifetime              1957-80    2.72   27
Sigma lifetime               1957-80    1.62   16
Omega mass                   1957-80    0.86   11

Sources: First panel, Henrion and Fischhoff (1986, p. 794); second panel, Hedges (1987, p. 447).
ᵃ Number of studies estimating value of constant.
ᵇ 'High accuracy' measurements of inverse fine structure constant. No dates given.
ᶜ 'Low accuracy' measurements of inverse fine structure constant. No dates given.

Table 2
Birge ratios for psychological 'constants'

Subject                      B      N
Sex/Spatial perception       1.64   62
Sex/Spatial visualization    1.27   81
Sex/Verbal ability           4.09   11
Sex/Field articulation       1.75   14
Open ed./Reading             5.87   19
Open ed./Math achievement    2.73   17
Open ed./School attitude     2.16   11
Open ed./Self-concept        1.39   18

Source: Hedges (1987, p. 449).

The lack of independence of experimental estimates of the physical constants becomes patently clear when one plots the values and standard error bars of a specific physical constant through time. Empirical values which are chronologically adjacent tend to bunch together, with periodic sharp changes in regime to a bunch clustered around a distant value relative to the previous error estimates (Henrion and Fischhoff, 1986, p. 793). Franklin (1990, pp. 139-140) gives a graphic recent example of η₊₋, the CP-violating parameter in K⁰ decay. A break in 1973 between two sequences of measurements has the mean of the second cluster over eight standard deviations away from the first cluster; he admits that there is still no resolution of this discrepancy. Petley calls this phenomenon 'intellectual phase locking' and Franklin calls it the 'bandwagon effect', but whatever you call it, it plainly is a special case of Collins' Experimenter's Regress. Since the 'world', or the DGP, or other externalist euphemism underdetermines the sought-after value, experimenters must negotiate with one another to calibrate the phenomenon, and this results in serious underestimates of the magnitude of statistical errors.

Again we note that the great preponderance of studies implies Birge ratios well above unity, this time in psychology, which suggests once again widespread overconfidence in the reporting of standard errors. In Table 2, all the estimates were extracted from survey articles, so we do not have the direct explicit evidence of the bandwagon effect that we have for the physical constants, due to the inability to plot the estimates against time. Nonetheless, the collator of these ratios felt that the lack of independence of sequential estimates was the probable culprit. However, the more startling implication is that empirical statistical work in sociology and psychology is not so very different in character from that found in physics, at least in terms of magnitude of Birge ratios. Of course, one should exercise extreme caution in these comparisons, since any bias in choice of studies could certainly change the magnitude of Birge ratios. Having registered some surprise, it should be entered on the positive side of the ledger that the numbers of published studies attempting to gauge a specific magnitude are potentially much greater in the social sciences than in physics, so that large numbers alone could argue for greater stability in the social sciences' implied Birge ratios.

And now we turn briefly to the situation in econometrics. I should report that, when I began this project, I fully expected that developing a set of Birge ratios comparable to those in Tables 1 and 2 would not be a major problem in economics. I rapidly discovered I was sorely mistaken. In turning to the premier orthodox survey journal of the profession, the Journal of Economic Literature, I was stunned to learn that only one article in more than a decade of surveys even bothered to report the actual regressions and standard errors of the works discussed (Judd and Scadding, 1982). But then, further digging revealed that such surveys are almost entirely absent, even in the specialist journal literatures. This does not mean that applied researchers in well-defined research topics do not have a fairly good idea of the location of the 'important' articles in their own area; the problem seems to be that much of this knowledge is tacit, often buried in inaccessible working papers, theses, and unpublished appendices; and further, standard protocol in most applied areas does not elicit explicit


published comparison of your regressions with those of other seminal papers. As a consequence, economics is rife with what we have called 'extension', but extremely poor in 'mimesis' and 'calibration'. This situation diverges sharply from both physics and psychology, primarily because those other fields take upon themselves the onus of replication so much more seriously than do economists. For instance, the physicists have instituted the Particle Data Group, which regularly meets to review experimental procedures and estimates of various values of constants, reject some experiments entirely, and process the remaining efforts into a consensus value (Rosenfeld, 1975). The psychologists have an entire literature called 'meta-analysis', devoted to scrutinizing and reprocessing individual statistical estimates into best-practice results. The neoclassical economists, partly because the theorists have washed their hands of any responsibility for replications and partly because econometric theorists persist in phrasing the issue in individual decision-theoretic terms, have nothing. This is reflected in the Birge ratios in Table 3.¹²

¹² The authors, sources, elasticity estimates, and standard errors are available from the author as an appendix to the paper cited in footnote 1 above. It should go without saying that I intend to calculate a few more Birge ratios in economics, once relevant bibliographies of econometric searches for other 'constants' are uncovered and various questions concerning the statistical properties of the Birge ratios are explicitly addressed.

These topics were chosen to span the space of empirical 'constants' of concern to applied economists (and not necessarily econometricians). In no sense do we seek to impugn any particular schools or journals or individuals; rather, the deeply disturbing magnitudes of the Birge ratios reported in Table 3 should indicate a structural problem in the economics discipline as a whole. Nor are we concerned here to turn this exercise into some form of Neyman-Pearson hypothesis test, where the null is B = 1, an exercise which would undermine the central theses of earlier sections of the paper. We also postpone consideration of the individual literatures and the adequacy of our selection procedures, comparability of studies included, etc. Those are the sorts of considerations which would give rise to a serious postmodern literature on the practices of econometrics, a literature we should like to convince the reader is desperately needed.
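For concreteness, the declined test would run as follows: were the quoted standard errors correct and the $N$ estimates independent and Gaussian, then under the null

    (N - 1)\,B^2 \sim \chi^2_{N-1},

and rejections of $B = 1$ could be manufactured mechanically from Table 3 at any conventional significance level. It is exactly that reflex which the earlier sections of this paper have called into question.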

It turns out that there is more than a grain of truth to the Hendry and Leamer observations: it seems economics journals really do not publish empirical surveys which are in any sense comprehensive or cumulative relative to those in other sciences. I think this constitutes some slim evidence that the contemporary crisis of empiricism in economics is not one requiring more or better econometric technology, nor one adequately dealt with by bemoaning a dearth of intrepid, honest individuals fearlessly exposing their ideas to stringent tests. Rather, there are now in force rational systematic tendencies to present the results of your empirical exercise as if it were coherent with those of your local reference group, which always involves optimism about error bounds; and this is the case whether you are a frequentist or a Bayesian, a physicist or an economist. If the error estimates are corrupted as a consequence, then so are all derivative statistical procedures which depend upon them. What is sorely lacking in economics is any effective concern over the long-term character of short-term opportunistic negotiations. Birge ratios are simply one method of making this point. What will be required are detailed historical studies of the actual relationships between theoretical traditions, empiricist traditions, and instrumentation groups, to discover how disputes are in fact stabilized. In other words, the econometric theorists are the last persons one would expect to diagnose a malaise in economics.

¹² The authors, sources, elasticity estimates, and standard errors are available from the author as an appendix to the paper cited in footnote 1 above. It should go without saying that I intend to calculate a few more Birge ratios in economics, once relevant bibliographies of econometric searches for other 'constants' are uncovered and various questions concerning the statistical properties of the Birge ratios are explicitly addressed.


Table 3
Birge ratios for empirical 'constants' in economics

Model                                            Years      B       N
US money demand elasticity                       1971-88    49.66    9
UK money demand elasticity                       1971-91    73.34    6
Purchasing power parity, 1920s FF/$              1973-88     3.14   14
US import income elasticity                      1974-90    29.87   10
US import price elasticity                       1974-90     5.49   13
US export income elasticity                      1963-90    24.90    9
US export price elasticity                       1963-90     4.89    6
Employment output elasticity, US manufacturing   1967-74    22.70    5
Welfare spell length and race                    1986-92     2.23    -

Sources: See text, and an appendix to 'What can replication mean in econometrics'.


References

Angel, Roger, 1980, Relativity: The theory and its philosophy (Pergamon, New York, NY).
Blaug, Mark, 1980, The methodology of economics (Cambridge University Press, Cambridge).
Caldwell, Bruce, 1982, Beyond positivism (Allen and Unwin, London).
Caldwell, Bruce, 1991, Clarifying Popper, Journal of Economic Literature 29, 1-33.
Cartwright, Nancy, 1989, Nature's capacities and their measurement (Oxford University Press, Oxford).
Cartwright, Nancy, 1991, Replicability, reproducibility and robustness, History of Political Economy 23, 143-155.
Collins, Harry, 1985, Changing order (Sage, London).
Collins, Harry, 1991a, The meaning of replication and the science of economics, History of Political Economy 23, 123-142.
Collins, Harry, 1991b, in: de Marchi and Blaug, eds. (1991).
de Marchi, Neil and Christopher Gilbert, eds., 1989, The history and methodology of econometrics (Oxford University Press, Oxford).
de Marchi, Neil and Mark Blaug, eds., 1991, Appraising economic theories (Edward Elgar, Aldershot).
Denton, Frank, 1988, The significance of significance, in: Arjo Klamer, Donald McCloskey, and Robert Solow, eds., The consequences of economic rhetoric (Cambridge University Press, New York, NY).
Dewald, W., J. Thursby, and R. Anderson, 1986, Replication in empirical economics, American Economic Review 76, 587-603.
Farrell, J. and G. Saloner, 1985, Standardization, compatibility and innovation, Rand Journal of Economics 16, 62-83.
Franklin, Allan, 1986, The neglect of experiment (Cambridge University Press, New York, NY).
Franklin, Allan, 1990, Experiment, right or wrong (Cambridge University Press, New York, NY).
Friedman, Milton and Anna Schwartz, 1991, Alternative approaches to analyzing economic data, American Economic Review 81, 39-49.
Galison, Peter, 1988, History, philosophy and the central metaphor, Science in Context 2, 197-212.
Garfield, Jay, ed., 1990, Foundations of cognitive science (Paragon, New York, NY).
Gigerenzer, Gerd and David Murray, 1987, Cognition as intuitive statistics (Lawrence Erlbaum, Hillsdale, NJ).
Gigerenzer, G., Z. Swijtink, T. Porter, L. Daston, J. Beatty, and L. Kruger, 1989, The empire of chance (Cambridge University Press, New York, NY).
Gilbert, Christopher, 1990, in: Granger, ed. (1990).
Gilbert, Christopher, 1991, in: de Marchi and Blaug, eds. (1991).
Granger, Clive, ed., 1990, Modelling economic series (Oxford University Press, Oxford).
Haavelmo, Trygve, 1944, The probability approach in econometrics, Econometrica 12, Suppl.
Hands, Wade, 1991, in: de Marchi and Blaug, eds. (1991).
Hands, Wade, 1992, Testing economic rationality (Rowman and Littlefield, Totowa, NJ).
Harding, Sandra, ed., 1976, Can theories be refuted? Essays on the Duhem-Quine thesis (Reidel, Dordrecht).
Hedges, Larry, 1987, How hard is hard science, how soft is soft science?, American Psychologist 42, 443-455.
Hendry, David, 1990, in: Granger, ed. (1990).
Hendry, David and N. Ericsson, 1991, An econometric analysis of UK money demand in Monetary trends in the US and UK, American Economic Review 81, 8-38.
Hendry, David and Grayham Mizon, 1990, in: Granger, ed. (1990).
Hendry, David and Jean-Francois Richard, 1983, The econometric analysis of economic time series, International Statistical Review 51, 111-163.
Hendry, David, Edward Leamer, and Dale Poirier, 1990, A conversation on econometric methodology, Econometric Theory 6, 171-261.
Henrion, Max and Baruch Fischhoff, 1986, Assessing uncertainty in physical constants, American Journal of Physics 54, 791-797.
Howson, Colin and Peter Urbach, 1989, Scientific reasoning: The Bayesian approach (Open Court, LaSalle).
Hubbard, R. and D. Vetter, 1991, Replications in the finance literature, Quarterly Review of Business and Economics 30, 70-78.
Hubbard, R. and D. Vetter, 1992, The publication incidence of replications and critical commentary in economics, American Economist 36, 29-34.
Judd, John and John Scadding, 1982, The search for a stable money demand function, Journal of Economic Literature 20, 993-1023.
Kirman, Alan, 1989, The intrinsic limits of modern economic theory: The emperor has no clothes, Economic Journal, Suppl., 99, 126-139.
Leamer, Edward, 1978, Specification searches (Wiley, New York, NY).
Leonard, H. and K. Maskus, 1992, Edward Leamer, in: Warren Samuels, ed., New horizons in economic thought (Edward Elgar, Aldershot).
Linden, Dana, 1991, Dreary days in the dismal science, Forbes, Jan. 22, 68-70.
Mandelbrot, Benoit, 1983, The fractal geometry of nature (Freeman, New York, NY).
Mayer, Thomas, 1993, Truth versus precision in economics (Edward Elgar, Aldershot).
Mirowski, Philip, 1989a, The probabilistic counter-revolution, Oxford Economic Papers 41, 217-235.
Mirowski, Philip, 1989b, The measurement without theory controversy, Economies et Societes, Serie Oeconomia, 65-87.
Mirowski, Philip, 1990, From Mandelbrot to chaos in economic theory, Southern Economic Journal 57, 289-307.
Mirowski, Philip and Steven Sklivas, 1991, Why econometricians don't replicate (although they do reproduce), Review of Political Economy 3, 146-163.
Mizon, Grayham, 1984, The encompassing approach in econometrics, in: D. Hendry and K. Wallis, eds., Econometrics and quantitative economics (Basil Blackwell, Oxford).
Olesko, Kathryn, 1991, Physics as a calling (Cornell University Press, Ithaca, NY).
Peirce, C.S., 1931, Collected works, Vol. 6 (Harvard University Press, Cambridge, MA).
Petley, Brian, 1985, The fundamental constants and the frontier of measurement (Adam Hilger, Bristol).
Poirier, Dale, 1988a, Causal relationships and replicability, Journal of Econometrics 39, 213-234.
Poirier, Dale, 1988b, Frequentist and subjectivist perspectives on the problem of model building, Journal of Economic Perspectives 2, 121-170.
Reid, Constance, 1982, Neyman from life (Springer Verlag, New York, NY).
Rosenfeld, A.H., 1975, The particle data group: Growth and operations, Annual Review of Nuclear Science 25, 555-599.
Shapin, Steven and Simon Schaffer, 1985, Leviathan and the air pump (Princeton University Press, Princeton, NJ).
Spanos, Aris, 1986, Statistical foundations of econometric modelling (Cambridge University Press, New York, NY).
Spanos, Aris, 1989, On rereading Haavelmo, Econometric Theory 5, 405-429.
Spanos, Aris, 1990, in: Granger, ed. (1990).
Summers, Lawrence, 1991, The scientific illusion in macroeconomics, Scandinavian Journal of Economics 93, 129-148.
Tienson, Ralph, 1990, in: Garfield, ed. (1990).
Tintner, Gerhard and Jati Sengupta, 1972, Stochastic economics (Academic, New York, NY).
Weintraub, E. Roy, 1991, Stabilizing dynamics (Cambridge University Press, New York, NY).
Zahar, Elie, 1980, Einstein, Meyerson and the role of mathematics in physical discovery, British Journal for the Philosophy of Science 31, 1-43.