Journal of Statistical Planning and Inference 139 (2009) 1251–1256


Reified Bayesian modelling: Issues and opportunities (response to the discussion)

Michael Goldstein a,∗, Jonathan Rougier b

a University of Durham, UK
b University of Bristol, UK

∗ Corresponding author. Tel.: +44 191 334 3065; fax: +44 191 334 3051. E-mail address: [email protected] (M. Goldstein).

0378-3758/$ - see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.jspi.2008.08.010

We would like to thank the discussants for their thoughtful engagement with our proposed “reified modelling” approach. We shall respond firstly with general observations relevant to the wider issues raised and then address the individual discussants.

General observations

The general question of how we learn about complex physical systems using models is of fundamental importance for our understanding of the world. However, there is very little in the way of systematic thinking about how such learning should take place, and, in particular, how we account in a quantitative way for inadequacies in our simulators. It is in this context that we have proposed our reified modelling approach. Our starting point is the problems that we encounter when working with “direct simulators”. That idea is quite natural. If we do not consider that our simulator, even at its most appropriate choice of input value, will exactly reproduce the system, then we can try to account for this mismatch by adding an independent discrepancy term to the simulator output, as per our Eq. (1). This is analogous to treating simulator inadequacy as we would treat observational error. We first used this approach in our work on pressure matching for oil reservoirs (see, e.g., Craig et al., 1997, 2001). Adding such a discrepancy term was clearly preferable to ignoring simulator inadequacy, but introduced substantial conceptual difficulties, as follows.
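In symbols (our paraphrase of the paper's Eq. (1), with y the system value, f the simulator and x∗ the best input):

    y = f(x∗) + ε, with ε independent of (f, x∗),

so that, under the direct simulator judgement, all of the simulator's inadequacy is carried by the single unstructured term ε.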

As our reservoir simulator was very time-consuming to evaluate, our approach to constructing our emulator for the simulator was based on creating a faster approximate version of the simulator which could be evaluated many times. We therefore needed to specify our judgements about the relationship between the two simulators. We were now in the situation described in Section 2 of our paper. Judgements about the fast simulator were related to judgements for the full simulator, and we added an independent discrepancy term to the full simulator. Even though we had not yet made any evaluations of the full simulator, we had therefore constructed, implicitly, an uncertainty specification for the discrepancy between our fast simulator and the reservoir. This discrepancy was not independent of the fast simulator or the best choice of input. Further, it was not particularly helpful to consider the evaluation of the fast simulator at the best choice of input. Rather, the reason that the fast simulator was informative about the reservoir was that it was informative about the full simulator.
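To spell out that implicit specification (our reconstruction, in the notation above, writing f′ for the fast simulator and f for the full one):

    y = f(x∗) + ε = f′(x∗) + δ′, where δ′ = [f(x∗) − f′(x∗)] + ε.

Since the term f(x∗) − f′(x∗) is specified through the judgements linking the two simulators, the induced fast-simulator discrepancy δ′ cannot be independent of f′ or of x∗.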

We found this suggestive as, a few years previously, the simulator that we were now viewing as a fast approximation would have represented the state-of-the-art. If we could not now view the fast simulator as a direct simulator for the reservoir, why would it have been reasonable to consider it to have been a direct simulator a few years previously? If we should not have judged it then to be a direct simulator, then why should we now judge the full simulator to be a direct simulator? After all, in the next few years, improvements in computing power and reservoir modelling would surely turn our current full simulator into a fast approximation to some even better simulator.

As we were pondering this, we were also having practical difficulties working with reservoir engineers to quantify the variance of the discrepancy term. We realised that our problems arose largely because our simulators were not, for the most part, direct simulators at all. More widely, we came to appreciate precisely what a strong statement of belief is made by viewing a simulator as direct. How, even in principle, should we describe the relation between the simulator and reality? As our problems arose from consideration of improvements to our current simulator, it seemed natural to us to construct judgements relating our simulator to reality through the introduction of an intermediate conceptual construct, the reified simulator, which is specified to a sufficient level of detail that it exhausts our insights into the ways in which our current simulator may fail to represent the system, and which we could therefore treat as a direct simulator.

This construction, as well as helping to straighten out the logic of inferences from simulators to systems, brings additional advantages. Firstly, we have a systematic way of considering the impact on our analysis of specific types of improvements to our simulators. Secondly, the reified structure allows the physical values of the inputs, where appropriate, to play a meaningful role in the analysis. Finally, we have a unified structure within which we may reconcile the analysis from several different simulators of the same underlying system.

All of this comes at a price, of course, namely the need to think more carefully about simulator inadequacy than we are currently used to doing; this seems to us to be entirely a good thing! However, lest the reader should think from some of the discussion that reification is such a daunting process that it could never be carried out in practice, our view, on the contrary, is that reification only becomes complex when we have complex views as to the nature of simulator inadequacy, views that we feel are sufficiently important that they must have careful representation in our inferential formulation. Otherwise, reification is not so very different from building discrepancy into the direct simulator formulation.

To emphasise this, we outline what might be termed “direct” reification, a natural generalisation of the direct simulator approach. We recognise that we have uncertainty in the reified simulator that will not be completely resolved, no matter how many evaluations we make of our current simulator. Therefore we decompose simulator inadequacy into two parts: one part representing uncertainty about the reified simulator conditional on the current simulator, and one part representing uncertainty about the system conditional on the reified simulator evaluated at its best input. The second part should be familiar. For the first part, our simple approach would be to duplicate the structure of the emulator of the current simulator for the reified simulator, but to introduce extra uncertainty, e.g., into the regression coefficients, and by decorrelating the residual. This process moves some of the physics of the current simulator into the implicit formulation of the current simulator's discrepancy. As in our illustration, the extra uncertainty can be calibrated on a scale that runs from our current simulator to the system, with the reified simulator lying in-between. Just as having some simulator discrepancy is better than ignoring this discrepancy altogether, direct reification is better than treating as direct a simulator that we judge not to be direct. The balance between scientific judgement and pragmatic simplifications is problem dependent, and our framework may be used to whatever level of detail is deemed scientifically appropriate.
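A minimal sketch of this “direct” reification, assuming an emulator of the common linear form f(x) ≈ g(x)β + u(x); the function and parameter names below (direct_reify, kappa, rho) are ours, purely for illustration, and in practice the inflation factor would be calibrated on the current-simulator-to-system scale just described:

import numpy as np

# Illustrative sketch only: we assume the current simulator's emulator is
# summarised by E[beta], Var[beta] and the residual variance var_u.

def direct_reify(E_beta, V_beta, var_u, kappa=2.0, rho=0.5):
    """Duplicate the current emulator's structure for the reified simulator,
    inflating coefficient uncertainty and (partially) decorrelating the
    residual. kappa = 1, rho = 1 would recover the current simulator;
    larger kappa and smaller rho move the specification towards the system."""
    E_beta_star = E_beta.copy()                # same regression surface a priori
    V_beta_star = kappa * V_beta               # extra coefficient uncertainty
    var_u_star = kappa * var_u                 # inflated residual variance
    cov_u = rho * np.sqrt(var_u * var_u_star)  # decorrelated residuals u, u*
    return E_beta_star, V_beta_star, var_u_star, cov_u

# Example: a two-coefficient emulator.
E_b, V_b = np.array([1.0, -0.3]), np.diag([0.04, 0.01])
print(direct_reify(E_b, V_b, var_u=0.25))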

Specific observations

We now address the discussants' comments in turn. We restrict our responses to avoid too much duplication.

We would like to thank O'Hagan for his thoughtful comments on the general issue of the structure of simulator discrepancy. His example is nicely illuminating and provides a useful complement to our general discussion about problems with overly simple formulations of simulator discrepancy in Section 2 of our paper. As for the caution that “Prior distributions should instead acknowledge the extent to which the best fitting values will deviate from the physical values . . .”, this is addressed directly within our formulation, as best fit within the reified simulator need not correspond to best fit within our current simulator. Indeed, this mismatch is an important reason for introducing the reified form when our aim is to make inferences about the physical inputs. The effect on our posterior uncertainty is not clear-cut: as x∗ lives in our reified simulator, our prior on x∗ can be more physical; but while it is closer to the information available in the observations, it is further from the information in the ensemble of evaluations of our current simulator.

This issue of the “best input” is raised in Challenor's comments on the distinction that we make between molecular viscosity, which corresponds to direct physical measurement, and eddy viscosity, a quantity which is specifically intended, at least in part, to tune out aspects of simulator deficiency (namely the discretisation of the solver). Careful consideration of the deficiencies in our current simulator may open up a host of challenging scientific questions. In any given problem, it is a matter of scientific judgement as to which distinctions it is important to make carefully and which, for current purposes, may be temporarily put to one side. Obviously, we will obtain a better description of our uncertainty about actual ocean behaviour if we think carefully about the approximations arising from our choice of grid scales, and reification offers us the possibility of incorporating our judgements about such effects directly into our analysis. Challenor comments that the best set of simulator inputs may vary across outputs. This is less of a problem within the reified approach, however, as the best input for the reified simulator may imply quite different choices of “best inputs” for each output of our current simulator, depending on how we relate the outputs across the current and reified simulators, conditionally on the inputs.

Finally, Challenor expresses concern at the level of detail needed in order to carry out our proposed reification and wonders whether the expert will have sufficient knowledge to make the required uncertainty judgements. However, if we have only small amounts of knowledge about the ways in which our simulators may be deficient then it is straightforward to build the reified structure, as we have explained above, and this structure will better express our overall uncertainties about simulator discrepancy than would any simple alternative approach that we are aware of. We suspect, however, that the opposite will often be the case and that, given the opportunity to think carefully about how the simulator relates to the system, the expert will have quite extensive views about the potential impact of what we term, in Section 4.3 of our paper, “structural reification”.


Morris and Higdon express concern that the reified formulation may give worse results than the direct formulation. We are not sure how they are quantifying “worse” in this context: if we are unhappy to treat our simulator as a direct simulator, then it can hardly be worse to acknowledge this fact than to ignore it. We would not knowingly make our reified simulator more unrealistic than our current simulator, but we would also not make the mistake of thinking that our current simulator was physically realistic, or that its realism would be preserved with the “tried and tested” approach of simply adding an independent discrepancy.

Morris and Higdon are also concerned about the extra effort involved in going beyond treating their simulator as a direct simulator. They observe that it may be very challenging to carry out a careful reification which unifies the wide range of simulators arising in various important large scale problems, such as climate change. While this is certainly true, we disagree with their assertion that “Constructing beliefs regarding the reified simulator in such a case seems arduous at best.” We would not describe the process of building the original climate simulators as “arduous at best”, nor the effort of collecting relevant data, nor of trying to understand the implications of climate change. These activities may be difficult but they are not “arduous at best”—they are the normal hard work of science. Reification is just part of that: providing a better framework for incorporating scientific judgements, in situations where those judgements really matter. We agree that scientific progress may often be surprising and messy, but we draw the opposite conclusion, namely that it is therefore more important, not less, to think carefully about what our current simulators are actually telling us about the physical systems that we are trying to understand.

Lavine, Hegerl and Lozier raise three basic issues. Firstly, do true values of the inputs to the reified simulator exist? We would say yes, subject to one important exception. In order to simplify our discussion, we have not described our view of the role of tuning parameters in a reified analysis. In our view, there are two basic types of tuning parameters. Firstly, we may tune the simulator at its best input. In our formulation, this corresponds to choosing a tuning parameter to make our current simulator and the reified version match more closely at the true physical inputs. Secondly, we may tune the underlying relationship between inputs and outputs to attempt to overcome systematic deficiencies in our current simulator. This corresponds to tuning the current simulator, considered as a function, so that it more closely resembles the reified simulator considered as a function. Therefore, the reified simulator will not contain the tuning parameters in our current simulator.
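In hedged symbols (ours, not the paper's), writing f(x, t) for the current simulator with tuning parameter t and f∗(x) for the reified simulator, the two types of tuning are:

    (i) choose t so that f(x∗, t) ≈ f∗(x∗), matching at the best input;
    (ii) choose t so that f(·, t) ≈ f∗(·) as functions, matching over the whole input space.

In either case t is an artefact of the current simulator, which is why the reified simulator does not contain it among its inputs.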

However, those remaining inputs which express physical values should be defined sufficiently closely to meet the requirements of the problem. There are two such requirements. Firstly, we may be relying on physical measurements or insights in forming our judgements about the values of the inputs, or conversely one of the aims of the analysis may be to learn about the true values of some of the inputs, using observational data from the system. In such cases, our reified simulator should be defined to a level of detail corresponding to these values. Secondly, we need to provide a convincing scientific case for any relationship that we wish to assert between our simulator and the physical system. Reification does not patch up the deficiencies in our current simulator, but it can express the additional uncertainty that we have introduced in the process of simplifying our input space. Because we judge the reified form to be closer to a genuine representation of the physical process than is our current simulator, it is reasonable to view best choices of inputs to the reified simulator to be closer to true values than would corresponding values for our current simulator. This is all that we would claim for the illustration in the paper. Therefore, our value, for example, of T₂∗ may be viewed as an aggregate over temperature forcing for the North Atlantic, while the discrepancy introduced by spatial variation in T₂∗ is attributed to the residual discrepancy between the reified simulator and the actual behaviour of the Atlantic. We are not proposing a definitive treatment of temperature forcing in the North Atlantic: rather, we are accounting for our own concerns about Zickfeld et al.'s simulator.

Secondly, Lavine, Hegerl and Lozier ask us to compare our approach to current practice among climate modellers. Our view is that the approach of Murphy et al. (2004) is consistent with the direct simulator approach, including using the ensemble to construct an emulator. One of us discussed this in detail with the authors and the resulting comment (Rougier, 2004) was acceptable to all parties except the journal, Nature. In Rougier et al. (2008), we have re-analysed the data from both Murphy et al. and the climateprediction.net experiment (Stainforth et al., 2005) in a more transparently statistical way, including combining the two ensembles through linked emulators. The discussants identify that current practice requires climate scientists to make their analysis contingent on the perfect accuracy of the GCM, rather than addressing the issue of how to quantify simulator discrepancy (although, in fact, Murphy et al. do account for something more than measurement error). To which we can only reply that this seems to be “a calculated decision to make life easier for climate scientists but harder for policymakers” (Rougier, 2007).

The discussants ask us to comment on the possible effects of high leverage values on our analysis. We do detailed modelling of our computer simulator through careful emulation. High leverage observations in such contexts should be treated carefully, as is the case in any statistical modelling process. They may be very influential to our final conclusions because they supply information on aspects of our simulator which are not readily available from the other evaluations. The fact that such considerations are not relevant to many alternative methods currently in use illustrates the lack of care in those methods in keeping track of how information about the simulator is being translated into information about the process.

Finally, Lavine, Hegerl and Lozier ask about practical issues involved in reification. We concur that the best input approach can be tractable, and that Monte Carlo integration, where it is practical, obviates the need to construct an emulator: exactly this situation is described in Rougier (2007). However, in many cases we will not be able to perform as many evaluations as we would like, and then an ensemble of carefully chosen evaluations distilled into an emulator may be preferable to randomly chosen evaluations plugged directly into the inference; Rougier and Sexton (2007) discuss this in more detail. The principles of emulation are now well-established (see, e.g., Santner et al., 2003), and the technology is developing, under the impetus of applications, to handle functional outputs like fields of temperatures (see, e.g., Bayarri et al., 2007; Higdon et al., 2008). More pertinently, though, there is not, to our knowledge, another approach for linking functional uncertainty to process uncertainty, so that emulators offer the only way forward for expensive simulators, like general climate models.
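The contrast between the two strategies can be sketched in a few lines; the toy simulator and design below are entirely ours, chosen only to show the mechanics of plugging random evaluations straight into the inference versus distilling a small designed ensemble into a (here, crude regression) emulator:

import numpy as np

def simulator(x):
    """Stand-in for an expensive simulator."""
    return np.sin(3 * x) + 0.5 * x

rng = np.random.default_rng(0)

# (a) Monte Carlo: many randomly chosen evaluations plugged into the inference.
x_mc = rng.uniform(0.0, 1.0, 10_000)
mc_mean = simulator(x_mc).mean()

# (b) Emulator: distil a small designed ensemble (8 runs) into a quadratic
# regression emulator, then integrate the cheap emulator instead.
x_design = np.linspace(0.0, 1.0, 8)
G = np.vander(x_design, 3)                     # basis g(x) = (x^2, x, 1)
beta, *_ = np.linalg.lstsq(G, simulator(x_design), rcond=None)
em_mean = (np.vander(x_mc, 3) @ beta).mean()

print(mc_mean, em_mean)   # the two estimates should roughly agree

For an expensive simulator the many direct evaluations in (a) are unaffordable, which is what makes the small designed ensemble in (b) attractive.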

Moreover, no amount of simulator evaluations can overcome the limitations of the direct simulator approach. If these limitations seem pertinent, then, at the minimum, direct reification offers an improvement: this does not seem any more onerous and may even be less so, since judgements about the best input to the reified simulator can be better-informed by system values. We are asked about our requirement that the reified form should be “sufficiently careful” and “accurate” that “we would not consider it necessary to make judgements about any further improvements . . .”. We clarified the sense of this quote in the next sentence—our emphasis on the higher accuracy of the reified form is only to motivate the idea that, at some level of refinement, even the most careful elaboration of current knowledge will exhaust our ability to make meaningful statements about the effects of further improving the simulator. In most cases, we envisage that our Eq. (10) will be used to express quite specific enhancements to our current simulator. In our example, we decided to “not introduce any further simulator or regression functions” because, as non-experts, we have no further beliefs to express about the practical ways that our simulator should be enhanced. This is a simplification only because, were we to think very carefully, we could doubtless find ways to further enhance our belief specification.

Like any other modelling activity, we must strike the right balance between pragmatism and completeness. Of course, there is a lot of work involved in careful reification of a complex, high dimensional simulator, but this should be set against the effort required to build and maintain the simulator and collect the relevant process data in the first place. It seems quite wrong to us that all this effort should fall at the final hurdle, because we are unwilling to put forward the effort to link the simulator to the system. However, we do agree that very high dimensional prior distributions are difficult to think about, which is why we recommend the Bayes linear approach, which requires only means, variances and covariances (much easier to specify and analyse than full joint distributions in very high dimensions) and which provides a rich set of diagnostics (Goldstein and Wooff, 2007). In Goldstein and Rougier (2006) we show how the Bayes linear approach can be used in both calibration and prediction, and in diagnostic checking. The choices we make in our reified modelling may be checked, albeit somewhat indirectly, as they imply a mean, variance and correlation structure across both the elements of the physical process and the evaluations of our actual simulator. If we have observations on aspects of the process we can compare these directly with our mean and variance, and quantify the mismatch, for example in standardised differences.
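For reference, the adjustment underlying this second-order approach is standard (see Goldstein and Wooff, 2007): in the usual notation, adjusting a quantity y by data z,

    E_z(y) = E(y) + Cov(y, z) Var(z)⁻¹ (z − E(z)),
    Var_z(y) = Var(y) − Cov(y, z) Var(z)⁻¹ Cov(z, y),

and the standardised differences mentioned above take the form (z_i − E(z_i)) / √Var(z_i) for each observed component z_i.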

Regarding “uncertain uncertainties”, it is unfortunate that current statistical practice in climate science is so poor that “Current estimates and their stated uncertainties are not taken at face value.” This, though, is due in part to a reluctance to quantify simulator discrepancy at all, without which it makes very little sense to calibrate our climate simulators' inputs to system observations, which can explain the contradictory results for quantities such as climate sensitivity. If attempts to quantify the discrepancy highlight the limitations of the best input approach, then the reified approach provides an explanation and a solution. The discussants have qualms about the analysis being “for me personally, not necessarily for anyone else”, an issue that is not limited to the reified approach, but one that is highly visible there because the first climate experts to do it will be going out on a limb. We would say that to have at least one expert team, after very careful reflection, stating their actual uncertainties is surely better than an analysis which nobody has an informed judgement for. The role of subjective analysis in scientific enquiry is subtle and complicated; see the discussion of subjectivism in science in Goldstein (2006), and the comment in Rougier (2007, Section 2).

Smith asks who owns the probability model. In our example, the judgements and conclusions are all ours. More generally, the answer will depend on whether the intention is for the individual scientist or analyst to clarify their own judgements about the process, or whether the aim is to make an analysis which will influence a wider community. The former situation requires only a personal subjectivist analysis of the type we have described. In the latter case, we must be careful to source and justify our judgements explicitly and to identify the extent to which experts may reasonably disagree with our conclusions; again see the discussion in Goldstein (2006). However, our fundamental concern is to suggest ways to help experts who wish to improve their own uncertainty assessments, ideally in the context of a team comprising both scientists and statisticians.

Smith's second question concerns our attitude to the reified simulator that we construct, and in particular whether this construction fails de Finetti's “clarity principle” to the effect that previsions may only be specified over observables. This is such a common misconception of de Finetti's actual position that it is worth spending a little time addressing this confusion. About 25 years ago, one of us (Goldstein) wrote the following:

Traditional Bayesian models can be better understood by explicitly recognizing and distinguishing between two fundamentally different meanings for probability statements. The former is the use of probability (or prevision) as the quantitative expression of the knowledge of an individual. The latter is the use of probability as a purely technical intermediary quantity helpful in translating generalised knowledge into precise statements of the former kind. (Goldstein, 1981)

Subsequent personal communication from de Finetti confirmed that this was also his view.

In the current context, there are a certain number of actual expectation and variance statements about the observable physical process that we make as the result of the analysis. These are intended to pass the clarity test and to represent our actual best judgements about the values that we assert for these assessments. We work within the Bayes linear formalism precisely so that we can be very careful in limiting such judgements to those which we are both able and willing to consider. Sometimes the reified version of the simulator might be operationally well-defined, for example being based on a higher-resolution solver. Often, however, our quantifications over the reified form lie within the second category of probability statements, namely acting as a technical intermediary to translate our generalised knowledge into statements of beliefs about real phenomena. We only need to consider the reified form to a sufficient level of detail that it will lead us to our belief statements over the observables, and because we are working with expectation as primitive within the Bayes linear formalism this does not require us to consider concerns such as an event partition over the simulator outcomes.

Why do we consider the reified form an appropriate vehicle for this translation of beliefs? The answer to this lies in Smith's third question, where he asks why we consider that the reified form should obey the crucial conditional independence property that we invoke through our Eq. (8). In the paper, we offer the justification for this property that the reified form exhausts all of our structured judgements which induce correlation between simulator discrepancy and the function. This is a natural way to think about our construction. However, since we are asked directly why we should accept this argument, let us now outline the basic subjectivist considerations which led us to this position. The fundamental question that we have considered is what we may learn about physical systems from simulators of the system. Why do we think that we can learn anything at all? Obviously, each individual problem has its own specific features. However, for many problems, the reason that we consider that we may learn about the system from the simulator is because the simulator has been built with precisely this aim in mind. Although usually imperfectly realised, the motivating idea is that, were we to know the physical values of the associated inputs, then the corresponding simulator evaluation, while not being equal to the system value, would exhaust our knowledge conditional on the chosen input values and therefore act as our judgement about the system.

If we are working with expectation as primitive, then we may ask whether most simulators in current use could possibly function as posterior expectations given the appropriate input values. Unfortunately, exactly the reasons that we gave as to why we cannot view most simulators as “direct” apply in just the same way to rule out such simulators as posterior judgements. In Goldstein (1997), it was shown that our current judgements about the difference between our posterior prevision for a quantity and the actual value of the quantity must be uncorrelated with the posterior judgement, and all ingredients of that judgement. This property is derived directly from the “temporal sure preference” condition, which is a minimal coherence requirement for our current views of our future judgements. Therefore, we may rephrase our reified simulator construction as the minimal elaboration of the simulator which could satisfy the intended role as a posterior judgement given appropriate input values. As we have emphasised, this need not be a complicated elaboration, unless we actually have complex judgements to incorporate. So, in summary, reification completes the formal representation of the simulator as a posterior judgement and the conditional independence statement (8) follows as a direct consequence of temporal sure preference. The times when we cannot carry out this process are when the simulator is not intended, even in principle, to furnish a judgement. In such cases, a different justification is required by the expert in order to relate the simulator to the system. For example, we might view the simulator evaluation at the true physical values of the inputs as being, in some sense yet to be defined, an observation on the system, which would induce a quite different conditional independence structure. We look forward to the careful exposition of alternative views of simulator function which still lead to well-founded inferences about the physical system.
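To make the key step explicit (our restatement of the Goldstein, 1997, property): writing E_T(X) for the posterior judgement about a quantity X that we will hold at future time T, temporal sure preference implies

    Cov(X − E_T(X), E_T(X)) = 0,

and similarly for every ingredient of E_T(X). Casting the reified simulator evaluated at the appropriate inputs in the role of such a posterior judgement for the system value therefore leaves the residual discrepancy uncorrelated with that evaluation and its ingredients, which is the structure asserted through Eq. (8).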

Smith's next question involves a distinction which is unclear to us. We do not consider, implicitly or explicitly, that “the world is assumed to be run by a complicated simulator”, only that physical laws and their mathematical implementation are relevant to our judgements about the behaviour of the system given knowledge of the state of the system. Therefore, we do not make predictions given that we have set the inputs to a given value in any sense that corresponds to “doing”. For example, an oil reservoir has certain physical characteristics deriving from permeability and porosity fields, fault configurations and so forth, which will certainly influence the production performance at the oil wells, but there is no sense in which we might envisage the effect on the well behaviour of physically changing the fault configuration of our reservoir. However, even this is irrelevant to Smith's actual question, which is why we assume no “common cause” between various ingredients of the final uncertainty specification. The question is misconceived in the same way that the view that the world is run by a complicated simulator is misconceived. The world is just the way it is, and so, of course, there is a common cause between all of the ingredients of our formulation. The various simulators that we develop help us to understand why the world is the way it is, and the uncertainty specification that we build up describes our state of knowledge as to the quality and limitations of our simulators. By construction, certain ingredients within our formulation are uncorrelated, but this corresponds to lack of knowledge, not some form of mechanistic ‘real-world’ unrelatedness, as appears to be envisaged in the question.

Smith's final question concerns the role of expert opinion within our approach. He cites the example of an expert on dispersal simulators for contaminated particles and observes, in particular, that her simulators appear to work better with the notion of “effective height” rather than true height for the contamination plume. We can easily imagine that her simulator would not work well with true height—after all, the simulator has not been reified. The fundamental issue remains, however, namely explaining how the expert's simulator may be used to give uncertainty statements for actual emissions of contaminated material. Given Smith's apparent scepticism towards methods based either on considering the order of magnitude of simulator discrepancy directly or of trying to source this discrepancy by considering potential weaknesses in the expert's simulator, all that seems to remain is direct matching to historical data (the many past experiments which, the question implies, have been used to calibrate the simulator).


The expert may be happy with this, but should we be equally accepting? We were intrigued by the suggestion that such methods could have resulted in a reliable methodology for forecasting the real spread of contamination and so we googled “dispersion models”. We were not surprised to find all of the usual criticisms being made of such models, with great scepticism being expressed about the underlying rationale for the modelling, concerns being voiced at the lack of a serious framework for assessing simulator uncertainty, and in particular in the way that the simulators fail to account for actual variation in real situations. A typical quote is as follows:

. . . most models predict the average dispersion (over a large number of realizations of the given situation) and not the event-to-event variability about that average. As a result, even a good atmospheric transport model may have single-event errors of more than a factor of ten. (In Tracking and Predicting the Atmospheric Dispersion of Hazardous Material Releases: Implications for Homeland Security, National Research Council of the National Academies, National Academies Press, Washington, DC, USA, 2003)

This quote corresponds quite closely to the world that we recognise, in which problems of enormous practical concern are studied using large computer simulators whose relationship to the real systems of interest is fragile at best. There is a growing awareness, among both experts and end-users such as policymakers, of the importance of careful uncertainty analysis for linking the simulator and the system. Smith's interpretation that our aim is to move judgements from the world of the expert into our world is an unhelpful view of this process, which leads to much of his perplexity. Our aim is to help any expert who is not satisfied with her current description of uncertainty about the actual systems which she is modelling—which includes almost every expert modeller that we have ever met, irrespective of the field of enquiry. For all the reasons that we have outlined above, we consider reification to be a helpful part of this process.

A final general comment. We are all aware of the massive effort made by the scientific modelling community to study complex physical processes. We accept that the simulators that are built will not be perfect at the start, or perhaps ever, but we do the best job that we can and gradually the simulators improve. Inferential analysis for simulators and systems is no different. If the problem is complicated, the analysis will be demanding and we will return to it many times. To restrict ourselves to an overly simple form for the link between simulator and system would be like insisting that only simple simulators should be built. Initially, we would appear to save time and effort but soon we would be unable to patch up the deficiencies arising from our over-simplifications. This is happening in many fields right now, and it is time to consider our uncertainties over model discrepancy much more carefully.

References

Bayarri, M.J., Berger, J.O., Cafeo, J., Garcia-Donato, G., Liu, F., Palomo, J., Parthasarathy, R.J., Paulo, R., Sacks, J., Walsh, D., 2007. Computer model validation with functional output. Ann. Statist. 35 (5), 1874–1906.
Goldstein, M., 1981. Revising previsions: a geometric interpretation. J. Roy. Statist. Soc. B 43, 105–130 (with discussion).
Goldstein, M., 1997. Prior inferences for posterior judgements. In: Chiara, M.C.D., et al. (Eds.), Structures and Norms in Science. Kluwer Academic Publishers, Dordrecht, pp. 55–71.
Goldstein, M., 2006. Subjective Bayesian analysis: principles and practice. Bayesian Anal. 1, 403–420 (with discussion).
Goldstein, M., Wooff, D.A., 2007. Bayes Linear Statistics: Theory and Methods. Wiley, New York.
Higdon, D., Gattiker, J., Williams, B., Rightley, M., 2008. Computer model calibration using high dimensional output. J. Am. Stat. Assoc. 103 (482), 570–583.
Rougier, J.C., 2004. Brief comment arising re: quantification of modelling uncertainties in a large ensemble of climate change simulations. Unpublished, available at 〈http://www.maths.bris.ac.uk/∼mazjcr/commentMurphyetal.pdf〉.
Rougier, J.C., 2007. Probabilistic inference for future climate using an ensemble of climate model evaluations. Climatic Change 81, 247–264.
Rougier, J.C., Sexton, D.M.H., 2007. Inference in ensemble experiments. Philos. Trans. Roy. Soc. Ser. A 365, 2133–2143.
Rougier, J.C., Sexton, D.M.H., Murphy, J.M., Stainforth, D., 2008. Analysing the climate sensitivity of the HadAM3 climate model using ensembles from different but related experiments. In submission.
Stainforth, D.A., Aina, T., Christensen, C., Collins, M., Faull, N., Frame, D.J., Kettleborough, J.A., Knight, S., Martin, A., Murphy, J.M., Piani, C., Sexton, D., Smith, L.A., Spicer, R.A., Thorpe, A.J., Allen, M.R., 2005. Uncertainty in predictions of the climate response to rising levels of greenhouse gases. Nature 433, 403–406.