Engineering subprogramme, 7 November 2006




Tony O’Hagan


Three parts:

• Turbofan engine vibration model
• Reification
• Predictors and validation

Part 1: The new model

Turbofan vibration model

Rolls-Royce, Derby, UK

Maker of civil aeroplane engines

Simulator of a fan assembly

Our example has 24 blades

Primary concern is with vibration

If amplitude is too high on any one blade it may break

In effect this will destroy the engine

Rolls-Royce Trent 500 engine

Model details

24 inputs are the resonant vibration frequencies of the blades

24 outputs are the vibration amplitudes of the blades

Other factors

Amount of damping: more damping results in more complex behaviour and longer model run times

Model resolution – it’s possible to run the solver on higher or lower resolution grids

Could also vary e.g. number of blades, operating rpm and temperature

Parameter uncertainty

It’s not possible to manufacture and assemble blades to be all identical and perfectly oriented

Variation in resonant frequencies of blades creates complex variations in their vibration amplitude

Uncertainty distribution on each model input is the distribution achieved within manufacturing tolerances

Question: Given an assembly of blades sampled from this distribution, what is the risk of high-amplitude vibrations resulting?


Strategy:

Emulate single output = blade 1 amplitude

24 inputs = frequencies of blades 1 to 24

Because of rotational symmetry, each model run gives up to 24 design points

Simulate random blade assemblies
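The rotational-symmetry point in the strategy can be sketched as follows. This is a minimal illustration, not the Rolls-Royce code: `toy_simulator` is a hypothetical stand-in that merely shares the symmetry, and the frequency spread is an assumed value. Relabelling blade k+1 as "blade 1" turns one run into 24 input vectors, each paired with a single blade-1 amplitude.

```python
import numpy as np

N_BLADES = 24

def toy_simulator(freqs):
    """Hypothetical stand-in for the fan-assembly simulator.

    It is rotationally equivariant: shifting the blade frequencies round
    the disc shifts the amplitudes by the same amount.
    """
    freqs = np.asarray(freqs, dtype=float)
    left = np.roll(freqs, 1)    # frequency of blade i-1
    right = np.roll(freqs, -1)  # frequency of blade i+1
    return freqs**2 + 0.5 * left * right

def design_points_from_run(freqs, amps):
    """Turn one 24-input, 24-output run into 24 (inputs, blade-1 amplitude) pairs."""
    freqs = np.asarray(freqs, dtype=float)
    amps = np.asarray(amps, dtype=float)
    # Row k puts the frequency of blade k+1 first, so that blade plays the
    # role of "blade 1"; its amplitude is the matching single output.
    X = np.stack([np.roll(freqs, -k) for k in range(len(freqs))])
    y = amps.copy()
    return X, y

rng = np.random.default_rng(0)
# Assumed manufacturing spread about a nominal frequency of 1 (illustrative only).
freqs = 1.0 + 0.01 * rng.standard_normal(N_BLADES)
X, y = design_points_from_run(freqs, toy_simulator(freqs))
```

The points are "up to 24" because an assembly with repeated structure gives some identical rows.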

Results

Output depends most strongly on blade 1 input

Also on neighbouring inputs, 2 and 24, etc.

But high-order dependencies on all inputs

So far we’ve failed to emulate accurately even with very many design points


What’s going on here?

Can we find a way to achieve the original strategy?

Should we try instead to emulate max amplitude?

This may also be badly behaved!

Part 2: Reification

Reification – background

Kennedy & O’Hagan (2001), “Bayesian calibration of computer models”

KO’H henceforth

Goldstein & Rougier (2006), “Reified Bayesian modelling and inference for physical systems”

GR henceforth

GR discuss two problems with KO’H

1. Meaning of calibration parameters is unclear

2. Assuming stationary model discrepancy, independent of code, is inconsistent if better models are possible

Reification is their solution

Meaning of calibration parameters

The model is wrong

We need prior distributions for calibration parameters

Some may just be tuning parameters with no physical meaning

How can we assign priors to these?

Even for those that have physical meanings, the model may fit observational data better with wrong values

What does a prior mean for a parameter in a wrong model?

Example: some kind of machine

Simulator says output is proportional to input

Energy in gives work out

Proportionality parameter has physical meaning

Observations with error

Without model discrepancy, this is a simple linear model

LS estimate of slope is 0.568

But true parameter value is 0.65


x     y
1.0   0.559
1.2   0.693
1.4   0.868
1.6   0.913
1.8   1.028
2.0   1.075
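The quoted LS slope can be checked directly from the six observations listed above. This is a minimal sketch assuming the simple proportional model y = βx with no intercept; the function name is mine.

```python
# Observations from the slide: input x and observed output y.
x = [1.0, 1.2, 1.4, 1.6, 1.8, 2.0]
y = [0.559, 0.693, 0.868, 0.913, 1.028, 1.075]

def ls_slope_through_origin(x, y):
    """Least-squares slope for y = beta * x: sum(x*y) / sum(x*x)."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / sxx

beta_hat = ls_slope_through_origin(x, y)
print(f"LS slope: {beta_hat:.3f}")  # prints "LS slope: 0.568"
```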

Model discrepancy










[Figure: plot against x, with x from 1 to 2]

Red line is LS fit

Black line is simulator with true parameter 0.65

Model is wrong

In reality there are energy losses








[Figure: plot against x, with x from 1 to 5]










[Figure: plot against x, with x from 1 to 2]

Case 1

Suppose we have:

No model discrepancy term

Weak prior on slope

Then we’ll get:

Calibration close to LS value, 0.568

Quite good predictive performance in [0, 2+]

Poor estimation of physical parameter










[Figure: plot against x, with x from 1 to 2]

Case 2

Suppose we have:

No model discrepancy term

Informative prior on slope based on knowledge of physical parameter

Centred around 0.65

Then we’ll get:

Calibration between LS and prior values

Not so good predictive performance

Poor estimation of physical parameter
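A minimal sketch of why the calibrated slope lands between the LS and prior values. It assumes a normal prior on the slope and normal observation errors with known standard deviation; the prior standard deviation (0.02) and noise standard deviation (0.05) are made-up values for illustration and are not from the slides.

```python
x = [1.0, 1.2, 1.4, 1.6, 1.8, 2.0]
y = [0.559, 0.693, 0.868, 0.913, 1.028, 1.075]

def posterior_slope(x, y, prior_mean, prior_sd, noise_sd):
    """Normal posterior for beta in y = beta*x + e, with e ~ N(0, noise_sd^2)."""
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    ls_slope = sxy / sxx
    data_precision = sxx / noise_sd**2
    prior_precision = 1.0 / prior_sd**2
    post_var = 1.0 / (data_precision + prior_precision)
    # Posterior mean is a precision-weighted average of the LS slope and the prior mean.
    post_mean = post_var * (data_precision * ls_slope + prior_precision * prior_mean)
    return post_mean, post_var**0.5

# prior_sd and noise_sd are assumed values, chosen only to show the compromise.
post_mean, post_sd = posterior_slope(x, y, prior_mean=0.65, prior_sd=0.02, noise_sd=0.05)
print(f"posterior mean slope: {post_mean:.3f}")
```

Tightening the prior pulls the estimate towards 0.65 and away from the value that fits the data, which is the loss of predictive performance noted above.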










[Figure: plot against x, with x from 1 to 2]

Without model discrepancy

Calibration is just nonlinear regression

y = f(x, θ) + e

where f is the computer code

Quite good predictive performance can be achieved if there is a θ for which the model gets close to reality

Prior information based on physical meaning of θ can be misleading

Poor calibration

Poor prediction

Case 3

Suppose we have:

KO’H GP model discrepancy term with constant mean

Weak prior on mean

Weak prior on slope

Then we’ll get:

Calibration close to LS value for regression with non-zero intercept

The GP takes the intercept

Slope estimate, 0.518, is now even further from the true physical parameter value of 0.65, albeit more uncertain

Discrepancy estimate ‘corrects’ generally upwards
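The LS fit with a non-zero intercept, which the Case 3 calibration is said to approach, can be reproduced from the same six observations. This sketch is ordinary least squares only, not the GP analysis itself; the function name is mine.

```python
x = [1.0, 1.2, 1.4, 1.6, 1.8, 2.0]
y = [0.559, 0.693, 0.868, 0.913, 1.028, 1.075]

def ls_line(x, y):
    """Ordinary least squares for y = alpha + beta * x; returns (alpha, beta)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return intercept, slope

intercept, slope = ls_line(x, y)
print(f"intercept: {intercept:.4f}, slope: {slope:.4f}")
```

The slope comes out at roughly 0.52, in line with the value quoted on the slide, and the intercept is positive, which matches the discrepancy ‘correcting’ upwards.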










[Figure: plot against x, with x from 1 to 2]

Case 4

Suppose we have:

KO’H GP model discrepancy term with constant mean

Weak prior on mean

Informative prior on slope based on knowledge of physical parameter

Centred around 0.65

Then we’ll get:

Something like linear regression with informative prior on the slope

Slope estimate is a compromise and loses physical meaning

Predictive accuracy weakened










[Figure: plot against x, with x from 1 to 2]

Adding simple discrepancy

Although the GP discrepancy of KO’H is in principle flexible and nonparametric, it still fits primarily on its mean function

Prediction looks like the result of fitting the regression model with nonlinear f plus the discrepancy mean

This process does not give physical meaning to the calibrated parameters

Even with informative priors

The augmented regression model is also wrong


GR introduce a new entity, the ‘reified’ model

To reify is to attribute the status of reality

Thus, a reified simulator is one that we can treat as real, and in which the calibration parameters should take their physical values

Hence prior distributions on them can be meaningfully specified and should not distort the analysis

GR’s reified model is a kind of thought experiment

It is conceptually a model that corrects such (scientific and computational) deficiencies as we can identify in f

The GR reified model is not regarded as perfect

It still has simple additive model discrepancy as in KO’H

The discrepancy in the model is now made up of two parts

Difference between f and the reified model

For which there is substantive prior information

Discrepancy of the reified model

Independent of both models
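The two-part decomposition can be written out as below. This is a simplified reading, and the symbols are assumptions introduced here rather than GR's notation: f* denotes the reified model, δ the discrepancy of f, and δ* the discrepancy of the reified model, with e the observation error as in y = f(x, θ) + e.

```latex
\begin{align*}
y &= f(x, \theta) + \delta(x) + e, \\
\delta(x) &= \underbrace{f^{*}(x, \theta) - f(x, \theta)}_{\text{difference between } f \text{ and the reified model}}
  \;+\; \underbrace{\delta^{*}(x)}_{\text{discrepancy of the reified model}} .
\end{align*}
```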

Reification doubts

Can the reified model’s parameters be regarded as having physical meaning?

Allowing for model discrepancy between the reified model and reality makes this questionable

Do we need the reified model?

Broadly speaking, the decomposition of the original model’s discrepancy is sensible

But it amounts to no more than thinking carefully about model discrepancy and modelling it as informatively as possible

Case 5

Suppose we have:

GP model discrepancy term with mean function that reflects the acknowledged deficiency of the model in ignoring losses to friction

Informative prior on slope based on knowledge of physical parameter

Then we’ll get:

Something more like the original intention of bringing in the model discrepancy!

Slope parameter not too distorted, model correction having physical meaning, good predictive performance










[Figure: plot against x, with x from 1 to 2]


There is no substitute for thinking

Model discrepancy should be modelled as informatively as possible

Inevitably, though, the discrepancy function will to a greater or lesser extent correct for unpredicted deficiencies

Then the physical interpretations of calibration parameters can be compromised

If this is not recognised in their priors, those priors can distort the analysis

Final comments

There is much more in GR than I have dealt with here

Definitely repays careful reading

E.g. relationships between different simulators of the same reality

Their paper will appear in JSPI with discussion

This presentation is a pilot for my discussion!

Part 3: Validation

Simulators, emulators, predictors

A simulator is a model, representing some real world process

An emulator is a statistical description of a simulator

Not just a fast surrogate

Full probabilistic specification of beliefs

A predictor is a statistical description of reality

Full probabilistic specification of beliefs

Emulator + representation of relationship between simulator and reality


What can be meaningfully called validation?

Validation should have the sense of demonstrating that something is right

The simulator is inevitably wrong

There is no meaningful sense in which we can validate it

What about the emulator?

It makes statements like, “We give probability 0.9 to the output f(x) lying in the range [a, b] if the model is run with inputs x.”

This can be right in the sense that (at least) 90% of such intervals turn out to contain the true output

Validating the emulator

Strictly, we can’t demonstrate that the emulator actually is valid in that sense

The best we can do is to check that the truth on a number of new runs lies appropriately within probability bounds

And apply as many such checks as we feel we need to give reasonable confidence in the emulator’s validity

In practice, check it against as many (well-chosen) new runs as possible

Do Q-Q plots of standardised residuals and other diagnostic checks
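The checks above can be sketched as follows. This is a minimal illustration assuming the emulator reports a Gaussian predictive mean and standard deviation for each new run (a real GP emulator may give Student-t predictions instead); the numbers are made up purely to exercise the functions.

```python
from statistics import NormalDist

def standardised_residuals(truth, mean, sd):
    """(true output - predictive mean) / predictive sd, one value per new run."""
    return [(t - m) / s for t, m, s in zip(truth, mean, sd)]

def interval_coverage(truth, mean, sd, prob=0.9):
    """Fraction of new runs whose true output lies in the central `prob` interval."""
    z = NormalDist().inv_cdf(0.5 + prob / 2)
    inside = [abs(r) <= z for r in standardised_residuals(truth, mean, sd)]
    return sum(inside) / len(inside)

# Made-up values, not real simulator output.
truth = [1.02, 0.95, 1.30, 0.80, 1.10]
mean = [1.00, 1.00, 1.00, 0.85, 1.05]
sd = [0.05, 0.05, 0.10, 0.05, 0.05]

residuals = standardised_residuals(truth, mean, sd)
coverage = interval_coverage(truth, mean, sd, prob=0.9)
```

With enough new runs, coverage well below the nominal probability suggests the emulator is overconfident; the sorted residuals are what a Q-Q plot would compare against standard normal quantiles.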

Validating a predictor

The predictor is also a stochastic entity

We can validate it in the same way

Although getting enough observations of reality may be difficult

We may have to settle for the predictor not being yet shown to be invalid!

Validity, quality, adequacy

So, a predictor/emulator is valid if the truth lies appropriately within probability bounds

Could be conservative

Need severe testing tools for verification

The quality of a predictor is determined by how tight those bounds are

Refinement versus calibration

A predictor is adequate for purpose if the bounds are tight enough

If we are satisfied the predictor is valid over the relevant range we can determine adequacy

Conclusion – terminology

I would like to introduce the word ‘predictor’, alongside the already accepted ‘emulator’ and ‘simulator’

I would like the word ‘validate’ to be used in the sense I have done above

Not in the sense that Bayarri, Berger, et al. have applied it, which has more to do with fitness for purpose

And hence involves not just validity but quality

Models can have many purposes, but validity can be assessed independently of purpose
