

Climate model errors, feedbacks and forcings: a comparison of perturbed physics and multi-model ensembles

Matthew Collins • Ben B. B. Booth • B. Bhaskaran • Glen R. Harris • James M. Murphy • David M. H. Sexton • Mark J. Webb

Received: 23 September 2009 / Accepted: 27 March 2010 / Published online: 7 May 2010
© Crown Copyright 2010

Abstract Ensembles of climate model simulations are required for input into probabilistic assessments of the risk of future climate change in which uncertainties are quantified. Here we document and compare aspects of climate model ensembles from the multi-model archive and from perturbed physics ensembles generated using the third version of the Hadley Centre climate model (HadCM3). Model-error characteristics derived from time-averaged two-dimensional fields of observed climate variables indicate that the perturbed physics approach is capable of sampling a relatively wide range of different mean climate states, consistent with simple estimates of observational uncertainty and comparable to the range of mean states sampled by the multi-model ensemble. The perturbed physics approach is also capable of sampling a relatively wide range of climate forcings and climate feedbacks under enhanced levels of greenhouse gases, again comparable with the multi-model ensemble. By examining correlations between global time-averaged measures of model error and global measures of climate change feedback strengths, we conclude that there are no simple emergent relationships between climate model errors and the magnitude of future global temperature change. Algorithms for quantifying uncertainty require the use of complex multivariate metrics for constraining projections.

Keywords Ensembles · Uncertainty · Model errors · Climate feedbacks · Observational constraints

1 Introduction

Quantitative predictions of future climate change on time scales of decades to centuries are required to inform society in its endeavours both to adapt to the consequences of climate change and to put in place mitigation efforts to control it. The complexity of interacting processes in the climate system means that we must use three-dimensional numerical models that represent all those processes and feedbacks in order to make predictions that directly feed into decision making. Complex models are required to provide regional detail, details of changes in extremes and for the assessment of non-linear, rapid or abrupt climate change.

Uncertainties or errors¹ in numerical models limit the utility of projections from any individual model. Ensemble approaches have been applied in other prediction problems to increase utility by producing estimates of uncertainties in short-term predictions (e.g. Molteni et al. 2006). By first measuring the prediction uncertainties, and then tracing those uncertainties to model biases and errors, we should be better able to target research to improve models and ultimately produce better, less uncertain, climate projections. In parallel, there is a need to use information from the current generation of models to inform policy and planning now; hence there is a need to develop techniques to extract robust information from models and make credible projections.

A component of any projection system should be an

ensemble of models which sample natural variability,

forcing uncertainty and the uncertainties in the underlying

M. Collins (✉) · B. B. B. Booth · B. Bhaskaran · G. R. Harris · J. M. Murphy · D. M. H. Sexton · M. J. Webb
Met Office Hadley Centre, FitzRoy Road, Exeter EX1 3PU, UK
e-mail: matthew.collins@metoffice.gov.uk

¹ Here we use the terms "error" and "model error" to mean differences between models and the real world, as is common in numerical weather and climate modelling, rather than, e.g., coding errors or bugs that might be easily corrected.


Clim Dyn (2011) 36:1737–1766

DOI 10.1007/s00382-010-0808-0


physical (and increasingly chemical and biological) processes which drive regional and global climate change. Two approaches have been adopted in recent years. The first we term the "multi-model ensemble", sometimes called the ensemble-of-opportunity, meaning the collection of the output from the world's climate models. Recent efforts to collect such information (Meehl et al. 2007b) have produced an unprecedented array of studies that fed directly into the most recent IPCC assessment. The second ensemble technique we term the "perturbed-physics ensemble" (e.g. Murphy et al. 2004), whereby a single model structure is used and perturbations are made to uncertain physical parameters within that structure, including the potential, in some cases, to switch existing sections of code in and out.

One strength of the multi-model approach is the ability to sample a wide range of structural choices which may impact model errors, climate change feedbacks and climate forcings: widely different dynamical cores and widely different techniques for parameterising physical processes. There is a potentially large "gene pool" of possible models. Extensive coordination is required to ensure that modelling groups produce compatible experiments (the list of which is growing: e.g., Hibbard et al. 2007) and increasingly, as models become more complex, including earth-system processes and data assimilation schemes for example, modelling groups share components, potentially limiting the gene pool. Despite great efforts world-wide, the number of ensemble members produced is, at most, of the order of tens of members. Knutti et al. (2010) discuss a wide range of issues relating to multi-model ensembles.

The key strength of the perturbed physics approach is the ability to produce a large number of ensemble members in a relatively easy way. It is possible to control the experimentation and systematically explore uncertainties in processes and feedbacks. For example, it is possible to produce a set of ensemble experiments where the input forcing data (e.g. in a twentieth century simulation) is the same in each experiment, but the parameters which control, say, the climate sensitivity of the model are varied. Thus, the different sources of uncertainty can be isolated. It is also possible to explore a wide range of feedback processes in the model by "de-tuning" it, potentially revealing the impact of previously compensating errors. Such de-tuning also ameliorates the potential for double-counting when constraining models with observations (e.g. Allen et al. 2002); that is, the assigning of a relative likelihood to different model versions based on observed data that have already been used in their development.

The main motivation for this paper is to document the design and characteristics of a number of perturbed physics ensembles that have been produced as part of an extensive programme of research at the Met Office Hadley Centre to produce regional climate projections (e.g. Murphy et al. 2007, 2009), and to contrast aspects of those perturbed physics ensembles with corresponding multi-model ensembles. Such basic comparisons are important when we consider the number of approaches which use either or both types of ensemble to produce societally relevant information about climate change (see Murphy et al. 2009 and the special issue of the Philosophical Transactions of the Royal Society A; Collins 2007). In documenting studies which produce projections in terms of probability distribution functions (PDFs), it is not always possible to devote space to basic model diagnostics. This paper is intended to address this issue.

While we clearly cannot investigate in one paper all possible aspects of the many terabytes of stored model output we have access to, a number of questions and issues have driven the analysis herein:

1. What are the relative model-error characteristics of the two approaches? We might naively assume that the multi-model ensemble contains members with a wide range of different error characteristics, whereas the perturbed-physics approach produces members with very similar baseline climates and thus very similar errors. Is it possible to identify systematic and random components of model error? What is the relative partitioning of systematic and random errors between the two types of ensemble? Why, in the multi-model case, is the ensemble mean so often the "best model"?
2. We know that the perturbed physics approach is capable of producing model variants with a wide range of different feedback strengths under climate change (e.g. Webb et al. 2006; Sanderson et al. 2008). Are the ranges comparable with those found in the CMIP3 models for both equilibrium and transient climate change? What are the main drivers of uncertainties in global climate change feedbacks in the two types of ensemble?
3. The total uncertainty in global mean change under, e.g., historical forcing and future SRES scenarios is a combination of uncertainties in feedbacks and uncertainties in radiative forcings. To the extent that the latter can be estimated, what are the differences between radiative forcings in the two ensemble approaches?
4. Finally, are there clear relationships between measures of model error and the magnitudes of climate change feedbacks?
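The way feedback and forcing uncertainties combine in question 3 can be illustrated with a minimal zero-dimensional energy-balance sketch, ΔT = F/λ. The distributions below are illustrative assumptions for this sketch, not values taken from the ensembles analysed in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Equilibrium warming from a simple energy balance: dT = F / lam,
# where F is the radiative forcing (W m^-2) and lam the climate
# feedback parameter (W m^-2 K^-1). Ranges are illustrative only.
forcing = rng.normal(3.7, 0.4, n)        # forcing at doubled CO2
feedback = rng.normal(1.3, 0.35, n)      # feedback parameter
feedback = feedback[feedback > 0.3]      # exclude unphysical values
dT = forcing[: feedback.size] / feedback  # equilibrium temperature change

print(f"median dT = {np.median(dT):.2f} K, 5-95% range = "
      f"{np.percentile(dT, 5):.2f}-{np.percentile(dT, 95):.2f} K")
```

Dividing a roughly symmetric forcing distribution by an uncertain feedback parameter produces the familiar skewed upper tail in the temperature response.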

Question 4 is highly relevant when we use ensembles of climate model projections to generate predictions of climate change expressed in terms of PDFs which provide a measure of the uncertainty (or credibility) in that prediction.


We cannot simply form histograms from, or fit statistical distributions to, the output from model simulations of future change. A key stage in forming PDFs is to assign a relative likelihood to each member of the ensemble by comparing simulations of past climate and climate change with observations (e.g. Rougier 2007). If we can clearly deduce that, for example, a model with a very high climate sensitivity performs less well than a model with a lower climate sensitivity when examining a wide range of observational tests, then we have less belief in that higher sensitivity model. Formally, that model should receive a lower weight when forming a PDF from the ensemble, and for this to be the case, i.e. to be able to distinguish between different models, there should be some relationship between the predictand, say climate sensitivity, and the particular metric. This we call an observational constraint. A particular metric, or more generally a particular collection of metrics, is useful in assessing model fidelity if, and only if, there is some relationship (perhaps indirect) between that set of metrics and the prediction variable of interest.
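The weighting step described above can be sketched as follows. The ensemble, the scalar error metric and the Gaussian likelihood form are all illustrative assumptions standing in for the multivariate machinery discussed in the text:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy ensemble: each member has a climate sensitivity and a scalar
# "error" score from comparing its simulated climate with observations.
sensitivity = rng.uniform(1.5, 7.0, 50)
error = rng.normal(1.0, 0.3, 50)  # hypothetical normalised error metric

# Relative likelihood: better-performing members (smaller error) get
# larger weights; a Gaussian likelihood in the error is assumed here.
weights = np.exp(-0.5 * (error / 0.5) ** 2)
weights /= weights.sum()

# Likelihood-weighted estimate of sensitivity (cf. forming a posterior PDF)
weighted_mean = np.sum(weights * sensitivity)
print(f"unweighted mean = {sensitivity.mean():.2f}, "
      f"weighted mean = {weighted_mean:.2f}")
```

Note that if the error metric is uncorrelated with the predictand, as Sect. 5 finds for simple global measures, the weighting barely shifts the distribution: the metric then provides no observational constraint.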

Furthermore, we may seek predictions of joint PDFs of variables, e.g. future temperature and precipitation change in a particular region. A metric optimised to constrain the PDF of future regional temperature change may not be optimal in constraining the PDF of future precipitation change in that region. Likewise, an observational constraint on climate variables in one region may not provide a constraint on the variable in another, remote region.

Murphy et al. (2007, 2009) outline a particular method to produce joint PDFs of future climate change using perturbed physics ensembles and observational constraints. The perturbed physics ensembles described here, together with others documented elsewhere, are combined with a statistical emulator of the model parameter space (see e.g., Rougier et al. 2009 for an example) and a "time-scaling" technique (Harris et al. 2006) which maps equilibrium to transient responses, taking into account any errors that may arise because of a mismatch between the patterns of transient and equilibrium response. Using these tools it is possible to mimic the behaviour of HadCM3 at any choice of parameter values and allow the effective sampling of many more ensemble members than those described here. The prior predictive distributions obtained from the emulated ensemble are then constrained with observations of the time-averaged fields projected onto a truncated multivariate EOF space, and constrained with trends in various simple surface air temperature indices, to produce likelihood-weighted posterior predictive distributions. Murphy et al. (2007, 2009) go further and estimate the impact of structural uncertainty in a term called the discrepancy, which is estimated from the multi-model ensemble, to produce joint PDFs of future changes.
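The projection onto a truncated multivariate EOF space mentioned above can be sketched with a singular value decomposition; the "fields" and the truncation level below are synthetic illustrations, not the actual observational constraint:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic ensemble of flattened 2-D climate fields (members x gridpoints)
members, npts = 40, 500
fields = rng.normal(size=(members, npts))

# EOFs of the ensemble: centre the fields, then take the SVD;
# the rows of vt are the EOF spatial patterns.
anom = fields - fields.mean(axis=0)
u, s, vt = np.linalg.svd(anom, full_matrices=False)

k = 10                      # truncation: keep the k leading EOFs
eofs = vt[:k]               # (k, npts) spatial patterns
coeffs = anom @ eofs.T      # (members, k) projections onto EOF space

# Fraction of ensemble variance retained by the truncation
retained = (s[:k] ** 2).sum() / (s ** 2).sum()
print(f"variance retained by {k} EOFs: {retained:.1%}")
```

Working in the truncated coefficient space reduces each high-dimensional field to a handful of numbers per ensemble member, which is what makes the multivariate likelihood weighting tractable.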

Until now, the principal driver of such work has been the quantification of uncertainty and the production of probabilistic projections. We might also use the concept of observational constraints and relative likelihoods of different models to improve models in a more targeted way (see e.g., Jackson et al. 2008). At present we test models during their development phase using a wide variety of different metrics and diagnostics, using different observations and different experiments. If we find a model to be deficient in a particular way (e.g., if surface temperatures are too warm in summer) we devote resources to improving that particular aspect of the model. We rely on our previous experience or belief of, firstly, which variables are the most important and, secondly, how well those variables need to be simulated in order to produce the most accurate predictions. There is a danger in this approach that we might devote significant resources to improving a model in an area which is largely irrelevant for our particular prediction problem of interest. Alternatively, we may neglect a variable which is highly influential in the prediction problem. By systematically relating the errors in the model simulation of present-day and historical climate to uncertainties (errors) in our prediction variable of interest, it should be possible to produce a better priority list of which variables are most important.

The above issues are touched upon in Sect. 5 of the manuscript, but a more complete analysis, including the use of observations to produce PDFs, will be presented in future publications and is also part of an ongoing programme of research. In the recently released UK Climate Projections (Murphy et al. 2009) the rather complex statistical technique alluded to above is employed to relate model errors to future predictions. As we shall see in Sect. 5, there is no simple metric or diagnostic that provides a clear constraint on predictions of global-mean climate change. That is to say, there is no single field that a model needs to simulate perfectly in order for us to have complete confidence in a prediction from that model: a fact that has been known intuitively by modellers for some time. The list of metrics for testing models is multivariate. It is likely to be incomplete, as there are, in general, more climate variables in a model than are observed. The list is also likely to contain redundant information, in the sense that there are covariances between errors in different fields which mean that not all metrics are independent of each other. The extraction of useful information about climate change from imperfect climate models is likely to be a complex endeavour, on a par with the complexity of climate models themselves or the data-assimilation schemes used in initial-value prediction.
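The redundancy between error metrics can be illustrated by correlating metrics across a toy ensemble; the metrics, their shared component and the ensemble itself are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(6)

# Toy ensemble of 40 members with three scalar error metrics. The first
# two share a common component (e.g. both are sensitive to the same
# cloud bias), so they are correlated; the third is independent.
n = 40
common = rng.normal(size=n)
metrics = np.column_stack([
    common + 0.3 * rng.normal(size=n),   # e.g. SW cloud forcing RMSE
    common + 0.3 * rng.normal(size=n),   # e.g. LW cloud forcing RMSE
    rng.normal(size=n),                  # e.g. an unrelated field's RMSE
])

# Correlation matrix of the metrics across ensemble members:
# a high off-diagonal value means two metrics carry largely
# redundant information about model error.
corr = np.corrcoef(metrics, rowvar=False)
print(np.round(corr, 2))
```

In a real application such correlations motivate working with a reduced set of effectively independent directions (e.g. the EOF truncation used by Murphy et al.) rather than treating every metric as new information.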

Section 2 of the paper describes the ensemble experiments examined, with a particular focus on the perturbed physics ensemble experiments. Section 3 presents an analysis and comparison of model errors. In Sect. 4, feedbacks and radiative forcings are contrasted. Section 5 presents a simple analysis of the relationships between model errors and feedback strengths. Finally, Sect. 6 summarises the results of the analysis.

2 Climate model ensembles and variables

2.1 Perturbed physics ensembles

The perturbed physics approach was developed in response to the call for better quantification of uncertainties in climate projections (see e.g., Chapter 14 of the IPCC Third Assessment Report, Moore et al. 2001). The approach involves perturbing the values of uncertain parameters within a single model structure, with the choice and range for the perturbed parameters determined in discussion with colleagues involved in parameterisation development, or by surveys of the modelling literature. In some cases, different variants of physical schemes may also be switched in and out, as well as parameters in those alternative schemes being varied. Any number of experiments that are routinely performed with single models can then be produced in "ensemble mode", subject to constraints on computer time. A significant amount of perturbed physics experimentation has been done with HadCM3 and variants, starting with the work of Murphy et al. (2004) and Stainforth et al. (2005) and continuing with Piani et al. (2005), Barnett et al. (2006), Webb et al. (2006), Knutti et al. (2006), Collins et al. (2006), Harris et al. (2006), Collins et al. (2007), Sanderson et al. (2007, 2008) and Rougier et al. (2009). Other modelling centres are also investigating the approach using GCMs (e.g. Annan et al. 2005; Niehorster et al. 2006) and more simplified models (e.g. Schneider von Deimling et al. 2006), with a view both to understanding the behaviour of their models and to quantifying uncertainties in predictions. Sokolov et al. (2009) use a version of the perturbed physics approach to make a comprehensive assessment of future global-scale change, sampling uncertainties in physical, biogeochemical and economic factors.

Here we make use of perturbed physics ensembles produced using in-house supercomputer resources at the Met Office Hadley Centre. Analysis of a much larger set of perturbed experiments performed as part of the climateprediction.net project is presented in other publications (e.g. Piani et al. 2005; Knutti et al. 2006; Sanderson and Piani 2007; Sanderson et al. 2008; Frame 2009). A comparison between the smaller in-house and larger public-resource ensembles performed with the mixed-layer version of HadCM3 is presented in Rougier et al. (2009) in the context of model emulation (see below).

2.1.1 Considerations in the design of perturbed physics ensembles

Given that one of the key strengths of the perturbed-physics approach is the ability to control the design of the ensemble, a design must be produced. However, there are a number of competing factors that might influence that ensemble design:

1. To aid understanding of the results, it may be useful to perturb one model parameter at a time. However, this limits the potential for interactions between uncertainties in different processes, such as clouds and radiation for example, which we might expect to be important.
2. To reduce the risk of over-confidence in predictions, it is necessary to produce model versions with a wide range of baseline climates and climate change feedbacks. This may mean relaxing a small number of the usual strict criteria for producing models, such as the near-balance of the top-of-atmosphere energy fluxes, and may reveal errors in model variables that have previously been compensated for by the adjustment of a number of different parameters and/or the introduction of different representations of processes.
3. In contrast, given limited and expensive computer resources, it may be best to attempt to produce model versions which are somehow "good", perhaps by trying to predict and minimise a collection of simple model metrics such as the root mean squared error characteristics for time-mean climate fields. At the least, we would not want to produce a large number of model versions that we would consider, by normal standards, to be a complete waste of computer resource. The potential issue in producing such "tuned" ensembles is the possibility of double counting model errors when the ensemble is weighted to produce PDFs of climate change. Double counting may lead to over-constrained predictions and the potential for underestimating uncertainty.
4. To facilitate the building of the best emulator (e.g. Rougier et al. 2009), a statistical model which relates model parameters to outputs, it may be necessary to explore a wide range of model parameters and interactions between parameters in ways which aid the building of that emulator. Techniques such as "Latin hypercubes" (e.g. McKay 1979) may be employed, for example. While this may result in model versions which may be considered unacceptable when compared to observational data, they would be down-weighted in any posterior PDF calculation. Their job is to minimise the amount of extrapolation by the emulator outside the sampled parameter space.


5. For more complex versions of the model (e.g. using a dynamical ocean component rather than a mixed-layer, q-flux or slab component), fewer ensemble members are possible because of the extra resources required to spin up model versions and run scenario experiments.
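The Latin-hypercube sampling mentioned in point 4 can be sketched as follows. The parameter names and ranges are invented for illustration and are not the actual HadCM3 parameters:

```python
import numpy as np

def latin_hypercube(n_samples, bounds, rng):
    """Latin hypercube design: one sample per equal-probability stratum
    in each dimension, with strata randomly paired across dimensions."""
    d = len(bounds)
    # Stratified uniform draws in [0, 1): one draw per bin per dimension
    u = (rng.random((n_samples, d)) + np.arange(n_samples)[:, None]) / n_samples
    for j in range(d):                     # pair strata randomly across dims
        u[:, j] = rng.permutation(u[:, j])
    lo = np.array([b[0] for b in bounds])
    hi = np.array([b[1] for b in bounds])
    return lo + u * (hi - lo)              # rescale to the parameter ranges

rng = np.random.default_rng(3)
# Hypothetical parameter ranges (illustrative only)
bounds = [(0.5, 4.0),    # e.g. an entrainment coefficient
          (1e-4, 1e-3),  # e.g. an ice-fall speed
          (0.0, 1.0)]    # e.g. a cloud overlap fraction
design = latin_hypercube(16, bounds, rng)
print(design.shape)  # (16, 3)
```

Each parameter's range is covered evenly with only 16 runs, which is why such designs suit emulator building under tight computer-time budgets.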

No one experimental design is capable of fulfilling all the above design criteria, yet they have all, at some time, guided our work on quantifying uncertainty in the presence of limited computer resources. For this reason we choose to separate our archive of perturbed physics versions of HadCM3 into the different sub-ensembles described below. We call the model HadSM3 when referring to the version of the model with a simplified mixed-layer, q-flux or slab ocean, and use the letter S to prefix the ensemble name. In the case of the version coupled to a dynamical ocean, HadCM3, we use the prefix AO.

2.1.2 Description of perturbed physics ensembles

2.1.2.1 S-PPE-S The ensemble described in Murphy et al. (2004), in which 31 parameters and switches in the atmosphere component of the atmosphere-slab version, HadSM3, are perturbed. Perturbations are made to a single parameter at a time (as denoted by the suffix S in S-PPE-S), either to the minimum or to the maximum of the range specified in consultation with modelling experts, or on/off in the case of a switch. This results in 53 different model versions, including the standard parameter setting as defined in the standard published version of the model (Gordon et al. 2000; Pope et al. 2000), rather than the median or best-guess parameter values. In this design, if a perturbation in one physical scheme has an impact on a process or model variable that is also related to another scheme, there can be no compensation achieved by perturbing a related parameter, as might be done in the model development process. In that sense, the single-perturbation approach might be thought of as the simplest form of model "de-tuning" (Stocker 2004), in that no attempt is made to a priori maximise the model performance when compared to observations (it should be stressed that no systematic tuning of model performance was done to produce the standard parameter settings). The initial purpose of this ensemble was to provide a simple, understandable assessment of the parameter uncertainty in HadSM3. Details of all the parameters perturbed are presented in the appendix to Murphy et al. (2004) and also in Barnett et al. (2006) and Rougier et al. (2009).

2.1.2.2 S-PPE-M This ensemble also utilises the mixed-layer ocean version, HadSM3, but in this case simultaneous "multiple" (suffix M) perturbations are made to the parameters, i.e. all 31 parameters and switches from the S-PPE-S case are perturbed simultaneously. Here, there can be compensation between perturbations to physical processes. In the design of the ensemble, an attempt was made to minimise the average of the root mean squared errors of a number of time-averaged model fields while sampling a wide range of surface and atmospheric feedbacks under climate change. This "tuned" design of the ensemble was guided by deriving a linear predictor (based on the S-PPE-S ensemble) relating the 31 parameters of HadSM3 to the climate sensitivity and the Murphy et al. (2004) "Climate Prediction Index" or CPI. Further details of the experimental design are given in Webb et al. (2006), who also examine cloud-feedback processes under climate change in some detail and compare with a multi-model ensemble. In contrast to the S-PPE-S ensemble, the interactive sulphur cycle (Jones et al. 2001) is activated in all ensemble members, although no changes to sulphate emissions are employed. The ensemble contains 129 members, which includes a version with the standard parameter settings but with the interactive sulphur cycle activated.
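A linear predictor of the kind used to guide this tuned design can be sketched as an ordinary least-squares fit from parameter settings to climate sensitivity. The data below are synthetic stand-ins, not the S-PPE-S results, and the true linear relationship is built in by construction:

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic stand-in for S-PPE-S: 53 runs, 31 normalised parameters, and
# a climate sensitivity that is (by construction) roughly linear in the
# parameters plus noise.
n_runs, n_params = 53, 31
params = rng.uniform(-1, 1, (n_runs, n_params))
true_coef = rng.normal(0, 0.1, n_params)
sensitivity = 3.0 + params @ true_coef + rng.normal(0, 0.05, n_runs)

# Least-squares linear predictor: sensitivity ~ intercept + params . beta
design_mat = np.column_stack([np.ones(n_runs), params])
beta, *_ = np.linalg.lstsq(design_mat, sensitivity, rcond=None)

# Predict the sensitivity of a new, untried parameter combination
new_params = rng.uniform(-1, 1, n_params)
predicted = beta[0] + new_params @ beta[1:]
print(f"predicted sensitivity: {predicted:.2f} K")
```

Such a predictor lets candidate parameter combinations be screened for their likely sensitivity (and, analogously, their likely CPI) before any expensive model run is committed.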

A particular feature of models with mixed-layer oceans is a cooling instability that can appear during the 1×CO2 and/or 2×CO2 phase (a description of the mechanism for the instability is presented in the supplementary information of Stainforth et al. (2005)). This happens in one of the 129 members, leaving 128 members analysed here.

2.1.2.3 S-PPE-E An additional 103 HadSM3 experiments are grouped into this ensemble, using the same parameters perturbed in S-PPE-S and S-PPE-M. A small number of the experiments were performed to make initial estimates of the non-linearity of parameter combinations in Murphy et al. (2004) (see the appendix of that paper), but the majority of the members were produced to explore parts of parameter space not covered by the other HadSM3 ensembles, for use in the building of an emulator of the parameter space of the atmosphere component of the model (further details can be found in Rougier et al. (2009) and Murphy et al. (2009)). The generic function of an emulator is to map the parameters of the model onto variables of interest and, as a consequence, there is a requirement to explore parameter space without recourse to potential model validity. Thus, in contrast to the "tuned" S-PPE-M ensemble, no attempt is made to minimise root-mean-squared (RMS) errors, for example; the exploration of parameter space is the main motivation for the large majority of the members of this ensemble. The 103 are a subset of a larger ensemble in which 13 parameter combinations suffer the cooling instability described above, so are not analysed.

For each member of the mixed-layer model version ensembles, a calibration phase (from 10 to 25 years depending on decadal drift and variability) is performed, and the heat convergence from within the mixed-layer component is averaged into monthly values and kept fixed in both the 1×CO2 and 2×CO2 experiments. Unfortunately, a coding error was subsequently discovered in the S-PPE-M experiments, and also in some members of the S-PPE-E ensemble, such that the heat convergence field was specified from only 1 year of the calibration phase, rather than being averaged over many years. This has the potential impact of introducing noise into the heat convergence field, which may drive the SSTs in the 1×CO2 phase away from the seasonally varying climatology. As we shall see in later analysis, the impact is on average rather small, in particular when one contrasts errors in the SST fields in the perturbed mixed-layer experiments with those in non-flux-adjusted coupled model runs. Repeat experiments, in which heat convergence fields averaged over 10–20 years are applied to the members with the largest SST noise, show no significant differences in global-scale features such as RMS errors for non-SST-related variables, nor in the components of the atmospheric and surface feedbacks at 2×CO2. The model versions are therefore suitable for quantifying uncertainty and examining feedbacks.
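The calibration described above amounts to forming a monthly climatology of the mixed-layer heat convergence (a "q-flux") that is then held fixed. A minimal sketch, assuming a diagnosed convergence field with one value per month of the calibration period (array shapes and names are illustrative, not the HadSM3 implementation):

```python
import numpy as np

def qflux_climatology(heat_convergence):
    """Average a diagnosed mixed-layer heat convergence, shaped
    (years, 12, nlat, nlon), into a 12-month climatology to be held
    fixed in the 1xCO2 and 2xCO2 experiments. Taking a single year
    instead of the multi-year mean (the coding error noted above)
    retains year-to-year noise in the prescribed field."""
    return heat_convergence.mean(axis=0)  # -> (12, nlat, nlon)

# Illustrative: 15 calibration years on a tiny 2x3 grid
rng = np.random.default_rng(0)
qflux = qflux_climatology(rng.normal(size=(15, 12, 2, 3)))
```

The multi-year mean suppresses interannual noise by roughly the square root of the number of calibration years, which is why the single-year fields drive larger SST departures from climatology.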

2.1.2.4 AO-PPE-A This ensemble uses the fully coupled version of HadCM3 but with perturbations only to parameters in the atmosphere component (an updated version of the ensemble described in Collins et al. 2006). The standard settings and 16 combinations of parameter settings selected from the S-PPE-M ensemble are used, in order to sample a range of surface and atmosphere feedbacks under transient climate change. Members are selected based on an approximately uniform sampling of the climate sensitivity of the larger S-PPE-M ensemble, while ensuring that a wide range of different parameter settings is sampled. The choice was made by examining the table of sensitivities and parameters in S-PPE-M, rather than by using any numerical algorithm. In addition, the interactive sulphur cycle is activated as it is in the S-PPE-M ensemble but, in contrast, sulphate emissions are varied in some simulations (see Sect. 2.3). Murphy et al. (2009) also describe an ensemble with perturbations to parameters within the HadCM3 sulphur cycle; the results from this ensemble will be described elsewhere. Flux adjustments are employed in these coupled model simulations to (1) prevent model drift that would result from perturbations to the parameters that lead to top-of-atmosphere net flux imbalances, and (2) improve the credibility of the simulations in simulating regional climate change and feedbacks. The limitations of coupled modelling in the presence of flux adjustments have been discussed widely, e.g. Dijkstra and Neelin (1999). Here the similarity of baseline surface-climate states facilitates the combination of the HadSM3 and HadCM3 ensembles to produce "time-scaled" responses for a larger number of combinations of model parameters (Harris et al. 2006).

The spin-up technique is similar to that described in Collins et al. (2006), except that a less vigorous salinity relaxation is employed during the Haney-forced phase (the relaxation coefficients are those used by Tziperman et al. (1994): 30 and 120 days for temperature and salinity, respectively), which significantly alleviates the problem of SST and sea-ice biases found in the Collins et al. (2006) ensemble (Fig. 1). The 16 perturbed sets of parameter combinations are selected from the 128-member S-PPE-M, although the combinations are not the same as those shown in Table 1 of Collins et al. (2006). For historical reasons, the sea-ice scheme in HadCM3 is contained in the atmosphere component of the model, and parameters in the scheme are perturbed in line with the equivalent S-PPE-M ensemble.

2.1.2.5 AO-PPE-O The fully coupled HadCM3 is used with the standard atmosphere settings (with interactive sulphur cycle) but with perturbations to parameters and schemes in the ocean component. The ensemble extends the work of Collins et al. (2007) and Brierley et al. (2009, 2010), who provide details of the physical schemes in HadCM3 that were surveyed for parameters and switches to perturb. Briefly, parameters in the schemes which control horizontal mixing of heat and momentum, the vertical diffusivity of heat, isopycnal mixing, mixed-layer processes and water type are varied. A Latin Hypercube design is employed, which is efficient in permitting interactions between perturbations to parameters. The same spin-up technique used in the AO-PPE-A ensemble is employed to generate flux-adjustment terms. This is in contrast to the experiments described in Collins et al. (2007), where no flux adjustments were employed. In that study it was found that model drift can introduce biases in surface climate which lead to differences in atmosphere/surface feedbacks under climate change. Such biases were considered undesirable here, as we wish to isolate the impact of ocean parameter perturbations. The use of flux adjustments also facilitates comparison with the ensembles which employ a slab ocean and with the AO-PPE-A ensemble.
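A Latin Hypercube design of the kind mentioned above stratifies each parameter's range so that every stratum is sampled exactly once, while shuffling strata independently per parameter so that interactions between perturbations are covered. A minimal sketch; the parameter names and ranges below are purely illustrative, not those used for HadCM3:

```python
import numpy as np

def latin_hypercube(n_samples, bounds, rng=None):
    """Draw an n_samples x len(bounds) Latin Hypercube design:
    each parameter's range is split into n_samples equal strata,
    one value is drawn per stratum, and strata are permuted
    independently for each parameter."""
    rng = np.random.default_rng(rng)
    d = len(bounds)
    # per-parameter permutations of stratum indices, plus jitter within each stratum
    strata = rng.permuted(np.tile(np.arange(n_samples), (d, 1)), axis=1).T
    u = (strata + rng.random((n_samples, d))) / n_samples
    lo = np.array([b[0] for b in bounds])
    hi = np.array([b[1] for b in bounds])
    return lo + u * (hi - lo)

# Hypothetical ocean-parameter ranges, for illustration only:
bounds = [(100.0, 2000.0),  # horizontal diffusivity, m^2/s
          (1e-5, 1e-4),     # vertical diffusivity of heat, m^2/s
          (0.1, 1.0)]       # mixed-layer mixing coefficient
design = latin_hypercube(16, bounds, rng=0)
```

Compared with sampling one parameter at a time, every member perturbs all parameters simultaneously, which is what makes the design efficient for exploring interactions with few runs.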

2.2 Multi-model ensembles

Much has been written about the CMIP3 archive of model output and the reader is referred to Meehl et al. (2007b) for a history, and to the PCMDI web site for a constantly evolving list of papers based on the archive. Here we also augment the analysis by using archived output from the CFMIP project (e.g. Webb et al. 2006) in the case of model versions which use mixed-layer ocean formulations. We denote the multi-model ensembles used as follows, to be consistent with the notation adopted above.

2.2.1 S-MME

Different atmosphere models coupled to simple mixed-layer oceans. Model output is extracted from the CFMIP (see e.g. Webb et al. 2006) and WCRP CMIP3 databases at PCMDI (Meehl et al. 2007b). 20-year averages from the 1×CO2 and 2×CO2 experiments are used. The models used are shown in Table 1 and the ensemble consists of 16 members.

2.2.2 AO-MME

We use coupled model output from the 23 models in the WCRP CMIP3 database. Again, the models used are shown in Table 1. There is a significant overlap in model versions between the S-MME and AO-MME ensembles.

Fig. 1 Annual mean SST biases in fixed pre-industrial CO2 simulations with HadCM3 with standard parameter settings. a The non-flux-adjusted version of the model submitted to CMIP3. b The version of the model with interactive sulphur cycle and flux adjustments reported in Collins et al. (2006). c The standard version of the model with interactive sulphur cycle and adjusted Haney relaxation coefficients used in this paper in generating ensembles AO-PPE-A and AO-PPE-O. Adjusting the Haney coefficients leads to a reduction in SST biases in all coupled-model simulations

Table 1 Models used in the multi-model ensembles in this study

Model name                    Atmos-slab   Atmos-ocean
BCC-CM1                                    ×
BCCR-BCM2.0                                ×
CCSM3                         ×            ×
CGCM3.1(T47)                  ×            ×
CGCM3.1(T63)                  ×            ×
CNRM-CM3                                   ×
CSIRO-Mk3.0                   ×            ×
ECHAM5/MPI-OM                 ×            ×
ECHO-G                                     ×
FGOALS-g1.0                                ×
GFDL-CM2.0                    ×            ×
GFDL-CM2.1                                 ×
GISS-EH                                    ×
GISS-ER                       ×            ×
INGV-SXG                                   ×
INM-CM3.0                     ×            ×
IPSL-CM4                      ×            ×
MIROC3.2 (hires)              ×            ×
MIROC3.2 (medres)             ×            ×
MIROC3.2 (high sensitivity)   ×
MRI-CGCM2.3.2                 ×            ×
PCM                                        ×
UKMO-HadCM3                                ×
UKMO-HadGEM1                  ×            ×
UIUC                          ×
HadCM4                        ×

The slab-ocean version of UKMO-HadCM3 is not selected as a member of the multi-model ensemble, as that is included as a member of the perturbed-physics ensembles. However, the coupled version is included, as this version of HadCM3 is run without flux adjustments and hence may be considered to be different from the flux-adjusted perturbed-physics coupled-model standard member

There are a number of data limitations in this archive, and analysis is only performed on the subset of multi-models for which the data exist and are suitable.

2.3 Experiments and variables

The sheer volume of data prevents us from examining all variables from all experiments run using the model ensembles described above. Hence we focus on the following set of experiments and variables, because (1) there exists a common set of core experiments that can be easily and fairly compared and (2) they allow us to examine the main feedbacks and forcings under commonly used scenarios for climate change. The experiments examined are:

1. The 1× and 2×CO2 equilibrium runs in the case of all models with mixed-layer/q-flux/slab oceans. For some model experiments 1×CO2 is taken to mean pre-industrial levels, while in others it is taken to mean present day, or some other level (year 1900 in the case of the MIROC models, for example). We make no practical distinction here because differences between feedbacks dominate the response at 2×CO2, and because the applied forcing due to a doubling does not depend significantly on the chosen 1×CO2 baseline value.

2. Pre-industrial (and, in the case of some AO-MME members, present-day) control experiments with no external forcing, and experiments with a 1% per year compounded increase in CO2. We use 80 years of output from control experiments for both MME and PPE members, and 80 years of the 1% per year experiments, which, for most MME members, are taken from the experiment in which CO2 continues to increase after year 70 (the "1%to4×" experiments). For a handful of MME members this experiment was not available, and the runs in which CO2 is stabilised past the 70-year mark are employed instead (the "1%to2×" experiments). In practice, this makes little difference to the calculation of the transient climate response, the effective climate feedback parameter, and other quantities of interest.
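The year-70 landmark in these 1% per year experiments follows directly from compound growth; a one-line check:

```python
import math

# A 1% per year compounded increase multiplies CO2 by 1.01 each year,
# so the concentration doubles after ln(2)/ln(1.01) ~ 69.7 years --
# hence the convention of diagnosing the transient climate response
# around year 70 of these runs.
years_to_double = math.log(2) / math.log(1.01)
print(round(years_to_double, 1))  # prints 69.7
```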

3. Experiments forced with historical changes in radiatively important factors. For the PPE ensemble experiments, historical changes in CO2, methane and some minor greenhouse gases are used, together with changes in sulphate-aerosol emissions and variations in solar irradiance and volcanic optical depth. The origin of the anthropogenic and natural forcing is the same as that in experiments using a subsequent version of the Met Office Hadley Centre climate model (HadGEM1), and is described in Stott et al. (2006). For some of the multi-model members, both anthropogenic and natural factors are included, but for others only anthropogenic factors are used for the "20c3m" simulation (see e.g. Forster and Taylor 2006 and Sect. 4.4 later).

4. Experiments forced with future changes in anthropogenic greenhouse gases and aerosols under the SRES A1B scenario. For the AO-PPE members, the solar variability is prescribed by repeating the solar cycle of the period 1993–2003 for the years 2004–2100 of the scenario. The future volcanic forcing is set constant by holding the volcanic optical depth at the year 2000 values (close to those in the AO-PPE-A control simulations). A range of options appears to have been used in the AO-MME; see Forster and Taylor (2006) for more information on both historical and A1B forcings in the WCRP CMIP3 ensemble.

We also make use of a very long multi-century simulation of the standard un-flux-adjusted coupled version of HadCM3 with fixed concentrations of greenhouse gases. This is in order to estimate the natural variability of model error, climate-change feedback parameters and radiative forcing. While multi-century fixed-forcing experiments with other models may yield slightly different estimates of such variability, as we see below, it is common for inter-model or inter-model-version differences to dominate, so the use of output from just one multi-century model experiment is valid.

The list of variables examined is: surface air temperature (SAT), sea surface temperature (SST), average precipitation rate, net top-of-atmosphere (TOA) energy fluxes and their shortwave (SW) and longwave (LW) components, TOA cloud radiative forcing (CRF) SW and LW components, mean sea level pressure (MSLP), cloud amount, surface sensible and latent heat fluxes, surface SW and LW fluxes, and zonal mean relative humidity. The use of TOA cloud radiative forcing, rather than simply examining the clear-sky fluxes, is preferable because in regions of sea ice and land ice/snow small differences in the position of the ice edge can dominate the calculation of fields such as root-mean-squared errors. By differencing the all- and clear-sky fluxes, the relative difference in model performance in terms of the radiative effects of clouds is better captured. We use only time-averaged seasonal and annual fields so that atmosphere-slab and fully coupled simulations may be compared. The list thus represents a combination of impact-relevant variables and variables that have been shown to be linked to climate-change feedback processes. They are also the variables used by Murphy et al. (2009) in constraining PDFs of future change using the ensemble output described here.

Observational data are taken from a number of sources indicated in Table 2. Only one data set is used to calculate the model error terms, but the other fields are used to produce an order-of-magnitude estimate of observational error in the calculation, as described below. In some cases gridded observational data are derived from the same raw point information and simply use different statistical techniques to produce the gridded product. The treatment of uncertainties in observations remains a limitation of this study, as comprehensive estimates of uncertainty simply do not exist for most variables. Nevertheless, this does represent an advance on previous studies (e.g. Gleckler et al. 2008).

3 Model "Errors"

The purpose of this section is to make comparisons between the modelled and observed mean climate of the members of the different ensembles, in order to contrast the perturbed-physics and multi-model approaches. There are a number of simple and widely used metrics which may be used to quantitatively compare models with observed climate variables (e.g. Taylor 2001). It is not possible here to make a completely comprehensive comparison of all variables, with all possible observational sources, using all possible metrics. We seek rather to perform an analysis and inter-comparison of some of the main features of observed climate between the two different approaches. The analysis uses the climate variables outlined above, which are chosen (based on previous experience) on the basis of their user relevance and because of their key role in physical feedbacks under climate change.

For each of the observed climate variables considered, we interpolate both the observations and the multi-model output onto the spatial grid of the perturbed-physics ensemble. This results in the minimum number of interpolation steps because of the large number of perturbed-physics members. The global mean bias in a climate variable is defined as the area-weighted globally averaged sum of the grid-box differences between the time-averaged model climate (the 20-year 1×CO2 climate for slab models, or the 80-year pre-industrial or present-day control climate for coupled models) and the observed climate variable. The sum is calculated only on grid points at which the observed time-averaged field exists. The root-mean-squared (RMS) error, e, is calculated similarly but with the global mean bias removed before the calculation (sometimes called the centred RMS error; Taylor 2001). The same step is performed when calculating the correlation between the observed and modelled fields. These types of metrics are in the spirit of the "Taylor diagram" cited above. The use of either the pre-industrial or the present-day control run matches that chosen by the different modelling groups for the initial state of the 1%/year CO2 increase experiment. As stated above, while there are detectable differences in metrics computed from the two differently specified control runs for a single model (e.g. Reichler and Kim 2008), the generic model error tends to dominate, so there is little sensitivity in the final model comparison. We calculate the bias, RMS error and correlation for both seasonal and annual-mean fields, but present only the annual-mean values for reasons of space and because they are representative of generic errors in different models. We note, though, that in terms of constraining model predictions using observations, information from the annual cycle may be of some use (e.g. Knutti et al. 2006).

Table 2 Observational data employed in this study to assess model errors

Variable                             Observational field                    Reference
Land surface air temperature         HadCRUT3                               Brohan et al. (2006)
                                                                            Legates and Willmott (1990)
Sea surface temperature              HadISST1.1 1871–1900                   Rayner et al. (2003)
                                     (used to calibrate flux adjustment)
                                     NCDC SST                               Smith and Reynolds (2004)
                                     GISS SST                               Hansen et al. (1996)
Precipitation                        CMAP                                   Xie and Arkin (1997)
                                     GPCP                                   Adler et al. (2003)
Top-of-atmosphere radiative fluxes   ERBE                                   Harrison et al. (1990)
                                     CERES                                  Wielicki et al. (1996)
                                     ISCCP FD                               Rossow and Lacis (1990)
Mean sea level pressure              HadSLP2                                Allan and Ansell (2006)
                                     ERA40                                  Uppala et al. (2005)
Cloud amount                         ISCCP D2                               Rossow et al. (1996)
                                     HIRS                                   Wylie et al. (1994)
Surface fluxes                       SOC                                    Grist and Josey (2003)
                                     DaSilva                                Da Silva et al. (1994)
Zonal mean relative humidity         ERA40                                  Uppala et al. (2005)
                                     AIRS version 5                         Aumann et al. (2003)

The principal field used to compute model errors is listed first for each variable; the other observed fields are used to estimate observational errors
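The metrics defined above (area-weighted bias, centred RMSE with the bias removed, and pattern correlation, all computed only where observations exist) can be sketched as follows. This is a minimal illustration with cosine-of-latitude area weights; the array names are generic, not the paper's code:

```python
import numpy as np

def error_metrics(model, obs, lat):
    """Area-weighted global bias, centred RMS error and pattern
    correlation between modelled and observed 2-D fields, shaped
    (nlat, nlon). Grid points where obs is NaN are excluded."""
    w = np.cos(np.deg2rad(lat))[:, None] * np.ones_like(model)
    mask = ~np.isnan(obs)
    w = np.where(mask, w, 0.0)
    w /= w.sum()                       # weights sum to 1 over valid points
    diff = np.where(mask, model - obs, 0.0)
    bias = (w * diff).sum()
    # remove area-weighted means before RMSE and correlation ("centred")
    m = np.where(mask, model - (w * np.where(mask, model, 0.0)).sum(), 0.0)
    o = np.where(mask, obs - (w * np.where(mask, obs, 0.0)).sum(), 0.0)
    rmse = np.sqrt((w * (m - o) ** 2).sum())
    corr = (w * m * o).sum() / np.sqrt((w * m ** 2).sum() * (w * o ** 2).sum())
    return bias, rmse, corr

# Toy check: a model field offset from "observations" by a constant
lat = np.array([0.0, 0.0])
model = np.array([[1.0, 2.0], [3.0, 4.0]])
bias, rmse, corr = error_metrics(model, model - 1.0, lat)
```

A uniform offset yields a bias of 1 but zero centred RMSE and perfect correlation, which is precisely why the bias is removed before the RMSE and correlation are computed.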

In order to get an order-of-magnitude estimate of the observational error, we compute biases, RMS differences and correlations between all pairs of the observational fields listed in Table 2. The maximum bias and RMSE and the minimum correlation are then used as a crude estimate of the likely magnitude of the error in the observations. In the absence of numerical estimates of both systematic and random errors in the majority of the observational fields, this is the simplest approach. A conclusion of this study is that more comprehensive estimates of errors in observational data sets are required in order to quantify uncertainty in model projections of future climate change.

In the case of the AO-PPE-O ensemble, all bias, RMS error and correlation fields are indistinguishable from those of the standard version of the model, presumably because of the use of identical atmosphere parameters and flux adjustments, so that ensemble is not discussed extensively in what follows. The values of the error metrics for the AO-PPE-O ensemble are included in the figures for completeness.
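The pairwise estimate of observational uncertainty amounts to taking the worst-case disagreement among the data sets. A minimal sketch, using hypothetical per-pair metric values rather than real data:

```python
from itertools import combinations

def observational_spread(metrics_by_pair):
    """Crude observational-uncertainty estimate from all pairs of
    data sets: the maximum absolute bias, the maximum RMSE and the
    minimum correlation across pairs. `metrics_by_pair` maps a pair
    of data-set names to a (bias, rmse, corr) tuple."""
    biases = [abs(b) for b, _, _ in metrics_by_pair.values()]
    rmses = [r for _, r, _ in metrics_by_pair.values()]
    corrs = [c for _, _, c in metrics_by_pair.values()]
    return max(biases), max(rmses), min(corrs)

# Hypothetical disagreement between three precipitation data sets
datasets = ["CMAP", "GPCP", "REANALYSIS"]
pair_metrics = {
    ("CMAP", "GPCP"): (0.1, 0.4, 0.92),
    ("CMAP", "REANALYSIS"): (-0.2, 0.6, 0.88),
    ("GPCP", "REANALYSIS"): (0.15, 0.5, 0.90),
}
assert set(pair_metrics) == set(combinations(datasets, 2))
max_bias, max_rmse, min_corr = observational_spread(pair_metrics)
```

Because it keeps only the extremes over pairs, the estimate is deliberately conservative: it bounds, rather than characterises, the observational error.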

3.1 Errors in two-dimensional time-averaged fields

Examining first the land surface air temperature errors in the perturbed-physics model versions with slab-ocean components, we see biases and RMS errors of the order of a few degrees globally (Fig. 2). In the case of the "de-tuned" S-PPE-S ensemble, with only single parameters perturbed, land surface temperature biases are exclusively negative when compared to the HadCRUT3 observational data set, with the standard model version placed towards the end of the distribution which is closest to observations. In the case of the "tuned" S-PPE-M ensemble, there is a wider spread of biases than in the model versions with only one single parameter perturbed, and positive values are evident. A similar range of RMS errors is evident in the two ensembles, reflecting the optimisation of RMS errors in the ensemble design (see above and Webb et al. 2006). Bigger RMS errors are seen in the S-PPE-E ensemble, which explores more regions of parameter space.

In the slab-ocean multi-model ensemble, S-MME, we see a similar range of land SAT biases to that of the perturbed-physics ensembles, but a somewhat wider range of RMS errors. It is possible that the specification of different surface boundary conditions, which may impact surface air temperature in the multi-model ensemble, promotes a wider range of spatial patterns of surface air temperature. Fields such as orographic height, vegetation and soil properties are identical in each of the members of the perturbed-physics ensembles, although some surface-related processes such as the roughness length are perturbed (see the appendix to Murphy et al. 2004). We also note at this point that correlation scores are of little use when comparing land surface air temperatures in models, as they are close to unity for all model versions, being dominated by the pole-to-equator temperature gradient.

SSTs in models with mixed-layer or slab oceans are tied more closely to observations because of the calibration phase in which the implied ocean heat transports are calculated. The exception is some members of the S-PPE-M and S-PPE-E ensembles where, while part of the spread in biases and RMS errors is due to the multiple-parameter perturbations, part may also be attributed to the aforementioned error that was inadvertently introduced into the calculation of the implied heat transports. Despite this, both SST biases and RMS errors are of a similar magnitude in the slab-ocean perturbed-physics and multi-model ensembles, and are in many cases smaller than the errors seen in the non-flux-adjusted CMIP3 coupled models (AO-MME). As mentioned above, we have re-run a number of experiments where noise in the calculation of the slab-model heat flux convergence fields was present and found that this has a relatively small impact on global error characteristics and feedbacks.

Turning to the coupled model ensemble experiments, the range of biases in SST is generally smaller in both the atmosphere-parameter-perturbed (AO-PPE-A) and ocean-parameter-perturbed (AO-PPE-O) ensembles in comparison with the coupled multi-model ensemble (AO-MME). Similarly, RMS errors are smaller. This is because of the exclusive use of flux adjustments in the former, which tend to limit (but not eliminate) the formation of SST errors. Perhaps surprisingly, however, the range of land SAT biases is also smaller in the flux-adjusted coupled PPE simulations than in the multi-model case and, correspondingly, the land surface air temperature RMS errors are generally smaller than those seen in many of the non-flux-adjusted coupled multi-model members. There are reasonably large top-of-atmosphere net flux imbalances in some of the AO-PPE-A members, which might be expected to lead to large land SAT errors, but it seems that having better ocean SSTs can influence the land surface temperatures in some way, as perhaps indicated by studies of land–sea contrast which find a seemingly strong relationship under climate change (e.g. Lambert and Chiang 2007; Sutton et al. 2007; Joshi et al. 2008). It seems that flux adjustment of SSTs can also lead to better simulation of land temperatures, at least by these gross measures.

Fig. 2 Bias, centred root-mean-squared errors (RMSE) and correlations between two-dimensional time-mean modelled and observed fields. From top to bottom: land surface air temperature (SAT), sea surface temperature (SST), precipitation, net top-of-atmosphere (TOA) fluxes (positive incoming), outgoing SW fluxes, outgoing LW fluxes, outgoing SW cloud forcing, outgoing LW cloud forcing, mean sea level pressure (MSLP), cloud amount, surface sensible heat flux, surface latent heat flux, surface SW fluxes, surface LW fluxes and zonal mean relative humidity. Different ensembles (S-PPE-E, etc., see Sect. 2.1) are indicated and one dot is plotted for each ensemble member. The light blue dots show the bias, RMSE and correlation for the ensemble mean of all the models in the ensemble. The red dot indicates the experiment with standard HadCM3 parameter settings, flux adjustments and interactive sulphur cycle. The light grey shading represents an estimate of the uncertainty in observational fields (see text). The dark grey shading indicates the mean and ±2 SD of 20- or 80-year means of the value calculated from a multi-century integration of the non-flux-adjusted version of HadCM3, and hence gives an order-of-magnitude estimate of natural variability in the calculated errors

Global mean biases in precipitation in the slab-model ensembles follow a similar pattern to those in global land surface air temperature and SST in the different ensembles, except that the S-MME has a relatively wider range of biases than any of the slab-ocean perturbed-physics ensembles. The range of global precipitation biases in AO-PPE-A is smaller than the range seen in AO-MME, with the former being more consistent with the biases seen in the slab-ocean ensembles. Under climate change scenarios, model differences in changes in global mean precipitation tend to be positively correlated with differences in changes in global mean temperature across ensembles, via their correlation with lower-tropospheric water vapour (e.g. Held and Soden 2006). Here there is no simple relationship between global mean SST or global mean surface air temperature and global mean precipitation across present-day/pre-industrial simulations in either the PPE or MME equilibrium experiments, suggesting that other factors are at play. Different representations of the effects of aerosol particles are one potential candidate for explaining the lack of correlation between global mean biases in precipitation and temperature.

Looking at errors in other surface fields, we note some relatively large negative biases in mean sea level pressure (MSLP) in the coupled perturbed-physics ensembles (AO-PPE-A and AO-PPE-O). This is due to a numerical drift in atmospheric mass during the spin-up phase of those ensemble members, which was subsequently corrected during the running of the control and scenario experiments analysed here. Such a global mean bias does not impact the spatial pattern of MSLP, and thus the RMS errors in MSLP in those ensembles are small in comparison to those seen in some of the multi-model ensemble members. The spatial pattern of MSLP is a leading-order indicator of the horizontal circulation in the different models and model versions. So the absolute value can be justifiably corrected a posteriori in, for example, impacts studies, if required.

For the surface sensible heat flux, the range of both biases and RMS errors is generally smaller in the perturbed-physics ensembles in comparison with the multi-model ensembles. In the case of surface latent heat fluxes, the ranges are more comparable and are generally larger than the sensible heat flux errors.

As indicated above, relatively large ranges of biases, compared to, for example, the TOA forcing from a doubling of CO2, are evident in the models with slab-ocean components; more so in the case of the perturbed-physics ensembles (by design), but also in the case of the multi-model slab-ocean ensemble. When coupling to a slab-ocean model, a non-zero ocean heat convergence term is generally permitted and counters the effect of a non-zero TOA flux imbalance. We permit the existence of relatively large TOA imbalances in the perturbed-physics ensembles in order to explore the model parameter space more fully, and we also note that imbalances might arise because of missing or structurally deficient processes (a more complete justification and discussion is presented in Collins et al. (2006)). The largest TOA imbalances are found in the S-PPE-E ensemble, in which parts of parameter space not explored by the other ensembles are sampled. It is interesting to note that the algorithm used to pick out these additional experiments (see Rougier et al. 2009) tends to favour models with negative incoming net TOA biases. These additional experiments were largely designed to inform the building of an emulator of the parameter space of the model and should not necessarily be viewed as intended to have realistic climates in comparison to, say, the model versions in the S-PPE-M ensemble, which were designed to have small RMS errors. As we shall see later, however, it is possible to span a wide range of global climate-change feedbacks with models which are close to radiative balance at the TOA (Fig. 10, Sect. 5).

The TOA-error situation for the coupled AO ensembles is rather different. We speculate that the members of the AO-MME ensemble have been developed to produce a net TOA flux close to zero, to avoid climate drifts and the use of flux adjustments; hence the existence of relatively small global-mean biases in that ensemble. There is a strong anti-correlation (coefficient of −0.8) between global mean biases in SW TOA fluxes and global mean biases in LW TOA fluxes, which seems to limit biases in the net TOA flux in the coupled multi-model ensemble. With the exception of the INM-CM3.0 model, this strong anti-correlation is also seen in the equivalent slab-ocean versions of the multi-model ensemble. No such anti-correlation is evident in the perturbed-physics AO-PPE-A, and none would have been expected, because of the design of the ensemble.

The picture for the TOA SW fluxes is similar to that of the net TOA, with a larger range of biases in the models coupled to slab oceans, and in particular some large biases in some of the model versions in the S-PPE-E ensemble. A smaller range of biases is seen in the coupled AO models, with the AO-MME showing the smallest range. The LW situation is slightly different however. The S-MME, S-PPE-S and S-PPE-M ensembles show a more similar range of smaller biases than in the SW case and, in particular, the AO-MME and AO-PPE-A ensembles have a similar and smaller range (note the difference in scale on the x-axis in the panels of Fig. 2). It seems that the SW biases dominate the total spread in TOA biases in the slab-ocean perturbed physics sub-ensembles; absolute correlation coefficients are greater than 0.9 (i.e. less than -0.9) between net TOA and SW biases for all slab-model perturbed physics ensembles. Isolating the components of the fluxes associated with the cloud radiative forcing (labelled SW CF in Fig. 2), we find that this is the main driver of the spread in total TOA SW biases in the PPE ensembles. Indeed, it appears that cloud forcing biases and RMS errors are of a similar magnitude to the biases and RMS errors in total fluxes, indicating a major role for clouds in determining model errors in energy fluxes. We see global mean biases and RMS errors of the order of 10% and greater in the fractional cloud amount in both perturbed physics and

M. Collins et al.: Climate model errors, feedbacks and forcings: a comparison of perturbed physics 1749


multi-model ensembles and we return to this point later (Sect. 5, Fig. 10). We also note here that the picture for SW and LW TOA flux errors is reflected at the surface, with very similar patterns of biases and RMS errors for the different ensembles.

Finally we examine errors in the zonal mean relative humidity fields in the ensemble experiments. Here we see a relatively small range of biases in all perturbed physics ensembles in comparison to the relatively wide range seen in the multi-model slab ocean and coupled ensembles. The ranges of RMS errors between the PPE and MME ensembles are more comparable, however.

A motivation for the quantification and comparison of model errors in this way is to use the information to assign relative levels of credibility to different members of the different ensembles, with the ultimate aim of producing probabilistic projections. It is clear from Fig. 2 that, for the majority of variables, the bias and RMS errors in the models are bigger than the crude estimate of uncertainty that can be attached to the observational data sets, and much bigger than that which can be attributed to natural variability. Nevertheless, differences between observational data sets are large enough for some variables to make it difficult to distinguish between different models or different model versions, which would, in turn, make it difficult to assign a relative likelihood. For example, in the case of total outgoing SW flux biases, it is clear that a model with a bias of 40 W m-2 is inconsistent with the observations, but what can we say about two models, one of which has a bias of -5 W m-2 and the other of which has a bias of +5 W m-2? Both are, in some sense, consistent with the uncertainty in outgoing SW flux measures. Such uncertainty presents a considerable challenge when both developing models and constraining ensembles of models with observations. We discuss this issue further in Sect. 5.

It is clear from Fig. 2 that, using the perturbed physics approach, it is possible to sample model versions in which there are biases and root mean squared errors in mean climate fields which are comparable to those found in the multi-model ensembles. The key point is that, in the case of the perturbed physics ensembles, it is possible to have some control over the error characteristics of the ensemble members one produces. In the next section we discuss some further characteristics of model errors in the two types of ensemble.

3.2 Similarity of model errors

The relative similarity of model errors is of interest as in many multi-model exercises it is often observed that the ensemble-mean forecasts or ensemble-mean climatologies have greater skill/fidelity than those forecasts or climatologies produced with any individual model (e.g. Lambert and Boer 2001; Hagedorn et al. 2005; Reichler and Kim 2008; Gleckler et al. 2008). It can be seen in Fig. 2 (blue dots) that, for the multi-model ensembles examined here, the ensemble-mean RMS error is in many cases smaller than the smallest RMS error of any individual model of the ensemble. For the perturbed physics ensembles, this is not always the case. We might suspect that the multi-model approach would be characterised by a wide spread of spatial distributions of model errors, imprinted by the different structural approaches, whereas the perturbed physics approach would be characterised by very similar spatial distributions of errors related to the single model structure.

To investigate this we can make use of a simple spatial correlation measure to look at the differences between spatial patterns of errors in different models (e.g. Jun et al. 2008). Spatial difference fields (i.e. model two-dimensional mean fields minus observations) do not tend to be dominated by the large latitudinal gradients which produce the near-unity correlation scores evident for many fields in Fig. 2; hence they are of use here. Figure 3 shows

frequency distributions of intra-ensemble error correlations for the ensembles and variables considered previously. In each case, an n by n matrix of the correlation between the spatial errors of all pairs of ensemble members is computed for the n-member ensemble in question. The histograms show the relative occurrence of values of those correlation coefficients in different bins of width 0.1, computed over the lower triangle of the matrix and excluding the unit values on the diagonal. The mean and spread of the histogram provide information on the similarity of spatial error patterns within the ensemble.
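The diagnostic just described can be sketched as follows. This is a minimal illustration, not the authors' code; the array names are ours, and the error fields are assumed to be pre-processed (common grid, area weighting already applied).

```python
import numpy as np

def error_correlation_histogram(errors, bin_width=0.1):
    """Correlate spatial error patterns between all pairs of ensemble members.

    `errors` is an (n_members, n_gridpoints) array of model-minus-observation
    difference fields.  Returns the lower-triangle correlation coefficients
    and a histogram using the 0.1-wide bins described in the text.
    """
    corr = np.corrcoef(errors)                      # n-by-n correlation matrix between members
    pairs = corr[np.tril_indices_from(corr, k=-1)]  # lower triangle, unit diagonal excluded
    counts, edges = np.histogram(pairs, bins=np.arange(-1.0, 1.0 + bin_width, bin_width))
    return pairs, counts, edges
```

An n-member ensemble yields n(n-1)/2 pairwise correlations; the mean and spread of their histogram then characterise how similar the error patterns are across the ensemble.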

We can produce a similar diagnostic for non-overlapping sections of the multi-millennial control run of HadCM3 to test for the effects of natural variability. For all the variables considered in Fig. 3, correlations between spatial errors in different sections of that long run always lie in the bin 0.9–1.0 (this is the case for both 80-year and 20-year averages), hence the structure of the intra-ensemble error similarity in Fig. 3 may be interpreted as reflecting real differences in spatial error structures between ensemble members.

For the multi-model ensembles with slab-ocean components and with dynamic oceans (S-MME and AO-MME), there are relatively wide distributions of spatial patterns of model error, with error correlations distributed around an average correlation of approximately 0.5 for most variables. There is little evidence of very low or negative correlations, which suggests that, on a global scale, models share some commonality of error patterns although regionally errors can be of a different sign. Although observational data sets can suffer from global and regional biases and random errors, repeating Fig. 3 by selecting a


random observational data set (as described in the above section) for each element of the correlation matrix produces no qualitative change to the figure. This leads us to conclude that common errors in models do not arise because of differences in observational data sets, unless it is the case that all the observational data sets considered share the same regional biases. We reiterate, though, that the treatment of observational error by the simple sampling of available data sets is rather crude.

In the case of the perturbed physics ensembles, there is a tendency for distributions of spatial correlations to be skewed more towards unity, i.e. more similar spatial patterns of error across the ensemble than in the case of the multi-model ensembles. However, it is not universally the case that the perturbed physics approach results in a distribution of near-identical patterns of model errors using this measure. In the case of the S-PPE-E, and to a certain extent the S-PPE-M, the distributions of spatial correlations

Fig. 3 Distributions of intra-ensemble spatial correlations of annual mean model errors for the climate variables and ensembles considered. The name of each ensemble is indicated above the individual panels and the variables are indicated on the abscissa. The bin size is 0.1 of correlation coefficient in each case


are more like those computed from the multi-model ensembles, with a larger spread of correlations and average correlations much less than unity. This suggests that the perturbed physics approach can be used to sample a relatively wide range of different baseline climates. Nevertheless, there is clearly an ''imprint'' of the baseline model on the spatial patterns of errors.

A further diagnostic of spatial error characteristics can be derived from the root mean squared error statistics of the two-dimensional fields. If we let e2 be the total mean squared error (MSE) in a single climate field in an ensemble member (with the global mean bias removed as above), we may decompose this total error into a systematic component, es2, a random component, er2, and a component due to natural variability, en2:

e2 = es2 + er2 + en2.

The systematic component, es2, is defined as the mean squared error of the ensemble average of that particular two-dimensional variable. In this case, the term ''systematic'' refers to the error which is common to all models in the ensemble. The component due to natural variability, en2, is that due to taking different 20-year or 80-year averages from a long unforced control integration: by examining the long HadCM3 control experiment, we find this component to be small and hence it is possible to neglect it. The random component of mean squared error, er2, is that associated with drawing a particular model from the underlying distribution of all models in that particular ensemble type. The concept of an underlying distribution of models is simpler to imagine in the case of the perturbed physics ensembles; it is the space of all possible parameter settings of HadCM3. In the case of the multi-model ensembles it is perhaps harder to define, but we persist with the analogy in order to interpret the error characteristics of the ensembles.
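With the natural-variability term neglected, the decomposition can be sketched numerically as follows. The function and array names are illustrative; this is a sketch of the calculation described in the text, not the authors' code.

```python
import numpy as np

def decompose_mse(fields, obs):
    """Split ensemble-average MSE into systematic and random components.

    `fields`: (n_members, n_gridpoints) model climatologies;
    `obs`: (n_gridpoints,) observed climatology.  The global-mean bias is
    removed from each member's error field first, as in the text, and the
    natural-variability component en2 is assumed negligible.
    """
    err = fields - obs                            # per-member error fields
    err -= err.mean(axis=1, keepdims=True)        # remove each global-mean bias
    total = (err ** 2).mean()                     # e2, averaged over members
    systematic = (err.mean(axis=0) ** 2).mean()   # es2: MSE of the ensemble mean
    random_part = total - systematic              # er2 as the (non-negative) residual
    return total, systematic, random_part
```

Since the cross terms vanish when averaging over members, the residual equals the across-member variance of the error fields, so `random_part` is non-negative by construction.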

Figure 4 shows the average of the relative contributions of systematic and random errors as a fraction of the total error computed from the multi-model and perturbed physics ensembles. For the multi-model ensembles, approximately half or less of the total error is explained by the systematic component. For these ensembles, the mean squared error of the ensemble mean is smaller than the smallest mean squared error of the ensemble members and the ensemble mean is the ''best'' model. The general situation for the perturbed physics ensembles is that more of the total error is contributed from the systematic component than the random component. The spatial patterns of the errors in each member are more similar, as is seen in Fig. 3. However, for the S-PPE-E ensemble, there is much more of the character of the multi-model partitioning of systematic and random components of error. That is, there is a greater sampling of random model versions in which patterns of error do not resemble each other closely. For this sub-ensemble, the distribution of error correlations is more like that seen in the multi-model ensembles and the ensemble-mean root mean squared error for a number of different climate fields is close to the minimum of the error found in any individual member (Fig. 2).

Again, it is perhaps possible that Fig. 4 may be altered substantially by the existence of regional errors in observational data sets. While small differences are evident on choosing different data sets for computing the figure, the qualitative picture is not altered. Neither is it altered if we average the ensemble mean squared errors over all the different observational data sets considered and sample a random observational data set to compute the random component of error.

Some of the members of the S-PPE-E ensemble have rather large TOA flux imbalances. Nevertheless, this partitioning of more equal contributions from systematic and random components holds even if the S-PPE-E ensemble is restricted to ensemble members which are within 5 W m-2 of TOA balance (indicated as S-PPE-Bal in Fig. 4). Indeed, the partitioning of systematic and random errors for this collection of 43 experiments is much closer to that seen in the multi-model case. This suggests that it is possible, in some sense, to mimic the behaviour of the multi-model ensemble, i.e. having a greater proportion of random as opposed to systematic errors and having the ensemble mean be the ''best'' model, if this was thought to be an important aspect of the ensemble design.

4 Feedbacks and forcings

4.1 Surface-atmosphere feedbacks

A number of frameworks exist for computing climate change feedbacks and their components. For example, Soden and Held (2006) analyse the components of feedbacks in the CMIP3 models using a technique which allows the separation into water vapour, lapse rate, cloud and albedo components. While computationally more tractable than the full radiative perturbation method (e.g. Colman 2003), such methods require a significant amount of processing of three-dimensional fields of model output and the use of a specific off-line radiation code.

Given the large number of ensemble members examined here, and the desire to examine feedbacks in the maximum number of multi-model ensemble members, we adopt the simplest, linear, approach to feedback analysis (e.g. Cess et al. 1990; Boer and Yu 2003). This approach splits the total feedback parameter (in W m-2 K-1) into components from the clear-sky and non-clear-sky regions (hereafter


cloud feedback or cloud radiative forcing (CRF) feedback) and further into SW and LW components respectively. The advantage of using this method is that only a handful of surface and TOA fields are required for each ensemble member and there is no dependence on, for example, the choice of off-line radiation scheme. Furthermore, the method is easily applicable both to equilibrium atmosphere-slab-ocean 2×CO2 model experiments and to the 1% per year CO2 increase transient model experiments run with the coupled atmosphere–ocean models; in the latter case the ''effective'' feedback parameters (Murphy 1995) may be calculated. Nevertheless, there are potential issues to consider when using the simple approach that are well documented (Zhang et al. 1994; Colman 2003; Soden et al. 2004). These problems are alleviated in a related publication (Yokohata et al. 2010) by adopting the Taylor et al. (2007) approach in comparing the equilibrium feedbacks in these HadCM3 ensembles with those in a similar perturbed physics ensemble performed with the MIROC3.2 model.
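In outline, the simple linear framework amounts to the following. This is a hedged sketch only: the 2×CO2 forcing values are those of Myhre et al. (1998) quoted later in the text, the function and variable names are illustrative, and attributing the full forcing to the clear-sky term inherits the well-documented approximations of the method noted above.

```python
# LW and SW components of the assumed 2xCO2 radiative forcing, W m-2
F2X_LW, F2X_SW = 3.85, -0.15

def effective_feedbacks(dT, dN, dN_clear):
    """dT: global-mean warming (K); dN and dN_clear: all-sky and clear-sky
    net downward TOA flux anomalies (W m-2), e.g. 20-year means centred on
    the time of CO2 doubling."""
    F = F2X_LW + F2X_SW
    lam_total = (F - dN) / dT        # total (effective) feedback parameter, W m-2 K-1
    lam_clear = (F - dN_clear) / dT  # clear-sky component (approximate)
    lam_crf = lam_total - lam_clear  # CRF (cloud) component, can take either sign
    sensitivity = F / lam_total      # (effective) climate sensitivity, K
    return lam_total, lam_clear, lam_crf, sensitivity
```

For an equilibrium slab-ocean experiment the TOA imbalance is approximately zero, so the total feedback reduces to F/dT and the sensitivity to dT itself.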

Gregory et al. (2004) and Forster and Taylor (2006) adopt a time-regression technique to calculate the feedback parameter and its components in the case of the transient experiments. In contrast, Raper et al. (2002) use 20-year average model fields centred at the time of CO2 doubling in the 1% per year scenario. Here both techniques were tested and no discernible differences were found between the approaches, with high correlations (>0.9) across ensembles between feedback components calculated in both ways. We adopt the latter (20-year average) approach, which also retains consistency with the analysis of feedbacks in the equilibrium experiments, which also employs 20-year averages. Thus, the natural variability of the estimates in the two ensembles should be similar (notwithstanding the slightly larger natural variability that is likely in models with a full dynamical ocean).

For the estimate of the radiative forcing of doubled CO2 we use the values tabulated in Table 10.2 of Meehl et al. (2007a) for the multi-model ensemble members. In cases

Fig. 4 The relative contribution of systematic (black bar) and random (white bar) mean squared errors to the total error, averaged over the ensembles indicated on the figure, for the different climate variables considered. S-PPE-Bal indicates an ensemble of all those perturbed-physics slab models for which the net TOA flux is within 5 W m-2 of balance. See Sect. 3.2 for more details


where there is incomplete information on the radiative forcing for a doubling of CO2, including all the perturbed physics members, we adopt the same estimate of 3.85 W m-2 in the LW and -0.15 W m-2 in the SW (Myhre et al. 1998). For the perturbed physics ensembles, the same radiation code is employed in each member and no perturbations were made to parameters which might directly affect the radiative forcing from CO2 (although as we see later, the perturbations do seem to affect the forcing from other agents). Hence we use the standard value of the HadCM3 double-CO2 forcing in the perturbed physics case, both for simplicity and for the practical reason of not being in possession of the radiative forcing data for the ensemble members.

While some studies (e.g. Senior and Mitchell 2000) have shown that the climate feedback parameter and its components may have a time-dependency in the transient climate change case, we find, in agreement with the work of Forster and Taylor (2006), no significant variations within a single member over the course of the 80-year 1% per year CO2 increase transient experiments examined. That is not to say, however, that there might not be time-dependence in either multi-model or perturbed physics ensemble members at higher levels of radiative forcing or after significant

further climate change.

Figure 5 shows the analysis of the climate feedback parameter and its components for all the ensembles considered. The range of total feedback parameter in the slab-ocean multi-model ensemble corresponds to a range of 2.0–6.3 K in equilibrium climate sensitivity, with the upper bound being a version of MIROC3.2 included in the CFMIP ensemble because of its known high sensitivity and hence because of its usefulness in examining a range of feedback processes. The range of effective climate sensitivity in the AO-MME ensemble is slightly wider at 1.6–7.0 K, the lowest sensitivity model being the BCCR-BCM2.0 (with no slab-ocean version available) and the highest being MIROC3.2hires, which is not the same version as the highest sensitivity model included in the CFMIP ensemble. The behaviour of the latter is documented in Yokohata et al. (2008).

As shown in other studies (e.g. Stainforth et al. 2005; Piani et al. 2005; Webb et al. 2006), the perturbed physics approach is capable of exploring a range of global climate feedbacks of a similar order of magnitude to those found in the multi-model case. In the case of the S-PPE-E ensemble, the range of feedbacks is somewhat wider than either of the multi-model ranges, spanning climate sensitivities from 1.6 to 7.9 K. Other perturbed-physics studies with HadCM3 (Stainforth et al. 2005; Piani et al. 2005) see inferred climate sensitivities ranging from approximately 2 K up to greater than 10 K. These sensitivities (determined from an exponential fit to a 15-year slab-model experiment rather than an experiment integrated to equilibrium) arise because of differences in the parameter values chosen for the ensemble design in those studies; notably, and as pointed out in Stainforth et al. (2005), because of the use of the lowest value of the entrainment rate parameter in many members (see also Sanderson et al. 2008). For the S-PPE-M ensemble examined here, there are two high climate sensitivity members (6.7 and 7.1 K) for which the entrainment coefficient is not set close to its minimum value but closer to the standard value, indicating that it is not essential to set that particular parameter low to produce a high sensitivity version of HadCM3. The range of total feedback parameter values in the AO-PPE-A ensemble is similar to that in the S-PPE-M ensemble, a feature of the ensemble design (see above and Webb et al. 2006). As in the case of the comparison of bias and RMS errors, the feedbacks in the AO-PPE-O ensemble are all very similar; there is little impact of perturbing ocean parameters on global surface and atmospheric feedbacks.

Splitting first the total feedback parameter into components from clear-sky and CRF areas, it is clear that the range of the latter is larger than that of the former. The clear-sky feedback is exclusively negative, being dominated (we may assume) by the negative black-body feedback, offset partially by positive water vapour/lapse rate feedbacks (clear-sky LW) and by some positive feedback (clear-sky SW) from sea-ice and snow albedo processes. Cloud feedbacks (using this method) can take either sign and are furthermore composed of SW and LW components of either sign.

4.2 Drivers of uncertainties in feedbacks

Correlations between feedback components and the total feedback parameter provide a simple way of determining the leading-order driver of temperature response uncertainties in the ensembles (Fig. 6). It is evident that CRF feedbacks are, as reported elsewhere (e.g. Webb et al. 2006), the major drivers of spread in the total feedback parameters. The correlation between the total and cloud feedback parameter is 0.8 in the case of the AO-MME ensemble, 0.9 in the case of the S-MME ensemble, 0.8 in the case of the S-PPE-S and greater than 0.9 for the other perturbed physics ensembles. In terms of the SW and LW components, in the case of both multi-model ensembles it is the SW component of the CRF feedback which is most strongly correlated with the total: correlation coefficients of 0.7 and 0.8 for the coupled-model and slab-model ensembles respectively. In the perturbed physics cases, correlations between the SW and total feedback parameters are positive, but more modest. Stronger correlations are found between the LW component of the cloud feedback and the total feedback parameter in the PPEs (correlation


coefficients of 0.8 in each separate ensemble). This is in contrast to the MMEs, where there is almost zero correlation between the LW components and the total feedback parameter.

These results are confirmed by Yokohata et al. (2010), in which the variance of the feedback parameter in the HadCM3 PPEs is found to be explained by variations in both SW and LW cloud feedbacks. Because the multi-model ensembles have only a few members, sampling issues might affect this conclusion. By sub-sampling the perturbed physics ensembles it is possible to find small sub-ensembles that behave like the multi-model ensemble; that is, having a high correlation (>0.8) of the SW cloud feedback parameter with the total feedback parameter, while having a low correlation (<0.1) of the LW cloud feedback parameter with the total. The frequency of occurrence of

Fig. 5 Global atmospheric and surface climate feedback parameters in W m-2 K-1 and (effective) climate sensitivity computed at the time of CO2 doubling, or at 2×CO2 equilibrium, for the ensembles as indicated on the panels. A circle is plotted for each member and the width of the grey shading is an estimate of uncertainty due to natural variability in the calculation, as estimated from the long HadCM3 control experiment. Top panel: the total feedback parameter and (effective) climate sensitivity; next panels: the decomposition of the total into clear sky and CRF components; next panel: the decomposition of the clear sky feedbacks into SW and LW components; next panel: the decomposition of the cloudy sky feedbacks into SW and LW components; and bottom panel: the decomposition of the total into SW and LW. Each panel is drawn on the same scale for comparison


Fig. 6 Scatter plots of total feedback parameter (x-axis) against components of the total feedback parameter (y-axis) used to investigate the drivers of uncertainty in total feedbacks. The name of the ensemble is indicated in the title of each plot and the correlation coefficient is also quoted. The ordinate variable in each row is different and is indicated by the title on the y-axis


these sub-ensembles is very small, however, and we assess the chance of randomly generating MME-like behaviour given only a small perturbed physics ensemble to be less than 1%. Yokohata et al. (2010) explain the apparent importance of LW cloud feedbacks by splitting the feedbacks into the classes defined in Webb et al. (2006). They find that the classes with substantial LW cloud feedback (associated with high cloud) contribute little to the total feedback because of opposing SW cloud feedbacks. This leads to the conclusion that, as in the multi-model ensemble, it is the SW component of the cloud feedback (associated with low cloud regions) which is the principal driver of uncertainty in the case of the perturbed physics ensembles examined here.

4.3 Ocean feedbacks

In the case of the transient experiments, ocean feedbacks are also important in determining the rate and magnitude of climate change. There are various ways of measuring the efficiency of the ocean in taking up heat and we compare four related measures here. The k parameter or ocean heat uptake efficiency (e.g. Raper et al. 2002) has the same units as the atmosphere-surface feedback parameters discussed above, W m-2 K-1, and can be thought of as the equivalent ''ocean feedback parameter'' which measures the rate at which heat is removed per unit degree of warming. k generally has a time-dependence in, for example, a 1% per year CO2 increase experiment, but by measuring it at the same time point in each ensemble member, a comparison is possible. Alternatively, we also examine the effective heat capacity of the ocean in J K-1 m-2, which may be translated into an effective ocean depth. A further measure may be obtained by fitting the output of each ensemble member to a simplified upwelling-diffusion energy balance model (Huntingford and Cox 2000) and determining the ocean thermal diffusivity that best matches the member. This is similar to the approach of Forest et al. (2006) although we note that, because the simplified model used here (cited above) is different to that used by Forest et al. (2006), the estimates of the diffusivity are not numerically comparable.
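The first two measures can be sketched as follows. This is an illustrative sketch under our own assumptions (nominal seawater constants, invented variable names), not the authors' diagnostic code.

```python
import numpy as np

# Nominal seawater properties used to convert heat capacity to depth
RHO_SEAWATER = 1025.0   # density, kg m-3 (assumed)
CP_SEAWATER = 3985.0    # specific heat capacity, J kg-1 K-1 (assumed)

def heat_uptake_measures(years, net_toa, dT):
    """years: time axis (years); net_toa: global-mean net downward TOA flux
    anomaly (W m-2), taken here as the heat flux into the ocean; dT: warming
    time series (K).  Measures are evaluated at the final time point, e.g.
    the time of CO2 doubling."""
    kappa = net_toa[-1] / dT[-1]                 # heat uptake efficiency, W m-2 K-1
    seconds = years * 365.25 * 24.0 * 3600.0
    heat = np.trapz(net_toa, seconds)            # accumulated heat, J m-2
    capacity = heat / dT[-1]                     # effective heat capacity, J K-1 m-2
    depth = capacity / (RHO_SEAWATER * CP_SEAWATER)  # effective ocean depth, m
    return kappa, capacity, depth
```

The effective depth is simply the heat capacity divided by the volumetric heat capacity of seawater, which is why the two right-hand panels of Fig. 7 carry the same information in different units.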

These measures are contrasted in Fig. 7 for the three coupled model ensembles considered.

Here we do see a significant difference between the behaviour of the perturbed physics and multi-model ensembles. In the AO-PPE-A ensemble with perturbations to atmosphere parameters, we see little ensemble spread in these measures of ocean heat uptake in comparison with the spread seen in the AO-MME case. This might have been expected as each member of the AO-PPE-A ensemble employs an identical ocean component. However, in the case of the AO-PPE-O ensemble, with identical HadCM3 atmosphere components but perturbations to parameters in the ocean model, there is a similarly small spread. Collins et al. (2007) performed a smaller number of HadCM3 experiments with perturbations to parameters controlling three vertical heat transport processes. They found only small variations in the rate of transient warming in these un-flux-adjusted experiments. Only marginally significant variations were found, associated with both changes in ocean heat uptake efficiency and atmosphere and surface feedbacks associated with climate drifts that arise because of the lack of flux adjustment in those experiments. In the AO-PPE-A and AO-PPE-O ensembles, flux adjustments are employed to limit such drifts. It appears that the multiple ocean-component parameter perturbations made here [perturbing more parameters than was done in Collins et al. (2007)] do not affect the rate of ocean heat uptake significantly. Nor do they affect the mean surface climate fields, as can be seen from Fig. 2.

Brierley et al. (2009) examine the Collins et al. (2007) un-flux-adjusted experiments in more detail. They found firstly that there is only a small impact of those limited perturbations on the total heat uptake, the variations being an order of magnitude smaller than the ensemble average heat uptake. Furthermore, they found an interesting form of compensation in those experiments, such that when a single ocean process is perturbed, direct changes in the heat uptake associated with that perturbation are often balanced by an indirect change in heat uptake from another process (see e.g. Fig. 4 of Brierley et al. 2010). We may

Fig. 7 Measures of the rate of ocean heat uptake in the three ensembles indicated. Top left panel: the k parameter or ocean heat uptake efficiency (W m-2 K-1); top right panel: the effective ocean heat capacity (J m-2 K-1); bottom left panel: the effective ocean depth (m) computed from that heat capacity; bottom right: the heat diffusivity (W m-1 K-1) computed by fitting a simplified energy balance model. A filled circle is plotted for each ensemble member. Forest et al. (2006) present their fitted heat diffusivities in units of the square root of cm2 s-1. 300 W m-1 K-1 corresponds to 0.86 (cm2 s-1)1/2, 600 W m-1 K-1 is 1.2 (cm2 s-1)1/2, 900 W m-1 K-1 is 1.5 (cm2 s-1)1/2 and 1,200 W m-1 K-1 is 1.7 (cm2 s-1)1/2 for comparison with their study. This makes the estimates here somewhat lower than the estimates presented in Forest et al. (2006) and Stott and Forest (2007)


conjecture that a similar compensation is happening here. Furthermore, for the scenario experiments (1% per year CO2 increase and SRES A1B) the linear nature of the radiative forcing increase means that the ocean plays only a relatively minor role in determining the rate of climate change. Contrast, for example, the relative magnitude of the ocean heat uptake efficiency, κ, in Fig. 7 and the total feedback parameter in Fig. 5. In scenarios where forcing is stabilised, it may be possible to see more of an impact of the perturbations. Limited experiments have been performed with the AO-PPE-O ensemble, and the standard deviation of 20-year averaged global mean temperature anomalies after 60 years of 2×CO2 stabilisation is 0.14 K, in comparison with 0.06 K for the standard deviation of the 20-year averaged TCR computed at the time of CO2 doubling. This represents only a modest increase in spread. Either we have not perturbed the most appropriate parameters in the model, despite an extensive effort to consult our ocean-modelling colleagues, or the heat uptake in this particular ocean component is rather robust to changes in parameters under the forcing scenarios examined. Perhaps more "structural" changes are required. We leave these questions to further research.
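The bulk heat-uptake measures plotted in Fig. 7 can be illustrated with a short sketch. This is not the paper's code: the function name, the use of the net TOA flux as a proxy for ocean heat uptake, and the linear-regression estimate of the uptake efficiency κ are simplifying assumptions for illustration.

```python
# Hedged sketch of simple ocean-heat-uptake diagnostics like those in
# Fig. 7, assuming heat uptake scales linearly with surface warming
# (N ~ kappa * dT). Function and variable names are illustrative.
import numpy as np

RHO_CP = 4.1e6  # approximate volumetric heat capacity of sea water, J m-3 K-1

def heat_uptake_diagnostics(n_toa, dT, seconds_per_year=3.156e7):
    """n_toa: annual global-mean net downward TOA flux anomaly (W m-2);
    dT: annual global-mean surface air temperature anomaly (K)."""
    n_toa, dT = np.asarray(n_toa, dtype=float), np.asarray(dT, dtype=float)
    # Ocean heat uptake efficiency kappa (W m-2 K-1): slope of N against dT.
    kappa = np.polyfit(dT, n_toa, 1)[0]
    # Effective heat capacity (J m-2 K-1): accumulated heat per unit warming.
    c_eff = np.cumsum(n_toa)[-1] * seconds_per_year / dT[-1]
    # Effective ocean depth (m) implied by that heat capacity.
    depth = c_eff / RHO_CP
    return kappa, c_eff, depth
```

Applied to a transient 1% per year CO2 run, kappa here plays the role of the heat uptake efficiency whose magnitude is contrasted with the feedback parameter in the text.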

4.4 Radiative forcing

Having calculated the feedback parameter and its SW and LW components for each of the coupled atmosphere–ocean models from the 1% per year CO2 experiments, it is possible to estimate the radiative forcing in the historical and SRES A1B scenarios using the simple linear method of Forster and Taylor (2006) (see their Sect. 2). The forcing is calculated as the sum of the global feedback parameter multiplied by the global surface air temperature response and the global TOA flux diagnosed from the model.
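The linear diagnosis described above amounts to one line of arithmetic; the sketch below spells it out. The function name and the sign convention (net downward TOA flux) are assumptions for illustration, not the paper's code.

```python
# Sketch of the linear forcing diagnosis of Forster and Taylor (2006):
# F(t) = lambda * dT(t) + N(t), i.e. the global feedback parameter times
# the global surface air temperature response plus the diagnosed global
# net TOA flux. Names and sign conventions are illustrative assumptions.
import numpy as np

def diagnose_forcing(lam, dT, n_toa):
    """lam: global feedback parameter (W m-2 K-1) from the 1%/year CO2
    runs; dT (K) and n_toa (W m-2) are same-length global-mean series."""
    return lam * np.asarray(dT) + np.asarray(n_toa)
```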

Differences between modelled climate responses in complex forcing scenarios involving aerosols and natural factors will be partly a consequence of differences in climate feedbacks but also partly because of differences in the radiative forcing. These may arise because of different specifications of forcing agents (e.g. volcanic optical depths, solar input, aerosol emissions) but also because of different treatments of those forcing agents by the different models, e.g. the conversion of aerosol emissions into concentrations, or even because of different radiation codes (e.g. Collins 2006). As models get more complex, the radiative forcing is less a well-known function of the input data and more a result of interactions between complex modelled processes; the aerosol indirect effects are prime examples.

The first feature to note is the clustering of the AO-MME historical simulations into two groups containing those which apply both anthropogenic and natural forcing, and those in which only components of the anthropogenic forcing are applied (Fig. 8). This is obvious from the "negative spikes" in the SW forcing time series in the historical phase of the experiments. Coincident negative spikes are also seen in the AO-PPE-A historical phase, in which the volcanic forcing is specified from an updated Sato et al. (1993) series. The negative SW spikes are accompanied by smaller positive volcanic LW forcing spikes that result from an enhanced greenhouse effect that is particularly strong in the polar-night regions where the SW forcing is absent. An estimate of the average volcanic radiative forcing is calculated in Fig. 9 by differencing the radiative forcing in the years following the three late twentieth century eruptions (1964, 1983 and 1992) with the average value of the radiative forcing in the 5 years prior to each eruption, and then taking the average over the three events. The corresponding ranges of estimated volcanic forcing in the AO-MME and AO-PPE-A are quite similar in the SW, LW and in the total. While this averaging reduces contamination from both natural variability and uncertainties in other forcing agents, some uncertainties remain, as can be seen from the grey shading in Fig. 9. There is inevitably a significant amount of contamination by natural variability when estimating the volcanic radiative forcing in this way. Despite the fact that the volcanic forcing time series of stratospheric optical depth is precisely the same in each member of the perturbed physics ensemble, the spread in total negative volcanic radiative forcing is comparable with the spread in the multi-model case, in which different input forcing data are used.
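The epoch-difference estimate described above can be sketched as follows. The number of post-eruption years averaged (`n_after`) and the assumption of an annual-mean series starting in `start_year` are illustrative choices not specified in the text.

```python
# Sketch of the volcanic forcing estimate behind Fig. 9: difference the
# forcing in the years following each eruption against the mean of the
# 5 years before it, then average over the three events. `n_after` and
# the annual-series layout are assumptions for illustration.
import numpy as np

def volcanic_forcing_estimate(forcing, start_year,
                              eruption_years=(1964, 1983, 1992),
                              n_after=2, n_before=5):
    forcing = np.asarray(forcing, dtype=float)
    diffs = []
    for year in eruption_years:
        i = year - start_year
        pre = forcing[i - n_before:i].mean()          # quiescent baseline
        post = forcing[i + 1:i + 1 + n_after].mean()  # post-eruption years
        diffs.append(post - pre)
    return np.mean(diffs)  # average over the three events
```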

In order to compare the century-scale historical radiative forcing across all members of the ensembles, it is more convenient to average the radiative forcing over the decade 1995–2004 (Fig. 9), hence avoiding large volcanic eruptions. Here we do see differences between ensemble members which are greater than would be expected from natural variability. The range of SW forcing in the AO-MME is similar to that in the AO-PPE-A, but with the latter having a mean which is slightly more negative (an ensemble mean of -0.9 W m-2 compared to -0.6 W m-2). Uncertainty in SW forcing could be a consequence of differences in the forcing from sulphate and other aerosol particles that arise because of different ways of specifying the direct forcing and due to the way that the model calculates the indirect forcing. The HadCM3 aerosol scheme translates fields of emissions into concentrations by dynamical processes and represents only the first aerosol indirect effect (Jones et al. 2001). Despite no perturbations to the parameters in that sulphate aerosol scheme in the perturbed-physics ensemble, there does appear to be some significant spread in the SW forcing. In addition, rapid cloud adjustments to changing levels of greenhouse gases (Gregory and Webb 2008) can appear as an effective


forcing when the calculations are performed in this way. Hence differences in the attributes of the physical climate system which arise through variations in model parameters appear, in this case, to be sufficient to lead to differences in the SW forcing which are on a par with those seen in the multi-model ensemble. Uncertainties in aerosol forcing in equilibrium perturbed physics ensembles are further examined in Ackerley et al. (2009), while the transient

Fig. 8 Time series of global mean surface air temperature change and estimated radiative forcing (SW, LW and total) in coupled model ensemble simulations of the historical period and the future under the SRES A1B scenario. The top row shows the forcing time series from the multi-model members which include anthropogenic forcing only in the historical period. In the second row, the multi-model members include both anthropogenic and natural forcings. The third row is for the historical experiments using the perturbed physics AO-PPE-A ensemble with perturbed atmosphere parameters. The bottom two rows are the future experiments with multi-model and perturbed physics ensembles respectively


experiments here will be examined in further detail in forthcoming publications.

LW forcing in 1995–2004 is centred around 2.4 W m-2 in both the multi-model and perturbed-physics ensembles, with a range of 1.5–3.1 W m-2 in the AO-MME case and a smaller range of 2.1–2.7 W m-2 in the AO-PPE-A case (in both cases the range is greater than would be expected from natural variability). These values may be approximately compared with those presented in Fig. 2.1 of Forster et al. (2007). The number of minor greenhouse gases prescribed in the AO-MME may vary across the ensemble, whereas changes in major and minor gases in the AO-PPE-A are the same in each member, which could be part of the explanation for the smaller range in the perturbed physics case. Another factor which may contribute to the slightly greater range in the case of the AO-MME is differences in radiation codes (Collins 2006). Nevertheless, there does appear to be some influence of variations in atmospheric model parameters on the LW forcing in the AO-PPE-A ensemble when calculated in this way. This is confirmed when we look at the LW forcing in that ensemble in the future.

The spread of total historical forcing in the multi-model ensemble (0.9–3.0 W m-2 with a mean of 1.7 W m-2) is slightly bigger than the spread in the perturbed physics historical forcing (0.7–1.9 W m-2 with a mean of 1.5 W m-2). The perturbed physics approach does however produce some significant spread in forcing, despite the specification of identical forcing time series (greenhouse gases, aerosol emissions and natural factors) in each member. The spread in total forcing in both ensembles can be compared to that stated in Forster et al. (2007): 0.6–2.4 W m-2 with a mean of 1.6 W m-2 for the year 2005. It is likely that further spread would arise if a greater number of ensemble members were run with a wider sample of atmosphere and sulphur-cycle parameter space and input forcing fields.

Turning to the future forcing estimates, more interesting differences are evident between the AO-MME and AO-PPE-A as measured by the average in the decade 2090–2099. A larger range of SW forcing is diagnosed from the multi-model ensemble than from the perturbed physics ensemble (Figs. 8, 9), with even some positive radiative forcing in the SW; Forster and Taylor (2006) discuss this in more detail. In addition there are a few "outliers" in the calculation of the LW forcing in the multi-model case. Even excluding these outliers, the range of total forcing in 2090–2099 appears to be larger in the multi-model case than in the perturbed physics case. In order to generate future uncertainty in radiative forcing in the case of perturbed physics ensembles, it is probably necessary to sample uncertainties in the input files which specify the less certain radiative agents such as aerosols, ozone, etc.

In summary, the ranges of uncertainty in radiative forcing in the multi-model and perturbed physics cases are comparable in terms of the forcing due to volcanic eruptions and the mean forcing over the historical period. In terms of the future forcing, there is a wider spread in the multi-model than in the perturbed physics ensemble. The consistency between the volcanic and historical forcing is partly a consequence of contamination by natural variability, which reduces the signal-to-noise ratio but which plays a relatively less important role as the forcing rises in the future.

Sampling uncertainties in historical forcing is desirable as these will impact, for example, the committed-warming component in any prediction scheme and the use of historical trends in providing observational constraints. Sampling future forcing scenarios in some way which may be suitable for producing PDFs is a more difficult problem because of the lack of a large body of literature on probabilistic forcing scenarios; hence the conditioning of PDFs on the SRES scenarios in Murphy et al. (2009). Nevertheless, some efforts to quantify uncertainties in the economic

Fig. 9 Time-averaged SW, LW and total radiative forcing from different coupled model ensembles and for different time periods as indicated on the y-axis of each panel. Vol indicates the annual-mean forcing averaged over the years following the major twentieth century volcanic eruptions (1964, 1983 and 1992). A circle is plotted for each member of the ensemble and the grey shading represents an estimate of the ±2SD uncertainty in the estimate due to natural variability (computed from the long HadCM3 control experiment). Left: SW forcing; middle: LW forcing; right: total


Fig. 10 Scatter plots of total climate change feedback parameter (Fig. 5) versus time-averaged model errors (biases and root-mean-squared errors, Fig. 2). Black crosses indicate perturbed physics model experiments with HadSM3 and HadCM3, red squares indicate multi-model experiments with slab ocean components and red triangles indicate multi-model experiments with dynamical oceans. The grey vertical bars are an estimate of the uncertainty from observations in the calculation (from Fig. 2), centred on the mean bias or RMSE across all ensemble types


drivers of future forcing are underway (Webster et al. 2002; Sokolov et al. 2009).

5 Relating model errors to feedbacks

Having examined model errors and climate change feedbacks in the multi-model and perturbed physics ensembles, we now examine the relationships between them. As highlighted in the introduction, there are a number of reasons why we may wish to do this. Firstly, in order to make predictions of climate change in which uncertainties in modelling processes are quantified, part of the algorithm requires the assignment of relative likelihoods to different models or different model versions. This, together with all the other ingredients in the Bayesian approach (see introduction and Murphy et al. 2007, 2009), is used to produce weighted probability distribution functions of future change. Secondly, to improve models we need to know how to target research; that is, by quantifying the relationship between error and climate feedback, we may learn which improvements to different aspects of the model simulations will lead to the most progress in reducing uncertainty in predictions.

Scatter plots of biases and root-mean-squared errors in individual variables against the total strength of the feedbacks under climate change (Fig. 10) are a first step in examining such relationships. In doing this we might hope to uncover simple leading-order correlations between model errors and feedback strengths. Unfortunately, it is clear from Fig. 10 that no such simple relationships exist over all model versions in the multi-model and perturbed physics experiments examined here. The best linear correlations are found between the feedback parameter and variables such as biases in net TOA fluxes, total outgoing SW and SW cloud radiative forcing, with correlation coefficients around 0.7 in the case of all the perturbed physics experiments, but no similar correlations for the multi-model members. The only variable in which there is a reasonably high correlation between errors and feedbacks in both perturbed physics and multi-model ensembles is the bias in the global mean cloud amount (coefficients around 0.6–0.7, see also Yokohata et al. 2010). Nevertheless, for the perturbed physics ensembles there are weak to moderately strong correlations for a number of variables, suggesting that the combination of those (and other) variables into a single metric would be a way of constraining the climate feedback parameter. In order to do this, we must take into account both errors in observational fields and covariances between errors in different variables. Reducing the degrees of freedom in such a calculation is important, and projection onto a multivariate EOF space, as done in Piani et al. (2005) and Murphy et al. (2009), is one way of doing this. For the multi-model ensembles, there are far fewer weak-to-moderate correlations. One possible reason for this is that relationships are weakened in the model development process when models are modified, for example, to achieve net TOA flux balance.

This lack of strong relationships for single variables is obvious in retrospect since, if there were such a clear coupling between the errors in the present day simulation of a single variable and climate sensitivity, this would probably have been discovered through simple physical arguments and/or mechanistic studies. As has already been pointed out in a number of studies (Min et al. 2007; Sanderson et al. 2008), it is not possible to strongly constrain predictions of even global mean climate change using constraints provided from single observed fields using simple metrics of time averaged fields (e.g. Knutti et al. 2006). Providing constraints on regional change may be even more challenging. Multivariate techniques are required in which the constraint is extracted from the model and observed data using potentially rather complex statistical techniques. The unfortunate upshot of this is that it becomes difficult to understand how a multivariate constraint operates using simple physical arguments.
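The single-variable screening of Fig. 10 and the multivariate alternative can be illustrated on synthetic numbers. All data and names below are invented for illustration; the EOF step is a plain principal-component truncation in the spirit of, not a reproduction of, Piani et al. (2005) and Murphy et al. (2009).

```python
# Synthetic illustration: correlate single error measures with a feedback
# parameter, then combine standardised errors via leading principal
# components (an EOF-style reduction) before a single regression.
import numpy as np

rng = np.random.default_rng(0)
n_models, n_metrics = 30, 8
errors = rng.normal(size=(n_models, n_metrics))   # per-model error measures
feedback = errors @ rng.normal(size=n_metrics) + 0.3 * rng.normal(size=n_models)

# Single-variable screening (as in Fig. 10): Pearson correlations.
single_r = np.array([np.corrcoef(errors[:, j], feedback)[0, 1]
                     for j in range(n_metrics)])

# Multivariate metric: project standardised errors onto 3 leading EOFs,
# then regress the feedback parameter on those components.
z = (errors - errors.mean(axis=0)) / errors.std(axis=0)
_, _, vt = np.linalg.svd(z, full_matrices=False)
design = np.column_stack([z @ vt[:3].T, np.ones(n_models)])
coef, *_ = np.linalg.lstsq(design, feedback, rcond=None)
multi_r = np.corrcoef(design @ coef, feedback)[0, 1]
```

The truncation to a few components is what keeps the degrees of freedom manageable relative to the ensemble size, at the cost of interpretability noted in the text.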

We leave the development of complex constraints on climate predictions using the ensemble experiments described here to other papers which use statistical tools such as emulators (e.g. Rougier et al. 2009) and other aspects of the Bayesian approach (see introduction and Murphy et al. 2009) which require considerable explanation. However, we should note that such endeavours are unlikely to be simple to understand, simple to describe or simple to implement (see also Knutti et al. 2010). The combination of data from model simulations with observations to produce predictions of climate change in which uncertainties are quantified is likely to involve a level of complexity on a par with the development of numerical climate models themselves, or with the subject of data assimilation in initial-value weather and climate prediction.

6 Discussion and conclusions

We have performed a comparison of various characteristics of perturbed physics ensembles run with the third version of the Hadley Centre climate model with those collected as part of the CMIP3 and CFMIP projects. We find the following:

1. The perturbed physics approach can sample a wide range of different model "errors" in two-dimensional time-averaged climate fields for a number of different variables; for many variables these errors are comparable with uncertainties in the observations and comparable with


the errors in the members of the multi-model archive. The degree of sampling of errors in climatological fields in the perturbed physics ensembles studied here is dependent on the algorithm used for selecting the values of the perturbed parameters.

2. The general situation for the perturbed physics ensembles is that more of the total error is contributed by the systematic component than by the random component; that is, the ratio of the errors which are common to all model versions to the errors that are unique to particular model versions is greater than unity. However, depending on the experimental design of the perturbed physics ensemble, the ratio of systematic to random components of the distributions of model errors can be controlled in order to mimic the behaviour of the multi-model case, in which the random component tends to be of the same order of magnitude as, or larger than, the systematic component. Thus, it is possible to produce quite different baseline climates with the perturbed physics approach such that the ensemble mean appears as the "best" model in comparison with any individual ensemble member.

3. The perturbed physics approach can sample a wide range of global-mean feedbacks under climate change. With the experiments examined here, both the SW and LW components of cloud feedbacks are, at first inspection, responsible for the major component of the feedback uncertainty in the perturbed physics case, while it is the SW only which is dominant in the multi-model case. However, it is likely that there is a regional cancellation between LW and SW feedbacks and that it is the SW feedbacks associated with low clouds that are the dominant driver of uncertainty in both types of ensemble (see also Yokohata et al. 2010). Perturbing ocean parameters, however, results in very little spread in measures of the rate of ocean heat uptake in the forcing scenarios examined.

4. Using a simple method to compute radiative forcing under past and future SRES A1B conditions, perturbing the parameters in the physical component of the model is sufficient to generate some spread in the radiative forcing. For the case of volcanic forcing and for the combined natural and anthropogenic historical forcing, this is of the same order of magnitude as that seen in the multi-model case, despite the use of a common set of forcing input fields in the perturbed physics case. For the future forcing, where signal-to-noise ratios are higher, there is more spread in the CMIP3 multi-model ensemble, presumably because of the use of different input forcing fields in that ensemble.

5. There are no simple emergent relationships between the gross measures of model error used here and the global climate-change feedbacks which could be simply employed to constrain predictions of future climate change. Techniques to make "climate constraints" are inevitably complex and multivariate.
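The systematic/random decomposition invoked in point 2 can be sketched directly; the function below is an illustrative reading of that definition (systematic error = error of the ensemble-mean field, random error = member spread about the mean), not the paper's diagnostic code.

```python
# Sketch of the systematic/random error split of conclusion 2: the
# systematic component is the RMS error of the ensemble-mean field, the
# random component the RMS spread of members about the ensemble mean.
import numpy as np

def error_components(ensemble, obs):
    """ensemble: (n_members, ny, nx) simulated climatologies;
    obs: (ny, nx) observed climatology on the same grid."""
    mean_field = ensemble.mean(axis=0)
    systematic = np.sqrt(((mean_field - obs) ** 2).mean())
    random = np.sqrt(((ensemble - mean_field) ** 2).mean())
    return systematic, random
```

The ratio systematic/random then distinguishes the perturbed physics behaviour described above (ratio greater than unity) from the multi-model behaviour.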

Note that the above conclusions relate to integrated global measures of errors, forcings and feedbacks. For regional measures, and for variables not examined here such as variability or extremes, there may be differences between perturbed physics and multi-model ensembles which do not fit with these general conclusions. It remains a challenge to produce regional projections of climate change, and this we leave to future research.

What are the desirable characteristics of an ensemble of

models used to quantify uncertainties in predictions of climate change? Firstly, we should seek to minimise the systematic component of model error (here simply defined as the ensemble mean) by using a model structure which is, what we might call, well specified: that is, a model structure which satisfies the rigorous standards of climate modelling in terms of conservation and even coding practices, and with which we have a good chance of achieving a low systematic error. Using that structure, we should then generate ensemble members which are both consistent with the relatively large uncertainty in the observed fields we use in our multivariate definitions of metrics of fidelity and which exhibit a wide range of feedbacks and spatial patterns of climate change. Should the distribution of model errors measured against observed climatologies, variability and trends in such an ensemble exhibit the "ensemble mean is the best" characteristic found in so many other modelling and forecast applications? Annan and Hargreaves (2010) discuss this issue. For very practical reasons, we might also wish to design ensembles in a way which aids the fitting of model emulators (e.g. Rougier et al. 2009) to the ensemble in order to produce probabilistic estimates of climate change for policy makers.

While the above definition sounds sensible, there are some aspects of ensemble design which are difficult to achieve and measure. What is a "wide range of feedbacks", for example? It is tempting always to compare the perturbed physics ensembles with the multi-model ensembles, with the latter the implied benchmark against which the former is to be measured. Yet, of course, the multi-model ensemble has in no way been systematically designed to be an adequate sample of all possible models one could formulate, and moreover, it might be that the process of "tuning" to replicate certain basic aspects of historical climate (notably planetary radiation balance) might result in an unrealistically narrow spread of future climate change responses which does not fully reflect the full implications of uncertainties in the many detailed individual processes


included in the models. Perturbed physics ensembles produced with different model structures may shed further light on these issues (e.g. Yokohata et al. 2010). Ongoing model development should result in better specified models which may also produce different behaviour when used to produce perturbed physics ensembles. In our companion work on producing probabilistic climate change projections, we combine perturbed physics and multi-model ensemble information together with observations and estimates of uncertainty in observations to produce projections based on as much information about the climate system as is possible (Murphy et al. 2007, 2009).

Acknowledgments This work was supported by the Joint DECC and Defra Integrated Climate Programme, DECC/Defra (GA01101), and by the European Community ENSEMBLES project (GOCE-CT-2003-505539). Hugo Lambert made useful comments on an earlier version of the manuscript and we thank three anonymous reviewers for their comments.

References

Ackerley D, Highwood EJ, Frame D, Booth BBB (2009) Changes in the global sulfate burden due to perturbations in global CO2 concentrations. J Clim 20:5421–5432
Adler RF et al (2003) The Version 2 Global Precipitation Climatology Project (GPCP) monthly precipitation analysis (1979–present). J Hydrometeorol 4:1147–1167
Allan RJ, Ansell TJ (2006) A new globally complete monthly historical mean sea level pressure data set (HadSLP2): 1850–2004. J Clim 19:5816–5842
Allen MR, Kettleborough J, Stainforth DA (2002) Model error in weather and climate forecasting. In: Proceedings of the ECMWF seminar series. http://www.ecmwf.int
Annan JD, Hargreaves JC (2010) Reliability of the CMIP3 ensemble. Geophys Res Lett 37:L02703. doi:10.1029/2009GL041994
Annan JD, Hargreaves JC, Ohgaito R, Abe-Ouchi A, Emori S (2005) Efficiently constraining climate sensitivity with ensembles of paleoclimate simulations. Sci On-line Lett Atmos 1:181–184
Aumann HH et al (2003) AIRS/AMSU/HSB on the Aqua mission: design, science objectives, data products, and processing systems. IEEE Trans Geosci Remote Sens 41:253–264
Barnett DN, Brown SJ, Murphy JM, Sexton DMH, Webb MJ (2006) Quantifying uncertainty in changes in extreme event frequency in response to doubled CO2 using a large ensemble of GCM simulations. Clim Dyn 26:489–511
Boer G, Yu B (2003) Climate sensitivity and response. Clim Dyn 20:415–429
Brierley CM, Thorpe AJ, Collins M (2009) An example of the dependence of the transient climate response on the temperature of the modelled climate state. Atmos Sci Lett 10:23–28
Brierley CM, Collins M, Thorpe AJ (2010) The impact of perturbations to ocean-model parameters on climate and climate change in a coupled model. Clim Dyn 34:325–343
Brohan P, Kennedy JJ, Harris I, Tett SFB, Jones PD (2006) Uncertainty estimates in regional and global observed temperature changes: a new dataset from 1850. J Geophys Res 111:D12106. doi:10.1029/2005JD006548
Cess RD et al (1990) Intercomparison and interpretation of climate feedback processes in 19 atmospheric general circulation models. J Geophys Res 95:16601–16615
Collins WD (2006) Radiative forcing by well-mixed greenhouse gases: estimates from climate models in the IPCC AR4. J Geophys Res 111:D14317. doi:10.1029/2005JD006713
Collins M (2007) Ensembles and probabilities: a new era in the prediction of climate change. Philos Trans R Soc Lond A 365:1957–1970
Collins M, Booth BBB, Harris GR, Murphy JM, Sexton DMH, Webb MJ (2006) Towards quantifying uncertainty in transient climate change. Clim Dyn 27:127–147
Collins M, Brierley CM, MacVean M, Booth BBB, Harris GR (2007) The sensitivity of the rate of transient climate change to ocean physics perturbations. J Clim 20:2315–2320
Colman RA (2003) A comparison of climate feedbacks in general circulation models. Clim Dyn 20:865–873
Da Silva A, Young C, Levitus S (1994) Atlas of surface marine data 1994, volume 1: algorithms and procedures. NOAA Atlas NESDIS 6. US Department of Commerce, Washington
Dijkstra HA, Neelin JD (1999) Imperfections of the thermohaline circulation: multiple equilibria and flux correction. J Clim 12:1382–1392
Forest CE, Stone PH, Sokolov AP (2006) Estimated PDFs of climate system properties including natural and anthropogenic forcings. Geophys Res Lett 33:L01705
Forster PMdeF, Taylor KE (2006) Climate forcings and climate sensitivities diagnosed from coupled climate model integrations. J Clim 19:6181–6194
Forster PMdeF et al (2007) Changes in atmospheric constituents and in radiative forcing. In: Solomon S, Qin D, Manning M, Chen Z, Marquis M, Averyt KB, Tignor M, Miller HL (eds) Climate change 2007: the physical science basis. Contribution of Working Group I to the fourth assessment report of the Intergovernmental Panel on Climate Change. Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA
Frame DJ et al (2009) The climateprediction.net BBC climate change experiment part 1: design of the coupled model ensemble. Philos Trans R Soc Lond A 367:855–870
Gleckler PJ, Taylor KE, Doutriaux C (2008) Performance metrics for climate models. J Geophys Res 113:D06104. doi:10.1029/2007JD008972
Gordon C et al (2000) The simulation of SST, sea ice extents and ocean heat transport in a version of the Hadley Centre coupled model without flux adjustments. Clim Dyn 16:147–168
Gregory JM, Webb MJ (2008) Tropospheric adjustment induces a cloud component in CO2 forcing. J Clim 21:58–71
Gregory JM et al (2004) A new method for diagnosing radiative forcing and climate sensitivity. Geophys Res Lett 31:L03205
Grist JP, Josey SA (2003) Inverse analysis adjustment of the SOC air–sea flux climatology using ocean heat transport constraints. J Clim 20:3274–3295
Hagedorn R, Doblas-Reyes FJ, Palmer TN (2005) The rationale behind the success of multimodel ensembles in seasonal forecasting. Part I: basic concept. Tellus 57:219–233
Hansen J, Ruedy R, Sato M, Reynolds R (1996) Global surface air temperature in 1995: return to pre-Pinatubo level. Geophys Res Lett 23:1665–1668
Harris GR, Sexton DMH, Booth BBB, Collins M, Murphy JM, Webb MJ (2006) Frequency distributions of transient regional climate change from perturbed physics ensembles of general circulation model simulations. Clim Dyn 27:357–375
Harrison EF, Minnis P, Barkstrom BR, Ramanathan V, Cess R, Gibson CG (1990) Seasonal variation of cloud radiative forcing derived from the Earth Radiation Budget Experiment. J Geophys Res 95:687–703
Held IM, Soden BJ (2006) Robust responses of the hydrological cycle to global warming. J Clim 19:5686–5699


Hibbard KA, Meehl GA, Cox PM, Friedlingsten P (2007) A strategyfor climate change stabilization experiments. EOS 88:20. doi:10.1029/2007EO200002

Huntingford C, Cox PM (2000) An analogue model to derive additional climate change scenarios from existing GCM simulations. Clim Dyn 16:575–586

Jackson CS, Sen MK, Huerta G, Deng Y, Bowman KP (2008) Error reduction and convergence in climate prediction. J Clim 21:6698–6709

Jones A, Roberts DL, Woodage MJ, Johnson CE (2001) Indirect sulphate aerosol forcing in a climate model with an interactive sulphur cycle. J Geophys Res 106:20293–20310

Joshi MM, Gregory JM, Webb MJ, Sexton DMH, Johns TC (2008) Mechanisms for the land/sea warming exhibited by simulations of climate change. Clim Dyn 30:455–465

Jun M, Knutti R, Nychka DW (2008) Spatial analysis to quantify numerical model bias and dependence: how many climate models are there? J Am Stat Assoc Appl Case Stud 103:934–947

Knutti R, Meehl GA, Allen MR, Stainforth DA (2006) Constraining climate sensitivity from the seasonal cycle in surface temperature. J Clim 19:4224–4233

Knutti R, Furrer R, Tebaldi C, Cermak J, Meehl GA (2010) Challenges in combining projections from multiple climate models. J Clim (in press)

Lambert SJ, Boer GJ (2001) CMIP1 evaluation and intercomparison of coupled climate models. Clim Dyn 17:83–106

Lambert FH, Chiang JCH (2007) Control of land–ocean temperature contrast by ocean heat uptake. Geophys Res Lett 34:L13704

Legates DR, Willmott CJ (1990) Mean seasonal and spatial variability in global surface air temperature. Theor Appl Climatol 41:11–21

McKay MD, Conover WJ, Beckman RJ (1979) A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics 21:239–245

Meehl GA, Stocker TF et al (2007a) Global climate projections. In: Solomon S, Qin D, Manning M, Chen Z, Marquis M, Averyt KB, Tignor M, Miller HL (eds) Climate change 2007: the physical science basis. Contribution of Working Group I to the fourth assessment report of the Intergovernmental Panel on Climate Change. Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA

Meehl GA et al (2007b) The WCRP CMIP3 multimodel dataset: a new era in climate change research. Bull Am Meteorol Soc 88:1383–1394

Min SK, Simonis D, Hense A (2007) Probabilistic climate change predictions applying Bayesian model averaging. Philos Trans R Soc Lond A 365:2103–2116

Molteni F, Buizza R, Palmer TN, Petroliagis T (2006) The ECMWF ensemble prediction system: methodology and validation. Quart J Roy Meteorol Soc 122:73–119

Moore B, Gates WL, Mata LJ, Underdal A (2001) Advancing our understanding. In: Houghton JT, Ding Y, Griggs DJ, Noguer M, van der Linden PJ, Dai X, Maskell K, Johnson CA (eds) Climate change 2001: the scientific basis. Contribution of Working Group I to the third assessment report of the Intergovernmental Panel on Climate Change. Cambridge University Press

Murphy JM (1995) Transient response of the Hadley Centre coupled ocean–atmosphere model to increasing carbon dioxide. Part III. Analysis of global mean response using simple models. J Clim 8:496–514

Murphy JM, Sexton DMH, Barnett DN, Jones GS, Webb MJ, Collins M, Stainforth DA (2004) Quantification of modelling uncertainties in a large ensemble of climate change simulations. Nature 430:768–772

Murphy JM, Booth BBB, Collins M, Harris GR, Sexton D, Webb MJ (2007) A methodology for probabilistic predictions of regional climate change from perturbed physics ensembles. Philos Trans R Soc Lond A 365:1993–2028

Murphy JM, Sexton DMH, Jenkins G, Boorman P, Booth BBB, Brown K, Clark R, Collins M, Harris GR, Kendon E (2009) Climate change projections. ISBN 978-1-906360-02-3

Myhre G, Highwood EJ, Shine KP, Stordal F (1998) New estimates of radiative forcing due to well mixed greenhouse gases. Geophys Res Lett 25(14):2715–2718. doi:10.1029/98GL01908

Niehorster F, Spangehl T, Fast I, Cubasch U (2006) Quantification of model uncertainties: parameter sensitivities of the coupled model ECHO-G with middle atmosphere. Geophys Res Abs 8, EGU06-A-08526

Piani C, Frame DJ, Stainforth DA, Allen MR (2005) Constraints on climate change from a multi-thousand member ensemble of simulations. Geophys Res Lett 32:L23825. doi:10.1029/2005GL024452

Pope VD, Gallani ML, Rowntree PR, Stratton RA (2000) The impact of new physical parametrizations in the Hadley Centre climate model: HadAM3. Clim Dyn 16:123–146

Raper SCB, Gregory JM, Stouffer RJ (2002) The role of climate sensitivity and ocean heat uptake on AOGCM transient temperature response. J Clim 15:124–130

Rayner NA et al (2003) Global analyses of sea surface temperature, sea ice, and night marine air temperature since the late nineteenth century. J Geophys Res 108, D14, 4407. doi:10.1029/2002JD002670

Reichler T, Kim J (2008) How well do climate models simulate today’s climate? Bull Am Meteorol Soc 89:303–311

Rossow WB, Walker AW, Beuschel DE, Roiter MD (1996) International Satellite Cloud Climatology Project (ISCCP) documentation of new cloud datasets. World Meteorological Organisation WMO/TD 737, 115 pp

Rougier JC (2007) Probabilistic inference for future climate using an ensemble of climate model evaluations. Clim Change 81:247–264

Rougier JC, Sexton DMH, Murphy JM, Stainforth DA (2009) Analysing the climate sensitivity of the HadSM3 climate model using ensembles from different but related experiments. J Clim 22:3540–3557

Sanderson BM, Piani C (2007) Towards constraining climate sensitivity by linear analysis of feedback patterns in thousands of perturbed-physics GCM simulations. Clim Dyn 30:175–190

Sanderson BM et al (2008) Constraints on model response to greenhouse gas forcing and the role of subgrid-scale processes. J Clim 21:2384–2400

Sato M, Hansen JE, McCormick MP, Pollack JB (1993) Stratospheric aerosol optical depths (1850–1990). J Geophys Res 98:22987–22994

Schneider von Deimling T, Held H, Ganopolski A, Rahmstorf S (2006) Climate sensitivity estimated from ensemble simulations of glacial climates. Clim Dyn 27:149–163

Senior CA, Mitchell JFB (2000) The time dependence of climate sensitivity. Geophys Res Lett 27:2685–2688

Smith TM, Reynolds RW (2004) Improved extended reconstruction of SST (1854–1997). J Clim 17:2466–2477

Soden BJ, Held IM (2006) An assessment of climate feedbacks in coupled ocean–atmosphere models. J Clim 19:3354–3360

Soden BJ, Broccoli AJ, Hemler RS (2004) On the use of cloud forcing to estimate cloud feedback. J Clim 17(19):3661–3665

Sokolov AP et al (2009) Probabilistic forecast for 21st century climate based on uncertainties in emissions (without policy) and climate parameters. J Clim 22:5175–5204

Stainforth DA et al (2005) Uncertainty in predictions of the climate response to rising levels of greenhouse gases. Nature 433:403–406




Stocker TF (2004) Climate change: models change their tune. Nature 430:737–738

Stott PA, Forest CE (2007) Ensemble climate predictions using climate models and observational constraints. Philos Trans R Soc Lond A 365:2029–2052

Sutton RT, Dong B-W, Gregory JM (2007) Land/sea warming ratio in response to climate change: IPCC AR4 model results and comparison with observations. Geophys Res Lett 34:L02701

Taylor KE (2001) Summarizing multiple aspects of model performance in a single diagram. J Geophys Res 106:7183–7192

Taylor KE, Crucifix M, Doutriaux C, Broccoli AJ, Mitchell JFB, Webb MJ (2007) Estimating shortwave radiative forcing and response in climate models. J Clim 20:2530–2543

Tziperman E, Toggweiler JR, Feliks Y, Bryan K (1994) Instability of the thermohaline circulation with respect to mixed boundary conditions: is it really a problem for realistic models? J Phys Oceanogr 24:217–232

Uppala SM et al (2005) The ERA-40 re-analysis. Quart J Roy Meteorol Soc 131:2961–3012

Webb MJ et al (2006) On the contribution of local feedback mechanisms to the range of climate sensitivity in two GCM ensembles. Clim Dyn 27:17–38

Webster MD et al (2002) Uncertainty in emissions projections for climate models. Atmos Environ 36:3659–3670

Wielicki BA, Barkstrom BR, Harrison EF, Lee RB III, Smith GL, Cooper JE (1996) Clouds and the Earth’s Radiant Energy System (CERES): an earth observing system experiment. Bull Am Meteorol Soc 77:853–868

Wylie DP, Menzel WP, Woolf HM, Strabala KI (1994) Four years of global cirrus cloud statistics using HIRS. J Clim 7:1972–1986

Xie P, Arkin PA (1997) Global precipitation: a 17-year monthly analysis based on gauge observations, satellite estimates, and numerical model outputs. Bull Am Meteorol Soc 78:2539–2558

Yokohata T et al (2008) Comparison of equilibrium and transient responses to CO2 increase in eight state-of-the-art climate models. Tellus 60:946–961

Yokohata T, Webb MJ, Collins M, Williams KD, Yoshimori M, Hargreaves JC, Annan JD (2010) Structural similarities and differences in climate responses to CO2 increase between two perturbed physics ensembles. J Clim 23(6):1392–1410

Zhang MH, Cess RD, Hack JJ, Kiehl JT (1994) Diagnostic study of climate feedback processes in atmospheric GCMs. J Geophys Res 99:5525–5537

