
Lewandowsky, S., Cowtan, K., Risbey, J. S., Mann, M. E., Steinman, B. A., Oreskes, N., & Rahmstorf, S. (2018). The ‘pause’ in global warming in historical context: (II). Comparing models to observations. Environmental Research Letters, 13(12), 123007. https://doi.org/10.1088/1748-9326/aaf372

Publisher's PDF, also known as Version of Record. License: CC BY.

This is the final published version of the article (version of record). It first appeared online via IOP at DOI: 10.1088/1748-9326/aaf372. Please refer to any applicable terms of use of the publisher. This document is made available in accordance with publisher policies (University of Bristol, Explore Bristol Research); please cite only the published version using the reference above. Full terms of use: http://www.bristol.ac.uk/pure/user-guides/explore-bristol-research/ebr-terms/


Environ. Res. Lett. 13 (2018) 123007 https://doi.org/10.1088/1748-9326/aaf372

TOPICAL REVIEW

OPEN ACCESS

RECEIVED 6 February 2017
REVISED 11 November 2018
ACCEPTED FOR PUBLICATION 23 November 2018
PUBLISHED 19 December 2018

Original content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

© 2018 The Author(s). Published by IOP Publishing Ltd

The ‘pause’ in global warming in historical context: (II). Comparing models to observations

Stephan Lewandowsky1,2,3, Kevin Cowtan4, James S Risbey3, Michael E Mann5, Byron A Steinman6, Naomi Oreskes7 and Stefan Rahmstorf8,9

1 University of Bristol, Bristol, United Kingdom
2 University of Western Australia, Crawley, WA, Australia
3 CSIRO Oceans & Atmosphere, Hobart, Tasmania, Australia
4 Department of Chemistry, University of York, York, United Kingdom
5 Department of Meteorology and Atmospheric Sciences, Pennsylvania State University, State College, United States of America
6 Department of Earth and Environmental Sciences and Large Lakes Observatory, University of Minnesota Duluth, Duluth, United States of America
7 Department of the History of Science, Harvard University, Cambridge, United States of America
8 Potsdam Institute for Climate Impact Research, Potsdam, D-14473, Germany
9 University of Potsdam, Institute of Physics and Astronomy, Potsdam, Germany

E-mail: [email protected]

Keywords: climate models, climate projections, ‘pause’ in global warming

Abstract
We review the evidence for a putative early 21st-century divergence between global mean surface temperature (GMST) and Coupled Model Intercomparison Project Phase 5 (CMIP5) projections. We provide a systematic comparison between temperatures and projections using historical versions of GMST products and historical versions of model projections that existed at the times when claims about a divergence were made. The comparisons are conducted with a variety of statistical techniques that correct for problems in previous work, including using continuous trends and a Monte Carlo approach to simulate internal variability. The results show that there is no robust statistical evidence for a divergence between models and observations. The impression of a divergence early in the 21st century was caused by various biases in model interpretation and in the observations, and was unsupported by robust statistics.

1. Introduction

A presumed slowdown in global warming during the first decade of the 21st century, and an alleged divergence between projections from climate models and observations, have attracted considerable research attention. Even though the Earth's climate has long been known to fluctuate on a range of temporal scales (Climate Research Committee, National Research Council 1995), the most recent fluctuation has been singled out as a seemingly unique phenomenon, being identified as ‘the pause’ or ‘the hiatus.’ By the end of 2017, the ‘pause’ had been the subject of more than 200 peer-reviewed articles (Risbey et al 2018).

Here, we focus on one aspect of the putative ‘pause’; namely, an alleged divergence between model projections and observed global mean surface temperature (GMST); in particular, the claim that climate models over-estimated warming (Fyfe et al 2013). The question of whether GMST deviates from model projections has often been conflated with, but is conceptually distinct from, questions relating to the observed warming rate. For example, one might ask whether warming has ceased or ‘paused’ or entered a ‘hiatus’. Answers to this question involve tests of the statistical hypothesis that the warming trend is equal to zero. A different question might be whether warming has slowed significantly, in which case the statistical question is whether there is a change in the long-term rate of warming. A third question, at issue in this article, is whether the observations diverge from model-derived expectations.

Existing research on the ‘pause’ has often conflated the distinct questions that can be asked about short-term warming trends. This conflation can be problematic because it is possible, in principle, for the observations to diverge from model-derived expectations even though warming continues unabated. Under those circumstances it would be misleading to discuss a ‘pause’ or ‘slowdown’, notwithstanding any divergence between projected and observed trends.

A further difficulty in interpreting research on the ‘pause’ is that this period of intense research activity coincided with notable improvements to observational datasets. Specifically, GMST datasets are evolving over time as they extend coverage (Morice et al 2012), introduce or modify interpolation methods that fill in data for areas in which observations are sparse (Cowtan and Way 2014), or remove biases arising from issues such as the transition of sea surface temperature (SST) measurement from ships to buoys (Karl et al 2015, Hausfather et al 2017). In consequence, research on short-term warming trends may come to different conclusions, depending on what version of a dataset is being used. Figure 1, adapted from the companion article by Risbey et al (2018), shows that the consequences of revisions to GMST datasets are far from trivial. The figure shows GMST trends starting in 1998 and ending at the times marked on the x-axis (vantage points). Each solid line shows what we call ‘historically-conditioned’ trends, which reflect only data that were available at each vantage point. The thin lines, by contrast, show the retrospective (‘hindsight’) trends calculated back to earlier vantage points as if the later versions of the dataset had been available then.

The figure shows that different versions of the same dataset can yield substantially different trend estimates, as indicated by the difference between each solid line and its thinner retrospective counterparts. This is particularly pronounced for HadCRUT, which shows a distinct jump in 2012 when HadCRUT3 was replaced by HadCRUT4. In consequence, a data analyst using HadCRUT3 in early 2012 would have concluded that the warming trend since 1998 had been slightly negative, whereas the same analyst using HadCRUT4 some months later would have concluded that warming since 1998 had been positive. Likewise, although the datasets are known to yield similar long-term estimates of global warming (e.g. from 1970 to the present), there were considerable differences between datasets for short-term trends early in the 21st century. See Risbey et al (2018) for details.
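The vantage-point logic behind figure 1 is simple enough to sketch in code. The fragment below is a minimal illustration (in Python rather than the authors' R; the `versions` mapping and its contents are hypothetical inputs, not part of the published analysis): for each vantage year, an ordinary least squares trend is fitted from 1998 through that year using whichever version of the dataset was current at that vantage point.

```python
# Minimal sketch of historically conditioned trends (assumed inputs):
# `versions` maps each vantage year to the (years, anomalies) arrays of the
# dataset version that was current at that time.
import numpy as np

def ols_trend(years, anomalies):
    # Least-squares slope, converted from K/year to K/decade.
    return 10.0 * np.polyfit(years, anomalies, 1)[0]

def conditioned_trends(versions, start=1998.0):
    trends = {}
    for vantage, (years, anomalies) in versions.items():
        window = (years >= start) & (years < vantage + 1)
        trends[vantage] = ols_trend(years[window], anomalies[window])
    return trends
```

The ‘hindsight’ counterpart simply evaluates the same loop with the latest dataset version substituted at every vantage point.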

In this article, we apply the same historical conditioning to our analysis of the putative divergence between models and observations during the period known as the ‘pause.’ That is, we use the variants of the datasets that were available at the time when assessing evidence for the divergence between models and observations, and we also condition the model projections on the estimates of the forcings on the climate system that were available at any given time. The historical conditioning of both models and observations provides the most like-with-like assessment of the divergence between models and observations.

Figure 1. Historically conditioned trends for major GMST datasets; Berkeley (Rohde et al 2013); the U.K. Met Office's HadCRUT (Brohan et al 2006, Morice et al 2012); NASA's GISTEMP (Hansen et al 2010); Cowtan and Way's improved version of HadCRUT (Cowtan and Way 2014); and NOAA's dataset (Vose et al 2012). Each solid line plots the best-fitting least squares trend from the start year of 1998 to the end year (vantage year) as shown on the x-axis. The thick lines for each dataset correspond to the version available and current at a given vantage year, with the vertical gray lines indicating major changes in version. The thin lines provide retrospective trends that show what the trends would have been if later versions of the data (represented by solid points) had been available earlier. The solid points indicate the dates of versions of each dataset that were available for analysis. The trends are incremented from monthly data, which results in some fine scale variation.



In order to assess claims made about this putative divergence we searched the literature for articles (published through 2016) that referred to a ‘pause’ or ‘hiatus’ in GMST in the title or abstract. The search was completed in December 2017 and yielded 225 peer-reviewed articles (see Risbey et al 2018 for a complete list). On the basis of the abstracts, 82 of those articles were identified as being concerned with a potential divergence between the model projections and observations during the ‘pause’ period. (An additional 6 articles mentioned the putative divergence but did not examine it.) From this initial set of 82, we extracted a corpus of articles (N = 50) that explicitly defined a start and end date for the period of interest, and that also specified the GMST dataset used for analysis. This is the minimum amount of information required to reproduce and examine the claims about a divergence between models and observations made in those articles. Table 1 provides the citations for those articles together with information about the period examined and the observational dataset used.

We summarize this literature graphically. Figure 2 shows the observed and modeled warming rates for the time periods covered in the corpus. For each article, we compute a warming trend using the dataset and period specified in the article. The average duration of trends being examined was 14.6 years (median = 15, range 10–21). The same period is used to obtain a trend for comparison from the CMIP5 simulations.

Consider first panel (a) in the figure. The blue histogram shows the observations and the pink histogram shows the modeled trends using the CMIP5 multi-model mean. All trends are computed based on the information provided in the articles in the corpus, and each article contributes at least one observation (or more, if an article used multiple datasets). It is clear that the articles in the corpus were mainly concerned with time periods in which GMST was either increasing only slightly or even decreased. At first glance, panel (a) also gives the appearance that observed warming trends lagged behind model-derived expectations for the time periods considered in the corpus. Accordingly, some articles in the corpus draw strong conclusions about a divergence between models and observations, stating for example that ‘Recent observed global warming is significantly less than that simulated by climate models’ (Fyfe et al 2013, p 767), or ‘global-mean surface temperature (T) has shown no discernible warming since about 2000, in sharp contrast to model simulations, which on average project strong warming’ (Dai et al 2015, p 555). These conclusions were reflected in the most recent IPCC Assessment Report (AR5), which examined the match between observed GMST and the CMIP5 historical realizations (extended by the RCP4.5 forcing scenario for the period 2006–2012). The IPCC stated that ‘...111 out of 114 realizations show a GMST trend over 1998–2012 that is higher than the entire HadCRUT4 trend ensemble ... . This difference between simulated and observed trends could be caused by some combination of (a) internal climate variability, (b) missing or incorrect radiative forcing and (c) model response error’ (Flato et al 2013, p 769). The consensus view expressed by the IPCC therefore pointed to a divergence between modeled and observed temperature trends, putatively caused by a mix of three factors. Subsequent to the IPCC report, the role of these three factors has become clearer.

The contribution of internal climate variability to the putative divergence between models and observations has been illustrated in several ways. First, when internal variability is considered by selecting only those models whose internal variability happens to be aligned with the observed phase of the El Niño Southern Oscillation (ENSO; Trenberth 2001), which is a major determinant of tropical Pacific SSTs, the divergence between observed and modeled GMST trends during the ‘pause’ period is reduced considerably or even eliminated (Meehl et al 2014, Risbey et al 2014). Second, when only those (few) model realizations are considered that, by chance alignment of their modeled internal variability to that actually observed, reproduced the observed ‘pause’, their warming projections for the end of the century do not differ from those of the remaining realizations that diverged from observations during the recent period (England et al 2015). These findings show that any conclusions about a divergence between observed and modeled trends that are based on the CMIP5 multi-model mean are highly problematic. The multi-model mean does not capture the internal variability of the climate system; on the contrary, the mean cancels out that internal variability, and observed GMST therefore cannot be expected to track the mean but rather should behave like a single model realization. The observed climate is, after all, a single realization of a stochastic system.

The importance of internal variability is illustrated in panel (b) in figure 2, which shows the same observed trends from our corpus against a distribution of modeled trends for all CMIP5 ensemble members. Unlike the multi-model mean in panel (a), the distribution of trends modeled by the different ensemble members is far broader and spans the observed trends because different ensemble members are in different states of internal variability at any given simulated time. Consideration of internal variability thus reduces the alleged divergence between modeled and observed trends.

Concerning radiative forcings, the possible inadequacies anticipated by the IPCC (Flato et al 2013) have been confirmed by subsequent research (Huber and Knutti 2014, Schmidt et al 2014). We explore the implications of updated forcings in detail below.


Concerning model response error, it is notable that the IPCC report did not consider potential biases in the observations as an alternative variable, even though the differences between datasets were known at the time (figure 1). The analysis presented here shows that when biases in observations and model output are considered and corrected, then there is no discernible divergence between models and observations. There is no evidence of model response error, or that the models are ‘running too hot.’ Our analysis also shows that statistical evidence for a divergence between models and observations was apparent only for a brief period (2011–2013), before those biases were addressed. Even then, that interpretation was marred by questionable statistical choices.

Table 1. Articles in the corpus with start and end date of presumed ‘pause’ and observational datasets being considered (G = GISTEMP, H3 = HadCRUT3, H4 = HadCRUT4).

Citation | Start | End | G/H3/H4 | Other
Allan et al (2014) | 2000 | 2012 | * | ERA-Interim
Brown et al (2015) | 2002 | 2013 | * |
Chikamoto et al (2016) | 2000 | 2013 | * | ERSST 4
Dai et al (2015) | 2000 | 2013 | * * |
Delworth et al (2015) | 2002 | 2013 | * |
Easterling and Wehner (2009) | 1998 | 2008 | |
England et al (2014) | 2001 | 2013 | * |
England et al (2015) | 2000 | 2013 | | Cowtan and Way
Fyfe et al (2013) | 1998 | 2012 | * |
Fyfe et al (2016) | 2001 | 2014 | * * | RSS, UAH
Gettelman et al (2015) | 1998 | 2014 | * * * |
Gu et al (2016) | 1999 | 2014 | * | ERSST
Haywood et al (2014) | 2003 | 2012 | * |
Huber and Knutti (2014) | 1998 | 2012 | * | Cowtan and Way
Hunt (2011) | 1999 | 2009 | * |
Kay et al (2015) | 1995 | 2015 | * * |
Knutson et al (2016) | 1998 | 2016 | * * |
Kosaka and Xie (2013) | 2001 | 2013 | * |
Kosaka and Xie (2016) | 1998 | 2016 | * * |
Kumar et al (2016) | 1999 | 2013 | * * |
Li and Baker (2016) | 1998 | 2012 | * |
Lin and Huybers (2016) | 1998 | 2014 | * |
Lovejoy (2014) | 1998 | 2013 | * |
Lovejoy (2015) | 1998 | 2015 | * |
Mann et al (2016) | 2001 | 2011 | * | Kaplan SST, HadISST, ERSST
Marotzke and Forster (2015) | 1998 | 2012 | * |
Meehl and Teng (2012) | 2001 | 2010 | * | NCEP/NCAR
Meehl et al (2014) | 2000 | 2013 | * |
Meehl et al (2016) | 2001 | 2016 | * * | HadISST
Meehl et al (2016) | 2000 | 2013 | | NCEP/NCAR
Pasini et al (2017) | 2001 | 2014 | * |
Peyser et al (2016) | 1998 | 2012 | * * |
Power et al (2017) | 1997 | 2014 | * |
Pretis et al (2015) | 2001 | 2013 | * |
Rackow et al (2018) | 1998 | 2012 | * | HadISST, ERA-Interim
Risbey et al (2014) | 1998 | 2012 | * * | Cowtan and Way
Roberts et al (2015) | 2000 | 2014 | * * | HadSST
Saenko et al (2016) | 2003 | 2013 | * |
Saffioti et al (2015) | 1998 | 2012 | * | ERA-Interim, JRA-55, NCEP/NCAR, NCEP/DOE, NOAA20CR
Schmidt et al (2014) | 1997 | 2013 | * | Cowtan and Way
Schurer et al (2015) | 1998 | 2013 | * |
Smith et al (2016) | 2001 | 2016 | * * |
Steinman et al (2015b) | 2004 | 2013 | * | HadISST, ERSST, Kaplan SST
Thoma et al (2015) | 1998 | 2015 | * | ERA40
Thorne et al (2015) | 1998 | 2012 | * * | BERKELEY, Cowtan and Way
Wang et al (2017) | 2001 | 2015 | * * |
Watanabe et al (2013) | 2001 | 2013 | * * |
Watanabe et al (2014) | 2001 | 2012 | * | JRA-55, HadISST1
Wei and Qiao (2016) | 1998 | 2014 | * |
Zeng and Geil (2016) | 1998 | 2012 | * * | BERKELEY



2. Methods and data

2.1. Overview
We ask whether there is any divergence between GMST trends, as captured by the major observational datasets, and the Coupled Model Intercomparison Project Phase 5 (CMIP5) projections during the last 20–25 years. The principal analyses are historically conditioned for both observations and model projections. That is, analysis at any given temporal vantage point involves the observational data and projections that were available at that point in time.

We differentiate between different ways in which short-term trends can be computed relative to the long-term trend, and we take into account the statistical ramifications of selecting a trend because it is low before conducting a statistical test. This problem is known as ‘selection bias’ or the ‘multiple-testing problem’.

Our main analysis relies on a Monte Carlo approach to generate a synthetic distribution of internal climate variability. This distribution provides a statistical reference distribution against which the observed GMST trends can be compared to assess their probability of occurrence on the basis of internal variability alone.

All data used in the analyses and the R scripts can be accessed at https://git.io/fAur5.

2.2. Observational datasets
We use four observational datasets that are summarized in table 2. To economize presentation, we omitted the NOAA dataset (Vose et al 2012). All of these datasets have undergone revisions to debias their estimates of GMST. For details, see Risbey et al (2018). Our analysis used versions of the GMST datasets as they existed at different points in time over the ‘pause’ research period.

Figure 2. Normalized density distribution of observed decadal temperature trends (blue) and the corresponding trends in the CMIP5 simulations (pink). For the observations, the dataset and time periods were as in the corpus of published articles (N = 50) on the apparent divergence between models and observations. An article contributes multiple observations if multiple datasets were used. The same time periods were used to construct the modeled distribution of trends from globally averaged surface air temperatures (TAS). Model runs are as in table 3 and used the original forcings from the CMIP5 experiments. Historical runs are spliced together with RCP8.5. Panel (a) shows the trends for the multi-model mean only, and panel (b) shows the trends for all CMIP5 ensemble members.

Table 2. GMST datasets used in the analysis (with labels used in figure captions). The release dates specify when the data was made publicly available. If no release date is given, the dataset had been in use before research on the ‘pause’ commenced. If coverage of a dataset is global, then it is compared to global output of the CMIP5 model projections. If parts of the globe are not covered (HadCRUT), then the model output is masked to the same coverage for comparison.

Dataset (label in captions) | Released | SST data | Model output | Citation
Berkeley (BERKELEY) | March 2014 | HadSST3 | Global | Rohde et al (2013)
Cowtan and Way (CW) | November 2013 | HadSST3 | Global | Cowtan and Way (2014)
GISTEMP (GISTEMP) | | HadSST2+OISST pre 2013; ERSSTv3 until mid 2015; ERSSTv4 after mid 2015 | Global | Hansen et al (2010)
HadCRUT3 (HADCRUT) | | HadSST2 | Masked | Brohan et al (2006)
HadCRUT4 (HADCRUT) | November 2012 | HadSST3 | Masked | Morice et al (2012)


All analyses reported here have been performed with all four of the datasets shown in table 2. To economize presentation, we usually focus on GISTEMP and HadCRUT because they were available throughout the period of research into the ‘pause’ and hence permit accurate historical conditioning. GISTEMP and HadCRUT also bracket the magnitude of the warming trends observed during the ‘pause’ (see figure 1), with HadCRUT providing the lowest estimates (in part because it omits a significant number of grid cells in the high Arctic, which is known to warm particularly rapidly), and GISTEMP providing a higher estimate of warming throughout (because it provides coverage of the Arctic by interpolation).

All of the datasets were limited to the period 1880–2016, with anomalies computed relative to a common reference period of 1981–2010. This reference period was chosen because the different SST records are most consistent over this period, and it avoids the recent changes in ship bias (Kent et al 2017). All trends were computed using ordinary least squares. The auto-correlation structure of the data is, however, modeled in the main Monte Carlo analysis.
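These two conventions are straightforward to make concrete. The sketch below (Python, with hypothetical inputs; not the authors' R scripts) rebaselines a series to 1981–2010 anomalies and fits a plain OLS trend; the autocorrelation correction enters later, in the Monte Carlo test, not at this stage.

```python
import numpy as np

def to_anomalies(years, temps, ref_start=1981, ref_end=2010):
    # Anomalies relative to the common 1981-2010 reference period.
    ref = (years >= ref_start) & (years <= ref_end)
    return temps - temps[ref].mean()

def ols_slope(years, series):
    # Plain ordinary least squares trend (K/year); autocorrelation is
    # handled separately by the ARMA-based Monte Carlo analysis.
    return np.polyfit(years, series, 1)[0]
```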

2.3. Model projections
An ensemble of 84 CMIP5 historical multimodel runs was combined with RCP8.5 projections to yield simulated and projected GMST for the period 1880–2016. RCP8.5 makes the most extreme assumptions about increases in forcings and therefore provides the ‘hottest’ scenario for comparison to the GMST data, rendering it most suitable for the detection of any divergence between rapid projections and slow actual warming. Table 3 lists the models used and their runs.

Where applicable, model output was masked to the coverage of the corresponding dataset (HadCRUT; see table 2). The masked model results were treated in the same way as the corresponding data; namely, by averaging separate hemispheric means to obtain GMST. This approach mirrors HadCRUT (both versions 3 and 4), which also uses hemispheric averages to produce a global mean, rather than averaging all grid cells across both hemispheres simultaneously (Brohan et al 2006, Morice et al 2012). We therefore report comparisons involving HadCRUT separately from comparisons involving the other datasets.
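A sketch of that averaging convention, under simplifying assumptions (a regular latitude-longitude grid and a boolean coverage mask; this is an illustration, not the authors' implementation):

```python
import numpy as np

def hadcrut_style_gmst(field, coverage, lats):
    # field: (nlat, nlon) temperature anomalies; coverage: boolean mask of
    # observed cells; lats: (nlat,) latitudes. Cells are weighted by cos(lat),
    # each hemisphere is averaged separately, and the two means are averaged.
    weights = np.where(coverage, np.cos(np.radians(lats))[:, None], 0.0)
    nh = lats > 0
    nh_mean = (field[nh] * weights[nh]).sum() / weights[nh].sum()
    sh_mean = (field[~nh] * weights[~nh]).sum() / weights[~nh].sum()
    return 0.5 * (nh_mean + sh_mean)
```

Averaging hemispheres separately prevents the better-sampled hemisphere from dominating the global mean, which is why the masked model output must be treated the same way as the data.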

The CMIP5 model projections have undergone two notable revisions since 2013.

2.3.1. Updated forcings
Climate projections are obtained by applying estimates of the historical radiative forcings for historical runs (until 2005), followed by the future forcings that are assumed by the scenario (e.g. RCP8.5). If those presumed forcings turn out to be wrong, for example because economic activity or climate policies follow an unexpected path or because historical estimates are revised, then any divergence between modeled and observed GMST cannot be used to question the suitability or accuracy of climate models (Flato et al 2013).

Relevant variables such as volcanic eruptions, aerosols in the atmosphere, and solar activity all took unexpected turns early in the 21st century, necessitating an update to the original presumed forcings in the RCPs, which had created a warm bias in the model projections. Two such updates have been provided (Huber and Knutti 2014, Schmidt et al 2014).

The updated estimates provided by Schmidt et al (2014) became available early in 2014 (27 February) and covered the period 1989–2013. Schmidt et al (2014) identified four necessary adjustments:

Table 3. CMIP5 models and number of original runs used in the analysis. Each historical run is concatenated with the corresponding RCP8.5 projection. All model output is baselined with reference to the period 1981–2010. The multi-model mean was computed by averaging across all runs for each model first, before averaging across models.

Model name | N model runs (a)
ACCESS1 | 2
bcc-csm1 | 1
CanESM2 | 5
CCSM4 | 6
CESM1-BGC | 1
CESM1-CAM5 | 3
CMCC-CM | 1
CNRM-CM5 | 6
CSIRO-Mk3-6-0 | 10
EC-EARTH | 5
FIO-ESM | 3
GFDL-CM3 | 1
GFDL-ESM2G | 1
GFDL-ESM2M | 1
GISS-E2-H-CC | 1
GISS-E2-H | 5
GISS-E2-R-CC | 1
GISS-E2-R | 5
HadGEM2-AO | 1
HadGEM2-CC | 1
HadGEM2-ES | 4
inmcm4 | 1
IPSL-CM5A-LR | 4
IPSL-CM5A-MR | 1
IPSL-CM5B-LR | 1
MIROC-ESM-CHEM | 1
MIROC-ESM | 1
MIROC5 | 3
MPI-ESM-LR | 3
MPI-ESM-MR | 1
MRI-CGCM3 | 1
MRI-ESM1 | 1
NorESM1-M | 1
NorESM1-ME | 1

(a) For some models, the physical properties differed between runs. We averaged across all runs irrespective of physics.


(a) well-mixed greenhouse gases (WMGHG; correcting a small cool bias in the projections); (b) solar irradiance (correcting a warm bias from around 1998 onward); (c) anthropogenic tropospheric aerosols (correcting a warm bias from around 1998 onward); and (d) volcanic stratospheric aerosols (correcting a substantial cool bias around 1992 and a growing warm bias since 1998). The adjusted forcings were converted into updated GMST using an impulse-response model (Boucher and Reddy 2008).

The alternative updated forcings provided by Huber and Knutti (2014) became available later in 2014 (17 August) and covered the period 1970–2012. Huber and Knutti (2014) did not update the forcings from WMGHGs or anthropogenic tropospheric aerosols, focusing instead on solar irradiation and stratospheric aerosols only. Huber and Knutti (2014) used two separate estimates to correct for solar irradiation, by the Active Cavity Radiometer Irradiance Monitor and by the Physikalisch-Meteorologisches Observatorium Davos (PMOD). Huber and Knutti (2014) also provided two updated estimates of stratospheric aerosols. One estimate, roughly paralleling that used by Schmidt et al (2014), relied on optical thickness estimates from NASA GISS. The other estimate additionally considered ‘background’ stratospheric aerosols unconnected to volcanic eruptions (Solomon et al 2011). Huber and Knutti (2014) used a climate model of intermediate complexity (Stocker et al 1992) to estimate the effect of the updated forcings on GMST.

In our analyses we report model projections with two sets of adjustments: first, the total adjustments provided by Schmidt et al (2014), referred to as S-adjusted from here on. Second, we report the adjustments provided by Huber and Knutti (2014) using the PMOD estimates of solar irradiation and without consideration of background aerosols (H-adjusted from here on). Those adjustments replace the assumed zero forcings for 2001–2005 for volcanic aerosols in the historical RCPs (Meinshausen et al 2011). The H-adjusted projections almost certainly under-estimate the warm bias in the RCP forcings and thus provide a lower bound of the possible effects of updated forcings.

For both sets we carried forward the final adjustments to subsequent years. We also made the simplifying assumption that both adjustments were available from the beginning of 2014 onward to facilitate annualizing of the updated projections.
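In practice the forcing updates enter as an additive correction to the projected GMST series. A minimal sketch, under stated assumptions (the `adjustment` series is the GMST correction implied by the updated forcings, with the sign convention that it is added to the projection; the function is illustrative, not the published procedure):

```python
import numpy as np

def adjust_projection(years, projection, adj_years, adjustment):
    # Apply the updated-forcing GMST correction to a projected series.
    # np.interp clamps beyond the last adjustment year, which carries the
    # final adjustment forward to subsequent years, as described above.
    return projection + np.interp(years, adj_years, adjustment)
```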

2.3.2. Blending of air and SST
The second revision of model projections involved the recognition that the models' global near-surface air temperature (coded as TAS in the CMIP5 output), which had commonly been compared with observational estimates of GMST, was not strictly commensurate with the observations (Cowtan et al 2015). (See also Santer et al 2000, Knutson et al 2013 and Marotzke and Forster 2015.) GMST is obtained by combining air temperature measurements from land-based stations with SSTs measured in the top few meters of the ocean. A true like-with-like comparison of models to observations would therefore involve a similar blend of modeled land temperatures and modeled SST (coded as TOS).

Cowtan et al (2015) showed that if the HadCRUT4 blending algorithm is replicated on the CMIP5 model outputs, the divergence between model projections and observations is reduced by about a quarter (during 2009–2013). The insight that like-with-like comparison required blending of model output became available half-way through 2015 [10].

[10] It is somewhat unclear whether GISTEMP prior to 2000 should be considered an air or blended temperature dataset due to the use of night-time marine air temperatures to correct the SST data. However, for the ‘pause’ period, buoy observations dominate the SST data and comparisons for this period should use blended data.

For our analyses, we blended land-air (TAS) and sea-surface (TOS) anomalies from the models for comparison to the observations, with air temperature used over sea ice. For comparison to HadCRUT3 or HadCRUT4, the blended anomalies are masked to observational coverage before calculation of hemispheric means, whereas for the remaining observational records the global mean of the spatially complete blended field is used.
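A minimal sketch of such a per-cell blend, assuming land and sea-ice fractions are available for each grid cell (illustrative only; Cowtan et al 2015 describe the exact algorithm that replicates HadCRUT4):

```python
import numpy as np

def blend_tas_tos(tas, tos, land_frac, ice_frac):
    # Use SST (TOS) anomalies over open ocean, and air temperature (TAS)
    # anomalies over land and over sea ice, weighting each cell accordingly.
    ocean = (1.0 - land_frac) * (1.0 - ice_frac)
    return (1.0 - ocean) * tas + ocean * tos
```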

2.4. Historical and hindsight trends
As already noted in connection with figure 1, when the latest available GMST datasets are used (defined here as through the end of 2016), we term this a ‘hindsight’ analysis because the current GMST data benefit from all bias reductions made to date, irrespective of what time period is being plotted or analyzed. To accurately represent the information available to researchers at any earlier point in time, we focus on a historically-conditioned analysis that uses the versions of each of the datasets that were current at the time in question.

We provide the same historical conditioning for the CMIP5 model projections based on the two major revisions just discussed. Because the revisions to the forcings involve two alternative adjustments (S-adjusted versus H-adjusted), we use both in our historically-conditioned analysis. In addition, when models are compared to the HadCRUT datasets, historical conditioning entails a change in the coverage mask between HadCRUT3 and HadCRUT4 to mirror the change in coverage between the two datasets.

2.5. Continuous and broken trends
The trends shown in figure 1 were computed following the common approach in the literature, by computing a trend between a start and end date by estimating a slope and intercept for the regression line. Computation of the trend in this manner introduces a break between contiguous trend lines if the period before (or after) the trend in question is modeled by a separate linear regression (Rahmstorf et al 2017). This is problematic for several reasons: first, for short-term trends, an independent estimate of slope and intercept becomes particularly sensitive to the choice of start and end points. Second, any break at the junction of two contiguous trends calls for a physical explanation. Although temperature trends are often modeled based on statistical considerations alone, the statistical models cannot help but describe a physical process; any break in the long-term trend line therefore tacitly invokes the presence of a physical process that is responsible for this break and intercept shift. No such process has been proposed or explicitly modeled. Third, even ignoring the absence of an underlying physical process, a broken trend cannot be interpreted as just a ‘slowdown’ in warming: a correct interpretation must include the shift in intercept, for example by stating that ‘after a jump in temperatures warming was less than before the jump.’ The interpretations of broken trends in the literature generally fail to mention the intercept shift.

The solution to this problem is to compute short-term trends that are continuous: when partial trends are continuous, they converge at a common point and share that ‘hinge’, even though the slopes of the two partial trends may differ (Rahmstorf et al 2017). In comparing short-term GMST trends against the modeled trends we show results for both broken and continuous trends.
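The distinction can be made precise with a small regression sketch (Python; illustrative inputs). The broken trend re-estimates slope and intercept over the sub-period alone; the continuous trend instead regresses on a ‘hinge’ basis so the two segments are forced to meet at the breakpoint t0:

```python
import numpy as np

def broken_trend(t, y, t0):
    # Independent fit over the sub-period: slope and intercept both free,
    # so the fitted line need not join the earlier trend at t0.
    late = t >= t0
    return np.polyfit(t[late], y[late], 1)[0]

def continuous_trends(t, y, t0):
    # Hinged fit: y ~ 1 + t + max(t - t0, 0). Both segments share the fitted
    # value at t0; the pre-t0 slope is beta[1], the post-t0 slope beta[1]+beta[2].
    X = np.column_stack([np.ones_like(t), t, np.maximum(t - t0, 0.0)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1], beta[1] + beta[2]
```

The extra free intercept in the broken fit is precisely what produces the unexplained jump at the junction that the text criticizes.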

2.6. Selection bias
Most of the articles written on the ‘pause’ fail to offer any justification for the choice of start year. Published start years span the range from 1995 to 2004, with the modal year being 1998 (Risbey et al 2018). This broad range may be indicative of a lack of formal or scientific procedures to establish the onset of the ‘pause.’ Moreover, in each instance the presumed onset of the ‘pause’ was not randomly chosen, but specifically because of the subsequent low trend (Lewandowsky et al 2015). However, therein lies a problem: if a period is chosen (from many possible such time intervals) because of its unusually low trend, this has implications for the interpretation of conventional significance levels (i.e. p-values) of the trend (Rahmstorf et al 2017). Selection of observations based on the same data that is then being statistically tested inflates the actual p-value, thereby giving rise to a larger proportion of statistical Type I errors than the researcher is led to expect (Wagenmakers 2007). Very few articles on the ‘pause’ account for or even mention this effect, yet it has profound implications for the interpretation of the statistical results. Rahmstorf et al (2017) referred to this issue as the ‘multiple testing problem,’ although here we prefer the term ‘selection bias’ because we find it to be more readily accessible. More appropriate techniques exist (Rahmstorf et al 2017) and are used in our statistical testing.

2.7. Statistical testing
Because GMST is not expected to track the multi-model mean, any divergence between models and observations must be evaluated with respect to how unusual it is in light of the expected internal variability of the climate system. We generate those expectations by decomposition of the observed warming into a forced component and internal variability. Observed trends can then be evaluated against the expectations derived from that internal-variability component.

The forced component is a composite of anthropogenic influences such as warming from greenhouse gases and cooling from tropospheric aerosols, and natural components such as volcanic activity and solar irradiation. Internal variability is superimposed on this time-varying forced signal. Observed GMST (T) can thus be expressed as:

T = F_a + F_n + V_{p-n} + E,    (1)

where F_a and F_n represent anthropogenic and natural forcings, respectively, and V_{p-n} represents pure internal variability. The term E is a composite term that refers to all sources of error and bias, such as structural uncertainty in models and observations (Cowtan et al 2018) and uncertainties in the observations (Morice et al 2012).

We estimate:

V_{p-n} + E = T - (F_a + F_n),    (2)

V_n = T - (F_a + F_n),    (3)

where V_n represents the single actual historical realization of the internal variability component of the Earth's climate, including errors and biases that escape quantification but that we implicitly model during our analysis. The CMIP5 multi-model ensemble mean is taken to represent the total forced signal, F_a + F_n (Dai et al 2015, Knight 2009, Mann et al 2014, 2016, 2017, Steinman et al 2015a, 2015b). To equalize the weight given to each model irrespective of how many runs it contributes to the ensemble (table 3), we average runs within each model before averaging across models.
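That two-stage average is compact to express in code. The sketch below assumes a hypothetical `runs_by_model` mapping from model name to an array of shape (n_runs, n_time):

```python
import numpy as np

def multi_model_mean(runs_by_model):
    # Average across the runs of each model first, then across models,
    # so a ten-run model carries no more weight than a single-run model.
    per_model = [np.mean(runs, axis=0) for runs in runs_by_model.values()]
    return np.mean(per_model, axis=0)
```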

We use all data from the period 1880–2016 to compute the residual (V_n) by subtracting the CMIP5 multi-model ensemble mean from the observations. We use this single observed realization to estimate the stationary stochastic time series model that best describes internal variability (and its unknown error component). Specifically, we model V_n computed by equation (3) with a selection of ARMA(p, q) models, where p ∈ {0, 1, 2, 3} and q ∈ {0, 1, 2, 3}, and choose the most appropriate model on the basis of minimum AIC. This model is then used to generate, via Monte Carlo simulations, a synthetic ensemble of realizations (N = 1000) of internal variability that conform to the statistical attributes revealed by the chosen ARMA model. These realizations provide a synthetic reference distribution of residuals for comparison against the observations. To make this comparison commensurate with the reference distribution, the observations are represented by the trend of the residuals between GMST and the CMIP5 multi-model mean. Thus, when reporting the results (figures 9 through 12), all trends refer to the trend in the residuals during the period of interest. This comparison takes into account autocorrelations in the GMST data as the synthetic realizations capture the observed auto-correlational structure of the GMST time series [11].

[11] An alternative approach is to add each synthetic realization to the CMIP5 multi-model ensemble mean and then compare the observed GMST trend of interest (rather than its divergence from the multi-model mean) against the synthetic distribution of trends. However, this comparison introduces a bias when the comparison corrects for the selection bias problem (see section 3.3 below). For that comparison, each GMST trend of interest is compared to each possible trend of the same duration at all possible times (since onset of global warming) in each synthetic realization. This introduces a problem because the forcings (represented by the CMIP5 multi-model ensemble mean) were not constant across the entire period. For example, the eruption of Mt Pinatubo is echoed by a distinct downturn in the model projections. It follows that superimposition of the synthetic noise on the forced signal would render any potential ‘pause’ trend in the observations less unusual for reasons that have nothing to do with the statistical properties of the noise. This problem can be avoided by comparing the observed residuals to the synthetic distribution of residuals. We nonetheless explored the alternative approach and found the results (not reported here) to be largely unchanged.
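The ARMA selection and simulation step can be sketched as follows (Python; the published analysis used R, so statsmodels' ARIMA with d = 0 stands in for the ARMA fit, and `residuals` is assumed to be the annual V_n series from equation (3)):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

def best_arma(residuals, max_p=3, max_q=3):
    # Fit ARMA(p, q) for p, q in {0..3} and keep the minimum-AIC model.
    best = None
    for p in range(max_p + 1):
        for q in range(max_q + 1):
            fit = ARIMA(residuals, order=(p, 0, q)).fit()
            if best is None or fit.aic < best.aic:
                best = fit
    return best

def synthetic_ensemble(fit, n=1000, seed=1):
    # Simulate realizations of internal variability from the fitted model;
    # fit.simulate draws its shocks from numpy's global random state.
    np.random.seed(seed)
    length = int(fit.nobs)
    return np.stack([np.asarray(fit.simulate(nsimulations=length))
                     for _ in range(n)])
```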



We report the result of that comparison as the percentage of synthetic trends with a magnitude smaller than the trend of interest. For a trend to be considered unusual, and hence divergent from model-derived expectations, fewer than 5% of all synthetic trends must be lower than the observed trend of interest.
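Concretely, the reported percentage can be sketched as below (Python; `years`, `obs_resid`, and `ensemble` are assumed inputs, with `ensemble` produced by the Monte Carlo step above):

```python
import numpy as np

def percent_below(obs_resid, ensemble, years, start, end):
    # Share of synthetic residual trends below the observed residual trend
    # over the same window; values under 5% flag a divergence.
    window = (years >= start) & (years <= end)
    obs = np.polyfit(years[window], obs_resid[window], 1)[0]
    syn = np.array([np.polyfit(years[window], r[window], 1)[0]
                    for r in ensemble])
    return 100.0 * np.mean(syn < obs)
```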

3. Results

3.1. Comparing models to observations
Figures 3 and 4 show the latest available GMST data against the model projections. Figure 3 shows the GMST datasets with global coverage and figure 4 shows the HadCRUT4 dataset with limited coverage (and correspondingly masked model output).

In each figure, the different panels show the effects of historical conditioning of the model projections. The top-left panels (a) show the conventional comparison between CMIP5 global air surface temperatures (TAS) and GMST that constituted the most readily available means of comparison until 2015. Panels (d) show a more appropriate, like-with-like comparison between the GMST data and the models, with both model output and observations being blended between land-air (TAS) and sea-surface (TOS) temperatures in an identical manner. This comparison became available in 2015 (Cowtan et al 2015).

Comparison of panels (a) and (d) clarifies that blending of the model output reduces the divergence between models and observations early in the 21st century. The apparent divergence was exaggerated by the long-standing but nonetheless inappropriate use of TAS as the sole basis for comparison. Panels (b), (c), (e) and (f) in the figures additionally show the effects of adjusting the forcings. The adjustments were applied to global TAS output (panels (b) and (c) for S-adjusted and H-adjusted, respectively) as well as blended TAS-TOS output (panels (e) and (f) for S-adjusted and H-adjusted, respectively).

It is clear from these results that when the updated forcings are applied and model output is blended between TAS and TOS in the same way as the observations (panels (e) and (f)), there is no discernible divergence between model projections and GMST. It matters little whether the comprehensive S-adjustments or the overly conservative H-adjustments are applied to the model projections. Notably, the only apparent recent divergence arises with the HadCRUT dataset without adjustment of the forcings (panels (a) and (d) in figure 4).

We explore the results presented in figures 3 and 4 with detailed trend analyses.

3.2. Broken and continuous trends
We first examine the impact of how short-term trends are computed, by comparing broken to continuous trends.

Considering first the broken trends, figures 5 and 6 plot observed and modeled 15 year trends. The figures contrast hindsight (top panels) to historically-conditioned perspectives (bottom panels) on the model projections and observations. The historically-conditioned panels therefore omit datasets that only became available recently (BERKELEY and CW). Figure 5 shows the datasets with global coverage and global model projections, whereas figure 6 shows the HadCRUT dataset with model output masked to the same coverage.

Each trend is computed for the 15 year period ending in the vantage year being plotted. Each panel only includes data after the onset of modern global warming, as determined by a change-point analysis for each dataset (Cahill et al 2015). Thus, the earliest vantage year in each figure is 15 years after the onset of modern global warming in that dataset. In each figure, panels (a) and (b) provide a hindsight view of the model projections and observations, using the updated forcings and TAS-TOS blending throughout. Panels (c) and (d), by contrast, provide a historically-conditioned perspective on the model projections and observations, with the vertical lines indicating the time when revisions to forcings and blending of TAS and TOS became available. Panels (c) use S-adjustments and panels (d) use H-adjustments, respectively.

It is clear from figures 5 and 6 that in hindsight there is no evidence for a divergence between models and observations. The pattern differs for the historically-conditioned analyses, which show some divergence between observations and models early in the 21st century. This divergence is particularly apparent with HadCRUT (figure 6), for the years immediately preceding the switch from HadCRUT3 to HadCRUT4.




Turning to continuous trends, figures 7 and 8 support broadly similar conclusions. In hindsight, there is little evidence for any divergence between models and observations. With historical conditioning, a divergence was observable early in the 21st century and this divergence was particularly pronounced for HadCRUT3.

Figure 3. Hindsight comparison of latest available global GMST datasets with CMIP5 model projections. The solid black line represents the multi-ensemble mean and dotted lines the most extreme model runs. The shaded area encloses 95% of model projections. Both GMST and model projections are anomalies relative to a reference period 1981–2010. The top-left panel (a) presents the conventional comparison using global air surface temperature (TAS) from the models. Panels (b) and (c) also use global TAS but with forcings adjusted as per Schmidt et al (2014) or Huber and Knutti (2014), respectively. Panel (d) uses a blend of TAS and sea surface temperature (TOS) in models. Panels (e) and (f) also use blended temperatures but with forcings adjusted as per Schmidt et al (2014) or Huber and Knutti (2014), respectively.


3.3. Statistical comparison
Our principal statistical analysis follows up on the data just reported (figures 5 through 8) using the Monte Carlo approach outlined earlier. We ask whether at any time during the decade from 2007 to 2016 there was statistical evidence for a divergence between the observed GMST trend since 1998 (the modal start year of the ‘pause’ identified in the literature; Risbey et al 2018) and the model projections. Only historically-conditioned observations and model projections are considered, although for the final year in question (2016) the conditioned data are identical to the hindsight perspective. Because of the historical focus, datasets that were not available until recently are not considered (BERKELEY and CW).

Figure 4. Hindsight comparison of latest available HadCRUT4 dataset with CMIP5 model projections masked to the same coverage as the observations. The solid black line represents the multi-ensemble mean and dotted lines the most extreme model runs. The shaded area encloses 95% of model projections. Both GMST and model projections are anomalies relative to a reference period from 1981 to 2010. The top-left panel (a) presents the conventional comparison using global air surface temperature (TAS) from the models. Panels (b) and (c) also use global TAS but with forcings adjusted as per Schmidt et al (2014) or Huber and Knutti (2014), respectively. Panel (d) uses a blend of TAS and sea surface temperature (TOS) in models. Panels (e) and (f) also use blended temperatures but with forcings adjusted as per Schmidt et al (2014) or Huber and Knutti (2014), respectively.



Figures 9 and 10 summarize the statistical analyses using broken trends for GISTEMP and HadCRUT, respectively. In each figure, the top row of panels show statistical comparisons involving the ‘pause’ period only, whereas those at the bottom involve comparisons of the presumed ‘pause’ period to the entire record of internal variability represented in the reference distribution of synthetic realizations. The bottom panels therefore deal with the selection-bias problem explained in section 2.6, whereas the top panels do not correct for this bias, as is common in the literature.

Within each panel, a matrix of potential ‘pause’ periods is represented. The vantage year (x-axis) is the last year of each potential pause-period during the last decade, and the number of years included (y-axis) defines how far back the pause-interval extends. Trends are extended only as far back as 1998 (all cells on the diagonal involve 1998 as the start year).

For every candidate ‘pause’ defined in the matrix, the divergence of the corresponding observed GMST trend from the CMIP5 multi-model mean was compared against the synthetic realizations of internal variability obtained by Monte Carlo (section 2.7). When comparison involved only the ‘pause’ period (top panels in the figures), the observed candidate ‘pause’ was compared against the synthetic ensemble for that particular duration and time period only. The percentage of synthetic trends lower than the observed trend is reported in the corresponding cell in the matrix. Values below 5% are additionally identified by a yellow circle as they are deemed to represent a significant divergence between modeled and observed temperatures beyond that expected on the basis of internal variability alone. If none of the synthetic trends are smaller than the observed trend, the percentage will be zero, indicating that models are warming significantly faster than the observations. The comparison is single-tailed, so cells can take on large values if the observed trend is sufficiently positive.

Figure 5. Comparison of 15 year broken trends between model projections and observations. Each trend is plotted with the end year as vantage year and is shown in K/decade. The solid black line represents the multi-ensemble model mean and dashed lines the most extreme model runs. The shaded area encloses 95% of model projections. Panels (a) and (b) provide a hindsight perspective, using the latest available datasets with global coverage and TAS-TOS blended model output with updated forcings. Panel (a) uses corrections to forcings provided by Schmidt et al (2014) and panel (b) uses corrections provided by Huber and Knutti (2014). Panels (c) and (d) provide a historically-conditioned perspective on data and models, and therefore omit datasets not available throughout. Vertical lines indicate major revisions to CMIP5 model output or interpretation (forcings adjustment and blending). Panel (c) uses corrections to forcings provided by Schmidt et al (2014) from 2014 onward and panel (d) uses corrections provided by Huber and Knutti (2014) also from 2014 onward.



When the selection-bias problem was accounted for (bottom panels in the figures), the observed candidate ‘pause’ was compared against each possible trend of the same duration at all possible times in each synthetic realization since the onset of modern global warming. (The onset year was determined separately for each dataset based on the analysis reported by Cahill et al 2015.) The cell entries record the percentage of synthetic realizations in which at least one such trend fell below the observed candidate ‘pause’ trend. This percentage can be interpreted as ‘how unusual is the observed trend in light of what would be expected to arise due to internal variability alone at some point in time during global warming’.
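A sketch of this corrected test (Python; an annual integer `years` grid and the synthetic `ensemble` are assumed inputs): for each realization, take the minimum trend over every window of the candidate duration since the onset year, and count how often that minimum undercuts the observed trend.

```python
import numpy as np

def percent_with_lower_window(obs_trend, ensemble, years, duration, onset):
    # Fraction of realizations containing at least one window of the same
    # duration, anywhere since onset, with a trend below the observed one.
    hits = 0
    for realization in ensemble:
        trends = []
        for start in range(onset, int(years.max()) - duration + 2):
            window = (years >= start) & (years < start + duration)
            trends.append(np.polyfit(years[window],
                                     realization[window], 1)[0])
        hits += min(trends) < obs_trend
    return 100.0 * hits / len(ensemble)
```

Scanning all windows rather than only the pre-selected one is what converts the naive test of section 2.7 into one that respects the cherry-picked start year.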

Figure 6. Comparison of 15 year broken trends between model projections and observations. Each trend is plotted with the end year as vantage year and is shown in K/decade. The solid black line represents the multi-ensemble model mean and dashed lines the most extreme model runs. The shaded area encloses 95% of model projections. Panels (a) and (b) provide a hindsight perspective, using the latest available version of HadCRUT4 and TAS-TOS blended model output with updated forcings masked to the same coverage. Panel (a) uses corrections to forcings provided by Schmidt et al (2014) and panel (b) uses corrections provided by Huber and Knutti (2014). Panels (c) and (d) provide a historically-conditioned perspective on data and models. Vertical lines indicate major revisions to CMIP5 model output or interpretation (forcings adjustment and blending). Panel (c) uses corrections to forcings provided by Schmidt et al (2014) from 2014 onward and panel (d) uses corrections provided by Huber and Knutti (2014) also from 2014 onward. Note that the transition from HadCRUT3 to HadCRUT4 is accompanied by a change in the model mask as coverage between the two versions differs.


Interpretation of the results in figures 9 and 10 is straightforward. First, irrespective of which dataset is being considered or which adjustment to model projections is applied, when the selection-bias problem is accounted for, there has been no evidence at any time between 2007 and 2016 for the hypothesis that observations lagged significantly behind model-derived expectations (bottom panels of figures 9 and 10). This conclusion holds for any trend commencing in 1998 or later with a minimum duration of at least 10 years. (It is not meaningful to consider shorter trends. This is reflected in the literature on the ‘pause’, which consensually focuses on trends 10 years or longer; see figure 1 in Risbey et al 2018.)

Second, when the selection-bias problem is ignored (top panels of figures 9 and 10), there was apparent evidence of a statistically significant divergence between models and observations from around 2011–2013. That is, during those three years in history, researchers would have had access to statistical evidence for an apparent divergence.

Figures 11 and 12 provide another perspective on the same analysis using continuous trends. We noted in section 2.5 that many investigators had used broken trends in their analyses. However, as shown by Rahmstorf et al (2017), in the absence of independent evidence of a change in the forcing functions or other identifiable change in conditions of the system, inferring a change in the rate of warming on the basis of broken trends is unwarranted, and may produce misleading results. In this instance, the figures show that the conclusions are largely unchanged with continuous trends.

Figure 7. Comparison of 15 year continuous trends between model projections and observations. Each trend is plotted with the end year as vantage year and is shown in K/decade. The solid black line represents the multi-ensemble model mean and dashed lines the most extreme model runs. The shaded area encloses 95% of model projections. Panels (a) and (b) provide a hindsight perspective, using the latest available datasets with global coverage and TAS-TOS blended model output with updated forcings. Panel (a) uses corrections to forcings provided by Schmidt et al (2014) and panel (b) uses corrections provided by Huber and Knutti (2014). Panels (c) and (d) provide a historically-conditioned perspective on data and models, and therefore omit datasets not available throughout. Vertical lines indicate major revisions to CMIP5 model output or interpretation (forcings adjustment and blending). Panel (c) uses corrections to forcings provided by Schmidt et al (2014) from 2014 onward and panel (d) uses corrections provided by Huber and Knutti (2014) also from 2014 onward.


4. Implications

We asked whether there was a meaningful divergence between climate-model projections and GMST during the 21st century. We explored a multi-dimensional statistical and conceptual space that simultaneously considered (a) the historical evolution of GMST datasets, (b) historical revisions to the CMIP5 projections and their interpretation, (c) different ways of computing trends, and (d) different ways in which to test hypotheses about the divergence between models and observations. The results of our exploration converge on two conclusions.

First, there is no evidence, using currently available observations and model projections, for a significant divergence between models and observations during the last 20 years. This conclusion generalizes across datasets (GISTEMP and HadCRUT) and it does not depend on any other choices during data analysis.

Second, when models and observations are historically conditioned, the strength of apparent evidence for a divergence between models and observations crucially depends on the statistical comparison being employed. When the statistical tests take into account the fact that the period under consideration was chosen for examination based on its apparent low trend, thereby accounting for the selection-bias problem (section 2.6), no evidence for a divergence between models and observations existed at any time during the last decade. This conclusion holds irrespective of how trends are computed (broken versus continuous; section 2.5). When the selection-bias problem is ignored, by contrast, apparent evidence for a divergence between models and observations existed between 2011 and 2013, irrespective of which dataset (GISTEMP versus HadCRUT) is considered and how trends are computed (broken versus continuous).

Figure 8. Comparison of 15 year continuous trends between model projections and observations. Each trend is plotted with the end year as vantage year and is shown in K/decade. The solid black line represents the multi-ensemble model mean and dashed lines the most extreme model runs. The shaded area encloses 95% of model projections. Panels (a) and (b) provide a hindsight perspective, using the latest available version of HadCRUT4 and TAS-TOS blended model output with updated forcings masked to the same coverage. Panel (a) uses corrections to forcings provided by Schmidt et al (2014) and panel (b) uses corrections provided by Huber and Knutti (2014). Panels (c) and (d) provide a historically-conditioned perspective on data and models. Vertical lines indicate major revisions to CMIP5 model output or interpretation (forcings adjustment and blending). Panel (c) uses corrections to forcings provided by Schmidt et al (2014) from 2014 onward and panel (d) uses corrections provided by Huber and Knutti (2014), also from 2014 onward. Note that the transition from HadCRUT3 to HadCRUT4 is accompanied by a change in the model mask, as coverage between the two versions differs.



Figure 13 summarizes the Monte Carlo analysis in a decision tree that outlines the major options for analysis. The tree captures the fact that researchers must make several choices about the analysis. They must decide whether or not to correct for the selection-bias issue (the top decision node in the figure). They must decide how to model the pause interval (as broken or continuous trends; second level of decision nodes). They must choose which dataset to use (HadCRUT or GISTEMP; third level). The tree pinpoints the conditions under which, and when, apparent evidence for a divergence existed. The state of the evidence is represented by the leaves of the tree (small circles) at the bottom of the figure. Green leaves denote absence of evidence, defined as more than half of all possible trend durations in that vantage year exceeding the bottom 10% of synthetic realizations. Any leaf that is partially or wholly orange or red signals the appearance of some degree of evidence for a divergence between observations and model projections. The evidence is considered poor (orange leaves) if half or more of all possible trend durations in that vantage year fall below the bottom 10% of synthetic realizations. The evidence is considered fair (red leaves) if, in addition, at least one trend in that vantage year falls below the bottom 5% of synthetic realizations. It is clear that any such evidence was limited to the time period 2011–2013 and only emerged when the selection-bias problem was ignored.
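The coloring rule can be stated compactly. The helper below (Python; a hypothetical function of our own that simply encodes the criteria just described) classifies a leaf from the ensemble percentiles of all trend durations ending in a given vantage year:

    import numpy as np

    def classify_leaf(percentiles):
        # `percentiles`: for each trend duration ending in the vantage year,
        # the percentage of synthetic realizations below the observed trend.
        p = np.asarray(percentiles, dtype=float)
        if np.mean(p < 10) < 0.5:
            return 'green'   # absence of evidence
        if np.any(p < 5):
            return 'red'     # fair evidence
        return 'orange'      # poor evidence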

Figure 9. Monte Carlo comparison of broken GMST trends (GISTEMP) to a reference distribution of synthetic realizations of internal variability. Cell entries refer to the percentage of synthetic trends lower than that observed. See text for how trends are computed. Values below 5% are circled in yellow to indicate statistical significance. Vantage years refer to the end point of the trends being examined, and both models and observations are historically conditioned for that time.



The pattern in the figure is reflected in the corpus of 50 articles on the divergence between models and observations: 31 of those articles (62%) considered a period that ended in one of the three years (2011, 2012, and 2013) during which the evidence for a divergence from models appeared strongest. A further eight considered a period ending in 2014.

The delineation of the apparent evidence in figure 13 gives rise to two important questions: first, can the choices that give rise to the apparent divergence be justified? Second, why was the apparent evidence limited to the years 2011–2013, and would that evidence have been detectable if observations and models had already been debiased at that time?

4.1. Data analytic choices

The impression that observations diverged from model projections arose only when analysts ignored the selection-bias issue. Figures 9 through 12 underscore the generality of these results: in all figures, the bottom panels (selection bias considered) showed no evidence for any divergence, whereas the top panels (selection bias ignored) gave a different impression, with varying degrees of apparent divergence.

The problem that arises from the selection-bias issue (namely an inflation of the Type I error rate) was discussed and accounted for by Rahmstorf et al (2017), although they left the magnitude of the problem unspecified. We quantified the problem using a Monte Carlo approach derived from the analysis method just reported.

Figure 10. Monte Carlo comparison of broken GMST trends (HadCRUT) to a reference distribution of synthetic realizations of internal variability. Cell entries refer to the percentage of synthetic trends lower than that observed. See text for how trends are computed. Values below 5% are circled in yellow to indicate statistical significance. Vantage years refer to the end point of the trends being examined, and both models and observations are historically conditioned for that time.


We generated a new set of 1000 synthetic realizations as described earlier. The CMIP5 multi-model mean was based on TAS/TOS blended global anomalies using S-adjusted forcings. The observations were from GISTEMP. The ensemble of 1000 realizations was then used in a Monte Carlo experiment involving 100 replications. On each replication, a single realization was sampled from the ensemble at random, which was taken to constitute the ‘observations’ for that replication. From that critical realization, a single 15 year trend (either broken or continuous) was chosen for statistical comparison with the remaining realizations in the ensemble. The trend was chosen in one of several ways: (a) a trend was picked at random by choosing any possible starting date between the onset of global warming (1970 for GISTEMP) and 2002 with equal probability; (b) the lowest 15 year trend observed since the onset of global warming (1970) in the critical realization was selected; (c) the second-lowest trend was selected from the critical realization; (d) the trend at the 10th percentile of all possible trends was chosen.

Each chosen trend was then compared against 15 year trends with identical start and end dates across the remaining realizations in the ensemble. This comparison is exactly analogous to the variant of our main analysis that ignored the selection-bias issue.
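The following sketch (Python/NumPy; the function names and the realizations-by-years ensemble layout are our own illustrative assumptions, not the production code) implements one replication of this experiment for broken trends:

    import numpy as np

    rng = np.random.default_rng(0)

    def trend(series, start, length=15):
        # OLS slope over a window of `length` years beginning at index `start`.
        x = np.arange(length)
        return np.polyfit(x, series[start:start + length], 1)[0]

    def one_replication(ensemble, rule='lowest', length=15):
        # One replication: draw a pseudo-observation run, select a 15 year
        # trend by `rule`, and return the fraction of the remaining runs whose
        # trend over the same window is below the chosen trend.
        n_runs, n_years = ensemble.shape
        i = rng.integers(n_runs)
        obs = ensemble[i]
        rest = np.delete(ensemble, i, axis=0)
        starts = np.arange(n_years - length + 1)
        slopes = np.array([trend(obs, s, length) for s in starts])
        order = np.argsort(slopes)
        if rule == 'random':
            s = rng.choice(starts)
        elif rule == 'lowest':
            s = order[0]
        elif rule == 'second_lowest':
            s = order[1]
        else:                       # 'first_decile'
            s = order[int(0.1 * len(starts))]
        chosen = slopes[s]
        comparison = np.array([trend(r, s, length) for r in rest])
        return np.mean(comparison < chosen)

    # Estimated Type I error at the nominal 5% level, e.g. for the lowest trend:
    # np.mean([one_replication(ens, rule='lowest') < 0.05 for _ in range(100)])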

Because all realizations, including the one chosen as the ‘observations’ for a given replication, share an identical random structure, the null hypothesis that temperatures are driven by internal variability alone is known to be true. A single randomly-chosen trend would therefore be expected to fall in the middle of that comparison distribution, with approximately half of all comparison trends falling above and below the chosen trend, respectively. Only occasionally should the randomly-chosen trend be in the extremes of the distribution. Specifically, by chance alone, only 5% of the time should the randomly-chosen trend fall below the 5th percentile of the comparison distribution (in which case the trend would be falsely identified as ‘significantly lower than expected’). Likewise, no more than 10% of the time should the randomly-chosen trend fall below the 10th percentile of the comparison distribution, and so on.

Figure 11. Monte Carlo comparison of continuous GMST trends (GISTEMP) to a reference distribution of synthetic realizations of internal variability. Cell entries refer to the percentage of synthetic trends lower than that observed. See text for how trends are computed. Values below 5% are circled in yellow to indicate statistical significance. Vantage years refer to the end point of the trends being examined, and both models and observations are historically conditioned for that time.



Figure 14 shows the results by plotting the proportion of times (out of 100 replications) that the comparison trend fell below the indicated percentile of the distribution of trends in the synthetic ensemble. Panel (a) shows the results for broken trends, and panel (b) for continuous trends.

For both types of trend, the randomly-chosen trend closely tracks the diagonal, thereby mirroring the distribution expected under the null hypothesis. That is, in about half of the replications the trend was near the median of the ensemble realizations, in about a quarter of replications the trend fell around the first quartile of the ensemble, and so on. Assuming a significance threshold of 0.05, the observed Type I error rate for the randomly-chosen trend is thus around 5%, as expected.

A very different pattern is observed for trends that were chosen on the basis of their low magnitude in the critical realization. For example, when the lowest trend in the critical realization was chosen and then compared to the remaining synthetic realizations, in nearly half of the replications this trend was lower than the 5th percentile of the comparison distribution; put another way, the Type I error rate was vastly inflated beyond the nominal 5%. The problem is attenuated for trends that are less extreme (i.e. the second-lowest trend or a trend at the 10th percentile of all possible trends in the critical realization), but in all cases and for both types of trend the magnitude-based selection inflates the Type I error rate, again as expected.

Figure 12. Monte Carlo comparison of continuous GMST trends (HadCRUT) to a reference distribution of synthetic realizations of internal variability. Cell entries refer to the percentage of synthetic trends lower than that observed. See text for how trends are computed. Values below 5% are circled in yellow to indicate statistical significance. Vantage years refer to the end point of the trends being examined, and both models and observations are historically conditioned for that time.



The figure illustrates the essence of the selection-bias problem: whenever a trend is chosen because its magnitude is particularly low, subsequent statistical tests that seemingly confirm the unusual nature of the trend yield an inflated number of false positives. In our corpus of articles, 79% of all reported ‘pause’ trends were below the first decile in the distribution of all possible trends of equal duration in the dataset, and 54% of reported trends were the lowest observed since the onset of global warming (mean percentile 0.073). It follows that the roughly half of all ‘pause’ trends considered in the literature that were the lowest on record would have been classified as deviating significantly from model projections with a probability of around 50%, even if GMST had evolved exactly as expected on the basis of the forcings with superimposed natural variability.

The common practice in the ‘pause’ literature of ignoring the selection-bias issue thus inadvertently facilitated erroneous conclusions about the putative divergence between models and observations.

4.2. Debiasing of observations and models

The hindsight analysis (represented by the bottom row for 2016 in figure 13 and the rightmost columns in figures 9 through 12) differs considerably from the historical-conditioning results. This difference arises from two factors, namely the incremental reduction of a cool bias in the observations during the last 10 years (figure 1) and the parallel reduction of a warm bias in the CMIP5 model projections (e.g. figure 5, panels (c) and (d)).

We ask three questions about the debiasing: would there have been any appearance of a divergence between models and observations if the debiasing had already been available in 2011–2013? How robust are the choices that were made during debiasing of observations and models? Were those biases known (or at least knowable) at the time when articles reported a divergence between models and observations?

4.2.1. Debiasing: the historical counterfactual

Figure 1 illustrated the effects of gradual debiasing on the observed GMST trends since 1998. The figure also contained counterfactual information, represented by the thin lines which indicate what the trend would have been at an earlier time, had the later debiasing been available then. We can apply the same counterfactual analysis to the debiasing of the models, namely by blending TAS/TOS throughout rather than just after its implications became widely known in 2015 (Cowtan et al 2015). Likewise, the adjustment to forcings that became available in 2014 (Huber and Knutti 2014, Schmidt et al 2014) can be applied to the model output before that time.

Figure 13. Tree representation of the results of the present analysis. The analysis either considers or ignores the selection-bias problem; the trends are estimated either as broken or continuous; and the GMST data can come from HadCRUT (H) or GISTEMP (G). The leaves (circles) at the bottom represent the results of a historically-conditioned analysis of all trends since 1998 at each year indicated (figures 9 through 12). Leaves are colored to reflect the level of apparent evidence for a divergence between models and observations, with green denoting the absence of evidence, orange denoting weak evidence, and red denoting fair evidence (see text for explanation). Half circles denote the appearance of evidence with only one of the adjustments to forcings (S-adjustments versus H-adjustments).



These counterfactual data were presented in the top panels (a) and (b) of the earlier figures 5 through 8. It is clear from those figures that, had the debiasing been available earlier, no discernible divergence between models and observations would have been detected. By implication, it is unlikely that there would have been a literature on a putative ‘pause’ or an alleged divergence between models and observations if observations and model projections had been debiased a decade earlier.

Given the notable role of the debiasing, we must examine whether those adjustments to observations and models were robust and sufficiently justified.

4.2.2. Robust debiasing

Two major sources of bias have been identified and corrected in the observational datasets: limited data coverage (Rohde et al 2013, Cowtan and Way 2014) and biases in the SST data (Karl et al 2015, Hausfather et al 2017). Both of those corrections have been shown to be necessary and robust.

There are multiple lines of independent evidence that confirm the bias that arises from limited data coverage, in particular in the HadCRUT dataset, which omits a significant number of grid cells in the high Arctic, particularly over the Arctic Ocean. The bias is shown in figure 1 as the difference between datasets with global coverage (e.g. GISTEMP, CW, BERKELEY) and the HadCRUT dataset. The CW dataset (Cowtan and Way 2014) is based on HadCRUT but extends coverage to the Arctic (and other regions omitted in HadCRUT) by interpolation. The robustness of that interpolation has been established by extensive cross-validation (Cowtan and Way 2014). The estimates provided in CW for the Arctic also agree with reanalyses, such as the ERA-Interim reanalysis (Simmons and Poli 2015, Simmons et al 2017) and the JRA-55 reanalysis (Simmons et al 2017). The BERKELEY dataset also achieves global coverage by interpolation but uses a different approach from CW and relies on data that are collected and analyzed independently from HadCRUT (Cowtan et al 2015). Notwithstanding, BERKELEY closely tracks CW in figure 1. The fact that multiple approaches to interpolation converge on the same bias correction supports their robustness.

Similarly, there are multiple lines of evidence that show earlier versions of SST to have suffered from a cool bias. The cool bias in recent SST records arises from the increasing prevalence of drifting-buoy observations. The bias was first reported by Smith et al (2008) and was initially addressed by Kennedy et al (2011). Subsequent work has identified a further bias in the ship data, which, when addressed, further increases trends over the pause period (Huang et al 2015, Hausfather et al 2017).

Turning to biases in the model output, the conventional use of TAS (surface air temperature) for comparisons with observations was less appropriate than the blending of TAS and TOS (modeled SST) (Cowtan et al 2015). Given that all observational datasets blend surface air temperature measurements over land with SST measurements, the blended data permit a more like-with-like comparison than TAS alone.

Figure 14. Monte Carlo illustration of the consequences of ignoring the selection-bias issue. A 15 year trend is chosen from a random realization of internal variability (1970–2016) and is then compared to the remaining realizations in the ensemble at the same point in time. The trend can be chosen randomly, that is, by randomly selecting a start year; or it can be the lowest or second-lowest in that realization; or it can be at the first decile (10th percentile) of all possible trends in that realization. A randomly chosen trend falls into the expected position in the distribution of comparison realizations (e.g. 5% of the time it sits below the 5th percentile, and so on). Cherry-picked trends lead to inflated Type I errors. Panel (a): broken trends. Panel (b): continuous trends. See text for details.


There are, however, alternative ways in which the blending can be implemented in the model output. Here, we blended land-air and sea-surface anomalies, with air temperatures used over sea ice. An alternative approach involves blending of absolute temperatures, which reduces the difference to unblended (TAS only) temperatures but renders the comparison to observations more problematic, because the observations blend anomalies rather than absolute temperatures (Cowtan et al 2015). The present choice thus maximizes comparability of model output and observations.
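Schematically, the blending amounts to the following (a minimal sketch in Python/NumPy; the grid layout and the land- and ice-fraction inputs are illustrative assumptions, not the actual CMIP5 processing chain), with TAS anomalies used over land and sea ice and TOS anomalies over open ocean before area-weighted averaging:

    import numpy as np

    def blended_gmst(tas, tos, land_frac, ice_frac, lat):
        # tas, tos: (lat, lon) anomaly fields; land_frac, ice_frac: per-cell
        # fractions of land and sea ice. Assumes complete (non-NaN) model fields.
        ocean = (1 - land_frac) * (1 - ice_frac)   # open-ocean fraction per cell
        field = ocean * tos + (1 - ocean) * tas    # TAS over land and sea ice
        w = np.cos(np.deg2rad(lat))[:, None] * np.ones_like(field)
        return np.sum(field * w) / np.sum(w)       # area-weighted global mean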

The need for adjustments to the forcings presumed in CMIP5 experiments is also well understood and supported by evidence, for example pertaining to background stratospheric volcanic aerosols that were under-estimated in the RCPs (Solomon et al 2011). It has also been shown that the most recent solar cycle, with a minimum in 2009, was substantially lower and more prolonged than expected from a typical cycle (Fröhlich 2012). Accordingly, both sets of available corrections to forcings (Huber and Knutti 2014, Schmidt et al 2014) largely agree on the need to update solar irradiation and stratospheric aerosols.

However, there is less agreement about the effect of anthropogenic tropospheric aerosols, and only one of the corrections includes this factor (Schmidt et al 2014). Similarly, there are multiple ways in which the updated forcings can be converted into temperature adjustments. Absent the ability to re-run CMIP experiments, this requires an emulator or a model of intermediate complexity. Schmidt et al (2014) used the former whereas Huber and Knutti (2014) used the latter.

We respond to those ambiguities by bracketing the available corrections. We use the most comprehensive set (S-adjustment; Schmidt et al 2014) as well as the most conservative set, which considers the effects of solar irradiation alone (H-adjustment; Huber and Knutti 2014). The fact that the differences between S-adjustment and H-adjustment are generally slight attests to the robustness of our results with respect to the variety of updated forcings.

4.2.3. Unknown unknowns and known unknown biases

The historical period of greatest interest is 2011–2013. During that time, scientists who considered the data from the preceding 10–15 years could detect apparent evidence for a divergence between models and observations (conditional on the statistical issues reviewed earlier; see figure 13). It is important to ascertain which of the biases (section 4.2) were known, or at least knowable, at that time.

The importance of blending of TAS and TOS was largely unanticipated until the issue was identified by Cowtan et al (2015). Before then, with notable exceptions (Knutson et al 2013, Mann et al 2014, Marotzke and Forster 2015, Steinman et al 2015b), most studies used the global surface air temperature from models rather than blended land-ocean temperatures. Throughout, most climate scientists probably did not realize that the comparison of unblended model output to blended observations substantially contributed to the observed divergence between models and observations. From those scientists' perspective, the blending problem may have constituted a classic ‘unknown unknown’ until its implications were identified and quantified by Cowtan et al (2015). However, given that scientists were, in fact, comparing different quantities (blended and unblended data), they might have anticipated that this distorted comparison would not be inconsequential.

In contrast to the blending issue, the existence of the remaining major biases had been widely recognized for some time, even though their exact magnitude remained elusive. Perhaps the most striking example involves the lack of Arctic coverage, given that it has long been known that climate change is amplified in the Arctic (Manabe and Wetherald 1975). There are several reasons for this Arctic amplification, all rooted in well-understood physics, such as latitudinal differences in convection (Pithan and Mauritsen 2014) or increased water vapor in the atmosphere (Serreze and Barry 2011). Accordingly, the potentially significant effect of a lack of Arctic coverage on GMST trends was revealed on the RealClimate blog as early as 2008 (Benestad 2008) and was reported in the literature a short time later (Simmons et al 2010). The bias in HadCRUT was therefore understood before the period of interest, although its magnitude escaped precise measurement until the advent of sophisticated interpolation methods (Cowtan and Way 2014, Rohde et al 2012, 2013).

Similarly, the bias in the SST observations arising from the increase in buoy-based data was also known before scientists became interested in the divergence between models and observations (Smith et al 2008). The exact magnitude of the bias, however, became apparent only later (Karl et al 2015).

It is less clear when the divergence between the forcings presumed in the RCPs underlying the CMIP5 experiments and those actually observed first became apparent. The paper by Solomon et al (2011) established an additional cooling effect from stratospheric aerosols that was not captured by CMIP5 experiments. Fyfe et al (2013) confirmed the implications of the updated aerosol forcing in a comprehensive Earth System Model. The broader adjustments to forcings used here became available in 2014 (Huber and Knutti 2014, Schmidt et al 2014). It follows that during the period of greatest interest (2011–2013) only limited knowledge about the need to update forcings was available.

In summary, researchers have had access to information about biases in GMST observations for nearly 10 years. In particular, the Arctic was known to warm more rapidly than the rest of the planet, thereby rendering it nearly certain that any dataset with limited coverage of the Arctic would underestimate global warming. However, the magnitudes of those biases became clearer only recently. Nonetheless, it is notable that neither the coverage bias nor the bias in SST was acknowledged in the IPCC AR5 (Flato et al 2013). Other biases in the forcings were more uncertain, and their existence and magnitude were pinned down only recently. The implications of the blending of TAS and TOS were largely unrecognized until the issue was reported in 2015.



5. Concluding commentary

We have established that several biases in the observations and in model projections gave rise to the impression of a divergence between modeled and observed temperature trends. This impression was limited to the period 2011–2013, after which the ongoing debiasing eliminated any appearance of a divergence. During the period 2011–2013, the impression of a divergence could appear to be statistically significant, but only if the selection-bias issue was ignored. We have shown that ignoring the selection-bias issue can drastically inflate Type I error rates, which renders the inferences unreliable and, in this case, erroneous.

Some of the biases affecting datasets and model projections were either known or at least knowable at the time. It is thus reasonable to ask what factors led some scientists to the view that climate warming lagged behind modeled warming trends.

One contributory factor appears to be a failure of communication between surface temperature data providers and surface temperature data users, in this case the climate modeling community. The communication of the limitations of observational datasets from providers to users is often problematic, since it generally involves a deeper understanding of the data than is required for their use. Data users have to be experts in their own problem domains, and will not usually be able to become experts in the surface temperature data and associated literature as well. To illustrate, numerous articles in the corpus omitted mentioning whether the model output was masked to the coverage of HadCRUT. We are unsure whether this means masking was not performed or was taken for granted and hence not mentioned in the Methods; this presents a clear case of room for improvement in reporting. Some temperature data providers have attempted to address these issues by providing ensembles of temperature reconstructions; however, these are often not used, and typically do not address known but uncorrected biases in the data.
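Masking itself is straightforward to report and to implement. A minimal sketch (Python/NumPy; the array names are hypothetical) that restricts a model anomaly field to the coverage of an observational field before any averaging:

    import numpy as np

    def mask_to_coverage(model_field, obs_field):
        # Set model grid cells to NaN wherever the observational field
        # (e.g. HadCRUT) has no value, so that model and observations are
        # subsequently averaged over exactly the same grid cells.
        return np.where(np.isnan(obs_field), np.nan, model_field)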

The delays in updating surface temperature datasets are substantial: it may take some time for a bias to be identified, after which a solution must be found and a paper written and published before most providers will update their datasets. To wit, the ship-buoy bias was noted in 2008 (Smith et al 2008), but only partially corrected in HadCRUT4 in 2012 and fully corrected in GISTEMP in 2015. While this slow and thorough process is good for transparency, it leaves users who have not been following the literature or conference talks in the position where they may download data with knowable biases and unwittingly draw incorrect conclusions from those data. Experience shows that referees frequently do not identify these problems. And even if the problem is identified, it is far from clear how researchers can deal with a known bias of unknown magnitude.

A final but separate question is why scientists put such emphasis on the ‘pause’ when the evidence for its existence has been scant. We have argued elsewhere that constant public and political pressure by climate contrarians may have caused scientists to take positions that they would not have taken in the absence of such opposition (Lewandowsky et al 2015). The present analysis sets the stage for an exploration of the mechanisms by which this may have occurred, which will be the focus of future contributions.

ORCID iDs

Stephan Lewandowsky https://orcid.org/0000-0003-1655-2013
James S Risbey https://orcid.org/0000-0003-3202-9142
Stefan Rahmstorf https://orcid.org/0000-0001-6786-7723

References

Allan R P, Liu C, Loeb N G, Palmer M D, Roberts M, Smith D and Vidale P-L 2014 Changes in global net radiative imbalance 1985–2012 Geophys. Res. Lett. 41 4398–405

Benestad R 2008 Mind the gap! (http://realclimate.org/index.php/archives/2008/11/mind-the-gap)

Boucher O and Reddy M 2008 Climate trade-off between black carbon and carbon dioxide emissions Energy Policy 36 193–200

Brohan P, Kennedy J J, Harris I, Tett S F B and Jones P D 2006 Uncertainty estimates in regional and global observed temperature changes: a new data set from 1850 J. Geophys. Res.: Atmos. 111 D12106

Brown P T, Li W, Cordero E C and Mauget S A 2015 Comparing the model-simulated global warming signal to observations using empirical estimates of unforced noise Sci. Rep. 5 9957

Cahill N, Rahmstorf S and Parnell A C 2015 Change points of global temperature Environ. Res. Lett. 10 084002

Chikamoto Y, Mochizuki T, Timmermann A, Kimoto M and Watanabe M 2016 Potential tropical Atlantic impacts on Pacific decadal climate trends Geophys. Res. Lett. 43 7143–51

Climate Research Committee, National Research Council 1995 Natural Climate Variability on Decade-to-Century Time Scales (Washington, DC: National Academies Press)

Cowtan K, Hausfather Z, Hawkins E, Jacobs P, Mann M E, Miller S K, Steinman B A, Stolpe M B and Way R G 2015 Robust comparison of climate models with observations using blended land air and ocean sea surface temperatures Geophys. Res. Lett. 42 6526–34


Cowtan K, Jacobs P, Thorne P and Wilkinson R 2018 Statistical analysis of coverage error in simple global temperature estimators Dyn. Stat. Clim. Syst. 3 dzy003

Cowtan K and Way R G 2014 Coverage bias in the HadCRUT4 temperature series and its impact on recent temperature trends Q. J. R. Meteorol. Soc. 140 1935–44

Dai A, Fyfe J C, Xie S-P and Dai X 2015 Decadal modulation of global surface temperature by internal climate variability Nat. Clim. Change 5 555–9

Delworth T L, Zeng F, Rosati A, Vecchi G A and Wittenberg A T 2015 A link between the hiatus in global warming and North American drought J. Clim. 28 3834–45

Easterling D R and Wehner M F 2009 Is the climate warming or cooling? Geophys. Res. Lett. 36 L08706

England M H, McGregor S, Spence P, Meehl G A, Timmermann A, Cai W, Sen Gupta A, McPhaden M J, Purich A and Santoso A 2014 Recent intensification of wind-driven circulation in the Pacific and the ongoing warming hiatus Nat. Clim. Change 4 222–7

England M H, Kajtar J B and Maher N 2015 Robust warming projections despite the recent hiatus Nat. Clim. Change 5 394–6

Flato G et al 2013 Evaluation of climate models Climate Change 2013: The Physical Science Basis. Contribution of Working Group I to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change ed T Stocker (Cambridge: Cambridge University Press) pp 741–866

Fröhlich C 2012 Total solar irradiance observations Surv. Geophys. 33 453–73

Fyfe J C et al 2016 Making sense of the early-2000s warming slowdown Nat. Clim. Change 6 224–8

Fyfe J C, Gillett N P and Zwiers F W 2013 Overestimated global warming over the past 20 years Nat. Clim. Change 3 767–9

Fyfe J, Salzen K, Cole J, Gillett N and Vernier J-P 2013 Surface response to stratospheric aerosol changes in a coupled atmosphere-ocean model Geophys. Res. Lett. 40 584–8

Gettelman A, Shindell D T and Lamarque J-F 2015 Impact of aerosol radiative effects on 2000–2010 surface temperatures Clim. Dyn. 45 2165–79

Gu G, Adler R F and Huffman G J 2016 Long-term changes/trends in surface temperature and precipitation during the satellite era (1979–2012) Clim. Dyn. 46 1091–105

Hansen J E, Ruedy R, Sato M and Lo K 2010 Global surface temperature change Rev. Geophys. 48 RG4004

Hausfather Z, Cowtan K, Clarke D C, Jacobs P, Richardson M and Rohde R 2017 Assessing recent warming using instrumentally homogeneous sea surface temperature records Sci. Adv. 3 e1601207

Haywood J M, Jones A and Jones G S 2014 The impact of volcanic eruptions in the period 2000–2013 on global mean temperature trends evaluated in the HadGEM2-ES climate model Atmos. Sci. Lett. 15 92–6

Huang B, Banzon V F, Freeman E, Lawrimore J, Liu W, Peterson T C, Smith T M, Thorne P W, Woodruff S D and Zhang H-M 2015 Extended reconstructed sea surface temperature version 4 (ERSST.v4): I. Upgrades and intercomparisons J. Clim. 28 911–30

Huber M and Knutti R 2014 Natural variability, radiative forcing and climate response in the recent hiatus reconciled Nat. Geosci. 7 651–6

Hunt B G 2011 The role of natural climatic variation in perturbing the observed global mean temperature trend Clim. Dyn. 36 509–21

Karl T R, Arguez A, Huang B, Lawrimore J H, McMahon J R, Menne M J, Peterson T C, Vose R S and Zhang H-M 2015 Possible artifacts of data biases in the recent global surface warming hiatus Science 348 1469–72

Kay J E et al 2015 The Community Earth System Model (CESM) Large Ensemble Project: a community resource for studying climate change in the presence of internal climate variability Bull. Am. Meteorol. Soc. 96 1333–49

Kennedy J J, Rayner N A, Smith R O, Parker D E and Saunby M 2011 Reassessing biases and other uncertainties in sea surface temperature observations measured in situ since 1850: II. Measurement and sampling uncertainties J. Geophys. Res. Atmos. 116 D14104

Kent E C et al 2017 A call for new approaches to quantifying biases in observations of sea-surface temperature Bull. Am. Meteorol. Soc. 98 1601–16

Knight J R 2009 The Atlantic multidecadal oscillation inferred from the forced climate response in coupled general circulation models J. Clim. 22 1610–25

Knutson T R, Zhang R and Horowitz L W 2016 Prospects for a prolonged slowdown in global warming in the early 21st century Nat. Commun. 7 13676

Knutson T R, Zeng F and Wittenberg A T 2013 Multimodel assessment of regional surface temperature trends: CMIP3 and CMIP5 twentieth-century simulations J. Clim. 26 8709–43

Kosaka Y and Xie S-P 2013 Recent global-warming hiatus tied to equatorial Pacific surface cooling Nature 501 403–7

Kosaka Y and Xie S-P 2016 The tropical Pacific as a key pacemaker of the variable rates of global warming Nat. Geosci. 9 669–73

Kumar S, Kinter J L, Pan Z and Sheffield J 2016 Twentieth century temperature trends in CMIP3, CMIP5, and CESM-LE climate simulations: spatial-temporal uncertainties, differences, and their potential sources J. Geophys. Res.: Atmos. 121 9561–75

Lewandowsky S, Oreskes N, Risbey J S, Newell B R and Smithson M 2015 Seepage: climate change denial and its effect on the scientific community Glob. Environ. Change 33 1–13

Lewandowsky S, Risbey J S and Oreskes N 2015 On the definition and identifiability of the alleged hiatus in global warming Sci. Rep. 5 16784

Li W T and Baker C N 2016 Detecting warming hiatus periods in CMIP5 climate model projections Int. J. Atmos. Sci. 2016 9657659

Lin M and Huybers P 2016 Revisiting whether recent surface temperature trends agree with the CMIP5 ensemble J. Clim. 29 8673–87

Lovejoy S 2014 Return periods of global climate fluctuations and the pause Geophys. Res. Lett. 41 4704–10

Lovejoy S 2015 Using scaling for macroweather forecasting including the pause Geophys. Res. Lett. 42 7148–55

Manabe S and Wetherald R T 1975 The effects of doubling the CO2 concentration on the climate of a general circulation model J. Atmos. Sci. 32 3–15

Mann M E, Steinman B A, Miller S K, Frankcombe L M, England M H and Cheung A H 2016 Predictability of the recent slowdown and subsequent recovery of large-scale surface warming using statistical methods Geophys. Res. Lett. 43 3459–67

Mann M E, Miller S K, Rahmstorf S, Steinman B A and Tingley M 2017 Record temperature streak bears anthropogenic fingerprint Geophys. Res. Lett. 44 7936–44

Mann M E, Rahmstorf S, Steinman B A, Tingley M and Miller S K 2016 The likelihood of recent record warmth Sci. Rep. 6 19831

Mann M E, Steinman B A and Miller S K 2014 On forced temperature changes, internal variability, and the AMO Geophys. Res. Lett. 41 3211–9

Marotzke J and Forster P M 2015 Forcing, feedback and internal variability in global temperature trends Nature 517 565–70

Meehl G A, Hu A, Santer B D and Xie S-P 2016 Contribution of the interdecadal Pacific oscillation to twentieth-century global surface temperature trends Nat. Clim. Change 6 1005–8

Meehl G A, Hu A and Teng H 2016 Initialized decadal prediction for transition to positive phase of the interdecadal Pacific oscillation Nat. Commun. 7 11718

Meehl G A and Teng H 2012 Case studies for initialized decadal hindcasts and predictions for the Pacific region Geophys. Res. Lett. 39 L22705

Meehl G A, Teng H and Arblaster J M 2014 Climate model simulations of the observed early-2000s hiatus of global warming Nat. Clim. Change 4 898–902


Meinshausen M et al 2011 The RCP greenhouse gas concentrations and their extensions from 1765 to 2300 Clim. Change 109 213–41

Morice C P, Kennedy J J, Rayner N A and Jones P D 2012 Quantifying uncertainties in global and regional temperature change using an ensemble of observational estimates: the HadCRUT4 data set J. Geophys. Res.: Atmos. 117 D08101

Pasini A, Triacca U and Attanasio A 2017 Evidence for the role of the Atlantic multidecadal oscillation and the ocean heat uptake in hiatus prediction Theor. Appl. Climatol. 129 873–80

Peyser C E, Yin J, Landerer F W and Cole J E 2016 Pacific sea level rise patterns and global surface temperature variability Geophys. Res. Lett. 43 8662–9

Pithan F and Mauritsen T 2014 Arctic amplification dominated by temperature feedbacks in contemporary climate models Nat. Geosci. 7 181–4

Power S, Delage F, Wang G, Smith I and Kociuba G 2017 Apparent limitations in the ability of CMIP5 climate models to simulate recent multi-decadal change in surface temperature: implications for global temperature projections Clim. Dyn. 49 53–69

Pretis F, Mann M L and Kaufmann R K 2015 Testing competing models of the temperature hiatus: assessing the effects of conditioning variables and temporal uncertainties through sample-wide break detection Clim. Change 131 705–18

Rackow T, Goessling H F, Jung T, Sidorenko D, Semmler T, Barbi D and Handorf D 2018 Towards multi-resolution global climate modeling with ECHAM6-FESOM: II. Climate variability Clim. Dyn. 50 2369–94

Rahmstorf S, Foster G and Cahill N 2017 Global temperature evolution: recent trends and some pitfalls Environ. Res. Lett. 12 054001

Risbey J S, Lewandowsky S, Cowtan K, Oreskes N, Rahmstorf S, Jokimäki A and Foster G 2018 A fluctuation in surface temperature in historical context: reassessment and retrospective on the evidence Environ. Res. Lett. 13 123008

Risbey J S, Lewandowsky S, Langlais C, Monselesan D P, O’Kane T J and Oreskes N 2014 Well-estimated global surface warming in climate projections selected for ENSO phase Nat. Clim. Change 4 835–40

Roberts C D, Palmer M D, McNeall D and Collins M 2015 Quantifying the likelihood of a continued hiatus in global warming Nat. Clim. Change 5 337–42

Rohde R, Curry J, Groom D, Jacobsen R, Muller R A, Perlmutter S, Rosenfeld A, Wickham C and Wurtele J 2013 Berkeley Earth temperature averaging process Geoinformatics Geostat. 1 2

Rohde R, Muller R A, Jacobsen R, Muller E, Perlmutter S, Rosenfeld A, Wurtele J, Groom D and Wickham C 2012 A new estimate of the average Earth surface land temperature spanning 1753 to 2011 Geoinformatics Geostat. 1 1

Saenko O A, Fyfe J C, Swart N C, Lee W G and England M H 2016 Influence of tropical wind on global temperature from months to decades Clim. Dyn. 47 2193–203

Saffioti C, Fischer E M and Knutti R 2015 Contributions of atmospheric circulation variability and data coverage bias to the warming hiatus Geophys. Res. Lett. 42 2385–91

Santer B D et al 2000 Interpreting differential temperature trends at the surface and in the lower troposphere Science 287 1227–32

Schmidt G A, Shindell D T and Tsigaridis K 2014 Reconciling warming trends Nat. Geosci. 7 158–60

Schurer A P, Hegerl G C and Obrochta S P 2015 Determining the likelihood of pauses and surges in global warming Geophys. Res. Lett. 42 5974–82

Serreze M C and Barry R G 2011 Processes and impacts of Arctic amplification: a research synthesis Glob. Planet. Change 77 85–96

Simmons A J, Berrisford P, Dee D P, Hersbach H, Hirahara S and Thépaut J 2017 A reassessment of temperature variations and trends from global reanalyses and monthly surface climatological datasets Q. J. R. Meteorol. Soc. 143 101–19

Simmons A J and Poli P 2015 Arctic warming in ERA-Interim and other analyses Q. J. R. Meteorol. Soc. 141 1147–62

Simmons A J, Willett K M, Jones P D, Thorne P W and Dee D P 2010 Low-frequency variations in surface atmospheric humidity, temperature, and precipitation: inferences from reanalyses and monthly gridded observational data sets J. Geophys. Res.: Atmos. 115 D01110

Smith D M, Booth B B B, Dunstone N J, Eade R, Hermanson L, Jones G S, Scaife A A, Sheen K L and Thompson V 2016 Role of volcanic and anthropogenic aerosols in the recent global surface warming slowdown Nat. Clim. Change 6 936–40

Smith T M, Reynolds R W, Peterson T C and Lawrimore J 2008 Improvements to NOAA’s historical merged land-ocean surface temperature analysis (1880–2006) J. Clim. 21 2283–96

Solomon S, Daniel J S, Neely R R, Vernier J-P, Dutton E G and Thomason L W 2011 The persistently variable ‘background’ stratospheric aerosol layer and global climate change Science 333 866–70

Steinman B A, Frankcombe L M, Mann M E, Miller S K and England M H 2015a Response to comment on ‘Atlantic and Pacific multidecadal oscillations and Northern Hemisphere temperatures’ Science 350 1326

Steinman B A, Mann M E and Miller S K 2015b Atlantic and Pacific multidecadal oscillations and Northern Hemisphere temperatures Science 347 988–91

Stocker T F, Mysak L A and Wright D G 1992 A zonally averaged, coupled ocean-atmosphere model for paleoclimate studies J. Clim. 5 773–97

Thoma M, Greatbatch R J, Kadow C and Gerdes R 2015 Decadal hindcasts initialized using observed surface wind stress: evaluation and prediction out to 2024 Geophys. Res. Lett. 42 6454–61

Thorne P, Outten S, Bethke I and Seland Ø 2015 Investigating the recent apparent hiatus in surface temperature increases: II. Comparison of model ensembles to observational estimates J. Geophys. Res.: Atmos. 120 8597–620

Trenberth K 2001 El Niño Southern Oscillation (ENSO) Encyclopedia of Ocean Sciences ed J H Steele 2nd edn (London: Academic) pp 228–40

Vose R S et al 2012 NOAA’s merged land-ocean surface temperature analysis Bull. Am. Meteorol. Soc. 93 1677–85

Wagenmakers E-J 2007 A practical solution to the pervasive problems of p values Psychonomic Bull. Rev. 14 779–804

Wang Y, Su H, Jiang J H, Livesey N J, Santee M L, Froidevaux L, Read W G and Anderson J 2017 The linkage between stratospheric water vapor and surface temperature in an observation-constrained coupled general circulation model Clim. Dyn. 48 2671–83

Watanabe M, Kamae Y, Yoshimori M, Oka A, Sato M, Ishii M, Mochizuki T and Kimoto M 2013 Strengthening of ocean heat uptake efficiency associated with the recent climate hiatus Geophys. Res. Lett. 40 3175–9

Watanabe M, Shiogama H, Tatebe H, Hayashi M, Ishii M and Kimoto M 2014 Contribution of natural decadal variability to global warming acceleration and hiatus Nat. Clim. Change 4 893–7

Wei M and Qiao F 2016 Attribution analysis for the failure of CMIP5 climate models to simulate the recent global warming hiatus Sci. China Earth Sci. 60 397–408

Zeng X and Geil K 2016 Global warming projection in the XXI century based on an observational data-driven model Geophys. Res. Lett. 43 10947–54
