
EN4: Quality controlled ocean temperature and salinity profiles and monthly objective analyses with uncertainty estimates

Simon A. Good,1 Matthew J. Martin,2 and Nick A. Rayner1

Received 2 May 2013; accepted 8 November 2013; published 11 December 2013.

[1] We present version 4 of the Met Office Hadley Centre "EN" series of data sets of global quality controlled ocean temperature and salinity profiles and monthly objective analyses, which covers the period 1900 to present. We briefly describe the EN4 data sources, processing, quality control procedures, and the method of generating the analyses. In particular, we highlight improvements relative to previous versions, which include a new duplicate profile removal procedure and the inclusion of three new quality control checks. We discuss in detail a novel method for providing uncertainty estimates for the objective analyses and improving the background error variance estimates used by the analysis system. These were calculated using an iterative method that is relatively robust to initial misspecification of background error variances. We also show how the method can be used to identify issues with the analyses such as those caused by misspecification of error variances and demonstrate the impact of changes in the observing system on the uncertainty in the analyses.

Citation: Good, S. A., M. J. Martin, and N. A. Rayner (2013), EN4: Quality controlled ocean temperature and salinity profiles and monthly objective analyses with uncertainty estimates, J. Geophys. Res. Oceans, 118, 6704–6716, doi:10.1002/2013JC009067.

1. Introduction

[2] Observations of the temperature and salinity of the ocean are essential to many climate applications such as the initialization of seasonal and decadal forecasts [e.g., Arribas et al., 2011; Smith et al., 2007], for understanding the variability over time of ocean temperature, heat content, and salinity [e.g., Palmer et al., 2007; Levitus et al., 2012; Lyman et al., 2010; Boyer et al., 2005; Durack and Wijffels, 2010] and for input to ocean reanalyses [e.g., Balmaseda et al., 2008]. It is important that bad data are not allowed to impact on these applications and therefore they rely on having data sets available that have been subjected to a high level of quality control.

[3] In this paper, we present the latest version of the "EN" series of data sets, EN4, which aims to meet this need. It is a collection of ocean temperature and salinity profiles obtained across the global oceans over the period 1900 to present to which a series of quality control checks have been applied. Associated with this are monthly objective analyses with uncertainty estimates. The origin of the series of data sets (and its name) was in two European Union projects: ENACT (Enhanced Ocean Data Assimilation and Climate Prediction; http://www.ecmwf.int/research/EU_projects/ENACT/index.html) and ENSEMBLES (http://ensembles-eu.metoffice.com/index.html) [see Ingleby and Huddleston, 2007] but, as described in this paper, the quality control procedures have been extended and uncertainty estimates are provided with the objective analyses. The latter is achieved through use of a novel method to estimate uncertainty that is relatively robust to misspecification of the variance of the errors in the analysis backgrounds compared to methods such as optimal interpolation (OI). The method is also used to improve the estimates of background error variance. Data are made freely available for scientific research and private study from the Met Office Hadley Centre observations website, http://www.metoffice.gov.uk/hadobs.

[4] The EN4 data set exists alongside a number of other similar data sets. For example, the World Ocean Database (WOD; the latest version available at the time of writing was WOD09, Boyer et al. [2009]), which is the main data source used for constructing EN4, and the Coriolis data set for ReAnalysis (CORA) [Cabanes et al., 2013]. Each has its own unique features that may make them most appropriate for a particular project. Features of the EN4 data set include its >100 year time span, with all profiles recorded in a particular month provided in a single network common data format (NetCDF) file, and the fact that it is updated to the present each month. We also apply the different bias adjustments that have been proposed by various authors [e.g., Wijffels et al., 2008; Ishii and Kimoto, 2009; Levitus et al., 2009; Gouretski and Reseghetti, 2010; Good, 2011; Gouretski, 2012; Hamon et al., 2012] to adjust for time varying biases in mechanical bathythermograph (MBT) and expendable bathythermograph (XBT) profiles and make these data available on request. This gives the user choice over which adjustments to use or the option to use

1 Met Office Hadley Centre, Exeter, UK.
2 Met Office, Exeter, UK.

Corresponding author: S. A. Good, Met Office Hadley Centre, FitzRoy Road, Exeter EX1 3PB, UK. (simon.good@metoffice.gov.uk)

©2013. American Geophysical Union. All Rights Reserved. 2169-9275/13/10.1002/2013JC009067


the ensemble to understand uncertainty. However, negatives for some applications might be the vertical and spatial thinning of profiles and the lack of human inspection of the quality of individual profiles. The availability of alternative data sets such as WOD09 and CORA is also very useful for understanding the limitations of the data. There are many choices made when constructing a data set, such as how to perform the quality control; no one particular set of choices will be perfect. The different data sets sample the results of making such different choices and so can be used to understand the size of some of the uncertainties in the data.

2. The EN4 Data Set

[5] The EN4 data set is an incremental development of the previous versions EN2 and EN3 that are documented in Ingleby and Huddleston [2007] and at http://www.metoffice.gov.uk/hadobs/en3. Here, we give a brief overview of the methodology and highlight differences compared to these predecessors.

[6] The EN4 processing system is illustrated in Figure 1. Profile data are obtained from a number of sources, which are described in section 2.1. These are preprocessed (for example, to remove duplicates) and are then subjected to a series of quality control checks, and quality flags are assigned (section 2.2). The quality controlled data are used to produce a monthly objective analysis of the temperature and salinity of the ocean. By persisting anomalies through to the next month, a forecast of the ocean state is produced; this is used in the quality control of the following month of data and as the background to the next month's analysis. These aspects are described in section 2.3.

2.1. Data Sources

[7] Data from all types of ocean profiling instruments that provide temperature and (if available) salinity information are ingested into the EN4 data set. The main data source for the data set is the WOD09. Three other data compilations are added to this. First, data compiled during the Arctic Synoptic Basin Wide Oceanography (ASBO) project were used with the aim of improving data coverage in the Arctic. This is itself a compilation of profiles from other data sets and from it we take data sourced from Hydrobase (http://www.whoi.edu/science/PO/hydrobase/php/index.php), the Physical and Chemical Properties from Selected Expeditions in the Arctic Ocean project (Swift [2002], http://nsidc.org/data/arcss077.html), the Beaufort Gyre Experiment, the North Pole Environmental Observatory (NPEO), the Freshwater Switchyard of the Arctic project, and the Nansen and Amundsen Basins Observational System (NABOS) and the Canadian Basin Observational System (CABOS) (see the acknowledgments section for access information and/or web references for these projects). Second, data from the Global Temperature and Salinity Profile Program (GTSPP) [U.S. National Oceanographic Data Center, 2006] from 1990 onward were used. Third, Argo data for 2000 onward were obtained from the Argo global data assembly centres (GDACs).

[8] For the version of the data set presented here, all databases except ASBO were downloaded in September 2012. The ASBO collection was obtained in 2009 and has not been updated since. Additional months of data will subsequently be added to the data set each month using data from the GTSPP and the Argo GDACs.

2.2. Processing and Quality Control of Profile Data

[9] The input data are processed to produce monthly NetCDF files containing the profiles and quality information. All data for the month being processed are read into the processing system. In addition, two of the quality control checks need data from the previous and following months. Argo data from the whole of the preceding and following months are read in, and 5 days of data from before and after the start and end of the month from the other data sources. Profiles with high vertical resolution (>400 observations within the profile) are thinned so that observations are spaced approximately 1 m apart in the top 100 m of the ocean, 10 m apart above 1500 m depth, and 50 m apart below that.
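The thinning amounts to a simple depth-dependent spacing rule. The sketch below is a minimal illustration of that rule only, under the assumption that depths are in metres and sorted; the function and variable names are ours and the operational EN4 code may differ in detail.

```python
import numpy as np

def thin_profile(depths, values, max_obs=400):
    """Thin a high-resolution profile to approximate target spacings:
    ~1 m in the top 100 m, ~10 m down to 1500 m, ~50 m below that.
    Profiles with max_obs observations or fewer are returned unchanged."""
    depths = np.asarray(depths, dtype=float)
    values = np.asarray(values, dtype=float)
    if depths.size <= max_obs:
        return depths, values

    def target_spacing(d):
        return 1.0 if d <= 100.0 else (10.0 if d <= 1500.0 else 50.0)

    keep = [0]  # always retain the shallowest observation
    for i in range(1, depths.size):
        if depths[i] - depths[keep[-1]] >= target_spacing(depths[i]):
            keep.append(i)
    keep = np.array(keep)
    return depths[keep], values[keep]
```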

[10] The source data sets are highly overlapping. For example, profiles from Argo floats can be found in the Argo, GTSPP, and WOD09 data sets. It is necessary to identify and remove the duplicates so that the EN4 data set only contains one copy of each profile. The method for doing this is new in EN4 and is based on that described in Gronell and Wijffels [2008]. The process has four steps:

[11] 1. The algorithm tries to find and delete the Argo profiles from WOD09 and GTSPP so that only the Argo profiles from the Argo GDAC remain in EN4.

[12] 2. Profiles with the same (or very close to the same) location and time are identified and only one is retained. This procedure is used to find duplicates of the same profile from different data sources, to remove low-vertical resolution versions of conductivity-temperature-depth (CTD) profiles in WOD09 that are also available at high vertical resolution, and to thin data from types of instruments that can yield extremely dense observations.

[13] 3. Profiles with matching temperature, salinity, and depth values are found. This catches duplicates where one or both profiles have incorrect location or time of observation. This test also tries to find copies of profiles where one has pressure as its vertical coordinate and the other depth.

[14] 4. A more detailed version of the third step is applied that tries to find duplicate profiles where one or both have been altered before they were archived. For example, a profile may have had some of its observations removed or its data values truncated. This step is only applied to profiles that are close to each other in space and time because of the amount of processing this requires.

Figure 1. Flow of processing performed on the data.

[15] Application of the duplicate check results in the deletion of large numbers of profiles from the 1990s onward due to the high overlap between Argo, GTSPP, and WOD09. Prior to 1990, when WOD09 supplies almost all the input data, the duplicate check typically removes 0–10% of the input profiles. In comparison, Gronell and Wijffels [2008] state that they find 6–10% of profiles to be duplicates within a data set.
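Central to steps 2–4 above is a test of whether two profiles are close enough in space and time to be candidate duplicates. The sketch below shows one minimal way such a test could be written; the distance and time tolerances and the use of a haversine distance are illustrative assumptions, not the values or formulation used operationally in EN4.

```python
import math
from datetime import timedelta

def near_coincident(p1, p2, max_km=5.0, max_hours=1.0):
    """Return True if two profiles (dicts with 'lat', 'lon', 'time') are
    within max_km and max_hours of each other, so could be duplicates."""
    lat1, lon1 = math.radians(p1["lat"]), math.radians(p1["lon"])
    lat2, lon2 = math.radians(p2["lat"]), math.radians(p2["lon"])
    dlat, dlon = lat2 - lat1, lon2 - lon1
    # great-circle (haversine) distance in kilometres
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    dist_km = 2.0 * 6371.0 * math.asin(math.sqrt(a))
    close_in_time = abs(p1["time"] - p2["time"]) <= timedelta(hours=max_hours)
    return dist_km <= max_km and close_in_time
```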

[16] Various other processing steps are applied to the data, which are largely as described for the previous versions of the data set. If profiles from moored buoys are available more frequently than once per day, a daily averaging is performed. XBT profiles can be affected by errors near the surface [see e.g., Kizu and Hanawa, 2002] and Ingleby and Huddleston [2007] found that XBT data are of lower quality at depth. Therefore, bathythermograph measurements recorded at (i) 4 m or shallower and (ii) 950 m or deeper are rejected.

[17] XBT depths are, where necessary, adjusted according to Hanawa et al. [1995] or Kizu et al. [2005] (see Ingleby and Huddleston [2007] for more details). However, a change in processing compared to the previous versions of the data set is that the adjustment to XBT depths is not water temperature dependent. Time varying biases have been found to occur in XBT (and also mechanical bathythermograph (MBT)) data [Gouretski and Koltermann, 2007], affecting both the depths and temperatures. As mentioned in the introduction, there have recently been many proposals for adjusting for these but at present there is no clear evidence to choose any single method over the others. The approach taken here is to apply these adjustments post-processing to create an ensemble of adjusted data sets that represents the uncertainty in how to adjust the data. The most appropriate starting point for this is data without the water temperature-dependent adjustments.

[18] The profile data are then subjected to a series of quality control checks. These are described briefly in Table 1. Further details for many of the checks can be found in Ingleby and Huddleston [2007]. Below, we briefly describe new checks that have been added since that paper was written.

[19] Argo profiles contained in externally sourced lists of suspect data are rejected in order to reduce the possibility of bad Argo data being passed as good in the EN4 data set. The lists used are the Argo gray lists of suspect floats and a list based on the results of quality control using altimetry data [Guinehut et al., 2009].

Table 1. Quality Control Checks Applied to the Profile Data^a

Name of Check | External/Automatic | Description
Argo gray list | External | All previous gray lists available from ftp://ftp.ifremer.fr/ifremer/argo/etc/ar_greylist/ were combined and data on the gray lists that had not had delayed mode quality control applied by the Argo project were rejected. Gray lists are combined to capture the exclusions for nonactive floats, which are not included on the most recent version of the gray list.
Argo quality control flags | External | The quality control decisions for adjusted variables in the Argo input files are retained. Additionally, following the advice given at http://www.argo.ucsd.edu/Acpres_bias.html, levels with pressure error >=20 are rejected.
Argo altimetry quality control | External | Decisions from the comparison of Argo data to satellite altimetry [Guinehut et al., 2009, available e.g. from ftp://ftp.ifremer.fr/ifremer/argo/etc/argo-ast9-item13-AltimeterComparison/] are used to reject data.
Manual exclusions | External | A set of rejections developed manually for previous versions of the EN data set is applied.
Range check | Automatic | Rejects (and sets to missing data) temperatures outside the range −4 to 40°C and salinities outside 0–45.
Profile check | Automatic | Looks for spikes, steps, etc., in the data.
Track check | Automatic | Examines the path that an instrument has taken for, e.g., impossible speeds. This benefits from the extra data that are read in from outside the month being processed.
Stability check | Automatic | Checks for density inversions in a profile.
Bathymetry check | Automatic | Compares the location of a profile to the depth in a bathymetry data set to confirm that it is in the ocean.
Measurement depths check | Automatic | Looks for impossible depths (e.g., less than 0), repeated depths and out of order depths.
Waterfall check | Automatic | Works on salinity data from Argo instruments. Compares successive measurements made by the same float to find bad values.
Background checks | Automatic | Checks data against a persistence forecast of the ocean state. Note that this uses the background error variances described in Ingleby and Huddleston [2007].
Buddy check | Automatic | Compares measurements to nearby data from other instruments.
Multilevel check | Automatic | Rejects a profile if more than half the levels are rejected; can also reverse some rejections if the data agree with nonrejected data in the same profile.

^a See Ingleby and Huddleston [2007] and section 2.2 for more details. The second column denotes whether the quality decisions are based on manual or externally sourced information (external) or are a result of an automatic algorithm within the processing system (automatic).


[20] The "bathymetry check" aims to detect profiles with latitude and longitude that are over land rather than water, which obviously indicates an error in the location information. Land topography and ocean bathymetry information is obtained from the ETOPO1 1 arc-minute data set [Amante and Eakins, 2009]. For each profile, the four ETOPO1 grid points surrounding the profile location are found. If all four are land points, the profile is rejected. Interpolation of the bathymetry data is not performed owing to the risk of steep land topography making offshore locations close to the coast appear as though they are on the land.
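A minimal sketch of this four-corner test is given below. It assumes the ETOPO1 elevations are held in a 2-D array on regular, ascending latitude and longitude axes (positive values over land); the array names and layout are ours, for illustration only.

```python
import numpy as np

def bathymetry_check(lat, lon, etopo_lat, etopo_lon, etopo_elev):
    """Return True (reject) if all four ETOPO1 grid points surrounding the
    profile location are land. No interpolation is performed, as described
    in the text, to avoid coastal profiles being flagged spuriously."""
    i = np.searchsorted(etopo_lat, lat) - 1  # grid row just south of the profile
    j = np.searchsorted(etopo_lon, lon) - 1  # grid column just west of the profile
    corners = etopo_elev[i:i + 2, j:j + 2]   # the four surrounding grid points
    return bool(np.all(corners > 0.0))       # all above sea level => on land
```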

[21] The "measurement depth check" aims to identify suspect depth values within a profile. Impossible depths (depth values <0 m or >11,000 m) are rejected. Each observation level in each profile is then compared to the others to check if depth is increasing throughout the profile. If this is not the case, then the level that has a depth that is most inconsistent with the others is rejected. For example, if the first observation in the profile, which should be the shallowest, has a depth that is deeper than all the others, it would be rejected. If the level to reject is ambiguous, the algorithm checks whether one has already been rejected by the profile check (which rejects observations where there are spikes, steps, etc. in the profile) and, if so, leaves the other as not rejected. Otherwise, both levels are rejected. This process is repeated until all the remaining nonrejected levels have depths that increase monotonically throughout the profile.
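The sketch below illustrates the core of this check in simplified form: impossible depths are flagged first, then levels are removed one at a time until the remainder increase monotonically. The tie-breaking against the profile check is omitted, and the "most inconsistent level" criterion used here (the level out of order with the largest number of other levels) is our reading of the description rather than the operational rule.

```python
import numpy as np

def check_depths(depths):
    """Return a boolean array, True where a depth level is rejected."""
    depths = np.asarray(depths, dtype=float)
    rejected = (depths < 0.0) | (depths > 11000.0)   # impossible depths
    while True:
        idx = np.where(~rejected)[0]
        d = depths[idx]
        if np.all(np.diff(d) > 0):                   # already monotonic
            break
        # count, for each remaining level, how many other levels it is out of order with
        violations = [np.sum(d[:k] >= d[k]) + np.sum(d[k + 1:] <= d[k])
                      for k in range(d.size)]
        rejected[idx[int(np.argmax(violations))]] = True
    return rejected
```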

[22] The "waterfall check" is used to identify suspect salinity values in Argo data that have not already been detected in other tests. The algorithm attempts to mimic the process a human operator might use to manually quality control a sequence of profiles by plotting each next to the others, giving the appearance of a waterfall. Three months of profiles from each float are used. The sequence of profiles is used to establish a baseline salinity for each depth. If the difference between the salinity recorded in a profile and this baseline exceeds two thresholds (which are based on statistics from the sequence of profiles that is being examined and global statistics from all Argo data), the salinity value is rejected.
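In outline, the check reduces to a comparison against a per-depth baseline with two thresholds. The sketch below is a minimal, assumed form: the use of a median baseline and the way the two thresholds are combined are illustrative choices, not taken from the paper.

```python
import numpy as np

def waterfall_flags(salinity, local_thresh, global_thresh):
    """salinity: 2-D array (profile, depth level) holding ~3 months of data
    from one Argo float. Returns True where a value deviates from the
    per-depth baseline by more than both thresholds."""
    baseline = np.nanmedian(salinity, axis=0)          # baseline salinity per depth
    deviation = np.abs(salinity - baseline)
    return (deviation > local_thresh) & (deviation > global_thresh)
```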

2.3. Objective Analyses

[23] Monthly potential temperature and salinity objective analyses are formed from the quality controlled data. The method is largely unchanged from Ingleby and Huddleston [2007] so is described only briefly here. In the new version of the data set, uncertainty estimates for the objective analyses are provided for the first time. These are calculated using a novel method, which is described in detail in section 3.

[24] The analyses are formed by combining a background (or first guess), which is a persistence-based forecast of the ocean state from the previous month's objective analysis (see below for details), with the quality controlled profiles from the month being analyzed. They are calculated using the Analysis Correction (AC) scheme [Lorenc et al., 1991], which is an iterative method of solving the optimal interpolation (OI) equations [e.g., Donlon et al., 2012]. The method uses a recursive filter to model the spreading of information from the observations (the observation increments), which gives advantages over the standard OI algorithm such as removing the need to subselect observations close to each grid point. The analyses are provided on a grid with 1° spacing in the horizontal and with 42 vertical levels. The number of iterations of the AC method has been increased from 10 to 50 in this version of the objective analyses compared to the previous version. This achieves closer convergence of the solution.

[25] Aside from the background and the profile data, the system requires knowledge of the covariances of the possible errors in the observations and the background in order to calculate the analyses. Errors in observations are assumed to be uncorrelated with each other and are given values varying with depth as defined in Ingleby and Huddleston [2007]. The background error covariances are parameterized as a set of background error variances at each analysis grid point along with a function that defines how the errors covary spatially. The background error variances have been revised for this version of the data set (see section 3.2 for details). The covariance is specified using the combination of two second-order autoregressive (SOAR) functions with length scales of 300 and 400 km (between 64°N and the equator the former is increased exponentially to 1500 km). Vertical spreading of information is also applied and is governed by two correlation length scales, which are set to 200 and 100 m.
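For reference, a SOAR correlation function has the form c(r) = (1 + r/L) exp(−r/L). The sketch below shows one way the two-scale horizontal correlation could be written; the equal weighting of the two components is an assumption here (section 3.2 later splits the variance equally between them), and the function names are ours.

```python
import numpy as np

def soar(r, length_scale):
    """Second-order autoregressive (SOAR) correlation function."""
    x = np.abs(r) / length_scale
    return (1.0 + x) * np.exp(-x)

def background_correlation(r_metres, l1=300.0e3, l2=400.0e3, w1=0.5, w2=0.5):
    """Horizontal background error correlation at separation r_metres,
    modelled as a weighted combination of two SOAR functions with
    300 km and 400 km length scales."""
    return w1 * soar(r_metres, l1) + w2 * soar(r_metres, l2)
```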

[26] The persistence-based forecast of the ocean state for month i is calculated as:

x^f_i = x^c_i + α (x^a_{i−1} − x^c_{i−1})    (1)

[27] This and the following equations are written following the notation of Ide et al. [1997]. It indicates that the forecast (x^f_i) is generated from the climatology for the calendar month corresponding to i (x^c_i) plus the anomalies from the previous month (analysis x^a_{i−1} minus climatology x^c_{i−1} for that calendar month), which are first reduced by a factor α. As in previous versions, α is set to 0.9. This value corresponds to an e-folding time scale of 9.5 months.
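Equation (1) is straightforward to apply field by field; a minimal sketch (array names ours) is:

```python
def persistence_forecast(clim_month_i, analysis_prev, clim_prev, alpha=0.9):
    """Equation (1): climatology for the current calendar month plus the
    previous month's anomaly damped by alpha (0.9 in EN4, giving an
    e-folding time scale of about 9.5 months)."""
    return clim_month_i + alpha * (analysis_prev - clim_prev)
```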

[28] The climatology is unchanged from previous versions. It represents the average over 1971–2000 and was formed from smoothed objective analyses from a previous version of the data set (EN2), which in turn used as its climatology the World Ocean Atlas 1998 [Antonov et al., 1998a, 1998b, 1998c; Boyer et al., 1998a, 1998b, 1998c]. In the first month of the data set (January 1900), the forecast is the January climatology.

[29] It is important to note that the analyses will relax to climatology in the absence of any observations. Care must therefore be taken if using the analyses for applications such as identifying trends in temperature or salinity, because a trend may be unrealistic if analyzing periods when there were no observations. In this case, consideration should be given to alternatives such as reanalysis products that use a numerical model forecast as the background (e.g., see Xue et al. [2012], for an intercomparison of various examples of these) or objective methods that do not feature this relaxation to climatology (as done, for example, by Rayner et al. [2003], for sea surface temperature data).

3. Objective Analysis Uncertainty Estimates

[30] In version 4 of the EN data set, uncertainty estimates are provided with the objective analyses for the first time. As already described, the EN4 objective analyses are produced using the AC scheme, which solves the OI equations iteratively. If using OI to produce the analyses directly, there would be an equation that could be used to calculate the analysis uncertainty. This is used, for example, by Ishii et al. [2003]. Unfortunately, the way that the AC scheme operates prevents the direct evaluation of this equation. In any case, the OI equation will only give accurate uncertainty estimates if provided with accurate estimates of background and observation error covariances, which are often not available.

[31] An alternative method to generate uncertainty estimates for the EN4 objective analyses was therefore developed. Since the AC scheme gives the same result as OI, OI can be used as a more practical means to test that alternative method. In the following sections, we describe the method, test it, and discuss its strengths and weaknesses. Finally, we present the result of applying the method to the EN4 data.

3.1. Method

[32] The method that has been developed to calculate uncertainty estimates for each EN4 objective analysis is an improved version of a scheme used for the Operational Sea Surface Temperature and Sea Ice Analysis (OSTIA) [Donlon et al., 2012]. It involves first calculating a special "observation influence" analysis. This has values close to one where there is strong influence of observations on the analysis and values of zero where there were no observations. In general, the more that observations impact on the analysis, the smaller the analysis error variance will be. Hence, it is possible to use the observation influence to infer analysis error variance.

[33] If we consider the situation where climatology is used as the background and assume a linear relationship between the analysis error variances and the observation influence values (see section 3.1.1 for justification of this), we can write:

b^a_i = b^c + g x^{oi}_i    (2)

[34] Where b^a_i is the analysis error variance in month i for each grid point of the analysis, b^c is the background error variance if climatology is the background, and g is the change in analysis error variance per unit change in the observation influence value (x^{oi}_i). g should normally be negative since increasing the influence of observations should decrease the analysis error variance. In the limit where there are no observations, the observation influence values will be zero, hence b^a = b^c. Conversely, where the observation influence values are close to unity, the analysis will be almost independent of the background and the analysis error variance will be b^a = b^c + g.

[35] For the EN4 objective analyses, the background is a persistence-based forecast rather than climatology and therefore its error variance will lie somewhere in between the two limits described above. We approximate it as:

b^f_i = b^c + g β x^{oi}_{i−1}    (3)

[36] This equation states that the error variance of the persistence-based forecast for month i (b^f_i) is related to the previous month's observation influence analysis (x^{oi}_{i−1}) by the same coefficients b^c and g used in equation (2). However, the observation influence values are reduced by a factor β to reflect the temporal decorrelation of anomalies between months and the fact that the persistence-based forecasts are relaxed back to climatological values. Based on examination of results, β is set to 0.65.

[37] When using the persistence-based forecast as the background, the error variance of the resulting analysis must lie between b^f_i (if there are no observations) and b^c + g (in the situation where the analysis is almost completely determined by observations and so it is largely irrelevant what the background is). To achieve this, the observation influence values x^{oi}_i are replaced by the quantity:

β x^{oi}_{i−1} + x^{oi}_i (1 − β x^{oi}_{i−1})    (4)

[38] This represents a linear transition from β x^{oi}_{i−1} when no observations are available to 1 when the analysis is completely determined by the observations. This new definition of x^{oi}_i is used when calculating analysis error variance using equation (2). By doing this, both previous and current observations impact on the analysis error variance estimates that are calculated.
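Equations (2)–(4) combine into a short calculation per grid point. The sketch below (names ours) shows that combination; b_clim and g stand for the fitted intercept and gradient described in section 3.1.2.

```python
def analysis_error_variance(oi_current, oi_previous, b_clim, g, beta=0.65):
    """Analysis error variance from observation influence.
    oi_current and oi_previous are this month's and last month's observation
    influence analyses (e.g. NumPy arrays); b_clim and g are the intercept
    and gradient of the fitted linear relationship (equation (2))."""
    persisted = beta * oi_previous                           # damped previous influence
    effective = persisted + oi_current * (1.0 - persisted)   # equation (4)
    return b_clim + g * effective                            # equation (2)
```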

[39] To obtain the estimates of analysis error variance, we therefore only need our observation influence analyses (section 3.1.1) and estimates of b^c and g (see section 3.1.2 for details of how these are obtained).

3.1.1. The Observation Influence Analysis

[40] The observation influence analysis is generated by repeating the objective analysis of a month of data but with all observation values set to one, the background values to zero, and the length scales that determine the covariance in the background errors shortened (see below). No other modification of the analysis scheme is required. The aim of the observation influence analysis is to produce an analysis that has values that are linearly related to the analysis error variance at each grid point. Using the OI equations, it can be demonstrated that this is achievable for a simple case. From OI, the equation for the analysis error variance (B^a) is:

B^a = (I − KH) B^f    (5)

K = B^f H^T (H B^f H^T + R)^{−1}    (6)

[41] The equation depends on the background error covariance matrix, B^f, and the matrix containing the weights applied to the observation increments when forming the analysis, K. It also uses the identity matrix, I, and the linearized observation operator, H (which transforms data on the analysis grid points to observation space). As well as depending on the background error covariance matrix, this depends on the observation error covariance matrix, R.

[42] Take the simple situation where we have a single observation at location o and wish to calculate the error variance at a single analysis point a. We first make the assumption that the error in the background covaries between locations as B^f_{ao} = B^f_{aa} exp(−|r/l|), where B^f_{ao} is the background error covariance between the analysis and observation locations, B^f_{aa} is the background error variance at the analysis grid point, r is the distance between analysis and observation locations and l is a length scale. The weight applied to the observation increment, K_l, for a particular length scale then becomes:

K_l = B^f_{ao} / (B^f_{oo} + R) = B^f_{aa} exp(−|r/l|) / (B^f_{oo} + R)    (7)

[43] Note that B^f_{oo} is the background error variance at the observation location. Equation (5) then simplifies to:

B^a = B^f_{aa} − K_l B^f_{ao}
    = B^f_{aa} − [B^f_{aa} exp(−|r/l|) / (B^f_{oo} + R)] B^f_{aa} exp(−|r/l|)
    = B^f_{aa} [1 − B^f_{aa} exp(−|r/(l/2)|) / (B^f_{oo} + R)]
    = B^f_{aa} (1 − K_{l/2})    (8)

[44] We have found that analysis error variance is related linearly to K_{l/2}. This is the same quantity that is calculated by an observation influence analysis if the background error covariance length scale is halved. Therefore, in this test case we have shown that the observation influence is linearly related to analysis error variance.

[45] Although the length scale for the analysis error is half that of the background error in this simple case, this changes if a different background error covariance structure is assumed. Also, in reality, there will be many observations influencing each analysis grid point and the observation influence analysis will contain the sum of the weights applied to the observation increments for each grid point. We examined this more complicated situation in an idealized framework. We simulated sets of observations and used OI to calculate the analysis error variance and the observation influence analysis with various scaling factors applied to the background error covariance length scales (Figure 2) to determine a suitable scaling factor to use for the EN4 analyses. In Figures 2a–2c are shown plots of analysis error variance against observation influence. Each dot in the plots represents a set of simulated observations, which each consist of between 1 and 20 observations randomly positioned on a 21 × 21 grid. The total background error variance was chosen to be twice the observation error variance. The length scales that determine the background error covariance were set to be representative of those used in the EN4 objective analysis system (3 and 4 times the grid spacing) when calculating the analysis error covariance and to 1 (plot a), 1/1.7 (plot b), and 1/2 (plot c) times those values when calculating observation influence. The solid red lines are the result of fitting a line to the data. In plot a (scaling factor 1), the relationship between analysis error variance and observation influence is clearly not linear and there is a clustering of points at high values of observation influence. However, in plot b, with the scaling factor set to 1/1.7, an approximately linear relationship is achieved. Changing the scaling factor further results again in a nonlinear relationship. This is illustrated quantitatively in plot d, which shows the root-mean-square difference between the data points and the fitted line for various scaling factors, showing that the relationship is closest to linear if the scaling factor is 1/1.7. The optimum scaling factor varies to 1/1.8 if the stretched length scale used near the equator is adopted. We therefore adopt 1/1.75 as the scaling factor.

[46] In summary, our observation influence analyses are generated by repeating an objective analysis but with all observation values set to one, all background values set to zero, and background error covariance length scales shortened by a factor 1/1.75. These values are expected to be linearly related to the analysis error variance and can therefore be used to generate uncertainty estimates for the EN4 objective analyses.

3.1.2. Relating Observation Influence to Analysis Error Variance

[47] We have determined that we can produce an observation influence analysis that has a linear relationship with analysis error variance. We now need to obtain the coefficients of that relationship, i.e., we need to know b^c and g from equation (3). That relationship is not of the form derived in equation (8) for the simplest case, which is shown as the dashed red line in Figure 2b. Also, this equation would only hold if we have good knowledge of the background error variance, which can often not be the case. We therefore needed an alternative method to obtain the coefficients.

[48] The method developed to determine these is to divide each month of observations into two groups, with 2% of the observations in one group and the remainder in the other, and make objective analyses from each, using climatology as the background. Locations in the analyses where the observation influence analysis values are high (between 0.95 and 1.00) are then found; these are locations where there are sufficient observations or observations close enough to the analysis point to give a high observation influence value, i.e., where the result of the analysis is determined mostly by the observations with little influence from the background. The analysis values here are used as an estimate of the true value of the field in those locations. The difference between the other analysis and this "truth" gives an estimate of the error in the former. These analysis error estimates and the corresponding values of observation influence are then used to obtain the coefficients of the linear relationship in equation (2). The proportion of observations in each group was chosen to give sufficient locations with high values of observation influence in one set of analyses with a range of observation influence values in the other.

[49] This method was applied to the real EN4 data. The analysis using 2% of the data is referred to as the "sparsely sampled analysis" and the analysis of the remainder, the "reference analysis." All the analyses from the 1930s, 1940s, 1960s, 1980s, 1990s, and 2000s were used to define the relationship between observation influence and analysis error variance in each location. The remaining decades of data were reserved to provide semi-independent data for testing the resulting uncertainty estimates. Analysis values in a 15° diameter circle centered on each analysis grid point over all months were selected and searched for observation influence values between 0.95 and 1.00 to use as an estimate of the true value of the field. These generally were found in the reference analyses but the sparsely sampled analyses were also searched for locations with high observation influence. Where these occurred, the difference between that and the other analysis and the corresponding observation influence value were saved. These difference data were binned by observation influence and the variance of the data in each bin found (as long as there were at least 10 points in the bin). If at least eight of the bins had variance estimates, a line was fitted to the data using a least absolute difference criterion. This fitting method was used to reduce the influence of outliers on the fit. If it was not possible to fit a line, the window diameter was increased by 2° and the process repeated up to a maximum of 45° diameter. If, at observation influence values of zero or one, the fitted line had negative values, it was rejected. The coefficients of these line fits defined the contents of b^c and g described in equation (3).
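The binning and robust line fit just described can be sketched compactly. In the code below the number of bins is an assumption (the paper does not state it), scipy's general-purpose minimizer stands in for whatever least absolute deviation fit was used operationally, and the function name is ours.

```python
import numpy as np
from scipy.optimize import minimize

def fit_influence_relationship(influence, differences, n_bins=20,
                               min_per_bin=10, min_bins=8):
    """Bin analysis-minus-'truth' differences (NumPy arrays) by observation
    influence, take the variance of each bin, and fit a line to
    (influence, variance) by least absolute deviations. Returns
    (intercept, gradient), i.e. local estimates of b^c and g, or None if a
    valid fit cannot be made."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    centres, variances = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (influence >= lo) & (influence < hi)
        if np.count_nonzero(in_bin) >= min_per_bin:
            centres.append(0.5 * (lo + hi))
            variances.append(np.var(differences[in_bin]))
    if len(centres) < min_bins:
        return None                      # caller would widen the search window
    x, y = np.array(centres), np.array(variances)

    def l1_cost(p):                      # least absolute difference criterion
        return np.sum(np.abs(y - (p[0] + p[1] * x)))

    result = minimize(l1_cost, x0=[float(y.mean()), 0.0], method="Nelder-Mead")
    intercept, gradient = result.x
    if intercept < 0.0 or intercept + gradient < 0.0:
        return None                      # reject fits negative at influence 0 or 1
    return float(intercept), float(gradient)
```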

[50] For the potential temperature analyses, we were able to associate line fits using this method for greater than 90% of the ocean area shallower than 1000 m. Below that depth the proportion of the ocean filled with line fits decreases rapidly. Therefore, levels below 1000 m depth were not investigated further and do not have uncertainty estimates calculated. Generally, we were able to associate line fits to a lower proportion of the ocean area for the salinity data than for potential temperature. At least 70% of the ocean area was filled for the salinity analyses at each depth above 1000 m. However, for consistency, it was decided to derive uncertainty estimates for the same depth range as for potential temperature. Any unfilled locations in b^c and g were infilled using a nearest neighbor algorithm to give spatially complete maps of the intercept and gradients of the lines fitted to the data.

Figure 2. Relationship between analysis error variance and observation influence if the scaling factor applied to the background error covariance length scales is (a) 1, (b) 1/1.7, and (c) 1/2. Solid red lines are lines fitted to the data; the dashed red line in Figure 2b is as would be obtained if a relationship as derived in equation (8) is assumed. Plot (d) shows how the linearity of the data points varies with scale factor.

Figure 3. Gradients of lines relating observation influence to analysis error variance (cf. Figure 2) for the top analysis level for potential temperature on the first iteration of the method.

[51] We note that a drawback to this scheme is that the "truth" analyses contain errors and share the same background as the analyses they are compared to. We used simulated sets of observations to investigate these issues. The method proposed to find the equation linking observation influence to analysis error variance was found to sometimes underestimate the analysis error variance if the observation influence is low, including in the situation where the background and observation error covariances are known perfectly. However, it was found that the method is more robust to inaccurate knowledge of the error covariances than the OI analysis error variance equation. In particular, it provides clear benefits in the situation where the background error variance is misspecified. As we show below, this robustness also means that the method is useful for identifying regions where there are misspecifications. While the underestimation of the analysis error variance when observation influence is low is clearly an issue, the misspecification of background error variance is potentially a much larger problem and justifies the use of the method in the current study. If we were confident that the background and observation error covariances were well specified, it would be possible to compensate for this underestimation. However, as shown in section 3.2, this is not the case. We therefore leave the underestimation of the analysis error variance in some situations as a problem to be considered in the future.

Figure 4. Intercepts of the lines relating observation influence to analysis error variance for selected levels calculated during the first iteration of the method. These were subsequently used as the background error variances input to the analyses, subject to them having a minimum value of a quarter of the observation error variance.

3.2. Results

[52] An example of the gradients of the lines (corresponding to g in equation (2)) calculated from the potential temperature data in the top level of the analysis is shown in Figure 3. These indicate how the analysis error variance alters as the influence of observations on the analysis changes. Unexpectedly, we found that the gradients are positive in some locations. These tend to be in regions characterized by high variability, for example in the Gulf Stream region and the Agulhas region south of Africa. This undesirable feature implies that the analyses have larger error variances than climatology in those locations.

[53] Further investigation using simulated observations and background fields as described previously confirmed that this can occur in limited circumstances when the inputs to the analysis system are in error. Eyre and Hilton [2013] investigated the effect of misspecifying the background error variance. They found that analysis error variance can become larger than the background error variance if the latter is smaller in reality than that specified to the analysis system, resulting in the observations being fitted too closely. Therefore, in addition to providing analysis error variance estimates, our method is also useful in highlighting regions with misspecified error variances [similar, e.g., to the methods of Desroziers et al., 2005].

[54] The intercepts of the fitted lines for selected levels are shown in Figure 4. These represent the error variance when no observations are available and therefore are estimates of the background error variance. Given the above, and the fact that we find that the method for deriving the relationship between observation influence and analysis error variance is robust to misspecification of background error variance, we attempted to improve the background error variance input to the analysis system using our results. The intercepts of the line fits were used as the estimates of the background error covariance. The variance was split equally between the two length-scale components of the background error covariance. As errors in line fits could result in unrealistically low background error variance estimates, the minimum allowed total error variance was set to be a quarter of the observation error variance.

[55] The estimation of the relationships between observation influence and analysis error variance was then repeated. It was found that the occurrence of locations with high observation influence values was decreased in this new version of the analyses. Therefore, two modifications to the method were made. First, the reference analyses from the first iteration of the method were compared to the new sparsely sampled analyses. This is justified since the main requirement for the reference analyses is to generate analysis points that are strongly determined by the observations. Simulated data were generated that demonstrated that this does not impact on the effectiveness of the method. Second, the number of bins that were required to be filled in order to perform a line fit was halved.

[56] The result of this iteration of the method is shown in Figure 5. The areas with positive gradients in the northwest Pacific and Atlantic are reduced in magnitude in this iteration. However, new areas of large positive gradients have appeared, most noticeably in the Indian Ocean. These are not due to overestimation of the background error variances since this was decreased between iterations. They appear to be caused by profiles with no near-surface observations (because they did not pass the quality control). The analysis method instead spreads the observation increments from depth, where the variability is larger, affecting the analysis error variance at the surface. Again, this demonstrates that the method is useful in highlighting issues with the analysis method. However, it is beyond the scope of this study to try to resolve the problem.

[57] Despite the new areas of positive gradients, since the new background error variances should, in general, be an improvement over those used previously, we decided to continue to use these when generating the EN4 analyses. Further work is required to understand more completely the cause of the positive gradients. In particular, the other inputs to the analysis system (the background error covariance length scales and the observation error variances) need to be examined and perhaps refined.

[58] Finally, in Figure 6, we show some examples of the observation influence analyses (first column), the same observation influence analyses adjusted for the impact of using a persistence forecast as background (equation (4)) (second column), and the corresponding analysis error variance fields (third column). The first row shows the results for potential temperature for January 1965 for level 21 (373 m). The restricted spatial sampling of the oceans to that depth at that time is reflected in the observation influence plots. In the second row (showing January 1975), the situation after the introduction of XBTs is illustrated. Observational sampling has clearly improved and this is reflected in the reduction in the analysis error variance between the first two rows. For example, decreases in analysis error variances are observed to the north/north-east of New Zealand. The decrease to the north-east results from the persistence of information from the previous month of data, which demonstrates the value of the persistence-based forecasts.

Figure 5. Gradients of lines relating observation influence to analysis error variance for the top analysis level for potential temperature on the second iteration of the method.

[59] In the third and fourth rows of Figure 6 are shown the change in observation influence, persisted observation influence, and analysis error variance between July 2000 and July 2012 for salinity in the top level of the analyses. This illustrates very clearly the change in observational sampling that has been brought about by the Argo project. A very obvious difference in analysis error variance can be seen between the Argo period and pre-Argo.

3.3. Validation of Uncertainty Estimates

[60] The uncertainty estimates have been validated using observation minus analysis statistics. This was done using the same EN4 data split into two groups as before. However, the data were analyzed in a different way to that used for deriving the uncertainty estimates. If the two different methods agree, this gives confidence in the uncertainty estimates. To give further independence between the derivation of the uncertainty estimates and the validation, the validation is applied to the data that were withheld from the derivation of the uncertainty estimates.

Figure 6. Examples of (left) observation influence analyses, (middle) the same observation influence analyses but adjusted for the impact of using a persistence-based forecast as the analysis background, and (right) the corresponding analysis error variance fields for four different date, depth, and parameter combinations (top two rows: temperature, bottom two rows: salinity).


[61] The validation method is to apply the EN4 processing system to the group of data containing 2% of the observations in each month over the full length of the data set. The resulting analyses are compared to the observations from the other group (the "reference observations"). To compare the observations and analyses in this way, additional uncertainty components must be accounted for: the representivity uncertainty induced by the point observations sampling processes that occur on smaller scales than the resolution of the analyses, and the error variance in the reference observations. These components were estimated together in each analysis grid cell by interpolating the reference observations to the analysis levels and then calculating the variance of the differences between nearby profiles. Only profiles that were not used in deriving the uncertainty estimates were used for this.

[62] The squared differences between the reference observations and the nearest analysis grid points were normalized by the sum of the analysis error variance at the observation location and the corresponding estimate of representivity and observation error variance. If there was more than one observation within a grid cell, one was arbitrarily chosen to calculate this difference and the rest discarded to avoid heavily observed areas dominating the statistics. The square root of the mean of the normalized differences was calculated for each month. The results are shown in Figure 7a. The normalized score is close to one throughout the record, indicating that the analysis error variance values are not unreasonable. No differences are seen between the periods where the data were used to derive uncertainty estimates (the 1930s, 1940s, 1960s, 1970s, 1990s, 2000s) and those where they were not. However, when the results for each analysis level were aggregated over the time period, it is seen that the normalized score deviates from one at the deeper levels, as shown in Figure 7b (black lines). The cause of this appears to be an underestimation of the climatological error variance. This is demonstrated by showing statistics for when the observation influence is less than 0.1 (gray dashed line) or greater than 0.7 (gray solid line). The former corresponds to the situation where the analyses are close to the climatological values. In this case, the normalized score clearly deviates from one at the deepest levels. However, in the latter case, where the analyses are more strongly influenced by observations, the statistics are relatively close to one. In shallower levels, there is no dependence of normalized score on observation influence. We hypothesize that the difference at deeper levels is due to the historical lack of observations at depth, which means that ocean variability is not fully sampled. If this is the case, the problem should reduce in future years as the record of deeper observations extends in time.
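The validation statistic itself is simple to state in code. The sketch below (names ours) computes the normalized root-mean-square score for one month, given the analysis error variance and the combined representivity and observation error variance at the observation locations.

```python
import numpy as np

def normalized_rms(obs, analysis_at_obs, analysis_error_var, rep_obs_error_var):
    """RMS of observation-minus-analysis differences after normalization by
    the total expected variance. Values close to one indicate that the
    uncertainty estimates are consistent with the observed differences."""
    normalized_sq = (obs - analysis_at_obs) ** 2 / (analysis_error_var + rep_obs_error_var)
    return float(np.sqrt(np.mean(normalized_sq)))
```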

4. Conclusions

[63] We have presented a new version of the Met Office Hadley Centre EN series of data sets, EN4. The data set runs from 1900 to present and is based on subsurface ocean temperature and salinity profile data obtained from the WOD09, GTSPP, Argo, and ASBO collections. All data were first compared to identify and remove duplicates. They were then subjected to a series of quality control procedures and quality flags assigned. These included three new checks introduced in this version of the data set. The first of these checked the depth at the location of each profile in the ETOPO1 relief data set to confirm that it lies in the ocean. The second ensured that depths increase monotonically throughout each profile and rejected levels that appeared erroneous. The third automatically implemented a check that might be done by a human operator: it compared sequences of salinity profiles recorded by an Argo float to find errors. The profile data are output into monthly NetCDF files and are made available for research and private study from http://www.metoffice.gov.uk/hadobs.
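For illustration only (this is not the EN4 implementation, and the policy of which level to reject when depths fail to increase is an assumption), a monotonic depth check of the kind described above could look like:

```python
import numpy as np

def monotonic_depth_check(depths):
    """Flag profile levels whose depths do not increase monotonically.

    depths: 1-D sequence of recorded depths, in the order reported.
    Returns a boolean array, True where the level looks erroneous.
    """
    depths = np.asarray(depths, dtype=float)
    reject = np.zeros(depths.shape, dtype=bool)
    last_good = -np.inf
    for i, d in enumerate(depths):
        if d <= last_good:
            # Depth failed to increase: flag this level and keep the previous
            # depth as the reference (one possible policy; the exact EN4 rule
            # for deciding which level is erroneous is not reproduced here).
            reject[i] = True
        else:
            last_good = d
    return reject

# Example: the 55 m level is flagged because it is shallower than the
# preceding 60 m level.
print(monotonic_depth_check([0, 10, 20, 60, 55, 80]))
```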

Figure 7. Root-mean-square (RMS) of reference observations minus analysis values after normalization by the analysis error variance and the estimate of representivity and observation error, by (a) date and (b) analysis level. In the latter, the dashed gray line indicates the statistics for analyses with persisted observation influence less than 0.1 and the solid gray line for where it is greater than 0.7.

[64] Monthly potential temperature and salinity objective analyses were calculated from the quality controlled ocean data. These have a regular 1° horizontal grid and 42 levels in the vertical. The backgrounds for the analyses were forecasts of the ocean state generated by persisting anomalies from the previous month. These anomalies were damped so that the analyses relax to climatology in the absence of observations. A new method of calculating the uncertainty in the analyses has been described. It was found that this method is robust to misspecification of the background error variances, that it is useful for detecting issues with the analyses, and that it can be used to improve the background error statistics used by the analysis system. However, despite updates to the background error variance, it was found that some regions of the analyses display the undesirable property of having higher error variance than climatology. More investigation is needed to determine the causes of some of these issues and to resolve the problems. Additionally, at depth there is evidence that the climatological error variance is underestimated. In these regions, caution is urged when using the analyses and uncertainty estimates. As with the profile data, the analysis files are made available from the Met Office website.
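To make the persistence-with-damping background described above concrete, here is a minimal sketch assuming the previous month's analysis and a monthly climatology are available on the same grid; the function name, the damping coefficient, and its value are illustrative, not the settings used in EN4.

```python
def persistence_background(previous_analysis, climatology, damping=0.9):
    """Forecast (background) field for the next month's analysis.

    The anomaly from climatology is persisted but damped, so that in the
    absence of observations repeated application relaxes the field back
    toward climatology.
    previous_analysis, climatology: arrays on the analysis grid.
    damping: fraction of the anomaly retained each month (illustrative value).
    """
    anomaly = previous_analysis - climatology
    return climatology + damping * anomaly
```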

[65] In the future, the EN4 data will be kept up to date (with approximately half a month lag) using data made available in the GTSPP and Argo collections. Bias adjustments proposed for MBT and XBT data will be applied to the data to make an ensemble of possible bias-adjusted data. These can be made available to users on request and provide a mechanism to understand the sensitivity of an application to the uncertainty in how to adjust for biases in the data.
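A minimal sketch of how such an ensemble of bias-adjusted data could be assembled from a set of published XBT/MBT adjustment schemes is given below; the scheme names, the dictionary-based interface, and the dummy adjustment functions are placeholders, not EN4 code.

```python
def bias_adjusted_ensemble(profiles, adjustment_schemes):
    """Apply each proposed bias adjustment scheme to the same profile data.

    profiles: the unadjusted profile collection (placeholder object).
    adjustment_schemes: mapping from a scheme label to a function that returns
        an adjusted copy of the profiles.
    Returns one ensemble member per scheme, allowing users to test the
    sensitivity of their application to the choice of adjustment.
    """
    return {name: adjust(profiles) for name, adjust in adjustment_schemes.items()}

# Illustrative use with dummy data and dummy adjustments (uniform offsets).
raw_profiles = {"temperature": [10.0, 9.5, 9.0]}
schemes = {
    "scheme_A": lambda p: {"temperature": [t - 0.05 for t in p["temperature"]]},
    "scheme_B": lambda p: {"temperature": [t - 0.10 for t in p["temperature"]]},
}
members = bias_adjusted_ensemble(raw_profiles, schemes)
```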

[66] Acknowledgments. This work was supported by the Joint DECC/Defra Met Office Hadley Centre Climate Programme (GA01101) and by ERA-CLIM, a collaborative project (2011–2013) funded by the European Union under the 7th Framework Programme. Thanks to Kristian Mogensen for the program to generate the combined Argo gray list. We would like to thank the three anonymous reviewers for their comments, which significantly improved the paper. Acknowledgments for data (also see references in section 2.1) now follow. Argo: Argo data were collected and made freely available by the International Argo Program and the national programs that contribute to it (http://www.argo.ucsd.edu, http://wo.jcommops.org/cgi-bin/WebObjects/Argo). The Argo Program is part of the Global Ocean Observing System. ASBO: The ASBO data were collated as part of the Arctic Synoptic Basin-wide Observations (ASBO) project by Takamasa Tsubouchi (http://www.southampton.ac.uk/oes/research/projects/arctic_synoptic_basinwide_observations_asbo.page). Beaufort Gyre experiment: The data were collected and made available by the Beaufort Gyre Exploration Program based at the Woods Hole Oceanographic Institution (http://www.whoi.edu/beaufortgyre/). North Pole Environmental Observatory (NPEO): http://psc.apl.washington.edu/northpole (NSF grants OPP-9910305, OPP-0352754, and ARC-0634226). Freshwater Switchyard of the Arctic project: M. Steele, W. Smethie, and P. Schlosser, CTD data from the Arctic Switchyard Project, Office of Polar Programs, National Science Foundation: http://psc.apl.washington.edu/switchyard/. Nansen and Amundsen Basins Observational System (NABOS), Canadian Basin Observational System (CABOS): http://nabos.iarc.uaf.edu/.

References

Amante, C., and B. W. Eakins (2009), ETOPO1 1 arc-minute global relief model: Procedures, data sources and analysis, NOAA Tech. Memo. NESDIS NGDC-24, 19 pp.

Antonov, J., S. Levitus, T. P. Boyer, M. Conkright, T. O'Brien, and C. Stephens (1998a), World Ocean Atlas 1998, Temperature of the Atlantic Ocean, vol. 1, 166 pp., NOAA Atlas NESDIS 27, U.S. Gov. Print. Off., Washington, D. C.

Antonov, J., S. Levitus, T. P. Boyer, M. Conkright, T. O'Brien, and C. Stephens (1998b), World Ocean Atlas 1998, Temperature of the Pacific Ocean, vol. 2, 166 pp., NOAA Atlas NESDIS 28, U.S. Gov. Print. Off., Washington, D. C.

Antonov, J., S. Levitus, T. P. Boyer, M. Conkright, T. O'Brien, C. Stephens, and B. Trotsenko (1998c), World Ocean Atlas 1998, Temperature of the Indian Ocean, vol. 3, 166 pp., NOAA Atlas NESDIS 29, U.S. Gov. Print. Off., Washington, D. C.

Arribas, A., et al. (2011), The GloSea4 ensemble prediction system for seasonal forecasting, Mon. Weather Rev., 139, 1891–1910, doi:10.1175/2010MWR3615.1.

Balmaseda, M., A. Vidard, and D. L. T. Anderson (2008), The ECMWF ocean analysis system: ORA-S3, Mon. Weather Rev., 136, 3018–3034, doi:10.1175/2008MWR2433.1.

Boyer, T. P., S. Levitus, J. Antonov, M. Conkright, T. O'Brien, and C. Stephens (1998a), World Ocean Atlas 1998, Salinity of the Atlantic Ocean, vol. 4, 166 pp., NOAA Atlas NESDIS 30, U.S. Gov. Print. Off., Washington, D. C.

Boyer, T. P., S. Levitus, J. Antonov, M. Conkright, T. O'Brien, and C. Stephens (1998b), World Ocean Atlas 1998, Salinity of the Pacific Ocean, vol. 5, NOAA Atlas NESDIS 31, U.S. Gov. Print. Off., Washington, D. C.

Boyer, T. P., S. Levitus, J. Antonov, M. Conkright, T. O'Brien, C. Stephens, and B. Trotsenko (1998c), World Ocean Atlas 1998, Salinity of the Indian Ocean, vol. 6, 166 pp., NOAA Atlas NESDIS 32, U.S. Gov. Print. Off., Washington, D. C.

Boyer, T. P., S. Levitus, J. I. Antonov, R. A. Locarnini, and H. E. Garcia (2005), Linear trends in salinity for the World Ocean, 1955–1998, Geophys. Res. Lett., 32, L01604, doi:10.1029/2004GL021791.

Boyer, T. P., et al. (2009), World Ocean Database 2009, edited by S. Levitus, 216 pp., NOAA Atlas NESDIS 66, U.S. Gov. Print. Off., Washington, D. C.

Cabanes, C., et al. (2013), The CORA dataset: Validation and diagnostics of in situ ocean temperature and salinity measurements, Ocean Sci., 9, 1–8, doi:10.5194/os-9-1-2013.

Desroziers, G., L. Berre, B. Chapnik, and P. Poli (2005), Diagnosis of observation, background and analysis-error statistics in observation space, Q. J. R. Meteorol. Soc., 131, 3385–3396, doi:10.1256/qj.05.108.

Donlon, C. J., M. Martin, J. Stark, J. Roberts-Jones, E. Fiedler, and W. Wimmer (2012), The operational sea surface temperature and sea ice analysis (OSTIA) system, Remote Sens. Environ., 116, 140–158, doi:10.1016/j.rse.2010.10.017.

Durack, P. J., and S. E. Wijffels (2010), Fifty-year trends in global ocean salinities and their relationship to broad-scale warming, J. Clim., 23, 4342–4362, doi:10.1175/2010JCLI3377.1.

Eyre, J. R., and F. I. Hilton (2013), Sensitivity of analysis error covariance to the misspecification of background error covariance, Q. J. R. Meteorol. Soc., 139, 524–533, doi:10.1002/qj.1979.

Good, S. A. (2011), Depth biases in XBT data diagnosed using bathymetry data, J. Atmos. Oceanic Technol., 28, 287–300, doi:10.1175/2010JTECHO773.1.

Gouretski, V. (2012), Using GEBCO digital bathymetry to infer depth biases in the XBT data, Deep Sea Res., Part I, 62, 40–52, doi:10.1016/j.dsr.2011.12.012.

Gouretski, V., and K. P. Koltermann (2007), How much is the ocean really warming?, Geophys. Res. Lett., 34, L01610, doi:10.1029/2006GL027834.

Gouretski, V., and F. Reseghetti (2010), On depth and temperature biases in bathythermograph data: Development of a new correction scheme based on analysis of a global ocean database, Deep Sea Res., Part I, 57(6), 812–833, doi:10.1016/j.dsr.2010.03.011.

Gronell, A., and S. E. Wijffels (2008), A semiautomated approach for quality controlling large historical ocean temperature archives, J. Atmos. Oceanic Technol., 25, 990–1003, doi:10.1175/JTECHO539.1.

Guinehut, S., C. Coatanoan, A.-L. Dhomps, P.-Y. L. Traon, and G. Larnicol (2009), On the use of satellite altimeter data in Argo quality control, J. Atmos. Oceanic Technol., 26, 395–402, doi:10.1175/2008JTECHO648.1.

Hamon, M., G. Reverdin, and P.-Y. L. Traon (2012), Empirical correction of XBT data, J. Atmos. Oceanic Technol., 29, 960–973, doi:10.1175/JTECH-D-11-00129.1.

Hanawa, K., P. Rual, R. Bailey, A. Sy, and M. Szabados (1995), A new depth-time equation for Sippican or TSK T-7, T-6 and T-4 expendable bathythermographs (XBT), Deep Sea Res., Part I, 42, 1423–1451.

Ide, K., P. Courtier, M. Ghil, and A. Lorenc (1997), Unified notation for data assimilation: Operational, sequential and variational, 2nd WMO Int. Symp. Assimilation Obs. Meteorol. Oceanogr., J. Meteorol. Soc. Jpn., 75(1B), 181–189.

Ingleby, B., and M. Huddleston (2007), Quality control of ocean temperature and salinity profiles – Historical and real-time data, J. Mar. Syst., 65, 158–175.

Ishii, M., and M. Kimoto (2009), Reevaluation of historical ocean heat content variations with time-varying XBT and MBT depth bias corrections, J. Oceanogr., 65, 287–299.


Ishii, M., M. Kimoto, and M. Kachi (2003), Historical ocean subsurface temperature analysis with error estimates, Mon. Weather Rev., 131, 51–73.

Kizu, S., and K. Hanawa (2002), Start-up transient of XBT measurement, Deep Sea Res., Part I, 49, 935–940.

Kizu, S., S. Ito, and T. Watanabe (2005), Intermanufacturer difference and temperature dependency of the fall-rate of T-5 expendable bathythermograph, J. Oceanogr., 61(5), 905–912.

Levitus, S., J. I. Antonov, T. P. Boyer, R. A. Locarnini, H. E. Garcia, and A. V. Mishonov (2009), Global ocean heat content 1955–2008 in light of recently revealed instrumentation problems, Geophys. Res. Lett., 36, L07608, doi:10.1029/2008GL037155.

Levitus, S., et al. (2012), World ocean heat content and thermosteric sea level change (0–2000 m), 1955–2010, Geophys. Res. Lett., 39, L10603, doi:10.1029/2012GL051106.

Lorenc, A. C., R. S. Bell, and B. Macpherson (1991), The Meteorological Office analysis correction data assimilation scheme, Q. J. R. Meteorol. Soc., 117, 59–89.

Lyman, J. M., S. A. Good, V. V. Gouretski, M. Ishii, G. C. Johnson, M. D. Palmer, D. M. Smith, and J. K. Willis (2010), Robust warming of the global upper ocean, Nature, 465, 334–337, doi:10.1038/nature09043.

Palmer, M. D., K. Haines, S. F. B. Tett, and T. J. Ansell (2007), Isolating the signal of ocean global warming, Geophys. Res. Lett., 34, L23610, doi:10.1029/2007GL031712.

Rayner, N. A., D. E. Parker, E. B. Horton, C. K. Folland, L. V. Alexander, D. P. Rowell, E. C. Kent, and A. Kaplan (2003), Global analyses of sea surface temperature, sea ice, and night marine air temperature since the late nineteenth century, J. Geophys. Res., 108(D14), 4407, doi:10.1029/2002JD002670.

Smith, D. M., S. Cusack, A. W. Colman, C. K. Folland, G. R. Harris, and J. M. Murphy (2007), Improved surface temperature prediction for the coming decade from a global climate model, Science, 317, 796–799, doi:10.1126/science.1139540.

Swift, J. (2002), Physical and chemical properties from selected expeditions in the Arctic Ocean, edited by R. Dichtl, Y. Axford, and T. Haran, Natl. Snow and Ice Data Cent., Digital media, Boulder, Colo.

U.S. National Oceanographic Data Center (2006), Global Temperature-Salinity Profile Programme, U.S. Dep. of Commer., Natl. Oceanic and Atmos. Admin., Natl. Oceanogr. Data Cent., Silver Spring, Md. [Available at http://www.nodc.noaa.gov/GTSPP/, accessed Sep 2012.]

Wijffels, S. E., J. Willis, C. M. Domingues, P. Barker, N. J. White, A. Gronell, K. Ridgway, and J. A. Church (2008), Changing expendable bathythermograph fall-rates and their impact on estimates of thermosteric sea level rise, J. Clim., 21, 5657–5672, doi:10.1175/2008JCLI2290.1.

Xue, Y., M. A. Balmaseda, T. Boyer, N. Ferry, S. Good, I. Ishikawa, A. Kumar, M. Rienecker, A. J. Rosati, and Y. Yin (2012), A comparative analysis of upper-ocean heat content variability from an ensemble of operational ocean reanalyses, J. Clim., 25, 6905–6929, doi:10.1175/JCLI-D-11-00542.1.
