
PAGES2K SEMANTIC DATABASE

A SEMANTIC DATABASE OF TEMPERATURE PROXIES COVERING THE COMMON ERA

Julien Emile-Geay¹, Nicholas P. McKay², Jianghao Wang¹, Darrell Kaufman² & PAGES2k consortium

Abstract—Reconstructions of surface temperature over the past 2,000 years extend our knowledge of climate system behavior beyond the instrumental era, helping to distinguish between exogenous and endogenous sources of climate variability, a fundamental frontier of climate science. In this study, we describe the latest incarnation of the PAGES 2k global multi-proxy database, a community-curated pool of paleoclimate records. The database is structured as Linked Open Data using a JSON-LD container, allowing for semantic relations to be discovered between its objects and other Linked Data. We describe elementary statistical analyses possible with this new data resource, present a reconstruction of global surface temperature via Markov random fields, and encourage experimentation via other forms of machine learning and artificial intelligence.

I. MOTIVATION

Low-frequency climate variability is crucial to adaptation and planning, and can only be adequately constrained by paleoclimate observations. Today, however, the vast majority of paleoclimate datasets are in disparate formats, incommensurate with each other and with climate model output. This fundamentally limits our ability to use them for the validation of Earth system models and, hence, for effective decision-making. It is therefore essential to bring all relevant observations into a consistent format. Two ingredients have made this possible:

1) PAGES2k (footnote 1), a community-driven effort to synthesize all publicly archived, temperature-sensitive proxy records of the past 2,000 years [1].

2) LiPD, a new container designed to make paleoclimate data intelligible to machines [2].

The Linked Paleo Data (LiPD) format uses a JSON-LD framework to flexibly annotate metadata, explicating domain-specific definitions via a context file. The data themselves are stored in .csv files.
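The split between JSON-LD metadata and .csv values can be illustrated with a minimal sketch. The context URL, the key names (dataSetName, archiveType, paleoData, columns) and the sample values below are hypothetical placeholders chosen for illustration; they are not taken from the LiPD specification.

```python
import csv
import io
import json

# Hypothetical JSON-LD metadata for one proxy record. The "@context" URL and
# every key below are illustrative assumptions, not the actual LiPD vocabulary.
metadata_text = """
{
  "@context": "https://example.org/lipd-context.jsonld",
  "dataSetName": "Example.Coral.2015",
  "archiveType": "coral",
  "geo": {"latitude": -17.5, "longitude": 149.8},
  "paleoData": {
    "filename": "example_coral.csv",
    "columns": [
      {"number": 1, "variableName": "year", "units": "CE"},
      {"number": 2, "variableName": "d18O", "units": "permil"}
    ]
  }
}
"""

# Stand-in for the companion .csv file: values only, no header row.
csv_text = "1990,-4.61\n1991,-4.70\n1992,-4.55\n"


def load_record(metadata_json, csv_payload):
    """Return (metadata, table); table maps variable name -> list of floats."""
    metadata = json.loads(metadata_json)
    # The metadata, not the .csv, says which column holds which variable.
    columns = sorted(metadata["paleoData"]["columns"], key=lambda c: c["number"])
    table = {col["variableName"]: [] for col in columns}
    for row in csv.reader(io.StringIO(csv_payload)):
        for col, cell in zip(columns, row):
            table[col["variableName"]].append(float(cell))
    return metadata, table


metadata, table = load_record(metadata_text, csv_text)
print(metadata["archiveType"], table["year"], table["d18O"])
```

Keeping the numbers in a bare table and the meaning of each column in the metadata is what lets a generic reader interpret any record without format-specific parsing code.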

Corresponding author: J. Emile-Geay. ¹Department of Earth Sciences, University of Southern California, Los Angeles, CA. ²School of Earth Sciences and Environmental Sustainability, Northern Arizona University, Flagstaff, AZ.

Footnote 1: http://www.pages-igbp.org/ini/wg/2k-network/data

[Figure 1: map of the PAGES 2K network (Phase 2) as of 2015/07/21 (724 records from 667 sites), by archive type: bivalve, borehole, coral, historic, hybrid, ice core, lake sediment, marine sediment, sclerosponge, speleothem, tree. Inset "Temporal Availability": # proxies vs. Year (CE), with a "First Millennium" detail panel.]

Fig. 1. Spatiotemporal data availability in the PAGES2k database

II. DATABASE SYNOPSIS

The database presently contains 724 records from 667 locations around the globe, spanning all or part of the past 2,000 years, with resolution ranging from monthly to centennial (Fig. 1). There are 11 proxy types and dozens of measurement types. Records were selected by many volunteers worldwide, based on the following criteria:

Relation to temperature. The dataset includes proxy records for which statistical or mechanistic evidence of a relation to temperature exists.

Duration. ≥ 500y for non-annually resolved archives, 300y for terrestrial archives, 50y for annual marine archives.

Chronological accuracy. For non-annually resolved records, primary chronological information was archived to enable age modeling.

Resolution. ≥ 1 data point every 50 years on average (except marine sediments, for which 200 years is the minimum average sample interval).


[Figure 2: map of the screened PAGES2k network (fdr, 252 records from 245 sites), by archive type: bivalve, coral, historic, ice core, lake sediment, marine sediment, sclerosponge, speleothem, tree. Inset "Temporal Availability": # proxies vs. Year (CE), with a "First Millennium" detail panel.]

Fig. 2. Screening while controlling for the false discovery rate.

Public Domain. The records are publicly available and citable.
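The two quantitative criteria (Duration and Resolution) lend themselves to an automated check. The sketch below is one possible reading, not the consortium's actual screening code: the category labels and function names are hypothetical, the thresholds are treated as lower bounds on duration and upper bounds on the mean sample interval, and the 200-year figure for marine sediments is read as the coarsest acceptable average spacing.

```python
# Thresholds transcribed from the Duration and Resolution criteria. The category
# labels are hypothetical, and treating the thresholds as "at least" (duration)
# and "at most" (mean sample interval) is an assumption.
MIN_DURATION_YEARS = {
    "non_annual": 500,
    "terrestrial": 300,
    "annual_marine": 50,
}
MAX_MEAN_INTERVAL_YEARS = {"marine sediment": 200}
DEFAULT_MAX_MEAN_INTERVAL_YEARS = 50


def mean_sample_interval(years):
    """Average spacing (in years) between consecutive samples."""
    ordered = sorted(years)
    return (ordered[-1] - ordered[0]) / (len(ordered) - 1)


def passes_criteria(years, duration_category, archive_type):
    """Check the Duration and Resolution criteria for one record."""
    if len(years) < 2:
        return False
    duration = max(years) - min(years)
    max_interval = MAX_MEAN_INTERVAL_YEARS.get(
        archive_type, DEFAULT_MAX_MEAN_INTERVAL_YEARS
    )
    long_enough = duration >= MIN_DURATION_YEARS[duration_category]
    dense_enough = mean_sample_interval(years) <= max_interval
    return long_enough and dense_enough


lake = list(range(1000, 2000, 25))    # 975 years long, 25-year spacing
marine = list(range(0, 2000, 150))    # 1950 years long, 150-year spacing
short_tree = list(range(1800, 2000))  # annual, but only 199 years long

print(passes_criteria(lake, "non_annual", "lake sediment"))
print(passes_criteria(marine, "non_annual", "marine sediment"))
print(passes_criteria(short_tree, "terrestrial", "tree"))
```

The remaining criteria (relation to temperature, chronological accuracy, public availability) depend on expert judgment and archived metadata, so they are left out of this sketch.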

III. DISCOVERING TEMPERATURE RELATIONS

A natural question is the extent to which these proxy records capture large-scale temperature information. To address it, we evaluated correlations between records (with more than 20 available observations over the 1850-2010 interval) and the HadCRUT4.2 temperature dataset [3] on a 5° × 5° grid, whose missing values were infilled via the GraphEM algorithm [4]. Because there are only O(160) years of instrumental data, and 2592 grid points, this is a “large p, small n” problem. Moreover, climate timeseries feature a “warm-colored” spectrum, which invalidates many statistical assumptions (e.g. IID observations). We thus evaluate the significance of mean annual temperature correlations against isospectral Monte Carlo surrogates [5], controlling for false discoveries [6]. 252 records pass this screening test (Fig. 2).
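The screening logic can be sketched in two steps: phase-randomized surrogates that preserve a series' amplitude spectrum, in the spirit of [5], followed by a false discovery rate rule. This is an illustration under stated assumptions rather than the code used for Fig. 2: the function names are hypothetical, surrogates are drawn for the proxy series only, and the Benjamini-Hochberg step-up rule with q = 0.05 is assumed as the FDR procedure.

```python
import numpy as np


def isospectral_surrogates(x, n_surr, rng):
    """Phase-randomized copies of x: same amplitude spectrum, random phases."""
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.rfft(x - x.mean())
    phases = rng.uniform(0.0, 2.0 * np.pi, size=(n_surr, spectrum.size))
    phases[:, 0] = 0.0            # keep the zero-frequency term real
    if n % 2 == 0:
        phases[:, -1] = 0.0       # keep the Nyquist term real
    randomized = np.abs(spectrum) * np.exp(1j * phases)
    return np.fft.irfft(randomized, n=n, axis=1) + x.mean()


def surrogate_pvalue(proxy, temperature, n_surr=1000, seed=0):
    """Two-sided p-value of corr(proxy, temperature) against proxy surrogates."""
    rng = np.random.default_rng(seed)
    r_obs = np.corrcoef(proxy, temperature)[0, 1]
    surrogates = isospectral_surrogates(proxy, n_surr, rng)
    r_null = np.array([np.corrcoef(s, temperature)[0, 1] for s in surrogates])
    exceed = np.sum(np.abs(r_null) >= abs(r_obs))
    return (exceed + 1) / (n_surr + 1)


def fdr_pass(pvalues, q=0.05):
    """Benjamini-Hochberg step-up rule: boolean mask of rejected nulls."""
    p = np.asarray(pvalues, dtype=float)
    order = np.argsort(p)
    thresholds = q * np.arange(1, p.size + 1) / p.size
    below = p[order] <= thresholds
    mask = np.zeros(p.size, dtype=bool)
    if below.any():
        last = np.max(np.nonzero(below)[0])
        mask[order[: last + 1]] = True
    return mask


# Synthetic demonstration: one red-noise "temperature" series and three proxies,
# the first of which actually tracks it.
demo_rng = np.random.default_rng(42)
temperature = demo_rng.standard_normal(160).cumsum()
proxies = [
    temperature + demo_rng.standard_normal(160),
    demo_rng.standard_normal(160).cumsum(),
    demo_rng.standard_normal(160).cumsum(),
]
pvals = [surrogate_pvalue(p, temperature) for p in proxies]
print(pvals, fdr_pass(pvals))
```

Because the surrogates share the autocorrelation of the original series, the null distribution of correlations is wider than it would be for IID data, which is what guards against spurious matches between slowly varying series.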

IV. A PAGES2K RECONSTRUCTION OF GLOBAL SURFACE TEMPERATURE

Fitting this screened dataset against HadCRUT4.2 mean annual temperature using a Gaussian graphical model [4] with sparsity induced by the graphical lasso [7], we reconstructed global surface temperature over the past 2,000 years. The global mean is shown in Fig. 3, together with uncertainties. The dataset can be used to diagnose the response to volcanic and solar forcing, gauge the unusual character of twentieth century warming, or probe the continuum of climate variability.
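At the core of any Gaussian-model reconstruction is the conditional mean of unobserved variables (here, past temperature) given observed ones (proxies). The sketch below shows only that textbook step for a known mean and covariance; it omits the graphical lasso estimation of a sparse precision matrix [7] and the iterative fitting of GraphEM [4]. The function name and toy numbers are hypothetical.

```python
import numpy as np


def conditional_mean(mu, sigma, x, observed):
    """Fill unobserved entries of x with E[x_m | x_o] under N(mu, sigma)."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    x = np.asarray(x, dtype=float)
    obs = np.asarray(observed, dtype=bool)
    mis = ~obs
    # Regression coefficients of missing on observed: Sigma_mo @ inv(Sigma_oo).
    coeffs = np.linalg.solve(sigma[np.ix_(obs, obs)], sigma[np.ix_(obs, mis)]).T
    filled = x.copy()
    filled[mis] = mu[mis] + coeffs @ (x[obs] - mu[obs])
    return filled


# Toy example: variable 1 is unobserved and correlated (0.5) with variable 0.
mu = np.zeros(3)
sigma = np.array([[1.0, 0.5, 0.0],
                  [0.5, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
x = np.array([2.0, np.nan, -1.0])
observed = np.array([True, False, True])
filled = conditional_mean(mu, sigma, x, observed)
print(filled)
```

A sparse inverse covariance means each missing value depends on only a few neighbors in the graph, which keeps the estimation tractable when grid points far outnumber years of data.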

V. DISCUSSION

These applications only scratch the surface of what this database enables. We envision its use for fingerprinting natural climate forcings, evaluating global climate models, establishing relations to other climate fields (e.g. drought indices) or to (pre-)historical events. Once published [8], the data will be available as Linked Open Data, allowing for web-based discovery and linkages to other datasets.

[Figure 3: "Global mean temperature (20-year lowpass)": Temp Anom vs. Year (CE), 200 to 2000. Curves: "Reconstruction using hybrid GraphEM" and "HadCRUT4". Annotations: RE/CE = +0.90, R2 = +0.90, MSE = +0.00.]

Fig. 3. Global mean temperature reconstruction using the PAGES2k high-resolution records and the GraphEM [4] algorithm

ACKNOWLEDGMENTS

Funding for the authors was provided by NSF grants AGS-1003818, EAR-1347221 and ICER-1541029.

REFERENCES

[1] D. S. Kaufman & PAGES 2K Consortium, “A community-driven framework for climate reconstructions,” Eos, Transactions American Geophysical Union, vol. 95, no. 40, pp. 361–368, 2014.

[2] N. P. McKay and J. Emile-Geay, “The Linked Paleo Data framework: a common tongue for paleoclimatology,” https://www.authorea.com/users/17200/articles/19163/ show article, March 2015.

[3] C. P. Morice, J. J. Kennedy, N. A. Rayner, and P. D. Jones, “Quantifying uncertainties in global and regional temperature change using an ensemble of observational estimates: The HadCRUT4 data set,” Journal of Geophysical Research: Atmospheres, vol. 117, no. D8, 2012.

[4] D. Guillot, B. Rajaratnam, and J. Emile-Geay, “Statistical paleoclimate reconstructions via Markov random fields,” Ann. Applied. Statist., pp. 324–352, 2015.

[5] W. Ebisuzaki, “A method to estimate the statistical significance of a correlation when the data are serially correlated,” Journal of Climate, vol. 10, pp. 2147–2153, 1997.

[6] V. Ventura, C. J. Paciorek, and J. S. Risbey, “Controlling the Proportion of Falsely Rejected Hypotheses when Conducting Multiple Tests with Climatological Data,” Journal of Climate, vol. 17, pp. 4343–4356, 2004.

[7] J. Friedman, T. Hastie, and R. Tibshirani, “Sparse inverse covariance estimation with the graphical lasso,” Biostat, vol. 9, no. 3, pp. 432–441, 2008.

[8] PAGES2K Consortium, “A global multiproxy database for temperature reconstructions of the Common Era,” Scientific Data, in prep.