From “lab books” to computational Earth science.

Chris Hill, MIT – cnh@mit.edu

Edinburgh, July 2007

Lab books

A lab notebook is a primary record of research. Researchers use a lab notebook to document their hypotheses, experiments and initial analysis or interpretation of these experiments. The notebook serves as an organizational tool, a memory aid, and can also have a role in protecting any intellectual property that comes from the research.

The guidelines for lab notebooks vary widely between institution and between individual labs, but some guidelines are fairly common. The lab notebook is usually written in as the experiments progress, rather than a later date. Many say that lab notebook should be thought of as a diary of activities that are described in sufficient detail to allow another scientist to follow the same steps.

To ensure that data cannot be easily altered, notebooks with permanently bound pages are often recommended. Researchers are often encouraged to write only with unerasable pen, to sign and date each page, and to have their notebooks inspected periodically by another scientist who can read and understand it. All of these guidelines can be useful in proving exactly when a discovery was made, in the case of a patent dispute.

Several companies now offer electronic lab notebooks. This format has gained some popularity, especially in large pharmaceutical companies, which have large numbers of researchers and great need to document their experiments.

(Source: Wikipedia)

Lab books

• Physical, chemical and biological scientists are taught lab-book discipline from an early age.
  – Reproducible results are the foundation of scientific and engineering disciplines, e.g. Michelson/Morley.
  – There is even an infamous “Journal of Irreproducible Results”.
• In computational science the “lab book” discipline is not so ubiquitous, maybe because
  – a program is a formal statement of applied mathematical axioms
  – axioms are deterministic
  – therefore reproducibility is not an issue
  – however, a program, i.e. a complex collection of simple elemental statements, is hard to comprehend. If details are not recorded, reproducibility may well be an issue.

Some example computational Earth science experiments.

• Aqua-planet.

• Eddying North Atlantic.

• Global ocean with eddies and seaice.

• IPCC

A simple GFD configuration

Water-covered planet. Atmosphere-ocean-seaice.

Jean-Michel Campin and David Ferreira

• Some factors that affect the solution:
  – Initial conditions.
  – Atmosphere: Clouds, radiation, dynamics, boundary layer, temporal and spatial discretization….
  – Seaice: Thermodynamics. Aging. Stress-strain relation….
  – Ocean: Dynamics, coordinate system, vertical/horizontal friction and mixing….
  – Coupling: Time stepping, energetics.
  – External forcings: Solar insolation, reference profiles.

An eddying, ocean only configuration

Ocean-only, forced with atmospheric reanalysis for Jan-Mar.

Figure legend:
Red/blue shading: ocean heating/cooling.
Cyan/magenta line: +/-17.5 °C @ 200 m.
Streaks: wind stress.
Green thickness: ocean mixed layer depth.

• Some factors that affect the solution:
  – Initial conditions.
  – Atmosphere fluxes: Planetary boundary layer scheme.
  – Ocean: Dynamics, coordinate system, vertical/horizontal friction and mixing….
  – Coupling: Time stepping, energetics.
  – External forcings: Solar insolation, reference profiles, atmospheric reanalysis.
  – Non-linear/turbulent flow, so bitwise reproducibility is subject to FP round-off, parallel reduction operations etc…
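
The round-off point can be illustrated with a minimal sketch in plain Python (not taken from any of the models discussed here). Floating-point addition is not associative, so a sum reduced in a different order, as can happen when the number of processes changes, need not agree bit for bit. The data values and the two-way split are assumptions chosen only to make the effect visible.

```python
import math

def serial_sum(values):
    """Left-to-right sum, as a single process would compute it."""
    total = 0.0
    for v in values:
        total += v
    return total

def two_rank_sum(values):
    """Sum each half separately, then combine, mimicking a 2-process reduction."""
    half = len(values) // 2
    return serial_sum(values[:half]) + serial_sum(values[half:])

# Hypothetical data: large and small magnitudes mixed together.
values = [1.0e16, 1.0, -1.0e16, 1.0]

a = serial_sum(values)       # one reduction order
b = two_rank_sum(values)     # another reduction order
exact = math.fsum(values)    # correctly rounded reference sum

print(a, b, exact)           # the two orderings disagree with each other
```

In a non-linear, turbulent flow a difference of this size in one time step can grow, which is why a record of the decomposition and reduction order may be needed for bitwise reproducibility.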

Global eddying ocean, sea-ice decadal ensemble. 50+ members.

Ensemble perturbations:
• Numerical formulation
• Ocean parameters
• Seaice parameters
• Initial conditions
• Boundary conditions

IPCC ocean ACC transports

[Figure: ACC transport in Sv (axis 0 to 400) for Observational, GISS-ER, GISS-AOM, MIROC3.2 (medres), MIROC3.2 (hires), CCCMA-CGCM3.1, MRI-CGCM2.3.2a, INM-CM3.0 and CNRM-CM3.]

Could I make this plot without too much difficulty – yes
Could I rerun IPCC scenario (possibly with some parameter change) – no

Diagnosing these results is possible today (PCMDI/ESG archives) for the broad scientific community. Rerunning experiments (with or without small changes) is still very hard.

Factors affecting solution range from bottom drag to land-surface formulation to emissions profiles.

Couples atmosphere, ocean, seaice, land, vegetation, chemistry etc…

Examples summary

• To reproduce an experiment
  – a significant quantity of information needs to be stored
  – this spans broad “big-picture” information (water-covered planet, atmos+ocean+seaice) to minute details (bitwise reproducibility may require record of compiler, OS etc…)
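
As a rough illustration of what such a record might contain, the sketch below bundles a big-picture description, a few parameters and some environment details into one JSON document. The function name, field names and parameter values are all hypothetical, and for a compiled model the compiler and build flags would have to come from the build system, which this Python-only sketch does not capture.

```python
import json
import platform
import sys
from datetime import datetime, timezone

def make_run_record(description, parameters):
    """Bundle big-picture and low-level details of a run into one dict."""
    return {
        "description": description,
        "parameters": parameters,
        "environment": {
            "os": platform.platform(),
            "machine": platform.machine(),
            "python": sys.version.split()[0],
        },
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }

record = make_run_record(
    description="water-covered planet, atmos+ocean+seaice",
    # Hypothetical parameter names and values, for illustration only.
    parameters={"timestep_seconds": 1200, "bottom_drag": 1.0e-3},
)

# A machine-readable electronic "lab book" entry.
text = json.dumps(record, indent=2, sort_keys=True)
print(text)
```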

Way Forward

• A hand record is neither practical nor ideal (i.e. not as potentially useful as an electronic record).
• Electronic information should be stored so as to be amenable to machine reasoning.
  – requires defined vocabularies, precise formal structure, pattern matching, rules etc.
  – W3C/semantic web technologies: XML, RDF etc.
• In theory, using XML, RDF etc., we could describe model systems and enable reruns for extra outputs (e.g. transport of S3 by flow) or derived runs (e.g. modified air-sea coupling coefficient or formulation).
• In practice this is hard work!


Baby steps toward a computational Earth science “model repository”.

• What is working today – PCMDI/ESG

• Steps toward future - ESC

PCMDI

• Archive of all IPCC model outputs.
• Stored in common format (netCDF with standard metadata).
• Stored on common mesh. Simplifies things, but can/does degrade information and even mislead (e.g. conservation in one coordinate system may be inexact in another).
• Very limited model metadata is held.
• Very successful and technically impressive
  – societal utility is a function of model quality!

Figure credit: Schmittner et al. (2005, GRL)
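
The remark about conservation on a common mesh can be made concrete with a toy one-dimensional example. This is a sketch under invented assumptions (made-up cell values, a uniform fine grid of width 1 coarsened by a factor of two) and is not how PCMDI regrids data. It only shows that one plausible remapping preserves the integral of a field while another does not.

```python
def integral(cell_values, cell_width):
    """Integral of a piecewise-constant field on a uniform 1-D grid."""
    return sum(v * cell_width for v in cell_values)

# Hypothetical cell-mean values on a fine grid with cells of width 1.
fine = [1.0, 3.0, 2.0, 6.0, 4.0, 0.0, 5.0, 7.0]

# Remap to cells of width 2 in two different ways.
sampled = fine[::2]                                              # keep every other cell
averaged = [(fine[i] + fine[i + 1]) / 2 for i in range(0, len(fine), 2)]

print(integral(fine, 1.0))      # integral on the original grid
print(integral(averaged, 2.0))  # averaging preserves the integral
print(integral(sampled, 2.0))   # sampling does not
```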

Earth System Curator (ESC)

Can we (for better or worse!) do for models what PCMDI does for datasets?

PCMDI datasets are data “wrapped” in a common/standard container (netCDF).

The PCMDI container is “self-describing”.

This means we can query and even combine (to some degree) the PCMDI datasets.

A container analogy for modeling technology is the “component architecture” supported by systems like ESMF.

Building a coupled model oriented solution – modeling system as a component tree

• Some mathematics: a component M has
  – no side-effects
  – possible persistent internal state
• Supports representation as a DAG with nodes

  $M_{0}$
  $M_{1,0}$, $M_{2,0}$
  $M_{1,1,0}$, $M_{2,1,0}$, $M_{1,2,0}$, $M_{2,2,0}$, $M_{3,2,0}$

  i.e. the children of a component $M_{P}$ are $M_{n,P}$, $n = 1, \dots, c$,
  e.g. $M_{2,0} \rightarrow \{ M_{1,2,0},\ M_{2,2,0},\ M_{3,2,0} \}$
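
One way to read the index labels on this slide is that each component is named by the path from itself back to the root. The sketch below builds that eight-node tree under this assumption. The class and method names are hypothetical and are not part of ESMF or ESC.

```python
from dataclasses import dataclass, field

@dataclass
class Component:
    """A node in the component tree, labelled by its index path."""
    path: tuple                      # e.g. (2, 0) stands for M_{2,0}
    children: list = field(default_factory=list)

    def add_child(self):
        # Child n of M_P is labelled M_{n,P}, with n counting from 1.
        child = Component(path=(len(self.children) + 1,) + self.path)
        self.children.append(child)
        return child

    def label(self):
        return "M_" + ",".join(str(i) for i in self.path)

    def walk(self):
        """Yield this component and all of its descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()

root = Component(path=(0,))
m10 = root.add_child()
m20 = root.add_child()
for _ in range(2):
    m10.add_child()
for _ in range(3):
    m20.add_child()

labels = [c.label() for c in root.walk()]
print(labels)
```

Because every child is reachable only through its parent, the structure is a tree and therefore also a DAG.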

Example of actual component tree.

• Tree of components from the GEOS-5 modeling system.

• Each box is an ESMF component.

• Components adhere to DAG semantics.

Suarez et al.

Individual components in ESMF

• ESC builds on an ESMF-like component model.
  – ESMF Component
    • Container for a sequence of computation that implements a particular algorithm (a physics simulation, e.g. a Navier-Stokes solver, or a technical function, e.g. a history manager). An ESMF component exposes its external interfaces through an ESMF state.
  – ESMF State
    • Container data type to transport data between components.
  – ESMF Field
    • Container data type that can be used to push/pop n-dimensional data with an associated mesh from an ESMF State.
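
The relationship between the three containers can be sketched in a few lines of Python. This is a toy illustration of the idea only. The class names, methods and the `Doubler` component are invented here and do not correspond to the real ESMF API.

```python
from dataclasses import dataclass, field

@dataclass
class Field:
    """Data plus the mesh it lives on (both simplified to flat lists here)."""
    name: str
    mesh: list
    data: list

@dataclass
class State:
    """Container used to move fields between components."""
    fields: dict = field(default_factory=dict)

    def push(self, f):
        self.fields[f.name] = f

    def pop(self, name):
        return self.fields.pop(name)

class Doubler:
    """Toy component: reads 'x' from its import state, writes 'y' to its export state."""

    def run(self, import_state, export_state):
        x = import_state.pop("x")
        export_state.push(Field("y", x.mesh, [2 * v for v in x.data]))

import_state, export_state = State(), State()
import_state.push(Field("x", mesh=[0.0, 1.0, 2.0], data=[1.0, 2.0, 3.0]))
Doubler().run(import_state, export_state)
result = export_state.pop("y").data
print(result)
```

The component touches nothing except the two states it is handed, which is the property that makes it describable from the outside.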

Given a component model, like the ESMF paradigm, ESC…

• Describes a component in terms of
  – parameters that control the computation sequence.
  – states and fields that are passed into/out of the component.
• Provides two levels of description
  – potential and specific.
  – Potential is a list of all possible parameters and fields. It is a virtualized description in that it is not describing a specific instance.
  – Specific is a description of an instantiated component in which parameters are bound to specific values and fields and states are bound to specific values.
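
The distinction between the two levels might be sketched as follows, assuming a potential description is just a set of parameter names and a specific description is that set bound to values. The class, the parameter names and the values are hypothetical and do not reflect the actual ESC schema.

```python
class PotentialDescription:
    """Lists every parameter a component could accept, without values."""

    def __init__(self, name, parameters):
        self.name = name
        self.parameters = frozenset(parameters)

    def bind(self, **values):
        """Return a specific description, or raise if the binding is incomplete or unknown."""
        unknown = set(values) - self.parameters
        missing = self.parameters - set(values)
        if unknown or missing:
            raise ValueError(f"unknown={sorted(unknown)} missing={sorted(missing)}")
        return {"component": self.name, "parameters": dict(values)}

# Hypothetical parameter names and values.
ocean = PotentialDescription("ocean", ["timestep", "vertical_mixing"])
specific = ocean.bind(timestep=1200, vertical_mixing="KPP")
print(specific)
```

Binding fails loudly when a value is missing, so a specific description produced this way always pins down every parameter the potential description lists.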

ESC component descriptions are in terms of XML schema.

• Curator-NMM
  – Describes numerical model parameters e.g. timestep, system requirements.
• Gridspec
  – Describes numerical mesh.
• Curator-CIAO
  – Describes component inputs and outputs.
• Curator-complete
  – Describes wiring together of components.
  – A coupled component is also a component, i.e. the schema is recursive.
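
The recursive idea, a coupled component being itself a component, can be illustrated with a small XML document and a recursive walk over it. The element and attribute names below are invented for this sketch and are not the Curator schema. Only Python's standard `xml.etree.ElementTree` module is used.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element and attribute names are invented for this sketch.
DOC = """
<component name="coupled">
  <parameter name="coupling_interval" value="3600"/>
  <component name="atmosphere">
    <parameter name="timestep" value="600"/>
  </component>
  <component name="ocean">
    <parameter name="timestep" value="1200"/>
    <component name="seaice"/>
  </component>
</component>
"""

def describe(element, depth=0):
    """Recursively list each component with its parameters, indented by depth."""
    params = {p.get("name"): p.get("value") for p in element.findall("parameter")}
    lines = [f"{'  ' * depth}{element.get('name')} {params}"]
    for child in element.findall("component"):
        lines.extend(describe(child, depth + 1))
    return lines

tree_lines = describe(ET.fromstring(DOC.strip()))
print("\n".join(tree_lines))
```

The same function handles a leaf component and a coupled system, which is the practical benefit of a recursive description.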

Some details (more at http://www.earthsystemcurator.org) …..

Curator-NMM

• The Curator-NMM schema describes model components, their content, and their connections.  It is a superset of the NMM schema.  The main constructs in the Curator-NMM schema are component, potential model, and model.  Components are "composable" pieces of code that can be coupled together in various arrangements to form different models.  A potential model consists of a group of components, and describes the set of possible models that can be built from those components.  A model is a fully specified application based on a potential model and configuration choices. 

Curator-NMM

Mosaic Grid Specification

• The Mosaic Grid Specification is a standardized description of multi-patch, structured grids being developed in coordination with CF activities.

Mosaic Grid Specification

Component – component compatibility checking.

• ESC can describe coupled (multi-component) systems.

• In principle ESC could support recombination of components from coupled systems e.g. couple component A (atmosphere dynamics) with component B (land-surface).

• Ideally, for this, compatibility constraints need to be expressed in a standard way.
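
A minimal sketch of what such a check could look like, assuming each component publishes the fields it exports and imports together with units and grid. The field names, units, grid names and the function itself are hypothetical and are not part of ESC.

```python
def check_compatibility(exports, imports):
    """Return a list of problems that would prevent wiring exports to imports.

    Both arguments map a field name to a dict with 'units' and 'grid' entries.
    """
    problems = []
    for name, needed in imports.items():
        offered = exports.get(name)
        if offered is None:
            problems.append(f"{name}: not provided")
            continue
        for key in ("units", "grid"):
            if offered[key] != needed[key]:
                problems.append(f"{name}: {key} {offered[key]!r} != {needed[key]!r}")
    return problems

# Component A (atmosphere dynamics): hypothetical exported fields.
atmosphere_exports = {
    "precipitation": {"units": "kg m-2 s-1", "grid": "cubed-sphere"},
    "surface_air_temperature": {"units": "K", "grid": "cubed-sphere"},
}
# Component B (land-surface): hypothetical required fields.
land_imports = {
    "precipitation": {"units": "mm day-1", "grid": "cubed-sphere"},
    "surface_air_temperature": {"units": "K", "grid": "cubed-sphere"},
    "downward_shortwave": {"units": "W m-2", "grid": "cubed-sphere"},
}

problems = check_compatibility(atmosphere_exports, land_imports)
for p in problems:
    print(p)
```

A check like this only works if both components use the same vocabulary for names, units and grids, which is why the constraints need to be expressed in a standard way.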

Service architectures

• Standards services
  – Developing standardized descriptions is a well-proven method toward a service oriented approach e.g.

Some useful URLs (an incomplete list)

Component models

http://www.esmf.ucar.edu

http://maplcode.org

Metadata & standards

http://www.earthsystemcurator.org

http://ncas-cms.nerc.ac.uk/NMM/

http://www.earthsystemgrid.org/

http://www.cgd.ucar.edu/cms/eaton/cf-metadata/

http://sbml.org/index.psp

http://cml.sourceforge.net/wiki/index.php/Main_Page

http://www.w3.org/

Summary

• The Earth System Curator project is an activity developing schema and tools to capture “semantic” information about models.
  – Such information provides a basis for formally recording numerical experiments – a computational Earth science “lab book”.
  – It also provides the basis for a formal approach to reproducible numerical results – fewer “Journal of Irreproducible Results” candidates.
• Other efforts: SBML (systems biology), CML (chemistry) - already “uploads” to Science submissions.
• Maybe soon a computational Earth science challenge will become how to stop people doing dumb things with easy-to-use modeling services, rather than how to get people to use obtuse legacy modeling systems - maybe!

ESC collaboration

• NCAR (Cecelia DeLuca, Julien Chastang), MIT (Chris Hill, Constantinos Evangelinos), Georgia Tech (Spencer Rugaber, Rocky Dunlap, Angela), GFDL (Balaji, Sergey), Reading UK (Lois Steenman-Clark, Katherine Boughton), PRISM (Sophie Valcke).