From “lab books” to computational Earth science.
Chris Hill, MIT – [email protected]
Edinburgh, July 2007
Lab books
A lab notebook is a primary record of research. Researchers use a lab notebook to document their hypotheses, experiments and initial analysis or interpretation of these experiments. The notebook serves as an organizational tool, a memory aid, and can also have a role in protecting any intellectual property that comes from the research.
The guidelines for lab notebooks vary widely between institution and between individual labs, but some guidelines are fairly common. The lab notebook is usually written in as the experiments progress, rather than a later date. Many say that lab notebook should be thought of as a diary of activities that are described in sufficient detail to allow another scientist to follow the same steps.
To ensure that data cannot be easily altered, notebooks with permanently bound pages are often recommended. Researchers are often encouraged to write only with unerasable pen, to sign and date each page, and to have their notebooks inspected periodically by another scientist who can read and understand it. All of these guidelines can be useful in proving exactly when a discovery was made, in the case of a patent dispute.
Several companies now offer electronic lab notebooks. This format has gained some popularity, especially in large pharmaceutical companies, which have large numbers of researchers and great need to document their experiments.
(Source: Wikipedia)
Lab books
• Physical, chemical and biological scientists are taught lab-book discipline from an early age.
– Reproducible results are the foundation of scientific and engineering disciplines, e.g. Michelson/Morley.
– There is even an infamous “Journal of Irreproducible Results”.
• In computational science the “lab book” discipline is not so ubiquitous, maybe because
– a program is a formal statement of applied mathematical axioms
– axioms are deterministic
– therefore reproducibility is not an issue
– however, a program, i.e. a complex collection of simple elemental statements, is hard to comprehend. If details are not recorded, reproducibility may well be an issue.
Some example computational Earth science experiments.
• Aqua-planet.
• Eddying North Atlantic.
• Global ocean with eddies and seaice.
• IPCC
A simple GFD configuration
• Some factors that affect the solution:
– Initial conditions.
– Atmosphere: Clouds, radiation, dynamics, boundary layer, temporal and spatial discretization….
– Seaice: Thermodynamics. Aging. Stress-strain relation….
– Ocean: Dynamics, coordinate system, vertical/horizontal friction and mixing….
– Coupling: Time stepping, energetics.
– External forcings: Solar insolation, reference profiles.
Jean-Michel Campin and David Ferreira
Water covered planet. Atmosphere-ocean-seaice.
Red/blue shading: ocean heating/cooling.
Cyan/magenta line: +/-17.5°C @ 200m.
Streaks: Windstress.
Green thickness: Ocean mixed layer depth.
An eddying, ocean only configuration
Ocean-only, forced with atmospheric reanalysis for Jan-Mar.
• Some factors that affect the solution:
– Initial conditions.
– Atmosphere fluxes: Planetary boundary layer scheme.
– Ocean: Dynamics, coordinate system, vertical/horizontal friction and mixing….
– Coupling: Time stepping, energetics.
– External forcings: Solar insolation, reference profiles, atmospheric reanalysis.
– Non-linear/turbulent flow, so bitwise reproducibility is subject to FP round-off, the order of parallel reduction operations, etc…
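The round-off point can be illustrated with a minimal Python sketch. It assumes nothing about any particular model: the function names and the four input values are purely illustrative, and the "ranks" are simulated by summing contiguous chunks in one process. Floating-point addition is not associative, so changing how a sum is partitioned can change the answer.

```python
import math

def serial_sum(values):
    """Add values strictly left to right, as a single process would."""
    total = 0.0
    for v in values:
        total += v
    return total

def chunked_sum(values, n_chunks):
    """Mimic a parallel reduction: each simulated 'rank' sums one
    contiguous chunk, then the partial sums are combined left to right.
    Assumes len(values) is divisible by n_chunks."""
    size = len(values) // n_chunks
    partials = [serial_sum(values[i * size:(i + 1) * size])
                for i in range(n_chunks)]
    return serial_sum(partials)

# Illustrative values chosen so that round-off is visible.
values = [1e16, 1.0, -1e16, 1.0]

one_rank = chunked_sum(values, 1)    # same as a plain serial sum
two_ranks = chunked_sum(values, 2)   # same data, different grouping
exact = math.fsum(values)            # correctly rounded reference sum

print(one_rank, two_ranks, exact)
```

The same four numbers give different totals for one and two simulated ranks, and neither matches the correctly rounded sum. In a non-linear, turbulent flow such last-bit differences grow, which is why a record of the decomposition and reduction order matters for bitwise reproducibility.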
Global eddying ocean, sea-ice decadal ensemble. 50+ members.
Ensemble perturbations:
– Numerical formulation
– Ocean parameters
– Seaice parameters
– Initial conditions
– Boundary conditions
IPCC ocean ACC transports
[Bar chart: ACC transport in Sv (axis 0 to 400) for Observational, GISS-ER, GISS-AOM, MIROC3.2 (medres), MIROC3.2 (hires), CCCMA-CGCM3.1, MRI-CGCM2.3.2a, INM-CM3.0 and CNRM-CM3.]
Could I make this plot without too much difficulty? Yes.
Could I rerun an IPCC scenario (possibly with some parameter change)? No.
Diagnosing these results is possible today (PCMDI/ESG archives) for broad scientific community. Rerunning experiments (with or without small changes) is still very hard.
Factors affecting solution range from bottom drag to land-surface formulation to emissions profiles.
Couples atmosphere, ocean, seaice, land, vegetation, chemistry etc…
Examples summary
• To reproduce an experiment
– a significant quantity of information needs to be stored
– it spans broad “big-picture” information (water-covered planet, atmos+ocean+seaice) to minute details (bitwise reproducibility may require a record of compiler, OS etc…)
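As a rough illustration of what such a record might hold, here is a minimal Python sketch of one electronic "lab book" entry. The function name, the record layout and the parameter values are all hypothetical. A real model run would record its own compiler, libraries and build flags; this sketch can only capture the Python environment it runs in.

```python
import hashlib
import json
import platform
import sys
from datetime import datetime, timezone

def lab_book_entry(description, parameters, source_text):
    """Build one record spanning big-picture and minute details.
    The layout is a hypothetical example, not a standard."""
    return {
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        # Big picture: what the experiment is.
        "description": description,
        "parameters": parameters,
        # Fingerprint of the exact source or configuration text used.
        "source_sha256": hashlib.sha256(source_text.encode("utf-8")).hexdigest(),
        # Minute details: the environment the run executed in.
        "environment": {
            "os": platform.platform(),
            "machine": platform.machine(),
            "python": sys.version.split()[0],
            "python_compiler": platform.python_compiler(),
        },
    }

entry = lab_book_entry(
    description="water-covered planet, atmos+ocean+seaice",
    parameters={"timestep_s": 1200, "vertical_levels": 15},  # hypothetical values
    source_text="! placeholder for the model source or configuration text",
)
record = json.dumps(entry, indent=2, sort_keys=True)
print(record)
```

Writing the entry as JSON keeps it machine-readable, so later tools can query or compare runs instead of relying on hand-written notes.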
Way Forward
• A hand record is neither practical nor ideal (i.e. not as potentially useful as an electronic record).
• Electronic information should be stored so as to be amenable to machine reasoning.
– This requires defined vocabularies, precise formal structure, pattern matching, rules etc.
– W3C/semantic web technologies: XML, RDF, etc.
• In theory, using XML, RDF etc., we could describe model systems and enable reruns for extra outputs (e.g. transport of S3 by flow) or derived runs (e.g. modified air-sea coupling coefficient or formulation).
• In practice this is hard work!
Baby steps toward a computational Earth science “model repository”.
• What is working today – PCMDI/ESG
• Steps toward future - ESC
PCMDI
• Archive of all IPCC model outputs.
• Stored in common format (netCDF with standard metadata).
• Stored on common mesh. Simplifies things, but can/does degrade information and even mislead (e.g. conservation in one coordinate system may be inexact in another).
• Very limited model metadata is held.
• Very successful and technically impressive
– societal utility is a function of model quality!
Schmittner et al. (2005, GRL)
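The caveat about a common mesh can be made concrete with a small Python sketch. This is a toy 1-D example with made-up values and hypothetical function names, not the regridding actually used for the archive. It compares an averaging remap, which preserves the integral of a field, with simple subsampling, which does not.

```python
def integral(values, cell_width):
    """Integral of a piecewise-constant field on uniform cells."""
    total = 0.0
    for v in values:
        total += v * cell_width
    return total

def regrid_conservative(fine, factor):
    """Average each group of `factor` fine cells into one coarse cell."""
    return [sum(fine[i:i + factor]) / factor
            for i in range(0, len(fine), factor)]

def regrid_subsample(fine, factor):
    """Keep only the first fine cell of each group."""
    return fine[::factor]

# Made-up tracer values on 4 cells of width 1.
fine = [1.0, 3.0, 2.0, 8.0]

coarse_cons = regrid_conservative(fine, 2)   # 2 cells of width 2
coarse_sub = regrid_subsample(fine, 2)       # 2 cells of width 2

print(integral(fine, 1.0), integral(coarse_cons, 2.0), integral(coarse_sub, 2.0))
```

The averaged field keeps the original total while the subsampled one does not, so a budget that closes on the model's native mesh can appear not to close on the archived mesh.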
Earth System Curator (ESC)
Can we (for better or worse!) do for models what PCMDI does for datasets?
PCMDI datasets are data “wrapped” in a common/standard container (netCDF).
The PCMDI container is “self-describing”.
This means we can query and even combine (to some degree) the PCMDI datasets.
A container analogy for modeling technology is the “component architecture” supported by systems like ESMF.
Building a coupled model oriented solution: modeling system as a component tree
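• Some mathematics: component M
– no side-effects
– possible persistent internal state
• Supports representation as a DAG such that
[Diagram: root $M_{0}$ with children $M_{1,0}$ and $M_{2,0}$; $M_{1,0}$ has children $M_{1,1,0}$ and $M_{2,1,0}$; $M_{2,0}$ has children $M_{1,2,0}$, $M_{2,2,0}$ and $M_{3,2,0}$.]
[Equation garbled in extraction: it relates a parent component to its children, e.g. $M_{2,0}$ to $M_{1,2,0}$, $M_{2,2,0}$, $M_{3,2,0}$.]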
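A component tree of this shape can be sketched in a few lines of Python. The `Component` class, `walk` and `is_dag` are hypothetical names for illustration only, not part of ESMF or ESC, and the labels simply mirror the subscripts on the slide.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Component:
    """Hypothetical stand-in for a model component with sub-components."""
    name: str
    children: List["Component"] = field(default_factory=list)

def walk(component, depth=0):
    """Yield (depth, name) for the component and all its descendants."""
    yield depth, component.name
    for child in component.children:
        yield from walk(child, depth + 1)

def is_dag(root):
    """Return True if no component appears among its own ancestors."""
    def visit(node, ancestors):
        if id(node) in ancestors:
            return False
        ancestors = ancestors | {id(node)}
        return all(visit(child, ancestors) for child in node.children)
    return visit(root, frozenset())

# Tree with the same shape as the one on the slide.
root = Component("M_0", [
    Component("M_1,0", [Component("M_1,1,0"), Component("M_2,1,0")]),
    Component("M_2,0", [Component("M_1,2,0"), Component("M_2,2,0"),
                        Component("M_3,2,0")]),
])

for depth, name in walk(root):
    print("  " * depth + name)
print("is a DAG:", is_dag(root))
```

Checking the DAG property up front means a description of a coupled system can be rejected before any component is run.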
Example of actual component tree.
• Tree of components from the GEOS-5 modeling system.
• Each box is an ESMF component.
• Components adhere to DAG semantics.
Suarez et al.
Individual components in ESMF
• ESC builds on an ESMF-like component model.
– ESMF Component
• Container for a sequence of computation that implements a particular algorithm (physics simulation, e.g. Navier-Stokes solver, or technical function, e.g. history manager). An ESMF component exposes its external interfaces through an ESMF state.
– ESMF State
• Container data type to transport data between components.
– ESMF Field
• Container data type that can be used to push/pop n-dimensional data with an associated mesh from an ESMF State.
Given a component model, like the ESMF paradigm, ESC…
• Describes a component in terms of
– parameters that control the computation sequence.
– states and fields that are passed into/out of the component.
• Provides two levels of description
– potential and specific.
– Potential is a list of all possible parameters and fields. It is a virtualized description in that it is not describing a specific instance.
– Specific is a description of an instantiated component in which parameters are bound to specific values and fields and states are bound to specific values.
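The potential/specific distinction can be sketched in Python as a template and a bound instance. The class names, the `bind` method and the example parameters are hypothetical and do not reflect the actual ESC schema.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

@dataclass(frozen=True)
class SpecificDescription:
    """An instantiated component: every parameter and field is bound."""
    component: str
    parameters: Dict[str, float]
    fields: Dict[str, str]

@dataclass(frozen=True)
class PotentialDescription:
    """All parameters and fields a component could take; no values yet."""
    component: str
    parameters: Tuple[str, ...]
    fields: Tuple[str, ...]

    def bind(self, parameter_values, field_sources) -> SpecificDescription:
        """Return a specific description, rejecting missing or unknown names."""
        for declared, given, kind in (
            (self.parameters, parameter_values, "parameter"),
            (self.fields, field_sources, "field"),
        ):
            missing = [n for n in declared if n not in given]
            unknown = [n for n in given if n not in declared]
            if missing or unknown:
                raise ValueError(f"{kind}s missing={missing} unknown={unknown}")
        return SpecificDescription(self.component, dict(parameter_values),
                                   dict(field_sources))

# Hypothetical ocean component.
potential = PotentialDescription(
    component="ocean",
    parameters=("timestep_s", "vertical_viscosity"),
    fields=("wind_stress", "sst"),
)
specific = potential.bind(
    {"timestep_s": 1200.0, "vertical_viscosity": 1e-3},
    {"wind_stress": "atmosphere.export", "sst": "ocean.export"},
)
print(specific)
```

A derived run is then a second `bind` call with one value changed, which leaves a record of exactly what differs between the two experiments.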
ESC component descriptions are in terms of XML schema.
• Curator-NMM
– Describes numerical model parameters, e.g. timestep, system requirements.
• Gridspec
– Describes numerical mesh.
• Curator-CIAO
– Describes component inputs and outputs.
• Curator-complete
– Describes wiring together of components.
– A coupled component is also a component, i.e. the schema is recursive.
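What a recursive description looks like can be sketched with Python's standard `xml.etree.ElementTree`. The element and attribute names below (`component`, `coupling`, `name`, `from`, `to`, `field`) are invented for illustration and are not the actual Curator schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: a coupled component contains components,
# which may themselves contain components.
DOC = """
<component name="coupled_model">
  <component name="atmosphere">
    <component name="dynamics"/>
    <component name="physics"/>
  </component>
  <component name="ocean"/>
  <coupling from="atmosphere" to="ocean" field="wind_stress"/>
</component>
"""

def leaf_names(element):
    """Recursively collect the names of components with no sub-components."""
    children = element.findall("component")
    if not children:
        return [element.get("name")]
    names = []
    for child in children:
        names.extend(leaf_names(child))
    return names

root = ET.fromstring(DOC.strip())
leaves = leaf_names(root)
couplings = [(c.get("from"), c.get("to"), c.get("field"))
             for c in root.findall("coupling")]
print(leaves, couplings)
```

Because a coupled component has the same element type as its parts, one recursive function handles a description of any depth.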
Some details (more at http://www.earthsystemcurator.org) …..
Curator-NMM
• The Curator-NMM schema describes model components, their content, and their connections. It is a superset of the NMM schema. The main constructs in the Curator-NMM schema are component, potential model, and model. Components are "composable" pieces of code that can be coupled together in various arrangements to form different models. A potential model consists of a group of components, and describes the set of possible models that can be built from those components. A model is a fully specified application based on a potential model and configuration choices.
Curator-NMM
Mosaic Grid Specification
• The Mosaic Grid Specification is a standardized description of multi-patch, structured grids being developed in coordination with CF activities.
Mosaic Grid Specification
Component – component compatibility checking.
• ESC can describe coupled (multi-component) systems.
• In principle ESC could support recombination of components from coupled systems e.g. couple component A (atmosphere dynamics) with component B (land-surface).
• Ideally, for this, compatibility constraints need to be expressed in a standard way.
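One way such constraints might be expressed and checked is sketched below in Python. The `FieldSpec` type, the field names, units strings and grid labels are all hypothetical. A real check would draw on standard vocabularies (e.g. CF names and units) rather than plain string comparison.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical description of one exported or imported field."""
    units: str
    grid: str

def incompatibilities(exports: Dict[str, FieldSpec],
                      imports: Dict[str, FieldSpec]) -> List[str]:
    """List the reasons the exporting component cannot feed the importer."""
    problems = []
    for name, needed in imports.items():
        offered = exports.get(name)
        if offered is None:
            problems.append(f"{name}: not exported")
        elif offered.units != needed.units:
            problems.append(f"{name}: units {offered.units} != {needed.units}")
        elif offered.grid != needed.grid:
            problems.append(f"{name}: grid {offered.grid} != {needed.grid}")
    return problems

# Component A (atmosphere dynamics) exports, component B (land-surface) imports.
atmosphere_exports = {
    "precipitation": FieldSpec("kg m-2 s-1", "grid_a"),
    "air_temperature": FieldSpec("K", "grid_a"),
}
land_imports = {
    "precipitation": FieldSpec("mm day-1", "grid_a"),
    "air_temperature": FieldSpec("K", "grid_a"),
    "downward_shortwave": FieldSpec("W m-2", "grid_a"),
}
problems = incompatibilities(atmosphere_exports, land_imports)
print(problems)
```

An empty list would mean the two components can be wired together as described; each entry otherwise names a field that needs a unit conversion, a regridding step or a new export.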
Service architectures
• Standards services
– Developing standardized descriptions is a well-proven method toward a service oriented approach e.g.
Some useful (but an incomplete list of) URLs
Component models
http://www.esmf.ucar.edu
http://maplcode.org
Metadata & standards
http://www.earthsystemcurator.org
http://ncas-cms.nerc.ac.uk/NMM/
http://www.earthsystemgrid.org/
http://www.cgd.ucar.edu/cms/eaton/cf-metadata/
http://sbml.org/index.psp
http://cml.sourceforge.net/wiki/index.php/Main_Page
http://www.w3.org/
Summary
• The Earth System Curator project is an activity developing schema and tools to capture “semantic” information about models.
– Such information provides a basis for formally recording numerical experiments: a computational Earth science “lab book”.
– It also provides the basis for a formal approach to reproducible numerical results, and so fewer “Journal of Irreproducible Results” candidates.
• Other efforts: SBML (systems biology), CML (chemistry) - already “uploads” to Science submissions.
• Maybe soon a computational Earth science challenge will become how to stop people doing dumb things with easy-to-use modeling services, rather than how to get people to use obtuse legacy modeling systems. Maybe!
ESC collaboration
• NCAR (Cecelia Deluca, Julien Chastang), MIT (Chris Hill, Constantinos Evangelinos), Georgia Tech (Spencer Rubager, Rocky Dunlap, Angela), GFDL (Balaji, Sergey), Reading UK (Lois Steenman-Clark, Katherine Boughton), PRISM (Sophie Valcke).