INTEGRATED DATA SYSTEM FOR CRITICAL ZONE OBSERVATORIES
Mark Williams, University of Colorado
The water information value ladder
Monitoring
Collation
Quality assurance
Aggregation
Analysis
Reporting
Forecasting
Distribution
Done poorly
Done poorly to moderately
Sometimes done well, by many groups, but could be vastly improved
>>> Increasing value >>>
Integration
Data >>> Information >>> Insight
Slide Courtesy CSIRO, BOM, WMO, Ilya, Dozier
Provenance and transparency
CZOs as platforms for research
Integrating satellite & ground measurements with modeling
CZO measurements
provide the basis for
advances in multiple
Earth sciences
CZOs are DATA-RICH
places to develop &
test Earth system models
Challenges to CZO Data Management
Atmosphere
Biosphere
Hydrosphere
Lithosphere
Many Object & Data Types!
• Diverse media
• Sensor-based
– Stationary
– Mobile
– Spectra/photos
• Sample-based
– Sub-samples
– Preparations/Fractions
• Numeric & Categorical
Hillslope Catchment Watershed
Minutes
Decades
Millennia
Eons
Sample Fractions for Soil Geochemistry
Adapting SESAR IGSN for CZO
EA-IRMS, FTIR, SA
EA-IRMS, FTIR
EA-IRMS, FTIR
Ziplock (~500 g): bulk soil, horizon or depth increment
Al can (~70 g): for gamma counting 137Cs
DRY SIEVE 2 mm
Glass vial: <2 mm fines, dry sieved
>2 mm:
(1) Pick out plant roots & detritus, rinse with DI water, oven dry, mill (SPEX?)
Glass vial: plant detritus, milled
(2) Remaining pebbles & rocks, hard grind
Glass vial: pebbles, hard ground
<2 mm:
ICP-MS after Li-borate fusion
XRD?
WET SIEVE, or DENSITY, or SETTLING (with or without sonication)
Glass vial: sand + small detritus
Glass vial: silt + clay
The choice here is important. Do we want aggregates or not?
EA-IRMS, FTIR
ICP-MS after Li-borate fusion
XRD, CEC
SPEX mill
EA-IRMS, FTIR
ICP-MS after Li-borate fusion
SPEX mill
SA
XRD, CEC
SA
Extractions: dithionite-citrate extraction, Na pyrophosphate extraction, ammonium oxalate extraction
Christina River CZO example
Overall Approach
• Do not reinvent the wheel! Build on
– CUAHSI HIS, EarthChemDB, LTER, etc.
• Consistent data presentation on web
– Metadata
– Data values
• Central data system for data discovery
– Harvested by SDSC (pull system)
CZO data principles and policies
• Each CZO will operate and be responsible for its own local data management system for collecting, organizing, quality controlling and publishing data through its web site.
– Different philosophy than CUAHSI ODM
– Each CZO is master of its own data
• We don’t care what goes on under the hood
• Each site uses its own protocols, databases, etc.
• Allows CZO to honor site legacy data
CZO data principles and policies
• Each CZO publishes its data on the web in ASCII format with sufficient metadata so that the data can be unambiguously interpreted
• Metadata follows a prescribed format
– Data managers just need rules to follow
• Easy to harvest by central portal
• Makes it simple at the site level so scientists comply
– Addresses the chokepoint that is getting data/metadata from the scientists to data managers
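A minimal sketch of the pull model described above, in which a central harvester periodically fetches each site's published display files and notes which ones changed. The URLs, the `harvest` function, and the in-memory `seen` store are hypothetical illustrations, not the SDSC implementation; the fetcher is passed in so the sketch runs without network access.

```python
import hashlib
from typing import Callable, Dict, Iterable, List


def harvest(urls: Iterable[str],
            fetch: Callable[[str], bytes],
            seen: Dict[str, str]) -> List[str]:
    """Pull each display file and return the URLs whose content changed.

    `seen` maps URL -> SHA-256 digest of the last harvested content and is
    updated in place, so repeated calls only report new or modified files.
    """
    changed = []
    for url in urls:
        digest = hashlib.sha256(fetch(url)).hexdigest()
        if seen.get(url) != digest:
            seen[url] = digest
            changed.append(url)
    return changed


# Hypothetical site files standing in for real CZO web sites.
fake_web = {
    "http://example.org/site_a/water_chem.csv": b"GREENLAKE4,820311,6.4\n",
    "http://example.org/site_b/met_tower.csv": b"TOWER1,820311,-3.2\n",
}
state: Dict[str, str] = {}
first_pass = harvest(fake_web, fake_web.__getitem__, state)
second_pass = harvest(fake_web, fake_web.__getitem__, state)

# A site updates one file; only that file is reported on the next pull.
fake_web["http://example.org/site_b/met_tower.csv"] += b"TOWER1,820312,-1.0\n"
third_pass = harvest(fake_web, fake_web.__getitem__, state)
```

Because the central system only reads what each site publishes, sites stay free to run whatever database they like locally.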
Data Management Team
• David Tarboton, Utah State. PI on the CUAHSI Hydrologic Information System (HIS)
• Kerstin Lehnert, Columbia. PI on EarthChemDB
• Ilya Zaslavsky, Lead, SDSC Spatial Information Systems Lab; hosts CUAHSI HIS
• Mark Williams, CU-Boulder. PI Niwot Ridge LTER
• Anthony Aufdenkampe, co-I Christina River Basin CZO
Integrated CZO data system
Synthesizing information management experience and software from CZO partners and neighboring earth science projects into a standards-based system for publishing environmental data to emphasize the “critical zone” nature of our shared data sets
Local CZO DB
CZO Data Publication System
Spatial, hydrologic, geophysical, geochemical, imagery, spectral…
Local CZO DB Local CZO DB
Web site Web site Web site
Standard CZO Services
Standard CZO data display formats
CZO
Desktop
Matlab
R
Excel
ArcGIS
Modeling
CZO Desktop Applications
CZO Data Products
CZO Web-based Data Discovery
System
External cross-project registries
DataNet, NEON
CZO Data Repository and Indexing (CZO Central)
Data Publication Process (for hydrologic time series)
CZO Display File ODMWaterML
Service
OGC WFS
Service
Raw display file metadata is registered with the CZO data portal, to ensure the original data is discoverable and downloadable.
The WFS service is registered with the CZO data portal.
CZO Central Catalog
OGC CSW Service
The CZO Portal uses the OGC CSW (Catalog Services for the Web)
Catalog Search Service
CZO Desktop
Broader internet community
accessing data using standard
protocols.
CZO data interoperability: what does it mean
Find and download CZO resources: files and file collections, services, documents – organized by CZO thematic category and by type
Data available in compatible semantics: ontologies, controlled vocabularies
Data available via the same service interfaces (e.g. WFS, SOS) but different information models
Compatibility at the level of domain information models and databases
Deeper integration
Wider variety of data
Well-understood data with formal information models
available via standard services
Different types of data collected by CZOs
Data discovery portal
Shared vocabularies and ontology management
Service administration (CZO Central)
CZO Desktop, others
System components
Levels of interoperability
Data disclaimer
Data Catalogue
• Biogeochemistry: including anything on C (Carbon), N (Nitrogen), P (Phosphorus), nutrients, microbes
• Climatology/Meteorology: including met tower, temps, snow
• Ecology/Biology: including microbial, land use
• Geology/Chronology: including geologic, descriptions of rocks-mineralogy, CRN ages/rates
• Geomorphology: including topography, chronological data, sediment flux, fracture space
• Geophysics: including seismic refraction, etc.
• Geospatial: including GIS/RS, imagery, geologic map, Gordon Gulch and GLV cameras
Water Chemistry
• Header group (/doc): Title, Abstract, Investigator, Variable names, Keywords, Methods, Instrument, Citation, Publications, Comments
• Header group, column information
– COL1. label=ValueAttribute, value=site
– COL2. label=ValueAttribute, value=DateTime, UTCOffset=-7, Timezone=MST, format=”YYYYMMDD hh:mm”
– COL3. label=ValueAttribute, value=pH, units=pH, SampleMedium=water, units=pH units, missing value indicator=, ,methods=method1, etc.
• Header group, column (series) defaults that apply to all columns (e.g. site below)
• Data (/data)
GREENLAKE4,820311,6.4,18,88.51,0.40,,114.77,24.68,21.75,10.23,25.389,,58.296,83.200,,,,,,,,,,,,,,,,,,
GREENLAKE4,820422,5.7,18,90.15,2.00,,99.80,24.68,17.40,12.79,9.591,,72.870,44.928,,,,,,,,,,,,,,,,,,
• Automatically harvested using WaterML and EML
• ASCII format, metadata and comma-delimited data
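As a rough illustration of how the comma-delimited data section could be read, the sketch below parses the first few fields of the two example rows. It assumes the six-digit date in the sample rows is YYMMDD (the header example lists a longer format) and that an empty field is the missing value indicator; `parse_rows` and the returned dictionary layout are hypothetical, and the column names beyond site and date are not given on the slide, so values are kept as a plain list.

```python
import csv
import io
from datetime import datetime

# First eight fields of the two example rows from the slide.
SAMPLE = """GREENLAKE4,820311,6.4,18,88.51,0.40,,114.77
GREENLAKE4,820422,5.7,18,90.15,2.00,,99.80
"""


def parse_rows(text, date_format="%y%m%d"):
    """Parse display-file data rows into dictionaries.

    Column 1 is the site, column 2 the date (assumed YYMMDD), and the
    remaining columns are numeric values; empty fields become None.
    """
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if not fields:
            continue  # skip blank lines
        site, raw_date, *values = fields
        rows.append({
            "site": site,
            "date": datetime.strptime(raw_date, date_format).date(),
            "values": [float(v) if v.strip() else None for v in values],
        })
    return rows


rows = parse_rows(SAMPLE)
```

Keeping missing values as `None` rather than zero avoids silently biasing any statistics computed downstream.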
CZO Data Management Web Administration Interface
CZO data managers use this web-based system to register display files, edit service metadata, initiate data retrieval, validate the data against shared vocabularies, and update hydrologic time series services
The administration system will be extended to geochemical samples and other data
http://central.criticalzone.org
Services edited and validated by CZO data managers
Data managers control how their data is annotated.
Ingestion of display files is triggered on the server by the data manager.
Display file ingestion log
Editable service definitions and management interface for each CZO data service
CZO Central Catalog Statistics, March 24, 2011
(time series services only)
CZO Service       Sites  Variables    Values
Jemez River          14          1    154854
Boulder Creek         1         31     11834
Santa Catalina        5          6     59222
Luquillo              8         16    831098
Southern Sierra       8          4   1226330
Shale Hills           1         18    848624
Christina River      31          5   6870150
Total:               68         81  10002112
New Development: Central CZO Data Discovery Portal
Registered data are organized by CZO thematic categories
Display files from CZO web sites are registered to the data discovery portal automatically
In addition, display files of known types are expressed as data services, which are also registered in the portal
The portal is CSW-compliant (CSW=Catalog Services for the Web): can be federated with other catalogs including data.gov
Supports search by location, resource type, thematic category, keywords, plus full-text abstract search
Federation with CUAHSI HydroCatalog, to allow search of hydrologic data from ~70 networks
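Because the portal is described as CSW-compliant, a standard client should be able to search it with an OGC CSW 2.0.2 GetRecords request. The sketch below only builds such a request URL using key-value parameters from the CSW 2.0.2 specification; the endpoint path is a hypothetical placeholder (the slides do not give the catalog URL), and whether the portal accepts this exact constraint is an assumption.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_getrecords_url(endpoint, keyword, max_records=10):
    """Build a CSW 2.0.2 GetRecords URL for a full-text keyword search."""
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "elementSetName": "summary",
        "resultType": "results",
        "constraintLanguage": "CQL_TEXT",
        "constraint_language_version": "1.1.0",
        # AnyText is the full-text queryable of the csw:Record core profile.
        "constraint": f"AnyText LIKE '%{keyword}%'",
        "maxRecords": str(max_records),
    }
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical endpoint, for illustration only.
url = build_getrecords_url("http://example.org/czo/csw", "snow")
query = parse_qs(urlsplit(url).query)
```

The same request shape is what lets the catalog be federated with other CSW catalogs, since each side only needs to speak the shared protocol.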
Local CZO DB
Shared Vocabulary
Spatial, hydrologic, geophysical, geochemical, imagery, spectral…
Local CZO DB Local CZO DB
Web site Web site Web site
Shared Vocabulary
Standard CZO data display formats
CZO
Desktop
Matlab
R
Excel
ArcGIS
Modeling
CZO Desktop Applications
CZO Data Products
CZO Web-based Data Discovery
System
External cross-project registries
DataNet
CZO Data Repository and Indexing (CZO Central)
CZO Shared Vocabulary System
Purpose: To promote the consistent use of terminology.
http://sv.criticalzone.org
Builds on CUAHSI HIS
SV Database
Data Managers and SV
Data Managers
❶
❷
CSV Data File
Unknown Term Email
Local CZO Website
Observation Database
CSV Data File
❸ Request Term Web Page
XML SV List
XML SV List
Preferred vocabularies. Moderators to be designated by CZO with expertise in each category
• Variable names (extended from CUAHSI HIS)
• Units (extended from CUAHSI HIS) (e.g. m, g/L)
• Value type (from CUAHSI HIS) (e.g. field observation, derived value, model output)
• Sample type (from CUAHSI HIS) (e.g. stream water, ground water, rock, soil)
• Data type (from CUAHSI HIS) (e.g. average over interval, cumulative, continuous, sporadic)
• Data level (based on Ameriflux list) (e.g. level 0 = raw data, level 4 = fully infilled and quality controlled)
• Spatial references (extensible based on EPSG) (e.g. NAD 1983, WGS84, UTM zone 11)
• KEY: CZO expands ODM controlled vocabularies to a larger audience using “preferred vocabularies”
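A small sketch of how a data file's column metadata might be checked against the preferred vocabularies before ingestion. The term sets contain only the examples listed above, and `find_unknown_terms` plus the metadata field names are hypothetical; in the real system the lists would come from the shared vocabulary service, and unknown terms would be routed to a moderator.

```python
# Example terms taken from the list above; the real lists are much longer.
PREFERRED = {
    "units": {"m", "g/L"},
    "sample_type": {"stream water", "ground water", "rock", "soil"},
    "value_type": {"field observation", "derived value", "model output"},
}


def find_unknown_terms(column_metadata, vocab=PREFERRED):
    """Return (column name, field, term) for every term not in the vocabulary.

    Matching ignores case and surrounding whitespace; fields that a column
    does not declare are skipped rather than reported.
    """
    unknown = []
    for column in column_metadata:
        for field, allowed in vocab.items():
            term = column.get(field)
            if term is None:
                continue
            if term.strip().lower() not in {a.lower() for a in allowed}:
                unknown.append((column.get("name", "?"), field, term))
    return unknown


columns = [
    {"name": "stage", "units": "m", "sample_type": "stream water"},
    {"name": "pH", "units": "pH units", "sample_type": "Stream Water"},
]
unknown = find_unknown_terms(columns)
```

Reporting the offending column and field alongside the term gives a moderator enough context to decide whether to add the term or map it to an existing one.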
Methods
1. Major problem for metadata
2. Solution: lookup table that is part of the controlled vocabulary
3. Three parts: sample collection, sample preparation, analytical procedure
4. Up and running, needs moderators
Local CZO DB
CZO Spatial Data
Spatial, hydrologic, geophysical, geochemical, imagery, spectral…
Local CZO DB Local CZO DB
Web site Web site Web site
Spatial Data
Standard CZO data display formats
CZO
Desktop
Matlab
R
Excel
ArcGIS
Modeling
CZO Desktop Applications
Standard CZO
Services
CZO Web-based Data Discovery
System
CZO Data Repository and Indexing (CZO
Central)
Metadata and Spatial View
Spatial View
• Metadata
– Multi File control
• Spatial Extent
– Ex: LiDAR flights, transects, etc.
– Point data (collected at particular location)
– Uses Google Maps API
– KML functionality
Guo lab, UC Merced
CZO Desktop
Matlab
R
Excel
ArcGIS
Modeling
Local CZO DB
Geochemical Samples (based on CZEN)
Geochemical samples
Local CZO DB Local CZO DB
Web site Web site Web site
Geochemical web services, EarthChemDB
Standard CZO data display formats
CZO Desktop Applications
Depth-resolved
geochemistry
CZO Web-based Geochemical DB
EarthChem Data Engine & Portal
Location (Watershed)
Sampling Site (Soil / Water)
Analysis
Sample (Layer/Depth)
Preparat./Treatment
12
.
.
.
Sub-smpl 2
Sub-sample
Sub-smpl n
Chemical
Phys. Minr
Others
Data
Loc_info/Climate
Methods
Sources
Precision
Var-Lookup/Unit
Meta-Data
Main Data
Geo-Info
Publication
Project
SMPLTime Series
Landuse/Veg.
Lab-Info
Personcontributor
Preparation/Treatment
Sample
Country/State
Lab Analysis
Sub-Sample
CZO Chemistry Database Conceptual Model – (CZOCHEMDB)
Penn State lead
Progress
Database is accessible at www.czo.psu.edu
PSU CZO students and post-docs have used template for data entry
Susan Melzar (Colorado State) has used template and data has been entered into database
Published data from Muhs et al. (2001), Harden 1987, White et al. (2008)
Current version contains 1391 records, representing 17,604 data values
Ran webinar on August 24th to show database capabilities and usage of the data entry template
15 people participated, with representation from all 6 CZOs
User guide is in progress
datasets
(original data & derived products)
GCDM DB
Integration with EarthChemDB
USGS
NAVDAT
GEOROC
GfG Data Entry
User Submission
External Databases
Topical Data
Collections
Kerstin Lehnert
EarthChem Portal
PetDB
Others
USGS
GEOROC
NAVDAT
XML
XML
XML
XML
XML
Partner databases encode their data & metadata in XML and send them to the EarthChem portal database in Kansas.
Queries submitted at the EarthChem portal search the contents of the EarthChem Portal Database. Similar to our ODM hydrology portal.
INTERNATIONAL GEOSAMPLE NUMBER
• Purpose: Unique identification for samples and related sampling features in the Earth Sciences
– To allow unambiguous referencing of data to samples in publications and data systems
– To allow tracking samples through repositories & labs
– To allow integration of distributed data for samples

Name   Location           Publication
D3-1   SEIR               ANDERSON, 1980
D3-1   North Fiji Basin   EISSEN 1994
D3-1   Shimada Smt        GRAHAM 1988
D3-1   Gorda Ridge        CLAGUE 1984
3-1    Lamont Smts        BATIZA 1982
Geoinformatics for Geochemistry
Core
Core Section 1
Core Section 3
Core Section 2
Sample 1
Sample 2
Sample 1
Sample 2
Sample 3
Sample 1
Sample 2
Sample 3
Rock powder
Mineral conc.
Leachate
Fossil separate
Microprobe mount
Parent
Child
IGSN:XXX000120
IGSN:XXX0065B3
IGSN:XXX9K23G6
IGSN:XXX07ST4K
IGSN:XYZ0G693M
IGSN:ABC0L98SW
IGSN:ABC0L53NW
IGSN:ABC0L653X
IGSN:ABC078HGB
IGSN International Organization
SESAR
Near Space Observatory (invented example)
ExoPlanet (invented example)
CZO
Geoscience Australia
USGS
IEDA
ICDP
Repository
Analytical Lab
Investigator
Registrar
Registration Agents:
Registrants:
Managing Agent:
ADAPTING IGSN for CZO
• Register any type of sample: pedons, hand specimens, mineral concentrates, etc.
• Register any type of material: soil, rock, sediment, fluid, gas, bio, …
• Register ‘sample-related features’: sites, wells, cores, dredges, …
• Register relations (parent – children): e.g. site → pedon → mineral
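The parent-child registration idea can be sketched as a small in-memory registry in which every object, whether a site, a pedon, or a mineral concentrate, carries a unique identifier and an optional parent. The class names and fields are hypothetical, and the identifiers are the placeholder IGSNs from the earlier diagram; this is an illustration of the data structure, not the SESAR registration API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class RegisteredObject:
    igsn: str                     # unique identifier
    kind: str                     # e.g. site, pedon, mineral concentrate
    material: str                 # e.g. soil, rock, sediment
    parent: Optional[str] = None  # IGSN of the parent object, if any


class Registry:
    def __init__(self) -> None:
        self._objects: Dict[str, RegisteredObject] = {}

    def register(self, obj: RegisteredObject) -> None:
        """Add an object; its identifier must be new and its parent known."""
        if obj.igsn in self._objects:
            raise ValueError(f"{obj.igsn} is already registered")
        if obj.parent is not None and obj.parent not in self._objects:
            raise KeyError(f"unknown parent {obj.parent}")
        self._objects[obj.igsn] = obj

    def lineage(self, igsn: str) -> List[str]:
        """Return identifiers from the top-level ancestor down to `igsn`."""
        chain = []
        current: Optional[str] = igsn
        while current is not None:
            chain.append(current)
            current = self._objects[current].parent
        return list(reversed(chain))

    def children(self, igsn: str) -> List[str]:
        return [o.igsn for o in self._objects.values() if o.parent == igsn]


registry = Registry()
registry.register(RegisteredObject("IGSN:XXX000120", "site", "soil"))
registry.register(RegisteredObject("IGSN:XXX0065B3", "pedon", "soil",
                                   parent="IGSN:XXX000120"))
registry.register(RegisteredObject("IGSN:XXX9K23G6", "mineral concentrate",
                                   "soil", parent="IGSN:XXX0065B3"))
path = registry.lineage("IGSN:XXX9K23G6")
```

Requiring the parent to exist before a child is registered keeps the relations acyclic, so walking a lineage always terminates.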
Exploring A More General Data Model: ODM 2.0
• To achieve interoperability between EarthChem, CUAHSI ODM, LTER EML
• Better support for samples and unique identifiers (IGSN/SESAR)
• Extensibility to table attributes
• Better annotation and provenance
• Enable integrated web service based publication of a broader class of CZO data
ODM 2.0 – Field Sensor Extension to support field sensor deployments and in situ observations
• Sensor deployment details
• Attributes of sensor
• Data series from sensor
ODM 2.0 – Provenance and Annotations Extensions
• Better support for storing provenance of observational data
General Extensibility
Provides capability to record information (add fields) in tables that was not anticipated a priori
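One common way to get this kind of extensibility is a companion property table keyed by record and property name, so new attributes can be stored without altering the core table. The sketch below shows that pattern with SQLite; the table and column names are hypothetical and are not taken from the ODM 2.0 schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sample (
    sample_id INTEGER PRIMARY KEY,
    code      TEXT NOT NULL
);
-- Extension table: one row per (sample, property) pair.
CREATE TABLE sample_extension (
    sample_id INTEGER NOT NULL REFERENCES sample(sample_id),
    property  TEXT NOT NULL,
    value     TEXT,
    PRIMARY KEY (sample_id, property)
);
""")


def set_property(conn, sample_id, prop, value):
    """Store or overwrite an unanticipated attribute for a sample."""
    conn.execute(
        "INSERT OR REPLACE INTO sample_extension VALUES (?, ?, ?)",
        (sample_id, prop, str(value)),
    )


def get_properties(conn, sample_id):
    """Return all extension attributes of a sample as a dictionary."""
    rows = conn.execute(
        "SELECT property, value FROM sample_extension "
        "WHERE sample_id = ? ORDER BY property",
        (sample_id,),
    )
    return dict(rows.fetchall())


conn.execute("INSERT INTO sample (sample_id, code) VALUES (1, 'bulk-soil-01')")
set_property(conn, 1, "sieve_size_mm", 2)
set_property(conn, 1, "sonicated", "no")
props = get_properties(conn, 1)
```

The trade-off is that values are stored as text, so typed queries over extension attributes need casting or a per-property type registry.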
CZchemDB
CZO-Central GeoChemDB [ODM 2.0]
CZO-Services
EarthChem Portal
USGS
NAVDAT
GEOROC
Geochemical database
EarthChemXML
CZO Data Display Format
Geochem Services (IEDA)
CZO Web Discovery
GeoChemDB Search
Web-based User Access
CZO Desktop
GfG Data Validation & Ingest
IEDA Long-Term Archiving Service
IEDA Data Publication Service
(DataCite)
SESAR
Sample Registration
EarthChemXML
Other client systems
Other client systems
Where we are today
• Each site has a data manager
• Data sets are posted to the web
– consistent metadata and ASCII format in progress
• We’ve prototyped harvesting data and posting to a central data portal
• Shared vocabulary system in place
• Developed protocol for unique sample ID
• Partnering with EarthChemDB
• Expanding ODM to become more general
• Way beyond what I thought possible
Work plan for next two years
• Extending the CZO data publication model to geochemical and GIS data; then to other types of data – towards deeper interoperability
• Integration based on service and information model standards (WaterML, EarthChemXML, EML, OGC services)
– Requirements gathering from all CZOs, data modeling, display file format specification, services specification, development and validation
– Upgrade to WaterML 2 once approved as international standard (~Q3, 2011)
• Registering more hydrologic time series data via CZO Central
– Regularly harvesting registered files and updating CZO services; keeping provenance information
• Enhancing parameter-based search across CZOs, with a shared parameter ontology
• Making CZO central data system more robust
– Currently a single server with 24/7 monitoring; need redundant setup
• Enhancing role of Data Managers