INTEGRATED DATA SYSTEM FOR CRITICAL ZONE OBSERVATORIES
Mark Williams, University of Colorado
Award Conditions for Critical Zone Observatories
• CZOs are expected to go beyond conducting scientific studies by the CZO PIs to providing data and field sites for use by each other and by the broader community.
Award Conditions for Critical Zone Observatories
• CZOs are required to enter data (whether directly measured by the CZO or obtained from elsewhere) into a public database that can be accessed through a common portal within a reasonable time after the measurements are taken.
Award Conditions for Critical Zone Observatories
• The CZOs are required to submit a common data model and ontology with their first annual reports and to provide annual updates on additions.
The water information value ladder
(rungs listed from top to bottom, with the slide's assessment of each group)
• Forecasting, Analysis, Reporting: done poorly
• Distribution, Integration, Aggregation: done poorly to moderately
• Quality assurance, Collation, Monitoring: sometimes done well, by many groups, but could be vastly improved
Slide Courtesy CSIRO, BOM, WMO, Ilya, Dozier
Future Directions for Critical Zone Observatory (CZO) Science
• Develop a unifying theoretical framework of critical-zone evolution
• Develop coupled systems models to explore how critical-zone services respond to anthropogenic, climatic and tectonic forcings
• Develop an integrated data/measurement framework to support these activities
• Interoperability
• Provenance and transparency
CZOs as platforms for research
Integrating satellite & ground measurements with modeling
• CZO measurements provide the basis for advances in multiple Earth sciences
• CZOs are DATA-RICH places to develop & test Earth system models
Challenges to CZO Data Management
Many Object & Data Types!
• Diverse media
• Sensor-based
– Stationary
– Mobile
– Spectra/photos
• Sample-based
– Sub-samples
– Preparations/Fractions
• Numeric & Categorical
[Figure labels. Media: Atmosphere, Biosphere, Hydrosphere, Lithosphere. Time scales: Minutes, Decades, Millennia, Eons. Spatial scales: Hillslope, Catchment, Watershed.]
Sample Fractions for Soil Geochemistry
Adapting SESAR IGSN for CZO
[Flowchart of sample fractions and analyses]
• Bulk soil: Ziplock (~500 g), horizon or depth increment
• Al can (~70 g) for gamma counting 137Cs
• DRY SIEVE 2 mm
– <2 mm fines, dry sieved (glass vial)
– >2 mm: (1) pick out plant roots & detritus, rinse with DI water, oven dry, mill (SPEX?), giving glass vial: plant detritus, milled; (2) remaining pebbles & rocks, hard grind, giving glass vial: pebbles, hard ground
• WET SIEVE, or DENSITY, or SETTLING (with or without sonication)
– glass vial: sand + small detritus
– glass vial: silt + clay
– The choice here is important. Do we want aggregates or not?
• Analyses and preparations shown on the fractions: EA-IRMS, FTIR, XRD, CEC, SA, SPEX mill, ICP-MS after Li-borate fusion
• XRD? Extractions: dithionite-citrate extraction, Na pyrophosphate extraction, ammonium oxalate extraction
Christina River CZO example
Overall Approach
• Do not reinvent the wheel! Build on
– CUAHSI HIS, EarthChemDB, LTER, etc.
• Consistent data presentation on web
– Metadata
– Data values
• Central data system for data discovery
– Harvested by SDSC (pull system)
CZO data principles and policies
• Each CZO will operate and be responsible for its own local data management system for collecting, organizing, quality controlling and publishing data through its web site.
– Different philosophy than CUAHSI ODM
– Each CZO is master of its own data
• We don't care what goes on under the hood
• Each site uses its own protocols, databases, etc.
• Allows CZO to honor site legacy data
CZO data principles and policies
• Each CZO publishes its data on the web in ASCII format with sufficient metadata so that the data can be unambiguously interpreted
• Metadata follows a prescribed format
– Data managers just need rules to follow
• Easy to harvest by central portal
• Makes it simple at the site level so scientists comply
– Addresses the chokepoint that is getting data/metadata from the scientists to data managers
Data Management Team
• David Tarboton, Utah State. PI on the CUAHSI Hydrologic Information System (HIS)
• Kerstin Lehnert, Columbia. PI on EarthChemDB
• Ilya Zaslavsky, Lead, SDSC Spatial Information Systems Lab; hosts CUAHSI HIS
• Mark Williams, CU-Boulder. PI Niwot Ridge LTER
• Anthony Aufdenkampe, co-I Christina River Basin CZO
• Data Managers: Introduce yourselves
Data Management Meeting
NSF has agreed to fund
Great meeting!
Integrated CZO data system
Synthesizing information management experience and software from CZO partners and neighboring earth science projects into a standards-based system for publishing environmental data to emphasize the "critical zone" nature of our shared data sets
CZO Data Publication System
[Architecture diagram. Bottom layer: local CZO databases and web sites holding spatial, hydrologic, geophysical, geochemical, imagery, spectral… data, published in standard CZO data display formats. Middle layer: CZO Data Repository and Indexing (CZO Central), with shared vocabularies, CZO metadata, ontology, archive and harvester. Top layer: standard CZO services, CZO data products, the CZO web-based data discovery system, and CZO Desktop applications (Desktop, Matlab, R, Excel, ArcGIS, Modeling). External cross-project registries: DataNet, NEON.]
Data Publication Process (for hydrologic time series)
[Diagram labels: CZO Display File, ODM, WaterML Service, OGC WFS Service, CZO Catalog Search Service, Central Catalog, OGC CSW Service, CZO data portal, CZO Desktop, broader internet community accessing data using standard protocols.]
• Raw display file metadata is registered with the CZO data portal, to assure original data is discoverable and downloadable.
• WFS service is registered with the CZO data portal.
• CZO Portal utilizes the OGC CSW (catalog services for the web).
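As a rough illustration of what "accessing data using standard protocols" could look like from a client, the sketch below builds key-value-pair request URLs for an OGC CSW 2.0.2 catalog. The endpoint address and the helper name `build_csw_url` are assumptions made for this example; the slides do not give the portal's actual service URL.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the slides do not give the portal's CSW address.
CSW_ENDPOINT = "https://example.org/czo/csw"


def build_csw_url(endpoint: str, request: str, **params: str) -> str:
    """Build a key-value-pair CSW 2.0.2 request URL."""
    query = {"service": "CSW", "version": "2.0.2", "request": request}
    query.update(params)
    return endpoint + "?" + urlencode(query)


# Ask the catalog what it supports.
capabilities_url = build_csw_url(CSW_ENDPOINT, "GetCapabilities")

# Ask for the first ten catalog records in summary form.
records_url = build_csw_url(
    CSW_ENDPOINT,
    "GetRecords",
    typeNames="csw:Record",
    resultType="results",
    elementSetName="summary",
    maxRecords="10",
)

print(capabilities_url)
print(records_url)
```

Either URL could then be fetched with any HTTP client; the response is an XML document that a client such as a desktop application would parse.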
CZO data interoperability: what does it mean
(levels of interoperability, from wider variety of data to deeper integration, each with its system component)
• Find and download CZO resources: files and file collections, services, documents – organized by CZO thematic category and by type. Component: data discovery portal
• Data available in compatible semantics: ontologies, controlled vocabularies. Component: shared vocabularies and ontology management
• Data available via the same service interfaces (e.g. WFS, SOS) but different information models. Component: service administration (CZOCentral)
• Compatibility at the level of domain information models and databases. Component: CZOdesktop, others
Range of data: from different types of data collected by CZOs to well-understood data with formal information models available via standard services
Water Chemistry
• Header group (/doc): Title, Abstract, Investigator, Variable names, Keywords, Methods, Instrument, Citation, Publications, Comments
• Header group, column information
– COL1. label=ValueAttribute, value=site
– COL2. label=ValueAttribute, value=DateTime, UTCOffset=-7, Timezone=MST, format="YYYYMMDD hh:mm"
– COL3. label=ValueAttribute, value=pH, units=pH, SampleMedium=water, units=pH units, missing value indicator=, methods=method1, etc.
• Header group, column (series) defaults that apply to all columns (e.g. site below)
• Data (/data)
• GREENLAKE4,820311,6.4,18,88.51,0.40,,114.77,24.68,21.75,10.23,25.389,,58.296,83.200,,,,,,,,,,,,,,,,,,
• GREENLAKE4,820422,5.7,18,90.15,2.00,,99.80,24.68,17.40,12.79,9.591,,72.870,44.928,,,,,,,,,,,,,,,,,,
• Automatically harvested using WaterML and EML
• ASCII format, metadata and comma-delimited data
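To make the data section of the display file concrete, here is a minimal sketch of how a harvester might read the comma-delimited rows, treating empty fields as missing values. The function name `parse_data_rows` is hypothetical, the two rows are shortened copies of the rows on the slide, and only the first two columns (site and date) are named, because the slide does not say what the remaining columns are.

```python
import csv
from io import StringIO

# Shortened copies of the two data rows shown on the slide.
SAMPLE_DATA = (
    "GREENLAKE4,820311,6.4,18,88.51,0.40,,114.77\n"
    "GREENLAKE4,820422,5.7,18,90.15,2.00,,99.80\n"
)


def parse_data_rows(text, missing=""):
    """Parse comma-delimited rows; fields equal to `missing` become None."""
    rows = []
    for record in csv.reader(StringIO(text)):
        if not record:
            continue  # skip blank lines
        site, date, *values = record
        rows.append({
            "site": site,
            "date": date,  # kept as text; the date format is set by the header
            "values": [None if v.strip() == missing else float(v)
                       for v in values],
        })
    return rows


rows = parse_data_rows(SAMPLE_DATA)
for row in rows:
    print(row["site"], row["date"], row["values"])
```

A fuller reader would take the column names, units and missing value indicator from the header groups instead of hard-coding them.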
New Development: Central CZO Data Discovery Portal
Registered data are organized by CZO thematic categories
CZO Shared Vocabulary System
Purpose:
To promote the consistent use of terminology.
http://sv.critialzone.org
Builds on CUAHSI HIS: Interoperability
Data Managers and SV
[Workflow diagram labels: CSV data file, local CZO website, observation database, unknown term, data managers, term email, request term web page, request database, SV database, XML SV list.]
Shared vocabularies. Moderators to be designated by CZO with expertise in each category
• Variable names (extended from CUAHSI HIS)
• Units (extended from CUAHSI HIS) (e.g. m, g/L)
• Value type (from CUAHSI HIS) (e.g. field observation, derived value, model output)
• Sample type (from CUAHSI HIS) (e.g. stream water, ground water, rock, soil)
• Data type (from CUAHSI HIS) (e.g. average over interval, cumulative, continuous, sporadic)
• Data level (based on Ameriflux list) (e.g. level 0 = raw data, level 4 = fully infilled and quality controlled)
• Spatial references (extensible based on EPSG) (e.g. NAD 1983, WGS84, UTM zone 11)
• KEY: CZO expands ODM controlled vocabularies to a larger audience using "shared vocabularies"
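One way to picture how a shared vocabulary is used is a check that flags terms in a data file's metadata that the vocabulary does not yet contain, so they can be sent to a moderator. The sketch below is an assumption-laden illustration: the dictionary holds only the example terms listed above, and the names `SHARED_VOCABULARY` and `find_unknown_terms` are invented for this example.

```python
# Example terms taken from the list above; a real vocabulary would be
# loaded from the shared vocabulary service, not hard-coded.
SHARED_VOCABULARY = {
    "value_type": {"field observation", "derived value", "model output"},
    "sample_type": {"stream water", "ground water", "rock", "soil"},
    "data_type": {"average over interval", "cumulative", "continuous",
                  "sporadic"},
}


def find_unknown_terms(metadata, vocabulary):
    """Return (category, term) pairs that are not in the vocabulary."""
    unknown = []
    for category, term in metadata.items():
        allowed = vocabulary.get(category)
        if allowed is None:
            continue  # category is not under vocabulary control
        if term.strip().lower() not in allowed:
            unknown.append((category, term))
    return unknown


file_metadata = {
    "value_type": "Field observation",
    "sample_type": "snow",           # not in the example vocabulary
    "data_type": "continuous",
}
print(find_unknown_terms(file_metadata, SHARED_VOCABULARY))
```

Terms returned by the check would be candidates for a term request to the moderators of that category.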
Methods
1. Major problem for metadata
2. Solution: lookup table that is part of the controlled vocabulary
3. Three parts: sample collection, sample preparation, analytical procedure
4. Up and running, needs moderators
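The lookup-table idea can be sketched as a small record with the three parts named above, keyed by a short method code that a data file can cite. Everything concrete here is an assumption for illustration: the code `method1`, the field contents and the names `Method`, `METHODS` and `resolve_method` are not taken from the actual CZO vocabulary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Method:
    """One row of a methods lookup table, split into its three parts."""
    collection: str    # how the sample was collected
    preparation: str   # how the sample was prepared
    analysis: str      # analytical procedure applied


# Illustrative entry only; real entries would be curated by moderators.
METHODS = {
    "method1": Method(
        collection="bulk soil by depth increment",
        preparation="dry sieve, 2 mm",
        analysis="XRD",
    ),
}


def resolve_method(code: str) -> Optional[Method]:
    """Look up a method code cited in a data file; None if unregistered."""
    return METHODS.get(code)


print(resolve_method("method1"))
print(resolve_method("method99"))
```

Keeping the three parts as separate fields lets two data sets that share a collection protocol but differ in analysis be compared part by part.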
CZO Spatial Data
[Architecture diagram. Bottom layer: local CZO databases and web sites holding spatial, hydrologic, geophysical, geochemical, imagery, spectral… data, published in standard CZO data display formats. Middle layer: CZO Data Repository and Indexing (CZO Central), with shared vocabularies, CZO metadata, ontology, archive and harvester. Top layer: standard CZO data services, spatial data, the CZO web-based data discovery system, and CZO Desktop applications (Desktop, Matlab, R, Excel, ArcGIS, Modeling).]
Metadata and Spatial View
• Metadata
- Multi-file control
• Spatial Extent
- Ex: LiDAR flights, transects, etc.
- Point data (collected at particular location)
• Spatial View
- Uses Google Maps API
- KML functionality
Guo lab, UC Merced
Geochemical Samples (based on CZEN)
[Architecture diagram. Bottom layer: geochemical samples in local CZO databases and web sites, published in standard CZO data display formats. Middle layer: shared vocabularies, metadata, IGSN management, archive and harvester, feeding the CZO web-based geochemical DB. Top layer: geochemical web services and EarthChemDB, the EarthChem Data Engine & Portal, depth-resolved geochemistry, and CZO Desktop applications (Matlab, R, Excel, ArcGIS, Modeling).]
CZO Chemistry Database Conceptual Model – (CZOCHEMDB)
[Entity diagram labels: Var-Lookup/Unit, Meta-Data, Publication, Project, Person/contributor, Country/State, Loc_info/Climate, Geo-Info, Landuse/Veg., Methods, Sources, Precision, Time Series, Lab-Info, Preparation/Treatment, Sample (SMPL), Sub-sample 1, 2, … n, Lab Analysis (Chemical, Phys., Minr., Others), Main Data.]
[Hierarchy labels: Location (Watershed), Sampling Site (Soil / Water), Sample (Layer/Depth), Sub-Sample, Preparation/Treatment, Analysis, Data.]
Penn State lead
Progress
Database is accessible at www.czo.psu.edu
PSU CZO students and post-docs have used template for data entry
Susan Melzar (Colorado State) has used template and data has been entered into database
Published data from Muhs et al. (2001), Harden 1987, White et al. (2008)
Current version contains 1391 records, representing 17,604 data values
Ran webinar August 24th to show database capabilities and usage of data entry template
15 participated with representation from all 6 CZOs
User guide is in progress
Integration with EarthChemDB
[Diagram labels: EarthChem Portal; EarthChem XML DB; Topical Data Collections; Geochemical Resource Library: datasets (original data & derived products); Metadata catalog; External Databases: GEOROC, USGS, NAVDAT; GCDM DB; GfG Data Entry; User Submission.]
Kerstin Lehnert
EarthChem Portal
[Diagram: partner databases (USGS, GEOROC, NAVDAT, PetDB, others) send EarthChem XML to the EarthChem Data Engine database; EarthChem Data Engine search & visualization sits on top.]
• Partner databases encode their data & metadata in XML and send them to the EarthChem portal database in Kansas.
• Queries submitted at the EarthChem portal search the contents of the EarthChem Portal Database.
Similar to our ODM hydrology portal
INTERNATIONAL GEOSAMPLE NUMBER
• Purpose: Unique identification for samples and related sampling features in the Earth Sciences
– To allow unambiguous referencing of data to samples in publications and data systems
– To allow tracking samples through repositories & labs
– To allow integration of distributed data for samples
[Diagram: parent-child sample hierarchy. A core (parent) is divided into core sections 1, 2 and 3 (children); each section yields samples 1, 2, 3; samples yield derived materials such as fossil separate, microprobe mount, rock powder, mineral concentrate and leachate. Each object carries its own IGSN; identifiers shown include IGSN:XXX000120, IGSN:XXX0065B3, IGSN:XXX07ST4K, IGSN:XXX9K23G6, IGSN:ABC0L53NW, IGSN:ABC0L653X, IGSN:ABC078HGB, IGSN:ABC0L98SW and IGSN:XYZ0G693M.]
Geoinformatics for Geochemistry
ADAPTING IGSN for CZO
• Register any type of sample: pedons, hand specimens, mineral concentrates, etc. …
• Register any type of material: soil, rock, sediment, fluid, gas, bio ….
• Register 'sample-related features': sites, wells, cores, dredges …
• Register relations (parent – children): e.g. site, pedon, mineral
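The parent-children relation can be sketched as a record that points to its parent, so that any registered object can be traced back to the site it came from. This is a minimal illustration, not SESAR's data model: the class name `RegisteredObject` and the identifiers are placeholders invented for the example and are not real IGSNs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RegisteredObject:
    """A sample or sample-related feature with an optional parent."""
    identifier: str          # placeholder identifier, not a real IGSN
    kind: str                # e.g. "site", "pedon", "mineral concentrate"
    parent: Optional["RegisteredObject"] = None

    def lineage(self) -> List[str]:
        """Identifiers from the top-level ancestor down to this object."""
        chain = []
        node: Optional[RegisteredObject] = self
        while node is not None:
            chain.append(node.identifier)
            node = node.parent
        return list(reversed(chain))


site = RegisteredObject("EXAMPLE-SITE-1", "site")
pedon = RegisteredObject("EXAMPLE-PEDON-1", "pedon", parent=site)
mineral = RegisteredObject("EXAMPLE-MINERAL-1", "mineral concentrate",
                           parent=pedon)

print(mineral.lineage())
```

Storing only the parent link keeps registration simple; the children of an object can be recovered by searching for records that name it as parent.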
Exploring A More General Data Model: ODM 2.0
• To achieve interoperability between EarthChem, CUAHSI ODM, LTER EML
• Better support for samples and unique identifiers (IGSN/SESAR)
• Extensibility to table attributes
• Better annotation and provenance
• Enable integrated web service based publication of a broader class of CZO data
ODM 2.0 – Field Sensor Extension to support field sensor deployments and in situ observations
• Sensor deployment details
• Attributes of sensor
• Data series from sensor
ODM 2.0 – Provenance and Annotations Extensions
• Better support for storing provenance of observational data
General Extensibility
Provides capability to record information (add fields) in tables that was not anticipated a priori
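A common way to get this kind of extensibility in a relational database is a companion table of name-value pairs, so that new attributes can be recorded without altering the main table. The sketch below shows that general pattern with SQLite; it is an assumption for illustration, and the table and column names are invented here, not taken from the ODM 2.0 schema.

```python
import sqlite3


def make_database() -> sqlite3.Connection:
    """Create an in-memory database with a main table and an extension table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sample (
            sample_id INTEGER PRIMARY KEY,
            code      TEXT NOT NULL
        );
        -- Hypothetical extension table: one row per extra attribute.
        CREATE TABLE sample_extension (
            sample_id      INTEGER NOT NULL REFERENCES sample(sample_id),
            property_name  TEXT NOT NULL,
            property_value TEXT,
            PRIMARY KEY (sample_id, property_name)
        );
    """)
    return conn


def set_property(conn, sample_id, name, value):
    """Record an attribute that the main table has no column for."""
    conn.execute(
        "INSERT OR REPLACE INTO sample_extension VALUES (?, ?, ?)",
        (sample_id, name, str(value)),
    )


def get_properties(conn, sample_id):
    """Return all extra attributes of a sample as a dict."""
    cursor = conn.execute(
        "SELECT property_name, property_value FROM sample_extension "
        "WHERE sample_id = ?",
        (sample_id,),
    )
    return dict(cursor.fetchall())


conn = make_database()
conn.execute("INSERT INTO sample (sample_id, code) VALUES (1, 'soil-A')")
set_property(conn, 1, "sonicated", "yes")
set_property(conn, 1, "sieve_mm", 2)
print(get_properties(conn, 1))
```

The trade-off of this pattern is that extension values are stored as text and are not type-checked by the database.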
CZOWeb GeoChemDB
[Architecture diagram. Web-based user access: CZO Web Discovery, GeoChemDB Search, CZO Desktop, EarthChem Portal and other client systems. Services: CZO-Services and Geochem Services (IEDA), exchanging EarthChemXML with USGS, NAVDAT and GEOROC. Storage: CZO-Central GeoChemDB (ODM 2.0) and the geochemical database, fed from the CZO Data Display Format and CZchemDB through GfG Data Validation & Ingest. Supporting IEDA services: Long-Term Archiving Service, Data Publication Service (DataCite), and SESAR Sample Registration.]
Synthesis Example: Hydrology
Budyko curve
Budyko Curve
Vulnerability and Resilience
[Figure labels: Resilience, Vulnerability]
Anthony demonstration
Where we are today
• Each site has a data manager
• Data sets are posted to the web
– consistent metadata and ASCII format in progress
• We've prototyped harvesting data and posting to a central data portal
• Shared vocabulary system in place
• Developed protocol for unique sample ID
• Partnering with EarthChemDB
• Expanding ODM to become more general
• Way beyond what I thought possible
Work plan for next two years
• Extending the CZO data publication model to geochemical and GIS data; then to other types of data – towards deeper interoperability
• Integration based on service and information model standards (WaterML, EarthChemXML, EML, OGC services)
– Requirements gathering from all CZOs, data modeling, display file format specification, services specification, development and validation
– Upgrade to WaterML 2 once approved as international standard (~Q3, 2011)
• Registering more hydrologic time series data via CZO Central
– Regularly harvesting registered files and updating CZO services; keeping provenance information
• Enhancing parameter-based search across CZOs, with a shared parameter ontology
• Making CZO central data system more robust
– Currently a single server with 24/7 monitoring; need redundant setup
• Enhancing role of Data Managers
CZO Web Services Model
[Diagram. CZO Web Services: Time Series Service, CZO Catalog Service, CZO Ontology Service, Geochemical, Geophysical…, Spatial Data Services, Processing Services. Access styles: REST and REST/SOAP. OGC interfaces: Catalog, SOS (Sensor), WFS (Features), WMS (Maps), WPS. Encodings: WaterML 2.0, EarthChemML, EML, GeoSciML.]
CZO Central short term needs
• Consistent data catalog at each site
• Policies and best practices for generating display files, detecting new information to harvest into CZO Central, determining update frequency
• Validation against shared vocabularies and semantic tagging
• Extending the CZO data publication model to other types of data
• Compatibility among databases and services: CUAHSI, EarthChem, other observatories (using OGC standards, EML, EarthChemML, WaterML)
• Presenting data use agreements in services, tracking usage
• Continued development of ongoing data management activities
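One simple way a pull-based harvester could detect new information is to compare a digest of each display file against the digest recorded at the previous harvest. The sketch below assumes the files have already been downloaded into memory; the function names and the example URLs are hypothetical, and this is not a description of how CZO Central actually decides what to harvest.

```python
import hashlib
from typing import Dict, List


def digest(content: bytes) -> str:
    """Fingerprint of a display file's content."""
    return hashlib.sha256(content).hexdigest()


def select_for_harvest(fetched: Dict[str, bytes],
                       last_seen: Dict[str, str]) -> List[str]:
    """Return URLs whose content is new or changed since the last harvest."""
    changed = []
    for url, content in sorted(fetched.items()):
        if last_seen.get(url) != digest(content):
            changed.append(url)
    return changed


# Hypothetical display files as fetched on this run.
fetched_files = {
    "https://example.org/czo-a/ph.csv": b"GREENLAKE4,820311,6.4\n",
    "https://example.org/czo-b/flow.csv": b"SITE1,20100101,0.8\n",
}
# Digests stored after the previous run: only the first file was seen,
# and it is unchanged.
previous_digests = {
    "https://example.org/czo-a/ph.csv": digest(b"GREENLAKE4,820311,6.4\n"),
}

print(select_for_harvest(fetched_files, previous_digests))
```

Where sites expose modification times or HTTP validators, those could be checked first so that unchanged files need not be downloaded at all.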
Development of CZO Central Data System, 2010-2011
1. CZO data publication system for point time series: regularly harvest display files and update CZO data services; validate against shared vocabularies; move to WaterML 2.0 [70K]
2. CZO Central Catalog: registering and harvesting CZO services into CZO central catalog; data discovery services for point data; user interface for CZO data managers [80K]
3. Maintain and enhance ontology of CZO parameters, to enable parameter-based search [20k]
4. Data modeling, display file format specification, web service specification and development, for different types of CZO data – as part of CZO data interoperability working group [50k]
5. Managing CZO central server installation; redundant server setup; 24/7 monitoring, notification and reporting [20k]
Long term
• Annual data managers meeting?
• Where does the integrated database reside?
• Who runs it?
• Compliance
– Make that an integral part of site renewals
What has been done
• Significant progress in the four components: discovery portal, semantics management, services management, client application
• A team of data managers from all CZO sites has been meeting bi-weekly via VTC to discuss data sharing: data integration needs, formats, procedures, services, etc.
• A lot of CZO-collected data made available as services using this mechanism – but more need to be registered
• Numerous meetings, including all data managers and 4 of 6 PIs in May 2010
SESAR System For Earth Sample Registration
– Development funded by NSF since 2004
• Since 2010 funded by NSF as part of Integrated Earth Data Applications (IEDA) Cooperative Agreement
– Operates registry for the IGSN (International Geo Sample Number)
• Registration of name spaces for IGSN users
• User services for sample registration (MySESAR)
• Public search interface of sample catalog
– Ensures preservation and access of sample metadata
• To facilitate sharing of available samples
• To support sample and data curation
www.geosamples.org