CZO Integrated Data Management Data Model and Metadata David Tarboton

Page 1: CZO Integrated Data Management Data Model and Metadata David Tarboton

CZO Integrated Data Management

Data Model and Metadata

David Tarboton

Page 2

Based on CUAHSI HIS

[Figure: CUAHSI HIS architecture diagram. Labels: Data Discovery and Integration; Data Publication; Data Synthesis and Research; HIS Central; HydroDesktop; HydroServer; Metadata; Data; Analysis; Geo Data; ODM; WaterML; GML; OGC Services]

An Internet-based system to support the sharing of hydrologic data, comprising hydrologic databases and servers connected through web services, and software for data publication, discovery and access.

Support: EAR 0622374

CUAHSI HIS: Sharing hydrologic data

Page 3

Data System Overview

[Figure: data system diagram. Components: CZO Servers (Boulder, Shale, Sierra, Luquillo, Jemez, Christina) with standardized web based display (ASCII text); Harvester; CZO Central Data Store; WaterOneFlow Web Service (GetSites, GetSiteInfo, GetVariableInfo, GetValues) with WaterML; CZO Desktop]

Page 4

Requirements

• Sufficient metadata for published CZO data to be unambiguously interpreted and used
• Each CZO operates its own local data management system
• The format used to present data and metadata should be identical across CZOs and should support heterogeneous local systems
• Local systems are autonomous, with local control over the release and publication of data

Page 5

Access

• Users are required to agree to CZO data use policies
• Same data use agreement for all CZOs
• Data freely accessible to registered users who have agreed to the policy

Page 6

Information Hierarchy

• National CZO
• Experimental Watershed
• Sites
• Variables
• Series
• Data values

Page 7

Abstract data model

• (where) location, object or platform identifier
• (when) date and time
• (what) attribute (or identifier of attribute)
• THE VALUE
• (how) method (or identifier of method)
• (who) creator (or identifier of creator or data source)
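As a rough illustration of the abstract data model, a single data value can be carried together with its where/when/what/how/who context. The class name, field names and sample values below are assumptions made for this sketch and are not part of the CZO or ODM specifications.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Observation:
    """One data value plus the context needed to interpret it (hypothetical layout)."""
    where: str       # location, object or platform identifier
    when: datetime   # date and time of the observation
    what: str        # attribute (variable) or its identifier
    value: float     # THE VALUE
    how: str         # method or its identifier
    who: str         # creator or data source identifier


def describe(obs: Observation) -> str:
    """Render an observation as one human-readable line."""
    return (f"{obs.what}={obs.value} at {obs.where} on {obs.when:%Y-%m-%d %H:%M} "
            f"(method {obs.how}, source {obs.who})")


# Illustrative values only; the date and source are made up for the example.
obs = Observation("GREEN LAKE 4", datetime(2010, 5, 1, 12, 0),
                  "StreamFlow", 1.25, "method1", "example-czo")
print(describe(obs))
```

Making the record immutable (`frozen=True`) is a design choice for the sketch: a published value and its context should not change silently after it is created.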

Page 8

Data series

• used as an organizing construct
• logical grouping of data values (usually from a column in a table)
• commonly, but not limited to, time series (e.g. type series with depth)
• Properties we control become identifying series-level attributes
• Properties we measure become variables or variable-level attributes
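One way to picture a data series is as the set of data values that share the same controlled properties. The sketch below groups hypothetical records by an assumed series key (site, variable name and depth offset); the record layout and key choice are illustrative assumptions, not a prescribed CZO format.

```python
from collections import defaultdict

# Hypothetical records: each dict is one data value with its context.
records = [
    {"site": "A", "variable": "temperature", "offset_m": 0.5,
     "time": "2010-05-01 00:00", "value": 4.1},
    {"site": "A", "variable": "temperature", "offset_m": 0.5,
     "time": "2010-05-01 01:00", "value": 4.0},
    {"site": "A", "variable": "temperature", "offset_m": 2.0,
     "time": "2010-05-01 00:00", "value": 3.2},
]

# Assumed identifying (controlled) properties of a series.
SERIES_KEYS = ("site", "variable", "offset_m")


def group_into_series(rows, keys=SERIES_KEYS):
    """Group data values into series keyed by their controlled properties."""
    series = defaultdict(list)
    for row in rows:
        key = tuple(row[k] for k in keys)
        series[key].append((row["time"], row["value"]))
    return dict(series)


series = group_into_series(records)
for key, values in series.items():
    print(key, values)
```

Here the depth offset is controlled (the sensor is fixed at 0.5 m or 2.0 m), so it identifies the series. If depth were measured with each value instead, it would be stored with the value rather than in the key.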

Page 9

Why an Observations Data Model

• Syntactic consistency (file types and formats)
• Semantic consistency
  – Language for observation attributes (structural)
  – Language to encode observation attribute values (contextual)
• Publishing and sharing research data
• Metadata to facilitate unambiguous interpretation
• Enhance analysis capability

What are the basic attributes to be associated with each single data value and how can these best be organized?

Page 10

Community Design Requirements (from comments of 22 reviewers)

• Incorporate sufficient metadata to identify provenance and give exact definition of data for unambiguous interpretation
• Spatial location of measurements
• Scale of measurements (support, spacing, extent)
• Depth/Offset information
• Censored data
• Classification of data type to guide appropriate interpretation
  – Continuous
  – Indication of gaps
• Indicate data quality

http://www.neng.usu.edu/cee/faculty/dtarb/HydroObsDataModelReview.pdf

Page 11

Observations Data Model

[Figure: data types feeding the model: soil moisture data, streamflow, flux tower data, precipitation & climate, groundwater levels, water quality]

• A relational database at the single observation level
• Common persistence model for observations data
• Metadata for unambiguous interpretation
• Traceable heritage from raw measurements to usable information
• Promote syntactic and semantic consistency
• Cross dimension retrieval and analysis

Horsburgh et al., 2008, WRR 44: W05406

Page 12

Horsburgh, J. S., D. G. Tarboton, D. R. Maidment and I. Zaslavsky, (2008), A Relational Model for Environmental and Water Resources Data, Water Resour. Res., 44: W05406, doi:10.1029/2007WR006392.

CUAHSI Observations Data Model http://his.cuahsi.org/odmdatabases.html

Page 13

Stage and Streamflow Example

Page 14

Water Chemistry from a profile in a lake

Page 15

Water Chemistry from Laboratory Sample

Page 16

CUAHSI Observations Data Model
http://www.cuahsi.org/his/odm.html

[Figure: ODM schema diagram annotated with numbered steps 1 to 7. Captions: "Work from Out to In", "At last …", "And don’t forget …"]

Page 17

HydroServer - A Platform for Managing and Publishing Experimental Watershed Data

[Figure: HydroServer architecture diagram. Labels: ODM Databases and WaterOneFlow Web Services (three ODM databases, each with a WaterOneFlow service); ArcGIS Server Spatial Data Services; Spatial Services; WaterOneFlow Services; HydroServer Database; HydroServer Database Configuration Tool; HydroServer Capabilities Web Service; HydroServer Website; Map Server; Time Series Analyst]

http://hydroserver.codeplex.com/

Page 18

Dynamic shared vocabulary moderation system

[Figure: system diagram. Labels: Local Server; Local ODM Database; ODM Data Manager; ODM Tools; ODM Website; Master ODM Shared Vocabulary; ODM Shared Vocabulary Moderator; ODM Shared Vocabulary Web Services; XML]

http://his.cuahsi.org/mastercvreg.html
From Jeff Horsburgh

Page 19

Data System Overview

[Figure: data system diagram, repeated from Page 3. Components: CZO Servers (Boulder, Shale, Sierra, Luquillo, Jemez, Christina) with standardized web based display (ASCII text); Harvester; CZO Central Data Store; WaterOneFlow Web Service (GetSites, GetSiteInfo, GetVariableInfo, GetValues) with WaterML; CZO Desktop]

Page 20

CUAHSI HIS – looking ahead

• A “data sharing/social networking” site for hydrologic data (and possibly models)
• Simple and easy to use
• Find, create, share, connect, integrate, work together online. Collaborate
• Hydro value added

Page 21

CZO web based file format

• Time series display files
  – The data: time series in columns
• Methods files
  – A single file listing the methods used by the CZO
• Measurement location files (the term agreed for what used to be called a site. Other names considered were station, node, monitoring point, platform)
  – A single file listing the measurement locations at which measurements are made by the CZO
  – Need a concept of spatial grouping for locations
  – Identify the groups that locations belong to, which implies a need for a location groups file. (Measurement groups)

The slides from this one onward contain edits made during the presentation, e.g. the change from “site” to “measurement location”. As a result they may not be entirely consistent, but they are as we left things at the end of the meeting.

Page 22

Time series display file

• Header
  – Doc group
  – Default parameter group
  – Column header group
• Data
  – Columns of data

Page 23

Doc group

Doc Attributes                    Description
Title                             A title for the set of data series in the file
Abstract                          Description of the data
Investigator contact information  Name and contact information for investigator responsible for the data
Keywords                          Keywords useful for discovery of the data series
Variable names                    Names for variables for the data series
Citation                          Text string that gives the citation to be used when the data are referenced.
Publications                      Publications related to this data
Comments                          Additional comments related to interpretation and use of this data

Page 24

Default parameters pertain to all data in file except when overridden by a specific column header (to encourage specification only once)

Examples
DEFAULT_PARAMETER. site ="GREEN LAKE 4"
DEFAULT_PARAMETER. offset_value ="2", offsetUnits = "meters", offset_description= "this is vertical offset from the ground level down"
DEFAULT_PARAMETER. quality_control_level ="0"
DEFAULT_PARAMETER. missing_value_indicator ="-9999"
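A reader of a display file could collect the default parameters once and then let each column header override them. The following is a minimal sketch under the assumption that every default is written as `name = "value"` with straight double quotes, as in the examples on this slide; the function names are hypothetical and the format was still under discussion.

```python
import re

# One name = "value" pair; whitespace around "=" is tolerated.
PAIR = re.compile(r'(\w+)\s*=\s*"([^"]*)"')


def parse_default_parameters(lines):
    """Collect all name/value pairs from DEFAULT_PARAMETER lines."""
    defaults = {}
    for line in lines:
        line = line.strip()
        if not line.startswith("DEFAULT_PARAMETER."):
            continue  # not a default parameter line
        for key, value in PAIR.findall(line):
            defaults[key] = value
    return defaults


def effective_metadata(defaults, column_overrides):
    """Column header values take precedence over file-level defaults."""
    return {**defaults, **column_overrides}


header = [
    'DEFAULT_PARAMETER. site ="GREEN LAKE 4"',
    'DEFAULT_PARAMETER. offset_value ="2", offsetUnits = "meters", '
    'offset_description= "this is vertical offset from the ground level down"',
    'DEFAULT_PARAMETER. quality_control_level ="0"',
    'DEFAULT_PARAMETER. missing_value_indicator ="-9999"',
]
defaults = parse_default_parameters(header)
print(effective_metadata(defaults, {"offset_value": "3"}))
```

Values are kept as strings here; converting them to numbers is left to whichever step knows the expected type of each attribute.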

Page 25

Column headers

Examples
COL1. label=ValueAttribute, value=DateTime, UTCOffset=-7, Timezone=MST, format="YYYYMMDD hh:mm"
COL2. label=VariableName, value=StreamFlow, units=m3/s, TimeSupport= 3, TimeSupportUnits=hr, NoDataValue=-9999, SampleMedium=water, method=method1, Offsetvalue = 3, OffsetValueUnits=m , offsetDescription = "Depth below surface"
COL3. label=VariableName, value=pH, units=pH units, missing value indicator=-9999
COL4. label=VariableName, value=conductance, units=uS/cm @ 25 degrees C
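The column header examples can be read with a small parser. This sketch assumes each header is a `COLn.` prefix followed by comma-separated `name=value` pairs, that commas inside straight double quotes belong to the value, and that names and unquoted values may contain spaces. The function name is hypothetical and the syntax is taken only from the examples above.

```python
import re

COL_PREFIX = re.compile(r"^COL(\d+)\.\s*(.*)$")
# A comma followed by an even number of double quotes is outside any quoted value.
COMMA_OUTSIDE_QUOTES = re.compile(r',(?=(?:[^"]*"[^"]*")*[^"]*$)')


def parse_column_header(line):
    """Return (column number, attribute dict) for one COLn. header line."""
    match = COL_PREFIX.match(line.strip())
    if match is None:
        raise ValueError(f"not a column header: {line!r}")
    index, rest = int(match.group(1)), match.group(2)
    attrs = {}
    for part in COMMA_OUTSIDE_QUOTES.split(rest):
        key, _, value = part.partition("=")  # split on the first "=" only
        attrs[key.strip()] = value.strip().strip('"')
    return index, attrs


col2 = ('COL2. label=VariableName, value=StreamFlow, units=m3/s, TimeSupport= 3, '
        'TimeSupportUnits=hr, NoDataValue=-9999, SampleMedium=water, method=method1, '
        'Offsetvalue = 3, OffsetValueUnits=m , offsetDescription = "Depth below surface"')
col4 = 'COL4. label=VariableName, value=conductance, units=uS/cm @ 25 degrees C'

index2, attrs2 = parse_column_header(col2)
index4, attrs4 = parse_column_header(col4)
print(index2, attrs2)
print(index4, attrs4)
```

Attribute names are kept exactly as written, so inconsistent spellings such as `Offsetvalue` and `OffsetValue` remain distinct keys unless a later step normalizes them.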

Page 26

Series level attributes

• Required metadata for each data value in a CZO time series display file

SiteCode
Units
Method
OffsetValue
OffsetDescription
SampleType
VariableName
SampleMedium
ValueType
TimeSupport
TimeSupportUnits
DataType
DataLevel
NoDataValue
UTCOffset
TimeZone
OffSetUnits
CensorCode
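A publisher could check a series against this list before releasing a display file. The sketch below treats every attribute named on this slide as required and compares names case-insensitively, because the example headers vary in capitalization. Both choices are assumptions, and later slides mark some of these attributes as optional.

```python
# Attribute names as listed on the slide (duplicates removed).
REQUIRED = (
    "SiteCode", "Units", "Method", "OffsetValue", "OffsetDescription",
    "SampleType", "VariableName", "SampleMedium", "ValueType", "TimeSupport",
    "TimeSupportUnits", "DataType", "DataLevel", "NoDataValue", "UTCOffset",
    "TimeZone", "OffSetUnits", "CensorCode",
)


def missing_attributes(metadata, required=REQUIRED):
    """Return the required attribute names absent from metadata (case-insensitive)."""
    present = {name.lower() for name in metadata}
    return [name for name in required if name.lower() not in present]


# Hypothetical merged metadata (file defaults plus column header) for one series.
metadata = {
    "SiteCode": "GREEN LAKE 4",
    "units": "m3/s",
    "Method": "method1",
    "VariableName": "StreamFlow",
}
missing = missing_attributes(metadata)
print(missing)
```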

Page 27

Series level attribute definitions 1

Attributes         Description
LocationCode       Code used to identify the Measurement Location (refers to Measurement locations file)
Units              The units associated with a data value
Method             Identifier to point to a record in the methods file
OffsetValue        The value of a measurement offset if constant. (Optional)
OffsetDescription  Full text description of the offset value. (Optional, but required if OffsetValue is given)
VariableName       Name of the variable from the variables preferred value table.
SampleMedium       The medium of the sample or where the measurement is made. This should be from the SampleMediumPV preferred vocabulary table.
ValueType          Text value indicating what type of data value is being recorded. This should be from the ValueTypeCV controlled vocabulary table. (e.g. Field measurement, modeled, derived)
TimeSupport        Numerical value that indicates the temporal footprint of the data values. 0 is used to indicate data values that are instantaneous. Other values indicate the time over which the data values are implicitly or explicitly averaged or aggregated.

Page 28

Series level attribute definitions 2

Attributes        Description
TimeSupportUnits  Units of time support value from Units PV table.
DataType          Text value that identifies the data as one of several types (e.g. min, max, average). PV
DataLevel         Level used to identify the level of quality control to which data values have been subjected. Ameriflux is the starting point. Quality control and processing.
Version           DOI. A version is associated with a publication or specific release for a specific analysis purpose.
NoDataValue       The value used to encode no data
UTCOffset         Offset in hours from UTC time of the corresponding LocalDateTime value.
TimeZone          Time zone where observation site is located (e.g. Mountain time)
OffSetUnits       Units with which the offset value is measured (Units PV)
CensorCode        Text indication of whether the data value is censored from the CensorCodeCV controlled vocabulary. See USGS document that Anthony knows about
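Two of these attributes directly affect how a consumer reads each row: UTCOffset converts the local date and time to UTC, and NoDataValue marks entries that carry no data. A minimal sketch, assuming the display format "YYYYMMDD hh:mm" maps to the `strptime` pattern `%Y%m%d %H:%M` and that missing entries should become `None` (function names are hypothetical):

```python
from datetime import datetime, timedelta, timezone


def to_utc(local_text, utc_offset_hours, fmt="%Y%m%d %H:%M"):
    """Interpret local_text at the given UTC offset (hours) and return UTC time."""
    local = datetime.strptime(local_text, fmt)
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    return local.replace(tzinfo=local_zone).astimezone(timezone.utc)


def clean_value(raw, no_data_value=-9999.0):
    """Convert a text field to float, mapping the no-data marker to None."""
    value = float(raw)
    return None if value == no_data_value else value


# Illustrative row: noon local time with UTCOffset=-7 is 19:00 UTC.
print(to_utc("20100501 12:00", -7))
print(clean_value("-9999"), clean_value("1.5"))
```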

Page 29

Value level attributes

Attributes                  Description
DateTime                    The date and time at which the value was observed
OffsetValue                 The value of a measurement offset. (Optional). [Note that OffsetValue may be either a series level, or value level attribute for any data series, depending upon whether it is a controlled or measured property.]
SampleNumber                (then put sample attributes in a separate file associated with sample numbers, with a cross reference to SESAR) Type of sample, e.g. grab, from groundwater, from leaf. From sample type preferred value table. Collection method. Need a more general concept of sample attributes. Also need sample number.
Spatial Support Horizontal  Optional
Spatial Support Vertical    Optional
ValueAccuracy               Specify as absolute

Any value level attribute that is the same for an entire series may be promoted to a series level attribute and go in the column header
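The promotion rule can be sketched as a small function: an attribute that has the same value in every row of a series moves to the series level, and the rows keep only what actually varies. The row layout and function name are assumptions made for illustration.

```python
def promote_constant_attributes(rows, candidates):
    """Split candidates into promoted (constant) attributes and slimmed rows."""
    if not rows:
        return {}, []
    promoted = {}
    for name in candidates:
        values = {row.get(name) for row in rows}
        # Promote only if present in the first row and identical in all rows.
        if len(values) == 1 and name in rows[0]:
            promoted[name] = rows[0][name]
    slim = [{k: v for k, v in row.items() if k not in promoted} for row in rows]
    return promoted, slim


# Hypothetical rows: OffsetValue is constant, ValueAccuracy varies.
rows = [
    {"DateTime": "20100501 00:00", "OffsetValue": 3, "ValueAccuracy": 0.1, "value": 1.0},
    {"DateTime": "20100501 01:00", "OffsetValue": 3, "ValueAccuracy": 0.2, "value": 1.1},
]
promoted, slim = promote_constant_attributes(rows, ["OffsetValue", "ValueAccuracy"])
print(promoted)
print(slim)
```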

Page 30

Measurement Locations file

Attribute labels  Description
SiteCode          Code used by organization that collects the data to identify the site
SiteName          Full name of the sampling site.
Latitude          Latitude in decimal degrees.
Longitude         Longitude in decimal degrees. East positive, West negative.
LatLongDatum      The Spatial Reference System of the latitude and longitude coordinates in the SpatialReferences table.
Elevation         Elevation of site (in m, or do we want a separate item to give units?).
VerticalDatum     Vertical datum of the elevation. Controlled Vocabulary from VerticalDatumCV.
LocalX            Local Projection X coordinate. (Optional)
LocalY            Local Projection Y Coordinate. (Optional)
Local Z           Local elevation coordinate
LocalProjection   Identifier that references the Spatial Reference System of the local coordinates. (Optional) X, Y and Z
PosAccuracy       Value giving the accuracy with which the positional information is specified. (Optional)
Comments          Comments related to the site. (Optional)

Sampling feature refers to feature of interest.
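A record from the measurement locations file could be checked for basic consistency before it is published. This sketch covers only a subset of the attributes. The class, its field names, the coordinate ranges and the rule that an elevation needs a vertical datum are assumptions for illustration, and the sample location is made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MeasurementLocation:
    """Subset of the measurement location attributes (hypothetical layout)."""
    site_code: str
    site_name: str
    latitude: float            # decimal degrees
    longitude: float           # decimal degrees, East positive, West negative
    lat_long_datum: str
    elevation_m: Optional[float] = None
    vertical_datum: Optional[str] = None
    comments: str = ""

    def problems(self) -> List[str]:
        """Return a list of human-readable consistency problems."""
        issues = []
        if not -90.0 <= self.latitude <= 90.0:
            issues.append("Latitude outside -90..90 decimal degrees")
        if not -180.0 <= self.longitude <= 180.0:
            issues.append("Longitude outside -180..180 (West should be negative)")
        if self.elevation_m is not None and self.vertical_datum is None:
            issues.append("Elevation given without VerticalDatum")
        return issues


ok = MeasurementLocation("EX-1", "Example site", 40.0, -105.5, "WGS84")
bad = MeasurementLocation("EX-2", "Example site 2", 40.0, 254.5, "WGS84", elevation_m=3500.0)
print(ok.problems())
print(bad.problems())
```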

Page 31

Methods file

Attributes  Description                  Link
Method      Description of each method.  Hyperlink to external reference on the method (Optional)

Is further subdivision needed to elicit specific method elements?

Page 32

Shared vocabularies

• Variable names (grouped into categories with a keyword list associated with each name. Need a field for keywords and categories to be added to present CUAHSI HIS system) (e.g. Precipitation, Streamflow, Nitrogen, Soil moisture)
• Units (extended from CUAHSI HIS) (e.g. m, g/L)
• Value type (from CUAHSI HIS) (e.g. Field observation, derived value, model output)
• Sample type (from CUAHSI HIS) (e.g. stream water, ground water, rock, soil)
• Data type (from CUAHSI HIS) (e.g. average over interval, cumulative, continuous, sporadic)
• Data level (based on Ameriflux list) (e.g. level 0 = raw data, level 4 = fully infilled and quality controlled)
• Spatial references (extensible based on EPSG) (e.g. NAD 1983, WGS84, UTM zone 11)
• Censor code (from CUAHSI HIS) (e.g. less than, not-censored, non detect)
• Qualifier code (in CUAHSI HIS qualifiers are not a PV. A CZO specific set of qualifiers will need to be developed)
• Vertical datum (from CUAHSI HIS) (e.g. Mean Sea Level, NGVD29)
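Shared vocabularies are useful when metadata values are checked against them at publication time. The sketch below uses only the example terms from this slide, so the term sets are incomplete placeholders rather than the actual CUAHSI HIS vocabularies. The function name and case-insensitive matching are also assumptions.

```python
# Placeholder vocabularies built from the slide's examples only.
VOCABULARIES = {
    "CensorCode": {"less than", "not-censored", "non detect"},
    "ValueType": {"field observation", "derived value", "model output"},
    "VerticalDatum": {"mean sea level", "ngvd29"},
}


def check_terms(metadata, vocabularies=VOCABULARIES):
    """Return {field: value} for values not found in the field's vocabulary."""
    problems = {}
    for field, allowed in vocabularies.items():
        if field in metadata and metadata[field].strip().lower() not in allowed:
            problems[field] = metadata[field]
    return problems


result = check_terms({"CensorCode": "Not-Censored", "ValueType": "guess"})
print(result)
```

Fields without an entry in `VOCABULARIES` are ignored, which matches the slide's note that some lists (such as qualifier codes) still need to be developed.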

Page 33

Ilya’s Unresolved issues

• Policies and best practices for generating display files and setting up data folders, and how we detect what is new
• Update frequency
• Semantic tagging (how automated)
• How shall we handle situations when data are removed/overwritten?
• Need more examples and test cases
• What information in log files is needed
• How to present data use agreements in services
• How to deal with different types of data

Page 34

Other issues

• Other data types
  – Maps, GIS data (OGC web services?)
  – Geophysical data, images, geochemistry data, geological data, soil profile data
• Simple capability to store and share arbitrary digital objects with metadata using e.g. Catalog Services for the web
• LIDAR data (just use SDSC Open Topography or NCALM)
• Archiving
• Questions, additional needs (wishes)