OPeNDAP-Unidata Development of DAP4 (a Data Access Protocol)
Describing Progress and Seeking Input
at the ESIP Summer Meeting 2012
by Dave Fulker (OPeNDAP President)
2
Overarching Concept of OPeNDAP’s Data Access Protocol (DAP):
Clients Get Only Needed Data, When They Need Them
Accessing data through web services (i.e., URL ≈ dataset)
Appending query strings to invoke server functions, esp. subsetting
Getting responses of 2 major types:
Metadata - dataset descriptions & catalogs (textual)
Content - values and metadata (binary or textual)
Using responses in diverse client contexts, e.g.,
MATLAB maps DAP responses directly to its internal math types
DAP libraries (netCDF, e.g.) simplify the programming of apps
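The "URL ≈ dataset" idea can be sketched in a few lines of Python. This is only an illustration, assuming the DAP2 convention of appending a response suffix (`.dds` and `.das` for textual metadata, `.dods` for binary content) plus an optional query string. The dataset URL and the helper name `dap2_url` are hypothetical.

```python
# Minimal sketch: one dataset URL, several responses selected by suffix.
# The suffixes follow DAP2 usage; the dataset URL below is hypothetical.
SUFFIXES = {
    "dds": ".dds",    # textual description of the dataset's structure
    "das": ".das",    # textual attributes (metadata)
    "data": ".dods",  # binary content (values plus structure)
}


def dap2_url(dataset_url, response, constraint=""):
    """Build a request URL for one response type, with an optional query string."""
    url = dataset_url + SUFFIXES[response]
    if constraint:
        url += "?" + constraint
    return url


DATASET = "http://example.org/opendap/sst.nc"  # hypothetical dataset URL

print(dap2_url(DATASET, "dds"))
print(dap2_url(DATASET, "das"))
print(dap2_url(DATASET, "data", "sst[0:1:9]"))
```

The same base URL serves metadata and content; only the suffix and the query string change.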
3
Some of DAP Users’ Distinguishing Needs
Data often depict (scientific) phenomena where
Geospatial maps are among the useful views
But other views are important as well
Coordinates often are 2-, 3-, 4- & even 5-dimensional
These may include (time-dependent) coordinate-proxies
Users often wish to use data whose source files
Are in a variety of inconvenient formats
With insufficient or obsolete metadata
4
Present State of DAP
The DAP2 specification (after nearly 2 decades!) has multiple contemporary realizations on servers and clients
Clients include: MATLAB, GrADS, IDL, IDV...
Python apps that employ the PyDAP library
Fortran, C, C++ & Java apps that employ the netCDF library
Servers include: PyDAP, ERDDAP... (often with augmented services)
Most widely deployed: TDS (Unidata) & Hyrax (OPeNDAP)
Widely used by data providers and users, including cases where DAP servers provide translations of inconveniently formatted source files
5
Branching: Hyrax & THREDDS
Multiple implementations of a protocol are often considered a good thing (per IETF, e.g.)
This can be a problem, however, if the implementations embody excessive redundancy or confuse users
Our view: co-existence of TDS (Unidata) & Hyrax (OPeNDAP) reflects some redundancy & creates some inconsistencies for users
Need #1: achieve conformance ⇒ consistency for users
Need #2: more software reuse ⇒ more advancement
6
NOAA/BAA grant for
OPeNDAP-Unidata Linked Servers (OPULS)
Goal 1: OPeNDAP/Unidata conformance & linkage
New data-model/protocol specs (DAP4), with conformance tests & extensibility demos:
Modes of asynchronous access (to near-line data, e.g.)
Server-side subsetting of data on irregular meshes
Goal 2: common software for OPeNDAP & Unidata servers
Work yet to begin...
7
OPeNDAP Data-Type Philosophy (reflected in DAP2 & now DAP4)
Data model has few data types
For simplified programming & lowered risk of errors
Data types are deliberately domain-neutral
For better trans-domain utility & programmer uptake
But they allow both syntactic & semantic structures/metadata
These types do in fact support domain needs
NetCDF-like (can represent functions on 4-D domains, e.g.)
Sequences & selections match DBMS sensibilities
8
DAP4 Data Model (simplified)
dataset ≈ unique URL (with no query string)
a dataset holds a hierarchy of groups, each a namespace/container for variables, dimensions & attributes
each variable comprises
a name (unique in the group)
a type (which applies to all values)
value(s) (organized as dimensioned arrays)
attributes* (optional)
*Attributes are like variables but with a semantic purpose, making a variable or a group more meaningful. E.g., variables often have an attribute (of type string) named “units.”
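The simplified model above can be pictured as a handful of small classes. This is a sketch under stated assumptions, not the DAP4 specification: the class and field names (`Dataset`, `Group`, `Variable`, `Attribute`, `add_variable`) and the type names are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Attribute:
    name: str
    type: str   # e.g. "String"
    value: Any


@dataclass
class Variable:
    name: str           # unique within its group
    type: str           # one type applies to all values
    dims: List[str]     # names of the dimensions that shape the values
    values: List[Any]
    attributes: Dict[str, Attribute] = field(default_factory=dict)  # optional


@dataclass
class Group:
    """A namespace/container for dimensions, variables, attributes and child groups."""
    name: str
    dimensions: Dict[str, int] = field(default_factory=dict)
    variables: Dict[str, Variable] = field(default_factory=dict)
    attributes: Dict[str, Attribute] = field(default_factory=dict)
    groups: Dict[str, "Group"] = field(default_factory=dict)

    def add_variable(self, var: Variable) -> None:
        # Enforce the "name is unique in the group" rule.
        if var.name in self.variables:
            raise ValueError(f"duplicate variable name in group: {var.name}")
        self.variables[var.name] = var


@dataclass
class Dataset:
    url: str     # dataset ≈ unique URL (with no query string)
    root: Group  # top of the group hierarchy


root = Group(name="/")
root.dimensions["time"] = 3
sst = Variable(name="sst", type="Float32", dims=["time"], values=[14.2, 15.1, 9.8])
sst.attributes["units"] = Attribute("units", "String", "degC")
root.add_variable(sst)
dataset = Dataset(url="http://example.org/opendap/sst.nc", root=root)  # hypothetical URL
```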
9
DAP4 Data Types & Relations
as in C or Java, e.g., a variable’s type may be structured or atomic: integer, float, byte, string...
DAP variables may be (semantically) related to one another via two key grouping constructs
relations link 1-D variables as columns in a table;
sampled functions link coordinate-map variables (domain) to function-value variables (ranges) having common indexes
in turn, relations can be linked via variables that serve as foreign keys
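The two grouping constructs can be illustrated with plain Python containers. This is a toy sketch, not a DAP4 encoding: the variable names, the data values, and the helpers `rows`, `join` and `value_at` are all assumptions made for the example.

```python
# A relation: equal-length 1-D variables treated as the columns of a table.
stations = {"station_id": [1, 2], "name": ["Alpha", "Bravo"]}
observations = {
    "station_id": [1, 1, 2],  # serves as a foreign key into `stations`
    "time": [0, 6, 0],
    "temp": [14.2, 15.1, 9.8],
}


def rows(relation):
    """Turn column-oriented variables into a list of row dictionaries (tuples)."""
    names = list(relation)
    return [dict(zip(names, vals)) for vals in zip(*relation.values())]


def join(left, right, key):
    """Link two relations through a shared key variable."""
    index = {r[key]: r for r in rows(right)}
    return [{**l, **index[l[key]]} for l in rows(left)]


# A sampled function: coordinate-map variables (domain) and a function-value
# variable (range) that share common indexes.
sampled = {
    "maps": {"lat": [10.0, 20.0], "lon": [100.0, 110.0, 120.0]},
    "values": [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],  # indexed as [lat][lon]
}


def value_at(fn, lat, lon):
    """Look up a function value through its coordinate maps."""
    i = fn["maps"]["lat"].index(lat)
    j = fn["maps"]["lon"].index(lon)
    return fn["values"][i][j]


print(join(observations, stations, "station_id"))
print(value_at(sampled, 20.0, 110.0))
```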
10
DAP4 Operations (invoked as query strings)
3 kinds of constraint expressions (i.e. query strings) yield subsets or invoke (server-side) processing
projection (returns a subset): specify included variables (by name) as well as indices of included array elements
selection (returns a subset): limit tuples (rows) of a relation to those with variable values satisfying a DBMS-style predicate
function (today’s town hall!): invoke server functions to calculate a return [we intend to target critical needs]
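As a rough illustration of the three kinds of constraint expression, the sketch below assembles query strings in the DAP2-style syntax (`[start:stride:stop]` hyperslabs, `&`-separated selection clauses); the DAP4 syntax may differ. The helper names, the variable names and the server function `my_mean` are hypothetical.

```python
def hyperslab(start, stop, stride=1):
    """Index range for one array dimension; `stop` is inclusive."""
    return f"[{start}:{stride}:{stop}]"


def projection(name, *slabs):
    """Name a variable and, optionally, the array elements to include."""
    return name + "".join(slabs)


def function_call(name, *args):
    """Invoke a server-side function (the function name here is hypothetical)."""
    return f"{name}({','.join(str(a) for a in args)})"


def constraint(projections=(), selections=()):
    """Combine projection and selection clauses into one query string."""
    ce = ",".join(projections)
    for clause in selections:
        ce += "&" + clause
    return "?" + ce if ce else ""


# Projection: one variable, restricted along each of three dimensions.
print(constraint([projection("sst", hyperslab(0, 0), hyperslab(10, 20), hyperslab(30, 40))]))

# Selection: keep only the rows of a relation that satisfy a predicate.
print(constraint(["station.time", "station.temp"], ["station.temp>20"]))

# Function: ask the server to compute the returned result.
print(constraint([function_call("my_mean", "sst")]))
```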
11
OPeNDAP Projection Operators
Like netCDF, but as a Web service, users may
Skip indices
Limit index ranges
Reduce dimensionality
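What these three operators do to an array can be mimicked locally with nested lists. This is a sketch of the semantics only, assuming DAP2-style (start, stride, stop) triples with an inclusive stop; the `subset` helper and the sample grid are invented for illustration, and a real server performs this work before sending the response.

```python
def subset(array, slabs):
    """Apply one (start, stride, stop) triple per dimension; stop is inclusive."""
    if not slabs:
        return array
    (start, stride, stop), rest = slabs[0], slabs[1:]
    return [subset(item, rest) for item in array[start:stop + 1:stride]]


# A 4 x 6 grid whose values encode their own position: row * 10 + column.
grid = [[r * 10 + c for c in range(6)] for r in range(4)]

# Skip indices: every second row, all columns.
print(subset(grid, [(0, 2, 3), (0, 1, 5)]))

# Limit index ranges: rows 1..2, columns 2..4.
print(subset(grid, [(1, 1, 2), (2, 1, 4)]))

# Reduce dimensionality: fixing the row index leaves a length-1 dimension,
# which the client can drop to obtain a 1-D result.
row = subset(grid, [(1, 1, 1), (0, 1, 5)])[0]
print(row)
```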
12
Other DAP-Related Services
Note: these were not part of the DAP2 specification...
Many DAP-based servers (from Unidata & OPeNDAP, e.g.)
Accept multiple types of data as inputs
Offer several views of them over the web
Native DAP web services: for DAP-enabled clients
Source format (lossless): netCDF-to-netCDF or HDF4-to-HDF4, e.g.
Alternative web services: html (browser views), XML, WCS, etc.
Town-Hall: what other services should be offered?
13
Other OPULS Accomplishments
Irregular mesh subsetting
Progress with the University of Washington (Bill Howe)
To be released soon...
Asynchronous access
Preliminary trials...
Cloud-based service provision (with parallelism)
MODIS reprojection (related, but not OPULS-funded)
14
OPULS Process
Transparency
Public documentation updated weekly (just Google OPULS!)
Advisory committee
Jeff de La Beaujardiere, James Frew, Mike Folk, Steve Hankin, Eric Kihn, Rich Signell
Welcoming input (per this town hall)
15
Town-Hall Questions
What server functions ought to be specified in the DAP4 protocol?
Simple point-wise mathematics
Mathematics on sampled functions
Truly domain-specific functions (involving the datum, e.g.)
Which (other) web-service protocols should be leveraged by DAP servers, & what are the pertinent use cases?
To facilitate open search (exploiting ATOM), e.g.
To facilitate semantic analysis (providing RDF output, e.g.)
Others?
16
Thank you
• OPeNDAP, Inc
• http://opendap.org
• increasing data’s visibility