An Integrative, Standards-Compliant Framework for TDWG Schemas and Services. Phillip C. Dibner, Ecosystem Associates. TDWG Annual Meeting, Saint Louis, Missouri, October 16, 2006





An Integrative, Standards-Compliant Framework for TDWG Schemas and Services

Phillip C. Dibner
Ecosystem Associates

TDWG Annual Meeting
Saint Louis, Missouri

October 16, 2006


ISO 19103: Basic Definitions

ISO 19103 defines a Conceptual Schema Language (CSL) for geographic information - as a profile of UML.

A Conceptual Schema is a formal description of a conceptual model, i.e., a model that defines the concepts of a universe of discourse.

An Application Schema is a Conceptual Schema for data required by one or more applications.

Features are abstractions of real-world phenomena. Thus they figure prominently in applications that address real-world issues.

Features have characteristics, or attributes, each of which has a name, a data type, and a value domain.


General Feature Model (GFM)

- Strictly speaking, the GFM is defined by ISO 19109, which adds detail and context to the 19103 definition
- Again: Features have properties with a name, a type, and a value domain.
- This is consistent with the general definition of Objects; maps cleanly to data object representations in a variety of programming environments
- Is consistent with modeling languages other than UML (GML, RDF)
- Echoes the normalization imparted by ER DBMS models
- Allows complex objects to be factored into simpler entities, all with the same underlying structure
- Conversely, permits properties of undetermined type to be “stubbed out” with interim datatypes pending further analysis, while the rest of a model is completed
- Relationships among elements are clear
- Facilitates integration with other compliant components
- Validated by substantial experience

In sum, it supports consistent, normalized concept models, and facilitates integration and analysis.
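The name / type / value-domain structure, and the idea of stubbing out a property whose type is not yet settled, can be sketched in Java. This is only an illustration under stated assumptions: the classes `PropertyDef`, `FeatureType`, and `GfmSketch`, the `Location` property, and the permitted values for `BasisOfRecord` are hypothetical and are not taken from ISO 19109 or from any TDWG schema.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: a feature property with a name, a type, and a value domain. */
final class PropertyDef {
    final String name;
    final Class<?> type;
    final Set<Object> valueDomain; // an empty set means the domain is not restricted yet

    PropertyDef(String name, Class<?> type, Object... allowedValues) {
        this.name = name;
        this.type = type;
        this.valueDomain = new HashSet<>(Arrays.asList(allowedValues));
    }

    /** A value is acceptable if it has the declared type and lies in the domain. */
    boolean accepts(Object value) {
        return type.isInstance(value)
                && (valueDomain.isEmpty() || valueDomain.contains(value));
    }
}

/** Hypothetical feature type: a named collection of property definitions. */
final class FeatureType {
    final String name;
    private final Map<String, PropertyDef> properties = new LinkedHashMap<>();

    FeatureType(String name) {
        this.name = name;
    }

    FeatureType with(PropertyDef p) {
        properties.put(p.name, p);
        return this;
    }

    boolean accepts(String property, Object value) {
        PropertyDef p = properties.get(property);
        return p != null && p.accepts(value);
    }
}

final class GfmSketch {
    static FeatureType specimen() {
        return new FeatureType("Specimen")
            // Illustrative value domain; the real one comes from the discipline.
            .with(new PropertyDef("BasisOfRecord", String.class,
                    "PreservedSpecimen", "HumanObservation"))
            .with(new PropertyDef("InstitutionCode", String.class))
            // Stubbed out: typed as Object until the location model is analyzed.
            .with(new PropertyDef("Location", Object.class));
    }

    public static void main(String[] args) {
        FeatureType t = specimen();
        System.out.println(t.accepts("BasisOfRecord", "PreservedSpecimen"));
        System.out.println(t.accepts("BasisOfRecord", "Unknown"));
        System.out.println(t.accepts("Location", 42));
    }
}
```

In this sketch the stubbed `Location` property accepts any value, so the rest of the model can be exercised before its type is decided; replacing `Object.class` with a concrete type later does not affect the other properties.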


E.g., Java object definition

    public class DarwinCoreData {
        public String GlobalUniqueIdentifier;
        public String DateLastModified;
        public String BasisOfRecord;
        public String InstitutionCode;
        public String CollectionCode;
        // ...
    }

[Slide callouts: the fields are labeled as Attributes, each with a Name and a Type.]

Value domains are defined in context of the application: customary values known within the discipline.


ER models for DBMS

[Diagram: two entities, Entity 1 and Entity 2, each listing attribute Names and Types, linked by a Reference (e.g., userInfo).]


Factoring, Design, Decomposition

Independent Entities

Add Later

Exchange / Rearrange


Representation as XML Schema

• Similar benefits accrue if schemas follow the same pattern: root element is an object, its children are properties. Values of these properties may be literals or other objects, which in turn have properties ...

• Provides for congruence between object models and their Schema representations, and for a natural mapping between representations.

• Really just OOA / OOD.

(Note difference in terminology - ISO Feature attributes == Schema object properties.)
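The object / property pattern can be illustrated with a short Java sketch that reads such a document into nested maps: each object element contributes a map of its properties, and each property value is either a literal or another object. The instance document, its element names, and the class `ObjectPropertyXml` are hypothetical and do not come from any TDWG schema; the sketch also discards the type name of nested objects for brevity.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class ObjectPropertyXml {
    // Hypothetical instance document: Specimen and Person are objects,
    // institutionCode, collector, and name are properties.
    static final String SAMPLE =
        "<Specimen>"
      + "<institutionCode>XYZ</institutionCode>"
      + "<collector><Person><name>A. Collector</name></Person></collector>"
      + "</Specimen>";

    /** Reads an object element: each child element is one of its properties. */
    static Map<String, Object> readObject(Element object) {
        Map<String, Object> props = new LinkedHashMap<>();
        NodeList children = object.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (n instanceof Element) {
                props.put(n.getNodeName(), readValue((Element) n));
            }
        }
        return props;
    }

    /** A property value is either a nested object element or a literal. */
    static Object readValue(Element property) {
        NodeList children = property.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i) instanceof Element) {
                return readObject((Element) children.item(i));
            }
        }
        return property.getTextContent().trim();
    }

    static Map<String, Object> parse(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
            return readObject(root);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse(SAMPLE));
    }
}
```

Because the document alternates strictly between objects and properties, the reader needs no schema-specific knowledge: the same two methods handle any depth of nesting.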


TDWG Schemas as Classes of Objects

Darwin Core - description of a collected or observed real-world specimen. Therefore Darwin Core can be considered to specify a class of Features, and DwC instance documents describe actual Feature instances.

ABCD - also describes entities that exist somewhere in the world. ABCD instance documents describe concrete Feature instances.

TCS - might not describe a class of Features, but does in concept describe a class of objects.

These schemas do not in general follow an Object / property paradigm. (DwC 1.4 did, because it was flat. The new version is still under discussion and development.)


Is this a disaster?

No

• All of these schemas provide concepts, terminology, and data structures - a vocabulary - that embodies the fundamentals of the domain.

• Moreover:

1. It’s possible to insert properties between a container object and a nested object: define a nestedObjectType and insert a property of that type between the parent and child.

2. It is likely possible to transform the data automatically, in real time if need be, if you want to serve them e.g. as GML Features (an encoding of ISO Feature), using an XSLT transform or other technology.

3. The design has arguably been done, so the benefits to analysis, factoring, etc. might not be an issue.
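As a rough illustration of the transform option, the following Java sketch applies an inline XSLT stylesheet to a flat record and wraps its fields in a feature-like structure. Everything here is an assumption made for the example: the class `FlatToFeature`, the input element names, and the output elements `featureMember` and `Specimen` are placeholders, not namespace-qualified GML and not a real Darwin Core or ABCD document.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class FlatToFeature {
    // Hypothetical flat record; element names are illustrative only.
    static final String FLAT =
        "<record>"
      + "<InstitutionCode>XYZ</InstitutionCode>"
      + "<CollectionCode>Birds</CollectionCode>"
      + "</record>";

    // Stylesheet: wrap the record in an object element and copy each
    // field across unchanged as a property of that object.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='/record'>"
      + "<featureMember><Specimen><xsl:copy-of select='*'/></Specimen></featureMember>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("transform failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(FLAT));
    }
}
```

A production transform would also need to emit the proper namespaces and map each field to the property defined in the target application schema; the sketch shows only the mechanics of serving one representation from another.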


In that case why is there an issue?

• There is some burden to supporting transforms, and we still need to define compliant schemas anyway, if we want to serve or otherwise use them as ISO Features.

• Likely to require custom software that either knows the structures explicitly or knows details of how to parse and interpret them, instead of standard tools that instantiate objects directly from instance documents.

• Maybe further analysis and refactoring will be needed after all.

• What to do? Keep the object/property model in mind for future work. Current versions can be viewed as “flattened views” of a more general conceptual model.


The Feature of Interest

Values for attributes of a real-world object or phenomenon that is the subject of study (the Feature of Interest) may be:

1. Asserted: the attribute is simply assigned a value such as a sample number, experimenter, institution, guid, etc.

2. Observed / Measured: the value of the attribute is an estimate derived from some procedure. There is a well-defined conceptual model for such values, built upon a strong theoretical foundation: the O&M model.
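The distinction between asserted and observed values can be made explicit in a data structure. The Java sketch below is a hypothetical simplification, not part of O&M: the class `AttributeValue`, its field names, and the example attributes are invented for illustration.

```java
/** Hypothetical sketch: an attribute value that records how it was obtained. */
final class AttributeValue {
    enum Origin { ASSERTED, OBSERVED }

    final String attribute;
    final Object value;
    final Origin origin;
    final String procedure; // null when the value was simply asserted

    private AttributeValue(String attribute, Object value, Origin origin, String procedure) {
        this.attribute = attribute;
        this.value = value;
        this.origin = origin;
        this.procedure = procedure;
    }

    /** An asserted value is assigned directly, with no estimating procedure. */
    static AttributeValue asserted(String attribute, Object value) {
        return new AttributeValue(attribute, value, Origin.ASSERTED, null);
    }

    /** An observed value is an estimate, so the procedure that produced it is required. */
    static AttributeValue observed(String attribute, Object value, String procedure) {
        if (procedure == null) {
            throw new IllegalArgumentException("an observed value needs a procedure");
        }
        return new AttributeValue(attribute, value, Origin.OBSERVED, procedure);
    }

    public static void main(String[] args) {
        AttributeValue id = asserted("sampleNumber", "S-001");
        AttributeValue length = observed("bodyLengthMm", 41.5, "caliper measurement");
        System.out.println(id.attribute + " " + id.origin);
        System.out.println(length.attribute + " " + length.origin + " via " + length.procedure);
    }
}
```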


The Observation Feature Type (O&M)

From Cox, 2005. OGC document 05-087r3_Observations_and_Measurements

[Diagram: the Observation feature type, with labels for the Feature of Interest (FoI), the Observation, and the Phenomenon.]


If our concepts are modeled as objects, they can be incorporated into this observation model, either naively or through more ambitious analysis.

If the property we wish to “measure” is taxon, then:
- the Feature of Interest may be a (collection of) specimen(s), effectively a DataSet (as defined in TCS)
- the model for the Phenomenon we are “observing” is surely addressed by the vocabulary of TCS (perhaps TaxonConcept or TaxonName)
- the codespace for the values of the observed properties (the results) might be the set of scientific names of some designation, along with a reference to the author and publication: an AccordingTo (per TCS)

If the observed property is collection or observation location, then:
- the FoI is the specimen (or the field occurrence of the specimen)
- the Phenomenon is geolocated geometry
- the result is an instance of such a geometry
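The two mappings above can be written down as a minimal Java sketch. It is a deliberately reduced stand-in for the O&M Observation, keeping only the feature of interest, the observed property, and the result; the full model in the OGC document carries more than this. The classes `Observation`, `TaxonResult`, and `ObservationSketch`, the specimen identifier, the placeholder name, and the coordinates are all hypothetical.

```java
/** Hypothetical, much-simplified stand-in for an O&M observation. */
final class Observation<F, R> {
    final F featureOfInterest;     // the thing that was studied
    final String observedProperty; // the phenomenon being estimated
    final R result;                // the estimated value

    Observation(F featureOfInterest, String observedProperty, R result) {
        this.featureOfInterest = featureOfInterest;
        this.observedProperty = observedProperty;
        this.result = result;
    }
}

/** Hypothetical result of a taxon determination: a name plus its AccordingTo. */
final class TaxonResult {
    final String scientificName;
    final String accordingTo;

    TaxonResult(String scientificName, String accordingTo) {
        this.scientificName = scientificName;
        this.accordingTo = accordingTo;
    }
}

final class ObservationSketch {
    /** Observed property is taxon: the result is a name with its reference. */
    static Observation<String, TaxonResult> taxonDetermination() {
        return new Observation<>("specimen-001", "taxon",
            new TaxonResult("Aus bus", "placeholder author and publication"));
    }

    /** Observed property is location: the result is a geometry, here a bare lat/lon point. */
    static Observation<String, double[]> collectionLocation() {
        return new Observation<>("specimen-001", "collectionLocation",
            new double[] {38.63, -90.20}); // illustrative coordinates
    }

    public static void main(String[] args) {
        Observation<String, TaxonResult> t = taxonDetermination();
        Observation<String, double[]> loc = collectionLocation();
        System.out.println(t.featureOfInterest + " " + t.observedProperty + " " + t.result.scientificName);
        System.out.println(loc.featureOfInterest + " " + loc.observedProperty + " " + loc.result.length);
    }
}
```

Both observations share the same feature of interest, which is the point of the pattern: different properties of one specimen are recorded with one structure and differ only in the type of the result.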


Remaining Issue

We have more than one model for the same kind of information. Will it be possible to combine data from different services that respectively provide, e.g., Darwin Core and ABCD? Can we develop a single conceptual model with which these and other TDWG data models - and external models - are consistent?

This is a generic problem.


A General Approach to Domain Modeling

(After R. Atkinson and S. Cox, at the TDWG GIG Workshop in Edinburgh, June, 2006)

1. Examine the domain and break it up into subdomains.

If using UML, this is accomplished by grouping related objects into various UML packages. The packages can be distributed for others to use.

2. Decide what doesn’t go in the domain of interest and belongs in someone else’s.

In UML, put in a placeholder or stub package, to be replaced later.

3. Identify the common elements that everyone agrees on, and that all implementations will include.

4. These form the basis of a conceptual model.


Domain Modeling (Atkinson and Cox)

5. Proceed to identifying points in question or of disagreement. Clarify implications, explore consequences for the model.

The notion is that here at least we can keep the model coherent.

6. Develop or bring into the discourse representational views that are of importance to near-term or legacy implementations. These represent the varied and sometimes incompatible viewpoints that different implementors have of the domain. Exercising these helps to clarify the conceptual model.

Methodology and tools for mapping representational views to the conceptual model and to each other are still very much under development.


GIG Conceptual Modeling Exercise: Taxonomic Data


Representational View Exercise: Darwin Core


Some Conclusions

Experimentation with the domain modeling approach revealed some unanticipated aspects of our work. (In particular, the discovery of a new class - OrganismOccurrence - as the Feature of Interest whether for a field observation or a collection.) It’s a valuable approach and we should explore it further.

It is clear that TDWG is addressing many of the same, generic issues as other domain organizations.

Problems have been solved by ISO TC 211 - whence come the 191xx documents - so TDWG doesn’t have to solve them again. We should use those solutions. This again should encourage us to think of our XML schema models as objects, and use the Object-property pattern.

The real point of this address is simply that we should adopt the lessons of Object Oriented Design and Analysis - and continue to make use of the extensive body of work that’s been done by collaborations of experts in domains outside our own. Cost: a bit of pain, but well worth it.


What About Services?

Services are the mechanism by which standards-compliant data are distributed across the internet. For example, the OGC has defined several services for distributing ISO 19103-compliant feature data, and these are being implemented increasingly broadly. Current TDWG efforts are incorporating some of this work.


Acknowledgements

James A. Brass, Biospheric Sciences Branch Chief