
Linking Disparate Datasets of the Earth Sciences with the SemantEco Annotator


Session: Managing Ecological Data for Effective Use and Reuse

Patrice Seyed 1,2, Katherine Chastain 1, and Deborah McGuinness 1

1 Tetherless World Constellation, Rensselaer Polytechnic Institute, 110 8th Street, Troy, NY 12180
2 DataONE, University of New Mexico, 1 University Boulevard N.E., Albuquerque, NM 87131


Overview

• Introduction
• Semantics and Linked Data
• Use Case: SemantEco
• SemantEco Annotator
  – Concept
  – Getting started
  – Overview
• Ontologies
• Capabilities
• Integration with Semantic Applications
• Future Work
• Quick Look Video
• Summary


Introduction

• How can we take datasets from different sources and make them
  – Easy to search and to discover?
  – Easy to use and to re-use?
  – Easy to integrate with each other for visualization and other applications?


Semantics and Linked Data

• We need a way to describe the relationships between tabular data columns…
  Linked-data formats such as the Resource Description Framework (RDF) capture such relationships in subject-predicate-object triples.

• … and we need a method of description that is both standardized and machine-readable.
  Communities can develop, use, and reuse common vocabularies with ontologies, expressed in a computer-readable format: the Web Ontology Language (OWL).
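To make the subject-predicate-object idea concrete, here is a minimal sketch in plain Python that writes two triples in N-Triples syntax. The `example.org` namespace, the property names, and the sample value are all hypothetical and are not taken from the SemantEco vocabularies.

```python
# Illustrative sketch only: namespace, property names, and the value 0.02
# are made up for this example.
EX = "http://example.org/lake-data/"


def iri(local_name):
    """Wrap a local name in the hypothetical namespace as an N-Triples IRI."""
    return f"<{EX}{local_name}>"


def literal(value):
    """Render a plain string literal, escaping backslashes and double quotes."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


triples = [
    # The sample was taken at a particular lake (object is another resource).
    (iri("sample1"), iri("takenAt"), iri("BigMooseLake")),
    # The sample has a measured value (object is a literal).
    (iri("sample1"), iri("ammoniumMgPerL"), literal("0.02")),
]

print(to_ntriples(triples))
```

Each tuple states one relationship, so a row of a table typically expands into several triples that share the same subject.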


Semantics and Linked Data

• A linked format aids interoperability, making data easier to share.

• Use existing URIs to refer to well-defined entities and concepts:
  – How do you make sure that everyone using your data understands that the string “NY” refers to the US state of New York?
  – What more can you learn if you can easily discover other datasets that also refer to the US state of New York?


Use Case: SemantEco

• SemantEco is a data visualization environment that allows a user to explore ecological data through a map-based interface.

• Data comes from a variety of sources:
  – Federal, such as the USGS and EPA.
  – Local, such as the Darrin Freshwater Institute of Upstate New York.
  – … each with different notations and best practices for gathering and recording data.


Conceptually....

• Represent data independent of the schema by which it was recorded

• This enables comparisons across data from different sources

• In SemantEco, we look at Measurements:
  – Water quality
  – Air quality
  – Birds
  – Fish


SemantEco Annotator

Allows a user to:
• Translate data into linked-data formats such as RDF:
  – Linked-data triples describe how the columns in a data table relate to each other and to the data they contain.
  – OWL ontologies provide standard vocabularies for describing these relationships.
  – The resulting enriched RDF can be loaded directly into RDF stores or hosted as linked data.

• Or use semantics to annotate data:
  – Column headers correspond to OWL properties.
  – Data cell values can correspond to OWL classes or datatypes.
  – Organizational best practices and terminology can be defined in the data files themselves.
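The "column headers correspond to properties" idea can be sketched as a small header-to-property lookup applied to each row of a CSV file. This is an illustration under stated assumptions, not the Annotator's implementation: the `example.org` IRIs, the `COLUMN_TO_PROPERTY` table, and the sample row are hypothetical.

```python
import csv
import io

EX = "http://example.org/vocab/"  # hypothetical vocabulary namespace

# Hypothetical mapping from column headers to property IRIs. In practice
# these would be properties chosen from a loaded ontology.
COLUMN_TO_PROPERTY = {
    "Lake Name": EX + "lakeName",
    "NH4+": EX + "ammoniumConcentration",
}


def rows_to_triples(csv_text, base="http://example.org/row/"):
    """Turn each mapped, non-empty cell into a (subject, property, value) triple."""
    triples = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for index, row in enumerate(reader, start=1):
        subject = f"{base}{index}"  # one subject per table row
        for header, value in row.items():
            prop = COLUMN_TO_PROPERTY.get(header)
            if prop is None or not value:
                continue  # skip unmapped columns and empty cells
            triples.append((subject, prop, value))
    return triples


sample = "Lake Name,NH4+,Notes\nBig Moose Lake,0.02,\n"
for triple in rows_to_triples(sample):
    print(triple)
```

Keeping the mapping in a separate table means the same mapping could be reapplied to other files with the same headers.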


SemantEco Annotator: Getting Started


Provenance and Metadata

• The Annotator asks the user to provide metadata about the dataset.

• This metadata also becomes part of the final RDF, making the dataset easier to discover.
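As a rough sketch of how user-supplied metadata could travel with the data, the snippet below turns a metadata form into triples about the dataset. It assumes Dublin Core terms (`http://purl.org/dc/terms/`) as the vocabulary. The slides do not say which vocabulary the Annotator uses, and the dataset IRI and field values are hypothetical.

```python
# Assumption: Dublin Core terms are used for dataset-level metadata.
DCTERMS = "http://purl.org/dc/terms/"


def metadata_triples(dataset_iri, metadata):
    """Emit one triple per non-empty metadata field, in a stable (sorted) order."""
    return [
        (dataset_iri, DCTERMS + field, value)
        for field, value in sorted(metadata.items())
        if value
    ]


# Hypothetical form input; the empty "source" field is skipped.
meta = {
    "title": "Lake water quality samples",
    "creator": "Example Institute",
    "source": "",
}

for triple in metadata_triples("http://example.org/dataset/1", meta):
    print(triple)
```

Because the metadata is expressed in the same triple form as the data, a search over the RDF can match on title or creator without a separate catalog.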


SemantEco Annotator

-- Tabular data view


SemantEco Annotator

-- Ontology loader
-- Ontology facets


SemantEco Annotator

-- Global settings


SemantEco Annotator

-- Drag-and-drop to make assignments
-- Work directly on tabular data


Ontologies

• Load one or more ontologies from the dropdown menu.

• Or import from a URI.

• Annotator also maintains a list of recent imports for re-use.


Capabilities

• Provide a definition for “Accession Code”
• Specify which standard was used to record the Date
• Group “Lake Name”, “Z Max” and “Sample Z” together as a single entity: the location where the sample was taken
• Make explicit that “NH4+” is the same thing as “Ammonium”, and that the units (mg/L) apply to each number in that column.
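Two of these capabilities, grouping columns into one location entity and attaching a name and unit to a measurement column, can be sketched with plain data classes. The class names, the `HEADER_SYNONYMS` table, and the numeric values are hypothetical. The column headers and the mg/L unit come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleLocation:
    """Groups the three location-related columns into a single entity."""
    lake_name: str
    z_max: float
    sample_z: float


@dataclass(frozen=True)
class Measurement:
    characteristic: str  # normalized name, e.g. "Ammonium"
    value: float
    unit: str            # applies to every value in the source column
    location: SampleLocation


# Hypothetical synonym table: raw column header -> preferred term.
HEADER_SYNONYMS = {"NH4+": "Ammonium"}


def annotate_row(row, column="NH4+", unit="mg/L"):
    """Build an annotated measurement from one raw table row (a dict of strings)."""
    location = SampleLocation(
        lake_name=row["Lake Name"],
        z_max=float(row["Z Max"]),
        sample_z=float(row["Sample Z"]),
    )
    name = HEADER_SYNONYMS.get(column, column)
    return Measurement(name, float(row[column]), unit, location)


# Made-up row for illustration.
row = {"Lake Name": "Big Moose Lake", "Z Max": "21.0", "Sample Z": "1.5", "NH4+": "0.02"}
m = annotate_row(row)
print(m.characteristic, m.value, m.unit, "at", m.location.lake_name)
```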


Integration with Semantic Applications

• Identify the application’s requirements:
  – E.g., a piece of data with lat-long coordinates can be plotted on a map.

• We brought in data from the Darrin Freshwater Institute containing water quality data for lakes in Upstate New York, augmenting existing data from the U.S. Geological Survey.

“Big Moose Lake”
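An application requirement such as "needs lat-long to be plotted" can be expressed as a simple check on each record. The sketch below assumes records are dictionaries with hypothetical `latitude` and `longitude` keys holding decimal degrees. The sample coordinates are placeholders rather than a real site location.

```python
def can_plot(record):
    """Return True if the record carries a usable latitude/longitude pair."""
    try:
        lat = float(record["latitude"])
        lon = float(record["longitude"])
    except (KeyError, TypeError, ValueError):
        return False  # missing key, None, or non-numeric text
    # Range checks also reject NaN, since comparisons with NaN are False.
    return -90 <= lat <= 90 and -180 <= lon <= 180


records = [
    {"name": "site A", "latitude": "43.8", "longitude": "-74.8"},
    {"name": "site B", "latitude": "43.8"},                      # no longitude
    {"name": "site C", "latitude": "n/a", "longitude": "-74.8"},  # not numeric
]

plottable = [r["name"] for r in records if can_plot(r)]
print(plottable)
```

Stating the requirement as a predicate makes it easy to report which incoming records cannot yet be used by the map view.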


Integration with Semantic Applications

• Linking data to well-defined entities and concepts by URI enhances searchability.

dbpedia:New_York

“New York”
“New York State”
“NY”

dbpedia:New_York_City
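A minimal sketch of linking free-text labels to URIs is shown below. It assumes, following the earlier “NY” example, that all three strings on this slide denote the state. That reading of the diagram, the lookup table, and the expansion of the `dbpedia:` prefix to `http://dbpedia.org/resource/` are assumptions made for illustration.

```python
DBPEDIA = "http://dbpedia.org/resource/"  # assumed expansion of the dbpedia: prefix

# Hypothetical label table; keys are lower-cased with collapsed whitespace.
LABEL_TO_IRI = {
    "new york": DBPEDIA + "New_York",
    "new york state": DBPEDIA + "New_York",
    "ny": DBPEDIA + "New_York",
    "new york city": DBPEDIA + "New_York_City",
}


def resolve(label):
    """Map a free-text label to an IRI, or None when the label is unknown."""
    key = " ".join(label.split()).casefold()
    return LABEL_TO_IRI.get(key)


for text in ["NY", " New York  State ", "New York City", "Albany"]:
    print(repr(text), "->", resolve(text))
```

Once different spellings resolve to one IRI, a search for that IRI finds every dataset that mentions the entity, whatever string it originally used.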


Future Work

• Automatic mappings directed to a particular graph closed under a predicate/object pair, and use of OWL domain and range restriction axioms to guide the user in vocabulary selection decisions.

• Use of OWL class definitions to enable a top-down approach for modeling data

• Ability to load enhancement files, both to facilitate translation of multiple similar datasets, and to make corrections easier.

• Construction of a platform for better management of linked data, within which the Annotator plays a vital role.

• Use of application requirements to create “templates” for new data sources to be integrated more easily.


Summary

• “SemantEco Annotator” component for ease of translation into RDF

• Multi-purposed for translation, annotation, and generalized mapping.

• Part of a future “suite” that couples annotation and search.


SemantEco Annotator Project Page

Want more info? Interested in collaborating? See Evan Patton or email Deborah McGuinness [email protected]

We also have a project page with screenshots and demonstration videos: http://tw.rpi.edu/web/project/SemantEcoAnnotator


Acknowledgements

• Rensselaer Polytechnic Institute
• Tetherless World Constellation at RPI
• DataONE


SemantEco: More Info

For additional information about SemantEco:

“Addressing the Challenges of Multi-Domain Data Integration with the SemantEco Framework”

Friday @ 10:35am, IN52B-02. E.W. Patton; P. Seyed; D.L. McGuinness