Oceanographic Data Provenance Tracking with the Shore Side Data System
Mike McCann, Kevin Gomes
International Provenance and Annotation Workshop
June 18, 2008


Outline
• Motivation
• Monterey Bay Aquarium Research Institute (MBARI) Projects:
  – Monterey Ocean Observing System (MOOS)
  – Shore-Side Data System (SSDS)
• Data Model
• Application framework
• Operational details
  – Instrument configuration
  – Data processing software

MUSE data


• Diversity of platforms & sensors
• Post-experiment organization
• Document for later use => FGDC (Federal Geographic Data Committee) metadata
• Motivation for a better design

Identifying Requirements

• Configuring Instruments
  – Many different instruments
  – Many different manufacturers
  – Varied hardware, communication and metadata interfaces
  – However, all must interact with the infrastructure
• Instruments Have Lifecycles
  – Changed out for normal maintenance, cleaning, or failure
  – Configuration can also change depending on the science goal/experiment
  – In-situ reconfiguration
  – The instrument-infrastructure relationship must be kept intact and, in fact, tracked.
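Tracking the instrument-infrastructure relationship through a lifecycle can be pictured as a list of time-bounded deployment records. The sketch below is a minimal illustration under stated assumptions, not SSDS code: the `Deployment` and `DeploymentLog` classes, device ID 101 and the dates are hypothetical. Each swap for maintenance or reconfiguration closes one record and opens another, so the platform an instrument was on can be recovered for any time.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical record: one instrument on one platform for a span of time."""
    device_id: int
    platform: str
    start: datetime
    end: Optional[datetime] = None  # None while the instrument is still deployed


class DeploymentLog:
    """Hypothetical history of instrument-infrastructure relationships."""

    def __init__(self) -> None:
        self.records: list[Deployment] = []

    def recover(self, device_id: int, when: datetime) -> None:
        """Close the open deployment of a device (maintenance, cleaning, failure)."""
        for rec in self.records:
            if rec.device_id == device_id and rec.end is None:
                rec.end = when

    def deploy(self, device_id: int, platform: str, when: datetime) -> None:
        """Start a new deployment; closing the old one keeps at most one open per device."""
        self.recover(device_id, when)
        self.records.append(Deployment(device_id, platform, when))

    def platform_at(self, device_id: int, when: datetime) -> Optional[str]:
        """Return the platform the device was attached to at `when`, or None."""
        for rec in self.records:
            in_span = rec.start <= when and (rec.end is None or when < rec.end)
            if rec.device_id == device_id and in_span:
                return rec.platform
        return None


# Hypothetical usage: device 101 moves from the surface node to a benthic node.
log = DeploymentLog()
log.deploy(101, "Surface", datetime(2008, 1, 10))
log.deploy(101, "Benthic-1", datetime(2008, 3, 1))
print(log.platform_at(101, datetime(2008, 2, 1)))  # Surface
print(log.platform_at(101, datetime(2008, 4, 1)))  # Benthic-1
```

Keeping closed records instead of overwriting the current platform is what makes the history queryable after an instrument crosses observatories.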

Identifying Requirements

• Metadata Must Tie To Data
  – Huge variation in data formats that users must handle
  – Traditionally added on after the fact
  – Not scalable and error prone

Identifying Requirements

• Instruments Can Cross Observatories
  – Some instrument supplies are limited
  – Experiment configuration
• Metadata and Data Can Cross Observatories
  – Example: data processing for instruments should not have to be rewritten

Software Middleware for MOOS

[Figure: the MOOS moored network (Surface, Benthic-1 and Benthic-2 nodes, each with instrument services) linked by TCP/IP via satellite to the shore network; also shown: Shore Side Data System, telemetry retriever, instrument GUI.]

SSDS: Metadata and Data Management

• Requirements for SSDS (partial list)
  – Capture observatory and instrument lifecycle data
  – Return instrument data in its native ("raw") format
  – Simple analysis tools for viewing data
  – Capture and archive processed data products and associated metadata, maintaining known relationships between data sets
  – Convert data to common formats
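Returning native data and converting it to common formats both become generic once each instrument's record layout is described in metadata, which is also what lets processing code move between observatories without being rewritten. A minimal sketch, assuming a comma-delimited ASCII record like the 12.4,92.5 sample in the SSDS architecture diagram; the classes only borrow the SSDS names `RecordDescription` and `RecordVariable`, and the variable names and units are hypothetical.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class RecordVariable:
    """Hypothetical description of one field in a delimited ASCII record."""
    name: str
    units: str
    column: int  # zero-based position of the field in the record


@dataclass
class RecordDescription:
    """Hypothetical record layout, written once per instrument type."""
    delimiter: str
    variables: list[RecordVariable]

    def parse(self, raw: str) -> dict[str, float]:
        """Turn one native record into named numeric values."""
        fields = raw.strip().split(self.delimiter)
        return {v.name: float(fields[v.column]) for v in self.variables}


def to_csv(desc: RecordDescription, raw_records: list[str]) -> str:
    """Convert native records to CSV; the raw strings are left unchanged."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=[v.name for v in desc.variables],
                            lineterminator="\n")
    writer.writeheader()
    for raw in raw_records:
        writer.writerow(desc.parse(raw))
    return out.getvalue()


# Hypothetical variable names and units for the two-field sample records.
desc = RecordDescription(",", [RecordVariable("sea_temp", "degC", 0),
                               RecordVariable("pressure", "dbar", 1)])
print(to_csv(desc, ["12.4,92.5", "11.9,92.3", "12.1,91.1"]))
```

Because the parser reads only the description, supporting a new instrument means writing new metadata rather than new code.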

SSDS: Metadata and Data Management

[Figure: SSDS architecture spanning the wet side and the shore side. Labels: Instrument Packets (+ infrastructure metadata) carrying binary and ASCII data (e.g. 12.4,92.5); Ingest of Data and XML Metadata into SSDS; Domain Logic API; Data Access Services, Metadata Access Services and Aggregate HTTP-based Services (SOA).]

SSDS Data Model

Data Container & Data Producer attributes
• Recording...
  – What
  – Where
  – When
  – Relations
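The data model records what, where, when and relations for data containers and data producers. A minimal sketch of that idea, assuming simplified Python classes that only borrow the SSDS names `DataContainer` and `DataProducer`; the fields, container names and coordinates are illustrative, not the SSDS schema.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class DataContainer:
    """A data stream or file; field names are illustrative, not the SSDS schema."""
    name: str          # what
    latitude: float    # where
    longitude: float
    start: datetime    # when
    end: datetime


@dataclass
class DataProducer:
    """A deployment or processing run that turns inputs into outputs (relations)."""
    name: str
    inputs: list[DataContainer] = field(default_factory=list)
    outputs: list[DataContainer] = field(default_factory=list)


def producers_of(container: DataContainer, producers: list[DataProducer]) -> list[str]:
    """Names of the producers that list `container` among their outputs."""
    return [p.name for p in producers if container in p.outputs]


def inputs_of(container: DataContainer, producers: list[DataProducer]) -> list[str]:
    """Names of the containers consumed by the producers of `container`."""
    return [i.name for p in producers if container in p.outputs for i in p.inputs]


# Hypothetical example: one processing run turns a packet stream into NetCDF.
stream = DataContainer("CTD DataStream", 36.8, -122.0,
                       datetime(2008, 1, 1), datetime(2008, 2, 1))
netcdf = DataContainer("CTD NetCDF file", 36.8, -122.0,
                       datetime(2008, 1, 1), datetime(2008, 2, 1))
run = DataProducer("DataStream processing run", inputs=[stream], outputs=[netcdf])
print(producers_of(netcdf, [run]))  # ['DataStream processing run']
print(inputs_of(netcdf, [run]))     # ['CTD DataStream']
```

Recording the producer as its own object, rather than as a field on the output, is what lets one run link several inputs to several outputs.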

Satisfying Requirements: Configuration

[Figure: SSDS architecture diagram (wet side Instrument Packets + infrastructure metadata; shore side Ingest of Data and XML Metadata; Business Logic API; Data Access, Metadata Access and Aggregate HTTP-based Services).]

Satisfying Requirements: Dynamic Lifecycle

[Figure: SSDS architecture diagram (wet side Instrument Packets + infrastructure metadata; shore side Ingest of Data and XML Metadata; Business Logic API; Data Access, Metadata Access and Aggregate HTTP-based Services).]

Satisfying Requirements: Resource Management

[Figure: SSDS architecture diagram (wet side Instrument Packets + infrastructure metadata; shore side Ingest of Data and XML Metadata; Business Logic API; Data Access, Metadata Access and Aggregate HTTP-based Services).]

Satisfying Requirements: Health Monitoring

[Figure: SSDS architecture diagram (wet side Instrument Packets + infrastructure metadata; shore side Ingest of Data and XML Metadata; Business Logic API; Data Access, Metadata Access and Aggregate HTTP-based Services).]

Subject: SSDS: No recent data stream update from instruments: 1441

A problem has been encountered while checking on the status of the following data streams that SSDS is monitoring:

Device ID :: Last update time (in hours) :: Device Name
-------------------------------------
MSE Surface Node
-------------------------------------
1441 :: 2.9 :: Medusa Card
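The notification reports data streams whose last update is too old. A minimal sketch of such a check, assuming a hypothetical 2-hour threshold and an in-memory table of last-update times; the report layout imitates the message, but the functions are illustrative, not SSDS code, and device 1442 is invented.

```python
from datetime import datetime, timedelta


def stale_streams(last_update: dict[int, datetime], now: datetime,
                  max_age: timedelta = timedelta(hours=2)) -> dict[int, float]:
    """Return {device_id: hours since last update} for streams older than max_age."""
    stale = {}
    for device_id, stamp in last_update.items():
        age = now - stamp
        if age > max_age:
            stale[device_id] = round(age.total_seconds() / 3600, 1)
    return stale


def format_report(stale: dict[int, float], names: dict[int, str], node: str) -> str:
    """Plain-text table in the style of the notification message."""
    rule = "-" * 37
    lines = ["Device ID :: Last update time (in hours) :: Device Name", rule, node, rule]
    for device_id, hours in sorted(stale.items()):
        lines.append(f"{device_id} :: {hours} :: {names.get(device_id, 'unknown')}")
    return "\n".join(lines)


# Hypothetical check time; device 1441 last reported 2 h 54 min earlier.
now = datetime(2008, 6, 18, 12, 0)
last = {1441: now - timedelta(hours=2, minutes=54),
        1442: now - timedelta(minutes=20)}  # invented, healthy stream
print(format_report(stale_streams(last, now), {1441: "Medusa Card"}, "MSE Surface Node"))
```

The threshold is a policy choice; instruments that report infrequently by design would need their own `max_age`.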

Satisfying Requirements: Metadata/Data

[Figure: SSDS architecture diagram (wet side Instrument Packets + infrastructure metadata; shore side Ingest of Data and XML Metadata; Business Logic API; Data Access, Metadata Access and Aggregate HTTP-based Services).]

Application Programming Interface

- Perl

Application Programming Interface

- Matlab (works with R2008a)

    % Import SSDS package
    import moos.ssds.services.metadata.*

    % Get Home interface
    home = moos.ssds.services.metadata.DataProducerAccessUtil.getHome();

    % Get Access object
    dpAccess = home.create();

    % Call methods on the Access object (here, a findByName query)
    dList = dpAccess.findByName('Back', logical(0), 'id', 'ascending', logical(1));

    % Take the first DataProducer returned
    it = dList.iterator;
    d = it.next;

    % Serial number of the device that produced it
    d.getDevice.getMfgSerialNumber

    ans =

    WL-30011

SSDS data life cycle

• An instrument is defined by creating a Device record.
• The instrument is configured for deployment by writing Deployment, DataContainer, RecordDescription and RecordVariable XML.
• The instrument is deployed. SSDS ingests the XML metadata, and data packets flow into the Instrument Packets database.
• Automated DataStream processing software consumes the data packets, producing a NetCDF file for each instrument's data. A DataProducer record is created linking the input DataStream to the output DataFile.
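Configuring an instrument for deployment means writing XML that names the Deployment, its DataContainer, and the RecordDescription and RecordVariables of its records. The sketch below builds such a document with Python's standard `xml.etree.ElementTree`; only the four element names come from the slide, while the nesting, attribute names and values are hypothetical and do not reproduce the actual SSDS schema.

```python
import xml.etree.ElementTree as ET


def deployment_xml(device_id: int, platform: str, start: str,
                   variables: list[tuple[str, str]]) -> str:
    """Build a hypothetical deployment description for one instrument.

    `variables` holds (name, units) pairs in record column order.
    """
    deployment = ET.Element("Deployment", {"deviceId": str(device_id),
                                           "platform": platform,
                                           "startDate": start})
    container = ET.SubElement(deployment, "DataContainer", {"type": "DataStream"})
    record = ET.SubElement(container, "RecordDescription", {"delimiter": ","})
    for column, (name, units) in enumerate(variables):
        ET.SubElement(record, "RecordVariable",
                      {"name": name, "units": units, "columnIndex": str(column)})
    ET.indent(deployment)  # pretty-print in place (Python 3.9+)
    return ET.tostring(deployment, encoding="unicode")


print(deployment_xml(101, "Surface", "2008-06-18T00:00:00Z",
                     [("sea_temp", "degC"), ("pressure", "dbar")]))
```

Generating the XML from one structured call, instead of hand-editing it, is one way to make sure every deployment carries the same fields before ingest.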

SSDS data life cycle (cont.)

• Follow-on data processing runs consume instrument NetCDF DataContainers, producing combined data sets and graphical products. Metadata from SSDS is extracted as needed to fully describe the data in all the NetCDF data sets.
• The user has all the information needed to assess the data's suitability for a particular use.
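Assessing suitability depends on walking from a combined data set back to the instrument streams it was made from. A minimal sketch of that upstream traversal, assuming each DataProducer is reduced to a hypothetical mapping from an output container to its inputs; the container names are invented.

```python
from collections import deque


def lineage(product: str, produced_from: dict[str, list[str]]) -> list[str]:
    """Return every upstream container of `product`, nearest first (breadth-first).

    `produced_from` maps an output container to the inputs of the
    DataProducer that created it; raw streams have no entry.
    """
    seen: set[str] = set()
    order: list[str] = []
    queue = deque([product])
    while queue:
        current = queue.popleft()
        for source in produced_from.get(current, []):
            if source not in seen:  # a container may feed several products
                seen.add(source)
                order.append(source)
                queue.append(source)
    return order


# Hypothetical processing history for one graphical product.
produced_from = {
    "combined_plot.png": ["combined.nc"],
    "combined.nc": ["ctd.nc", "fluorometer.nc"],
    "ctd.nc": ["ctd DataStream"],
    "fluorometer.nc": ["fluorometer DataStream"],
}
print(lineage("combined_plot.png", produced_from))
```

Each container returned can then be looked up for its own metadata (device, deployment, calibration), which is the information a user needs to judge the product.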

SSDS Explorer web application

• Drill-down deployment tree
• Drill-down processing tree
• Used mainly by developers
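A drill-down processing tree can be thought of as an indented view of what was derived from a given data stream. A minimal sketch of rendering such a tree as text, assuming a hypothetical mapping from each container to the products derived from it; this is not how the SSDS Explorer is implemented.

```python
def render_tree(root: str, derived: dict[str, list[str]], indent: str = "  ") -> str:
    """Render `root` and everything derived from it as an indented outline."""
    lines: list[str] = []

    def walk(node: str, depth: int, path: frozenset[str]) -> None:
        lines.append(f"{indent * depth}{node}")
        for child in derived.get(node, []):
            if child not in path:  # skip accidental cycles in the records
                walk(child, depth + 1, path | {child})

    walk(root, 0, frozenset({root}))
    return "\n".join(lines)


# Hypothetical downstream products of one instrument stream.
derived = {
    "ctd DataStream": ["ctd.nc"],
    "ctd.nc": ["combined.nc", "ctd_plot.png"],
    "combined.nc": ["combined_plot.png"],
}
print(render_tree("ctd DataStream", derived))
```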

Lessons Learned

• Solid Interface Definitions Key
• Web Services Great – Not Always SOAP
• Policies Are As Important As Interfaces
• Identify The Critical Metadata
• Consume The Critical Metadata Early And Often
• Testing Station/Simulator
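One way to act on "identify the critical metadata" and "consume it early" is to refuse ingest of a description that lacks it, rather than discovering the gap during later processing. The sketch assumes a hypothetical set of critical fields; which fields SSDS treats as critical is not stated here.

```python
# Hypothetical policy: which fields count as "critical" is a project decision.
CRITICAL_FIELDS = ("device_id", "platform", "start", "latitude", "longitude")


class MissingMetadataError(ValueError):
    """Raised when a description lacks critical metadata."""


def check_critical(metadata: dict[str, object]) -> None:
    """Fail at ingest time if any critical field is absent or empty."""
    missing = [name for name in CRITICAL_FIELDS if metadata.get(name) in (None, "")]
    if missing:
        raise MissingMetadataError("missing critical metadata: " + ", ".join(missing))


def ingest(metadata: dict[str, object], store: list[dict[str, object]]) -> None:
    """Validate first, then accept; `store` stands in for the metadata database."""
    check_critical(metadata)
    store.append(metadata)


store: list[dict[str, object]] = []
ingest({"device_id": 101, "platform": "Surface", "start": "2008-06-18",
        "latitude": 36.8, "longitude": -122.0}, store)
try:
    ingest({"device_id": 102, "platform": ""}, store)
except MissingMetadataError as err:
    print(err)  # missing critical metadata: platform, start, latitude, longitude
```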

Acknowledgements

• SSDS funded by the David and Lucile Packard Foundation
• MOOS Leads: Mark Chaffey, Kent Headley
• Operations: Paul Coenen, Ken Heller, Hans Thomas, Duane Thompson
• Science: Jim Barry, Francisco Chavez, Charlie Paull, Erich Rienecker, John Ryan
• SSDS Development Team: Andrew Chase, Mike McCann, Brian Schlining, Rich Schramm