Oceanographic Data Provenance Tracking with the Shore Side Data System
Mike McCann, Kevin Gomes
International Provenance and Annotation Workshop
June 18, 2008
Outline
• Motivation
• Monterey Bay Aquarium Research Institute (MBARI) Projects:
  – Monterey Ocean Observing System (MOOS)
  – Shore-Side Data System (SSDS)
• Data Model
• Application framework
• Operational details
  – Instrument configuration
  – Data processing software
MUSE data
• Diversity of platforms & sensors
• Post-experiment organization
• Document for later use => FGDC
• Motivation for a better design
Identifying Requirements
• Configuring Instruments
  – Many different instruments
  – Many different manufacturers
  – Varied hardware, communication and metadata interfaces
  – However, all must interact with infrastructure
• Instruments Have Lifecycles
  – Changed for normal maintenance, cleaning, failure
  – Configuration can also change depending on science goal/experiment
  – In-situ re-configuration
  – Instrument-infrastructure relationship must be kept intact and, in fact, tracked.
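One way to keep the instrument-infrastructure relationship tracked over time is to store each deployment as a time-bounded link between a device and the node that hosts it. The following is a minimal sketch under stated assumptions, not the SSDS schema: the class names, field names, dates and configuration values are all invented for illustration. Only the device ID 1441 and the node names Surface and Benthic-1 appear elsewhere in these slides.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Deployment:
    """One period during which a device sat on a given parent (hypothetical model)."""
    device_id: int
    parent: str                      # platform or node hosting the device
    start: datetime
    end: Optional[datetime] = None   # None means still deployed
    config: dict = field(default_factory=dict)

    def active_at(self, when: datetime) -> bool:
        return self.start <= when and (self.end is None or when < self.end)


class DeploymentHistory:
    """Append-only history, so earlier configurations stay queryable."""

    def __init__(self):
        self._deployments = []

    def deploy(self, device_id, parent, start, config=None):
        # Close any open deployment of this device before opening a new one.
        for d in self._deployments:
            if d.device_id == device_id and d.end is None:
                d.end = start
        dep = Deployment(device_id, parent, start, config=dict(config or {}))
        self._deployments.append(dep)
        return dep

    def lookup(self, device_id, when):
        """Return the deployment that was active at `when`, or None."""
        for d in self._deployments:
            if d.device_id == device_id and d.active_at(when):
                return d
        return None


# Illustrative values only: dates and sample rates are made up.
history = DeploymentHistory()
history.deploy(1441, "Surface", datetime(2008, 1, 10), {"sample_rate_hz": 1})
history.deploy(1441, "Benthic-1", datetime(2008, 3, 2), {"sample_rate_hz": 4})
```

Because nothing is overwritten, `history.lookup(1441, datetime(2008, 2, 1))` still returns the Surface deployment and its configuration after the device has been moved.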
Identifying Requirements
• Metadata Must Tie To Data
  – Huge variation in data formats that users must handle
  – Traditionally added on after the fact
  – Not scalable and error prone
Identifying Requirements
• Instruments Can Cross Observatories
  – Some instrument supplies are limited
  – Experiment configuration
• Metadata and Data Can Cross Observatories
  – Example: data processing for instruments should not have to be re-written
Software Middleware for MOOS
[Figure: network diagram. Labels: shore network; MOOS moored network; Shore Side Data System; TCP/IP via satellite; Surface, Benthic-1, Benthic-2; instrument services (three instances); instrument GUI; telemetry retriever.]
SSDS: Metadata and Data Management
• Requirements for SSDS (partial list)
  – Capture observatory and instrument lifecycle data
  – Return instrument data in its native (“raw”) format
  – Simple analysis tools for viewing data
  – Capture and archive processed data products and associated metadata, maintaining known relationships between data sets
  – Convert data to common formats
SSDS: Metadata and Data Management
[Figure: SSDS architecture diagram. Labels: Wet Side; Shore Side; Instrument Packets (+ infrastructure metadata); XML Metadata; Data (sample binary 010011011101110100 and text records 12.4,92.5 / 11.9,92.3 / 12.1,91.1); Ingest; SSDS Domain Logic API; Data Access Services; Metadata Access Services; Aggregate HTTP-based Services (SOA).]
Data Container & Data Producer attributes
• Recording…
  – What
  – Where
  – When
  – Relations
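The what/where/when/relations split can be pictured as two record types, one for things that hold data and one for things that produce it. This is a rough sketch assuming Python dataclasses. `DataContainer` and `DataProducer` are names used in these slides, but every field name and example value below is a guess made for illustration and is not taken from SSDS.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class DataContainer:
    """Something that holds data, e.g. a stream or a file (fields are assumptions)."""
    name: str
    kind: str                               # what: e.g. "DataStream" or "DataFile"
    start: Optional[datetime] = None        # when: start of time coverage
    end: Optional[datetime] = None          # when: end of time coverage
    latitude: Optional[float] = None        # where
    longitude: Optional[float] = None
    depth_m: Optional[float] = None


@dataclass
class DataProducer:
    """Something that produces data, e.g. a deployment or a processing run."""
    name: str
    inputs: List[DataContainer] = field(default_factory=list)    # relations
    outputs: List[DataContainer] = field(default_factory=list)   # relations

    def describe(self) -> str:
        ins = ", ".join(c.name for c in self.inputs) or "(none)"
        outs = ", ".join(c.name for c in self.outputs) or "(none)"
        return f"{self.name}: {ins} -> {outs}"


# Hypothetical names, used only to show how the relation is recorded.
stream = DataContainer("ctd_stream", "DataStream", depth_m=10.0)
netcdf = DataContainer("ctd.nc", "DataFile")
run = DataProducer("stream_to_netcdf", inputs=[stream], outputs=[netcdf])
print(run.describe())
```

Keeping the relations on the producer rather than on the containers means a container can be the output of one producer and the input of another without being modified.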
Satisfying Requirements: Configuration
[Figure: SSDS architecture diagram (Wet Side, Shore Side, Instrument Packets + infrastructure metadata, XML Metadata, Data, Ingest, SSDS Business Logic API, Data Access Services, Metadata Access Services, Aggregate HTTP-based Services)]
Satisfying Requirements: Dynamic Lifecycle
[Figure: SSDS architecture diagram (Wet Side, Shore Side, Instrument Packets + infrastructure metadata, XML Metadata, Data, Ingest, SSDS Business Logic API, Data Access Services, Metadata Access Services, Aggregate HTTP-based Services)]
Satisfying Requirements: Resource Mgmt.
[Figure: SSDS architecture diagram (Wet Side, Shore Side, Instrument Packets + infrastructure metadata, XML Metadata, Data, Ingest, SSDS Business Logic API, Data Access Services, Metadata Access Services, Aggregate HTTP-based Services)]
Satisfying Requirements: Health Monitoring
[Figure: SSDS architecture diagram (Wet Side, Shore Side, Instrument Packets + infrastructure metadata, XML Metadata, Data, Ingest, SSDS Business Logic API, Data Access Services, Metadata Access Services, Aggregate HTTP-based Services)]
Subject: SSDS: No recent data stream update from instruments: 1441
A problem has been encountered while checking on the status of the following data streams that SSDS is monitoring:
Device ID :: Last update time (in hours) :: Device Name
-------------------------------------
MSE Surface Node
-------------------------------------
1441 :: 2.9 :: Medusa Card
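An alert like the one above could come from a periodic check that compares each stream's last update time with a staleness threshold. The sketch below is an assumption about how such a check might look, not the SSDS implementation: the function names, the two-hour threshold, the second device ID (1500) and the timestamps are all hypothetical. Only the report layout, device 1441 and the name Medusa Card come from the message shown.

```python
from datetime import datetime, timedelta


def stale_streams(last_update, now, max_age_hours=2.0):
    """Return (device_id, age_in_hours) for streams with no recent update."""
    stale = []
    for device_id, stamp in sorted(last_update.items()):
        age = (now - stamp) / timedelta(hours=1)   # elapsed time in hours
        if age > max_age_hours:
            stale.append((device_id, round(age, 1)))
    return stale


def format_alert(stale, names):
    """Lay out the stale streams in the style of the e-mail shown above."""
    ids = ", ".join(str(device_id) for device_id, _ in stale)
    lines = [f"Subject: SSDS: No recent data stream update from instruments: {ids}",
             "Device ID :: Last update time (in hours) :: Device Name"]
    lines += [f"{device_id} :: {age} :: {names.get(device_id, 'unknown')}"
              for device_id, age in stale]
    return "\n".join(lines)


# Illustrative timestamps: device 1441 last reported 2 h 54 min ago.
now = datetime(2008, 6, 18, 12, 0)
last_update = {1441: now - timedelta(hours=2, minutes=54),
               1500: now - timedelta(minutes=10)}   # 1500 is a made-up device
alert = format_alert(stale_streams(last_update, now), {1441: "Medusa Card"})
print(alert)
```

Only device 1441 exceeds the assumed threshold here, so the alert lists it alone.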
Satisfying Requirements: Metadata/Data
[Figure: SSDS architecture diagram (Wet Side, Shore Side, Instrument Packets + infrastructure metadata, XML Metadata, Data, Ingest, SSDS Business Logic API, Data Access Services, Metadata Access Services, Aggregate HTTP-based Services)]
Application Programming Interface
- Perl
[Figure: two images, not recoverable]
Application Programming Interface
- Matlab (works with R2008a)
% Import SSDS package
import moos.ssds.services.metadata.*
% Get Home interface
home = moos.ssds.services.metadata.DataProducerAccessUtil.getHome();
% Get Access object
dpAccess = home.create();
% Call methods on the Access object
dList = dpAccess.findByName('Back', logical(0), 'id', 'ascending', logical(1));
it = dList.iterator;
d = it.next;
d.getDevice.getMfgSerialNumber
ans =
WL-30011
SSDS data life cycle
• Instrument is defined by creating Device record.
• Instrument is configured for deployment by writing Deployment, DataContainer, RecordDescription, RecordVariable XML.
• Instrument is deployed. XML metadata is ingested by SSDS, data packets flow into the Instrument Packets database.
• Automated DataStream processing software consumes the data packets, producing a NetCDF file for each instrument’s data. A DataProducer record is created linking the input DataStream to the output DataFile.
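The configuration step can be illustrated with a small metadata document that nests a RecordDescription and its RecordVariables inside a Deployment, and is then used to decode a raw record. This is only a sketch: the four element names are the record types listed above, but the nesting, the attribute names (`deviceId`, `separator`, `column`, `units`) and the variable names are assumptions, and the real SSDS XML schema is not shown in these slides. The sample record `12.4,92.5` is the first text record in the architecture figure.

```python
import xml.etree.ElementTree as ET


def build_deployment_xml(device_id, variables):
    """Build a metadata document; nesting and attribute names are hypothetical."""
    deployment = ET.Element("Deployment", {"deviceId": str(device_id)})
    container = ET.SubElement(deployment, "DataContainer", {"type": "DataStream"})
    record = ET.SubElement(container, "RecordDescription", {"separator": ","})
    for column, (name, units) in enumerate(variables, start=1):
        ET.SubElement(record, "RecordVariable",
                      {"column": str(column), "name": name, "units": units})
    return deployment


def parse_record(deployment, line):
    """Use the RecordDescription to turn one raw text record into named values."""
    record = deployment.find("./DataContainer/RecordDescription")
    fields = line.split(record.get("separator"))
    return {var.get("name"): float(fields[int(var.get("column")) - 1])
            for var in record.findall("RecordVariable")}


# Variable names and units are invented; the slides do not say what the columns are.
metadata = build_deployment_xml(1441, [("temperature", "degC"),
                                       ("humidity", "percent")])
print(ET.tostring(metadata, encoding="unicode"))
print(parse_record(metadata, "12.4,92.5"))
```

The point of the sketch is that the same metadata that documents the stream also drives its parsing, so the description cannot drift away from the data.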
SSDS data life cycle (cont.)
• Follow-on data processing runs consume instrument NetCDF DataContainers, producing combined data sets and graphical products. Metadata from SSDS is extracted as needed to fully describe data in all the NetCDF data sets.
• User uses the data with all the needed information to assess its suitability for a particular use.
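Once every processing step records which inputs produced which outputs, the provenance of any product can be recovered by walking those links backwards. Here is a minimal sketch of that walk, assuming the links have been flattened into a plain dictionary. The data set names are invented, and SSDS itself exposes this information through its metadata services rather than through a structure like this.

```python
from collections import deque

# Each processing step maps an output name to the inputs it was derived from.
# All names below are hypothetical.
DERIVED_FROM = {
    "ctd.nc": ["ctd_stream"],
    "fluorometer.nc": ["fluorometer_stream"],
    "combined.nc": ["ctd.nc", "fluorometer.nc"],
    "combined_plot.png": ["combined.nc"],
}


def lineage(product, derived_from):
    """Return every upstream data set of `product`, nearest first (breadth-first)."""
    seen, order = set(), []
    queue = deque(derived_from.get(product, []))
    while queue:
        name = queue.popleft()
        if name in seen:
            continue            # already visited via another path
        seen.add(name)
        order.append(name)
        queue.extend(derived_from.get(name, []))
    return order


print(lineage("combined_plot.png", DERIVED_FROM))
# ['combined.nc', 'ctd.nc', 'fluorometer.nc', 'ctd_stream', 'fluorometer_stream']
```

A user assessing the plot can thus see that it ultimately depends on two raw instrument streams, and can look up the deployment metadata for each.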
SSDS Explorer web application
• Drill down deployment tree
• Drill down processing tree
• Used mainly by developers
Lessons Learned
• Solid Interface Definitions Key
• Web Services Great – Not Always SOAP
• Policies Are As Important As Interfaces
• Identify The Critical Metadata
• Consume The Critical Metadata Early And Often
• Testing Station/Simulator
Acknowledgements
• SSDS funded by the David and Lucile Packard Foundation
• MOOS Leads: Mark Chaffey, Kent Headley
• Operations: Paul Coenen, Ken Heller, Hans Thomas, Duane Thompson
• Science: Jim Barry, Francisco Chavez, Charlie Paull, Erich Rienecker, John Ryan
• SSDS Development Team: Andrew Chase, Mike McCann, Brian Schlining, Rich Schramm