54
Instant Karma: Applying a Proven Provenance Tool to NASA’s AMSR-E Data Production Stream ACCESS-09-0006 Annual Review 1 May 2012 UAHuntsville The University of Alabama in Huntsville

Annual Review 1 May 2012

  • Upload
    maj

  • View
    34

  • Download
    0

Embed Size (px)

DESCRIPTION

UAHuntsville The University of Alabama in Huntsville. Instant Karma: Applying a Proven Provenance Tool to NASA’s AMSR-E Data Production Stream ACCESS-09-0006. Annual Review 1 May 2012. Instant Karma: Applying a Proven Provenance Tool to NASA’s AMSR-E Data Production Stream. - PowerPoint PPT Presentation

Citation preview

Page 1: Annual Review 1 May 2012

Instant Karma: Applying a Proven Provenance Tool

to NASA’s AMSR-E Data Production StreamACCESS-09-0006

Annual Review1 May 2012

UAHuntsvilleThe University of Alabama in Huntsville

Page 2: Annual Review 1 May 2012

Objectives

Approach

Co-Is/Partners

Key Milestones

Advancing Collaborative Connections for Earth System ScienceACCESS

Instant Karma: Applying a Proven Provenance Tool to NASA’s AMSR-E Data Production Stream

PI: Michael Goodman, NASA / MSFC

TRLin= 7 TRLcurrent= 7.9

16 April 2012

• Improve the collection, preservation, utility and dissemination of provenance information within the NASA Earth Science community

• Customize and integrate Karma, a proven provenance tool into NASA data production

• Collect and disseminate provenance of AMSR-E (Advanced Microwave Scanning Radiometer – Earth Observing System) standard data products, initially focusing on Sea Ice

• Engage the Sea Ice science team and user community• Adhere to the Open Provenance Model (OPM)

Apply Karma to Sea Ice data production workflows Customize Karma’s provenance dissemination user

interface Evaluate usefulness of provenance collected

- Measure traffic to Karma Provenance Browser- Collect user feedback

Expand use of Karma to other AMSR-E data production streams

Evaluate current AMSR-E SIPS product generation

06/10 Extend Karma provenance collection tools for SIPS

09/10 Enhance Karma Provenance Browser interface

10/10 Instrument AMSR-E Sea Ice production in Testbed

12/10 Evaluate with Sea Ice science team

03/11 Introduce Provenance Browser to NSIDC DAAC

06/11 Instrument AMSR-E Sea Ice production in Ops

09/11 Evaluate with AMSR-E Sea Ice user community

02/12 Instrument other AMSR-E data streams

02/12

Thorsten Markus, NASA GSFC; Beth Plale, Indiana University; Helen Conover, Rahul Ramachandran, UAHuntsville

Page 3: Annual Review 1 May 2012

Objectives

Accomplishments

Co-Is/Partners

Advancing Collaborative Connections for Earth System ScienceACCESS

Instant Karma: Applying a Proven Provenance Tool to NASA’s AMSR-E Data Production Stream

PI: Michael Goodman, NASA / MSFC

TRLin= 7 TRLout= 7.9

16 April 2012

• Improve the collection, preservation, utility and dissemination of provenance information within the NASA Earth Science community

• Customize and integrate Karma, a proven provenance tool into NASA data production

• Collect and disseminate provenance of AMSR-E (Advanced Microwave Scanning Radiometer – Earth Observing System) standard data products, initially focusing on Sea Ice

• Engage the Sea Ice science team and user community• Adhere to the Open Provenance Model (OPM)

Thorsten Markus, NASA GSFC; Beth Plale, Indiana University; Helen Conover, Rahul Ramachandran, UAHuntsville

• Applied current provenance research results to a legacy NASA science processing system• Instrumented all processing streams for all AMSR-E standard products, using a software library developed at

UAH and provenance logging specifications provided by IU• Captured high level science relevant provenance and context information in collaboration with the Science

Team• Developed a provenance browser customized for AMSR-E science users

• Engaged user community via presentations to AMSR-E Science Team, beta-testing with AMSR-E Sea Ice Team, and outreach to Sea Ice community at GSFC

• Developed plan to transition provenance collection and display tools into production at the AMSR-E SIPS• Moving toward ISO rather than OPM, in line with NASA’s current metadata approach• Enhanced Karma visualization based on Sea-Ice rolled into visualization tools for other Karma users

Provenance Repository

Ocean

Weekly Script

OceanRainSnow

Monthly Script

Snow

Pentad Script

OceanLandSnowSea Ice

Daily Script

L2A Brightness

Temps

Pass Script

L2B OceanL2B LandL2B Rain

Query API

ProcessingTestbed

Provenance BrowserProvenance

Collection

Page 4: Annual Review 1 May 2012

Presentation Outline

• Introduction• Provenance for AMSR-E SIPS• Advanced Uses of Provenance• ESDSWG / ESIP / Outreach• Accomplishments / Lessons Learned• Contact Info / Acronyms • Back-up Slides

1 May 2012

Page 5: Annual Review 1 May 2012

Instant Karma Overview

1 May 2012

Data Center Operations

Provenance Research

Earth Science

Instant Karma Project

• Collaboration among– AMSR-E SIPS (MSFC Earth Science Office

and UAHuntsville ITSC) – Indiana University, Data to Insight Center– AMSR-E Sea Ice science team (GSFC)

• Primary goal is to improve the collection, preservation, utility and dissemination of provenance and context information within the NASA Earth Science community, building on– Karma provenance tools– AMSR-E standard product generation, with initial focus on Sea Ice

Page 6: Annual Review 1 May 2012

Team Leads

1 May 2012

Name Organization RoleMichael Goodman NASA MSFC Principal InvestigatorHelen ConoverRahul Ramachandran

UAHuntsville Information Technology & Systems Center

Co-Investigator – Project ManagerCo-Investigator – System Architect

Beth Plale

Indiana University

Data to Insight Center

Co-Investigator – Karma System

Thorsten Markus NASA GSFC Co-Investigator – Science TeamDawn Conway UAHuntsville

Earth System Science Center

AMSR-E Science Metadata Consultant

Page 7: Annual Review 1 May 2012

Team Membersover the two-year project

1 May 2012

Name Organization RoleBruce Beaumont Anurag Bhaskar Chocka ChidambaramAjinkya KulkarniMichael McEniryKathryn RegnerCara Stein

UAHuntsville Information Technology & Systems Center

AMSR-E SIPS Implementation Graduate Student, Metadata UtilitiesProvenance Browser Provenance Browser, Metadata UtilitiesSystems Administration, ISO metadata definitionAMSR-E SIPS Systems EngineerProvenance Browser

Mehmet AktasScott JensenYuan LuoRobert PingPrajakta PurohitYiming Sun

Indiana University Data to Insight Center

Postdoc, Karma Database and ServicesPostdoc, Karma Database and Services Graduate Student, Karma System MonitoringProject ManagementGraduate Student, Provenance Collection ServicesGraduate Student, Karma Database and Services

Don CavalieriAlvaro Ivanoff

NASA GSFC Science Algorithm DeveloperScience Software Developer

Page 8: Annual Review 1 May 2012

Instant Karma Year 2 Annual Review ACCESS-09-0006

8

Project Approach

Focus on capturing information for contextual understanding and long term data preservation rather than full data reproducibility• SIPS processing flow is very controlled, with less need for detailed provenance

– Files in a given version of a dataset will differ only by list of input files, processing location and environment, processing date and time

• Provenance as centerpiece around which other information is aggregated

1 May 2012

Research Track: Karma Project

AMSR-E Sea Ice Team Provenance @ SIPS

Provenance researchCollection, storage, display methods

NASA data preservation spec Operational environment

Customized provenance database and browser

Science user

needsRequ

irem

ents

Research results

Operations Track: AMSR-E SIPS

Page 9: Annual Review 1 May 2012

Instant Karma Year 2 Annual Review ACCESS-09-0006

9

Science-Relevant Provenance and Context Information

• Data lineage (data inputs, software and hardware) plus additional contextual knowledge about science algorithms, instrument variations, etc.

• Lots of information already available, but scattered across multiple locations– Processing system configuration– Data collection and file level metadata– Processing history information – Quality assurance information – Software documentation (e.g., algorithm theoretical basis documents, release

notes)– Data documentation (e.g., guide documents, README files)

• Project goal was to collate and organize information from multiple sources, make available through the AMSR-E Provenance Browser

1 May 2012

Page 10: Annual Review 1 May 2012

AMSR-E Provenance Use Cases Browse provenance graphs : convey rich information about final data

granule details [Use case 1]– Spatial location, time of observation, algorithms employed, input data and

ancillary files– Provenance bundle to include pointers to relevant documentation

Answer “Something isn’t right” question [Use case 1 variant]- E.g., did not receive data for several days so snow melt mask may be

inaccurate. Compare two data granules [Use case 2]

- Query system to compare provenance graph structure and provenance details (e.g., versions of software, number and versions of input files)

General provenance graph for a given science process, e.g., Sea Ice processing [Use case 3]- Current algorithms and versions, nominal number and versions of input files,

pointers to relevant documentation Embed provenance information as annotations in data files [Use case 4]

- ISO “Lineage” model (evolving NASA ES conventions)

1 May 2012

Page 11: Annual Review 1 May 2012

Instant Karma Major Milestones

1 May 2012

Page 12: Annual Review 1 May 2012

TRL Description

1 Basic principles observed and reportedTransition from scientific research to applied research

2 Technology concept and/or application formulatedApplied research

3 Analytical and experimental critical function and/or characteristic proof-of concept. Proof of concept validation

4 Component/subsystem validation in laboratory environmentStandalone prototyping implementation and test

5 System/subsystem/component validation in relevant environmentThorough testing of prototyping in representative environment

6System/subsystem model or prototyping demonstration in a relevant end-to-end environment (ground or space)Prototyping implementations on full-scale realistic problems

7System prototyping demonstration in an operational environment (ground or space)System prototyping demonstration in operational environment

8Actual system completed and "mission qualified" through test and demonstration in an operational environment (ground or space)End of system development

9Actual system "mission proven" through successful mission operations (ground or space)Fully integrated with operational hardware/software systems

Technology/Software Readiness Levels

1 May 2012 Instant Karma Year 2 Annual Review ACCESS-09-0006

Target Deployments

Testbed

AMSR-E SIPS

SIPS / DAAC

deployed

planned

key

Page 13: Annual Review 1 May 2012

Presentation Outline

• Introduction• Provenance for AMSR-E SIPS• Advanced Uses of Provenance• ESDSWG / ESIP / Outreach• Accomplishments / Lessons Learned• Contact Info / Acronyms • Back-up Slides

1 May 2012

Page 14: Annual Review 1 May 2012

Instant Karma Year 2 Annual Review ACCESS-09-0006

14

AMSR-E Product Suite

1 May 2012

Product Description Science PI Sample ImageryAMSR-E/Aqua L2A Global Swath Spatially-Resampled Brightness

Temperatures (Tb)Wentz

AMSR-E/Aqua L2B Global Swath Ocean Products derived from Wentz Algorithm

L3 Ascending/Descending .25x.25 deg Daily Ocean GridsL3 Weekly and Monthly Ocean Grids

Wentz

AMSR-E/Aqua L2B Surface Soil Moisture, Ancillary Parms, & QC EASE-GridsDaily L3 Surface Soil Moisture, Interpretive Parms & QC EASE-Grids

Njoku

AMSR-E/Aqua L2B Global Swath Rain Rate/Type GSFC Profiling AlgorithmL3 Monthly Global Averaged Rainfall Accumulation over Land & Ocean

Kummerow, Ferraro, Wilheit

AMSR-E/Aqua Daily L3 6.25 km 89 GHz Brightness Temperature (Tb) Polar Grids

L3 Daily 12.5 km Tb, Sea Ice Concentration, & Snow Depth Polar GridsL3 Daily 25 km Tb, Sea Ice ConcentrationSea Ice Drift

Markus, Cavalieri, Comiso

AMSR-E/Aqua Daily L3 Global Snow Water Equivalent EASE-GridsL3 5-Day and Monthly Snow Water Equivalent grids

Kelly

Page 15: Annual Review 1 May 2012

Instant Karma Year 2 Annual Review ACCESS-09-0006

15

SIPS-GHRC Processing Architecture• Algorithm

Packages from the Science Team and Team Lead of the Science Computing Facility

• Processing automation controlled by SIPS scripts

– Pass processing is data driven

– Level-3 product generation is scheduled after nominal availability of input products

1 May 2012

Input Data and Metadata File(s)

Ancillary File(s)

Science DataMetadataProcessing HistoryQuality AssuranceBrowse imagery Custom subsets

Delivered Algorithm Package

Control Script

Provenance logging at this level

Page 16: Annual Review 1 May 2012

Provenance Repository

Query API

Provenance Information Architecture

1 May 2012

MP

Ocean

Weekly Script

OceanRainSnow

Monthly Script

Snow

Pentad Script

OceanLandSnowSea Ice

Daily Script

L2A Brightness

Temps

Pass Script

L2B OceanL2B LandL2B Rain

ProcessingTestbed

P PP

P

PM

MM

M

M

M

Granule Metadata

Provenance Logs

M

M

P

Product / AlgorithmMetadata Form

Provenance Browser

Page 17: Annual Review 1 May 2012

Instant Karma Year 2 Annual Review ACCESS-09-0006

17

Earth science Library for Processing History (ELPH)

• Perl Library reference implementation can be used to instrument script-driven processing

• Based on IU work in software instrumentation and Karma provenance log specification

• Logs “consumed”, “invoked” and “produced” events• Assigns unique URNs to all artifacts (e.g., data files or software processes), of

the form urn:host-id/environment:datetime:name– host-id indicates computer system on which processing is performed– environment indicates processing environment (e.g., dev, test or ops for AMSR-E)– datetime (yyyymmddhhmmss) indicates production date/time for science data

files, last modification date/time for other files, system date/time for processes or workflows

– name indicates name of file or process

1 May 2012

Page 18: Annual Review 1 May 2012

Instant Karma Year 2 Annual Review ACCESS-09-0006

18

Provenance and Context Metadata• Harvesting granule information from ECS metadata

– Also recording processing location associated with each data granule• Providing Context Summary information for algorithms and data products,

in consultation with AMSR-E Science Computing Facility and Sea Ice team– Algorithm versions and descriptions– Parameters and data fields in science products– Ancillary files used in processing– Flag values and explanations– Pointers to full documentation– Aligning with NASA Preservation and Context Specification

• Developing ISO Lineage compliant processing descriptions, in consultation Ted Habermann

– High-level description of product generation processes to be stored at NSIDC DAAC with collection-level metadata

– Granule specific information, which references high level description, to be embedded in data files

1 May 2012

Page 19: Annual Review 1 May 2012

Instant Karma Year 2 Annual Review ACCESS-09-0006

19

Context Summary Metadata Schema

1 May 2012

Basic Product Information

Delivered Algorithm Package

Documentation Links

Science Algorithm Information

Geophysical Parameters / Variables and Associated

Information Data fields

Ancillary filesFlag values

File Informatione.g., grid type

Page 20: Annual Review 1 May 2012

Instant Karma Year 2 Annual Review ACCESS-09-0006

20

AMSR-E Provenance Browser

1 May 2012

Browse by Product Type

Browse by Product Images

Search by File Name

Explore Provenance

Page 21: Annual Review 1 May 2012

Instant Karma Year 2 Annual Review ACCESS-09-0006

21

AMSR-E Provenance Browser

1 May 2012

ProcessingGraph

AlgorithmInformation

GranuleMetadata

Parameters and Algorithms

Page 22: Annual Review 1 May 2012

Instant Karma Year 2 Annual Review ACCESS-09-0006

22

AMSR-E Provenance Browser

1 May 2012

Algorithm Used

Parameter Selected

Page 23: Annual Review 1 May 2012

Instant Karma Year 2 Annual Review ACCESS-09-0006

23

AMSR-E Provenance BrowserModes of Exploration

• Browse list by product type, file name– Mid-winter Sea Ice data was incorrect after reprocessing. Look at January data

to determine what was different about this processing run. • Browse images by product type, parameter/variable

– Scan series of images to look for possible problems, e.g., incomplete data for 2011-10-04

• Search by file name– Question about a particular data file that doesn’t look right. Search by file

name; may discover that this was a problem file later replaced, and get new file.• View general processing information

– For information about how a files from a particular product are generated, view general processing graph and get pointers to more detailed documentation.

1 May 2012

Page 24: Annual Review 1 May 2012

Presentation Outline

• Introduction• Provenance for AMSR-E SIPS• Advanced Uses of Provenance• ESDSWG / ESIP / Outreach• Accomplishments / Lessons Learned• Contact Info / Acronyms • Back-up Slides

1 May 2012

Page 25: Annual Review 1 May 2012

Advanced Uses of Provenance

I. Provenance over months and yearsII. Forward provenanceIII. Comparing provenance graphsIV. Provenance quality analysis

Page 26: Annual Review 1 May 2012

I. Month-view of Product ActivityDaily processing for Sea Ice products over 1 month is linked graph of provenance graphs, one per day.

View shows links between provenance graphs. E.g., ice mask updated for each day, feeds into the next day.

This view can be used to detect problems in processing over extended time periods.

Page 27: Annual Review 1 May 2012

Zoom in to Review Dependencies

Zoom to review complex dependencies between runs responsible for producing data products

Shows dependencies for Sea Ice Drift product, built from five daily Sea Ice products

Can see here that input data for fifth day is incomplete (instrument offline)

Zoom in and out to see entire provenance history or specific details

Beth Plale
edits for readability
Page 28: Annual Review 1 May 2012

II. Forward Provenance

Standard provenance view is historical; it shows actions that went into creating a data product

Shown here is 12km SeaIce product

Clicking Data Forward option switches to displaying later-in-time (forward) provenance dependencies

Page 29: Annual Review 1 May 2012

Forward Provenance Graph12KM Sea Ice Product

Forward provenance can be helpful if problem is discovered in data product. The progeny of bad data product can be tracked down visually by following forward provenance chain.

High level view provides quick snapshot of downstream dependencies on data product

Useful to quickly see downstream consequences of reprocessing

Page 30: Annual Review 1 May 2012

III. Comparing Two Provenance Graphs

Compare two workflows side-by-side with node-to-node mapping that uses Direct Classification of node Attendance (DCA)*

* DePiero, F. and Trivedi, M. and Serbin, S., Graph matching using a direct classification of node attendance , Pattern Recognition, (29)6, pp 1031—1048.

Page 31: Annual Review 1 May 2012

Provenance Graph Comparison

Matched sub-graph: Red nodes are matched to corresponding nodes in comparison graph

Left hand graph has 19 data files that cannot be paired with graph on right

One can see from zooming in that additional nodes are input files. This abnormal case occurred because instrument went off-line so data was not available

Beth Plale
edits to wording
Page 32: Annual Review 1 May 2012

Comparison of Metadata

Clicking on data product or process in one graph (e.g., on left), automatically highlights corresponding product or process in comparison graph

Displaying metadata for data product or process in one graph automatically displays metadata for corresponding node in comparison graph

Page 33: Annual Review 1 May 2012

IV. Analysis of Provenance Quality

• MethodologyDuplicate detection

• Annotations are mined for duplicates and conflicts.

Structural analysis• Structures of provenance graphs that are of the same types are

compared and mined for differences or similarities.

Anomaly detection• Detection of outliers in a provenance graph

Validation of attributes• Fields such as time ranges are validated to ensure that time ranges are

valid.

• Applied to 1 month of data processing

Page 34: Annual Review 1 May 2012

Analysis of Quality of Provenance

Structural analysis of 1 month of provenance detected an incorrect timestamp range:– Process_6250<-Process_6251

Time range is reversed: noEarlierThan: 2011-10-13T00:44:32.0000-04:00noLaterThan: 2011-10-13T00:44:31.0000-04:00

This is an artifact of data produced and consumed entries in log file being reversed, which results in begin and end time of time range being swapped

Page 35: Annual Review 1 May 2012

Plot of number of annotations for each node and edge in monthly ocean workflow

used edges Annotations Annotations for wasDerivedFrom edges Data Object annotations

Anomaly detection- Exposes duplicate annotations occurring from duplicate entries in the log files.

Page 36: Annual Review 1 May 2012

Annotations; zoom in I

4 wasDerivedFrom edges that have fewer annotations than usual.

used edges Annotations Annotations for wasDerivedFrom edges Data Object annotations

Anomaly detection- Exposes edges that have less annotations than usual

Page 37: Annual Review 1 May 2012

Annotations; zoom in II

wasTriggeredBy edge has annotations with same uri (keys), but different values.

Anomaly detection- Exposes annotations that share the same key, but have different values. This may or

may not be a problem. But Karma is able to warn the user about this.

Note: Image is a close up of the highlighted part in previous slide.

Page 38: Annual Review 1 May 2012

Presentation Outline

• Introduction• Provenance for AMSR-E SIPS• Advanced Uses of Provenance• ESDSWG / ESIP / Outreach• Accomplishments / Lessons Learned• Contact Info / Acronyms • Back-up Slides

1 May 2012

Page 39: Annual Review 1 May 2012

ESDSWG and ESIP Activities• NASA Earth Science Data Systems Technology Infusion Working Group

– UAH team members participate in ESDSWG Tech Infusion, specifically the Data Stewardship Subgroup. Key subgroup activities:

• NASA Earth Science Data Preservation Content Specification• Preservation / Provenance Ontology

– Co-I Conover attended the ESDSWG meeting in October 2010 – Session on science-relevant provenance

• NASA Earth Science Data Systems Standards Process Group– Co-I Conover presented the provenance browser and metadata work at the ESDSWG

meeting in November 2011. This led to current work with ISO Lineage spec.• ESIP Federation Information Technology and Interoperability Committee and the

Preservation and Stewardship Cluster– Co-I Plale gave a presentation to IT&I in September 2010 on Instant Karma,

provenance collection, semantics, and interoperability. This group determined that the best place to engage in provenance efforts is through the ESIP Federation Preservation and Stewardship Cluster’s provenance working group.

– Co-I Conover attended the ESIP Federation meeting in July 2010 – Data Identifiers presentation of special interest to this project

– Co-I Plale attended the January 2011 ESIP meeting – Sessions on stewardship, science-relevant provenance among others

– Co-I Conover attended the July 2012 ESIP meeting – Sessions on ISO metadata, NASA Earth Science Data Preservation Content Specification, provenance ontology, others

1 May 2012

Page 40: Annual Review 1 May 2012

Outreach to Community:Posters and Presentations

“Instant Karma: Collecting Provenance for AMSR-E,” Beth Plale and Helen Conover. Joint AMSR-E Science Team Meeting, Huntsville AL, June 2, 2010.

“Instant Karma: Provenance Collection at the AMSR-E SIPS,” Helen Conover, Beth Plale, Sunil Movva, Prajakta Purohit, Kathryn Regner, Yiming Sun. Poster presented at the 9th Earth Science Data Systems Working Group Meeting, New Orleans, October 20-22, 2010.

“Instant Karma: Provenance Collection at the AMSR-E SIPS,” Helen Conover, Beth Plale, Sunil Movva, Prajakta Purohit, Kathryn Regner, Yiming Sun. Poster presented at the International Symposium on the A-Train Satellite Constellation 2010, New Orleans, October 25-28, 2010.

"Karma Provenance: Why and How? Provenance collection of unmanaged workflows”, Mehmet Aktas, Presentation at the Super Computing 2010 Conference, New Orleans, LA, November 14-20, 2010.

"Karma Provenance: Why and How? Provenance collection of unmanaged workflows”, M. Aktas, Presentation at the CS Department of University of West Florida, Pensacola, FL, November 20, 2010.

“Metadata and Provenance Collection and Representation: Antecedent to Scientific Data Preservation,” B. Plale, Open Data Seminar, University of Michigan, November 2010.

“Applying the Karma Provenance Tool to NASA’s AMSR-E Data Production Stream,” R. Ramachandran, H. Conover, K. Regner, S. Movva, Michael Goodman, B. Plale, P. Purohit, Y. Sun. American Geophysical Union Fall Meeting, December 13–17, 2010.

1 May 2012

Page 41: Annual Review 1 May 2012

Outreach to Community:Posters and Presentations

“Data Provenance for Preservation of Digital Geospatial Data,” B. Plale, B. Cao, C. Herath, and Y. Sun, Geological Society of America special volume on Transforming Data To Knowledge for Geosciences, v. 482, p. 125-137, 2011.

“Instant Karma: Accessing Provenance Information for AMSR-E Science Data Products”, H. Conover, K. Regner. Presented at AMSR-E Science Team Meeting, Asheville, NC, June 2011.

“Instrumenting Earth Science Applications for OPM-Driven Provenance,” Mehmet Aktas, Beth Plale, Helen Conover, Prajakta Purohit. White paper distributed at ESIP Federation Meeting, Santa Fe, July 2011.

“Instant Karma Status Update: Provenance at the AMSR-E SIPS,” Conover, Plale, Aktas, B. Beaumont, D. Conway, S. Graves, S. Jensen, H. Joshi, A. Kulkarni, Y. Luo, R. Ping, P. Purohit, R. Ramachandran, K. Regner, C. Stein. Poster presented by Conover at ESIP Federation Meeting, Santa Fe, July 2011.

“Provenance Collection and Display Tools for the AMSR-E SIPS,” H. Conover, B. Beaumont, A. Kulkarni, R. Ramachandran, K. Regner, S. Graves, D. Conway. Presentation and poster at NASA Earth Science Data Systems Working Groups (ESDSWG) Meeting, Newport News, VA, November 2011.

“Key Provenance of Earth Science Observational Data Products,” H. Conover, B. Plale, M. Aktas, R. Ramachandran, P. Purohit, S. Jensen, and S. Graves. Presented at American Geophysical Union Fall Meeting, session IN22 Issues in Scientific Data Preservation and Stewardship, December 2011.

1 May 2012

Page 42: Annual Review 1 May 2012

Instant Karma Year 2 Annual Review ACCESS-09-0006

42

Outreach to Community:Posters and Presentations

“Provenance Collection and Display for the AMSR-E SIPS,” H. Conover, B. Beaumont, A. Kulkarni, R. Ramachandran, K. Regner, S. Graves, D. Conway. Poster presented at ESIP Federation Meeting, Washington DC, January 2012.

“Metadata and Provenance Capture: Fins in a Sea of Data,” B. Plale, invited talk, Purdue University, March 2012

“Metadata and Provenance: Dimensions of Use in Science”, B. Plale, invited talk, Biodiversity Workshop, Pacific Rim Applications and Grid Middleware Assembly (PRAGMA) 22, Melbourne, AU March 2012

“Metadata and Provenance Capture and Use,” B. Plale, invited talk, IBM TJ Watson, April 2012

In Progress:-- Paper in work, targeting IEEE Transactions on Geoscience and Remote Sensing, special issue on

Geoscience Data Provenance.

1 May 2012

Page 43: Annual Review 1 May 2012

Presentation Outline

• Introduction• Provenance for AMSR-E SIPS• Advanced Uses of Provenance• ESDSWG / ESIP / Outreach• Accomplishments / Lessons Learned• Contact Info / Acronyms • Back-up Slides

1 May 2012

Page 44: Annual Review 1 May 2012

Instant Karma Year 2 Annual Review ACCESS-09-0006

44

• Event system support implemented for capturing provenance from legacy systems

• Tools for harvesting provenance from logs, specifically logs of AMSR-E reprocessed data

• Enhancements to provenance plug-in for Cytoscape, driven by AMSR-E data needs:– Graph comparison: of graph structure and metadata based– Provenance chain visualization: over long duration (months, years)– Data forward provenance: provenance into future of derived products across

science products

• Enhancements based on AMSR-E data products in Instant Karma rolled into Karma provenance system and made available to other Karma users including NSF-funded GENI project

1 May 2012

Summary of Accomplishments

Page 45: Annual Review 1 May 2012

Summary of Accomplishments• Applied current provenance research results to a legacy NASA

science processing system– Instrumented all processing streams for all AMSR-E standard products, using

a software library developed at UAH and provenance logging specifications provided by IU.

– Captured high level science relevant provenance and context information in collaboration with the AMSR-E Science Team

– Developed a provenance browser customized for AMSR-E science users• Engaged user community via

– Presentations to AMSR-E Science Team– Collaboration and beta-testing with AMSR-E Sea Ice Team– Outreach to Sea Ice community at GSFC

• Developed plan to transition provenance collection and display tools into production at the AMSR-E SIPS– Coordinating with NSIDC DAAC on delivery of provenance information

1 May 2012

Page 46: Annual Review 1 May 2012

Instant Karma Year 2 Annual Review ACCESS-09-0006

46

Transition Plans• Currently able to collect provenance information for all AMSR-

E standard products generated at SIPS-GHRC– Level-2B and Level-3 products– Implemented in Provenance Testbed, ready for test at the AMSR-E

SIPS• Plan to implement provenance collection in AMSR-E SIPS

before reprocessing to begin in summer 2012• Working with NSIDC DAAC on delivery of ISO Lineage

compliant provenance information with data products– High level processing description, to be stored with collection-level

metadata at NSIDC DAAC– Granule specific information, referencing high level description, to be

embedded in data files

• Provenance Browser to be available from AMSR-E SIPS / GHRC through mission close-out, then transitioned to NSIDC DAAC

1 May 2012

Page 47: Annual Review 1 May 2012

Instant Karma Year 2 Annual Review ACCESS-09-0006

47

Project ResultsProducts for potential reuse resulting from this research:• Provenance collection tools

– Software library with reference implementation in Perl– Schema for provenance logs– InstantKarma Adaptor to ingest provenance logs into Karma

• Conventions for provenance and context metadata– ISO Lineage for product generation processes

• High level description, to be stored at NSIDC DAAC– ISO Lineage for data granules

• Granule specific information, to be embedded in data files• References high level description

– Additional high-level data product and algorithm context information

• Provenance storage and display – Drupal profile with provenance database and two modules

• Provenance browser• Data product and algorithm context form

– Provenance-custom plugins to Cytoscape for graph comparison and provenance viewing over time

1 May 2012

Page 48: Annual Review 1 May 2012

Instant Karma Year 2 Annual Review ACCESS-09-0006

48

Lessons Learnedadapting cutting edge research to a legacy production system

• Expect to encounter similar issues in retrofitting legacy science processing systems for provenance capture, such as heterogeneous system components and lack of direct control over science software

• Efficient instrumenting of a science processing system requires joint effort by experts with the processing system and with provenance collection.

• Need to consider provenance collection at beginning of system design– Science algorithms should be self describing– Need a way to track intermediate files which may be overwritten each time

• Evolving standards and conventions make project planning more difficult (but allow project to contribute to this evolution).

– Provenance metadata requirements– Unique identifiers for data collections and files– NASA Earth science conventions for ISO metadata– NASA Earth science data preservation requirements

1 May 2012

Page 49: Annual Review 1 May 2012

Instant Karma Year 2 Annual Review ACCESS-09-0006

49

Lessons Learnedstrict reproducibility vs. full contextual information

• Contextual information– Need to identify what is important and how to present

it concisely– Presenting key details of the science algorithm may be

most important to scientists• Reproducibility

– Full reproducibility requires logging everything. Metadata may get bigger than science data.

– Need to filter out most valuable information from entire processing log.

1 May 2012

Page 50: Annual Review 1 May 2012

Instant Karma Year 2 Annual Review ACCESS-09-0006

50

Recommendations• Design provenance collection into entire data lifecycle, from Level 0

processing onward• Agree on standard protocols/schemas for provenance and related

information (ISO Lineage?)– Develop a semantic mapping between ISO Lineage, OPM and other

provenance schemas• Allow for instrumentation of science algorithms (not black box)• Possible follow-on work

– Monitor provenance browser usage to determine most important features– Develop provenance libraries for common science programming and scripting

languages (C, FORTRAN, Perl, Python, etc.) that incorporate provenance logging into common commands (e.g., read, invoke, write, error)

– Document provenance collection and dissemination methodology for broader use in NASA Earth science product generation systems

– Apply these tools and techniques to future missions (e.g., GPM)

1 May 2012

Page 51: Annual Review 1 May 2012

Presentation Outline

• Introduction• Provenance for AMSR-E SIPS• Advanced Uses of Provenance• ESDSWG / ESIP / Outreach• Accomplishments / Lessons Learned• Contact Info / Acronyms • Back-up Slides

1 May 2012

Page 52: Annual Review 1 May 2012

Instant Karma Project Contacts• AMSR-E Provenance Browser

– http://provenance.itsc.uah.edu/ • Information Technology & Systems Center

– http://www.itsc.uah.edu

• Contacts– Michael Goodman [email protected] – Helen Conover [email protected] – Rahul Ramachandran [email protected]– Beth Plale [email protected] – Thorsten Markus [email protected]

1 May 2012

Page 53: Annual Review 1 May 2012

Instant Karma Year 2 Annual Review ACCESS-09-0006

53

Karma Provenance Tool Information• Karma v3.2.1 software is available on SourceForge• Licensed under Apache 2.0• Visualization uses Cytoscape v2.8.1. Karma

provenance plugin to Cytoscape available: http://d2i.indiana.edu/provenance_karma

• Project documentation available: http://d2i.indiana.edu/provenance_karma

• IU InstantKarma Project web site: http://d2i.indiana.edu/provenance_instantkarma

1 May 2012

Page 54: Annual Review 1 May 2012

Instant Karma Year 2 Annual Review ACCESS-09-0006

54

AcronymsAMSR-E Advanced Microwave Scanning Radiometer for EOSAPI Application Programming InterfaceDAAC Distributed Active Archive CenterECS EOSDIS Core SystemELPH Earth science Library for Processing HistoryEOSEarth Observing SystemEOSDIS Earth Observing System Data and Information SystemESDSWG Earth Science Data Systems Working Group ESIP Earth Science Information PartnersGENI Global Environment for Network InnovationsGSFC Goddard Space Flight CenterISO International Organization for StandardizationITSC Information Technology & Systems CenterIU Indiana UniversityMSFC Marshall Space Flight CenterNASA National Aeronautics and Space AdministrationNSIDC National Snow and Ice Data CenterNSF National Science FoundationOPM Open Provenance ModelSCF Science Computing FacilitySIPS Science Investigator-led Processing SystemTRL Technology Readiness LevelUAHuntsville University of Alabama in HuntsvilleURN Uniform Resource Name

1 May 2012