Upload
maj
View
34
Download
0
Tags:
Embed Size (px)
DESCRIPTION
UAHuntsville The University of Alabama in Huntsville. Instant Karma: Applying a Proven Provenance Tool to NASA’s AMSR-E Data Production Stream ACCESS-09-0006. Annual Review 1 May 2012. Instant Karma: Applying a Proven Provenance Tool to NASA’s AMSR-E Data Production Stream. - PowerPoint PPT Presentation
Citation preview
Instant Karma: Applying a Proven Provenance Tool
to NASA’s AMSR-E Data Production StreamACCESS-09-0006
Annual Review1 May 2012
UAHuntsvilleThe University of Alabama in Huntsville
Objectives
Approach
Co-Is/Partners
Key Milestones
Advancing Collaborative Connections for Earth System ScienceACCESS
Instant Karma: Applying a Proven Provenance Tool to NASA’s AMSR-E Data Production Stream
PI: Michael Goodman, NASA / MSFC
TRLin= 7 TRLcurrent= 7.9
16 April 2012
• Improve the collection, preservation, utility and dissemination of provenance information within the NASA Earth Science community
• Customize and integrate Karma, a proven provenance tool into NASA data production
• Collect and disseminate provenance of AMSR-E (Advanced Microwave Scanning Radiometer – Earth Observing System) standard data products, initially focusing on Sea Ice
• Engage the Sea Ice science team and user community• Adhere to the Open Provenance Model (OPM)
Apply Karma to Sea Ice data production workflows Customize Karma’s provenance dissemination user
interface Evaluate usefulness of provenance collected
- Measure traffic to Karma Provenance Browser- Collect user feedback
Expand use of Karma to other AMSR-E data production streams
Evaluate current AMSR-E SIPS product generation
06/10 Extend Karma provenance collection tools for SIPS
09/10 Enhance Karma Provenance Browser interface
10/10 Instrument AMSR-E Sea Ice production in Testbed
12/10 Evaluate with Sea Ice science team
03/11 Introduce Provenance Browser to NSIDC DAAC
06/11 Instrument AMSR-E Sea Ice production in Ops
09/11 Evaluate with AMSR-E Sea Ice user community
02/12 Instrument other AMSR-E data streams
02/12
Thorsten Markus, NASA GSFC; Beth Plale, Indiana University; Helen Conover, Rahul Ramachandran, UAHuntsville
Objectives
Accomplishments
Co-Is/Partners
Advancing Collaborative Connections for Earth System ScienceACCESS
Instant Karma: Applying a Proven Provenance Tool to NASA’s AMSR-E Data Production Stream
PI: Michael Goodman, NASA / MSFC
TRLin= 7 TRLout= 7.9
16 April 2012
• Improve the collection, preservation, utility and dissemination of provenance information within the NASA Earth Science community
• Customize and integrate Karma, a proven provenance tool into NASA data production
• Collect and disseminate provenance of AMSR-E (Advanced Microwave Scanning Radiometer – Earth Observing System) standard data products, initially focusing on Sea Ice
• Engage the Sea Ice science team and user community• Adhere to the Open Provenance Model (OPM)
Thorsten Markus, NASA GSFC; Beth Plale, Indiana University; Helen Conover, Rahul Ramachandran, UAHuntsville
• Applied current provenance research results to a legacy NASA science processing system• Instrumented all processing streams for all AMSR-E standard products, using a software library developed at
UAH and provenance logging specifications provided by IU• Captured high level science relevant provenance and context information in collaboration with the Science
Team• Developed a provenance browser customized for AMSR-E science users
• Engaged user community via presentations to AMSR-E Science Team, beta-testing with AMSR-E Sea Ice Team, and outreach to Sea Ice community at GSFC
• Developed plan to transition provenance collection and display tools into production at the AMSR-E SIPS• Moving toward ISO rather than OPM, in line with NASA’s current metadata approach• Enhanced Karma visualization based on Sea-Ice rolled into visualization tools for other Karma users
Provenance Repository
Ocean
Weekly Script
OceanRainSnow
Monthly Script
Snow
Pentad Script
OceanLandSnowSea Ice
Daily Script
L2A Brightness
Temps
Pass Script
L2B OceanL2B LandL2B Rain
Query API
ProcessingTestbed
Provenance BrowserProvenance
Collection
Presentation Outline
• Introduction• Provenance for AMSR-E SIPS• Advanced Uses of Provenance• ESDSWG / ESIP / Outreach• Accomplishments / Lessons Learned• Contact Info / Acronyms • Back-up Slides
1 May 2012
Instant Karma Overview
1 May 2012
Data Center Operations
Provenance Research
Earth Science
Instant Karma Project
• Collaboration among– AMSR-E SIPS (MSFC Earth Science Office
and UAHuntsville ITSC) – Indiana University, Data to Insight Center– AMSR-E Sea Ice science team (GSFC)
• Primary goal is to improve the collection, preservation, utility and dissemination of provenance and context information within the NASA Earth Science community, building on– Karma provenance tools– AMSR-E standard product generation, with initial focus on Sea Ice
Team Leads
1 May 2012
Name Organization RoleMichael Goodman NASA MSFC Principal InvestigatorHelen ConoverRahul Ramachandran
UAHuntsville Information Technology & Systems Center
Co-Investigator – Project ManagerCo-Investigator – System Architect
Beth Plale
Indiana University
Data to Insight Center
Co-Investigator – Karma System
Thorsten Markus NASA GSFC Co-Investigator – Science TeamDawn Conway UAHuntsville
Earth System Science Center
AMSR-E Science Metadata Consultant
Team Membersover the two-year project
1 May 2012
Name Organization RoleBruce Beaumont Anurag Bhaskar Chocka ChidambaramAjinkya KulkarniMichael McEniryKathryn RegnerCara Stein
UAHuntsville Information Technology & Systems Center
AMSR-E SIPS Implementation Graduate Student, Metadata UtilitiesProvenance Browser Provenance Browser, Metadata UtilitiesSystems Administration, ISO metadata definitionAMSR-E SIPS Systems EngineerProvenance Browser
Mehmet AktasScott JensenYuan LuoRobert PingPrajakta PurohitYiming Sun
Indiana University Data to Insight Center
Postdoc, Karma Database and ServicesPostdoc, Karma Database and Services Graduate Student, Karma System MonitoringProject ManagementGraduate Student, Provenance Collection ServicesGraduate Student, Karma Database and Services
Don CavalieriAlvaro Ivanoff
NASA GSFC Science Algorithm DeveloperScience Software Developer
Instant Karma Year 2 Annual Review ACCESS-09-0006
8
Project Approach
Focus on capturing information for contextual understanding and long term data preservation rather than full data reproducibility• SIPS processing flow is very controlled, with less need for detailed provenance
– Files in a given version of a dataset will differ only by list of input files, processing location and environment, processing date and time
• Provenance as centerpiece around which other information is aggregated
1 May 2012
Research Track: Karma Project
AMSR-E Sea Ice Team Provenance @ SIPS
Provenance researchCollection, storage, display methods
NASA data preservation spec Operational environment
Customized provenance database and browser
Science user
needsRequ
irem
ents
Research results
Operations Track: AMSR-E SIPS
Instant Karma Year 2 Annual Review ACCESS-09-0006
9
Science-Relevant Provenance and Context Information
• Data lineage (data inputs, software and hardware) plus additional contextual knowledge about science algorithms, instrument variations, etc.
• Lots of information already available, but scattered across multiple locations– Processing system configuration– Data collection and file level metadata– Processing history information – Quality assurance information – Software documentation (e.g., algorithm theoretical basis documents, release
notes)– Data documentation (e.g., guide documents, README files)
• Project goal was to collate and organize information from multiple sources, make available through the AMSR-E Provenance Browser
1 May 2012
AMSR-E Provenance Use Cases Browse provenance graphs : convey rich information about final data
granule details [Use case 1]– Spatial location, time of observation, algorithms employed, input data and
ancillary files– Provenance bundle to include pointers to relevant documentation
Answer “Something isn’t right” question [Use case 1 variant]- E.g., did not receive data for several days so snow melt mask may be
inaccurate. Compare two data granules [Use case 2]
- Query system to compare provenance graph structure and provenance details (e.g., versions of software, number and versions of input files)
General provenance graph for a given science process, e.g., Sea Ice processing [Use case 3]- Current algorithms and versions, nominal number and versions of input files,
pointers to relevant documentation Embed provenance information as annotations in data files [Use case 4]
- ISO “Lineage” model (evolving NASA ES conventions)
1 May 2012
Instant Karma Major Milestones
1 May 2012
TRL Description
1 Basic principles observed and reportedTransition from scientific research to applied research
2 Technology concept and/or application formulatedApplied research
3 Analytical and experimental critical function and/or characteristic proof-of concept. Proof of concept validation
4 Component/subsystem validation in laboratory environmentStandalone prototyping implementation and test
5 System/subsystem/component validation in relevant environmentThorough testing of prototyping in representative environment
6System/subsystem model or prototyping demonstration in a relevant end-to-end environment (ground or space)Prototyping implementations on full-scale realistic problems
7System prototyping demonstration in an operational environment (ground or space)System prototyping demonstration in operational environment
8Actual system completed and "mission qualified" through test and demonstration in an operational environment (ground or space)End of system development
9Actual system "mission proven" through successful mission operations (ground or space)Fully integrated with operational hardware/software systems
Technology/Software Readiness Levels
1 May 2012 Instant Karma Year 2 Annual Review ACCESS-09-0006
Target Deployments
Testbed
AMSR-E SIPS
SIPS / DAAC
deployed
planned
key
Presentation Outline
• Introduction• Provenance for AMSR-E SIPS• Advanced Uses of Provenance• ESDSWG / ESIP / Outreach• Accomplishments / Lessons Learned• Contact Info / Acronyms • Back-up Slides
1 May 2012
Instant Karma Year 2 Annual Review ACCESS-09-0006
14
AMSR-E Product Suite
1 May 2012
Product Description Science PI Sample ImageryAMSR-E/Aqua L2A Global Swath Spatially-Resampled Brightness
Temperatures (Tb)Wentz
AMSR-E/Aqua L2B Global Swath Ocean Products derived from Wentz Algorithm
L3 Ascending/Descending .25x.25 deg Daily Ocean GridsL3 Weekly and Monthly Ocean Grids
Wentz
AMSR-E/Aqua L2B Surface Soil Moisture, Ancillary Parms, & QC EASE-GridsDaily L3 Surface Soil Moisture, Interpretive Parms & QC EASE-Grids
Njoku
AMSR-E/Aqua L2B Global Swath Rain Rate/Type GSFC Profiling AlgorithmL3 Monthly Global Averaged Rainfall Accumulation over Land & Ocean
Kummerow, Ferraro, Wilheit
AMSR-E/Aqua Daily L3 6.25 km 89 GHz Brightness Temperature (Tb) Polar Grids
L3 Daily 12.5 km Tb, Sea Ice Concentration, & Snow Depth Polar GridsL3 Daily 25 km Tb, Sea Ice ConcentrationSea Ice Drift
Markus, Cavalieri, Comiso
AMSR-E/Aqua Daily L3 Global Snow Water Equivalent EASE-GridsL3 5-Day and Monthly Snow Water Equivalent grids
Kelly
Instant Karma Year 2 Annual Review ACCESS-09-0006
15
SIPS-GHRC Processing Architecture• Algorithm
Packages from the Science Team and Team Lead of the Science Computing Facility
• Processing automation controlled by SIPS scripts
– Pass processing is data driven
– Level-3 product generation is scheduled after nominal availability of input products
1 May 2012
Input Data and Metadata File(s)
Ancillary File(s)
Science DataMetadataProcessing HistoryQuality AssuranceBrowse imagery Custom subsets
Delivered Algorithm Package
Control Script
Provenance logging at this level
Provenance Repository
Query API
Provenance Information Architecture
1 May 2012
MP
Ocean
Weekly Script
OceanRainSnow
Monthly Script
Snow
Pentad Script
OceanLandSnowSea Ice
Daily Script
L2A Brightness
Temps
Pass Script
L2B OceanL2B LandL2B Rain
ProcessingTestbed
P PP
P
PM
MM
M
M
M
Granule Metadata
Provenance Logs
M
M
P
Product / AlgorithmMetadata Form
Provenance Browser
Instant Karma Year 2 Annual Review ACCESS-09-0006
17
Earth science Library for Processing History (ELPH)
• Perl Library reference implementation can be used to instrument script-driven processing
• Based on IU work in software instrumentation and Karma provenance log specification
• Logs “consumed”, “invoked” and “produced” events• Assigns unique URNs to all artifacts (e.g., data files or software processes), of
the form urn:host-id/environment:datetime:name– host-id indicates computer system on which processing is performed– environment indicates processing environment (e.g., dev, test or ops for AMSR-E)– datetime (yyyymmddhhmmss) indicates production date/time for science data
files, last modification date/time for other files, system date/time for processes or workflows
– name indicates name of file or process
1 May 2012
Instant Karma Year 2 Annual Review ACCESS-09-0006
18
Provenance and Context Metadata• Harvesting granule information from ECS metadata
– Also recording processing location associated with each data granule• Providing Context Summary information for algorithms and data products,
in consultation with AMSR-E Science Computing Facility and Sea Ice team– Algorithm versions and descriptions– Parameters and data fields in science products– Ancillary files used in processing– Flag values and explanations– Pointers to full documentation– Aligning with NASA Preservation and Context Specification
• Developing ISO Lineage compliant processing descriptions, in consultation Ted Habermann
– High-level description of product generation processes to be stored at NSIDC DAAC with collection-level metadata
– Granule specific information, which references high level description, to be embedded in data files
1 May 2012
Instant Karma Year 2 Annual Review ACCESS-09-0006
19
Context Summary Metadata Schema
1 May 2012
Basic Product Information
Delivered Algorithm Package
Documentation Links
Science Algorithm Information
Geophysical Parameters / Variables and Associated
Information Data fields
Ancillary filesFlag values
File Informatione.g., grid type
Instant Karma Year 2 Annual Review ACCESS-09-0006
20
AMSR-E Provenance Browser
1 May 2012
Browse by Product Type
Browse by Product Images
Search by File Name
Explore Provenance
Instant Karma Year 2 Annual Review ACCESS-09-0006
21
AMSR-E Provenance Browser
1 May 2012
ProcessingGraph
AlgorithmInformation
GranuleMetadata
Parameters and Algorithms
Instant Karma Year 2 Annual Review ACCESS-09-0006
22
AMSR-E Provenance Browser
1 May 2012
Algorithm Used
Parameter Selected
Instant Karma Year 2 Annual Review ACCESS-09-0006
23
AMSR-E Provenance BrowserModes of Exploration
• Browse list by product type, file name– Mid-winter Sea Ice data was incorrect after reprocessing. Look at January data
to determine what was different about this processing run. • Browse images by product type, parameter/variable
– Scan series of images to look for possible problems, e.g., incomplete data for 2011-10-04
• Search by file name– Question about a particular data file that doesn’t look right. Search by file
name; may discover that this was a problem file later replaced, and get new file.• View general processing information
– For information about how a files from a particular product are generated, view general processing graph and get pointers to more detailed documentation.
1 May 2012
Presentation Outline
• Introduction• Provenance for AMSR-E SIPS• Advanced Uses of Provenance• ESDSWG / ESIP / Outreach• Accomplishments / Lessons Learned• Contact Info / Acronyms • Back-up Slides
1 May 2012
Advanced Uses of Provenance
I. Provenance over months and yearsII. Forward provenanceIII. Comparing provenance graphsIV. Provenance quality analysis
I. Month-view of Product ActivityDaily processing for Sea Ice products over 1 month is linked graph of provenance graphs, one per day.
View shows links between provenance graphs. E.g., ice mask updated for each day, feeds into the next day.
This view can be used to detect problems in processing over extended time periods.
Zoom in to Review Dependencies
Zoom to review complex dependencies between runs responsible for producing data products
Shows dependencies for Sea Ice Drift product, built from five daily Sea Ice products
Can see here that input data for fifth day is incomplete (instrument offline)
Zoom in and out to see entire provenance history or specific details
II. Forward Provenance
Standard provenance view is historical; it shows actions that went into creating a data product
Shown here is 12km SeaIce product
Clicking Data Forward option switches to displaying later-in-time (forward) provenance dependencies
Forward Provenance Graph12KM Sea Ice Product
Forward provenance can be helpful if problem is discovered in data product. The progeny of bad data product can be tracked down visually by following forward provenance chain.
High level view provides quick snapshot of downstream dependencies on data product
Useful to quickly see downstream consequences of reprocessing
III. Comparing Two Provenance Graphs
Compare two workflows side-by-side with node-to-node mapping that uses Direct Classification of node Attendance (DCA)*
* DePiero, F. and Trivedi, M. and Serbin, S., Graph matching using a direct classification of node attendance , Pattern Recognition, (29)6, pp 1031—1048.
Provenance Graph Comparison
Matched sub-graph: Red nodes are matched to corresponding nodes in comparison graph
Left hand graph has 19 data files that cannot be paired with graph on right
One can see from zooming in that additional nodes are input files. This abnormal case occurred because instrument went off-line so data was not available
Comparison of Metadata
Clicking on data product or process in one graph (e.g., on left), automatically highlights corresponding product or process in comparison graph
Displaying metadata for data product or process in one graph automatically displays metadata for corresponding node in comparison graph
IV. Analysis of Provenance Quality
• MethodologyDuplicate detection
• Annotations are mined for duplicates and conflicts.
Structural analysis• Structures of provenance graphs that are of the same types are
compared and mined for differences or similarities.
Anomaly detection• Detection of outliers in a provenance graph
Validation of attributes• Fields such as time ranges are validated to ensure that time ranges are
valid.
• Applied to 1 month of data processing
Analysis of Quality of Provenance
Structural analysis of 1 month of provenance detected an incorrect timestamp range:– Process_6250<-Process_6251
Time range is reversed: noEarlierThan: 2011-10-13T00:44:32.0000-04:00noLaterThan: 2011-10-13T00:44:31.0000-04:00
This is an artifact of data produced and consumed entries in log file being reversed, which results in begin and end time of time range being swapped
Plot of number of annotations for each node and edge in monthly ocean workflow
used edges Annotations Annotations for wasDerivedFrom edges Data Object annotations
Anomaly detection- Exposes duplicate annotations occurring from duplicate entries in the log files.
Annotations; zoom in I
4 wasDerivedFrom edges that have fewer annotations than usual.
used edges Annotations Annotations for wasDerivedFrom edges Data Object annotations
Anomaly detection- Exposes edges that have less annotations than usual
Annotations; zoom in II
wasTriggeredBy edge has annotations with same uri (keys), but different values.
Anomaly detection- Exposes annotations that share the same key, but have different values. This may or
may not be a problem. But Karma is able to warn the user about this.
Note: Image is a close up of the highlighted part in previous slide.
Presentation Outline
• Introduction• Provenance for AMSR-E SIPS• Advanced Uses of Provenance• ESDSWG / ESIP / Outreach• Accomplishments / Lessons Learned• Contact Info / Acronyms • Back-up Slides
1 May 2012
ESDSWG and ESIP Activities• NASA Earth Science Data Systems Technology Infusion Working Group
– UAH team members participate in ESDSWG Tech Infusion, specifically the Data Stewardship Subgroup. Key subgroup activities:
• NASA Earth Science Data Preservation Content Specification• Preservation / Provenance Ontology
– Co-I Conover attended the ESDSWG meeting in October 2010 – Session on science-relevant provenance
• NASA Earth Science Data Systems Standards Process Group– Co-I Conover presented the provenance browser and metadata work at the ESDSWG
meeting in November 2011. This led to current work with ISO Lineage spec.• ESIP Federation Information Technology and Interoperability Committee and the
Preservation and Stewardship Cluster– Co-I Plale gave a presentation to IT&I in September 2010 on Instant Karma,
provenance collection, semantics, and interoperability. This group determined that the best place to engage in provenance efforts is through the ESIP Federation Preservation and Stewardship Cluster’s provenance working group.
– Co-I Conover attended the ESIP Federation meeting in July 2010 – Data Identifiers presentation of special interest to this project
– Co-I Plale attended the January 2011 ESIP meeting – Sessions on stewardship, science-relevant provenance among others
– Co-I Conover attended the July 2012 ESIP meeting – Sessions on ISO metadata, NASA Earth Science Data Preservation Content Specification, provenance ontology, others
1 May 2012
Outreach to Community:Posters and Presentations
“Instant Karma: Collecting Provenance for AMSR-E,” Beth Plale and Helen Conover. Joint AMSR-E Science Team Meeting, Huntsville AL, June 2, 2010.
“Instant Karma: Provenance Collection at the AMSR-E SIPS,” Helen Conover, Beth Plale, Sunil Movva, Prajakta Purohit, Kathryn Regner, Yiming Sun. Poster presented at the 9th Earth Science Data Systems Working Group Meeting, New Orleans, October 20-22, 2010.
“Instant Karma: Provenance Collection at the AMSR-E SIPS,” Helen Conover, Beth Plale, Sunil Movva, Prajakta Purohit, Kathryn Regner, Yiming Sun. Poster presented at the International Symposium on the A-Train Satellite Constellation 2010, New Orleans, October 25-28, 2010.
"Karma Provenance: Why and How? Provenance collection of unmanaged workflows”, Mehmet Aktas, Presentation at the Super Computing 2010 Conference, New Orleans, LA, November 14-20, 2010.
"Karma Provenance: Why and How? Provenance collection of unmanaged workflows”, M. Aktas, Presentation at the CS Department of University of West Florida, Pensacola, FL, November 20, 2010.
“Metadata and Provenance Collection and Representation: Antecedent to Scientific Data Preservation,” B. Plale, Open Data Seminar, University of Michigan, November 2010.
“Applying the Karma Provenance Tool to NASA’s AMSR-E Data Production Stream,” R. Ramachandran, H. Conover, K. Regner, S. Movva, Michael Goodman, B. Plale, P. Purohit, Y. Sun. American Geophysical Union Fall Meeting, December 13–17, 2010.
1 May 2012
Outreach to Community:Posters and Presentations
“Data Provenance for Preservation of Digital Geospatial Data,” B. Plale, B. Cao, C. Herath, and Y. Sun, Geological Society of America special volume on Transforming Data To Knowledge for Geosciences, v. 482, p. 125-137, 2011.
“Instant Karma: Accessing Provenance Information for AMSR-E Science Data Products”, H. Conover, K. Regner. Presented at AMSR-E Science Team Meeting, Asheville, NC, June 2011.
“Instrumenting Earth Science Applications for OPM-Driven Provenance,” Mehmet Aktas, Beth Plale, Helen Conover, Prajakta Purohit. White paper distributed at ESIP Federation Meeting, Santa Fe, July 2011.
“Instant Karma Status Update: Provenance at the AMSR-E SIPS,” Conover, Plale, Aktas, B. Beaumont, D. Conway, S. Graves, S. Jensen, H. Joshi, A. Kulkarni, Y. Luo, R. Ping, P. Purohit, R. Ramachandran, K. Regner, C. Stein. Poster presented by Conover at ESIP Federation Meeting, Santa Fe, July 2011.
“Provenance Collection and Display Tools for the AMSR-E SIPS,” H. Conover, B. Beaumont, A. Kulkarni, R. Ramachandran, K. Regner, S. Graves, D. Conway. Presentation and poster at NASA Earth Science Data Systems Working Groups (ESDSWG) Meeting, Newport News, VA, November 2011.
“Key Provenance of Earth Science Observational Data Products,” H. Conover, B. Plale, M. Aktas, R. Ramachandran, P. Purohit, S. Jensen, and S. Graves. Presented at American Geophysical Union Fall Meeting, session IN22 Issues in Scientific Data Preservation and Stewardship, December 2011.
1 May 2012
Instant Karma Year 2 Annual Review ACCESS-09-0006
42
Outreach to Community:Posters and Presentations
“Provenance Collection and Display for the AMSR-E SIPS,” H. Conover, B. Beaumont, A. Kulkarni, R. Ramachandran, K. Regner, S. Graves, D. Conway. Poster presented at ESIP Federation Meeting, Washington DC, January 2012.
“Metadata and Provenance Capture: Fins in a Sea of Data,” B. Plale, invited talk, Purdue University, March 2012
“Metadata and Provenance: Dimensions of Use in Science”, B. Plale, invited talk, Biodiversity Workshop, Pacific Rim Applications and Grid Middleware Assembly (PRAGMA) 22, Melbourne, AU March 2012
“Metadata and Provenance Capture and Use,” B. Plale, invited talk, IBM TJ Watson, April 2012
In Progress:-- Paper in work, targeting IEEE Transactions on Geoscience and Remote Sensing, special issue on
Geoscience Data Provenance.
1 May 2012
Presentation Outline
• Introduction• Provenance for AMSR-E SIPS• Advanced Uses of Provenance• ESDSWG / ESIP / Outreach• Accomplishments / Lessons Learned• Contact Info / Acronyms • Back-up Slides
1 May 2012
Instant Karma Year 2 Annual Review ACCESS-09-0006
44
• Event system support implemented for capturing provenance from legacy systems
• Tools for harvesting provenance from logs, specifically logs of AMSR-E reprocessed data
• Enhancements to provenance plug-in for Cytoscape, driven by AMSR-E data needs:– Graph comparison: of graph structure and metadata based– Provenance chain visualization: over long duration (months, years)– Data forward provenance: provenance into future of derived products across
science products
• Enhancements based on AMSR-E data products in Instant Karma rolled into Karma provenance system and made available to other Karma users including NSF-funded GENI project
1 May 2012
Summary of Accomplishments
Summary of Accomplishments• Applied current provenance research results to a legacy NASA
science processing system– Instrumented all processing streams for all AMSR-E standard products, using
a software library developed at UAH and provenance logging specifications provided by IU.
– Captured high level science relevant provenance and context information in collaboration with the AMSR-E Science Team
– Developed a provenance browser customized for AMSR-E science users• Engaged user community via
– Presentations to AMSR-E Science Team– Collaboration and beta-testing with AMSR-E Sea Ice Team– Outreach to Sea Ice community at GSFC
• Developed plan to transition provenance collection and display tools into production at the AMSR-E SIPS– Coordinating with NSIDC DAAC on delivery of provenance information
1 May 2012
Instant Karma Year 2 Annual Review ACCESS-09-0006
46
Transition Plans• Currently able to collect provenance information for all AMSR-
E standard products generated at SIPS-GHRC– Level-2B and Level-3 products– Implemented in Provenance Testbed, ready for test at the AMSR-E
SIPS• Plan to implement provenance collection in AMSR-E SIPS
before reprocessing to begin in summer 2012• Working with NSIDC DAAC on delivery of ISO Lineage
compliant provenance information with data products– High level processing description, to be stored with collection-level
metadata at NSIDC DAAC– Granule specific information, referencing high level description, to be
embedded in data files
• Provenance Browser to be available from AMSR-E SIPS / GHRC through mission close-out, then transitioned to NSIDC DAAC
1 May 2012
Instant Karma Year 2 Annual Review ACCESS-09-0006
47
Project ResultsProducts for potential reuse resulting from this research:• Provenance collection tools
– Software library with reference implementation in Perl– Schema for provenance logs– InstantKarma Adaptor to ingest provenance logs into Karma
• Conventions for provenance and context metadata– ISO Lineage for product generation processes
• High level description, to be stored at NSIDC DAAC– ISO Lineage for data granules
• Granule specific information, to be embedded in data files• References high level description
– Additional high-level data product and algorithm context information
• Provenance storage and display – Drupal profile with provenance database and two modules
• Provenance browser• Data product and algorithm context form
– Provenance-custom plugins to Cytoscape for graph comparison and provenance viewing over time
1 May 2012
Instant Karma Year 2 Annual Review ACCESS-09-0006
48
Lessons Learnedadapting cutting edge research to a legacy production system
• Expect to encounter similar issues in retrofitting legacy science processing systems for provenance capture, such as heterogeneous system components and lack of direct control over science software
• Efficient instrumenting of a science processing system requires joint effort by experts with the processing system and with provenance collection.
• Need to consider provenance collection at beginning of system design– Science algorithms should be self describing– Need a way to track intermediate files which may be overwritten each time
• Evolving standards and conventions make project planning more difficult (but allow project to contribute to this evolution).
– Provenance metadata requirements– Unique identifiers for data collections and files– NASA Earth science conventions for ISO metadata– NASA Earth science data preservation requirements
1 May 2012
Instant Karma Year 2 Annual Review ACCESS-09-0006
49
Lessons Learnedstrict reproducibility vs. full contextual information
• Contextual information– Need to identify what is important and how to present
it concisely– Presenting key details of the science algorithm may be
most important to scientists• Reproducibility
– Full reproducibility requires logging everything. Metadata may get bigger than science data.
– Need to filter out most valuable information from entire processing log.
1 May 2012
Instant Karma Year 2 Annual Review ACCESS-09-0006
50
Recommendations• Design provenance collection into entire data lifecycle, from Level 0
processing onward• Agree on standard protocols/schemas for provenance and related
information (ISO Lineage?)– Develop a semantic mapping between ISO Lineage, OPM and other
provenance schemas• Allow for instrumentation of science algorithms (not black box)• Possible follow-on work
– Monitor provenance browser usage to determine most important features– Develop provenance libraries for common science programming and scripting
languages (C, FORTRAN, Perl, Python, etc.) that incorporate provenance logging into common commands (e.g., read, invoke, write, error)
– Document provenance collection and dissemination methodology for broader use in NASA Earth science product generation systems
– Apply these tools and techniques to future missions (e.g., GPM)
1 May 2012
Presentation Outline
• Introduction• Provenance for AMSR-E SIPS• Advanced Uses of Provenance• ESDSWG / ESIP / Outreach• Accomplishments / Lessons Learned• Contact Info / Acronyms • Back-up Slides
1 May 2012
Instant Karma Project Contacts• AMSR-E Provenance Browser
– http://provenance.itsc.uah.edu/ • Information Technology & Systems Center
– http://www.itsc.uah.edu
• Contacts– Michael Goodman [email protected] – Helen Conover [email protected] – Rahul Ramachandran [email protected]– Beth Plale [email protected] – Thorsten Markus [email protected]
1 May 2012
Instant Karma Year 2 Annual Review ACCESS-09-0006
53
Karma Provenance Tool Information• Karma v3.2.1 software is available on SourceForge• Licensed under Apache 2.0• Visualization uses Cytoscape v2.8.1. Karma
provenance plugin to Cytoscape available: http://d2i.indiana.edu/provenance_karma
• Project documentation available: http://d2i.indiana.edu/provenance_karma
• IU InstantKarma Project web site: http://d2i.indiana.edu/provenance_instantkarma
1 May 2012
Instant Karma Year 2 Annual Review ACCESS-09-0006
54
AcronymsAMSR-E Advanced Microwave Scanning Radiometer for EOSAPI Application Programming InterfaceDAAC Distributed Active Archive CenterECS EOSDIS Core SystemELPH Earth science Library for Processing HistoryEOSEarth Observing SystemEOSDIS Earth Observing System Data and Information SystemESDSWG Earth Science Data Systems Working Group ESIP Earth Science Information PartnersGENI Global Environment for Network InnovationsGSFC Goddard Space Flight CenterISO International Organization for StandardizationITSC Information Technology & Systems CenterIU Indiana UniversityMSFC Marshall Space Flight CenterNASA National Aeronautics and Space AdministrationNSIDC National Snow and Ice Data CenterNSF National Science FoundationOPM Open Provenance ModelSCF Science Computing FacilitySIPS Science Investigator-led Processing SystemTRL Technology Readiness LevelUAHuntsville University of Alabama in HuntsvilleURN Uniform Resource Name
1 May 2012