Application of Provenance for Automated and Research Driven Workflows Tara Gibson June 17, 2008

Application of Provenance for Automated and Research Driven Workflows

Tara Gibson

June 17, 2008

Motivation

Identify provenance models and architectures that will support a variety of real-world scientific research

Promote collaboration and interoperability

Methods

Review requirements identified by the community

Identify new requirements from our own use case studies that span a number of domains

Use case studies

Encountered two types of workflow

Automated (e.g., pipelines)

User-driven, research-oriented (e.g., digital libraries, data lineage)

Use case type comparison

Automated: Sequence of processes and data to accomplish a given task
User-driven: Enables collaboration by saving context and details

Automated: Driven by workflow engine
User-driven: Directed by researcher

Automated: Follows predefined pattern
User-driven: Ad-hoc, no set pattern

Automated: Pre-determined completion strategy
User-driven: Completion determined by researcher

Sensor Analysis

SOA-based runtime intrusion detection system to prevent attacks on sensitive systems

Large-scale data streaming (~30 TB per day)

Recording provenance for everything would quickly overwhelm the system, so record only significant events
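The idea of recording only significant events, together with what led to them, can be sketched as follows. This is a minimal illustration, not the system described in the slides: the class name `SignificantEventRecorder`, the numeric `threshold`, and the fixed-size context window are all assumptions introduced for the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SignificantEventRecorder:
    """Keeps a short rolling window of readings and stores provenance
    only when a reading is significant (hypothetical design)."""
    threshold: float          # assumed significance cutoff
    context_size: int = 3     # number of preceding readings kept as context
    records: list = field(default_factory=list)

    def __post_init__(self):
        # Bounded buffer: old readings fall off, so memory stays constant.
        self._recent = deque(maxlen=self.context_size)

    def observe(self, source: str, value: float) -> None:
        if value >= self.threshold:
            # Persist the event plus the readings that preceded it.
            self.records.append({
                "event": (source, value),
                "led_to_by": list(self._recent),
            })
        self._recent.append((source, value))


recorder = SignificantEventRecorder(threshold=0.9, context_size=2)
for source, score in [("s1", 0.1), ("s2", 0.2), ("s1", 0.95), ("s3", 0.3)]:
    recorder.observe(source, score)
```

Only the third reading is stored, along with the two readings before it; everything else is discarded once it leaves the window.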

Subsurface Modelling

Understand how contaminants react and move through environments by simulating experiments that would not be feasible otherwise

Research often follows many branches of investigation with complex relationships between simulations

[Figure: tree of simulation branches. Node labels: Homogeneous Flow, Heterogeneous Flow, Variable Flow, Alt Parameters, Alt Material Geometry, Alt Inclusion Geom]

Archive, Data Mining

Document data context and relationships to improve effectiveness of facility

Use of data extraction and harvesting to capture provenance and meta-data

Track relationships between experiments and computations

Allows for better collaboration and understanding

Requirements Summary

Record provenance about process, data, relationships

Group items together for comparison

Record arbitrary meta-data

Standards-based search capability

Examine process and data that led to result

Identify the overall impact on a workflow due to changes in process/data
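The last two requirements (tracing what led to a result, and finding what a change affects) map onto backward and forward traversals of a derivation graph. The sketch below is illustrative only; `ProvenanceGraph`, its method names, and the item names in the usage are hypothetical and do not come from the slides.

```python
from collections import defaultdict


class ProvenanceGraph:
    """Tiny in-memory derivation graph (hypothetical, for illustration)."""

    def __init__(self):
        self._inputs = defaultdict(set)   # item -> items it was derived from
        self._outputs = defaultdict(set)  # item -> items derived from it

    def record(self, derived, source):
        """Record that `derived` was produced using `source`."""
        self._inputs[derived].add(source)
        self._outputs[source].add(derived)

    def _walk(self, start, edges):
        # Iterative depth-first traversal; `seen` guards against revisits.
        seen, stack = set(), [start]
        while stack:
            for nxt in edges.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    def lineage(self, item):
        """Everything that led to `item` (backward traversal)."""
        return self._walk(item, self._inputs)

    def impact(self, item):
        """Everything affected if `item` changes (forward traversal)."""
        return self._walk(item, self._outputs)


g = ProvenanceGraph()
g.record("mesh", "geometry.dat")
g.record("run1", "mesh")
g.record("run1", "params_a")
g.record("run2", "mesh")
g.record("run2", "params_b")
g.record("plot1", "run1")
```

Here `g.lineage("plot1")` answers "which processes and data led to this result", while `g.impact("mesh")` lists everything that would need to be revisited if the mesh changed.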

Influences on Architecture

Requirement: Record only the provenance from significant events and the processes and data that led to the identification of the event
Influence: Transaction-based event recording

Requirement: Record provenance of high-throughput pipelines with minimal impact on performance
Influence: Asynchronous messaging (JMS)

Requirement: Extract and record customized file metadata for context searching
Influence: Metadata extraction/harvesting

Requirement: Query for derivation graph, filtering on level of detail
Influence: Create views based on level of detail
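To illustrate why asynchronous messaging keeps the impact on a high-throughput pipeline small, here is a minimal sketch that uses Python's standard `queue.Queue` and a background thread as a stand-in for a JMS broker. It is not JMS and not the architecture from the slides; `AsyncProvenanceRecorder` and the in-memory `store` list are assumptions made for the example.

```python
import queue
import threading


class AsyncProvenanceRecorder:
    """Decouples the pipeline from provenance storage via a queue
    (stand-in for a message broker; hypothetical design)."""

    def __init__(self):
        self._queue = queue.Queue()
        self.store = []  # placeholder for a real provenance store
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def record(self, event: dict) -> None:
        """Called by the pipeline; only enqueues, so it returns quickly."""
        self._queue.put(event)

    def _drain(self) -> None:
        # Background consumer: performs the (potentially slow) write.
        while True:
            event = self._queue.get()
            if event is None:  # sentinel: stop consuming
                break
            self.store.append(event)

    def close(self) -> None:
        """Flush remaining events and stop the worker."""
        self._queue.put(None)
        self._worker.join()


recorder = AsyncProvenanceRecorder()
for step in range(3):
    recorder.record({"process": "filter", "step": step})
recorder.close()
```

The pipeline thread only pays the cost of an enqueue; the write to the store happens on the consumer side, which is the property the asynchronous-messaging influence is after.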

Challenges

Multiple language bindings

Information overload

Scalability

Should scale to billions of triples

Augmentation – user annotation

Filtering

User/Application specific views
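One way to picture filtering and user-specific views as a response to information overload is to tag each provenance record with a level of detail and let each view pick a cutoff. The sketch below assumes this; the `ProvenanceRecord` type, the integer `detail` scale, and the sample records are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceRecord:
    subject: str
    description: str
    detail: int  # assumed scale: 0 = summary, higher = finer-grained


def view(records, max_detail):
    """Return only the records at or below the requested level of detail."""
    return [r for r in records if r.detail <= max_detail]


records = [
    ProvenanceRecord("run1", "simulation completed", 0),
    ProvenanceRecord("run1", "solver configured", 1),
    ProvenanceRecord("run1", "time step 42 written", 2),
]

summary_view = view(records, max_detail=0)   # e.g. for a collaborator
debug_view = view(records, max_detail=2)     # e.g. for the simulation author
```

The underlying records stay complete; only what each user or application sees is reduced.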

Questions...

Email: [email protected]