myGrid/Taverna Provenance Daniele Turi University of Manchester OMII f2f Meeting, London, 19-20/4/06


myGrid/Taverna Provenance

Daniele Turi, University of Manchester

OMII f2f Meeting, London, 19-20/4/06


Components

• Identifiers

– LSIDs

• Data

– JDBC data store

• Metadata

– RDF Provenance Plugin

• Browsing

– Provenance Browser Plugin

• Security

– Under development


LSID


LSID: Life Science Identifier

• URN specification in progress

• 5 part identifier (with optional version id)

– urn:lsid:www.mygrid.org.uk:lsdocument:X1234

– urn:lsid:ncbi.nlm.nlh.gov.lsid.biopathways.org:genbank_gi:7717376

• protocol for retrieving data and metadata about an object

• commitment by the provider to always return the same data for an ID
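
A minimal sketch of how the five-part structure (plus optional version id) might be taken apart. The part names `authority`, `namespace`, `object_id` and `revision` are labels chosen here for illustration; only the example identifier comes from the slide.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None  # the optional version id


def parse_lsid(text: str) -> LSID:
    """Split an identifier of the form urn:lsid:authority:namespace:object[:revision]."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 colon-separated parts, got {len(parts)}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError("an LSID must start with 'urn:lsid:'")
    if any(part == "" for part in parts[2:5]):
        raise ValueError("authority, namespace and object id must be non-empty")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)


lsid = parse_lsid("urn:lsid:www.mygrid.org.uk:lsdocument:X1234")
print(lsid.authority, lsid.namespace, lsid.object_id, lsid.revision)
```

The parser only checks shape; resolving the identifier to data or metadata is a separate step handled by the retrieval protocol.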


LSID (ctd)

• Issue

– LSID Authorities

• Resolution

– LSID Resolvers

• Examples

– myGrid

– Long Term Ecological Research Network

– BioPathways Consortium


LSID (ctd 2)

• abstraction

• lightweight

• independent from actual storage implementation

– database

– file system

– application

• both for private and public data sources


Data


Data Storage (current)

• Taverna can persist inputs, outputs and intermediate results in an SQL database via JDBC

• Persistence is optional and is enabled by configuring a Baclava Data Store

• Allows the LSIDs of data items to be resolved against the actual data
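
A rough sketch of the idea of resolving an LSID against stored data. It uses Python's built-in `sqlite3` as a stand-in for a JDBC-backed SQL database; the table name `data_item`, its columns, and the helper functions are assumptions made for illustration, not the Baclava Data Store schema.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the (hypothetical) data store and create its single table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS data_item ("
        "lsid TEXT PRIMARY KEY, payload BLOB NOT NULL)"
    )
    return conn


def store(conn: sqlite3.Connection, lsid: str, payload: bytes) -> None:
    # Plain INSERT: a second write for the same LSID raises IntegrityError,
    # so an identifier can never be silently re-pointed at different data.
    with conn:
        conn.execute(
            "INSERT INTO data_item (lsid, payload) VALUES (?, ?)", (lsid, payload)
        )


def resolve(conn: sqlite3.Connection, lsid: str) -> bytes:
    """Return the data stored under an LSID, or raise KeyError if unknown."""
    row = conn.execute(
        "SELECT payload FROM data_item WHERE lsid = ?", (lsid,)
    ).fetchone()
    if row is None:
        raise KeyError(lsid)
    return row[0]


conn = open_store()
store(conn, "urn:lsid:www.mygrid.org.uk:lsdocument:X1234", b"intermediate result")
print(resolve(conn, "urn:lsid:www.mygrid.org.uk:lsdocument:X1234"))
```

Making the LSID the primary key is one simple way to honour the commitment that an ID always returns the same data.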


Data Storage (future)

• Domain-specific databases

– use outside myGrid

• Develop:

– Taverna processor for JDBC/OGSA-DAI

– associated interface (cf. BioMart)

• Users will be able to study the contents of an existing database and:

– write queries that extract data from the database, where the query may be parameterised with values passed in from the workflow;

– write requests that insert data from the workflow into a named table in the database.
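
The two planned operations can be sketched as follows, again with `sqlite3` standing in for JDBC/OGSA-DAI. The table `hit`, its columns, and the function names are hypothetical; the planned Taverna processor may look quite different.

```python
import re
import sqlite3
from typing import Any, List, Mapping, Sequence, Tuple

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def run_query(conn: sqlite3.Connection, sql: str,
              params: Mapping[str, Any]) -> List[Tuple]:
    """Run a query whose named placeholders are filled from workflow values."""
    return conn.execute(sql, params).fetchall()


def insert_rows(conn: sqlite3.Connection, table: str,
                columns: Sequence[str], rows: Sequence[Tuple]) -> None:
    """Insert workflow data into a named table."""
    # Table and column names cannot be bound as parameters, so they are
    # checked against a strict identifier pattern before being interpolated.
    for name in (table, *columns):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    with conn:
        conn.executemany(sql, rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hit (accession TEXT, score REAL)")
insert_rows(conn, "hit", ["accession", "score"], [("P1", 72.5), ("P2", 31.0)])
rows = run_query(
    conn,
    "SELECT accession FROM hit WHERE score >= :min_score ORDER BY accession",
    {"min_score": 50},
)
print(rows)
```

Binding values as parameters rather than pasting them into the SQL text keeps workflow-supplied values from altering the query itself.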


Metadata


Metadata Generation

• Taverna Provenance Plugin

• Listen to Taverna Events

– WorkflowEventListener

• Faithfully record them as ontological instance data

– RDF graphs (one for each Taverna run)
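
The listener pattern described here can be illustrated with a small Python analogue. Taverna's actual `WorkflowEventListener` is a Java interface whose methods are not shown on the slide; the classes `WorkflowEvent`, `ProvenanceRecorder` and `Enactor` below, and the identifiers in the example, are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, List, Set, Tuple

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class WorkflowEvent:
    """Hypothetical event: one fact observed during a workflow run."""
    run_id: str
    subject: str
    predicate: str
    obj: str


class ProvenanceRecorder:
    """Listener that records every event as a triple in the graph of its run."""

    def __init__(self) -> None:
        self.graphs: DefaultDict[str, Set[Triple]] = defaultdict(set)

    def on_event(self, event: WorkflowEvent) -> None:
        self.graphs[event.run_id].add((event.subject, event.predicate, event.obj))


class Enactor:
    """Hypothetical event source standing in for the workflow engine."""

    def __init__(self) -> None:
        self._listeners: List[ProvenanceRecorder] = []

    def add_listener(self, listener: ProvenanceRecorder) -> None:
        self._listeners.append(listener)

    def fire(self, event: WorkflowEvent) -> None:
        for listener in self._listeners:
            listener.on_event(event)


recorder = ProvenanceRecorder()
enactor = Enactor()
enactor.add_listener(recorder)
enactor.fire(WorkflowEvent("run-1", "run-1", "executed", "process-a"))
enactor.fire(WorkflowEvent("run-2", "run-2", "executed", "process-b"))
print(len(recorder.graphs))  # two runs, so two graphs
```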


Metadata

• Representation

• Ontology (Schema)

• Storage

• Query

• Browsing


Representation

• RDF

– triples

• subject – predicate – object

– URIs (hence easy data integration)

– semantic web language

– XML serialization

– flexible, powerful

– sets of triples give rise to graphs


Workflow Run

[Figure: RDF graph of a workflow run. Nodes: urn:lsid:..:wfInstance:8, urn:lsid:…:workflow:6, urn:lsid:…:person:4, urn:lsid:…:org:HY7, urn:lsid:…:processRun:84, urn:lsid:…:processRun:51. Edge labels: runs, launchedBy, belongsTo, executed (twice).]
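
The workflow-run graph on this slide can be written down as plain triples. The node identifiers and edge labels are taken from the slide, with the elided authority parts left elided; which node each edge starts from and points to is not recoverable from the transcript, so the directions below are an assumption.

```python
# Identifiers as they appear on the slide; the authority part is elided there.
WF_RUN = "urn:lsid:..:wfInstance:8"
WORKFLOW = "urn:lsid:…:workflow:6"
PERSON = "urn:lsid:…:person:4"
ORG = "urn:lsid:…:org:HY7"
PROC_84 = "urn:lsid:…:processRun:84"
PROC_51 = "urn:lsid:…:processRun:51"

# Assumed edge directions: the run points at its workflow, launcher and
# process runs; the person points at the organization.
triples = {
    (WF_RUN, "runs", WORKFLOW),
    (WF_RUN, "launchedBy", PERSON),
    (PERSON, "belongsTo", ORG),
    (WF_RUN, "executed", PROC_84),
    (WF_RUN, "executed", PROC_51),
}


def objects(graph, subject, predicate):
    """Return, sorted, every object reachable from subject via predicate."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


print(objects(triples, WF_RUN, "executed"))
```

Because the graph is just a set of triples, adding a new fact is a set insertion and needs no schema change.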


Schema

• Ontology

– RDF schema

• Taxonomic inferences

– also available as OWL

• opens it up to complex reasoning
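
A small sketch of what a taxonomic inference amounts to: if a resource is typed with a class, it is also an instance of every superclass. `Experimenter` is a class from the provenance ontology; the superclasses `Person` and `Agent` are hypothetical and only serve the example.

```python
from typing import Dict, Set


def superclasses(cls: str, subclass_of: Dict[str, Set[str]]) -> Set[str]:
    """Return all direct and indirect superclasses of cls."""
    seen: Set[str] = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def inferred_types(asserted: Dict[str, Set[str]],
                   subclass_of: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Extend each resource's asserted classes with their superclasses."""
    result: Dict[str, Set[str]] = {}
    for resource, classes in asserted.items():
        closure = set(classes)
        for cls in classes:
            closure |= superclasses(cls, subclass_of)
        result[resource] = closure
    return result


# Hypothetical hierarchy: Experimenter is a kind of Person, Person a kind of Agent.
subclass_of = {"Experimenter": {"Person"}, "Person": {"Agent"}}
asserted = {"urn:lsid:…:person:4": {"Experimenter"}}
print(sorted(inferred_types(asserted, subclass_of)["urn:lsid:…:person:4"]))
```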


Typed Workflow Run

[Figure: the workflow-run graph with its nodes typed against the Provenance Ontology. Class labels: WorkflowRun, Workflow, ProcessRun, Experimenter, Organization. Instance nodes: urn:lsid:..:wfInstance:8, urn:lsid:…:workflow:6, urn:lsid:…:person:4, urn:lsid:…:org:HY7, urn:lsid:…:processRun:84, urn:lsid:…:processRun:51. Edge labels: runs, launchedBy, belongsTo, executed.]


Storage

• Named RDF graphs

– retrieve whole graphs (e.g. workflows)

– implementation in

• NG4J (Jena + MySQL)

– scalability issues

• Sesame2 native store

– scalable

– Java 5
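
The point of naming graphs is that a whole graph can be fetched by its name. A minimal in-memory sketch of that idea follows; it does not use the NG4J or Sesame APIs, and the class `NamedGraphStore` and the example names are invented for illustration.

```python
from collections import defaultdict
from typing import DefaultDict, FrozenSet, Iterator, List, Set, Tuple

Triple = Tuple[str, str, str]
Quad = Tuple[str, str, str, str]


class NamedGraphStore:
    """Triples grouped under a graph name, e.g. one graph per workflow run."""

    def __init__(self) -> None:
        self._graphs: DefaultDict[str, Set[Triple]] = defaultdict(set)

    def add(self, graph_name: str, s: str, p: str, o: str) -> None:
        self._graphs[graph_name].add((s, p, o))

    def graph(self, graph_name: str) -> FrozenSet[Triple]:
        """Retrieve a whole graph by name (empty if the name is unknown)."""
        return frozenset(self._graphs.get(graph_name, ()))

    def quads(self) -> Iterator[Quad]:
        """Iterate over all triples together with the name of their graph."""
        for name, triples in self._graphs.items():
            for s, p, o in triples:
                yield (s, p, o, name)

    def graphs_mentioning(self, resource: str) -> List[str]:
        """Names of all graphs in which resource occurs as subject or object."""
        return sorted({g for s, _, o, g in self.quads() if resource in (s, o)})


store = NamedGraphStore()
store.add("run-1", "run-1", "executed", "process-a")
store.add("run-2", "run-2", "executed", "process-a")
store.add("run-2", "run-2", "executed", "process-b")
print(len(store.graph("run-2")), store.graphs_mentioning("process-a"))
```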


Query

• RDF query languages

– TriQL, SeRQL, SPARQL

• query languages for named RDF graphs

• Ontology inspection/reasoning

• Canned Queries

– workflows with failed processes

– input/output of past process runs

– workflows with data changed by user
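
As an illustration of the first canned query, the sketch below finds runs that contain a failed process by scanning one graph per run. In the real system this would be phrased in one of the RDF query languages listed above; the predicate names `executed` and `status` and the value `failed` are assumptions, not terms confirmed by the ontology.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def runs_with_failed_processes(graphs: Dict[str, Set[Triple]]) -> List[str]:
    """Return the names of run graphs in which an executed process failed."""
    failed_runs = []
    for name, triples in graphs.items():
        executed = {o for _, p, o in triples if p == "executed"}
        failed = {s for s, p, o in triples if p == "status" and o == "failed"}
        if executed & failed:
            failed_runs.append(name)
    return sorted(failed_runs)


# Hypothetical provenance for two runs, one graph each.
graphs = {
    "run-1": {
        ("run-1", "executed", "process-a"),
        ("process-a", "status", "completed"),
    },
    "run-2": {
        ("run-2", "executed", "process-b"),
        ("process-b", "status", "failed"),
    },
}
print(runs_with_failed_processes(graphs))
```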


Browsing


Provenance Browsing

• Provenance Browser Plugin

– reusing Taverna GUI components

• Matthew Gamble


Analysis


Provenance Analysis

• Comparison

• Aggregation

• etc

– see work by Jun Zhao


Security


• User sends LSID ref and credentials to the Access Point

• Access Point returns data and metadata or denies access as follows:

– credentials are passed to a User Directory

– User Directory passes the corresponding user to the Authorization Authority

– Authorization Authority returns the user attributes in the form of a (possibly signed) SAML assertion

– this assertion, together with the LSID and its corresponding metadata, is passed to the Policy Enforcement Point (PEP)

– PEP uses these three inputs to form an XACML request that is passed to a Policy Decision Point (PDP) that is preloaded with an XACML Policy Set

– PDP evaluates the request against its policy set and returns an XACML response to PEP

– PEP decodes the response and either allows data/metadata to be returned to the user or denies access
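
The sequence of steps can be sketched as a chain of calls. Plain dictionaries stand in for the SAML assertion and the XACML request, and every component is a pluggable function. The class `AccessPoint`, its field names, and the handling of unknown credentials are assumptions made for illustration; the design was still under development.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

Attributes = Dict[str, str]


@dataclass
class AccessPoint:
    user_directory: Callable[[str], Optional[str]]         # credentials -> user
    authorization_authority: Callable[[str], Attributes]   # user -> attributes
    metadata_store: Callable[[str], Attributes]            # LSID -> metadata
    data_store: Callable[[str], bytes]                     # LSID -> data
    pdp: Callable[[dict], str]                             # request -> decision

    def fetch(self, lsid: str,
              credentials: str) -> Optional[Tuple[bytes, Attributes]]:
        user = self.user_directory(credentials)
        if user is None:
            return None  # assumption: unknown credentials are denied outright
        assertion = self.authorization_authority(user)  # stand-in for SAML
        metadata = self.metadata_store(lsid)
        # PEP: combine the three inputs into a request (stand-in for XACML).
        request = {
            "subject": assertion,
            "resource": {"lsid": lsid, "metadata": metadata},
        }
        # PEP: ask the PDP and enforce its decision.
        if self.pdp(request) != "Permit":
            return None
        return self.data_store(lsid), metadata


users = {"secret-token": "alice"}  # hypothetical credentials and user
access_point = AccessPoint(
    user_directory=users.get,
    authorization_authority=lambda user: {"user": user, "role": "student"},
    metadata_store=lambda lsid: {"owner": "alice"},
    data_store=lambda lsid: b"workflow bytes",
    pdp=lambda req: "Permit"
    if req["subject"]["user"] == req["resource"]["metadata"]["owner"]
    else "Deny",
)
LSID_REF = "urn:lsid:www.mygrid.org.uk:lsdocument:X1234"
print(access_point.fetch(LSID_REF, "secret-token"))
print(access_point.fetch(LSID_REF, "wrong-token"))
```

Keeping the decision (PDP) separate from its enforcement (PEP) means the policy set can change without touching the code that guards the data.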


myGrid XACML Policy

• Scenario

– supervisors can access all workflows in the organization

– students can access only their own workflows

– blacklisted users cannot access anything

• See policySet.xml on myGrid wiki
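
The three rules of the scenario can be expressed as a small decision function. This is not a rendering of policySet.xml; the attribute names (`blacklisted`, `role`, `organization`, `user`, `owner`) are hypothetical, and the choice to check the blacklist first is an assumption that follows from "cannot access anything".

```python
from typing import Mapping


def decide(subject: Mapping[str, object], resource: Mapping[str, object]) -> str:
    """Return "Permit" or "Deny" for a subject requesting a workflow."""
    # Rule 3 is checked first so that it overrides any permit below.
    if subject.get("blacklisted"):
        return "Deny"
    role = subject.get("role")
    # Rule 1: supervisors can access all workflows in their organization.
    if (role == "supervisor"
            and subject.get("organization") is not None
            and subject.get("organization") == resource.get("organization")):
        return "Permit"
    # Rule 2: students can access only their own workflows.
    if (role == "student"
            and subject.get("user") is not None
            and subject.get("user") == resource.get("owner")):
        return "Permit"
    # Anything not explicitly permitted is denied.
    return "Deny"


workflow = {"owner": "alice", "organization": "lab-1"}  # hypothetical values
print(decide({"user": "alice", "role": "student", "organization": "lab-1"}, workflow))
print(decide({"user": "bob", "role": "student", "organization": "lab-1"}, workflow))
```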