Taverna in 2006, Industry Workshop, tmo@ebi.ac.uk, 8th March 2006

Taverna 1

3 years old; 1300 downloads of the latest release over two months.

Expanding community covering an increasing variety of domains

Originally funded as part of an EPSRC pilot project, research rather than production focus

A success but with limitations

Taverna 1.3.1 Workbench

Evolving challenges

Long running data intensive workflows

Manipulation of confidential or otherwise protected information

Use with classical grid systems

Interaction with users during workflows

Workflow authoring, service discovery and composition

Data comprehension, provenance and visualization

User Interaction Handling

The Interaction Service and its corresponding Taverna processor allow a workflow to call out to an expert human user

Used to embed the Artemis annotation editor within an otherwise automated genome annotation pipeline

Interaction Service Architecture

[Figure: architecture diagram. Labels recovered from the diagram: Taverna 1.3; Proxy; InteractionStore; Patterns; Submit, Status, Results, Upload, Download.]

DALEC – Linking Taverna and DAS

DALEC exposes a Taverna workflow as a Distributed Annotation System (DAS) annotation source.
– Design workflow in Taverna
– Deploy in DALEC
– Access through any DAS client (Spice, Ensembl web server etc.)

[Figure labels: Standard DAS Service; DALEC DAS Service.]

Taverna 2

Funded as part of OMII-UK

10 developers

Dedicated design, implementation, testing and support team

First new developers started three weeks ago, project manager arriving in April

[Figure: development and release process diagram. Labels recovered from the diagram: Ingest; Pioneers, Early adopters, Conservatives; myGrid Pre-release, myGrid Release, OMII-UK Release; Software Engineering; XP; OMII Software Engineering; Quality & Test; Evaluation; Prioritise & Plan; Production; Applications & Professional Services; myGrid Alliance; SourceForge community.]

Future Direction

Enhancements to the Workflow Core

Enhancements to user interface and experience

Expanded use of semantic web technologies

Engagement with new user communities: cheminformatics, humanities, social sciences etc.

Code remains open source and always will

Composite Workflow Models

Enhanced Dataflow Model

Modular dispatcher mechanism
– Dynamic service binding
– Recursive invocation
– Data filter implementation
– Retry, failover, back-off behaviours

Transparent third party data transfers

High throughput stream handling with implicit iteration semantics
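The retry, failover and back-off behaviours named in the dispatcher list can be pictured as layers wrapped around a single service invocation. The following is a minimal Python sketch under that assumption, not the Taverna 2 dispatcher API: `invoke_with_retry`, `invoke_with_failover` and `make_flaky` are hypothetical names introduced purely for illustration.

```python
import time
from typing import Callable, Sequence

# Hypothetical model: a "service" is any callable taking and returning a string.
Service = Callable[[str], str]


def invoke_with_retry(service: Service, data: str, max_attempts: int = 3,
                      initial_delay: float = 0.1, backoff: float = 2.0,
                      sleep: Callable[[float], None] = time.sleep) -> str:
    """Retry layer: re-invoke a failing service, waiting longer each time."""
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return service(data)
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts, let the enclosing layer decide
            sleep(delay)
            delay *= backoff  # back-off: wait longer before the next try
    raise ValueError("max_attempts must be at least 1")


def invoke_with_failover(services: Sequence[Service], data: str,
                         **retry_options) -> str:
    """Failover layer: move to the next equivalent service once retries fail."""
    last_error = None
    for service in services:
        try:
            return invoke_with_retry(service, data, **retry_options)
        except Exception as error:
            last_error = error
    raise RuntimeError("all candidate services failed") from last_error


def make_flaky(failures: int) -> Service:
    """Build a stand-in service that fails a fixed number of times, then works."""
    remaining = [failures]

    def service(data: str) -> str:
        if remaining[0] > 0:
            remaining[0] -= 1
            raise ConnectionError("temporarily unavailable")
        return data.upper()

    return service
```

Keeping each behaviour in its own function mirrors the idea of a modular dispatcher: a workflow author could combine or omit layers per operation without touching the service itself.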

Runtime Service Binding

Service definition consists of an abstract description

Resolved at workflow runtime to one or more concrete resources by a broker

Allows load balancing or economic model based service selection over grid environments
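To make the idea concrete, the sketch below resolves an abstract description to a concrete resource at call time. It is an illustration only: the `Broker` class, the `load` and `cost` fields, the two policies and the example endpoints are all assumptions, not part of any Taverna interface.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AbstractService:
    """What the workflow author writes: an operation, not a location."""
    operation: str


@dataclass(frozen=True)
class ConcreteService:
    """A hypothetical deployed resource known to the broker."""
    endpoint: str
    operation: str
    load: float   # assumed metric: current utilisation, 0.0 to 1.0
    cost: float   # assumed metric: price per invocation


class Broker:
    """Resolves abstract descriptions to concrete resources at runtime."""

    def __init__(self, registry: List[ConcreteService]):
        self.registry = registry

    def resolve(self, abstract: AbstractService) -> List[ConcreteService]:
        # One abstract description may match several concrete resources.
        return [s for s in self.registry if s.operation == abstract.operation]

    def bind(self, abstract: AbstractService, policy: str = "load") -> ConcreteService:
        candidates = self.resolve(abstract)
        if not candidates:
            raise LookupError(f"no resource offers {abstract.operation!r}")
        if policy == "load":   # load balancing: pick the least busy resource
            return min(candidates, key=lambda s: s.load)
        if policy == "cost":   # economic model: pick the cheapest resource
            return min(candidates, key=lambda s: s.cost)
        raise ValueError(f"unknown policy {policy!r}")


# Example registry with made-up endpoints.
REGISTRY = [
    ConcreteService("http://a.example.org/blast", "blast", load=0.2, cost=5.0),
    ConcreteService("http://b.example.org/blast", "blast", load=0.9, cost=1.0),
]
```

Because selection happens inside `bind`, the same workflow definition can be bound to different resources on each run.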

Recursive Invocation

Dispatcher allowing recursive invocation to be plugged into per operation semantics.

[Figure: invocation loop. Labels recovered from the diagram: Receive Input; Invoke operation; Gather Result Set; Test for completion; Modify Input Set; Return Result.]
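The loop suggested by the diagram labels can be sketched in a few lines of Python. The ordering of the steps is an assumption read off the labels, and `recursive_invoke`, the paged stand-in service and its `next` token are hypothetical names used only to illustrate the pattern.

```python
from typing import Any, Callable, List


def recursive_invoke(operation: Callable[[Any], Any],
                     input_set: Any,
                     is_complete: Callable[[Any], bool],
                     modify_input: Callable[[Any, Any], Any],
                     max_rounds: int = 100) -> List[Any]:
    """Call an operation repeatedly until its own output says it is finished."""
    results = []
    current = input_set                          # receive input
    for _ in range(max_rounds):
        output = operation(current)              # invoke operation
        results.append(output)                   # gather result set
        if is_complete(output):                  # test for completion
            return results                       # return result
        current = modify_input(current, output)  # modify input set
    raise RuntimeError("operation did not complete within max_rounds")


# Stand-in for a service that returns its hits one page at a time.
PAGES = {
    "p1": {"hits": ["hit1", "hit2"], "next": "p2"},
    "p2": {"hits": ["hit3"], "next": None},
}


def fetch_all_hits() -> List[str]:
    pages = recursive_invoke(
        operation=lambda request: PAGES[request["page"]],
        input_set={"page": "p1"},
        is_complete=lambda output: output["next"] is None,
        modify_input=lambda request, output: {"page": output["next"]},
    )
    return [hit for page in pages for hit in page["hits"]]
```

The workflow author sees a single operation that returns every hit; the paging is handled by the dispatch layer.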

Dynamic Dispatch Configuration

3rd Party Data Transfers

Allows ‘in place’ referencing of data
– Large data sets no longer round-trip between workflow engine and data provider
– Allows restricted access to sensitive data

Automatic de-reference when a reference type is linked to a value type within a workflow.
– Connecting a grid service to a web service

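[Figure: the logical workflow structure defined by the user (Service 1, Service 2, Service 3) shown above its deployment. Service 1 and Service 2 are hosted by Provider A, Service 3 by Provider B; the Workflow Enactor contains the Enactment Engine. The captions that follow step through one enactment.]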

The client pushes the workflow input data value to the workflow enactor, which stores the value in a local cache for future use.

Workflow enactor sends cached data value to Service 1.

Service 1 completes and stores its result value in a local data store, for example SRB, on the same host (Provider A). It returns a reference to that value to the workflow enactor.

The enactor examines the workflow and determines that Service 2 understands the reference it holds to the Service 1 result. It sends this reference to Service 2, which uses it to access the local data store directly.

Service 2 completes, stores its result in the local store and returns a reference to that data to the enactor.

The enactor examines Service 3. This service, located on another provider, cannot consume the reference returned from Service 2. The enactor forces a de-reference, requesting the value of that reference from Provider A and caching it.

As the enactor now has a value rather than a reference, it can invoke Service 3, which is fed data from the enactor's local cache, operates over that data and returns a result that is in turn cached by the enactor.

The workflow is complete; the enactor sends the final result back to the client.
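The walkthrough above can be condensed into a small simulation. This is a sketch under stated assumptions rather than the enactor's real design: the `Reference`, `Provider`, `Service` and `Enactor` classes are hypothetical, and a service is assumed to understand a reference only when the data lives in its own provider's store.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class Reference:
    """A pointer to data held in a provider's local store (for example SRB)."""
    provider: str
    key: str


class Provider:
    def __init__(self, name: str):
        self.name = name
        self.store: Dict[str, Any] = {}

    def put(self, value: Any) -> Reference:
        key = f"item{len(self.store)}"
        self.store[key] = value
        return Reference(self.name, key)

    def get(self, ref: Reference) -> Any:
        return self.store[ref.key]


class Service:
    def __init__(self, provider: Provider, function: Callable[[Any], Any],
                 returns_reference: bool):
        self.provider = provider
        self.function = function
        self.returns_reference = returns_reference

    def understands(self, data: Any) -> bool:
        # Assumption: only references into the service's own provider are usable.
        return isinstance(data, Reference) and data.provider == self.provider.name

    def invoke(self, data: Any) -> Any:
        value = self.provider.get(data) if isinstance(data, Reference) else data
        result = self.function(value)
        return self.provider.put(result) if self.returns_reference else result


class Enactor:
    def __init__(self, providers: Dict[str, Provider]):
        self.providers = providers
        self.dereferences = 0  # how often data had to travel through the enactor

    def _dereference(self, ref: Reference) -> Any:
        self.dereferences += 1
        return self.providers[ref.provider].get(ref)

    def run(self, services: List[Service], value: Any) -> Any:
        data = value  # input value pushed by the client and cached locally
        for service in services:
            if isinstance(data, Reference) and not service.understands(data):
                data = self._dereference(data)  # forced de-reference
            data = service.invoke(data)
        return self._dereference(data) if isinstance(data, Reference) else data


def run_example():
    a, b = Provider("A"), Provider("B")
    enactor = Enactor({"A": a, "B": b})
    workflow = [
        Service(a, str.upper, returns_reference=True),            # Service 1
        Service(a, lambda s: s[::-1], returns_reference=True),    # Service 2
        Service(b, len, returns_reference=False),                 # Service 3
    ]
    return enactor.run(workflow, "acgt"), enactor.dereferences
```

In this run the intermediate result of Service 1 never leaves Provider A; only the hand-over from Service 2 to Service 3 passes through the enactor.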

Streaming Data

Allow execution of downstream workflow stages on partially complete results from upstream.

[Figure: Service 1, Service 2 and Service 3 shown under two execution modes.]

Non streaming (Taverna 1): the entire iteration must complete at each stage.

Streamed data: Service 2 starts operating on partial results from Service 1.
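The difference between the two modes can be shown with Python generators. This is an illustrative sketch only: the two stand-in services and the `events` log are made up, and the single-threaded interleaving shown here demonstrates the ordering of work, not parallel execution.

```python
from typing import Iterable, Iterator, List


def service_1(items: Iterable[str], events: List[str]) -> Iterator[str]:
    """Upstream stage: emits each result as soon as it is ready."""
    for item in items:
        events.append(f"s1:{item}")
        yield item.upper()


def service_2(items: Iterable[str], events: List[str]) -> Iterator[str]:
    """Downstream stage: consumes whatever upstream has produced so far."""
    for item in items:
        events.append(f"s2:{item}")
        yield item[::-1]


def run_batch(items: List[str], events: List[str]) -> List[str]:
    # Non streaming: the whole iteration over Service 1 finishes first.
    stage_1 = list(service_1(items, events))
    return list(service_2(stage_1, events))


def run_streamed(items: List[str], events: List[str]) -> List[str]:
    # Streamed: Service 2 pulls each partial result as Service 1 yields it.
    return list(service_2(service_1(items, events), events))
```

Both functions return the same results; only the order of events differs. In the batch run every `s1` event precedes the first `s2` event, while in the streamed run the two alternate.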

New UI Development

Smart graph editing module

3D ‘virtual reality’ style enactment status display

Data playground: design workflows by example

Integrated semantic search

Knowledge driven visualization for result mining

KAVE Data and metadata management

Life Science Identifiers

Information Model

File management

Support for custom database building

Provenance metadata capture using RDF

SRB integration

OGSA-DAI integration

[Figure: RDF provenance graph linking data, invocations and concepts. Legend: data generated by services/workflows; concepts; services; literals; properties. Data nodes: urn:data1, urn:data2, urn:data:3, urn:data12, urn:data:f1, urn:data:f2, urn:hit1…, urn:hit2…., urn:hit5…, urn:hit8…., urn:hit10….., urn:hit50….. Invocation nodes: urn:compareinvocation3, urn:BlastNInvocation3, urn:invocation5. Concepts and types: Blast_report, SwissProt_seq, Sequence_hit, DatumCollection, LSDatum. Other labels: Find similar sequence, New sequence, Missed sequence. Properties: [input], [output], [type], [instanceOf], [hasHits], [hasName], [contains], [performsTask], [similar_sequence_to], [directlyDerivedFrom], [distantlyDerivedFrom].]
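A provenance graph of this kind is a set of subject, property, object triples, which makes derivation questions straightforward to answer. The sketch below uses plain Python tuples rather than an RDF library. The `urn:example:` identifiers and the way they are linked are invented for illustration and do not reproduce the edges of the figure; treating indirect derivation as the transitive closure of `directlyDerivedFrom` is likewise an assumption.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, property, object)

# Hypothetical provenance statements, not taken from the figure.
TRIPLES: List[Triple] = [
    ("urn:example:report", "instanceOf", "Blast_report"),
    ("urn:example:report", "directlyDerivedFrom", "urn:example:hits"),
    ("urn:example:hits", "directlyDerivedFrom", "urn:example:query"),
    ("urn:example:query", "instanceOf", "SwissProt_seq"),
]


def derived_from(triples: List[Triple], node: str,
                 prop: str = "directlyDerivedFrom") -> Set[str]:
    """Return everything `node` derives from, directly or through a chain."""
    found: Set[str] = set()
    frontier = [node]
    while frontier:
        current = frontier.pop()
        for subject, predicate, obj in triples:
            # The `found` check also guards against cycles in the graph.
            if subject == current and predicate == prop and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found
```

Asking for the ancestors of `urn:example:report` walks two derivation steps and returns both the intermediate hits and the original query.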

Computational Steering

Scientist designs, initiates and steers simulation from the Taverna Workbench.

Workflow definition sent to enactor.

Steering of simulations by manipulation of service state.

Process and data provenance captured and stored by metadata services.

[Figure labels: Scientists; Workflow Workbench; Steering Control; Enactor; Process 1, Process 2, Process 3; myGrid Metadata Stores.]

Service Types

Closer integration with grid systems, e.g. Condor, EGEE and others, and with their associated security and access control mechanisms.

R for numerical analysis (microarray informatics amongst others)

Continued improvements to SOAP, BioMoby, Biomart, Soaplab, SGS, Local scripting and other components

Obtaining Taverna

Taverna is available under the LGPL from our project site on Sourceforge.net
– http://taverna.sourceforge.net

Release 1.3.1 as of December 2005

Win32, Solaris / Linux & OS-X

Includes online and downloadable user manual, examples etc.

Support via project mailing lists

myGrid team & Early adopters

Core
Matthew Addis, Nedim Alpdemir, Tim Carver, Rich Cawley, Neil Davis, Alvaro Fernandes, Justin Ferris, Robert Gaizauskas, Kevin Glover, Carole Goble, Chris Greenhalgh, Mark Greenwood, Yikun Guo, Ananth Krishna, Peter Li, Phillip Lord, Darren Marvin, Simon Miles, Luc Moreau, Arijit Mukherjee, Tom Oinn, Juri Papay, Savas Parastatidis, Norman Paton, Terry Payne, Matthew Pocock, Milena Radenkovic, Stefan Rennick-Egglestone, Peter Rice, Martin Senger, Nick Sharman, Robert Stevens, Victor Tan, Anil Wipat, Paul Watson and Chris Wroe.

Users
Simon Pearce and Claire Jennings, Institute of Human Genetics, School of Clinical Medical Sciences, University of Newcastle, UK
Hannah Tipney, May Tassabehji, Andy Brass, St Mary’s Hospital, Manchester, UK

Postgraduates
Martin Szomszor, Duncan Hull, Jun Zhao, Pinar Alper, John Dickman, Keith Flanagan, Antoon Goderis, Tracy Craddock, Alastair Hampshire

Industrial
Dennis Quan, Sean Martin, Michael Niemi, Syd Chapman (IBM)
Robin McEntire (GSK)

Collaborators
Keith Decker