Using Provenance to Improve Workflow Design

Frederico Tosta, Leonardo Murta, Claudia Werner, Marta Mattoso

{ftoliveira, murta, werner, marta}@cos.ufrj.br

COPPE – Federal University of Rio de Janeiro - Brazil


Summary

• Motivation
• Introduction & Background
• Goal
• Approach & Implementation
• Conclusion

COPPE/UFRJ


Motivation

Pieces of workflows that occurred in the past may occur again in the future.


Motivation

• The number of services and bioinformatics operations is growing: Taverna has over 3500 services (2007); VisTrails has over 1200 modules (2008).

[Figure: workflows and services]


Motivation

How can we find the pieces or services that are useful during the design of a new workflow in an automatic and systematic way?


Software Reuse

• Software reuse is the process of creating software systems from existing software [Krueger, 1992].

[Figure: benefits of software reuse: quality, reliability, reduced cost, productivity]


Recommendation Systems

• E-commerce: apply data mining techniques to the problem of helping users find the items they would like to purchase.

E-commerce concepts mapped into scientific experiment concepts:

Domain                  Concepts
E-commerce              Customer    Product*            Cart                     Preference
Scientific Experiment   Scientist   Component / Actor   Workflow (Goble, 2007)   Context

* what is recommended by e-commerce sites


Goal

• Propose a proactive recommendation service that aims at suggesting frequent combinations of scientific programs for reuse.


Approach

[Figure: Design for reuse and recommendation. Labels: Design, Workflow specification, Provenance, DB]


Approach

[Figure: Design with reuse and recommendation. Labels: Design, Proactive Recommendation, Workflow specification, Provenance, DB]


Implementation

• Populating the database:

VisTrails workflows:
- Parse provenance XML files to extract the relations.

MySQL database:
- The relations are mapped into a database.
- Each relation contains the modules and how they are connected.
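The two steps above (parse provenance XML, then map the relations into a database) can be sketched as follows. This is only an illustration under stated assumptions: the XML layout in SAMPLE_XML is a simplified, hypothetical schema and not necessarily the actual VisTrails file format; the table name relation and its columns are invented for this example; and sqlite3 stands in for MySQL so that the sketch runs without a server.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, simplified provenance document (assumed layout for illustration).
SAMPLE_XML = """
<workflow>
  <connection>
    <port type="source" moduleName="HmmBuild" name="StdOut"/>
    <port type="destination" moduleName="HmmCalibrate" name="HmmPath"/>
  </connection>
  <connection>
    <port type="source" moduleName="HmmBuild" name="DestinationDir"/>
    <port type="destination" moduleName="Cat" name="Dir"/>
  </connection>
</workflow>
"""


def extract_relations(xml_text):
    """Return (source, destination, source_port, dest_port) tuples."""
    root = ET.fromstring(xml_text)
    relations = []
    for conn in root.iter("connection"):
        # Index the ports of this connection by their role.
        ports = {p.get("type"): p for p in conn.findall("port")}
        src, dst = ports.get("source"), ports.get("destination")
        if src is None or dst is None:
            continue  # skip incomplete connections
        relations.append(
            (src.get("moduleName"), dst.get("moduleName"),
             src.get("name"), dst.get("name"))
        )
    return relations


def store_relations(db, relations):
    """Map the extracted relations into a (hypothetical) relation table."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS relation ("
        "source TEXT, destination TEXT, source_port TEXT, dest_port TEXT)"
    )
    db.executemany("INSERT INTO relation VALUES (?, ?, ?, ?)", relations)
    db.commit()


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")  # stand-in for the MySQL database
    store_relations(db, extract_relations(SAMPLE_XML))
    for row in db.execute("SELECT * FROM relation"):
        print(row)
```

Each stored row records one connection: the two modules involved and the ports through which they are connected.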


Implementation

VisTrails workflow design with recommendation

Source     Destination    Source Port      Dest Port
HmmBuild   HmmCalibrate   DestinationDir   SourceDir
HmmBuild   Cat            DestinationDir   Dir
HmmBuild   HmmCalibrate   DestinationDir   HmmPath
HmmBuild   HmmCalibrate   StdOut           HmmPath
HmmBuild   HmmCalibrate   StdOut           HmmPath

Ports 1 and 2 are the output ports DestinationDir and StdOut, respectively. Ports 3, 4 and 5 are the input ports SourceDir, HmmPath and Dir, respectively.

• Recommendation Metric: From the example, we can infer that port StdOut of HmmBuild has been connected to port HmmPath of HmmCalibrate in 40% of previously designed workflows.
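The 40% figure can be reproduced from the five example relations with a small sketch. It assumes the metric is the share of stored relations leaving a given source module that use a particular destination and port pair (2 of 5 here); the function name recommend and the tuple layout are hypothetical, chosen only for this illustration.

```python
from collections import Counter

# The five example relations: (source, destination, source_port, dest_port).
RELATIONS = [
    ("HmmBuild", "HmmCalibrate", "DestinationDir", "SourceDir"),
    ("HmmBuild", "Cat", "DestinationDir", "Dir"),
    ("HmmBuild", "HmmCalibrate", "DestinationDir", "HmmPath"),
    ("HmmBuild", "HmmCalibrate", "StdOut", "HmmPath"),
    ("HmmBuild", "HmmCalibrate", "StdOut", "HmmPath"),
]


def recommend(relations, source):
    """Rank the connections leaving `source` by relative frequency.

    Returns (destination, source_port, dest_port, frequency) tuples,
    most frequent first.
    """
    outgoing = [r for r in relations if r[0] == source]
    if not outgoing:
        return []
    total = len(outgoing)
    counts = Counter(outgoing)
    ranked = [
        (dest, sport, dport, n / total)
        for (_, dest, sport, dport), n in counts.items()
    ]
    ranked.sort(key=lambda item: item[3], reverse=True)
    return ranked


if __name__ == "__main__":
    print(recommend(RELATIONS, "HmmBuild")[0])
    # ('HmmCalibrate', 'StdOut', 'HmmPath', 0.4)
```

The top-ranked entry is the StdOut to HmmPath connection into HmmCalibrate, which is what a designer who has just placed HmmBuild would be offered first.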


Implementation

[Figure: VisTrails workflow design with recommendation]

Conclusion

• We expect that this approach may help to propagate the benefits of software reuse to the context of scientific workflows.

• Reduce the time to design workflows.

• Increase the quality of workflows designed.


Conclusion

• Limitations:
- The current version of our prototype recommends only the subsequent component, based on previously used connections.

• Future work:
- Improve the approach by recommending a component based on the whole path.
- Specify a context for each workflow.
- Apply a weight to each relation based on workflow usage.
