
Common Motifs in Scientific Workflows: An Empirical Analysis


DESCRIPTION

Slides for the e-Science 2012 presentation of the paper "Common Motifs in Scientific Workflows: An Empirical Analysis". The paper analyzes 177 workflows from the Taverna and Wings workflow systems across diverse domains. The analysis highlights the common motifs, or patterns, found in the templates, based on the functionality of each workflow step.


Page 1: Common Motifs in Scientific Workflows: An Empirical Analysis

Date: 10/11/2012

Common Motifs in Scientific Workflows: An Empirical Analysis

Daniel Garijo *, Pinar Alper ⱡ, Khalid Belhajjame ⱡ, Oscar Corcho *, Yolanda Gil Ŧ, Carole Goble ⱡ

* Universidad Politécnica de Madrid, ⱡ University of Manchester, Ŧ USC Information Sciences Institute

IEEE eScience 2012. Chicago, USA

Page 2: Common Motifs in Scientific Workflows: An Empirical Analysis


Overview

• Empirical analysis of 177 workflow templates from Taverna and Wings
• Catalog of recurring patterns: scientific workflow motifs
  • Data-oriented motifs
  • Workflow-oriented motifs
• Understandability and reuse


Page 3: Common Motifs in Scientific Workflows: An Empirical Analysis


Background

• Workflows as software artifacts that capture the scientific method
  • Addition to paper publication
  • Reuse
• Existing repositories of workflows (myExperiment)
  • Sharing workflows
  • Exploring existing workflows
• PROBLEMS to address:
  • Sometimes workflows are difficult to understand
  • Workflow descriptions depend on tools/files
  • Decay of workflows
  • Identify good practices for workflow design


http://www.myexperiment.org

Page 4: Common Motifs in Scientific Workflows: An Empirical Analysis


Approach

• Reverse-engineer the set of current practices in workflow development through an analysis of empirical evidence
• Identify workflow abstractions that would facilitate understandability and therefore effective reuse


Page 5: Common Motifs in Scientific Workflows: An Empirical Analysis


Taverna and Wings


http://www.taverna.org.uk/

http://www.wings-workflows.org/

Page 6: Common Motifs in Scientific Workflows: An Empirical Analysis


Workflow Motifs

• Workflow motif: a domain-independent conceptual abstraction of the workflow steps.
1. Data-oriented motifs (WHAT?): what kind of data manipulations does the workflow perform?
  • E.g.: data retrieval, data preparation, etc.
2. Workflow-oriented motifs (HOW?): how does the workflow perform its operations?
  • E.g.: stateful steps, stateless steps, human interactions, etc.

Page 7: Common Motifs in Scientific Workflows: An Empirical Analysis


Data-Oriented Motifs

• Data Retrieval
• Data Preparation
  • Format Transformation
  • Input Augmentation and Output Splitting
  • Data Organisation
• Data Analysis
• Data Curation/Cleaning
• Data Moving
• Data Visualisation


Page 13: Common Motifs in Scientific Workflows: An Empirical Analysis


Workflow-Oriented Motifs

• Intra-Workflow Motifs
  • Stateful (Asynchronous) Invocations
  • Stateless (Synchronous) Invocations
  • Internal Macros
  • Human Interactions
• Inter-Workflow Motifs
  • Atomic Workflows
  • Composite Workflows
  • Workflow Overloading
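
The catalog could be made machine-readable along the following lines. This is only an illustrative sketch: the motif names are taken from the data-oriented and workflow-oriented lists in these slides, while the Python class names, the `WorkflowStep` record, and the example step `fetch_records` are hypothetical and not part of the original study. Input augmentation and output splitting are listed as separate members here, which is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum


class DataMotif(Enum):
    """Data-oriented motifs: WHAT kind of data manipulation a step performs."""
    DATA_RETRIEVAL = "Data Retrieval"
    DATA_PREPARATION = "Data Preparation"
    FORMAT_TRANSFORMATION = "Format Transformation"
    INPUT_AUGMENTATION = "Input Augmentation"
    OUTPUT_SPLITTING = "Output Splitting"
    DATA_ORGANISATION = "Data Organisation"
    DATA_ANALYSIS = "Data Analysis"
    DATA_CURATION_CLEANING = "Data Curation/Cleaning"
    DATA_MOVING = "Data Moving"
    DATA_VISUALISATION = "Data Visualisation"


class WorkflowMotif(Enum):
    """Workflow-oriented motifs: HOW the workflow performs its operations."""
    STATEFUL_INVOCATION = "Stateful (Asynchronous) Invocations"
    STATELESS_INVOCATION = "Stateless (Synchronous) Invocations"
    INTERNAL_MACRO = "Internal Macros"
    HUMAN_INTERACTION = "Human Interactions"
    ATOMIC_WORKFLOW = "Atomic Workflows"
    COMPOSITE_WORKFLOW = "Composite Workflows"
    WORKFLOW_OVERLOADING = "Workflow Overloading"


# Grouping as shown on the slide: intra-workflow versus inter-workflow motifs.
INTRA_WORKFLOW = frozenset({
    WorkflowMotif.STATEFUL_INVOCATION,
    WorkflowMotif.STATELESS_INVOCATION,
    WorkflowMotif.INTERNAL_MACRO,
    WorkflowMotif.HUMAN_INTERACTION,
})
INTER_WORKFLOW = frozenset(WorkflowMotif) - INTRA_WORKFLOW


@dataclass
class WorkflowStep:
    """Hypothetical record for one annotated workflow step."""
    name: str
    data_motifs: set = field(default_factory=set)
    workflow_motifs: set = field(default_factory=set)


# Hypothetical example: a step that fetches records from a remote service.
step = WorkflowStep(
    name="fetch_records",
    data_motifs={DataMotif.DATA_RETRIEVAL},
    workflow_motifs={WorkflowMotif.STATELESS_INVOCATION},
)
```

Keeping the two motif families in separate enumerations mirrors the WHAT/HOW split of the catalog: a single step can carry one annotation of each kind.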


Page 18: Common Motifs in Scientific Workflows: An Empirical Analysis


Experiment setup


• 177 workflow templates
  • 111 from Taverna, sample from myExperiment
  • 66 from Wings, available in public server (now as Linked Data)
  • Diverse domains
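
As a quick check of the corpus composition, the short sketch below adds up the two per-system counts given above and derives each system's share. The variable names are illustrative only.

```python
# Per-system template counts as stated on the slide.
corpus = {"Taverna": 111, "Wings": 66}

total = sum(corpus.values())

# Percentage of the corpus contributed by each system, to one decimal place.
shares = {system: round(100 * n / total, 1) for system, n in corpus.items()}

print(total)   # 177
print(shares)  # {'Taverna': 62.7, 'Wings': 37.3}
```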

[Figure: bar chart of the number of workflows per domain for Taverna and Wings, on a scale from 0 to 40. Domains: Drug Discovery, Astronomy, Biodiversity, ChemInformatics, Genomics, GeoInformatics, IST600, TextAnalytics.]

Page 19: Common Motifs in Scientific Workflows: An Empirical Analysis


Result Summary: Data Oriented Motifs


• Over 60% of the motifs are data preparation motifs
  • Of the 4 subcategories, the most common across domains are output splitting, input augmentation, and reformatting steps
• Data retrieval is common in domains where curated databases exist
• Data analysis is often the main functionality of the workflow
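
A percentage such as the data preparation share could be derived from per-step annotations roughly as follows. This is a minimal sketch under stated assumptions: the slides do not say how occurrences were counted, and the annotation list below is hypothetical toy data, not the study's data, so the numbers it produces are unrelated to the reported results.

```python
from collections import Counter

# HYPOTHETICAL toy annotations: (workflow id, data-oriented motif of one step).
annotations = [
    ("wf1", "Data Retrieval"),
    ("wf1", "Data Preparation"),
    ("wf1", "Data Preparation"),
    ("wf1", "Data Analysis"),
    ("wf2", "Data Preparation"),
    ("wf2", "Data Visualisation"),
]


def motif_shares(pairs):
    """Return each motif's share (in percent) of all annotated occurrences."""
    counts = Counter(motif for _, motif in pairs)
    total = sum(counts.values())
    return {motif: 100 * n / total for motif, n in counts.items()}


shares = motif_shares(annotations)
for motif, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{motif}: {pct:.1f}%")
```

Counting occurrences per step is only one possible choice; counting each motif once per workflow would give different shares.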

Page 20: Common Motifs in Scientific Workflows: An Empirical Analysis


Result Summary: Workflow Oriented Motifs


• Around 40% composite workflows and internal macros
  • Workflow reuse is present even in some atomic workflows
• Human interaction steps are increasingly used in some domains

Page 21: Common Motifs in Scientific Workflows: An Empirical Analysis


Differences and commonalities of the workflow systems


• Data moving/retrieval, stateful interactions and human interaction steps are not present in Wings
  • Web services (Taverna) versus software components (Wings)
  • Wings has layered execution through Pegasus
• Data preparation steps are common in both systems
• Use of sub-workflows is high

Page 22: Common Motifs in Scientific Workflows: An Empirical Analysis


Discussion


Our observations:

• Obfuscation of scientific workflows
  • The abundance of data preparation steps makes the functionality of the workflow unclear.
• Decay of scientific workflows
  • Create an abstract description.
• Good practices for workflow design
  • Sub-workflows
  • Workflow overloading

[Figure labels: Method in paper; Workflow]

Page 23: Common Motifs in Scientific Workflows: An Empirical Analysis

Conclusions and future work

• Empirical analysis of scientific workflows
  • 177 workflows
  • 2 different systems
  • A variety of heterogeneous domains
• Workflow motif catalog
  • Data-oriented motifs
  • Workflow-oriented motifs
• Future work: automatic abstractions on workflows
  • Template analysis
  • Trace analysis (provenance)
  • Include other workflow systems

Page 24: Common Motifs in Scientific Workflows: An Empirical Analysis


Who are we?

• Pinar Alper, School of Computer Science, University of Manchester
• Khalid Belhajjame, School of Computer Science, University of Manchester
• Oscar Corcho, Ontology Engineering Group, UPM
• Yolanda Gil, Information Sciences Institute, USC
• Carole Goble, School of Computer Science, University of Manchester

EU Wf4Ever project (270129) funded under EU FP7 (ICT-2009.4.1). (http://www.wf4ever-project.org)
