Building Scientific Workflows on the Grid- A Comparison between OpenMole and Taverna

Embed Size (px)

Citation preview

  • 8/19/2019 Building Scientific Workflows on the Grid- A Comparison between OpenMole and Taverna

    1/31

    Building Scientific Workflows on the Grid: A Comparison

    between OpenMole and Taerna

    Bojana Koteska Boro Jakimovski Anastas Mishev

    November 2014, Bucharest, Romania

  • 8/19/2019 Building Scientific Workflows on the Grid- A Comparison between OpenMole and Taverna

    2/31

    Oeriew

    ! Scientific workflows

    ! Characteristics of Scientific Workflow S"stems:

      OpenMole and Taverna

    ! Workflow #mplementation in OpenMole and Taerna

    ! Conclusion and future work

    Building Scientific Workflows on the Grid: A Comparison between OpenMole and Taerna

  • 8/19/2019 Building Scientific Workflows on the Grid- A Comparison between OpenMole and Taverna

    3/31

    Scientific workflows

    !  workflows needed for different scientific applications and

    scientific e$periments

    !designing% automating% controlling and managing thecomple$ flow of data and processes

    ! scientific communities & workflow s"stems to perform

    e$periments with a large set of data and computations

    ! business companies ' workflow s"stems to automate the

    manufacturing process and to aoid redundant tasks(

    Building Scientific Workflows on the Grid: A Comparison between OpenMole and Taerna

  • 8/19/2019 Building Scientific Workflows on the Grid- A Comparison between OpenMole and Taverna

    4/31

    Scientific workflows

    ! Workflow - automation of a process during which documents%

    information% or tasks are passed from one participant to another

    for action% according to a set of procedural rules(

    ! Workflow engine ' software serice or )engine) that proidesthe run time e$ecution enironment for a workflow instance(

    ! Workflow management system ' a s"stem that defines%

    creates% and manages the e$ecution of workflows through theuse of software running on one or more workflow engines(

    Building Scientific Workflows on the Grid: A Comparison between OpenMole and Taerna

  • 8/19/2019 Building Scientific Workflows on the Grid- A Comparison between OpenMole and Taverna

    5/31

    Grid

    ! coordinated use of man" heterogeneous and distributed

    resources

    !soling problems in man" computing and data intensiescientific applications *ph"sics% biolog" and chemistr"+

    ! large scalabilit"% transparent and consistent access to

    resources% distributed supercomputing support% high'throughput computing support% data'intensie computing

    support

    Building Scientific Workflows on the Grid: A Comparison between OpenMole and Taerna

  • 8/19/2019 Building Scientific Workflows on the Grid- A Comparison between OpenMole and Taverna

    6/31

  • 8/19/2019 Building Scientific Workflows on the Grid- A Comparison between OpenMole and Taverna

    7/31

    OpenMole

    ! parallel e$ecution enironments for naturall" parallel

    processes

    !adanced numerical e$periments on simulation models

    ! distribution of the workflows on multi'core machines%

    desktop'grids% clusters and grids

    ! scalable and it supports TB of data and millions of tasks(

    Building Scientific Workflows on the Grid: A Comparison between OpenMole and Taerna

  • 8/19/2019 Building Scientific Workflows on the Grid- A Comparison between OpenMole and Taverna

    8/31

    Mole components

    ! Tasks

    ! Transitions

    ! -rotot"pes

    ! Samplings! .nironments

    ! /ooks

    ! Sources

    ! #n order to be run% a mole must contain at least one task

    ! and a starting task *capsule+

    Building Scientific Workflows on the Grid: A Comparison between OpenMole and Taerna

  • 8/19/2019 Building Scientific Workflows on the Grid- A Comparison between OpenMole and Taverna

    9/31

    Mole components

    ! Prototype ' a ariable that operates in the workflow *must

    be a part of a task & input or output+!  name% t"pe *#nteger% String% etc(+% dimension * 0'scalar% 1'ector%

    2' matri$% etc(+(

    ! Task ' seeral t"pes of tasks: Groo"% .$ploration%

    S"stem.$ec% Mole Task% Agent based model% Sensitiit"%

    and Stat(! name% input and output slots *for receiing or sending protot"pe to

    another task+% enironment *to assign a computing resources to a

    gien task ' grid% cluster%((+

    Building Scientific Workflows on the Grid: A Comparison between OpenMole and Taerna

  • 8/19/2019 Building Scientific Workflows on the Grid- A Comparison between OpenMole and Taverna

    10/31

    Mole components ' Tasks

    ! Groovy task ' run some small pieces of script code or call code running on

    the 34M *3aa% Scala((+

    ! Exploration task ' enables to embed a sampling which will proide an arra"

    of combination of protot"pes at the runtime(

    ! SystemExec task ' e$ecute C% C55 and -"thon codes(

    !  Mole task ' runs another Mole( This enables to embed a Mole within a Task(

    ! Stat task ' calculates the aerage% the sum and the median of protot"pe with

    dimension one(

    !  Sensitivity task ' enables ariance'based method

    Building Scientific Workflows on the Grid: A Comparison between OpenMole and Taerna

  • 8/19/2019 Building Scientific Workflows on the Grid- A Comparison between OpenMole and Taverna

    11/31

    Mole components

    ! Sampling ' can be composed graphicall"%! man" kinds of samplings: complete% shuffle% 6ip% combine7

    ! and domains: range% multiple file% single file% uniform distribution%

    logarithmic range domain% etc(

    ! Hook ' a Mole listener( #t performs a particular action on the

    output(

    ! Source ' reads data from a CS4 file and maps it toprotot"pes included in the workflow

    Building Scientific Workflows on the Grid: A Comparison between OpenMole and Taerna

  • 8/19/2019 Building Scientific Workflows on the Grid- A Comparison between OpenMole and Taverna

    12/31

    Taerna

    ! aailable as a desktop client application *Taerna

    Workbench+% Taerna Serer and command line

    !

    integrated with two other m"Grid Tools: my experiment*workflow sharing enironment+ and Bioatalogue

    *catalogue of Web serices for 8ife Sciences+

    ! scalable scientific workflow s"stem which has access to

    local and remote resources and grid serices *more than

    900+(

    Building Scientific Workflows on the Grid: A Comparison between OpenMole and Taerna

  • 8/19/2019 Building Scientific Workflows on the Grid- A Comparison between OpenMole and Taverna

    13/31

    Taerna

    ! support for calling client libraries *in ;ub" and 3aa+ and

    tools and scripts on local or remote machines(

    !

    users define how data will flow between serices% but the"should not care how the serices will be inoked(

    ! workflow can be shared on m".$periment and searched

     and downloaded from other users(

    Building Scientific Workflows on the Grid: A Comparison between OpenMole and Taerna

    B ildi S i ifi W kfl h G id A C i b O M l d T

  • 8/19/2019 Building Scientific Workflows on the Grid- A Comparison between OpenMole and Taverna

    14/31

    Taerna components

    ! Services &! WS,8'st"le Web serices *path! Oauth serices

    ! component serices

    ! string constant *for setting a fi$ed'alue input for a serice+(

    Building Scientific Workflows on the Grid: A Comparison between OpenMole and Taerna

    B ildi S i tifi W kfl th G id A C i b t O M l d T

  • 8/19/2019 Building Scientific Workflows on the Grid- A Comparison between OpenMole and Taverna

    15/31

    Taerna components

    ! !ists an" iterations ' Serices in Taerna can return single

    alues or lists of alues or lists of lists(

    !  !oop ' usuall" used for inoking as"nchronous serices '

    web serice or similar where an" action should be repeated(

    ! ontrol link ' used to set dependencies between serices

    that do not directl" share data

    ! Merges ' combination of outputs from more serices into

    one single input(

    ! Parallel service invocation ' used when the same serice

    should be inoked more times in parallel(

    ! omponent ' a reusable unit of functionalit" which is

    defined b" a workflow(

    Building Scientific Workflows on the Grid: A Comparison between OpenMole and Taerna

    B ildi S i tifi W kfl th G id A C i b t O M l d T

  • 8/19/2019 Building Scientific Workflows on the Grid- A Comparison between OpenMole and Taverna

    16/31

    Comparison between OpenMole and Taerna

    ! similarities in terms of functionalit"

    ! differ slightl" in the wa" of their implementation

    ! Taverna ' addition of e$isting Web serices% the automatic

    conersion of the list depths in order to connect two

    serices

    !  #penMole & simpler definition of composite inputs *for e$(

    making combinations of the elements in arra"+

    Building Scientific Workflows on the Grid: A Comparison between OpenMole and Taerna

    B ildi S i tifi W kfl th G id A C i b t O M l d T

  • 8/19/2019 Building Scientific Workflows on the Grid- A Comparison between OpenMole and Taverna

    17/31

    Comparison between OpenMole and Taerna

    Building Scientific Workflows on the Grid: A Comparison between OpenMole and Taerna

    Building Scientific Workflows on the Grid: A Comparison between OpenMole and Taerna

  • 8/19/2019 Building Scientific Workflows on the Grid- A Comparison between OpenMole and Taverna

    18/31

    Comparison between OpenMole and Taerna

    Building Scientific Workflows on the Grid: A Comparison between OpenMole and Taerna

    Building Scientific Workflows on the Grid: A Comparison between OpenMole and Taerna

  • 8/19/2019 Building Scientific Workflows on the Grid- A Comparison between OpenMole and Taverna

    19/31

    The Workflow

    Building Scientific Workflows on the Grid: A Comparison between OpenMole and Taerna

    Building Scientific Workflows on the Grid: A Comparison between OpenMole and Taerna

  • 8/19/2019 Building Scientific Workflows on the Grid- A Comparison between OpenMole and Taverna

    20/31

    The workflow

    ! circle represents a file containing some data

    ! s?uares a1@ a2@ a9@(( a file actions' merging of files

    contents

    !  choose alwa"s combinations of 9 files

    !  two copies of the file list in the director"

    ! each file list is shuffled and the first files of each of the

    three file lists are chosen

    !)folderA) represents the director" where the combinationsare chosen

    ! result of an each action a1@ a2@ a9@(( a is a file saed in a

    new director" )folderBD

    Building Scientific Workflows on the Grid: A Comparison between OpenMole and Taerna

    Building Scientific Workflows on the Grid: A Comparison between OpenMole and Taerna

  • 8/19/2019 Building Scientific Workflows on the Grid- A Comparison between OpenMole and Taverna

    21/31

    The workflow

    ! actions a1@ a2@ a9@(( a are performed in parallel

    ! when three files are created in )folderB)% the action b is

    performed' merges the contents of the 9 files into a new file

    ! files in )folderB) are deleted

    ! b is repeated E9 F 21 times in each iteration and it is

    performed in parallel with the actions a1@ a2@ a9@(( a

    immediatel" after creating 9 new files in )folderBD

    ! The new file is moed to )folderAD

    ! workflow can be e$ecuted repeatedl"

    Building Scientific Workflows on the Grid: A Comparison between OpenMole and Taerna

    Building Scientific Workflows on the Grid: A Comparison between OpenMole and Taerna

  • 8/19/2019 Building Scientific Workflows on the Grid- A Comparison between OpenMole and Taverna

    22/31

    The Workflow #mplementation in OpenMole

    Building Scientific Workflows on the Grid: A Comparison between OpenMole and Taerna

    Building Scientific Workflows on the Grid: A Comparison between OpenMole and Taerna

  • 8/19/2019 Building Scientific Workflows on the Grid- A Comparison between OpenMole and Taverna

    23/31

    The Workflow #mplementation in OpenMole

    ! )defining i) capsule' groo" task% setting the integer protot"pe i to

    1% number of workflow iterations% output i 

    ! )combinations) capsule ' e$ploration task% defining the sampling! Three multiple file domains )in folderA) and three file protot"pes Dfile)%

    Dfile1) and Dfile 2)!  All three multiple domains hae the same director" path

    !  The samplings: )ip with string name)% )ip with string1 name) and )ip

    with string2 name) are used for making arra"s of file names in )folderA)(

    ! .ach arra" is shuffled *Shuffle sampling+ and first elements are taken

    *Take *+ Sampling+(! The four file names of each arra" are combined *Combine sampling+

    ! The combinations * $ $ + are made(

    ! H outputs from this task : the protot"pe i  and arra"s *fileI1J%

    file1I1J% file2I1J% stringI1J% string1I1J and string2I1J+(

    Building Scientific Workflows on the Grid: A Comparison between OpenMole and Taerna

    Building Scientific Workflows on the Grid: A Comparison between OpenMole and Taerna

  • 8/19/2019 Building Scientific Workflows on the Grid- A Comparison between OpenMole and Taverna

    24/31

    The Workflow #mplementation in OpenMole

    Building Scientific Workflows on the Grid: A Comparison between OpenMole and Taerna

    Building Scientific Workflows on the Grid: A Comparison between OpenMole and Taerna

  • 8/19/2019 Building Scientific Workflows on the Grid- A Comparison between OpenMole and Taverna

    25/31

    The Workflow #mplementation in OpenMole

    ! capsule )comb( 9 files) ' groo" task% includes a

  • 8/19/2019 Building Scientific Workflows on the Grid- A Comparison between OpenMole and Taverna

    26/31

    The Workflow #mplementation in OpenMole

    ! capsule )increase i) % groo" task used for increasing the

    integer protot"pe i(! transition between )comb( 9 firstD and this capsule is an aggregation

    ! e$ecuted once in each workflow iteration(

    ! capsule )comb( 9 first) ' groo" task% includes a

  • 8/19/2019 Building Scientific Workflows on the Grid- A Comparison between OpenMole and Taverna

    27/31

    The Workflow #mplementation in Taerna

    Building Scientific Workflows on the Grid: A Comparison between OpenMole and Taerna

    Building Scientific Workflows on the Grid: A Comparison between OpenMole and Taerna

  • 8/19/2019 Building Scientific Workflows on the Grid- A Comparison between OpenMole and Taverna

    28/31

    The Workflow #mplementation in Taerna

    ! )Workflow 1) ' nested workflow ' e$ecuted n times(! input n ' list of integer numbers(

    ! integer input port ni & increasing in each workflow iteration

    ! Serice ),irector"-athD' beanshell script ' the path of the

    director" )folderA) is specified as a string )director")(! 9 serices )8ist Kiles)% )8ist Kiles 1)% )8ist Kiles 2) ' identical%

    but used for making the same arra" of files in the input

    director"(

    ! .ach arra" is shuffled and first four elements are taken *3aa code+! random combinations% each of the three files

    ! Output of each serice ' list of four files(

    Building Scientific Workflows on the Grid: A Comparison between OpenMole and Taerna

    Building Scientific Workflows on the Grid: A Comparison between OpenMole and Taerna

  • 8/19/2019 Building Scientific Workflows on the Grid- A Comparison between OpenMole and Taverna

    29/31

    The Workflow #mplementation in Taerna

    ! Serice )concatenate and write in files b" 9 KilesD! #nputs with length 0 ' it takes one b" one element of each

    ! of the three lists of four files *combinations+

    !  Also a beanshell script and it merges the contents of each

    combination of three files & repeated times *a1@ a2@ a9@ 7 a+(

    ! Serice )combine first three) ' checking if three files e$ist

    and merging them in the director" )folderBD

    ! .$ecuted in parallel with the preious serice )concatenate and writein les b" 9 Kiles)(

    ! contents are merged into a new file

    g p p

    Building Scientific Workflows on the Grid: A Comparison between OpenMole and Taerna

  • 8/19/2019 Building Scientific Workflows on the Grid- A Comparison between OpenMole and Taverna

    30/31

    Conclusion and Kuture Work

    ! anal"6ed the most important characteristics of the two workflow

    s"stems

    ! ,ifferent implementation logic! Taerna offers natural parallelism of processes

    ! in OpenMole the user should e$plicitl" define that

    ! Taverna ' more suitable for designing comple$ workflows

    *plent" of predefined serices+

    ! #penMole ' simple and interactie interface% useful for peoplethat do not hae e$perience in designing workflows% defining

    samples automaticall"

    ! anal"6ing and measuring the e$ecution times of more comple$

    workflows in both workflow s"stems(

    g p p

    Building Scientific Workflows on the Grid: A Comparison between OpenMole and Taerna

  • 8/19/2019 Building Scientific Workflows on the Grid- A Comparison between OpenMole and Taverna

    31/31

    Thank Lou for the attention

    g p p