
Invited Talk for EUDAT Workshop in Barcelona



EUDAT Workflow Workshop - September 25th, 2013


Page 1: Invited Talk for EUDAT Workshop in Barcelona

1 Ilkay ALTINTAS- September, 2013

Ilkay ALTINTAS, Ph.D. San Diego Supercomputer Center, UCSD http://users.sdsc.edu/~altintas

Roles and Challenges for Scientific Workflows and Provenance in the Age of Open Science, Cloud Computing and Web 2.0

Page 2

Workflows are a Part of Cyberinfrastructure

[Figure: workflow lifecycle diagram with the stages Workflow Design; Reporting; Workflow Monitoring; Workflow Execution; Workflow Scheduling and Execution Planning; Run Review; Provenance Analysis; Deploy and Publish]

Support for the end-to-end computational scientific process: BUILD, SHARE, RUN, LEARN
•  Accelerate workflow design and reuse via a drag-and-drop visual interface
•  Facilitate sharing
•  Schedule, run and monitor workflow execution
•  Promote learning

Page 3

What motivated this?

Page 4

Facilitating and Accelerating XXX-Info or Comp-XXX Research using Scientific Workflows

•  Important attributes
– Assemble complex processing easily
– Transparently access diverse resources
– Incorporate multiple software tools
– Assure reproducibility
– Build around a community development model

Page 5

In addition, workflows today are…
•  Encapsulations of scientific knowledge
•  Easy-to-share pieces of the scientific process
– e.g., as research objects
•  Mostly portable
•  A way to facilitate and encourage reproducible science
– by tracking provenance at each step of the scientific process
•  A key integrator for (big and small) data science
•  A means to standardize scientific data products
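To make "tracking provenance at each step" concrete, here is a minimal Python sketch. It assumes a workflow is simply an ordered list of named functions; the names `run_with_provenance` and `digest` and the record fields are hypothetical illustrations, not Kepler's actor model or any provenance standard's schema.

```python
import hashlib
import json
from collections.abc import Callable
from datetime import datetime, timezone
from typing import Any

# Hypothetical representation: a workflow step is a (name, function) pair.
Step = tuple[str, Callable[[Any], Any]]


def digest(value: Any) -> str:
    """Return a short SHA-256 digest of a JSON-serializable value."""
    payload = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def run_with_provenance(steps: list[Step], data: Any) -> tuple[Any, list[dict]]:
    """Run the steps in order, recording one provenance record per step."""
    trace = []
    for name, func in steps:
        record = {
            "step": name,
            "input": digest(data),
            "started": datetime.now(timezone.utc).isoformat(),
        }
        data = func(data)
        record["output"] = digest(data)  # fingerprint of what this step produced
        trace.append(record)
    return data, trace


if __name__ == "__main__":
    steps = [
        ("clean", lambda xs: [x for x in xs if x is not None]),
        ("scale", lambda xs: [2 * x for x in xs]),
        ("total", sum),
    ]
    result, trace = run_with_provenance(steps, [1, None, 3])
    print(result)  # 8
    for rec in trace:
        print(rec["step"], rec["input"], "->", rec["output"])
```

Because each record's output digest equals the next record's input digest, the trace links every intermediate product to the step that created it, which is the property that supports reproducing or auditing a run.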

Page 6

Pushing the boundaries of scientific computing has created new requirements.

Page 7

The ‘bioinformatics’ bottleneck
•  Resources needed for sequence analysis far exceed the costs of sequence generation
– Cloud computing is an attractive on-demand, decentralized model
– Need new scheduling capabilities
   •  On-demand access to shared, configurable resources
   •  Networks, servers, storage, applications, and services
– Need the ability to easily combine users' environments and community tools within workflows
– Various tools with different profiles

Page 8

The ‘sensor data’ bottleneck
•  Data streaming in at various rates
•  “Big Data” by definition in its volume, variety, velocity and viscosity
– Workflows can improve veracity and add value by providing provenance- and standards-aware on-the-fly archival capabilities
– Workflows can perform QA/QC and automate (real-time) analysis of streaming data before it is even archived
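As an illustration of QA/QC on streaming data before archival, here is a minimal Python sketch. It assumes a simple range check as the quality rule; the `Reading` type, the valid range, and the `qa_qc` and `archive` functions are hypothetical, and a real deployment would archive to a repository rather than an in-memory list.

```python
from collections.abc import Iterable, Iterator
from dataclasses import dataclass


@dataclass
class Reading:
    sensor: str
    value: float


# Hypothetical plausible range for a sensor (e.g., air temperature in deg C).
VALID_RANGE = (-40.0, 60.0)


def qa_qc(stream: Iterable[Reading],
          lo: float = VALID_RANGE[0],
          hi: float = VALID_RANGE[1]) -> Iterator[dict]:
    """Flag each reading on the fly, recording which rule was applied."""
    for r in stream:
        ok = lo <= r.value <= hi
        yield {
            "sensor": r.sensor,
            "value": r.value,
            "qc_flag": "pass" if ok else "fail",
            "qc_rule": f"range[{lo}, {hi}]",  # provenance of the QC decision
        }


def archive(records: Iterable[dict], store: list) -> None:
    """Stand-in for an archival step: append records to an in-memory list."""
    for rec in records:
        store.append(rec)


if __name__ == "__main__":
    store: list = []
    readings = [Reading("t1", 21.5), Reading("t1", 999.0), Reading("t2", -5.0)]
    archive(qa_qc(readings), store)
    print([r["qc_flag"] for r in store])  # ['pass', 'fail', 'pass']
```

Since `qa_qc` is a generator, each reading is checked as it arrives rather than after a batch is stored, and the flag travels with the record into the archive instead of silently discarding suspect values.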

Page 9

The ‘HPC’ bottleneck
•  Scaling for exascale is not happening very naturally
– Different memory architectures
– Analysis codes being redeveloped
– Scheduling through batch schedulers alone is not enough
– HPC workflows are becoming more interactive
– In-situ data analysis to cope with large data volumes

Page 10

As users see the value, they say:
•  Increase reuse
– of best development practices by the scientific community
– of other bio packages
•  Increase programmability by end users
– users with various skill levels
– to formulate actual domain-specific workflows
•  Increase resource utilization
– optimize execution across available computing resources
– in an efficient, transparent and intuitive manner
•  Make workflows a part of the end-to-end scientific model, from data generation to publication

Page 11

What are some next steps?
•  Specialize workflow systems with domain-specific:
– Tools; data models and formats; user interfaces; deployment
•  Workflow publications and data repositories
– Treat workflows the same as data
– Strong virtualization capability
•  Standards for provenance needed
– For data and for process
•  Build upon prior knowledge by detecting best-practice programming patterns and motifs
•  Cater to different hardware architectures

Page 12

Ilkay Altintas [email protected] @ilkayaltintas @bioKepler @KeplerWorkflow @WIFIREProject

Thanks & Questions…

How to download Kepler? https://kepler-project.org/users/downloads
Please start with the short Getting Started Guide: https://kepler-project.org/users/documentation