Wf4Ever: Scientific Workflows and Research Objects as tools for scientific insight and methodology curation

  • Published on
    18-Dec-2014

DESCRIPTION

Astronomers are drowning in data: facilities like ALMA currently provide datasets in the gigabyte range, and growing, while facilities like the LSST and the SKA will generate datasets so large that downloading them, even in reduced form, will not be feasible. This talk introduces the concept of Scientific Workflows, software tools that allow for the easy exploration of both local and remote datasets and processing services, and of Research Objects, which encapsulate all relevant aspects of a scientific experiment, allow for its quantitative and qualitative assessment, and enable reuse with proper attribution and linkage to publications, among others. The AstroTaverna plugin, with astronomy-specific tools for workflow creation, was also presented in this ALMA Weekly Seminar.

Transcript

  • 1. Wf4Ever: Scientific Workflows and Research Objects as tools for scientific insight and methodology curation Juande Santander-Vela jdsant@iaa.es Instituto de Astrofísica de Andalucía-CSIC
  • 2. Talk Outline Introduction Current challenges for radio astronomy and science Potential e-Science solutions: Workflows and Research Objects Final points
  • 3. Introduction
  • 4. Who am I? Member of the AMIGA international collaboration, based at IAA-CSIC Ph.D. on bringing Radio Astronomical data archives and tools into the VO Applied Scientist at ESO VLT archive, Software Engineer/Astronomy Specialist at ALMA archive (May 2009-Dec 2011) Back to IAA-CSIC as VIA-SKA Project Manager, Radio Astroinformatician GROUP INTEREST IN TECH DEVELOPMENTS FOR BETTER SCIENCE
  • 5. Why I'm here? Collaboration with Stephane Leon and the ALMA Data Management Group Helping bring the ALMA Science Archive to the VO Modelling radio data cubes Finding use cases for workflow technology (see later)
  • 6. AMIGA Analysis of the interstellar Medium of Isolated GAlaxies Multi-wavelength, multi-object study on isolated galaxies with strict isolation criteria Careful curation of data Very careful processing of new parameters from Group's own observation programs and data reduction Literature table scanning Virtual Observatory table harvesting and parsing Emphasis on marrying astronomy and computer science, and buy-in of the VO E-SCIENCE USERS
  • 7. AMIGA Analysis of the interstellar Medium of Isolated GAlaxies Multi-wavelength, multi-object study on isolated galaxies with strict isolation criteria Careful curation of data Very careful processing of new parameters from Group's own observation programs and data reduction Literature table scanning Virtual Observatory table harvesting and parsing Emphasis on marrying astronomy and computer science, and buy-in of the VO E-SCIENCE DEVELOPERS!
  • 8. AMIGA Project goal: providing a baseline for galaxy properties to compare with other environments Interaction-free sample, ideal for tracing HI infall: we can use CIG galaxies to detect the cosmic web Need for very sensitive telescopes able to resolve faint HI Square Kilometre Array & pathfinders PARTICIPATING IN SKA.TEL.SDP CONSORTIUM WE NEED TOOLS FOR OUR OWN SCIENCE ANALYSIS
  • 9. Current challenges for radio astronomy and science
  • 10. Data over-abundance Moore's Law for Detectors Exponential increase of individual and accumulated data sets We have more data than ever but we can't use it: Because we can't: Difficult to set up (for sharing) Difficult to find (for using) Difficult to document (both using and sharing) Difficult to deal with (because of size, formatting, purpose) Because it is not in our best interest FULLY ?
  • 11. Courtesy J.E. Ruiz (AMIGA,Wf4Ever)
  • 12. Courtesy J.E. Ruiz (AMIGA,Wf4Ever) Tools!
  • 13. Data sharing DATA SHARING Sharing data is good. But sharing your own data? That can get complicated. As two research communities who held meetings in May on the issue report their proposals to promote data sharing in biology, a special issue of Nature examines the cultural and technical hurdles that can get in the way of good intentions. DATA FLIRTING DATA HOARDING IRREPRODUCIBLE RESEARCH ?
  • 14. Irreproducible research CHALLENGES IN IRREPRODUCIBLE RESEARCH No research paper can ever be considered to be the final word, and the replication and corroboration of research results is key to the scientific process. In studying complex entities, especially animals and human beings, the complexity of the system and of the techniques can all too easily lead to results that seem robust in the lab, and valid to editors and referees of journals, but which do not stand the test of further studies. Nature has published a series of articles about the worrying extent to which research results have been found wanting in this respect. The editors of Nature and the Nature life sciences research journals have also taken substantive steps to put our own houses in order, in improving the transparency and robustness of what we publish. Journals, research laboratories and institutions and funders all have an interest in tackling issues of irreproducibility. We hope that the articles contained in this collection will help.
  • 15. Irreproducible research CHALLENGES IN IRREPRODUCIBLE RESEARCH
  • 16. Tool over-abundance ++
  • 17. Starship Asterisk* (APOD and General Astronomy Discussion Forum) The Engineering Deck: Astrophysics Source Code Library 671 topics, including 21cmFAST, 2LPTIC, 2MASS Kit, 3DEX, AAOGlimpse, ACORNS-ADI, and ACS: ALMA Common Software
  • 18. Services too!
  • 19. How to deal with all this? ++ All of this compounds the problems of reproducibility, methodology assessment, result dissemination
  • 20. How to deal with all this? AND THE CODE? WHAT SOFTWARE DOES IT DEPEND ON? WHICH CODE DID WHAT? NOT A GOOD SOLUTION TRADITIONALLY
  • 21. How to deal with all this? ++ ORCHESTRATION, ENCAPSULATION, DATA ACCESS, PROVENANCE, ANNOTATION
  • 22. Why Workflows? SCIENTIFIC
  • 23. Workflows define computations Events & Processes Dependencies Resources Local & Remote Processes Sequences Concurrences Triggers FORMALLY, OR AT LEAST MACHINE READABLE WORKFLOW DEFINITION LANGUAGES
  • 24. Workflows enable distributed computing Distributed computing paradigm Move computation to the data Computing services Collaborative environments Linked data FOR SCIENTIFIC DISCUSSION & SCIENCE EXTRACTION Science-computing
  • 25. Workflows enable distributed computing Data can be anywhere Workflows can be constructed hierarchically Each workflow does useful work on its own The data flow can be easily followed
  • ...
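
Slides 23 and 25 describe workflows as processes plus dependencies, built hierarchically, with a data flow that can be followed. The sketch below is one possible illustration of those ideas in plain Python. It is not the Taverna or AstroTaverna API: the `Workflow` class, the step names and the toy calibration functions are all hypothetical, and the sketch assumes each workflow ends in a single final step.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class Workflow:
    """Toy workflow (hypothetical): named steps plus their dependencies."""

    def __init__(self, name):
        self.name = name
        self.steps = {}   # step name -> callable
        self.deps = {}    # step name -> names of upstream steps
        self.trace = []   # steps in the order they last ran

    def add(self, step, func, after=()):
        self.steps[step] = func
        self.deps[step] = tuple(after)
        return self  # allow chained .add() calls

    def order(self):
        # The dependencies alone determine a valid execution order.
        return list(TopologicalSorter(self.deps).static_order())

    def __call__(self, *inputs):
        results = {}
        self.trace = []
        for step in self.order():
            upstream = [results[d] for d in self.deps[step]]
            # Steps without dependencies receive the workflow inputs.
            args = upstream if upstream else list(inputs)
            results[step] = self.steps[step](*args)
            self.trace.append(step)
        # Assumption: the last step executed is the single final step.
        return results[self.trace[-1]]


# Inner workflow: does useful work on its own (made-up calibration).
calibrate = (
    Workflow("calibrate")
    .add("subtract_bias", lambda xs: [x - 1.0 for x in xs])
    .add("scale", lambda xs: [2.0 * x for x in xs], after=["subtract_bias"])
)

# Outer workflow: reuses the inner workflow as one of its steps.
analysis = (
    Workflow("analysis")
    .add("calibrate", calibrate)
    .add("total", sum, after=["calibrate"])
)

result = analysis([2.0, 3.0, 4.0])
```

Because a `Workflow` is itself callable, it can be dropped into another workflow as a step, which is the hierarchical construction the slide mentions; `analysis.trace` and `calibrate.trace` then record which steps ran and in what order.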