24
1 Digital Science Reproducibility and Visibility in Astronomy José Enrique Ruiz on behalf of the Wf4Ever Team SCIOPS 2013 ESAC, FRIDAY 13th SEPTEMBER 2013

Digital Science: Reproducibility and Visibility in Astronomy

Embed Size (px)

DESCRIPTION

The science done in Astronomy is digital science, from observing proposals to final publication, to data and software used: each of the elements and actions involved in scientific output could be recorded in electronic form. This fact does not prevent the final outcome of an experiment is still difficult to reproduce. This procedure can be long, tedious, not easily accessible or understandable, even to the author. At the same time, we have a rich infrastructure of files, observational data and publications. This could be used more efficiently if we reach greater visibility of the scientific production, which avoids duplication of effort and reinvention. Reproducibility is a cornerstone in scientific method, and extraction of relevant information in the current and future data flood is key in Astronomy. The AMIGA group (Analysis of the interstellar Medium of Isolated GAlaxies, IAA-CSIC, http://amiga.iaa.es) faces these two challenges in the European project "Wf4Ever: Advanced technologies for enhanced preservation workflow Science" to enable the preservation of the methodology in scalable semantic repositories to facilitate their discovery, access, inspection, exploitation and distribution. These repositories store the experiments on "Research Objects" whose main constituents are digital scientific workflows. These provide a comprehensive view and clear scientific interpretation of the experiment as well as the automation of the method, going beyond the usual pipelines that normally end up in data processing. The quantitative leap in volume and complexity of the next generation of archives will need analysis and data mining tasks to live closer to the data, in computing and distributed storage environments, but they should also be modular enough to allow customization from scientists and be easily accessible to foster their dissemination among the community. Astronomy is a collaborative science, but it has also become highly specialized, as many other disciplines. Sharing, preservation, discovery and a much simplified access to resources in the composition of scientific workflows will enable astronomers to greatly benefit from each other’s highly specialized knowhow, they constitute a way to push Astronomy to share and publish not only results and data, but also processes and methodologies. We will show how the use of scientific workflows can help to improve the reproducibility of the experiment and a more efficient exploitation of astronomical archives, as well as the visibility of the scientific methodology and its reuse.

Citation preview

Page 1: Digital Science: Reproducibility and Visibility in Astronomy

1

Digital Science Reproducibility and Visibility in Astronomy

José Enrique Ruiz on behalf of the Wf4Ever Team

SCIOPS 2013 ESAC, FRIDAY 13th SEPTEMBER 2013

Page 2: Digital Science: Reproducibility and Visibility in Astronomy

2

Wf4Ever Digital Science - Reproducibility and Visibility in Astronomy

1.  Intelligent Software Components (ISOCO, Spain) 2.  University of Manchester (UNIMAN, UK) 3.  Universidad Politécnica de Madrid (UPM, Spain) 4.  Poznan Supercomputing and Networking Centre (Poland) 5.  University of Oxford and OeRC (OXF, UK)

6.  Instituto Astrofísica Andalucía (IAA-CSIC, Spain) 7.  Leiden University Medical Centre (LUMC, NL)

4

Wf4Ever Advanced Workflow Preservation Technologies for Enhanced Science

3 1 6

7 5

2

2011 - 2013

Page 3: Digital Science: Reproducibility and Visibility in Astronomy

3

Astronomy research lifecycle is entirely digital »  Observation proposals »  Data reduction pipelines »  Analysis of science ready data »  Catalogs of objects and data archives »  Publish process

›  Final data results ›  Experiment in DL ADS/arXiv

Reproducible research is still not possible in a digital world

A rich infrastructure of data is not efficiently used

A normalized preservation of methodology is needed

Tools

Astronomy Research Lifecycle Digital Science - Reproducibility and Visibility in Astronomy

Page 4: Digital Science: Reproducibility and Visibility in Astronomy

4

Reproducibility and The Scientific Method Digital Science - Reproducibility and Visibility in Astronomy

http://xkcd.com/242/

Benefits »  Publishing knowledge, not advertising »  The author, the referee, the re-user »  Reputation, prestige and respect »  Higher quality of publications

›  Authors will be more careful ›  Many eyes to check results

Challenges »  Hard and time consuming »  Need incentives – not rewarded now

Page 5: Digital Science: Reproducibility and Visibility in Astronomy

5

Reproducibility and The Scientific Method Digital Science - Reproducibility and Visibility in Astronomy

I don’t know how!

Page 6: Digital Science: Reproducibility and Visibility in Astronomy

6

Visibility, Efficiency and Reuse Digital Science - Reproducibility and Visibility in Astronomy

Optimize return on investments made on big facilities »  Avoid duplication of efforts and reinvention »  How to discover and not duplicate ? »  How to re-use and not duplicate ? »  How to make use of best practices ? »  How to use the rich infrastructure of data ? »  Intellectual contribs are encoded in software

More data in archives does not imply more knowledge »  Expose complete scientific record, not the story »  Allow easy discovery of methods and tools

Page 7: Digital Science: Reproducibility and Visibility in Astronomy

7

Visibility and Social Discovery Digital Science - Reproducibility and Visibility in Astronomy

Paper discovery: the social dimension

Page 8: Digital Science: Reproducibility and Visibility in Astronomy

8

The Executable Paper Digital Science - Reproducibility and Visibility in Astronomy

Time has come to go beyond the PDF

Page 9: Digital Science: Reproducibility and Visibility in Astronomy

9

Digital Astronomy in the Local Desktop Digital Science - Reproducibility and Visibility in Astronomy

Going beyond automation Organization!

Page 10: Digital Science: Reproducibility and Visibility in Astronomy

10

Digital Astronomy in the Local Desktop Workflows to Access and Massage VO Data

# CIG Vhel e_Vhel r_Vhel Dist MType e_MType OptAssym r_MType Bmag e_Bmag 1 7299.0 3.0 1 96.9 5.0 1.5 1 1 14.167 0.271 0.173 0.571 0.040 13.383 2 6983.0 6.0 2 94.7 6.0 1.5 0 1 15.722 0.324 0.255 0.278 0.031 15.157 3 4.0 1.5 0 1 16.057 0.507 0.246 0.354 15.457 4 2310.0 1.0 3 31.9 3.0 1.5 0 1 12.818 0.424 0.252 0.863 0.017 11.685 5 7865.0 10.0 3 105.9 0.0 1.5 0 1 15.602 0.364 0.225 0.131 0.118 15.128 72 5164.0 9.0 2 68.5 5.0 1.5 1 1 14.445 0.325 0.315 0.367 0.028 13.735

Capture !Actions, Tasks, Dependencies, Provenance!

!Improve !

Clarity and Reproducibility!

Page 11: Digital Science: Reproducibility and Visibility in Astronomy

11

Scientific Workflows Digital Science - Reproducibility and Visibility in Astronomy

Living Tutorials!Templates for Re-use!

Expedite Training!Reduce time to insight!

Avoid reinvention!

Digital Libraries of workflows may boost the use of the existing infrastructure of data (VO) !

Page 12: Digital Science: Reproducibility and Visibility in Astronomy

12

!!

Software

›  Taverna ›  Kepler ›  Pegasus ›  Triana ›  ESO Reflex

Scientific Workflows Digital Science - Reproducibility and Visibility in Astronomy

Related Initiatives ›  ER-Flow ›  VAMDC ›  HELIO ›  Cyber-SKA ›  IceCore ›  Montage ›  Astro-WISE ›  AstroGrid

IVOA ›  AstroGrid ›  Grid&WS WG ›  VO France Wf WG

Self descriptive WS ›  PDL ›  SimDAL, S3

Page 13: Digital Science: Reproducibility and Visibility in Astronomy

13

!!

AstroTaverna: Create, annotate and run a workflow http://amiga.iaa.es/p/290-astrotaverna.htm

Astronomical Research Objects in Action Digital Science - Reproducibility and Visibility in Astronomy

Page 14: Digital Science: Reproducibility and Visibility in Astronomy

14

!!

AstroTaverna: Create, annotate and run a workflow http://amiga.iaa.es/p/290-astrotaverna.htm

Astronomical Research Objects in Action Digital Science - Reproducibility and Visibility in Astronomy

Page 15: Digital Science: Reproducibility and Visibility in Astronomy

15

The next generation of archives Digital Science - Reproducibility and Visibility in Astronomy

Prof. Kevin Vinsen

ASKAP Datacubes

Page 16: Digital Science: Reproducibility and Visibility in Astronomy

16

The next generation of archives Digital Science - Reproducibility and Visibility in Astronomy

Prof. Kevin Vinsen

SKA Datacubes

Page 17: Digital Science: Reproducibility and Visibility in Astronomy

17

The next generation of archives Digital Science - Reproducibility and Visibility in Astronomy

Much wider FoV and spectral coverage »  Large volumes for a single observed dataset Automated surveys »  Huge amounts of tabular data

We are moving into a world where »  computing and storage are cheap »  data movement is death

Extraction of scientifically relevant info from a multiD param. space »  Exploration services »  Anomaly detection »  Cross-matching data »  Dimensionality reduction

Detailed inspection and subset »  Filtering »  Extraction »  Re-Projection »  Analysis services

Page 18: Digital Science: Reproducibility and Visibility in Astronomy

18

»  A cloud of Web Services

»  Archives speaking Web Services

Process should benefit of the same privileges acquired by data Preserving the method ensures replication of final results at any moment

Archives should evolve from data providers into »  Virtual Data providers »  Software Tasks providers

Astronomy of multi archives/facilities/wavelength Interconnected and interoperable archives »  Data -> Virtual Observatory »  Software Tasks

The next generation of archives Digital Science - Reproducibility and Visibility in Astronomy

Preservation

The move computing to data paradigm

Page 19: Digital Science: Reproducibility and Visibility in Astronomy

19

Research Objects Digital Science - Reproducibility and Visibility in Astronomy

Distributed

Technical Objects Social Objects

Expose experimental context in a structured way in order to be understood

Page 20: Digital Science: Reproducibility and Visibility in Astronomy

20

Research Objects Digital Science - Reproducibility and Visibility in Astronomy

IPython Notebook solutions »  Web-browser as the working desktop »  Python code, plots and data, living with rich-text documentation »  Cloud-based adaptive scalable computing environment »  Fully shareable, re-usable and executable wikis »  Social platform and Git versioning

Page 21: Digital Science: Reproducibility and Visibility in Astronomy

21

Research Objects Digital Science - Reproducibility and Visibility in Astronomy

ADSLabs ADO Linked Components »  Authors »  Publications »  Journals »  Objects SIMBAD »  Tabular data behind the plots CDS »  ASCL reference of used software »  Observing time Proposals »  Used facilities, surveys or missions

http://labs.adsabs.harvard.edu/

Incentives

Similar Initiative to ESO Telbib!

Page 22: Digital Science: Reproducibility and Visibility in Astronomy

22

!!

The Incentive Papers with data links are cited more than those without

Research Objects Digital Science - Reproducibility and Visibility in Astronomy

Effect of E-printing on Citation Rates in Astronomy and Physics 2006. Edwin A. Henneken et al.

Page 23: Digital Science: Reproducibility and Visibility in Astronomy

23

!!

The Incentive Papers with data links are cited more than those without

Research Objects Digital Science - Reproducibility and Visibility in Astronomy

Effect of E-printing on Citation Rates in Astronomy and Physics 2006. Edwin A. Henneken et al.

Page 24: Digital Science: Reproducibility and Visibility in Astronomy

24

Conclusions Digital Science - Reproducibility and Visibility in Astronomy

»  Reproducibility is at the very heart of the scientific method »  Improving visibility is key in order to avoid reinvention »  Social dimension of science stressed in the discovery process »  Highly specialized science needs re-use to achieve efficiency »  In a digital world, publish decomposable executable papers »  Capture provenance and structure in the local desktop »  Scientific workflows go beyond automation: provide clarity and structure »  Transfer rate is more than an issue for next generation of archives »  The move computing to data paradigm -> back to old terminals »  Process should benefit of the same privileges acquired by data »  Digital libraries of web-services-based workflows »  The distributed digital workflow-centric Research Object »  Preserving knowledge - not only data or advertising

[email protected]