Supporting reproducible science in CSIRO RESEARCH DATA SUPPORT/ IM&T Sue Cook & Dom Hogan | CSIRO Information Management & Technology 29 April 2015

Supporting reproducible science in CSIRO

RESEARCH DATA SUPPORT/ IM&T

Sue Cook & Dom Hogan | CSIRO Information Management & Technology 29 April 2015

Started 1665
350 years
• Primary output is the journal article
• Citations link outputs
• Rewards based on numbers of outputs and citations to outputs since 1955

Started 1993*
25 years
*give or take

Change

The goal of reproducibility means that all science outputs and contributions – articles, data and software – need publication and citations to link them

Data Software

Provenance

“Citable”?

One person’s opinion - Wilke, 2015:
1. Uniquely and unambiguously citable
2. Available in perpetuity, in unchanged form
3. Accessible to the public
4. Self-contained and complete
5. Attributable authorship

“websites hosting scientific software will usually fail at least conditions 2 and 3, and thus would not be citable by my criteria.”

Journal Editors and Peer Reviewers are the gatekeepers

Early example: MDBSY

• Murray Darling Basin Sustainable Yields
• Source data licensing
• Quality control
• Provenance
• Informs policy decisions that have large impact – decisions that wind up being defended in court. Data transparency is essential, but data quality is also essential.

Self-Serve Repository – Metadata and data

Self-Serve Repository – IP guides

Legal issues

• Data licences
– Creative Commons promotes reuse, but is your data derived from something with restricted permissions?
– CSIRO Data Licence: non-commercial, does not allow redistribution. Restricts reuse, but lower risk.

Software

• More licences available
• Binaries vs Code
• IP issues:
– derived code?
– Open source development?
– Patents?

Link to code repository for updates and development

Link to the related publication

Link to the data

Licence and supplement

Attribution

http://dx.doi.org/10.4225/08/536302C43FC28
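
The elements labelled on this slide (attribution, licence, links, DOI) are the pieces a citation is built from. As a rough sketch only, the following Python shows one way to assemble a citation string from such fields. The `CitationRecord` class and `format_citation` function are hypothetical names made up for illustration, not part of any CSIRO system; the field values and the field order are taken from the GrainScan entry in this deck's reference list.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CitationRecord:
    """Hypothetical container for the fields a data or software citation needs."""
    authors: List[str]
    year: int
    title: str
    version: str
    publisher: str
    resource_type: str
    doi: str


def format_citation(rec: CitationRecord) -> str:
    """Join the fields in the order used by the GrainScan reference in this deck."""
    return (
        f"{', '.join(rec.authors)} ({rec.year}): {rec.title}. "
        f"{rec.version}. {rec.publisher}. {rec.resource_type}. "
        f"http://dx.doi.org/{rec.doi}"
    )


# Field values copied from the reference list entry for GrainScan.
grainscan = CitationRecord(
    authors=["Whan, Alex", "Matt Bolger", "Leanne Bischof"],
    year=2014,
    title="GrainScan - Software for analysis of grain images",
    version="v2",
    publisher="CSIRO",
    resource_type="Data Collection",
    doi="10.4225/08/536302C43FC28",
)

citation = format_citation(grainscan)
print(citation)
```

Keeping the version and the DOI as separate fields means a later release can be cited by changing only those two values.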

Software citation

Data citation

Storage and permissions

• A controlled space allows for persistence, version control and security.
• This is good for getting DOIs, but…
• What about linking to data hosted elsewhere?
• Hosted services?
• Data Access Portal has grown over 100 TB in the last year – the growth rate will increase.

If 1 GB = 1 box trailer… (33.3 minutes at ADSL 2)

1 TB = 33 B-Doubles (23.1 days at ADSL 2)

1 PB = 3 supertankers (63 years at ADSL 2)
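
The transfer times above can be checked with a short calculation. This is a minimal sketch, assuming decimal units (1 GB = 10^9 bytes) and a sustained ADSL 2 throughput of 4 Mbit/s. That rate is not stated on the slide; it is inferred from "1 GB in 33.3 minutes". The function and constant names are illustrative.

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY

# Assumption: 4 Mbit/s sustained, inferred from the slide's 1 GB figure.
ASSUMED_RATE_BITS_PER_S = 4_000_000

# Assumption: decimal (power-of-ten) storage units.
GB, TB, PB = 10**9, 10**12, 10**15


def transfer_seconds(size_bytes: float,
                     rate_bits_per_s: float = ASSUMED_RATE_BITS_PER_S) -> float:
    """Time to move size_bytes over a link running at rate_bits_per_s."""
    return size_bytes * 8 / rate_bits_per_s


minutes_per_gb = transfer_seconds(GB) / 60
days_per_tb = transfer_seconds(TB) / SECONDS_PER_DAY
years_per_pb = transfer_seconds(PB) / SECONDS_PER_YEAR

print(f"1 GB: {minutes_per_gb:.1f} minutes")  # 33.3 minutes
print(f"1 TB: {days_per_tb:.1f} days")        # 23.1 days
print(f"1 PB: {years_per_pb:.0f} years")      # 63 years
```

Under these assumptions the results match the slide's three figures.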

Data volumes

• CAWCR Wave Hindcast – ~10 TB moves slowly over ADSL

Australian Square Kilometre Array Pathfinder

• ASKAP – processing a data stream of 70 Tb/s (that’s 8.75 TB/s)
• The data rates arriving at the Pawsey Centre are 2.5 GB/s (or 75 PB per year) – we can’t store this much
• Full operation will deal with 16 TB per day (5.7 PB per year)
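
The unit conversions on this slide can be reproduced as follows. This is a sketch under stated assumptions: a 365-day year, and 1024-based steps between GB, TB and PB for the per-year totals. The slide does not say which convention it uses; the 1024-based one is inferred because it reproduces the quoted 75 PB and 5.7 PB figures, while power-of-ten units give slightly larger numbers. All variable names are illustrative.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60
DAYS_PER_YEAR = 365
BINARY_STEP = 1024  # assumption: 1 PB = 1024 TB = 1024**2 GB

# 70 terabits per second expressed in terabytes per second (8 bits per byte).
stream_tbytes_per_s = 70 / 8

# 2.5 GB/s arriving at the Pawsey Centre, accumulated over a year.
pawsey_gb_per_year = 2.5 * SECONDS_PER_YEAR
pawsey_pb_per_year = pawsey_gb_per_year / BINARY_STEP**2
pawsey_pb_per_year_decimal = pawsey_gb_per_year / 1000**2

# 16 TB per day in full operation, accumulated over a year.
full_op_tb_per_year = 16 * DAYS_PER_YEAR
full_op_pb_per_year = full_op_tb_per_year / BINARY_STEP
full_op_pb_per_year_decimal = full_op_tb_per_year / 1000

print(f"Stream: {stream_tbytes_per_s} TB/s")                     # 8.75 TB/s
print(f"Pawsey: {pawsey_pb_per_year:.0f} PB/year")               # 75 PB/year
print(f"Full operation: {full_op_pb_per_year:.1f} PB/year")      # 5.7 PB/year
print(f"Decimal units: {pawsey_pb_per_year_decimal:.1f} and "
      f"{full_op_pb_per_year_decimal:.2f} PB/year")              # 78.8 and 5.84
```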

ASKAP Data Management

“Progressive” DOIs

Provenance

Provenance Management System (PROMS)

Don’t try this at home! Instead, go to http://ands.org.au/partner/provenance_interest_group.html

Some elements to connect

Systems
Infrastructure
Processes (e.g. Quality Control, Approval)
Legal
Licensing
Intellectual Property
Culture
Training
Fulfilling needs
… … …
Policy

Thanks

• Research Data Support team
• Dom Hogan, David Benn, Anne Stevenson, John Morrissey, Cynthia Love
• CSIRO Information Management & Technology
• CSIRO Applications team
• CSIRO Scientific Computing team
• Australia Telescope National Facility
• Ian Corner for the supertanker analogy
• Nick Car for the provenance slides
• Australian National Data Service (ANDS)

Questions?

References

• Paul L Dineen. Blue. Photo, April 16, 2010. https://www.flickr.com/photos/pauldineen/4529213297/.

• "Philosophical Transactions Volume 1 frontispiece" by Henry Oldenburg - Philosophical Transactions. Licensed under CC BY 4.0 via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:Philosophical_Transactions_Volume_1_frontispiece.jpg#mediaviewer/File:Philosophical_Transactions_Volume_1_frontispiece.jpg

• Wilke, Claus. “What Constitutes a Citable Scientific Work?” The Serial Mentor, January 2, 2015. http://serialmentor.com/blog/2015/1/2/what-constitutes-a-citable-scientific-work

• CSIRO. Water availability in the Murray-Darling basin : summary of a report to the Australian Government. 2008-10. https://publications.csiro.au/rpr/pub?pid=legacy:683

• Whan, Alex, Matt Bolger, Leanne Bischof (2014): GrainScan - Software for analysis of grain images. v2. CSIRO. Data Collection. http://dx.doi.org/10.4225/08/536302C43FC28

• Durrant, Tom, Diana Greenslade, Mark Hemer, Claire Trenham (2014). A Global Wave Hindcast focussed on the Central and South Pacific. CAWCR Technical Report No. 070. http://www.cawcr.gov.au/publications/technicalreports/CTR_070.pdf

• Car, Nicholas (2014). Inter-agency standardised provenance reporting in Australia. eResearch Australasia, 27-31 October 2014. Melbourne, Australia. 10p. https://publications.csiro.au/rpr/pub?pid=csiro:EP145084

IM&T/Research Data Support
Sue Cook
Data Librarian
t +61 8 64368532
e [email protected]

CSIRO IM&T

Thank you