Supporting reproducible science in CSIRO
RESEARCH DATA SUPPORT / IM&T
Sue Cook & Dom Hogan | CSIRO Information Management & Technology
29 April 2015
Started 1665
350 years
• Primary output is the journal article
• Citations link outputs
• Rewards based on numbers of outputs and citations to outputs since 1955
Supporting reproducible science | Sue Cook
Started 1993*
25 years
*give or take
Change
The goal of reproducibility means that all science outputs and contributions (articles, data and software) need to be published and linked by citations.
Data
Software
Provenance
“Citable”?
One person’s opinion (Wilke, 2015):
1. Uniquely and unambiguously citable
2. Available in perpetuity, in unchanged form
3. Accessible to the public
4. Self-contained and complete
5. Attributable authorship
“websites hosting scientific software will usually fail at least conditions 2 and 3, and thus would not be citable by my criteria.”
Journal Editors and Peer Reviewers are the gatekeepers
Supporting reproducible science | Dominic Hogan
Early example: MDBSY
• Murray-Darling Basin Sustainable Yields
• Source data licensing
• Quality control
• Provenance
• Informs high-impact policy decisions, which can end up being defended in court. Data transparency is essential, but so is data quality.
Self-Serve Repository – Metadata and data
Self-Serve Repository – IP guides
Legal issues
• Data licences
  – Creative Commons promotes reuse, but is your data derived from something with restricted permissions?
  – CSIRO Data Licence: non-commercial, does not allow redistribution. Restricts reuse, but carries lower risk.
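The question "is your data derived from something with restricted permissions?" can be sketched as a permissions check: a derived dataset should not be released under a licence that grants something one of its sources withholds. This is a minimal illustration under strong assumptions: each licence is reduced to a hypothetical set of permission flags covering only the two points the slide mentions (commercial use and redistribution), and all other licence conditions, such as attribution, are ignored.

```python
from dataclasses import dataclass


# Hypothetical simplification: a licence is reduced to the set of permissions it grants.
@dataclass(frozen=True)
class Licence:
    name: str
    permissions: frozenset[str]


# Illustrative flag values, based only on the slide's one-line summaries.
CC_BY = Licence("CC BY 4.0", frozenset({"commercial_use", "redistribution"}))
CSIRO_DATA = Licence("CSIRO Data Licence", frozenset())  # non-commercial, no redistribution


def excess_permissions(proposed: Licence, sources: list[Licence]) -> set[str]:
    """Return permissions the proposed licence grants that at least one source withholds."""
    if not sources:
        return set()
    # Only permissions granted by every source can safely pass to the derived data.
    granted_by_all = frozenset.intersection(*(s.permissions for s in sources))
    return set(proposed.permissions - granted_by_all)


# A derived product mixing CC BY inputs with CSIRO Data Licence inputs:
print(sorted(excess_permissions(CC_BY, [CC_BY, CSIRO_DATA])))
# ['commercial_use', 'redistribution']
```

Releasing the combined product under the more restrictive licence returns an empty set, which mirrors the slide's trade-off: it restricts reuse, but carries lower risk.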
Software
• More licences available
• Binaries vs code
• IP issues:
  – Derived code?
  – Open source development?
  – Patents?
Link to code repository for updates and development
Link to the related publication
Link to the data
Licence and supplement
Attribution
http://dx.doi.org/10.4225/08/536302C43FC28
Software citation
Data citation
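A data citation of the kind shown for GrainScan is assembled from a handful of metadata fields: creators, year, title, version, publisher, resource type and DOI. The sketch below builds one from a metadata record. The `DataRecord` fields and their ordering are assumptions modelled on the single GrainScan entry in the References, not a CSIRO or DataCite specification.

```python
from dataclasses import dataclass


@dataclass
class DataRecord:
    # Hypothetical metadata fields, modelled on the GrainScan citation.
    creators: list[tuple[str, str]]  # (family name, given name) pairs, in author order
    year: int
    title: str
    version: str
    publisher: str
    resource_type: str
    doi: str  # bare DOI, without the resolver prefix


def format_creators(creators: list[tuple[str, str]]) -> str:
    # First author inverted ("Family, Given"); the rest in natural order.
    family, given = creators[0]
    names = [f"{family}, {given}"] + [f"{g} {f}" for f, g in creators[1:]]
    return ", ".join(names)


def cite(rec: DataRecord) -> str:
    return (
        f"{format_creators(rec.creators)} ({rec.year}): {rec.title}. "
        f"{rec.version}. {rec.publisher}. {rec.resource_type}. "
        f"http://dx.doi.org/{rec.doi}"
    )


grainscan = DataRecord(
    creators=[("Whan", "Alex"), ("Bolger", "Matt"), ("Bischof", "Leanne")],
    year=2014,
    title="GrainScan - Software for analysis of grain images",
    version="v2",
    publisher="CSIRO",
    resource_type="Data Collection",
    doi="10.4225/08/536302C43FC28",
)
print(cite(grainscan))
```

Keeping the version in the record matters for reproducibility: a new version gets its own citation rather than silently replacing the one a paper relied on.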
Storage and permissions
• A controlled space allows for persistence, version control and security.
• This is good for getting DOIs, but…
• What about linking to data hosted elsewhere?
• Hosted services?
• The Data Access Portal has grown by over 100 TB in the last year, and the growth rate will increase.
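A controlled space can only promise that a deposit stays "in unchanged form" (Wilke's second criterion) if it can detect change. A common technique is a fixity check: record a cryptographic checksum when a file is deposited, then recompute it later and compare. The sketch below uses Python's standard `hashlib`; the function names and the in-memory manifest are hypothetical and are not features of the Data Access Portal.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def make_manifest(paths: list[Path]) -> dict[str, str]:
    # Record a checksum for each file at deposit time.
    return {str(p): sha256_of(p) for p in paths}


def changed_files(manifest: dict[str, str]) -> list[str]:
    # Report files that are missing or whose content no longer matches the manifest.
    changed = []
    for name, expected in manifest.items():
        p = Path(name)
        if not p.exists() or sha256_of(p) != expected:
            changed.append(name)
    return changed


with tempfile.TemporaryDirectory() as d:
    sample = Path(d) / "data.csv"
    sample.write_text("a,b\n1,2\n")
    manifest = make_manifest([sample])
    before = changed_files(manifest)  # nothing has changed yet
    sample.write_text("a,b\n1,3\n")   # simulate a silent modification
    after = changed_files(manifest)

print(before, len(after))
```

A real repository would store the manifest alongside the deposit's metadata so the check can be rerun whenever the data is migrated or copied.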
If 1 GB = 1 box trailer: 33.3 minutes at ADSL2
1 TB = 33 B-doubles: 23.1 days at ADSL2
1 PB = 3 supertankers: 63 years at ADSL2
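The transfer times above are consistent with decimal units (1 GB = 10^9 bytes) and an effective ADSL2 throughput of 4 Mbit/s. That rate is inferred from the 33.3-minute figure rather than stated on the slide, and real ADSL2 speeds vary widely. A minimal sketch reproducing the arithmetic under that assumption:

```python
ASSUMED_RATE_BPS = 4e6  # assumed effective ADSL2 throughput, in bits per second

SECONDS_PER_MINUTE = 60
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY

GB, TB, PB = 1e9, 1e12, 1e15  # decimal units, in bytes


def transfer_seconds(n_bytes: float, rate_bps: float = ASSUMED_RATE_BPS) -> float:
    # 8 bits per byte, divided by the line rate in bits per second.
    return n_bytes * 8 / rate_bps


minutes_1gb = transfer_seconds(GB) / SECONDS_PER_MINUTE
days_1tb = transfer_seconds(TB) / SECONDS_PER_DAY
years_1pb = transfer_seconds(PB) / SECONDS_PER_YEAR

print(f"1 GB: {minutes_1gb:.1f} minutes")  # 1 GB: 33.3 minutes
print(f"1 TB: {days_1tb:.1f} days")        # 1 TB: 23.1 days
print(f"1 PB: {years_1pb:.0f} years")      # 1 PB: 63 years
```

At the same assumed rate, 10 TB would take about 231 days.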
Data volumes
• CAWCR Wave Hindcast: ~10 TB, which moves slowly over ADSL
Australian Square Kilometre Array Pathfinder
• ASKAP processes a data stream of 70 Tb/s (that’s 8.75 TB/s)
• The data rate arriving at the Pawsey Centre is 2.5 GB/s (or 75 PB per year), which is more than we can store
• Full operation will deal with 16 TB per day (5.7 PB per year)
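The ASKAP figures mix bits and bytes, and the per-year totals appear to use binary prefixes. At 2.5 GB/s over a 365-day year, decimal units give about 79 PB, while binary units (GiB, PiB) give about 75, matching the slide; likewise 16 TB per day gives 5.84 PB or 5.70 PiB per year. This reading of the units is an inference, not something the slide states. A sketch that checks the conversions:

```python
DAYS_PER_YEAR = 365
SECONDS_PER_YEAR = DAYS_PER_YEAR * 86_400

# Stream rate: bits to bytes (same prefix on both sides, so 8 bits per byte suffices).
stream_tb_per_s = 70 / 8  # 70 Tb/s -> TB/s

# Ingest at the Pawsey Centre: 2.5 GB/s sustained for a 365-day year.
ingest_decimal_pb = 2.5e9 * SECONDS_PER_YEAR / 1e15          # GB and PB as powers of 10
ingest_binary_pib = 2.5 * 2**30 * SECONDS_PER_YEAR / 2**50   # GiB and PiB as powers of 2

# Full operation: 16 TB per day for a year.
daily_decimal_pb = 16 * DAYS_PER_YEAR / 1_000  # TB -> PB
daily_binary_pib = 16 * DAYS_PER_YEAR / 1_024  # TiB -> PiB

print(f"{stream_tb_per_s} TB/s")                                        # 8.75 TB/s
print(f"{ingest_decimal_pb:.1f} PB vs {ingest_binary_pib:.1f} PiB per year")  # 78.8 PB vs 75.2 PiB per year
print(f"{daily_decimal_pb:.2f} PB vs {daily_binary_pib:.2f} PiB per year")    # 5.84 PB vs 5.70 PiB per year
```

Either way the conclusion on the slide holds: the raw ingest is an order of magnitude more than the 5.7 PB per year that full operation is expected to keep.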
ASKAP Data Management
“Progressive” DOIs
Provenance
Provenance Management System (PROMS)
Don’t try this at home! Instead, go to http://ands.org.au/partner/provenance_interest_group.html
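A provenance record answers "where did this output come from?" The sketch below models that with three core relations from the W3C PROV data model (an entity wasGeneratedBy an activity, the activity used input entities, and the activity wasAssociatedWith an agent), then walks the chain back to the raw inputs. It is a toy in-memory structure; the class names and the example entities (`raw_obs`, `cleaned_obs` and so on) are hypothetical and are not the PROMS API or its report format.

```python
from dataclasses import dataclass, field


@dataclass
class Activity:
    name: str
    used: list[str] = field(default_factory=list)  # input entity ids (prov:used)
    agent: str = ""                                # responsible agent (prov:wasAssociatedWith)


@dataclass
class ProvGraph:
    # Maps each generated entity to the activity that produced it (prov:wasGeneratedBy).
    generated_by: dict[str, Activity] = field(default_factory=dict)

    def record(self, output: str, activity: Activity) -> None:
        self.generated_by[output] = activity

    def lineage(self, entity: str) -> list[str]:
        """Return every upstream entity; raw inputs have no generating activity."""
        seen: list[str] = []
        stack = [entity]
        while stack:
            act = self.generated_by.get(stack.pop())
            if act is None:
                continue  # a raw input: nothing further upstream
            for src in act.used:
                if src not in seen:
                    seen.append(src)
                    stack.append(src)
        return seen


g = ProvGraph()
g.record("cleaned_obs", Activity("quality_control", used=["raw_obs"], agent="data_team"))
g.record("yield_estimate",
         Activity("model_run", used=["cleaned_obs", "climate_scenario"], agent="modeller"))
print(g.lineage("yield_estimate"))  # ['cleaned_obs', 'climate_scenario', 'raw_obs']
```

Recording the quality-control step as its own activity is what lets a reviewer see that a published estimate rests on checked, rather than raw, observations.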
Some elements to connect
• Systems
  – Infrastructure
  – Processes (e.g. quality control, approval)
• Legal
  – Licensing
  – Intellectual property
• Culture
  – Training
  – Fulfilling needs
• … … …
• Policy
Thanks
• Research Data Support team: Dom Hogan, David Benn, Anne Stevenson, John Morrissey, Cynthia Love
• CSIRO Information Management & Technology
• CSIRO Applications team
• CSIRO Scientific Computing team
• Australia Telescope National Facility
• Ian Corner for the supertanker analogy
• Nick Car for the provenance slides
• Australian National Data Service (ANDS)
Questions?
References
• Dineen, Paul L. “Blue.” Photo, April 16, 2010. https://www.flickr.com/photos/pauldineen/4529213297/.
• Oldenburg, Henry. “Philosophical Transactions Volume 1 frontispiece.” Philosophical Transactions. Licensed under CC BY 4.0 via Wikimedia Commons. http://commons.wikimedia.org/wiki/File:Philosophical_Transactions_Volume_1_frontispiece.jpg#mediaviewer/File:Philosophical_Transactions_Volume_1_frontispiece.jpg
• Wilke, Claus. “What Constitutes a Citable Scientific Work?” The Serial Mentor, January 2, 2015. http://serialmentor.com/blog/2015/1/2/what-constitutes-a-citable-scientific-work
• CSIRO. Water availability in the Murray-Darling Basin: summary of a report to the Australian Government. 2008-10. https://publications.csiro.au/rpr/pub?pid=legacy:683
• Whan, Alex, Matt Bolger, Leanne Bischof (2014): GrainScan - Software for analysis of grain images. v2. CSIRO. Data Collection. http://dx.doi.org/10.4225/08/536302C43FC28
• Durrant, Tom, Diana Greenslade, Mark Hemer, Claire Trenham (2014). A Global Wave Hindcast focussed on the Central and South Pacific. CAWCR Technical Report No. 070. http://www.cawcr.gov.au/publications/technicalreports/CTR_070.pdf
• Car, Nicholas (2014). Inter-agency standardised provenance reporting in Australia. eResearch Australasia, 27-31 October 2014. Melbourne, Australia. 10p. https://publications.csiro.au/rpr/pub?pid=csiro:EP145084
IM&T / Research Data Support
Sue Cook
Data Librarian
t +61 8 6436 8532
e [email protected]
CSIRO IM&T
Thank you