leon-osinski
research data management, data stewardship, research data management planning, research data labs, research data archives
Research data management
PROOF course Finding and controlling
scientific literature and data
TU/e, 2015
[email protected], TU/e IEC/Library
Available under CC BY-SA license, which permits copying and redistributing the material in any medium or format & adapting the material for any purpose, provided the original author and source are credited & you distribute the adapted material under the same license as the original
Agenda
1. Research data management [RDM]: what and why
2. RDM before your research: data management plan
[discussion]
3. RDM during your research: protecting and sharing your data via a data lab
4. RDM after your research: publishing and archiving your data via a data archive
Source: Research Data Netherlands / Marina Noordegraaf
Research data management [RDM]
RDM: caring* for your data with the purpose of
1. protecting their mere existence, and;
2. making them available to others - during and after your research project
Data sharing implies RDM, or: RDM prepares the way for sharing your data during and after the project
*Goodman A, et al. (2014) Ten simple rules for the care and feeding of scientific data. PLoS Comput Biol 10(4): e1003542. doi:10.1371/journal.pcbi.1003542
“Rule 3. Conduct science with a particular level of reuse in mind”
During your research
+ Because you work together with other researchers
After your research
+ Because of scientific integrity: validating results by replication requires data
+ Because of re-using results: data-driven science
+ Because your data are unique / not easily repeatable (long term observational data)
+ Because you benefit from it: increases your visibility and enhances the trustworthiness of your research
Why share research data? #1
Because it’s expected by
+ Journals [here, here, here, here]
+ Professional organizations [VSNU, KNAW]
+ Research evaluators
+ Universities, including TU/e
+ Research funders [NWO, ZonMW, EC]: data management plan
Why share research data? #2
EC: Horizon 2020 #1: Open research data pilot
“… aims to improve and maximise access to and re-use of research data generated by projects for the benefit of society and the economy.”
“Regarding the digital research data (…), the beneficiaries must: deposit in a research data repository and take measures to make it possible (…) to access, mine, exploit, reproduce, and disseminate – free of charge for any user (…) the data …”
“Participating projects will be required to develop a Data Management Plan (DMP), in which they will specify what data will be open.” [ italics mine ]
The DMP should address:
1. Data set reference and name
2. Data set description
3. Standards and metadata
4. Data sharing
5. Archiving and preservation
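The five DMP points above can be kept as a small checklist per data set. The following is a minimal sketch, not an official Horizon 2020 or 3TU.Datacentrum template: the class name, the field names and the example values are assumptions made for illustration only.

```python
from dataclasses import dataclass, fields


@dataclass
class DataManagementPlan:
    """One record per data set, with one field per DMP point listed above."""
    reference_and_name: str = ""
    description: str = ""
    standards_and_metadata: str = ""
    data_sharing: str = ""
    archiving_and_preservation: str = ""

    def missing_sections(self) -> list[str]:
        # A section counts as missing when it is empty or only whitespace.
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# Hypothetical entry; all values are invented for illustration.
plan = DataManagementPlan(
    reference_and_name="PROOF_demo_dataset_v01",
    description="Measurement data (tiff files) and lab journal notes",
    data_sharing="Open after publication",
)
print(plan.missing_sections())
```

Running the check before submitting a plan shows which of the five points still need to be written.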
EC: Horizon 2020 #2: Open research data pilot: data management plan [DMP]
Research data should be:
1. Discoverable
2. Accessible
3. Assessable and intelligible
4. Useable beyond the original purpose
5. Interoperable
DMP template by 3TU.Datacentrum
NWO pilot data management: scope
“The pilot applies to the following seven funding rounds:
Vici
Research talent (Social sciences)
Innovative public private partnership in ICT (Physical sciences)
Fund new chemical innovations (Chemical sciences)
HTM call (Hightech materials) (Technology foundation STW)
Urbanising deltas of the world of security and the rule of law (WOTRO)
Open programme (Earth and life sciences).”
NWO pilot data management: additional information #1
“Researchers are expected to answer four questions about data management in the research proposal (data management section).”
“After a proposal has been awarded funding, the researcher should elaborate the section into a data management plan. Within four months of the research project being awarded funding, the researcher must have submitted the first version of the data management plan to NWO.”
“For this data management plan, NWO has chosen a template that matches the guidelines for data management from Horizon 2020 as closely as possible.” [italics mine]
“During the pilot, the data management section will not be included in the decision about the awarding of funding.”
“NWO understands ‘data’ to be both collected, unprocessed data as well as analysed, generated data. (…). NWO only requests storage of data that are relevant for reuse.” [italics mine]
NWO pilot data management: additional information #2
Research data management: discussion topics and questions
Storage and back-up
Where do you keep your research data?
Is there a back-up? Where?
Are data selections made? Not everything is to be stored but…?
Metadata and documentation
Do you describe your research data? Who measured or collected what, when, how? Other context information?
Are you content with the way you document or describe your research data? Do you succeed in finding the right (version of your) research data?
Can other researchers understand and (re-)use your research data (during and after research)? Should they be able to?
Access and re-use
Who can access your research data?
What will happen to your research data when you leave TU/e?
Would you consider publishing your research data, i.e. making them publicly available?
Data management plan assignment [ N=5 ]
Collection: Observation during measurements (lab journal), measurement data (from apparatus, tiff files), simulation data, Matlab, Excel, PDF’s, Origin (creation of graphs), .csv, .ascii, questionnaire, SPSS, GIS
Storage, backup: Own laptop, network drive, portable/external hard drive, cloud storage (secondary backup), measurement-pc, user-pc
Documentation (aimed at understanding and re-use): lab journal, accompanying Excel-/Word-files, naming, organizing data in folders + README’s, organized by date of acquisition and method of measurement
Access (during your research): all users of the apparatus, access policy of network drive, SVN (version control + access control), under confidentiality, openly after publication, open
Sharing (when your research is done): with colleagues, conferences, through university file servers, published as part of thesis (open), unknown
Preservation (when your research is done and in the long run): DVD’s (raw and processed data), no archiving, data can be produced by running the models at any time, unknown
Source: Research Data Netherlands / Marina Noordegraaf
Protection against physical loss and destruction
+ storage, backup
+ data classification and retention; different treatment of different data
Protection against intellectual loss and unretrievability - using the correct data
+ metadata, data documentation
+ catalogue metadata, for discovery: creator & title data set, abstract …
+ study metadata: more or less similar to the Methodology section of a paper: info on provenance of data, workflow of data collection, instruments used, data validation
+ data-level metadata, for re-use by humans and machines, often embedded in software packages: variable and code descriptions in tables or databases, codebook
+ license-information: what are others allowed to do with your data?
+ file-naming, organizing data in folders, versioning, using a relational database [ instead of Excel ]
Protection against unauthorized use
+ access control
RDM during your research: protecting and sharing your data
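As an illustration of data-level metadata and of using a relational database instead of a spreadsheet, the sketch below stores a codebook next to the measurements with Python's built-in sqlite3 module. The table names, column names and sample values are hypothetical; they are assumptions for the example and do not come from the course material.

```python
import sqlite3

# In-memory database for the example; pass a file path for real data.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE codebook (
    variable    TEXT PRIMARY KEY,
    description TEXT NOT NULL,
    unit        TEXT
);
CREATE TABLE measurement (
    id          INTEGER PRIMARY KEY,
    measured_on TEXT NOT NULL,   -- ISO 8601 date
    instrument  TEXT NOT NULL,
    variable    TEXT NOT NULL REFERENCES codebook(variable),
    value       REAL NOT NULL
);
""")

# Data-level metadata: every variable is described once, in the codebook.
conn.executemany(
    "INSERT INTO codebook VALUES (?, ?, ?)",
    [("temp", "Water temperature", "degC"), ("flow", "Flow rate", "L/min")],
)
conn.executemany(
    "INSERT INTO measurement (measured_on, instrument, variable, value) "
    "VALUES (?, ?, ?, ?)",
    [("2015-03-06", "sensor_A", "temp", 18.5), ("2015-03-06", "sensor_B", "flow", 2.4)],
)

# A value for a variable that is not in the codebook is refused.
try:
    conn.execute(
        "INSERT INTO measurement (measured_on, instrument, variable, value) "
        "VALUES ('2015-03-06', 'sensor_A', 'ph', 7.1)"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

rows = conn.execute("""
    SELECT m.measured_on, c.description, m.value, c.unit
    FROM measurement AS m JOIN codebook AS c ON c.variable = m.variable
    ORDER BY m.id
""").fetchall()
for row in rows:
    print(row)
```

The design choice here is that the description and unit of a variable live in one place, so every measurement can be read together with its documentation, and undocumented variables cannot enter the data set unnoticed.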
File-naming
File-naming conventions help you find your data, help others to find your data and help track which version of a file is most current
A good file name distinguishes a file from files with similar subjects as well as different versions of the file
Avoid using special characters in a file name: \ / : * ? < > | [ ] & $ , .
Use underscores instead of periods or spaces to separate logical elements in a file name
Avoid very long names: usually 25 characters is sufficient
Use descriptive names, indicative of the content
Names should include all necessary descriptive information, independent of where the file is stored
Include dates
Include a version number on files
Be consistent
Add a readme.txt to each folder in which the file naming and its meaning are explained
Source: File naming conventions
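The file-naming rules above can be applied mechanically. The following is a minimal sketch under stated assumptions: the function name `build_file_name`, the date format YYYYMMDD, the two-digit version suffix `_v01` and the treatment of the 25 characters as a hard limit on the name without extension are all choices made for this example, not part of the cited conventions. The period before the file extension is kept as the one exception to the rule on special characters.

```python
import re
from datetime import date

# Special characters to avoid, as listed in the rules above.
FORBIDDEN = set('\\/:*?<>|[]&$,.')
MAX_STEM_LENGTH = 25  # assumption: treat "usually 25 characters" as a hard limit


def build_file_name(description: str, day: date, version: int, extension: str) -> str:
    """Build a name of the form YYYYMMDD_description_vNN.ext."""
    # Replace spaces and forbidden characters with underscores.
    cleaned = "".join(
        "_" if (ch in FORBIDDEN or ch.isspace()) else ch for ch in description
    )
    # Collapse runs of underscores and trim them from both ends.
    cleaned = re.sub(r"_+", "_", cleaned).strip("_")
    stem = f"{day:%Y%m%d}_{cleaned}_v{version:02d}"
    if len(stem) > MAX_STEM_LENGTH:
        raise ValueError(f"name too long ({len(stem)} characters): {stem}")
    return f"{stem}.{extension}"


print(build_file_name("flow rate", date(2015, 3, 6), 1, "csv"))
```

Putting the date first makes files sort chronologically in a folder listing, and the version suffix shows which file is the most current.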
File organization
Source: Beatriz Ramirez, Data management plan for the PhD project: development and application of a monitoring system to assess the impacts of climate and land cover changes on eco-hydrological processes in an eastern Andes catchment area
Dataverse Network: data lab for active research data where you may
+ store your data in an organized and safe way
+ clearly describe your data
+ use version control for your data
+ arrange access to your data
+ get recognition for your data
+ [collaborate on your data]
Data lab surrogates: Google Drive, Dropbox, [ SURFdrive ], Beehub…
SURF Filesender [data transfer up to 100 GB]
RDM during your research: data labs
Storage and backup of data through DANS [Data Archiving and Networked Services]
Data transfer: up to 2 GB per dataset
Dataverse 3TU.Datacentrum: up to 50 GB free
Workshop on Dataverse Network, by Leon Osinski
Workshop on Mendeley, by Rikie Deurenberg
We will contact you to ask if you’re interested!
RDM during your research: Dataverse Network and Mendeley workshop
On request (informal, peer to peer sharing)
“Reinhart and Rogoff kindly provided us with the working spreadsheet from the RR analysis. With the working spreadsheet, we were able to approximate closely the published RR results. While using RR's working spreadsheet, we identified coding errors, selective exclusion of available data, and unconventional weighting of summary statistics.”
Herndon, T., Ash, M., Pollin, R. (2013), Does high public debt consistently stifle economic growth? : a critique of Reinhart and Rogoff
“I'd like to thank E.J. Masicampo and Daniel LaLande for sharing and allowing me to share their data…”
Daniël Lakens (2014), What p-hacking really looks like: A comment on Masicampo & LaLande (2012)
On a (personal) website
“Let me start by saying that the reason why I put all excel files online, including all the detailed excel formulas about data constructions and adjustments, is precisely because I want to promote an open and transparent debate about these important and sensitive measurement issues.”
Thomas Piketty, My response to the Financial Times, HuffPost The Blog, 29-05-2014 ; originally published as Addendum: Response to FT, 28-05-2014
RDM after your research: sharing data after your research #1
Source: www.aukeherrema.nl
A data journal
Journal of open psychology data, Geoscience data journal, Data in brief, Scientific data, Frontiers data reports
A data archive or repository
Catalogues of research data repositories: Databib, Re3data.org
Zenodo, Figshare, DANS, Dryad, B2SHARE
3TU.Datacentrum
+ small and medium sized data sets, long tail data
+ static data, ‘frozen’ data sets
+ preferably nonproprietary software formats suitable for long term preservation
+ DOI’s [ persistent identifier for citability and retrievability ]
+ open access
+ long-term availability, Data Seal of Approval
+ Data Citation Index (Thomson Reuters)
+ self-upload (single data sets < 4 GB)
+ special collections of related data sets
RDM after your research: sharing data after your research #2
Attach your data to your publication
“What research data and waste have in common is that’s worthwhile to reuse them.”
Lilliana Abarca-Guerrero (2014), A construction waste generation model for developing countries, PhD thesis TU/e, proposition 9
“Psychology journals should require, as a condition for publication, that data supporting the results in the paper are accessible in an appropriate public archive”
Daniël Lakens (2014), Psychology journals should make data sharing a requirement for publication
RDM after your research: sharing the data of your PhD thesis
RDM: time consuming and laborious but also…
“Oh yes, there are certainly benefits from this. Doing this once means it will be easier in the future (increased efficiency), so one benefit is reduced future opportunity costs. Other benefits include personal satisfaction and the indirect benefits that come from archiving and publishing in OA journals – I can now list the datasets and code on NSF Biosketches as a “product” resulting from previous funding. As I say in the post, I also expect future publications to be much easier to produce because the data and code are well organized and annotated. I will be doing the same calculations for the next paper using these data/code and writing a follow-up post.” [ italics mine ]
Emilio M. Bruna
Data Coach [ website ]
Data librarian
Leon Osinski, Merle Rodenburg
Recommended reading
Van den Eynden, Veerle et al. (2011), Managing and sharing data: best practice for researchers, UK Data Archive
Van den Eynden, Veerle et al. (2014), Managing and sharing research data: a guide to good practice, London: Sage [available via TU/e Library]
Recommended online course
Essentials 4 data support [English & Dutch]
Support
Be prepared to share your data after your research because it’s required and because you benefit from it
Preparation = careful and responsible data management during your research
[You’ll receive an evaluation form after the course by e-mail. Don’t forget to fill it in.]
Source: Research Data Netherlands / Marina Noordegraaf
Wrap up
1. Website IEC/Library [TU/e]: http://w3.tue.nl/en/services/library/
2. Data sharing increases visibility: http://dx.doi.org/10.7717/peerj.175
3. Data sharing enhances trustworthiness: http://dx.doi.org/10.1371/journal.pone.0026828
4. Data availability policy journals: http://www.nap.edu/openbook.php?record_id=10613&page=33
5. Data availability policy American Economic Review: https://www.aeaweb.org/aer/data.php
6. Data availability policy PLoS: http://www.plos.org/plos-data-policy-faq/
7. Data availability policy Nature: http://www.nature.com/authors/policies/availability.html
8. VSNU Code of Scientific Conduct (Dutch, revision 2014): http://www.vsnu.nl/files/documenten/Domeinen/Onderzoek/Code_wetenschapsbeoefening_2004_(2014).pdf
9. KNAW responsible research data management: https://www.knaw.nl/en/news/publications/responsible-research-data-management-and-the-prevention-of-scientific-misconduct?set_language=en
10. Research evaluators (Standard evaluation protocol 2015-2021): http://www.vsnu.nl/SEP
11. Radboud University research data policy: http://www.ru.nl/library/services-0/research/expert-centre/vm/policy-radboud/
12. TU/e Code of Scientific Conduct: http://www.tue.nl/en/university/about-the-university/integrity/scientific-integrity/
13. NWO and research data: http://www.nwo.nl/en/news-and-events/dossiers/datamanagement
URLs of mentioned webpages, in order of appearance #1
14. ZonMW Toegang tot data: http://www.zonmw.nl/nl/programmas/programma-detail/toegang-tot-data-ttdata/algemeen/
15. Horizon 2020 Guidelines on data management: http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
16. Data management plan template (3TU.Datacentrum): http://datacentrum.3tu.nl/en/what-we-offer/data-management-plan/
17. Loss of data: http://www.cursor.tue.nl/en/news-article/artikel/doctorate-ends-in-drama-after-car-burglary-1/
18. Storage, back up of data: http://www.data-archive.ac.uk/create-manage/storage
19. Catalogue metadata: http://www.data-archive.ac.uk/create-manage/document/metadata
20. Study metadata: http://www.data-archive.ac.uk/create-manage/document/study-level
21. Data-level metadata: http://www.data-archive.ac.uk/create-manage/document/data-level
22. File naming: http://www.ncdcr.gov/portals/26/pdf/guidelines/filenaming.pdf
23. Organizing data: http://www.wageningenur.nl/en/Expertise-Services/Facilities/Library/Expertise/Write-cite/Research-data-1/Data-management-plans.htm [example 2]
24. Version control: http://www.data-archive.ac.uk/create-manage/format/versions
25. Using a relational database: http://geekgirls.com/category/office/databases/ , see also http://www.datacarpentry.org and http://dx.doi.org/10.1890/0012-9623-90.2.205
URLs of mentioned webpages, in order of appearance #2
26. Kien Leong (2010), The seven deadly spreadsheet sins: http://production-scheduling.com/seven-deadly-spreadsheet-sins/
27. Dataverse Network: http://www.dataverse.nl
28. Google Drive: https://www.google.com/drive/
29. Dropbox: http://www.dropbox.com
30. SURFdrive: https://surfdrive.surf.nl
31. Beehub: https://beehub.nl/system/
32. Data on request (Reinhart-Rogoff paper): http://dx.doi.org/10.1257/aer.100.2.573
33. Data on request (blog post Daniel Lakens): http://daniellakens.blogspot.nl/2014/09/what-p-hacking-really-looks-like.html
34. Data on personal website (Thomas Piketty): http://piketty.pse.ens.fr/en/capital21c2
35. Data journal: Journal of Open Psychology Data: http://openpsychologydata.metajnl.com/
36. Data journal: Geoscience Data Journal: http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)2049-6060
37. Data journal: Data in brief: http://www.journals.elsevier.com/data-in-brief
38. Data journal: Scientific data: http://www.nature.com/sdata/
URLs of mentioned webpages, in order of appearance #3
39. Data journal: Frontiers data reports: http://www.frontiersin.org/news/Data_Reports_a_new_type_of_peer-reviewed_article_in_Frontiers_journals/1051?utm_source=FRN&utm_medium=ECOM&utm_campaign=TWT_FRN_1502_datareport
40. Research data catalogue: Databib: http://databib.org/
41. Research data catalogue: Re3data.org: http://service.re3data.org/search/results?term=
42. Publishing data: Zenodo: http://www.zenodo.org/
43. Publishing data: Figshare: http://www.figshare.com
44. Publishing data: DANS: http://www.dans.knaw.nl/en
45. Publishing data: Dryad: http://datadryad.org/
46. Publishing data: B2SHARE: https://b2share.eudat.eu/
47. Publishing data: 3TU.Datacentrum: http://data.3tu.nl/
48. Long tail research data: http://www.nature.com/neuro/journal/v17/n11/fig_tab/nn.3838_F1.html
49. Nonproprietary software formats: http://datacentrum.3tu.nl/fileadmin/editor_upload/File_formats/Digital_Preservation_Support_levels.pdf
50. Data Seal of Approval: http://www.datasealofapproval.org
URLs of mentioned webpages, in order of appearance #4
51. Data Citation Index (Thomson Reuters): http://wokinfo.com/products_tools/multidisciplinary/dci/
52. Self upload 3TU.Datacentrum: https://data.3tu.nl/account/signin/?next=/upload/
53. Data set underlying PhD thesis Lilliana Abarca-Guerrero: http://dx.doi.org/10.4121/uuid:31d9e6b3-77e4-4a4c-835e-5c3b211edcfc
54. PhD thesis Lilliana Abarca-Guerrero: http://repository.tue.nl/770952
55. Blogpost Daniël Lakens: http://daniellakens.blogspot.nl/2014/12/psychology-journals-should-require-data.html
56. Emilio M. Bruna, The opportunity cost of my #OpenScience… : http://brunalab.org/blog/2014/09/04/the-opportunity-cost-of-my-openscience-was-35-hours-690/
57. Data Coach: http://w3.tue.nl/en/services/library/about/services/datacoach/
58. Van den Eynden, V. et al. Managing and sharing data: best practice for researchers: http://www.data-archive.ac.uk/media/2894/managingsharing.pdf
59. Essentials 4 data support: http://datasupport.researchdata.nl/
URLs of mentioned webpages, in order of appearance #5