Course Research Data Management
Maarten van Bentum (Library & Archive)
Blackboard
UT website, employees page
ORG-AA-BA-RESDATAMAN: Course Research Data Management
Course material: presentations, links to information, DMP template, datasets
After the course-day: contact for support and feedback
Why research data management
• Importance of quality, reliability, replicability and verification of scientific research
• Better and more efficient access to research data
• Requirements of research funders with regard to data management
• Data management will become an issue in research assessments
Benefits research data management
• Improved research quality
• Improved efficiency
• Protection from data-related risks
• Enhanced reputation and prestige
Research Data Management: importance (1/2)
Scientific integrity (1), funder requirements (2) and developments in science (3)
(1) Fabrication, Falsification and Plagiarism (FFP) > RDM?
Neglect of basic preservation of data
Neglect of data management
No proper mechanism for quality control: no data or instruments for easy data reproduction means no possible check
See also: https://www.utwente.nl/en/organization/structure/management/good-management/
Netherlands Code of Conduct for Academic Practice: Verification section
Research Data Management: importance (2/2)
(2) NWO and EU Horizon 2020 data management pilots
Focus on open data and reuse
Data Management Plan
Data archived in data repository
NWO: http://www.nwo.nl/en/policies/open+science/data+management
EU H2020: http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
(3) Development in science
Data intensive science (4th paradigm)
Data collections are future assets of research groups
What you will learn today
Data management planning: how to make a DMP, what issues and how to describe (interactive)
Awareness of importance of managing data after research: data citation and publication (persistent identifiers) and proper data archiving
Knowledge about legal issues in data management
Programme
9:30 Introduction to Research Data Management
Dr. ir. Maarten van Bentum, data librarian UT - Library & Archive
9:45 Data Management Planning
Dr. ir. Maarten van Bentum, data librarian UT - Library & Archive
10:00 Small group assignment: Writing a DMP section (based on one of the research cases in the group)
Dr. ir. Maarten van Bentum, data librarian UT - Library & Archive
10:45 Break
11:00 Plenary presentations: each group presents the section it has prepared, and the other groups act as the EU review committee.
Dr. ir. Maarten van Bentum, data librarian UT - Library & Archive
12:30 Lunch
13:30 Data Citation: Claiming Data with DOIs (incl. small assignments)
Ellen Verbakel, data librarian TU Delft - 3TU.Datacentrum
14:00 Hands-on Data CV, ORCID (participants individually)
Ellen Verbakel, data librarian TU Delft - 3TU.Datacentrum
14:45 Data publications
Ellen Verbakel, data librarian TU Delft - 3TU.Datacentrum
15:00 Break
15:15 Data archive, Dataseal, DIY/DIT
Ellen Verbakel, data librarian TU Delft - 3TU.Datacentrum
15:30 Legal issues: Data retention, data protection, privacy, ownership
Drs. Heiko Tjalsma, legal advisor DANS
16:30 Evaluation form: tell us what you think about this course
16:45 Closure
Data Management Plan – a definition
Formal research project document describing what data will be collected, how they will be collected, stored, described, and archived, and how access, reuse and linking to publications will be realised.
Data Management Plan - topics
Responsibility
Description of data
Methodology data collection
Documentation: metadata (standards)
Quality assurance
Storage and backup
Policies for access and sharing and provisions for appropriate protection/privacy
Policies and provisions for reuse, redistribution
Plans for archiving and preservation of access
From: National Science Foundation and University of California
Data Management Plan - templates
Information, templates and checklists
UT template: website RDM on Library & Archive
3TU.Datacentrum: template
DANS checklist
NWO form
Writing a DMP
6 small groups (data collection, data storage and backup, data documentation, data access, data sharing and reuse, data preservation and archiving)
Use UT template
Work with research case or dataset of one of the group members
Plenary presentations and discussion (15 min each)
DMP - Data collection (1/1)
Type of data > what else should be considered an object of data management: software, models, scripts, instruments, questionnaires, informed consent, etc.
Legal and contractual regulations: Personal data? >
Dutch Personal Data Protection Act, http://www.utwente.nl/az/gegevensbescherming/ (in Dutch)
UT classification guideline for information and information systems (in Dutch)
Who collects data: third party? > contract about rights and licenses; example: bankruptcy of the research agency (see later: data access)
DMP - Data storage and backup (1/4)
Criteria
Sustainability/reliability: backup frequency (offline / off-site?)
Dataset type: raw dataset, versions during processing and analysis, final datasets
Size dataset: capacity, costs, data transfer
Legal or contractual regulations
Access: individual, community, open
DMP - Data storage and backup (2/4)
Storage options
1. UT central storage
p- or m-disk (ICTS): http://www.utwente.nl/icts/diensten/catalogus/dataopslag_mw/storage/
2. Project, community or research institute storage
IGS Datalab: https://www.utwente.nl/igs/datalab/
3. Individual data storage (computer, dvd/cd, external hard disk, …)
4. Non-commercial cloud storage
SURFdrive: https://www.surfdrive.nl/en
DataverseNL: https://dataverse.nl/dvn/
5. Commercial cloud storage: Dropbox, OneDrive, …
DMP - Data storage and backup (3/4)
Storage solutions: advantages, disadvantages and suitable use

University of Twente (ICTS) central storage M: and P:
  Advantages: full service; reliable, durable, secure; high speed data transfer
  Disadvantages: no sharing outside UT
  Suitable for: saving large data files; master copy of data; use encryption for sensitive and critical data; use SURFfilesender for encrypted data transfer

PC or laptop
  Advantages: always available; portable; low cost; high speed data transfer
  Disadvantages: sensitive to damage and loss (no automatic backup); no sharing
  Suitable for: saving large data files; temporary storage; use encryption for sensitive and critical data

Personal storage devices (USB flash, external hard drive, DVD/CD)
  Advantages: portable; low cost
  Disadvantages: easily damaged or lost (no automatic backup); not for sensitive or critical data; difficult sharing
  Suitable for: saving large data files; temporary storage of standard data

Non-commercial cloud services (for example, DataverseNL, SURFdrive)
  Advantages: automatic synchronization on several devices; easy access; external sharing
  Disadvantages: medium speed data transfer; not for sensitive or critical data (SURFdrive: when encrypted)
  Suitable for: sharing standard data with external parties

Commercial cloud services (for example, Dropbox, Google Drive, OneDrive)
  Advantages: automatic synchronization on several devices; easy access; external sharing
  Disadvantages: medium speed data transfer; not for sensitive or critical data; unclear access to data; unclear privacy regulations
  Suitable for: sharing standard data with external parties
DMP - Data storage and backup (4/4)
UT data policy
During the research the research data will be saved in a central repository which is available to at least the members of the research group/ institute and which is managed by this research group/ institute. Storage and access should be managed in accordance with legal regulations, any third party contractual requirements, etc.
Backup
3 copies (original, external/local, external/remote)
Local vs. remote depends on recovery time needed
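The three-copy rule above can be sketched in a few lines of Python. This is a minimal illustration, not a UT-provided tool: it assumes the external/local and external/remote locations are reachable as ordinary directories (for example, mounted drives), and the function names `sha256_of` and `back_up` are hypothetical.

```python
from __future__ import annotations

import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(original: Path, backup_dirs: list[Path]) -> list[Path]:
    """Copy `original` into each backup directory and verify every copy.

    Assumption: each entry in `backup_dirs` is a plain directory path,
    e.g. one local external disk and one remote mounted share.
    """
    expected = sha256_of(original)
    copies = []
    for directory in backup_dirs:
        directory.mkdir(parents=True, exist_ok=True)
        copy = Path(shutil.copy2(original, directory))
        if sha256_of(copy) != expected:
            raise IOError(f"Backup copy differs from original: {copy}")
        copies.append(copy)
    return copies
```

Calling `back_up(Path("raw.csv"), [Path("/mnt/local_backup"), Path("/mnt/remote_backup")])` (both paths are placeholders) would leave the original plus two verified copies.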
Data transfer
https://www.utwente.nl/icts/en/diensten/catalogus/filesender/
DMP - Data documentation (1/4)
Documentation during research of dynamic data sets (for yourself, fellow researchers in the project and/or group)
Documentation after research of static data sets (for discovery, verification, replication, and reuse)
Documentation: standard metadata schemes enhanced with specific descriptive elements necessary for verification, replication, and reuse
See list: http://www.dcc.ac.uk/resources/metadata-standards/list
See also 3TU.Datacentrum Data description and formats
DMP - Data documentation (2/4)
Title: name of the dataset or research project that produced it
Creator: names and addresses of the organization or people who created the data, including all significant contributors
Identifier: the identification number used to identify the data, even if it's just an internal project reference number
Subject: keywords or phrases describing the subject or content of the data
Dates: key dates associated with the data, including: project start and end date; release date; other dates associated with the data lifespan, e.g., maintenance cycle, update schedule
Funders: organizations or agencies who funded the research
Language: language(s) of the intellectual content of the resource, when relevant
Location: where the data relates to a physical location, record information about its spatial coverage
Rights: description of any known intellectual property rights held for the data
List of file names and relationships: list of all digital files in the archive, with their names and file extensions (e.g., 'NWPalaceTR.WRL', 'stone.mov')
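One way to record the elements on this slide is a plain text readme file stored with the data. The sketch below is only an illustration: the file name `README.txt`, the function `write_readme` and the "Field: value" layout are assumptions, not a prescribed UT or 3TU.Datacentrum format.

```python
from pathlib import Path

# Field names follow the metadata elements listed on the slide.
FIELDS = ["Title", "Creator", "Identifier", "Subject", "Dates",
          "Funders", "Language", "Location", "Rights"]


def write_readme(data_dir: Path, metadata: dict) -> Path:
    """Write README.txt with one 'Field: value' line per supplied element,
    followed by a list of all files found under `data_dir`."""
    # Elements that are absent or empty (e.g. Language when not relevant) are skipped.
    lines = [f"{field}: {metadata[field]}" for field in FIELDS if metadata.get(field)]
    lines.append("Files:")
    for path in sorted(data_dir.rglob("*")):
        if path.is_file() and path.name != "README.txt":
            lines.append(f"  {path.relative_to(data_dir).as_posix()}")
    readme = data_dir / "README.txt"
    readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme
```

For example, `write_readme(Path("my_dataset"), {"Title": "Placeholder title", "Rights": "CC0"})` writes the two supplied elements and the file list; the directory name and values here are placeholders.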
DMP - Data documentation (3/4)
Formats: format(s) of the data, e.g., FITS, SPSS, HTML, JPEG
Methodology: how the data was generated, including equipment or software used, experimental protocol, other things you would include in your lab notebook. Can reference a published article, if it covers everything
Workflows or analyses: needed to be able to reproduce your work
Sources: references to source material for data derived from other sources, including details of where the source data is held, how identified and accessed
Versions: date/time stamped, and use a separate ID (e.g., version number) for each version
Checksums: to test if your file has changed over time
Explanation of codes used in file names: brief explanation of any naming conventions or abbreviations used to label the files
List of codes used in files: list of any special values used in the data (e.g., codes for categorical survey responses, '999 indicates a "dummy" value in the data,' etc.)
Store metadata in a text file (such as a readme file or codebook) in the same directory as the data
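The checksum element mentioned on this slide can be illustrated with a small Python sketch that records a SHA-256 checksum per file and later reports which files have changed. The manifest file name `checksums.sha256` and the function names are assumptions made for this example; any checksum tool would serve the same purpose.

```python
import hashlib
from pathlib import Path

MANIFEST_NAME = "checksums.sha256"  # hypothetical manifest file name


def file_checksum(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(data_dir: Path) -> Path:
    """Record one '<checksum>  <relative path>' line per file in data_dir."""
    lines = []
    for path in sorted(data_dir.rglob("*")):
        if path.is_file() and path.name != MANIFEST_NAME:
            relative = path.relative_to(data_dir).as_posix()
            lines.append(f"{file_checksum(path)}  {relative}")
    manifest = data_dir / MANIFEST_NAME
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest


def changed_files(data_dir: Path) -> list:
    """Return relative paths that are missing or whose checksum differs
    from the value recorded in the manifest."""
    changed = []
    manifest_text = (data_dir / MANIFEST_NAME).read_text(encoding="utf-8")
    for line in manifest_text.splitlines():
        if not line.strip():
            continue
        recorded, name = line.split("  ", 1)
        path = data_dir / name
        if not path.is_file() or file_checksum(path) != recorded:
            changed.append(name)
    return changed
```

Running `write_manifest` when a dataset becomes static and `changed_files` at later intervals gives a simple check that archived files are still intact.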
DMP - Data documentation (4/4)
File naming conventions: http://guides.lib.purdue.edu/content.php?pid=440001&sid=4901667
Good directory structure:
Directory top-level should include
Project title
Unique identifier
Date (e.g. year)
Substructure should have clear, documented naming convention
e.g. each run of an experiment, each version of a dataset, each person in the group.
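The directory guidelines above can be sketched as follows. This is an illustration under stated assumptions: the order `year_identifier_title`, the `run-001` style subdirectory names and the helper names are all choices made for this example, not a convention prescribed by the course.

```python
import re
from pathlib import Path


def slug(text: str) -> str:
    """Lower-case text and replace runs of non-alphanumeric characters with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def project_dir_name(title: str, identifier: str, year: int) -> str:
    """Combine date (year), unique identifier and project title into one name."""
    return f"{year}_{slug(identifier)}_{slug(title)}"


def create_structure(root: Path, title: str, identifier: str, year: int, runs: int) -> Path:
    """Create the top-level project directory with one subdirectory per experiment run."""
    top = root / project_dir_name(title, identifier, year)
    top.mkdir(parents=True, exist_ok=True)
    for number in range(1, runs + 1):
        # Zero-padded numbers keep the runs sorted in directory listings.
        (top / f"run-{number:03d}").mkdir(exist_ok=True)
    return top
```

Whatever convention is chosen, it should be written down (for example in the readme file) so that other group members can interpret the names.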
DMP - Data access (1/3)
- UT data policy?
- Funder requirements?
- Requirements other parties? Contracts?
- Open Access required? Possible? Dutch Personal Data Protection Act (UT Data Protection Officer)
DMP - Data access (2/3)
Data access                      M: drive          P: drive             DataverseNL  SURFdrive  Commercial cloud
                                 (home directory)  (group permissions)                          (Dropbox, etc.)
internal group/organization      no                yes                  yes          yes        yes
external group/organization      no                no                   yes          yes        yes
on request                       no                no                   yes          no         no
view/download rights management  no                yes                  yes          yes        yes
edit rights management           no                yes                  yes          yes        yes
collaborating on data            no                no                   yes          yes        yes
DMP - Data access (3/3)
DataverseNL
dynamic data sets (file version control)
static data sets (release with persistent id)
access rights management
not for privacy sensitive data!
DMP - Data sharing and reuse (1/1)
Why share your data?
Replication / verification
Promote your research
Enable new discoveries (reuse)
"Open where possible, protected where needed"
See NWO policy http://www.nwo.nl/en/policies/open+science
After research: public, linked to publication(s) > DataverseNL, data centres
DMP - Data preservation and archiving (1/2)
UT data policy
Preferably during the research, but not later than 1 month after finishing the research, the research data are archived in a trusted repository (e.g. DANS or 3TU.Datacentrum). Taking legal regulations and any third party contractual conditions into account, the research data are preferably publicly available. This covers at least the research data that form the basis of publications about the research, but can also comprise the full set of raw and/or edited research data.
After the research all durably stored research data and the publications based on those data are linked. This is at least the case for PhD dissertations.
DMP - Data preservation and archiving (2/2)
Data centres:
3TU.Datacentrum
DANS
List of data repositories: Databib or Data repositories