Introductory Data Management – Developing, Archiving, Sharing Data for Current/Future Use
IFAS Global Feed the Future Presentation for Dr. Caroline Staub’s class
9:30 am – 11:00 am, Tuesday, August 6, 2019
George A. Smathers Libraries, Library West, 211
Table of Contents
1. Basic data management concepts/terms
2. Fundamental data management plan components
3. UF resources for data management and archiving
4. U.S. Federal agencies public access and data management mandates
5. Data repositories
   • Discipline-specific and General
   • Institutional
6. DMPTool hands-on training
7. References
Basic data management concepts/terms
• Data Lifecycle
• Metadata
• Data Curation
• Data Archiving
• Data Preservation
• Data Repository
• Digital Object Identifier (DOI)
• Open Access
• Open Science
• ORCID - https://orcid.org/
Figure 1: Digital Curation Centre (DCC) Curation Lifecycle Model (DCC, 2007)
Fundamental data management plan components (DCC, 2013)
Administrative Data
• ID (funder or institution)
• Funder (e.g. USAID, USDA-NIFA)
• Grant Reference #
• Project Name
• Project Description
• PI/Researcher
• Researcher ID (e.g. ORCID)
• Date of 1st version, last update
Data Collection
• What data will you create or collect?
• What type, format, and volume?
   • Text, variant call format (VCF), >20GB
   • Quantitative, qualitative
• How will the data be collected or created?
• What standards or methodologies will you use?
• How will you name and structure your files and folders?
Fundamental data management plan components (1 of 2) – Name/Structure Files and Folders
• Develop naming conventions
• Always include the same information (e.g. date, time, location)
• Retain the order of information (YYYYMMDD, not MMDDYYYY)
• Document standard file naming (e.g. codebook)
• Be descriptive so others can understand the meaning
• Unique identifiers (e.g. Project Name or Grant Number in folder name)
• Date (also embedded in file properties)
• Use application-specific codes in three-letter file extensions (e.g. MOV, TIF, XML)
• Limit the depth of sub-folders to no more than two sub-folders
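The naming conventions above can be sketched in code. This is a minimal illustration, not a prescribed UF or DCC tool: the function name `build_file_name`, the order of the name parts, and the sample grant number are all assumptions made for the example.

```python
import re
from datetime import date


def build_file_name(project_id: str, description: str, collected: date,
                    location: str, extension: str) -> str:
    """Build a file name with a fixed order: project, date, location, description."""

    def clean(part: str) -> str:
        # Keep ASCII letters and digits; replace every other run of
        # characters (spaces, punctuation, non-ASCII) with one underscore.
        return re.sub(r"[^A-Za-z0-9]+", "_", part).strip("_")

    parts = [
        clean(project_id),              # unique identifier (hypothetical grant number)
        collected.strftime("%Y%m%d"),   # YYYYMMDD sorts chronologically
        clean(location).lower(),
        clean(description).lower(),
    ]
    return "_".join(parts) + "." + extension.lstrip(".")


print(build_file_name("GRANT123", "soil samples", date(2019, 8, 6), "Gainesville", "csv"))
# prints GRANT123_20190806_gainesville_soil_samples.csv
```

Because every name is built by the same function, the same information always appears in the same order, which is what the codebook should document.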
Fundamental data management plan components (2 of 2) – Name/Structure Files and Folders
• Use sequential numbered system (e.g. v1, v2, v3, etc.)
• DO NOT use confusing labels (e.g. revision, final, final2, etc.)
• Avoid spaces (use underscore)
• Use ASCII Characters only
• Document, share, evaluate
• Separate classes of products: raw data, derived data, graphics, code, documents, etc.
• Consider version control software (e.g. Git, GitLab, GitHub, etc.)
• Record all changes
• Discard obsolete versions (but never the raw copy)
• Make backups (store in three locations)
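As a rough sketch of two of the points above (separate folders per product class, and sequential version numbers instead of labels like "final2"), the helpers below could be used. The folder names and function names are assumptions chosen for illustration, not a required layout.

```python
import re
from pathlib import Path

# One folder per class of product (names are illustrative).
PRODUCT_FOLDERS = ["raw_data", "derived_data", "graphics", "code", "documents"]


def create_project_skeleton(root: Path) -> list[Path]:
    """Create one folder per product class under root and return the paths."""
    folders = [root / name for name in PRODUCT_FOLDERS]
    for folder in folders:
        folder.mkdir(parents=True, exist_ok=True)
    return folders


def next_version_name(existing: list[str], stem: str, extension: str) -> str:
    """Return stem_vN.extension, where N is one above the highest existing version."""
    pattern = re.compile(rf"^{re.escape(stem)}_v(\d+)\.{re.escape(extension)}$")
    versions = [int(m.group(1)) for name in existing if (m := pattern.match(name))]
    return f"{stem}_v{max(versions, default=0) + 1}.{extension}"
```

For example, `next_version_name(["analysis_v1.txt", "analysis_v2.txt"], "analysis", "txt")` yields `analysis_v3.txt`; a file named `analysis_final.txt` is ignored because it does not follow the numbered scheme. Version control software such as Git remains the more complete way to record all changes.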
Figure 2: Data organization (Benedict, 2019)
Fundamental data management plan components (DCC, 2013)
Documentation and Metadata
• What documentation and metadata will accompany the data?
• What information is needed for the data to be read and interpreted in the future?
• How will you capture/create the documentation and metadata?
• What metadata standards will you use and why?
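One simple way to capture documentation alongside a dataset is a small machine-readable "sidecar" file. The sketch below is only an illustration: the field names are hypothetical and do not follow any particular metadata standard, so a real project should map them to the standard its discipline or repository requires.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DatasetRecord:
    """Descriptive fields for one dataset (hypothetical, not a formal standard)."""
    title: str
    creator: str
    creator_orcid: str
    date_created: str   # ISO 8601 date, e.g. "2019-08-06"
    description: str
    file_format: str
    license: str
    keywords: list[str] = field(default_factory=list)


def to_sidecar_json(record: DatasetRecord) -> str:
    """Serialize the record as indented JSON for a sidecar file."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


record = DatasetRecord(
    title="Example soil samples",
    creator="Example Researcher",
    creator_orcid="0000-0000-0000-0000",  # placeholder, not a real ORCID iD
    date_created="2019-08-06",
    description="Illustrative record only.",
    file_format="CSV",
    license="CC BY 4.0",
    keywords=["soil", "example"],
)
print(to_sidecar_json(record))
```

Saving this output next to the data file (for example as a `.json` file with the same base name) keeps the information needed to read and interpret the data together with the data itself.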
Ethical, Legal, and Regulatory Compliance
• How will you manage any ethical issues?
• Have you gained consent for data preservation and sharing?
• How will you manage copyright and Intellectual Property Rights (IPR) issues?
• Who owns the data?
• How will the data be licensed for reuse?
Fundamental data management plan components (DCC, 2013)
Storage and Backup
• How will data be stored and backed up during research (e.g. HiPerGator)?
• Do you have sufficient storage or will you need to include charges for additional services?
• How will you manage access and security?
• What are the risks to data security and who will manage data security risks?
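A backup is only useful if the copy matches the original. The sketch below copies a file to several destination folders and verifies each copy with a SHA-256 checksum. The function names and the destination folders are assumptions for illustration; they do not describe how HiPerGator or any UF service works.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, destinations: list[Path]) -> dict[str, bool]:
    """Copy source into each destination folder.

    Returns a mapping from each copy's path to True if its checksum
    matches the original, False otherwise.
    """
    expected = sha256_of(source)
    results = {}
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        copy = Path(shutil.copy2(source, folder))  # copy2 also keeps timestamps
        results[str(copy)] = sha256_of(copy) == expected
    return results
```

Recording the checksum of the raw file in the project documentation also lets future users confirm that the file they received is unchanged.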
Selection & Preservation
• Which data should be retained, shared, and/or preserved?
• What data must be retained/destroyed for contractual, legal, or regulatory purposes?
• What is the long-term preservation plan for the dataset?
• Where and in which repository or archive will the data be held?
Fundamental data management plan components (DCC, 2013)
Data Sharing
• How will you share the data?
• How will potential users find out about your data?
• Are there any required data sharing restrictions?
• What action will you take to overcome or minimize restrictions (e.g. anonymize, de-identify)?
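As one possible illustration of de-identification before sharing, the sketch below drops direct identifiers and replaces a participant ID with a keyed hash so records can still be linked. The field names, function names, and 16-character pseudonym length are assumptions. Removing direct identifiers alone may not make data anonymous, so any real procedure should follow the project's ethics approval.

```python
import hashlib
import hmac


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Return a stable 16-character pseudonym using a keyed hash (HMAC-SHA-256)."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def de_identify(rows: list[dict], direct_identifiers: set[str],
                link_field: str, secret_key: bytes) -> list[dict]:
    """Drop direct identifiers and pseudonymize the linking field in each row."""
    cleaned = []
    for row in rows:
        kept = {k: v for k, v in row.items() if k not in direct_identifiers}
        if link_field in row:
            kept[link_field] = pseudonymize(str(row[link_field]), secret_key)
        cleaned.append(kept)
    return cleaned


rows = [{"participant_id": "P001", "name": "Example Name", "yield_kg": 12}]
shared = de_identify(rows, {"name"}, "participant_id", b"keep-this-key-private")
print(sorted(shared[0]))
# prints ['participant_id', 'yield_kg']
```

The secret key must be stored separately from the shared data; anyone holding it could recompute the pseudonyms from known IDs.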
Responsibilities & Resources
• Who will be responsible for data management?
• What resources will you require to deliver your plan?
• Is additional specialist expertise (or training for existing staff) required?
Fundamental data management plan components (Lorenzen, et al., 2016)
| Objective | Output name | Output description | Output (type, format) |
| Obj. 1 | Synthesized datasets | Habitat; Fisheries independent; Fisheries dependent | Habitat (derived, geospatial); Fisheries (derived, tabular) |
| Obj. 2 | Hierarchical analyses of spatial recruitment and angler effort | Reports; Instructions for analyses; Data analyses code; Geospatial images | Reports and Instructions (text, PDF/XML); Code (text, .txt); Geospatial (TIFF and GIS) |
| Obj. 3 | Socio-ecological regional system model analyses | Reports; Instructions for analyses; Data analyses code | Reports and Instructions (text, PDF/XML); Code (text, .txt) |
| Obj. 4 | Restoration management strategy evaluation (MSE) | Simulation results; Reports; Instructions for analyses; Data analyses code | Simulation (simulated data, CSV); Reports and Instructions (text, PDF/XML); Code (text, .txt) |
Table 1: Description of project data output and products in a revised DMP (awarded)
Fundamental data management plan components
Figure 3: Description of key components and processes in a fundamental data management plan
UF resources for data management and archiving
UF DropBox
IR@UF
RedCap
ResearchVault
HiPerGator
Figure 4: Select UF Resources for Data Management and Archiving
U.S. Federal agencies public access mandates
Public access mandate established by Executive Order of President Obama on May 9, 2013
Open, Public, Electronic & Necessary (OPEN) Government Data Act became law in January, 2019
Open and machine-readable is the new default for all government data
Open Access to fulfill public access mandates
Research output (publications, reports) can be published in traditional subscription journals, in Open Access (OA) journals, or as Open Access articles in hybrid journals. Under the U.S. public access mandate, federally funded research must be made available free to readers within 12 months of publication.
• Traditional pay-to-read models (subscription journals)
• Hybrid journals (subscription journals that also include OA articles)
• Open Access journals (free to readers, regardless of affiliation)
UF Open Access Publishing (UFOAP) Fund
UF provides financial support to help UF scholars offset costs incurred with Open Access publishing where other funding sources do not exist. OA publishing fees, up to $1,500 per year for Article Processing Charges (APCs), are eligible for payment or reimbursement.
Funds are awarded on a first-come, first-served basis to those who meet eligibility requirements. UF scholars who are first or corresponding author are eligible to apply within 60 days of receiving article acceptance from the publisher. Articles must be peer-reviewed research or systematic review articles.
For more information and detailed eligibility guidelines, visit https://cms.uflib.ufl.edu/ScholComm/UFOAPF
USAID Repositories
For Publications:
Figure 5: Development Experience Clearinghouse (DEC) - https://dec.usaid.gov
For Datasets and Data Assets:
Figure 6: Development Data Library (DDL) - https://data.usaid.gov/
Data repositories
• Discipline-specific repositories
   • Examples: Development Data Library (USAID), Climate Model Data Service (NASA), IFPRI E-brary
   • Registry of Research Data Repositories (https://www.re3data.org) is a great resource for options
• General data repositories
   • Examples: Zenodo (https://zenodo.org/), Dryad (https://datadryad.org/), Figshare (https://figshare.com/)
• Institutional repositories (IRs)
   • Example: the IR@UF (https://ufdc.ufl.edu/ufir)
   • Can be an additional layer of preservation, access, and discoverability for your work
   • For more information and instructions on how to submit, visit http://guides.uflib.ufl.edu/ufir/home.
Choosing a data repository
• Technical Specs
   • Are there size limits?
   • What types of materials can be uploaded?
• Cost
   • Are there charges to use the repository?
• Discoverability
   • Are there options for access (open, closed, restricted)? What is required for your project?
   • How easy is it to find items in the repository? How about outside the repository?
• Other Considerations
   • Is a persistent identifier (like a DOI) needed for your materials? Is that service offered?
   • Is your project collaborative? Will others need to upload files or add notes to the collection?
   • Is deposit with a specific repository required (e.g., as a condition of a grant or award)?
      • For USAID awards, materials must be submitted to USAID repositories.
      • Datasets should be submitted to the Development Data Library (https://data.usaid.gov/) and publications should be submitted to the Development Experience Clearinghouse (https://dec.usaid.gov/dec/home/Default.aspx).
      • More details available from http://libguides.gatech.edu/USAID_public_access_policy.
DMPTool – hands-on training
DMPTool - https://dmptool.org/
DMPTool – Sign in instructions (FREE to the public)
1. Navigate to https://dmptool.org (recommended browsers: Chrome, Firefox; IE not functioning)
2. Click Sign in in the upper-right corner
3. Select the most relevant Sign in option
   a. Option 1: Your institution (if your institution is affiliated with DMPTool) - enter University of Florida
   b. Option 2: Email address (if your institution is not affiliated with DMPTool)
   c. Option 3: Create account with email address (if not affiliated and need an account)
4. Click the Next button
5. Log in with your GatorLink credentials
6. Click Create New DMP
7. Enter metadata and ORCID (if you do not have an ORCID, create one; see the next slide)
Figure 7: Data Management Plan Tool (DMPTool)
DMPTool – hands-on training
ORCID - https://orcid.org/
ORCID – Sign in instructions (FREE to the public)
1. Navigate to https://orcid.org/
2. Click Sign in in the upper-right corner
3. Select the most relevant Sign in option under Sign into ORCID or Register Now
   a. Option 1: Personal account (current user)
   b. Option 2: Institutional account (select if affiliated with University of Florida)
   c. Option 3: Register now (create new account)
4. Click the Next button
5. Click the pencil icon to develop your profile
Figure 8: Open Researcher and Contributor ID (ORCID)
References
• Benedict, K. (2019). Data Management Skills & Training Resources. Guest lecture presented to the INSC590 Problems: Information Science – Data Management graduate course at the University of Tennessee, Knoxville School of Information Sciences. March 27, 2019.
• DCC. (2013). Checklist for a Data Management Plan. V.4.0. Edinburgh: Digital Curation Centre. Available online: https://bit.ly/1Z2Gbqk.
• DCC. (2017). DCC Curation Lifecycle Model. Accessed August 5, 2019 from http://www.dcc.ac.uk/resources/curation-lifecycle-model.
• Executive Order of President Obama for open and machine-readable government data, 5/9/2013. Available from the National Archives at https://bit.ly/2n5YBLz.
• FOSTER. (2019). Open Science Definition. Accessed August 5, 2019 from https://bit.ly/2lJEnTO.
• Lorenzen, K., Camp, E., & Dutka-Gianelli, J. (2016). Synthesizing spatial dynamics of recreational fish and fisheries to inform restoration strategies: red drum in the Gulf of Mexico. Revised Data Management Plan. http://ufdc.ufl.edu/AA00014835/00088.
• NCSU Libraries. (n.d.). Formats & Data Organization. Adapted from Making Data Management Easier by the University of Virginia Libraries and Storing Data by the University of Minnesota Libraries. Accessed August 6, 2019 from http://www.lib.ncsu.edu/data-management/formats.
• OPEN Government Data Act signed into law by President Trump, 1/14/2019. News release from SPARC at https://bit.ly/2T6a0ZE.
• RedCap. (2019). Research Electronic Data Capture (RedCap). Accessed August 5, 2019 from https://bit.ly/2MExnII.
• UF Research Computing. (2019). Service rates. https://www.rc.ufl.edu/services/rates/service/.
• UF Research Computing. (2019). UF Apps for Research. https://www.rc.ufl.edu/services/uf-apps-for-research/.
• USAID. (2019). Data Resources. Accessed August 5, 2019 from https://www.usaid.gov/results-and-data/data-resources.
• Whitmire et al. (2015). A table summarizing the Federal public access policies resulting from the US Office of Science and Technology Policy Memorandum of February 2013. figshare. http://dx.doi.org/10.6084/m9.figshare.1372041. Retrieved August 6, 2019 from http://tinyurl.com/hkgqytu.
• Wilkinson, M. D., et al. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data 3, Article number: 160018. https://www.nature.com/articles/sdata201618.
• Zenodo. (2019). Frequently Asked Questions. Accessed August 5, 2019 from http://help.zenodo.org/.
Thank you
Questions/comments?
Contact:
Chelsea Johnston, Scholarly Repository Librarian, [email protected]
Suzanne Stapleton, Agricultural Sciences & Digital Initiatives Librarian, [email protected]
Plato Smith, Data Management Librarian, [email protected]
Data Management and Curation Working Group