Introductory Data Management – Developing, Archiving, Sharing Data for Current/Future Use

IFAS Global Feed the Future Presentation for Dr. Caroline Staub’s class, 9:30 am – 11:00 am, Tuesday, August 6, 2019, George A. Smathers Libraries, Library West, 211




Table of Contents

1. Basic data management concepts/terms
2. Fundamental data management plan components
3. UF resources for data management and archiving
4. U.S. Federal agencies public access and data management mandates
5. Data repositories
  • Discipline-specific and General
  • Institutional
6. DMPTool hands-on training
7. References


Basic data management concepts/terms

• Data Lifecycle
• Metadata
• Data Curation
• Data Archiving
• Data Preservation
• Data Repository
• Digital Object Identifier (DOI)
• Open Access
• Open Science
• ORCID - https://orcid.org/

Figure 1: Digital Curation Centre (DCC) Curation Lifecycle Model (DCC, 2007)
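Several of these terms meet in the DOI: a persistent identifier that repositories mint for datasets and that resolves through the https://doi.org/ proxy. DOIs circulate in several forms (bare, "doi:"-prefixed, or as older http://dx.doi.org/ links such as the figshare citation in the references). The sketch below normalizes those forms to one resolvable URL. It is a minimal illustration, not an official validator; the accepted shape ("10.", a 4 to 9 digit registrant code, a slash, and a non-empty suffix) is a simplifying assumption.

```python
import re

# Resolver prefixes and labels that often precede a DOI in citations.
_DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)
# Simplified DOI shape: "10." + registrant code, "/", then a suffix without whitespace.
_DOI_SHAPE = re.compile(r"^10\.\d{4,9}/\S+$")


def normalize_doi(raw: str) -> str:
    """Return the canonical https://doi.org/ URL for a DOI given in any common form."""
    doi = _DOI_PREFIX.sub("", raw.strip())
    if not _DOI_SHAPE.match(doi):
        raise ValueError(f"not a recognizable DOI: {raw!r}")
    return "https://doi.org/" + doi


print(normalize_doi("http://dx.doi.org/10.6084/m9.figshare.1372041"))
# → https://doi.org/10.6084/m9.figshare.1372041
```

Citing the https://doi.org/ form keeps links working even if the hosting repository moves the landing page.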

Presentation Notes:
"Open Science is the practice of science in such a way that others can collaborate and contribute, where research data, lab notes and other research processes are freely available, under terms that enable reuse, redistribution and reproduction of the research and its underlying data and methods." (FOSTER, 2019: https://www.fosteropenscience.eu/foster-taxonomy/open-science-definition)

Fundamental data management plan components (DCC, 2013)

Administrative Data
• ID (funder or institution)
• Funder (e.g., USAID, USDA-NIFA)
• Grant reference number
• Project name
• Project description
• PI/Researcher
• Researcher ID (e.g., ORCID)
• Date of first version and last update

Data Collection
• What data will you create or collect?
• What type, format, and volume?
  • Text, variant call format (VCF), >20 GB
  • Quantitative, qualitative
• How will the data be collected or created?
• What standards or methodologies will you use?
• How will you name and structure your files and folders?
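Keeping the administrative and data-collection answers in a small structured file alongside the prose plan makes them easier to update (note the "last update" field) and to reuse. The sketch below shows one way to do that in Python. The class and field names are illustrative assumptions, not a DMPTool or DCC schema, and the values are placeholders.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date
from typing import List


@dataclass
class DatasetPlan:
    description: str       # what data will be created or collected
    data_type: str         # e.g. "quantitative" or "qualitative"
    file_format: str       # e.g. "VCF", "CSV", "text"
    est_volume_gb: float   # expected volume, in gigabytes


@dataclass
class DmpAdmin:
    funder: str
    grant_reference: str
    project_name: str
    project_description: str
    pi_name: str
    pi_orcid: str
    first_version: date
    last_update: date
    datasets: List[DatasetPlan] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() also converts the nested DatasetPlan entries; default=str writes dates as YYYY-MM-DD
        return json.dumps(asdict(self), default=str, indent=2)


plan = DmpAdmin(
    funder="USDA-NIFA",
    grant_reference="GRANT-0000",            # hypothetical placeholder
    project_name="Example Project",          # hypothetical placeholder
    project_description="Illustrative record only.",
    pi_name="A. Researcher",                 # hypothetical placeholder
    pi_orcid="0000-0002-1825-0097",          # ORCID's fictitious sample iD
    first_version=date(2019, 8, 6),
    last_update=date(2019, 8, 6),
    datasets=[DatasetPlan("Variant calls", "quantitative", "VCF", 25.0)],
)
print(plan.to_json())
```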


Fundamental data management plan components (1 of 2) – Name/Structure Files and Folders

• Develop naming conventions
• Always include the same information (e.g., date, time, location)
• Retain the order of information (YYYYMMDD, not MMDDYYYY)
• Document standard file naming (e.g., in a codebook)
• Be descriptive so others can understand the meaning
• Use unique identifiers (e.g., project name or grant number in the folder name)
• Include the date (also embedded in file properties)
• Use application-specific codes in three-letter file extensions (e.g., MOV, TIF, XML)
• Limit the depth of sub-folders to no more than two

Presentation Notes:
ASCII stands for American Standard Code for Information Interchange. The ASCII character table, including descriptions of the first 32 characters, is available at https://www.cs.cmu.edu/~pattis/15-1XX/common/handouts/ascii.html. ASCII was originally designed for use with teletypes, so those descriptions are somewhat obscure and the characters are frequently not used as intended. Java actually uses Unicode, which includes ASCII and characters from languages around the world.

Suggested file name components: project or research data name; conditions (lab instrument, solvent, temperature, etc.); run of experiment (sequential); date (also embedded in file properties). Use application-specific codes in three-letter file extensions (e.g., MOV, TIF, XML), and limit the depth of sub-folders to no more than two.
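The conventions above are easiest to follow when names are generated rather than typed. A minimal sketch, assuming a hypothetical project_condition_runNN_YYYYMMDD.ext pattern built from the components in the notes; the pattern, the two-digit run number, and the lowercase extension are illustrative choices to adapt and record in your codebook.

```python
import re
import unicodedata
from datetime import date


def clean_part(part: str) -> str:
    """Make one name component ASCII-only, with underscores in place of spaces."""
    # NFKD splits accented letters (e.g. "é" -> "e" + accent); encoding to ASCII drops the accents.
    ascii_part = unicodedata.normalize("NFKD", part).encode("ascii", "ignore").decode("ascii")
    ascii_part = re.sub(r"\s+", "_", ascii_part.strip())
    # Keep only letters, digits, hyphens and underscores.
    return re.sub(r"[^A-Za-z0-9_-]", "", ascii_part)


def build_filename(project: str, condition: str, run: int, day: date, ext: str) -> str:
    """Assemble project_condition_runNN_YYYYMMDD.ext, always in the same order."""
    parts = [clean_part(project), clean_part(condition), f"run{run:02d}", day.strftime("%Y%m%d")]
    return "_".join(parts) + "." + ext.strip(".").lower()


print(build_filename("Red Drum", "Salinity 30ppt", 3, date(2019, 8, 6), "CSV"))
# → Red_Drum_Salinity_30ppt_run03_20190806.csv
```

Putting the date in YYYYMMDD form means an alphabetical sort of file names is also a chronological sort.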

Fundamental data management plan components (2 of 2) – Name/Structure Files and Folders
• Use a sequential numbering system (e.g., v1, v2, v3, etc.)
• DO NOT use confusing labels (e.g., revision, final, final2, etc.)
• Avoid spaces (use underscores)
• Use ASCII characters only
• Document, share, evaluate
• Separate classes of products: raw data, derived data, graphics, code, documents, etc.
• Consider version control software (e.g., Git, with hosting services such as GitLab or GitHub)
• Record all changes
• Discard obsolete versions (but never the raw copy)
• Make backups (store in three locations)

Figure 2: Data organization (Benedict, 2019)

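If you number versions by hand rather than with Git, a small helper can pick the next number so labels like "final" or "final2" never creep in. A sketch, assuming the stem_vN.ext pattern from the list above (v1, v2, v3); the function name is hypothetical.

```python
import re
from typing import Iterable

# Matches names such as "survey_v12.csv": a stem, "_v", a version number, and an extension.
VERSION_RE = re.compile(r"^(?P<stem>.+)_v(?P<num>\d+)\.(?P<ext>[A-Za-z0-9]+)$")


def next_version(existing: Iterable[str], stem: str, ext: str) -> str:
    """Return stem_vN.ext, where N is one more than the highest version already present."""
    highest = 0
    for name in existing:
        m = VERSION_RE.match(name)
        if m and m.group("stem") == stem and m.group("ext") == ext:
            highest = max(highest, int(m.group("num")))
    return f"{stem}_v{highest + 1}.{ext}"


files = ["survey_v1.csv", "survey_v2.csv", "survey_final.csv", "notes_v7.txt"]
print(next_version(files, "survey", "csv"))
# → survey_v3.csv
```

Version numbers are compared as integers, so v10 correctly follows v9 (a plain alphabetical sort would put v10 before v9).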

Fundamental data management plan components (DCC, 2013)

Documentation and Metadata
• What documentation and metadata will accompany the data?
• What information is needed for the data to be read and interpreted in the future?
• How will you capture/create the documentation and metadata?
• What metadata standards will you use and why?

Ethical, Legal, and Regulatory Compliance
• How will you manage any ethical issues?
• Have you gained consent for data preservation and sharing?
• How will you manage copyright and intellectual property rights (IPR) issues?
• Who owns the data?
• How will the data be licensed for reuse?
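Two of the questions above (what future readers need to interpret the data, and how it will be licensed) can be answered partly by a data dictionary (codebook) saved alongside the data. A minimal sketch, assuming tabular CSV data and a hypothetical dictionary of column descriptions: it flags undocumented columns and writes a small JSON record. This is not a formal metadata standard (a discipline standard should take precedence), and the CC-BY-4.0 license is only an example.

```python
import csv
import io
import json
from typing import Dict, List


def undocumented_columns(csv_text: str, codebook: Dict[str, str]) -> List[str]:
    """Return header columns that have no (or an empty) description in the codebook."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return [col for col in header if not codebook.get(col, "").strip()]


data = "site_id,date,salinity_ppt,notes\nS01,2019-08-06,30.2,\n"
codebook = {
    "site_id": "Sampling site code",
    "date": "Collection date (YYYY-MM-DD)",
    "salinity_ppt": "Salinity in parts per thousand",
}
print(undocumented_columns(data, codebook))
# → ['notes']

# A minimal metadata "sidecar" to keep next to the data file; field names are illustrative.
sidecar = {"title": "Example salinity survey", "license": "CC-BY-4.0", "codebook": codebook}
print(json.dumps(sidecar, indent=2))
```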


Fundamental data management plan components (DCC, 2013)

Storage and Backup
• How will data be stored and backed up during research (e.g., HiPerGator)?
• Do you have sufficient storage, or will you need to include charges for additional services?
• How will you manage access and security?
• What are the risks to data security, and who will manage data security risks?

Selection & Preservation
• Which data should be retained, shared, and/or preserved?
• What data must be retained/destroyed for contractual, legal, or regulatory purposes?
• What is the long-term preservation plan for the dataset?
• Where (in which repository or archive) will the data be held?
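The advice to keep backups in three locations pairs naturally with fixity checking: a recorded checksum lets you confirm later that a copied or preserved file has not changed. A minimal sketch using SHA-256 from Python's standard library; the function names are hypothetical, and the destination directories are assumptions (in practice, separate devices or services rather than folders on one disk).

```python
import hashlib
import shutil
from pathlib import Path
from typing import List


def sha256sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_with_fixity(src: Path, destinations: List[Path]) -> str:
    """Copy src into each destination directory and verify every copy against src's checksum."""
    expected = sha256sum(src)
    for dest_dir in destinations:
        dest_dir.mkdir(parents=True, exist_ok=True)
        copied = Path(shutil.copy2(src, dest_dir / src.name))
        if sha256sum(copied) != expected:
            raise OSError(f"checksum mismatch: {copied}")
    return expected  # record this value to re-check the copies later
```

For example, backup_with_fixity(Path("raw_survey.csv"), [Path("/mnt/lab_share/backup"), Path("/media/external_drive/backup")]) would copy and verify a raw file in two more places; the paths are hypothetical.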


Fundamental data management plan components (DCC, 2013)

Data Sharing
• How will you share the data?
• How will potential users find out about your data?
• Are there any required data sharing restrictions?
• What action will you take to overcome or minimize restrictions (e.g., anonymize, de-identify)?

Responsibilities & Resources
• Who will be responsible for data management?
• What resources will you require to deliver your plan?
• Is additional specialist expertise (or training for existing staff) required?
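One common technique for the "de-identify" step is pseudonymization: replacing direct identifiers with consistent codes so records can still be linked. A sketch using a keyed hash (HMAC-SHA256); the key, record fields, and 16-character code length are illustrative assumptions. This is pseudonymization, not full anonymization: indirect identifiers (here, county) may still allow re-identification, so consent terms and sharing restrictions still apply.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes, length: int = 16) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256), truncated to `length` hex characters."""
    normalized = identifier.strip().lower().encode("utf-8")  # "Jane Doe " and "jane doe" map alike
    return hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()[:length]


key = b"store-this-key-separately"  # hypothetical; never publish it with the data
records = [{"name": "Jane Doe", "county": "Alachua", "yield_kg": 120}]
shared = [{**r, "name": pseudonymize(r["name"], key)} for r in records]
print(shared)
```

A keyed hash is used instead of a plain hash because common names could otherwise be recovered by hashing a list of guesses.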


Fundamental data management plan components (Lorenzen, et al., 2016)

Obj. 1
  • Output name: Synthesized datasets
  • Output description: Habitat; Fisheries independent; Fisheries dependent
  • Output (type, format): Habitat (derived, geospatial); Fisheries (derived, tabular)

Obj. 2
  • Output name: Hierarchical analyses of spatial recruitment and angler effort
  • Output description: Reports; Instructions for analyses; Data analyses code; Geospatial images
  • Output (type, format): Reports and Instructions (text, PDF/XML); Code (text, .txt); Geospatial (TIFF and GIS)

Obj. 3
  • Output name: Socio-ecological regional system model analyses
  • Output description: Reports; Instructions for analyses; Data analyses code
  • Output (type, format): Reports and Instructions (text, PDF/XML); Code (text, .txt)

Obj. 4
  • Output name: Restoration management strategy evaluation (MSE)
  • Output description: Simulation results; Reports; Instructions for analyses; Data analyses code
  • Output (type, format): Simulation (simulated data, CSV); Reports and Instructions (text, PDF/XML); Code (text, .txt)

Table 1: Description of project data output and products in a revised DMP (awarded)


Fundamental data management plan components

Figure 3: Description of key components and processes in a fundamental data management plan


UF resources for data management and archiving

UF DropBox

IR@UF

REDCap

ResearchVault

HiPerGator

Figure 4: Select UF Resources for Data Management and Archiving

Presentation Notes:
ResVault is a secure computing environment where scientists and collaborators can conduct research on restricted and confidential data. The software portion, tiCrypt, was developed by Tera Insights in collaboration with the University of Florida (UF) to address the specific needs of researchers working with restricted data, specifically projects requiring compliance with NIST 800-171 and NIST 800-53 standards. REDCap (Research Electronic Data Capture) is a secure, Web-based application designed to support traditional case report form data capture for your research studies.

U.S. Federal agencies public access mandates

Public access mandate established by the White House Office of Science and Technology Policy (OSTP) memorandum of February 2013 and by Executive Order of President Obama on May 9, 2013

Open, Public, Electronic and Necessary (OPEN) Government Data Act became law in January 2019

Open and machine-readable is the new default for all government data

Presentation Notes:
Federal agencies' data management mandates (e.g., USAID) (Suzanne): The public access mandate was established by the White House Office of Science and Technology Policy (OSTP) memorandum of February 2013 for all federal agencies with more than $100 million in annual research and development expenditures; President Obama's Executive Order of May 9, 2013 then made open, machine-readable data the default for government information. The OPEN Government Data Act was signed into law by President Trump on January 14, 2019. It codifies former President Obama’s Executive Order, making “open and machine-readable” the default for all government data (within security and privacy limits).

Open Access to fulfill public access mandates

Research outputs (publications, reports) can be published in traditional subscription journals, in Open Access (OA) journals, or as Open Access articles in hybrid journals. Under the U.S. public access mandate, publications resulting from federally funded research must be made freely available to readers within 12 months of publication.

• Traditional pay-to-read models (subscription journals)
• Hybrid journals (subscription journals that also include OA articles)
• Open Access journals (free to readers, regardless of affiliation)
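The 12-month rule above is simple date arithmetic with one edge case: month ends. A minimal sketch, assuming the embargo is counted in calendar months from the publication date and clamped to the last day of shorter months (e.g., 29 February); the function names are hypothetical, and your agency's policy determines how the period is actually counted.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping to the last day of shorter months."""
    index = d.month - 1 + months
    year, month = d.year + index // 12, index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def public_access_deadline(published: date, embargo_months: int = 12) -> date:
    """Latest date by which the article should be freely available (12-month default)."""
    return add_months(published, embargo_months)


print(public_access_deadline(date(2019, 8, 6)))   # → 2020-08-06
print(public_access_deadline(date(2020, 2, 29)))  # → 2021-02-28
```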


UF Open Access Publishing (UFOAP) Fund

UF provides financial support to help UF scholars offset the costs of Open Access publishing when other funding sources are not available. OA publishing fees (Article Processing Charges, or APCs) of up to $1,500 per year are eligible for payment or reimbursement.

Funds are awarded on a first-come, first-served basis to applicants who meet the eligibility requirements. UF scholars who are the first or corresponding author may apply within 60 days of receiving the publisher's notice of article acceptance. Articles must be peer-reviewed research articles or systematic reviews.

For more information and detailed eligibility guidelines, visit https://cms.uflib.ufl.edu/ScholComm/UFOAPF
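The eligibility rules above combine into a short check. A rough sketch, assuming the $1,500 limit applies per scholar per year and that the 60-day window is counted from the acceptance date; the function name is hypothetical, it does not model the first-come, first-served availability of funds, and the official guidelines at the link above take precedence.

```python
from datetime import date, timedelta

ANNUAL_CAP = 1500.00                 # "up to $1,500 per year" (assumed per scholar)
APPLICATION_WINDOW = timedelta(days=60)


def estimate_ufoap_award(
    apc: float,
    awarded_this_year: float,
    accepted_on: date,
    applied_on: date,
    first_or_corresponding_author: bool,
    peer_reviewed_article: bool,
) -> float:
    """Return the estimated payable amount, or 0.0 if a stated criterion is not met."""
    in_window = accepted_on <= applied_on <= accepted_on + APPLICATION_WINDOW
    if not (first_or_corresponding_author and peer_reviewed_article and in_window):
        return 0.0
    # Pay the APC up to whatever remains of this year's cap.
    return min(apc, max(ANNUAL_CAP - awarded_this_year, 0.0))


print(estimate_ufoap_award(2000.0, 0.0, date(2019, 6, 1), date(2019, 7, 15), True, True))  # → 1500.0
```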


USAID Repositories

For Publications:

Figure 5: Development Experience Clearinghouse (DEC) - https://dec.usaid.gov

For Datasets and Data Assets:

Figure 6: Development Data Library (DDL) - https://data.usaid.gov/

Presentation Notes:
USAID Repositories (Suzanne): The Development Experience Clearinghouse (DEC) is for publications, reports, etc.; the Development Data Library (DDL) is for data assets and datasets. Both sites contain user guides and video tutorials. You need to register to contribute to either site.

Data repositories

• Discipline-specific repositories
  • Examples: Development Data Library (USAID), Climate Model Data Service (NASA), IFPRI E-brary
  • The Registry of Research Data Repositories (https://www.re3data.org) is a great resource for options
• General data repositories
  • Examples: Zenodo (https://zenodo.org/), Dryad (https://datadryad.org/), Figshare (https://figshare.com/)
• Institutional repositories (IRs)
  • Example: the IR@UF (https://ufdc.ufl.edu/ufir)
  • Can be an additional layer of preservation, access, and discoverability for your work
  • For more information and instructions on how to submit, visit http://guides.uflib.ufl.edu/ufir/home.

Presentation Notes:
Data repositories support long-term preservation, access, and reuse of data. Generally, data repositories can be grouped into three main categories. Discipline-specific repositories accept data specifically related to certain subjects. If you’re not sure where to start, Re3data includes information on over 2,000 research data repositories. You can browse by subject, content type, or country. General data repositories accept data from any discipline. Institutional repositories accept materials (including data) from their institution only. Here, we have the IR@UF, which provides long-term open access to the research, scholarship, and creative works of our academic community. Really, that just means we’re here for you, whether you are looking for a place to preserve past research or to develop new projects in digital scholarship and publishing. Also, if you have data uploaded to other repositories, we may be able to ingest it into the IR on your behalf.

Choosing a data repository

• Technical Specs
  • Are there size limits?
  • What types of materials can be uploaded?
• Cost
  • Are there charges to use the repository?
• Discoverability
  • Are there options for access (open, closed, restricted)? What is required for your project?
  • How easy is it to find items in the repository? How about outside the repository?
• Other Considerations
  • Is a persistent identifier (like a DOI) needed for your materials? Is that service offered?
  • Is your project collaborative? Will others need to upload files or add notes to the collection?
  • Is deposit with a specific repository required (e.g., as a condition of a grant or award)?
• For USAID awards, materials must be submitted to USAID repositories.
  • Datasets should be submitted to the Development Data Library (https://data.usaid.gov/), and publications should be submitted to the Development Experience Clearinghouse (https://dec.usaid.gov/dec/home/Default.aspx).
• More details are available from http://libguides.gatech.edu/USAID_public_access_policy.

Presentation Notes:
If you don’t have a preferred or required repository for your project, there are a few questions you can consider to help you decide. Think about the technical specs of your project. Some repositories may have size limits or restrictions on the types of materials that can be uploaded.  What about costs? Some repositories may charge for deposit, maintenance, or storage space. Discoverability and access are also both important to consider. Do you need a certain level of access? Some repositories allow you to set restrictions so only certain people can see your materials, while others are open access.  Also think about discoverability. How easy is it to find what you’re looking for, in the repository itself and through Google? How easy do you need it to be? Project-specific requirements can also help you sort through your options. What other details are important to the success of your project? Do you need a persistent identifier like a DOI? Would it be beneficial to have a collaborative collection, so others can upload files or add notes? Are you required to submit to a specific repository? Sometimes, depositing data in a specific repository is required as part of a grant or award. Lots to think about! But help is available. We’re here, and happy to discuss your questions or options. 
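When no repository is required (for USAID awards the choice is already made), the yes/no questions above can be turned into a simple filter over candidates you have researched, for example via re3data. A sketch with hypothetical repositories and attributes; judgments such as ease of discovery or collaboration features still need a person.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Repository:
    name: str
    max_file_gb: Optional[float]   # None means no stated size limit
    charges_fee: bool
    access_options: Set[str]       # e.g. {"open", "restricted", "closed"}
    mints_doi: bool


def shortlist(repos: List[Repository], file_gb: float, access: str,
              need_doi: bool, can_pay: bool) -> List[str]:
    """Names of repositories that meet the size, cost, access, and identifier needs."""
    return [
        r.name
        for r in repos
        if (r.max_file_gb is None or file_gb <= r.max_file_gb)   # technical specs
        and (can_pay or not r.charges_fee)                      # cost
        and access in r.access_options                          # access options
        and (r.mints_doi or not need_doi)                       # persistent identifier
    ]


# Hypothetical candidates; real limits and fees must be checked on each repository's site.
candidates = [
    Repository("Repo A", 50.0, False, {"open"}, True),
    Repository("Repo B", None, True, {"open", "restricted"}, True),
    Repository("Repo C", 10.0, False, {"open", "restricted"}, False),
]
print(shortlist(candidates, file_gb=20.0, access="open", need_doi=True, can_pay=False))
# → ['Repo A']
```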

DMPTool – hands-on training

DMPTool - https://dmptool.org/

DMPTool – Sign in instructions (FREE to the public)

1. Navigate to https://dmptool.org (recommended browsers: Chrome or Firefox; Internet Explorer does not work).
2. Click Sign in (upper-right corner).
3. Select the most relevant sign-in option:
   a. Option 1: Your institution (if your institution is affiliated with DMPTool) - enter University of Florida
   b. Option 2: Email address (if your institution is not affiliated with DMPTool)
   c. Option 3: Create account with email address (if not affiliated and you need an account)
4. Click the Next button.
5. Log in with your GatorLink credentials.
6. Click Create New DMP.
7. Enter the plan metadata and your ORCID iD (if you do not have an ORCID iD, create one; see the next slide).

Figure 7: Data Management Plan Tool (DMPTool)


DMPTool – hands-on training

ORCID - https://orcid.org/

ORCID – Sign in instructions (FREE to the public)

1. Navigate to https://orcid.org/.
2. Click Sign in (upper-right corner).
3. Select the most relevant option under Sign into ORCID or Register Now:
   a. Option 1: Personal account (current user)
   b. Option 2: Institutional account (select if affiliated with the University of Florida)
   c. Option 3: Register now (create a new account)
4. Click the Next button.
5. Click the pencil icon to develop your profile.

Figure 8: Open Researcher and Contributor ID (ORCID)
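An ORCID iD is 16 characters whose last character (which may be X) is a check digit computed with ISO 7064 MOD 11-2, as described in ORCID's documentation, so most typos can be caught before an iD is entered in DMPTool. A minimal sketch with hypothetical function names; it checks format and check digit only and cannot confirm that an iD is registered or belongs to you. 0000-0002-1825-0097 is ORCID's fictitious sample iD.

```python
import re

# Bare iD or https://orcid.org/ URL: four groups of four characters, the last may be "X".
ORCID_RE = re.compile(r"^(?:https://orcid\.org/)?(\d{4}-\d{4}-\d{4}-\d{3}[\dX])$")


def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """True if the iD is well formed and its final character matches the check digit."""
    match = ORCID_RE.match(orcid.strip())
    if not match:
        return False
    digits = match.group(1).replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


print(is_valid_orcid("0000-0002-1825-0097"))   # → True
print(is_valid_orcid("0000-0002-1825-0098"))   # → False (typo in last digit)
```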


References

• Benedict, K. (2019). Data Management Skills & Training Resources. Presentation to the INSC590 Problems: Information Science – Data Management graduate course (guest lecture), University of Tennessee, Knoxville, School of Information Sciences, March 27, 2019.

• DCC. (2013). Checklist for a Data Management Plan. V.4.0. Edinburgh: Digital Curation Centre. Available online: https://bit.ly/1Z2Gbqk.

• DCC. (2017). DCC Curation Lifecycle Model. Accessed August 5, 2019 from http://www.dcc.ac.uk/resources/curation-lifecycle-model.

• Executive Order of President Obama for open and machine-readable government data, 5/9/2013. Available from the National Archives at https://bit.ly/2n5YBLz.

• FOSTER. (2019). Open Science Definition. Accessed August 5, 2019 from https://bit.ly/2lJEnTO.

• Lorenzen, K., Camp, E., & Dutka-Gianelli, J. (2016). Synthesizing spatial dynamics of recreational fish and fisheries to inform restoration strategies: red drum in the Gulf of Mexico. Revised Data Management Plan. http://ufdc.ufl.edu/AA00014835/00088.

• NCSU Libraries. (n.d.). Formats & Data Organization. Adapted from Making Data Management Easier by the University of Virginia Libraries and Storing Data by the University of Minnesota Libraries. Accessed August 6, 2019 from http://www.lib.ncsu.edu/data-management/formats.

• OPEN Government Data Act signed into law by President Trump, 1/14/2019. News release from SPARC at https://bit.ly/2T6a0ZE.

• REDCap. (2019). Research Electronic Data Capture (REDCap). Accessed August 5, 2019 from https://bit.ly/2MExnII.

• UF Research Computing. (2019). Service rates. https://www.rc.ufl.edu/services/rates/service/.

• UF Research Computing. (2019). UF Apps for Research. https://www.rc.ufl.edu/services/uf-apps-for-research/.

• USAID. (2019). Data Resources. Accessed August 5, 2019 from https://www.usaid.gov/results-and-data/data-resources.

• Whitmire et al. (2015). A table summarizing the Federal public access policies resulting from the US Office of Science and Technology Policy Memorandum of February 2013. figshare. http://dx.doi.org/10.6084/m9.figshare.1372041. Retrieved August 6, 2019 from http://tinyurl.com/hkgqytu.

• Wilkinson, M. D., et al. (2016). The FAIR Guiding Principles for Scientific data management and stewardship. Scientific Data 3, Article number: 160018. https://www.nature.com/articles/sdata201618.

• Zenodo. (2019). Frequently Asked Questions. Accessed August 5, 2019 from http://help.zenodo.org/.


Thank you

Questions/comments?

Contact:

Chelsea Johnston, Scholarly Repository Librarian, [email protected]

Suzanne Stapleton, Agricultural Sciences & Digital Initiatives Librarian, [email protected]

Plato Smith, Data Management Librarian, [email protected]

Data Management and Curation Working Group

[email protected]