Data Management for Graduate Students Marriott Library Graduate Student Workshop Series Rebekah Cummings, Research Data Management Librarian J. Willard Marriott Library, University of Utah September 27, 2016


Page 1: Data Management for Graduate Students

Data Management for Graduate Students
Marriott Library Graduate Student Workshop Series

Rebekah Cummings, Research Data Management Librarian
J. Willard Marriott Library, University of Utah

September 27, 2016

Page 2: Data Management for Graduate Students

In the next hour…

• Introductions
• What are data?
• Why manage data?
• Data Management Plans
• Data Organization
• Metadata
• Storage and Archiving
• Questions

Page 3: Data Management for Graduate Students

Name
Major
Research Project

Page 4: Data Management for Graduate Students

What is data management?

Activities and practices that support long-term preservation, access, and use of data

Page 5: Data Management for Graduate Students

What are data?

“The recorded factual material commonly accepted in the scientific community as necessary to validate research findings.”

- U.S. OMB Circular A-110

Page 6: Data Management for Graduate Students

Data are diverse

Page 7: Data Management for Graduate Students

Data are messy

Page 8: Data Management for Graduate Students

We manage data first and foremost for ourselves

Page 9: Data Management for Graduate Students

Why else manage data?

•Meet grant and journal requirements

•Promote reproducible research

•Enable new discoveries from your data

Page 10: Data Management for Graduate Students

We are trying to avoid this scenario…

Page 11: Data Management for Graduate Students

Two bears data management problems

1. Didn’t know where he stored the data

2. Saved one copy of the data on a USB drive

3. Data was in a format that could only be read by outdated, proprietary software

4. No codebook to explain the variable names

5. Variable names were not descriptive

6. No contact information for the co-author Sam Lee

Page 12: Data Management for Graduate Students

Data Management Plans

• What data are generated by your research?
• What is your plan for managing the data?
• How will your data be shared?

Page 13: Data Management for Graduate Students

Elements of a DMP

• Types of data, including file formats
• Data description
• Data storage
• Data sharing, including confidentiality or security restrictions
• Data archiving and responsibility
• Data management costs

Page 14: Data Management for Graduate Students

DMPTool – CDL

Page 15: Data Management for Graduate Students

Data organization

Page 16: Data Management for Graduate Students

File naming

Page 17: Data Management for Graduate Students

MyData.xls

MeetingNotes.doc

Presentation.ppt

Assignment1.pdf

Page 18: Data Management for Graduate Students

File naming best practices

1. Be descriptive, not generic

2. Keep an appropriate length (about 25 characters or fewer)

3. Be consistent

4. Think critically about your file names

Page 19: Data Management for Graduate Students

File naming best practices

• File names should include only letters, numbers, and underscores/dashes.
• No special characters.
• No spaces; use dashes, underscores, or camel case (likeThis).
• Avoid case dependency. Assume this, THIS, and tHiS are the same.
• Have a strategy for version control.
• Don’t overwrite file extensions.
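The naming rules above can be checked automatically. This is a minimal Python sketch, not part of the original workshop. The helper name `check_file_name`, the regular expression, and the hard 25-character limit on the name (without its extension) are illustrative assumptions, and the two sample names are hypothetical.

```python
import re
from pathlib import PurePath

# Assumption: only letters, numbers, underscores, and dashes are allowed.
ALLOWED = re.compile(r"^[A-Za-z0-9_-]+$")
MAX_STEM_LENGTH = 25  # assumption: treats "about 25 characters" as a hard limit

def check_file_name(name: str) -> list[str]:
    """Return a list of problems found in a file name (empty if none)."""
    path = PurePath(name)
    stem, suffix = path.stem, path.suffix
    problems = []
    if not suffix:
        problems.append("missing file extension")
    if " " in stem:
        problems.append("contains spaces")
    # Spaces are reported separately, so ignore them in this check.
    if not ALLOWED.match(stem.replace(" ", "")):
        problems.append("contains special characters")
    if len(stem) > MAX_STEM_LENGTH:
        problems.append("longer than 25 characters")
    return problems

# Hypothetical file names, with extensions added for illustration.
for example in ["20140724_NSF_SoilSamples.csv", "July 24 2014_SoilSamples%_v6.xls"]:
    print(example, "->", check_file_name(example) or "OK")
```

A check like this could run over a whole project folder before data are shared or archived.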

Page 20: Data Management for Graduate Students

One potential strategy

Page 21: Data Management for Graduate Students

Version Control - Numbering

Use leading zeros for scalability

With leading zeros: 001, 002, 003, 009, 010, 099

Without leading zeros: 1, 10, 2, 3, 9, 99

Bonus Tip: Use whole numbers (v1, v2, v3) for major version changes and decimals for minor changes (v1.1, v2.6)
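A short Python sketch shows why leading zeros matter. File browsers usually sort names as text, so unpadded numbers fall out of order. The version numbers and the `v` prefix are hypothetical.

```python
# Hypothetical version numbers for six files in one series.
versions = [1, 2, 3, 9, 10, 99]

unpadded = sorted(f"v{n}" for n in versions)
padded = sorted(f"v{n:03d}" for n in versions)

print(unpadded)  # text sorting puts v10 before v2
print(padded)    # zero-padded names sort in numeric order
```

Three digits leave room for up to 999 versions. Choose the width from the largest number you expect.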

Page 22: Data Management for Graduate Students

Version Control - Dates

If using dates, use YYYYMMDD

June2015 = BAD!

06-18-2015 = BAD!

20150618 = GREAT!

2015-06-18 = This is fine too
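The reason for the year-first format is that sorting by name then matches sorting by date. A small Python sketch, using hypothetical dates:

```python
from datetime import date

# Hypothetical collection dates for three files.
dates = [date(2015, 6, 18), date(2014, 12, 1), date(2015, 1, 5)]

year_first = sorted(d.strftime("%Y%m%d") for d in dates)    # 20150618 style
month_first = sorted(d.strftime("%m-%d-%Y") for d in dates)  # 06-18-2015 style

print(year_first)   # chronological order
print(month_first)  # ordered by month, so the 2014 file sorts last
```

`date.isoformat()` produces the hyphenated 2015-06-18 form, which sorts the same way.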

Page 23: Data Management for Graduate Students

From a DMP…

“Each file name, for all types of data, will contain the project acronym PUCCUK; a reference to the file content (survey, interview, media) and the date of an event (such as the date of an interview).”
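A convention like this one can be put in a small helper so every file name is built the same way. This is a sketch only. The function name `build_file_name`, the underscore separator, the order of the parts, and the sample date and extension are assumptions. The DMP excerpt names only the three components.

```python
from datetime import date

def build_file_name(project: str, content: str, event_date: date, extension: str) -> str:
    """Join project acronym, content type, and event date into one file name."""
    return f"{project}_{content}_{event_date:%Y%m%d}.{extension}"

# Hypothetical interview held on 18 June 2015.
print(build_file_name("PUCCUK", "interview", date(2015, 6, 18), "docx"))
# prints PUCCUK_interview_20150618.docx
```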

Page 24: Data Management for Graduate Students

Who filed better?

• PLPP_EvaluationData_Workshop2_2014.xlsx
• MyData.xlsx
• publiclibrarypartnershipsprojectevaluationdataworkshop22014CummingsHelenaMontana.xlsx

Page 25: Data Management for Graduate Students

Who filed better?

• July 24 2014_SoilSamples%_v6
• 20140724_NSF_SoilSamples_Cummings
• SoilSamples_FINAL

Page 26: Data Management for Graduate Students

Structuring folders and files

• Consider all the types of files you will handle during the course of your project.

• Develop a nested folder structure that makes sense for your project and your team’s retrieval needs.

• Name folders clearly, without special characters.

• Use a standard folder structure for each project or subproject (including making folders for files not yet created).

• Create a reference document (README file) that notes the purpose of each folder.

University of Massachusetts Medical School Library http://libraryguides.umassmed.edu/file_management
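One way to keep the structure standard across projects is to create it with a script. The Python sketch below is illustrative. The folder names, the `create_project` helper, and the README wording are assumptions, not a layout prescribed by the workshop. The demo runs in a temporary directory so nothing is left on disk.

```python
import tempfile
from pathlib import Path

# Hypothetical folder layout; adapt the names to your own project.
FOLDERS = ["data/raw", "data/processed", "documentation", "code", "output"]

def create_project(root: Path) -> Path:
    """Create the standard folders and a README stub under root."""
    for folder in FOLDERS:
        (root / folder).mkdir(parents=True, exist_ok=True)
    readme = root / "README.txt"
    lines = [f"{folder}/ - describe the purpose of this folder" for folder in FOLDERS]
    readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme

with tempfile.TemporaryDirectory() as tmp:
    readme = create_project(Path(tmp) / "PLPP")
    print(readme.read_text(encoding="utf-8"))
```

Creating the empty folders up front covers the advice to make folders for files not yet created.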

Page 27: Data Management for Graduate Students

README files

Page 28: Data Management for Graduate Students

File organization exercise

Page 29: Data Management for Graduate Students

Describing data

Page 30: Data Management for Graduate Students

Research Documentation

• Grant proposals and related reports
• Applications and approvals (e.g. IRB)
• Codebooks, data dictionaries
• Consent forms
• Surveys, questionnaires, interview protocols
• Transcripts, hard copies of audio and video files
• Any software or code you used (no matter how insignificant or buggy)

Page 31: Data Management for Graduate Students

IJ?

XVAR?

FNAME?

Page 32: Data Management for Graduate Students

What goes in a codebook?

• Variable name
• Variable meaning
• Variable data types
• Precision of data
• Units
• Known issues with the data
• Relationships to other variables
• Null values
• Anything else someone needs to better understand the data
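A codebook can be a plain CSV file with one row per variable. The Python sketch below is one possible layout. The column names are adapted from the checklist above, and the two variables and all of their values are hypothetical.

```python
import csv
import io

# Assumed column names, adapted from the codebook checklist.
FIELDS = ["name", "meaning", "data_type", "precision", "units", "null_value", "known_issues"]

# Hypothetical variables for illustration only.
CODEBOOK = [
    {"name": "soil_ph", "meaning": "pH of the soil sample", "data_type": "float",
     "precision": "0.1", "units": "pH", "null_value": "-999", "known_issues": ""},
    {"name": "site_id", "meaning": "Identifier of the sampling site", "data_type": "string",
     "precision": "", "units": "", "null_value": "NA", "known_issues": "two sites share one ID"},
]

def codebook_to_csv(rows: list[dict]) -> str:
    """Serialize codebook rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

print(codebook_to_csv(CODEBOOK))
```

Saving this text next to the data file means descriptive names like `soil_ph` travel with their definitions.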

Page 33: Data Management for Graduate Students

Metadata

Unstructured Data

There was a study put out by Dr. Gary Bradshaw from the University of Nebraska Medical Center in 1982 called “Growth of Rodent Kidney Cells in Serum Media and the Effect of Viral Transformation On Growth”. It concerns the cytology of kidney cells.

Structured Data

Title: Growth of rodent kidney cells in serum media and the effect of viral transformations on growth.
Author: Gary Bradshaw
Date: 1982
Publisher: University of Nebraska Medical Center
Subject: Kidney -- Cytology

Page 34: Data Management for Graduate Students

At the very least…

• Title
• Creator
• Description
• Date
• Type
• Publisher
• Format
• Identifier (DOI)
• Rights
• Any other critical information to understand or cite the data.
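These minimum fields can be stored as a small structured record alongside the data. The Python sketch below reuses the Bradshaw example from the Metadata slide for title, creator, date, and publisher. The remaining fields are deliberately left empty rather than invented, and the `missing_fields` helper is a hypothetical name.

```python
import json

REQUIRED = ["title", "creator", "description", "date", "type",
            "publisher", "format", "identifier", "rights"]

# Filled values come from the Bradshaw example; empty strings are placeholders.
record = {
    "title": "Growth of rodent kidney cells in serum media and the effect of viral transformations on growth",
    "creator": "Gary Bradshaw",
    "description": "",
    "date": "1982",
    "type": "",
    "publisher": "University of Nebraska Medical Center",
    "format": "",
    "identifier": "",
    "rights": "",
}

def missing_fields(metadata: dict) -> list[str]:
    """List required fields that are absent or empty."""
    return [field for field in REQUIRED if not metadata.get(field)]

print(json.dumps(record, indent=2))
print("Still to fill in:", missing_fields(record))
```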

Page 35: Data Management for Graduate Students

Data ownership

Page 37: Data Management for Graduate Students

LOCKSS (Lots of Copies Keep Stuff Safe)

Page 38: Data Management for Graduate Students

Options for data storage

•Personal computers or laptops

•Networked drives

•External storage devices

Page 39: Data Management for Graduate Students

3-2-1 Backup Rule

Have 3 copies of your data
On 2 different media
In more than 1 physical location
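A backup is only useful if the copy matches the original. The Python sketch below copies a file to two other folders and compares SHA-256 checksums. The `back_up` helper and the folder names are assumptions. Temporary folders stand in for a real external drive and networked drive, so the demo does not by itself satisfy the "different media" or "more than 1 location" parts of the rule.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def sha256(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def back_up(source: Path, destinations: list[Path]) -> list[Path]:
    """Copy source into each destination folder and verify every copy."""
    copies = []
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        copy = Path(shutil.copy2(source, folder))
        if sha256(copy) != sha256(source):
            raise IOError(f"Backup in {folder} does not match the original")
        copies.append(copy)
    return copies

# Demo: temporary folders stand in for an external drive and a networked drive.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    original = root / "20150618_SoilSamples.csv"
    original.write_text("site_id,soil_ph\nA1,6.5\n", encoding="utf-8")
    copies = back_up(original, [root / "external_drive", root / "network_drive"])
    print(len(copies) + 1, "verified copies")
```

For very large files, reading in chunks would be gentler on memory than `read_bytes()`.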

Page 40: Data Management for Graduate Students

Ubox – box.utah.edu

Page 41: Data Management for Graduate Students

Language from a DMP

“All data files will be stored on the University server that is backed up nightly. The University's computing network is protected from viruses by a firewall and anti-virus software. Digital recordings will be copied to the server each day after interviews.

Signed consent forms will be stored in a locked cabinet in the office. Interview recordings and transcripts, which may contain personal information, will be password protected at file-level and stored on the server.

Original versions of the files will always be kept on the server. If copies of files are held on a laptop and edits made, their file names will be changed.”

Page 42: Data Management for Graduate Students

Thinking long-term

Page 43: Data Management for Graduate Students

Archiving options

•Domain-specific repository

•General-purpose data repository

•Institutional repository

Page 44: Data Management for Graduate Students

When you archive…

• Save the data in both its proprietary and non-proprietary formats (e.g. Excel and CSV; Microsoft Word and ASCII)

• Consider any restrictions on your data (copyright, patent, privacy, etc.)

• When possible/mandated/desired, share your data online with a persistent identifier (DOI or ARK)

• Include a data citation and state how you want to get credit for your data

• Link your data to your publications as often as possible

Page 45: Data Management for Graduate Students

Your data librarians

Daureen Nesdill
Research Data Management Librarian, Sciences

Darell Schmick
Research Librarian, Health Sciences

Rebekah Cummings
Research Data Management Librarian, Social Sciences & Humanities

Page 46: Data Management for Graduate Students

Major takeaways

• Data management starts at the beginning of a project
• Document your data so that someone else could understand it
• Have more than one copy of your data
• Consider archiving options when you are done with your project

Page 47: Data Management for Graduate Students

Questions?

Rebekah Cummings
[email protected]
(801) 581-7701
Marriott Library, 1705Y

…or ask now!