Data Management for Graduate Students


Marriott Library Graduate Student Workshop Series

Rebekah Cummings, Research Data Management Librarian
J. Willard Marriott Library, University of Utah

September 27, 2016

In the next hour…

• Introductions
• What are data?
• Why manage data?
• Data Management Plans
• Data Organization
• Metadata
• Storage and Archiving
• Questions

• Name
• Major
• Research Project

What is data management?

Activities and practices that support long-term preservation, access, and use of data

What are data?

“The recorded factual material commonly accepted in the research community as necessary to validate research findings.”

- U.S. OMB Circular A-110

Data are diverse

Data are messy

We manage data first and foremost for ourselves

Why else manage data?

•Meet grant and journal requirements

•Promote reproducible research

•Enable new discoveries from your data

We are trying to avoid this scenario…

Two bears data management problems

1. Didn’t know where he stored the data

2. Saved one copy of the data on a USB drive

3. Data were in a format that could only be read by outdated, proprietary software

4. No codebook to explain the variable names

5. Variable names were not descriptive

6. No contact information for the co-author Sam Lee

Data Management Plans

•What data are generated by your research?
•What is your plan for managing the data?
•How will your data be shared?

Elements of a DMP

•Types of data, including file formats
•Data description
•Data storage
•Data sharing, including confidentiality or security restrictions
•Data archiving and responsibility
•Data management costs

DMPTool – CDL

Data organization

File naming

MyData.xls

MeetingNotes.doc

Presentation.ppt

Assignment1.pdf

File naming best practices

1. Be descriptive, not generic

2. Keep names an appropriate length (about 25 characters or fewer)

3. Be consistent

4. Think critically about your file names

File naming best practices

•Files should include only letters, numbers, and underscores/dashes.
•No special characters.
•No spaces; use dashes, underscores, or camel case (likeThis).
•Avoid case dependency. Assume this, THIS, and tHiS are the same.
•Have a strategy for version control.
•Don’t overwrite file extensions.
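The file naming rules above can be checked mechanically. What follows is a minimal sketch, not a standard tool: the function names `check_file_name` and `find_case_collisions` are made up for illustration, and the 25-character limit is only the rough guideline from the earlier slide.

```python
import re
from pathlib import PurePath

# Illustrative limit taken from the "about 25 characters" guideline.
MAX_STEM_LENGTH = 25

# Letters, digits, underscores, and dashes only.
ALLOWED_STEM = re.compile(r"[A-Za-z0-9_-]+")


def check_file_name(name: str) -> list[str]:
    """Return a list of problems found in a file name (empty if none)."""
    path = PurePath(name)
    stem, suffix = path.stem, path.suffix
    problems = []
    if not suffix:
        problems.append("missing file extension")
    if " " in stem:
        problems.append("contains spaces")
    # Spaces are reported separately, so ignore them for this check.
    if not ALLOWED_STEM.fullmatch(stem.replace(" ", "")):
        problems.append("contains special characters")
    if len(stem) > MAX_STEM_LENGTH:
        problems.append("longer than about 25 characters")
    return problems


def find_case_collisions(names: list[str]) -> list[tuple[str, str]]:
    """Return pairs of names that differ only by upper/lower case."""
    seen: dict[str, str] = {}
    collisions = []
    for name in names:
        key = name.lower()
        if key in seen and seen[key] != name:
            collisions.append((seen[key], name))
        seen.setdefault(key, name)
    return collisions
```

A check like this only catches mechanical problems. Whether a name is descriptive rather than generic (MyData.xls passes every test here) still needs human judgment.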

One potential strategy

Version Control - Numbering

001, 002, 003, 009, 010, 099

Use leading zeros for scalability

Bonus Tip: Use ordinal numbers (v1, v2, v3) for major version changes and decimals for minor changes (v1.1, v2.6)

Without leading zeros, the same files sort as: 1, 10, 2, 3, 9, 99
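To see why leading zeros matter, compare how version numbers sort as text with and without padding. This is a small sketch; `version_label` and the width of three digits are illustrative choices, not a prescribed scheme.

```python
def version_label(number: int, width: int = 3) -> str:
    """Format a version number with leading zeros, e.g. 9 -> '009'."""
    return f"{number:0{width}d}"


versions = [1, 2, 3, 9, 10, 99]

# File browsers usually sort names as text, character by character.
unpadded = sorted(str(v) for v in versions)
padded = sorted(version_label(v) for v in versions)

print(unpadded)  # ['1', '10', '2', '3', '9', '99']
print(padded)    # ['001', '002', '003', '009', '010', '099']
```

Three digits leave room for 999 versions; choose the width to fit the largest number you expect to reach.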

Version Control - Dates

If using dates, use YYYYMMDD

June2015 = BAD!

06-18-2015 = BAD!

20150618 = GREAT!

2015-06-18 = This is fine too
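A date stamp in YYYYMMDD form can be generated rather than typed by hand, which keeps it consistent. The sketch below uses Python's standard `datetime` module; the helper name `dated_name` and the example descriptions are hypothetical.

```python
from datetime import date


def dated_name(day: date, description: str, extension: str) -> str:
    """Build a file name that starts with a YYYYMMDD date stamp."""
    return f"{day.strftime('%Y%m%d')}_{description}.{extension}"


names = [
    dated_name(date(2015, 6, 18), "SoilSamples", "csv"),
    dated_name(date(2014, 7, 24), "SoilSamples", "csv"),
]

# Because the year comes first, sorting by name is also sorting by date.
print(sorted(names))
# ['20140724_SoilSamples.csv', '20150618_SoilSamples.csv']

# The dashed variant is the ISO 8601 form.
print(date(2015, 6, 18).isoformat())  # 2015-06-18
```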

From a DMP…

“Each file name, for all types of data, will contain the project acronym PUCCUK; a reference to the file content (survey, interview, media) and the date of an event (such as the date of an interview).”

Who filed better?

•PLPP_EvaluationData_Workshop2_2014.xlsx
•MyData.xlsx
•publiclibrarypartnershipsprojectevaluationdataworkshop22014CummingsHelenaMontana.xlsx

Who filed better?

•July 24 2014_SoilSamples%_v6
•20140724_NSF_SoilSamples_Cummings
•SoilSamples_FINAL

Structuring folders and files

• Consider all the types of files you will handle during the course of your project.

• Develop a nested folder structure that makes sense for your project and your team’s retrieval needs.

• Name folders clearly, without special characters.

• Use a standard folder structure for each project or subproject (including making folders for files not yet created).

• Create a reference document (README file) that notes the purpose of each folder.

University of Massachusetts Medical School Library http://libraryguides.umassmed.edu/file_management
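One way to keep the folder structure standard across projects is to create it from a script, README included. This is only a sketch: the folder names and their descriptions in `FOLDERS` are placeholders to adapt, and `create_project` is a hypothetical helper.

```python
from pathlib import Path

# Hypothetical layout; rename the folders to suit your project and team.
FOLDERS = {
    "data/raw": "Original data files, never edited",
    "data/processed": "Cleaned or derived data",
    "docs": "Codebooks, protocols, and approvals",
    "code": "Scripts used to process or analyze the data",
    "outputs": "Figures, tables, and drafts",
}


def create_project(root: Path, folders: dict[str, str] = FOLDERS) -> Path:
    """Create the folders and write a README noting each folder's purpose."""
    root.mkdir(parents=True, exist_ok=True)
    lines = [f"Project: {root.name}", "", "Folders:"]
    for relative, purpose in folders.items():
        # Folders are created up front, even if no files exist for them yet.
        (root / relative).mkdir(parents=True, exist_ok=True)
        lines.append(f"  {relative}/ - {purpose}")
    readme = root / "README.txt"
    readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme
```

Calling `create_project(Path("PLPP"))` would create the folders under a PLPP directory and write PLPP/README.txt listing them.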

README files

File organization exercise

Describing data

Research Documentation

•Grant proposals and related reports
•Applications and approvals (e.g. IRB)
•Codebooks, data dictionaries
•Consent forms
•Surveys, questionnaires, interview protocols
•Transcripts, hard copies of audio and video files
•Any software or code you used (no matter how insignificant or buggy)

IJ?

XVAR?

FNAME?

What goes in a codebook?

•Variable name
•Variable meaning
•Variable data types
•Precision of data
•Units
•Known issues with the data
•Relationships to other variables
•Null values
•Anything else someone needs to better understand the data
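A codebook can be as simple as a CSV file with one row per variable. The sketch below assumes a made-up soil dataset: the variables `site_id` and `soil_ph`, their descriptions, and the column names in `CODEBOOK_FIELDS` are all hypothetical.

```python
import csv
import io

CODEBOOK_FIELDS = [
    "variable", "meaning", "data_type", "units",
    "precision", "null_value", "known_issues",
]

# Hypothetical entries for illustration only.
codebook = [
    {"variable": "site_id", "meaning": "Sampling site identifier",
     "data_type": "integer", "units": "", "precision": "",
     "null_value": "-999", "known_issues": ""},
    {"variable": "soil_ph", "meaning": "Soil acidity at 10 cm depth",
     "data_type": "decimal", "units": "pH", "precision": "0.1",
     "null_value": "NA", "known_issues": "Meter recalibrated mid-study"},
]


def write_codebook(entries: list[dict], stream) -> None:
    """Write codebook entries as CSV to an open text stream."""
    writer = csv.DictWriter(stream, fieldnames=CODEBOOK_FIELDS)
    writer.writeheader()
    writer.writerows(entries)


def undocumented(columns: list[str], entries: list[dict]) -> list[str]:
    """Return dataset columns that have no codebook entry."""
    documented = {entry["variable"] for entry in entries}
    return [column for column in columns if column not in documented]


buffer = io.StringIO()
write_codebook(codebook, buffer)
print(undocumented(["site_id", "soil_ph", "fname"], codebook))  # ['fname']
```

Running a check like `undocumented` before sharing data flags any cryptic column that never made it into the codebook.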

Metadata

Unstructured Data

There was a study put out by Dr. Gary Bradshaw from the University of Nebraska Medical Center in 1982 called “Growth of Rodent Kidney Cells in Serum Media and the Effect of Viral Transformation On Growth”. It concerns the cytology of kidney cells.

Structured Data

Title: Growth of rodent kidney cells in serum media and the effect of viral transformations on growth.
Author: Gary Bradshaw
Date: 1982
Publisher: University of Nebraska Medical Center
Subject: Kidney -- Cytology

At the very least…

• Title
• Creator
• Description
• Date
• Type
• Publisher
• Format
• Identifier (DOI)
• Rights
• Any other critical information to understand or cite the data.
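The minimum fields listed above can be stored as a small machine-readable record next to the data. This sketch reuses the Gary Bradshaw example from the metadata slide; the lowercase field names and the helpers `missing_fields` and `to_json` are illustrative assumptions, not a formal metadata standard.

```python
import json

REQUIRED_FIELDS = [
    "title", "creator", "description", "date", "type",
    "publisher", "format", "identifier", "rights",
]

# Values come from the example on the metadata slide; the fields that
# slide does not supply are deliberately left out.
record = {
    "title": ("Growth of rodent kidney cells in serum media and the "
              "effect of viral transformations on growth"),
    "creator": "Gary Bradshaw",
    "description": "Concerns the cytology of kidney cells",
    "date": "1982",
    "publisher": "University of Nebraska Medical Center",
}


def missing_fields(metadata: dict) -> list[str]:
    """Return the required fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not metadata.get(name)]


def to_json(metadata: dict) -> str:
    """Serialize the record so it can be saved alongside the data."""
    return json.dumps(metadata, indent=2)


print(missing_fields(record))  # ['type', 'format', 'identifier', 'rights']
```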

Data ownership

LOCKSS (Lots of Copies Keeps Stuff Safe)

Options for data storage

•Personal computers or laptops

•Networked drives

•External storage devices

3-2-1 Backup Rule

Have 3 copies of your data
On 2 different media
In more than 1 physical location
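Making copies is only useful if the copies are intact. The sketch below copies a file to several destination folders and confirms each copy with a checksum. It assumes the destinations are folders you have already chosen on different media (for example an external drive and a synced cloud folder); the code cannot verify that they are on different media or in different locations, and `back_up` is a hypothetical helper.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, destinations: list[Path]) -> list[Path]:
    """Copy source into each destination folder and verify every copy."""
    expected = sha256_of(source)
    copies = []
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / source.name
        shutil.copy2(source, target)  # copy2 also preserves timestamps
        if sha256_of(target) != expected:
            raise IOError(f"Checksum mismatch for {target}")
        copies.append(target)
    return copies
```

With the original plus two verified copies, you have the three copies the rule asks for.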

Ubox – box.utah.edu

Language from a DMP

“All data files will be stored on the University server that is backed up nightly. The University's computing network is protected from viruses by a firewall and anti-virus software. Digital recordings will be copied to the server each day after interviews.

Signed consent forms will be stored in a locked cabinet in the office. Interview recordings and transcripts, which may contain personal information, will be password protected at file-level and stored on the server.

Original versions of the files will always be kept on the server. If copies of files are held on a laptop and edits made, their file names will be changed.”

Thinking long-term

Archiving options

•Domain-specific repository

•General Purpose Data Repository

•Institutional repository

When you archive…

• Save the data in both its proprietary and non-proprietary format (e.g. Excel and CSV; Microsoft Word and ASCII)

• Consider any restrictions on your data (copyright, patent, privacy, etc.)

• When possible/mandated/desired, share your data online with a persistent identifier (DOI or ARK)

• Include a data citation and state how you want to get credit for your data

• Link your data to your publications as often as possible

Your data librarians

Daureen Nesdill
Research Data Management Librarian, Sciences

Darell Schmick
Research Librarian, Health Sciences

Rebekah Cummings
Research Data Management Librarian, Social Sciences & Humanities

Major takeaways

•Data management starts at the beginning of a project
•Document your data so that someone else could understand it
•Have more than one copy of your data
•Consider archiving options when you are done with your project

Questions?

Rebekah Cummings
rebekah.cummings@utah.edu
(801) 581-7701
Marriott Library, 1705Y

…or ask now!
