Cal Poly - Data Management and the DMPTool


October 17, 2013 @ Robert E. Kennedy Library, Data Studio, California Polytechnic State University. Many funders now require researchers to submit a Data Management Plan alongside their project proposals. The DMPTool is a free, online wizard that helps you create a data management plan specific to your project, and provides you with links and resources for ensuring your plan is successful.


Carly Strasser | @carlystrasser California Digital Library

Cal Poly Oct 2013

From Flickr by OZinOH

DMPTool The Data Management Planning Tool


Current DMP tools

A walkthrough of DMPTool








C. Strasser







From Flickr by ~Minnea~

Data management Documentation Reproducibility

From Flickr by ahockey


A document that describes what you will do with your data, both during your research and after you complete your project

From Flickr by Barbies Land

What is a data management plan?

For funders: A short plan submitted alongside grant applications

But they all have different requirements and express them in different ways

From Flickr by 401(K) 2013

An outline of: what will be collected, methods, standards, metadata, sharing/access, long-term storage

Includes how and why

Saves time
Increases research efficiency
Satisfies requirements

Why prepare a DMP? From Flickr by natalinha

When do you plan?










Proposal writing




When do you plan?


From Flickr by gmacfadyen

Keep your plan current
Incorporate changes
Use as a guide for daily activities






Where to start?

Small & Simple
Document what you know now
Share the plan with your team
Avoid procrastination and immobilization

Where to start?

Unruly Data


Research Support Office

Data Library / Repository

Computing Support

Faculty Ethics Committee Etc...


Courtesy of Martin Donnelly

Who: Support Services & Collaborators

Plan section: Support
Information about data: PI, co-PIs, research staff
Metadata content & format: Librarians, data repositories
Policies for access, sharing, reuse: Funder, institute, HIPAA, IRB, users
Long-term storage and data management: Librarians, IT staff, data repositories
Budget: Sponsored programs office, funder

DMPs: A Short History

Liz Lyon: Dealing with Data 2008

UK funder expectations 2009


Federal Funding Accountability and Transparency Act 2006

Across the Pond

2010 2010 present

DMPs: A Short History

DMP supplement may include:
1. the types of data, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the project
2. the standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies)
3. policies for access and sharing including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements
4. policies and provisions for re-use, re-distribution, and the production of derivatives
5. plans for archiving data, samples, and other research products, and for preservation of access to them

NSF DMP Requirements From Grant Proposal Guidelines:

Types of data produced
Relationship to existing data
How/when/where will the data be captured or created?
How will the data be processed?
Quality assurance & quality control measures
Security: version control, backing up
Who will be responsible for data management during/after project?

1. Types of data & other information

C. Strasser

From Flickr by Lazurite

What metadata are needed to make the data meaningful?
How will you create or capture these metadata?
Why have you chosen particular standards and approaches for metadata?

2. Data & metadata standards

Are you under any obligation to share data?
How, when, & where will you make the data available?
What is the process for gaining access to the data?
Who owns the copyright and/or intellectual property?
Will you retain rights before opening data to wider use? How long?
Are permission restrictions necessary?
Embargo periods for political/commercial/patent reasons?
Ethical and privacy issues?
Who are the foreseeable data users?
How should your data be cited?

3. Policies for access & sharing 4. Policies for re-use & re-distribution

What data will be preserved for the long term? For how long?
Where will data be preserved?
What data transformations need to occur before preservation?

5. Plans for archiving & preservation

From Flickr by theManWhoSurfedTooMuch

What metadata will be submitted alongside the datasets?

Who will be responsible for preparing data for preservation? Who will be the main contact person for the archived data?

Don't forget: Budget

Costs of data preparation & documentation: hardware, software, personnel, archive fees

How costs will be paid
Request funding!

NSF's Vision*

DMPs and their evaluation will grow & change over time

Peer review will determine next steps

Community-driven guidelines

Evaluation will vary with directorate, division, & program officer


Project name: Effects of temperature and salinity on population growth of the estuarine copepod, Eurytemora affinis

Project participants and affiliations: Carly Strasser (University of Alberta and Dalhousie University), Mark Lewis (University of Alberta), Claudio DiBacco (Dalhousie University and Bedford Institute of Oceanography)

Funding agency: CAISN (Canadian Aquatic Invasive Species Network)

Description of project aims and purpose: We will rear populations of E. affinis in the laboratory at three temperatures and three salinities (9 treatments total). We will document the population from hatching to death, noting the proportion of individuals in each stage over time. The data collected will be used to parameterize population models of E. affinis. We will build a model of population growth as a function of temperature and salinity. This will be useful for studies of invasive copepod populations in the Northeast Pacific.

Video Source: Plankton Copepods. Video. Encyclopædia Britannica Online. Web. 13 Jun. 2011

A DMP Example (1)

1. Information about data

Every two days, we will subsample E. affinis populations growing at our treatment conditions. We will use a microscope to identify the stage and sex of the subsampled individuals. We will document the information first in a laboratory notebook, then copy the data into an Excel spreadsheet. For quality control, values will be entered separately by two different people to ensure accuracy. The Excel spreadsheet will be saved as a comma-separated value (.csv) file daily and backed up to a server. After all data are collected, the Excel spreadsheet will be saved as a .csv file and imported into the program R for statistical analysis. Strasser will be responsible for all data management during and after data collection.

Our short-term data storage plan, which will be used during the experiment, will be to save copies of 1) the .txt metadata file and 2) the Excel spreadsheet as .csv files to an external drive, and to take the external drive off site nightly. We will use the Subversion version control system to update our data and metadata files daily on the University of Alberta Mathematics Department server. We will also have the laboratory notebook as a hard copy backup.

A DMP Example (2)
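
The double-entry quality control step in the plan above (values entered separately by two different people) could be checked with a short script once both copies are saved as .csv. This is only a sketch under stated assumptions: the column names, the sample rows, and the function name find_entry_mismatches are hypothetical and do not come from the original project, which planned to use R for analysis.

```python
import csv
import io

# Hypothetical sample data: the same notebook page typed in by two people.
ENTRY_A = (
    "date,temperature_c,salinity_psu,stage,sex,count\n"
    "2011-06-01,10,15,nauplius,NA,12\n"
    "2011-06-01,10,15,adult,F,12\n"
)
ENTRY_B = (
    "date,temperature_c,salinity_psu,stage,sex,count\n"
    "2011-06-01,10,15,nauplius,NA,12\n"
    "2011-06-01,10,15,adult,F,21\n"
)


def find_entry_mismatches(entry_a_text, entry_b_text):
    """Compare two independently typed copies of the same data sheet.

    Returns a list of (row_number, column_name, value_a, value_b) tuples,
    one for each cell where the two copies disagree.
    """
    rows_a = list(csv.DictReader(io.StringIO(entry_a_text)))
    rows_b = list(csv.DictReader(io.StringIO(entry_b_text)))
    if len(rows_a) != len(rows_b):
        raise ValueError("The two copies have different numbers of rows")

    mismatches = []
    for row_number, (row_a, row_b) in enumerate(zip(rows_a, rows_b), start=1):
        for column in row_a:
            value_a = (row_a.get(column) or "").strip()
            value_b = (row_b.get(column) or "").strip()
            if value_a != value_b:
                mismatches.append((row_number, column, value_a, value_b))
    return mismatches


mismatches = find_entry_mismatches(ENTRY_A, ENTRY_B)
for row_number, column, value_a, value_b in mismatches:
    print(f"Row {row_number}, column {column}: {value_a!r} vs {value_b!r}")
```

Any reported cell would then be checked against the laboratory notebook, which remains the record of what was actually observed.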

2. Metadata format & content

We will first document our metadata by taking careful notes in the laboratory notebook that refer to specific data files and describe all columns, units, abbreviations, and missing value identifiers. These notes will be transcribed into a .txt document that will be stored with the data file. After all of the data are collected, we will then use EML (Ecological Metadata Language) to digitize our metadata. EML is one of the accepted formats used in Ecology, and works well for the type of data we will be producing. We will create these metadata using Morpho software, available through the Knowledge Network for Biocomplexity (KNB). The documentation and metadata will describe the data files and the context of the measurements.

A DMP Example (3)
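
The interim .txt metadata described in the metadata section above (columns, units, and missing value identifiers) could be kept in step with the data file by a small script. The sketch below is an illustration only: the column names, units, the "NA" missing value code, and the file name counts.csv are assumptions, and it produces a plain-text note, not EML.

```python
import csv
import io

# Hypothetical column notes transcribed from the laboratory notebook.
COLUMN_NOTES = {
    "date": {"description": "Sampling date", "units": "YYYY-MM-DD"},
    "temperature_c": {"description": "Treatment temperature", "units": "degrees C"},
    "salinity_psu": {"description": "Treatment salinity", "units": "PSU"},
    "stage": {"description": "Life stage of individual", "units": "none"},
    "sex": {"description": "Sex of individual", "units": "none"},
    "count": {"description": "Number of individuals", "units": "individuals"},
}
MISSING_VALUE = "NA"  # assumed missing value identifier


def undocumented_columns(csv_text, column_notes):
    """Return the header names in the data that have no metadata note yet."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return [name for name in header if name not in column_notes]


def render_metadata(data_file_name, column_notes, missing_value):
    """Build the plain-text metadata note to store next to the data file."""
    lines = [
        f"Data file: {data_file_name}",
        f"Missing value identifier: {missing_value}",
        "Columns:",
    ]
    for name, note in column_notes.items():
        lines.append(f"  {name}: {note['description']} (units: {note['units']})")
    return "\n".join(lines) + "\n"


sample_csv = "date,temperature_c,salinity_psu,stage,sex,count,notes\n"
missing_notes = undocumented_columns(sample_csv, COLUMN_NOTES)
metadata_text = render_metadata("counts.csv", COLUMN_NOTES, MISSING_VALUE)
print("Columns still needing documentation:", missing_notes)
print(metadata_text)
```

Running a check like this whenever a column is added is one way to keep the notes from drifting away from the spreadsheet before the metadata are formalized.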

3. Policies for access, sharing & reuse

We are required to share our data with the CAISN network after all data have been collected and metadata have been generated. This should be no more than 6 months after the experiments are completed. In order to gain access to CAISN data, interested parties must contact the CAISN data manager ( or the authors and e