Logistics for Webinar
You must call in for audio: 866-740-1260 access code 9870179#
Participants muted
Ask questions in chat any time
20 minutes for Q&A
Recording & slides, schedule of webinars: blog.dmptool.org/webinar-series
DMPTool Webinar Series 8: Data Curation Profiles & the DMPTool
Sponsored by IMLS
13 August 2013
28 May Introduction to the DMPTool
4 June Learning about data management: Resources, tools, materials
18 June Customizing the DMPTool for your institution
25 June Environmental Scan: Who's important at your campus
9 July Promoting institutional services; EZID Outreach Made Simple!
16 July Health Sciences & DMPTool - Lisa Federer, UCLA
30 July Digital humanities and the DMPTool - Miriam Posner, UCLA
13 Aug Data curation profiles and the DMPTool – Jake Carlson, Purdue
27 Aug Talking points for meeting with institutional stakeholders
10 Sep Tools and resources that work with/complement the DMPTool
Beyond funder requirements: more extensive DMPs
Case studies 1 – How librarians have successfully used the tool
Case studies 2 – How librarians have successfully used the tool
Outreach Kit Introduction
Certification program introduction
Data Curation Profiles & the DMPTool
Jake Carlson
Associate Professor of Library Science / Data Services Specialist
Purdue University Libraries
Road Map
• History / Background of the DCP Toolkit
• Comparing the DMP and the DCP
• Case Study in using the DCP
“Investigating Data Curation Profiles across Research Domains”
• Awarded in 2007 to Purdue Libraries and Graduate School of Library and Information Science at UIUC
• Goals of the project:
– To understand the practices, attitudes and needs of researchers in managing and sharing their data.
– To identify possible roles for librarians to facilitate data sharing and curation.
– To develop a tool for librarians to gather information on researcher needs for their data.
Interview areas: 20 faculty, 12 disciplines
Agronomy & Soil Science (Purdue & UIUC),
Anthropology (UIUC), Biochemistry (Purdue),
Biology (Purdue), Civil Engineering (Purdue),
Earth & Atmospheric Sciences (Purdue & UIUC),
Electrical & Computer Engineering (Purdue),
Food Science (Purdue), Geology (UIUC),
Horticulture & Plant Science (Purdue & UIUC),
Kinesiology (UIUC), Speech and Hearing (UIUC)
What we asked …
• Research Data Lifecycle (story of the data)
• Characteristics of the Data
• Data Management / Storage
• Data Dissemination and Sharing
• Data Preservation and Repositories
• Roles for Libraries and Librarians
Prioritize your needs for the following types of services
The ability to cite this dataset in my publications
The ability for researchers within my discipline to easily find this dataset
The ability for researchers outside of my discipline to easily find this dataset
The ability for people to easily discover this dataset using Google
Witt, M. (2009, May 18). Eliciting Faculty Requirements for Research Data Repositories. 4th Int’l Conference on Open Repositories, Georgia Tech, Atlanta, GA.
n=19
Prioritize your needs for the following types of services
The ability for me to submit this dataset to a repository myself
The process of submitting this dataset to a repository is automated
The ability to make these data accessible in multiple formats
The ability of the repository to provide version control for the data
Witt, M. (2009, May 18). Eliciting Faculty Requirements for Research Data Repositories. 4th Int’l Conference on Open Repositories, Georgia Tech, Atlanta, GA.
n=19
An interview-based tool for gathering:
• Information about a particular data set.
• What a researcher is doing to manage / curate the data set.
• What a researcher would like to do with the data.
http://datacurationprofiles.org
DCP Sections
• Information about the Data and its Context
– Overview of the Research
  • Focus
  • Intended Audience
  • Funding
– Data Kinds and Stages
  • Data Narrative (data lifecycle)
  • Target Data for Sharing
  • Use/re-use Value
  • Contextual Narrative
Data Stage | Output | Typical File Size | Format | Other / Notes
Primary Data
Raw | Sensor data | 100k in 1 file per day | proprietary to the sensor | FTP downloads are mostly automated.
Processing Stage 1 | Sensor data (open/accessible format) | Roughly 6kb | .csv / .xls | Data are formatted into .csv before being reformatted into a MySQL database.
Processed | Data vectors | 800 records per intersection per day | SQL / .xls | Data are extracted from the MySQL database for analysis purposes.
Analyzed | Charts/graphs | | .xls / .emf | Charts and graphs used for interpretation.
Published | Charts/graphs | | .ppt | Data are presented via PowerPoint.
Ancillary Data
Image | Stills taken from video | | .gif / .jpg / .ppt | Images generated from video.
More DCP Sections
• Information about Needs
– Intellectual Property
– Organization and description of data
– Ingest
– Access
– Discovery
– Tools
– Interoperability
– Measuring Impact
– Data Management
– Preservation
Context
• DMPTool: Focused on a specific context: developing a data management plan for submission to a funding agency.
• DCP Toolkit: Focused on a broad context: understanding the researcher’s data and needs well enough to respond.
Timing
• DMPTool: For use in the “Planning Stages” of the Data Lifecycle
• DCP Toolkit: For use in the “Active Data Stages” of the Data Lifecycle
“The Research Lifecycle” model developed by the University of Virginia Library’s Scientific Data Consulting Group.
Structure
• The DMPTool’s structure is based on the specific elements of the agency’s data management plan.
• The DCP Toolkit is modular in nature. Questions and sections can be changed.
Level of Investment
• Generating a DMP using the DMPTool is a short-term investment.
• Generating a DCP is a longer-term investment, but with a potentially large payoff.
Sharable Output
• Data management plans are intended to be submitted to a funding agency, not to be shared publicly.
• Data curation profiles are intended to be shared with others.
http://docs.lib.purdue.edu/dcp
• Both tools seek to help researchers identify and address needs in managing and curating data.
• In particular, both tools aim to foster the creation of data that are discoverable, accessible, well-described and usable by others.
“The Research Lifecycle” model developed by the University of Virginia Library’s Scientific Data Consulting Group.
• Both tools can be used to help librarians connect with researchers about their data.
• Both organizations recognize and support the roles of librarians in providing services to support the data lifecycle.
Case Study: Water Quality Field Station
with Marianne Bracke
Agricultural Sciences Information Specialist
Associate Professor of Library Science
Purdue University Libraries
The Water Quality Field Station
On a 991-acre farm facility northwest of Purdue; opened in 1992.
Used to identify agricultural practices that minimize movement of agricultural chemicals into water supplies.
Informs the development of new and more ecologically-balanced technologies for crop production.
Graduate Students
Graduate students are on the front lines of data.
Sharing data locally, between graduate students, was challenging to do.
Project Steps
Utilize Data Curation Profiles to collect information about current data gathering, workflow and documentation.
Identify common issues and needs as observed in the Data Curation Profiles.
Produce a report with recommendations and possible approaches to addressing issues and needs.
Identify
Assess
Analyze
Identify
6 interviews with Graduate Students conducted in summer of 2011.
Developed Data Curation Profiles from these interviews.
Reviewed DCPs for needs.
Analyze
There is a lack of clear and shared expectations on how data should be documented, described and organized.
Locally – variation of practice by individual by circumstance, previous training / experience, intended use, etc.
Discipline – there is a lack of standards specifically for Agronomy data.
Analyze
Data are not being generated or processed in ways that could facilitate sharing externally, or even locally at Purdue or within the lab.
Inheriting data from previous graduate students was common and potentially problematic.
Many graduate students who had received data reported some problems understanding or making use of the data.
Analyze
Graduate Students stated that they lack knowledge and skills of how they should document, describe, organize and manage their data.
These activities tend to be done in relative isolation from the lab, or even the advisor.
Physical lab notebooks are still the primary means of documentation / provenance.
Assess
DMP & DCP Connections
A DMP may uncover issues that merit further investigation through a DCP.
A DCP may uncover data management issues that could inform data management planning.
Another Case Study with DCPs
http://www.dlib.org/dlib/july13/wright/07wright.html
Thanks! Any Questions?
Jake Carlson
Associate Professor of Library Science / Data Services Specialist
Purdue University [email protected]
From Flickr by Jeff Keacher
In 2 weeks: Talking Points for Meeting with Stakeholders
Presenter: Dan Phipps
Tuesday 27 Aug @ 10am PT
Register now!
blog.dmptool.org/webinar-series/
[email protected] [email protected]
Twitter: @TheDMPTool
Blog: blog.dmptool.org
Facebook.com/DMPTool
Questions?