DATA MANAGEMENT PLANS WHAT YOU NEED TO KNOW 21 April 2015

21 April 2015. NIH + NSF Data Sharing Policies What is a Data Management Plan Accountability Data Products, Format, and Metadata Storage, Sharing, and

CONTENTS

NIH + NSF Data Sharing Policies

What is a Data Management Plan

Accountability

Data Products, Format, and Metadata

Storage, Sharing, and Archiving

Budgeting for Data Management

Resources

NIH POLICY ON DATA SHARING

Applies to the sharing of final research data* for research purposes.

Applies to basic research, clinical studies, surveys, and other types of research supported by NIH and to research that involves human subjects and laboratory research that does not involve human subjects.

Applies to applicants seeking $500,000 or more in direct costs in any year of the proposed project period through grants, cooperative agreements, or contracts.

Applies to research applications submitted beginning October 1, 2003.

*Final Research Data - Recorded factual material commonly accepted in the scientific community as necessary to document and support research findings. This does not mean summary statistics or tables. It means the data on which summary statistics and tables are based. For the purposes of this policy, final research data do not include laboratory notebooks, partial datasets, preliminary analyses, drafts of scientific papers, plans for future research, peer review reports, communications with colleagues, or physical objects, such as gels or laboratory specimens.

NSF POLICY ON DATA SHARING

“Investigators are expected to share with other researchers, at no more than incremental cost and within a reasonable time, the primary data, samples, physical collections and other supporting materials created or gathered in the course of work under NSF grants.”

“Grantees are expected to encourage and facilitate such sharing.”

Data refers to any information that can be stored in digital form, including text, numbers, images, video or movies, audio, software, algorithms, equations, animations, models, simulations, etc. Such data may be generated by various means including observation, computation, or experiment.

Applies to research applications submitted on or after January 18, 2011.

HOW NSF AND NIH CONCEPTUALISE DATA SHARING

DIGITAL DATA COLLECTIONS

Research data collections
• Products of one or a few focused research projects

Resource or community data collections
• Serve a specific research community
• Typically fall between research and reference data collections in size, scale, funding, community of users, and duration
• Conform to community standards

Reference data collections
• Serve large segments of the research and education communities
• Conform to robust and comprehensive standards

WHAT IS A DATA MANAGEMENT PLAN (DMP)?

An opportunity for PIs to articulate how they will conform to the FEDERAL data sharing policy for research results.

The DMP is reviewed as an integral part of the proposal, coming under ‘Intellectual Merit’ or ‘Broader Impacts’ or both, as appropriate for the scientific community of relevance.

Data management requirements and plans may change across specific Directorates, Offices, Divisions, Programs, or other NSF/NIH units.

A DMP CONTAINS

The types of data, samples, physical collections, software, curriculum materials, publications, and other materials to be produced in the course of the project;

The standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies);

Policies for access and sharing including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements;

Policies and provisions for re-use, re-distribution, and the production of derivatives; and

Plans for archiving data, samples, and other research products, and for preservation of access to them.

| Element | Description | ? | NSF Mapping |
| --- | --- | --- | --- |
| Data description | A description of the information to be gathered; the nature and scale of the data that will be generated or collected. | Yes | Expected Data |
| Existing data | A survey of existing data relevant to the project and a discussion of whether and how these data will be integrated. | Yes | Expected Data |
| Format | Formats in which the data will be generated, maintained, and made available, including a justification for the procedural and archival appropriateness of those formats. | Yes | Data Format and Dissemination |
| Metadata | A description of the metadata to be provided along with the generated data, and a discussion of the metadata standards used. | Yes | Data Format and Dissemination |
| Storage and backup | Storage methods and backup procedures for the data, including the physical and cyber resources and facilities that will be used for the effective preservation and storage of the research data. | Yes | Data Storage and Preservation of Access |
| Security | A description of technical and procedural protections for information, including confidential information, and how permissions, restrictions, and embargoes will be enforced. | Yes | Data Format and Dissemination |
| Responsibility | Names of the individuals responsible for data management in the research project. | Yes | Roles and Responsibility |
| Intellectual property rights | Entities or persons who will hold the intellectual property rights to the data, and how IP will be protected if necessary. Any copyright constraints (e.g., copyrighted data collection instruments) should be noted. | Yes | Data Format and Dissemination |
| Access and sharing | A description of how data will be shared, including access procedures, embargo periods, technical mechanisms for dissemination and whether access will be open or granted only to specific user groups. A timeframe for data sharing and publishing should also be provided. | Yes | Data Storage and Preservation of Access |
| Audience | The potential secondary users of the data. | Yes | Data Format and Dissemination |
| Selection and retention periods | A description of how data will be selected for archiving, how long the data will be held, and plans for eventual transition or termination of the data collection in the future. | Yes | Period of Data Retention |
| Archiving and preservation | The procedures in place or envisioned for long-term archiving and preservation of the data, including succession plans for the data should the expected archiving entity go out of existence. | Yes | Data Storage and Preservation of Access |
| Ethics and privacy | A discussion of how informed consent will be handled and how privacy will be protected, including any exceptional arrangements that might be needed to protect participant confidentiality, and other ethical issues that may arise. | Yes | Data Format and Dissemination |
| Budget | The costs of preparing data and documentation for archiving and how these costs will be paid. Requests for funding may be included. | | |
| Data organization | How the data will be managed during the project, with information about version control, naming conventions, etc. | | |
| Quality Assurance | Procedures for ensuring data quality during the project. | | |
| Legal requirements | A listing of all relevant federal or funder requirements for data management and data sharing. | | |

ACCOUNTABILITY

MANAGING DATA

Explains how the responsibilities regarding the management of your data will be delegated.
• Time allocations
• Project management of technical aspects
• Training requirements
• Contributions of non-project staff: individuals should be named where possible (custodians of the repository/archive you choose to store your data)

ROLES AND RESPONSIBILITIES

Outlines the staff/organizational roles and responsibilities for implementing this data management plan.
• Who will be responsible for data management and for monitoring the data management plan?
• How will adherence to this data management plan be checked or demonstrated?
• What process is in place for transferring responsibility for the data?
• Who will have responsibility over time for decisions about the data once the original personnel are no longer available?

ETHICAL AND LEGAL COMPLIANCE

Is the data regulated by policy or law?

Are there legal constraints (e.g., HIPAA) on sharing data?

How will you handle informed consent with respect to communicating to respondents that the information they provide will remain confidential when data are shared or made available for secondary analysis?
• Determine constraints if classified data, specific handling requirements, IRB/human subject research
• If yes, how will you comply with these constraints?

Write your compliance plan point by point

If applicable, how will you manage disclosure risk in the data to be shared and archived?

Are there intellectual property (e.g., patent, copyright) rights on the datasets?
• Determine restrictions and conditions to share and disseminate
• Does someone else own the data? What are their conditions for use, sharing, and dissemination?

INTERNATIONAL DMPS

Determine DMPs as established by any international research consortia or set forth in formal science and technology agreements signed by the United States Government and foreign counterparts.

This should be addressed with any international research partners when first planning a collaboration.

Talk to the Program Officer for additional assistance.

DATA PRODUCTS, FORMAT, AND METADATA

DATA PRODUCTS

Inputs and outputs (existing, intermediary, and final datasets)

Existing data and sources you are using (Digital and physical collections)

Quantitative Social and Economic Data Sets
• Numeric data sets, geospatial data, spatio-temporal data

Qualitative Information
• Microfilms, historical documents, oral interviews, video tapes, hand written records, transcripts, tables, figures, flowcharts, 3D models, digital audio

Experimental Research
• Tabulated data

Mathematical and Computer Models
• May include descriptions in published articles or fully documented and robust versions of these models

DATA PRODUCTS AND STANDARDS

Determine formats and estimated size, and whether the data will be shared.
• Formats: RTF text, MS Excel converted to CSV, MATLAB, PNG (images), WAV audio, MPEG video, shapefile, as well as any instrument-specific formats or software
• Size/amount: Rate produced, e.g., 1 TB/year, 50 GB/experiment

Metadata should be machine readable for better re-usability and processing.

HINT: Sketching a diagram of data workflow helps to identify datasets and issues re their management.
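
The size/amount estimate above is simple arithmetic, and writing it down makes the assumptions visible to reviewers. The following is only a sketch: the function name, the number of experiments per year, the project length, and the number of copies are all hypothetical values chosen for illustration, not figures from any funder guidance.

```python
def estimate_storage_gb(gb_per_experiment: float,
                        experiments_per_year: int,
                        years: int,
                        copies: int = 1) -> float:
    """Return total gigabytes for the project, counting every stored copy."""
    if min(gb_per_experiment, experiments_per_year, years, copies) < 0:
        raise ValueError("all inputs must be non-negative")
    return gb_per_experiment * experiments_per_year * years * copies


# Hypothetical project: 50 GB per experiment, 20 experiments a year, 3 years,
# with one working copy plus one backup (copies=2).
total_gb = estimate_storage_gb(50, 20, 3, copies=2)
print(f"{total_gb:.0f} GB (about {total_gb / 1000:.1f} TB)")
```

Counting backup copies in the estimate keeps the storage figure consistent with whatever backup procedure the plan describes.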

DEFINING YOUR DATA

Give a short description of what "data" will mean in your research

What data will be generated in the research?

What data types will you be creating or capturing?

How will you capture or create the data?

If you will be using existing data, state that fact and include where you got it.

What is the relationship between the data you are collecting and the existing data?

What data will be preserved and shared?

METADATA

“Data about data”

Typical functions
• Discovery tool
• Rights management
• Version identification
• Certify authenticity
• Status indicator
• Defines content structure
• Interoperability
• Situates geospatially
• Process descriptions
• Access and transfer

[Diagram: Objectives (Objectives, Principles); Domains (Discipline, Genre, Format); Architecture (Structure, Extent, Granularity)]
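
To make "data about data" concrete, here is a minimal sketch of a machine-readable metadata record that covers a few of the functions listed (discovery, rights management, version identification). The field names and values are hypothetical and do not follow any particular metadata standard; a real plan would map them onto the standard chosen for the discipline.

```python
import json


def build_metadata(title, creator, version, rights, file_format, keywords):
    """Collect descriptive fields into one dictionary (illustrative field names)."""
    return {
        "title": title,              # discovery
        "creator": creator,          # discovery
        "keywords": list(keywords),  # discovery
        "version": version,          # version identification
        "rights": rights,            # rights management
        "format": file_format,       # tells users which tools can read the file
    }


# Hypothetical dataset used only for illustration.
record = build_metadata(
    title="Example survey responses",
    creator="Example Research Lab",
    version="1.0",
    rights="Restricted use; data-sharing agreement required",
    file_format="CSV",
    keywords=["survey", "example"],
)

# JSON is plain text, so both people and programs can read the record.
text = json.dumps(record, indent=2, sort_keys=True)
print(text)
```

Storing the record as a small text file next to the dataset keeps the description and the data together when files are copied or archived.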

DATA AND METADATA STANDARDS

What details (metadata) are necessary for others to use your data?

List standards for formats or metadata for your datasets.

Document why you selected them

Describe the method by which metadata will be generated.

Document naming conventions/schema for your data.

List the data dictionaries/taxonomies/ontologies you will use for your data.

Describe how you will track versions of the datasets.

List and describe the tools that are necessary to use the datasets.
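
A naming convention and a version-tracking rule are easier to follow when they can be checked automatically. The sketch below assumes one hypothetical convention, `project_kind_YYYYMMDD_vNN.ext`; the pattern, the function names, and the example file are illustrative only and should be replaced by whatever convention the plan actually documents.

```python
import re
from datetime import date, datetime

# Hypothetical convention: project_kind_YYYYMMDD_vNN.ext (lowercase, no spaces).
NAME_PATTERN = re.compile(
    r"^(?P<project>[a-z0-9]+)_(?P<kind>[a-z0-9]+)_"
    r"(?P<date>\d{8})_v(?P<version>\d{2})\.(?P<ext>[a-z0-9]+)$"
)


def make_name(project: str, kind: str, day: date, version: int, ext: str) -> str:
    """Build a file name and reject anything that breaks the convention."""
    name = f"{project}_{kind}_{day:%Y%m%d}_v{version:02d}.{ext}"
    if not NAME_PATTERN.match(name):
        raise ValueError(f"name does not follow the convention: {name}")
    return name


def next_version(name: str) -> str:
    """Return the same file name with its version number increased by one."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"name does not follow the convention: {name}")
    day = datetime.strptime(match["date"], "%Y%m%d").date()
    return make_name(match["project"], match["kind"], day,
                     int(match["version"]) + 1, match["ext"])


first = make_name("dmp", "survey", date(2015, 4, 21), 1, "csv")
print(first)                # dmp_survey_20150421_v01.csv
print(next_version(first))  # dmp_survey_20150421_v02.csv
```

Keeping earlier versions and only ever adding a higher number gives a simple audit trail even without dedicated version-control software.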

COMMON METADATA STANDARDS

OAIS, Open Archival Information System

CSDGM, Content Standard for Digital Geospatial Metadata

ICPSR, Inter-university Consortium for Political and Social Research

DDI, Data Documentation Initiative (best practices: data life cycle and longitudinal data)

SDMX, Statistical Data and Metadata eXchange

XML, Extensible Markup Language

CITING YOUR DATASET

Citation is the preferred form of acknowledgement.

Should include a DOI to establish the authoritative data source, or a PURL (Persistent Uniform Resource Locator).

Citation: Involuntary Commitment Data, public use dataset [restricted use data, if appropriate]. Produced and distributed by the PSRDC, College of Behavioral and Community Sciences, University of South Florida (year data were downloaded). URL

Acknowledgement: The collection of data used in this study was partly supported by the National Institutes of Health under grant number R01 HD069609 and the National Science Foundation under award number 1157698.

STORAGE, SHARING, & ARCHIVING

DATA STORAGE DURING PROJECT

Document which of the digital or non-digital datasets listed will NOT be stored or retained during the project.

Document the type of media and the location(s) where the data will be stored and who is responsible.

Document how and where the data will be backed up and who is responsible.

Document any access controls for data and/or data transfers that need to be secured and how these controls will be applied.

DATA SHARING POST PROJECT

Indicate which datasets used or generated will be shared.

Indicate which datasets are in proprietary formats and whether they will be converted to a non-proprietary format for sharing.

Determine the audience who will use the datasets.

Determine acknowledgement protocol

Determine sharing protocols: open access or release upon request.

Account for any delay in the accessibility of your data after your research is done.

Explain details of any embargo periods.

Determine how long data will be kept beyond the life of the project.

Will a third-party service be used to archive or release data?

Set a release date to share the data.

Describe any restrictions on use, sharing, repurposing, etc. of datasets

Include costs of any additional resources (3rd party services, etc.) in budget.

METHODS FOR DATA SHARING

Under the auspices of the PI

Data archive: A place where machine-readable data are acquired, manipulated, documented, and finally distributed to the scientific community for further analysis.

Data enclave: A controlled, secure environment in which eligible researchers can perform analyses using restricted data* resources.

Mixed mode sharing.

*Restricted Data - datasets that cannot be distributed to the general public, because of, for example, participant confidentiality concerns, third-party licensing or use agreements, or national security considerations.

ARCHIVING

Builds upon storage by taking additional steps toward preserving digital files.

Safeguards data against file corruption of storage media.

Includes updating from obsolete formats.

Often includes enhanced discovery and access of datasets.

Includes a preservation strategy and disaster recovery plan.

Often handled by a third-party archiving service or data repository.

Check university guidelines.

Include deposit fees in budget.
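
One common way to detect file corruption is to record a checksum for every file at deposit time and compare it again later. The sketch below uses SHA-256 from the Python standard library; the function names and the manifest layout are assumptions made for illustration and do not describe how any particular archive or repository performs its checks.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Map each file's path (relative to folder) to its checksum."""
    root = Path(folder)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(folder, manifest):
    """List files that were altered, added, or lost since the manifest was made."""
    current = build_manifest(folder)
    names = manifest.keys() | current.keys()
    return sorted(n for n in names if manifest.get(n) != current.get(n))
```

A manifest built with `build_manifest` when the data are deposited can be saved alongside the data, and `changed_files` can be rerun on a schedule; an empty list means every file still matches its recorded checksum.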

EXAMPLES OF DATA SHARING PLANS

Example 1

The proposed research will involve a small sample (less than 20 subjects) recruited from clinical facilities in the New York City area with Williams syndrome. This rare craniofacial disorder is associated with distinguishing facial features, as well as mental retardation. Even with the removal of all identifiers, we believe that it would be difficult if not impossible to protect the identities of subjects given the physical characteristics of subjects, the type of clinical data (including imaging) that we will be collecting, and the relatively restricted area from which we are recruiting subjects. Therefore, we are not planning to share the data.

Example 2

The proposed research will include data from approximately 500 subjects being screened for three bacterial sexually transmitted diseases (STDs) at an inner city STD clinic. The final dataset will include self-reported demographic and behavioral data from interviews with the subjects and laboratory data from urine specimens provided. Because the STDs being studied are reportable diseases, we will be collecting identifying information. Even though the final dataset will be stripped of identifiers prior to release for sharing, we believe that there remains the possibility of deductive disclosure of subjects with unusual characteristics. Thus, we will make the data and associated documentation available to users only under a data-sharing agreement that provides for: (1) a commitment to using the data only for research purposes and not to identify any individual participant; (2) a commitment to securing the data using appropriate computer technology; and (3) a commitment to destroying or returning the data after analyses are completed.

Example 3

This application requests support to collect public-use data from a survey of more than 22,000 Americans over the age of 50 every 2 years. Data products from this study will be made available without cost to researchers and analysts at https://ssl.isr.umich.edu/hrs/. User registration is required in order to access or download files. As part of the registration process, users must agree to the conditions of use governing access to the public release data, including restrictions against attempting to identify study participants, destruction

http://grants.nih.gov/grants/policy/data_sharing/data_sharing_guidance.htm

IMPORTANT POINTS

It is acceptable to state in the DMP that the project is not anticipated to generate data or samples that require management and/or sharing. PIs should note that the statement will be subject to peer review.

If data you generate is owned by your institution, the data access plan must address the institutional strategy for providing access to relevant data and supporting materials.

Open-access publishing is not addressed in the implementation of the data management plan requirement.

BUDGETING FOR DATA MANAGEMENT

Expenses

Documenting, preparing, publishing, disseminating, and sharing research findings and supporting material

Data sharing and archiving

NOTE: If the data have been collected already, a competitive or administrative supplement may be available.

Types of Activities Covered

• Reports
• Reprints
• Page charges or other journal costs (does not cover costs for prior or early publication)
• Illustrations
• Cleanup
• Documentation
• Storage and indexing of data and databases
• Development, documentation and debugging of software
• Storage, preservation, documentation, indexing, etc., of physical specimens, collections or fabricated items

IN CONCLUSION

ESSENTIAL RESOURCES

DMPTool (Argonne Laboratories)

NIH
• Data Sharing Policy and Implementation Guidance
• 8.2 Availability of Research Results

NSF
• NSF Data Sharing Policy
• NSF Data Management Plan Requirements
• NSF Social, Behavioral and Economic (SBE) Directorate-wide Guidance

ICPSR Effective Data Management

Databib Registry of Research Data Repositories

DataONE Best Practices