DATA MANAGEMENT PLANS
WHAT YOU NEED TO KNOW
21 April 2015
CONTENTS
NIH + NSF Data Sharing Policies
What is a Data Management Plan
Accountability
Data Products, Format, and Metadata
Storage, Sharing, and Archiving
Budgeting for Data Management
Resources
NIH POLICY ON DATA SHARING
Applies to the sharing of final research data* for research purposes.
Applies to basic research, clinical studies, surveys, and other types of research supported by NIH and to research that involves human subjects and laboratory research that does not involve human subjects.
Applies to applicants seeking $500,000 or more in direct costs in any year of the proposed project period through grants, cooperative agreements, or contracts.
Applies to research applications submitted beginning October 1, 2003.
*Final Research Data - Recorded factual material commonly accepted in the scientific community as necessary to document and support research findings. This does not mean summary statistics or tables. It means the data on which summary statistics and tables are based. For the purposes of this policy, final research data do not include laboratory notebooks, partial datasets, preliminary analyses, drafts of scientific papers, plans for future research, peer review reports, communications with colleagues, or physical objects, such as gels or laboratory specimens.
NSF POLICY ON DATA SHARING
“Investigators are expected to share with other researchers, at no more than incremental cost and within a reasonable time, the primary data, samples, physical collections and other supporting materials created or gathered in the course of work under NSF grants.”
“Grantees are expected to encourage and facilitate such sharing.”
Data refers to any information that can be stored in digital form, including text, numbers, images, video or movies, audio, software, algorithms, equations, animations, models, simulations, etc. Such data may be generated by various means, including observation, computation, or experiment.
Applies to research applications submitted on or after January 18, 2011.
HOW NSF AND NIH CONCEPTUALISE DATA SHARING
DIGITAL DATA COLLECTIONS
Research data collections
  Products of one or a few focused research projects
Resource or community data collections
  Serve a specific research community
  Typically fall between research and reference data collections in size, scale, funding, community of users, and duration
  Conform to community standards
Reference data collections
  Serve large segments of the research and education communities
  Conform to robust and comprehensive standards
WHAT IS A DATA MANAGEMENT PLAN (DMP)?
An opportunity for PIs to articulate how they will conform to the federal data sharing policy for research results.
The DMP is reviewed as an integral part of the proposal, coming under ‘Intellectual Merit’ or ‘Broader Impacts’ or both, as appropriate for the scientific community of relevance.
Data management requirements and plans may change across specific Directorates, Offices, Divisions, Programs, or other NSF/NIH units.
A DMP CONTAINS
The types of data, samples, physical collections, software, curriculum materials, publications, and other materials to be produced in the course of the project;
The standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies);
Policies for access and sharing including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements;
Policies and provisions for re-use, re-distribution, and the production of derivatives; and
Plans for archiving data, samples, and other research products, and for preservation of access to them.
Element | Description | ? | NSF Mapping
Data description | A description of the information to be gathered; the nature and scale of the data that will be generated or collected. | Yes | Expected Data
Existing data | A survey of existing data relevant to the project and a discussion of whether and how these data will be integrated. | Yes | Expected Data
Format | Formats in which the data will be generated, maintained, and made available, including a justification for the procedural and archival appropriateness of those formats. | Yes | Data Format and Dissemination
Metadata | A description of the metadata to be provided along with the generated data, and a discussion of the metadata standards used. | Yes | Data Format and Dissemination
Storage and backup | Storage methods and backup procedures for the data, including the physical and cyber resources and facilities that will be used for the effective preservation and storage of the research data. | Yes | Data Storage and Preservation of Access
Security | A description of technical and procedural protections for information, including confidential information, and how permissions, restrictions, and embargoes will be enforced. | Yes | Data Format and Dissemination
Responsibility | Names of the individuals responsible for data management in the research project. | Yes | Roles and Responsibility
Intellectual property rights | Entities or persons who will hold the intellectual property rights to the data, and how IP will be protected if necessary. Any copyright constraints (e.g., copyrighted data collection instruments) should be noted. | Yes | Data Format and Dissemination
Access and sharing | A description of how data will be shared, including access procedures, embargo periods, technical mechanisms for dissemination and whether access will be open or granted only to specific user groups. A timeframe for data sharing and publishing should also be provided. | Yes | Data Storage and Preservation of Access
Audience | The potential secondary users of the data. | Yes | Data Format and Dissemination
Selection and retention periods | A description of how data will be selected for archiving, how long the data will be held, and plans for eventual transition or termination of the data collection in the future. | Yes | Period of Data Retention
Archiving and preservation | The procedures in place or envisioned for long-term archiving and preservation of the data, including succession plans for the data should the expected archiving entity go out of existence. | Yes | Data Storage and Preservation of Access
Ethics and privacy | A discussion of how informed consent will be handled and how privacy will be protected, including any exceptional arrangements that might be needed to protect participant confidentiality, and other ethical issues that may arise. | Yes | Data Format and Dissemination
Budget | The costs of preparing data and documentation for archiving and how these costs will be paid. Requests for funding may be included. | |
Data organization | How the data will be managed during the project, with information about version control, naming conventions, etc. | |
Quality Assurance | Procedures for ensuring data quality during the project. | |
Legal requirements | A listing of all relevant federal or funder requirements for data management and data sharing. | |
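As a rough illustration of how the element-to-section mapping above might be used while drafting, the sketch below keeps the mapped rows in a lookup and reports which elements a draft has not yet addressed. The function name `missing_elements` and the idea of tracking "addressed" elements as a set are assumptions made for this example, not part of any NSF or NIH tool.

```python
# Element-to-section mapping transcribed from the table above.
# Rows the table leaves unmapped (Budget, Data organization,
# Quality Assurance, Legal requirements) are omitted.
NSF_MAPPING = {
    "Data description": "Expected Data",
    "Existing data": "Expected Data",
    "Format": "Data Format and Dissemination",
    "Metadata": "Data Format and Dissemination",
    "Storage and backup": "Data Storage and Preservation of Access",
    "Security": "Data Format and Dissemination",
    "Responsibility": "Roles and Responsibility",
    "Intellectual property rights": "Data Format and Dissemination",
    "Access and sharing": "Data Storage and Preservation of Access",
    "Audience": "Data Format and Dissemination",
    "Selection and retention periods": "Period of Data Retention",
    "Archiving and preservation": "Data Storage and Preservation of Access",
    "Ethics and privacy": "Data Format and Dissemination",
}


def missing_elements(addressed):
    """Return mapped elements not yet addressed, grouped by NSF section.

    `addressed` is a hypothetical set of element names the draft covers.
    """
    gaps = {}
    for element, section in NSF_MAPPING.items():
        if element not in addressed:
            gaps.setdefault(section, []).append(element)
    return gaps


# Example: a draft that so far covers only three elements.
draft = {"Data description", "Format", "Responsibility"}
for section, elements in missing_elements(draft).items():
    print(f"{section}: {', '.join(elements)}")
```

A checklist like this only confirms that each element is mentioned; whether the plan is adequate is still judged in review.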
ACCOUNTABILITY
MANAGING DATA
Explains how the responsibilities regarding the management of your data will be delegated.
  Time allocations
  Project management of technical aspects
  Training requirements
  Contributions of non-project staff - individuals should be named where possible (custodians of the repository/archive you choose to store your data)
ROLES AND RESPONSIBILITIES
Outlines the staff/organizational roles and responsibilities for implementing this data management plan.
  Who will be responsible for data management and for monitoring the data management plan?
  How will adherence to this data management plan be checked or demonstrated?
  What process is in place for transferring responsibility for the data?
  Who will have responsibility over time for decisions about the data once the original personnel are no longer available?
ETHICAL AND LEGAL COMPLIANCE
Is the data regulated by policy or law?
Are there legal constraints (e.g., HIPAA) on sharing data?
How will you handle informed consent with respect to communicating to respondents that the information they provide will remain confidential when data are shared or made available for secondary analysis?
  Determine whether constraints apply: classified data, specific handling requirements, IRB/human subjects research.
  If yes, how will you comply with these constraints?
Write your compliance plan point by point
If applicable, how will you manage disclosure risk in the data to be shared and archived?
Are there intellectual property (e.g., patent, copyright) rights on the datasets?
  Determine restrictions and conditions to share and disseminate.
  Does someone else own the data? What are their conditions for use, sharing, and dissemination?
INTERNATIONAL DMPS
Determine DMPs as established by any international research consortia or set forth in formal science and technology agreements signed by the United States Government and foreign counterparts.
This should be addressed with any international research partners when first planning a collaboration.
Talk to the Program Officer for additional assistance.
DATA PRODUCTS, FORMAT, AND METADATA
DATA PRODUCTS
Inputs and outputs (existing, intermediary, and final datasets)
Existing data and sources you are using (digital and physical collections)
Quantitative Social and Economic Data Sets
  Numeric data sets, geospatial data, spatio-temporal data
Qualitative Information
  Microfilms, historical documents, oral interviews, video tapes, hand written records, transcripts, tables, figures, flowcharts, 3D models, digital audio
Experimental Research
  Tabulated data
Mathematical and Computer Models
  May include descriptions in published articles or fully documented and robust versions of these models
DATA PRODUCTS AND STANDARDS
Determine formats and estimated size, and if it will be shared
  Formats: RTF text, MS Excel converted to CSV, MATLAB, PNG (images), WAV audio, MPEG video, shapefile, as well as any instrument-specific formats or software
  Size/amount: Rate produced, e.g., 1 TB/year, 50 GB/experiment
Metadata should be machine readable for better re-usability and processing.
HINT: Sketching a diagram of data workflow helps to identify datasets and issues re their management.
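The size/amount figures on this slide (for example 1 TB/year or 50 GB/experiment) can be turned into a rough project total. The sketch below is one way to do that arithmetic; the project length, the number of experiments per year, the number of stored copies, and the decimal conversion 1 TB = 1000 GB are all assumptions chosen for illustration.

```python
def estimate_storage_gb(rate_gb_per_year=0.0, gb_per_experiment=0.0,
                        experiments_per_year=0, years=1, copies=1):
    """Estimate total storage in GB over the life of a project.

    Adds a steady yearly rate to a per-experiment volume, then multiplies
    by the project length and by the number of copies kept (for example
    one working copy plus one backup).
    """
    yearly = rate_gb_per_year + gb_per_experiment * experiments_per_year
    return yearly * years * copies


# 1 TB/year (taken as 1000 GB) for a hypothetical 3-year project with 2 copies.
steady = estimate_storage_gb(rate_gb_per_year=1000, years=3, copies=2)

# 50 GB/experiment, assuming 12 experiments per year for 3 years, 1 copy.
per_experiment = estimate_storage_gb(gb_per_experiment=50,
                                     experiments_per_year=12, years=3)

print(steady, per_experiment)  # 6000 1800
```

An estimate like this feeds directly into the storage and archiving lines of the budget.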
DEFINING YOUR DATA
Give a short description of what "data" will mean in your research
What data will be generated in the research?
What data types will you be creating or capturing?
How will you capture or create the data?
If you will be using existing data, state that fact and include where you got it.
What is the relationship between the data you are collecting and the existing data?
What data will be preserved and shared?
METADATA
“Data about data”
Typical functions
  Discovery tool
  Rights management
  Version identification
  Certify authenticity
  Status indicator
  Defines content structure
  Interoperability
  Situates geospatially
  Process descriptions
  Access and transfer
[Diagram: Objectives (Objectives, Principles); Domains (Discipline, Genre, Format); Architecture (Structure, Extent, Granularity)]
DATA AND METADATA STANDARDS
What details (metadata) are necessary for others to use your data?
List standards for formats or metadata for your datasets.
Document why you selected them.
Describe the method by which metadata will be generated.
Document naming conventions/schema for your data.
List the data dictionaries/taxonomies/ontologies you will use for your data.
Describe how you will track versions of the datasets.
List and describe the tools that are necessary to use the datasets.
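A naming convention and a machine-readable metadata record can be as simple as the sketch below. The file-name pattern, the field names, and the sample values are hypothetical, chosen only to illustrate the items on this slide; a real project would take its fields from whichever standard it selects.

```python
import json
from datetime import date


def dataset_filename(project, dataset, collected, version, ext):
    """Build a name such as 'demo_survey_20150421_v02.csv'.

    The pattern project_dataset_YYYYMMDD_vNN.ext is a hypothetical
    convention; the point is that date and version are always present.
    """
    return f"{project}_{dataset}_{collected:%Y%m%d}_v{version:02d}.{ext}"


def metadata_record(filename, title, creator, fmt, standard, tools):
    """Return a machine-readable description to store beside the data file."""
    return {
        "file": filename,
        "title": title,
        "creator": creator,
        "format": fmt,
        "metadata_standard": standard,
        "required_tools": tools,
    }


name = dataset_filename("demo", "survey", date(2015, 4, 21), 2, "csv")
record = metadata_record(
    name,
    title="Example survey responses",      # placeholder values throughout
    creator="Example Lab",
    fmt="CSV",
    standard="DDI",
    tools=["any spreadsheet or statistics package"],
)
print(json.dumps(record, indent=2))
```

Writing the record as JSON next to each data file keeps the description readable by both people and software, and putting the version in the file name gives a simple way to track dataset versions.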
COMMON METADATA STANDARDS
OAIS, Open Archival Information System
CSDGM, Content Standard for Digital Geospatial Metadata
ICPSR, Inter-university Consortium for Political and Social Research
DDI, Data Documentation Initiative (best practices: data life cycle and longitudinal data)
SDMX, Statistical Data and Metadata eXchange
XML, Extensible Markup Language
CITING YOUR DATASET
Citation is the preferred form of acknowledgement
Should include a DOI to establish the authoritative data source, or a PURL (Persistent Uniform Resource Locator)
Citation: Involuntary Commitment Data, public use dataset [restricted use data, if appropriate]. Produced and distributed by the PSRDC, College of Behavioral and Community Sciences, University of South Florida (year data were downloaded). URL
Acknowledgement: The collection of data used in this study was partly supported by the National Institutes of Health under grant number R01 HD069609 and the National Science Foundation under award number 1157698.
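The citation pattern shown above can be generated consistently with a small helper. The sketch below follows the wording of the example citation; the function names, the year, and the DOI `10.1234/example` are placeholders, not real identifiers. The only outside fact relied on is that a DOI resolves through `https://doi.org/`.

```python
def doi_url(doi):
    """Turn a bare DOI into its resolver URL."""
    return "https://doi.org/" + doi


def format_citation(title, producer, year_downloaded, locator, restricted=False):
    """Format a dataset citation following the example on this slide."""
    kind = "restricted use data" if restricted else "public use dataset"
    return (f"{title}, {kind}. Produced and distributed by {producer} "
            f"({year_downloaded}). {locator}")


citation = format_citation(
    "Involuntary Commitment Data",
    "the PSRDC, College of Behavioral and Community Sciences, "
    "University of South Florida",
    2015,                         # assumed year of download
    doi_url("10.1234/example"),   # placeholder DOI
)
print(citation)
```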
STORAGE, SHARING, & ARCHIVING
DATA STORAGE DURING PROJECT
Document which of the digital or non-digital datasets listed will NOT be stored or retained during the project.
Document the type of media and the location(s) where the data will be stored and who is responsible.
Document how and where the data will be backed up and who is responsible.
Document any access controls for data and/or data transfers that need to be secured and how these controls will be applied.
DATA SHARING POST PROJECT
Indicate which datasets used or generated will be shared.
Indicate whether any datasets are in proprietary formats and if they will be converted to a non-proprietary format for sharing.
Determine the audience who will use the datasets.
Determine acknowledgement protocol
Determine sharing protocols: open access or release upon request.
Account for any delay in the accessibility of your data after your research is done.
Explain details of any embargo periods.
Determine how long data will be kept beyond the life of the project.
Will a third-party service be used to archive or release data?
Set a release date to share the data.
Describe any restrictions on use, sharing, repurposing, etc. of datasets
Include costs of any additional resources (3rd party services, etc.) in budget.
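Embargo periods, release dates, and retention periods from the list above are easier to state precisely once they are computed from the project end date. The sketch below does this with the standard library; the 12-month embargo, 5-year retention period, and project end date are illustrative assumptions, not funder requirements.

```python
import calendar
from datetime import date


def add_months(start, months):
    """Shift a date by whole months, clamping to the last day of the month."""
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def sharing_timeline(project_end, embargo_months, retention_years):
    """Return the data release date and the end of the retention period."""
    return {
        "release_date": add_months(project_end, embargo_months),
        "retain_until": add_months(project_end, 12 * retention_years),
    }


# Hypothetical project ending 30 June 2018.
timeline = sharing_timeline(date(2018, 6, 30), embargo_months=12,
                            retention_years=5)
print(timeline["release_date"], timeline["retain_until"])
# 2019-06-30 2023-06-30
```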
METHODS FOR DATA SHARING
Under the auspices of the PI
Data archive: A place where machine-readable data are acquired, manipulated, documented, and finally distributed to the scientific community for further analysis.
Data enclave: A controlled, secure environment in which eligible researchers can perform analyses using restricted data* resources.
Mixed mode sharing.
*Restricted Data - datasets that cannot be distributed to the general public, because of, for example, participant confidentiality concerns, third-party licensing or use agreements, or national security considerations.
ARCHIVING
Builds upon storage by taking additional steps toward preserving digital files.
Safeguards data against file corruption and failure of storage media.
Includes updating from obsolete formats.
Often includes enhanced discovery and access of datasets.
Includes a preservation strategy and disaster recovery plan.
Often handled by a third-party archiving service or data repository.
Check university guidelines.
Include deposit fees in budget.
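One common way to detect file corruption is to record a checksum for every file at deposit time and recompute it later. The sketch below shows the idea with SHA-256 from the Python standard library; the function names and the manifest layout are assumptions for illustration, and an archiving service will normally have its own fixity procedures.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Map each file's path (relative to folder) to its checksum."""
    root = Path(folder)
    return {str(p.relative_to(root)): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def changed_files(folder, manifest):
    """List files from the manifest that are now missing or altered."""
    current = build_manifest(folder)
    return sorted(name for name in manifest
                  if current.get(name) != manifest[name])
```

In use, `build_manifest` would be run once when the data are archived and the result stored with the deposit; running `changed_files` periodically against that stored manifest flags any file whose contents no longer match.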
EXAMPLES OF DATA SHARING PLANS
Example 1
The proposed research will involve a small sample (less than 20 subjects) recruited from clinical facilities in the New York City area with Williams syndrome. This rare craniofacial disorder is associated with distinguishing facial features, as well as mental retardation. Even with the removal of all identifiers, we believe that it would be difficult if not impossible to protect the identities of subjects given the physical characteristics of subjects, the type of clinical data (including imaging) that we will be collecting, and the relatively restricted area from which we are recruiting subjects. Therefore, we are not planning to share the data.
Example 2
The proposed research will include data from approximately 500 subjects being screened for three bacterial sexually transmitted diseases (STDs) at an inner city STD clinic. The final dataset will include self-reported demographic and behavioral data from interviews with the subjects and laboratory data from urine specimens provided. Because the STDs being studied are reportable diseases, we will be collecting identifying information. Even though the final dataset will be stripped of identifiers prior to release for sharing, we believe that there remains the possibility of deductive disclosure of subjects with unusual characteristics. Thus, we will make the data and associated documentation available to users only under a data-sharing agreement that provides for: (1) a commitment to using the data only for research purposes and not to identify any individual participant; (2) a commitment to securing the data using appropriate computer technology; and (3) a commitment to destroying or returning the data after analyses are completed.
Example 3
This application requests support to collect public-use data from a survey of more than 22,000 Americans over the age of 50 every 2 years. Data products from this study will be made available without cost to researchers and analysts at https://ssl.isr.umich.edu/hrs/. User registration is required in order to access or download files. As part of the registration process, users must agree to the conditions of use governing access to the public release data, including restrictions against attempting to identify study participants, destruction
http://grants.nih.gov/grants/policy/data_sharing/data_sharing_guidance.htm
IMPORTANT POINTS
It is acceptable to state in the DMP that the project is not anticipated to generate data or samples that require management and/or sharing. PIs should note that the statement will be subject to peer review.
If data you generate is owned by your institution, the data access plan must address the institutional strategy for providing access to relevant data and supporting materials.
Open-access publishing is not addressed in the implementation of the data management plan requirement.
BUDGETING FOR DATA MANAGEMENT
Expenses
  Documenting, preparing, publishing, disseminating, and sharing research findings and supporting material
  Data sharing and archiving
NOTE: If the data have been collected already, a competitive or administrative supplement may be available.
Types of Activities Covered
• Reports
• Reprints
• Page charges or other journal costs (does not cover costs for prior or early publication)
• Illustrations
• Cleanup
• Documentation
• Storage and indexing of data and databases
• Development, documentation and debugging of software
• Storage, preservation, documentation, indexing, etc., of physical specimens, collections or fabricated items.
IN CONCLUSION
ESSENTIAL RESOURCES
DMPTool (Argonne Laboratories)
NIH: Data Sharing Policy and Implementation Guidance; 8.2 Availability of Research Results
NSF: NSF Data Sharing Policy; NSF Data Management Plan Requirements; NSF Social, Behavioral and Economic (SBE) Directorate-wide Guidance
ICPSR: Effective Data Management
Databib: Registry of Research Data Repositories
DataONE: Best Practices