Martin Donnelly
Digital Curation Centre
University of Edinburgh
What is research data and why manage it?
An introduction to the issues and drivers, benefits and funder requirements
University of Stirling, 25 March 2013
Running order
I. DEFINITIONS
II. DRIVERS
III. RULES AND (IN)EQUATIONS
- Group Exercise (30 mins)
I. DEFINITIONS
- Digital Curation Centre, est. 2004
- Three partners: Edinburgh, Glasgow and Bath
- Primary funder is JISC
Helping to build capacity, capability and skills in data management and curation across the UK’s higher education research community
- DCC Phase 3 Business Plan
www.dcc.ac.uk
What Kinds of Data?
…whatever is produced in research or evidences its outputs
What is Research Data?
• Facts
• Statistics
• Qualitative
• Quantitative
• Unpublished research outputs
• Discipline specific
“Data underpins our economy and our society - data about how much is being spent and where, data about how schools, hospitals and police are performing, data about where things are and data about the weather.”
Tim Berners-Lee, Director of W3C
A Data Gift?
“the active management and appraisal of data over the lifecycle of scholarly and scientific interest”
Data management is a part of good research practice
What is Research Data Management?
Data is (usually) central to the process
The six datacentric phases of the research lifecycle
http://www.flickr.com/photos/thinkmulejunk/352387473/
http://www.google.co.uk/imgres?q=illumina+bgi&hl=en&client=firefox-a&hs=Jl2&rls=org.mozilla:en-GB:official&biw=1366&bih
http://www.flickr.com/photos/wasp_barcode/4793484478/
http://www.flickr.com/photos/charleswelch/3597432481//
http://www.flickr.com/photos/usfsregion5/4546851916//
Data...
Proliferation
Data...
Why manage HE research data?
- Research integrity (defend findings)
- Research impact (linking data and publication, making data citable)
- Supports / enables reuse, which keeps funders happy
- Maximises value and increases ROI, which keeps govt happy
- Helps to meet regulatory requirements
- Can control costs (via capacity planning etc)
Attitudes / approaches
- The term “research data” means different things to different people in HE
- Researchers may care enormously about their data, so much so that they worry about it going out into the world on its own
- Others (e.g. those with responsibility for compliance) may worry about it not going out into the world, or going out when it shouldn’t / underdressed
- Some may not recognise the relevance of ‘data’ in what they do…
“While many researchers are positive about sharing data in principle, they are almost universally reluctant in practice. … using these data to publish results before anyone else is the primary way of gaining prestige in nearly all disciplines.” INCREMENTAL Project
“Data sharing was more readily discussed by early career researchers.”
Open to all? Case studies of openness in research
Choices are made according to context, with degrees of openness reached according to:
• The kinds of data to be made available
• The stage in the research process
• The groups to whom data will be made available
• On what terms and conditions it will be provided
Default position of most:
• YES to protocols, software, analysis tools, methods and techniques
• NO to making research data content freely available to everyone
Angus Whyte, RIN/NESTA, 2010
II. DRIVERS
“Surfing the Tsunami”
Science: 11 February 2011
The data deluge
• Public good
• Preservation
• Discovery
• Confidentiality
• First use
• Recognition
• Public funding
RCUK Policy and Code of Conduct on the Governance of Good Research Conduct
Unacceptable research conduct includes mismanagement or inadequate preservation of data and/or primary materials, including failure to:
– keep clear and accurate records of the research procedures followed and the results obtained, including interim results;
– hold records securely in paper or electronic form;
– make relevant primary data and research evidence accessible to others for reasonable periods after the completion of the research: data should normally be preserved and accessible for 10 yrs (in some cases 20 yrs or longer);
– manage data according to the research funder’s data policy and all relevant legislation;
– wherever possible, deposit data permanently within a national collection.
Responsibility for proper management and preservation of data and primary materials is shared between the researcher and the research organisation.
http://www.epsrc.ac.uk/about/standards/researchdata/Pages/expectations.aspx
April 2011 - EPSRC Letter to VCs
EPSRC expects all those institutions it funds:
- to develop a roadmap that aligns their policies and processes with EPSRC’s expectations by 1st May 2012
- to be fully compliant with these expectations by 1st May 2015
http://www.dcc.ac.uk/resources/policy-and-legal
Institutional Policies
JISC Legal
Data Access as Headline News
6.9 The Research Councils expect the researchers they fund to deposit published articles or conference proceedings in an open access repository at or around the time of publication. But this practice is unevenly enforced. Therefore, as an immediate step, we have asked the Research Councils to ensure the researchers they fund fulfil the current requirements. Additionally, the Research Councils have now agreed to invest £2 million in the development, by 2013, of a UK ‘Gateway to Research’. In the first instance this will allow ready access to Research Council funded research information and related data but it will be designed so that it can also include research funded by others in due course. The Research Councils will work with their partners and users to ensure information is presented in a readily reusable form, using common formats and open standards.
http://www.bis.gov.uk/assets/biscore/innovation/docs/i/11-1387-innovation-and-research-strategy-for-growth.pdf
Government pressure…
“We have opened up much public data already, but need to go much further in making this data accessible. We believe publicly funded research should be freely available. We have commissioned independent groups of academics and publishers to review the availability of published research, and to develop action plans for making this freely available”
Making Public Data Accessible
The Open Data Institute (ODI) will be the first of its kind, a pioneering centre of innovation, driven by the UK Government’s Open Data policy
Data for Impact
• Research Excellence Framework (REF) measures researcher contributions and their impact
• Has struggled in terms of its breadth when it comes to extending beyond paper-based metrics
• Researchers are wary of spending time on activity that doesn’t count towards the REF
• REF panels now allow submission of “a substantial, coherent and widely admired data set or research resource”
Data Citation
• Data access raises visibility
• Data with DOI = citable research output
• Data citations are good for researchers
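To make "data with DOI = citable research output" concrete, here is a minimal sketch of assembling a dataset citation string. The field order (creator, year, title, publisher, DOI link) follows one widely used pattern and is an assumption, not something the slide specifies; the function name, the dataset, and the DOI shown are all made up for illustration.

```python
def format_data_citation(creators, year, title, publisher, doi):
    """Build a human-readable dataset citation ending in a resolvable DOI link.

    Hypothetical helper: the field order is an assumed convention,
    not a rule taken from the slides.
    """
    authors = "; ".join(creators)
    return f"{authors} ({year}): {title}. {publisher}. https://doi.org/{doi}"


# Illustrative values only; this dataset and DOI do not exist.
citation = format_data_citation(
    creators=["Smith, J.", "Jones, A."],
    year=2013,
    title="Example Survey Data",
    publisher="Example University Repository",
    doi="10.5072/example-dataset",
)
print(citation)
```

The point is simply that a persistent identifier lets a dataset be cited in a reference list in much the same way as a paper.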
III. RULES AND (IN)EQUATIONS
STORAGE ≠ MANAGEMENT
Greenhouse = storage
Horticulture = management
DATA MANAGEMENT ≠ SHARING
Rule 1. Don’t Share It All
But! You generally need a reason NOT to share, e.g.
- Commercial interests
- Ethical concerns
- Data Protection Act
Various factors at play…
• Law(s) of the land(s) (FOI, DPA)
• Government pressure
• Funder policies (and expectations)
• Publisher policies
• Institutional policies
• Disciplinary norms
• Ethical considerations
• Commercial interests / partnerships
Why not?
1. We probably can’t afford the costs of storage: increasing volumes outpace declining storage hardware costs
and
2. We probably can’t afford the time it will take to ensure it remains accessible/discoverable
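The first point can be illustrated with a little arithmetic: if the volume held grows faster than the unit price of storage falls, the yearly bill still rises. The sketch below assumes simple compound rates; the function name and every figure in it are hypothetical and chosen only to show the shape of the problem.

```python
def projected_storage_cost(initial_volume_tb, initial_cost_per_tb,
                           volume_growth, price_decline, years):
    """Return the yearly storage bill for year 0 through `years`.

    volume_growth and price_decline are annual fractions
    (0.60 means +60% volume; 0.20 means -20% unit price).
    """
    costs = []
    volume = initial_volume_tb
    cost_per_tb = initial_cost_per_tb
    for _ in range(years + 1):
        costs.append(volume * cost_per_tb)
        volume *= 1 + volume_growth        # more data to hold next year
        cost_per_tb *= 1 - price_decline   # cheaper storage next year
    return costs


# Hypothetical figures: 100 TB today at 50 per TB per year,
# volume up 60% a year, unit price down 20% a year.
costs = projected_storage_cost(100, 50.0, 0.60, 0.20, 5)
for year, cost in enumerate(costs):
    print(f"year {year}: {cost:,.0f}")
```

With these assumed rates the bill is multiplied by (1 + 0.60) × (1 − 0.20) = 1.28 each year, so falling hardware prices slow the increase without reversing it.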
Rule 2. Don’t Keep It All
According to: John Gantz and David Reinsel 2011 Extracting Value from Chaos, http://www.emc.com/digital_universe
http://blog.dshr.org/2012/05/lets-just-keep-everything-forever-in.html
“Keeping 2018’s data in S3 would cost the entire global GDP”
How to decide?
1. Relevance to Mission – including any legal/funder requirement to retain the data beyond its immediate use.
2. Scientific or Historical Value – significance and relationship to publications etc.
3. Uniqueness – can it be found elsewhere / if we don’t preserve it, who will?
4. Potential for Redistribution – quality / IP / ethical concerns are addressed.
5. Non-Replicability – either impossible to replicate (e.g. atmospheric or social science data) or not financially viable.
6. Economic Case – costs of managing and preserving the resource stack up well against potential future benefits.
7. Full Documentation – surrounding / contextual information necessary to facilitate future discovery, access, and reuse is adequate.
How to Appraise & Select Research Data for Curation. Angus Whyte, Digital Curation Centre, and Andrew Wilson, Australian National Data Service (2010)
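The seven criteria above can be recorded as a simple checklist per dataset. The sketch below is only an illustration: the slide gives no scoring rule or threshold, so the code does not produce a keep/discard verdict; it just lists which criteria are met and which still need attention. The function name and the example dataset are hypothetical.

```python
# The seven appraisal criteria, in the order given on the slide.
CRITERIA = [
    "Relevance to Mission",
    "Scientific or Historical Value",
    "Uniqueness",
    "Potential for Redistribution",
    "Non-Replicability",
    "Economic Case",
    "Full Documentation",
]


def appraise(dataset_name, answers):
    """Summarise which criteria a dataset meets.

    `answers` maps a criterion name to True/False; criteria that are
    missing from the mapping are treated as not (yet) met.
    """
    unknown = set(answers) - set(CRITERIA)
    if unknown:
        raise ValueError(f"Unknown criteria: {sorted(unknown)}")
    met = [c for c in CRITERIA if answers.get(c, False)]
    unmet = [c for c in CRITERIA if not answers.get(c, False)]
    return {"dataset": dataset_name, "met": met, "unmet": unmet}


# Hypothetical example: a one-off field survey with thin documentation.
result = appraise("field-survey-2012", {
    "Relevance to Mission": True,
    "Uniqueness": True,
    "Non-Replicability": True,
    "Full Documentation": False,
})
print(result["met"])
print(result["unmet"])
```

The final decision stays with the people doing the appraisal; a structure like this only makes the reasoning visible and comparable across datasets.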
All Together: Institutional Engagements
With funding from HEFCE we’re:
• Working intensively with c. 20 HEIs to increase RDM capability
  – 60 days of effort per HEI drawn from a mix of DCC staff
  – Deploy DCC and external tools, approaches and best practice
• Support varies based on what each institution wants/needs
  – Institution agrees a schedule of work with the DCC, and each assigns a primary contact / programme manager
• Lessons and examples to be shared with the community
www.dcc.ac.uk/community/institutional-engagements
IE activities
Assessing needs
RDM roadmaps
Piloting tools
Policy development
Policy implementation
Data Management Planning: roles and responsibilities for data across the research lifecycle
Group Exercise
Martin Donnelly and Jonathan Rans
Digital Curation Centre
University of Edinburgh
University of Stirling, 25 March 2013
Data and the Research Lifecycle
§1: Introduction and Context
§2: Data Types, Formats, Standards and Capture Methods
§3: Ethics and Intellectual Property
§4: Access, Data Sharing and Re-use
§5: Short-Term Storage and Data Management
§6: Deposit and Long-Term Preservation
§7: Resourcing
§8: Adherence and Review
§9: Agreement/Ratification by Stakeholders
§10: Annexes
DMP Checklist Headings
Checklist for a Data Management Plan (Donnelly and Jones)
Group exercise (20 minutes)
In groups of 4 or 5:
• Select one of the DMP Checklist headings, and brainstorm all the stakeholders you think might be involved (and how/why) – be specific!
• Remember to think of different stages of research: pre-award, in-project, post-project
• We’ll have a short reporting/discussion session at the end
SECTIONS
§1: Introduction and Context
§2: Data Types, Formats, Standards and Capture Methods
§3: Ethics and Intellectual Property
§4: Access, Data Sharing and Re-use
§5: Short-Term Storage and Data Management
§6: Deposit and Long-Term Preservation
§7: Resourcing
§8: Adherence and Review
§9: Agreement/Ratification by Stakeholders
§10: Annexes
N.B.
• There are no ‘right’ or ‘wrong’ answers
• All research projects are different
• The DMP will depend upon the nature of the research AND the context (funder, domain, institution(s) etc)
• DMPs are metadata and communication tools
Notes
QUESTIONS AND CONTACTS
For more information:
– Visit http://www.dcc.ac.uk
– Email [email protected]
– Twitter @mkdDCC
This work is licensed under a Creative Commons Attribution 2.5 UK: Scotland License.
CREDITS
Images:
Slide 3 (Definitions) – http://www.flickr.com/photos/dougbelshaw/
Slide 11 (Feet up) – http://www.flickr.com/photos/chaparral/
Slide 14 (Driver) – http://www.flickr.com/photos/rpmarks/
Slide 26 (Equations) – http://www.flickr.com/photos/billburris/
Slide 28 (Greenhouse) – http://www.flickr.com/photos/mykl/
Thanks also to DCC colleagues for their slides: Kevin Ashley, Liz Lyon, Graham Pryor, Sarah Jones, Marieke Guy