Martin Donnelly Digital Curation Centre University of Edinburgh What is research data and why manage it? An introduction to the issues and drivers, benefits and funder requirements University of Stirling 25 March 2013


Page 1:

Martin Donnelly
Digital Curation Centre
University of Edinburgh

What is research data and why manage it?

An introduction to the issues and drivers, benefits and funder requirements

University of Stirling
25 March 2013

Page 2:

Running order

I. DEFINITIONS
II. DRIVERS
III. RULES AND (IN)EQUATIONS

- Group Exercise (30 mins)

Page 3:

I. DEFINITIONS

Page 4:

- Digital Curation Centre, est. 2004
- Three partners: Edinburgh, Glasgow and Bath
- Primary funder is JISC

Helping to build capacity, capability and skills in data management and curation across the UK’s higher education research community

- DCC Phase 3 Business Plan

www.dcc.ac.uk

Page 5:

What Kinds of Data?

…whatever is produced in research or evidences its outputs

What is Research Data?

• Facts
• Statistics
• Qualitative
• Quantitative
• Unpublished research outputs
• Discipline specific

Page 6:

“Data underpins our economy and our society - data about how much is being spent and where, data about how schools, hospitals and police are performing, data about where things are and data about the weather.”

Tim Berners-Lee, Director of W3C.

A Data Gift?

Page 7:

“the active management and appraisal of data over the lifecycle of scholarly and scientific interest”

Data management is a part of good research practice

What is Research Data Management?

Page 8:

Data is (usually) central to the process

The six datacentric phases of the research lifecycle

Page 9:

http://www.flickr.com/photos/thinkmulejunk/352387473/

http://www.google.co.uk/imgres?q=illumina+bgi&hl=en&client=firefox-a&hs=Jl2&rls=org.mozilla:en-GB:official&biw=1366&bih

http://www.flickr.com/photos/wasp_barcode/4793484478/

http://www.flickr.com/photos/charleswelch/3597432481//

http://www.flickr.com/photos/usfsregion5/4546851916//

Data...

Proliferation

Data...

Page 10:

Why manage HE research data?

- Research integrity (defend findings)
- Research impact (linking data and publication, making data citable)
- Supports / enables reuse, which keeps funders happy
- Maximises value and increases ROI, which keeps govt happy
- Helps to meet regulatory requirements
- Can control costs (via capacity planning etc)

Page 11:

Attitudes / approaches

- The term “research data” means different things to different people in HE

- Researchers may care enormously about their data, so much so that they worry about it going out into the world on its own

- Others (e.g. those with responsibility for compliance) may worry about it not going out into the world, or going out when it shouldn’t / underdressed

- Some may not recognise the relevance of ‘data’ in what they do…

Page 12:

“While many researchers are positive about sharing data in principle, they are almost universally reluctant in practice. ... using these data to publish results before anyone else is the primary way of gaining prestige in nearly all disciplines.” INCREMENTAL Project

“Data sharing was more readily discussed by early career researchers.”

Page 13:

Open to all? Case studies of openness in research

Choices are made according to context, with degrees of openness reached according to:
• The kinds of data to be made available
• The stage in the research process
• The groups to whom data will be made available
• On what terms and conditions it will be provided

Default position of most:
• YES to protocols, software, analysis tools, methods and techniques
• NO to making research data content freely available to everyone

Angus Whyte, RIN/NESTA, 2010

Page 14:

II. DRIVERS

Page 15:

“Surfing the Tsunami”

Science: 11 February 2011

The data deluge

Page 16:

• Public good
• Preservation
• Discovery
• Confidentiality
• First use
• Recognition
• Public funding

Page 17:

RCUK Policy and Code of Conduct on the Governance of Good Research Conduct

Unacceptable research conduct includes mismanagement or inadequate preservation of data and/or primary materials, including failure to:

– keep clear and accurate records of the research procedures followed and the results obtained, including interim results;

– hold records securely in paper or electronic form;

– make relevant primary data and research evidence accessible to others for reasonable periods after the completion of the research: data should normally be preserved and accessible for 10 yrs (in some cases 20 yrs or longer);

– manage data according to the research funder’s data policy and all relevant legislation;

– wherever possible, deposit data permanently within a national collection.

Responsibility for proper management and preservation of data and primary materials is shared between the researcher and the research organisation.

Page 18:

Page 19:

http://www.epsrc.ac.uk/about/standards/researchdata/Pages/expectations.aspx

April 2011 - EPSRC Letter to VCs

EPSRC expects all those institutions it funds:
- to develop a roadmap that aligns their policies and processes with EPSRC’s expectations by 1st May 2012
- to be fully compliant with these expectations by 1st May 2015

Page 20:

http://www.dcc.ac.uk/resources/policy-and-legal

Institutional Policies

Page 21:

JISC Legal

Data Access as Headline News

Page 22:

6.9 The Research Councils expect the researchers they fund to deposit published articles or conference proceedings in an open access repository at or around the time of publication. But this practice is unevenly enforced. Therefore, as an immediate step, we have asked the Research Councils to ensure the researchers they fund fulfil the current requirements. Additionally, the Research Councils have now agreed to invest £2 million in the development, by 2013, of a UK ‘Gateway to Research’. In the first instance this will allow ready access to Research Council funded research information and related data but it will be designed so that it can also include research funded by others in due course. The Research Councils will work with their partners and users to ensure information is presented in a readily reusable form, using common formats and open standards.

http://www.bis.gov.uk/assets/biscore/innovation/docs/i/11-1387-innovation-and-research-strategy-for-growth.pdf

Government pressure…

Page 23:

“We have opened up much public data already, but need to go much further in making this data accessible. We believe publicly funded research should be freely available. We have commissioned independent groups of academics and publishers to review the availability of published research, and to develop action plans for making this freely available”

Making Public Data Accessible

The Open Data Institute (ODI) will be the first of its kind, a pioneering centre of innovation, driven by the UK Government’s Open Data policy

Page 24:

Data for Impact

• Research Excellence Framework (REF) measures researcher contributions and their impact

• Has struggled in terms of its breadth when it comes to extending beyond paper-based metrics

• Wariness of researchers to spend time on activity that doesn’t count to the REF

• REF panels now allow submission of “a substantial, coherent and widely admired data set or research resource”

Page 25:

Data Citation

• Data access raises visibility

• Data with DOI = citable research output

• Data citations are good for researchers

Page 26:

III. RULES AND (IN)EQUATIONS

Page 27:

STORAGE

≠ MANAGEMENT

Page 28:

Greenhouse = storage

Horticulture = management

DATA

Page 29:

MANAGEMENT

≠ SHARING

Page 30:

Rule 1. Don’t Share It All

But! You generally need a reason NOT to share, e.g.
- Commercial interests
- Ethical concerns
- Data Protection Act

Page 31:

Various factors at play…

• Law(s) of the land(s) (FOI, DPA)
• Government pressure
• Funder policies (and expectations)
• Publisher policies
• Institutional policies
• Disciplinary norms
• Ethical considerations
• Commercial interests / partnerships

Page 32:

Rule 2. Don’t Keep It All

Why not?

1. We probably can’t afford the costs of storage: increasing volumes outpace declining storage hardware costs

and

2. We probably can’t afford the time it will take to ensure it remains accessible/discoverable

According to: John Gantz and David Reinsel 2011 Extracting Value from Chaos, http://www.emc.com/digital_universe
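
A rough illustration of the storage-cost point, as a minimal Python sketch. The starting volume, unit cost, growth rate and price decline are made-up assumptions for illustration only; they are not figures from Gantz and Reinsel, and the function name annual_storage_cost is hypothetical. It simply shows that when volume grows faster than unit cost falls, the yearly bill keeps rising.

```python
def annual_storage_cost(volume_tb, cost_per_tb, volume_growth, cost_decline, years):
    """Yearly cost of holding everything kept so far (illustrative model only)."""
    costs = []
    for _ in range(years):
        costs.append(volume_tb * cost_per_tb)
        volume_tb *= 1 + volume_growth      # data volume grows each year
        cost_per_tb *= 1 - cost_decline     # hardware gets cheaper each year
    return costs


# Assumed figures: 100 TB today at 100 (any currency) per TB per year,
# volume growing 50% a year, unit cost falling 20% a year.
costs = annual_storage_cost(100, 100.0, 0.5, 0.2, 5)
for year, cost in enumerate(costs):
    print(f"Year {year}: {cost:,.0f}")
```

With these assumed rates the bill is multiplied by 1.5 × 0.8 = 1.2 each year, so falling hardware prices slow the growth in cost but do not stop it.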

Page 33:

http://blog.dshr.org/2012/05/lets-just-keep-everything-forever-in.html

“Keeping 2018’s data in S3 would cost the entire global GDP”

Page 34:

How to decide?

1. Relevance to Mission – including any legal/funder requirement to retain the data beyond its immediate use.

2. Scientific or Historical Value – significance and relationship to publications etc.

3. Uniqueness – can it be found elsewhere / if we don’t preserve it, who will?

4. Potential for Redistribution – quality / IP / ethical concerns are addressed.

5. Non-Replicability – either impossible to replicate (e.g. atmospheric or social science data) or not financially viable.

6. Economic Case – costs of managing and preserving the resource stack up well against potential future benefits.

7. Full Documentation – surrounding / contextual information necessary to facilitate future discovery, access, and reuse is adequate.

How to Appraise & Select Research Data for Curation. Angus Whyte, Digital Curation Centre, and Andrew Wilson, Australian National Data Service (2010)

Page 35:

All Together: Institutional Engagements

With funding from HEFCE we’re:

• Working intensively with c. 20 HEIs to increase RDM capability
  – 60 days of effort per HEI drawn from a mix of DCC staff
  – Deploy DCC and external tools, approaches and best practice

• Support varies based on what each institution wants/needs
  – Institution agrees a schedule of work with the DCC, and each assigns a primary contact / programme manager

• Lessons and examples to be shared with the community

www.dcc.ac.uk/community/institutional-engagements

Page 36:

IE activities

Assessing needs

RDM roadmaps

Piloting tools

Policy development

Policy implementation

Page 37:

Data Management Planning: roles and responsibilities for data across the research lifecycle

Group Exercise

Martin Donnelly and Jonathan Rans
Digital Curation Centre
University of Edinburgh

University of Stirling
25 March 2013

Page 38:

Data and the Research Lifecycle

Page 39:

DMP Checklist Headings

§1: Introduction and Context
§2: Data Types, Formats, Standards and Capture Methods
§3: Ethics and Intellectual Property
§4: Access, Data Sharing and Re-use
§5: Short-Term Storage and Data Management
§6: Deposit and Long-Term Preservation
§7: Resourcing
§8: Adherence and Review
§9: Agreement/Ratification by Stakeholders
§10: Annexes

Checklist for a Data Management Plan (Donnelly and Jones)

Page 40:

Group exercise (20 minutes)

In groups of 4 or 5:

• Select one of the DMP Checklist headings, and brainstorm all the stakeholders you think might be involved (and how/why) – be specific!

• Remember to think of different stages of research: pre-award, in-project, post-project

• We’ll have a short reporting/discussion session at the end

SECTIONS

§1: Introduction and Context
§2: Data Types, Formats, Standards and Capture Methods
§3: Ethics and Intellectual Property
§4: Access, Data Sharing and Re-use
§5: Short-Term Storage and Data Management
§6: Deposit and Long-Term Preservation
§7: Resourcing
§8: Adherence and Review
§9: Agreement/Ratification by Stakeholders
§10: Annexes

Page 41:

Notes

N.B.
- There are no ‘right’ or ‘wrong’ answers
- All research projects are different
- The DMP will depend upon the nature of the research AND the context (funder, domain, institution(s) etc)
- DMPs are metadata and communication tools

Page 42:

QUESTIONS AND CONTACTS

For more information:

– Visit http://www.dcc.ac.uk
– Email [email protected]
– Twitter @mkdDCC

This work is licensed under a Creative Commons Attribution 2.5 UK: Scotland License.

Page 43:

CREDITS

Images:

Slide 3 (Definitions) – http://www.flickr.com/photos/dougbelshaw/
Slide 11 (Feet up) – http://www.flickr.com/photos/chaparral/
Slide 14 (Driver) – http://www.flickr.com/photos/rpmarks/
Slide 26 (Equations) – http://www.flickr.com/photos/billburris/
Slide 28 (Greenhouse) – http://www.flickr.com/photos/mykl/

Thanks also to DCC colleagues for their slides: Kevin Ashley, Liz Lyon, Graham Pryor, Sarah Jones, Marieke Guy