Inverting the data pyramid: maximising the value of data reuse (IMCW2014/ICKM2014 keynote)

Inverting the Pyramid: Maximising the value of research data to society

Kevin Ashley, Digital Curation Centre

www.dcc.ac.uk @kevingashley

Kevin.ashley@ed.ac.uk

Reusable with attribution: CC-BY

The DCC is supported by Jisc

My home – the DCC

• Mission – to increase capability and capacity for research data services in UK institutions

• Not just a UK problem – an international one

• Training, shared services, guidance, policy, standards, futures

2014-11-25 Kevin Ashley –IMCW/ICKM-2014, Antalya - CC-BY 2

DCC networks and partnerships

Original Slide: Martin Donnelly, DCC

About me

• 35 years ago – a mathematician in medical research

• Acquired a skill for rescuing old data:

– Lost code books

– Lost programs

– Bad or obsolete media or systems

• It was fun – but it should not have been necessary

Generic science data lifecycle

Adapted from: “Harnessing the Power of Digital Data: Taking the Next Step.” Scientific Data Management (SDM) for Government Agencies: Report from the Workshop to Improve SDM.

Lifecycle stages: PLAN, COLLECT, INTEGRATE/TRANSFORM, PUBLISH, DISCOVER, ARCHIVE/DISCARD

E-Science curation report - 2003

Hervé L’Hours’ analysis

• Data lifecycles are linear, cyclical or spiral (sometimes all three)

• See more at http://www.dcc.ac.uk/events/research-data-management-forum-rdmf/rdmf11 - workflows & research data management

• Linear lifecycles are project-based or repository-based

Traditional knowledge management view of data

Image © John Curran @ designedforlearning.co.uk

Image from forwardmotion.eu

But in research…

"DIKW-diagram" by RobOnKnowledge - Own work. Licensed under

Creative Commons Attribution-Share Alike 3.0 via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:DIKW-diagram.png

I ♥ your data!

I don’t ♥ what you said about it.

LIDAR & RADAR images of ice cloud – H. Russchenberg

The Old Weather project

Data for research, not from research

Data reuse stories

• The palaeontologist who saved years of work with archaeological data

• The 19th-century ships’ logs that help us model climate change

• The ‘noise’ from research radar that mapped dust from Eyjafjallajökull

Data reuse - messages

Often your data tells stories that your publications do not

Not all data comes from other researchers

One person’s noise is another person’s signal

Discipline-bounded data discovery doesn’t give us all we need or want

Understanding Biodiversity

• We don’t understand what drives it

• What helps or hinders speciation

• No one project or data source is enough

• Biology, geology, climate science, chemistry…

• Big and small problems

• Reanalysis & gap analysis

Research on Biodiversity…

• Requires many different data sources

• Not all will be published

• Not all publications are for similar research reasons, so…

• Citing the publication is irrelevant

• Some is research data, some government or reference data

Why care?

• Data is expensive – an investment

• Reuse:

– More research

– Teaching & Learning

– Planning

• Impact – with or without publication

• Accountability

• Legal & regulatory requirements

Why does this matter?

• Research quality – How close can we get to the truth?

• Research speed – How quickly can we get to the truth?

• Research finance – How much does the truth cost?

• Improving one or more of these is of interest to all actors:

– Researchers as data creators

– Researchers as data reusers

– Research institutions

– Funders – hence government and society

Creative data reuse

• http://vimeo.com/38402965

Integrity – not without data

• Cyril Burt

– Twin studies on intelligence

– Questioned 1976; now discredited

• Duke case

– Data hiding leads to wasted treatments, clinical trials, probable death & huge lawsuits

• Dutch cases

– Stapel – 55 publications – “fictitious data”

– Poldermans – fabricated data or negligence?

2014-01-08 Kevin Ashley – ESIP Winter 2014 - CC-BY 23

“The case for open data: the Duke Clinical Trials” – blog post, Kevin Ashley, http://www.dcc.ac.uk/news/case-open-data-duke-clinical-trials

“Lies, Damned Lies and Research Data: Can Data Sharing Prevent Data Fraud?” – Doorn, Dillo, van Horik, IJDC 8(1); doi:10.2218/ijdc.v8i1.256

Without data reuse:

• We can waste billions

• People suffer & die

Data reuse from Hubble

Data reuse is already happening – and researchers can change

Where can it happen?

Global, international

Nationally

Institution

By Subject

Research Group

Research data centres are good value!

• See Jisc reports on ADS, BADC, UKDA:

• Returns on investment between 400% and 1200%

• Unfortunately – many research domains have no relevant data centres

http://www.jisc.ac.uk/whatwedo/programmes/di_directions/strategicdirections/badc.aspx

“Provision for data management, for curation and long-term preservation, and for the sharing and re-use of data, varies wildly between subject areas.”

“The data management needs of many researchers are little considered or catered for.”

If greater provision is to be made, a shortfall in infrastructure (both technical and human) must be overcome.

Policy makers are aware that in many areas of enquiry, researchers’ access to well-managed, open and reusable data opens up significant opportunities.

All quotations from the JISC MRD 2 call, 2010

The library as custodian

• An increasing role for the library in providing access to institutional assets

• See Lorcan Dempsey’s thoughts on the inside-out library vs the outside-in library

– http://www.slideshare.net/lisld/the-inside-out-library

• Build on library strengths – preservation, access, curation, selection

G8 UK – endorses OA

Open Data Charter – policy paper, 18 June 2013

Funder requirements

http://www.epsrc.ac.uk/about/standards/researchdata/Pages/policyframework.aspx

UK – RCUK

Canada

USA – NSF, NEH, etc.

Denmark

USA – non-government funders (Sloan, Gates,…)

Europe

RCUK policy – the 1-minute version

• Research data are a public good – make them openly available in a timely & responsible way

• Have policies & plans. Data with long-term value should be preserved & usable

• Metadata for discovery & reuse. Link publications & data

• Sometimes law or ethics get in the way. We understand.

• Limited embargoes OK. Recognition is important – always cite data sources

• OK to use public money to do this. Do it efficiently.

EPSRC policy points

• Awareness of regulatory environment

• Data access statement

• Policies and processes

• Data storage

• Structured metadata descriptions

• DOIs for data

• Securely preserved for a minimum of 10 years from last use

Compliance expected by 2015

DCC Policy Summary

http://www.dcc.ac.uk/resources/policy-and-legal

Helping make data reuse possible – experience from the DCC

Some lessons – a summary

• Data reuse is rarely as simple as people think it is

• It is already happening

• It is good for research, for researchers, for funders, for universities

• Without senior management attention and researcher involvement, your initiative will fail

• Research data management services cannot involve the library alone

• Researchers need to know your services exist

• Training for young researchers in good data practice is valuable

DCC ‘institutional engagement’

Assess needs

Make the case

Develop support and services

RDM policy development

Customised Data Management Plans

DAF & CARDIO assessments

Guidance and training

Workflow assessment

DCC support team

Advocacy with senior management

Institutional data catalogues

Pilot RDM tools

…and support policy implementation

Original Slide: Graham Pryor, DCC

Some institutional roles

• Leadership – coordinate action

• Audit – who has what, where does it go?

• Advice on access – data, wherever it is

• Preservation – permanence

• Citability

• Data/publication linking

• Promoting data in teaching

• Selection

• Education – early career researchers

Who (in the UK) is leading RDM work?

Library

IT

Research Office

RESEARCHERS

INSTITUTIONAL SERVICES

Some example services

• Storage – persistent, shareable

• Permanent, citable identifiers

• Database as a service (e.g. Oxford ORDS)

• Embed tools in Excel – DataUp, others

• Workflow management – Taverna

• Training for early career researchers

Make data creation easier

Make data citable

• Making data available increases citations

• Everyone – academic, funder, institution – loves citations

• Want evidence?

– Alter, Pienta, Lyle – 240%, social sciences *

– Piwowar, Vision – 9% (microarray data) †

– Henneken, Accomazzi – 20% (astronomy) #

* Pienta A, Alter G, Lyle J. (2010) The Enduring Value of Social Science Research: The Use and Reuse of Primary Research Data. http://hdl.handle.net/2027.42/78307

† Piwowar H, Vision TJ. (2013) Data reuse and the open data citation advantage. PeerJ PrePrints 1:e1v1. http://dx.doi.org/10.7287/peerj.preprints.1v1

# Henneken E, Accomazzi A. (2011) Linking to Data - Effect on Citation Rates in Astronomy. http://arxiv.org/abs/1111.3618

Make data discoverable

• Data must be discoverable to be reused

• Alone, or in conjunction with publication

• Services include:

– Institutional catalogues

– National data registries

– Repository registries – Databib, re3data

Dataverse – helping researchers make data findable & reusable

Gking.harvard.edu/data

DCC guidance

http://dataintelligence.3tu.nl/en/home/

Choice of RDM training materials for librarians

Up-skilling for data

http://datalib.edina.ac.uk/mantra/libtraining.html

2014-05-14 Kevin Ashley – Eurocris2014 - CC-BY 52

What data to keep

The Data Deluge is upon us

Sensors’ ability to produce data outstrips IT’s ability to process it

Roles and Responsibilities

What data to keep

IDCC15 – London, Feb 9-12 2015

http://www.dcc.ac.uk/events/idcc15

The 10th International Digital Curation Conference

My message to researchers

• The credit belongs to you

• The data belongs to all of us

• Share, and we all reap the benefits

• The story doesn’t end with a publication
