(Linked) Data Curation challenges Kevin Ashley Director, Digital Curation Centre www.dcc.ac.uk [email protected] Reusable with attribution: CC-BY The DCC is supported by Jisc


(Linked) Data Curation challenges

Kevin Ashley
Director, Digital Curation Centre

[email protected]

Reusable with attribution: CC-BY
The DCC is supported by Jisc


Acknowledgements

• John Wilkins & Cameron Neylon
• Ideas, images, slides, inspiration

2013-07-05 Kevin Ashley – CC-BY


Data views and processes

• Administration
• Discovery
• Work-level description
• Discipline-level interpretation


Administrative view

Data from projects funded by NERC

Data produced by the department of linguistics


Discovery view

Data about reproductive behaviour in freshwater fish


Work-level description


Data is variable

• Not always textual
• Not always tabular
• Not always fixed
• Not always clearly authored – think of archival provenance
• Not always associated with publication

Image credit: http://www.flickr.com/photos/sethw/113073189/

95% of research results are never published

Image credit: http://flickr.com/photos/heymans/480396810/

If a million postdocs repeat a million experiments…

Image credit: http://flickr.com/photos/cliche/120070310/

And 25% of those don’t work…


…how much taxpayers’ money is that?

Image credit: http://flickr.com/photos/luismimunoznajar/2093185804/


“I need that data now!!! I don’t care how messy it is – I can fix it!”

“I’ve wasted too much of my life fixing other people’s bad data. I’m not interested until you’ve cleaned it up and documented it. Besides, I have other things to think about.”


Grandfather’s axe

[email protected] CC-BY-NC-SA

When is my dataset a new dataset?


Authorship

• Reference data – cell-level provenance versus single author data table
• ‘Cleaned’ data – can pass through many hands
• Synthesis…
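The contrast in the first bullet can be sketched in code. This is an illustration only, not material from the talk: the `Cell` class, the `contributors` helper and all sample values and contributor labels are hypothetical. It shows one provenance statement covering a whole table versus a table in which each cell records its own source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """One value plus the party that supplied or last corrected it."""
    value: float
    source: str  # hypothetical contributor label


# Single-author table: one provenance statement covers every value.
single_author_table = {
    "author": "Researcher A",
    "rows": [[1.0, 2.0], [3.0, 4.0]],
}

# Reference data: every cell may come from a different contributor.
reference_table = [
    [Cell(1.0, "Lab X"), Cell(2.0, "Lab Y")],
    [Cell(3.0, "Lab X"), Cell(4.0, "Curator Z")],
]


def contributors(table):
    """Return the sorted list of distinct sources found across all cells."""
    return sorted({cell.source for row in table for cell in row})


print(contributors(reference_table))
```

With cell-level records, the question "who authored this table?" has a list as its answer rather than a single name, which is one way to read the authorship problem the slide raises.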


Potential wins

• Provenance of machine-gathered data – linking observations to instrument descriptions
• Linking data in multiple places
• Data and publications and plans
• Robust assertions about data versioning
• Association of data with institutions
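As a rough illustration of the first bullet, linking observations to instrument descriptions can be modelled with plain subject/predicate/object triples. This is a minimal sketch under stated assumptions: every identifier (the `ex:` prefix, `ex:madeBy`, the sensor model and calibration date) is invented for the example and does not come from the talk or from any real vocabulary.

```python
# Triples are (subject, predicate, object); all identifiers are hypothetical.
triples = [
    ("ex:obs-001", "ex:madeBy", "ex:sensor-42"),
    ("ex:obs-001", "ex:value", "17.3"),
    ("ex:obs-002", "ex:madeBy", "ex:sensor-42"),
    ("ex:sensor-42", "ex:model", "Thermistor T-100"),
    ("ex:sensor-42", "ex:calibrated", "2013-01-15"),
]


def describe(subject, graph):
    """Collect every predicate/object pair stated about a subject."""
    return {p: o for s, p, o in graph if s == subject}


def instrument_for(observation, graph):
    """Follow the madeBy link from an observation to its instrument description."""
    instrument = describe(observation, graph).get("ex:madeBy")
    return describe(instrument, graph) if instrument else {}


print(instrument_for("ex:obs-001", triples))
```

Because both observations point at the same instrument identifier, the instrument description is stated once and reached by following a link instead of being copied into every record.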


networks of people…


More wins

• Assertions at table and variable group level
• Linking that crosses disciplinary boundaries:
  – Biochemistry and neuroscience
  – Naval history, economics and climate science

• Linking that crosses research and administrative boundaries


IGFBP-5 plays a role in the regulation of cellular senescence via a p53-dependent pathway and in aging-associated vascular diseases

After John Wilbanks


Tylenol

SameAs:
• N-acetyl-p-aminophenol
• Acetaminophen
• Paracetamol
• N-(4-hydroxyphenyl)ethanamide
• N-(4-hydroxyphenyl)acetamide
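One way to picture what a set of SameAs links buys is to merge pairwise assertions into a single equivalence class. The sketch below assumes a particular set of pairwise links between the names on the slide; the slide does not say which names are linked to which, so the `assertions` list and the `same_as_groups` helper are hypothetical. It uses a small union-find structure rather than any linked-data library.

```python
def same_as_groups(pairs):
    """Group names into equivalence classes from pairwise sameAs assertions."""
    parent = {}

    def find(name):
        # Register unseen names as their own root, then walk up to the root.
        parent.setdefault(name, name)
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for a, b in pairs:
        parent[find(a)] = find(b)  # merge the two classes

    groups = {}
    for name in list(parent):
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


# Hypothetical pairwise links between the names shown on the slide.
assertions = [
    ("Tylenol", "Acetaminophen"),
    ("Acetaminophen", "Paracetamol"),
    ("Paracetamol", "N-(4-hydroxyphenyl)acetamide"),
    ("N-acetyl-p-aminophenol", "Acetaminophen"),
    ("N-(4-hydroxyphenyl)ethanamide", "Paracetamol"),
]

groups = same_as_groups(assertions)
print(len(groups), sorted(groups[0]))
```

No single assertion mentions all six names, yet a search for any one of them can reach data filed under the others once the links are combined.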


“I never had an idea that couldn’t be improved by sharing it with as many people as possible…”

Bill Hooker (2006)
http://3quarksdaily.blogs.com/3quarksdaily/2006/10/the_future_of_s_1.html


[Diagram labels: Idea, Develop, Fund, Plan, Record, Process, Publish, Read]


Challenge? Opportunity

• Linked data can improve administration of research and research data

• The real potential is in improving research quality and efficiency

• The same actors can’t do both
• The actions don’t need to be in lock-step
