Chemistry research data in the modern age: A clear need for curation expertise Simon Coles School of Chemistry, University of Southampton, U.K. s.j.coles@soton.ac.uk


Page 1: Chemistry research data in the modern age: A clear need for curation expertise Simon Coles School of Chemistry, University of Southampton, U.K. s.j.coles@soton.ac.uk

                                                             

Chemistry research data in the modern age:

A clear need for curation expertise

Simon Coles

School of Chemistry,

University of Southampton, U.K.

s.j.coles@soton.ac.uk


Data Generation

Synthesis

Data Collection

Data Workup

Data Processing

Publication


Data Types

RAW data: G bytes
DERIVED data: M bytes
RESULTS data: k bytes

Storage locations shown: Lab / Institution; Subject Repository / Data Centre / Public Domain


Incentives and Drivers

Chemists don’t think about their data!

They need to understand that their data is valuable and has a use beyond that of an immediate gain, before they will consider curation issues.

So what are the incentives and drivers?
– Data Management
– Data Deluge
– Publishing Data
– Validation, Assessment and Peer Review
– Re-analysing Data
– Data Reuse and Derivative Studies
– Publishing and Funding Mandates


Curation Incentives - Data Management, Deluge & Publishing

“Data from experiments conducted as recently as six months ago might be suddenly deemed important, but those researchers may never find those numbers – or if they did might not know what those numbers meant”

“Lost in some research assistant’s computer, the data are often irretrievable or an undecipherable string of digits”

“To vet experiments, correct errors, or find new breakthroughs, scientists desperately need better ways to store and retrieve research data”

“Data from Big Science is … easier to handle, understand and archive. Small Science is horribly heterogeneous and far more vast. In time Small Science will generate 2-3 times more data than Big Science.”

‘Lost in a Sea of Science Data’, S. Carlson, The Chronicle of Higher Education (23/06/2006)


Curation Incentives - Data Management, Deluge & Publishing

[Figure: chemical structure diagrams, with the figures 30,000,000, 2,000,000 and 450,000]


Curation Incentives - Data Management, Deluge & Publishing


Separating Data from Interpretations

Underlying data (Institutional data repository)

Intellect & Interpretation (Journal article, report, etc)


The eCrystals Data Repository

An Institutional Repository

http://ecrystals.chem.soton.ac.uk


The Repository for the Laboratory

Search / Browse

Deposit

Create new compound

Add experiment data and metadata


Curation Incentives - Validation & Peer Review


Curation Incentives - Raw Data Re-analysis

Good data

Difficult data

You never know when data might have to be revisited or new innovations will allow re-interpretation!


Curation Incentives - Funding and/or publishing mandates

• Mandates to store / make data available

• RCUK statement


Curation Incentives - Derivative Science

• Starting points for new science
• Derivation of knowledgebases


Curation Issues

• Need to engage stakeholders throughout the whole research data lifecycle:

– Instrument manufacturers
– Scientists
– Archivists
– Librarians
– Subject repositories
– Data centres
– Publishers
– Funders
– Data miners & information providers


Curation Issues

• File formats, complexity and specialisation
• Data corruption and bit rot
• Quantity of data
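One common way to detect data corruption and bit rot is a periodic fixity check: record a checksum for each file at deposit time and compare it again later. The sketch below is illustrative only and is not taken from the talk; the function names, the choice of SHA-256 and the idea of an in-memory manifest are all assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict[str, str]:
    """Record a checksum for every file under data_dir (hypothetical layout)."""
    return {
        p.relative_to(data_dir).as_posix(): sha256_of(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }


def find_damaged(data_dir: Path, manifest: dict[str, str]) -> list[str]:
    """List files that are missing or whose checksum no longer matches."""
    damaged = []
    for name, expected in manifest.items():
        target = data_dir / name
        if not target.is_file() or sha256_of(target) != expected:
            damaged.append(name)
    return damaged
```

A manifest built at deposit time would normally be stored separately from the data itself, so that a later call to `find_damaged` can flag files that have silently changed or disappeared.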


Curation Issues

• File formats, complexity and specialisation
• Data corruption and bit rot
• Quantity of data

– Future proofing…
– Technology developments
– eScience


Curation Issues

• File formats, complexity and specialisation
• Data corruption and bit rot
• Quantity of data
• Catering for a whole community

[Figure: eCrystals Federation Data Deposit Model. Actors shown: Scientist, User, Funder, Publishers, Data centres / aggregator services. Activities shown: Create, Deposit, Link, Curate, Preserve, Collaborate, Share, Discover, Re-use, Harvest. Supporting elements shown: Standards, Policy, Advocacy, Training, Advisory, IR Federation.]


Curation Issues

• File formats, complexity and specialisation
• Data corruption and bit rot
• Quantity of data
• Catering for a whole community
• What data is worth storing?
– Estimated that the real cost of a crystal structure is £75 - £100 ($200)
– But what about the cost of ‘producing’ the crystal?
– Priceless!
– The crystal was synthesised in a specialised laboratory, by highly trained researchers under a specific research program
– A laboratory, researcher or scheme of work is a transient or evolving entity
– As much data as possible must be acquired and future-proofed whilst the analyst has the substance to hand


Curation Issues

• File formats, complexity and specialisation
• Data corruption and bit rot
• Quantity of data
• Catering for a whole community
• What data is worth storing?
• Provenance, workflow and rights protection


Curation Issues

• File formats, complexity and specialisation
• Data corruption and bit rot
• Quantity of data
• Catering for a whole community
• What data is worth storing?
• Provenance, workflow and protection of rights
• Available expertise, library/information services structure
• Cost and policy
• Business models

– Subject librarian model - working closely with practitioners

– New funding/structure models to support open data as OA takes off

– Working group to assess the volume and diversity of research data

– JISC funded survey - ‘Cost of preserving research data’

– Commercialisation of knowledge derived from collections of data


Dealing with Data Report, June 2007: Recommendations 1

• JISC should develop a Data Audit Framework to enable all Universities & colleges to carry out an audit of departmental data collections, awareness, policies & practice…

• Each Higher Education Institution should implement an Institutional Data Management, Preservation & Sharing Policy, which recommends data deposit in an appropriate open access data repository and/or data centre where these exist.


Institutional Structure

• Encourage restructuring through strategic funding
• Rechannel existing funding routes
• Financial structure – money for self-archiving or OA publishing
• Physical structure – embed LIS/curation staff in departments for advocacy – need to go native
• Library / Information services need to be introspective and reinvent themselves


Advocacy

• Younger ‘digital’ generation
• Elders will not listen
• Method to engage at departmental level
• Funders undervaluing work – need enlightening


Funding

• Small science
• Low budget / funding
• Hypo publishing
• Unsupported

• Initial target areas that are safe – i.e. no sensitive data


Practice

• Small science vs big science
• Instrumentation vs manual
• Automate data capture
• Heterogeneity/variety in practice
• Problems same in industry


Tools

• Seamless
• Simple to use
• Low barrier to use
• Integrated into familiar environment
• Self describing (generate provenance and preservation metadata in the background)
• Tagging / controlled vocab tools / servers
• Vocab checking
• Browser tools (familiar to youth)
• Thin client tools – repository lite. Minimal management. Highly distributed repositories


eInfrastructure

• Semantic / controlled vocabulary central services


Economic models and value

• Data *NOT* valueless once published (EPSRC train of thought)

• What is the *value* of departmental level data – this is not necessarily monetary

• Department, institution, individual, data centre, pharma, government, research council, public, third party services/businesses

• We undervalue data
• Subject repository economic sustainability
• Evidence to back up advocacy