© S.J. Coles 2006 Digital Repositories as a Mechanism for the Capture, Management and Dissemination of Chemical Data Simon Coles School of Chemistry, University of Southampton, U.K. [email protected]

Digital Repositories as a Mechanism for the Capture, Management and Dissemination of Chemical Data

Simon Coles

School of Chemistry,

University of Southampton, U.K.

[email protected]

A Data-Rich Subject – the Crystallography Problem

[Figure: chemical structure diagrams with atom labels Cl, O, N, N+]

30,000,000

1,500,000

450,000

Funding Body Viewpoint

Open Access as the Answer?

Separating Data from Interpretations

Underlying data

Intellect & Interpretation

Research & e-Science workflows

Aggregator services: national, commercial

Repositories: institutional, e-prints, subject, data, learning objects

Data curation: databases & databanks

Validation

Harvesting metadata

Data creation / capture / gathering: laboratory experiments, Grids, fieldwork, surveys, media

Deposit / self-archiving

Peer-reviewed publications: journals, conference proceedings

Publication

Validation

Data analysis, transformation, mining, modelling

Searching, harvesting, embedding

Presentation services: subject, media-specific, data, commercial portals

Resource discovery, linking, embedding

Linking

The scholarly knowledge cycle.

Liz Lyon, eBankUK article. Ariadne, July 2003.

Workflow Capture and Analysis

RAW DATA → DERIVED DATA → RESULTS DATA

The eCrystals Data Archive

http://ecrystals.chem.soton.ac.uk

Access to the underlying data

Metadata Publication

• Using simple Dublin Core
    • Crystal structure
    • Title (Systematic IUPAC Name)
    • Authors
    • Affiliation
    • Creation Date

• Additional chemical information through Qualified Dublin Core
    • Empirical formula
    • International Chemical Identifier (InChI)
    • Compound Class & Keywords

• Specifies which ‘datasets’ are present in an entry

• DOI

• Rights

• Citation

http://www.ukoln.ac.uk/projects/ebank-uk/schemas/
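
The metadata fields listed on this slide could be assembled into a record roughly as follows. This is a minimal sketch, not the eBank-UK schema itself: the Dublin Core namespace is the standard one, but the `chem` namespace, its element names, the mapping of affiliation to `dc:publisher`, and all sample values are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"   # standard simple Dublin Core
CHEM_NS = "http://example.org/chem-terms/"   # hypothetical stand-in for the qualified terms

ET.register_namespace("dc", DC_NS)
ET.register_namespace("chem", CHEM_NS)


def build_record(title, creators, affiliation, created, formula, inchi, keywords):
    """Build one metadata record holding the fields named on the slide."""
    record = ET.Element("record")

    def add(ns, tag, text):
        element = ET.SubElement(record, f"{{{ns}}}{tag}")
        element.text = text

    # Simple Dublin Core part
    add(DC_NS, "type", "Crystal structure")
    add(DC_NS, "title", title)                # systematic IUPAC name
    for creator in creators:
        add(DC_NS, "creator", creator)
    add(DC_NS, "publisher", affiliation)      # assumption: affiliation mapped to publisher
    add(DC_NS, "date", created)

    # Additional chemical information (element names are illustrative only)
    add(CHEM_NS, "empiricalFormula", formula)
    add(CHEM_NS, "inchi", inchi)
    for keyword in keywords:
        add(DC_NS, "subject", keyword)
    return record


# Placeholder values, not taken from any real archive entry
record = build_record(
    title="benzene",
    creators=["A. Researcher", "B. Researcher"],
    affiliation="Example University",
    created="2006-01-01",
    formula="C6H6",
    inchi="InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H",
    keywords=["aromatic hydrocarbon"],
)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Keeping the general fields in simple Dublin Core means a generic harvester can read the record, while the chemistry-specific fields remain available to specialised services.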

Metadata and Data Quality Control

Data manipulation toolbox

Associated Metadata

Value added

Format conversion

Harvesting & Aggregating: Google

Coles, S.J., Day, N.E., Murray-Rust, P., Rzepa, H.S., Zhang, Y., Org. Biomol. Chem., 2005, (10), 1832-1834. DOI: 10.1039/b502828k

Harvesting: OAIster
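
Harvesters such as OAIster collect repository metadata over the OAI-PMH protocol. The sketch below shows, under stated assumptions, what the harvesting step looks like: it builds a `ListRecords` request URL and parses a response held in a string, so it runs without network access. The base URL, record identifier and record content are hypothetical; only the OAI-PMH and Dublin Core namespaces and the `verb` / `metadataPrefix` parameters come from the protocol.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical endpoint; a real repository publishes its own base URL
BASE_URL = "http://repository.example.org/oai"


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def parse_records(xml_text):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text.strip())
    results = []
    for rec in root.iter(f"{{{NS['oai']}}}record"):
        identifier = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        title = rec.findtext(".//dc:title", namespaces=NS)
        results.append((identifier, title))
    return results


# Illustrative response with a single made-up record
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:1</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>benzene</dc:title>
          <dc:type>Crystal structure</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

url = list_records_url(BASE_URL)
records = parse_records(SAMPLE_RESPONSE)
print(url)
print(records)
```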

Linking and aggregating

Embedded in a science portal

eBank/eCrystals Future

• Full embedding in daily laboratory practice

• Roll out to other institutions

• Full support from host institution

• Community acceptance

• Federation of repositories

• Specialised aggregator services (Crystallography)

• Generic aggregator services (Chemistry / Science)

The Information Environment

Institutional Data Sources

Data and Information Loss

Repositories Supporting Laboratory Working Practice

• eBank-UK concentrates on the dissemination of data compiled once a study is complete

• To fully assure the quality and accuracy of metadata, it is essential to capture it as it is generated

• Repository architecture has the potential to store data and metadata as they are generated

• Repository also has capability to manage data and provide report generation and analysis tools

Laboratory Repositories

Workflow Analysis

Researcher, Compound, Experiment type, Timestamp

Sample preparation

Data acquisition

Deposit current dataset

Analyse: Refine experiment?

Complete experiment deposit
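
The workflow on this slide can be read as a loop: acquire data, deposit the current dataset, analyse it, and either refine the experiment or close it with a complete deposit. The following is a minimal sketch of that reading, not the R4L implementation. The `Deposit` class, the `run_experiment` function, the `max_rounds` limit and the sample values are all hypothetical; sample preparation is assumed to have happened before the loop starts.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, List


@dataclass
class Deposit:
    # Metadata captured with every deposit, as listed on the slide
    researcher: str
    compound: str
    experiment_type: str
    timestamp: datetime
    description: str
    complete: bool = False


def run_experiment(
    researcher: str,
    compound: str,
    experiment_type: str,
    acquire: Callable[[int], dict],
    needs_refinement: Callable[[dict], bool],
    max_rounds: int = 5,
) -> List[Deposit]:
    """Deposit a dataset after each acquisition round, then a final complete deposit."""
    deposits: List[Deposit] = []

    def deposit(description: str, complete: bool = False) -> None:
        deposits.append(
            Deposit(researcher, compound, experiment_type,
                    datetime.now(timezone.utc), description, complete)
        )

    for round_no in range(1, max_rounds + 1):
        dataset = acquire(round_no)                  # data acquisition
        deposit(f"dataset from round {round_no}")    # deposit current dataset
        if not needs_refinement(dataset):            # analyse: refine experiment?
            break
    deposit("complete experiment", complete=True)    # complete experiment deposit
    return deposits


# Placeholder run: the first round needs refinement, the second does not
log = run_experiment(
    researcher="A. Researcher",
    compound="benzene",
    experiment_type="single-crystal diffraction",
    acquire=lambda round_no: {"round": round_no},
    needs_refinement=lambda dataset: dataset["round"] < 2,
)
for entry in log:
    print(entry.timestamp.isoformat(), entry.description, entry.complete)
```

Because every intermediate dataset is deposited with its metadata at the moment it is produced, nothing has to be reconstructed after the study ends.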

The R4L Repository

Deposit

Search

R4L Essentials

• Continual deposition and metadata capture from the very start of the experiment

• Prior Assertion Service – a legally sound protection of IPR

• Laboratory data management and analysis of heterogeneous datasets

• Production of reports – Individual experiment

• Production of reports – Study involving several experiments

• Panel of publishers to direct requirements for data publication

Something to take home!

• Open access to data does not harm or hinder publication of ideas and interpretation in a conventional fashion

• Open access to data, when linked to a publication containing interpretations, enhances the value of the publication

• Open access to ALL data underpinning a publication enables efficient assessment and reuse of that data

• Essential to embed repository deposition into ALL aspects of (laboratory) working procedures